I was working on linear probing, which hashes values by taking them mod the table size, and I wrote some code for it.
public class LinearProbing
{
    private int table[];
    private int size;

    LinearProbing(int size)
    {
        this.size = size;
        table = new int[size];
    }

    public void hash(int value)
    {
        int key = value % size;
        while (table[key] != 0)
        {
            key++;
            if (key == size)
            {
                key = 0;
            }
        }
        table[key] = value;
    }

    public void display()
    {
        for (int i = 0; i < size; i++)
        {
            System.out.println(i + "->" + table[i]);
        }
    }
}
It works fine for every value except zero (0). In Java, each index of an int array is initialized to zero, so checking whether a slot equals zero to see if it is free breaks down when zero itself is one of the values to be hashed: it can be silently overwritten. I also tried comparing against null, but that raises a type mismatch error.
Does anyone have any suggestion?
Computers don't work that way, at least, not without paying a rather great cost.
Specifically, a new int[10] quite literally just creates a contiguous block of memory that is precisely large enough to hold 10 int variables, and not a bit more than that. Specifically, each int will cover 32 bits, and those bits can be used to represent precisely 2^32 different things. Think about it: If I give you a panel of 3 light switches, and all you get to do is walk in, flip some switches, and walk back out again, then I walk in and I get to look at what you have been flipping, and that is all the communication channel we ever get, we can pre-arrange for 8 different signals. Why 8? Because that's 2^3. A bit is like that lightswitch. It's on, or off. There is no other option, and there is no 'unset'. There is no way to represent 'oh, you have not been in the room yet' unless we 'spend' one of our 8 different arrangements on this signal, leaving only 7 left.
Thus, if you want each 'int' to also know 'whether it has been set or not', and for 'not set yet' to be different from any of the valid values, you need an entire new bit, and given that modern CPUs don't like doing work on sub-word units, that one bit is excessively expensive. In either case, you have to program it.
For example:
private int[] table;
private int[] set;

LinearProbing(int size) {
    this.size = size;
    this.table = new int[size];
    this.set = new int[(size + 31) / 32];
}

boolean isSet(int idx) {
    int setIdx = idx / 32;
    int bit = idx % 32;
    return (this.set[setIdx] & (1 << bit)) != 0;
}

private void markAsSet(int idx) {
    int setIdx = idx / 32;
    int bit = idx % 32;
    this.set[setIdx] |= (1 << bit);
}
This rather complex piece of machinery 'packs' that additional 'is it set?' bit into a separate array called set, which we can get away with making 1/32nd the size of the whole thing, as each int contains 32 bits and we only need 1 bit per slot to mark it as 'set'. Unfortunately, this means we need to do all sorts of 'bit wrangling': the bitwise OR (|=) and AND (&) operators, plus a bit shift (<<), to isolate the right bit.
This is why, usually, this is not the way to go: bit wrangling isn't cheap.
It's a much, much better idea to take away exactly one of the 2^32 different values a hash can be. You could choose 0, but you can also choose some arbitrarily chosen value; there is a very minor benefit to picking a large prime number. Let's say 7549.
Now all you need to do is decree a certain algorithm: the practical hash of a value is derived from the actual hash as follows:
If the actual hash is 7549 specifically, we say the practical hash is 6961. Yes, that means 6961 will occur more often.
If the actual hash is anything else, including 6961, the practical hash is identical.
Tada: This algorithm means '7549' is free. No practical hash can ever be 7549. That means we can now use 7549 as a marker meaning 'unset'.
The fact that 6961 is now doubled up is technically not relevant: no hash bucket system can assume that equal hashes mean equal objects - after all, there are only 2^32 hashes, so collisions are mathematically impossible to avoid. That's why e.g. java's own HashMap doesn't JUST compare hashes - it also calls .equals. If you shove 2 different (as in, not .equals) objects into the same map that so happen to hash to the same value, HashMap is fine with it. Hence, having a few more conflicts around 6961 is not particularly relevant.
The additional cost of the extra collision chance on 6961 is vastly less than the cost of keeping track of which buckets have been set. After all, assuming a good hash distribution, the transformation that frees up 7549 makes only about 1 in 4 billion items collide where it otherwise would not have. That's an infinitesimal occurrence on top of another infinitesimal; it's not going to matter.
NB: 6961 and 7549 are randomly chosen prime numbers. Prime numbers are merely slightly less likely to collide, it's not crucial that you pick primes here.
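If it helps to see this applied to the LinearProbing class from the question, here is a minimal sketch. The concrete sentinel values (7549 as the 'unset' marker, 6961 as its stand-in) are just the arbitrary picks from above, and since the question's table stores the values themselves, the remapping is applied to the stored value:

import java.util.Arrays;

public class LinearProbing {
    // Arbitrary sentinels from the discussion above; any value works, as long
    // as the remapping below guarantees EMPTY never appears in the table.
    private static final int EMPTY = 7549;
    private static final int REMAPPED = 6961;

    private final int[] table;
    private final int size;

    LinearProbing(int size) {
        this.size = size;
        table = new int[size];
        Arrays.fill(table, EMPTY);   // mark every slot as 'unset'
    }

    public void hash(int value) {
        // The 'practical' value: EMPTY itself is remapped so it can never be stored.
        int stored = (value == EMPTY) ? REMAPPED : value;
        int key = Math.floorMod(stored, size);
        while (table[key] != EMPTY) {   // probe until a free slot is found
            key = (key + 1) % size;     // (no full-table check, same as the original code)
        }
        table[key] = stored;
    }

    public void display() {
        for (int i = 0; i < size; i++) {
            System.out.println(i + "->" + (table[i] == EMPTY ? "(empty)" : table[i]));
        }
    }
}

Note that an input of 7549 is stored (and displayed) as 6961; that is exactly the 'doubling up' accepted above.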
I'm developing a new data structure that is theoretically more efficient than a hashmap. It does this by having an O(1) resize when there is a collision. The issue is that when inserting data (the metric I care most about), it is slightly slower than a hashmap, when it should be significantly faster.
Here is all the code used in the insert:
private void insert(Pair data, SwitchArray currTable) {
    if (currTable.isExpanded == false && currTable.iValue == null) { // checks the very first iValue
        currTable.iValue = data;
        return;
    }
    else if (!currTable.isExpanded) { // if there is a new collision
        SwitchArray[] x = new SwitchArray[currTable.primeArray[currTable.depth]];
        currTable.sA = x;
        Integer index = Math.abs(data.key.hashCode()) % currTable.sA.length;
        currTable.sA[index] = new SwitchArray(data, currTable.depth + 1);
        currTable.isExpanded = true;
        insert(currTable.iValue, currTable);
        currTable.iValue = null;
    }
    else { // if expanded
        Integer index = Math.abs(data.key.hashCode()) % currTable.sA.length;
        if (currTable.sA[index] == null) {
            currTable.sA[index] = new SwitchArray(data, currTable.depth + 1); // this updates iValue in the constructor
        } else {
            currTable = currTable.sA[index];
            insert(data, currTable); // go one level deeper
        }
    }
}
These are the two classes I reference:
class SwitchArray {
    int depth;
    int length;
    SwitchArray[] sA;
    Pair iValue;
    int[] primeArray = new int[]{7, 11, 13, 17, 19, 23, 29, 31, 37};
    boolean isExpanded = false;

    public SwitchArray(Pair iValue, int depth) {
        this.iValue = iValue;
        this.depth = depth;
        length = primeArray[depth];
        if (iValue != null)
            iValue.myDepth = depth;
    }
}

class Pair {
    String key;
    Integer value;
    int myDepth;

    public Pair(String key, Integer value) {
        this.key = key;
        this.value = value;
        myDepth = -1;
    }

    public String toString() {
        return "( " + key + ", " + value + " | depth: " + myDepth + ")";
    }
}
Here is the code in its entirety.
I have tested the efficiency by adding varying amounts of data (from 1 pair all the way until I got a Java heap space error) to both hashmaps and my MDHT, and graphed the results in Excel. Consistently, the MDHT is slightly slower.
(I would like to also add that this is just a fun project I am doing, not trying to overthrow hashmaps or anything.)
So the question I ask you is how do I fix it or slightly improve it at least?
new SwitchArray[currTable.primeArray[currTable.depth]];
This is relatively slow, as it needs to clear out the new array. You can't opt out of this, although hotspot tends to recognize any array whose values are almost immediately guaranteed to be entirely filled, and omits the initial writing of zeroes into the heap for it. That doesn't apply here, and it isn't an optimization that seems possible to add here.
insert
This method is recursive, and the number of times it recurses is related to the amount of collisions you have, therefore, it isn't O(1).
So the question I ask you is how do I fix it or slightly improve it at least?
HashMap wasn't written by some random moron. It's possibly not perfect but a rote algorithmic complexity improvement is not available. You may be able to build a theoretical improvement in basic opcode count, but this is extremely unlikely to beat hashmap. The reason? Hotspot.
The hotspot engine is a gigantic pattern matcher. It finds patterns that it knows how to optimize and optimizes them. Whilst it does all sorts of magic in order to recognize as many patterns as it can, there is one simple fundamental truth: It recognizes idiomatic java. This library of patterns to optimize isn't built based on 'what sequence of opcodes can I optimize?'. It's built on a much simpler notion than that: 'Which sequence of opcodes is commonly observed in java code?'
In other words, commonly used patterns are better optimized. And HashMap is very commonly used. Hence:
Your notion that you can do O(1) insertion when there are collisions is certainly possible, but you can't guarantee O(1) lookup, by fundamental definitions. However, as a general rule, as long as you aren't overloading on collisions, that isn't the controlling performance issue. At small n, an O(n) algorithm and an O(n^2) algorithm are simply unrelated: the algorithmically slower algorithm may beat the faster one, or not - the point is, the algorithmic complexity is completely meaningless until n is 'large enough'. When is 'large enough'? Depends on the hardware, the algorithm, the data, and the phase of the moon - the point of big-O notation isn't to predict when 'large enough' is reached, merely to posit that there is SOME n, possibly incredibly large, at which the algorithmic complexity 'takes over' and accurately predicts the faster algorithm. Point is, with hashmaps, most likely either:
[A] This is an academic case where you add thousands of objects with clashing hashcodes. Who gives a piddle what the performance of anything is at this point? The fix is to address the broken hash impl, not to futz about trying to shine the turd. Lookups are guaranteed to be O(n) in this case, and the primary point of a hashmap is to be faster than that. Just use ArrayList in this case; you can't beat its performance then. It has O(1) inserts and O(n) lookups. Besides, your code will just crash if you try; your buckets are limited to at most 37 items. A map with 37 items in it is far to the left of that magical fulcrum point where 'n' becomes relevant.
[B] There aren't a ton of collisions. n is simply not large enough for algorithmic complexity to matter.
And also:
Trying to improve on things by just 'writing it slightly more optimized' is doomed to failure: The 'judge' (the hotspot VM) is biased because HashMap is so common, all hotspot implementations are designed to recognize the bytecode in j.u.HashMap and optimize it. You may be able to do some theoretic improvements but they will be small; too small to outweigh the penalty of this biased judge.
CONCLUSION: It's not possible to improve HashMap's performance without adding significant caveats to the data you intend to store in your BetterHashMap. In other words, any generalized hashmap that is significantly better than j.u.HM in some regards and not significantly worse in others is an extraordinary job and likely impossible.
I have gone through Google and Stack Overflow searches, but nowhere was I able to find a clear and straightforward explanation of how to calculate time complexity.
What do I know already?
Say for code as simple as the one below:
char h = 'y'; // This will be executed 1 time
int abc = 0; // This will be executed 1 time
Say for a loop like the one below:
for (int i = 0; i < N; i++) {
    Console.Write("Hello, World!!");
}
int i = 0; — this will be executed only once. The time is actually counted for the assignment i = 0, not the declaration.
i < N; — this will be executed N+1 times.
i++ — this will be executed N times.
So the number of operations required by this loop is {1 + (N+1) + N} = 2N + 2. (But this still may be wrong, as I am not confident about my understanding.)
OK, so I think I know these small basic calculations, but in most cases I have seen time complexities given as O(N), O(n^2), O(log n), O(n!), and many others.
How to find time complexity of an algorithm
You add up how many machine instructions it will execute as a function of the size of its input, and then simplify the expression to the largest term (for large N), dropping any constant factors.
For example, let's see how we simplify 2N + 2 machine instructions and describe this as just O(N).
Why do we remove the two 2s?
We are interested in the performance of the algorithm as N becomes large.
Consider the two terms 2N and 2.
What is the relative influence of these two terms as N becomes large? Suppose N is a million.
Then the first term is 2 million and the second term is only 2.
For this reason, we drop all but the largest terms for large N.
So, now we have gone from 2N + 2 to 2N.
Traditionally, we are only interested in performance up to constant factors.
This means that we don't really care if there is some constant multiple of difference in performance when N is large. The unit of 2N is not well-defined in the first place anyway. So we can multiply or divide by a constant factor to get to the simplest expression.
So 2N becomes just N.
This is an excellent article: Time complexity of algorithm
The below answer is copied from above (in case the excellent link goes bust)
The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
statement;
Is constant. The running time of the statement will not change in relation to N.
for ( i = 0; i < N; i++ )
statement;
Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.
for ( i = 0; i < N; i++ ) {
for ( j = 0; j < N; j++ )
statement;
}
Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by a factor of four.
while ( low <= high ) {
mid = ( low + high ) / 2;
if ( target < list[mid] )
high = mid - 1;
else if ( target > list[mid] )
low = mid + 1;
else break;
}
Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
void quicksort(int list[], int left, int right)
{
    if (left >= right)
        return;                 // base case: zero or one element
    int pivot = partition(list, left, right);
    quicksort(list, left, pivot - 1);
    quicksort(list, pivot + 1, right);
}
Is N * log (N). The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.
In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common. Big O notation is described as O(<type>) where <type> is the measure. The quicksort algorithm would be described as O(N log N).
Note that none of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex than I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course. ;)
Taken from here - Introduction to Time Complexity of an Algorithm
1. Introduction
In computer science, the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the string representing the input.
2. Big O notation
The time complexity of an algorithm is commonly expressed using big O notation, which excludes coefficients and lower order terms. When expressed this way, the time complexity is said to be described asymptotically, i.e., as the input size goes to infinity.
For example, if the time required by an algorithm on all inputs of size n is at most 5n^3 + 3n, the asymptotic time complexity is O(n^3). More on that later.
A few more examples:
1 = O(n)
n = O(n^2)
log(n) = O(n)
2n + 1 = O(n)
3. O(1) constant time:
An algorithm is said to run in constant time if it requires the same amount of time regardless of the input size.
Examples:
array: accessing any element
fixed-size stack: push and pop methods
fixed-size queue: enqueue and dequeue methods
4. O(n) linear time
An algorithm is said to run in linear time if its time execution is directly proportional to the input size, i.e. time grows linearly as input size increases.
Consider the following examples. Below I am linearly searching for an element, and this has a time complexity of O(n).
int find = 66;
var numbers = new int[] { 33, 435, 36, 37, 43, 45, 66, 656, 2232 };
for (int i = 0; i < numbers.Length; i++)
{
if(find == numbers[i])
{
return;
}
}
More Examples:
Array: Linear Search, Traversing, Find minimum etc
ArrayList: contains method
Queue: contains method
5. O(log n) logarithmic time:
An algorithm is said to run in logarithmic time if its time execution is proportional to the logarithm of the input size.
Example: Binary Search
Recall the "twenty questions" game - the task is to guess the value of a hidden number in an interval. Each time you make a guess, you are told whether your guess is too high or too low. Twenty questions game implies a strategy that uses your guess number to halve the interval size. This is an example of the general problem-solving method known as binary search.
6. O(n^2) quadratic time
An algorithm is said to run in quadratic time if its time execution is proportional to the square of the input size.
Examples:
Bubble Sort
Selection Sort
Insertion Sort
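To see where the quadratic cost comes from, here is a plain bubble sort sketch in Java; the nested loops perform on the order of n^2/2 comparisons:

// Bubble sort: the outer loop runs n-1 times and the inner loop up to n-1 times,
// so the number of comparisons grows with n^2.
static void bubbleSort(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
        for (int j = 0; j < a.length - 1 - i; j++) {
            if (a[j] > a[j + 1]) {          // swap out-of-order neighbours
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}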
7. Some useful links
Big-O Misconceptions
Determining The Complexity Of Algorithm
Big O Cheat Sheet
Several examples of loops:
O(n): The time complexity of a loop is considered O(n) if the loop variable is incremented/decremented by a constant amount. For example, the following functions have O(n) time complexity.
// Here c is a positive integer constant
for (int i = 1; i <= n; i += c) {
    // some O(1) expressions
}

for (int i = n; i > 0; i -= c) {
    // some O(1) expressions
}
O(n^c): The time complexity of nested loops is equal to the number of times the innermost statement is executed. For example, the following sample loops have O(n^2) time complexity:
for (int i = 1; i <= n; i += c) {
    for (int j = 1; j <= n; j += c) {
        // some O(1) expressions
    }
}

for (int i = n; i > 0; i -= c) {
    for (int j = i + 1; j <= n; j += c) {
        // some O(1) expressions
    }
}
For example, selection sort and insertion sort have O(n^2) time complexity.
O(log n): The time complexity of a loop is considered O(log n) if the loop variable is divided/multiplied by a constant amount.
for (int i = 1; i <= n; i *= c) {
    // some O(1) expressions
}

for (int i = n; i > 0; i /= c) {
    // some O(1) expressions
}
For example, binary search has O(log n) time complexity.
O(log log n): The time complexity of a loop is considered O(log log n) if the loop variable is reduced/increased exponentially by a constant amount.
// Here c is a constant greater than 1
for (int i = 2; i <= n; i = pow(i, c)) {
    // some O(1) expressions
}

// Here fun is sqrt or cuberoot or any other constant root
for (int i = n; i > 0; i = fun(i)) {
    // some O(1) expressions
}
One example of time complexity analysis:
void fun(int n)
{
    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j < n; j += i)
        {
            // Some O(1) task
        }
    }
}
Analysis:
For i = 1, the inner loop is executed n times.
For i = 2, the inner loop is executed approximately n/2 times.
For i = 3, the inner loop is executed approximately n/3 times.
For i = 4, the inner loop is executed approximately n/4 times.
…………………………………………………….
For i = n, the inner loop is executed approximately n/n times.
So the total time complexity of the above algorithm is (n + n/2 + n/3 + … + n/n), which becomes n * (1/1 + 1/2 + 1/3 + … + 1/n)
The important thing about the series (1/1 + 1/2 + 1/3 + … + 1/n) is that it is approximately log n. So the time complexity of the above code is O(n log n).
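For completeness, that bound on the harmonic series follows from the standard comparison with the integral of 1/x:

\sum_{k=1}^{n} \frac{1}{k} \;\le\; 1 + \int_{1}^{n} \frac{dx}{x} \;=\; 1 + \ln n \;=\; O(\log n)

Multiplying by n gives the O(n log n) stated above.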
Time complexity with examples
1 - Basic operations (arithmetic, comparisons, accessing array’s elements, assignment): The running time is always constant O(1)
Example:
read(x) // O(1)
a = 10; // O(1)
a = 1,000,000,000,000,000,000 // O(1)
2 - If then else statement: Only taking the maximum running time from two or more possible statements.
Example:
age = read(x) // (1+1) = 2
if age < 17 then begin // 1
status = "Not allowed!"; // 1
end else begin
status = "Welcome! Please come in"; // 1
visitors = visitors + 1; // 1+1 = 2
end;
So, the complexity of the above pseudo code is T(n) = 2 + 1 + max(1, 1+2) = 6. Thus, its big oh is still constant T(n) = O(1).
3 - Looping (for, while, repeat): Running time for this statement is the number of loops multiplied by the number of operations inside that looping.
Example:
total = 0; // 1
for i = 1 to n do begin // (1+1)*n = 2n
total = total + i; // (1+1)*n = 2n
end;
writeln(total); // 1
So, its complexity is T(n) = 1+4n+1 = 4n + 2. Thus, T(n) = O(n).
4 - Nested loop (looping inside looping): Since there is at least one loop nested inside the main loop, the running time of such a statement is O(n^2) or O(n^3).
Example:
for i = 1 to n do begin // (1+1)*n = 2n
for j = 1 to n do begin // (1+1)n*n = 2n^2
x = x + 1; // (1+1)n*n = 2n^2
print(x); // (n*n) = n^2
end;
end;
Common running time
There are some common running times when analyzing an algorithm:
O(1) – Constant time
Constant time means the running time is constant, it’s not affected by the input size.
O(n) – Linear time
When an algorithm accepts n input size, it would perform n operations as well.
O(log n) – Logarithmic time
An algorithm with running time O(log n) is faster than one that is O(n). Commonly, the algorithm divides the problem into subproblems of the same size. Example: binary search algorithm, binary conversion algorithm.
O(n log n) – Linearithmic time
This running time is often found in "divide & conquer algorithms" which divide the problem into sub problems recursively and then merge them in n time. Example: Merge Sort algorithm.
O(n^2) – Quadratic time
Look at the Bubble Sort algorithm!
O(n^3) – Cubic time
It follows the same principle as O(n^2).
O(2^n) – Exponential time
It is very slow as the input gets larger: if n = 1,000,000, T(n) would be 2^1,000,000. Brute force algorithms have this running time.
O(n!) – Factorial time
The slowest!!! Example: Travelling salesman problem (TSP)
It is taken from this article. It is very well explained and you should give it a read.
When you're analyzing code, you have to analyze it line by line, counting every operation or recognizing the time complexity of each part. In the end, you sum all of it to get the whole picture.
For example, you can have one simple loop with linear complexity, but later in that same program you can have a triple loop that has cubic complexity, so your program will have cubic complexity. Function order of growth comes into play right here.
Let's look at what are possibilities for time complexity of an algorithm, you can see order of growth I mentioned above:
Constant time has an order of growth 1, for example: a = b + c.
Logarithmic time has an order of growth log N. It usually occurs when you're dividing something in half (binary search, trees, and even loops), or multiplying something in the same way.
Linear. The order of growth is N, for example
int p = 0;
for (int i = 1; i < N; i++)
p = p + 2;
Linearithmic. The order of growth is N log N. It usually occurs in divide-and-conquer algorithms.
Cubic. The order of growth is N^3. A classic example is a triple loop where you check all triplets:
int x = 0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            x = x + 2;
Exponential. The order of growth is 2^N. It usually occurs when you do exhaustive search, for example, checking all subsets of some set.
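As a quick sketch of that exhaustive-search case: enumerating all subsets of an N-element set takes 2^N steps, because each element is either in or out.

// Enumerate all subsets of the input array by counting from 0 to 2^N - 1
// and using the bits of the counter as membership flags: 2^N iterations.
static void printAllSubsets(int[] set) {
    int n = set.length;                       // assumes n < 31 so (1 << n) fits in an int
    for (int mask = 0; mask < (1 << n); mask++) {
        StringBuilder subset = new StringBuilder("{ ");
        for (int i = 0; i < n; i++) {
            if ((mask & (1 << i)) != 0) {     // bit i set: element i is in this subset
                subset.append(set[i]).append(' ');
            }
        }
        System.out.println(subset.append('}'));
    }
}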
Loosely speaking, time complexity is a way of summarising how the number of operations or run-time of an algorithm grows as the input size increases.
Like most things in life, a cocktail party can help us understand.
O(N)
When you arrive at the party, you have to shake everyone's hand (do an operation on every item). As the number of attendees N increases, the time/work it will take you to shake everyone's hand increases as O(N).
Why O(N) and not cN?
There's variation in the amount of time it takes to shake hands with people. You could average this out and capture it in a constant c. But the fundamental operation here --- shaking hands with everyone --- would always be proportional to N, no matter what c was. When debating whether we should go to a cocktail party, we're often more interested in the fact that we'll have to meet everyone than in the minute details of what those meetings look like.
O(N^2)
The host of the cocktail party wants you to play a silly game where everyone meets everyone else. Therefore, you must meet N-1 other people and, because the next person has already met you, they must meet N-2 people, and so on. The sum of this series is N^2/2 - N/2. As the number of attendees grows, the N^2 term gets big fast, so we just drop everything else.
O(N^3)
You have to meet everyone else and, during each meeting, you must talk about everyone else in the room.
O(1)
The host wants to announce something. They ding a wineglass and speak loudly. Everyone hears them. It turns out it doesn't matter how many attendees there are, this operation always takes the same amount of time.
O(log N)
The host has laid everyone out at the table in alphabetical order. Where is Dan? You reason that he must be somewhere between Adam and Mandy (certainly not between Mandy and Zach!). Given that, is he between George and Mandy? No. He must be between Adam and Fred, and between Cindy and Fred. And so on... we can efficiently locate Dan by looking at half the set and then half of that set. Ultimately, we look at O(log_2 N) individuals.
O(N log N)
You could find where to sit down at the table using the algorithm above. If a large number of people came to the table, one at a time, and all did this, that would take O(N log N) time. This turns out to be how long it takes to sort any collection of items when they must be compared.
Best/Worst Case
You arrive at the party and need to find Inigo - how long will it take? It depends on when you arrive. If everyone is milling around you've hit the worst-case: it will take O(N) time. However, if everyone is sitting down at the table, it will take only O(log N) time. Or maybe you can leverage the host's wineglass-shouting power and it will take only O(1) time.
Assuming the host is unavailable, we can say that the Inigo-finding algorithm has a lower-bound of O(log N) and an upper-bound of O(N), depending on the state of the party when you arrive.
Space & Communication
The same ideas can be applied to understanding how algorithms use space or communication.
Knuth has written a nice paper about the former entitled "The Complexity of Songs".
Theorem 2: There exist arbitrarily long songs of complexity O(1).
PROOF: (due to Casey and the Sunshine Band). Consider the songs Sk defined by (15), but with
V_k = 'That's the way,' U 'I like it, ' U
U = 'uh huh,' 'uh huh'
for all k.
For the mathematically-minded people: The master theorem is another useful thing to know when studying complexity.
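For reference, a commonly quoted simplified form of it: for divide-and-conquer recurrences T(n) = a·T(n/b) + f(n) with a ≥ 1 and b > 1, the master theorem gives

T(n) = \begin{cases}
\Theta\big(n^{\log_b a}\big) & \text{if } f(n) = O\big(n^{\log_b a - \varepsilon}\big) \text{ for some } \varepsilon > 0 \\
\Theta\big(n^{\log_b a} \log n\big) & \text{if } f(n) = \Theta\big(n^{\log_b a}\big) \\
\Theta\big(f(n)\big) & \text{if } f(n) = \Omega\big(n^{\log_b a + \varepsilon}\big) \text{ and } a\,f(n/b) \le c\,f(n) \text{ for some } c < 1
\end{cases}

Merge sort, for example, satisfies T(n) = 2T(n/2) + Θ(n), which is the middle case and yields the Θ(n log n) running time discussed above.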
O(N) is big O notation used for writing the time complexity of an algorithm. When you add up the number of executions in an algorithm, you'll get an expression such as 2N + 2. In this expression, N is the dominating term (the term having the largest effect on the expression when its value increases or decreases). O(N) is then the time complexity, N being the dominating term.
Example
for i = 1 to n
    j = 0
    while (j <= n)
        j = j + 1
Here the total number of executions for the inner loop is n + 1 and the total number of executions for the outer loop is n(n+1)/2, so the total number of executions for the whole algorithm is n + 1 + n(n+1)/2 = (n^2 + 3n + 2)/2.
Here n^2 is the dominating term, so the time complexity of this algorithm is O(n^2).
Other answers concentrate on the big-O-notation and practical examples. I want to answer the question by emphasizing the theoretical view. The explanation below is necessarily lacking in details; an excellent source to learn computational complexity theory is Introduction to the Theory of Computation by Michael Sipser.
Turing Machines
The most widespread model for investigating any question about computation is a Turing machine. A Turing machine has a one-dimensional tape consisting of symbols, which is used as its memory. It has a tapehead, which is used to write to and read from the tape. It has a transition table determining the machine's behaviour; this is a fixed hardware component that is decided when the machine is created. A Turing machine works in discrete time steps, doing the following:
It reads the symbol under the tapehead.
Depending on the symbol and its internal state, which can only take finitely many values, it reads three values s, σ, and X from its transition table, where s is an internal state, σ is a symbol, and X is either Right or Left.
It changes its internal state to s.
It changes the symbol it has read to σ.
It moves the tapehead one step according to the direction in X.
Turing machines are powerful models of computation. They can do everything that your digital computer can do. They were introduced before the advent of digital modern computers by the father of theoretical computer science and mathematician: Alan Turing.
Time Complexity
It is hard to define the time complexity of a single problem like "Does white have a winning strategy in chess?" because there is a machine which runs for a single step and gives the correct answer: either the machine which directly says 'No' or the one which directly says 'Yes'. To make this work, we instead define the time complexity of a family of problems L, each of which has a size, usually the length of the problem description. Then we take a Turing machine M which correctly solves every problem in that family. When M is given a problem of this family of size n, it solves it in finitely many steps. Let f(n) be the longest possible time it takes M to solve problems of size n. Then we say that the time complexity of L is O(f(n)), which means that there is a Turing machine which will solve an instance of it of size n in at most C·f(n) time, where C is a constant independent of n.
Isn't it dependent on the machines? Can digital computers do it faster?
Yes! Some problems can be solved faster by other models of computation; for example, two-tape Turing machines solve some problems faster than those with a single tape. This is why theoreticians prefer to use robust complexity classes such as NL, P, NP, PSPACE, EXPTIME, etc. For example, P is the class of decision problems whose time complexity is O(p(n)), where p is a polynomial. The class P does not change even if you add ten thousand tapes to your Turing machine, or use other types of theoretical models such as random-access machines.
A Difference in Theory and Practice
It is usually assumed that the time complexity of integer addition is O(1). This assumption makes sense in practice because computers use a fixed number of bits to store numbers for many applications. There is no reason to assume such a thing in theory, so the time complexity of addition is O(k), where k is the number of bits needed to express the integer.
Finding The Time Complexity of a Class of Problems
The straightforward way to show that the time complexity of a problem is O(f(n)) is to construct a Turing machine which solves it in O(f(n)) time. Creating Turing machines for complex problems is not trivial; one needs some familiarity with them. A transition table for a Turing machine is rarely given in full; instead the machine is described at a high level. It becomes easier to see how long a machine will take to halt as one becomes familiar with them.
Showing that a problem is not O(f(n)) time complexity is another story... Even though there are some results like the time hierarchy theorem, there are many open problems here. For example, whether problems in NP are in P, i.e. solvable in polynomial time, is one of the seven Millennium Prize Problems in mathematics, whose solver will be awarded 1 million dollars.
After looking at the Fork/Join tutorial, I created a class for computing large factorials:
import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

public class ForkFactorial extends RecursiveTask<BigInteger> {
    final int end;
    final int start;
    private static final int THRESHOLD = 10;

    public ForkFactorial(int n) {
        this(1, n + 1);
    }

    private ForkFactorial(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected BigInteger compute() {
        if (end - start < THRESHOLD) {
            return computeDirectly();
        } else {
            int mid = (start + end) / 2;
            ForkFactorial lower = new ForkFactorial(start, mid);
            lower.fork();
            ForkFactorial upper = new ForkFactorial(mid, end);
            BigInteger upperVal = upper.compute();
            return lower.join().multiply(upperVal);
        }
    }

    private BigInteger computeDirectly() {
        BigInteger val = BigInteger.ONE;
        BigInteger mult = BigInteger.valueOf(start);
        for (int iter = start; iter < end; iter++, mult = mult.add(BigInteger.ONE)) {
            val = val.multiply(mult);
        }
        return val;
    }
}
The question I have is how to determine the threshold for which I subdivide the task? I found a page on fork/join parallelism which states:
One of the main things to consider when implementing an algorithm
using fork/join parallelism is choosing the threshold which determines
whether a task will execute a sequential computation rather than
forking parallel sub-tasks.
If the threshold is too large, then the program might not create
enough tasks to fully take advantage of the available
processors/cores.
If the threshold is too small, then the overhead of task creation and
management could become significant.
In general, some experimentation will be necessary to find an
appropriate threshold value.
So what experimentation would I need to do in order to determine the threshold?
Pigeonhole estimation: set an arbitrary threshold and measure the computation time. Then increase or decrease the threshold and see whether the computation time improves, until lowering the threshold brings no further improvement.
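As a concrete sketch of that kind of experiment, here is a crude timing harness around the ForkFactorial class from the question (the class name ThresholdExperiment and the problem size are made up for illustration; a serious benchmark should warm up the JIT and average several runs, which this only gestures at):

import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;

// Crude timing harness: run it once per candidate THRESHOLD value
// (editing the constant in ForkFactorial between runs) and compare the results.
public class ThresholdExperiment {
    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        int n = 50_000;                          // problem size: large enough to be measurable
        pool.invoke(new ForkFactorial(n));       // warm-up run so JIT compilation doesn't dominate
        long start = System.nanoTime();
        BigInteger result = pool.invoke(new ForkFactorial(n));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + "! has " + result.bitLength() + " bits, computed in " + elapsedMs + " ms");
    }
}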
Choosing a threshold depends on many factors:
The actual computation should take a reasonable amount of time. If you're summing an array and the array is small then it is probably better to do it sequentially. If the array length is 16M, then splitting it into smaller pieces and parallel processing should be worthwhile. Try it and see.
The number of processors should be sufficient. Doug Lea once documented his framework as needing 16+ processors to make it worthwhile. Even splitting an array in half and running on two threads will produce about a 1.3% gain in throughput. Now you have to consider the split/join overhead. Try running on many configurations to see what you get.
The number of concurrent requests should be small. If you have N processors and 8N concurrent requests, then using one thread per request is often more efficient for throughput. The logic here is simple: if you have N processors available and you split your work accordingly, but there are hundreds of other tasks ahead of you, then what's the point of splitting?
This is what experimenting means.
Unfortunately, this framework doesn't come with any means of accountability. There is no way to see the load on each thread, the high-water mark in the deques, the total requests processed, the errors encountered, etc.
Good luck.
Note that arithmetic with BigInteger is not constant time; it is proportional to the length of the inputs. The actual complexity of each operation is not readily at hand, though the futureboy implementation referenced in that Q/A section does document what it (expects) to achieve under different circumstances.
Getting the work estimating function correct is important both when it comes to deciding how to partition the problem into smaller chunks and for determining whether or not a particular chunk is worth dividing again.
When using experimentation to determine your threshold, you need to take care that you do not just benchmark one corner of the problem space.
As I understand, this experiment is an optimization, so it should be applied only when there is a need.
You could experiment with different split strategies - i.e. one can split into two equal parts, or by estimated multiplication cost, which depends on the integers' decimal length.
For each of the strategies you could test as many threshold values as possible to get the full picture of how each strategy behaves. If you are limited in CPU resources, you could test, say, every 5th or 10th value. So, from my experience, the first important thing here is to get the full picture of how your algorithm performs.
How can I store a 100K X 100K matrix in Java?
I can't do that with a normal array declaration, as it throws a java.lang.OutOfMemoryError.
The Colt library has a sparse matrix implementation for Java.
You could alternatively use Berkeley DB as your storage engine.
Now if your machine has enough actual RAM (at least 9 gigabytes free), you can increase the heap size in the Java command-line.
If the vast majority of entries in your matrix will be zero (or even some other constant value) a sparse matrix will be suitable. Otherwise it might be possible to rewrite your algorithm so that the whole matrix doesn't exist simultaneously. You could produce and consume one row at a time, for example.
Sounds like you need a sparse matrix. Others have already suggested good 3rd-party implementations that may suit your needs...
Depending on your applications, you could get away without a third-party matrix library by just using a Map as a backing-store for your matrix data. Kind of...
import java.util.Map;
import java.util.TreeMap;

public class SparseMatrix<T> {
    private T defaultValue;
    private int m;
    private int n;
    // Keys are (long) row * n + col; a long is needed because for a
    // 100K x 100K matrix the flattened index exceeds Integer.MAX_VALUE.
    private Map<Long, T> data = new TreeMap<Long, T>();

    // create a new matrix with m rows and n columns
    public SparseMatrix(int m, int n, T defaultValue) {
        this.m = m;
        this.n = n;
        this.defaultValue = defaultValue;
    }

    // set value at [i,j] (row, col)
    public void setValueAt(int i, int j, T value) {
        if (i >= m || j >= n || i < 0 || j < 0)
            throw new IllegalArgumentException(
                    "index (" + i + ", " + j + ") out of bounds");
        data.put((long) i * n + j, value);
    }

    // retrieve value at [i,j] (row, col)
    public T getValueAt(int i, int j) {
        if (i >= m || j >= n || i < 0 || j < 0)
            throw new IllegalArgumentException(
                    "index (" + i + ", " + j + ") out of bounds");
        T value = data.get((long) i * n + j);
        return value != null ? value : defaultValue;
    }
}
A simple test case illustrating the SparseMatrix's use would be:
public class SparseMatrixTest extends TestCase {
    public void testMatrix() {
        SparseMatrix<Float> matrix =
                new SparseMatrix<Float>(100000, 100000, 0.0F);
        matrix.setValueAt(1000, 1001, 42.0F);
        assertTrue(matrix.getValueAt(1000, 1001) == 42.0);
        assertTrue(matrix.getValueAt(1001, 1000) == 0.0);
    }
}
This is not the most efficient way of doing it because every non-default entry in the matrix is stored as an Object. Depending on the number of actual values you are expecting, the simplicity of this approach might trump integrating a 3rd-party solution (and possibly dealing with its License - again, depending on your situation).
Adding matrix operations like multiplication to the above SparseMatrix implementation should be straightforward (and is left as an exercise for the reader ;-)
100,000 x 100,000 = 10,000,000,000 (10 billion) entries. Even if you're storing single byte entries, that's still in the vicinity of 10 GB - does your machine even have that much physical memory, let alone have a will to allocate that much to a single process?
Chances are you're going to need to look into some kind of a way to only keep part of the matrix in memory at any given time, and the rest buffered on disk.
There are a number possible solutions depending on how much memory you have, how sparse the array actually is, and what the access patterns are going to be.
If the calculation of 100K * 100K * 8 is less than the amount of physical memory on your machine available to the JVM, a simple non-sparse array is a viable solution.
If the array is sparse, with (say) 75% or more of the elements being zero, then you can save space by using a sparse array library. Various alternatives have been suggested, but in all cases, you still need to work out if this is going to give you enough savings. Figure out how many non-zero elements there are going to be, multiply that by 8 (to give you doubles) and (say) 4 to account for the overheads of the sparse array. If that is less than the amount of physical memory that you can make available to the JVM, then sparse arrays are a viable solution.
If sparse and non-sparse arrays (in memory) won't work, things will get more complicated, and the viability of any solution will depend on the access patterns for the array data.
One approach is to represent the array as a file that is mapped into memory in the form of a MappedByteBuffer. Assuming that you don't have enough physical memory to store the entire file in memory, you are going to be hitting the virtual memory system hard. So it is best if your algorithm only needs to operate on contiguous sections of the array at any time. Otherwise, you'll probably die from swapping.
A second approach is a variation of the first. Map the array/file a section at a time, and when you are done, unmap and move to the next section. This only works if the algorithm works on the array in sections.
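A minimal sketch of that second approach, assuming (hypothetically) the matrix of doubles has already been written row-major to a file called matrix.dat; note that a single MappedByteBuffer is limited to 2 GB, which is another reason the file has to be mapped in sections:

import java.io.RandomAccessFile;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Walks a 100K x 100K matrix of doubles stored row-major in a file,
// mapping one block of rows at a time so only ~800 MB is mapped at once.
// (Java has no explicit unmap; each mapping is released once the buffer
// is garbage collected.)
public class MappedMatrix {
    static final int N = 100_000;             // rows and columns
    static final int ROWS_PER_BLOCK = 1_000;  // 1_000 * 100_000 * 8 bytes = 800 MB per mapping

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("matrix.dat", "r");
             FileChannel channel = raf.getChannel()) {
            for (int firstRow = 0; firstRow < N; firstRow += ROWS_PER_BLOCK) {
                long offset = (long) firstRow * N * Double.BYTES;
                long length = (long) ROWS_PER_BLOCK * N * Double.BYTES;
                MappedByteBuffer block = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                DoubleBuffer doubles = block.asDoubleBuffer();
                double sum = 0;               // example work: sum this block of rows
                while (doubles.hasRemaining()) {
                    sum += doubles.get();
                }
                System.out.println("rows " + firstRow + ".." + (firstRow + ROWS_PER_BLOCK - 1) + ": sum = " + sum);
            }
        }
    }
}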
A third approach is to represent the array using a light-weight database like BDB. This will be slower than any in-memory solution because reading array elements will translate into disc accesses. But if you get it wrong it won't kill the system like the memory mapped approach will. (And if you do this on Linux/Unix, the system's disc block cache may speed things up, depending on your algorithm's array access patterns)
A fourth approach is to use a distributed memory cache. This replaces disc i/o with network i/o, and it is hard to say whether this is a good or bad thing.
A fifth approach is to analyze your algorithm and see if it is amenable to implementing as a distributed algorithm; e.g. with sections of the array and corresponding parts of the algorithm on different machines.
You can upgrade to this machine:
http://www.azulsystems.com/products/compute_appliance.htm
864 processor cores and 768 GB of memory, only costs a single family house somewhere.
Well, I'd suggest that you increase the memory in your JVM, but you're going to need a lot of memory, as you're talking about 10 billion items. It's (barely) possible with lots of memory or a clustered JVM, but that's probably the wrong answer.
You're getting the OutOfMemoryError because when you declare an array such as int[1000], the memory is allocated immediately (and doubles take up even more space than ints, so an int representation will also save you space). Maybe you can substitute a more efficient implementation of your array (if you have many empty entries, look up "sparse matrix" representations).
You could store pieces in an outside system, like memcached or memory-mapped buffers.
There are lots of good suggestions here, maybe if you posted a more detailed description of the problem you're trying to solve people could be more specific.
You should try an "external" package to handle matrices, I never did that though, maybe something like jama.
Unless you have 100K x 100K x 8 ~ 80GB of memory, you cannot create this matrix in memory. You can create this matrix on disk and access it using memory mapping. However, using this approach will be very slow.
What are you trying to do? You may find that representing your data in a different way will be much more efficient.