Complexity does not match the actual growth of the runtime? - java

I've been working on a homework sheet for a while now, and there is a massive discrepancy between what I think the asymptotic complexity is and what the runtime results suggest.
Below is a table for the runtime for the program.
| Input Size | Runtime (seconds) |
|------------|-------------------|
| 10000      | 0.040533803       |
| 20000      | 0.154712122       |
| 30000      | 0.330814060       |
| 40000      | 0.603440983       |
| 50000      | 0.969272780       |
| 60000      | 1.448454467       |
string = "";
newLetter = "a";
for (int i = 500; i < n; i++) {
string = string + newLetter;
}
return string
Why would there be a discrepancy between the complexity of the algorithm and the growth of the runtime apart from me being wrong?
From the runtime results, it looks like the program has a time complexity of O(n²). Doubling the input size seems to increase the runtime by a factor of 4, which would suggest a quadratic function?
But looking at the program itself, I'm 99.9% certain that the time complexity is actually O(n).
Is it possible that there's an extraneous reason for this discrepancy? Is there any possibility that the runtime results are indeed linear?
My best guess for this discrepancy is that the for loop makes each successive iteration slower (which, looking at the program, makes sense, as I think Java would have to iterate over every additional newLetter given), but that would still be linear, no? It's not a nested for loop.

The code you've written actually does run in time Θ(n²). The reason has to do with this part of the code:
string = string + newLetter;
Here, your intent is to say "please append a new character to the end of this string." However, what Java actually does is the following:
1. Evaluate the expression string + newLetter. This makes a brand-new string formed by copying the full contents of string, then appending newLetter to the end of that new string.
2. Assign the result to string.
The issue here is that step (1) takes longer and longer to execute the longer the string is, since all the characters have to be copied over. In particular, if string has length k, then step (1) takes time Θ(k) because k characters must be copied. Since the length of string matches the current loop iteration index, this means the work done is
Θ(1 + 2 + 3 + ... + n)
= Θ(n(n+1) / 2)
= Θ(n²),
which is why you're seeing the plot you're seeing.
You can speed this up by using StringBuilder rather than String to assemble the larger string. It doesn't make all the intermediary copies as it goes, and instead maintains an internal array that can grow as needed.
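As a rough sketch of that fix (mirroring the loop from the question; the surrounding method is just for illustration), the StringBuilder version runs in Θ(n):
static String buildString(int n) {
    // StringBuilder appends into a resizable internal buffer,
    // so no full copy of the string is made on every iteration.
    StringBuilder sb = new StringBuilder();
    for (int i = 500; i < n; i++) {
        sb.append("a");
    }
    return sb.toString();
}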

Related

Big O. not sure how the update on a for loop affects it

I'm a bit confused as to how the update on a for loop affects the Big O of code like this:
public static void bigO(int n) {
    int sum = 0; // declared here so the snippet compiles; in the original it may be a field
    for (int i = n; i > 1; i = i / 2) {
        for (int j = n; j > 1; j = j / 2) {
            sum++;
        }
    }
}
I'm not sure how the update (j=j/2) would affect it.
The two for loops are independent of each other, so the total complexity should roughly be the product of the complexities of the two loops. Each loop is O(lgN), where lg means "log base 2." So, multiplying this together yields O(lgN*lgN) for the overall complexity.
To understand better where the O(lg N) comes from, consider an input value of n = 16. The outer for loop in i would then have these iterations:
i | iteration #
16 | 1
8 | 2
4 | 3
2 | 4
lg(16) in fact equals 4, because 2^4 = 16, so this confirms the complexity which we expect. You may also test other n values to convince yourself of this. The inner loop in j behaves the same way, and is independent of the outer loop in i.
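If you want to verify this empirically, a small sketch like the following (class and variable names are just placeholders) counts the iterations of one halving loop and compares them against log base 2 of n:
public class HalvingLoopDemo {
    public static void main(String[] args) {
        for (int n : new int[] {16, 1024, 1_000_000}) {
            int iterations = 0;
            for (int i = n; i > 1; i = i / 2) {
                iterations++; // same loop shape as the loops in the question
            }
            // iterations should be roughly log2(n) = ln(n) / ln(2)
            System.out.println("n=" + n + "  iterations=" + iterations
                    + "  log2(n)=" + (Math.log(n) / Math.log(2)));
        }
    }
}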

Determining Time and Space Complexity of program

So I had a coding challenge for an internship and part of it was to determine the space and time complexity of my program. The program was roughly as follows.
while(A){
    int[][] grid;
    // additional variables
    while(B){ // for loop involves iterating through grid
        // additional variables
        for(...)
            for(....)
    }
    for(...) // for loop involves iterating through grid
        for(....)
}
So what I said was that the program overall has a time complexity of O(AN^2 + BN^2), and therefore concluded that it has an amortized time of O(N^2).
As for the space complexity, was I supposed to sum the space used by all the variables? Assuming every variable is an int, and there are 3 in loop A and two in loop B, would the space complexity be (A*24 + B*16)?
To avoid mistakes, I tend to use an approach where you make a side note for each line indicating how many times it gets executed (to be more accurate, you can include both the best and the worst case).
Applied to your example, the idea may look as follows:
num_exec
        |  while(A){
A       |      int[][] grid;
A       |      // additional variables
        |
        |      while(B){ // for loop involves iterating through grid
AB      |          // additional variables
ABN^2   |          for(...)
        |              for(....)
        |      }
        |
AN^2    |      for(...) // for loop involves iterating through grid
        |          for(....)
        |  }
To estimate your code's time complexity, a simple summation of those side-noted numbers does the trick (as you may have done yourself, though you obtained slightly different results than mine): O(A + A + AB + ABN^2 + AN^2) = O(ABN^2).
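If you want to sanity-check those side notes empirically, here is a minimal, hedged sketch (the bounds A, B and N are placeholders I picked, not values from your challenge) that counts how often each block of loops runs:
public class ExecutionCountDemo {
    public static void main(String[] args) {
        int A = 3, B = 4, N = 100; // placeholder loop bounds, just for illustration
        long innerCount = 0;       // statements inside while(B)'s double for loop
        long outerCount = 0;       // statements inside the trailing double for loop
        for (int a = 0; a < A; a++) {
            for (int b = 0; b < B; b++) {
                for (int i = 0; i < N; i++) {
                    for (int j = 0; j < N; j++) {
                        innerCount++; // executed A*B*N^2 times
                    }
                }
            }
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < N; j++) {
                    outerCount++; // executed A*N^2 times
                }
            }
        }
        System.out.println(innerCount + " == " + ((long) A * B * N * N)); // 120000 == 120000
        System.out.println(outerCount + " == " + ((long) A * N * N));     // 30000 == 30000
    }
}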
As for your memory complexity, your intuition is right for an 8-byte integer. However, if we are talking about primitive datatypes, you can simply think of them as constants. Thus, you should rather be concerned about complex datatypes, e.g. an array, since it aggregates multiple primitives. To sum up, you take into account the data sizes of the elements designated to preserve your data.
Consequently, applied to the example:
memory
        |  while(A){
ANk     |      int[][] grid;
A3k     |      // additional variables
        |
        |      while(B){ // for loop involves iterating through grid
AB2k    |          // additional variables
        |          for(...)
        |              for(....)
        |      }
        |
        |      for(...) // for loop involves iterating through grid
        |          for(....)
        |  }
Supposing a grid size of N, a primitive datatype of size k, and the total number of additional variables to be 3 in the outer loop followed by 2 in the inner one, the total space complexity adds up to:
O(ANk + 3Ak + 2ABk) = O(Ak(N + B))
Note that, for the complexities given above to hold, A and B both have to be significantly less than N and independent of it.
You may be interested in the further explanation of the matter provided at this link. Hope that helps (even if it is just approximate because of the coarse details you've provided), and best of luck!

JNA: How exactly do function and parameter mapping take place from Java to VC++?

In Java I am using JNA to load a VC++ DLL file and call functions within it. In one function I need to send 6 parameters. In the VC++ function definition I am getting correct values for the first 3 parameters, but the last 3 have the value '0'. The 2nd parameter is a byte array; when I send 1024 bytes, the 5th bool parameter comes through as false, but when I pass 10 bytes it is taken as true. My function prototype: int deviceupload(Pointer p, byte[] data, long startaddress, long datalength, boolean rt, Pointer xyz);
So will the mapping change depending on the size of the parameters?
Or is the JNA stack so small that it can't hold 6 parameters? But according to the JNA documentation, the MAX_NARGS value of the Function class is 256, so I think 6 parameters is not an issue. Even though the 3rd and 4th parameters have the same data type, in the VC++ function definition startaddress is correctly received but the received value of datalength is 0.
So, any idea why it is behaving so weirdly?
Actually, my problem got solved. The VC++ function was expecting a long, which is 4 bytes, but in Java I was passing a long, which is 8 bytes. According to the JNA documentation, the VC++ long equivalent in Java is NativeLong, but we can't subtract two NativeLong variables, which I am supposed to do, so I was passing Long. Later I passed int (4-byte) arguments in Java for the long (4-byte) parameters in VC++, and then it worked properly. But I didn't understand why the other parameters were affected by the previous parameter's data type mismatch: in my question, the datalength value was getting affected because of the startaddress variable's datatype mismatch. So my question is still unanswered; can anyone help me understand it? Unfortunately, only my error was cleared.
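For reference, a hedged sketch of the kind of mapping that ended up working (the library name "device" and the interface name are placeholders, not the actual DLL from the question; this assumes JNA 5.x's Native.load):
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public interface DeviceLibrary extends Library {
    // "device" is a placeholder for the actual DLL name.
    DeviceLibrary INSTANCE = Native.load("device", DeviceLibrary.class);

    // On Windows, VC++ 'long' is 4 bytes, so it maps to Java 'int' (or NativeLong),
    // not to Java 'long', which is 8 bytes and would shift the later arguments.
    int deviceupload(Pointer p, byte[] data, int startaddress, int datalength,
                     boolean rt, Pointer xyz);
}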
Arguments are placed in memory "slots" on the stack. A small argument might take one slot, or share a slot with another argument, while a large argument might take two slots. You can see how if you mistake a small argument for a large one, it will shift all the subsequent arguments into incorrect positions.
void my_func(int32 p1, int64 p2)
|               |
+---------------+
| P2 (64-bit)   |
+---------------+
| P2            |   (P2 takes up two slots)
+---------------+
| P1 (32-bit)   |
+---------------+  <-- Top of stack
Now if you mistakenly use a 64-bit value for P1, you get this instead:
|               |
+---------------+
| P2 (64-bit)   |
+---------------+
| P2            |
+---------------+
| P1 (64-bit)   |
+---------------+
| P1            |
+---------------+  <-- Top of stack
The callee (the function you called) has no idea that this shift has taken place, and therefore attempts to read the arguments from their expected positions. For small arguments it might get a piece of a larger one, and for larger arguments it might get pieces of two or more other arguments.
What the callee actually sees is this:
|               |
+---------------+
| (P2)          |
+---------------+
| (P1)          |
+---------------+
| (P1)          |
+---------------+  <-- Top of stack
You can see that the value read for P1 is actually only half of the larger value, while the value read for P2 comprises the other half of P1 and half of the value passed in as P2.
(This explanation is somewhat simplified, but generally indicates how the stack works)

Combinations Array Java

Given an array of any dimension (for instance [1 2 3]), I need a function that gives all combinations, like
1 |
1 2 |
1 2 3 |
1 3 |
2 |
2 1 3 |
2 3 |
...
Since I'm guessing this is homework, I'll try to refrain from giving a complete answer.
Suppose you already had all combinations (or permutations if that is what you are looking for) of an array of size n-1. If you had that, you could use those combinations/permutations as a basis for forming the new combinations/permutations by adding the nth element to them in the appropriate way. That is the basis for what computer scientists call recursion (and mathematicians like to call a very similar idea induction).
So you could write a method that would handle the n case, assuming the n-1 case had been handled, and you can put in a check to handle the base case as well.
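Purely to make that structure concrete, here is a sketch of the general recursive pattern for building all subsets (not necessarily the exact output format your assignment wants; the class and method names are placeholders):
import java.util.ArrayList;
import java.util.List;

public class SubsetSketch {
    // Combinations of the first n elements of arr, built from the
    // combinations of the first n-1 elements, as described above.
    static List<List<Integer>> combos(int[] arr, int n) {
        List<List<Integer>> result = new ArrayList<>();
        if (n == 0) {
            result.add(new ArrayList<>()); // base case: the empty combination
            return result;
        }
        for (List<Integer> smaller : combos(arr, n - 1)) {
            result.add(smaller);                 // keep it without the nth element
            List<Integer> withNth = new ArrayList<>(smaller);
            withNth.add(arr[n - 1]);             // and also with the nth element added
            result.add(withNth);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(combos(new int[] {1, 2, 3}, 3));
    }
}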

Efficient Matching Algorithm for Set Based Triplets

I am looking for an efficient way to solve the following problem.
List 1 is a list of records that are identified by a primitive triplet:
X | Y | Z
List 2 is a list of records that are identified by three sets: one of Xs, one of Ys, and one of Zs. The Xs, Ys, and Zs are of the same 'type' as those in list 1, so they are directly comparable with one another.
Set(X) | Set(Y) | Set(Z)
For an item in list 1 I need to find all the items in list 2 where the X, Y, Z from list 1 all occur in their corresponding sets in list 2. This is best demonstrated by an example:
List 1:
X1, Y1, Z1
List 2:
(X1, X2) | (Y1) | (Z1, Z3)
(X1) | (Y1, Y2) | (Z1, Z2, Z3)
(X3) | (Y1, Y3) | (Z2, Z3)
In the above, the item in list 1 would match the first two items in list 2. The third item would not be matched as X1 does not occur in the X set, and Z1 does not occur in the Z set.
I have written a functionally correct version of the algorithm but am concerned about performance on larger data sets. Both lists are very large so iterating over list 1 and then performing an iteration over list 2 per item is going to be very inefficient.
I tried to build an index by de-normalizing each item in list 2 into a map, but the number of index entries per item is proportional to the sizes of the item's sets. As such, this uses a very high amount of memory and also requires significant resources to build.
Can anyone suggest an optimal way of solving this? I'm happy to consider both memory-optimal and CPU-optimal solutions, but striking a balance would be nice!
There are going to be a lot of ways to approach this. Which is right depends on the data and how much memory is available.
One simple technique is to build a table from list2, to accelerate the queries coming from list1.
from collections import defaultdict

# Build "hits". hits[0] is a table of, for each x,
# which items in list2 contain it. Likewise hits[1]
# is for y and hits[2] is for z.
hits = [defaultdict(set) for i in range(3)]
for rowid, row in enumerate(list2):
    for i in range(3):
        for v in row[i]:
            hits[i][v].add(rowid)

# For each row, query the database to find which
# items in list2 contain all three values.
for x, y, z in list1:
    print(hits[0][x].intersection(hits[1][y], hits[2][z]))
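Since the question is tagged Java, here is a hedged sketch of the same indexing idea in Java (the Row record and the String element type are placeholders for your actual record types):
import java.util.*;

class TripletIndexDemo {
    // Placeholder row type for list2: three sets (String stands in for your X/Y/Z type).
    record Row(Set<String> xs, Set<String> ys, Set<String> zs) {}

    static void query(List<String[]> list1, List<Row> list2) {
        // One inverted index per position: value -> row ids in list2 whose set contains it.
        Map<String, Set<Integer>> xHits = new HashMap<>();
        Map<String, Set<Integer>> yHits = new HashMap<>();
        Map<String, Set<Integer>> zHits = new HashMap<>();
        for (int rowId = 0; rowId < list2.size(); rowId++) {
            Row row = list2.get(rowId);
            for (String v : row.xs()) xHits.computeIfAbsent(v, k -> new HashSet<>()).add(rowId);
            for (String v : row.ys()) yHits.computeIfAbsent(v, k -> new HashSet<>()).add(rowId);
            for (String v : row.zs()) zHits.computeIfAbsent(v, k -> new HashSet<>()).add(rowId);
        }
        // For each (x, y, z) in list1, intersect the three candidate sets.
        for (String[] t : list1) {
            Set<Integer> matches = new HashSet<>(xHits.getOrDefault(t[0], Set.of()));
            matches.retainAll(yHits.getOrDefault(t[1], Set.of()));
            matches.retainAll(zHits.getOrDefault(t[2], Set.of()));
            System.out.println(Arrays.toString(t) + " matches list2 rows " + matches);
        }
    }
}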
If the total size of the Sets is not too large you could try to model List 2 as bitfields. The structure will probably be quite fragmented though - maybe the structures referenced in the Wikipedia article on Bit arrays (Judy arrays, tries, Bloom filters) can help address the memory problems of your normalization approach.
You could build a tree out of List2; the first level of the tree is the first of (X1..Xn) that appears in set X. The second level is the values for the second item, plus a leaf node containing the set of lists which contain only X1. The next level contains the next possible value, and so on.
Root --+--X1--+--EOF--> List of pointers to list2 lines containing only "X1"
| |
| +--X2---+--EOF--> List of pointers to list2 lines containing only "X1,X2"
| | |
| | +--X3--+--etc--
| |
| +--X3---+--EOF--> "X1,X3"
|
+--X2--+--EOF--> "X2"
| |
| +--X3---+--EOF--> "X2,X3"
| | |
...
This is expensive in memory consumption (N^2 log K, I think? where N=values for X, K=lines in List2) but results in fast retrieval times. If the number of possible Xs is large then this approach will break down...
Obviously you could build this index for all 3 parts of the tuple, and then AND together the results from searching each tree.
There's a fairly efficient way to do this with a single pass over list2. You start by building an index of the items in list1.
from collections import defaultdict

# index is HashMap<X, HashMap<Y, HashMap<Z, Integer>>>
index = defaultdict(lambda: defaultdict(dict))
for rowid, (x, y, z) in enumerate(list1):
    index[x][y][z] = rowid

for rowid2, (xs, ys, zs) in enumerate(list2):
    # Collect, for each y, the z-maps of the index entries whose x occurs in xs.
    xhits = defaultdict(list)
    for x in xs:
        if x in index:
            for y, zmap in index[x].items():
                xhits[y].append(zmap)

    # Narrow down to the entries whose y also occurs in ys.
    yhits = defaultdict(list)
    for y in ys:
        if y in xhits:
            for zmap in xhits[y]:
                for z, rowid1 in zmap.items():
                    yhits[z].append(rowid1)

    # Finally report the entries whose z occurs in zs as well.
    for z in zs:
        if z in yhits:
            for rowid1 in yhits[z]:
                print("list1[%d] matches list2[%d]" % (rowid1, rowid2))
The extra bookkeeping here will probably make it slower than indexing list2. But since in your case list1 is typically much smaller than list2, this will use much less memory. If you're reading list2 from disk, with this algorithm you never need to keep any part of it in memory.
Memory access can be a big deal, so I can't say for sure which will be faster in practice. Have to measure. The worst-case time complexity in both cases, barring hash table malfunctions, is O(len(list1)*len(list2)).
How about using a HashSet (or HashSets) for List 2? This way you will only need to iterate over List 1.
If you use Guava, there is a high-level way to do this that is not necessarily optimal but doesn't do anything crazy:
List<SomeType> list1 = ...;
List<Set<SomeType>> candidateFromList2 = ...;
if (Sets.cartesianProduct(candidateFromList2).contains(list1)) { ... }
But it's not that hard to check this "longhand" either.
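For reference, the "longhand" check amounts to something like this sketch (the names follow the snippet above; the containment test is equivalent to the cartesianProduct check):
import java.util.List;
import java.util.Set;

class LonghandCheck {
    static <T> boolean matches(List<T> list1, List<Set<T>> candidateFromList2) {
        // The triplet matches if each element of list1 is contained in the
        // corresponding set of the candidate.
        if (list1.size() != candidateFromList2.size()) {
            return false;
        }
        for (int i = 0; i < list1.size(); i++) {
            if (!candidateFromList2.get(i).contains(list1.get(i))) {
                return false;
            }
        }
        return true;
    }
}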
