I've been working on a homework sheet for a while now, and there is a massive discrepancy between what I think the asymptotic complexity is and what the runtime results suggest.
Below is a table of the runtimes for the program.
| Input Size | Runtime (Seconds) |
|------------|-------------------|
| 10000      | 0.040533803       |
| 20000      | 0.154712122       |
| 30000      | 0.330814060       |
| 40000      | 0.603440983       |
| 50000      | 0.969272780       |
| 60000      | 1.448454467       |
String string = "";
String newLetter = "a";
for (int i = 500; i < n; i++) {
    string = string + newLetter;
}
return string;
Why would there be a discrepancy between the complexity of the algorithm and the growth of the runtime apart from me being wrong?
From the runtime results, it looks like the program has a time complexity of O(n²): doubling the input size increases the runtime by roughly a factor of 4, which would suggest a quadratic function.
But looking at the program itself, I'm 99.9% certain that the time complexity is actually O(n).
Is it possible that there's an extraneous reason for this discrepancy? Is there any possibility that the runtime results are indeed linear?
My best guess for this discrepancy is that each iteration of the for loop makes the next one slower (which, looking at the program, makes sense, as I think Java would have to iterate over every additional newLetter appended so far), but wouldn't that still be linear? It's not a nested for loop.
The code you've written actually does run in time Θ(n²). The reason has to do with this part of the code:
string = string + newLetter;
Here, your intent is to say "please append a new character to the end of this string." However, what Java actually does is the following:
1. Evaluate the expression `string + newLetter`. This makes a brand-new string formed by copying the full contents of `string`, then appending `newLetter` to the end of that new string.
2. Assign the result to `string`.
The issue here is that step (1) takes longer and longer to execute the longer the string is, since all the characters have to be copied over. In particular, if string has length k, then step (1) takes time Θ(k) because k characters must be copied. Since the length of string matches the current loop iteration index, this means the work done is
Θ(1 + 2 + 3 + ... + n)
= Θ(n(n+1) / 2)
= Θ(n²),
which is why you're seeing the quadratic growth in your runtime table.
You can speed this up by using StringBuilder rather than String to assemble the larger string. It doesn't make all the intermediary copies as it goes, and instead maintains an internal array that can grow as needed.
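A minimal sketch of the StringBuilder version, using the same loop bounds as the original (the class and method names are just for illustration):

```java
public class BuildString {
    // Each append() is amortized O(1), because StringBuilder grows an
    // internal array instead of copying the whole string every time,
    // so the loop as a whole is O(n) rather than O(n^2).
    public static String build(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 500; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }
}
```

Re-running the timing table with this version should show the runtime roughly doubling (not quadrupling) when the input size doubles.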
I'm a bit confused as to how the update step of a for loop affects the big-O complexity
on a code like this :
public static void bigO(int n) {
    int sum = 0;
    for (int i = n; i > 1; i = i / 2) {
        for (int j = n; j > 1; j = j / 2) {
            sum++;
        }
    }
}
I'm not sure how the update (j=j/2) would affect it.
The two for loops are independent of each other, so the total complexity is roughly the product of the complexities of the two loops. Each loop is O(lg N), where lg means "log base 2." Multiplying these together yields O(lg N * lg N) = O(lg² N) for the overall complexity.
To better understand where the O(lg N) comes from, consider an input value of n = 16. The outer for loop in i would then have these iterations:
| i  | iteration # |
|----|-------------|
| 16 | 1 |
| 8  | 2 |
| 4  | 3 |
| 2  | 4 |
lg(16) indeed equals 4, because 2^4 = 16, so this confirms the complexity we expect. You can also test other values of n to convince yourself of this. The inner loop in j behaves the same way, and is independent of the outer loop in i.
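You can also confirm the lg² behavior empirically by counting how many times the loop body runs (a small sketch; for n a power of two the count should be exactly lg(n)²):

```java
public class HalvingLoops {
    // Returns how many times sum++ executes: the outer loop runs
    // floor(lg n) times, and the inner loop runs floor(lg n) times
    // per outer iteration, so the total is about lg(n) * lg(n).
    public static int countIterations(int n) {
        int sum = 0;
        for (int i = n; i > 1; i = i / 2) {
            for (int j = n; j > 1; j = j / 2) {
                sum++;
            }
        }
        return sum;
    }
}
```

For n = 16 this returns 16 = 4², matching the table above.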
I'm looking for an algorithm (C/C++/Java, it doesn't matter) which will solve a problem that consists of finding the shortest path between 2 nodes (A and B) of a graph. The catch is that the path must visit certain other given nodes (cities). A city can be visited more than once. Example path: A-H-D-C-E-F-G-F-B (where A is the source, B is the destination, and F and G are cities which must be visited).
I see this as a variation of the Traveling Salesman Problem but I couldn't find or write a working algorithm based on my searches.
I was trying to find a solution starting from these topics but without any luck:
https://stackoverflow.com/questions/24856875/tsp-branch-and-bound-implementation-in-c and
Variation of TSP which visits multiple cities
An easy reduction of the problem to TSP would be:

1. For each pair (u, v) of nodes that "must be visited", find the distance d(u, v) between them. This can be done efficiently using the Floyd-Warshall algorithm to find all-pairs shortest paths.
2. Create a new graph G' consisting only of those nodes, with all edges present, using the distances calculated in (1).
3. Run a standard TSP algorithm to solve the problem on the reduced graph.
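The all-pairs step (1) could be sketched like this: a minimal Floyd-Warshall over an adjacency matrix, where INF marks a missing edge (class and method names are illustrative):

```java
import java.util.Arrays;

public class AllPairsShortestPaths {
    // Half of MAX_VALUE so that INF + INF does not overflow.
    static final int INF = Integer.MAX_VALUE / 2;

    // Floyd-Warshall: returns a matrix d where d[u][v] is the length
    // of the shortest path from u to v (INF if v is unreachable).
    // Runs in O(n^3) for n nodes.
    public static int[][] floydWarshall(int[][] w) {
        int n = w.length;
        int[][] d = new int[n][];
        for (int i = 0; i < n; i++) {
            d[i] = Arrays.copyOf(w[i], n);
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (d[i][k] + d[k][j] < d[i][j]) {
                        d[i][j] = d[i][k] + d[k][j];
                    }
                }
            }
        }
        return d;
    }
}
```

The resulting matrix, restricted to the must-visit nodes plus A and B, gives the edge weights of the reduced graph G'.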
I think that in addition to amit's answer, you'll want to increase the cost of the edges that have A or B as endpoints by a sufficient amount (the total cost of the graph + 1 would probably be sufficient) to ensure that you don't end up with a path that goes through A or B (instead of ending at A and B).
A--10--X--0--B
|      |     |
|      10    |
|      |     |
+---0--Y--0--+
The above case would result in a path from A to Y to B to X, unless you increase the cost of the A and B edges (by 21).
A--31--X--21--B
|      |      |
|      10     |
|      |      |
+--21--Y--21--+
Now it goes from A to X to Y to B.
Also make sure you remove any edges (A,B) (if they exist).
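A sketch of that weight adjustment over an edge list (the Edge record and method names are illustrative; bump would be the total cost of the graph + 1):

```java
import java.util.ArrayList;
import java.util.List;

public class EndpointPenalty {
    record Edge(String u, String v, int w) {}

    // Add `bump` to the weight of every edge incident to a or b, and
    // drop any direct (a, b) edge, so that a TSP tour over the reduced
    // graph can only use a and b as its endpoints, never pass through them.
    public static List<Edge> penalize(List<Edge> edges, String a, String b, int bump) {
        List<Edge> out = new ArrayList<>();
        for (Edge e : edges) {
            boolean touchesA = e.u().equals(a) || e.v().equals(a);
            boolean touchesB = e.u().equals(b) || e.v().equals(b);
            if (touchesA && touchesB) continue; // remove any (A, B) edge
            int w = e.w() + ((touchesA || touchesB) ? bump : 0);
            out.add(new Edge(e.u(), e.v(), w));
        }
        return out;
    }
}
```

Applied to the first diagram with bump = 21, this produces exactly the weights in the second diagram.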
Given an array of any dimension (for instance [1 2 3]), I need a function that gives all combinations, like:
1 |
1 2 |
1 2 3 |
1 3 |
2 |
2 1 3 |
2 3 |
...
Since I'm guessing this is homework, I'll try to refrain from giving a complete answer.
Suppose you already had all combinations (or permutations if that is what you are looking for) of an array of size n-1. If you had that, you could use those combinations/permutations as a basis for forming the new combinations/permutations by adding the nth element to them in the appropriate way. That is the basis for what computer scientists call recursion (and mathematicians like to call a very similar idea induction).
So you could write a method that handles the n case, assuming the n-1 case has been handled, and add a check to handle the base case as well.
I am looking for an efficient way to solve the following problem.
List 1 is a list of records that are identified by a primitive triplet:
X | Y | Z
List 2 is a list of records that are identified by three sets: one of Xs, one of Ys, one of Zs. The Xs, Ys, and Zs are of the same 'type' as those in List 1, so they are directly comparable with one another.
Set(X) | Set(Y) | Set(Z)
For an item in list 1 I need to find all the items in list 2 where the X, Y, Z from list 1 all occur in their corresponding sets in list 2. This is best demonstrated by an example:
List 1:
X1, Y1, Z1
List 2:
(X1, X2) | (Y1) | (Z1, Z3)
(X1) | (Y1, Y2) | (Z1, Z2, Z3)
(X3) | (Y1, Y3) | (Z2, Z3)
In the above, the item in list 1 would match the first two items in list 2. The third item would not be matched as X1 does not occur in the X set, and Z1 does not occur in the Z set.
I have written a functionally correct version of the algorithm but am concerned about performance on larger data sets. Both lists are very large so iterating over list 1 and then performing an iteration over list 2 per item is going to be very inefficient.
I tried to build an index by de-normalizing each item in list 2 into a map, but the number of index entries per item is proportional to the size of the item's sets. As such, this uses a very high amount of memory and also requires significant resources to build.
Can anyone suggest an optimal way of solving this? I'm happy to consider both memory- and CPU-optimal solutions, but striking a balance would be nice!
There are going to be a lot of ways to approach this. Which is right depends on the data and how much memory is available.
One simple technique is to build a table from list2, to accelerate the queries coming from list1.
from collections import defaultdict

# Build "hits". hits[0] is a table of, for each x,
# which items in list2 contain it. Likewise hits[1]
# is for y and hits[2] is for z.
hits = [defaultdict(set) for i in range(3)]
for rowid, row in enumerate(list2):
    for i in range(3):
        for v in row[i]:
            hits[i][v].add(rowid)

# For each row of list1, query the table to find which
# items in list2 contain all three values.
for x, y, z in list1:
    print(hits[0][x].intersection(hits[1][y], hits[2][z]))
If the total size of the sets is not too large, you could try to model List 2 as bitfields. The structure will probably be quite fragmented, though - maybe the structures referenced in the Wikipedia article on bit arrays (Judy arrays, tries, Bloom filters) can help address the memory problems of your normalization approach.
You could build a tree out of List2; the first level of the tree is the first of (X1..Xn) that appears in set X. The second level is the values for the second item, plus a leaf node containing the set of lists which contain only X1. The next level contains the next possible value, and so on.
Root --+--X1--+--EOF--> List of pointers to list2 lines containing only "X1"
       |      |
       |      +--X2---+--EOF--> List of pointers to list2 lines containing only "X1,X2"
       |      |       |
       |      |       +--X3--+--etc--
       |      |
       |      +--X3---+--EOF--> "X1,X3"
       |
       +--X2--+--EOF--> "X2"
       |      |
       |      +--X3---+--EOF--> "X2,X3"
       |      |       |
...
This is expensive in memory consumption (N² log K, I think? where N = number of possible X values, K = lines in List 2) but results in fast retrieval times. If the number of possible Xs is large, this approach will break down...
Obviously you could build this index for all 3 parts of the tuple, and then AND together the results from searching each tree.
There's a fairly efficient way to do this with a single pass over list2. You start by building an index of the items in list1.
from collections import defaultdict

# index maps x -> y -> z -> row id in list1
# (in Java terms: HashMap<X, HashMap<Y, HashMap<Z, Integer>>>)
index = defaultdict(lambda: defaultdict(dict))
for rowid, (x, y, z) in enumerate(list1):
    index[x][y][z] = rowid

for rowid2, (xs, ys, zs) in enumerate(list2):
    # Collect the y->z maps for every x in this row's X set.
    xhits = defaultdict(list)
    for x in xs:
        if x in index:
            for y, zmap in index[x].items():
                xhits[y].append(zmap)
    # Narrow those down by the Y set.
    yhits = defaultdict(list)
    for y in ys:
        if y in xhits:
            for zmap in xhits[y]:
                for z, rowid1 in zmap.items():
                    yhits[z].append(rowid1)
    # Finally filter by the Z set and report matches.
    for z in zs:
        if z in yhits:
            for rowid1 in yhits[z]:
                print("list1[%d] matches list2[%d]" % (rowid1, rowid2))
The extra bookkeeping here will probably make it slower than indexing list2. But since in your case list1 is typically much smaller than list2, this will use much less memory. If you're reading list2 from disk, with this algorithm you never need to keep any part of it in memory.
Memory access can be a big deal, so I can't say for sure which will be faster in practice; you'd have to measure. The worst-case time complexity in both cases, barring hash table malfunctions, is O(len(list1) * len(list2)).
How about using a HashSet (or HashSets) for List 2? That way you will only need to iterate over List 1.
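A sketch of that idea in Java, assuming each List 2 record keeps its three sets as HashSets (the Row record and all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TripletMatcher {
    // Hypothetical shape of a List 2 record: three sets of values.
    record Row(Set<String> xs, Set<String> ys, Set<String> zs) {}

    // With hash sets, each contains() check is O(1) on average, so
    // matching one (x, y, z) triplet against list2 costs O(|list2|)
    // rather than O(|list2| * set size).
    public static List<Integer> match(String x, String y, String z, List<Row> list2) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < list2.size(); i++) {
            Row r = list2.get(i);
            if (r.xs().contains(x) && r.ys().contains(y) && r.zs().contains(z)) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Note this still scans List 2 once per List 1 item; the indexing answers above avoid that, at the cost of extra memory.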
If you use Guava, there is a high-level way to do this that is not necessarily optimal but doesn't do anything crazy:
List<SomeType> list1 = ...;
List<Set<SomeType>> candidateFromList2 = ...;
if (Sets.cartesianProduct(candidateFromList2).contains(list1)) { ... }
But it's not that hard to check this "longhand" either.