Over the last few months I wrote some Java classes implementing data structures, specifically lists, binary search trees and binary heaps. I decided to run a stress test: create an Integer array of n values between 0 and 10*n, then sort it in various ways and measure the times.
Initially it was just a curiosity. Obviously I expected my classes to cost far more than the standard Arrays.sort() method. However, when I ran the tests and compared my classes against each other, I found some unexpected surprises.
This is the list of the tests, with details and comments.
1. A copy of the array is created, then the copy is sorted using the Arrays.sort() method built into Java. The time evaluated is that of the Arrays.sort() method, i.e. the creation of the copy of the array doesn't count. It is the fastest way, as expected.
2. A list is created from the array, then the list is sorted using the Insertion Sort algorithm. The time evaluated is that of the sorting method, i.e. the creation of the list from the array doesn't count. Since the performance of Insertion Sort is not exactly great, this method costs around 50 times as much as the array method.
3. A Binary Search Tree (BST from now on) is created from the array by repeatedly calling the add() method. The BST is not a balanced tree like an AVL or Red-Black tree, merely a plain BST as found on Wikipedia: every node has links to three other nodes (parent, leftChild and rightChild), encapsulates a value, etc. This method costs around 500 times the list method, i.e. 25,000 times the array method.
4-5. Two Binary Heaps (BH_1 and BH_2 from now on) are created from the array by repeatedly calling the add() method, then converted into two (sorted) arrays by repeatedly calling the extractMin() method. Both BHs belong to the same class and store their values in a Vector<Integer>. Both BHs cost around 2 times the BST method and 50,000 times the array method. However, there is a twist.
BH_2 creates the array using the method convertHeapToArray() of the Heap<Integer> interface. convertHeapToArray() calls the method extractMin() n times, and extractMin() in turn calls the method heapify() once.
This doesn't happen in BH_1, which uses the method convertHeapToArray_1(). Instead of calling extractMin(), my "new" method executes the code of extractMin() directly - and where extractMin() would call heapify(), BH_1 likewise executes its code inline. In short, a copy-paste that avoids a few method calls.
In theory BH_1 should always cost less than BH_2: same input data, same code, fewer method calls. However, this is true only 73% of the time!
My questions are the following:
1. Why does the Binary Search Tree sorting method (computational complexity O(n log(n)), expected to be roughly balanced when built by add() alone) cost 500 times as much as an Insertion Sort (computational complexity O(n^2), preferable only when n is less than about 23)?
2. Why do the Binary Heaps (same computational complexity as Binary Search Trees, and specifically designed to sort quickly) cost 2 times as much as a Binary Search Tree?
3. And, most baffling of all, why does making fewer calls cost more than making more calls in one case out of four?!
Code for convertHeapToArray():
public default T[] convertHeapToArray(){
T[] output = AArrays.createTarray(length(), min());
for(int i=0 ; i<length() ; i++)
output[i] = this.extractMin();
return output;
}
public T extractMin() {
T min = storer.get(indexOfMinimum);
AVector.swap(storer, indexOfMinimum, length); // move the last element to the root
length--;
heapify(indexOfMinimum);
return min;
}
Report (5000 tests, 100 random arrays each):
The array uses a Comparator<Integer>.
A Comparator<Integer> performs a comparison in 66 083 nanoseconds.
The list uses a Comparator<NodeList<Integer>>.
A Comparator<NodeList<Integer>> performs a comparison in 85 973 nanoseconds.
The BST, BH_1 and BH_2 use a Relationship<Integer>.
A Relationship<Integer> performs a comparison in 107 145 nanoseconds.
The total time for the array sorting is 239 717 392 nanoseconds.
The total time for the list sorting is 984 872 184 nanoseconds.
The total time for the BST sorting is 533 338 808 811 nanoseconds.
The total time for the BH_1 sorting is 1 055 836 440 689 nanoseconds.
The total time for the BH_2 sorting is 1 198 365 741 676 nanoseconds.
The mean time for the array sorting is 47 943 nanoseconds.
The mean time for the list sorting is 196 974 nanoseconds.
The mean time for the BST sorting is 106 667 761 nanoseconds.
The mean time for the BH_1 sorting is 211 167 288 nanoseconds.
The mean time for the BH_2 sorting is 239 673 148 nanoseconds.
The first method for the Binary Heap was faster than the second 3 634 times out of 5 000.
EDIT:
Re-reading what I wrote, I realized I wasn't exactly clear in my initial question. Allow me to rectify my mistake.
I am aware that there is a difference between the actual time taken by the program and the computational complexity. I never had any doubt about the computational complexity of the methods used: the data structures are simple, and their code is practically taken from Wikipedia. I was certain that the written code was not well-performing. It wasn't written with performance in mind to begin with.
My test is on the actual time of execution. The Arrays.sort() method was included as a reference point. Since I didn't write the code with performance in mind, I suspected before the test that the results would be more costly than expected. However, my predictions about what exactly would cost more than it should were thrown into the garbage bin by the results.
For example, I believed that fewer method calls would result in a lower cost, but I was wrong. The whole difference between the two Binary Heaps is that the first executes the same code as the second, just with fewer method calls (minimal code for the BinaryHeap included below). I expected that the first Binary Heap would always have a lower cost: this was proven wrong, and I don't know why.
Another thing that baffled me was the fact that the Binary Heap cost more than the Binary Search Tree. The array used is created randomly. While under this starting condition the height of a Binary Search Tree is expected to be around log(n) (chapter 12, paragraph 4 of Introduction to Algorithms, by Cormen, Leiserson, Rivest, Stein), I had never heard of an algorithm using Binary Search Trees to sort arrays: I included it in my test only as a curiosity. And yet for a small number of elements (initially 100) the Binary Search Tree was consistently quicker than the Binary Heap.
Why does this happen? When does a Binary Heap start being more convenient? Was including the Heap»Array conversion a mistake?
A third unexpected result is the performance of the doubly-linked list. As with the Binary Search Tree, I included it in the test as a curiosity. The sorting algorithm is a very basic Insertion Sort, which is faster only for far fewer elements than I used, and the code of the entire class is most definitely not built for speed. I assumed it would turn out to be the slowest of my classes; instead it was the quickest! And I still don't know why.
This is why I asked for advice. I didn't include the code because, across all the classes, it amounts to around 3,000 lines, most of which is unused by the test. I would not read 3k lines of code, nor do I expect a random stackoverflow-er to do so! However, I have included the code for BinaryHeap<T>, which is ~300 lines.
Code for BinaryHeap<T> :
public class BinaryHeap<T> implements Heap<T> {
private static final int indexOfMinimum = 1;
private Vector<T> storer = new Vector<T>(indexOfMinimum + 10);
private Relationship<T> relationship = null;
// The class Relationship<T> has a single method whose signature is
// public boolean confront(T a, T b);
// The reason I included it instead of a Comparator is that, in my code, the value null represents -∞
//
// The following code works.
//
// public class Relationship<T>{
// Comparator<T> c;
// public Relationship(Comparator<T> c){ this.c = c; }
// public boolean confront(T a, T b){ return a==null ? true : c.compare(a,b) <= 0; }
// }
// TODO | Constructors
public BinaryHeap(Relationship<T> relationship){
storer.add(null);
this.relationship = relationship;
}
// TODO | Methods of the interface Heap
public void add(T newData) {
storer.add(null);
updateValue(storer.size() - 1, newData);
}
public T extractMin() {
T min = storer.get(indexOfMinimum);
AVector.swap(storer, indexOfMinimum, storer.size() - 1); // move the last element to the root
storer.remove(storer.size() - 1); // shrink the heap
heapify(indexOfMinimum);
return min;
}
public void updateValue(int indexOfToBeUpgraded, T newValue) {
int i = indexOfToBeUpgraded;
T support;
if( i >= indexOfMinimum )
{
storer.set(i, newValue);
while( i > indexOfMinimum && ! relationship.confront(storer.get(i/2), storer.get(i)) )
{
support = storer.get(i);
storer.set(i, storer.get(i/2));
storer.set(i/2, support);
i = i/2;
}
}
}
private void heapify(int i){
int j = i;
int maximumIndexOfArray = storer.size();
while( j <= maximumIndexOfArray/2 ) // i.e. if H[j] isn't a leaf of the Tree
{
int indexOfLocalMinimum = relationship.confront(storer.get(j), storer.get(2*j))
? ( relationship.confront(storer.get( j), storer.get(2*j+1)) ? j : 2*j+1 )
: ( relationship.confront(storer.get(2*j), storer.get(2*j+1)) ? 2*j : 2*j+1 ) ;
if( j != indexOfLocalMinimum )
{
AVector.swap(storer, j, indexOfLocalMinimum);
j = indexOfLocalMinimum;
}
else j = maximumIndexOfArray;
}
}
public T[] convertHeapToArray(){
T[] output = (T[]) Array.newInstance(min().getClass(), length());
for(int i=0 ; i<length() ; i++)
output[i] = this.extractMin();
return output;
}
// TODO | Second version of convertHeapToArray, which turned out not to be an improvement
public T[] convertHeapToArray_1(){
int length = length(), j;
T[] output = (T[]) Array.newInstance(min().getClass(), length());
for(int i=0 ; i<length ; i++)
{
// output[i] = this.extractMin();
output[i] = storer.get(indexOfMinimum);
// heapify(indexOfMinimum);
j = indexOfMinimum;
int maximumIndexOfArray = storer.size();
while( j <= maximumIndexOfArray/2 ) // i.e. if H[j] isn't a leaf of the Tree
{
int indexOfLocalMinimum = relationship.confront(storer.get(j), storer.get(2*j))
? ( relationship.confront(storer.get( j), storer.get(2*j+1)) ? j : 2*j+1 )
: ( relationship.confront(storer.get(2*j), storer.get(2*j+1)) ? 2*j : 2*j+1 ) ;
if( j != indexOfLocalMinimum )
{
AVector.swap(storer, j, indexOfLocalMinimum);
j = indexOfLocalMinimum;
}
else j = maximumIndexOfArray;
}
}
return output;
}
Computational complexity does not measure nanoseconds, or milliseconds, or anything of that sort. It measures how the running time of an algorithm varies as a function of the size of its input, and it says nothing about the efficiency of the code that you or I might write to implement the algorithm.
Now, when you write an actual implementation of an algorithm you are introducing overhead which depends on a number of factors that computational complexity theory just does not care about.
The performance of your code depends on your language of choice, your execution environment, the programming choices that you make, how experienced you are at writing performant code and avoiding common performance pitfalls, etc.
Furthermore, when you test the performance of your code, a lot depends on how well you know how to do that in your execution environment. With bytecode-translated, just-in-time-compiled, garbage-collected languages like Java, on systems with hundreds of simultaneously running processes, it is quite quirky.
So, the answer to your question is that your comparisons are unequal because a) you have not written well-performing code, b) certain things cost a lot more than you might expect in Java, and c) the system you are trying to benchmark is more chaotic and less closed than you think.
In order to do a test which is true to computational complexity theory you would have to measure the performance of your code theoretically. This means that you would need to count not nanoseconds, but theoretical operations. Those would be the number of node accesses and the number of node creations (in trees.)
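As an illustration of counting operations rather than nanoseconds, one option is to wrap a Comparator so that it counts its own invocations; this sketch is not from the question's code, and the class and method names are made up:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicLong;

public class ComparisonCounter {
    // Wraps a Comparator so every call to compare() increments a counter
    static <T> Comparator<T> counting(Comparator<T> base, AtomicLong counter) {
        return (a, b) -> {
            counter.incrementAndGet();
            return base.compare(a, b);
        };
    }

    public static void main(String[] args) {
        AtomicLong count = new AtomicLong();
        Integer[] data = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
        Arrays.sort(data, counting(Comparator.naturalOrder(), count));
        // The count is a "theoretical operations" measure, unaffected by JIT, GC, etc.
        System.out.println("comparisons: " + count.get());
    }
}
```

The same wrapper idea could be applied to the question's Relationship<T> so that the BST, heap and list implementations are compared by comparison count instead of wall-clock time.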
However, computational complexity theory still holds, so even if you insist on counting time, if you performance-test your algorithms for several orders of magnitude larger Ns (which might mean allowing them to run for years, not hundreds of nanoseconds), you should eventually begin to see the differences predicted by theory, because in the long run, log N beats N, which in turn beats N log N, which in turn beats N^2, despite the inequalities introduced by badly performing code.
Of course there is always a chance that the algorithms and data structures that you are performance-testing have completely different computational complexity characteristics than what we think they have, due to implementation tricks. For example, the linked list that you are performance-testing could internally employ a hash map as an aid to boost its performance, who knows. But we would only be able to judge that if you posted your code so that we could see exactly what is going on.
I am trying to understand the time complexity while using backtracking. The problem is
Given a set of unique integers, return all possible subsets.
Eg. Input [1,2,3] would return [[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]
I am solving it using backtracking as this:
private List<List<Integer>> result = new ArrayList<>();
public List<List<Integer>> getSubsets(int[] nums) {
for (int length = 1; length <= nums.length; length++) { //O(n)
backtrack(nums, 0, new ArrayList<>(), length);
}
result.add(new ArrayList<>());
return result;
}
private void backtrack(int[] nums, int index, List<Integer> listSoFar, int length) {
if (length == 0) {
result.add(listSoFar);
return;
}
for (int i = index; i < nums.length; i++) { // O(n)
List<Integer> temp = new ArrayList<>();
temp.addAll(listSoFar); // O(2^n)
temp.add(nums[i]);
backtrack(nums, i + 1, temp, length - 1);
}
}
The code works fine, but I am having trouble understanding the time/space complexity.
What I am thinking is that the recursive method is called n times. In each call, it generates a sublist that may contain at most 2^n elements. So both time and space will be O(n x 2^n).
Is that right? If not, can anyone elaborate?
Note that I saw some answers here, like this one, but was unable to understand them. When recursion comes into the picture, I find it a bit hard to wrap my head around.
You're exactly right about the space complexity. The total space of the final output is O(n*2^n), and this dominates the total space used by the program. The analysis of the time complexity is slightly off, though. Optimally, the time complexity would in this case be the same as the space complexity, but there are a couple of inefficiencies here (one of which is that you're not actually backtracking), such that the time complexity is actually O(n^2*2^n) at best.
It can definitely be useful to analyze a recursive algorithm's time complexity in terms of how many times the recursive method is called times how much work each call does. But be careful about saying backtrack is only called n times: it is called n times at the top level, but this ignores all the subsequent recursive calls. Also, every call at the top level, backtrack(nums, 0, new ArrayList<>(), length); is responsible for generating all subsets of size length, of which there are n Choose length. That is, no single top-level call will ever produce 2^n subsets; rather, the sum of n Choose length for lengths from 0 to n is 2^n.
Knowing that across all recursive calls, you generate 2^n subsets, you might then want to ask how much work is done in generating each subset in order to determine the overall complexity. Optimally, this would be O(n), because each subset varies in length from 0 to n, with the average length being n/2, so the overall algorithm might be O(n/2*2^n) = O(n*2^n), but you can't just assume the subsets are generated optimally and that no significant extra work is done.
In your case, you're building subsets through the listSoFar variable until it reaches the appropriate length, at which point it is appended to the result. However, listSoFar gets copied to a temp list in O(n) time for each of its O(n) elements, so the complexity of generating each subset is O(n^2), which brings the overall complexity to O(n^2*2^n).
Also, some listSoFar subsets are created which never figure into the final output (you never check that there are enough numbers remaining in nums to fill listSoFar out to the desired length before recursing), so you end up doing unnecessary work building subsets and making recursive calls which will never reach the base case to get appended to result, which might also worsen the asymptotic complexity.
You can address the first of these inefficiencies with backtracking, and the second with a simple break statement. I wrote these changes into a JavaScript program, leaving most of the logic the same but renaming/reorganizing a little bit:
function getSubsets(nums) {
let subsets = [];
for (let length = 0; length <= nums.length; length++) {
// refactored "backtrack" function:
genSubsetsByLength(length); // O(length*(n Choose length))
}
return subsets;
function genSubsetsByLength(length, i=0, partialSubset=[]) {
if (length === 0) {
subsets.push(partialSubset.slice()); // O(n): copy partial and push to result
return;
}
while (i < nums.length) {
if (nums.length - i < length) break; // don't build partial results that can't finish
partialSubset.push(nums[i]); // O(1)
genSubsetsByLength(length - 1, ++i, partialSubset);
partialSubset.pop(); // O(1): this is the back-tracking part
}
}
}
for (let subset of getSubsets([1, 2, 3])) console.log(`[`, ...subset, ']');
The key difference is using back-tracking to avoid making copies of the partial subset every time you add a new element to it, such that each is built in O(length) = O(n) time rather than O(n^2) time, because there is now only O(1) work done per element added. Popping off the last character added to the partial result after each recursive call allows you to re-use the same array across recursive calls, thus avoiding the O(n) overhead of making temp copies for each call. This, along with the fact that only subsets which appear in the final output are built, allows you to analyze the total time complexity in terms of the total number of elements across all subsets in the output: O(n*2^n).
Your code does not work efficiently.
As in the first solution in the link, you only need to decide, for each number, whether it will be included or not (like generating combinations).
That means you don't have to iterate in the getSubsets and backtrack functions:
the backtrack function can iterate over the nums array using its index parameter.
private List<List<Integer>> result = new ArrayList<>();
public List<List<Integer>> getSubsets(int[] nums) {
backtrack(nums, 0, new ArrayList<>());
return result;
}
private void backtrack(int[] nums, int index, List<Integer> listSoFar)
// This function's time complexity is O(2^N), because it explores, for each number, both the case where it is included and the case where it is excluded
{
if (index == nums.length) {
result.add(listSoFar);
return;
}
// exclude nums[index] from the subset
backtrack(nums, index + 1, listSoFar);
// include nums[index] in the subset (List.add returns a boolean, so make a copy first)
List<Integer> withCurrent = new ArrayList<>(listSoFar);
withCurrent.add(nums[index]);
backtrack(nums, index + 1, withCurrent);
}
I want to analyze moving 1 2 3 4 to 3 1 2 4 (list of integers) using LinkedList or ArrayList.
What I have done:
aux = arraylist.get(2); // O(1)
arraylist.remove(2); // O(n)
arraylist.add(0, aux); // O(n), shifting elements up.
aux = linkedlist.get(2); // O(n)
linkedlist.remove(2); // O(n)
linkedlist.addFirst(aux); // O(1)
So, in this case, can we say that they are the same or am I missing something?
You can indeed say this specific operation takes O(n) time for both a LinkedList and an ArrayList.
But you can not say that they take the same amount of real time as a result of this.
Big-O notation only tells you how the running time will scale as the input gets larger, since it ignores constant factors. So an O(n) algorithm can take at most 1*n operations, or 100000*n operations, or a whole lot more, and each of those operations can take a greatly varying amount of time. This also means it's generally a pretty inaccurate measure of performance for small inputs, since constant factors can have a bigger effect on the running time than the size of a small input (but performance differences tend to be less important for small inputs).
See also: What is a plain English explanation of "Big O" notation?
Here's a quick and dirty benchmark. I timed both operations and repeated one million times:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
public class Main {
public static void main(String[] args) {
int arrayListTime = 0;
int linkedListTime = 0;
int n = 1000000;
for (int i=0; i<n; i++) {
ArrayList<Integer> A = new ArrayList<>(Arrays.asList(1,2,3,4));
long startTime = System.currentTimeMillis();
int x = A.remove(2);
A.add(0, x);
long endTime = System.currentTimeMillis();
arrayListTime += (endTime - startTime);
LinkedList<Integer> L = new LinkedList<>(Arrays.asList(1,2,3,4));
long startTime2 = System.currentTimeMillis();
int x2 = L.remove(2);
L.addFirst(x2);
long endTime2 = System.currentTimeMillis();
linkedListTime += (endTime2 - startTime2);
}
System.out.println(arrayListTime);
System.out.println(linkedListTime);
}
}
The difference is pretty small. My output was:
424
363
So using LinkedList was only 61ms faster over the course of 1,000,000 operations.
Big O complexity means nothing when the input is very small, as in your case. Recall that the definition of big O only holds for "large enough n", and I doubt you have reached that threshold with n == 4.
If this runs in a tight loop with a small data size (and with different data each time), the only thing that will actually matter is cache performance, and the ArrayList solution is much more cache-friendly than the LinkedList solution, since each Node in a linked list is likely to require a separate cache fetch.
In this case, I would even prefer using a raw array (int[]), to avoid the redundant wrapper objects, which trigger more cache misses.
This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are in the range [0, 999].
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
private int[] store;
private long size;
private long total;
private int highestIndex;
private int lowestIndex;
public FindAverage() {
store = new int[1000];
size = 0;
total = 0;
highestIndex = Integer.MIN_VALUE;
lowestIndex = Integer.MAX_VALUE;
}
public void insert(int item) throws OutOfRangeException {
if(item < 0 || item > 999){
throw new OutOfRangeException();
}
store[item] ++;
size ++;
total += item;
highestIndex = Integer.max(highestIndex, item);
lowestIndex = Integer.min(lowestIndex, item);
}
public float getMean(){
return (float)total/size;
}
public float getMedian(){
return 0; // TODO: I can't figure out how to do this in O(1)
}
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.
You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating over the store, summing up the counts until you reach half of size. That is your median value if size is odd. For an even size, you grab the two surrounding values and average them.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.
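A concrete sketch of that scan (field names mirror the question's class, but the even/odd handling is my own assumption about the desired behavior):

```java
public class MedianStore {
    private final int[] store = new int[1000]; // one counter per possible value, as in the question
    private long size = 0;

    public void insert(int item) { store[item]++; size++; }

    // O(1000) scan = O(1): the bound does not depend on how many items were inserted
    public float getMedian() {
        long lo = (size + 1) / 2;                 // rank of the lower middle element
        long hi = (size % 2 == 0) ? lo + 1 : lo;  // rank of the upper middle element
        long seen = 0;
        int loVal = -1, hiVal = -1;
        for (int v = 0; v < store.length; v++) {
            seen += store[v];
            if (loVal < 0 && seen >= lo) loVal = v;
            if (hiVal < 0 && seen >= hi) { hiVal = v; break; }
        }
        return (loVal + hiVal) / 2.0f;            // equal halves when size is odd
    }

    public static void main(String[] args) {
        MedianStore m = new MedianStore();
        for (int x : new int[]{5, 1, 9, 3, 7}) m.insert(x);
        System.out.println(m.getMedian()); // odd size: middle value, 5.0
        m.insert(11);
        System.out.println(m.getMedian()); // even size: (5 + 7) / 2 = 6.0
    }
}
```

Each getMedian() call does at most 1000 iterations regardless of size, which is exactly the "bounded by a value that does not depend on the size of the input" property from the Wikipedia definition quoted above.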
The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers: the median index (i.e. the value of the median) and the number of values you've read that are to the left (or right) of the median. I will just stop here, hoping you will be able to figure out how to continue on your own.
EDIT (as pointed out in the comments): you already have the array with the sorted elements (store) and you know the number of elements to the left of the median (size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.
For the general case, where the range of elements is unbounded, no such data structure exists for any comparison-based algorithm, as it would allow O(n) sorting.
Proof: Assume such a DS exists; let it be D.
Let A be the input array for sorting. (Assume A.size() is even for simplicity; that can be relaxed pretty easily by adding a garbage element and discarding it later.)
sort(A):
ds = new D()
for each x in A:
ds.add(x)
m1 = min(A) - 1
m2 = max(A) + 1
for (i=0; i < A.size(); i++):
ds.add(m1)
# at this point, ds.median() is smallest element in A
for (i = 0; i < A.size(); i++):
yield ds.median()
# Each two insertions advances median by 1
ds.add(m2)
ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: add() and median() are constant-time operations, so each iteration is O(1), and the number of iterations is linear - hence the complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting m1 n times, the median is the smallest element in A. Each two insertions after that advance the median by one item, and since the advance happens in sorted order, the total output is sorted(A).
Since the above algorithm sorts in O(n), which is impossible under the comparison model, such a DS does not exist.
QED.
This is a problem I'm trying to solve on my own to get a bit better at recursion (it's not homework). I believe I have found a solution, but I'm not sure about its time complexity (I'm aware that DP would give me better results).
Find all the ways you can go up an n step staircase if you can take k steps at a time such that k <= n
For example, if my step sizes are [1,2,3] and the size of the stair case is 10, I could take 10 steps of size 1 [1,1,1,1,1,1,1,1,1,1]=10 or I could take 3 steps of size 3 and 1 step of size 1 [3,3,3,1]=10
Here is my solution:
static List<List<Integer>> problem1Ans = new ArrayList<List<Integer>>();
public static void problem1(int numSteps){
int [] steps = {1,2,3};
problem1_rec(new ArrayList<Integer>(), numSteps, steps);
}
public static void problem1_rec(List<Integer> sequence, int numSteps, int [] steps){
if(problem1_sum_seq(sequence) > numSteps){
return;
}
if(problem1_sum_seq(sequence) == numSteps){
problem1Ans.add(new ArrayList<Integer>(sequence));
return;
}
for(int stepSize : steps){
sequence.add(stepSize);
problem1_rec(sequence, numSteps, steps);
sequence.remove(sequence.size()-1);
}
}
public static int problem1_sum_seq(List<Integer> sequence){
int sum = 0;
for(int i : sequence){
sum += i;
}
return sum;
}
public static void main(String [] args){
problem1(10);
System.out.println(problem1Ans.size());
}
My guess is that the runtime is k^n, where k is the number of step sizes and n is the number of steps (3 and 10 in this case).
I came to this answer because each call loops over all k step sizes. However, the depth of the recursion is not the same for all step sizes. For instance, the sequence [1,1,1,1,1,1,1,1,1,1] involves more recursive calls than [3,3,3,1], so this makes me doubt my answer.
What is the runtime? Is k^n correct?
TL;DR: Your algorithm is O(2^n), which is a tighter bound than O(k^n), but because of some easily corrected inefficiencies the implementation runs in O(k^2 × 2^n).
In effect, your solution enumerates all of the step-sequences with sum n by successively enumerating all of the viable prefixes of those step-sequences. So the number of operations is proportional to the number of step sequences whose sum is less than or equal to n. [See Notes 1 and 2].
Now, let's consider how many possible prefix sequences there are for a given value of n. The precise computation will depend on the steps allowed in the vector of step sizes, but we can easily come up with a maximum, because any step sequence is a subset of the set of integers from 1 to n, and we know that there are precisely 2^n such subsets.
Of course, not all subsets qualify. For example, if the set of step-sizes is [1, 2], then you are enumerating Fibonacci sequences, and there are O(φ^n) such sequences. As k increases, you will get closer and closer to O(2^n). [Note 3]
Because of the inefficiencies in your code, as noted, your algorithm is actually O(k^2 × α^n), where α is some number between φ and 2, approaching 2 as k approaches infinity. (φ is 1.618..., or (1+sqrt(5))/2.)
There are a number of improvements that could be made to your implementation, particularly if your intent was to count rather than enumerate the step sizes. But that was not your question, as I understand it.
Notes
That's not quite exact, because you actually enumerate a few extra sequences which you then reject; the cost of these rejections is a multiplier proportional to the size of the vector of possible step sizes. However, you could easily eliminate the rejections by terminating the for loop as soon as a rejection is noticed.
The cost of an enumeration is O(k) rather than O(1) because you compute the sum of the sequence arguments for each enumeration (often twice). That produces an additional factor of k. You could easily eliminate this cost by passing the current sum into the recursive call (which would also eliminate the multiple evaluations). It is trickier to avoid the O(k) cost of copying the sequence into the output list, but that can be done using a better (structure-sharing) data-structure.
The question in your title (as opposed to the problem solved by the code in the body of your question) does actually require enumerating all possible subsets of {1…n}, in which case the number of possible sequences would be exactly 2^n.
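Combining the suggestions in Notes 1 and 2 - pass the running sum into the recursion instead of recomputing it, and cut the loop off at the first overshoot - gives something like the following sketch. The names are mine, and it assumes the step sizes are sorted ascending; the resulting count for steps [1,2,3] and n = 10 should match what the question's program prints:

```java
import java.util.ArrayList;
import java.util.List;

public class StairsEnum {
    static List<List<Integer>> answers = new ArrayList<>();

    // "sum" carries the running total, so there is no O(k) re-summing per call (Note 2);
    // because steps are sorted ascending, the first overshoot lets us reject all
    // remaining step sizes at once (Note 1)
    static void enumerate(List<Integer> seq, int sum, int target, int[] steps) {
        if (sum == target) {
            answers.add(new ArrayList<>(seq));
            return;
        }
        for (int s : steps) {
            if (sum + s > target) break; // every later step is at least as large
            seq.add(s);
            enumerate(seq, sum + s, target, steps);
            seq.remove(seq.size() - 1);  // backtrack
        }
    }

    public static void main(String[] args) {
        enumerate(new ArrayList<>(), 0, 10, new int[]{1, 2, 3});
        System.out.println(answers.size()); // 274 step-sequences summing to 10
    }
}
```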
If you want to solve this recursively, you should use a different pattern that allows caching of previous values, like the one used when calculating Fibonacci numbers. The code for the Fibonacci function is basically the same as what you seek: it adds the previous and pre-previous numbers by index and returns the result as the current number. You can use the same technique in your recursive function, but instead of adding f(k-1) and f(k-2), sum up f(k-steps[i]). Something like this (I don't have a Java syntax checker, so bear with syntax errors please):
static List<Integer> cache = new ArrayList<>();
static List<Integer> storedSteps = null; // if called with the same steps, don't clear the cache
public static Integer problem1(Integer numSteps, List<Integer> steps) {
if (!steps.equals(storedSteps)) { // check equality data-wise, not link-wise
storedSteps = new ArrayList<>(steps); // defensive copy
cache.clear(); // remove all data - now invalid
// TODO make cache+storedSteps a single structure
}
return problem1_rec(numSteps, steps);
}
private static Integer problem1_rec(Integer numSteps, List<Integer> steps) {
if (0 > numSteps) { return 0; }
if (0 == numSteps) { return 1; }
if (cache.size() >= numSteps + 1 && cache.get(numSteps) != null) { return cache.get(numSteps); } // cache hit
Integer acc = 0;
for (Integer i : steps) { acc += problem1_rec(numSteps - i, steps); }
while (cache.size() <= numSteps) { cache.add(null); } // grow the cache up to this index
cache.set(numSteps, acc); // cache miss: store the result
return acc;
}
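A self-contained sketch of the same memoization idea, using a HashMap as the cache instead of a list (names are mine); for steps [1,2,3] and n = 10 it should report 274 ways, agreeing with the count the question's enumeration produces:

```java
import java.util.HashMap;
import java.util.Map;

public class StairsCount {
    // cache maps "steps remaining" -> number of ways, the Fibonacci-style pattern
    static Map<Integer, Long> cache = new HashMap<>();

    static long ways(int numSteps, int[] steps) {
        if (numSteps < 0) return 0;
        if (numSteps == 0) return 1; // one way to stand still: take no steps
        Long hit = cache.get(numSteps);
        if (hit != null) return hit;
        long acc = 0;
        for (int s : steps) acc += ways(numSteps - s, steps);
        cache.put(numSteps, acc);
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(ways(10, new int[]{1, 2, 3})); // 274
    }
}
```

Each distinct numSteps value is computed once, so the counting version runs in O(n*k) time instead of enumerating all O(2^n) sequences.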
We have the following Java method:
static void comb(int[] a, int i, int max) {
if(i < 0) {
for(int h = 0; h < a.length; h++)
System.out.print((char)('a'+a[h]));
System.out.print("\n");
return;
}
for(int v = max; v >= i; v--) {
a[i] = v;
comb(a, i-1, v-1);
}
}
static void comb(int[] a, int n) { // a.length <= n
comb(a, a.length-1, n - 1);
return;
}
I have to determine an asymptotic estimate of the cost of the algorithm comb(int[],int) as a function of the size of the input.
Since I'm just starting out with this type of exercise, I cannot tell whether in this case the input size means the size of the array a or some other method parameter.
Once the input size is identified, how do I proceed to determine the cost of a multiple recursion?
Could you tell me the recurrence equation that determines the cost?
To determine the complexity of this algorithm you have to understand where it spends most of its time. Different algorithms may depend on different aspects of their parameters: input size, input type, input order, and so on. This one depends on the array size and n.
Operations like System.out.print, the (char) cast, 'a' + a[h], a.length, h++ and so on are constant-time operations; their cost mostly depends on the processor instructions they compile to and the processor that executes them. Eventually they can be summed into a constant, say C. This constant does not depend on the algorithm or the input size, so you can safely omit it from the estimation.
The algorithm depends linearly on the input size because it loops over its input array (a cycle from h = 0 to the last array element). And because n can be equal to the array size (a.length = n is the worst case for this algorithm, since it forces the recursion to run "array size" times), we should consider this input case in our estimation. We then get another loop with recursion, which executes the method comb another n times.
So in the worst case we get O(n*n*C) execution steps; for significantly large input sizes the constant C becomes insignificant, so you can omit it from the estimation. Thus the final estimate is O(n^2).
The original method being called is comb(int[] a, int n), and you know that a.length <= n. This means you can bound the running time of the method with a function of n, but you should think whether you can compute a better bound with a function of both n and a.length.
For example, if the method executes a.length * n steps and each step takes a constant amount of time, you can say that the method takes O(n^2) time, but O(a.length * n) would be more accurate (especially if n is much larger than a.length).
You should analyze how many times the method is called recursively, and how many operations occur in each call.
Basically for a given size of input array, how many steps does it take to compute the answer? If you double the input size, what happens to the number of steps? The key is to examine your loops and work out how many times they get executed.
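One way to "work out how many times they get executed" is empirical: instrument the recursion with a counter instead of printing. In the sketch below (the instrumentation is mine), the counter tallies how many complete combinations comb emits; for an array of length k with values bounded by n, that is the number of strictly increasing k-tuples over {0, …, n-1}, i.e. n choose k:

```java
public class CombCount {
    static long outputs = 0; // counts how many lines the original comb would print

    static void comb(int[] a, int i, int max) {
        if (i < 0) {
            outputs++; // base case: one complete combination reached
            return;
        }
        for (int v = max; v >= i; v--) {
            a[i] = v;
            comb(a, i - 1, v - 1);
        }
    }

    public static void main(String[] args) {
        comb(new int[3], 2, 4); // k = 3, n = 5
        System.out.println(outputs); // C(5, 3) = 10 combinations
    }
}
```

Rerunning this for growing n and k and comparing the counts against your candidate formula is a quick way to check a recurrence you derive on paper.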