Java - Space complexity of this code [closed]

What is the space complexity of the code?
I think it is O(1), since I don't technically store all the inputs in one array; I sort of split them up into two sets.
This code is supposed to take an array with duplicates, and find the two numbers that don't have duplicates.
My Code:
public int[] singleNumber(int[] nums) {
    int[] res = new int[2];
    HashSet<Integer> temp = new HashSet<>();
    HashSet<Integer> temp2 = new HashSet<>();
    for (int i : nums) {
        if (!temp.contains(i)) temp.add(i);
        else temp2.add(i);
    }
    for (int i : nums) {
        if (!temp2.contains(i)) {
            if (res[0] == 0) res[0] = i;
            else res[1] = i;
        }
    }
    return res;
}

The space complexity is "best case" O(1) and "worst case" O(N). The best case is when all of the numbers in nums are the same, and the worst case occurs in a variety of situations, including when there are close to N/2 duplicates, which is your intended use-case.
The time complexity is O(N) in all cases.
Here is my reasoning.
Time complexity.
If we assume that the hash function is well-behaved, then the time complexities of HashSet.add and HashSet.contains are both O(1).
The time complexity of Integer.valueOf(int) is also O(1).
These O(1) operations are performed a constant number of times in each iteration of the two O(N) loops, making the entire computation O(N) in time.
Space complexity.
This is a bit more complex, but let us just consider the worst case where all int values are unique. This statement
if (!temp.contains(i)) temp.add(i);
else temp2.add(i);
is going to add Integer.valueOf(i) to either temp or temp2. And in this particular case, they will all end up in temp. (Think about it ...) So that means we end up with N unique entries in the temp set, and none in the temp2 set.
Now the space required for a HashSet with N entries is O(N). (The constant of proportionality is large ... but we are talking about space complexity here, so that is not pertinent.)
The best case occurs when all of the int values are the same. Then you end up with one entry in temp and one entry in temp2. (The same value is repeatedly added to the temp2 set ... which does nothing.) The space usage of two sets with one entry each is O(1).
But (I hear you ask) what about the objects created by Integer.valueOf(int)? I argue that they don't count:
Unless they are made reachable (via one of the HashSet objects), the Integer objects will be garbage collected. Therefore they don't count as space usage in the sense that it is normally considered in Java [1].
A smart compiler could actually optimize away the need for Integer objects entirely. (Not current generation HotSpot Java compilers, but in the future we could see such things in the production compilers.)
[1] If you start considering temporary (i.e. immediately unreachable) Java objects as space utilization and sum up this utilization, you will get meaningless results; e.g. an application with O(N^2) "space utilization" that runs in an O(N) sized heap. Temporary objects do count, but only while they are reachable. In your example, the contribution is O(1).
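To make those two extremes concrete, here is a small illustrative driver (not from the original post; the array size and helper name are mine) that mirrors the contents of temp after the first loop and shows how large it ends up in each case:

import java.util.HashSet;
import java.util.stream.IntStream;

public class SpaceDemo {

    // Mirrors the contents of temp after the first loop of singleNumber:
    // every distinct value in nums ends up in temp.
    private static HashSet<Integer> fillTemp(int[] nums) {
        HashSet<Integer> temp = new HashSet<>();
        for (int i : nums)
            temp.add(i);
        return temp;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] worst = IntStream.range(0, n).toArray();  // all values unique
        int[] best = new int[n];                        // all values identical (zero)

        System.out.println("worst case: temp holds " + fillTemp(worst).size() + " entries"); // ~N
        System.out.println("best case:  temp holds " + fillTemp(best).size() + " entries");  // 1
    }
}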

Related

Why is using sorting (O(n log n) complexity) to find the majority element faster than using a HashMap (O(n) complexity)?

Majority element question:
Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times.
You may assume that the array is non-empty and the majority element always exists in the array.
// Solution1 - Sorting ----------------------------------------------------------------
class Solution {
    public int majorityElement(int[] nums) {
        Arrays.sort(nums);
        return nums[nums.length / 2];
    }
}
// Solution2 - HashMap ---------------------------------------------------------------
class Solution {
    public int majorityElement(int[] nums) {
        // int[] arr1 = new int[nums.length];
        HashMap<Integer, Integer> map = new HashMap<>(100);
        Integer k = new Integer(-1);
        try {
            for (int i : nums) {
                if (map.containsKey(i)) {
                    map.put(i, map.get(i) + 1);
                } else {
                    map.put(i, 1);
                }
            }
            for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
                if (entry.getValue() > (nums.length / 2)) {
                    k = entry.getKey();
                    break;
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Error");
        }
        return k;
    }
}
The Arrays.sort() function for primitive arrays is implemented in Java using a dual-pivot Quicksort and has O(n log n) time complexity.
On the other hand, using HashMap to find the majority element has only O(n) time complexity.
Hence, solution 1 (sorting) should take longer than solution 2 (HashMap), but when I was doing the question on LeetCode, the average time taken by solution 2 is much more (almost 8 times more) than solution 1.
Why is that the case? I'm really confused.....
Is the size of the test case the reason? Will solution 2 become more efficient when the number of elements in the test case increases dramatically?
Big O isn't a measure of actual performance. It's only going to give you an idea of how your performance will evolve in comparison to n.
Practically, an algorithm in O(n log n) will eventually be slower than one in O(n) for some n. But that n might be 1, 10, 10^6 or even 10^600 - at which point it's probably irrelevant because you'll never run into such a data set - or you won't have enough hardware for it.
Software engineers have to consider both actual performance and performance at the practical limit. For example, a hash map lookup is in theory faster than an unsorted array lookup... but then most arrays are small (10-100 elements), negating any O(n) advantage due to the extra code complexity.
You could certainly optimize your code a bit, but in this case you're unlikely to change the outcome for small n unless you introduce another factor (e.g. artificially slow down the time per cycle with a constant).
(I wanted to find a good metaphor to illustrate, but it's harder than expected...)
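To see that in practice, a rough timing sketch like the following can be used (class and method names are mine, and a serious comparison would use a harness such as JMH rather than System.nanoTime):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class MajorityBench {

    static int bySorting(int[] nums) {
        Arrays.sort(nums);
        return nums[nums.length / 2];
    }

    static int byHashMap(int[] nums) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i : nums) counts.merge(i, 1, Integer::sum);
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            if (e.getValue() > nums.length / 2) return e.getKey();
        throw new IllegalStateException("no majority element");
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n : new int[]{ 1_000, 100_000, 10_000_000 }) {
            int[] nums = new int[n];
            for (int i = 0; i < n; i++) {
                // roughly two thirds of the elements are 7, so 7 is the majority element
                nums[i] = (i % 3 != 0) ? 7 : rnd.nextInt(1_000);
            }
            long t0 = System.nanoTime();
            int a = bySorting(nums.clone());
            long t1 = System.nanoTime();
            int b = byHashMap(nums.clone());
            long t2 = System.nanoTime();
            System.out.printf("n=%,d  sort=%d ms (%d)  hashmap=%d ms (%d)%n",
                    n, (t1 - t0) / 1_000_000, a, (t2 - t1) / 1_000_000, b);
        }
    }
}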
It depends on the test cases: some test cases will be faster with the HashMap while others will not.
Why is that? Solution 1 guarantees O(N log2 N) in the worst case, but the HashMap costs O(N * (M + R)), where M is the cost of collisions and R is the cost of resizing the internal array.
HashMap internally uses an array of nodes named table, and it resizes several times as the input grows or shrinks. And you gave it an initial capacity of only 100.
So what happens? Java uses separate chaining to resolve collisions, and some test cases may have lots of collisions, which costs a lot of time whenever you query or update the HashMap.
Conclusion: the performance of the HashMap is affected by two factors: 1. resizing the table array based on the input size, and 2. how many collisions appear in the input.
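If resizing is the concern, one mitigation (a sketch, not a claim about what the LeetCode tests actually exercise) is to pre-size the map for the worst-case number of distinct keys, taking the default load factor of 0.75 into account:

// Pre-size so that up to nums.length distinct keys fit without any rehash:
// capacity * loadFactor >= expected entries  =>  capacity >= expected / 0.75
int expectedDistinct = nums.length;                         // worst case: every key is distinct
int initialCapacity = (int) (expectedDistinct / 0.75f) + 1;
HashMap<Integer, Integer> map = new HashMap<>(initialCapacity);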

Comparison of these two algorithms?

So I'm presented with a problem that states: "Determine if a string contains all unique characters."
So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.
private static boolean allUniqueCharacters(String s) {
    Set<Character> charSet = new HashSet<Character>();
    for (int i = 0; i < s.length(); i++) {
        char currentChar = s.charAt(i);
        if (!charSet.contains(currentChar)) {
            charSet.add(currentChar);
        } else {
            return false;
        }
    }
    return true;
}
According to the book I am reading this is the "optimal solution"
public static boolean isUniqueChars2(String str) {
    if (str.length() > 128)
        return false;
    boolean[] char_set = new boolean[128];
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);
        if (char_set[val]) {
            return false;
        }
        char_set[val] = true;
    }
    return true;
}
My question is: is my implementation slower than the one presented? I assume it is, but if a hash lookup is O(1), wouldn't they have the same complexity?
Thank you.
As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.
Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.
For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.
This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.
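As a sketch of that tradeoff (my own variant, not from the book): covering every possible char value instead of only 7-bit ASCII can be done with a java.util.BitSet of 2^16 bits, which costs about 8 KB while keeping the array-style O(1) lookup:

import java.util.BitSet;

// Sketch: same idea as the book's version, but for the full 16-bit char range.
public static boolean allUniqueChars16(String s) {
    if (s.length() > 1 << 16)      // more chars than distinct char values: something must repeat
        return false;
    BitSet seen = new BitSet(1 << 16);
    for (int i = 0; i < s.length(); i++) {
        int val = s.charAt(i);
        if (seen.get(val))
            return false;
        seen.set(val);
    }
    return true;
}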
Both algorithms have time complexity of O(N). The difference is in their space complexity.
The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).
The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.
The hashmap is in theory acceptable, but is a waste.
A hash map is built on top of an array (so it is certainly more costly than a plain array), and collision resolution requires extra space (at least double the number of elements). In addition, every access requires computing the hash and possibly resolving collisions.
This adds a lot of overhead in terms of space and time, compared to a straight array.
Also note that it is kind of folklore that a hash table has O(1) behavior. The worst case is much poorer: accesses can take up to O(N) time for a table of size N.
As a final remark, the time complexity of this algorithm is O(1), because at worst you conclude false as soon as N > 128.
Your algorithm is also O(1). You can think about complexity as "how will my algorithm react to a change in the number of elements processed?" That is why O(n) and O(2n) are effectively equal, and why people are talking about O notation as a growth rate here.
Your solution could indeed be slower than the book's solution. Firstly, a hash lookup ideally has constant-time lookup, but the retrieval of the object will not be constant time if there are multiple hash collisions. Secondly, even if it is a constant-time lookup, there is usually significant overhead involved in executing the hash code function compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.
The bottleneck of your implementation is that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity of O(1).
This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (otherwise the reference implementation would be wrong and produce an out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well, just with somewhat bigger constants than the array lookup.
* assuming a sane implementation as a tree or hash map. The hash map time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid an increase in collisions, which would lead to linear lookup time; see e.g. here and here for answers on Stack Overflow.
This article even explains that Java 8 converts an overfull HashMap bucket into a binary tree (O(n log n) for the conversion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.

What is the runtime of array.length? [duplicate]

This question already has answers here:
time complexity or hidden cost of <Array Name>.length in java
Let's say I have an array:
int[] array = new int[10];
What is the runtime of:
int len = array.length;
I would think that this would be a constant-time operation, but today in an interview, the interviewer told me that this would be O(n) because the number of elements would need to be counted.
Additionally, if I have a loop like this:
for (int i = array.length - 1; i >= 0; i--) {
    // something with array[i]
}
Does this entail an extra n operations to get to the end of the array to start the loop? The interviewer came from a C background, so maybe they were mistaken about how Java works, but I didn't want to push it during the interview.
array.length is O(1) and the loop is O(n) overall (assuming the "something" is constant-time).
Is it different for c?
C is different in that, depending on how the array is allocated, you can either find out its size in O(1) time or not at all. By "not at all" I mean that you have to keep track of the size yourself.
(On a personal note, if that's the caliber of interviewers, I would have reservations about coming to work there.)
It is a constant-time operation in all Java implementations, because the JVM has to store this field in order to do index checks (it has to throw an ArrayIndexOutOfBoundsException if an index is invalid).
It might be a good idea to cache the array length in a local variable because JVM stack access is faster, but this improvement is very minor; normally the loop body execution will outweigh this optimization.
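For what it's worth, the caching mentioned above would look something like this (a micro-optimization sketch; in practice the JIT will usually hoist the length read out of the loop anyway):

int[] array = new int[10];

// Reads array.length on every iteration (each read is O(1)):
for (int i = 0; i < array.length; i++) {
    // ... do something with array[i]
}

// Caches the length in a local variable; at best a very minor win:
int len = array.length;
for (int i = 0; i < len; i++) {
    // ... do something with array[i]
}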
Here's another SO thread that explains the implementation of array.length:
How is length implemented in Java Arrays?
Accessing the length field is an O(1) operation, because it doesn't actually count the elements of the array; that field is set when the array is created.
Your loop, on the other hand, is an O(n) operation.

removing duplicate strings from a massive array in java efficiently?

I'm considering the best possible way to remove duplicates from an (unsorted) array of strings - the array contains millions or tens of millions of strings. The array is already prepopulated, so the optimization goal is only removing duplicates, not preventing duplicates from populating it in the first place.
I was thinking along the lines of doing a sort and then a binary search to get a log(n) search instead of an n (linear) search. That would give me n log n + n searches, which, although better than an unsorted (n^2) search, still seems slow. (I was also considering hashing, but am not sure about the throughput.)
Please help! I'm looking for an efficient solution that addresses both speed and memory, since there are millions of strings involved, without using the Collections API!
Until your last sentence, the answer seemed obvious to me: use a HashSet<String> or a LinkedHashSet<String> if you need to preserve order:
HashSet<String> distinctStrings = new HashSet<String>(Arrays.asList(array));
If you can't use the collections API, consider building your own hash set... but until you've given a reason why you wouldn't want to use the collections API, it's hard to give a more concrete answer, as that reason could rule out other answers too.
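For completeness, a sketch of the order-preserving variant and the conversion back to an array (class and variable names are mine):

import java.util.Arrays;
import java.util.LinkedHashSet;

public class DedupWithSet {
    public static void main(String[] args) {
        String[] array = { "b", "a", "b", "c", "a" };
        // LinkedHashSet keeps the first occurrence of each string, in original order.
        String[] deduped = new LinkedHashSet<>(Arrays.asList(array)).toArray(new String[0]);
        System.out.println(Arrays.toString(deduped));   // [b, a, c]
    }
}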
ANALYSIS
Let's perform some analysis:
Using a HashSet. Time complexity: O(n). Space complexity: O(n). Note that it requires about 8 * (array size) extra bytes, since each entry is a reference (8-16 bytes) to a new object.
Quick Sort. Time - O(n*log n). Space O(log n) (the worst case O(n*n) and O(n) respectively).
Merge Sort (binary tree/TreeSet). Time - O(n * log n). Space O(n)
Heap Sort. Time O(n * log n). Space O(1). (but it is slower than 2 and 3).
In the case of Heap Sort you can throw away duplicates on the fly, so you'll save the final pass after sorting.
CONCLUSION
If time is your concern, and you don't mind allocating 8 * array.length bytes for a HashSet - this solution seems to be optimal.
If space is a concern - then QuickSort + one pass.
If space is a big concern - implement a heap that throws away duplicates on the fly. It's still O(n * log n), but without additional space.
I would suggest that you use a modified mergesort on the array: within the merge step, add logic to remove duplicate values. This solution has O(n log n) complexity and could be performed in place if needed (the in-place implementation is a bit harder than normal mergesort, because adjacent parts could contain gaps from the removed duplicates, which also need to be closed when merging).
For more information on mergesort see http://en.wikipedia.org/wiki/Merge_sort
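Here is a sketch of that idea (my own code, using an auxiliary buffer rather than the trickier in-place variant): the merge step simply skips an element when it equals the last element already written to the output.

import java.util.Arrays;

public class DedupMergeSort {

    /** Sorts a copy of the input and drops duplicates inside the merge step. */
    public static String[] sortAndDedup(String[] a) {
        String[] buf = a.clone();
        int n = mergeSort(buf, 0, buf.length);
        return Arrays.copyOf(buf, n);
    }

    // Sorts buf[from, to) and returns the number of distinct elements,
    // which are left compacted at the front of that range.
    private static int mergeSort(String[] buf, int from, int to) {
        int len = to - from;
        if (len <= 1) return len;
        int mid = from + len / 2;
        int leftLen = mergeSort(buf, from, mid);
        int rightLen = mergeSort(buf, mid, to);
        return merge(buf, from, leftLen, mid, rightLen);
    }

    // Merges buf[from, from+leftLen) and buf[mid, mid+rightLen),
    // skipping any element equal to the last one already written.
    private static int merge(String[] buf, int from, int leftLen, int mid, int rightLen) {
        String[] out = new String[leftLen + rightLen];
        int i = from, iEnd = from + leftLen;
        int j = mid, jEnd = mid + rightLen;
        int k = 0;
        while (i < iEnd || j < jEnd) {
            String next;
            if (j >= jEnd || (i < iEnd && buf[i].compareTo(buf[j]) <= 0)) {
                next = buf[i++];
            } else {
                next = buf[j++];
            }
            if (k == 0 || !out[k - 1].equals(next)) {   // duplicate removal happens here
                out[k++] = next;
            }
        }
        System.arraycopy(out, 0, buf, from, k);
        return k;
    }

    public static void main(String[] args) {
        String[] input = { "pear", "apple", "pear", "fig", "apple" };
        System.out.println(Arrays.toString(sortAndDedup(input)));  // [apple, fig, pear]
    }
}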
Creating a hash set to handle this task is way too expensive. In fact, the whole point of them telling you not to use the Collections API is probably that they don't want to hear the word "hash". So that leaves the code below.
Note that you offered them binary search AFTER sorting the array: that makes no sense, which may be the reason your proposal was rejected.
OPTION 1:
public static void removeDuplicates(String[] input){
    Arrays.sort(input); // use mergesort/quicksort here: n log n
    for (int i = 1; i < input.length; i++) {
        if (input[i-1] == input[i])
            input[i-1] = null;
    }
}
OPTION 2:
public static String[] removeDuplicates(String[] input){
    Arrays.sort(input); // use mergesort here: n log n
    int size = 1;
    for (int i = 1; i < input.length; i++) {
        if (input[i-1] != input[i])
            size++;
    }
    System.out.println(size);
    String output[] = new String[size];
    output[0] = input[0];
    int n = 1;
    for (int i = 1; i < input.length; i++)
        if (input[i-1] != input[i])
            output[n++] = input[i];
    // final step: either return output or copy output into input;
    // here I just return output
    return output;
}
OPTION 3: (added by 949300, based upon Option 1). Note that this mangles the input array; if that is unacceptable, you must make a copy.
public static String[] removeDuplicates(String[] input){
    Arrays.sort(input); // use mergesort/quicksort here: n log n
    if (input.length == 0)
        return input;
    int outputLength = 1;    // the first element always survives
    for (int i = 1; i < input.length; i++) {
        // I think equals is safer, but are nulls allowed in the input???
        if (input[i-1].equals(input[i]))
            input[i-1] = null;   // null out the earlier duplicate
        else
            outputLength++;
    }
    // check if there were zero duplicates
    if (outputLength == input.length)
        return input;
    String[] output = new String[outputLength];
    int idx = 0;
    for (int i = 0; i < input.length; i++)
        if (input[i] != null)
            output[idx++] = input[i];
    return output;
}
Hi, do you really need to keep them in an array? It would be faster to use a hash-based collection such as a set, where each value is stored only once.
If you put all entries into a set collection type, you can use the
HashSet(int initialCapacity)
constructor to prevent the set from having to grow at run time.
Set<T> mySet = new HashSet<T>(Arrays.asList(someArray))
Arrays.asList() itself is just an O(1) wrapper around the array; copying its elements into the HashSet is O(n), provided the set's backing memory does not have to be expanded.
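Putting that together as a sketch (the method name is mine, and the capacity arithmetic accounts for HashSet's default 0.75 load factor, which the answer doesn't spell out):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Dedups a String[] into a set without any rehashing during the fill.
static Set<String> distinctOf(String[] array) {
    int capacity = (int) (array.length / 0.75f) + 1;   // 0.75 is HashSet's default load factor
    Set<String> distinct = new HashSet<>(capacity);
    Collections.addAll(distinct, array);               // O(n), no intermediate List wrapper needed
    return distinct;
}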
Since this is an interview question, I think they want you to come up with your own implementation instead of using the set api.
Instead of sorting first and then comparing, you can build a binary search tree and create an empty array to store the result.
The first element in the array will be the root.
If the next element is equal to a node, return; this removes the duplicate elements.
If the next element is less than the node, compare it to the left child, else compare it to the right child.
Keep doing the above two steps until you reach the bottom of the tree; then you can create a new node, knowing this value has no duplicate yet.
Insert this new node's value into the result array.
After traversing all elements of the original array, you get a new array with no duplicates, in the original order.
Traversing takes O(n) and searching the binary tree takes O(log n) (the insertion itself is only O(1) since you are just attaching a node and not re-allocating/balancing the tree), so the total should be O(n log n).
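A rough sketch of that tree-based approach (class and method names are mine; a production version would need a balanced tree to actually guarantee the O(log n) search on sorted or adversarial input):

import java.util.ArrayList;
import java.util.List;

public class BstDedup {

    private static final class Node {
        final String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    /** Returns the distinct strings in their original order of first appearance. */
    public static String[] dedup(String[] input) {
        List<String> result = new ArrayList<>();
        Node root = null;
        for (String s : input) {
            root = insert(root, s, result);   // appends s to result only if it was not seen before
        }
        return result.toArray(new String[0]);
    }

    // Inserts s unless it is already present; newly inserted values are appended to result.
    private static Node insert(Node node, String s, List<String> result) {
        if (node == null) {
            result.add(s);                    // first time we see this value
            return new Node(s);
        }
        int cmp = s.compareTo(node.value);
        if (cmp == 0) return node;            // duplicate: ignore
        if (cmp < 0) node.left = insert(node.left, s, result);
        else         node.right = insert(node.right, s, result);
        return node;
    }

    public static void main(String[] args) {
        String[] input = { "pear", "apple", "pear", "fig", "apple" };
        System.out.println(String.join(", ", dedup(input)));  // pear, apple, fig
    }
}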
O.K., if they want super speed, let's use the hashcodes of the Strings as much as possible.
Loop through the array, get the hashcode for each String, and add it to your favorite data structure. Since you aren't allowed to use a Collection, use a BitSet. Note that you need two, one for positives and one for negatives, and they will each be huge.
Loop again through the array, with another BitSet: true means the String passes. If the hashcode for the String is not already present in the first BitSet, you can just mark it as true; else mark it as a possible duplicate, i.e. false. While you are at it, count how many possible duplicates there are.
Collect all the possible duplicates into a big String[], named possibleDuplicates. Sort it.
Now go through the possible duplicates in the original array and binary search in possibleDuplicates. If present, well, you are still stuck, because you want to include it ONCE but not all the other times. So you need yet another array somewhere. Messy, and I've got to go eat dinner, but this is a start...

Counting sort and usage in Java applications [closed]

I did an algorithms refresher and I (re)read about a sorting algorithm that runs in linear time, namely Counting Sort.
To be honest, I had forgotten about it.
I understand the structure and logic, and the fact that it runs in linear time is a very attractive quality.
But I have the following question:
As I understand it, the concrete implementation of the algorithm relies on two things:
1) The range of input numbers is small (otherwise the intermediate array will be huge and full of gaps).
2) We actually know the range of the numbers.
Assuming these two points are correct (please correct me otherwise), I was wondering what the best application domain for this algorithm is.
I mean specifically in Java, is an implementation like the following Java/Counting sort sufficient:
public static void countingSort(int[] a, int low, int high)
{
    int[] counts = new int[high - low + 1]; // this will hold all possible values, from low to high
    for (int x : a)
        counts[x - low]++; // - low so the lowest possible value is always 0
    int current = 0;
    for (int i = 0; i < counts.length; i++)
    {
        Arrays.fill(a, current, current + counts[i], i + low); // fills counts[i] elements of value i + low starting at current
        current += counts[i]; // leap forward by counts[i] steps
    }
}
Or is it not a trivial matter to come up with the high and low values?
Is there a specific application in Java that counting sort is best suited for?
I assume there are subtleties like these; otherwise, why would anyone bother with all the O(n log n) algorithms?
Algorithms are not about the language, so this is language-agnostic. As you have said, use counting sort when the domain is small. If you have only three possible values - 1, 2, 3 - it is far better to sort them with counting sort than with quicksort, heapsort or anything else that is O(n log n). If you have a specific question, feel free to ask.
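For instance, with a tiny domain like that, the question's countingSort would be used along these lines (a usage sketch):

int[] grades = { 2, 1, 3, 3, 1, 2, 2, 1, 3 };   // all values known to lie in [1, 3]
countingSort(grades, 1, 3);                      // the counts array has only 3 slots
System.out.println(java.util.Arrays.toString(grades));   // [1, 1, 1, 2, 2, 2, 3, 3, 3]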
It is incorrect to say counting sort is O(n) in the general case; the "small range of elements" is not a recommendation but a key assumption.
Counting sort assumes that each of the elements is an integer in the range 1 to k, for some integer k. When k = O(n), counting sort runs in O(n) time.
In the general situation, the range of keys k is independent of the number of elements n and can be arbitrarily large. For example, consider the following array:
{1, 1000000000000000000000000000000000000000000000000000000000000000000000000000}
As for the exact break-even values of k and n where counting sort outperforms a traditional sort, that is hugely dependent on the implementation and is best determined by benchmarking (trying both out).
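One way to find that break-even point is simply to time both approaches on data shaped like yours, along these lines (the sizes and class name are made up; a serious measurement would use a benchmarking harness such as JMH):

import java.util.Arrays;
import java.util.Random;

public class CountingSortBench {

    // Same routine as in the question, repeated here so the sketch is self-contained.
    static void countingSort(int[] a, int low, int high) {
        int[] counts = new int[high - low + 1];
        for (int x : a)
            counts[x - low]++;
        int current = 0;
        for (int i = 0; i < counts.length; i++) {
            Arrays.fill(a, current, current + counts[i], i + low);
            current += counts[i];
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int n = 5_000_000;
        for (int k : new int[]{ 100, 100_000, 10_000_000 }) {   // k = size of the key range
            int[] data = new int[n];
            for (int i = 0; i < n; i++) data[i] = rnd.nextInt(k);

            int[] a = data.clone();
            long t0 = System.nanoTime();
            countingSort(a, 0, k - 1);
            long t1 = System.nanoTime();

            int[] b = data.clone();
            long t2 = System.nanoTime();
            Arrays.sort(b);                                      // O(n log n) dual-pivot quicksort
            long t3 = System.nanoTime();

            System.out.printf("k=%,-12d counting=%4d ms   Arrays.sort=%4d ms   same result=%b%n",
                    k, (t1 - t0) / 1_000_000, (t3 - t2) / 1_000_000, Arrays.equals(a, b));
        }
    }
}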
