Why does Java documentation not include time complexity? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I find it really surprising that Java does not specify any time or space complexities for any of the collection libraries. Given that garbage collection in Java is unpredictable, nothing is truly guaranteed, but wouldn't it be helpful to give at least an average time complexity? What am I missing here?

The time complexities depend on how you use the collections, but they generally follow the standard ones. You can find the time complexity of an array, linked list, tree, or hash map anywhere, yet there is no formal requirement that an implementation follow those complexities.
In short, time complexity describes an idealized machine, not a real machine with a real implementation, so even when you know the time complexity, the details of the actual use case can be more important.

The time complexities are mostly self-explanatory given the implementation. LinkedList adds items at the end in constant time and approaches linear time for adding items in the middle. HashMap offers near-constant access time. ArrayList's add is amortized constant time, dropping to linear only when it needs to grow the backing array, and so on.
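To make that concrete, here is a rough, hedged sketch (the class name and element counts are purely illustrative, and absolute timings depend entirely on your JVM, warm-up, and hardware):

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    // Unscientific timing sketch: only the relative shape of the numbers matters.
    public class CollectionTimingSketch {
        public static void main(String[] args) {
            int n = 50_000;

            List<Integer> arrayList = new ArrayList<>();
            List<Integer> linkedList = new LinkedList<>();

            long t = System.nanoTime();
            for (int i = 0; i < n; i++) arrayList.add(i);                 // amortized O(1)
            System.out.println("ArrayList add at end:    " + (System.nanoTime() - t) / 1_000_000 + " ms");

            t = System.nanoTime();
            for (int i = 0; i < n; i++) linkedList.add(i);                // O(1) per add
            System.out.println("LinkedList add at end:   " + (System.nanoTime() - t) / 1_000_000 + " ms");

            t = System.nanoTime();
            for (int i = 0; i < n; i++) arrayList.add(arrayList.size() / 2, i);  // O(n) shift each time
            System.out.println("ArrayList add in middle: " + (System.nanoTime() - t) / 1_000_000 + " ms");
        }
    }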

I don't know what you're talking about. HashSet:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
ArrayList:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
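As a hedged illustration of acting on those documented guarantees (the class name and sizes below are arbitrary, not taken from the docs):

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Pre-sizing sketch: keep HashSet capacity proportional to the expected size
    // (iteration cost is size + capacity), and give ArrayList its final size up
    // front to skip the intermediate grow-and-copy steps.
    public class PresizingSketch {
        public static void main(String[] args) {
            int expected = 500_000;

            Set<String> set = new HashSet<>(2 * expected);   // roomy but not absurdly large
            List<String> list = new ArrayList<>(expected);   // add() is amortized O(1) either way

            for (int i = 0; i < expected; i++) {
                String s = "item-" + i;
                set.add(s);
                list.add(s);
            }
            System.out.println(set.size() + " / " + list.size());
        }
    }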

There are time complexities documented for the major ones, along with warnings about particular methods' performance and references to the original works. I think these comments are useful, and bare big-O figures might not always be.
E.g. Arrays.sort(Object[] array):
Implementation note: This implementation is a stable, adaptive,
iterative mergesort that requires far fewer than n lg(n) comparisons
when the input array is partially sorted, while offering the
performance of a traditional mergesort when the input array is
randomly ordered. If the input array is nearly sorted, the
implementation requires approximately n comparisons. Temporary
storage requirements vary from a small constant for nearly sorted
input arrays to n/2 object references for randomly ordered input
arrays.
The implementation takes equal advantage of ascending and
descending order in its input array, and can take advantage of
ascending and descending order in different parts of the same
input array. It is well-suited to merging two or more sorted arrays:
simply concatenate the arrays and sort the resulting array.
The implementation was adapted from Tim Peters's list sort for
Python (TimSort: http://svn.python.org/projects/python/trunk/Objects/listsort.txt).
It uses techniques from Peter McIlroy's "Optimistic
Sorting and Information Theoretic Complexity", in Proceedings of the
Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474,
January 1993.
Or CopyOnWriteArrayList:
This is ordinarily too costly, but may be more efficient
than alternatives when traversal operations vastly outnumber
mutations, and is useful when you cannot or don't want to synchronize
traversals, yet need to preclude interference among concurrent
threads. The "snapshot" style iterator method uses a reference to
the state of the array at the point that the iterator was created.
This array never changes during the lifetime of the iterator, so
interference is impossible and the iterator is guaranteed not to
throw ConcurrentModificationException.
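A minimal sketch of that snapshot-iterator behaviour (the class name is just for the example):

    import java.util.Iterator;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Mutating the list mid-iteration neither affects the iteration nor throws
    // ConcurrentModificationException, because the iterator holds a snapshot.
    public class CowIteratorSketch {
        public static void main(String[] args) {
            List<String> list = new CopyOnWriteArrayList<>(new String[] {"a", "b", "c"});

            Iterator<String> it = list.iterator();  // snapshot of [a, b, c]
            list.add("d");                          // mutation after the iterator was created

            while (it.hasNext()) {
                System.out.println(it.next());      // prints a, b, c -- never sees "d"
            }
            System.out.println(list);               // [a, b, c, d]
        }
    }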

Related

About time complexity of arraylist and linkedlist

On page 290 of the book Data Structures and Algorithms it is mentioned that the complexity of remove(i) for an ArrayList is O(1). My first question is: why not O(n)? It is also mentioned that add(i, e) for a linked list is O(n), so my second question is: why not O(min(i, n-i))?
Finally, my third question: is the complexity given as O(min(i, n-i)) because it is a doubly linked list, meaning we could traverse either from the beginning (i) or from the end (n-i)?
The first one is debatable. When you remove the last element of an ArrayList, it's constant, but for a middle element you need to shift all successor elements to the left. Java does that using System.arraycopy(), a very fast native routine for copying arrays, but even that is clearly O(n), not constant, so I'm inclined to agree with you. It's different for an insert at the end, where the amortized cost of resizing the backing array averages out to a constant factor, so add() is O(1).
The second one could be implemented that way, but it isn't: traversal starts from the beginning only. I'm guessing that choice was made to reduce accidents from unsynchronized access.
Finally, in Big-O notation less significant factors are discarded, so O(min(i, n-i)) is equivalent to O(n), even though the real world tells us that the former would certainly be an optimization.
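To illustrate the shifting cost being discussed, here is a simplified sketch of what a mid-list removal involves; it mirrors the idea, not the exact ArrayList source:

    import java.util.Arrays;

    // Removing index i from an array-backed list: shift the (size - i - 1) trailing
    // elements left by one. System.arraycopy is fast, but it is still O(n) work.
    public class RemoveShiftSketch {
        public static void main(String[] args) {
            Object[] elements = {"a", "b", "c", "d", "e"};
            int size = elements.length;
            int i = 1; // remove "b"

            int numMoved = size - i - 1;
            if (numMoved > 0) {
                System.arraycopy(elements, i + 1, elements, i, numMoved);
            }
            elements[--size] = null; // clear the now-unused slot for the GC

            System.out.println(Arrays.toString(Arrays.copyOf(elements, size))); // [a, c, d, e]
        }
    }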

In Java, why is comparing two elements of an array relatively time consuming?

So I am working on a program that uses insertion sort, selection sort, and merge sort. I time all of them and make a table of which are the fastest. I understand why merge sort is more efficient than selection sort and insertion sort (because of how effectively it uses comparisons).
My question is: why is comparing elements of an array relatively time consuming, and why does it make insertion and selection sort less efficient?
Note: I am new to Java and couldn't find anything on this topic. Thanks for your responses.
My question is: why is comparing two elements of an array relatively time consuming ...
Relative to what?
In fact, the time taken to compare two instances of some class depends on how the compareTo or compare method is implemented. However, comparison is typically expensive because of the nature of the computation.
For example, if you have to compare two strings that are equal (but different objects), you have to compare each character in one string with the corresponding character in the other. For strings of length M, that is M character comparisons plus the overhead of looping over the characters. (Obviously, the comparison is cheaper in other cases, depending on how different the strings are, for example.)
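As a hedged sketch of that character-by-character cost (the helper below approximates what a lexicographic compareTo does; it is not the actual String implementation):

    // Comparing two equal strings of length M costs M character comparisons,
    // because the loop only exits early when it finds a difference.
    public class CompareCostSketch {
        static int compareChars(String a, String b) {
            int limit = Math.min(a.length(), b.length());
            for (int i = 0; i < limit; i++) {
                int diff = a.charAt(i) - b.charAt(i);
                if (diff != 0) {
                    return diff;              // early exit: strings differ at position i
                }
            }
            return a.length() - b.length();
        }

        public static void main(String[] args) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) sb.append('x');
            String x = sb.toString();
            String y = sb.toString();         // equal content, different object
            System.out.println(compareChars(x, y)); // 0, after a million comparisons
        }
    }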
and why does it make insertion and selection sort less efficient?
The reason that insertion and selection sort are slower (for large datasets) is because they do more comparisons than other more complicated algorithms. Given a dataset with N elements:
The number of comparisons for quicksort and similar algorithms is proportional to N * log N.
The number of comparisons for insertion sort and similar algorithms is proportional to N * N.
As N gets bigger, N * N grows faster than N * log N, irrespective of the constants of proportionality.
Assuming that the datasets and element classes are the same, if you do more comparisons, that takes more CPU time.
The other thing to note is that the number of comparisons performed by a sort algorithm is typically proportional to other CPU overheads of the algorithm. That means that it is typically safe (though not mathematically sound) to use the comparison count as a proxy for the overall complexity of a sort algorithm.
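A small, hedged sketch of using the comparison count as that proxy (the counting Comparator is just an illustration, not a standard utility):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicLong;

    // Wrap the natural order in a Comparator that counts invocations, then let the
    // library sort (TimSort for object arrays) do its work on random input.
    public class ComparisonCountSketch {
        public static void main(String[] args) {
            Integer[] data = new Random(42).ints(100_000).boxed().toArray(Integer[]::new);

            AtomicLong count = new AtomicLong();
            Comparator<Integer> counting = (a, b) -> {
                count.incrementAndGet();
                return a.compareTo(b);
            };

            Arrays.sort(data, counting);        // roughly proportional to n log n comparisons
            System.out.println("comparisons: " + count.get());
        }
    }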
why is comparing two elements of an array relatively time consuming
As asked in Stephen C's answer, relative to what?
Selection sort and insertion sort have time complexity O(n^2), while merge sort has time complexity O(n log(n)), so for reasonably large n, merge sort will be faster, but not because of compare overhead compared to the O(n^2) sorts.
With merge sort and an optimizing compiler, where the compared elements are loaded into registers (assuming the elements fit in registers), the compare overhead is small, since the subsequent move writes the value already held in a register rather than reading it from memory again.
As for compare overhead: when sorting an array of primitives, indexing accesses the primitives directly, but an array of objects is usually implemented as an array of references, so the compare overhead increases because of the extra dereferencing. This would matter when comparing quick sort with merge sort (more moves, fewer compares), but the issue with merge sort versus insertion sort or selection sort is their O(n^2) time complexity versus merge sort's O(n log(n)) time complexity.
In the case of sorting an array of objects, there's also the issue of sorting the pointers versus sorting the objects, which is a cache locality issue. Depending on object size, it may be better to sort the objects rather than sort the pointers, but this isn't really related to compare overhead as asked in the original question.

Is it possible to add/update a sorted list in constant time?

Suppose you are given a list of integers that have already been sorted such as (1,7,13,14,50). It should be noted that the list will contain no duplicates.
Is there some data structure that could store this while allowing me to add any new element (at its proper location) in constant time? add(10) would yield (1,7,10,13,14,50).
Similarly, would I be able to update an element (such as changing 7 to 19) and shift the order accordingly in constant time? change(7,19) yields (1,13,14,19,50).
For a class I need to write a data structure that performs these operations as quickly as possible, but I just wanted to know if constant time could be done and if not, then what would the ideal runtime be?
Constant-time insertion, O(1), would only occur as a best case for any of these data structures. Hash tables generally have the best insertion time, but it might not always be O(1) when there are collisions and separate chaining is used. A hash table is not sorted anyway, so its complexity is irrelevant here.
Binary trees have a good insertion time and, as a bonus, the structure is already sorted after inserting a new node. On average this takes O(log n) time, however; the best case of O(1) for insertion occurs only when the tree is empty.
Those were just a couple examples, see here for more info on the complexities of these operations: http://bigocheatsheet.com/
In general? No. Determining where to insert a new element or re-ordering the list after insertion involves analyzing the list's contents, which involves reading the elements of the list, which (in general) means iterating over some portion of its length. This (again, in general) is dependent on how many elements are in the list, which by definition is not a constant. Hence, a constant-time sorted insert is simply not possible except in special cases.
A binary tree, i.e. a TreeSet, would be adequate. An array with Arrays.binarySearch and Arrays.copyOf would be fine too, because here we have ints, so we do not need the wrapper class Integer.
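A hedged sketch of that binarySearch-plus-copy idea (the helper name is made up; finding the slot is O(log n), making room is O(n)):

    import java.util.Arrays;

    // Insert a value into a sorted, duplicate-free int[] using binarySearch to find
    // the slot and arraycopy to make room for it.
    public class SortedArrayInsertSketch {
        static int[] insert(int[] sorted, int value) {
            int pos = Arrays.binarySearch(sorted, value);
            if (pos >= 0) {
                return sorted;                 // no duplicates: value already present
            }
            int insertionPoint = -pos - 1;     // binarySearch encodes the slot as -(point) - 1
            int[] result = new int[sorted.length + 1];
            System.arraycopy(sorted, 0, result, 0, insertionPoint);
            result[insertionPoint] = value;
            System.arraycopy(sorted, insertionPoint, result, insertionPoint + 1,
                             sorted.length - insertionPoint);
            return result;
        }

        public static void main(String[] args) {
            int[] list = {1, 7, 13, 14, 50};
            System.out.println(Arrays.toString(insert(list, 10))); // [1, 7, 10, 13, 14, 50]
        }
    }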
For real constant time, O(1), one must pay in space. Use a BitSet. To add 17 simply set 17 to true. There are optimized methods to find the next set bit and so on.
But I doubt optimizing is really needed at this spot. File I/O might pay off more.
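And a minimal sketch of the BitSet approach described above (the class name is illustrative):

    import java.util.BitSet;

    // O(1) add/update, paid for with space proportional to the largest value
    // rather than the number of elements.
    public class BitSetSketch {
        public static void main(String[] args) {
            BitSet set = new BitSet();
            for (int v : new int[] {1, 7, 13, 14, 50}) {
                set.set(v);                 // "add" in constant time
            }
            set.set(10);                    // add(10)
            set.clear(7);                   // change(7, 19): remove 7 ...
            set.set(19);                    // ... and add 19, both O(1)

            // Iterate in sorted order using nextSetBit.
            for (int i = set.nextSetBit(0); i >= 0; i = set.nextSetBit(i + 1)) {
                System.out.print(i + " ");  // 1 10 13 14 19 50
            }
        }
    }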

Loop Through a set of Objects

I have a map of key-value pairs of huge size, approximately 10^7, and I have to loop through it 15 times a second in order to update its contents.
Is there any class or structure that offers good complexity and reduces the time needed to loop through it?
Currently I am using a TreeMap, but its log n complexity applies only to contains, put, get and remove; looping through the elements is still of complexity n.
Do you know of any structure, or do you have any idea, that could reduce the complexity below n?
If you have to loop over the entire collection arbitrarily, you will not get better than n. If all you ever do is loop over the whole collection, you could use a simple ArrayList; but if you also need to access specific data in the collection by key, a TreeMap is fine.
You can't beat the O(n) bound on any sequential (or finitely parallel) computer, if your problem is just to look at all of O(n) values.
If you have a finitely parallel machine and depending on exactly how you're updating the elements, you could achieve speedup. For instance, using CUDA and a GPU or OpenMP/MPI and a cluster/multi-core workstation, you could compute A[i] = A[i]^3 or some such with good speedup. Of course, then there's the question of communication... but this might be something to look at.
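Staying within plain Java rather than CUDA or MPI, a hedged sketch of the same idea using the standard library's parallel array operations (the array size and update rule are arbitrary):

    import java.util.Arrays;

    // A data-parallel update: the loop over all n elements still happens,
    // but the work is spread over the common fork-join pool.
    public class ParallelUpdateSketch {
        public static void main(String[] args) {
            long[] a = new long[10_000_000];
            Arrays.setAll(a, i -> i);

            // a[i] = a[i]^3, computed in parallel
            Arrays.parallelSetAll(a, i -> a[i] * a[i] * a[i]);

            System.out.println(a[3]); // 27
        }
    }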

Is a Java hashmap search really O(1)?

I've seen some interesting claims on SO regarding Java hashmaps and their O(1) lookup time. Can someone explain why this is so? Unless these hashmaps are vastly different from any of the hashing algorithms I was brought up on, there must always exist a dataset that contains collisions.
In which case, the lookup would be O(n) rather than O(1).
Can someone explain whether they are O(1) and, if so, how they achieve this?
A particular feature of a HashMap is that, unlike, say, balanced trees, its behavior is probabilistic. In such cases it's usually most helpful to talk about complexity in terms of the probability of a worst-case event occurring. For a hash map, that is of course a collision, with respect to how full the map happens to be. A collision is pretty easy to estimate.
p(collision) = n / capacity
So a hash map with even a modest number of elements is pretty likely to experience at least one collision. Big O notation allows us to do something more compelling. Observe that for any arbitrary, fixed constant k:
O(n) = O(k * n)
We can use this feature to improve the performance of the hash map. We could instead think about the probability of at most 2 collisions.
p(2 collisions) = (n / capacity)^2
This is much lower. Since the cost of handling one extra collision is irrelevant to Big O performance, we've found a way to improve performance without actually changing the algorithm! We can generalize this to
p(k collisions) = (n / capacity)^k
And now we can disregard some arbitrary number of collisions and end up with vanishingly tiny likelihood of more collisions than we are accounting for. You could get the probability to an arbitrarily tiny level by choosing the correct k, all without altering the actual implementation of the algorithm.
We talk about this by saying that the hash map has O(1) access with high probability.
You seem to mix up worst-case behaviour with average-case (expected) runtime. The former is indeed O(n) for hash tables in general (i.e. not using a perfect hashing) but this is rarely relevant in practice.
Any dependable hash table implementation, coupled with a half decent hash, has a retrieval performance of O(1) with a very small factor (2, in fact) in the expected case, within a very narrow margin of variance.
How does a HashMap work in Java?
The hashCode is used to locate the corresponding bucket (within the internal array of buckets).
Each bucket is a LinkedList (or a Balanced Red-Black Binary Tree under some conditions starting from Java 8) of items residing in that bucket.
The items are scanned one by one, using equals for comparison.
When adding more items, the HashMap is resized (doubling the size) once a certain load percentage is reached.
So, sometimes it will have to compare against a few items, but generally, it's much closer to O(1) than O(n) / O(log n).
For practical purposes, that's all you should need to know.
Remember that O(1) does not mean that each lookup only examines a single item; it means that the average number of items checked remains constant w.r.t. the number of items in the container. So if it takes on average 4 comparisons to find an item in a container with 100 items, it should also take an average of 4 comparisons to find an item in a container with 10000 items, and for any other number of items (there's always a bit of variance, especially around the points at which the hash table rehashes, and when there's a very small number of items).
So collisions don't prevent the container from having O(1) operations, as long as the average number of keys per bucket remains within a fixed bound.
I know this is an old question, but there's actually a new answer to it.
You're right that a hash map isn't really O(1), strictly speaking, because as the number of elements gets arbitrarily large, eventually you will not be able to search in constant time (and O-notation is defined in terms of numbers that can get arbitrarily large).
But it doesn't follow that the real time complexity is O(n)--because there's no rule that says that the buckets have to be implemented as a linear list.
In fact, Java 8 implements the buckets as TreeMaps once they exceed a threshold, which makes the actual time O(log n).
Lookup is O(1 + n/k), where k is the number of buckets.
If the implementation sets k = n/α, then it is O(1 + α) = O(1), since α is a constant.
If the number of buckets (call it b) is held constant (the usual case), then lookup is actually O(n).
As n gets large, the number of elements in each bucket averages n/b. If collision resolution is done in one of the usual ways (linked list for example), then lookup is O(n/b) = O(n).
The O notation is about what happens when n gets larger and larger. It can be misleading when applied to certain algorithms, and hash tables are a case in point. We choose the number of buckets based on how many elements we're expecting to deal with. When n is about the same size as b, then lookup is roughly constant-time, but we can't call it O(1) because O is defined in terms of a limit as n → ∞.
Elements inside a HashMap are stored as an array of linked lists (nodes); each linked list in the array represents a bucket for the unique hash value of one or more keys.
When adding an entry to the HashMap, the hashcode of the key is used to determine the location of the bucket in the array, something like:
location = (arraylength - 1) & keyhashcode
Here the & represents bitwise AND operator.
For example: 100 & "ABC".hashCode() = 100 & 64578 = 64 (the location of the bucket for the key "ABC")
During the get operation it uses the same approach to determine the bucket location for the key. In the best case, each key has a unique hashcode that results in a unique bucket for each key; in that case the get method spends time only on determining the bucket location and retrieving the value, which is constant, O(1).
In the worst case, all the keys have the same hashcode and are stored in the same bucket, which means traversing the entire list and leads to O(n).
In the case of Java 8, the linked-list bucket is replaced with a TreeMap if it grows to more than 8 entries, which reduces the worst-case search cost to O(log n).
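A hedged sketch of that bucket-selection arithmetic (the spreading step mirrors what the Java 8 implementation is commonly described as doing, but treat the details here as illustrative rather than authoritative):

    // Compute a bucket index from a key's hashCode, using a power-of-two table
    // length the way HashMap sizes its internal array.
    public class BucketIndexSketch {
        static int bucketIndex(Object key, int tableLength) {
            int h = key.hashCode();
            h = h ^ (h >>> 16);              // spread high bits into the low bits
            return (tableLength - 1) & h;    // works because tableLength is a power of two
        }

        public static void main(String[] args) {
            System.out.println("ABC".hashCode());        // 64578
            System.out.println(bucketIndex("ABC", 16));  // index within a 16-bucket table
            System.out.println(bucketIndex("ABC", 64));  // index within a 64-bucket table
        }
    }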
We've established that the standard description of hash table lookups being O(1) refers to the average-case expected time, not the strict worst-case performance. For a hash table resolving collisions with chaining (like Java's hashmap) this is technically O(1+α) with a good hash function, where α is the table's load factor. Still constant as long as the number of objects you're storing is no more than a constant factor larger than the table size.
It's also been explained that strictly speaking it's possible to construct input that requires O(n) lookups for any deterministic hash function. But it's also interesting to consider the worst-case expected time, which is different than average search time. Using chaining this is O(1 + the length of the longest chain), for example Θ(log n / log log n) when α=1.
If you're interested in theoretical ways to achieve constant time expected worst-case lookups, you can read about dynamic perfect hashing which resolves collisions recursively with another hash table!
It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions.
Whether you need to grow the table when you add items or not is not relevant to the question because it is about lookup time.
This basically goes for most hash table implementations in most programming languages, as the algorithm itself doesn't really change.
If there are no collisions present in the table, you only have to do a single look-up, so the running time is O(1). If there are collisions present, you have to do more than one look-up, which degrades the performance towards O(n).
It depends on the algorithm you choose to avoid collisions. If your implementation uses separate chaining, then the worst-case scenario happens when every data element hashes to the same value (a poor choice of hash function, for example). In that case, data lookup is no different from a linear search on a linked list, i.e. O(n). However, the probability of that happening is negligible, and the best and average cases for lookups remain constant, i.e. O(1).
Only in the theoretical case where hashcodes are always different and the bucket for every hash code is also different will lookup be exactly O(1). Otherwise it is still of constant order, i.e. as the hashmap grows, its order of search remains constant.
Academics aside, from a practical perspective, HashMaps should be accepted as having an inconsequential performance impact (unless your profiler tells you otherwise.)
Of course the performance of the hashmap will depend on the quality of the hashCode() function for the given object. However, if the function is implemented such that the possibility of collisions is very low, it will have very good performance (this is not strictly O(1) in every possible case, but it is in most cases).
For example, the default implementation in the Oracle JRE is to use a random number (which is stored in the object instance so that it doesn't change; it also disables biased locking, but that's another discussion), so the chance of collisions is very low.
