I know the underlying data structure for a HashSet is an array. I thought I could get a random value from the HashSet by using iterator().next().
I looked at the source code but couldn't really tell. Does the iterator not traverse the values in the HashSet in a random order?
The iterator traverses the elements bucket by bucket of the hash table, and the bucket an element lands in depends on its hash code, so the elements come back in an arbitrary order that may well look random. That order is, however, consistent for a given HashSet size and contents. Because the order is arbitrary, hash-based containers make no guarantees about the iteration order of their elements, but they also make no effort to randomize it.
Random access, in data-structure terms, means that you can reach any element directly by specifying an index, as with an array. Lists are also random access, since they have a get() method. If you want the elements in a random order, you can put them in a List and then shuffle the list:
List<Integer> list = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9));
Collections.shuffle(list);
for (int i : list) {
    System.out.println(i);
}
This prints something like the following, with no repeated elements:
4
5
3
7
6
1
9
8
2
If instead you just want to draw values at random, possibly with repeats, then use Random as suggested: generate a random index from 0 (inclusive) to list.size() (exclusive) and retrieve the value with list.get(), as in the sketch below. You can do that as often as required without ever exhausting the supply.
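A minimal sketch of that idea, reusing the list variable from the shuffle example above:
Random random = new Random();
for (int n = 0; n < 5; n++) {
    int index = random.nextInt(list.size()); // from 0 (inclusive) to list.size() (exclusive)
    System.out.println(list.get(index));     // the same value may come up more than once
}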
Related
I need to store a permutation of n integers and be able to compute both the permutation of a value and the inverse operation in efficient time.
I.e., I need to store a reordering of the values [0...n-1] in such a way that I can ask for position(i) and value(j) (with 0 <= i, j < n).
For example, suppose we have the following permutation of values:
[7,2,3,6,0,4,8,9,1,5]
I need the following operations:
position(9) = 7
value(7) = 9
I know of C++ libraries for that, such as https://github.com/fclaude/libcds2.
Is there any structure or library in Java that supports this and is efficient in space and time?
If there are no duplicates, the List interface will suit your needs.
It provides the following methods:
List#get(index) returns the element at position index
List#indexOf(element) returns the index of the first occurrence of element
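A rough sketch of how that maps onto the example permutation (keep in mind that indexOf is a linear scan, so each position lookup costs O(n)):
List<Integer> perm = new ArrayList<>(List.of(7, 2, 3, 6, 0, 4, 8, 9, 1, 5));
int position = perm.indexOf(9); // 7 : the index where the value 9 is stored
int value = perm.get(7);        // 9 : the value stored at index 7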
I was asked this question in a recent interview.
You are given an array that has a million elements. All the elements are duplicates except one. My task is to find the unique element.
var arr = [3, 4, 3, 2, 2, 6, 7, 2, 3........]
My approach was to go through the entire array in a for loop and build a map, with the number from the array as the key and its frequency of occurrence as the value. Then loop through the map again and return the key whose value is 1.
I said my approach would take O(n) time. The interviewer told me to optimize it to less than O(n) complexity. I said that we cannot, as we have to go through the entire array of a million elements.
Finally, he didn't seem satisfied and moved onto the next question.
I understand going through million elements in the array is expensive, but how could we find a unique element without doing a linear scan of the entire array?
PS: the array is not sorted.
I'm certain that you can't solve this problem without going through the whole array, at least without additional information (such as the elements being sorted or restricted to certain values), so the problem has a minimum time complexity of O(n). You can, however, reduce the memory complexity to O(1) with an XOR-based solution, provided every element other than the unique one appears an even number of times, which seems to be the most common variant of the problem, if that's of any interest to you:
int unique(int[] array)
{
    int unpaired = array[0];
    for (int i = 1; i < array.length; i++)
        unpaired = unpaired ^ array[i]; // x ^ x == 0, so values occurring an even number of times cancel out
    return unpaired;
}
Basically, each pair of identical elements cancels out under XOR, so the result is the one element that didn't cancel.
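For example, with a small illustrative array where every value except 7 appears twice:
int[] array = {3, 4, 3, 2, 2, 6, 7, 6, 4};
System.out.println(unique(array)); // prints 7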
Assuming the array is unordered, you can't. Each value is independent of the others, so nothing can be deduced about one value from any of the other values.
If it's an ordered array of values, then that's another matter and depends entirely on the ordering used.
I agree the easiest way is to have another container and store the frequency of the values.
In fact, since the number of elements in the array is fixed, you could do much better than what you have proposed.
By "creating a map with the number from the array as the index and its frequency of occurrence as the value", you create a map with 2^32 positions (assuming the array holds 32-bit integers), and then you have to pass through that map to find the first position whose value is one. That means you are using a large amount of auxiliary space, and in the worst case you are doing about 10^6 + 2^32 operations (one million to create the map and 2^32 to find the element).
Instead of doing so, you could sort the array with some n*log(n) algorithm and then search for the element in the sorted array, because in your case, n = 10^6.
For instance, using the merge sort, you would use a much smaller auxiliary space (just an array of 10^6 integers) and would do about (10^6)*log(10^6)+10^6 operations to sort and then find the element, which is approximately 21*10^6 (many many times smaller than 10^6+2^32).
PS: sorting the array reduces the search itself from a quadratic to a linear cost, because in a sorted array we only have to look at the adjacent positions to check whether the element at the current position is unique, as in the sketch below.
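A rough sketch of that sort-then-scan idea (the method name is illustrative, and it assumes the unique element actually exists):
static int findUnique(int[] array) {
    int[] sorted = array.clone();  // keep the caller's array untouched
    java.util.Arrays.sort(sorted); // O(n log n)
    for (int i = 0; i < sorted.length; i++) {
        boolean differsFromPrev = (i == 0) || sorted[i] != sorted[i - 1];
        boolean differsFromNext = (i == sorted.length - 1) || sorted[i] != sorted[i + 1];
        if (differsFromPrev && differsFromNext) {
            return sorted[i]; // no equal neighbour on either side: this value is unique
        }
    }
    throw new IllegalArgumentException("no unique element found");
}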
Your approach seems fine. It could be that he was looking for an edge case where the array is of even size, meaning there is either no unmatched element or there are two or more. He just went about asking it the wrong way.
I just stumbled upon this statement about the java.util.HashSet and it goes "This class makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time." Can any one explain the statement?
Statement source: the java.util.HashSet Javadoc.
HashSet uses N buckets and stores each element, based on its hash code, in one of those buckets to make searching faster: when you search for an element, the set calculates the element's hash to know which bucket to look in, then checks whether that bucket contains the element. This makes searching roughly N times faster, since the set doesn't need to check the other N-1 buckets.
For a small number of elements, the number of buckets can be small. But as more elements arrive, the buckets start to contain more elements, which means that searching slows down. To solve this problem, the set adds more buckets and rearranges its elements to use the new ones.
Now, when we iterate over a HashSet, we start with the elements from the first bucket, then the second bucket, and so on. You can see that sets based purely on buckets can't guarantee a stable order of elements, since the bucket layout can change as elements are added.
Because HashSet is not ordered, the iterator simply walks all buckets and steps through each bucket's contents in turn. This means that if more items are added so that the buckets are rehashed, the order can change.
E.g. if you have 1,2,3 and you iterate you may well get 1,3,2. Also, if you later add 4 you could then get 4,2,3,1 or any other order.
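A small sketch that makes this visible; the exact orders printed depend on the JVM version and the hash codes involved, so they may differ on your machine:
Set<String> set = new HashSet<>();
set.add("one");
set.add("two");
set.add("three");
System.out.println(set); // printed in some arbitrary order, not necessarily insertion order

// adding many more elements can force the set to resize (rehash) its buckets,
// and the original three elements may then appear in a different relative order
for (int i = 0; i < 100; i++) {
    set.add("element" + i);
}
System.out.println(set);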
It basically means a HashSet has no defined order, so you should not rely on the order of its values in your code.
If your set contains the values {2, 1, 3} (in insertion order), nothing guarantees that iterating over it will return {2, 1, 3} or {1, 2, 3}.
I am relatively new to Java and I just want to make sure I get the basic concepts correctly. So my question is: how is a HashMap different from a 2D array? I will illustrate with an example, and if someone could correct me where I am wrong, that would be great. So:
You cannot access or change the first dimension of a 2D array directly, in contrast to a HashMap. For example, if you have arr[2][5], you cannot change the first part, arr[2], to something else. In other words, if we have int arr[2][2], you cannot change it to, say, arr[Cars][2], whereas with a HashMap you can. You cannot even access it at all, whereas with a HashMap you can: if you have the mapping Martin -> 25, you can easily change it to Joe -> 22.
You can search a HashMap quite easily by its key. Say you want to find the age of Martin from the previous example: you simply look up Martin and the age 25 comes back.
I have been taught that 2d arrays represent a table. Something like.
arr[2][3]
1 [1 , 2 , 3]
2 [1 , 2 , 3]
But in reality you cannot access or change the 1 and 2 outside the [] grid; they serve only as an imaginary aid to illustrate the concept of 2D arrays.
Could you please correct me if I am wrong or make any additional comments on that.
Thank you
A HashMap uses keys and values, not indices, so you can only search by key and cannot access entries by numeric position. Keys must be unique: you cannot have two identical keys, and if you put something under an existing key, the old value is replaced. The key works a bit like the index of an array, but it can be any object, whereas an array's indexes must be int primitives.
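A short sketch of those points, with illustrative names:
Map<String, Integer> ages = new HashMap<>();
ages.put("Martin", 25);
ages.put("Joe", 22);

System.out.println(ages.get("Martin")); // 25 : looked up by key, not by numeric index

ages.put("Martin", 26);                 // same key again: the old value 25 is replaced
System.out.println(ages.get("Martin")); // 26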
It's like comparing apples and oranges.
A 2D array is just a two-dimensional grid of objects; a HashMap is a particular kind of associative array (also called a dictionary or map) which associates generic keys with generic values. HashMap is not the only implementation: TreeMap, for example, exists too and provides roughly the same interface with a totally different implementation underneath.
The other main difference is that a HashMap is built to fulfill a specific requirement that doesn't arise with an array: being able to store sparse keys without wasting too much space, while keeping the complexity of get and put operations constant.
This can be seen easily:
int[] intMap = new int[10];
HashMap<Integer,Integer> hashIntMap = new HashMap<Integer,Integer>();
Now suppose that you want to insert the pair (500,100):
intMap[500] = 100;
hashIntMap.put(500, 100);
In the first case you need enough room in the array (at least 501 elements) to be able to access the cell at index 500. In a HashMap there is no such requirement, since elements are placed by hash code and bucketed into far fewer cells than that.
I need a space-efficient collection to store a large list of primitive ints (around 800,000 of them) that allows fast contains() checks and iteration in a defined order.
Fast contains() checks, to see whether a given int is in the collection or not, are the main priority, as that is done very frequently.
I'm open to using widely used & popular 3rd party libraries like Trove, Guava & such others.
I have looked at TIntSet from Trove but I believe that would not let me define the order of iteration anyhow.
Edit:
The size of collection would be around 800,000 ints.
The range of values in the collection will be from 0 to Integer.MAX_VALUE. The order of iteration should actually be based on the order in which I add the values to the collection, or maybe I just provide an ordered int[] and it should iterate in that same order.
As a data structure I would choose an array of longs (each of which I logically treat as two ints). The high int part (bits 63-32) holds the int value you add to the collection. The low int part (bits 31-0) holds the index of the successor when iterating. For your 800,000 unique integers you need to create a long array of size 800,000.
Now you organize the array as a binary balanced tree ordered by your values. To the left the smaller values and to the right the higher values. You need two more tracking values: one int to point to the first index to start iterating at and one int to point to the index of the value inserted last.
Whenever you add a new value, reorganize your binary balanced tree and update the pointer from the last value added pointing to the currently added value (as indexes).
Wrap these values (the array and both ints) in the collection of your choice.
With this data structure you get a search performance of O(log(n)) and a memory usage of two times the size of values.
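A minimal sketch of just the packing part of that idea (the balancing and rebalancing of the tree is left out; the method names are illustrative):
// high 32 bits: the stored int value; low 32 bits: index of the iteration successor
static long pack(int value, int successorIndex) {
    return ((long) value << 32) | (successorIndex & 0xFFFFFFFFL);
}

static int value(long entry) {
    return (int) (entry >> 32); // arithmetic shift keeps the sign of negative values
}

static int successorIndex(long entry) {
    return (int) entry;         // keeps only the low 32 bits
}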
As this reeks of a database, but you require a more direct approach, use a memory-mapped file via java.nio. In particular, a self-defined ordering of 800_000 ints will not work well otherwise. The contains() check could be realized with an in-memory BitSet, though, kept in parallel with the ordering in the file.
You can use two sets: one hash-based set (e.g. Trove's TIntSet) for fast contains operations, and another tree-based set, like TreeSet, to iterate in a specific order.
Whenever you need to add an int, you update both sets at the same time, as sketched below.
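A rough sketch of that idea using plain java.util classes (a TreeSet iterates in sorted order, so this variant fits if the defined order you want is the natural ordering; Trove's primitive sets would avoid the boxing overhead):
class DualSet {
    private final Set<Integer> fastContains = new HashSet<>(); // hash-based: O(1) contains on average
    private final Set<Integer> ordered = new TreeSet<>();      // tree-based: iterates in sorted order

    void add(int value) {
        fastContains.add(value); // keep both structures in sync on every insert
        ordered.add(value);
    }

    boolean contains(int value) {
        return fastContains.contains(value);
    }

    Iterable<Integer> inOrder() {
        return ordered;
    }
}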
It sounds like LinkedHashSet might be what you're looking for. Internally it maintains a hash table plus a linked list running through its entries, allowing for both fast contains() (from the former) and a defined, insertion-based iteration order (from the latter).
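For example (illustrative values; iteration follows insertion order):
Set<Integer> set = new LinkedHashSet<>();
set.add(42);
set.add(7);
set.add(1_000_000);

System.out.println(set.contains(7)); // true, O(1) on average
for (int value : set) {
    System.out.println(value);       // 42, 7, 1000000 : the order they were added in
}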
Just use an ArrayList<Integer>.