Data structure for permutations in Java

I need to store a permutation of n integers and be able to compute both the permutation of a value and the inverse operation efficiently.
That is, I need to store a reordering of the values [0...n-1] in such a way that I can ask for position(i) and value(j) (with 0 <= i,j < n).
For example, suppose we have the following permutation of values:
[7,2,3,6,0,4,8,9,1,5]
I need the following operations:
position(7) = 9
value(9) = 7
I know libraries in C++ for that, such as: https://github.com/fclaude/libcds2
Is there any structure or library in Java that does this and is efficient in space and time?

If there are no duplicates, the List interface will suit your needs.
It provides the following methods:
List#get(index) returns the element at position index
List#indexOf(element) returns the index of the first occurrence of element, or -1 if it is not present
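Note that ArrayList#indexOf scans the list linearly, so the inverse lookup costs O(n) per call. If that is too slow, one possible alternative is to store the permutation and its inverse in two int arrays, which makes both lookups O(1) at the cost of 2n ints. The following is only a minimal sketch: the class name Permutation is hypothetical, and position/value follow the question's example (position(7) = 9, value(9) = 7).

```java
import java.util.Arrays;

// Hypothetical helper, not a library class: a permutation of 0..n-1 with O(1) lookups both ways.
final class Permutation {
    private final int[] forward; // forward[i] = element stored at index i
    private final int[] inverse; // inverse[v] = index at which element v is stored

    Permutation(int[] values) {
        int n = values.length;
        forward = values.clone();
        inverse = new int[n];
        Arrays.fill(inverse, -1);
        for (int i = 0; i < n; i++) {
            int v = forward[i];
            // Reject out-of-range or duplicated values: the input must be a permutation.
            if (v < 0 || v >= n || inverse[v] != -1) {
                throw new IllegalArgumentException("not a permutation of 0.." + (n - 1));
            }
            inverse[v] = i;
        }
    }

    // Named after the question's example: position(7) = 9.
    int position(int i) {
        return forward[i];
    }

    // Named after the question's example: value(9) = 7.
    int value(int j) {
        return inverse[j];
    }

    public static void main(String[] args) {
        Permutation p = new Permutation(new int[] {7, 2, 3, 6, 0, 4, 8, 9, 1, 5});
        System.out.println(p.position(7)); // 9
        System.out.println(p.value(9));    // 7
    }
}
```

This trades memory for speed; a compressed representation such as the one in the C++ library mentioned in the question would use less space but is not part of the standard Java library.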

Related

Is there a more efficient way to reduce Java TreeSet to a subset based on index?

I am currently implementing the Mash algorithm for genome comparison. For this I need to create a sketch S(A) for each genome, which is a set of certain size s containing the lowest hash values corresponding to k-mers in the genome. To compare two genomes, I am computing the Jaccard index, for which I need an additional sketch of the union of the two genomes, i.e. S(A u B) for two genomes A and B. This should contain the s lowest hash values found in the union of A and B.
I am computing the sketches as a TreeSet because in the original algorithm to compute a sketch I need to remove the biggest value from the set whenever I add a new value that is lower and the sketch has already reached the maximum size s. This is very easily accomplished using TreeSet because the largest value will be in the last position of the set.
I have computed a union of two sketches and now want to remove the larger elements to reduce the sketch size to size s. I first implemented this using a while loop and removing the last element until reaching the desired size.
The following is my code using an example TreeSet and size s = 10:
SortedSet<Integer> example = new TreeSet<>();
for (int i = 0; i < 15; i++) {
    example.add(i);
}
while (example.size() > 10) example.remove(example.last());
However, in the real application sketch sizes will be much larger and the size of the union can be up to two times the size of a single sketch. Trying to find a more efficient way to reduce the sketch size I found that you could convert the TreeSet to an array. Thus, my second approach would be the following:
Object[] temp = example.toArray();
int value = (int) temp[10];
example = example.headSet(value);
So, here I am getting the value at index s from the array, which I can then use to create a headSet from the TreeSet.
Now, I am wondering if there is a more efficient way to reduce the size of the TreeSet, where I don't need to iterate over the size of the TreeSet over and over again or generate an extra array.
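Two possible variations, shown only as a sketch under the assumption that the set is declared as a NavigableSet (which TreeSet implements): pollLast() removes the largest element in a single call, and a stream with limit(s) copies the s smallest elements without building an intermediate array. The class and method names are hypothetical. Unlike headSet, which returns a view backed by the original set, the second method returns an independent copy.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper for trimming a sorted sketch to its s smallest values.
final class SketchTrim {

    // Removes the largest elements in place until at most s remain.
    static void trimInPlace(NavigableSet<Integer> set, int s) {
        while (set.size() > s) {
            set.pollLast(); // retrieves and removes the last (largest) element
        }
    }

    // Copies the s smallest elements into a new TreeSet; the input is left untouched.
    static TreeSet<Integer> smallest(NavigableSet<Integer> set, int s) {
        return set.stream()           // a TreeSet streams in ascending order
                  .limit(s)
                  .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        NavigableSet<Integer> example = new TreeSet<>();
        for (int i = 0; i < 15; i++) {
            example.add(i);
        }
        System.out.println(smallest(example, 10)); // the ten smallest values
        trimInPlace(example, 10);
        System.out.println(example.size());        // 10
    }
}
```

Which variant is faster depends on how far the union exceeds s, so it is worth measuring both on realistic sketch sizes.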

Java - HashSet: Is iterator not random access?

I know the underlying data structure of a HashSet is an array. I thought I could get a random value from the HashSet by using iterator().next().
I looked at the source code but couldn't really tell. Does iterator not traverse the values in the hashset in a random order?
The iterator will traverse the elements by hash table bucket which is based on the hash code of the objects, and thus they will be in an arbitrary order which might certainly seem random, however they will be consistent for a given HashSet size and contents. Because the order is arbitrary, hash-based containers make no guarantees about the iteration order of their elements, but they do not make any effort to randomize the order.
Random access, in terms of data structures, means that you can get the elements in an array-like operation using an index. It lets you select any location by specifying that index. Array-backed lists such as ArrayList are also random access through their get() method. If you want to get the elements in a random order, you could put them in a List and then shuffle the list.
List<Integer> list = new ArrayList<>(List.of(1,2,3,4,5,6,7,8,9));
Collections.shuffle(list);
for (int i : list) {
    System.out.println(i);
}
prints something like the following without repeated elements.
4
5
3
7
6
1
9
8
2
If you just want to pick values at random, possibly with repeats, use Random as suggested: generate a random index from 0 (inclusive) to list.size() (exclusive) and retrieve the value using list.get(). You can do that as often as required without exhausting the supply.
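Picking with repeats could look roughly like the following sketch. The class and method names are hypothetical, and it assumes the set is non-empty, since Random#nextInt(bound) requires a positive bound.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: draws elements from a set at random, allowing repeats.
final class RandomPick {

    static <T> List<T> sampleWithRepeats(Set<T> set, int count, Random rng) {
        // Copy once into an index-addressable list; a HashSet has no get(index).
        List<T> indexed = new ArrayList<>(set);
        List<T> picks = new ArrayList<>(count);
        for (int k = 0; k < count; k++) {
            // nextInt(bound) returns a value in [0, bound), so every index is valid.
            picks.add(indexed.get(rng.nextInt(indexed.size())));
        }
        return picks;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9));
        System.out.println(sampleWithRepeats(set, 5, new Random()));
    }
}
```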

Fastest way to find number of elements in a range

Given an array with n elements, how to find the number of elements greater than or equal to a given value (x) in the given range index i to index j in O(log n) or better complexity?
My implementation is below, but it is O(n):
for (a = i; a <= j; a++)
    if (p[a] >= x) // p[] is the array containing n elements
        count++;
If you are allowed to preprocess the array, then with O(n log n) preprocessing time, we can answer any [i,j] query in O(log n) time.
Two ideas:
1) Observe that it is enough to be able to answer prefix queries [0,i-1] and [0,j]: the answer for [i,j] is the difference between the two.
2) Use a persistent* balanced order statistics binary tree, which maintains n versions of the tree; version i is formed from version i-1 by adding a[i] to it. To answer query([0,i], x), you query the version i tree for the number of elements >= x (basically rank information). An order statistics tree lets you do that.
*: persistent data structures are an elegant functional programming concept for immutable data structures and have efficient algorithms for their construction.
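The two ideas can be sketched as follows. Note the substitution: instead of a persistent balanced order statistics tree, this sketch uses a persistent segment tree over value ranks, which supports the same prefix-version counting and is shorter to write. All class, field, and method names are hypothetical; indices i and j are 0-based and inclusive.

```java
import java.util.Arrays;

// Hypothetical sketch: counts elements >= x in p[i..j] in O(log n) after O(n log n) preprocessing.
final class RangeCountAtLeast {
    private final int[] sorted;           // distinct values of p, ascending (rank = index)
    private final int[] left, right, cnt; // node pools; node 0 is the shared empty tree
    private final int[] root;             // root[k] = version containing p[0..k-1]
    private final int m;                  // number of ranks (at least 1)
    private int nodes = 1;

    RangeCountAtLeast(int[] p) {
        sorted = Arrays.stream(p).sorted().distinct().toArray();
        m = Math.max(1, sorted.length);
        // Each insertion copies one root-to-leaf path of at most floor(log2 m) + 2 nodes.
        int perInsert = 34 - Integer.numberOfLeadingZeros(m);
        int cap = 1 + p.length * perInsert;
        left = new int[cap];
        right = new int[cap];
        cnt = new int[cap];
        root = new int[p.length + 1];
        for (int k = 0; k < p.length; k++) {
            int rank = Arrays.binarySearch(sorted, p[k]);
            root[k + 1] = insert(root[k], 0, m - 1, rank);
        }
    }

    // Returns a new version equal to prev plus one more element at rank pos.
    private int insert(int prev, int lo, int hi, int pos) {
        int node = nodes++;
        left[node] = left[prev];
        right[node] = right[prev];
        cnt[node] = cnt[prev] + 1;
        if (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (pos <= mid) left[node] = insert(left[prev], lo, mid, pos);
            else right[node] = insert(right[prev], mid + 1, hi, pos);
        }
        return node;
    }

    // Number of elements with rank >= from that are in version newer but not in version older.
    private int countFrom(int newer, int older, int lo, int hi, int from) {
        if (from <= lo) return cnt[newer] - cnt[older];
        if (from > hi) return 0;
        int mid = (lo + hi) >>> 1;
        return countFrom(left[newer], left[older], lo, mid, from)
             + countFrom(right[newer], right[older], mid + 1, hi, from);
    }

    // Number of elements >= x among p[i..j], with 0 <= i <= j < n.
    int count(int i, int j, int x) {
        int rank = Arrays.binarySearch(sorted, x);
        if (rank < 0) rank = -rank - 1;       // first rank whose value is >= x
        if (rank >= sorted.length) return 0;  // x exceeds every value
        return countFrom(root[j + 1], root[i], 0, m - 1, rank);
    }
}
```

Subtracting version i from version j+1 is exactly idea 1: both are prefix queries, and their difference covers the range [i,j]. This only applies when the array does not change between queries.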
If the array is sorted, you can locate the first value greater than or equal to X with a binary search; the number of elements greater than or equal to X is the number of items from that position to the end. That would be O(log(n)).
If the array is not sorted there is no way of doing it in less than O(n) time since you will have to examine every element to check if it's greater than or equal to X.
Impossible in O(log N) because you have to inspect all the elements, so a O(N) method is expected.
The standard algorithm for this is based on quicksort's partition, sometimes called quick-select.
The idea is that you don't sort the array, but rather just partition the section containing x, and stop when x is your pivot element. After the procedure is completed you have all elements x and greater to the right of x. This is the same procedure as when finding the k-th largest element.
Read about a very similar problem at How to find the kth largest element in an unsorted array of length n in O(n)?.
The requirement index i to j is not a restriction that introduces any complexity to the problem.
Given your requirements where the data is not sorted in advance and constantly changing between queries, O(n) is the best complexity you can hope to achieve, since there's no way to count the number of elements greater than or equal to some value without looking at all of them.
It's fairly simple if you think about it: you cannot avoid inspecting every element of a range for any type of search if you have no idea how it's represented/ordered in advance.
You could construct a balanced binary tree, even radix sort on the fly, but you're just pushing the overhead elsewhere to the same linear or worse, linearithmic O(NLogN) complexity since such algorithms once again have you inspecting every element in the range first to sort it.
So there's actually nothing wrong with O(N) here. That is the ideal, and you're looking at either changing the whole nature of the data involved outside to allow it to be sorted efficiently in advance or micro-optimizations (ex: parallel fors to process sub-ranges with multiple threads, provided they're chunky enough) to tune it.
In your case, your requirements seem rigid so the latter seems like the best bet with the aid of a profiler.

Pseudo Range Minimum Query

I have a problem with my assignment which requires me to solve a problem that is similar to range-minimum-query. The problem is roughly described below:
I am supposed to code a Java program which reads in a large number of integers (about 100,000) and stores them in some data structure. Then, my program must answer queries for the minimum number in a given range [i,j]. I have successfully devised an algorithm to solve this problem. However, it is just not fast enough.
The pseudo-code for my algorithm is as follows:
// Read all the integers into an ArrayList
// For each query,
// Read in range values [i,j] (note that i and j is "actual index" + 1 in this case)
// Push element at index i-1 into a Stack
// Loop from index i to j-1 in the ArrayList (tracking the current index with variable k)
[Begin loop]
// If element at k is less than the one at the top of the stack, push the element at k onto the Stack.
[End of loop]
Could someone please advise me on what I could do so that my algorithm would be fast enough to solve this problem?
The assignment files can be found at this link: http://bit.ly/1bTfFKa
I have been stumped by this problem for days. Any help would be much appreciated.
Thanks.
Your problem is a static range minimum query (RMQ). Suppose you have N numbers. The simplest algorithm you could use stores the numbers in an array of size N and keeps a second array of size sqrt(N) holding the minimum of each block of sqrt(N) consecutive elements. A query then combines the precomputed minima of the blocks that lie fully inside the range with a scan of the partial blocks at either end, which takes O(sqrt(N)) time. This should work since N is not very large, but if you have many queries you may want to use a different algorithm.
That being said, the fastest algorithm you could use is making a Sparse Table out of the numbers, which will allow you to answer the queries in O(1). Constructing the sparse table is O(NlogN) which, given N = 10^5 should be just fine.
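A sparse table could be sketched as below. The class and method names are hypothetical, and the query takes 0-based inclusive indices, so the 1-based i and j from the question would need to be decremented first.

```java
// Hypothetical sketch of a sparse table for static range-minimum queries.
final class SparseTableRmq {
    // table[k][i] = minimum of a[i .. i + 2^k - 1]
    private final int[][] table;

    SparseTableRmq(int[] a) {
        int n = a.length;
        // floor(log2 n) + 1 levels, so that the top level still covers at least one window.
        int levels = 32 - Integer.numberOfLeadingZeros(Math.max(1, n));
        table = new int[levels][];
        table[0] = a.clone();
        for (int k = 1; k < levels; k++) {
            int len = n - (1 << k) + 1;  // number of windows of size 2^k
            int half = 1 << (k - 1);
            table[k] = new int[len];
            for (int i = 0; i < len; i++) {
                // A window of size 2^k is the union of two windows of size 2^(k-1).
                table[k][i] = Math.min(table[k - 1][i], table[k - 1][i + half]);
            }
        }
    }

    // Minimum of a[i..j] for 0 <= i <= j < n, in O(1).
    int min(int i, int j) {
        int k = 31 - Integer.numberOfLeadingZeros(j - i + 1); // floor(log2(length))
        // Two overlapping windows of size 2^k cover the whole range.
        return Math.min(table[k][i], table[k][j - (1 << k) + 1]);
    }

    public static void main(String[] args) {
        SparseTableRmq rmq = new SparseTableRmq(new int[] {5, 2, 8, 1, 9, 3, 7});
        System.out.println(rmq.min(0, 6)); // 1
        System.out.println(rmq.min(4, 6)); // 3
    }
}
```

The overlap between the two windows is harmless for minimum because taking the minimum of an element twice does not change the result; the same trick would not work for sums.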
Finally, the ultimate RMQ algorithm is using a Segment Tree, which also supports updates (single-element as well as ranges), and it's O(N) to construct the Segment Tree, and O(logN) per query and update.
All of these algorithms are explained very well here.
For more information on segment trees, see these tutorials I wrote myself.
link
Good Luck!

Subsets of a given Set of Integers whose sum is a Constant N : Java

Given a set of integers, how do I find a subset that sums to a given value (the subset sum problem)?
Example: S = {1,2,4,3,2,5} and n = 7
I want to find the possible subsets whose sum is n.
I searched online and found many links, but they were not clear.
How can we solve this in Java, what data structure should be used, and what is its complexity?
In three steps:
Find the powerset of S (the set of all subsets of S)
Compute the sum of each subset
Filter out subsets that did not sum to 7.
I won't give you any code, but I will explain how it works.
Run a loop from 0 to (2^k-1), where k is the number of elements in the set.
For each value of the loop counter, a 1 at bit position i of its binary representation indicates that the i-th element is chosen, and a 0 that it is not.
Test whether the sum of the chosen numbers is equal to n.
The above method will evaluate each possible subset of the given set.
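The loop described above could be sketched as follows. The class and method names are hypothetical, and the sketch assumes at most 30 elements so that the 2^k masks fit in an int; the running time is O(k * 2^k).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerates every subset with a bitmask and keeps those with the target sum.
final class SubsetSumBruteForce {

    static List<List<Integer>> subsetsWithSum(int[] s, int target) {
        int k = s.length;
        if (k > 30) {
            throw new IllegalArgumentException("too many elements for an int bitmask");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << k); mask++) {
            int sum = 0;
            List<Integer> chosen = new ArrayList<>();
            for (int bit = 0; bit < k; bit++) {
                // Bit number 'bit' set in mask means element s[bit] is in this subset.
                if ((mask & (1 << bit)) != 0) {
                    sum += s[bit];
                    chosen.add(s[bit]);
                }
            }
            if (sum == target) {
                result.add(chosen);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (List<Integer> subset : subsetsWithSum(new int[] {1, 2, 4, 3, 2, 5}, 7)) {
            System.out.println(subset);
        }
    }
}
```

Because the example contains the value 2 twice, subsets that differ only in which 2 they pick are reported separately.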
If the upper limit of the values is small, then a dynamic programming approach could be used.
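One way to read the dynamic programming suggestion is the classic reachable-sums table, sketched below with hypothetical names. It assumes non-negative values and a non-negative target, and it only decides whether some subset reaches the target rather than listing the subsets; the running time is O(k * n) for k elements and target n.

```java
// Hypothetical sketch: decides whether some subset of s sums to target.
final class SubsetSumDp {

    static boolean canReach(int[] s, int target) {
        if (target < 0) {
            throw new IllegalArgumentException("target must be non-negative");
        }
        // reachable[sum] is true if some subset of the elements seen so far adds up to sum.
        boolean[] reachable = new boolean[target + 1];
        reachable[0] = true; // the empty subset
        for (int v : s) {
            if (v < 0) {
                throw new IllegalArgumentException("values must be non-negative");
            }
            // Iterate downwards so that each element is used at most once.
            for (int sum = target; sum >= v; sum--) {
                if (reachable[sum - v]) {
                    reachable[sum] = true;
                }
            }
        }
        return reachable[target];
    }

    public static void main(String[] args) {
        System.out.println(canReach(new int[] {1, 2, 4, 3, 2, 5}, 7)); // true
    }
}
```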
