I need to add objects from a sourceList to a collection that keeps itself sorted as objects are added. I am thinking of using a TreeSet.
TreeSet bookSet
Based on certain conditions, I need to take a subset of the bookSet. The subset will be the first N elements. The value of N is known only after the entire bookSet has been prepared from another sourceList.
Is there any way I can get the subset of bookSet using index N, similar to arrayList.subList(0, N)?
I can use headSet, but for that I would need to know the (N+1)th element.
Depending on what you're trying to achieve:
You can use the TreeSet.iterator() and iterate N times.
In Java 8 you can use bookSet.stream().limit(N)
You can simply copy into a new ArrayList(bookSet) and then take a subList.
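A quick sketch of the three options, assuming a TreeSet<String> bookSet that is already filled and an int n that is already known (names are illustrative):
// Option 1: iterate n times
List<String> firstN = new ArrayList<>();
Iterator<String> it = bookSet.iterator();
for (int i = 0; i < n && it.hasNext(); i++) {
    firstN.add(it.next());
}
// Option 2: Java 8 streams
List<String> viaStream = bookSet.stream().limit(n).collect(Collectors.toList());
// Option 3: copy into an ArrayList and take a subList view
List<String> viaSubList = new ArrayList<>(bookSet).subList(0, n);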
Related
Consider I have a List<String> listA with 2 million records and another listB with 20 thousand records.
I want to compare and check how many elements from listB are not contained in listA.
A very basic method is listA.contains(listB.get(i)). But for 20 thousand records it will traverse listA 20 thousand times, and the time complexity will be O(n*m), where n and m are the sizes of the two lists.
Is there any better way to do it ?
You could use a HashSet (or a LinkedHashSet if the order of elements is important). A Set is a collection that contains no duplicate elements, and inserting/searching is much faster than searching in a List. If you store your own objects rather than Strings, you will need to implement the equals and hashCode methods.
If you need a List, you can convert it back after the searching:
List<Object> list = new ArrayList<Object>(hashset);
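A rough sketch of the whole check, assuming listA and listB are already populated List<String> instances (names follow the question):
// Build the set once: roughly O(n)
Set<String> setA = new HashSet<>(listA);
// Each contains() is O(1) on average, so the whole pass over listB is roughly O(m)
int missing = 0;
for (String s : listB) {
    if (!setA.contains(s)) {
        missing++;
    }
}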
I need a structure (ArrayList, LinkedList, etc.) that is very fast for this case:
While the structure is not empty, I search it for elements that satisfy a condition, let's say k, remove the elements that satisfy k, and start over with another condition, let's say k+1.
e.g.:
List<Integer> structure = new ArrayList<>();
for (int i = 1; i <= 1000000; i++) {
    structure.add(i);
}
int d = 2;
while (!structure.isEmpty()) {
    // iterate over a copy so that elements can be removed from the original
    for (int boom : new ArrayList<>(structure)) {
        if (boom % d == 2) {
            structure.remove(Integer.valueOf(boom)); // remove by value, not by index
        }
    }
    d++;
}
If the elements are primitives, then the fastest structure will most probably be a specialized primitive collection (e.g., Trove). Following references to boxed primitives is an almost sure cache miss, and that probably dominates the cost.
I wouldn't suggest a LinkedList for the same reason: It's dead slow due to cache misses.
If the order is unimportant, then an ArrayList is perfect. Instead of removing an element, replace it with the last one and remove the last array element. This is an O(1) operation and doesn't suffer from bad spatial locality.
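A minimal sketch of that swap-remove trick, assuming an ArrayList<Integer> list and an index i of the element to remove (names are illustrative):
// Overwrite the element to remove with the last element, then drop the last slot.
// Both operations are O(1); only the relative order of the remaining elements changes.
int last = list.size() - 1;
list.set(i, list.get(last));
list.remove(last);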
If the order is important, you can build your own ArrayList-like structure. Instead of removing an element, you mark it for removal e.g. in a BitSet or in a boolean[]. Finally you perform the removal in one sweep by moving all elements to their right position and adjusting the length. The optimized loop will most probably look similar to CharMatcher.removeFrom loop.
A simpler solution would be to use an ArrayList and copy all surviving elements to another one. I'd bet it'd beat the LinkedList hands down. As a minor GC-friendly optimization you can work with two lists.
LinkedList should be fastest for this case. Use the iterator explicitly (structure.iterator()) and call the remove method of the iterator instead of calling structure.remove(element)!
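A short sketch of that pattern, assuming a LinkedList<Integer> structure and the condition on d from the question (names are illustrative):
Iterator<Integer> it = structure.iterator();
while (it.hasNext()) {
    int boom = it.next();
    if (boom % d == 2) {
        it.remove(); // O(1) removal at the iterator's current position
    }
}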
I don't know your exact use case, but here's one note.
If you have your predicates P1 .. PN pre-compiled and available, if you are not modifying the contents of the collection, and if your predicates are not dependent on each other, you might want to create a composite predicate: bundle up the N predicates in some logical order and then perform the filtering in only one iteration over your collection.
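For example, a hedged sketch with Java 8 predicates, assuming a List<Integer> structure as in the question's example (the individual conditions here are made up):
// Combine the individual conditions into one composite predicate
Predicate<Integer> p1 = x -> x % 2 == 0;
Predicate<Integer> p2 = x -> x % 3 == 0;
Predicate<Integer> composite = p1.or(p2);
// One pass over the collection removes everything matched by any condition
structure.removeIf(composite);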
As for the data structure, I'd think of it like this:
If my filtering predicates will be totally arbitrary, then a list should be OK to use.
In some more specific cases with very limited and strict value sets, you might consider a tree-like or a graph-like structure, where you could have some master nodes which would denote that property "property1" has value "value1". In case you wanted to drop all items where "property1" value is "value1" you could tell that master node to remove all his children (and that they should detach themselves from any other parent master nodes they might have).
Sorted List data structure
If you construct the lists yourself, you can consider using a sorted data structure. It will give you the best search performance (log n complexity, so it is very fast).
Linked List data structure
LinkedList gives you constant-time element removal, but random access doesn't have constant complexity (i.e., it is slow).
You will have to benchmark whether a LinkedList or a sorted list is faster for your scenario.
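For the sorted-list option, a minimal sketch of the log n lookup, assuming a List<Integer> sortedList that is kept sorted and an int value to find (names are illustrative):
// Collections.binarySearch requires the list to be sorted
int pos = Collections.binarySearch(sortedList, value);
if (pos >= 0) {
    sortedList.remove(pos); // found: remove it (this still shifts elements in an ArrayList)
}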
If your elements are ints, I suppose a bit set would be the fastest data structure for this task. Iteration would be slightly slower than through an array list (even a primitive specialization rather than the standard java.util.ArrayList), but remove ops cost nearly nothing, while removes from any array list are quite expensive.
Note, you can gain a lot by working directly with a long[] as a bit set and performing the bitwise operations by hand, because java.util.BitSet is not very performance-focused. But, of course, start with BitSet.
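A rough sketch of one filtering pass with BitSet, using the value range and condition from the question's example (the loop structure is an assumption):
BitSet structure = new BitSet(1000001);
structure.set(1, 1000001); // mark the values 1..1000000 as present
int d = 2;
// iterate over the set bits and clear the ones that satisfy the condition
for (int boom = structure.nextSetBit(1); boom >= 0; boom = structure.nextSetBit(boom + 1)) {
    if (boom % d == 2) {
        structure.clear(boom); // "removal" is a single bit flip
    }
}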
I have n objects, each of them with an identifying number. I get them unsorted, but the range of indexes (0, n-1) is used to identify them. I want to access them as fast as possible. I suppose an ArrayList would be the best option; I'd add the object with identifier n at the position of the ArrayList with index n by:
list.add(identifier, object);
The problem is that when I am adding the objects I get an IndexOutOfBoundsException, because I'm adding them unsorted and size() is smaller, although I know that the earlier positions will also be filled.
Another option is to use a HashMap but I suppose that this will decrease performance.
Do you know a collection that has the behavior described above?
Do you know a collection that has the behavior described above?
It sounds like you need a plain old Java array. And if you need it as a collection, then use "Arrays.asList(...)" to create a List wrapper for it.
Now this won't work if you need to add or remove elements from the array / collection, but it sounds like you don't need to from your problem description.
If you do need to add / remove elements (as distinct from using set to update the element at a given position), then Peter Lawrey's approach is best.
By contrast, a HashMap<Integer, Object> would be an expensive alternative. At a rough estimate, I'd say that "indexing" operations would be at least 10 times slower, and the data structure would take 10 times the space compared to an equivalent array or ArrayList type. A hash table based solution is only really a viable alternative (from a performance perspective) if the array is large and sparse.
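A minimal sketch of the plain-array approach, assuming n is known up front (MyObject, identifier, and object are illustrative names):
MyObject[] objects = new MyObject[n];
// store each object at the slot given by its identifier, in any arrival order
objects[identifier] = object;
// wrap it as a fixed-size List if a Collection view is needed
List<MyObject> asList = Arrays.asList(objects);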
Sometimes you get the indexes out of order. This requires you to add dummy entries which may be filled later.
int indexToAdd = ...
E elementToAdd = ...
while(list.size() <= indexToAdd) list.add(null);
list.set(indexToAdd, elementToAdd);
This will allow you to add entries beyond the current end of the list.
The Javadocs for List.add(int, E) and List.set(int, E) both state:
IndexOutOfBoundsException - if the index is out of range (index < 0 || index > size())
If you attempt to add entries beyond the end
List list = new ArrayList();
list.add(1, 1);
you get
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.rangeCheckForAdd(ArrayList.java:612)
at java.util.ArrayList.add(ArrayList.java:426)
at Main.main(Main.java:28)
I'm not sure how much more expensive a HashMap<Integer, T> would be. Integer.hashCode() is quite efficient, though the only really expensive operation might be copying the data into a new larger array as the number of items increases. However, if you know your n, you could use a normal array.
As an alternative, you could implement your own Map<Integer, T> that does not use the hash code but the Integer itself. Before you do this, make sure that neither an array is sufficient nor HashMap is efficient enough!
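A very rough sketch of that idea, using the int key itself as the slot (an illustrative fragment, not a full Map implementation):
class IntIndexedStore<T> {
    private final Object[] slots;
    IntIndexedStore(int capacity) {
        slots = new Object[capacity];
    }
    void put(int key, T value) {
        slots[key] = value; // the key is the array index; no hashing involved
    }
    @SuppressWarnings("unchecked")
    T get(int key) {
        return (T) slots[key];
    }
}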
I think you have at least two good options.
The first is to use a straight Java array initialized to the appropriate length, then add objects with the syntax:
theArray[i] = object;
The second would be to use a HashMap, and add objects with the syntax:
theMap.put(i, object);
I'm not sure what performance issues you're worried about, but adding elements within the range, clearing (out of an array) or removing (out of a HashMap), and finding elements from a given index (or key, for a HashMap) are all O(1) for both structures. I would also suggest taking a look at Wikipedia's list of data structures if neither of these seem good.
Currently I'm using an ArrayList to store a list of elements, and I need to insert new elements at specific positions. There is a need for me to enter elements at a position larger than the current size. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add(3,"hi");
Now I already know there will be an IndexOutOfBoundsException. Is there another way, or another object, where I can do this while still keeping the order? This is because I have methods that find elements based on their index. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add("hi");
arr.add(0,"hello");
I would expect to find "hi" at index 1 instead of index 0 now.
So in summary, short of manually inserting null into the elements in-between, is there any way to satisfy these two requirements:
Insert elements into position larger than current size
Push existing elements to the right when I insert elements in the middle of the list
I've looked at Java ArrayList add item outside current size, as well as HashMap, but HashMap doesn't satisfy my second criterion. Any help would be greatly appreciated.
P.S. Performance is not really an issue right now.
UPDATE: There have been some questions on why I have these particular requirements. It is because I'm working on operational transformation, where I'm inserting a set of operations into, say, my list (a math formula). Each operation contains a string. As I insert/delete strings into my list, I will dynamically update the unapplied operations (if necessary) by tracking each operation that has already been applied. My current solution is to use a subclass of ArrayList and override some of the methods. I would certainly like to know if there is a more elegant way of doing so though.
Your requirements are contradictory:
... I will need to insert new elements at specific positions.
There is a need for me to enter elements at a position larger than the current size.
These imply that positions are stable; i.e. that an element at a given position remains at that position.
I would expect to find "hi" at index 1 instead of index 0 now.
This states that positions are not stable under some circumstances.
You really need to make up your mind which alternative you need.
If you must have stable positions, use a TreeMap or HashMap. (A TreeMap allows you to iterate the keys in order, but at the cost of more expensive insertion and lookup ... for a large collection.) If necessary, use a "position" key type that allows you to "always" generate a new key that goes between any existing pair of keys.
If you don't have to have stable positions, use an ArrayList, and deal with the case where you have to insert beyond the end position using append.
I fail to see how it is sensible for positions to be stable if you insert beyond the end, and allow instability if you insert in the middle. (Besides, the latter is going to make the former unstable eventually ...)
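A small sketch of the stable-position variant with a TreeMap, using Integer positions as keys (the keys never shift when other entries are added):
TreeMap<Integer, String> ops = new TreeMap<>();
ops.put(3, "hi");    // position 3 exists even though 0..2 are absent
ops.put(0, "hello"); // does not shift "hi"; it is still at key 3
// iterate in key order
for (Map.Entry<Integer, String> e : ops.entrySet()) {
    System.out.println(e.getKey() + " -> " + e.getValue());
}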
You can even use a TreeMap to maintain the order of the keys.
First and foremost, I would say use a Map instead of a List. I guess your problem can be solved in a better way if you use a Map. But in any case, if you really want to do this with an ArrayList:
ArrayList<String> a = new ArrayList<String>(); // create an empty list
a.addAll(Arrays.asList(new String[100])); // pad with n null entries; here n is 100, but you will have to decide the right value depending on your requirement
a.add(7, "hello");
a.add(2, "hi");
a.add(1, "hi2");
Use the Vector class to solve this issue.
Vector vector = new Vector();
vector.setSize(100);
vector.set(98, "a");
When "setSize" is set to 100 then all 100 elements gets initialized with null values.
For those who are still dealing with this, you may do it like this.
Object[] array = new Object[10];
array[0] = "1";
array[3] = "3";
array[2] = "2";
array[7] = "7";
List<Object> list = Arrays.asList(array);
But the thing is you need to know the total size first. (This should just be a comment, but I do not have enough reputation to post one.)
I have a source of strings (let us say, a text file) and many strings repeat multiple times. I need to get the top X most common strings in the order of decreasing number of occurrences.
The idea that came to mind first was to create a sortable Bag (something like org.apache.commons.collections.bag.TreeBag) and supply a comparator that will sort the entries in the order I need. However, I cannot figure out what type of objects I need to compare. It should be some kind of internal map that combines my object (String) with the number of occurrences, generated internally by TreeBag. Is this possible?
Or would I be better off simply using a HashMap and sorting it by value, as described in, for example, Java sort HashMap by value?
Why don't you put the strings in a map: a map from string to the number of times it appears in the text.
In step 2, traverse the entries in the map and keep adding them to a min-heap of size X. If the heap is full, extract the minimum first before inserting.
This takes O(n log x) time.
Otherwise, after step 1, sort the items by number of occurrences and take the first x items. A TreeMap would come in helpful here :) (I'd add a link to the Javadocs, but I'm on a tablet.)
This takes O(n log n) time.
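A hedged sketch of the counting-plus-min-heap approach, assuming the strings are already in a List<String> called strings and x is the number of top entries wanted (names are illustrative):
// Step 1: count occurrences
Map<String, Integer> counts = new HashMap<>();
for (String s : strings) {
    counts.merge(s, 1, Integer::sum);
}
// Step 2: keep the x entries with the highest counts in a min-heap
PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(Map.Entry.comparingByValue());
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    heap.offer(e);
    if (heap.size() > x) {
        heap.poll(); // drop the current minimum so only the top x remain
    }
}
// heap now holds the top x strings; polling returns them in increasing order of count,
// so collect and reverse to get decreasing order.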
With Guava's TreeMultiset, just use Multisets.copyHighestCountFirst.
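A small sketch of that, assuming Guava is on the classpath and the strings are in an Iterable<String> called strings (names are illustrative):
Multiset<String> multiset = TreeMultiset.create();
for (String s : strings) {
    multiset.add(s);
}
// copyHighestCountFirst returns an immutable multiset whose iteration order is highest count first
ImmutableMultiset<String> byCount = Multisets.copyHighestCountFirst(multiset);
int shown = 0;
for (Multiset.Entry<String> entry : byCount.entrySet()) {
    System.out.println(entry.getElement() + ": " + entry.getCount());
    if (++shown == x) break; // x = how many top strings to print
}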