Using built-in Comparators in combination with custom Comparators - java

I want to sort the following example list which currently contains only Strings with my own custom rules.
ArrayList<String> coll = new ArrayList<>();
coll.add("just");
coll.add("sdsd");
coll.add("asb");
coll.add("b as");
coll.add("just");
coll.add("dhfga");
coll.add("jusht");
coll.add("ktsa");
coll.add("just");
coll.add("just");
I know that I could write my own comparator for this, but since Java also ships comparators that partially solve this problem, I want to know how I can use the ones from the Java API in combination with my own.
How should it be sorted?
The word just should always be the first word in the list, followed by all other words in alphabetical order.
Comparator.naturalOrder() sorts the list in alphabetical order, but how can I combine this comparator with a custom one that checks whether the word is just or something else?

You can do it something like this:
coll.sort(Comparator
.comparingInt((String s) -> s.equals("just") ? 0 : 1) // Words "just" first
.thenComparing(Comparator.naturalOrder())); // Then others

You could integrate the criteria into the comparator like
coll.sort(Comparator.comparing((String s) -> !s.equals("just"))
.thenComparing(Comparator.naturalOrder()));
or you separate the operations, first moving all occurrences of "just" to the front, then sorting the remaining elements only:
int howManyJust = 0;
for(int ix = 0, num = coll.size(); ix < num; ix++)
if(coll.get(ix).equals("just") && ++howManyJust <= ix)
Collections.swap(coll, ix, howManyJust-1);
coll.subList(howManyJust, coll.size()).sort(Comparator.naturalOrder());
while this may look more complicated, it is potentially more efficient, especially for larger lists.
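For reference, a minimal self-contained sketch that applies the comparator-based approach to the sample list from the question (the class name is just for this demo):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class JustFirstDemo {
    public static void main(String[] args) {
        List<String> coll = new ArrayList<>(Arrays.asList(
                "just", "sdsd", "asb", "b as", "just",
                "dhfga", "jusht", "ktsa", "just", "just"));

        // "just" sorts first, everything else in natural (alphabetical) order
        coll.sort(Comparator
                .comparingInt((String s) -> s.equals("just") ? 0 : 1)
                .thenComparing(Comparator.naturalOrder()));

        System.out.println(coll);
        // prints: [just, just, just, just, asb, b as, dhfga, jusht, ktsa, sdsd]
    }
}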

The first step should be to define the custom order. I would do that by using a Map.
Map<String, Integer> orderMap = new HashMap<>();
int order = 0;
for(String specialWord : yourListOfSpecialWords){
orderMap.put(specialWord, order++);
}
Now build a comparator using that map, with natural order as a fallback:
Comparator<String> comparator = ((Comparator<String>) (o1, o2) -> {
int leftScore = orderMap.getOrDefault(o1, Integer.MAX_VALUE);
int rightScore = orderMap.getOrDefault(o2, Integer.MAX_VALUE);
return Integer.compare(leftScore, rightScore);
}).thenComparing(String::compareTo);
Use this comparator to sort your list. Note: you probably want to initialize your map only once and keep it in a constant or at least in a cache.
But if your special case is only a single word, as your update suggests, then this is of course overkill, and you should go with one of the other answers here.
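As a concrete illustration, here is a self-contained sketch of the map-based approach with a hypothetical list of two special words (all names and sample data are made up for demonstration):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpecialWordOrderDemo {
    public static void main(String[] args) {
        // hypothetical special words; they should come first, in this order
        List<String> specialWords = Arrays.asList("just", "ktsa");

        Map<String, Integer> orderMap = new HashMap<>();
        int order = 0;
        for (String specialWord : specialWords) {
            orderMap.put(specialWord, order++);
        }

        // rank by the map first, fall back to natural String order
        Comparator<String> comparator = ((Comparator<String>) (o1, o2) -> {
            int leftScore = orderMap.getOrDefault(o1, Integer.MAX_VALUE);
            int rightScore = orderMap.getOrDefault(o2, Integer.MAX_VALUE);
            return Integer.compare(leftScore, rightScore);
        }).thenComparing(String::compareTo);

        List<String> words = new ArrayList<>(Arrays.asList("sdsd", "ktsa", "asb", "just", "dhfga"));
        words.sort(comparator);
        System.out.println(words); // [just, ktsa, asb, dhfga, sdsd]
    }
}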

Related

Getting the indices of an unsorted double array after sorting

This question comes as a companion to this one, which was about the fastest way to sort a double array.
Now I want to get the top-k indices corresponding to the unsorted array.
I have implemented this version which (unfortunately) uses autoboxing and HashMap as proposed in some answers including this one:
HashMap<Double, Integer> map = new HashMap<Double, Integer>();
for(int i = 0; i < numClusters; i++) {
map.put(scores[i], i);
}
Arrays.sort(scores);
HashSet<Integer> topPossibleClusters = new HashSet<Integer>();
for(int i = 0; i < numClusters; i++) {
topPossibleClusters.add(map.get(scores[numClusters - (i+1)]));
}
As you can see, this uses a HashMap whose keys are the Double values of the original array and whose values are the indices into the original array.
So, after sorting the original array I just retrieve the index from the map.
I also use a HashSet because I am interested in deciding whether an int is included in this set, using the .contains() method. (I don't know if this makes a difference since, as I mentioned in the other question, my arrays are small - about 50 elements - but if it does not make a difference, please point that out.)
I am not interested in the value per se, only the indices.
My question is whether there is a faster approach to go with it?
This sort of interlinking of collections lends itself to fragile, easily broken, hard-to-debug, unmaintainable code.
Instead create an object:
class Data {
double value;
int originalIndex;
}
Create an array of Data objects storing the original value and index.
Sort them using a custom comparator that looks at data.value and sorts descending.
Now the top X items in your array are the ones you want and you can just look at the value and originalIndex as you need them.
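A minimal sketch of that idea, assuming the k largest values are wanted (class and method names here are placeholders, not from the original question):
import java.util.Arrays;
import java.util.Comparator;

public class TopKIndices {

    static class Data {
        final double value;
        final int originalIndex;

        Data(double value, int originalIndex) {
            this.value = value;
            this.originalIndex = originalIndex;
        }
    }

    // returns the original indices of the k largest values in scores
    static int[] topKIndices(double[] scores, int k) {
        Data[] data = new Data[scores.length];
        for (int i = 0; i < scores.length; i++) {
            data[i] = new Data(scores[i], i);
        }
        // sort descending by value
        Arrays.sort(data, Comparator.comparingDouble((Data d) -> d.value).reversed());

        int[] topIndices = new int[Math.min(k, data.length)];
        for (int i = 0; i < topIndices.length; i++) {
            topIndices[i] = data[i].originalIndex;
        }
        return topIndices;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 0.9, 0.1, 0.7};
        System.out.println(Arrays.toString(topKIndices(scores, 2))); // [1, 3]
    }
}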
As Tim points out, linking multiple collections together is rather error-prone. I would suggest using a TreeMap, as this allows for a standalone solution.
Let's say you have double[] data; first copy it into a TreeMap:
final TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
for(int i = 0; i < data.length; ++i) {
dataWithIndex.put(data[i], i);
}
N.B. You can declare dataWithIndex as a NavigableMap to be less specific, but it's so much longer and it doesn't really add much as there is only one implementation in the JDK.
This will populate the Map in O(n lg n) time, as each put is O(lg n) - the same complexity as sorting. In reality it will probably be a little slower, but it will scale in the same way.
Now, say you need the first k elements; you first need to find the kth key - this is O(k):
final Iterator<Double> keyIter = dataWithIndex.keySet().iterator();
double kthKey = Double.NaN; // initialized so the compiler sees a definitely assigned value even when k is 0
for (int i = 0; i < k; ++i) {
kthKey = keyIter.next();
}
Now you just need to get the sub-map that has all the entries up to the kth entry:
final Map<Double, Integer> topK = dataWithIndex.headMap(kthKey, true);
If you only need to do this once, then with Java 8 you can do something like this:
List<Entry<Double, Integer>> topK = IntStream.range(0, data.length).
mapToObj(i -> new SimpleEntry<>(data[i], i)).
sorted(comparing(Entry::getKey)).
limit(k).
collect(toList());
i.e. take an IntStream over the indices of data and mapToObj each index to an Entry of data[i] => i (using the AbstractMap.SimpleEntry implementation). Now sort that using Entry::getKey and limit the size of the Stream to k entries, then simply collect the result to a List. This has the advantage of not clobbering duplicate entries in the data array.
It is almost exactly what Tim suggests in his answer, but using an existing JDK class.
This method is also O(n lg n). The catch is that if the TreeMap approach is reused then it's O(n lg n) to build the Map but only O(k) to reuse it. If you want to use the Java 8 solution with reuse then you can do:
List<Entry<Double, Integer>> sorted = IntStream.range(0, data.length).
mapToObj(i -> new SimpleEntry<>(data[i], i)).
sorted(comparing(Entry::getKey)).
collect(toList());
i.e. don't limit the size to k elements. Now, to get the first k elements you just need to do:
List<Entry<Double, Integer>> subList = sorted.subList(0, k);
The magic of this is that it's O(1).

Find number of distinct elements in a linked list

I have a LinkedList that contains many objects. How can I find the number and frequency of the distinct elements in the LinkedList?
You can iterate the list with a for-each loop while maintaining a histogram.
The histogram will actually be a Map<T,Integer> where T is the type of the elements in the linked list.
If you use a HashMap, this gives you an O(n) average-case algorithm - just be sure you override equals() and hashCode() for your T elements. (If T is a built-in class, like Integer or String, you don't need to worry about this; they already override these methods.)
The idea is simple: iterate the list, and for each element look it up in the histogram. If it is not there, insert it with value 1, since you just saw it for the first time. If it is already in the histogram, re-insert it with the same key and value + 1.
It should look something like this (list is of type LinkedList<Integer>):
Map<Integer,Integer> histogram = new HashMap<Integer, Integer>();
for (Integer x : list) {
Integer value = histogram.get(x);
if (value == null) histogram.put(x,1);
else histogram.put(x, value + 1);
}
A simpler variation of the histogram solution with a Guava Multiset:
Multiset<Integer> multiset = HashMultiset.create();
multiset.addAll(linkedList);
int count = multiset.count(element); // number of occurrences of element
Set<Integer> distinctElements = multiset.elementSet();
// set of all the unique elements seen
(Disclosure: I work on Guava.)
#amit's answer is good, but I want to share a slight variation (and can't format a block of code in comment - otherwise this would just be a comment). I like to make two passes, one to create the histogram elements and the second to populate them. This feels cleaner to me, although it may be less efficient.
Map<Integer,Integer> histogram = new HashMap<Integer, Integer>();
for (Integer n : list)
histogram.put(n, 0);
for (Integer n : list)
histogram.put(n, histogram.get(n) + 1);
The LambdaJ Library offers a few interesting methods to query collections very easily as well:
List<Jedi> jedis = asList(
new Jedi("Luke"), new Jedi("Obi-wan"), new Jedi("Luke"),
new Jedi("Yoda"), new Jedi("Mace-Windu"),new Jedi("Luke"),
new Jedi("Obi-wan")
);
Group<Jedi> byName = with(jedis).group(Groups.by(on(Jedi.class).getName()));
System.out.println(byName.find("Luke").size()); //output 3
System.out.println(byName.find("Obi-wan").size()); //ouput 2
I have just learned about HashSet and have no idea about Map yet, so let me suggest my solution based on HashSet. It counts the distinct elements:
int count = 0;
for (String a : Linklist1) {
if (Hashset1.add(a)) { // add() returns false if the element is already in the set
count++;
}
}
System.out.println(count);
hope this helps.

How to sort ArrayList<Long> in decreasing order?

How to sort an ArrayList<Long> in Java in decreasing order?
Here's one way for your list:
list.sort(null);
Collections.reverse(list);
Or you could implement your own Comparator to sort on and eliminate the reverse step:
list.sort((o1, o2) -> o2.compareTo(o1));
Or even more simply use Collections.reverseOrder() since you're only reversing:
list.sort(Collections.reverseOrder());
Comparator<Long> comparator = Collections.reverseOrder();
Collections.sort(arrayList, comparator);
You can use the following code:
Collections.sort(list, Collections.reverseOrder());
Or, if you are going to use a custom comparator, you can use it as given below:
Collections.sort(list, Collections.reverseOrder(new CustomComparator()));
Where CustomComparator is a comparator class that compares the object which is present in the list.
Java 8
Well, doing this in Java 8 is so much more fun and easier:
Collections.sort(variants,(a,b)->a.compareTo(b));
Collections.reverse(variants);
Lambda expressions rock here!!!
In case you need more than one line of logic for comparing a and b, you could write it like this:
Collections.sort(variants,(a,b)->{
int result = a.compareTo(b);
return result;
});
Sort normally and use Collections.reverse();
For lambdas where your long value is somewhere in an object, I recommend using:
.sorted((o1, o2) -> Long.compare(o1.getLong(), o2.getLong()))
or even better:
.sorted(Comparator.comparingLong(MyObject::getLong))
Sort, then reverse.
By using Collections.sort() with a comparator that provides the decreasing order.
See Javadoc for Collections.sort.
A more general approach to implement our own Comparator as below
Collections.sort(lst,new Comparator<Long>(){
public int compare(Long o1, Long o2) {
return o2.compareTo(o1);
}
});
The following approach sorts the list in descending order and also handles null values; if you have any null values, plain Collections.sort() will throw a NullPointerException:
Collections.sort(list, new Comparator<Long>() {
public int compare(Long o1, Long o2) {
return o1==null?Integer.MAX_VALUE:o2==null?Integer.MIN_VALUE:o2.compareTo(o1);
}
});
You can also sort an ArrayList with a TreeSet instead of a comparator. Here's an example from a question I had before for an integer array. I'm using "numbers" as a placeholder name for the ArrayList.
import java.util.*;
class MyClass{
public static void main(String[] args){
Scanner input = new Scanner(System.in);
ArrayList<Integer> numbers = new ArrayList<Integer>();
// read the numbers from standard input so the list has something to sort
while(input.hasNextInt())
numbers.add(input.nextInt());
// a TreeSet keeps its elements in ascending order (note: duplicates are dropped)
TreeSet<Integer> ts = new TreeSet<Integer>(numbers);
numbers = new ArrayList<Integer>(ts);
System.out.println("\nThe numbers in ascending order are:");
for(int i=0; i<numbers.size(); i++)
System.out.print(numbers.get(i).intValue()+" ");
System.out.println("\nThe numbers in descending order are:");
for(int i=numbers.size()-1; i>=0; i--)
System.out.print(numbers.get(i).intValue()+" ");
}
}
So, there is something I would like to bring up which I think is important and worth considering: runtime and memory. Say you have a list and want to sort it; you can use the built-in sort or develop your own, and then reverse the list - that is the answer listed above.
If you are the one creating that list, though, it might be good to store the data in a different data structure and then just dump it into a list at the end.
Heaps do just this. You feed in the data, the heap handles the ordering, and then you can pop everything off of it and it comes out sorted.
Another option is to understand how maps work. A sorted map such as a TreeMap keeps its keys ordered behind the scenes.
For example, if you feed in a bunch of key-value pairs where the key is the long, then once all the elements are added, iterating the key set (or its descending view) gives you the keys back in sorted order automatically.
It depends on how you produce the data beforehand; that should guide how you sort it and handle any subsequent reversing.
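For what it's worth, a rough sketch of the heap idea using a PriorityQueue as a max-heap (names and sample data are just for illustration; this is an alternative structure, not necessarily faster than sorting):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class HeapDescendingDemo {
    public static void main(String[] args) {
        List<Long> values = Arrays.asList(3L, 1L, 42L, 7L);

        // max-heap: the largest element is always at the head
        PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
        heap.addAll(values);

        // polling the head repeatedly yields the values in decreasing order
        List<Long> descending = new ArrayList<>();
        while (!heap.isEmpty()) {
            descending.add(heap.poll());
        }
        System.out.println(descending); // [42, 7, 3, 1]
    }
}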
Comparator's comparing method can be used to compare the objects, and then the reversed() method can be applied to reverse the order:
list.stream().sorted(Comparator.comparing(Employee::getName).reversed()).collect(toList());
Using List.sort() and Comparator.comparingLong()
numberList.sort(Comparator.comparingLong(x -> -x));

Java: Getting the 500 most common words in a text via HashMap

I'm storing my word count in the value field of a HashMap; how can I then get the top 500 words in the text?
public ArrayList<String> topWords (int numberOfWordsToFind, ArrayList<String> theText) {
//ArrayList<String> frequentWords = new ArrayList<String>();
ArrayList<String> topWordsArray= new ArrayList<String>();
HashMap<String,Integer> frequentWords = new HashMap<String,Integer>();
int wordCounter=0;
for (int i=0; i<theText.size();i++){
if(frequentWords.containsKey(theText.get(i))){
//find value and increment
wordCounter=frequentWords.get(theText.get(i));
wordCounter++;
frequentWords.put(theText.get(i),wordCounter);
}
else {
//new word
frequentWords.put(theText.get(i),1);
}
}
for (int i=0; i<theText.size();i++){
if (frequentWords.containsKey(theText.get(i))){
// what to write here?
frequentWords.get(theText.get(i));
}
}
return topWordsArray;
}
One other approach you may wish to look at is to think of this another way: is a Map really the right conceptual object here? This may be a good use of a much-neglected-in-Java data structure, the bag. A bag is like a set, but allows an item to be in the set multiple times. This simplifies the 'adding a found word' step very much.
Google's guava-libraries provides a Bag structure, though there it's called a Multiset. Using a Multiset, you could just call .add() once for each word, even if it's already in there. Even easier, though, you could throw your loop away:
Multiset<String> words = HashMultiset.create(theText);
Now you have a Multiset, what do you do? Well, you can call entrySet(), which gives you a collection of Multiset.Entry objects. You can then stick them in a List (they come in a Set) and sort them using a Comparator. Full code might look like this (using a few other fancy Guava features to show them off):
Multiset<String> words = HashMultiset.create(theWords);
List<Multiset.Entry<String>> wordCounts = Lists.newArrayList(words.entrySet());
Collections.sort(wordCounts, new Comparator<Multiset.Entry<String>>() {
public int compare(Multiset.Entry<String> left, Multiset.Entry<String> right) {
// Note reversal of 'right' and 'left' to get descending order
return Integer.compare(right.getCount(), left.getCount());
}
});
// wordCounts now contains all the words, sorted by count descending
// Take the first 50 entries (alternative: use a loop; this is simple because
// it copes easily with < 50 elements)
Iterable<Multiset.Entry<String>> first50 = Iterables.limit(wordCounts, 50);
// Guava-ey alternative: use a Function and Iterables.transform, but in this case
// the 'manual' way is probably simpler:
for (Multiset.Entry<String> entry : first50) {
wordArray.add(entry.getElement());
}
and you're done!
Here you can find a guide on how to sort a HashMap by its values. After the sorting you can just iterate over the first 500 entries.
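A rough sketch of that approach, assuming a word-to-count map like the frequentWords map from the question (method and variable names here are made up):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWordsByCount {

    // returns up to 'limit' words, ordered by descending count
    static List<String> topWords(Map<String, Integer> frequentWords, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(frequentWords.entrySet());
        // sort by count, highest first
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(limit, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> frequentWords = new HashMap<>();
        frequentWords.put("the", 10);
        frequentWords.put("cat", 3);
        frequentWords.put("sat", 5);
        System.out.println(topWords(frequentWords, 2)); // [the, sat]
    }
}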
Take a look at the TreeBidiMap provided by the Apache Commons Collections package. http://commons.apache.org/collections/api-release/org/apache/commons/collections/bidimap/TreeBidiMap.html
It allows you to sort the map according to either the keys or the values.
Hope it helps.
Zhongxian

Improving performance of merging lots of sorted maps into one sorted map - java

I have a method that gets a SortedMap as input; this map holds many SortedMap objects, and the output of the method should be one SortedMap containing all elements of the maps held in the input map. The method looks like this:
private SortedMap mergeSamples(SortedMap map){
SortedMap mergedMap = new TreeMap();
Iterator sampleIt = map.values().iterator();
while(sampleIt.hasNext())
{
SortedMap currMap = (SortedMap) sampleIt.next();
mergedMap.putAll(currMap);
}
return mergedMap;
}
This is a performance killer; what can I improve here?
I don't see anything wrong with your code; all you can really do is try alternative implementations of SortedMap. The first one to try would be ConcurrentSkipListMap; then look at Commons Collections, Google Collections and GNU Trove. The latter can yield very good results, especially if your maps' keys and values are primitive types.
Is it a requirement for the input to be a SortedMap? To me it would seem easier if the input was just a Collection or List. That might speed up creating the input, and might make iteration over all contained maps faster.
Other than that, I believe the most likely way to improve the performance of this code is to improve the speed of the compareTo() implementation of the keys in the sorted maps being merged.
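For illustration, a sketch of what that first suggestion might look like (generic parameters added for clarity; whether it actually helps depends on how the input is built in the first place):
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class MergeSamples {

    // merge any collection of sorted maps into one sorted map
    static <K, V> SortedMap<K, V> mergeSamples(Collection<? extends SortedMap<K, V>> maps) {
        SortedMap<K, V> merged = new TreeMap<>();
        for (SortedMap<K, V> map : maps) {
            merged.putAll(map);
        }
        return merged;
    }
}
The merging loop itself is unchanged; the point is only that the caller no longer has to build an outer SortedMap whose keys are never used.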
Your code is as good as it gets. However, it seems to me that the overall design of the data structure needs some overhaul: you are using SortedMap<?, SortedMap<?, ?>>, yet the keys of the parent map are not used.
Do you want to express a tree with nested elements, and is your task to flatten that tree? If so, either create a Tree class that supports your approach, or use an intelligent way to merge the keys:
public class NestedKey implements Comparable<NestedKey> {
private Comparable[] entries;
public NestedKey(Comparable... entries) {
assert entries != null;
this.entries = entries;
}
public int compareTo(NestedKey other) {
for(int i = 0; i < other.entries.length; i++) {
if (i == entries.length)
return -1; // other is longer than self <=> self is smaller than other
int cmp = entries[i].compareTo(other.entries[i]);
if (cmp != 0)
return cmp;
}
if (entries.length > other.entries.length)
return 1; // self is longer than other <=> self is larger than other
else
return 0;
}
}
A NestedKey used as a key in a SortedMap compares to other NestedKey objects by comparing each of its entries in turn. When two NestedKeys agree on all shared entries, the one with more entries is considered larger. Thus, you have a relationship like this:
NestedKey(1, 2, 3) < NestedKey(1, 2, 4)
NestedKey(1, 3, 3) < NestedKey(2, 1, 1)
NestedKey(1, 2, 3) < NestedKey(2)
If you use a single SortedMap with NestedKey as its keys, then its .values() collection automatically returns all entries, flattened out. However, if you want to use only part of the SortedMap, you must use .subMap. For example, if you want all entries with NestedKeys between 2 and 3, use .subMap(new NestedKey(2), new NestedKey(3)).
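To make that concrete, a small hypothetical usage sketch, assuming the NestedKey class above is compiled alongside it (the String values are arbitrary):
import java.util.SortedMap;
import java.util.TreeMap;

public class NestedKeyDemo {
    public static void main(String[] args) {
        SortedMap<NestedKey, String> samples = new TreeMap<>();
        samples.put(new NestedKey(2, 1), "b1");
        samples.put(new NestedKey(1, 2), "a2");
        samples.put(new NestedKey(1, 1), "a1");
        samples.put(new NestedKey(2, 3), "b3");

        // the map is already "merged": values() iterates in NestedKey order
        System.out.println(samples.values()); // [a1, a2, b1, b3]

        // only the entries whose first key component is 1
        System.out.println(samples.subMap(new NestedKey(1), new NestedKey(2)).values()); // [a1, a2]
    }
}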
