PriorityQueue with indices for keeping counts sorted - java

A problem I often encounter in Java (usually while writing computational linguistics code) is the need to count the number of occurrences of some items in a dataset, then sort the items by their counts. The simplest concrete example is word counting: I need to count the number of occurrences of each word in a text file, then sort the words by their counts to find the most frequently used words.
Unfortunately, Java doesn't seem to have a good data structure for this task. I need to use the words as indices of a collection while I'm counting, so that I can efficiently look up the right counter to increment every time I read a word, but the values I want to sort on are the counts, not the words.
Map<String, Integer> provides the interface I need for looking up the count associated with a word, but Maps can only be sorted by their keys (i.e. TreeMap). PriorityQueue is a nice heap implementation that will sort on whatever comparator you give it, but it provides no way to access the elements by some kind of index and no way to update and re-heapify an element (other than by removing and adding it). Its single type parameter also means I need to stick the words and their counts together into one object in order to use it.
My current "solution" is to store the counts in a Map while counting them, then copy them all into a PriorityQueue to sort them:
Map<String, Integer> wordCounts = countStuff();
PriorityQueue<NamedCount> sortedCounts = new PriorityQueue<>(wordCounts.size(),
Collections.reverseOrder());
for(Entry<String, Integer> count : wordCounts.entrySet()) {
sortedCounts.add(new NamedCount(count.getKey(), count.getValue()));
}
(Note that NamedCount is just a simple (String, int) pair that implements Comparable to compare the counts.) But this is inefficient, especially since the data set can be very large, and keeping two copies of the counts in memory is wasteful.
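For reference, here is a rough sketch of what that NamedCount pair might look like (the exact field and method names are my assumption, not the asker's code):
class NamedCount implements Comparable<NamedCount> {
    final String name;
    final int count;

    NamedCount(String name, int count) {
        this.name = name;
        this.count = count;
    }

    @Override
    public int compareTo(NamedCount other) {
        // natural order by count; the PriorityQueue above reverses it via Collections.reverseOrder()
        return Integer.compare(this.count, other.count);
    }

    @Override
    public String toString() {
        return name + ": " + count;
    }
}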
Is there any way I can get random access to the objects inside the PriorityQueue, so that I can just store one copy of the counts in the PriorityQueue and re-heapify as I update them? Would it make sense to use a Map<String, NamedCount> that keeps "pointers" to the objects in the PriorityQueue<NamedCount>?

First, for the base data structure, typically Guava's Multiset<String> is preferable to Map<String, Integer> in the same way that Set<String> is preferable to Map<String, Boolean>. It's a cleaner API and encapsulates the incrementing.
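For contrast, a minimal sketch of what that encapsulation means in practice (assuming Java 8 and Guava's HashMultiset):
List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");

// With a plain Map the increment is written by hand:
Map<String, Integer> counts = new HashMap<>();
for (String word : words) {
    counts.merge(word, 1, Integer::sum);
}

// With a Multiset the increment (and the lookup) is part of the API:
Multiset<String> multiset = HashMultiset.create();
multiset.addAll(words);
System.out.println(multiset.count("to"));   // 2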
Now, if this were me, I would implement a custom Multiset that adds some additional logic to index the counts and expose them. Something like this:
class IndexedMultiset<T extends Comparable<T>> extends ForwardingMultiset<T> {
private final Multiset<T> delegate = HashMultiset.create();
private final TreeMultimap<Integer, T> countIndex = TreeMultimap.create();
@Override
protected Multiset<T> delegate() {
return delegate;
}
@Override
public int add(T element, int occurrences) {
int prev = super.add(element, occurrences);
countIndex.remove(prev, element);
countIndex.put(count(element), element);
return prev;
}
@Override
public boolean add(T element) {
return super.standardAdd(element);
}
//similar for remove, setCount, etc
}
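To illustrate the "//similar for remove, setCount, etc" comment, here is a hedged sketch of how the remove override might keep the count index in sync; the unchecked cast is my assumption about handling the Object parameter, not part of the original answer:
@Override
public int remove(Object element, int occurrences) {
    int prev = super.remove(element, occurrences);
    if (prev > 0) {
        countIndex.remove(prev, element);   // Multimap.remove accepts Object arguments
        int now = count(element);
        if (now > 0) {
            @SuppressWarnings("unchecked")  // the element was already stored in this multiset as a T
            T typed = (T) element;
            countIndex.put(now, typed);
        }
    }
    return prev;
}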
Then I'd add whatever query capabilities you need based on counts. For example, retrieving an iterable of word/count pairs in descending order could look something like this:
public Iterable<CountEntry<T>> descendingCounts() {
return countIndex.keySet().descendingSet().stream()
.flatMap((count) -> countIndex.get(count).stream())
.map((element) -> new CountEntry<>(element, count(element)))
.collect(Collectors.toList());
}
public static class CountEntry<T> {
private final T element;
private final int count;
public CountEntry(T element, int count) {
this.element = element;
this.count = count;
}
public T element() {
return element;
}
public int count() {
return count;
}
@Override
public String toString() {
return element + ": " + count;
}
}
And it would all be used like this:
public static void main(String... args) {
IndexedMultiset<String> wordCounts = new IndexedMultiset<>();
wordCounts.add("foo");
wordCounts.add("bar");
wordCounts.add("baz");
wordCounts.add("baz");
System.out.println(wordCounts.descendingCounts()); //[baz: 2, bar: 1, foo: 1]
wordCounts.add("foo");
wordCounts.add("foo");
wordCounts.add("foo");
System.out.println(wordCounts.descendingCounts()); //[foo: 4, baz: 2, bar: 1]
}

If you can use third-party libraries like Guava, Multiset is designed pretty specifically as a solution to this problem:
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
multiset.add(word);
}
System.out.println(Multisets.copyHighestCountFirst(multiset));
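If you only need the top few entries rather than a full copy ordered by count, something like this should also work (the limit of 10 is arbitrary; assumes Java 8 streams):
ImmutableMultiset<String> byCount = Multisets.copyHighestCountFirst(multiset);
byCount.entrySet().stream()
        .limit(10)   // entries iterate in descending count order
        .forEach(e -> System.out.println(e.getElement() + ": " + e.getCount()));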

Related

Building a Sort object based on Map<Enum, Enum>

I would like to build a Sort object based on a Map<Column, Direction>. The problem is that the Sort class only has a private constructor and must be created via the static methods by() or and(), so I don't know how to initialise the Sort object with the first element from the map.
private Sort buildSort(Map<WorklistColumn, Direction> columnsDirectionsmap){
Sort sort = by("wartość inicjalna której nie chcemy", Direction.Ascending);
for (Map.Entry<WorklistColumn, Direction> columnWithDirection : columnsDirectionsmap.entrySet()) {
sort.and(columnWithDirection.getKey().toString(), columnWithDirection.getValue());
}
return sort;
}
public class Sort {
private List<Column> columns = new ArrayList<>();
private Sort() {
}
public static Sort by(String column) {
return (new Sort()).and(column);
}
public static Sort by(String column, Direction direction) {
return (new Sort()).and(column, direction);
}
public Sort and(String name) {
this.columns.add(new Column(name));
return this;
}
public Sort and(String name, Direction direction) {
this.columns.add(new Column(name, direction));
return this;
}
Build a Sort object from a map
I think the question is about the fact that to fully configure a Sort object from your map, you need to use the first map entry in conjunction with Sort.by(), and then use all the other entries in conjunction with Sort.and(). That is, the first entry requires different handling than the rest.
There are lots of ways of dealing with that, but the one I'm going to suggest is to work directly with the iterator of the map's entry set. Something like this:
private Sort buildSort(Map<WorklistColumn, Direction> columnsDirectionsMap) {
if (columnsDirectionsMap.isEmpty()) {
throw new NoCriteriaException(); // or whatever
}
Iterator<Map.Entry<WorklistColumn, Direction>> criterionIterator =
columnsDirectionsMap.entrySet().iterator();
Map.Entry<WorklistColumn, Direction> criterion = criterionIterator.next();
Sort sort = Sort.by(criterion.getKey().toString(), criterion.getValue());
while (criterionIterator.hasNext()) {
criterion = criterionIterator.next();
sort.and(criterion.getKey().toString(), criterion.getValue());
}
return sort;
}
Do note that depending on the Map implementation involved, the order of the entries may not be easily predictable. I assume that you need control of that order for this approach to work as desired, so it's on you to choose a Map implementation that provides that. A LinkedHashMap might be suitable, for example, but probably not a HashMap.
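For illustration, a small sketch of feeding buildSort() from a LinkedHashMap so the criteria keep their insertion order (the column constants here are hypothetical, not from the question):
Map<WorklistColumn, Direction> criteria = new LinkedHashMap<>();
criteria.put(WorklistColumn.PRIORITY, Direction.Ascending);   // applied first
criteria.put(WorklistColumn.CREATED, Direction.Ascending);    // applied second
Sort sort = buildSort(criteria);   // entrySet() iterates in insertion order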

Get an Object from a collection without looping in java

I need to repeatedly (hundreds of thousands of times) retrieve an element (a different one each time) from a Collection which contains tens of thousands of objects.
What is the quickest way to do this retrieval? At the moment my Collection is a List and I iterate over it until I find the element, but is there a quicker way? Using a Map maybe? I was thinking of doing the following:
Putting the Objects in a Map, with the key being the id field of the Object, and the Object itself being the value.
Then doing get(id) on the Map should be much faster than looping through a List.
If that is a correct way to do it, should I use a HashMap or a TreeMap? My objects have no particular ordering.
Any advice on the matter would be appreciated!
Last note: if an external library provides a tool to answer this I'd take it gladly!
As per the documentation of TreeMap (emphasis my own):
The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
In your case, you state that the items have no particular order, and it does not seem that you are after any particular order either, but rather just want to retrieve data as fast as possible.
HashMap provides constant expected read time but does not guarantee order, so I think you should go with a HashMap:
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time. This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
As a side note, the memory footprint of this can get quite high quite fast, so it might also be a good idea to look into a database approach and maybe use a cache-like mechanism for the more frequently used information.
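A minimal sketch of the index-by-id idea described above; MyObject, getId() and the 'objects' collection are hypothetical names:
Map<Long, MyObject> byId = new HashMap<>();
for (MyObject o : objects) {
    byId.put(o.getId(), o);
}
// Each retrieval is now an expected O(1) hash lookup instead of an O(n) scan of the List:
MyObject found = byId.get(someId);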
I've created code which tests the performance of binary search, TreeMap and HashMap for the given problem.
In case you are rebuilding the collection each time, HashMap is the fastest (even with the standard Object hashCode() implementation!), sorted-array binary search comes second, and TreeMap is last (due to its complex rebuilding procedure). The timings below are in milliseconds.
proc array: 2395
proc tree : 4413
proc hash : 1325
If you are not rebuilding the collection, HashMap is still the fastest, the array's binary search is second, and TreeMap is the slowest, though only slightly slower than the array.
proc array: 506
proc tree : 561
proc hash : 122
Test code:
public class SearchSpeedTest {
private List<DataObject> data;
private List<Long> ids;
private Map<Long, DataObject> hashMap;
private Map<Long, DataObject> treeMap;
private int numRep;
private int dataAmount;
private boolean rebuildEachTime;
static class DataObject implements Comparable<DataObject>{
Long id;
public DataObject(Long id) {
super();
this.id = id;
}
public DataObject() {
// no-args constructor: used to create a reusable binary-search key (see testArray())
}
@Override
public final int compareTo(DataObject o) {
return Long.compare(id, o.id);
}
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public void dummyCode() {
}
}
@FunctionalInterface
public interface Procedure {
void execute();
}
public void testSpeeds() {
rebuildEachTime = true;
numRep = 100;
dataAmount = 60_000;
data = new ArrayList<>(dataAmount);
ids = new ArrayList<>(dataAmount);
Random gen = new Random();
for (int i=0; i< dataAmount; i++) {
long id = i*7+gen.nextInt(7);
ids.add(id);
data.add(new DataObject(id));
}
Collections.sort(data);
treeMap = new TreeMap<Long, DataObject>();
populateMap(treeMap);
hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
populateMap(hashMap);
Procedure[] procedures = new Procedure[] {this::testArray, this::testTreeMap, this::testHashMap};
String[] names = new String[] {"array", "tree ", "hash "};
for (int n=0; n<procedures.length; n++) {
Procedure proc = procedures[n];
long startTime = System.nanoTime();
for (int i=0; i<numRep; i++) {
if (rebuildEachTime) {
Collections.shuffle(data);
}
proc.execute();
}
long endTime = System.nanoTime();
long diff = endTime - startTime;
System.out.println("proc "+names[n]+":\t"+(diff/1_000_000));
}
}
void testHashMap() {
if (rebuildEachTime) {
hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
populateMap(hashMap);
}
testMap(hashMap);
}
void testTreeMap() {
if (rebuildEachTime) {
treeMap = new TreeMap<Long, SearchSpeedTest.DataObject>();
populateMap(treeMap);
}
testMap(treeMap);
}
void testMap(Map<Long, DataObject> map) {
for (Long id: ids) {
DataObject ret = map.get(id);
ret.dummyCode();
}
}
void populateMap(Map<Long, DataObject> map) {
for (DataObject dataObj : data) {
map.put(dataObj.getId(), dataObj);
}
}
void testArray() {
if (rebuildEachTime) {
Collections.sort(data);
}
DataObject key = new DataObject();
for (Long id: ids) {
key.setId(id);
DataObject ret = data.get(Collections.binarySearch(data, key));
ret.dummyCode();
}
}
public static void main(String[] args) {
new SearchSpeedTest().testSpeeds();
}
}
HashMap will be more efficient in general, so use it whenever you don't care about the order of the keys.
When you want to keep the entries of the Map sorted by key, use a TreeMap, but the sorting will be overhead in your case since you don't want any particular order.
You can use a map if you have a good way to define the key of the map. In the worst case you can use your object as key and value.
As ordering is not important, use a HashMap. Maintaining the order in a TreeMap adds cost when inserting an element, as it must be placed at the correct position.

Sorting a list based on frequency of words

I would need to sort a list of words based on its frequency.
My input:
Haha, hehe, haha, haha, hehe, hehe.... , Test
For example in my data structure I would have
Haha:3
Hehe:5
Test:10
I would need the data structure to be sorted at the output in this manner:
Test:10
Hehe:5
Haha:3
Such that if I pop the top of the data structure I would be able to obtain the element and its corresponding frequency.
The number of elements is unknown initially, hence an array would not be feasible. If I want to obtain the top few elements I would just access them sequentially. Is this possible in Java?
First, I want to confirm: can you get all the words before sorting, or do the words arrive continuously in a stream?
(1) For the former case, you can use a Set to store the words, then put them into a PriorityQueue. If you supply a comparator, the queue will order the words automatically. I create a new class Pair to store the text and frequency; see the code:
import java.util.Queue;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.HashSet;
import java.util.Comparator;
public class PriorityQueueTest {
public static class Pair {
private String text;
private int frequency;
@Override
public int hashCode() {
return text.hashCode();
}
@Override
public String toString() {
return text + ":" + frequency;
}
public Pair(String text, int frequency) {
super();
this.text = text;
this.frequency = frequency;
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
public static Comparator<Pair> idComparator = new Comparator<Pair>(){
@Override
public int compare(Pair o1, Pair o2) {
if(o1.getFrequency() > o2.getFrequency()) {
return -1;
}
else if(o1.getFrequency() < o2.getFrequency()){
return 1;
}
else {
return 0;
}
}
};
public static void main(String[] args) {
Set<Pair> data = new HashSet<Pair>();
data.add(new Pair("haha", 3));
data.add(new Pair("Hehe", 5));
data.add(new Pair("Test", 10));
Queue<Pair> queue = new PriorityQueue<>(16, idComparator);
for(Pair pair : data) {
queue.add(pair);
}
// Test the order
Pair temp = null;
while((temp = queue.poll()) != null) {
System.out.println(temp);
}
}
}
(2) For the other case (the words come continuously), you may use a TreeMap to keep the order.
See ref: http://www.java-samples.com/showtutorial.php?tutorialid=370
To keep the information you need, you could create a class that holds your string and the count (e.g. Pair) and keep the instances of this class in a List<Pair>. This approach would make the increment of the count for a given string inefficient since you would have to look for the element that holds the string in linear time (O(N)) and then increment it.
A better approach is to use a Map<String, Integer>, that way the search is done in constant time (O(1)) and then you can sort the elements in the Set<Map.Entry<String, Integer>> returned by Map.entrySet().
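A hedged sketch of that Map-based approach, assuming Java 8 ('words' is a hypothetical input list):
Map<String, Integer> counts = new HashMap<>();
for (String word : words) {
    counts.merge(word, 1, Integer::sum);   // O(1) expected per increment
}
List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
sorted.forEach(e -> System.out.println(e.getKey() + ":" + e.getValue()));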
I am starting with the URL below as reference and I will be building on that reference:
How can I count the occurrences of a list item in Python?
Now, the building starts:
>>> from collections import Counter
>>> word_list = ['blue', 'red', 'blue', 'yellow', 'blue', 'red','white','white']
>>> Counter(word_list)
Counter({'blue': 3, 'red': 2, 'white': 2, 'yellow': 1})
Note how Counter(word_list) displays the list of elements i.e. word/frequency pairs sorted in order of decreasing frequency for you. Unfortunately, extracting the words and compiling them in a list sorted in the same order takes a little more work:
(1) Get "size" as the number of elements in the JSON object.
(2) Apply the "most_common" method on the JSON object to get a sorted array of the elements by frequency.
(3) Apply a list comprehension to generate the list of the words extracted from the sorted array.
>>> size = len(Counter(word_list))
>>> size
4
>>> word_frequency_pairs = Counter(word_list).most_common(size)
>>> word_frequency_pairs
[('blue', 3), ('white', 2), ('red', 2), ('yellow', 1)]
>>> [i[0] for i in word_frequency_pairs]
['blue', 'white', 'red', 'yellow']
There is a reason why I love Python :)

A TreeSet or TreeMap that allow duplicates

I need a Collection that sorts the element, but does not removes the duplicates.
I have gone for a TreeSet, since TreeSet actually adds its values to a backing TreeMap:
public boolean add(E e) {
return m.put(e, PRESENT)==null;
}
And the TreeMap removes the duplicates using the Comparator's compare logic.
I have written a Comparator that returns 1 instead of 0 in case of equal elements. Hence in the case of equal elements the TreeSet with this Comparator will not overwrite the duplicate and will just sort it.
I have tested it for simple String objects, but I need a Set of Custom objects.
public static void main(String[] args)
{
List<String> strList = Arrays.asList( new String[]{"d","b","c","z","s","b","d","a"} );
Set<String> strSet = new TreeSet<String>(new StringComparator());
strSet.addAll(strList);
System.out.println(strSet);
}
class StringComparator implements Comparator<String>
{
@Override
public int compare(String s1, String s2)
{
if(s1.compareTo(s2) == 0){
return 1;
}
else{
return s1.compareTo(s2);
}
}
}
Is this approach fine or is there a better way to achieve this?
EDIT
Actually I have an ArrayList of the following class:
class Fund
{
String fundCode;
BigDecimal fundValue;
.....
public boolean equals(Object obj) {
// uses fundCode for equality
}
}
I need all the fundCode with highest fundValue
You can use a PriorityQueue.
PriorityQueue<Integer> pQueue = new PriorityQueue<Integer>();
PriorityQueue(): Creates a PriorityQueue with the default initial capacity (11) that orders its elements according to their natural ordering.
This is a link to doc: https://docs.oracle.com/javase/8/docs/api/java/util/PriorityQueue.html
I need all the fundCode with highest fundValue
If that's the only reason you want to sort, I would recommend not sorting at all. Sorting typically costs O(n log n), while finding the maximum is only O(n) and can be implemented as a simple iteration over your list:
List<Fund> maxFunds = new ArrayList<>();
BigDecimal max = null; // fundValue is a BigDecimal in the question's class, so compare with compareTo()
for (Fund fund : funds) {
if (max == null || fund.getFundValue().compareTo(max) > 0) {
maxFunds.clear();
max = fund.getFundValue();
}
if (fund.getFundValue().compareTo(max) == 0) {
maxFunds.add(fund);
}
}
You can avoid writing that code by using a third-party library like Guava. See: How to get max() element from List in Guava
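If Java 8 is available, plain streams (rather than Guava) can also express the same idea; a sketch, assuming getFundValue() returns the BigDecimal from the question:
BigDecimal max = funds.stream()
        .map(Fund::getFundValue)
        .max(Comparator.naturalOrder())
        .orElseThrow(() -> new IllegalStateException("no funds"));
List<Fund> maxFunds = funds.stream()
        .filter(f -> f.getFundValue().compareTo(max) == 0)
        .collect(Collectors.toList());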
You can sort a List using Collections.sort.
Given your Fund:
List<Fund> sortMe = new ArrayList(...);
Collections.sort(sortMe, new Comparator<Fund>() {
@Override
public int compare(Fund left, Fund right) {
return left.fundValue.compareTo(right.fundValue);
}
});
// sortMe is now sorted
In the case of TreeSet, either a Comparator or Comparable is used to compare and store objects. equals() is not called, which is why it does not recognize the duplicate.
Instead of the TreeSet we can use a List and implement the Comparable interface.
public class Fund implements Comparable<Fund> {
String fundCode;
int fundValue;
public Fund(String fundCode, int fundValue) {
super();
this.fundCode = fundCode;
this.fundValue = fundValue;
}
public String getFundCode() {
return fundCode;
}
public void setFundCode(String fundCode) {
this.fundCode = fundCode;
}
public int getFundValue() {
return fundValue;
}
public void setFundValue(int fundValue) {
this.fundValue = fundValue;
}
public int compareTo(Fund compareFund) {
return Integer.compare(compareFund.getFundValue(), this.fundValue); // descending by fund value; avoids int-subtraction overflow
}
public static void main(String args[]){
List<Fund> funds = new ArrayList<Fund>();
Fund fund1 = new Fund("a",100);
Fund fund2 = new Fund("b",20);
Fund fund3 = new Fund("c",70);
Fund fund4 = new Fund("a",100);
funds.add(fund1);
funds.add(fund2);
funds.add(fund3);
funds.add(fund4);
Collections.sort(funds);
for(Fund fund : funds){
System.out.println("Fund code: " + fund.getFundCode() + " Fund value : " + fund.getFundValue());
}
}
}
Add the elements to the ArrayList and then sort them using the Collections.sort utility; implement Comparable and write your own compareTo method according to your key.
This won't remove duplicates, and it can be sorted as well:
List<Integer> list = new ArrayList<>();
Collections.sort(list, new Comparator<Integer>() {
@Override
public int compare(Integer left, Integer right) {
// your logic
return 0;
}
});
I found a way to get TreeSet to store duplicate keys.
The problem originated when I wrote some code in python using SortedContainers. I have a spatial index of objects where I want to find all objects between a start/end longitude.
The longitudes could be duplicates but I still need the ability to efficiently add/remove specific objects from the index. Unfortunately I could not find the Java equivalent of the Python SortedKeyList that separates the sort key from the type being stored.
To illustrate this consider that we have a large list of retail purchases and we want to get all purchases where the cost is in a specific range.
// We are using TreeSet as a SortedList
TreeSet<PriceBase> _index = new TreeSet<>();
// populate the index with the purchases.
// Note that 2 of these have the same cost
_index.add(new Purchase("candy", 1.03));
Purchase _bananas = new Purchase("bananas", 1.45);
_index.add(_bananas);
_index.add(new Purchase("celery", 1.45));
_index.add(new Purchase("chicken", 4.99));
// Range scan. This iterator should return "candy", "bananas", "celery"
SortedSet<PriceBase> _iterator = _index.subSet(
new PriceKey(0.99), new PriceKey(3.99));
// we can also remove specific items from the list and
// it finds the specific object even through the sort
// key is the same
_index.remove(_bananas);
There are 3 classes created for the list (a rough skeleton follows the list):
PriceBase: Base class that returns the sort key (the price).
Purchase: subclass that contains transaction data.
PriceKey: subclass used for the range search.
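A rough skeleton of the three classes just listed; the field and constructor shapes are my assumptions, and the compareTo() overrides are the ones shown further below:
abstract class PriceBase implements Comparable<PriceBase> {
    private final double price;
    protected PriceBase(double price) { this.price = price; }
    public double getPrice() { return price; }

    @Override
    public int compareTo(PriceBase other) {   // compare by price only (shown again below)
        return Double.compare(getPrice(), other.getPrice());
    }
}

class PriceKey extends PriceBase {            // used only as a boundary for subSet() range scans
    PriceKey(double price) { super(price); }
}

class Purchase extends PriceBase {
    private final String name;
    Purchase(String name, double price) { super(price); this.name = name; }
    public String getName() { return name; }
    // compareTo() override (price, then name) is shown further below
}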
When I initially implemented this with TreeSet it worked except in the case where the prices are the same. The trick is to define the compareTo() so that it is polymorphic:
If we are comparing Purchase to PriceKey then only compare the price.
If we are comparing Purchase to Purchase then compare the price and the name if the prices are the same.
For example here are the compareTo() functions for the PriceBase and Purchase classes.
// in PriceBase
@Override
public int compareTo(PriceBase _other) {
return Double.compare(this.getPrice(), _other.getPrice());
}
// in Purchase
@Override
public int compareTo(PriceBase _other) {
// compare by price
int _compare = super.compareTo(_other);
if(_compare != 0) {
// prices are not equal
return _compare;
}
if (!(_other instanceof Purchase)) {
throw new RuntimeException("Right compare must be a Purchase");
}
// compare by item name
Purchase _otherPurchase = (Purchase)_other;
return this.getName().compareTo(_otherPurchase.getName());
}
This trick allows the TreeSet to sort the purchases by price but still do a real comparison when one needs to be uniquely identified.
In summary I needed an object index to support a range scan where the key is a continuous value like double and add/remove is efficient.
I understand there are many other ways to solve this problem, but I wanted to avoid writing my own tree class. My solution seems like a hack and I am surprised that I can't find anything else. If you know of a better way, please comment.

Using binarySearch with Comparator and regex

I am trying to write a quick search that searches a List<String>.
Instead of looping through the list and manually checking, I want to do this using binarySearch, but I am not sure how to do it.
Old way:
for(String s : list) {
if(s.startsWith("contact."))
return true;
}
Instead I would like something like this:
Collections.sort(list);
Collections.binarySearch(list, "contact.", new FindContactComparator());
Can someone help me write this Comparator?
Is there any better way of doing this instead of using binarySearch?
This should work:
Comparator<String> startsWithComparator = new Comparator<String>() {
public int compare(String currentItem, String key) {
if(currentItem.startsWith(key)) {
return 0;
}
return currentItem.compareTo(key);
}
};
int index = Collections.binarySearch(items, "contact.", startsWithComparator);
However sorting and then binary searching is less efficient than the single pass iteration.
Addendum:
Though the above answer helps you, here is another way (inspired by Scala and Google Collections):
List<String> items = Arrays.asList("one", "two", "three", "four", "five", "six");
int index = find(items, startsWithPredicate("th"));
System.out.println(index);
public static Predicate<String> startsWithPredicate(final String key) {
return new Predicate<String>(){
@Override
public boolean apply(String item) {
return item.startsWith(key);
}
};
}
public static <T> int find(Collection<T> items, Predicate<T> predicate) {
int index = 0;
for(T item: items) {
if(predicate.apply(item)) {
return index;
}
index++;
}
return -1;
}
interface Predicate<T> {
boolean apply(T item);
}
The point here is that the find() method is not tied to your matching logic; it just finds an element that satisfies the predicate. So you could pass a different Predicate implementation to find(), for example one that checks endsWith(), and it would return the item that ends with a particular string. Furthermore, find() works for any type of collection; all it needs is a predicate that maps an element of the collection's element type to a boolean. The amount of code needed around such simple logic also shows Java's lack of support for first-class functions (at least before Java 8).
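For what it's worth, on Java 8+ the same idea is built in; a quick sketch against the List<String> from the question:
boolean anyMatch = list.stream().anyMatch(s -> s.startsWith("contact."));
Optional<String> first = list.stream().filter(s -> s.startsWith("contact.")).findFirst();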
The problem is that binary search only lands on some matching element, not necessarily the first one.
I solved this by finding a matching element using binary search, then looping backward to find the first occurrence of the prefix, followed by a loop that collects all matching elements.
I think that the way you are doing this now is actually the best way from a performance standpoint. Sorting itself is probably more expensive than simply iterating through the unsorted list. But to be sure you would have to run some tests (although that's not as easy as it may sound due to JIT compilation).
Is the criterion you are looking for always 'starts with'? Because in your question you're talking about a regex.
If you do want to implement this, you should at least use the same Comparator for sorting as for searching. The comparator itself can be very simple: just write one that puts everything that matches your criterion in front of everything that doesn't. My syntax may not be completely correct since I haven't done Java in a while.
public class MyComparator implements Comparator<String> {
private String prefix;
public MyComparator(String prefix) {
this.prefix = prefix;
}
public int compare(String s0, String s1) {
if (s0.startsWith(prefix) && s1.startsWith(prefix)) {
return 0;
}
else if (s0.startsWith(prefix)) {
return -1;
}
else if (s1.startsWith(prefix)) {
return 1;
}
return 0;
}
public boolean equals(Object comp) {
return true;
}
}
Sorting the list itself takes more time than a linear scan of the list. (Comparison based sort takes time proportional to n(log n) where n is the length of the list.)
Even if the list is completely sorted most of the times, the sorting algorithm will have to at least iterate through the list to check this.
Basically, no matter how you implement a sorting algorithm, the algorithm (even in the best case) has to at least look at all elements. Thus, a linear search for "contact." is probably your best option here.
A more elaborate solution would be to subclass the list that contains the strings and maintain the index of the first occurrence of "contact.".
Given that strings are immutable, all you have to do is override add, remove and so on, and update the index accordingly.
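A rough sketch of that subclassing idea; for simplicity it tracks a count of matching elements rather than the index of the first one (slightly different from what is described above, but easier to keep consistent under removals), and it only covers add() and remove():
class PrefixTrackingList extends ArrayList<String> {
    private final String prefix;
    private int prefixCount = 0;

    PrefixTrackingList(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean add(String s) {
        if (s.startsWith(prefix)) {
            prefixCount++;
        }
        return super.add(s);
    }

    @Override
    public boolean remove(Object o) {
        boolean removed = super.remove(o);
        if (removed && o instanceof String && ((String) o).startsWith(prefix)) {
            prefixCount--;
        }
        return removed;
    }

    // O(1) instead of a linear scan; other mutators (set, clear, iterator.remove, ...) would
    // also need overriding for this to be complete
    boolean containsPrefixedElement() {
        return prefixCount > 0;
    }
}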
Just another comparator (with regex):
Comparator<String> comparator = new Comparator<String>() {
private final Pattern containsPattern = Pattern.compile(searchTerm,Pattern.CASE_INSENSITIVE);
public int compare(String o1, String o2) {
Matcher contains1 = containsPattern.matcher(o1);
Matcher contains2 = containsPattern.matcher(o2);
boolean find1 = contains1.find();
boolean find2 = contains2.find();
if(find1 && find2){
int compareContains = contains1.end() - contains2.end();
if (compareContains == 0) {
return o1.compareTo(o2);
} else {
return compareContains;
}
}else if(find1){
return -1;
}else if(find2){
return 1;
}else{
return o1.compareTo(o2);
}
}
};
Input ArrayList (search term: dog):
"yxcv",
"dogb",
"doga",
"abcd",
"a Dog"
Output(sorted) ArrayList:
"doga",
"dogb",
"a Dog",
"abcd",
"yxcv"
