I want to sort something like this:
Given an ArrayList of objects with name Strings, I am trying to write the compareTo function such that Special T is always first, Special R is always second, Special C is always third, and then everything else is just alphabetical:
Special T
Special R
Special C
Aaron
Alan
Bob
Dave
Ron
Tom
Is there a standard way of writing this kind of compare function without needing to iterate over all possible combinations between the special cases and then invoking return getName().compareTo(otherObject.getName()); if it's a non-special case?
I would put the special cases in a HashMap<String, Integer> with the name as key and position as value. The advantages are:
Lookup is O(1) on average
The HashMap may be populated from an external source
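A minimal sketch of that idea, assuming the names are plain Strings (for objects, something like Comparator.comparing(MyObject::getName, BY_NAME) should work the same way). The class name SpecialNameOrder, the RANK map, and the sortNames helper are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpecialNameOrder {
    // Position of each special name; lower ranks sort first.
    // This map could just as well be populated from an external source.
    static final Map<String, Integer> RANK = new HashMap<>();
    static {
        RANK.put("Special T", 0);
        RANK.put("Special R", 1);
        RANK.put("Special C", 2);
    }

    // Special names compare by rank. Every other name gets the same (largest)
    // rank, so those fall through to plain alphabetical order.
    static final Comparator<String> BY_NAME =
            Comparator.<String>comparingInt(n -> RANK.getOrDefault(n, Integer.MAX_VALUE))
                      .thenComparing(Comparator.naturalOrder());

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sortNames(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(BY_NAME);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortNames(Arrays.asList(
                "Tom", "Special C", "Aaron", "Special T", "Bob", "Special R")));
    }
}
```

Adding another special case then only means adding one more entry to the map; the comparator itself does not change.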
I am trying to solve the following exercise from "Core Java for the Impatient" by Cay Horstmann:
When an encoder of a Charset with partial Unicode coverage can’t encode a
character, it replaces it with a default—usually, but not always, the encoding of "?".
Find all replacements of all available character sets that support encoding. Use the
newEncoder method to get an encoder, and call its replacement method to get
the replacement. For each unique result, report the canonical names of the charsets
that use it.
For the sake of education, I have decided to tackle the exercise with a gargantuan one-liner using the streaming API, even though, in my opinion, a cleaner solution would divide the calculations into a number of steps with intermediate variables in between (it would certainly ease debugging). Without further ado, here is the monster of code I have created:
Charset.availableCharsets().values().stream().filter(charset -> charset.canEncode()).collect(
Collectors.groupingBy(
charset -> charset.newEncoder().replacement(),
() -> new TreeMap<>((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode())),
Collectors.mapping( charset -> charset.name(), Collectors.toList()))).
values().stream().map(list -> list.stream().collect(Collectors.joining(", "))).forEach(System.out::println);
Basically, we take into account only the charsets that canEncode; create a Map with replacement as key and a list of canonical names as values; because grouping didn't work for arrays with default implementation of groupingBy, which uses HashMap, I have decided to use TreeMap. We then work with the Lists of canonical names, join them with comma and print.
Unfortunately, I have found that it gives inconsistent results. If I run the function twice in the same program, the first run returns 23 Strings and the second one just 21 Strings. I suspect it has something to do with a poor implementation of the Comparator for the TreeMap, which was defined as follows:
((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode()))
If that is the cause, what should be a proper Comparator in this case? Apart from that, can the one-liner be improved in any way?
I am also curious whether such convoluted constructs as the code I have written are encountered in professional programs. Maybe it's only me who finds it unreadable?
There is no guarantee that the hash code of two distinct instances will be different. That would be an ideal situation, but is never guaranteed. Only the opposite is true: if two objects are equal, they have the same hash code.
So if you create a comparator that considers objects to be the same when they have the same hash code, arbitrary objects might be considered to be the same. Since the byte[] arrays returned by replacement() are defensive copies (that is, temporary objects), the result may vary with every run of this code.
Further, since the hash code of an array has nothing to do with its content, your comparator violates the transitivity rule: two arrays with equal content are supposed to be the same, but since they very likely have different hash codes, they can relate differently to a third array with different content: a == b, but a < c and b > c. This is why even equal arrays, which you compare with Arrays.equals, can end up in different groups: the TreeMap fails to find the existing key once the comparisons with other keys send the lookup the wrong way.
If you want the arrays to be compared by value, you can use:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> charset.newEncoder().replacement(),
() -> new TreeMap<>(Comparator.comparing(ByteBuffer::wrap)),
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
ByteBuffers are Comparable and consistently evaluate the contents of the wrapped array.
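To make the difference concrete, here is a small sketch comparing two distinct arrays with the same content. The class name ArrayKeyDemo and the helper sameKey are hypothetical, and the byte values are arbitrary sample data:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ArrayKeyDemo {
    // True when a TreeMap ordered by ByteBuffer would treat both arrays as the same key.
    static boolean sameKey(byte[] a, byte[] b) {
        return ByteBuffer.wrap(a).compareTo(ByteBuffer.wrap(b)) == 0;
    }

    public static void main(String[] args) {
        byte[] a = {63};       // e.g. the encoding of "?"
        byte[] b = {63};       // a different array instance with the same content
        byte[] c = {-1, -3};   // different content

        // Arrays use identity-based equals, so two copies are never "equal" this way.
        System.out.println(a.equals(b));                                    // false
        // Content comparison needs Arrays.equals or a wrapper such as ByteBuffer.
        System.out.println(Arrays.equals(a, b));                            // true
        System.out.println(ByteBuffer.wrap(a).equals(ByteBuffer.wrap(b)));  // true
        System.out.println(sameKey(a, b));                                  // true
        System.out.println(sameKey(a, c));                                  // false
    }
}
```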
I moved the Collectors.joining collector into the grouping collector to avoid the creation of the temporary List whose contents you are going to join afterwards anyway.
By the way, never use code like expression == true. There is no reason to append == true as expression is already sufficient.
Since you are only interested in the values (in other words, you don't need the keys to be of a certain type), you may wrap all arrays beforehand, simplifying the operation and even making it slightly more efficient:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
TreeMap::new,
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
This change even allows resorting to hashing, if no consistent iteration order is required:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
This works, because ByteBuffer also implements equals and hashCode.
I am writing a program which will add a growing number of unique strings to a data structure. Once this is done, I later need to constantly check for the existence of a string in it.
If I were to use an ArrayList I believe checking for the existence of some specified string would iterate through all items until a matching string is found (or reach the end and return false).
However, with a HashMap I know that in constant time I can simply use the key as a String and return any non-null object, making this operation faster. However, I am not keen on filling a HashMap where the value is completely arbitrary. Is there a readily available data structure that uses hash functions, but doesn't require a value to be placed?
If I were to use an ArrayList I believe checking for the existence of some specified string would iterate through all items until a matching string is found
Correct, checking a list for an item is linear in the number of entries of the list.
However, I am not keen on filling a HashMap where the value is completely arbitrary
You don't have to: Java provides a HashSet<T> class, which is very much like a HashMap without the value part.
You can put all your strings there, and then check for presence or absence of other strings in constant time:
Set<String> knownStrings = new HashSet<String>();
... // Fill the set with strings
if (knownStrings.contains(myString)) {
    ...
}
It depends on many factors, including the number of strings you have to feed into that data structure (do you know the number in advance, or have a basic idea?), and what you expect the hit/miss ratio to be.
A very efficient data structure to use is a trie or a radix tree; they are basically made for that. For an explanation of how they work, see the wikipedia entry (a followup to the radix tree definition is in this page). There are Java implementations (one of them is here; however I have a fixed set of strings to inject, which is why I use a builder).
If the number of strings is really huge and you expect a significant share of misses, you might also consider a Bloom filter. The catch is that it is probabilistic: it can report false positives, but an answer of "not there" is quick and always correct. Here too, there are Java implementations (Guava has one, for instance).
Otherwise, well, a HashSet...
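For reference, the trie mentioned above can be sketched in a few lines. This is a bare-bones illustration rather than one of the linked implementations; the class name SimpleTrie and its methods are hypothetical, and it makes no attempt at the memory optimizations a radix tree would add:

```java
import java.util.HashMap;
import java.util.Map;

class SimpleTrie {
    // One node per character; endOfWord marks nodes where a stored string ends.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    // Walks down the tree, creating missing nodes, and marks the last one.
    void add(String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.endOfWord = true;
    }

    // A lookup costs one map access per character of the query,
    // independent of how many strings are stored.
    boolean contains(String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.get(ch);
            if (node == null) {
                return false; // no stored string continues with this character
            }
        }
        return node.endOfWord;
    }
}
```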
A HashSet is probably the right answer, but if you choose (e.g. for simplicity) to search a list, it's probably more efficient to concatenate your words into a String with separators:
String wordList = "$word1$word2$word3$word4$...";
Then create a search argument with your word between the separators:
String searchArg = "$" + searchWord + "$";
Then search with, say, contains:
boolean wordFound = wordList.contains(searchArg);
You can maybe make this a tiny bit more efficient by using StringBuilder to build the searchArg.
As others mentioned, HashSet is the way to go. But if the size is going to be large and you are fine with false positives (e.g. when checking whether a username exists), you can use a Bloom filter (a probabilistic data structure) as well.
I have a large collection of Strings. I want to be able to find the Strings that begin with "Foo" or the Strings that end with "Bar". What would be the best Collection type to get the fastest results? (I am using Java)
I know that a HashSet is very fast for complete matches, but not for partial matches, I would think? So, what could I use instead of just looping through a List? Should I look into LinkedLists or similar types? Are there any Collection types that are optimized for this kind of query?
The best collection type for this problem is SortedSet. You would need two of them in fact:
Words in regular order.
Words with their characters inverted.
Once these SortedSets have been created, you can use method subSet to find what you are looking for. For example:
Words starting with "Foo":
forwardSortedSet.subSet("Foo","Fop");
Words ending with "Bar":
backwardSortedSet.subSet("raB","raC");
The reason we are "adding" 1 to the last search character is to obtain the whole range. The "ending" word is excluded from the subSet, so there is no problem.
EDIT: Of the two concrete classes that implement SortedSet in the standard Java library, use TreeSet. The other (ConcurrentSkipListSet) is oriented to concurrent programs and thus not optimized for this situation.
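Put together, the two-set approach could look like the following sketch. The class PrefixSuffixIndex and its method names are hypothetical; the upper bound is computed by incrementing the last character of the search string, which assumes a non-empty search string whose last character is not Character.MAX_VALUE:

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixSuffixIndex {
    private final TreeSet<String> forward = new TreeSet<>();   // words as given
    private final TreeSet<String> backward = new TreeSet<>();  // words with characters reversed

    PrefixSuffixIndex(List<String> words) {
        for (String w : words) {
            forward.add(w);
            backward.add(reverse(w));
        }
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Exclusive upper bound of the range: "Foo" -> "Fop", "raB" -> "raC".
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // All words beginning with the given prefix (a view of the forward set).
    SortedSet<String> startingWith(String prefix) {
        return forward.subSet(prefix, upperBound(prefix));
    }

    // All words ending with the given suffix, turned back into normal order.
    SortedSet<String> endingWith(String suffix) {
        String rev = reverse(suffix);
        SortedSet<String> result = new TreeSet<>();
        for (String w : backward.subSet(rev, upperBound(rev))) {
            result.add(reverse(w));
        }
        return result;
    }
}
```

Both sets are built once; each query afterwards only touches the matching range instead of the whole collection.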
It's been a while but I needed to implement this now and did some testing.
I already have a HashSet<String> as source so generation of all other datastructures is included in search time. 100 different sources are used and each time the data structures need to be regenerated. I only need to match a few single Strings each time. These tests ran on Android.
Methods:
1. Simple loop through the HashSet, calling endsWith() on each string.
2. Simple loop through the HashSet, performing a precompiled Pattern match (regex) on each string.
3. Convert the HashSet to a single String joined by \n and perform a single match on the whole String.
4. Generate a SortedTree with reversed Strings from the HashSet, then match with subSet() as explained by @Mario Rossi.
Results:
Duration for method 1: 173ms (data setup:0ms search:173ms)
Duration for method 2: 6909ms (data setup:0ms search:6909ms)
Duration for method 3: 3026ms (data setup:2377ms search:649ms)
Duration for method 4: 2111ms (data setup:2101ms search:10ms)
Conclusion:
SortedSet/SortedTree is extremely fast in searching. Much faster than just looping through all Strings. However, creating the structure takes a lot of time. Regexes are much slower, but generating a single large String out of hundreds of Strings is more of a bottleneck on Android/Java.
If only a few matches need to be made, you are better off looping through your collection. If you have many more matches to make, it may be very useful to use a SortedTree!
If the list of words is stable (not many words are added or deleted), a very good second alternative is to create 2 lists:
One with the words in normal order.
The second with the characters in each word reversed.
For speed purposes, make them ArrayLists. Never use LinkedLists or other variants that perform extremely badly on random access (the core of binary search; see below).
After the lists are created, they can be sorted with method Collections.sort (only once each) and then searched with Collections.binarySearch. For example:
Collections.sort(forwardList);
Collections.sort(backwardList);
And then to search for words starting in "Foo":
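int i = Collections.binarySearch(forwardList, "Foo");
// binarySearch returns -(insertion point) - 1 when the key itself is absent
if (i < 0) i = -i - 1;
while (i < forwardList.size() && forwardList.get(i).startsWith("Foo")) {
    // Process String forwardList.get(i)
    i++;
}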
And words ending in "Bar":
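int i = Collections.binarySearch(backwardList, "raB");
// binarySearch returns -(insertion point) - 1 when the key itself is absent
if (i < 0) i = -i - 1;
while (i < backwardList.size() && backwardList.get(i).startsWith("raB")) {
    // Process String backwardList.get(i)
    i++;
}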
Edit: I should have probably mentioned that I am extremely new to Java programming. I just started with the language about two weeks ago.
I have tried looking for an answer to this question, but so far I haven't found one, which is why I am asking it here.
I am writing Java code for a Dungeons and Dragons Initiative Tracker and I am using a TreeMap for its ability to sort on entry. I am still very new to Java, so I don't know everything that is out there.
My problem is that when I have two of the same keys, the tree merges the values such that one of the values no longer exists. I understand this can be desirable behavior but in my case I cannot have that happen. I was hoping there would be an elegant solution to fix this behavior. So far what I have is this:
TreeMap<Integer,Character> initiativeList = new TreeMap<Integer,Character>(Collections.reverseOrder());
Character [] cHolder = new Character[3];
out.println("Thank you for using the Initiative Tracker Project.");
cHolder[0] = new Character("Fred",2);
cHolder[1] = new Character("Sam",3,23);
cHolder[2] = new Character("John",2,23);
for(int i = 0; i < cHolder.length; ++i)
{
initiativeList.put(cHolder[i].getInitValue(), cHolder[i]);
}
out.println("Initiative List: " + initiativeList);
Character is a class that I have defined that keeps track of a player's character name and initiative values.
Currently the output is this:
Initiative List: {23=John, 3=Fred}
I considered using a TreeMap with some sort of subCollection but I would also run into a similar problem. What I really need to do is just find a way to disable the merge. Thank you guys for any help you can give me.
EDIT: In Dungeons and Dragons, a character rolls a 20-sided die and then adds their initiative mod to the result to get their total initiative. Sometimes two players can get the same values. I've thought about having the key formatted like this:
Key = InitiativeValue.InitiativeMod
So for Sam his key would be 23.3 and John's would be 23.2. I understand that I would need to change the key type to float instead of int.
However, even with that two players could have the same Initiative Mod and roll the same Initiative Value. In reality this happens more than you might think. So for example,
Say both Peter and Scott join the game. They both have an initiative modifier of 2, and they both roll a 10 on the 20 sided dice. That would make both of their Initiative values 12.
When I put them into the existing map, they both need to show up even though they have the same value.
Initiative List: {23=John, 12=Peter, 12=Scott, 3=Fred}
I hope that helps to clarify what I am needing.
If I understand you correctly, you have a bunch of characters and their initiatives, and want to "invert" this structure to key by initiative ID, with the value being all characters that have that initiative. This is perfectly captured by a MultiMap data structure, of which one implementation is the Guava TreeMultimap.
There's nothing magical about this. You could achieve something similar with a
TreeMap<Initiative,List<Character>>
This is not exactly how a Guava multimap is implemented, but it's the simplest data structure that could support what you need.
If I were doing this I would write my own class that wrapped the above TreeMap and provided an add(K key, V value) method that handled the duplicate detection and list management according to your specific requirements.
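A minimal sketch of such a wrapper, assuming Integer initiative values and "highest initiative first" as in the question. The class name InitiativeOrder and the method inOrder are hypothetical, and duplicate handling here simply keeps every value in insertion order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class InitiativeOrder<V> {
    // Highest initiative first; each key holds every value added under it.
    private final TreeMap<Integer, List<V>> byInitiative =
            new TreeMap<>(Comparator.<Integer>reverseOrder());

    // Appends to the list for this initiative instead of replacing the old entry.
    void add(int initiative, V value) {
        byInitiative.computeIfAbsent(initiative, k -> new ArrayList<>()).add(value);
    }

    // Flattens the map into turn order; ties keep their insertion order.
    List<V> inOrder() {
        List<V> all = new ArrayList<>();
        for (List<V> group : byInitiative.values()) {
            all.addAll(group);
        }
        return all;
    }
}
```

With this, adding Peter and Scott under the same initiative of 12 keeps both of them.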
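You say you are using "...a TreeMap for its ability to sort on entry...", but maybe you could just use a TreeSet instead. You'll need to implement a suitable compareTo method on your Character class that performs the comparison you want, and I strongly recommend that you implement hashCode and equals too. Note that a TreeSet treats two elements as duplicates when compareTo returns 0, so compareTo must break ties (for example by name) if characters with equal initiative should both be kept.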
Then, when you iterate through the TreeSet, you'll get the Character objects in the appropriate order. Note that Map classes are intended for lookup purposes, not for ordering.
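As a rough sketch of the TreeSet variant: the class below is a hypothetical stand-in for the question's Character class (named Combatant here to avoid clashing with java.lang.Character), and breaking ties by name is an assumption, since a TreeSet drops any element whose compareTo returns 0 against an existing one:

```java
import java.util.Objects;
import java.util.TreeSet;

final class Combatant implements Comparable<Combatant> {
    final String name;
    final int initiative;

    Combatant(String name, int initiative) {
        this.name = name;
        this.initiative = initiative;
    }

    // Higher initiative sorts first; equal initiatives are ordered by name
    // so that both combatants stay in the set.
    @Override
    public int compareTo(Combatant other) {
        int byInitiative = Integer.compare(other.initiative, initiative);
        return byInitiative != 0 ? byInitiative : name.compareTo(other.name);
    }

    // equals and hashCode are kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Combatant)) return false;
        Combatant c = (Combatant) o;
        return initiative == c.initiative && name.equals(c.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, initiative);
    }

    @Override
    public String toString() {
        return name + "(" + initiative + ")";
    }

    public static void main(String[] args) {
        TreeSet<Combatant> turnOrder = new TreeSet<>();
        turnOrder.add(new Combatant("Fred", 3));
        turnOrder.add(new Combatant("Peter", 12));
        turnOrder.add(new Combatant("John", 23));
        turnOrder.add(new Combatant("Scott", 12));
        System.out.println(turnOrder); // [John(23), Peter(12), Scott(12), Fred(3)]
    }
}
```

Two combatants with the same name and the same initiative would still collapse into one entry, so a unique id would be needed as a further tie-breaker if that can happen.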
This is a question more about best practices/design patterns than regexps.
In short I have 3 values: from, to and the value I want to change. From has to match one of several patterns:
XX.X
>XX.X
>=XX.X
<XX.X
<=XX.X
XX.X-XX.X
Whereas To has to be a decimal number. Depending on what value is given in From I have to check whether a value I want to change satisfies the From condition. For example the user inputs "From: >100.00 To: 150.00" means that every value greater than 100.00 should be changed.
The regexp itself isn't a problem. The thing is if I match the whole From against one regexp and it passes I still need to check which option was inputted - this will generate at least 5 IFs in my code and every time I want to add another option I will need to add another IF - not cool. Same thing if I were to create 5 Patterns.
Now I have a HashMap which holds a pattern as the key and a ValueMatcher as the value. When a user inputs a From value then I match it in a loop against every key in that map and if it matches then I use the corresponding ValueMatcher to actually check if the value that I want to change satisfies the "From" value.
This approach, on the other hand, requires me to have a HashMap with all the possibilities, a ValueMatcher interface and 5 implementations, each with only 1 short "matches" method. I think it sure is better than the IFs, but it still looks like an exaggerated solution.
Is there any other way to do it? Or is this how I actually should do it? I really regret that we can't hold methods in a HashMap or pass them as arguments, because then I'd only have 1 class with all the matching methods and store them in a HashMap.
How about a chain of responsibility.
Each ValueMatcher object holds exactly one From/To rule and a reference to the next ValueMatcher in the chain. Each ValueMatcher has a method which examines a candidate and either transforms it or passes it on to the next in the chain.
This way adding a new rule is a trivial extension and the controlling code just passes the candidate to the first member of the chain.
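A compact sketch of such a chain, assuming Java 8 so that the From condition can be held as a DoublePredicate. The class name RuleLink and its fields are hypothetical, and how the From string is parsed into a predicate is left out here:

```java
import java.util.function.DoublePredicate;

class RuleLink {
    private final DoublePredicate from; // condition derived from the "From" input
    private final double to;            // replacement value from the "To" input
    private final RuleLink next;        // next rule in the chain, or null at the end

    RuleLink(DoublePredicate from, double to, RuleLink next) {
        this.from = from;
        this.to = to;
        this.next = next;
    }

    // Transforms the candidate if this rule applies; otherwise hands it to the
    // next link. A candidate that no rule matches is returned unchanged.
    double apply(double candidate) {
        if (from.test(candidate)) {
            return to;
        }
        return next == null ? candidate : next.apply(candidate);
    }

    public static void main(String[] args) {
        // "From: >100.00 To: 150.00", followed by "From: <10.00 To: 0.00"
        RuleLink chain = new RuleLink(v -> v > 100.0, 150.0,
                new RuleLink(v -> v < 10.0, 0.0, null));
        System.out.println(chain.apply(120.0)); // 150.0
        System.out.println(chain.apply(50.0));  // 50.0
    }
}
```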
a ValueMatcher interface and 5 implementations each with only 1 short "matches" method. I think it sure is better than the IFs, but still looks like an exaggerated solution.
Well, for something as simple as evaluating a number against an operator and a limit value, couldn't you just write one slightly more generic ValueMatcher which has a limit value and an operator as its parameters? It would then be pretty easy to add 5 instances of this ValueMatcher with a few combinations of >, >=, etc.
EDIT: Removed non Java stuff... sorry about that.
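One way to sketch the generic matcher: since Java 8, lambdas make it possible to keep the comparison operators in a Map, so a single class can cover all the From patterns. Everything named here (FromCondition, OPERATORS, parse) is hypothetical, the regular expression is only a guess at the XX.X formats listed in the question, and treating the XX.X-XX.X range as inclusive on both ends is an assumption:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.DoublePredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FromCondition {
    // Operator symbol -> comparison of (value, limit). "" means a plain number.
    private static final Map<String, BiPredicate<Double, Double>> OPERATORS = new HashMap<>();
    static {
        OPERATORS.put("",   (v, limit) -> Double.compare(v, limit) == 0);
        OPERATORS.put(">",  (v, limit) -> v > limit);
        OPERATORS.put(">=", (v, limit) -> v >= limit);
        OPERATORS.put("<",  (v, limit) -> v < limit);
        OPERATORS.put("<=", (v, limit) -> v <= limit);
    }

    // Optional operator, a number, and an optional "-number" for a range.
    private static final Pattern FROM =
            Pattern.compile("(>=|<=|>|<)?(\\d+(?:\\.\\d+)?)(?:-(\\d+(?:\\.\\d+)?))?");

    // Turns a From string such as ">100.00" or "10.5-20.5" into a predicate.
    static DoublePredicate parse(String from) {
        Matcher m = FROM.matcher(from.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized From value: " + from);
        }
        String op = m.group(1) == null ? "" : m.group(1);
        double limit = Double.parseDouble(m.group(2));
        if (m.group(3) != null) {
            if (!op.isEmpty()) {
                throw new IllegalArgumentException("Operator not allowed on a range: " + from);
            }
            double upper = Double.parseDouble(m.group(3));
            return v -> v >= limit && v <= upper; // inclusive range (assumption)
        }
        BiPredicate<Double, Double> test = OPERATORS.get(op);
        return v -> test.test(v, limit);
    }
}
```

Supporting a new operator then means adding one entry to the map and one alternative to the pattern, with no extra classes or IFs.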