Generally speaking, can you suggest an approach that would let me test objects to make sure they are alike?
Accept that objects are alike if over n% of their content is identical.
Other than brute force, are there any libraries available I can take advantage of?
Thanks.
As a starting point, have a look at the Levenshtein distance and see if it's relevant to your use case.
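To turn that distance into the n% similarity you mentioned, you could do something like the minimal sketch below (plain Java, no library assumed; the class and method names are just illustrative). For example, similarityPercent("kitten", "sitting") comes out at roughly 57%.
public final class TextSimilarity {

    // Percent of characters that "survive" the edit distance between a and b.
    public static double similarityPercent(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 100.0; // two empty strings are identical
        }
        return 100.0 * (maxLen - levenshtein(a, b)) / maxLen;
    }

    // Classic two-row dynamic programming implementation of the Levenshtein distance.
    private static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}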
This could only be done on a case by case basis. If I really needed this functionality, I'd define an interface:
public interface Similar<Entity> {
    boolean isSimilar(Entity other);
}
Each implementing class can define what it means to be 'similar' to another instance. Things to keep in mind would be the same issues that you would keep in mind for cloning: shallow copy vs. deep copy, etc.
Naive implementation of Person:
public class Person implements Similar<Person> {
    private String firstName;
    private String lastName;

    public String getLastName() {
        return lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public boolean isSimilar(Person other) {
        if (other != null) {
            if (lastName.equalsIgnoreCase(other.getLastName())
                    || (firstName.equalsIgnoreCase(other.getFirstName()))) {
                return true;
            }
        }
        return false;
    }
}
I believe you can find a good solution if you focus on the details of your specific problem. The only "reasonable" solution I have in mind for the general case is based on reflection: scan the data members and find similarities of corresponding pairs of members recursively.
However, there are so many problems with this idea that I don't think it's feasible. Among them:
1) The concept of weight of member subtrees should be well defined in order to be able to return a similarity percent.
2) How to handle data members that only belong to one of the objects? This will happen frequently when comparing an instance of class A to an instance of a descendant class B.
3) Maybe the biggest problem: The mapping between the internal structure of an object to its abstract data representation is not an injective function. For example, two hashmaps representing the same mapping may have different inner structure, due to different history of table re-allocations.
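For illustration only, a flat (non-recursive) version of the reflection idea might look like the sketch below. It sidesteps the problems above by ignoring inherited members, giving every field equal weight, and comparing values with plain equals():
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

// Naive sketch of the reflection idea: compare the declared instance fields of two
// objects of the same class and report the fraction that match. No recursion,
// no inherited fields, every field weighted equally.
public final class NaiveSimilarity {

    public static double similarityPercent(Object a, Object b) throws IllegalAccessException {
        if (a == null || b == null || a.getClass() != b.getClass()) {
            return 0.0;
        }
        int total = 0;
        int matching = 0;
        for (Field field : a.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue; // skip static fields
            }
            field.setAccessible(true);
            total++;
            if (Objects.equals(field.get(a), field.get(b))) {
                matching++;
            }
        }
        return total == 0 ? 100.0 : 100.0 * matching / total;
    }
}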
One thing you can try is encoding the objects and then comparing the results... In particular, I've done this with JSON. For detecting whether objects match completely, this is straightforward.
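A minimal sketch of that, assuming Gson is on the classpath (Jackson would work just as well):
import com.google.gson.Gson;

// Sketch: compare two objects by comparing their JSON representations.
public final class JsonComparison {
    private static final Gson GSON = new Gson();

    public static boolean sameContent(Object a, Object b) {
        return GSON.toJson(a).equals(GSON.toJson(b));
    }
}
Note that this only detects exact matches; for an n% similarity you would have to diff the two JSON documents instead.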
You could implement the Comparable interface and define your own 'logic' for comparing instances of a class.
As mentioned before me, for text similarity you could use distance calculation algorithms, which you can find in the SimMetrics library (http://www.dcs.shef.ac.uk/~sam/simmetrics.html).
Another way to compare is by comparing object hash codes (after you override the hashCode() method of the Object class); not sure that it's what you are looking for.
Related
I need to compare huge, deeply nested, complex Java objects. There are no circular references (ufff). Adding an equals method to each class would take weeks. I got the idea that maybe serializing these objects to byte arrays and then comparing those arrays could be easier.
Very simplified example:
public class Car implements Serializable {
    String name;
    String brand;

    public Car(String name, String brand) {
        this.name = name;
        this.brand = brand;
    }
}

public class App {
    public static void main(String[] args) {
        Car carA = new Car("Ka", "Ford");
        Car carB = new Car("P85D", "Tesla");
        Car carC = new Car("Ka", "Ford");

        byte[] carAr = SerializationUtils.serialize(carA);
        byte[] carBr = SerializationUtils.serialize(carB);
        byte[] carCr = SerializationUtils.serialize(carC);

        boolean res = Arrays.equals(carAr, carBr);
        System.out.println(res);
        res = Arrays.equals(carAr, carAr);
        System.out.println(res);
        res = Arrays.equals(carAr, carCr);
        System.out.println(res);
    }
}
My initial tests (for much smaller and simpler structures) show that it works. Unfortunately I cannot find any proof that object properties are always serialized in the same order.
UPDATE
These classes are generated by a Maven plugin, and I do not know yet how to intercept that process to add equals and hashCode methods.
If you're looking to compare objects for equality, serializing them and then comparing is pointless when you can simply override the class' equals method and compare them by their fields (there will undoubtedly be cases where certain fields of the class should not be taken into account, and your method of serialization may not offer that).
Adding "equal" method to each class takes weeks.
I'm sure this isn't the case. Modern IDEs are able to generate these methods for you, and libraries (such as Guava) offer methods that can compare objects by their fields in a single line of code.
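For the Car class from the question, the generated methods typically boil down to something like this (a sketch using java.util.Objects, to be added inside Car):
import java.util.Objects;

// Roughly what an IDE generates for the Car class above.
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Car)) return false;
    Car other = (Car) o;
    return Objects.equals(name, other.name) && Objects.equals(brand, other.brand);
}

@Override
public int hashCode() {
    return Objects.hash(name, brand);
}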
You might want to give Apache's EqualsBuilder a try, most specifically this method (and variants of it); I think in the latest version (3.7) it is fully recursive, but I believe there is also a bug with Strings that is only fixed in the yet-to-be-released v3.8. Checking transient fields is also an option on some of the methods.
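A minimal sketch against commons-lang3, reusing the Car objects from the question (the excluded field name is just an example):
import org.apache.commons.lang3.builder.EqualsBuilder;

// Reflection-based comparison; no equals() needed on Car itself.
boolean sameCar = EqualsBuilder.reflectionEquals(carA, carC);

// Variant that skips fields you do not care about (field name chosen for illustration).
boolean sameIgnoringName = EqualsBuilder.reflectionEquals(carA, carC, "name");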
No, it isn't a good idea.
First, it ignores the content of all non-serializable base classes.
Second, it ignores any transient fields that may be present.
Third, it takes into account differences in referenced objects that you may wish to ignore.
Fourth, it requires that all classes concerned are serializable.
Fifth, it wastes time and space.
It isn't a good idea to post-modify generated code, but you could look into building a set of custom Comparators, which exist outside the class, such that you can use Comparator.compare(T object1, T object2) to compare them.
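For the generated Car class above, such an external comparator might look roughly like this; it assumes the comparator lives in the same package as Car (the generated class has no getters) and that both fields are non-null:
import java.util.Comparator;

// External comparator for the generated Car class; the generated code stays untouched.
public final class CarComparator implements Comparator<Car> {
    @Override
    public int compare(Car left, Car right) {
        int byBrand = left.brand.compareTo(right.brand);
        return byBrand != 0 ? byBrand : left.name.compareTo(right.name);
    }
}
Two cars then count as "equal" when new CarComparator().compare(carA, carC) == 0, without touching the generated source.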
I have an enum:
public enum Persons {
CHILD,
PARENT,
GRANDPARENT;
}
Is there any problem with using the ordinal() method to check the "hierarchy" between enum members? I mean: are there any disadvantages to using it, apart from verbosity and the risk that somebody could accidentally change the order in the future?
Or is it better to do something like that:
public enum Persons {
    CHILD(0),
    PARENT(1),
    GRANDPARENT(2);

    private Integer hierarchy;

    private Persons(final Integer hierarchy) {
        this.hierarchy = hierarchy;
    }

    public Integer getHierarchy() {
        return hierarchy;
    }
}
TLDR: No, you should not!
If you refer to the javadoc for ordinal method in Enum.java:
Most programmers will have no use for this method. It is
designed for use by sophisticated enum-based data structures, such
as java.util.EnumSet and java.util.EnumMap.
Firstly - read the manual (javadoc in this case).
Secondly - don't write brittle code. The enum values may change in the future, and your second code example is much clearer and more maintainable.
You definitely don't want to create problems for the future if a new enum value is (say) inserted between PARENT and GRANDPARENT.
As suggested by Joshua Bloch in Effective Java, it's not a good idea to derive a value associated with an enum from its ordinal, because changes to the ordering of the enum values might break the logic you encoded.
The second approach you mention follows exactly what the author proposes, which is storing the value in a separate field.
I would say that the alternative you suggested is definitely better because it is more extendable and maintainable, as you are decoupling the ordering of the enum values and the notion of hierarchy.
The first way is not immediately understandable, as you have to read the code where the enum is used to realize that the order of its values matters.
It is very error-prone.
public enum Persons {
    CHILD,
    PARENT,
    GRANDPARENT;
}
The second way is better, as it is self-explanatory:
CHILD(0),
PARENT(1),
GRANDPARENT(2);

private Persons(final Integer hierarchy) {
    this.hierarchy = hierarchy;
}
Of course, orders of the enum values should be consistent with the hierarchical order provided by the enum constructor arguments.
It introduces a kind of redundancy, as both the order of the enum values and the arguments of the enum constructor convey the hierarchy.
But why would that be a problem?
Enums are designed to represent constant and not frequently changing values.
The OP's enum illustrates good enum usage well:
CHILD, PARENT, GRANDPARENT
Enums are not designed to represent values that change frequently.
In that case, using an enum is probably not the best choice, as it may frequently break the client code that uses it, and it forces you to recompile, repackage, and redeploy the application every time an enum value is modified.
First, you probably don't even need a numeric order value -- that's
what Comparable
is for, and Enum<E> implements Comparable<E>.
If you do need a numeric order value for some reason, yes, you should
use ordinal(). That's what it's for.
Standard practice for Java Enums is to sort by declaration order,
which is why Enum<E> implements Comparable<E> and why
Enum.compareTo() is final.
If you add your own non-standard comparison code that doesn't use
Comparable and doesn't depend on the declaration order, you're just
going to confuse anyone else who tries to use your code, including
your own future self. No one is going to expect that code to exist;
they're going to expect Enum to be Enum.
If the custom order doesn't match the declaration order, anyone
looking at the declaration is going to be confused. If it does
(happen to, at this moment) match the declaration order, anyone
looking at it is going to come to expect that, and they're going to
get a nasty shock when at some future date it doesn't. (If you write
code (or tests) to ensure that the custom order matches the
declaration order, you're just reinforcing how unnecessary it is.)
If you add your own order value, you're creating maintenance headaches
for yourself:
you need to make sure your hierarchy values are unique
if you add a value in the middle, you need to renumber all
subsequent values
If you're worried someone could change the order accidentally in the
future, write a unit test that checks the order.
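A minimal JUnit sketch of such a test:
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Guards the declaration order that the rest of the code relies on.
public class PersonsOrderTest {
    @Test
    public void declarationOrderIsChildParentGrandparent() {
        assertEquals(0, Persons.CHILD.ordinal());
        assertEquals(1, Persons.PARENT.ordinal());
        assertEquals(2, Persons.GRANDPARENT.ordinal());
    }
}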
In sum, in the immortal words of Item 47:
know and use the libraries.
P.S. Also, don't use Integer when you mean int. 🙂
If you only want to create relationships between enum values, you can actually use the trick of using other enum values:
public enum Person {
    GRANDPARENT(null),
    PARENT(GRANDPARENT),
    CHILD(PARENT);

    private final Person parent;

    private Person(Person parent) {
        this.parent = parent;
    }

    public final Person getParent() {
        return parent;
    }
}
Note that you can only use enum values that were declared lexically before the one you're trying to declare, so this only works if your relationships form an acyclic directed graph (and the order you declare them is a valid topological sort).
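If a numeric level is still needed, it can be derived by walking the parent chain instead of being stored; a small illustrative helper (not part of the original enum):
// Hypothetical helper: derive a hierarchy level by counting ancestors.
// With the declaration above: levelOf(Person.CHILD) == 2, levelOf(Person.GRANDPARENT) == 0.
public static int levelOf(Person person) {
    int level = 0;
    for (Person p = person.getParent(); p != null; p = p.getParent()) {
        level++;
    }
    return level;
}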
Using ordinal() is not recommended, as changes in the enum's declaration may impact the ordinal values.
UPDATE:
It is worth noting that the enum fields are constants and can have duplicate values, e.g.
enum Family {
    OFFSPRING(0),
    PARENT(1),
    GRANDPARENT(2),
    SIBLING(3),
    COUSIN(4),
    UNCLE(4),
    AUNT(4);

    private final int hierarchy;

    private Family(int hierarchy) {
        this.hierarchy = hierarchy;
    }

    public int getHierarchy() {
        return hierarchy;
    }
}
Depending on what you're planning to do with hierarchy this could either be damaging or beneficial.
Furthermore, you could use the enum constants to build your very own EnumFlags instead of using EnumSet, for example
I would use your second option (using an explicit integer) so the numeric values are assigned by you and not by Java.
Let's consider following example:
We need to order several filters in our Spring Application. This is doable by registering filters via FilterRegistrationBeans:
@Bean
public FilterRegistrationBean compressingFilterRegistration() {
    FilterRegistrationBean registration = new FilterRegistrationBean();
    registration.setFilter(compressingFilter());
    registration.setName("CompressingFilter");
    ...
    registration.setOrder(1);
    return registration;
}
Let's assume we have several filters and we need to specify their order (e.g. we want the filter that adds the JSID to the MDC context for all loggers to run first).
And here I see the perfect usecase for ordinal(). Let's create the enum:
enum FilterRegistrationOrder {
    MDC_FILTER,
    COMPRESSING_FILTER,
    CACHE_CONTROL_FILTER,
    SPRING_SECURITY_FILTER,
    ...
}
Now in registration bean we can use:
registration.setOrder(MDC_FILTER.ordinal());
And it works perfectly in our case. If we didn't have an enum to do that, we would have to renumber all the filter orders by adding 1 to them (or to the constants that store them). With an enum, we only need to add one line in the proper place in the enum and use ordinal(). We don't have to change the code in many places, and we have a clear ordering structure for all our filters in one place.
In a case like this, I think the ordinal() method is the best option to achieve the ordering of filters in a clean and maintainable way.
You must use your judgement to evaluate which kind of errors would be more severe in your particular case. There is no one-size-fits-all answer to this question. Each solution leverages one advantage of the compiler but sacrifices the other.
If your worst nightmare is enums sneakily changing value: use ENUM(int)
If your worst nightmare is enum values becoming duplicated or losing contiguousness: use ordinal.
According to the javadoc:
Returns the ordinal of this enumeration constant (its position in its
enum declaration, where the initial constant is assigned an ordinal of
zero). Most programmers will have no use for this method. It is
designed for use by sophisticated enum-based data structures, such as
EnumSet and EnumMap.
You can control the ordinal by changing the order of the enum, but you cannot set it explicitly. One workaround is to provide an extra method in your enum for the number you want.
enum Mobile {
    Samsung(400), Nokia(250), Motorola(325);

    private final int val;

    private Mobile(int v) { val = v; }

    public int getVal() { return val; }
}
In this situation Samsung.ordinal() = 0, but Samsung.getVal() = 400.
This is not a direct answer to your question; rather, it is a better approach for your use case. This way, the next developer will explicitly know that the values assigned to the properties should not be changed.
Create a class with static properties that will simulate your enum:
public class Persons {
    public static final int CHILD = 0;
    public static final int PARENT = 1;
    public static final int GRANDPARENT = 2;
}
Then use just like enum:
Persons.CHILD
It will work for most simple use cases. Otherwise you might be missing out on options like valueOf(), EnumSet, EnumMap or values().
This feels like a really basic question, but it's pestering me. I want to get a diff between two sets, but it seems as if I'm incapable of doing so due to the set potentially containing objects with different implementations of Hash and Equals.
To give a simplified example of my problem lets say I have an IFoo interface which is implemented by two classes, call them BasicFoo and ExtendedFoo. They both describe the same physical object, but ExtendedFoo is enriched with some extra information BasicFoo lacks. Essentially BasicFoo describes what physical object my caller is interested in, which I'll eventually use to look up the appropriate ExtendedFoo object in my model. A BasicFoo could be thought of as being equal to an ExtendedFoo if they describe the same physical object.
I have a method that would do something sort of like this:
public void removeFooInModel(HashSet<IFoo> fooCollection) {
    HashSet<IFoo> fooInModel = Model.getFoo();
    fooCollection.removeAll(fooInModel);
    return;
}
My problem is that I don't think it will always work. If the user passes in a fooCollection that actually contains BasicFoo objects, and my Model.getFoo() method returns a Set that actually contains ExtendedFoo, then their equals and hashCode methods will be different (ExtendedFoo has far more state to compare for equality). Despite my having a very clear understanding of how to map equality and hashing between the two objects, it seems like there is no way to force this knowledge to be utilized by my Set object.
What I'm wondering is if there is any way to get the convenience of a Set's methods, like removeAll and contains, when I may wish to mix and match implementations?
One way to do this would be to make my ExtendedFoo and BasicFoo hashCode methods identical, only hashing on state that is available in the IFoo interface, and make their equals methods compare any IFoo object of a different class by only comparing IFoo getter methods. However, if someone later writes a FooBar object that extends IFoo, how can I guarantee that they also write their hashCode method identically to mine so my method still works with their classes?
I can work around my own issue easily enough by not using a Set in the one place this is an issue. However, what would others consider the 'proper' solution if I needed the efficiency gains offered by a HashSet?
Unfortunately, you cannot convince the hash set to compare basic and extended objects for equality as if they both were basic objects. However, you could build a wrapper object that holds a basic or an extended object, and uses the comparison method of the basic object to compare the two.
Here is a sketch of a possible implementation:
class FooWrapper {
    private final BasicFoo obj;

    public FooWrapper(BasicFoo obj) {
        this.obj = obj;
    }

    public BasicFoo getWrapped() {
        return obj;
    }

    public int hashCode() {
        // Compute the hash code the way the BasicFoo computes it
    }

    public boolean equals(Object other) {
        // Compare the objects the way the BasicFoo does
    }
}
Now you can take a mixed collection of BasicFoo and ExtendedFoo, wrap them in the FooWrapper, and put them in a hash set. You can perform operations on the sets, and then harvest the results by unwrapping the individual BasicFoo objects from the set of wrappers.
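A rough sketch of what the question's removal method could then look like; it assumes ExtendedFoo extends BasicFoo (so both fit in a FooWrapper) and that Model.getFoo() yields such objects:
import java.util.HashSet;
import java.util.Set;

// Sketch: perform the set difference on wrappers rather than on the raw objects.
public void removeFooInModel(Set<BasicFoo> fooCollection) {
    Set<FooWrapper> wrappedInput = new HashSet<>();
    for (BasicFoo foo : fooCollection) {
        wrappedInput.add(new FooWrapper(foo));
    }
    Set<FooWrapper> wrappedModel = new HashSet<>();
    for (BasicFoo foo : Model.getFoo()) {
        wrappedModel.add(new FooWrapper(foo));
    }

    // The set difference now uses FooWrapper's hashCode/equals for both sides.
    wrappedInput.removeAll(wrappedModel);

    // Unwrap the survivors back into the caller's collection.
    fooCollection.clear();
    for (FooWrapper wrapper : wrappedInput) {
        fooCollection.add(wrapper.getWrapped());
    }
}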
We use code values in our database, and Enums in Java. When querying the database, we need to take a code value and get an Enum instance.
Is it overkill to have a HashMap to avoid iteration? What would you do? Is there an easier way?
public enum SomeEnum
{
    TYPE_A(2000), TYPE_B(2001);

    private int codeValue;
    private static HashMap<Integer, SomeEnum> codeValueMap = new HashMap<Integer, SomeEnum>(2);

    static
    {
        for (SomeEnum type : SomeEnum.values())
        {
            codeValueMap.put(type.codeValue, type);
        }
    }

    //constructor and getCodeValue left out

    public static SomeEnum getInstanceFromCodeValue(int codeValue)
    {
        return codeValueMap.get(codeValue);
    }
}
That's exactly the approach I'd take to solve that particular problem. I see nothing wrong with it from a design point of view, it's intuitive, efficient and (as far as I can see) does exactly what it should.
The only other sensible approach I can think of would be to have the map in a separate class and then call that class to update the map from SomeEnum's constructor. Depending on the use case, this separation could be beneficial - but unless it would have a hard benefit I would take your approach and encapsulate everything within the enum itself.
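A sketch of that separate-class variant (names are hypothetical); keeping the map in a separate class also sidesteps the rule that an enum constructor cannot reference the enum's own non-constant static fields:
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry class: SomeEnum's constructor would call
// SomeEnumRegistry.register(codeValue, this) as each constant is created.
final class SomeEnumRegistry {
    private static final Map<Integer, SomeEnum> BY_CODE = new HashMap<>();

    private SomeEnumRegistry() {
    }

    static void register(int codeValue, SomeEnum value) {
        BY_CODE.put(codeValue, value);
    }

    static SomeEnum lookup(int codeValue) {
        return BY_CODE.get(codeValue);
    }
}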
Thanks, I guess my main concern is memory usage, and if it is worth it.
Unless that enum has thousands of values, memory usage will be trivial. (And if it does have thousands of values, then using iteration to do the lookup would be a major performance killer.)
This is a sensible use of memory, IMO.
Perhaps I am over thinking this.
Perhaps you are.
I think in this case we can't avoid iteration: it's either the HashMap doing it, or we write our own iteration code.
If performance really does matter maybe you can try a binary tree approach.
If your enum space is dense, that is, not a lot of unused values, you could use the toString() and valueOf() methods. Name your values with a common string prefix, then attach the prefix before using valueOf() and strip it after using toString(). This has the disadvantage that you would have to convert to a numeric value if that's how it's stored in your database.
Alternatively, you could add common methods for conversion and assign your database value to a specific enum value.
Both these techniques have the advantage of leveraging the design of enum classes.
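A sketch of the prefix technique (names are illustrative): the database stores 2000/2001 and the constants carry a common "V" prefix so that valueOf() can find them.
public enum SomeEnum {
    V2000, V2001;

    // valueOf() throws IllegalArgumentException for unknown code values.
    public static SomeEnum fromCodeValue(int codeValue) {
        return SomeEnum.valueOf("V" + codeValue);
    }

    // Strip the prefix to get back the numeric database value.
    public int toCodeValue() {
        return Integer.parseInt(name().substring(1));
    }
}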
There is a lot of good, mind-bending information about enums (and Java, in general) at http://mindprod.com/jgloss/enum.html.
Though, there's nothing wrong with your way if it does the job you want.
That's fine. Don't worry about tiny performance differences.
One would think that if there are only two instances of an enum, like in your example, trivial iteration code would be faster:
public static SomeEnum getInstanceFromCodeValue(int codeValue)
{
    for (SomeEnum type : SomeEnum.values()) {
        if (type.codeValue == codeValue)
            return type;
    }
    return null; // or throw, if an unknown code value is a programming error
}
But there's a hidden cost, quite an expensive one if we care about performance at that level: every call to values() allocates a fresh copy of the backing array. It's fixable, but you need to notice it first. :)
To get the ID:
EnumDay day = EnumDay.WEDNESDAY;
int myID = day.ordinal();
To load the day from the myID:
EnumDay dayCopy = EnumDay.values()[myID];
With a TreeMap it's trivial to provide a custom Comparator, thus overriding the semantics provided by Comparable objects added to the map. HashMaps however cannot be controlled in this manner; the functions providing hash values and equality checks cannot be 'side-loaded'.
I suspect it would be both easy and useful to design an interface and retrofit this into HashMap (or a new class). Something like this, except with better names:
interface Hasharator<T> {
    int alternativeHashCode(T t);
    boolean alternativeEquals(T t1, T t2);
}

class HasharatorMap<K, V> {
    HasharatorMap(Hasharator<? super K> hasharator) { ... }
}

class HasharatorSet<T> {
    HasharatorSet(Hasharator<? super T> hasharator) { ... }
}
The case insensitive Map problem gets a trivial solution:
new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);
Would this be doable, or can you see any fundamental problems with this approach?
Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)
EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)
EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.
EDIT: Accepted answer with relation to performance; would love a more specific answer!
EDIT: There is an implementation; see the accepted answer below.
EDIT: Rephrased the first sentence to indicate more clearly that it's the side-loading I'm after (and not ordering; ordering does not belong in HashMap).
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap (in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap is commons-collections' own IdentityMap (only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator" to a Map instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
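For example, a case-insensitive String-keyed map along those lines might be sketched like this (assuming the 3.2.x pre-generics API; non-String keys fall back to the default behaviour):
import org.apache.commons.collections.map.AbstractHashedMap;

// Sketch of a case-insensitive String-keyed map built on commons-collections'
// AbstractHashedMap. Only hash() and isEqualKey() are overridden.
public class CaseInsensitiveHashedMap extends AbstractHashedMap {

    public CaseInsensitiveHashedMap() {
        super(16, 0.75f);
    }

    protected int hash(Object key) {
        return key instanceof String
                ? ((String) key).toLowerCase().hashCode()
                : key.hashCode();
    }

    protected boolean isEqualKey(Object key1, Object key2) {
        if (key1 instanceof String && key2 instanceof String) {
            return ((String) key1).equalsIgnoreCase((String) key2);
        }
        return key1.equals(key2);
    }
}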
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
    int computeHashCode(E object);

    boolean equals(E object1, E object2);
}
You can't use a HashingStrategy with the built in HashSet or HashMap. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy and a java.util.Map called UnifiedMapWithHashingStrategy.
Let's look at an example.
public class Data
{
    private final int id;

    public Data(int id)
    {
        this.id = id;
    }

    public int getId()
    {
        return id;
    }

    // No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map? UnifiedSetWithHashingStrategy uses half the memory of a UnifiedMap, and one quarter the memory of a HashMap. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains(), but not get(). UnifiedSetWithHashingStrategy implements Pool in addition to Set, so it also implements a form of get().
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase() which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
    new HashingStrategy<String>()
    {
        @Override
        public int computeHashCode(String string)
        {
            int hashCode = 0;
            for (int i = 0; i < string.length(); i++)
            {
                hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
            }
            return hashCode;
        }

        @Override
        public boolean equals(String string1, String string2)
        {
            return string1.equalsIgnoreCase(string2);
        }
    };
Note: I am a developer on GS collections.
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String, you can write a wrapper class around String for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
    public LowerStringWrapper(String s) {
        this.s = s;
        this.lowerString = s.toLowerCase();
    }

    // getter methods omitted

    // Rely on the hashing of String, as we know it to be good.
    public int hashCode() { return lowerString.hashCode(); }

    // We overrode hashCode, so we MUST also override equals. It is required
    // that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
    // restore that invariant.
    public boolean equals(Object obj) {
        if (obj instanceof LowerStringWrapper) {
            return lowerString.equals(((LowerStringWrapper) obj).lowerString);
        } else {
            return lowerString.equals(obj);
        }
    }

    private String s;
    private String lowerString;
}
Good question; ask Josh Bloch. I submitted that concept as an RFE for Java 7, but it was dropped; I believe the reason was something performance-related. I agree, though, that it should have been done.
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode, and a reference to the callback interface responsible for equality checks. This is obviously not as efficient as using a wrapper class where you'd only have to cache the original key plus one more object (see hazzen's answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
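Roughly, the silently wrapped key from that experiment looked something like the sketch below (reconstructed with the Hasharator interface from the question; names are hypothetical):
// Hypothetical sketch of the "silently wrapped key" idea described above:
// the wrapper caches the hash and delegates equality to the strategy callback.
final class KeyWrapper<K> {
    private final K key;
    private final int cachedHash;
    private final Hasharator<? super K> strategy;

    KeyWrapper(K key, Hasharator<? super K> strategy) {
        this.key = key;
        this.strategy = strategy;
        this.cachedHash = strategy.alternativeHashCode(key);
    }

    K unwrap() {
        return key;
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }

    @Override
    @SuppressWarnings("unchecked")
    public boolean equals(Object other) {
        // As noted above, get(Object) forces an unchecked cast (or an instanceof
        // check inside the strategy), because the map cannot know the key's Class.
        return other instanceof KeyWrapper
                && strategy.alternativeEquals(key, (K) ((KeyWrapper<?>) other).key);
    }
}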
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
In short, your idea sounds really good, but if you actually tried to implement it, you would see that doing so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no way to efficiently combine the two structures without losing some of the guarantees of one or the other.
There's such a feature in com.google.common.collect.CustomConcurrentHashMap; unfortunately, there's currently no public way to set the Equivalence (their Hasharator). Maybe they're not yet done with it, maybe they don't consider the feature to be useful enough. Ask on the Guava mailing list.
I wonder why it hasn't happened yet, as it was mentioned in this talk over two years ago.