I have a table whose PK consists of two short varchars (15 and 5) and one datetime field.
My idea for creating a hashCode was to format the datetime as something like yyyyMMddHHmmss, concatenate it with the other two fields using some delimiter (e.g. _), and then ask for the hash code of that string.
Was wondering if there may be a more elegant approach.
Thanks
It all depends on what you mean by "bulletproof". If you just mean it can be used as the hashCode of a Java object, then it should be fine. Doesn't Hibernate return a datetime as a Java Date? If so, just call hashCode() on that Date. Instead of concatenating and hashing, you can XOR (or add, ...) it with the other fields' hash codes, which may be a bit faster.
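The combination of per-field hash codes described above can be sketched as follows. This is only an illustration: the class name `PkHash` and the parameter names `code`, `type`, and `timestamp` are hypothetical stand-ins for the two varchar columns and the datetime column. It also shows `Objects.hash` as an alternative to a plain XOR.

```java
import java.util.Date;
import java.util.Objects;

// Hypothetical helper; the field names are assumptions, not from the question.
final class PkHash {
    // Plain XOR of the three field hash codes, as suggested in the answer.
    static int xorHash(String code, String type, Date timestamp) {
        return code.hashCode() ^ type.hashCode() ^ timestamp.hashCode();
    }

    // Order-sensitive combination using the standard library helper.
    static int orderedHash(String code, String type, Date timestamp) {
        return Objects.hash(code, type, timestamp);
    }
}
```

One design point: XOR is symmetric, so swapping the two string fields gives the same hash code, while `Objects.hash` takes field order into account. Either is a valid hashCode as long as equals() compares the same three fields.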
If by "bulletproof" you need a cryptographically secure hash, then you need to do more.
I have an object that is identified by 3 fields. One of them is a String that represents 6 hex bytes; the other two are integers of no more than 1 byte each. Summed up, this is 8 bytes of data, which fits in a 64-bit integer.
I need to map these objects for fast access, and I can think of two approaches:
Use the 3 fields to generate a 64-bit key used to map the objects. This, however, would mean parsing the String as hex for every access (and there will be a lot of accesses, which need to be fast).
Use 3 HashMap levels, each nested inside the next, to represent the 3 identifying fields.
My question is which of these approaches should be the fastest.
Why not use a MultiKeyMap?
This might not be directly related to your question.
I have a suggestion for you.
Create an object with the 3 attributes that will form the key. Use the object as the key because it will be unique.
Map<ObjectKey,Object> map = new HashMap<>();
Does this make sense for your use case? If you can add a bit more explanation, maybe I can suggest further possible solutions.
EDIT: You can override equals using this kind of logic:
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof ObjectKey))
        return false;
    ObjectKey objectKey = (ObjectKey) obj;
    return this.key1.equals(objectKey.key1) && this.key2.equals(objectKey.key2) &&
    ...
    this.keyN.equals(objectKey.keyN);
}
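A key class used in a HashMap also needs a hashCode() that agrees with equals(), otherwise equal keys can land in different buckets. A minimal sketch for the three fields from the question might look like this; the field names `hexId`, `first`, and `second` are assumptions for illustration.

```java
import java.util.Objects;

// Sketch of the key class; field names are hypothetical.
final class ObjectKey {
    private final String hexId;  // the String holding 6 hex bytes
    private final int first;     // first small integer field
    private final int second;    // second small integer field

    ObjectKey(String hexId, int first, int second) {
        this.hexId = Objects.requireNonNull(hexId);
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ObjectKey)) return false;
        ObjectKey other = (ObjectKey) obj;
        return hexId.equals(other.hexId)
                && first == other.first
                && second == other.second;
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals, so equal keys share a hash code.
        return Objects.hash(hexId, first, second);
    }
}
```

With this in place, map.get(new ObjectKey("0A1B2C3D4E5F", 1, 2)) finds an entry stored under a different but equal ObjectKey instance.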
I would take the following steps:
Write it in the most readable way first, and profile it.
Refactor it to an implementation you think might be faster, then profile it again.
Compare.
Repeat.
Your key fits into a 64-bit value. Assuming you will build the HashMap in one go and then read from it multiple times (using it as a lookup table), my hunch is that using a Long type as the key of your HashMap will be about as fast as you can get.
You are concerned about having to parse the string as a hex number every time you look up a key in the map. What's the alternative? If you use a key containing the three separate fields, you will still have to parse the string to calculate its hash code (or, rather, the Java API implementation will calculate its hash code by parsing the string contents). The HashMap will not only call String.hashCode() but also String.equals(), so your string will be iterated twice. By contrast, calculating a Long and comparing it to the precalculated keys in the HashMap will consist of iterating the string only once.
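Packing the three fields into a single long could be sketched like this. It assumes the String holds exactly 12 contiguous hex digits (no separators) and that both integers are in the range 0 to 255; the names `PackedKey`, `hexId`, `a`, and `b` are hypothetical.

```java
// Sketch only: assumes 12 hex digits and two values that each fit in one byte.
final class PackedKey {
    static long pack(String hexId, int a, int b) {
        long id = Long.parseLong(hexId, 16);   // 6 bytes = 48 bits
        return (id << 16)                      // hex value in the top 48 bits
                | ((long) (a & 0xFF) << 8)     // first integer in the next 8 bits
                | (b & 0xFF);                  // second integer in the low 8 bits
    }
}
```

The resulting value can be used directly as the key of a Map<Long, V>. Whether this is actually faster than a key object is something to confirm by profiling, as suggested above.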
If you use three levels of HashMap, as per your second suggestion, you will still have to calculate the hash code of your string, as well as having to look up the values of all three fields anyway, so the multi-level map doesn't give you any performance advantage.
You should also experiment with the HashMap constructor arguments to get the most efficiency. These will determine how efficiently your data will get spread into separate buckets.
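The constructor arguments in question are the initial capacity and the load factor. As a rough sketch, assuming the number of entries is known up front, a map can be sized so that it never has to resize while it is being filled. The helper name `PresizedMap.create` and the parameter `expectedEntries` are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that sizes a HashMap for a known number of entries.
final class PresizedMap {
    static <V> Map<Long, V> create(int expectedEntries) {
        float loadFactor = 0.75f;  // the documented default load factor
        // Capacity large enough that expectedEntries stays below the resize threshold.
        int initialCapacity = (int) Math.ceil(expectedEntries / (double) loadFactor);
        return new HashMap<>(initialCapacity, loadFactor);
    }
}
```

A lower load factor trades memory for fewer entries per bucket; the best values depend on the data, so they are worth measuring rather than guessing.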
It is a known fact that Java's hashCode() can produce only about 4 billion (2^32) distinct values.
I am using the hash code of a string (for example Fname + Lname + DOB + DATE) as the primary key in my database.
In @PrePersist I set the key to this hash code, which gives me an identifier for each new user (it has to be unique).
Now I am running out of hash codes. A possible alternative for me is to use SHA-2, MD5, etc.
How can I increase the size of the hash code and still avoid collisions?
If your goal is to create a unique identifier for the database, I would suggest using UUID.
UUID Version 3, as it uses a namespace, will fit your case.
Some databases have native support for UUID, for instance PostgreSQL
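In Java, a name-based version 3 UUID can be produced with UUID.nameUUIDFromBytes. A minimal sketch, assuming the identifier is derived from the same fields the question mentions; the class name, parameter names, and the "|" delimiter are choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch: derives a deterministic, name-based (version 3) UUID from user fields.
final class NameUuid {
    static UUID fromFields(String fname, String lname, String dob) {
        // Delimiter keeps ("ab", "c") and ("a", "bc") from producing the same name.
        String name = fname + "|" + lname + "|" + dob;
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that nameUUIDFromBytes takes only the name bytes; if a namespace is wanted, the caller has to prepend it to the bytes. The same input always yields the same UUID, so two users with identical field values would still get the same identifier.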
I think you are confusing int Object.hashCode(), which you can override and which returns an int, with a secure hash function. Those are two different things. Object.hashCode() is not intended to return unique integers (returning 1 for every object is a valid implementation). So using String.hashCode() for object identity is not a great idea, since it can and will have collisions. It's intended for use with e.g. hash tables, which means it is optimized for performance and not for avoiding collisions.
You can indeed use sha1, sha2, sha3, or md5 if you want some kind of content hash. If not, use SecureRandom or UUID to generate something random. All of these have a very low probability of ever giving you a collision (not completely 0 of course).
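A content hash of the kind mentioned above can be computed with java.security.MessageDigest. This is a minimal sketch using SHA-256 and a hex-encoded result; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: SHA-256 of a string, returned as 64 lowercase hex characters.
final class ContentHash {
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));  // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

If the identifier does not need to be derived from the content, UUID.randomUUID() is the simpler option.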
I'm running this off a java program which connects to my sql server on the same machine.
Basically I'm trying to look up a certain string, which can be identified either by the string itself or by an already stored long (int64), a unique value derived from the string.
So my question is: would a long comparison in a SQL lookup be faster than a String comparison, or wouldn't it matter that much?
SELECT * FROM playerAccount WHERE playerName = {string in Java}
or
SELECT * FROM playerAccount WHERE nameHash = {long in Java}
Thanks in advance ;)
The comparison operation itself is rather negligible. However, in general in computer code, the comparison of the long is going to use fewer cycles than the comparison for a string.
The reason is that comparing the bits in a numeric value is unambiguous and the code doesn't need to worry about the length of the value. When comparing strings, the underlying code has to "parse" the strings, character by character, to make the comparison, figure out where they end, and handle collations and character pages.
But this is rather unimportant. For speed, you want an index. And although an index using the numeric value might be an iota faster than an index using a string, this is the wrong criterion for choosing which to use. Your code should be designed to function correctly and to be maintainable. It is doubtful that an optimization of this sort would ever be necessary to achieve a real-world goal.
Generally comparing long values is faster than comparing string values.
If the string and the long are stored in a database, the problem is not the comparison but the presence or absence of an index.
So the better solution is to use the long value with an index in the database.
Please note that if "nameHash" is the hashCode of the field "playerName", the two searches don't necessarily return the same record. In fact, two players with different names can have the same hashCode, so consider exactly what your needs are and update the code if necessary.
I have an
HashMap<String,AnObject>
and I'd like to derive the string key from some information that the AnObject value contains.
Suppose AnObject is made this way:
public class AnObject {
public String name;
public String surname;
}
Is it correct to assign the key to:
String.valueOf(o.name.hashcode()+o.surname.hashcode());
? Or is there a better way to compute a String hash code from a list of values?
No, absolutely not. hashCode() is not guaranteed to be unique.
The rules of a hash code are simple:
Two equal values must have the same hash code
Two non-equal values will ideally have different hash codes, but can have the same hash code. In particular, there are only 2^32 possible values to return from hashCode(), but more than 2^32 possible strings, making uniqueness impossible.
The hash code of an object should not change unless some equality-sensitive aspect of it changes. Indeed, it's generally a good idea to make types implementing value equality immutable, at least in equality-sensitive aspects. Otherwise you can easily find that you can't look up an entry using the exact same object reference that you previously used for the key!
Hash codes are an optimization technique to make it quick to find a "probably small" set of candidate values equal to some target, which you then iterate through with a rigorous equality check to find whether any of them is actually equal to the target. That's what lets you quickly look something up by key in a hash-based collection. The key isn't the hash itself.
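To make the lack of uniqueness concrete, here is a small sketch of the key scheme from the question; the class name `CollisionDemo` and method name `weakKey` are made up. The strings "Aa" and "BB" are a well-known pair with the same String.hashCode(), and addition is commutative, so swapping name and surname also produces the same key.

```java
// Sketch of the question's key scheme, to show how easily it collides.
final class CollisionDemo {
    static String weakKey(String name, String surname) {
        // Sum of two hash codes: different inputs can produce the same result.
        return String.valueOf(name.hashCode() + surname.hashCode());
    }
}
```

For example, weakKey("Aa", "Smith") equals weakKey("BB", "Smith"), and weakKey("John", "Smith") equals weakKey("Smith", "John"), so distinct people would overwrite each other in the map.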
If you need to create a key from two strings, you're going to basically have to make it from those two strings (with some sort of delimiter so you can tell the difference between {"a", "bc"} and {"ab", "c"} - understanding that the delimiter itself might appear in the values if you're not careful).
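One way to sidestep the delimiter problem is to prefix the key with the length of the first string instead of relying on a separator character. This is a swapped-in variation on the delimiter idea rather than what the answer literally describes, and the names `NameKey`, `naive`, and `of` are hypothetical.

```java
// Sketch: builds an unambiguous String key from two strings.
final class NameKey {
    // Plain concatenation: ("a", "bc") and ("ab", "c") both give "abc".
    static String naive(String name, String surname) {
        return name + surname;
    }

    // Length prefix: the digits before the first ':' say where the name ends,
    // so the two parts can always be told apart, whatever characters they contain.
    static String of(String name, String surname) {
        return name.length() + ":" + name + surname;
    }
}
```

With this scheme, of("a", "bc") is "1:abc" and of("ab", "c") is "2:abc". A small key class with equals() and hashCode() over both fields would avoid string building altogether.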
See Eric Lippert's blog post on the topic for more information; that's based on .NET rather than Java, but the same principles all apply. It's also worth understanding that the semantics of hashCode aren't necessarily the same as those of a cryptographic hash. In particular, it's fine for the result of hashCode() to change if you start a new JVM but create an object with the same fields - no-one should be persisting the results of hashCode. That's not the case with something like SHA-256, which should be permanently stable for a particular set of data.
The hash code for String is lossy; many String values will result in the same hash code. An integer has 32 bit positions and each position has two values. There's no way to map even just the 32-character strings (for instance) (each character having lots of possibilities) into 32 bits without collisions. They just won't fit.
If you want to use arbitrary precision arithmetic (say, BigInteger), then you can just take each character as an integer and concatenate them all together.
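A rough sketch of that idea, assuming the string is encoded as UTF-8 and its bytes are read as one large non-negative number; the class name `BigKey` is made up. One caveat: leading NUL characters would be lost, because leading zero bytes do not change the numeric value.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Sketch: maps a string to an arbitrary-precision integer without a fixed bit width.
final class BigKey {
    static BigInteger of(String text) {
        // Signum 1 means the bytes are interpreted as a big-endian magnitude.
        return new BigInteger(1, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

For "abc" the bytes are 0x61, 0x62, 0x63, so the result is 0x616263, which is 6382179 in decimal.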
No, hashCode() (BTW, pay attention to the capital letter C) does not guarantee uniqueness. You can have a lot of objects that produce the same hash code.
If you need a unique identifier, use the class java.util.UUID.
A SortedSet sorts itself automatically, but in some cases it doesn't work the way I want. For example, I stored String date values in a SortedSet, but the result was not what I expected. This is what I got:
[03-10-2013, 06-10-2013, 08-10-2013, 09-10-2013, 18-09-2013, 24-09-2013, 29-09-2013]
Is there any good way to deal with this problem without having to introduce a comparator?
The best way is to avoid using String to represent a Date. Use a Date, which has a natural chronological order. Transform the date to a String only when necessary, i.e. to display it to users or store it in a file.
The reason it doesn't work is that the natural ordering of String is the lexicographic order. So "18-09-2013" comes after "03-10-2013", simply because '1' comes after '0' in the lexicographic order.
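As a sketch of storing real dates instead of strings, the values can be parsed before they go into the set. This example uses java.time.LocalDate rather than java.util.Date, which is a substitution on my part; the class name `DateSort` and the pattern dd-MM-uuuu are assumptions based on the sample output in the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: parses day-month-year strings and keeps them in chronological order.
final class DateSort {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd-MM-uuuu");

    static SortedSet<LocalDate> parseAll(List<String> texts) {
        // TreeSet uses LocalDate's natural ordering, which is chronological.
        SortedSet<LocalDate> dates = new TreeSet<>();
        for (String text : texts) {
            dates.add(LocalDate.parse(text, FORMAT));
        }
        return dates;
    }
}
```

With the dates from the question, 18-09-2013 then sorts before 03-10-2013. The same FORMAT can be used to turn each LocalDate back into a string for display.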
Use a set of either:
Date objects (java.util.Date), or
time in milliseconds (java.lang.Long)
These objects can be compared much more easily.