I have an 18-character String that I need to convert into a unique long (in Java).
A sample String would be: AAA2aNAAAAAAADnAAA
My String is actually an Oracle ROWID, so it can be broken down if need be; see:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#CNCPT713
The generated long (1) must be unique, as no two results can point to the same database row, and (2) must be reversible, so I can get the ROWID String back from the long.
Any suggestions on an algorithm to use would be welcome.
Oracle forum question on this from a few years ago: http://forums.oracle.com/forums/thread.jspa?messageID=1059740
Ro
You can't, with those requirements.
18 characters of (assuming) upper- and lower-case letters give 56^18, or about 2.933 × 10^31, combinations. This is (way) more than the approximately 1.84467441 × 10^19 (2^64) combinations available in 64 bits.
UPDATE: I had the combinatorics wrong, heh. Same result though.
Just create a map (dictionary / hashtable) that maps ROWID strings to an (incremented) long. If you keep two such dictionaries and wrap them up in a nice class, you will have a bidirectional lookup between the strings and the long IDs.
Pseudocode:
class BidirectionalLookup:
    dict<string, long> stringToLong
    dict<long, string> longToString
    long lastId

    addString(string): long
        newId = atomic(++lastId)
        stringToLong[string] = newId
        longToString[newId] = string
        return newId

    lookUp(string): long
        return stringToLong[string]

    lookUp(long): string
        return longToString[long]
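A minimal Java sketch of the pseudocode above, assuming a single JVM and an in-memory mapping (it would have to be persisted to survive a restart). The class name and the choice to return the existing ID when a string is added twice, rather than overwriting it, are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Two-way lookup between ROWID strings and generated long IDs.
// synchronized stands in for the pseudocode's atomic(++lastId).
final class BidirectionalLookup {
    private final Map<String, Long> stringToLong = new HashMap<>();
    private final Map<Long, String> longToString = new HashMap<>();
    private long lastId = 0;

    // Returns the existing ID if the string is already known, otherwise assigns the next one.
    synchronized long addString(String s) {
        Long existing = stringToLong.get(s);
        if (existing != null) {
            return existing;
        }
        long newId = ++lastId;
        stringToLong.put(s, newId);
        longToString.put(newId, s);
        return newId;
    }

    // Returns null if the string was never added.
    synchronized Long lookUp(String s) {
        return stringToLong.get(s);
    }

    // Returns null if the ID was never issued.
    synchronized String lookUp(long id) {
        return longToString.get(id);
    }
}
```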
Your 18-character String, being a base-64 encoding, represents a total of 108 bits of information, which is almost twice the 64 bits of a long. We have a bit of a problem here if we want to represent every possible key and have the representation be reversible.
The string can be broken down into 4 numbers easily enough. Each of those 4 numbers represents something - a block number, an offset in that block, whatever. If you manage to establish upper limits on the underlying quantities such that you know larger numbers will not occur (i.e. if you find a way to identify at least 44 of those bits that will always be 0), then you can map the rest onto a long, reversibly.
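As a sketch of breaking the string into its 4 numbers, the following assumes the OOOOOO-FFF-BBBBBB-RRR layout of an extended ROWID and Oracle's base-64 digit order A-Z, a-z, 0-9, +, / (A = 0). The class and method names are hypothetical.

```java
// Splits an 18-character ROWID into {objectId, fileNo, blockNo, rowNo}.
// Assumes digit order A-Z, a-z, 0-9, +, / with 'A' = 0, six bits per character.
final class RowidParts {
    static final String DIGITS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decodes a run of base-64 digits into a number.
    static long decode(String s) {
        long value = 0;
        for (char c : s.toCharArray()) {
            int d = DIGITS.indexOf(c);
            if (d < 0) throw new IllegalArgumentException("not a ROWID digit: " + c);
            value = (value << 6) | d;
        }
        return value;
    }

    static long[] split(String rowid) {
        if (rowid.length() != 18) throw new IllegalArgumentException("expected 18 characters");
        return new long[] {
                decode(rowid.substring(0, 6)),   // OOOOOO: data object number
                decode(rowid.substring(6, 9)),   // FFF: file number
                decode(rowid.substring(9, 15)),  // BBBBBB: block number
                decode(rowid.substring(15, 18))  // RRR: row number
        };
    }

    public static void main(String[] args) {
        long[] p = split("AAA2aNAAAAAAADnAAA");
        // prints object=222861 file=0 block=231 row=0
        System.out.printf("object=%d file=%d block=%d row=%d%n", p[0], p[1], p[2], p[3]);
    }
}
```

Once the parts are separate numbers, you can look at their real maximum values in your database and decide how many bits each one actually needs.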
Another possibility would be to relax the requirement that the equivalent be a long. How about a BigInteger? That would make it easy.
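A sketch of the BigInteger route, assuming again Oracle's base-64 digit order A-Z, a-z, 0-9, +, /. Each character contributes 6 bits, so any ROWID fits in 108 bits, and decoding back to a fixed length of 18 restores leading 'A' (zero) digits. Class and method names are hypothetical.

```java
import java.math.BigInteger;

// Reversible ROWID <-> BigInteger mapping, 6 bits per character (18 * 6 = 108 bits).
final class RowidBigInteger {
    static final String DIGITS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    static final int LENGTH = 18;

    static BigInteger toBigInteger(String rowid) {
        if (rowid.length() != LENGTH) throw new IllegalArgumentException("expected 18 characters");
        BigInteger value = BigInteger.ZERO;
        for (char c : rowid.toCharArray()) {
            int d = DIGITS.indexOf(c);
            if (d < 0) throw new IllegalArgumentException("not a ROWID digit: " + c);
            value = value.shiftLeft(6).or(BigInteger.valueOf(d)); // append 6 bits
        }
        return value;
    }

    static String fromBigInteger(BigInteger value) {
        char[] out = new char[LENGTH];
        for (int i = LENGTH - 1; i >= 0; i--) {         // last character first
            out[i] = DIGITS.charAt(value.intValue() & 0x3F);
            value = value.shiftRight(6);
        }
        return new String(out);
    }
}
```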
I'm assuming that's a case-sensitive alphanumeric string, and so drawn from the set [a-zA-Z0-9]*
In that case you have
26 + 26 + 10 = 62
possible values for each character.
62 < 64 = 2^6
In other words you need (at least) 6 bits to store each of the 18 characters of the key.
6 * 18 = 108 bits
to store the entire string uniquely.
108 bits = (108 / 8) = 13.5 bytes.
Therefore as long as your data type can store at least 13.5 bytes then you can fairly simply define a mapping:
Map from raw ASCII for each character to a representation using only 6 bits
Concatenate all 18 reduced representations into a single 14-byte value
Cast this to your final data value
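The three steps above can be sketched in Java with a byte array as the "final data value", since no primitive type holds 14 bytes. This assumes the [a-zA-Z0-9] alphabet from this answer (so '+' and '/', which Oracle ROWIDs can contain, are rejected); the symbol order and class name are arbitrary choices for illustration.

```java
// Packs 18 characters from [a-zA-Z0-9] into 14 bytes (6 bits each, 108 bits,
// the last 4 bits are padding) and unpacks them again.
final class SixBitPacker {
    static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    static byte[] pack(String s) {
        if (s.length() != 18) throw new IllegalArgumentException("expected 18 characters");
        byte[] out = new byte[14];
        long acc = 0;   // bit buffer
        int bits = 0;   // number of pending bits in acc
        int pos = 0;
        for (char c : s.toCharArray()) {
            int v = ALPHABET.indexOf(c);                 // step 1: character -> 6-bit code
            if (v < 0) throw new IllegalArgumentException("bad character: " + c);
            acc = (acc << 6) | v;                        // step 2: append to the bit stream
            bits += 6;
            while (bits >= 8) {                          // flush whole bytes
                bits -= 8;
                out[pos++] = (byte) (acc >>> bits);
                acc &= (1L << bits) - 1;                 // keep only the pending bits
            }
        }
        if (bits > 0) out[pos] = (byte) (acc << (8 - bits)); // final 4 bits, left-aligned
        return out;
    }

    static String unpack(byte[] in) {
        StringBuilder sb = new StringBuilder(18);
        long acc = 0;
        int bits = 0;
        for (byte b : in) {
            acc = (acc << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 6 && sb.length() < 18) {     // ignore the 4 padding bits
                bits -= 6;
                sb.append(ALPHABET.charAt((int) (acc >>> bits) & 0x3F));
                acc &= (1L << bits) - 1;
            }
        }
        return sb.toString();
    }
}
```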
Obviously Java has nothing more than an 8 byte long. So if you have to use a long then it is NOT possible to uniquely map the strings, unless there is something else which reduces the space of valid input strings.
Theoretically, you can't represent ROWID in a long (8 bytes). However, depending on the size of your databases (the whole server, not only your table), you might be able to encode it into a long.
Here is the layout of a ROWID:
OOOOOO-FFF-BBBBBB-RRR
Where O is the object ID, F is the file number, B is the block number and R is the row number. All of them are base64-encoded. As you can see, O and B can have 36 bits and F and R can have 18.
If your database is not huge, you can use 2 bytes for each part. Basically, your object ID and block number will be limited to 64K. Our DBA believes our database would have to be several orders of magnitude bigger for us to get close to these limits.
I would suggest you find max of each part in your database and see if you are close. I wouldn't use long if they are anywhere near the limit.
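A sketch of the "2 bytes per part" idea, assuming the four parts have already been decoded into numbers. Each part is range-checked rather than silently truncated, which is the point of checking your maxima first. The class and method names are hypothetical.

```java
// Packs object id, file number, block number and row number into one long,
// 16 bits each, and unpacks them again. Only valid while every part is < 65536.
final class RowidLongPacker {
    static long pack(long objectId, long fileNo, long blockNo, long rowNo) {
        return (check(objectId) << 48) | (check(fileNo) << 32)
                | (check(blockNo) << 16) | check(rowNo);
    }

    // Returns {objectId, fileNo, blockNo, rowNo}.
    static long[] unpack(long packed) {
        return new long[] {
                packed >>> 48,                 // unsigned shift: large object ids make the long negative
                (packed >>> 32) & 0xFFFF,
                (packed >>> 16) & 0xFFFF,
                packed & 0xFFFF
        };
    }

    // Fails loudly instead of truncating a part that needs more than 16 bits.
    private static long check(long part) {
        if (part < 0 || part > 0xFFFF) {
            throw new IllegalArgumentException("part does not fit in 16 bits: " + part);
        }
        return part;
    }
}
```

For example, a part of 70000 is rejected with an exception, which is the signal that a long is not enough for that database.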
Found a way to extract the ROWID in a different manner from the database....
SQL> select DBMS_ROWID.ROWID_TO_RESTRICTED(ROWID, 1) FROM MYTABLE;
0000EDF4.0001.0000
0000EDF4.0002.0000
0000EDF4.0004.0000
0000EDF4.0005.0000
0000EDF4.0007.0000
0000EDF5.0000.0000
0000EDF5.0002.0000
0000EDF5.0003.0000
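Then convert it to a number like so:
// Converts a restricted ROWID such as "0000EDF4.0001.0000" (16 hex digits) into a long
static long rowidToLong( final String rowid ) {
    // replaceAll takes a regex, so the dot must be written "\\." in Java source
    final String hexNum = rowid.replaceAll( "\\.", "" );
    // parse the lower 15 hex digits (60 bits) and the top nibble separately,
    // because Long.parseLong overflows for values >= 0x8000000000000000
    final long lowerValue = Long.parseLong( hexNum.substring( 1 ), 16 );
    long upperNibble = Integer.parseInt( hexNum.substring( 0, 1 ), 16 );
    if ( upperNibble >= 8 ) {
        // Catch case where ROWID >= 80000000.0000.0000: the top bit is set, so the long is negative
        upperNibble -= 8;
        return -( 9223372036854775807L - ( lowerValue - 1 + ( upperNibble << 60 ) ) );
    } else {
        return ( lowerValue + ( upperNibble << 60 ) );
    }
}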
Then reverse that number back to String format like so:
static String longToRowid( final long featureID ) {
    // %016X zero-pads to 16 upper-case hex digits (negative values print as unsigned),
    // matching the DBMS_ROWID output and replacing the non-standard StringUtil.padString helper
    String s = String.format( "%016X", featureID );
    StringBuilder sb = new StringBuilder( s );
    sb.insert( 8, '.' );
    sb.insert( 13, '.' );
    return sb.toString();
}
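On Java 8 or later, the same round trip can be sketched with Long.parseUnsignedLong, which accepts all 16 hex digits directly and so removes the manual top-nibble arithmetic. The class name is hypothetical; the format is the "XXXXXXXX.XXXX.XXXX" output of DBMS_ROWID.ROWID_TO_RESTRICTED shown above.

```java
// Restricted ROWID ("0000EDF4.0001.0000") <-> long, using Java 8+ unsigned parsing.
final class RestrictedRowidCodec {
    static long toLong(String restrictedRowid) {
        // strip the dots: 16 hex digits = 64 bits, top bit set -> negative long
        return Long.parseUnsignedLong(restrictedRowid.replace(".", ""), 16);
    }

    static String fromLong(long value) {
        String hex = String.format("%016X", value);   // zero-padded, upper case, unsigned
        return hex.substring(0, 8) + "." + hex.substring(8, 12) + "." + hex.substring(12);
    }
}
```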
Cheers for all the responses.
This sounds ... icky, but I don't know your context so trying not to pass judgement. 8)
Have you considered converting the characters in the string into their ASCII equivalents?
ADDENDUM: Of course, this would require truncating semi-superfluous characters to make it fit, which sounds like an option you may have, judging from the comments.
Related
For example, I have the number 4302033, which is 10000011010010011010001 in binary. I need to invert it, but when I do the inversion I get 11111111101111100101101100101110. How do I get rid of these ones at the beginning?
You are wrong!
The binary representation of 4302033 is not 10000011010010011010001!
In fact, as an int it is 00000000010000011010010011010001.
In Java, an int always has 32 bits (check Integer.SIZE), no matter how large the number is that you store within that int field. Even zero has those 32 bits (OK, the first (leftmost) bit is not really used for the value, but for the sign).
This means that when you invert an int, you invert all these 32 bits, even including those left from the first 1 (those you referred to as "these units at the beginning").
If this is unwanted, you have to take precautions against that:
var value = 4302033;                                   // var requires Java 10+
var leadingZeros = Integer.numberOfLeadingZeros( value );
// shifts bind tighter than ^: build a mask of ones up to the highest set bit, then XOR
var inverted = value ^ ( 0xFFFFFFFF << leadingZeros >>> leadingZeros );
System.out.println( Integer.toBinaryString( value ) );
System.out.println( Integer.toBinaryString( inverted ) );
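An alternative way to build the same mask, sketched here with a hypothetical class name: Integer.highestOneBit isolates the leading 1, and shifting it left once and subtracting 1 gives ones in every position up to and including it. As with the version above, a value of 0 ends up with an all-ones mask, so 0 inverts to -1.

```java
// Inverts only the bits up to and including the highest set bit.
final class InvertSignificantBits {
    static int invert(int value) {
        // e.g. highestOneBit = 0b1000 -> mask 0b1111; if bit 31 is set, (0 - 1) = all ones
        int mask = (Integer.highestOneBit(value) << 1) - 1;
        return value ^ mask;
    }

    public static void main(String[] args) {
        int value = 4302033;
        System.out.println(Integer.toBinaryString(value));          // 10000011010010011010001
        System.out.println(Integer.toBinaryString(invert(value)));  // 1111100101101100101110
    }
}
```

Note that toBinaryString drops leading zeros, so the inverted result prints with 22 digits: the top bit of the original became 0.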
I have something like this:
r[0] = 4123;
r[1] = 2571;
I would like to combine them and make one long value in Java.
This is my attempt:
int[] r = { 4123, 2571 };
long result = ( (r[1] & 0xFFFF) << 16 | (r[0] & 0xFFFF) );
System.out.println(result);
The output should be 10111627, but I get 168497179. I'm probably missing something in the conversion, but I have no idea what...
EDIT
This is an example of how the value is placed into a 32-bit register.
I'll try to summarize, and hopefully clarify, what several comments on your question already indicate:
If you want to get the number from your image which is
00001010 00001011 00010000 00011011 = 0x0A0B101B = 168497179
in one single long value and you have two ints
0001000000011011 = 0x101B = 4123 and
0000101000001011 = 0x0A0B = 2571
then your code is correct.
I would recommend getting used to hexadecimal numbers, as they make it easy to see that there is no binary relation between 0x0A0B & 0x101B and 0x009A4A8B = 10111627.
BTW, your image is contradictory: as shown above, the binary digits represent the number 0x0A0B101B, but the hexadecimal digits read 0x0A0B101E (notice the E), while the decimals support the binary value.
Finally, I figured out your flaw:
You seem to expect to get the decimal numbers concatenated together as the result. But unlike with hexadecimal, it does not work this way in decimal!
Let me elaborate that. You have the binary number:
00001010 00001011 00010000 00011011
Which you can easily convert to hex block by block
0x0A 0x0B 0x10 0x1B
and then just join them together
0x0A0B101B
But that magic join is just a simplification only applying to hex (and the reason why hex is so popular among programmers).
The long version is that you have to multiply each higher block/byte (towards the left) by the 'base' given by all the blocks to its right. The rightmost block is always multiplied by 1. The base for the next block is (since there are 8 bits in the first block) 2^8 = 256 = 0x100. The base for the third block is (8+8 bits) 2^16 = 65536 = 0x10000. The last (leftmost) block has to be multiplied by (8+8+8 bits) 2^24 = 16777216 = 0x1000000.
Let's make an example for the first two blocks:
Hexadecimal:
0x10 || 0x1B
(0x10 * 0x100) + (0x1B * 0x1)
0x1000 + 0x1B = 0x101B
Decimal:
16 || 27
(16 * 256) + (27 * 1)
4096 + 27 = 4123
As you can see in your image, both values are in it (notice the E/B issue, which in decimal is a 6/3 issue), but there is no 1627. So converting binary or hexadecimal numbers to decimal is a nontrivial task (for humans); best to use a calculator.
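A small Java sketch of the positional-weights argument: the four bytes combined with the weights 2^24, 2^16, 2^8 and 1, with shifts, and from the two 16-bit halves all give 168497179. One possible source of the expected 10111627, offered only as a guess, is writing each byte's decimal value (10, 11, 16, 27) side by side, which is string concatenation rather than arithmetic. The class and method names are hypothetical.

```java
// Combines four bytes into one number three equivalent ways.
final class ByteWeights {
    // positional weights 2^24, 2^16, 2^8, 1
    static long weighted(int b3, int b2, int b1, int b0) {
        return b3 * 16777216L + b2 * 65536L + b1 * 256L + b0;
    }

    // the same with shifts and OR
    static long shifted(int b3, int b2, int b1, int b0) {
        return ((long) b3 << 24) | ((long) b2 << 16) | ((long) b1 << 8) | b0;
    }

    // the question's approach: two 16-bit halves, high half shifted by 16
    static long fromHalves(int high, int low) {
        return ((long) (high & 0xFFFF) << 16) | (low & 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(weighted(0x0A, 0x0B, 0x10, 0x1B));  // 168497179
        System.out.println(shifted(0x0A, 0x0B, 0x10, 0x1B));   // 168497179
        System.out.println(fromHalves(2571, 4123));            // 168497179
        // decimal byte values written side by side: concatenation, not a numeric combination
        System.out.println("" + 0x0A + 0x0B + 0x10 + 0x1B);    // 10111627
    }
}
```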
I have a very odd situation,
I'm writing a filter engine for another program, and that program has what are called "save areas". Each of those save areas is numbered 0 through 32 (why there are 33 of them, I don't know). They are turned on or off via a binary string:
1 = save area 0 on
10 = save area 1 on, save area 0 off
100 = save area 2 on, save areas 1 and 0 off.
and so on.
I have another program passing in what save areas it needs, but it does so with decimal representations and underscores - 1_2_3 for save areas 1, 2, and 3 for instance.
I would need to convert that example to 1110.
What I came up with is that I can build a string as follows:
I break it up (using split) into savePart[i]. I then iterate through savePart[i] and build strings:
String saveString = padRight("0b1",Integer.parseInt(savePart[i]));
That'll give me a string that reads "0b1000000" in the case of save area 6, for instance.
Is there a way to read that string as a binary number instead? If I were to write:
long saveBinary = 0b1000000L;
that would totally work.
Or is there a smarter way to do this?
long saveBinary = Long.parseLong(saveString, 2);
Note that you'll have to leave off the 0b prefix.
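A minimal sketch of stripping the prefix before parsing; the helper name is hypothetical. Parsing each save area's string and OR-ing the results gives the combined mask, e.g. "0b10", "0b100" and "0b1000" combine to 14 (binary 1110).

```java
// Long.parseLong(s, 2) rejects the "0b" prefix, so strip it first.
final class BinaryStringParse {
    static long parseBinaryLiteral(String s) {
        String digits = s.startsWith("0b") || s.startsWith("0B") ? s.substring(2) : s;
        return Long.parseLong(digits, 2);
    }
}
```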
This will do it:
String input = "1_2_3";
long areaBits = 0;
for (String numTxt : input.split("_")) {
areaBits |= 1L << Integer.parseInt(numTxt);
}
System.out.printf("\"%s\" -> %d (decimal) = %<x (hex) = %s (binary)%n",
input, areaBits, Long.toBinaryString(areaBits));
Output:
"1_2_3" -> 14 (decimal) = e (hex) = 1110 (binary)
Just take each number in the string and treat it as an exponent. Accumulate the total for each exponent found and you will get your answer without the need to remove prefixes or suffixes.
// will contain our answer
long answer = 0;
String[] buckets = givenData.split("_");    // array of each bucket wanted, exponents
for (String bucket : buckets) {             // iterate through all exponents found
    int exponent = Integer.parseInt(bucket); // get the exponent
    long power = 1;                         // note: ^ is XOR in Java, not exponentiation,
    for (int i = 0; i < exponent; i++) {    //   so compute 10^exponent with a loop
        power *= 10;
    }
    answer += power;                        // add 10^exponent to our running total
}
answer will now contain our answer in the format 1011010101 (what have you).
In your example, the given data was 1_2_3. The array will contain {"1", "2", "3"}
We iterate through that array...
10^1 + 10^2 + 10^3 = 10 + 100 + 1000 = 1110
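I believe this is also why your numbers are 0 to 32: x^0 = 1, so you can dump into the 0 bucket when 0 is in the input. Note that a long tops out a little above 9.2 × 10^18, so this only works for save areas 0 through 18; beyond that, 10^exponent overflows.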
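To cover all 33 save areas with the same "sum of powers of ten" idea, the accumulation can be sketched with BigInteger, which does not overflow. The result's decimal digits read like the binary mask (it is not the mask's numeric value), and the sketch assumes each save area appears at most once in the input. The class name is hypothetical.

```java
import java.math.BigInteger;

// Builds the decimal number whose digits spell the save-area bit pattern.
final class DecimalDigitMask {
    static BigInteger digits(String input) {
        BigInteger answer = BigInteger.ZERO;
        for (String bucket : input.split("_")) {
            answer = answer.add(BigInteger.TEN.pow(Integer.parseInt(bucket))); // add 10^exponent
        }
        return answer;
    }
}
```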
Background information:
In my project I'm applying Reinforcement Learning (RL) to the Mario domain. For my state representation I chose to use a hashtable with custom objects as keys. My custom objects are immutable and have overridden .equals() and .hashCode() (which were generated by the IntelliJ IDE).
This is the resulting .hashCode(); I've added the possible values in comments as extra information:
@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                 // 2 possible values: 0, 1
    result = 31 * result + (facing ? 1 : 0);      // 2 possible values: 0, 1
    result = 31 * result + marioMode;             // 3 possible values: 0, 1, 2
    result = 31 * result + (onGround ? 1 : 0);    // 2 possible values: 0, 1
    result = 31 * result + (canJump ? 1 : 0);     // 2 possible values: 0, 1
    result = 31 * result + (wallNear ? 1 : 0);    // 2 possible values: 0, 1
    result = 31 * result + nearestEnemyX;         // 33 possible values: -16 to 16
    result = 31 * result + nearestEnemyY;         // 33 possible values: -16 to 16
    return result;
}
The Problem:
The problem here is that the result in the above code can exceed Integer.MAX_VALUE. I've read online this doesn't have to be a problem, but in my case it is. This is partly due to algorithm used which is Q-Learning (an RL method) and depends on the correct Q-values stored inside the hashtable. Basically I cannot have conflicts when retrieving values. When running my experiments I see that the results are not good at all and I'm 95% certain the problem lies with the retrieval of the Q-values from the hashtable. (If needed I can expand on why I'm certain about this, but this requires some extra information on the project which isn't relevant for the question.)
The Question:
Is there a way to avoid the integer overflow, or am I overlooking something here? Or is there another way (perhaps another data structure) to retrieve the values reasonably fast given my custom key?
Remark:
After reading some comments, I realise that using a Hashtable may not have been the best choice, as I want unique keys that do not cause collisions. If I still want to use a Hashtable, I will probably need a proper encoding.
You need a dedicated Key Field to guarantee uniqueness
.hashCode() isn't designed for what you are using it for
.hashCode() is designed to give good general results in bucketing algorithms, which can tolerate minor collisions. It is not designed to provide a unique key. The default algorithm is a trade-off between time, space and minor collisions; it isn't supposed to guarantee uniqueness.
Perfect Hash
What you need to implement is a perfect hash or some other unique key based on the contents of the object. This is possible within the boundaries of an int, but I wouldn't use .hashCode() for this representation. I would use an explicit key field on the object.
Unique Hashing
One way is to use the SHA-1 hashing that is built into the standard library, which has an extremely low chance of collisions for small data sets. You don't have a huge combinatorial explosion in the values you posted, so SHA-1 will work.
You should be able to calculate a way to generate a minimal perfect hash with the limited values that you are showing in your question.
A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers, usually [0..n−1] or [1..n]. A more formal way of expressing this is: Let j and k be elements of some finite set K. F is a minimal perfect hash function iff F(j) = F(k) implies j = k (injectivity) and there exists an integer a such that the range of F is a..a+|K|−1. It has been proved that a general purpose minimal perfect hash scheme requires at least 1.44 bits/key.[2] The best currently known minimal perfect hashing schemes use around 2.6 bits/key.[3]
A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, ..., an and for any keys aj and ak, j < k implies F(aj) < F(ak).
A minimal perfect hash function F is monotone if it preserves the lexicographical order of the keys. In this case, the function value is just the position of each key in the sorted ordering of all of the keys. If the keys to be hashed are themselves stored in a sorted array, it is possible to store a small number of additional bits per key in a data structure that can be used to compute hash values quickly.[6]
Solution
Note that where the quote below talks about a URL, it can be the byte[] representation of any String that you calculate from your object.
I usually override the toString() method to make it generate something unique, and then feed that into the UUID.nameUUIDFromBytes() method.
A Type 3 UUID, via UUID.nameUUIDFromBytes(), can be just as useful:
Version 3 UUIDs use a scheme deriving a UUID via MD5 from a URL, a fully qualified domain name, an object identifier, a distinguished name (DN as used in Lightweight Directory Access Protocol), or on names in unspecified namespaces. Version 3 UUIDs have the form xxxxxxxx-xxxx-3xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B.
To determine the version 3 UUID of a given name, the UUID of the namespace (e.g., 6ba7b810-9dad-11d1-80b4-00c04fd430c8 for a domain) is transformed to a string of bytes corresponding to its hexadecimal digits, concatenated with the input name, hashed with MD5 yielding 128 bits. Six bits are replaced by fixed values, four of these bits indicate the version, 0011 for version 3. Finally, the fixed hash is transformed back into the hexadecimal form with hyphens separating the parts relevant in other UUID versions.
My preferred solution is a Type 5 UUID (the SHA-1 version of Type 3):
Version 5 UUIDs use a scheme with SHA-1 hashing; otherwise it is the same idea as in version 3. RFC 4122 states that version 5 is preferred over version 3 name based UUIDs, as MD5's security has been compromised. Note that the 160 bit SHA-1 hash is truncated to 128 bits to make the length work out. An erratum addresses the example in appendix B of RFC 4122.
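java.util.UUID only provides nameUUIDFromBytes (version 3, MD5) and randomUUID (version 4), so a version 5 UUID has to be built by hand. The following is a sketch of the scheme described in the quote: SHA-1 over the namespace UUID's 16 bytes followed by the name, truncated to 16 bytes, with the version and variant bits overwritten. The class name is hypothetical, and encoding the name as UTF-8 is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Name-based version 5 (SHA-1) UUID.
final class UuidV5 {
    static UUID fromName(UUID namespace, String name) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
        // namespace UUID as 16 big-endian bytes, followed by the name bytes
        ByteBuffer ns = ByteBuffer.allocate(16);
        ns.putLong(namespace.getMostSignificantBits());
        ns.putLong(namespace.getLeastSignificantBits());
        sha1.update(ns.array());
        sha1.update(name.getBytes(StandardCharsets.UTF_8));
        byte[] hash = sha1.digest();                    // 20 bytes; only the first 16 are used

        hash[6] = (byte) ((hash[6] & 0x0F) | 0x50);     // version 5 in the high nibble
        hash[8] = (byte) ((hash[8] & 0x3F) | 0x80);     // RFC 4122 variant (binary 10xx)

        ByteBuffer bb = ByteBuffer.wrap(hash, 0, 16);
        return new UUID(bb.getLong(), bb.getLong());
    }
}
```

For a key object, the name could be the unique toString() described above, with any fixed namespace UUID of your choosing.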
Key objects should be immutable
That way you can calculate toString() and .hashCode() and generate a unique primary key once, inside the constructor, instead of calculating them over and over.
Here is a straw man example of an idiomatic immutable object and calculating a unique key based on the contents of the object.
package com.stackoverflow;

import javax.annotation.Nonnull; // JSR-305 annotation, e.g. from com.google.code.findbugs:jsr305
import java.nio.charset.StandardCharsets;
import java.util.Date;
import java.util.UUID;

public class Q23633894
{
    public static class Person
    {
        private final String firstName;
        private final String lastName;
        private final Date birthday;
        private final UUID key;
        private final String strRep;

        public Person(@Nonnull final String firstName, @Nonnull final String lastName, @Nonnull final Date birthday)
        {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthday = new Date(birthday.getTime()); // defensive copy: Date is mutable
            // separators keep e.g. ("Ann", "Alee") and ("AnnA", "lee") from producing the same
            // string (assuming names never contain '|')
            this.strRep = String.format("%s|%s|%d", firstName, lastName, birthday.getTime());
            this.key = UUID.nameUUIDFromBytes(this.strRep.getBytes(StandardCharsets.UTF_8));
        }

        @Nonnull
        public UUID getKey()
        {
            return this.key;
        }

        // Other getters omitted for brevity

        @Override
        @Nonnull
        public String toString()
        {
            return this.strRep;
        }

        @Override
        public boolean equals(final Object o)
        {
            if (this == o) { return true; }
            if (o == null || getClass() != o.getClass()) { return false; }
            final Person person = (Person) o;
            return key.equals(person.key);
        }

        @Override
        public int hashCode()
        {
            return key.hashCode();
        }
    }
}
For a unique representation of your object's state, you would need 19 bits in total. Thus, it is possible to represent it by a "perfect hash" integer value (which can have up to 32 bits):
@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // needs 1 bit (2 possible values)
    result += (facing ? 1 : 0) << 1;             // needs 1 bit (2 possible values)
    result += marioMode << 2;                    // needs 2 bits (3 possible values)
    result += (onGround ? 1 : 0) << 4;           // needs 1 bit (2 possible values)
    result += (canJump ? 1 : 0) << 5;            // needs 1 bit (2 possible values)
    result += (wallNear ? 1 : 0) << 6;           // needs 1 bit (2 possible values)
    result += (nearestEnemyX + 16) << 7;         // needs 6 bits (33 possible values)
    result += (nearestEnemyY + 16) << 13;        // needs 6 bits (33 possible values)
    return result;                               // uses bits 0..18 only, so it never overflows
}
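To check that this 19-bit packing really is reversible, here is a sketch with a hypothetical stand-in class holding only the fields from the question, plus an unpack() that reads each field back out of its bit range with shifts and masks.

```java
// Hypothetical stand-in for the question's state class.
final class MarioState {
    final boolean stuck, facing, onGround, canJump, wallNear;
    final int marioMode, nearestEnemyX, nearestEnemyY;   // 0..2, -16..16, -16..16

    MarioState(boolean stuck, boolean facing, int marioMode, boolean onGround,
               boolean canJump, boolean wallNear, int nearestEnemyX, int nearestEnemyY) {
        this.stuck = stuck;
        this.facing = facing;
        this.marioMode = marioMode;
        this.onGround = onGround;
        this.canJump = canJump;
        this.wallNear = wallNear;
        this.nearestEnemyX = nearestEnemyX;
        this.nearestEnemyY = nearestEnemyY;
    }

    // Same bit layout as the hashCode above.
    int pack() {
        int result = stuck ? 1 : 0;                 // bit 0
        result += (facing ? 1 : 0) << 1;            // bit 1
        result += marioMode << 2;                   // bits 2-3
        result += (onGround ? 1 : 0) << 4;          // bit 4
        result += (canJump ? 1 : 0) << 5;           // bit 5
        result += (wallNear ? 1 : 0) << 6;          // bit 6
        result += (nearestEnemyX + 16) << 7;        // bits 7-12
        result += (nearestEnemyY + 16) << 13;       // bits 13-18
        return result;
    }

    // Reads each field back out of its bit range.
    static MarioState unpack(int code) {
        return new MarioState(
                (code & 1) != 0,
                ((code >>> 1) & 1) != 0,
                (code >>> 2) & 0x3,
                ((code >>> 4) & 1) != 0,
                ((code >>> 5) & 1) != 0,
                ((code >>> 6) & 1) != 0,
                ((code >>> 7) & 0x3F) - 16,
                ((code >>> 13) & 0x3F) - 16);
    }
}
```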
Instead of using 31 as your magic number, you need to use the number of possibilities (normalised to start at 0):
@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                 // 2 possible values: 0, 1
    result = 2 * result + (facing ? 1 : 0);       // 2 possible values: 0, 1
    result = 3 * result + marioMode;              // 3 possible values: 0, 1, 2
    result = 2 * result + (onGround ? 1 : 0);     // 2 possible values: 0, 1
    result = 2 * result + (canJump ? 1 : 0);      // 2 possible values: 0, 1
    result = 2 * result + (wallNear ? 1 : 0);     // 2 possible values: 0, 1
    result = 33 * result + (16 + nearestEnemyX);  // 33 possible values: -16 to 16
    result = 33 * result + (16 + nearestEnemyY);  // 33 possible values: -16 to 16
    return result;
}
This will give you 104544 (2 * 2 * 3 * 2 * 2 * 2 * 33 * 33) possible hash codes. BTW, you can reverse this process to get the original values from the code by using a series of / and % operations.
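A sketch of that reversal: the fields are treated as digits of a mixed-radix number with the radices used above (2, 2, 3, 2, 2, 2, 33, 33), already normalised to start at 0 (enemy offsets shifted by +16). Decoding peels off the last field first with % and /. The class name is hypothetical.

```java
// Mixed-radix encoding of the Mario state fields, and its inverse.
final class MixedRadix {
    // stuck, facing, marioMode, onGround, canJump, wallNear, enemyX + 16, enemyY + 16
    static final int[] RADICES = { 2, 2, 3, 2, 2, 2, 33, 33 };

    static int encode(int[] digits) {
        int result = 0;
        for (int i = 0; i < RADICES.length; i++) {
            if (digits[i] < 0 || digits[i] >= RADICES[i]) {
                throw new IllegalArgumentException("field " + i + " out of range: " + digits[i]);
            }
            result = RADICES[i] * result + digits[i];   // same recurrence as the hashCode above
        }
        return result;
    }

    static int[] decode(int code) {
        int[] digits = new int[RADICES.length];
        for (int i = RADICES.length - 1; i >= 0; i--) { // last field first
            digits[i] = code % RADICES[i];
            code /= RADICES[i];
        }
        return digits;
    }
}
```

Because every code from 0 to 104543 decodes to exactly one state, this is the kind of minimal perfect hash discussed in the previous answer.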
Try Guava's Objects.hashCode() method or JDK 7's Objects.hash(). It's way better than writing your own. Don't repeat yourself (or anyone else) when you can use an out-of-the-box solution.
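A sketch of the question's key using java.util.Objects.hash; the class name is hypothetical. Objects.hash(values) returns Arrays.hashCode(values), which uses the same 31-based formula, so it saves typing rather than removing collisions. What keeps lookups correct is that equals() compares every field: HashMap uses equals() to tell colliding keys apart, so a collision costs some speed, not a wrong Q-value.

```java
import java.util.Objects;

// The question's state as a HashMap key, with hashCode delegated to Objects.hash.
final class MarioKey {
    final boolean stuck, facing, onGround, canJump, wallNear;
    final int marioMode, nearestEnemyX, nearestEnemyY;

    MarioKey(boolean stuck, boolean facing, int marioMode, boolean onGround,
             boolean canJump, boolean wallNear, int nearestEnemyX, int nearestEnemyY) {
        this.stuck = stuck;
        this.facing = facing;
        this.marioMode = marioMode;
        this.onGround = onGround;
        this.canJump = canJump;
        this.wallNear = wallNear;
        this.nearestEnemyX = nearestEnemyX;
        this.nearestEnemyY = nearestEnemyY;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MarioKey)) return false;
        MarioKey k = (MarioKey) o;
        return stuck == k.stuck && facing == k.facing && marioMode == k.marioMode
                && onGround == k.onGround && canJump == k.canJump && wallNear == k.wallNear
                && nearestEnemyX == k.nearestEnemyX && nearestEnemyY == k.nearestEnemyY;
    }

    @Override
    public int hashCode() {
        // boxes the fields and applies Arrays.hashCode (the 31-based formula)
        return Objects.hash(stuck, facing, marioMode, onGround, canJump, wallNear,
                nearestEnemyX, nearestEnemyY);
    }
}
```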
I know there are N threads for this question, but people use different methods to convert a byte to an int. Is this correct what am I writing? Hex to int or hex to decimal? Which one is correct?
Anyway, why am I getting 4864 instead of 19?
byte[] buffer = ....
buffer[51] = 0x13;
System.out.println( buffer[51] << 8 );
Is this correct what am I writing?
The code you've posted does implicit conversion of int to String, but that will display it in decimal. It's important to understand that a number isn't in either hex or decimal - it's just a number. The same number can be converted to different textual representations, and that's when the base matters. Likewise you can express the same number with different literals, so these two statements are exactly equivalent:
int x = 16;
int x = 0x10;
Anyway, why am I getting 4864 instead of 19
Because you're explicitly shifting the value left 8 bits:
buffer[51] << 8
That's basically multiplying by 256, and 19 * 256 is 4864.
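Since the question is really about turning bytes into ints, here is a small sketch of the usual idioms; the class and method names are hypothetical. Java bytes are signed, so "& 0xFF" gives the 0..255 value, and combining two bytes into a 16-bit value means shifting the high byte and OR-ing in the low byte. The big-endian order (high byte first) is an assumption about the buffer's format.

```java
// Byte-to-int conversions.
final class ByteToInt {
    // (byte) 0xF0 is -16 as a byte; & 0xFF yields 240
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // high byte shifted left by 8 (i.e. multiplied by 256), low byte OR-ed in
    static int bigEndian16(byte high, byte low) {
        return (unsigned(high) << 8) | unsigned(low);
    }
}
```

With buffer[51] = 0x13 as the low byte and 0x00 as the high byte, bigEndian16 gives 19; with the bytes the other way round it gives 4864, which is exactly the shift-by-8 result above.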
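You are getting 4864 as the result because 4864 is 0x1300 in hex, i.e. 0x13 shifted left by 8 bits.
If you are expecting 19 (0x13) as the result, then I guess you are trying to do circular shifting.
You can do that like this:
/* hex 0x13 (19 in decimal) is stored in buffer[51] as a byte */
buffer[51] = 0x13;
// rotating the shifted value (0x1300) back right by 8 bits restores 0x13 (19);
// Integer.rotateRight(buffer[51], 8) on its own would give 0x13000000 (318767104)
System.out.println( Integer.rotateRight( buffer[51] << 8, 8 ) );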