How to convert Parse ObjectId (String) to long? - java

Every object in Parse.com has its own ObjectId, a 10-character string that apparently matches this regex: [0-9a-zA-Z]{10}.
Example of ObjectId in Parse:
X12wEq4sFf
Weg243d21s
zwg34GdsWE
I would like to convert this String to a long, because it would save memory and improve searching (10 chars in UTF-8 take 40 bytes, while one long takes 8 bytes).
If we calculate the combinations, we can find:
String ObjectId: 62^10 = 839299365868340224 different values;
long: 2^64 = 18446744073709551616 different values.
So we can convert these values without losing information. Is there a simple way to do it safely? Please consider any kind of character encoding (UTF-8, UTF-16, etc.).
EDIT: I can only think of a hard way to solve it. I am asking if there is an easy way.

Your character set is a subset of the commonly used Base64 alphabet, so you could just use that. Java (8 and later) has the java.util.Base64 class, so there's no need to roll your own codec for this.
Are you sure this is actually valuable? "because it will save memory and improve searching" seems like an untested assertion; saving a few bytes on the IDs may very well be offset by the added cost of encoding and decoding every time you want to use something.
EDIT: Also, why are you using UTF-8 strings for guaranteed-ASCII data? If you represent 10-char IDs as a byte[10], that's just 10 bytes instead of 40 (i.e. much closer to the 8 for a long), and you don't need to do any fancy conversions.
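A minimal sketch of the byte[10] idea, assuming the IDs really are pure ASCII as the regex suggests (the class name AsciiIds is hypothetical). US-ASCII maps each character to exactly one byte; the array object itself still carries a JVM-dependent header on top of the 10 bytes of data.

```java
import java.nio.charset.StandardCharsets;

public class AsciiIds {
    // ObjectIds match [0-9a-zA-Z]{10}, so US-ASCII yields exactly one byte per character
    static byte[] toBytes(String objectId) {
        return objectId.getBytes(StandardCharsets.US_ASCII);
    }

    // Inverse: decode the 10 ASCII bytes back into the original String
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = toBytes("X12wEq4sFf");
        System.out.println(raw.length);     // 10
        System.out.println(fromBytes(raw)); // X12wEq4sFf
    }
}
```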

Here's a straightforward solution using 6 bits to store a single character.
public class Converter {
    // 62-character alphabet; each character maps to its index (0..61), which fits in 6 bits
    private static final String CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static int convertChar(char c) {
        int ret = CHARS.indexOf(c);
        if (ret == -1)
            throw new IllegalArgumentException("Invalid character encountered: " + c);
        return ret;
    }

    // Packs 10 characters x 6 bits = 60 bits into a single long
    public static long convert(String s) {
        if (s.length() != 10)
            throw new IllegalArgumentException("String length must be 10, was " + s.length());
        long ret = 0;
        for (int i = 0; i < s.length(); i++) {
            ret = (ret << 6) + convertChar(s.charAt(i));
        }
        return ret;
    }
}
I'll leave the conversion from long to String for you to implement; it's basically the same in reverse.
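One possible reverse of Converter.convert, sketched under the assumption that the same 62-character alphabet and 6-bit packing are used (ObjectIdDecoder and toObjectId are hypothetical names). It peels characters off the lowest 6 bits, so it fills the result from the last character backwards.

```java
public class ObjectIdDecoder {
    // Must be the same alphabet, in the same order, as the one used for encoding
    private static final String CHARS =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static String toObjectId(long value) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            int index = (int) (value & 0x3F); // lowest 6 bits = last remaining character
            if (index >= CHARS.length())
                throw new IllegalArgumentException("Not a valid encoded ObjectId: " + value);
            out[i] = CHARS.charAt(index);
            value >>>= 6;                      // drop the 6 bits just decoded
        }
        // A valid encoding uses only the low 60 bits
        if (value != 0)
            throw new IllegalArgumentException("Value has bits set above bit 59");
        return new String(out);
    }
}
```

Throwing on indices 62 and 63 and on stray high bits means a corrupted long fails loudly instead of decoding to a plausible-looking but wrong ID.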
P.S.: If you really want to save space, don't use Long; it adds nothing compared to the primitive long except overhead.
P.S. 2: Also note that you aren't really saving much with this conversion: storing the ASCII characters can be done in 10 bytes, while a long takes up 8. What you save here is mostly the overhead you'd get if you stored those 10 bytes in a byte array.

Related

Java hashcode brute-forcing [duplicate]

Is there any way that I can take the hash code of a string in Java and recreate that string from it?
e.g. something like this:
String myNewstring = StringUtils.createFromHashCode("Hello World".hashCode());
if (!myNewstring.equals("Hello World"))
    System.out.println("Hmm, something went wrong: " + myNewstring);
I ask because I must turn a string into an integer value and later reconstruct that string from that integer value.
This is impossible. The hash code for String is lossy; many String values result in the same hash code. An int has 32 bit positions and each position has two values. There's no way to map even just the 32-character strings (to pick one example), each character having many possible values, into 32 bits without collisions. They just won't fit.
If you want to use arbitrary-precision arithmetic (say, BigInteger), then you can just take each character as an integer and concatenate them all together. Voilà.
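A minimal sketch of that idea, treating each char as one base-65536 digit of a BigInteger. The leading 1 "sentinel" digit is an addition of this sketch, not part of the answer: without it, leading '\u0000' characters would vanish in the round trip. BigIntegerStringCodec, encode and decode are hypothetical names.

```java
import java.math.BigInteger;

public class BigIntegerStringCodec {
    private static final BigInteger MASK = BigInteger.valueOf(0xFFFF);

    // Each char becomes a 16-bit digit, appended after a leading sentinel digit of 1
    public static BigInteger encode(String s) {
        BigInteger n = BigInteger.ONE; // sentinel, so leading NUL chars are not lost
        for (int i = 0; i < s.length(); i++) {
            n = n.shiftLeft(16).or(BigInteger.valueOf(s.charAt(i)));
        }
        return n;
    }

    // Reverses encode: the sentinel puts bitLength at 16 * length + 1
    public static String decode(BigInteger n) {
        int len = (n.bitLength() - 1) / 16;
        char[] chars = new char[len];
        for (int i = len - 1; i >= 0; i--) {
            chars[i] = (char) n.and(MASK).intValue(); // lowest 16 bits = last char
            n = n.shiftRight(16);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        BigInteger n = encode("Hello World");
        System.out.println(decode(n)); // Hello World
    }
}
```

The catch, as the other answers point out, is size: the number grows by 16 bits per character, so this is a reversible encoding, not a compact integer.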
No. Multiple Strings can have the same hash code. In theory you could enumerate all the Strings that have that hash code, but there is an unbounded number of them.
Impossible, I'm afraid. Think about it: a String hash code is an int, i.e. 4 bytes. A string may be shorter than this, but it could also be much longer, and you cannot squeeze a longer string into 4 bytes without losing something.
The Java String hash code is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, so every character contributes, but the result wraps around into 32 bits. If your strings are all very short then you could encode them as an int or a long without losing anything.
For example, "1019744689" and "123926772" both have a hash code of -1727003481. This shows that for a given integer you might get back a different string (i.e. reversehashcode(hashcode(string)) != string).
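A quick way to check both points: the snippet recomputes String.hashCode() with the documented recurrence h = 31*h + c (equivalent to s[0]*31^(n-1) + ... + s[n-1], wrapping in int arithmetic) and shows a tiny collision, "Aa" versus "BB". HashCodeDemo and manualHash are illustrative names.

```java
public class HashCodeDemo {
    // Horner form of String.hashCode(): h = 31*h + c for every char, int overflow wraps
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("Hello World") == "Hello World".hashCode()); // true
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
    }
}
```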
Let's assume the string consists only of letters, digits and punctuation, so there are about 70 possible characters.
log_70(2^32) ≈ 5.22
This means that for any given integer you will likely find a 5- or 6-character string with it as its hash code. So retrieving "Hello World" is impossible, but "Hello" might work if you're lucky.
You could do something like this:
char[] chars = "String here".toCharArray();
int[] ints = new int[chars.length];
for (int i = 0; i < chars.length; i++) {
    ints[i] = (int) chars[i]; // store each UTF-16 char as its int value
}
Then:
char[] restored = new char[ints.length];
for (int i = 0; i < restored.length; i++) {
    restored[i] = (char) ints[i]; // narrow each int back to a char
}
String result = new String(restored);
I have not actually tested this yet... It is just "concept" code.

How do you make a byte into a binary number and not to a string in java

When I want to print the binary number of a byte, I have to do:
byte byte1 = 16;
String byteString = Integer.toBinaryString(byte1);
System.out.println(byteString);
This turns the byte into a string, but when I try to parse it back into a byte, it becomes a base-10 number again. Is there a way to make a byte hold a binary number rather than a base-10 one? I want it so that printing the byte prints the binary. Do you have to tell it to print the binary every time?
I want to know whether there is a way to print the binary representation of the byte every time, without converting it to a binary string each time and without creating a new String variable to print.
You don't need a String variable because you can just do this:
System.out.println(Integer.toBinaryString(a));
but there is no way to make the conversion happen automatically without using toBinaryString. If this code is too long, you could write a simple method like this:
public static void printInBinary(int a) {
    System.out.println(Integer.toBinaryString(a));
}
As you have identified, both approaches do unnecessary work if you need to print the same number repeatedly, but you do not need to worry about this. Worrying about stuff like that is a waste of time (99% of the time); programmer efficiency matters more here.
At the physical level, computers have no concept of either "binary" or "decimal" numbers (I mean, in the form of "110011" or "123"). It's all electrical impulses in there. When you print a number to the screen, it ALWAYS has to be converted from "impulses" into characters in one way or another.
When the number is stored in memory as a "number", it is in neither decimal nor binary text form. Converting the "number" into a "string" of any kind requires approximately the same amount of computing power.
Let's say you have this code:
byte byte1 = 16;
String byteString = Integer.toBinaryString(byte1);
System.out.println(byteString);
System.out.println(byte1);
In reality, the operations performed by the CPU would look something like this:
String byteString = Integer.toBinaryString(byte1);
String decimalString = String.valueOf(byte1); // what println(byte1) does implicitly
System.out.println(byteString);
System.out.println(decimalString);
That is, unless you already store your number as a String, your CPU has to do extra work to convert it into a decimal, hexadecimal or binary representation. It is just that decimal is chosen by default. And there is no way to "switch" this default representation, either for one variable or globally for the entire application.
Therefore, you need to convert it to binary every time you want a variable of any numeric type printed on the screen as a character.
What about Byte.parseByte(s, radix)?
System.out.println(Byte.parseByte("10000", 2)); // prints 16
What about this:
byte byte1 = 76;
StringBuilder byteString = new StringBuilder();
// Test each of the 8 bits, from the most significant (128) down to 1
for (int i = 128; i > 0; i /= 2) {
    if ((byte1 & i) == 0) {
        byteString.append(0);
    } else {
        byteString.append(1);
    }
}
System.out.println(byteString);
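One detail worth adding: Integer.toBinaryString receives the byte widened to an int, so a negative byte prints as 32 digits (for example, (byte) -1 gives 32 ones), and small values lose their leading zeros. A sketch of a fixed-width helper (toBinary8 is a hypothetical name) masks with 0xFF and pads to 8 digits:

```java
public class ByteBinary {
    // Mask with 0xFF so a negative byte is not sign-extended to 32 bits,
    // then left-pad with zeros to a fixed width of 8 digits
    static String toBinary8(byte b) {
        return String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toBinary8((byte) 16)); // 00010000
        System.out.println(toBinary8((byte) 76)); // 01001100
        System.out.println(toBinary8((byte) -1)); // 11111111
    }
}
```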

How would you convert a string to a 64 bit integer?

I am making an application that uses a seed to generate a world, and some games let you provide that seed as text. I'm wondering how you would 'convert' a string to an integer.
A simple way would be to use the ASCII values of all the characters and append them to a string which you would then parse to an integer, but that severely limits the size of the string. How would you be able to do this with a larger string?
EDIT: 64 bit not 32
I would just call String.hashCode(). The standard String.hashCode() uses all characters in the target string and gives good dispersal.
The only thing I would question is whether 32 bits of seed is going to be enough. It might mean that your world generator could generate at most 2^32 different worlds.
Random seeds for Random can be at least 48-bit, ideally 64-bit. You can write your own hash code like this.
public static long hashFor(String s) {
    long h = 0;
    for (int i = 0; i < s.length(); i++)
        h = h * 10191 + s.charAt(i);
    return h;
}
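A possible usage sketch, assuming the world generator draws its randomness from java.util.Random (SeedFromText and worldRandom are hypothetical names; hashFor is copied from the answer). One caveat: Random's documented seeding keeps only the low 48 bits of the long, so a 64-bit hash is effectively reduced to a 48-bit seed there.

```java
import java.util.Random;

public class SeedFromText {
    // Same 64-bit polynomial hash as the answer's hashFor (multiplier 10191)
    static long hashFor(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++)
            h = h * 10191 + s.charAt(i);
        return h;
    }

    // Hypothetical helper: the same seed text always yields the same random sequence
    static Random worldRandom(String seedText) {
        return new Random(hashFor(seedText));
    }

    public static void main(String[] args) {
        Random a = worldRandom("my world");
        Random b = worldRandom("my world");
        System.out.println(a.nextLong() == b.nextLong()); // true
    }
}
```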
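The standard way to convert a String to an int is Integer.parseInt(String) (Long.parseLong(String) is the 64-bit equivalent).
You pass the string into it and it converts the String to a number; note that this only works for numeric text and throws a NumberFormatException otherwise. Try it and let me know!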

How to replace/remove 4(+)-byte characters from a UTF-8 string in Java?

Because MySQL 5.1 does not support 4 byte UTF-8 sequences, I need to replace/drop the 4 byte sequences in these strings.
I'm looking for a clean way to replace these characters.
Replacing the characters with a question mark (for example via Apache libraries) is fine for this case, although an ASCII equivalent would be nicer, of course.
N.B. The input is from external sources (e-mail names) and upgrading the database is not a solution at this point in time.
We ended up implementing the following method in Java for this problem.
Basically, it replaces every character whose code point is higher than the last 3-byte UTF-8 character (U+FFFF).
The offset calculations make sure we step through the string one Unicode code point at a time (a supplementary character occupies two chars).
public final class CharUtils {
    public static final String LAST_3_BYTE_UTF_CHAR = "\uFFFF";
    public static final String REPLACEMENT_CHAR = "\uFFFD";

    public static String toValid3ByteUTF8String(String s) {
        final int length = s.length();
        StringBuilder b = new StringBuilder(length);
        for (int offset = 0; offset < length; ) {
            final int codepoint = s.codePointAt(offset);
            // Code points above U+FFFF need 4 bytes in UTF-8, so replace them
            if (codepoint > CharUtils.LAST_3_BYTE_UTF_CHAR.codePointAt(0)) {
                b.append(CharUtils.REPLACEMENT_CHAR);
            } else {
                if (Character.isValidCodePoint(codepoint)) {
                    b.appendCodePoint(codepoint);
                } else {
                    b.append(CharUtils.REPLACEMENT_CHAR);
                }
            }
            // Advance by 1 or 2 chars depending on whether this was a surrogate pair
            offset += Character.charCount(codepoint);
        }
        return b.toString();
    }
}
Another simple solution is to use the regular expression [^\u0000-\uFFFF]. For example, in Java:
text.replaceAll("[^\\u0000-\\uFFFF]", "\uFFFD");
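A small check of the regex approach. The emoji U+1F600 (written as the surrogate pair \uD83D\uDE00 in Java source) stands in for any 4-byte character, and StripSupplementary is a hypothetical name. Java's regex engine matches supplementary characters as whole code points, so each one becomes a single U+FFFD, which takes 3 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class StripSupplementary {
    // Replace every code point outside the Basic Multilingual Plane with U+FFFD
    static String strip(String text) {
        return text.replaceAll("[^\\u0000-\\uFFFF]", "\uFFFD");
    }

    public static void main(String[] args) {
        String input = "a\uD83D\uDE00b"; // 'a', U+1F600 (grinning face), 'b'
        String output = strip(input);
        System.out.println(input.getBytes(StandardCharsets.UTF_8).length);  // 6 (1 + 4 + 1)
        System.out.println(output.getBytes(StandardCharsets.UTF_8).length); // 5 (1 + 3 + 1)
    }
}
```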
5-byte UTF-8 sequences begin with a 111110xx byte and 6-byte UTF-8 sequences begin with a 1111110x byte (4-byte sequences, the ones MySQL 5.1 rejects, begin with 11110xxx). Importantly, no continuation byte can be that large, because continuation bytes always have the form 10xxxxxx.
Therefore you can just walk through the bytes, and every time you see a byte of the form 111110xx, emit a single '?' to the output stream/array and skip the next 4 bytes of the input; do the same for 6-byte sequences (skipping 5 bytes) and 4-byte sequences (skipping 3 bytes).
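A sketch of that byte-level scan, assuming the input is already well-formed UTF-8 (Utf8ByteFilter and filter are hypothetical names). Besides the obsolete 5- and 6-byte forms, it treats the 4-byte lead pattern 11110xxx the same way, since that is what the question needs removed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8ByteFilter {
    // Length of the sequence started by this lead byte if it is 4 bytes or longer, else 0
    static int longSequenceLength(int b) {
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        if ((b & 0xFC) == 0xF8) return 5; // 111110xx (obsolete)
        if ((b & 0xFE) == 0xFC) return 6; // 1111110x (obsolete)
        return 0;
    }

    static byte[] filter(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length);
        int i = 0;
        while (i < utf8.length) {
            int len = longSequenceLength(utf8[i] & 0xFF);
            if (len > 0) {
                out.write('?'); // one '?' per dropped character
                i += len;       // skip the lead byte and its continuation bytes
            } else {
                out.write(utf8[i]);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] in = "a\uD83D\uDE00b".getBytes(StandardCharsets.UTF_8); // U+1F600 = F0 9F 98 80
        System.out.println(new String(filter(in), StandardCharsets.UTF_8)); // a?b
    }
}
```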

C /C++ long long to Java long

I have a file on disk which I'm reading, written by C/C++ code. I know I have two 64-bit unsigned integers to read, but Java doesn't support unsigned integers, so the value I get from DataInputStream.readLong() is incorrect. (Ignore byte order for now; I'm actually using a derivative of DIS called LEDataInputStream, which I downloaded from the web.)
A lot of posts on here talk about using BigInteger, but the Javadoc for reading a byte array only talks about loading a byte array representation, and the questions seem centered on the fact that some people are going outside the positive bounds of the Java long type, which I will be nowhere near with the data I'm reading.
I have a MATLAB/Octave script which reads these long long values as two 32-bit integers each, then does some multiplying and adding to get the answer it wants.
I suppose the question is - how do i read a 64-bit unsigned integer either using BigInteger, or using [LE]DataInputStream.XXX?
Thanks in advance
I would suggest using a ByteBuffer and then using code such as this to get what you want.
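A minimal sketch of the ByteBuffer route, assuming the two values are stored little-endian (typical for C/C++ output on x86) at the start of the data; UnsignedLongReader and readTwoUnsigned are hypothetical names. If the values are known to stay below 2^63, as the question says, buf.getLong() alone already gives the right number; the mask only matters above that.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class UnsignedLongReader {
    // 2^64 - 1: keeps the low 64 bits and discards BigInteger's sign extension
    private static final BigInteger MASK_64 = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    // Reads two consecutive little-endian 64-bit values as unsigned numbers
    static BigInteger[] readTwoUnsigned(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        BigInteger[] result = new BigInteger[2];
        for (int i = 0; i < 2; i++) {
            long raw = buf.getLong(); // the same 64 bits, interpreted as signed
            result[i] = BigInteger.valueOf(raw).and(MASK_64);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        for (int i = 0; i < 8; i++) data[i] = (byte) 0xFF; // first value: 2^64 - 1
        data[8] = 42;                                      // second value: 42
        BigInteger[] values = readTwoUnsigned(data);
        System.out.println(values[0] + " " + values[1]);   // 18446744073709551615 42
    }
}
```

In a real program the byte array would come from the file (for example via java.nio.file.Files.readAllBytes).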
You can use a long as a 64-bit value to store unsigned data. Here is a module showing that most unsigned operations can be performed using the standard long type. Whether this is a problem really depends on what you want to do with the value.
EDIT: A common approach to handling unsigned numbers is to widen the data type. This is simpler in many cases but not a requirement (and for long, using BigInteger doesn't make things any simpler IMHO).
EDIT2: What is wrong with the following code?
long max_unsigned = 0xFFFFFFFFFFFFFFFFL;
long min_unsigned = 0;
System.out.println(Unsigned.asString(max_unsigned) + " > "
        + Unsigned.asString(min_unsigned) + " is "
        + Unsigned.gt(max_unsigned, min_unsigned));
prints
18446744073709551615 > 0 is true
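For comparison, since Java 8 the JDK itself has unsigned helpers on Long (toUnsignedString, compareUnsigned, divideUnsigned), so the same check can be sketched without an extra module; JdkUnsigned and describe are hypothetical names.

```java
public class JdkUnsigned {
    // Same output format as the snippet above, using only java.lang.Long
    static String describe(long a, long b) {
        return Long.toUnsignedString(a) + " > "
                + Long.toUnsignedString(b) + " is "
                + (Long.compareUnsigned(a, b) > 0);
    }

    public static void main(String[] args) {
        long maxUnsigned = 0xFFFFFFFFFFFFFFFFL; // all 64 bits set (-1 as a signed long)
        System.out.println(describe(maxUnsigned, 0)); // 18446744073709551615 > 0 is true
        System.out.println(Long.divideUnsigned(maxUnsigned, 10)); // 1844674407370955161
    }
}
```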
First, check out this question.
Also see this.
Now, using the BigInteger class:
import java.math.BigInteger;

// Get a byte array
byte[] bytes = new byte[]{(byte)0x12, (byte)0x0F, (byte)0xF0};
// Create a BigInteger using the byte array
BigInteger bi = new BigInteger(bytes);
// Format to binary
String s = bi.toString(2); // 100100000111111110000
// Format to octal
s = bi.toString(8); // 4407760
// Format to decimal
s = bi.toString(); // 1183728
// Format to hexadecimal
s = bi.toString(16); // 120ff0
if (s.length() % 2 != 0) {
    // Pad with 0
    s = "0" + s;
}
// Parse binary string
bi = new BigInteger("100100000111111110000", 2);
// Parse octal string
bi = new BigInteger("4407760", 8);
// Parse decimal string
bi = new BigInteger("1183728");
// Parse hexadecimal string
bi = new BigInteger("120ff0", 16);
// Get byte array
bytes = bi.toByteArray();
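Relevant to the unsigned case in the question: new BigInteger(byte[]) reads the bytes as a two's-complement signed number, whereas the new BigInteger(int signum, byte[] magnitude) constructor with signum 1 reads them as an unsigned magnitude. Both expect big-endian order, so little-endian data from the file would need its bytes reversed first. UnsignedFromBytes is a hypothetical name.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class UnsignedFromBytes {
    // signum = 1 treats the big-endian bytes as a non-negative magnitude
    static BigInteger unsigned(byte[] bigEndian) {
        return new BigInteger(1, bigEndian);
    }

    public static void main(String[] args) {
        byte[] allOnes = new byte[8];
        Arrays.fill(allOnes, (byte) 0xFF);
        System.out.println(new BigInteger(allOnes)); // -1 (two's complement)
        System.out.println(unsigned(allOnes));       // 18446744073709551615
    }
}
```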
