I would like to convert a long value to a char sequence, like this calculator does.
I don't really know how that value is converted into the ASCII sequence (or whether it's even correct). I thought that an ASCII value is 8 bits long, which would mean that I have to convert the long value to binary and then split it into 8-bit blocks. Is that correct?
Strictly, ASCII characters are 7 bits long, and we usually just add an extra 0 at the beginning to get 8 bits.
Extensions to ASCII (such as ISO 8859) have 8-bit characters. The calculator you linked seems to be using one of those extensions.
In Java, longs have 64 bits (one of which is used for the sign), so you can indeed split one into 8 chunks of 8-bit characters.
First, you'll have to convert your long to a byte array (not all of that question is relevant to this case, but some of it is -- particularly the part that mentions ByteBuffer).
byte[] bytes = ByteBuffer.allocate(8).putLong(someLong).array();
Once you have the array, convert each byte to a char, using a simple cast.
EDIT: Instead of manually converting each character, you may use the java.lang.String(byte[]) constructor.
String str = new String(bytes);
Note that this will use the platform's default charset. If this is not desirable, you can use one of the constructors that also take a charset.
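The steps above can be sketched as follows. This is only an illustration, not necessarily what the linked calculator does: the class name LongToChars and both method names are hypothetical, and ISO 8859-1 is assumed as the 8-bit charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class LongToChars {
    // Splits the long into 8 big-endian bytes and casts each one to a char.
    // The & 0xff mask stops bytes above 127 from being sign-extended by the cast.
    static char[] toCharsByCast(long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xff);
        }
        return chars;
    }

    // Same conversion through the String constructor, with an explicit charset
    // instead of the platform default.
    static String toLatin1String(long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        long someLong = 0x4142434445464748L;
        System.out.println(new String(toCharsByCast(someLong))); // ABCDEFGH
        System.out.println(toLatin1String(someLong));            // ABCDEFGH
    }
}
```

Both paths give the same result for ISO 8859-1, because that charset maps each byte value 0-255 to the code point with the same number.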
Nope. There are several ways to do this. One way is using java.math.BigInteger which has a method called toByteArray(). Try it if it fits your problem.
Try this code to convert a long value to a 4-char array.
//convert long to char array
long longValIn = 229902744986400000L;
ByteBuffer bb1 = ByteBuffer.allocate(8);
bb1.putLong(longValIn);
char [] charArr = new char[4];
charArr[0] = bb1.getChar(0);
charArr[1] = bb1.getChar(2);
charArr[2] = bb1.getChar(4);
charArr[3] = bb1.getChar(6);
//convert char array to long
ByteBuffer bb2 = ByteBuffer.allocate(8);
bb2.putChar(0,charArr[0]);
bb2.putChar(2,charArr[1]);
bb2.putChar(4,charArr[2]);
bb2.putChar(6,charArr[3]);
long longValOut = bb2.getLong(0);
I'm working on Huffman compression.
String binaryString = "01011110";
outFile.write(byte);
I have a String which I want to convert to a byte so that I can write that byte to the file. Does anyone know how I can do this?
You can turn that String into a numerical value using the overloaded parseByte that lets you specify the radix:
String binaryString = "01011110";
byte b = Byte.parseByte(binaryString, 2); //this parses the string as a binary number
outFile.write(b);
The second argument to parseByte() lets you specify the number system in which the String should be parsed. By default, a radix of 10 is used because we humans usually use the decimal system. The 2 says that the number should be treated as a binary value (which is base 2).
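One caveat that matters for Huffman output: Byte.parseByte parses a signed number, so an 8-bit pattern whose first bit is 1 (a value above 127) throws a NumberFormatException. A minimal sketch of a workaround is to parse with Integer.parseInt and narrow the result; the class and method names here are hypothetical.

```java
class BinaryStringToByte {
    // Parses exactly 8 binary digits into a byte.
    // Patterns that start with 1 come out as negative byte values.
    static byte parseBits(String bits) {
        if (bits.length() != 8) {
            throw new IllegalArgumentException("expected 8 binary digits: " + bits);
        }
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        System.out.println(parseBits("01011110")); // 94
        System.out.println(parseBits("11001000")); // -56, the same bits as unsigned 200
    }
}
```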
You can use Byte.parseByte() with a radix of 2:
byte b = Byte.parseByte(str, 2);
example:
System.out.println(Byte.parseByte("01100110", 2));
You could write the table out by hand as a String[256], with each element holding one 8-bit pattern of 1s and 0s; there are only 256 of them. That gives you the ability to check a pattern with String.indexOf(binnum[arrayIndex]).
Then make a corresponding new byte[256] and set each element in matching order with new Integer(increment).byteValue(). To check the result, print each element of the byte[] array using new Byte(bytarray[incr]).intValue()+"\n".
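Rather than typing all 256 entries by hand, the two tables could be generated in a loop. This is a sketch of that idea only; the names BitPatternTable, PATTERNS and VALUES are hypothetical.

```java
class BitPatternTable {
    // PATTERNS[i] is the 8-digit binary string for i; VALUES[i] is the matching byte.
    static final String[] PATTERNS = new String[256];
    static final byte[] VALUES = new byte[256];

    static {
        for (int i = 0; i < 256; i++) {
            // Left-pad the binary digits with zeros to a width of 8
            PATTERNS[i] = String.format("%8s", Integer.toBinaryString(i)).replace(' ', '0');
            VALUES[i] = (byte) i;
        }
    }

    public static void main(String[] args) {
        System.out.println(PATTERNS[94]);  // 01011110
        System.out.println(VALUES[200]);   // -56
    }
}
```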
I have been reading about Unicode encoding and Java 9 compact Strings for the last two days, and I am getting on quite well. But there is something that I don't understand.
About the byte data type
1). It is 8-bit storage that ranges from -128 to 127
Questions
1). Why didn't Java implement it as unsigned, like char, which is an unsigned 16-bit type? I mean, it would then have a range of 0 to 255, because with 0 to 127 I can only hold an ASCII value. What would happen if I set the value 200, an extended ASCII value? It would overflow to -56.
2). Does the negative value mean something? I have tried a simple example using Java 11:
final char value = (char)200;//in byte would overflow
final String stringValue = new String(new char[]{value});
System.out.println(stringValue);//THE SAME VALUE OF JAVA 8
I have checked the String.value variable and I see a byte array of
System.out.println(value[0]);//-56
The same question as before arises: does the -56 (the negative value) mean something? In other languages, is this overflow detected so that the value returns to 200? How can Java know that the -56 value is the same as 200 in char?
I have tried harder examples, like code point 128048, and I see in the String.value variable an array of bytes like this:
0 = 61
1 = -40
2 = 48
3 = -36
I know this code point takes 4 bytes, and I get how the char[] is transformed to a byte[], but I don't know how String handles this byte[] data.
Sorry if this question is simple, and sorry for any typos; English is not my native language. Thanks a lot.
Why didn't Java implement it as unsigned, like char, which is an unsigned 16-bit type? I mean, it would then have a range of 0 to 255, because with 0 to 127 I can only hold an ASCII value. What would happen if I set the value 200, an extended ASCII value? It would overflow to -56.
Java’s primitive data types were settled with Java 1.0 a quarter century ago. The compact strings were introduced in Java 9, less than two years ago. This new feature, which is merely an implementation detail, did not justify fundamental changes to Java’s type system.
Besides that, you are looking at one interpretation of the data stored in a byte. For the sake of representing iso-latin-1 units, it is entirely irrelevant whether interpreting the same data as Java’s built-in signed byte would result in a positive or negative number.
Likewise Java’s I/O API allows reading a file into a byte[] array and write byte[] arrays back to files and these two operations are already sufficient to copy a file losslessly, regardless of its file format which would be relevant when interpreting its content.
So the following works since Java 1.1:
byte[] bytes = "È".getBytes("iso-8859-1");
System.out.println(bytes[0]);
System.out.println(bytes[0] & 0xff);
-56
200
The two numbers, -56 and 200 are just different interpretations of the bit pattern 11001000 whereas the iso-latin-1 interpretation of a byte containing the bit pattern 11001000 is the character È.
A char value is also just an interpretation of a two byte quantity, i.e. as UTF-16 code unit. Likewise, a char[] array is a sequence of bytes in the computer’s memory with a standard interpretation.
We can also interpret other byte sequences this way.
StringBuilder sb = new StringBuilder().appendCodePoint(128048);
byte[] array = new byte[4];
StandardCharsets.UTF_16LE.newEncoder()
.encode(CharBuffer.wrap(sb), ByteBuffer.wrap(array), true);
System.out.println(Arrays.toString(array));
will print the value you’ve seen, [61, -40, 48, -36].
The advantage of using a byte[] array inside the String class is, that now, the interpretation can be chosen, to use iso-latin-1 when all characters are representable with this encoding or utf-16 otherwise.
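Going the other way shows that those four bytes carry the whole code point. The sketch below reinterprets them as UTF-16LE. It illustrates the idea of choosing an interpretation and is not how String works internally; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf16Decode {
    // Interprets the bytes as UTF-16LE text and returns the first code point.
    static int firstCodePoint(byte[] utf16le) {
        return new String(utf16le, StandardCharsets.UTF_16LE).codePointAt(0);
    }

    public static void main(String[] args) {
        byte[] array = {61, -40, 48, -36};
        String s = new String(array, StandardCharsets.UTF_16LE);
        System.out.println(s.length());            // 2, one surrogate pair
        System.out.println(firstCodePoint(array)); // 128048
    }
}
```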
The possible numeric interpretations are irrelevant to the string. However, when you ask “How can Java know that -56 value is the same as 200”, you should ask yourself, how does it know that the bit pattern 11001000 of a byte is -56 in the first place?
System.out.println(value[0]);
involves an operation that is actually expensive compared to ordinary computer arithmetic: the conversion of a byte (or an int) to a String. This conversion is often overlooked because it has been defined as the default way of printing a byte, but it is no more natural than a conversion to a String that interprets the value as an unsigned quantity. For further reading, I recommend Two's complement.
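To make the point concrete, here is a small sketch that prints the same byte under three interpretations: signed, unsigned, and raw bits. ByteViews and describe are hypothetical names; Byte.toUnsignedInt is available from Java 8 on.

```java
class ByteViews {
    // Renders one byte as "signed / unsigned / bit pattern".
    static String describe(byte b) {
        String bits = String.format("%8s", Integer.toBinaryString(b & 0xff)).replace(' ', '0');
        return b + " / " + Byte.toUnsignedInt(b) + " / " + bits;
    }

    public static void main(String[] args) {
        System.out.println(describe((byte) 200)); // -56 / 200 / 11001000
    }
}
```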
This is because not all bytes in a string are interpreted the same way. It depends on the string's character encoding.
Example:
if a string is a UTF-8 string, its code units are 8 bits in size.
in a UTF-16 string, its code units are 16 bits in size.
etc...
This means that if the string is to be represented as UTF-8, the characters will be made by reading 1 byte at a time; if UTF-16, the characters will be made by reading 2 bytes at a time.
Look at this code: a single byte array, data, is transformed to a string using UTF-8 and then UTF-16.
byte[] data = new byte[] {97, 98, 99, 100};
System.out.println(new String(data, StandardCharsets.UTF_8));
System.out.println(new String(data, StandardCharsets.UTF_16));
The output of this code is:
abcd // 4 bytes = 4 chars, 1 byte per char
慢捤 // 4 bytes = 2 chars, 2 bytes per char
Going back to the question, what motivated the developers to do so was reducing the memory footprint of strings. Not all strings use all 16 bits that a char offers.
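One caveat about the sizes above: both UTF-8 and UTF-16 are variable-length, so a single character can take more than one unit. A short sketch, with hypothetical class and method names, that measures the encoded length of a few strings:

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the string occupies when encoded as UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte order mark)
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "a";
        String latin = "\u00C8";                                // the character È
        String supplementary = new String(Character.toChars(128048));
        System.out.println(utf8Length(ascii) + " " + utf16Length(ascii));                 // 1 2
        System.out.println(utf8Length(latin) + " " + utf16Length(latin));                 // 2 2
        System.out.println(utf8Length(supplementary) + " " + utf16Length(supplementary)); // 4 4
    }
}
```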
EDIT: Code here
Let's say I have a byte array and I try to decode it as UTF-8 using the following:
String tekst = new String(result2, StandardCharsets.UTF_8);
System.out.println(tekst);
//where result2 is the byte array
Then I get the bytes using getBytes(), with values from 0 to 128:
byte[] orig = tekst.getBytes();
And then, I wish to do a frequency count of my byte[] orig using the following:
int[] frequencies = new int[256];
for (byte b: orig){
frequencies[b]++;
}
Everything goes well until I encounter an error which states
java.lang.ArrayIndexOutOfBoundsException: -61
Does that mean that my byte still contains negative values despite converting it to UTF-8? Is there something wrong with what I'm doing? Can someone please give me some clarity on this, because I'm still a beginner on the subject. Thank you.
Answering the specific question
Does that mean that my byte still contains negative values despite converting it to UTF-8?
Yes, absolutely. That's because byte is signed in Java. A byte value of -61 would be 195 as an unsigned value. You should expect to get bytes which aren't in the range 0-127 when you encode any non-ASCII text with UTF-8.
The fix is easy: just map the value into the range 0-255 with a bit mask:
frequencies[b & 0xff]++;
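Put together, the corrected count might look like the sketch below. ByteFrequencies and count are hypothetical names, and the sample text is an assumption chosen because È encodes in UTF-8 to the bytes -61 and -120, matching the index in the exception.

```java
import java.nio.charset.StandardCharsets;

class ByteFrequencies {
    // Counts how often each unsigned byte value (0-255) occurs.
    static int[] count(byte[] data) {
        int[] frequencies = new int[256];
        for (byte b : data) {
            frequencies[b & 0xff]++; // the mask turns -61 into 195, and so on
        }
        return frequencies;
    }

    public static void main(String[] args) {
        byte[] orig = "\u00C8a".getBytes(StandardCharsets.UTF_8); // bytes: -61, -120, 97
        int[] frequencies = count(orig);
        System.out.println(frequencies[195]); // 1
        System.out.println(frequencies[97]);  // 1
    }
}
```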
Addressing what you're attempting to do
This line:
String tekst = new String(result2, StandardCharsets.UTF_8);
... is only appropriate if result2 is genuinely UTF-8-encoded text. It's not appropriate if result2 is some arbitrary binary data such as an image, compressed data, or even text encoded in some other encoding.
If you want to preserve arbitrary binary data as a string, you should use something like Base64 or hex. Basically, you need to determine whether your data is inherently textual (in which case, you should use strings for as much of the time as possible, and use an appropriate Charset to convert to binary where necessary) or inherently binary (in which case you should use bytes for as much of the time as possible, and use base64 or hex to convert to text where necessary).
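For the inherently binary case, a round trip through Base64 could be sketched like this, using java.util.Base64 from the standard library (Java 8 and later). The class name BinaryAsText and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class BinaryAsText {
    // Arbitrary bytes to a printable string
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // The printable string back to the original bytes
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {-61, 0, 127, -128};
        String text = encode(data);
        System.out.println(text);
        System.out.println(Arrays.equals(data, decode(text))); // true
    }
}
```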
This line:
byte[] orig = tekst.getBytes();
... is almost always a bad idea. It uses the platform-default encoding to convert a string to bytes. If you really, really want to use the platform-default encoding, I would make that explicit:
byte[] orig = tekst.getBytes(Charset.defaultCharset());
... but this is an extremely unusual requirement these days. It's almost always better to stick to UTF-8 everywhere.
While doing signature encoding, I ran into a strange problem:
When I try to rebuild a byte array, it always fails:
//digest is the original byte array
String messageHex = bytesToHex(digest);
byte[] hexRestore = messageHex.getBytes();
assert Arrays.equals(digest, hexRestore); //false!
String utf8Digest = new String(digest, "UTF8");
byte[] utf8Restore = utf8Digest.getBytes("UTF8");
assert Arrays.equals(digest, utf8Restore); //false!
Then I used BigInteger:
BigInteger messageBig = new BigInteger(digest);
byte[] bigRestore = messageBig.toByteArray();
assert Arrays.equals(digest, bigRestore); //true!
Then it works, I don't know why, c
Don't use either of these approaches. Either convert into hex directly (not using BigInteger) or use base64. BigInteger will faithfully reproduce numbers, but it's not meant to be a general purpose binary-to-hex converter. In particular, it will lose leading zeroes, because they're insignificant when treating the data as an integer. (If you know the expected length you could always format it to that, but why bother? Just treat the data as arbitrary data instead of as a number.)
Definitely don't try to "decode" the byte array as if it's UTF-8-encoded text - it isn't.
There are plenty of questions on Stack Overflow about converting byte arrays to hex or base64. (Those are just links to two examples... search for more.)
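As an illustration of converting to hex directly, here is a minimal sketch. HexCodec, toHex and fromHex are hypothetical names, not the bytesToHex helper from the question. Restoring the bytes needs a real hex decoder; calling getBytes() on the hex string only returns the character codes of its digits. The sample array starts with zero bytes to show what BigInteger drops.

```java
import java.math.BigInteger;
import java.util.Arrays;

class HexCodec {
    // Two lowercase hex digits per byte, leading zeros preserved
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0xf, 16));
            sb.append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }

    // Inverse of toHex; assumes an even-length string of valid hex digits
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] digest = {0, 0, 1, -1};
        System.out.println(toHex(digest));                                 // 000001ff
        System.out.println(Arrays.equals(digest, fromHex(toHex(digest)))); // true
        System.out.println(new BigInteger(digest).toByteArray().length);   // 2, leading zeros lost
    }
}
```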
I have come across something strange. I have a binary number in the form of a string, specifically "01001100", but I get the exception mentioned above when I execute the following code.
String s = "01001100";
byte b = Byte.parseByte(s);
But why is this happening? A byte can store a maximum of 127 and a minimum of -128,
and the decimal equivalent of the above number is 76, which is perfectly within range.
The particular exception I am getting is:
java.lang.NumberFormatException:Value out of range. value:01001100 radix:10
Is there any way to get rid of it? And yes, it is compulsory for me to use byte only, as I am extracting the data stored in the image byte by byte.
Thank you.
The key is at the end of the exception string: radix:10. You are converting the decimal value 1,001,100 to a byte, and it does not fit. Try this:
String s = "01001100";
byte b = Byte.parseByte(s, 2);
01001100 is a fairly large number in decimal (over a million; see the docs for parseByte(String)). You probably want the version that accepts a radix:
byte b = Byte.parseByte(s, 2);
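A quick way to see the difference is to parse the same string under both radixes. In this sketch the class and method names are hypothetical; the parsing calls are the standard Integer.parseInt(String) and Byte.parseByte(String, int).

```java
class RadixCheck {
    // What the digits mean when read as a decimal number (radix 10)
    static int decimalValue(String s) {
        return Integer.parseInt(s);
    }

    // What the digits mean when read as a binary number (radix 2)
    static byte binaryValue(String s) {
        return Byte.parseByte(s, 2);
    }

    public static void main(String[] args) {
        String s = "01001100";
        System.out.println(decimalValue(s)); // 1001100, far outside the byte range
        System.out.println(binaryValue(s));  // 76
    }
}
```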