based on this array :
final char[] charValue = { 'u', ' ', '}','+' };
i want to print the double value and the ascii value from it in Java.
i can't find a proper solution for that in internet. I just found how to convert a single Character into Integer value. But what about many characters?
the main problem is, i have a large char[] and some double and int values are stored in. for double values they are stored within 4 bytes size and integer 1 or 2 bytes so i have to read all this and convert into double or integer.
Thanks for you help
When java was designed, there was C char being used for binary bytes and text.
Java made a clear separation between binary data (byte[], InputStream/OutputStream) and Unicode text (char, String, Reader/Writer). Hence Java has full Unicode support. The binary data, byte[], need information: their used encoding, in order to be convertable to text: char[]/String.
In Java a char[] will rarely be used (as in C/C++), and it seems byte[] is intended, as you mention 4 elements to be used for an int etcetera. A char is 16 bits, containing UTF-16 text.
For this case one can use a ByteBuffer either wrapping a byte[] or being taken from a memory mapped file.
Writing
ByteBuffer buf = ByteBuffer.allocate(13); // 13 bytes
buf.order(ByteOrder.LITTLE_ENDIAN); // Intel order
buf.putInt(42); // at 0
buf.putDouble(Math.PI); // at 4
buf.put((byte) '0'); // at 12
buf.putDouble(4, 3.0); // at 4 overwrite PI
byte[] bytes = buf.array();
Reading
ByteBuffer buf = ByteBuffer.wrap(bytes);
buf.order(ByteOrder.LITTLE_ENDIAN); // Intel order
int a = buf.getInt();
double b = buf.getDouble();
byte c = buf.get();
Related
I've been suggested a TCP-like checksum, which consists of the sum of the (integer) sequence and ack field values, added to a character-by-character sum of the payload field of the packet (i.e., treat each character as if it were an 8 bit integer and just add them together).
I'm assuming it would go along the lines of:
char[] a = data.toCharArray();
for (int i = 0; int < len; i++) {
...
}
Though I'm pretty clueless as to how I could complete the actual conversion?
My data is string, and I wish to go through the string (converted to a char array (though if there's a better way to do this let me know!)) and now I'm ready to iterate though how does one convert each character to an int. I will then be summing the total.
As String contains Unicode, and char is a two-byte UTF-16 implementation of Unicode, it might be better to first convert the String to bytes:
byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
data = new String(bytes, StandardCharsets.UTF_8); // Inverse.
int crc = 0;
for (byte b : bytes) {
int n = b & 0xFF; // An int 0 .. 255 without sign extension
crc ^= n;
}
Now you can handle any Unicode content of a String. UTF-8 is optimal when sufficient ASCII letters are used, like Chinese HTML pages. (For a Chinese plain text UTF-16 might be better.)
I have an application where I want to have a method that creates a string of a specified size in bytes.
here's what I basically wrote. I just want to make sure that this produces for example a string of size bytes
static String createMsg(int size){
byte[] msgB= new byte[size];
for(int i = 0; i < size; i++){
msgB[i] = 105;
}
String x = new String(msgB);
return x;
}
Thank you.
To create a string of a specific size in byte, you must create a string only containing ASCII characters. All characters between 0x0 (0) and 0x7F (127) are ASCII characters and each ASCII character takes one byte. It takes even one byte, if our encoding is UTF, as the first 128 characters in UTF and ASCII are the same for compatiblity reasons.
You can also rely on Apache Commons 3 and use this code snippet to generate a string of a specific size.
RandomStringUtils.randomAscii(100);
I'm writing a Simplified DES algorithm to encrypt and subsequently decrypt a string. Suppose I have the initial character ( which has the binary value 00101000 which I get using the following algorithm:
public void getBinary() throws UnsupportedEncodingException {
byte[] plaintextBinary = text.getBytes("UTF-8");
for(byte b : plaintextBinary){
int val = b;
int[] tempBinRep = new int[8];
for(int i = 0; i<8; i++){
tempBinRep[i] = (val & 128) == 0 ? 0 : 1;
val <<= 1;
}
binaryRepresentations.add(tempBinRep);
}
}
After I perform the various permutations and shifts, ( and it's binary equivalent is transformed into 10001010 and it's ASCII equivalent Š. When I come around to decryption I pass the same character through the getBinary() method I now get the binary string 11000010 and another binary string 10001010 which translates into ASCII as x(.
Where is this rogue x coming from?
Edit: The full class can be found here.
You haven't supplied the decrypting code, so we can't know for sure, but I would guess you missed the encoding either when populating your String. Java Strings are encoded in UTF-16 by default. Since you're forcing UTF-8 when encrypting, I'm assuming you're doing the same when decrypting. The problem is, when you convert your encrypted bytes to a String for storage, if you let it default to UTF-16, you're probably ending up with a two-byte character because the 10001010 is 138, which is beyond the 127 range for ASCII charaters that get represented with a single byte.
So the "x" you're getting is the byte for the code page, followed by the actual character's byte. As suggested in the comments, you'd do better to just store the encrypted bytes as bytes, and not convert them to Strings until they're decrypted.
The problem I am facing occurs when I try to type cast some ASCII values to char.
For example:
(char)145 //returns ?
(char)129 //also returns ?
but it is supposed to return a different character. It happens to many other values as well.
I hope I have been clear enough.
ASCII is a 7-bit encoding system. Some programs even use this to detect if a file is binary or textual. Characters below 32 are escape characters and are used as directives (for instance new lines, print command)
The program however will still work. A character is simply stored as a short (thus sixteen bits). But the values in that range don't have an interpretation. This means that the textual output of both values will lead to nothing. On the other hand comparisons like (char) 145 == (char) 129 will still work (return false). Simply because for a processor, there is no difference between a short and a character.
If you are interested in converting your value such that only the lowest seven bits count (this modifying the value such that it is in the valid range), you can use masking:
int value = 145;
value &= 0x7f;
char c = (char) value;
The char type is Unicode 16 bit, UTF-16. So you could do (char) 265 for c-with-circumflex. ASCII is 7 bits 0 - 127.
String s = "" + ((char)145) + ((char)129);
The above is a string of two Unicode characters (each 2 bytes, UTF-16).
byte[] bytes = s.getBytes(StandardCharsets.US_ASCII); // ASCII with '?' as 7bit
s = new String(bytes, StandardCharsets.US_ASCII); // "??"
byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1); // ISO-8859-1 with Latin1
byte[] bytes = s.getBytes("Windows-1252"); // With Windows Latin1
byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // No information loss.
s = new String(bytes, StandardCharsets.UTF_9); // Orinal string.
In java String/char/Reader/Writer tackle text (in Unicode), whereas byte[]/InputStream/OutputStream tackle binary data, bytes.
And for bytes must always be associated with an encoding to give text.
Answer: as soon as there is a conversion from text to some encoding that does not represent that char, a question mark can be written.
These expressions evaluate to true:
((char) 145) == '\u0091';
((char) 129) == '\u0081';
These UTF-16 values map to the Unicode code points U+0091 and U+0081:
0091;<control>;Cc;0;BN;;;;;N;PRIVATE USE ONE;;;;
0081;<control>;Cc;0;BN;;;;;N;;;;;
These are both control characters without visible graphemes (the question mark acts as a substitution character) and one of them is private use so has no designated purpose. Neither are in the ASCII set.
In Java, if I have a String x, how can I calculate the number of bytes in that string?
A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.
That said, you can turn the string into a byte array and then look at its size as follows:
// The input string for this test
final String string = "Hello World";
// Check length, in characters
System.out.println(string.length()); // prints "11"
// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"
final byte[] utf16Bytes= string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"
final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"
final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"
final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"
So you see, even a simple "ASCII" string can have different number of bytes in its representation, depending which encoding is used. Use whichever character set you're interested in for your case, as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:
final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms
// Check length, in characters
System.out.println(interesting.length()); // prints "4"
// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"
final byte[] utf16Bytes= interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"
final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"
final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")
final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")
(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)
If you're running with 64-bit references:
sizeof(string) =
8 + // object header used by the VM
8 + // 64-bit reference to char array (value)
8 + string.length() * 2 + // character array itself (object header + 16-bit chars)
4 + // offset integer
4 + // count integer
4 + // cached hash code
In other words:
sizeof(string) = 36 + string.length() * 2
On a 32-bit VM or a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), the references are 4 bytes. So the total would be:
sizeof(string) = 32 + string.length() * 2
This does not take into account the references to the string object.
The pedantic answer (though not necessarily the most useful one, depending on what you want to do with the result) is:
string.length() * 2
Java strings are physically stored in UTF-16BE encoding, which uses 2 bytes per code unit, and String.length() measures the length in UTF-16 code units, so this is equivalent to:
final byte[] utf16Bytes= string.getBytes("UTF-16BE");
System.out.println(utf16Bytes.length);
And this will tell you the size of the internal char array, in bytes.
Note: "UTF-16" will give a different result from "UTF-16BE" as the former encoding will insert a BOM, adding 2 bytes to the length of the array.
According to How to convert Strings to and from UTF8 byte arrays in Java:
String s = "some text here";
byte[] b = s.getBytes("UTF-8");
System.out.println(b.length);
A String instance allocates a certain amount of bytes in memory. Maybe you're looking at something like sizeof("Hello World") which would return the number of bytes allocated by the datastructure itself?
In Java, there's usually no need for a sizeof function, because we never allocate memory to store a data structure. We can have a look at the String.java file for a rough estimation, and we see some 'int', some references and a char[]. The Java language specification defines, that a char ranges from 0 to 65535, so two bytes are sufficient to keep a single char in memory. But a JVM does not have to store one char in 2 bytes, it only has to guarantee, that the implementation of char can hold values of the defines range.
So sizeof really does not make any sense in Java. But, assuming that we have a large String and one char allocates two bytes, then the memory footprint of a String object is at least 2 * str.length() in bytes.
There's a method called getBytes(). Use it wisely .
Try this :
Bytes.toBytes(x).length
Assuming you declared and initialized x before
To avoid try catch, use:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
System.out.println(b.length);
Try this using apache commons:
String src = "Hello"; //This will work with any serialisable object
System.out.println(
"Object Size:" + SerializationUtils.serialize((Serializable) src).length)