What is CharsetDecoder.decode(ByteBuffer, CharBuffer, endOfInput) - java

I have a problem with the CharsetDecoder class.
Here is a first code example, which works:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final ByteBuffer b = ByteBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
b.put(tab, i, 1);
}
try {
b.flip();
System.out.println("a" + dec.decode(b).toString() + "a");
} catch (CharacterCodingException e1) {
e1.printStackTrace();
}
The result is a€a
But when I execute this code:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
dec.decode(buffer, chars, i == 2);
}
dec.flush(chars);
System.out.println("a" + chars.toString() + "a");
The result is a
Why is the result not the same?
How should I use the decode(ByteBuffer, CharBuffer, endOfInput) method of the CharsetDecoder class to get the result a€a?
-- EDIT --
So, building on Jesper's code, I wrote the following. It's not perfect, but it works with step = 1, 2 and 3:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(6);
final byte[] tab = new byte[]{(byte)97, (byte)-30, (byte)-126, (byte)-84, (byte)97, (byte)97}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
final int step = 3;
for (int i = 0; i < tab.length; i++) {
// Add the next byte to the buffer
buffer.put(tab, i, step);
i+=step-1;
// Remember the current position
final int pos = buffer.position();
int l=chars.position();
// Try to decode
buffer.flip();
final CoderResult result = dec.decode(buffer, chars, i >= tab.length -1);
System.out.println(result);
if (result.isUnderflow() && chars.position() == l) {
// Underflow, prepare the buffer for more writing
buffer.position(pos);
}else{
if (buffer.position() == buffer.limit()){
//ByteBuffer decoded
buffer.clear();
buffer.position(0);
}else{
//a part of ByteBuffer is decoded. We keep only bytes which are not decoded
final byte[] b = buffer.array();
final int f = buffer.position();
final int g = buffer.limit() - buffer.position();
buffer.clear();
buffer.position(0);
buffer.put(b, f, g);
}
}
buffer.limit(buffer.capacity());
}
dec.flush(chars);
chars.flip();
System.out.println(chars.toString());

The method decode(ByteBuffer, CharBuffer, boolean) returns a result, but you are ignoring it. If you print the result in your second code fragment:
for (int i = 0; i < tab.length; i++) {
ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
System.out.println(dec.decode(buffer, chars, i == 2));
}
you'll see this output:
UNDERFLOW
MALFORMED[1]
MALFORMED[1]
a a
Apparently it does not work correctly if you start decoding in the middle of a character. The decoder expects that the first thing it reads is the start of a valid UTF-8 sequence.
Edit: When the decoder reports UNDERFLOW, it expects you to add more data to the input buffer and then call decode() again, but you must re-offer it the data from the start of the UTF-8 sequence that you are trying to decode. You can't continue in the middle of a UTF-8 sequence.
Here is a version that works, adding one byte from tab in every iteration of the loop:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte) -30, (byte) -126, (byte) -84}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
for (int i = 0; i < tab.length; i++) {
// Add the next byte to the buffer
buffer.put(tab[i]);
// Remember the current position
final int pos = buffer.position();
// Try to decode
buffer.flip();
final CoderResult result = dec.decode(buffer, chars, i == 2);
System.out.println(result);
if (result.isUnderflow()) {
// Underflow, prepare the buffer for more writing
buffer.limit(buffer.capacity());
buffer.position(pos);
}
}
dec.flush(chars);
chars.flip();
System.out.println("a" + chars.toString() + "a");

The decoder does not internally cache the data from partial characters, but this does not mean that you have to do complicated things to figure out what data to re-feed the decoder. You gave it a clear way to represent what data it actually consumed, i.e. the input ByteBuffer and its position. In the second example, by giving it a new ByteBuffer every time, the OP failed to pass the decoder back what it reported it had not yet consumed.
The standard pattern for using NIO Buffers is input, flip, output, compact, loop. Short of optimization (which may be premature), there is no reason to re-implement compact yourself. You might just get it wrong, as @Jesper and @lecogiteur did (their code breaks if more than a single character is ever presented). You should NOT reset the buffer to its position from before the decode call.
The second example should have read something like:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
final ByteBuffer buffer = ByteBuffer.wrap(new byte[3]);
for (int i=0; i<tab.length; i++){
    buffer.put(tab, i, 1); // In actual usage some type of IO read/transfer would occur here
    buffer.flip(); // switch the buffer from writing to reading
    dec.decode(buffer, chars, i == 2);
    buffer.compact(); // keep the bytes the decoder did not consume
}
dec.flush(chars);
chars.flip(); // switch chars from writing to reading before printing
System.out.println("a" + chars.toString() + "a");
NOTE: The above does not check the return value to detect malformed input or other error handling for running safely on arbitrary input/IO conditions.
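To illustrate the same input, flip, decode, compact loop with the return value checked, here is a minimal sketch. The class name ChunkedDecodeSketch, the method decodeInChunks and the chunkSize parameter are hypothetical names chosen for this example, and the byte array stands in for whatever IO source would supply the data in real usage.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration, not part of the JDK.
class ChunkedDecodeSketch {

    // Decodes UTF-8 bytes that are offered to the decoder chunkSize bytes at a time.
    static String decodeInChunks(byte[] input, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        // Room for one chunk plus the (at most 3) bytes of an unfinished sequence.
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer chars = CharBuffer.allocate(chunkSize + 4);
        StringBuilder out = new StringBuilder();

        int offset = 0;
        boolean endOfInput;
        do {
            int len = Math.min(chunkSize, input.length - offset);
            bytes.put(input, offset, len);   // input (a real program would read from IO here)
            offset += len;
            endOfInput = offset == input.length;
            bytes.flip();                    // flip: switch the buffer to reading
            CoderResult result;
            do {
                result = dec.decode(bytes, chars, endOfInput);   // output
                if (result.isError()) {
                    throw new IllegalArgumentException("Cannot decode input: " + result);
                }
                drain(chars, out);
            } while (result.isOverflow());   // chars was full, so decode the rest
            bytes.compact();                 // compact: keep unconsumed bytes for the next round
        } while (!endOfInput);

        dec.flush(chars);                    // UTF-8 has nothing to flush, other charsets might
        drain(chars, out);
        return out.toString();
    }

    // Moves everything decoded so far into the StringBuilder and empties the CharBuffer.
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }

    public static void main(String[] args) {
        byte[] tab = {97, (byte) -30, (byte) -126, (byte) -84, 97, 97}; // "a€aa" in UTF-8
        for (int step = 1; step <= 3; step++) {
            System.out.println(decodeInChunks(tab, step));
        }
    }
}
```

Because compact() carries the unconsumed bytes of a split sequence over to the next iteration, the chunk size should not matter for well-formed input. Malformed input surfaces as an exception here rather than being silently dropped.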

Related

Buffer to String

I send and receive data between an Android device and another device over USB.
The code that I use to receive data:
StringBuilder stringBuilder = new StringBuilder();
int i = 0;
int s = buffer[0];
for (; i < s; i++) {
stringBuilder.append(String.valueOf((char)buffer[i]));
}
byte[] b = String.valueOf(stringBuilder).getBytes();
All bytes are received correctly, except when a byte is greater than 127. What should I do?
I tried using:
stringBuilder2.append(String.valueOf((int)buffer[i] & 0xFF));
This works fine if I read String.valueOf(stringBuilder), but not when I create the byte[].
If all the bytes you're receiving are ASCII, your stringBuilder already contains the text of the String you need.
On the other hand, assuming that buffer[0] is the size of your buffer, you could try something like this:
int length = buffer[0] & 0xFF; // treat the length byte as unsigned
byte[] tmp = new byte[length];
System.arraycopy(buffer, 1, tmp, 0, length);
String result = new String(tmp); // decodes with the platform default charset
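If the goal is to carry arbitrary bytes (including values above 127) through a String and get the same bytes back, one option is to use ISO-8859-1 explicitly, since it maps every byte value 0..255 to exactly one char. The sketch below assumes, as the answer does, that the first byte holds the payload length. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration.
class LengthPrefixedSketch {

    // Assumes packet[0] is the payload length (0..255) and the payload follows it.
    static byte[] payload(byte[] packet) {
        int length = packet[0] & 0xFF; // mask so that lengths above 127 are not negative
        return Arrays.copyOfRange(packet, 1, 1 + length);
    }

    // ISO-8859-1 maps each byte to the char with the same numeric value.
    static String toText(byte[] payload) {
        return new String(payload, StandardCharsets.ISO_8859_1);
    }

    // The reverse mapping returns the original bytes for text produced by toText.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

With these helpers, toBytes(toText(payload(packet))) should yield the payload unchanged. Whether the resulting String is meaningful text depends on what the other device actually sends.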

Java: ByteArray to positive number and vice versa conversion

I'm looking for a way to reversibly convert a byte[] of arbitrary length to a positive number (represented as a String of decimal digits).
BigInteger offers a solution:
byte[] originalBytes = ...
String string = new BigInteger(originalBytes).toString();
...
byte[] decodedBytes = new BigInteger(string).toByteArray();
However, I'm not sure how to gracefully get rid of negative values (or where to store the sign) while keeping the process reversible.
Edit: just replace
String string = new BigInteger(originalBytes).toString();
with
String string = new BigInteger(1, originalBytes).toString();
The 1 signals that the passed array represents a positive number (signum = 1).
Original:
You can just prefix the array with a zero byte:
byte[] original = new byte[] { (byte) 255 };
System.out.println(new BigInteger(original).toString()); // prints "-1"
byte[] paddedCopy = new byte[original.length + 1];
for (int i = 0; i < original.length; i++) {
paddedCopy[i + 1] = original[i];
}
System.out.println(new BigInteger(paddedCopy).toString()); // prints "255"
This will essentially nullify the sign bit, making the number unsigned.
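A hedged sketch of the full round trip follows. Two details need care on the way back: toByteArray() may prepend a 0x00 sign byte, and any leading zero bytes of the original array are not stored in the number at all. This sketch therefore assumes the original length is known to the decoder. That length parameter, like the class and method names, is an assumption made for this example.

```java
import java.math.BigInteger;

// Hypothetical helper class for illustration.
class UnsignedBigIntegerSketch {

    // Interprets the bytes as an unsigned big-endian number and returns its decimal form.
    static String encode(byte[] bytes) {
        return new BigInteger(1, bytes).toString();
    }

    // Rebuilds an array of the given length from the decimal string.
    // Assumes the number fits in length bytes, i.e. it came from encode() on such an array.
    static byte[] decode(String decimal, int length) {
        byte[] raw = new BigInteger(decimal).toByteArray();
        byte[] out = new byte[length];
        // raw can be one byte longer (extra sign byte) or shorter (lost leading zeros),
        // so copy only the low-order bytes, right-aligned.
        int copy = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }
}
```

If the length cannot be transmitted separately, another approach would be to prepend a fixed non-zero marker byte before encoding and strip it after decoding.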

convert byte array to string, string can't convert back after transfer

Code for the same:
public byte[] stringToBytesUTFCustom(String str) {
char[] buffer1 = str.toCharArray();
byte[] b = new byte[buffer1.length << 1];
for(int i = 0; i < buffer1.length; i++) {
int bpos = i << 1;
b[bpos] = (byte) ((buffer1[i]&0xFF00)>>8);
b[bpos + 1] = (byte) (buffer1[i]&0x00FF);
}
return b;
}
public String bytesToStringUTFCustom(byte[] bytes) {
char[] buffer = new char[bytes.length >> 1];
for(int i = 0; i < buffer.length; i++) {
int bpos = i << 1;
char c = (char)(((bytes[bpos]&0x00FF)<<8) + (bytes[bpos+1]&0x00FF));
buffer[i] = c;
}
String txt = String.valueOf(buffer);
//return new String(buffer);
return txt;
}
I am implementing an SMS encryption app (client to client) and want to encode the cipher text (a byte[]) as a string. Base64 works, but I can't send the result because it is longer than 160 characters.
I want to convert the byte array to a string. The functions above work when I call one directly after the other, but when I use bytesToStringUTFCustom and then send the resulting text by SMS, it does not work.
The receiver cannot read the text in order to decode it.
The cipher text is the result of the bytesToStringUTFCustom function. Can anyone help me?
Thanks.
Do you know these methods?
String.getBytes(Charset encoding)
new String(byte[] byteArray, Charset encoding)
You can use Charset.forName(String) to get the Charset.
Charset UTF8 = Charset.forName("UTF-8");
byte[] bytes = str.getBytes(UTF8);
String reverted = new String(bytes, UTF8);

Read line as array of bytes without default encoding

What is the most efficient way to read one line from a file (terminated by \n, or \r, or both) as an array of bytes without going through String? If I read the line into a String, the default encoding is applied, and I don't want that step.
I don't think you can do this without doing it manually. But to save you time, I'll write the code for you:
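// needs: import java.io.IOException; import java.io.InputStream; import java.util.Arrays;
public static byte[] firstLine(InputStream in) throws IOException {
    byte[] buffer = new byte[1024]; // arbitrary initial size
    int idx = 0;
    int b; // read() returns an int so that -1 can signal end of stream
    // Stop at end of stream, CR (0x0d) or LF (0x0a).
    // For a CR LF pair, the LF is left unread in the stream.
    while ((b = in.read()) != -1 && b != 0x0d && b != 0x0a) {
        if (idx >= buffer.length)
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // grow when full
        buffer[idx++] = (byte) b;
    }
    return Arrays.copyOf(buffer, idx); // trim to the bytes actually read
}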

How to autoconvert hexcode to use it as byte[] in Java?

I have many hex codes here and I want to get them into Java without prepending 0x to every byte. For example, I have:
0102FFAB and I have to do the following:
byte[] test = {0x01, 0x02, 0xFF, 0xAB};
And I have many hexcodes which are pretty long. Is there any way to make this automatically?
You could try and put the hex codes into a string and then iterate over the string, similar to this:
String input = "0102FFAB";
byte[] bytes = new byte[input.length() / 2];
for( int i = 0; i < input.length(); i+=2)
{
bytes[i/2] = Integer.decode( "0x" + input.substring( i, i + 2 ) ).byteValue();
}
Note that this requires even length strings and it is quite a quick and dirty solution. However, it should still get you started.
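As a slightly more defensive variant of the same loop, the sketch below checks for an odd number of digits and for characters that are not hex digits, using Character.digit. The class and method names are hypothetical.

```java
// Hypothetical helper class for illustration.
class HexParseSketch {

    // Converts a string such as "0102FFAB" into the bytes {0x01, 0x02, 0xFF, 0xAB}.
    static byte[] parseHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a valid base-16 digit.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

Calling HexParseSketch.parseHex("0102FFAB") should give the same four bytes as the hand-written array in the question, with 0xFF and 0xAB stored as negative byte values.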
You can use BigInteger to load a long hex string.
public static void main(String[] args) {
String hex = "c33b2cfca154c3a3362acfbde34782af31afb606f6806313cc0df40928662edd3ef1d630ab1b75639154d71ed490a36e5f51f6c9d270c4062e8266ad1608bdc496a70f6696fa6e7cd7078c6674188e8a49ecba71fad049a3d483ccac45d27aedfbb31d82adb8135238b858143492b1cbda2e854e735909256365a270095fc";
byte[] bytes2 = hexToBytes(hex);
for(byte b: bytes2)
System.out.printf("%02x", b & 0xFF);
}
public static byte[] hexToBytes(String hex) {
// add a 10 to the start to avoid sign issues, or an odd number of characters.
BigInteger bi2 = new BigInteger("10" +hex, 16);
byte[] bytes2 = bi2.toByteArray();
byte[] bytes = new byte[bytes2.length-1];
System.arraycopy(bytes2, 1, bytes, 0, bytes.length);
return bytes;
}
prints
0c33b2cfca154c3a3362acfbde34782af31afb606f6806313cc0df40928662edd3ef1d630ab1b75639154d71ed490a36e5f51f6c9d270c4062e8266ad1608bdc496a70f6696fa6e7cd7078c6674188e8a49ecba71fad049a3d483ccac45d27aedfbb31d82adb8135238b858143492b1cbda2e854e735909256365a270095fc
Note: it handles the case where the input is one hex digit short at the start (an odd number of digits), as the leading 0 in the output shows.
