Encoding String to "modified UTF-8" for the DataInput

I would like to encode a String value into modified UTF-8 bytes. Something like:
byte[] bytes = MagicEncoder.encode(str, "modified UTF-8");
DataInput input = new DataInputStream(new ByteArrayInputStream(bytes));
Each read*() method of the DataInput has to be able to read the underlying bytes properly.

Use DataOutputStream:
ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
DataOutputStream dataOutputStream = new DataOutputStream(byteOutputStream);
dataOutputStream.writeUTF("some string to write");
dataOutputStream.close();
The result is available in byteOutputStream.toByteArray(). Note that writeUTF puts a two-byte length in front of the encoded characters, which is what DataInput.readUTF() expects.
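As a minimal sketch of the full round trip, the following writes a string with writeUTF and reads it back through readUTF. The class and method names (ModifiedUtf8RoundTrip, encode, decode) are made up for illustration and are not part of the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ModifiedUtf8RoundTrip {

    // Encodes a string the way DataOutput.writeUTF does:
    // a 2-byte big-endian length followed by the modified UTF-8 bytes.
    public static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Reads the string back through a DataInputStream.
    public static String decode(byte[] encoded) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "a\u0000\u20AC"; // 'a', NUL, euro sign
        byte[] encoded = encode(original);
        // 2 length bytes + 1 ('a') + 2 (NUL as C0 80) + 3 (euro sign) = 8 bytes
        System.out.println(encoded.length);
        System.out.println(decode(encoded).equals(original));
    }
}
```

Because the length prefix is an unsigned 16-bit value, writeUTF throws UTFDataFormatException when the encoded form exceeds 65535 bytes, so longer strings need a different framing.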

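As info:
Modified UTF-8 differs from standard UTF-8 in two ways. The nul character U+0000, normally encoded as the single byte 0, is written as the byte sequence C0 80, i.e. in the multi-byte form normally used for codes > 0x7F. Characters outside the Basic Multilingual Plane are written as two three-byte sequences, one per UTF-16 surrogate, instead of one four-byte sequence.
(Hence, for text without such supplementary characters, it is enough to take the standard UTF-8 bytes and replace every 0 byte. Be aware that a strict standard UTF-8 decoder may reject C0 80 as an overlong encoding.)
byte[] originalBytes = str.getBytes(StandardCharsets.UTF_8); // standard UTF-8 bytes of the input
// Count the nul bytes; each one grows by one byte when converted.
int nulCount = 0;
for (int i = 0; i < originalBytes.length; ++i) {
if (originalBytes[i] == 0) {
++nulCount;
}
}
byte[] convertedBytes = new byte[originalBytes.length + nulCount];
for (int i = 0, j = 0; i < originalBytes.length; ++i, ++j) {
convertedBytes[j] = originalBytes[i];
if (originalBytes[i] == 0) {
// Replace the single 0 byte with the two-byte sequence C0 80.
convertedBytes[j] = (byte) 0xC0;
++j;
convertedBytes[j] = (byte) 0x80;
}
}
It would be better to use System.arraycopy for the runs between nul bytes, and to skip the conversion entirely when nulCount == 0.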
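If the bytes are needed without the length prefix that writeUTF adds, the encoding can also be produced directly from the String's UTF-16 code units. The following is a sketch under that assumption; ModifiedUtf8Encoder is a hypothetical helper, not a JDK class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Encoder {

    // Encodes each UTF-16 code unit on its own, so a surrogate pair becomes
    // two 3-byte sequences and U+0000 becomes the 2-byte sequence C0 80.
    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                out.write(c);                        // 1 byte: plain ASCII except NUL
            } else if (c <= 0x07FF) {
                out.write(0xC0 | (c >> 6));          // 2 bytes: includes U+0000
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));         // 3 bytes: includes surrogates
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "a\u0000\uD83D\uDE00"; // 'a', NUL, one supplementary character
        System.out.println(encode(s).length);                          // modified UTF-8 length
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // standard UTF-8 length
    }
}
```

The two lengths printed by main differ because standard UTF-8 uses one byte for NUL and four bytes for the supplementary character, while the modified form uses two and six.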

Related

Buffer to String

I send and receive data between an Android device and another device over USB.
The code that I use to receive data:
StringBuilder stringBuilder = new StringBuilder();
int i = 0;
int s = buffer[0];
for (; i < s; i++) {
stringBuilder.append(String.valueOf((char)buffer[i]));
}
byte[] b = String.valueOf(stringBuilder).getBytes();
I receive all the bytes correctly, except when a byte is greater than 127. What should I do?
I tried to use:
stringBuilder2.append(String.valueOf((int)buffer[i] & 0xFF));
This works fine if I read String.valueOf(stringBuilder), but not when I create the byte[].

If all the bytes you are receiving are ASCII, your stringBuilder already contains the text of the String you need.
Otherwise, assuming that buffer[0] is the length of the data that follows it, you could try something like this:
int length = buffer[0] & 0xFF; // treat the length byte as unsigned
byte[] tmp = new byte[length];
System.arraycopy(buffer, 1, tmp, 0, length);
String result = new String(tmp, StandardCharsets.ISO_8859_1); // maps each byte 0..255 to exactly one char
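A self-contained sketch of the same idea follows. It assumes, as the answer does, that the first byte of the buffer is an unsigned length and that the payload starts at index 1; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LengthPrefixedBuffer {

    // Assumes buffer[0] holds the payload length as an unsigned byte (0..255)
    // and the payload starts at index 1.
    public static byte[] payload(byte[] buffer) {
        int length = buffer[0] & 0xFF; // mask so lengths above 127 do not go negative
        return Arrays.copyOfRange(buffer, 1, 1 + length);
    }

    // ISO-8859-1 maps every byte value to exactly one char, so the
    // conversion to String and back is lossless for arbitrary bytes.
    public static String toLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buffer = {3, 0x41, (byte) 0xE9, (byte) 0xFF, 0x7A};
        byte[] data = payload(buffer);
        String text = toLatin1(data);
        byte[] back = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(data, back));
    }
}
```

If the bytes are binary data rather than text, it is usually simpler to keep them as a byte[] and skip the String conversion altogether.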

What is CharsetDecoder.decode(ByteBuffer, CharBuffer, endOfInput)

I have a problem with the CharsetDecoder class.
Here is a first code example (which works):
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final ByteBuffer b = ByteBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
b.put(tab, i, 1);
}
try {
b.flip();
System.out.println("a" + dec.decode(b).toString() + "a");
} catch (CharacterCodingException e1) {
e1.printStackTrace();
}
The result is a€a
But when I execute this code:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
dec.decode(buffer, chars, i == 2);
}
dec.flush(chars);
System.out.println("a" + chars.toString() + "a");
The result is a
Why is the result not the same?
How do I use the method decode(ByteBuffer, CharBuffer, endOfInput) of the CharsetDecoder class to get the result a€a?
-- EDIT --
Based on Jesper's code, I came up with the following. It is not perfect, but it works with step = 1, 2 and 3:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(6);
final byte[] tab = new byte[]{(byte)97, (byte)-30, (byte)-126, (byte)-84, (byte)97, (byte)97}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
final int step = 3;
for (int i = 0; i < tab.length; i++) {
// Add the next byte to the buffer
buffer.put(tab, i, step);
i+=step-1;
// Remember the current position
final int pos = buffer.position();
int l=chars.position();
// Try to decode
buffer.flip();
final CoderResult result = dec.decode(buffer, chars, i >= tab.length -1);
System.out.println(result);
if (result.isUnderflow() && chars.position() == l) {
// Underflow, prepare the buffer for more writing
buffer.position(pos);
}else{
if (buffer.position() == buffer.limit()){
//ByteBuffer decoded
buffer.clear();
buffer.position(0);
}else{
//a part of ByteBuffer is decoded. We keep only bytes which are not decoded
final byte[] b = buffer.array();
final int f = buffer.position();
final int g = buffer.limit() - buffer.position();
buffer.clear();
buffer.position(0);
buffer.put(b, f, g);
}
}
buffer.limit(buffer.capacity());
}
dec.flush(chars);
chars.flip();
System.out.println(chars.toString());

The method decode(ByteBuffer, CharBuffer, boolean) returns a result, but you are ignoring it. If you print the result in your second code fragment:
for (int i = 0; i < tab.length; i++) {
ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
System.out.println(dec.decode(buffer, chars, i == 2));
}
you'll see this output:
UNDERFLOW
MALFORMED[1]
MALFORMED[1]
a a
Apparently it does not work correctly if you start decoding in the middle of a character. The decoder expects that the first thing it reads is the start of a valid UTF-8 sequence.
edit - When the decoder reports UNDERFLOW, it expects you to add more data to the input buffer and then call decode() again, but you must re-offer it the data from the start of the UTF-8 sequence that you are trying to decode. You can't continue in the middle of a UTF-8 sequence.
Here is a version that works, adding one byte from tab in every iteration of the loop:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte) -30, (byte) -126, (byte) -84}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
for (int i = 0; i < tab.length; i++) {
// Add the next byte to the buffer
buffer.put(tab[i]);
// Remember the current position
final int pos = buffer.position();
// Try to decode
buffer.flip();
final CoderResult result = dec.decode(buffer, chars, i == 2);
System.out.println(result);
if (result.isUnderflow()) {
// Underflow, prepare the buffer for more writing
buffer.limit(buffer.capacity());
buffer.position(pos);
}
}
dec.flush(chars);
chars.flip();
System.out.println("a" + chars.toString() + "a");

The decoder does not internally cache the data from partial characters, but this does not mean that you have to do complicated things to figure out what data to re-feed the decoder. You gave it a clear way to represent what data it actually consumed, i.e. the input ByteBuffer and its position. In the second example, by giving it a new ByteBuffer every time, the OP failed to pass the decoder back what it reported it had not yet consumed.
The standard pattern for using NIO Buffers is input, flip, output, compact, loop. Short of optimization (which may be premature), there is no reason to re-implement compact yourself. You might just get it wrong, as @Jesper and @lecogiteur did (their code fails if more than a single character is ever presented). You should NOT be resetting to the position from before the decode call.
The second example should have read something like:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
final ByteBuffer buffer = ByteBuffer.wrap(new byte[3]);
for (int i=0; i<tab.length; i++){
buffer.put(tab, i, 1); // In actual usage some type of IO read/transfer would occur here
buffer.flip();
dec.decode(buffer, chars, i == 2);
buffer.compact(); // keep any bytes the decoder has not consumed yet
}
dec.flush(chars);
chars.flip(); // switch the CharBuffer from writing to reading before printing
System.out.println("a" + chars.toString() + "a");
NOTE: The above does not check the return value to detect malformed input, nor does it include the other error handling needed to run safely on arbitrary input/IO conditions.
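The input, flip, output, compact loop described above can be sketched with result checking as follows. The class name ChunkedUtf8Decoder and the step parameter are assumptions made for the example; the byte array stands in for data arriving from real IO.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedUtf8Decoder {

    // Decodes 'data' by feeding it to the decoder 'step' bytes at a time.
    public static String decode(byte[] data, int step) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Room for one chunk plus the tail of an incomplete sequence.
        ByteBuffer in = ByteBuffer.allocate(step + 8);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(data.length + 1);

        for (int offset = 0; offset < data.length; offset += step) {
            int n = Math.min(step, data.length - offset);
            in.put(data, offset, n);                 // input
            in.flip();                               // flip
            CoderResult result = decoder.decode(in, out, false); // output
            if (result.isError()) {
                result.throwException();
            }
            in.compact();                            // compact: keep unconsumed bytes
        }

        // Signal end of input; leftover bytes of a partial sequence are reported as malformed.
        in.flip();
        CoderResult last = decoder.decode(in, out, true);
        if (last.isError()) {
            last.throwException();
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] tab = {97, (byte) -30, (byte) -126, (byte) -84, 97, 97}; // "a€aa"
        for (int step = 1; step <= 3; step++) {
            System.out.println(decode(tab, step));
        }
    }
}
```

Passing false for endOfInput inside the loop and making one final call with true keeps the loop body identical for every chunk, including the case of an empty input.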

Not able to buffer special characters

while (!bStop) {
byte[] buffer = new byte[256];
if (inputStream.available() > 0) {
inputStream.read(buffer);
int i = 0;
for (i = 0; i < buffer.length && buffer[i] != 0; i++) {
}
final String strInput = new String(buffer, 0, i);
System.out.println(strInput);
}
}
The input stream data arrives as encrypted bytes. When I print the data I get funny characters. How can I convert the input stream directly to hexadecimal, in a form like 01 2A 03 AA?
Please help.

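Try it like this (ByteStreams is the Guava class com.google.common.io.ByteStreams), which reads the whole stream into a byte array:
byte[] array = ByteStreams.toByteArray(inputStream);
Each byte can then be formatted as two hex digits, for example with String.format("%02X", b & 0xFF).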
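For the hexadecimal output the question asks for, a minimal sketch using only the JDK could look like this. HexDump and toHex are hypothetical names, and the ByteArrayInputStream stands in for the real input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HexDump {

    // Formats 'length' bytes starting at 'offset' as upper-case hex pairs separated by spaces.
    public static String toHex(byte[] data, int offset, int length) {
        StringBuilder sb = new StringBuilder(length * 3);
        for (int i = offset; i < offset + length; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes print as two digits, not eight.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream inputStream = new ByteArrayInputStream(new byte[] {0x01, 0x2A, 0x03, (byte) 0xAA});
        byte[] buffer = new byte[256];
        int read = inputStream.read(buffer); // number of bytes actually read, or -1 at end of stream
        if (read > 0) {
            System.out.println(toHex(buffer, 0, read)); // prints "01 2A 03 AA"
        }
    }
}
```

Using the return value of read, rather than scanning for a 0 byte as in the question, matters here because encrypted data can legitimately contain 0 bytes.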

Checksum doesn't match when computed string contains accented characters

First: I have a string which contains an accented character.
Second: I calculate the checksum for it.
private static String checkSumInStream(String Str, String checksumAlgorithm) throws Exception
{
InputStream stream = new ByteArrayInputStream(Str.getBytes());
MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
InputStream input = null;
StringBuffer sb = new StringBuffer();
try{
input = stream;
byte[] buffer = new byte[8192];
do {
int read = input.read(buffer);
if(read <= 0)
break;
digest.update(buffer, 0, read);
} while(true);
byte[] sum = digest.digest();
for (int i = 0; i < sum.length; i++) {
sb.append(Integer.toString((sum[i] & 0xff) + 0x100, 16).substring(1));
}
}catch(IOException io)
{
}finally{
if(input != null)
input.close();
}
return sb.toString();
}
Then I write the string to a text file and recalculate the checksum from the file:
private String checkSum(File file,String checksumAlgorithm) throws Exception
{
MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
InputStream input = null;
input = new FileInputStream(file);
byte[] buffer = new byte[8192];
do {
int read = input.read(buffer);
if(read <= 0)
break;
digest.update(buffer, 0, read);
} while(true);
input.close();
byte[] sum = digest.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < sum.length; i++) {
sb.append(Integer.toString((sum[i] & 0xff) + 0x100, 16).substring(1));
}
return sb.toString();
}
--> Result: the checksum computed from the string and the checksum computed from the file do not match when the text contains an accented character.

How do you write the String to a file? You must be very careful to do that in the equivalent way of how you read it back from the file.
In your case:
OutputStream out = new FileOutputStream(myfile);
out.write(str.getBytes());
out.close();
Then it should work. But you need to keep in mind that str.getBytes() is not a safe method to use when you write to files, because it uses the platform default encoding for your characters. If you send such a file to some other place and use it there, you may be reading it back with the wrong encoding.
And it's possible that your platform default encoding doesn't even support accented characters! (But if you write and read files in exactly the same way, then you should get exactly the same result, so this wouldn't be the cause of your problem)
The best thing to do is to use the UTF-8 character encoding.
Wherever you used str.getBytes(), replace it with str.getBytes("UTF-8"), or with str.getBytes(Charset.forName("UTF-8")) if you want to avoid having to catch UnsupportedEncodingException (which is annoying, since every Java implementation is required to support the UTF-8 encoding). On Java 7 and later, str.getBytes(StandardCharsets.UTF_8) does the same without the lookup.
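To illustrate the point about writing and reading with the same explicit encoding, here is a small sketch that hashes the UTF-8 bytes of a string and the bytes read back from a temporary file. The class name Utf8Checksum and the choice of SHA-256 are assumptions for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Utf8Checksum {

    // Formats a digest as lower-case hex, two digits per byte.
    public static String hex(byte[] sum) {
        StringBuilder sb = new StringBuilder(sum.length * 2);
        for (byte b : sum) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Computes the checksum of a byte array with the named algorithm.
    public static String checksum(byte[] data, String algorithm) throws NoSuchAlgorithmException {
        return hex(MessageDigest.getInstance(algorithm).digest(data));
    }

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        String text = "caf\u00E9"; // contains an accented character
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8); // explicit encoding

        Path file = Files.createTempFile("checksum", ".txt");
        try {
            Files.write(file, bytes); // the same bytes that were hashed
            String fromString = checksum(bytes, "SHA-256");
            String fromFile = checksum(Files.readAllBytes(file), "SHA-256");
            System.out.println(fromString.equals(fromFile));
        } finally {
            Files.delete(file);
        }
    }
}
```

The checksums agree because the file contains exactly the bytes that were hashed; a mismatch appears as soon as the file is written through a Writer or getBytes() call that uses a different encoding.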

convert byte array to string, string can't convert back after transfer

Code for the same:
public byte[] stringToBytesUTFCustom(String str) {
char[] buffer1 = str.toCharArray();
byte[] b = new byte[buffer1.length << 1];
for(int i = 0; i < buffer1.length; i++) {
int bpos = i << 1;
b[bpos] = (byte) ((buffer1[i]&0xFF00)>>8);
b[bpos + 1] = (byte) (buffer1[i]&0x00FF);
}
return b;
}
public String bytesToStringUTFCustom(byte[] bytes) {
char[] buffer = new char[bytes.length >> 1];
for(int i = 0; i < buffer.length; i++) {
int bpos = i << 1;
char c = (char)(((bytes[bpos]&0x00FF)<<8) + (bytes[bpos+1]&0x00FF));
buffer[i] = c;
}
String txt = String.valueOf(buffer);
//return new String(buffer);
return txt;
}
I am implementing an SMS encryption app (client to client) and want to encode the cipher (a byte[]) as a string. Base64 works, but I can't send the result because it is longer than 160 characters.
I want to convert the byte array to a string. The functions above work when I call one directly after the other, but when I use bytesToStringUTFCustom and then send the resulting text as an SMS, it does not work.
The receiver cannot read the text in order to decode it.
The cipher text is the result of the bytesToStringUTFCustom function. Can anyone help me?
Thanks.

Do you know these?
String.getBytes(Charset encoding)
new String(byte[] byteArray, Charset encoding)
You can use Charset.forName(String) to get the Charset.
Charset UTF8 = Charset.forName("UTF-8");
byte[] bytes = str.getBytes(UTF8);
String reverted = new String(bytes, UTF8);
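One caveat worth checking: the charset round trip above starts from text. Going the other way, from arbitrary cipher bytes to a String and back, is not guaranteed to be lossless with UTF-8. The sketch below compares the two cases and uses java.util.Base64 (Java 8 and later) as a binary-safe alternative; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class TextVersusBinary {

    // True if the bytes come back unchanged after bytes -> String -> bytes via UTF-8.
    public static boolean survivesUtf8(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, s.getBytes(StandardCharsets.UTF_8));
    }

    // True if the bytes come back unchanged after a Base64 encode/decode.
    public static boolean survivesBase64(byte[] data) {
        String s = Base64.getEncoder().encodeToString(data);
        return Arrays.equals(data, Base64.getDecoder().decode(s));
    }

    public static void main(String[] args) {
        byte[] text = "h\u00E9llo".getBytes(StandardCharsets.UTF_8); // valid UTF-8
        byte[] cipherLike = {(byte) 0xFF, (byte) 0xFE, 0x00, (byte) 0x80}; // not valid UTF-8

        System.out.println(survivesUtf8(text));
        System.out.println(survivesUtf8(cipherLike));
        System.out.println(survivesBase64(cipherLike));
    }
}
```

Invalid sequences are replaced with U+FFFD when decoded as UTF-8, which is why the cipher-like bytes do not survive that path. Base64 grows the data by roughly a third, so the length limit mentioned in the question still has to be handled separately, for example by splitting the message.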
