I am getting some unexpected results from what I thought was a simple test. After running the following:
byte [] bytes = {(byte)0x40, (byte)0xE2, (byte)0x56, (byte)0xFF, (byte)0xAD, (byte)0xDC};
String s = new String(bytes, Charset.forName("UTF-8"));
byte[] bytes2 = s.getBytes(Charset.forName("UTF-8"));
bytes2 is a 14-element array, nothing like the original bytes. Is there a way to do this sort of conversion and retain the original decomposition to bytes?
Well that doesn't look like valid UTF-8 to me, so I'm not surprised it didn't round-trip.
If you want to convert arbitrary binary data to text in a reversible way, use base64, e.g. via this public domain encoder/decoder.
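On Java 8 or later you do not even need a third-party encoder; java.util.Base64 is built in. A minimal sketch of the round trip, using the byte values from the question:
import java.util.Base64;

byte[] bytes = {(byte) 0x40, (byte) 0xE2, (byte) 0x56, (byte) 0xFF, (byte) 0xAD, (byte) 0xDC};
String text = Base64.getEncoder().encodeToString(bytes); // plain ASCII, safe to store in a String
byte[] roundTripped = Base64.getDecoder().decode(text);  // identical to the original array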
This should do:
public class Main
{
    /*
     * This method converts a String to an array of bytes
     */
    public void convertStringToByteArray()
    {
        String stringToConvert = "This String is 76 characters long and will be converted to an array of bytes";
        byte[] theByteArray = stringToConvert.getBytes();
        System.out.println(theByteArray.length);
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {
        new Main().convertStringToByteArray();
    }
}
Two things:
The byte sequence does not appear to be valid UTF-8
$ python
>>> '\x40\xe2\x56\xff\xad\xdc'.decode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 1: invalid continuation byte
Even if it were valid UTF-8, decoding and then encoding can result in different bytes due to things like precombined characters and other Unicode features.
If you want to encode arbitrary binary data in a string in a way where you are guaranteed to get the same bytes back when you decode them, your best bet is something like base64.
I get byte array using:
byte[] digitalSignature = signature.sign();
So what is the best way to save this at the end of a txt file (or any other file type), so that I can read it back when I verify the signature? My idea is to make the String "Digital signature:" and append this byte array to it in String form. I tried this:
String stringAddOnEndOfDocument = new String("Digital signature:" + new String(digitalSignature));
When I read the file, I find "Digital signature:", read the String after it, convert it to a byte array using the getBytes() method, and then delete this from the file. But I cannot verify the signature of the document with this. I suppose there is a problem with the conversion from bytes to String, but I do not know what exactly.
Here is the code showing how I verify the signature:
deleteHashDataFromEndOfFile(testFile);
byte[] messageBytes = Files.readAllBytes(Paths.get(testFile.toString()));
signature.update(messageBytes);
signature.verify(byteArray);
The best way is to use a separate file, but if you really have to store the signature at the end of the file there are a few ways to do this.
Keep in mind that new String(digitalSignature) likely mangles bytes that are not valid in the platform's default charset and therefore destroys your signature. You need to handle it always as byte[] or encode it to a printable format using Hex or Base64 encoding.
Using "Digital Signature:" as a marker that the signature follows might work, but breaks if the actual text file contains exactly this text. To fix that you can either search the whole file for that text and only take the last occurrence, or you can to that in a binary fashion and store the signature always at the end. Since signatures usually always have the same length it works by slicing or copying only the last known bytes from the file. If there is a chance that the signature might have a variable length, you can designate the last 4 bytes for example as the length of the signature and be sure that the signature is exactly before that.
You can use int lengthOfSignature = new BigInteger(theLast4Bytes).intValue() to read the value and write it with BigInteger.valueOf(lengthOfSignature).toByteArray() (make sure that it is 4 bytes long and, if necessary, pad with 0x00 bytes at the front). When reading the signature length, your code should test whether the number makes sense: that it is positive and in a range that you expect (255-257 bytes, for example). After that it is only a little bit of index math to get the signature.
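If you would rather not deal with the BigInteger padding, java.nio.ByteBuffer always produces exactly 4 big-endian bytes; a minimal sketch, assuming lengthOfSignature holds the signature length as an int:
byte[] lengthBytes = ByteBuffer.allocate(4).putInt(lengthOfSignature).array(); // always exactly 4 bytes, big-endian
int decoded = ByteBuffer.wrap(lengthBytes).getInt();                           // reads the same value back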
Writing might look like this:
byte[] messageBytes = Files.readAllBytes(file);
byte[] signature = sign(privateKey, messageBytes);
byte[] signatureLength = BigInteger.valueOf(signature.length).toByteArray();
byte[] messageOutput = new byte[messageBytes.length + signature.length + 4];
System.arraycopy(messageBytes, 0, messageOutput, 0, messageBytes.length);
System.arraycopy(signature, 0, messageOutput, messageBytes.length, signature.length);
System.arraycopy(signatureLength, 0, messageOutput, messageBytes.length + signature.length + 4 - signatureLength.length, signatureLength.length); // padding included
// TODO write messageOutput to file
Reading and verifying the signature would look like this:
byte[] msgAndSigBytes = Files.readAllBytes(file);
byte[] signatureLengthBytes = Arrays.copyOfRange(msgAndSigBytes, msgAndSigBytes.length - 4, msgAndSigBytes.length);
int signatureLength = new BigInteger(signatureLengthBytes).intValue();
// TODO check for proper size according to your signature algorithm
byte[] signature = Arrays.copyOfRange(msgAndSigBytes, msgAndSigBytes.length - 4 - signatureLength, msgAndSigBytes.length - 4);
byte[] msgBytes = Arrays.copyOf(msgAndSigBytes, msgAndSigBytes.length-4-signatureLength);
boolean success = verify(publicKey, msgBytes, signature);
If the file is large, streams should be used. When writing, normal output streams can be used, but reading would require either a seekable stream or multiple passes.
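For the seekable case, a minimal sketch of reading only the trailer with RandomAccessFile, assuming the layout from the writing code above (message, then signature, then the 4-byte length) and that file is a java.io.File:
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
    byte[] lengthBytes = new byte[4];
    raf.seek(raf.length() - 4);          // jump to the 4-byte length trailer
    raf.readFully(lengthBytes);
    int sigLen = new BigInteger(lengthBytes).intValue();
    // TODO sanity-check sigLen before allocating
    byte[] sig = new byte[sigLen];
    raf.seek(raf.length() - 4 - sigLen); // jump to the start of the signature
    raf.readFully(sig);
}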
I'm writing a web application in Google app Engine. It allows people to basically edit html code that gets stored as an .html file in the blobstore.
I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print it to an HTML page so the user can edit the HTML code. Everything works great!
Here's my only problem now:
The byte array is having some issues when converting back to a string. Smart quotes and a couple of other characters are coming out looking funky (?'s, Japanese symbols, etc.). Specifically, several bytes I'm seeing have negative values, and those are causing the problem.
The smart quotes are coming back as -108 and -109 in the byte array. Why is this and how can I decode the negative bytes to show the correct character encoding?
The byte array contains characters in a special encoding (that you should know). The way to convert it to a String is:
String decoded = new String(bytes, "UTF-8"); // example for one encoding type
By the way, the raw bytes may appear as negative decimals simply because the Java datatype byte is signed; it covers the range from -128 to 127.
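If you want to see the familiar unsigned value of such a byte, masking with 0xFF is enough; a quick sketch:
byte b = (byte) -109;
int unsigned = b & 0xFF;                           // 147
System.out.println(Integer.toHexString(unsigned)); // prints "93"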
-109 = 0x93: Control Code "Set Transmit State"
The value (-109) is a non-printable control character in UNICODE. So UTF-8 is not the correct encoding for that character stream.
0x93 in "Windows-1252" is the "smart quote" that you're looking for, so the Java name of that encoding is "Cp1252". The next line provides a test code:
System.out.println(new String(new byte[]{-109}, "Cp1252"));
Java 7 and above
You can also pass your desired encoding to the String constructor as a Charset constant from StandardCharsets. This may be safer than passing the encoding as a String, as suggested in the other answers.
For example, for UTF-8 encoding
String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
You can try this.
String s = new String(bytearray);
public class Main {
    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {
        byte b = 65;
        // Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));
        // Using simple concatenation with an empty String
        System.out.println(b + "");
        // Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}
Output
65
65
A
public static String readFile(String fn) throws IOException
{
    File f = new File(fn);
    byte[] buffer = new byte[(int) f.length()];
    try (DataInputStream is = new DataInputStream(new FileInputStream(f)))
    {
        is.readFully(buffer); // a bare read() may return before the buffer is full
    }
    return new String(buffer, "UTF-8"); // use desired encoding
}
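On Java 7 or later the same method can be written more compactly with java.nio (java.nio.file.Files, Paths and StandardCharsets); a sketch:
public static String readFile(String fn) throws IOException
{
    byte[] bytes = Files.readAllBytes(Paths.get(fn));
    return new String(bytes, StandardCharsets.UTF_8); // use desired encoding
}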
I suggest Arrays.toString(byte_array);
It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see while debugging, something like [1, 2, 3]. If you want to save exactly that value without converting the bytes to character form, Arrays.toString(byte_array) does this. But if you want to save characters instead of bytes, you should use String s = new String(byte_array). In that case, s is the character equivalent of [1, 2, 3].
The previous answer from Andreas_D is good. I'm just going to add that wherever you are displaying the output there will be a font and a character encoding and it may not support some characters.
To work out whether it is Java or your display that is a problem, do this:
for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    System.out.println(i + " : " + ch + " " + Integer.toHexString(ch) + ((ch == '\ufffd') ? " Unknown character" : ""));
}
Java will have mapped any characters it cannot understand to 0xfffd, the official character for unknown characters. If you see a '?' in the output but it is not mapped to 0xfffd, it is your display font or encoding that is the problem, not Java.
I am trying to determine if an in-house method will decode a byte array correctly given different encodings. The following code is how I approached generating data to encode.
public class Encoding {
    static byte[] VALUES = {(byte) 0x00, ..... (byte) 0xFF};
    static String[] ENCODING = {"Windows-1252", "ISO-8859-1"};

    public static void main(String[] args) throws UnsupportedEncodingException {
        for (String encode : ENCODING) {
            for (byte value : VALUES) {
                byte[] inputByte = new byte[]{value};
                String input = new String(inputByte, encode);
                String houseInput = houseMethod(input.getBytes());
            }
        }
    }
}
My question is: when it comes to making the call to the house method, what encoding will it send to that method? It is my understanding that when Java stores a String, it converts it to UTF-16. So when I call input.getBytes(), is it sending bytes in UTF-16 or in the encoding scheme that I set when I created the new String? I am guessing that it is UTF-16, but I am not sure. Should the call to the house method be:
houseMethod(input.getBytes(encode))
See String.getBytes():
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
You are well advised to use the String.getBytes(Charset) method instead and explicitly specify the desired encoding.
As per Java documentation String.getBytes():
Encodes this String into a sequence of bytes using the platform's
default charset, storing the result into a new byte array
So the bytes that the in-house method gets depend on which OS you are on, as well as your locale settings.
On the other hand, String.getBytes(encoding) ensures you get the bytes in the encoding you pass as a parameter.
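A small sketch of the difference, assuming the windows-1252 charset is available on your JRE (0x93 is the "smart quote" byte discussed earlier in this thread):
String s = new String(new byte[]{(byte) 0x93}, Charset.forName("windows-1252")); // the left smart quote, U+201C
byte[] defaultBytes = s.getBytes();                        // depends on the platform default charset
byte[] explicitBytes = s.getBytes(StandardCharsets.UTF_8); // always {(byte) 0xE2, (byte) 0x80, (byte) 0x9C}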
When reading from a file using readChar() in the RandomAccessFile class, unexpected output appears.
Instead of the desired character, ? is displayed.
package tesr;

import java.io.RandomAccessFile;
import java.io.IOException;

public class Test {
    public static void main(String[] args) {
        try {
            RandomAccessFile f = new RandomAccessFile("c:\\ankit\\1.txt", "rw");
            f.seek(0);
            System.out.println(f.readChar());
        }
        catch (IOException e) {
            System.out.println("dkndknf");
        }
        // TODO Auto-generated method stub
    }
}
You probably intended readByte. A Java char is UTF-16BE, a 2-byte Unicode representation, and two arbitrary bytes are very often not representable: they are not valid UTF-16BE, or they form half a "surrogate", part of a combination of two chars forming one Unicode code point. Java represents such a failed conversion, in your case, as a question mark.
If you know in what encoding the file is in, then for a single byte encoding it is simple:
byte b = in.readByte();
byte[] bs = new byte[] { b };
String s = new String(bs, "Cp1252"); // Some single byte encoding
For the variable multi-byte UTF-8 it is also simple to identify a sequence of bytes (a sketch follows this list):
a single byte when the high bit is 0
otherwise a continuation byte when the high bits are 10
otherwise a starting byte (with some special cases) whose high bits tell the number of bytes in the sequence
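A minimal sketch of that classification, assuming b is a byte just read from the file:
if ((b & 0x80) == 0) {
    // single-byte (ASCII) character
} else if ((b & 0xC0) == 0x80) {
    // continuation byte: step backwards to find the leading byte of the sequence
} else {
    // leading byte: the number of high 1-bits gives the total length of the sequence
}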
For UTF-16LE and UTF-16BE the file position must be a multiple of 2, and each character is 2 bytes long:
byte[] bs = new byte[2];
in.readFully(bs); // read(bs) alone is not guaranteed to fill the buffer
String s = new String(bs, StandardCharsets.UTF_16LE);
You almost certainly have a character encoding problem. It is not possible to simply read characters from a file. What must be done is that an appropriate sequence of bytes are read, then those bytes are interpreted according to a character encoding scheme to translate them to a character. When you want to read a file as text, Java must be told, perhaps implicitly, which character encoding to use.
If you tell Java the wrong encoding you will get gibberish. If you pick an arbitrary point in a file and start reading, and that location is not the start of the encoding of a character, you will get gibberish. One or both of those has happened in your case.
Is it possible to convert a byte array to a string but where the length of the string is exactly the same length as the number of bytes in the array? If I use the following:
byte[] data; // Fill it with data
data.toString();
The length of the string is different than the length of the array. I believe that this is because Java and/or Android takes some kind of default encoding into account. The values in the array can be negative as well. Theoretically it should be possible to convert any byte to some character. I guess I need to figure out how to specify an encoding that generates a fixed single byte width for each character.
EDIT:
I tried the following but it didn't work:
byte[] textArray; // Fill this with some text.
String textString = new String(textArray, "ASCII");
textArray = textString.getBytes("ASCII"); // textArray ends up with different data.
You can use the String constructor String(byte[] data) to create a string from the byte array. If you want to specify the charset as well, you can use the String(byte[] data, Charset charset) constructor.
Try your code sample with US-ASCII or ISO-8859-1 in place of ASCII. ASCII is not a built-in character encoding for Java or Android, but those two are. They are guaranteed single-byte encodings, with the caveat that characters not in the character set will be silently replaced (so only ISO-8859-1 round-trips every byte value).
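A small sketch of the ISO-8859-1 round trip, using StandardCharsets (Java 7+, Android API 19+); US-ASCII would not survive it because 0xFF and 0x80 are outside its range:
byte[] data = {(byte) 0x41, (byte) 0xFF, (byte) 0x80};
String textString = new String(data, StandardCharsets.ISO_8859_1);
byte[] back = textString.getBytes(StandardCharsets.ISO_8859_1); // same three bytes, and textString.length() == 3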
This should work fine!
public static byte[] stringToByteArray(String pStringValue) {
    int length = pStringValue.length();
    byte[] bytes = new byte[length];
    for (int index = 0; index < length; index++) {
        char ch = pStringValue.charAt(index);
        bytes[index] = (byte) ch;
    }
    return bytes;
}
Since JDK 1.6:
You can also use stringValue.getBytes(Charset), which will return a byte array (the no-argument getBytes() uses the platform default charset).
If a null String is passed in, you need to handle that, either by throwing a NullPointerException or by handling it inside the method itself.
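One possible way to add that null handling on top of the method above (a sketch; stringToByteArrayOrThrow is just a made-up name):
public static byte[] stringToByteArrayOrThrow(String pStringValue) {
    // Fail fast with a clear message instead of an NPE somewhere inside the loop.
    java.util.Objects.requireNonNull(pStringValue, "pStringValue must not be null");
    return stringToByteArray(pStringValue); // delegates to the method shown above
}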