I am trying to convert a byte[] to a Base64 string using
org.apache.commons.codec.binary.Base64. My Java code looks like this:
base64String = Base64.encodeBase64URLSafeString(myByteArray);
But I see what look like invalid characters in the generated Base64 string.
Why do I see these ____ runs in my generated Base64 string?
Is it a valid string?
Note that the length of the generated string is divisible by four.
Have you tried the encodeBase64String method instead of encodeBase64URLSafeString?
encodeBase64URLSafeString:
Encodes binary data using a URL-safe variation of the base64 algorithm
but does not chunk the output. The url-safe variation emits - and _
instead of + and / characters.
encodeBase64String:
Encodes binary data using the base64 algorithm but does not chunk the
output. NOTE: We changed the behaviour of this method from multi-line
chunking (commons-codec-1.4) to single-line non-chunking
(commons-codec-1.5).
Source: org.apache.commons.codec.binary.Base64 Javadoc
Use
String sCert = javax.xml.bind.DatatypeConverter.printBase64Binary(pk);
This may be helpful
sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
encoder.encode(byteArray);
You've got (at least) two flavours of Base64: the original one, which uses '+' and '/' in addition to alphanumeric characters, and the "URL-safe" one, which uses '-' and '_' so that the content can be embedded in a URL (or used as a filename, by the way).
It looks like you're using a Base64 encoder that has been switched to "URL-safe mode".
See Apache's Javadoc for encodeBase64URLSafeString().
Oh, and '_' being the last character of the base64 alphabet, seeing strings of "____" means you've been encoding chunks of 0xffffff ... , just like seeing "AAAAAA" means there's a lot of consecutive zeroes.
If you want to be convinced that it's a normal case, just pick your favourite hexadecimal dumper/editor and check what your binary input looked like.
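To see the two alphabets side by side, here is a minimal sketch. It uses java.util.Base64 from the JDK (Java 8 and later) instead of commons-codec, and the class name Base64Flavours is made up for illustration. Three 0xFF bytes are 24 one-bits, i.e. four 6-bit groups of value 63, which is the last character of either alphabet.

```java
import java.util.Base64;

final class Base64Flavours {
    /** Standard alphabet: values 62 and 63 are written as '+' and '/'. */
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** URL-safe alphabet: values 62 and 63 are written as '-' and '_'. */
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] ones = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        byte[] zeros = {0, 0, 0};
        System.out.println(standard(ones));  // ////
        System.out.println(urlSafe(ones));   // ____
        System.out.println(urlSafe(zeros));  // AAAA
    }
}
```

So a run of underscores is not corruption; it is simply how the URL-safe variant writes long runs of 0xFF bytes.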
You can convert a byte array to Base64 as follows:
// Convert a byte array to base64 string
byte[] bytearray= new byte[]{0x12, 0x23};
String s = new sun.misc.BASE64Encoder().encode(bytearray);
// Convert base64 string to a byte array
bytearray = new sun.misc.BASE64Decoder().decodeBuffer(s);
Edit:
For a guide, see the Apache Commons documentation.
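The sun.misc classes are JDK internals and are not available on recent JDKs. As a hedged alternative, the same round trip can be sketched with java.util.Base64 (Java 8 and later); the class name Base64RoundTrip and its helper methods are made up for this example.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    /** Convert a byte array to a Base64 string (standard alphabet, with padding). */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Convert a Base64 string back to the original bytes. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytearray = {0x12, 0x23};
        String s = encode(bytearray);
        System.out.println(s);                                    // EiM=
        System.out.println(Arrays.equals(bytearray, decode(s)));  // true
    }
}
```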
I am working on converting a string from one charset to another. After reading many examples, I found the code below, which looks fine to me, but as a newbie to charset encoding I want to know whether it is the right way to do it.
public static byte[] transcodeField(byte[] source, Charset from, Charset to) {
return new String(source, from).getBytes(to);
}
To convert String from ASCII to EBCDIC, I have to do:
System.out.println(new String(transcodeField(ebytes,
Charset.forName("US-ASCII"), Charset.forName("Cp1047"))));
And to convert from EBCDIC to ASCII, I have to do:
System.out.println(new String(transcodeField(ebytes,
Charset.forName("Cp1047"), Charset.forName("US-ASCII"))));
The code you found (transcodeField) doesn't convert a String from one encoding to another, because a String doesn't have an encoding¹. It converts bytes from one encoding to another. The method is only useful if your use case satisfies 2 conditions:
Your input data is bytes in one encoding
Your output data needs to be bytes in another encoding
In that case, it's straightforward:
byte[] out = transcodeField(inbytes, Charset.forName(inEnc), Charset.forName(outEnc));
If the input data contains characters that can't be represented in the output encoding (for example, when converting non-ASCII UTF-8 text to ASCII), those characters will be replaced with the ? replacement symbol, and the data will be corrupted.
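A small sketch of that replacement behaviour, reusing the transcodeField method from the question inside a made-up class TranscodeDemo. The sample text "café" is my own choice; it is written with a Unicode escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeDemo {
    /** Decode the bytes with 'from', then re-encode the characters with 'to'. */
    static byte[] transcodeField(byte[] source, Charset from, Charset to) {
        return new String(source, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes: the e-acute takes 2

        // ISO-8859-1 can represent the e-acute, so nothing is lost (4 bytes).
        byte[] latin1 = transcodeField(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length); // 4

        // US-ASCII cannot, so the character is silently replaced by '?'.
        byte[] ascii = transcodeField(utf8, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // caf?
    }
}
```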
However a lot of people ask "How do I convert a String from one encoding to another", to which a lot of people answer with the following snippet:
String s = new String(source.getBytes(inputEncoding), outputEncoding);
This is complete bull****. The getBytes(String encoding) method returns a byte array with the characters encoded in the specified encoding (if possible, again invalid characters are converted to ?). The String constructor with the 2nd parameter creates a new String from a byte array, where the bytes are in the specified encoding. Now since you just used source.getBytes(inputEncoding) to get those bytes, they're not encoded in outputEncoding (except if the encodings use the same values, which is common for "normal" characters like abcd, but differs with more complex like accented characters éêäöñ).
So what does this mean? It means that when you have a Java String, everything is great. Strings are Unicode, meaning that all of your characters are safe. The problem comes when you need to convert that String to bytes, meaning that you need to decide on an encoding. Choosing a Unicode-compatible encoding such as UTF-8 or UTF-16 is great. It means your characters will still be safe even if your String contained all sorts of weird characters. If you choose a different encoding (with US-ASCII being the least supportive), your String must contain only the characters supported by that encoding, or the result will be corrupted bytes.
Now finally some examples of good and bad usage.
String myString = "Feng shui in chinese is 風水";
byte[] bytes1 = myString.getBytes("UTF-8"); // Bytes correct
byte[] bytes2 = myString.getBytes("US-ASCII"); // Last 2 characters are now corrupted (converted to question marks)
String nordic = "Här är några merkkejä";
byte[] bytes3 = nordic.getBytes("UTF-8"); // Bytes correct, "weird" chars take 2 bytes each
byte[] bytes4 = nordic.getBytes("ISO-8859-1"); // Bytes correct, "weird" chars take 1 byte each
String broken = new String(nordic.getBytes("UTF-8"), "ISO-8859-1"); // Contains now "HÃ¤r Ã¤r nÃ¥gra merkkejÃ¤"
The last example demonstrates that even though both of the encodings support the nordic characters, they use different bytes to represent them and using the wrong encoding when decoding results in Mojibake. Therefore there's no such thing as "converting a String from one encoding to another", and you should never use the broken example.
Also note that you should always specify the encoding used (with both getBytes() and new String()), because you can't trust that the default encoding is always the one you want.
As a last issue, Charset and Encoding aren't the same thing, but they're very much related.
¹ Technically the way a String is stored internally in the JVM is in UTF-16 encoding up to Java 8, and variable encoding from Java 9 onwards, but the developer doesn't need to care about that.
NOTE
It's possible to have a corrupted String and be able to uncorrupt it by fiddling with the encoding, which may be where this "convert String to other encoding" misunderstanding originates from.
// Input comes from network/file/other place and we have misconfigured the encoding
String input = "HÃ¤r Ã¤r nÃ¥gra merkkejÃ¤"; // UTF-8 bytes, wrongly interpreted as ISO-8859-1
byte[] bytes = input.getBytes("ISO-8859-1"); // Get each char as single byte
String asUtf8 = new String(bytes, "UTF-8"); // Recreate String as UTF-8
If no characters were corrupted in input, the string would now be "fixed". However, the proper approach is to use the correct encoding when reading input, not to fix it afterwards, especially if there's a chance of it becoming corrupted.
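The garble-and-repair cycle can be sketched end to end as follows. This is only an illustration: the class MojibakeDemo and its method names are made up, and the short sample "Här" is written with Unicode escapes so the demo does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Simulates a misconfigured reader: UTF-8 bytes decoded as ISO-8859-1. */
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Undoes garble(): recover the raw bytes, then decode them as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "H\u00e4r";              // "Här"
        String garbled = garble(original);         // 4 chars: 'H', U+00C3, U+00A4, 'r'
        System.out.println(garbled.length());      // 4
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair only works here because ISO-8859-1 maps every byte value to a character, so no byte was lost on the way in. With a decoder that substitutes replacement characters, the original bytes may be unrecoverable.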
I have a Japanese String, 文字列, that I want to convert to UTF-8 encoding. This question may be a duplicate; I have googled for some time but was not able to find a direct answer.
Encoding a String is the process of transforming a sequence of characters into a sequence of bytes.
For that use the getBytes() method.
This method accepts an encoding parameter, which defines the encoding used in this process. Therefore, you can use:
byte[] encoded = "文字列".getBytes("UTF-8");
As per Jon Skeet's comment, don't use magic strings:
byte[] encoded = "文字列".getBytes(StandardCharsets.UTF_8);
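As a quick check of what that call produces, here is a hedged sketch (the class name Utf8EncodeDemo is made up). The three characters are written as Unicode escapes so the example compiles the same way regardless of the source file's encoding; each of them lies in the range that UTF-8 encodes with three bytes.

```java
import java.nio.charset.StandardCharsets;

final class Utf8EncodeDemo {
    /** Encode characters into UTF-8 bytes. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decode UTF-8 bytes back into characters. */
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u6587\u5b57\u5217"; // 文字列
        byte[] encoded = encode(text);
        System.out.println(encoded.length);               // 9
        System.out.println(decode(encoded).equals(text)); // true
    }
}
```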
byte[] commonsDecode = Base64.decodeBase64(data);
Log.debug("The data is " + commonsDecode.length + " bytes long for the apache commons base64 decoder.");
BASE64Decoder decoder = new BASE64Decoder();
byte[] sunDecode = decoder.decodeBuffer(data);
Log.debug("The data is " + sunDecode.length + " bytes long for the SUN base64 decoder.");
Please explain to me why these two method calls would produce different length for the resulting byte arrays. I initially thought it might have to do with character encodings but if so I don't understand all of the issues properly. The above code was executed on the same system and in the same application, in the order shown above. So the default character encoding on that system would be the same.
The input (test) data:
The below is a System.out.println of the Java String.
qFkIQgDq jk3ScHpqx8BPVS97YE4pP/nBl5Qw7mBnpSGqNqSdGIkLPVod0pBl Uz7NgpizHDicGzNCaauefAdwGklpPr0YdwCu4wRkwyAuvtDmL0BYASOn2tDw72LMz5FChtSa0CoCBQ2ARsFG2GdflnIWsUuBQapX73ZBMiqqm ZCOnMRv9Ol8zT1TECddlKZMYAvmjANgq0sBPyUMF7co XY9BYAjV3L/cA8CGQpXGdrsAgjPKMhzk4hh1GAoQ1soX2Dva8p3erPJ4sy2Vcb6lS1Hap9FR0AZFawbJ10FFSTg10wxc24539kYA6xxq/TFqkhaEoSyTqjXjvo1SA==
Apache commons decoder says it's 252 length byte array.
Java Sun decoder says 256.
The input data is not valid Base64 data.
Valid Base64 data can contain whitespace; MIME-style output, for example, has a line break every 76 characters. However, your data contains spaces in random places. If they are removed (as lenient Base64 decoders do), 339 characters remain. Yet valid padded Base64 data has to be a multiple of 4 characters long.
Interestingly, your data contains no plus signs. I suspect it once contained them but they have probably been replaced with spaces somewhere in transmission. If you replace all spaces with plus signs, the Base64 data is valid and the decoded data will have a length of 256 bytes: 344 characters / 4 * 3 - 2 padding characters.
I further suspect that the Base64 data was used in a URL without proper URL encoding. That's a probable cause for the missing plus signs. Note that Base64 encoded data is not URL safe. Both the plus and the equal signs need to be escaped.
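A minimal sketch of the repair and of the length arithmetic above, assuming the spaces really are lost plus signs. It uses java.util.Base64 rather than the two decoders from the question, and the class and method names are made up. The three sample bytes were picked because they encode to "++++".

```java
import java.util.Base64;

final class RepairBase64FromUrl {
    /** Restores '+' characters that arrived as spaces, then decodes strictly. */
    static byte[] repairAndDecode(String damaged) {
        String repaired = damaged.replace(' ', '+');
        return Base64.getDecoder().decode(repaired);
    }

    /** Decoded size: 3 bytes per 4 characters, minus one byte per '=' pad. */
    static int decodedLength(int encodedChars, int padChars) {
        return encodedChars / 4 * 3 - padChars;
    }

    /** URL-safe alternative that avoids '+', '/' and '=' altogether. */
    static String encodeForUrl(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xFB, (byte) 0xEF, (byte) 0xBE};
        String sent = Base64.getEncoder().encodeToString(original); // "++++"
        String received = sent.replace('+', ' '); // what an unescaped URL parameter can become
        System.out.println(repairAndDecode(received).length); // 3
        System.out.println(decodedLength(344, 2));            // 256
        System.out.println(encodeForUrl(original));           // ----
    }
}
```

If you control the sending side, using the URL-safe alphabet (or URL-encoding the value properly) avoids the problem in the first place.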
SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
byte[] salt = new byte[16];
random.nextBytes(salt);
I would like to convert salt to a string to store/read. I don't seem to be able to get this to work. I have read that I need to use the right encoding but I'm not sure what encoding to use. I have tried the following but get junk:
String s = new String(salt, "UTF-8");
String s = new String(salt, "UTF-16");
String s = new String(salt);
Edit: For context, I'm trying to work through and understand this code. I'm trying to view the salt and password so I can monkey with the code.
You need to use the Base64 class (Apache Commons) or sun.misc.BASE64Encoder/BASE64Decoder to encode the byte array.
Like AVD says, the solution is to use Base64 encoding or some other binary-as-text encoding. (For example, Hex encoding.)
Why? Because binary data is not text!
What you are currently doing is telling the String constructor that the bytes are text that has been correctly encoded as UTF-8 or UTF-16 or (in the last case) the platform's default encoding. This is patently false. The "junk" you are seeing is what you get if you attempt to decode random binary stuff as text.
Worse still, the decoding process is probably lossy when you apply it to random binary data. For instance, some sequences of bytes are simply invalid if you try to treat them as UTF-8. (The spec for UTF-8 says so!) When the UTF-8 decoder sees one of these invalid sequences, it replaces it with a character (such as a '?') that means "invalid character". If you then turn the characters in the string back into bytes, you will get a different byte sequence to the one that you started with. That's probably a disaster for your use-case.
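Here is a hedged sketch of both points: storing a salt as Base64 or hex text, and how decoding arbitrary bytes as UTF-8 loses information. It uses java.util.Base64 from the JDK (Java 8 and later) instead of the classes named above; the class SaltAsText and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

final class SaltAsText {
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    /** Two lowercase hex digits per byte. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** True only if decoding as UTF-8 and re-encoding gives back the same bytes. */
    static boolean survivesUtf8RoundTrip(byte[] data) {
        byte[] back = new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        String stored = toBase64(salt);
        System.out.println(stored + " / " + toHex(salt));
        System.out.println(Arrays.equals(salt, fromBase64(stored))); // true

        // 0xFF and 0xFE can never appear in valid UTF-8, so they get replaced.
        System.out.println(survivesUtf8RoundTrip(new byte[] {(byte) 0xFF, (byte) 0xFE})); // false
    }
}
```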
Currently I'm transferring a String across the network, using DataInput/OutputStream's. The String I am transferring needs to be converted into a byte array, to be decrypted.
However, because the string was written using DataOutputStream.writeUTF("foobar"), its byte array contains Java modified UTF-8 data, which breaks the encryption process.
How can I get the original bytes from the Java modified UTF-8 String?
Unicode has several variants, where s-with-^ can either be one character or two: s plus combining-^. Java has a Normalizer class to convert to one specific variant.
See http://docs.oracle.com/javase/tutorial/i18n/text/normalizerapi.html
or look immediately at the API.
This requires that the original string adheres to one variant. One cannot take arbitrary bytes and then interpret them as UTF-8, because some sequences are illegal. This was done to prevent recognizing a wrong byte/character when starting in the middle of a byte sequence.
String normalizedString = Normalizer.normalize(s, Normalizer.Form.NFD);
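To illustrate the two variants, here is a small sketch with the s-with-circumflex character (U+015D); the class name NormalizerDemo is made up. NFD splits it into 's' followed by the combining circumflex U+0302, and NFC composes the pair back into one character.

```java
import java.text.Normalizer;

final class NormalizerDemo {
    /** Decomposed form: base letter followed by combining marks. */
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Composed form: precomposed characters where they exist. */
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u015D";                  // one char: s with circumflex
        String decomposed = decompose(composed);     // two chars: 's' + U+0302
        System.out.println(decomposed.length());     // 2
        System.out.println(compose(decomposed).equals(composed)); // true
    }
}
```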
What if you write your string as a byte[] and read it back as a byte[], using DataOutputStream.write(byte[], int, int)? See http://docs.oracle.com/javase/1.4.2/docs/api/java/io/DataOutputStream.html#write(byte[], int, int)
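A minimal sketch of that idea, under the assumption that you control both ends of the connection: send a length prefix followed by the raw bytes, and read them back with readFully. The class RawBytesOverStream and the length-prefix framing are my own illustration, not something from the question; in-memory streams stand in for the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RawBytesOverStream {
    /** Frames the payload as a 4-byte length followed by the untouched bytes. */
    static byte[] write(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length);
            out.write(payload, 0, payload.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the length, then exactly that many raw bytes. */
    static byte[] read(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** For comparison: what writeUTF puts on the wire (2-byte length + modified UTF-8). */
    static byte[] writeUtf(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] cipherText = {(byte) 0xC0, (byte) 0x80, 0x00, (byte) 0xFF};
        System.out.println(read(write(cipherText)).length); // 4
        System.out.println(writeUtf("foobar").length);      // 8
    }
}
```

Because the payload never passes through a String, no charset is involved and the decryption code receives exactly the bytes that were sent.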