Error: Bad Base64Coder input character - java

I am currently getting a "Bad Base64Coder input character" error at ...
Here is my code in Java:
String nonce2 = strNONCE;
byte[] nonceBytes1 = Base64Coder.decode(nonce2);
System.out.println("nonceByte1 value : " + nonceBytes1);
The problem is that I get the Bad Base64Coder input character error, and nonceBytes1 is printed as null. I am trying to decode nonce2 with Base64Coder. My strNONCE value is 16
/** Generating nonce value */
public static String generateNonce() {
try {
byte[] nonce = new byte[16];
Random rand;
rand = SecureRandom.getInstance ("SHA1PRNG");
rand.nextBytes(nonce);
//convert byte array to string.
strNONCE = new String(nonce);
}catch (NoSuchAlgorithmException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return strNONCE;
}

//convert byte array to string.
strNONCE = new String(nonce);
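That is not going to work. You need to Base64-encode it. In the usual Base64Coder class, encode(byte[]) returns a char[], so wrap the result in a String:
strNONCE = new String(Base64Coder.encode(nonce));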
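A minimal sketch of the same fix using the JDK's built-in java.util.Base64 (Java 8 and later) instead of Base64Coder. The class name NonceExample is made up for illustration, and new SecureRandom() stands in for the SHA1PRNG lookup so there is no NoSuchAlgorithmException to catch. The nonce is returned already encoded, so decoding it later cannot run into a bad input character:

```java
import java.security.SecureRandom;
import java.util.Base64;

public class NonceExample {
    // Generates 16 random bytes and returns them Base64-encoded, so the String
    // contains only Base64 alphabet characters and can be decoded again later.
    static String generateNonce() {
        byte[] nonce = new byte[16];
        new SecureRandom().nextBytes(nonce);
        return Base64.getEncoder().encodeToString(nonce);
    }

    public static void main(String[] args) {
        String strNONCE = generateNonce();
        // Decoding gives back exactly the 16 random bytes.
        byte[] nonceBytes = Base64.getDecoder().decode(strNONCE);
        System.out.println(strNONCE + " -> " + nonceBytes.length + " bytes");
    }
}
```

Sixteen random bytes always encode to a 24-character Base64 string ending in ==.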

It simply looks like you're confusing some independent concepts and are pretty new to Java as well. Base64 is a type of encoding which converts "human unreadable" byte arrays into "human readable" strings (encoding) and the other way round (decoding). It is usually used to transfer or store binary data as characters where that is strictly required (due to the protocol or the storage type).
SecureRandom is not an encoder or decoder. It returns a random value which is in no way correlated with a particular cipher or encoder. Here are some extracts from the definitions:
ran·dom (adj.)
1. Having no specific pattern, purpose, or objective
Cipher
In cryptography, a cipher (or cypher) is an algorithm for performing encryption or decryption: a series of well-defined steps that can be followed as a procedure.
Encoding
Encoding is the process of transforming information from one format into another. The opposite operation is called decoding.
I'd strongly recommend sorting those concepts out for yourself (look them up to learn more about them) and not throwing them all into one big hole. Here's at least an SSCCE which shows how you can properly encode/decode a (random) byte array using Base64 (and how to show arrays as a human-readable string):
package com.stackoverflow.q2535542;
import java.security.SecureRandom;
import java.util.Arrays;
import org.apache.commons.codec.binary.Base64;
public class Test {
public static void main(String[] args) throws Exception {
// Generate random bytes and show them.
byte[] bytes = new byte[16];
SecureRandom.getInstance("SHA1PRNG").nextBytes(bytes);
System.out.println(Arrays.toString(bytes));
// Base64-encode bytes and show them.
String base64String = Base64.encodeBase64String(bytes);
System.out.println(base64String);
// Base64-decode string and show bytes.
byte[] decoded = Base64.decodeBase64(base64String);
System.out.println(Arrays.toString(decoded));
}
}
(using Commons Codec Base64 by the way)
Here's an example of the output:
[14, 52, -34, -74, -6, 72, -127, 62, -37, 45, 55, -38, -72, -3, 123, 23]
DjTetvpIgT7bLTfauP17Fw==
[14, 52, -34, -74, -6, 72, -127, 62, -37, 45, 55, -38, -72, -3, 123, 23]

A Base64-encoded string only contains printable characters. You're generating strNONCE directly from random bytes, so it will contain non-printable characters.
What exactly is it you're trying to do?
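To see the difference concretely, here is a small sketch (hypothetical class RawVsEncoded, using java.util.Base64 rather than Base64Coder) that feeds both a new String(bytes) result and a properly encoded string to a Base64 decoder. The bytes are the ones from the sample output earlier on this page; decoded as UTF-8 they contain control and replacement characters that are not in the Base64 alphabet, so the decoder rejects them, which is the same kind of failure as the error in the question:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RawVsEncoded {
    // Returns true if s decodes as standard Base64, false if the decoder rejects it.
    static boolean isValidBase64(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Bytes taken from the sample output above.
        byte[] bytes = {14, 52, -34, -74, -6, 72, -127, 62, -37, 45, 55, -38, -72, -3, 123, 23};
        // What new String(nonce) does on a platform whose default charset is UTF-8.
        String raw = new String(bytes, StandardCharsets.UTF_8);
        String encoded = Base64.getEncoder().encodeToString(bytes);
        System.out.println(isValidBase64(raw));     // false
        System.out.println(isValidBase64(encoded)); // true
        System.out.println(encoded);                // DjTetvpIgT7bLTfauP17Fw==
    }
}
```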

Related

No sense length() result

Since today I've been facing a really weird error related to byte[]-to-String conversion.
Here is the code:
private static final byte[] test_key = {-112, -57, -45, 125, 91, 126, -118, 13, 83, -60, -119, 57, 38, 118, -115, -52, -92, 39, -24, 75, 59, -21, 88, 84, 66, -125};
public static void main(String[] args) {
byte[] encryptedArray = xor("ciao".getBytes(), test_key);
System.out.println("Encrypted arrray: " + Arrays.toString(encryptedArray));
final String encrypted = new String(encryptedArray);
System.out.println("Length: " + new String(encryptedArray).length());
System.out.println(Arrays.toString(encrypted.getBytes()));
System.out.println("Encrypted value: " + encrypted);
System.out.println("Decrypted value: " + new String(xor(encrypted.getBytes(), test_key)));
}
private static byte[] xor(byte[] data, byte[] key) {
byte[] result = new byte[data.length];
for (int i = 0; i < data.length; i++) {
result[i] = (byte) (data[i] ^ key[i % key.length]);
}
return result;
}
My output is:
Encrypted arrray: [-13, -82, -78, 18]
Length: 2
[-17, -65, -67, 18]
Encrypted value: �
Decrypted value: xno
Why does length() return 2? What am I missing?
There is no 1-to-1 mapping between byte and char; it depends on the charset you use. Strings are logically char sequences. So if you want to convert between chars and bytes, you need a character encoding, which specifies the mapping from chars to bytes, and vice versa. new String(encryptedArray) decodes your bytes with the platform default charset (here UTF-8), so it tries to interpret them as a UTF-8 sequence, and bytes that do not form valid UTF-8 are replaced with the replacement character U+FFFD.
If you want to keep the bytes in a String and get back exactly the same bytes later, Base64-encode encryptedArray and create the String from that:
String encoded = new String(Base64.getEncoder().encode(encryptedArray));
To retrieve the bytes, just decode:
byte[] decoded = Base64.getDecoder().decode(encoded);
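Putting those two lines into the question's program gives a round trip that keeps the exact bytes. This is a sketch; XorBase64RoundTrip, encrypt and decrypt are illustrative names, and the plaintext is encoded explicitly as UTF-8 instead of relying on the platform default:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class XorBase64RoundTrip {
    private static final byte[] TEST_KEY = {-112, -57, -45, 125, 91, 126, -118, 13, 83, -60, -119, 57, 38, 118, -115, -52, -92, 39, -24, 75, 59, -21, 88, 84, 66, -125};

    private static byte[] xor(byte[] data, byte[] key) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return result;
    }

    // XORs the plaintext, then stores the ciphertext as Base64 text instead of new String(bytes).
    static String encrypt(String plain) {
        byte[] encrypted = xor(plain.getBytes(StandardCharsets.UTF_8), TEST_KEY);
        return Base64.getEncoder().encodeToString(encrypted);
    }

    // Decodes the Base64 text back to the exact ciphertext bytes, then XORs again.
    static String decrypt(String encoded) {
        byte[] encrypted = Base64.getDecoder().decode(encoded);
        return new String(xor(encrypted, TEST_KEY), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encrypt("ciao");
        System.out.println("Encoded: " + encoded);            // Encoded: 866yEg==
        System.out.println("Decrypted: " + decrypt(encoded)); // Decrypted: ciao
    }
}
```

The Base64 text only contains ASCII letters, digits, +, / and =, so it survives any String conversion unchanged.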
I just thought of a good way of showing what happens by simply replacing the new String(byte[]) constructor with another method, which is why I will answer the question. This method performs the same basic action as the constructor, with one change: it throws an exception if any malformed input is found.
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;
public class ShowBadEncoding {
private static final byte[] test_key = {-112, -57, -45, 125, 91, 126, -118, 13, 83, -60, -119, 57, 38, 118, -115, -52, -92, 39, -24, 75, 59, -21, 88, 84, 66, -125};
public static void main(String[] args) throws Exception {
byte[] encryptedArray = xor("ciao".getBytes(), test_key);
System.out.println("Encrypted array: " + Arrays.toString(encryptedArray));
final String encrypted = new String(encryptedArray);
// original: malformed input is silently replaced
System.out.println("Length: " + new String(encryptedArray).length());
// replacement: malformed input throws an exception
System.out.println("Length: " + decode(encryptedArray).length());
System.out.println(Arrays.toString(encrypted.getBytes()));
System.out.println("Encrypted value: " + encrypted);
System.out.println("Decrypted value: " + new String(xor(encrypted.getBytes(), test_key)));
}
// Decodes the bytes with the platform default charset, reporting malformed input instead of replacing it.
private static String decode(byte[] encryptedArray) throws CharacterCodingException {
var decoder = Charset.defaultCharset().newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
var decoded = decoder.decode(ByteBuffer.wrap(encryptedArray));
return decoded.toString();
}
private static byte[] xor(byte[] data, byte[] key) {
byte[] result = new byte[data.length];
for (int i = 0; i < data.length; i++) {
result[i] = (byte) (data[i] ^ key[i % key.length]);
}
return result;
}
}
The method is called decode because that's what you are actually doing: you are decoding the bytes to a text. A character encoding is the encoding of characters as bytes, which means that the opposite must be decoding after all.
As you will see, the above will first print 2 if your platform's default charset is UTF-8 (Linux, Android, macOS, and every platform since Java 18). Windows in Western locales used Windows-1252 as the default before Java 18 (a single-byte encoding which is an expansion of Latin-1, which itself is an expansion of ASCII); there you can get the same result by replacing Charset.defaultCharset() with StandardCharsets.UTF_8. However, the decode method will throw the following exception:
java.nio.charset.MalformedInputException: Input length = 3
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:815)
at StackExchange/com.stackexchange.so.ShowBadEncoding.decode(ShowBadEncoding.java:36)
at StackExchange/com.stackexchange.so.ShowBadEncoding.main(ShowBadEncoding.java:24)
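Now maybe you'd expect 4 here, the size of the byte array. But note that UTF-8 characters may be encoded over multiple bytes, and "Input length = 3" is the length of the malformed sequence, not of the whole input. The first byte, -13 (0xF3), announces a four-byte UTF-8 sequence; -82 (0xAE) and -78 (0xB2) are valid continuation bytes, but 18 (0x12) is not, so the decoder, which was expecting a longer encoding, rejects the first three bytes as one malformed unit.
If you replace REPORT with REPLACE (heh), the action the String constructor uses, you will see that the result is identical to the constructor's: the three malformed bytes become a single replacement character U+FFFD, followed by the character for byte 18, so length() returns 2 again.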
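The REPORT/REPLACE difference can also be checked in isolation, without the XOR code. This sketch (hypothetical class name ReplaceVsReport) pins the charset to UTF-8 so that it behaves the same on every platform; the three bytes -13, -82, -78 become a single U+FFFD, which is why length() is 2:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ReplaceVsReport {
    // What the String constructor does: malformed input is replaced with U+FFFD.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A decoder configured with REPORT throws instead of replacing.
    static String strictDecode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] encryptedArray = {-13, -82, -78, 18};
        String lenient = lenientDecode(encryptedArray);
        System.out.println(lenient.length());        // 2
        System.out.println((int) lenient.charAt(0)); // 65533, i.e. U+FFFD
        System.out.println((int) lenient.charAt(1)); // 18
        try {
            strictDecode(encryptedArray);
        } catch (CharacterCodingException e) {
            System.out.println(e); // java.nio.charset.MalformedInputException: Input length = 3
        }
    }
}
```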
Of course, Topaco is correct when he says you need to use Base64 encoding. This encodes bytes to characters instead, so that all of the information in the bytes is preserved, and the reverse is of course decoding that text back to bytes.
The elements of a String are not bytes, they are chars. A char is not a byte.
There are many ways of converting a char to a sequence of bytes (i.e., many character-set encodings).
Not every sequence of chars can be converted to a sequence of bytes; there is not always a mapping for every char. It depends on your chosen character-set encoding.
Not every sequence of bytes can be converted to a String; the bytes have to be syntactically valid for the specified character set.
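A short sketch of the last three points, using StandardCharsets so the result does not depend on the platform default (the class and method names are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDependence {
    // getBytes(Charset) replaces unmappable chars with the charset's replacement bytes.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    // Decodes as UTF-8 and encodes back; invalid input does not survive this.
    static byte[] utf8RoundTrip(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // é
        String euro = "\u20AC";   // € has no mapping in ISO-8859-1
        // The same char needs a different number of bytes in different charsets.
        System.out.println(encode(eAcute, StandardCharsets.UTF_8).length);      // 2
        System.out.println(encode(eAcute, StandardCharsets.ISO_8859_1).length); // 1
        // An unmappable char is replaced; for ISO-8859-1 the replacement is '?' (63).
        System.out.println(Arrays.toString(encode(euro, StandardCharsets.ISO_8859_1))); // [63]
        // A lone lead byte is not valid UTF-8, so it comes back as U+FFFD's three bytes.
        System.out.println(Arrays.toString(utf8RoundTrip(new byte[] {(byte) 0xC3}))); // [-17, -65, -67]
    }
}
```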

ByteBuffer.PutDouble equivalent in C#

I have a double(float/Decimal) value and I want to get the same byte array as produced by Java ByteBuffer in C#.
However, the byte array produced by ByteBuffer.putDouble in Java and by BinaryWriter in C# is different. Can someone please explain the implementation difference between the two?
Java:
ByteBuffer bytes = ByteBuffer.allocate(8).putDouble(0,1.12346);
bytes[] = {63, -15, -7, -83, -45, 115, -106, 54};
C#:
double value = 1.12346;
byte[] arr;
using (MemoryStream stream = new MemoryStream())
{
using (BinaryWriter writer = new BinaryWriter(stream))
{
writer.Write(value);
arr = stream.ToArray();
}
}
arr[] = {153, 211, 101, 49, 177, 249, 241, 63};
ByteBuffer is big-endian by default, and Java bytes are signed.
In C#, BinaryWriter writes little-endian, and bytes are unsigned.
So you have the same data in the opposite order. From a serialization point of view the sign of the bytes does not matter, although it is a little confusing.
In C# you can use an EndianBinaryWriter (see the question "BinaryWriter Endian issue").
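On the Java side you can also produce the C# layout directly: ByteBuffer.order(ByteOrder.LITTLE_ENDIAN) switches the byte order, and masking with & 0xFF shows the bytes as the unsigned values C# prints. A sketch with made-up class and method names; the little-endian array should be the big-endian one reversed:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class DoubleByteOrder {
    // Big-endian bytes, which is what ByteBuffer produces by default.
    static byte[] bigEndian(double value) {
        return ByteBuffer.allocate(8).putDouble(0, value).array();
    }

    // Little-endian bytes, the order C#'s BinaryWriter.Write(double) uses.
    static byte[] littleEndian(double value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(0, value).array();
    }

    // Converts Java's signed bytes to the unsigned 0..255 values C# displays.
    static int[] unsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bigEndian(1.12346)));
        System.out.println(Arrays.toString(unsigned(littleEndian(1.12346))));
    }
}
```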

How do I convert a Windows-1251 text to something readable?

I have a string, which is returned by the Jericho HTML parser and contains some Russian text. According to source.getEncoding() and the header of the respective HTML file, the encoding is Windows-1251.
How can I convert this string to something readable?
I tried this:
import java.io.UnsupportedEncodingException;
public class Program {
public void run() throws UnsupportedEncodingException {
final String windows1251String = getWindows1251String();
System.out.println("String (Windows-1251): " + windows1251String);
final String readableString = convertString(windows1251String);
System.out.println("String (converted): " + readableString);
}
private String convertString(String windows1251String) throws UnsupportedEncodingException {
return new String(windows1251String.getBytes(), "UTF-8");
}
private String getWindows1251String() {
final byte[] bytes = new byte[] {32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32};
return new String(bytes);
}
public static void main(final String[] args) throws UnsupportedEncodingException {
final Program program = new Program();
program.run();
}
}
The variable bytes contains the data shown in my debugger, it's the result of net.htmlparser.jericho.Element.getContent().toString().getBytes(). I just copy and pasted that array here.
This doesn't work - readableString contains garbage.
How can I fix it, i. e. make sure that the Windows-1251 string is decoded properly?
Update 1 (30.07.2015 12:45 MSK): When I change the encoding in the call in convertString to Windows-1251, nothing changes. See the screenshot below.
Update 2: Another attempt:
Update 3 (30.07.2015 14:38): The texts that I need to decode correspond to the texts in the drop-down list shown below.
Update 4 (30.07.2015 14:41): The encoding detector (code below) says that the encoding is not Windows-1251, but UTF-8.
public static String guessEncoding(byte[] bytes) {
String DEFAULT_ENCODING = "UTF-8";
org.mozilla.universalchardet.UniversalDetector detector =
new org.mozilla.universalchardet.UniversalDetector(null);
detector.handleData(bytes, 0, bytes.length);
detector.dataEnd();
String encoding = detector.getDetectedCharset();
System.out.println("Detected encoding: " + encoding);
detector.reset();
if (encoding == null) {
encoding = DEFAULT_ENCODING;
}
return encoding;
}
I fixed this problem by modifying the piece of code that reads the text from the website:
private String readContent(final String urlAsString) {
final StringBuilder content = new StringBuilder();
BufferedReader reader = null;
InputStream inputStream = null;
try {
final URL url = new URL(urlAsString);
inputStream = url.openStream();
reader =
new BufferedReader(new InputStreamReader(inputStream));
String inputLine;
while ((inputLine = reader.readLine()) != null) {
content.append(inputLine);
}
} catch (final IOException exception) {
exception.printStackTrace();
} finally {
IOUtils.closeQuietly(reader);
IOUtils.closeQuietly(inputStream);
}
return content.toString();
}
I changed the line
new BufferedReader(new InputStreamReader(inputStream));
to
new BufferedReader(new InputStreamReader(inputStream, "Windows-1251"));
and then it worked.
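The same fix in a slightly more general form: a sketch (hypothetical names) that takes the InputStream and the Charset as parameters, closes the reader with try-with-resources instead of IOUtils.closeQuietly, and shows that decoding windows-1251 bytes as UTF-8 is where replacement characters come from. The sample text is made up for the demonstration:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Decodes the stream with an explicit charset at the point where bytes become chars.
    // Like the original, readLine() drops the line breaks.
    static String readContent(InputStream in, Charset charset) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line);
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset cp1251 = Charset.forName("windows-1251");
        // Hypothetical sample text (Russian "Privet"), encoded as a windows-1251 page would send it.
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442";
        byte[] pageBytes = sample.getBytes(cp1251);

        String right = readContent(new ByteArrayInputStream(pageBytes), cp1251);
        String wrong = readContent(new ByteArrayInputStream(pageBytes), StandardCharsets.UTF_8);
        System.out.println(right.equals(sample));        // true
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // true: replacement characters appeared
    }
}
```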
(In the light of updates I deleted my original answer and started again)
The text which appears
пїЅпїЅпїЅпїЅпїЅпїЅ
is an accurate decoding of these byte values
-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67
(Padded at either end with 32, which is space.)
So either
1) The text is garbage or
2) The text is supposed to look like that or
3) The encoding is not Windows-1251
This line is notably wrong
return new String(windows1251String.getBytes(), "UTF-8");
Extracting the bytes out of a string and constructing a new string from them is not a way of "converting" between encodings. Both the input String and the output String use UTF-16 encoding internally (and you don't normally even need to know or care about that). The only time other encodings come into play is when text data is stored OUTSIDE of a String object, i.e. in your initial byte array. Conversion happens when the String is constructed, and then it is done. There is no conversion from one String type to another; they are all the same.
The fact that this
return new String(bytes);
does the same as this
return new String(bytes, "Windows-1251");
suggests that Windows-1251 is the platform's default encoding (which is further supported by your timezone being MSK).
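One more observation supports option 3: the repeated bytes -17, -65, -67 (EF BF BD) are exactly the UTF-8 encoding of U+FFFD, the Unicode replacement character. So the string probably already contained replacement characters before it was turned into bytes, i.e. the text had been decoded with the wrong charset at an earlier step. A small sketch to check this (class and method names are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementCharCheck {
    // One of the repeated groups from the question's byte array.
    static final byte[] GROUP = {-17, -65, -67};

    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // As windows-1251 the three bytes are three Cyrillic characters, as seen in the question.
        String cp1251 = decodeAs(GROUP, Charset.forName("windows-1251"));
        System.out.println(cp1251.equals("\u043F\u0457\u0405")); // true
        // As UTF-8 they are a single character: the replacement character U+FFFD.
        String utf8 = decodeAs(GROUP, StandardCharsets.UTF_8);
        System.out.println(utf8.equals("\uFFFD")); // true
    }
}
```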
Just to make sure you understand 100% how Java deals with char and byte:
import java.io.UnsupportedEncodingException;
public class ByteCharDemo {
public static void main(String[] args) throws UnsupportedEncodingException {
byte[] input = new byte[1];
// values > 127 become negative when you put them in a byte array.
input[0] = (byte) 239; // the array contains the value -17 now.
// but all 256 values are preserved.
// If you convert them to integers, you should use their unsigned value
// (casting alone isn't enough).
int output = input[0] & 0xFF; // output is 239 again
// You shouldn't cast directly from a single byte to a char,
// because a char is 16-bit and a negative byte is sign-extended into the high byte, which breaks it.
char corrupted1 = (char) input[0]; // char code: 65519 (0xFFEF, both bytes used)
char corrupted2 = (char) ((int) input[0]); // char code: 65519 as well
// Just casting to an integer/character is OK for values <= 0x7F, though:
// values <= 0x7F are always positive, even when cast to byte,
// AND the first 128 characters (7 bits) are identical in ASCII-based encodings (e.g. windows-1251).
byte simple = (byte) 'a';
char chr1 = (char) simple; // will result in 'a' again
// But it's still more reliable to use the & 0xFF conversion,
// because it ensures that your character can never be greater than char code 255 (a single byte),
// even when the byte is unexpectedly negative (> 0x7F).
char chr2 = (char) (simple & 0xFF); // also results in 'a'
// For value 239 (which is 0xEF) it's impossible, though.
// A Java char is a 16-bit UTF-16 code unit, following the Unicode character set.
// Characters 0x00 to 0x7F are identical in most encodings,
// but e.g. 0xEF in windows-1251 ('п', U+043F) does not match 0xEF in UTF-16 ('ï', U+00EF).
// So this is a bad idea:
char corrupted3 = (char) (input[0] & 0xFF); // 'ï', not 'п'
// And that's something you can only fix by using encodings.
// It's good practice to ALWAYS specify an encoding.
// The encoding indicates what your byte[] is encoded in NOW;
// your bytes will be converted to 16-bit characters.
String text = new String(input, "windows-1251"); // "п"
// If you want to change that text back to bytes, use an encoding again!
// This time the encoding specifies the TARGET encoding.
byte[] bytes = text.getBytes("UTF-8"); // {-48, -65}
}
}
I hope this helps.
As for the displayed values:
I can confirm that the byte[] is displayed correctly. I checked them in the Windows-1251 code page. (byte -17 = int 239 = 0xEF = char 'п')
In other words, your byte values are incorrect, or it's a different source-encoding.

Java byte[] to/from String conversion

Why does this junit test fail?
import org.junit.Assert;
import org.junit.Test;
import java.io.UnsupportedEncodingException;
public class TestBytes {
@Test
public void testBytes() throws UnsupportedEncodingException {
byte[] bytes = new byte[]{0, -121, -80, 116, -62};
String string = new String(bytes, "UTF-8");
byte[] bytes2 = string.getBytes("UTF-8");
System.out.print("bytes2: [");
for (byte b : bytes2) System.out.print(b + ", ");
System.out.print("]\n");
Assert.assertArrayEquals(bytes, bytes2);
}
}
I would assume that the incoming byte array equaled the outcome, but somehow, probably due to the fact that UTF-8 characters take two bytes, the outcome array differs from the incoming array in both content and length.
Please enlighten me.
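The reason is that 0, -121, -80, 116, -62 is not a valid UTF-8 byte sequence. new String(bytes, "UTF-8") does not throw an exception in such situations, but its documented behavior for invalid input is unspecified, so the result is difficult to predict (in practice the JDK replaces each malformed sequence with the replacement character U+FFFD, and the original bytes are lost). See the "Invalid byte sequences" section of http://en.wikipedia.org/wiki/UTF-8.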
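A sketch that shows what comes out of the round trip for the bytes in the question (class name made up). It uses String(byte[], Charset), whose documentation does guarantee replacement of malformed input. Each malformed piece here (the stray continuation bytes -121 and -80, and the trailing lead byte -62 with nothing after it) becomes U+FFFD, which re-encodes as the three bytes -17, -65, -67, so five input bytes turn into eleven:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class InvalidUtf8RoundTrip {
    // Decodes as UTF-8 and encodes back; malformed input is replaced with U+FFFD on the way.
    static byte[] roundTrip(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {0, -121, -80, 116, -62};
        System.out.println(Arrays.toString(roundTrip(bytes)));
        // [0, -17, -65, -67, -17, -65, -67, 116, -17, -65, -67]
    }
}
```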
The array bytes contains negative values; these have the 8th bit (bit 7) set, so UTF-8 treats them as parts of multibyte sequences. bytes2 will be identical to bytes if you only use byte values in the range 0..127. To simply make a copy of bytes, you can use, for example, System.arraycopy:
byte[] bytes3 = new byte[bytes.length];
System.arraycopy(bytes, 0, bytes3, 0, bytes.length);

Decrypt byte array with AES 128

I've looked for JavaScript libraries that can decrypt an AES-128 encrypted String. I've found several:
http://www.movable-type.co.uk/scripts/aes.html
http://www.hanewin.net/encrypt/aes/aes-test.htm : you will have to look at the source
https://code.google.com/p/crypto-js/
My problem is that these algorithms take as input either a String or a hex string. My case is a bit special, because my input is a byte array. I've coded a test case in Java:
String key = "MrSShZqHM6dtVNdX";
String message = "NzZiNGM3ZjIyNjM5ZWM3M2YxMGM5NjgzZDQzZDA3ZTQ=";
String charsetName = "UTF-8";
String algo = "AES";
// decode message
byte[] decodeBase64 = Base64.decodeBase64(message.getBytes(charsetName));
System.out.println("decoded message: " + new String(decodeBase64));
// prepare the key
SecretKeySpec secretKeySpec = new SecretKeySpec(key.getBytes(charsetName), algo);
// aes 128 decipher
Cipher cipher = Cipher.getInstance(algo);
cipher.init(Cipher.DECRYPT_MODE, secretKeySpec);
byte[] doFinal = cipher.doFinal(Hex.decodeHex(new String(decodeBase64).toCharArray()));
System.out.println("done with: " + new String(doFinal));
Output is :
decoded message: 76b4c7f22639ec73f10c9683d43d07e4
done with: 390902
But this is Java, right? The org.apache.commons.codec.binary.Hex.decodeHex method converts an array of characters representing hexadecimal values into an array of bytes of those same values. The returned array will be half the length of the passed array, as it takes two characters to represent any given byte. An exception is thrown if the passed char array has an odd number of elements.
In decimal representation, the Hex.decodeHex method gives this byte array: [118, -76, -57, -14, 38, 57, -20, 115, -15, 12, -106, -125, -44, 61, 7, -28];
The Java AES decipher takes a byte array as input, but in JavaScript, no lib does that. I've tried to tweak the one here a bit, but dude, that's hardcore code. This is really not my field...
The closest I've gotten was with this online tool. My key is MrSShZqHM6dtVNdX, and with Apache Commons Hex.encodeHex I get 4d725353685a71484d366474564e6458, giving me an output of 3339303930320a0a0a0a0a0a0a0a0a0a, which is almost the output I want (390902)...
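The trailing 0a bytes in the online tool's output look like PKCS#5/PKCS#7 padding: 6 bytes of plaintext (390902) followed by 10 padding bytes, each with the value 10 (0x0a). In the standard SunJCE provider, Cipher.getInstance("AES") means AES/ECB/PKCS5Padding and strips that padding automatically, which would explain why Java prints exactly 390902. The sketch below, in Java like the rest of the page and with hypothetical names (fromHex is a small helper so no Commons Codec is needed), shows the unpadding step a JavaScript port would have to do itself:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Pkcs7Unpad {
    // Removes PKCS#7 padding: the last byte says how many padding bytes were appended,
    // and every one of those bytes must have that same value.
    static byte[] unpad(byte[] block) {
        if (block.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        int pad = block[block.length - 1] & 0xFF;
        if (pad < 1 || pad > block.length) {
            throw new IllegalArgumentException("bad padding length " + pad);
        }
        for (int i = block.length - pad; i < block.length; i++) {
            if ((block[i] & 0xFF) != pad) {
                throw new IllegalArgumentException("inconsistent padding");
            }
        }
        return Arrays.copyOf(block, block.length - pad);
    }

    // Hypothetical helper: parses a hex string such as "0a0b" into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] decrypted = fromHex("3339303930320a0a0a0a0a0a0a0a0a0a");
        System.out.println(new String(unpad(decrypted), StandardCharsets.US_ASCII)); // 390902
    }
}
```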
