Difference between basic and url base64 encoding in Java 8 - java

The Java 8 Base64 library has two variants that can be used in URI building: the "Basic" one and the "URL and Filename safe". The documentation points to RFC 4648 Table 2 as an explanation for the differences.
After reading the spec it still isn't clear to me what the practical difference is between both encodings: are both standards "widely" supported? What about browsers specifically? Is the URL and filename safe encoding recommended for data URI encoding? Are there known support limitations?

The easiest way to explain this is with an example (IMHO):
Base64.Encoder enc = Base64.getEncoder();
Base64.Encoder encURL = Base64.getUrlEncoder();
byte[] bytes = enc.encode("subjects?_d".getBytes());
byte[] bytesURL = encURL.encode("subjects?_d".getBytes());
System.out.println(new String(bytes)); // c3ViamVjdHM/X2Q= notice the "/"
System.out.println(new String(bytesURL)); // c3ViamVjdHM_X2Q= notice the "_"
Base64.Decoder dec = Base64.getDecoder();
Base64.Decoder decURL = Base64.getUrlDecoder();
byte[] decodedURL = decURL.decode(bytesURL);
byte[] decoded = dec.decode(bytes);
System.out.println(new String(decodedURL));
System.out.println(new String(decoded));
Notice how one output is URL safe and the other is not.
As a matter of fact, if you look at the implementation, there are two lookup tables used for encoding: toBase64 and toBase64URL. Only two characters differ between them:
+ and / for toBase64 versus - and _ for toBase64URL.
So if your question is whether the URL-safe variant is the one to use when building URIs, the answer is yes.
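To make the alphabet difference concrete, here is a minimal sketch. The class and method names are made up for illustration; the input bytes are chosen so that the output lands exactly on the two characters that differ.

```java
import java.util.Base64;

class Base64AlphabetDemo {
    // Encode with the "Basic" alphabet (uses '+' and '/').
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Encode with the "URL and Filename safe" alphabet (uses '-' and '_').
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // 0xFB 0xFF splits into the 6-bit values 62, 63 and 60.
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(basic(data));   // +/8=
        System.out.println(urlSafe(data)); // -_8=
    }
}
```

Both encoders keep the trailing = padding by default; Base64.getUrlEncoder().withoutPadding() drops it if the = is unwanted in a URL.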

Running some tests shows that encoding a data URI with the Base64 "URL and filename safe" variant produces URIs that Chrome does not recognise.
Example: data:text/plain;base64,TG9yZW0/aXBzdW0= is properly decoded to Lorem?ipsum, while its URL-safe counterpart data:text/plain;base64,TG9yZW0_aXBzdW0= is not (ERR_INVALID_URL).
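Based on the Chrome test above, a data URI should be built with the basic encoder. The following is a small sketch under that assumption; DataUriDemo and toDataUri are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUriDemo {
    // Build a base64 data URI for the given media type (for example "text/plain").
    static String toDataUri(String mediaType, byte[] content) {
        // Basic encoder on purpose: the URL-safe alphabet was rejected in the test above.
        return "data:" + mediaType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] content = "Lorem?ipsum".getBytes(StandardCharsets.UTF_8);
        System.out.println(toDataUri("text/plain", content));
        // data:text/plain;base64,TG9yZW0/aXBzdW0=
    }
}
```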

Related

What is the equivalent of Android's Base64.encodeToString(data, Base64.NO_WRAP) in Java

I have looked for this information on stackoverflow but I can't find the exact answer I want.
If we use Java's version of Base64 in java.util, what is the equivalent of Android's Base64.encodeToString(data, Base64.NO_WRAP)?
You've correctly identified that you need java.util.Base64. If you read its documentation, you'll see that it supports three types of Base64 encoding and decoding. Since the Android code you are trying to translate uses NO_WRAP, you should use either the basic encoder or the URL encoder, neither of which wraps lines. The MIME encoder does wrap lines, which is not what you want.
Base64.Encoder encoder = Base64.getEncoder(); // for the basic encoder, or:
// Base64.Encoder encoder = Base64.getUrlEncoder(); for the URL encoder
On Base64.Encoder, you'll see a method with exactly the same name as the Android method: encodeToString.
String encoded = encoder.encodeToString(data);
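To see the wrapping difference for yourself, here is a short sketch (NoWrapDemo is an illustrative name) that encodes 100 bytes, which is long enough for the MIME encoder to insert a line break after 76 characters:

```java
import java.util.Base64;

class NoWrapDemo {
    // Basic encoder: a single line, comparable to Android's NO_WRAP.
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // MIME encoder: inserts "\r\n" after every 76 output characters.
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 zero bytes encode to 136 characters
        System.out.println(basic(data).contains("\r\n")); // false
        System.out.println(mime(data).contains("\r\n"));  // true
    }
}
```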

Ruby base 64 decoding for Java base 64 encoding

I have a string that is encoded in Java using
data = new String(Base64.getEncoder().encode(encVal), StandardCharsets.UTF_8);
I am receiving this encoded data as an API response and want to Base64-decode it in Ruby. I am using
Base64.strict_decode64(data)
for this, but it is not working. Can anyone help me with this?
Your Java code is correct:
byte[] encVal = "Hello World".getBytes();
String data = new String(Base64.getEncoder().encode(encVal), StandardCharsets.UTF_8);
System.out.println(data); // SGVsbG8gV29ybGQ=
The SGVsbG8gV29ybGQ= decodes correctly using multiple tools, e.g. https://www.base64decode.org/.
You are most likely seeing garbage characters when decoding your value because of an error in creating the byte[]. You may need to specify the correct character encoding when creating it.
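As a hedged illustration of pinning the encoding on the Java side, the sketch below (Base64RoundTrip and its methods are made-up names) always converts between text and bytes with an explicit UTF-8 charset, so the result does not depend on the platform default. The receiving side then has to interpret the decoded bytes as UTF-8 as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64RoundTrip {
    // Text -> UTF-8 bytes -> Base64 string.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 string -> bytes -> text, decoding the bytes as UTF-8.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String data = encode("Hello World");
        System.out.println(data);         // SGVsbG8gV29ybGQ=
        System.out.println(decode(data)); // Hello World
    }
}
```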

Java: Is there any encoder which will not include forward slash during data encryption?

I need to pass data in encrypted form in a URL like this:
http://localhost:8080/app/{encrypted_data}
So, is there any encoder which will not include forward slash(/) in encoding?
Please Note: I don't want to replace '/' by another character, manually, from the encoded data.
Edit: the comment from Oleg Estekhin suggesting Base64 URL-safe encoding also works fine; I'm just adding an example here, using org.apache.commons.codec.binary.Base64 from Apache Commons Codec.
EXAMPLE: Encode:
String str = "subjects?_d=1";
byte[] bytesEncoded = Base64.encodeBase64URLSafe(str.getBytes());
Decode:
Base64 decoder = new Base64(true);
byte[] decodedBytes = decoder.decode(new String(bytesEncoded));
System.out.println(new String(decodedBytes));
Output:
c3ViamVjdHM_X2Q9MQ
subjects?_d=1
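The same result can be sketched with java.util.Base64 from Java 8, without the Commons Codec dependency. UrlSafeTokenDemo is a hypothetical name; withoutPadding() is used so that the output matches the unpadded string shown above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UrlSafeTokenDemo {
    // URL-safe alphabet and no '=' padding, so the result contains no '/', '+' or '='.
    static String encode(String text) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // The URL decoder accepts input with or without padding.
    static String decode(String token) {
        return new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = encode("subjects?_d=1");
        System.out.println(token);         // c3ViamVjdHM_X2Q9MQ
        System.out.println(decode(token)); // subjects?_d=1
    }
}
```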
Alternatively, consider Base32, whose alphabet does not contain '/': http://en.wikipedia.org/wiki/Base32
Example:
Encode string to base32 string in Java

UTF-8 -- ISO 8859-1 mapping tool

When I convert a UTF-8 string containing characters that do not exist in ISO 8859-1 to ISO 8859-1, I get question marks here and there. Sure, what else could it do!
Is there a Java tool that can map a string like "İKEA" to "IKEA", avoiding the ? and making the best of it?
For the specific example, you can:
decompose the letters and diacritics using compatibility form Unicode normalization
instruct the encoder to drop unsupported characters (the diacritics)
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
// create encoder
CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
// write data
String ikea = "\u0130KEA";
String decomposed = Normalizer.normalize(ikea, Form.NFKD);
CharBuffer cbuf = CharBuffer.wrap(decomposed);
ByteBuffer bbuf = encoder.encode(cbuf);
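out.write(bbuf.array(), bbuf.arrayOffset() + bbuf.position(), bbuf.remaining()); // only the encoded bytes; the backing array can be longer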
// verify
String decoded = new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
System.out.println(decoded);
You're still transcoding from a character set that defines 109,384 values (Unicode 6) to one that supports 256 so there will always be limitations.
Also consider a more sophisticated transformation API like ICU for features like transliteration.
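A related, simpler variant is sketched below. It is a different technique from the encoder-based example above: instead of letting the charset encoder drop unmappable characters, it removes the combining marks with a regular expression after normalization. DiacriticStripper is an illustrative name, and letters that have no Unicode decomposition (such as "ø") pass through unchanged.

```java
import java.text.Normalizer;

class DiacriticStripper {
    // Decompose letters into base letter + combining marks, then delete the marks.
    static String strip(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // \p{M} matches Unicode combining marks such as U+0307 (dot above).
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("\u0130KEA")); // IKEA
    }
}
```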

UTF-8 conversion for text obtained from the internet

ElasticSearch is a search server which accepts data only in UTF-8.
When I try to give ElasticSearch the following text
"Small businesses potentially in line for a lighter reporting load include those with an annual turnover of less than £440,000, net assets of less than £220,000 and fewer than ten employees"
through my Java application (basically, my Java application takes this text from a web page and gives it to ElasticSearch), ES complains that it can't understand £ and fails. After filtering the text through the code below:
byte bytes[] = s.getBytes("ISO-8859-1");
s = new String(bytes, "UTF-8");
Here £ is converted to �
But when I copy it to a file in my home directory using bash, it goes in fine. Any pointers will help.
You have ISO-8859-1 octets in bytes, which you then tell String to decode as if they were UTF-8. When it does that, it encounters the byte 0xA3, which is not a valid UTF-8 sequence on its own, and replaces it with the substitution character.
To convert correctly, you have to construct the string using the encoding the bytes are actually in, then convert it to the encoding that you want. See How do I convert between ISO-8859-1 and UTF-8 in Java?.
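A minimal sketch of both the mistake and the correct conversion, using a single £ byte as input (TranscodeDemo and its method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    // Correct: decode with the encoding the bytes are really in, then encode as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // The mistake from the question: ISO-8859-1 bytes decoded as if they were UTF-8.
    static String misdecode(byte[] latin1Bytes) {
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] pound = {(byte) 0xA3}; // the pound sign in ISO-8859-1
        byte[] utf8 = latin1ToUtf8(pound);
        System.out.printf("%02X %02X%n", utf8[0], utf8[1]); // C2 A3
        // A lone 0xA3 is malformed UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(Integer.toHexString(misdecode(pound).charAt(0))); // fffd
    }
}
```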
UTF-8 is easier than one thinks. In a String, everything is Unicode characters.
Bytes/string conversion is done as follows.
(Note: Cp1252, or Windows-1252, is the Windows Latin-1 extension of ISO-8859-1; better to use that one.)
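// Reading bytes as text: decode with the source encoding
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "Cp1252"));
// Writing text as bytes: encode with the target encoding
PrintWriter out = new PrintWriter(
        new OutputStreamWriter(new FileOutputStream(file), "UTF-8"));
// Servlet output (assuming response is an HttpServletResponse): declare the encoding
response.setContentType("text/html; charset=UTF-8");
response.setCharacterEncoding("UTF-8");
// Non-ASCII characters in source code can be written as Unicode escapes
String s = "20 \u00A3"; // Escaping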
To see why Cp1252 is more suitable than ISO-8859-1:
http://en.wikipedia.org/wiki/Windows-1252
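The practical difference shows up in the byte range 0x80 to 0x9F. The sketch below (Latin1VsCp1252 is an illustrative name) decodes the byte 0x80 with both charsets; it assumes the windows-1252 charset is available in the running JVM, which is the case for typical full JDK installations.

```java
import java.nio.charset.Charset;

class Latin1VsCp1252 {
    // Decode the given bytes using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0x80};
        // windows-1252 maps 0x80 to the euro sign, U+20AC.
        System.out.println(Integer.toHexString(decode(b, "windows-1252").codePointAt(0))); // 20ac
        // ISO-8859-1 maps 0x80 to the control character U+0080.
        System.out.println(Integer.toHexString(decode(b, "ISO-8859-1").codePointAt(0)));   // 80
    }
}
```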
String s is a series of characters that are basically independent of any character encoding (OK, not exactly independent, but close enough for our needs now). Whatever encoding your data was in when you loaded it into a String has already been decoded. The decoding was done either using the system default encoding (which is practically ALWAYS AN ERROR; do not ever use the system default encoding, trust me, I have over 10 years of experience in dealing with bugs related to wrong default encodings) or the encoding you explicitly specified when you loaded the data.
When you call getBytes("ISO-8859-1") for a String, you request that the String is encoded into bytes according to ISO-8859-1 encoding.
When you create a String from a byte array, you need to specify the encoding in which the characters in the byte array are represented. You create the String as if the byte array had been encoded in UTF-8, but just above you encoded it in ISO-8859-1; that is your error.
What you want to do is:
byte bytes[] = s.getBytes("UTF-8");
s = new String(bytes, "UTF-8");
