Encoding from 1252 to Unicode .NET equivalent in java - java

I have the request to port a .NET web service to java. I need to find the equivalent java code for this piece of code written in .NET:
byte[] b = ... // Some file binary data.
byte[] encoded = System.Text.Encoding.Convert(System.Text.Encoding.GetEncoding(1252), System.Text.Encoding.Unicode, b);
Thanks in advance!

byte[] b = ...
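byte[] encoded = new String(b, "Cp1252").getBytes("UTF-16LE");

Have a look at the list of supported encodings in Java. Cp1252 is Java's name for Windows-1252. Use "UTF-16LE" rather than "UTF-16" as the target: .NET's Encoding.Unicode is little-endian UTF-16 and Encoding.Convert writes no byte order mark, whereas Java's "UTF-16" encodes big-endian and prepends a BOM.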
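As a quick check of the conversion described above, here is a minimal sketch that decodes Windows-1252 bytes and re-encodes them as little-endian UTF-16. The class name Cp1252ToUtf16 and the sample bytes are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252ToUtf16 {
    // Decode Windows-1252 bytes into a String, then re-encode as UTF-16LE (no BOM).
    static byte[] convert(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // 0x80 is the euro sign in Windows-1252; 0x41 is 'A'.
        byte[] input = {(byte) 0x80, 0x41};
        for (byte b : convert(input)) {
            System.out.printf("%02X ", b);
        }
        // prints: AC 20 41 00
    }
}
```

Passing Charset objects instead of charset names also avoids the checked UnsupportedEncodingException that the String-based overloads declare.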

Related

Ruby base 64 decoding for Java base 64 encoding

I have a string that is encoded in Java using
data = new String(Base64.getEncoder().encode(encVal), StandardCharsets.UTF_8);
I am receiving this encoded data as an API response. I want to base64 decode this in ruby. I am using
Base64.strict_decode64(data)
for this, but it is not working. Can anyone help me with this?
Your Java code is correct:
byte[] encVal = "Hello World".getBytes();
String data = new String(Base64.getEncoder().encode(encVal), StandardCharsets.UTF_8);
System.out.println(data); // SGVsbG8gV29ybGQ=
The SGVsbG8gV29ybGQ= decodes correctly using multiple tools, e.g. https://www.base64decode.org/.
If you see garbage characters after decoding your value, the most likely cause is an error in how the byte[] was created. You may need to specify the correct encoding when creating it.

How to get the same MD5 string in Java as in C#

I have code in C# which produces MD5 encoded byte[] from String and then this byte[] is converted to String. The C# code is
byte[] valueBytes = (new UnicodeEncoding()).GetBytes(value);
byte[] newHash = (new MD5CryptoServiceProvider()).ComputeHash(valueBytes);
I need to get the same result in Java. I'm trying to do this
Charset utf16 = Charset.forName("UTF-16");
return new String(DigestUtils.md5(value.getBytes(utf16)), utf16);
The code is using Apache Commons Codec library for MD5 calculations. I'm using UTF16 charset because I've read in other SO questions that C#'s UnicodeEncoding uses it by default.
So the code snippets look like they do the same thing, but when I'm passing the string byndyusoft2014, C# gives me hV7u6mQYRgBXXF9jOWWYJg== and Java gives me ﹡둛뭶魙ꇥ늺ꢑ. I've tried UTF16LE and UTF16BE as charsets with no luck.
Does anyone have an idea what I'm doing wrong?
I think the difference is in how the string is turned into a byte[]: the Java and C# snippets end up hashing different byte arrays and therefore get different results. As a check, you can encode the string as UTF-8 on the C# side and compare, like the following code:
UTF8Encoding utf8 = new UTF8Encoding();
byte[] bytes=utf8.GetBytes("byndyusoft2014");
byte[] en=(new MD5CryptoServiceProvider()).ComputeHash(bytes);
Console.WriteLine(Convert.ToBase64String(en));
and the java code:
byte[] en = DigestUtils.md5Digest("byndyusoft2014".getBytes(StandardCharsets.UTF_8));
byte[] base64 = Base64Utils.encode(en);
System.out.println(new String(base64));
Also, judging by your description, the C# result is Base64-encoded, so the Java code should Base64-encode the hash bytes as well.
Both snippets then produce the same result: swPvmbGDI1GbPKQwL9knjQ==
DigestUtils and Base64Utils are the MD5 and Base64 helpers from the Spring library.
As it turned out, the main difference was not shown in my original code snippet: it was the conversion from the MD5 byte[] to a String. You need to use Base64 to get the final result. This is the working code snippet in Java
Charset utf16 = Charset.forName("UTF-16LE");
return new String(Base64.encodeBase64(DigestUtils.md5(value.getBytes(utf16))));
With this code I get the same result as with C#. Thank you all for good hints!
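For readers without Apache Commons Codec, the same steps (UTF-16LE bytes, MD5, Base64) can be sketched with JDK classes only. The class and method names below are illustrative and not part of the original answers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Md5Utf16 {
    // Hash the UTF-16LE bytes of the value (what C#'s UnicodeEncoding produces)
    // and return the 16-byte MD5 digest as a Base64 string.
    static String md5Base64(String value) {
        try {
            byte[] utf16le = value.getBytes(StandardCharsets.UTF_16LE);
            byte[] hash = MessageDigest.getInstance("MD5").digest(utf16le);
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The thread above reports hV7u6mQYRgBXXF9jOWWYJg== for this input.
        System.out.println(md5Base64("byndyusoft2014"));
    }
}
```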

Difference between basic and url base64 encoding in Java 8

The Java 8 Base64 library has two variants that can be used in URI building: the "Basic" one and the "URL and Filename safe". The documentation points to RFC 4648 Table 2 as an explanation for the differences.
After reading the spec it still isn't clear to me what the practical difference is between both encodings: are both standards "widely" supported? What about browsers specifically? Is the URL and filename safe encoding recommended for data URI encoding? Are there known support limitations?
The easiest way to explain is with an example (IMHO):
Base64.Encoder enc = Base64.getEncoder();
Base64.Encoder encURL = Base64.getUrlEncoder();
byte[] bytes = enc.encode("subjects?_d".getBytes());
byte[] bytesURL = encURL.encode("subjects?_d".getBytes());
System.out.println(new String(bytes)); // c3ViamVjdHM/X2Q= notice the "/"
System.out.println(new String(bytesURL)); // c3ViamVjdHM_X2Q= notice the "_"
Base64.Decoder dec = Base64.getDecoder();
Base64.Decoder decURL = Base64.getUrlDecoder();
byte[] decodedURL = decURL.decode(bytesURL);
byte[] decoded = dec.decode(bytes);
System.out.println(new String(decodedURL));
System.out.println(new String(decoded));
Notice how one is URL safe and the other is not.
As a matter of fact if you look at the implementation, there are two look-up tables used for encoding: toBase64 and toBase64URL. There are two characters only that differ for them:
+ and / for toBase64 versus - and _ for toBase64URL.
So if your question is whether one of them is URI safe and should be used there, the answer is yes: the URL-safe variant.
Running some tests, encoding a data URI using base64 "URL and filename safe" produces URIs that are not recognised by Chrome.
Example: data:text/plain;base64,TG9yZW0/aXBzdW0= is properly decoded to Lorem?ipsum, while its URL-safe counterpart data:text/plain;base64,TG9yZW0_aXBzdW0= is not (ERR_INVALID_URL).
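The observation above suggests using the Basic encoder for the payload of a data URI. A minimal sketch follows; the class DataUriBase64 and its helper methods are hypothetical names, and the second helper only shows that Java's Basic decoder also rejects the URL-safe alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriBase64 {
    // Build a data URI using the Basic alphabet ('+' and '/').
    static String toDataUri(String mimeType, byte[] payload) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(payload);
    }

    // Returns true if the Basic decoder accepts the input, false if it rejects it.
    static boolean basicDecoderAccepts(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] payload = "Lorem?ipsum".getBytes(StandardCharsets.UTF_8);
        System.out.println(toDataUri("text/plain", payload));         // data:text/plain;base64,TG9yZW0/aXBzdW0=
        System.out.println(basicDecoderAccepts("TG9yZW0/aXBzdW0="));  // true
        System.out.println(basicDecoderAccepts("TG9yZW0_aXBzdW0="));  // false: '_' is not in the Basic alphabet
    }
}
```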

Java Unicode to readable text conversion decoding

I am developing a Java application where I am consuming a web service. The web service is created using a SAP server, which encodes the data automatically in Unicode. I get a Unicode string from the web service.
"
倥䙄ㄭ㌮਍쿣ී㈊〠漠橢਍圯湩湁楳湅潣楤杮਍湥潤橢਍″‰扯൪㰊഼┊敄瑶灹⁥佐呓′†䘠湯⁴佃剕䕉⁒渠牯慭慌杮䔠ൎ⼊祔数⼠潆瑮਍匯扵祴数⼠祔数റ⼊慂敳潆瑮⼠潃牵敩൲⼊慎敭⼠う㄰਍䔯据摯湩⁧′‰൒㸊ാ攊摮扯൪㐊〠漠橢਍㰼਍䰯湥瑧⁨‵‰൒㸊ാ猊牴慥൭ 䘯〰‱⸱2
"
Above is the response.
I want to convert it to readable text format like String. I am using core Java.
That's a PDF file that has been interpreted as UTF-16LE.
You need to look at what component is receiving the response and how it's dealing with the input to stop it being decoded as UTF-16LE, but ultimately there isn't a 'readable' version of it as such, as it's a binary file. Extracting the document text out of a PDF file is a much bigger problem!
(Note: Unicode is a character set, UTF-16LE is an encoding of that set into bytes. Microsoft call the UTF-16LE encoding "Unicode" due to a historical accident, but that's misleading.)
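To see why a PDF turns into CJK-looking characters, here is a small sketch that deliberately misreads the ASCII header "%PDF-1.3" as UTF-16LE and then reverses the mistake. The class and method names are made up for illustration. Note that reversing only works reliably when the byte count is even and no invalid surrogate sequences were replaced during decoding, so fixing the receiving component remains the real solution.

```java
import java.nio.charset.StandardCharsets;

public class PdfMojibake {
    // Reproduce the mistake: treat raw file bytes as UTF-16LE text.
    static String misreadAsUtf16le(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    // Undo the mistake by re-encoding the garbled text as UTF-16LE.
    static byte[] recoverBytes(String garbled) {
        return garbled.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] header = "%PDF-1.3".getBytes(StandardCharsets.US_ASCII);
        String garbled = misreadAsUtf16le(header);
        // '%' (0x25) and 'P' (0x50) pair up into the single code unit U+5025, and so on.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        System.out.println(new String(recoverBytes(garbled), StandardCharsets.US_ASCII));
    }
}
```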
If you have byte[] or an InputStream (both binary data) you can get a String or a Reader (both text) with:
final String encoding = "UTF-8"; // or "UTF-16LE" or "UTF-16BE"
byte[] b = ...;
String s = new String(b, encoding);
InputStream is = ...;
BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding));
String line;
while ((line = reader.readLine()) != null) {
    // process line
}
The reverse process uses:
byte[] b = s.getBytes(encoding);
OutputStream os = ...;
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, encoding));
writer.write(s);
writer.newLine();
Unicode is a numbering system for all characters. The UTF variants implement Unicode as bytes.
Your problem:
Normally, with a web service, you would already have received a String. You could write that string to a file using the Writer above, for instance, either to check it yourself with a full Unicode font or to pass the file on for a check.
You may need to check which UTF variant the text is in. For Asian scripts, UTF-16 (little endian or big endian) is usually the more compact choice. In XML, the encoding is already declared in the document.
Addition:
FileWriter writes to a file using the default encoding (from operating system on your machine). Instead use:
new OutputStreamWriter(new FileOutputStream(new File("...")), "UTF-8")
If it is a binary PDF, as @bobince said, just use a FileOutputStream on the byte[] or InputStream.
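The two cases above (text with an explicit encoding, binary data written as-is) might look like this. It is only a sketch: the class name, method names, and temporary file names are invented for the example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteTextOrBinary {
    // Text: choose the encoding explicitly instead of relying on the platform default.
    static void writeText(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Binary (for example a PDF): write the bytes unchanged, no charset involved.
    static void writeBinary(Path file, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path textFile = Files.createTempFile("example", ".txt");
        Path pdfFile = Files.createTempFile("example", ".pdf");
        writeText(textFile, "h\u00e9llo");
        writeBinary(pdfFile, "%PDF-1.3".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Files.size(textFile)); // 6, because the accented letter takes two bytes in UTF-8
        System.out.println(Files.size(pdfFile));  // 8
    }
}
```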
This is definitely not a valid string. This looks like mangled UTF-16.
UPDATE
Indeed @Bobince is right, this is a PDF file (most probably UTF-8 or plain ASCII) displayed as UTF-16. When displayed as UTF-8, this string indeed shows PDF source code. Good catch.

How to encode character in Microsoft Windows Codepage 1251 (Cyrl) in java

I want to encode string in Java with Microsoft Windows Codepage 1251 (Cyrl) table.
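A Java string itself is not stored in any code page, so there is nothing to "encode" until you convert it to bytes. Encoding happens when you turn a string into bytes, and decoding when you turn bytes back into a string. So what you actually get is an encoded byte array: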
byte[] cp1251encodedBytes = "your characters".getBytes(Charset.forName("Cp1251"));
List of supported encodings: http://download.oracle.com/javase/1.4.2/docs/guide/intl/encoding.doc.html
Update: updated to Charset.forName() as McDowell commented.
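A short round trip shows the effect. This sketch assumes the JDK in use ships the windows-1251 charset (the standard name for Cp1251); the class name and sample word are illustrative, and Unicode escapes keep the source file independent of its own encoding.

```java
import java.nio.charset.Charset;

public class Cp1251Example {
    static final Charset CP1251 = Charset.forName("windows-1251");

    // String -> bytes in code page 1251.
    static byte[] encode(String text) {
        return text.getBytes(CP1251);
    }

    // Bytes in code page 1251 -> String.
    static String decode(byte[] bytes) {
        return new String(bytes, CP1251);
    }

    public static void main(String[] args) {
        String word = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Russian word for "hello"
        for (byte b : encode(word)) {
            System.out.printf("%02X ", b);
        }
        // prints: CF F0 E8 E2 E5 F2
    }
}
```

Each Cyrillic letter becomes exactly one byte, which is the defining property of this single-byte code page.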
