How to encode character in Microsoft Windows Codepage 1251 (Cyrl) in java

I want to encode a string in Java using the Microsoft Windows Codepage 1251 (Cyrillic) table.

You don't have to "encode" a string as such. Encoding and decoding happen when you convert between strings and bytes: you encode a string into a byte array, and you decode a byte array back into a string.
byte[] cp1251encodedBytes = "your characters".getBytes(Charset.forName("Cp1251"));
List of supported encodings: http://download.oracle.com/javase/1.4.2/docs/guide/intl/encoding.doc.html
Update: updated to Charset.forName() as McDowell commented.
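For completeness, here is a minimal sketch of the round trip in both directions. The class and method names (Cp1251Demo, encode, decode) are made up for illustration; "windows-1251" is the canonical charset name and "Cp1251" is an alias for it.

```java
import java.nio.charset.Charset;

class Cp1251Demo {
    // Look the charset up once; "Cp1251" would resolve to the same charset.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // String -> bytes: this is the "encode" step.
    static byte[] encode(String text) {
        return text.getBytes(CP1251);
    }

    // bytes -> String: this is the "decode" step.
    static String decode(byte[] bytes) {
        return new String(bytes, CP1251);
    }

    public static void main(String[] args) {
        // "Привет", written with Unicode escapes so the source file encoding does not matter
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] bytes = encode(text);
        System.out.println(bytes.length);               // one byte per Cyrillic letter
        System.out.println(decode(bytes).equals(text)); // round trip is lossless here
    }
}
```

Characters that the code page cannot represent are replaced by the charset's substitution byte (usually '?') when encoding this way, so the round trip is only lossless for text that Cp1251 actually covers.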

Related

How to Encode a HexString to Base64 RFC 1421 in Java

I need to convert a hexadecimal string to Base64 in RFC 1421 format. So far I have been doing it with:
org.apache.commons.codec.binary.Base64
But its documentation says: "Provides Base64 encoding and decoding as defined by RFC 2045."
So it doesn't work for me. I have looked for examples of converting a hex string to Base64 per RFC 1421 in Java, but I can't find anything.
Can you give me a hand?
Thanks in advance.

Have you tried using the java.util.Base64 class (available since java 8)?
It has a getMimeEncoder(int, byte[]) method which you can use with a lineLength of 64 and the resulting Encoder should be RFC1421 compliant:
Encoder rfc1421 = Base64.getMimeEncoder(64, new byte[] {'\r', '\n'});
Note: there may be other specificities that I don't know of.
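Putting the pieces together, a rough sketch of the whole hex-to-Base64 conversion could look like this. The class and method names are invented for the example, and the hex parsing is hand-rolled because Java 8 has no built-in hex decoder. It only covers the 64-character line length with CRLF separators, not any other RFC 1421 requirement.

```java
import java.util.Base64;

class HexToBase64 {
    // Parse a hex string such as "48656c6c6f" into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Base64 with lines of at most 64 characters, separated by CRLF.
    static String hexToBase64(String hex) {
        Base64.Encoder encoder = Base64.getMimeEncoder(64, new byte[] {'\r', '\n'});
        return encoder.encodeToString(hexToBytes(hex));
    }

    public static void main(String[] args) {
        System.out.println(hexToBase64("48656c6c6f")); // the bytes of "Hello"
    }
}
```

For the sample input above the result is SGVsbG8=. Longer inputs get a CRLF after every 64 output characters, with no separator after the last line.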

Character Encoding Conversion In Groovy From UTF-8 to EUC-JP

We need character encoding conversion for one of our services: we receive characters in UTF-8, must convert them to EUC-JP, and then compute a hash of the result (in Groovy on JDK 8).
In PHP, a similar solution works fine for us:
$encodedToEucJp = mb_convert_encoding($inputStringWithUtf8, "EUC-JP");
print_r(md5($encodedToEucJp));
We have tried many approaches, e.g.:
java.security.MessageDigest.getInstance('MD5')
    .digest(new String(inputStringWithUtf8.getBytes("UTF-8"), "EUC-JP")
        .getBytes("EUC-JP"))
    .encodeHex()
    .toString()
But this fails for some characters, producing a different digest than our PHP solution. A few such characters are ―, ĭ, ? etc. That is why we cannot produce the same digest for the same input in both the PHP and Java systems.
Thanks in advance.

The error is in this part of the code:
new String(inputStringWithUtf8.getBytes("UTF-8"), "EUC-JP")
Here you interpret a UTF-8 byte array as if it were encoded in EUC-JP, which makes no sense.
The following code should do the job:
java.security.MessageDigest.getInstance('MD5')
    .digest(inputStringWithUtf8.getBytes("EUC-JP"))
    .encodeHex()
    .toString()
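The same idea in plain Java, as a sketch: encode the String directly to EUC-JP bytes and hash those bytes. The class and method names are made up here, and the hex formatting is done by hand because encodeHex() is a Groovy extension method. This assumes the JDK in use ships the EUC-JP charset (it is part of the extended charsets).

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class EucJpMd5 {
    // MD5 of the EUC-JP encoding of the given text, as lowercase hex.
    static String md5HexOfEucJp(String text) {
        // A String has no encoding of its own; pick the target encoding only here.
        byte[] eucJp = text.getBytes(Charset.forName("EUC-JP"));
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(eucJp);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(md5HexOfEucJp("abc"));
    }
}
```

If digests still differ for a handful of characters, it may be worth dumping the encoded bytes on both sides and comparing them. Characters that EUC-JP cannot represent are replaced by a substitution byte in Java, and the mapping tables of the two platforms may not agree for every character.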

UTF8 conversion for text obtained from internet

ElasticSearch is a search server which accepts data only in UTF-8.
When I try to give ElasticSearch the following text
"Small businesses potentially in line for a lighter reporting load include those with an annual turnover of less than £440,000, net assets of less than £220,000 and fewer than ten employees"
through my Java application, it fails. Basically, my Java application takes this text from a web page and gives it to ElasticSearch. ES complains that it can't understand £. After filtering the text through the code below:
byte bytes[] = s.getBytes("ISO-8859-1");
s = new String(bytes, "UTF-8");
Here £ is converted to �
But when I copy it to a file in my home directory using bash, it goes in fine. Any pointers will help.

You have ISO-8859-1 octets in bytes, which you then tell String to decode as if they were UTF-8. A lone 0xA3 byte is not a valid UTF-8 sequence, so the decoder replaces it with the substitution character (U+FFFD).
To convert correctly, you have to construct the string with the encoding the bytes actually use, then convert it to the encoding that you want. See How do I convert between ISO-8859-1 and UTF-8 in Java?.
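A small sketch of the difference, using made-up class and method names: the first method reproduces the mistake from the question, the other two show decoding with the source encoding and encoding to UTF-8 as separate steps.

```java
import java.nio.charset.StandardCharsets;

class PoundSignDemo {
    // The mistake: ISO-8859-1 bytes are decoded as if they were UTF-8.
    static String wrongRoundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Decode raw bytes with the encoding they were actually written in.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Encode a String as UTF-8 only at the point where bytes are needed.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String price = "\u00a3440,000"; // pound sign written as a Unicode escape
        System.out.println(wrongRoundTrip(price).charAt(0) == '\ufffd'); // true: replaced
        System.out.println(toUtf8("\u00a3").length);                     // 2 bytes: 0xC2 0xA3
    }
}
```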

UTF-8 is easier than one thinks. In a String everything is Unicode characters.
Bytes/string conversion is done as follows.
(Note: Cp1252, or Windows-1252, is the Windows Latin-1 extension of ISO-8859-1; better use that one.)
BufferedReader in = new BufferedReader(
    new InputStreamReader(new FileInputStream(file), "Cp1252"));
PrintWriter out = new PrintWriter(
    new OutputStreamWriter(new FileOutputStream(file), "UTF-8"));
response.setContentType("text/html; charset=UTF-8");
response.setCharacterEncoding("UTF-8");
String s = "20 \u00A3"; // Unicode escape for the pound sign
To see why Cp1252 is more suitable than ISO-8859-1:
http://en.wikipedia.org/wiki/Windows-1252
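As a rough, self-contained sketch of the reader/writer approach, here is a file-to-file conversion from Windows-1252 to UTF-8. The class name, method name and the use of java.nio.file are choices made for this example, not something the snippet above prescribes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Transcode {
    // Read characters as Windows-1252, write the same characters as UTF-8.
    static void cp1252ToUtf8(Path source, Path target) {
        Charset cp1252 = Charset.forName("windows-1252");
        try (BufferedReader in = Files.newBufferedReader(source, cp1252);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // chars are encoding-neutral; the writer encodes them
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The explicit charset on both the reader and the writer is the important part; source and target should be different files.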

String s is a series of characters that are basically independent of any character encoding (ok, not exactly independent, but close enough for our needs now). Whatever encoding your data was in when you loaded it into a String has already been decoded. The decoding was done either using the system default encoding (which is practically ALWAYS AN ERROR, do not ever use the system default encoding, trust me, I have over 10 years of experience in dealing with bugs related to wrong default encodings) or the encoding you explicitly specified when you loaded the data.
When you call getBytes("ISO-8859-1") for a String, you request that the String is encoded into bytes according to ISO-8859-1 encoding.
When you create a String from a byte array, you need to specify the encoding in which the characters in the byte array are represented. You create a string from a byte array that has been encoded in UTF-8 (and just above you encoded it in ISO-8859-1, that is your error).
What you want to do is:
byte bytes[] = s.getBytes("UTF-8");
s = new String(bytes, "UTF-8");

Converting from Java String to Windows-1252 Format

I want to send a URL request, but the parameter values in the URL can have french characters (eg. è). How do I convert from a Java String to Windows-1252 format (which supports the French characters)?
I am currently doing this:
String encodedURL = new String (unencodedUrl.getBytes("UTF-8"), "Windows-1252");
However, it makes:
param=Stationnement extèrieur into param=Stationnement extÃ¨rieur.
How do I fix this? Any suggestions?
Edit for further clarification:
The user chooses values from a drop down. When the language is French, the values from the drop down sometimes include French characters, like 'è'. When I send this request to the server, it fails, saying it is unable to decipher the request. I have to figure out how to send the 'è' as a different format (preferably Windows-1252) that supports French characters. I have chosen to send as Windows-1252. The server will accept this format. I don't want to replace each character, because I could miss a special character, and then the server will throw an exception.

Use URLEncoder to encode parameter values as application/x-www-form-urlencoded data:
String param = "param="
    + URLEncoder.encode("Stationnement ext\u00e8rieur", "cp1252");
See here for an expanded explanation.
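To make the effect visible, here is a small sketch that encodes the same value with two charsets. The helper class and method are invented for this example; which charset is right depends on what the server expects.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormParam {
    // Build "name=value" with both parts percent-encoded in the given charset.
    static String encode(String name, String value, String charsetName) {
        try {
            return URLEncoder.encode(name, charsetName) + "="
                    + URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "Stationnement ext\u00e8rieur";
        System.out.println(encode("param", value, "windows-1252"));
        System.out.println(encode("param", value, "UTF-8"));
    }
}
```

With windows-1252 this gives param=Stationnement+ext%E8rieur, and with UTF-8 it gives param=Stationnement+ext%C3%A8rieur. Either way the URL itself contains only ASCII, so nothing is lost in transit.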

Try using
String encodedURL = new String (unencodedUrl.getBytes("UTF-8"), Charset.forName("Windows-1252"));

As per McDowell's suggestion, I tried encoding with:
URLEncoder.encode("stringValueWithFrenchCharacters", "cp1252") but it didn't work perfectly. I replaced "cp1252" with HTTP.ISO_8859_1 because I believe Android does not support Windows-1252 yet. It does allow ISO_8859_1, and after reading here, this supports MOST of the French characters, with the exception of 'Œ', 'œ', and 'Ÿ'.
So doing this made it work:
URLEncoder.encode(frenchString, HTTP.ISO_8859_1);
Works perfectly!

NSData to Java String

I've been writing a web application recently that interacts with iPhones. The iPhone will actually send information to the server in the form of a plist. So it's not uncommon to see something like...
<key>RandomData</key>
<data>UW31vrxbUTl07PaDRDEln3EWTLojFFmsm7YuRAscirI=</data>
Now I know this data is hashed/encrypted in some fashion. When I open up the plist with an editor (Property List Editor), it shows me a more "human readable" format. For example, the data above would be converted into something like...
<346df5da 3c5b5259 74ecf683 4431249f 711630ba 232c54ac 9bf2ee44 0r1c8ab2>
Any idea what the method of converting it is? Mainly I'm looking to get this into a Java String.
Thanks!

According to our friends at wikipedia, the <data> tag contains Base64 encoded data. So, use your favorite Java "Base64" class to decode (see also this question).
ps. technically, this is neither "hashed" nor "encrypted", simply "encoded". "Hashed" implies a one-way transformation where multiple input values can yield the same output value. "Encrypted" implies the need for a (usually secret) "key" to reverse the encryption. Base64 encoding is simply a way of representing arbitrary binary data using only printable characters.

After base64 decoding it you need to hex encode it. This is what PL Editor is showing you.
So...
<key>SomeData</key>
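<data>UW31ejxbelle7PaeRAEen3EWMLojbFmsm7LuRAscirI=</data>
Can be represented with...
byte[] bytes = java.util.Base64.getDecoder().decode("UW31ejxbelle7PaeRAEen3EWMLojbFmsm7LuRAscirI=");
// signum 1 treats the bytes as an unsigned magnitude, so a first byte >= 0x80 does not give a negative number
BigInteger bigInt = new BigInteger(1, bytes);
String hexString = bigInt.toString(16); // note: leading zero bytes are not printed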
System.out.println(hexString);
To get...
<516df5aa 3c5b5259 74ecf683 4401259f 711630ba 236c59ac 9bb2ee44 0b1c8ab2>
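If you want the same grouping that Property List Editor shows (four bytes per group, angle brackets around the lot), a byte-by-byte formatter is a bit more predictable than BigInteger, since it keeps leading zero bytes. This is only a sketch: the class and method names are made up, and it assumes Java 8 or later for java.util.Base64.

```java
import java.util.Base64;

class PlistData {
    // Decode Base64 and format the bytes as "<xxxxxxxx xxxxxxxx ...>".
    static String toHexGroups(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        StringBuilder sb = new StringBuilder("<");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && i % 4 == 0) {
                sb.append(' '); // start a new group every four bytes
            }
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexGroups("AAECAwQ=")); // five bytes: 00 01 02 03 04
    }
}
```

For the five sample bytes this prints <00010203 04>.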
