How to convert this hex string to Unicode in Java?

I get the following result from a web service:
"\\x52\\x50\\x1F\\x1F\\x44\\x46\\x57\\x47"
I need to convert it to Unicode characters, which I think would be:
"\u0052\u0050\u001F\u001F\u0044\u0046\u0057\u0047"
i.e. "RPDFWG" (plus two non-printing \u001F unit separators after "RP").
I cannot use replace("\\x", "\u00") because the compiler rejects "\u00" as an illegal Unicode escape.

This code works for me:
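import java.nio.charset.StandardCharsets;
String orig = "\\x52\\x50\\x1F\\x1F\\x44\\x46\\x57\\x47";
// Every "\xHH" escape is exactly 4 characters long
byte[] bytes = new byte[orig.length() / 4];
for (int i = 0; i < orig.length(); i += 4) {
    // Skip the "\x" prefix and parse the two hex digits
    bytes[i / 4] = (byte) Integer.parseInt(orig.substring(i + 2, i + 4), 16);
}
// StandardCharsets avoids the checked UnsupportedEncodingException of the "UTF-8" string overload
System.out.println(new String(bytes, StandardCharsets.UTF_8));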
You might want to change the encoding to ISO-8859-1 or plain ASCII; I can't tell from your example which encoding is relevant here.
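The loop above assumes that every character of the input belongs to a \xHH escape. If the web service can also return plain characters between the escapes, a slightly more defensive sketch could look like this. It assumes (the question does not say) that the escaped bytes form UTF-8 text; the class name HexEscapeDecoder is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HexEscapeDecoder {
    // Hypothetical helper: turns every well-formed "\xHH" escape into the raw
    // byte 0xHH, copies any other character as its UTF-8 bytes, and decodes
    // the collected bytes as UTF-8 (an assumption about the service).
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\x", i) && i + 4 <= s.length()) {
                int hi = Character.digit(s.charAt(i + 2), 16);
                int lo = Character.digit(s.charAt(i + 3), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);
                    i += 4;
                    continue;
                }
            }
            // Not a well-formed escape: copy the code point unchanged.
            int cp = s.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(raw, 0, raw.length);
            i += Character.charCount(cp);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decode("\\x52\\x50\\x1F\\x1F\\x44\\x46\\x57\\x47");
        // Strip the two non-printing 0x1F unit separators for display.
        System.out.println(decoded.replace("\u001F", "")); // prints RPDFWG
    }
}
```

Malformed escapes such as "\xZZ" are left in the output as-is rather than throwing a NumberFormatException.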

Related

Convert Unicode to UTF-8

My question may already have been answered on Stack Overflow, but I can't find it.
My problem is simple: I request data via an API, and the returned data contain Unicode characters, for example:
"SpecialOffer":[{"title":"Offre Vente Priv\u00e9e 1 jour 2019 2020"}]
I need to convert "\u00e9" to "é" (so that "Priv\u00e9e" becomes "Privée").
I can't use replaceAll, because I cannot know in advance all the characters that will appear.
I tried this:
byte[] utf8 = reponse.getBytes("UTF-8");
String string = new String(utf8, "UTF-8");
But the string still contains "\u00e9e".
I also tried this:
byte[] utf8 = reponse.getBytes(StandardCharsets.UTF_8);
String string = new String(utf8, StandardCharsets.UTF_8);
And this:
string = string.replace("\\\\", "\\");
byte[] utf8Bytes = null;
String convertedString = null;
utf8Bytes = string.getBytes("UTF8"); // also tried StandardCharsets.UTF_8, "UTF-8" and "UTF_8"
convertedString = new String(utf8Bytes, "UTF8"); // same variants as above
System.out.println(convertedString);
return convertedString;
But that doesn't work either.
I tested other methods, but I think I deleted the ones that didn't work, so I can't show them to you here.
I am sure there is a very simple method, but I am probably not searching the Internet with the right vocabulary. Can you help me, please?
I wish you a very good day, and thank you very much in advance.
The String.getBytes method requires a valid Charset [1].
According to the Charset javadoc [2], every implementation of the Java platform is required to support the following standard charsets:
US-ASCII
ISO-8859-1
UTF-8
UTF-16BE
UTF-16LE
UTF-16
So you need to pass UTF-8 to the getBytes method, for example getBytes(StandardCharsets.UTF_8).
[1] https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes-java.nio.charset.Charset-
[2] https://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html
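As a sanity check on the round trip attempted in the question, here is a minimal sketch using the StandardCharsets.UTF_8 constant (which also avoids the checked UnsupportedEncodingException). Encoding a string and decoding it again with the same charset returns an identical string, so a literal backslash, u, 0, 0, e, 9 sequence stays exactly as it was; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encodes to UTF-8 and decodes back with the same charset.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Eleven characters: P r i v, a backslash, then u 0 0 e 9 e
        String escaped = "Priv\\u00e9e";
        String result = roundTrip(escaped);
        System.out.println(result.equals(escaped)); // true
        System.out.println(result.length());        // 11
    }
}
```

The charset choice matters for real non-ASCII characters, but it cannot turn an escape sequence into the character it names; that requires parsing the escape.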
You can use a small JSON library:
String jsonstring = "{\"SpecialOffer\":[{\"title\":\"Offre Vente Priv\\u00e9e 1 jour 2019 2020\"}]}";
JsonValue json = JsonParser.parse(jsonstring);
String value = json.asObject()
.first("SpecialOffer").asArray().get(0)
.asObject().first("title").asStringLiteral().stringValue();
System.out.println(" result: " + value);
or
String text = "Offre Vente Priv\\u00e9e 1 jour 2019 2020";
System.out.println(" result: " + JsonEscaper.unescape(text));
The problem I had not noticed is that the API did not return "\u00e9e" but "\\u00e9e", i.e. a plain character sequence (backslash, u, 0, 0, e, 9, e) rather than a Unicode character!
So I had to decode all the Unicode escapes myself, and now everything works fine:
// Decodes backslash escapes in s; the method name is illustrative
static String decodeEscapes(String s) {
    int i = 0, len = s.length();
    StringBuilder sb = new StringBuilder(len);
    while (i < len) {
        char c = s.charAt(i++);
        if (c == '\\' && i < len) {
            c = s.charAt(i++);
            if (c == 'u') {
                // TODO: check that 4 more chars exist and are all hex digits
                c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
                i += 4;
            } // add other cases here as desired...
        } // fall through: \ escapes itself, quotes any character but u
        sb.append(c);
    }
    return sb.toString();
}
I found this solution here:
Java: How to create unicode from string "\u00C3" etc
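The loop above still carries its TODO: a truncated or non-hex escape such as \u12 makes it throw. A regex-based sketch that converts only well-formed escapes (a backslash, the letter u and exactly four hex digits) and leaves everything else untouched could look like this. It is an alternative technique, not the one from the linked answer, and the class name is hypothetical; when the whole payload is JSON, a JSON parser remains the more robust option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {
    // Matches a backslash, the letter u and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Each escape is one UTF-16 code unit; surrogate pairs come out
            // as two consecutive chars, which Java strings handle natively.
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Unlike the loop, this version does not treat a doubled backslash specially, so an escaped backslash followed by u and four hex digits would still be converted.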

Need help in converting EBCDIC to Hexadecimal

I am writing a Hive UDF to convert EBCDIC characters to hexadecimal.
The EBCDIC characters are stored in a Hive table. I can currently convert them, but a few characters are dropped during the conversion.
Example:
This is the EBCDIC value stored in the table:
AGNSAñA¦ûÃÃÂõÂjÂq  à ()
Converted hexadecimal:
c1c7d5e2000a5cd4f6ef99187d07067203a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
What I want as output:
c1c7d5e200010a5cd4f6ef99187d0706720103a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
It fails to convert the following EBCDIC characters:
01: start of heading (SOH)
10: data link escape (DLE)
15: new line (NL)
Below is the code I have tried so far:
import java.io.UnsupportedEncodingException;
import org.apache.hadoop.hive.ql.exec.UDF;
public class EbcdicToHex extends UDF {
    public String evaluate(String edata) throws UnsupportedEncodingException {
        byte[] ebcdiResult = getEBCDICRawData(edata);
        String hexResult = getHexData(ebcdiResult);
        return hexResult;
    }
    public byte[] getEBCDICRawData(String edata) throws UnsupportedEncodingException {
        byte[] result = null;
        String ebcdic_encoding = "IBM-037";
        result = edata.getBytes(ebcdic_encoding);
        return result;
    }
    public String getHexData(byte[] result) {
        String output = asHex(result);
        return output;
    }
    public static String asHex(byte[] buf) {
        char[] HEX_CHARS = "0123456789abcdef".toCharArray();
        char[] chars = new char[2 * buf.length];
        for (int i = 0; i < buf.length; ++i) {
            // High nibble first, then low nibble of each byte
            chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
            chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
        }
        return new String(chars);
    }
}
During the conversion, a few EBCDIC characters are ignored. How can I get them converted to hexadecimal as well?
I think the problem lies elsewhere. I created a small test case that builds a String from the 3 bytes you say are ignored, and in my output they do seem to be converted correctly:
private void run(String[] args) throws Exception {
    byte[] bytes = new byte[] {0x01, 0x10, 0x15};
    String str = new String(bytes, "IBM-037");
    byte[] result = getEBCDICRawData(str);
    for (byte b : result) {
        System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1) + " ");
    }
    System.out.println();
    System.out.println(evaluate(str));
}
Output:
01 10 15
011015
Based on this, both your getEBCDICRawData and evaluate methods seem to work correctly, which makes me believe your String value may already be incorrect to start with. Could it be that the String is already missing those characters? Or, as a long shot, maybe the charset is incorrect? There are different EBCDIC charsets, so maybe the String was composed using a different one, although I doubt that would make much difference for the 01, 10 and 15 bytes.
As a final remark, probably unrelated to your problem: I usually prefer to use the encode/decode methods of the Charset object for such conversions:
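To follow up on the charset question, one way to check it is to push every byte value through decode and encode and list the ones that do not come back unchanged. The sketch below is an illustrative diagnostic with hypothetical class and method names; note that IBM-037 is an extended charset that full JDK builds ship but that the Java specification does not require. If it reports none of the bytes you care about, those bytes were most likely lost before the String reached the UDF.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharsetRoundTrip {
    // Returns every byte value 0x00-0xFF that does not survive being decoded
    // with the charset and encoded again with the same charset.
    static List<Integer> lossyBytes(Charset cs) {
        List<Integer> lossy = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            byte[] in = { (byte) b };
            byte[] out = new String(in, cs).getBytes(cs);
            if (!Arrays.equals(in, out)) {
                lossy.add(b);
            }
        }
        return lossy;
    }

    public static void main(String[] args) {
        Charset ebcdic = Charset.forName("IBM-037");
        System.out.println("lossy bytes: " + lossyBytes(ebcdic));
    }
}
```

For comparison, running it with US-ASCII reports every byte from 0x80 upward, because those decode to the replacement character and encode back as '?'.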
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
String charset = "IBM-037";
Charset cs = Charset.forName(charset);
ByteBuffer bb = cs.encode(str);
CharBuffer cb = cs.decode(bb);
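One detail worth knowing about this approach: Charset.encode() and String.getBytes(Charset) are documented to replace characters the charset cannot map with its replacement bytes, and getBytes(String), used in the question, leaves that behavior unspecified, so a lossy conversion does not fail visibly. A sketch that reports such characters instead, via a CharsetEncoder with CodingErrorAction.REPORT, and then prints the bytes as hex (class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEbcdicHex {
    // REPORT is already the default for a fresh encoder; it is set explicitly
    // here so that unmappable or malformed input throws instead of being replaced.
    static String toHex(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = encoder.encode(CharBuffer.wrap(s));
        StringBuilder hex = new StringBuilder();
        while (bb.hasRemaining()) {
            hex.append(String.format("%02x", bb.get() & 0xff));
        }
        return hex.toString();
    }
}
```

An exception from toHex points at a character in the input String that the chosen EBCDIC charset cannot represent, which narrows down where the data is being altered.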

How to encode String with Java to receive identical effect as with PHP

I have a PHP function that encodes strings with SAML/XML content:
function encodeSamlRequest($samlRequest) {
    return addslashes(rawurlencode(base64_encode(gzdeflate($samlRequest))));
}
I have to create a Java method that produces identical output:
private String encodeRequest(String samlRequest) {
    byte[] samlRequestBytes = samlRequest.getBytes();
    // gzdeflate
    Deflater compressor = new Deflater();
    compressor.deflate(samlRequestBytes);
    // base64_encode
    byte[] encodedDeflatedSamlRequest = Base64.getEncoder().encode(samlRequestBytes);
    // rawurlencode
    String encodedSamlRequest = null;
    try {
        encodedSamlRequest = URLEncoder.encode(new String(encodedDeflatedSamlRequest), "UTF-8");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return encodedSamlRequest;
}
private byte[] toByteArray(int value) {
    return new byte[]{
        (byte) (value >> 24),
        (byte) (value >> 16),
        (byte) (value >> 8),
        (byte) value};
}
I tried using the Deflater, Base64.Encoder, URLEncoder and URI classes, but the output is totally different.
Test input:
<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="_e48eec122b20b6b83702" Version="2.0" IssueInstant="2017-09-28T13:54:16Z" Destination="https://hetman-int.epuap.gov.pl/DracoEngine2/draco.jsf" IsPassive="false" AssertionConsumerServiceURL="http://localhost:8085/index.jsp"><saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">/test12/myapp-12423534</saml:Issuer></samlp:AuthnRequest>
Expected output:
fVHJbsIwEP2VyPfEiRMgtUgkVHpAolIFLYdeKhMG4ioZux4H0b%2BvA6pELz3O8pZ5MyfVd1YuBt%2FiBr4GIB9d%2Bg5JXgcVGxxKo0iTRNUDSd%2FI7eJ5LUWSSuuMN43pWLRaVuwDihKgyYTYi3Q%2F3Zf5LBUs2oEjbbBiAREWiQZYIXmFPrTSbBanD7EoX7NcTgqZTd9ZtAwmNCp%2FRbXeW5Kct%2BB7hbFGn4AdlE1O5pzYji%2BdaswTnjSC4IexSD7pOAq9KCJ9hoodVUfAogURuJH00SANPbgtuLNu4G2zvskElc40qmsNeVmm5YRrPMAl8FlWz8c85NW%2Bu0vo%2F4DUrySruQ9XZYL338raOBOFyCd5Med3tPWt%2BvuN%2Bgc%3D
My output:
eJxhbWxwOkF1dGhuUmVxdWVzdCB4bWxuczpzYW1scD0idXJuOm9hc2lzOm5hbWVzOnRjOlNBTUw6Mi4wOnByb3RvY29sIiBJRD0iX2U0OGVlYzEyMmIyMGI2YjgzNzAyIiBWZXJzaW9uPSIyLjAiIElzc3VlSW5zdGFudD0iMjAxNy0wOS0yOFQxMzo1NDoxNloiIERlc3RpbmF0aW9uPSJodHRwczovL2hldG1hbi1pbnQuZXB1YXAuZ292LnBsL0RyYWNvRW5naW5lMi9kcmFjby5qc2YiIElzUGFzc2l2ZT0iZmFsc2UiIEFzc2VydGlvbkNvbnN1bWVyU2VydmljZVVSTD0iaHR0cDovL2xvY2FsaG9zdDo4MDg1L3N1Y2Nlc3MuanNwIj48c2FtbDpJc3N1ZXIgeG1sbnM6c2FtbD0idXJuOm9hc2lzOm5hbWVzOnRjOlNBTUw6Mi4wOmFzc2VydGlvbiI%2BL3Rlc3Rwbi9Eb215nGxuYS0xNDk2MDM0NjAyOTA5PC9zYW1sOklzc3Vlcj48L3NhbWxwOkF1dGhuUmVxdWVzdD4%3D
Any advice or ideas? Thanks for any help.
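Comparing the two functions suggests where they diverge. PHP's gzdeflate() produces raw DEFLATE data without a zlib header, which corresponds to Deflater's nowrap mode. The Java code never calls setInput() or finish() and passes the input array to deflate(), which fills the array it is given with compressed output, so the Base64 step ends up encoding the original XML bytes. That matches "My output": it starts with eJx, the Base64 form of 0x78 0x9C (a zlib header), followed by what appears to be the uncompressed XML minus its first two characters. A sketch of a closer equivalent, assuming UTF-8 input and Java 10+ for URLEncoder.encode(String, Charset); the class name is hypothetical, and byte-for-byte equality with PHP also depends on both sides using the same compression level.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.Deflater;

public class SamlEncoder {
    // Counterpart of PHP gzdeflate(): raw DEFLATE, i.e. no zlib header or
    // checksum, which is what nowrap = true selects.
    static byte[] rawDeflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // addslashes(rawurlencode(base64_encode(gzdeflate($samlRequest))))
    static String encodeSamlRequest(String samlRequest) {
        byte[] deflated = rawDeflate(samlRequest.getBytes(StandardCharsets.UTF_8));
        String base64 = Base64.getEncoder().encodeToString(deflated);
        // Base64 output only contains A-Z, a-z, 0-9, '+', '/' and '=', and for
        // those characters URLEncoder and rawurlencode agree ('+' -> %2B,
        // '/' -> %2F, '=' -> %3D). addslashes() then has nothing left to escape.
        return URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }
}
```

URLEncoder differs from rawurlencode for spaces ('+' instead of %20) and a few other characters, but none of those can occur in Base64 output, which is why it is safe in this particular pipeline.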

Convert byte into special character Java

I want to send any value over Bluetooth (by "any" I mean a value from 0 to 255), but I can't convert such a value into a char in a string. I tried a few different ways, without success.
int a= 240 ;
char z=(char)a;
mConnectedThread.write("START"+"\240"+","+"\0240"+","+"\030"+","+Integer.toString(a)+","+z+","+"\0f1"+"STOP");
I get this (left: decimal value, right: ASCII character):
83-'S'
84-'T'
65-'A'
82-'R'
84- 'T'
194-'Â'
160- ''
44- ','
20-'\024'
48-'0'
44-','
24-'\030'
44-','
50-'2'
52-'4'
48-'0'
44-','
195-'Ã'
176-'°'
44-','
0-'\0'
102-'f'
49-'1'
83-'S'
84-'T'
79-'O'
80-'P'
When I send \030, I receive 24-'\030' as a single character, but I can't send bigger numbers.
So my question is: how can I put any value in the range 0-255 into a single string character? I don't need to display it.
I already send and receive byte[] data and that works well, but I think building the command as a string would be better, because when I build the data array it looks like this:
byte[] dat = new byte[50];
dat[i++] = (byte) '!';
dat[i++] = (byte) '!';
dat[i++] = (byte) 0xf0;
dat[i++] = (byte) 0xf0;
dat[i++] = (byte) SET_COUPLING;
dat[i++] = (byte) (coupling + 64*this.name);
dat[i++] = (byte) 0xf0;
dat[i++] = (byte) 0xf0;
dat[i++] = (byte)SET_MULTI;
I'm just looking for a better way to construct this command.
My send function:
void write(String str)
{
    if (STATE != CONNECTED)
        return;
    try
    {
        mmOutStream.write(str.getBytes());
    } catch (IOException e)
    {
        synchronized (MainActivity.this)
        {
            btDisconnect();
            changeState(CONNECTION_ERROR);
            mConnectedThread = null;
        }
    }
}
or the second write function, which takes a byte[]:
void write(byte[] data)
{
    if (STATE != CONNECTED)
        return;
    try
    {
        mmOutStream.write(data);
    } catch (IOException e)
    {
        handler.post(new Runnable()
        {
            public void run()
            {
                btDisconnect();
                changeState(CONNECTION_ERROR);
                mConnectedThread = null;
            }
        });
    }
}
I see that this converts the string into a byte[] via mmOutStream.write(str.getBytes()), but I thought that building a string command and then converting it into a byte[] would be a more elegant method.
I thought there would be a simple way to do this, as with sprintf in C:
sprintf(str,"%4.1f\xdf""C", temp);
where "\xdf" is the single byte 223, which shows up as a degree sign when I send it to my alphanumeric display.
Edited
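If you just want to write a string to the stream without converting it to bytes yourself, you can use a BufferedWriter:
// ISO-8859-1 writes each char 0-255 as one byte; without it the platform default (UTF-8 on Android) is used
OutputStreamWriter osw = new OutputStreamWriter(outputStream, StandardCharsets.ISO_8859_1);
BufferedWriter bufferedWriter = new BufferedWriter(osw);
bufferedWriter.write("some string");
bufferedWriter.flush();
Make sure you understand the flush and close methods from the API: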
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedWriter.html
It is standard practice in Java to wrap low-level input/output streams in higher-level Reader/Writer objects to get access to more convenient methods.
Original answer
You can construct a String that decodes bytes with a specific charset:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#String(byte[], java.nio.charset.Charset)
US-ASCII only covers the values 0-127; for the full 0-255 range, ISO-8859-1 maps every byte to the char with the same value (and back):
http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
So you can construct a string from a byte array like this:
new String(byteArray, StandardCharsets.ISO_8859_1);
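For the opposite direction, from a command string to the bytes passed to write(byte[]), here is a minimal sketch assuming ISO-8859-1 as the charset. It maps each char from 0 to 255 to exactly one byte with the same value, whereas the platform default used by str.getBytes() (UTF-8 on Android) turns chars 128-255 into two bytes, which matches the 194/160 and 195/176 pairs in the question. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ByteCommands {
    // Every char must be in the range 0-255; ISO-8859-1 turns each one into
    // a single byte with the same value (chars above 255 would become '?').
    static byte[] toBytes(String command) {
        return command.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Java counterpart of sprintf(str, "%4.1f\xdf""C", temp): (char) 0xDF
    // becomes the single byte 223. Locale.ROOT keeps '.' as decimal separator.
    static byte[] temperature(double temp) {
        return toBytes(String.format(Locale.ROOT, "%4.1f%cC", temp, (char) 0xDF));
    }

    public static void main(String[] args) {
        int a = 240;
        byte[] cmd = toBytes("START" + (char) a + "," + (char) 0xF0 + "STOP");
        System.out.println(cmd.length); // 12
    }
}
```

The resulting array can go straight into the existing write(byte[] data) method.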

Python encoded utf-8 string \xc4\x91 in Java

How do I get a proper Java string from the Python-created string 'Oslobo\xc4\x91enja'?
How do I decode it? I think I've tried everything and looked everywhere; I've been stuck on this problem for two days. Please help!
Here is the Python web service method that returns the JSON, which the Java client then parses with Google Gson:
def list_of_suggestions(entry):
    """Returns list of suggestions from auto-complete search"""
    input = entry.encode('utf-8')
    json_result = { 'suggestions': [] }
    resp = urllib2.urlopen('https://maps.googleapis.com/maps/api/place/autocomplete/json?input=' + urllib2.quote(input) + '&location=45.268605,19.852924&radius=3000&components=country:rs&sensor=false&key=blahblahblahblah')
    # make json object from response
    json_resp = json.loads(resp.read())
    if json_resp['status'] == u'OK':
        for pred in json_resp['predictions']:
            if pred['description'].find('Novi Sad') != -1 or pred['description'].find(u'Нови Сад') != -1:
                obj = {}
                obj['name'] = pred['description'].encode('utf-8').encode('string-escape')
                obj['reference'] = pred['reference'].encode('utf-8').encode('string-escape')
                json_result['suggestions'].append(obj)
    return str(json_result)
Here is a solution on the Java client:
private String python2JavaStr(String pythonStr) {
    byte[] bytes = pythonStr.getBytes(StandardCharsets.UTF_8);
    // Each "\xHH" escape shrinks to one byte, so bytes.length is an upper bound
    ByteBuffer decodedBytes = ByteBuffer.allocate(bytes.length);
    for (int i = 0; i < bytes.length; i++) {
        if (bytes[i] == '\\' && i + 3 < bytes.length && bytes[i + 1] == 'x') {
            // \xc4 => c4 => 196
            int charValue = Integer.parseInt(new String(bytes, i + 2, 2, StandardCharsets.US_ASCII), 16);
            decodedBytes.put((byte) charValue);
            i += 3;
        } else {
            decodedBytes.put(bytes[i]);
        }
    }
    // Decode only the bytes actually written, not the zero-filled rest of the buffer
    return new String(decodedBytes.array(), 0, decodedBytes.position(), StandardCharsets.UTF_8);
}
You are returning the string representation of the Python data structure (str(json_result)), not JSON.
Return an actual JSON response instead; leave the values as Unicode:
if json_resp['status'] == u'OK':
    for pred in json_resp['predictions']:
        desc = pred['description']
        if u'Novi Sad' in desc or u'Нови Сад' in desc:
            obj = {
                'name': pred['description'],
                'reference': pred['reference']
            }
            json_result['suggestions'].append(obj)
return json.dumps(json_result)
Now Java does not have to interpret Python escape codes, and can parse valid JSON instead.
Here, Python's 'string-escape' encoding writes each non-ASCII byte of the UTF-8 data as \xVV, where VV is the hex value of the byte. This is very different from Java Unicode escapes, which use one \uVVVV per UTF-16 code unit (a single escape for most characters, and a surrogate pair of two escapes for characters outside the Basic Multilingual Plane).
Consider:
\xc4\x91
In decimal, those hex values are:
196 145
then (in Java):
byte[] bytes = { (byte) 196, (byte) 145 };
System.out.println("result: " + new String(bytes, "UTF-8"));
prints:
result: đ
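To make the difference between the two escape styles concrete, the sketch below renders the same text both ways. Both helpers are hypothetical and simplified: pythonStyle only imitates the \xVV part of Python 2's string-escape codec (which also escapes quotes, backslashes and control characters), and javaStyle writes one backslash-u escape per UTF-16 code unit, so a character outside the Basic Multilingual Plane becomes two escapes.

```java
import java.nio.charset.StandardCharsets;

public class EscapeStyles {
    // Simplified imitation of Python 2 'string-escape': each non-ASCII byte
    // of the UTF-8 encoding becomes \xVV; ASCII bytes are kept as-is.
    static String pythonStyle(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02x", v));
            }
        }
        return sb.toString();
    }

    // One backslash-u escape per non-ASCII UTF-16 code unit; a surrogate
    // pair therefore produces two escapes.
    static String javaStyle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }
}
```

For "Oslobođenja", pythonStyle gives the two escapes \xc4\x91 seen in the question, while javaStyle gives the single escape \u0111 for the same character.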
