JSON Jackson + HTTPClient with german umlauts - java

I'm having a problem with a JSON string, acquired with the Apache HTTP client, that contains German umlauts.
Mapping the JSON string only works if it does not contain any German umlauts; otherwise I get a "JsonMappingException: Can not deserialize instance of [...] out of START_ARRAY".
The Apache HTTP client is configured with "Accept-Charset" set to HTTP.UTF_8, but in the result I always get e.g. "\u00fc" instead of "ü". When I manually replace e.g. "\u00fc" with "ü", the mapping works perfectly.
How can I get a UTF-8 encoded JSON response from the Apache HTTP client?
Or is the server output the problem?
params.setParameter(HttpProtocolParams.USE_EXPECT_CONTINUE, false);
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);
httpclient = new DefaultHttpClient(params);
HttpGet httpGetContentLoad = new HttpGet(url);
httpGetContentLoad.setHeader("Accept-Charset", "utf-8");
httpGetContentLoad.setParams(params);
response = httpclient.execute(httpGetContentLoad);
entity = response.getEntity();
String loadedContent = null;
if (entity != null)
{
    loadedContent = EntityUtils.toString(entity, HTTP.UTF_8);
    entity.consumeContent();
}
if (HttpStatus.SC_OK != response.getStatusLine().getStatusCode())
{
    throw new Exception("Loading content failed");
}
closeConnection();
return loadedContent;
And the JSON is mapped here:
String jsonMetaData = loadGetRequestContent(getLatestEditionUrl(newspaperEdition));
Newspaper loadedNewspaper = mapper.readValue(jsonMetaData, Newspaper.class);
loadedNewspaper.setEdition(newspaperEdition);
Update 1:
jsonMetaData is a String containing the fetched JSON.
Update 2:
This is the code I use to transform the JSON output into the form I need:
public static String convertJsonLatestEditionMeta(String jsonCode)
{
    jsonCode = jsonCode.replaceFirst("\\[\"[A-Za-z0-9-[:blank:]]+\",\\{", "{\"edition\":\"an-a1\",");
    jsonCode = jsonCode.replaceFirst("\"pages\":\\{", "\"pages\":\\[");
    jsonCode = Helper.replaceLast(jsonCode, "}}}]", "}]}");
    jsonCode = jsonCode.replaceAll("\"[\\d]*\"\\:\\{\"", "\\{\"");
    return jsonCode;
}
Update3:
JSON conversion example:
JSON before conversion:
["Newspaper title",
  {
    "date":"20130103",
    "pages":
    {
      "1":{"ressort":"ressorttitle1","pdfpfad":"pathToPdf1","number":1,"size":281506},
      "2":{"ressort":"ressorttitle2","pdfpfad":"pathToPdf2","number":2,"size":281533},
      [...]
    }
  }
]
JSON after conversion:
{
  "edition":"Newspaper title",
  "date":"20130103",
  "pages":
  [
    {"ressort":"ressorttitle1","pdfpfad":"pathToPdf1","number":1,"size":281506},
    {"ressort":"ressorttitle2","pdfpfad":"pathToPdf2","number":2,"size":281533},
    [...]
  ]
}
Solution:
I started using GSON as @Boris suggested, and the problem with umlauts is gone! Furthermore, GSON really seems to be faster than Jackson.
A workaround would be to replace the characters manually according to this table:
Sign Unicode representation
Ä, ä \u00c4, \u00e4
Ö, ö \u00d6, \u00f6
Ü, ü \u00dc, \u00fc
ß \u00df
€ \u20ac

Try parsing like this:
entity = response.getEntity();
Newspaper loadedNewspaper = mapper.readValue(entity.getContent(), Newspaper.class);
There is no reason to go through a String; Jackson parses InputStreams directly. Jackson will also detect the encoding automatically if you use this approach.
EDIT: By the way, consider using the GSON JSON parsing library. It is even faster than Jackson and easier to use. However, Jackson recently started parsing XML too, which is a point in its favor.
EDIT 2: After all the details you have added, I would suppose the problem is with the server-side implementation of the service: the umlauts are not supposed to be unicode-escaped in the JSON, since UTF-8 is its native encoding. Instead of manually replacing e.g. "\u00fc" with "ü", why not do it via a regex?
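A minimal sketch of the regex replacement suggested above, assuming the input contains only well-formed \uXXXX escapes. Note that such escapes are valid JSON, so a conformant parser should already decode them; this is only a workaround sketch, and the class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeDemo {
    // Decode literal \uXXXX sequences back to the characters they represent.
    static String unescapeUnicode(String input) {
        Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnicode("gr\\u00fcn")); // grün
    }
}
```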

Related

JsonParsing : UTF-8 encoding : JsonParseException: Illegal unquoted character

I am trying to parse a JSON document obtained from a REST service via RestTemplate.
When I try to parse a JSON document that contains a newline or tab character, it throws the following exception:
org.codehaus.jackson.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 13)): has to be escaped using backslash to be included in string value
at [Source: java.io.StringReader@a8f373; line: 1, column: 663]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
at org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
at org.codehaus.jackson.impl.ReaderBasedParser._skipString(ReaderBasedParser.java:1416)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:366)
at com.demo.myapp.JsonParsingUtil.parseJson(JsonParsingUtil.java:67) at java.lang.Thread.run(Thread.java:745)
The code works fine on my local server [Windows OS, JBoss server].
When I deploy the code on the [Linux OS, JBoss server], this parsing exception pops up.
The issue scenario is:
A user enters some data via a webapp in a Windows environment.
I access the data via a REST service.
The data is parsed to retrieve some information, and a new JSON document is created with the info and passed to another team.
I am using JsonParser to parse the JSON, and I have set the features as follows:
JsonFactory jsonFactory = new MappingJsonFactory();
JsonParser jsonParser = jsonFactory.createJsonParser(jsonString);
jsonParser.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, true);
jsonParser.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true);
I have also tried this:
byte[] byteArray = jsonData.getBody().getBytes();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(byteArray.length);
byteArrayOutputStream.write(byteArray, 0, byteArray.length);
jsonString = new String(byteArray, Charset.forName("UTF-8"));
I have set UTF-8 encoding on the RestTemplate object used to obtain the JSON from the REST service:
RestTemplate restTemplate = new RestTemplate();
restTemplate.getMessageConverters().add(0, new StringHttpMessageConverter(Charset.forName("UTF-8")));
Since my JSON contains a lot of {key: value} pairs, using JsonGenerator is not a viable option. I hope somebody can point me in the right direction.
Please find the code I'm working with:
JsonFactory jsonFactory = new MappingJsonFactory();
JsonParser jsonParser = jsonFactory.createJsonParser(jsonString);
jsonParser.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, true);
jsonParser.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, true);
while (!jsonParser.isClosed()) {
    JsonToken jsonToken = jsonParser.nextToken();
    if (JsonToken.FIELD_NAME.equals(jsonToken)) {
        String fieldName = jsonParser.getCurrentName();
        if (fieldName.equals("Comments")) {
            String code = jsonParser.nextTextValue();
        }
    }
}
And the jsonString obtained from RestTemplate is:
{"Comments":"Approved.
Please proceed"}
which is:
"Approved.", a newline character, then "Please proceed"
NOTE: I am using Codehaus jackson-1.9.13 library for JsonParsing.
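One possible workaround sketch: escape the raw control characters before handing the string to the parser. This assumes the control characters only occur inside string values, as in the "Comments" payload above; JSON permits unescaped whitespace between tokens, and escaping those would corrupt the document. The class and method names are hypothetical:

```java
public class SanitizeDemo {
    // Escape raw CR/LF/TAB so a strict JSON parser accepts them.
    // Assumption: these characters occur only inside string values.
    static String escapeControlChars(String json) {
        return json.replace("\r", "\\r")
                   .replace("\n", "\\n")
                   .replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String raw = "{\"Comments\":\"Approved.\r\nPlease proceed\"}";
        System.out.println(escapeControlChars(raw));
    }
}
```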

Encoding issue when reading from Google translator API and writing to properties file

I am using the Google Translate API to generate an Arabic property file from an English property file.
Making a URL connection and issuing a GET request to the URL, passing the original language, the target language, and the value to be translated:
URLConnection urlCon = null;
String urlStr = "https://www.googleapis.com/language/translate/v2";
URL url = new URL(urlStr + "?key=" + apikey + "&source=" + origlang + "&target=" + translateToLang + "&q=" + value);
urlCon = url.openConnection();
urlCon.setConnectTimeout(1000 * 60 * 5);
urlCon.setReadTimeout(1000 * 60 * 5);
urlCon.setDoInput(true);
urlCon.setDoOutput(true);
urlCon.setUseCaches(false);
((HttpURLConnection) urlCon).setRequestMethod("GET");
urlCon.setRequestProperty("Accept-Charset", "UTF-8");
Reading the response from the URL connection through an InputStreamReader, passing UTF-8 as the encoding parameter:
BufferedReader br = new BufferedReader(new InputStreamReader(((URLConnection) urlCon).getInputStream(), "UTF-8"));
/* Reading the response line by line */
StringBuffer responseString = new StringBuffer();
String nextLine = null;
while ((nextLine = br.readLine()) != null) {
responseString.append(nextLine);
}
// if response is null or empty, throw exception
String response = responseString.toString();
Parsing the received JSON through the GSON parser:
JsonElement jelement = new JsonParser().parse(response);
JsonObject jobject = jelement.getAsJsonObject();
jobject = jobject.getAsJsonObject("data");
JsonArray jarray = jobject.getAsJsonArray("translations");
jobject = jarray.get(0).getAsJsonObject();
String result = jobject.get("translatedText").toString();
Writing the translated value to a new property file through a FileOutputStream:
FileOutputStream foutStream = new FileOutputStream(outFile);
foutStream.write(key.getBytes());
foutStream.write("=".getBytes());
foutStream.write(transByte.getBytes());
foutStream.write("\n".getBytes());
The issue is that I am getting garbled text (?????) written in the new property file for the Arabic language.
When you call transByte.getBytes(), the Arabic translation is encoded with your platform's default encoding, which will only handle Arabic if your machine is configured for UTF-8 or an Arabic encoding. Otherwise, the characters will be replaced by '�' or '?'.
Create a new Properties instance, and populate it using setProperty() calls. Then when you store it, the proper escaping will be applied to your Arabic text, which is necessary because property files are encoded with ISO-8859-1 (an encoding for Western Latin characters).
Alternatively, you can store the Properties using a Writer instance that is configured with whatever encoding you choose, but the encoding isn't stored in the file itself, so you will need meta-data or a convention to set the correct encoding when reading the file again.
Finally, you can store the Properties in an XML format, which will use UTF-8 by default, or you can specify another encoding. The file itself will specify the encoding, so it's easier to use an optimal encoding for each language.
Trying to emit a file format using custom string concatenation, as you are doing, is an oft-repeated recipe for disaster. Whether it's XML, JSON, or a simple properties file, it's far too easy to overlook special cases that require escape sequences, etc. Use a library designed to emit the format instead.
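A minimal sketch of the Properties-based approach described above. The key, value, and comment are made-up examples; the point is that store() on an OutputStream performs the \uXXXX escaping for you:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropsDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // setProperty() plus store() handle the escaping; no manual byte juggling.
        props.setProperty("greeting", "\u0645\u0631\u062d\u0628\u0627"); // Arabic "marhaba"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.store(out, "Arabic translations"); // ISO-8859-1 output with \uXXXX escapes
        System.out.println(out.toString(StandardCharsets.ISO_8859_1));
    }
}
```

In a real application you would pass a FileOutputStream instead of the ByteArrayOutputStream shown here.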

MySQL - Hebrew characters become question marks in DB

I have a DB in which all columns are set to "hebrew_general_ci".
When I insert Hebrew values into my DB manually, or through Postman, I can see that the values in the DB are indeed in Hebrew.
But when I insert the values from my app (an Android app coded in Java), the values become question marks: ????
I tried encoding my text as UTF-8 in the app itself, but it didn't work.
Here is the code that is supposed to do this:
private String POST(String url, String jsonParamsAsString) {
    String result = "";
    String fixedUrl = url.replace(" ", "%20");
    try {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        HttpPost postRequest = new HttpPost(fixedUrl);
        byte ptext[] = jsonParamsAsString.getBytes();
        jsonParamsAsString = new String(ptext, "UTF-8");
        StringEntity input = new StringEntity(jsonParamsAsString);
        input.setContentType("application/json; charset=utf-8");
        //input.setContentType("application/json");
        postRequest.setEntity(input);
        HttpResponse response = httpClient.execute(postRequest);
        result = convertInputStreamToString(response.getEntity().getContent());
        /*byte ptext[] = result.getBytes();
        result = new String(ptext, "UTF-8");*/
    } catch (Exception e) {
        Log.d("InputStream", e.getLocalizedMessage());
    }
    return result;
}
You need to set the encoding for the database to utf8 / utf8_general_ci, or utf8mb4 / utf8mb4_general_ci if you are running the latest version of MySQL and need to handle emoji. Here is the documentation. Basically, you don't need to set your table to a specific charset for a particular language. I've used the above settings and they handle Arabic, Russian, Chinese, English, etc. out of the box; they are language-agnostic and just work. Good luck.
Edit: You also need to make sure your connection has the following two parameters: useUnicode=yes and characterEncoding=UTF-8
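A sketch of where those two parameters go, as query parameters on the JDBC URL; the host, port, and schema name are hypothetical:

```java
public class JdbcUrlDemo {
    public static void main(String[] args) {
        // Hypothetical host/port/schema; the two parameters after '?' are the point.
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?useUnicode=yes&characterEncoding=UTF-8";
        System.out.println(url);
    }
}
```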

invoke rest api with array of string

[
['a','b','c','d','e','f','g'],
['a1','b1','c1','d1','e1','f1','g1'],
['a2','b2','c2','d2','e2','f2','g2'],
['a3','b3','c3','d3','e3','f3','g3']
]
I have the above string and have converted it to a JSON object like this:
JSONObject json = new JSONObject();
json.put("name", "a");
json.put("country", "b");
json.put("state", "c");
...
...
HttpPost post = new HttpPost(url);
...
HttpResponse httpresponse = client.execute(post);
This works fine for me.
My question is: instead of the above code, i.e. creating a JSON object and sending it, can I pass the string directly to my REST component? If so, what MIME type do I need to select?
The requirement is to pass the entire string and split it in the backend Java component, instead of splitting it, converting it to JSON, and then sending it to the backend.
The MIME media type for JSON text is application/json. The default encoding is UTF-8. (Source: RFC 4627).
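A sketch of sending the raw string directly with that MIME type, using java.net.http (Java 11+); the endpoint URL is hypothetical. Note that the single-quoted arrays shown in the question are not strictly valid JSON — standard parsers require double quotes:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RawPostDemo {
    public static void main(String[] args) {
        // The raw payload string, sent as-is with the JSON media type.
        String raw = "[[\"a\",\"b\",\"c\"],[\"a1\",\"b1\",\"c1\"]]";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api")) // hypothetical endpoint
                .header("Content-Type", "application/json; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(raw))
                .build();
        System.out.println(request.headers().firstValue("Content-Type").orElse(""));
    }
}
```

The backend would then receive the entire string in the request body and split it however it likes.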

Parsing Facebook signed_request using Java returns malformed JSON

I'm trying to parse the Facebook signed_request inside a Java Servlet's doPost, and I decode the signed request using commons-codec 1.3's Base64.
Here is the code I use inside the servlet's doPost:
String signedRequest = (String) req.getParameter("signed_request");
String payload = signedRequest.split("[.]", 2)[1];
payload = payload.replace("-", "+").replace("_", "/").trim();
String jsonString = new String(Base64.decodeBase64(payload.getBytes()));
When I System.out the jsonString, it's malformed. Sometimes it misses the ending } of the JSON;
sometimes it misses "} at the end of the string.
How can I get the proper JSON response from Facebook?
Facebook is using Base64 for URLs, and you are probably trying to decode the text using the standard Base64 algorithm.
Among other things, the URL variant does not require padding with "=".
You could add the required characters (padding, etc.) in code,
or you can use commons-codec 1.5 (new Base64(true)), which added support for this encoding.
Facebook is sending you "unpadded" Base64 values (the URL "standard"), and this is problematic for Java decoders that don't expect it. You can tell you have the problem when the Base64-encoded data you want to decode has a length that is not a multiple of 4.
I used this function to fix the values:
public static String padBase64(String b64) {
    String padding = "";
    // If you are a Java developer, *this* is the critical bit: FB expects
    // the Base64 decoder to do this padding for you (as the PHP one
    // apparently does...)
    switch (b64.length() % 4) {
        case 0:
            break;
        case 1:
            padding = "===";
            break;
        case 2:
            padding = "==";
            break;
        default:
            padding = "=";
    }
    return b64 + padding;
}
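A usage sketch of the padding idea, with a compact pad helper and the standard java.util.Base64 decoder; the payload value is a made-up example, not real Facebook data:

```java
import java.util.Base64;

public class PadDemo {
    // Same idea as padBase64 above, written compactly.
    static String pad(String b64) {
        int rem = b64.length() % 4;
        return rem == 0 ? b64 : b64 + "====".substring(rem);
    }

    public static void main(String[] args) {
        String unpadded = "eyJmb28iOiJiYXIifQ"; // made-up payload, length 18
        String json = new String(Base64.getDecoder().decode(pad(unpadded)));
        System.out.println(json); // {"foo":"bar"}
    }
}
```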
I have never done this in Java so I don't have a full answer, but the fact that you are sometimes losing one and sometimes two characters from the end of the string suggests it may be an issue with Base64 padding. You might want to output the value of payload and see if when it ends with '=' then jsonString is missing '}' and when payload ends with '==' then jsonString is missing '"}'. If that seems to be the case then something is going wrong with the interpretation of the equals signs at the end of payload which are supposed to represent empty bits.
Edit: On further reflection I believe this is because Facebook is using Base64 URL encoding (which does not add = as pad chars) instead of regular Base64, whereas your decoding function is expecting regular Base64 with the trailing = chars.
I've upgraded to commons-codec 1.5 using code very similar to this and am not experiencing this issue. Have you confirmed that payload really is malformed by using an online decoder?
Hello in the year 2021.
The other answers are obsolete, because with Java 8 and newer you can decode the base64url scheme using the new Base64.getUrlDecoder() (instead of getDecoder()).
The base64url scheme is a URL- and filename-safe dialect of the main Base64 scheme that uses "-" instead of "+" and "_" instead of "/" (because the plus and slash characters have special meanings in URLs). It also does not use "=" characters for padding (0 to 2 characters) at the end of the string.
Here is how you can parse the Facebook signed_request parameter in Java into a Map object:
public static Map<String, String> parseSignedRequest(HttpServletRequest httpReq, String facebookSecret) throws ServletException {
    String signedRequest = httpReq.getParameter("signed_request");
    String[] splitArray = signedRequest.split("\\.", 2);
    String sigBase64 = splitArray[0];
    String payloadBase64 = splitArray[1];
    String payload = new String(Base64.getUrlDecoder().decode(payloadBase64));
    try {
        Mac sha256_HMAC = Mac.getInstance("HmacSHA256");
        SecretKeySpec secretKey = new SecretKeySpec(facebookSecret.getBytes(), "HmacSHA256");
        sha256_HMAC.init(secretKey);
        String sigExpected = Base64.getUrlEncoder().withoutPadding().encodeToString(sha256_HMAC.doFinal(payloadBase64.getBytes()));
        if (!sigBase64.equals(sigExpected)) {
            LOG.warn("sigBase64 = {}", sigBase64);
            LOG.warn("sigExpected = {}", sigExpected);
            throw new ServletException("Invalid sig = " + sigBase64);
        }
    } catch (IllegalStateException | InvalidKeyException | NoSuchAlgorithmException ex) {
        throw new ServletException("parseSignedRequest", ex);
    }
    // use Jetty JSON parsing or some other library
    return (Map<String, String>) JSON.parse(payload);
}
I have used the Jetty JSON parser:
<dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <version>9.4.43.v20210629</version>
</dependency>
but there are more libraries available in Java for parsing JSON.