GSON library does not parse UTF-8 JSON properly, accents replaced by "?" - java

I'm trying to develop a Java application that uses the GSON library to parse JSON produced by a PHP file (UTF-8 encoded, via json_encode).
My PHP source:
<?php
$base = mysql_connect ('****', '*****', '*****');
mysql_select_db ('*****', $base) ;
$req = mysql_query("SELECT ***, ****, ***, ****, **** from *****");
function jsonRemoveUnicodeSequences($struct) {
return preg_replace("/\\\\u([a-f0-9]{4})/e", "iconv('UCS-4LE','UTF-8',pack('V', hexdec('U$1')))", json_encode($struct));
}
while ($row = mysql_fetch_array($req)) {
$output[] = $row;
}
print(jsonRemoveUnicodeSequences($output));
mysql_free_result ($req);
?>
The JSON string is displayed properly, with accents, in my web browser.
My Java source:
BufferedReader reader = null;
try {
URL url = new URL("**************");
URLConnection urlConnection = url.openConnection();
reader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "UTF-8"));
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
sb.append(line);
System.out.println("JSON data issu du PHP : "+ line + "\n");
Gson gson = new Gson();
Type type = new TypeToken<List<AlertTable>>(){}.getType();
ArrayList<AlertTable> bddListJson = gson.fromJson(line, type);
bddList = (ArrayList<AlertTable>) bddListJson.clone();
}
} catch (IOException e) {
//
} finally {
if (reader != null) {
//
}
}
System.out.println(bddList.get(1).getTypeAlert());
System.out.println(bddList.get(1).getLigne());
System.out.println(bddList.get(1).getSens());
System.out.println(bddList.get(1).getStation());
System.out.println(bddList.get(1).getTimeAlert());
In the console, accented characters are replaced by "?".
Any ideas?

First, note that your code may fail on some systems because MySQL uses a connection charset to move text between the client and the server. You should issue a raw query like SET NAMES <charset> to set the encoding of the data your script receives.
Now, assuming there's no transcoding issue (because your columns only store ASCII characters and all charsets involved are ASCII-compatible), I wrote the following code to replace \uXXXX escape sequences with UTF-8-encoded strings:
<?php
while ($row = mysql_fetch_array($req)) {
$out[] = preg_replace_callback("/\\\\u([a-f0-9]{4})/i", "unescape", $row);
}
// On PHP 5.4+ use json_encode($out, JSON_UNESCAPED_UNICODE)
echo json_encode($out);
/* Accept the matcher array
* return the UTF-8 encoded string
*/
function unescape($match) {
return call_user_func_array('pack', get_pack_args(hexdec($match[1])));
}
function get_pack_args($cp) {
if ($cp < 0x80) return array('C1', $cp);
if ($cp < 0x0800) {
$length = 2;
} else if ($cp < 0x010000) {
$length = 3;
} else {
$length = 4;
}
$args[0] = "C{$length}";
// lead byte
$args[1] = (0xFE << (7 - $length)) | ($cp >> (6 * ($length - 1)));
// continuation bytes
for ($l = 0; $l < ($length - 1); $l++) {
$args[$length - $l] = 0x80 | (($cp >> (6 * $l)) & 0x3F);
}
// keys were assigned out of order; sort them so pack() receives the bytes in sequence
ksort($args);
return $args;
}
You should test the code thoroughly; however, it's a good starting point. For display in a browser, specify the encoding with an HTTP header:
header('Content-Type: application/json; charset=utf-8');
Note that json_encode() itself escapes non-ASCII characters as \uXXXX sequences by default. This is inefficient, so you may want to use the JSON_UNESCAPED_UNICODE flag, or choose another JSON library for encoding.
At this point the server part should be OK, and you can test it with your browser. If it works but the Java program still can't show the right characters on the console, it may be a problem with the console itself, or even with the font used by the console application. Without the actual data one can't tell; however, follow my advice and print the JSON to a UTF-8-encoded text file. Then open it with a text editor (specifying UTF-8 if needed) and tell us what you see.
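To make that last suggestion concrete, here is a minimal Java sketch (not part of the original answer; the class name, temp-file prefix and sample word are made up for illustration). It shows that encoding an accented character with a charset that cannot represent it yields "?", which is roughly what a console with the wrong encoding does, while a round trip through a UTF-8 file keeps the character intact.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConsoleEncodingCheck {

    // Encodes the text with the given charset and decodes it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Writes the text to a temporary file as UTF-8 and reads it back.
    static String viaUtf8File(String text) {
        try {
            Path file = Files.createTempFile("json-dump", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "Arr\u00eat"; // hypothetical value containing an e with circumflex
        // The accent is lost when the target charset is plain ASCII
        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));
        // The accent survives the UTF-8 file round trip
        System.out.println(viaUtf8File(sample).equals(sample));
    }
}
```

If the file shows the accents but the console does not, the JSON parsing is probably fine and the console encoding is the place to look.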
Finally, note that the usage of the mysql PHP extension is discouraged. Use mysqli or PDO instead.

If you set the json_encode flags correctly, like
json_encode($WhoRank,JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP | JSON_UNESCAPED_UNICODE);
there should be no problem encoding the values and transferring the data. If it still does not work, you can use urlencode to encode the UTF-8 value and decode it on the Java side,
like this:
urlencode('aیgfسبd');
Output:
a%DB%8Cgf%D8%B3%D8%A8d
With this output you should have no problem in the JSON value.
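On the Java side, the decoding step could look like the sketch below. This is only an illustration (the class and method names are made up); it relies on java.net.URLDecoder with an explicit UTF-8 charset name so the percent-encoded bytes are interpreted the same way PHP produced them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeDemo {

    // Decodes a percent-encoded value, interpreting the bytes as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The value produced by the urlencode() call above
        String decoded = decodeUtf8("a%DB%8Cgf%D8%B3%D8%A8d");
        // Seven characters: a, three Arabic-script letters, and g, f, d
        System.out.println(decoded.length());
    }
}
```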

Related

CSV special characters issue for non UTF 8 encoding

I am using a Spring MVC application to process Excel and CSV files. I have run into an issue: special characters such as DèéêàáâÉ are converted to D������� after processing, which is wrong.
However, when the CSV file is encoded in UTF-8, the special characters are converted correctly.
Part of the AJAX call is shown below:
$('#fileuploading').fileupload({
url: 'uploadFile',
dataType: 'json',
acceptFileTypes: /(\.|\/)(csv|xlsx)$/i,
maxFileSize: 10000000,
autoUpload: false,
disableImageLoad: true,
disableAudioPreview: true,
disableVideoPreview: true,
disableValidation: false,
disableImageResize: true
})
My controller method is shown below:
@RequestMapping(value = "/uploadFile", method = RequestMethod.POST)
public @ResponseBody List<JSONResult> uploadFileHandler(
@RequestParam("files") MultipartFile file, HttpServletRequest request) {
logger.info("Starting upload of file: " + file.getOriginalFilename());
JSONResult result = null;
try {
result = uploadFile(file, appUserDTO, result, request);
} catch (IllegalStateException | IOException e) {
logger.error(e.getMessage() + e.getStackTrace());
errorLogService.saveErrorLog("FileUploadController: uploadFileHandler. Error: "+ e.getMessage(), appUserDTO.getUser().getUsrUid());
}
List<JSONResult> array = new ArrayList<>();
array.add(result);
return array;
}
Below is the method that processes the file:
public CsvFileReader(String path, String delimeter, File file) throws FileNotFoundException {
String line="";
rows = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF8"))) {
while ((line = br.readLine()) != null) {
String[] lineData = line.split(delimeter,-1);
if(SanityCheck.isValid(lineData)){
rows.add(lineData);
}
}
} catch (IOException e) {
logger.error(e.getMessage());
}
}
Can anyone point me in the right direction to solve this, please?
Your program reads the files as UTF-8, so the files need to be in UTF-8; it won't work if they aren't.
If you're asking how to handle files that can be in any encoding: the encoding of a file cannot be reliably guessed, so you need to tell the server the file's encoding when you upload it, for example with an extra form field.
If you're asking how to handle files that can be in any encoding when you have no way of finding out each file's encoding (because the files are just stashed there and no listing of their encodings exists), then, as I said, it cannot be guessed reliably.
If you feel like it, you can attempt to guess the encoding by reading the file as UTF-8 first and checking whether the result contains invalid characters. If not, reading it as UTF-8 was most likely correct. If there are invalid characters, then UTF-8 was probably not the correct encoding and you should try another. That other encoding may be windows-1252... or it may be something else entirely. No way to know, really.
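A rough sketch of that "try UTF-8 first" heuristic in Java follows. It is an illustration under assumptions, not a reliable detector: the class and method names are invented, and the fallback charset is whatever the caller guesses (windows-1252 here, as suggested above). The key point is that the decoder has to be told to report malformed input, because the default behaviour silently substitutes the replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes as UTF-8 when the bytes are valid UTF-8, otherwise with the fallback.
    static String decodeWithFallback(byte[] data, Charset fallback) {
        Charset chosen = isValidUtf8(data) ? StandardCharsets.UTF_8 : fallback;
        return new String(data, chosen);
    }

    public static void main(String[] args) {
        // 'D', then 0xE8 and 0xE9: accented letters in a single-byte Western encoding
        byte[] legacyBytes = {0x44, (byte) 0xE8, (byte) 0xE9};
        System.out.println(isValidUtf8(legacyBytes));
        // The fallback is only a guess; another legacy encoding would decode differently
        System.out.println(decodeWithFallback(legacyBytes, Charset.forName("windows-1252")));
    }
}
```

Pure-ASCII files pass the UTF-8 check regardless of their nominal encoding, which is harmless since the decoded text is the same either way.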

Can't base64 decode utf-8 special characters properly

I have an Android app that communicates with a Python backend. It sends a base64-encoded string and the Python server decodes it. It currently works fine for 0-9 and A-z characters, but if I put in an æ, ø or other special characters, the entire output is garbled. I'm assuming it's an encoding issue, but I can't quite figure out what I'm doing wrong.
Java:
public static String encodeToBase64(String text){
byte[] data = null;
try {
data = text.getBytes("UTF-8");
} catch (UnsupportedEncodingException e1) {
e1.printStackTrace();
return text;
}
text = Base64.encodeToString(data, Base64.DEFAULT);
return text;
}
And Python
subject = base64.b64decode(subject)
body = base64.b64decode(body).decode('UTF-8')
Can anyone see what the issue is here?
On the Java side, if I send in the following text, everything is fine
input = "[Mar 6 - 15:45:5] hello world"
output = "W01hciA2IC0gMTU6NDU6NV0gaGVsbG8gd29ybGQKQmF0dGVyeTogNTAuMCU="
However, the following does not work.
input = "[Mar 6 - 15:46:54] hello wørld"
When Python decodes it, it gets the following:
[Mar 6 - 15:46:54] ᘀ栀攀氀氀漀 眀爀氀
It gets encoded as

Java Rest Web Service: '+' character inside String param is treated as space character

My client app sends a String to the server side to be decoded. The String in question may contain '+' characters.
My problem is that when I want to process the String, it seems that the '+' chars are gone (probably treated as concatenation operators). How could I solve this problem? Is String not the right type for that? Should I use Byte[] instead?
Client Side:
$.ajax({
type: "GET",
url: "my/url/DecryptString",
data: "encryptedString="+$("#myStringInput").val(),
...
Code on Server Side:
...
public String decryptString(@QueryParam("encryptedString") String encryptedString) {
logger.info("=====> decryptString()");
String decryptedString = null;
Properties properties = new Properties();
logger.debug("encryptedString: " + encryptedString);
// get crypto properties
try {
properties.load(toto.class.getResourceAsStream("/config.properties"));
} catch (IOException e1) {
logger.error(e1.getStackTrace());
}
if(properties.isEmpty()) {
logger.error("Properties file is EMPTY!");
}
String encodeKey = properties.getProperty("toto.crypt.encodeKey");
decryptedString = Crypto.Decrypt(encodeKey, encryptedString.toString());
return decryptedString;
}
The String passed as the encrypted string on client side is: '7NFASg++qLOSfDXBR8wzcw=='
Here are the traces on server side:
DEBUG encryptedString: 7NFASg qLOSfDXBR8wzcw==
A + in a query parameter is decoded as a space, which is why 7NFASg++qLOSfDXBR8wzcw== arrives as 7NFASg qLOSfDXBR8wzcw==. If you want to transport a + through a query param, encode it as %2B (see also Request Parameter Losing Plus Sign).
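The behaviour can be reproduced with the JDK's own form-encoding helpers. The sketch below is only an illustration (the class and method names are made up, and the real client in the question is JavaScript, where encodeURIComponent plays the same role): decoding the raw value turns each + into a space, while encoding it first turns each + into %2B so it survives the round trip.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PlusSignDemo {

    // Percent-encodes a query parameter value ('+' becomes %2B, '=' becomes %3D).
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Decodes a query parameter value the way a server-side framework would.
    static String decodeParam(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "7NFASg++qLOSfDXBR8wzcw==";
        // Sent unencoded: every '+' is read back as a space
        System.out.println(decodeParam(raw));
        // Encoded before sending: the original value comes back unchanged
        String encoded = encodeParam(raw);
        System.out.println(encoded);
        System.out.println(decodeParam(encoded).equals(raw));
    }
}
```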

Why do Java MD5 and PHP MD5 deliver entirely different results?

While preparing access data for a website, I suddenly became aware that the MD5-hashed passwords created by Java are not at all the same as those created by PHP's MD5. Looking through the postings here and elsewhere, I found several Java solutions claiming to deliver exactly the same result as PHP. All the Java implementations agree with each other, but their result is entirely different from the PHP one.
I therefore wrote a small Java program that sends a password to my local server for conversion by PHP and, at the same time, converts it itself using Java. Here is the program:
public PHPDriver() {
String pwd = "aabbccdd";
String p = encodeByPHP("http://localhost/testsite/md5.php?pwd=" + pwd);
System.out.println("PHPDriver: " + pwd + " -> " + p);
System.out.println("md5: " + pwd + " -> " + md5(p));
...
public String encodeByPHP(String url) {
try {
// create a link to a URL
URL urlAddress = new URL(url);
URLConnection link = urlAddress.openConnection();
BufferedReader inStream = new BufferedReader(new InputStreamReader(link.getInputStream()));
return inStream.readLine();
} catch (MalformedURLException e) {
...
public String md5(String input) {
String result = input;
try {
if(input != null) {
MessageDigest md = MessageDigest.getInstance("MD5"); //or "SHA-1"
md.update(input.getBytes());
BigInteger hash = new BigInteger(1, md.digest());
result = hash.toString(16);
while(result.length() < 32) {
result = "0" + result;
}
}
} catch (NoSuchAlgorithmException nsa) {
And the (overwhelmingly complicated;-) PHP page:
<?php
$pwd = $_GET['pwd'];
// echo $pwd . ' -> ';
echo sha1($pwd);
?>
The result of this looks as follows:
PHPDriver: aabbccdd -> 527bee2730bf234e9a78bde5af091ece9c6302d5
md5: aabbccdd -> ab86815613f7f321001efef1935dbe7d
What's going wrong here? Is this an encoding problem? And why is the PHP result 40 chars long instead of the usual 32?
You're returning sha1 from your PHP script.
Use this instead:
<?php
$pwd = $_GET['pwd'];
// echo $pwd . ' -> ';
echo md5($pwd);
?>
Besides using sha1 in one and md5 in the other, you're not specifying a Charset when you call String.getBytes() in Java. This can result in unpredictable behavior. Specifying a Charset like UTF-8 will get consistent results, and then you can check in PHP how to make sure you're hashing the string with UTF-8.
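For the Java half, a minimal sketch with an explicit charset might look like this (the class and method names are invented for illustration; the assumption is that the PHP side hashes the same UTF-8 bytes). An MD5 digest is 16 bytes, hence 32 hex characters, whereas SHA-1 is 20 bytes, which explains the 40-character output in the question.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Hex {

    // Returns the lowercase 32-character hex MD5 of the input's UTF-8 bytes.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros, replacing the manual while loop
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-known test vector for MD5
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

Note also that the program in the question calls md5(p), hashing the string PHP returned, where md5(pwd) was presumably intended.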

Java: how to check the content of a php script output?

Sorry for the basic question, but my knowledge of Java networking is poor.
Basically, my Android application calls many PHP scripts to get data from a MySQL database.
The data are returned in JSON format, and I use Google's JSON library to parse them.
Everything works fine, but now I have to add a test to each PHP page. If the test succeeds, the script continues and returns the JSON; if the test fails, the script returns the string "false" or the value false (that's up to me), and my application has to redirect the user to a login page instead of showing data.
The code is the following:
URL url = new URL(Costanti.IP_SERVER+"myApps.php"+"?userId="+this.userId);
try
{
HttpURLConnection conn = (HttpURLConnection)url.openConnection();
int status = conn.getResponseCode();
if (status >= 200 && status <= 299)
{
Reader r = new InputStreamReader(conn.getInputStream());
Applicazioni dati = new Applicazioni();
try
{
dati = gsonReader.fromJson(r, Applicazioni.class);
return dati;
}
catch (Exception e)
{
System.out.println("Ho fallito a prendere i dati");
Log.e("JSON_Parser",e.toString());
}
}
}
catch (IOException e)
{
System.out.println("Ho fallito la connnection");
e.printStackTrace();
}
}
So basically I use this Google library to read the JSON from the InputStreamReader and fill the Applicazioni object with my data.
How can I check whether the content of the InputStreamReader is the string "false" or the boolean false, and otherwise parse it with the JSON library?
At the end of the PHP script I do
echo json_encode($applicazione);
in one case, or
echo "false"
in the other case.
Thanks
InputStream in = new URL(url).openStream();
Scanner scanner = new Scanner(new InputStreamReader(in));
// "\\Z" matches the end of input, so next() returns the whole script output as one string
String result = scanner.useDelimiter("\\Z").next();
if (result.equals("false")) {
// handle the false value...
} else {
dati = gsonReader.fromJson(result, Applicazioni.class);
}
You can also JSON-encode the false result, like ["result"=>"false"] in PHP. That way you can always JSON-decode in your Java program and then look at the result value.
You can put the result value in the output in both cases.
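If the plain "false" sentinel is kept, the check can be done on the raw body before handing it to the JSON parser. The sketch below uses only the JDK so that it stays self-contained; the class and method names are hypothetical, and the Gson call from the question would go where the comment indicates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ResponseCheck {

    // Reads the whole stream into a String, decoding it as UTF-8.
    static String readAll(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when the body is exactly the word false, ignoring surrounding whitespace.
    static boolean isFalseSentinel(String body) {
        return "false".equals(body.trim());
    }

    public static void main(String[] args) {
        // Stand-in for conn.getInputStream() in the question's code
        InputStream fake = new ByteArrayInputStream("false\n".getBytes(StandardCharsets.UTF_8));
        String body = readAll(fake);
        if (isFalseSentinel(body)) {
            System.out.println("redirect to login");
        } else {
            // here the body would be passed to gsonReader.fromJson(body, Applicazioni.class)
            System.out.println("parse JSON");
        }
    }
}
```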
