Corrupted JSON encoding? - java

I'm sending a JSON object of the same class from a servlet to an applet, but
all String variables in this class are missing some characters, such as 'ą', 'ę', 'ś', 'ń', 'ł'.
However, 'ó' is displayed normally (?). For example:
"Zaznacz prawid?ow? operacj? porównywania dwóch zmiennych typu"
Solution
I wish I could explain it more thoroughly, but as Henry noticed, it's the IDE causing this issue. I solved it using farmer1992's class from the Google ticket. It prints escaped Unicode characters (\u...), which was the only way my applet could decode the characters correctly. Also, I have to restart the NetBeans IDE from time to time to force the Tomcat servlet to work correctly (I have no idea why :) ).
Servlet code (updated with solution):
//begin of the servlet code extract
public void sendToApplet(HttpServletResponse response, String path) throws IOException
{
TestServlet x = new TestServlet();
x.load(path);
String json = new Gson().toJson(x);
response.setCharacterEncoding("UTF-8");
response.setContentType("application/json;charset=UTF-8");
PrintWriter out = response.getWriter();
//out.print(json);
//out.flush();
GhettoAsciiWriter out2 = new GhettoAsciiWriter(out);
out2.write(json);
out2.flush();
}
//end of the servlet code extract
Applet code:
//begin of the applet code extract
public void retrieveFromServlet(String path) throws MalformedURLException, IOException
{
String encoder = URLEncoder.encode(path, "UTF-8");
URL urlServlet = new URL("http://localhost:8080/ProjektServlet?action=" + encoder);
URLConnection connection = urlServlet.openConnection();
connection.setDoInput(true);
connection.setDoOutput(true);
connection.setUseCaches(false);
connection.setRequestProperty("Content-Type", "application/json;charset=UTF-8");
InputStream inputStream = connection.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String json = br.readLine();
Test y = new Gson().fromJson(json, Test.class);
inputStream.close();
}
//end of the applet code extract

Those characters should be encoded in \uxxxx form.
See this ticket:
http://code.google.com/p/google-gson/issues/detail?id=388#c4
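As a rough illustration of the \uxxxx approach, here is a minimal sketch that rewrites every non-ASCII character of an already serialized JSON string as a Unicode escape. The class and method names (AsciiJsonEscaper, escapeNonAscii) are hypothetical and are not the class from the ticket; the sketch only assumes that the receiving JSON parser understands standard \uXXXX escapes.

```java
final class AsciiJsonEscaper {

    // Hypothetical helper: copies ASCII characters unchanged and replaces
    // every char above 0x7F with a backslash-u escape of four hex digits.
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c > 0x7F) {
                String hex = Integer.toHexString(c);
                sb.append("\\u");
                // Left-pad the hex value to exactly four digits.
                for (int p = hex.length(); p < 4; p++) {
                    sb.append('0');
                }
                sb.append(hex);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal below contains U+0142 and U+0105 (two of the Polish letters from the question).
        System.out.println(escapeNonAscii("{\"q\":\"prawid\u0142ow\u0105\"}"));
    }
}
```

Because the output is pure ASCII, it survives any ASCII-compatible encoding mismatch between sender and receiver, which is presumably why this worked for the asker.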

With this line
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
the platform character encoding will be used (which may or may not be UTF-8). Try to set the encoding explicitly with
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
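To make the difference concrete, here is a small self-contained sketch that decodes the same UTF-8 bytes once with the right charset and once with a wrong one. The class name ExplicitCharsetDemo and the helper readFirstLine are made up for illustration, and an in-memory byte array stands in for the servlet response.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {

    // Reads the first line of the stream, decoding bytes with the given charset.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes a servlet would send as UTF-8.
        String original = "dw\u00f3ch prawid\u0142ow\u0105";
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String right = readFirstLine(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        String wrong = readFirstLine(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact: " + original.equals(right));
        System.out.println("ISO-8859-1 round trip intact: " + original.equals(wrong));
    }
}
```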

Related

HTTP URL connection response

I am trying to request a URL and read the response from my Java code.
I am using URLConnection to get the response and writing it to an HTML file.
When I open this HTML file in a browser after running the Java class, I get only the Google home page, without the search results.
What's wrong with my code? Here it is:
FileWriter fWriter = null;
BufferedWriter writer = null;
URL url = new URL("https://www.google.co.in/?gfe_rd=cr&ei=aS-BVpPGDOiK8Qea4aKIAw&gws_rd=ssl#q=google+post+request+from+java");
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String(encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Charset", "UTF-8");
connection.setDoInput(true);
connection.setRequestProperty("Authorization", "Basic " + encoding);
connection.connect();
InputStream content = (InputStream) connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
String line;
try {
fWriter = new FileWriter(new File("f:\\fileName.html"));
writer = new BufferedWriter(fWriter);
while ((line = in.readLine()) != null) {
String s = line.toString();
writer.write(s);
}
writer.close();
} catch (Exception e) {
e.printStackTrace();
}
}
The same code worked a couple of days ago, but it does not now.
The reason is that this URL does not return search results itself. To see why, you have to understand how Google works: open this URL in your browser and view its source. You will see only lots of JavaScript there.
In short, Google uses Ajax requests to process search queries.
To perform the required task, you either have to use a headless browser that can execute JavaScript/Ajax (the hard way) or, better, use the Google search API as directed by anand.
This method of searching is not advised and is bound to fail; you should use the Google search APIs for this kind of work.
Note: Google uses redirection and tokens, so even if you find a clever way to handle it, it is bound to fail in the long run.
Edit:
This is a sample of how you can get your work done reliably using the Google search APIs; please refer to the source for more information.
public static void main(String[] args) throws Exception {
String google = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=";
String search = "stackoverflow";
String charset = "UTF-8";
URL url = new URL(google + URLEncoder.encode(search, charset));
Reader reader = new InputStreamReader(url.openStream(), charset);
GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);
// Show title and URL of 1st result.
System.out.println(results.getResponseData().getResults().get(0).getTitle());
System.out.println(results.getResponseData().getResults().get(0).getUrl());
}

How to guarantee a java POST request string / text to be UTF-8 encoding

I have a text message (a String) with letters like ä, ü, ß. I want everything to be UTF-8 encoded. When I write it to a file or print the string to the console, everything is fine. But when I send the same string to a web service, I get � instead of ä, ü, ß.
I read the file from a servlet.
Do I really have to use the following two lines to get UTF-8 encoded text?
byte [] bray = text.getBytes("UTF-8");
text = new String(bray);
public static String readAsStream_UTF8(String filePathName){
String text ="";
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("resources/"+filePathName);
if(input == null){
System.out.println("Inputstream null.");
}else{
InputStreamReader isr = null;
try {
isr = new InputStreamReader((InputStream)input, "UTF-8");
BufferedReader reader = new BufferedReader(isr);
StringBuilder sb = new StringBuilder();
String sCurrentLine;
while ((sCurrentLine = reader.readLine()) != null) {
sb.append(sCurrentLine);
}
text= sb.toString();
//it works only if I use the following 2 lines
byte [] bray = text.getBytes("UTF-8");
text = new String(bray);
} catch (Exception e1) {
e1.printStackTrace();
}
}
return text;
}
My sendPOST method looks something like the following:
String charset = "UTF-8";
OutputStreamWriter writer = null;
HttpURLConnection con = null;
String response_txt ="";
InputStream iss = null;
try {
URL url = new URL(urlService);
con = (HttpURLConnection)url.openConnection();
con.setDoOutput(true); //triggers POST
con.setDoInput(true);
con.setRequestMethod("POST");
con.setRequestProperty("accept-charset", charset);
//con.setRequestProperty("Content-Type", "application/soap+xml");
con.setRequestProperty("Content-Type", "application/soap+xml;charset=UTF-8");
writer = new OutputStreamWriter(con.getOutputStream());
writer.write(msg); //send POST data string
writer.flush();
writer.close();
What do I have to do to force the msg that is sent to the web service to really be UTF-8 encoded?
If you know the encoding of the file you want to send, you don't need to convert it to an intermediate string. Simply copy its bytes to the output:
// inputstream to a UTF-8 encoded resource file
InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream("resources/"+filePathName);
HttpURLConnection con = ...
// set contenttype and encoding
con.setRequestProperty("Content-Type", "application/soap+xml;charset=UTF-8");
// copy input to output
copy(in, con.getOutputStream());
using some copy function.
Additionally you could also set the Content-Length header to the size of the resource file.
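The copy function mentioned above is not shown in the answer; one possible version is sketched below. The class name StreamCopy is hypothetical. On Java 9 and later, InputStream.transferTo(OutputStream) does the same job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {

    // Copies all bytes from in to out without decoding them,
    // so the original file encoding is preserved byte for byte.
    // Returns the number of bytes copied (usable for Content-Length checks).
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

Since no Reader or Writer is involved, no charset conversion can go wrong on the client side; the Content-Type header alone tells the server how to decode the bytes.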

HttpURLConnection response is incorrect

When using the code below to make a GET request:
private String get(String inurl, Map headers, boolean followredirects) throws MalformedURLException, IOException {
URL url = new URL(inurl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setInstanceFollowRedirects(followredirects);
// Add headers to request.
Iterator entries = headers.entrySet().iterator();
while (entries.hasNext()) {
Entry thisEntry = (Entry) entries.next();
Object key = thisEntry.getKey();
Object value = thisEntry.getValue();
connection.addRequestProperty((String)key, (String)value);
}
// Attempt to parse
InputStream stream = connection.getInputStream();
InputStreamReader isReader = new InputStreamReader(stream );
BufferedReader br = new BufferedReader(isReader );
System.out.println(br.readLine());
// Disconnect
connection.disconnect();
return connection.getHeaderField("Location");
}
The resulting response is completely nonsensical (e.g ���:ks�6��﯐9�rђ� e��u�n�qש�v���"uI*�W��s)
However I can see in Wireshark that the response is HTML/XML and nothing like the string above. I've tried a myriad of different methods for parsing the InputStream but I get the same result each time.
Please note: this only happens when it's HTML/XML, plain HTML works.
Why is the response coming back in this format?
Thanks in advance!
=== SOLVED ===
Gah, got it!
The server is compressing the response when it contains XML, so I needed to use GZIPInputStream instead of a plain InputStream.
GZIPInputStream stream = new GZIPInputStream(connection.getInputStream());
Thanks anyway!
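Since the server only compresses some responses, wrapping every stream in GZIPInputStream would break the uncompressed ones. A hedged sketch of one way to handle both cases is to check the Content-Encoding header first. The class GzipAwareReader and its methods are illustrative names; in real code the contentEncoding argument would come from connection.getContentEncoding().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipAwareReader {

    // Wraps the raw stream in a GZIPInputStream only when the
    // Content-Encoding header value says the body is gzip-compressed.
    static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the whole stream and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), charset);
    }

    // Test helper only: compresses data so the sketch can be tried without a server.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        }
        return sink.toByteArray();
    }
}
```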
Use UTF-8 encoding for the input stream reader, as below:
InputStreamReader isReader = new InputStreamReader(stream, "UTF-8");

Java UTF-8 encoding not working HttpURLConnection

I tried to make a POST call and pass input with this value: "ä€愛لآहที่"
I got this error message:
{"error":{"code":"","message":{"lang":"en-US","value":{"type":"ODataInputError","message":"Bad Input: Invalid JSON format"}}}}
This is my code
conn.setRequestMethod(ConnectionMethod.POST.toString());
conn.setRequestProperty(CONTENT_LENGTH, Integer.toString(content.getBytes().length));
conn.setRequestProperty("Accept-Charset", "UTF-8");
conn.setUseCaches(false);
conn.setDoInput(true);
conn.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
wr.writeBytes(content);
wr.flush();
wr.close();
InputStream resultContentIS;
String resultContent;
try {
resultContentIS = conn.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(resultContentIS));
StringBuilder sb = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
sb.append(line);
}
It failed on conn.getInputStream();
The value of content is
{ "input" : "ä€愛لآहที่" }
It works when the input is a string or an integer.
When I added the statement
conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
I got a different message:
{"error":{"code":"","message":{"lang":"en-US","value":{"type":"Error","message":"Internal server error"}}}}
Please try this code below:
DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(wr, "UTF-8"));
writer.write(content);
writer.close();
wr.close();
You should use JSONObject to pass the params.
For reading the input, please try:
BufferedReader reader = new BufferedReader(new InputStreamReader(resultContentIS, "UTF-8"));
If the output is ???????, do not worry: your output console does not support UTF-8.
It seems that your variable content already contains the wrong data, because you may have converted a String without paying attention to the required encoding.
Setting the correct encoding on the writer and using write() instead of writeBytes() should be worth a try.
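The difference between writeBytes() and an encoding-aware writer can be seen without a server. The sketch below writes the same content both ways into an in-memory buffer; the class name Utf8BodyDemo and its methods are made up for illustration. DataOutputStream.writeBytes() keeps only the low eight bits of each char, so anything outside ASCII is mangled.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8BodyDemo {

    // What the question's code does: one byte per char, high bits discarded.
    static byte[] viaWriteBytes(String content) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeBytes(content);
        }
        return sink.toByteArray();
    }

    // What the answers suggest: let a Writer encode the chars as UTF-8.
    static byte[] viaUtf8Writer(String content) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(content);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // U+00E4 and U+20AC are the first two characters of the question's input.
        String content = "{ \"input\" : \"\u00e4\u20ac\" }";
        System.out.println("chars:            " + content.length());
        System.out.println("writeBytes bytes: " + viaWriteBytes(content).length);
        System.out.println("UTF-8 bytes:      " + viaUtf8Writer(content).length);
    }
}
```

The byte counts also differ, so if a Content-Length header is set manually it should presumably be computed from content.getBytes(StandardCharsets.UTF_8).length rather than from getBytes() with the platform default.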
You have to send the content as a byte array:
DataOutputStream outputStream = new DataOutputStream(conn.getOutputStream());
outputStream.write(content.toString().getBytes("UTF-8"));
This is a complete solution for your character problems. The important point is to send the string as a byte array encoded with an explicit charset: getBytes() without an argument uses the platform default encoding, which may not be UTF-8. Encoding the bytes yourself prevents character encoding problems.

Wrong encoding with Java HttpURLConnection

I'm trying to read generated XML from an MS web service:
URL page = new URL(address);
StringBuffer text = new StringBuffer();
HttpURLConnection conn = (HttpURLConnection) page.openConnection();
conn.connect();
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
BufferedReader buff = new BufferedReader(in);
box.setText("Getting data ...");
String line;
do {
line = buff.readLine();
text.append(line + "\n");
} while (line != null);
box.setText(text.toString());
or
URL u = new URL(address);
URLConnection uc = u.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
System.out.println(inputLine);
}
in.close();
Any page reads fine except the web service output.
It reads the greater-than and less-than signs strangely:
it reads < as "& lt;" and > as "& gt;" (without the spaces; if I type them here without spaces, Stack Overflow turns them into < and >).
Please help.
Thanks.
First, there seems to be some confusion on this line:
inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
This effectively says that you expect every row in the document that your server is providing to be URL encoded. URL encoding is not the same as document encoding.
http://en.wikipedia.org/wiki/Percent-encoding
http://en.wikipedia.org/wiki/Character_encoding
Looking at your code snippet, I think URL encoding (percent encoding) is not what you're after.
As for document character encoding, you are making a conversion on this line:
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
conn.getContent() returns an InputStream that operates on bytes, whilst the reader operates on chars; the character encoding conversion is done here. Check out the other constructors of InputStreamReader, which take the encoding as a second argument. Without the second argument you fall back on whatever your platform default is in Java.
InputStreamReader(InputStream in, String charsetName)
for instance lets you change your code to:
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent(), "utf-8");
But the real question will be "what encoding is your server providing the content in?" If you own the server code too, you may just hard code it to something reasonable such as utf-8. But if it can vary, you need to look at the http header Content-Type to figure it out.
String contentType = conn.getHeaderField("Content-Type");
The contents of contentType will look like
text/plain; charset=utf-8
A shorthand way of getting this field is:
String contentType = conn.getContentType();
(Note that conn.getContentEncoding() returns the Content-Encoding header, e.g. gzip, not the charset.)
Notice that it's entirely possible that no charset is provided, or no Content-Type header, in which case you must fall back on reasonable defaults.
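As a sketch of that fallback logic, the helper below pulls the charset parameter out of a Content-Type value and returns a caller-supplied default when the header or the parameter is missing or unusable. The class and method names (ContentTypeCharset, charsetOf) are hypothetical, and the parsing is deliberately simple: it splits on ';' and does not handle every quoting rule of the HTTP grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {

    private static final String KEY = "charset=";

    // Returns the charset named in a Content-Type header value,
    // or the fallback if there is none or the name is not supported.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter.
            if (p.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String name = p.substring(KEY.length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the default.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/xml", StandardCharsets.ISO_8859_1));
    }
}
```

The result can then be passed as the second argument of InputStreamReader, with conn.getContentType() supplying the header value.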
Mark Rotteveel is correct: the web service is the culprit here. For some reason it is sending the greater-than and less-than signs in the & lt and & gt format.
Thanks, Martin Algesten, but as I have already stated, I worked around it; I was just looking for why it was this way.
