UTF-8 response with servlet - java

I am reading an HTTP response from a Perl page in a servlet like this:
public String getHTML(String urlToRead) {
URL url;
HttpURLConnection conn;
BufferedReader rd;
String line;
String result = "";
try {
url = new URL(urlToRead);
conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
conn.setRequestProperty("Accept-Charset", "UTF-8");
conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
while ((line = rd.readLine()) != null) {
byte [] b = line.getBytes();
result += new String(b, "UTF-8");
}
rd.close();
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
I am displaying this result with this code:
response.setContentType("text/plain; charset=UTF-8");
PrintWriter out = new PrintWriter(new OutputStreamWriter(response.getOutputStream(), "UTF-8"), true);
try {
String query = request.getParameter("query");
String type = request.getParameter("type");
String res = getHTML(url);
out.write(res);
} finally {
out.close();
}
But the response still is not encoded as UTF-8. What am I doing wrong?
Thanks in advance.

That call to line.getBytes() looks suspicious. You should probably make it line.getBytes("UTF-8") if you are certain that what is returned is UTF-8 encoded. Additionally, I'm not sure why it is even necessary. A typical approach to getting data out of a BufferedReader is to use a StringBuilder to continue appending each String retrieved from readLine into a result. The conversion back and forth between String and byte[] is unnecessary.
Change result into a StringBuilder and do this:
while ((line = rd.readLine()) != null) {
result.append(line);
}
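A minimal sketch of that StringBuilder approach, assuming the remote page really sends UTF-8. The class and method names (Utf8LineReader, readAll) are made up for illustration, and re-adding the line break after each line is my own choice, since readLine() strips it:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class Utf8LineReader {
    // Decodes the stream as UTF-8 exactly once; no String <-> byte[] round trip.
    static String readAll(InputStream in) throws IOException {
        StringBuilder result = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) {
                // readLine() drops the line terminator, so put one back.
                result.append(line).append('\n');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file ASCII-only.
        byte[] utf8 = "Gr\u00fc\u00dfe\nN\u00c3O".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(utf8)));
    }
}
```

In the question's getHTML, conn.getInputStream() would be passed where the ByteArrayInputStream is used here.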

Here is where you break the chain of character encoding conversions:
while ((line = rd.readLine()) != null) {
byte [] b = line.getBytes(); // NOT UTF-8
result += new String(b, "UTF-8");
}
From String#getBytes() javadoc:
Encodes this String into a sequence of bytes using the platform's
default charset, storing the result into a new byte array
And the default charset is probably not UTF-8.
But why do all the conversions in the first place? Just read the raw bytes from the source and write the raw bytes to the consumer. It's supposed to be UTF-8 all the way.
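To see the broken chain in isolation, here is a small sketch. It uses ISO-8859-1 as a stand-in for a platform default that is not UTF-8 (that is an assumption; your default may differ), and the names CharsetRoundTrip and roundTrip are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // Encodes with one charset and decodes with another, which is what
    // getBytes() followed by new String(b, "UTF-8") does when the platform
    // default charset is not UTF-8.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u00e4\u00fc\u00df"; // a-umlaut, u-umlaut, sharp s
        // Assumption: stand-in for a non-UTF-8 platform default.
        Charset notUtf8 = StandardCharsets.ISO_8859_1;

        // Mismatched charsets: the text is damaged, prints false.
        System.out.println(original.equals(
                roundTrip(original, notUtf8, StandardCharsets.UTF_8)));
        // Matching charsets: the text survives, prints true.
        System.out.println(original.equals(
                roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
    }
}
```

Even when the charsets match, the round trip only gives back the string you already had, so dropping it entirely is the simpler fix.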

I faced the same problem in another scenario. I believe it will work if you name the charset explicitly:
byte[] b = line.getBytes(UTF8_CHARSET);
in place of the getBytes() call in the while loop:
while ((line = rd.readLine()) != null) {
byte [] b = line.getBytes(); // NOT UTF-8
result += new String(b, "UTF-8");
}

In my case, I had to add another configuration.
Previously, I was writing the page this way:
try (PrintStream printStream = new PrintStream(response.getOutputStream())) {
printStream.print(pageInjecting);
}
I changed to:
try (PrintStream printStream = new PrintStream(response.getOutputStream(), false, "UTF-8")) {
printStream.print(pageInjecting);
}
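The difference is easy to check outside a servlet. This is only a sketch: it writes to a ByteArrayOutputStream instead of response.getOutputStream(), and the names PrintStreamEncoding and printAs are invented for the example:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class PrintStreamEncoding {
    // Prints the text through a PrintStream that uses the named charset
    // and returns the bytes that reached the underlying stream.
    static byte[] printAs(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(sink, false, charsetName)) {
            ps.print(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+00E4 (a-umlaut) takes two bytes in UTF-8, so this prints 2.
        System.out.println(printAs("\u00e4", "UTF-8").length);
    }
}
```

Without the charset argument, PrintStream encodes with the platform default, which is the same trap as the bare getBytes() call in the question.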

Related

How to guarantee a java POST request string / text to be UTF-8 encoding

I have a text message/string with letters like ä, ü, ß. I want everything to be UTF-8 encoded. When I write to a file or print the string to the console, everything is fine. But when I send the same string to a web service, I get � instead of ä, ü, ß.
I read the file from a Servlet.
Do I really have to use the following 2 lines to get a UTF-8 encoded text?
byte [] bray = text.getBytes("UTF-8");
text = new String(bray);
public static String readAsStream_UTF8(String filePathName){
String text ="";
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("resources/"+filePathName);
if(input == null){
System.out.println("Inputstream null.");
}else{
InputStreamReader isr = null;
try {
isr = new InputStreamReader((InputStream)input, "UTF-8");
BufferedReader reader = new BufferedReader(isr);
StringBuilder sb = new StringBuilder();
String sCurrentLine;
while ((sCurrentLine = reader.readLine()) != null) {
sb.append(sCurrentLine);
}
text= sb.toString();
//it works only if I use the following 2 lines
byte [] bray = text.getBytes("UTF-8");
text = new String(bray);
} catch (Exception e1) {
e1.printStackTrace();
}
}
return text;
}
My sendPOST method looks something like the following:
String charset = "UTF-8";
OutputStreamWriter writer = null;
HttpURLConnection con = null;
String response_txt ="";
InputStream iss = null;
try {
URL url = new URL(urlService);
con = (HttpURLConnection)url.openConnection();
con.setDoOutput(true); //triggers POST
con.setDoInput(true);
con.setRequestMethod("POST");
con.setRequestProperty("accept-charset", charset);
//con.setRequestProperty("Content-Type", "application/soap+xml");
con.setRequestProperty("Content-Type", "application/soap+xml;charset=UTF-8");
writer = new OutputStreamWriter(con.getOutputStream());
writer.write(msg); //send POST data string
writer.flush();
writer.close();
What do I have to do to force the msg that will be sent to the web service to really be UTF-8 encoded?
If you know the encoding of the file which you want to send you don't need to convert it to an intermediary string. Simply copy its bytes to the output:
// inputstream to a UTF-8 encoded resource file
InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream("resources/"+filePathName);
HttpURLConnection con = ...
// set contenttype and encoding
con.setRequestProperty("Content-Type", "application/soap+xml;charset=UTF-8");
// copy input to output
copy(in, con.getOutputStream());
using some copy function.
Additionally you could also set the Content-Length header to the size of the resource file.
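The copy function is not spelled out above, so here is one possible version as a sketch. The class name StreamCopy and the 8192-byte buffer size are my own choices:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamCopy {
    // Copies raw bytes from in to out without decoding them, so the
    // encoding of the source is preserved. Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "\u00e4\u00fc\u00df".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(source), target) + " bytes copied");
    }
}
```

On Java 9 and later, in.transferTo(out) does the same job. If the message has to stay a String, as in the question's sendPOST, passing the charset to the writer, for example new OutputStreamWriter(con.getOutputStream(), StandardCharsets.UTF_8), avoids falling back to the platform default.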

HTTP GET request in java returns meaningless data from App Engine

I'm requesting a json file from an App Engine URL
http://1-1-26a.wordbuzzweb.appspot.com/json/level-images.json
The file encoding is UTF-8 without a BOM. On my local disk its size is 12414 bytes. If I get the file in Chrome, it reads it perfectly well, and if I then save it, it is 12414 bytes. However, if I try to download the file with a GET request in Java, I only get 780 bytes back and the returned data appears to be meaningless.
I've tried several different types of GET request; I have used both of the methods below elsewhere without problems. The response code on the GET requests is 200. Interestingly, if I do a POST with no content instead of a GET, then I get the valid response.
If I download the file from this URL on Google Drive instead, then the GET methods below work perfectly.
Edit: This code is now working; however, this is a recurring issue that comes and goes. If anyone has any idea what might be causing it, please say so!
This doesn't work
public static String doGetSync(String urlToRead) throws IOException {
URL url = new URL(urlToRead);
InputStream is = url.openStream();
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] data = new byte[BUFFER_SIZE];
while ((nRead = is.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
}
buffer.flush();
byte[] bytes = buffer.toByteArray();
return new String(bytes, "UTF-8");
}
Neither does this
public static String doGetSync2(String urlToRead) throws IOException {
final String charset = "UTF-8";
// Create the connection
HttpURLConnection connection = (HttpURLConnection) new URL(urlToRead).openConnection();
// Check the error stream first, if this is null then there have been no issues with the request
InputStream inputStream = connection.getErrorStream();
if (inputStream == null)
inputStream = connection.getInputStream();
// Read everything from our stream
BufferedReader responseReader = new BufferedReader(new InputStreamReader(inputStream, charset));
String inputLine;
StringBuilder response = new StringBuilder();
while ((inputLine = responseReader.readLine()) != null) {
response.append(inputLine);
}
responseReader.close();
return response.toString();
}
This code works
public static String doPostSync(final String url, final String content) throws IOException {
final String charset = "UTF-8";
// Create the connection
HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
// setDoOutput(true) implicitly sets the request type to POST
connection.setDoOutput(true);
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestProperty("Content-type", "application/json");
// Write to the connection
OutputStream output = connection.getOutputStream();
output.write(content.getBytes(charset));
output.close();
// Check the error stream first, if this is null then there have been no issues with the request
InputStream inputStream = connection.getErrorStream();
if (inputStream == null)
inputStream = connection.getInputStream();
// Read everything from our stream
BufferedReader responseReader = new BufferedReader(new InputStreamReader(inputStream, charset));
String inputLine;
StringBuilder response = new StringBuilder();
while ((inputLine = responseReader.readLine()) != null) {
response.append(inputLine);
}
responseReader.close();
return response.toString();
}

How to escape the NÃO while reading the URL

InputStream in = address.openStream();
URL url = new URL("://www.mydomain.com/?param1=NÃO&param2=NÃO");
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder result = new StringBuilder();
String line;
while((line = reader.readLine()) != null) {
result.append(line);
}
System.out.println(result.toString());
But when I try to put the result into the StringBuilder, the special character Ã in NÃO is getting escaped.
How can I read it without losing the character encoding?
I believe you want to use URLEncoder.encode(String, String) to encode your parameter, like this:
try {
String value = URLEncoder.encode("NÃO", "utf-8");
String url = "://www.mydomain.com/?param1=" + value + "&param2="
+ value;
System.out.println(url);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
Output is
://www.mydomain.com/?param1=N%C3%83O&param2=N%C3%83O

Grabbing JSON works from one link, not from another

I'm doing a simple JSON grab from two links with the same code. I'm doing it two separate times, so the cause of my issue isn't because they're running into each other or something.
Here is my code:
@Override
protected String doInBackground(Object... params) {
try {
URL weatherUrl = new URL("my url goes here");
HttpURLConnection connection = (HttpURLConnection) weatherUrl
.openConnection();
connection.connect();
responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
InputStream inputStream = connection.getInputStream();
Reader reader = new InputStreamReader(inputStream);
int contentLength = connection.getContentLength();
char[] charArray = new char[contentLength];
reader.read(charArray);
String responseData = new String(charArray);
Log.v("test", responseData);
When I try this with:
http://www.google.com/calendar/feeds/developer-calendar@google.com/public/full?alt=json
I get an error about an array length of -1.
For this link:
http://api.openweathermap.org/data/2.5/weather?id=5815135
It returns fine and I get a log of all of the JSON. Does anyone have any idea why?
Note: I tried stepping through my code in debug mode, but I couldn't catch anything. I also downloaded a Google chrome extension for parsing json in the browser and both urls look completely valid. I'm out of ideas.
Log this: int contentLength = connection.getContentLength();
I don't see the google url returning a content-length header.
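getContentLength() returns -1 when the length is not known, and new char[-1] then throws NegativeArraySizeException. Even when the header is present it counts bytes rather than chars, and a single read(charArray) call is not guaranteed to fill the array. A sketch that does not depend on the header at all, assuming the response is UTF-8 (the names ReadUnknownLength and readFully are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReadUnknownLength {
    // Reads until end of stream with a fixed-size buffer, so no
    // Content-Length header is needed.
    static String readFully(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        // Assumption: the body is UTF-8 encoded.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFully(new ByteArrayInputStream(body)));
    }
}
```

In doInBackground, connection.getInputStream() would be passed to readFully in place of the fixed-size char array.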
If you just want String output from a url, you can use Scanner and URL like so:
Scanner s = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A");
out = s.next();
s.close();
(don't forget try/finally block and exception handling)
The longer way (which allows for progress reporting and such):
String convertStreamToString(InputStream is) throws UnsupportedEncodingException {
BufferedReader reader = new BufferedReader(new
InputStreamReader(is, "UTF-8"));
StringBuilder sb = new StringBuilder();
String line = null;
try {
while ((line = reader.readLine()) != null)
sb.append(line + "\n");
} catch (IOException e) {
// Handle exception
} finally {
try {
is.close();
} catch (IOException e) {
// Handle exception
}
}
return sb.toString();
}
and then call String response = convertStreamToString( inputStream );

How to convert the DataInputStream to the String in Java?

I have a question about Java. I have used URLConnection to retrieve a DataInputStream, and I want to convert the DataInputStream into a String variable. What should I do? Can anyone help me? Thank you.
The following is my code:
URL data = new URL("http://google.com");
URLConnection dataConnection = data.openConnection();
DataInputStream dis = new DataInputStream(dataConnection.getInputStream());
String data_string;
// convert the DataInputStream to the String
import java.net.*;
import java.io.*;
class ConnectionTest {
public static void main(String[] args) {
try {
URL google = new URL("http://www.google.com/");
URLConnection googleConnection = google.openConnection();
DataInputStream dis = new DataInputStream(googleConnection.getInputStream());
StringBuffer inputLine = new StringBuffer();
String tmp;
while ((tmp = dis.readLine()) != null) {
inputLine.append(tmp);
System.out.println(tmp);
}
//use inputLine.toString(); here it would have whole source
dis.close();
} catch (MalformedURLException me) {
System.out.println("MalformedURLException: " + me);
} catch (IOException ioe) {
System.out.println("IOException: " + ioe);
}
}
}
This is what you want.
You can use commons-io IOUtils.toString(dataConnection.getInputStream(), encoding) in order to achieve your goal.
DataInputStream is not used for what you want - i.e. you want to read the content of a website as String.
If you want to read data from a generic URL (such as www.google.com), you probably don't want to use a DataInputStream at all. Instead, create a BufferedReader and read line by line with the readLine() method. Use the URLConnection.getContentType() method to find out the content's charset (you will need this in order to create your reader properly).
Example:
URL data = new URL("http://google.com");
URLConnection dataConnection = data.openConnection();
// Find out charset, default to ISO-8859-1 if unknown
String charset = "ISO-8859-1";
String contentType = dataConnection.getContentType();
if (contentType != null) {
int pos = contentType.indexOf("charset=");
if (pos != -1) {
charset = contentType.substring(pos + "charset=".length());
}
}
// Create reader and read string data
BufferedReader r = new BufferedReader(
new InputStreamReader(dataConnection.getInputStream(), charset));
String content = "";
String line;
while ((line = r.readLine()) != null) {
content += line + "\n";
}
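One caveat with the substring approach: a header such as text/html; charset="utf-8"; boundary=x leaves quotes or trailing parameters in the charset name. A slightly more defensive sketch follows; the names ContentTypeCharset and charsetOf are hypothetical, and it is not a full HTTP header parser:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {
    private static final String KEY = "charset=";

    // Returns the charset named in a Content-Type value, or the fallback
    // if there is none or the name is not usable on this JVM.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive match on the parameter name.
            if (p.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String name = p.substring(KEY.length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Prints UTF-8
        System.out.println(charsetOf("text/html; charset=\"utf-8\"",
                StandardCharsets.ISO_8859_1));
    }
}
```

The returned Charset can be passed straight to new InputStreamReader(dataConnection.getInputStream(), charset).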
