HTTPGet unicode characters appearing in response String - java

I have a utility used for integrating data and have run into an issue when special characters such as "ä" are used. Below is the method in question where the issue comes in. The response is from an API and is in XML format.
protected String getStringHttpContent(URI url, Map<String,String> headerParameters) throws IOException
{
HttpGet request = new HttpGet(url);
for(String parameter : headerParameters.keySet())
request.setHeader(parameter, headerParameters.get(parameter));
CloseableHttpResponse response = getClient().execute(request);
dumpHeaders(response);
BufferedReader br = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
StringBuffer sb = new StringBuffer();
String output;
while ((output = br.readLine()) != null) {
sb.append(output);
}
response.close();
return sb.toString();
}
The string njämientill comes back in the response string as njÃ¤mientill. I've tried changing the encoding, but the result remains the same. Any advice would be appreciated.

Make sure that you are using UTF-8 encoding end-to-end (through the whole chain). This includes your web pages and user input if it comes from an HTML form (for example), and setting UTF-8 on pages and web services (web.xml, sun-web.xml or so). The inbound HTTP request should also declare the charset in its Content-Type header, e.g. "Content-Type: text/html; charset=utf-8". The way you configure the server side and the client side depends on the technologies you use (which I don't know).
EDIT: regarding your comment, even if you are the client you should set the content-type to define which type of content you expect from the server (as this one may be able to serve different contents at the same URL).
Please try configuring your HttpGet with:
request.setHeader(HttpHeaders.CONTENT_TYPE, "application/xml; charset=utf-8");
or (if the server is quite old):
request.setHeader(HttpHeaders.CONTENT_TYPE, "text/xml; charset=utf-8");
Better, maybe specify the accept header together with the accepted charset:
request.setHeader("Accept-Charset", "utf-8");
request.setHeader("Accept", "application/xml");
If none of these work, I suggest you show your Postman query here or do a Wireshark capture to see the actual request and response, and also list the content of the headerParameters map. Otherwise we cannot help you more (as the rest of your code looks good, in my opinion).
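To narrow down which link in the chain is at fault, it can help to reproduce the symptom in isolation. The sketch below (class and method names are made up for illustration) shows that "Ã¤" is exactly what you get when the UTF-8 bytes of "ä" are decoded as ISO-8859-1. If your response string looks like that, some layer is decoding UTF-8 bytes with a single-byte charset, or the data was already double-encoded before it reached you.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Decode UTF-8 bytes with the wrong charset, as a misconfigured layer would.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake. Only valid if the text was mis-decoded exactly once.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "nj\u00e4mientill";        // "njämientill"
        String garbled = misdecode(original);        // "njÃ¤mientill"
        // Compare against escaped literals so the console encoding does not matter.
        System.out.println(garbled.equals("nj\u00c3\u00a4mientill")); // prints true
        System.out.println(repair(garbled).equals(original));         // prints true
    }
}
```

The repair step is a diagnostic aid rather than a fix: the real fix is to find the layer that decodes with the wrong charset.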

Related

java URLConnection to http URL and extracting XML response

I have a rest http URL from which I have to extract the XML response. When I browse the URL using a browser, it returns html content. My code also sees the same html content instead of XML content.
Is there a way to get the XML content instead of HTML content? With the code below, I am getting only the HTML response. But if I check with the Postman plugin in Chrome, it shows a nice XML response. How do I get the same response using my code?
public static void sendURL(String urlValue)throws Exception{
URL oracle = new URL("https://whois.arin.net/rest/asn/AS2639");
URLConnection yc = oracle.openConnection();
yc.setRequestProperty("content-type", "application/xml");
BufferedReader in = new BufferedReader(new InputStreamReader(
yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
Try to replace this:
yc.setRequestProperty("content-type", "application/xml");
with this:
yc.setRequestProperty("Accept", "application/xml");
The two headers serve completely different purposes: Content-Type describes what you have in the body of your request, while Accept tells the server what kind of content the client can handle, which is what you want here.
Content-Type:
The MIME type of the body of the request (used with POST and PUT
requests)
Accept:
Content-Types that are acceptable for the response.
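As a minimal sketch of the distinction, using the same URLConnection API as the question (the class and method names are hypothetical): a GET request has no body, so it only needs Accept. The helper below only prepares the connection. Nothing is sent until you call getInputStream() or connect().

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

final class AcceptHeaderExample {
    // Prepare (but do not send) a GET that asks the server for XML.
    static URLConnection prepareXmlGet(String url) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        // Accept describes the response we would like to receive.
        connection.setRequestProperty("Accept", "application/xml");
        // No Content-Type: a GET has no request body to describe.
        return connection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection c = prepareXmlGet("https://whois.arin.net/rest/asn/AS2639");
        System.out.println("Accept: " + c.getRequestProperty("Accept"));
        System.out.println("Content-Type: " + c.getRequestProperty("Content-Type"));
    }
}
```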
So you already have a stream. What you need to do next is to pass that stream to a library that can decode and parse XML. Try https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilder.html#parse(org.xml.sax.InputSource)
UPDATE
Sorry, your initial question was not very clear. If your java invocation of the HTTP request is yielding HTML and the one you want is an XML response, there must be some difference between the HTTP requests you make through the browser and through Java. You can use a tool like TCPMON to sit between your backend and your Java program to capture the raw HTTP request and then compare that with the one you make through the browser.
Since HTTP is a request/response pair, equivalent HTTP requests should always send back the same response.
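Once the server does return XML, handing the stream to DocumentBuilder as suggested above could look roughly like this. It is a sketch only: the element names in the sample document are invented and are not ARIN's actual schema. In real use you would pass yc.getInputStream() instead of the in-memory stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlStreamParsing {
    // Parse raw bytes so the parser can honour the encoding in the XML declaration.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations, since the input comes from a remote server.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the HTTP response body.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><asn><handle>AS2639</handle></asn>";
        Document doc = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("handle").item(0).getTextContent());
    }
}
```

Passing the InputStream rather than a Reader lets the parser detect the document's encoding itself, which avoids guessing a charset in your own code.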
I found the answer. Updated code. We just need to accept only the xml response.
public static void sendURL(String urlValue)throws Exception{
URL oracle = new URL("https://whois.arin.net/rest/asn/AS2639");
URLConnection yc = oracle.openConnection();
yc.setRequestProperty("accept", "application/xml");
BufferedReader in = new BufferedReader(new InputStreamReader(
yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}

Getting RDF/XML web page using GET request with Accept header in Java

I want to send a GET request that accepts only results of type application/rdf+xml, using the Accept: header. Is the following code right?
URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept", "application/rdf+xml");
InputStream response = connection.getInputStream();
@gigadot nailed it: the Accept header is a suggestion to the server, which the server is free to ignore.
If your application can only accept RDF/XML then you need to add logic to the receipt of the request to enforce this.
You can use the getContentType() method of a URLConnection to see what content type the server returned you and take an appropriate action (e.g. report an error) if it does not match your requirements.
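A rough sketch of such a check is below. The helper name is made up. It compares only the media type, ignoring case and any parameters such as charset, because servers often append those. You would call it with the value of connection.getContentType().

```java
import java.util.Locale;

final class ContentTypeCheck {
    // True if the Content-Type header value names the expected media type.
    static boolean isMediaType(String contentTypeHeader, String expected) {
        if (contentTypeHeader == null) {
            return false; // the server sent no Content-Type at all
        }
        // Drop parameters such as "; charset=UTF-8".
        int semicolon = contentTypeHeader.indexOf(';');
        String mediaType = semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader;
        return mediaType.trim().toLowerCase(Locale.ROOT)
                .equals(expected.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isMediaType("application/rdf+xml; charset=UTF-8", "application/rdf+xml")); // prints true
        System.out.println(isMediaType("text/html", "application/rdf+xml"));                          // prints false
    }
}
```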

How can post data to form which is in other server

I am creating an automated system to post data to a form in order to register on a website.
URL url = new URL("https://www.walmart.com/subflow/YourAccountLoginContext/1471476370/sub_generic_login/create_account.do");
String postData = "firstName="+xlsDataList.get(0)+"&lastName="+xlsDataList.get(1)+"&userName="+xlsDataList.get(2)+"&userNameConfirm="+xlsDataList.get(3)+"&pwd="+xlsDataList.get(5)+"&pwdConfirm="+xlsDataList.get(6);
HttpsURLConnection uc = (HttpsURLConnection) url.openConnection();
uc.setDoInput(true);
uc.setDoOutput(true);
uc.setRequestMethod("POST");
uc.setRequestProperty("Accept", "*/*");
uc.setRequestProperty("Content-Length", Integer.toString(postData.getBytes().length));
uc.setRequestProperty("Content-Type", "text/html; charset=utf-8");
OutputStreamWriter outputWriter = new OutputStreamWriter(uc.getOutputStream());
outputWriter.write(postData);
outputWriter.flush();
outputWriter.close();
I thought that the post data above were just request attributes, and coded accordingly. But after closely checking the page source, I came to know that they are form attributes.
I don't have access to that form. Now how can I post the data to the form, so that the user gets registered by the site?
I have to set the values on the form bean.
Please provide your suggestions.
You are using the wrong Content-Type in your POST: you need to use application/x-www-form-urlencoded. Once you change that, the server will interpret your request body as request parameters and (likely) your "formbean" will be filled with the data.
The above code may be a test case, but you really ought to take care to properly encode all of your data that you are trying to POST. Otherwise, you run the risk of either having a syntactically invalid request (in which case, the server will either reject the request, or ignore important parameters) or introducing a security vulnerability where a user can inject arbitrary request parameters into your POST. I highly recommend code that looks like this:
import java.net.URLEncoder;
String charset = "UTF-8"; // Change this if you want some other encoding
StringBuilder postData = new StringBuilder();
postData.append(URLEncoder.encode("firstName", charset));
postData.append("=");
postData.append(URLEncoder.encode(xlsDataList.get(0), charset));
postData.append("&");
postData.append(URLEncoder.encode("lastName", charset));
postData.append("=");
postData.append(URLEncoder.encode(xlsDataList.get(1), charset));
postData.append("&");
postData.append(URLEncoder.encode("userName", charset));
postData.append("=");
postData.append(URLEncoder.encode(xlsDataList.get(2), charset));
postData.append("&");
postData.append(URLEncoder.encode("userNameConfirm", charset));
postData.append("=");
postData.append(URLEncoder.encode(xlsDataList.get(3), charset));
postData.append("&");
postData.append(URLEncoder.encode("pwd", charset));
postData.append("=");
postData.append(URLEncoder.encode(xlsDataList.get(5), charset));
postData.append("&");
postData.append(URLEncoder.encode("pwdConfirm", charset));
postData.append("=");
postData.append(URLEncoder.encode(xlsDataList.get(6), charset));
It seems silly to encode the static strings like "userNameConfirm", but if you get into that habit, you'll end up using it all the time and your code will be a lot safer.
Also, you need to make sure that the data you send through the OutputStream has the right Content-Length: you are computing the content length properly, but then you aren't sending the same bytes you used for the computation to the server. You want your code to look more like this:
byte[] postDataBytes = postData.toString().getBytes(charset);
uc.setRequestProperty("Content-Length", Integer.toString(postDataBytes.length));
uc.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
OutputStream outputStream = uc.getOutputStream();
outputStream.write(postDataBytes);
outputStream.flush();
outputStream.close();
You can find a very comprehensive HttpURLConnection tutorial in the community wiki: Using java.net.URLConnection to fire and handle HTTP requests
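The repeated append calls can be folded into a small helper. This is one possible sketch, not part of the original answer: the class name is invented, and it uses the URLEncoder.encode(String, Charset) overload, which requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormEncoder {
    // Build an application/x-www-form-urlencoded body from name/value pairs.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Encode names and values so '&' and '=' in the data cannot inject parameters.
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("firstName", "Joe Bloggs");
        fields.put("pwd", "a&b=c");
        System.out.println(encode(fields)); // prints firstName=Joe+Bloggs&pwd=a%26b%3Dc
    }
}
```

The resulting string can then be converted with getBytes(StandardCharsets.UTF_8) and written to the connection's output stream, so that the bytes sent are the same bytes used for Content-Length.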
I recommend using Apache HttpClient. It's faster and easier to implement.
PostMethod post = new PostMethod("https://www.walmart.com/subflow/YourAccountLoginContext/1471476370/sub_generic_login/create_account.do");
NameValuePair[] data = {
new NameValuePair("firstName", "joe"),
new NameValuePair("lastName", "bloggs")
};
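post.setRequestBody(data);
// The request must be executed before the response can be read (HttpClient 3.x API).
HttpClient client = new HttpClient();
client.executeMethod(post);
InputStream in = post.getResponseBodyAsStream();
// handle response.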
For details, refer to http://hc.apache.org/
If your project uses Spring 3.x or later, I would recommend using the Spring RestTemplate; it's pretty handy for doing HTTP. The code below does a form post to log in.
public String login(String username, String password)
{
MultiValueMap<String, String> form = new LinkedMultiValueMap<>();
form.add(usernameInputFieldName, username);
form.add(passwordInputFieldName, password);
RestTemplate template = new RestTemplate();
URI location = template.postForLocation(loginUrl(), form);
return location.toString();
}
The "HTTP Error 500" you've described in your comment is an "Internal server error".
This means that the server either can't use your request (GET/POST) or there's a problem specific to the server you are trying to call.
Taking a look at the URL you're calling, I immediately get the same Error 500.
Same happens for both GET and POST requests at httqs://www.walmart.com/subflow/YourAccountLoginContext/1471476370/sub_generic_login/create_account.do (Live link deactivated; replace "q" with "p" to make it work.)
In short: the "HTTP Error 500" that Walmart's servers return across the board prevents your call from succeeding.
By the way:
It's not uncommon to get an error 500 instead of a 403 if they are locking your access down.
As you probably don't own the Walmart website, and since you're trying to access parts of their website that are worth protecting from third-party access, this might well be the case. ;)
PS: I'm not sure if it's wise to show the AccountLogin number in public like this. After all, it's the client ID of a specific Walmart account holder. But hey, that's your choice, not mine.
Also, double check the parameters you are sending. There may be some validations on input data the server is doing. Eg, some fields are mandatory, some are numbers only, etc.
Try spoofing a browser by modifying the User-Agent. Walmart may have a security mechanism that detects that you are doing this in an automated way.
(If you have problems setting the user agent see this post: Setting user agent of a java URLConnection)

403 error in accessing an URL but works fine in browsers

String url = "http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false";
URL google = new URL(url);
HttpURLConnection con = (HttpURLConnection) google.openConnection();
and I use a BufferedReader to print the content, but I get a 403 error.
The same URL works fine in the browser. Could anyone suggest a fix?
The reason it works in a browser but not in java code is that the browser adds some HTTP headers which you lack in your Java code, and the server requires those headers. I've been in the same situation - and the URL worked both in Chrome and the Chrome plugin "Simple REST Client", yet didn't work in Java. Adding this line before the getInputStream() solved the problem:
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
...even though I have never used Mozilla. Your situation might require a different header. It might be related to cookies: I was getting text in the error stream advising me to enable cookies.
Note that you might get more information by looking at the error text. Here's my code:
try {
HttpURLConnection connection = ((HttpURLConnection)url.openConnection());
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
InputStream input;
if (connection.getResponseCode() == 200) // this must be called before 'getErrorStream()' works
input = connection.getInputStream();
else input = connection.getErrorStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
String msg;
while ((msg =reader.readLine()) != null)
System.out.println(msg);
} catch (IOException e) {
System.err.println(e);
}
HTTP 403 is the Forbidden status code. You would have to read HttpURLConnection.getErrorStream() to see the response from the server (which can tell you why you have been given an HTTP 403), if any.
This code should work fine. If you have been making a number of requests, it is possible that Google is just throttling you. I have seen Google do this before. You can try using a proxy to verify.
Most browsers automatically encode URLs when you enter them, but the Java URL class doesn't.
You should encode the URL's parameter values with URLEncoder.
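For the URL in the question, the characters a browser would escape are the "|" separators and any spaces. As an alternative sketch to encoding each value with URLEncoder, the multi-argument java.net.URI constructor quotes characters that are illegal in a query. The class name below is hypothetical, and the query is written with real spaces instead of "+" to show the effect.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DirectionsUrl {
    // Assemble the URI from components; illegal query characters get percent-encoded.
    static URI build(String rawQuery) throws URISyntaxException {
        return new URI("http", "maps.googleapis.com", "/maps/api/directions/xml", rawQuery, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = build("origin=Chicago,IL&destination=Los Angeles,CA"
                + "&waypoints=Joplin,MO|Oklahoma City,OK&sensor=false");
        // '|' becomes %7C and ' ' becomes %20; ',' '&' and '=' are left alone.
        System.out.println(uri.toASCIIString());
    }
}
```

Note the limitation: this constructor leaves "&", "=" and "+" untouched, so a value that itself contains those characters still needs URLEncoder applied to that value.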
I know this is a bit late, but the easiest way to get the contents of a URL is to use the Apache HttpComponents HttpClient project: http://hc.apache.org/httpcomponents-client-ga/index.html
Your original page (with the link) and the linked target page are not on the same domain:
there is an original-domain and a target-domain.
I found that the difference is in the request headers.
With the 403 Forbidden error,
the request headers contain one line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
When I enter the URL directly, there is no 403 Forbidden error, and
the request headers do NOT contain the Referer line above.
I finally figured out how to fix this error!
On your original-domain web page, you have to add
<meta name="referrer" content="no-referrer" />
This prevents the Referer header from being sent, and it works both for links and for Ajax requests.

Problem reading request body in servlet

I am writing an HTTP proxy that is part of a test/verification
system. The proxy filters all requests coming from the client device
and directs them towards various systems under test.
The proxy is implemented as a servlet where each request is forwarded
to the target system; it handles both GET and POST. Sometimes the
response from the target system is altered to fit various test
conditions, but that is not part of the problem.
When forwarding a request, all headers are copied except for those
that are part of the actual HTTP transfer, such as the Content-Length and
Connection headers.
If the request is an HTTP POST, then the entity body of the request is
forwarded as well, and this is where it sometimes doesn't work.
The code reading the entity body from the servlet request is the following:
URL url = new URL(targetURL);
HttpURLConnection conn = (HttpURLConnection)url.openConnection();
String method = request.getMethod();
java.util.Enumeration headers = request.getHeaderNames();
while(headers.hasMoreElements()) {
String headerName = (String)headers.nextElement();
String headerValue = request.getHeader(headerName);
if (...) { // do various adaptive stuff based on header
}
conn.setRequestProperty(headerName, headerValue);
}
// here is the part that fails
char postBody[] = new char[1024];
int len;
if(method.equals("POST")) {
logger.debug("guiProxy, handle post, read request body");
conn.setDoOutput(true);
BufferedReader br = request.getReader();
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(conn.getOutputStream()));
do {
logger.debug("Read request into buffer of size: " + postBody.length);
len = br.read(postBody, 0, postBody.length);
logger.debug("guiProxy, send request body, got " + len + " bytes from request");
if(len != -1) {
bw.write(postBody, 0, len);
}
} while(len != -1);
bw.close();
}
So what happens is that the first time a POST is received, -1
characters are read from the request reader. A Wireshark trace shows
that the entity body containing the URL-encoded POST parameters is there
and that it is in one TCP segment, so there are no network-related
differences.
The second time, br.read successfully returns the 232 bytes in the
POST request entity body, and every subsequent request works as well.
The only difference between the first and subsequent POST requests is
that in the first one, no cookies are present, but in the second one,
a cookie is present that maps to the JSESSION.
Can it be a side effect of the entity body not being available because the
request processing in the servlet container has already read the POST
parameters? But then why does it work on subsequent requests?
I believe that the solution is of course to ignore the entity body on
POST requests containing URL-encoded data and fetch all parameters
from the servlet request instead using getParameter and reinsert them
in the outgoing request.
Although that is tricky, since the POST request could contain GET
parameters (not in our application right now, but implementing it
correctly is some work).
So my question is basically: why does the reader from
request.getReader() return -1 when reading and an entity body is
present in the request? If the entity body is not available for
reading, then getReader should throw an illegal state exception. I
have also tried with InputStream using getInputStream() with the same
results.
All of this is tested on apache-tomcat-6.0.18.
So my question is basically: why does the reader from request.getReader() return -1 when reading.
It will return -1 when there is no body or when it has already been read. You cannot read it twice. Make sure that nothing before in the request/response chain has read it.
and an entity body is present in the request, if the entity body is not available for reading, then getReader should throw an illegal state exception.
It will only throw that when you have already called getInputStream() on the request before, not when it is not available.
I have also tried with InputStream using getInputStream() with the same results.
After all, I'd prefer streaming bytes rather than characters, because then you don't need to take character encoding into account (which you aren't doing so far; this may lead to problems once you get all of this to work).
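A byte-oriented copy loop for the proxy could be sketched as follows. The class name is invented for illustration. In the servlet you would pass request.getInputStream() as the source and conn.getOutputStream() as the destination, instead of the in-memory streams used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class BodyForwarder {
    // Copy the body byte for byte, so no charset decoding or re-encoding happens.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total; // number of bytes forwarded
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "name=nj%C3%A4mientill&x=1".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream forwarded = new ByteArrayOutputStream();
        long count = copy(new ByteArrayInputStream(body), forwarded);
        System.out.println(count == body.length); // prints true
    }
}
```

Returning the byte count also makes it easy to log a zero-length body, which is how an already-consumed request stream shows up.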
It seems that moving
BufferedReader br = request.getReader()
before all operations that read the request (like request.getHeader()) works well for me.
