Using URLConnection.getInputStream() to get source code (amazon.de)

When I want to get the source code of a specific web page, I use the following code:
URL url = new URL("https://google.de");
URLConnection urlConnect = url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(urlConnect.getInputStream())); //Here is the error with the amazon url
StringBuffer sb = new StringBuffer();
String line, htmlData;
while((line=br.readLine())!=null){
    sb.append(line+"\n");
}
htmlData = sb.toString();
The code above works without problems, but when the URL is changed to...
URL url = new URL("https://amazon.de");
...then you sometimes get an IOException: the server returned HTTP response code 503. This doesn't make sense to me, because I can open the Amazon web page in a browser without any errors.

When accessing https://amazon.de with curl -v https://amazon.de you either get a 503 or a 301 status code in the response (When following the redirect, you get a 503 from the referenced location https://www.amazon.de/). The body contains the following comment:
To discuss automated access to Amazon data please contact api-services-support@amazon.com.
For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.de/ref=rm_5_sv, or our Product Advertising API at https://partnernet.amazon.de/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
I assume Amazon returns this response when it detects that your request comes from a non-browser context (e.g. by parsing the User-Agent header), to nudge you towards using the APIs instead of crawling the site directly.
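To see that comment from Java rather than from curl, the response body has to be read even when the status is 503, which getInputStream() refuses to do. The following is a minimal sketch, not the original poster's code; the class and method names (ResponseBodyReader, readAll, readBody) are made up for illustration, and UTF-8 is an assumed charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

public class ResponseBodyReader {

    /** Reads the whole stream as UTF-8 text, line by line (charset is an assumption). */
    public static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return ""; // getErrorStream() returns null when there is no error body
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Returns the body for any status code instead of throwing on 4xx/5xx. */
    public static String readBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        // getInputStream() throws an IOException for error statuses,
        // so switch to the error stream in that case.
        InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        return readAll(in);
    }
}
```

Calling readBody((HttpURLConnection) url.openConnection()) for the Amazon URL should then return the HTML of the error page, including the hint about the APIs, whenever the server answers with a 503.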

Related

Graph API in SSO is not working in Azure AD

I am trying to develop a Java web application with SSO by following this azure tutorial. I created an account in Azure and created an AD. Developed and deployed the code in Tomcat. When I try to access the page, I am getting the following error
Exception - java.io.IOException: Server returned HTTP response code: 403 for URL: https://graph.windows.net/ppceses.onmicrosoft.com/users?api-version=2013-04-05
I could not find enough answers for this error. I changed the api-version to 1.6, but even then it did not work.
MORE ANALYSIS:
After troubleshooting, I found out that the logged-in user info is fetched and is available in the session object. The error occurs when the code tries to read the response and convert it into a String. The following is the call where it fails.
HttpClientHelper.getResponseStringFromConn(conn, true);
Actual method to write the response into String:
public static String getResponseStringFromConn(HttpURLConnection conn, boolean isSuccess) throws IOException {
    BufferedReader reader = null;
    if (isSuccess) {
        reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    } else {
        reader = new BufferedReader(new InputStreamReader(conn.getErrorStream()));
    }
    StringBuffer stringBuffer = new StringBuffer();
    String line = "";
    while ((line = reader.readLine()) != null) {
        stringBuffer.append(line);
    }
    return stringBuffer.toString();
}
The actual issue is with the Graph API call, where we try to read the response as a String.

@Anand, according to Microsoft Graph error responses and resource types, the response code 403 means Forbidden:
Access is denied to the requested resource. The user might not have enough permission.
Please go to the CONFIGURE tab of your application registered in your AAD domain on the Azure classic portal, then check whether enough permissions are enabled (see the figure below).

I got the same error and had been struggling with it for a few days. What I noticed was that even if I checked ALL permissions for Windows Azure Active Directory, I still got the 403. So I deleted the app in App Registrations, created it again from scratch, generated a new application key, and re-added the reply URLs. In Required Permissions/Windows Azure Active Directory, check:
Sign in and read user profile
Access the directory as the signed-in user
I can now call me/memberOf successfully.
Hope it helps.

The below worked for me.
In Active Directory, go to App registrations -> app -> Settings -> Permissions and enable the delegated permission to read directory data. Save and close the blade. Also click Grant Permissions and close the blade.
Once the above is done, log out and log back in to the application to get a fresh token. (My guess is that a token issued before the change does not reflect the latest permissions, which is why the re-login worked in my case.)

Taking text from a response web page using Java

I am sending commands to a server using HTTP, and I currently need to parse the response that the server sends back (I am sending the command via the command line, and the server's response appears in my browser).
There are a lot of resources such as this: Saving a web page to a file in Java, that clearly illustrate how to scrape a page such as cnn.com. However, since this is a response page that is only generated when the camera receives a specific command, my attempts to use the method described by Mike Deck (in the link above) have failed. (Specifically, when my program requests the page again, the server returns a 401 error.)
The response from the server opens a new tab in my browser. Essentially, I need to know how to save the current web page using Java, since reading in a file is probably the simplest way to approach this. Do any of you know how to do this?
TL;DR How do you save the current web page to a webpage.html or webpage.txt file using Java?
EDIT: I used Base64 from Apache Commons Codec, which solved my 401 authentication issue. However, I am still getting a 400 error when I attempt to open my InputStream (see below). Does this mean a connection isn't being established in the first place?
URL url = new URL ("http://"+ipAddress+"/axis-cgi/record/record.cgi?diskid=SD_DISK");
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String (encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setDoInput (true);
connection.setRequestProperty ("Authorization", "Basic " + encoding);
connection.connect();
InputStream content = (InputStream)connection.getInputStream();
BufferedReader in = new BufferedReader (new InputStreamReader (content));
String line;
while ((line = in.readLine()) != null) {
System.out.println(line);
}
EDIT 2: Changing the request to a GET resolved the issue.

So while scrutinizing my code above, I decided to change
connection.setRequestMethod("POST");
to
connection.setRequestMethod("GET");
This solved my problem. In hindsight, I think the server was rejecting the request because it is not set up to handle a POST to this endpoint.
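For reference, the same authenticated GET can be prepared with only the JDK, since java.util.Base64 (Java 8+) can replace the Commons Codec call. This is a rough sketch under that assumption; the class and method names (BasicAuthGet, basicAuthHeader, prepareGet) are hypothetical, and the credentials are the placeholder ones from the question.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthGet {

    /** Builds the value of the Authorization header for HTTP Basic auth. */
    public static String basicAuthHeader(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Prepares (but does not yet send) an authenticated GET request. */
    public static HttpURLConnection prepareGet(URL url, String user, String password)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET"); // the camera endpoint rejected POST with a 400
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn;
    }
}
```

With the question's values, basicAuthHeader("root", "pass") yields "Basic cm9vdDpwYXNz"; the request itself is only sent once getResponseCode() or getInputStream() is called on the returned connection.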

Error 503 in HTTP during page parsing in java

I'm developing a Java RMI server (and also the client) that gets info from a page and returns what I want. The code is below. The problem is that sometimes the URL I pass to the method throws an IOException saying that the URL produced an HTTP 503 error. It would be easy to diagnose if it always happened, but it only happens sometimes.
I have this method structure because the page I parse is from a weather company and I want info for many cities, not only one. Some cities work perfectly on the first try and others fail. Any suggestions?
public ArrayList<Medidas> parse(String url){
    medidas = new ArrayList<Medidas>();
    int v=0;
    String sourceLine;
    String content = "";
    try{
        // The URL address of the page to open.
        URL address = new URL(url);
        // Open the address and create a BufferedReader with the source code.
        InputStreamReader pageInput = new InputStreamReader(address.openStream());
        BufferedReader source = new BufferedReader(pageInput);
        // Append each HTML line between <tbody> and </tbody> to one string, followed by a newline.
        while ((sourceLine = source.readLine()) != null){
            if(sourceLine.contains("<tbody>")) v=1;
            else if (sourceLine.contains("</tbody>"))
                break;
            else if(v==1)
                content += sourceLine + "\n";
        }
........................
........................ NOW THE PARSING CODE, NOT IMPORTANT
}

HTTP 5xx errors reflect server-side errors, so this likely has nothing to do with your client code.
You would get a 400 error if you were passing invalid parameters in your request.
503 is "Service Unavailable" and may be sent by the server when it is overloaded and cannot process your request. For a publicly accessible server, that could explain the erratic behavior.
Edit
Build a retry handler in your code when you detect a 503. Apache HTTPClient can do that automatically for you.
List of HTTP Status Codes
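If you would rather not pull in Apache HttpClient, a retry handler can be sketched by hand. The version below is only an illustration: the names (RetryOn503, Attempt, withRetry) are made up, the doubling delay is an arbitrary choice, and the request itself is passed in as a callback that returns the status code.

```java
import java.io.IOException;

public class RetryOn503 {

    /** One attempt at the request; returns the HTTP status code. */
    @FunctionalInterface
    public interface Attempt {
        int run() throws IOException;
    }

    /**
     * Runs the attempt until it returns something other than 503 or
     * maxAttempts is reached, doubling the wait between tries.
     */
    public static int withRetry(Attempt attempt, int maxAttempts, long initialDelayMs)
            throws IOException, InterruptedException {
        long delay = initialDelayMs;
        int status = -1; // returned unchanged only if maxAttempts < 1
        for (int i = 1; i <= maxAttempts; i++) {
            status = attempt.run();
            if (status != 503) {
                return status;
            }
            if (i < maxAttempts) {
                Thread.sleep(delay); // give the server time to recover
                delay *= 2;
            }
        }
        return status; // still 503 after the last attempt
    }
}
```

In the parse method, the callback would open a fresh HttpURLConnection for the city's URL and return getResponseCode(); the page is then read only when the final status is 200.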

Check that the IOException is really not a MalformedURLException. Try printing out the URLs to verify a bad URL is not causing the IOException.
How large is the file you are parsing? Perhaps your JVM is running out of memory.

Response code 401 when accessing REST-based web services with correct credentials

I am getting an Unauthorized error when accessing RESTful web services. My sample program looks like this.
public static void main(String[] args){
    // Use apache commons-httpclient to create the request/response
    HttpClient client = new HttpClient();
    Credentials defaultcreds = new UsernamePasswordCredentials("aaa", "cdefg");
    client.getState().setCredentials(AuthScope.ANY, defaultcreds);
    GetMethod method = new GetMethod(
            "http://localhost:8080/userService/usersByID/1234");
    try {
        client.executeMethod(method);
        InputStream in = method.getResponseBodyAsStream();
        // Use dom4j to parse the response and print nicely to the output stream
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(line);
        }
        System.out.println(out.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
My credentials are correct. My web services use HTTP Basic authentication.
I have doubts about the scope of the authentication:
client.getState().setCredentials(AuthScope.ANY, defaultcreds);
Can anyone help me resolve this issue?
Thanks.

First check your URL in a browser and verify that it is accessible, as mentioned here:
Fixing 401 errors - general
Each Web Server manages user authentication in its own way. A security officer (e.g. a Web Master) at the site typically decides which users are allowed to access the URL. This person then uses Web server software to set up those users and their passwords. So if you need to access the URL (or you forgot your user ID or password), only the security officer at that site can help you. Refer any security issues direct to them.
If you think that the URL Web page *should* be accessible to all and sundry on the Internet, then a 401 message indicates a deeper problem. The first thing you can do is check your URL via a Web browser. This browser should be running on a computer to which you have never previously identified yourself in any way, and you should avoid authentication (passwords etc.) that you have used previously. Ideally all this should be done over a completely different Internet connection to any you have used before (e.g. a different ISP dial-up connection). In short, you are trying to get the same behaviour a total stranger would get if they surfed the Internet to the Web page.
If this type of browser check indicates no authority problems, then it is possible that the Web server (or surrounding systems) have been configured to disallow certain patterns of HTTP traffic. In other words, HTTP communication from a well-known Web browser is allowed, but automated communication from other systems is rejected with a 401 error code. This is unusual, but may indicate a very defensive security policy around the Web server.
Manual Fix
Hit the url from the browser and record the HTTP traffic (Headers,body)
Run the Java client code and record the HTTP traffic (Headers,body)
Analyze and fix the differences
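On the Java side, part of that recording can be done in code by dumping the headers into sorted lines that are easy to diff against what the browser shows. This is only a sketch with hypothetical names (HeaderDump, toLines); it covers what HttpURLConnection exposes through getHeaderFields() and getRequestProperties(), so a traffic recorder will still give a more complete picture of the request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class HeaderDump {

    /**
     * Turns a header map (as returned by URLConnection.getHeaderFields() or
     * getRequestProperties()) into sorted "Name: value" lines for diffing.
     */
    public static List<String> toLines(Map<String, List<String>> headers) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // getHeaderFields() stores the status line under a null key
            String name = e.getKey() == null ? "(status)" : e.getKey();
            for (String value : e.getValue()) {
                lines.add(name + ": " + value);
            }
        }
        Collections.sort(lines);
        return lines;
    }
}
```

Printing toLines(conn.getHeaderFields()) after the request shows, for example, which authentication scheme the server asks for in WWW-Authenticate. Note that getRequestProperties() has to be called before the connection is opened.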

403 error when accessing a URL that works fine in browsers

String url = "http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false";
URL google = new URL(url);
HttpURLConnection con = (HttpURLConnection) google.openConnection();
When I use a BufferedReader to print the content, I get a 403 error.
The same URL works fine in the browser. Could anyone suggest a fix?

The reason it works in a browser but not in java code is that the browser adds some HTTP headers which you lack in your Java code, and the server requires those headers. I've been in the same situation - and the URL worked both in Chrome and the Chrome plugin "Simple REST Client", yet didn't work in Java. Adding this line before the getInputStream() solved the problem:
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
..even though I have never used Mozilla. Your situation might require a different header. It might be related to cookies ... I was getting text in the error stream advising me to enable cookies.
Note that you might get more information by looking at the error text. Here's my code:
try {
    HttpURLConnection connection = ((HttpURLConnection)url.openConnection());
    connection.addRequestProperty("User-Agent", "Mozilla/4.0");
    InputStream input;
    if (connection.getResponseCode() == 200) // this must be called before 'getErrorStream()' works
        input = connection.getInputStream();
    else input = connection.getErrorStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String msg;
    while ((msg =reader.readLine()) != null)
        System.out.println(msg);
} catch (IOException e) {
    System.err.println(e);
}

HTTP 403 is the Forbidden status code. You would have to read HttpURLConnection.getErrorStream() to see the response from the server, if any, which can tell you why you were given an HTTP 403.

This code should work fine. If you have been making a number of requests, it is possible that Google is just throttling you. I have seen Google do this before. You can try using a proxy to verify.

Most browsers automatically encode URLs when you enter them, but the Java URL class doesn't.
You should encode the query parameter values with URLEncoder (not the URL as a whole, or the separators such as ? and & would be escaped too).
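Applied to the URL from the question, that could look roughly like the sketch below. The class and method names (QueryEncoding, encodeParam, directionsUrl) are hypothetical; only java.net.URLEncoder comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncoding {

    /** Encodes a single query parameter value (not a whole URL). */
    public static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the directions URL from the question with encoded values. */
    public static String directionsUrl(String origin, String destination, String waypoints) {
        return "http://maps.googleapis.com/maps/api/directions/xml"
                + "?origin=" + encodeParam(origin)
                + "&destination=" + encodeParam(destination)
                + "&waypoints=" + encodeParam(waypoints)
                + "&sensor=false";
    }
}
```

Here encodeParam("Joplin,MO|Oklahoma City,OK") gives "Joplin%2CMO%7COklahoma+City%2COK": the comma and the pipe are percent-encoded and the space becomes a plus sign, which is the form encoding that URLEncoder implements.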

I know this is a bit late, but the easiest way to get the contents of a URL is to use the Apache HttpComponents HttpClient project: http://hc.apache.org/httpcomponents-client-ga/index.html

Your original page (containing the link) and the linked target page are not on the same domain; call them original-domain and target-domain.
I found that the difference is in the request headers. When I got the 403 Forbidden error, the request headers contained this line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
When I entered the URL directly, there was no 403, and the request headers did NOT contain the Referer line for original-domain.
I finally figured out how to fix this error. On your original-domain web page, add:
<meta name="referrer" content="no-referrer" />
This prevents the browser from sending the Referer header, and works both for links and for Ajax requests.
