Java: can't get html from URL [duplicate] - java

I'm trying to read an HTML page from a URL. My code works with most sites but fails on some, such as http://dota2.gamepedia.com/Dota_2_Wiki. Do I need to configure a proxy for Java, or something similar?
Here's my code:
try {
URL webPage = new URL("http://dota2.gamepedia.com/Dota_2_Wiki");
URLConnection con = webPage.openConnection();
con.setConnectTimeout(5000);
con.setReadTimeout(5000);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
catch (MalformedURLException exc){exc.printStackTrace();}
catch (IOException exc){exc.printStackTrace();}
The result:
java.io.IOException: Server returned HTTP response code: 403 for URL: http://dota2.gamepedia.com/Dota_2_Wiki
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1838)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
at com.Popov.Main.main(Main.java:17)
How can I get past the 403 error? The same URL loads correctly in a browser.

Most likely the problem is that the user agent is not set properly. For those who prefer plain Java without libraries, here is the code:
private void sendGet() throws Exception {
String url = "http://dota2.gamepedia.com/Dota_2_Wiki";
URL obj = new URL(url);
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("GET");
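// USER_AGENT is a String constant holding a browser-like user agent; its definition is not shown here.
con.setRequestProperty("User-Agent", USER_AGENT);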
int responseCode = con.getResponseCode();
System.out.println("\nSending 'GET' request to URL : " + url);
System.out.println("Response Code : " + responseCode);
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
System.out.println(response.toString());
}
Note that you also need to set up cookie handling: when I tried it without the CookieManager, the request failed with a "too many redirects" loop.
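To see why the cookie handler matters, here is a minimal sketch that reproduces the situation locally. The embedded server is a stand-in I made up (it is not the real wiki): it keeps redirecting until the client sends back the cookie it was given. The class name, the cookie name `session`, and the user agent string are all assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Sketch only: a local stand-in server redirects until it sees its own cookie again.
final class CookieRedirectDemo {

    // Returns the final status code of a GET made with a CookieManager installed.
    static int statusWithCookieManager() {
        CookieHandler previous = CookieHandler.getDefault();
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String cookie = exchange.getRequestHeaders().getFirst("Cookie");
                if (cookie != null && cookie.contains("session=1")) {
                    exchange.sendResponseHeaders(200, -1); // cookie came back: serve the page
                } else {
                    // No cookie yet: hand one out and redirect to the same page.
                    exchange.getResponseHeaders().add("Set-Cookie", "session=1; Path=/");
                    exchange.getResponseHeaders().add("Location", "/");
                    exchange.sendResponseHeaders(302, -1);
                }
                exchange.close();
            });
            server.start();
            try {
                // Install the handler before opening the connection, as the answer does.
                CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
                URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                con.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; example)");
                return con.getResponseCode(); // redirects are followed automatically
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            CookieHandler.setDefault(previous); // restore whatever was installed before
        }
    }

    public static void main(String[] args) {
        System.out.println("Status with cookies: " + statusWithCookieManager());
    }
}
```

Without the `CookieHandler.setDefault(...)` call, the same request against this stand-in server would keep bouncing between redirects until HttpURLConnection gives up.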

You can simply try the jsoup HTML parser. See the sample code:
public static void main(String[] args) throws IOException {
Document doc = Jsoup
.connect("http://dota2.gamepedia.com/Dota_2_Wiki")
.userAgent(
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
.timeout(0).followRedirects(true).execute().parse();
Elements titles = doc.select(".entrytitle");
// print all titles in main page
for (Element e : titles) {
System.out.println("text: " + e.text());
System.out.println("html: " + e.html());
}
// print all available links on page
Elements links = doc.select("a[href]");
for (Element l : links) {
System.out.println("link: " + l.attr("abs:href"));
}
}

I think your problem here is that the server doesn't accept your "user agent" string and returns a 403 forbidden code.
One answer suggested using Jsoup and setting the user agent manually, but didn't explain that setting the user agent is the crucial step. You could use that approach.
Or, you could read Setting user agent of a java URLConnection and set the user agent of the URLConnection yourself. This approach doesn't need any external libraries.
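As a rough illustration of the library-free approach, the sketch below sets the User-Agent on a plain HttpURLConnection. So that it runs without touching the real site, it talks to a small local stand-in server that I invented for the example: it answers 403 to Java's default "Java/<version>" agent and 200 to anything else. The class name and the user agent string are assumptions, and the real site's filtering rules may differ.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Sketch only: the local server imitates a site that rejects Java's default user agent.
final class UserAgentDemo {

    // Returns the HTTP status of a GET; userAgent == null keeps Java's default header.
    static int status(URL url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(5000);
        con.setReadTimeout(5000);
        if (userAgent != null) {
            con.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    // Starts the stand-in server and returns {status with default agent, status with custom agent}.
    static int[] run() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String ua = exchange.getRequestHeaders().getFirst("User-Agent");
                boolean blocked = ua == null || ua.startsWith("Java/");
                exchange.sendResponseHeaders(blocked ? 403 : 200, -1); // -1 means no body
                exchange.close();
            });
            server.start();
            try {
                URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
                return new int[] {
                    status(url, null),
                    status(url, "Mozilla/5.0 (compatible; example)")
                };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] codes = run();
        System.out.println("default agent: " + codes[0] + ", custom agent: " + codes[1]);
    }
}
```

Against a real site, only the `status` method is needed; pass the site's URL and a browser-like user agent string.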

Related

Java - doing post request to login page

I'm trying to log in to the https://flow.polar.com/login page using Java. I send a POST request with the 'email' and 'password' values. I should be redirected to a page that authorizes access to the user's data, but instead I get a 200 response and am redirected to the main page (https://flow.polar.com). I compared all the values against the request the browser sends when I log in normally, but the problem persists.
My question: is there anything I'm missing to log in, or is there a way to click the sign-in button?
I should add that when I provide a wrong email or password I get a 400 response, so everything seems to work except that I'm redirected to the wrong page :(
Here is my code:
URL obj = new URL(url);
HttpsURLConnection conn1 = (HttpsURLConnection) obj.openConnection();
// Acts like a browser
conn1.setUseCaches(false); // just going to main page instead of logging!!!
conn1.setRequestMethod("POST");
conn1.setRequestProperty("Host", "flow.polar.com");
conn1.setRequestProperty("User-Agent", USER_AGENT);
conn1.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
conn1.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
conn1.setRequestProperty("Accept-Language", "pl,en-US;q=0.7,en;q=0.3");
conn1.setRequestProperty("Connection", "keep-alive");
conn1.setRequestProperty("Referer", "https://flow.polar.com/login?n=D6FQQAAAAAAAAAAACXDM2CSAIAIABYDX3GB5XJTBP5IPEIESYYNNTUUOLAD6JXLR7H5G4EKE2WFJJ4MIOOLP54XGF6GJ4Q5T2G7HFWFJR7TUVNPDSEJLO6AKWH3WG3IIFTRKIJCZKVEAKMCJDSUPYZSUV2WYMDFU5CPBOIZJBGZGCAAAAA%3D%3D%3D%3D%3D%3D");
conn1.setDoOutput(true);
conn1.setDoInput(true);
// Send post request
DataOutputStream wr = new DataOutputStream(conn1.getOutputStream());
wr.writeBytes(postParams);
System.out.println(postParams);
wr.flush();
wr.close();
int responseCode = conn1.getResponseCode();
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Post parameters : " + postParams);
System.out.println("Response Code : " + responseCode);
BufferedReader in =
new BufferedReader(new InputStreamReader(conn1.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
System.out.println("Response: ");
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
OK, I got it working. The problem was that I didn't POST all the input fields: I forgot the hidden ones.
Here is working code:
Connection.Response response1 =
Jsoup.connect("https://flow.polar.com/login?n=XXX")
.userAgent(USER_AGENT)
.timeout(10 * 1000)
.method(Method.POST)
.data("email", "xxx@gmail.com")
.data("password", "passwd")
.data("returnUrl", "/?n=XXX")
.followRedirects(true)
.execute();
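The fix came down to sending the hidden form inputs along with the visible ones. For readers who want to stay with plain Java, here is a minimal sketch of collecting the hidden fields from the login page's HTML and building the POST body. It is an assumption-heavy illustration: the class and method names are mine, the regex only understands double-quoted attributes (a real HTML parser such as jsoup is more robust), and it uses the `URLEncoder.encode(String, Charset)` overload, which needs Java 10 or newer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: regex-based extraction of hidden inputs, good enough for simple markup.
final class HiddenFieldsDemo {

    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the value of a double-quoted attribute inside one tag, or null if absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\b" + name + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Collects name/value pairs of all <input type="hidden"> tags, in document order.
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String name = attr(tag, "name");
            if ("hidden".equalsIgnoreCase(attr(tag, "type")) && name != null) {
                String value = attr(tag, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    // Builds an application/x-www-form-urlencoded body from the fields.
    static String formBody(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical login form; the real page's fields may differ.
        String html = "<form><input type=\"email\" name=\"email\">"
                + "<input type=\"hidden\" name=\"returnUrl\" value=\"/?n=XXX\"></form>";
        Map<String, String> fields = hiddenFields(html);
        fields.put("email", "user@example.com");
        fields.put("password", "passwd");
        System.out.println(formBody(fields));
    }
}
```

The resulting string can be written to the connection's output stream in place of a hand-written `postParams`.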

How can I connect to QC 12 with the REST API

Can you please help me understand, with a simple piece of Java code, how to connect to QC 12 using the REST API?
I went through the REST API documentation but I'm not clear on how to start. It would be helpful if someone could show me simple Java code for authentication (login, logout, or getting defect details) using the REST API. I'd also like to know whether I need to include any JARs in my build path.
Thanks a lot, friends.
I don't quite get what you're asking, but if you want to connect to a REST API, there are several ways... I usually use HttpURLConnection, here's an example of a get:
public String getProfile(String url) throws IOException {
URL getURL = new URL(url);
// Open an HTTP connection to that URL.
HttpURLConnection con = (HttpURLConnection) getURL.openConnection();
// Select the request method, in this case GET.
con.setRequestMethod("GET");
// Add the request headers ("header" and headerValue are placeholders for a real header name and value).
con.setRequestProperty("header", headerValue);
System.out.println("\nSending 'GET' request to URL : " + url);
int responseCode;
try {
responseCode = con.getResponseCode();
System.out.println("Response Code : " + responseCode);
} catch (Exception e) {
System.out.println("Error: Connection problem.");
}
InputStreamReader isr = new InputStreamReader(con.getInputStream());
BufferedReader br = new BufferedReader(isr);
StringBuffer response = new StringBuffer();
String inputLine;
while ((inputLine = br.readLine()) != null) {
//Save the response.
response.append(inputLine + '\n');
}
br.close();
return response.toString();
}

Read XML from valid URL not returned. Header formatting issue?

I am trying to use the code below to read from a valid URL. I can paste the URL into my browser and it works perfectly (it displays the XML), but when I access it programmatically it returns nothing (no data and no error). I have already tried setting the user agent as described in this post: Can't read in HTML content from valid URL, but it didn't fix my problem. If it matters, I am trying to do a single Eve API call. I believe the problem is that my headers are not formatted correctly and the Eve site is rejecting the query. I can access the data fine using PHP, but I recently had to change languages.
public static void readFileToXML(String urlString,String fName)
{
try{
java.net.URL url = new java.net.URL(urlString);
System.out.println(url);
URLConnection cnx = url.openConnection();
cnx.setAllowUserInteraction(false);
cnx.setDoOutput(true);
cnx.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/531.0 (KHTML, like Gecko) Chrome/3.0.183.1 Safari/531.0");
System.out.println(cnx.getContentLengthLong());// a change suggested in the comments. returns -1
InputStream is = cnx.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
File file=new File("C:\\Users\\xxx\\Desktop\\"+fName);
BufferedWriter bw = new BufferedWriter(new FileWriter(file,false));
String inputLine;
while ((inputLine = br.readLine()) != null) {
bw.write(inputLine);
System.out.println(inputLine);
}
System.out.println("Finished read");
bw.close();
br.close();
}
catch(Exception e)
{
System.out.println("Exception: "+e.getMessage());
}
}

Java POST Connection Timeout Using HttpsUrlConnection

I have a question about making a POST request with Java, and since this is my first attempt at something of this magnitude, please bear with me. I am working on a third party application in Java to connect to a website and make POST requests. Am I doing this correctly? Here is what I have so far:
Website Code:
(This is the code the website has for "bumping a trade" which simply sends 2 pieces of data to a php file. The URL is http://cdn.dota2lounge.com/script/trades.js)
function bumpTrade(trade, code) {
$.ajax({
type: "POST",
url: "ajax/bumpTrade.php",
data: "trade=" + trade + "&code=" + code
});
}
My Java Code:
private void sendPost() throws Exception {
//String url = "https://www.cdn.dota2lounge.com/script/ajax/bumpTrade.php";
String url = "https://www.cdn.dota2lounge.com/script/ajax/bumpTrade.php";
URL obj = new URL(url);
HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();
//add request header
con.setRequestMethod("POST");
con.setRequestProperty("User-Agent", USER_AGENT);
con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
String urlParameters = "trade=96510389&code=94cebd9";
// Send post request
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(urlParameters);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Post parameters : " + urlParameters);
System.out.println("Response Code : " + responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
//print result
System.out.println(response.toString());
}
However I am receiving a connection timeout error when attempting to connect. I would be very grateful if someone could point me in the right direction!
The Java client code seems to be on the right track, but the URL in the code looks wrong.
Using the URL "http://www.dota2lounge.com/ajax/bumpTrade.php" and HttpURLConnection, I was able to get a 200 response (OK):
Sending 'POST' request to URL : http://www.dota2lounge.com/ajax/bumpTrade.php
Post parameters : trade=96510389&code=94cebd9
Response Code : 200
However, I got nothing beyond that. I'm not sure about the remote site's API, but hopefully that helps.

Java HTTPUrlConnection returns 500 status code

I'm trying to GET a url using HTTPUrlConnection, however I'm always getting a 500 code, but when I try to access that same url from the browser or using curl, it works fine!
This is the code
try{
URL url = new URL("theurl");
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
httpcon.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:14.0) Gecko/20100101 Firefox/14.0.1");
System.out.println(httpcon.getHeaderFields());
}catch (Exception e) {
System.out.println("exception "+e);
}
When I print the headerfields, it shows the 500 code.. when I change the URL to something else like google.com , it works fine. But I don't understand why it doesn't work here but it works fine on the browser and with curl.
Any help would be highly appreciated..
Thank you,
This mostly happens because of encoding.
If the URL works in a browser but returns 500 (Internal Server Error) in your program, it is because browsers have highly sophisticated handling of charsets and content types.
Here is my code; it works in my case, with ISO8859_1 as the charset and English as the language.
public void sendPost(String Url, String params) throws Exception {
String url=Url;
URL obj = new URL(url);
HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();
con.setRequestProperty("Acceptcharset", "en-us");
con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
con.setRequestProperty("charset", "EN-US");
con.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
String urlParameters=params;
// Send post request
con.setDoOutput(true);
con.setDoInput(true);
con.connect();
//con.
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(urlParameters);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Post parameters : " + urlParameters);
System.out.println("Response Code : " + responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
//print result
System.out.println(response.toString());
this.response=response.toString();
con.disconnect();
}
and in the main program , call it like this:
myclassname.sendPost("https://change.this2webaddress.desphilboy.com/websitealias/orwebpath/someaction","paramname="+URLEncoder.encode(urlparam,"ISO8859_1"))
Status code 500 suggests that the code on the web server has crashed. Use HttpURLConnection#getErrorStream() to learn more about the error. See Http Status Code 500.
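A minimal sketch of reading the error body with getErrorStream() might look like the following. The local server that always answers 500 is a stand-in I added so the example runs on its own; the class name, method names, and the error text are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: a local stand-in server always fails with 500 and a short message.
final class ErrorStreamDemo {

    // Reads the body from whichever stream the status code makes available.
    static String readBody(HttpURLConnection con) throws IOException {
        int code = con.getResponseCode();
        // For 4xx/5xx, getInputStream() throws; the body, if any, is on getErrorStream().
        InputStream stream = code >= 400 ? con.getErrorStream() : con.getInputStream();
        if (stream == null) {
            return ""; // the server sent no body
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString().trim();
    }

    // Starts the failing server and returns "<status>: <body>".
    static String run() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] payload = "simulated server failure".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(500, payload.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(payload);
                }
            });
            server.start();
            try {
                URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                return con.getResponseCode() + ": " + readBody(con);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

What the error body contains depends entirely on the server; some return a stack trace or an HTML error page, others return nothing.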
I ran into the problem of "URL works in browser, but when I do http-get in java I get a 500 Error".
In my case the problem was that the regular http-get ended up in an infinite redirect loop between /default.aspx and /login.aspx
URL oUrl = new URL(url);
HttpURLConnection con = (HttpURLConnection) oUrl.openConnection();
con.setRequestMethod("GET");
...
int responseCode = con.getResponseCode();
What was happening was: The server serves up a three-part cookie and con.getResponseCode() only used one of the parts. The cookie data in the header looked like this:
header.key = null
value = HTTP/1.1 302 Found
...
header.key = Location
value = /default.aspx
header.key = Set-Cookie
value = WebCom-lbal=qxmgueUmKZvx8zjxPftC/bHT/g/rUrJXyOoX3YKnYJxEHwILnR13ojZmkkocFI7ZzU0aX9pVtJ93yNg=; path=/
value = USE_RESPONSIVE_GUI=1; expires=Wed, 17-Apr-2115 18:22:11 GMT; path=/
value = ASP.NET_SessionId=bf0bxkfawdwfr10ipmvviq3d; path=/; HttpOnly
...
So the server when receiving only a third of the needed data got confused: You're logged in! No wait, you have to login. No, you're logged in, ...
To work around the infinite redirect-loop I had to manually look for re-directs and manually parse through the header for "Set-cookie" entries.
con = (HttpURLConnection) oUrl.openConnection();
con.setRequestMethod("GET");
...
log.debug("Disable auto-redirect. We have to look at each redirect manually");
con.setInstanceFollowRedirects(false);
....
int responseCode = con.getResponseCode();
The following code parses the cookies when the response code is a redirect:
private String getNewCookiesIfAny(String origCookies, HttpURLConnection con) {
String result = null;
String key;
Set<Map.Entry<String, List<String>>> allHeaders = con.getHeaderFields().entrySet();
for (Map.Entry<String, List<String>> header : allHeaders) {
key = header.getKey();
if (key != null && key.equalsIgnoreCase(HttpHeaders.SET_COOKIE)) {
// get the cookie if need, for login
List<String> values = header.getValue();
for (String value : values) {
if (result == null || result.isEmpty()) {
result = value;
} else {
result = result + "; " + value;
}
}
}
}
if (result == null) {
log.debug("Reuse the original cookie");
result = origCookies;
}
return result;
}
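One detail worth noting about the joined value: a Cookie request header carries only name=value pairs, while attributes such as path, expires, and HttpOnly are instructions for the client. The sketch below is a possible variant, not the answer's code, that keeps just the first segment of each Set-Cookie value before joining. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: builds a Cookie header value from raw Set-Cookie header values.
final class CookieHeaderDemo {

    // Keeps the leading name=value of each Set-Cookie value and joins them with "; ".
    static String cookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            // Everything after the first ';' is an attribute (path, expires, HttpOnly, ...).
            String pair = value.split(";", 2)[0].trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the three Set-Cookie values shown above.
        List<String> values = Arrays.asList(
                "a=1; path=/",
                "b=2; expires=Wed, 17-Apr-2115 18:22:11 GMT; path=/",
                "c=3; path=/; HttpOnly");
        System.out.println(cookieHeader(values)); // prints a=1; b=2; c=3
    }
}
```

The result can be passed to `con.setRequestProperty("Cookie", ...)` on the next request in the redirect chain.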
Make sure that your connection allows following redirects: this is one possible reason for a difference in behaviour between your connection and the browser (which follows redirects by default).
The server should be returning a 3xx code, but there may be something else somewhere that changes it to 500 for your connection.
I faced the same issue. In our case there was a special symbol in one of the parameter values. We fixed it by encoding the value with URLEncoder.encode(String, String).
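As a small illustration of that fix, the sketch below encodes a parameter value that contains characters with special meaning in a query string. The class name, parameter name, and sample value are hypothetical, and it uses the `URLEncoder.encode(String, Charset)` overload (Java 10 or newer) rather than the two-String overload mentioned above, purely to avoid the checked exception.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: encodes one name=value pair for a query string or form body.
final class EncodeParamDemo {

    static String param(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // '&', '=' and '%' would otherwise be read as query-string syntax by the server.
        System.out.println(param("comment", "rock & roll=100%"));
        // prints comment=rock+%26+roll%3D100%25
    }
}
```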
In my case it turned out that the server always returns HTTP/1.1 500 (in the browser as well as in Java) for the page I wanted to access, but successfully delivers the page content nonetheless.
A human accessing the page via a browser just doesn't notice, since they see the page and no error message. In Java I had to read the error stream instead of the input stream (thanks @Muse).
I have no idea why, though. It might be some obscure way to keep crawlers out.
This is an old question, but I had the same issue and solved it this way.
This might help others in the same situation.
In my case I was developing a system in a local environment. Everything worked fine when I checked my REST API from the browser, but my Android app kept getting HTTP error 500.
The problem is that Android code runs in a VM (virtual machine), so your local computer's firewall might be preventing the virtual machine from accessing the local URL (IP address).
You just need to allow that in your computer's firewall. The same applies if you are trying to access the system from outside your network.
Check this parameter:
httpURLConnection.setDoOutput(false);
Set it to false for GET requests and to true only for POST. This saved me a lot of time!
