If I request the following URL
http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY
I get the old no-script version of the captcha, containing an image of a real street number (from Google Street View), like this
But if I do the same with HtmlUnit, I get some fake-looking version of the image, like this:
It happens every time: a real-world street number from the browser, and blackish distorted text from HtmlUnit. The public key is the same.
How can Google's server distinguish between the browser and HtmlUnit?
The HtmlUnit code is as follows:
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
final HtmlPage page = webClient.getPage("http://www.google.com/recaptcha/api/noscript?k=" + getPublicKey());
HtmlImage image = page.<HtmlImage>getFirstByXPath("//img");
ImageReader imageReader = image.getImageReader();
The process is observable with Fiddler.
How about setting the correct headers for your request? The User-Agent is key here.
Headers are how the backend identifies the client (Firefox, Chrome, etc.), so what does it see in your case? Set the correct headers, e.g. for Firefox:
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1");
conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
This snippet is from my own code using Apache HttpClient; you will need to adapt it to your needs (a sketch for HtmlUnit follows below).
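Since the question uses HtmlUnit rather than a raw connection, here is a minimal, untested sketch of the same idea with HtmlUnit's WebClient. The User-Agent is already implied by the BrowserVersion passed to the constructor; other headers such as Accept and Accept-Language can be added explicitly:
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
// Headers added here are sent with every request made by this WebClient.
webClient.addRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
webClient.addRequestHeader("Accept-Language", "en-US,en;q=0.5");
final HtmlPage page = webClient.getPage("http://www.google.com/recaptcha/api/noscript?k=" + getPublicKey());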
I know this is an old post, but a good way is to use:
WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER);
How did you solve your problem?
Related
I am trying to get the list of stops (stations) for a train by passing the train number and the other required parameters (taken from the Firefox web developer tools) to the URL with a POST request, but I get a 404 (page not found) error. When I try the same request with Postman, it returns the page with the requested data. What is wrong with the code?
Document doc= Jsoup.connect("https://enquiry.indianrail.gov.in/mntes/q?")
.data("opt","TrainRunning")
.data("subOpt","FindStationList")
.data("trainNo",trainNumber)
.data("jStation","")
.data("jDate","25-Aug-2021")
.data("jDateDay","Wed")
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0")
.referrer("https://enquiry.indianrail.gov.in/mntes/")
.ignoreHttpErrors(true)
.post();
System.out.println(doc.text());
Thank you in advance.
I've tried to make the request work with Jsoup, but to no avail. The site uses an odd way of sending form data: it is passed as URL query parameters on a POST request.
Jsoup uses a simplified HTTP API in which this particular use case was not foreseen. It is debatable whether it is appropriate to send form parameters the way https://enquiry.indianrail.gov.in/mntes expects them to be sent.
If you're using Java 11 or later, you could simply fetch the response of your POST request via the modern Java HTTP Client. It fully supports the HTTP protocol. You can then feed the returned String into Jsoup.
Here's what you could do:
// 1. Get the response
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("https://enquiry.indianrail.gov.in/mntes/q?opt=TrainRunning&subOpt=ShowRunC&trainNo=07482&jDate=25-Aug-2021&jDateDay=Wed&jStation=DVD%23false"))
.POST(BodyPublishers.noBody())
.build();
HttpResponse<String> response =
client.send(request, BodyHandlers.ofString());
// 2. Parse the response via Jsoup
Document doc = Jsoup.parse(response.body());
System.out.println(doc.text());
I've simply copy-pasted the proper URL from Postman. You might want to build your query string in a more robust way (a sketch follows after these links). See:
Java URL encoding of query string parameters
How to convert map to url query string?
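As a minimal sketch of what "more robust" could look like, the parameter names and values below are the ones from the question, and URLEncoder.encode(String, Charset) requires Java 10 or later:
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

Map<String, String> params = new LinkedHashMap<>();
params.put("opt", "TrainRunning");
params.put("subOpt", "FindStationList");
params.put("trainNo", "07482");
params.put("jStation", "");
params.put("jDate", "25-Aug-2021");
params.put("jDateDay", "Wed");

// Percent-encode each key/value pair and join them with '&'.
String query = params.entrySet().stream()
        .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));

URI uri = URI.create("https://enquiry.indianrail.gov.in/mntes/q?" + query);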
I'm working on a pet project to scrape fantasy football stats from MY own fantasy league on ESPN. The problem that I'm running into that I can't seem to get past is the login which is needed before I can make requests for my league's page.
The URL I hit is
http://games.espn.com/ffl/leaguesetup/ownerinfo?leagueId=123456&seasonId=2016
and by looking at the GET requests it looks like I get redirected to
http://games.espn.com/ffl/signin?redir=http://games.espn.com/ffl/leaguesetup/ownerinfo?leagueId=123456&seasonId=2016
which immediately takes me to a login prompt window. When I log in, I inspect the POST request and note down all of the request headers. It looks like the URL requested by the POST is
https://registerdisney.go.com/jgc/v5/client/ESPN-FANTASYLM-PROD/guest/login?langPref=en-US
Additionally, I noted that the following JSON object is passed along:
{"loginValue":"myusername","password":"mypassword"}
Using the request headers and JSON object, I did the following:
String url = "http://games.espn.com/ffl/leaguesetup/ownerinfo?leagueId=123456&seasonId=2016";
String rawData = "{\"loginValue\":\"myusername\",\"password\":\"mypassword\"}";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
con.setRequestProperty("Accept-Encoding", "gzip, deflate");
con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
con.setRequestProperty("Authorization", "APIKEY 8IYGqTgmpFTX51iF1ldp6MBtWrdQ0BxNUf8bg5/empOdV4u16KUSrnkJqy1DXy+QxV8RaxKq45o2sM8Omos/DlHYhQ==");
con.setRequestProperty("Cache-Control", "no-cache");
con.setRequestProperty("Content-Length", "52");
con.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
con.setRequestProperty("Expires", "-1");
con.setRequestProperty("Host", "registerdisney.go.com");
con.setRequestProperty("Origin", "https://cdn.registerdisney.go.com");
con.setRequestProperty("Pragma", "no-cache");
con.setRequestProperty("Referer", "https://cdn.registerdisney.go.com/v2/ESPN-ESPNCOM-PROD/en-US?include=config,l10n,js,html&scheme=http&postMessageOrigin=http%3A%2F%2Fwww.espn.com%2F&cookieDomain=www.espn.com&config=PROD&logLevel=INFO&topHost=www.espn.com&ageBand=ADULT&countryCode=US&cssOverride=https%3A%2F%2Fsecure.espncdn.com%2Fcombiner%2Fc%3Fcss%3Ddisneyid%2Fcore.css&responderPage=https%3A%2F%2Fwww.espn.com%2Flogin%2Fresponder%2Findex.html&buildId=157599bfa88");
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0");
con.setRequestProperty("conversation-id", "5a4572f4-c940-454c-8f86-9af27345c894, adffddd3-8c31-41a0-84d7-7a0401cd2ad0");
con.setRequestProperty("correlation-id", "4d9ddc78-b00e-4c5a-8eec-87622961fd34")
con.setDoOutput(true);`
OutputStreamWriter w = new OutputStreamWriter(con.getOutputStream(), "UTF-8");
w.write(rawData);
w.close();
int responseCode = con.getResponseCode();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
Assuming I'm on the right track, what I'm currently getting back from the server is:
returned HTTP response code: 400 for URL: https://registerdisney.go.com/jgc/v5/client/ESPN-FANTASYLM-PROD/guest/login?langPref=en-US
Any ideas what is happening, or am I taking a completely wrong approach here? I tried to use Jsoup but had no luck either, and I believe Jsoup uses HttpURLConnection underneath as well.
Do I need to do some sort of GET request first, save something, and then do the POST request? How should it work?
You are trying to emulate the behaviour of a web browser with Jsoup. As you have experienced, this is quite complicated, and Jsoup is not made to impersonate a browser. When you find yourself hand-crafting HTTP headers, it's better to go another way.
The solution to your problem is to use a browser that can be manipulated programmatically. Selenium is more or less the de facto standard in Java.
Selenium starts your favorite browser (Firefox, Chrome, ...) and lets you control it from your Java program. You can also retrieve the content of the web pages in order to scrape them with Jsoup. Selenium is well documented; you will have no difficulty finding the required documentation and tutorials.
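As a rough, untested sketch (assuming Selenium WebDriver with Firefox and geckodriver available on the PATH; the URL is the one from the question), the flow could look like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

WebDriver driver = new FirefoxDriver();  // opens a real Firefox window controlled by Selenium
try {
    driver.get("http://games.espn.com/ffl/leaguesetup/ownerinfo?leagueId=123456&seasonId=2016");
    // Log in through the real login form here, e.g. with driver.findElement(...).sendKeys(...)
    // Once logged in and on the page you want, hand the rendered HTML to Jsoup:
    Document doc = Jsoup.parse(driver.getPageSource());
    System.out.println(doc.title());
} finally {
    driver.quit();  // always close the browser, even if something above throws
}
The point of this design is that the browser, not your Java code, handles cookies, redirects and the JavaScript-driven login widget; your code only reads the result.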
Another answer to your problem: while it is impossible for me to reproduce your issue (I don't have a fantasy football account and have no intention of creating one), I can still offer some methodological help.
I would tackle the problem by using the network inspector in my browser, copying all of the exchanges between the browser and the server into a file, and trying to reproduce them in my code.
The API key value in the Authorization header can only be reused for a limited time. If it has expired, the registration response body will contain an "API_KEY_INVALID" error.
I want to read the mobile version of a website, but my program reads the normal (desktop) version.
I am setting this property:
connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
What am I supposed to do?
The server decides which page to serve based on the "User-Agent" header of the request.
To get the mobile version of the page, take a look at the Chrome dev article detailing Chrome on Android user agent strings, and set the "User-Agent" string in your request to that of a mobile client; the User-Agent string you used in your question is not that of a mobile client.
For example,
HttpClient httpclient = new DefaultHttpClient();
HttpPost httppost = new HttpPost(url);
String userAgent = "Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19";
try {
httppost.setHeader("User-Agent", userAgent);
// Add your data
// Execute HTTP Post Request
HttpResponse response = httpclient.execute(httppost);
// ....
} catch (IOException e) {
    e.printStackTrace();
}
This should give you the mobile version of a page, as would be seen by a Galaxy Nexus device.
Here is a list of a ton of mobile browser user agent strings: http://www.useragentstring.com/pages/Mobile%20Browserlist/
Maybe try a different user agent string like:
Opera/9.80 (J2ME/MIDP; Opera Mini/9 (Compatible; MSIE:9.0; iPhone; BlackBerry9700; AppleWebKit/24.746; U; en) Presto/2.5.25 Version/10.54
I am trying to check if a service is available, and it always returns the same error:
java.io.IOException: Server returned HTTP response code: 403 for URL
Searching the internet suggested that it was necessary to set the "User-Agent" header, and so I did, but the error remains the same:
openConnection.addRequestProperty("User-Agent", "Mozilla / 5.0 (Windows NT 6.1; WOW64) AppleWebKit / 537.36 (KHTML, like Gecko) Chrome / 40.0.2214.91 Safari / 537.36");
The complete code is:
url = cadenaURL + cadenaEndpoint;
URLConnection openConnection = new URL(url).openConnection();
openConnection.connect();
is = openConnection.getInputStream();
if ("gzip".equals(openConnection.getContentEncoding()))
is = new GZIPInputStream(is);
and the error is in:
is = openConnection.getInputStream();
Could someone help me?
Thank you,
Regards
I am trying to check if a service is available
Simply put, returning 403 (Forbidden) means the service is not available to you. There's your answer.
There's a possibility you're using the wrong URL and attempting to connect to the wrong service.
However, it also seems that
is = openConnection.getInputStream();
may not be enough to connect to an HTTP server. You need to format the request properly; please show more of your code and how you're using it so we can help you further.
Finally, I replaced URLConnection with HttpURLConnection and it works perfectly.
Thank you
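For later readers, a minimal sketch of that change, assuming the variables from the question (cadenaURL, cadenaEndpoint); the User-Agent value is only illustrative:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

HttpURLConnection conn = (HttpURLConnection) new URL(cadenaURL + cadenaEndpoint).openConnection();
conn.setRequestProperty("User-Agent",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.91 Safari/537.36");
int responseCode = conn.getResponseCode();  // check for 200 before reading the body
InputStream is = conn.getInputStream();
if ("gzip".equals(conn.getContentEncoding())) {
    is = new GZIPInputStream(is);
}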
I am writing an Android application which uses a REST-based API on the server. So far the login works perfectly using HttpGet: I send the credentials, and it sends me back a JSON response object containing either a session id or a failure. I then moved on to another GET API (this one is passed the session id), and the response I get back looks valid ("200 - OK"), but the response body contains nothing: zero text.
If I take the same URL and drop it into a browser, I get all the JSON text I expect displayed in the browser window. So what is the difference between a browser request/response and that of HttpGet? Any clues as to why my HttpGet might return a 'valid' nothing?
I had the same problem. Setting the user agent solved it:
HttpParams params = new BasicHttpParams();
...
params.setParameter(CoreProtocolPNames.USER_AGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.71");
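For context, a sketch (assuming the legacy Apache HttpClient 4.x used in this thread; the URL is illustrative, not from the question) of how those params might be wired into the client that performs the GET:
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.CoreProtocolPNames;
import org.apache.http.params.HttpParams;
import org.apache.http.util.EntityUtils;

HttpParams params = new BasicHttpParams();
params.setParameter(CoreProtocolPNames.USER_AGENT,
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.71");

// The client built from these params sends the User-Agent on every request.
DefaultHttpClient httpClient = new DefaultHttpClient(params);
HttpGet get = new HttpGet("https://example.com/api/data?sessionid=abc123");  // illustrative URL
HttpResponse response = httpClient.execute(get);
String body = EntityUtils.toString(response.getEntity());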
That's the pull() I have written:
mHttpGet.setURI(url.toURI());
mResponse = mHttpClient.execute(mHttpGet);
mResponse.getEntity().getContent(); // returns inputstream
How did you do yours?!
It turned out to be a server-side issue. They were actually sending me empty strings when the requester was not a browser. Too bad I can't delete a question. :(