I am trying to get the list of stops (stations) for a train by passing the train number and the other required parameters (taken from the Firefox web developer tools) to the URL with a POST request, but I get a 404 (page not found) error. When I try the same request with Postman, it returns the web page with the requested data. What is wrong with the code?
Document doc= Jsoup.connect("https://enquiry.indianrail.gov.in/mntes/q?")
.data("opt","TrainRunning")
.data("subOpt","FindStationList")
.data("trainNo",trainNumber)
.data("jStation","")
.data("jDate","25-Aug-2021")
.data("jDateDay","Wed")
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0")
.referrer("https://enquiry.indianrail.gov.in/mntes/")
.ignoreHttpErrors(true)
.post();
System.out.println(doc.text());
thank you in advance
I've tried to make the request work with Jsoup, but to no avail. The site sends form data in an odd way: it is passed as URL query parameters in a POST request.
Jsoup uses a simplified HTTP API in which this particular use case was not foreseen. It is debatable whether it is appropriate to send form parameters the way https://enquiry.indianrail.gov.in/mntes expects them to be sent.
If you're using Java 11 or later, you could simply fetch the response of your POST request via the modern Java HTTP Client. It fully supports the HTTP protocol. You can then feed the returned String into Jsoup.
Here's what you could do:
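import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// 1. Get the response (send() can throw IOException and InterruptedException)
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("https://enquiry.indianrail.gov.in/mntes/q?opt=TrainRunning&subOpt=ShowRunC&trainNo=07482&jDate=25-Aug-2021&jDateDay=Wed&jStation=DVD%23false"))
    .POST(BodyPublishers.noBody())
    .build();
HttpResponse<String> response =
    client.send(request, BodyHandlers.ofString());
// 2. Parse the response via Jsoup
Document doc = Jsoup.parse(response.body());
System.out.println(doc.text());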
I've simply copy-pasted the proper URL from Postman. You might want to build your query string in a more robust way. See:
Java URL encoding of query string parameters
How to convert map to url query string?
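As a rough illustration of building the query string more robustly, here is a minimal sketch that uses only the JDK (Java 10 or later, for the URLEncoder overload that takes a Charset). The class name QueryStrings and its helper methods are made up for this example; the parameter names and values are the ones from the URL above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Jsoup or the JDK.
final class QueryStrings {

    // Percent-encode one key or value for use in a query string.
    // URLEncoder targets form encoding and emits "+" for spaces;
    // replacing "+" with "%20" keeps the result valid inside a URL too.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Join the entries as key=value pairs separated by "&",
    // in the iteration order of the map.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("opt", "TrainRunning");
        params.put("subOpt", "ShowRunC");
        params.put("trainNo", "07482");
        params.put("jDate", "25-Aug-2021");
        params.put("jDateDay", "Wed");
        params.put("jStation", "DVD#false"); // "#" must be encoded as %23

        URI uri = URI.create(
                "https://enquiry.indianrail.gov.in/mntes/q?" + toQueryString(params));
        System.out.println(uri);
    }
}
```

The resulting URI can be passed to HttpRequest.newBuilder().uri(...) in place of the hard-coded string.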
I'm trying to access an Amazon Prometheus Server but am getting an authentication error when the request is done through a Java API, although it works on Postman when the request is the same.
Note: The Authentication setup currently is done through AWS Signature Version 4 Signing (https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html)
Here are the logs from Prometheus, showing all Request Headers used (hiding some potentially sensitive info)
GET https://{resource}-workspaces.us-east-1.amazonaws.com/XXXXXXXXXXX/api/v1/query
Host: {resource}-workspaces.us-east-1.amazonaws.com
X-Amz-Security-Token: FwoGZXIvYXdzEB4aDPxYxIKgd5JznR9gOyKkAkoDbjeTi79mRDgU6Hdd2AGlLwKnNGySAkNYwmKItTcSssS9zNZ+/s..........................................==
X-Amz-Date: 20221207T201649Z
Authorization: AWS4-HMAC-SHA256 Credential={AccessKey}/20221207/us-east-1/{Resource}/aws4_request, SignedHeaders=host;x-amz-date;x-amz-security-token, Signature=XXXXXXXXXXXXXXXXXXXXXXXXXX
User-Agent: PostmanRuntime/7.29.0
Accept: */*
Cache-Control: no-cache
Postman-Token: ea3f71dd-db52-4869-a48a-2cf5f72d15b3
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
If I copy almost this same Request using a Java API, it gives a 403 error.
Note: I'm making sure to include the X-Amz-Security-Token since this request needs a session token.
try {
final URL url = new URL("https://{resource}-workspaces.us-east-1.amazonaws.com/XXXXXXXXXXX/api/v1/query");
final HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
httpConn.setRequestMethod("GET");
httpConn.setRequestProperty("Host", "{resource}-workspaces.us-east-1.amazonaws.com");
httpConn.setRequestProperty("X-Amz-Security-Token", "FwoGZXIvYXdzEB4aDPxYxIKgd5JznR9gOyKkAkoDbjeTi79mRDgU6Hdd2AGlLwKnNGySAkNYwmKItTcSssS9zNZ+/s..........................................==");
httpConn.setRequestProperty("X-Amz-Date", "20221207T201649Z");
httpConn.setRequestProperty("Authorization", "AWS4-HMAC-SHA256 Credential={AccessKey}/20221207/us-east-1/{Resource}/aws4_request, SignedHeaders=host;x-amz-date;x-amz-security-token, Signature=XXXXXXXXXXXXXXXXXXXXXXXXXX");
httpConn.setRequestProperty("User-Agent", "Mozilla/5.0");
httpConn.setRequestProperty("Accept", "*/*");
httpConn.setRequestProperty("Cache-Control", "no-cache");
httpConn.setRequestProperty("Postman-Token", "ea3f71dd-db52-4869-a48a-2cf5f72d15b3");
httpConn.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
httpConn.setRequestProperty("Connection", "keep-alive");
responseCode = httpConn.getResponseCode(); // 403 here
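} catch (IOException e) {
    // new URL(...), openConnection() and getResponseCode() can all throw IOException
    e.printStackTrace();
}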
Things I've attempted:
Creating my own Signature instead of reusing the same made from the Postman request
Including and excluding various headers: User-Agent, Accept, Cache-Control, Postman-Token, Accept-Encoding, Connection.
Any ideas on what could be leading to this 403 response?
Thanks
I tried to make a request to an AWS service using Signature Version 4 Signing - expecting a 200 response but getting a 403.
I just figured out the answer to this and wanted to share in case anyone else comes across this issue.
A couple of things.
First, copying the request that Postman generated will not work. AWS Signature Version 4 computes the signature from the request itself (method, path, query string, signed headers) and its timestamp, and a signature is only accepted for a short time after that timestamp. So if you copy the "Authorization" header field, it's going to give you a 403. That leads me to...
Second, you need to generate your own signature through Amazon's four-step process. This can get very confusing if you try to code it on your own. I would suggest one of two things: either use AWS4Signer and its sign method, or follow the example AWS provides (this is what I ultimately chose).
Download that zip and check out the GET example. Make sure you put "x-amz-security-token" in the headers if your request requires a session token. Lastly, make sure your URL includes the encoded query parameter; this is what I spent most of my time on. My URL and query parameters looked something like this:
String notEncodedQueryParameter = "avg(process_runtime_jvm_classes_current_loaded{stack_name=\"dev\"})";
String encodedQueryParameter = URLEncoder.encode(notEncodedQueryParameter, "UTF-8");
URL url = new URL("https://aps-workspace-amazonaws.com/api/v1/query?query=" + encodedQueryParameter);
final Map<String, String> queryParameters = new HashMap<>();
queryParameters.put("query", notEncodedQueryParameter);
Then later I passed my query parameters to get encoded as part of the authorization signature being generated.
String authorization = signer.computeSignature(headers,
queryParameters,
AWS4SignerBase.EMPTY_BODY_SHA256,
awsAccessKey,
awsSecretKey);
I really recommend using that second link because if you don't get the correct response, it will give you a useful error message like
the request signature we calculated does not match the signature you
provided canonical request has different parameter encoding
instead of just leaving you with a 403 error.
Best of luck.
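To make it a bit more concrete why a copied signature stops working, here is a minimal sketch of the signing-key derivation and final HMAC step of Signature Version 4, using only the JDK. It is not a complete signer: building the canonical request and the string to sign is left out, and the AWS example code or SDK remains the safer choice. The class name SigV4KeySketch, the placeholder secret, and the service name "aps" are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch class; not the AWS4Signer from the AWS SDK.
final class SigV4KeySketch {

    // HMAC-SHA256 of a UTF-8 string under the given key.
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    // Derive the signing key by chaining HMACs over the credential scope:
    // date (yyyyMMdd), region, service, then the literal "aws4_request".
    static byte[] signingKey(String secretKey, String date, String region, String service) {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    // The signature is the lowercase hex HMAC of the string to sign.
    static String signature(byte[] signingKey, String stringToSign) {
        byte[] raw = hmacSha256(signingKey, stringToSign);
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder secret and assumed service name; date and region
        // are taken from the request headers shown in the question.
        byte[] key = signingKey("EXAMPLE-SECRET-KEY", "20221207", "us-east-1", "aps");
        // A real string to sign also contains the timestamp, the scope
        // and the hash of the canonical request.
        System.out.println(signature(key, "placeholder string to sign"));
    }
}
```

Because the date feeds into the key and the full timestamp and query string feed into the string to sign, any change to those inputs produces a different signature, which fits the behaviour described above.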
I am building an agent in Java which has to solve games using a planner. The planner that I am using runs as a service on the cloud, and thus anybody can send HTTP requests to it and get a response. I have to send to it a JSON with the following content: {"domain": "string containing the domain's description", "problem": "string containing the problem to be solved"}. As a response I get a JSON that contains the status and the result, which might be a plan or not, depending on whether there was some problem or not.
The following piece of code allows me to call the planner and receive its response, retrieving the JSON object from the body:
String domain = readFile(this.gameInformation.domainFile);
String problem = readFile("planning/problem.pddl");
// Call online planner and get its response
String url = "http://solver.planning.domains/solve";
HttpResponse<JsonNode> response = Unirest.post(url)
.header("accept", "application/json")
.field("domain", domain)
.field("problem", problem)
.asJson();
// Get the JSON from the body of the HTTP response
JSONObject responseBody = response.getBody().getObject();
This code works perfectly fine and I don't have any problems with it. Since I have to do some heavy testing on the agent, I prefer to run the server on localhost, so that the service doesn't get saturated (it can only process one request at a time).
However, if I try to send a request to the server running on localhost, the body of the HTTP request that the server receives is empty. Somehow, the JSON is not sent and I am receiving a response that contains an error.
The following piece of code illustrates how I am trying to send a request to the server running on localhost:
// Call online planner and get its response
String url = "http://localhost:5000/solve";
HttpResponse<JsonNode> response = Unirest.post(url)
.header("accept", "application/json")
.field("domain", domain)
.field("problem", problem)
.asJson();
For the sake of testing, I had previously created a small Python script that sends the same request to the server running on localhost:
import requests
with open("domains/boulderdash-domain.pddl") as f:
domain = f.read()
with open("planning/problem.pddl") as f:
problem = f.read()
data = {"domain": domain, "problem": problem}
resp = requests.post("http://127.0.0.1:5000/solve", json=data)
print(resp)
print(resp.json())
When executing the script, I get a correct response, and it seems that the JSON is sent correctly to the server.
Does anyone know why this is happening?
Okay, fortunately I have found an answer for this issue (don't try to code/debug at 2-3AM folks, it's never going to turn out right). It seems that the problem was that I was specifying what kind of response I was expecting to get from the server instead of what I was trying to send to it in the request's body:
HttpResponse<JsonNode> response = Unirest.post(url)
.header("accept", "application/json")...
I was able to solve my problem by doing the following:
// Create JSON object which will be sent in the request's body
JSONObject object = new JSONObject();
object.put("domain", domain);
object.put("problem", problem);
String url = "http://localhost:5000/solve";
HttpResponse<JsonNode> response = Unirest.post(url)
.header("Content-Type", "application/json")
.body(object)
.asJson();
Now the header specifies the type of content I am sending. I have also created a JSONObject instance containing the information that goes into the request's body. With these changes, it works on both the local and the cloud servers.
Despite this, I still don't really understand why I was able to get a correct response when calling the cloud server, but it doesn't really matter now. I hope this answer helps someone facing a similar issue!
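One plausible explanation, offered as a guess rather than something verified against the planner's code: Unirest's field() sends the values as form fields, whereas the Python script's json=data sends a JSON document, so a server that only parses JSON bodies would see nothing it recognises. The sketch below shows the JSON variant with the JDK's own HTTP client (Java 11 or later) instead of Unirest. The class name JsonBodySketch and its helpers are made up for this example, and the hand-rolled escaping is only meant to cover what PDDL text typically contains.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Hypothetical sketch class; a JSON library would normally do the quoting.
final class JsonBodySketch {

    // Wrap a string in double quotes and escape the characters
    // that may not appear raw inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters as 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Build the JSON document with the "domain" and "problem" keys.
    static String solveRequestJson(String domain, String problem) {
        return "{\"domain\":" + quote(domain) + ",\"problem\":" + quote(problem) + "}";
    }

    // Build (but do not send) a POST whose body is the JSON text and whose
    // Content-Type header tells the server how to parse that body.
    static HttpRequest buildRequest(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = solveRequestJson("(define (domain d))", "(define (problem p))");
        HttpRequest request = buildRequest("http://localhost:5000/solve", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
        System.out.println(json);
    }
}
```

Sending it would be a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).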
If I request the following URL
http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY
I get the old no-script version of the captcha, containing an image of a Google street number.
But if I do the same with HtmlUnit, I get a fake-looking version of the image instead.
It happens every time: a real-world street number in the browser and blackish distorted text from HtmlUnit. The public key is the same.
How can the Google server distinguish between a browser and HtmlUnit?
The HtmlUnit code is as follows:
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
final HtmlPage page = webClient.getPage("http://www.google.com/recaptcha/api/noscript?k=" + getPublicKey());
HtmlImage image = page.<HtmlImage>getFirstByXPath("//img");
ImageReader imageReader = image.getImageReader();
The process can be observed with Fiddler.
How about setting the correct headers for your request? User-Agent is the key here.
Headers are how the backend learns about the client (Firefox, Chrome, etc.), so check what yours say in your case. Set the correct headers, e.g. for Firefox:
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1");
conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
This snippet is from my own code; adapt it to your needs (setRequestProperty is the java.net.URLConnection method, and with Apache HttpClient the same headers are set via setHeader).
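For reference, a minimal self-contained version of the header setup with java.net.HttpURLConnection might look like the sketch below. The class name BrowserLikeHeaders and the open helper are made up for this example, and nothing is sent until the connection is actually used. Whether these headers change what the server returns is not something this sketch can show.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class for illustration.
final class BrowserLikeHeaders {

    // Open (but do not yet connect) an HTTP connection that carries
    // the Firefox-style User-Agent and Accept headers from the answer.
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1");
            conn.setRequestProperty("Accept",
                    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
                open("http://www.google.com/recaptcha/api/noscript?k=MYPUBLICKEY");
        // Request properties can be read back before any network I/O happens.
        System.out.println(conn.getRequestProperty("User-Agent"));
        System.out.println(conn.getRequestProperty("Accept"));
    }
}
```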
I know this is an old post, but a good way is to use
WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER);
How did you solve your problem?
Specifically, this happens with amazon.com: I receive a 503 error for their domain, but I can successfully parse other domains.
I am using the line
Document doc = Jsoup.connect(url).timeout(30000).get();
to connect to the URL.
You have to set a User Agent:
Document doc = Jsoup.connect(url).timeout(30000).userAgent("Mozilla/17.0").get();
(Or another one; it is best to use a real browser's user agent string.)
Otherwise you'll get blocked.
Please see also: Jsoup: select(div[class=rslt prod]) returns null when it shouldn't
You can try:
val ret=Jsoup.connect(url)
.userAgent("Mozilla/5.0 Chrome/26.0.1410.64 Safari/537.31")
.timeout(2*1000)
.followRedirects(true)
.maxBodySize(1024*1024*3) //3Mb Max
//.ignoreContentType(true) //for download xml, json, etc
.get()
This may work; amazon.com may need followRedirects set to true (jsoup already follows redirects by default, so the explicit call mostly documents the intent).
I am writing an Android application which uses a REST-based API on the server. So far the login works perfectly using HttpGet: I send the credentials, and the server sends back a JSON response object containing a session id or a failure. I then moved on to another GET API (this one is passed the session id). The response looks valid ("200 - OK"), but the response body contains nothing: zero characters of text.
If I take the same URL and drop it into a browser, I get all the JSON text I expect displayed in the browser window. So what is the difference between a browser request/response and that of HttpGet? Any clues as to why my HttpGet might return a 'valid' nothing?
I had the same problem. Setting user agent solved my problem:
HttpParams params = new BasicHttpParams();
...
params.setParameter(CoreProtocolPNames.USER_AGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.71");
This is the pull() I have written:
mHttpGet.setURI(url.toURI());
mResponse = mHttpClient.execute(mHttpGet);
mResponse.getEntity().getContent(); // returns inputstream
How did you do yours?
It turned out to be a server-side issue. They were actually sending me empty strings when the requester was not a browser. Too bad I can't delete a question. :(