jsoup request returns wrong status code - java

I have a Twitter shortened URL (t.co) and I'm trying to use jsoup to send a request and parse its response. There should be three redirect hops before reaching the final URL. This is not the case when using jsoup, even after setting followRedirects to true.
My code:
public static void main(String[] args) {
    try {
        Response response = Jsoup.connect("https://t. co/sLMy6zi4Yw").followRedirects(true).execute(); // Space intentional to avoid SOF shortened errors
        System.out.println(response.statusCode()); // prints 200
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
However, using Python's Requests library, I can get the right response:
response = requests.get('https://t. co/sLMy6zi4Yw', allow_redirects=False)
print(response.status_code)
301
I'm using jsoup version 1.11.2 and Requests version 2.18.4 with Python 3.5.2.
Anybody have any insight on the matter?

To overcome this special case, you can remove the User-Agent header, which jsoup sets by default (for some unknown/undocumented reason):
Connection connection = Jsoup.connect(url).followRedirects(true);
connection.request().removeHeader("User-Agent");
Let's examine the raw requests and the server's behavior.
A request with a User-Agent (to simulate a browser) returns:
status code 200
A meta refresh, which instructs a web browser to automatically refresh the current page or frame after a given interval, in this case 0 seconds, with the URL http://bit. ly/2n3VDpo
JavaScript code that replaces the location with the same URL (google "meta refresh is deprecated" / "drawbacks using meta refresh")
Curl example
curl --include --raw "https://t. co/sLMy6zi4Yw" --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
Response
HTTP/1.1 200 OK
cache-control: private,max-age=300
content-length: 257
content-security-policy: referrer always;
content-type: text/html; charset=utf-8
referrer-policy: unsafe-url
server: tsa_b
strict-transport-security: max-age=0
vary: Origin
x-response-time: 20
x-xss-protection: 1; mode=block; report=https://twitter.com/i/xss_report
<head><meta name="referrer" content="always"><noscript><META http-equiv="refresh" content="0;URL=http://bit. ly/2n3VDpo"></noscript><title>http://bit. ly/2n3VDpo</title></head><script>window.opener = null;location.replace("http:\/\/bit. ly\/2n3VDpo")</script>
Request without user agent returns
status code 301
header "location" with the redirect url
Curl example
curl --include --raw "https://t. co/sLMy6zi4Yw"
HTTP/1.1 301 Moved Permanently
cache-control: private,max-age=300
content-length: 0
location: http://bit. ly/2n3VDpo
server: tsa_b
strict-transport-security: max-age=0
vary: Origin
x-response-time: 9
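When the server answers 200 with a meta refresh, as in the first curl response above, the next hop can still be recovered from the body instead of from a Location header. The following is only a minimal sketch using the standard library; the class name MetaRefresh and the regular expression are assumptions made for illustration, the pattern expects http-equiv to appear before content (as it does in the response above), and a real HTML parser such as jsoup would be more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaRefresh {
    // Matches <meta http-equiv="refresh" content="N;URL=target"> in any letter case.
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*"
            + "content\\s*=\\s*[\"']\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"'>]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the meta-refresh target found in the HTML, if any. */
    static Optional<String> target(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical body modelled on the response shown above.
        String body = "<head><noscript><META http-equiv=\"refresh\" "
                + "content=\"0;URL=http://example.com/next\"></noscript></head>";
        System.out.println(target(body).orElse("no meta refresh found"));
    }
}
```

The extracted URL could then be requested as the next hop, which is roughly what a browser does when it honors the meta refresh.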

Related

How to print the whole response on the console

I can't print the whole response from the server on the console.
There are three ways to work around this:
Add this header Connection: close
Replace HTTP/1.1 with HTTP/1.0
Add this s.close(); // Socket.close();
I can't close the connection because I want to send more than one request over the same connection.
I just want to print the whole response without closing the connection.
String content = "GET /Zuck HTTP/1.1\r\nHost: www.facebook.com\r\nuser-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36\r\n\r\n";
Executing your code returns the following:
HTTP/1.1 302 Found
Location: https://www.facebook.com/zuck
Strict-Transport-Security: max-age=15552000; preload
Content-Type: text/html; charset="utf-8"
X-FB-Debug: NHDnNLmTeg5PBPiSL7++1dz/ZdRbnlnKy1gpdfBbLFkvrhbJMJT+nLJd1VYpmEkkkUtmvXsjgLvFEeML/82WUA==
Date: Thu, 18 Jun 2020 15:36:24 GMT
Alt-Svc: h3-27=":443"; ma=3600
Connection: keep-alive
Content-Length: 0
HTTP response status code 302 indicates a redirect to the URL in the Location header, https://www.facebook.com/zuck. Either handle redirects in your code or, to get your example running, simply replace Zuck with zuck in your content string.
Since you are operating on a raw socket, you cannot in general determine when you have received the whole response. You can, however, do it with protocols like HTTP in some cases.
In your example you receive Content-Length: 0, which gives the number of bytes (0) in the message body.
You can also send the header Connection: close, which makes the server close the connection after sending the full response, but I think that is not what you are looking for.
You can also just do the read and write operations on two separate threads.
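The Content-Length idea above can be sketched as follows: read bytes up to the blank line that ends the headers, look up Content-Length, then read exactly that many body bytes and stop, leaving the connection open for the next request. This is an illustration under stated assumptions, not a full HTTP parser: the class name KeepAliveResponseReader is made up, and responses that use chunked transfer encoding or have no Content-Length are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class KeepAliveResponseReader {
    private static final String CONTENT_LENGTH = "content-length:";

    /** Reads exactly one response (headers plus Content-Length body) and leaves the stream open. */
    static String readResponse(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] terminator = {'\r', '\n', '\r', '\n'};
        int matched = 0; // how many bytes of the header terminator have been seen so far
        while (matched < terminator.length) {
            int b = in.read();
            if (b < 0) throw new EOFException("stream ended inside the headers");
            out.write(b);
            matched = (b == terminator[matched]) ? matched + 1 : (b == '\r' ? 1 : 0);
        }
        String headers = new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        int length = 0; // assume an empty body when the header is absent
        for (String line : headers.split("\r\n")) {
            if (line.regionMatches(true, 0, CONTENT_LENGTH, 0, CONTENT_LENGTH.length())) {
                length = Integer.parseInt(line.substring(CONTENT_LENGTH.length()).trim());
            }
        }
        for (int i = 0; i < length; i++) {
            int b = in.read();
            if (b < 0) throw new EOFException("stream ended inside the body");
            out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Two responses back to back, as they would arrive on one keep-alive connection.
        String wire = "HTTP/1.1 302 Found\r\nContent-Length: 0\r\n\r\n"
                + "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
        InputStream in = new ByteArrayInputStream(wire.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readResponse(in));
        System.out.println(readResponse(in));
    }
}
```

With a real socket, the same method would be called with s.getInputStream() after each request is written.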

Rebuild HTTP flow with incubated Java 10 HttpClient

I'm trying to rebuild the session setup to a web server with a Java HttpClient application. I have chosen the incubating HttpClient provided with Java 9 and Java 10.
With Chrome I captured these headers from a single request:
General
Request URL: https://<some_url>?user_id=1176&onlyDirectUserItems=true&onlyAssignedToUser=true&show=Unresolved&itemsFilter=0
Request Method: GET
Status Code: 302 Found
Remote Address: <theProxy>:8000
Referrer Policy: no-referrer-when-downgrade
Response Headers
Connection: Keep-Alive
Content-Length: 164
Content-Type: text/html; charset=iso-8859-1
Date: Fri, 08 Jun 2018 14:33:16 GMT
Keep-Alive: timeout=300, max=100
Location: https://<another_url>:443/nesp/app/plogin?agAppNa=app_me_company_ext&c=secure/name/password/uri&target=%22https://<another-usr>/browseIssues.spr?user_id=1176&onlyDirectUserItems=true&onlyAssignedToUser=true&show=Unresolved&itemsFilter=0%22
P3p: CP="NOI"
Server: Apache
Set-Cookie: IPCZQX03224bfb75=030003000000000000000000000000008f7aed69; path=/; domain=.me.de
Via: 1.1 <host> (Access Gateway-ag-7169149846802036-13837511)
Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Connection: keep-alive
Cookie: org.ditchnet.jsp.tabs.wiki=wiki-wysiwyg; ZNPCQ003-31393000=6c2f99a3; ZNPCQ003-32323200=cd188fdd
DNT: 1
Host: <host>
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36
Query String Parameters
user_id: 1176
onlyDirectUserItems: true
onlyAssignedToUser: true
show: Unresolved
itemsFilter: 0
As can be seen, the response headers provide a URL (header key "Location") which I need to grab and call next. But with my HTTP client the request fails with status code 400 and I get almost nothing back.
This is my code:
url = "https://<some_url>?user_id=1176&onlyDirectUserItems=true&onlyAssignedToUser=true&show=Unresolved&itemsFilter=0";
HttpClient client = HttpClient.newBuilder()
        .proxy(ProxySelector.of(new InetSocketAddress("<theProxy>", 8000)))
        .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
        .followRedirects(HttpClient.Redirect.SAME_PROTOCOL)
        .build();
HttpRequest request = HttpRequest.newBuilder()
        .header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0")
        .header("Upgrade-Insecure-Requests", "1")
        // .header("Host", "<host>")
        .header("Connection", "keep-alive")
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
        .header("Accept-Language", "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7")
        .header("Accept-Encoding", "gzip, deflate, br")
        .uri(new URI(url))
        .build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandler.asString());
HttpHeaders headers = response.headers();
Map<String, List<String>> headerMap = headers.map();
for (String key : headerMap.keySet()) {
    System.out.println(">" + key + "<");
    for (String value : headerMap.get(key)) {
        System.out.println(" " + value);
    }
}
System.out.println(response.statusCode());
System.out.println(response.body());
I have no clue what might be wrong or how to proceed to get this done. I hope someone can tell me what to try next.
What I also do not understand: I had to remove the "Host" header, because with it I got the response: "Your browser sent a request that this server could not understand."
This is the very same header that appears in the Chrome listing.
I got it running with Apache HttpClient. I cannot tell what is wrong with the incubating HttpClient, but there is surely a reason why it is not yet a fully integrated part of the Java 9/10 release.
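For readers on Java 11 or later, where the client was standardized as java.net.http, one way to see what happens at each hop is to disable automatic redirects and follow the Location header by hand. The sketch below assumes that standardized API rather than the incubator module used in the question; the class name ManualRedirect, the example URL and the hop limit are made up for illustration. As far as I know, the standardized client treats Host and Connection as restricted headers and rejects attempts to set them, and it does not decompress gzip bodies on its own, so the sketch leaves those headers out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

final class ManualRedirect {
    /** Resolves a Location header (absolute or relative) against the URI that was requested. */
    static URI resolveLocation(URI requested, String location) {
        return requested.resolve(location);
    }

    /** Follows 3xx responses by hand, printing every hop, for at most maxHops redirects. */
    static HttpResponse<String> fetch(HttpClient client, URI start, int maxHops)
            throws IOException, InterruptedException {
        URI current = start;
        for (int hop = 0; ; hop++) {
            HttpRequest request = HttpRequest.newBuilder(current)
                    .header("Accept", "text/html")
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + current);
            Optional<String> location = response.headers().firstValue("Location");
            boolean redirect = response.statusCode() / 100 == 3 && location.isPresent();
            if (!redirect || hop >= maxHops) {
                return response;
            }
            current = resolveLocation(current, location.get());
        }
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // hops are followed in fetch()
                .build();
        fetch(client, URI.create("https://example.com/"), 5); // placeholder URL
    }
}
```

Note that URI.resolve throws IllegalArgumentException when the Location value is not a syntactically valid URI, so a value with unescaped characters would need cleaning up first.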

How to make $batch POST request using Olingo v2 and Java

I am trying to make a $batch request in Java using OData v2.
An example request from the browser looks something like the one shown below.
But how can I make this request programmatically? Is there a sample call somewhere? Any help is appreciated.
Request URL: https://someUrl/project/odata/project/FOLDER/$batch
Request Method: POST
Status Code: 202 Accepted
Remote Address: 1.2.3.4:1234
Referrer Policy: no-referrer-when-downgrade
content-encoding: gzip
content-length: 5256
content-type: multipart/mixed; boundary=E828EB257B134AC6F567C8D3B67E666E1
dataserviceversion: 2.0
Accept: multipart/mixed
Accept-Encoding: gzip, deflate, br
Accept-Language: en
Connection: keep-alive
Content-Length: 595
Content-Type: multipart/mixed;boundary=batch_4edb-a2cd-948d
Cookie: project-usercontext=project-language=EN&project-client=100;
--Some cookie content--
DataServiceVersion: 2.0
Host: host.myClient.com:1234
MaxDataServiceVersion: 2.0
Origin: https://host.myClient.com:1234
Referer: https://host.myClient.com:1234/project/index.html
project-cancel-on-close: true
project-contextid-accept: header
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/1.2.3.4 Safari/537.36
x-csrf-token: 8Fd53yy2vuCjnaFKrZNuLg==
--batch_4edb-a2cd-948d
Content-Type: application/http
Content-Transfer-Encoding: binary
GET MyEntityDetailsSet HTTP/1.1
project-contextid-accept: header
Accept: application/json
Accept-Language: en
DataServiceVersion: 2.0
MaxDataServiceVersion: 2.0
project-cancel-on-close: true
--batch_4edb-a2cd-948d
Content-Type: application/http
Content-Transfer-Encoding: binary
GET MyObjectSet HTTP/1.1
project-contextid-accept: header
Accept: application/json
Accept-Language: en
DataServiceVersion: 2.0
MaxDataServiceVersion: 2.0
project-cancel-on-close: true
--batch_4edb-a2cd-948d--
You can use Olingo V2 as an OData client (although a rather ugly one in my opinion). There is a full tutorial dedicated to this usage on the official Olingo site: How to use Apache Olingo as client library.
Olingo knows how to build requests and parse responses, but you need an underlying mechanism to execute the HTTP calls. My recommendation would be not to rely on manually opening HttpURLConnections like in the above example, but to use something like Apache HttpClient or some other dedicated library instead (in order to reduce the amount of code you write and also to have access to more advanced concepts like connection pooling).
In a nutshell, you must first read and parse the metadata of the service that you want to consume:
// content = read the metadata as an InputStream
Edm dataModel = EntityProvider.readMetadata(content, false);
You can build a batch request via a fluent-style API:
BatchQueryPart part = BatchQueryPart.method("GET")
        .uri("/Employees('1')")
        .build();
// here you could have a larger list of parts, not just a singleton list
InputStream payload = EntityProvider.writeBatchRequest(
        Collections.singletonList(part), "batch_boundary");
Then you just have to execute it using your HTTP request execution mechanism of choice (method = "POST" and body = the payload variable). Afterwards, you can parse the obtained response using Olingo:
// responseBody = the response body received
// contentType = the Content-Type header received
List<BatchSingleResponse> responses =
        EntityProvider.parseBatchResponse(responseBody, contentType);
// you can obtain the body for each request from the response list
String partBody = responses.get(0).getBody();
InputStream partStream = new ByteArrayInputStream(partBody.getBytes());
String partType = responses.get(0).getHeader(HttpHeaders.CONTENT_TYPE);
Next, using the Edm from the first step, you can also parse each individual body based on the type of request that you built. For example, you could use the readEntry method to deserialize a single entity read:
// first we have to find the entity set you used to make the request
EdmEntitySet entitySet = dataModel.getDefaultEntityContainer()
        .getEntitySet("Employees");
ODataEntry entry = EntityProvider.readEntry(partType, entitySet,
        partStream, EntityProviderReadProperties.init().build());
Lastly, you can use the entry methods to get e.g. the properties.
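The execution step in the middle of this answer is left to the HTTP library of your choice. As one possible illustration, here is a sketch of the POST built with the JDK's own java.net.http client (Java 11+), which is an assumption on my part and not what the answer recommends. The class name BatchRequestSketch, the URL and the empty placeholder payload are hypothetical; in real use the payload would be the InputStream returned by writeBatchRequest, and the boundary must be the same string that was passed to it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpRequest;

final class BatchRequestSketch {
    /** Builds the POST for a $batch endpoint; the boundary must match the one used for the payload. */
    static HttpRequest batchPost(URI batchUri, InputStream payload, String boundary) {
        return HttpRequest.newBuilder(batchUri)
                .header("Content-Type", "multipart/mixed; boundary=" + boundary)
                .header("Accept", "multipart/mixed")
                .POST(HttpRequest.BodyPublishers.ofInputStream(() -> payload))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder payload; a real one would come from EntityProvider.writeBatchRequest.
        InputStream payload = new ByteArrayInputStream(new byte[0]);
        HttpRequest request = batchPost(
                URI.create("https://host.example/odata/$batch"), payload, "batch_boundary");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending the request with HttpClient.send and HttpResponse.BodyHandlers.ofInputStream() would yield the body stream and the Content-Type header that the parsing step needs. Services that require a CSRF token, like the x-csrf-token header visible in the captured browser request, would need that header added as well.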

how to test gwt requestbuilder with junit in gwttestcase?

My GWT app contains data which must be read from XML files that are on the client side.
For this I am using RequestBuilder.
RequestBuilder builder = new RequestBuilder(RequestBuilder.GET, GWT.getHostPageBaseURL() + "myFile.xml");
try {
    builder.sendRequest(null, new RequestCallback() {
        @Override
        public void onResponseReceived(Request request, Response response) {
            // read out the data and put it into a list
        }
    });
.........
The data is read out and put into a list, and from this list it is put into the view.
How can I test this?
When I try this in the GWTTestCase class with some assertEquals calls inside onResponseReceived, I get this error message:
[WARN] 404 - GET /com.test.app.appName.JUnit/myFile.xml (192.168.2.102) 1466 bytes
Request headers
Host: 192.168.2.102:51731
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19
Accept-Language: en-us
Accept: /
Connection: Keep-Alive
Referer: h t t_p://192.168.2.102:51731/com.test.app.appName.JUnit/junit-standards.html?gwt.codesvr=192.168.2.102:51727
Content-Type: text/plain; charset=utf-8
Response headers
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1466
What am I doing wrong?
Please help.
You should look at the GWT test code for RequestBuilder that is available in the GWT source code. You can search for the RequestBuilder test cases.
Also, in your case you might be best served by just using mocking, to avoid slowing down the test cases.
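One way to read the mocking suggestion is to hide RequestBuilder behind a small interface, so that the code which fills the list can be tested on the plain JVM with a fake loader and no HTTP request at all. The sketch below is only an illustration: TextLoader and ItemListPresenter are hypothetical names, the line-based parsing merely stands in for the real XML handling, and it uses Java 8 lambdas, which older GWT versions would need replaced by anonymous classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical seam: production code would implement this with RequestBuilder. */
interface TextLoader {
    void load(String url, Consumer<String> onSuccess);
}

final class ItemListPresenter {
    private final List<String> items = new ArrayList<>();

    /** Loads the file through the given loader and stores one item per non-empty line. */
    void refresh(TextLoader loader, String url) {
        loader.load(url, text -> {
            items.clear();
            for (String line : text.split("\n")) {
                if (!line.trim().isEmpty()) {
                    items.add(line.trim());
                }
            }
        });
    }

    List<String> items() {
        return items;
    }

    public static void main(String[] args) {
        // Fake loader that answers immediately with canned text instead of doing HTTP.
        TextLoader fake = (url, onSuccess) -> onSuccess.accept("first\nsecond\n");
        ItemListPresenter presenter = new ItemListPresenter();
        presenter.refresh(fake, "myFile.xml");
        System.out.println(presenter.items());
    }
}
```

In the application, the RequestBuilder-backed implementation would call onSuccess from onResponseReceived, while tests pass a fake like the one in main.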

Copy Contents of HttpMethod to another HttpMethod

I have been trying to handle a redirect (302) in Java code, and I have now got that working. But I ran into another problem: after login, clicking on any link logs me out. So I checked my TCP stream with Wireshark and found that a few request headers are missing. With my code in place, the HTTP headers are as follows:
GET /index.php/ HTTP/1.1
Host: 10.28.161.31
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111109 CentOS/3.6-3.el5.centos Firefox/3.6.24
Cookie: PHPSESSID=d488eea5e85afc8ec526c1a749e7ab20; path=/
Referrer: http://10.28.161.31
Cookie: $Version=0; PHPSESSID=d488eea5e85afc8ec526c1a749e7ab20; $Path=/ ???
and the original HTTP headers are as follows:
GET / HTTP/1.1
Host: 10.28.161.31
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111109 CentOS/3.6-3.el5.centos Firefox/3.6.24
Referer: http://10.28.161.31/index.php
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: PHPSESSID=978ee1e3b3696743c5c8f507a2ec7212
From what I can see, I did not copy the header contents properly, and that is why I am being logged out so quickly. So my question is: how can I copy the complete content of one HttpMethod to another HttpMethod? A code snippet or an example/tutorial would be great, and a heads-up on where I am going wrong would be appreciated too.
My implementation is right here:
private HttpMethod loadHttp302Request(HttpMethod method, HttpClient client,
        int status, String urlString) throws HttpException, IOException {
    if (status != 302)
        return null;
    String[] url = urlString.split("/");
    HttpMethod theMethod = new GetMethod(urlString + method.getResponseHeader("Location").getValue());
    theMethod.setRequestHeader("Cookie", method.getResponseHeader("Set-Cookie")
            .getValue());
    theMethod.setRequestHeader("Referrer", url[0] + "//" + url[2]);
    int _status = client.executeMethod(theMethod);
    return theMethod;
}
HttpClient can handle redirects automatically if you set a redirect strategy. For a usage example, see the post "Httpclient 4, error 302. How to redirect?".
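If the redirect is still followed by hand, two details in the captured request in the question stand out: the Cookie header carries the Set-Cookie value verbatim, including the path attribute, and the header is spelled Referrer while the browser sends Referer. Whether either one causes the logout is an assumption, but a Cookie request header normally carries only the name=value pair. A minimal sketch of that conversion, with a made-up class name and a shortened session id:

```java
final class RedirectHeaders {
    /** Keeps only name=value from a Set-Cookie value, dropping attributes such as path. */
    static String toCookieHeader(String setCookieValue) {
        int semicolon = setCookieValue.indexOf(';');
        String pair = semicolon < 0 ? setCookieValue : setCookieValue.substring(0, semicolon);
        return pair.trim();
    }

    public static void main(String[] args) {
        // Shortened, made-up session id in the same shape as the one in the question.
        System.out.println(toCookieHeader("PHPSESSID=d488eea5; path=/"));
    }
}
```

The result could be passed to setRequestHeader("Cookie", ...) in place of the raw Set-Cookie value, with the referring page sent under the header name Referer.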
