Trying to replicate a POST request to retrieve YouTube comments - java

I'm trying to get the most recent comments from YouTube videos by replicating the POST request that I see in the Firebug plugin.
The response code I get after executing the code is 403:
The 403 or "Forbidden" error message is a HTTP standard response code
indicating that the request was legal and understood but the server
refuses to respond to the request.
I'm not quite sure what information I'm missing to successfully get the comments from the video.
Map<String, String> cookies =new HashMap<String,String>();
cookies.put("session_token", session_token);
cookies.put("page_token", page_token);
Response res= Jsoup.connect(url_comments).header("Accept-Encoding", "gzip, deflate")
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0")
.referrer(linkVideo)
.data(cookies)
.header("Host", "www.youtube.com")
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.header("Accept-Language", "es-MX,es-ES;q=0.9,es;q=0.7,es-AR;q=0.6,es-CL;q=0.4,en-US;q=0.3,en;q=0.1")
.header("Accept-Encoding", "gzip, deflate, br")
.header("X-YouTube-Client-Name", "1")
.header("X-YouTube-Client-Version", "1.20161213")
.header("X-YouTube-Page-CL", "141992279")
.header("X-YouTube-Page-Label", "youtube_20161213_0_RC1")
.header("X-YouTube-Variants-Checksum", "a55ebd64c41b17f87231bfa795156a50")
.header("Content-Type", "application/x-www-form-urlencoded")
.header("Cookie", "VISITOR_INFO1_LIVE=TRli2T2GQEQ; PREF=cvdm=list&gl=DE&f1=50000000&f5=30; _ga=GA1.2.527380759.1481667156; CONSENT=WP.25af40; YSC=vwoNBo1vTsg; ST-os04us=itct=CAIQ7pgBIhMI1Lef15X30AIVg2ROCh2zVgAw&csn=Bg1TWLXdKoSGuwXKyYzADg")
.header("Connection", "keep-alive").ignoreContentType(true)
.method(Method.POST).execute();
The session_token and page_token are values I get directly from the video web page. I need help solving this and getting the first comments with the POST request.

Related

Getting 503 error with HttpUrlConnection but site loading on browser

url = "https://www.lmcu.org/?__cf_chl_jschl_tk__=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs";
HttpURLConnection connection = pingHttpUrl(url);
responseCode = connection.getResponseCode();
public HttpURLConnection pingHttpUrl(String url) throws IOException {
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) new URL(url).openConnection();
conn.setRequestMethod("GET");
conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
conn.setConnectTimeout(2000);
conn.setInstanceFollowRedirects(false);
conn.setReadTimeout(10000);
conn.connect();
Thread.sleep(1000);
} catch (Exception e) {
logger.error("Caught exception : {}", e.getMessage());
throw new IOException();
}
return conn;
}
This gives a response code of 503, but the site loads properly in the browser. What could be the issue?
The problem is with the headers of the request. I found that this site, which is hosted behind Cloudflare, requires two headers to be just so; otherwise you will receive the 503 response:
User-Agent - Your header specifies Chrome version 76, and apparently the server has a problem with this. I had success with this User-Agent value: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
cookie - I found that the cookie value cf_clearance needs to be set, and possibly the other Set-Cookie values that are returned on the first request. This value has to do with Cloudflare's support for Privacy Pass (https://blog.cloudflare.com/cloudflare-supports-privacy-pass/). It appears to be a means of verifying that a user is human and not a machine, which in turn is bad news for your efforts here.
I have a working solution below, but it will be hard to automate, since it requires you to establish a browser session and use the cookie set there in the code. When the cf_clearance cookie expires, you will have to visit the site again and reset the cookie value in the code.
I would also speculate that the User-Agent header of the request is used in generating the required cf_clearance cookie. That would make it more difficult to hijack the cookie, as you would have to send the same User-Agent as the browser that made the request when Cloudflare generated the cf_clearance cookie.
I have journaled my investigation here:
When visiting the URL in my browser:
https://www.lmcu.org/?cf_chl_jschl_tk=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs
And inspecting the response that the server gives, it turns out that it is in fact returning a 503 as well.
For some reason that I can't make out, the browser is redirected to the URL below instead. I cannot see a Location header in the response, nor can I find this URL anywhere in the response.
https://www.lmcu.org/?cf_chl_jschl_tk=fe835fdc1e7e2f5b2857ab5eb4be84e67d0e8c42-1592506549-0-AQ3E1piNGHg7O7lxgRyItR1U5BzB52q7GmCHe_HPJBsUHv8RcZCgqLPPtyngPmDjvy7pZDprPNK6ihKVEgQ7HqmbDSPXZ1aHPkBDs9re49u_Q_jI04etmtK7E0GIdxhKWCd-p4TR7b_b0JdnwzJOF6z4XaJQOgNU8kazJr5Mo96zxQpUlsKWPSumEmSfynkGeMDgkM-O1mN59LKp0p4kt-2O2IIFrlc8289ZbCSO6JghtvDsLsFDA3VxLV3Irn2W3KQ8sHg_TdwB-0g0WX9J-WTwedVYzj2a7uNtH377ZIritTXKqRw1qeQ6mkpxQ0h_OVMIl8XUiEC0Zj1KP50tUK8
I checked with Postman, and sure enough - I got the 503 error there as well. As far as I could tell, the server (or reverse proxy in front of it) was inspecting the headers of the request, and invalidating the request based on them. I fooled around a little, moving headers from the browser request into Postman, and finally figured out that it is a combination of the cookie and User-Agent headers being set that allows the request to be served.
The User-Agent header is not allowed to have the specified Chrome version; I have it working with version 83 here.
The cookie header is something that the browser populates from my first visit to the site. So that is a bit harder to handle in your code. I tried to fetch it in code with connection.getHeaderField("set-cookie"), but that cookie does not seem to cut it.
But I was able to make the code work by taking the cookie from my browser and setting it manually in code, along with the User-Agent:
public HttpURLConnection pingHttpUrl(String url) throws IOException {
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) new URL(url).openConnection();
conn.setRequestMethod("GET");
// This one does not work for the reason of the chrome version apparently
// conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36");
conn.addRequestProperty("cookie", "<cookie value from the browser, from the header on a successful request>");
conn.setConnectTimeout(2000);
conn.setInstanceFollowRedirects(false);
conn.setReadTimeout(10000);
conn.connect();
Thread.sleep(1000);
} catch (Exception e) {
System.out.println(String.format("Caught exception : %s", e.getMessage()));
throw new IOException(e); // keep the original exception as the cause
}
return conn;
}
I later found out that it is the value of the cf_clearance key in the cookie that makes the difference.
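Since only the cf_clearance value seems to matter, one way to avoid pasting the whole browser cookie is to pull that single key out of the copied header. The following is a minimal sketch using only the JDK; the class name CookieHeaderSketch, its method names, and the sample cookie string are made up for illustration, and it assumes the copied header uses the usual "name=value; name=value" layout.

```java
import java.util.Optional;

final class CookieHeaderSketch {

    // Returns the value of the named cookie from a raw "Cookie" header string, if present.
    static Optional<String> cookieValue(String cookieHeader, String name) {
        for (String pair : cookieHeader.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            // Compare only the part before the first '=' so values containing '=' stay intact.
            if (eq > 0 && trimmed.substring(0, eq).equals(name)) {
                return Optional.of(trimmed.substring(eq + 1));
            }
        }
        return Optional.empty();
    }

    // Builds a Cookie header that carries only the cf_clearance pair.
    static String clearanceOnly(String cookieHeader) {
        return cookieValue(cookieHeader, "cf_clearance")
                .map(value -> "cf_clearance=" + value)
                .orElseThrow(() -> new IllegalArgumentException("no cf_clearance cookie in header"));
    }

    public static void main(String[] args) {
        // Placeholder string standing in for the header copied from the browser.
        String copiedFromBrowser = "first=abc123; cf_clearance=PLACEHOLDER-VALUE; other=1";
        System.out.println(clearanceOnly(copiedFromBrowser));
    }
}
```

The result of clearanceOnly(...) could then be passed to conn.addRequestProperty("cookie", ...) in place of the full browser cookie. The cookie still expires, so this does not remove the need to refresh it from a real browser session.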

How to handle 419 http response in Java

I am trying to parse a particular website and am getting an HTTP response code of 419 when my Java code calls it. I need to parse the response to find content, and I am stuck on the response code.
I have put together a Java program using Apache HttpClient (version 4.5.6) to call the website that I need to parse. The HTTP response code I get back is 419.
try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
HttpGet httpGet = new HttpGet("http://www.website.com");
try (CloseableHttpResponse response1 = httpclient.execute(httpGet)) {
System.out.println(response1.getStatusLine());
HttpEntity entity1 = response1.getEntity();
EntityUtils.consume(entity1);
}
}
The result that it prints out is this:
HTTP/1.1 419 status code 419
I am expecting a 200
HTTP/1.1 200 OK
I get that when I change the website to google or other sites.
I was making a GET request through the HttpClient library as well as from Postman and got the same 419 error. The usual fix for a 419 is to add a CSRF token when submitting a form.
However, if you are making a GET request, get status 419, and are still wondering where a CSRF token would come from: in my case I solved the problem by adding a user-agent: xxxx header to the request.
Example:
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36
HttpClient Code:
connectionManager = new PoolingHttpClientConnectionManager();
...
...
...
httpClient = HttpClients.custom()
.setConnectionManager(connectionManager)
.setRedirectStrategy(new LaxRedirectStrategy())
.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36")
.build();

REST Assured request is not read properly by JSON api

I'm sending a string, e.g.:
private final String test = "{\"data\":{\"type\":\"test\",\"attributes\":{\"color\":\"yellow\",\"name\":\"TestN\"}}}";
via Rest Assured
given()
.header("Origin", "http://localhost:5000")
.header("Accept-Encoding", "gzip, deflate, br")
.header("Accept-Language", "pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7")
.header("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36")
.header("Content-Type", "application/vnd.api+json")
.header("Accept", "application/vnd.api+json")
.header("Cookie", "xxxxxx")
.header("Connection", "keep-alive")
.header("Cache-Control", "no-cache")
.header("Host", "localhost:4400")
.body(test).with()
.log().everything()
.when()
.post(base + "test-endpoint")
.then().statusCode(201);
Unfortunately, the API responds with 500. Sending an identical request via Postman works perfectly. The only difference is the "assigns" section. After the Postman request it looks like:
assigns: %{
doc: %Jabbax.Document{
data: %Jabbax.Document.Resource{
attributes: %{"appointment_color" => "yellow", "name" => "TestN"},
id: nil,
links: %{},
meta: %{},
relationships: %{},
type: "test"
},
errors: [],
included: [],
jsonapi: %{version: "1.0"},
links: %{},
meta: %{}
}
},
whereas after the REST Assured request it is empty:
assigns: %{},
All of the headers are added, and I've also tried sending the body as a string parsed from a .json file. Everything gives the same result. Does anybody know what the problem could be?
The clue was that REST Assured added charset information to the Content-Type header; a similar issue was described here.
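To illustrate why an appended charset can matter, here is a small JDK-only sketch. It assumes, purely for illustration, that the server selects its body parser by comparing the Content-Type header against the exact string "application/vnd.api+json"; the class name MediaTypeSketch, its methods, and the example charset value are hypothetical and not taken from the API in question.

```java
import java.util.Locale;

final class MediaTypeSketch {

    // Strips parameters such as "; charset=..." and normalizes the case of the media type.
    static String baseType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // A strict server-side check: any extra parameter makes the comparison fail.
    static boolean strictMatch(String received, String expected) {
        return received.equals(expected);
    }

    // A lenient check that ignores parameters.
    static boolean lenientMatch(String received, String expected) {
        return baseType(received).equals(baseType(expected));
    }

    public static void main(String[] args) {
        String expected = "application/vnd.api+json";
        // Example of a header with a charset parameter appended by the client.
        String withCharset = "application/vnd.api+json; charset=ISO-8859-1";
        System.out.println("strict:  " + strictMatch(withCharset, expected));
        System.out.println("lenient: " + lenientMatch(withCharset, expected));
    }
}
```

If that is what happens here, the body would never reach the JSON:API parser, which would be consistent with the empty assigns. On the client side, REST Assured's EncoderConfig has an appendDefaultContentCharsetToContentTypeIfUndefined(false) option that stops it from appending the charset.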

Getting 500 error from HttpClient, works in browser

I'm using Apache HttpClient to try to submit some post data to a server. Unfortunately, I don't have access to the server to get any log information so that won't be possible.
If I go through this process with Firefox, it works fine. (I do get a 302 warning on this particular page)
I have matched the Request headers of both Firefox and my program.
Firefox Request Headers:
Host: server ip
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://server ip/
Content-Type: application/x-www-form-urlencoded
Content-Length: 407
Cookie: sessionId=blahblah
Connection: keep-alive
Upgrade-Insecure-Requests: 1
My program's request headers, as shown by context.getRequest().getAllHeaders():
Host: server ip
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://server ip/
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Content-Length: 406
Cookie: sessionId=blahblah
I have matched the body of the request by comparing the output of EntityUtils.toString(httpPost.getEntity(), "UTF-8"); with the request body shown in Firefox's built-in developer tools, and they match almost character for character. (There is just a slight difference in the session id, which is expected as it's not using the same session.)
I'm not sure what else to check. What could be causing the server to behave differently between the Browser and the program?
Below is my code for the POST request.
HttpPost httpPost = new HttpPost("https://" + getIp() + "");
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("FTPUsername", "blah"));
params.add(new BasicNameValuePair("FTPPassword", "blah"));
params.add(new BasicNameValuePair("FormButtonSubmit", "OK"));
httpPost.setEntity(new UrlEncodedFormEntity(params));
httpPost.setHeader("Host", ip);
httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0");
httpPost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
httpPost.setHeader("Accept-Language", "en-US,en;q=0.5");
httpPost.setHeader("Accept-Encoding", "gzip, deflate, br");
httpPost.setHeader("Referer", referer);
httpPost.setHeader("Content-Type", "application/x-www-form-urlencoded");
httpPost.setHeader("Connection", "keep-alive");
httpPost.setHeader("Upgrade-Insecure-Requests", "1");
//Response
HttpResponse response = getHttpClient().execute(httpPost, LoginRequest.context);
int statusCode = response.getStatusLine().getStatusCode();
httpPost.releaseConnection();
I realize this could probably be many things since 500 is a server error, but it's got to be something I'm submitting wrong or I'm missing something as it works perfectly in the browser.
The 302 "warning" is actually a redirect. HttpClient does not follow redirects for POST requests automatically; you must set a RedirectStrategy. For HttpClient 4.3:
HttpClient instance = HttpClientBuilder.create()
.setRedirectStrategy(new LaxRedirectStrategy()).build();
see examples in answer and w3 docs:
If the 302 status code is received in response to a request other than
GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user
Are you working on a Windows machine or a Linux machine?
If you are on Windows, have you tried working with a WAMP server (on Linux, a LAMP server)? Installing it made those errors go away for me. Once it is installed, either change the port number that Skype uses in Skype's settings or uninstall Skype. It should work then.

Android: Always getting GET on the server eventhough POST was sent

I am trying to send data using the POST method from my Android app. However, on
the server it is always recognized as GET. I am using a Rails app as the web
service. Here is the snippet of my Android code:
 
URI uri = new URI(hostName);
HttpPost httpRequest = new HttpPost(uri);
 httpRequest.addHeader("Accept", "application/json");
 httpRequest.addHeader("Content-Type", "application/json");
 List<NameValuePair> pairs = new ArrayList<NameValuePair>();
 pairs.add(new BasicNameValuePair("key1", "value1"));
 httpRequest.setEntity(new UrlEncodedFormEntity(pairs));
HttpClient httpClient = new DefaultHttpClient();
HttpResponse httpResponse = httpClient.execute(httpRequest);
Have I done anything wrong? Thanks for your help.
Your Android code looks fine. Make sure your server log doesn't show a 301 redirect code for the POST followed by a 200 code for a GET. Strangely, this can be the case depending on your host configuration.
For example, you might see something like this:
123.156.189.123 - - [21/Oct/2011:09:03:34 -0700] "POST /server_script.php HTTP/1.1" 301 532 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1"
123.156.189.123 - - [21/Oct/2011:09:03:34 -0700] "GET /server_script.php HTTP/1.1" 200 250 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1"
Here the GET was not redirected (code 200), but the POST was (code 301). If this is happening, then you need to override your redirect settings using a .htaccess file or other configuration options.
Were you being redirected? I suspect that if you try to POST to domain A and it redirects you to domain B, your request will be turned into a GET request. I had the same problem until I decided to use the server's IP address in the POST request directly, instead of using a hostname that redirects to the IP.
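The POST-becomes-GET effect described in these answers can be sketched as a small lookup of which method a client uses for the follow-up request. This models what the HTTP specification permits and what many clients do in practice (rewriting POST to GET on 301 and 302, always switching to GET on 303, keeping the method on 307 and 308); it is an illustration, not the behaviour of any particular Android HttpClient version, and the class and method names are made up.

```java
final class RedirectMethodSketch {

    // Returns the HTTP method a typical client would use when following a redirect.
    static String methodAfterRedirect(int status, String originalMethod) {
        switch (status) {
            case 301:
            case 302:
                // Clients are allowed to, and commonly do, change POST to GET here.
                return "POST".equals(originalMethod) ? "GET" : originalMethod;
            case 303:
                // "See Other": the follow-up is a GET (HEAD stays HEAD).
                return "HEAD".equals(originalMethod) ? "HEAD" : "GET";
            case 307:
            case 308:
                // These codes require the method (and body) to be preserved.
                return originalMethod;
            default:
                throw new IllegalArgumentException("not a redirect status handled here: " + status);
        }
    }

    public static void main(String[] args) {
        System.out.println(methodAfterRedirect(301, "POST"));
        System.out.println(methodAfterRedirect(307, "POST"));
    }
}
```

This is why a server log can show a POST answered with 301 followed immediately by a GET answered with 200: the second line is the client following the redirect with a rewritten method. Posting directly to the final URL, or having the server redirect with 307 or 308, avoids the rewrite.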
