url = "https://www.lmcu.org/?__cf_chl_jschl_tk__=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs";
HttpURLConnection connection = pingHttpUrl(url);
responseCode = connection.getResponseCode();
public HttpURLConnection pingHttpUrl(String url) throws IOException {
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) new URL(url).openConnection();
conn.setRequestMethod("GET");
conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
conn.setConnectTimeout(2000);
conn.setInstanceFollowRedirects(false);
conn.setReadTimeout(10000);
conn.connect();
Thread.sleep(1000);
} catch (Exception e) {
logger.error("Caught exception : {}", e.getMessage());
throw new IOException();
}
return conn;
}
This returns response code 503, but the site loads properly in a browser. What could be the issue?
The problem is with the request headers. I found that this site, which is hosted behind Cloudflare, requires two headers to be exactly right; otherwise you receive the 503 response:
User-Agent - Your header specifies Chrome version 76, and the server apparently has a problem with this. I had success with this User-Agent value: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
cookie - I found that the cf_clearance cookie needs to be set, and possibly the other Set-Cookie values that are returned on the first request. This value has to do with Cloudflare's support for Privacy Pass (https://blog.cloudflare.com/cloudflare-supports-privacy-pass/). It appears to be a means of verifying that a user is a human and not a machine, which in turn is bad news for your efforts here.
I have a working solution below, but it will be hard to automate, since it requires you to establish a browser session and use the cookie set there in the code. When the cf_clearance cookie expires, you will have to visit the site again and update the cookie value in the code.
I would also speculate that the User-Agent header of the request is used in generating the required cf_clearance cookie. That would make the cookie more difficult to hijack, as you would have to send the same User-Agent as the browser that made the request when Cloudflare generated the cf_clearance cookie.
I have journaled my investigation here:
When visiting the URL in my browser:
https://www.lmcu.org/?cf_chl_jschl_tk=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs
And inspecting the response from the server, it turns out that it is in fact returning a 503 as well.
For some reason that I can't make out, the browser is then redirected to the URL below. I cannot see a Location header in the response, nor can I find this URL anywhere in the response.
https://www.lmcu.org/?cf_chl_jschl_tk=fe835fdc1e7e2f5b2857ab5eb4be84e67d0e8c42-1592506549-0-AQ3E1piNGHg7O7lxgRyItR1U5BzB52q7GmCHe_HPJBsUHv8RcZCgqLPPtyngPmDjvy7pZDprPNK6ihKVEgQ7HqmbDSPXZ1aHPkBDs9re49u_Q_jI04etmtK7E0GIdxhKWCd-p4TR7b_b0JdnwzJOF6z4XaJQOgNU8kazJr5Mo96zxQpUlsKWPSumEmSfynkGeMDgkM-O1mN59LKp0p4kt-2O2IIFrlc8289ZbCSO6JghtvDsLsFDA3VxLV3Irn2W3KQ8sHg_TdwB-0g0WX9J-WTwedVYzj2a7uNtH377ZIritTXKqRw1qeQ6mkpxQ0h_OVMIl8XUiEC0Zj1KP50tUK8
I checked with Postman, and sure enough - I got the 503 error there as well. As far as I could tell, the server (or reverse proxy in front of it) was inspecting the headers of the request, and invalidating the request based on them. I fooled around a little, moving headers from the browser request into Postman, and finally figured out that it is a combination of the cookie and User-Agent headers being set that allows the request to be served.
The User-Agent header must not carry the Chrome version you specified; I have it working with version 83 here.
The cookie header is something that the browser populates after my first visit to the site. That makes it a bit harder to handle in your code. I tried to fetch it in code with connection.getHeaderField("set-cookie"), but that cookie does not seem to cut it.
However, I was able to make the code work by taking the cookie from my browser and setting it manually in code, along with the User-Agent:
public HttpURLConnection pingHttpUrl(String url) throws IOException {
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) new URL(url).openConnection();
conn.setRequestMethod("GET");
// This one does not work for the reason of the chrome version apparently
// conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36");
conn.addRequestProperty("cookie", "<cookie value from the browser, from the header on a successful request>");
conn.setConnectTimeout(2000);
conn.setInstanceFollowRedirects(false);
conn.setReadTimeout(10000);
conn.connect();
Thread.sleep(1000);
} catch (Exception e) {
System.out.println(String.format("Caught exception : %s", e.getMessage()));
throw new IOException(e); // preserve the original exception as the cause
}
return conn;
}
I later found out that it is the value of the cf_clearance key in the cookie that makes the difference.
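The cookie handling described above can be sketched as a small helper. This is a minimal sketch and not the answer author's code: the class name CookieHeaders and the method toCookieHeader are hypothetical. It reduces raw Set-Cookie values to the name=value pairs that belong in a Cookie request header. As noted above, cookies fetched in code did not appear to be enough for this site, so the cf_clearance pair would still have to be copied from a browser session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, introduced only for illustration.
final class CookieHeaders {

    /**
     * Reduces raw Set-Cookie header values to a single Cookie request header.
     * Only the leading name=value pair of each value is kept; attributes such
     * as Path, Expires or HttpOnly are dropped.
     */
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }
}
```

The result could be passed to conn.addRequestProperty("cookie", ...) in place of the placeholder in the code above. Note that connection.getHeaderField("set-cookie") returns only the last value when a header is sent several times, whereas connection.getHeaderFields() exposes every value of a header as a list.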
I am trying to download a file using its direct download link, but I get this error roughly one time in three; the other two times it runs as expected.
java.io.IOException: Server returned HTTP response code: 403 for URL:
This is my code
public static void copyUrlToFile(URL url, File destination) throws IOException {
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2");
connection.connect();
FileUtils.copyInputStreamToFile(connection.getInputStream(), destination);
}
In my Java app, I want to download an image using the following:
ImageIO.read(new URL("https://www.example.com/example.png"))
It works fine most of the time, except for this url: https://cdn-images-1.medium.com/max/1200/1*XSCC_nLOSp1VJ6wXeANgCQ.png
The problem with the URL is that there is an * in it. I tried the following workarounds, without any success:
replacing the * by \*
replacing the * by %2A
I always get the following error:
javax.imageio.IIOException: Can't get input stream from URL!
at javax.imageio.ImageIO.read(ImageIO.java:1395)
How can I download the image, then?
Thanks for your help.
The problem seems to be related to Java 8 and is fixed in Java 11. With Java 8, an HTTP 403 is returned:
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://cdn-images-1.medium.com/max/1200/1*XSCC_nLOSp1VJ6wXeANgCQ.png
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at java.net.URL.openStream(URL.java:1045)
at javax.imageio.ImageIO.read(ImageIO.java:1393)
To fix this, we need to set the User-Agent header:
URL url = new URL("https://cdn-images-1.medium.com/max/1200/1*XSCC_nLOSp1VJ6wXeANgCQ.png");
URLConnection connection = url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
BufferedImage bufferedImage = ImageIO.read(connection.getInputStream());
I am trying to use Microsoft Custom Vision. I need to make an HTTP request to send the image to be analyzed. I successfully made the request from C#, so I know the information is correct.
However, when I tried to make the same request in Java I received an HTTP 400 error.
I believe I did not handle the request correctly in Java. Is that true?
Following are the snippets.
C#:
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Prediction-Key", PredicitionKey);
using (var content = new
ByteArrayContent(byteData))
{
content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
response = await client.PostAsync(url, content);
Console.WriteLine(await response.Content.ReadAsStringAsync());
}
Java:
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("Prediction-Key", predicitionKey);
connection.setRequestProperty("Content-Type", "application/octet-stream");
connection.setDoInput(true);
connection.setDoOutput(true);
connection.getOutputStream().write(data.getData());
connection.connect();
Reader in = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
First, replace
connection.connect();
with
connection.getResponseCode();
If it still doesn't work, then the headers are the problem, since the C# code runs successfully.
In Fiddler, the only difference between your C# request and the Java one is that the Java request has two additional headers (Accept, User-Agent).
Try setting them explicitly:
connection.setRequestProperty("Accept", "*/*");
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");
If it still doesn't work, try removing these two headers, or check the data you are sending in the request body.
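The ordering suggested above can be sketched as follows. This is a minimal sketch under assumptions: the class name OctetStreamPost and its method names are hypothetical, and only the two headers from the question are set. With HttpURLConnection, getOutputStream() already opens the connection, so a later connect() call does nothing; the request is completed when the response is first asked for, for example via getResponseCode(). On a 4xx or 5xx status the body is available from getErrorStream(), and it often explains why a request was rejected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, introduced only for illustration.
final class OctetStreamPost {

    /** Configures a POST with the two headers from the question; nothing is sent yet. */
    static HttpURLConnection prepare(URL url, String predictionKey) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Prediction-Key", predictionKey);
        connection.setRequestProperty("Content-Type", "application/octet-stream");
        connection.setDoOutput(true); // must be set before getOutputStream()
        return connection;
    }

    /** Writes the body, then returns "<status> <response or error body>". */
    static String send(HttpURLConnection connection, byte[] data) throws IOException {
        try (OutputStream out = connection.getOutputStream()) {
            out.write(data);
        }
        int status = connection.getResponseCode(); // completes the request and reads the status line
        InputStream stream = status >= 400 ? connection.getErrorStream() : connection.getInputStream();
        return status + " " + readUtf8(stream);
    }

    private static String readUtf8(InputStream stream) throws IOException {
        if (stream == null) {
            return ""; // the server sent no body
        }
        try (InputStream in = stream) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A call such as OctetStreamPost.send(OctetStreamPost.prepare(url, predicitionKey), data.getData()) would then print the 400 response body instead of throwing from getInputStream().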
I am trying to parse a particular website and am getting an HTTP response code of 419 when my Java code calls it. I need to parse the response to find content, but I am stuck on the response code.
I have put together a Java program using Apache HttpClient (version 4.5.6) to call the website that I need to parse. The HTTP response code I get back is 419.
try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
HttpGet httpGet = new HttpGet("http://www.website.com");
try (CloseableHttpResponse response1 = httpclient.execute(httpGet)) {
System.out.println(response1.getStatusLine());
HttpEntity entity1 = response1.getEntity();
EntityUtils.consume(entity1);
}
}
The result that it prints out is this:
HTTP/1.1 419 status code 419
I am expecting a 200:
HTTP/1.1 200 OK
That is what I get when I change the website to Google or other sites.
I was making a GET request through the HttpClient library as well as from Postman and faced the same 419 error. The usual way to solve a 419 error is to add a CSRF token when making a form submission.
However, you may still be wondering how to find a CSRF token when you are only making a GET request and get status 419. In my case, I solved the problem by adding a user-agent: xxxx header to the request.
Example:
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36
HttpClient Code:
connectionManager = new PoolingHttpClientConnectionManager();
...
...
...
httpClient = HttpClients.custom()
.setConnectionManager(connectionManager)
.setRedirectStrategy(new LaxRedirectStrategy())
.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36")
.build();
I have some code, which is meant to send a GET request via HTTP to a server, and fetch the data there. I haven't yet coded the part that does stuff with the response, as I first wanted to test whether the GET request worked. And it didn't:
private static String fetch() throws UnsupportedEncodingException, MalformedURLException, IOException {
// Set the parameters
String url = "http://www.futhead.com";
String charset = "UTF-8";
//Fire the request
try {
URLConnection connection = new URL(url).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
// ^^^ I tried this, and it doesn't help!
InputStream response = connection.getInputStream();
HttpURLConnection httpConnection = (HttpURLConnection) new URL(url).openConnection();
httpConnection.setRequestMethod("GET");
System.out.println("Status: " + httpConnection.getResponseCode());
} catch (UnknownHostException e) {
// stuff
}
return null;
// ^^^ I haven't coded the fetching itself yet
}
With that code in mind, fetch() prints Status: 403. Why is this happening? My guess is that this particular server doesn't let non-browser clients access it (because the code works with http://www.google.com), but is there a workaround?
There are some answers out there already, but some of them are either irrelevant to me (they talk about a problem with HTTPS) or incomprehensible. I've tried those that I can understand, to no avail.
You might have Browser Integrity Check enabled: https://support.cloudflare.com/hc/en-us/articles/200170086-What-does-the-Browser-Integrity-Check-do-
I disabled Browser Integrity Check and it works fine now. Another solution would be to set the User-Agent, if possible.
I experienced the problem from Scala, which eventually uses java.net.URL.
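If setting the User-Agent is the route taken, the header has to be on the same connection whose status is read. In the question's code the header is set on connection, but the status is printed from a second httpConnection that never receives it, which may be why setting it appeared not to help. A minimal sketch, assuming hypothetical names (BrowserLikeGet, open, status) and reusing the User-Agent string from the question:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, introduced only for illustration.
final class BrowserLikeGet {

    // Copied from the question; any User-Agent the server accepts would do.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    /** Opens (but does not yet send) a GET request carrying the given User-Agent. */
    static HttpURLConnection open(URL url, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    /** Sends the request and returns the status code from that same connection. */
    static int status(URL url) throws IOException {
        HttpURLConnection connection = open(url, USER_AGENT);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

Whether the server then answers 200 still depends on its checks; a site behind a JavaScript challenge may keep refusing the request even with a browser-like User-Agent.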