Cannot fix HTTP response 403 in Java

I'm just trying to download a file from its direct download link, but I get this error roughly one run in three; the other two runs work as expected.
java.io.IOException: Server returned HTTP response code: 403 for URL:
This is my code
public static void copyUrlToFile(URL url, File destination) throws IOException {
    HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
    connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2");
    connection.connect();
    FileUtils.copyInputStreamToFile(connection.getInputStream(), destination);
}

Related

403 Forbidden returned by trying to download a video in Java code

I want to download an embedded video from a website
The video is easily accessed in a browser but I'm struggling with getting it in Java
Below is the network tab with the website open after clicking a play button
If I copy the Request URL, open a new tab, paste it and hit Enter, the segment is downloaded.
However, if I open a new Incognito window and paste the URL there, I get 403 Forbidden. The reverse also holds: if I open the video in Incognito mode, press the play button, then copy the URL and load it there, the segment downloads, but the same URL gives 403 in a normal window.
I get the same result while running the following code:
public static void main(String[] args) throws IOException
{
    String baseUrl = "someUrl/ih4mxuprbnvyb3iihebxt26uxqo5x32qhldeyaejlhbmm7kga7llvftoqdia/seg-1-v1-a1.ts";
    URL url = new URL(baseUrl);
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.setRequestMethod("GET");
    con.setRequestProperty("sec-ch-ua", "\"Google Chrome\";v=\"87\", \" Not;A Brand\";v=\"99\", \"Chromium\";v=\"87\"");
    con.setRequestProperty("Accept", "*/*");
    con.setRequestProperty("Upgrade-Insecure-Requests", "1");
    con.setRequestProperty("sec-ch-ua-mobile", "?0");
    con.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36");
    System.out.println(con.getResponseCode());
}
I've tried adding the same headers but it didn't work. What am I missing? Thanks in advance

Getting 503 error with HttpUrlConnection but site loading on browser

url = "https://www.lmcu.org/?__cf_chl_jschl_tk__=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs";
HttpURLConnection connection = pingHttpUrl(url);
responseCode = connection.getResponseCode();
public HttpURLConnection pingHttpUrl(String url) throws IOException {
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
        conn.setConnectTimeout(2000);
        conn.setInstanceFollowRedirects(false);
        conn.setReadTimeout(10000);
        conn.connect();
        Thread.sleep(1000);
    } catch (Exception e) {
        logger.error("Caught exception : {}", e.getMessage());
        throw new IOException();
    }
    return conn;
}
This returns response code 503, but the site loads properly in a browser. What could be the issue?
The problem is with the headers of the request. I found that this site, which is hosted behind Cloudflare, requires two headers to be just so; otherwise you will receive the 503 response:
User-Agent - Your header specified Chrome version 76, and the server apparently has a problem with that. I had success with this User-Agent value: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
cookie - I found that the cookie value cf_clearance needs to be set, and possibly the other Set-Cookie values that are returned on the first request. This value has to do with Cloudflare's support for Privacy Pass (https://blog.cloudflare.com/cloudflare-supports-privacy-pass/). It appears to be a means of verifying that a user is human and not a machine, which in turn is bad news for your efforts here.
I have a working solution below, but it will be hard to automate, since it requires you to establish a browser session and use the cookie set there in the code. When the cf_clearance cookie expires, you will have to visit the site again and reset the cookie value in the code.
I would also speculate that the User-Agent header of the request is used in generating the required cf_clearance cookie. That would make it more difficult to hijack the cookie, as you would have to send a User-Agent matching the browser that made the request when Cloudflare generated the cookie.
I have journaled my investigation here:
When visiting the URL in my browser:
https://www.lmcu.org/?cf_chl_jschl_tk=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs
And inspecting the response that the server gives, it turns out that it is in fact giving back a 503 as well:
For some reason that I can't make out, the browser is redirected to the URL below instead. I cannot see a Location header in the response, nor can I find this URL anywhere in the response.
https://www.lmcu.org/?cf_chl_jschl_tk=fe835fdc1e7e2f5b2857ab5eb4be84e67d0e8c42-1592506549-0-AQ3E1piNGHg7O7lxgRyItR1U5BzB52q7GmCHe_HPJBsUHv8RcZCgqLPPtyngPmDjvy7pZDprPNK6ihKVEgQ7HqmbDSPXZ1aHPkBDs9re49u_Q_jI04etmtK7E0GIdxhKWCd-p4TR7b_b0JdnwzJOF6z4XaJQOgNU8kazJr5Mo96zxQpUlsKWPSumEmSfynkGeMDgkM-O1mN59LKp0p4kt-2O2IIFrlc8289ZbCSO6JghtvDsLsFDA3VxLV3Irn2W3KQ8sHg_TdwB-0g0WX9J-WTwedVYzj2a7uNtH377ZIritTXKqRw1qeQ6mkpxQ0h_OVMIl8XUiEC0Zj1KP50tUK8
I checked with Postman, and sure enough - I got the 503 error there as well. As far as I could tell, the server (or reverse proxy in front of it) was inspecting the headers of the request, and invalidating the request based on them. I fooled around a little, moving headers from the browser request into Postman, and finally figured out that it is a combination of the cookie and User-Agent headers being set that allows the request to be served.
The User-Agent header is not allowed to have the specified Chrome version; I have it working with version 83 here.
The cookie header is something that the browser populates from my first visit to the site. That makes it a bit harder to handle in your code. I tried to fetch it in code with connection.getHeaderField("set-cookie"), but that cookie does not seem to cut it.
But! I was able to make the code work, when taking the cookie from my browser, and setting it manually in code, along with the User-Agent:
public HttpURLConnection pingHttpUrl(String url) throws IOException {
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // This one does not work for the reason of the chrome version apparently
        // conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
        conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36");
        conn.addRequestProperty("cookie", "<cookie value from the browser, from the header on a successful request>");
        conn.setConnectTimeout(2000);
        conn.setInstanceFollowRedirects(false);
        conn.setReadTimeout(10000);
        conn.connect();
        Thread.sleep(1000);
    } catch (Exception e) {
        System.out.println(String.format("Caught exception : %s", e.getMessage()));
        // Keep the original exception as the cause so the failure can be diagnosed
        throw new IOException(e);
    }
    return conn;
}
I later found out that it is the value of the cf_clearance key in the cookie that makes the difference.
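As a rough illustration of that last point, the helper below pulls only the cf_clearance pair out of a Cookie header string copied from the browser, so that it can be passed to addRequestProperty("cookie", ...). The class name ClearanceCookie, the method name extract and the sample cookie string are made up for this sketch. The real value still has to come from a browser session, and it presumably needs to be sent with the matching User-Agent, as discussed above.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper, not part of any library.
final class ClearanceCookie {
    // Name of the cookie the answer identifies as the one that matters.
    static final String NAME = "cf_clearance";

    /**
     * Picks the "cf_clearance=<value>" pair out of a Cookie header string
     * copied from the browser (pairs are separated by ';').
     */
    static Optional<String> extract(String browserCookieHeader) {
        return Arrays.stream(browserCookieHeader.split(";"))
                .map(String::trim)
                .filter(pair -> pair.startsWith(NAME + "="))
                .findFirst();
    }

    public static void main(String[] args) {
        // Placeholder values; a real header comes from the browser's developer tools.
        String copied = "first=aaa; cf_clearance=bbb; other=ccc";
        System.out.println(extract(copied).orElse("no cf_clearance cookie found"));
    }
}
```

The returned pair can then be used as the value of the cookie request property in pingHttpUrl, in place of the full cookie string.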

Java ImageIO.read() crashes if url contains special characters

In my Java app, I want to download an image using the following:
ImageIO.read(new URL("https://www.example.com/example.png"))
It works fine most of the time, except for this url: https://cdn-images-1.medium.com/max/1200/1*XSCC_nLOSp1VJ6wXeANgCQ.png
The problem in the url is that there is an * in it. So I try the following workarounds, without any success:
replacing the * by \*
replacing the * by %2A
I always have the following error:
javax.imageio.IIOException: Can't get input stream from URL!
at javax.imageio.ImageIO.read(ImageIO.java:1395)
How can I download the image, then?
Thanks for your help.
The problem seems to be related to Java 8 and is fixed in Java 11. With Java 8, an HTTP 403 code is returned:
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://cdn-images-1.medium.com/max/1200/1*XSCC_nLOSp1VJ6wXeANgCQ.png
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
at java.net.URL.openStream(URL.java:1045)
at javax.imageio.ImageIO.read(ImageIO.java:1393)
To fix this we need to set the user agent header.
URL url = new URL("https://cdn-images-1.medium.com/max/1200/1*XSCC_nLOSp1VJ6wXeANgCQ.png");
URLConnection connection = url.openConnection();
// Send a browser-like User-Agent instead of the JDK default.
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
BufferedImage bufferedImage;
// ImageIO.read(InputStream) does not close the stream, so close it explicitly.
try (InputStream in = connection.getInputStream()) {
    bufferedImage = ImageIO.read(in);
}

Why converting C# HTTP Request to Java did not work?

I am trying to use Microsoft Custom Vision. I need to make an HTTP request to send the image to be analyzed. I made the request successfully from C#, so I know the information is correct.
However, when I tried to make the same request in Java I received an HTTP 400 error.
I believe I did not handle the request correctly in Java. Is that true?
Following are the snippets.
C#:
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Prediction-Key", PredicitionKey);
using (var content = new ByteArrayContent(byteData))
{
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    response = await client.PostAsync(url, content);
    Console.WriteLine(await response.Content.ReadAsStringAsync());
}
Java:
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("Prediction-Key", predicitionKey);
connection.setRequestProperty("Content-Type", "application/octet-stream");
connection.setDoInput(true);
connection.setDoOutput(true);
connection.getOutputStream().write(data.getData());
connection.connect();
Reader in = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
First, replace
connection.connect();
with
connection.getResponseCode();
If it still doesn't work, then the headers are the problem.
Since the C# code runs successfully, compare the two requests: the only difference between your C# request and the Java one in Fiddler is that the Java request has two additional headers (Accept, User-Agent).
Try setting them explicitly:
connection.setRequestProperty("Accept", "*/*");
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36");
If it still doesn't work, try removing these two headers, or check the data you're sending in the request body.
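To make the suggested order of operations concrete, here is a minimal sketch, assuming the endpoint accepts a raw binary POST as in the C# snippet. It writes the body first, then calls getResponseCode(), and reads the error stream on a 4xx/5xx response so that the server's explanation for a 400 becomes visible. The class and method names (BinaryPost, prepare, send) are hypothetical; only the header names come from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class BinaryPost {
    /** Configures the request; nothing is sent yet. */
    static HttpURLConnection prepare(URL url, String predictionKey) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST"); // state the method explicitly
        connection.setDoOutput(true);        // a request body will be written
        connection.setRequestProperty("Prediction-Key", predictionKey);
        connection.setRequestProperty("Content-Type", "application/octet-stream");
        return connection;
    }

    /** Sends the body and returns "<status>: <response or error body>". */
    static String send(HttpURLConnection connection, byte[] body) throws IOException {
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        int status = connection.getResponseCode(); // completes the exchange
        // getInputStream() throws on 4xx/5xx, so use the error stream there.
        InputStream stream = status < 400
                ? connection.getInputStream()
                : connection.getErrorStream();
        if (stream == null) {
            return status + ": <no body>";
        }
        try (InputStream in = stream) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return status + ": " + new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With the question's variables, the call would look like BinaryPost.send(BinaryPost.prepare(url, predicitionKey), data.getData()). A 400 response body often names the field or header the service objected to, which helps decide whether the headers or the payload are at fault.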

Java HTTP GET request gives 403 Forbidden, but works in browser

I have some code, which is meant to send a GET request via HTTP to a server, and fetch the data there. I haven't yet coded the part that does stuff with the response, as I first wanted to test whether the GET request worked. And it didn't:
private static String fetch() throws UnsupportedEncodingException, MalformedURLException, IOException {
    // Set the parameters
    String url = "http://www.futhead.com";
    String charset = "UTF-8";
    //Fire the request
    try {
        URLConnection connection = new URL(url).openConnection();
        connection.setRequestProperty("Accept-Charset", charset);
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        // ^^^ I tried this, and it doesn't help!
        InputStream response = connection.getInputStream();
        HttpURLConnection httpConnection = (HttpURLConnection) new URL(url).openConnection();
        httpConnection.setRequestMethod("GET");
        System.out.println("Status: " + httpConnection.getResponseCode());
    } catch (UnknownHostException e) {
        // stuff
    }
    return null;
    // ^^^ I haven't coded the fetching itself yet
}
With that code in mind, fetch() prints Status: 403. Why is this happening? My guess is that this particular server doesn't let non-browser clients access it (because the code works with http://www.google.com), but is there a workaround?
There are some answers out there already, but some of them are either irrelevant to me (they talk about a problem with HTTPS) or incomprehensible. I've tried those that I can understand, to no avail.
You might have Browser Integrity Check enabled: https://support.cloudflare.com/hc/en-us/articles/200170086-What-does-the-Browser-Integrity-Check-do-
I disabled Browser Integrity Check and it works fine now. Another solution would be to set the User-Agent, if possible.
I experienced the problem from Scala, which ultimately uses java.net.URL.
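For the second suggestion, here is a minimal sketch of setting the User-Agent. One detail in the question's code is worth noting: the header is set on the first connection, but the status is printed from a second, freshly opened connection that never receives it. The sketch below uses a single connection for both steps. The class name SingleConnectionGet and its methods are illustrative, and whether a given User-Agent string passes a particular server's checks is not guaranteed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration only.
final class SingleConnectionGet {
    // Browser-like value copied from the question; other values may work better.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
            + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    /** Opens a connection and sets the headers on it; no request is sent yet. */
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    /** Sends the request on that same connection and returns the status code. */
    static int status(String url) throws IOException {
        HttpURLConnection connection = open(url);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

Calling SingleConnectionGet.status("http://www.futhead.com") then reports the status of the request that actually carried the header.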
