Download web page only if it has been modified - java

I'm building an app for an assessment, and I need to download a web page only if it has been modified since the last time I downloaded it. I need to store the date of the last change as a long, so the method getDate() returns a long.
I tried to use HttpURLConnection and URLConnection, but I couldn't get a working solution.
Among my attempts, I tried:
If-Modified-Since, but for some reason I never received the 304 response code, only 200. The code:
HttpURLConnection huc = null;
try {
    URL url = new URL(pages.get(0).getUrl());
    huc = (HttpURLConnection) url.openConnection();
    huc.setIfModifiedSince(pages.get(0).getDate());
    huc.connect();
    Log.d("App", "Since: " + huc.getIfModifiedSince());
    Log.d("App", "Response: " + huc.getResponseCode());
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Output:
Since: 1354320000000 (the value returned by getDate())
Response: 200
HTTP ETags, but I couldn't retrieve the information from the response, because the server doesn't send a Last-Modified header.
Thanks in advance
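A 200 in reply to If-Modified-Since usually means the server does not implement conditional requests for that resource: a server is allowed to ignore the header, and dynamically generated pages often do. The sketch below, assuming the server sends an ETag and/or Last-Modified header, keeps those validators from the previous 200 response and sends them back as If-None-Match and If-Modified-Since; a 304 then means the page is unchanged. ConditionalFetch, Snapshot and fetchIfModified are hypothetical names. The setUseCaches(false) call is there on the assumption that an installed response cache (for example on Android) might otherwise answer the revalidation itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: a conditional GET that reuses the validators from the last download.
final class ConditionalFetch {

    /** What is remembered about the last successful (200) download. */
    static final class Snapshot {
        final String etag;       // ETag response header, or null if the server sent none
        final long lastModified; // Last-Modified in ms since the epoch, or 0 if absent
        final byte[] body;       // page content

        Snapshot(String etag, long lastModified, byte[] body) {
            this.etag = etag;
            this.lastModified = lastModified;
            this.body = body;
        }
    }

    /** Returns a new snapshot on 200, or null when the server answers 304 Not Modified. */
    static Snapshot fetchIfModified(URL url, Snapshot previous) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            // Ask not to be served from a local response cache, so a 304 reaches this code.
            conn.setUseCaches(false);
            if (previous != null) {
                if (previous.etag != null) {
                    conn.setRequestProperty("If-None-Match", previous.etag);
                }
                if (previous.lastModified > 0) {
                    conn.setIfModifiedSince(previous.lastModified); // sent as an HTTP date
                }
            }
            int code = conn.getResponseCode();
            if (code == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // unchanged since the previous download
            }
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + code + " for " + url);
            }
            byte[] body = readAll(conn.getInputStream());
            // getLastModified() returns 0 when the Last-Modified header is missing.
            return new Snapshot(conn.getHeaderField("ETag"), conn.getLastModified(), body);
        } finally {
            conn.disconnect();
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        try (InputStream is = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = is.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Storing the server's own Last-Modified value (conn.getLastModified()) rather than a locally computed date avoids clock differences between client and server. If the server sends neither header, the remaining option is to download the page and compare it (or a hash of it) with the stored copy.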

Related

How to check if the URL is there or not in Java?

I've run into an issue and need help checking a URL: when I search for a particular code, it won't show the URL I have listed. Is there a way to make this work with my code? I tried creating the method below, generateLink. Thanks for the help.
First, since generateLink will either return a valid URL or null, you need to change this:
jgen.writeStringField("pay_grade_description_link", generateLink(XXX_URL + value.jobClassCd) + ".pdf");
to this:
jgen.writeStringField("pay_grade_description_link", generateLink(value.jobClassCd));
If you concatenate ".pdf" to it, null return values will be meaningless, since null + ".pdf" results in the eight-character string "null.pdf".
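A small demonstration of that string-conversion rule, using a hypothetical generateLinkStub in place of the real generateLink:

```java
final class NullConcatDemo {
    // Hypothetical stand-in for generateLink(): returns null when the PDF is not reachable.
    static String generateLinkStub(boolean reachable) {
        return reachable ? "https://example.com/docs/ABC123.pdf" : null;
    }

    public static void main(String[] args) {
        // String concatenation converts a null reference to the text "null".
        String concatenated = generateLinkStub(false) + ".pdf";
        System.out.println(concatenated);          // null.pdf
        System.out.println(concatenated.length()); // 8

        // Without the concatenation, the caller can still detect the missing link.
        String link = generateLinkStub(false);
        System.out.println(link == null);          // true
    }
}
```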
Second, you can check the response code of an HttpURLConnection to test a URL’s validity. (In theory, you should be able to use the "OPTIONS" HTTP method to test the URL, but not all servers support it.)
private String generateLink(String jobClassCd) {
    String url = XXX_URL + jobClassCd + ".pdf";
    try {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(url).openConnection();
        if (connection.getResponseCode() < 400) {
            return url;
        }
    } catch (IOException e) {
        Logger.getLogger(JobSerializer.class.getName()).log(Level.FINE,
                "URL \"" + url + "\" is not reachable.", e);
    }
    return null;
}
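The generateLink method above sends a GET with no timeouts. A variation, sketched here on the assumption that the target servers accept HEAD, asks only for the headers and falls back to GET when the server answers 405 Method Not Allowed. UrlCheck and isReachable are hypothetical names; the "below 400" rule is the one the answer uses.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: checks a URL with HEAD first, then GET if HEAD is rejected.
final class UrlCheck {

    /** True if the server answers with a status below 400. */
    static boolean isReachable(String url, int timeoutMillis) {
        try {
            int code = status(url, "HEAD", timeoutMillis);
            if (code == HttpURLConnection.HTTP_BAD_METHOD) { // 405: HEAD not supported
                code = status(url, "GET", timeoutMillis);
            }
            return code < 400;
        } catch (IOException e) {
            // Malformed URL, unknown host, refused connection, timeout, ...
            return false;
        }
    }

    private static int status(String url, String method, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode(); // does not throw for 4xx/5xx
        } finally {
            conn.disconnect();
        }
    }
}
```

With this helper, generateLink could reduce to `return UrlCheck.isReachable(url, 5000) ? url : null;` (the 5-second timeout is an arbitrary choice).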

Why does the server return response code 403 for a valid file in Java?

I want to get the content length of this file in Java:
https://www.subf2m.co/subtitles/farsi_persian-text/SImp4fRrRnBK6j-u2RiPdXSsHSuGVCDLz4XZQLh05FnYmw92n7DZP6KqbHhwp6gfvrxazMManmskHql6va6XEfasUDxGevFRmkWJLjCzsCK50w1lwNajPoMGPTy9ebCC0&name=Q2FwdGFpbiBNYXJ2ZWwgRmFyc2lQZXJzaWFuIGhlYXJpbmcgaW1wYWlyZWQgc3VidGl0bGUgLSBTdWJmMm0gW3N1YmYybS5jb10uemlw
When I open this URL in Firefox or Google Chrome, it downloads a file. But when I try to get the file's size with Java's HttpsURLConnection, the server returns response code 403 and content length -1. Why does this happen? Thanks
try {
    System.out.println("program started -----------------------------------------");
    String str_url = "https://www.subf2m.co/subtitles/farsi_persian-text/SImp4fRrRnBK6j-u2RiPdXSsHSuGVCDLz4XZQLh05FnYmw92n7DZP6KqbHhwp6gfvrxazMManmskHql6va6XEfasUDxGevFRmkWJLjCzsCK50w1lwNajPoMGPTy9ebCC0&name=Q2FwdGFpbiBNYXJ2ZWwgRmFyc2lQZXJzaWFuIGhlYXJpbmcgaW1wYWlyZWQgc3VidGl0bGUgLSBTdWJmMm0gW3N1YmYybS5jb10uemlw";
    URL url = new URL(str_url);
    HttpsURLConnection con = (HttpsURLConnection) url.openConnection();
    con.setConnectTimeout(150000);
    con.setReadTimeout(150000);
    con.setRequestMethod("HEAD");
    con.setInstanceFollowRedirects(false);
    con.setRequestProperty("Accept-Encoding", "identity");
    con.setRequestProperty("connection", "close");
    con.connect();
    System.out.println("responseCode: " + con.getResponseCode());
    System.out.println("contentLength: " + con.getContentLength());
} catch (IOException e) {
    System.out.println("error | " + e.toString());
    e.printStackTrace();
}
output:
program started -----------------------------------------
responseCode: 403
contentLength: -1
The default Java user-agent is blocked by some online services (most notably, Cloudflare). You need to set the User-Agent header to something else.
con.setRequestProperty("User-Agent", "My-User-Agent");
In my experience, it doesn't matter what you set it to, as long as it's not the default one:
con.setRequestProperty("User-Agent", "aaa"); // works perfectly fine
EDIT: It looks like this site uses Cloudflare with DDoS protection enabled, so your code won't run the JavaScript challenge needed to actually get the content of the file.
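A minimal sketch of the HEAD request from the question with the User-Agent override applied. It uses HttpURLConnection (the superclass of HttpsURLConnection) so the same code handles http and https URLs; HeadWithUserAgent and its Result type are hypothetical names, and the 15-second timeouts are arbitrary. As the EDIT notes, a site behind a Cloudflare JavaScript challenge may still refuse the request whatever User-Agent is sent.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: HEAD request with an optional custom User-Agent.
final class HeadWithUserAgent {

    static final class Result {
        final int status;
        final long contentLength; // -1 if the server sent no Content-Length

        Result(int status, long contentLength) {
            this.status = status;
            this.contentLength = contentLength;
        }
    }

    /** Sends HEAD; if userAgent is null, the default "Java/<version>" agent is used. */
    static Result head(String url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        try {
            con.setConnectTimeout(15_000);
            con.setReadTimeout(15_000);
            con.setRequestMethod("HEAD");
            con.setInstanceFollowRedirects(false);
            con.setRequestProperty("Accept-Encoding", "identity");
            if (userAgent != null) {
                // Replaces the default Java User-Agent that some services block.
                con.setRequestProperty("User-Agent", userAgent);
            }
            return new Result(con.getResponseCode(), con.getContentLengthLong());
        } finally {
            con.disconnect();
        }
    }
}
```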

GraphQL API requests stopped working suddenly

So I have some code that had been working for months, but it recently stopped working. It makes a request to the Anilist API to get information about some anime.
try {
    String query = getQuery();
    query = query.replace("\n", " ").replace(" ", " ");
    URL url = new URL("https://graphql.anilist.co");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(5000);
    conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
    conn.addRequestProperty("Accept", "application/json");
    conn.setDoOutput(true);
    conn.setDoInput(true);
    conn.setRequestMethod("POST");
    OutputStream os = conn.getOutputStream();
    os.write(query.getBytes("UTF-8"));
    os.close();
    // read the response
    InputStream in = new BufferedInputStream(conn.getInputStream());
    String result = org.apache.commons.io.IOUtils.toString(in, "UTF-8");
    in.close();
    conn.disconnect();
    //return result
} catch (Exception exp) {
    System.out.println("exception TRIGGERED");
    System.out.println(exp.getLocalizedMessage());
}
So I put in breakpoints, and it seems to break at the line after the comment "// read the response". I've tried the same request using the Altair GraphQL Client extension for Google Chrome and the suggested tool at https://anilist.co/graphiql, and both work, so the problem seems to be with my code somehow, but I haven't updated my environment. I also have an instance of the code running on AWS (it's part of a Discord bot), and I know I haven't touched it since I deployed the bot there.
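One way to find out why that line fails: HttpURLConnection.getInputStream() throws an IOException for 4xx and 5xx status codes, which hides the body the server sent back. GraphQL servers often explain the problem in that body, and getErrorStream() still returns it. A sketch, with ResponseReader and readBody as hypothetical names and the body assumed to be UTF-8:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: read the response body whatever the status code is.
final class ResponseReader {

    static String readBody(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        // For 4xx/5xx, getInputStream() would throw; the error body is here instead.
        InputStream in = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) {
            return ""; // the server sent no body
        }
        try (InputStream is = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = is.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```

Printing conn.getResponseCode() together with readBody(conn) in place of the getInputStream() line would show the status and the server's own error message.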
If you're interested, one of the queries I know used to work is:
{"query":"query {
  Media (type: ANIME, search: "naruto") {
    id
    idMal
    type
    description(asHtml:false)
    chapters
    format
    episodes
    source
    genres
    averageScore
    popularity
    isAdult
    status
    nextAiringEpisode {
      episode
      airingAt
      timeUntilAiring
    }
    startDate {
      year
      month
      day
    }
    season
    title {
      english
      native
      romaji
    }
    coverImage{
      large
      medium
    }
    studios (isMain: true) {
      nodes {
        id
        name
      }
    }
  }
}"}
Any help is appreciated. Thank you very much.
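If getQuery() returns the text exactly as shown, the double quotes around naruto end the JSON string early, so the POST body is not valid JSON (getQuery() itself is not shown, so this is an assumption). A sketch that escapes the raw GraphQL query into a JSON string instead of only stripping newlines; GraphQLBody, jsonString and requestBody are hypothetical names, and a JSON library would do the same job.

```java
// Hypothetical helper: builds {"query": "..."} with the query escaped per the JSON spec.
final class GraphQLBody {

    /** Returns s as a quoted JSON string literal. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** POST body for a GraphQL endpoint; newlines in the query can stay as they are. */
    static String requestBody(String query) {
        return "{\"query\":" + jsonString(query) + "}";
    }
}
```

The result of requestBody(getQuery()) would then be written with os.write(...getBytes("UTF-8")) as in the original code.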

Getting new Url if Moved Permanently

I am developing code for a project in which one part checks whether each URL (website) in a list is live and confirms it.
So far everything is working as planned, except for some pages in the list that are Moved Permanently (status 301). In the case of a 301, I need to get the new URL and pass it to a method before returning true.
The following example just moves to HTTPS, but other pages could move to a completely different URL. If you request this site:
http://en.wikipedia.org/wiki/HTTP_301
it moves to
https://en.wikipedia.org/wiki/HTTP_301
That's fine; I just need to get the new URL.
Is this possible, and if so, how?
This is my working code so far:
boolean isUrlOk(String urlInput) {
    HttpURLConnection connection = null;
    int urlStatusCode = -1; // local, so a failed request can't reuse the previous URL's status
    try {
        URL url = new URL(urlInput);
        connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.connect();
        urlStatusCode = connection.getResponseCode();
    } catch (IOException e) {
        // other error types to be reported
        e.printStackTrace();
    }
    if (urlStatusCode == 200) {
        return true;
    } else if (urlStatusCode == 301) {
        // call a method with the correct url name
        // before returning true
        return true;
    }
    return false;
}
You can get the new URL with
String newUrl = connection.getHeaderField("Location");
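Some background on why the 301 shows up at all: HttpURLConnection follows redirects by default but will not follow one that switches protocol (http to https), which is why the Wikipedia example reports 301 instead of 200. Disabling automatic redirects makes every 3xx response visible, and because Location may be relative it is resolved against the request URL. A minimal sketch; RedirectCheck and redirectTarget are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: reports where a URL redirects to, without following the redirect.
final class RedirectCheck {

    /** Returns the absolute redirect target, or null if the response is not a redirect. */
    static String redirectTarget(String urlInput) throws IOException {
        URL url = new URL(urlInput);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            // Otherwise same-protocol redirects are followed silently and only
            // the final status (e.g. 200) is visible.
            conn.setInstanceFollowRedirects(false);
            int code = conn.getResponseCode();
            boolean redirect = code == HttpURLConnection.HTTP_MOVED_PERM   // 301
                    || code == HttpURLConnection.HTTP_MOVED_TEMP           // 302
                    || code == HttpURLConnection.HTTP_SEE_OTHER            // 303
                    || code == 307 || code == 308;
            String location = conn.getHeaderField("Location");
            if (!redirect || location == null) {
                return null;
            }
            // Location may be relative, e.g. "/wiki/HTTP_301": resolve it against the request URL.
            return new URL(url, location).toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

In isUrlOk, the 301 branch could pass redirectTarget(urlInput) (or the raw Location header, as the answer shows) to the follow-up method.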

Problem with POSTing XML data to an API using Java

I'm having a problem sending XML data to an API using HTTP POST.
If I send well-formed XML, I get an error message:
Server Exception: Cannot access a closed Stream
If the XML isn't well-formed, I get HTTP 500. And if I just send an empty string instead of a string with XML, I get back an error message: EMPTY REQUEST.
I don't have many ideas about what the error could be, but the connection works, because the error message is returned in XML format. I'm just sending the XML data as a string. Is it possible that I'm required to send an EOF or something at the end? If so, how do I do that in my Java code? Any other ideas about what the problem could be?
The API is made in .NET.
Here is the Java code I'm using to POST the XML data:
Authenticator.setDefault(new MyAuthenticator());
String xmlRequestStatus =
"<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><test><data>32</data></test>";
System.out.println(xmlRequestStatus);
String contentType = "text/xml";
String charset = "ISO-8859-1";
String request = null;
URL url = null;
HttpURLConnection connection = null;
OutputStream output = null;
InputStream response = null;
try {
    url = new URL("http://127.0.0.1/test");
} catch (MalformedURLException e) {
    e.printStackTrace();
}
try {
    connection = (HttpURLConnection)url.openConnection();
    connection.setDoOutput(true);
    connection.setRequestMethod("POST");
    connection.setRequestProperty("Accept-Charset", charset);
    connection.setRequestProperty("Content-Type", contentType);
    output = connection.getOutputStream();
    output.write(request.getBytes("ISO-8859-1"));
    if(output != null) try { output.close(); } catch (IOException e) {}
    response = connection.getInputStream();
    ....
It looks fine and should work. However, connection.setRequestMethod("POST"); is entirely superfluous when you have already called connection.setDoOutput(true);.
Since this error is coming straight from the .NET web service hosted at localhost, are you sure it is free of bugs? I don't do .NET, but Google tells me it's related to MemoryStream. I'd concentrate on the .NET code and retest/debug it. Maybe those related SO questions will help.
You need to specify the POST method by doing something like this:
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Length", "" + length);
Otherwise, it's treated as a GET, and some servers don't expect a body with GET, so the stream is closed.
Maybe close the OutputStream later in the control flow. So instead of this:
output.write(request.getBytes("ISO-8859-1"));
if(output != null) try { output.close(); } catch (IOException e) {}
response = connection.getInputStream();
Try this (and maybe add the flush)?
output.write(request.getBytes("ISO-8859-1"));
output.flush();
response = connection.getInputStream();
if(output != null) try { output.close(); } catch (IOException e) {}
Shouldn't it be <32 instead of <32?
It looks like request is initialized to null but never assigned afterwards. Shouldn't it be
output.write(xmlRequestStatus.getBytes("ISO-8859-1"));
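Putting the answers together, a minimal sketch of a POST that sends the actual XML string. It assumes the endpoint accepts text/xml in ISO-8859-1; XmlPost and postXml are hypothetical names. Instead of setting Content-Length with setRequestProperty (the JDK's built-in HttpURLConnection treats it as a restricted header and ignores it by default), setFixedLengthStreamingMode declares the body length up front. For 4xx/5xx the body is read from getErrorStream(), so the server's error message is not lost.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: POST an XML document and return the response body.
final class XmlPost {

    static String postXml(String endpoint, String xml) throws IOException {
        // Encode with the same charset the XML prolog declares.
        byte[] payload = xml.getBytes(StandardCharsets.ISO_8859_1);
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setDoOutput(true);        // a request body turns GET into POST
            conn.setRequestMethod("POST"); // explicit, for readability
            conn.setRequestProperty("Content-Type", "text/xml; charset=ISO-8859-1");
            conn.setFixedLengthStreamingMode(payload.length); // sends Content-Length
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload); // the XML itself, not the null 'request' variable
            }
            int code = conn.getResponseCode();
            InputStream in = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                return "";
            }
            try (InputStream is = in) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                for (int n; (n = is.read(chunk)) != -1; ) {
                    buf.write(chunk, 0, n);
                }
                return buf.toString("ISO-8859-1");
            }
        } finally {
            conn.disconnect();
        }
    }
}
```";
    String reply = XmlPost.postXml("http://127.0.0.1:" + server.getAddress().getPort() + "/test", xml);
    if (!(xml.length() + "|" + xml).equals(reply)) throw new AssertionError(reply);
    server.stop(0);
} catch (java.io.IOException e) {
    throw new AssertionError(e);
}
</test>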
