I am trying to parse an XML response from the URL below:
http://imdbapi.org/?type=xml&q=argo
For this, I have written the following code:
try
{
XMLReader myReader = XMLReaderFactory.createXMLReader();
xmlHandler handlerobj = new xmlHandler();
myReader.setContentHandler(handlerobj);
myReader.parse(new InputSource(new URL("http://imdbapi.org/?type=xml&q=argo").openStream()));
}
catch(Exception e)
{
System.out.println("Error");
}
xmlHandler is a class that extends DefaultHandler.
I am getting an IOException in the above code.
Stack trace -
java.io.IOException: Server returned HTTP response code: 403 for URL: http://imdbapi.org/?type=xml&q=argo
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at gui.getimdbdata(gui.java:73)
at gui.main(gui.java:64)
What is the problem with this code?
You must set the User-Agent, for example through the http.agent system property:
System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.29 Safari/537.36");
(If you connect to the URL with your browser, this is done automatically.)
Solved the issue, thanks to @dijkstra!
The web service only allows browsers to fetch the XML data.
The modifications are as follows:
url = new URL(urlString);
uc = url.openConnection();
// Send a browser-like User-Agent so the service does not answer with 403.
uc.addRequestProperty("User-Agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
uc.connect();
// Open the response stream once and wrap it for buffered reading.
BufferedInputStream in = new BufferedInputStream(uc.getInputStream());
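To make the effect of the header visible without depending on the real service, here is a minimal sketch that runs against a throwaway local server. The class name UserAgentDemo, the helper names, and the "starts with Mozilla/" rule are assumptions made for illustration only; the actual filtering rule used by imdbapi.org is not known. The server is built with the JDK's com.sun.net.httpserver.HttpServer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical demo class: the local server stands in for a service
// that rejects clients that do not look like a browser.
class UserAgentDemo {

    // Starts a server on a free local port that answers 403 unless the
    // User-Agent header starts with "Mozilla/" (an assumed rule).
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            int status = (ua != null && ua.startsWith("Mozilla/")) ? 200 : 403;
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends a GET and returns the HTTP status. When userAgent is null,
    // the JVM's default User-Agent is sent instead.
    static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Returns { status with default agent, status with browser-like agent }.
    static int[] demoStatuses() {
        try {
            HttpServer server = startPickyServer();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
                return new int[] {
                    fetchStatus(url, null),
                    fetchStatus(url, "Mozilla/5.0 (compatible; demo)")
                };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] statuses = demoStatuses();
        System.out.println("default agent: " + statuses[0]);
        System.out.println("browser-like agent: " + statuses[1]);
    }
}
```

Note that getResponseCode() returns the status even for a 403, whereas getInputStream() and URL.openStream() throw the IOException shown in the question. Checking the status first can give a clearer error message.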
Related
I want to generate a PDF from HTML with Spring Boot, so I use Flying Saucer and generate the PDF with ITextRenderer.
But the resources referenced in my HTML cannot be loaded. When I call:
FileOutputStream os = new FileOutputStream(file);
iTextRenderer.createPDF(os);
I get this error:
java.io.IOException: Server returned HTTP response code: 403 for URL: https://my-custom-url.com
It works on localhost, but not when deployed to AWS Elastic Beanstalk.
My server rejected the requests because of a missing User-Agent in the Flying Saucer library.
To fix it, I subclassed ITextUserAgent and overrode openStream to add a User-Agent manually:
public class ResourceLoaderUserAgent extends ITextUserAgent
{
ResourceLoaderUserAgent(ITextOutputDevice outputDevice)
{
super(outputDevice);
}
@Override
protected InputStream openStream(String uri) throws IOException
{
URLConnection connection = new URL(uri).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
return connection.getInputStream();
}
}
And I used it like this:
ITextRenderer iTextRenderer = new ITextRenderer();
ResourceLoaderUserAgent callback = new ResourceLoaderUserAgent(iTextRenderer.getOutputDevice());
callback.setSharedContext(iTextRenderer.getSharedContext());
iTextRenderer.getSharedContext().setUserAgentCallback(callback);
I am trying to get the response code using HttpURLConnection, but I get "Connection reset" every time.
Here is the code I am using:
try {
String url = "https://www.northerntrust.com/asia-pac/home";
HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
conn.setRequestMethod("GET");
conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36");
conn.setConnectTimeout(2000);
conn.setInstanceFollowRedirects(false);
conn.setReadTimeout(100000);
conn.connect();
int responseCode = conn.getResponseCode();
} catch(Exception e) {
logger.error("Caught exception : {}", e.getMessage());
}
The exception message is "Connection reset", and the stack trace points to java.net.SocketInputStream.read(SocketInputStream.java:210).
Why am I getting this error?
It also happens for these URLs:
https://www.comerica.com/business.html
https://www.pbcu.com/
I am trying to get the redirect from https://www.curseforge.com/projects/291874 to https://www.curseforge.com/minecraft/mc-mods/serene-seasons, but every attempt returns 403. I have tried HttpURLConnection, HttpsURLConnection and URLConnection, but none of them works.
How do I make a request to this site?
The browser does it in two steps:
CONNECT www.curseforge.com:443 HTTP/1.1
GET /projects/291874 HTTP/1.1
How do I do this in Java?
Please check your code against the first link in the question before answering.
String url = "https://www.curseforge.com/projects/291874";
try {
HttpURLConnection con1 = (HttpURLConnection)(new URL(url).openConnection());
con1.setRequestProperty("Host", "www.curseforge.com");
con1.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0");
con1.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
con1.setRequestProperty("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
con1.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
con1.setRequestProperty("DNT", "1");
con1.setRequestProperty("Connection", "keep-alive");
con1.setRequestProperty("Upgrade-Insecure-Requests", "1");
con1.connect();
//con1.getInputStream();
System.out.println(con1.getResponseCode());
System.out.println(con1.getHeaderField("Location"));
} catch (IOException e) {
e.printStackTrace();
}
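One detail in the code above is independent of the 403: HttpURLConnection follows same-protocol redirects by default, so the Location header is normally only visible if following is switched off. A minimal sketch of that technique, run against a local stand-in server, is shown here. The class RedirectProbe and its helpers are hypothetical names, the 302 status is an assumption (the status the real site uses is not known), and this does not address whatever causes the site to answer 403.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical helper: reads a redirect target without following it.
class RedirectProbe {

    // Returns the Location header of the first response,
    // or null when the response is not a 3xx status.
    static String redirectTarget(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // keep the 3xx response visible
        try {
            int status = conn.getResponseCode();
            boolean isRedirect = status >= 300 && status < 400;
            return isRedirect ? conn.getHeaderField("Location") : null;
        } finally {
            conn.disconnect();
        }
    }

    // Runs the probe against a local server that mimics the redirect
    // described in the question (paths taken from the question's URLs).
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/projects/291874", exchange -> {
                exchange.getResponseHeaders().set("Location", "/minecraft/mc-mods/serene-seasons");
                exchange.sendResponseHeaders(302, -1); // assumed status, no body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return redirectTarget("http://127.0.0.1:" + port + "/projects/291874");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```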
I keep getting HttpStatusException errors with status 500, 502, 503, and even 522. I'm guessing my IP has been blacklisted? What options do I have to work around this?
I've noticed that while actually browsing the site, it takes forever to load, and that trying to view the source code actually times out.
public Document getTPBDocument(String searchField) throws IOException {
Connection.Response response = Jsoup.connect("https://thepiratebay.org/search/" + searchField + "/0/99/0").userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
.referrer("http://www.google.com")
.timeout(30000)
.followRedirects(true)
.header("Content-Type", "application/json;charset=UTF-8")
.execute();
return response.parse();
}
Please help me with a jsoup login. I have to log in, and then go from the login page to another page while keeping the session.
public void parseXhtml() throws IOException{
String sessionID=null;
Map<String, String> cookies = new HashMap<String, String>();
cookies.put("login", "login");
cookies.put("password", "password");
Connection conn=Jsoup.connect("http://localhost:8080/dir/login.xhtml");
Connection.Response res = Jsoup
.connect("http://localhost:8080/dir/login.xhtml")
.data(cookies)
.execute();
res = Jsoup.connect("http://localhost:8080/dir/dir/index.xhtml")
.cookie("JSESSIONID", res.cookies().get("JSESSIONID"))
.method(Method.GET)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
.execute();
Document doc = res.parse();
System.out.println(doc.html());
sessionID = res.cookie("JSESSIONID");
Document docu = Jsoup.connect("http://localhost:8080/dir/dir/index.xhtml")
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
.cookie("JSESSIONID", res.cookies().get("JSESSIONID"))
.method(Connection.Method.GET)
.get();
}
It throws the following exception:
java.io.IOException: 403 error loading URL http://localhost:8080/dir/dir/index.xhtml
If I follow the approach from "Sending POST request with username and password and save session cookie", it throws the same exception.
ExternalContext ec=FacesContext.getCurrentInstance().getExternalContext();
HttpServletRequest req=(HttpServletRequest) ec.getRequest();
HttpSession sess=(HttpSession) ec.getSession(true);
String url = req.getRequestURL().append(";jsessionid=").append(sess.getId()).toString();
ec.setRequest("http://localhost:8080/dir/dir/index.xhtml");
HttpServletRequest req2=(HttpServletRequest) ec.getRequest();
String url2 = req2.getRequestURL().append(";jsessionid=").append(sess.getId()).toString();
Document doc2=Jsoup.connect(url2).get();
System.out.println(doc2.html());
Finally I got it. I do not know if it is right, but it works.