IOException - XML Parsing

I am trying to parse an XML response from the URL below:
http://imdbapi.org/?type=xml&q=argo
For this, I have written the following code:
try
{
XMLReader myReader = XMLReaderFactory.createXMLReader();
xmlHandler handlerobj = new xmlHandler();
myReader.setContentHandler(handlerobj);
myReader.parse(new InputSource(new URL("http://imdbapi.org/?type=xml&q=argo").openStream()));
}
catch(Exception e)
{
System.out.println("Error");
}
xmlHandler is a class that extends DefaultHandler.
The code above throws an IOException.
Stack trace:
java.io.IOException: Server returned HTTP response code: 403 for URL: http://imdbapi.org/?type=xml&q=argo
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at gui.getimdbdata(gui.java:73)
at gui.main(gui.java:64)
What is the problem with this code?

You must set the User-Agent, for example through the http.agent system property:
System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.29 Safari/537.36");
(If you open the URL in your browser, this is done automatically.)
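
For reference, here is a minimal sketch of the property-based approach. The class and method names are made up for illustration and are not part of the question. As far as I know, the built-in HTTP handler reads http.agent once, when it is first initialized, and appends its own Java/<version> token to the value, so the property should be set before the first HTTP connection is opened (or passed on the command line as -Dhttp.agent=...).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper class, not part of the original question.
class HttpAgentProperty {

    static final String BROWSER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/28.0.1500.29 Safari/537.36";

    // Sets the JVM-wide default User-Agent and returns the previous value (null if unset).
    static String install(String agent) {
        return System.setProperty("http.agent", agent);
    }

    public static void main(String[] args) throws IOException {
        install(BROWSER_AGENT); // should run before any HTTP connection is made
        if (args.length > 0) {
            // URL supplied by the caller; openStream() picks up the configured agent.
            try (InputStream in = new URL(args[0]).openStream()) {
                System.out.println("First byte: " + in.read());
            }
        }
    }
}
```

Note that this changes the default for every URLConnection in the JVM. If only one request needs a browser-like agent, setting the header on that single connection is less intrusive.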

Solved the issue, thanks to @dijkstra!
The web service only allows browser-like clients to fetch the XML data.
These are the modifications:
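URL url = new URL(urlString);
URLConnection uc = url.openConnection();
// Announce a browser-like User-Agent before the request is sent.
uc.addRequestProperty("User-Agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
uc.connect();
// Fetch the stream once and wrap it; a second getInputStream() call is not needed.
BufferedInputStream in = new BufferedInputStream(uc.getInputStream());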
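
Putting the pieces together, a self-contained sketch could look like the following. It is not the asker's exact code: it uses SAXParserFactory instead of XMLReaderFactory (which is deprecated since Java 9), ElementNameHandler is a hypothetical stand-in for the xmlHandler class, and the URL is taken from the command line because the original service may no longer be reachable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class UserAgentSaxFetch {

    // Hypothetical stand-in for the question's xmlHandler: records element names.
    static class ElementNameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            names.add(qName);
        }
    }

    // Prepares (but does not yet open) a connection that announces the given User-Agent.
    static URLConnection prepare(String urlString, String userAgent) throws IOException {
        URLConnection uc = new URL(urlString).openConnection();
        uc.setRequestProperty("User-Agent", userAgent);
        return uc;
    }

    // Parses XML from any stream with SAX and returns the element names in document order.
    static List<String> elementNames(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        ElementNameHandler handler = new ElementNameHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: parse a small inline document instead of going to the network.
            byte[] xml = "<movies><movie><title>Argo</title></movie></movies>"
                    .getBytes(StandardCharsets.UTF_8);
            System.out.println(elementNames(new ByteArrayInputStream(xml)));
            return;
        }
        URLConnection uc = prepare(args[0], "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
        try (InputStream in = uc.getInputStream()) { // getInputStream() connects implicitly
            System.out.println(elementNames(in));
        }
    }
}
```

Run without arguments it prints [movies, movie, title]. Keeping the parsing separate from the connection setup makes it easy to check the handler against a local document before blaming the server.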

Related

403 error with Flying Saucer when creating PDF from HTML with Spring Boot resources

I want to generate a PDF from HTML with Spring Boot, so I use Flying Saucer and generate the PDF with ITextRenderer.
But the resources referenced in my HTML cannot be loaded. When I call:
FileOutputStream os = new FileOutputStream(file);
iTextRenderer.createPDF(os);
I get this error:
java.io.IOException: Server returned HTTP response code: 403 for URL: https://my-custom-url.com
It works on localhost, but not when deployed to AWS Elastic Beanstalk.

My server rejected the requests because the Flying Saucer library did not send a User-Agent.
To fix it, I subclassed ITextUserAgent and overrode openStream to add a User-Agent manually:
public class ResourceLoaderUserAgent extends ITextUserAgent
{
    ResourceLoaderUserAgent(ITextOutputDevice outputDevice)
    {
        super(outputDevice);
    }

    // Open every resource through a connection that carries a browser-like User-Agent.
    @Override
    protected InputStream openStream(String uri) throws IOException
    {
        URLConnection connection = new URL(uri).openConnection();
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        connection.connect();
        return connection.getInputStream();
    }
}
And I used it like this:
ITextRenderer iTextRenderer = new ITextRenderer();
ResourceLoaderUserAgent callback = new ResourceLoaderUserAgent(iTextRenderer.getOutputDevice());
callback.setSharedContext(iTextRenderer.getSharedContext());
iTextRenderer.getSharedContext().setUserAgentCallback(callback);

Getting exception "Connection Reset" when trying to fetch response code using HttpURLConnection

I am trying to get the response code using HttpURLConnection, but I get "Connection reset" every time.
Here is the code I am using:
try {
    String url = "https://www.northerntrust.com/asia-pac/home";
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36");
    conn.setConnectTimeout(2000);
    conn.setInstanceFollowRedirects(false);
    conn.setReadTimeout(100000);
    conn.connect();
    int responseCode = conn.getResponseCode();
} catch(Exception e) {
    // getMessage is a method, so it needs parentheses
    logger.error("Caught exception : {}", e.getMessage());
}
The exception message is "Connection reset", and the stack trace points to java.net.SocketInputStream.read(SocketInputStream.java:210).
Why am I getting this error?
It also happens for these URLs:
https://www.comerica.com/business.html
https://www.pbcu.com/

Java HTTPS request gets 403 response code for curseforge site

I am trying to get the redirect from https://www.curseforge.com/projects/291874 to https://www.curseforge.com/minecraft/mc-mods/serene-seasons, but every attempt returns 403. I have tried HttpURLConnection, HttpsURLConnection and URLConnection, but nothing works.
How do I make a request to this site?
The browser does it in two steps:
CONNECT www.curseforge.com:443 HTTP/1.1
GET /projects/291874 HTTP/1.1
How do I do this in Java?
Please test your code against the first link in the question before answering.
String url = "https://www.curseforge.com/projects/291874";
try {
HttpURLConnection con1 = (HttpURLConnection)(new URL(url).openConnection());
con1.setRequestProperty("Host", "www.curseforge.com");
con1.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0");
con1.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
con1.setRequestProperty("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3");
con1.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
con1.setRequestProperty("DNT", "1");
con1.setRequestProperty("Connection", "keep-alive");
con1.setRequestProperty("Upgrade-Insecure-Requests", "1");
con1.connect();
//con1.getInputStream();
System.out.println(con1.getResponseCode());
System.out.println(con1.getHeaderField("Location"));
} catch (IOException e) {
e.printStackTrace();
}
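
A sketch of just the redirect-reading part, with some assumptions stated up front: it talks to a throwaway local server (the JDK's built-in com.sun.net.httpserver) instead of curseforge, the class and method names are hypothetical, and it does not claim to cure the 403, which may come from bot protection that headers alone cannot satisfy. Two things it does illustrate: HttpURLConnection follows redirects by default, so the Location header is only visible after setInstanceFollowRedirects(false), and, as far as I know, headers such as Host and Connection are on the JDK's restricted list and are silently ignored unless sun.net.http.allowRestrictedHeaders is enabled, so the sketch does not set them.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Hypothetical demo class; the local server only imitates a redirecting site.
class RedirectLocationDemo {

    // Returns the Location header of a redirect response, or null if the status is not 3xx.
    static String redirectTarget(String url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setInstanceFollowRedirects(false); // keep the 3xx response instead of following it
        con.setRequestProperty("User-Agent", userAgent);
        try {
            int status = con.getResponseCode();
            return (status >= 300 && status < 400) ? con.getHeaderField("Location") : null;
        } finally {
            con.disconnect();
        }
    }

    // Starts a local server that answers with a 302, queries it, and shuts it down again.
    static String runAgainstLocalServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0); // port 0: any free port
        server.createContext("/projects/291874", exchange -> {
            exchange.getResponseHeaders().add("Location", "/minecraft/mc-mods/serene-seasons");
            exchange.sendResponseHeaders(302, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            String url = "http://localhost:" + server.getAddress().getPort() + "/projects/291874";
            return redirectTarget(url, "Mozilla/5.0");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runAgainstLocalServer());
    }
}
```

Against the real site the same redirectTarget call would show whether the server answers with a redirect or with the 403, which at least separates a header problem from a blocking problem.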

JSOUP Blocked: Returning HTTPStatusException

I keep getting HttpStatusExceptions with status 500, 502 or 503, and even 522. I'm guessing my IP has been blacklisted? What options do I have to work around this?
I've noticed that when I actually browse the site, it takes forever to load, and trying to view the page source times out.
public Document getTPBDocument(String searchField) throws IOException {
Connection.Response response = Jsoup.connect("https://thepiratebay.org/search/" + searchField + "/0/99/0").userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21")
.referrer("http://www.google.com")
.timeout(30000)
.followRedirects(true)
.header("Content-Type", "application/json;charset=UTF-8")
.execute();
return response.parse();
}

java.io.IOException: 403 error loading URL jsoup login

I need help with a jsoup login: I have to log in, then go from the login page to another page while keeping the session.
public void parseXhtml() throws IOException{
String sessionID=null;
Map<String, String> cookies = new HashMap<String, String>();
cookies.put("login", "login");
cookies.put("password", "password");
Connection conn=Jsoup.connect("http://localhost:8080/dir/login.xhtml");
Connection.Response res = Jsoup
.connect("http://localhost:8080/dir/login.xhtml")
.data(cookies)
.execute();
res = Jsoup.connect("http://localhost:8080/dir/dir/index.xhtml")
.cookie("JSESSIONID", res.cookies().get("JSESSIONID"))
.method(Method.GET)
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
.execute();
Document doc = res.parse();
System.out.println(doc.html());
sessionID = res.cookie("JSESSIONID");
Document docu = Jsoup.connect("http://localhost:8080/dir/dir/index.xhtml")
.userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36")
.cookie("JSESSIONID", res.cookies().get("JSESSIONID"))
.method(Connection.Method.GET)
.get();
}
It throws the following exception:
java.io.IOException: 403 error loading URL http://localhost:8080/dir/dir/index.xhtml
If I follow the approach from "Sending POST request with username and password and save session cookie", it throws the same exception.

ExternalContext ec=FacesContext.getCurrentInstance().getExternalContext();
HttpServletRequest req=(HttpServletRequest) ec.getRequest();
HttpSession sess=(HttpSession) ec.getSession(true);
String url = req.getRequestURL().append(";jsessionid=").append(sess.getId()).toString();
ec.setRequest("http://localhost:8080/dir/dir/index.xhtml");
HttpServletRequest req2=(HttpServletRequest) ec.getRequest();
String url2 = req2.getRequestURL().append(";jsessionid=").append(sess.getId()).toString();
Document doc2=Jsoup.connect(url2).get();
System.out.println(doc2.html());
I finally got it working. I do not know if this is the right way, but it works.
