Hi, I am using the following code to read from a URL:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class JavaHttpUrlConnectionReader
{
    public static void main(String[] args) throws Exception
    {
        new JavaHttpUrlConnectionReader();
    }

    public JavaHttpUrlConnectionReader()
    {
        try
        {
            String myUrl = "http://epaperbeta.timesofindia.com/NasData/PUBLICATIONS/THETIMESOFINDIA/Delhi/2015/06/09/PageIndex/09_06_2015.xml";
            // if your url can contain weird characters you will want to
            // encode it here, something like this:
            // myUrl = URLEncoder.encode(myUrl, "UTF-8");
            String results = doHttpUrlConnectionAction(myUrl);
            System.out.println(results);
        }
        catch (Exception e)
        {
            // deal with the exception in your "controller"
        }
    }

    /**
     * Returns the output from the given URL.
     */
    private String doHttpUrlConnectionAction(String desiredUrl) throws Exception
    {
        URL url = null;
        BufferedReader reader = null;
        StringBuilder stringBuilder;
        try
        {
            // create the HttpURLConnection
            url = new URL(desiredUrl);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            // just want to do an HTTP GET here
            connection.setRequestMethod("GET");
            // uncomment this if you want to write output to this url
            //connection.setDoOutput(true);
            // give it 35 seconds to respond
            connection.setReadTimeout(35 * 1000);
            connection.connect();
            // read the output from the server
            reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            stringBuilder = new StringBuilder();
            String line = null;
            while ((line = reader.readLine()) != null)
            {
                stringBuilder.append(line + "\n");
            }
            return stringBuilder.toString();
        }
        catch (Exception e)
        {
            e.printStackTrace();
            throw e;
        }
        finally
        {
            // close the reader; this can throw an exception too, so
            // wrap it in another try/catch block.
            if (reader != null)
            {
                try
                {
                    reader.close();
                }
                catch (IOException ioe)
                {
                    ioe.printStackTrace();
                }
            }
        }
    }
}
It gives me the following error:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
at JavaHttpUrlConnectionReader.doHttpUrlConnectionAction(JavaHttpUrlConnectionReader.java:77)
at JavaHttpUrlConnectionReader.<init>(JavaHttpUrlConnectionReader.java:33)
at JavaHttpUrlConnectionReader.main(JavaHttpUrlConnectionReader.java:21)
Kindly tell me the reason why this occurs, and a solution for it.
When I run this code outside of my office LAN it works fine, but not inside the office LAN.
Thanks & Regards
Abhishek
Your URL:
http://epaperbeta.timesofindia.com/NasData/PUBLICATIONS/THETIMESOFINDIA/Delhi/2015/06/09/PageIndex/09_06_2015.xml
is not accessible without a proxy (for example, I can't access it from here), so it is no wonder it cannot be read from the stream.
Check your proxy settings. You could try the URL in the browser with/without the proxy and see the difference.
As @Jens commented, look at this.
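If the office LAN requires an HTTP proxy, a minimal sketch of pointing HttpURLConnection at it looks like this; proxy.mycompany.com and port 8080 are placeholders for your office's actual proxy settings:

import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class ProxyExample
{
    public static void main(String[] args) throws Exception
    {
        // hypothetical proxy host/port; substitute your office LAN's actual values
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("proxy.mycompany.com", 8080));
        URL url = new URL("http://epaperbeta.timesofindia.com/NasData/PUBLICATIONS/THETIMESOFINDIA/Delhi/2015/06/09/PageIndex/09_06_2015.xml");
        // route the connection through the proxy instead of connecting directly
        HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
        connection.setRequestMethod("GET");
        System.out.println("Response code: " + connection.getResponseCode());
    }
}

Alternatively, the JVM-wide system properties http.proxyHost and http.proxyPort achieve the same thing without code changes.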
I am trying to use an API from https://us.mc-api.net/ for a project and I have made this as a test.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://us.mc-api.net/v3/uuid/193nonaxishsl/csv/");
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
        catch (MalformedURLException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
            System.out.println("I/O Error");
        }
    }
}
And this is giving me an IOException, but whenever I open the same page in my web browser I get
false,Unknown-Username
which is what I want to get from the code. I am new and don't really know why this is happening.
EDIT: StackTrace
java.io.FileNotFoundException: http://us.mc-api.net/v3/uuid/193nonaxishsl/csv/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at com.theman1928.Test.Main.main(Main.java:13)
The URL is returning status code 404, and therefore (mild guess here) the input stream is not being created. Sort out the status code and you should be OK.
Ran it with this CSV and it is fine: other csv
If the error code is important to you then you can use HttpURLConnection:
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
System.out.println("code:"+conn.getResponseCode());
In that way you can process the response code before proceeding with a quick if-then-else check.
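For instance, a minimal sketch of that check, using the same endpoint from the question:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://us.mc-api.net/v3/uuid/193nonaxishsl/csv/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int code = conn.getResponseCode();
        if (code == HttpURLConnection.HTTP_OK) {
            // 200 OK: safe to read the normal response body
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        } else {
            // non-200 (here, 404): report the code instead of letting openStream() throw
            System.out.println("Request failed with status " + code);
        }
    }
}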
I tried it with the Apache HTTP libraries. The API endpoint seems to return a status code of 404, hence your error. Code I used is below.
public static void main(String[] args) throws URISyntaxException, ClientProtocolException, IOException {
    HttpClient httpclient = HttpClients.createDefault();
    URIBuilder builder = new URIBuilder("http://us.mc-api.net/v3/uuid/193nonaxishsl/csv/");
    URI uri = builder.build();
    HttpGet request = new HttpGet(uri);
    HttpResponse response = httpclient.execute(request);
    System.out.println(response.getStatusLine().getStatusCode()); // 404
}
Switching out http://us.mc-api.net/v3/uuid/193nonaxishsl/csv/ for www.example.com or whatever returns a status code of 200, which further points to a problem with the API endpoint. You can take a look at the Apache HttpComponents library here.
This has to do with how the wire protocol is handled by the java.net classes compared with an actual browser; a browser is going to be much more sophisticated than the simple java.net API you are using.
If you want to get the equivalent response value in Java, then you need to use a richer HTTP API.
This code will give you the same response as the browser; however, you need to download the Apache HttpComponents jars.
The code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.HttpClients;

public class TestDriver
{
    public static void main(String[] args)
    {
        try
        {
            String url = "http://us.mc-api.net/v3/uuid/193nonaxishsl/csv";
            HttpGet httpGet = new HttpGet(url);
            getResponseFromHTTPReq(httpGet, url);
        }
        catch (Throwable e)
        {
            e.printStackTrace();
        }
    }

    private static String getResponseFromHTTPReq(HttpUriRequest httpReq, String url)
    {
        HttpClient httpclient = HttpClients.createDefault();

        // Execute and get the response.
        HttpResponse response = null;
        HttpEntity entity = null;
        try
        {
            response = httpclient.execute(httpReq);
            entity = response.getEntity();
        }
        catch (IOException ioe)
        {
            throw new RuntimeException(ioe);
        }

        if (entity == null)
        {
            String errMsg = "No response entity back from " + url;
            throw new RuntimeException(errMsg);
        }

        String returnRes = null;
        InputStream is = null;
        BufferedReader buf = null;
        try
        {
            is = entity.getContent();
            buf = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            System.out.println("Response Code : " + response.getStatusLine().getStatusCode());

            StringBuilder sb = new StringBuilder();
            // read until end of stream; stopping at the first empty line
            // would silently truncate multi-paragraph responses
            String s;
            while ((s = buf.readLine()) != null)
            {
                sb.append(s);
            }
            returnRes = sb.toString();
            System.out.println("Response: [" + returnRes + "]");
        }
        catch (UnsupportedOperationException | IOException e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            if (buf != null)
            {
                try
                {
                    buf.close();
                }
                catch (IOException e)
                {
                }
            }
            if (is != null)
            {
                try
                {
                    is.close();
                }
                catch (IOException e)
                {
                }
            }
        }
        return returnRes;
    }
}
Outputs:
Response Code : 404
Response: [false,Unknown-Username]
First of all, I am but a lowly web programmer, so I have very little experience with actual programming.
I have been given a list of 30,000 URLs, and I am not going to waste my time clicking each one to check whether it is valid. Is there a way to read through the text file they are in and have a program check each line?
The code I currently have is in Java, as that's really all I know, so if there's a better language for this, please let me know.
Here is what I have so far:
public class UrlCheck {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.google.com");
        //Need to change this to make it read from text file
        try {
            InputStream inp = null;
            try {
                inp = url.openStream();
            } catch (UnknownHostException ex) {
                System.out.println("Invalid");
            }
            if (inp != null) {
                System.out.println("Valid");
            }
        } catch (MalformedURLException exc) {
            exc.printStackTrace();
        }
    }
}
First you read the file line by line using a BufferedReader and check each line. The code below should work. It is up to you to decide what to do when you encounter an invalid URL; you could just print it, as I show, or write it to another file.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException; // note: java.net, not java.rmi, or the catch below will never match

public class UrlCheck {

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("_filename"));
        String line;
        while ((line = br.readLine()) != null) {
            if (checkUrl(line)) {
                System.out.println("URL " + line + " was OK");
            } else {
                System.out.println("URL " + line + " was not VALID"); //handle error as you like
            }
        }
        br.close();
    }

    private static boolean checkUrl(String pUrl) throws IOException {
        try {
            URL url = new URL(pUrl); // may throw MalformedURLException, so construct it inside the try
            InputStream inp = null;
            try {
                inp = url.openStream();
            } catch (UnknownHostException ex) {
                System.out.println("Invalid");
                return false;
            }
            if (inp != null) {
                System.out.println("Valid");
                return true;
            }
        } catch (MalformedURLException exc) {
            exc.printStackTrace();
            return false;
        }
        return true;
    }
}
The checkUrl method can also be simplified, as shown below:
private static boolean checkUrl(String pUrl) {
    URL url = null;
    InputStream inp = null;
    try {
        url = new URL(pUrl);
        inp = url.openStream();
        return inp != null;
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    } finally {
        try {
            if (inp != null) {
                inp.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
You could just use HttpURLConnection. If the URL is not valid you won't get anything back.
HttpURLConnection connection = null;
try {
    URL myurl = new URL("http://www.myURL.com");
    connection = (HttpURLConnection) myurl.openConnection();
    //Use a HEAD request instead of GET to reduce load
    connection.setRequestMethod("HEAD");
    int code = connection.getResponseCode();
    System.out.println("" + code);
} catch (IOException e) {
    //Handle invalid URL
}
I am unsure of your experience, but a multi-threaded solution is possible here. As you read through the text file, store the URLs in a thread-safe structure and allow a number of threads to attempt to open these connections. This will make for a more efficient solution, as it may take a while to test the 30,000 URLs while you are reading them in; see the sketch after the link below.
Check out a producer-consumer example if you are unsure:
http://www.journaldev.com/1034/java-blockingqueue-example-implementing-producer-consumer-problem
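As a rough sketch of that idea (assuming a checkUrl(String) helper like the one in the answer above; the stub here is a placeholder), a fixed thread pool can drain the file concurrently:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelUrlCheck {
    public static void main(String[] args) throws IOException, InterruptedException {
        // a small pool; tune the size to your bandwidth and the remote servers' tolerance
        ExecutorService pool = Executors.newFixedThreadPool(10);
        try (BufferedReader br = new BufferedReader(new FileReader("_filename"))) {
            String line;
            while ((line = br.readLine()) != null) {
                final String url = line;
                // each URL check runs as its own task while the file keeps being read
                pool.submit(() -> {
                    try {
                        System.out.println(url + (checkUrl(url) ? " was OK" : " was not VALID"));
                    } catch (IOException e) {
                        System.out.println(url + " failed: " + e);
                    }
                });
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // placeholder: substitute the checkUrl implementation from the answer above
    private static boolean checkUrl(String pUrl) throws IOException {
        return true;
    }
}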
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class UrlCheck {

    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.google.com");
            //Open the Http connection
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            //Get the http response code
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) //if the http response code is 200 OK then the url is valid
            {
                System.out.println("Valid");
            } else //Else the url is not valid
            {
                System.out.println("Invalid");
            }
        } catch (MalformedURLException ex) {
            System.out.println("Invalid");
        } catch (IOException ex) {
            System.out.println("Invalid");
        }
    }
}
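One caveat when checking 30,000 URLs with any of the approaches above: the default connect and read timeouts can be effectively unbounded, so a single unresponsive host can stall the run. A small sketch of the same check with explicit timeouts (the 5-second values are arbitrary placeholders):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class SafeUrlCheck {
    // returns the HTTP status code; assumes HEAD is acceptable to the servers being checked
    static int check(String pUrl) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(pUrl).openConnection();
        // without timeouts, one unresponsive host can block a checker thread indefinitely
        connection.setConnectTimeout(5 * 1000);
        connection.setReadTimeout(5 * 1000);
        connection.setRequestMethod("HEAD"); // as suggested above, avoids downloading the body
        return connection.getResponseCode();
    }
}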
I all but copied the following code from here. I get a java.net.SocketException on line 10 saying "Connection Reset".
import java.net.*;
import java.io.*;
import org.apache.commons.io.*;

public class HelloWorld {
    public static void main(String[] x) {
        try {
            URL url = new URL("http://money.cnn.com/2013/06/07/technology/security/page-zuckerberg-spying/index.html");
            URLConnection con = url.openConnection();
            InputStream in = con.getInputStream();
            String encoding = con.getContentEncoding();
            encoding = encoding == null ? "UTF-8" : encoding;
            String body = IOUtils.toString(in, encoding);
            System.out.print(body);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I'm worried this may not actually be an issue with the code itself, but rather some permission I need to give Java. Is there something wrong with my code, or is this an environment issue?
I used your code with a small modification because I don't have IOUtils at hand, and it works as it should. There is no need to set a user agent, and no special privileges are needed either, as I ran it as a normal user.
try {
    URL url = new URL("http://money.cnn.com/2013/06/07/technology/security/page-zuckerberg-spying/index.html");
    URLConnection con = url.openConnection();
    InputStream in = con.getInputStream();
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    StringBuilder sb = new StringBuilder();
    String line = br.readLine();
    while (line != null) {
        sb.append(line);
        line = br.readLine();
    }
    System.out.print(sb.toString());
} catch (Exception e) {
    e.printStackTrace();
}
import java.net.URL;
import java.io.*;
import java.net.MalformedURLException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Test {

    public static void main(String args[]) {
        try {
            processHTMLFromLink(new URL("http://fwallpapers.com"));
        } catch (MalformedURLException ex) {
            Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static int processHTMLFromLink(URL url) {
        InputStream is = null;
        DataInputStream dis;
        String line;
        int count = 0;
        try {
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException mue) {
            System.out.println(mue.toString());
        } catch (IOException ioe) {
            System.out.println(ioe.toString());
        } finally {
            try {
                is.close();
            } catch (IOException ioe) {
                // nothing to see here
            }
        }
        return count;
    }
}
error:
java.io.IOException: Server returned HTTP response code: 403 for URL: http://fwallpapers.com
Exception in thread "main" java.lang.NullPointerException
at Test.processHTMLFromLink(Test.java:38)
at Test.main(Test.java:15)
Java Result: 1
It works fine in a browser, but I am getting a null pointer exception. This code works fine with other links. Can anyone help me out with this? How can I get the content when I am getting a 403 error?
This is an old post, but in case people want to know how this works:
A 403 means access denied.
There is a workaround for this.
If you want to be able to do this, you have to set a User-Agent header to 'fool' the website.
This is how my old method looked:
private InputStream read() {
    try {
        return url.openStream();
    } catch (IOException e) {
        String error = e.toString();
        throw new RuntimeException(e);
    }
}
Changed it to: (And it works for me!)
private InputStream read() {
    try {
        HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
        // pretend to be a browser so the server doesn't reject the request
        httpcon.addRequestProperty("User-Agent", "Mozilla/4.0");
        return httpcon.getInputStream();
    } catch (IOException e) {
        String error = e.toString();
        throw new RuntimeException(e);
    }
}
Your mistake is swallowing the exception.
When I run my code, I get an HTTP 403 - "forbidden". The web server won't allow you to do this.
My code works perfectly for http://www.yahoo.com.
Here's how I do it:
package url;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

/**
 * UrlReader
 * @author Michael
 * @since 3/20/11
 */
public class UrlReader {

    public static void main(String[] args) {
        UrlReader urlReader = new UrlReader();
        for (String url : args) {
            try {
                String contents = urlReader.readContents(url);
                System.out.printf("url: %s contents: %s\n", url, contents);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public String readContents(String address) throws IOException {
        StringBuilder contents = new StringBuilder(2048);
        BufferedReader br = null;
        try {
            URL url = new URL(address);
            br = new BufferedReader(new InputStreamReader(url.openStream()));
            // read until EOF; testing readLine() directly avoids appending a trailing "null"
            String line;
            while ((line = br.readLine()) != null) {
                contents.append(line);
            }
        } finally {
            close(br);
        }
        return contents.toString();
    }

    private static void close(Reader br) {
        try {
            if (br != null) {
                br.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This is now a completely different question so I have edited your title.
According to your edit, you aren't getting null pointer exceptions, you are getting HTTP 403 status, which means 'Forbidden', which means you can't access that resource.
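As a side note on the follow-up question ("how can I get the content when I am getting a 403 error"): with HttpURLConnection, a server that returns an error status usually puts the body on the error stream rather than the input stream. A minimal sketch, assuming you still want the body of a 4xx response:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ErrorBodyReader {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://fwallpapers.com").openConnection();
        conn.addRequestProperty("User-Agent", "Mozilla/4.0"); // as in the workaround above
        int code = conn.getResponseCode();
        // for status >= 400, getInputStream() throws; the body, if any, is on getErrorStream()
        InputStream body = (code >= 400) ? conn.getErrorStream() : conn.getInputStream();
        System.out.println("HTTP " + code);
        if (body != null) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(body));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            reader.close();
        }
    }
}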
I am using the Jsoup Java HTML parser to fetch images from a particular URL, but some of the images return a status 502 error code and are not saved to my machine. Here is the code snippet I have used:
String url = "http://www.jabong.com";
String html = Jsoup.connect(url).get().html();
Document doc = Jsoup.parse(html, url);
Elements images = doc.select("img");
for (Element element : images) {
    String imgSrc = element.attr("abs:src");
    log.info(imgSrc);
    if (!imgSrc.isEmpty()) { // compare string contents, not references
        saveFromUrl(imgSrc, dirPath + "/" + nameCounter + ".jpg");
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            log.error("error in sleeping");
        }
        nameCounter++;
    }
}
And the saveFromUrl function looks like this:
public static void saveFromUrl(String Url, String destinationFile) {
    try {
        URL url = new URL(Url);
        InputStream is = url.openStream();
        OutputStream os = new FileOutputStream(destinationFile);
        byte[] b = new byte[2048];
        int length;
        while ((length = is.read(b)) != -1) {
            os.write(b, 0, length);
        }
        is.close();
        os.close();
    } catch (IOException e) {
        log.error("Error in saving file from url:" + Url);
        //e.printStackTrace();
    }
}
I searched on the internet about status code 502, but it says the error is due to a bad gateway, and I don't understand this. One possibility I can think of is that this error occurs because I am sending GET requests for the images in a loop; maybe the web server cannot handle that much load, so it denies the request for an image before the previous one has been sent. So I tried to sleep after fetching every image, but no luck :(
Any advice, please?
Here's a full code example that works for me...
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;      // only needed for the proxy variant shown further down
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;  // only needed for the proxy variant shown further down
import java.net.MalformedURLException;
import java.net.Proxy;              // only needed for the proxy variant shown further down
import java.net.SocketAddress;      // only needed for the proxy variant shown further down
import java.net.URL;

public class DownloadImage {

    public static void main(String[] args) {
        // URLs for Images we wish to download
        String[] urls = {
            "http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png",
            "http://www.google.co.uk/images/srpr/logo3w.png",
            "http://i.microsoft.com/global/en-us/homepage/PublishingImages/sprites/microsoft_gray.png"
        };

        for (int i = 0; i < urls.length; i++) {
            downloadFromUrl(urls[i]);
        }
    }

    /*
     * Extract the file name from the URL
     */
    private static String getOutputFileName(URL url) {
        String[] urlParts = url.getPath().split("/");
        return "c:/temp/" + urlParts[urlParts.length - 1];
    }

    /*
     * Assumes there is no Proxy server involved.
     */
    private static void downloadFromUrl(String urlString) {
        InputStream is = null;
        FileOutputStream fos = null;
        try {
            URL url = new URL(urlString);
            System.out.println("Reading..." + url);

            // no proxy here; the proxy variant further down passes one to openConnection()
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            is = conn.getInputStream();

            String filename = getOutputFileName(url);
            fos = new FileOutputStream(filename);

            byte[] readData = new byte[1024];
            int i = is.read(readData);
            while (i != -1) {
                fos.write(readData, 0, i);
                i = is.read(readData);
            }
            System.out.println("Created file: " + filename);
        }
        catch (MalformedURLException e) {
            e.printStackTrace();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    System.out.println("Big problems if InputStream cannot be closed");
                }
            }
            if (fos != null) {
                try {
                    fos.close();
                } catch (IOException e) {
                    System.out.println("Big problems if FileOutputStream cannot be closed");
                }
            }
        }
        System.out.println("Completed");
    }
}
You should see the following output on your console...
Reading...http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png
Created file: c:/temp/apple-touch-icon.png
Completed
Reading...http://www.google.co.uk/images/srpr/logo3w.png
Created file: c:/temp/logo3w.png
Completed
Reading...http://i.microsoft.com/global/en-us/homepage/PublishingImages/sprites/microsoft_gray.png
Created file: c:/temp/microsoft_gray.png
Completed
So that's a working example without a Proxy server involved.
If you require authentication with a proxy server, here's an additional class that you'll need, based on this Oracle technote:
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class ProxyAuthenticator extends Authenticator {

    private String userName, password;

    public ProxyAuthenticator(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }

    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(userName, password.toCharArray());
    }
}
And to use this new class, you would use the following code in place of the call to openConnection() shown above:
...
try {
    URL url = new URL(urlString);
    System.out.println("Reading..." + url);

    Authenticator.setDefault(new ProxyAuthenticator("username", "password"));
    SocketAddress addr = new InetSocketAddress("proxy.server.com", 80);
    Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
...
Your problem sounds like HTTP communication issues, so you are probably better off trying to use a library to handle the communication side of things. Take a look at Apache Commons HttpClient.
Some notes about your code example: you haven't used a URLConnection object, so it's not clear what the behaviour will be with respect to the Web/Proxy servers, closing resources cleanly, etc. The HttpClient library mentioned above will help in this respect.
There also seem to be some examples of doing what you want using J2ME libraries. Not something I have used personally, but they may also help you out.
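In the meantime, a minimal sketch of a saveFromUrl variant that checks the status code before writing, and retries on an error such as a 502, might look like this (the retry count and backoff are arbitrary placeholders, and ImageSaver is a hypothetical wrapper class):

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImageSaver {
    public static void saveFromUrl(String imageUrl, String destinationFile) throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= 3; attempt++) { // arbitrary retry budget
            HttpURLConnection conn = (HttpURLConnection) new URL(imageUrl).openConnection();
            int code = conn.getResponseCode();
            if (code == HttpURLConnection.HTTP_OK) {
                // only write the file once we know the server answered 200 OK
                try (InputStream is = conn.getInputStream();
                     OutputStream os = new FileOutputStream(destinationFile)) {
                    byte[] b = new byte[2048];
                    int length;
                    while ((length = is.read(b)) != -1) {
                        os.write(b, 0, length);
                    }
                }
                return;
            }
            conn.disconnect();
            System.out.println("Got HTTP " + code + " for " + imageUrl + ", attempt " + attempt);
            Thread.sleep(1000L * attempt); // crude linear backoff before retrying
        }
    }
}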