Read a URL into a string in a few lines of Java code

I'm trying to find Java's equivalent to Groovy's:
String content = "http://www.google.com".toURL().getText();
I want to read content from a URL into a string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into Apache's HttpClient, but I don't see a one- or two-line implementation there either.

Now that some time has passed since the original answer was accepted, there's a better approach:
String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();
If you want a slightly fuller implementation, which is not a single line, do this:
public static String readStringFromURL(String requestURL) throws IOException
{
try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
StandardCharsets.UTF_8.toString()))
{
scanner.useDelimiter("\\A");
return scanner.hasNext() ? scanner.next() : "";
}
}
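For anyone wondering why the one-liner works: `\\A` is the regex anchor for the beginning of input, so as a delimiter it can only match at position 0, and the first token is therefore the whole stream. Here is a minimal sketch that shows this against an in-memory stream instead of a live URL. The names `SlurpDemo` and `readAll` are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class SlurpDemo {
    // Reads the whole stream as a single token; returns "" for an empty stream.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for url.openStream(); no network access needed.
        InputStream page = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(page));
    }
}
```

The `hasNext()` check matters for empty responses, where `next()` alone would throw a `NoSuchElementException`.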

This answer refers to an older version of Java. You may want to look at ccleve's answer.
Here is the traditional way to do this:
import java.net.*;
import java.io.*;
public class URLConnectionReader {
public static String getText(String url) throws Exception {
URL website = new URL(url);
URLConnection connection = website.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
connection.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null)
response.append(inputLine).append('\n'); // readLine() strips the line terminator, so add one back
in.close();
return response.toString();
}
public static void main(String[] args) throws Exception {
String content = URLConnectionReader.getText(args[0]);
System.out.println(content);
}
}
As @extraneon has suggested, Commons IO's IOUtils allows you to do this in a very eloquent way that's still in the Java spirit:
InputStream in = new URL( "http://jakarta.apache.org" ).openStream();
try {
System.out.println( IOUtils.toString( in, StandardCharsets.UTF_8 ) ); // pass the charset explicitly instead of relying on the platform default
} finally {
IOUtils.closeQuietly(in);
}

Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.

There's an even better way as of Java 9:
URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
Like the original Groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)
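If you do want to work out the encoding from the URLConnection, one possible approach is to look for a `charset=` parameter in the Content-Type header and fall back to UTF-8 when it is missing or unknown. This is only a sketch: the class and method names (`ContentTypeCharset`, `charsetOf`, `fetch`) are my own, and it ignores charsets declared inside the document itself (for example in an HTML meta tag).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Picks the charset parameter out of a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; falls back to UTF-8 (an assumption).
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: keep looking, then fall back.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Java 9+: reads the body and decodes it with the charset the server declared.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), charsetOf(conn.getContentType()));
        }
    }

    public static void main(String[] args) {
        // Header parsing only, so this runs without network access.
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/html"));
    }
}
```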

Now that more time has passed, here's a way to do it in Java 8:
URL url = new URL("http://www.example.com/"); // any URL
URLConnection conn = url.openConnection();
String pageText;
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}

Additional example using Guava:
URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);

Java 11+:
URI uri = URI.create("http://www.google.com");
HttpRequest request = HttpRequest.newBuilder(uri).build();
String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();

If you already have the input stream (see Joe's answer), also consider IOUtils.toString(inputStream).
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)

The following works with Java 7/8 and HTTPS URLs, and also shows how to add a cookie to your request. It is mostly a direct copy of another great answer on this page, with the cookie example added and a note that it works with HTTPS URLs as well ;-)
If you need to connect to a server with an invalid or self-signed certificate, this will throw security errors unless you import the certificate. If you need this functionality, consider the approach detailed in the answer to the related question on Stack Overflow.
Example
String result = getUrlAsString("https://www.google.com");
System.out.println(result);
outputs
<!doctype html><html itemscope="" .... etc
Code
import java.net.URL;
import java.net.URLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;
public static String getUrlAsString(String url)
{
try
{
URL urlObj = new URL(url);
URLConnection con = urlObj.openConnection();
con.setDoInput(true); // we want to read the response (already the default; setDoOutput is only for sending a request body)
con.setRequestProperty("Cookie", "myCookie=test123");
con.connect();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
String newLine = System.getProperty("line.separator");
while ((inputLine = in.readLine()) != null)
{
response.append(inputLine + newLine);
}
in.close();
return response.toString();
}
catch (Exception e)
{
throw new RuntimeException(e);
}
}

Here's Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:
private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
String urlData = "";
URL urlObj = new URL(aUrl);
URLConnection conn = urlObj.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)))
{
urlData = reader.lines().collect(Collectors.joining("\n"));
}
return urlData;
}

URL to String in pure Java
Example call to get payload from http get call
String str = getStringFromUrl(new URL("YourUrl"));
Implementation
You can use the method described in this answer, on How to read URL to an InputStream and combine it with this answer on How to read InputStream to String.
The outcome will be something like
public String getStringFromUrl(URL url) throws IOException {
return inputStreamToString(urlToInputStream(url,null));
}
public String inputStreamToString(InputStream inputStream) throws IOException {
try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
byte[] buffer = new byte[1024];
int length;
while ((length = inputStream.read(buffer)) != -1) {
result.write(buffer, 0, length);
}
return result.toString(StandardCharsets.UTF_8.name()); // the charset-name overload is available on every Java version
}
}
private InputStream urlToInputStream(URL url, Map<String, String> args) {
HttpURLConnection con = null;
InputStream inputStream = null;
try {
con = (HttpURLConnection) url.openConnection();
con.setConnectTimeout(15000);
con.setReadTimeout(15000);
if (args != null) {
for (Entry<String, String> e : args.entrySet()) {
con.setRequestProperty(e.getKey(), e.getValue());
}
}
con.connect();
int responseCode = con.getResponseCode();
/* By default the connection will follow redirects. The following
* block is only entered if the implementation of HttpURLConnection
* does not perform the redirect. The exact behavior depends on
* the actual implementation (e.g. sun.net).
* !!! Attention: This block allows the connection to
* switch protocols (e.g. HTTP to HTTPS), which is <b>not</b>
* default behavior. See: https://stackoverflow.com/questions/1884230
* for more info!!!
*/
if (responseCode < 400 && responseCode > 299) {
String redirectUrl = con.getHeaderField("Location");
try {
URL newUrl = new URL(redirectUrl);
return urlToInputStream(newUrl, args);
} catch (MalformedURLException e) {
URL newUrl = new URL(url, redirectUrl); // relative Location: resolve it against the current URL (keeps the port)
return urlToInputStream(newUrl, args);
}
}
/*!!!!!*/
inputStream = con.getInputStream();
return inputStream;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Pros
It is pure java
It can be easily enhanced by adding different headers as a map (instead of passing a null object, like the example above does), authentication, etc.
Handling of protocol switches is supported
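On the redirect handling: a Location header may be absolute or relative. One way to cover both cases without a try/catch is to resolve the header value against the URI of the request that produced the redirect. Here is a small sketch of that idea. `RedirectTarget` and `resolve` are hypothetical names, and the example.com addresses are placeholders.

```java
import java.net.URI;

class RedirectTarget {
    // Resolves a Location header value against the URI that produced the redirect.
    // Absolute values are returned unchanged; relative ones keep scheme, host and port.
    static URI resolve(URI requestUri, String location) {
        return requestUri.resolve(location);
    }

    public static void main(String[] args) {
        URI request = URI.create("http://example.com:8080/docs/index.html");
        System.out.println(resolve(request, "/login"));
        System.out.println(resolve(request, "page2.html"));
        System.out.println(resolve(request, "https://other.example/start"));
    }
}
```

If you follow redirects by hand like this, it is probably worth capping the number of hops as well, since a recursive follow has no built-in protection against redirect loops.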

Related

Getting a specific element in a JSON response without using an external library

I want to create a GitHub update checker which provides the version of the newest release.
public static void main(String[] args) {
HttpURLConnection httpURLConnection = null;
try {
httpURLConnection = (HttpURLConnection) new URL("https://api.github.com/repos/MilkBowl/Vault/releases/latest").openConnection();
httpURLConnection.setRequestMethod("GET");
httpURLConnection.setRequestProperty("Content-Type", "application/json");
try (final InputStream inputStream = httpURLConnection.getInputStream(); final InputStreamReader inputStreamReader = new InputStreamReader(inputStream); final BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
}
} catch (final IOException exception) {
exception.printStackTrace();
} finally {
if (httpURLConnection != null) {
httpURLConnection.disconnect();
}
}
}
This is what I have so far. The request returns this JSON response:
https://api.github.com/repos/MilkBowl/Vault/releases/latest
Now I want to get the "tag_name". How can I do this with plain Java? I do not want to use an external library.
I don't have enough reputation to comment, so I'll post this as an answer.
I think you can use a regex to grab only the name. You can use something like this:
"name".+?,
After that, apply some replaces to strip the key and the punctuation.
Something like this:
// use "tag_name" instead of "name" to get the release tag
Matcher matcher = Pattern.compile("\"name\".+?,").matcher(yourData);
if (matcher.find()) {
    System.out.println(matcher.group().replace("\"name\":", "").replace("\"", "").replace(",", "").trim());
}
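A variation on the regex idea that skips the clean-up step: put the value in a capture group and read it with `group(1)`. This is a rough sketch, not a JSON parser. It assumes the value is a plain string without escaped quotes, and the sample JSON below is made up rather than copied from the API. The names `TagNameExtractor` and `tagName` are only for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagNameExtractor {
    // "tag_name", optional whitespace around the colon, then the quoted value in group 1.
    private static final Pattern TAG_NAME =
            Pattern.compile("\"tag_name\"\\s*:\\s*\"([^\"]*)\"");

    static Optional<String> tagName(String json) {
        Matcher matcher = TAG_NAME.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample shaped like a release object; not real API output.
        String json = "{\"url\":\"https://example.invalid/releases/1\","
                + "\"tag_name\":\"v1.2.3\",\"name\":\"Example 1.2.3\"}";
        System.out.println(tagName(json).orElse("no tag_name found"));
    }
}
```

In the question's code, the string to pass in would be whatever you read from the BufferedReader inside the (currently empty) try block.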

HTTP Post request read the response [duplicate]

In Java, this code throws an exception when the HTTP status is in the 4xx range:
URL url = new URL("http://stackoverflow.com/asdf404notfound");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.getInputStream(); // throws!
In my case, I happen to know that the content is 404, but I'd still like to read the body of the response anyway.
(In my actual case the response code is 403, but the body of the response explains the reason for rejection, and I'd like to display that to the user.)
How can I access the response body?
Here is the bug report (closed, will not fix, not a bug).
Their advice there is to code like this:
HttpURLConnection httpConn = (HttpURLConnection)_urlConnection;
InputStream _is;
if (httpConn.getResponseCode() < HttpURLConnection.HTTP_BAD_REQUEST) {
_is = httpConn.getInputStream();
} else {
/* error from server */
_is = httpConn.getErrorStream();
}
It's the same problem I was having:
HttpURLConnection throws a FileNotFoundException if you try to read from getInputStream() on the connection.
You should instead use getErrorStream() when the status code is 400 or higher.
Also be careful: 200 is not the only success status code; 201, 204, etc. are often used as success statuses too.
Here is an example of how I handled it:
... connection code code code ...
// Get the response code
int statusCode = connection.getResponseCode();
InputStream is = null;
if (statusCode >= 200 && statusCode < 400) {
// Create an InputStream in order to extract the response object
is = connection.getInputStream();
}
else {
is = connection.getErrorStream();
}
... callback/response to your handler....
In this way, you'll be able to get the needed response in both success and error cases.
Hope this helps!
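For completeness, here is how the status check above might look as a self-contained helper that returns the body as a string in both cases. Treat it as a sketch under a few assumptions: the body is decoded as UTF-8, a missing stream is reported as an empty string, and the names `ResponseBodyReader` and `readBody` are mine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

class ResponseBodyReader {
    // Reads the response body whether the server answered with success or an error status.
    static String readBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        InputStream stream = status >= HttpURLConnection.HTTP_BAD_REQUEST
                ? conn.getErrorStream()   // 4xx/5xx: the body is only available here
                : conn.getInputStream();  // 1xx-3xx: the regular stream
        if (stream == null) {
            return ""; // getErrorStream() returns null when the server sent no error body
        }
        try (InputStream in = stream) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8); // UTF-8 is an assumption
        }
    }
}
```

You would call it as `ResponseBodyReader.readBody((HttpURLConnection) url.openConnection())`, so a 403 with an explanatory body comes back as text instead of an exception.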
In .NET you have the Response property of the WebException, which gives access to the stream on an exception. So I guess this is a good way to do it in Java:
private InputStream dispatch(HttpURLConnection http) throws Exception {
try {
return http.getInputStream();
} catch(Exception ex) {
return http.getErrorStream();
}
}
Or an implementation I used. (It might need changes for encoding or other things; it works in my current environment.)
private String dispatch(HttpURLConnection http) throws Exception {
try {
return readStream(http.getInputStream());
} catch(Exception ex) {
readAndThrowError(http);
return null; // <- never gets here, previous statement throws an error
}
}
private void readAndThrowError(HttpURLConnection http) throws Exception {
if (http.getContentLengthLong() > 0 && http.getContentType().contains("application/json")) {
String json = this.readStream(http.getErrorStream());
Object oson = this.mapper.readValue(json, Object.class);
json = this.mapper.writer().withDefaultPrettyPrinter().writeValueAsString(oson);
throw new IllegalStateException(http.getResponseCode() + " " + http.getResponseMessage() + "\n" + json);
} else {
throw new IllegalStateException(http.getResponseCode() + " " + http.getResponseMessage());
}
}
private String readStream(InputStream stream) throws Exception {
    StringBuilder builder = new StringBuilder();
    // try-with-resources closes the reader, so no explicit close() is needed
    try (BufferedReader in = new BufferedReader(new InputStreamReader(stream))) {
        String line;
        while ((line = in.readLine()) != null) {
            builder.append(line); // line breaks are dropped; JSON does not need them between tokens
        }
    }
    System.out.println("JSON: " + builder.toString());
    return builder.toString();
}
I know that this doesn't answer the question directly, but instead of using the HTTP connection library provided by Sun, you might want to take a look at Commons HttpClient, which (in my opinion) has a far easier API to work with.
First check the response code and then use HttpURLConnection.getErrorStream()
InputStream is = null;
if (httpConn.getResponseCode() >= HttpURLConnection.HTTP_BAD_REQUEST) {
/* error from server */
is = httpConn.getErrorStream();
} else {
is = httpConn.getInputStream();
}
My running code.
HttpURLConnection httpConn = (HttpURLConnection) urlConn;
if (httpConn.getResponseCode() < HttpURLConnection.HTTP_BAD_REQUEST) {
in = new InputStreamReader(urlConn.getInputStream());
BufferedReader bufferedReader = new BufferedReader(in);
if (bufferedReader != null) {
int cp;
while ((cp = bufferedReader.read()) != -1) {
sb.append((char) cp);
}
bufferedReader.close();
}
in.close();
} else {
/* error from server */
in = new InputStreamReader(httpConn.getErrorStream());
BufferedReader bufferedReader = new BufferedReader(in);
if (bufferedReader != null) {
int cp;
while ((cp = bufferedReader.read()) != -1) {
sb.append((char) cp);
}
bufferedReader.close();
}
in.close();
}
System.out.println("sb="+sb);
How to read 404 response body in java:
Use Apache library - https://hc.apache.org/httpcomponents-client-4.5.x/httpclient/apidocs/
or
Java 11 - https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html
Snippet given below uses Apache:
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.util.EntityUtils;
CloseableHttpClient client = HttpClients.createDefault();
CloseableHttpResponse resp = client.execute(new HttpGet(domainName + "/blablablabla.html"));
String response = EntityUtils.toString(resp.getEntity());

SocketException: Connection reset

I all but copied the following code from here. I get a java.net.SocketException on line 10 saying "Connection Reset".
import java.net.*;
import java.io.*;
import org.apache.commons.io.*;
public class HelloWorld {
public static void main(String[] x) {
try {
URL url = new URL("http://money.cnn.com/2013/06/07/technology/security/page-zuckerberg-spying/index.html");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);
System.out.print(body);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I'm worried this may not actually be an issue with the actual code but rather some permission I need to give Java. Is there something wrong with my code or is this an environment issue?
I used your code with a small modification because I don't have IOUtils at hand, and it works as it should. There is no need to set a user agent, and no special privileges are needed either; I ran it as a normal user.
try {
URL url = new URL("http://money.cnn.com/2013/06/07/technology/security/page-zuckerberg-spying/index.html");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(in));
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
line = br.readLine();
}
System.out.print(sb.toString());
} catch (Exception e) {
e.printStackTrace();
}

Reading from a URL Connection Java

I'm trying to read html code from a URL Connection. In one case the html file I'm trying to read includes 5 line breaks before the actual doc type declaration. In this case the input reader throws an exception for EOF.
URL pageUrl =
new URL(
"http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html"
);
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
//some read method here
Has anyone run into a problem like this?
URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
String urlData = "";
while ((urlData = dis.readUTF()) != null)
System.out.println(urlData);
//exception thrown
java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
at java.io.DataInputStream.readUTF(DataInputStream.java:572)
at java.io.DataInputStream.readUTF(DataInputStream.java:547)
In the case of BufferedReader, it just returns null and doesn't continue:
pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(getConn.getInputStream()));
String urlData = "";
while(true)
urlData = br.readLine();
System.out.println(urlData);
outputs null
You're using DataInputStream to read data that wasn't encoded using DataOutputStream. Examine the documented behavior of DataInputStream#readUTF(): it first reads two bytes to form a 16-bit integer, indicating the number of bytes that follow and make up the UTF-encoded string. The data you're reading from the HTTP server is not encoded in this format.
Instead, the HTTP server is sending headers encoded in ASCII, per RFC 2616 sections 6.1 and 2.2. You need to read the headers as text, and then determine how the message body (the "entity") is encoded.
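To make the length-prefix point concrete, here is a small sketch (class and method names are mine) that shows what writeUTF actually puts on the wire, and what readUTF would take as the "length" if it were pointed at the start of an ordinary HTML page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReadUtfFormat {
    // Bytes produced by DataOutputStream.writeUTF: a 2-byte length, then the encoded characters.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The length that readUTF() would take from the first two bytes of arbitrary data.
    static int lengthPrefix(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("hi"))); // [0, 2, 104, 105]
        byte[] html = "<!DOCTYPE html>".getBytes(StandardCharsets.US_ASCII);
        // '<' is 0x3C and '!' is 0x21, so the bogus length is 0x3C21
        System.out.println(lengthPrefix(html)); // 15393
    }
}
```

With plain text, readUTF treats the first two characters as a byte count and then tries to read that many bytes, which is why it ends in an EOFException or garbage rather than the page content.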
This works fine:
package url;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
/**
* UrlReader
* @author Michael
* @since 3/20/11
*/
public class UrlReader
{
public static void main(String[] args)
{
UrlReader urlReader = new UrlReader();
for (String url : args)
{
try
{
String contents = urlReader.readContents(url);
System.out.printf("url: %s contents: %s\n", url, contents);
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
public String readContents(String address) throws IOException
{
StringBuilder contents = new StringBuilder(2048);
BufferedReader br = null;
try
{
URL url = new URL(address);
br = new BufferedReader(new InputStreamReader(url.openStream()));
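String line;
// assign and test in one step so the final null is never appended as "null"
while ((line = br.readLine()) != null)
{
contents.append(line).append('\n'); // readLine() strips the line terminator
}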
}
finally
{
close(br);
}
return contents.toString();
}
private static void close(Reader br)
{
try
{
if (br != null)
{
br.close();
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
This:
public class Main {
public static void main(String[] args)
throws MalformedURLException, IOException
{
URL pageUrl = new URL("http://www.google.com");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader dis = new BufferedReader(
new InputStreamReader(
getConn.getInputStream()));
String myString;
while ((myString = dis.readLine()) != null)
{
System.out.println(myString);
}
}
}
Works perfectly. The URL you are supplying, however, returns nothing.
