Reading from a URL Connection Java - java

I'm trying to read html code from a URL Connection. In one case the html file I'm trying to read includes 5 line breaks before the actual doc type declaration. In this case the input reader throws an exception for EOF.
URL pageUrl =
new URL(
"http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html"
);
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
//some read method here
Has anyone run into a problem like this?
URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
String urlData = "";
while ((urlData = dis.readUTF()) != null)
System.out.println(urlData);
//exception thrown
java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
at java.io.DataInputStream.readUTF(DataInputStream.java:572)
at java.io.DataInputStream.readUTF(DataInputStream.java:547)
With a BufferedReader, readLine() just returns null and never gets any further:
pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(getConn.getInputStream()));
String urlData = "";
while(true)
urlData = br.readLine();
System.out.println(urlData);
outputs null

You're using DataInputStream to read data that wasn't encoded using DataOutputStream. Examine the documented behavior for your call to DataInputStream#readUTF(): it first reads two bytes to form a 16-bit integer, indicating the number of bytes that follow comprising the UTF-encoded string. The data you're reading from the HTTP server is not encoded in this format.
Instead, the HTTP server is sending headers encoded in ASCII, per RFC 2616 sections 6.1 and 2.2. You need to read the headers as text, and then determine how the message body (the "entity") is encoded.
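To make the length-prefix point concrete, here is a minimal sketch that needs no network. The class and method names (ReadUtfDemo, encode, decode, failsWithEof) are made up for illustration. It writes a string with writeUTF, shows the two length bytes in front of the payload, and then feeds readUTF some plain HTML-like bytes that begin with line breaks, as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ReadUtfDemo {
    // Writes text with writeUTF (2-byte length prefix, then modified UTF-8) and returns the raw bytes.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        }
        return sink.toByteArray();
    }

    // Reads one string with readUTF; only meaningful for bytes produced by writeUTF.
    static String decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        }
    }

    // Returns true if readUTF runs out of input (EOFException) on the given bytes.
    static boolean failsWithEof(byte[] bytes) throws IOException {
        try {
            decode(bytes);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = encode("hi");
        System.out.println(framed.length);   // 4: two length bytes plus two payload bytes
        System.out.println(decode(framed));  // hi

        // Plain text starting with line breaks: the first two bytes (0x0A 0x0A) are
        // misread as a length of 2570, far more than the stream actually holds.
        byte[] html = "\n\n\n\n\n<!DOCTYPE html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(failsWithEof(html)); // true
    }
}
```

For text coming back from an HTTP server, wrapping the stream in an InputStreamReader (with the right charset) is the usual alternative to DataInputStream.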

This works fine:
package url;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
/**
* UrlReader
* @author Michael
* @since 3/20/11
*/
public class UrlReader
{
public static void main(String[] args)
{
UrlReader urlReader = new UrlReader();
for (String url : args)
{
try
{
String contents = urlReader.readContents(url);
System.out.printf("url: %s contents: %s\n", url, contents);
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
public String readContents(String address) throws IOException
{
StringBuilder contents = new StringBuilder(2048);
BufferedReader br = null;
try
{
URL url = new URL(address);
br = new BufferedReader(new InputStreamReader(url.openStream()));
String line;
// Check for end of stream before appending, so the text "null" is never added.
while ((line = br.readLine()) != null)
{
contents.append(line);
}
}
finally
{
close(br);
}
return contents.toString();
}
private static void close(Reader br)
{
try
{
if (br != null)
{
br.close();
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}

This:
public class Main {
public static void main(String[] args)
throws MalformedURLException, IOException
{
URL pageUrl = new URL("http://www.google.com");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader dis = new BufferedReader(
new InputStreamReader(
getConn.getInputStream()));
String myString;
while ((myString = dis.readLine()) != null)
{
System.out.println(myString);
}
}
}
Works perfectly. The URL you are supplying, however, returns nothing.

Related

Unable to unzip the web service response in Linux using java

We are calling web services from Java code. The web service provider returns the response in gzip format, and we decompress it with GZIPInputStream after receiving it.
The response is converted to a byte array, which is then passed as input to GZIPInputStream. This code works in Eclipse and decompresses the response string. The same code fails on Linux and throws the error "Not in GZIP format" when the byte array is passed to GZIPInputStream.
We checked the default charset: it is windows-1252 on Windows and UTF-8 on Linux. So we tried getting the bytes as UTF-8 and as windows-1252. Neither works.
Can anyone please help me find where this is going wrong and how to resolve the issue?
So far we have tried changing the charset used when generating the byte array from the response.
import java.util.List;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.io.*;
import java.nio.charset.*;
import java.util.zip.GZIPInputStream;
public class WSConnectTest {
public final static String UserName = null; //User id login for Fusion
public final static String instanceURL = null;
public final static String USER_PWD = null; // API key shared by CSOD
private static final String PROXY_URL = null; //UBS proxy URL
private static final int PROXY_PORT = 8080;
private static final String PROXY_USERNAME = "USER_NAME";
private static final String PROXY_PASSWORD = "PASSWORD";
final static String USER_AGENT = "Mozilla/5.0";
static Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(
PROXY_URL, PROXY_PORT));
static {
Authenticator authenticator = new Authenticator() {
public PasswordAuthentication getPasswordAuthentication() {
return (new PasswordAuthentication(UserName, USER_PWD.toCharArray()));
}
};
Authenticator.setDefault(authenticator);
}
public static void main(String[] args) throws Exception {
FusionConnect fusionconnect = new FusionConnect();
String theURL = instanceURL + "<RESOURCE_NAME>";
System.out.println("The URL to be called is : " + theURL);
String json = "<JSON_STRING>";
String post_param = new String(json.toString());
System.out.println("The json is :" + json);
PostRequestWithFilter(theURL, post_param);
}
private static void PostRequestWithFilter(String url, String json) throws Exception {
try {
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection(proxy);
con.setRequestMethod("POST");
con.setRequestProperty("User-Agent", "Apache-HttpClient/4.1.1 (java 1.5)");
con.setRequestProperty("Content-Type", "application/json");
con.setRequestProperty("Accept-Language", "UTF-8");
con.setRequestProperty("Accept-Encoding", "gzip, deflate");
con.setDoOutput(true);
con.setConnectTimeout(15000);
System.out.println("get content type :"+con.getRequestProperties());
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(json);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("\nResponse Code : " + responseCode);
System.out.println("\nResponse message : " + con.getResponseMessage());
String inputLine;
StringBuffer response = new StringBuffer();
String ResponseStr = null;
byte[] bresponse = new byte[1024];
String deoutput = null;
BufferedReader in =null;
if (responseCode == con.HTTP_CREATED) { in =new BufferedReader(new InputStreamReader(con.getInputStream()));
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
System.out.println("Response received from Fusion string buffer :"+inputLine);
} in .close();
ResponseStr = response.toString();
System.out.println("response string :"+ResponseStr);
bresponse = ResponseStr.getBytes("UTF-8");
System.out.println("Response received from Fusion Bytes :"+bresponse);
deoutput = unzip(bresponse);
System.out.println("Decompressed response :"+deoutput);
} else { in =new BufferedReader(new InputStreamReader(con.getErrorStream()));
System.out.println("Response Content Type :"+con.getContentType());
System.out.println("Response Content Encoding :"+con.getContentEncoding());
while ((inputLine = in.readLine()) != null) {
response.append(inputLine+"\r");
System.out.println("Response received from Fusion string buffer :"+inputLine);
}
in .close();
ResponseStr = response.toString();
System.out.println("response string :"+response);
bresponse = ResponseStr.getBytes();
for (int i=0; i < bresponse.length; i++)
{
System.out.println("byte code :"+i+" "+bresponse[i]);
}
System.out.println("Response received from Fusion Bytes :"+Charset.defaultCharset()+bresponse);
deoutput = unzip(bresponse);
FileOutputStream fos = new FileOutputStream("fileName1.gz");
DataOutputStream outStream = new DataOutputStream(new BufferedOutputStream(fos));
outStream.writeUTF(ResponseStr);
outStream.close();
System.out.println("Decompressed response :"+deoutput);
}
}
catch(Exception e) {
e.printStackTrace();
}
}
public static String unzip(byte[] compressed) {
if ((compressed == null) || (compressed.length == 0)) {
System.out.println("The response is empty");
throw new IllegalArgumentException("Cannot unzip null or empty bytes");
}
if (!isZipped(compressed)) {
System.out.println("The response is not zipped");
return new String(compressed);
}
StringBuilder output = new StringBuilder();
try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(compressed)) {
System.out.println("After byte array input stream :");
try (GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream)) {
try (InputStreamReader inputStreamReader = new InputStreamReader(byteArrayInputStream, StandardCharsets.UTF_8)){
try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
String line;
System.out.println("buffer reader :"+bufferedReader.readLine());
while ((line = bufferedReader.readLine()) != null) {
output.append(line);
System.out.println("line :"+output.toString());
}
} catch(IOException e) {
throw new RuntimeException("Failed to read bufferedReader content", e);
}
}
}
} catch(Exception e) {
e.printStackTrace();
}
return output.toString();
}
public static boolean isZipped(final byte[] compressed) {
System.out.println("(byte)(GZIPInputStream.GZIP_MAGIC) is "+(byte)(GZIPInputStream.GZIP_MAGIC));
System.out.println("gzip magic is "+(byte)(GZIPInputStream.GZIP_MAGIC >> 8));
return (compressed[0] == (byte)(GZIPInputStream.GZIP_MAGIC)) && (compressed[1] == (byte)(GZIPInputStream.GZIP_MAGIC >> 8));
}
}
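One likely cause, going only by the code above (this is an assumption, not a confirmed diagnosis), is that the compressed body is read through a BufferedReader and stored in a String before it is decompressed. Gzip data is binary, so decoding it as text and re-encoding it can change the bytes, and the result depends on the platform's default charset. The sketch below uses in-memory data instead of a real service; GzipBytesDemo and its method names are hypothetical. It decompresses the raw byte stream first and only then decodes text, and it shows that a trip through a UTF-8 String does not preserve gzip bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBytesDemo {
    // Compresses text so the demo has gzip data without needing a server.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    // Decompresses a raw byte stream; text decoding happens only after decompression.
    static String gunzip(InputStream raw) throws IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    // Mimics the bytes -> String -> bytes path; returns true only if the bytes survive unchanged.
    static boolean survivesStringRoundTrip(byte[] compressed) {
        byte[] again = new String(compressed, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(compressed, again);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("{\"status\":\"ok\"}");
        System.out.println(gunzip(new ByteArrayInputStream(compressed)));
        // The second gzip magic byte (0x8b) is not valid UTF-8, so the round trip alters it.
        System.out.println(survivesStringRoundTrip(compressed)); // false
    }
}
```

With a real connection, the same idea would be to pass con.getInputStream() (or con.getErrorStream()) straight to gunzip when con.getContentEncoding() reports gzip, without a Reader in between.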

SocketException: Connection reset

I all but copied the following code from here. I get a java.net.SocketException on line 10 saying "Connection Reset".
import java.net.*;
import java.io.*;
import org.apache.commons.io.*;
public class HelloWorld {
public static void main(String[] x) {
try {
URL url = new URL("http://money.cnn.com/2013/06/07/technology/security/page-zuckerberg-spying/index.html");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);
System.out.print(body);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I'm worried this may not actually be an issue with the actual code but rather some permission I need to give Java. Is there something wrong with my code or is this an environment issue?
I used your code with a small modification because I don't have IOUtils at hand, and it works as it should. There is no need to set a user agent, and no special privileges are needed either, as I ran it as a normal user.
try {
URL url = new URL("http://money.cnn.com/2013/06/07/technology/security/page-zuckerberg-spying/index.html");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(in));
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
line = br.readLine();
}
System.out.print(sb.toString());
} catch (Exception e) {
e.printStackTrace();
}
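A side note on the question's code, separate from the connection reset: getContentEncoding() returns the Content-Encoding header (for example gzip), not the character set. The character set, when the server sends one, is a parameter of the Content-Type header. Below is a minimal sketch of pulling it out; the class name ContentTypeCharset and the UTF-8 fallback are my own assumptions, not something from the original posts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
    // when the header is missing, has no charset, or names an unknown charset.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/html"));
    }
}
```

With a connection in hand this could be used as new InputStreamReader(in, charsetOf(con.getContentType())).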

Read utf-8 url to string java

Good day. I have just switched from Objective-C to Java and am trying to read a URL's contents into a string. I've read tons of posts and it still gives garbage.
public class TableMain {
/**
* @param args
*/
@SuppressWarnings("deprecation")
public static void main(String[] args) throws Exception {
URL url = null;
URLConnection urlConn = null;
try {
url = new URL("http://svo.aero/timetable/today/");
} catch (MalformedURLException err) {
err.printStackTrace();
}
try {
urlConn = url.openConnection();
} catch (IOException e) {
e.printStackTrace();
}
try {
BufferedReader input = new BufferedReader(new InputStreamReader(
urlConn.getInputStream(), "UTF-8"));
StringBuilder strB = new StringBuilder();
String str;
while (null != (str = input.readLine())) {
strB.append(str).append("\r\n");
System.out.println(str);
}
input.close();
} catch (IOException err) {
err.printStackTrace();
}
}
}
What's wrong? I get something like this
??y??'??)j1???-?q?E?|V??,??< 9??d?Bw(?э?n?v?)i?x?????Z????q?MM3~??????G??љ??l?U3"Y?]????zxxDx????t^???5???j?‌​?k??u?q?j6?^t???????W??????????~?????????o6/?|?8??{???O????0?M>Z{srs??K???XV??4Z‌​??'??n/??^??4????w+?????e???????[?{/??,??WO???????????.?.?x???????^?rax??]?xb??‌​& ??8;?????}???h????H5????v?e?0?????-?????g?vN
Here is a method using HttpClient:
public HttpResponse getResponse(String url) throws IOException {
httpClient.getParams().setParameter("http.protocol.content-charset", "UTF-8");
return httpClient.execute(new HttpGet(url));
}
public String getSource(String url) throws IOException {
StringBuilder sb = new StringBuilder();
HttpResponse response = getResponse(url);
if (response.getEntity() == null) {
throw new IOException("Response entity not set");
}
BufferedReader contentReader = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
String line = contentReader.readLine();
while ( line != null ){
sb.append(line)
.append(NEW_LINE);
line = contentReader.readLine();
}
return sb.toString();
}
Edit: I edited the response to ensure it uses utf-8.
This is a result of two things:
You are fetching data that is UTF-8 encoded.
You didn't specify, but I surmise you are printing it to the console on a Windows system.
The data is being received and stored correctly, but when you print it the destination is incapable of rendering the Russian text. You will not be able to just "print" the text to stdout unless the ultimate display handler is capable of rendering the characters involved.
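If the console really is the limiting factor, one thing to try is printing through a PrintStream that encodes as UTF-8 explicitly instead of using the platform default. This is only a sketch: Utf8Console and its methods are hypothetical names, and whether the characters actually render still depends on the terminal and its font.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps a byte sink in a PrintStream that always encodes text as UTF-8.
    static PrintStream utf8(OutputStream sink) throws UnsupportedEncodingException {
        return new PrintStream(sink, true, "UTF-8");
    }

    // Sends text through such a stream and returns the bytes that were written.
    static byte[] encodeViaPrintStream(String text) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = utf8(sink);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // Russian sample text written as Unicode escapes so the source file stays ASCII.
        out.println("\u041f\u0440\u0438\u0432\u0435\u0442");
    }
}
```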

How to parse a txt web service

I'm trying to obtain a result from a web service in a Java program. I've used XML services before, but this one is text-based and I can't figure out how to read the response.
Here is the web service: http://ws.geonames.org/countryCode?lat=47.03&lng=10.2
Thanks!
If it's only text and it doesn't use any standard format (like SOAP), you can read the response directly through a URLConnection:
URL myURL = new URL("http://ws.geonames.org/countryCode?lat=47.03&lng=10.2");
URLConnection serviceConnection = myURL.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(
serviceConnection.getInputStream()));
List<String> response =new ArrayList<String>();
Use this if the response has many lines:
String inputLine;
while ((inputLine = in.readLine()) != null)
response.add(inputLine);
Or use this if the response has ONLY ONE line (like the web service in your question):
String countryCode = in.readLine();
And finish with:
in.close(); // URLConnection has no close() method; closing the reader releases the stream
In my case, countryCode was "AT".
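The same steps can be folded into one helper that closes the reader automatically. This is a sketch under my own naming (TextResponse, readLines); it takes any Reader, so it can be tried with a StringReader before pointing it at a real connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TextResponse {
    // Reads every line from the source; try-with-resources closes the reader afterwards.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the service's one-line reply.
        List<String> lines = readLines(new StringReader("AT\r\n"));
        System.out.println(lines.get(0)); // AT
    }
}
```

Against the real service, the argument would be new InputStreamReader(serviceConnection.getInputStream()).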
An alternative implementation reads the whole response into a single string:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
public class CountryCodeReader {
private String readAll(Reader rd) throws IOException {
StringBuilder sb = new StringBuilder();
int cp;
while ((cp = rd.read()) != -1) {
sb.append((char) cp);
}
return sb.toString();
}
public String readFromUrl(String url) throws IOException {
// try-with-resources closes the reader and the underlying stream
try (BufferedReader rd = new BufferedReader(
new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
return readAll(rd);
}
}
public static void main(String[] argv) throws IOException {
CountryCodeReader ccr = new CountryCodeReader();
String cc = ccr.readFromUrl("http://ws.geonames.org/countryCode?lat=47.03&lng=10.2");
System.out.println(cc);
}
}

Read url to string in few lines of java code

I'm trying to find Java's equivalent to Groovy's:
String content = "http://www.google.com".toURL().getText();
I want to read content from a URL into string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into apache's HttpClient but I also don't see a one or two line implementation.
Now that some time has passed since the original answer was accepted, there's a better approach:
String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();
If you want a slightly fuller implementation, which is not a single line, do this:
public static String readStringFromURL(String requestURL) throws IOException
{
try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
StandardCharsets.UTF_8.toString()))
{
scanner.useDelimiter("\\A");
return scanner.hasNext() ? scanner.next() : "";
}
}
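In case the "\\A" delimiter looks like magic: \A is the regex anchor for the beginning of input, so it never matches anywhere inside the content, and the Scanner returns the whole stream as one token. Here is a small sketch that tries this on an in-memory stream rather than a URL; ScannerSlurp and slurp are names I made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ScannerSlurp {
    // Reads the entire stream as a single token, decoding it as UTF-8.
    static String slurp(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            // An empty stream has no token at all, hence the hasNext() check.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        byte[] page = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        // Line breaks are preserved because they are not delimiters here.
        System.out.print(slurp(new ByteArrayInputStream(page)));
    }
}
```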
This answer refers to an older version of Java. You may want to look at ccleve's answer.
Here is the traditional way to do this:
import java.net.*;
import java.io.*;
public class URLConnectionReader {
public static String getText(String url) throws Exception {
URL website = new URL(url);
URLConnection connection = website.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
connection.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null)
response.append(inputLine);
in.close();
return response.toString();
}
public static void main(String[] args) throws Exception {
String content = URLConnectionReader.getText(args[0]);
System.out.println(content);
}
}
As @extraneon has suggested, IOUtils allows you to do this in a very elegant way that's still in the Java spirit:
InputStream in = new URL( "http://jakarta.apache.org" ).openStream();
try {
System.out.println( IOUtils.toString( in ) );
} finally {
IOUtils.closeQuietly(in);
}
Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.
There's an even better way as of Java 9:
URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
Like the original groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)
Now that more time has passed, here's a way to do it in Java 8:
URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
pageText = reader.lines().collect(Collectors.joining("\n"));
}
Additional example using Guava:
URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);
Java 11+:
URI uri = URI.create("http://www.google.com");
HttpRequest request = HttpRequest.newBuilder(uri).build();
String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();
If you have the input stream (see Joe's answer), also consider IOUtils.toString(inputStream).
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
The following works with Java 7/8, secure urls, and shows how to add a cookie to your request as well. Note this is mostly a direct copy of this other great answer on this page, but added the cookie example, and clarification in that it works with secure urls as well ;-)
If you need to connect to a server with an invalid certificate or self signed certificate, this will throw security errors unless you import the certificate. If you need this functionality, you could consider the approach detailed in this answer to this related question on StackOverflow.
Example
String result = getUrlAsString("https://www.google.com");
System.out.println(result);
outputs
<!doctype html><html itemscope="" .... etc
Code
import java.net.URL;
import java.net.URLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;
public static String getUrlAsString(String url)
{
try
{
URL urlObj = new URL(url);
URLConnection con = urlObj.openConnection();
con.setDoInput(true); // we want to read the response (true is already the default)
con.setRequestProperty("Cookie", "myCookie=test123");
con.connect();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
String newLine = System.getProperty("line.separator");
while ((inputLine = in.readLine()) != null)
{
response.append(inputLine + newLine);
}
in.close();
return response.toString();
}
catch (Exception e)
{
throw new RuntimeException(e);
}
}
Here's Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:
private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
String urlData = "";
URL urlObj = new URL(aUrl);
URLConnection conn = urlObj.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)))
{
urlData = reader.lines().collect(Collectors.joining("\n"));
}
return urlData;
}
URL to String in pure Java
Example call to get payload from http get call
String str = getStringFromUrl(new URL("YourUrl"));
Implementation
You can use the method described in this answer, on How to read URL to an InputStream and combine it with this answer on How to read InputStream to String.
The outcome will be something like
public String getStringFromUrl(URL url) throws IOException {
return inputStreamToString(urlToInputStream(url,null));
}
public String inputStreamToString(InputStream inputStream) throws IOException {
try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
byte[] buffer = new byte[1024];
int length;
while ((length = inputStream.read(buffer)) != -1) {
result.write(buffer, 0, length);
}
return result.toString(UTF_8);
}
}
private InputStream urlToInputStream(URL url, Map<String, String> args) {
HttpURLConnection con = null;
InputStream inputStream = null;
try {
con = (HttpURLConnection) url.openConnection();
con.setConnectTimeout(15000);
con.setReadTimeout(15000);
if (args != null) {
for (Entry<String, String> e : args.entrySet()) {
con.setRequestProperty(e.getKey(), e.getValue());
}
}
con.connect();
int responseCode = con.getResponseCode();
/* By default the connection will follow redirects. The following
* block is only entered if the implementation of HttpURLConnection
* does not perform the redirect. The exact behavior depends on
* the actual implementation (e.g. sun.net).
* !!! Attention: This block allows the connection to
* switch protocols (e.g. HTTP to HTTPS), which is <b>not</b>
* default behavior. See: https://stackoverflow.com/questions/1884230
* for more info!!!
*/
if (responseCode < 400 && responseCode > 299) {
String redirectUrl = con.getHeaderField("Location");
try {
URL newUrl = new URL(redirectUrl);
return urlToInputStream(newUrl, args);
} catch (MalformedURLException e) {
URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
return urlToInputStream(newUrl, args);
}
}
/*!!!!!*/
inputStream = con.getInputStream();
return inputStream;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Pros
It is pure Java
It can be easily enhanced by adding different headers as a map (instead of passing a null object, like the example above does), authentication, etc.
Handling of protocol switches is supported
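One possible refinement to the redirect block above: it falls back to string concatenation when the Location header is not an absolute URL, which only covers paths that start with a slash. A sketch of an alternative is to let java.net.URI resolve the header against the request URL, which handles absolute, root-relative and document-relative values alike. RedirectTarget and resolve are hypothetical names, and the example.com addresses are placeholders.

```java
import java.net.URI;

class RedirectTarget {
    // Resolves a Location header value against the URL that was requested.
    // Absolute values are returned as they are; relative ones are resolved against the request URL.
    static URI resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location);
    }

    public static void main(String[] args) {
        String request = "http://example.com/a/b.html";
        System.out.println(resolve(request, "/c/d.html"));
        System.out.println(resolve(request, "d.html"));
        System.out.println(resolve(request, "https://other.example/x"));
    }
}
```

The resulting URI can be turned back into a URL with toURL() before the recursive call.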
