Read utf-8 url to string java

Good day. I have just switched from Objective-C to Java and am trying to read a URL's contents into a string. I have read tons of posts, but the output is still garbage.
public class TableMain {
/**
* @param args
*/
@SuppressWarnings("deprecation")
public static void main(String[] args) throws Exception {
URL url = null;
URLConnection urlConn = null;
try {
url = new URL("http://svo.aero/timetable/today/");
} catch (MalformedURLException err) {
err.printStackTrace();
}
try {
urlConn = url.openConnection();
} catch (IOException e) {
e.printStackTrace();
}
try {
BufferedReader input = new BufferedReader(new InputStreamReader(
urlConn.getInputStream(), "UTF-8"));
StringBuilder strB = new StringBuilder();
String str;
while (null != (str = input.readLine())) {
strB.append(str).append("\r\n");
System.out.println(str);
}
input.close();
} catch (IOException err) {
err.printStackTrace();
}
}
}
What's wrong? I get something like this
??y??'??)j1???-?q?E?|V??,??< 9??d?Bw(?э?n?v?)i?x?????Z????q?MM3~??????G??љ??l?U3"Y?]????zxxDx????t^???5???j?‌​?k??u?q?j6?^t???????W??????????~?????????o6/?|?8??{???O????0?M>Z{srs??K???XV??4Z‌​??'??n/??^??4????w+?????e???????[?{/??,??WO???????????.?.?x???????^?rax??]?xb??‌​& ??8;?????}???h????H5????v?e?0?????-?????g?vN

Here is a method using HttpClient:
public HttpResponse getResponse(String url) throws IOException {
httpClient.getParams().setParameter("http.protocol.content-charset", "UTF-8");
return httpClient.execute(new HttpGet(url));
}
public String getSource(String url) throws IOException {
StringBuilder sb = new StringBuilder();
HttpResponse response = getResponse(url);
if (response.getEntity() == null) {
throw new IOException("Response entity not set");
}
BufferedReader contentReader = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
String line = contentReader.readLine();
while ( line != null ){
sb.append(line)
.append(NEW_LINE);
line = contentReader.readLine();
}
return sb.toString();
}
Edit: I edited the response to ensure it uses utf-8.

This is the result of two things:
You are fetching data that is UTF-8 encoded.
You didn't specify, but I surmise you are printing it to the console on a Windows system.
The data is being received and stored correctly, but when you print it, the destination cannot render the Russian text. You will not be able to just "print" the text to stdout unless the ultimate display handler is capable of rendering the characters involved.
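To make the point above concrete, here is a minimal sketch, assuming the garbage really does come from the console's encoding rather than from the download itself. It wraps an output stream in a PrintStream that always encodes as UTF-8. The class name Utf8Console and the method utf8Stream are made up for illustration, and whether the characters actually display still depends on the terminal being configured for UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class Utf8Console {

    // Wraps any output stream so that text written to it is encoded as UTF-8,
    // regardless of the platform's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // A Russian greeting written with Unicode escapes, so the source file's
        // own encoding cannot interfere with the demonstration.
        out.println("\u041f\u0440\u0438\u0432\u0435\u0442");
    }
}
```

If the bytes written are correct UTF-8 but the console still shows question marks, the remaining problem is on the display side, which is what the answer above suspects.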

Related

Reading txt file online

Consider following
Code
private String url = "https://celestrak.org/NORAD/elements/resource.txt";
@Override
public Boolean crawl() {
try {
// Timeout is set to 20s
Connection connection = Jsoup.connect(url).userAgent(USER_AGENT).timeout(20 * 1000);
Document htmlDocument = connection.get();
// 200 is the HTTP OK status code
if (connection.response().statusCode() == 200) {
System.out.println("\n**Visiting** Received web page at " + url);
} else {
System.out.println("\n**Failure** Web page not received at " + url);
return Boolean.FALSE;
}
if (!connection.response().contentType().contains("text/plain")) {
System.out.println("**Failure** Retrieved something other than plain text");
return Boolean.FALSE;
}
System.out.println(htmlDocument.text()); // Here it prints the whole text file on one line
} catch (IOException ioe) {
// We were not successful in our HTTP request
System.err.println(ioe);
return Boolean.FALSE;
}
return Boolean.TRUE;
}
Output
SCD 1 1 22490U 93009B 16329.83043855 .00000228 00000-0 12801-4 0 9993 2 22490 24.9691 122.2579 0043025 337.9285 169.5838 14.44465946256021 TECHSAT 1B (GO-32) 1 25397U ....
I am trying to read an online text file (from https://celestrak.org/NORAD/elements/resource.txt). The problem is that when I print or save the body's text, the whole file comes out on one line. I want it split on \n so that I can read it line by line. Am I making a mistake in how I read the file?
I am using JSoup.
You can do it without jsoup in the following manner:
public static void main(String[] args) {
String data;
try {
data = IOUtils.toString(new URL("https://celestrak.com/NORAD/elements/resource.txt"));
for (String line : data.split("\n")) {
System.out.println(line);
}
} catch (IOException e1) {
e1.printStackTrace();
}
}
The above code uses org.apache.commons.io.IOUtils.
If adding the Commons library is an issue, you can use the code below:
public static void main(String[] args) {
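// The JDK has no URLReader class; wrap the URL's stream in an InputStreamReader instead
InputStreamReader reader;
try {
reader = new InputStreamReader(new URL("https://celestrak.com/NORAD/elements/resource.txt").openStream());
BufferedReader bufferedReader = new BufferedReader(reader);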
String sCurrentLine;
while ((sCurrentLine = bufferedReader.readLine()) != null) {
System.out.println(sCurrentLine);
}
bufferedReader.close();
} catch (MalformedURLException e1) {
e1.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Since the file is already delimited by line separators, we can simply take the input stream from the URL and read the contents:
String url = "https://celestrak.com/NORAD/elements/resource.txt";
List<String> text = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines().collect(Collectors.toList());
To convert to a String
String content = new BufferedReader(new InputStreamReader(new URL(url).openStream())).lines()
.collect(Collectors.joining(System.getProperty("line.separator")));
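The two snippets above decode with the platform's default charset. As a hedged variation, the sketch below names the charset explicitly and closes the reader when done. It assumes the file is UTF-8 (the thread does not say), the class name LineSplitter is invented for illustration, and an in-memory sample stands in for new URL(url).openStream() so it runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answers.
final class LineSplitter {

    // Reads the whole stream as UTF-8 (an assumption) and returns one entry per line.
    // BufferedReader treats \n, \r and \r\n as line terminators and strips them.
    static List<String> readLines(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample text loosely shaped like the file in the question, with mixed line endings.
        byte[] sample = "SCD 1\r\n1 22490U 93009B\n2 22490 24.9691\n"
                .getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(sample))) {
            System.out.println(line);
        }
    }
}
```

To read the real file, pass new URL(url).openStream() to readLines instead of the sample stream.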

Reading HttpURLConnection

I've been trying to figure out how to read a HttpURLConnection. According to this example: http://www.vogella.com/tutorials/AndroidNetworking/article.html , the following code should work. However, readStream never fires, and I'm not logging any lines.
I do get that the InputStream is passed through the buffer and all, but for me the logic breaks down in the readStream method, mostly at the empty string 'line' and the while statement. What exactly is happening there, or should happen there, and how would I be able to fix it? Also, why do I have to create the URL inside the try statement? Otherwise it gives an unhandled exception: java.net.MalformedURLException.
Thanks in advance!
static String SendURL(){
try {
URL url = new URL("http://www.google.com/");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
readStream (con.getInputStream());
} catch (Exception e) {
e.printStackTrace();
}
return ("Done");
}
static void readStream(InputStream in) {
BufferedReader reader = null;
try {
reader = new BufferedReader(new InputStreamReader(in));
String line = "";
while ((line = reader.readLine()) != null) {
Log.i("Tag", line);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
There are a bunch of things wrong with the code I posted in the question. Here is a working example:
public class GooglePlaces extends AsyncTask {
public InputStream inputStream;
public GooglePlaces(Context context) {
String url = "https://www.google.com";
try {
HttpRequest httpRequest = requestFactory.buildGetRequest(new GenericUrl(url));
HttpResponse httpResponse = httpRequest.execute();
inputStream = httpResponse.getContent();
} catch (IOException e) {
e.printStackTrace();
}
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
StringBuilder builder = new StringBuilder();
try {
for (String line = null; (line = bufferedReader.readLine()) != null;) {
builder.append(line).append("\n");
Log.i("GooglePlacesTag", line);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
It appears you are not connecting your HttpURLConnection. Try con.connect().
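The question also asks what the while loop does and why the URL has to be created inside a try block. The sketch below is one way to illustrate both, under stated assumptions: the class ReadStreamSketch and its methods are hypothetical, the stream is assumed to be UTF-8, and an in-memory stream replaces the network call so the example runs anywhere.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original thread.
final class ReadStreamSketch {

    // new URL(String) declares the checked MalformedURLException, so the compiler
    // requires the caller to catch it or declare it. That is why the question's
    // code only compiles with the constructor inside a try block.
    static URL toUrl(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    // readLine() returns the next line without its terminator, or null once the
    // stream is exhausted. The loop assigns and tests in a single expression.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toUrl("http://www.google.com/").getHost());
        InputStream sample = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(sample));
    }
}
```

The initial value of line in the question's code (an empty string) is overwritten by the first readLine() call, so it has no effect on the loop.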

Why got 405 error when post data to website by using Java?

I am trying to post data to Google using the following code, but I always get a 405 error. Can anybody tell me why?
package com.tom.labs;
import java.net.*;
import java.io.*;
public class JavaHttp {
public static void main(String[] args) throws Exception {
File data = new File("D:\\in.txt");
File result = new File("D:\\out.txt");
FileOutputStream out = new FileOutputStream(result);
OutputStreamWriter writer = new OutputStreamWriter(out);
Reader reader = new InputStreamReader(new FileInputStream(data));
postData(reader,new URL("http://google.com"),writer);//Not working
//postData(reader,new URL("http://google.com/search"),writer);//Not working
sendGetRequest("http://google.com/search", "q=Hello");//Works properly
}
public static String sendGetRequest(String endpoint,
String requestParameters) {
String result = null;
if (endpoint.startsWith("http://")) {
// Send a GET request to the servlet
try {
// Send data
String urlStr = endpoint;
if (requestParameters != null && requestParameters.length() > 0) {
urlStr += "?" + requestParameters;
}
URL url = new URL(urlStr);
URLConnection conn = url.openConnection();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
StringBuffer sb = new StringBuffer();
String line;
while ((line = rd.readLine()) != null) {
sb.append(line);
}
rd.close();
result = sb.toString();
} catch (Exception e) {
e.printStackTrace();
}
}
System.out.println(result);
return result;
}
/**
* Reads data from the data reader and posts it to a server via POST
* request. data - The data you want to send endpoint - The server's address
* output - writes the server's response to output
*
* @throws Exception
*/
public static void postData(Reader data, URL endpoint, Writer output)
throws Exception {
HttpURLConnection urlc = null;
try {
urlc = (HttpURLConnection) endpoint.openConnection();
try {
urlc.setRequestMethod("POST");
} catch (ProtocolException e) {
throw new Exception(
"Shouldn't happen: HttpURLConnection doesn't support POST??",
e);
}
urlc.setDoOutput(true);
urlc.setDoInput(true);
urlc.setUseCaches(false);
urlc.setAllowUserInteraction(false);
urlc.setRequestProperty("Content-type", "text/xml; charset=UTF-8");
OutputStream out = urlc.getOutputStream();
try {
Writer writer = new OutputStreamWriter(out, "UTF-8");
pipe(data, writer);
writer.close();
} catch (IOException e) {
throw new Exception("IOException while posting data", e);
} finally {
if (out != null)
out.close();
}
InputStream in = urlc.getInputStream();
try {
Reader reader = new InputStreamReader(in);
pipe(reader, output);
reader.close();
} catch (IOException e) {
throw new Exception("IOException while reading response", e);
} finally {
if (in != null)
in.close();
}
} catch (IOException e) {
e.printStackTrace();
throw new Exception("Connection error (is server running at "
+ endpoint + " ?): " + e);
} finally {
if (urlc != null)
urlc.disconnect();
}
}
/**
* Pipes everything from the reader to the writer via a buffer
*/
private static void pipe(Reader reader, Writer writer) throws IOException {
char[] buf = new char[1024];
int read = 0;
while ((read = reader.read(buf)) >= 0) {
writer.write(buf, 0, read);
}
writer.flush();
}
}
405 means "method not allowed". For example, if you try to POST to a URL that doesn't allow POST, then the server will return a 405 status.
What are you trying to do by making a POST request to Google? I suspect that Google's home page only allows GET, HEAD, and maybe OPTIONS.
Here's the body of the response to a POST request to Google, containing Google's explanation:
405. That’s an error.
The request method POST is inappropriate for the URL /. That’s all we know.
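One way to surface this earlier is to look at the status code before reading the body, since getInputStream() throws an IOException for a 405. The sketch below is illustrative only: StatusCheck, explain and check are made-up names, and check is not called from main because it needs a live connection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

// Hypothetical helper, not part of the original answer.
final class StatusCheck {

    // Turns a status code into a short message. HTTP_BAD_METHOD is the JDK's
    // constant for 405 ("method not allowed").
    static String explain(int status, String method) {
        if (status == HttpURLConnection.HTTP_BAD_METHOD) {
            return "405: the server does not allow " + method + " on this URL";
        }
        if (status >= 200 && status < 300) {
            return status + ": success";
        }
        return status + ": request failed";
    }

    // Sketch of how the check could be applied to a real connection.
    // getResponseCode() sends the request if it has not been sent yet.
    static String check(HttpURLConnection conn) throws IOException {
        return explain(conn.getResponseCode(), conn.getRequestMethod());
    }

    public static void main(String[] args) {
        System.out.println(explain(405, "POST"));
        System.out.println(explain(200, "GET"));
    }
}
```

In the question's postData method, a call like check(urlc) placed before urlc.getInputStream() would report the 405 directly instead of ending up in the generic "Connection error" handler.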

Reading from a URL Connection Java

I'm trying to read html code from a URL Connection. In one case the html file I'm trying to read includes 5 line breaks before the actual doc type declaration. In this case the input reader throws an exception for EOF.
URL pageUrl =
new URL(
"http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html"
);
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
//some read method here
Has anyone run into a problem like this?
URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
String urlData = "";
while ((urlData = dis.readUTF()) != null)
System.out.println(urlData);
//exception thrown
java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
at java.io.DataInputStream.readUTF(DataInputStream.java:572)
at java.io.DataInputStream.readUTF(DataInputStream.java:547)
In the case of BufferedReader, it just returns null and doesn't continue:
pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(getConn.getInputStream()));
String urlData = "";
while(true)
urlData = br.readLine();
System.out.println(urlData);
outputs null
You're using DataInputStream to read data that wasn't encoded using DataOutputStream. Examine the documented behavior for your call to DataInputStream#readUTF(): it first reads two bytes to form a 16-bit integer, indicating the number of bytes that follow comprising the UTF-encoded string. The data you're reading from the HTTP server is not encoded in this format.
Instead, the HTTP server is sending headers encoded in ASCII, per RFC 2616 sections 6.1 and 2.2. You need to read the headers as text, and then determine how the message body (the "entity") is encoded.
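The length-prefix format described above can be seen with a small in-memory experiment. This is a sketch for illustration, with made-up names (ReadUtfDemo, encode, decode); the five line breaks are borrowed from the question as an example of bytes that were never written by writeUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
final class ReadUtfDemo {

    // Encodes a string the way readUTF() expects: a 2-byte length, then the bytes.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Decodes bytes with readUTF(); this only works on writeUTF() output.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = encode("hi");
        System.out.println(Arrays.toString(framed)); // [0, 2, 104, 105]
        System.out.println(decode(framed));          // hi

        // Plain text has no length prefix. Here the first two bytes (0x0A 0x0A)
        // are misread as a length of 2570, and only three bytes remain.
        byte[] plain = "\n\n\n\n\n".getBytes(StandardCharsets.US_ASCII);
        try {
            decode(plain);
        } catch (EOFException e) {
            System.out.println("EOFException, as in the question");
        }
    }
}
```

For text that arrives over HTTP, an InputStreamReader with the right charset is the appropriate decoder, as the other answers show.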
This works fine:
package url;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
/**
* UrlReader
* @author Michael
* @since 3/20/11
*/
public class UrlReader
{
public static void main(String[] args)
{
UrlReader urlReader = new UrlReader();
for (String url : args)
{
try
{
String contents = urlReader.readContents(url);
System.out.printf("url: %s contents: %s\n", url, contents);
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
public String readContents(String address) throws IOException
{
StringBuilder contents = new StringBuilder(2048);
BufferedReader br = null;
try
{
URL url = new URL(address);
br = new BufferedReader(new InputStreamReader(url.openStream()));
String line;
// Test for null before appending, so the end-of-stream marker never lands in the result as the text "null"
while ((line = br.readLine()) != null)
{
contents.append(line);
}
}
finally
{
close(br);
}
return contents.toString();
}
private static void close(Reader br)
{
try
{
if (br != null)
{
br.close();
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
This:
public class Main {
public static void main(String[] args)
throws MalformedURLException, IOException
{
URL pageUrl = new URL("http://www.google.com");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader dis = new BufferedReader(
new InputStreamReader(
getConn.getInputStream()));
String myString;
while ((myString = dis.readLine()) != null)
{
System.out.println(myString);
}
}
}
Works perfectly. The URL you are supplying, however, returns nothing.

Read url to string in few lines of java code

I'm trying to find Java's equivalent to Groovy's:
String content = "http://www.google.com".toURL().getText();
I want to read content from a URL into string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into apache's HttpClient but I also don't see a one or two line implementation.
Now that some time has passed since the original answer was accepted, there's a better approach:
String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();
If you want a slightly fuller implementation, which is not a single line, do this:
public static String readStringFromURL(String requestURL) throws IOException
{
try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
StandardCharsets.UTF_8.toString()))
{
scanner.useDelimiter("\\A");
return scanner.hasNext() ? scanner.next() : "";
}
}
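Why the "\\A" delimiter reads everything: \A is the regex anchor for the beginning of the input, so it can match only once, at position zero, and the first token is therefore the entire stream. The sketch below shows the same idea against an in-memory stream so that it runs without a network; ScannerSlurp and readAll are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper, not part of the original answer.
final class ScannerSlurp {

    // Reads the whole stream as one token by using the start-of-input anchor
    // as the delimiter. An empty stream has no token, hence the hasNext() guard.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        InputStream sample = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(sample));
    }
}
```

Line breaks inside the content are preserved because they are no longer treated as delimiters.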
This answer refers to an older version of Java. You may want to look at ccleve's answer.
Here is the traditional way to do this:
import java.net.*;
import java.io.*;
public class URLConnectionReader {
public static String getText(String url) throws Exception {
URL website = new URL(url);
URLConnection connection = website.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
connection.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null)
response.append(inputLine);
in.close();
return response.toString();
}
public static void main(String[] args) throws Exception {
String content = URLConnectionReader.getText(args[0]);
System.out.println(content);
}
}
As @extraneon has suggested, IOUtils allows you to do this in a very elegant way that's still in the Java spirit:
InputStream in = new URL( "http://jakarta.apache.org" ).openStream();
try {
System.out.println( IOUtils.toString( in ) );
} finally {
IOUtils.closeQuietly(in);
}
Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.
There's an even better way as of Java 9:
URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
Like the original groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)
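As a rough idea of what "figuring out the encoding" could look like, the sketch below picks the charset out of a Content-Type header value such as the one returned by URLConnection.getContentType(). It is a simplified assumption-laden example: CharsetSniffer and charsetOf are invented names, the parsing ignores edge cases of the header grammar, and falling back to UTF-8 is a choice made here, not something the answer prescribes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the original answer.
final class CharsetSniffer {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or UTF-8 when none is usable.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: use the fallback below.
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/html"));
    }
}
```

With a real connection, the result could be passed to new String(in.readAllBytes(), charset) in place of the hard-coded StandardCharsets.UTF_8.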
Now that more time has passed, here's a way to do it in Java 8:
URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
pageText = reader.lines().collect(Collectors.joining("\n"));
}
Additional example using Guava:
URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);
Java 11+:
URI uri = URI.create("http://www.google.com");
HttpRequest request = HttpRequest.newBuilder(uri).build();
String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();
If you have the input stream (see Joe's answer), also consider IOUtils.toString(inputStream).
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)
The following works with Java 7/8, secure urls, and shows how to add a cookie to your request as well. Note this is mostly a direct copy of this other great answer on this page, but added the cookie example, and clarification in that it works with secure urls as well ;-)
If you need to connect to a server with an invalid certificate or self signed certificate, this will throw security errors unless you import the certificate. If you need this functionality, you could consider the approach detailed in this answer to this related question on StackOverflow.
Example
String result = getUrlAsString("https://www.google.com");
System.out.println(result);
outputs
<!doctype html><html itemscope="" .... etc
Code
import java.net.URL;
import java.net.URLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;
public static String getUrlAsString(String url)
{
try
{
URL urlObj = new URL(url);
URLConnection con = urlObj.openConnection();
con.setDoInput(true); // we want to read the response (true is already the default)
con.setRequestProperty("Cookie", "myCookie=test123");
con.connect();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
String newLine = System.getProperty("line.separator");
while ((inputLine = in.readLine()) != null)
{
response.append(inputLine + newLine);
}
in.close();
return response.toString();
}
catch (Exception e)
{
throw new RuntimeException(e);
}
}
Here's Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:
private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
String urlData = "";
URL urlObj = new URL(aUrl);
URLConnection conn = urlObj.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)))
{
urlData = reader.lines().collect(Collectors.joining("\n"));
}
return urlData;
}
URL to String in pure Java
Example call to get payload from http get call
String str = getStringFromUrl(new URL("YourUrl"));
Implementation
You can use the method described in this answer, on How to read URL to an InputStream and combine it with this answer on How to read InputStream to String.
The outcome will be something like
public String getStringFromUrl(URL url) throws IOException {
return inputStreamToString(urlToInputStream(url,null));
}
public String inputStreamToString(InputStream inputStream) throws IOException {
try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
byte[] buffer = new byte[1024];
int length;
while ((length = inputStream.read(buffer)) != -1) {
result.write(buffer, 0, length);
}
return result.toString(StandardCharsets.UTF_8.name());
}
}
private InputStream urlToInputStream(URL url, Map<String, String> args) {
HttpURLConnection con = null;
InputStream inputStream = null;
try {
con = (HttpURLConnection) url.openConnection();
con.setConnectTimeout(15000);
con.setReadTimeout(15000);
if (args != null) {
for (Entry<String, String> e : args.entrySet()) {
con.setRequestProperty(e.getKey(), e.getValue());
}
}
con.connect();
int responseCode = con.getResponseCode();
/* By default the connection will follow redirects. The following
* block is only entered if the implementation of HttpURLConnection
* does not perform the redirect. The exact behavior depends on
* the actual implementation (e.g. sun.net).
* !!! Attention: This block allows the connection to
* switch protocols (e.g. HTTP to HTTPS), which is <b>not</b>
* default behavior. See: https://stackoverflow.com/questions/1884230
* for more info!!!
*/
if (responseCode < 400 && responseCode > 299) {
String redirectUrl = con.getHeaderField("Location");
try {
URL newUrl = new URL(redirectUrl);
return urlToInputStream(newUrl, args);
} catch (MalformedURLException e) {
URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
return urlToInputStream(newUrl, args);
}
}
/*!!!!!*/
inputStream = con.getInputStream();
return inputStream;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Pros
It is pure java
It can be easily enhanced by adding different headers as a map (instead of passing a null object, like the example above does), authentication, etc.
Handling of protocol switches is supported
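The header map and the redirect handling mentioned in the list above can be sketched in isolation. The names RequestTweaks, applyHeaders and resolveRedirect are hypothetical, and example.com is a placeholder host. Note one deliberate substitution: instead of the string concatenation used in the answer's fallback, this sketch resolves the Location value with new URL(base, location), which also keeps the port of the original URL.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
final class RequestTweaks {

    // Copies each map entry onto the connection as a request header.
    // Must be called before the connection is actually opened.
    static void applyHeaders(URLConnection con, Map<String, String> headers) {
        for (Map.Entry<String, String> e : headers.entrySet()) {
            con.setRequestProperty(e.getKey(), e.getValue());
        }
    }

    // Resolves a Location header against the URL that produced it.
    // Absolute values replace the base; relative ones keep its scheme, host and port.
    static URL resolveRedirect(URL base, String location) throws MalformedURLException {
        return new URL(base, location);
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://example.com:8080/timetable/today/");

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html");
        headers.put("User-Agent", "demo-client");

        // openConnection() only creates the connection object; nothing is sent yet.
        URLConnection con = url.openConnection();
        applyHeaders(con, headers);
        System.out.println(con.getRequestProperty("Accept"));

        System.out.println(resolveRedirect(url, "/login"));
        System.out.println(resolveRedirect(url, "https://example.org/x"));
    }
}
```

A map built this way could be passed as the second argument of urlToInputStream in place of null.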
