Getting weird characters from a web page - java

I am writing a program in Java that parses some text from a web page. But when I use the code below I get weird/incorrect characters.
code:
URL url = new URL(getSearchUrl(crit));
URLConnection connection = url.openConnection();
BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getInputStream(), "UTF-8"));
String line;
while((line = br.readLine()) != null){
System.out.println(line);
}
br.close();
I get the following output:
?}?v?8????...
So what am I doing wrong? I know that the site I want to gather info from uses utf-8.
Edit: I am currently in Croatia. I tried another program that I know worked in Serbia (my home country), but it doesn't work here.

It's gzipped. You can confirm this with connection.getContentEncoding().
If you wrap connection.getInputStream() in a GZIPInputStream, it should work:
BufferedReader br = new BufferedReader(
new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));
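Not every server compresses every response, so wrapping unconditionally can fail on a plain body. Below is a minimal sketch that checks the Content-Encoding header first. The class name GzipAwareReader and the helpers decode and printPage are made up for illustration, and UTF-8 is assumed because the question says the site uses it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class GzipAwareReader {

    // Wraps the raw stream in a GZIPInputStream only when the server
    // reported "gzip" as the Content-Encoding; otherwise returns it as-is.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Hypothetical helper: prints every line of the page at pageUrl.
    static void printPage(String pageUrl) throws IOException {
        URLConnection connection = new URL(pageUrl).openConnection();
        InputStream body = decode(connection.getInputStream(),
                connection.getContentEncoding());
        // try-with-resources closes the reader and the underlying streams.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

This only handles gzip; other encodings such as deflate would need their own branch.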

Related

reading remote csv file without downloading

I need to read a large remote CSV file line by line (basically streaming) and persist each record to the database right after it is read. I currently do this with the code below, but I am not sure whether it downloads the complete file and keeps it in JVM memory. I assume it does not. Can I write this code in a better way using Java 8 stream features?
URL url = new URL(baseurl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
if(connection.getResponseCode() == 200)
{
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String current;
while((current = in.readLine()) != null)
{
persist(current);
}
}
First you should use a try-with-resources statement to automatically close your streams when reading is done.
Next BufferedReader has a method BufferedReader::lines which returns a Stream<String>.
Then your code should look like this:
URL url = new URL(baseurl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
if (connection.getResponseCode() == 200) {
try (InputStreamReader streamReader = new InputStreamReader(connection.getInputStream());
BufferedReader br = new BufferedReader(streamReader);
Stream<String> lines = br.lines()) {
lines.forEach(s -> persist(s)); //should be a method reference
}
}
Now it's up to you to decide if the code is better and your assumption is right that you don't keep the whole file in the JVM.
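For reference, here is a small sketch of the same idea with the method reference in place. The names CsvLineStreamer and forEachLine are hypothetical, and a StringReader stands in for the network stream so the example runs offline. As far as I know, BufferedReader reads through a fixed-size buffer and lines() is lazy, so only the current chunk should be held in memory rather than the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class CsvLineStreamer {

    // Hands each line to the persist callback as soon as it is read.
    static void forEachLine(Reader source, Consumer<String> persist) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            br.lines().forEach(persist);
        } catch (UncheckedIOException e) {
            // lines() wraps read failures; unwrap to the original IOException.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for persist(String): collect the records in a list.
        List<String> saved = new ArrayList<>();
        forEachLine(new StringReader("id,name\n1,Ana\n2,Marko"), saved::add);
        System.out.println(saved.size()); // prints 3
    }
}
```

With the real connection you would pass new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8) as the source and this::persist (or YourClass::persist) as the callback.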

Reading czech input using InputStreamReader

I am writing a server whose protocol translates text sent by a client.
The code that reads the input is as follows:
InputStreamReader isr = new InputStreamReader(clntSock.getInputStream(), "CP852");
BufferedReader br = new BufferedReader(isr);
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
The problem is that when I print the input (to check that it is received correctly), I get something like "îAU" instead of "ČAU".
I know I could use byte conversion, but I wanted to do it this way and can't find the error. Please help.
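One possible explanation, offered as a guess rather than a diagnosis: the client may be sending UTF-8 while the server decodes as CP852. In UTF-8 the letter Č is the two bytes 0xC4 0x8C, and CP852 maps 0x8C to î, which would match the "îAU" symptom. The sketch below decodes the same bytes both ways. The class name CzechDecodingCheck is made up, the client's charset is an assumption, and "IBM852" is the java.nio name for CP852.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CzechDecodingCheck {

    // Decodes the given bytes using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "\u010C" is the letter Č. ASSUMPTION: the client encodes it as UTF-8.
        byte[] sent = "\u010CAU".getBytes(StandardCharsets.UTF_8);

        // Decoded with the sender's charset: the original text comes back.
        System.out.println(decode(sent, "UTF-8"));

        // Decoded as CP852, as in the question: the second byte becomes î.
        System.out.println(decode(sent, "IBM852"));
    }
}
```

If that turns out to be the cause, passing "UTF-8" to the InputStreamReader instead of "CP852" should fix the decoding. Note that what System.out shows also depends on the console's own encoding.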

Fetch source code of web page using java?

I have a URL like this and the following method
public static void saveContent( String webURL )throws Exception
{
URL website = new URL(webURL);
URLConnection connection = website.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
connection.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null)
response.append(inputLine);
in.close();
System.out.println(response.toString());
}
However, when I print the web content, it always fetches the source code of the main page (www.google.com).
How can I solve this problem? Thanks for your help.
I copied your code into NetBeans and it seems to work correctly. I think the problem may lie in the value passed as the "webURL" argument. Try running your app in debug mode and check what value you actually get there.

How can I send POST data through url.openStream()?

I'm looking for a tutorial or a quick example of how to send POST data through openStream.
My code is:
URL url = new URL("http://localhost:8080/test");
InputStream response = url.openStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(response, "UTF-8"));
Could you help me?
URL url = new URL(urlSpec);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod(method);
connection.setDoOutput(true);
connection.setDoInput(true);
// important: get output stream before input stream
OutputStream out = connection.getOutputStream();
out.write(content);
out.close();
// now you can get input stream and read.
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = null;
while ((line = reader.readLine()) != null) {
writer.println(line);
}
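The snippet above leaves method, content and writer undefined. Here is a fuller sketch that assumes the data is an HTML-style form (application/x-www-form-urlencoded) and that the response is UTF-8 text. The class FormPoster and its methods encodeForm and post are hypothetical names, not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class FormPoster {

    // Builds a form body such as "name=John+Doe&q=a%26b".
    static String encodeForm(Map<String, String> fields) throws IOException {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), "UTF-8")
                    + "=" + URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    // Sends the fields as a POST request and returns the response body.
    static String post(String urlSpec, Map<String, String> fields) throws IOException {
        byte[] content = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) new URL(urlSpec).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // required before writing a request body
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");

        // Write the body first; asking for the input stream sends the request.
        try (OutputStream out = connection.getOutputStream()) {
            out.write(content);
        }

        StringBuilder response = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                response.append(line).append('\n');
            }
        }
        return response.toString();
    }
}
```

Usage would be something like FormPoster.post("http://localhost:8080/test", fields), where fields is a Map of the form values to send.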
Use Apache HTTP Components: http://hc.apache.org/httpcomponents-client-ga/
tutorial: http://hc.apache.org/httpcomponents-client-ga/tutorial/html/fundamentals.html
Look for HttpPost - there are some examples of sending dynamic data, text, files and form data.
Apache HTTP Components, in particular the Client, would be the best way to go.
It abstracts away a lot of the nasty coding you would normally have to do by hand.

WebRequest using C# (VS2008) works perfectly but not in Java (Eclipse)

I'm trying to read data from a webpage, and I have to do it using Java.
When I try to do it in Eclipse using Java, I get a timeout error:
java.net.ConnectException: Connection timed out: connect
(Using URLConnection):
URL yahoo = new URL("http://www.yahoo.com/");
URLConnection yc = yahoo.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
To find out where the problem is, I tried the same task in C# with VS2008, and it worked perfectly fine, with no timeout at all.
I'm doing this from work so there's a firewall but I don't have information about it.
What can be the reason for this?
Thanks!
Daniel
I'm using this code:
URL yahoo = new URL("http://www.yahoo.com/");
URLConnection yc = yahoo.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
found from here: http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html
I'm doing this from work so there's a firewall but I don't have information about it.
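One possible cause, given the firewall: the network may only allow web traffic through an HTTP proxy. As far as I know, .NET's WebRequest picks up the system proxy settings by default, while a plain Java URLConnection does not unless told to. Below is a minimal sketch that passes a proxy explicitly. The class ProxyConnectionSketch, the host proxy.example.com and the port 8080 are placeholders, and the real values would have to come from your network administrator.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

class ProxyConnectionSketch {

    // Describes an HTTP proxy; the address is resolved only when connecting.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Prepares (but does not yet open) a connection routed through the proxy.
    static URLConnection open(String pageUrl, Proxy proxy) throws IOException {
        URLConnection connection = new URL(pageUrl).openConnection(proxy);
        connection.setConnectTimeout(10_000); // fail after 10 s instead of hanging
        connection.setReadTimeout(10_000);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder proxy address: replace with your network's real values.
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        URLConnection yc = open("http://www.yahoo.com/", proxy);
        System.out.println(yc.getConnectTimeout());
    }
}
```

Another option to try is starting the JVM with -Djava.net.useSystemProxies=true, which asks Java to use the operating system's proxy configuration where it can.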
