Java: Reading from a URL produces gibberish - java

So I've been trying to read the HTML from kickass.to (it works fine on other sites), but all I get is gibberish.
My code:
BufferedReader in = new BufferedReader(
new InputStreamReader(new URL("http://kickass.to/").openStream()));
String s = "";
while ((s=in.readLine())!=null) System.out.println(s);
in.close();
For example:
Does anyone know why it does that?
Thanks!

The problem here is a server that is probably not configured correctly, as it returns its response gzip compressed, even if the client does not send an Accept-Encoding: gzip header.
So what you're seeing is the compressed version of the page. To decompress it, pass it through a GZIPInputStream:
BufferedReader in = new BufferedReader(
new InputStreamReader(
new GZIPInputStream(new URL("http://kickass.to/").openStream())));
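
Wrapping unconditionally breaks as soon as the server stops compressing, so it may be safer to check the Content-Encoding response header first. The following is only a sketch: the class and method names are made up for illustration, and UTF-8 is an assumption about the page's charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
final class GzipAwareReader {

    // Wraps the stream in a GZIPInputStream only when the header value is "gzip".
    static InputStream decodeBody(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Fetches a page and returns its text; UTF-8 is an assumption here.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        InputStream body = decodeBody(conn.getInputStream(), conn.getContentEncoding());
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Calling GzipAwareReader.fetch("http://kickass.to/") would then work whether or not the response is compressed, as long as the server labels it correctly.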

Related

Getting weird characters from a web page

I am writing a program in Java that parses some text from a web page. But when I use the code below I get weird/incorrect characters.
code:
URL url = new URL(getSearchUrl(crit));
URLConnection connection = url.openConnection();
BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getInputStream(), "UTF-8"));
String line;
while((line = br.readLine()) != null){
System.out.println(line);
}
br.close();
I get the following output:
?}?v?8????...
So what am I doing wrong? I know that the site I want to gather info from uses utf-8.
Edit: I am currently in Croatia. I tried another program that I know worked in Serbia (my home country), but it doesn't work here.
It's gzipped. You can see this by calling connection.getContentEncoding().
If you wrap connection.getInputStream() in a GZIPInputStream, it should work better.
BufferedReader br = new BufferedReader(
new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));
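
If the server cannot be trusted to set Content-Encoding, another option is to peek at the first two bytes of the body: a gzip stream starts with the magic bytes 0x1f 0x8b. This is a sketch under that assumption, with a hypothetical class name; readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
final class GzipSniffer {

    // Returns a decompressing stream if the data starts with the gzip magic
    // bytes, otherwise a stream that yields the original bytes unchanged.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(raw, 2);
        byte[] head = new byte[2];
        int n = pb.readNBytes(head, 0, 2);   // may be < 2 for very short bodies
        if (n > 0) {
            pb.unread(head, 0, n);           // put the peeked bytes back
        }
        boolean gzip = n == 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(pb) : pb;
    }
}
```

The result can be passed to new InputStreamReader(..., "UTF-8") exactly as in the snippet above.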

How to access data in InputStream as String?

I am trying to create an Android client and a Java server application using socket programming. I need to retrieve the file sent from the Java server, but I don't want to write its content into another file on the client side. Instead, I want to create a list box with the contents of the received file. I can find code for writing these contents to a file, but I don't know how to access the contents as strings.
Here is the code i tried:
Android client
client=new Socket("10.0.2.2", 7575);
writer=new PrintWriter(client.getOutputStream(),true);
writer.write(mMsg);
writer.flush();
writer.close();
InputStream is=client.getInputStream();
bytesread=is.read(mybytearray,0,mybytearray.length);
I changed my code as follows, but it is not working.
InputStream is=client.getInputStream();
BufferedReader bf=new BufferedReader(new InputStreamReader(is));
String value=bf.readLine();
Wrap your inputstream into a BufferedReader
BufferedReader d
= new BufferedReader(new InputStreamReader(is));
and use readLine() method.
Reads a line of text. A line is considered to be terminated by any one
of a line feed ('\n'), a carriage return ('\r'), or a carriage return
followed immediately by a linefeed.
Try this:
InputStream is=client.getInputStream();
bytesread=is.read(mybytearray,0,mybytearray.length);
// Create a new String out of the bytes read
// ("UTF-8" is an assumption; use whatever charset the server writes with)
String data = new String(mybytearray, 0, bytesread, "UTF-8");
OR
Wrap the InputStream using BufferedReader and read the data line by line (using readLine()) in String format.
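Using a BufferedReader is best, or you can try the code below.
After the last statement, convert the byte[] "mybytearray" to a String as follows:
// Convert only the bytes actually read; a bare declaration cannot be the body of an if
String result = (bytesread > 0) ? new String(mybytearray, 0, bytesread) : "";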
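
A single read() call may return fewer bytes than the sender wrote, so a loop that reads until end of stream is usually more robust than one read into a fixed array. The sketch below assumes the server closes the connection (or shuts down its output) once the file has been sent; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of any library.
final class StreamText {

    // Collects every byte until end of stream, then decodes with the given charset.
    static String readFully(InputStream is, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = is.read(chunk)) != -1) {
            collected.write(chunk, 0, n);   // keep only the bytes actually read
        }
        return new String(collected.toByteArray(), charset);
    }
}
```

With the client code above, this would be called as StreamText.readFully(client.getInputStream(), StandardCharsets.UTF_8). Note that closing the PrintWriter before reading also closes the socket, because closing a socket's output stream closes the socket itself; client.shutdownOutput() is one way to signal the end of the request while keeping the input side open.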

Java reading httprequest stream to process json is jibberish

JSONObject obj = new JSONObject();
br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
obj.put("auth key", br.readLine());
System.out.println(obj.toString());
br.close();
return "test";
The problem I am having with this code is that the JSON it outputs is gibberish: "{"auth key":""}". After searching on Google, I first thought the response was being compressed with gzip, but after checking the headers with Fiddler I found there is no Content-Encoding.
Any thoughts on the matter would be appreciated. Many thanks.

Using Perl data on server to display on Android

I am developing an Android application. I am calling a Perl file on a server. This Perl file has different print statements.
I want to make the collective text available to a variable in android Java file of mine.
I have tried this :
URL url= new URL("http://myserver.com/cgi-bin/myfile.pl?var=97320");
This is where my request to the server file goes. But how can I get the data that the Perl file prints?
In your perl service:
use CGI qw(param header);
use JSON;
my $var = param('var');
my $json = &fetch_return_data($var);
print header('application/json');
print to_json($json); # or encode_json($json) for utf-8
to return the data as a JSON object. Then use one of the many JSON libraries for Java to read the data, for instance http://json.org/java/:
Integer var = 97320;
InputStream inputStream = new URL("http://myserver.com/cgi-bin/myfile.pl?var=" + var).openStream();
try {
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
// Or this if you returned utf-8 from your service
//BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, Charset.forName("UTF-8")));
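// readAll(Reader) is a helper you write yourself: it reads the whole stream into a String
JSONObject json = new JSONObject(readAll(bufferedReader));
} catch (Exception e) {
e.printStackTrace(); // don't swallow errors silently
} finally {
inputStream.close();
}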
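
The readAll call in the snippet above is not a JDK method. One possible implementation is sketched below; the enclosing class name is hypothetical, and the helper simply drains the Reader into a String.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper class, not part of any library.
final class ReaderText {

    // Reads characters until end of stream and returns them as one String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[2048];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);   // append only the chars actually read
        }
        return sb.toString();
    }
}
```

Because it reads in chunks rather than with readLine(), line breaks in the response are preserved, which is harmless for a JSON parser.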

Wrong encoding with Java HttpURLConnection

Trying to read a generated XML from a MS Webservice
URL page = new URL(address);
StringBuffer text = new StringBuffer();
HttpURLConnection conn = (HttpURLConnection) page.openConnection();
conn.connect();
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
BufferedReader buff = new BufferedReader(in);
box.setText("Getting data ...");
String line;
// Check for null before appending, so the literal text "null" is not added at the end
while ((line = buff.readLine()) != null) {
text.append(line).append("\n");
}
box.setText(text.toString());
or
URL u = new URL(address);
URLConnection uc = u.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
System.out.println(inputLine);
}
in.close();
Any page reads fine except the web service output, where the greater-than and less-than signs come out strangely.
It reads < as "& lt;" and > as "& gt;" (without the spaces; if I type them here without spaces, Stack Overflow turns them into < and >).
Please help.
Thanks
First, there seems to be some confusion on this line:
inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
This effectively says that you expect every line in the document that your server provides to be URL encoded. URL encoding is not the same as document encoding.
http://en.wikipedia.org/wiki/Percent-encoding
http://en.wikipedia.org/wiki/Character_encoding
Looking at your code snippet, I think URL encoding (percent encoding) is not what you're after.
As for the document's character encoding, you are making a conversion on this line:
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
conn.getContent() returns an InputStream that operates on bytes, whilst the reader operates on chars, so the character encoding conversion happens here. Check out the other constructors of InputStreamReader, which take the encoding as a second argument. Without the second argument you fall back on whatever your platform default is in Java.
InputStreamReader(InputStream in, String charsetName)
for instance, lets you change your code to:
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent(), "utf-8");
But the real question is "what encoding is your server providing the content in?" If you own the server code too, you may just hard-code it to something reasonable such as utf-8. But if it can vary, you need to look at the HTTP header Content-Type to figure it out.
String contentType = conn.getHeaderField("Content-Type");
The contents of contentType will look like
text/plain; charset=utf-8
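A shorthand way of getting this field is conn.getContentType(). Do not confuse it with conn.getContentEncoding(), which returns the Content-Encoding header (for example gzip), not the charset:
String contentType = conn.getContentType();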
Notice that it's entirely possible that no charset is provided, or no Content-Type header, in which case you must fall back on reasonable defaults.
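
Extracting the charset from a Content-Type value such as the one shown above could look roughly like this. It is a minimal sketch with hypothetical names; it handles an optional quoted value and falls back to a caller-supplied default when the header is missing, has no charset parameter, or names a charset the JVM does not know.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of any library.
final class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value, or the fallback.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" parameter
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim();
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);   // strip quotes
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback;   // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }
}
```

The result can then be passed to the InputStreamReader constructor that takes a Charset, for example new InputStreamReader(conn.getInputStream(), ContentTypeCharset.charsetOf(conn.getContentType(), StandardCharsets.UTF_8)).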
Mark Rotteveel is correct: the web service is the culprit here. For some reason it is sending the greater-than and less-than signs in the & lt and & gt format.
Thanks, Martin Algesten, but as I have already stated, I worked around it; I was just looking for why it was this way.
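
The thread does not show the workaround that was used. One way to turn such entity-escaped text back into markup is a handful of string replacements, sketched below with a hypothetical class name. It covers only the five predefined XML entities, not numeric character references.

```java
// Hypothetical helper class, not part of any library.
final class XmlEntities {

    // Replaces the five predefined XML entities with the characters they stand for.
    static String unescapeBasic(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");   // last, so "&amp;lt;" becomes "&lt;" and not "<"
    }
}
```

This is entity unescaping, which is unrelated to java.net.URLDecoder: the latter decodes percent-encoding, not character entities.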
