Error while reading data from webpage in java?

I am using this code to read data from a webpage:
public class ReadLatex {
public static void main(String[] args) throws IOException {
String urltext = "http://chart.apis.google.com/chart?cht=tx&chl=1+2%20\frac{3}{4}";
URL url = new URL(urltext);
BufferedReader in = new BufferedReader(new InputStreamReader(url
.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
// Process each line.
System.out.println(inputLine);
}
in.close();
}
}
The webpage returns an image rendered from the LaTeX code in the URL.
I am getting this exception:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 400 for URL: http://chart.apis.google.com/chart?
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at ReadLatex.main(ReadLatex.java:11)
Can anyone tell me why I am having this problem and what the solution is?

Try escaping with something like org.apache.commons.lang.StringEscapeUtils

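Your problem is that you are using a \ (backslash) in a string, which in Java is an escape character. In your URL, \f is read as a single form-feed character rather than a backslash followed by f, so the request sent to the server is malformed. To get an actual \ you need to have two of them in your string. So: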
Wanted text: part1\part2
you need to have
String theString = "part1\\part2";
So you actually want
String urltext = "http://chart.apis.google.com/chart?cht=tx&chl=1+2%20\\frac{3}{4}";
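To make the difference visible, here is a minimal sketch that compares the two literals. The class and field names are made up for illustration only.

```java
public class BackslashDemo {
    // "\f" is a single form-feed character, so this literal is form feed + "rac".
    static final String SINGLE = "\frac";
    // "\\" is a single backslash, so this literal is backslash + "frac".
    static final String DOUBLED = "\\frac";

    public static void main(String[] args) {
        System.out.println(SINGLE.length());           // prints 4
        System.out.println(DOUBLED.length());          // prints 5
        System.out.println(DOUBLED.charAt(0) == '\\'); // prints true
    }
}
```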
Also, when your request succeeds you get back an image (PNG). It should not be read with a Reader, which would try to interpret the bytes as characters using some encoding and corrupt the image data. Instead, use the input stream and write the content (bytes) to a file.
A simple example without error handling:
public static void main(String[] args) throws IOException {
String urltext = "http://chart.apis.google.com/chart?cht=tx&chl=1+2%20\\frac{3}{4}";
URL url = new URL(urltext);
InputStream in = url.openStream();
FileOutputStream out = new FileOutputStream("TheImage.png");
byte[] buffer = new byte[8*1024];
int readSize;
while ( (readSize = in.read(buffer)) != -1) {
out.write(buffer, 0, readSize);
}
out.close();
in.close();
}
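If you want the streams closed even when an exception is thrown, the same copy loop can be wrapped in try-with-resources. The following is only a sketch: the class and method names are hypothetical, and it copies from an in-memory array so that it runs without network access. In the real case the input would be url.openStream() and the output a FileOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class StreamCopy {
    // Copies raw bytes from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        int readSize;
        while ((readSize = in.read(buffer)) != -1) {
            out.write(buffer, 0, readSize);
            total += readSize;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for image data: the 8-byte PNG signature, which is not valid text.
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        // Both streams are closed automatically, even if copy() throws.
        try (InputStream in = new ByteArrayInputStream(fakeImage);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println(copied + " bytes copied, identical: "
                    + Arrays.equals(fakeImage, out.toByteArray()));
        }
    }
}
```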

I think you should consider escaping the backslash in the URL. In Java, the backslash must be escaped in a String.
It should become
String urltext =
"http://chart.apis.google.com/chart?cht=tx&chl=1+2%20\\frac{3}{4}";
That takes care of the pure Java side.
This URL seems to work in my browser but, as suggested in the other answers, I think it would be better to also escape all the special characters such as backslashes, braces...
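One way to escape every special character in the query value is java.net.URLEncoder. This is a sketch under assumptions: the class and method names are hypothetical, and the expression passed in is my guess at the intended LaTeX. Note that URLEncoder applies form encoding, so a space becomes + and a literal + becomes %2B.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ChartUrlBuilder {
    // Base URL taken from the question; "chl" carries the LaTeX expression.
    private static final String BASE = "http://chart.apis.google.com/chart?cht=tx&chl=";

    // Returns the chart URL with the LaTeX expression percent-encoded.
    static String buildChartUrl(String latex) {
        try {
            return BASE + URLEncoder.encode(latex, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The Java literal still needs the doubled backslash.
        System.out.println(buildChartUrl("1+2 \\frac{3}{4}"));
        // prints http://chart.apis.google.com/chart?cht=tx&chl=1%2B2+%5Cfrac%7B3%7D%7B4%7D
    }
}
```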

Related

How can I get the data from website in Java?

I want to get the value of "Yield" in "http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319"
How can I do this with Java?
I have tried Jsoup, and my code looks like this:
public static void main(String[] args) throws IOException {
String url = "http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319";
Document document = Jsoup.connect(url).get();
Elements answerers = document.select(".c3 .floatR ");
for (Element answerer : answerers) {
System.out.println("Answerer: " + answerer.data());
}
// TODO code application logic here
}
But it returns nothing. How can I do this?

Your code is fine. I tested it myself. The problem is the URL you're using. If I open the URL in a browser, the value fields (e.g. Yield) are empty. Using the browser development tools (Network tab) you should find a URL that looks like:
http://www.aastocks.com/en/ltp/RTQuoteContent.aspx?symbol=01319&process=y
Using this URL gives you the wanted results.

The simplest solution is to create a URL instance pointing to the web page or link whose content you want, and read it using streams.
For example:
public static void main(String[] args) throws IOException
{
URL url = new URL("http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319");
// Get the input stream through URL Connection
URLConnection con = url.openConnection();
InputStream is =con.getInputStream();
// Once you have the Input Stream, it's just plain old Java IO stuff.
// For this case, since you are interested in getting plain-text web page
// I'll use a reader and output the text content to System.out.
// For binary content, it's better to directly read the bytes from stream and write
// to the target file.
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
// read each line and write to System.out
while ((line = br.readLine()) != null) {
System.out.println(line);
}
}

I think Jsoup is critical in this purpose. I would not suspect a valid HTML document (or whatever).

Remove Base64 prefix from InputStream

I have a Base64 encoded Image String residing in a File Server. The encoded String has a prefix (ex: "data:image/png;base64,") for support in popular modern browsers (it's obtained via JavaScript's Canvas.toDataURL() method). The client sends a request for the image to my server which verifies them and returns a stream of the Base64 encoded String.
If the client is a web client, the image can be displayed as is within an <img> tag by setting the src to the Base64 encoded String. However, if the client is an Android client, the String needs to be decoded into a Bitmap without the prefix. Though, this can be done fairly easily.
The Problem:
In order to simplify my code and not reinvent the wheel, I'm using an Image Library for the Android client to handle loading, displaying, and caching the images (Facebook's Fresco Library to be exact). However, no library seems to support Base64 decoding (I want my cake and to eat it too). A solution I came up with is to decode the Base64 String on the server as it is being streamed to the client.
The Attempt:
S3Object obj = s3Client.getObject(new GetObjectRequest(bucketName, keyName));
Base64.Decoder decoder = Base64.getDecoder();
//decodes the stream as it is being read
InputStream stream = decoder.wrap(obj.getObjectContent());
try{
return new StreamingOutput(){
@Override
public void write(OutputStream output) throws IOException, WebApplicationException{
int nextByte = 0;
while((nextByte = stream.read()) != -1){
output.write(nextByte);
}
output.flush();
output.close();
stream.close();
}
};
}catch(Exception e){
e.printStackTrace();
}
Unfortunately, the Fresco library still has a problem displaying the image (with no stack traces!). As there doesn't seem to be an issue on my server when decoding the stream (no stack traces either), it leads me to believe that it must be an issue with the prefix. Which leaves me with a dilemma.
The Question: How do I remove the Base64 prefix from a Stream being sent to the client without storing and editing the entire Stream on the server? Is this possible?

Fresco does support decoding data URIs, just as the web client does.
The demo app has an example of this.

How do I remove the Base64 prefix from a Stream being sent to the client without storing and editing the entire Stream on the server?
Removing the prefix while sending the stream to the client turns out to be a pretty complex task. If you don't mind storing the whole String on the server you could simply do:
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(stream));
while ((line = br.readLine()) != null) {
sb.append(line);
}
String result = sb.toString();
//comma is the character which separates the prefix and the Base64 String
int i = result.indexOf(",");
result = result.substring(i + 1);
//Now, that we have just the Base64 encoded String, we can decode it
Base64.Decoder decoder = Base64.getDecoder();
byte[] decoded = decoder.decode(result);
//Now, just write each byte from the byte array to the output stream
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
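The prefix-stripping and decoding step on its own can be sketched as a small helper. The class and method names are hypothetical, and it assumes the payload is plain Base64 without line breaks. The returned array is what you would then write to the output stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriDecoder {
    // Strips an optional "data:...;base64," prefix and decodes the rest.
    static byte[] decodeDataUri(String dataUri) {
        int comma = dataUri.indexOf(',');
        String payload = (dataUri.startsWith("data:") && comma >= 0)
                ? dataUri.substring(comma + 1)
                : dataUri;
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        byte[] decoded = decodeDataUri("data:image/png;base64,aGVsbG8=");
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints hello
    }
}
```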
But being more efficient and not storing the entire stream on the server makes the task much more complicated. We could use the Base64.Decoder.wrap() method, but the problem is that the wrapped stream throws an IOException if it reaches a value that cannot be decoded (wouldn't it be nice if they provided a method that just left the bytes as is if they can't be decoded?). And unfortunately, the Base64 prefix can't be decoded because it's not Base64 encoded. So, it would throw an IOException.
To get around this problem, we would have to use an InputStreamReader to read the InputStream with the specified appropriate Charset. Then we would have to cast the ints received from the InputStream's read() method call to chars. When we reach the appropriate amount of chars, we would have to compare it with the Base64 prefix's intro ("data"). If it's a match, we know the Stream contains the prefix, so continue reading until we reach the prefix end character (the comma: ","). Finally, we can begin streaming out the bytes after the prefix. Example:
S3Object obj = s3Client.getObject(new GetObjectRequest(bucketName, keyName));
Base64.Decoder decoder = Base64.getDecoder();
InputStream stream = obj.getObjectContent();
InputStreamReader reader = new InputStreamReader(stream);
try{
return new StreamingOutput(){
@Override
public void write(OutputStream output) throws IOException, WebApplicationException{
//for checking if string has base64 prefix
char[] pre = new char[4]; //"data" is four characters (four bytes in UTF-8)
boolean containsPre = false;
int count = 0;
int nextByte = 0;
while((nextByte = stream.read()) != -1){
if(count < pre.length){
pre[count] = (char) nextByte;
count++;
}else if(count == pre.length){
//determine whether has prefix or not and act accordingly
count++;
//build the String directly; Arrays.toString(pre) would yield "[d, a, t, a]" and never match
containsPre = new String(pre).toLowerCase().equals("data");
if(!containsPre){
//doesn't have Base64 prefix so write all the bytes until this point
for(int i = 0; i < pre.length; i++){
output.write((int) pre[i]);
}
output.write(nextByte);
}
}else if(containsPre && count < 25){
//the comma character (,) is considered the end of the Base64 prefix
//so look for the comma, but be realistic, if we don't find it at about 25 characters
//we can assume the String is not encoded correctly
containsPre = (Character.toString((char) nextByte).equals(",")) ? false : true;
count++;
}else{
output.write(nextByte);
}
}
output.flush();
output.close();
stream.close();
}
};
}catch(Exception e){
e.printStackTrace();
return null;
}
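An alternative sketch that both skips the prefix and decodes while streaming: read a bounded number of leading bytes, push back everything after the comma with a PushbackInputStream, and hand the rest to Base64.Decoder.wrap(). The class name, method name, and the 64-byte limit are assumptions for illustration. It also assumes the payload has no line breaks (otherwise Base64.getMimeDecoder() may be a better fit).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriStreamDecoder {
    // Assumed upper bound on the length of the "data:...;base64," prefix.
    private static final int MAX_PREFIX = 64;

    // Decodes Base64 from in to out, skipping an optional data-URI prefix.
    static void decodeTo(InputStream in, OutputStream out) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, MAX_PREFIX);
        byte[] head = new byte[MAX_PREFIX];
        int n = 0;
        int b;
        // Read at most MAX_PREFIX bytes to inspect the start of the stream.
        while (n < head.length && (b = pin.read()) != -1) {
            head[n++] = (byte) b;
        }
        String start = new String(head, 0, n, StandardCharsets.US_ASCII);
        int comma = start.indexOf(',');
        int skip = (start.startsWith("data:") && comma >= 0) ? comma + 1 : 0;
        // Push back everything that is not part of the prefix.
        pin.unread(head, skip, n - skip);
        // Closing the decoding stream also closes the underlying stream.
        try (InputStream decoded = Base64.getDecoder().wrap(pin)) {
            byte[] buffer = new byte[8 * 1024];
            int read;
            while ((read = decoded.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "data:image/png;base64,aGVsbG8gd29ybGQ="
                .getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        decodeTo(new ByteArrayInputStream(input), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
        // prints hello world
    }
}
```

Inside the StreamingOutput, the call would be decodeTo(obj.getObjectContent(), output), with only the first few bytes ever held in memory.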
This seems like a hefty task to do on the server, so I think decoding on the client side is a better choice. Unfortunately, most Android client side libraries don't have support for Base64 decoding (especially with the prefix). However, as @tyronen pointed out, Fresco does support it if the String is already obtained. Though, this removes one of the key reasons to use an image loading library.
Android Client Side Decoding
Decoding in the client-side application is pretty easy. First obtain the String from the InputStream:
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
String line;
try {
br = new BufferedReader(new InputStreamReader(stream));
while ((line = br.readLine()) != null) {
sb.append(line);
}
return sb.toString();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Then decode the String using Android's Base64 class:
int i = result.indexOf(",");
result = result.substring(i + 1);
byte[] decodedString = Base64.decode(result, Base64.DEFAULT);
Bitmap bitMap = BitmapFactory.decodeByteArray(decodedString, 0, decodedString.length);
The Fresco library seems hard to update due to them using a lot of delegation. So, I moved on to using the Picasso image loading library and created my own fork of it with the Base64 decoding ability.

Formatting Web Service Response

I use the below function to retrieve the web service response:
private String getSoapResponse (String url, String host, String encoding, String soapAction, String soapRequest) throws MalformedURLException, IOException, Exception {
URL wsUrl = new URL(url);
URLConnection connection = wsUrl.openConnection();
HttpURLConnection httpConn = (HttpURLConnection)connection;
ByteArrayOutputStream bout = new ByteArrayOutputStream();
byte[] buffer = new byte[soapRequest.length()];
buffer = soapRequest.getBytes();
bout.write(buffer);
byte[] b = bout.toByteArray();
httpConn.setRequestMethod("POST");
httpConn.setRequestProperty("Host", host);
if (encoding == null || encoding == "")
encoding = UTF8;
httpConn.setRequestProperty("Content-Type", "text/xml; charset=" + encoding);
httpConn.setRequestProperty("Content-Length", String.valueOf(b.length));
httpConn.setRequestProperty("SOAPAction", soapAction);
httpConn.setDoOutput(true);
httpConn.setDoInput(true);
OutputStream out = httpConn.getOutputStream();
out.write(b);
out.close();
InputStreamReader is = new InputStreamReader(httpConn.getInputStream());
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(is);
String read = br.readLine();
while(read != null) {
sb.append(read);
read = br.readLine();
}
String response = decodeHtmlEntityCharacters(sb.toString());
return response = decodeHtmlEntityCharacters(response);
}
But my problem with this code is it returns lots of special characters and makes the structure of the XML invalid.
Example response:
<PLANT>A565</PLANT>
<PLANT>A567</PLANT>
<PLANT>A585</PLANT>
<PLANT>A921</PLANT>
<PLANT>A938</PLANT>
</PLANT_GROUP>
</KPI_PLANT_GROUP_KEYWORD>
<MSU_CUSTOMERS/>
</DU>
<DU>
So to solve this, I use the method below and pass it the whole response to replace all the special characters with their corresponding punctuation.
private final static Hashtable htmlEntitiesTable = new Hashtable();
static {
htmlEntitiesTable.put("&amp;","&");
htmlEntitiesTable.put("&quot;","\"");
htmlEntitiesTable.put("&lt;","<");
htmlEntitiesTable.put("&gt;",">");
}
private String decodeHtmlEntityCharacters(String inputString) throws Exception {
Enumeration en = htmlEntitiesTable.keys();
while(en.hasMoreElements()){
String key = (String)en.nextElement();
String val = (String)htmlEntitiesTable.get(key);
inputString = inputString.replaceAll(key, val);
}
return inputString;
}
But another problem arose. If the response contains the value "< 0.5" in escaped form inside a VALUE element, then after this method runs the output would be:
<VALUE>< 0.5</VALUE>
Which makes the structure of the XML invalid again.
The data "< 0.5" is correct and valid, but having it inside the VALUE element breaks the structure of the XML.
Can you please help me deal with this? Maybe the way I get or build the response can be improved. Is there any better way to call the web service and get its response?
How can I deal with elements containing "<" or ">"?

Do you know how to use a third-party open source library?
You should try using apache commons-lang:
StringEscapeUtils.unescapeXml(xml)
More detail is provided in the following stack overflow post:
how to unescape XML in java
Documentation:
http://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html
http://commons.apache.org/proper/commons-lang/userguide.html#lang3.

You're using SOAP wrong.
In particular, you do not need the following line of code:
String response = decodeHtmlEntityCharacters(sb.toString());
Just return sb.toString(). And for $DEITY's sake, do not use string methods to parse the retrieved string, use an XML parser, or a full-blown SOAP stack...
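As a rough illustration of the XML-parser route, the JDK's built-in DOM parser resolves entities for you, so an escaped "<" in element text comes back as data and never disturbs the structure. The class and method names are hypothetical and the sample document is made up. If the service wraps an escaped XML payload inside the SOAP body, you would read that element's text with the parser first and then parse the resulting string the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ValueExtractor {
    // Returns the text of the first VALUE element, or null if there is none.
    static String firstValue(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList values = doc.getElementsByTagName("VALUE");
        return values.getLength() == 0 ? null : values.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The less-than sign in the data stays escaped in the XML source.
        String xml = "<DU><VALUE>&lt; 0.5</VALUE></DU>";
        System.out.println(firstValue(xml)); // prints < 0.5
    }
}
```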

Does the > or < character always appear at the beginning of a value? Then you could use regex to handle the cases in which the > or < are followed by a digit (or dot, for that matter).
Sample code, assuming the replacement strings used in it don't appear anywhere else in the XML:
private String decodeHtmlEntityCharacters(String inputString) throws Exception {
Enumeration en = htmlEntitiesTable.keys();
// Replaces &gt; or &lt; followed by dot or digit (while keeping the dot/digit)
inputString = inputString.replaceAll("&gt;(\\.?\\d)", "Valuegreaterthan$1");
inputString = inputString.replaceAll("&lt;(\\.?\\d)", "Valuelesserthan$1");
while(en.hasMoreElements()){
String key = (String)en.nextElement();
String val = (String)htmlEntitiesTable.get(key);
inputString = inputString.replaceAll(key, val);
}
inputString = inputString.replaceAll("Valuelesserthan", "&lt;");
inputString = inputString.replaceAll("Valuegreaterthan", "&gt;");
return inputString;
}
Note the most appropriate answer (and easier for everyone) would be to correctly encode the XML at the sender side (it would also render my solution non-working BTW).

It would be hard to cope with all the situations, but you could cover the most common ones by adding a few more rules: assume that any less-than sign followed by a space is data, and that any greater-than sign with a space in front of it is data, and encode those again.
private final static Hashtable htmlEntitiesTable = new Hashtable();
static {
htmlEntitiesTable.put("&amp;","&");
htmlEntitiesTable.put("&quot;","\"");
htmlEntitiesTable.put("&lt;","<");
htmlEntitiesTable.put("&gt;",">");
}
private String decodeHtmlEntityCharacters(String inputString) throws Exception {
Enumeration en = htmlEntitiesTable.keys();
while(en.hasMoreElements()){
String key = (String)en.nextElement();
String val = (String)htmlEntitiesTable.get(key);
inputString = inputString.replaceAll(key, val);
}
inputString = inputString.replaceAll("< ","&lt; ");
inputString = inputString.replaceAll(" >"," &gt;");
return inputString;
}

'>' is not escaped in XML. So you shouldn't have an issue with that. Regarding '<', here are the options I can think of.
Use CDATA in web response for text containing special characters.
Rewrite the text by reversing the order. For example, if it is x < 2, change it to 2 > x. '>' is not escaped unless it's a part of CDATA.
Use another attribute or element in the XML response to indicate '<' or '>'.
Use a regular expression to find a sequence that starts with '<', is followed by a string, and is then followed by the '<' of the closing tag. Replace it with some code or value that you can interpret and replace later.
Also, you don't need to do this:
String response = decodeHtmlEntityCharacters(sb.toString());
You should be able to parse the XML after you take care of the '<' sign in text.
You can use this site for testing regular expressions.

Why not serialize your XML? It's much easier than what you are doing.
For example, in C#:
var ser = new XmlSerializer(typeof(MyXMLObject));
using (var reader = XmlReader.Create("http.....xml"))
{
MyXMLObject _myobj = (MyXMLObject)ser.Deserialize(reader);
}

parsing a text file using a java scanner

I am trying to create a method that parses a text file and returns a string that is the URL after the colon. The text file looks as follows (it is for a bot):
keyword:url
keyword,keyword:url
So each line consists of a keyword and a URL, or multiple keywords and a URL.
Could anyone give me a bit of direction as to how to do this? Thank you.
I believe I need to use a scanner but couldn't find anything on anyone wanting to do something similar.
Thank you.
Edit: my attempt using the suggestions below. It doesn't quite work. Any help would be appreciated.
public static void main(String[] args) throws IOException {
String sCurrentLine = "";
String key = "hello";
BufferedReader reader = new BufferedReader(
new FileReader(("sites.txt")));
Scanner s = new Scanner(sCurrentLine);
while ((sCurrentLine = reader.readLine()) != null) {
System.out.println(sCurrentLine);
if(sCurrentLine.contains(key)){
System.out.println(s.findInLine("http"));
}
}
}
output:
hello,there:http://www.facebook.com
null
whats,up:http:/google.com
sites.txt:
hello,there:http://www.facebook.com
whats,up:http:/google.com

You should read the file line by line with a BufferedReader, as you are doing. I would then recommend parsing each line using a regex.
The pattern
(?<=:)http://[^\\s]++
will do the trick. This pattern says:
http://
followed by one or more non-space characters [^\\s]++
and preceded by a colon (?<=:)
Here is a simple example using a String to proxy your file:
public static void main(String[] args) throws Exception {
final String file = "hello,there:http://www.facebook.com\n"
+ "whats,up:http://google.com";
final Pattern pattern = Pattern.compile("(?<=:)http://[^\\s]++");
final Matcher m = pattern.matcher("");
try (final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(file.getBytes("UTF-8"))))) {
String line;
while ((line = bufferedReader.readLine()) != null) {
m.reset(line);
while (m.find()) {
System.out.println(m.group());
}
}
}
}
Output:
http://www.facebook.com
http://google.com

Use a BufferedReader; for text parsing you can use regular expressions.

You should use the split method:
String[] strCollection = yourScannedStr.split(":", 2);
String extractedUrl = strCollection[1];
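Put together, a keyword lookup based on split could look roughly like this. The class and method names are hypothetical, and the lines are passed in as a list so the sketch runs without a file; in practice you would feed it the lines read from sites.txt. The limit of 2 matters because the URL itself contains a colon.

```java
import java.util.Arrays;
import java.util.List;

public class KeywordUrlLookup {
    // Returns the URL of the first line whose keyword list contains keyword, or null.
    static String findUrl(List<String> lines, String keyword) {
        for (String line : lines) {
            // Split at the first colon only, so "http://..." stays in one piece.
            String[] parts = line.split(":", 2);
            if (parts.length < 2) {
                continue; // no colon on this line, skip it
            }
            for (String candidate : parts[0].split(",")) {
                if (candidate.trim().equals(keyword)) {
                    return parts[1].trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hello,there:http://www.facebook.com",
                "whats,up:http://google.com");
        System.out.println(findUrl(lines, "hello")); // prints http://www.facebook.com
        System.out.println(findUrl(lines, "up"));    // prints http://google.com
    }
}
```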

Reading a .txt file using Scanner class in Java
http://www.tutorialspoint.com/java/java_string_substring.htm
That should help you.

how to get list from json data in java

My Json Data:
{'ID':1,'FirstName':'x','LastName':'y','Company':'x','EMail':'x','PhoneNo':'x'}
My Java Code:
public static void main(String[] args) throws IOException {
String json = getJSON().substring(getJSON().indexOf("[")+1,getJSON().indexOf("]"));
Users user = new Gson().fromJson(json, Users.class);
WriteLine("["+user.getID()+"]"+" "+user.getFirstName()+" "+user.getLastName()+" "+user.getCompany()+" "+user.getEMail()+" "+user.getPhoneNo());
}
static void WriteLine(String text){
System.out.print(text);
}
static String getJSON() throws IOException
{
URL url = new URL("http://localhost:51679/api/User");
HttpURLConnection connection = (HttpURLConnection)url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder sBuilder = new StringBuilder();
String line;
while((line = reader.readLine()) != null){
sBuilder.append(line);
}
reader.close();
connection.disconnect();
return sBuilder.toString();
}
But when my JSON data becomes this:
{'ID':1,'FirstName':'x','LastName':'x','Company':'x','EMail':'x','PhoneNo':'x'},{'ID':2,'FirstName':'y','LastName':'y','Company':'y','EMail':'x','PhoneNo':'y'}
I get an error: Caused by: com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 136
Can you help me? Sorry for my bad English :(

Read the error message carefully. That input is not valid JSON, which is exactly what the message tells you. Strings and keys must be surrounded with double quotes, not single quotes:
{"ID":1,"FirstName":"x","LastName":"y","Company":"x","EMail":"x","PhoneNo":"x"}
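A simple check with a JSON validator would tell you the same. Alternately – again, as the error message conveniently tells you – you could set the reader to be lenient with JsonReader.setLenient(true), to hopefully accept malformed JSON as input. Note also that your second sample contains two objects separated by a comma; once the surrounding [ and ] are stripped, that text is no longer a single JSON value, so it cannot be bound to one Users object.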
