Getting web page content into a String is very slow - Java

I download a web page with HttpURLConnection.getInputStream(), and to get the content into a String I use the following method:
String content = "";
isr = new InputStreamReader(pageContent);
br = new BufferedReader(isr);
try {
    do {
        line = br.readLine();
        content += line;
    } while (line != null);
    return content;
} catch (Exception e) {
    System.out.println("Error: " + e);
    return null;
}
The download of the page is fast, but the processing to get the content into a String is very slow. Is there a faster way to get the content into a String?
I convert it to a String in order to insert it into the database.

Read into a buffer a fixed number of bytes at a time, rather than in arbitrary units like lines. That alone should be a good start to speeding this up, because the reader will not have to search for line endings.
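A minimal sketch of the chunked approach, assuming the page is text in a charset the caller knows (the class name StreamToString is hypothetical). It fills a fixed-size char[] with Reader.read(char[], int, int) and appends each chunk to a StringBuilder, so there is no line scanning and no repeated String concatenation:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class StreamToString {

    // Reads the whole stream into a String in fixed-size chunks.
    // The charset is passed in explicitly (assumption: the caller knows it,
    // e.g. from the Content-Type header).
    static String read(InputStream in, Charset charset) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[8192]; // chunk size is an arbitrary choice
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            // read() returns -1 at end of stream; otherwise n chars were filled
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                content.append(buffer, 0, n);
            }
        }
        return content.toString();
    }
}
```

Usage would be something like String content = StreamToString.read(connection.getInputStream(), StandardCharsets.UTF_8); where connection is the HttpURLConnection from the question. Reading decoded chars rather than raw bytes lets InputStreamReader handle multi-byte characters that straddle a chunk boundary.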

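Use a StringBuffer instead.
Edit: here is an example:
StringBuffer buffer = new StringBuffer();
// append() grows the buffer in place instead of copying the whole string on every +=
for (int i = 0; i < 20; ++i)
    buffer.append(Integer.toString(i));
String result = buffer.toString();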

Use a BLOB/CLOB to put the content directly into the database.
Is there any specific reason for building the string line by line before putting it in the database?

I'm using jsoup to extract specific content from a page. Here is a web demo based on jQuery and jsoup that grabs any content of a web page; you need to specify the ID or class of the page content you want to grab: http://www.gbin1.com/technology/democenter/20120720jsoupjquerysnatchpage/index.html

Related

How can I get the data from website in Java?

I want to get the value of "Yield" from "http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319".
How can I do this with Java?
I have tried Jsoup, with code like this:
public static void main(String[] args) throws IOException {
    String url = "http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319";
    Document document = Jsoup.connect(url).get();
    Elements answerers = document.select(".c3 .floatR ");
    for (Element answerer : answerers) {
        System.out.println("Answerer: " + answerer.data());
    }
    // TODO code application logic here
}
But it returns an empty result. How can I do this?
Your code is fine; I tested it myself. The problem is the URL you're using. If you open that URL in a browser, the value fields (e.g. Yield) are empty. Using the browser's developer tools (Network tab), you should find a URL that looks like:
http://www.aastocks.com/en/ltp/RTQuoteContent.aspx?symbol=01319&process=y
Using this URL gives you the wanted results.
The simplest solution is to create a URL instance pointing to the web page you want and read its content using streams.
For example:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
public class Main {
    public static void main(String[] args) throws IOException
    {
        URL url = new URL("http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319");
        // Get the input stream through the URL connection
        URLConnection con = url.openConnection();
        InputStream is = con.getInputStream();
        // Once you have the input stream, it's just plain old Java IO.
        // Since we are interested in a plain-text web page here,
        // I'll use a reader and output the text content to System.out.
        // For binary content, it's better to read the bytes directly from the stream
        // and write them to the target file.
        // try-with-resources closes the reader (and the stream) afterwards
        try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String line;
            // read each line and write it to System.out
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
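The reader in that example decodes the page with the platform-default charset. A minimal sketch, assuming the server declares the encoding in its Content-Type header, of picking the charset from URLConnection.getContentType() instead (ContentTypeCharset and charsetOf are hypothetical names):

```java
import java.nio.charset.Charset;
import java.util.Locale;

class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=utf-8". Falls back to the given default when the
    // header is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // strip optional quotes, as in charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // IllegalCharsetNameException and UnsupportedCharsetException
                    // both extend IllegalArgumentException
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

It could be used as new InputStreamReader(is, ContentTypeCharset.charsetOf(con.getContentType(), StandardCharsets.UTF_8)). The UTF-8 fallback is an assumption, and pages that declare their encoding only in a <meta> tag are not covered by this sketch.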
I think Jsoup is the right tool for this purpose: I would not expect the page to be a valid, well-formed HTML document (or anything close to one).

Java SE: convert file.txt to file.html

How do I convert a .txt file to .html so that it contains all the words of file.txt?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
public class Main {
    private static String name = "writer.html";
    private static String Text = "C://Users//Vladimir//IdeaProjects//Algorithms//src//pack/textfile.txt";
    public static String readtxt(String filename) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(filename));
        String s;
        StringBuilder sb = new StringBuilder();
        while ((s = reader.readLine()) != null) {
            sb.append(s + "\n");
        }
        reader.close();
        return sb.toString();
    }
    public static Object writer(String fileName, String text) {
        Text = text;
        try {
            PrintWriter out = new PrintWriter(new File(fileName));
            try {
                out.print(Text);
            } finally {
                out.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return null;
    }
}
Contents of writer.html (output): C://Users//Vladimir//IdeaProjects//Algorithms//src//pack/textfile.txt
First of all, change this:
From:
sb.append(s + "\n");
To:
sb.append(s + "<br/>");
Also remove this line:
Text = text; // not needed
and change this line to:
out.print(text);
I think it should then work as you require.
It seems that you are missing quite a lot in your code, and that you are still learning. If your class is not complete, I'd suggest you first look at how to correctly read from one file and write to another, like this:
File I/O: Reading from one file and writing to another (Java)
or just educate yourself on file I/O in Java.
In case you have that covered already and are wondering how to go from a simple text file to HTML, I'd suggest looking at the HTML format next, since you should create a valid HTML file (where the content of your text file is copied into the <body> element): http://www.w3schools.com/html/default.asp
Once you write that skeleton into your target file, you can start adding the lines from your txt file one by one. For the sake of simplicity, let's assume all your text goes inside a single paragraph element; then you separate each line with a <br> tag (as vishal mentioned).
For a more 'advanced' transformation, you should escape your strings so that all words are displayed correctly in the browser, using something like Commons StringEscapeUtils, or check this thread: How to escape HTML special characters in Java?
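A minimal sketch of the plain approach described above: escape each line, join the lines with <br> inside a single <p>, and wrap the result in a basic HTML skeleton. TextToHtml, escapeHtml and toHtml are hypothetical names, and the hand-written escaping is only a stand-in for a library escaper such as Commons StringEscapeUtils:

```java
import java.util.List;

class TextToHtml {

    // Escapes the five characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds a minimal HTML page: all lines in one <p>, separated by <br>.
    static String toHtml(String title, List<String> lines) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n");
        html.append("<head><meta charset=\"UTF-8\"><title>")
            .append(escapeHtml(title))
            .append("</title></head>\n<body>\n<p>\n");
        for (int i = 0; i < lines.size(); i++) {
            html.append(escapeHtml(lines.get(i)));
            if (i < lines.size() - 1) {
                html.append("<br>"); // no <br> after the last line
            }
            html.append('\n');
        }
        html.append("</p>\n</body>\n</html>\n");
        return html.toString();
    }
}
```

Assuming the text file is UTF-8, the question's readtxt/writer pair could then be replaced with something like Files.write(Paths.get(name), TextToHtml.toHtml("textfile", Files.readAllLines(Paths.get(Text), StandardCharsets.UTF_8)).getBytes(StandardCharsets.UTF_8));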
Good luck

Hebrew rendering in website

I am working on a product that has a web-based "Admin Panel" where the user can see information about the product. One of the minimum requirements is that the website has both an English and a Hebrew version. So what is the problem? Some of the characters are not displayed correctly: they do not look like the Hebrew letters they should be.
When I get a request from a browser, I read an HTML file using this code (Java):
public static String loadPage(String page, String lang) {
    Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
    try (BufferedReader br = Files.newBufferedReader(path)) {
        StringBuilder website = new StringBuilder();
        String currentLine;
        while ((currentLine = br.readLine()) != null) {
            website.append(currentLine);
        }
        return website.toString();
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return null;
}
(Thanks to Jon Skeet for helping with reading it as UTF-8.) After I read the file, I replace some of the comments with the correct data (for example, I have a comment like <!--username--> and I replace it with "Itay"). After the replacement I just send the response.
The server itself is hosted using Sun's HttpServer (com.sun.net.httpserver.HttpServer).
I also made sure to do these things:
I saved the HTML file as UTF-8
In the HTML file there is this meta tag: <meta charset="UTF-8">
One of the response headers is: Content-Type: text/html; charset=utf-8
By the way, I am using Chrome.
So I hope I gave enough details about my problem; if you need more, feel free to tell me!
(I also hope I posted the question with the right tags and title.)
Basically, don't use FileReader. It always uses the platform-default encoding, which may well not be appropriate for this file.
If you're using a modern version of Java, it's better to use:
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
br = Files.newBufferedReader(path);
That will read in UTF-8 by default - if you wanted a different charset, you can specify it as another argument to newBufferedReader.
I'd also advise you to use a try-with-resources statement to get rid of all the cruft with a manual finally block:
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
    StringBuilder website = new StringBuilder();
    String currentLine;
    while ((currentLine = br.readLine()) != null) {
        website.append(currentLine);
    }
    return website.toString();
}
That will remove all line breaks, mind you. (Note that I've used StringBuilder to avoid performance issues from repeated string concatenation...)
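If you want the template exactly as it is on disk, line breaks included, a minimal sketch (TemplateLoader and loadUtf8 are hypothetical names) is to read all the bytes and decode them as UTF-8 in one step:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TemplateLoader {

    // Reads the whole file and decodes it as UTF-8; unlike a readLine() loop,
    // line breaks are kept exactly as they appear in the file.
    static String loadUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

On Java 11 and later, Files.readString(path) does the same job and also defaults to UTF-8. One difference: it throws MalformedInputException on invalid UTF-8, while new String(bytes, UTF_8) silently substitutes U+FFFD for bad sequences.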
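You need to tell your FileReader to read the file as UTF-8 (since Java 11, FileReader has constructors that take a Charset, e.g. new FileReader(file, StandardCharsets.UTF_8)).
In the end I found that I really did have a problem reading as UTF-8, but the other problem was that I was not sending the response back as UTF-8. So this is how I sent it: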
public void end(HttpExchange t, String response, long tStart, int status) throws IOException {
    try {
        // Encode once: sendResponseHeaders needs the body length in bytes, which
        // for Hebrew text is larger than response.length(). (This replaces the
        // convertToUTF8 helper of the original version, which is not shown.)
        byte[] body = response.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        t.sendResponseHeaders(status, body.length);
        // try-with-resources closes the response body stream when done
        try (OutputStream os = t.getResponseBody()) {
            os.write(body);
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
    long tEnd = System.currentTimeMillis();
    long tDelta = tEnd - tStart;
    System.out.println("Done handling request! Time took: " + tDelta);
}
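Why the length matters: HttpExchange.sendResponseHeaders takes the response body length in bytes, but String.length() counts UTF-16 chars. A minimal sketch (Utf8Length and contentLength are hypothetical names) of computing the value to pass:

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {

    // The number of bytes the body occupies once encoded as UTF-8,
    // which is what a Content-Length value has to count.
    static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For example, the four-letter Hebrew word שלום has length() 4 but contentLength 8, because each Hebrew letter takes two bytes in UTF-8; a length computed with length() would be too small for the bytes actually written. For pure ASCII text such as "Itay" the two values are equal, which is why the mismatch only shows up with Hebrew.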
Again, thank you Jon Skeet for your answer, it was very helpful!
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
    StringBuilder website = new StringBuilder();
    String currentLine;
    while ((currentLine = br.readLine()) != null) {
        website.append(currentLine);
    }
    return website.toString();
}
(This is how to read the file as UTF-8 his way.)

How to write an S3 object to a String in java

I am trying to download the content of an S3Object into a String in the following way:
S3Object s3Object = amazonS3Client.getObject(bucketName, key);
S3ObjectInputStream stream = s3Object.getObjectContent();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(stream));
String text = "";
String temp = "";
try {
    while ((temp = bufferedReader.readLine()) != null) {
        text = text + temp;
    }
    bufferedReader.close();
    stream.close();
} catch (IOException e) {
    m_logger.error("Exception while reading the string " + e);
}
but while downloading the content, I get the following error:
Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:73)
at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
You can try jcabi-s3 (I'm its developer), which does this job for you:
Region region = new Region.Simple("key", "secret");
Bucket bucket = region.bucket("my.example.com");
Ocket.Text ocket = new Ocket.Text(bucket.ocket("test.txt"));
String content = ocket.read();
Check this blog post: http://www.yegor256.com/2014/05/26/amazon-s3-java-oop-adapter.html
Your code looks correct to me (although I'd put the close statements in a finally block, and handle line endings when concatenating the lines of the file in the text = text + temp statement).
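A minimal sketch of those two suggestions, assuming the object holds UTF-8 text (ObjectText and readLines are hypothetical names): try-with-resources closes the stream even when reading fails, and a StringBuilder keeps a newline after every line instead of gluing the lines together. It does not address the MD5 mismatch itself:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ObjectText {

    // Reads a text stream line by line, appending '\n' after each line.
    // try-with-resources plays the role of the suggested finally block:
    // the reader (and the underlying stream) is closed even on an exception.
    static String readLines(InputStream stream) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

It would be called as ObjectText.readLines(s3Object.getObjectContent()). Note that readLine() normalizes line endings: "\r\n" becomes "\n", and the last line always gets a trailing "\n".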
Looking at the error message, I get the feeling that this is something occurring in the framework. Have you tried to get data from another object? Or to download the object you're trying to read through an alternative means to see that the data isn't corrupted?
Good Luck!

Java string comparisons/quotations

My problem is the comparison of two objects and the strings that they return (accessed through getters).
Object one parses a CSV file for dates, and when printed out through exampleObject.getDateTime() it returns the string: "2010-03-26-10-54-06.471000"
Object two has its dateTime string set by the user. When I set the dateTime on object two to be the same as object one's and then call exampleObjectTwo.getDateTime(), it returns 2010-03-26-10-54-06.471000
So the main difference is that one string has quotation marks from parsing the CSV (which contains no quotation marks), while the user-set string has none when returned!
If anyone can offer an explanation as to why this is happening, I'd be very grateful!
Many thanks!
try {
    BufferedReader input = new BufferedReader(new FileReader(file));
    try {
        String line = null;
        while ((line = input.readLine()) != null) {
            SearchResult searchResult = new SearchResult();
            // skip the header row
            if (!line.contains("Date")) {
                String[] split = line.split(",");
                // set the value on the new instance, not on the class
                searchResult.setDateTime(split[0]);
                // searchResults: a List<SearchResult> declared elsewhere
                searchResults.add(searchResult);
            }
        }
    } finally {
        input.close();
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
Edit: above is the code I was using to parse the CSV file. I checked, and the CSV file does not contain any quotation marks.
Thanks for the quick and helpful response!
You need to modify/configure the CSV parser to remove the quotes.
If it's a homegrown CSV parser, doing this should suffice to get rid of the surrounding double quotes:
field = field.replaceAll("^\"|\"$", "");
If it's a third-party API, then you need to consult its documentation (or mention the library name here, so that someone willing to do so can look up the documentation for you).
See also:
How to parse CSV in Java?
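For a homegrown parser, a slightly fuller sketch than the regex above, assuming RFC 4180-style quoting (CsvField and unquote are hypothetical names): strip one pair of surrounding quotes and turn each doubled quote back into a single one:

```java
class CsvField {

    // Undoes RFC 4180-style quoting for a single field: if the field is wrapped
    // in double quotes, drop them and turn each doubled quote ("") back into ".
    // Unquoted fields are returned unchanged apart from surrounding whitespace.
    static String unquote(String field) {
        String f = field.trim();
        if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
            return f.substring(1, f.length() - 1).replace("\"\"", "\"");
        }
        return f;
    }
}
```

In the question's loop this would be searchResult.setDateTime(CsvField.unquote(split[0])). Trimming whitespace is a design choice (RFC 4180 treats spaces as part of the field). This also cannot repair a quoted field that contains a comma, because line.split(",") has already cut it in two; that case needs a real CSV parser, as in the linked question.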
Check again how you perform the CSV parsing,
or remove the quotation marks from the string:
String newDate = oldString.replace("\"", "").trim();
