The code below fetches the source of the given URL without any errors. What I am looking for is a way to format the source code I receive.
Until now I did this manually: I went to http://www.freeformatter.com/html-formatter.html, pasted in my source code and formatted it with the "3 spaces per indent" option. How do I get my Java code to do the same formatting for me?
The reason I want it formatted is that I have another script which reads it line by line, saves the data it needs and ignores the rest.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

private static String getUrlSource(String url) throws IOException {
    URL x = new URL(url);
    URLConnection yc = x.openConnection();
    StringBuilder a = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
            yc.getInputStream(), "UTF-8"))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            a.append(inputLine);
            a.append("\n");
        }
    }
    return a.toString();
}

public static void main(String[] args) {
    System.out.println("Hello");
    String url = "http://www.bctransit.com/regions/cfv/schedules/schedule.cfm?p=day.text&route=1%3A0&day=1&";
    try {
        // fetch the page source and print it
        String value = getUrlSource(url);
        System.out.println(value);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
If you are scraping a web page, I suggest using a real HTML parser instead. Your method is bound to fail sooner or later.
I would recommend having a look at jsoup. While I have never used it, I have had great results with its Python counterpart, Beautiful Soup.
Using a library such as jsoup will get you a nice object model to traverse instead of relying on string manipulation.
As a bonus, jsoup will actually format the HTML string for you, should you want that anyway.
I'm pretty new to the programming world, and I can't find a good explanation of how to load a txt file into a String variable in Java using Eclipse.
So far, from what I have been able to understand, I am supposed to use the StdIn class, and I know that the txt file needs to be located in my Eclipse workspace (outside the source folder), but I don't know exactly what I need to write in the code to load the given file into the variable.
I could really use some help with this.
Although I'm not a Java expert, I'm pretty sure this is the information you're looking for. It looks like this:
static String readFile(String path, Charset encoding)
        throws IOException
{
    byte[] encoded = Files.readAllBytes(Paths.get(path));
    return new String(encoded, encoding);
}
Basically all languages provide you with some methods to read from the file system you're in. Hope that does it for you!
Good luck with your project!
To read a file and store it in a String, you can use either a String or a StringBuilder:
Create a BufferedReader that wraps a FileReader constructed with the name of the file, so that it is ready to read from the file.
Use a StringBuilder to append every line that is read.
When reading has finished, store the result in the String data.
public static void main(String[] args) {
    String data = "";
    // try-with-resources closes the reader automatically
    try (BufferedReader br = new BufferedReader(new FileReader("filename"))) {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        while (line != null) {
            sb.append(line);
            sb.append("\n");
            line = br.readLine();
        }
        data = sb.toString();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
I want to get the value of "Yield" from "http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319".
How can I do this with Java?
I have tried jsoup, and my code looks like this:
public static void main(String[] args) throws IOException {
String url = "http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319";
Document document = Jsoup.connect(url).get();
Elements answerers = document.select(".c3 .floatR ");
for (Element answerer : answerers) {
System.out.println("Answerer: " + answerer.data());
}
// TODO code application logic here
}
But it returns nothing. How can I do this?
Your code is fine. I tested it myself. The problem is the URL you're using. If I open the URL in a browser, the value fields (e.g. Yield) are empty at first. Using the browser's developer tools (Network tab) you should find a URL that looks like:
http://www.aastocks.com/en/ltp/RTQuoteContent.aspx?symbol=01319&process=y
Using this URL gives you the wanted results.
The simplest solution is to create a URL instance pointing to the web page or link you want, and read its content using streams.
For example:
public static void main(String[] args) throws IOException
{
URL url = new URL("http://www.aastocks.com/en/ltp/rtquote.aspx?symbol=01319");
// Get the input stream through URL Connection
URLConnection con = url.openConnection();
InputStream is =con.getInputStream();
// Once you have the Input Stream, it's just plain old Java IO stuff.
// For this case, since you are interested in getting plain-text web page
// I'll use a reader and output the text content to System.out.
// For binary content, it's better to directly read the bytes from stream and write
// to the target file.
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
// read each line and write to System.out
while ((line = br.readLine()) != null) {
System.out.println(line);
}
}
I think jsoup is the right tool for this purpose. I would not assume that the page is a valid HTML document (or anything else).
I am working on a product which has a web-based "Admin Panel", somewhere the user can see information about the product. One of the minimal requirements is that the website has both an English and a Hebrew version. So what is the problem? The problem is that some of the characters look like this, but they should look like this.
When I get a request from a browser, I read an HTML file using this code (Java):
public static String loadPage(String page, String lang) {
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
StringBuilder website = new StringBuilder();
String currentLine;
while ((currentLine = br.readLine()) != null) {
website.append(currentLine);
}
return website.toString();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
(Thanks to Jon Skeet for helping with reading it as UTF-8.) After I read the file, I replace some of the comments with the correct data (for example, I have a comment like <!--username--> and I replace it with "Itay"). After the replacement I just send the response.
The server itself is hosted using Sun's HttpServer.
I also made sure to do these things:
I saved the html file as UTF-8
In the HTML file there is this meta tag: <meta charset="UTF-8">
One of the response headers is: Content-Type=text/html;charset=utf-8
By the way, I am using Chrome.
So I hope I gave enough details about my problem and if you need more feel free to tell me!
(I also hope I posted the question with the right tags and title)
Basically, don't use FileReader. It always uses the platform-default encoding, which may well not be appropriate for this file.
If you're using a modern version of Java, it's better to use:
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
br = Files.newBufferedReader(path);
That will read in UTF-8 by default - if you wanted a different charset, you can specify it as another argument to newBufferedReader.
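To make the charset argument concrete, here is a minimal sketch. The class name CharsetReadSketch and the helper firstLine are made up for illustration; Files.newBufferedReader(Path, Charset) is the standard two-argument overload. Reading the same UTF-8 bytes with the wrong charset is what produces the garbled characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
final class CharsetReadSketch {

    // Returns the first line of the file, decoded with the given charset.
    static String firstLine(Path path, Charset charset) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(path, charset)) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "\u05E9\u05DC\u05D5\u05DD" is a four-letter Hebrew word, written here
        // with Unicode escapes so the source file encoding does not matter.
        Path tmp = Files.createTempFile("charset-sketch", ".txt");
        Files.write(tmp, "\u05E9\u05DC\u05D5\u05DD\n".getBytes(StandardCharsets.UTF_8));

        // Decoded with the charset the file was written in: 4 chars.
        System.out.println(firstLine(tmp, StandardCharsets.UTF_8).length());
        // Decoded with a single-byte charset: every byte becomes one char, so 8.
        System.out.println(firstLine(tmp, StandardCharsets.ISO_8859_1).length());

        Files.delete(tmp);
    }
}
```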
I'd also advise you to use a try-with-resources statement to get rid of all the cruft with a manual finally block:
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
StringBuilder website = new StringBuilder();
String currentLine;
while ((currentLine = br.readLine()) != null) {
website.append(currentLine);
}
return website.toString();
}
That will remove all line breaks, mind you. (Note that I've used StringBuilder to avoid performance issues from repeated string concatenation...)
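If the line breaks matter, one possible variant is sketched below. It assumes the file is UTF-8 and that "\n" is an acceptable separator in the result; the class and method names are placeholders, not part of the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class KeepLineBreaksSketch {

    // Reads all lines as UTF-8 and joins them with '\n',
    // so the line structure of the template survives.
    static String readKeepingLineBreaks(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        return String.join("\n", lines);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("linebreak-sketch", ".html");
        Files.write(tmp, "<html>\r\n<body></body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        // Both \r\n and \n in the file come back as a single \n between lines.
        System.out.println(readKeepingLineBreaks(tmp));
        Files.delete(tmp);
    }
}
```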
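You need to read the file as UTF-8. FileReader only accepts a charset from Java 11 onwards; on older versions, wrap a FileInputStream in an InputStreamReader and pass the charset there.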
In the end I found that I really did have a problem reading the file as UTF-8, but the other problem was that I had not sent the response back as UTF-8. So this is how I send it:
public void end(HttpExchange t, String response, long tStart, int status) throws IOException {
    try {
        // Encode once, so the declared length is the UTF-8 byte count
        // (String.length() counts chars, which is too small for Hebrew text).
        byte[] body = response.getBytes(StandardCharsets.UTF_8);
        t.sendResponseHeaders(status, body.length);
        // try-with-resources flushes and closes the response body
        try (OutputStream os = t.getResponseBody()) {
            os.write(body);
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
    long tEnd = System.currentTimeMillis();
    long tDelta = tEnd - tStart;
    System.out.println("Done handling request! Time took: " + tDelta);
}
Again, thank you Jon Skeet for your answer, it was very helpful!
Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
StringBuilder website = new StringBuilder();
String currentLine;
while ((currentLine = br.readLine()) != null) {
website.append(currentLine);
}
return website.toString();
}
(This is how to read the file as UTF-8, done his way.)
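One thing worth checking on the sending side: the length passed to sendResponseHeaders is a number of bytes, while String.length() counts chars. For Hebrew text encoded as UTF-8 the two differ. The small sketch below only illustrates that difference; the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class ContentLengthSketch {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A four-letter Hebrew word, written with Unicode escapes.
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";
        System.out.println(hebrew.length());        // prints 4 (chars)
        System.out.println(utf8ByteLength(hebrew)); // prints 8 (bytes)
    }
}
```

If the char count is sent as the response length, the browser is told to expect fewer bytes than the body really contains, so the byte count is the safer value to pass.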
A method returns a String in comma separated format. For example, the returned String can be like the one given below.
Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China
I will need to get this String and write it into a CSV file. I'll have to insert a header and a footer for this file as well.
For example, when I open the file, the contents for the above data will be
Name,Age,Gender,Country
Tarantino,50,M,USA
Carey Mulligan,27,F,UK
Gong Li,45,F,China
How do we do that? Are there any open source libraries for this task?
The CSV format is not very well defined, and a header row is optional. It is a pretty simple format: data values are separated by commas, semicolons, spaces, etc.
You just have to write your own simple method that writes your string to a file on the local computer, using a FileOutputStream or a Writer from the java.io package.
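A rough sketch of such a method follows. The question does not say what the footer should contain, so the footer text used in main is a placeholder, as are the class name CsvWriteSketch and the method name writeCsv. It uses Files.newBufferedWriter rather than a FileOutputStream only to keep the example short.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
final class CsvWriteSketch {

    // Writes the header, then one row per '\n'-separated record, then the footer.
    static void writeCsv(Path target, String header, String data, String footer)
            throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(header);
            out.newLine();
            for (String row : data.split("\n")) {
                out.write(row.trim()); // drop the stray space after each "\n"
                out.newLine();
            }
            out.write(footer);
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China";
        Path target = Files.createTempFile("people", ".csv");
        // "End of file" is an assumed footer, not taken from the question.
        writeCsv(target, "Name,Age,Gender,Country", data, "End of file");
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```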
You can use this as a learning example.
I used BufferedReader because it takes care of the line separators, but you can also use the split method and write the resulting tokens.
import java.io.*;
public class Tests {
public static void main(String[] args) {
File file = new File("out.csv");
BufferedWriter out = null;
try {
out = new BufferedWriter(new FileWriter(file));
String string = "Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China";
BufferedReader reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(string.getBytes())));
String line;
while ((line = reader.readLine()) != null) {
out.write(line.trim());
out.newLine();
}
}
catch (IOException e) {
// log something
e.printStackTrace();
}
finally {
if (out != null) {
try {
out.close();
} catch (IOException e) {
// ignored
}
}
}
}
}
This is pretty simple:
String str = "Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China";
PrintWriter pr = new PrintWriter(new FileWriter(new File("test.csv"), true));
String arr[] = str.split("\\n");
// splited the string by new line provided with the string
pr.println("Name,Age,Gender,Country");
// header written first and rest of data appended
for(String s : arr){
pr.println(s);
}
pr.close();
Don't forget to close the stream in a finally block and to handle the exception.
I know this is a bit naive, but how do I unit test this piece of code without giving it a physical file as input?
I am new to Mockito and unit testing, so I am not sure. Please help.
public static String fileToString(File file) throws IOException
{
BufferedReader br = new BufferedReader(new FileReader(file));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
return sb.toString();
} finally {
br.close();
}
}
You can create a file as part of the test, no need to mock it out.
JUnit does have a nice functionality for creating files used for testing and automatically cleaning them up using the TemporaryFolder rule.
public class MyTestClass {
    @Rule
    public TemporaryFolder folder = new TemporaryFolder();

    @Test
    public void myTest() throws IOException {
        // this folder gets cleaned up automatically by JUnit
        File file = folder.newFile("someTestFile.txt");
        // populate the file
        // run your test
    }
}
You should probably refactor your method. As you realized, a method taking a file as input isn't easily testable. Also, it seems to be static, which doesn't help testability. If you rewrite your method as:
public String fileToString(BufferedReader input) throws IOException
it will be much easier to test. You separate your business logic from the technicalities of reading a file. As I understand it, your business logic is reading a stream and ensuring the line endings are Unix style.
If you do that, your method will be testable. You also make it more generic : it can now read from a file, from a URL, or from any kind of stream. Better code, easier to test ...
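A minimal sketch of that refactoring, assuming the body stays the same as in the question. The class name ReaderToStringSketch is made up; the input comes from a StringReader, so no file is involved.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical class, for illustration only.
final class ReaderToStringSketch {

    // Same logic as the original fileToString, but reading from any BufferedReader.
    // Every line is terminated with '\n', whatever the input used.
    public String fileToString(BufferedReader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = input.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In a test, the "file" is just an in-memory string.
        BufferedReader input = new BufferedReader(new StringReader("first\r\nsecond"));
        String result = new ReaderToStringSketch().fileToString(input);
        System.out.println(result.equals("first\nsecond\n")); // prints true
    }
}
```

In a JUnit test, the check in main would become an assertEquals on the returned string.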
Why do you want to mock a file? Mocking java.io.File is a bad idea, as it relies on a lot of native code. I would advise you to ensure that a minimal text file is available on the classpath when the unit tests are run. You can convert this file to text and confirm the output.
You could use a combination of ByteArrayInputStream and BufferedReader to build the required file content within your code, so there is no need to create a real File on your system. (What would happen if, under some circumstances, you did not have permission to create a file?) In the code below, you supply whatever file content you want:
public static void main(String a[]){
String str = "converting to input stream"+
"\n and this is second line";
byte[] content = str.getBytes();
InputStream is = null;
BufferedReader bfReader = null;
try {
is = new ByteArrayInputStream(content);
bfReader = new BufferedReader(new InputStreamReader(is));
String temp = null;
while((temp = bfReader.readLine()) != null){
System.out.println(temp);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try{
if(is != null) is.close();
} catch (Exception ex){
}
}
}