How can I get specific words from a URL in Java - java

How can I get specific words from a URL in Java? For example, I want to extract the data from an element whose class is called something like blablabla.
Here is my code.
URL url = new URL("https://www.doviz.com/");
URLConnection connect = url.openConnection();
InputStream is = connect.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while((line = br.readLine()) != null)
{
System.out.println(line);
}

Take a look at Jsoup. It lets you get the content of a web page rather than the raw HTML code: playing roughly the role of a browser, it parses the HTML tags and gives you human-readable text.
Once you have the content of your page in a String, you can count the occurrences of your word with any occurrence-counting algorithm.
A simple example of how to use it:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
/* ........ */
String URL = "https://www.doviz.com/";
Document doc = Jsoup.connect(URL).get();
String text = doc.body().text();
System.out.println(text);
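For the counting step mentioned above, a minimal sketch could look like the following. It assumes you want whole-word, case-insensitive matches; the class name WordCounter and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {
    /** Counts whole-word, case-insensitive occurrences of word in text. */
    static int countOccurrences(String text, String word) {
        // Pattern.quote treats the word literally; \b restricts matches to word boundaries
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical page text; in practice this would be doc.body().text()
        String text = "Dolar rises, euro falls; dolar again";
        System.out.println(countOccurrences(text, "dolar"));
    }
}
```

If partial matches should count too (for example "dolar" inside "dolars"), dropping the two \b anchors would be one way to do it.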
EDIT
If you don't want to use a parser (you mentioned in the comments that you don't want external libraries), you will get the whole HTML code of the page. Here is how you can do it:
try {
    URL url = new URL("https://www.doviz.com/");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String str;
    // readLine() is called once per iteration; calling it again in the body would skip lines
    while ((str = in.readLine()) != null) {
        System.out.println(str);
        /* str holds the current line on each pass; to keep the whole page,
           append each line to a StringBuilder instead */
    }
    in.close();
} catch (IOException e) {
    e.printStackTrace(); // report the error instead of swallowing it
}
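Once the raw HTML is in a String, one library-free way to pull out the text of elements with a given class is a regular expression. This is only a rough sketch: regexes are fragile against real-world HTML, and it assumes double-quoted class attributes and no nested tags inside the element. The class name ClassTextExtractor and the sample markup are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassTextExtractor {
    /** Returns the text of every element whose class attribute contains className. */
    static List<String> textOfClass(String html, String className) {
        // <tag ... class="... className ..." ...>text</tag>, where text contains no nested tags
        Pattern p = Pattern.compile(
                "<(\\w+)[^>]*\\bclass=\"[^\"]*\\b" + Pattern.quote(className)
                        + "\\b[^\"]*\"[^>]*>([^<]*)</\\1>");
        Matcher m = p.matcher(html);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(2).trim()); // group 2 is the element's text
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup, not taken from the real page
        String html = "<div><span class=\"name\">USD</span>"
                + "<span class=\"value up\">8,45</span></div>";
        System.out.println(textOfClass(html, "value"));
    }
}
```

For anything beyond simple pages, a real parser such as Jsoup is the more robust choice.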

Related

Reading Data (Numbers) from a Website

I would like to create an Excel file with data from a website. In my InputStream I find some content from the page, but not the things I am looking for.
This is the website I want the data from: https://www.finanzen.net/bilanz_guv/adidas
As an example, I would like a System.out.println that prints the earnings per share (in German: "Ergebnis je Aktie") for the years 2011 to 2017, so it would be the following numbers:
3,20 2,51 3,76 2,35 3,30 5,08 6,69
What I have managed so far:
URL u = new URL("https://www.finanzen.net/bilanz_guv/adidas");
InputStream in = u.openStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder result = new StringBuilder();
String line;
while((line = reader.readLine()) != null) {
result.append(line);
}
System.out.println(result.toString());
But the result String does not contain any of the numbers I am searching for.
It does contain the first line of the page's source code, as if I had clicked "show source code" for the web page in my browser.
As I don't have much programming knowledge, please keep the answers simple :-)
Thanks

How to replace a line with a new line using Java

Using a BufferedReader, I read through a file line by line. If the Oranges: pattern is found, I want to replace it with ApplesAndOranges.
try (BufferedReader br = new BufferedReader(new FileReader(resourcesFilePath))) {
String line;
while ((line = br.readLine()) != null) {
if (line.startsWith("Oranges:")){
int startIndex = line.indexOf(":");
line = line.substring(startIndex + 2);
String updatedLine = "ApplesAndOranges";
updateLine(line, updatedLine);
I call a method updateLine and I pass my original line as well as the updated line value.
private static void updateLine(String toUpdate, String updated) throws IOException {
BufferedReader file = new BufferedReader(new FileReader(resourcesFilePath));
PrintWriter writer = new PrintWriter(new File(resourcesFilePath+".out"), "UTF-8");
String line;
while ((line = file.readLine()) != null)
{
line = line.replace(toUpdate, updated);
writer.println(line);
}
file.close();
if (writer.checkError())
throw new IOException("Can't Write To File"+ resourcesFilePath);
writer.close();
}
To get the file to update, I have to save it under a different name (resourcesFilePath+".out"). If I use the original file name, the saved version becomes blank.
So here is my question: how can I replace a line with any value in the original file without losing any data?
For this you need to use the regular expressions (RegExp) like this:
str = str.replaceAll("^Orange:(.*)", "OrangeAndApples:$1");
It's an example and maybe it's not exactly what you want. In the first parameter, the expression in parentheses is called a capturing group. The matched text is replaced by the second parameter, and the $1 in it is replaced by the value of the capturing group. In our example, Orange:Hello at the beginning of the string is replaced by OrangeAndApples:Hello.
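To see the capturing group in action, here is a small self-contained sketch. It uses the Oranges: prefix from the question and adds the (?m) flag so that ^ and $ match at every line of a multi-line String; the class and method names are made up for the example.

```java
class CaptureGroupDemo {
    /** Renames the "Oranges:" prefix on any line, keeping the rest of the line via $1. */
    static String rename(String text) {
        // (?m) makes ^ and $ match at each line, not only at the start and end of the input
        return text.replaceAll("(?m)^Oranges:(.*)$", "ApplesAndOranges:$1");
    }

    public static void main(String[] args) {
        String text = "Apples:1\nOranges:Hello\nPears:3";
        System.out.println(rename(text));
    }
}
```

Lines that merely contain Oranges: somewhere in the middle are left alone, because the pattern is anchored to the start of the line.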
In your code, it seems you create one file per line... maybe inlining the sub-method would be better.
try (
BufferedReader br = new BufferedReader(new FileReader(resourcesFilePath));
BufferedWriter writer = Files.newBufferedWriter(outputFilePath, charset);
) {
String line;
while ((line = br.readLine()) != null) {
String repl = line.replaceAll("Orange:(.*)","OrangeAndApples:$1");
writer.write(repl); // BufferedWriter has no writeln method
writer.newLine();
}
}
The easiest way to overwrite your original file would be to read in everything, changing whatever you want to change, and then close the stream. Afterwards, open the file again and overwrite it and all its lines with the data you want.
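A minimal sketch of that read-everything-then-overwrite idea, using java.nio.file.Files, might look like this. It assumes the file is small enough to fit in memory and is UTF-8 encoded; the class name, the demo helper and the temp-file name are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InPlaceReplace {
    /** Replaces every line starting with prefix by replacement, rewriting the same file. */
    static void replaceLines(Path file, String prefix, String replacement) throws IOException {
        // readAllLines reads the whole file and closes it before we write
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> updated = new ArrayList<>();
        for (String line : lines) {
            updated.add(line.startsWith(prefix) ? replacement : line);
        }
        // Files.write truncates the existing file by default, then writes the new lines
        Files.write(file, updated, StandardCharsets.UTF_8);
    }

    /** Round trip on a temporary file so the sketch can be tried safely. */
    static List<String> demo(List<String> original) {
        try {
            Path tmp = Files.createTempFile("fruits", ".txt");
            Files.write(tmp, original, StandardCharsets.UTF_8);
            replaceLines(tmp, "Oranges:", "ApplesAndOranges");
            List<String> result = Files.readAllLines(tmp, StandardCharsets.UTF_8);
            Files.delete(tmp);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList("Apples: 1", "Oranges: 2", "Pears: 3")));
    }
}
```

Because the file is fully read before it is reopened for writing, the original name can be reused without ending up with a blank file.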
You can use RandomAccessFile to write to the file, and nio.Files to read the bytes from it. In this case, I put it as a string.
You can also read the file with RandomAccessFile, but it is easier to do it this way, in my opinion.
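import java.io.RandomAccessFile;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
public void replace(File file){
    // try-with-resources closes the file even if an exception is thrown
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        Path p = file.toPath();
        String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        if(content.startsWith("Oranges:")){
            // Strings are immutable, so the result of replaceAll must be kept
            content = content.replaceAll("Oranges:", "ApplesandOranges:");
            raf.setLength(0); // drop the old content before writing
            // write raw bytes; writeUTF would prepend a 2-byte length header
            raf.write(content.getBytes(StandardCharsets.UTF_8));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}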

Getting Album Name with Song Title and Artist

I'm trying to get the album information/album cover of an artist in my program.
I'm trying to do it through this URL (example: madonna/frozen):
String urlToRead = "http://www.musicbrainz.org/ws/2/recording/?query=artist:madonna+recording:frozen";
to get the album information and cover photo and, if possible, information about the artist as well.
What I have tried so far:
String urlToRead = "http://www.musicbrainz.org/ws/2/recording/?query=artist:madonna+recording:frozen";
URL url;
HttpURLConnection conn;
BufferedReader rd;
String line;
String result = "";
try {
url = new URL(urlToRead);
conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while ((line = rd.readLine()) != null) {
result += line;
}
rd.close();
} catch (Exception e) {
e.printStackTrace();
}
String out = result;
But the output is a huge XML file in String format, full of information that doesn't really match what I want (so many random albums that contain Madonna's song Frozen).
Is there any simpler way to do it? If not, how can I extract exactly the information I need from my output?
Any tips?
Have a look at this, if you haven't already: https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2/Search
It contains a list of search fields you can use to narrow down your search results, including a LIMIT field if you only wish to retrieve one result.
As for the album art - that's handled by a separate API - https://musicbrainz.org/doc/Cover_Art_Archive/API - you can use their Java bindings in your program to fetch the album covers ( https://github.com/lastfm/coverartarchive-api )
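As a rough sketch of narrowing the search, the query URL could be built like this. It assumes the search endpoint accepts a Lucene-style query (quoted phrases joined with AND) plus a limit parameter, as described on the search page linked above; the class and method names are hypothetical, so check the details against the MusicBrainz documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class MusicBrainzQuery {
    /** Builds a recording search URL restricted to one artist and title (assumed query syntax). */
    static String recordingSearchUrl(String artist, String title, int limit) {
        String query = "artist:\"" + artist + "\" AND recording:\"" + title + "\"";
        try {
            // URLEncoder turns spaces into '+' and escapes ':' and '"'
            return "https://musicbrainz.org/ws/2/recording/?query="
                    + URLEncoder.encode(query, "UTF-8") + "&limit=" + limit;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(recordingSearchUrl("madonna", "frozen", 1));
    }
}
```

The resulting String can be passed to new URL(...) in place of urlToRead in the question's code.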

How do you get StringTokenizer to ignore text?

I have this code:
public void readTroops() {
File file = new File("resources/objects/troops.txt");
StringBuffer contents = new StringBuffer();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(file));
String text = null;
// repeat until all lines is read
while ((text = reader.readLine()) != null) {
StringTokenizer troops = new StringTokenizer(text,"=");
String list = troops.nextToken();
String value = troops.nextToken();
}
and this file:
//this is a comment part of the text file//
Total=1
The problem is that I can't get it to ignore everything between the // and //, and I can't get it to read the file when there is an 'ENTER' (blank line) in between. For example, this text works:
Total=1
So my question is: what do I type into the delimiter area, i.e.
StringTokenizer troops = new StringTokenizer(text,"=","WHAT GOES HERE?");
So how can I get the tokenizer to ignore 'ENTER'/new lines and anything between the // markers, or something similar? Thanks.
P.S. I don't mind if you use String.split to answer my question.
Use the method countTokens to skip lines that don't have two tokens:
while ((text = reader.readLine()) != null) {
StringTokenizer troops = new StringTokenizer(text,"=");
if(troops.countTokens() == 2){
String list = troops.nextToken();
String value = troops.nextToken();
....
}else {
//ignore this line
}
}
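Since the question allows String.split, the same skipping logic can be sketched without StringTokenizer. The version below assumes comment lines start with // and that each data line has the form key=value; the class name TroopsParser is made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TroopsParser {
    /** Parses key=value lines, skipping blank lines and lines that start with "//". */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank line or comment
            }
            String[] parts = line.split("=", 2); // limit 2 keeps any '=' inside the value
            if (parts.length == 2) {
                values.put(parts[0].trim(), parts[1].trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "//this is a comment part of the text file//", "", "Total=1");
        System.out.println(parse(lines));
    }
}
```

In the original loop, each line returned by reader.readLine() would go through the same checks.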
Properties prop = new Properties();
prop.load(new FileInputStream("properties_file.txt"));
assertEquals("1", prop.getProperty("Total"));
P.S. You should keep a reference to the input stream and close it when you are done.
Thinking out of the box, maybe you can use Properties instead of tokenizer (if you update your comments to start with #)?
Properties troops = new Properties();
InputStream inputStream = SomeClass.class.getResourceAsStream("troops.properties");
try {
troops.load(inputStream);
} catch (IOException e) {
// Handle error
} finally {
// Close inputStream in a safe manner
}
troops.getProperty("Total"); // Returns "1"
Or if you are using Java 7:
Properties troops = new Properties();
try (InputStream inputStream = SomeClass.class.getResourceAsStream("troops.properties")) {
troops.load(inputStream);
} catch (IOException e) {
// Handle error
}
troops.getProperty("Total"); // Returns "1"
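To try the Properties approach without a file on the classpath, the content can be loaded from a String. This is a small sketch under the assumption that the comment line has been rewritten to start with #; the class name TroopsProperties is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class TroopsProperties {
    /** Loads key=value pairs from text; '#' comment lines and blank lines are ignored. */
    static Properties load(String content) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(content)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties troops = load("# this is a comment part of the text file\n\nTotal=1\n");
        System.out.println(troops.getProperty("Total"));
    }
}
```

A line that still starts with // would not be skipped: Properties would treat it as a key/value pair, which is why the comments need the # prefix.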
If you are reading from a file, a better way would be to use a StreamTokenizer, which lets you declare your own syntax for the tokenizer. I used this method to create an HTML rendering engine. It lets you parse directly from a reader and also provides useful functions to identify numbers, which it seems you may need.
(I will post an example once my eclipse loads!)
public static String render(String file, HashMap vars){
// Create a stringbuffer to rebuild the string
StringBuffer renderedFile = new StringBuffer();
try{
FileReader in = new FileReader(file);
BufferedReader reader = new BufferedReader(in); // create your reader
StreamTokenizer tok;
tok = new StreamTokenizer(reader); //the tokenizer then takes in the reader as a builder
tok.resetSyntax();
tok.wordChars(0, 255); //sets all chars (inc spaces to be counted as words)
/*
* quoteChar makes '$' a string delimiter: text between two '$' (for example $hello$)
* comes back as a single token in sval, here used as a key into vars; it is not skipped
*/
tok.quoteChar('$');
while(tok.nextToken()!=StreamTokenizer.TT_EOF){ //while it is not at the end of file
String s = tok.sval;
if (vars.containsKey(s))
s =(String)vars.get(s);
renderedFile.append(s);
}
}
catch(Exception e){System.out.println("Error Loading Template");}
return renderedFile.toString();
}
Check this out for a good tutorial http://tutorials.jenkov.com/java-io/streamtokenizer.html
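Applied to the troops file from the question, a StreamTokenizer sketch could look like this. It assumes values are numeric and comments start with //; slashSlashComments(true) makes the tokenizer skip from // to the end of the line, and blank lines are ignored as whitespace. The class name TroopTokenizer is made up.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class TroopTokenizer {
    /** Reads name=number pairs, ignoring // comments and blank lines. */
    static Map<String, Double> parse(String content) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(content));
        tok.ordinaryChar('/');        // by default a single '/' already starts a comment
        tok.slashSlashComments(true); // only "//" starts a comment now
        Map<String, Double> values = new LinkedHashMap<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                if (tok.ttype != StreamTokenizer.TT_WORD) {
                    continue; // skip anything that is not a name
                }
                String key = tok.sval;
                if (tok.nextToken() != '=') {
                    tok.pushBack(); // not a pair; re-examine this token
                    continue;
                }
                if (tok.nextToken() == StreamTokenizer.TT_NUMBER) {
                    values.put(key, tok.nval);
                } else {
                    tok.pushBack();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        String file = "//this is a comment part of the text file//\n\nTotal=1\n";
        System.out.println(parse(file));
    }
}
```

Note that StreamTokenizer reports numbers as double in nval, so an integer count needs a cast.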

Extract the main part of a page in Java

Hello
I have the Wikipedia page of a public figure, and I want to use Java to extract the HTML source of the main part of that page.
Do you have any ideas?
Use Jsoup, specifically the selector syntax.
Document doc = Jsoup.parse(new URL("http://en.wikipedia.org/"), 10000);
Elements interestingParts = doc.select("div.interestingClass");
//get the combined HTML fragments as a String
String selectedHtmlAsString = interestingParts.html();
//get all the links
Elements links = interestingParts.select("a[href]");
//filter the document to include certain tags only
Whitelist allowedTags = Whitelist.simpleText().addTags("blockquote","code", "p");
Cleaner cleaner = new Cleaner(allowedTags);
Document filteredDoc = cleaner.clean(doc);
It's a very useful API for parsing HTML pages and extracting the desired data.
For wikipedia there is API: http://www.mediawiki.org/wiki/API:Main_page
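As a sketch of how that API might be called, the request URL for the rendered HTML of one article could be built as below. It assumes the action=parse module with the page, prop=text and format=json parameters on the English Wikipedia endpoint; the class name and the example title are hypothetical, so check the API documentation for the exact options.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class WikipediaApiUrl {
    /** Builds a MediaWiki "parse" request URL for the rendered HTML of a page. */
    static String parseUrl(String pageTitle) {
        try {
            return "https://en.wikipedia.org/w/api.php?action=parse&page="
                    + URLEncoder.encode(pageTitle, "UTF-8")
                    + "&prop=text&format=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUrl("Albert Einstein"));
    }
}
```

The response is JSON wrapping the article HTML, so it would still need to be unpacked before the HTML itself can be handed to a parser.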
Analyze web page's structure
Use JSoup to parse HTML
Note that this returns a STRING (blob of a sort) of the HTML source code, not a nicely formatted content item.
I use this myself: a little snippet I keep for whatever I need. Pass in the URL, any start and stop text, or the boolean to get everything.
public static String getPage(
String url,
String booleanStart,
String booleanStop,
boolean getAll) throws Exception {
StringBuilder page = new StringBuilder();
URL iso3 = new URL(url);
URLConnection iso3conn = iso3.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(
iso3conn.getInputStream()));
String inputLine;
if (getAll) {
while ((inputLine = in.readLine()) != null) {
page.append(inputLine);
}
} else {
boolean save = false;
while ((inputLine = in.readLine()) != null) {
if (inputLine.contains(booleanStart))
save = true;
if (save)
page.append(inputLine);
if (save && inputLine.contains(booleanStop)) {
break;
}
}
}
in.close();
return page.toString();
}
