How to do web scraping? [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am trying to get data from another site.
I want these items:
apple, anar, andi, arabi, Lucknow, date
…from this site:
http://www.upmandiparishad.in/MWRates.asp
My original source code…
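import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class readURL {
    public static void main(String[] args) {
        String generate_URL = "http://www.upmandiparishad.in/MWRates.asp";
        try {
            URL data = new URL(generate_URL);
            URLConnection yc = data.openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()))) {
                String inputLine;
                // Print the raw HTML of the page line by line
                while ((inputLine = in.readLine()) != null) {
                    System.out.println(inputLine);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}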
My updated source code using the jsoup library…
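import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class parse3 {
    public static void print(String url) throws IOException {
        // Fetch and parse the page, allowing up to 20 seconds
        Document doc = Jsoup.connect(url).timeout(20 * 1000).get();
        // First <td> inside a table whose sibling index is 1
        Element pending = doc.select("table td:eq(1)").first();
        // Number of <td> elements whose sibling index is 0
        int nex = doc.select("table td:eq(0)").size();
        //System.out.println(nex);
        System.out.println(pending.text());
    }
    public static void main(String[] args) throws IOException {
        String url = "http://www.upmandiparishad.in/MWRates.asp";
        // print is static, so no parse3 instance is needed
        print(url);
    }
}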

You need to download the page and parse the HTML for the keywords you are looking for.
Since you are using Java, use jsoup for this.
jsoup can both download the page and extract the elements you are looking for.
UPDATE
To get the rates of all the items, you have to read the options of the select tag. Here document is the parsed page, i.e. the Document returned by Jsoup.connect(url).get():
Elements options = document.select("select#comcode > option");
for(Element element : options){
System.out.println("Price of " + element.text() + ":" + element.attr("value"));
}
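If jsoup cannot be added to the project, roughly the same option extraction can be sketched with java.util.regex from the standard library. This is only a stand-in for the jsoup selector above, not a general HTML parser: the class name OptionExtractor and the sample markup are made up for illustration, and the pattern assumes double-quoted value attributes with no nested tags, which the real page may not satisfy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionExtractor {
    // Matches <option value="...">label</option>.
    // Assumption: value is double-quoted and the label contains no nested tags.
    private static final Pattern OPTION = Pattern.compile(
            "<option\\s+value=\"([^\"]*)\"[^>]*>([^<]*)</option>",
            Pattern.CASE_INSENSITIVE);

    // Returns label -> value pairs in document order.
    static Map<String, String> extractOptions(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = OPTION.matcher(html);
        while (m.find()) {
            result.put(m.group(2).trim(), m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup, loosely modelled on a select with id "comcode"
        String html = "<select id=\"comcode\">"
                + "<option value=\"1\">Apple</option>"
                + "<option value=\"2\">Anar</option>"
                + "</select>";
        extractOptions(html).forEach(
                (label, value) -> System.out.println(label + ":" + value));
    }
}
```

For anything beyond a quick check, a real HTML parser such as jsoup remains the more robust choice, because regular expressions break easily on attribute order, quoting, and nesting.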


Get the list of all URLs on the website using Java [closed]

There are many libraries (e.g. Jsoup) that can do this task in one go, but how can I get all the URLs present in the HTML content of a website using Java without any external libraries?
Edit 1: Can anyone explain what scanner.useDelimiter("\Z") actually does, and what the difference is between scanner.useDelimiter("\Z") and scanner.useDelimiter("\z")?
I am answering my own question as I was trying to find the accurate answer on StackOverflow but couldn't find one.
Here is the code:
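import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListUrls {
    public static void main(String[] args) {
        List<String> finalResult = new ArrayList<String>();
        String content = null;
        try {
            URLConnection connection = new URL("https://yahoo.com").openConnection();
            // "\\Z" matches at the end of the input (before a final line terminator, if any),
            // so next() returns the whole page as a single token
            try (Scanner scanner = new Scanner(connection.getInputStream())) {
                scanner.useDelimiter("\\Z");
                content = scanner.next();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        // content stays null if the download failed; skip matching to avoid a NullPointerException
        if (content != null) {
            String regex = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
            Matcher m = Pattern.compile(regex).matcher(content);
            while (m.find()) {
                // Keep each URL once, in order of first appearance
                if (!finalResult.contains(m.group())) {
                    finalResult.add(m.group());
                }
            }
        }
        for (String res : finalResult) {
            System.out.println(res);
        }
    }
}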
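On the \Z versus \z part of the question: in java.util.regex, \Z matches at the end of the input but before a final line terminator if there is one, while \z matches only at the very end of the input. The following small sketch shows the practical effect when either one is used as a Scanner delimiter; the class name DelimiterDemo, the helper readAll, and the sample text are made up for illustration.

```java
import java.util.Scanner;

class DelimiterDemo {
    // Reads the whole input as one token, using the given end-of-input delimiter.
    static String readAll(String input, String delimiter) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        String input = "first line\nsecond line\n";
        // \Z stops before the final line terminator, so the trailing "\n" is dropped
        String withUpperZ = readAll(input, "\\Z");
        // \z matches only at the very end, so the trailing "\n" is kept
        String withLowerZ = readAll(input, "\\z");
        System.out.println("\\Z keeps trailing newline: " + withUpperZ.endsWith("\n"));
        System.out.println("\\z keeps trailing newline: " + withLowerZ.endsWith("\n"));
    }
}
```

For URL extraction the difference rarely matters, since a trailing line terminator is not part of any URL.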
You can try using a regex.
Here is an example of a regex that checks whether a given text is a URL:
https://www.regextester.com/96504.
That said, I can't help saying that Jsoup is the right fit for this, even though it is an external library.

Searching for a sentence in a file java [closed]

I am really stuck on this.
I have an input file, say input.txt.
The content of input.txt is:
Using a musical analogy, hardware is like a musical instrument and software is like the notes played on that instrument.
Now I want to search for the text
like a musical instrument
How can I search for the above text in input.txt in Java? Any help?
To search for a substring in Java, String provides the contains() method. Try using it; the following snippet serves the purpose:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SearchSentence {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the file when the search ends
        try (BufferedReader br = new BufferedReader(new FileReader("input.txt"))) {
            String s;
            while ((s = br.readLine()) != null) {
                // contains() does a case-sensitive substring match within one line
                if (s.contains("like a musical instrument")) {
                    System.out.println("String found");
                    return;
                }
            }
        }
        System.out.println("String not found");
    }
}
You can always use the String#contains() method to search for substrings. Here, we will read each line in the file one by one, and check for a string match. If a match is found, we will stop reading the file and print Match is found!
package com.adi.search.string;
import java.io.*;
import java.util.*;
public class SearchString {
public static void main(String[] args) {
String inputFileName = "input.txt";
String matchString = "like a musical instrument";
boolean matchFound = false;
try(Scanner scanner = new Scanner(new FileInputStream(inputFileName))) {
while(scanner.hasNextLine()) {
if(scanner.nextLine().contains(matchString)) {
matchFound = true;
break;
}
}
} catch(IOException exception) {
exception.printStackTrace();
}
if(matchFound)
System.out.println("Match is found!");
else
System.out.println("Match not found!");
}
}
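Both answers above check one line at a time, so they would miss a phrase that is wrapped across a line break. A minimal sketch of one way around that, assuming the file is small enough to load into memory: read the whole file, collapse every run of whitespace into a single space, and then call contains(). The class name WholeFileSearch and the sample text written to a temporary file are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileSearch {
    // Returns true if phrase occurs in the file, treating any run of
    // whitespace (including line breaks) as a single space.
    static boolean containsPhrase(Path file, String phrase) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String normalized = text.replaceAll("\\s+", " ");
        return normalized.contains(phrase);
    }

    public static void main(String[] args) throws IOException {
        // Sample file in which the phrase is split across two lines
        Path file = Files.createTempFile("input", ".txt");
        Files.write(file,
                "hardware is like a musical\ninstrument and software"
                        .getBytes(StandardCharsets.UTF_8));
        System.out.println(containsPhrase(file, "like a musical instrument"));
        Files.delete(file);
    }
}
```

For very large files the line-by-line approach is still preferable, since it never holds more than one line in memory.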

unreported exception filenotfoundexception [closed]

I am trying to create a stream so I can read 2 lines from a .txt file into two string variables. I tried try/catch, but I still get an unreported exception error.
public class Shad1 {
public void myMethod()throws FileNotFoundException {
String stringName = new String("");
String stringNumb = new String("");
File file = new File ("c:\\input.txt");
try {
DataInputStream input = new DataInputStream(new FileInputStream(file));
int check = input.read();
char data = input.readChar();
while(data != '\n') {
stringName = stringName + data;
}
while (check != -1){
stringNumb = stringNumb + data;}
input.close();
} catch (FileNotFoundException fnfe){System.out.println(fnfe.getMessage());}
}
You're using the read method; note that this method can also throw an IOException. For example, DataInputStream declares:
public final int read(byte[] b) throws IOException
The no-argument read() and readChar() that your code calls are declared with throws IOException as well.
So you'll also need to catch IOException, or declare that your method throws IOException. Since FileNotFoundException is a subclass of IOException, a single catch (IOException e) covers both.
Note that you don't need to do both. In your example code, you can either declare that your method throws FileNotFoundException or handle it in a catch block; you don't need both (unless some other part of the code in the method might generate an unhandled FileNotFoundException).
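As a rough illustration of the advice above, here is one way the two-line read could be written so that every checked exception is accounted for. It swaps DataInputStream for a BufferedReader, which is an assumption about what the question is really after (reading text lines), and the class name TwoLineReader is made up. The method declares throws IOException, and the caller handles it in a single catch block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TwoLineReader {
    // Returns the first two lines of the file; a missing line comes back as "".
    static String[] readTwoLines(Path file) throws IOException {
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String name = reader.readLine();
            String number = reader.readLine();
            return new String[] {
                name == null ? "" : name,
                number == null ? "" : number
            };
        }
    }

    public static void main(String[] args) {
        try {
            String[] lines = readTwoLines(Paths.get("c:\\input.txt"));
            System.out.println(lines[0]);
            System.out.println(lines[1]);
        } catch (IOException e) {
            // One handler covers a missing file as well as read errors
            System.out.println(e.getMessage());
        }
    }
}
```

Because readTwoLines only declares the exception and main only catches it, neither place does both, which matches the point made in the answer.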

How to read a file, Split the contents and print it? [closed]

I have a text file named "file" that contains the integers 1,2,3,4. I want to read this file, split the values, and print each value on a new line. How can I do that?
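Try this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class PrintNumbers {
    public static void main(String[] args) {
        // try-with-resources closes the Scanner and the underlying file
        try (Scanner sc = new Scanner(new File("number.txt"))) {
            // Split on commas and whitespace, so a newline after the
            // last number does not become part of the final token
            sc.useDelimiter("[,\\s]+");
            while (sc.hasNextInt()) {
                System.out.println(sc.nextInt());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}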
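Alternatively, read the file line by line and split each line on commas:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class Main {
    public static void main(String[] str) throws Exception {
        File f = new File("C:\\prince\\temp\\test.txt");
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)))) {
            String line;
            while ((line = br.readLine()) != null) {
                // split(",") returns the comma-separated pieces of the line
                String[] splitedTokens = line.split(",");
                for (String splitedToke : splitedTokens) {
                    System.out.println(splitedToke.trim());
                }
            }
        }
    }
}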
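One thing to watch when a Scanner uses a delimiter of just ",": if the file ends with a newline, the last token is "4" followed by that newline, which hasNextInt() does not accept, so the final number is silently skipped. The sketch below compares the two delimiters on an in-memory string; the class name DelimiterPitfall and the helper readInts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterPitfall {
    // Collects the leading run of integers from text, split by delimiterRegex.
    static List<Integer> readInts(String text, String delimiterRegex) {
        List<Integer> values = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(delimiterRegex);
            while (sc.hasNextInt()) {
                values.add(sc.nextInt());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String fileContent = "1,2,3,4\n"; // typical file content with a trailing newline
        // Comma only: the last token still carries the newline
        System.out.println(readInts(fileContent, ","));
        // Commas and whitespace: the newline is consumed as a delimiter
        System.out.println(readInts(fileContent, "[,\\s]+"));
    }
}
```

The line-by-line split() variant does not have this problem, because readLine() already strips the line terminator.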

Reading string data from file [closed]

I want to read string data from a file and pass each string in the file to some operation. For example, if my file contains links to a website, then I have to extract each link and parse its data. I have already done the parsing for sites by passing a URL as input. But now I think it is better to store all the links as strings in a file and pass them as arguments. So how can I read the URLs from a file and parse each URL's data? Can anyone show the code for doing this?
Assuming your file contains a URL on each line, do this:
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while((line = br.readLine()) != null) {
// do something with line.
}
But you should be more specific in your question. Where is the problem?
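To make the "do something with line" step a little more concrete, here is a minimal sketch that reads one URL per line, skips blank lines, and keeps only lines that parse as absolute URIs. The class name UrlFileReader, the helper readUrls, and the sample URLs are assumptions for illustration; what happens to each URL afterwards (fetching, parsing) is left out because the question does not specify it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class UrlFileReader {
    // Reads one URL per line; blank or invalid lines are skipped.
    static List<URI> readUrls(Path file) throws IOException {
        List<URI> urls = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                try {
                    URI uri = new URI(line);
                    // Keep only URIs that carry a scheme such as http or https
                    if (uri.isAbsolute()) {
                        urls.add(uri);
                    }
                } catch (URISyntaxException e) {
                    System.err.println("Skipping invalid URL: " + line);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input file with made-up URLs
        Path file = Files.createTempFile("urls", ".txt");
        Files.write(file, "http://example.com/a\n\nhttps://example.org\n"
                .getBytes(StandardCharsets.UTF_8));
        for (URI uri : readUrls(file)) {
            System.out.println(uri);
        }
        Files.delete(file);
    }
}
```

Each URI in the returned list can then be handed to whatever code already parses a single URL.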
This was your code that I read from your comment:
File file = new File("myfil");
try (FileInputStream fis = new FileInputStream(file)) {
int content; while ((content = fis.read()) != -1) { // convert to char and display it
System.out.print((char) content); }
Here is that code cleaned up:
File file = new File("myfil");
StringBuilder fileContent = new StringBuilder(); // accumulates the file content
// try-with-resources closes the stream automatically
try (FileInputStream fis = new FileInputStream(file)) {
    int content;
    while ((content = fis.read()) != -1) {
        fileContent.append((char) content); // each byte is treated as one character
    }
} catch (IOException e) {
    System.out.print("Problem reading file");
}
System.out.print(fileContent); // print it
Keep in mind that you will have to import a few classes. These are the import lines, if you don't already have them:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
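If you don't want to write the reading code yourself, just use the FileUtils class from Apache Commons IO.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.commons.io.FileUtils;
...
public void yourMethod() throws IOException {
List<String> lines = FileUtils.readLines(yourFile, StandardCharsets.UTF_8);
}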
You can use a regular expression to collect all the URLs in your file into a list, and then iterate over the list to process each one.
Here is a sample:
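import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetURL {
public static void extractUrls(String input, List<URL> allUrls)
throws MalformedURLException {
Pattern pattern = Pattern
.compile("\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www.)"
+ "(\\w+:\\w+@)?(([-\\w]+\\.)+(com|org|net|gov"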
+ "|mil|biz|info|mobi|name|aero|jobs|museum"
+ "|travel|[a-z]{2}))(:[\\d]{1,5})?"
+ "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?"
+ "((\\?([-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?"
+ "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)"
+ "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?"
+ "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*"
+ "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
allUrls.add(new URL(matcher.group()));
}
}
public static void main(String[] args) throws IOException {
List<URL> allUrls = new ArrayList<URL>();
// try-with-resources closes the file once all lines are read
try (BufferedReader br = new BufferedReader(new FileReader("./urls.txt"))) {
String line;
while ((line = br.readLine()) != null) {
extractUrls(line, allUrls);
}
}
Iterator<URL> it = allUrls.iterator();
while (it.hasNext()) {
//Do something
System.out.println(it.next().toString());
}
}
}
Take a look at Apache Commons FileUtils and its readFileToString(File source) method, which reads a file directly into a String.
