I need to read a value from https://new.ppy.sh/u/9889129, specifically the pp score inside the div with class profile-header-extra__rank-box. But my code returns nothing from there. How do I do it?
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class mainClass {
    public static void main(String[] args) throws Exception {
        String url = "https://new.ppy.sh/u/9889129";
        Document document = Jsoup.connect(url).get();
        String ppValue = document.select(".profile-header-extra__rank-global").text();
        System.out.println("PP: " + ppValue);
    }
}
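A likely cause: the osu! profile page fills the rank box in with JavaScript, and Jsoup only parses the static HTML the server returns; it never executes scripts. A minimal sketch of why the selector then comes back empty (assuming jsoup is on the classpath; the HTML string is a made-up stand-in for the served page, not the real response):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class EmptySelectCheck {
    public static void main(String[] args) {
        // Stand-in for the served HTML: the rank box exists, but its contents
        // are only filled in later by client-side JavaScript.
        String staticHtml = "<div class='profile-header-extra__rank-box'></div>";
        Document doc = Jsoup.parse(staticHtml);
        String pp = doc.select(".profile-header-extra__rank-global").text();
        System.out.println("PP: '" + pp + "'"); // prints PP: '' -- nothing to select
    }
}
```

To check this against the real page, print document.html() and search for the class name; if it never appears in the raw HTML, the value has to come from an API or embedded data rather than the rendered DOM.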
I'm working on code that parses a weather site.
I found the CSS class on the website that holds the data I need. How do I extract the date ("Tue, Oct 12") from it as a string?
import java.io.IOException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Pars {
    private static Document getPage() throws IOException {
        String url = "https://www.gismeteo.by/weather-mogilev-4251/3-day/";
        Document page = Jsoup.parse(new URL(url), 3000);
        return page;
    }

    public static void main(String[] args) throws IOException {
        Document page = getPage();
        Element Nameday = page.select("div [class=date date-2]").first();
        String date = Nameday.select("div [class=date date-2]").text();
        System.out.println(Nameday);
    }
}
The code is meant to parse the weather site. On the page I found the right class, which contains only the date and day of the week I need. But at the step where I convert the element's data to a string, an error occurs.
The problem is with the class selector; it should look like this: div.date.date-2
Working code example:
import java.io.IOException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Pars {
    private static Document getPage() throws IOException {
        String url = "https://www.gismeteo.by/weather-mogilev-4251/3-day/";
        return Jsoup.parse(new URL(url), 3000);
    }

    public static void main(String[] args) throws IOException {
        Document page = getPage();
        Element dateDiv = page.select("div.date.date-2").first();
        if (dateDiv != null) {
            String date = dateDiv.text();
            System.out.println(date);
        }
    }
}
Here is an answer to your problem: Jsoup select div having multiple classes
In the future, please make sure your question is more detailed and well structured. Here is the "asking questions" guideline: https://stackoverflow.com/help/how-to-ask
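For reference, the difference between the two selector forms can be seen on a tiny static document (a sketch, assuming jsoup is on the classpath):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div class='date date-2'>Tue, Oct 12</div>");
        // div.date.date-2 matches one element carrying both classes.
        System.out.println(doc.select("div.date.date-2").text());
        // "div [class=date]" contains a space, so it looks for a matching
        // *descendant* of a div and finds nothing here.
        System.out.println(doc.select("div [class=date]").size());
    }
}
```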
I have just started with Java; I'm not that good, I'm still a beginner.
What I'm trying to do is grab specific information from Yahoo Finance with Jsoup.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WebScraping {
    public static void main(String[] args) throws Exception {
        String url = "https://in.finance.yahoo.com/q/is?s=AAPL&annual";
        Document document = Jsoup.connect(url).get();
        String information = document.select(".yfnc_tabledata1").text();
        System.out.println("Information: " + information);
    }
}
But I get the whole table; I want specific information, like the Net Income, and only for the year 2015.
So I found the solution:
public class WebScraping {
    public static void main(String[] args) throws Exception {
        String url = "https://in.finance.yahoo.com/q/is?s=AAPL&annual";
        Document document = Jsoup.connect(url).get();
        // tr:eq(7) > td:eq(2) picks the eighth row's third cell: Net Income for 2015.
        String information = document.select("table tr:eq(7) > td:eq(2)").text();
        System.out.println("Information: " + information);
    }
}
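A caveat worth noting: Jsoup's :eq(n) is zero-based and purely positional, so this selector silently breaks if the site inserts or removes a row. A sketch against a static table (jsoup on the classpath assumed; the figures are made up for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class EqSelectorDemo {
    public static void main(String[] args) {
        String html = "<table>"
                + "<tr><td>Period</td><td>2016</td><td>2015</td></tr>"
                + "<tr><td>Net Income</td><td>1,111</td><td>2,222</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        // tr:eq(1) is the second row, td:eq(2) its third cell -> the 2015 column.
        System.out.println(doc.select("table tr:eq(1) > td:eq(2)").text());
    }
}
```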
When I scrape with the following code, it does not show any element within the body tag, but when I check manually with view-source, the elements are there in the body. How can I scrape the hyperlinks at the following URL?
public static void main(String[] args) throws SQLException, IOException {
    String search_url = "http://www.manta.com/search?search=geico";
    Document doc = Jsoup.connect(search_url).userAgent("Mozilla").get();
    System.out.println(doc);
    Elements links = doc.select("a[href]");
    System.out.println(links);
    for (Element a : links) {
        System.out.println(a);
        String linkhref = a.attr("href");
        System.out.println(linkhref);
    }
}
I need to get the absolute URLs of links, excluding links to files. I have this code, which gets me links, but some links are missing.
import java.io.IOException;
import java.net.URI;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws Exception {
        URI uri = new URI("http://www.niocchi.com/");
        printURLofPages(uri);
    }

    private static void printURLofPages(URI uri) throws IOException {
        Document doc = Jsoup.connect(uri.toString()).get();
        Elements links = doc.select("a[href~=^[^#]+$]");
        for (Element link : links) {
            String href = link.attr("abs:href");
            URL url = new URL(href);
            String path = url.getPath();
            int lastdot = path.lastIndexOf(".");
            if (lastdot > 0) {
                String extension = path.substring(lastdot);
                if (!extension.equalsIgnoreCase(".html") && !extension.equalsIgnoreCase(".htm"))
                    return;
            }
            System.out.println(href);
        }
    }
}
This code gets me the following links:
http://www.enormo.com/
http://www.vitalprix.com/
http://www.niocchi.com/javadoc
http://www.niocchi.com/
I need to get these links:
http://www.enormo.com/
http://www.vitalprix.com/
http://www.niocchi.com/javadoc
http://www.linkedin.com/in/flmommens
http://www.linkedin.com/in/ivanprado
http://www.linkedin.com/in/marcgracia
http://es.linkedin.com/in/tdibaja
http://www.linkody.com
http://www.niocchi.com/
Thanks a lot for any advice.
instead of
String href = link.attr("href");
try
String href = link.attr("abs:href");
EDIT docs: http://jsoup.org/cookbook/extracting-data/working-with-urls
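One more thing worth checking (an observation from the code shown, not something the asker confirmed): the return inside the extension check exits printURLofPages entirely at the first non-HTML link, dropping every link after it; continue skips only the offending link, which would explain the missing LinkedIn URLs. A minimal sketch with made-up hosts, assuming jsoup is on the classpath:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SkipNonHtml {
    public static void main(String[] args) {
        String html = "<a href='http://a.example/page.html'>a</a>"
                + "<a href='http://b.example/logo.png'>b</a>"
                + "<a href='http://c.example/profile'>c</a>";
        Document doc = Jsoup.parse(html, "http://a.example/");
        for (Element link : doc.select("a[href]")) {
            String href = link.attr("abs:href");
            // 'return' here would stop at logo.png and never reach c.example;
            // 'continue' only skips the image link.
            if (href.endsWith(".png")) continue;
            System.out.println(href);
        }
    }
}
```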
I am trying to get the data from a website, with this code:
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

@WebServlet(description = "get content from teamforge", urlPatterns = { "/JsoupEx" })
public class JsoupEx extends HttpServlet {
    private static final long serialVersionUID = 1L;
    private static final String URL = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";

    public JsoupEx() {
        super();
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Document doc = Jsoup.connect(URL).get();
        for (Element table : doc.select("table.DataTbl")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() > 1) {
                    System.out.println(tds.get(0).text() + ":" + tds.get(2).text());
                }
            }
        }
    }
}
I am using the jsoup parser. When I run it, I do not get any errors, just no output.
Please help with this.
With the following code
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Tester {
    private static final String URL = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect(URL).get();
        System.out.println(doc);
    }
}
I get a java.net.SocketTimeoutException: Read timed out. I think the particular URL you are trying to crawl is too slow for Jsoup's default timeout. Being in Europe, my connection might be slower than yours. However, you might want to check for this exception in the log of your application server.
By setting the timeout to 10 seconds, I was able to download and parse the document:
Connection connection = Jsoup.connect(URL);
connection.timeout(10000);
Document doc = connection.get();
System.out.println(doc);
With the rest of your code I get:
Population:78,413
Population Change Since 1990:53.00%
Population Density:6,897
Male:41,137
Female:37,278
.....
Thanks Julien. I tried with the following code and I'm still getting a SocketTimeoutException. The code is:
Connection connection = Jsoup.connect("http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505");
connection.timeout(10000);
Document doc = connection.get();
System.out.println(doc);