Fetching data from a webpage - java

I need to fetch data from the website "https://www.arbatunity.com/index.php". Specifically, I want the value at the top right of the page that says "current market profit".
I need this as a string value that can be updated.

With the JSoup library, this is easy:
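import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Download and parse the page (get() throws IOException on network errors)
Document doc = Jsoup.connect("https://www.arbatunity.com/index.php").get();
// Select the <b> inside the element with id "id_profit"
Elements elements = doc.select("#id_profit b");
String percent = "";
for (Element e : elements) {
    percent = e.html();
}
// percent holds the String you're looking for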
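
The snippet above reads the value once. Since the question asks for a value that can be updated, one option is to keep the latest result in a small holder and re-run the fetch whenever a fresh value is wanted. This is only a sketch: `RefreshableValue` and its `fetcher` are hypothetical names, and the canned supplier in `main` stands in for the Jsoup call so the example runs without network access.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Holds the most recently fetched value and replaces it on each refresh. */
class RefreshableValue {
    // Hypothetical hook: in real use this would wrap the Jsoup scraping code.
    private final Supplier<String> fetcher;
    private final AtomicReference<String> latest = new AtomicReference<>("");

    RefreshableValue(Supplier<String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Runs the fetcher once and stores its result; keeps the old value on failure. */
    String refresh() {
        try {
            String value = fetcher.get();
            if (value != null) {
                latest.set(value);
            }
        } catch (RuntimeException e) {
            // Network or parsing problem: keep the previous value.
        }
        return latest.get();
    }

    /** Returns the last successfully fetched value ("" before the first refresh). */
    String get() {
        return latest.get();
    }

    public static void main(String[] args) {
        // Stand-in fetcher that cycles through canned values instead of hitting the site.
        String[] samples = {"1.25%", "1.40%"};
        int[] calls = {0};
        RefreshableValue percent =
            new RefreshableValue(() -> samples[calls[0]++ % samples.length]);
        System.out.println(percent.refresh());
        System.out.println(percent.refresh());
    }
}
```

Because `Jsoup.connect(...).get()` throws a checked `IOException`, a real fetcher would need to catch it and rethrow it as an unchecked exception (for example `UncheckedIOException`) so that `refresh()` can fall back to the previous value. To update on a timer, `refresh` could be handed to a `ScheduledExecutorService`.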

Related

Trying to get the text out of a List of webelements with Selenium WebDriver

In my code, I find all elements with a specific name, then take each element's descendants to get its title, link and price. The price is where I'm having issues: every row shows the price tag of the first element in the WebElements list.
List<WebElement> autos = driver.findElements(By.xpath("//section[contains(@class ,'ui-search-results')]/ol/li//a[@class = 'ui-search-result__content ui-search-link']"));
for(WebElement auto : autos) {
    String model = auto.getAttribute("title");
    String page = auto.getAttribute("href");
    String price = auto.findElement(By.xpath("//span[@class = 'price-tag-fraction']")).getText();
    System.out.println(model + page + price);
}
The console prints model and page just fine, but the price is always the same one. I have already checked the site, and there is a price-tag-fraction per element.
When you use XPath and want to start searching from a specific element, you need to add a . to the start of the XPath. Without it, // searches from the document root, so every call returns the first match on the page. In your case
"//span[@class = 'price-tag-fraction']"
becomes
".//span[@class = 'price-tag-fraction']"
Your updated code
List<WebElement> autos = driver.findElements(By.xpath("//section[contains(@class ,'ui-search-results')]/ol/li//a[@class = 'ui-search-result__content ui-search-link']"));
for(WebElement auto : autos) {
    String model = auto.getAttribute("title");
    String page = auto.getAttribute("href");
    String price = auto.findElement(By.xpath(".//span[@class = 'price-tag-fraction']")).getText();
    System.out.println("Model: %s, Page: %s, Price: %s".formatted(model, page, price));
}
NOTE: I changed your print statement to make it easier to read. You could also write these to a CSV file and then open them later in Excel, etc. as a table.
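
The effect of the leading dot can be reproduced without a browser. The sketch below uses the JDK's own `javax.xml.xpath` API rather than Selenium, and the two-item markup is made up for illustration; it only assumes that both engines follow the same XPath rule, namely that `//` starts at the document root while `.//` starts at the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Hypothetical markup: two list items, each with its own price span.
    static final String XML =
        "<ol>"
        + "<li><a title='Car A'><span class='price-tag-fraction'>100</span></a></li>"
        + "<li><a title='Car B'><span class='price-tag-fraction'>200</span></a></li>"
        + "</ol>";

    /** Evaluates the expression once per <a> element, using that element as the context node. */
    static String[] pricesPerLink(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList links = (NodeList) xpath.evaluate("//a", doc, XPathConstants.NODESET);
            String[] prices = new String[links.getLength()];
            for (int i = 0; i < links.getLength(); i++) {
                Node link = links.item(i);
                // As a STRING result, XPath yields the text of the first matching node.
                prices[i] = (String) xpath.evaluate(expression, link, XPathConstants.STRING);
            }
            return prices;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Absolute search: both links report the first price on the page -> 100,100
        System.out.println(String.join(",", pricesPerLink("//span[@class='price-tag-fraction']")));
        // Relative search: each link reports its own price -> 100,200
        System.out.println(String.join(",", pricesPerLink(".//span[@class='price-tag-fraction']")));
    }
}
```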

Extracting user details from facebook page

I am extracting details from a page which I'm administering. I tried using jsoup to extract the links and then the users' names from them, but it's not working: it only shows links other than the user links. I tried extracting names from this link
https://www.facebook.com/plugins/fan.php?connections=100&id=pageid
which works quite well, but it does not work for this link
https://www.facebook.com/browse/?type=page_fans&page_id=
Can anyone help? Below is the code I tried.
doc = Jsoup.connect("https://www.facebook.com/browse/?type=page_fans&page_id=mypageid").get();
Elements els = doc.getElementsByClass("fsl fwb fcb");
Elements link = doc.select("a[href]");
for(Element ele : link)
{
    System.out.println(ele.attr("href"));
}
Try this:
Document doc = Jsoup.connect("https://www.facebook.com/plugins/fan.php?connections=100&id=pageid").timeout(0).get();
Elements nameLinks = doc.getElementsByClass("link");
for (Element users : nameLinks) {
    String name = users.attr("title");
    String url = users.attr("href");
    System.out.println(name + "-" + url);
}
This prints the name and URL of every user listed on the first link in your question.

JSoup parsing a text file containing a html table with Java

I am really unsure how to get the information I need to put into a database; the code below just prints the whole file.
File input = new File("shipMove.txt");
Document doc = Jsoup.parse(input, null);
System.out.println(doc.toString());
My HTML is here, from line 61. I need to get the items under the column headings and also grab the MMSI number, which is not under a column heading but in the href attribute. I haven't used JSoup other than to get the HTML from a web page. The only tutorials I can find use PHP, and I'd rather not use it.
To get that information, the best way is to use Jsoup's selector API. Using selectors, your code will look something like this (pseudocode!):
File input = new File("shipMove.txt");
Document doc = Jsoup.parse(input, null);
Elements matches = doc.select("<your selector here>");
for( Element element : matches )
{
// do something with found elements
}
Good documentation is available here: Use selector-syntax to find elements. If you still get stuck, please describe your problem.
Here are some hints for the selector you can use:
// Select the table with class 'shipinfo'
Elements tables = doc.select("table.shipinfo");
// Iterate over all tables found (since there is only one, you can use first() instead)
for( Element element : tables )
{
    // Select all 'td' tags of that table
    Elements tdTags = element.select("td");
    // Iterate over all 'td' tags found
    for( Element td : tdTags )
    {
        // Print its text if not empty
        final String text = td.text();
        if( !text.isEmpty() )
        {
            System.out.println(text);
        }
    }
}
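
The hints above cover the cell text but not the MMSI, which the question says sits in a link's href. Once the href string has been read from the link element (for example with `attr("href")`), the number still has to be pulled out of it. The sketch below assumes only that the MMSI appears in the href as a run of exactly nine digits; the sample href and the `MmsiExtractor` name are hypothetical, since the real markup is not shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MmsiExtractor {
    // An MMSI has nine digits; match a run of exactly nine that is not part of a longer number.
    private static final Pattern MMSI = Pattern.compile("(?<!\\d)\\d{9}(?!\\d)");

    /** Returns the first nine-digit run found in the href, if there is one. */
    static Optional<String> fromHref(String href) {
        Matcher matcher = MMSI.matcher(href);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical href; in real use it would come from the link inside the table row.
        String href = "shipdetails.aspx?mmsi=235012345";
        System.out.println(fromHref(href).orElse("no MMSI found"));
    }
}
```

If the real hrefs turn out to carry the number in a named query parameter, matching on that parameter name would be stricter than the nine-digit rule used here.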

jsoup how to reach dropdownlist

Hello everybody. I want to get data from
http://sansoyunlari.hurriyet.com.tr/SayisalLoto/SayisalLotoSonuclari.aspx using jsoup. I can get the results, but only the latest ones. There is a drop-down list of dates on the website; how can I reach the results for other dates? By the way, I will move this code to Android later; for now it is written in NetBeans. My Android app will have a drop-down list that gets its dates, and the results, from this address.
This is the Java code I have written so far:
public static void main(String[] args) {
    String adres = "http://sansoyunlari.hurriyet.com.tr/SayisalLoto/SayisalLotoSonuclari.aspx";
    ArrayList<String> sayi = new ArrayList<>();
    sayi.add("six");
    sayi.add("five");
    sayi.add("four");
    sayi.add("three");
    sayi.add("two");
    sayi.add("one");
    // Sayısal Loto
    try {
        Document doc = Jsoup.connect(adres).get();
        Elements sonuclar = doc.select("div.hurriyet2010_so_sanstopu_no_bg");
        // With approach 1, the first number has to be fetched manually because its id is spelled "resut"
        Elements sonuclar1 = doc.select("span#_ctl0_ContentPlaceHolder1_lblresut" + sayi.get(sayi.size() - 1));
        Element numaralar = sonuclar1.first();
        System.out.println(numaralar.text());
        // Approach 1 for getting the numbers
        for (int i = sonuclar.size(); i > 1; i--)
        {
            sonuclar1 = doc.select("span#_ctl0_ContentPlaceHolder1_lblresult" + sayi.get(i - 2));
            Element numaralar1 = sonuclar1.first();
            System.out.println(numaralar1.text());
        }
        // Approach 2 for getting the numbers
        // for (Element el : sonuclar)
        // {
        //     System.out.println(el.text());
        // }
        // Number of winners and prize amount
        for (int i = 0; i < 4; i++)
        {
            int b = 6 - i;
            System.out.println(b + " matches, number of winners:");
            sonuclar = doc.select("span#_ctl0_ContentPlaceHolder1_lblluckycount" + sayi.get(i));
            Element el = sonuclar.first();
            System.out.println(el.text());
            System.out.println("Prize per winner:");
            sonuclar = doc.select("span#_ctl0_ContentPlaceHolder1_lblluckyamount" + sayi.get(i));
            el = sonuclar.first();
            System.out.println(el.text());
        }
    }
    catch (Exception e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }
}
To get the select element, do:
Element select = doc.select("#_ctl0_ContentPlaceHolder1_ddlSayisalLotoDates").first();
The children of this element are the "option" items you want:
for (Element e : select.children()) {
    String date = e.text();
}
Edit:
I looked at the HTML source. To get the right page you need to send a POST request to the URL "http://sansoyunlari.hurriyet.com.tr/SayisalLoto/SayisalLotoSonuclari.aspx" with the following parameters:
__EVENTARGUMENT = empty
__EVENTTARGET = _ctl0$ContentPlaceHolder1$ddlSayisalLotoDates
__EVENTVALIDATION = a random value that you get from the HTML page
__LASTFOCUS = empty
__VIEWSTATE = another random value
_ctl0:ContentPlaceHolder1:ddlSayisalLotoDates = the ID of the date you want to search (e.g. 884 for 19 Ekim 2013, i.e. 19 October 2013)
txtSearch = can be empty
As you can see, scraping an ASP.NET web page is quite annoying.
Use an application like Fiddler (or a similar tool) to find the parameters you need to post (hidden inputs, session cookies, your selected input). You are probably missing some of them.
Hope it helps.
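
To make the parameter list concrete, here is a rough sketch of how the POST body could be assembled with the JDK alone. The field names are the ones listed above; the `__VIEWSTATE` and `__EVENTVALIDATION` values are placeholders, because the real ones must be read from the hidden inputs of a freshly loaded page. `AspNetFormBody` and its methods are hypothetical helper names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class AspNetFormBody {
    /** Collects the postback fields in a fixed order. */
    static Map<String, String> fieldsFor(String dateId, String viewState, String eventValidation) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", "_ctl0$ContentPlaceHolder1$ddlSayisalLotoDates");
        fields.put("__EVENTARGUMENT", "");
        fields.put("__LASTFOCUS", "");
        fields.put("__VIEWSTATE", viewState);             // copied from the hidden input
        fields.put("__EVENTVALIDATION", eventValidation); // copied from the hidden input
        fields.put("_ctl0:ContentPlaceHolder1:ddlSayisalLotoDates", dateId);
        fields.put("txtSearch", "");
        return fields;
    }

    /** Encodes the fields as application/x-www-form-urlencoded. */
    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Placeholder values; real ones come from the page that was just downloaded.
        System.out.println(encode(fieldsFor("884", "VIEWSTATE-PLACEHOLDER", "VALIDATION-PLACEHOLDER")));
    }
}
```

When jsoup is already in use, the same map can instead be passed to `Jsoup.connect(url).data(fields).post()`, which does the encoding itself, and the hidden values can be read from the first response with selectors such as `input[name=__VIEWSTATE]`.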

How to extract absolute URL from relative HTML links using Jsoup?

I am using Jsoup to extract the URLs from a web page. The href attributes of those links are relative, like:
example
Here is my attempt:
Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
String url = dl.select("a").attr("href");
}
This works fine, but if I use
String url = dl.select("a").attr("abs:href");
to get the absolute URL like http://example.com/text, it is not working. How can I get the absolute URL?
You need Element#absUrl().
String url = dl.select("a").absUrl("href");
By the way, you can shorten the select:
Document document = Jsoup.connect(url).get();
Elements links = document.select("div.results dl a");
for (Element link : links) {
String url = link.absUrl("href");
}
String url = dl.select("a").absUrl("href");
is not correct, because dl.select("a") returns a collection rather than a single element.
You need to get the element by index, e.g.:
Elements elems = dl.select("a");
Element a1 = elems.get(0); // 0 is the index of the first element; valid indexes run up to elems.size() - 1
Now you can call
a1.absUrl("href");
If you are sure that only one item will result from the select above, or that the item you want will be the first, you can write:
String url = dl.select("a").get(0).absUrl("href");
which gives the same result as
String url = dl.select("a").first().absUrl("href");
It doesn't have to be the first element; you can always replace the 0 in
String url = dl.select("a").get(0).absUrl("href");
with the index of your element.
Or use a more specific selector that matches only one element.
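
For background on what "absolute" means here: jsoup resolves a relative href against the document's base URI and, as far as its documentation describes, returns an empty string when no absolute URL can be built (for example when no base URI is known). The resolution step itself can be sketched with the JDK's `java.net.URI`; this is an illustration of the rule, not jsoup's actual implementation, and `AbsoluteUrlDemo` is a hypothetical name.

```java
import java.net.URI;

class AbsoluteUrlDemo {
    /** Resolves href against baseUri; returns "" when no absolute URL can be built. */
    static String toAbsolute(String baseUri, String href) {
        try {
            URI base = URI.create(baseUri);
            if (!base.isAbsolute()) {
                return ""; // no scheme to resolve against
            }
            return base.resolve(href).toString();
        } catch (IllegalArgumentException e) {
            return ""; // base or href is not a valid URI
        }
    }

    public static void main(String[] args) {
        // A root-relative link replaces the whole path of the base.
        System.out.println(toAbsolute("http://example.com/index.php", "/text"));
        // A plain relative link is resolved against the base's directory.
        System.out.println(toAbsolute("http://example.com/dir/index.php", "page2.html"));
        // Without a base URI there is nothing to resolve against.
        System.out.println(toAbsolute("", "/text").isEmpty());
    }
}
```

If an absolute URL comes back empty in jsoup, checking that the document actually has a base URI (it normally does when loaded through `Jsoup.connect(url).get()`) is a reasonable first step.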
