How to use xpath to get href value - java

<div id="AdvancedSearchResultsContainter">
<table id="SearchResults" class="tablesorter">
<thead>
<tr>
<th scope="col" class="header">School name</th>
<th scope="col" class="header">School type</th>
<th scope="col" class="header">Sector</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ABC Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="even">
<td>XYZ High School</td>
<td class="nowrap">Secondary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="odd">
<td>PQR Park Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="even">
<td>JKL Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
</tbody>
</table>
</div>
I am using Selenium and XPath.
I want to get the numeric value from each href. For example, from the href of a link in this table
I want to get 82648. I would like to put this in a loop and collect the numeric part of every href.
Can someone please help?

You can use the following CSS selector to get the <a> elements:
By.cssSelector("#SearchResults tr a");
Then get all the link elements with driver.findElements(By.cssSelector("#SearchResults tr a")) and use getAttribute("href") to get the URLs.
Something like:
List<WebElement> elements = driver.findElements(By.cssSelector("#SearchResults tr a"));
Get the URLs and then process them as needed. The java.lang.String class provides many methods for working with strings, such as trimming, concatenating, converting, comparing, and replacing. For example, the loop below prints the last five characters of each URL, which works when every URL ends with a five-digit ID:
for(WebElement e : elements) {
String url = e.getAttribute("href");
System.out.println(url.substring(url.length()-5));
}
There are other ways to get the substring as well.
You can also write a method that returns the value as a String, and then assert on it if that is what you intend.
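As one of those other ways, here is a minimal sketch that pulls the last run of digits out of a URL with a regular expression, so it does not depend on the ID being exactly five characters long. The class name HrefIdExtractor, the method extractId, and the example URLs are hypothetical; the real href values are not shown in the question, so adjust the pattern to whatever they actually look like.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefIdExtractor {
    // A run of digits followed only by non-digit characters up to the end,
    // i.e. the last number that appears in the URL.
    private static final Pattern LAST_NUMBER = Pattern.compile("(\\d+)\\D*$");

    /** Returns the last run of digits in the URL, or null if there is none. */
    static String extractId(String url) {
        if (url == null) {
            return null;
        }
        Matcher m = LAST_NUMBER.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical href values standing in for e.getAttribute("href").
        List<String> urls = Arrays.asList(
                "http://example.com/SchoolProfile/82648",
                "http://example.com/SchoolProfile/1234567");
        for (String url : urls) {
            System.out.println(extractId(url));
        }
    }
}
```

In the Selenium loop above, the call would be extractId(e.getAttribute("href")) in place of the substring call. Returning null for a URL without digits lets the caller decide whether to skip the link or fail the test.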

Related

Parsing Table with Jsoup for Android App

I am attempting to parse a table on a website to find a given table row in which the first column matches a certain string of characters. Below is the HTML for part of the table (it's very large):
<table class="table display datatable" id="datatable1">
<thead>
<tr>
<th class="va-m">Miner</th>
<th class="va-m">Shares</th>
<th class="va-m">%</th>
<th class="va-m">Best DL</th>
</tr>
</thead>
<tfoot>
<tr>
<th class="va-m">Miner</th>
<th class="va-m">Shares</th>
<th class="va-m">%</th>
<th class="va-m">Best DL</th>
</tr>
</tfoot>
<tbody>
<tr>
<td>3R8RDBxiux3g1pFCCsQnm2vwD34axsVRTrEWzyX8tngJaRnNWkbnuFEewzuBAKhQrb3LxEQHtuBg1zW4tybt83SS</td>
<td>44279</td>
<td>27.37 %</td>
<td>1154</td>
</tr>
<tr>
<td>5gwVxC9cXguHHjD9wtTpHfsJPaZx4fPcvWD5jGWF1dcuHnAMyXxteaHrEtXviZkvWN3FAnevbVLErABSsP6mS7PR</td>
<td>36369</td>
<td>22.48 %</td>
<td>2725</td>
</tr>
<tr>
<td>2qZXPmop82UiA7LQEQqdoUzjFbcwCSpqf8U1f3656XXSsHnGvGXYTNoP11s2asiVSyVS8LPFqxmpdCeSNxcpFMnF</td>
<td>28596</td>
<td>17.68 %</td>
<td>967</td>
</tr>
<tr>
<td>21mbNSDo7g9BAyjsZGxnNfJUrEtBUVVNQZhR4tkVwdEHPaMNsa2u2JHQPAAe5riGfPA9Khb1Pq3bQGhqmrLEGNqN</td>
<td>6104</td>
<td>3.77 %</td>
<td>4787</td>
</tr>
<tr>
<td>4HAakKK7dSq18Djg7m6cLSyHb5aUU6ngvBQimo8pYyF5F64qX3gE4T8q8kfWHTZ79FvXybSG3JhUfSZDDv2sRwqY</td>
<td>5895</td>
<td>3.64 %</td>
<td>6020</td>
</tr>
<tr>
<td>2r2izPEC5o7ZDnUsdDA97q8wKCeZRRg9n243Rd9vkMQqRCtc6ZRUTruQUyZGduoHy8pTYPuEq9ACXPKfXt8fqKxS</td>
<td>5605</td>
<td>3.46 %</td>
<td>10958</td>
</tr>
</tbody>
</table>
I am trying to step through the table and search for a specific row but I am receiving an IndexOutOfBoundsException.
Would there be a better way to code the statement below?
for (Element table : doc.select("table")){
for(Element row : table.select("tr")){
Elements tds = row.select("td");
if(tds.get(0).text().equals("4HjSN79KUMz7AQC3GBvGkgPa5Qrio9HWTh7hg9JY48fkrYeVZJVmzB9YCB6GZSpuXB7V7DjJVuke3ZaCm5k7sRLE")){
myHistoricShares =tds.get(0).text();
}
}
}
As I said in comments, your table.select("tr") selects rows not only inside <tbody>, but inside the header and the footer too. For those rows row.select("td") returns an empty list, and hence tds.get(0) throws the IndexOutOfBoundsException.
You could simplify your loop by selecting only the rows in <tbody>:
for (Element row: doc.select("table#datatable1>tbody>tr")) {
if (row.children().size() > 0 && "some_long_string".equals(row.child(0).text())) {
doSomething();
}
}
The selector "table#datatable1>tbody>tr" selects the table with id="datatable1", then its direct tbody child, and then all of that tbody's direct tr children. So you only need to iterate through them once.

JSoup Returning IndexOutOfBoundsException when fetching data from Document

I'm having a really difficult time resolving the error I'm getting! To cut a long story short, I am trying to get a specific element from a table in HTML. Easy, right? Well, that's what I thought. Essentially, if I copy the exact HTML page source from the browser and read it in from a file, I can find the element that I need.
However, when reading the document through Jsoup.connect("URL"), I'm getting the error! I've been sitting here for about 4 hours now, reading around and trying to understand what's going on. I'm fairly confident with jsoup, but this has stumped me! The code is below:
private String parseKcal( Element kCalElement ) throws IOException {
//Getting error on below line
String calories = kCalElement.select(".tableWrapper").select("tr").select(".tableRow0").select("td").get(0).text();
if (calories == null) {
throw new IOException();
}
return calories.toString();
}
The parameter kCalElement is the document I'm trying to get the element from!
**** The error ****
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at models.Product.parseKcal(Product.java:67)
**** The HTML which i'm trying to parse ****
<div class="tableWrapper">
<table class="nutritionTable">
<thead>
<tr class="tableTitleRow">
<th scope="col">Typical Values</th><th scope="col">Per 100g </th><th scope="col">% based on RI for Average Adult</th>
</tr>
</thead>
<tr class="tableRow1">
<th scope="row" class="rowHeader" rowspan="2">Energy</th><td class="tableRow1">140kJ</td><td class="tableRow1">-</td>
</tr>
<tr class="tableRow0">
<td class="nutritionLevel1">33kcal</td><td class="nutritionLevel1">2%</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Fat</th><td class="nutritionLevel1"><0.5g</td><td class="nutritionLevel1">-</td>
</tr>
<tr class="tableRow0">
<th scope="row" class="rowHeader">Saturates</th><td class="nutritionLevel1"><0.1g</td><td class="nutritionLevel1">-</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Carbohydrate</th><td class="tableRow1">6.1g</td><td class="tableRow1">2%</td>
</tr>
<tr class="tableRow0">
<th scope="row" class="rowHeader">Total Sugars</th><td class="nutritionLevel2">6.1g</td><td class="nutritionLevel2">7%</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Fibre</th><td class="tableRow1">1.0g</td><td class="tableRow1">-</td>
</tr>
<tr class="tableRow0">
<th scope="row" class="rowHeader">Protein</th><td class="tableRow0">0.6g</td><td class="tableRow0">1%</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Salt</th><td class="nutritionLevel1"><0.01g</td><td class="nutritionLevel1">-</td>
</tr>
</table>
</div>
<p>RI= Reference Intakes of an average adult (8400kJ / 2000kcal)</p>
</div>
</div>
This does not work. However, when I paste the HTML into a string, it works!
See below:
File input = new File("~/Desktop/file.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
Document document = Jsoup.parse(doc.toString());
String calories = document.select(".tableWrapper").select("tr").select(".tableRow0").select("td").get(0).text();
System.out.println(calories);
Can someone please help me before I pull my hair out? I am STUMPED :(
EDIT
I am trying to get the kcal element that contains the calories!

Selenium WebDriver - iteration through table rows

I'm having an issue with Selenium in Java.
I have a web page like this:
<html>
<body>
<div id='content'>
<table class='matches'>
<tr id='today_01'>
<td class='team-a'>Real Madrid</td>
<td class='score'>0-0</td>
<td class='team-b'>Barcelona</td>
</tr>
<tr id='today_02'>
<td class='team-a'>PSG</td>
<td class='score'>1-1</td>
<td class='team-b'>Manchester City</td>
</tr>
<tr id='today_03'>
<td class='team-a'>Liverpool</td>
<td class='score'>2-2</td>
<td class='team-b'>Arsenal</td>
</tr>
</table>
</div>
</body>
</html>
I first get all the rows into a list:
List<WebElement> allRows = driver.findElements(By.xpath("//table[@class='matches']/tbody/tr[contains(@id, 'today')]"));
Next I iterate through all the elements displaying the WebElement (i.e. the row) and on the next line I display the td containing the home team, separated by a line:
for (WebElement row : allRows) {
System.out.println("Outer HTML for row" + row.getAttribute("outerHTML"));
System.out.println("Outer HTML for Home Team cell" + row.findElement(By.xpath("//td[contains(@class,'team-a')]")).getAttribute("outerHTML"));
System.out.println("------------------------------------------------------------");
}
The first println displays all rows, one by one.
The second however displays ONLY 'Real Madrid' for each iteration. I'm losing my mind because I don't understand why. Can someone please help?
The output:
<tr id='today_01'>
<td class='team-a'>Real Madrid</td>
<td class='score'>0-0</td>
<td class='team-b'>Barcelona</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
<tr id='today_02'>
<td class='team-a'>PSG</td>
<td class='score'>1-1</td>
<td class='team-b'>Manchester City</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
<tr id='today_03'>
<td class='team-a'>Liverpool</td>
<td class='score'>2-2</td>
<td class='team-b'>Arsenal</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
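You have to use a relative XPath here. An expression that starts with // is evaluated from the document root even when findElement is called on row, so it always returns the first matching cell on the page. Drop the leading // (or write .//) so that the search starts at the current row:
System.out.println("Outer HTML for Home Team cell" + row.findElement(By.xpath("td[contains(@class,'team-a')]")).getAttribute("outerHTML"));
Then it will point to the element we want in each row.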
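The same behaviour can be reproduced without a browser. The sketch below is an illustration only: it uses the JDK's built-in javax.xml.xpath API instead of Selenium, and a cut-down, well-formed copy of the table from the question. The class name RelativeXPathDemo and the method homeTeams are hypothetical. It evaluates a cell expression against each row node and collects the text of the first match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativeXPathDemo {
    // Cut-down copy of the table from the question, written as well-formed XML.
    static final String TABLE_XML =
            "<table class='matches'>"
          + "<tr id='today_01'><td class='team-a'>Real Madrid</td><td class='team-b'>Barcelona</td></tr>"
          + "<tr id='today_02'><td class='team-a'>PSG</td><td class='team-b'>Manchester City</td></tr>"
          + "<tr id='today_03'><td class='team-a'>Liverpool</td><td class='team-b'>Arsenal</td></tr>"
          + "</table>";

    /** Evaluates cellXPath with each row as the context node and returns the matched cell texts. */
    static List<String> homeTeams(String cellXPath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                    "//tr[contains(@id, 'today')]",
                    new InputSource(new StringReader(TABLE_XML)),
                    XPathConstants.NODESET);
            List<String> teams = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // NODE returns the first match in document order.
                Node cell = (Node) xpath.evaluate(cellXPath, row, XPathConstants.NODE);
                teams.add(cell.getTextContent());
            }
            return teams;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Absolute: searched from the document root, whatever the context row is.
        System.out.println(homeTeams("//td[contains(@class,'team-a')]"));
        // Relative: searched among the children or descendants of the context row.
        System.out.println(homeTeams("td[contains(@class,'team-a')]"));
        System.out.println(homeTeams(".//td[contains(@class,'team-a')]"));
    }
}
```

The first call returns the same team three times, while the two relative forms return one team per row. The .// form is usually the safer habit, because it still works when the cell is nested deeper than a direct child of the row.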

get table span class content using jsoup

I have a website that contains a table that looks similar to this one (the real one is bigger):
</table>
<tr>
<td>
<table width="100%" cellspacing="-1" cellpadding="0" border="0" dir="rtl" style="padding-top: 25px;">
<tr>
<td align="right" style="padding-right: 25px;">
<span class="artist_name_txt">
name
<p class="diccografia">subname</p>
</span>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table width="100%" border="0" cellspacing="0" cellpadding="0" dir="rtl" style="padding-right: 25px; padding-left: 25px">
<tr>
<td class="songs" align="right">
number1
</td>
</tr>
<tr>
<td class="songs" align="right">
number2
.......
</td>
</tr>
</table>
and I need an idea for how I can parse the website and extract this table into two arrays:
one will be something like names{number1, number2}
and the second will be links{number1link, number2link}
I tried a lot of approaches and nothing really helped.
You should read the jsoup Cookbook; the selector syntax in particular is very powerful.
Here's an example:
final String html = ...
// use connect().get() instead if you connect to a website
Document doc = Jsoup.parse(html);
List<String> names = new ArrayList<>();
List<String> links = new ArrayList<>();
for( Element element : doc.select("a.artist_player_songlist") )
{
names.add(element.text());
links.add(element.attr("href"));
}
System.out.println("Names: " + names);
System.out.println("Links: " + links);
Output:
Names: [number1, number2]
Links: [/number1link, /number2link]
Android Web Scraping with a Headless Browser
Htmlunit on Android application
HttpUnit/HtmlUnit equivalent for android

get particular element via jsoup

I want to select td.team so that the first TextView gets the first td.team of a row and the second TextView gets the second td.team. I am using a list adapter.
Elements info = dpc.select("td.team "); gives me both teams, the first and the second, together. How can I do this? Please tell me which Elements call I should use to get the info.
<tr class="odd">
<td class="date">10</td>
<td class="team">one</td>
<td class="team">two</td>
<td class="type">8M</td>
</tr>
<tr class="even">
<td class="date">01</td>
<td class="team">Nice</td>
<td class="team">Monaco</td>
<td class="type">€ 4.1M</td>
</tr>
<tr class="odd">
<td class="date">07</td>
<td class="team">thre</td>
<td class="team">fou</td>
<td class="type"> 600K</td>
</tr>
<tr class="even">
<td class="date">99</td>
<td class="team"><a href="sad" title="Marsala">M</a></td>
<td class="team">a</td>
<td class="type">50K</td>
</tr>
I don't really understand your question. Do you want to get the first td.team and the second td.team from each row? If that is the case, you need to collect them in a list:
Elements info = dpc.select("tr.odd, tr.even").select("td.team");
ArrayList<String> s = new ArrayList<String>();
for (Element el : info) {
    // el is a <td>, so read its text; for a link inside the cell,
    // use el.select("a").attr("href") or el.select("a").attr("title") instead
    String cellText = el.text();
    s.add(cellText);
    System.out.println(cellText); // or print it
}
There might be some errors in the code; I didn't test it.
The select method returns an Elements object. This class has a method get(int index), which returns the element at that index in the selection, starting from zero.
