I want to loop through the news table and get the title and rating of each row. I tried different options, but I can’t understand why the select method receives all the options at once.
I need to get each news block in a loop.
I used this way to get table link:
Elements elements = document.select("#hnmain > tbody > tr:nth-child(3) > td > table");
This query doesn't work in a loop because it gets all the elements at once. I need to get the elements sequentially. So that I can do like this:
List list = new ArrayList<>();
for (Element element: elements){
String title = element...
String rating = element...
list.add(title);
list.add(rating);
}
Sample data from html:
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr class="athing" id="33582264">
<td align="right" valign="top" class="title"><span class="rank">1.</span></td>
<td valign="top" class="votelinks">
<center>
<a id="up_33582264" href="vote?id=33582264&how=up&goto=front%3Fday%3D2022-11-13">
<div class="votearrow" title="upvote"></div></a>
</center></td>
<td class="title"><span class="titleline">Show HN: I built my own PM tool after trying Trello, Asana, ClickUp, etc.<span class="sitebit comhead"> (<span class="sitestr">upbase.io</span>)</span></span></td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext"><span class="subline"> <span class="score" id="score_33582264">632 points</span> by tonypham <span class="age" title="2022-11-13T12:00:06">20 days ago</span> <span id="unv_33582264"></span> | hide | 456 comments </span></td>
</tr>
<tr class="spacer" style="height:5px"></tr>
<tr class="athing" id="33584941">
<td align="right" valign="top" class="title"><span class="rank">2.</span></td>
<td valign="top" class="votelinks">
<center>
<a id="up_33584941" href="vote?id=33584941&how=up&goto=front%3Fday%3D2022-11-13">
<div class="votearrow" title="upvote"></div></a>
</center></td>
<td class="title"><span class="titleline">Forking Chrome to turn HTML into SVG<span class="sitebit comhead"> (<span class="sitestr">fathy.fr</span>)</span></span></td>
</tr>
if I understand your question I think this code will work for you
Document doc = Jsoup.parse("<table border=\"0\" id=\"hnmain\" cellpadding=\"0\" cellspacing=\"0\"> <tbody> <tr class=\"athing\" id=\"33582264\"> <td align=\"right\" valign=\"top\" class=\"title\"><span class=\"rank\">1.</span></td> <td valign=\"top\" class=\"votelinks\"> <center> <a id=\"up_33582264\" href=\"vote?id=33582264&how=up&goto=front%3Fday%3D2022-11-13\"> <div class=\"votearrow\" title=\"upvote\"></div></a> </center></td> <td class=\"title\"><span class=\"titleline\">Show HN: I built my own PM tool after trying Trello, Asana, ClickUp, etc.<span class=\"sitebit comhead\"> (<span class=\"sitestr\">upbase.io</span>)</span></span></td> </tr> <tr> <td colspan=\"2\"></td> <td class=\"subtext\"><span class=\"subline\"> <span class=\"score\" id=\"score_33582264\">632 points</span> by tonypham <span class=\"age\" title=\"2022-11-13T12:00:06\">20 days ago</span> <span id=\"unv_33582264\"></span> | hide | 456 comments </span></td> </tr> <tr class=\"spacer\" style=\"height:5px\"></tr> <tr class=\"athing\" id=\"33584941\"> <td align=\"right\" valign=\"top\" class=\"title\"><span class=\"rank\">2.</span></td> <td valign=\"top\" class=\"votelinks\"> <center> <a id=\"up_33584941\" href=\"vote?id=33584941&how=up&goto=front%3Fday%3D2022-11-13\"> <div class=\"votearrow\" title=\"upvote\"></div></a> </center></td> <td class=\"title\"><span class=\"titleline\">Forking Chrome to turn HTML into SVG<span class=\"sitebit comhead\"> (<span class=\"sitestr\">fathy.fr</span>)</span></span></td> </tr>");
Elements elements = doc.select("#hnmain .athing");
for (Element element : elements) {
String title = element.select(".title").text();
String rank = element.select(".rank").text();
System.out.println(title + " -- "+rank);
}
Related
I have a table which has a column which contains:
a list of invoices
a column which contains a lots of charge types for every invoice displayed
What I want to do is to make a function which receives a String parameter,for example the invoice number and return all the charge types for invoice number inserted
Here is the code for the table
Every time a new invoice is displayed on the table,the first line of the table contains and a value
That value represents the number of the charge types displayed on every invoice
For example the charge types are :Management fee,Payments,Funds Transmission Cost,Acquiring Authorisation Fee,Service etc.
<form method="post" action="/accounting/billing/showInvoiceTransactionsCountTotal.html?
jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak" target="detail">
<table>
<tbody>
<tr>
<tr class="odd">
<td rowspan="8">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123399&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123399
<input type="hidden" value="BA7123399" name="invoiceChecked"/>
</a>
</td>
<td>Management fee (captured transactions)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Payments</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Funds Transmission Cost (FTC)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Acquiring Authorisation Fees</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Service</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Refunds</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Chargebacks</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Minimum Billing</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="even">
<td rowspan="4">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123421&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123421
<input type="hidden" value="BA7123421" name="invoiceChecked"/>
</a>
</td>
<td>Payments</td>
<td>ALEXAUTOMATION01</td>
<td>ALEXADCODE</td>
<td>1</td>
</tr>
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="odd">
<td rowspan="8">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123398&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123398
<input type="hidden" value="BA7123398" name="invoiceChecked"/>
</a>
</td>
<td>Management fee (captured transactions)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="even">
<td rowspan="10">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123397&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123397
<input type="hidden" value="BA7123397" name="invoiceChecked"/>
</a>
</td>
<td>Management fee (captured transactions)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
</tbody>
You need to add some logic to achieve this scenario.since,invoice number row is missing for some of the common invoice number. Please try with the below
Algorithm:
1. Firstly,find all the row elements from the table
2. Iterate all the rowelement and match the expected Invoice Number.
3. If the Invoice Number is matched, then print all the sub sequence column charge type until next charge invoice number matches
Code:
String InvoiceNumber="";
List<String> chargetype=new ArrayList<>();
Boolean isInvoiceSpecificCharge=false;
//Find All the tr specific element
List<WebElement> elementList=driver.findElements(By.xpath("//table/tbody/tr"));
for(WebElement element:elementList){
WebElement tempElement=null;
try{
tempElement=element.findElement(By.xpath(".//a"));
}
catch(Exception e){
}
//If the Invoice Number is present, then we need to take the charge from td[2] else from td[1].
if(tempElement.getText().equalsIgnoreCase(InvoiceNumber)){
isInvoiceSpecificCharge=true;
chargetype.add(element.findElement(By.xpath(".//td[2]")).getText());
}
else if(tempElement==null && isInvoiceSpecificCharge ==true){
chargetype.add(element.findElement(By.xpath(".//td")).getText());
}else if(!tempElement.getText().equalsIgnoreCase(InvoiceNumber)){
isInvoiceSpecificCharge=false;
}
}
I am new in Thymeleaf and i try to subtract value of column paid amount from total amount but gives error as follow:
and if I comment Remaining amount column i get following result:
<div th:if="${not #lists.isEmpty(cust)}">
<table border="1" style="width: 300px">
<thead>
<tr>
<th>Name</th>
<th>Address</th>
<th>Phone</th>
<th>Total Amount</th>
<th>Paid Amount</th>
<th>Remaining Amount</th>
</tr>
</thead>
<tbody>
<tr th:each="customer : ${cust}">
<td th:text="${customer.name}"></td>
<td th:text="${customer.address}"></td>
<td th:text="${customer.phone}"></td>
<td
th:with="result1=${#aggregates.sum(customer.customerDetails.![totalAmount])}">
<span th:text="${result1}"></span>
</td>
<td
th:with="result3=${#aggregates.sum(customer.payment.![paidAmount])}">
<span th:text="${result3}"></span>
</td>
<td th:with="result=${#aggregates.sum(customer.customerDetails.![totalAmount])}, result2=${#aggregates.sum(customer.payment.![paidAmount])}">
<span th:text="${result}- ${result2}"></span>
</td>
</tr>
</th:block>
</tbody>
</table>
</div>
I'm having an issue with Selenium in Java.
I have a web page like this:
<html>
<body>
<div id='content'>
<table class='matches'>
<tr id='today_01'>
<td class='team-a'>Real Madrid</td>
<td class='score'>0-0</td>
<td class='team-b'>Barcelona</td>
</tr>
<tr id='today_02'>
<td class='team-a'>PSG</td>
<td class='score'>1-1</td>
<td class='team-b'>Manchester City</td>
</tr>
<tr id='today_03'>
<td class='team-a'>Liverpool</td>
<td class='score'>2-2</td>
<td class='team-b'>Arsenal</td>
</tr>
</table>
<div id='content'>
<body>
<html>
I first get all the rows into a list:
List<WebElement> allRows = driver.findElements(By.xpath("//table[#class='matches']/tbody/tr[contains(#id, 'today')]"));
Next I iterate through all the elements displaying the WebElement (i.e. the row) and on the next line I display the td containing the home team, separated by a line:
for (WebElement row : allRows) {
System.out.println("Outer HTML for row" + row.getAttribute("outerHTML"));
System.out.println("Outer HTML for Home Team cell" + row.findElement(By.xpath("//td[contains(#class,'team-a')]")).getAttribute("outerHTML"));
System.out.println("------------------------------------------------------------");
}
The first println displays all rows, one by one.
The second however displays ONLY 'Real Madrid' for each iteration. I'm losing my mind because I don't understand why. Can someone please help?
The output:
<tr id='today_01'>
<td class='team-a'>Real Madrid</td>
<td class='score'>0-0</td>
<td class='team-b'>Barcelona</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
<tr id='today_02'>
<td class='team-a'>PSG</td>
<td class='score'>1-1</td>
<td class='team-b'>Manchester City</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
<tr id='today_03'>
<td class='team-a'>Liverpool</td>
<td class='score'>2-2</td>
<td class='team-b'>Arsenal</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
You have to use like this
System.out.println("Outer HTML for Home Team cell" + row.findElement(By.xpath("td[contains(#class,'team-a')]")).getAttribute("outerHTML"));
Then it will point to the correct element that we want.
I am trying to scrape with selenium a table of products.
Here is my example table:
<div class="article">
<table style="width: 100%">
<tbody><tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900101" class="changeable">
<span>Product 1 </span>
</a>
</td>
<td class="trenner_lu">
11.11.1999
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 1</a>
</td>
<td class="trenner_lu">
1999$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900347" class="changeable">
<span>Product 2 </span>
</a>
</td>
<td class="trenner_lu">
1.12.1944
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
</td>
<td class="trenner_lu">
1234$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12908635" class="changeable">
<img class="positionable" src="/ImageImage/12908635" alt="" style="width: 100px; opacity: 0.9;">
<span>Product 1 </span>
<img src="/Content/images/icons/photo.png" alt="Foto">
</a>
</td>
<td class="trenner_lu">
05.12.1950
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
,<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 4</a>
</td>
<td class="trenner_lu">
131282$
</td>
</tr>
</tbody></table>
</div>
I tried to scrape each element with:
List<WebElement> links = driver.findElements(By.xpath("//*[#id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> prodNames = driver.findElements(By.xpath("//*[#id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> group = driver.findElements(By.xpath("//*[#id=\"home\"]/div[3]/table/tbody/tr/td[4]/a"));
However, as you can see one of my td elements has two links inside, therefore my WebElement list has not the same length and it is extremely hard to merge together.
My desired list output should look like that:
[Product 1, 11.11.1999, Group 1, 1999$], [Product 2, 1.12.1944,Group 2, 1234$], [Product 1, 05.12.1950, Group 2 Group 2, 131282$]
Any suggestion how to scrape such a table much more efficient?
I appreciate your replies!
Think about everything you interact with as of objects:
class Table {
private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";
public String getTableCellText(int row, int col) {
WebElement cell = driver.findElement(By.xpath(String.format(TABLE_CELL, row, col)));
return cell.getText();
}
}
You can use it as you see fit:
Table t = new Table();
System.out.println(t.getTableCellText(3, 5)); // prints 131282$
You could probably iterate through each row to make it clearer as to what you are doing in python it would be:
rows = driver.find_elements(By.XPATH, "//*[#id=\"home\"]/div[3]/table/tbody/tr")
for row in rows:
cells = row.find_elements(By.XPATH, "//td")
product_name = cells[1].text
... etc ...
I have a website that contains a table that look like similar(bigger..) to this one:
</table>
<tr>
<td>
<table width="100%" cellspacing="-1" cellpadding="0" border="0" dir="rtl" style="padding-top: 25px;">
<tr>
<td align="right" style="padding-right: 25px;">
<span class="artist_name_txt">
name
<p class="diccografia">subname</p>
</span>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table width="100%" border="0" cellspacing="0" cellpadding="0" dir="rtl" style="padding-right: 25px; padding-left: 25px">
<tr>
<td class="songs" align="right">
number1
</td>
</tr>
<tr>
<td class="songs" align="right">
number2
.......
</td>
</tr>
</table>
and I need an idea how can i parse the website and extract this table into 2 arrays -
one will be something like names{number1, number2}
and the second will be links{number1link, number2link}
I tried a lot of ways and nothing really helps me.
You should read the JSoup Cookbook - especially the Selector syntax is very powerful.
Here's an example:
final String html = ...
// use connect().get() instead if you connect to an website
Document doc = Jsoup.parse(html);
List<String> names = new ArrayList<>();
List<String> links = new ArrayList<>();
for( Element element : doc.select("a.artist_player_songlist") )
{
names.add(element.text());
links.add(element.attr("href"));
}
System.out.println("Names: " + names);
System.out.println("Links: " + links);
Output:
Names: [number1, number2]
Links: [/number1link, /number2link]
Android Web Scraping with a Headless Browser
Htmlunit on Android application
HttpUnit/HtmlUnit equivalent for android