Parsing an HTML table with Jsoup - java

I'm trying to parse this table:
<table border="1" align="center" cellpadding="5" width="95%">
<tbody>
<tr>
<td colspan="2" align="center"> <b> <i> Test </i> </b> </td>
<td> <b> <i> Result </i> </b> </td>
<td> <b> <i> Credit </i> </b> </td>
<td> <b> <i> Data </i> </b> </td>
<td> <b> <i> A/A </i> </b> </td>
<td> <b> <i> Other data </i> </b> </td>
<td> <b> <i> A/A rif. </i> </b> </td>
</tr>
<tr>
<td> A000211 </td>
<td nowrap=""> Physic </td>
<td align="center"> - </td>
<td align="center"> 6 </td>
<td align="center"> - </td>
<td align="center"> 2008/2009 </td>
<td> something </td>
<td align="center"> 2007/2008 </td>
</tr>
<tr>
<td> 0065057 </td>
<td nowrap=""> Math </td>
<td align="center"> - </td>
<td align="center"> 6 </td>
<td align="center"> - </td>
<td align="center"> 2008/2009 </td>
<td> samething </td>
<td align="center"> 2008/2009 </td>
</tr>
<tr>
In Java I have this so far:
Document doc = Jsoup.parse(url);
Elements tables = doc.getElementsByTag("table");
I am trying to put this data into a JSONObject. Do I have to iterate over the tables, or is there a simpler way?

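I solved it with:
// Assumes url is a String holding the page address; Jsoup.parse(String) would treat it as HTML
Document doc = Jsoup.connect(url).get();
JSONArray list = new JSONArray();
// select() takes a CSS selector; getElementsByTag() accepts only a single tag name
Elements rows = doc.select("table tr");
for (Element row : rows) {
    Elements cells = row.select("td");
    if (cells.size() < 4) continue; // skip rows without enough cells
    // One JSON object per row, so earlier rows are not overwritten
    JSONObject jsonObject = new JSONObject();
    jsonObject.put("Test", cells.get(1).text());
    jsonObject.put("Result", cells.get(2).text());
    jsonObject.put("Credit", cells.get(3).text());
    list.put(jsonObject); // org.json; json-simple uses list.add(jsonObject)
}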

Related

How to loop through this query using Jsoup?

I want to loop through the news table and get the title and rating of each row. I tried different options, but I can't understand why the select method returns everything at once.
I need to get each news block in a loop.
I used this selector to get the table:
Elements elements = document.select("#hnmain > tbody > tr:nth-child(3) > td > table");
This query doesn't work in a loop because it gets all the elements at once. I need to get the elements sequentially. So that I can do like this:
List<String> list = new ArrayList<>();
for (Element element: elements){
String title = element...
String rating = element...
list.add(title);
list.add(rating);
}
Sample data from html:
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr class="athing" id="33582264">
<td align="right" valign="top" class="title"><span class="rank">1.</span></td>
<td valign="top" class="votelinks">
<center>
<a id="up_33582264" href="vote?id=33582264&how=up&goto=front%3Fday%3D2022-11-13">
<div class="votearrow" title="upvote"></div></a>
</center></td>
<td class="title"><span class="titleline">Show HN: I built my own PM tool after trying Trello, Asana, ClickUp, etc.<span class="sitebit comhead"> (<span class="sitestr">upbase.io</span>)</span></span></td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext"><span class="subline"> <span class="score" id="score_33582264">632 points</span> by tonypham <span class="age" title="2022-11-13T12:00:06">20 days ago</span> <span id="unv_33582264"></span> | hide | 456 comments </span></td>
</tr>
<tr class="spacer" style="height:5px"></tr>
<tr class="athing" id="33584941">
<td align="right" valign="top" class="title"><span class="rank">2.</span></td>
<td valign="top" class="votelinks">
<center>
<a id="up_33584941" href="vote?id=33584941&how=up&goto=front%3Fday%3D2022-11-13">
<div class="votearrow" title="upvote"></div></a>
</center></td>
<td class="title"><span class="titleline">Forking Chrome to turn HTML into SVG<span class="sitebit comhead"> (<span class="sitestr">fathy.fr</span>)</span></span></td>
</tr>
If I understand your question correctly, I think this code will work for you:
Document doc = Jsoup.parse("<table border=\"0\" id=\"hnmain\" cellpadding=\"0\" cellspacing=\"0\"> <tbody> <tr class=\"athing\" id=\"33582264\"> <td align=\"right\" valign=\"top\" class=\"title\"><span class=\"rank\">1.</span></td> <td valign=\"top\" class=\"votelinks\"> <center> <a id=\"up_33582264\" href=\"vote?id=33582264&how=up&goto=front%3Fday%3D2022-11-13\"> <div class=\"votearrow\" title=\"upvote\"></div></a> </center></td> <td class=\"title\"><span class=\"titleline\">Show HN: I built my own PM tool after trying Trello, Asana, ClickUp, etc.<span class=\"sitebit comhead\"> (<span class=\"sitestr\">upbase.io</span>)</span></span></td> </tr> <tr> <td colspan=\"2\"></td> <td class=\"subtext\"><span class=\"subline\"> <span class=\"score\" id=\"score_33582264\">632 points</span> by tonypham <span class=\"age\" title=\"2022-11-13T12:00:06\">20 days ago</span> <span id=\"unv_33582264\"></span> | hide | 456 comments </span></td> </tr> <tr class=\"spacer\" style=\"height:5px\"></tr> <tr class=\"athing\" id=\"33584941\"> <td align=\"right\" valign=\"top\" class=\"title\"><span class=\"rank\">2.</span></td> <td valign=\"top\" class=\"votelinks\"> <center> <a id=\"up_33584941\" href=\"vote?id=33584941&how=up&goto=front%3Fday%3D2022-11-13\"> <div class=\"votearrow\" title=\"upvote\"></div></a> </center></td> <td class=\"title\"><span class=\"titleline\">Forking Chrome to turn HTML into SVG<span class=\"sitebit comhead\"> (<span class=\"sitestr\">fathy.fr</span>)</span></span></td> </tr>");
Elements elements = doc.select("#hnmain .athing");
for (Element element : elements) {
String title = element.select(".titleline").text(); // ".title" would also match the rank cell
String rank = element.select(".rank").text();
System.out.println(title + " -- "+rank);
}
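The question also asks for the rating, which in the sample HTML sits in the row that follows each .athing row. A minimal sketch of pairing the two, assuming jsoup is on the classpath; the class name HnRows and the method titlesWithScores are hypothetical names used only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class HnRows {
    /** Returns one "title | score" string per story row. */
    static List<String> titlesWithScores(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element story : doc.select("tr.athing")) {
            String title = story.select(".titleline").text();
            // The score lives in the row immediately after the story row.
            Element details = story.nextElementSibling();
            // An empty string is used when there is no following row or no score.
            String score = (details == null) ? "" : details.select(".score").text();
            out.add(title + " | " + score);
        }
        return out;
    }
}
```

Each story is handled one at a time inside the loop, so the title and the score stay together even if some row has no score.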

Setting multiple variable values and performing subtraction in Thymeleaf

I am new to Thymeleaf and I am trying to subtract the paid amount column from the total amount column, but it gives an error.
If I comment out the Remaining Amount column, the rest of the table renders. This is my template:
<div th:if="${not #lists.isEmpty(cust)}">
<table border="1" style="width: 300px">
<thead>
<tr>
<th>Name</th>
<th>Address</th>
<th>Phone</th>
<th>Total Amount</th>
<th>Paid Amount</th>
<th>Remaining Amount</th>
</tr>
</thead>
<tbody>
<tr th:each="customer : ${cust}">
<td th:text="${customer.name}"></td>
<td th:text="${customer.address}"></td>
<td th:text="${customer.phone}"></td>
<td
th:with="result1=${#aggregates.sum(customer.customerDetails.![totalAmount])}">
<span th:text="${result1}"></span>
</td>
<td
th:with="result3=${#aggregates.sum(customer.payment.![paidAmount])}">
<span th:text="${result3}"></span>
</td>
<td th:with="result=${#aggregates.sum(customer.customerDetails.![totalAmount])}, result2=${#aggregates.sum(customer.payment.![paidAmount])}">
<span th:text="${result}- ${result2}"></span>
</td>
</tr>
</th:block>
</tbody>
</table>
</div>
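One way to sidestep arithmetic in the template is to compute the remaining amount in Java and hand the template a single value. This is only a sketch of that alternative, not a fix for the template expression itself. It assumes totalAmount and paidAmount are BigDecimal values; the class name Amounts and its methods are hypothetical:

```java
import java.math.BigDecimal;
import java.util.List;

class Amounts {
    /** Sums a list of amounts; an empty list sums to zero. */
    static BigDecimal sum(List<BigDecimal> values) {
        return values.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    /** Remaining amount = sum of all totals minus sum of all payments. */
    static BigDecimal remaining(List<BigDecimal> totals, List<BigDecimal> paid) {
        return sum(totals).subtract(sum(paid));
    }
}
```

The result could then be exposed through a getter on the customer object (or added to the model) and rendered with a plain th:text, so the Remaining Amount cell needs no th:with at all.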

Scrape td attribute rows with selenium

I am trying to scrape a table of products with Selenium.
Here is my example table:
<div class="article">
<table style="width: 100%">
<tbody><tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900101" class="changeable">
<span>Product 1 </span>
</a>
</td>
<td class="trenner_lu">
11.11.1999
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 1</a>
</td>
<td class="trenner_lu">
1999$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900347" class="changeable">
<span>Product 2 </span>
</a>
</td>
<td class="trenner_lu">
1.12.1944
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
</td>
<td class="trenner_lu">
1234$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12908635" class="changeable">
<img class="positionable" src="/ImageImage/12908635" alt="" style="width: 100px; opacity: 0.9;">
<span>Product 1 </span>
<img src="/Content/images/icons/photo.png" alt="Foto">
</a>
</td>
<td class="trenner_lu">
05.12.1950
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
,<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 4</a>
</td>
<td class="trenner_lu">
131282$
</td>
</tr>
</tbody></table>
</div>
I tried to scrape each element with:
List<WebElement> links = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> prodNames = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> group = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[4]/a"));
However, as you can see, one of my td elements has two links inside, so my WebElement lists do not have the same length and are extremely hard to merge.
My desired output list should look like this:
[Product 1, 11.11.1999, Group 1, 1999$], [Product 2, 1.12.1944,Group 2, 1234$], [Product 1, 05.12.1950, Group 2 Group 2, 131282$]
Any suggestions on how to scrape such a table more efficiently?
I appreciate your replies!
Think of everything you interact with as objects:
class Table {
    private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";
    private final WebDriver driver;
    // The driver is passed in so the class does not rely on a global variable
    Table(WebDriver driver) {
        this.driver = driver;
    }
    // row and col are 1-based, as in XPath
    public String getTableCellText(int row, int col) {
        WebElement cell = driver.findElement(By.xpath(String.format(TABLE_CELL, row, col)));
        return cell.getText();
    }
}
You can use it as you see fit:
Table t = new Table(driver);
System.out.println(t.getTableCellText(3, 5)); // prints 131282$
You could probably iterate through each row to make it clearer what you are doing. In Python it would be:
rows = driver.find_elements(By.XPATH, "//*[@id=\"home\"]/div[3]/table/tbody/tr")
for row in rows:
    # "./td" is relative to the current row; "//td" would search the whole page
    cells = row.find_elements(By.XPATH, "./td")
    product_name = cells[1].text
    # ... etc ...

get table span class content using jsoup

I have a website that contains a table similar to this one (the real one is bigger):
</table>
<tr>
<td>
<table width="100%" cellspacing="-1" cellpadding="0" border="0" dir="rtl" style="padding-top: 25px;">
<tr>
<td align="right" style="padding-right: 25px;">
<span class="artist_name_txt">
name
<p class="diccografia">subname</p>
</span>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table width="100%" border="0" cellspacing="0" cellpadding="0" dir="rtl" style="padding-right: 25px; padding-left: 25px">
<tr>
<td class="songs" align="right">
number1
</td>
</tr>
<tr>
<td class="songs" align="right">
number2
.......
</td>
</tr>
</table>
I need an idea of how to parse the website and extract this table into two arrays:
one will be something like names{number1, number2}
and the second will be links{number1link, number2link}
I have tried a lot of approaches and nothing has really helped.
You should read the jsoup Cookbook; the selector syntax in particular is very powerful.
Here's an example:
final String html = ...
// use connect().get() instead if you connect to a website
Document doc = Jsoup.parse(html);
List<String> names = new ArrayList<>();
List<String> links = new ArrayList<>();
for( Element element : doc.select("a.artist_player_songlist") )
{
names.add(element.text());
links.add(element.attr("href"));
}
System.out.println("Names: " + names);
System.out.println("Links: " + links);
Output:
Names: [number1, number2]
Links: [/number1link, /number2link]

using if else statement for dynamic table jsp

EDIT:
I am retrieving an ArrayList of data like [category, content, category, content, ...], and
I want to display a table on a JSP page with dynamic data like this:
<tr>
<td align="center">
Category of Tweets:
</td>
<td align="center">
All Tweets of User:
</td>
</tr>
<tr>
<td align="center">
Entertainment
</td>
<td align="center">
Tweet Content
</td>
</tr>
But using the source code I have below:
<table id="table" border="1" align="center">
<tr>
<td align="center">
Category of Tweets:
</td>
<td align="center">
All Tweets of User: <%out.println(userName); %>
</td>
</tr>
<%
for(int i=0;i<stringList.size();i++){
%>
<tr>
<td>
<%
if(i%2==0){
String category = stringList.get(i).toString();
out.print(category);
%>
</td>
<%}else{ %>
<td>
<%
String content = stringList.get(i).toString();
out.print(content);
}
%>
</td>
</tr>
<%
}
%>
</table>
The browser seems to duplicate extra table tags just before the else statement:
<td>
</td>
</tr>
<tr>
<td>
</td>
I am lost on how to resolve this. Could anybody tell me how I should amend the code?
Assuming, as you said, that the list holds data like [category, content, category, content, ...] and therefore contains complete pairs, you can iterate over it as shown below:
<table id="table" border="1" align="center">
<tr>
<td align="center">
Category of Tweets:
</td>
<td align="center">
All Tweets of User: <%out.println(userName); %>
</td>
</tr>
<%
for(int i=0;i<stringList.size()/2;i++){
%>
<tr>
<td align="center">
<%
String category = stringList.get(i*2).toString();
out.print(category);
%>
</td>
<td align="center">
<%
String content = stringList.get(i*2+1).toString();
out.print(content);
%>
</td>
</tr>
<%
}
%>
</table>
Note that category reads the elements at even positions (0, 2, 4, ...) and content reads the elements at odd positions (1, 3, 5, ...).
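The same index arithmetic can be checked outside the JSP. A minimal sketch in plain Java, assuming the list holds strings; the class name TweetRows and the method toPairs are hypothetical names for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class TweetRows {
    /** Groups [category, content, category, content, ...] into [category, content] pairs. */
    static List<String[]> toPairs(List<String> flat) {
        List<String[]> rows = new ArrayList<>();
        // Integer division drops a trailing element that has no partner.
        for (int i = 0; i < flat.size() / 2; i++) {
            String category = flat.get(i * 2);     // even positions: 0, 2, 4, ...
            String content = flat.get(i * 2 + 1);  // odd positions: 1, 3, 5, ...
            rows.add(new String[] { category, content });
        }
        return rows;
    }
}
```

Building the pairs first would also let the JSP loop emit exactly one table row per pair, with no conditional logic around the td tags.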
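You were ending the <td> after your if condition and then starting a new <td> before the else.
Try this instead and see if it gives the expected output. Both cells are written inside the if branch, so the content cell reads the element at i + 1, and the surrounding loop should step by two (i += 2) to avoid empty rows.
<td>
<%
if(i%2==0){
String category = stringList.get(i).toString();
out.print(category);
%>
</td>
<td>
<%
String content = stringList.get(i + 1).toString();
out.print(content);
}
%>
</td>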
Check this code; it might help. It keeps the tr and td tags inside the conditional statements:
<table id="table" border="1" align="center">
<tr>
<td align="center">
Category of Tweets:
</td>
<td align="center">
All Tweets of User: <%out.println(userName); %>
</td>
</tr>
<%
for(int i=0;i<stringList.size();i++){
%>
<%
if(i%2==0){%><tr>
<td><%
String category = stringList.get(i).toString();
out.print(category);
%>
</td></tr>
<%}else{ %>
<tr>
<td>
<%
String content = stringList.get(i).toString();
out.print(content);%>
</td>
</tr><%
}
}
%>
</table>
How about
<td>
<%
if(i%2==0){
String category = stringList.get(i).toString();
out.print(category);
}
%>
</td>
<td>
<%
if(i%2!=0){ // a separate if: an else cannot follow the template text between scriptlets
String content = stringList.get(i).toString();
out.print(content);
}
%>
</td>
