I have a website that contains a table that look like similar(bigger..) to this one:
</table>
<tr>
<td>
<table width="100%" cellspacing="-1" cellpadding="0" border="0" dir="rtl" style="padding-top: 25px;">
<tr>
<td align="right" style="padding-right: 25px;">
<span class="artist_name_txt">
name
<p class="diccografia">subname</p>
</span>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table width="100%" border="0" cellspacing="0" cellpadding="0" dir="rtl" style="padding-right: 25px; padding-left: 25px">
<tr>
<td class="songs" align="right">
number1
</td>
</tr>
<tr>
<td class="songs" align="right">
number2
.......
</td>
</tr>
</table>
and I need an idea how can i parse the website and extract this table into 2 arrays -
one will be something like names{number1, number2}
and the second will be links{number1link, number2link}
I tried a lot of ways and nothing really helps me.
You should read the JSoup Cookbook - especially the Selector syntax is very powerful.
Here's an example:
final String html = ...
// use connect().get() instead if you connect to an website
Document doc = Jsoup.parse(html);
List<String> names = new ArrayList<>();
List<String> links = new ArrayList<>();
for( Element element : doc.select("a.artist_player_songlist") )
{
names.add(element.text());
links.add(element.attr("href"));
}
System.out.println("Names: " + names);
System.out.println("Links: " + links);
Output:
Names: [number1, number2]
Links: [/number1link, /number2link]
Android Web Scraping with a Headless Browser
Htmlunit on Android application
HttpUnit/HtmlUnit equivalent for android
Related
I want to loop through the news table and get the title and rating of each row. I tried different options, but I can’t understand why the select method receives all the options at once.
I need to get each news block in a loop.
I used this way to get table link:
Elements elements = document.select("#hnmain > tbody > tr:nth-child(3) > td > table");
This query doesn't work in a loop because it gets all the elements at once. I need to get the elements sequentially. So that I can do like this:
List list = new ArrayList<>();
for (Element element: elements){
String title = element...
String rating = element...
list.add(title);
list.add(rating);
}
Sample data from html:
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr class="athing" id="33582264">
<td align="right" valign="top" class="title"><span class="rank">1.</span></td>
<td valign="top" class="votelinks">
<center>
<a id="up_33582264" href="vote?id=33582264&how=up&goto=front%3Fday%3D2022-11-13">
<div class="votearrow" title="upvote"></div></a>
</center></td>
<td class="title"><span class="titleline">Show HN: I built my own PM tool after trying Trello, Asana, ClickUp, etc.<span class="sitebit comhead"> (<span class="sitestr">upbase.io</span>)</span></span></td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext"><span class="subline"> <span class="score" id="score_33582264">632 points</span> by tonypham <span class="age" title="2022-11-13T12:00:06">20 days ago</span> <span id="unv_33582264"></span> | hide | 456 comments </span></td>
</tr>
<tr class="spacer" style="height:5px"></tr>
<tr class="athing" id="33584941">
<td align="right" valign="top" class="title"><span class="rank">2.</span></td>
<td valign="top" class="votelinks">
<center>
<a id="up_33584941" href="vote?id=33584941&how=up&goto=front%3Fday%3D2022-11-13">
<div class="votearrow" title="upvote"></div></a>
</center></td>
<td class="title"><span class="titleline">Forking Chrome to turn HTML into SVG<span class="sitebit comhead"> (<span class="sitestr">fathy.fr</span>)</span></span></td>
</tr>
if I understand your question I think this code will work for you
Document doc = Jsoup.parse("<table border=\"0\" id=\"hnmain\" cellpadding=\"0\" cellspacing=\"0\"> <tbody> <tr class=\"athing\" id=\"33582264\"> <td align=\"right\" valign=\"top\" class=\"title\"><span class=\"rank\">1.</span></td> <td valign=\"top\" class=\"votelinks\"> <center> <a id=\"up_33582264\" href=\"vote?id=33582264&how=up&goto=front%3Fday%3D2022-11-13\"> <div class=\"votearrow\" title=\"upvote\"></div></a> </center></td> <td class=\"title\"><span class=\"titleline\">Show HN: I built my own PM tool after trying Trello, Asana, ClickUp, etc.<span class=\"sitebit comhead\"> (<span class=\"sitestr\">upbase.io</span>)</span></span></td> </tr> <tr> <td colspan=\"2\"></td> <td class=\"subtext\"><span class=\"subline\"> <span class=\"score\" id=\"score_33582264\">632 points</span> by tonypham <span class=\"age\" title=\"2022-11-13T12:00:06\">20 days ago</span> <span id=\"unv_33582264\"></span> | hide | 456 comments </span></td> </tr> <tr class=\"spacer\" style=\"height:5px\"></tr> <tr class=\"athing\" id=\"33584941\"> <td align=\"right\" valign=\"top\" class=\"title\"><span class=\"rank\">2.</span></td> <td valign=\"top\" class=\"votelinks\"> <center> <a id=\"up_33584941\" href=\"vote?id=33584941&how=up&goto=front%3Fday%3D2022-11-13\"> <div class=\"votearrow\" title=\"upvote\"></div></a> </center></td> <td class=\"title\"><span class=\"titleline\">Forking Chrome to turn HTML into SVG<span class=\"sitebit comhead\"> (<span class=\"sitestr\">fathy.fr</span>)</span></span></td> </tr>");
Elements elements = doc.select("#hnmain .athing");
for (Element element : elements) {
String title = element.select(".title").text();
String rank = element.select(".rank").text();
System.out.println(title + " -- "+rank);
}
So i have this JSP page which having data from table and forming a GET request to render more data on another page , by clicking one of the table line
Problem is i have to transforming it into POST method , to avoid getting information in the http request link
i know how to use post with form, but here i have to take the date from a table line and not a form
Any idea how to do that. i'm new to JSP so i don't know how to do it
<table border=0 bgcolor=#92ADC2 cellspacing=1 cellpadding=3 width=95% align=center>
<tr class=entete>
<td class=texte8 align=center> <spring:message code="nom"/></td>
<td class=texte8 align=center> <spring:message code="date_naissance"/></td>
<td class=texte8 align=center> <spring:message code="numero"/></td>
</tr>
<%
String v_Person = "";
String v_date = "";
String v_numero = "";
for (int i = 0; i < PersonListeBean.getPerson(); i++)
{
Gen_rechBean cb = PersonListeBean.getPerson(i);
v_Person = cb.getname();
v_date=cb.getdate();
v_numero=cb.getNumero();
}
%>
<tr class="<%=class_cell%>" onMouseOver="this.className='over';" onMouseOut="this.className='<%=class_cell%>';" onclick="javascript:parent['gauche'].document.location='ResultServlet?name=<%=v_Person%>&numero=<%=v_numero%>&date_naissance=<%=v_date%>">
<td class=texte7 align=left > <%=cb.getname()%></td>
<td class=texte7 align=left > <%=cb.getdate()%></td>
<td class=texte7 align=left > <%=cb.getNumero()%></td>
</tr>
</table>
<br>
<table width="95%" align="center" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="right">
<a target="corps" href="rechResult.jsp" class="rub2" </a>
</td>
</tr>
</table>
I see what are you trying to do.
The easiest way to do that is using a form. So you can call a js method when you click the
<tr onclick="myMethod()">
that you want.
The method can fill your form and send the submit. Using this you can be redirected without sending data in you url.
A basic example could be:
(Supposing these are elements printed by server-side)
<tr onclick="myMethod(<%=getName()%>, <%=getDate()%>, <%=getNumero()%>)">
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<form id="myForm" action="targetFile.jsp" method="post">
//Hidden inputs to prevent users form touching this fields
<input type="hidden" name="name" id="data1">
<input type="hidden" name="date" id="data2">
<input type="hidden" name="numero" id="data3">
</form>
<script>
function myMethod(data1, data2, data3){
//Im gonna use jQuery. Is like javascript but quite faster to use
//Filling the form
$("#data1").val(data1);
$("#data2").val(data2);
$("#data3").val(data3);
//Submiting it
$("myForm").submit();
}
</script>
Let me know if it was helpful.
c:
I'm having an issue with Selenium in Java.
I have a web page like this:
<html>
<body>
<div id='content'>
<table class='matches'>
<tr id='today_01'>
<td class='team-a'>Real Madrid</td>
<td class='score'>0-0</td>
<td class='team-b'>Barcelona</td>
</tr>
<tr id='today_02'>
<td class='team-a'>PSG</td>
<td class='score'>1-1</td>
<td class='team-b'>Manchester City</td>
</tr>
<tr id='today_03'>
<td class='team-a'>Liverpool</td>
<td class='score'>2-2</td>
<td class='team-b'>Arsenal</td>
</tr>
</table>
<div id='content'>
<body>
<html>
I first get all the rows into a list:
List<WebElement> allRows = driver.findElements(By.xpath("//table[#class='matches']/tbody/tr[contains(#id, 'today')]"));
Next I iterate through all the elements displaying the WebElement (i.e. the row) and on the next line I display the td containing the home team, separated by a line:
for (WebElement row : allRows) {
System.out.println("Outer HTML for row" + row.getAttribute("outerHTML"));
System.out.println("Outer HTML for Home Team cell" + row.findElement(By.xpath("//td[contains(#class,'team-a')]")).getAttribute("outerHTML"));
System.out.println("------------------------------------------------------------");
}
The first println displays all rows, one by one.
The second however displays ONLY 'Real Madrid' for each iteration. I'm losing my mind because I don't understand why. Can someone please help?
The output:
<tr id='today_01'>
<td class='team-a'>Real Madrid</td>
<td class='score'>0-0</td>
<td class='team-b'>Barcelona</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
<tr id='today_02'>
<td class='team-a'>PSG</td>
<td class='score'>1-1</td>
<td class='team-b'>Manchester City</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
<tr id='today_03'>
<td class='team-a'>Liverpool</td>
<td class='score'>2-2</td>
<td class='team-b'>Arsenal</td>
</tr>
<td class='team-a'>Real Madrid</td>
------------------------------------------------------------
You have to use like this
System.out.println("Outer HTML for Home Team cell" + row.findElement(By.xpath("td[contains(#class,'team-a')]")).getAttribute("outerHTML"));
Then it will point to the correct element that we want.
I am trying to scrape with selenium a table of products.
Here is my example table:
<div class="article">
<table style="width: 100%">
<tbody><tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900101" class="changeable">
<span>Product 1 </span>
</a>
</td>
<td class="trenner_lu">
11.11.1999
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 1</a>
</td>
<td class="trenner_lu">
1999$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900347" class="changeable">
<span>Product 2 </span>
</a>
</td>
<td class="trenner_lu">
1.12.1944
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
</td>
<td class="trenner_lu">
1234$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12908635" class="changeable">
<img class="positionable" src="/ImageImage/12908635" alt="" style="width: 100px; opacity: 0.9;">
<span>Product 1 </span>
<img src="/Content/images/icons/photo.png" alt="Foto">
</a>
</td>
<td class="trenner_lu">
05.12.1950
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
,<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 4</a>
</td>
<td class="trenner_lu">
131282$
</td>
</tr>
</tbody></table>
</div>
I tried to scrape each element with:
List<WebElement> links = driver.findElements(By.xpath("//*[#id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> prodNames = driver.findElements(By.xpath("//*[#id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> group = driver.findElements(By.xpath("//*[#id=\"home\"]/div[3]/table/tbody/tr/td[4]/a"));
However, as you can see one of my td elements has two links inside, therefore my WebElement list has not the same length and it is extremely hard to merge together.
My desired list output should look like that:
[Product 1, 11.11.1999, Group 1, 1999$], [Product 2, 1.12.1944,Group 2, 1234$], [Product 1, 05.12.1950, Group 2 Group 2, 131282$]
Any suggestion how to scrape such a table much more efficient?
I appreciate your replies!
Think about everything you interact with as of objects:
class Table {
private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";
public String getTableCellText(int row, int col) {
WebElement cell = driver.findElement(By.xpath(String.format(TABLE_CELL, row, col)));
return cell.getText();
}
}
You can use it as you see fit:
Table t = new Table();
System.out.println(t.getTableCellText(3, 5)); // prints 131282$
You could probably iterate through each row to make it clearer as to what you are doing in python it would be:
rows = driver.find_elements(By.XPATH, "//*[#id=\"home\"]/div[3]/table/tbody/tr")
for row in rows:
cells = row.find_elements(By.XPATH, "//td")
product_name = cells[1].text
... etc ...
After Entering a string into a table with a checkbox next to it, I would like to click on the checkbox. In selenium, how can i iterate through the table and search for a particular text, then check the checkbox next to it.
Here's the html of the table:
<tbody>
<tr class="keyword-list-item">
<td width="75%">
<input class="keyword-selection-checkbox" type="checkbox" data-id="gw_78669090303"/>
<span>+spatspatulalas</span>
</td>
<td width="25%" style="text-align: right; padding-right: 4px;">
<span class="icon iconGoogle"/>
</td>
</tr>
<tr class="keyword-list-item">
<td width="75%">
<input class="keyword-selection-checkbox" type="checkbox" data-id="gw_102731166303"/>
<span>12.10 test post</span>
</td>
<td width="25%" style="text-align: right; padding-right: 4px;">
<span class="icon iconGoogle"/>
</td>
</tr>
You can use xpath for this. Just needs be little smart how you write the xpath. Notice the following xpath find the checkbox using the text of it.
String text = "12.10 test post";
By xpath = By.xpath("//span[contains(text(),'" + text + "')]/../input");
WebElement element = driver.findElement(xpath);