jsoup retrieving a specific table from the header


I have been working on this for a while and just can't work out how to get the table that corresponds to a given header. The tables are split up into sections, which I can retrieve; however, inside each section is a header with the title of the table. I need to find the section whose header matches a string and then pull the data from it. I'm fine with getting the data out of the table; it's just getting the correct section for the table that I'm stuck on.
HTML extract of the section:
<section class="blueTab">
<header><h2>Energy</h2></header> <!-- THE HEADER I NEED TO MATCH -->
<table class="infoTable">
<tr><th>Model</th><th>0-60 mph</th><th>Top Speed</th><th>BHP</th><th></th></tr>
<tr>
<td><p>1.4i 16V Energy 5d</p></td>
<td><p>12.8 secs</p></td>
<td><p>111 mph</p></td>
<td><p>88 bhp</p></td>
</tr>
<tr class="alternate">
<td><p>1.6i 16V Energy 5d</p></td>
<td><p>11.5 secs</p></td>
<td><p>115 mph</p></td>
<td><p>103 bhp</p></td>
</tr>
<tr>
<td><p>1.8i VVT Energy 5d Auto</p></td>
<td><p>10.7 secs</p></td>
<td><p>117 mph</p></td>
<td><p>138 bhp</p></td>
</tr>
<tr class="alternate">
<td><p>1.3 CDTi 16V Energy 5d</p></td>
<td><p>12.8 secs</p></td>
<td><p>107 mph</p></td>
<td><p>88 bhp</p></td>
</tr>
</table>
<div class="fr topMargin">
<div id="ctl00_contentHolder_topFullWidthContent" class="modelEnquiry">
<div id="ctl00_contentHolder_topFullWidthContent" class="buttonLinks">
</div>
<div class="cb"><!----></div>
</div>
</div>
<div class="cb"><!----></div>
</section>
I'm guessing I will have to use doc.getElementsByClass("blueTab") in a for loop and, for each element, check whether its h2 equals the string I'm looking for; I'm just not sure how to implement this.

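This should solve your problem. Note that Jsoup.parse(input, "UTF-8") is the overload that takes a File and a charset name; if you already have the HTML as a String, use Jsoup.parse(html) instead.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(input, "UTF-8");
Elements headings = doc.select(".blueTab header h2");
for (Element heading : headings)
{
    if (heading.text().equals("Energy")) // your comparison text
    {
        // h2 -> parent <header> -> next sibling element, i.e. the <table class="infoTable">
        Element tableElement = heading.parent().nextElementSibling();
        // read the rows from tableElement here
        break;
    }
}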
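The question's own idea (loop over the blueTab sections and check each h2) also works. A minimal sketch of that approach, assuming jsoup is on the classpath (a version that has Element.selectFirst); the class and method names (SectionTableFinder, findTableByHeader, readRows) are hypothetical, and the sample HTML is a trimmed copy of the extract in the question:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SectionTableFinder {

    /** Returns the infoTable of the first blueTab section whose h2 text equals headerText, or null. */
    static Element findTableByHeader(Document doc, String headerText) {
        for (Element section : doc.getElementsByClass("blueTab")) {
            Element heading = section.selectFirst("header h2");
            if (heading != null && heading.text().equals(headerText)) {
                // Search inside the matched section rather than relying on sibling order.
                return section.selectFirst("table.infoTable");
            }
        }
        return null;
    }

    /** Collects the text of every td cell, row by row; rows with no td (the th header row) are skipped. */
    static List<List<String>> readRows(Element table) {
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : table.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.select("td")) {
                cells.add(td.text());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Trimmed copy of the section extract from the question.
        String html = "<section class=\"blueTab\"><header><h2>Energy</h2></header>"
                + "<table class=\"infoTable\">"
                + "<tr><th>Model</th><th>0-60 mph</th></tr>"
                + "<tr><td><p>1.4i 16V Energy 5d</p></td><td><p>12.8 secs</p></td></tr>"
                + "</table></section>";
        Document doc = Jsoup.parse(html);
        Element table = findTableByHeader(doc, "Energy");
        if (table != null) {
            System.out.println(readRows(table)); // [[1.4i 16V Energy 5d, 12.8 secs]]
        }
    }
}
```

Compared with heading.parent().nextElementSibling(), searching inside the matched section keeps working if another element is ever placed between the header and the table, at the cost of assuming the table keeps its infoTable class.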

Related

Select href from HTML table using Jsoup

I have HTML table:
<table class="table_class" id="table_id">
<tbody>
<tr>...</tr>
<tr>
<td>...</td>
<td>
...
</td>
<td>...</td>
</tr>
<tr>...</tr>
</tbody>
</table>
I need to get all such hrefs from one column of the table.
I tried to use
Elements links = table.select("a[href]");
System.out.println(links);
but it parses the hrefs from the a tags on the complete page.

Maybe this will work; it scopes the selector to the table with id table_id, so links elsewhere on the page are excluded:
String url = "...";
Document doc = Jsoup.connect(url).get();
Elements elements = doc.select("#table_id a[href]");
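The selector above still returns links from every column of the table. To keep only one column, a :nth-child() structural pseudo-selector can be put on the td. A minimal sketch assuming jsoup is on the classpath; ColumnLinks and hrefsInColumn are hypothetical names, and the column number is a placeholder because the question doesn't say which column holds the links:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ColumnLinks {

    /**
     * Returns the href of every link in the given 1-based column of the table with the given id.
     * td:nth-child(n) counts the row's child elements, so a th in the row or a colspan
     * earlier in the row can make it differ from the visual column.
     */
    static List<String> hrefsInColumn(Document doc, String tableId, int column) {
        String query = "#" + tableId + " tr > td:nth-child(" + column + ") a[href]";
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select(query)) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }
}
```

For example, hrefsInColumn(Jsoup.connect(url).get(), "table_id", 2) would collect the hrefs from the second column only.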

How to make contains() method dynamic in selenium

I am trying to pass a value from Excel into the XPath, but I am getting a NoSuchElementException.
This is the code:
public String accountBalance(String accountNameToFind)
{
String accountBalance = null;
accountBalance = driver.findElement(By.xpath("//*[contains(text(),'" + accountNameToFind + "')]/following-sibling::td")).getText();
return accountBalance;
}
HTML:
<tr>
<th scope="row" class="">
SDRSP
<sup>2</sup> - <span class="td-copy-nowrap">1253 3292AUS</span>
<strong>
<a href="servlet/ca.tdbank.banking.servlet.SiteTransferOutServlet?dest=BROKER" class="td-link-standalone td-link-standalone-secondary">
<span class="td-copy-nowrap">
WebBroker
<span class="td-link-icon">›</span>
</span>
</a>
</strong>
</th>
<td class="td-copy-align-right">
$10,000.00
</td>
<td class="td-copy-align-centre">
</td>
</tr>

If your element is located inside a frame, you need to switch to it first and then handle the element:
driver.switchTo().frame("frame_ID");
accountBalance=driver.findElement(By.xpath("//*[contains(text(),'"+accountNameToFind+"')]/following-sibling::td")).getText();
...some other actions...
driver.switchTo().defaultContent();
If your frame has no id attribute, you can use other attributes instead, e.g. the class name:
driver.switchTo().frame(driver.findElement(By.cssSelector("frame.frameClassName")));

Your function can be simplified to the following:
public String accountBalance(String accountNameToFind)
{
return driver.findElement(By.xpath("//th[contains(text(),'" + accountNameToFind + "')]/following-sibling::td")).getText();
}
I replaced * with th in the XPath because I'm assuming that the th is the only element you want to match. The account name may appear elsewhere on the page and be causing issues.
If this is in a frame/iframe, you will need to switch to the frame first, as in @Andersson's answer.
This portion of the page may also be dynamic, in which case you have to wait for the element to be visible before trying to scrape the data. See WebDriverWait with ExpectedConditions.visibilityOfElementLocated().
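Because the XPath is built by pasting the Excel value between single quotes, an account name that itself contains an apostrophe would produce an invalid expression. A minimal sketch of a common XPath 1.0 workaround in plain Java (not part of Selenium's API; XPathLiterals and toXPathLiteral are hypothetical names): wrap the value in whichever quote character it doesn't contain, and fall back to concat() when it contains both.

```java
/** Hypothetical helper: turns arbitrary text into a valid XPath 1.0 string literal. */
class XPathLiterals {

    static String toXPathLiteral(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";   // no apostrophes: single quotes are safe
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\""; // apostrophes but no double quotes
        }
        // Both quote kinds present: split on apostrophes and glue the pieces
        // back together with concat(), inserting "'" between them.
        String[] parts = value.split("'", -1);
        StringBuilder literal = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                literal.append(", \"'\", ");
            }
            literal.append('\'').append(parts[i]).append('\'');
        }
        return literal.append(')').toString();
    }

    public static void main(String[] args) {
        String accountNameToFind = "O'Brien";
        String xpath = "//th[contains(text(), " + toXPathLiteral(accountNameToFind)
                + ")]/following-sibling::td";
        System.out.println(xpath); // //th[contains(text(), "O'Brien")]/following-sibling::td
    }
}
```

The resulting string can then be passed to By.xpath(...) exactly as before.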

Detecting same tags pattern in web page using jsoup java

I am writing code to detect matching tag patterns in a web page. Here is an example:
<body>
<table width="200" border="1">
<tr>
<td>Name</td>
<td>Place</td>
<td>Animal</td>
</tr>
<p>hello World</p>
<tr>
<td>Jack</td>
<td>New york</td>
<td>Lion</td>
</tr>
<b>Code Works</b>
<tr>
<td>George</td>
<td>Sydney</td>
<td>Tiger</td>
</tr>
<tr>
<td>Tina</td>
<td>Delhi</td>
<td>Cat</td>
</tr>
</table>
<table>
<tbody>
<tr>
<td> </td>
<td>
1
2
3
4
5
</td>
</tr>
</tbody>
</table>
</body>
For the tag pattern above, I need to find the tags that occur repeatedly and discard those that are not part of the pattern, like the b and p tags. In the first table, the tr and td tags repeat; in the second table, the 'a' tag is repeated.
This is what I have done so far:
I parsed the page into a DOM tree using Jsoup.
Then I used a NodeVisitor to traverse the tree. Using its head and tail methods, I can enter and exit tags.
But I am confused about how to proceed further.
Note: the tag patterns are not fixed; they will vary depending on the web page's structure. Any kind of help will be appreciated.

But I am confused about how to proceed further.
Your confusion is propagating and reaching us too. However, I'll try to give you a hint.
You can count the tags in your HTML code. If a tag's count reaches a certain threshold, you can consider this tag as "repeatedly occurring".
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Load document
String html = ...
Document doc = Jsoup.parse(html);
// Count tags by tag name (keying the map on Element would count elements, not tag types)
String tagsSelector = "*";
Map<String, Integer> tagsCountByType = new HashMap<>();
for (Element e : doc.select(tagsSelector)) {
    tagsCountByType.merge(e.tagName(), 1, Integer::sum);
}
// Find tag with a count greater than a given threshold
// ...
I didn't test the code. Just take it as an idea, some sort of inspiration.
Another idea, you can narrow down the tagsSelector. For example:
// All elements (tags) under any table directly under body.
String tagsSelector = "body > table *";
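The threshold step left as // ... in the counting code above could look like the following minimal sketch, assuming the counts are keyed by tag name (for example built with tagsCountByType.merge(e.tagName(), 1, Integer::sum)). It keeps tags whose count reaches (is at least) the threshold; only the JDK is needed, RepeatedTags and tagsAtOrAbove are hypothetical names, and the threshold value is something you would have to tune per page:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RepeatedTags {

    /** Returns the tag names whose count is at least threshold, in alphabetical order. */
    static List<String> tagsAtOrAbove(Map<String, Integer> tagCounts, int threshold) {
        List<String> repeated = new ArrayList<>();
        // Copying into a TreeMap only makes the output order deterministic.
        for (Map.Entry<String, Integer> entry : new TreeMap<>(tagCounts).entrySet()) {
            if (entry.getValue() >= threshold) {
                repeated.add(entry.getKey());
            }
        }
        return repeated;
    }
}
```

With counts such as tr = 4, td = 12, p = 1 and b = 1, a threshold of 2 keeps td and tr and drops the one-off p and b tags.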

how to extract data inside a specific td in html table using java

I have:
<table class="cast_list">
<tr><td colspan="4" class="castlist_label"></td></tr>
<tr class="odd">
<td class="primary_photo">
<a href="/name/nm0000209/?ref_=ttfc_fc_cl_i1" ><img height="44" width="32" alt="Tim Robbins" title="Tim Robbins"src="http://ia.media-imdb.com/images/G/01/imdb/images/nopicture/32x44/name-2138558783._V379389446_.png"class="loadlate hidden " loadlate="http://ia.media-imdb.com/images/M/MV5BMTI1OTYxNzAxOF5BMl5BanBnXkFtZTYwNTE5ODI4._V1_SY44_CR1,0,32,44_AL_.jpg" /></a> </td>
<td class="itemprop" itemprop="actor" itemscope itemtype="http://schema.org/Person">
<a href="/name/nm0000209/?ref_=ttfc_fc_cl_t1" itemprop='url'> <span class="itemprop" itemprop="name">Tim Robbins</span>
</a> </td>
<td class="ellipsis">
...
</td>
How can I get only the information inside the second td (td class="itemprop")? I want to get "/name/nm0000209/?ref_=ttfc_fc_cl_t1" and "Tim Robbins".
This is my code:
Elements elms = doc.getElementsByClass("cast_list").first().getElementsByTag("table");
Elements tds = elms.select("td");
for(Element td : tds){
if(td.attr("class").contains("itemprop")){
Elements links = tds.select("a[href]");
for(Element link : links){
if(link.attr("href").contains("name/nm"))
{
String castname = link.text();
String castImdbId = link.attr("href");
System.out.println("CastName:" + castname + "\n");
System.out.println("CastImdbID:" + castImdbId + "\n");
}
}
}
}
but it also returns the text of the link inside td class="primary_photo", which is empty. This is part of my output:
CastName:
CastImdbID:/name/nm0000209/?ref_=ttfc_fc_cl_i1
CastName:Tim Robbins
CastImdbID:/name/nm0000209/?ref_=ttfc_fc_cl_t1
CastName:
......
Could someone please let me know where my problem is? I think the condition if(td.attr("class").contains("itemprop")) does not work at all.
Thanks,

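Use a different CSS selector instead of td. Since the right <td> is identified by its class, why not use it:
td.itemprop
Your Java code would then start like this instead:
Elements tds = elms.select("td.itemprop");
Your if condition was actually fine: the stray links come from tds.select("a[href]") inside the loop, which searches every td in the list instead of the current one. If you keep the loop, call td.select("a[href]") there.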
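Putting the td.itemprop selector together with the link extraction, here is a minimal sketch assuming jsoup is on the classpath (CastList and readCast are hypothetical names). It selects the links directly, so neither the class check nor the nested loop is needed:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CastList {

    /** Returns {name, href} pairs for the links inside td.itemprop cells of table.cast_list. */
    static List<String[]> readCast(Document doc) {
        List<String[]> cast = new ArrayList<>();
        // The td.itemprop filter means the image links in td.primary_photo are never visited.
        for (Element link : doc.select("table.cast_list td.itemprop a[href]")) {
            String castName = link.text();         // e.g. "Tim Robbins"
            String castImdbId = link.attr("href"); // e.g. "/name/nm0000209/?ref_=ttfc_fc_cl_t1"
            cast.add(new String[] {castName, castImdbId});
        }
        return cast;
    }
}
```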

How to set the checkbox in the first column based on the value in the second column of a table?

I'm automating a task using Java and Selenium.
I want to set a checkbox (which is in the first column of a table) based on whether the value in the second column matches my input value. For example, in the following code snippet, the value "Magnus" matches my input value so I want to set the checkbox associated with it.
<table class="cuesTableBg" width="100%" cellspacing="0" border="0" summary="Find List Table Result">
<tbody>
<tr class="cuesTableBg">
<tr class="cuesTableRowEven">
<tr class="cuesTableRowOdd">
<td align="center">
<input class="content-nogroove" type="checkbox" name="result[1].chked" value="true">
<input type="hidden" value="1c62dd7a-097a-d318-df13-75de31f54cb9" name="result[1].col[0].stringVal">
<input type="hidden" value="Magnus" name="result[1].col[1].stringVal">
</td>
<td align="left">
<a class="cuesTextLink" href="userEdit.do?key=1c62dd7a-097a-d318-df13-75de31f54cb9">Magnus</a>
</td>
<td align="left"></td>
<td align="left">Carlsen</td>
<td align="left"></td>
</tr>
<tr class="cuesTableRowEven">
</tbody>
</table>
But I'm unable to do it. In the above case, the following two lines serve the purpose (as my input value matches the one in the second row):
WebElement checkbox = driver.findElement(By.xpath("//input[@type = 'checkbox' and @name = 'result[1].chked']"));
checkbox.click();
But this can't be used in general, as the required value might not always be in the second row.
I tried the following code block, but to no avail:
List<WebElement> rows = driver.findElements(By.xpath("//table[@summary = 'Find List Table Result']//tr"));
for (WebElement row : rows) {
WebElement userID = row.findElement(By.xpath(".//td[1]"));
if(userID.getText() == "Magnus") {
WebElement checkbox = row.findElement(By.xpath(".//input[@type = 'checkbox']"));
checkbox.click();
break;
}
}
For what it's worth, here is the XPath of the text in the second column:
/html/body[@id='mainbody']/table/tbody/tr/td/div[@id='contentautoscroll']/form/table[2]/tbody/tr[3]/td[2]/a
I don't know much about CSS selectors. Would they help here?

If you already know the input value, you can use the XPath below to select the respective checkbox:
"//a[text(),'Magnus']/parent::td/preceding-sibling::td/input[@type='checkbox']"
Update (the predicate must be text()='Magnus', not text(),'Magnus'):
"//a[text()='Magnus']/parent::td/preceding-sibling::td/input[@type='checkbox']"

When comparing two strings, equals() should be used instead of ==, which only checks whether both references point to the same object.
Replace
if(userID.getText() == "Magnus")
with
String check1 = userID.getText();
if (check1.equals("Magnus"))
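A minimal, self-contained illustration of the difference in plain Java (the class and method names are made up for the example). new String(...) is used to force a separate String object; it stands in for text built at runtime, such as the value returned by getText():

```java
class StringComparison {

    /** Reference comparison: true only if both variables point at the same String object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    /** Value comparison: true if both strings contain the same characters. */
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "Magnus";
        String fromPage = new String("Magnus"); // a distinct object with the same characters
        System.out.println(sameObject(literal, fromPage)); // false
        System.out.println(sameText(literal, fromPage));   // true
    }
}
```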

It was a silly mistake on my part: I was reading the first column (td[1]) instead of the second, and comparing the strings with == instead of equals(). The following fairly straightforward snippet worked:
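import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

List<WebElement> rows = driver.findElements(By.xpath("//table[@class='cuesTableBg']//tr"));
for (WebElement row : rows) {
    // findElements returns an empty list instead of throwing for rows without a second <td>
    List<WebElement> secondColumn = row.findElements(By.xpath(".//td[2]"));
    if (!secondColumn.isEmpty() && secondColumn.get(0).getText().equals("Magnus")) {
        WebElement checkbox = row.findElement(By.xpath(".//td[1]/input[@type='checkbox']"));
        checkbox.click();
        break;
    }
}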
