Extracting text with jsoup out of a table with breaks

Does anyone know how I can extract these texts with jsoup?
<TR>
<TD bgColor=#ffa55c><B>
The first text I want. </B><BR>
<BR>
The second text I want <BR>
</TD>
</TR>
I can get the first one with:
Element element = doc.select("tr td:eq(1) b").get(1);
element.text();
But I can't get the second one.

You need to close the first TD cell and open a new one, so that each text sits in its own cell and the HTML is well formed:
<table>
<TR>
<TD bgColor=#ffa55c><B>
The first text I want. </B><BR>
<BR>
</TD><TD> <!-- add this -->
The second text I want <BR>
</TD>
</TR>
</table>
Otherwise jsoup treats the first and second texts as one cell, and get will throw an IndexOutOfBoundsException. Then you can simply use
Element element = doc.select("td").get(1); // get() is zero-based, so index 1 is the second cell

Using the table data you gave us, you can easily get all the text in one fell swoop:
String html = "<TR><TD bgColor=#ffa55c><B>The first text I want.</B><BR><BR>The second text I want<BR></TD></TR>";
Document doc = Jsoup.parse(html);
System.out.println("test: " + doc.text());
With the output:
test: The first text I want. The second text I want
I think you need to restrict your select to the TR and ignore everything after it, so make it something like
// get the TRs
Elements elements = doc.select("tr");
// iterate through the TRs
for (Element element: elements){
System.out.println(element.text());
}

Related

Selenium, Java. Need to select ancestor element inside table by Xpath

I have an HTML page containing the following code :
<table class="report" style="width:100%">
<tbody>
<tr>
<th/>
<th>Position Open
<br>
<span class="timestamp">27/7/2016 16:12:12</span>
</br>
</th>
<th>Position closed
<br>
<span class="timestamp">27/7/2016 16:12:42</span>
</br>
</th>
</tr>
<tr>
<td>
<span dir="ltr">EURJPY</span>
</td>
<td>116.098</td>
<td>116.156</td>
</tr>
</tbody>
</table>
On this page I have another table with the same class attribute "report" but only this table contains texts "Position Open" and "Position Closed".
I need to select elements containing the "EURJPY", "116.098" and "116.156" data.
The content of these elements changes, i.e. instead of "EURJPY" there may be "EURUSD" or "GBPCAD" etc.
I tried the following code:
driver.findElement(By.xpath("//span[text()='Position Open']/ancestor::table[@class='report'](//tr)[2]/td/span")).getAttribute("textContent");
to get the text of the first required field, but got an "Invalid selector" error.

Your XPath is close, but there are a few issues.
//span[text()='Position Open']/ancestor::table[@class='report'](//tr)[2]/td/span
You are searching for a SPAN that contains the text 'Position Open' when in fact it is a TH that contains the text.
//th[text()='Position Open']/ancestor::table[@class='report'](//tr)[2]/td/span
(//tr) should be corrected to //tr
//th[text()='Position Open']/ancestor::table[@class='report']//tr[2]/td/span
What you want is the text contained in the TD, not the SPAN. If you pull the text from the TD, you can get the text you want from all three elements. If you pull the SPAN, then you will also need to pull the last two TDs separately. This way is just simpler.
//th[text()='Position Open']/ancestor::table[@class='report']//tr[2]/td
Finally, the TH contains more than just the text you are looking for, so an exact text() comparison may not match. Use contains() to get a match.
//th[contains(text(),'Position Open')]/ancestor::table[@class='report']//tr[2]/td
Putting that XPath into Java code gives the following.
List<WebElement> tds = driver.findElements(By.xpath("//th[contains(text(),'Position Open')]/ancestor::table[@class='report']//tr[2]/td"));
for (WebElement td : tds)
{
System.out.println(td.getText());
}
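To try the final expression without a browser, here is a minimal sketch that evaluates it with the JDK's built-in javax.xml.xpath against a well-formed copy of the table. The class name XPathTableCheck, the trimmed markup (timestamps omitted) and the second dummy "report" table are assumptions made for illustration. Selenium evaluates the XPath in the browser instead, so only the expression itself carries over.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTableCheck {
    // Well-formed stand-in for the page: the target table plus a second
    // table with the same class that must not be matched (hypothetical data).
    static final String XML =
        "<html><body>"
        + "<table class='report'><tbody>"
        + "<tr><th/><th>Position Open<br/></th><th>Position closed<br/></th></tr>"
        + "<tr><td><span dir='ltr'>EURJPY</span></td><td>116.098</td><td>116.156</td></tr>"
        + "</tbody></table>"
        + "<table class='report'><tbody>"
        + "<tr><th>Other</th></tr><tr><td>ignored</td></tr>"
        + "</tbody></table>"
        + "</body></html>";

    // Returns the trimmed text of every TD in the second row of the table
    // whose header contains 'Position Open'.
    static List<String> cellTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            String expr = "//th[contains(text(),'Position Open')]"
                + "/ancestor::table[@class='report']//tr[2]/td";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cellTexts(XML)); // [EURJPY, 116.098, 116.156]
    }
}
```

Reading the text from the TD rather than the SPAN is what lets one expression cover all three cells, since only the first cell wraps its value in a SPAN.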

Matching the exact text can sometimes fail, so use contains() instead. Try this selector:
//th[contains(.,'Position')]/ancestor::table[@class='report']//tr[2]/td/span

You can use this XPath to locate the three <td> tags you are interested in:
//th[contains(text(),'Position Open')]/ancestor::table//tr[2]/td
It gives you a list of three elements, from which you can extract the text:
List<WebElement> tds = driver.findElements(By.xpath("//th[contains(text(),'Position Open')]/ancestor::table//tr[2]/td"));
String currency = tds.get(0).getText(); // this will be EURJPY (list indices are zero-based)
tds.get(1).getText(); // 116.098
tds.get(2).getText(); // 116.156

Click span element from div class

I need to select testdt4 from the table, but I am not able to get the whole list of table items as web elements. Like testdt4, there are many table rows that I need to find. Can I get the whole list as a List<WebElement>? Please help.
<tr class="ng-scope" ng-class="::{warning: !$$cancelLastAction && (entity.$$inProgress || entity.$$hasError)}" ng-repeat="entity in locals.data">
<td class="selectRow">
<!-- ngRepeat: columnDef in locals.gridOptions.columnDefs -->
<td class="ng-scope" ng-repeat="columnDef in locals.gridOptions.columnDefs">
<ci-resource-grid-cell text-limit="textLimit" column="columnDef" data="entity">
<a class="ng-scope" ng-click="openLink(entity)" href="">
<div class="ciTruncate" truncate-limit="150" ci-truncate="columnDef.$$parse(entity)">
<span class="ng-scope">**testdt4**</span>
</div>
</a>
</ci-resource-grid-cell>
</td>
<!-- end ngRepeat: columnDef in locals.gridOptions.columnDefs -->
<td class="ng-scope" ng-repeat="columnDef in locals.gridOptions.columnDefs">

First, decide what you are trying to find: a single element or all matching elements? For multiple elements, use findElements:
[https://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/SearchContext.html#findElements-org.openqa.selenium.By-]
Are you trying to find cells, rows, or the whole table? The cell tag is td, the row tag is tr, and the table tag is table.
With findElements, an XPath search like the following gives you every matching span inside a cell:
XPath expression:
//td//span[contains(@class,'ng-scope') and contains(text(),'testdt4')]
Where I used //td above, replace it with tr or table as needed.
Example:
XPath expression:
//tr//span[contains(@class,'ng-scope') and contains(text(),'testdt4')]
XPath expression:
//table//span[contains(@class,'ng-scope') and contains(text(),'testdt4')]
List<WebElement> someList = driver.findElements(By.xpath("XPath expression"));
Show me your code and I will modify it. Update: you should be using td for cells.
By identifierXPath = By.xpath("//tr//span[contains(@class,'ng-scope') and contains(text(),'testdt4')]");
List<WebElement> list_Cells = driver.findElements(identifierXPath);
System.out.println(list_Cells.size());
for (WebElement single_Cell : list_Cells){
System.out.println("Cell Text ::-" + single_Cell.getText().trim());
}

Selenium Xpath to find a table cells inside a div tag

I have the following HTML, and I need to check whether some text exists within a table cell:
<div class="background-ljus" id="AutoText">
<table class="topAlignedCellContent">
<tbody>
<tr>
<td>X1</td>
</tr>
<tr>
<td>X2</td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td>Y1</td>
<td>Y2</td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td>Z1</td>
<td>Z2</td>
</tr>
</tbody>
</table>
</div>
I have solved it like this:
By locator = getLocator(CommonConst.XPATH, "*//div[@" + type + "='" + anyName + "']");
fluentWait(locator);
WebElement div = getDriver().findElement(locator);
List<WebElement> cells = div.findElements(By.tagName("td"));
for (WebElement cell : cells) {
if (cell.getText().contains(cellText)) {
foundit = true;
}
}
But I think it's a bit slow, because I need to do this several times.
I tried to do this with XPath alone but had no luck:
"*//div[@id='AutoText']//td[contains[Text(), 'celltext']]"
"*//div[@id='AutoText']//table//tbody//tr[td//Text()[contains[., 'celltext']]"
Does anyone have a suggestion about why my XPath isn't working?

Wrong:
*//div[@id='AutoText']//td[contains[Text(), 'celltext']]
*//div[@id='AutoText']//table//tbody//tr[td//Text()[contains[., 'celltext']]
Correct (with shorter alternatives):
//div[@id='AutoText']//td[contains(text(), 'celltext')]
//div[@id='AutoText']//td[contains(., 'celltext')]
//div[@id='AutoText']/table/tbody/tr[td//text()[contains(., 'celltext')]]
//div[@id='AutoText']/table/tbody/tr[td[contains(., 'celltext')]]
contains() is a function
text() needs to be lowercase
predicates can be nested
don't use // when you don't have to
Note
. refers to "this node" and, when given as an argument to a string function such as contains(), is the equivalent of string(.). This, however, is not at all the same as text().
string(.) creates the concatenation of all text nodes inside the current node, no matter how deeply nested. text() is a node test and by default selects only the direct children.
In other words, if the current node was <td>foo <b>bar</b> baz</td>, contains(text(), 'bar') would actually be false. (So would contains(text(), 'baz'), for a different reason.)
contains(., 'bar') on the other hand would return true, so would contains(., 'baz').
Only when the current node contains nothing but a single text node, text() and . are equivalent. Most of the time, you will want to work with . instead of text(). Set up your predicates accordingly.
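The difference between text() and . can be checked with a small sketch using the JDK's built-in XPath 1.0 engine (javax.xml.xpath). The class name TextVsDot and the helper check are made up for illustration. The sample node is the <td>foo <b>bar</b> baz</td> cell from the note above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextVsDot {
    // Returns true if the sample <td> satisfies the given XPath predicate.
    static boolean check(String predicate) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<td>foo <b>bar</b> baz</td>")));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expr = "boolean(/td[" + predicate + "])";
            return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // text() selects the direct text children "foo " and " baz";
        // contains() only looks at the first one, "foo ".
        System.out.println(check("contains(text(), 'bar')"));    // false
        System.out.println(check("contains(text(), 'baz')"));    // false
        // . is the string value of the whole cell: "foo bar baz".
        System.out.println(check("contains(., 'bar')"));         // true
        System.out.println(check("contains(., 'baz')"));         // true
        // Picking the second text node explicitly finds " baz".
        System.out.println(check("contains(text()[2], 'baz')")); // true
    }
}
```

The 'baz' case fails with text() because XPath 1.0 converts a node-set argument to the string value of its first node only, which is the "different reason" mentioned above.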

You can also do it with a dynamic XPath:
//div[@id='AutoText']/table[1]//tbody/tr[1]/td
Now replace 1 with a loop variable i and use a for loop to check whether each element returns a value.

Your XPaths are wrong. You should be using text() instead of Text(). Also, contains() is a function, so it takes round brackets, not square brackets. Try this:
"//div[@id='AutoText']//td[contains(text(), 'celltext')]"
"//div[@id='AutoText']//td[contains(., 'celltext')]"
The second form uses . instead of text(), so it also matches text inside child elements of the cell.
Hope it helps.

How can I get the last tag inner HTML?

I want to get the content of the last tag inside each table row.
For example:
<tr>
<td><b>my name</b></td>
<td><spec id="nm" nm="eg">Example Name</spec>
</td>
</tr>
....
<tr>
<td><b>samp2</b></td>
<td title="samp2"><div>Example 2</div>
</td>
</tr>
I want to reach "Example Name", and I want the program to handle this dynamically. How can I do that?
(As you can see, the last tag here is "spec", but in another scenario it may be a different tag. How can I find the inner HTML of the last tag? In the second sample I want to get "Example 2".)
Updated sample
If I have this:
<table>
<tr>
<td>1</td>
<td><div>2</div></td>
</tr>
<tr>
<td><span>3</span></td>
</tr>
</table>
So the output should be:
2 and 3
because they are the inner HTML of the last tag under each tr tag.
(I want the last tag under each tr tag, but if it has a child element, I want the child's inner HTML.)
Thanks in advance.

You can use the jsoup HTML parser to do it. It lets you find elements with CSS or jQuery-like selectors:
String html = "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>";
Document doc = Jsoup.parse(html);
System.out.println(doc);
Elements elements = doc.select("tr td:last-child");
for(Element element: elements) {
System.out.println(element.html());
}
output
2
4

You can try a regex like:
/<spec[^>]*>(.*?)<\/spec>/
I don't think it is efficient, but you can try it. This regex may perform better:
/<td[^>]*>(.*?)<\/td><\/tr>/
This is an approximation and fails when the cell has child elements. You can apply the following to its result to strip span, div, etc.:
/<(.*?)[^>]*>(.*?)<\/(.*?)>/

How to iterate over <td> tags with condition using jsoup

I am able to get all the text within the tags, but I want to access only specific td tags.
E.g. I want to get the text of the second cell in each row whose first cell contains an a element with the attribute
name="Manufacturer"
or name="Content". I am using jsoup.
<table>
<tr>
<td><a name="Manufacturer"></a>manufacturer</td>
<td>happiness</td>
</tr>
<tr>
<td>manuf</td>
<td>hap</td>
</tr>
<tr>
<td>tents</td>
<td>acd</td>
</tr>
<tr>
<td><a name="Content"></a>Contents</td>
<td>abcd</td>
</tr>
</table>
I am using this code:
doc.select("a[name=Manufacturer]");
but it gives me a reference into cell one. I need to move to cell two and get its text.

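Use an attribute selector of the form [attr=value], which matches elements with that attribute value, e.g. [width=500]. To reach the second cell, one option (assuming the a element sits directly inside the first td, as in your sample) is to call parent() on the matched element and then nextElementSibling().
Take a look at the Selector Syntax section of the official jsoup documentation.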
