Read td class from URL using Java

Hello, I need a little help reading website content. I want to read this row:
<tr>
<td class="text-center"><strong>This Month</strong></td>
<td class="text-center">1194</td>
<td class="text-center">22</td>
<td class="text-center">7</td>
</tr>
I tried it like this, but it always returns nothing:
if (url.toLowerCase().contains("top100arena.com") && line.contains("<strong>This Month</strong></td><td class=\"text-center\">"))
return Integer.valueOf(line.split(">")[1].replace("</td", "").replace(",", ""));

If you are open to using a third-party library, jsoup is a good choice.
https://jsoup.org/
Using CSS or XPath selectors, you can point at the element you are interested in, e.g.
//tr/td[strong]
finds every cell that contains a bold entry; from there you can step up to its parent row and read all of that row's cells.
http://xpather.com/
https://devhints.io/xpath
jsoup: How to select the parent nodes, which have children satisfying a condition
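The question's loop most likely returns nothing because it tests one line at a time, while the markup shown puts each `</td>` and the next `<td class="text-center">` on separate lines, so the searched pattern never occurs within a single line. Below is a minimal, stdlib-only sketch of the XPath idea from the answer (find the bold cell, go up to its parent row, read the other cells). It assumes the HTML has already been read into one string and that the fragment is well-formed XML, which real pages usually are not (that is where jsoup helps). `ThisMonthRow` and `texts` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ThisMonthRow {
    // Parses a well-formed (X)HTML snippet and returns the trimmed text
    // of every node matched by the XPath expression.
    static List<String> texts(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // The row from the question, kept in one string (line breaks included),
        // as if the whole page had been read before searching it.
        String row = "<tr>\n"
                + "<td class=\"text-center\"><strong>This Month</strong></td>\n"
                + "<td class=\"text-center\">1194</td>\n"
                + "<td class=\"text-center\">22</td>\n"
                + "<td class=\"text-center\">7</td>\n"
                + "</tr>";
        // Find the cell holding the bold label, step up to its row with "..",
        // then keep every cell of that row that is not the label itself.
        List<String> values = texts(row, "//td[strong='This Month']/../td[not(strong)]");
        System.out.println(values); // [1194, 22, 7]
        // Strip thousands separators before converting, as the question's code does.
        int thisMonth = Integer.parseInt(values.get(0).replace(",", ""));
        System.out.println(thisMonth); // 1194
    }
}
```

With jsoup, which tolerates real-world HTML, the same idea can be written as a CSS selector such as `td:has(strong) ~ td` (cells that follow a cell containing a bold entry).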

Related

Get data from an HTML table on a website in Android using the Jsoup library

I am working on an app that parses data from one or two websites. I managed this for some of my target data, but not all of it. I am using Jsoup to parse the data, and for phase 2 of my app I used the same Jsoup approach as in phase 1, but this time nothing is fetched and the ArrayList shows up blank. I compared both HTML sources and there is a slight difference between them.
In phase 1 I selected the table by its class and then read the corresponding cells of that table. In phase 2 the structure of the table and its tr and td elements is different, so I am struggling to figure it out. Here is the HTML I want to get data from:
<div class="view-content">
<table class="views-table cols-3">
<thead>
</thead>
<tbody>
<tr class="odd views-row-first views-row-last">
<td class="views-field views-field-counter">
1 </td>
<td class="views-field views-field-body">
<p>some text here</p>
</td>
<td class="views-field views-field-field-notif-pdf">
Size :- 1.85 MB, Language:- English</td>
</tr>
</tbody>
</table>
</div>
I want the data inside the table above, and I am having trouble figuring out how to get it when every tr and td has classes. Any help or suggestions will be highly appreciated.
Thank you!
You can use selectors in Jsoup:
File input = new File("path_to_html/test.html");
Document doc = Jsoup.parse(input, StandardCharsets.UTF_8.name());
// select the table body
Element tbody = doc.select("tbody").first();
Other examples are at:
https://jsoup.org/cookbook/extracting-data/selector-syntax
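The classes on every tr and td are not an obstacle: a selector only has to match one class token. In jsoup that is a CSS selector such as `table.views-table td.views-field-body` (a `.name` selector matches any one of an element's classes). So the idea can be tried without jsoup, here is a stdlib-only sketch of the same selection in XPath 1.0 using `javax.xml.xpath`, assuming the snippet from the question (it happens to be well-formed). `ViewsTable`, `hasClass` and `column` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ViewsTable {
    // XPath 1.0 has no ".class" shorthand; this predicate is the usual idiom for
    // "the class attribute contains this whole space-separated token".
    static String hasClass(String token) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')";
    }

    // Returns the trimmed text of the cell carrying cellClass in every body row
    // of the table whose class list includes "views-table".
    static List<String> column(String xhtml, String cellClass) {
        String xpath = "//table[" + hasClass("views-table") + "]/tbody/tr/td[" + hasClass(cellClass) + "]";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList cells = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                result.add(cells.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"view-content\">"
                + "<table class=\"views-table cols-3\"><thead></thead><tbody>"
                + "<tr class=\"odd views-row-first views-row-last\">"
                + "<td class=\"views-field views-field-counter\">\n1 </td>"
                + "<td class=\"views-field views-field-body\">\n<p>some text here</p>\n</td>"
                + "<td class=\"views-field views-field-field-notif-pdf\">\nSize :- 1.85 MB, Language:- English</td>"
                + "</tr></tbody></table></div>";
        System.out.println(column(html, "views-field-counter"));         // [1]
        System.out.println(column(html, "views-field-body"));            // [some text here]
        System.out.println(column(html, "views-field-field-notif-pdf")); // [Size :- 1.85 MB, Language:- English]
    }
}
```

The token idiom matters because a plain `contains(@class, 'views-field')` would also match `views-field-body` and every other class that merely starts with that text.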

Selenium, Java: need to select an ancestor element inside a table by XPath

I have an HTML page containing the following code :
<table class="report" style="width:100%">
<tbody>
<tr>
<th/>
<th>Position Open
<br>
<span class="timestamp">27/7/2016 16:12:12</span>
</br>
</th>
<th>Position closed
<br>
<span class="timestamp">27/7/2016 16:12:42</span>
</br>
</th>
</tr>
<tr>
<td>
<span dir="ltr">EURJPY</span>
</td>
<td>116.098</td>
<td>116.156</td>
</tr>
</tbody>
</table>
On this page there is another table with the same class attribute "report", but only this table contains the texts "Position Open" and "Position Closed".
I need to select the elements containing the "EURJPY", "116.098" and "116.156" data.
The content of these elements changes, i.e. instead of "EURJPY" there may be "EURUSD" or "GBPCAD", etc.
I tried the following code:
driver.findElement(By.xpath("//span[text()='Position Open']/ancestor::table[@class='report'](//tr)[2]/td/span")).getAttribute("textContent");
to get the text of the first required field, but got an "Invalid selector" error.
Your XPath is close, but there are a couple of issues.
//span[text()='Position Open']/ancestor::table[@class='report'](//tr)[2]/td/span
You are searching for a SPAN that contains the text 'Position Open' when in fact it is a TH that contains the text.
//th[text()='Position Open']/ancestor::table[@class='report'](//tr)[2]/td/span
(//tr) should be corrected to //tr, because a parenthesized expression cannot be used as a step in the middle of a path.
//th[text()='Position Open']/ancestor::table[@class='report']//tr[2]/td/span
What you want is the text contained in the TD, not the SPAN. If you pull the text from the TD, you get the text you want from all three cells. If you pull the SPAN, you will also need to pull the last two TDs separately. This way is simpler.
//th[text()='Position Open']/ancestor::table[@class='report']//tr[2]/td
...and finally, the TH contains more than just the text you are looking for, so an exact text() match fails. Use contains() to get a match.
Putting that XPath into Java code gives the following:
List<WebElement> tds = driver.findElements(By.xpath("//th[contains(text(),'Position Open')]/ancestor::table[@class='report']//tr[2]/td"));
for (WebElement td : tds)
{
System.out.println(td.getText());
}
Exact text matching can fail, so use contains() instead. Try this selector:
//th[contains(.,'Position')]/ancestor::table[@class='report']//tr[2]/td/span
You can use this XPath to locate the 3 <td> tags you are interested in:
//th[contains(text(),'Position Open')]/ancestor::table//tr[2]/td
It gives you a list of three elements, and you can extract the text from them:
List<WebElement> tds = driver.findElements(By.xpath("//th[contains(text(),'Position Open')]/ancestor::table//tr[2]/td"));
String currency = tds.get(0).getText(); // EURJPY
String open = tds.get(1).getText();     // 116.098
String closed = tds.get(2).getText();   // 116.156
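To check the final expression without a browser, here is a stdlib-only sketch that evaluates it with `javax.xml.xpath` against the question's table (made well-formed) placed next to a second, hypothetical `report` table that lacks the "Position Open" header. It also shows why the exact `text()='Position Open'` comparison finds nothing. `ReportTable`, `PAGE`, `ROW_CELLS` and `texts` are hypothetical names; in Selenium the same string is passed to `By.xpath` unchanged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReportTable {
    // The question's table next to a second, hypothetical "report" table,
    // wrapped in one root element so the snippet parses as XML.
    static final String PAGE = "<div>"
            + "<table class=\"report\"><tbody>"
            + "<tr><th/>"
            + "<th>Position Open\n<br><span class=\"timestamp\">27/7/2016 16:12:12</span></br></th>"
            + "<th>Position closed\n<br><span class=\"timestamp\">27/7/2016 16:12:42</span></br></th></tr>"
            + "<tr><td>\n<span dir=\"ltr\">EURJPY</span>\n</td><td>116.098</td><td>116.156</td></tr>"
            + "</tbody></table>"
            + "<table class=\"report\"><tbody>"
            + "<tr><th>Other report</th></tr>"
            + "<tr><td>not wanted</td></tr>"
            + "</tbody></table>"
            + "</div>";

    // contains(text(), ...) tests the th's first text node ("Position Open" plus a newline),
    // so the nested timestamp does not prevent the match.
    static final String ROW_CELLS =
            "//th[contains(text(),'Position Open')]/ancestor::table[@class='report']//tr[2]/td";

    // Returns the trimmed text of every node the expression matches.
    static List<String> texts(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts(PAGE, ROW_CELLS));                     // [EURJPY, 116.098, 116.156]
        System.out.println(texts(PAGE, "//th[text()='Position Open']")); // []
    }
}
```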

Syntax error on selenium xpath expression

I'm building a very complex Selenium findElement(By.xpath(...)) expression, and it appears to contain a syntax error.
I don't really know XML, so I obviously made some mistakes. Could someone tell me where they are? :)
Thanks!
I have a list of projects on a web page; each project has sub-projects. I want a specific sub-project, and I know its name and subName.
The HTML code is something like:
<table id="list_proj" class="table tablebas table-striped table-bordered">
<thead>
<tbody style="height:1em;overflow-y:scroll"></tbody>
<tbody style="height:1em;overflow-y:scroll"></tbody>
<tbody style="height:1em;overflow-y:scroll">
<tr class="caption">
<td class="app_name" style="background-color:#EFEFEF;" colspan="11">
<b>
Application SEVo
</b>
</td>
</tr>
<tr class="data"></tr>
<tr class="data"></tr>
<tr class="data">
<td title="Pas de commentaire !" style="text-align:left">
<img alt="MCO" src="/colibri/images/corner-dots.gif"></img>
MCO
</td>
<td>0.50</td>
<td></td>
<td colspan="2">
</tr>
<tr class="data">
</tbody>
<tbody style="height:1em;overflow-y:scroll"></tbody>
</table>
And my XPath expression:
driver.findElement(
By.xpath(
"((//a[contains(text(),'"+name+"')])[1]
.(ancestor::td[@class='caption'])[1]
.following-sibling::td[@class='data']
.descendant::(a[contains(text(),'"+subName+"')])[1])[1]"
));
Sorry for the mess ^^'
Use the XPath below, which is better:
//a[contains(.,'"+name+"')]/ancestor::tr/following::a[contains(.,'"+subName+"')]
For example: //a[contains(.,'Application SEVo')]/ancestor::tr/following::a[contains(.,'MCO')]
Update: another option is:
//a[contains(.,'"+name+"')]/following::a[contains(.,'"+subName+"')]
Hope it helps :)
I don't know XPath with Selenium, but I think the syntax error is in this line:
.descendant::(a[contains(text(),'"+subName+"')])[1])[1]
Should be
.descendant::(//a[contains(text(),'"+subName+"')])[1])[1]
? (note the "//a")
It should be something like:
//a[contains(text(),'"+name+"')]/ancestor::tr[@class='caption']/following-sibling::tr[@class='data']//a[contains(text(),'"+subName+"')]
Don't use the index [1] every time. If you are sure that only one element will be returned, there is no point in putting an index on every descendant or ancestor call.
"((//a[contains(text(),'"+name+"')])[1]
.(ancestor::td[@class='caption'])[1]
.following-sibling::td[@class='data']
.descendant::(a[contains(text(),'"+subName+"')])[1])[1]"
There's so much wrong with this it's hard to know where to start. First, we don't know what's in the variables name and subName - if these strings contain single quotes then anything might happen (injection attack). Secondly, it seems to be using "." as a separator between steps in the path when it should use "/". Third, what follows an axis like "descendant::" must be a NodeTest, and that rules out a parenthesized expression.
I think there's a real problem in your approach. Writing a complex expression in a language you don't understand and then asking on SO what the syntax errors mean is not going to be a good way to make progress. (Once you've got rid of the syntax errors, will you then know what the XPath actually means?). Do some reading and learn the language properly; don't try to drive a car without taking driving lessons.
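Putting several of the points above together (steps separated by `/`, the caption and data classes matched on tr rather than td, and a properly quoted literal for `name` and `subName` so a value containing a quote cannot break the expression), here is a minimal stdlib-only sketch. It assumes a simplified, well-formed version of the posted table; since the posted markup contains no `<a>` elements, it matches on cell text with `contains(., ...)` instead of on links. `SubProject`, `xpathLiteral`, `findSubProject` and `TABLE` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SubProject {
    // Builds an XPath 1.0 string literal for any Java string. XPath 1.0 has no escape
    // sequences, so a value containing both ' and " must be assembled with concat().
    static String xpathLiteral(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";
        }
        // Both quote kinds present: split on ' and glue the pieces back with concat().
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = s.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    // Text of the first data-row cell containing subName after the caption row containing name.
    static String findSubProject(String xhtml, String name, String subName) {
        String xpath = "(//tr[@class='caption'][contains(., " + xpathLiteral(name) + ")]"
                + "/following-sibling::tr[@class='data']"
                + "/td[contains(., " + xpathLiteral(subName) + ")])[1]";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // evaluate(String, Object) returns the string value of the first match, or "" if none.
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    static final String TABLE = "<table id=\"list_proj\"><tbody>"
            + "<tr class=\"caption\"><td class=\"app_name\" colspan=\"11\"><b>Application SEVo</b></td></tr>"
            + "<tr class=\"data\"><td>Other</td><td>1.00</td></tr>"
            + "<tr class=\"data\"><td title=\"Pas de commentaire !\">"
            + "<img alt=\"MCO\" src=\"/colibri/images/corner-dots.gif\"/> MCO </td><td>0.50</td></tr>"
            + "</tbody></table>";

    public static void main(String[] args) {
        System.out.println(findSubProject(TABLE, "Application SEVo", "MCO")); // MCO
        System.out.println(xpathLiteral("it's \"quoted\""));                  // concat('it', "'", 's "quoted"')
    }
}
```

Note that `following-sibling::tr` is not limited to the current project, so if this project has no matching sub-project, one belonging to a later project could match; checking that no caption row lies in between would tighten it.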

How can I get the last tag inner HTML?

I want to get the last item inside specific tags. I mean:
<tr>
<td><b>my name</b></td>
<td><spec id="nm" nm="eg">Example Name</spec>
</td>
</tr>
....
<tr>
<td><b>samp2</b></td>
<td title="samp2"><div>Example 2</div>
</td>
</tr>
I want to reach "Example Name", and I want the program to be dynamic. How can I do that?
(As you can see, the last tag here is "spec"; in another scenario the last tag might be a different one. How can I find the last tag's inner HTML? In the second sample I want to get "Example 2".)
Updated sample:
If I have this:
<table>
<tr>
<td>1</td>
<td><div>2</div></td>
</tr>
<tr>
<td><span>3</span></td>
</tr>
</table>
then the output should be:
2 and 3
because they are the inner HTML of the last tags under each tr tag.
(I want the last tag under each tr tag, but if it has a child element I want the child's inner HTML.)
Thanks in advance!
You can use the jsoup HTML parser for this; it lets you find elements with CSS or jQuery-like selectors:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
String html = "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>";
Document doc = Jsoup.parse(html);
// last td in each row
Elements elements = doc.select("tr td:last-child");
for (Element element : elements) {
    // html() returns the cell's inner HTML; for a nested tag, text() gives "2" instead of "<div>2</div>"
    System.out.println(element.html());
}
Output:
2
4
You could also try a regex like:
/<spec[^>]*>(.*?)<\/spec>/
I don't think it is efficient, but you can try it and refine the regex for better performance:
/<td[^>]*>(.*?)<\/td><\/tr>/
This is an approximation and fails when the cell has child elements. You can use this result to strip span, div, etc.:
/<(.*?)[^>]*>(.*?)<\/(.*?)>/
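The rule the question describes (take the last tag under each tr, and if it has child elements, keep descending into the last one) can also be written as a plain DOM walk. This is a minimal stdlib-only sketch, assuming the HTML is well-formed XML, as both samples in the question are; it returns the text of the innermost element rather than its markup. `LastTags`, `lastElementChild` and `lastTagTexts` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LastTags {
    // Returns the last element child of parent, skipping text nodes such as line breaks.
    static Element lastElementChild(Element parent) {
        for (Node n = parent.getLastChild(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // For each tr: take its last element, keep descending into the last child element
    // while there is one, and collect the trimmed text of the innermost element reached.
    static List<String> lastTagTexts(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Snippet is not well-formed XML", e);
        }
        NodeList rows = doc.getElementsByTagName("tr");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Element current = lastElementChild((Element) rows.item(i));
            if (current == null) {
                continue; // empty row
            }
            Element child;
            while ((child = lastElementChild(current)) != null) {
                current = child;
            }
            result.add(current.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String updated = "<table><tr><td>1</td><td><div>2</div></td></tr>"
                + "<tr><td><span>3</span></td></tr></table>";
        System.out.println(lastTagTexts(updated)); // [2, 3]
    }
}
```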

I want to find the XPath for the image

This is the HTML code:
<div class="navBg">
<table id="topnav" class="navTable" cellspacing="0" cellpadding="0" style="-moz-user-select: none; cursor: default;">
<tbody>
<tr>
<td class="logoCell" valign="top">
<td class="separator">
<td class="navItem relative" style="z-index: 99">
<td class="separator">
<td class="navItem relative">
<a class="content tasks" style="border-width: 0" href="/tasks/otasklist.do">
<div class="label" style="z-index:155; ">Tasks</div>
<img class="sizer" width="84" height="93" src="/img/default/pixel.gif?hash=1106906246"/>
<span class="bottomBorder">
I am trying to find the XPath for the image with
src="/img/default/pixel.gif?hash=1106906246"
I have tried different combinations, e.g.:
//table/tbody/tr/td[5][@class='navItem relative']/a/div[2]/img
I have written the following code too.
WebDriverWait wait= new WebDriverWait(driver, 20);
wait.until(ExpectedConditions.elementToBeClickable(By.linkText("Tasks")));
driver.findElement(By.xpath("//table/tbody/tr/td[5][@class='navItem relative']/a/div[2]/img")).click();
driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
FirePath identifies the element on the web page, but when I run the script it does not click the element, and the console shows a "No Such Element Exception".
Please answer in Java only.
Can somebody please help me out?
Thanks
I see you are using Selenium. The safest bet is to find the closest parent with an @id attribute and work your way down from there. Try this: //table[@id='topnav']//img. As alecxe pointed out, depending on how unique the image is in this table, you may need to narrow the XPath down a little more, for example //table[@id='topnav']//tr[1]//img, or even //table[@id='topnav']//td[contains(@class, 'navItem')]//img.
The XPath you posted will not work, as it has some problems compared to the sample HTML you posted:
tbody may not appear in all browsers
the qualifier [#class='navItem relative'] for the element td[5] is redundant (although this is not exactly a problem)
div[2] does not exist; your HTML sample shows only one div, and the img is a sibling of that div rather than its child
There are multiple ways to find the img tag, depending on how unique its attributes are and on its location on the page.
Here's one way to find it, based on the Tasks div:
//table//div[text()='Tasks']/following-sibling::img
You can also rely on the td in which the img is located and check for the sizer class (the img sits inside the a element, so use //img rather than /img):
//table//td[contains(@class, 'navItem')]//img[@class='sizer']
And so on.
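The candidate expressions can be compared offline. This is a stdlib-only sketch, assuming the posted markup completed into well-formed XML (each td closed where a browser's HTML parser would close it, i.e. at the next td); it prints the src each XPath reaches, or an empty string when nothing matches. `TopNavImage`, `NAV` and `evaluate` are hypothetical names, and a match here does not guarantee that Selenium can click the element, since visibility and timing also matter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TopNavImage {
    static final String NAV = "<div class=\"navBg\">"
            + "<table id=\"topnav\" class=\"navTable\"><tbody><tr>"
            + "<td class=\"logoCell\"/><td class=\"separator\"/>"
            + "<td class=\"navItem relative\"/><td class=\"separator\"/>"
            + "<td class=\"navItem relative\">"
            + "<a class=\"content tasks\" href=\"/tasks/otasklist.do\">"
            + "<div class=\"label\">Tasks</div>"
            + "<img class=\"sizer\" width=\"84\" height=\"93\" src=\"/img/default/pixel.gif?hash=1106906246\"/>"
            + "<span class=\"bottomBorder\"/>"
            + "</a></td></tr></tbody></table></div>";

    // Returns the string value of the first node the expression selects, or "" if none.
    static String evaluate(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "//table//div[text()='Tasks']/following-sibling::img",                          // matches
            "//table[@id='topnav']//td[contains(@class, 'navItem')]//img[@class='sizer']", // matches
            "//table/tbody/tr/td[5][@class='navItem relative']/a/img",                      // matches
            "//table/tbody/tr/td[5][@class='navItem relative']/a/div[2]/img"                // no second div
        };
        for (String xp : candidates) {
            // Appending /@src selects the image's src attribute, whose value is printed.
            System.out.println(xp + " -> " + evaluate(NAV, xp + "/@src"));
        }
    }
}
```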
