Java HTML Parsing not getting my data? - java

I have the following HTML code:
<tr class="odd">
<td class="first name">
3i Group PLC
</td>
<td class="value">457.80</td>
<td class="change up">+10.90</td> <td class="delta up">+2.44%</td> <td class="value">1,414,023</td>
<td class="datetime">11:35:08</td>
From this I need to get the value
457.80
(i.e. the text of the td with class value), and I currently have this Java code:
String FTSE = "http://www.bloomberg.com/quote/UKX:IND/members";
Document doc = Jsoup.connect(FTSE).get();
Elements links = doc.select("a[href='/quote/III:LN']");
for (Element link : links) {
    // try to read a "value" attribute and the text of the link
    System.out.println("\nlink : " + link.attr("value"));
    System.out.println("text : " + link.text());
}
When I run my program it terminates having output nothing. How do I make it so that it outputs the value, which in this case, is '457.80'?

links will contain the <a href...> element. What you are trying to retrieve is the text of a completely different element, i.e. a <td> tag which has the class value.
My guess is that you have multiple <tr> elements and you only want the one which contains the link you've selected. In which case you will need the following code:
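String FTSE = "http://www.bloomberg.com/quote/UKX:IND/members";
Document doc = Jsoup.connect(FTSE).get();
// select only the row(s) that contain the link
Elements trs = doc.select("tr:has(a[href='/quote/III:LN'])");
// within those rows, pick the cells with class "value"
Elements values = trs.select("td.value");
// the first "value" cell holds the price (457.80 in the sample row)
Element value = values.get(0);
System.out.println("text : " + value.text());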
Or something similar...

Related

I want the quote ID text displayed on the console, but the Review text is displayed instead

1) I want to capture the quote ID, but the Review text is captured and printed on the console instead.
2) I am using Selenium WebDriver with Java.
3) I just want the quote ID printed on the console; all the details are given below.
I want to capture the quote ID from a web page using Selenium WebDriver. Below are my DOM structure and the code that I have used. Instead of the quote ID, it prints the Review text, which is part of the element matched by the XPath.
DOM structure of my page:
<tbody>
<tr>
<tr id="pricerow" valign="top">
<td nowrap="nowrap">
3661017
<br/>
<a class="selectMenuHome" style="color: gray; text-transform: none; text-decoration: underline;cursor:hand;" alt="TestSITA" title="TestSITA">Review</a>
</td>
<td nowrap="nowrap">UNITED KINGDOM</td>
<td>London</td>
<td> 1 Ropemaker Street, London, United Kingdom </td>
<td nowrap="nowrap">3456789</td>
<td nowrap="nowrap"/>
<td nowrap="nowrap">2 Mbps</td>
<td nowrap="nowrap">ME</td>
<td nowrap="nowrap">Business VPN Corporate</td>
<td nowrap="nowrap">211</td>
XPath that I have used: (.//*[@id='pricerow']/td[1])[1]
Quote capture code:
public static void QuoteCapture() throws Exception {
    logmessage = ExcelUtils.getCellData(Constant.testcaserownum,
            Constant.TestStepID)
            + "; "
            + ExcelUtils.getCellData(Constant.testcaserownum,
                    Constant.TeststepDescription)
            + "; Action: Capture and stored value";
    Log.info(logmessage);
    String[] fetchfrompage = null;
    ActionElement = FindElement();
    if (ExcelUtils.getCellData(Constant.testcaserownum, Constant.Data1)
            .isEmpty()) {
        fetchfrompage = ActionElement.getText().split("\n");
        ValueCaptured.add(fetchfrompage[0]);
        // ValueCaptureindex=ValueCaptureindex+1;
        int size = ValueCaptured.size();
        size = size - 1;
        logmessage = ValueCaptured.get(size)
                + "; Value has been stored in index: " + size;
    }
    Log.info(logmessage);
}

Unable to retrieve the table th tag value using WebDriver with Java

From the HTML below, I want to check the header (th) value of each row in the table and, if it matches, retrieve the td value.
Below is my HTML:
<table class="span-5" id="summaryTable" title="Table showing Summary data">
<tbody>
<tr>
<th class="width-40" id="num">
(12) App no:
</th>
<td headers="num">
(11)
<strong>2796179</strong>
</td>
</tr>
<tr>
<th class="noLines alignLeft width35" id="EnglishTitle">
(54) English Title:
</th>
<td class="noLines alignLeft width65" headers="EnglishTitle">
FRAME BIT-SIZE ALLOCATION
</td>
</tr>
<tr>
</tbody>
</table>
I want to collect each th tag value (i.e. "(12) App no" and "(54) English Title").
My Java code:
WebElement summary = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody"));
List<WebElement> rows = summary.findElements(By.tagName("tr"));
for (int i = 1; i <= rows.size(); i++) {
    String dc = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td/th/a")).getText();
    if (dc.equalsIgnoreCase("(12) App no")) {
        appNo = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td/strong")).getText();
    }
}
But I'm getting: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id='summaryTable']/tbody/tr[1]/td/th/a"}
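Please use the code below for this:
WebElement elem = driver.findElement(By.id("summaryTable"));
List<WebElement> lists = elem.findElements(By.tagName("th"));
for (WebElement el : lists) {
    // Assumes each th wraps its label in an <a>, as the XPath in the question suggests.
    // The posted HTML shows no <a> inside th; in that case use el.getText() instead.
    WebElement element = el.findElement(By.tagName("a"));
    String str = element.getAttribute("innerHTML");
    System.out.println(str);
}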
I think you are making this a bit complicated. Can you try a simpler version?
public String getRequiredDataFromTableFromRow(String header) {
    WebElement table = driver.findElement(By.id("summaryTable"));
    List<WebElement> rows = table.findElements(By.tagName("tr"));
    for (WebElement row : rows) {
        // return the first td of the first row whose text contains the header
        if (row.getText().contains(header)) {
            return row.findElement(By.tagName("td")).getText();
        }
    }
    return null;
}
Cells are indexed by position within the row, so you need to specify the position to get the text. Note that XPath positions start at 1, and that the th tag is a sibling of the td tag, not a child of it.
Try the following code:
WebElement summary = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody"));
List<WebElement> rows = summary.findElements(By.tagName("tr"));
for (int i = 1; i <= rows.size(); i++) {
    // XPath positions are 1-based: th[1] is the first header cell in the row
    String dc = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/th[1]")).getText();
    // getText() returns the visible label, which in the posted HTML ends with a colon
    if (dc.equalsIgnoreCase("(12) App no:")) {
        appNo = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td[1]")).getText();
    }
}
The code below gets the text of each th element:
WebElement summary = driver.findElement(By.id("summaryTable"));
List<WebElement> headers = summary.findElements(By.tagName("th"));
for (WebElement header : headers) {
    // print the visible text of each header cell
    System.out.println(header.getText());
}
In the code above, I get a reference to the table by its id and use that same reference to get the list of th elements.
If you want to operate on the text that was found, you can do so through the header reference inside the loop.
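The row-by-row lookup discussed in the answers above can be tried without a browser. The following is a minimal sketch, not the Selenium code itself: it assumes the table can be treated as well-formed XML and uses the JDK's built-in javax.xml.xpath API in place of WebDriver. The class name HeaderLookupDemo and the helpers valueFor and count are made up for illustration. It also shows that XPath positions are 1-based, so a predicate of [0] selects nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses JDK XPath only, not Selenium.
class HeaderLookupDemo {
    // Well-formed copy of the table from the question (stray trailing <tr> dropped).
    static final String TABLE =
        "<table id='summaryTable'><tbody>"
        + "<tr><th id='num'> (12) App no: </th>"
        + "<td headers='num'> (11) <strong>2796179</strong></td></tr>"
        + "<tr><th id='EnglishTitle'> (54) English Title: </th>"
        + "<td headers='EnglishTitle'> FRAME BIT-SIZE ALLOCATION </td></tr>"
        + "</tbody></table>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the text of the first td in the row whose th contains the header, or null.
    static String valueFor(String header) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xp.evaluate(
                    "//table[@id='summaryTable']//tr", parse(TABLE), XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                // Paths without a leading slash are relative to the current row.
                if (xp.evaluate("normalize-space(th[1])", row).contains(header)) {
                    return xp.evaluate("normalize-space(td[1])", row);
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the nodes matched by an XPath expression over the sample table.
    static int count(String expr) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            Double n = (Double) xp.evaluate(
                    "count(" + expr + ")", parse(TABLE), XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueFor("(12) App no"));
        System.out.println(valueFor("(54) English Title"));
        System.out.println("th[1] matches: " + count("//tr/th[1]"));
        System.out.println("th[0] matches: " + count("//tr/th[0]"));
    }
}
```

With the sample table, the first lookup returns the whole cell text, "(11) 2796179"; appending /strong to the td step would narrow it to the number alone. In Selenium the same relative lookups would go through row.findElement(By.xpath(...)) with paths that do not start with //.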

Unable to grab an element in a dynamic table where we have only text of a td tag

I have a table whose data changes as items are added and deleted. The table has multiple columns: Name, Qty, Type and Status. All I have is the Name text, and I need to find the Status field of the row that has that Name.
The problem is that the HTML tags have the same class names. I tried grabbing the parent and the siblings, and everything failed. The HTML structure of the table is below:
<table>
<thead> </thead>
<tbody>
<tr>
<td class = "c1">
<a class = txtclass> text1 </a>
</td>
<td class = "c1"> Qty </td>
<td class = "c2"> type </td>
<td class = "c3">
<div id = "1" class = "status1"> </div>
</td>
</tr>
<tr>
<td class = "c1">
<a> text2 </a>
</td>
<td class = "c1"> Qty </td>
<td class = "c2"> type </td>
<td class = "c3">
<div id = "2" class = "status2"> </div>
</td>
</tr>
</tbody>
</table>
So all I have is text2, and I need to get the status of that row.
How do I proceed? I tried:
List<WebElement> ele = driver.findElements(By.xpath("//*[@class = 'txtClass'][contains(text(),'text')]"));
for (WebElement el1 : ele) {
    WebElement parent = el1.findElement(By.xpath(".."));
    WebElement child1 = parent.findElement(By.xpath("//td[4]/div"));
    System.out.println(child1.getAttribute("class"));
}
This always gives me the class name of the status in the first row of the table.
I tried the same with:
WebElement child = el1.findElement(By.xpath("//following-sibling::td[4]/div[1]"));
and got the same thing: the class name from the first row of the table. I figured that since the class names are the same for all child elements, it will always grab the first row's elements, and not the ones from the matching row.
Please help, I have been stuck here for a long time. Let me know if you need any other details.
You are using:
el1.findElements(By.xpath("//following-sibling::td[4]/div[1]"));
Because the expression starts with //, it is evaluated against the whole page: it matches every element of the form td[4]/div[1] and retrieves the first match.
Use the following XPath to get the status div based on your text:
driver.findElement(By.xpath(".//tr/td[contains(.,'text1')]/following-sibling::td[3]/div")).getAttribute("class");
If you need to extract every matching status, try the following code:
List<WebElement> allElements = driver.findElements(By.xpath(".//tr/td[contains(.,'text2')]/following-sibling::td[3]/div"));
for (WebElement element : allElements) {
    String status = element.getAttribute("class");
    System.out.println(status);
}
I think this approach would suit you.
Get all the status div elements:
List<WebElement> listChildStatus = driver.findElements(By.xpath(".//tr[.//a[contains(.,'text')]]//div"));
Get a specific status div element:
WebElement childStatus = driver.findElement(By.xpath(".//tr[.//a[contains(.,'{TEXT}')]]//div"));
Here {TEXT} is the text value that you have.
You need to locate the element with the text, go one level up, and then get the sibling with the status
WebElement statusElement = driver.findElement(By.xpath(".//td[a[contains(text(),'text2')]]/following-sibling::td[3]/div"));
String status = statusElement.getAttribute("class"); // will be status2
If you don't want to rely on the index, you can use last() to find the last sibling:
WebElement statusElement = driver.findElement(By.xpath(".//td[a[contains(text(),'text2')]]/following-sibling::td[last()]/div"));
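The difference between a path that starts with // and one that is relative to the located element can be checked outside the browser. This is a minimal sketch under stated assumptions: the question's table is rewritten as well-formed XML, and the JDK's javax.xml.xpath API stands in for Selenium. The class XPathScopeDemo and the helper statusClass are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; JDK XPath stands in for Selenium here.
class XPathScopeDemo {
    // Well-formed copy of the two-row table from the question.
    static final String TABLE =
        "<table><tbody>"
        + "<tr><td class='c1'><a class='txtclass'> text1 </a></td>"
        + "<td class='c1'> Qty </td><td class='c2'> type </td>"
        + "<td class='c3'><div id='1' class='status1'/></td></tr>"
        + "<tr><td class='c1'><a> text2 </a></td>"
        + "<td class='c1'> Qty </td><td class='c2'> type </td>"
        + "<td class='c3'><div id='2' class='status2'/></td></tr>"
        + "</tbody></table>";

    // Finds the <a> whose text contains name, evaluates pathFromAnchor with that
    // anchor as the context node, and returns the class of the first matching div.
    static String statusClass(String name, String pathFromAnchor) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TABLE)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // name is assumed to contain no quote characters
            Node anchor = (Node) xp.evaluate(
                    "//a[contains(text(),'" + name + "')]", doc, XPathConstants.NODE);
            return xp.evaluate(pathFromAnchor + "/@class", anchor);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A leading // restarts at the document root, whatever the context node is.
        System.out.println(statusClass("text2", "//td[4]/div"));
        // Relative paths stay inside the row of the located anchor.
        System.out.println(statusClass("text2", "../following-sibling::td[3]/div"));
        System.out.println(statusClass("text2", "ancestor::tr/td[4]/div"));
    }
}
```

Starting from the text2 anchor, "//td[4]/div" still resolves to status1 because the search restarts at the document root, while the two relative paths resolve to status2.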

how to extract data inside a specific td in html table using java

I have:
<table class="cast_list">
<tr><td colspan="4" class="castlist_label"></td></tr>
<tr class="odd">
<td class="primary_photo">
<a href="/name/nm0000209/?ref_=ttfc_fc_cl_i1" ><img height="44" width="32" alt="Tim Robbins" title="Tim Robbins"src="http://ia.media-imdb.com/images/G/01/imdb/images/nopicture/32x44/name-2138558783._V379389446_.png"class="loadlate hidden " loadlate="http://ia.media-imdb.com/images/M/MV5BMTI1OTYxNzAxOF5BMl5BanBnXkFtZTYwNTE5ODI4._V1_SY44_CR1,0,32,44_AL_.jpg" /></a> </td>
<td class="itemprop" itemprop="actor" itemscope itemtype="http://schema.org/Person">
<a href="/name/nm0000209/?ref_=ttfc_fc_cl_t1" itemprop='url'> <span class="itemprop" itemprop="name">Tim Robbins</span>
</a> </td>
<td class="ellipsis">
...
</td>
How can I get only the information inside the second td (td class="itemprop")? I want to get "/name/nm0000209/?ref_=ttfc_fc_cl_t1" and "Tim Robbins".
This is my code:
Elements elms = doc.getElementsByClass("cast_list").first().getElementsByTag("table");
Elements tds = elms.select("td");
for (Element td : tds) {
    if (td.attr("class").contains("itemprop")) {
        Elements links = tds.select("a[href]");
        for (Element link : links) {
            if (link.attr("href").contains("name/nm")) {
                String castname = link.text();
                String castImdbId = link.attr("href");
                System.out.println("CastName:" + castname + "\n");
                System.out.println("CastImdbID:" + castImdbId + "\n");
            }
        }
    }
}
But it also returns the link inside td class="primary_photo", whose text is empty. This is part of my output:
CastName:
CastImdbID:/name/nm0000209/?ref_=ttfc_fc_cl_i1
CastName:Tim Robbins
CastImdbID:/name/nm0000209/?ref_=ttfc_fc_cl_t1
CastName:
......
Could someone please let me know where my problem is? I think the condition if(td.attr("class").contains("itemprop")) does not work at all.
Thanks,
Use a more specific CSS selector instead of td. Since the right <td> is identified by its class, why not use it:
td.itemprop
Your Java code would then start like this instead:
Elements tds = elms.select("td.itemprop");
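One thing worth noting about the code in the question: inside the loop it calls tds.select("a[href]"), which searches every cell again rather than only the current td, so the class check has no effect on which links are printed. Restricting the selection to the itemprop cells up front avoids that. The same scoping idea can be sketched without jsoup. The block below is an illustration under stated assumptions: the markup is simplified to well-formed XML, the JDK's javax.xml.xpath API is used as a stand-in for jsoup's selectors, and the names CastCellDemo and castLinks are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; JDK XPath stands in for jsoup selectors here.
class CastCellDemo {
    // Simplified, well-formed version of the cast row from the question.
    static final String TABLE =
        "<table class='cast_list'><tr class='odd'>"
        + "<td class='primary_photo'><a href='/name/nm0000209/?ref_=ttfc_fc_cl_i1'>"
        + "<img alt='Tim Robbins'/></a></td>"
        + "<td class='itemprop'><a href='/name/nm0000209/?ref_=ttfc_fc_cl_t1'>"
        + " <span class='itemprop'>Tim Robbins</span> </a></td>"
        + "<td class='ellipsis'>...</td>"
        + "</tr></table>";

    // Returns "name|href" for each link directly inside a td whose class list has itemprop.
    static List<String> castLinks() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TABLE)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Token match on @class, roughly what the CSS selector td.itemprop does.
            NodeList links = (NodeList) xp.evaluate(
                    "//td[contains(concat(' ', normalize-space(@class), ' '), ' itemprop ')]"
                    + "/a[@href]", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                Element a = (Element) links.item(i);
                out.add(a.getTextContent().trim() + "|" + a.getAttribute("href"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String entry : castLinks()) {
            System.out.println(entry);
        }
    }
}
```

Only the link in the itemprop cell is returned; the photo link in the primary_photo cell is never visited because the selection is limited to the matching cell before any links are read.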

Jsoup image tag extraction

I need to extract an image tag using jsoup from this HTML:
<div class="picture">
<img src="http://asdasd/aacb.jpgs" title="picture" alt="picture" />
</div>
I need to extract the src of this img tag.
I am using this code, and I am getting a null value:
Element masthead2 = doc.select("div.picture").first();
String linkText = masthead2.outerHtml();
Document doc1 = Jsoup.parse(linkText);
Element masthead3 = doc1.select("img[src]").first();
String linkText1 = masthead3.html();
Here's an example to get the image source attribute:
public static void main(String... args) {
    Document doc = Jsoup.parse("<div class=\"picture\"><img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /></div>");
    Element img = doc.select("div.picture img").first();
    String imgSrc = img.attr("src");
    System.out.println("Img source: " + imgSrc);
}
The div.picture img selector finds the image element under the div.
The main extraction methods on an element are:
attr(name), which gets the value of an element's attribute,
text(), which gets the text content of an element (e.g. in <p>Hello</p>, text() is "Hello"),
html(), which gets an element's inner HTML (for <div><img></div>, html() is <img>), and
outerHtml(), which gets an element's full HTML (for <div><img></div>, outerHtml() is <div><img></div>).
You don't need to reparse the HTML as in your current example. Either select the correct element in the first place using a more specific selector, or use the element.select(String) method to narrow down the result.
<tr> <td class="blackNoLine" nowrap="nowrap" valign="top" width="25" align="left"><b>CAST: </b></td> <td class="blackNoLine" valign="top" width="416">Jay, Shazahn Padamsee </td> </tr>
You can use:
Document doc = Jsoup.parse(...);
Elements els = doc.select("td[class=blackNoLine]");
Element el= els.get(1);
String castName = el.text();
With the following code I can extract the image correctly:
Document doc = Jsoup.parse("<div class=\"picture\"> <img src=\"http://asdasd/aacb.jpgs\" title=\"picture\" alt=\"picture\" /> </div>");
Element elem = doc.select("div.picture img").first();
System.out.println("elem: " + elem.attr("src"));
I'm using jsoup release 1.2.2, which is the latest one at the time of writing.
Maybe you're trying to print the inner html of an empty tag like img.
From the documentation: "html() - Retrieves the element's inner HTML".
For the second portion of HTML you can use:
Document doc2 = Jsoup.parse("<tr> <td class=\"blackNoLine\" nowrap=\"nowrap\" valign=\"top\" width=\"25\" align=\"left\"><b>CAST: </b></td> <td class=\"blackNoLine\" valign=\"top\" width=\"416\">Jay, Shazahn Padamsee </td> </tr>");
Elements trElems = doc2.select("tr");
if (trElems != null) {
    for (Element element : trElems) {
        Element secondTd = element.select("td").get(1);
        System.out.println("name: " + secondTd.text());
    }
}
which prints "Jay, Shazahn Padamsee".
