How to target a specific td in an HTML document with Java

I want to target specific td elements inside a tr.
This is my code:
private void fletch(String name) throws IOException, JSONException {
final String iron = "img=2";
final String ui = "img=3";
final String hc = "img=10";
String url = "http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=";
if ( name.toLowerCase().indexOf(iron.toLowerCase()) != -1 ) {
url = "http://services.runescape.com/m=hiscore_oldschool_ironman/hiscorepersonal.ws?user1=";
}else if( name.toLowerCase().indexOf(ui.toLowerCase()) != -1 ){
url = "http://services.runescape.com/m=hiscore_oldschool_ultimate/hiscorepersonal.ws?user1=";
}else if( name.toLowerCase().indexOf(hc.toLowerCase()) != -1 ){
url = "http://services.runescape.com/m=hiscore_oldschool_hardcore_ironman/hiscorepersonal.ws?user1=";
}
String[] parts = name.split(">");
String part2 = parts[1];
String fin = part2.replaceAll("\\s","+");
url+=fin;
Document doc = Jsoup.connect(url)
.data("query", "Java")
.userAgent("Mozilla")
.cookie("auth", "token")
.timeout(3000)
.post();
//core part
Element table1 = doc.select("table").first();
String body = table1.toString();
Document docb = Jsoup.parseBodyFragment(body);
Element bbd = docb.body();
String hhk = bbd.toString();
//This is where I don't know how to target the td data. I tried this (I can't test the code right now, so I'm asking here):
String overall = bbd.getElementsByTag("td").get(4).text();
The table I get back looks like this:
<table cellpadding="3" cellspacing="0" border=0 style="max-width: 355px;">
<tr><td colspan="5" align="center"><b>Personal scores for big kurwaaa</b></td></tr>
<tr>
<td colspan="2" style="text-align:left;padding-left:24px;"><b>Skill</b></td><td align="right"><b>Rank</b></td><td align="right"><b>Level</b></td><td align="right"><b>XP</b></td>
</tr>
<tr><td width="35"></td><td width="100"></td><td width="75"></td><td width="40"></td><td width="75"></td></tr>
<tr>
<td></td>
<td align="left"><a href="overall.ws?table=0&user=big+kurwaaa">
Overall
</a></td>
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>
</tr>
<tr>
<td align="right"><img class="miniimg" src="http://www.runescape.com/img/rsp777/hiscores/skill_icon_attack1.gif"></td>
<td align="left"><a href="overall.ws?table=1&user=big+kurwaaa">
Attack
</a></td>
<td align="right">14,475</td>
<td align="right">19</td>
<td align="right">4,304</td>
</tr>
I want to target the three td elements that contain data inside every tr. For example:
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>
and so on, from the "Overall" tr to the last one. Is there a simple way to do this that lets me loop through the data and build a JSON object or a map?
PS: I'm new to Java.

If you want to get all the tr tags inside bbd, use getElementsByTag.
It returns an Elements collection, which lets you access each tr tag by its 0-based index. If you want to skip the first three tr tags, start the loop at index 3, and do the same for the td tags.
Here is the demo code:
Elements trList = bbd.getElementsByTag("tr");
for (int i = 3; i < trList.size(); i++) {
System.out.println("----------------- TR START -----------------");
Elements tdList = trList.get(i).getElementsByTag("td");
for (int j = 2; j < tdList.size(); j++) {
// text() prints the cell content rather than the whole td markup
System.out.println(tdList.get(j).text());
}
System.out.println("------------------ TR END ------------------");
}

String url = "yourUrl";
Document doc = Jsoup.connect(url).get();
Element table = doc.select("table[class=tableClass]").first();
Iterator<Element> iterator = table.select("td[align=right]").iterator();
iterator.next();//skip first
iterator.next();//skip second
System.out.println(iterator.next().text());
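The question also asks how to turn the looped data into a map. The following is a minimal sketch of that last step only, using plain Java and no jsoup. It assumes each row has already been reduced to four strings (skill, rank, level, XP), for example by calling text() on the td elements as in the answers above. The names HiscoreMap, parseGrouped and toMap are hypothetical and do not come from the original code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HiscoreMap {
    /** Parses text such as "6,164,312" into a long by dropping the grouping commas. */
    static long parseGrouped(String text) {
        return Long.parseLong(text.trim().replace(",", ""));
    }

    /**
     * Builds skill -> {rank, level, xp}. Each row is assumed to be
     * [skill, rank, level, xp]; shorter rows (headers, spacers) are skipped.
     */
    static Map<String, long[]> toMap(List<String[]> rows) {
        // LinkedHashMap keeps the skills in the order they appear in the table.
        Map<String, long[]> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            if (row.length < 4) {
                continue;
            }
            result.put(row[0].trim(), new long[] {
                parseGrouped(row[1]), parseGrouped(row[2]), parseGrouped(row[3])
            });
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"Overall", "7,430", "466", "6,164,312"},
            new String[] {"Attack", "14,475", "19", "4,304"});
        for (Map.Entry<String, long[]> e : toMap(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

This sketch does not handle cells that hold something other than a number; parseGrouped would throw a NumberFormatException for those, so real code would probably want a guard there.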

Related

Retrieve data from Html with JSoup

I'm trying to retrieve information from a website, but the problem is that the class names are identical.
This is the website structure.
<tr class="main_el">
<td class="key">KEY1</td>
<td class="val">VALUE1</td>
</tr>
<tr class="main_el">
<td class="key">KEY2</td>
<td class="val">VALUE2</td>
</tr>
...
<tr class="main_el">
<td class="key">KEY3</td>
<td class="val">VALUE3</td>
</tr>
I can't use .get(i).getElementsByClass() because the indexes are different on each page. Please help!
EDIT
I want to use KEY1 to retrieve VALUE1 only, independently of the other values.
Note that VALUE1 could be at index 1 or 9.
You can write a simple function like this:
public Map<String, String> parseHtml(String inputHtml) {
Document.OutputSettings outputSettings = new Document.OutputSettings();
outputSettings.syntax(Document.OutputSettings.Syntax.html);
outputSettings.prettyPrint(false);
Document htmlDoc = Jsoup.parse(inputHtml);
//Creating map to save td <key,value>
Map<String, String> textMap = new HashMap<>();
Elements trElements = htmlDoc.select("tr.main_el");
if (trElements.size() > 0) {
for (Element trElement : trElements) {
String key = null;
String value = null;
for (Element tdElement : trElement.children()) {
if (tdElement.hasClass("key"))
key = tdElement.text();
// The value cells in the question's HTML use class "val", not "value".
if (tdElement.hasClass("val"))
value = tdElement.text();
}
if (key != null && value != null)
textMap.put(key, value);
}
}
return textMap;
}
Then you can retrieve the values from the map using the keys from your HTML.
Thanks.
Maybe this works:
select all <tr> elements
for each <tr>
select <td> with class "key" from the <tr>
if value of this element == "KEY1" then
select <td> with class "val" from the same <tr>
do whatever you want with this value

Unable to retrieve the table th tag value using WebDriver with Java

From the HTML below, I want to check the header value of each row in the table and, if it matches, retrieve the td value.
Below is my HTML:
<table class="span-5" id="summaryTable" title="Table showing Summary data">
<tbody>
<tr>
<th class="width-40" id="num">
(12) App no:
</th>
<td headers="num">
(11)
<strong>2796179</strong>
</td>
</tr>
<tr>
<th class="noLines alignLeft width35" id="EnglishTitle">
(54) English Title:
</th>
<td class="noLines alignLeft width65" headers="EnglishTitle">
FRAME BIT-SIZE ALLOCATION
</td>
</tr>
<tr>
</tbody>
</table>
I want to collect each th tag value (i.e. "(12) App no" and "(54) English Title").
My Java code:
WebElement summary = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody"));
List<WebElement> rows = summary.findElements(By.tagName("tr"));
for (int i=1;i<=rows.size();i++){
String dc = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr["+i+"]/td/th/a")).getText();
if (dc.equalsIgnoreCase("(12) App no")){
appNo = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr["+i+"]/td/strong")).getText();
}
}
But I'm getting: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id='summaryTable']/tbody/tr[1]/td/th/a"}
Please use the code below. Note that the th elements in this table contain plain text, not a elements, so read the text from each th directly:
WebElement elem = driver.findElement(By.id("summaryTable"));
List<WebElement> lists = elem.findElements(By.tagName("th"));
for(WebElement el : lists){
// getText() returns the visible text of the th, e.g. "(12) App no:"
String str = el.getText();
System.out.println(str);
}
I think you are making it a bit complicated. Can you try a simpler version?
public String getRequiredDataFromTableFromRow(String header){
WebElement table = driver.findElement(By.id("summaryTable"));
List<WebElement> rows = table.findElements(By.tagName("tr"));
for (WebElement row:rows) {
if(row.getText().contains(header)){
return row.findElement(By.tagName("td")).getText();
}
}
return null;
}
The th is a sibling of the td inside each tr, not a child of it, so the path tr/td/th/a can never match. Positions in XPath are 1-based, so the first cell of a row is th[1] or td[1], not [0].
Try the following code:
WebElement summary = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody"));
List<WebElement> rows = summary.findElements(By.tagName("tr"));
for(int i = 1; i <= rows.size(); i++) {
String dc = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/th[1]")).getText();
// The header text ends with a colon ("(12) App no:"), so match on the prefix.
if(dc.trim().startsWith("(12) App no")) {
appNo = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td[1]")).getText();
}
}
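Two XPath details matter in this thread: attributes are addressed with @ (as in @id), and positions inside square brackets start at 1. The sketch below illustrates both with the JDK's own javax.xml.xpath instead of WebDriver, so it runs without a browser. The XML string is a simplified, well-formed copy of the table from the question, and the class name XPathIndexDemo and the method eval are made up for this illustration. The same expressions should in principle work with By.xpath, but that has not been verified here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathIndexDemo {
    // Simplified, well-formed copy of the summary table from the question.
    static final String XML =
        "<table id='summaryTable'><tbody>"
      + "<tr><th id='num'>(12) App no:</th><td>(11) <strong>2796179</strong></td></tr>"
      + "<tr><th id='EnglishTitle'>(54) English Title:</th><td>FRAME BIT-SIZE ALLOCATION</td></tr>"
      + "</tbody></table>";

    /** Evaluates an XPath 1.0 expression against XML and returns the string value of the first match. */
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String row1 = "//*[@id='summaryTable']/tbody/tr[1]";
        // th[1] is the first header cell of the row.
        System.out.println(eval(row1 + "/th[1]"));
        // The application number sits in the strong element of the first td.
        System.out.println(eval(row1 + "/td[1]/strong"));
        // Position 0 never matches, so the result is an empty string.
        System.out.println(eval(row1 + "/th[0]").isEmpty());
        // A predicate on the header text avoids looping over row indexes.
        System.out.println(eval("//tr[th[contains(., 'English Title')]]/td"));
    }
}
```

The last expression selects the row by its header text in a single query, which is an alternative to iterating over tr[i] in Java.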
The code below gets the text of each th element:
WebElement summary = driver.findElement(By.id("summaryTable"));
List<WebElement> headers = summary.findElements(By.tagName("th"));
for(WebElement header : headers){
System.out.println(header.getText());
}
In the code above, I get a reference to the table by its id and use that same reference to get the list of th elements.
If you want to do something with the text that was found, you can do it through the header element reference inside the loop.

HTML table td contents parsing using jsoup in Android

I have some HTML table contents, and for my application I want to parse them using jsoup on Android. But I am new to jsoup and can't parse the HTML contents properly.
HTML data:
<table id="box-table-a" summary="Tracking Result">
<thead>
<tr>
<th width="20%">AWB / Ref. No.</th>
<th width="30%">Status</th>
<th width="30%">Date Time</th>
<th width="20%">Location</th>
</tr>
</thead>
<tbody>
<tr>
<td width="20%" nowrap="nowrap" class="click">Z45681583</td>
<td width="30%" nowrap="nowrap" class="click">
IN TRANSIT<div id='ntfylink' style='display:block; text-decoration:blink'><a href='#' class='topopup' name='modal' style='text-decoration:none'><font face='Verdana' color='#DF0000'><blink>Notify Me</blink></font></a></div>
</td>
<td width="30%">
Sat, Jan, 31, 2015 07:09 PM
</td>
<td width="20%">DELHI</td>
</tr>
</tbody>
</table>
From this table I need the td contents.
Any help would be greatly appreciated.
Everything is described in the comments of the source code below.
private static String test(String htmlFile) {
File input = null;
Document doc = null;
Elements tdEles = null;
Element table = null;
String tdContents = "";
try {
input = new File(htmlFile);
doc = Jsoup.parse(input, "ASCII", "");
doc.outputSettings().charset("ASCII");
doc.outputSettings().escapeMode(EscapeMode.base);
/** Get table with id = box-table-a **/
table = doc.getElementById("box-table-a");
if (table != null) {
/** Get td tag elements **/
tdEles = table.getElementsByTag("td");
/** Loop each of the td element and get the content by ownText() **/
if (tdEles != null && tdEles.size() > 0) {
for (Element e: tdEles) {
String ownText = e.ownText();
//Delimiter as "||"
if (ownText != null && ownText.length() > 0)
tdContents += ownText + "||";
}
if (tdContents.length() > 0) {
tdContents = tdContents.substring(0, tdContents.length() - 2);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
return tdContents;
}
You can then manipulate the string for your TextView. All the td contents are delimited by ||. Use String.split() to get each item if you want.
String[] data = tdContents.split("\\|\\|");
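The backslashes in that split call are needed because String.split() takes a regular expression, and | is the regex alternation operator. A small sketch of the two usual ways to split on the literal delimiter follows; the class name SplitDemo and its method names are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SplitDemo {
    /** Splits on the literal two-character delimiter "||" by escaping each bar. */
    static List<String> splitEscaped(String joined) {
        return Arrays.asList(joined.split("\\|\\|"));
    }

    /** Same result, but lets Pattern.quote build the escaped regex for us. */
    static List<String> splitQuoted(String joined) {
        return Arrays.asList(joined.split(Pattern.quote("||")));
    }

    public static void main(String[] args) {
        String tdContents = "Z45681583||IN TRANSIT||Sat, Jan, 31, 2015 07:09 PM||DELHI";
        System.out.println(splitEscaped(tdContents));
        System.out.println(splitQuoted(tdContents));
    }
}
```

Pattern.quote is handy when the delimiter is held in a variable, since it escapes whatever regex metacharacters the delimiter happens to contain.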

Parsing values from complex table using JSoup

I have a table with the following html:
<TABLE class=data-table cellSpacing=0 cellPadding=0>
<TBODY>
<TR>
<TD colSpan=4><A id=accounting name=accounting></A>
<H3>Accounting</H3></TD></TR>
<TR>
<TH class=data-tablehd align=left>FORM NO.</TH>
<TH class=data-tablehd align=left>TITLE</TH>
<TH class=data-tablehd align=right>Microsoft</TH>
<TH class=data-tablehd align=right>Acrobat</TH></TR>
<TR>
<TD><A id=1008ft name=1008ft>SF 1008-FT</A></TD>
<TD>Work for Others Funding Transfer Between Projects for an Agreement</TD>
<TD align=right><A
href="https://someurl1"
target=top>MS Word</A></TD>
<TD align=right><A
href="https://someurl2"
target=top>PDF </A></TD></TR>
...
I need to parse the <TR> data getting something like
SF 1008-FT, Work for Others ... an Agreement, https://someurl1, https://someurl2
I have tried using the following code:
URL formURL = new URL("http://urlToParse");
Document doc = Jsoup.parse(formURL, 3000);
Element table = doc.select("TABLE[class = data-table]").first();
Iterator<Element> ite = table.select("td[colSpan=4]").iterator();
while(ite.next() != null) {
System.out.println(ite.next().text());
}
However this only returns the "back to Top" and some different headings located throughout the table.
Can someone help me write the correct JSoup code to parse the information I need?
I haven't had time to test this, but you can use something like this:
Element table = doc.select("TABLE[class = data-table]").first();
Elements rows = table.select("tr");
for (Element td: rows.get(2).children()) {
System.out.println(td.text());
}
You get the children of the 3rd row of the table.
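I found the solution by making some small modifications to code from a similar thread. The code is given below:
String formNumber, title, link1, link2;
for (Element table : doc.select("table")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
// Skip heading rows (th only) and section rows (a single td with colspan=4).
if (tds.size() < 4) continue;
formNumber = tds.get(0).text();
title = tds.get(1).text();
link1 = tds.get(2).select("a[href]").attr("href");
link2 = tds.get(3).select("a[href]").attr("href");
System.out.println(formNumber + ", " + title + ", " + link1 + ", " + link2);
}
}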

Adding td element to next tr of rowspan td

I have the HTML below:
<!DOCTYPE html>
<html>
<body>
<table border="1">
<tr>
<th>Month</th>
<th>Savings</th>
<th>Savings for holiday!</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
<td rowspan="2">$50</td>
</tr>
<tr>
<td>February</td>
<td>$80</td>
</tr>
</table>
</body>
</html>
I want to generate the HTML below using jsoup:
<tr>
<th>Month</th>
<th>Savings</th>
<th>Savings for holiday!</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
<td rowspan="2">$50</td>
</tr>
<tr>
<td>February</td>
<td>$80</td>
<td>$50</td>
</tr>
So far I have written this piece of code, which gets the rowspan cell and its td index:
final Elements rows = table.select("tr");
int rowspanCount=0;
String rowspanString ="";
for(Element row : rows){
int rowspanIndex = 0;
for(Element cell: row.select("td")){
rowspanIndex++;
if(cell.hasAttr("rowspan")){
rowspanCount = Integer.parseInt(cell.attr("rowspan"));
rowspanString = cell.ownText();
cell.removeAttr("rowspan");
}
}
}
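Possible hint: when the condition
cell.hasAttr("rowspan")
is true, get the index of the row among its siblings:
int index = row.elementSiblingIndex();
then get the next row with index+1:
Element eRow = rows.get(index+1);
and append a td element to that row, which is the row following the one that holds the rowspan cell. This assumes all rows share the same parent element; row.nextElementSibling() avoids the index lookup altogether.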
After some more coding, I found the solution. Here is the code:
for (Element row : rows) {
int cellIndex = -1;
if (row.select("td").hasAttr("rowspan")) {
for (Element cell : row.select("td")) {
cellIndex++;
if (cell.hasAttr("rowspan")) {
rowspanCount = Integer.parseInt(cell.attr("rowspan"));
cell.removeAttr("rowspan");
Element nextRow = row;
for (int i = rowspanCount; i > 1; i--) {
// Advance one row per iteration so rowspan values above 2 reach every spanned row.
nextRow = nextRow.nextElementSibling();
if (nextRow == null) break;
Element cellCopy = cell.clone();
// The spanned rows lack this cell, so insert the copy at the same column position.
if (cellIndex == 0) {
nextRow.prependChild(cellCopy);
} else {
nextRow.child(cellIndex - 1).after(cellCopy);
}
}
}
}
}
}
It copies the rowspan cell into every following row that it spans, and removes the rowspan attribute so that the cell is not duplicated a second time.
For this specific table, you can simply append the missing cell with this code:
Elements rows = table.select("tr > td[rowspan=2]");
for (Element row : rows) {
row.parent().nextElementSibling().append("<td>$50</td>");
}
