html table td contents parsing using jsoup in android - java

I have some HTML table contents, and for my application I want to parse them using jsoup in Android. But I am new to jsoup and can't parse the HTML contents properly.
HTML data:
<table id="box-table-a" summary="Tracking Result">
<thead>
<tr>
<th width="20%">AWB / Ref. No.</th>
<th width="30%">Status</th>
<th width="30%">Date Time</th>
<th width="20%">Location</th>
</tr>
</thead>
<tbody>
<tr>
<td width="20%" nowrap="nowrap" class="click">Z45681583</td>
<td width="30%" nowrap="nowrap" class="click">
IN TRANSIT<div id='ntfylink' style='display:block; text-decoration:blink'><a href='#' class='topopup' name='modal' style='text-decoration:none'><font face='Verdana' color='#DF0000'><blink>Notify Me</blink></font></a></div>
</td>
<td width="30%">
Sat, Jan, 31, 2015 07:09 PM
</td>
<td width="20%">DELHI</td>
</tr>
</tbody>
</table>
From this table I need the "td" contents.
Any help would be greatly appreciated.

Everything is explained in the comments of the source code below.
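import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.select.Elements;

private static String test(String htmlFile) {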
File input = null;
Document doc = null;
Elements tdEles = null;
Element table = null;
String tdContents = "";
try {
input = new File(htmlFile);
doc = Jsoup.parse(input, "ASCII", "");
doc.outputSettings().charset("ASCII");
doc.outputSettings().escapeMode(EscapeMode.base);
/** Get table with id = box-table-a **/
table = doc.getElementById("box-table-a");
if (table != null) {
/** Get td tag elements **/
tdEles = table.getElementsByTag("td");
/** Loop each of the td element and get the content by ownText() **/
if (tdEles != null && tdEles.size() > 0) {
for (Element e: tdEles) {
String ownText = e.ownText();
//Delimiter as "||"
if (ownText != null && ownText.length() > 0)
tdContents += ownText + "||";
}
if (tdContents.length() > 0) {
tdContents = tdContents.substring(0, tdContents.length() - 2);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
return tdContents;
}
You can then display the String in your TextView. All the td contents are delimited by ||; use String.split() to get each item separately if you need to:
String[] data = tdContents.split("\\|\\|");
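As a variation on the test() method above, here is a minimal sketch that parses the HTML from a String (for example, a page you have already downloaded) instead of a File, collects the td texts into a List, and joins them with String.join, so there is no trailing "||" to trim. The class name TrackingTable and the trimmed-down SAMPLE markup are illustrative assumptions; it needs jsoup 1.x on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TrackingTable {
    // Hypothetical, trimmed-down copy of the table from the question
    static final String SAMPLE = "<table id='box-table-a'><tbody><tr>"
            + "<td class='click'>Z45681583</td>"
            + "<td class='click'>IN TRANSIT<div id='ntfylink'><a href='#'>Notify Me</a></div></td>"
            + "<td>Sat, Jan, 31, 2015 07:09 PM</td>"
            + "<td>DELHI</td>"
            + "</tr></tbody></table>";

    /** Returns the non-empty own text of every td inside #box-table-a. */
    public static List<String> tdTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // ownText() ignores text inside child elements, so "Notify Me" in the div is excluded
        for (Element td : doc.select("#box-table-a td")) {
            String own = td.ownText();
            if (!own.isEmpty()) {
                texts.add(own);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // String.join puts the delimiter only between items
        System.out.println(String.join("||", tdTexts(SAMPLE)));
    }
}
```

Keeping the values in a List also saves the split("\\|\\|") round trip if you only need the individual cells.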

Related

Retrieve data from Html with JSoup

I'm trying to retrieve information from a website, but the problem is that the class names are identical.
This is the website structure.
<tr class="main_el">
<td class="key">KEY1</td>
<td class="val">VALUE1</td>
</tr>
<tr class="main_el">
<td class="key">KEY2</td>
<td class="val">VALUE2</td>
</tr>
...
<tr class="main_el">
<td class="key">KEY3</td>
<td class="val">VALUE3</td>
</tr>
I can't use .get(i).getElementsByClass() because the indexes are different for each page. Please help!
EDIT
I want to use KEY1 to retrieve VALUE1 only, independently of the other values.
Note: VALUE1 could be at index 1 or 9.
You can write a simple function like this:
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public Map<String, String> parseHtml(String inputHtml) {
Document htmlDoc = Jsoup.parse(inputHtml);
//Creating map to save td <key,value>
Map<String, String> textMap = new HashMap<>();
Elements trElements = htmlDoc.select("tr.main_el");
for (Element trElement : trElements) {
String key = null;
String value = null;
for (Element tdElement : trElement.children()) {
if (tdElement.hasClass("key"))
key = tdElement.text();
// the value cells use class "val", not "value"
if (tdElement.hasClass("val"))
value = tdElement.text();
}
if (key != null && value != null)
textMap.put(key, value);
}
return textMap;
}
Then you can retrieve each value from the map by its key.
Thanks.
Maybe this works:
select all <tr> elements
for each <tr>
    select the <td> with class "key" from the <tr>
    if the text of this element == "KEY1" then
        select the <td> with class "val" from the same <tr>
        do whatever you want with its value
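The row-by-row lookup described in the pseudocode could be written with jsoup roughly as follows, reading the value from the td with class "val" in the matching row. This is a sketch, not a tested production helper: the class name KeyValueLookup and the SAMPLE markup are assumptions, and selectFirst requires jsoup 1.11.1 or later. The sample rows are wrapped in a table because jsoup, like browsers, drops stray tr/td tags that appear outside a table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class KeyValueLookup {
    // Hypothetical markup mirroring the question's rows
    static final String SAMPLE = "<table>"
            + "<tr class='main_el'><td class='key'>KEY1</td><td class='val'>VALUE1</td></tr>"
            + "<tr class='main_el'><td class='key'>KEY2</td><td class='val'>VALUE2</td></tr>"
            + "<tr class='main_el'><td class='key'>KEY3</td><td class='val'>VALUE3</td></tr>"
            + "</table>";

    /** Returns the text of td.val in the row whose td.key equals key, or null if absent. */
    public static String findValue(String html, String key) {
        Document doc = Jsoup.parse(html);
        for (Element tr : doc.select("tr.main_el")) {
            Element keyTd = tr.selectFirst("td.key");
            // exact comparison, so KEY1 does not match KEY10
            if (keyTd != null && keyTd.text().equals(key)) {
                Element valTd = tr.selectFirst("td.val");
                return valTd == null ? null : valTd.text();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findValue(SAMPLE, "KEY2"));
    }
}
```

Because the lookup is by key text rather than by position, it does not matter whether VALUE1 sits at index 1 or 9.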

java find table using jsoup and equivalent xpath

Here is the HTML code:
<table class="textfont" cellspacing="0" cellpadding="0" width="100%" align="center" border="0">
<tbody>
<tr>
<td class="chl" width="20%">Batch ID</td><td class="ctext">d32654464bdb424396f6a91f2af29ecf</td>
</tr>
<tr>
<td class="chl" width="20%">ALM Server</td>
<td class="ctext"></td>
</tr>
<tr>
<td class="chl" width="20%">ALM Domain/Project</td>
<td class="ctext">EBUSINESS/STERLING</td>
</tr>
<tr>
<td class="chl" width="20%">TestSet URL</td>
<td class="ctext">almtestset://localhost</td>
</tr>
<tr>
<td class="chl" width="20%">Tests Executed</td>
<td class="ctext"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Start Time</td>
<td class="ctext">08/31/2017 12:20:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Finish Time</td>
<td class="ctext">08/31/2017 02:31:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Total Duration</td>
<td class="ctext"><b>2h 11m </b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Parameters</td>
<td class="ctext"><b>{"browser":"chrome","browser-version":"56","language":"english","country":"US"}</b></td>
</tr>
<tr>
<td class="chl" width="20%">Passed</td>
<td class="ctext" style="color:#269900"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Failed</td>
<td class="ctext" style="color:#990000"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Not Completed</td>
<td class="ctext" style="color: ##ff8000;"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Pass %</td>
<td class="ctext" style="color:#990000;font-size:14px"><b>0.0%</b></td>
</tr>
</tbody>
</table>
And here is the xpath to get the table:
//td[text() = 'TestSet URL']/ancestor::table[1]
How can I get this table using jSoup? I've tried:
tableElements = doc.select("td:contains('TestSet URL')");
to get the child element, but that doesn't work and returns null. I need to find the table and put all the children into a map. Any help would be greatly appreciated!
The following code will parse your table into a map. It relies on a few assumptions:
The XPath //td[text() = 'TestSet URL']/ancestor::table[1] selects the nearest table enclosing a td whose text is exactly "TestSet URL". The JSoup code in getTable() approximates it by returning the first table whose text contains "TestSet URL" anywhere in its body. This is looser (a substring match, and with nested tables it returns the outermost one) and a little brittle, but it may be sufficient for you.
The code below assumes that every row contains two cells, the first being the key and the second the value; since you want to parse the table content into a map, this assumption seems valid.
The code below throws exceptions if these assumptions are not met, i.e. if the given HTML does not contain a table with "TestSet URL" in its body, or if any row in that table does not contain exactly two cells.
If those assumptions are invalid, the internals of getTable and parseTable will change, but the general approach remains valid.
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public void parseTable(String html) {
Document doc = Jsoup.parse(html);
// declare a holder for the 'mapped rows'; this is a map based on the assumption that every row represents a discrete key:value pair
Map<String, String> asMap = new HashMap<>();
Element table = getTable(doc);
// now walk through the rows, adding one map entry for each
Elements rows = table.select("tr");
for (int i = 0; i < rows.size(); i++) {
Element row = rows.get(i);
Elements cols = row.select("td");
// expecting this table to consist of key:value pairs where the first cell is the key and the second cell is the value
if (cols.size() == 2) {
asMap.put(cols.get(0).text(), cols.get(1).text());
} else {
throw new RuntimeException(String.format("Cannot parse the table row: %s to a key:value pair because it contains %s cells!", row.text(), cols.size()));
}
}
System.out.println(asMap);
}
private Element getTable(Document doc) {
Elements tables = doc.select("table");
for (int i = 0; i < tables.size(); i++) {
// the xpath //td[text() = 'TestSet URL']/ancestor::table[1] finds the nearest table enclosing a td
// whose text is exactly "TestSet URL"
// this crude check approximates it: it returns the first table (in document order) whose text contains "TestSet URL"
if (tables.get(i).text().contains("TestSet URL")) {
return tables.get(i);
}
}
throw new RuntimeException("Cannot find a table element which contains 'TestSet URL'!");
}
For the HTML posted in your question, the above code will output:
{Finish Time=08/31/2017 02:31:46 PM, Passed=0, Test Parameters={"browser":"chrome","browser-version":"56","language":"english","country":"US"}, TestSet URL=almtestset://localhost, Failed=6, Test Pass %=0.0%, Not Completed=0, Start Time=08/31/2017 12:20:46 PM, Total Duration=2h 11m, Tests Executed=6, ALM Domain/Project=EBUSINESS/STERLING, Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=}
You have to remove those quotation marks to get the cell with that text; just use
tableElements = doc.select("td:contains(TestSet URL)");
but note that this only selects the td elements which contain the text "TestSet URL". To select the whole table, use
Element table = doc.select("table.textfont").first();
This selects the tables with class="textfont"; since several tables could share that class value, first() picks the first of them.
To get all the tr elements:
Elements tableRows = doc.select("table.textfont tr");
for(Element e: tableRows)
System.out.println(e);
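For a somewhat closer jsoup counterpart to the XPath than a substring match over the whole table, one option is to find the td whose own text equals "TestSet URL" and walk up its ancestors to the nearest table, then read the two-cell rows into a LinkedHashMap so the row order is kept. This is a sketch under assumptions: TableByCellText and SAMPLE are hypothetical names, and ownText() normalizes whitespace, so it is an approximation of text() = '...' rather than an exact equivalent.

```java
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.LinkedHashMap;
import java.util.Map;

public class TableByCellText {
    // Hypothetical sample: an unrelated table first, then a cut-down copy of the question's table
    static final String SAMPLE = "<table class='other'><tr><td>unrelated</td></tr></table>"
            + "<table class='textfont'><tbody>"
            + "<tr><td class='chl'>Batch ID</td><td class='ctext'>d32654464bdb424396f6a91f2af29ecf</td></tr>"
            + "<tr><td class='chl'>TestSet URL</td><td class='ctext'>almtestset://localhost</td></tr>"
            + "<tr><td class='chl'>Tests Executed</td><td class='ctext'><b>6</b></td></tr>"
            + "</tbody></table>";

    /** Approximates //td[text() = 'cellText']/ancestor::table[1]; returns null if no match. */
    public static Element findTable(Document doc, String cellText) {
        for (Element td : doc.select("td")) {
            if (td.ownText().equals(cellText)) {
                // parents() lists ancestors from the direct parent upward,
                // so the first table found is the nearest one, like ancestor::table[1]
                for (Element ancestor : td.parents()) {
                    if (ancestor.tagName().equals("table")) {
                        return ancestor;
                    }
                }
            }
        }
        return null;
    }

    /** Maps the first cell of each two-cell row to the second cell, in row order. */
    public static Map<String, String> toMap(Element table) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Element row : table.select("tr")) {
            Elements cells = row.select("td");
            if (cells.size() == 2) {
                map.put(cells.get(0).text(), cells.get(1).text());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Element table = findTable(org.jsoup.Jsoup.parse(SAMPLE), "TestSet URL");
        if (table != null) {
            System.out.println(toMap(table));
        }
    }
}
```

Unlike the class-based selector, this does not depend on the table having class="textfont"; it only relies on the label cell's text.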

How to target specific td in a HTML document with java

I want to target specific td inside a tr.
This is my code:
private void fletch(String name) throws IOException, JSONException {
final String iron = "img=2";
final String ui = "img=3";
final String hc = "img=10";
String url = "http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=";
if ( name.toLowerCase().indexOf(iron.toLowerCase()) != -1 ) {
url = "http://services.runescape.com/m=hiscore_oldschool_ironman/hiscorepersonal.ws?user1=";
}else if( name.toLowerCase().indexOf(ui.toLowerCase()) != -1 ){
url = "http://services.runescape.com/m=hiscore_oldschool_ultimate/hiscorepersonal.ws?user1=";
}else if( name.toLowerCase().indexOf(hc.toLowerCase()) != -1 ){
url = "http://services.runescape.com/m=hiscore_oldschool_hardcore_ironman/hiscorepersonal.ws?user1=";
}
String[] parts = name.split(">");
String part2 = parts[1];
String fin = part2.replaceAll("\\s","+");
url+=fin;
Document doc = Jsoup.connect(url)
.data("query", "Java")
.userAgent("Mozilla")
.cookie("auth", "token")
.timeout(3000)
.post();
//core part
Element table1 = doc.select("table").first();
String body = table1.toString();
Document docb = Jsoup.parseBodyFragment(body);
Element bbd = docb.body();
String hhk = bbd.toString();
//This is where I don't know how to target the td data. I tried this (I can't check the code, so I came here):
String overall = bbd.getElementsByTag("td").get(4).text();
Now this gives me this HTML code:
<table cellpadding="3" cellspacing="0" border=0 style="max-width: 355px;">
<tr><td colspan="5" align="center"><b>Personal scores for big kurwaaa</b></td></tr>
<tr>
<td colspan="2" style="text-align:left;padding-left:24px;"><b>Skill</b></td><td align="right"><b>Rank</b></td><td align="right"><b>Level</b></td><td align="right"><b>XP</b></td>
</tr>
<tr><td width="35"></td><td width="100"></td><td width="75"></td><td width="40"></td><td width="75"></td></tr>
<tr>
<td></td>
<td align="left"><a href="overall.ws?table=0&user=big+kurwaaa">
Overall
</a></td>
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>
</tr>
<tr>
<td align="right"><img class="miniimg" src="http://www.runescape.com/img/rsp777/hiscores/skill_icon_attack1.gif"></td>
<td align="left"><a href="overall.ws?table=1&user=big+kurwaaa">
Attack
</a></td>
<td align="right">14,475</td>
<td align="right">19</td>
<td align="right">4,304</td>
</tr>
I want to target the three td cells with data inside every tr. For example:
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>
and so on, from the "Overall" tr to the last one. Is there a simple way to loop through the data and build a JSON object or a map?
P.S. I'm new to Java.
If you want to get all the tr tags inside bbd, use getElementsByTag("tr").
It returns an Elements list through which you can access every tr tag by its (0-based) index. To skip the first 3 tr tags, just start the loop at index 3; do the same for the td tags, starting at index 2 to skip the icon and skill-name cells.
Here is the demo code:
Elements trList = bbd.getElementsByTag("tr");
for (int i = 3; i < trList.size(); i++) {
System.out.println("----------------- TR START -----------------");
Elements tdList = trList.get(i).getElementsByTag("td");
for (int j = 2; j < tdList.size(); j++) {
System.out.println(tdList.get(j));
}
System.out.println("------------------ TR END ------------------");
}
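Alternatively, select the right-aligned cells with an attribute selector and step through them with an iterator:
String url = "yourUrl"; // placeholder URL
Document doc = Jsoup.connect(url).get();
// "tableClass" is a placeholder; the table in the question has no class attribute, so doc.select("table").first() would be needed there
Element table = doc.select("table[class=tableClass]").first();
Iterator<Element> iterator = table.select("td[align=right]").iterator();
iterator.next();//skip first
iterator.next();//skip second
// note: the header cells (Rank, Level, XP) are also align="right", so on the question's table the third match is the "XP" header
System.out.println(iterator.next().text());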
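To get the loop-and-map result the question asks for, one possible sketch keeps only the data rows (five td cells with a skill link in the second cell) and maps each skill name to its rank, level and XP. The row-shape test is an assumption based on the HTML shown in the question; HiscoreTable and SAMPLE are hypothetical names, and selectFirst requires jsoup 1.11.1 or later.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiscoreTable {
    // Hypothetical, cut-down copy of the table from the question
    static final String SAMPLE = "<table>"
            + "<tr><td colspan='5' align='center'><b>Personal scores</b></td></tr>"
            + "<tr><td colspan='2'><b>Skill</b></td><td align='right'><b>Rank</b></td>"
            + "<td align='right'><b>Level</b></td><td align='right'><b>XP</b></td></tr>"
            + "<tr><td width='35'></td><td width='100'></td><td width='75'></td><td width='40'></td><td width='75'></td></tr>"
            + "<tr><td></td><td align='left'><a href='overall.ws?table=0'>Overall</a></td>"
            + "<td align='right'>7,430</td><td align='right'>466</td><td align='right'>6,164,312</td></tr>"
            + "<tr><td align='right'><img src='skill_icon_attack1.gif'></td><td align='left'><a href='overall.ws?table=1'>Attack</a></td>"
            + "<td align='right'>14,475</td><td align='right'>19</td><td align='right'>4,304</td></tr>"
            + "</table>";

    /** Maps each skill name to [rank, level, xp], keeping table order. */
    public static Map<String, List<String>> parseScores(String html) {
        Map<String, List<String>> scores = new LinkedHashMap<>();
        Element table = Jsoup.parse(html).selectFirst("table");
        if (table == null) {
            return scores;
        }
        for (Element tr : table.select("tr")) {
            Elements tds = tr.select("td");
            // Data rows have five cells and a link (the skill name) in the second cell;
            // this skips the title row, the header row (four cells) and the empty spacer row
            if (tds.size() == 5 && tds.get(1).selectFirst("a") != null) {
                scores.put(tds.get(1).text(),
                        Arrays.asList(tds.get(2).text(), tds.get(3).text(), tds.get(4).text()));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(parseScores(SAMPLE));
    }
}
```

Since the question already uses org.json (JSONException), the resulting map should be convertible with new JSONObject(map) if that library is on the classpath.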

java.sql.SQLException: Column Index out of range, 8 > 6

When I try to retrieve the whole record from the database and display it, it shows an error:
java.sql.SQLException: Column Index out of range, 8 > 6.
I am not able to figure this out. Please help.
JAVA CODE
public ArrayList viewAllDrivers() {
ArrayList allDrivers=new ArrayList();
try {
String sql= "select * from adddriver ORDER BY dname";
rs =DBConnection.executeQuery(sql);
while(rs.next()) {
ArrayList one = new ArrayList();
one.add(rs.getInt(1));
one.add(rs.getString(2));
one.add(rs.getString(6));
one.add(rs.getString(8));
one.add(rs.getString(9));
one.add(rs.getString(10));
one.add(rs.getInt(11));
allDrivers.add(one);
}
}
catch (Exception ex) {
System.out.println (ex);
}
return allDrivers;
}
PAGE WHERE I AM TRYING TO SHOW THE RESULT
<%
SearchDAO searchDAO = new SearchDAO();
ArrayList all = searchDAO.viewAllDrivers();
int size = all.size();
%>
<table width="95%" align="center" style="border:#D22929 solid 2px;padding:10px;" border="0">
<tr>
<th bgcolor="#D22929" scope="col"><span class="style10">Driver Name </span></th>
<th bgcolor="#D22929" scope="col"><span class="style10">Address</span></th>
<th bgcolor="#D22929" scope="col"><span class="style10">City</span></th>
<th bgcolor="#D22929" scope="col"><span class="style10">Contact</span></th>
<th bgcolor="#D22929" scope="col"><span class="style10">Country </span></th>
<th bgcolor="#D22929" scope="col"><span class="style10">Ation</span></th>
</tr>
<%
for(int i=0;i<size;i++){
ArrayList one=(ArrayList)all.get(i);
%>
<tr style="height:30px; padding:4px;">
<td><div align="center"><%=(String)one.get(1)%></div></td>
<td><div align="center"><%=(String)one.get(2)%></div></td>
<td><div align="center"><%=(String)one.get(3)%></div></td>
<td><div align="center"><%=(String)one.get(4)%> </div></td>
<td><div align="center"><%=(String)one.get(5)%> </div></td>
</tr>
<% } %>
This error implies that your adddriver table (more precisely, the result of select *) has only 6 columns, so 8 is an invalid column index.
This means all of these statements have invalid indices:
one.add(rs.getString(8));
one.add(rs.getString(9));
one.add(rs.getString(10));
one.add(rs.getInt(11));
Perhaps your DB table doesn't contain what you think it does.
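To see what select * actually returns, a small diagnostic sketch can read the column labels from ResultSetMetaData and fail fast with a descriptive message when a requested index is out of range. ColumnCheck and its methods are hypothetical helpers, not part of JDBC; only getMetaData(), getColumnCount() and getColumnLabel() are standard JDBC calls.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ColumnCheck {
    /** Returns the column labels of a result set in 1-based index order. */
    public static List<String> columnLabels(ResultSetMetaData md) throws SQLException {
        List<String> labels = new ArrayList<>();
        // JDBC column indexes start at 1 and end at getColumnCount()
        for (int i = 1; i <= md.getColumnCount(); i++) {
            labels.add(md.getColumnLabel(i));
        }
        return labels;
    }

    /** Throws a descriptive SQLException if any requested index is outside 1..columnCount. */
    public static void requireIndices(ResultSetMetaData md, int... indices) throws SQLException {
        int count = md.getColumnCount();
        for (int idx : indices) {
            if (idx < 1 || idx > count) {
                throw new SQLException("Column index " + idx + " is out of range; the result set has "
                        + count + " columns: " + columnLabels(md));
            }
        }
    }
}
```

For example, calling System.out.println(ColumnCheck.columnLabels(rs.getMetaData())) right after executeQuery would list the six columns the table really has, and ColumnCheck.requireIndices(rs.getMetaData(), 1, 2, 6, 8, 9, 10, 11) would report which of the indices used in viewAllDrivers() do not exist.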
It is always better to explicitly name the columns you are retrieving, so you don't face this kind of problem.
public ArrayList viewAllDrivers() {
ArrayList allDrivers=new ArrayList();
try {
String sql= "select city,address,... from adddriver ORDER BY dname";
rs =DBConnection.executeQuery(sql);
while(rs.next()) {
ArrayList one = new ArrayList();
// read each column by the name used in the SELECT list instead of by position
one.add(rs.getString("city"));
one.add(rs.getString("address"));
// ... one getter per remaining column named in the SELECT list
allDrivers.add(one);
}
}
catch (Exception ex) {
System.out.println (ex);
}
return allDrivers;
}
Your code requests column indices beyond the result set's column count; that is why you are getting this exception.

Parsing values from complex table using JSoup

I have a table with the following html:
<TABLE class=data-table cellSpacing=0 cellPadding=0>
<TBODY>
<TR>
<TD colSpan=4><A id=accounting name=accounting></A>
<H3>Accounting</H3></TD></TR>
<TR>
<TH class=data-tablehd align=left>FORM NO.</TH>
<TH class=data-tablehd align=left>TITLE</TH>
<TH class=data-tablehd align=right>Microsoft</TH>
<TH class=data-tablehd align=right>Acrobat</TH></TR>
<TR>
<TD><A id=1008ft name=1008ft>SF 1008-FT</A></TD>
<TD>Work for Others Funding Transfer Between Projects for an Agreement</TD>
<TD align=right><A
href="https://someurl1"
target=top>MS Word</A></TD>
<TD align=right><A
href="https://someurl2"
target=top>PDF </A></TD></TR>
...
I need to parse the <TR> data to get something like
SF 1008-FT, Work for Others ... an Agreement, https://someurl1, https://someurl2
I have tried using the following code:
URL formURL = new URL("http://urlToParse");
Document doc = Jsoup.parse(formURL, 3000);
Element table = doc.select("TABLE[class = data-table]").first();
Iterator<Element> ite = table.select("td[colSpan=4]").iterator();
while (ite.hasNext()) {
System.out.println(ite.next().text());
}
However this only returns the "back to Top" and some different headings located throughout the table.
Can someone help me write the correct JSoup code to parse the information I need?
I don't have time to test this, but you can use something like this:
Element table = doc.select("TABLE[class = data-table]").first();
Elements rows = table.select("tr");
for (Element td: rows.get(2).children()) {
System.out.println(td.text());
}
You get the children of the 3rd row of the table.
I found the solution with some small modification to a similar thread. The code that provides the solution is given below:
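for (Element table : doc.select("table")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
// skip heading rows (a single colSpan=4 cell) and header rows (th cells only), which would otherwise throw IndexOutOfBoundsException
if (tds.size() < 4) continue;
String formNumber = tds.get(0).text();
String title = tds.get(1).text();
String link1 = tds.get(2).select("a[href]").attr("href");
String link2 = tds.get(3).select("a[href]").attr("href");
System.out.println(formNumber + ", " + title + ", " + link1 + ", " + link2);
}
}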
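The row loop above can also be packaged as a reusable method that returns one list per form, which is easier to test than printing. This is a minimal sketch assuming the table keeps the layout shown in the question (four td cells per data row); FormsTable and SAMPLE are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormsTable {
    // Hypothetical cut-down copy of the question's markup (uppercase tags, unquoted attributes kept)
    static final String SAMPLE = "<TABLE class=data-table cellSpacing=0 cellPadding=0><TBODY>"
            + "<TR><TD colSpan=4><A id=accounting name=accounting></A><H3>Accounting</H3></TD></TR>"
            + "<TR><TH class=data-tablehd>FORM NO.</TH><TH class=data-tablehd>TITLE</TH>"
            + "<TH class=data-tablehd>Microsoft</TH><TH class=data-tablehd>Acrobat</TH></TR>"
            + "<TR><TD><A id=1008ft name=1008ft>SF 1008-FT</A></TD>"
            + "<TD>Work for Others Funding Transfer Between Projects for an Agreement</TD>"
            + "<TD align=right><A href=\"https://someurl1\" target=top>MS Word</A></TD>"
            + "<TD align=right><A href=\"https://someurl2\" target=top>PDF </A></TD></TR>"
            + "</TBODY></TABLE>";

    /** Returns [form number, title, Word link, PDF link] for every data row. */
    public static List<List<String>> parseForms(String html) {
        List<List<String>> forms = new ArrayList<>();
        // jsoup lower-cases tag names, so lowercase selectors match the uppercase markup
        for (Element row : Jsoup.parse(html).select("table.data-table tr")) {
            Elements tds = row.select("td");
            // section headings (one colSpan=4 cell) and header rows (th only) have fewer than 4 td
            if (tds.size() < 4) {
                continue;
            }
            forms.add(Arrays.asList(
                    tds.get(0).text(),
                    tds.get(1).text(),
                    tds.get(2).select("a[href]").attr("href"),
                    tds.get(3).select("a[href]").attr("href")));
        }
        return forms;
    }

    public static void main(String[] args) {
        for (List<String> form : parseForms(SAMPLE)) {
            System.out.println(String.join(", ", form));
        }
    }
}
```

If the real page uses relative links, attr("abs:href") resolves them against the document's base URI, which jsoup sets when the page is loaded with Jsoup.parse(URL, int) or Jsoup.connect(...).get().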
