I'm trying to retrieve information from a website, but the problem is that the class names are identical.
This is the website structure:
<tr class="main_el">
<td class="key">KEY1</td>
<td class="val">VALUE1</td>
</tr>
<tr class="main_el">
<td class="key">KEY2</td>
<td class="val">VALUE2</td>
</tr>
...
<tr class="main_el">
<td class="key">KEY3</td>
<td class="val">VALUE3</td>
</tr>
I can't use .get(i).getElementsByClass() because the indexes are different on each page. Please help!
EDIT
I want to use KEY1 to retrieve VALUE1 only, independently of the other values.
Note that VALUE1 could be at index 1 or 9.
You can write a simple function like this:
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public Map<String, String> parseHtml(String inputHtml) {
    Document htmlDoc = Jsoup.parse(inputHtml);
    // Map that stores each row's <key, value> pair
    Map<String, String> textMap = new HashMap<>();
    // An empty selection simply skips the loop, so no size check is needed
    for (Element trElement : htmlDoc.select("tr.main_el")) {
        String key = null;
        String value = null;
        for (Element tdElement : trElement.children()) {
            if (tdElement.hasClass("key"))
                key = tdElement.text();
            // The value cells in the question's HTML use class "val"
            if (tdElement.hasClass("val"))
                value = tdElement.text();
        }
        if (key != null && value != null)
            textMap.put(key, value);
    }
    return textMap;
}
Then you can look up values in the map using the keys from your HTML.
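As a rough illustration of that lookup step, here is a minimal sketch. It assumes the map returned by parseHtml holds the pairs from the question's sample HTML; the class name LookupDemo, the helper names, and the hard-coded sample map are hypothetical stand-ins, not part of the answer's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LookupDemo {
    // Hypothetical stand-in for the map that parseHtml(...) would return
    // for the sample HTML in the question.
    static Map<String, String> sampleMap() {
        Map<String, String> textMap = new LinkedHashMap<>();
        textMap.put("KEY1", "VALUE1");
        textMap.put("KEY2", "VALUE2");
        textMap.put("KEY3", "VALUE3");
        return textMap;
    }

    // Returns the value stored for the key, or a fallback when the page
    // did not contain a row with that key.
    static String lookup(Map<String, String> textMap, String key) {
        return textMap.getOrDefault(key, "(not found)");
    }

    public static void main(String[] args) {
        Map<String, String> textMap = sampleMap();
        System.out.println(lookup(textMap, "KEY1"));
        System.out.println(lookup(textMap, "KEY9"));
    }
}
```

Because the lookup goes by key, it does not matter whether the KEY1 row is the first or the ninth row on a given page.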
Thanks.
Maybe this works:
select all <tr> elements
for each <tr>
    select the <td> with class "key" from the <tr>
    if the text of this element == "KEY1" then
        select the <td> with class "val" from the same <tr>
        do whatever you want with this value
I have a map that I would like to render into a table in Thymeleaf. Each key is a document title and each value is an integer displaying the number of times a keyword appears in that document.
HTML:
<table class="table table-striped table-bordered table-hover">
<tr>
<td>Document Title</td>
<td>Keyword Count</td>
</tr>
<tr th:each="m : ${map}">
<td th:text="${m} + '(' + ${map.get(key)} + ') '"></td>
<td th:text="${m} + '(' + ${map.get(value)} + ') '"></td>
</tr>
</table>
Map:
Map<String, Integer> map = combineListsIntoOrderedMap(formattedLinks, countList);
private static Map<String, Integer> combineListsIntoOrderedMap(List<String> keys, List<Integer> values) {
if (keys.size() != values.size())
throw new IllegalArgumentException("Cannot combine lists with dissimilar sizes.");
Map<String, Integer> map = new LinkedHashMap<>();
for (int i = 0; i < keys.size(); i++) {
map.put(keys.get(i), values.get(i));
}
return map;
}
My output is a table that looks like this:
Document Title | Keyword Count
http://www.example.com/doc.doc=1(null) | http://www.example.com/doc.doc=1(null)
It should look like:
Document Title | Keyword Count
http://www.example.com/doc.doc | 1
You can use key and value like this:
<tr th:each="m : ${map}">
<td th:text="${m.key}">key</td>
<td th:text="${m.value}">value</td>
</tr>
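The reason the question's template printed http://www.example.com/doc.doc=1(null) is that each m produced by th:each over a map is a map entry, not a key. The same behaviour can be seen in plain Java. This is only a sketch: the class name EntryDemo, the row helper, and the sample data are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntryDemo {
    // Formats one entry as two columns, reading the key and value separately.
    static String row(Map.Entry<String, Integer> m) {
        return m.getKey() + " | " + m.getValue();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("http://www.example.com/doc.doc", 1);
        for (Map.Entry<String, Integer> m : map.entrySet()) {
            System.out.println(m);       // whole entry, rendered as key=value
            System.out.println(row(m));  // key and value as separate columns
        }
    }
}
```

Printing the entry itself gives the key=value form seen in the broken output, while getKey() and getValue() correspond to what ${m.key} and ${m.value} read in the template.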
I want to target specific td elements inside a tr.
This is my code:
private void fletch(String name) throws IOException, JSONException {
final String iron = "img=2";
final String ui = "img=3";
final String hc = "img=10";
String url = "services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=";
if ( name.toLowerCase().indexOf(iron.toLowerCase()) != -1 ) {
url = "http://services.runescape.com/m=hiscore_oldschool_ironman/hiscorepersonal.ws?user1=";
}else if( name.toLowerCase().indexOf(ui.toLowerCase()) != -1 ){
url = "http://services.runescape.com/m=hiscore_oldschool_ultimate/hiscorepersonal.ws?user1=";
}else if( name.toLowerCase().indexOf(hc.toLowerCase()) != -1 ){
url = "http://services.runescape.com/m=hiscore_oldschool_hardcore_ironman/hiscorepersonal.ws?user1=";
}
String[] parts = name.split(">");
String part2 = parts[1];
String fin = part2.replaceAll("\\s","+");
url+=fin;
Document doc = Jsoup.connect(url)
.data("query", "Java")
.userAgent("Mozilla")
.cookie("auth", "token")
.timeout(3000)
.post();
//core part
Element table1 = doc.select("table").first();
String body = table1.toString();
Document docb = Jsoup.parseBodyFragment(body);
Element bbd = docb.body();
String hhk = bbd.toString();
//This is where I don't know how to target the td data. I tried this (I can't check the code, so I came here):
String overall = bbd.getElementsByTag("td").get(4).text();
Now this gives me this HTML code:
<table cellpadding="3" cellspacing="0" border=0 style="max-width: 355px;">
<tr><td colspan="5" align="center"><b>Personal scores for big kurwaaa</b></td></tr>
<tr>
<td colspan="2" style="text-align:left;padding-left:24px;"><b>Skill</b></td><td align="right"><b>Rank</b></td><td align="right"><b>Level</b></td><td align="right"><b>XP</b></td>
</tr>
<tr><td width="35"></td><td width="100"></td><td width="75"></td><td width="40"></td><td width="75"></td></tr>
<tr>
<td></td>
<td align="left"><a href="overall.ws?table=0&user=big+kurwaaa">
Overall
</a></td>
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>
</tr>
<tr>
<td align="right"><img class="miniimg" src="http://www.runescape.com/img/rsp777/hiscores/skill_icon_attack1.gif"></td>
<td align="left"><a href="overall.ws?table=1&user=big+kurwaaa">
Attack
</a></td>
<td align="right">14,475</td>
<td align="right">19</td>
<td align="right">4,304</td>
</tr>
I want to target the three td elements that contain data inside every tr. For example:
<td align="right">7,430</td>
<td align="right">466</td>
<td align="right">6,164,312</td>
and so on, from the "Overall" tr to the last one. Is there a simple way to do this that lets me loop through the data and create a JSON object or a map?
PS: I'm new to Java.
If you want to get all the tr tags inside bbd, use getElementsByTag.
It returns an Elements collection, which lets you access each tr tag by its 0-based index. To skip the first three tr tags, start the loop at index 3; the same applies to the td tags.
Here is the demo code:
Elements trList = bbd.getElementsByTag("tr");
for (int i = 3; i < trList.size(); i++) {
    System.out.println("----------------- TR START -----------------");
    Elements tdList = trList.get(i).getElementsByTag("td");
    for (int j = 2; j < tdList.size(); j++) {
        System.out.println(tdList.get(j));
    }
    System.out.println("------------------ TR END ------------------");
}
String url = "yourUrl";
Document doc = Jsoup.connect(url).get();
Element table = doc.select("table[class=tableClass]").first();
Iterator<Element> iterator = table.select("td[align=right]").iterator();
iterator.next();//skip first
iterator.next();//skip second
System.out.println(iterator.next().text());
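To get from cell text such as "7,430" to the map the question asks for, something like the following sketch could work. It uses only the standard library and assumes the text of each row has already been collected into a list of four strings (skill, rank, level, XP), for example with text() on each td. The class name HiscoreDemo and both helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HiscoreDemo {
    // Parses a number that uses commas as thousands separators, e.g. "6,164,312".
    static long parseGrouped(String text) {
        return Long.parseLong(text.trim().replace(",", ""));
    }

    // Each row is assumed to hold four strings: skill name, rank, level, XP.
    static Map<String, long[]> toMap(List<List<String>> rows) {
        Map<String, long[]> scores = new LinkedHashMap<>();
        for (List<String> row : rows) {
            if (row.size() < 4) continue; // skip rows without all four columns
            scores.put(row.get(0).trim(), new long[] {
                parseGrouped(row.get(1)),
                parseGrouped(row.get(2)),
                parseGrouped(row.get(3))
            });
        }
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("Overall", "7,430", "466", "6,164,312"),
            Arrays.asList("Attack", "14,475", "19", "4,304"));
        Map<String, long[]> scores = toMap(rows);
        System.out.println(scores.get("Overall")[2]); // XP for Overall
    }
}
```

A LinkedHashMap keeps the skills in page order. Note that Long.parseLong throws NumberFormatException for any cell that is not a plain number, so real pages may need extra handling.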
I have some HTML table contents. For my application I want to parse this HTML using Jsoup on Android, but I am new to Jsoup and can't parse the contents properly.
HTML data:
<table id="box-table-a" summary="Tracking Result">
<thead>
<tr>
<th width="20%">AWB / Ref. No.</th>
<th width="30%">Status</th>
<th width="30%">Date Time</th>
<th width="20%">Location</th>
</tr>
</thead>
<tbody>
<tr>
<td width="20%" nowrap="nowrap" class="click">Z45681583</td>
<td width="30%" nowrap="nowrap" class="click">
IN TRANSIT<div id='ntfylink' style='display:block; text-decoration:blink'><a href='#' class='topopup' name='modal' style='text-decoration:none'><font face='Verdana' color='#DF0000'><blink>Notify Me</blink></font></a></div>
</td>
<td width="30%">
Sat, Jan, 31, 2015 07:09 PM
</td>
<td width="20%">DELHI</td>
</tr>
</tbody>
</table>
From this table I need the "td" contents.
Any help would be greatly appreciated.
Everything is described in the comments of the source code below.
private static String test(String htmlFile) {
    File input = null;
    Document doc = null;
    Elements tdEles = null;
    Element table = null;
    String tdContents = "";
    try {
        input = new File(htmlFile);
        doc = Jsoup.parse(input, "ASCII", "");
        doc.outputSettings().charset("ASCII");
        doc.outputSettings().escapeMode(EscapeMode.base);
        /** Get table with id = box-table-a **/
        table = doc.getElementById("box-table-a");
        if (table != null) {
            /** Get td tag elements **/
            tdEles = table.getElementsByTag("td");
            /** Loop each of the td element and get the content by ownText() **/
            if (tdEles != null && tdEles.size() > 0) {
                for (Element e: tdEles) {
                    String ownText = e.ownText();
                    //Delimiter as "||"
                    if (ownText != null && ownText.length() > 0)
                        tdContents += ownText + "||";
                }
                if (tdContents.length() > 0) {
                    tdContents = tdContents.substring(0, tdContents.length() - 2);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return tdContents;
}
You can manipulate the String in your TextView. All the td contents are delimited by ||. Use String.split() to get each piece if you want.
String[] data = tdContents.split("\\|\\|");
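A small sketch of that round trip, using only the standard library. The class name SplitDemo and the sample values are illustrative. String.join is used here in place of the manual += and substring trimming; that is an alternative, not what the answer's code does.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Joins cell texts with the "||" delimiter, without a trailing delimiter.
    static String join(List<String> cells) {
        return String.join("||", cells);
    }

    // Splits on the literal "||"; each pipe is escaped because split() takes a regex.
    static String[] split(String tdContents) {
        return tdContents.split("\\|\\|");
    }

    public static void main(String[] args) {
        String tdContents = join(Arrays.asList("Z45681583", "IN TRANSIT", "DELHI"));
        System.out.println(tdContents);
        System.out.println(Arrays.toString(split(tdContents)));
    }
}
```

The escaping matters: an unescaped | is the regex alternation operator, so "||" as a pattern would not match the two-character delimiter.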
I have a HashMap created in the following manner:
HashMap products = new HashMap<String, String[]>();
products.put("001",new String[] {"SAM", "100"});
Now I need to print the content of the map in a table. I know how to print it when it is created without the nested array, as shown in the code snippet below.
<%
for (Map.Entry<String, String> entry : orderList.entrySet()) {
%>
<tr>
<td><%=counter++%></td>
<td><%=entry.getKey()%></td>
<td><%=entry.getValue()%></td>
</tr>
How can I print the content in the HashMap with the nested Array?
Solution I tried,
<%
for (Map.Entry<String, String[]> entry : Order.entrySet()) {
%>
<tr>
<td><%=counter++%></td>
<td><%=entry.getKey()%></td>
<td></td>
<td></td>
</tr>
<%
You need one more nested loop:
<% for (String arrayElement : entry.getValue()) { %>
<%=arrayElement%>
<% } %>
Alternatively, you can define a nested table in the <td> instead of the simple <td><%=entry.getValue()%></td>.
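The nested-loop idea can also be tried outside a JSP. The following plain-Java sketch builds the same rows as a string; the class name NestedRowsDemo and the renderRows helper are hypothetical, and HTML escaping of the values is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NestedRowsDemo {
    // Builds one <tr> per map entry and one <td> per array element.
    static String renderRows(Map<String, String[]> products) {
        StringBuilder html = new StringBuilder();
        int counter = 1;
        for (Map.Entry<String, String[]> entry : products.entrySet()) {
            html.append("<tr><td>").append(counter++).append("</td>");
            html.append("<td>").append(entry.getKey()).append("</td>");
            for (String arrayElement : entry.getValue()) { // the extra nested loop
                html.append("<td>").append(arrayElement).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> products = new LinkedHashMap<>();
        products.put("001", new String[] {"SAM", "100"});
        System.out.print(renderRows(products));
    }
}
```

The outer loop walks the map entries and the inner loop walks each String[] value, which is the same structure the scriptlet needs.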
Why can't we use JSTL here? For example:
<c:forEach var="entry" items="${products}">
Key: <c:out value="${entry.key}"/>
Value: <c:forEach var="arrayVar" items="${entry.value}">
<li>${arrayVar}</li>
</c:forEach>
</c:forEach>
I managed to get a solution thanks to the answers from StanislavL and Shaded. The complete solution is shown below.
The HashMap is defined as follows:
static HashMap<String,String[]> products = new HashMap<String, String[]>();
products.put("001",new String[] {"Samsung", "USD. 500 ", "5 Units" });
This HashMap can be printed in a table using a JSP, as shown below.
<% int counter = 1; %>
<table class="styledLeft" id="moduleTable">
<thead>
<tr>
<th width="10%">No</th>
<th width="10%">Model No</th>
<th width="30%">Model/Make</th>
<th width="30%">Price</th>
<th width="20%">Available Quantity</th>
</tr>
</thead>
<tbody>
<%
for (Map.Entry<String, String[]> entry : products.entrySet()) {
%>
<tr>
<td><%=counter++%></td>
<td><%=entry.getKey()%></td>
<%for (String arrayElement: entry.getValue()) {%>
<td><%=arrayElement%></td>
<%
}
%>
</tr>
<%
}
%>
</tbody>
</table>
The Final output will look like this,
I have a table with the following html:
<TABLE class=data-table cellSpacing=0 cellPadding=0>
<TBODY>
<TR>
<TD colSpan=4><A id=accounting name=accounting></A>
<H3>Accounting</H3></TD></TR>
<TR>
<TH class=data-tablehd align=left>FORM NO.</TH>
<TH class=data-tablehd align=left>TITLE</TH>
<TH class=data-tablehd align=right>Microsoft</TH>
<TH class=data-tablehd align=right>Acrobat</TH></TR>
<TR>
<TD><A id=1008ft name=1008ft>SF 1008-FT</A></TD>
<TD>Work for Others Funding Transfer Between Projects for an Agreement</TD>
<TD align=right><A
href="https://someurl1"
target=top>MS Word</A></TD>
<TD align=right><A
href="https://someurl2"
target=top>PDF </A></TD></TR>
...
I need to parse the <TR> data to get something like:
SF 1008-FT, Work for Others ... an Agreement, https://someurl1, https://someurl2
I have tried using the following code:
URL formURL = new URL("http://urlToParse");
Document doc = Jsoup.parse(formURL, 3000);
Element table = doc.select("TABLE[class = data-table]").first();
Iterator<Element> ite = table.select("td[colSpan=4]").iterator();
while(ite.next() != null) {
System.out.println(ite.next().text());
}
However, this only returns "back to Top" and some of the headings located throughout the table.
Can someone help me write the correct JSoup code to parse the information I need?
I don't have time to test this, but you can use something like the following:
Element table = doc.select("TABLE[class = data-table]").first();
Elements rows = table.select("tr");
for (Element td: rows.get(2).children()) {
    System.out.println(td.text());
}
This gets the children of the third row of the table.
I found the solution with some small modification to a similar thread. The code that provides the solution is given below:
for (Element table : doc.select("table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        // Skip section-heading and header rows, which have fewer than four <td> cells
        if (tds.size() < 4) continue;
        formNumber = tds.get(0).text();
        title = tds.get(1).text();
        link1 = tds.get(2).select("a[href]").attr("href");
        link2 = tds.get(3).select("a[href]").attr("href");
    }
}