Jsoup parsing for nested HTML - Java

I have an HTML page to parse with Jsoup, and I lose track because of its odd structure. The HTML can be summarized like this (each line is nested one level inside the line above):
<html>
<body class="page3078">
<div id="mainCapsule">
<div id="contentCapsule" class="capsule">
<div id="content">
<div id="subCapsule" class="clearFix" xmlns="">
<div id="contentLeft">
<iframe width="635" height="1000" frameborder="0" src="apps/Results.aspx">
#document
<html xmlns="http://www.w3.org/1999/xhtml">
<body style="background:none;">
<form id="form1" action="Results.aspx" method="post" name="form1">
<div class="pressContent">
<div class="tableCapsule details">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr class="even">
Basically, I want to get the text inside the tag with class "even". I tried selecting the class "even" directly like this:
doc.getElementsByClass("even")
It didn't work. I also tried a parent > child relationship with the select method, which didn't work either. Starting from the second html tag, I tried this:
doc.select("body.page3078 > html > body > #form1 > th");
That didn't work either. Where am I going wrong?

One comment summarizes the start of a solution here:
As mentioned here, you need to fetch the page from the iframe and parse it separately with jsoup. This page isn't weird at all - it's just a separate page that is shown in the iframe. – Boris the Spider
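The two-step approach from that comment can be sketched with jsoup as follows. This is a minimal sketch, assuming the outer page can be fetched with `Jsoup.connect(...).get()` and that the first `iframe` with a `src` attribute is the one holding the results table; `IframeRowScraper` and its method names are hypothetical. The point is that `tr.even` is searched in the second `Document`, never in the outer one, because the iframe's `#document` is not part of the outer page's HTML.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: the outer page only contains <iframe src="...">,
// so the rows have to be read from a second, separately fetched document.
final class IframeRowScraper {

    /** Absolute URL of the first <iframe src="..."> in the outer page, or null if there is none. */
    static String iframeUrl(Document outerPage) {
        Element iframe = outerPage.select("iframe[src]").first();
        // absUrl resolves a relative src such as "apps/Results.aspx" against the page's base URI.
        return iframe == null ? null : iframe.absUrl("src");
    }

    /** Text of every <tr class="even"> in the iframe's own document. */
    static List<String> evenRowTexts(Document iframePage) {
        List<String> texts = new ArrayList<>();
        for (Element row : iframePage.select("tr.even")) {
            texts.add(row.text());
        }
        return texts;
    }

    /** Fetches the outer page, then fetches the iframe page with a second request. */
    static List<String> scrape(String pageUrl) throws IOException {
        Document outer = Jsoup.connect(pageUrl).get();
        String src = iframeUrl(outer);
        if (src == null || src.isEmpty()) {
            return new ArrayList<>(); // no iframe, or its src could not be resolved
        }
        // Assumption: the iframe page does not need cookies or session state from the outer page.
        Document inner = Jsoup.connect(src).get();
        return evenRowTexts(inner);
    }
}
```

If `Results.aspx` depends on cookies set by the outer page, they would have to be passed along to the second request, which this sketch does not do.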

Related

Using a form to display model objects in Thymeleaf. Is there a better way to do this?

I've been trying to get comfortable with Spring Boot, Thymeleaf, and frontend development in general. I'm currently using a table on the frontend to display a list of objects from the backend, as well as a form that lets users view the properties of each item in the list.
I feel like the way I'm doing this is odd and that there could be a better way. I'd appreciate any tips or feedback, thank you.
HTML/Thymeleaf
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8">
<title>Gambler Card List</title>
<link th:href="@{/css/styles.css}" rel="stylesheet" />
</head>
<body>
<h1>Gambler Card List</h1>
<div class="cardlist-wrapper">
<div class="cardlist-names">
<form action="/gambler/cardlist" method="get">
<table class="cardlist-table">
<tr th:each="card : ${cardList.getCardList()}">
<td>
<input th:class="card-btn"
th:attr="id=${{card.getId()} == {currentCard.getId()} ? 'selected' : ''}"
type="submit" th:value="${card.getName()}" name="cardName">
</td>
</tr>
</table>
</form>
</div>
<div class="cardlist-description">
<h4 th:text="${currentCard.getDescriptionTitle()}"></h4>
<p th:text="${currentCard.getDescription()}"></p>
</div>
</div>
</body>
</html>
Controller
@RequestMapping(value = "gambler/cardlist")
public String baseCardList(Model model,
@RequestParam(value="cardName", required=false) String cardName) {
model.addAttribute("currentCard", cardList.getCardByName(cardName));
model.addAttribute("cardList", cardList);
return "gambler/cardlist";
}
One potential improvement/simplification is to remove the form and replace the inputs with anchors. The anchors would look something like this (not tested!):
<a th:href="@{/gambler/cardlist(cardName=${card.name})}">
<button th:text="${card.name}"
th:classappend="${card.id == currentCard.id ? 'selectedButton' : 'deselectedButton'}">Card Name</button>
</a>
You can style selectedButton and deselectedButton as you wish.

Jsp div order get changed while rendering

In the JSP, the divs are in the following order:
<div id="identification"/>
<div id="address"/>
<div id="communication"/>
<div id="familyDetail"/>
but when rendered on JBoss and WebLogic servers, the HTML output is:
<div id="address"/>
<div id="communication"/>
<div id="identification"/>
<div id="familyDetail"/>
The generated JSP Java file has the divs in the same order as the JSP, but the order changes when it is rendered to HTML.

Wicket: render/refresh RepeatingView without an HTML tag

So I have a FormControl inside a RepeatingView, and I set the markup for the RepeatingView to a wicket:container.
I am trying to refresh the FormControl, but because I am stripping Wicket tags from the output,
it gives a JS error.
I know a wicket:container cannot be refreshed, but I am also not able to refresh the control inside it. I tried setting
control.setOutputMarkupPlaceholderTag(true);
control.setOutputMarkupId(true);
And the HTML is something like this:
<form wicket:id="form">
<wicket:container wicket:id="repeatingContainer">
</wicket:container>
</form>
Here is the error I am getting:
Cannot bind a listener for event "change" on element "id1a3" because the element is not in the DOM
I want to remove the repeatingContainer HTML tag from the output so that it follows the Bootstrap form layout.
Update:
This code is inside the RepeatingView:
<wicket:panel>
<div wicket:id="componentGroup">
<wicket:child/>
</div>
</wicket:panel>
This is the Wicket child markup:
<div wicket:id="labelContainer">
<label wicket:id="label"></label>
</div>
<div wicket:id="controlContainer" class="control-container">
<input wicket:id="input"/>
</div>
OK, so I was able to track this down.
I changed the componentGroup div to a wicket:container and left the repeater as a div. Now I am able to refresh it, and it works fine.
This is what it looks like:
<form wicket:id="form">
<div wicket:id="repeatingContainer">
<wicket:container wicket:id="componentGroup">
<div wicket:id="labelContainer">
<label wicket:id="label"></label>
</div>
<div wicket:id="controlContainer" class="control-container">
<input wicket:id="input"/>
</div>
</wicket:container>
</div>
</form>

Convert Html String content to java Map

I have the following HTML string content, and I want to convert it into a Java map.
<div dir="ltr"><div class="gmail_quote"><div dir="ltr"><div><div dir="ltr"><div><p style="font-family:arial,sans-serif;font-size:13px">Notification for shipment event group "Picked up" for 13 May 14.<u></u><u></u></p>
<div class="MsoNormal" align="center" style="font-family:arial,sans-serif;font-size:13px;text-align:center">
<hr size="2" width="100%" align="center"></div><table border="0" cellpadding="0" style="font-family:arial,sans-serif;font-size:13px">
<tbody><tr><td style="padding:0.75pt">
<p class="MsoNormal">
AWB Number: 8841965182<br>
Pickup Date: 2014-05-13 20:11:00<br>
Service: P<br>
Pieces: 1<br>
I tried jsoup, but it did not work.
Take a look at Boilerpipe
A similar question has been asked here on SO.
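If pulling in a library is overkill for a fragment this regular, a plain-JDK alternative is to strip the markup and split each `Key: Value` line. This is a minimal sketch, not Boilerpipe's approach, assuming every field ends with a `<br>` tag and that keys never contain `": "`; HTML entities such as `&amp;` are not decoded, and the class name `ShipmentHtmlToMap` is hypothetical. Regex tag stripping is fragile on arbitrary HTML, so treat this as a fit for this specific notification format only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for the shipment-notification fragment above.
final class ShipmentHtmlToMap {

    /**
     * Turns "Key: Value" lines in an HTML fragment into an ordered map.
     * Assumption: one field per line, each terminated by a <br> tag.
     */
    static Map<String, String> toMap(String html) {
        // 1. Treat every <br> or <br/> as a line break.
        String text = html.replaceAll("(?i)<br\\s*/?>", "\n");
        // 2. Drop all remaining tags together with their attributes (style="font-family:..." etc.).
        text = text.replaceAll("<[^>]*>", "");

        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            // 3. Split on the first ": " only, so values such as "20:11:00" stay intact.
            int sep = line.indexOf(": ");
            if (sep > 0) {
                fields.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<p class=\"MsoNormal\">\n"
                + "AWB Number: 8841965182<br>\n"
                + "Pickup Date: 2014-05-13 20:11:00<br>\n"
                + "Service: P<br>\n"
                + "Pieces: 1<br>\n";
        // prints {AWB Number=8841965182, Pickup Date=2014-05-13 20:11:00, Service=P, Pieces=1}
        System.out.println(toMap(html));
    }
}
```

Because a `LinkedHashMap` is used, the fields come out in document order; if a key repeats, the last value wins.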

How to get content of iframe using GWT Query?

I am trying to do this:
$("iframe.cke_dialog_ui_input_file").contents()
but it returns:
< #document(gquery, error getting the element string representation: (TypeError) #com.google.gwt.dom.client.DOMImplMozilla::toString(Lcom/google/gwt/dom/client/Element;)([JavaScript object(8570)]): doc is null)/>
But document is not null!
Please help me solve this problem :(
Update: HTML code:
<iframe id="cke_107_fileInput" class="cke_dialog_ui_input_file" frameborder="0" src="javascript:void(0)" title="Upload Image" role="presentation" allowtransparency="0">
<html lang="en" dir="ltr">
<head>
<body style="margin: 0; overflow: hidden; background: transparent;">
<form lang="en" action="gui/ckeditor/FileUploadServlet?CKEditor=gwt-uid-7&CKEditorFuncNum=0&langCode=en" dir="ltr" method="POST" enctype="multipart/form-data">
<label id="cke_106_label" style="display:none" for="cke_107_fileInput_input">Upload Image</label>
<input id="cke_107_fileInput_input" type="file" size="38" name="upload" aria-labelledby="cke_106_label">
</form>
<script>
window.parent.CKEDITOR.tools.callFunction(90);window.onbeforeunload = function() {window.parent.CKEDITOR.tools.callFunction(91)}
</script>
</body>
</html>
</iframe>
First, get the iframe element with your existing query and store it in a GWT IFrameElement:
IFrameElement iframe = (IFrameElement) element;
Then use the iframe to get its content:
iframe.getContentDocument().getBody().getInnerText();
Hope this helps you get the values.
The contents() method returns the HTMLDocument, so normally you have to find the <body> to manipulate it.
$("iframe.cke_dialog_ui_input_file").contents().find("body");
A common mistake is to query the iframe before it has fully loaded, so add a delay using a Timer, Scheduler, or GQuery.delay(). For instance:
$("iframe.cke_dialog_ui_input_file")
.delay(100,
lazy()
.contents().find("body")
.css("font-family", "verdana")
.css("font-size", "x-small")
.done());
