Extract Span tag data using Jsoup - java

I am trying to extract specific content from HTML using Jsoup. Below is the sample HTML.
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body class="">
<div class="shop-section line bmargin10 tmargin10">
<div class="price-section fksk-price-section unit">
<div class="price-table">
<div class="line" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
<div class="price-save">
<span class="label-td"><span class="label fksk-label">Price :</span></span>
</div>
<span class="price final-price our fksk-our" id="fk-mprod-our-id">Rs.<span class="small-font"> </span>11990</span>
</div>
<meta itemprop="price" content="Rs. 11990" />
<meta itemprop="priceCurrency" content="INR" />
<div class="our-price-desc fksk-our-price-desc">
<small>(Prices are inclusive of all taxes)</small>
</div>
</div>
</div>
</div>
</body>
</html>
I got the required output using the command below:
document.select(".price-table").select(".line").select("span").get(2).text()
That seems unnecessarily long.
Can I select the element directly by its span class ("price final-price our fksk-our")?
Any help would be appreciated.

Does this not work for you? I'm not sure why you're starting the selection at price-table.
doc.select("span[class=price final-price our fksk-our]").text();
If not, it should be pretty close. Take a look at Jsoup's selector syntax; it is very powerful. Because the span also has an id, the selector #fk-mprod-our-id should match it as well.

Related

Handlebar compile error if template is not well-formed html

I am using Handlebars as a templating framework.
When a template is compiled and it is not well-formed HTML, an exception should be thrown.
Does Handlebars have such a feature?
For example, the following template is not well-formed because the img tag is not properly closed.
<head>
<meta charset="utf-8"/>
<title>Template</title>
</head>
<body>
<div class="container">
<div class="row">
<div class="col">
<img src="logo.jpg">
</div>
</div>
</div>
</body>
</html>

Bootstrap components not rendering in Thymeleaf template when having more than one path segment

I have a Spring Boot Web MVC application where I'm using a Bootstrap 4 template and Thymeleaf as view.
I'm facing a problem where a drop-down navigation item is rendering without styles and scripts when the containing page is mapped in the controller under more than one path segment. If the path consists only of one segment the component works as intended.
Page containing the component "menu.html"
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="">
<meta name="author" content="">
<title>Menu</title>
<!-- Custom fonts for this template -->
<link href="/vendor/fontawesome-free/css/all.min.css" rel="stylesheet" type="text/css">
<link href="https://fonts.googleapis.com/css?family=Nunito:200,200i,300,300i,400,400i,600,600i,700,700i,800,800i,900,900i"
rel="stylesheet">
<!-- Custom styles for this template -->
<link href="/css/sb-admin-2.min.css" rel="stylesheet">
<!-- Custom styles for this page -->
<link href="/vendor/datatables/dataTables.bootstrap4.min.css" rel="stylesheet">
</head>
<body id="page-top">
<div id="wrapper">
<nav class="navbar navbar-expand navbar-light bg-white topbar mb-4 static-top shadow">
<ul class="navbar-nav ml-auto">
<!-- Drop-Down Component -->
<li class="nav-item dropdown no-arrow">
<a class="nav-link dropdown-toggle" href="#" id="userDropdown" role="button"
data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
<span class="mr-2 d-none d-lg-inline text-gray-600 small" th:text="${user.getName()}"></span>
<img class="img-profile rounded-circle" src="img/undraw_profile.svg" width="50" height="50">
</a>
<!-- Dropdown - User Information -->
<div class="dropdown-menu dropdown-menu-right shadow animated--grow-in" aria-labelledby="userDropdown">
<a class="dropdown-item" href="#" data-toggle="modal" data-target="#logoutModal">
<i class="fas fa-sign-out-alt fa-sm fa-fw mr-2 text-gray-400"></i>
Logout
</a>
</div>
</li>
</ul>
</nav>
<!-- Logout Modal-->
<div class="modal fade" id="logoutModal" tabindex="-1" role="dialog" aria-labelledby="logoutModalLabel"
aria-hidden="true">
<div class="modal-dialog" role="document">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="logoutModalLabel">Ready to Leave?</h5>
<button class="close" type="button" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">×</span>
</button>
</div>
<div class="modal-body">Select "Logout" below if you are ready to end your current session.</div>
<div class="modal-footer">
<button class="btn btn-secondary" type="button" data-dismiss="modal">Cancel</button>
<a class="btn btn-primary" href="/login?logout=true">Logout</a>
</div>
</div>
</div>
</div>
</div>
<!-- Bootstrap core JavaScript-->
<script src="vendor/jquery/jquery.min.js"></script>
<script src="vendor/bootstrap/js/bootstrap.bundle.min.js"></script>
<!-- Core plugin JavaScript-->
<script src="vendor/jquery-easing/jquery.easing.min.js"></script>
<!-- Custom scripts for all pages-->
<script src="js/sb-admin-2.min.js"></script>
<!-- Page level plugins -->
<script src="vendor/datatables/jquery.dataTables.min.js"></script>
<script src="vendor/datatables/dataTables.bootstrap4.min.js"></script>
<!-- Page level custom scripts -->
<script src="js/demo/datatables-demo.js"></script>
</body>
</html>
Using the following mapping in my controller the component works fine:
@GetMapping(value = "/first-segment")
public String showModal(Model model) {
model.addAttribute("user", userService.getAuthenticatedUser());
return "menu";
}
But if I add a second segment no styling is displayed and the component can't be clicked on:
@GetMapping(value = "/first-segment/second-segment")
public String showModal(Model model) {
model.addAttribute("user", userService.getAuthenticatedUser());
return "menu";
}
Note
I'd also like to add that not every Bootstrap component fails to render under two path segments. As far as I've noticed, only the drop-down and modal components are affected.
Replace:
<script src="vendor/jquery/jquery.min.js"></script>
with:
<script src="/vendor/jquery/jquery.min.js"></script>
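Note the extra slash at the start. Without it, the browser resolves the JavaScript file relative to the path of the current page. That is why it worked with one path segment but not with two. The same applies to the other script tags and to the img src in your page.
A URL that starts with a slash is always resolved from the root of the site. This would also explain why only the drop-down and modal broke: the stylesheet links in your page already start with a slash, while the scripts those components depend on do not.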
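To see the difference concretely, here is a small sketch using java.net.URI from the JDK, which resolves these simple references the same way a browser does. The host and port (localhost:8080) and the class name are assumptions for illustration only; the paths come from the question.

```java
import java.net.URI;

// Sketch: relative vs. root-relative script URLs under one and two path segments.
class RelativeUrlDemo {

    // Resolve a reference (such as a script src) against the URL of the current page.
    static String resolve(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Assumed host and port; only the paths matter here.
        String onePage = "http://localhost:8080/first-segment";
        String twoPage = "http://localhost:8080/first-segment/second-segment";

        String relative = "vendor/jquery/jquery.min.js";
        String rootRelative = "/vendor/jquery/jquery.min.js";

        // http://localhost:8080/vendor/jquery/jquery.min.js
        System.out.println(resolve(onePage, relative));
        // http://localhost:8080/first-segment/vendor/jquery/jquery.min.js
        System.out.println(resolve(twoPage, relative));
        // http://localhost:8080/vendor/jquery/jquery.min.js (both pages)
        System.out.println(resolve(onePage, rootRelative));
        System.out.println(resolve(twoPage, rootRelative));
    }
}
```

With two segments the relative reference ends up under /first-segment/, where no such file is served, so jQuery and the Bootstrap bundle never load.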

Adding button for each elements of list in thymeleaf

I'm having trouble adding a button to each element of a list rendered with Thymeleaf. I would like to have a button next to every list element.
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org" xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout"
layout:decorate="~{layout.html}">
<head>
<meta charset="UTF-8">
<title>Home</title>
</head>
<body>
<section layout:fragment="content">
<div class="row">
<div class="col-2">
<a th:href="@{/archives}">task archive</a>
</div>
<div class="col-2">
<a th:href="@{/add}">add a task</a>
</div>
</div>
<ul>
<li th:each="task: ${tasks}"
th:text="|${task.getId()} ${task.getName()} ${task.getCategory().getDescription()} ${task.isFinished()}|">
<input type="submit" value="Done">
</li>
</ul>
</section>
</body>
</html>
Thymeleaf will replace any pre-existing tag content with whatever is generated by the th:text expression.
In your case that pre-existing content is your <input> element - which is why the buttons are not displayed.
One way to avoid this is to place your th:text into a span inside the <li>:
<ul>
<li th:each="task: ${tasks}">
<span th:text="|${task.id} ${task.name} ... |"></span>
<input type="submit" value="Done">
</li>
</ul>
(I removed a couple of your fields from my example, for brevity.)
Note also that I have changed ${task.getId()} to ${task.id}. As long as your getters are named according to the JavaBeans convention, you can use the property name and Thymeleaf will find the correct getter to call.
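As a rough illustration of that naming convention, the sketch below uses the JDK's java.beans.Introspector to look up which getter backs a property name. This is not how Thymeleaf resolves properties internally (it delegates to its expression engine); it only demonstrates the JavaBeans mapping. The Task class here is a hypothetical stand-in for the one in the question.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

class PropertyNameDemo {

    // Hypothetical stand-in for the question's Task class.
    public static class Task {
        private final long id = 1L;
        private final boolean finished = false;

        public long getId() { return id; }
        public boolean isFinished() { return finished; }
    }

    // Returns the name of the getter behind a JavaBeans property, or null if there is none.
    static String getterFor(Class<?> type, String property) {
        try {
            BeanInfo info = Introspector.getBeanInfo(type);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getName().equals(property) && pd.getReadMethod() != null) {
                    return pd.getReadMethod().getName();
                }
            }
            return null;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getterFor(Task.class, "id"));       // getId
        System.out.println(getterFor(Task.class, "finished")); // isFinished
    }
}
```

By the same convention, ${task.isFinished()} from the question could be written as ${task.finished}.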

Using templates/includes in Spring Boot

I'm developing an application with Spring Boot, and I'm not sure how to manage templates.
I've thought of two ways to do this:
The first is to use custom tags that return the contents of an HTML/JSP file as a String.
Something like this:
<th:header></th:header>
<!-- This would be the header (The implementation of this tag in Java) -->
<body>
Things that are not common
<th:footer></th:footer>
<!-- This would be the footer -->
</body>
The other way would be to use includes, but I'm not sure how to do that.
I'm not sure whether there is another way of doing this with Spring. I hope this makes sense.
Thanks in advance.
You could use Thymeleaf. Then you can write your page like this:
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head th:replace="common/header :: common-header" />
<body>
<div th:replace="common/navbar :: common-navbar" />
<your content here>
</body>
</html>
Then you would have the common-header and common-navbar thymeleaf fragments in separate files like so (note the th:fragment in the common files. Also note that these files are under a common folder):
<head th:fragment="common-header">
<title>DevOps Buddy</title>
<!-- Bootstrap core CSS -->
<link th:href="@{/webjars/bootstrap/3.3.6/css/bootstrap.min.css}"
rel="stylesheet" media="screen" />
<!-- Custom styles for this template -->
<link type="text/css" th:href="@{/css/styles.css}" rel="stylesheet" media="screen"/>
</head>
And the navbar fragment:
<div th:fragment="common-navbar" class="navbar navbar-inverse navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li class="active"><a th:text="#{navbar.home.text}" href="/"></a></li>
<li><a th:text="#{navbar.about.text}" href="/about"></a></li>
<li><a th:text="#{navbar.contact.text}" href="/contact"></a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li th:if="${#authorization.expression('!isAuthenticated()')}">
<a th:href="@{/login}" th:text="#{navbar.login.text}" />
</li>
<li th:if="${#authorization.expression('isAuthenticated()')}">
<form id="f" th:action="@{/logout}" method="post" role="form" class="navbar-form">
<button type="submit" th:text="#{navbar.logout.text}" class="btn btn-primary" />
</form>
</li>
</ul>
</div><!--/.nav-collapse -->
</div>
</div>

Jsoup parse and nested tags

I'm learning Jsoup and have this HTML:
[...]
<p style="..."> <!-- div 1 -->
Content
</p>
<p style="..."> <!-- div 2 -->
Content
</p>
<p style="..."> <!-- div 3 -->
Content
</p>
[...]
I use Jsoup.parse() and document.select("p") to extract the content, and this works well. But consider this case:
[...]
<p style="..."> <!-- div 1 -->
Content
</p>
<p style="..."> <!-- div 2 -->
Content
</p>
<p style="..."> <!-- div 3 -->
Content
<p style="..."></p>
<p style="..."></p>
</p>
[...]
In this case, Jsoup.parse() converts the markup to:
[...]
<p style="..."> <!-- div 1 -->
Content
</p>
<p style="..."> <!-- div 2 -->
Content
</p>
<p style="..."> <!-- div 3 -->
Content
</p>
<p style="..."> <!-- div 4 -->
</p>
<p style="..."> <!-- div 5 -->
</p>
[...]
How can I keep the nested paragraphs in place with Jsoup (div 4 and 5 inside div 3)?
Here is an example.
HTML file:
<html>
<head>
<title>Title</title>
</head>
<body>
<p style="margin-left:2em">
<span class="one">Text</span>
<span class="two"><span class="nest">Text</span></span>
<span class="three"></span>
</p>
<p style="margin-left:2em">
<span class="one">Text</span>
<span class="two"><span class="nest">Text</span></span>
<span class="three"></span>
</p>
<p style="margin-left:2em">
<span class="one">Text</span>
<span class="two"><span class="nest">Text</span></span>
<span class="three"></span>
<p style="margin-left:2em"></p>
<p style="margin-left:2em"></p>
</p>
</body>
</html>
Java code:
Document doc = Jsoup.connect(URL_with_HTML).get();
System.out.println(doc.outerHtml());
Output:
<html>
<head>
<title>Title</title>
</head>
<body>
<p style="margin-left:2em"> <span class="one">Text</span> <span class="two"><span class="nest">Text</span></span> <span class="three"></span> </p>
<p style="margin-left:2em"> <span class="one">Text</span> <span class="two"><span class="nest">Text</span></span> <span class="three"></span> </p>
<p style="margin-left:2em"> <span class="one">Text</span> <span class="two"><span class="nest">Text</span></span> <span class="three"></span> </p>
<p style="margin-left:2em"></p>
<p style="margin-left:2em"></p>
<p></p>
</body>
</html>
Is this correct? I am using Jsoup 1.6.1. I expected Jsoup to return nested paragraphs rather than the output above.
Nested paragraphs do not exist in HTML. The preceding paragraph is closed automatically, because Jsoup implements the WHATWG HTML5 parsing rules:
A p element is automatically closed by the start tag of any of the following: address, article, aside, blockquote, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul. Therefore <p><div></div> becomes <p></p><div></div>.
An end tag named p (i.e. </p>) that has no corresponding open start tag is a parse error and is treated as an empty paragraph, <p></p>. Therefore <span></span></p> becomes <span></span><p></p>.
So Jsoup is correct and your HTML is invalid.
Note that your HTML is invalid because it contains too many </p> end tags, not because the paragraphs are "nested". Nesting cannot happen, because an open paragraph is auto-closed by the next <p>. The trailing </p> is then orphaned, since its "corresponding" <p> was already auto-closed.
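To make the two rules easier to follow, here is a toy model in plain Java. It is not Jsoup's implementation and it ignores most of the real parsing algorithm (scopes, nesting inside other elements, formatting elements); it only tracks whether a paragraph is currently open. The class and method names are made up for this sketch.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphAutoCloseDemo {

    // Start tags that implicitly close an open <p> (the list given above).
    private static final Set<String> CLOSES_P = Set.of(
            "address", "article", "aside", "blockquote", "div", "dl", "fieldset",
            "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header",
            "hgroup", "hr", "main", "menu", "nav", "ol", "p", "pre", "section",
            "table", "ul");

    // A start or end tag, a run of text, or a stray '<' that is passed through.
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([a-zA-Z0-9]+)[^>]*>|[^<]+|<");

    // Applies only the two <p> rules; everything else is copied unchanged.
    static String normalize(String html) {
        StringBuilder out = new StringBuilder();
        boolean pOpen = false;
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(2) == null) {            // text, not a tag
                out.append(m.group());
                continue;
            }
            String name = m.group(2).toLowerCase();
            boolean isEndTag = !m.group(1).isEmpty();
            if (!isEndTag && CLOSES_P.contains(name)) {
                if (pOpen) {                     // rule 1: auto-close the open <p>
                    out.append("</p>");
                    pOpen = false;
                }
                out.append(m.group());
                if (name.equals("p")) {
                    pOpen = true;
                }
            } else if (isEndTag && name.equals("p")) {
                // rule 2: an orphaned </p> becomes an empty paragraph
                out.append(pOpen ? "</p>" : "<p></p>");
                pOpen = false;
            } else {
                out.append(m.group());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // <p>Content</p><p></p><p></p><p></p>
        System.out.println(normalize("<p>Content<p></p><p></p></p>"));
        // <p></p><div></div><p></p>
        System.out.println(normalize("<p><div></div></p>"));
    }
}
```

The first call mirrors the third paragraph of the question: the two inner paragraphs become siblings, and the leftover </p> produces the extra empty <p></p> seen at the end of the Jsoup output.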
Hi, I understand the original question, but I think this is a bug in Jsoup (not in your code). Here is a simple example:
<html>
<head></head>
<body>
<p></p>
<p>
<div></div>
</p>
</body>
</html>
But Jsoup parses it as:
<html>
<head></head>
<body>
<p></p>
<p></p>
<div></div>
<p></p>
</body>
</html>
If you could, please file this bug so the author can fix it :-)
P.S: Just a word "hi", stackoverflow does not allow it?
