How to parse a webpage that includes Javascript? [duplicate] - java

I've got a webpage that creates a table using JavaScript. Right now I'm using JSoup in my Java project to parse the webpage. However, JSoup can't run JavaScript, so the table is never generated and the parsed source of the webpage is incomplete.
How can I include the HTML code created by that script in order to parse its content using JSoup? Can you provide a simple example? Thank you!
Webpage example:
<!doctype html>
<html>
<head>
<title>A blank HTML5 page</title>
<meta charset="utf-8" />
</head>
<body>
<script>
var table = document.createElement("table");
var tr = document.createElement("tr");
table.appendChild(tr);
document.body.appendChild(table);
</script>
<p>First paragraph</p>
</body>
</html>
The output should be:
<!DOCTYPE html>
<html>
<head>
<title>
A blank HTML5 page
</title>
<meta charset="utf-8"></meta>
</head>
<body>
<script>
var table = document.createElement("table");
var tr = document.createElement("tr");
table.appendChild(tr);
document.body.appendChild(table);
</script>
<table>
<tr></tr>
</table>
<p>
First paragraph
</p>
</body>
</html>
However, JSoup doesn't include the table tag because it can't execute JavaScript. How can I achieve this?

First possibility
You have some options outside Jsoup, i.e. employing a "real" browser and interacting with it. An excellent choice for this would be Selenium WebDriver. With Selenium you can use different browsers as the back end, and in your case the very lightweight HtmlUnit may already be enough. If more complicated JavaScript is involved, there is often no other choice than running a full browser. Luckily, PhantomJS is out there and its footprint is not too bad (it is headless).
Second possibility
Another approach would be to grab the JavaScript source with JSoup and run it in a JavaScript interpreter within Java. For that you could use Rhino. However, if you go down that path you might as well use HtmlUnit directly, which is probably a bit less bulky.


Multiple body nodes present in HTML DOM, so Selenium fails to find the element in DOM

I'm trying to automate an application that has multiple HTML head and body tags. A sample is provided below. I tried every locator I could think of (XPath, id, class, etc.), but none of them works for this application because it has an HTML page embedded inside the DOM. I guess JavaScript loads a new HTML page inside the page.
Even though the XPath works in the Chrome browser, when I put it in a script and run it, it throws an exception:
Exception in thread "main" org.openqa.selenium.NoSuchElementException: Unable to locate element: //*[text()='Continue'].
How to tackle this problem?
HTML DOM Sample:
<html class="UShellFullHeight">
<head>
<style id="antiClickjackStyle" type="text/css">
body {
display : none !important;
}
</style>
</head>
<body class="UiBody UShellFullHeight" role="application">
<div id="canvas" class="UShellFullHeight"></div>
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html id="home" lang="EN">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
.
.
.
<span id="WD8A-cnt" class="urNoUserSelect lsButton--contentlsControl--centeraligned urBtnCnt" style="pointer-events:none;">
<span class="lsButton__text " id="WD8A-caption" style="white-space:nowrap;">Continue</span>
</span>
</body>
</html>
</body>
</html>
Try with the tag:
//span[text()='Continue']
or, better for this example, use the id, since this element has one:
driver.findElement(By.id("WD8A-caption"));
Or this XPath, which is equivalent:
//span[@id='WD8A-caption']
You can also try writing the XPath manually, or use the class name, since the element has both an id and a class name:
driver.findElement(By.className("lsButton__text"));
Thanks a lot for your time, guys. I found the answer myself.
The answer is that I need to switch to the frame before operating on its elements.
To switch to the frame:
driver.switchTo().frame("id");

Java - Selenium : <html> containing <html> problematic

I'm testing a web page with Java and the Selenium API. The web page is structured like this:
<html>
<head>
</head>
<frameset name="f1">
<html>
<frameset name="f2">
</frameset>
</html>
</frameset>
</html>
Using internetExplorerDriver.findElement(By.xpath("//frame[@name='f1']")), the correct element is retrieved. But when I call internetExplorerDriver.findElement(By.xpath("//frame[@name='f2']")), it reports that the element can't be found. When I look at the children of f1 with f1.findElements(By.xpath("*")).size(); I get 0, so no children are found.
How can this be solved? Does the web page have to be changed to remove the nested html markup? Thanks in advance.
Regards,
Jean Ducrot
You need to switch to the frame before finding an element within the context of that frame.
internetExplorerDriver.switchTo().frame("f1");
internetExplorerDriver.findElement(By.xpath("//frame[@name='f2']"));

Evaluate JavaScript within a HTML with Java

I want to get the HTML output generated by the JavaScript inside a given HTML string in Java.
First, I don't know how to pass the full HTML and JavaScript to the engine from code. All I've seen is that I can give Java a small piece of JavaScript, invoke it with some parameters, and get some output.
Is there a way to set an HTML string as the context?
The example (taken from another thread):
<html>
<head><title> test </title>
<body></body>
<div>Welcome</div>
<style type="text/css">
.title{
color:red
}
</style>
<script type="text/javascript">
var i=0;
for (i=0;i < 5;i++){
document.writeln("<div class='title'>" + i + "</div>");
}
</script>
</html>
What I want to do is pass this HTML code into a JavaScript context and call the function (actually there is no function here, but I know how to call JavaScript functions).
Rendering it in a browser will give you this HTML back:
<html>
<head><title> test </title>
<body></body>
<div>Welcome</div>
<style type="text/css">
.title{
color:red
}
</style>
<script type="text/javascript">
var i=0;
for (i=0;i < 5;i++){
document.writeln("<div class='title'>" + i + "</div>");
}
</script>
<div class="title">0</div>
<div class="title">1</div>
<div class="title">2</div>
<div class="title">3</div>
<div class="title">4</div>
</html>
So the basic output from the JavaScript engine is simply:
Welcome
0
1
2
3
4
I don't want to render any of these elements; I am simply interested in getting the HTML output.
I'm not sure how to perform this task.
I want to keep it simple, with no frameworks at all: just a simple mechanism to pass HTML code and JavaScript functions into the JavaScript engine, evaluate it, and get the HTML back, the way a browser (usually) ends up with more HTML after running the scripts.
There's no rendering intended.
Have you checked out the new Nashorn JavaScript engine in Java 8?
I haven't actually used it yet, so I am not sure whether you can just pass it an HTML string or whether you will need to separate the JS and HTML, but it is worth taking a look.
There's a basic tutorial here:
http://winterbe.com/posts/2014/04/05/java8-nashorn-tutorial/
If not, there is always PhantomJS, which you can use to execute JavaScript. There are some examples here:
http://phantomjs.org/quick-start.html
Looks like there are some useful functions like page loading and code evaluating that you may be able to use.

Play! framework. template "include"

I'm planning my website structure as following:
header.scala.html
XXX
footer.scala.html
Now, instead of "XXX" there should be a specific page (e.g. "UsersView.scala.html").
What I need is to include (as in other well-known languages) the source of the footer and the
header in the middle page's code.
so my questions are:
How do you include a page in another with scala templating?
Do you think it's a good paradigm for Play! framework based website?
Just call another template like a method. If you want to include footer.scala.html:
@footer()
A common pattern is to create a template that contains the boilerplate, and takes a parameter of type Html. Let's say:
main.scala.html
@(content: Html)
@header
// boilerplate
@content
// more boilerplate
@footer
In fact, you don't really need to separate out header and footer with this approach.
Your UsersView.scala.html then looks like this:
@main {
// all your users page html here.
}
You're wrapping the UsersView with main by passing it in as a parameter.
You can see examples of this in the samples
My usual main template is a little more involved and looks roughly like this:
@(title: String)(headInsert: Html = Html.empty)(content: Html)(implicit user: Option[User] = None)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>@title</title>
// bootstrap stuff here
@headInsert
</head>
<body>
@menu(user)
<div id="mainContainer" class="container">
@content
</div>
</div>
</body>
</html>
This way a template can pass in a head insert and title, and make a user available, as well as content of course.
Play provides a very convenient way to implement that!
The layout part from the official docs:
First we have a base template (what we would call base.html in Django):
// views/main.scala.html
@(title: String)(content: Html)
<!DOCTYPE html>
<html>
<head>
<title>@title</title>
</head>
<body>
<section class="content">@content</section>
</body>
</html>
How do you use the base template?
@main(title = "Home") {
<h1>Home page</h1>
}
More information here

Reading a JSP variable from JavaScript

How can I read/access a JSP variable from JavaScript?
alert("${variable}");
or
alert("<%=var%>");
or full example
<html>
<head>
<script language="javascript">
function access(){
<% String str="Hello World"; %>
var s="<%=str%>";
alert(s);
}
</script>
</head>
<body onload="access()">
</body>
</html>
Note: sanitize the input before rendering it; otherwise it may open up a whole lot of XSS possibilities.
The cleanest way, as far as I know:
Add your JSP variable to an HTML element's data-* attribute,
then read this value via JavaScript when required.
My opinion regarding the current solutions on this SO page: reading JSP values "directly" using Java scriptlets inside actual JavaScript code is probably the most disgusting thing you could do. Makes me wanna puke. Haha. Seriously, try not to do it.
The HTML part without JSP:
<body data-customvalueone="1st Interpreted Jsp Value" data-customvaluetwo="another Interpreted Jsp Value">
Here is your regular page main content
</body>
The HTML part when using JSP:
<body data-customvalueone="${beanName.attrName}" data-customvaluetwo="${beanName.scndAttrName}">
Here is your regular page main content
</body>
The JavaScript part (using jQuery for simplicity):
<script type="text/JavaScript" src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.1.1/jquery.js"></script>
<script type="text/javascript">
jQuery(function(){
var valuePassedFromJSP = $("body").attr("data-customvalueone");
var anotherValuePassedFromJSP = $("body").attr("data-customvaluetwo");
alert(valuePassedFromJSP + " and " + anotherValuePassedFromJSP + " are the values passed from your JSP page");
});
</script>
And here is the jsFiddle to see this in action http://jsfiddle.net/6wEYw/2/
Resources:
HTML 5 data-* attribute: https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_data_attributes
Include javascript into html file Include JavaScript file in HTML won't work as <script .... />
CSS selectors (also usable when selecting via jQuery) https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Getting_started/Selectors
Get an HTML element attribute via jQuery http://api.jquery.com/attr/
Assuming you are talking about JavaScript in an HTML document.
You can't do this directly since, as far as the JSP is concerned, it is outputting text, and as far as the page is concerned, it is just getting an HTML document.
You have to generate JavaScript code to instantiate the variable, taking care to escape any characters with special meaning in JS. If you just dump the data (as proposed by some other answers) you will find it falling over when the data contains new lines, quote characters and so on.
The simplest way to do this is to use a JSON library (there are a bunch listed at the bottom of http://json.org/ ) and then have the JSP output:
<script type="text/javascript">
var myObject = <%= the string output by the JSON library %>;
</script>
This will give you an object that you can access like:
myObject.someProperty
in the JS.
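To make the escaping step concrete, here is a minimal sketch of what a JSON library does for a single string value. The class and method names (JsLiteral, quote) are hypothetical and only for illustration; in a real project a proper JSON library, as recommended above, is the safer choice. The sketch also escapes '<', '>' and '&' as an extra assumption, so that a value containing "</script>" cannot end the script block early.

```java
// Hypothetical helper: turns a Java String into a double-quoted JS/JSON string literal.
final class JsLiteral {
    private JsLiteral() {}

    static String quote(String value) {
        if (value == null) {
            return "null"; // the JS/JSON null literal
        }
        StringBuilder sb = new StringBuilder(value.length() + 2).append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    // Control characters, HTML-significant characters and the
                    // line/paragraph separators are written as \\uXXXX escapes.
                    if (c < 0x20 || c == '<' || c == '>' || c == '&'
                            || c == 0x2028 || c == 0x2029) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Prints a JS assignment whose right-hand side is a safely quoted literal.
        System.out.println("var s = " + quote("He said \"hi\"\n</script>") + ";");
    }
}
```

In a JSP this would be used as var myString = <%= JsLiteral.quote(someJavaString) %>; where someJavaString is a placeholder for your own variable.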
<% String s="Hi"; %>
var v ="<%=s%>";
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<script
src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<title>JSP Page</title>
<script>
$(document).ready(function(){
<% String name = "phuongmychi.github.io"; %> // JSP variable
var name = "<%=name %>"; // copy the JSP value into a JS variable
$("#id").html(name); // write it into the page
});
</script>
</head>
<body>
<h1 id='id'>!</h1>
</body>
</html>
I know this is an older post, but I have a cleaner solution that I think will solve the XSS issues and keep it simple:
<script>
let myJSVariable = <%= "`" + myJavaVariable.replace("`", "\\`") + "`" %>;
</script>
This makes use of the JS template string's escape functionality and prevents the string from being executed by escaping any backticks contained within the value in Java.
You could easily abstract this out to a utility method for re-use:
public static String escapeStringToJS(String value) {
if (value == null) return "``";
return "`" + value.replace("`", "\\`") + "`";
}
and then in the JSP JS block:
<script>
let myJSVariable = <%= Util.escapeStringToJS(myJavaVariable) %>;
</script>
The result:
<script>
let myJSVariable = `~\`!@#$%^&*()-_=+'"|]{[?/>.,<:;`;
</script>
Note: This doesn't take separation of concerns into consideration, but if you're just looking for a simple and quick solution, this may work.
Also, if you can think of any risks to this approach, please let me know.
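On the question of risks: inside a JavaScript template literal a backslash starts an escape sequence and ${...} is evaluated as an expression, so escaping only the backtick leaves two gaps, and a value containing "</script>" would still close the script block. A hedged sketch of a stricter variant of the helper above follows; the class name TemplateLiteral is hypothetical, and this is an illustration rather than a vetted sanitizer.

```java
// Hypothetical stricter variant of escapeStringToJS for JS template literals.
final class TemplateLiteral {
    private TemplateLiteral() {}

    static String escape(String value) {
        if (value == null) {
            return "``"; // empty template literal, as in the original helper
        }
        String escaped = value
                .replace("\\", "\\\\")  // backslashes first, so later escapes are not doubled
                .replace("`", "\\`")    // backtick would otherwise end the literal
                .replace("${", "\\${")  // prevent expression interpolation
                .replace("</", "<\\/"); // keep "</script>" from closing the block
        return "`" + escaped + "`";
    }

    public static void main(String[] args) {
        // Prints the escaped literal for a value that tries to inject an expression.
        System.out.println("let v = " + escape("${alert(1)} and `ticks`") + ";");
    }
}
```

The order of the replace calls matters: backslashes must be doubled before the other replacements introduce new ones.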
