How can I get HTML content, including content generated by JavaScript? - java

I need to get the contents of a web page by reading it via its URL, but the contents don't include the data produced by JavaScript. Can anybody help me solve this problem? For example, I want to get the BibTeX content (shown via JavaScript) from this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=111326695&CFTOKEN=18291914. How can I get content (2) from (1)?

From a quick observation, here is what I would do:
1/ Get the content of this web page: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=111326695&CFTOKEN=18291914
2/ Use a regular expression to search for 'BibTeX' and locate the following string in the content:
<li style="list-style:disc; display:inline; margin-bottom:0px;">BibTeX</li>
3/ Use another regular expression to fish out:
exportformats.cfm?id=152611&expformat=bibtex
4/ Concatenate it to the URL (make sure you decode &amp; to &):
"http://portal.acm.org/" + "exportformats.cfm?id=152611&expformat=bibtex"
5/ Fetch that URL to capture the content you're looking for: http://portal.acm.org/exportformats.cfm?id=152611&expformat=bibtex ultimately gives you the BibTeX content.
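A minimal Java sketch of steps 1 to 5 follows. It assumes the citation page still embeds the export link in the `exportformats.cfm?id=...&amp;expformat=bibtex` form shown above (the ACM portal layout may well have changed since), and it collapses steps 2 and 3 into a single regular expression. The class and method names are hypothetical. Note that no JavaScript is executed: this works only because the link target is already present in the raw HTML.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BibtexLinkExtractor {

    // Steps 2+3 combined: match the relative export link, HTML-encoded (&amp;) or not.
    // Assumption: the page still contains the link in this exact form.
    private static final Pattern BIBTEX_LINK =
            Pattern.compile("exportformats\\.cfm\\?id=\\d+(?:&amp;|&)expformat=bibtex");

    /** Returns the absolute BibTeX export URL found in the HTML, or null if there is none. */
    static String findBibtexUrl(String html, String baseUrl) {
        Matcher m = BIBTEX_LINK.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Step 4: decode the &amp; entity back to a plain & and concatenate with the base URL.
        return baseUrl + m.group().replace("&amp;", "&");
    }

    /** Steps 1 and 5: download a URL as a string, following redirects. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String page = fetch("http://portal.acm.org/citation.cfm?id=152610.152611"
                + "&coll=DL&dl=GUIDE&CFID=111326695&CFTOKEN=18291914");
        String bibtexUrl = findBibtexUrl(page, "http://portal.acm.org/");
        if (bibtexUrl == null) {
            System.out.println("No BibTeX link found; the page layout may have changed.");
        } else {
            System.out.println(fetch(bibtexUrl));
        }
    }
}
```

Keeping the extraction in a pure `findBibtexUrl` method means the regex can be checked against saved HTML without hitting the network.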

Related

Jsoup extract Hrefs from the HTML content

My problem is that I'm trying to get the hrefs from this site with Jsoup:
https://www.amazon.de/s?k=kissen&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2
but it does not work.
I tried to select the links by their class like this:
Elements elements = documentMainSite.select(".a-link-normal");
and then I tried to extract the hrefs with the following piece of code:
for (Element element : elements) {
String href = element.attributes().get("href");
}
but unfortunately it gives me nothing.
Can someone tell me where my mistake is, please?
I don't just connect to the website; I also save the hrefs in a string by extracting them with
String href = element.attributes().get("href");
After that I printed the href string, but it is empty.
On the other hand, the code works with another CSS selector, so it has nothing to do with the code itself. It's probably just the CSS selector (.a-link-normal) that is wrong.
You won't get anything by simply connecting to the url via Jsoup.
Document document = Jsoup.connect(yourUrl).get();
String bodyText = document.getElementsByTag("body").get(0).text();
Here is an English translation of the body text that the code above returned:
Enter the characters below We ask for your understanding and want to
be sure that you are not a bot. For best results, please use a browser
that accepts cookies. Type the characters you see in the image: Enter
characters Try another image Continue shopping Terms & Conditions
Privacy Policy © 1996-2015, Amazon.com, Inc. or its affiliates
Either you need to bypass the captcha or emulate a browser, for example by means of Selenium.
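A minimal Jsoup sketch of how to tell "my selector is wrong" apart from "the server sent a different page". It assumes Jsoup is on the classpath; the class and method names are hypothetical, and the browser-like user agent string is an assumption that Amazon may still answer with the robot-check page. When nothing matches, it prints the body text so you can see what was actually received.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractor {

    /** Collects the absolute hrefs of all elements matching the CSS selector. */
    static List<String> hrefs(Document doc, String cssSelector) {
        List<String> result = new ArrayList<>();
        for (Element a : doc.select(cssSelector)) {
            // absUrl resolves relative links such as "/dp/..." against the document's base URI
            String href = a.absUrl("href");
            if (!href.isEmpty()) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String url = "https://www.amazon.de/s?k=kissen&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2";
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0") // assumption: may still be served a captcha
                .get();
        List<String> links = hrefs(doc, "a.a-link-normal");
        if (links.isEmpty()) {
            // Nothing matched: show what the server really sent (often the captcha page)
            System.out.println(doc.body().text());
        } else {
            links.forEach(System.out::println);
        }
    }
}
```

Testing `hrefs` on a locally parsed snippet (`Jsoup.parse(html, baseUri)`) confirms the selector logic independently of what the live site returns.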

Selenium - PHP error - Finding text not contained in an tag

I usually find elements by their ID, tag name, etc. But my text sits directly in the body tag, not inside any other tag. How can I find it? I know it is in the body tag, but there are other elements there too! The "text I want to find" is a displayed PHP error, and I am hoping to catch it. I usually write WebElement x = driver.findElement(By.??); but I can't proceed because I am uncertain what to use.
Sample HTML doc
<head></head>
<body>
Text I want to find
<div>xx</div>
<div>yy</div>
</body>
The reason for the java tag is that I am using Java to write my code.
In your situation I'd use the "context item expression", i.e. the . (dot). So if I write an XPath like this:
//div[contains(.,'Text To Be Searched')]
then it will find all div elements whose text (including the text of their descendants) contains Text To Be Searched. For you, my answer would be:
driver.findElement(By.xpath("//body[contains(.,'Text I want to find')]"));
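The behaviour of `.` versus `text()` can be checked without a browser. The sketch below swaps Selenium for the JDK's own `javax.xml.xpath` (an XPath 1.0 engine, like the ones browsers use) and runs it against the sample page from the question, treated as well-formed XHTML. The class name is hypothetical. One practical point for Selenium: `findElement` can only return elements, not text nodes, so the XPath has to select `body`, and `getText()` on it will also include the text of the child divs.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ContextItemDemo {

    // The sample page from the question, written as well-formed XHTML for the XML parser.
    static final String PAGE = "<html><head></head><body>\nText I want to find\n"
            + "<div>xx</div>\n<div>yy</div>\n</body></html>";

    /** Parses the markup and returns the nodes selected by the XPath expression. */
    static NodeList evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "." is the string value of <body>: its own text plus all descendant text.
        System.out.println(evaluate(PAGE, "//body[contains(.,'Text I want to find')]").getLength());
        // The same "." test therefore also matches text that lives only in a child <div>.
        System.out.println(evaluate(PAGE, "//body[contains(.,'xx')]").getLength());
        // text() looks only at body's own text nodes (XPath 1.0 uses the first one).
        System.out.println(evaluate(PAGE, "//body[contains(text(),'xx')]").getLength());
    }
}
```

Using `text()` instead of `.` narrows the match to text that sits directly in `body`, which is closer to "a PHP error printed outside any tag".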
You should put that text inside a p tag, and then you can write:
WebElement x = driver.findElement(By.tagName("p"));

How to map server response retrieved in jsp to an iFrame

I'm using the Struts 2 framework (Java/JS/HTML/CSS combo) for my web app. I am reading a text file from the server, and I want to write the response to an iFrame present in the same JSP.
Flow:
(1) On click of a link, I pass the relative URL of the text file to jsp.
(2) When the jsp page loads, the java code in the jsp reads the file from server.
(3) Now this response has to be written to an iFrame present in the same jsp file
Can anyone please help me write such a response to an iFrame?
Thanks in advance :)
[Code not tested, only a demonstration of the concept]
Here's a very rough idea of how to fix your code. It's definitely not the best approach, but it should be enough to help you understand the concept.
However, I'd still recommend going over the whole concept again and perhaps coming up with a more efficient way to do what you need.
If you insist on using an iframe, you need to use 2 separate JSPs, because W3C says in "Implementing HTML Frames":
Any frame that attempts to assign as its SRC a URL used by any of its ancestors is treated as if it has no SRC URL at all (basically a blank frame).
So you'll need 2 JSPs. The first one is basically what you have, but with the src of the iframe changed to:
<iframe scrolling="yes" width="80%" height="200" src="second.jsp?content=<%= java.net.URLEncoder.encode(all, "UTF-8") %>" name="imgbox" id="imgbox"></iframe>
and the second one will be something like:
<html><body><%= request.getParameter("content") %></body></html>
In the code you've shown, you force a "content update" on the iframe using JavaScript. The proper, usual way to update an iframe is to pass different input parameters to the second JSP and let it render the update for you.
Finally, I'd recommend using JSTL as much as possible instead of scriptlets. It is much cleaner.
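When the file content travels to the second JSP as a query parameter, two things matter: the text must be URL-encoded in the iframe's src, and it should be HTML-escaped when the second JSP writes it out. The plain-Java sketch below shows both steps; the class and method names are hypothetical, and the `content` parameter name follows the answer above. For long files the URL may exceed practical length limits, in which case passing only the file path and letting the second JSP read the file itself is the safer design.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IframeSrc {

    /** Builds the iframe src for the second page, URL-encoding the content parameter. */
    static String iframeSrc(String page, String content) {
        return page + "?content=" + URLEncoder.encode(content, StandardCharsets.UTF_8);
    }

    /** Escapes characters that are special in HTML before the content is echoed back. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fileText = "line 1 & <line 2>";
        System.out.println(iframeSrc("second.jsp", fileText));
        System.out.println(escapeHtml(fileText));
    }
}
```

Inside the JSP itself, JSTL's `<c:out value="${param.content}"/>` escapes XML special characters by default, which fits the recommendation to prefer JSTL over scriptlets.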
What you need to do is set the src attribute of the IFRAME to the JSP URL when your link is clicked. Another way to do it is to name the iframe and use that name as the target of your link, something like this:
<iframe src="" name="iframe_a"></iframe>
<p><a href="https://www.w3schools.com" target="iframe_a">W3Schools.com</a></p>
with the correct parameters, of course.

How to strip all html tags and extract content using java?

I have a requirement to strip all HTML tags from a string and extract only the content. I will have HTML content as input, for example:
<html><body><input type='text' value='Hello World' size='50' /> <div> This is a basic example </div><br/><span align='center'>Hello Sam!!!</span></body></html>
I need the output as below :
Hello World. This is a basic example.
Hello Sam!!!
I have tried HtmlCleaner and also JSoup, but I couldn't find a complete sample application for either of them. I was able to extract
This is a basic example.
Hello Sam!!!
using HtmlCleaner, but I could not extract the text box value because it's an attribute. Please help.
Here's an example, using JSoup, that shows how to extract attribute values from elements.
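A minimal sketch of that idea with Jsoup (assumed to be on the classpath; the class and method names are hypothetical). It walks the elements in document order and collects the text directly inside each element, plus the `value` attribute of `<input>` elements, since a text box's content is an attribute rather than a text node.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HtmlContentExtractor {

    /** Returns visible text fragments and input values, in document order. */
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> parts = new ArrayList<>();
        for (Element el : doc.getAllElements()) {
            if (el.tagName().equals("input")) {
                // A text box keeps its content in the value attribute ("" if absent)
                String value = el.attr("value");
                if (!value.isEmpty()) {
                    parts.add(value);
                }
            } else {
                // ownText: only the text directly inside this element, not its children
                String own = el.ownText().trim();
                if (!own.isEmpty()) {
                    parts.add(own);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String html = "<html><body><input type='text' value='Hello World' size='50' /> "
                + "<div> This is a basic example </div><br/>"
                + "<span align='center'>Hello Sam!!!</span></body></html>";
        extract(html).forEach(System.out::println);
    }
}
```

Using `ownText()` rather than `text()` keeps nested elements from being reported twice, at the cost of splitting mixed content such as `<p>a <b>b</b> c</p>` into separate fragments. For pages without form fields, `doc.text()` alone already returns all the visible text.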

Is there a simple Java program that can extract the URL & title from HTML files?

Hi, I am looking for a simple URL & title extractor for HTML files in Java. I am trying to parse bookmarks.html (IE, Firefox, etc.) and add the title & URL to a DB. I need to do this in Java (no 3rd party libraries allowed), so I probably have to use SAX/DOM/regex.
You can load the file into a DOM document and then use an XPath expression to find all the instances of an <A> tag. Extracting the HREF attribute and the tag contents should do what you want. The XPath would probably be something as simple as '//A'.
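One caveat with the DOM/XPath route: Netscape-format bookmark files usually leave `<DT>` and `<p>` tags unclosed, so an XML parser will typically reject them. A JDK-only alternative, swapped in here rather than taken from the answer, is the lenient HTML parser in `javax.swing.text.html.parser`, driven through an `HTMLEditorKit.ParserCallback`. The sketch below collects each `<A>`'s href and link text; the class, record, and file name are hypothetical, and it needs Java 16+ for the record.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class BookmarkExtractor {

    record Bookmark(String url, String title) {}

    /** Collects (href, link text) pairs for every <A> element with an href. */
    static List<Bookmark> extract(Reader html) {
        List<Bookmark> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref;                 // href of the open <A>, or null
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    currentHref = (String) attrs.getAttribute(HTML.Attribute.HREF);
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data);                  // link text becomes the title
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    result.add(new Bookmark(currentHref, text.toString().trim()));
                    currentHref = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(html, callback, true); // true: ignore charset directives
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Path.of("bookmarks.html"))) {
            extract(r).forEach(b -> System.out.println(b.title() + " -> " + b.url()));
        }
    }
}
```

Because the callback only reacts to `<A>` start and end tags, the unclosed `<DT>` entries of a bookmark file do not matter.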
