Use Jsoup to get comments from HTML and save as an XML file - Java

I need some help!
Go to the page http://www.tweetfeel.com/ and type in "linkin park". Can Jsoup be used to get all the user comments and save them as XML?
I'm using Java with NetBeans.

Not sure if this is possible with Jsoup, since the content is generated dynamically by JavaScript.
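For a page that serves its comments in the static HTML, a minimal sketch of the Jsoup-to-XML idea might look like the following (the .comment selector and the comments.xml path are assumptions; since tweetfeel renders its results with JavaScript, Jsoup alone will not see them):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.FileWriter;
import java.io.IOException;

public class CommentsToXml {
    public static void main(String[] args) throws IOException {
        // Fetch and parse the static HTML (no JavaScript is executed)
        Document doc = Jsoup.connect("http://www.tweetfeel.com/").get();

        // Hypothetical selector - the real comment markup would need to be inspected
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<comments>\n");
        for (Element comment : doc.select(".comment")) {
            String text = comment.text()
                    .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            xml.append("  <comment>").append(text).append("</comment>\n");
        }
        xml.append("</comments>\n");

        // Write the result out as an XML file
        try (FileWriter writer = new FileWriter("comments.xml")) {
            writer.write(xml.toString());
        }
    }
}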

Related

How do I get the HTML (with JS scripts) of a page using Jsoup

I want to get the HTML content of a page but am unable to because of the scripts in the HTML file. I'm trying to use Jsoup to extract the content.
If it helps, this is the link to my issue:
JSoup select form returns null
Does anyone know how I can achieve this? Thanks.

Extracting data from a web page with jQuery content

I need to extract data from particular websites, say the comment section of a site. What I have already tried is extracting the HTML text using Jsoup, but since the comment section is populated by jQuery, it only extracts the jQuery code, not the comment text. Any suggestions to solve my problem? Thank you.
You can use HtmlUnit to render the page with all the needed content and then extract data from the built DOM tree. Here you can find info on what to do if AJAX doesn't work out of the box.
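A rough sketch of that approach, assuming HtmlUnit 2.x and a placeholder URL and selector, handing the rendered DOM over to Jsoup for extraction:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RenderedPageScraper {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // Load the page and let the JavaScript/jQuery populate the comments
            HtmlPage page = webClient.getPage("http://example.com/article");
            webClient.waitForBackgroundJavaScript(5000);

            // Hand the rendered DOM to Jsoup for the usual selector-based extraction
            Document doc = Jsoup.parse(page.asXml());
            doc.select(".comment").forEach(c -> System.out.println(c.text()));
        }
    }
}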

How to save a JSP page as PDF in Java?

I have a page built with JSP and Struts. The page loads dynamic content.
I want to save the page as a PDF file, with all its contents and the same formatting, on a button click. If I can save the page with all its contents, I can convert it to PDF.
How do I save a JSP page with these properties as a PDF?
Thanks in advance.
I was researching this topic lately and found that a better approach is to use JavaScript on the client side to generate the PDF.
There are a few libraries that can do it for you; choose your way, e.g.:
Generate pdf from HTML in div using Javascript
http://html2canvas.hertzen.com/
https://github.com/MrRio/jsPDF
(:

Saving page content using Selenium

I am using Selenium to gather data from a web portal. The problem is that the data is in XML format, but the URL extension is not .xml; it is displayed as .aspx since it is a .NET website. Using Selenium I can get the page source with driver.getPageSource(),
but it gives me the content wrapped in HTML. Separating the XML from the HTML is really a pain, and I have tried many options such as Jsoup, but there seems to be too much parsing to be done.
Is there any other way to make Selenium manipulate the browser? I can see that File > Save As gives me an option to save the web page in XML format. How do I do this in Selenium? Are there any other APIs that can help me out here?
Edit: My browser here is Internet Explorer.
Have you tried it like this?
String pageSource = driver.findElement(By.tagName("body")).getText();
Check the pageSource content; if it contains only the XML, you can write it to a file using standard file operations.
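As a rough sketch along those lines (the .aspx URL and output path are placeholders, and this assumes the rendered body text is the raw XML):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.ie.InternetExplorerDriver;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SaveXmlFromPortal {
    public static void main(String[] args) throws Exception {
        WebDriver driver = new InternetExplorerDriver();
        try {
            driver.get("http://example.com/report.aspx"); // placeholder URL

            // getText() on <body> returns the rendered text, i.e. the XML without the HTML wrapper
            String xml = driver.findElement(By.tagName("body")).getText();

            // Plain file operations are enough to persist it as .xml
            Files.write(Paths.get("report.xml"), xml.getBytes(StandardCharsets.UTF_8));
        } finally {
            driver.quit();
        }
    }
}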

Getting the DOM of a webpage with Java

I am making a new browser, and to do that I need to get the DOM of a webpage. How do I do this?
You will have to write an HTML parser or use an existing one such as http://jsoup.org/
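For example, a minimal sketch with Jsoup (placeholder URL): it parses the page into Jsoup's own tree and, if a standard org.w3c.dom.Document is needed, converts it with Jsoup's W3CDom helper.

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public class DomExample {
    public static void main(String[] args) throws Exception {
        // Parse the page into Jsoup's own Document tree
        org.jsoup.nodes.Document jsoupDoc = Jsoup.connect("http://example.com/").get();

        // Convert to a standard org.w3c.dom.Document if other DOM-based APIs need it
        org.w3c.dom.Document w3cDoc = new W3CDom().fromJsoup(jsoupDoc);

        System.out.println(w3cDoc.getDocumentElement().getTagName()); // prints "html"
    }
}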
