Getting DOM of webpage with Java

I am making a new browser and I need to get the DOM of a webpage to do it. How do I do this?

You will have to write an HTML parser or use an existing one such as Jsoup (http://jsoup.org/).
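For example, a minimal sketch using Jsoup; the URL here is just a placeholder:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DomExample {
    public static void main(String[] args) throws Exception {
        // Fetch a page and parse it into a DOM-like tree.
        Document doc = Jsoup.connect("https://example.com/").get();

        // Traverse it much like a browser DOM: select elements,
        // read their text and attributes.
        System.out.println(doc.title());
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("abs:href"));
        }
    }
}
```

Note that Jsoup parses the static HTML only; it does not execute JavaScript.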

Related

How do I get the HTML (with JS script) of a page using Jsoup

I want to get the HTML content of a page but am unable to because of the scripts that are in the HTML file. I'm trying to use Jsoup to extract the content.
If it helps, this is the link to my issue:
JSoup select form returns null
Does anyone know how I can achieve this? Thanks.

How to read current source html from a webpage using Java/Perl/Python (e.g. after editing it using the js console)

I realize this looks like a duplicate question, but it's not! (As far as I know, and I've searched a lot...) For the last few days I've been trying to get the HTML content of my WhatsApp web application, but reading it with Java's InputStreamReader doesn't give me the full HTML code. The URL I'm using is just https://web.whatsapp.com/, which I suppose could be a problem, but there aren't any personal URLs as far as I'm aware. However, in developer tools, using the element inspector, I can easily access and read the DOM elements I'm interested in. I'm wondering if there's a way I can get this source directly using Java/Perl/Python.
I'm also looking to do this as a learning project, so preferably would like to stay away from tools such as jsoup and such. Thanks!
You can use Selenium WebDriver in Python. Something like:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://web.whatsapp.com/")
html = browser.page_source
If you want to get your own WhatsApp page, you should use Selenium to log into the site before getting the page_source.
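As for why plain Java falls short here: reading the URL directly returns only the initial HTML the server sends, before any JavaScript runs, which is why the page looks incomplete compared to the element inspector. A stdlib-only sketch of that raw read:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RawFetch {
    // Read an entire stream into a String (UTF-8 assumed).
    static String readAll(InputStream in) throws java.io.IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("https://web.whatsapp.com/").openConnection();
        // This is the pre-JavaScript HTML only; the DOM you see in the
        // element inspector is built later, client-side.
        String html = readAll(conn.getInputStream());
        System.out.println(html.length());
    }
}
```

To capture the post-JavaScript DOM from Java, you need something that drives a real browser, like Selenium's Java bindings.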

Get DOM after JavaScript runs

I need to get the contents of a URL as a String, but the page this URL points to has JavaScript that runs on page load and manipulates the DOM. How can I retrieve the HTML with the JavaScript DOM manipulation included? Is something like Selenium the right option? If so, how would I do that?
Try doing this:
Wait until the page's JavaScript has finished (an explicit wait is more reliable than a fixed pause),
and then use driver.getPageSource()
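In Java that might look like the sketch below. The URL and the element id waited on are placeholder assumptions; the idea is to wait for something the page's JavaScript creates, rather than sleeping for a fixed time:

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedSource {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com/");
            // Wait until an element that the page's JavaScript creates
            // is present, instead of pausing for a fixed interval.
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.id("content")));
            // Now the source reflects the JS-driven DOM changes.
            String html = driver.getPageSource();
            System.out.println(html.length());
        } finally {
            driver.quit();
        }
    }
}
```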

Saving page content using selenium

I am using Selenium to gather data on a web portal. The problem here is that the data is in XML format, but the URL extension is not .xml; it is displayed as .aspx since it is a .NET website. Using Selenium I can get the page source with driver.getPageSource()
But that gives me the content as HTML. Separating the XML out of the HTML is a real pain, and I have tried options such as Jsoup, but there seems to be too much parsing to do.
Is there any other way to make Selenium drive the browser? I can see that File > Save As gives me an option to save the web page in XML format. How can I do this in Selenium? Are there any other APIs that can help me out here?
Edit: My browser here is Internet Explorer
Have you tried this?
String pageSource=driver.findElement(By.tagName("body")).getText();
Check the pageSource content; if it contains only the XML, you can write it to a file using file operations.
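For the file-writing part, a plain-Java sketch; here the `xml` string stands in for whatever getText() or driver.getPageSource() returned:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveSource {
    // Write text captured from Selenium straight to a file.
    static Path save(String content, Path out) throws IOException {
        return Files.writeString(out, content);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>demo</item></root>"; // placeholder content
        Path written = save(xml, Path.of("page.xml"));
        System.out.println("wrote " + Files.size(written) + " bytes");
    }
}
```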

Use Jsoup to get comments from HTML and save as an XML file

I need help!
Go to the page: http://www.tweetfeel.com/
Then type in "linkin park". How can I use Jsoup to get all the user comments and save them as XML?
I'm using Java with NetBeans.
Not sure if this is possible with Jsoup alone, since the content is generated dynamically by JavaScript.
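If you do render the page first (e.g. with Selenium and driver.getPageSource()), you can still hand the resulting HTML to Jsoup. A sketch of that second step; the `.comment` selector is a made-up assumption about the page's markup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CommentsToXml {
    // `renderedHtml` would come from a tool that executes JavaScript,
    // e.g. Selenium's driver.getPageSource().
    static String commentsAsXml(String renderedHtml) {
        Document doc = Jsoup.parse(renderedHtml);
        StringBuilder xml = new StringBuilder("<comments>");
        for (Element c : doc.select(".comment")) { // selector is hypothetical
            // Note: real output should escape XML entities in the text.
            xml.append("<comment>").append(c.text()).append("</comment>");
        }
        return xml.append("</comments>").toString();
    }

    public static void main(String[] args) {
        String html = "<div class=\"comment\">great show</div>";
        System.out.println(commentsAsXml(html));
    }
}
```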
