I can read HTML content over HTTP (for example, from http://www.foo.com) in Java, using the URL and BufferedReader classes. However, a couple of the pages contain JavaScript, and my current app cannot process it.
What's the best way, using Java, to read HTML content that contains JavaScript?
I am open to using other languages if that is easier.
Thanks in advance for your help.
UPDATE - Clarification:
Some of the HTML content is generated dynamically by JavaScript. I can see the result (plain HTML after the JavaScript has run) when viewing the pages in a browser.
On the other hand, when my Java app retrieves the HTML content, the response says that JavaScript is not available in my app.
Ideally, I want to be able to get the same result as on the browser using my Java app.
Thanks for everyone's response.
HtmlUnit has good JavaScript support and should process the HTML (almost) as a web browser would.
http://htmlunit.sourceforge.net/
http://htmlunit.sourceforge.net/javascript.html
Cobra (http://lobobrowser.org/cobra/getting-started.jsp) will fit your needs.
For just HTML parsing you can use HTMLParser (org.htmlparser). However, from the way you described your problem, it seems you need a browser, because executing JavaScript is entirely different from just parsing HTML. Cheers.
Without a doubt, you need to use a Java HTML parser:
Java Open Source HTML Parsers
Which Html Parser is best?
HTML/XML Parser for Java
HTML PARSER in java [closed]
Without the use of any external library, what is the simplest way to fetch a website's HTML content into a string? I tried, but I get the complete page source, whereas I only want the HTML content.
I think this is a bit difficult to achieve without an external library, my friend.
You actually want to execute the JavaScript parts of the HTML and act like a GUI-less web browser programmatically.
If you can use an external library, I would go for HtmlUnit (http://htmlunit.sourceforge.net/), which is pretty easy to use.
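As a point of comparison, fetching the raw markup needs nothing beyond the standard library. The sketch below is only an illustration: the class and method names are hypothetical, and UTF-8 is assumed as the page encoding. It returns the source as the server sends it and does not run any JavaScript, so dynamically generated content will be missing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative and not from the thread.
final class RawHtmlFetcher {

    // Reads everything from the reader into one string,
    // normalizing line endings to '\n'.
    static String readAll(Reader source) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Downloads the raw markup. Assumption: the page is UTF-8 encoded.
    // No JavaScript is executed here.
    static String fetch(String address) throws IOException {
        return readAll(new InputStreamReader(
                URI.create(address).toURL().openStream(),
                StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Only touches the network when a URL is passed on the command line.
        if (args.length == 1) {
            System.out.print(fetch(args[0]));
        }
    }
}
```

Keeping `readAll` separate from `fetch` means the reading logic can be exercised with an in-memory `StringReader`, without a network connection.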
I want to edit HTML pages using Java. Is it possible? (It is required for my project: I automate the test cases using Selenium, and when a test passes, the result goes into the company-provided HTML sheet.)
If you use JavaFX for the GUI, you can use its built-in HTML editor.
Well, there are a lot of different ways to edit HTML.
If you want to parse HTML code and manipulate it, http://jsoup.org/ is a very good solution.
I am looking for a Java tool to scrape a CSV from a website and then parse the data. Jsoup seems like a viable option. Is there a way to scrape a CSV file and then save the information to a database using Jsoup?
Or is it strictly for scraping HTML code? Thanks.
No, it ain't gonna work. Look at the Jsoup description:
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.
What you are asking for is how to parse a CSV file in Java. This question might be helpful for you:
Fast CSV parsing
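To illustrate the point that CSV is plain text rather than HTML, here is a minimal sketch of splitting one CSV record into fields using only the standard library. The class and method names are hypothetical. It handles double-quoted fields and doubled quotes inside them, but not records that span several lines; a dedicated CSV library is likely the better choice for real data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a sketch, not a complete CSV implementation.
final class CsvLineSketch {

    // Splits one CSV record into fields. Supports double-quoted fields and
    // doubled quotes ("") inside them. Does not support multi-line records.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1,\"Smith, John\",\"said \"\"hi\"\"\","));
    }
}
```

Each parsed record could then be written to a database, for example through a JDBC `PreparedStatement`; that part is left out here.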
I am looking for a good SQL syntax highlighter that can be easily integrated with a component-based framework (JSF, ZK). Any idea which would be best for me? I tried CodeMirror, but the binding is not working. Can anyone suggest another one that integrates easily? I do not want to open the output in a JFrame or applet; it should be in the browser.
Using prettify is a good solution, but this is a JavaScript library working client-side in the browser.
If you want to prepare your source code (Java, SQL, Python, Bash, HTML, XML, CSS, JavaScript...) server-side in pure Java and send it as HTML with span tags that color the text (i.e. syntax highlighting), you can use java-prettify. This is a port of the JavaScript library to Java.
I have explained how you can use the parser to produce highlighted HTML code here: use the parser to create HTML. Have a look at the code in the Java class PrettifyToHtml and at the example.
I'm writing a program (in Java) that needs to extract links from webpages. I'm using htmlParser (http://htmlparser.sourceforge.net/), but I'm only able to extract HTML links (defined with <a href="...">), and I don't know how to handle JavaScript code to extract links from it. Can you help me?
You can use Rhino with a DOM environment written in JavaScript.
By the way, that DOM environment was written by John Resig.
HTML Parser from sourceforge is useful. I have used it to parse a whole bunch of HTML already. However, parsing JS is different. Cheers.
This is probably the most comprehensive tool out there: Rhino. Everything you want to do can be done with Rhino.
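If running the scripts is more than you need, a much cruder alternative is to scan the script text for absolute URLs written as string literals. The sketch below does not use Rhino and does not execute any JavaScript; it is a plain regular-expression scan with hypothetical class and method names, so it will miss links that are built at runtime (for example by string concatenation).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a static text scan, not a JavaScript interpreter.
final class ScriptLinkSketch {

    // Matches absolute http(s) URLs that appear as quoted string literals.
    private static final Pattern QUOTED_URL =
            Pattern.compile("[\"'](https?://[^\"'\\s]+)[\"']");

    // Returns the URLs found in the given script source, in order of appearance.
    static List<String> findUrls(String scriptSource) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = QUOTED_URL.matcher(scriptSource);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String script = "window.location = 'http://www.foo.com/a';";
        System.out.println(findUrls(script));
    }
}
```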