Exporting contents in HTML using Java [duplicate]

This question already has answers here:
Write HTML file using Java
(11 answers)
Closed 7 years ago.
I need to create an HTML page to export some information.
Currently, using Java, I've exported information to Excel, but now I need to export the information to an HTML page. I am developing an application that tests a REST API and generates its output as HTML.
Are there any APIs I can use? Thanks

There are a lot of ways to do that. You may use any template engine you like, or write the HTML directly.
Freemarker (http://freemarker.org): an easy-to-use template engine
Velocity (https://velocity.apache.org/): another one
XML + XSLT: generate the XML with DocumentBuilder, or serialize your objects with XStream, and then apply an XSLT stylesheet. Safer (you can't emit a mismatched tag), but only pleasant if you really like XSL (most people hate it).
If your app is a web application, you may use JSP or JSPX, which are part of the servlet specification and good enough (but it's a pain to use them offline, so they're only good for web apps).
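For example, a minimal FreeMarker sketch (the template name, output file, and model fields are invented for illustration):

import freemarker.template.Configuration;
import freemarker.template.Template;
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class HtmlReportWriter {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
        cfg.setDirectoryForTemplateLoading(new File("templates"));
        cfg.setDefaultEncoding("UTF-8");

        // Data model handed to the template; names are examples only
        Map<String, Object> model = new HashMap<>();
        model.put("title", "REST API Test Report");
        model.put("results", Arrays.asList("GET /users: 200 OK", "POST /orders: 201 Created"));

        Template template = cfg.getTemplate("report.ftl"); // hypothetical template file
        try (Writer out = new FileWriter("report.html")) {
            template.process(model, out);
        }
    }
}

A matching report.ftl could be as simple as: <h1>${title}</h1><ul><#list results as r><li>${r}</li></#list></ul>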

Related

Is there an official Java API to parse HTML?

I have HTML files to which I need to add some attributes on the HTML tags using Java, but for some (stupid) reason I can't use third-party libraries. So the question is: is there an official Java API to parse HTML files? If not, what other options do I have? I'm considering adding the attributes without parsing the files, but I'm guessing that may cause problems later.

GAE: how to serve an HTML file dynamically from the server

I'm working with GAE (Java) in my GWT application.
When my users enter a certain URL I'd like to dynamically create an html page on the server side and serve it to the client.
How can I do this? Using HttpServlet?
I'm quite lost here, do I need to have an html template file on the server side that I dynamically complete and serve to the client?
You should start with the tutorial to learn the basics. You can generate the whole HTML dynamically, but that tends to get awkward. It's better to separate the HTML into a template and fill in the details with the logic implemented in the GAE application.
https://developers.google.com/appengine/docs/java/gettingstarted/
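As a rough starting point, a minimal servlet that writes HTML directly might look like this (the parameter name is made up, and in real code you would HTML-escape user input and move the markup into a template or JSP):

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DynamicPageServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/html;charset=UTF-8");
        String user = req.getParameter("user"); // hypothetical query parameter
        PrintWriter out = resp.getWriter();
        out.println("<!DOCTYPE html>");
        out.println("<html><head><title>Dynamic page</title></head><body>");
        // Warning: escape user-supplied values before echoing them in real code
        out.println("<h1>Hello, " + (user != null ? user : "anonymous") + "</h1>");
        out.println("</body></html>");
    }
}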
You can use a library like this one: https://github.com/alexmuntean/java-url-rewrite . Read the readme to understand more.
You can just take the request and serve anything you want (JSP, JSF, static HTML). You can also write GWT code to add behaviour (effects, Ajax, etc.) to the existing HTML (just add ids to the elements): write another entry point for that page and include the generated JS in your page.
I am planning to do a tutorial and POC on how to make a GWT website indexable by Google.

Create PDF with Java [duplicate]

This question already has answers here:
PDF Generation Library for Java [closed]
(6 answers)
Closed 2 years ago.
I'm working on an invoice program for a local accounting company.
What is a good way to create a PDF file with Java? Any good library?
I'm totally new to PDF export (in any language).
I prefer outputting my data into XML (using Castor, XStream or JAXB), then transforming it with an XSLT stylesheet into XSL-FO, and rendering that with Apache FOP into PDF. This has worked so far for 10-page reports and 400-page manuals. I found it more flexible and easier to style than generating PDFs in code using iText.
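A minimal sketch of that pipeline using the Apache FOP embedding API (the XML, stylesheet, and output file names are hypothetical):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class XmlToPdf {
    public static void main(String[] args) throws Exception {
        FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("invoice.pdf"))) {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            // The XSLT turns your data XML into XSL-FO; FOP renders the FO to PDF
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("invoice-to-fo.xsl")));
            Source src = new StreamSource(new File("invoice.xml"));
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
        }
    }
}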
Following are few libraries to create PDF with Java:
iText
Apache PDFBox
BFO
I have used iText for generating PDFs, with a little bit of pain, in the past.
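For reference, a "hello world" with the iText 5 API is only a few lines (file name and text are placeholders):

import java.io.FileOutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

public class HelloPdf {
    public static void main(String[] args) throws Exception {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream("invoice.pdf"));
        document.open();
        document.add(new Paragraph("Invoice #12345")); // sample content only
        document.close();
    }
}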
Or you can try using FOP: FOP is an XSL formatter written in Java. It is used in conjunction with an XSLT transformation engine to format XML documents into PDF.
Another alternative would be JasperReports: JasperReports Library. It uses iText itself and is more than the PDF library you asked for, but if it fits your needs I'd go for it.
Simply put, it allows you to design reports that are filled at runtime. If you use a custom data source, you might be able to integrate JasperReports easily into the existing system. It would save you the whole layouting trouble, e.g. when invoices span multiple pages and each page should have a footer, and so on.
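A rough sketch of that flow with the JasperReports API (the report template, parameter name, and empty data source are placeholders; a real invoice would be filled from a custom data source or a JDBC connection):

import java.util.HashMap;
import java.util.Map;
import net.sf.jasperreports.engine.JREmptyDataSource;
import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.JasperReport;

public class InvoiceReport {
    public static void main(String[] args) throws Exception {
        // Compile the report design (invoice.jrxml is hypothetical)
        JasperReport report = JasperCompileManager.compileReport("invoice.jrxml");
        Map<String, Object> params = new HashMap<>();
        params.put("CUSTOMER_NAME", "Acme Corp."); // example report parameter
        JasperPrint print = JasperFillManager.fillReport(report, params, new JREmptyDataSource());
        JasperExportManager.exportReportToPdfFile(print, "invoice.pdf");
    }
}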

Reading HTML+JavaScript using Java

I can read HTML content via HTTP (for example, from http://www.foo.com) using Java (with the URL and BufferedReader classes). However, a couple of those pages contain JavaScript, and my current app cannot process JavaScript.
What's the best way to read HTML content with JavaScript using Java?
I am open to using other languages if that is easier.
Thanks in advance for your help.
UPDATE - Clarification:
Some of the HTML content is generated dynamically by JavaScript. I can see the result (pure HTML, after the JavaScript has run) when viewing those pages in a browser.
On the other hand, when my Java app retrieves the same pages, it gets the raw source, because the JavaScript is never executed.
Ideally, I want to be able to get the same result as on the browser using my Java app.
Thanks for everyone's response.
HtmlUnit has good JavaScript support and should parse the HTML (almost) as a web browser would.
http://htmlunit.sourceforge.net/
http://htmlunit.sourceforge.net/javascript.html
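A minimal HtmlUnit sketch of the idea (the timeout is arbitrary):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class RenderedHtmlFetcher {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setThrowExceptionOnScriptError(false); // tolerate broken page scripts
            HtmlPage page = webClient.getPage("http://www.foo.com");
            webClient.waitForBackgroundJavaScript(5000); // give async scripts time to finish
            System.out.println(page.asXml()); // the DOM after JavaScript has run
        }
    }
}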
Cobra (http://lobobrowser.org/cobra/getting-started.jsp) will fit your needs
For just HTML parsing you can use HTMLParser (org.htmlparser). However, from the way you described your problem, it seems you need a browser, because executing JavaScript is totally different from just parsing HTML. Cheers.
Without a doubt, you need to use a Java HTML parser:
Java Open Source HTML Parsers
Which Html Parser is best?
HTML/XML Parser for Java
HTML PARSER in java [closed]

Scrape data from HTML pages using Java, output to database [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I need to know how to create a scraper (in Java) to gather data from HTML pages and output it to a database... I don't have a clue where to start, so any information you can give me on this would be great. Also, you can't be too basic or simple here... thanks :)
First you need to get familiar with an HTML DOM parser for Java, like JTidy. It will help you extract the stuff you want from an HTML file. Once you have the essential stuff, you can use JDBC to put it in the database.
It might be tempting to use regular expressions for this job. But don't: HTML is not a regular language, so regexes are not the way to go.
I am running a scraper using JSoup. I'm a noob, yet I found it to be very intuitive and easy to work with. It is also capable of parsing a wide range of sources: HTML, XML, RSS, etc.
I experimented with HtmlUnit with little to no success.
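For what it's worth, a minimal jsoup sketch looks like this (the URL and CSS selectors are invented; adjust them to the real markup):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupScraper {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://example.com/products").get();
        for (Element row : doc.select("table.listing tr")) { // hypothetical selector
            String name = row.select("td.name").text();
            String price = row.select("td.price").text();
            System.out.println(name + " -> " + price);
        }
    }
}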
I successfully used the Lobo Browser API in a project that scraped HTML pages. The Lobo Browser project offers a browser, but you can also use the API behind it very easily. It will also execute JavaScript, and if that JavaScript manipulates the DOM, that will be reflected when you inspect the DOM. In short, the API allows you to mimic a browser; you can also work with cookies and the like.
Now, for getting the data out of the HTML, I would first transform the HTML into valid XHTML. You can use JTidy for this. Since XHTML is valid XML, you can use XPath to retrieve the data you want very easily. If you try to write code that parses the data out of the raw HTML, your code will become a mess quickly, therefore I'd use XPath.
Once you have the data, you can insert it into a DB with JDBC, or maybe use Hibernate if you want to avoid writing too much SQL.
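A rough sketch of that JTidy + XPath + JDBC flow (the URL, XPath expression, connection string, and table are all hypothetical):

import java.io.InputStream;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

public class TidyXPathScraper {
    public static void main(String[] args) throws Exception {
        // 1. Normalize the page to XHTML with JTidy
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        Document doc;
        try (InputStream in = new URL("http://example.com/listing").openStream()) {
            doc = tidy.parseDOM(in, null);
        }

        // 2. Pull the values out with XPath
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate("//td[@class='price']", doc, XPathConstants.NODESET);

        // 3. Store them with plain JDBC
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./scrape", "sa", "");
             PreparedStatement ps = conn.prepareStatement("INSERT INTO prices(value) VALUES (?)")) {
            for (int i = 0; i < items.getLength(); i++) {
                ps.setString(1, items.item(i).getTextContent());
                ps.executeUpdate();
            }
        }
    }
}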
A HUGE percentage of websites are built on malformed HTML code. It is essential that you use something like HtmlCleaner to clean up the source code you want to parse.
Then you can use XPath to extract nodes, and regex to parse specific parts of the strings you extracted from the page.
At least this is the technique I used.
You can use the XHTML returned by HtmlCleaner as a sort of interface between your application and the remote page you're trying to parse. You should test against this, and in case the remote page changes you just have to extract the new XHTML cleaned by HtmlCleaner, re-adapt the XPath queries to extract what you need, and re-test your application code against the new interface.
If you want to create a multithreaded scraper, be aware that HtmlCleaner is not thread-safe (refer to my post here).
This post can give you an idea of how to parse correctly formatted XHTML using XPath.
Good Luck! ;)
Note: at the time I implemented my scraper, HtmlCleaner did a better job of normalizing the pages I wanted to parse. In some cases JTidy failed at the same job, so I'd suggest you give it a try.
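A minimal HtmlCleaner + XPath sketch of the approach described above (the URL and XPath expression are made up; note that TagNode.evaluateXPath supports only a limited XPath subset):

import java.net.URL;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

public class HtmlCleanerScraper {
    public static void main(String[] args) throws Exception {
        HtmlCleaner cleaner = new HtmlCleaner();
        TagNode root = cleaner.clean(new URL("http://example.com/page"));
        Object[] nodes = root.evaluateXPath("//div[@class='price']"); // hypothetical expression
        for (Object node : nodes) {
            System.out.println(((TagNode) node).getText());
        }
    }
}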
Using JTidy you can scrape data from HTML. Then you can use JDBC.
