Is there an official Java API to parse HTML? - java

I have some HTML files in which I need to add attributes to HTML tags using Java, but for some (stupid) reason I can't use third-party libraries. So the question is: is there an official Java API to parse HTML files? If not, what other options do I have? I'm thinking of adding the attributes without parsing the files, but I'm guessing that may cause problems later.

Related

Exporting contents in html using java [duplicate]

This question already has answers here:
Write HTML file using Java
(11 answers)
Closed 7 years ago.
I need to create an html page to export some information.
Currently, using Java, I export the information to Excel. Now I need to export it to an HTML page instead. I am developing an application that tests a REST API and generates its output as HTML.
Are there any APIs I can use? Thanks
There are a lot of ways to do that. You may use any template engine you like or write HTML directly.
Freemarker (http://freemarker.org): an easy-to-use template engine
Velocity (https://velocity.apache.org/): another template engine
XML + XSLT: Generate XML with DocumentBuilder, or serialize your objects with XStream, and then apply XSLT. This is safer (you can't forget a closing tag), but only appealing if you really like XSL (most people hate it).
If your app is a web application, you may use JSP or JSPX, which are part of the servlet specification and good enough (but they are a pain to use offline, so they only suit web apps).
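To illustrate the XML + XSLT option using only classes that ship with the JDK (javax.xml.parsers and javax.xml.transform), here is a minimal sketch. The class name XsltHtmlReport, the results/result element names, the sample rows, and the stylesheet are all made up for this example; a real report would load its stylesheet from a file and use its own data model.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XsltHtmlReport {

    // Hypothetical stylesheet: turns <results><result name=".." status=".."/></results>
    // into an HTML table with one row per result.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/results'>"
        + "<html><body><table>"
        + "<xsl:for-each select='result'>"
        + "<tr><td><xsl:value-of select='@name'/></td>"
        + "<td><xsl:value-of select='@status'/></td></tr>"
        + "</xsl:for-each>"
        + "</table></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Builds an XML document from {name, status} rows and renders it to HTML via XSLT. */
    public static String render(String[][] rows) throws Exception {
        // Step 1: build the intermediate XML with the standard DOM API.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(null, "results");
        doc.appendChild(root);
        for (String[] row : rows) {
            Element result = doc.createElementNS(null, "result");
            result.setAttributeNS(null, "name", row[0]);
            result.setAttributeNS(null, "status", row[1]);
            root.appendChild(result);
        }

        // Step 2: apply the stylesheet; the serializer escapes text and closes tags for us.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(new String[][] {
            {"GET /users", "200"},
            {"GET /users?q=<b>", "400"}
        }));
    }
}
```

The point of going through a transformer rather than concatenating strings is that markup characters in the data (such as the `<b>` in the second sample row) are escaped on output instead of being written as tags.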

library to bring a pdf to code

I need to create some PDFs from Java code. The PDF templates are fixed and very complex. Each PDF represents a form that I have to fill in with information I get from other sources in my application, and then produce the complete PDF.
So, is there a library which, given a PDF, scans it and creates Java code that will eventually recreate it? That way, once I have the PDF template as Java code, I just have to edit it to add my information and generate the PDF again.
You can check these two links; they have all the information on PDF CRUD operations.
How to read PDF files using Java?
http://mrbool.com/how-to-create-write-and-read-pdf-files-using-pdfone-and-java/27058
I have successfully used iText (http://itextpdf.com/) for various tasks involving creating, parsing and modifying PDF files. Please note that it is not free for commercial use, and the pricing isn't cheap either.
However, your question is a possible duplicate of the following one, so be sure to check the answers there:
https://stackoverflow.com/questions/6118635/what-is-the-best-pdf-open-source-library-for-java
iText is one of the best options for your needs.
iText APIs

Which free Java library can I use to generate PDFs in Java?

I need to generate some PDF documents through Java. Which API or library can I use to do this most effectively?
EDIT: Added requirements:
I'll be using this for a commercial application, so I'd like to work with a library that is free to use in commercial applications as well.
Secondly, my work will be like this: I have a pre-defined PDF file which has blank text fields in it. This would be my 'template file', and can be generated manually. Within my program, I would take this file, put data into the text fields and generate new PDFs. This would be done repeatedly.
So, given the added requirements above, what would be the best library for me now? iText does seem appealing, but it seems I would have to pay for it if I wanted to use it in a commercial app, which I'd like to avoid.
http://itextpdf.com/download.php
Check out iText:
http://itextpdf.com/
I am using Apache PDFBox to generate PDFs in Java.
Reference : https://pdfbox.apache.org/
iText is probably the best, but if you cannot live with their license there is also Apache PDFBox.
If you are creating your PDF templates yourself, then you might want to look at Docmosis - it lets you create templates in Word or OpenOffice with fields that you replace when you render documents. There is a free version you can distribute with your application if your document volumes are low.

Reading HTML+JavaScript using Java

I can read the HTML contents via http (for example, http://www.foo.com) using Java (with URL and BufferedReader classes). However, a couple of them contain JavaScript. My current app cannot process JavaScript.
What's the best way to read HTML content with JavaScript using Java?
I am open to using other languages if it is easier.
Thanks in advance for your help.
UPDATE - Clarification:
Some of the HTML content is generated dynamically using JavaScript. I can see the result (as pure HTML after the JavaScript has run) when viewing the pages in a browser.
On the other hand, when my Java app retrieves the HTML content, the page reports that JavaScript is not available in my app.
Ideally, I want to be able to get the same result as on the browser using my Java app.
Thanks for everyone's response.
HtmlUnit has good JavaScript support and it should parse the HTML (almost) as a web browser would.
http://htmlunit.sourceforge.net/
http://htmlunit.sourceforge.net/javascript.html
Cobra (http://lobobrowser.org/cobra/getting-started.jsp) will fit your needs.
For just HTML parsing you can use HTMLParser (org.htmlparser). However, from the way you described your problem, it seems you need a browser, because executing JavaScript is very different from just parsing HTML. Cheers.
Without a doubt, you need to use a Java HTML parser:
Java Open Source HTML Parsers
Which Html Parser is best?
HTML/XML Parser for Java
HTML PARSER in java [closed]

How to parse and modify HTML file in Java

I am doing a project wherein I need to read an HTML file and identify specific tags, modify the contents of the tag, and create a new HTML file. Is there a library that parses HTML tags and is capable of writing the tags back to a new file?
Check out http://jsoup.org. It has a friendly DOM-like API. For simple tasks you don't need to parse the HTML.
If you want to modify a web page and return the modified content, I think the best way is to use an XSL transformation.
http://en.wikipedia.org/wiki/XSLT
There are many HTML parsers. You could use JTidy or NekoHTML, or check out TagSoup.
I usually prefer parsing XHTML with the standard Java XML parsers, but you can't do this for arbitrary HTML.
Look at http://java-source.net/open-source/html-parsers for a list of Java libraries that parse HTML files into Java objects that can be manipulated.
If the HTML files you are working with are well formed (XHTML), then you can also use the XML libraries in Java to find particular tags and modify them. The I/O itself should be handled by the particular libraries you are using.
If you choose to parse the strings manually, you could use regular expressions to find particular tags and use the Java I/O libraries to write the new HTML documents to files. But this approach reinvents the wheel, so to speak: you have to manage tag opening and closing yourself, and all of that is already handled by existing libraries.
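For the well-formed (XHTML) case, the standard XML APIs in the JDK are enough to find tags, add an attribute and write the result back out. The following is a minimal sketch under that assumption; the class name XhtmlAttributeAdder, the helper method and the data-reviewed attribute are invented for illustration. It only works when the input parses as XML, so ordinary HTML with unclosed tags will be rejected by the parser, and input with a DOCTYPE may cause the parser to try to fetch the referenced DTD.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlAttributeAdder {

    /** Sets attr=value on every element named tagName and returns the serialized document. */
    public static String addAttribute(String xhtml, String tagName, String attr, String value)
            throws Exception {
        // Parse the well-formed input into a DOM tree; malformed markup throws a SAXException.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Find the target tags and add (or overwrite) the attribute on each one.
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            ((Element) nodes.item(i)).setAttribute(attr, value);
        }

        // Serialize the tree back to text with an identity transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<html><body><p>one</p><p class=\"x\">two</p></body></html>";
        System.out.println(addAttribute(input, "p", "data-reviewed", "true"));
    }
}
```

To work with files instead of strings, builder.parse also accepts a java.io.File, and StreamResult can be constructed from a File as well.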
