Write to docx with Apache POI and enriched text (HTML) [duplicate] - java

This question already has an answer here:
How to set define different styles for the same paragraph
(1 answer)
Closed 2 years ago.
I'm writing a docx with Apache POI and I have enriched text like "<p>This is a Paragraph with <strong>enriched text</strong>.</p>". I need the Word document to keep the styling, such as the "strong" emphasis.

If you only need bold, italic, or underlined text, then checking for tags like strong yourself and using the built-in Apache POI methods setBold/setItalic on the run class can be an option. If more complex HTML structures such as line breaks and lists are also included, a better option would be to use the jsoup library in combination with Apache POI. I have had good experience with this combination.
Simple example: How do I use Jsoup to parse html for text?
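
For the simple bold/italic case, the tag checking can be sketched with the JDK alone. This is a minimal sketch, not a full HTML parser: StyledTextSplitter and Segment are hypothetical names, only strong/b and em/i are recognised, every other tag is dropped, and entities such as &amp; are left undecoded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Apache POI): splits simple inline HTML
// into text segments that remember whether they were bold and/or italic.
class StyledTextSplitter {

    /** One piece of text plus the inline styles active where it appeared. */
    static final class Segment {
        final String text;
        final boolean bold;
        final boolean italic;

        Segment(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    // Matches any opening or closing tag; group 1 is "/" for closing tags,
    // group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static List<Segment> split(String html) {
        List<Segment> segments = new ArrayList<>();
        int boldDepth = 0;
        int italicDepth = 0;
        int pos = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            // Text between the previous tag and this one keeps the current style.
            addSegment(segments, html.substring(pos, m.start()), boldDepth > 0, italicDepth > 0);
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (name.equals("strong") || name.equals("b")) {
                boldDepth = Math.max(0, boldDepth + (closing ? -1 : 1));
            } else if (name.equals("em") || name.equals("i")) {
                italicDepth = Math.max(0, italicDepth + (closing ? -1 : 1));
            }
            // Any other tag (for example <p>) is dropped without changing the style.
            pos = m.end();
        }
        addSegment(segments, html.substring(pos), boldDepth > 0, italicDepth > 0);
        return segments;
    }

    private static void addSegment(List<Segment> out, String text, boolean bold, boolean italic) {
        if (!text.isEmpty()) {
            out.add(new Segment(text, bold, italic));
        }
    }
}
```

Each segment would then become one run in the Word paragraph, with setBold and setItalic set from its flags. For lists, tables, or malformed markup, a real parser such as jsoup is the safer route.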

Related

Apache poi open xml tag search

I'm trying to parse a .docx (Open XML) file using Apache POI in Java. I want to be able to extract tags like these: <w:tag w:val="tag"/> from my document. The problem is that I didn't find any examples of how to do it on the internet. Is it possible to achieve something like this using the Apache POI library for Java or some other library?
Similar question but in C# for reference: OpenXML tag search
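
One way to approach this without a dedicated search API is to read the XML part directly. A .docx file is a ZIP archive whose main body is stored in word/document.xml, so the JDK's ZIP and DOM classes are enough to find w:tag elements. The following is a minimal sketch, assuming the tags of interest live in the main document part (not in headers or footers); WordTagFinder and its methods are hypothetical names, not Apache POI API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the w:val attribute of every w:tag element.
class WordTagFinder {

    // Namespace bound to the "w" prefix in WordprocessingML documents.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses the given document XML and returns all w:tag values in document order. */
    static List<String> findTagValues(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(documentXml);
            NodeList tags = doc.getElementsByTagNameNS(W_NS, "tag");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < tags.getLength(); i++) {
                Element tag = (Element) tags.item(i);
                values.add(tag.getAttributeNS(W_NS, "val"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }

    /** Opens the .docx as a ZIP archive and scans its main document part. */
    static List<String> findTagValuesInDocx(String docxPath) {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalStateException("No word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return findTagValues(in);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not read " + docxPath, e);
        }
    }
}
```

Calling findTagValuesInDocx with the path of a document would return the list of tag values; whether the same lookup is convenient through Apache POI's own object model is not covered here.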

Simple PDF generation via Java batch: iText or Apache FOP?

I have to generate a simple PDF document from a small Java batch job (Java 7). The generated document will contain a list and a couple of tables (nothing fancy). Aside from license problems (AGPL is not an issue in this case), which library is faster and easier to implement, and which performs better for the desired output: iText or Apache FOP?
As you said, you don't need fancy tables and you want a library that is fast and easy to implement, so I'd prefer iText because it is much simpler than Apache FOP. It is very easy to add lists and tables to a PDF document with iText. Apache FOP is aimed at generating PDF documents from data that is stored in XML; its main objective is to convert XML files to PDF.
For more details, see: http://blog.xebia.com/comparing-apache-fop-with-itext/

Convert HTML to Microsoft Word Document in Java [duplicate]

This question already has answers here:
Convert html to doc in java
(5 answers)
Writing HTML content into MS WORD using JAVA?
(2 answers)
Closed 9 years ago.
I know it is a repetitive question, but most of the answers are not straightforward for this question. Some say, convert HTML to XHTML and then convert it to Word doc. Some say, right-click on the page and select 'Save as doc'. But my question is, is there any particular API for this, which simply converts the HTML to Doc?
To elaborate, is there any API like iText (which we use for PDF) for Word doc generation?
Thanks.
The Apache POI project's goal is to provide a comprehensive Java API for Microsoft Word documents.
However, you should keep in mind that Word formatting is awfully complex and messy. It might be easier to go with a cleaner or more abstracted specification, like PDF.
Best of luck.

Create PDF with Java [duplicate]

This question already has answers here:
PDF Generation Library for Java [closed]
(6 answers)
Closed 2 years ago.
I'm working on an invoice program for a local accounting company.
What is a good way to create a PDF file with Java? Any good library?
I'm totally new to PDF export (in any language).
I prefer outputting my data into XML (using Castor, XStream or JAXB), then transforming it with an XSLT stylesheet into XSL-FO and rendering that with Apache FOP into PDF. This has worked so far for 10-page reports and 400-page manuals. I found this more flexible and stylable than generating PDFs in code using iText.
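
The first half of that pipeline (XML to XSL-FO via XSLT) can be sketched with the JDK's built-in javax.xml.transform API. This is a minimal sketch under assumptions: the invoice/item input format, the stylesheet, and the InvoiceToFo class are all hypothetical, and the final step of handing the XSL-FO to Apache FOP for PDF rendering is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: turns a tiny invoice XML into an XSL-FO document.
class InvoiceToFo {

    // Hypothetical stylesheet: one A4 page, one fo:block per <item> element.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/invoice'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm'>"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:for-each select='item'>"
        + "<fo:block><xsl:value-of select='.'/></fo:block>"
        + "</xsl:for-each>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the invoice XML and returns the XSL-FO as a string. */
    static String toXslFo(String invoiceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(invoiceXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

In a real project the XML would typically come from the marshalling library (Castor, XStream or JAXB) and the stylesheet from a file, which keeps the layout editable without recompiling.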
The following are a few libraries for creating PDFs with Java:
iText
Apache PDFBox
BFO
I have used iText for generating PDFs with a little bit of pain in the past.
Or you can try using FOP: FOP is an XSL formatter written in Java. It is used in conjunction with an XSLT transformation engine to format XML documents into PDF.
Another alternative would be JasperReports: JasperReports Library. It uses iText itself and is more than the PDF library you asked for, but if it fits your needs I'd go for it.
Simply put, it allows you to design reports that are filled in at runtime. If you use a custom data source, you might be able to integrate JasperReports easily into the existing system. It would save you all the layout trouble, e.g. when invoices span multiple pages and each page should have a footer.

xlsx file reading in java [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Read xlsx file in Java
Can anybody explain how to read an xlsx file in Java?
Try Apache POI - the Java API for Microsoft Documents
Have a look at http://poi.apache.org/spreadsheet/index.html
"Why would you use docx4j to do this", I hear you ask, "rather than POI, which focuses on xlsx and binary xls?"
Probably because you like JAXB (as opposed to XML Beans), or you are already using docx4j for docx or pptx, and need to be able to do some stuff with xlsx as well.
Another possible reason is that the jar XML Beans generates from the OpenXML schemas is too big for your purposes. (To get around this, POI offers a 'lite' subset: the 'big' ooxml-schemas-1.0.jar is 14.5 MB! But if you need to support arbitrary spreadsheets, you'll probably need the complete jar). In contrast, the whole of docx4j/pptx4j/xlsx4j weighs in at about the same as POI's lite subset.
If you are processing spreadsheets only (i.e. not docx or pptx), and the preceding paragraph is not a concern for you, then you would probably be best off using POI.
