Updating a .docx file's page header using Apache POI - java

How can I update the page header of a .docx file using the Apache POI 3.7 API?

Since your document is in .docx format, you'll need to use the XWPF component API of the POI project. You may find the org.apache.poi.xwpf.usermodel.XWPFHeader class useful (Javadoc), but I've never used it myself.
I couldn't find a good reference for doing this with XWPF, but the following instructions describe accessing headers with HWPF, the analogous interface for older Word documents (AKA .doc files):
To get at the headers and footers of a Word document, first create an org.apache.poi.hwpf.HWPFDocument. Next, create an org.apache.poi.hwpf.usermodel.HeaderStories, passing it your HWPFDocument. The HeaderStories then gives you access to the headers and footers, including first / even / odd page ones if they are defined in your document. Additionally, HeaderStories provides a method for removing any macros in the text, which is helpful because many headers and footers end up with macros in them.
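A minimal sketch of those HWPF steps, assuming a recent POI (4.x/5.x) with the poi-scratchpad artifact on the classpath (HWPF lives there). `input.doc` is a placeholder path and `orNone` is a hypothetical helper for headers that are missing or empty. It only reads headers; it does not write them back.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.HeaderStories;

// Sketch: print the first/even/odd page headers of a legacy .doc file.
public class DocHeaderDump {

    // Hypothetical helper: undefined headers may come back null or empty;
    // trim() also drops the trailing paragraph mark ('\r').
    static String orNone(String text) {
        if (text == null || text.trim().isEmpty()) {
            return "(none)";
        }
        return text.trim();
    }

    public static void main(String[] args) throws IOException {
        // "input.doc" is a placeholder path.
        try (InputStream in = new FileInputStream("input.doc")) {
            HWPFDocument doc = new HWPFDocument(in);
            HeaderStories headers = new HeaderStories(doc);
            System.out.println("First page header: " + orNone(headers.getFirstHeader()));
            System.out.println("Even page header:  " + orNone(headers.getEvenHeader()));
            System.out.println("Odd page header:   " + orNone(headers.getOddHeader()));
        }
    }
}
```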
The page those instructions are from implies that header support was never that good in HWPF, let alone XWPF. For more bad news, this other Apache page makes it sound like XWPF development has all but stalled. It's possible that what you want to do is planned but not supported yet.

Check out Writing Microsoft Word Documents in Java With Apache POI
I have never worked with Word files before, but I have used the POI library for Excel, and it is quite easy to follow (it models the rows, columns, sheets etc. of a workbook), so I assume it will be equally easy for Word files.
Also, take a quick read of their guide Apache POI - HWPF - Java API to Handle Microsoft Word Files

First, call getHeaderFooterPolicy() on your XWPFDocument, which returns an XWPFHeaderFooterPolicy. From that, you can get the appropriate header for your page (e.g. Default, First Page, etc.).
Once you have the XWPFHeader that you want to change, you can edit it like any other document part: fetch its tables, paragraphs etc., then remove them, add new ones, change their text and so on. From then on, it's the same process as editing the main document.
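A sketch of that approach, assuming POI 4.x/5.x (poi-ooxml), where getHeaderFooterPolicy() returns an XWPFHeaderFooterPolicy. The file names and the `OLD TITLE`/`NEW TITLE` strings are placeholders. It only touches the default header and replaces text run by run.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class DocxHeaderUpdate {

    // Replaces 'from' with 'to' in every run of the default header; returns how many runs changed.
    static int replaceInDefaultHeader(XWPFDocument doc, String from, String to) {
        XWPFHeaderFooterPolicy policy = doc.getHeaderFooterPolicy();
        if (policy == null) {
            return 0; // no header/footer setup in this document
        }
        XWPFHeader header = policy.getDefaultHeader();
        if (header == null) {
            return 0; // no default header defined
        }
        int changed = 0;
        for (XWPFParagraph paragraph : header.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                String text = run.getText(0);
                if (text != null && text.contains(from)) {
                    run.setText(text.replace(from, to), 0); // overwrite the run's first text element
                    changed++;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names.
        try (InputStream in = new FileInputStream("input.docx");
             XWPFDocument doc = new XWPFDocument(in);
             OutputStream out = new FileOutputStream("output.docx")) {
            int n = replaceInDefaultHeader(doc, "OLD TITLE", "NEW TITLE");
            System.out.println("Runs updated: " + n);
            doc.write(out);
        }
    }
}
```

Word often splits visible text across several runs, so a placeholder broken up that way will not be matched by this simple per-run replacement. getFirstPageHeader() and getEvenPageHeader() on the policy can be handled the same way.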

Related

how to access blank cells from excel file using apache-tika

I am using Apache Tika 1.13 to read and work with Excel file contents. It works well, but I have a problem with Excel files that contain blank cells: I need to access those blank cells. Does Tika's latest release provide any way to access them?
No. Apache Tika provides a consistent, easy to use and simplified view across a wide range of file formats. As such, not everything possible with the underlying libraries is possible with Tika.
You'll need to step down to Apache POI, the library that Tika uses for Excel files, if you want fine-grained control over blank and missing cells. Then, see the POI documentation on iterating over cells, including with missing/blank cell control, for how to do what you want.
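With POI directly, the missing-cell policy is what gives you the blank cells. A sketch, assuming POI 4.x/5.x (where Row.MissingCellPolicy is an enum and WorkbookFactory.create(File) only declares IOException); `input.xlsx` is a placeholder. Cells are rendered as strings with DataFormatter, and rows that are entirely missing are reported separately because Sheet.getRow returns null for them.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class BlankCellReader {

    // Reads cells 0..columnCount-1 of a row; missing or blank cells come back as "".
    static List<String> rowValues(Row row, int columnCount) {
        DataFormatter formatter = new DataFormatter();
        List<String> values = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            // CREATE_NULL_AS_BLANK returns a blank cell instead of null for missing cells.
            Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
            values.add(formatter.formatCellValue(cell));
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        try (Workbook wb = WorkbookFactory.create(new File("input.xlsx"))) {
            Sheet sheet = wb.getSheetAt(0);
            for (int r = sheet.getFirstRowNum(); r <= sheet.getLastRowNum(); r++) {
                Row row = sheet.getRow(r);
                if (row == null) {
                    System.out.println(r + ": (missing row)");
                    continue;
                }
                // getLastCellNum() is last index + 1, or -1 for a row without cells.
                int columns = Math.max(row.getLastCellNum(), 0);
                System.out.println(r + ": " + rowValues(row, columns));
            }
        }
    }
}
```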

Convert multiple files into PDF

My question is rather theoretical. I need to implement an application that takes files with different extensions, such as [asp,bmp,doc,docx,html,jpg,pdf,png,pptx,sql,txt,xls,xlsx], and converts them all into a single consecutive PDF file for printing.
I did my research in terms of coding and found multiple libraries that do the job, such as Apache POI, iText, aspose.pdf.jar and others. I tested them out on individual portions of the idea. They work great, but they require a lot of women-hours to implement the desired application. My question is: is there anything more complete that will speed up the job? For example, a library like Apache POI that does not require specifying every single padding and background color.
Any suggestions are appreciated.
As far as I know, there is no single API that provides a complete solution for this. That is why we need to combine Apache POI, iText, aspose.pdf.jar and FOP.
Another option (using jQuery, as far as I know) is to take a snapshot of the rendered page, stream it to the server side, and generate a PDF there that looks just like the HTML page without any extra formatting. This has limitations, though: you can't use landscape, and if the page content is large the PDF is shrunk, which makes the font small (for a very large page you may not even be able to read the PDF).

populate data from java to pdf

I am currently looking for a way to populate a PDF file from Java code. Is there an API for that? I googled it and found that JustformsPDF does the trick, but it only works for some PDFs, not all, and it's an old API without recent development or support.
Basically I have an existing PDF (and do not want to build it); I just need to populate it with my Java data.
Any suggestions?
These two should be able to do the job (these are the ones I know, there may be more):
Apache PDFBox
iText
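For the PDFBox route, filling an existing AcroForm could look roughly like this. It assumes PDFBox 2.x (in 3.x, PDDocument.load is replaced by Loader.loadPDF); the field names `customerName`/`invoiceNumber` and the file names are hypothetical, so use the names defined in your own PDF form. It only works for PDFs that actually contain an AcroForm.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class PdfFormFiller {

    // Sets each named form field that exists in the PDF; returns how many were filled.
    static int fill(PDDocument pdf, Map<String, String> values) throws IOException {
        PDAcroForm form = pdf.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return 0; // the PDF has no interactive form
        }
        int filled = 0;
        for (Map.Entry<String, String> e : values.entrySet()) {
            PDField field = form.getField(e.getKey());
            if (field != null) {
                field.setValue(e.getValue());
                filled++;
            }
        }
        return filled;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("customerName", "Jane Doe");   // hypothetical field name
        data.put("invoiceNumber", "12345");     // hypothetical field name
        try (PDDocument pdf = PDDocument.load(new File("template.pdf"))) {
            System.out.println("Fields filled: " + fill(pdf, data));
            pdf.save("filled.pdf");
        }
    }
}
```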
You can use our product PDFOne (for Java). It has a document component that can also fill form fields in existing PDF documents. Viewer and printer components are also available.

Generate PowerPoint 2007/2010 file using Java

Does anyone know of any API (commercial or open-source) that can generate/edit PowerPoint 2007/2010 presentations through Java? I have a template in the PowerPoint 2007/2010 format that I need to edit/update. So far I have been converting the .pptx file to XML, editing it and storing it back as .pptx, but the file is reported as corrupted when it is opened.
Is anyone aware of any other method or API that does this in Java?
We have done it programmatically (closed source at the moment, sorry) so might be able to help, but beware of a few gotchas.
One is that the POI project (at least when we looked at it last year) was quite incomplete. It didn't do PPTX charts, which is the one feature we wanted. In fact, the POI site may not be up to date, but they don't appear to support the PowerPoint 2007 format (http://poi.apache.org/slideshow/index.html). Everybody recommends this project, but our evaluation was that it was pretty much useless for generating PowerPoint 2007 files via Java. Your mileage may vary.
Aspose also had some significant limitations when we looked at it; not doing charts in PowerPoint 2007 was the blocking issue for us.
Another issue is that PowerPoint 2007 can be quite buggy. We have had a number of programmatically produced PPT files that caused lock-ups, but when testing, we found that we could reproduce crashes and lock-ups with simple PPTX documents created in PowerPoint 2007 itself, i.e. not by our code.
In the end, we did the following: we unpacked a 'template' PowerPoint file to a folder, then on demand filled the template XML with new values, zipped it up, renamed various elements and delivered it to the user as a valid PPTX. It works OK, other than the odd PowerPoint crash when people edit the file. If there were a market for it, I guess we could package up the code as a web service (i.e. XML/CSV -> PPTX) or put together a commercial package, but we wouldn't do it for free.
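The unpack / fill / re-zip idea can be sketched with nothing but java.util.zip, since a .pptx is a ZIP package. This is not the poster's (closed-source) code: the `${...}` placeholder syntax, the `ppt/slides/slideN.xml` filter and the file names are assumptions for illustration. Replacement values are XML-escaped, and a placeholder only matches if PowerPoint kept it in one piece inside the slide XML.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class PptxZipTemplate {

    // Copies every entry of the template package to 'out', replacing placeholders in slide XML.
    static void fill(InputStream template, OutputStream out, Map<String, String> values) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(template);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = readAll(zin);
                String name = entry.getName();
                // Slide parts only; "ppt/slides/_rels/..." does not match this prefix.
                if (name.startsWith("ppt/slides/slide") && name.endsWith(".xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> v : values.entrySet()) {
                        xml = xml.replace(v.getKey(), escapeXml(v.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                zout.putNextEntry(new ZipEntry(name)); // fresh entry, so size and CRC are recomputed
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Reads the rest of the stream (for a ZipInputStream: the current entry).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("template.pptx");
             OutputStream out = new FileOutputStream("generated.pptx")) {
            fill(in, out, Collections.singletonMap("${title}", "Quarterly report"));
        }
    }
}
```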
docx4j (apache license) now includes a pptx4j component, which can open/edit/save pptx documents.
Yes. Check out http://poi.apache.org/; they just released version 3.6, which now supports Office 2007 format documents. The best part is that it's free!
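Current POI releases handle .pptx through the XSLF API (XMLSlideShow). A sketch of filling a template with it, assuming POI 4.x/5.x; `template.pptx`, `generated.pptx` and the `${title}` placeholder are hypothetical. It walks only top-level text shapes and replaces text run by run, so tables, grouped shapes, charts and placeholders split across runs are not covered.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.Map;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
import org.apache.poi.xslf.usermodel.XSLFTextRun;
import org.apache.poi.xslf.usermodel.XSLFTextShape;

public class PptxPoiTemplate {

    // Replaces placeholder keys inside text runs of top-level text shapes; returns runs changed.
    static int fill(XMLSlideShow ppt, Map<String, String> values) {
        int changed = 0;
        for (XSLFSlide slide : ppt.getSlides()) {
            for (XSLFShape shape : slide.getShapes()) {
                if (!(shape instanceof XSLFTextShape)) {
                    continue; // pictures, tables, groups and charts are skipped in this sketch
                }
                for (XSLFTextParagraph paragraph : ((XSLFTextShape) shape).getTextParagraphs()) {
                    for (XSLFTextRun run : paragraph.getTextRuns()) {
                        String text = run.getRawText();
                        if (text == null) {
                            continue;
                        }
                        String updated = text;
                        for (Map.Entry<String, String> v : values.entrySet()) {
                            updated = updated.replace(v.getKey(), v.getValue());
                        }
                        if (!updated.equals(text)) {
                            run.setText(updated);
                            changed++;
                        }
                    }
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("template.pptx");
             XMLSlideShow ppt = new XMLSlideShow(in);
             OutputStream out = new FileOutputStream("generated.pptx")) {
            int n = fill(ppt, Collections.singletonMap("${title}", "Quarterly report"));
            System.out.println("Runs updated: " + n);
            ppt.write(out);
        }
    }
}
```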
To generate a PowerPoint presentation from a template file, you can use PPT Templates.
This library provides a fluent API to replace variables inside the PPT template:
try (FileOutputStream out = new FileOutputStream("generated.pptx")) {
    new PptMapper()
        .text("variable", "Hello")
        .text("other_variable", "World!")
        .processTemplate(PptTemplateDemo.class.getResourceAsStream("/title.pptx"))
        .write(out);
}
With this library, you can process text and images in the template.
Another solution that may work for you is Windward Reports (disclaimer, I'm the founder & CEO there). It uses PPTX as one of the supported template formats and merges in data to then generate a PPTX (or PDF, etc.) output.
If the edit/update you need can be handled via the data tags in Windward, this should be trivial for you. If what you need cannot be handled by the tags, then this won't work for you.
As mentioned by GrantB, the best way is to create a template, then load the template, traverse the XML graph, update the data and stream it out to an output PPT. We recently did this to generate reports for clients that had complex visuals and charts in PPT. You can have a look here: generate ppt in java

How do I use Apache POI to read a .DOC file in Java to separate images from text?

I need to read a Word .doc file from Java that has text and images. I need to recognize the images & text and separate them into 2 files.
I've recently heard about "Apache POI." How can I use Apache POI to read Word .doc files?
The examples and sample code on apache's site are pretty good. I recommend you start there.
http://poi.apache.org/hwpf/quick-guide.html
To get specific bits of text, first create an org.apache.poi.hwpf.HWPFDocument. Fetch the range with getRange(), then get paragraphs from that. You can then get the text and other properties.
See here for an example of extracting an image, and here for the latest revision as of this writing.
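A sketch of those steps applied to the original question (text in one file, each image in its own file), assuming POI 4.x/5.x with poi-scratchpad on the classpath. `input.doc`, `text.txt` and the `cleanText` helper are illustrative and not part of POI. Pictures are taken from getPicturesTable().getAllPictures(), and suggestFullFileName() supplies a file name whose extension matches the image type. Field instruction text (such as PAGE) may survive `cleanText`, since it only removes control characters.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class DocSplitter {

    // Hypothetical helper: HWPF paragraph text ends with a carriage return and can contain
    // control characters for fields, inline pictures and table cells; drop them but keep
    // tab and newline, then trim.
    static String cleanText(String raw) {
        return raw.replaceAll("[\\u0000-\\u0008\\u000B-\\u001F]", "").trim();
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("input.doc")) {
            HWPFDocument doc = new HWPFDocument(in);

            // 1) Text: walk the paragraphs of the main document range into one file.
            Range range = doc.getRange();
            try (PrintWriter text = new PrintWriter("text.txt", "UTF-8")) {
                for (int i = 0; i < range.numParagraphs(); i++) {
                    text.println(cleanText(range.getParagraph(i).text()));
                }
            }

            // 2) Images: every picture HWPF finds, each written to its own file.
            PicturesTable pictures = doc.getPicturesTable();
            List<Picture> all = pictures.getAllPictures();
            for (Picture picture : all) {
                try (OutputStream out = new FileOutputStream(picture.suggestFullFileName())) {
                    out.write(picture.getContent());
                }
            }
        }
    }
}
```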
And of course, the Javadocs
Note that, according to the POI site,
HWPF is still in early development.
It's not free (or even cheap!) but Aspose.Words should be able to do this. Their evaluation download will let you play with small files.
Do the destination files also have to be Docs? You could open the docs in Office and save them out as HTML. Then the separation becomes trivial. RTF is also a viable option, but I can't recommend a good RTF parser off the top of my head.
Edit to say: I just remembered another possible solution: Jacob, but you'll need an instance of Office running on the same machine. It's short for Java COM Bridge and it lets you make calls to the COM libraries in Office to manipulate the documents. I'm sure it's not as scary as it might sound!
