Split docx based on footer using Apache POI or docx4j - Java

I have a big docx file and I want to split it into a new docx containing only the pages whose footer contains the words "Appendix B". Can I have a code example or any other help?

You can use an algorithm like this:
1. Inspect the footers to find which ones contain the words of interest. Note the relId in the rels part that points to each such footer.
2. Go through the main document part, looking at the sectPr elements. Find the sectPr elements that contain those relId(s). Note that the footer might be implicit (same as the previous section).
3. Provided your footer applies to every page in the relevant section(s), you can just delete the content before and after them, then save the resulting docx.
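The sectPr lookup, including the "same as previous" case, can be sketched with nothing but the JDK's DOM parser, because a docx is a zip of XML parts. This is only a sketch under assumptions, not POI or docx4j API: the class FooterSections and the method sectionsUsingFooter are names made up for illustration, only footer references with w:type="default" are considered, tracked section changes are ignored, and the contents of word/document.xml are assumed to be already read into a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Apache POI or docx4j.
final class FooterSections {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns the zero-based indexes of the sections whose effective default footer is in relIds. */
    static List<Integer> sectionsUsingFooter(String documentXml, Set<String> relIds) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the namespace-aware lookups below
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document.xml could not be parsed", e);
        }

        List<Integer> matches = new ArrayList<>();
        String effectiveRelId = null; // carried over when a section declares no footer of its own
        NodeList sections = doc.getElementsByTagNameNS(W, "sectPr"); // returned in document order
        for (int i = 0; i < sections.getLength(); i++) {
            Element sectPr = (Element) sections.item(i);
            NodeList refs = sectPr.getElementsByTagNameNS(W, "footerReference");
            for (int j = 0; j < refs.getLength(); j++) {
                Element ref = (Element) refs.item(j);
                if ("default".equals(ref.getAttributeNS(W, "type"))) {
                    effectiveRelId = ref.getAttributeNS(R, "id");
                }
            }
            if (effectiveRelId != null && relIds.contains(effectiveRelId)) {
                matches.add(i);
            }
        }
        return matches;
    }
}
```

The returned indexes tell you which sections to keep; everything outside them is the content to delete before saving.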

Related

Apache POI for docx Insert Text On Specific Page

I'm trying to make a Table of Contents for my Word document (docx).
Apache POI is still too buggy. document.createTOC() does not produce anything unless it is placed at the end, and sometimes it doesn't give the correct page numbers.
document.enforceUpdateFields() doesn't do anything!
So I thought I would write my own method that creates the Table of Contents. However, I will call it at the end, but I need the result to be inserted at the beginning!
In other words, suppose that at some point in my program the document has some text on the first and second pages, and I haven't saved it yet. How do I insert content at the beginning of the first page?
I haven't tried this yet, but after you write the document, reload it.
Then try the following:
List<XWPFParagraph> paragraphs = document.getParagraphs();
XWPFRun run = paragraphs.get(0).insertNewRun(0); // first paragraph, 0 is the position
run.setText("your data here");

iText - add content to the bottom of an existing page

I want to add a piece of text to every page of a PDF file. This answer in SO works fine. But, the text is added to the top of the page. I would like to add my text to the bottom of each page. How do I do this?
Here is the relevant part of the code.
while (iteratorPDFReader.hasNext()) {
    PdfReader pdfReader = iteratorPDFReader.next();
    // Create a new page in the target for each source page.
    while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
        document.newPage();
        pageOfCurrentReaderPDF++;
        currentPageNumber++;
        page = writer.getImportedPage(pdfReader, pageOfCurrentReaderPDF);
        cb.addTemplate(page, 0, 0);
        document.add(new Paragraph("My Text here")); // As per the SO answer
    }
    pageOfCurrentReaderPDF = 0;
}
The code is part of a function which accepts a folder, reads the PDF files in it and merges them into one single file. So, I would like to add the text in the above loop itself, instead of iterating the file once again.
If you want to automatically add content to every page, you need a page event.
This is explained in chapter 5 of my book "iText in Action - Second Edition".
If you don't own a copy of the book, you can consult the examples here.
You can also find solutions by looking for the keyword Header / Footer.
The example you're referring to doesn't look correct at first sight. Sure, you can use "two passes", one to create the content, and another to add headers or footers, but the suggested solution is different from the recommended solution: http://itextpdf.com/examples/iia.php?id=118
You are copying the mistake in your question: why on earth would you import the document you've just created into a new document, thus throwing away all possible interactivity you've added to that document? It just doesn't make sense. It's unbelievable that this answer received that many up-votes. I'm the original developer of iText and I'm not at all happy with that answer!
In your case, there may be no need to create the document in memory first and to add the footer afterwards. Just take a look at http://itextpdf.com/examples/iia.php?id=104
You need to create a PdfPageEvent implementation (for instance using PdfPageEventHelper) and you need to implement the onEndPage() method.
Documented caveats:
Do not use onStartPage() to add content,
Do not add anything to the Document object passed to the page event,
Unless you specified a different page size, the lower-left corner has the coordinate x = 0; y = 0. You need to take that into account when adding the footer. The y-value for the footer is lower than the y-value for the header.
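The coordinate caveat can be made concrete with a little arithmetic. The sketch below is plain Java and not iText API: the class FooterPosition and both methods are hypothetical names, and the values in the usage note (a page 842 points high, 36-point margins, an 18-point gap) are sample values chosen for illustration.

```java
// Hypothetical helper for illustration only; it is not part of iText.
final class FooterPosition {
    /** Baseline of a footer placed gap points below the bottom margin. */
    static float footerY(float marginBottom, float gap) {
        // y grows upwards from 0 at the bottom edge of the page
        return marginBottom - gap;
    }

    /** Baseline of a header placed gap points above the top margin. */
    static float headerY(float pageHeight, float marginTop, float gap) {
        // the top edge of the page is at y = pageHeight
        return pageHeight - marginTop + gap;
    }
}
```

With the sample values, footerY(36f, 18f) gives 18 and headerY(842f, 36f, 18f) gives 824, so the footer ends up near y = 0 and the header near the page height.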
For more info: consult my book.
Have a look at chapter 6 of iText in Action, 2nd edition, especially at subsection 6.4.1: Concatenating and splitting PDF documents.
Listing 6.22, ConcatenateStamp.java, shows you how you should create a PDF from copies of pages (in your case: all pages) of multiple other PDFs; the sample additionally adds a new "Page X of Y" footer; this demonstrates how you can add content at given positions on the pages while merging the source files.
Perhaps this may be of assistance here. I suspect you want to do something like the following:
        cb.addTemplate(page, 0, 0);
        document.add(new Paragraph("My Text here"));
        // iText 2.x only: HeaderFooter takes a Phrase plus a flag for page numbering
        document.setFooter(new HeaderFooter(new Phrase("Footnote goes here"), false));
    }
    pageOfCurrentReaderPDF = 0;

Replace text with an image docx4j

I have a Word template. It contains the word "photo", which has to be replaced with an image. This has to be done with docx4j.
How do I do this?
If you are specifically looking to replace text with an image (which is not possible using docx4j, as answered above), you can replace a bookmark with an image as an alternative.
Just open your Word template, position the cursor at the desired location, choose Insert > Bookmark, and name your bookmark.
I followed the instructions here to replace this bookmark with an image.
Disclosure: I manage the docx4j project
The VariableReplace code doesn't handle images.
The best way to do this would be to use data bound content controls, specifically a picture content control pointing via XPath at a base-64 encoded image in an XML document (see Getting Started for details).
However, if you want to replace a word with an image, you can do so, but you'll have to write a bit of glue code. It is pretty straightforward.
First, find the word. You can do this using XPath or TraversalUtil (again, see Getting Started for details).
Hopefully it is in a run (w:r/w:t) by itself. If not, you'll need to split the run up so you don't replace adjacent text.
Then, add the image. See the sample ImageAdd.
I suggest you have a look at the XML created when you add an image in Word (ie save and unzip your docx, then look at document.xml). Take care that the XML representing the image is at the correct level (eg child of w:p).
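The run-splitting step can be sketched on plain strings before any docx4j objects are involved. This is a minimal sketch, not docx4j API: RunSplitter and splitAround are hypothetical names, and the sketch assumes the word sits entirely inside one run's text. Each returned piece would become its own run, with the piece equal to the word replaced by the image.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of docx4j.
final class RunSplitter {
    /**
     * Splits a run's text around the first occurrence of word.
     * Returns the text before the word, the word, and the text after it,
     * leaving out empty pieces. If the word is absent, the text is returned unchanged.
     */
    static List<String> splitAround(String runText, String word) {
        List<String> parts = new ArrayList<>();
        int start = runText.indexOf(word);
        if (start < 0) {
            parts.add(runText);
            return parts;
        }
        int end = start + word.length();
        if (start > 0) {
            parts.add(runText.substring(0, start)); // text that stays in front of the image
        }
        parts.add(word); // the piece to swap for the image
        if (end < runText.length()) {
            parts.add(runText.substring(end)); // text that stays behind the image
        }
        return parts;
    }
}
```

When the neighbouring pieces are written back as w:t elements, any that start or end with a space need xml:space="preserve", otherwise Word drops that whitespace.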

How to insert content in the middle of a page in a PDF using iText

I have a requirement to insert content into the middle of a page in a PDF.
The content may be a dynamic table or an image.
My idea was to first split the PDF into 2 parts, then get the new content that is to be added and append it by replacing a placeholder field.
iText calls this kind of splitting tiling, and here is an example of it:
http://itextpdf.com/examples/iia.php?id=116
The code above has 2 drawbacks:
1. It splits the page into 16 parts, though that is just part of the example. Still, I can't figure out a way to split the file into only 2 parts.
2. Each split part is converted to a complete page, which disturbs its proportions.
The rearranging code is another problem.
The remaining content should be re-ordered in append mode, but so far I have only found code that adds complete new pages rather than just the content.
I have found some code that adds content to the PDF by replacing a placeholder:
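// iText 2.x returns the positions as [page, llx, lly, urx, ury]
float[] fieldPosition = pdfTemplate.getAcroFields().getFieldPositions("tableField");
PdfPTable table = buildTable();
PdfContentByte cb = stamper.getOverContent(1);
// Start writing the table at the upper-left corner of the field (llx, ury)
table.writeSelectedRows(0, -1, fieldPosition[1], fieldPosition[4], cb);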
Please help me to solve this requirement.
PDF is a presentation format, not an editing format. In other words, it is not designed to allow content insertion, with the original content reflowing gracefully. As a consequence, no tool (at least, none that I know of, and surely not iText) will enable you to achieve what you were given as a requirement.
My advice:
refuse the assignment since it's not feasible, or
get your hands on the original document, insert the desired extra content, and then convert to PDF.

Extracting text between two bookmarks using Apache PdfBox

I am using Apache PDFBox to read a PDF document that has a hierarchy defined by bookmarks. The hierarchy is in a tree form with contents only at the leaf level.
I am trying to extract the text between two leaf-level bookmarks using the following calls:
Stripper.setStartBookmark(),
Stripper.setEndBookmark(),
Stripper.writeText()
However, this returns the text of the whole page instead. In short, my problem is similar to the one mentioned in this thread.
Is there a way to extract the contents between two bookmarks?
If so, what should be the change in my code?
I am guessing that your bookmark does not contain the correct data.
It sounds like the bookmark you are using is only pointing to the page where your content starts, rather than a location on the page.
Here is an example of a bookmark that contains location data:
<Title Action="GoTo" Style="bold" Page="2 FitH 518">
Title Name
</Title>
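To check whether a given bookmark carries a vertical position at all, the Page attribute shown above can be picked apart. This is a minimal sketch, not PDFBox or iText API: BookmarkDestination and top are hypothetical names, and the sketch assumes the attribute mirrors a PDF destination array, i.e. the page number, then the fit type, then that type's parameters (XYZ left top zoom, FitH top, FitBH top, FitR left bottom right top).

```java
// Hypothetical helper for illustration; not part of PDFBox or iText.
final class BookmarkDestination {
    /**
     * Returns the top coordinate stored in a Page attribute such as "2 FitH 518",
     * or null when the destination only identifies a page.
     */
    static Float top(String pageAttribute) {
        String[] tokens = pageAttribute.trim().split("\\s+");
        if (tokens.length < 2) {
            return null; // page number only
        }
        int index;
        switch (tokens[1]) {
            case "FitH":
            case "FitBH":
                index = 2; // page, type, top
                break;
            case "XYZ":
                index = 3; // page, type, left, top, zoom
                break;
            case "FitR":
                index = 5; // page, type, left, bottom, right, top
                break;
            default:
                return null; // Fit, FitB, FitV, FitBV carry no top coordinate
        }
        if (index >= tokens.length || "null".equals(tokens[index])) {
            return null;
        }
        return Float.parseFloat(tokens[index]);
    }
}
```

For the example above, top("2 FitH 518") yields 518, while a bookmark whose attribute is just "2 Fit" yields null, which is the page-only situation described in this answer.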
