Split PDF Into Separate Files By Child Bookmarks

I am trying to split a PDF file (a book) into multiple files by child bookmarks, in code.
Use case: the table of contents of a book is available to the user. The user can select up to n sections (not necessarily sequential) to preview. The application needs to extract these sections and merge them into a single preview PDF.
I found a few tools while searching for a solution online: Aspose, Spire (E-IceBlue), etc.
All of them can split a PDF by pages (top-level bookmarks), but I need to split the PDF by child bookmarks. This means the area to extract can start and/or end in the middle of a page.
Ideally I would be able to do this in Java code, but if someone knows a solution in another programming language or a CLI program, that would also be great.

It depends whether you insist that the non-chosen content on a page be redacted or not. For example, if section 6.3.2 takes up the middle half of a page, do you care if the end of 6.3.1 and the beginning of 6.3.3 are shown in the output on the same page?
If you don't care, cpdf can do this easily. Just output the bookmark data as JSON:
cpdf -list-bookmarks-json -utf8 in.pdf > marks.json
Then you can parse this JSON to show the list of bookmarks, and choose which pages to extract based on child bookmark page numbers.
As for redaction, you could use -add-rectangle or -hard-box to clean up the output based on the coordinates from the JSON bookmarks file, but that's not real redaction -- it just removes the content from view.
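A minimal Java sketch of the "choose which pages to extract" step, for the case where whole pages are acceptable. It assumes the bookmarks from marks.json have already been parsed (with any JSON library) into level, title, and page number. The class and method names (BookmarkRanges, pageRange, cpdfRange) are made up for this illustration and are not part of cpdf. A section is taken to run from its own page to the page of the next bookmark at the same or a shallower level, inclusive, because a section may end in the middle of that page.

```java
import java.util.ArrayList;
import java.util.List;

public class BookmarkRanges {

    /** One bookmark entry: nesting level (0 = top), title, 1-based target page. */
    public static final class Bookmark {
        public final int level;
        public final String text;
        public final int page;

        public Bookmark(int level, String text, int page) {
            this.level = level;
            this.text = text;
            this.page = page;
        }
    }

    /**
     * Returns {firstPage, lastPage} for the bookmark at the given index.
     * The section ends at the next bookmark whose level is the same or
     * shallower; that bookmark's page is included because the section
     * may end partway down it. Without such a bookmark, the section
     * runs to the end of the document.
     */
    public static int[] pageRange(List<Bookmark> marks, int index, int totalPages) {
        Bookmark selected = marks.get(index);
        int last = totalPages;
        for (int i = index + 1; i < marks.size(); i++) {
            Bookmark next = marks.get(i);
            if (next.level <= selected.level) {
                last = Math.max(selected.page, next.page);
                break;
            }
        }
        return new int[] { selected.page, last };
    }

    /** Builds a comma-separated page range such as "10-12,15" for the selected bookmarks. */
    public static String cpdfRange(List<Bookmark> marks, List<Integer> selected, int totalPages) {
        List<String> parts = new ArrayList<>();
        for (int index : selected) {
            int[] r = pageRange(marks, index, totalPages);
            parts.add(r[0] == r[1] ? String.valueOf(r[0]) : r[0] + "-" + r[1]);
        }
        return String.join(",", parts);
    }
}
```

The resulting string can then be passed to cpdf as the page range, for example cpdf in.pdf 10-12,15 -o preview.pdf. Note that when the next section happens to start at the very top of a page, this conservative rule includes one page too many; the coordinates in the JSON target could be used to detect that case.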

Related

Is there a way to set page orientation to landscape when a specific element is found in OpenHtmlToPDF (or another HTML to PDF converter)?

The project I am currently working on takes data from a WYSIWYG editor and converts all the input to a PDF document. The problem is that sometimes wider tables need to be added, and they are rendered truncated.
To solve this problem, I added an HR button to the editor (specifically, CKEditor) but renamed it to "Change page orientation", so users can click it before inserting a table. In Java, I used iText 7 to detect this element (<hr>) and change the page orientation. This works like a charm.
Now the requirements have changed, and for licensing reasons we need to replace iText with another HTML to PDF converter while keeping this functionality.
I found OpenHTMLToPdf and I like it, but I could not find a way to replicate this page orientation change when an hr (or another specific element) is found.
How can I solve this? I can use any library as long as it is open source.

add custom title after page break

I am using the docx4j library with templating to generate reports from my application.
I have the following requirement:
When a page break falls in the middle of a paragraph's content, I need to add a custom title before the content on the next page starts, as shown in the figure.
I know that if we need to repeat the same title, we can achieve this with a table and a repeating header row. But then the title is always the same; here I need a custom title.
The paragraph is populated from the backend, so how do we figure out in code where page breaks happen?
Thanks in advance

This has to do with the Word Object Model. Word really does not have the concept of pages in the underlying structure of a document. See "Word Doesn't Know What a Page Is" by Daiya Mitchell, MVP.
Because of that, this would be better posted in the Word Answers forum hosted by Microsoft.
There are ways to deal with this using headers (not table headers necessarily although they can be used) or using a shape anchored to the table to occlude the word "continued" in the original header.
When you say templating, are you talking about Word templates (term of art) or something else?

Html search validation android

I have an Android app with a search feature. The search loops through locally stored HTML files and wraps words that match the entered word in a span with a background color, the same as pressing Ctrl+F on a desktop. The problem I am having is that if the user searches for head, body, div, span, etc., it also adds a span inside the HTML tags. My question: is there an Android validation library that deals with this issue, or do I need to make my own blacklist? I am aware of Android form validation libraries, but I am not sure they are built for what I am looking for.

I've used jsoup before to strip out unwanted HTML tags. You could do this to make the HTML data more "searchable". Also look at Android's Html.escapeHtml(CharSequence), which returns an HTML-escaped String for the given plain text.
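As an alternative to a blacklist, the highlighting can be restricted to the text between tags. The sketch below is a standard-library approximation of that idea, not the jsoup approach itself: it treats anything matching <...> as a tag and only wraps matches found outside of tags. The class name HtmlHighlighter and the yellow inline style are assumptions made for this example. It is not a real HTML parser, so script or style contents, comments, and attribute values containing ">" would need a proper parser such as jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlHighlighter {

    // Rough approximation of an HTML tag: "<", anything except ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Wraps case-insensitive matches of term in a span, skipping anything inside tags. */
    public static String highlight(String html, String term) {
        if (term.isEmpty()) {
            return html;
        }
        // Pattern.quote makes the user's input a literal, not a regex.
        Pattern word = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        StringBuilder out = new StringBuilder();
        Matcher tags = TAG.matcher(html);
        int pos = 0;
        while (tags.find()) {
            // Text between the previous tag and this one is searchable.
            out.append(mark(html.substring(pos, tags.start()), word));
            // The tag itself is copied through untouched.
            out.append(tags.group());
            pos = tags.end();
        }
        out.append(mark(html.substring(pos), word));
        return out.toString();
    }

    private static String mark(String text, Pattern word) {
        // $0 re-inserts the matched text with its original casing.
        return word.matcher(text)
                .replaceAll("<span style=\"background-color: yellow\">$0</span>");
    }
}
```

With this, searching for "div" in <div>div head</div> highlights only the word in the text, and the surrounding div tags are left as they were.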

Expandable list in pdf files

Can I create an expandable list in PDF files? The expandable list would be of the form:
+Item1
+Item2
-Item3
    -Subitem3.1
    -Subitem3.2
+Item4
-Item5
    -Subitem5.1
    -Subitem5.2
    -Subitem5.3
I also need to create the PDF file from Java (I was thinking of using iText; is another library better or easier?). Is this possible? Or would a report in some other standard format (not PDF or HTML) be an easier way out?

First this: I'm the creator of iText, so forgive me for not pointing you to other solutions ;-)
Now for your question: you're asking for dynamic functionality (a tree structure that opens/closes upon user interaction) inside a PDF document.
The most obvious answer is: this isn't possible. When creating PDF, think of paper. Can you print a tree structure on paper that opens/closes when the end user touches the paper? No, you can't, therefore you're asking something that isn't possible in PDF.
The less obvious answer is: it depends. What type of PDF are we talking about?
If you're talking about an interactive XFA form, then you may be able to achieve what you want. The XML Forms Architecture (XFA) is an XML specification that can be used to define interactive forms. When you use XFA, the PDF is nothing more than a container for XML. This XML is rendered dynamically inside Adobe Reader. How to create an XFA form? I only know about two products: Adobe LiveCycle Designer and Avoka Smart Forms Designer.
If you're talking about 'regular PDF', then one option is to embed a swf file. In this case, the tree structure will be rendered by Flash player (which could be a disadvantage, because this might not work with all PDF viewers). Another disadvantage: the tree structure will be confined to a fixed rectangle on a fixed page.
Finally: you can create such a structure in the bookmarks panel. In PDF terminology, those bookmarks are called Outlines. Obviously, the tree structure won't be a part of the printable content. It will be visible in a separate panel in your PDF viewer.

Replace text with an image docx4j

I have a Word template. It contains the word "photo", which has to be replaced with an image. This has to be done with docx4j.
How do I do this?

If you are specifically looking to replace text with an image (which is not possible using docx4j, as answered above), you can replace a bookmark with an image as an alternative.
Just open your Word template, position the cursor at the desired location, choose Insert -> Bookmark, and name your bookmark.
I followed the instructions here to replace this bookmark with an image.

Disclosure: I manage the docx4j project
The VariableReplace code doesn't handle images.
The best way to do this would be to use data bound content controls, specifically a picture content control pointing via XPath at a base-64 encoded image in an XML document (see Getting Started for details).
However, if you want to replace a word with an image, you can do so, but you'll have to write a bit of glue code. It is pretty straightforward.
First, find the word. You can do this using XPath or TraversalUtil (again, see Getting Started for details).
Hopefully it is in a run (w:r/w:t) by itself. If not, you'll need to split the run up so you don't replace adjacent text.
Then, add the image. See the sample ImageAdd.
I suggest you have a look at the XML created when you add an image in Word (ie save and unzip your docx, then look at document.xml). Take care that the XML representing the image is at the correct level (eg child of w:p).
