Creating a table of contents for an existing PDF - Java

I am trying to create a table of contents for an existing PDF file and then merge the table of contents page into the PDF. The page headings and their page numbers are available in a separate Excel file.
I am using iText for PDF manipulation.
All the examples I came across insert links while creating a new PDF, but in my case I want to create links to pages that already exist.
Any suggestions or examples would be highly appreciated.

I finally found the answer, thanks to a friend who pointed me to an example of this in C#.
The code in Java looks like this:
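import java.io.FileOutputStream;
// iText 5 package names; with iText 2.1.x use com.lowagie.text and com.lowagie.text.pdf instead
import com.itextpdf.text.Anchor;
import com.itextpdf.text.Document;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
public class Test1 {
    public static void main(String[] args) throws Exception {
        PdfReader reader = new PdfReader(new RandomAccessFileOrArray("C:\\test.pdf"), null);
        Document doc = new Document(reader.getPageSize(1));
        PdfWriter writer = PdfWriter.getInstance(doc, new FileOutputStream("C:\\result.pdf"));
        Font link = FontFactory.getFont("Arial", 12, Font.UNDERLINE);
        doc.open();
        PdfContentByte pdfContentByte = writer.getDirectContent();
        // Page 1 of the result is the "table of contents": a local link to the named destination "target"
        Anchor click = new Anchor("Click to go to Target", link);
        click.setReference("#target");
        doc.add(new Paragraph(click));
        // Copy every page of the existing PDF behind it (page numbers are 1-based, hence <=)
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            doc.newPage();
            PdfImportedPage page = writer.getImportedPage(reader, i);
            pdfContentByte.addTemplate(page, 0, 0);
            if (i == 5) {
                // Named destination on the result page that shows original page 5
                Anchor target = new Anchor("My target");
                target.setName("target");
                doc.add(new Paragraph(target));
            }
        }
        doc.close();
        reader.close();
    }
}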
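The page numbers in the Excel sheet refer to the original file, but in the code above the TOC occupies page 1 of result.pdf, so original page i ends up as page i + 1. With more TOC lines, or several TOC pages, that offset has to be applied to every entry before the links are created. Below is a minimal, library-free sketch of that bookkeeping. It assumes the sheet has already been exported to lines of the form `heading;page` (a hypothetical format), and the class name `TocPlanner` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Library-free sketch: maps (heading, original page) rows to TOC entries for the merged file. */
final class TocPlanner {

    /** One TOC line: the text to print and the pages it refers to. */
    static final class TocEntry {
        final String heading;
        final int originalPage; // page number as listed in the spreadsheet (1-based)
        final int mergedPage;   // same page after the TOC pages are put in front

        TocEntry(String heading, int originalPage, int mergedPage) {
            this.heading = heading;
            this.originalPage = originalPage;
            this.mergedPage = mergedPage;
        }
    }

    /**
     * Parses rows such as "Introduction;3" (hypothetical export format) and shifts every
     * page number by tocPages, the number of pages inserted before the original page 1.
     */
    static List<TocEntry> plan(List<String> rows, int tocPages, int totalPages) {
        List<TocEntry> entries = new ArrayList<>();
        for (String row : rows) {
            int sep = row.lastIndexOf(';'); // last ';' so that headings may contain semicolons
            if (sep < 0) {
                throw new IllegalArgumentException("Expected 'heading;page': " + row);
            }
            String heading = row.substring(0, sep).trim();
            int page = Integer.parseInt(row.substring(sep + 1).trim());
            if (page < 1 || page > totalPages) {
                throw new IllegalArgumentException("Page " + page + " is not in the original file");
            }
            entries.add(new TocEntry(heading, page, page + tocPages));
        }
        return entries;
    }
}
```

Each `mergedPage` is the page a link (or a named destination like the `target` anchor above) has to point to. Reading the .xlsx file directly, for example with Apache POI, is left out of the sketch.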

I looked for a patch I made in Flying Saucer that made it possible to click on a link and make it do something. This was a very long time ago so I'm not sure if this will help or not, but it seems to be using iText for this. Here's the snippet of code I think I wrote for this :P
private void processLink(RenderingContext c, Box box) {
    Element elem = box.getElement();
    if (elem != null) {
        NamespaceHandler handler = _sharedContext.getNamespaceHandler();
        String uri = handler.getLinkUri(elem);
        if (uri != null) {
            if (uri.length() > 1 && uri.charAt(0) == '#') {
                String anchor = uri.substring(1);
                Box target = _sharedContext.getBoxById(anchor);
                if (target != null) {
                    PdfDestination dest = createDestination(c, target);
                    if (dest != null) {
                        PdfAction action = new PdfAction();
                        if (!"".equals(handler.getAttributeValue(elem, "onclick"))) {
                            action = PdfAction.javaScript(handler.getAttributeValue(elem, "onclick"), _writer);
                        } else {
                            action.put(PdfName.S, PdfName.GOTO);
                            action.put(PdfName.D, dest);
                        }
                        com.lowagie.text.Rectangle targetArea = checkLinkArea(c, box);
                        if (targetArea == null) {
                            return;
                        }
                        targetArea.setBorder(0);
                        targetArea.setBorderWidth(0);
                        PdfAnnotation annot = new PdfAnnotation(_writer, targetArea.getLeft(), targetArea.getBottom(),
                                targetArea.getRight(), targetArea.getTop(), action);
                        annot.put(PdfName.SUBTYPE, PdfName.LINK);
                        annot.setBorderStyle(new PdfBorderDictionary(0.0f, 0));
                        annot.setBorder(new PdfBorderArray(0.0f, 0.0f, 0));
                        _writer.addAnnotation(annot);
                    }
                }
            } else if (uri.indexOf("://") != -1) {
                PdfAction action = new PdfAction(uri);
                com.lowagie.text.Rectangle targetArea = checkLinkArea(c, box);
                if (targetArea == null) {
                    return;
                }
                PdfAnnotation annot = new PdfAnnotation(_writer, targetArea.getLeft(), targetArea.getBottom(), targetArea.getRight(),
                        targetArea.getTop(), action);
                annot.put(PdfName.SUBTYPE, PdfName.LINK);
                annot.setBorderStyle(new PdfBorderDictionary(0.0f, 0));
                annot.setBorder(new PdfBorderArray(0.0f, 0.0f, 0));
                _writer.addAnnotation(annot);
            }
        }
    }
}
This is in the org.xhtmlrenderer.pdf.ITextOutputDevice.
I'm not sure what context you're working in. Is that enough information to figure out how to do this?
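Stripped of the Flying Saucer plumbing, the snippet creates one Link annotation per clickable area: a /Rect on the page that holds the link, no visible border, and a GoTo action whose /D entry names the target. As a rough, library-free illustration (not code from Flying Saucer or iText), the dictionary written for such a link looks like the string built below. The object number of the target page and the `/Fit` destination are placeholders chosen for the example; the snippet itself builds its destination with `createDestination`.

```java
import java.util.Locale;

/** Library-free illustration of the dictionary behind an internal, borderless link annotation. */
final class LinkAnnotSketch {

    /** Numbers with two decimals and '.' as separator, whatever the default locale is. */
    private static String num(float v) {
        return String.format(Locale.ROOT, "%.2f", v);
    }

    /**
     * Clickable rectangle (llx, lly, urx, ury) on the page that carries the link, jumping to
     * the page object with number targetObjNum (a placeholder; a real writer assigns it).
     */
    static String link(float llx, float lly, float urx, float ury, int targetObjNum) {
        return "<< /Type /Annot /Subtype /Link"
                + " /Rect [" + num(llx) + " " + num(lly) + " " + num(urx) + " " + num(ury) + "]"
                + " /Border [0 0 0]"                                        // no frame, like PdfBorderArray(0, 0, 0)
                + " /A << /S /GoTo /D [" + targetObjNum + " 0 R /Fit] >>"  // GoTo action with a page destination
                + " >>";
    }
}
```

For an existing file, iText can attach such an annotation to a given page through `PdfStamper.addAnnotation(annotation, pageNumber)` instead of `PdfWriter.addAnnotation`.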


How to move the MediaBox to 0,0 with iText (iTextSharp)

The MediaBox of my PDF file starts at (-8,-8), and I want it to start at (0,0).
I tried to set it directly, but then the contents of the file were offset.
So I want to change the MediaBox coordinates and move the content along with them.
Here's the iTextSharp (C#) code; a solution with Java iText would be welcome too.
using (PdfReader pdfReader = new PdfReader(@"MediaBoxZero.pdf"))
{
    using (PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(@"MediaBoxZero_Result.pdf", FileMode.Create)))
    {
        var mediaBox = pdfReader.GetBoxSize(1, "media");
        PdfArray mediaBoxN = new PdfArray();
        mediaBoxN.Add(new float[] { 0, 0, mediaBox.Width, mediaBox.Height });
        for (int curPageNum = 1; curPageNum <= pdfReader.NumberOfPages; ++curPageNum)
        {
            PdfDictionary pagedict = pdfReader.GetPageN(curPageNum);
            pagedict.Put(PdfName.MEDIABOX, mediaBoxN);
        }
    }
}
I tried an affine transformation, but it didn't work. It seems affine transformations only work when generating new PDFs, whereas I want to edit existing ones.
using (PdfReader pdfReader = new PdfReader(@"MediaBoxZero.pdf"))
{
    using (PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(@"MediaBoxZero_Result.pdf", FileMode.Create)))
    {
        PdfContentByte pb = stamper.GetOverContent(1);
        AffineTransform at = new AffineTransform();
        at.Translate(100,0);
        pb.Transform(at);
        pb.ConcatCTM(at);
        //var mediaBox = pdfReader.GetBoxSize(1, "media");
        //PdfArray mediaBoxN = new PdfArray();
        //mediaBoxN.Add(new float[] { 0, 0, mediaBox.Width, mediaBox.Height });
        //for (int curPageNum = 1; curPageNum <= pdfReader.NumberOfPages; ++curPageNum)
        //{
        //    PdfDictionary pagedict = pdfReader.GetPageN(curPageNum);
        //    foreach (var item in pagedict.GetEnumerator())
        //    {
        //    }
        //    pagedict.Put(PdfName.MEDIABOX, mediaBoxN);
        //}
    }
}
You can shift the page contents like this with iTextSharp to match the MediaBox change:
using (PdfReader pdfReader = new PdfReader(SOURCE_PDF))
{
    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        Rectangle mediaBox = pdfReader.GetPageSize(i);
        if (mediaBox.Left == 0 && mediaBox.Bottom == 0)
            continue;
        PdfDictionary pageDict = pdfReader.GetPageN(i);
        pageDict.Put(PdfName.MEDIABOX, new PdfArray { new PdfNumber(0), new PdfNumber(0),
            new PdfNumber(mediaBox.Width), new PdfNumber(mediaBox.Height) });
        Rectangle cropBox = pdfReader.GetBoxSize(i, "crop");
        if (cropBox != null)
        {
            pageDict.Put(PdfName.CROPBOX, new PdfArray { new PdfNumber(cropBox.Left - mediaBox.Left),
                new PdfNumber(cropBox.Bottom - mediaBox.Bottom), new PdfNumber(cropBox.Right - mediaBox.Left),
                new PdfNumber(cropBox.Top - mediaBox.Bottom) });
        }
        using (MemoryStream stream = new MemoryStream())
        {
            string translation = String.Format(CultureInfo.InvariantCulture, "1 0 0 1 {0} {1} cm\n", -mediaBox.Left, -mediaBox.Bottom);
            byte[] translationBytes = Encoding.ASCII.GetBytes(translation);
            stream.Write(translationBytes, 0, translationBytes.Length);
            byte[] contentBytes = pdfReader.GetPageContent(i);
            stream.Write(contentBytes, 0, contentBytes.Length);
            pdfReader.SetPageContent(i, stream.ToArray());
        }
    }
    using (FileStream fileStream = new FileStream(@"MediaBox-normalized.pdf", FileMode.Create))
    using (PdfStamper pdfStamper = new PdfStamper(pdfReader, fileStream))
    {
    }
}
Some remarks:
You cannot simply manipulate the UnderContent of the respective page because iText attempts to prevent changes to the graphics state there (like a change of the current transformation matrix) from bleeding through to the existing content. Thus, here we update the page content on the byte level.
The code updates the CropBox (if set) alongside the MediaBox. Strictly speaking it should also update the other boxes known to PDF pages.
The code ignores annotations. If your PDFs have annotations, you have to also move the annotation Rects and other annotation properties in userspace coordinates, like QuadPoints, Vertices, etc.
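The arithmetic behind the answer is a single translation by (dx, dy) = (-llx, -lly) of the MediaBox: the new MediaBox starts at (0, 0), every other box and (per the last remark) every annotation /Rect is shifted by the same offsets, and the page content gets a `cm` prefix with those offsets. A small pure-Java sketch of just that arithmetic, with boxes as `[llx, lly, urx, ury]` arrays; the class and method names are illustrative only:

```java
/** Arithmetic of the MediaBox normalization; boxes are [llx, lly, urx, ury] as in the PDF arrays. */
final class MediaBoxShift {

    /** Offsets (dx, dy) that move the MediaBox's lower-left corner to (0, 0). */
    static float[] offsets(float[] mediaBox) {
        // 0f - v instead of -v, so that an origin already at 0 gives 0.0 rather than -0.0
        return new float[] { 0f - mediaBox[0], 0f - mediaBox[1] };
    }

    /** Shifts any box (MediaBox, CropBox, annotation /Rect) by the same offsets. */
    static float[] shift(float[] box, float dx, float dy) {
        return new float[] { box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy };
    }

    /**
     * Prefix for the page content: concatenates a translation by (dx, dy) to the CTM.
     * Float.toString is fine for page-sized values; huge or tiny values would print in
     * exponent notation, which PDF does not accept, and would need explicit formatting.
     */
    static String translation(float dx, float dy) {
        return "1 0 0 1 " + dx + " " + dy + " cm\n";
    }
}
```

For example, a MediaBox of [-8 -8 587 834] gives offsets (8, 8), a new MediaBox of [0 0 595 842] and the content prefix `1 0 0 1 8.0 8.0 cm`. As for Java: iText 5 has the same PdfReader methods in lower camel case (`getPageSize`, `getBoxSize`, `getPageN`, `getPageContent`, `setPageContent`), so the C# answer should carry over almost line by line, although such a port has not been tested here.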

PDFBox: put two A4 pages on one A3

I have a PDF document with one or more A4 pages.
The resulting PDF document should use A3 paper, with each A3 page holding two pages of the original (odd on the left, even on the right).
I have already managed to render the A4 pages into images, and the odd pages are successfully placed on the first half of new A3 pages, but I cannot get the even pages placed.
public class CreateLandscapePDF {
public void renderPDF(File inputFile, String output) {
PDDocument docIn = null;
PDDocument docOut = null;
float width = 0;
float height = 0;
float posX = 0;
float posY = 0;
try {
docIn = PDDocument.load(inputFile);
PDFRenderer pdfRenderer = new PDFRenderer(docIn);
docOut = new PDDocument();
int pageCounter = 0;
for(PDPage pageIn : docIn.getPages()) {
pageIn.setRotation(270);
BufferedImage bufferedImage = pdfRenderer.renderImage(pageCounter);
width = bufferedImage.getHeight();
height = bufferedImage.getWidth();
PDPage pageOut = new PDPage(PDRectangle.A3);
PDImageXObject image = LosslessFactory.createFromImage(docOut, bufferedImage);
PDPageContentStream contentStream = new PDPageContentStream(docOut, pageOut, AppendMode.APPEND, true, true);
if((pageCounter & 1) == 0) {
pageOut.setRotation(90);
docOut.addPage(pageOut);
posX = 0;
posY = 0;
} else {
posX = 0;
posY = width;
}
contentStream.drawImage(image, posX, posY);
contentStream.close();
bufferedImage.flush();
pageCounter++;
}
docOut.save(output + "\\LandscapeTest.pdf");
docOut.close();
docIn.close();
} catch(IOException io) {
io.printStackTrace();
}
}
}
I'm using Apache PDFBox 2.0.2 (pdfbox-app-2.0.2.jar)
Thank you very much for your help and the link to the other question. I think I had already read it, but I wasn't able to use it in my code yet.
In the end PDFClown did the job, though I don't think it's very nice to use PDFBox and PDFClown in the same program.
Anyway, here's my working code to combine A4 pages on A3 paper.
public class CombinePages {
public void run(String input, String output) {
try {
Document source = new File(input).getDocument();
Pages sourcePages = source.getPages();
Document target = new File().getDocument();
Page targetPage = null;
int pageCounter = 0;
double moveByX = .0;
for(Page sourcePage : source.getPages()) {
if((pageCounter & 1) == 0) {
//even page gets a blank page
targetPage = new Page(target);
target.setPageSize(PageFormat.getSize(PageFormat.SizeEnum.A3, PageFormat.OrientationEnum.Landscape));
target.getPages().add(targetPage);
moveByX = .0;
} else {
moveByX = .50;
}
//get content from source page
XObject xObject = sourcePages.get(pageCounter).toXObject(target);
PrimitiveComposer composer = new PrimitiveComposer(targetPage);
Dimension2D targetSize = targetPage.getSize();
Dimension2D sourceSize = xObject.getSize();
composer.showXObject(xObject, new Point2D.Double(targetSize.getWidth() * moveByX, targetSize.getHeight() * .0), new Dimension(sourceSize.getWidth(), sourceSize.getHeight()), XAlignmentEnum.Left, YAlignmentEnum.Top, 0);
composer.flush();
pageCounter++;
}
target.getFile().save(output + "\\CombinePages.pdf", SerializationModeEnum.Standard);
source.getFile().close();
} catch (FileNotFoundException fnf) {
log.error(fnf);
} catch (IOException io) {
log.error(io);
}
}
}
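Both snippets reduce to the same mapping: source page i (0-based) goes to A3 sheet i / 2, on the left half for even i and on the right half (x offset of half the sheet width, the `moveByX = .50` above) for odd i. Incidentally, the PDFBox attempt in the question creates a new `pageOut` on every iteration but only adds it to `docOut` for even counters, so the odd pages are drawn onto a page that never becomes part of the output. A pure-Java sketch of the mapping, using the ISO A4 and A3 sizes in PDF points (rounded); the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

/** Pure-Java sketch of the 2-up layout: which A3 sheet and which x offset each A4 page gets. */
final class TwoUpLayout {

    // ISO sizes in PDF points (1 pt = 1/72 inch), rounded to two decimals
    static final double A4_WIDTH = 595.28;
    static final double A3_LANDSCAPE_WIDTH = 1190.55;

    /** Placement of one source page on the output. */
    static final class Placement {
        final int sheet; // 0-based index of the A3 page
        final double x;  // left edge of the A4 page on that sheet

        Placement(int sheet, double x) {
            this.sheet = sheet;
            this.x = x;
        }
    }

    static List<Placement> layout(int sourcePages) {
        List<Placement> result = new ArrayList<>();
        for (int i = 0; i < sourcePages; i++) {
            int sheet = i / 2;                                     // pages 0,1 -> sheet 0; 2,3 -> sheet 1; ...
            double x = (i % 2 == 0) ? 0 : A3_LANDSCAPE_WIDTH / 2;  // even index left, odd index right
            result.add(new Placement(sheet, x));
        }
        return result;
    }

    /** An odd page count leaves the right half of the last sheet empty. */
    static int sheetCount(int sourcePages) {
        return (sourcePages + 1) / 2;
    }
}
```

Keeping the mapping separate from the drawing code makes it easy to check independently of whichever library finally places the pages.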

Splitting a large Pdf file with PDFBox gets large result files

I am processing some large PDF files (up to 100 MB and about 2000 pages) with PDFBox. Some of the pages contain a QR code, and I want to split those files into smaller ones, each running from one QR code page to the next.
I got this working, but the result files are as large as the source file: if I cut a 100 MB PDF file into ten files, I get ten files of 100 MB each.
This is the code:
PDDocument documentoPdf =
PDDocument.loadNonSeq(new File("myFile.pdf"),
new RandomAccessFile(new File("./tmp/temp"), "rw"));
int numPages = documentoPdf.getNumberOfPages();
List pages = documentoPdf.getDocumentCatalog().getAllPages();
int previusQR = 0;
for(int i =0; i<numPages; i++){
PDPage page = (PDPage) pages.get(i);
BufferedImage firstPageImage =
page.convertToImage(BufferedImage.TYPE_USHORT_565_RGB , 200);
String qrText = readQRWithQRCodeMultiReader(firstPageImage, hintMap);
if(qrText != null && i!=0){
PDDocument outputDocument = new PDDocument();
for(int j = previusQR; j<i; j++){
outputDocument.importPage((PDPage)pages.get(j));
}
File f = new File("./splitting_files/"+previusQR+".pdf");
outputDocument.save(f);
outputDocument.close();
documentoPdf.close();
}
I also tried the following code for storing the new file:
PDDocument outputDocument = new PDDocument();
for(int j = previusQR; j<i; j++){
PDStream src = ((PDPage)pages.get(j)).getContents();
PDStream streamD = new PDStream(outputDocument);
streamD.addCompression();
PDPage newPage = new PDPage(new
COSDictionary(((PDPage)pages.get(j)).getCOSDictionary()));
newPage.setContents(streamD);
byte[] buf = new byte[10240];
int amountRead = 0;
InputStream is = null;
OutputStream os = null;
is = src.createInputStream();
os = streamD.createOutputStream();
while((amountRead = is.read(buf,0,10240)) > -1) {
os.write(buf, 0, amountRead);
}
outputDocument.addPage(newPage);
}
File f = new File("./splitting_files/"+previusQR+".pdf");
outputDocument.save(f);
outputDocument.close();
But this code creates files that lack some content and are still as large as the original.
How can I create smaller PDF files from a larger one?
Is it possible with PDFBox? Is there any other library with which I can render a single page to an image (for QR recognition) and also split a big PDF file into smaller ones?
Thx!
Thanks! Tilman, you are right: the PDFSplit command generates smaller files. I checked the PDFSplit code and found that it removes the references to pages held by annotations, so that resources which are not needed are not dragged along.
Code extracted from Splitter.class:
private void processAnnotations(PDPage imported) throws IOException
{
List<PDAnnotation> annotations = imported.getAnnotations();
for (PDAnnotation annotation : annotations)
{
if (annotation instanceof PDAnnotationLink)
{
PDAnnotationLink link = (PDAnnotationLink)annotation;
PDDestination destination = link.getDestination();
if (destination == null && link.getAction() != null)
{
PDAction action = link.getAction();
if (action instanceof PDActionGoTo)
{
destination = ((PDActionGoTo)action).getDestination();
}
}
if (destination instanceof PDPageDestination)
{
// TODO preserve links to pages within the splitted result
((PDPageDestination) destination).setPage(null);
}
}
else
{
// TODO preserve links to pages within the splitted result
annotation.setPage(null);
}
}
}
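Roughly, the reason this helps: when a document is saved, everything reachable from its catalog is written. An imported page whose annotations still point at pages of the source document (through a link destination or the /P entry) keeps the source page reachable, and through its /Parent the whole source page tree with every other page and its resources. The toy model below, plain Java with no PDFBox types, treats objects as named nodes and references as edges to show the effect; all node names are made up.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (no PDFBox involved): objects are named nodes, references are edges.
 * A writer that saves everything reachable from the root keeps whatever an
 * annotation still points at, including the source page tree.
 */
final class ReachabilitySketch {

    /** Breadth-first search over the reference graph, starting at root. */
    static Set<String> reachable(Map<String, List<String>> refs, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String obj = queue.poll();
            if (seen.add(obj)) { // visit each object once, even with cycles (page <-> parent)
                queue.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return seen;
    }
}
```

With a chain such as newCatalog -> newPages -> importedPage -> linkAnnot -> sourcePage1 -> sourcePages -> (all source pages and their images), the whole source tree is reachable; once linkAnnot no longer references sourcePage1, which is what `setPage(null)` achieves in the Splitter code, only the new page and its own content remain.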
So eventually my code looks like this:
PDDocument documentoPdf =
PDDocument.loadNonSeq(new File("docs_compuestos/50.pdf"), new RandomAccessFile(new File("./tmp/t"), "rw"));
int numPages = documentoPdf.getNumberOfPages();
List pages = documentoPdf.getDocumentCatalog().getAllPages();
int previusQR = 0;
for(int i =0; i<numPages; i++){
PDPage firstPage = (PDPage) pages.get(i);
String qrText ="";
BufferedImage firstPageImage = firstPage.convertToImage(BufferedImage.TYPE_USHORT_565_RGB , 200);
firstPage =null;
try {
qrText = readQRWithQRCodeMultiReader(firstPageImage, hintMap);
} catch (NotFoundException e) {
e.printStackTrace();
} finally {
firstPageImage = null;
}
if(i != 0 && qrText!=null){
PDDocument outputDocument = new PDDocument();
outputDocument.setDocumentInformation(documentoPdf.getDocumentInformation());
outputDocument.getDocumentCatalog().setViewerPreferences(
documentoPdf.getDocumentCatalog().getViewerPreferences());
for(int j = previusQR; j<i; j++){
PDPage importedPage = outputDocument.importPage((PDPage)pages.get(j));
importedPage.setCropBox( ((PDPage)pages.get(j)).findCropBox() );
importedPage.setMediaBox( ((PDPage)pages.get(j)).findMediaBox() );
// only the resources of the page will be copied
importedPage.setResources( ((PDPage)pages.get(j)).getResources() );
importedPage.setRotation( ((PDPage)pages.get(j)).findRotation() );
processAnnotations(importedPage);
}
File f = new File("./splitting_files/"+previusQR+".pdf");
previusQR = i;
outputDocument.save(f);
outputDocument.close();
}
}
documentoPdf.close();
Thank you very much!!
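One thing to watch in the final loop: a segment is only saved when the next QR page is reached, so the pages from the last QR code to the end of the document are never written. Computing the split ranges up front, from one QR flag per page, makes that trailing segment explicit. A pure-Java sketch under that assumption (the flags would come from `readQRWithQRCodeMultiReader`; the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

/** Pure-Java sketch: page ranges [start, end) for a split at every page that carries a QR code. */
final class QrSplitRanges {

    static List<int[]> ranges(boolean[] hasQr) {
        List<int[]> result = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < hasQr.length; i++) { // page 0 always opens the first segment
            if (hasQr[i]) {
                result.add(new int[] { start, i });
                start = i;
            }
        }
        if (hasQr.length > 0) {
            result.add(new int[] { start, hasQr.length }); // trailing segment after the last QR page
        }
        return result;
    }
}
```

Each range `[start, end)` can then be copied with the import loop above, `for (int j = start; j < end; j++)`.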

PDF file opens incorrectly after download

Using the iText 2.1.7 jar (or any newer version), I merge PDFs into a new file; when that file is downloaded, it opens with the bottom part of some pages missing. The problem occurs when an image covers the entire page of the PDF file: after downloading, I can't see the whole image in the new PDF. Any idea?
switch (type) {
    case PDF_FILE:
        reader = new PdfReader(filePath);
        break;
    //...code...
}
if (reader != null) {
    copyToWriter(out, reader, annotations, fileLabel);
}
Code of the copyToWriter function:
private void copyToWriter(OutputStream out, PdfReader reader, Annotation[] annotations, String fileLabel) throws DocumentException, IOException, BadPdfFormatException {
PdfDictionary outi = null;
PdfICCBased ib = null;
if (reader != null) {
reader.consolidateNamedDestinations();
// we retrieve the total number of pages
int n = reader.getNumberOfPages();
List bookmarks = SimpleBookmark.getBookmark(reader);
if (bookmarks != null) {
// original bookmark copy and page Shift
if (_pageOffset != 0) {
SimpleBookmark.shiftPageNumbers(bookmarks, _pageOffset,
null);
}
}
if(_setMergeBookmark) {
// we add specifics bookmarks for beginning page of each added file.
//...code...(not used)
}
else {
// original bookmark copy and page Shift
if (bookmarks != null) {
_master.addAll(bookmarks);
}
}
_pageOffset += n;
if (_writer == null) {
// step 1: creation of a document-object
_document = new Document();
// step 2: we create a writer that listens to the document
_writer = PdfWriter.getInstance(_document, out);
if (_stamp != null) {
_stamp.setTotalPage( n);
_writer.setPageEvent(_stamp);
}
if(_setMergeBookmark) {
_writer.setViewerPreferences(PdfWriter.PageLayoutSinglePage
| PdfWriter.PageModeUseOutlines);
}
else {
_writer.setViewerPreferences(PdfWriter.PageLayoutSinglePage
| PdfWriter.PageModeUseNone);
}
_writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
if (_pdfA_pdfX) {
//...code... (not used)
}
//
// step 3: we open the document
_document.open();
//
}
// adding the content
PdfContentByte cb = _writer.getDirectContent();
// step 4: we add content
PdfImportedPage page;
for (int i = 0; i < n;) {
++i;
page = _writer.getImportedPage(reader, i);
// get PageSize to ensure landscape/portrait is kept
// also look at rotation tag then we HAD TO use getPageSizeWithRotation
// instead of getPageSize
Rectangle gle = reader.getPageSizeWithRotation(i);
int rotation = reader.getPageRotation(i);
_document.setPageSize( gle);
_document.newPage();
float[] matf = new float[6];
if (rotation == 90) {
matf[0] = 0;
matf[1] = -1f;
matf[2] = 1f;
matf[3] = 0;
matf[4] = 0;
matf[5] = gle.getHeight();
// use transformation matrix to rotate imported page according 'rotation' value
// cb.addTemplate(page, 0, -1f, 1f, 0, 0, gle.getHeight());
}
else if (rotation == 270) {
matf[0] = 0;
matf[1] = 1f;
matf[2] = -1f;
matf[3] = 0;
matf[4] = gle.getWidth();
matf[5] = 0;
// use transformation matrix to rotate imported page according 'rotation' value
// cb.addTemplate(page, 0, 1f, -1f, 0, gle.getWidth(), 0);
}
else { //my case here
matf[0] = 1f;
matf[1] = 0;
matf[2] = 0;
matf[3] = 1f;
matf[4] = 0;
matf[5] = 0;
// cb.addTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
cb.addTemplate(page, matf[0], matf[1], matf[2], matf[3], matf[4], matf[5]);
//copy text annotation on destination
PdfDictionary pageannot = reader.getPageN(i);
PdfArray annots = null;
annots = pageannot.getAsArray(PdfName.ANNOTS);
//if annots is null... code...
}
}
}
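The three branches in copyToWriter are the usual matrices for undoing a /Rotate of 90 or 270 degrees. A quick way to sanity-check such a matrix [a b c d e f] is to map the corners of the unrotated page the way the PDF `cm` operator does (x' = a·x + c·y + e, y' = b·x + d·y + f) and confirm they land inside the new page; anything mapped outside is cut off. A pure-Java sketch of that check, assuming the unrotated page has its lower-left corner at (0, 0); the names are illustrative:

```java
/** Pure-Java check of the copyToWriter matrices: do the page corners land inside the target page? */
final class RotationCheck {

    /** Matrix for a /Rotate value; w and h are the rotated page size, mirroring copyToWriter. */
    static float[] matrix(int rotation, float w, float h) {
        if (rotation == 90) {
            return new float[] { 0, -1f, 1f, 0, 0, h };
        }
        if (rotation == 270) {
            return new float[] { 0, 1f, -1f, 0, w, 0 };
        }
        return new float[] { 1f, 0, 0, 1f, 0, 0 };
    }

    /** Same mapping as the PDF cm operator: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static float[] apply(float[] m, float x, float y) {
        return new float[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    /** True if all corners of the unrotated page (uw x uh, origin at 0,0) fall inside the rotated page. */
    static boolean fits(int rotation, float uw, float uh) {
        boolean swap = rotation == 90 || rotation == 270;
        float w = swap ? uh : uw; // rotated width, what getPageSizeWithRotation reports
        float h = swap ? uw : uh; // rotated height
        float[] m = matrix(rotation, w, h);
        float[][] corners = { { 0, 0 }, { uw, 0 }, { 0, uh }, { uw, uh } };
        for (float[] c : corners) {
            float[] p = apply(m, c[0], c[1]);
            if (p[0] < 0 || p[0] > w || p[1] < 0 || p[1] > h) {
                return false;
            }
        }
        return true;
    }
}
```

For an A4-sized page (595 x 842), `fits` returns true for rotations 0, 90 and 270 with the matrices used above; the same corner test can be applied to any matrix passed to addTemplate.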

How to read bookmarks in PDF using itext at multi level?

I am using iText (Java) to split PDFs at bookmark level.
Does anybody know how to split a PDF at bookmarks on level 2 or 3, or have any examples?
For example, I have bookmarks on the following levels:
Father
|-Son
|-Son
|-Daughter
|-|-Grand son
|-|-Grand daughter
Right now I have the code below, which reads only the top-level bookmarks (Father). Basically, the SimpleBookmark.getBookmark(reader) line does all the work.
But I want to read the level 2 and level 3 bookmarks to split out the content between those inner-level bookmarks.
public static void splitPDFByBookmarks(String pdf, String outputFolder){
try
{
PdfReader reader = new PdfReader(pdf);
//List of bookmarks: each bookmark is a map with values for title, page, etc
List<HashMap> bookmarks = SimpleBookmark.getBookmark(reader);
for(int i=0; i<bookmarks.size(); i++){
HashMap bm = bookmarks.get(i);
HashMap nextBM = i==bookmarks.size()-1 ? null : bookmarks.get(i+1);
//In my case I needed to split the title string
String title = ((String)bm.get("Title")).split(" ")[2];
log.debug("Titel: " + title);
String startPage = ((String)bm.get("Page")).split(" ")[0];
String startPageNextBM = nextBM==null ? "" + (reader.getNumberOfPages() + 1) : ((String)nextBM.get("Page")).split(" ")[0];
log.debug("Page: " + startPage);
log.debug("------------------");
extractBookmarkToPDF(reader, Integer.valueOf(startPage), Integer.valueOf(startPageNextBM), title + ".pdf",outputFolder);
}
}
catch (IOException e)
{
log.error(e.getMessage());
}
}
private static void extractBookmarkToPDF(PdfReader reader, int pageFrom, int pageTo, String outputName, String outputFolder){
Document document = new Document();
OutputStream os = null;
try{
os = new FileOutputStream(outputFolder + outputName);
// Create a writer for the outputstream
PdfWriter writer = PdfWriter.getInstance(document, os);
document.open();
PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
PdfImportedPage page;
while(pageFrom < pageTo) {
document.newPage();
page = writer.getImportedPage(reader, pageFrom);
cb.addTemplate(page, 0, 0);
pageFrom++;
}
os.flush();
document.close();
os.close();
}catch(Exception ex){
log.error(ex.getMessage());
}finally {
if (document.isOpen())
document.close();
try {
if (os != null)
os.close();
} catch (IOException ioe) {
log.error(ioe.getMessage());
}
}
}
Your help is much appreciated.
Thanks in advance! :)
You get an ArrayList<HashMap> when you call SimpleBookmark.getBookmark(reader) (cast it if you need to). Iterate through that ArrayList and look at its structure: if a bookmark has sons (as you call them), its map contains another list with the same structure under the "Kids" key.
A recursive method could be the solution.
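Following that suggestion: in iText 5, each map returned by `SimpleBookmark.getBookmark` holds the bookmark's "Title", a "Page" string that starts with the page number (e.g. "3 XYZ 0 792 0", which is why the code above splits on spaces) and, for bookmarks with children, a "Kids" list of maps with the same structure. The pure-Java sketch below flattens that tree recursively, remembering each bookmark's level, and then computes a page range for every bookmark of a chosen level, ending just before the next bookmark on the same or a higher level. Class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Pure-Java sketch: walk the bookmark maps recursively and compute page ranges for one level. */
final class BookmarkLevels {

    /** One bookmark with its depth (1 = top level) and start page. */
    static final class Entry {
        final int level;
        final String title;
        final int page;

        Entry(int level, String title, int page) {
            this.level = level;
            this.title = title;
            this.page = page;
        }
    }

    /** Depth-first flattening: "Kids" holds the child list, "Page" starts with the page number. */
    @SuppressWarnings("unchecked")
    static void flatten(List<? extends Map<String, Object>> bookmarks, int level, List<Entry> out) {
        if (bookmarks == null) {
            return;
        }
        for (Map<String, Object> bm : bookmarks) {
            Object page = bm.get("Page"); // absent for bookmarks that do not jump to a page
            if (page != null) {
                int p = Integer.parseInt(((String) page).trim().split(" ")[0]);
                out.add(new Entry(level, (String) bm.get("Title"), p));
            }
            flatten((List<? extends Map<String, Object>>) bm.get("Kids"), level + 1, out);
        }
    }

    /** Inclusive ranges [from, to] for every entry of the wanted level. */
    static List<int[]> ranges(List<Entry> all, int wantedLevel, int totalPages) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < all.size(); i++) {
            Entry e = all.get(i);
            if (e.level != wantedLevel) {
                continue;
            }
            int to = totalPages; // the last entry runs to the end of the document
            for (int j = i + 1; j < all.size(); j++) {
                if (all.get(j).level <= wantedLevel) { // next sibling or a bookmark higher up
                    to = Math.max(e.page, all.get(j).page - 1);
                    break;
                }
            }
            result.add(new int[] { e.page, to });
        }
        return result;
    }
}
```

Since `extractBookmarkToPDF` in the question treats `pageTo` as exclusive, a range `r` would be passed as `extractBookmarkToPDF(reader, r[0], r[1] + 1, ...)`.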
For reference, for those looking at this with iText 7:
public void walkOutlines(PdfOutline outline, Map<String, PdfObject> names, PdfDocument pdfDocument,List<String>titles,List<Integer>pageNum) { //----------loop traversing all paths
for (PdfOutline child : outline.getAllChildren()){
if(child.getDestination() != null) {
prepareIndexFile(child,names,pdfDocument,titles,pageNum);
}
}
}
//-----Getting pageNumbers from outlines
public void prepareIndexFile(PdfOutline outline, Map<String, PdfObject> names, PdfDocument pdfDocument,List<String>titles,List<Integer>pageNum) {
String title = outline.getTitle();
PdfDestination pdfDestination = outline.getDestination();
String pdfStr = ((PdfString)pdfDestination.getPdfObject()).toUnicodeString();
PdfArray array = (PdfArray) names.get(pdfStr);
PdfObject pdfObj = array != null ? array.get(0) : null;
Integer pageNumber = pdfDocument.getPageNumber((PdfDictionary)pdfObj);
titles.add(title);
pageNum.add(pageNumber);
if(outline.getAllChildren().size() > 0) {
for (PdfOutline child : outline.getAllChildren()){
prepareIndexFile(child,names,pdfDocument,titles,pageNum);
}
}
}
public boolean splitPdf(String inputFile, final String outputFolder) {
boolean splitSuccess = true;
PdfDocument pdfDoc = null;
try {
PdfReader pdfReaderNew = new PdfReader(inputFile);
pdfDoc = new PdfDocument(pdfReaderNew);
final List<String> titles = new ArrayList<String>();
List<Integer> pageNum = new ArrayList<Integer>();
PdfNameTree destsTree = pdfDoc.getCatalog().getNameTree(PdfName.Dests);
Map<String, PdfObject> names = destsTree.getNames();//--------------------------------------Core logic for getting names
PdfOutline root = pdfDoc.getOutlines(false);//--------------------------------------Core logic for getting outlines
walkOutlines(root,names, pdfDoc, titles, pageNum); //------Logic to get bookmarks and pageNumbers
if (titles == null || titles.size()==0) {
splitSuccess = false;
}else { //------Proceed if it has bookmarks
for(int i=0;i<titles.size();i++) {
String title = titles.get(i);
String startPageNmStr =""+pageNum.get(i);
int startPage = Integer.parseInt(startPageNmStr);
int endPage = startPage;
if(i == titles.size() - 1) {
endPage = pdfDoc.getNumberOfPages();
}else {
int nextPage = pageNum.get(i+1);
if(nextPage > startPage) {
endPage = nextPage - 1;
}else {
endPage = nextPage;
}
}
String outFileName = outputFolder + File.separator + getFileName(title) + ".pdf";
PdfWriter pdfWriter = new PdfWriter(outFileName);
PdfDocument newDocument = new PdfDocument(pdfWriter, new DocumentProperties().setEventCountingMetaInfo(null));
pdfDoc.copyPagesTo(startPage, endPage, newDocument);
newDocument.close();
pdfWriter.close();
}
}
}catch(Exception e){
//---log
splitSuccess = false;
}finally {
if (pdfDoc != null) {
pdfDoc.close();
}
}
return splitSuccess;
}
