pdf file is opening in a bad way after download

Using the iText 2.1.7 jar (or any newer version), I merge several files into one PDF and then download it. The downloaded file opens with the bottom part of some pages missing. The problem is reproducible whenever an image covers the entire page of the PDF: after downloading, I cannot see the whole image in the new PDF. Any idea?
switch (type){
case PDF_FILE : reader = new PdfReader(filePath);
break;
//...code...
}
if (reader != null) {
copyToWriter(out, reader, annotations, fileLabel);
}
Code in the copyToWriter function:
private void copyToWriter(OutputStream out, PdfReader reader, Annotation[] annotations, String fileLabel) throws DocumentException, IOException, BadPdfFormatException {
PdfDictionary outi = null;
PdfICCBased ib = null;
if (reader != null) {
reader.consolidateNamedDestinations();
// we retrieve the total number of pages
int n = reader.getNumberOfPages();
List bookmarks = SimpleBookmark.getBookmark(reader);
if (bookmarks != null) {
// original bookmark copy and page Shift
if (_pageOffset != 0) {
SimpleBookmark.shiftPageNumbers(bookmarks, _pageOffset,
null);
}
}
if(_setMergeBookmark) {
// we add specifics bookmarks for beginning page of each added file.
//...code...(not used)
}
else {
// original bookmark copy and page Shift
if (bookmarks != null) {
_master.addAll(bookmarks);
}
}
_pageOffset += n;
if (_writer == null) {
// step 1: creation of a document-object
_document = new Document();
// step 2: we create a writer that listens to the document
_writer = PdfWriter.getInstance(_document, out);
if (_stamp != null) {
_stamp.setTotalPage( n);
_writer.setPageEvent(_stamp);
}
if(_setMergeBookmark) {
_writer.setViewerPreferences(PdfWriter.PageLayoutSinglePage
| PdfWriter.PageModeUseOutlines);
}
else {
_writer.setViewerPreferences(PdfWriter.PageLayoutSinglePage
| PdfWriter.PageModeUseNone);
}
_writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
if (_pdfA_pdfX) {
//...code... (not used)
}
//
// step 3: we open the document
_document.open();
//
}
// adding the content
PdfContentByte cb = _writer.getDirectContent();
// step 4: we add content
PdfImportedPage page;
for (int i = 0; i < n;) {
++i;
page = _writer.getImportedPage(reader, i);
// get PageSize to ensure landscape/portrait is kept
// also look at rotation tag then we HAD TO use getPageSizeWithRotation
// instead of getPageSize
Rectangle gle = reader.getPageSizeWithRotation(i);
int rotation = reader.getPageRotation(i);
_document.setPageSize( gle);
_document.newPage();
float[] matf = new float[6];
if (rotation == 90) {
matf[0] = 0;
matf[1] = -1f;
matf[2] = 1f;
matf[3] = 0;
matf[4] = 0;
matf[5] = gle.getHeight();
// use transformation matrix to rotate imported page according 'rotation' value
// cb.addTemplate(page, 0, -1f, 1f, 0, 0, gle.getHeight());
}
else if (rotation == 270) {
matf[0] = 0;
matf[1] = 1f;
matf[2] = -1f;
matf[3] = 0;
matf[4] = gle.getWidth();
matf[5] = 0;
// use transformation matrix to rotate imported page according 'rotation' value
// cb.addTemplate(page, 0, 1f, -1f, 0, gle.getWidth(), 0);
}
else { //my case here
matf[0] = 1f;
matf[1] = 0;
matf[2] = 0;
matf[3] = 1f;
matf[4] = 0;
matf[5] = 0;
// cb.addTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
cb.addTemplate(page, matf[0], matf[1], matf[2], matf[3], matf[4], matf[5]);
//copy text annotation on destination
PdfDictionary pageannot = reader.getPageN(i);
PdfArray annots = null;
annots = pageannot.getAsArray(PdfName.ANNOTS);
//if annots is null... code...
}
}
}
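One detail in the loop above may be worth checking, although it is not claimed to be the cause of the clipped pages: a page whose rotation is 180 falls into the final else branch and is drawn with the identity matrix. Below is a minimal, stdlib-only sketch of the same matrix selection with a 180 case added. The class name PageRotationMatrix and the method forRotation are hypothetical; the six values are returned in the order used by cb.addTemplate(page, a, b, c, d, e, f), and the width and height are assumed to be the rotated page size, as returned by getPageSizeWithRotation.

```java
/** Hypothetical helper: picks the template matrix for a page rotation value. */
final class PageRotationMatrix {

    private PageRotationMatrix() {
    }

    /**
     * @param rotation      page rotation in degrees (a multiple of 90)
     * @param rotatedWidth  page width after rotation
     * @param rotatedHeight page height after rotation
     * @return the matrix values {a, b, c, d, e, f}
     */
    static float[] forRotation(int rotation, float rotatedWidth, float rotatedHeight) {
        // Normalize so that values such as -90 or 450 are handled as well.
        int normalized = ((rotation % 360) + 360) % 360;
        switch (normalized) {
            case 90:
                // Same values as the rotation == 90 branch in the question.
                return new float[] {0f, -1f, 1f, 0f, 0f, rotatedHeight};
            case 180:
                // Not handled in the question: flip both axes, then shift back onto the page.
                return new float[] {-1f, 0f, 0f, -1f, rotatedWidth, rotatedHeight};
            case 270:
                // Same values as the rotation == 270 branch in the question.
                return new float[] {0f, 1f, -1f, 0f, rotatedWidth, 0f};
            default:
                // No rotation: identity matrix.
                return new float[] {1f, 0f, 0f, 1f, 0f, 0f};
        }
    }
}
```

The result could replace the hand-filled matf array, for example float[] matf = PageRotationMatrix.forRotation(rotation, gle.getWidth(), gle.getHeight());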

PDFBOX 2.0+ java flatten annotations freetext created by foxit

I ran into a very tough issue. We have forms that were supposed to be filled out, but some people used free-text annotation comments in Foxit instead of filling in the form fields, so the annotations never flatten. When our rendering software generates the final document, the annotations are not included.
The solution I tried is to go through the document, get each annotation's text content, write it to the PDF so it appears in the final document, and then remove the annotation itself. The problem is that I don't know the font, line spacing, etc. that the annotation uses, so I cannot work out how to get them from PDFBox and recreate the text exactly as the annotation looks on the unflattened form.
Basically, I want to flatten free-text annotations created in Foxit (the Typewriter comment feature).
Here is the code. It works, but I am struggling to figure out how to write the annotations to my final PDF document. Flattening the AcroForm does not work because these are not AcroForm fields. The live code filters out anything that is not a FreeText annotation, but the code below should show my issue.
public static void main(String [] args)
{
String startDoc = "C:/test2/test.pdf";
String finalFlat = "C:/test2/test_FLAT.pdf";
try {
// for testing
try {
//BasicConfigurator.configure();
File myFile = new File(startDoc);
PDDocument pdDoc = PDDocument.load( myFile );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// clear the NeedAppearances flag
pdAcroForm.setNeedAppearances(false);
// correct the missing page link for the annotations
for (PDPage page : pdDoc.getPages()) {
for (PDAnnotation annot : page.getAnnotations()) {
System.out.println(annot.getContents());
System.out.println(annot.isPrinted());
System.out.println(annot.isLocked());
System.out.println(annot.getAppearance().toString());
PDPageContentStream contentStream = new PDPageContentStream(pdDoc, page, PDPageContentStream.AppendMode.APPEND,true,true);
int fontHeight = 14;
contentStream.setFont(PDType1Font.TIMES_ROMAN, fontHeight);
float height = annot.getRectangle().getLowerLeftY();
String s = annot.getContents().replaceAll("\t", " ");
String ss[] = s.split("\\r");
for(String sss : ss)
{
contentStream.beginText();
contentStream.newLineAtOffset(annot.getRectangle().getLowerLeftX(),height );
contentStream.showText(sss);
height = height + fontHeight * 2 ;
contentStream.endText();
}
contentStream.close();
page.getAnnotations().remove(annot);
}
}
pdAcroForm.flatten();
pdDoc.save(finalFlat);
pdDoc.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
catch (Exception e) {
System.err.println("Exception: " + e.getLocalizedMessage());
}
}

This was not a fun one. After a million different tests I STILL do not understand all the nuances, but this is the version that appears to flatten all PDF files and annotations, as long as the annotations are visible in the PDF. I tested files from about half a dozen PDF creators, and if an annotation is visible on a page, this hopefully flattens it. I suspect there is a better way that pulls the appearance matrix and transforms it, but this is the only way I got it to work everywhere.
public static void flattenv3(String startDoc, String endDoc) {
org.apache.log4j.Logger.getRootLogger().setLevel(org.apache.log4j.Level.INFO);
String finalFlat = endDoc;
try {
try {
//BasicConfigurator.configure();
File myFile = new File(startDoc);
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
if (pdAcroForm != null) {
pdAcroForm.setNeedAppearances(false);
pdAcroForm.flatten();
}
// tracks whether this page's existing content was already wrapped (resetContext = true)
boolean isContentStreamWrapped;
int ii = 0;
for (PDPage page: pdDoc.getPages()) {
PDPageContentStream contentStream;
isContentStreamWrapped = false;
List < PDAnnotation > annotations = new ArrayList < > ();
for (PDAnnotation annotation: page.getAnnotations()) {
if (!annotation.isInvisible() && !annotation.isHidden() && annotation.getNormalAppearanceStream() != null)
{
ii++;
if (ii > 1) {
// contentStream.close();
// continue;
}
if (!isContentStreamWrapped) {
contentStream = new PDPageContentStream(pdDoc, page, AppendMode.APPEND, true, true);
isContentStreamWrapped = true;
} else {
contentStream = new PDPageContentStream(pdDoc, page, AppendMode.APPEND, true);
}
PDAppearanceStream appearanceStream = annotation.getNormalAppearanceStream();
PDFormXObject fieldObject = new PDFormXObject(appearanceStream.getCOSObject());
contentStream.saveGraphicsState();
boolean needsTranslation = resolveNeedsTranslation(appearanceStream);
Matrix transformationMatrix = new Matrix();
boolean transformed = false;
float lowerLeftX = annotation.getNormalAppearanceStream().getBBox().getLowerLeftX();
float lowerLeftY = annotation.getNormalAppearanceStream().getBBox().getLowerLeftY();
PDRectangle bbox = appearanceStream.getBBox();
PDRectangle fieldRect = annotation.getRectangle();
float xScale = fieldRect.getWidth() - bbox.getWidth();
transformed = true;
lowerLeftX = fieldRect.getLowerLeftX();
lowerLeftY = fieldRect.getLowerLeftY();
if (bbox.getLowerLeftX() <= 0 && bbox.getLowerLeftY() < 0 && Math.abs(xScale) < 1) //BASICALLY EQUAL TO 0 WITH ROUNDING
{
lowerLeftY = fieldRect.getLowerLeftY() - bbox.getLowerLeftY();
if (bbox.getLowerLeftX() < 0 && bbox.getLowerLeftY() < 0) //THis is for the o
{
lowerLeftX = lowerLeftX - bbox.getLowerLeftX();
}
} else if (bbox.getLowerLeftX() == 0 && bbox.getLowerLeftY() < 0 && xScale >= 0) {
lowerLeftX = fieldRect.getUpperRightX();
} else if (bbox.getLowerLeftY() <= 0 && xScale >= 0) {
lowerLeftY = fieldRect.getLowerLeftY() - bbox.getLowerLeftY() - xScale;
} else if (bbox.getUpperRightY() <= 0) {
if (annotation.getNormalAppearanceStream().getMatrix().getShearY() < 0) {
lowerLeftY = fieldRect.getUpperRightY();
lowerLeftX = fieldRect.getUpperRightX();
}
} else {
}
transformationMatrix.translate(lowerLeftX,
lowerLeftY);
contentStream.transform(transformationMatrix);
contentStream.drawForm(fieldObject);
contentStream.restoreGraphicsState();
contentStream.close();
}
}
page.setAnnotations(annotations);
}
pdDoc.save(finalFlat);
pdDoc.close();
File file = new File(finalFlat);
// Desktop.getDesktop().browse(file.toURI());
} catch (Exception e) {
e.printStackTrace();
}
} catch (Exception e) {
System.err.println("Exception: " + e.getLocalizedMessage());
}
}
}
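The "better way by pulling the matrix" mentioned above is presumably the placement rule that the PDF specification gives for appearance streams (ISO 32000-1, section 12.5.5): transform the appearance BBox by the appearance Matrix, take the upright bounding box of the result, and compute a matrix A that scales and translates that box onto the annotation Rect. The following is a minimal sketch of that rule using only java.awt.geom; the class and method names are hypothetical, and it has not been tested against the Foxit files discussed here.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: computes the matrix A that fits an appearance into its Rect. */
final class AppearanceAlignment {

    private AppearanceAlignment() {
    }

    /**
     * @param bbox   the appearance stream's BBox
     * @param matrix the appearance stream's Matrix (identity if absent)
     * @param rect   the annotation's Rect in default user space
     * @return A, which maps the transformed BBox onto rect
     */
    static AffineTransform alignmentMatrix(Rectangle2D bbox, AffineTransform matrix, Rectangle2D rect) {
        // Step 1: transform the BBox by Matrix and take its upright bounding box.
        Rectangle2D box = matrix.createTransformedShape(bbox).getBounds2D();

        // Step 2: scale factors that make the transformed box as large as rect.
        // A degenerate (zero-size) box is left unscaled to avoid dividing by zero.
        double sx = box.getWidth() == 0 ? 1 : rect.getWidth() / box.getWidth();
        double sy = box.getHeight() == 0 ? 1 : rect.getHeight() / box.getHeight();

        // Step 3: move the box to the origin, scale it, then move it to rect's corner.
        // AffineTransform applies the most recently added operation first.
        AffineTransform a = new AffineTransform();
        a.translate(rect.getX(), rect.getY());
        a.scale(sx, sy);
        a.translate(-box.getX(), -box.getY());
        return a;
    }
}
```

When the appearance is drawn as a form XObject, the viewer applies the form's own Matrix while painting it, so only A would need to be written to the page content stream before drawForm. Assuming PDFBox 2.0, A could be converted with new Matrix(a) and passed to contentStream.transform.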

How to move the MediaBox to (0,0) with iText (iTextSharp)

The MediaBox of my PDF file starts at (-8,-8), and I want it to start at (0,0).
I tried to set it directly, but then the contents of the file were offset.
So I want to change the MediaBox coordinates and move the content as well.
Here's the iTextSharp code (C#). A solution with Java iText would also be welcome.
using (PdfReader pdfReader = new PdfReader(@"MediaBoxZero.pdf"))
{
using (PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(@"MediaBoxZero_Result.pdf", FileMode.Create)))
{
var mediaBox = pdfReader.GetBoxSize(1, "media");
PdfArray mediaBoxN = new PdfArray();
mediaBoxN.Add(new float[] { 0, 0, mediaBox.Width, mediaBox.Height });
for (int curPageNum = 1; curPageNum <= pdfReader.NumberOfPages; ++curPageNum)
{
PdfDictionary pagedict = pdfReader.GetPageN(curPageNum);
pagedict.Put(PdfName.MEDIABOX, mediaBoxN);
}
}
}
I also tried an affine transformation, but it didn't work. My assumption is that affine transformations only take effect when generating new PDFs, whereas I want to edit an existing PDF.
using (PdfReader pdfReader = new PdfReader(@"MediaBoxZero.pdf"))
{
using (PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(@"MediaBoxZero_Result.pdf", FileMode.Create)))
{
PdfContentByte pb = stamper.GetOverContent(1);
AffineTransform at = new AffineTransform();
at.Translate(100,0);
pb.Transform(at);
pb.ConcatCTM(at);
//var mediaBox = pdfReader.GetBoxSize(1, "media");
//PdfArray mediaBoxN = new PdfArray();
//mediaBoxN.Add(new float[] { 0, 0, mediaBox.Width, mediaBox.Height });
//for (int curPageNum = 1; curPageNum <= pdfReader.NumberOfPages; ++curPageNum)
//{
// PdfDictionary pagedict = pdfReader.GetPageN(curPageNum);
// foreach (var item in pagedict.GetEnumerator())
// {
// }
// pagedict.Put(PdfName.MEDIABOX, mediaBoxN);
//}
}
}
}

You can shift the page contents like this with iTextSharp to match the MediaBox change:
using (PdfReader pdfReader = new PdfReader(SOURCE_PDF))
{
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
Rectangle mediaBox = pdfReader.GetPageSize(i);
if (mediaBox.Left == 0 && mediaBox.Bottom == 0)
continue;
PdfDictionary pageDict = pdfReader.GetPageN(i);
pageDict.Put(PdfName.MEDIABOX, new PdfArray { new PdfNumber(0), new PdfNumber(0),
new PdfNumber(mediaBox.Width), new PdfNumber(mediaBox.Height) });
Rectangle cropBox = pdfReader.GetBoxSize(i, "crop");
if (cropBox != null)
{
pageDict.Put(PdfName.CROPBOX, new PdfArray { new PdfNumber(cropBox.Left - mediaBox.Left),
new PdfNumber(cropBox.Bottom-mediaBox.Bottom), new PdfNumber(cropBox.Right - mediaBox.Left),
new PdfNumber(cropBox.Top - mediaBox.Bottom) });
}
using (MemoryStream stream = new MemoryStream())
{
string translation = String.Format(CultureInfo.InvariantCulture, "1 0 0 1 {0} {1} cm\n", -mediaBox.Left, -mediaBox.Bottom);
byte[] translationBytes = Encoding.ASCII.GetBytes(translation);
stream.Write(translationBytes, 0, translationBytes.Length);
byte[] contentBytes = pdfReader.GetPageContent(i);
stream.Write(contentBytes, 0, contentBytes.Length);
pdfReader.SetPageContent(i, stream.ToArray());
}
}
using (FileStream fileStream = new FileStream(@"MediaBox-normalized.pdf", FileMode.Create))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, fileStream))
{
}
}
Some remarks:
You cannot simply manipulate the UnderContent of the respective page because iText attempts to prevent changes to the graphics state there (like a change of the current transformation matrix) from bleeding through to the existing content. Thus, here we update the page content on the byte level.
The code updates the CropBox (if set) alongside the MediaBox. Strictly speaking it should also update the other boxes known to PDF pages.
The code ignores annotations. If your PDFs have annotations, you have to also move the annotation Rects and other annotation properties in userspace coordinates, like QuadPoints, Vertices, etc.
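Since the question also asked about Java, here is a minimal, stdlib-only sketch of the two pieces of arithmetic in the C# code above: building the "cm" translation prefix for the page content and shifting a box by the MediaBox origin. The class and method names are hypothetical, and the iText calls themselves are left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper mirroring the byte-level steps of the C# answer. */
final class MediaBoxShift {

    private MediaBoxShift() {
    }

    /**
     * Prepends a translation to the page content so that the old MediaBox
     * origin (left, bottom) ends up at (0, 0).
     */
    static byte[] prependTranslation(byte[] content, float left, float bottom) {
        // Locale.ROOT keeps the decimal separator a dot, as PDF syntax requires.
        String cm = String.format(Locale.ROOT, "1 0 0 1 %.4f %.4f cm\n", -left, -bottom);
        byte[] prefix = cm.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(prefix.length + content.length);
        out.write(prefix, 0, prefix.length);
        out.write(content, 0, content.length);
        return out.toByteArray();
    }

    /** Shifts a box given as {llx, lly, urx, ury} by the old MediaBox origin. */
    static float[] shiftBox(float[] box, float left, float bottom) {
        return new float[] {box[0] - left, box[1] - bottom, box[2] - left, box[3] - bottom};
    }
}
```

With Java iText 5, the resulting bytes would presumably be read and written back through PdfReader.getPageContent(i) and PdfReader.setPageContent(i, bytes), in the same way as in the C# version.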

PDFBox: put two A4 pages on one A3

I have a PDF document with one or more A4 pages.
The resulting PDF document should be A3, where each page contains two pages from the original (odd on the left, even on the right).
I already managed to render the A4 pages into images, and the odd pages are placed successfully on the first half of the new A3 pages, but I cannot get the even pages placed.
public class CreateLandscapePDF {
public void renderPDF(File inputFile, String output) {
PDDocument docIn = null;
PDDocument docOut = null;
float width = 0;
float height = 0;
float posX = 0;
float posY = 0;
try {
docIn = PDDocument.load(inputFile);
PDFRenderer pdfRenderer = new PDFRenderer(docIn);
docOut = new PDDocument();
int pageCounter = 0;
for(PDPage pageIn : docIn.getPages()) {
pageIn.setRotation(270);
BufferedImage bufferedImage = pdfRenderer.renderImage(pageCounter);
width = bufferedImage.getHeight();
height = bufferedImage.getWidth();
PDPage pageOut = new PDPage(PDRectangle.A3);
PDImageXObject image = LosslessFactory.createFromImage(docOut, bufferedImage);
PDPageContentStream contentStream = new PDPageContentStream(docOut, pageOut, AppendMode.APPEND, true, true);
if((pageCounter & 1) == 0) {
pageOut.setRotation(90);
docOut.addPage(pageOut);
posX = 0;
posY = 0;
} else {
posX = 0;
posY = width;
}
contentStream.drawImage(image, posX, posY);
contentStream.close();
bufferedImage.flush();
pageCounter++;
}
docOut.save(output + "\\LandscapeTest.pdf");
docOut.close();
docIn.close();
} catch(IOException io) {
io.printStackTrace();
}
}
}
I'm using Apache PDFBox 2.0.2 (pdfbox-app-2.0.2.jar)

Thank you very much for your help and the link to the other question. I think I had already read it but wasn't able to use it in my code yet.
In the end PDF Clown did the job, though I think it's not very nice to use PDFBox and PDF Clown in the same program.
Anyway, here's my working code to combine A4 pages on A3 paper.
public class CombinePages {
public void run(String input, String output) {
try {
Document source = new File(input).getDocument();
Pages sourcePages = source.getPages();
Document target = new File().getDocument();
Page targetPage = null;
int pageCounter = 0;
double moveByX = .0;
for(Page sourcePage : source.getPages()) {
if((pageCounter & 1) == 0) {
//even page gets a blank page
targetPage = new Page(target);
target.setPageSize(PageFormat.getSize(PageFormat.SizeEnum.A3, PageFormat.OrientationEnum.Landscape));
target.getPages().add(targetPage);
moveByX = .0;
} else {
moveByX = .50;
}
//get content from source page
XObject xObject = sourcePages.get(pageCounter).toXObject(target);
PrimitiveComposer composer = new PrimitiveComposer(targetPage);
Dimension2D targetSize = targetPage.getSize();
Dimension2D sourceSize = xObject.getSize();
composer.showXObject(xObject, new Point2D.Double(targetSize.getWidth() * moveByX, targetSize.getHeight() * .0), new Dimension(sourceSize.getWidth(), sourceSize.getHeight()), XAlignmentEnum.Left, YAlignmentEnum.Top, 0);
composer.flush();
pageCounter++;
}
target.getFile().save(output + "\\CombinePages.pdf", SerializationModeEnum.Standard);
source.getFile().close();
} catch (FileNotFoundException fnf) {
log.error(fnf);
} catch (IOException io) {
log.error(io);
}
}
}
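Independent of the PDF library, the placement rule used above (first page of each pair on the left half, second on the right half) can be written as plain arithmetic. This is a small sketch with hypothetical names; pageIndex is assumed to be 0-based, so the "odd" pages of the question are the even indices.

```java
/** Hypothetical helper: maps a source page index to its place on a two-up sheet. */
final class TwoUpLayout {

    private TwoUpLayout() {
    }

    /** Number of target sheets needed for pageCount source pages. */
    static int sheetCount(int pageCount) {
        return (pageCount + 1) / 2;
    }

    /** 0-based index of the target sheet that receives the given source page. */
    static int sheetIndex(int pageIndex) {
        return pageIndex / 2;
    }

    /** Horizontal offset on the sheet: left half for even indices, right half for odd ones. */
    static float xOffset(int pageIndex, float sheetWidth) {
        return (pageIndex % 2 == 0) ? 0f : sheetWidth / 2f;
    }
}
```

For a landscape A3 sheet (about 1190.55 pt wide), the second page of each pair starts at roughly 595.28 pt, which corresponds to moveByX = .50 in the code above.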

How to find blank pages inside a PDF using PDFBox?

Here is the challenge I'm currently facing.
I have a lot of PDFs and I have to remove the blank pages inside them and display only the pages with content (text or images).
The problem is that those pdfs are scanned documents.
So the blank pages have some dirt left behind by the scanner.

I did some research and ended up with this code, which treats a page as blank when at least 99% of its pixels are white or light gray.
I needed the gray factor as the scanned documents sometimes are not pure white.
private static Boolean isBlank(PDPage pdfPage) throws IOException {
BufferedImage bufferedImage = pdfPage.convertToImage();
long count = 0;
int height = bufferedImage.getHeight();
int width = bufferedImage.getWidth();
Double areaFactor = (width * height) * 0.99;
for (int x = 0; x < width ; x++) {
for (int y = 0; y < height ; y++) {
Color c = new Color(bufferedImage.getRGB(x, y));
// verify light gray and white
if (c.getRed() == c.getGreen() && c.getRed() == c.getBlue()
&& c.getRed() >= 248) {
count++;
}
}
}
if (count >= areaFactor) {
return true;
}
return false;
}
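The check above only counts pixels whose red, green and blue values are exactly equal, so a scan with a slight color tint may not be recognized as blank. A possible variation, sketched here under that assumption, compares each channel against a threshold instead and avoids creating a Color object per pixel. The class name, method name and parameters are hypothetical.

```java
import java.awt.image.BufferedImage;

/** Hypothetical variation of isBlank that tolerates slightly tinted paper. */
final class BlankPageCheck {

    private BlankPageCheck() {
    }

    /**
     * @param image            rendered page image
     * @param minChannel       lowest value (0-255) a channel may have to count as light
     * @param requiredFraction fraction of pixels (0.0-1.0) that must be light
     * @return true if enough pixels are light in all three channels
     */
    static boolean isMostlyLight(BufferedImage image, int minChannel, double requiredFraction) {
        int width = image.getWidth();
        int height = image.getHeight();
        long light = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // A pixel counts as light when every channel reaches the threshold.
                if (r >= minChannel && g >= minChannel && b >= minChannel) {
                    light++;
                }
            }
        }
        return light >= (long) Math.ceil(requiredFraction * width * height);
    }
}
```

Calling isMostlyLight(image, 248, 0.99) is close to the original check, while a lower threshold such as 235 would accept more scanner noise; suitable values depend on the scans.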

@Shoyo's code works fine for PDFBox versions before 2.0. For future readers: not much changes, but just in case, here is the code for PDFBox 2.0+.
In your main method (by main, I mean the place where you load your PDF into a PDDocument):
try {
PDDocument document = PDDocument.load(new File("/home/codemantra/Downloads/tetml_ct_access/C.pdf"));
PDFRenderer renderedDoc = new PDFRenderer(document);
for (int pageNumber = 0; pageNumber < document.getNumberOfPages(); pageNumber++) {
if(isBlank(renderedDoc.renderImage(pageNumber))) {
System.out.println("Blank Page Number : " + (pageNumber + 1));
}
}
} catch (Exception e) {
e.printStackTrace();
}
And isBlank method will just have BufferedImage passed in:
private static Boolean isBlank(BufferedImage pageImage) throws IOException {
BufferedImage bufferedImage = pageImage;
long count = 0;
int height = bufferedImage.getHeight();
int width = bufferedImage.getWidth();
Double areaFactor = (width * height) * 0.99;
for (int x = 0; x < width; x++) {
for (int y = 0; y < height; y++) {
Color c = new Color(bufferedImage.getRGB(x, y));
if (c.getRed() == c.getGreen() && c.getRed() == c.getBlue() && c.getRed() >= 248) {
count++;
}
}
}
if (count >= areaFactor) {
return true;
}
return false;
}
All credit goes to @Shoyo.
Update:
Some PDFs contain pages that say "This Page was Intentionally Left Blank", which the above code considers blank. If that is what you need, feel free to use the above code. My requirement, however, was to filter out only the pages that are completely blank (no images and no fonts). So I ended up using this code (plus, it runs faster :P):
public static void main(String[] args) {
try {
PDDocument document = PDDocument.load(new File("/home/codemantra/Downloads/CTP2040.pdf"));
PDPageTree allPages = document.getPages();
Integer pageNumber = 1;
for (PDPage page : allPages) {
Iterable<COSName> xObjects = page.getResources().getXObjectNames();
Iterable<COSName> fonts = page.getResources().getFontNames();
if(xObjects.spliterator().getExactSizeIfKnown() == 0 && fonts.spliterator().getExactSizeIfKnown() == 0) {
System.out.println(pageNumber);
}
pageNumber++;
}
} catch (Exception e) {
e.printStackTrace();
}
}
This will return the page numbers of those pages which are completely blank.
Hope this helps someone! :)
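A note on the emptiness test used above: Spliterator.getExactSizeIfKnown() returns -1 when the size is not known, which is the case for the default spliterator of a plain Iterable. The check therefore relies on the returned Iterable being backed by a sized collection. A more defensive alternative, sketched below with a hypothetical helper name, simply asks the iterator whether it has a first element.

```java
/** Hypothetical helper: emptiness check that works for any Iterable. */
final class IterableChecks {

    private IterableChecks() {
    }

    /** Returns true if the iterable is null or yields no elements. */
    static boolean isEmpty(Iterable<?> items) {
        return items == null || !items.iterator().hasNext();
    }
}
```

In the loop above, the condition could then read IterableChecks.isEmpty(xObjects) && IterableChecks.isEmpty(fonts).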

@Pramesh Bajracharya, your solution for finding blank pages in a PDF document works well!
If the requirement is to remove the blank pages, the same code can be extended as below:
// 0-based indices of the blank pages (PDDocument.removePage takes a 0-based index)
List<Integer> blankPageList = new ArrayList<Integer>();
int pageIndex = 0;
for( PDPage page : allPages )
{
Iterable<COSName> xObjects = page.getResources().getXObjectNames();
Iterable<COSName> fonts = page.getResources().getFontNames();
// condition to determine if the page is a blank page
if( xObjects.spliterator().getExactSizeIfKnown() == 0 && fonts.spliterator().getExactSizeIfKnown() == 0 )
{
blankPageList.add( pageIndex );
}
pageIndex++;
}
// remove the blank pages from last to first so that the remaining indices stay valid
for( int k = blankPageList.size() - 1; k >= 0; k-- )
{
document.removePage( blankPageList.get( k ).intValue() );
}

http://www.rgagnon.com/javadetails/java-detect-and-remove-blank-page-in-pdf.html
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.io.RandomAccessSourceFactory;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
public class RemoveBlankPageFromPDF {
// value where we can consider that this is a blank image
// can be much higher or lower depending of what is considered as a blank page
public static final int BLANK_THRESHOLD = 160;
public static void removeBlankPdfPages(String source, String destination)
throws IOException, DocumentException
{
PdfReader r = null;
RandomAccessSourceFactory rasf = null;
RandomAccessFileOrArray raf = null;
Document document = null;
PdfCopy writer = null;
try {
r = new PdfReader(source);
// deprecated
// RandomAccessFileOrArray raf
// = new RandomAccessFileOrArray(pdfSourceFile);
// itext 5.4.1
rasf = new RandomAccessSourceFactory();
raf = new RandomAccessFileOrArray(rasf.createBestSource(source));
document = new Document(r.getPageSizeWithRotation(1));
writer = new PdfCopy(document, new FileOutputStream(destination));
document.open();
PdfImportedPage page = null;
for (int i=1; i<=r.getNumberOfPages(); i++) {
// first check, examine the resource dictionary for /Font or
// /XObject keys. If either are present -> not blank.
PdfDictionary pageDict = r.getPageN(i);
PdfDictionary resDict = (PdfDictionary) pageDict.get( PdfName.RESOURCES );
boolean noFontsOrImages = true;
if (resDict != null) {
noFontsOrImages = resDict.get( PdfName.FONT ) == null &&
resDict.get( PdfName.XOBJECT ) == null;
}
System.out.println(i + " noFontsOrImages " + noFontsOrImages);
if (!noFontsOrImages) {
byte bContent [] = r.getPageContent(i,raf);
ByteArrayOutputStream bs = new ByteArrayOutputStream();
bs.write(bContent);
System.out.println
(i + " " + bs.size() + " > BLANK_THRESHOLD " + (bs.size() > BLANK_THRESHOLD));
if (bs.size() > BLANK_THRESHOLD) {
page = writer.getImportedPage(r, i);
writer.addPage(page);
}
}
}
}
finally {
if (document != null) document.close();
if (writer != null) writer.close();
if (raf != null) raf.close();
if (r != null) r.close();
}
}
public static void main (String ... args) throws Exception {
removeBlankPdfPages
("C://temp//documentwithblank.pdf", "C://temp//documentwithnoblank.pdf");
}
}

Creating Table of contents for an existing Pdf

I am trying to create a table of contents for an existing PDF file and then merge the table of contents page into the PDF file. The page headings and corresponding page numbers are available in a separate Excel file.
I am using iText for PDF manipulation.
All the examples I came across were about inserting links while creating a new PDF. But in my case I want to create links to existing pages.
Any suggestions or examples would be highly appreciated.

I finally found the answer. Thanks to my friend for pointing me towards an example of this in c#.
The code in Java looks like this:
public class Test1 {
public static void main(String args[]) throws Exception{
PdfReader reader = new PdfReader(new RandomAccessFileOrArray("C:\\test.pdf"), null);
Document doc = new Document(reader.getPageSize(1));
PdfWriter writer = PdfWriter.getInstance(doc, new FileOutputStream("C:\\result.pdf"));
Font link = FontFactory.getFont("Arial", 12, Font.UNDERLINE);
doc.open();
PdfContentByte pdfContentByte = writer.getDirectContent();
Anchor topAnchor = null;
PdfImportedPage page = null;
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
if (i == 1)
{
Anchor click = new Anchor("Click to go to Target");
click.setReference("#target");
Paragraph p1 = new Paragraph();
p1.add(click);
doc.add(p1);
page = writer.getImportedPage(reader, i);
doc.newPage();
pdfContentByte.addTemplate(page, 0, 0);
}
else
{
if (i == 5)
{
Anchor target = new Anchor("My target");
target.setName("target");
Paragraph p3 = new Paragraph();
p3.add(target);
doc.add(p3);
}
page = writer.getImportedPage(reader, i);
doc.newPage();
pdfContentByte.addTemplate(page, 0, 0);
}
}
doc.close();
}
}
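One bookkeeping step that the example above does not cover: when the table of contents is inserted in front of the existing pages, every page number taken from the Excel file moves back by the number of TOC pages. Below is a minimal sketch of that adjustment in plain Java. The class name, the method names and the entriesPerPage parameter are hypothetical, and reading the Excel file is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: adjusts heading page numbers for a TOC inserted in front. */
final class TocPlanner {

    private TocPlanner() {
    }

    /** Number of TOC pages needed when each TOC page holds entriesPerPage lines. */
    static int tocPageCount(int entryCount, int entriesPerPage) {
        if (entriesPerPage <= 0) {
            throw new IllegalArgumentException("entriesPerPage must be positive");
        }
        return (entryCount + entriesPerPage - 1) / entriesPerPage;
    }

    /** Returns heading to page number, shifted by the number of TOC pages in front. */
    static Map<String, Integer> shiftForToc(Map<String, Integer> headings, int entriesPerPage) {
        int offset = tocPageCount(headings.size(), entriesPerPage);
        // LinkedHashMap keeps the headings in their original order.
        Map<String, Integer> shifted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : headings.entrySet()) {
            shifted.put(entry.getKey(), entry.getValue() + offset);
        }
        return shifted;
    }
}
```

The shifted numbers are the ones to print in the table of contents and to use as link targets in the merged document.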

I looked for a patch I made in Flying Saucer that made it possible to click on a link and make it do something. This was a very long time ago so I'm not sure if this will help or not, but it seems to be using iText for this. Here's the snippet of code I think I wrote for this :P
private void processLink(RenderingContext c, Box box) {
Element elem = box.getElement();
if (elem != null) {
NamespaceHandler handler = _sharedContext.getNamespaceHandler();
String uri = handler.getLinkUri(elem);
if (uri != null) {
if (uri.length() > 1 && uri.charAt(0) == '#') {
String anchor = uri.substring(1);
Box target = _sharedContext.getBoxById(anchor);
if (target != null) {
PdfDestination dest = createDestination(c, target);
if (dest != null) {
PdfAction action = new PdfAction();
if (!"".equals(handler.getAttributeValue(elem, "onclick"))) {
action = PdfAction.javaScript(handler.getAttributeValue(elem, "onclick"), _writer);
} else {
action.put(PdfName.S, PdfName.GOTO);
action.put(PdfName.D, dest);
}
com.lowagie.text.Rectangle targetArea = checkLinkArea(c, box);
if (targetArea == null) {
return;
}
targetArea.setBorder(0);
targetArea.setBorderWidth(0);
PdfAnnotation annot = new PdfAnnotation(_writer, targetArea.getLeft(), targetArea.getBottom(),
targetArea.getRight(), targetArea.getTop(), action);
annot.put(PdfName.SUBTYPE, PdfName.LINK);
annot.setBorderStyle(new PdfBorderDictionary(0.0f, 0));
annot.setBorder(new PdfBorderArray(0.0f, 0.0f, 0));
_writer.addAnnotation(annot);
}
}
} else if (uri.indexOf("://") != -1) {
PdfAction action = new PdfAction(uri);
com.lowagie.text.Rectangle targetArea = checkLinkArea(c, box);
if (targetArea == null) {
return;
}
PdfAnnotation annot = new PdfAnnotation(_writer, targetArea.getLeft(), targetArea.getBottom(), targetArea.getRight(),
targetArea.getTop(), action);
annot.put(PdfName.SUBTYPE, PdfName.LINK);
annot.setBorderStyle(new PdfBorderDictionary(0.0f, 0));
annot.setBorder(new PdfBorderArray(0.0f, 0.0f, 0));
_writer.addAnnotation(annot);
}
}
}
}
This is in the org.xhtmlrenderer.pdf.ITextOutputDevice.
I'm not sure what context you're working in. Is that enough information to figure out how to do this?
