iText: Importing styled text and information from an existing PDF

I'm generating PDFs using iText and it works fine. But at some point I need a way to import HTML-styled content from an existing PDF.
I know I could just use the XMLWorker class to generate the text directly from HTML in my own document. But because I'm not sure whether it actually supports all the HTML features I need, I'm looking for a way around it.
Therefore a PDF is generated from the HTML using XSLT. The content of this PDF should then be copied into my document.
There are two ways described in the book ("iText in Action").
The first parses the PDF and gets you the text (or other information) from the document using PdfReaderContentParser and a TextExtractionStrategy.
It looks like this:
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
TextExtractionStrategy strategy;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    strategy = parser.processContent(i, new LocationTextExtractionStrategy());
    document.add(new Chunk(strategy.getResultantText()));
}
But this only prints plain text to the document. Obviously there are more extraction strategies, and maybe one of them does exactly what I want, but I haven't found it yet.
The second way is to add each page of the PDF to your document as a com.itextpdf.text.Image. This is obviously not a good idea, because it adds the entire page to your document even if there is only one line of text in the existing PDF. It's done like this:
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
PdfReader reader = new PdfReader(pdf);
PdfImportedPage page;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    page = writer.getImportedPage(reader, i);
    document.add(Image.getInstance(page));
}
As I said, this copies all the empty space at the end of the PDF as well, but I need to continue my text immediately after the last line of text.
If I could convert this com.itextpdf.text.Image into a java.awt.image.BufferedImage, I could use getSubimage() together with information I can extract from the PDF to cut away all the empty lines. But I wasn't able to find a way to do this.
These are the two ways I found. Since neither of them is suitable for my purpose as is, my question is:
Is there a way to import everything except the empty lines at the end, but including text-style information, tables, and everything else, from a PDF into my document using iText?

You can trim away empty space of the XSLT generated PDF and then import the trimmed pages as in your code.
Example code
The following code borrows from the code in my answer to Using iTextPDF to trim a page's whitespace. In contrast to the code there, though, we have to manipulate the media box, not the crop box, because this is the only box respected by PdfWriter.getImportedPage.
Before importing a page from a given PdfReader, crop it using this method:
static void cropPdf(PdfReader reader) throws IOException
{
    int n = reader.getNumberOfPages();
    for (int i = 1; i <= n; i++)
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        MarginFinder finder = parser.processContent(i, new MarginFinder());
        Rectangle rect = new Rectangle(finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
        PdfDictionary page = reader.getPageN(i);
        page.put(PdfName.MEDIABOX, new PdfArray(new float[]{rect.getLeft(), rect.getBottom(), rect.getRight(), rect.getTop()}));
    }
}
(excerpt from ImportPageWithoutFreeSpace.java)
The extended render listener MarginFinder is taken as is from the answer linked to above. You can find a copy here: MarginFinder.java.
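MarginFinder itself is not reproduced here. As a rough illustration of the idea only (this is not the actual class), a listener of this kind can keep a running union of the bounding boxes of everything drawn on the page; the corners of that union are what cropPdf reads through getLlx(), getLly(), getUrx(), and getUry(). The sketch below uses only java.awt.geom. The class name BoundingBoxTracker and the include method are hypothetical; include stands in for the render callbacks that iText would invoke.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical stand-in for a margin-finding render listener; not the real MarginFinder. */
final class BoundingBoxTracker {
    // Union of all boxes seen so far; null until the first box arrives.
    private Rectangle2D.Float bounds = null;

    /** Records one piece of page content given its lower-left and upper-right corners. */
    void include(float llx, float lly, float urx, float ury) {
        Rectangle2D.Float box = new Rectangle2D.Float(llx, lly, urx - llx, ury - lly);
        if (bounds == null) {
            bounds = box;
        } else {
            bounds.add(box); // grows bounds to the union of both rectangles
        }
    }

    /** True while no content has been recorded; the getters must not be called in that state. */
    boolean isEmpty() {
        return bounds == null;
    }

    float getLlx() { return (float) bounds.getMinX(); }
    float getLly() { return (float) bounds.getMinY(); }
    float getUrx() { return (float) bounds.getMaxX(); }
    float getUry() { return (float) bounds.getMaxY(); }
}
```

A page with no content leaves the tracker empty, so a caller would want to check isEmpty() before building a media box from the corners.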
Example run
Using this code
PdfReader readerText = new PdfReader(docText);
cropPdf(readerText);
PdfReader readerGraphics = new PdfReader(docGraphics);
cropPdf(readerGraphics);
try (FileOutputStream fos = new FileOutputStream(new File(RESULT_FOLDER, "importPages.pdf")))
{
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, fos);
    document.open();
    document.add(new Paragraph("Let's import 'textOnly.pdf'", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
    document.add(Image.getInstance(writer.getImportedPage(readerText, 1)));
    document.add(new Paragraph("and now 'graphicsOnly.pdf'", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
    document.add(Image.getInstance(writer.getImportedPage(readerGraphics, 1)));
    document.add(new Paragraph("That's all, folks!", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
    document.close();
}
finally
{
    readerText.close();
    readerGraphics.close();
}
(excerpt from unit test method testImportPages in ImportPageWithoutFreeSpace.java)
I imported both the page from the docText document and the page from the docGraphics document into a new document with some text before, between, and after.
In the result, source styles are preserved but the free space around the content is discarded.

Related

How do you add multiple images to a PDF with itext7 Java?

The first Google result takes me to "Add multiple images into a single pdf file with iText using java", which was posted 5 years ago. I am not sure which version they are using, because the Image object doesn't even have the getInstance method for me. Needless to say, that link is not much help.
I am trying to create a JavaFX application that loops over multiple JPG images to create a single PDF document. Below is my code, which successfully creates a PDF from 2 images, but I am having trouble making the second image display on the second page.
In the link I posted above, the simple solution was to call document.newPage() and then document.add(img), but my document object doesn't have that method. I am not sure what to do.
PdfWriter writer = new PdfWriter("D:/sample1.pdf");
// Creating a PdfDocument
PdfDocument pdfDoc = new PdfDocument(writer);
// Adding a new page
// I can add multiple pages here, but when I add multiple images they do not
// automatically flow over to the next page.
pdfDoc.addNewPage();
pdfDoc.addNewPage();
// Creating a Document
Document document = new Document(pdfDoc);
String imageFile = "C:/Users/***/Downloads/MAT204/1.3-1.4 HW/test.jpg";
ImageData data = ImageDataFactory.create(imageFile);
Image img = new Image(data);
img.setAutoScale(true);
img.setRotationAngle(-Math.toRadians(90));
// I can add multiple images, but they overlap each other and only
// appear on the first page.
// Is there a way for me to change the current page to write on?
document.add(img);
document.add(img);
// Closing the document
document.close();
System.out.println("PDF Created");
I just want to figure out how to manually add another image before I write a loop to automate the process.

After doing more research I found the answer here.
https://kb.itextpdf.com/home/it7kb/examples/multiple-images
protected void manipulatePdf(String dest) throws Exception {
    Image image = new Image(ImageDataFactory.create(IMAGES[0]));
    PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
    Document doc = new Document(pdfDoc, new PageSize(image.getImageWidth(), image.getImageHeight()));
    for (int i = 0; i < IMAGES.length; i++) {
        image = new Image(ImageDataFactory.create(IMAGES[i]));
        pdfDoc.addNewPage(new PageSize(image.getImageWidth(), image.getImageHeight()));
        image.setFixedPosition(i + 1, 0, 0);
        doc.add(image);
    }
    doc.close();
}

Add text to a specific page of a pdf using iText

I need to create a tool that adds a hyperlink on every other page of a PDF file.
I followed the iText documentation and managed to add the hyperlink, but only on the first page.
My code:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    Font bold = new Font(FontFamily.HELVETICA, 30, Font.BOLD);
    PdfReader reader = new PdfReader(src);
    int count = reader.getNumberOfPages();
    Utils.logInfoMessage("Number of pages: " + count, mLogList);
    if (count < 1) {
        Utils.logErrorMessage("file : " + src + " has no pages", mLogList);
        return;
    }
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    PdfContentByte canvas = stamper.getOverContent(1);
    PdfGState gState = new PdfGState();
    gState.setFillOpacity(0.1f);
    canvas.setGState(gState);
    Chunk chunk = new Chunk("www.google.com", bold);
    chunk.setAnchor("https://www.google.ro/");
    Phrase phrase = new Phrase("");
    phrase.add(chunk);
    ColumnText ct = new ColumnText(canvas);
    ct.setSimpleColumn(36, 700, 559, 750);
    ct.addText(phrase);
    ct.go();
    stamper.close();
    reader.close();
}
Any ideas how to add the hyperlink/text only on a specific page?

You wrote:
I followed the iText documentation and I managed to add the hyperlink but only on the first page
This is the iText documentation: PdfStamper
getOverContent
public PdfContentByte getOverContent(int pageNum)
Gets a PdfContentByte to write over the page of the original document.
Parameters:
pageNum - the page number where the extra content is written
Returns:
a PdfContentByte to write over the page of the original document
This is the code you wrote:
PdfContentByte canvas = stamper.getOverContent(1);
You used 1 as the value for pageNum.
Now you tell me: if you choose 1 as the page number, then why are you surprised that all the content you add is only added on the first page?
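To stamp only selected pages, the page number passed to getOverContent has to vary. As a minimal sketch for the "every other page" requirement in the question, the helper below lists the 1-based page numbers 1, 3, 5, ... up to the page count. The class and method names are made up for illustration and are not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not an iText class. */
final class PageSelector {
    private PageSelector() {}

    /** Returns the 1-based page numbers firstPage, firstPage + 2, ... that do not exceed pageCount. */
    static List<Integer> everyOtherPage(int pageCount, int firstPage) {
        if (firstPage < 1) {
            throw new IllegalArgumentException("PDF page numbers start at 1");
        }
        List<Integer> pages = new ArrayList<>();
        for (int page = firstPage; page <= pageCount; page += 2) {
            pages.add(page);
        }
        return pages;
    }
}
```

In the question's manipulatePdf, a loop such as for (int pageNum : PageSelector.everyOtherPage(count, 1)) would presumably wrap the lines from stamper.getOverContent(...) through ct.go(), with pageNum replacing the hard-coded 1 and a fresh ColumnText created for each page's canvas.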
IMPORTANT:
You write
I followed the iText documentation
I assume that you refer to the official documentation on the official iText web page: https://itextpdf.com
If that is correct, then why are you still using an old version of iText? The current version is 7.1.2, and the PdfStamper class no longer exists in that version. As explained in chapter 5 of the iText 7 Jump-Start tutorial, adding content to an existing PDF is done differently nowadays.
FYI: there are some more tutorials here: https://developers.itextpdf.com/books

iText add image to pdf - setAbsolutePosition [duplicate]

I am trying to read one PDF and copy its data into another PDF. The first PDF contains some text and images, and I wish to write an image in the second PDF exactly where the text ends (which is basically the end of the PDF file). Right now it just prints at the top. How can I make this change?
PdfReader reader = null;
reader = new PdfReader(Var.input);
Document document=new Document();
PdfWriter writer = null;
writer = PdfWriter.getInstance(document,new FileOutputStream(Var.output));
PdfImportedPage page = writer.getImportedPage(reader, 1);
reader.close();
document.open();
PdfContentByte cb = writer.getDirectContent();
// Copy first page of existing PDF into output PDF
document.newPage();
cb.addTemplate(page, 0, 0);
// Add your new data / text here
Image image = null;
image = Image.getInstance (Var.qr);
document.add(image);
document.close();

Try this:
First get the coordinates of where the image needs to go, then add the setAbsolutePosition line below to your code so that the image is inserted at that location (X, Y):
Image image = Image.getInstance(RESOURCE);
image.setAbsolutePosition(X, Y);
writer.getDirectContent().addImage(image);
Take a look here for some examples in iText 5: https://itextpdf.com/en/resources/examples/itext-5-legacy/chapter-3-adding-content-absolute-positions

You should use a PdfStamper instead of a PdfWriter with imported pages. Your approach throws away all interactive content. You can use sorifiend's idea there, too.
To determine where the text on the given page ends, have a look at the iText in Action, 2nd edition example ShowTextMargins, which parses a PDF and adds a rectangle showing the text margin.
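Once the lower edge of the existing content is known (for example, the lower y value of the text margin rectangle), the Y value for setAbsolutePosition follows from simple arithmetic: PDF user space normally has its origin at the bottom-left of the page, and setAbsolutePosition places the lower-left corner of the image. The sketch below is plain Java with hypothetical names; contentBottomY, gap, and bottomMargin are assumed inputs measured in points.

```java
/** Hypothetical helper; not an iText class. */
final class ImagePlacement {
    private ImagePlacement() {}

    /**
     * Returns the y coordinate for the image's lower-left corner so that the
     * top edge of the image sits gap points below the existing content.
     * Throws if the image would extend below the bottom margin.
     */
    static float yBelowContent(float contentBottomY, float imageHeight, float gap, float bottomMargin) {
        float y = contentBottomY - gap - imageHeight;
        if (y < bottomMargin) {
            throw new IllegalArgumentException("Image does not fit below the content on this page");
        }
        return y;
    }
}
```

The returned value would be passed as Y in image.setAbsolutePosition(X, Y), with image.getScaledHeight() as a natural source for imageHeight in iText 5. When the image does not fit, starting a new page is one option.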

Convert a PDF with forms to a PDF with text only (preserve data) using iText

I have multiple PDFs that get populated with multiple records (a.pdf, b.pdf, c[0-9].pdf, d[0-9].pdf, ez.pdf) using AcroForms and PDFBox.
The resulting files (aflat.pdf, bflat.pdf, c[0-9]flat.pdf, d[0-9]flat.pdf, ezflat.pdf) should have their forms (dictionaries and whatever else Adobe uses) removed, but the field values kept as plain text on the page (setReadOnly is not what I want!).
PdfStamper can only remove fields without saving their content, but I've found some references to PdfContentByte as a way to save the content. Alas, the documentation is too brief for me to understand how I should do this.
As a last resort I could use FieldPosition to write directly on the PDF. Has anyone ever encountered such a problem? How do I solve it?
UPDATE: Saving a single page of b.pdf yields a valid bfilled.pdf but a blank bflattened.pdf. Saving the whole document solved the issue.
populateB();
try (FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
    // Wrong approach: importing a single page into a new PDDocument corrupts the fields.
    // PDDocument doc = new PDDocument();
    // doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
    // doc.save(stream);

    // Right approach: save the whole document instead.
    pdfDocuments.get(0).save(stream);
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
    PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
    stamper.setFormFlattening(true);
    stamper.close();
}

Use PdfStamper.setFormFlattening(true) to get rid of the fields and write them as content.

Always save the whole document, not an imported page, when working with AcroForms:
populateB();
try (FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
    // Importing a single page into a new PDDocument corrupts the fields, so do not do this:
    // doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
    // doc.save(stream);

    // Save the whole document instead.
    pdfDocuments.get(0).save(stream);
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
    PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
    stamper.setFormFlattening(true);
    stamper.close();
}
