In iText v5 there was the option to make 'smart' concatenation of PDF documents:
public PdfConcatenate(OutputStream os, boolean smart) throws DocumentException
Creates an instance of the concatenation class.
Parameters:
os - the OutputStream for the PDF document
smart - do we want PdfCopy to detect redundant content?
The initialization I was doing would be something like:
PdfConcatenate concatenatedPdf = new PdfConcatenate(outputStream, true);
In iText 7 I read we should use the copyPages function. Something like:
[...]
PdfDocument concatenatedPdf = new PdfDocument(writer);
PdfDocument docToAdd = new PdfDocument(pdfReader);
docToAdd.copyPagesTo(1, docToAdd.getNumberOfPages(), concatenatedPdf);
I'm migrating a logic to merge documents from iText v5 to v7. For a sample test in v5 with PdfConcatenate and the flag 'smart' the result PDF is 177 KB, in v7 is 763 KB. Is there a way to detect this redundant content in iText v7?
First of all, iText7 provides a convenient class called PdfMerger for merging PDFs.
Here is a sample how to use it:
PdfDocument sourceDocument = new PdfDocument(new PdfReader(filename));
PdfMerger resultDocument = new PdfMerger(new PdfDocument(new PdfWriter(resultFile)));
resultDocument.merge(sourceDocument, fromPage, toPage);
resultDocument.close();
sourceDocument.close();
Of course in the example only one set of pages from source document is added to the resultant document but you can call merge function as many times as you like.
Now when you want the resultant file to be as small in terms of file size as possible, you need to specify some settings for the destination PdfDocument that you feed to the PdfMerger.
First of all, you can tweak the compression level for streams to use more CPU and time but compress better:
PdfMerger resultDocument = new PdfMerger(new PdfDocument(
new PdfWriter(resultFile, new WriterProperties().setCompressionLevel(CompressionConstants.BEST_COMPRESSION))));
To compress even better you can use full compression. That would not only better compress streams (content of the page, images, fonts), but also would compress PDF objects that usually consume many bits in the out file size. This can be done like this:
PdfMerger resultDocument = new PdfMerger(new PdfDocument(
new PdfWriter(resultFile, new WriterProperties().setFullCompressionMode(true))));
In case source documents have same objects by default you might have some duplication. So-called "Smart Mode" provides a possibility to avoid such duplication and optimize the file size for cases when there are many duplicating objects. This would be the closes analogue to the "smart" flag you refer to in your iText 5 code. You can enable smart mode in iText 7 in the following way:
PdfMerger resultDocument = new PdfMerger(new PdfDocument(
new PdfWriter(resultFile, new WriterProperties().useSmartMode())));
Related
I want to write some text in a PDF file.
Approach 1: Using AcroForms.
To me, this the simplest approach. But we have some performance doubts. Using this approach, I'll be embedding a PDF template that contains in my resources folder and populating already existing fields that I've already created in Acrobat like so:
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdfDoc, true);
form.setGenerateAppearance(true);
form.getField("name")
.setVisibility(PdfFormField.VISIBLE)
.setReadOnly(true)
.setValue("Hamza Belmellouki");
// other fields being set
Approach 2: Generate the PDF from Scratch.
In this approach, I write the whole content to the file(Static content as well as dynamic one) and generate it.
In terms of performance, I am wondering which approach should we go with?
I am watermarking pdf document using itext7 library. it is preserving layers and shows one of its signature invalid.
I want to flatten the created document.
When i tried saving the document manually using Adobe print option, it flattens all signature and makes the document as valid document. Same functionality i want with java program.
Is there any way using java program, we can flatten pdf document?
According to your tag selection you appear to be using iText 7 for Java.
How to flatten a PDF AcroForm form using iText 7 is explained in the iText 7 knowledge base example Flattening a form. The pivotal code is:
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdfDoc, true);
// If no fields have been explicitly included, then all fields are flattened.
// Otherwise only the included fields are flattened.
form.flattenFields();
pdfDoc.close();
(https://kb.itextpdf.com/home/it7kb/examples/flattening-a-form visited 2021-05-24)
In my project, some data sets are needed to be exported in PDF format.
I learned that iText is helpful, and PdfpTable can do the work, but it needs much code to deal with styles. While using PDF template can save time and code for adjusting style, but I can only set certain fields left in the template.
Can you give me some suggestions to show the data sets using commands like foreach? Thanks in advance!
Here are my code using pdfpTable, which has done the work, but the code is a little ugly:
PdfPTable pdfTable = createNewPDFTable();
for (int i = 0; i < dataSet.size(); i++) {
MetaObject metaObject = SystemMetaObject.forObject(dataSet.get(i));
for (String field : fields) {
Phrase phrase = new Phrase(String.valueOf(metaObject.getValue(field) != null ? metaObject.getValue(field) : "")
, PDFUtil.createChineseSong(DEFAULT_CELL_FONT_SIZE));
PdfPCell fieldCell = new PdfPCell(phrase);
fieldCell.setBorder(Rectangle.NO_BORDER);
fieldCell.setFixedHeight(DEFAULT_COLUMN_HEIGHT);
fieldCell.setHorizontalAlignment(Element.ALIGN_CENTER);
fieldCell.setVerticalAlignment(Element.ALIGN_MIDDLE);
pdfTable.addCell(fieldCell);
}
}
Here are some code using pdfp template,which is copied from itext examples, the work is unfinished yet because i haven't find a proper way to show the data set.
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
form.setField("text_1", "Bruno Lowagie");
form.setFieldProperty("text_1", "setfflags", PdfFormField.FF_READ_ONLY, null);
There is an inconsistency in your question. You write: PdfpTable can do the work, but it needs much code to deal with styles. However, in your first code snippet, you don't really create your PDFs the way one would expect. Instead of producing a high volume of finished PDFs, you create use PdfPTable to create a template. I assume you then use that template to create a high volume of finished PDFs.
If you want to use a template and populate it afterwards, you shouldn't create your form using iText. Create it manually, for instance using Open Office or Libre Office. See for instance the example in chapter 6 of my book (section 6.3.5). Create the template with a tool that has a GUI, then fill out that template many times using iText.
This approach has some down-sides: all the content has to fit the fields you define. All fields have a fixed position on a fixed page.
If "applying styles through code" is a problem, you may want to follow the approach described in the ZUGFeRD book. In that book, we create HTML first: Creating HTML invoices.
Once you have the HTML, then convert the HTML to PDF, and use CSS to apply styles: Creating PDF invoices.
This is how we create a ZUGFeRDDocument:
ZugferdDocument pdfDocument = new ZugferdDocument(
new PdfWriter(fos), ZugferdConformanceLevel.ZUGFeRDComfort,
new PdfOutputIntent("Custom", "", "http://www.color.org",
"sRGB IEC61966-2.1", new FileInputStream(INTENT)));
pdfDocument.addFileAttachment(
"ZUGFeRD invoice", dom.toXML(), "ZUGFeRD-invoice.xml",
PdfName.ApplicationXml, new PdfDictionary(), PdfName.Alternative);
pdfDocument.setTagged();
HtmlConverter.convertToPdf(
new ByteArrayInputStream(html), pdfDocument, getProperties());
The getProperties() method looks like this:
public ConverterProperties getProperties() {
if (properties == null) {
properties = new ConverterProperties()
.setBaseUri("resources/zugferd/");
}
return properties;
}
You can find other examples on how to use HTML to PDF here: pdfHTML add-on (read the introduction).
Note that you are using an old version of iText. The examples I shared are using iText 7. There's a huge difference between iText 5 and iText 7.
I need to parse a PDF file through the pages and load each separately into a byte[]. I use the itext library.
I download a file consisting of one page with this code:
public Document addPageInTheDocument(String namePage, MultipartFile pdfData, Long documentId) throws IOException {
notNull(namePage, INVALID_PARAMETRE);
notNull(pdfData, INVALID_PARAMETRE);
notNull(documentId, INVALID_PARAMETRE);
byte[] in = pdfData.getBytes(); // size file 88747
Page page = new Page(namePage);
Document document = new Document();
document.setId(documentId);
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData.getBytes()));
PdfDocument pdfDocument = new PdfDocument(reader);
if (pdfDocument.getNumberOfPages() != 1) {
throw new IllegalArgumentException();
}
byte[] transform = pdfDocument.getPage(1).getContentBytes(); // 1907 size page
page.setPageData(pdfDocument.getPage(1).getContentBytes());
return addPageInTheDocument(document, page);
}
I'm trying to restore the file with this code:
ByteBuffer byteContent = new ByteBuffer() ;
for (Map.Entry<String, Page> page : pages.entrySet()) {
byteContent.append(page.getValue().getPageData());
}
PdfWriter writer = new PdfWriter(new FileOutputStream(book.getName() + modification + FORMAT));
byte[] df = byteContent.toByteArray();
PdfReader reader = new PdfReader(new ByteArrayInputStream(byteContent.toByteArray()));
com.itextpdf.layout.Document itextDocument = new com.itextpdf.layout.Document(new PdfDocument(reader, writer));
itextDocument.close();
Why is there such a difference in size?
And why the files and pages, and both the byte[] to create the file?
Let's start with your size question:
byte[] in = pdfData.getBytes(); // size file 88747
...
byte[] transform = pdfDocument.getPage(1).getContentBytes(); // 1907 size page
...
Why are there such a difference in size?
Because PdfPage.getContentBytes() does not return what you expect.
You seem to expect it to return a complete representation of the contents of the given page, and the Javadocs of that method might be interpreted ("Get decoded bytes for the whole page content.") to mean that.
This is not the case. PdfPage.getContentBytes() returns the contents of the content stream(s) of the page. These content stream(s) contain a sequence of commands which build the page. But these commands take parameters which reference data outside the content stream, e.g.:
when text is drawn on a PDF page, the content stream contains an operation selecting a font but the data describing the font and in case of embedded fonts the font program itself are outside the content stream;
when bitmap images are drawn, the content stream usually contains an operation for it which references image data outside the content stream;
there are operations which reference so called xobjects which essentially are independent content streams which can be called upon from any page; these xobject are not contained in the page content stream either.
Furthermore there are annotations (e.g. form fields) with their own content streams which are stored in separate structures. And lots of page properties are outside, too.
Thus, there are such differences in size because you get only a minute part of the page definition using getContentBytes.
Now let's look at your code "restoring the file".
As a corollary of the above it is obvious that your code merely concatenates some content streams but does not provide the external resources these streams refer to.
But aside from that your code also points out a misunderstanding concerning the nature of PDF pages: they are not merely blobs you can split and concatenate again as you want. They are collections of PDF objects which are spread throughout the PDF file; different pages can share some of their objects (e.g. fonts of often used images).
What you can do instead...
As representations of a single page you should use a PDF containing the data referenced by that single page. The iText example Burst.java shows how to do that.
To join these single page PDFs again you can use an iText PdfMerger. Remember to set smart mode (PdfWriter.setSmartMode(true)) to prevent resource duplication in the result.
i try to generate a pdf with itext. First i read in a existing template and stamp the formulars in the method stampFormular(Formular formular, PdfStamper stamper). The stamp method works. But i have a problem, with adding more formulars to the output file.
I want to stamp for each Formular the PDF Template "yellow". So i tried it with, the document.add(), but that doesn't work. So i tried to do this with pdf writer. But that doesn't work to. Any idea how i can stamp the pdf template with the one formular data, make a new page and stamp the same pdf template with the next formular data.
public static File createForm(List<Fomular> formulars) {
Document document = new Document();
File pdf = null;
document.open();
try {
PdfReader pdfTemplate = new PdfReader('YELLOW');
PdfStamper stamper = new PdfStamper(pdfTemplate,
new FileOutputStream("output.pdf"));
PdfWriter writer;
for (Formular f : formulars) {
stamper = stampFormular(f, stamper);
writer = stamper.getWriter();
writer.newPage();
}
stamper.close();
pdfTemplate.close();
pdf = new File("output.pdf");
Desktop.getDesktop().open(pdf);
} catch (DocumentException | IOException e) {
e.printStackTrace();
}
return pdf;
}
A couple of observations:
You can't take the PdfWriter object from a PdfStamper, use newPage() and expect it to work. That's the equivalent of opening the hood of your car and start rewiring tubes that fit without knowing anything about the art of motor maintenance. When you want to add a new page to a stamper, you're supposed to use the insertPage() method as explained in the documentation.
Second observation: you're not telling us if you're flattening the content of the forms. If you do, then it's simple, just use the example mentioned in the documentation and you're all set. In other words: combine PdfStamper with PdfSmartCopy. Especially if you're using the same template over and over again, PdfSmartCopy will give you much better results than PdfCopy for the reason explained in chapter 6.
Suppose that your template needs to remain interactive, then you may have a problem for a reason that is also explained in that chapter: different visualizations of a field with a specific name must always have the same value. For instance: if your template has a field named name, then every occurrence of this field in the document must have the same value. If you don't want this, you need to rename name, for instance to name1, name2, etc...
Concatenation of templates that need to remain interactive used to be done with PdfCopyFields (see documentation). Here, the documentation is somewhat outdated. In the latest version of iText, we now have a method addDocument() in PdfCopy and PdfSmartCopy. This method allows you to add a full document at once, preserving the interactivity.