iText mergeFields in PdfCopy creates invalid pdf - java

I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
#Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
#Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks

We are using PdfCopy to merge differents files, some of files may have fields. We use the version 5.5.3.0. The code is simple and it seems to work fine, BUT sometimes the result file is impossible to print!
Our code :
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.GetBuffer()
End Function

Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.

Its just to say that we have the same probleme : iText mergeFields in PdfCopy creates invalid pdf. So it is still not fixed in the version 5.5.3.0

Related

Using POI To read/write a doc with the full POIFSFileSystem

I have the following issue, as everybody it seems, I want to replace some items with others in Word doc.
Issue with the issue is, the doc contains headers and footers which are part of the POIFSFileSystem (I know this because reading the FS / writing the doc back -without any changes- loses these informations, whereas reading the FS / writing it back as a new file doesn't).
Currently I do this :
POIFSFileSystem pfs = new POIFSFileSystem(fis);
HWPFDocument document = new HWPFDocument(pfs);
Range r1 = document.getRange();
…
document.write();
ByteArrayOutputStream bos = new ByteArrayOutputStream(50000);
pfs.writeFilesystem(bos);
pfs.close();
However this fails, with this error:
Opened read-only or via an InputStream, a Writeable File is required
If I don't rewrite the document, it works fine, but my changes are lost.
The other way around if I only save the document, not the filesystem, I lose the header/footer.
Now the problem is, how can I update the document while "saving as" the entire filesystem, or is there a way to force the document to contain everything from the file system?
The HWPF stuff is always in scratchpad because the DOC binary file format is the most horrible of all the Horrible formats. So it will really not be ready and also will be buggy in many cases.
But in your special case, your observations are not reproducible. Using apache poi 4.0.1 the HWPFDocument contains the header story, which also contains the footer stories, after creating from *.doc file. So the following works for me:
Source:
Code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.usermodel.*;
public class ReadAndWriteDOCWithHeaderFooter {
public static void main(String[] args) throws Exception {
HWPFDocument document = new HWPFDocument(new FileInputStream("TemplateDOCWithHeaderFooter.doc"));
Range bodyRange = document.getRange();
System.out.println(bodyRange);
for (int p = 0; p < bodyRange.numParagraphs(); p++) {
System.out.println(bodyRange.getParagraph(p).text());
if (bodyRange.getParagraph(p).text().contains("<<NAME>>"))
bodyRange.getParagraph(p).replaceText("<<NAME>>", "Axel Richter");
if (bodyRange.getParagraph(p).text().contains("<<DATE>>"))
bodyRange.getParagraph(p).replaceText("<<DATE>>", "12/21/1964");
if (bodyRange.getParagraph(p).text().contains("<<AMOUNT>>"))
bodyRange.getParagraph(p).replaceText("<<AMOUNT>>", "1,234.56");
System.out.println(bodyRange.getParagraph(p).text());
}
System.out.println("==============================================================================");
Range overallRange = document.getOverallRange();
System.out.println(overallRange);
for (int p = 0; p < overallRange.numParagraphs(); p++) {
System.out.println(overallRange.getParagraph(p).text()); // contains all inclusive header and footer
}
FileOutputStream out = new FileOutputStream("ResultDOCWithHeaderFooter.doc");
document.write(out);
out.close();
document.close();
}
}
Result:
So please do checking it again and tell us exactly what is not working for you. Because we need reproducing that, please do providing a minimal, complete, and verifiable example as I have done with my code.

Open and Save to Same PDF File Name [duplicate]

I am required to replace a word in an existing PDF AcroField with another word. I am using PDFStamper of iTEXTSHARP to do the same and it is working fine. But, in doing so it is required to create a new PDF and i would like the change to be reflected in the existing PDF itself. If I am setting the destination filename same as the original filename then no change is being reflected.I am new to iTextSharp , is there anything I am doing wrong? Please help.. I am providing the piece of code I am using
private void ListFieldNames(string s)
{
try
{
string pdfTemplate = #"z:\TEMP\PDF\PassportApplicationForm_Main_English_V1.0.pdf";
string newFile = #"z:\TEMP\PDF\PassportApplicationForm_Main_English_V1.0.pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
PdfReader reader = new PdfReader((string)pdfTemplate);
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(newFile, FileMode.Create, FileAccess.ReadWrite)))
{
AcroFields form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (string fieldKey in fieldKeys)
{
//Replace Address Form field with my custom data
if (fieldKey.Contains("Address"))
{
form.SetField(fieldKey, s);
}
}
stamper.FormFlattening = true;
stamper.Close();
}
}
}
As documented in my book iText in Action, you can't read a file and write to it simultaneously. Think of how Word works: you can't open a Word document and write directly to it. Word always creates a temporary file, writes the changes to it, then replaces the original file with it and then throws away the temporary file.
You can do that too:
read the original file with PdfReader,
create a temporary file for PdfStamper, and when you're done,
replace the original file with the temporary file.
Or:
read the original file into a byte[],
create PdfReader with this byte[], and
use the path to the original file for PdfStamper.
This second option is more dangerous, as you'll lose the original file if you do something that causes an exception in PdfStamper.

Why is my form being flattened without calling the flattenFields method?

I am testing my method with this form https://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
It is being called like so:
Pdf.editForm("./src/main/resources/pdfs/interactiveform_enabled.pdf", "./src/main/resources/pdfs/FILLEDOUT.pdf"));
where Pdf is just a worker class and editForm is a static method.
The editForm method looks like this:
public static int editForm(String inputPath, String outputPath) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> m = form.getFormFields();
for (String s : m.keySet()) {
if (s.equals("Name_First")) {
m.get(s).setValue("Tristan");
}
if (s.equals("BACHELORS DEGREE")) {
m.get(s).setValue("Off"); // On or Off
}
if (s.equals("Sex")) {
m.get(s).setValue("FEMALE");
}
System.out.println(s);
}
pdf.close();
logger.info("Completed");
} catch (IOException e) {
logger.error("Unable to fill form " + outputPath + "\n\t" + e);
return 1;
}
return 0;
}
Unfortunately the FILLEDOUT.pdf file is no longer a form after calling this method. Am I doing something wrong?
I was using this resource for guidance. Notice how I am not calling the form.flattenFields(). If I do call that method however, I get an error of java.lang.IllegalArgumentException.
Thank you for your time.
Your form is Reader-enabled, i.e. it contains a usage rights digital signature by a key and certificate issued by Adobe to indicate to a regular Adobe Reader that it shall activate a number of additional features when operating on that very PDF.
If you stamp the file as in your original code, the existing PDF objects will get re-arranged and slightly changed. This breaks the usage rights signature, and Adobe Reader, recognizing that, disclaims "The document has been changed since it was created and use of extended features is no longer available."
If you stamp the file in append mode, though, the changes are appended to the PDF as an incremental update. Thus, the signature still correctly signs its original byte range and Adobe Reader does not complain.
To activate append mode, use StampingProperties when you create your PdfDocument:
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath), new StampingProperties().useAppendMode());
(Tested with iText 7.1.1-SNAPSHOT and Adobe Acrobat Reader DC version 2018.009.20050)
By the way, Adobe Reader does not merely check the signature, it also tries to determine whether the changes in the incremental update don't go beyond the scope of the additional features activated by the usage rights signature.
Otherwise you could simply take a small Reader-enabled PDF and in append mode replace all existing pages by your own content of choice. This of course is not in Adobe's interest...
The filled in PDF is still an AcroForm, otherwise the example below would result in the same PDF twice.
public class Main {
public static final String SRC = "src/main/resources/interactiveform_enabled.pdf";
public static final String DEST = "results/filled_form.pdf";
public static final String DEST2 = "results/filled_form_second_time.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
Main main = new Main();
Map<String, String> data1 = new HashMap<>();
data1.put("Name_First", "Tristan");
data1.put("BACHELORS DEGREE", "Off");
main.fillPdf(SRC, DEST, data1, false);
Map<String, String> data2 = new HashMap<>();
data2.put("Sex", "FEMALE");
main.fillPdf(DEST, DEST2, data2, false);
}
private void fillPdf(String src, String dest, Map<String, String> data, boolean flatten) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
//Delete print field from acroform because it is defined in the contentstream not in the formfields
form.removeField("Print");
Map<String, PdfFormField> m = form.getFormFields();
for (String d : data.keySet()) {
for (String s : m.keySet()) {
if(s.equals(d)){
m.get(s).setValue(data.get(d));
}
}
}
if(flatten){
form.flattenFields();
}
pdf.close();
System.out.println("Completed");
} catch (IOException e) {
System.out.println("Unable to fill form " + dest + "\n\t" + e);
}
}
}
The issue you are facing has to do with the 'reader enabled forms'.
What it boils down to is that the PDF file that is initially fed to your program is reader enabled. Hence you can open the PDF in Adobe Reader and fill in the form. This allows Acrobat users to extend the behaviour of Adobe Reader.
Once the PDF is filled in and closed using iText it saves the PDF as 'not reader-extended'.
This makes it so that the AcroForm can still be filled using iText but when you open the PDF using Adobe Reader the extended functionality you see in the original PDF is gone. But this does not mean the form is flattened.
iText cannot make a form reader enabled, as a matter of fact, the only way to create a reader enabled form is using Acrobat Professional. This is how Acrobat and Adobe Reader interact and it is not something iText can imitate or solve. You can find some more info and a possible solution on this link.
The IllegalArgumentException you get when you call the form.flattenFields() method is because of the way the PDF document was constructed.
The "Print form" button should have been defined in the AcroForm, yet it is defined in the contentstream of the PDF, meaning the button in the AcroForm has an empty text value, and this is what causes the exception.
You can fix this by removing the print field from the AcroForm before you flatten.
IllegalArgumentException issue has been fixed in iText 7.1.5.

How to decrypt 128bit RC4 pdf file in java with user password if it is encrypted with user as well as owner password

I have a PDF file locked with owner and user password. I don't have the owner password but I have the user password.
I am using iText to decrypt the file
So how should I decrypt the PDF file.
public class Decrypt {
public static final String SRC = "D:\\GitCodeBase(Master)\\pdf\\src\\main\\resources\\encrypt\\abc.pdf";
public static final String DEST = "D:\\GitCodeBase(Master)\\pdf\\src\\main\\resources\\decrypt\\def.pdf";
public static void main(String[] args) throws Exception {
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(SRC,"abc123".getBytes());
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(DEST));
stamper.close();
reader.close();
}
}
Using your code and your example file, I unfortunately cannot reproduce the issue: The code executes without throwing an exception.
But it does not yet do what you want either: The result file still is encrypted. Thus, here some information on that.
If you read Bruno's answer here up to its end, you'll see that your code used to decrypt PDF files before iText 5.3.5. Meanwhile, though, the encryption is kept. Strictly speaking that indeed is more correct, after all the code nowhere asks iText to drop the encryption.
Thus, in current iText 5 versions (I'm using the current 5.5.12-SNAPSHOT maintenance version), you have to do a bit more, you have to fool iText into thinking that the PDF wasn't encrypted as Bruno put it in his answer.
Unfortunately the member variable of PdfReader you have to change to do that is not public. Thus, you cannot simply set it.
The member in question is protected. Thus, you can change it by deriving your own PdfReader subclass and use a method in it to do the change. This has been demonstrated in Bruno's answer, here a variation for non-empty user passwords:
class MyReader extends PdfReader {
public MyReader(final String filename, final byte password[]) throws IOException {
super(filename, password);
}
public void decryptOnPurpose() {
encrypted = false;
}
}
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
MyReader.unethicalreading = true;
MyReader reader = new MyReader(src, "abc123".getBytes());
reader.decryptOnPurpose();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
Alternatively you can also use reflection:
PdfReader.unethicalreading = true;
PdfReader reader = new PdfReader(inputStream, "abc123".getBytes());
Field encryptedField = PdfReader.class.getDeclaredField("encrypted");
encryptedField.setAccessible(true);
encryptedField.set(reader, false);
PdfStamper stamper = new PdfStamper(reader, outputStream);
stamper.close();
reader.close();
(DecryptUserOnly.java test method testDecryptAbc)
PS: I am aware that this answer hardly adds anything to Bruno's original answer. I did not try to mark this question a duplicate of the question of that answer only because it has been "closed as off-topic" and because numerous links in that answer meanwhile have become stale.

"Random" generated documents fail to print

We are attempting to generate documents using iText that are formed largely from "template" files - smaller PDF files that are combined together into one composite file using the PdfContentByte.addTemplate method. We then automatically and silently print the new file using the *nix command lp. This usually works; however occasionally, some files that are generated will fail to print. The document proceeds through all queues and arrives at the printer proper (a Lexmark T652n, in this case), its physical display gives a message of pending progress, and even its mechanical components whir up in preparation - then, the printing job vanishes spontaneously without a trace, and the printer returns to being ready.
The oddity in how specific this issue tends to be. For starters, the files in question print without fail when done manually through Adobe PDF Viewer, and can be read fine by editors like Adobe Live Cycle. Furthermore, the content of the file effects whether it is plagued by this issue, but not in a clear way - adding a specific template 20 times could cause the problem, while doing it 19 or 21 times might be fine, or using a different template will change the pattern entirely and might cause it to happen instead after 37 times. Generating a document with the exact same content will be consistent on whether or not the issue occurs, but any subtle and seemingly irrelevant change in content will change whether the problem happens.
While it could be considered a hardware issue, the fact remains that certain iText-generated files have this issue while others do not. Is our method of file creation sometimes creating files that are somehow considered corrupt only to the printer and only sometimes?
Here is a relatively small code example that generates documents using the repetitive template method similar to our main program. It uses this file as a template and repeats it a specified number of times.
public class PDFFileMaker {
private static final int INCH = 72;
final private static float MARGIN_TOP = INCH / 4;
final private static float MARGIN_BOTTOM = INCH / 2;
private static final String DIREC = "/pdftest/";
private static final String OUTPUT_FILEPATH = DIREC + "cooldoc_%d.pdf";
private static final String TEMPLATE1_FILEPATH = DIREC + "template1.pdf";
private static final Rectangle PAGE_SIZE = PageSize.LETTER;
private static final Rectangle TEMPLATE_SIZE = PageSize.LETTER;
private ByteArrayOutputStream workingBuffer;
private ByteArrayOutputStream storageBuffer;
private ByteArrayOutputStream templateBuffer;
private float currPosition;
private int currPage;
private int formFillCount;
private int templateTotal;
private static final int DEFAULT_NUMBER_OF_TIMES = 23;
public static void main (String [] args) {
System.out.println("Starting...");
PDFFileMaker maker = new PDFFileMaker();
File file = null;
try {
file = maker.createPDF(DEFAULT_NUMBER_OF_TIMES);
}
catch (Exception e) {
e.printStackTrace();
}
if (file == null || !file.exists()) {
System.out.println("File failed to be created.");
}
else {
System.out.println("File creation successful.");
}
}
public File createPDF(int inCount) throws Exception {
templateTotal = inCount;
String sFilepath = String.format(OUTPUT_FILEPATH, templateTotal);
workingBuffer = new ByteArrayOutputStream();
storageBuffer = new ByteArrayOutputStream();
templateBuffer = new ByteArrayOutputStream();
startPDF();
doMainSegment();
finishPDF(sFilepath);
return new File(sFilepath);
}
private void startPDF() throws DocumentException, FileNotFoundException {
Document d = new Document(PAGE_SIZE);
PdfWriter w = PdfWriter.getInstance(d, workingBuffer);
d.open();
d.add(new Paragraph(" "));
d.close();
w.close();
currPosition = 0;
currPage = 1;
formFillCount = 1;
}
protected void finishPDF(String sFilepath) throws DocumentException, IOException {
//Transfers data from buffer 1 to builder file
PdfReader r = new PdfReader(workingBuffer.toByteArray());
PdfStamper s = new PdfStamper(r, new FileOutputStream(sFilepath));
s.setFullCompression();
r.close();
s.close();
}
private void doMainSegment() throws FileNotFoundException, IOException, DocumentException {
File fTemplate1 = new File(TEMPLATE1_FILEPATH);
for (int i = 0; i < templateTotal; i++) {
doTemplate(fTemplate1);
}
}
private void doTemplate(File f) throws FileNotFoundException, IOException, DocumentException {
PdfReader reader = new PdfReader(new FileInputStream(f));
//Transfers data from the template input file to temporary buffer
PdfStamper stamper = new PdfStamper(reader, templateBuffer);
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
//Get size of template file via looking for "end" Acrofield
float[] area = form.getFieldPositions("end");
float size = TEMPLATE_SIZE.getHeight() - MARGIN_TOP - area[4];
//Requires Page Break
if (size >= PAGE_SIZE.getHeight() - MARGIN_TOP - MARGIN_BOTTOM + currPosition) {
PdfReader subreader = new PdfReader(workingBuffer.toByteArray());
PdfStamper substamper = new PdfStamper(subreader, storageBuffer);
currPosition = 0;
currPage++;
substamper.insertPage(currPage, PAGE_SIZE);
substamper.close();
subreader.close();
workingBuffer = storageBuffer;
storageBuffer = new ByteArrayOutputStream();
}
//Set Fields
form.setField("field1", String.format("Form Text %d", formFillCount));
form.setField("page", String.format("Page %d", currPage));
formFillCount++;
stamper.close();
reader.close();
//Read from working buffer, stamp to storage buffer, stamp template from template buffer
reader = new PdfReader(workingBuffer.toByteArray());
stamper = new PdfStamper(reader, storageBuffer);
reader.close();
reader = new PdfReader(templateBuffer.toByteArray());
PdfImportedPage page = stamper.getImportedPage(reader, 1);
PdfContentByte cb = stamper.getOverContent(currPage);
cb.addTemplate(page, 0, currPosition);
stamper.close();
reader.close();
//Reset buffers - working buffer takes on storage buffer data, storage and template buffers clear
workingBuffer = storageBuffer;
storageBuffer = new ByteArrayOutputStream();
templateBuffer = new ByteArrayOutputStream();
currPosition -= size;
}
Running this program with a DEFAULT_NUMBER_OF_TIMES of 23 produces this document and causes the failure when sent to the printer. Changing it to 22 times produces this similar-looking document (simply with one less "line") which does not have the problem and prints successfully. Using a different PDF file as a template component completely changes these numbers or makes it so that it may not happen at all.
While this problem is likely too specific and with too many factors for other people to reasonably be expected to reproduce, the question of possibilities remains. What about the file generation could cause this unusual behavior? What might cause one file to be acceptable to a specific printer but another, generated in the same manner in different only in seemingly non-trivial ways, to be unacceptable? Is there a bug in iText produced by using the stamper template commands too heavily? This has been a long-running bug with us for a while now, so any assistance is appreciate; additionally, I am willing to answer questions or have extended conversations in chat as necessary in an effort to get to the bottom of this.
The design of your application more or less abuses the otherwise perfectly fine PdfStamper functionality.
Allow me to explain.
The contents of a page can be expressed as a stream object or as an array of a stream objects. When changing a page using PdfStamper, the content of this page is always an array of stream objects, consisting of the original stream object or the original array of stream objects to which extra elements are added.
By adding the same template creating a PdfStamper object over and over again, you increase the number of elements in the page contents array dramatically. You also introduce a huge number of q and Q operators that save and restore the stack. The reason why you have random behavior is clear: the memory and CPU available to process the PDF can vary from one moment to another. One time, there will be sufficient resources to deal with 20 q operators (saves the state), the next time there will only be sufficient resources to deal with 19. The problem occurs when the process runs out of resources.
While the PDFs you're creating aren't illegal according to ISO-32000-1, some PDF processors simply choke on these PDFs. iText is a toolbox that allows you to create PDFs that can make me very happy when I look under the hood, but it also allows you to create horrible PDFs if you don't use the toolbox wisely. The latter is what happened in your case.
You should solve this be reusing the PdfStamper instance instead of creating a new PdfStamper over and over again. If that's not possible, please post another question, using less words, explaining exactly what you want to achieve.
Suppose that you have many different source files with PDF snippets that need to be added to a single page. For instance: suppose that each PDF snippet was a coupon and you need to create a sheet with 30 coupons. Than you'd use a single PdfWriter instance, import pages with getImportedPage() and add them at the correct position using addTemplate().
Of course: I have no idea what your project is about. The idea of coupons of a page was inspired by your test PDF.

Categories