PDFBox 1.8.10: Fill and Sign PDF produces invalid signatures

PDFBox 1.8.10: Fill and Sign PDF produces invalid signatures - java

I fill (programatically) a form (AcroPdf) in a PDF document and sign the document afterwards. I start with doc.pdf, create doc_filled.pdf, using the setFields.java example of PDFBox. Then I sign doc_filled.pdf, creating doc?filled_signed.pdf, using some code, based on the signature examples and open the pdf in the Acrobat Reader. The entered Field data is visible and the signature panel tells me
"There are errors in the formatting or information contained in this signature (The signature byte array is invalid)"
So far, I know that:
the signature code applied alone (i.e. directly creating some doc_signed.pdf) creates a valid signature
the problem exists for "invisible signatures", visible signatures and visible signatures, being added to existing signature fields.
the problem even occurs, if I do not fill the form, but only open it and save it, i.e.:
PDDocument doc = PDDocument.load(new File("doc.pdf"));
doc.save(new File("doc_filled.pdf"));
doc.close();
suffices to break the afterwards applied signing code.
On the other hand, if I take the same doc.pdf, enter the field's values manually in Adobe, the signing code produces valid signatures.
What am I doing wrong?
Update:
#mkl asked me to provide the files, i am talking about (I do not have enough reputation currently, to post all files as links, sorry for that inconvenience):
odc.pdf: https://www.dropbox.com/s/ev8x9q48w5l0hof/doc.pdf?dl=0
doc_filled.pdf: https://www.dropbox.com/s/fxn4gyneizs1zzb/doc_filled.pdf?dl=0
doc_filled_signed.pdf: https://www.dropbox.com/s/xm846sj8f9kiga9/doc_filled_signed.pdf?dl=0
doc_filled_and_signed.pdf: https://www.dropbox.com/s/5jftje6ke87jedr/doc_filled_and_signed.pdf?dl=0
the last one was created, by signing and filling the document in one go, using
doc.saveIncremental();
As I already wrote in the comment, some
setNeedToBeUpdate(true);
seems to be missing, though.
With reference to #mkl 's second comment, I found this
SO question: Saved Text Field value is not displayed properly in PDF generated using PDFBOX, which also covers to some entered text not being show. I gave it a first try, applying
setBoolean(COSName.getPDFName("NeedAppearances"), true);
to the field's and form's dictionary, which then shows the fields context, but the signature does not get added in the end. Still I have to look further into that.
Update:
The story continues here: PDFBox 1.8.10: Fill and Sign Document, Filling again fails

The cause of the OP's original problem, i.e. that after loading his PDF (for form fill-in) with PDFBox and then saving it, this new PDF cannot be successfully signed using PDFBox signing code, has already been explained in detail in this answer, in short:
When saving documents regularly, PDFBox does so using a cross reference table.
If the document to save regularly had been loaded from a PDF with a cross reference stream, all entries of the cross reference stream dictionary are saved in the trailer dictionary.
When saving documents in the process of applying a signature, PDFBox creates an incremental update; as such incremental updates require that the update uses the same kind of cross reference as the original revision, PDFBox in this case tries to use the same technique.
For recognizing the technique originally used PDFBox looks at the Type entry of the dictionary in its document representation into which trailer or cross reference stream dictionary had been loaded: If there is a Type entry with value XRef (which is so specified for cross reference streams), a stream is assumed, otherwise a table.
Thus, in the case of the OP's original PDF doc.pdf which has a cross reference stream:
After loading and form fill-in the document is saved regularly, i.e. using a cross reference table, but all the former cross reference stream entries, among them the Type, are copied to the trailer. (doc_filled.pdf)
After loading this saved PDF with a cross reference table for signing, it is saved again using an incremental update. PDFBox assumes (due to the Type trailer entry) that the existing file has a cross reference stream and, therefore, uses a cross reference stream at the end of the incremental update, too. (doc_filled_signed.pdf)
Thus, in the end the filled-in, then signed PDF has two revisions, the inner one with a cross reference table, the outer one with a cross reference stream.
As this is not valid, Adobe Reader upon loading the PDF, repairs this in its internal document representation. Repairing changes the document bytes. Thus, the signature in Adobe Reader's eyes is broken.
Most other signature validators don't attempt such repairs but check the signature of the document as is. They validate the signature successfully.
The answer referenced above also offers some ways around this:
A: After loading the PDF for form fill-in, remove the Type entry from the trailer before saving regularly. If signing is applied to this file, PDFBox will assume a cross reference table (because the misleading Type entry is not there. Thus, the signature incremental update will be valid.
B: Use an incremental update for saving the form fill-in changes, too, either in a separate run or in the same run as signing. This also results in a valid incremental update.
Generally I would propose the latter option because the former option likely will break if the PDFBox saving routines ever are made compatible with each other.
Unfortunately, though, the latter option requires marking the added and changed objects as updated, including a path from the document catalog. If this is not possible or at least too cumbersome, the first option might be preferable.
In the case at hand the OP tried the latter option (doc_filled_and_signed.pdf):
At the Moment the text box's content is only visible, when the text box is selected (with Acrobat reader and Preview the same behaviour). I flag the PDField, all of its parents, the AcroForm, the Catalog as well as the page where it is displayed.
He marked the changed field as updated but not the associated appearance stream which automatically is generated by PDFBox when setting the form field value.
Thus, in the result PDF file the field has the new value but the old, empty appearance stream. Only when clicking into the field, Adobe Reader creates a new appearance based on the value for editing.
Thus, the OP also has to mark the new normal appearance stream (the form field dictionary contains an entry AP referencing a dictionary in which N references the normal appearance stream). Alternatively (if finding the changed or added entries becomes too cumbersome) he might try the other option.

Related

Is it possible to add a footnote depending on the page contents when using itext?

I am using Itext 5.5 and right now, I have a custom implementation of PdfPageEventHelper that adds a footer to the page containing Page number information.
Recent changes in my application lead to the existence of necessary footnotes. The way I am creating the PDF (dynamically created from a list of Components) makes it effectively impossible to determine which page contains which items, as that is part of customizable Styling options.
However, I need to add explanations to the footnote markers.
The approach I have now is to simply notify the PdfPageEventHelper that, somewhere in the document, there is at least one element that needs the (currently only) footnote, and then I add the explanatory footnote to every Page.
This is something I want to avoid, as the future might bring more footnotes and explanations.
So the question is:
Can I parse the current page content directly and scan for the existence of marker text? Or is there another way to see if the current page needs the explanatory footnote?
My failed approaches so far (all in onEndPage(PdfWriter, Document)):
PdfContentByte cb = writer.getDirectContent();
PdfReader reader = new PdfReader(cb.toPdf(writer());
// this led to InvalidPdfException
----
ColumnText ct = new ColumnText(cb);
ct.getCompositeElements();
// returned null, I expected the current page contents
----
OutputStreamCounter oc = writer.getOs();
// did not expose any useful methods. also, cannot read from OutputStream
Googling the problem yielded dozens of results - how to add a page number or how to add a document-static, user-specific header. But nothing page-depending.
Oh, and this, which is not really helpful:
Adding a pdf footer conditionally on certain pages in a multi-page pdf document
which seems basically to be the exact same problem as mine.

Essentially you'll have to do it the other way around:
Simply add a generic tag to the chunk of a footnote marker. Then your page event listener is informed about this generic tag between the start of the page and the end of it. If you set a flag in onGenericTag, therefore, your onEndPage method merely has to check (and later reset) that flag and add the footnote accordingly.
You can even use the generic tag text to differentiate between different markers and only add the matching footnotes.
For an example use of generic tags, have a look at examples using the Chunk.setGenericTag(String) method, e.g. the sandbox example GenericFields.
(I originally referenced the iText site URL https://developers.itextpdf.com/examples/page-events-itext5/page-events-chunks but due to a restructuring of that site it leads nowhere specific anymore; but you can still find a copy of the original page using the wayback machine.)

PDFBox identify specific pages and functionalities recommendations

I am looking for a way to resolve multiple signatures on a document, so I got a couple of questions of what I can do and what I cannot.
First, since multiple signatures from different people can be added to the document, the position of the signatures is important due to aesthetics and document printing if needed. Having said this, I would like to know an approach to handle this. What I was thinking was adding/append an additional page at the end of the documents and assign to it some kind of identifier like "doc_signatures", so when the second person opens the document for signature, it detects it already has a "doc_signatures" page created, and just add the signature and save the document using the increment option in PDFBox. Is this a good approach? If it is, is there a way to identify the "doc_signatures" page so I don't append it again.
Also, can I add like signature fields to that "doc_signatures" page, with a position each one, so when I open the PDF, I detect it has "doc_signatures" already created and that it already has a signature on that page on "Field 1"(with its own X,Y coordinates) so place the second signature on "Field 2" on "doc_signatures" page and "Field 3" for the third signature, and also some type of limmit of the amount of signatures on the document?
I would appreciate if this is a acceptable approach and if it is not, is there any recommendation or something I can do to accomplish this? I would appreciate any other approach or logic for this that can be implemented using PDFBox. Regards everyone.

As you ask this question in general (not PDFBox specific) terms, I'll start by answering similarly. PDFBox is versatile enough to implement the concepts in question.
First, since multiple signatures from different people can be added to the document, the position of the signatures is important due to aesthetics and document printing if needed. Having said this, I would like to know an approach to handle this. What I was thinking was adding/append an additional page at the end of the documents and assign to it some kind of identifier like "doc_signatures", so when the second person opens the document for signature, it detects it already has a "doc_signatures" page created, and just add the signature and save the document using the increment option in PDFBox. Is this a good approach?
Whether this is a good approach or not depends on the nature of the documents to be signed and your influence on the pre-signing workflow of the document.
Paper documents often have dedicated positions for signatures of persons in a specific role. If you are buying something and as part of the sales contract acknowledge receipt, your signature has to clearly also sign the receipt part while the signature of the vendor needs not.
In digital PDF signatures you can alternatively make this clear by means of the Reason entry of the signature field value, but as you also want to print the documents, that might not suffice: In print there is no signature field value, only its appearance.
In such a situation the document to sign should already be prepared with empty signature fields positioned appropriately in the document and named or otherwise flagged to signal the role of the person to sign it. This, by the way, would also be the interoperable way, empty signature fields can easily be signed in e.g. Adobe Reader.
If this is not possible, though, and if the software for signing the document has a GUI, this GUI might provide the capabilities for each signer to position his signature appropriately for his signing reason and role.
Otherwise your extra signature page approach would be the approach of choice.
If all signers have the same role, though, or if there at least is no special appropriate position for any of the signing roles, your extra page approach might not merely be a last resort. It even kind of looks like a document resulting from a notarial act.
If it is, is there a way to identify the "doc_signatures" page so I don't append it again.
For PDFs according to the current ISO 32000-1 norm, you could do this using a page-piece dictionary:
A page-piece dictionary may be used to hold private conforming product data. The data may be
associated with a page or form XObject by means of the optional PieceInfo entry in the page object or form dictionary.
(section 14.5 of ISO 32000-1)
It looks like these piece dictionaries will be deprecated in the upcoming ISO 32000-2, though. Thus, a more future-proof approach would be for you to register a developer prefix and use your own key for that endeavor:
Developer
prefixes shall be used to identify extensions to PDF that use First Class names (see below) and that are
intended for public use.
(annex E of ISO 32000-1)
These custom keys don't seem to become deprecated in ISO 32000-2.
Also, can I add like signature fields to that "doc_signatures" page, with a position each one, so when I open the PDF, I detect it has "doc_signatures" already created and that it already has a signature on that page on "Field 1"(with its own X,Y coordinates) so place the second signature on "Field 2" on "doc_signatures" page and "Field 3" for the third signature, and also some type of limmit of the amount of signatures on the document?
You can easily inspect the annotations on your extra page and especially determine their location and extent. Consequently you can arrange additional signatures to your liking on an individual basis. Alternatively you can prepare a fixed number of empty signature fields on that extra page when you create it, arranging the signatures to your liking in one go.
All the above is only possible if the source document has not been signed before! If it already has been signed, adding a new page usually is considered a disallowed change of the document, effectively invalidating that first signature. For allowed and disallowed changes of signed document, see this answer.
Lets say each document type has a specific amount of signatures, for example, A sales document, with seller and buyer signatures, so the approach would be adding too signing fields to the documents and then place the signatures on those fields.. am I correct?
Exactly that is what I would propose: If you know the number and roles of the signers beforehand, prepare empty signature fields for them. In that case you do not even have to mark a signature page or something.
Now, sorry to bother you, with PDFBox will I be able to create signature fields and add signatures to those fields? Is there any example code for that?
Both is possible with PDFBox, but in particular adding a signature to an existing empty signature field may require some own coding.

Is it possible to append or merge one or more XFA form-based PDF files together with iText?

I have one PDF file that has an embedded form that is based on XFA (XML) forms. The first PDF has a table which holds a list of people. If that table overflows, the subsequent list of people are handled by an addendum page which is also a PDF (XFA based form). Is it possible to merge all XFA-based PDFs into one PDF using iText?

#BrunoLowagie Thanks for your response. Actually, I managed to get iText to concatenate PDF interactive forms to create a custom PDF packet. Let me explain how I did this.
From using Adobe Acrobat XI Pro, I learned that when the XFA PDF is loaded, I cannot edit the form if I go to Tools->Edit (it gives me the usual warning that this PDF was created by LiveCycle Designer), but when I go to Pages->Extract, then select all pages to be extracted, the whole XFA-based PDF is extracted and converted over to an AcroForms based PDF. So if I had 25 fields in the XFA-based PDF, it had successfully converted all 25 fields into AcroForm fields. Somehow Adobe Acrobat had to determine the variable names based on the XML structure. (i.e. xpath) //form1/Page1/variable1 was converted to acro field name: form1[0].Page1[0].variable1[0]. All visible (editable form) fields were present and aligned (pixel-perfect) as usual.
If I had flattened the XFA PDF, I would again need to place the form fields back on each page, which would be tedious. By using Pages->Extract->All Pages, it converts everything for me (no flattening needed - flattening which strips all fields also; also no XFA worker lib needed).
However, my PDF packet is static and I would like to repeat the addendum page in case data overflows from the 2nd page. I know I could have modified the initial XFA to handle this overflow, but the client wants to use the exact page look for the addendum, with headers/footers intact.
I found that I can achieve this by extracting the addendum page separately via Adobe Acrobat Pro->Pages->Extract->(Select Addendum Page), then it was converted to a PDF form w/AcroForm intact.
I took the original PDF packet and attempted to concatenated the addendum PDF page. So for the moment, the main packet has AcroForm fields and the Addendum PDF page also has AcroForm fields.
When I used PdfCopy or PdfConcatenate to do the concatenation, I noticed that I lost all form fields when I called form.getFields()
When I used the (deprecated) PdfCopyFields to do the concatenation, all AcroForm fields were intact. (Exactly what I needed!) I also tested the PdfCopyFields where some fields were filled/saved and the PdfCopyFields still worked & carried over the prefilled values over. I looked at the reason why PdfCopyFields was marked for deprecation, saying that we can either merge accessible files and lose the forms or merge the forms or lose the accessibility (tagged PDFs). What if I don't care about tagged PDFs or accessibility?....and still need PDFCopyFields to carry over the form fields intact. So far I'm forced to keep using the PDFCopyFields since it does exactly what I need for PDF concatenation with Interactive Form Fields. Is it possible to update PDFCopy to have an option to copy over the fields if PDFCopyFields is going away?

Attach hidden (biometric) data to a digital signature on a pdf

I'm wondering if it is possible, using iText (that I used for signing) or other tools in Java, to add biometric data on a pdf.
I'll explain better: while signing on a sign tablet, I collect signature information like pen pressure, signing speed and so on. I'd like to store those informations (variables in java) togheter with the signature on the pdf. Obviously hidden and encrypted such as the signatures info.
Is there some kind of hidden data field on a pdf or something that can contain this kind of information? I think it is inappropriate to store it in the metadata fields such as author etc.

There are different ways to add info to a PDF document.
You could add the data in a document-level attachment. That way, people can inspect the data by opening the attachment panel.
Storing it as metadata is fine too, but you're right about it being inappropriate to store that info in something like the author key.
As you may know, the /Info dictionary will be deprecated in PDF 2.0 in favor of using an XMP metadata stream. In this metadata stream, you can add custom XML data (see section 2.2.1 of the XMP specification - Part 3).
If you don't want to mix your biometric data with the document metadata, you can even define an XMP stream for any dictionary you want, probably including the signature dictionary. See section 14.3.2 of ISO-32000-1.
PS 1: I don't know who downvoted your question. I upvoted it, so you're back at 0.
PS 2: If you want to create future proof signatures, read http://itextpdf.com/book/digitalsignatures
PS 3: Signatures created with the 4-year-old version of iText usually aren't future-proof.

Reading PDF in java as a file and making "PDF" editable

I have a program which will be used for building questions database. I'm making it for a site that want user to know that contet was donwloaded from that site. That's why I want the output be PDF - almost everyone can view it, almost nobody can edit it (and remove e.g. footer or watermark, unlike in some simpler file types). That explains why it HAS to be PDF.
This program will be used by numerous users which will create new databases or expand existing ones. That's why having output formed as multple files is extremly sloppy and inefficient way of achieving what I want to achieve (it would complicate things for the user).
And what I want to do is to create PDF files which are still editable with my program once created.
I want to achieve this by implementing my custom file type readable with my program into the output PDF.
I came up with three ways of doing that:
Attach the file to PDF and then corrupting the part of PDF which contains it in a way it just makes the PDF unaware that it contains the file, thus making imposible for user to notice it (easely). Upon reading the document I'd revert the corruption and extract file using one of may PDF libraries.
Hide the file inside an image which would be added to the PDF somwhere on the first or last page, somehow (that is still need to work out) hidden from the public eye. Knowing it's location, it should be relativley easy to retrieve it using PDF library.
I have learned that if you add "%" sign as a first character in line inside a PDF, the whole line will be ignored (similar to "//" in Java) by the PDF reader (atleast Adobe reader), making possible for me to add as many lines as I want to the PDF (if I know where, and I do) whitout the end user being aware of that. I could implement my whole custom file into PDF that way. The problem here is that I actually have to read the PDF using one of the Java's input readers, but I'm not sure which one. I understand that PDF can't be read like a text file since it's a binary file (Right?).
In the end, I decided to go with the method number 3.
Unless someone has any better ideas, and the conditions are:
1. One file only. And that file is PDF.
2. User must not be aware of the addition.
The problem is that I don't know how to read the PDF as a file (I'm not trying to read it as a PDF, which I would do using a PDF library).
So, does anyone have a better idea?
If not, how do I read PDF as a FILE, so the output is array of characters (with newline detection), and then rewrite the whole file with my content addition?

In Java, there is no real difference between text and binary files, you can read them both as an inputstream. The difference is that for binary files, you can't really create a Reader for it, because that assumes there's a way to convert the byte stream to unicode characters, and that won't work for PDF files.
So in your case, you'd need to read the files in byte buffers and possibly loop over them to scan for bytes representing the '%' and end-of-line character in PDF.
A better way is to use another existing way of encoding data in a PDF: XMP tags. This is allows any sort of complex Key-Value pairs to be encoded in XML and embedded in PDF's, JPEGs etc. See http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf.
There's an open source library in Java that allows you to manipulate that: http://pdfbox.apache.org/userguide/metadata.html. See also a related question from another guy who succeeded in it: custom schema to XMP metadata or http://plindenbaum.blogspot.co.uk/2010/07/pdfbox-insertextract-metadata-frominto.html

It's all just 1's and 0's - just use RandomAccessFile and start reading. The PDF specification defines what a valid newline character(s) is/are (there are several). Grab a hex editor and open a PDF and you can at least start getting a feel for things. Be careful of where you insert your lines though - you'll need to add them towards the end of the file where they won't screw up the xref table offsets to the obj entries.
Here's a related question that may be of interest: PDF parsing file trailer
I would suggest putting your comment immediately before the startxref line. If you put it anywhere else, you could wind up shifting things around and breaking the xref table pointers.
So a simple algorithm for inserting your special comment will be:
Go to the end of the file
Search backwards for startxref
Insert your special comment immediately before startxref - be sure to insert a newline character at the end of your special comment
Save the PDF
You can (and should) do this manually in a hex editor.
Really important: are your users going to be saving changes to these files? i.e. if they fill in the form field, are they going to hit save? If they are, your comment lines may be removed during the save (and different versions of different PDF viewers could behave differently in this regard).
XMP tags are the correct way to do what you are trying to do - you can embed entire XML segments, and I think you'd be hard pressed to come up with a data structure that couldn't be expressed as XML.
I personally recommend using iText for this, but I'm biased (I'm one of the devs). The iText In Action book has an excellent chapter on embedding XMP data into PDFs. Here's some sample code from the book (which I definitely recommend): http://itextpdf.com/examples/iia.php?id=217

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.