PDFBox identify specific pages and functionalities recommendations

I am looking for a way to handle multiple signatures on a document, so I have a couple of questions about what I can and cannot do.
First, since multiple signatures from different people can be added to the document, the position of the signatures is important due to aesthetics and document printing if needed. Having said this, I would like to know an approach to handle this. What I was thinking was adding/append an additional page at the end of the documents and assign to it some kind of identifier like "doc_signatures", so when the second person opens the document for signature, it detects it already has a "doc_signatures" page created, and just add the signature and save the document using the increment option in PDFBox. Is this a good approach? If it is, is there a way to identify the "doc_signatures" page so I don't append it again.
Also, can I add signature fields to that "doc_signatures" page, each with its own position, so that when I open the PDF I can detect that "doc_signatures" already exists and already has a signature in "Field 1" (with its own X,Y coordinates), and then place the second signature in "Field 2" and the third in "Field 3" on the "doc_signatures" page? And can I set some kind of limit on the number of signatures in the document?
I would appreciate knowing whether this is an acceptable approach and, if it is not, whether there is any recommendation for accomplishing this. I would also appreciate any other approach or logic for this that can be implemented using PDFBox. Regards, everyone.

As you ask this question in general (not PDFBox specific) terms, I'll start by answering similarly. PDFBox is versatile enough to implement the concepts in question.
First, since multiple signatures from different people can be added to the document, the position of the signatures is important due to aesthetics and document printing if needed. Having said this, I would like to know an approach to handle this. What I was thinking was adding/append an additional page at the end of the documents and assign to it some kind of identifier like "doc_signatures", so when the second person opens the document for signature, it detects it already has a "doc_signatures" page created, and just add the signature and save the document using the increment option in PDFBox. Is this a good approach?
Whether this is a good approach or not depends on the nature of the documents to be signed and your influence on the pre-signing workflow of the document.
Paper documents often have dedicated positions for signatures of persons in a specific role. If you are buying something and, as part of the sales contract, acknowledge receipt, your signature clearly also has to cover the receipt part, while the vendor's signature need not.
In digital PDF signatures you can alternatively make this clear by means of the Reason entry of the signature field value, but as you also want to print the documents, that might not suffice: In print there is no signature field value, only its appearance.
In such a situation the document to sign should already be prepared with empty signature fields positioned appropriately in the document and named or otherwise flagged to signal the role of the person to sign it. This, by the way, would also be the interoperable way: empty signature fields can easily be signed in e.g. Adobe Reader.
If this is not possible, though, and if the software for signing the document has a GUI, this GUI might provide the capabilities for each signer to position his signature appropriately for his signing reason and role.
Otherwise your extra signature page approach would be the approach of choice.
If all signers have the same role, though, or if there at least is no special appropriate position for any of the signing roles, your extra page approach might not merely be a last resort. It even kind of looks like a document resulting from a notarial act.
If it is, is there a way to identify the "doc_signatures" page so I don't append it again.
For PDFs according to the current ISO 32000-1 norm, you could do this using a page-piece dictionary:
A page-piece dictionary may be used to hold private conforming product data. The data may be associated with a page or form XObject by means of the optional PieceInfo entry in the page object or form dictionary.
(section 14.5 of ISO 32000-1)
It looks like these piece dictionaries will be deprecated in the upcoming ISO 32000-2, though. Thus, a more future-proof approach would be for you to register a developer prefix and use your own key for that endeavor:
Developer prefixes shall be used to identify extensions to PDF that use First Class names (see below) and that are intended for public use.
(annex E of ISO 32000-1)
These custom keys don't seem to become deprecated in ISO 32000-2.
Also, can I add signature fields to that "doc_signatures" page, each with its own position, so that when I open the PDF I can detect that "doc_signatures" already exists and already has a signature in "Field 1" (with its own X,Y coordinates), and then place the second signature in "Field 2" and the third in "Field 3" on the "doc_signatures" page? And can I set some kind of limit on the number of signatures in the document?
You can easily inspect the annotations on your extra page and especially determine their location and extent. Consequently you can arrange additional signatures to your liking on an individual basis. Alternatively you can prepare a fixed number of empty signature fields on that extra page when you create it, arranging the signatures to your liking in one go.
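The slot bookkeeping for such an extra page can be sketched without any PDF library. The following is only a minimal model, not PDFBox code: the classes Slot and SignaturePageLayout, the field names "Signature1", "Signature2", ... and all coordinates are hypothetical. In a real implementation the set of already signed field names would come from inspecting the signature fields found on the extra page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical description of one prepared signature position on the extra page. */
final class Slot {
    final String fieldName;
    final double x, y, width, height;

    Slot(String fieldName, double x, double y, double width, double height) {
        this.fieldName = fieldName;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

/** Library-free model of a signature page with a fixed number of slots. */
final class SignaturePageLayout {
    private final List<Slot> slots = new ArrayList<>();

    /** Lays out maxSignatures slots top to bottom; all measurements are assumed values in points. */
    SignaturePageLayout(int maxSignatures) {
        double top = 700, height = 60, gap = 20;
        for (int i = 0; i < maxSignatures; i++) {
            double y = top - i * (height + gap);
            slots.add(new Slot("Signature" + (i + 1), 50, y, 200, height));
        }
    }

    /** Returns the first slot whose field is not signed yet, or empty if the limit is reached. */
    Optional<Slot> nextFreeSlot(Set<String> signedFieldNames) {
        for (Slot slot : slots) {
            if (!signedFieldNames.contains(slot.fieldName)) {
                return Optional.of(slot);
            }
        }
        return Optional.empty();
    }
}
```

An empty result from nextFreeSlot doubles as the limit on the number of signatures: once every prepared slot is taken, the signing code can refuse to add another one.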
All the above is only possible if the source document has not been signed before! If it has already been signed, adding a new page is usually considered a disallowed change to the document, effectively invalidating that first signature. For allowed and disallowed changes to signed documents, see this answer.
Let's say each document type has a specific number of signatures, for example a sales document with seller and buyer signatures. So the approach would be to add two signature fields to the document and then place the signatures in those fields. Am I correct?
Exactly that is what I would propose: If you know the number and roles of the signers beforehand, prepare empty signature fields for them. In that case you do not even have to mark a signature page or something.
Now, sorry to bother you, with PDFBox will I be able to create signature fields and add signatures to those fields? Is there any example code for that?
Both are possible with PDFBox, but adding a signature to an existing empty signature field in particular may require some coding of your own.

Related

Is it possible to add a footnote depending on the page contents when using itext?

I am using iText 5.5, and right now I have a custom implementation of PdfPageEventHelper that adds a footer containing page number information to each page.
Recent changes in my application lead to the existence of necessary footnotes. The way I am creating the PDF (dynamically created from a list of Components) makes it effectively impossible to determine which page contains which items, as that is part of customizable Styling options.
However, I need to add explanations to the footnote markers.
The approach I have now is to simply notify the PdfPageEventHelper that, somewhere in the document, there is at least one element that needs the (currently only) footnote, and then I add the explanatory footnote to every Page.
This is something I want to avoid, as the future might bring more footnotes and explanations.
So the question is:
Can I parse the current page content directly and scan for the existence of marker text? Or is there another way to see if the current page needs the explanatory footnote?
My failed approaches so far (all in onEndPage(PdfWriter, Document)):
PdfContentByte cb = writer.getDirectContent();
PdfReader reader = new PdfReader(cb.toPdf(writer));
// this led to InvalidPdfException
----
ColumnText ct = new ColumnText(cb);
ct.getCompositeElements();
// returned null, I expected the current page contents
----
OutputStreamCounter oc = writer.getOs();
// did not expose any useful methods. also, cannot read from OutputStream
Googling the problem yielded dozens of results on how to add a page number or how to add a document-static, user-specific header, but nothing page-dependent.
Oh, and this, which is not really helpful:
Adding a pdf footer conditionally on certain pages in a multi-page pdf document
which seems basically to be the exact same problem as mine.

Essentially you'll have to do it the other way around:
Simply add a generic tag to the chunk of a footnote marker. Then your page event listener is informed about this generic tag between the start of the page and the end of it. If you set a flag in onGenericTag, therefore, your onEndPage method merely has to check (and later reset) that flag and add the footnote accordingly.
You can even use the generic tag text to differentiate between different markers and only add the matching footnotes.
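The flag-and-reset logic can be sketched independently of iText. The class below is a library-free model, not iText code: FootnoteTracker and its explanations map are hypothetical, and the two method names merely mirror the onGenericTag and onEndPage callbacks of a page event listener, whose real versions take additional arguments such as the writer and the document. In an actual listener, onGenericTag would record the tag text and onEndPage would draw the returned footnotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Library-free model of a page event listener that collects footnote markers per page. */
final class FootnoteTracker {
    /** Maps a marker tag to its explanatory footnote text (assumed to be known up front). */
    private final Map<String, String> explanations;
    /** Tags seen since the last page end, in order of first appearance. */
    private final Set<String> tagsOnCurrentPage = new LinkedHashSet<>();

    FootnoteTracker(Map<String, String> explanations) {
        this.explanations = explanations;
    }

    /** Called whenever a tagged footnote marker is rendered on the current page. */
    void onGenericTag(String tag) {
        tagsOnCurrentPage.add(tag);
    }

    /** Called at the end of each page: returns the footnotes to draw and resets the flags. */
    List<String> onEndPage() {
        List<String> footnotes = new ArrayList<>();
        for (String tag : tagsOnCurrentPage) {
            String text = explanations.get(tag);
            if (text != null) {
                footnotes.add(text);
            }
        }
        tagsOnCurrentPage.clear();
        return footnotes;
    }
}
```

Using a set of tags instead of a single boolean means that each page only receives the explanations for the markers that actually occur on it, and a marker repeated on one page yields its footnote once.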
For an example use of generic tags, have a look at examples using the Chunk.setGenericTag(String) method, e.g. the sandbox example GenericFields.
(I originally referenced the iText site URL https://developers.itextpdf.com/examples/page-events-itext5/page-events-chunks but due to a restructuring of that site it leads nowhere specific anymore; but you can still find a copy of the original page using the wayback machine.)

adding digital signature to PDF with visible timestamp and Reason field using ESIG/DSS

I am trying to understand and implement a solution based on European Commission-sponsored Digital Signature Service project. I currently have succeeded in using the abstraction provided by the DSS-DEMO application, mentioned in the aforementioned github link, with the help of Nowina NexU client software. My wish is to digitally sign a PDF document with the following configuration:
no container
PAdES signature form
enveloped
PAdES_BASELINE_LT signature level
SHA256 digest algorithm
I want the signature to have a visible part, i.e. to be seen on the first page of the document. This is somewhat demonstrated here. Personally, I need the actual signing timestamp and the name of the signer from her certificate. In the above demonstration this is done by providing "parameters" to the signing function.
I also want to fill the Reason field of the signature - it is then subsequently displayed when you view the Signature properties with a program like Adobe Acrobat Reader.
My problems so far are the following, and I can't seem to find either examples or any other information about them.
If I want to display the signing timestamp that I would get from a Timestamp Authority service, how would I get it? The communication with the timestamp server happens during the signing process, i.e. after specifying the parameters as I mention above. I guess I have to dig into the DSS code and do all the steps it currently does for me myself.
Currently, a strange thing happens. It appears that the signatures are deemed valid, or at least UNKNOWN, when I specify a hardcoded Reason (like 'testtest'), or no Reason at all. When I fill it from results of something else, the signature is not valid. Because things like this don't usually happen by magic, I must be doing something awfully wrong.
The code is organized approximately like this - there's a REST communication between two machines - a server and a client with NexU installed. NexU does all the communication with the smart card or any other certificate store on the client machine - it exchanges the digest value and the signed digest value with the server. There are, among others, two specific phases in the server code :
getDataToSign - here a digest is calculated from the PDF content
signDocument - here the actual signing (embedding the signature into the document, I guess?) takes place.
I pass a host of parameters to both of these phases; among other things, they specify the signing timestamp, the Reason, and the visual parameters of the text I want to appear on the first page. I use the same parameters for both phases (because I am not sure which phase needs which).
My signing date - isn't it logical for it to be as close to the timestamp authority server's timestamp as can be? Okay - I am setting it to the current timestamp of my own server at the time of the beginning of the signing process.
I am setting Reason using PAdESSignatureParameters.setReason.
Any helpful insight is appreciated - thanks.

I have solved the weird issue with the Reason field of the Signature.
I don't seem to see any way around the Signing Date being different from the Timestamping Authority-provided Timestamp.
Explanation follows.
As for the first issue, it was my fault. To elaborate: as far as I understand, the signature parameters are provided to the DSS methods twice, using the SigningService.fillParameters() method:
in SigningService.getDataToSign(...) and then
in SigningService.signDocument(...)
It is important to do this in both methods, because the first time, the hash/digest of the document to be signed is calculated. Since I have chosen the signature to be enveloped, i.e. to be contained within the document that is going to be signed, we first need to apply the signature and then calculate the digest based on that "final" document.
As far as I saw in the DSS code (approximately), the in-memory representation of the uploaded PDF is signed, and its digest is calculated, during getDataToSign - but the outcome is discarded.
During the actual signDocument method (in between, the digest has travelled back to the client with NexU installed, and returned back to the server signed), the uploaded PDF is signed again, its digest calculated again, but this time the actual signed digest (we got from the client) is also applied to the document - and the in-memory outcome of this operation is sent back to the client as a signed PDF document.
What I was doing wrong was that the first time, I was losing the variable that I was going to add as the Reason (it got lost somewhere in the model attributes; I was not passing it along between requests). As a result, the first map of parameters, passed to getDataToSign, differed from the second map of parameters. So it is only logical that the actual hash/digest of the document was different from the digest in the saved signature (because at the time the digest to be signed was calculated, I wasn't passing the Reason). That is also why it worked with a hardcoded value: being hardcoded, it was present during both calls to fillParameters. It was such a stupid mistake, I know. I should have suspected this, because I could find absolutely nothing about any difficulties with passing the Reason (or other fields like Location) to the signature.
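Why a parameter that is missing in only one of the two calls breaks the signature can be illustrated with a deliberately simplified model. This is not the DSS algorithm: digestToSign below is a hypothetical stand-in that just hashes the document bytes together with the Reason, whereas the real code prepares the PDF with all signature parameters and hashes the resulting bytes. The effect is the same, though: any difference in the inputs of the two phases yields a different SHA-256 digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class DigestMismatchDemo {
    /** Stand-in for "prepare the document with these parameters and hash it"; not the DSS algorithm. */
    static byte[] digestToSign(byte[] document, String reason) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(document);
            if (reason != null) {
                // The Reason ends up inside the signed bytes, so it influences the digest.
                sha256.update(reason.getBytes(StandardCharsets.UTF_8));
            }
            return sha256.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** True if both phases would hash the same bytes, i.e. the signed digest still fits. */
    static boolean phasesAgree(byte[] document, String reasonInFirstCall, String reasonInSecondCall) {
        return Arrays.equals(digestToSign(document, reasonInFirstCall),
                             digestToSign(document, reasonInSecondCall));
    }
}
```

With a hardcoded Reason both calls see the same value and phasesAgree returns true; with a Reason that is lost in the first call (null) but present in the second, it returns false, which corresponds to the invalid signature.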
BTW the signing is done using Apache PDFBox, and it is done incrementally.
As for the second thing, we decided to leave it as it is, although there is a comparatively impressive gap between the signing timestamp and the timestamp authority one. I don't really know what should be the allowed gaps in cases like this. I guess that this happens because
My server might have a slightly off-normal local time
Because the whole signing process runs between two machines (the server and the client with NexU installed, plus the smart card), and because various dialog windows appear asking for a password etc., the actual signing is delayed, and the call to the timestamping authority is made during the very last step. Of course, I am not sure whether this is an issue, since theoretically the timestamping authority doesn't know about the actual contents being changed; the previous error would have been triggered in that case.
That's more like it - of course I am open to other comments and answers. Thank you!

PDFBox 1.8.10: Fill and Sign PDF produces invalid signatures

I fill in (programmatically) a form (AcroForm) in a PDF document and sign the document afterwards. I start with doc.pdf and create doc_filled.pdf using the setFields.java example of PDFBox. Then I sign doc_filled.pdf, creating doc_filled_signed.pdf, using some code based on the signature examples, and open the PDF in Acrobat Reader. The entered field data is visible, and the signature panel tells me
"There are errors in the formatting or information contained in this signature (The signature byte array is invalid)"
So far, I know that:
the signature code applied alone (i.e. directly creating some doc_signed.pdf) creates a valid signature
the problem exists for "invisible signatures", visible signatures and visible signatures, being added to existing signature fields.
the problem even occurs, if I do not fill the form, but only open it and save it, i.e.:
PDDocument doc = PDDocument.load(new File("doc.pdf"));
doc.save(new File("doc_filled.pdf"));
doc.close();
suffices to break the afterwards applied signing code.
On the other hand, if I take the same doc.pdf, enter the field's values manually in Adobe, the signing code produces valid signatures.
What am I doing wrong?
Update:
@mkl asked me to provide the files I am talking about (I currently do not have enough reputation to post all files as links; sorry for the inconvenience):
doc.pdf: https://www.dropbox.com/s/ev8x9q48w5l0hof/doc.pdf?dl=0
doc_filled.pdf: https://www.dropbox.com/s/fxn4gyneizs1zzb/doc_filled.pdf?dl=0
doc_filled_signed.pdf: https://www.dropbox.com/s/xm846sj8f9kiga9/doc_filled_signed.pdf?dl=0
doc_filled_and_signed.pdf: https://www.dropbox.com/s/5jftje6ke87jedr/doc_filled_and_signed.pdf?dl=0
The last one was created by signing and filling the document in one go, using
doc.saveIncremental();
As I already wrote in the comment, some
setNeedToBeUpdate(true);
seems to be missing, though.
With reference to @mkl's second comment, I found this SO question: Saved Text Field value is not displayed properly in PDF generated using PDFBOX, which also covers some entered text not being shown. I gave it a first try, applying
setBoolean(COSName.getPDFName("NeedAppearances"), true);
to the field's and the form's dictionaries, which then shows the field's content, but the signature does not get added in the end. I still have to look further into that.
Update:
The story continues here: PDFBox 1.8.10: Fill and Sign Document, Filling again fails

The cause of the OP's original problem, i.e. that after loading his PDF (for form fill-in) with PDFBox and then saving it, this new PDF cannot be successfully signed using PDFBox signing code, has already been explained in detail in this answer, in short:
When saving documents regularly, PDFBox does so using a cross reference table.
If the document to save regularly had been loaded from a PDF with a cross reference stream, all entries of the cross reference stream dictionary are saved in the trailer dictionary.
When saving documents in the process of applying a signature, PDFBox creates an incremental update; as such incremental updates require that the update uses the same kind of cross reference as the original revision, PDFBox in this case tries to use the same technique.
For recognizing the technique originally used PDFBox looks at the Type entry of the dictionary in its document representation into which trailer or cross reference stream dictionary had been loaded: If there is a Type entry with value XRef (which is so specified for cross reference streams), a stream is assumed, otherwise a table.
Thus, in the case of the OP's original PDF doc.pdf which has a cross reference stream:
After loading and form fill-in the document is saved regularly, i.e. using a cross reference table, but all the former cross reference stream entries, among them the Type, are copied to the trailer. (doc_filled.pdf)
After loading this saved PDF with a cross reference table for signing, it is saved again using an incremental update. PDFBox assumes (due to the Type trailer entry) that the existing file has a cross reference stream and, therefore, uses a cross reference stream at the end of the incremental update, too. (doc_filled_signed.pdf)
Thus, in the end the filled-in, then signed PDF has two revisions, the inner one with a cross reference table, the outer one with a cross reference stream.
As this is not valid, Adobe Reader upon loading the PDF, repairs this in its internal document representation. Repairing changes the document bytes. Thus, the signature in Adobe Reader's eyes is broken.
Most other signature validators don't attempt such repairs but check the signature of the document as is. They validate the signature successfully.
The answer referenced above also offers some ways around this:
A: After loading the PDF for form fill-in, remove the Type entry from the trailer before saving regularly. If signing is applied to this file, PDFBox will assume a cross reference table (because the misleading Type entry is not there). Thus, the signature incremental update will be valid.
B: Use an incremental update for saving the form fill-in changes, too, either in a separate run or in the same run as signing. This also results in a valid incremental update.
Generally I would propose the latter option because the former option likely will break if the PDFBox saving routines ever are made compatible with each other.
Unfortunately, though, the latter option requires marking the added and changed objects as updated, including a path from the document catalog. If this is not possible or at least too cumbersome, the first option might be preferable.
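The detection heuristic and the effect of option A can be modelled in a few lines. This is a library-free sketch, not PDFBox code: the trailer is represented as a plain map of strings, and XrefKindModel with its methods is hypothetical. It only mirrors the behaviour described above, namely that a regular save writes a cross reference table but copies all former entries, and that the kind of cross reference is later guessed from the Type entry.

```java
import java.util.HashMap;
import java.util.Map;

final class XrefKindModel {
    enum XrefKind { TABLE, STREAM }

    /** Mirrors the described heuristic: a Type entry with value XRef means "cross reference stream". */
    static XrefKind detect(Map<String, String> trailer) {
        return "XRef".equals(trailer.get("Type")) ? XrefKind.STREAM : XrefKind.TABLE;
    }

    /** Models a regular save: the file gets a table, but all old entries are copied to the trailer. */
    static Map<String, String> regularSave(Map<String, String> loadedTrailer) {
        return new HashMap<>(loadedTrailer);
    }

    /** Option A: drop the misleading Type entry before the regular save. */
    static Map<String, String> regularSaveWithoutType(Map<String, String> loadedTrailer) {
        Map<String, String> trailer = new HashMap<>(loadedTrailer);
        trailer.remove("Type");
        return trailer;
    }
}
```

For a document loaded from a file with a cross reference stream, detect(regularSave(trailer)) still reports STREAM although the saved file actually contains a table, which is exactly the mismatch that makes the later signing update invalid. After regularSaveWithoutType the heuristic reports TABLE, matching the file.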
In the case at hand the OP tried the latter option (doc_filled_and_signed.pdf):
At the moment the text box's content is only visible when the text box is selected (same behaviour in Acrobat Reader and Preview). I flag the PDField, all of its parents, the AcroForm, the Catalog as well as the page where it is displayed.
He marked the changed field as updated but not the associated appearance stream which automatically is generated by PDFBox when setting the form field value.
Thus, in the result PDF file the field has the new value but the old, empty appearance stream. Only when clicking into the field, Adobe Reader creates a new appearance based on the value for editing.
Thus, the OP also has to mark the new normal appearance stream (the form field dictionary contains an entry AP referencing a dictionary in which N references the normal appearance stream). Alternatively (if finding the changed or added entries becomes too cumbersome) he might try the other option.
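The effect of a forgotten update flag can be reproduced with a small model of an incremental save. It is a simplification under stated assumptions, not PDFBox code: PdfObjectNode and IncrementalSaveModel are hypothetical, objects are reduced to a name and a string value, and the writer is assumed to start at the catalog and to write and descend only into objects whose update flag is set (the flag plays the role of setNeedToBeUpdate(true)).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a PDF object with an update flag and referenced child objects. */
final class PdfObjectNode {
    final String name;
    String value;
    boolean needsUpdate;
    final List<PdfObjectNode> children = new ArrayList<>();

    PdfObjectNode(String name, String value) {
        this.name = name;
        this.value = value;
    }

    /** Adds a referenced object and returns it, so that trees can be built step by step. */
    PdfObjectNode add(PdfObjectNode child) {
        children.add(child);
        return child;
    }
}

final class IncrementalSaveModel {
    /** Writes flagged objects only; unflagged objects (and everything below them) are skipped. */
    private static void collect(PdfObjectNode node, Map<String, String> out) {
        if (!node.needsUpdate) {
            return;
        }
        out.put(node.name, node.value);
        for (PdfObjectNode child : node.children) {
            collect(child, out);
        }
    }

    /** Returns what a reader sees: the previous revision overlaid with the flagged objects. */
    static Map<String, String> save(Map<String, String> previousRevision, PdfObjectNode catalog) {
        Map<String, String> result = new LinkedHashMap<>(previousRevision);
        collect(catalog, result);
        return result;
    }
}
```

If the catalog, the AcroForm and the field are flagged but the appearance stream below the field is not, the saved result contains the new field value next to the old appearance, which matches the behaviour observed in doc_filled_and_signed.pdf. Flagging the appearance stream as well makes the new appearance part of the update.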

Digital signatures in MS Word

Do you know a way (e.g. a library) to generate signature lines in a Microsoft Word document in Java, as described for example on Microsoft's page, and later use them to sign the document multiple times? Each signature can be added at a different point in time.
The crucial functionality I need to have is that a signer's name is added when a document is signed. In MS Word at the beginning you can define a number of 'signature lines' and then each person must right click on one signature box and click sign. The signer's name is filled in without breaking previous signatures.
I know how to sign documents in java (usually external xades signatures). I was also able to add multiple signatures at the same time using Apache POI as described here:
How to programatically sign an MS office XML document with Java? but this will not work when changing a document (to update a signer's name). Maybe this functionality is also available in Apache POI?
In the attached print screen you can see (unfortunately in Polish; PODPIS/PODPISY == SIGNATURE/SIGNATURES) what can be generated in Microsoft Word. Near the 'X' signers' names are updated when the document is signed by each person.
Thanks for any help.

Attach hidden (biometric) data to a digital signature on a pdf

I'm wondering if it is possible, using iText (that I used for signing) or other tools in Java, to add biometric data on a pdf.
I'll explain better: while signing on a signature tablet, I collect signature information like pen pressure, signing speed and so on. I'd like to store that information (variables in Java) together with the signature in the PDF, obviously hidden and encrypted like the signature info.
Is there some kind of hidden data field on a pdf or something that can contain this kind of information? I think it is inappropriate to store it in the metadata fields such as author etc.

There are different ways to add info to a PDF document.
You could add the data in a document-level attachment. That way, people can inspect the data by opening the attachment panel.
Storing it as metadata is fine too, but you're right about it being inappropriate to store that info in something like the author key.
As you may know, the /Info dictionary will be deprecated in PDF 2.0 in favor of using an XMP metadata stream. In this metadata stream, you can add custom XML data (see section 2.2.1 of the XMP specification - Part 3).
If you don't want to mix your biometric data with the document metadata, you can even define an XMP stream for any dictionary you want, probably including the signature dictionary. See section 14.3.2 of ISO-32000-1.
PS 1: I don't know who downvoted your question. I upvoted it, so you're back at 0.
PS 2: If you want to create future proof signatures, read http://itextpdf.com/book/digitalsignatures
PS 3: Signatures created with the 4-year-old version of iText usually aren't future-proof.
