I'm wondering if it is possible, using iText (which I used for signing) or other Java tools, to add biometric data to a PDF.
To explain: while signing on a signature tablet, I collect signature information such as pen pressure, signing speed and so on. I'd like to store this information (held in Java variables) together with the signature in the PDF, hidden and encrypted like the signature data itself.
Is there some kind of hidden data field on a pdf or something that can contain this kind of information? I think it is inappropriate to store it in the metadata fields such as author etc.
There are different ways to add info to a PDF document.
You could add the data in a document-level attachment. That way, people can inspect the data by opening the attachment panel.
Storing it as metadata is fine too, but you're right about it being inappropriate to store that info in something like the author key.
As you may know, the /Info dictionary will be deprecated in PDF 2.0 in favor of using an XMP metadata stream. In this metadata stream, you can add custom XML data (see section 2.2.1 of the XMP specification - Part 3).
If you don't want to mix your biometric data with the document metadata, you can even define an XMP stream for any dictionary you want, probably including the signature dictionary. See section 14.3.2 of ISO-32000-1.
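To give an idea of what such custom XMP data could look like, here is a minimal sketch in plain Java that turns pen samples into an rdf:Description fragment. The class name, the "bio" prefix and the namespace URI are made up for illustration, the rdf prefix is assumed to be declared by the enclosing rdf:RDF element, and encrypting the values as well as attaching the stream with iText are not covered.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: class name, "bio" prefix and namespace URI are
// illustrative assumptions, not part of iText or of the XMP specification.
class BiometricXmpSketch {

    static final String BIO_NS = "http://example.com/ns/biometric/1.0/";

    /** Builds an rdf:Description fragment with an average speed and an ordered list of pressures. */
    static String toRdfDescription(List<Double> pressures, double averageSpeed) {
        StringBuilder sb = new StringBuilder();
        // The rdf prefix itself is expected to be declared on the surrounding rdf:RDF element.
        sb.append("<rdf:Description rdf:about=\"\" xmlns:bio=\"").append(BIO_NS).append("\">\n");
        sb.append("  <bio:averageSpeed>")
          .append(String.format(Locale.ROOT, "%.3f", averageSpeed))
          .append("</bio:averageSpeed>\n");
        // An rdf:Seq keeps the samples in the order in which they were recorded.
        sb.append("  <bio:pressure>\n    <rdf:Seq>\n");
        for (double p : pressures) {
            sb.append("      <rdf:li>")
              .append(String.format(Locale.ROOT, "%.3f", p))
              .append("</rdf:li>\n");
        }
        sb.append("    </rdf:Seq>\n  </bio:pressure>\n");
        sb.append("</rdf:Description>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toRdfDescription(Arrays.asList(0.25, 0.5, 0.75), 12.5));
    }
}
```

The values are plain numbers here, so no XML escaping is needed; an encrypted payload would have to be text-encoded (for example as Base64) before it goes into such an element.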
PS 1: I don't know who downvoted your question. I upvoted it, so you're back at 0.
PS 2: If you want to create future proof signatures, read http://itextpdf.com/book/digitalsignatures
PS 3: Signatures created with the 4-year-old version of iText usually aren't future-proof.
I am able to extract text from PDFs that don't have any security restrictions. I just want to know whether it is possible to extract text from a PDF that has restrictions.
UPDATE:
Thanks to all for your comments. I appreciate your concern. Please understand the question. I did not ask how to do it. I just want to know if it is possible. I have created a PDF with these restrictions. I do not want my information to be extracted from my document. There are many developers who can achieve any task. I want to know if this task can be done. If this can be done, then I will investigate further to overcome this issue.
As the OP clarified that he wants to know whether his documents with such restrictions are safe from text extraction, and that he is not asking how to do it (in spite of the explicit languages and libraries given in the tags), here is an answer on the principle, not a concrete implementation. Thus...
Yes, it is possible to extract text from documents with restrictions as long as the document can be read at all and no other means are applied to prevent text extraction.
The restrictions you show are merely flags that indicate to a PDF processor what the author wants to allow or disallow a user to do with the document; they are not technical restrictions.
These restrictions can only be applied to encrypted documents, but you surely want these restrictions to work in particular for anyone (other than you) who can open the document for reading, be it by knowing a specific user password or be it by using the empty password.
Cf. the specification ISO 32000 (here from part 2, similarly in part 1 with a focus on PDF viewers):
If a user attempts to open an encrypted document that has a user password, the PDF reader shall first try to authenticate the encrypted document using the padding string defined in 7.6.4.3, "File encryption key algorithm" (default user password):
If this authentication attempt is successful, the PDF reader may open, decrypt, render and otherwise provide access to the document.
If this authentication attempt fails, the interactive PDF processor should prompt for a password. Correctly supplying either password (owner or user password) should enable the user to gain access to the document.
Whether additional operations shall be allowed on a decrypted document depends on which password (if any) was supplied when the document was opened and on any access restrictions that were specified when the document was created:
Opening the document with the correct owner password should allow full (owner) access to the document. This unlimited access includes the ability to change the document’s passwords and access permissions.
Opening the document with the correct user password (or opening a document with the default password) should allow additional operations to be performed according to the user access permissions specified in the document’s encryption dictionary.
Access permissions shall be specified in the form of flags corresponding to the various operations and the set of operations to which they correspond shall depend on the security handler’s revision number (also stored in the encryption dictionary).
...
Once the document has been opened and decrypted successfully, a PDF reader technically has access to the entire contents of the document. There is nothing inherent in PDF encryption that enforces the document permissions specified in the encryption dictionary. PDF readers shall respect the intent of the document creator by restricting user access to an encrypted PDF file according to the permissions contained in the file.
(ISO 32000-2 section 7.6.4 Standard Security Handler)
Thus, these restrictions only work in cooperating PDF processors, but in particular in case of open source PDF libraries, it is trivial for a programmer to remove any code trying to enforce the restrictions.
Being aware of this, the developers of open source PDF libraries usually don't try to enforce the restrictions at all, or they add some flag to override restriction enforcement to prevent patched copies of the library from circulating.
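To illustrate why these flags are no technical barrier, here is a small sketch in plain Java. It models the permission value (the P entry of the encryption dictionary) as an integer, with bit positions as I read them in the ISO 32000-1 permissions table (bit 3 print, bit 4 modify, bit 5 extract, counted from 1 at the low-order end). The class and method names are made up; this is not code from any PDF library.

```java
// Simplified model of the permission integer of the standard security handler.
// All names here are hypothetical and only serve the illustration.
class PermissionFlagsSketch {

    static final int PRINT = 1 << 2;    // bit 3: print the document
    static final int MODIFY = 1 << 3;   // bit 4: modify the contents
    static final int EXTRACT = 1 << 4;  // bit 5: copy or extract text and graphics

    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) != 0;
    }

    /** A cooperating processor checks the flag before handing out the text. */
    static String extractCooperating(int permissions, String decryptedText) {
        if (!isAllowed(permissions, EXTRACT)) {
            throw new SecurityException("text extraction not permitted by the author");
        }
        return decryptedText;
    }

    /** A non-cooperating processor simply skips the check; nothing else stops it. */
    static String extractIgnoringFlags(int permissions, String decryptedText) {
        return decryptedText;
    }

    public static void main(String[] args) {
        int restrictive = -3904; // 0xFFFFF0C0: bits 3 to 6 and 9 to 12 are all cleared
        System.out.println(isAllowed(restrictive, EXTRACT));
        System.out.println(extractIgnoringFlags(restrictive, "already decrypted text"));
    }
}
```

Once the content is decrypted, the only difference between the two methods is an if statement, which is exactly the part a programmer can remove.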
I fill (programmatically) a form (AcroForm) in a PDF document and sign the document afterwards. I start with doc.pdf and create doc_filled.pdf using the setFields.java example of PDFBox. Then I sign doc_filled.pdf, creating doc_filled_signed.pdf, using some code based on the signature examples, and open the PDF in Acrobat Reader. The entered field data is visible and the signature panel tells me
"There are errors in the formatting or information contained in this signature (The signature byte array is invalid)"
So far, I know that:
the signature code applied alone (i.e. directly creating some doc_signed.pdf) creates a valid signature
the problem exists for "invisible signatures", for visible signatures, and for visible signatures added to existing signature fields.
the problem even occurs if I do not fill the form but only open it and save it, i.e.:
PDDocument doc = PDDocument.load(new File("doc.pdf"));
doc.save(new File("doc_filled.pdf"));
doc.close();
suffices to break the signing code applied afterwards.
On the other hand, if I take the same doc.pdf, enter the field's values manually in Adobe, the signing code produces valid signatures.
What am I doing wrong?
Update:
@mkl asked me to provide the files I am talking about (I do not currently have enough reputation to post all files as links, sorry for the inconvenience):
doc.pdf: https://www.dropbox.com/s/ev8x9q48w5l0hof/doc.pdf?dl=0
doc_filled.pdf: https://www.dropbox.com/s/fxn4gyneizs1zzb/doc_filled.pdf?dl=0
doc_filled_signed.pdf: https://www.dropbox.com/s/xm846sj8f9kiga9/doc_filled_signed.pdf?dl=0
doc_filled_and_signed.pdf: https://www.dropbox.com/s/5jftje6ke87jedr/doc_filled_and_signed.pdf?dl=0
The last one was created by signing and filling the document in one go, using
doc.saveIncremental();
As I already wrote in the comment, some
setNeedToBeUpdate(true);
seems to be missing, though.
With reference to @mkl's second comment, I found this
SO question: Saved Text Field value is not displayed properly in PDF generated using PDFBOX, which also covers some entered text not being shown. I gave it a first try, applying
setBoolean(COSName.getPDFName("NeedAppearances"), true);
to the field's and the form's dictionary, which then shows the field's content, but the signature does not get added in the end. I still have to look further into that.
Update:
The story continues here: PDFBox 1.8.10: Fill and Sign Document, Filling again fails
The cause of the OP's original problem, i.e. that after loading his PDF (for form fill-in) with PDFBox and then saving it, this new PDF cannot be successfully signed using PDFBox signing code, has already been explained in detail in this answer, in short:
When saving documents regularly, PDFBox does so using a cross reference table.
If the document to save regularly had been loaded from a PDF with a cross reference stream, all entries of the cross reference stream dictionary are saved in the trailer dictionary.
When saving documents in the process of applying a signature, PDFBox creates an incremental update; as such incremental updates require that the update uses the same kind of cross reference as the original revision, PDFBox in this case tries to use the same technique.
To recognize the technique originally used, PDFBox looks at the Type entry of the dictionary in its document representation into which the trailer or cross reference stream dictionary had been loaded: if there is a Type entry with value XRef (which is specified for cross reference streams), a stream is assumed, otherwise a table.
Thus, in the case of the OP's original PDF doc.pdf which has a cross reference stream:
After loading and form fill-in the document is saved regularly, i.e. using a cross reference table, but all the former cross reference stream entries, among them the Type, are copied to the trailer. (doc_filled.pdf)
After loading this saved PDF with a cross reference table for signing, it is saved again using an incremental update. PDFBox assumes (due to the Type trailer entry) that the existing file has a cross reference stream and, therefore, uses a cross reference stream at the end of the incremental update, too. (doc_filled_signed.pdf)
Thus, in the end the filled-in, then signed PDF has two revisions, the inner one with a cross reference table, the outer one with a cross reference stream.
As this is not valid, Adobe Reader repairs this in its internal document representation upon loading the PDF. Repairing changes the document bytes. Thus, in Adobe Reader's eyes the signature is broken.
Most other signature validators don't attempt such repairs but check the signature of the document as is. They validate the signature successfully.
The answer referenced above also offers some ways around this:
A: After loading the PDF for form fill-in, remove the Type entry from the trailer before saving regularly. If signing is applied to this file, PDFBox will assume a cross reference table (because the misleading Type entry is not there). Thus, the signature incremental update will be valid.
B: Use an incremental update for saving the form fill-in changes, too, either in a separate run or in the same run as signing. This also results in a valid incremental update.
Generally I would propose the latter option because the former option likely will break if the PDFBox saving routines ever are made compatible with each other.
Unfortunately, though, the latter option requires marking the added and changed objects as updated, including a path from the document catalog. If this is not possible or at least too cumbersome, the first option might be preferable.
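The detection logic and the effect of option A can be modelled in a few lines of plain Java. This is only a simplified sketch with the trailer represented as a Map; the names are made up and this is not PDFBox code. In PDFBox itself, option A would presumably amount to removing the Type item from the trailer dictionary of the loaded document before saving, but that mapping is an assumption on my part.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the decision described above; not PDFBox code.
class XrefKindSketch {

    enum XrefKind { TABLE, STREAM }

    /** A Type entry with value XRef is taken as a sign of a cross reference stream. */
    static XrefKind detect(Map<String, String> trailer) {
        return "XRef".equals(trailer.get("Type")) ? XrefKind.STREAM : XrefKind.TABLE;
    }

    /** Option A: return a copy of the trailer without the misleading Type entry. */
    static Map<String, String> withoutType(Map<String, String> trailer) {
        Map<String, String> copy = new LinkedHashMap<>(trailer);
        copy.remove("Type");
        return copy;
    }

    public static void main(String[] args) {
        // Trailer of a file saved with a cross reference table, but with the
        // entries of the former cross reference stream dictionary copied over.
        Map<String, String> trailer = new LinkedHashMap<>();
        trailer.put("Type", "XRef");
        trailer.put("Size", "42");
        trailer.put("Root", "1 0 R");

        System.out.println(detect(trailer));              // the wrong guess
        System.out.println(detect(withoutType(trailer))); // the guess after option A
    }
}
```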
In the case at hand the OP tried the latter option (doc_filled_and_signed.pdf):
At the moment the text box's content is only visible when the text box is selected (with Acrobat Reader and Preview the same behaviour). I flag the PDField, all of its parents, the AcroForm, the Catalog as well as the page where it is displayed.
He marked the changed field as updated but not the associated appearance stream which automatically is generated by PDFBox when setting the form field value.
Thus, in the result PDF file the field has the new value but the old, empty appearance stream. Only when clicking into the field, Adobe Reader creates a new appearance based on the value for editing.
Thus, the OP also has to mark the new normal appearance stream (the form field dictionary contains an entry AP referencing a dictionary in which N references the normal appearance stream). Alternatively (if finding the changed or added entries becomes too cumbersome) he might try the other option.
I have a program that outputs to PDF, however, I want it to be able to read from it.
I have come up with my own data type which my program is able to read, but I need it somehow included in PDF file (no multiple files, I want one file per single output).
I also need this data to be invisible and undetectable for the user.
I heard something about PDF dictionaries, but I'm not sure how to do it (or if there's another way). I do not want to use an XMP/XML file; my data is more complex than key-value.
What would be nice is somebody writing me a couple of example lines of code that would enable me to:
add a new dictionary to a PDF using iText
populate it with data using iText
locate it in a file using iText
read from it using iText
You want to do something similar to what Adobe Illustrator is doing. If you create a PDF from Adobe Illustrator, you can encapsulate the original AI file. This gives you the impression the PDF can be edited. In reality, Adobe Illustrator takes the AI file and uses that to edit, and re-creates the PDF from the updated AI.
Where is this information stored? See ISO-32000-1 section 14.5:
Conforming products may use this dictionary as a place to store
private data in connection with that document, page, or form. Such
private data can convey information meaningful to the conforming
product that produces it (such as information on object grouping for a
graphics editor or the layer information used by Adobe Photoshop®) but
may be ignored by general-purpose conforming readers.
I'm not sure what is being asked here. If you're asking for advice, then as described above you could, for instance, add a PieceInfo entry to the Root dictionary (aka Catalog). This is all documented, isn't it? Read the ISO specification, and read part 4 of "iText in Action".
If your question is "write some code for me that does what I need", then I believe that's more or less in violation of the goal of this site.
Well, you could hex-encode your data as a String and then draw it off screen like this:
cb.showTextAligned(PdfContentByte.ALIGN_LEFT,"HIDDENDATA_"+ hexencodeddata, 2000f,2000f, 0f);
To read it back, process all extracted strings, searching for HIDDENDATA_.
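The encoding and the search could look roughly like this in plain Java. It is a minimal sketch: the class and method names are made up, and getting the text out of the PDF (with whatever extraction tool you use) is assumed to have happened already.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the marker-plus-hex approach; not part of iText.
class HiddenDataSketch {

    static final String MARKER = "HIDDENDATA_";

    // The marker followed by one or more pairs of lowercase hex digits.
    private static final Pattern PAYLOAD =
            Pattern.compile(MARKER + "((?:[0-9a-f]{2})+)");

    /** Returns the marker followed by the data as lowercase hex digits. */
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder(MARKER);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Finds the first marked payload in the extracted text, or returns null if there is none. */
    static byte[] decodeFirst(String extractedText) {
        Matcher m = PAYLOAD.matcher(extractedText);
        if (!m.find()) {
            return null;
        }
        String hex = m.group(1);
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String drawn = encode("my data".getBytes(StandardCharsets.UTF_8));
        byte[] back = decodeFirst("visible page text " + drawn + " more text");
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

Keep in mind that text drawn off screen is still found by any text extraction tool, so this hides the data from a casual viewer only.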
Another way is to use Annotations
public void addAnnotation(PdfWriter writer,
        Document document, Rectangle rect, String text) {
    PdfAnnotation annotation = new PdfAnnotation(writer,
        new Rectangle(
            rect.getRight() + 10, rect.getBottom(),
            rect.getRight() + 30, rect.getTop()));
    annotation.setTitle("Text annotation");
    annotation.put(PdfName.SUBTYPE, PdfName.TEXT);
    annotation.put(PdfName.OPEN, PdfBoolean.PDFFALSE);
    annotation.put(PdfName.NAME, new PdfName(text));
    writer.addAnnotation(annotation);
}
And then use something like this to read it:
http://downloads.snowtide.com/javadoc/PDFTextStream/2.3.2/com/snowtide/pdf/PDFTextStream.html
I have created a program that should one day become a PDF editor.
Its purpose will be saving the GUI's textual content to a PDF and loading it back from it. The GUI resembles a text editor, but it only has certain fields (JTextAreas, actually).
It can look like this (this is only one page, it can have many more; also, the upper and lower margins are cut out of the picture). It should actually resemble A4 in pixel size.
I have looked around a bit for PDF libraries and found out that iText could suit my PDF creation needs. However, if I understood it correctly, it retrieves the text of a whole page as one string, which won't work for me, because I will need to detect different fields/paragraphs/or something to be able to load them back into the program.
Now, I'm a bit lazy, but I don't want to spend hours going through numerous PDF libraries just to find out that they won't work for me.
Instead, I'm asking someone with a bit more Java PDF handling experience to recommend one according to my needs.
Or maybe recommend how to add invisible parts to the PDF which will help my program determine where exactly each piece of text is situated inside the PDF file...
Just to be clear (I phrased my question wrongly before), the only thing I need to put in my PDF is text, and that's all I need to get out later. My program should be able to read the PDFs it created itself...
Also, because of the designated use of files created with this program, they need to be in the PDF format.
Short Answer: Use an intermediate format like JSON or XML.
Long Answer: You're using PDFs in a manner that they weren't designed for. PDFs were not designed to store data; they were designed to present and format data in a portable form. Furthermore, a PDF is a very "heavy" way to store data. I suggest storing your data in another manner, perhaps in a format like JSON or XML.
The advantage now is that you are not tied to a specific output-format like PDF. This can come in handy later on if you decide that you want to export your data into another format (like a Word document, or an image) because you now have a common representation.
I found this link and another link that provides examples that show you how to store and read back metadata in your PDF. This might be what you're looking for, but again, I don't recommend it.
If you really insist on using PDF to store data, I suggest that you store the actual data in either XML or RDF and then attach that to the PDF file when you generate it. Then you can read the XML back for the data.
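As a rough idea of the XML side, here is a sketch in plain Java (JDK XML APIs only) that writes the field contents to XML and reads them back. The element names "fields" and "field" and the "id" attribute are made up for illustration; attaching the resulting XML to the PDF is left to whichever PDF library you pick.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the XML layout is an assumption for this example.
class FieldXmlSketch {

    /** Serializes field id -> text pairs; the DOM API takes care of escaping. */
    static String toXml(Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("fields");
            doc.appendChild(root);
            for (Map.Entry<String, String> e : fields.entrySet()) {
                Element field = doc.createElement("field");
                field.setAttribute("id", e.getKey());
                field.setTextContent(e.getValue());
                root.appendChild(field);
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("could not write field XML", ex);
        }
    }

    /** Parses the XML produced by toXml back into field id -> text pairs. */
    static Map<String, String> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("field");
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                fields.put(field.getAttribute("id"), field.getTextContent());
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalStateException("could not read field XML", ex);
        }
    }
}
```

Each JTextArea would get its own id, so the program can put every piece of text back into the field it came from.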
Assuming that your application will only consume PDF files generated by the same application, there is one part of the PDF specification called Marked Content, that was introduced precisely for this purpose. Using Marked Content you can specify the structure of the text in your document (chapter, paragraph, etc).
Read Chapter 14 - Document Interchange of the PDF Reference Document for more details.
I am attempting to figure out how to upload a dynamically generated PDF file to SugarCRM using Java.
At first I thought I would simply need to create a Document object and fill in some field expecting a Base64-encoded byte[] as a String. However, from what I've read online, what I'm looking for is not a Document but a Note with an attachment. That seems like a poor naming convention; am I correct in assuming I need to be creating Notes?
In this question I am simply asking for clarification on what a Document is, and if it is suited for containing a PDF document or if the only method for uploading a file like a PDF is through Notes and their attachments.
(I will ask a followup question elsewhere depending on this question's answer.)
Thank you.
You should be able to use both Document and Note, but in SugarCRM they are used for two different things.
Documents: Used for common documents that you want to share with the users and customers (e.g. Letter templates, Product briefs, Terms, etc)
Notes: Used for documents (or files actually) related to a specific Account, Contact, or similar, which most commonly is for the users only (e.g. Business cases, Emails, Contracts, etc)
You can upload both documents and notes to SugarCRM through the SOAP API.
Documents: Use set_document_revision
Notes: First use set_entry with a Note, and afterwards set_note_attachment to upload and relate the file
(Disclaimer: I haven't used the "upload documents" feature in SugarCRM before, but according to the WSDL, it should be possible)
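For the Base64-encoded String the question mentions, the plain JDK is enough. The following is only a sketch of the encoding step, with made-up class and method names; how the resulting String is passed to set_note_attachment or set_document_revision depends on the client classes generated from your WSDL and is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper covering only the encoding of the file content.
class AttachmentEncodingSketch {

    /** Encodes the raw PDF bytes as a Base64 String without line breaks. */
    static String encode(byte[] pdfBytes) {
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    /** Reverses encode, e.g. to verify what would be sent. */
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Stand-in for the dynamically generated PDF: just its header bytes.
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(pdf));
    }
}
```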