With iText I can use Java to open a PDF and write it. If the PDF has an owner password I can still open it, but it cannot be written.
Clearly the content is readable, so it seems that at that point you could simply write the document to a new file. iText doesn't allow this; it throws a BadPasswordException. Is there a way around this?
By removing the throw of BadPasswordException I was able to successfully save a pdf that had an owner password.
It sounds like the PDF is likely encrypted and has an owner password set but no user password set. If that's the case then iText is doing the right thing, since the owner password must be supplied in order to decrypt the file before you write the document to a new file (by contrast supplying just the user password, in this case nothing, will allow you to view the PDF and sometimes perform other operations on it like printing and copy/paste).
Most, if not all, reputable toolkits are going to respect the encryption. However, there are some less scrupulous tools out there that allow passwords to be "broken off." This is generally best avoided, but such tools do exist.
The other option, assuming that the document's permissions have been set so that the user password allows printing, would be to print the PDF to a new PDF, either using a printer-driver-based conversion SDK (if you get a lot of these files) or by simply printing manually (if you only get them once in a blue moon). Printing a PDF to another PDF is a somewhat nasty process because you then have to take care to manage instances of Acrobat, but it can be done in a limited fashion if absolutely necessary.
I am able to extract text from PDFs that don't have any security restrictions. I just want to know whether it is possible to extract text from a PDF that has restrictions.
UPDATE:
Thanks to all for your comments. I appreciate your concern. Please understand the question. I did not ask how to do it. I just want to know if it is possible. I have created a PDF with these restrictions. I do not want my information to be extracted from my document. There are many developers who can achieve any task. I want to know if this task can be done. If this can be done, then I will investigate further to overcome this issue.
As the OP clarified that he asked the question to know whether his documents with such restrictions are safe from text extraction, and that he does not ask how to do it (in spite of the explicit languages and libraries given in the tags), here is an answer in principle, not a concrete implementation. Thus...
Yes, it is possible to extract text from documents with restrictions as long as the document can be read at all and no other means are applied to prevent text extraction.
The restrictions you show are merely flags that tell a PDF processor what the author wants to allow or disallow a user to do with the document; they are not technical restrictions.
These restrictions can only be applied to encrypted documents, but you surely want these restrictions to work in particular for anyone (other than you) who can open the document for reading, be it by knowing a specific user password or be it by using the empty password.
Cf. the specification ISO 32000 (here from part 2, similarly in part 1 with a focus on PDF viewers):
If a user attempts to open an encrypted document that has a user password, the PDF reader shall first try to authenticate the encrypted document using the padding string defined in 7.6.4.3, "File encryption key algorithm" (default user password):
If this authentication attempt is successful, the PDF reader may open, decrypt, render and otherwise provide access to the document.
If this authentication attempt fails, the interactive PDF processor should prompt for a password. Correctly supplying either password (owner or user password) should enable the user to gain access to the document.
Whether additional operations shall be allowed on a decrypted document depends on which password (if any) was supplied when the document was opened and on any access restrictions that were specified when the document was created:
Opening the document with the correct owner password should allow full (owner) access to the document. This unlimited access includes the ability to change the document’s passwords and access permissions.
Opening the document with the correct user password (or opening a document with the default password) should allow additional operations to be performed according to the user access permissions specified in the document’s encryption dictionary.
Access permissions shall be specified in the form of flags corresponding to the various operations and the set of operations to which they correspond shall depend on the security handler’s revision number (also stored in the encryption dictionary).
...
Once the document has been opened and decrypted successfully, a PDF reader technically has access to the entire contents of the document. There is nothing inherent in PDF encryption that enforces the document permissions specified in the encryption dictionary. PDF readers shall respect the intent of the document creator by restricting user access to an encrypted PDF file according to the permissions contained in the file.
(ISO 32000-2 section 7.6.4 Standard Security Handler)
Thus, these restrictions only work in cooperating PDF processors, but in particular in case of open source PDF libraries, it is trivial for a programmer to remove any code trying to enforce the restrictions.
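To make the "merely flags" point concrete, here is a minimal sketch that decodes a permissions integer (the /P entry of the encryption dictionary) using the bit positions from the ISO 32000 user access permissions table. The class and method names are made up for this illustration and belong to no PDF library; the sample value assumes a file that allows printing, copying and screen readers.

```java
final class PermissionFlags {
    // Bit positions are 1-based in the spec, with the low-order bit as bit 1.
    static final int PRINT         = 1 << 2;   // bit 3
    static final int MODIFY        = 1 << 3;   // bit 4
    static final int COPY          = 1 << 4;   // bit 5
    static final int ANNOTATE      = 1 << 5;   // bit 6
    static final int FILL_FORMS    = 1 << 8;   // bit 9
    static final int ACCESSIBILITY = 1 << 9;   // bit 10
    static final int ASSEMBLE      = 1 << 10;  // bit 11
    static final int PRINT_HQ      = 1 << 11;  // bit 12

    // A cooperating processor asks this question before acting.
    // A non-cooperating one simply never asks.
    static boolean allows(int p, int flag) {
        return (p & flag) != 0;
    }

    public static void main(String[] args) {
        int p = -1324; // hypothetical /P value: print + copy + accessibility
        System.out.println("print:  " + allows(p, PRINT));
        System.out.println("modify: " + allows(p, MODIFY));
        System.out.println("copy:   " + allows(p, COPY));
    }
}
```

Nothing in the decrypted content depends on the answer to allows(...); the check is purely a decision the software makes after it already has the data.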
Being aware of this, the developers of open source PDF libraries usually don't try to enforce the restrictions at all, or they add some flag to override restriction enforcement to prevent patched copies of the library from circulating.
I'm wondering if it is possible, using iText (that I used for signing) or other tools in Java, to add biometric data on a pdf.
I'll explain better: while signing on a signature tablet, I collect signature information like pen pressure, signing speed and so on. I'd like to store this information (variables in Java) together with the signature in the PDF, obviously hidden and encrypted like the signature info.
Is there some kind of hidden data field on a pdf or something that can contain this kind of information? I think it is inappropriate to store it in the metadata fields such as author etc.
There are different ways to add info to a PDF document.
You could add the data in a document-level attachment. That way, people can inspect the data by opening the attachment panel.
Storing it as metadata is fine too, but you're right about it being inappropriate to store that info in something like the author key.
As you may know, the /Info dictionary will be deprecated in PDF 2.0 in favor of using an XMP metadata stream. In this metadata stream, you can add custom XML data (see section 2.2.1 of the XMP specification - Part 3).
If you don't want to mix your biometric data with the document metadata, you can even define an XMP stream for any dictionary you want, probably including the signature dictionary. See section 14.3.2 of ISO-32000-1.
PS 1: I don't know who downvoted your question. I upvoted it, so you're back at 0.
PS 2: If you want to create future proof signatures, read http://itextpdf.com/book/digitalsignatures
PS 3: Signatures created with the 4-year-old version of iText usually aren't future-proof.
We are encrypting our PDF with the following iText code. However, someone was able to edit our pdf (I am not sure how).
    pdfWriter.setEncryption(null, null,
            PdfWriter.ALLOW_SCREENREADERS | PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING,
            PdfWriter.ENCRYPTION_AES_128);
Is there a better way for us to secure the pdf to prevent this?
PDF Encryption and restriction of information relies purely on the goodwill of the authors of the viewer software to enforce that restriction.
Generally speaking, every application that has enough information to display the PDF also has enough information to print it; there is really nothing you can do about that.
Since there are plenty of open-source PDF viewers out there, it's very easy to produce a viewer that simply ignores those restrictions.
See this explanation of the PDF encryption mechanism for more detail.
If your PDF is encrypted using 128-bit AES, then it is safe from anyone who does not know the key. The most plausible explanation is that someone has had access to the key.
You may think about signing the PDF using RSA, that is a good way to make sure it has not been compromised.
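As a rough illustration of why a signature helps, here is a sketch using only the JDK's java.security classes. Note the assumptions: this is a detached signature over raw bytes, not a signature embedded in the PDF the way a PDF library would do it, and the class and method names are invented for the example.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class DetachedSignatureSketch {
    // Generates a throwaway 2048-bit RSA key pair for the demonstration.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signs the document bytes with the private key.
    static byte[] sign(byte[] data, PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(data);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true only if the bytes are exactly the ones that were signed.
    static boolean verify(byte[] data, byte[] signature, PublicKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(data);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A signature does not stop anyone from editing the file; it only lets a recipient detect that the bytes changed after signing.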
Encryption which prevents the viewing of a pdf works if the password is long enough.
The DRM features which allow viewing but disable other features such as printing, editing,... only work if the reader co-operates. The user can use a hacked or third party reader to circumvent such restrictions.
Add a user password. It's the only one that really matters. As you have no doubt gathered from the other answers, the owner password is a bit of a joke.
The USER password is strong crypto... up to 256-bit AES IIRC, though the original PDF crypto spec only allowed for 40-bit encryption due to US export restrictions. Anything stronger than 40-bit was considered a "munition". Goofy laws.
The OWNER password is not, it's more courtesy than anything else. PDF libraries try to support it to one degree or another, but open source PDF libraries are a quick code change away from being "pdf crackers".
A blank user password means "use the predefined string of bytes listed in the PDF Specification that anyone can download". The contents of the PDF are still encrypted, but everyone knows the password, so it doesn't do you much good. PDF viewers/libraries substitute this string of bytes when given no password.
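For the curious, here is a small sketch of that substitution as I understand it for the older security handler revisions (2 to 4, which covers RC4 and AES-128): the password is truncated or padded to exactly 32 bytes with the published padding string. The class and method names are mine, not from any library, and this is only the padding step, not the full key derivation.

```java
final class PasswordPadding {
    // The 32-byte padding string published in the PDF specification.
    static final byte[] PAD = hex(
            "28BF4E5E4E758A4164004E56FFFA0108"
          + "2E2E00B6D0683E802F0CA9FE6453697A");

    // Pads or truncates a password to exactly 32 bytes.
    // A null or empty password yields the padding string itself.
    static byte[] pad(byte[] password) {
        byte[] out = new byte[32];
        int n = (password == null) ? 0 : Math.min(password.length, 32);
        if (n > 0) {
            System.arraycopy(password, 0, out, 0, n);
        }
        System.arraycopy(PAD, 0, out, n, 32 - n);
        return out;
    }

    // Converts a hex string into bytes (helper for the constant above).
    private static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Since pad(null) is the same public constant for everyone, a file encrypted with a blank user password can be decrypted by any software that implements the specification.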
PS:
When calling setEncryption:
a null open password means "a blank password" as I described above
a null owner password means "generate a random one for me".
A random owner password means "no one can legitimately modify the PDF", but that does not mean "no one can modify the PDF".
I am currently writing a program which takes user input and creates rows of a comma delimited .csv file. I am in need of a way to save this data in a way in which users are not able to easily edit this data. It does not need to be super secure, just enough so that it couldn't accidentally be edited. I also need another file (or the same file?) created to then be easily accessible (in the file system) by the user so that they may then email this file to a system admin who can then open the .csv file. I could provide this second person with a conversion program if necessary.
The file I save data in and the file to be sent can be two different files if there are any advantages to this. I was currently considering just using a file with a weird file extension, but saving it as a text file so that the user will only be able to open it if they know to try that. The other option being some sort of encryption, but I'm not sure if this is necessary and even if it was where I would start.
Thanks for the help :)
Edit: This file is meant to store the actual data being entered. Currently the data is being gathered on paper forms which are then sent to the admin to manually enter all of the data. This little app is meant to have someone else enter the data from the paper form and then tell them if they've entered it all correctly. After they've entered it all they then need to send the data to the admin. It would be preferable if the sending was handled automatically, but this app needs to be very simple and low budget and I don't want an internet connection to be a requirement.
You could store your data in a serializable object and save that. It would resist casual editing and be very simple to read and write from your app. This page should get you started: http://java.sun.com/developer/technicalArticles/Programming/serialization/
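A minimal sketch of that idea, assuming each row is kept as a String[] of field values; the class name FormStore and its methods are just names I picked for the example.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class FormStore {
    // Writes all rows as one serialized object; the result is binary,
    // so it does not open as readable text in an editor.
    static void save(Path file, List<String[]> rows) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(rows));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the rows back. Only use this on files your own app produced.
    @SuppressWarnings("unchecked")
    static List<String[]> load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<String[]>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The admin's conversion program would call load(...) and write the rows out as CSV. Keep in mind that Java deserialization should only be applied to input you trust.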
From your question, I am guessing that the uneditable file's purpose is to store some kind of system config and you don't want it to get messed up easily. From your own suggestions, it seems that even knowing that the file has been edited would help you, since you can then avoid using it. If that is the case, then you can use simple checks, such as save the total number of characters in the line as the first or last comma delimited value. Then, before you use the file, you just run a small validation code on it to verify that the file is indeed unaltered.
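Something along these lines, as a sketch: the character count of the line is appended as the last comma-delimited value and checked again before use. LineCheck, seal and isIntact are names made up for this example.

```java
final class LineCheck {
    // Appends the length of the line as a final comma-delimited value.
    static String seal(String csvLine) {
        return csvLine + "," + csvLine.length();
    }

    // Checks that the stored length still matches the text before it.
    static boolean isIntact(String sealedLine) {
        int lastComma = sealedLine.lastIndexOf(',');
        if (lastComma < 0) {
            return false;
        }
        try {
            int stored = Integer.parseInt(sealedLine.substring(lastComma + 1));
            return stored == lastComma; // lastComma equals the original line length
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sealed = seal("Smith,John,42");
        System.out.println(sealed + " intact? " + isIntact(sealed));
    }
}
```

A length check misses edits that keep the length the same; a checksum such as java.util.zip.CRC32 over the line would catch more, at the cost of a less readable value.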
Another approach may just be to use a ZIP (file) of a "plain text format" (CSV, XML, other serialization method, etc) and, optionally, utilize a well-known (to you) password.
This approach could be used with other stream/package types: the idea behind using a ZIP (as opposed to an object serializer directly) is so that one can open/inspect/modify said data/file(s) easily without special program support. This may or may not be a benefit and using a password may or may not even be required, see below.
Some advantages of using a ZIP (or CAB):
The ability for multiple resources (aids in extensibility)
The ability to save the actual data in a "text format" (XML, perhaps)
Maintain competitive file-sizes for "common data"
Re-use existing tooling support (also get checksum validation for free!)
Additionally, using a non-ZIP file extension will prevent most users from casually associating the file with the ZIP format and being able to open it (a similar approach to what is presented in the original post, but subtly different because the ZIP format itself is not "plain text"). A number of modern Microsoft formats utilize the fact that the file extension plays an important role and use CAB (and sometimes ZIP) as the container format for the document. That is, an ".XSN", ".WSP" or ".gadget" file can be opened with a tool like 7-zip, but generally only developers who are "in the know" do so. Also, just consider ".WAR" and ".JAR" files as other examples of this approach, since this is Java we're in.
Traditional ZIP passwords are not secure, and a static password embedded in the program is even less so. However, if this is just a deterrent (i.e. not for "security") then those issues are not important. Coupled with an "un-associated" file type/extension, I believe this offers the protection asked for in the question while remaining flexible. It may be possible to drop the password entirely and still prevent "accidental modifications" just by using a ZIP (or other) container format, depending upon requirements/desires.
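Here is a sketch of the password-less variant using the JDK's java.util.zip, which as far as I know has no support for ZIP passwords anyway. The class name, the entry name and the custom extension are all placeholders for illustration, and readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipContainer {
    // Writes the text as a single entry inside a ZIP file.
    // The file itself can carry any extension, e.g. "data.formdata".
    static void write(Path file, String entryName, String text) {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(file))) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one entry back as text.
    static String read(Path file, String entryName) {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The admin can rename the file to .zip and open it with any archive tool, so no conversion program is strictly needed.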
Happy coding.
Can you set file permissions to make it read-only?
Other than doing a binary output file, the file system that Windows runs (I know for sure it works from XP through x64 Windows 7) has a little trick that you can use to hide data from anyone simply perusing through your files:
Append your output and input files with a colon and then an arbitrary value, eg if your filename is "data.csv", make it instead "data.csv:42". Any existing or non-existing file can be appended to to access a whole hidden area (and every file for every value after the colon is distinct, so "data.csv:42" != "data.csv:carrots" != "second.csv:carrots").
If this file doesn't exist, it will be created and initialized to have 0 bytes of data with it. If you open up the file in Notepad you will indeed see that it holds exactly the data it held before writing to the :42 file, no more, no less, but in reality subsequent data read from this "data.csv:42" file will persist. This makes it a perfect place to hide data from any annoying user!
Caveats: If you delete "data.csv", all associated hidden data will be deleted too. Also, there are indeed programs that will find these files, but if your user goes through all that trouble to manually edit some csv file, I say let them.
I also have no idea if this will work on other platforms, I've never thought to try it.
I have built a web application that can be seen as an overcomplicated application form. There are bunch of text areas with a given character limit. After the form submission various things happen and one of them is PDF generation.
The text is queried from the DB and inserted in the PDF template created in iReports. This works fine but the major pain is overflowing text.
The maximum number of characters is set based on 'average' text. But sometimes people prefer to write with CAPS or add plenty of linefeeds to format their text. These then cause user's text to overflow the space given in PDF. Unfortunately the PDF document must look like a real application form so I cannot allow unlimited space.
What kind of approaches have you used to tackle this?
Clean/restrict user input?
Calculate the space requirement of the text based on font metrics?
Provide preview of the PDF? (too bad users are not allowed to change their input after submission...)
Ideally, calculate the requirement based on metrics. I don't know how iReports handles text, but iText lays everything out itself: you just present the data as a streaming document, so we don't worry about overflowing text.
However, iReport may not support that, or you may need to have the PDF layout fit within certain bounds. I'd try to clean the input (ie: if it's all caps, lowercase/sentence case/proper case it), strip extra whitespace. If cleaning the input can't be reliably done, or people are still getting past that, I'd also restrict it.
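A rough sketch of that kind of cleaning, under my own assumptions about what "clean" should mean here: runs of spaces are collapsed, at most one blank line is kept, and text that is entirely upper case is converted to a simple sentence case. InputCleaner is a made-up name, and the sentence-case rule is deliberately naive (it will lowercase names and acronyms).

```java
import java.util.Locale;

final class InputCleaner {
    static String clean(String raw) {
        String text = raw.replace("\r\n", "\n").replace('\r', '\n');
        text = text.replaceAll("[ \\t]+", " ");     // collapse runs of spaces and tabs
        text = text.replaceAll(" ?\\n ?", "\n");    // drop spaces around line breaks
        text = text.replaceAll("\\n{3,}", "\n\n");  // keep at most one blank line
        text = text.trim();
        return isAllCaps(text) ? sentenceCase(text) : text;
    }

    // True if the text contains letters and none of them is lower case.
    static boolean isAllCaps(String s) {
        return s.equals(s.toUpperCase(Locale.ROOT)) && !s.equals(s.toLowerCase(Locale.ROOT));
    }

    // Lowercases everything, then capitalizes the first letter of each sentence.
    static String sentenceCase(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean startOfSentence = true;
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (startOfSentence && Character.isLetter(c)) {
                out.append(Character.toUpperCase(c));
                startOfSentence = false;
            } else {
                out.append(c);
                if (c == '.' || c == '!' || c == '?') {
                    startOfSentence = true;
                } else if (!Character.isWhitespace(c)) {
                    startOfSentence = false;
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("HELLO   WORLD.  THIS IS  LOUD!\n\n\n\nBYE"));
    }
}
```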
As a last resort, I'd present the PDF for the user to authorize. Really, users shouldn't be given more work to do, and they're not going to do it anyways.
Your own suggested solutions to your problem are all good. Probably the most important question to have answered is what should your PDF look like when the data to be displayed in a field won't fit? Do you ever need the "full answer" for anything else? When you know the answer to these, you'll have your options reduced.
For example if a field must be limited to 1/2 a page, and users sometimes enter more than 1/2 a page of text you can either
1) limit the user input - on submission calculate the size (using font-metrics as you said) and reject the submission until corrected. This assumes you can legitimately force the user to reduce their data entry.
2) accept the user input and truncate in the display of this report. Some systems use "..." to indicate data has been truncated, and can provide a hyperlink (even within the PDF) to get more information.
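For option 2, the truncation itself can be as simple as the sketch below. It assumes the limit is expressed in characters, which is only a stand-in for the real space limit on the form; the class name is hypothetical.

```java
final class Truncate {
    // Cuts the text to at most maxChars characters, ending in "..." when shortened.
    static String withEllipsis(String text, int maxChars) {
        if (maxChars < 3) {
            throw new IllegalArgumentException("maxChars must be at least 3");
        }
        if (text.length() <= maxChars) {
            return text;
        }
        return text.substring(0, maxChars - 3) + "...";
    }
}
```

The full answer would still be stored in the database, so the truncated display does not lose data.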
Providing a preview would work really well, but only if the users are good at checking and correcting and your system can handle the extra load this will generate.
Do you have control of the font that is used when generating the PDF? If so, I would look for a font in the Monospace family. This will give you consistent length for a given number of chars, regardless of punctuation, capitalization, etc.
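With a monospaced font the space check reduces to counting characters and line breaks. A minimal sketch, assuming the field holds a fixed number of columns and lines (both values are yours to measure from the template) and that long lines are hard-wrapped by character count; MonospaceFit is a name invented for the example.

```java
final class MonospaceFit {
    // Number of printed lines the text needs in a box `columns` characters wide.
    // Every explicit line break starts a new line; long lines wrap by character count.
    static int linesNeeded(String text, int columns) {
        if (columns <= 0) {
            throw new IllegalArgumentException("columns must be positive");
        }
        int lines = 0;
        for (String line : text.split("\\R", -1)) {
            lines += Math.max(1, (line.length() + columns - 1) / columns);
        }
        return lines;
    }

    // True if the text fits into a box of the given size.
    static boolean fits(String text, int columns, int maxLines) {
        return linesNeeded(text, columns) <= maxLines;
    }

    public static void main(String[] args) {
        String answer = "FIRST LINE\n\nSECOND PARAGRAPH THAT IS RATHER LONG";
        System.out.println(linesNeeded(answer, 20) + " lines needed");
    }
}
```

Real renderers usually wrap at word boundaries, which can need more lines than this estimate, so leave some margin or reject input that is close to the limit.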