How to distinguish between two encrypted / secured PDF files - java

I have two secured PDF files. One has a password and the other one is secured but without a password. I am using PDFBox.
How can I identify which file has a password and which one is secured but without a password?

PDFs support two kinds of password:
Owner password - set by the PDF's owner / creator to restrict its usage (e.g. editing, printing, copying)
User password - required to open / view the PDF
A PDF can have only an owner password, or both, but not only a user password. In either case the PDF is considered encrypted, and there is no direct API to distinguish between the two kinds of encryption.
With PDFBox you can use the code snippet below to determine whether the file is encrypted, and whether it has only an owner password or both.
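import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;

PDDocument pdfDoc = PDDocument.load(new File("path/to/pdf"));
boolean hasOwnerPwd = false;
boolean hasUserPwd = false;
if (pdfDoc.isEncrypted()) {
    hasOwnerPwd = true;
    try {
        // Try to open the document without supplying a password
        StandardDecryptionMaterial sdm = new StandardDecryptionMaterial(null);
        pdfDoc.openProtection(sdm);
        // It opened without a password, so no user password is set
    } catch (Exception e) {
        // Opening without a password failed, so a user password is required
        // (note that this broad catch also swallows unrelated I/O errors)
        hasUserPwd = true;
    }
}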
See PDFBox API docs here and here.
EDIT: Thanks to Tilman for pointing out the latest code and an alternate way to distinguish between the two kinds of encryption. I have updated the code snippet and the post accordingly.
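The decision itself does not depend on the PDFBox version. Here is a minimal sketch in plain Java, assuming you can obtain two facts from whatever library you use: whether the document is encrypted (for example via isEncrypted()) and whether it opens with an empty password. The class, enum and method names are hypothetical and not part of PDFBox, and the sketch only covers password-based (standard) encryption, not certificate-based encryption.

```java
/** Hypothetical helper for illustration; not part of PDFBox. */
final class PdfProtectionClassifier {

    enum Protection { NONE, OWNER_PASSWORD_ONLY, USER_PASSWORD_REQUIRED }

    /**
     * @param encrypted              what the library reports, e.g. isEncrypted()
     * @param opensWithEmptyPassword whether decrypting with "" succeeded
     */
    static Protection classify(boolean encrypted, boolean opensWithEmptyPassword) {
        if (!encrypted) {
            return Protection.NONE;
        }
        // Encrypted but readable with the empty password: only usage is restricted
        return opensWithEmptyPassword
                ? Protection.OWNER_PASSWORD_ONLY
                : Protection.USER_PASSWORD_REQUIRED;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true));   // prints OWNER_PASSWORD_ONLY
    }
}
```

Keeping the decision separate from the library calls makes it easy to reuse the same logic when the way of opening the document changes between library versions.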

Related

Removing PublicKeyProtectionPolicy in pdfbox

I developed a PDF encoder which normally removes the password, enables editing, and so on.
But now there is a file which is protected by a certificate: it's forbidden to change anything, yet the security type is shown as no security.
I tried this code:
PDDocument doc = PDDocument.load(input);
AccessPermission perms = new AccessPermission();
perms.setCanAssembleDocument(true);
perms.setCanExtractContent(true);
perms.setCanModify(true);
perms.setCanModifyAnnotations(true);
perms.setCanExtractForAccessibility(true);
perms.setCanFillInForm(true);
perms.setCanPrint(true);
perms.setCanPrintDegraded(true);
perms.setCanExtractForAccessibility(true);
StandardProtectionPolicy policy = new StandardProtectionPolicy("secret", "", perms);
doc.protect(policy);
doc.setAllSecurityToBeRemoved(true);
doc.save(output);
But it only works on password-protected files.
Does anyone know the mistake / solution?
My guess is the file is encrypted using certificates and not using standard password method. In this situation unless you have the corresponding certificate for decryption there is nothing you can do.

PDFBox extracting blanks from PDF encrypted with no password

I'm using PDFBox to extract text from forms and I have a PDF that is not encrypted with a password but PDFBox says is encrypted. I suspect some sort of Adobe "feature" since when I open it it says (SECURED), while other PDFs that I don't have issues with do not. isEncrypted() returns true so despite not having a password it appears to be secured somehow.
I suspect that it is not decrypting properly, as it is able to pull the form's text prompts but not the responses themselves. In the code below it pulls Address (Street Name and Number) and City from the sample PDF, but not the response in between them.
I am using PDFBox 2.0, but I have also tried 1.8.
I've tried every method of decrypting that I can find for PDFBox, including the deprecated ones (why not). I get the same result as not trying to decrypt at all, just the Address and City prompts.
With PDFs being the absolute nightmare that they are, this PDF was likely created in some non-standard way. Any help in identifying this and getting moving again is appreciated.
Sample PDF
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDPage;
import java.awt.Rectangle;
import java.util.List;

class Scratch {
    private static float pwidth;
    private static float pheight;

    private static int widthByPercent(double percent) {
        return (int)Math.round(percent * pwidth);
    }

    private static int heightByPercent(double percent) {
        return (int)Math.round(percent * pheight);
    }

    public static void main(String[] args) {
        try {
            //Create objects
            File inputStream = new File("ocr/TestDataFiles/i-9_08-07-09.pdf");
            PDDocument document = PDDocument.load(inputStream);
            // Try every decryption method I've found
            if(document.isEncrypted()) {
                // Method 1
                document.decrypt("");
                // Method 2
                document.openProtection(new StandardDecryptionMaterial(""));
                // Method 3
                document.setAllSecurityToBeRemoved(true);
                System.out.println("Removed encryption");
            }
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            //Get the page with data on it
            PDPageTree allPages = document.getDocumentCatalog().getPages();
            PDPage page = allPages.get(3);
            pheight = page.getMediaBox().getHeight();
            pwidth = page.getMediaBox().getWidth();
            Rectangle LastName = new Rectangle(widthByPercent(0.02), heightByPercent(0.195), widthByPercent(0.27), heightByPercent(0.1));
            stripper.addRegion("LastName", LastName);
            stripper.setSortByPosition(true);
            stripper.extractRegions(page);
            List<String> regions = stripper.getRegions();
            System.out.println(stripper.getTextForRegion("LastName"));
        } catch (Exception e){
            System.out.println(e.getMessage());
        }
    }
}
Bruno's comment explains why the PDF is encrypted even though you do not need to enter a password:
A PDF can be encrypted with two passwords: a user password and an owner password. When a PDF is encrypted with a user password, you can't open the document in a PDF viewer without entering that password. When a PDF is encrypted with an owner password only, everyone can open a PDF without that password, but some restrictions may be in place. You can recognize PDFs encrypted with an owner password because they mention "SECURED" in Adobe Reader.
Your PDF is encrypted using only an owner password, i.e. the user password is empty. Thus, you can decrypt it using the empty password "" like this in your PDFBox version:
document.decrypt("");
(This "method 1", by the way, is exactly the same as your "method 2"
document.openProtection(new StandardDecryptionMaterial(""));
plus some exception wrapping.)
Tilman's comment implies the reason why you don't retrieve the form values: Your code uses the PDFTextStripperByArea to do text extraction, but this text extraction only extracts the fixed page content, not the content of the annotations floating on that page.
The content you want to extract is the content of form fields whose widgets are annotations.
Tilman's proposal
doc.getDocumentCatalog().getAcroForm().getField("form1[0].#subform[3].address[0]").getValueAsString()
shows how to extract the value of a form field you know the name of, "form1[0].#subform[3].address[0]" in this case. If you don't know the name of the field you want to extract content from, the PDAcroForm object returned by doc.getDocumentCatalog().getAcroForm() has a number of other methods to access field contents.
By the way, a field name like "form1[0].#subform[3].address[0]" in the AcroForm definition indicates yet another specialty of your PDF: It actually contains two form definitions, the core PDF AcroForm definition and the more independent XFA definition. Both describe the same visual form. Such a PDF form is called a hybrid PDF form.
The advantage of hybrid forms is that they can be viewed and filled in using PDF tools which only know AcroForm forms (which is essentially all software except Adobe's) while PDF tools with XFA support (essentially only Adobe's software) can make use of additional XFA features.
The drawback of hybrid forms is that if they are filled in using a tool without XFA support, only the AcroForm information is updated while the XFA information remains as before. Thus, the hybrid document can contain different data for the same field...
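To see how such an XFA-style field name is built up, here is a small sketch in plain Java that splits a fully qualified AcroForm field name into its partial names and their bracketed indices. The class and method names are hypothetical, and the name[index] pattern is only the convention visible in the field name above, not something every PDF form follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class FieldNameParts {

    // Matches a partial name followed by a bracketed index, e.g. "address[0]"
    private static final Pattern INDEXED = Pattern.compile("(.+)\\[(\\d+)\\]");

    /** Splits a fully qualified field name at the periods and describes each part. */
    static List<String> split(String fullyQualifiedName) {
        List<String> parts = new ArrayList<>();
        for (String partial : fullyQualifiedName.split("\\.")) {
            Matcher m = INDEXED.matcher(partial);
            if (m.matches()) {
                parts.add("name=" + m.group(1) + ", index=" + m.group(2));
            } else {
                parts.add("name=" + partial + ", index=none");
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("form1[0].#subform[3].address[0]")) {
            System.out.println(part);
        }
    }
}
```

For the field above this yields three parts: form1 with index 0, #subform with index 3, and address with index 0.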

How to fill a secured PDF from spreadsheet data [Java]

I have looked into a few libraries for doing this, with no success. I am not the most experienced.
PDFBox - I think because it is a secured pdf the PDDocument class was unable to load the fields to fill.
Adobe FDFToolkit - I couldn't get the fields from the file because it was a PDF not an FDF. Not sure how to convert.
iText - org/bouncycastle/asn1/ASN1OctetString error while opening the PDF
I am having trouble getting any of these to work due to the nature of the file. It is a government immigration form which can be found here: https://www.uscis.gov/sites/default/files/files/form/i-589.pdf. Any ideas for working around this?
Your form is encrypted using an owner password. The permissions are set in such a way that they allow form filling, but neither iText nor PDFBox is currently fine-grained enough to check those permissions: if a PDF is encrypted, you are asked to provide a password.
However, with iText, there is a setting called unethicalreading. See How to decrypt a PDF document with the owner password? in the official documentation:
PdfReader.unethicalreading = true;
By setting this static variable to true, the PDF will be treated as if it weren't encrypted.
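The permissions mentioned above are stored as a bit field in the /P entry of the PDF's encryption dictionary. Here is a minimal sketch of decoding such a value in plain Java, with the bit positions taken from the PDF specification's table of user access permissions for the standard security handler. The class and enum names are hypothetical, and reading the /P value out of a file is left to whatever library you use.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of iText or PDFBox. */
final class PdfPermissionBits {

    /** Permission flags; bit numbers are 1-based, as in the PDF specification. */
    enum Permission {
        PRINT(3), MODIFY(4), EXTRACT(5), MODIFY_ANNOTATIONS(6),
        FILL_FORMS(9), EXTRACT_FOR_ACCESSIBILITY(10), ASSEMBLE(11), PRINT_HIGH_QUALITY(12);

        final int mask;

        Permission(int bit) {
            this.mask = 1 << (bit - 1);
        }
    }

    /** Returns the permissions whose bits are set in the given /P value. */
    static Set<Permission> decode(int p) {
        Set<Permission> granted = EnumSet.noneOf(Permission.class);
        for (Permission perm : Permission.values()) {
            if ((p & perm.mask) != 0) {
                granted.add(perm);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // Made-up /P value: reserved bits set, plus printing and form filling
        int p = 0xFFFFF0C0 | 4 | 256;
        System.out.println(decode(p));   // prints [PRINT, FILL_FORMS]
    }
}
```

A caller could then test decode(p).contains(Permission.FILL_FORMS) before deciding whether filling the form is within what the document's owner allowed.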

Is it possible to set only owner password while using setEncryption method in iText?

Is it possible to set owner password as some value and user password as null or empty while using set encryption method of PdfWriter class?
I tried using code something like this
String OWNER = "test";
PdfWriter.setEncryption(null,OWNER.getBytes(),
PdfWriter.ALLOW_PRINTING, PdfWriter.ENCRYPTION_AES_128);
I am able to open PDF generated with this code without entering any password.
BUT when I try to open it for editing with Adobe Acrobat, it opens the document in view mode and throws an error "This is secured document. Editing is not permitted."
Screenshot of error: http://dropbox.com/s/1ef551o1z0n9ug1/editerror.jpg
Any idea why this must be occurring? Am I doing something wrong?
On an additional note,
I have generated this new document with
PdfWriter.setEncryption("test1".getBytes(),"test".getBytes(),
PdfWriter.ALLOW_PRINTING, PdfWriter.ENCRYPTION_AES_128);
Link: http://dropbox.com/s/8jeia7ezervrz18/Test_Success.pdf
I am able to view it after entering password as "test1" and able to edit it with password "test". I am not sure what exactly is going wrong when I pass USER as null in earlier case.
I am using following set of jars in my project
itext-2.1.7.jar
bcmail-jdk14.jar
bcprov-jdk14.jar
private static String user = "";
private static String admin = "ADMIN";
// setEncryption takes the user password first, then the owner password
writer.setEncryption(user.getBytes(), admin.getBytes(),
        PdfWriter.ALLOW_PRINTING, PdfWriter.ENCRYPTION_AES_128);
With the above approach you can set only the admin (owner) password. There might also be some problem in your classpath settings; use Maven or Gradle to manage the dependencies.

How to programmatically open a PDF with a User password

This is related to my other question... hope this one has a solution.
The requirement is to display a password-protected PDF in the browser but to pass the user password programmatically. I create a PDF using Jasper and set the user password as follows:
exporter.setParameter(JRPdfExporterParameter.USER_PASSWORD, userPassword);
As soon as the PDF is created, it has to be displayed on the screen. While displaying it in the browser, the user should not be prompted to key in the password, and hence the password should be supplied by the application. However, if the user downloads the PDF and then tries to open it, he should be prompted to enter the password.
[Edit]: I am looking for an approach that does NOT involve licensed tools
You can open a password protected PDF using the PDF.JS library.
PDFJS.getDocument({ url: pdf_url, password: pdf_password }).then(function(pdf_doc) {
    // success
}).catch(function(error) {
    // incorrect password
    // error is an object having 3 properties : name, message & code
});
I've written a blog post on it, also containing a demo. This is the link : http://usefulangle.com/post/22/pdfjs-tutorial-2-viewing-a-password-protected-pdf
I'm not sure whether something like this is possible. In the browser the PDF is opened by a plugin, usually the Adobe Reader plugin. There are also other plugins apart from Adobe Reader; Chrome has its own.
When the browser detects a PDF file, the rendering plugin takes over, and this is browser specific. You have hardly any control.
An easy alternative is to show the same content in a web page (probably in a modal window if the content is sensitive) and give a link to download the password-protected PDF file.
my 2c
You could check out PDF.js, an open source client-side PDF renderer that also supports encrypted PDFs.
http://mozilla.github.com/pdf.js/
This means you will have to put your password somewhere in the javascript though, so you will have to disguise it, but it should do the trick :)
You can use Mozilla's pdf.js to render a password-protected PDF. The URL below will prompt for a password until the correct one is given. The password for the PDF is "test".
http://learnnewhere.unaux.com/pdfViewer/passwordviewer.html
Here is the sample code for prompting password
pdfJs.onPassword = function (updatePassword, reason) {
    if (reason === 1) { // need a password
        var new_password= prompt('Please enter a password:');
        updatePassword(new_password);
    } else { // Invalid password
        var new_password= prompt('Invalid! Please enter a password:');
        updatePassword(new_password);
    }
};
If you want to close the password prompt on unsuccessful password attempts you can remove the else part(// Invalid password).
You could get the complete code from here https://github.com/learnnewhere/simpleChatApp/tree/master/pdfViewer
