I have to create a function that tells me whether an uploaded PDF file is encrypted only, timestamped only, both, or neither.
So far I am using PDFBox 2.0.5 and can only detect the timestamped-only and neither cases.
Here is the current code:
try{
InputStream fis = new ByteArrayInputStream(file.getBytes());
PDDocument d = PDDocument.load(fis); // an encrypted file throws InvalidPasswordException here
//***file not encrypted***
List<PDSignature> a = d.getSignatureDictionaries();
List<PDSignatureField> b = d.getSignatureFields();
if(a.size()==0 && b.size()==0){
//***file not timestamped***
}else{
//***file timestamped***
}
} catch(InvalidPasswordException e){
//***file encrypted***
//***is file timestamped?***
//***is file not timestamped?***
} catch(IOException e){
}
My question is: how can I tell whether there is a timestamp (as a signature) on an encrypted PDF file?
Note: I have found several PDFBox usage examples, but few of them use the latest PDFBox version.
Update note: file is an uploaded MultipartFile object; I am having a hard time getting its path without copying or transferring it into another object type.
I am working on a project where I need to iterate through a file system, extract text from a pdf, and scan through that text. Previously, the file system was an N drive (which acts as a local file system), so using the java File API, I could access each pdf file. Using this method, I would then extract the text:
public static String returnStringOfPDFiText(File file)
{
try {
PdfReader reader = new PdfReader(file.getPath());
int n = reader.getNumberOfPages();
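// Start from an empty string; starting from null would prepend the text "null".
String pdfText = "";
for(int i = 1; i<=n; i++)
{
// Extract page i; passing n here would read the last page n times.
pdfText += PdfTextExtractor.getTextFromPage(reader, i);
}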
reader.close();
System.out.println(pdfText);
return pdfText;
}
catch(Exception e)
{
System.out.print(e);
return null;
}
}
From here, I could scan through the text.
I now need to do this, but using a dropbox file system. I can only find a way to get the metadata of each file, though, and not the actual file, so I can extract text.
Is there a way to get the file so I can call this method on the file to extract the text, or to just extract the text directly from the dropbox file?
Edit: I am already working with the Dropbox API (though I might be missing some methods; I haven't read through much of the documentation). I am aware of the download method, but I don't want to use it, since we will be working with around 1 GB of PDFs, and downloading all of that would be very inefficient.
Dropbox does offer an API you can use for listing, uploading, and downloading files, among other operations. You can find everything you need to get started with the Dropbox API, including documentation, tutorials, and SDKs here:
https://www.dropbox.com/developers
For Java specifically, we recommend you use the official Dropbox Java SDK:
https://github.com/dropbox/dropbox-sdk-java
To download a file's contents using that, you can use the download method:
https://dropbox.github.io/dropbox-sdk-java/api-docs/v5.2.0/com/dropbox/core/v2/files/DbxUserFilesRequests.html#download(java.lang.String)
You can find an example of that here:
https://github.com/dropbox/dropbox-sdk-java/blob/e52fc828c7c753e04c3fa9d47ab6de7e85d000c4/examples/tutorial/src/main/java/com/dropbox/core/examples/tutorial/Main.java#L54
I'm currently working with PDFs on a Java application that makes some modifications to PDF Documents.
Currently, the signing of these PDFs is working, as I am using classes such as FileInputStream and FileOutputStream. Basically, I copy the original documents from a source folder and then put the signed copies in an output folder. I am using the PDDocument class with PDFBox 1.8.9.
However, I want to use the same file, meaning I don't intend to copy the PDFs anymore. I want to grab the document, sign it, and overwrite the original one.
Since I learned that having FileInputStream and FileOutputStream pointing at the same file is not a good idea, I simply tried to use the File class.
I tried the following:
File file = new File(locOriginal);
PDDocument doc = PDDocument.load(file);
PDSignature signature = new PDSignature();
Overlay overlay = new Overlay();
//The signature itself. It has not been modified
signature.setFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED); // default filter
signature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
if (msg.getAreaNegocio().startsWith("A")) {
signature.setName(this.campoCertificadoAcquiring);
signature.setLocation(this.localCertificadoAcquiring);
signature.setReason(this.razaoCertificadoAcquiring);
}else {
signature.setName(this.campoCertificadoIssuing);
signature.setLocation(this.localCertificadoIssuing);
signature.setReason(this.razaoCertificadoIssuing);
}
// register signature dictionary and sign interface
doc.addSignature(signature,this);
doc.saveIncremental(file.getAbsolutePath());
doc.close();
My PDF file does get overwritten as intended, yet the signature is no longer valid when I open the file. I read these questions... Does it relate to any of these issues? What can I do to solve this?
PDFBox 1.8.10: Fill and Sign PDF produces invalid signatures
PDFBox - opening and saving a signed pdf invalidates my signature
Thanks for the help!
The 1.8.* saveIncremental(filename) was buggy until PDFBox 1.8.16. This is described in PDFBOX-4312 but is confusing because the user deleted most of his own messages and had multiple other problems. If you insist on using an outdated version (that has a security issue), then try this code instead of calling saveIncremental(filename):
//BEWARE: do not "optimize" this method by using buffered streams,
// because COSStandardOutputStream only allows seeking
// if a FileOutputStream is passed, see PDFBOX-4312.
FileInputStream fis = new FileInputStream(fileName);
byte[] ba = IOUtils.toByteArray(fis);
fis.close();
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(ba);
fis = new FileInputStream(fileName);
saveIncremental(fis, fos);
And no, I don't think that the questions you linked to relate to your issue.
Btw I don't consider overwriting the original file to be a good idea. You are risking the loss of your file if there is an error or a power loss.
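One common way to reduce that risk, sketched here with only the JDK and not tied to PDFBox, is to write the new contents to a temporary file in the same directory and then move it over the original. The class name `SafeOverwrite`, the `Writer` callback, and the `signed-` prefix are hypothetical names chosen for illustration; whether the final move is truly atomic depends on the file system.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeOverwrite {

    /** Hypothetical callback that produces the complete new file contents. */
    public interface Writer {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Writes the new contents to a temporary file next to the target and
     * only then replaces the target, so a failure while writing leaves
     * the original file untouched.
     */
    public static void replaceFile(Path target, Writer writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as the target, so the move stays on one file system.
        Path tmp = Files.createTempFile(dir, "signed-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            try {
                Files.move(tmp, target,
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a plain replace if atomic moves are unavailable.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the signing scenario the callback would be the place to produce the signed output, reading the original file as input while writing only to the temporary stream.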
See also the comment by mkl: setFilter() is usually called with parameter PDSignature.FILTER_ADOBE_PPKLITE.
I am using Kryo to save binary files of user data. The user can open one of their files in my application. I'm not sure if I have a clean approach to detecting whether they tried to open a file of some other type.
Right now, I'm writing a simple FileHeader object to the file before the user's data. The file header has info about what version of the app saved the file.
public void write (UserProject project, File file) throws FileNotFoundException {
OutputStream outputStream = new DeflaterOutputStream(new FileOutputStream(file));
Output output = new Output(outputStream);
kryo.writeObject(output, new FileHeader());
kryo.writeObject(output, project);
output.close();
}
So when I load a file, I can try to deserialize the file header and the user project and catch any Exception that might occur. But a catch-all block could hide certain issues that I could perhaps react to in a more elegant way than simply showing the user an error no matter the exception. Here's what I'm doing now:
public Project read (File file) throws FileNotFoundException, FileVersionException, UnreadableException {
InputStream inputStream = new InflaterInputStream(new FileInputStream(file));
Input input = new Input(inputStream);
try {
FileHeader fileHeader = kryo.readObject(input, FileHeader.class);
if (fileHeader.fileVersion > CURRENT_FILE_VERSION)
throw new FileVersionException(/* */);
Project project = kryo.readObject(input, Project.class);
return project;
} catch (Exception e){
if (DEBUG) e.printStackTrace();
throw new UnreadableException(e); //caller will show user error msg
} finally {
input.close();
}
}
I suppose there's also a very tiny (infinitesimal?) chance that some file actually loads without throwing an exception, in which case a very unexpected error could happen elsewhere in my application. Not sure if I should worry about this...a user should not expect to open an incorrect file type and have it work correctly.
You could use magic numbers: a set of bytes that identifies the type of a file. Formats such as .jpg, .pdf, and .wav all have a few identifying bytes at the beginning of each file, so even if such files are saved with a different extension you can check whether the file's magic number matches.
Magic Number Description
However, if you're serializing and deserializing you may have to tack on some additional data to the file after serializing and remove it before deserializing.
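A minimal sketch of that idea using only the JDK: write a fixed signature to the raw file stream before any other data, and check it first when loading. The class name `MagicHeader` and the four bytes `UPRJ` are hypothetical and chosen only for illustration, and the signature is written at the start of the file here, which is one possible placement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MagicHeader {

    // Hypothetical 4-byte signature identifying this application's files.
    private static final byte[] MAGIC = "UPRJ".getBytes(StandardCharsets.US_ASCII);

    /** Writes the signature; call this before writing any other data. */
    public static void writeMagic(OutputStream out) throws IOException {
        out.write(MAGIC);
    }

    /**
     * Consumes the first bytes of the stream and reports whether they match
     * the signature. Returns false if the stream is shorter than the signature.
     */
    public static boolean readAndCheckMagic(InputStream in) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before the full signature was read
            }
            read += n;
        }
        return Arrays.equals(head, MAGIC);
    }
}
```

With the code in the question, `writeMagic` would be called on the FileOutputStream before it is wrapped in the DeflaterOutputStream, and `readAndCheckMagic` on the FileInputStream before it is wrapped in the InflaterInputStream. A file of another type can then be rejected with a specific error before any decompression or deserialization is attempted.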
Hi, I am a newcomer Java developer working with OLE objects and package parts. I am facing an issue reading an embedded zip from a docx file in my current project.
I have read the docx file and obtained the package part for the embedded object; it is returned as a PackagePart.
We need to read a zip file that contains some Excel files.
I am confused about how to read it to get the data from the Excel files.
I am using the following code to do this:
try
{
PackagePart pPart = null;
Iterator<PackagePart> pIter = null;
List<PackagePart> embeddedDocs = document.getAllEmbedds();
if (embeddedDocs != null && !embeddedDocs.isEmpty())
{
pIter = embeddedDocs.iterator();
while (pIter.hasNext())
{
pPart = pIter.next();
//System.out.println(pPart.getPartName().getExtension());
System.out.println(pPart.getInputStream());
}
}
}
catch (Exception e)
{
e.printStackTrace();
}
This code produces output like:
java.io.ByteArrayInputStream@2862c542
java.io.ByteArrayInputStream@6c8484c4
java.io.ByteArrayInputStream@70289784
java.io.ByteArrayInputStream@78f394a2
Is it possible to read the data from a zip attached to a docx file?
Can we save all the data from the zip to my hard disk?
Please help me.
Thanks for your interest.
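As a rough sketch of the saving step, assuming the stream returned by `pPart.getInputStream()` contains raw ZIP data: the JDK's `ZipInputStream` can walk the entries and copy each one to disk. The class name `EmbeddedZipExtractor` and the method `extractZip` are hypothetical. If the embedded object is stored inside an OLE wrapper instead of as a plain ZIP, this would most likely find no entries, and the wrapper would have to be unpacked first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class EmbeddedZipExtractor {

    /**
     * Reads ZIP data from the stream and writes every file entry below
     * targetDir. Returns the paths of the files that were written.
     */
    public static List<Path> extractZip(InputStream in, Path targetDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entry names such as "../x" that would escape targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the current entry only; the zip stream stays open.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                    written.add(out);
                }
            }
        }
        return written;
    }
}
```

Inside the loop from the question this could be called as `EmbeddedZipExtractor.extractZip(pPart.getInputStream(), Paths.get("out"))`, after which the extracted Excel files can be opened from disk like any other file.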
I'm trying to use the pdf library from jpedal, using the code snippet found here: http://www.jpedal.org/simple_image_example.php
/**instance of PdfDecoder to convert PDF into image*/
PdfDecoder decode_pdf = new PdfDecoder(true);
/**set mappings for non-embedded fonts to use*/
FontMappings.setFontReplacements();
/**open the PDF file - can also be a URL or a byte array*/
try {
decode_pdf.openPdfFile("C:/myPDF.pdf"); //file
//decode_pdf.openPdfFile("C:/myPDF.pdf", "password"); //encrypted file
//decode_pdf.openPdfArray(bytes); //bytes is byte[] array with PDF
//decode_pdf.openPdfFileFromURL("http://www.mysite.com/myPDF.pdf",false);
/**get page 1 as an image*/
//page range if you want to extract all pages with a loop
//int start = 1, end = decode_pdf.getPageCount();
BufferedImage img=decode_pdf.getPageAsImage(1);
/**close the pdf file*/
decode_pdf.closePdfFile();
} catch (PdfException e) {
e.printStackTrace();
}
But on this line:
decode_pdf.openPdfFile("C:/myPDF.pdf"); //file
Eclipse throws an error:
The type javax.swing.JPanel cannot be resolved. It is indirectly
referenced from required .class files
It seems as if I'm missing javax.swing.*
IntelliSense does give me other javax.* options, but not the swing package.
I already searched Google for this, but I had no luck finding a solution.
Any ideas?
Can't get any clearer than this:
http://www.jpedal.org/PDFblog/2011/09/java-is-not-android/
Appears that the library I wanted to use was not compatible with android at all.
Also note the last sentence:
that makes converting a Java PDF viewer to Android a major task.
Thanks for crushing that last bit of hope for me jpedal...
I suspect the path is not being resolved. Try this:
decode_pdf.openPdfFile("C:\\myPDF.pdf");