How to downgrade arbitrary PDF file to version PDF-1.2? - java

I have some user-generated PDF files. Typically the files are generated with Word, but they could be just about any kind of valid PDF file. I'd like to convert the files to version PDF-1.2 if they have a higher version number. Features available only in higher versions (like multimedia) should be removed, and the result should still be reasonably readable.
How to do this programmatically, without interactive tools such as Adobe Acrobat? Preferably with Java and iText-library, but I would be interested in other solutions also.
One way would be to generate a bunch of images from the original PDF and then package them as a PDF-1.2 file, but is there a more elegant way?

Try the command line below. It uses Ghostscript to re-distill the PDF. Use Ghostscript version 8.71 or newer, such as 9.00. (The wrongly up-voted answer advising to "set PDF version in iText using setPdfVersion()" will NOT work: it only re-labels the PDF, which is misleading.)
gswin32c.exe ^
-o output-v1.2.pdf ^
-sDEVICE=pdfwrite ^
-dPDFSETTINGS=/ebook ^
-dCompatibilityLevel=1.2 ^
input-v1.6.pdf
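Since the question asks for a programmatic route from Java, the same Ghostscript call could be launched with ProcessBuilder. This is only a minimal sketch: the class name GsDowngrade and its method names are made up for illustration, the executable name gswin32c.exe is taken from the command above and is assumed to be on the PATH, and the flags are exactly the ones shown above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GsDowngrade {

    // Builds the same argument list as the command line above.
    // gsExecutable is whatever your installation calls the binary
    // (gswin32c.exe in the example; an assumption on other systems).
    public static List<String> buildCommand(String gsExecutable, String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-o");
        cmd.add(output);
        cmd.add("-sDEVICE=pdfwrite");
        cmd.add("-dPDFSETTINGS=/ebook");
        cmd.add("-dCompatibilityLevel=1.2");
        cmd.add(input);
        return cmd;
    }

    // Starts Ghostscript, waits for it to finish and returns its exit code.
    public static int run(String gsExecutable, String input, String output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(gsExecutable, input, output));
        pb.inheritIO(); // let Ghostscript's own messages reach the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            // Without arguments, only show the command that would be run.
            System.out.println(String.join(" ",
                    buildCommand("gswin32c.exe", "input-v1.6.pdf", "output-v1.2.pdf")));
            return;
        }
        int exit = run("gswin32c.exe", args[0], args[1]);
        System.out.println("Ghostscript exit code: " + exit);
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems with file names that contain spaces.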

The easiest is to reprint it through Ghostscript.

You can set the PDF version in iText using setPdfVersion(); however, I don't think downgrading will work out of the box. You could use PdfCopy to write your PDFs to a new one with version 1.2 and strip out all non-1.2 objects, or convert them to version 1.2 objects (which I think you will have to do yourself, though I'm not sure).

Related

Replacement for COSName.DOCMDP in PDFBox 2.0.4

I'm testing the example codes from this page:
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
But inside the file CreateSignatureBase.java, specifically in the functions getMDPPermission and setMDPPermission, it uses a property that doesn't exist anymore: COSName.DOCMDP. I perused the PDFBox page and its migration guide, and neither mentions this property or how to replace it. I also looked into the PDFBox source code (specifically the file COSName.java) and it doesn't have that property, even though this file:
https://svn.apache.org/viewvc/pdfbox/branches/2.0/pdfbox/src/main/java/org/apache/pdfbox/cos/COSName.java?view=markup does have it.
I checked both pdfbox-2.0.4.jar and pdfbox-app-2.0.4.jar adding them to the Netbeans project where I'm testing the java files from the pdfbox examples. None of them have the property COSName.DOCMDP in the COSName class.
Both jars and the pdfbox sourcecode are downloaded from here:
https://pdfbox.apache.org/download.cgi#20x
How can I replace the property COSName.DOCMDP in the CreateSignatureBase class? Am I getting the right jars?
It will appear in 2.1.0 version:
https://issues.apache.org/jira/browse/PDFBOX-3017
https://issues.apache.org/jira/browse/PDFBOX-3699
https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSName.java?annotate=1786065
If you need it for testing purposes, you may download its SNAPSHOT version from https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox/
Or, you may see this example in current stable version - just download 2.0.4 jar and browse examples.

iText PDF Concatenation fails - InvalidPDFException

I am trying to concatenate 2 PDFs using the iText 4.2.0 utility. In a few cases, it throws InvalidPdfException at the line below:
reader = new PdfReader("c:\tmp\test.pdf");
com.itextpdf.text.exceptions.InvalidPdfException: No message found for trailer.not.found
at com.itextpdf.text.pdf.PdfReader.rebuildXref(Unknown Source)
at com.itextpdf.text.pdf.PdfReader.readPdf(Unknown Source)
at com.itextpdf.text.pdf.PdfReader.<init>(Unknown Source)
at com.itextpdf.text.pdf.PdfReader.<init>(Unknown Source)
This PDF is a valid one: I opened it in a text editor and confirmed that it has %PDF as well as %%EOF, as recommended here
UPDATE
The iText version is 2.1.7. The jar was wrongly named 4.2.0.
The path mentioned ("c:\tmp\test.pdf") is just a sample. We are actually passing "c:/tmp/test.pdf".
There is no iText 4.2.0. Please throw it away. It is a rogue version that is not released by the official developers of iText. It's a "gork", meaning God Only Really Knows what's inside. Solution: Throw away iText 4.2.0 and replace it with a more recent, official version: https://github.com/itext/itextpdf/releases
You get the error saying that the actual error message for the key trailer.not.found is not found. This means that you are using an iText jar that isn't built correctly. The .lng files are missing from the jar, hence the actual error message can't be found. Solution: Throw away iText 4.2.0 and replace it with a more recent, official version: https://github.com/itext/itextpdf/releases
The key trailer.not.found corresponds with the message "Trailer not found". It means that you are trying to create a PdfReader with a file that may look like a PDF, but that isn't. For instance: it starts with %PDF-, but there is no trailer. That means that iText searches the file (that should end in %%EOF; please check if this is the case) and the keyword startxref can be found. In other words: the trailer is missing. Solution: check if the PDF is valid. Note that old versions of iText weren't able to read PDFs that use a feature that was introduced after version PDF 1.5. Maybe your "unofficial" iText version is that old...
Finally: \ is an escape character. This is wrong: "c:\tmp\test.pdf" because it reads as "c:[tab]mp[tab]est.pdf" where [tab] is the tab character \t. You should use either "c:/tmp/test.pdf" or "c:\\tmp\\test.pdf".
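To see the escaping point for yourself, a small stand-alone check like the following can help. The class name BackslashDemo and its constants are invented for this illustration; only the three path literals come from the answer above.

```java
public class BackslashDemo {

    // "\t" inside a Java string literal is a single tab character.
    public static final String WRONG = "c:\tmp\test.pdf";
    // "\\" is one literal backslash.
    public static final String ESCAPED = "c:\\tmp\\test.pdf";
    // Forward slashes need no escaping.
    public static final String FORWARD = "c:/tmp/test.pdf";

    public static void main(String[] args) {
        System.out.println(WRONG.indexOf('\t'));  // prints 2: a tab sits where "\t" was typed
        System.out.println(WRONG.length());       // prints 13: two characters were swallowed
        System.out.println(ESCAPED.length());     // prints 15
        System.out.println(ESCAPED);              // prints c:\tmp\test.pdf
        System.out.println(FORWARD);              // prints c:/tmp/test.pdf
    }
}
```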

LatinIME dicttool for use with a V401 Binary Dictionary

I’m trying to convert a version 401 binary dictionary -- a directory called PersonalizationDictionary.en_US.dict -- to human readable .xml.
The command line utility dicttool_aosp in packages/inputmethods/LatinIME/tools/dicttool can do it like so:
dicttool_aosp makedict -s sourcedict.dict -x output.xml
I’m unable to compile the Android Lollipop version of dicttool, since dicttool has native C++ dependencies that don’t play nice with my Mac. Note this line in the NativeLib.mk file of dicttool:
HACK: Temporarily disable host tool build on Mac until the build system is ready for C++11.
I am hoping someone with a compatible setup can compile this utility for me using “make dicttool_aosp” from the root of the AOSP source tree. I've spent the past few days looking for compiled versions of it, and while I’ve found many makedict.jar files online, they are too old to support my newer V401 binary dictionary. The main difference between the V401 and older versions is that a V401 is split up into multiple files with extensions like .bigrams, .freq, .header, whereas the older dicts are contained in a single file.
Thank you, please let me know if I can clarify anything!

Why would a Saxon Report run correctly on a Mac but not on Windows?

I am using Saxon 4.4.2 to convert DocBook to various formats (e.g. HTML, PDF, ePub). I am doing development on a MacBook Pro using Eclipse. Everything is written in Java. On my Mac, everything works fine. When I use Eclipse to generate a deployable plug-in, copy the plug-in and drop it into my Eclipse installation on Windows 7, and run the conversion from DocBook to HTML, Saxon reports "Failed to compile stylesheet. 1 error detected."
The error comes from
com.icl.saxon.TransformerFactoryImpl, method newTemplates line 120.
called by
com.icl.saxon.TransformerFactoryImpl, method newTransformer, line 72.
My calling line of code is:
Transformer transformer = tfactory.newTransformer(xsl);
The setting of xsl is done via this line:
StreamSource xsl = new StreamSource(DocBookTransformer.class.getResourceAsStream("/lib/docbook-xsl-1.76.1/xhtml/docbook.xsl"));
Why would Saxon process the stylesheet without error on a Mac, but fail to parse it on Windows, when it is the same Saxon Jars and the same stylesheet file being processed on both machines?
Saxon 4.4.2? Where on earth did you get hold of that? Perhaps a CD in the back of a book published around 1998? It predates the first release on SourceForge in 2001, and was probably designed to run on Java 1.1.8.
So your first step should be to see if the problem still occurs on a more modern release. The current release is 9.5.
The other thing is to find out what the error is that Saxon says it reported. It will have been sent to the JAXP ErrorListener, and unless you changed anything, the default ErrorListener will have written the message to System.err.
The things that are most likely to work on one platform and fail on another are the URIs in xsl:include and xsl:import, so try checking those.
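One way to get at the message that was sent to the ErrorListener is to install your own listener before compiling the stylesheet. The sketch below uses only the standard JAXP interfaces; the class names StylesheetDiagnostics and CollectingListener are invented for illustration, and TransformerFactory.newInstance() returns whatever processor is configured on your classpath (the JDK's built-in one if nothing else is present), which is not necessarily Saxon.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class StylesheetDiagnostics {

    // Records what the XSLT processor reports instead of leaving it on System.err.
    public static class CollectingListener implements ErrorListener {
        public final List<String> messages = new ArrayList<>();

        @Override
        public void warning(TransformerException e) {
            messages.add("warning: " + e.getMessageAndLocation());
        }

        @Override
        public void error(TransformerException e) {
            messages.add("error: " + e.getMessageAndLocation());
        }

        @Override
        public void fatalError(TransformerException e) {
            messages.add("fatal: " + e.getMessageAndLocation());
        }
    }

    // Tries to compile the stylesheet and returns everything that was reported.
    // An empty list means the processor had nothing to complain about.
    public static List<String> compile(Source xsl) {
        CollectingListener listener = new CollectingListener();
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(listener);
        try {
            factory.newTransformer(xsl);
        } catch (TransformerConfigurationException e) {
            listener.messages.add("exception: " + e.getMessageAndLocation());
        }
        return listener.messages;
    }

    public static void main(String[] args) {
        // A stylesheet whose import cannot be resolved, as a stand-in for a broken URI.
        String xsl = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:import href='does-not-exist.xsl'/>"
                + "</xsl:stylesheet>";
        for (String message : compile(new StreamSource(new StringReader(xsl)))) {
            System.out.println(message);
        }
    }
}
```

If the stylesheet is loaded from a stream, as in the question, the StreamSource has no base URI of its own, so relative xsl:include and xsl:import references may resolve differently from one environment to another. Giving the source a system ID (for example with StreamSource.setSystemId) is one thing to try, though whether that is the cause here is only a guess.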

Image won't display in PDF using FOP 1.1 and Java

I've searched Google through and through and can't seem to find the solution to my issue...
I'm using Apache FOP 1.1 and Java to generate a PDF file from Java classes. This Java project runs from a JAR file. I am using an image that is external to the JAR itself. The XSL file that is used to generate the PDF contains this:
<fo:external-graphic src="file:///C:/images/image.jpg" width="7.5in" />
Based on much searching/reading, I've tried many different variations of the src attribute:
src="file:///C:/images/image.jpg"
src="C:/images/image.jpg"
src="url('file:///C:/images/image.jpg')"
src="url('C:/images/image.jpg')"
all without success...
Now, here's the confusing part. I am doing my development from Eclipse IDE and when using the variations of src attribute:
src="C:/images/image.jpg"
src="url('C:/images/image.jpg')"
The PDF is created properly with the images embedded.
I can not figure out what is keeping the image from being displayed when running from the JAR file...
Thanks in advance! (hopefully)
Devin
The syntax
<fo:external-graphic src="url('C:/images/image.jpg')" content-height="100%" content-width="100%"/>
works perfectly fine for me, both from Eclipse and from a JAR. Have you figured out what the problem was?
I know this is an old thread but I had a similar problem and eventually figured out a partial fix. It was a combo of 2 things:
Difference between JVMs in dev and deployed environments (for me raw sun ... err oracle vs. ibm websphere bundled java)
IBM JVM doesn't like indexed PNG files. As soon as I converted it to RGB it worked.
Here is the error message I got when I manually ran the fop.bat file with websphere jvm:
SEVERE: Image not available. URI: /tmp/image.png. Reason: org.apache.xmlgraphics.image.loader.ImageException: I/O error while extracting image metadata: Error reading PNG metadata (See position 30:182)
btw, i was using fop 1.0 + java 1.6 + WAS 7.0 (java 1.6)
Hope this helps someone else!
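The indexed-to-RGB conversion mentioned above can be done in any image editor, but it could also be scripted with the standard javax.imageio classes. This is a rough sketch, not the poster's actual procedure: the class name PngToRgb and the method toRgb are made up, and note that copying into TYPE_INT_RGB discards any transparency in the original.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PngToRgb {

    // Returns a copy of the image stored as plain 8-bit RGB,
    // regardless of how the source was encoded (for example, palette-indexed).
    public static BufferedImage toRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: PngToRgb <input.png> <output.png>");
            return;
        }
        BufferedImage in = ImageIO.read(new File(args[0]));
        if (in == null) {
            throw new IOException("Not a readable image: " + args[0]);
        }
        ImageIO.write(toRgb(in), "png", new File(args[1]));
    }
}
```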
