I am trying to attach a file to a existing PDF/A-3.
This example explains how to create a PDF/A-3 and attach content to it.
My next step was to adapt the code and use the PdfAStamper instead of the document.
So here is my resulting code.
private ByteArrayOutputStream append(byte[] content, InputStream inPdf) throws IOException, DocumentException {
ByteArrayOutputStream result = new ByteArrayOutputStream(16000);
PdfReader reader = new PdfReader(inPdf);
PdfAStamper stamper = new PdfAStamper(reader, result, PdfAConformanceLevel.PDF_A_3B);
stamper.createXmpMetadata();
// Creating PDF/A-3 compliant attachment.
PdfDictionary embeddedFileParams = new PdfDictionary();
embeddedFileParams.put(PARAMS, new PdfName(ZF_NAME));
embeddedFileParams.put(MODDATE, new PdfDate());
PdfFileSpecification fs = PdfFileSpecification.fileEmbedded(stamper.getWriter(), null,ZF_NAME, content , "text/xml", embeddedFileParams,0);
fs.put(AFRELATIONSHIP, Alternative);
stamper.addFileAttachment("file description",fs);
stamper.close();
reader.close();
return result;
}
Here is the Stacktrace of the error.
com.itextpdf.text.pdf.PdfAConformanceException: EF key of the file specification dictionary for an embedded file shall contain dictionary with valid F key.
at com.itextpdf.text.pdf.internal.PdfA3Checker.checkFileSpec(PdfA3Checker.java:95)
at com.itextpdf.text.pdf.internal.PdfAChecker.checkPdfAConformance(PdfAChecker.java:198)
at com.itextpdf.text.pdf.internal.PdfAConformanceImp.checkPdfIsoConformance(PdfAConformanceImp.java:70)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3380)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3376)
at com.itextpdf.text.pdf.PdfFileSpecification.toPdf(PdfFileSpecification.java:309)
at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:157)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:424)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:402)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:381)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:334)
at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:819)
at com.itextpdf.text.pdf.PdfFileSpecification.getReference(PdfFileSpecification.java:256)
at com.itextpdf.text.pdf.PdfDocument.addFileAttachment(PdfDocument.java:2253)
at com.itextpdf.text.pdf.PdfWriter.addFileAttachment(PdfWriter.java:1714)
at com.itextpdf.text.pdf.PdfStamper.addFileAttachment(PdfStamper.java:497)
Now when I try to analyze the Stacktrace and take a look atPdfFileSpecification.fileEmbedded
I see that an EF is created with an F and UF entry.
Looking inside PdfA3Checker I see that line PdfDictionary embeddedFile = getDirectDictionary(dict.get(PdfName.F)); is not a directory but a strem.
if (fileSpec.contains(PdfName.EF)) {
PdfDictionary dict = getDirectDictionary(fileSpec.get(PdfName.EF));
if (dict == null || !dict.contains(PdfName.F)) {
throw new PdfAConformanceException(obj1, MessageLocalization.getComposedMessage("ef.key.of.file.specification.dictionary.shall.contain.dictionary.with.valid.f.key"));
}
PdfDictionary embeddedFile = getDirectDictionary(dict.get(PdfName.F));
if (embeddedFile == null) {
throw new PdfAConformanceException(obj1, MessageLocalization.getComposedMessage("ef.key.of.file.specification.dictionary.shall.contain.dictionary.with.valid.f.key"));
}
checkEmbeddedFile(embeddedFile);
}
Is this a bug in iText or am I missing something? By the way I am using iText 5.4.5.
Update 1
As suggested by Bruno an mkl the 4.5.6-Snapshot should contains the fix. I tried my Test case Gist link to full test case against the current trunk. But the result was the same error.
You ran into a bug very similar to the one focused in Creating PDF/A-3: Embedded file shall contain valid Params key:
The problem (as you found out) is in this code
PdfDictionary embeddedFile = getDirectDictionary(dict.get(PdfName.F));
if (embeddedFile == null) {
throw new PdfAConformanceException(obj1, MessageLocalization.getComposedMessage("ef.key.of.file.specification.dictionary.shall.contain.dictionary.with.valid.f.key"));
}
in PdfA3Checker.checkFileSpec(PdfWriter, int, Object); even though dict contains a stream named F, getDirectDictionary(dict.get(PdfName.F)) does not return it. The reason is not, though, that a dictionary is sought here (a stream essentially is a dictionary with some additions), but it is an issue in PdfAChecker.getDirectObject which is called by PdfAChecker.getDirectDictionary:
protected PdfObject getDirectObject(PdfObject obj) {
if (obj == null)
return null;
//use counter to prevent indirect reference cycling
int count = 0;
while (obj.type() == 0) {
PdfObject tmp = cachedObjects.get(new RefKey((PdfIndirectReference)obj));
if (tmp == null)
break;
obj = tmp;
//10 - is max allowed reference chain
if (count++ > 10)
break;
}
return obj;
}
This method only looks for cached objects (i.e. in cachedObjects) but in your case (and a test of mine, too) this stream has already been written to file and is not in cache anymore resulting in a null returned... correction, cf the PPS: it has been written but it has not been cached to begin with.
PS: PDF/A-3 conform file attachments work if added during PDF creation (using a PdfAWriter), but not if added during PDF manipulation (using a PdfAStamper); maybe the caching is different in these use cases.
PPS: Indeed there is a difference: PdfAWriter overrides the addToBody overloads by adding the added objects to a cache. PdfAStamperImp does not do so and furthermore is derived from PdfStamperImp and PdfWriter, not from PdfAWriter.
You've indeed hit a bug in iText 5.4.5. This bug was reported here: Creating PDF/A-3: Embedded file shall contain valid Params key
It was fixed in the SVN version of iText. We're preparing the next version of iText (due this week).
Related
I have a spring boot application and I am trying to merge two pdf files. The one I am getting as a byte array from another service and the one I have it locally in my resources file: /static/documents/my-file.pdf. This is the code of how I am getting byte array from my file from resources:
public static byte[] getMyPdfContentForLocale(final Locale locale) {
byte[] result = new byte[0];
try {
final File myFile = new ClassPathResource(TEMPLATES.get(locale)).getFile();
final Path filePath = Paths.get(myFile.getPath());
result = Files.readAllBytes(filePath);
} catch (IOException e) {
LOGGER.error(format("Failed to get document for local %s", locale), e);
}
return result;
}
I am getting the file and getting the byte array. Later I am trying to merge this two files with the following code:
PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
pdfMergerUtility.addSource(new ByteArrayInputStream(offerDocument));
pdfMergerUtility.addSource(new ByteArrayInputStream(merkblattDocument));
ByteArrayOutputStream os = new ByteArrayOutputStream();
pdfMergerUtility.setDestinationStream(os);
pdfMergerUtility.mergeDocuments(null);
os.toByteArray();
But unfortunately it throws an error:
throw new IOException("Page tree root must be a dictionary");
I have checked and it makes this validation before it throws it:
if (!(root.getDictionaryObject(COSName.PAGES) instanceof COSDictionary))
{
throw new IOException("Page tree root must be a dictionary");
}
And I really have no idea what does this mean and how to fix it.
The strangest thing is that I have created totally new project and tried the same code to merge two documents (the same documents) and it works!
Additionally what I have tried is:
Change the spring boot version if it is ok
Set the mergeDocuments method like this: pdfMergerUtility.mergeDocuments(setupMainMemoryOnly())
Set the mergeDocuments method like this: pdfMergerUtility.mergeDocuments(setupTempFileOnly())
Get the bytes with a different method not using the Files from java.nio:
And also executed this in a different thread
Merging files only locally stored (in resources)
Merging the file that I am getting from another service - this works btw and that is why I am sure he is ok
Can anyone help with this?
The issue as Tilman Hausherr said is in that resource filtering that you can find in your pom file. If you have a case where you are not allowed to modify this then this approach will help you:
final String path = new
ClassPathResource(TEMPLATES.get(locale)).getFile().getAbsolutePath();
final File file = new File(path);
final Path filePath = Paths.get(file.getPath());
result = Files.readAllBytes(filePath);
and then just pass the bytes to the pdfMergerUtility object (or even the whole file instead of the list of bytes).
We are attempting to generate documents using iText that are formed largely from "template" files - smaller PDF files that are combined together into one composite file using the PdfContentByte.addTemplate method. We then automatically and silently print the new file using the *nix command lp. This usually works; however occasionally, some files that are generated will fail to print. The document proceeds through all queues and arrives at the printer proper (a Lexmark T652n, in this case), its physical display gives a message of pending progress, and even its mechanical components whir up in preparation - then, the printing job vanishes spontaneously without a trace, and the printer returns to being ready.
The oddity in how specific this issue tends to be. For starters, the files in question print without fail when done manually through Adobe PDF Viewer, and can be read fine by editors like Adobe Live Cycle. Furthermore, the content of the file effects whether it is plagued by this issue, but not in a clear way - adding a specific template 20 times could cause the problem, while doing it 19 or 21 times might be fine, or using a different template will change the pattern entirely and might cause it to happen instead after 37 times. Generating a document with the exact same content will be consistent on whether or not the issue occurs, but any subtle and seemingly irrelevant change in content will change whether the problem happens.
While it could be considered a hardware issue, the fact remains that certain iText-generated files have this issue while others do not. Is our method of file creation sometimes creating files that are somehow considered corrupt only to the printer and only sometimes?
Here is a relatively small code example that generates documents using the repetitive template method similar to our main program. It uses this file as a template and repeats it a specified number of times.
public class PDFFileMaker {
private static final int INCH = 72;
final private static float MARGIN_TOP = INCH / 4;
final private static float MARGIN_BOTTOM = INCH / 2;
private static final String DIREC = "/pdftest/";
private static final String OUTPUT_FILEPATH = DIREC + "cooldoc_%d.pdf";
private static final String TEMPLATE1_FILEPATH = DIREC + "template1.pdf";
private static final Rectangle PAGE_SIZE = PageSize.LETTER;
private static final Rectangle TEMPLATE_SIZE = PageSize.LETTER;
private ByteArrayOutputStream workingBuffer;
private ByteArrayOutputStream storageBuffer;
private ByteArrayOutputStream templateBuffer;
private float currPosition;
private int currPage;
private int formFillCount;
private int templateTotal;
private static final int DEFAULT_NUMBER_OF_TIMES = 23;
public static void main (String [] args) {
System.out.println("Starting...");
PDFFileMaker maker = new PDFFileMaker();
File file = null;
try {
file = maker.createPDF(DEFAULT_NUMBER_OF_TIMES);
}
catch (Exception e) {
e.printStackTrace();
}
if (file == null || !file.exists()) {
System.out.println("File failed to be created.");
}
else {
System.out.println("File creation successful.");
}
}
public File createPDF(int inCount) throws Exception {
templateTotal = inCount;
String sFilepath = String.format(OUTPUT_FILEPATH, templateTotal);
workingBuffer = new ByteArrayOutputStream();
storageBuffer = new ByteArrayOutputStream();
templateBuffer = new ByteArrayOutputStream();
startPDF();
doMainSegment();
finishPDF(sFilepath);
return new File(sFilepath);
}
private void startPDF() throws DocumentException, FileNotFoundException {
Document d = new Document(PAGE_SIZE);
PdfWriter w = PdfWriter.getInstance(d, workingBuffer);
d.open();
d.add(new Paragraph(" "));
d.close();
w.close();
currPosition = 0;
currPage = 1;
formFillCount = 1;
}
protected void finishPDF(String sFilepath) throws DocumentException, IOException {
//Transfers data from buffer 1 to builder file
PdfReader r = new PdfReader(workingBuffer.toByteArray());
PdfStamper s = new PdfStamper(r, new FileOutputStream(sFilepath));
s.setFullCompression();
r.close();
s.close();
}
private void doMainSegment() throws FileNotFoundException, IOException, DocumentException {
File fTemplate1 = new File(TEMPLATE1_FILEPATH);
for (int i = 0; i < templateTotal; i++) {
doTemplate(fTemplate1);
}
}
private void doTemplate(File f) throws FileNotFoundException, IOException, DocumentException {
PdfReader reader = new PdfReader(new FileInputStream(f));
//Transfers data from the template input file to temporary buffer
PdfStamper stamper = new PdfStamper(reader, templateBuffer);
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
//Get size of template file via looking for "end" Acrofield
float[] area = form.getFieldPositions("end");
float size = TEMPLATE_SIZE.getHeight() - MARGIN_TOP - area[4];
//Requires Page Break
if (size >= PAGE_SIZE.getHeight() - MARGIN_TOP - MARGIN_BOTTOM + currPosition) {
PdfReader subreader = new PdfReader(workingBuffer.toByteArray());
PdfStamper substamper = new PdfStamper(subreader, storageBuffer);
currPosition = 0;
currPage++;
substamper.insertPage(currPage, PAGE_SIZE);
substamper.close();
subreader.close();
workingBuffer = storageBuffer;
storageBuffer = new ByteArrayOutputStream();
}
//Set Fields
form.setField("field1", String.format("Form Text %d", formFillCount));
form.setField("page", String.format("Page %d", currPage));
formFillCount++;
stamper.close();
reader.close();
//Read from working buffer, stamp to storage buffer, stamp template from template buffer
reader = new PdfReader(workingBuffer.toByteArray());
stamper = new PdfStamper(reader, storageBuffer);
reader.close();
reader = new PdfReader(templateBuffer.toByteArray());
PdfImportedPage page = stamper.getImportedPage(reader, 1);
PdfContentByte cb = stamper.getOverContent(currPage);
cb.addTemplate(page, 0, currPosition);
stamper.close();
reader.close();
//Reset buffers - working buffer takes on storage buffer data, storage and template buffers clear
workingBuffer = storageBuffer;
storageBuffer = new ByteArrayOutputStream();
templateBuffer = new ByteArrayOutputStream();
currPosition -= size;
}
Running this program with a DEFAULT_NUMBER_OF_TIMES of 23 produces this document and causes the failure when sent to the printer. Changing it to 22 times produces this similar-looking document (simply with one less "line") which does not have the problem and prints successfully. Using a different PDF file as a template component completely changes these numbers or makes it so that it may not happen at all.
While this problem is likely too specific and with too many factors for other people to reasonably be expected to reproduce, the question of possibilities remains. What about the file generation could cause this unusual behavior? What might cause one file to be acceptable to a specific printer but another, generated in the same manner in different only in seemingly non-trivial ways, to be unacceptable? Is there a bug in iText produced by using the stamper template commands too heavily? This has been a long-running bug with us for a while now, so any assistance is appreciate; additionally, I am willing to answer questions or have extended conversations in chat as necessary in an effort to get to the bottom of this.
The design of your application more or less abuses the otherwise perfectly fine PdfStamper functionality.
Allow me to explain.
The contents of a page can be expressed as a stream object or as an array of a stream objects. When changing a page using PdfStamper, the content of this page is always an array of stream objects, consisting of the original stream object or the original array of stream objects to which extra elements are added.
By adding the same template creating a PdfStamper object over and over again, you increase the number of elements in the page contents array dramatically. You also introduce a huge number of q and Q operators that save and restore the stack. The reason why you have random behavior is clear: the memory and CPU available to process the PDF can vary from one moment to another. One time, there will be sufficient resources to deal with 20 q operators (saves the state), the next time there will only be sufficient resources to deal with 19. The problem occurs when the process runs out of resources.
While the PDFs you're creating aren't illegal according to ISO-32000-1, some PDF processors simply choke on these PDFs. iText is a toolbox that allows you to create PDFs that can make me very happy when I look under the hood, but it also allows you to create horrible PDFs if you don't use the toolbox wisely. The latter is what happened in your case.
You should solve this be reusing the PdfStamper instance instead of creating a new PdfStamper over and over again. If that's not possible, please post another question, using less words, explaining exactly what you want to achieve.
Suppose that you have many different source files with PDF snippets that need to be added to a single page. For instance: suppose that each PDF snippet was a coupon and you need to create a sheet with 30 coupons. Than you'd use a single PdfWriter instance, import pages with getImportedPage() and add them at the correct position using addTemplate().
Of course: I have no idea what your project is about. The idea of coupons of a page was inspired by your test PDF.
I'm starting with new functionality in my android application which will help to fill certain PDF forms.
I find out that the best solution will be to use iText library.
I can read file, and read AcroFields from document but is there any possibility to find out that specific field is marked as required?
I tried to find this option in API documentation and on Internet but there were nothing which can help to solve this issue.
Please take a look at section 13.3.4 of my book, entitled "AcroForms revisited". Listing 13.15 shows a code snippet from the InspectForm example that checks whether or not a field is a password or a multi-line field.
With some minor changes, you can adapt that example to check for required fields:
for (Map.Entry<String,AcroFields.Item> entry : fields.entrySet()) {
out.write(entry.getKey());
item = entry.getValue();
dict = item.getMerged(0);
flags = dict.getAsNumber(PdfName.FF);
if (flags != null && (flags.intValue() & BaseField.REQUIRED) > 0)
out.write(" -> required\n");
}
check required fields without looping for itext5.
String src = "yourpath/pdf1.pdf";
String dest = "yourpath/pdf2.pdf";
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(this.reader, new FileOutputStream(dest));
AcroFields form = this.stamper.getAcroFields();
Map<String,AcroFields.Item> fields = form.getFields();
AcroFields.Item item;
PdfDictionary dict;
PdfNumber flags;
item=fields.get("fieldName");
dict = item.getMerged(0);
flags = dict.getAsNumber(PdfName.FF);
if (flags != null && (flags.intValue() & BaseField.REQUIRED) > 0)
{
System.out.println("flag has set");
}
I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
#Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
#Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks
We are using PdfCopy to merge differents files, some of files may have fields. We use the version 5.5.3.0. The code is simple and it seems to work fine, BUT sometimes the result file is impossible to print!
Our code :
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.GetBuffer()
End Function
Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.
Its just to say that we have the same probleme : iText mergeFields in PdfCopy creates invalid pdf. So it is still not fixed in the version 5.5.3.0
I am trying to join two PostScript files to one with ghost4j 0.5.0 as follows:
final PSDocument[] psDocuments = new PSDocument[2];
psDocuments[0] = new PSDocument();
psDocuments[0].load("1.ps");
psDocuments[1] = new PSDocument();
psDocuments[1].load("2.ps");
psDocuments[0].append(psDocuments[1]);
psDocuments[0].write("3.ps");
During this simplified process I got the following exception message for the above "append" line:
org.ghost4j.document.DocumentException: java.lang.ClassCastException:
org.apache.xmlgraphics.ps.dsc.events.UnparsedDSCComment cannot be cast to
org.apache.xmlgraphics.ps.dsc.events.DSCCommentPage
Until now I have not made to find out whats the problem here - maybe some kind of a problem within one of the PostScript files?
So help would be appreciated.
EDIT:
I tested with ghostScript commandline tool:
gswin32.exe -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile="test.ps" --filename "1.ps" "2.ps"
which results in a document where 1.ps and 2.ps are merged into one(!) page (i.e. overlay).
When removing the --filename the resulting document will be a PostScript with two pages as expected.
The exception occurs because one of the 2 documents does not follow the Adobe Document Structuring Convention (DSC), which is mandatory if you want to use the Document append method.
Use the SafeAppenderModifier instead. There is an example here: http://www.ghost4j.org/highlevelapisamples.html (Append a PDF document to a PostScript document)
I think something is wrong in the document or in the XMLGraphics library as it seems it cannot parse a part of it.
Here you can see the code in ghost4j that I think it is failing (link):
DSCParser parser = new DSCParser(bais);
Object tP = parser.nextDSCComment(DSCConstants.PAGES);
while (tP instanceof DSCAtend)
tP = parser.nextDSCComment(DSCConstants.PAGES);
DSCCommentPages pages = (DSCCommentPages) tP;
And here you can see why XMLGraphics may bre sesponsable (link):
private DSCComment parseDSCComment(String name, String value) {
DSCComment parsed = DSCCommentFactory.createDSCCommentFor(name);
if (parsed != null) {
try {
parsed.parseValue(value);
return parsed;
} catch (Exception e) {
//ignore and fall back to unparsed DSC comment
}
}
UnparsedDSCComment unparsed = new UnparsedDSCComment(name);
unparsed.parseValue(value);
return unparsed;
}
It seems parsed.parseValue(value) has thrown an exception, it was hidden in the catch and it returned an unparsed version ghost4j didn't expect.