iText difference between PdfCopy and PdfACopy

I wrote a function to embed a file as an attachment inside a PDF/A-3a document using iText 5.5.13 (following instructions from the iText tutorials).
If I attach the file using the class PdfCopy, the result is a correct PDF file, but it does not claim to be PDF/A (maybe it matches all the requirements, but it doesn't say).
If I do the same using PdfACopy, I get an incorrectly built document:
InvalidPdfException: Rebuild failed: trailer not found.; Original message: PDF startxref not found.
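For context on this message: a PDF file ends with the startxref keyword, a byte offset, and the %%EOF marker, and a reader uses that tail to locate the cross-reference data. The sketch below is a JDK-only check for that tail in a byte array. The class name PdfTailCheck and the 1024-byte scan window are assumptions made for illustration; this is not how iText itself performs the check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PdfTailCheck {
    /** Returns true if the tail of the data contains "startxref" followed by "%%EOF". */
    static boolean hasStartXref(byte[] pdfBytes) {
        // Only the end of the file matters; 1024 bytes is an assumed scan window.
        int from = Math.max(0, pdfBytes.length - 1024);
        String tail = new String(
                Arrays.copyOfRange(pdfBytes, from, pdfBytes.length),
                StandardCharsets.ISO_8859_1);
        int xref = tail.lastIndexOf("startxref");
        // indexOf returns -1 when "%%EOF" is missing after the keyword.
        return xref >= 0 && tail.indexOf("%%EOF", xref) > xref;
    }
}
```

Running such a check on the bytes of the generated file can show whether the output was cut off before the trailer was written.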
Here is my code, slightly simplified. The commented-out line uses PdfCopy instead.
public static File embedFile(File inputPdf) throws IOException, DocumentException {
    File outputPdf = new File("./test.pdf");
    PdfReader reader = new PdfReader(inputPdf.getAbsolutePath());
    Document document = new com.itextpdf.text.Document();
    OutputStream os = new FileOutputStream(outputPdf.getAbsolutePath());
    PdfACopy copy = new PdfACopy(document, os, PdfAConformanceLevel.PDF_A_3A); // Output doc doesn't work
    // PdfCopy copy = new PdfCopy(document, os); // Output doc works but doesn't claim to be PDF/A
    document.open();
    copy.addDocument(reader);
    // Include attachment (exactly as in the sample tutorial)
    PdfDictionary parameters = new PdfDictionary();
    parameters.put(PdfName.MODDATE, new PdfDate());
    // "copy" is the only writer in scope (PdfCopy extends PdfWriter)
    PdfFileSpecification fileSpec = PdfFileSpecification.fileEmbedded(
            copy, "./src/main/resources/com/itextpdf/invoice.xml",
            "invoice.xml", null, "application/xml", parameters, 0);
    fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
    copy.addFileAttachment("invoice.xml", fileSpec);
    PdfArray array = new PdfArray();
    array.add(fileSpec.getReference());
    copy.getExtraCatalog().put(new PdfName("AF"), array);
    os.flush();
    reader.close();
    document.close();
    os.close();
    copy.close();
    return outputPdf;
}
The input file is already a PDF/A-3a document, so I think I don't need to redefine all the required things like embedded fonts, output intent...
Is there maybe a missing step that is mandatory when using PdfACopy that is not required with PdfCopy?
Would it help to try with iText 7?
Many thanks in advance!

As Bruno Lowagie pointed out in the comments, this is possible with iText 7. Here is the function in case it helps someone:
public static File embedFile(File inputPdf, File embeddedFile, String embeddedFileName, String embeddedFileMimeType)
        throws IOException {
    File outputPdf = new File("./test.pdf");
    PdfReader reader = new PdfReader(inputPdf.getAbsolutePath());
    PdfWriter writer = new PdfWriter(outputPdf.getAbsolutePath());
    PdfADocument pdfDoc = new PdfADocument(reader, writer);
    // Add attachment
    PdfDictionary parameters = new PdfDictionary();
    parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
    PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(pdfDoc, embeddedFile.getAbsolutePath(), embeddedFileName,
            embeddedFileName, new PdfName(embeddedFileMimeType), parameters, PdfName.Data);
    fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
    pdfDoc.addFileAttachment(embeddedFileName, fileSpec);
    PdfArray array = new PdfArray();
    array.add(fileSpec.getPdfObject().getIndirectReference());
    pdfDoc.getCatalog().put(new PdfName("AF"), array);
    pdfDoc.close();
    reader.close();
    writer.close();
    return outputPdf;
}

Related

PDFBox Open PDF file into new browser tab

I am using PDFBox 2.0. I need to open a PDF in a new browser tab, i.e. in print view.
We are migrating from iText to PDFBox; the existing iText code is shown below.
In that code, the PdfAction class achieves this:
PdfAction action = new PdfAction(PdfAction.PRINTDIALOG);
and the print JavaScript is applied to the document with:
copy.addJavaScript(action);
I need an equivalent solution with PDFBox.
Document document = new Document();
try {
    outputStream = response.getOutputStream();
    // step 2
    PdfCopy copy = new PdfCopy(document, outputStream);
    // step 3
    document.open();
    // step 4
    PdfReader reader;
    int n;
    // add print dialog in PdfAction to open file for preview.
    PdfAction action = new PdfAction(PdfAction.PRINTDIALOG);
    // loop over the documents you want to concatenate
    Iterator i = mergepdfFileList.iterator();
    while (i.hasNext()) {
        File f = new File((String) i.next());
        is = new FileInputStream(f);
        reader = new PdfReader(is);
        n = reader.getNumberOfPages();
        for (int page = 0; page < n; ) {
            copy.addPage(copy.getImportedPage(reader, ++page));
        }
        copy.freeReader(reader);
        reader.close();
        is.close();
    }
    copy.addJavaScript(action);
    // step 5
    document.close();
} catch (IOException io) {
    throw io;
} catch (DocumentException e) {
    throw e;
} catch (Exception e) {
    throw e;
} finally {
    outputStream.close();
}
I also tried the approach from another reference, but could not find a print() method on the PDDocument type.
Please guide me with this.

This code reproduces what your file has, a JavaScript action in the name tree in the JavaScript entry in the name dictionary in the document catalog. ("When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document" - PDF specification) There's probably an easier way to do this, e.g. with an OpenAction.
PDActionJavaScript javascript = new PDActionJavaScript("this.print(true);\n");
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDDocumentNameDictionary names = new PDDocumentNameDictionary(documentCatalog, new COSDictionary());
PDJavascriptNameTreeNode javascriptNameTreeNode = new PDJavascriptNameTreeNode();
Map<String, PDActionJavaScript> map = new HashMap<>();
map.put("0000000000000000", javascript);
javascriptNameTreeNode.setNames(map);
names.setJavascript(javascriptNameTreeNode);
document.getDocumentCatalog().setNames(names);

How to open a PdfADocument from an existing PdfDocument in itext7?

In order to check uploaded PDF files for basic PDF/A conformance, I need to read them in as PdfADocuments.
Starting with version 7.1.6, however, this no longer works and throws a PdfException(PdfException.PdfReaderHasBeenAlreadyUtilized).
class Controller
...
    // get uploaded data into PdfDocument, which is passed
    // on to different services.
    InputStream filecontent = fileupload.getInputStream();
    int read = 0;
    byte[] bytes = new byte[1024];
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    while ((read = filecontent.read(bytes, 0, bytes.length)) != -1) {
        filesize += read;
        buffer.write(bytes, 0, read);
    }
    ByteArrayInputStream input = new ByteArrayInputStream(buffer.toByteArray());
    PdfReader reader = new PdfReader(input);
    PdfWriter writer = new PdfWriter(new ByteArrayOutputStream());
    PdfDocument pdf = new PdfDocument(reader, writer);
    AnalyzerService analyzer = new AnalyzerService();
    if (analyzer.analyze(pdf)) {
        otherService.doSomethingWith(pdf);
    }
...
class AnalyzerService
...
    public boolean analyze(PdfDocument pdf) {
        PdfADocument pdfa = new PdfADocument(
            pdf.getReader(), pdf.getWriter() // <-- PdfException here
        );
        ...
    }
Up to and including iText 7.1.5 this worked.
With 7.1.6 I get "com.itextpdf.kernel.PdfException: Given PdfReader instance has already been utilized. The PdfReader cannot be reused, please create a new instance."
It seems that I need to get the Bytes from the PdfDocument as a byte[], then create a new PdfReader from it. I have tried getting them from the pdf.getReader().getOutputStream().toByteArray(), but that doesn't work.
I'm quite lost at the moment on how to create that PdfADocument from the given PdfDocument.

Your approach uses the same PdfReader and (even worse) the same PdfWriter for both a PdfDocument and a PdfADocument instance. As both can manipulate the PdfReader and write to the PdfWriter, that situation is likely to result in garbage in the writer, so you should not do this.
Always consider a document with both a reader and a writer as work in progress: something you cannot treat as a finished document file, e.g. by extracting it for intermediary checks.
As you want to check uploaded PDF files, why don't you simply forward the byte[] from buffer.toByteArray() to the analyze method and create a separate reader (and, if need be, a document) from it? That would check exactly the uploaded file.
Furthermore, if your input document may be PDF/A conformant and is treated specially in that case, shouldn't you also manipulate it as a PdfADocument if it is? I.e. shouldn't you first check for conformance in your analyzer and, in the positive case, also use a PdfADocument for it in your controller class?
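The suggestion to pass the byte[] around can be sketched with the JDK alone: buffer the upload once, then give every consumer its own stream over the same bytes. The class and method names here (UploadBuffer, readFully, freshStream) are hypothetical helpers, not part of iText.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class UploadBuffer {
    /** Reads the whole stream into memory so it can be re-read any number of times. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return buffer.toByteArray();
    }

    /** Each caller gets an independent stream positioned at the first byte. */
    static InputStream freshStream(byte[] uploaded) {
        return new ByteArrayInputStream(uploaded);
    }
}
```

Under this assumption, analyze would take the byte[] instead of a PdfDocument and build its own reader from freshStream(bytes), while the controller builds a second, unrelated reader from another fresh stream.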

// inputPath and outputPath are placeholders for your file paths
PdfDocument SourcePDF = null;
PdfADocument DisPDF = null;
try
{
    PdfReader Reader = new PdfReader(inputPath);
    PdfWriter writer = new PdfWriter(outputPath, new WriterProperties().SetPdfVersion(PdfVersion.PDF_2_0));
    writer.SetSmartMode(true);
    SourcePDF = new PdfDocument(Reader);
    DisPDF = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_3A,
        new PdfOutputIntent("Custom", "", "https://www.color.org", "sRGB", new MemoryStream(Properties.Resources.sRGB_CS_profiles)));
    DisPDF.InitializeOutlines();
    // Setting some required parameters
    DisPDF.SetTagged();
    DisPDF.GetCatalog().SetLang(new PdfString("en-EN"));
    DisPDF.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
    PdfMerger merger = new PdfMerger(DisPDF, true, true);
    merger.Merge(SourcePDF, 1, SourcePDF.GetNumberOfPages());
    SourcePDF.Close();
    DisPDF.Close();
}
catch (Exception ex)
{
    throw;
}

How to add metadata to PDF document using PDFbox?

I have an input stream of a PDF document available to me. I would like to add subject metadata to the document and then save it. I'm not sure how to do this.
I came across a sample recipe here: https://pdfbox.apache.org/1.8/cookbook/workingwithmetadata.html
However, it is still fuzzy. Below is what I'm trying and places where I have questions
PDDocument doc = PDDocument.load(myInputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
InputStream newXMPData = ...; // what goes here? How can I add a subject tag?
PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false);
catalog.setMetadata(newMetadata);
// does anything else need to happen to save the document?
// I would like an OutputStream of the document (with metadata) so that I can save it to an S3 bucket

The following code sets the title of a PDF document, but it should be adaptable to work with other properties as well:
public static byte[] insertTitlePdf(byte[] documentBytes, String title) {
    // try-with-resources closes the document even if saving fails
    try (PDDocument document = PDDocument.load(documentBytes)) {
        PDDocumentInformation info = document.getDocumentInformation();
        info.setTitle(title);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        document.save(baos);
        return baos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
Apache PDFBox is required; with Maven, for example, add this dependency:
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.6</version>
</dependency>
Add a title with:
byte[] documentBytesWithTitle = insertTitlePdf(documentBytes, "Some fancy title");
Display it in the browser with (JSF example):
<object class="pdf" data="data:application/pdf;base64,#{myBean.getDocumentBytesWithTitleAsBase64()}" type="application/pdf">Document could not be loaded</object>
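The bean method getDocumentBytesWithTitleAsBase64() is not shown above. One plausible way to produce the Base64 text for the data URI, using only java.util.Base64, might look like this; the class name PdfDataUri and the method name toBase64 are assumptions for illustration.

```java
import java.util.Base64;

class PdfDataUri {
    /** Encodes the PDF bytes as standard Base64 text for use in a data: URI. */
    static String toBase64(byte[] documentBytes) {
        return Base64.getEncoder().encodeToString(documentBytes);
    }
}
```

A bean getter could then return PdfDataUri.toBase64(documentBytesWithTitle). Note that data URIs carry the whole document inside the page, so this approach suits small PDFs best.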

Another much easier way to do this would be to use the built-in Document Information object:
PDDocument inputDoc = // your doc
inputDoc.getDocumentInformation().setCreator("Some meta");
inputDoc.getDocumentInformation().setCustomMetadataValue("fieldName", "fieldValue");
This also has the benefit of not requiring the xmpbox library.

This answer uses xmpbox and comes from the AddMetadataFromDocInfo example in the source code download:
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setDescription("descr");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);

iText mergeFields in PdfCopy creates invalid pdf

I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
    @Test
    public void testNoForms() throws DocumentException, IOException {
        test("pdf/hello.pdf", "pdf/hello_memory.pdf");
    }

    @Test
    public void testForms() throws DocumentException, IOException {
        test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
    }

    private void test(String first, String second) throws DocumentException, IOException {
        OutputStream out = new FileOutputStream("/tmp/out.pdf");
        InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
        PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
                new RandomAccessSourceFactory().createSource(stream)), null);
        InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
        PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
                new RandomAccessSourceFactory().createSource(stream2)), null);
        Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
        PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
        pdfCopy.setFullCompression();
        pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
        pdfCopy.setMergeFields();
        pdfDocument.open();
        pdfCopy.addDocument(reader);
        pdfCopy.addDocument(reader2);
        pdfCopy.close();
        reader.close();
        reader2.close();
    }
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks

We are using PdfCopy to merge different files, some of which may have fields. We use version 5.5.3.0. The code is simple and seems to work fine, BUT sometimes the resulting file is impossible to print!
Our code:
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
    Dim document As New Document()
    Dim output As New MemoryStream()
    Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
    Try
        copy = New iTextSharp.text.pdf.PdfCopy(document, output)
        copy.SetMergeFields()
        document.Open()
        For fileCounter As Integer = 0 To sourceFiles.Count - 1
            Dim reader As New PdfReader(sourceFiles(fileCounter))
            reader.MakeRemoteNamedDestinationsLocal()
            readers.Add(reader)
            copy.AddDocument(reader)
        Next
    Catch exception As Exception
        Throw exception
    Finally
        If copy IsNot Nothing Then copy.Close()
        document.Close()
        For Each reader As PdfReader In readers
            reader.Close()
        Next
    End Try
    Return output.GetBuffer()
End Function

Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.

Just to say that we have the same problem: iText mergeFields in PdfCopy creates an invalid PDF. So it is still not fixed in version 5.5.3.0.

Convert Outputstream to file

I'm stuck with a problem.
I need to create a PDF from an HTML source, and I did it this way:
File pdf = new File("/home/wrk/relatorio.pdf");
OutputStream out = new FileOutputStream(pdf);
InputStream input = new ByteArrayInputStream(build.toString().getBytes());//Build is a StringBuilder obj
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
I'm using JSP, so I need to send this file to the user as a download rather than write it on the server.
How do I turn this OutputStream output into a file in Java without writing it to the hard drive?

If you're using VRaptor 3.3.0+ you can use the ByteArrayDownload class. Starting with your code, you can use this:
@Path("/download-relatorio")
public Download download() {
    // Everything will be stored into this OutputStream
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    InputStream input = new ByteArrayInputStream(build.toString().getBytes());
    Tidy tidy = new Tidy();
    Document doc = tidy.parseDOM(input, null);
    ITextRenderer renderer = new ITextRenderer();
    renderer.setDocument(doc, null);
    renderer.layout();
    renderer.createPDF(out);
    out.flush();
    out.close();
    // Now that you have finished, return a new ByteArrayDownload()
    // The 2nd and 3rd parameters are the Content-Type and File Name
    // (which will be shown to the end-user)
    return new ByteArrayDownload(out.toByteArray(), "application/pdf", "Relatorio.pdf");
}

A File object does not actually hold the data but delegates all operations to the file system (see this discussion).
You could, however, create a temporary file using File.createTempFile. Also look here for a possible alternative without using a File object.

Use temporary files.
File temp = File.createTempFile(prefix, suffix);
prefix -- The prefix string defines the file's name; it must be at least three characters long.
suffix -- The suffix string defines the file's extension; if null, the suffix ".tmp" will be used.
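As a minimal sketch of that idea, assuming the PDF was first rendered into a ByteArrayOutputStream as in the answer above: the bytes can be copied into a temporary file that the JVM removes on exit. The class name TempPdf, the method name, and the "relatorio" prefix are illustrative choices only.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class TempPdf {
    /** Writes the in-memory PDF bytes to a temporary file and returns it. */
    static File writeToTempFile(ByteArrayOutputStream rendered) throws IOException {
        // The prefix must be at least three characters long.
        File temp = File.createTempFile("relatorio", ".pdf");
        // Ask the JVM to delete the file when it shuts down normally.
        temp.deleteOnExit();
        Files.write(temp.toPath(), rendered.toByteArray());
        return temp;
    }
}
```

If no File object is really needed, the same toByteArray() result can be written straight to the response's output stream, which avoids touching the disk at all.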
