How to open a PdfADocument from an existing PdfDocument in itext7?

How to open a PdfADocument from an existing PdfDocument in itext7? - java

In order to check uploaded PDF files for basic PDF/A conformance, I need to read them in as PdfADocuments.
But starting with version 7.1.6 this no longer works, but throws a PdfException(PdfException.PdfReaderHasBeenAlreadyUtilized)
class Controller
...
// get uploaded data into PdfDocument, which is passed
// on to different services.
InputStream filecontent = fileupload.getInputStream();
int read = 0;
byte[] bytes = new byte[1024];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
while ((read = filecontent.read(bytes,0,bytes.length)) != -1) {
filesize += read;
buffer.write(bytes, 0, read);
}
ByteArrayInputStream input = new ByteArrayInputStream(buffer.toByteArray());
PdfReader reader = new PdfReader(input);
PdfWriter writer = new PdfWriter(new ByteArrayOutputStream());
PdfDocument pdf = new PdfDocument(reader, writer);
AnalyzerService analyzer = new AnalyzerService();
if(analyzer.analyze(pdf)) {
otherService.doSomethingWith(pdf);
}
...
class AnalyzerService
...
public boolean analyze(PdfDocument pdf) {
PdfADocument pdfa = new PdfADocument(
pdf.getReader(), pdf.getWriter() <-- PdfException here
);
...
}
Up to and including iText 7.1.5 this worked.
With 7.1.6 I get "com.itextpdf.kernel.PdfException: Given PdfReader instance has already been utilized. The PdfReader cannot be reused, please create a new instance."
It seems that I need to get the Bytes from the PdfDocument as a byte[], then create a new PdfReader from it. I have tried getting them from the pdf.getReader().getOutputStream().toByteArray(), but that doesn't work.
I'm quite lost at the moment on how to create that PdfADocument from the given PdfDocument.

Your approach uses the same PdfReader and (even worse) the same PdfWriter for both a PdfDocument and a PdfADocument instance. As both can manipulate the PdfReader and write to the PdfWriter, that situation is likely to result in garbage in the writer, so you shall not do this.
Simply always consider a document with both a reader and a writer as work-in-progress, something one cannot treat as a finished document file, e.g. extract for intermediary checks.
As you want to check uploaded PDF files, why don't you simply forward the byte[] from buffer.toByteArray() to the analyze method to create a separate reader (and, if need be, a document) from? This indeed exactly would check the uploaded file...
Furthermore, if your input document may be PDF/A conform and is treated specially in that case, shouldn't you also manipulate it as a PdfADocument if it is? I.e. shouldn't you first check in your analyzer for conformance and in the positive case use a PdfADocument for it also in your controller class?

PdfDocument SourcePDF=null;
PdfADocument DisPDF =null;
try
{
PdfReader Reader = new PdfReader(input-Path);
PdfWriter writer = new PdfWriter(output-Path, new WriterProperties().SetPdfVersion(PdfVersion.PDF_2_0));
writer.SetSmartMode(true);
SourcePDF = new PdfDocument(Reader);
DisPDF = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_3A,
new PdfOutputIntent("Custom", "", "https://www.color.org", "sRGB", new MemoryStream(Properties.Resources.sRGB_CS_profiles)));
DisPDF.InitializeOutlines();
//Setting some required parameters
DisPDF.SetTagged();
DisPDF.GetCatalog().SetLang(new PdfString("en-EN"));
DisPDF.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
PdfMerger merger = new PdfMerger(DisPDF, true, true);
merger.Merge(SourcePDF, 1, sorsePDF.GetNumberOfPages());
SourcePDF.Close();
DisPDF.Close();
}
catch (Exception ex)
{
throw;
}

Related

Beanio - How to write to stream object

I am trying to create a fixed length file output using beanio. I don't want to write the physical file, instead I want to write content to an OutputStream.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
StreamFactory factory = StreamFactory.newInstance();
StreamBuilder builder = new StreamBuilder("sb")
.format("fixedlength")
.parser(new FixedLengthParserBuilder())
.addRecord(Team.class);
factory.define(builder);
OutputStreamWriter writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8);
BeanWriter bw = sf.createWriter("sb", writer);
bw.write(teamVO) // teamVO has some value.
try(OutputStream os = new FileOutputStream("file.txt")){
outputStream.writeTo(os); //Doing this only to check outputStream has some value
}
Here the file created, file.txt has no content in it, It is of size 0 kb.
I am able to write the file in the following method, but as I don't want to write the file in physical location, I thought of writing the contents in to a outputStream and then later in a different method, it can be converted in to a file
//This is working and creates file successfully
StreamFactory factory = StreamFactory.newInstance();
StreamBuilder builder = new StreamBuilder("sb")
.format("fixedlength")
.parser(new FixedLengthParserBuilder())
.addRecord(Team.class)
factory.define(builder);
BeanWriter bw = factory.createWriter("sb", new File("file.txt"));
bw.write(teamVO);
Why in the first approach the file is created with size 0 kb?

It looks like you haven't written enough data to the OutputstreamWriter for it to push some data to the underlying ByteArrayOutputStream. You have 2 options here.
Flush the writer object manually and then close it. This will then write the data to the outputStream. I would not recommend this approach because the writer may not be flushed or closed should there be any Exception before you could manually flush and close the writer.
Use a try-with-resource block for the writer which should take care of closing the writer and ultimately flushing the data to the outputStream
This should do the trick:
final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
try (final OutputStreamWriter writer = new OutputStreamWriter(outputStream) ) {
final BeanWriter bw = factory.createWriter("sb", writer);
final Team teamVO = new Team();
teamVO.setName("TESTING");
bw.write(teamVO); // teamVO has some value.
}
try (OutputStream os = new FileOutputStream("file.txt") ) {
outputStream.writeTo(os); // Doing this only to check outputStream has some value
}

iText difference between PdfCopy and PdfACopy

I wrote a function to embed a file as attachment inside a PDF/A-3a document using iText 5.5.13 (using instructions from iText tutorials).
If I attach the file using the class PdfCopy, the result is a correct PDF file, but it does not claim to be PDF/A (maybe it matches all the requirements, but it doesn't say).
If I do the same using PdfACopy, I get an wrongly built document:
InvalidPdfException: Rebuild failed: trailer not found.; Original
message: PDF startxref not found.
Here is my code a little simplified. Commented is the line to use a PdfCopy instead.
public static File embedFile(File inputPdf) {
File outputPdf = new File("./test.pdf");
PdfReader reader = new PdfReader(inputPdf.getAbsolutePath());
Document document = new com.itextpdf.text.Document();
OutputStream os = new FileOutputStream(outputPdf.getAbsolutePath());
PdfACopy copy = new PdfACopy(document, os, PdfAConformanceLevel.PDF_A_3A); // Output doc doesn't work
// PdfCopy copy = new PdfCopy(document, os); // Output doc works but doesn't claim to be PDF/A
document.open();
copy.addDocument(reader);
// Include attachment (extactly as in the sample tutorial)
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.MODDATE, new PdfDate());
PdfFileSpecification fileSpec = PdfFileSpecification.fileEmbedded(
writer, "./src/main/resources/com/itextpdf/invoice.xml",
"invoice.xml", null, "application/xml", parameters, 0);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
writer.addFileAttachment("invoice.xml", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getReference());
writer.getExtraCatalog().put(new PdfName("AF"), array);
os.flush();
reader.close();
document.close();
os.close();
copy.close();
return outputPdf;
}
The input file is already a PDF/A-3a document, so I think I don't need to redefine all the required things like embedded fonts, output intent...
Is there maybe a missing step that is mandatory when using PdfACopy that is not required with PdfCopy?
Would it help to try with iText 7?
Many thanks in advance!

As pointed by Bruno Lowagie in the comments, this is possible with iText 7. Here the function in case it helps someone:
public static File embedFile(File inputPdf, File embeddedFile, String embeddedFileName, String embeddedFileMimeType)
throws IOException {
File outputPdf = new File("./test.pdf");
PdfReader reader = new PdfReader(inputPdf.getAbsolutePath());
PdfWriter writer = new PdfWriter(outputPdf.getAbsolutePath());
PdfADocument pdfDoc = new PdfADocument(reader, writer);
// Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(pdfDoc, embeddedFile.getAbsolutePath(), embeddedFileName,
embeddedFileName, new PdfName(embeddedFileMimeType), parameters, PdfName.Data);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdfDoc.addFileAttachment(embeddedFileName, fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdfDoc.getCatalog().put(new PdfName("AF"), array);
pdfDoc.close();
reader.close();
writer.close();
return outputPdf;
}

iText mergeFields in PdfCopy creates invalid pdf

I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
#Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
#Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks

We are using PdfCopy to merge differents files, some of files may have fields. We use the version 5.5.3.0. The code is simple and it seems to work fine, BUT sometimes the result file is impossible to print!
Our code :
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.GetBuffer()
End Function

Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.

Its just to say that we have the same probleme : iText mergeFields in PdfCopy creates invalid pdf. So it is still not fixed in the version 5.5.3.0

Convert Outputstream to file

Well i'm stucked with a problem,
I need to create a PDF with a html source and i did this way:
File pdf = new File("/home/wrk/relatorio.pdf");
OutputStream out = new FileOutputStream(pdf);
InputStream input = new ByteArrayInputStream(build.toString().getBytes());//Build is a StringBuilder obj
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
well i'm using JSP so i need to download this file to the user not write in the server...
How do I transform this Outputstream output to a file in the java without write this file in hard drive ?

If you're using VRaptor 3.3.0+ you can use the ByteArrayDownload class. Starting with your code, you can use this:
#Path("/download-relatorio")
public Download download() {
// Everything will be stored into this OutputStream
ByteArrayOutputStream out = new ByteArrayOutputStream();
InputStream input = new ByteArrayInputStream(build.toString().getBytes());
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
// Now that you have finished, return a new ByteArrayDownload()
// The 2nd and 3rd parameters are the Content-Type and File Name
// (which will be shown to the end-user)
return new ByteArrayDownload(out.toByteArray(), "application/pdf", "Relatorio.pdf");
}

A File object does not actually hold the data but delegates all operations to the file system (see this discussion).
You could, however, create a temporary file using File.createTempFile. Also look here for a possible alternative without using a File object.

use temporary files.
File temp = File.createTempFile(prefix ,suffix);
prefix -- The prefix string defines the files name; must be at least three characters long.
suffix -- The suffix string defines the file's extension; if null the suffix ".tmp" will be used.

how to append data in docx file using docx4j

Please tell me how to append data in docx file using java and docx4j.
What I'm doing is, I am using a template in docx format in which some field are dilled by java at run time,
My problem is for every group of data it creates a new file and i just want to append the new file into 1 file. and this is not done using java streams
String outputfilepath = "e:\\Practice/DOC/output/generatedLatterOUTPUT.docx";
String outputfilepath1 = "e:\\Practice/DOC/output/generatedLatterOUTPUT1.docx";
WordprocessingMLPackage wordMLPackage;
public void templetsubtitution(String name, String age, String gender, Document document)
throws Exception {
// input file name
String inputfilepath = "e:\\Practice/DOC/profile.docx";
// out put file name
// id of Xml file
String itemId1 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
String itemId2 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
String itemId3 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
// Load the Package
if (inputfilepath.endsWith(".xml")) {
JAXBContext jc = Context.jcXmlPackage;
Unmarshaller u = jc.createUnmarshaller();
u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
org.docx4j.xmlPackage.Package wmlPackageEl = (org.docx4j.xmlPackage.Package) ((JAXBElement) u
.unmarshal(new javax.xml.transform.stream.StreamSource(
new FileInputStream(inputfilepath)))).getValue();
org.docx4j.convert.in.FlatOpcXmlImporter xmlPackage = new org.docx4j.convert.in.FlatOpcXmlImporter(
wmlPackageEl);
wordMLPackage = (WordprocessingMLPackage) xmlPackage.get();
} else {
wordMLPackage = WordprocessingMLPackage
.load(new File(inputfilepath));
}
CustomXmlDataStoragePart customXmlDataStoragePart = wordMLPackage
.getCustomXmlDataStorageParts().get(itemId1);
// Get the contents
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart
.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:name[1]", name,
"xmlns:ns0='EasyForm'");
customXmlDataStoragePart = wordMLPackage.getCustomXmlDataStorageParts()
.get(itemId2);
// Get the contents
customXmlDataStorage = customXmlDataStoragePart.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:age[1]", age,
"xmlns:ns0='EasyForm'");
customXmlDataStoragePart = wordMLPackage.getCustomXmlDataStorageParts()
.get(itemId3);
// Get the contents
customXmlDataStorage = customXmlDataStoragePart.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:gender[1]", gender,
"xmlns:ns0='EasyForm'");
// Apply the bindings
BindingHandler.applyBindings(wordMLPackage.getMainDocumentPart());
File f = new File(outputfilepath);
wordMLPackage.save(f);
FileInputStream fis = new FileInputStream(f);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
try {
for (int readNum; (readNum = fis.read(buf)) != -1;) {
bos.write(buf, 0, readNum);
}
// System.out.println( buf.length);
} catch (IOException ex) {
}
byte[] bytes = bos.toByteArray();
FileOutputStream file = new FileOutputStream(outputfilepath1, true);
DataOutputStream out = new DataOutputStream(file);
out.write(bytes);
out.flush();
out.close();
System.out.println("..done");
}
public static void main(String[] args) {
utility u = new utility();
u.templetsubtitution("aditya",24,mohan);
}
thanks in advance

If I understand you correctly, you're essentially talking about merging documents. There are two very simple approaches that you can use, and their effectiveness really depends on the structure and onward use of your data:
PhilippeAuriach describes one approach in his answer, which entails
appending all components within a MaindocumentPart instance to
another. In terms of the final docx file, this means the content
that appears in document.xml -- it won't take into account headers
and footers ( for example), but that may be fine for you.
You can insert multiple documents into a single docx file by inserting them
as AltChunk elements (see the docx4j documentation). This will
bring everything from one Word file into another, headers and all.
The downside of this is that your final document won't be a proper
flowing Word file until you open it and save it in MS Word itself
(the imported components remain as standalone files within the docx
bundle). This will cause you issues if you want to generated
'merged' files and then do something with them like render PDFs --
the merged content will simply be ignored.
The more complete (and complex) approach is to perform a "deep merge". This updates and maintains all references held within a document. Imported content becomes part of the main "flow" of the document (i.e. it is not stored as separate references), so the end result is a properly-merged file which can be rendered to PDF or whatever.
The downside to this is you need a good knowledge of docx structure and the API, and you will be writing a fair amount of code (I would recommend buying a license to Plutext's MergeDocx instead).

I had to deal with similar things, and here is what I did (probably not the most efficient, but working) :
create a finalDoc loading the template, and emptying it (so you have the styles in this doc)
for each data row, create a new doc loading the template, then replace your fields with your values
use the function below to append the doc filled with the datas to the finalDoc :
public static void append(WordprocessingMLPackage docDest, WordprocessingMLPackage docSource) {
List<Object> objects = docSource.getMainDocumentPart().getContent();
for(Object o : objects){
docDest.getMainDocumentPart().getContent().add(o);
}
}
Hope this helps.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.