iText - OutOfMemory creating more than 1000 PDFs - java

I want to create a ZipOutputStream filled with PDF/A files. I'm using iText (version 5.5.7). For more than 1000 PDF entries I get an OutOfMemoryError on doc.close() and can't find the leak.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(baos));
zos.setEncoding("Cp850");
for (MyObject o : objects) {
    try {
        String pdfFilename = o.getName() + ".pdf";
        zos.putNextEntry(new ZipEntry(pdfFilename));
        pdfBuilder.buildPdfADocument(zos);
        zos.closeEntry();
    } ...
PdfBuilder
public void buildPdfADocument(org.apache.tools.zip.ZipOutputStream zos) {
    Document doc = new Document(PageSize.A4);
    PdfAWriter writer = PdfAWriter.getInstance(doc, zos, PdfAConformanceLevel.PDF_A_1B);
    writer.setCloseStream(false); // to not close my zos
    writer.setViewerPreferences(PdfWriter.ALLOW_PRINTING | PdfWriter.PageLayoutSinglePage);
    writer.createXmpMetadata();
    doc.open();
    // adding Elements to doc,
    // with flushContent() on PdfPTables
    InputStream sRGBprofile = servletContext.getResourceAsStream("/WEB-INF/conf/AdobeRGB1998.icc");
    ICC_Profile icc = ICC_Profile.getInstance(sRGBprofile);
    writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
    // try to close/flush everything possible
    doc.close();
    writer.setXmpMetadata(null);
    writer.flush();
    writer.close();
    if (sRGBprofile != null) {
        sRGBprofile.close();
    }
}
Any suggestions how can I fix it? Am I forgetting something?
I've already tried using java.util.zip.ZipOutputStream instead, but it doesn't make any difference.
Thanks for your answers! I understand the issue with the ByteArrayOutputStream, but I'm not sure what the best approach is in my case. It's a web application, and I need to store the ZIP in a database blob somehow.
What I am doing now is writing the PDFs directly into the ZipOutputStream with iText and saving the byte array of the corresponding ByteArrayOutputStream to the blob. The options I see are:
Split my data into packages of 500 objects, save the first 500 PDFs to the database, then open the zip and add the next 500, and so on... But I assume this leaves me in the same situation as now, namely with a stream that grows too big in memory.
Save the PDFs on the server (not sure if there's enough space), create a temporary zip file, and then submit its bytes to the blob...
Any suggestions/ideas?

It's because your ZipOutputStream is backed by a ByteArrayOutputStream, so even after you close each entry, the full ZIP contents stay in memory.

You need another approach to handle this number of documents (1000+ files).
In your example you are keeping all the PDF data in memory; you will need to work in blocks of documents to limit this memory load.
Another approach is to serialize your PDFs to the filesystem and then create your zip file from them.
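The filesystem approach can be sketched with the plain JDK ZIP API: stream the archive to a temporary file so that only one entry's data is ever held in memory, then copy that file into the blob. This is a minimal sketch; the class and method names are hypothetical, and the dummy write stands in for the iText buildPdfADocument(zos) call:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipToTempFile {
    // Builds the ZIP on disk; the heap only holds one entry's data at a time.
    static Path buildZip(Iterable<String> names) throws IOException {
        Path tmp = Files.createTempFile("pdfs", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name + ".pdf"));
                // here the real code would call pdfBuilder.buildPdfADocument(zos)
                zos.write(("dummy content for " + name).getBytes());
                zos.closeEntry();
            }
        }
        return tmp; // stream this file into the blob, then delete it
    }
}
```

Feeding the temp file to JDBC via Files.newInputStream(tmp) and PreparedStatement.setBinaryStream means the full archive never has to be materialized in the heap.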

Related

How to fix java.lang.OutOfMemoryError: Java heap space when generating pdf document?

My system throws "java.lang.OutOfMemoryError: Java heap space" when it processes a huge file. I realized that StringWriter.toString() doubles the size on the heap, so it could be causing the issue. How can I optimize the following block of code to avoid running out of memory?
public byte[] generateFromFo(final StringWriter foString) {
    try {
        StringReader foReader = new StringReader(foString.toString());
        ByteArrayOutputStream pdfWriter = new ByteArrayOutputStream();
        Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, fopFactory.newFOUserAgent(), pdfWriter);
        TRANSFORMER_FACTORY.newTransformer().transform(new StreamSource(foReader), new SAXResult(fop.getDefaultHandler()));
        LOG.debug("Completed rendering PDF output!");
        return pdfWriter.toByteArray();
    } catch (Exception e) {
        LOG.error("Error while generating PDF from FO", e);
        throw new AuditReportExportServiceException(AuditErrorCode.INTERNAL_ERROR, "Could not generate PDF from XSL-FO");
    }
}
Using an InputStream of bytes may reduce the memory for foString by up to a factor of 2 (a char is 2 bytes).
A ByteArrayOutputStream resizes as it fills, so passing an estimated initial capacity speeds things up and may prevent excessive resizing.
InputStream foReader = new ByteArrayInputStream(
        foString.toString().getBytes(StandardCharsets.UTF_8));
foString.close();
final int initialCapacity = 160 * 1024;
ByteArrayOutputStream pdfWriter = new ByteArrayOutputStream(initialCapacity);
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, fopFactory.newFOUserAgent(),
        pdfWriter);
TRANSFORMER_FACTORY.newTransformer().transform(new StreamSource(foReader),
        new SAXResult(fop.getDefaultHandler()));
The best would be to change the API:
public void generateFromFo(final String foString, OutputStream pdfOut) { ... }
This might make the ByteArrayOutputStream superfluous, and you might immediately stream to a file, URL, or whatever.
The document itself and the generated PDF also have issues:
image sizes (but remember the higher resolution of prints)
some images can be nicely vectorized
repeated images like in a page header, should be stored once
fonts should ideally be the standard fonts; second best, embedded subsets (of the used characters)
XML might be suboptimal, very repetitive
Broadly, you have two main options:
Increase the memory available to your process. Java's -Xmx option sets this; you could pass e.g. -Xmx8G to ask for 8 GB of memory on a 64-bit system, if you have that much. Docs are here: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/java.html#nonstandard
Change your code to "stream" the data through in smaller chunks, rather than assembling the whole file into a byte[] in memory as you have done here. You could direct the transformer's output to a FileOutputStream instead of a ByteArrayOutputStream and return a File rather than a byte[]. Or, depending on what you do with the output of this method, you could return an InputStream and let the consumer receive the file data in a streaming fashion.
You may also need to change things so that the input to this method is consumed in a streaming fashion. How to do that depends on the details of how StringWriter foString was created. You may need to "pipe" an OutputStream into an InputStream to make this work, see https://docs.oracle.com/javase/7/docs/api/java/io/PipedInputStream.html
Option 1 is simpler; option 2 is probably better here.

iText - Read pdfs in a loop and merge one by one

I have a loop in which I read a PDF file each time, and I want to add these PDF files into another, final PDF, basically merging all the PDFs into one new PDF file.
I tried the following way:
I tried to concatenate the byte arrays inside the loop, like
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (............) {
    outputStream.write(myInfo.getByteArray());
}
pdfreader = new PdfReader(outputStream.toByteArray());
FileOutputStream fileout = new FileOutputStream(file);
PdfStamper pdfStamper = new PdfStamper(pdfreader, fileout);
pdfStamper.close();
pdfreader.close();
The problem is that the final PDF does not contain all the PDFs; it has only one.
And I am not sure this is the right way to do it.
Is there any other way to merge PDFs one by one?
Have a look at the documentation for PdfMerger: itextsupport.com/apidocs/iText7/7.0.3/com/itextpdf/kernel/utils/PdfMerger.html
I can prepare an example later if needed.
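A minimal sketch of what that would look like with the iText 7 PdfMerger linked above, assuming the source PDFs are available as file paths (the names here are hypothetical):

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.utils.PdfMerger;
import java.util.List;

public class MergeExample {
    // Copies all pages of each source PDF, in order, into one destination PDF.
    static void merge(List<String> sourceFiles, String destFile) throws Exception {
        try (PdfDocument dest = new PdfDocument(new PdfWriter(destFile))) {
            PdfMerger merger = new PdfMerger(dest);
            for (String src : sourceFiles) {
                try (PdfDocument srcDoc = new PdfDocument(new PdfReader(src))) {
                    merger.merge(srcDoc, 1, srcDoc.getNumberOfPages());
                }
            }
        }
    }
}
```

Note that simply concatenating raw PDF byte arrays, as in the question, cannot work: a PDF has a single header, cross-reference table, and trailer, so the parser only sees the first (or last) document. A merger rebuilds those structures.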

How to convert InputStream to a PDF in Java, without damaging the file?

I have an InputStream which I would like to convert to a PDF, and save that PDF in a directory. Currently, my code is able to convert the InputStream to a PDF and the PDF does show up in the correct directory. However, when I try to open it, the file is damaged.
Here is the current code:
InputStream pAdESStream = signingServiceConnector.getDirectClient().getPAdES(this.statusReader.getStatusResponse().getpAdESUrl());
byte[] buffer = new byte[pAdESStream.available()];
pAdESStream.read(buffer);
File targetFile = new File(System.getProperty("user.dir") + "targetFile2.pdf");
OutputStream outStream = new FileOutputStream(targetFile);
outStream.write(buffer);
Originally, the InputStream was a pAdES-file (https://en.wikipedia.org/wiki/PAdES). However, it should be able to be read as just a regular PDF.
Does anyone know how to convert the InputStream to a PDF, without getting a damaged PDF as a result?
It might be a bit late, but you can use the PDFBox API (or iText). Here is a tutorial covering the process:
https://www.tutorialkart.com/pdfbox/create-write-text-pdf-file-using-pdfbox/
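That said, the damage is most likely caused by the copying code itself rather than the PAdES format: InputStream.available() is only an estimate of the bytes readable without blocking, not the stream's length, and a single read() call may return fewer bytes than the buffer holds, so the written file can be truncated. A byte-for-byte copy avoids both problems; this is a sketch using the Java 9+ transferTo, with hypothetical names:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamToFile {
    // Copies any InputStream to a file verbatim; no available()/partial-read bugs.
    static long saveToFile(InputStream in, Path target) throws IOException {
        try (in; OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            return in.transferTo(out); // loops internally until the stream is exhausted
        }
    }
}
```

No PDF library is needed just to persist the stream: if the bytes coming in are a valid PDF, writing them out unchanged yields a valid PDF.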

Compressing content in Java without file I/O

I would like to perform a repeated compression task for CPU profiling without doing any file I/O, strictly reading from a byte stream. I want to do this in Java (the target of my benchmark).
Does anyone have a suggestion how to do this?
I used Zip API that uses ZipEntry but ZipEntry triggers file I/O.
Any suggestions or code samples are highly appreciated.
I used Zip API that uses ZipEntry but ZipEntry triggers file I/O.
I wouldn't expect it to if you use a ByteArrayOutputStream as the underlying output:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zipStream = new ZipOutputStream(baos);
... write to zipStream ...
Likewise wrap your byte array for reading data from in a ByteArrayInputStream.
Of course, ZipOutputStream is appropriate if you want to create content using the zip compression format, which is good for (or at least handles :) multiple files. For a single stream of data, you may want to use DeflaterOutputStream or GZIPOutputStream, again using a ByteArrayOutputStream as the underlying output.
Instead of using a FileInputStream or FileOutputStream you can use ByteArrayInputStream and ByteArrayOutputStream.
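For a single data stream, the DeflaterOutputStream round trip suggested above can be done entirely in the heap. This is a small self-contained sketch (pure JDK, no file I/O), suitable as the inner loop of a CPU benchmark:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class InMemoryCompression {
    // Deflates a byte array entirely in memory.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DeflaterOutputStream def = new DeflaterOutputStream(baos)) {
            def.write(data); // all compression happens in the heap
        }
        return baos.toByteArray();
    }

    // Inflates it back, again without touching the filesystem.
    static byte[] decompress(byte[] compressed) throws IOException {
        try (InflaterInputStream inf =
                new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return inf.readAllBytes();
        }
    }
}
```

Swapping in GZIPOutputStream/GZIPInputStream gives the same shape with gzip framing, and the ZipOutputStream variant above works the same way when you need named entries.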

Append full PDF file to FOP PDF

I have an xml file already being created and rendered as a PDF sent over a servlet:
TraxInputHandler input = new TraxInputHandler(
new File(XML_LOCATION+xmlFile+".xml"),
new File(XSLT_LOCATION)
);
ByteArrayOutputStream out = new ByteArrayOutputStream();
//driver is just `new Driver()`
synchronized (driver) {
driver.reset();
driver.setRenderer(Driver.RENDER_PDF);
driver.setOutputStream(out);
input.run(driver);
}
//response is HttpServletResponse
byte[] content = out.toByteArray();
response.setContentType("application/pdf");
response.setContentLength(content.length);
response.getOutputStream().write(content);
response.getOutputStream().flush();
This is all working perfectly fine.
However, I now have another PDF file that I need to include in the output. This is just a totally separate .pdf file that I was given. Is there any way that I can append this file either to the response, the driver, out, or anything else to include it in the response to the client? Will that even work? Or is there something else I need to do?
We also use FOP to generate some documents, and we accept uploaded documents, all of which we eventually combine into a single PDF.
You can't just send them sequentially out the stream, because the combined result needs a proper PDF file header, metadata, etc.
We use the iText library to combine the files, starting off with
PdfReader reader = new PdfReader(/*String*/fileName);
reader.consolidateNamedDestinations();
We later loop through adding pages from each pdf to the new combined destination pdf, adjusting the bookmark / page numbers as we go.
AFAIK, FOP doesn't provide this sort of functionality.
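The "loop through adding pages" step can be sketched with iText's PdfCopy (iText 5 API; file names and the output stream are placeholders, and bookmark/page-number adjustment is omitted):

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import java.io.OutputStream;
import java.util.List;

public class CombinePdfs {
    // Writes one combined PDF (FOP output plus uploaded PDFs) to the stream,
    // e.g. the servlet response.
    static void combine(List<String> fileNames, OutputStream out) throws Exception {
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, out);
        document.open();
        for (String fileName : fileNames) {
            PdfReader reader = new PdfReader(fileName);
            reader.consolidateNamedDestinations();
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                copy.addPage(copy.getImportedPage(reader, i));
            }
            reader.close();
        }
        document.close(); // writes the combined header, xref table, and trailer
    }
}
```

The FOP-rendered bytes from `out.toByteArray()` can be fed in the same way via the PdfReader(byte[]) constructor instead of a file name.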
