HTML to PDF with base64 images throws FileNotFoundException - java

I'm using itextpdf-5.0.6.jar (Java 8) and when I try to export html code with base64 image tag I get file not found exception.
if I remove the image tag everything works great!
I found few solutions about overriding image tag processor but most of them are old and not compatiable with the 5.0.6 version.
Here is the HTML I send:
"<!doctype html>\n<html lang=\"en\">\n<head>\n
<meta charset=\"UTF-8\">\n
<title>Test PDF</title>\n</head>\n<body>\n\n
<div class=\"pdf-header\">\n\n
<img src=\"\"> \n\n\n</div>\n\n<div class=\"main\">\n<div class=\"canvas\">\nHellow world</div></div></body>\n</html>"
part of my code:
fileOutputStream = new FileOutputStream(file);
Document document = new Document();
PdfWriter.getInstance(document, fileOutputStream);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
StringReader stringReader = new StringReader(htmlCode);
htmlWorker.parse(stringReader);
document.close();
fileOutputStream.close();
any help will be appricated
thanks

Please stop using HTMLWorker, as repeated many times on StackOverflow, the HTMLWorker class has been abandoned in favor of XML Worker a long time ago. We won't invest in further development of HTMLWorker so it's a very bad choice to use it. Please switch to XML Worker.
Also upgrade to the latest iText version, the version you are using dates from February 4, 2011, many bugs have been fixed in the 4 years that have passed. Make sure you have both the iText jar and the XML Worker jar with the same version number.
Base64 images aren't supported yet, but I have made you a very simple Proof of Concept, showing how easy it is to add support for such images. Take a look at the ParseHtml4 example and the resulting PDF: html_4.pdf.
To achieve this, you need to write an implementation of the ImageProvider interface. I have done this by extending the AbstractImageProvider class:
class Base64ImageProvider extends AbstractImageProvider {
#Override
public Image retrieve(String src) {
int pos = src.indexOf("base64,");
try {
if (src.startsWith("data") && pos > 0) {
byte[] img = Base64.decode(src.substring(pos + 7));
return Image.getInstance(img);
}
else {
return Image.getInstance(src);
}
} catch (BadElementException ex) {
return null;
} catch (IOException ex) {
return null;
}
}
#Override
public String getImageRootPath() {
return null;
}
}
As you can see, I check for the existence of "base64," in whatever is passed to XML Worker through the src attribute of the img tag. If that String is present, I decode whatever follows that "base64," and I return an Image object that is created using the resulting bytes.
Once you have this ImageProvider implementation, it's only a matter of passing it to XML Worker.

Related

Why is my form being flattened without calling the flattenFields method?

I am testing my method with this form https://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
It is being called like so:
Pdf.editForm("./src/main/resources/pdfs/interactiveform_enabled.pdf", "./src/main/resources/pdfs/FILLEDOUT.pdf"));
where Pdf is just a worker class and editForm is a static method.
The editForm method looks like this:
public static int editForm(String inputPath, String outputPath) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> m = form.getFormFields();
for (String s : m.keySet()) {
if (s.equals("Name_First")) {
m.get(s).setValue("Tristan");
}
if (s.equals("BACHELORS DEGREE")) {
m.get(s).setValue("Off"); // On or Off
}
if (s.equals("Sex")) {
m.get(s).setValue("FEMALE");
}
System.out.println(s);
}
pdf.close();
logger.info("Completed");
} catch (IOException e) {
logger.error("Unable to fill form " + outputPath + "\n\t" + e);
return 1;
}
return 0;
}
Unfortunately the FILLEDOUT.pdf file is no longer a form after calling this method. Am I doing something wrong?
I was using this resource for guidance. Notice how I am not calling the form.flattenFields(). If I do call that method however, I get an error of java.lang.IllegalArgumentException.
Thank you for your time.
Your form is Reader-enabled, i.e. it contains a usage rights digital signature by a key and certificate issued by Adobe to indicate to a regular Adobe Reader that it shall activate a number of additional features when operating on that very PDF.
If you stamp the file as in your original code, the existing PDF objects will get re-arranged and slightly changed. This breaks the usage rights signature, and Adobe Reader, recognizing that, disclaims "The document has been changed since it was created and use of extended features is no longer available."
If you stamp the file in append mode, though, the changes are appended to the PDF as an incremental update. Thus, the signature still correctly signs its original byte range and Adobe Reader does not complain.
To activate append mode, use StampingProperties when you create your PdfDocument:
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath), new StampingProperties().useAppendMode());
(Tested with iText 7.1.1-SNAPSHOT and Adobe Acrobat Reader DC version 2018.009.20050)
By the way, Adobe Reader does not merely check the signature, it also tries to determine whether the changes in the incremental update don't go beyond the scope of the additional features activated by the usage rights signature.
Otherwise you could simply take a small Reader-enabled PDF and in append mode replace all existing pages by your own content of choice. This of course is not in Adobe's interest...
The filled in PDF is still an AcroForm, otherwise the example below would result in the same PDF twice.
public class Main {
public static final String SRC = "src/main/resources/interactiveform_enabled.pdf";
public static final String DEST = "results/filled_form.pdf";
public static final String DEST2 = "results/filled_form_second_time.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
Main main = new Main();
Map<String, String> data1 = new HashMap<>();
data1.put("Name_First", "Tristan");
data1.put("BACHELORS DEGREE", "Off");
main.fillPdf(SRC, DEST, data1, false);
Map<String, String> data2 = new HashMap<>();
data2.put("Sex", "FEMALE");
main.fillPdf(DEST, DEST2, data2, false);
}
private void fillPdf(String src, String dest, Map<String, String> data, boolean flatten) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
//Delete print field from acroform because it is defined in the contentstream not in the formfields
form.removeField("Print");
Map<String, PdfFormField> m = form.getFormFields();
for (String d : data.keySet()) {
for (String s : m.keySet()) {
if(s.equals(d)){
m.get(s).setValue(data.get(d));
}
}
}
if(flatten){
form.flattenFields();
}
pdf.close();
System.out.println("Completed");
} catch (IOException e) {
System.out.println("Unable to fill form " + dest + "\n\t" + e);
}
}
}
The issue you are facing has to do with the 'reader enabled forms'.
What it boils down to is that the PDF file that is initially fed to your program is reader enabled. Hence you can open the PDF in Adobe Reader and fill in the form. This allows Acrobat users to extend the behaviour of Adobe Reader.
Once the PDF is filled in and closed using iText it saves the PDF as 'not reader-extended'.
This makes it so that the AcroForm can still be filled using iText but when you open the PDF using Adobe Reader the extended functionality you see in the original PDF is gone. But this does not mean the form is flattened.
iText cannot make a form reader enabled, as a matter of fact, the only way to create a reader enabled form is using Acrobat Professional. This is how Acrobat and Adobe Reader interact and it is not something iText can imitate or solve. You can find some more info and a possible solution on this link.
The IllegalArgumentException you get when you call the form.flattenFields() method is because of the way the PDF document was constructed.
The "Print form" button should have been defined in the AcroForm, yet it is defined in the contentstream of the PDF, meaning the button in the AcroForm has an empty text value, and this is what causes the exception.
You can fix this by removing the print field from the AcroForm before you flatten.
IllegalArgumentException issue has been fixed in iText 7.1.5.

Add html code with base64 images in a header using iText [duplicate]

I'm using itextpdf-5.0.6.jar (Java 8) and when I try to export html code with base64 image tag I get file not found exception.
if I remove the image tag everything works great!
I found few solutions about overriding image tag processor but most of them are old and not compatiable with the 5.0.6 version.
Here is the HTML I send:
"<!doctype html>\n<html lang=\"en\">\n<head>\n
<meta charset=\"UTF-8\">\n
<title>Test PDF</title>\n</head>\n<body>\n\n
<div class=\"pdf-header\">\n\n
<img src=\"\"> \n\n\n</div>\n\n<div class=\"main\">\n<div class=\"canvas\">\nHellow world</div></div></body>\n</html>"
part of my code:
fileOutputStream = new FileOutputStream(file);
Document document = new Document();
PdfWriter.getInstance(document, fileOutputStream);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
StringReader stringReader = new StringReader(htmlCode);
htmlWorker.parse(stringReader);
document.close();
fileOutputStream.close();
any help will be appricated
thanks
Please stop using HTMLWorker, as repeated many times on StackOverflow, the HTMLWorker class has been abandoned in favor of XML Worker a long time ago. We won't invest in further development of HTMLWorker so it's a very bad choice to use it. Please switch to XML Worker.
Also upgrade to the latest iText version, the version you are using dates from February 4, 2011, many bugs have been fixed in the 4 years that have passed. Make sure you have both the iText jar and the XML Worker jar with the same version number.
Base64 images aren't supported yet, but I have made you a very simple Proof of Concept, showing how easy it is to add support for such images. Take a look at the ParseHtml4 example and the resulting PDF: html_4.pdf.
To achieve this, you need to write an implementation of the ImageProvider interface. I have done this by extending the AbstractImageProvider class:
class Base64ImageProvider extends AbstractImageProvider {
#Override
public Image retrieve(String src) {
int pos = src.indexOf("base64,");
try {
if (src.startsWith("data") && pos > 0) {
byte[] img = Base64.decode(src.substring(pos + 7));
return Image.getInstance(img);
}
else {
return Image.getInstance(src);
}
} catch (BadElementException ex) {
return null;
} catch (IOException ex) {
return null;
}
}
#Override
public String getImageRootPath() {
return null;
}
}
As you can see, I check for the existence of "base64," in whatever is passed to XML Worker through the src attribute of the img tag. If that String is present, I decode whatever follows that "base64," and I return an Image object that is created using the resulting bytes.
Once you have this ImageProvider implementation, it's only a matter of passing it to XML Worker.

iText mergeFields in PdfCopy creates invalid pdf

I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
#Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
#Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks
We are using PdfCopy to merge differents files, some of files may have fields. We use the version 5.5.3.0. The code is simple and it seems to work fine, BUT sometimes the result file is impossible to print!
Our code :
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.GetBuffer()
End Function
Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.
Its just to say that we have the same probleme : iText mergeFields in PdfCopy creates invalid pdf. So it is still not fixed in the version 5.5.3.0

Print PDF that contains JBIG2 images

Please, suggest me some libraries that will help me print PDF files that contain JBIG2 encoded images. PDFRenderer, PDFBox don't help me. These libs can print simple PDF, but not PDF containing JBIG2 images. PDFRenderer tries to fix it (according to bug issue on PDFRedndrer's bug tracker), but some pages still (especially where barcodes exist) don't want to print.
P.S. I use javax.print API within applet
Thanks!
UPDATE: also tried ICEPdf, is too don't want to work.
I came to the conclusion that all these libraries(PDFRenderer, ICEPdf, PDFBox) use JPedals jbig2 decoder. Bug (some pages didn't print) come from this decoder library. The open source version of this decoder (which is used in PDFRenderer, ICEPdf, PDFBox) is no longer supported, but JPedal has a new commercial branch of the project, and they wrote that the bug has been fixed in new commercial release, which costs $9k.
Any ideas?
UPDATE 2: yesterday I tried to replace JPedal's free library with other open-source jbig2-imageio libraries. But yet I don't get any successful results, so I created a new topic on their project's page (google-code's forum - here ). Would be grateful for any help.
I also found some helpfull discussions on Apache PDFBox bug-tracker: here and here.
As going through your comment in yms answer ie. " but what library I can use to extract images and (more importantly) put them back in PDF?"
Here is a simple demonstration of
1 ) Extracting jbig2 or you can say all images from pdf.
2 ) Converting jbig2 image to any other format, in my case its jpeg.
3 ) Creating new pdf containing the jpeg.
Using libraries jbig2-imageio and itext.
In the below example please change the resources and the directories path as per your need.
For this I had to go through several resources that I will attach in the end. Hope this helps.
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.*;
import com.levigo.jbig2.JBIG2ImageReader;
import com.levigo.jbig2.JBIG2ImageReaderSpi;
import com.levigo.jbig2.JBIG2ReadParam;
import com.levigo.jbig2.io.DefaultInputStreamFactory;
import java.awt.image.BufferedImage;
import java.io.*;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;
public class JBig2Image {
private String filepath;
private int imageIndex;
public JBig2Image() {
this.filepath = "/home/blackadmin/Desktop/pdf/demo18.jbig2";
this.imageIndex = 0;
extractImgFromPdf();
convertJBig2ToJpeg();
createPDF();
}
private void extractImgFromPdf() {
try {
/////////// Extract all Images from pdf /////////////////////////
PdfReader reader = new PdfReader("/home/blackadmin/Desktop/pdf/orig.pdf");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener("/home/blackadmin/Desktop/pdf/demo%s.%s");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
parser.processContent(i, listener);
}
} catch (IOException ex) {
System.out.println(ex);
}
}
private void convertJBig2ToJpeg() {
InputStream inputStream = null;
try {
///////// Read jbig2 image ////////////////////////////////////////
inputStream = new FileInputStream(new File(filepath));
DefaultInputStreamFactory disf = new DefaultInputStreamFactory();
ImageInputStream imageInputStream = disf.getInputStream(inputStream);
JBIG2ImageReader imageReader = new JBIG2ImageReader(new JBIG2ImageReaderSpi());
imageReader.setInput(imageInputStream);
JBIG2ReadParam param = imageReader.getDefaultReadParam();
BufferedImage bufferedImage = imageReader.read(imageIndex, param);
////////// jbig2 to jpeg ///////////////////////////////////////////
ImageIO.write(bufferedImage, "jpeg", new File("/home/blackadmin/Desktop/pdf/demo18.jpeg"));
} catch (IOException ex) {
System.out.println(ex);
} finally {
try {
inputStream.close();
} catch (IOException ex) {
System.out.println(ex);
}
}
}
public void createPDF() {
Document document = new Document();
try {
PdfWriter.getInstance(document,
new FileOutputStream("/home/blackadmin/Desktop/pdf/output.pdf"));
document.open();
PdfPTable table = new PdfPTable(1); //1 column.
Image image = Image.getInstance("/home/blackadmin/Desktop/pdf/demo18.jpeg");
image.scaleToFit(800f, 600f);
image.scaleAbsolute(800f, 600f); // Give the size of image you want to print on pdf
PdfPCell nestedImgCell = new PdfPCell(image);
table.addCell(nestedImgCell);
document.add(table);
document.close();
System.out.println(
"======== PDF Created Successfully =========");
} catch (Exception e) {
System.out.println(e);
}
}
public static void main(String[] args) throws IOException {
new JBig2Image();
}
}
class MyImageRenderListener implements RenderListener {
/**
* The new document to which we've added a border rectangle.
*/
protected String path = "";
/**
* Creates a RenderListener that will look for images.
*/
public MyImageRenderListener(String path) {
this.path = path;
}
/**
* #see com.itextpdf.text.pdf.parser.RenderListener#beginTextBlock()
*/
public void beginTextBlock() {
}
/**
* #see com.itextpdf.text.pdf.parser.RenderListener#endTextBlock()
*/
public void endTextBlock() {
}
/**
* #see com.itextpdf.text.pdf.parser.RenderListener#renderImage(
* com.itextpdf.text.pdf.parser.ImageRenderInfo)
*/
public void renderImage(ImageRenderInfo renderInfo) {
try {
String filename;
FileOutputStream os;
PdfImageObject image = renderInfo.getImage();
if (image == null) {
return;
}
filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
os = new FileOutputStream(filename);
os.write(image.getImageAsBytes());
os.flush();
os.close();
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
/**
* #see com.itextpdf.text.pdf.parser.RenderListener#renderText(
* com.itextpdf.text.pdf.parser.TextRenderInfo)
*/
public void renderText(TextRenderInfo renderInfo) {
}
}
References :
1 ) Extracting jbig2 from pdf (extract images) (MyImageRenderListener).
2 ) Converting jbig2 (JBIG2ImageReaderDemo)
There is a fork of the JPedal library by Borisvl located at
https://github.com/Borisvl/JBIG2-Image-Decoder#readme
which contains speed improvements and I believe it should also fix your bug.
EDIT : The bug is related to simple range checking. Basically you need to prevent GetPixel from accessing x,y values outside of the bitmap extents.
You need to make sure the following conditions are met before calling getPixel
col >= 0 and col < bitmap.width
row >= 0 and row < bitmap.height
Here is some Delphi code with a couple of small range checks. I cannot test the Java code myself but you need to make changes to src/org/jpedal/jbig2/image/JBIG2Bitmap.java
procedure TJBIG2Bitmap.combine(bitmap: TJBIG2Bitmap; x, y: Integer; combOp: Int64);
...
...
var
begin
srcWidth := bitmap.width;
srcHeight := bitmap.height;
srcRow := 0;
srcCol := 0;
if (x < 0) then x := 0;
if (y < 0) then y := 0;
for row := y to Min(y + srcHeight - 1, Self.height - 1) do // <<<<<<<< HERE
begin
for col := x to x + srcWidth - 1 do
begin
srcPixel := bitmap.getPixel(srcCol, srcRow);
Andrew.
How about using AcrobatReader itself? It's a bit muddy getting it to work, and not a robust solution I guess. But will probably print all of it perfectly. And be free
Some info about this route;
http://vineetreynolds.blogspot.nl/2005/12/silent-print-pdf-print-pdf.html
http://www.codeproject.com/Questions/98586/Programmatically-print-PDF-documents
http://forums.adobe.com/message/2336723
You have tools as ImageMagick which handle images and convert them to a lot of formats. I used it some years ago so I can't tell you if the jbig2 format is properly handled by default or if you have to install some plugin.
You can try the following to have a list of supported formats beginning with J like the JBIG2 you are searching for:
$ convert -list format | grep -i J
It is really obvious to convert to pdf with with tool too, coupled with gs tool aka GhostScript.
If fact nothing prevent you to display a PNG/JPEG version of the image and provide a download link to the original JBIG2 file with its own metadatas.
As an alternative, you could try doing this server-side:
Approach 1:
Convert the PDF files to raster images using an external application and print that instead.
Approach 2:
Adjust your PDF files by recompressing JBIG2 images:
1- Extracting the images compressed as JBIG2 from your files.
2- Re-compress them with some other algorithm (jpeg, png, etc). In order to do this you might need to go outside of Java using either JNI or calling an external application. You can try with jbig2dec or ImageMagic for example if the GPL lincense suits your needs.
3- Put the recompressed images back in your PDF.
This approach will imply some quality loss on those images, but at least you will be able to print the files.
You can do this in Java with iText, there is a chapter about resizing images in the book iText in Action (with sample code). The idea there is to extract the image, resize it (including recompression) and put it back. You can use this as starting point. Be aware that iText is an AGPL project, hence you cannot use it for free in commercial closed-source applications.
If you are using a Windows-based server and you can afford a commercial tool, you can also achieve this with Amyuni PDF Creator either with C#/VB.Net or C++ (Usual disclaimer applies for this suggestion). You just need to go though all objects of type acObjectTypePicture and set the attribute Compression to acJPegHigh, this approach does not require any external JBIG2 decoder, (I can include some sample code here if you are interested).
If you are using an applet just to print your PDF files, you could also try generating a PDF file that shows the print dialog when opened

Java: combine 2000-5000 PDFs into 1 using iText yield OutOfMemorryError

I have eyeballing this code for a long time, trying to reducing the amount of memory the code use and still it generated java.lang.OutOfMemoryError: Java heap space. As my last resort, I want to ask the community on how can I improve this code to avoid OutOfMemoryError
I have a driver/manifest file (.txt file) that contain information about the PDFs. I have about 2000-5000 pdf inside a zip file that I need to combine together. Before the combining, for each pdf, I need to add 2-3 more pdf pages to it. Manifest object holds information about a pdf.
try{
blankPdf = new PdfReader(new FileInputStream(config.getBlankPdf()));
mdxBacker = new PdfReader(new FileInputStream(config.getMdxBacker()));
theaBacker = new PdfReader(new FileInputStream(config.getTheaBacker()));
mdxAffidavit = new PdfReader(new FileInputStream(config.getMdxAffidavit()));
theaAffidavit = new PdfReader(new FileInputStream(config.getTheaAffidavit()));
ImmutableList<Manifest> manifestList = //Read manifest file and obtain List<Manifest>
File zipFile = new File(config.getInputDir() + File.separator + zipName);
//Extracting PDF into `process` folder
ZipUtil.extractAll(config.getExtractPdfDir(), zipFile);
outputPdfName = zipName.replace(".zip", ".pdf");
outputZipStream = new FileOutputStream(config.getOutputDir() +
File.separator + outputPdfName);
document = new Document(PageSize.LETTER, 0, 0, 0, 0);
writer = new PdfCopy(document , outputZipStream);
document.open(); //Open the document
//Start combining PDF files together
for(Manifest m : manifestList){
//Obtain full path to the current pdf
String pdfFilePath = config.getExtractPdfDir() + File.separator + m.getPdfName();
//Before combining PDF, add backer and affidavit to individual PDF
PdfReader pdfReader = PdfUtil.addBackerAndAffidavit(config, pdfType, m,
pdfFilePath, blankPdf, mdxBacker, theaBacker, mdxAffidavit,
theaAffidavit);
for(int pageNumber=1; pageNumber<=pdfReader.getNumberOfPages(); pageNumber++){
document.newPage();
PdfImportedPage page = writer.getImportedPage(pdfReader, pageNumber);
writer.addPage(page);
}
}
} catch (DocumentException e) {
} catch (IOException e) {
} finally{
if(document != null) document.close();
try{
if(outputZipStream != null) outputZipStream.close();
if(writer != null) writer.close();
}catch(IOException e){
}
}
Please, rest assure that I have look at this code for a long time, and try rewrite it many times to reduce the amount of memory it using. After the OutOfMemoryError, there are still lots of pdf files that have not been added 2-3 extra pages, so I think it is inside addBackerAndAffidavit, however, I try to close every resources I opened, but it still exception out. Please help.
You need to invoke PdfWriter#freeReader() by end of every loop to free the involved PdfReader. The PdfCopy#freeReader() has this method inherited from PdfWriter and does the same. See also the javadoc:
freeReader
public void freeReader(PdfReader reader)
throws IOException
Description copied from class: PdfWriter
Use this method to writes the reader to the document and free the memory used by it. The main use is when concatenating multiple documents to keep the memory usage restricted to the current appending document.
Overrides:
freeReader in class PdfWriter
Parameters:
reader - the PdfReader to free
Throws:
IOException - on error

Categories