iText - Read pdfs in a loop and merge one by one

I have a loop in which I read one PDF file per iteration, and I want to add each of these files to a single final PDF, i.e. merge all the PDFs into one new PDF file.
I tried concatenating the byte arrays inside the loop, like this:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (............){
outputStream.write(myInfo.getByteArray());
}
pdfreader = new PdfReader(outputStream.toByteArray());
FileOutputStream fileout = new FileOutputStream(file);
PdfStamper pdfStamper = new PdfStamper(pdfreader, fileout);
pdfStamper.close();
pdfreader.close();
The problem is that the final PDF does not contain all the PDFs; it contains only one of them.
I am not sure whether this is the right way to do it.
Is there another way to merge PDFs one by one?

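Concatenating the raw bytes cannot work: every PDF file carries its own header, cross-reference table and trailer, so the concatenated bytes do not form one valid multi-document PDF, and PdfStamper only manipulates a single existing document.
If you can use iText 7, have a look at the documentation for PdfMerger: itextsupport.com/apidocs/iText7/7.0.3/com/itextpdf/kernel/utils/PdfMerger.html. Its merge(PdfDocument from, int fromPage, int toPage) method copies a range of pages from a source document into the target document, so you can call it once per loop iteration and close the target document after the loop.
I can prepare an example later if needed.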

Related

How to read the pdf content from response and write it to another new pdf file

%PDF-1.4
%����
1 0 obj
<<
/Type /Catalog
/Pages 9 0 R
/Outlines 8 0 R
/Names 6 0 R
I am trying to read the PDF content above, which is returned by a REST endpoint, in a Java class and write it to another file, but the file gets corrupted and I cannot view the generated PDF.
File file = new File("Data.pdf"); // trying to write the data to this file
FileOutputStream out = new FileOutputStream(file);
// service call to download the PDF document
out.write(response.getBody().getBytes());
How can I write the PDF content to another file, or generate a new PDF, in a proper way?

Basically you want to read from an InputStream and then write to an OutputStream. This question has been answered several times, and there are lots of possible solutions. Since you also tagged ioutils, one possible way is:
File file = new File("Data.pdf");
FileOutputStream out = new FileOutputStream(file);
IOUtils.copy(response.getBody(), out);
This presumes that response.getBody() returns an InputStream. If you supply more code we can tell for sure; it depends on the REST client implementation you are using (JAX-RS, Spring REST, Apache HttpClient, HttpURLConnection, ...).
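The same stream copy can be sketched with the JDK alone, without Commons IO. The class name PdfDownloadWriter and the method save are hypothetical, and the sketch assumes that the REST client can hand over the body as an InputStream rather than as a String. That matters because PDF content is binary: turning it into a String and calling getBytes(), as in the question, can alter bytes that are not valid characters in the chosen charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: the names are illustrative and not part of any REST client API.
final class PdfDownloadWriter {

    private PdfDownloadWriter() {
    }

    // Copies the raw response bytes to the target file without any character
    // decoding, replacing the file if it already exists. Returns the number of
    // bytes written. The caller stays responsible for closing the stream.
    static long save(InputStream body, Path target) throws IOException {
        return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call would look like PdfDownloadWriter.save(bodyStream, Paths.get("Data.pdf")), ideally with bodyStream opened in a try-with-resources block. If the client only offers a byte[], wrapping it in a ByteArrayInputStream works the same way.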

The PdfReader class in iText 7 has a constructor that takes an InputStream, so you can read the bytes of your input PDF through a ByteArrayInputStream. iText 7 also has a PdfWriter class that can write to an OutputStream. A PdfDocument can then read the input PDF and write it to a new file through the PdfWriter, as in the following snippet.
// PDF bytes returned by some REST API or method (placeholder, must hold a real PDF)
byte[] bytes = {};
ByteArrayInputStream bin = new ByteArrayInputStream(bytes);
// File where you want to write the PDF and update some content
File file = new File("Data.pdf");
FileOutputStream out = new FileOutputStream(file);
PdfDocument dd = new PdfDocument(new PdfReader(bin), new PdfWriter(out));
// Closing the document writes the output and closes the reader and the writer
dd.close();

Unexpected different page sizes when merging PDF's in Java using iText

I have the following code snippet to merge two single paged PDF files (first and second):
byte[] codes = IOUtils.toByteArray(resource.getURI());
PdfReader first = new PdfReader(firstBytes);
PdfReader second = new PdfReader(secondBytes);
Document document = new Document(PageSize.A4);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfCopy copy = new PdfCopy(document, byteArrayOutputStream);
document.open();
copy.addDocument(first);
copy.addDocument(second);
document.close();
return byteArrayOutputStream.toByteArray();
Next I have the following test:
byte[] generated = new Merger(...).generate(...); // Location of the snippet above
File file = new File("dir", "generated.pdf");
FileUtils.writeByteArrayToFile(file , generated);
PdfReader pdfReader = new PdfReader(new FileInputStream(file));
assertThat(pdfReader.getNumberOfPages()).isEqualTo(2);
This test passes locally and fails on our build server.
Locally the generated PDF simply contains the two A4 pages. On the build server there are three pages:
the first document
one blank page
the second document
The first two pages seem to be in letter format, while the last page seems to be an A4 page.
How do I fix this?
Edit: Some extra info. Local OS is Windows. Build system runs Linux.

The first input document had been generated using the flying-saucer library. The problem was fixed by setting the page size in the CSS.
My assumption of where the problem was located was completely wrong. Sorry.

Converting .docx file to pdf using apache poi drops images

I have a Word document (.docx) containing tables, paragraphs and images. I have been able to convert the file to PDF, but the PDF file is missing the images. This is the code snippet I'm using:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(xwpfDocument, baos, options);
new FileOutputStream(new File("/home/sam/test.pdf")).write(baos.toByteArray());
The final file, test.pdf, does not contain the images from the .docx. Is there something else I'm supposed to do?

iText - OutOfMemory creating more than 1000 PDFs

I want to create a ZipOutputStream filled with PDF/A documents. I'm using iText (version 5.5.7). With more than 1000 PDF entries I get an OutOfMemoryError on doc.close() and can't find the leak.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(baos));
zos.setEncoding("Cp850");
for (MyObject o : objects) {
try {
String pdfFilename = o.getName() + ".pdf";
zos.putNextEntry(new ZipEntry(pdfFilename));
pdfBuilder.buildPdfADocument(zos);
zos.closeEntry();
} ...
PdfBuilder
public void buildPdfADocument(org.apache.tools.zip.ZipOutputStream zos){
Document doc = new Document(PageSize.A4);
PdfAWriter writer = PdfAWriter.getInstance(doc, zos, PdfAConformanceLevel.PDF_A_1B);
writer.setCloseStream(false); // to not close my zos
writer.setViewerPreferences(PdfWriter.ALLOW_PRINTING | PdfWriter.PageLayoutSinglePage);
writer.createXmpMetadata();
doc.open();
// adding Element's to doc
// with flushContent() on PdfPTables
InputStream sRGBprofile = servletContext.getResourceAsStream("/WEB-INF/conf/AdobeRGB1998.icc");
ICC_Profile icc = ICC_Profile.getInstance(sRGBprofile);
writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
//try to close/flush everything possible
doc.close();
writer.setXmpMetadata(null);
writer.flush();
writer.close();
if(sRGBprofile != null){
sRGBprofile.close();
}
}
Any suggestions on how I can fix it? Am I forgetting something?
I have already tried Java's own ZipOutputStream, but it makes no difference.
Thanks for your answers! I understand the issue with the ByteArrayOutputStream, but I am not sure what the best approach is in my case. It's a web application, and I need to store the zip in a database blob somehow.
What I am doing now is creating the PDFs directly in the ZipOutputStream with iText and saving the byte array of the underlying ByteArrayOutputStream to the blob. The options I see are:
Split my data into packages of 500 objects, save the first 500 PDFs to the database, then open the zip and add the next 500, and so on. But I assume this leads to the same situation I have now, namely too large a stream held in memory.
Save the PDFs on the server (not sure if there is enough space), create a temporary zip file and then submit its bytes to the blob.
Any suggestions or ideas?

It's because your ZipOutputStream is backed by a ByteArrayOutputStream, so even closing the entries keeps the full ZIP contents in memory.

You need another approach for this number of files (1000+).
In your example you are holding all the PDF files in memory; you will need to process the documents in blocks to limit this memory load.
Another approach is to write your PDFs to the file system first and then create your zip file from them.
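A minimal sketch of the file-backed variant, assuming the server has enough temporary disk space: the ZIP is written straight to a temporary file, so only the entry currently being written needs to be buffered. The names ZipToTempFile and EntryWriter are hypothetical; EntryWriter stands in for the code that renders one PDF. The sketch uses java.util.zip.ZipOutputStream from the JDK rather than the Ant class from the question.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: writes all entries to a temporary ZIP file on disk
// instead of collecting the whole archive in a ByteArrayOutputStream.
final class ZipToTempFile {

    // Stand-in for whatever produces one PDF (e.g. an iText-based builder).
    interface EntryWriter {
        String name();

        // Must write the document to 'out' without closing it.
        void writeTo(OutputStream out) throws IOException;
    }

    private ZipToTempFile() {
    }

    // Returns the path of the temporary ZIP file; the caller deletes it when done.
    static Path build(List<? extends EntryWriter> entries) throws IOException {
        Path zip = Files.createTempFile("pdfs-", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(zip)))) {
            for (EntryWriter entry : entries) {
                zos.putNextEntry(new ZipEntry(entry.name()));
                entry.writeTo(zos);
                zos.closeEntry();
            }
        }
        return zip;
    }
}
```

With iText 5 the writer would still need setCloseStream(false), as in the question, so that closing a document does not close the ZIP stream. For the database step, the finished file could be passed as a stream, for example through PreparedStatement.setBinaryStream, instead of as one large byte array; whether that is actually streamed depends on the JDBC driver.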

Append full PDF file to FOP PDF

I have an xml file already being created and rendered as a PDF sent over a servlet:
TraxInputHandler input = new TraxInputHandler(
new File(XML_LOCATION+xmlFile+".xml"),
new File(XSLT_LOCATION)
);
ByteArrayOutputStream out = new ByteArrayOutputStream();
//driver is just `new Driver()`
synchronized (driver) {
driver.reset();
driver.setRenderer(Driver.RENDER_PDF);
driver.setOutputStream(out);
input.run(driver);
}
//response is HttpServletResponse
byte[] content = out.toByteArray();
response.setContentType("application/pdf");
response.setContentLength(content.length);
response.getOutputStream().write(content);
response.getOutputStream().flush();
This is all working perfectly fine.
However, I now have another PDF file that I need to include in the output. This is just a totally separate .pdf file that I was given. Is there any way that I can append this file either to the response, the driver, out, or anything else to include it in the response to the client? Will that even work? Or is there something else I need to do?

We also use FOP to generate some documents, and we accept uploaded documents, all of which we eventually combine into a single PDF.
You can't just send them sequentially out the stream, because the combined result needs a proper PDF file header, metadata, etc.
We use the iText library to combine the files, starting off with
PdfReader reader = new PdfReader(/*String*/fileName);
reader.consolidateNamedDestinations();
We later loop through adding pages from each pdf to the new combined destination pdf, adjusting the bookmark / page numbers as we go.
AFAIK, FOP doesn't provide this sort of functionality.
