PDF merging with itext and pdfbox - java

I have a multi-module Maven project with a request-generation process. During that process, Vaadin upload components are used to upload documents, which must be PNG, JPG, PDF, or BMP files.
At the end of the process I merge all the documents into one PDF and then download it with a FileDownloader.
The function I call on a button click event is:
/**
* This function is responsible for getting
* all documents from request and merge
* them in a single pdf file for
* download purposes
* @throws Exception
*/
protected void downloadMergedDocument() throws Exception {
// Calling create pdf function for merged pdf
createPDF();
// Setting the merged file as a resource for file downloader
Resource myResource = new FileResource(new File (mergedReportPath +request.getWebProtocol()+ ".pdf"));
FileDownloader fileDownloader = new FileDownloader(myResource);
// Extending the download button for download
fileDownloader.extend(downloadButton);
}
/**
* This function is responsible for providing
* the PDF related to a particular request that
* contains all the documents merged inside it
* @throws Exception
*/
private void createPDF() throws Exception {
try{
// Getting the current request
request = evaluationRequestUI.getRequest();
// Fetching all documents of the request
Collection<DocumentBean> docCollection = request.getDocuments();
// Initializing Document of using itext library
Document doc = new Document();
// Setting PdfWriter for getting the merged images file
PdfWriter.getInstance(doc, new FileOutputStream(mergedReportPath+ "/mergedImages_" + request.getWebProtocol()+ ".pdf"));
// Opening document
doc.open();
/**
* Here iterating on document collection for the images type
* document for merging them into one pdf
*/
for (DocumentBean documentBean : docCollection) {
byte[] documents = documentBean.getByteArray();
if(documentBean.getFilename().toLowerCase().contains("png") ||
documentBean.getFilename().toLowerCase().contains("jpeg") ||
documentBean.getFilename().toLowerCase().contains("jpg") ||
documentBean.getFilename().toLowerCase().contains("bmp")){
Image img = Image.getInstance(documents);
doc.setPageSize(img);
doc.newPage();
img.setAbsolutePosition(0, 0);
doc.add(img);
}
}
// Closing the document
doc.close();
/**
* Here we get all the images type documents merged into
* one pdf, now moving to pdfbox for searching the pdf related
* document types in the request and merging the above resultant
* pdf and the pdf document in the request into one pdf
*/
PDFMergerUtility utility = new PDFMergerUtility();
// Adding the above resultant pdf as a source
utility.addSource(new File(mergedReportPath+ "/mergedImages_" + request.getWebProtocol()+ ".pdf"));
// Iterating for the pdf document types in the collection
for (DocumentBean documentBean : docCollection) {
byte[] documents = documentBean.getByteArray();
if(documentBean.getFilename().toLowerCase().contains("pdf")){
utility.addSource(new ByteArrayInputStream(documents));
}
}
// Here setting the final pdf name
utility.setDestinationFileName(mergedReportPath +request.getWebProtocol()+ ".pdf");
// Here final merging and then result
utility.mergeDocuments();
}catch(Exception e){
m_logger.error("CATCH", e);
throw e;
}
}
Note: mergedReportPath is the path where the PDF files are stored and later retrieved for download.
Now I have two problems:
When I run this process for the first request, the PDFs are created in the destination folder, but nothing is downloaded.
When I run the process again for a second request, it gets stuck on utility.mergeDocuments(); that is, it seems to hang when the PDF is already present in the destination folder. I don't know where the problem is. Please help.
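One detail in the code above that may be worth tightening: the type checks call contains(...) on the whole filename, so a name such as "png_scan.pdf" would match both the image branch and the PDF branch. A minimal sketch of an extension-based check follows; UploadKinds, Kind, and kindOf are hypothetical names introduced here for illustration and are not part of the original code.

```java
import java.util.Locale;

/** Hypothetical helper: classifies an uploaded file by its extension only. */
final class UploadKinds {
    enum Kind { IMAGE, PDF, OTHER }

    static Kind kindOf(String filename) {
        if (filename == null) {
            return Kind.OTHER;
        }
        String name = filename.toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        // No dot, or a trailing dot, means there is no usable extension
        if (dot < 0 || dot == name.length() - 1) {
            return Kind.OTHER;
        }
        switch (name.substring(dot + 1)) {
            case "png":
            case "jpg":
            case "jpeg":
            case "bmp":
                return Kind.IMAGE;
            case "pdf":
                return Kind.PDF;
            default:
                return Kind.OTHER;
        }
    }
}
```

With a helper like this, the two loops could branch on kindOf(documentBean.getFilename()) instead of repeating the contains(...) checks.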

In the 2.0 version of PDFBox, you can set an output stream with setDestinationStream(). Thus, you just call
response.setContentType("application/pdf");
OutputStream os = response.getOutputStream();
utility.setDestinationStream(os);
utility.mergeDocuments();
os.flush();
os.close();
You can't set the response size this way; if you have to, use ByteArrayOutputStream like in Bruno's answer or this one.
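The trade-off can be sketched with plain java.io: writing into an in-memory buffer first means the byte count is known before anything reaches the real target. This is only an illustration; BufferedSend, Producer, and bufferThenSend are hypothetical names, and the producer stands in for whatever code generates the PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BufferedSend {
    /** Hypothetical callback standing in for the code that produces the PDF bytes. */
    interface Producer {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Runs the producer against an in-memory buffer, then copies the buffer
     * to the target. Returns the number of bytes sent.
     */
    static int bufferThenSend(Producer producer, OutputStream target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        producer.writeTo(buffer);
        // The size is known here, before the first byte reaches the target;
        // in a servlet this is where the content length could be set.
        int length = buffer.size();
        buffer.writeTo(target);
        target.flush();
        return length;
    }
}
```

The cost is that the whole document is held in memory, which is why streaming directly is preferable when the size header is not required.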

In the comment section of your question, you have clarified that you don't need the file on disk, but that you want to send the PDF to the browser. You want to know how to achieve this. This is explained in the official documentation: How can I serve a PDF to a browser without storing a file on the server side?
This is how you create a PDF in memory:
// step 1
Document document = new Document();
// step 2
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter.getInstance(document, baos);
// step 3
document.open();
// step 4
document.add(new Paragraph("Hello"));
// step 5
document.close();
Merging PDFs is done with PdfCopy: How to merge documents correctly?
You need to apply the same principle as above to those examples: replace the FileOutputStream by a ByteArrayOutputStream.
Now you have PDF bytes that are stored in the baos object. We can send it to the browser like this:
// setting some response headers
response.setHeader("Expires", "0");
response.setHeader("Cache-Control",
"must-revalidate, post-check=0, pre-check=0");
response.setHeader("Pragma", "public");
// setting the content type
response.setContentType("application/pdf");
// the contentlength
response.setContentLength(baos.size());
// write ByteArrayOutputStream to the ServletOutputStream
OutputStream os = response.getOutputStream();
baos.writeTo(os);
os.flush();
os.close();
Make sure to read the documentation if you have further questions.

Related

Which Java Library can help to convert an MS Word Document containing shapes and Word drawings to PDF?

I am trying to convert files with the .docx extension to PDF using Java. The files contain shapes and drawings created in MS Word. Which libraries (open source or licensed) will serve the purpose?
Currently I am using "org.apache.poi.xwpf.converter.pdf.PdfConverter", but it skips the shapes and drawings in my Word document. I am unable to test it using Aspose.Words. Any help with that will also be appreciated.
The method I used for conversion is:
public static void createPDFFromIMG(String sSourceFilePath,String sFileName, String sDestinationFilePath) throws Exception {
logger.debug("Entered into createPDFFromIMG()\n");
logger.info("### Started PDF Conversion..");
System.out.println("### Started PDF Conversion..");
try {
if(sFileName.contains(".docx")) {
InputStream doc = new FileInputStream(new File(sSourceFilePath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(
new File(sDestinationFilePath + "/" + sFileName.split("\\.")[0] + ".pdf"));
PdfConverter.getInstance().convert(document, out, options);
doc.close();
out.close();
System.out.println("### Completed PDF Conversion..");
logger.info("### Completed PDF Conversion..");
logger.debug("Exited from createPDFFromIMG()");
return;
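}
} catch (Exception e) {
// Log the failure and let the caller handle it
logger.error("PDF conversion failed", e);
throw e;
}
}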
I expect the complete Word file to be converted to PDF, but the file converted using the mentioned Java library does not contain drawings or shapes present in the docx file.
It is not clear why you are unable to test with Aspose.Words. The code is quite simple:
Document doc = new Document("in.docx");
doc.save("out.pdf");
You can also test with the free Aspose app (which is based on Aspose.Words):
https://products.aspose.app/words/conversion

Generate pdf file in Secured mode

I have written code for PDF generation and it is working fine, but now I need to generate the PDF file in secured mode.
Here is my code for secured mode:
try {
HttpServletResponse response = ServletActionContext.getResponse();
PDFGenerator pdf = new PDFGenerator();
PDFGenerator generator=new PDFGenerator();
/* byte[] bytes = null;
bytes = (generator.generatepdf(sosValues.getCmaId(), null)).toByteArray();
//bytes = buffer.toByteArray();
response.setContentLength(bytes.length);
if (bytes != null) {
bis = new ByteArrayInputStream(bytes);
}*/
ByteArrayOutputStream baos=generator.generatepdf(sosValues.getCmaId(), null);
bis = new ByteArrayInputStream(baos.toByteArray());
PdfReader pdfReader=new PdfReader(bis);
PdfStamper pdfStamper=new PdfStamper(pdfReader, baos);
pdfStamper.setEncryption(null,null, PdfWriter.HideToolbar, PdfWriter.STRENGTH40BITS);
pdfStamper.setEncryption("Hello".getBytes(), "World".getBytes(), PdfWriter.AllowPrinting
| PdfWriter.AllowCopy, PdfWriter.STRENGTH40BITS);
pdfStamper.close();
baos.close();
} catch (Exception e) {
e.printStackTrace();
}
While debugging I was getting an exception at this line pdfStamper.setEncryption(null,null, PdfWriter.HideToolbar, PdfWriter.STRENGTH40BITS);
Exception in browser was:
The server encountered an internal error that prevented it from fulfilling this request.
PdfWriter.HideToolbar is a viewer preference, not a permission.
This is the list of permissions:
PdfWriter.ALLOW_PRINTING
PdfWriter.ALLOW_MODIFY_CONTENTS
PdfWriter.ALLOW_COPY
PdfWriter.ALLOW_MODIFY_ANNOTATIONS
PdfWriter.ALLOW_FILL_IN
PdfWriter.ALLOW_SCREEN_READERS
PdfWriter.ALLOW_ASSEMBLY
PdfWriter.ALLOW_DEGRADED_PRINTING
Moreover: hiding the toolbar in the hope to secure a PDF is wrong. Please read my answer to How to disable download option of pdf file in c# ?
Even using encryption to avoid printing may not be the best of ideas. See How to protect a PDF with a username and password?
However, this isn't what causes your problem. The internal error is caused by the strange way you're using the ByteArrayOutputStream. You generate a PDF in memory in the generatepdf() method. You didn't share that method, but:
if you're closing that stream, you get the exception because you're trying to add new bytes to it with your stamper object. You can't add extra bytes to a closed OutputStream.
if you're not closing that stream, your PDF isn't complete and you'll get an exception when PdfReader tries to read the (unfinished) PDF.
Furthermore, it's very strange that you would first create the PDF, and then read that PDF to encrypt it. Why not encrypt it right away? That saves you CPU-time.
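One more thing to check in the posted code, independent of how generatepdf() handles its stream: the stamper is given the same baos that already holds the first PDF. toByteArray() returns a copy, so anything written to baos afterwards lands behind the original bytes instead of replacing them. A minimal stdlib-only sketch of the difference follows; TwoPassBuffers and secondPass are hypothetical names, and secondPass merely copies its input as a stand-in for the stamping step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class TwoPassBuffers {
    /** Hypothetical stand-in for a second processing pass: copies input to output. */
    static void secondPass(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }

    /** Reuses the first buffer as the output: the result is appended after the original bytes. */
    static byte[] reuseSameBuffer(ByteArrayOutputStream first) throws IOException {
        secondPass(new ByteArrayInputStream(first.toByteArray()), first);
        return first.toByteArray();
    }

    /** Writes the second pass into a fresh buffer: the result holds only the second-pass output. */
    static byte[] useFreshBuffer(ByteArrayOutputStream first) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        secondPass(new ByteArrayInputStream(first.toByteArray()), result);
        return result.toByteArray();
    }
}
```

Applied to the question, this suggests giving the stamper its own output stream (or encrypting in the first pass, as suggested above) rather than writing back into baos.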

iText mergeFields in PdfCopy creates invalid pdf

I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
@Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
@Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks
We are using PdfCopy to merge different files, some of which may have fields. We use version 5.5.3.0. The code is simple and seems to work fine, BUT sometimes the resulting file is impossible to print!
Our code:
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.GetBuffer()
End Function
Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.
This is just to say that we have the same problem: iText mergeFields in PdfCopy creates an invalid PDF. So it is still not fixed in version 5.5.3.0.

How to return a non-editable PDF as a response?

I have a URL:
http://www.irs.gov/pub/irs-pdf/fw4.pdf
It contains an editable PDF. I have to make it non-editable. I did so and saved the result in a temporary directory. Now I want to send the non-editable PDF as the response: when the user clicks this URL, they must get the non-editable PDF. This is what I have done so far:
String strDirectoy ="C:\\Temp";
boolean success = (
new File(strDirectoy)).mkdir();
if (success) {
System.out.println("Directory: "
+ strDirectoy + " created");
}
PdfReader reader = new PdfReader("http://www.irs.gov/pub/irs-pdf/fw4.pdf");//C:\\fw4.pdf
PdfStamper stamp2 = new PdfStamper(reader, new FileOutputStream("C:\\Temp\\Flattened.pdf"));
AcroFields form2 = stamp2.getAcroFields();
stamp2.setFormFlattening(true);
stamp2.close();
Now I need to delete the temp folder as if it never existed and return the non-editable PDF as the response for the URL specified above.
How can I do this?
Write a Servlet.
Flatten your PDF in a temporary file (using the createTempFile() and deleteOnExit() methods of java.io.File).
Use the setContentType() method of the HttpServletResponse to set the MIME type of the PDF.
Write the contents of the temporary PDF file to the output stream of the HTTP response.
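The temp-file part of these steps can be sketched with the standard library alone. Note the swap: this sketch uses java.nio.file.Files and deletes the file as soon as it has been sent, instead of File.createTempFile() with deleteOnExit() as named above. TempFileSend, FileStep, and sendViaTempFile are hypothetical names; the FileStep callback stands in for the flattening code, and the target stream stands in for the servlet response's output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileSend {
    /** Hypothetical callback standing in for the step that writes the flattened PDF to a file. */
    interface FileStep {
        void writeTo(Path file) throws IOException;
    }

    /** Creates a temp file, lets the step fill it, streams it to the target, then deletes it. */
    static long sendViaTempFile(FileStep step, OutputStream target) throws IOException {
        Path temp = Files.createTempFile("flattened-", ".pdf");
        try {
            step.writeTo(temp);
            // Files.copy returns the number of bytes copied to the stream
            long copied = Files.copy(temp, target);
            target.flush();
            return copied;
        } finally {
            // Remove the file even if writing or copying failed
            Files.deleteIfExists(temp);
        }
    }
}
```

Because the file lives in the system temp directory and is removed in the finally block, no C:\Temp folder has to be created or cleaned up by hand.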

Creating tables in a MS Word file using Java

I want to create a table in a Microsoft Office Word file using Java. Can anybody tell me how to do it with an example?
Have a look at Apache POI
The POI project is the master project
for developing pure Java ports of file
formats based on Microsoft's OLE 2
Compound Document Format. OLE 2
Compound Document Format is used by
Microsoft Office Documents, as well as
by programs using MFC property sets to
serialize their document objects.
I've never seen it done, and I work in Word a lot. If you really want to do something programmatically in a Word document, I'd advise using Microsoft's scripting language VBA, which is specifically designed for this purpose. In fact, I'm working in it right now.
If you're working under Open Office then they have a very similar set of macro-powered tools for doing the same thing.
Office 2003 has an XML format, and the default document format for Office 2007 is XML (zipped). So you could just generate XML from Java. If you open an existing document, it's not too hard to see the XML required.
Alternatively, you could use openoffice's api to generate a document, and save it as a ms-word document.
This snippet can be used to create a table dynamically in MS Word document.
XWPFDocument document = new XWPFDocument();
XWPFTable tableTwo = document.createTable();
XWPFTableRow tableTwoRowOne = tableTwo.getRow(0);
tableTwoRowOne.getCell(0).setText(Knode1);
tableTwoRowOne.createCell().setText(tags.get("node1").toString());
for (int i = 1; i < nodeList.length; i++) {
String node = "node";
String nodeVal = "";
XWPFTableRow tr = null;
node = node + (i + 1);
nodeVal = tags.get(node).toString();
if (tr == null) {
tr = tableTwo.createRow();
tr.getCell(0).setText(nodeList[i]);
tr.getCell(1).setText(tags.get(node).toString());
}
}
Our feature set is to hit a button in our web app and get the page you are looking at back as a Word document. We use the docx schema for description of documents and have a bunch of Java code on the server side which does the document creation and response back to our web client. The formatting itself is done with some compiled xsl-t's from within Java to translate from our own XML persistence tier.
The docx schema is pretty hard to understand. The way we made most progress was to create template docx's in Word with exactly the formatting that we needed but with bogus content. We then fooled around with them until we understood exactly what was going on. There is a huge amount in the docx that you don't really need to worry about. When reading / translating the docx Word is pretty tolerant to a partially complete formatting schema. In fact we chose to strip out pretty much all the formatting because it also means that the user's default formatting takes precedence, which they seem to prefer. It also makes the xsl process faster and the resulting document smaller.
I manage the docx4j project
docx4j contains a class TblFactory, which creates regular tables (ie no row or column spans), with the default settings which Word 2007 would create, and with the dimensions specified by the user.
If you want a more complex table, the easiest approach is to create it in Word, then copy the resulting XML into a String in your IDE, where you can use docx4j's XmlUtils.unmarshalString to create a Tbl object from it.
Using my little zip utility, you can create a docx with ease, if you know what you're doing. Word's DOCX file format is simply a zip archive (folders with XML files). By using Java zip utilities, you can modify an existing docx, replacing just the content part.
For the following sample to work, open Word, enter a few lines, and save the document. Then, with a zip program, remove the file word/document.xml (this is the file where the main content of the Word document resides) from the zip. Save the modified zip; you now have the template prepared.
Here is what creating a new Word file looks like:
/* docx file head */
final String DOCUMENT_XML_HEAD =
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?>" +
"<w:document xmlns:wpc=\"http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\" xmlns:v=\"urn:schemas-microsoft-com:vml\" xmlns:wp14=\"http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing\" xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\" xmlns:w10=\"urn:schemas-microsoft-com:office:word\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\" xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\" xmlns:wpg=\"http://schemas.microsoft.com/office/word/2010/wordprocessingGroup\" xmlns:wpi=\"http://schemas.microsoft.com/office/word/2010/wordprocessingInk\" xmlns:wne=\"http://schemas.microsoft.com/office/word/2006/wordml\" xmlns:wps=\"http://schemas.microsoft.com/office/word/2010/wordprocessingShape\" mc:Ignorable=\"w14 w15 wp14\">" +
"<w:body>";
/* docx file foot */
final String DOCUMENT_XML_FOOT =
"</w:body>" +
"</w:document>";
final ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("c:\\TEMP\\test.docx"));
final String fullDocumentXmlContent = DOCUMENT_XML_HEAD + "<w:p><w:r><w:t>Hey MS Word, hello from java.</w:t></w:r></w:p>" + DOCUMENT_XML_FOOT;
final si.gustinmi.DocxZipCreator creator = new si.gustinmi.DocxZipCreator();
// create new docx file
creator.createDocxFromExistingDocx(zos, "c:\\TEMP\\existingDocx.docx", fullDocumentXmlContent);
These are zip utilities:
package si.gustinmi;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
/**
* Creates new docx from existing one.
* @author gustinmi [at] gmail [dot] com
*/
public class DocxZipCreator {
public static final Logger log = Logger.getLogger(DocxZipCreator.class.getCanonicalName());
private static final int BUFFER_SIZE = 4096;
/** On-the-fly zip creator. Traverses the existing docx zip and creates a new one simultaneously.
* At the end, the custom document.xml is inserted.
* @param zos output stream for the new docx; closed by this method
* @param zipFilePath location of existing docx template (without word/document.xml)
* @param documentXmlContent content of the word/document.xml
* @throws IOException
*/
public void createDocxFromExistingDocx(ZipOutputStream zos, String zipFilePath, String documentXmlContent) throws IOException {
final FileInputStream fis = new FileInputStream(zipFilePath);
final ZipInputStream zipIn = new ZipInputStream(fis);
try{
log.info("Starting to create new docx zip");
ZipEntry entry = zipIn.getNextEntry();
while (entry != null) { // iterates over entries in the zip file
copyEntryfromZipToZip(zipIn, zos, entry.getName());
zipIn.closeEntry();
entry = zipIn.getNextEntry();
}
// add document.xml to existing zip
addZipEntry(documentXmlContent, zos, "word/document.xml");
}finally{
zipIn.close();
zos.close();
log.info("End of docx creation");
}
}
/** Copies a single entry from zip to zip */
public void copyEntryfromZipToZip(ZipInputStream is, ZipOutputStream zos, String entryName)
{
final byte [] data = new byte[BUFFER_SIZE];
int len;
int lenTotal = 0;
try {
final ZipEntry entry = new ZipEntry(entryName);
zos.putNextEntry(entry);
final CRC32 crc32 = new CRC32();
while ((len = is.read(data)) > -1){
zos.write(data, 0, len);
crc32.update(data, 0, len);
lenTotal += len;
}
entry.setSize(lenTotal);
entry.setTime(System.currentTimeMillis());
entry.setCrc(crc32.getValue());
}
catch (IOException ioe){
ioe.printStackTrace();
}
finally{
try { zos.closeEntry();} catch (IOException e) {}
}
}
/** Create new zip entry with content
* @param content content of a new zip entry
* @param zos zip output stream to write the entry to
* @param entryName name (e.g. word/document.xml)
*/
public void addZipEntry(String content, ZipOutputStream zos, String entryName)
{
final byte [] data = new byte[BUFFER_SIZE];
int len;
int lenTotal = 0;
try {
final InputStream is = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
final ZipEntry entry = new ZipEntry(entryName);
zos.putNextEntry(entry);
final CRC32 crc32 = new CRC32();
while ((len = is.read(data)) > -1){
zos.write(data, 0, len);
crc32.update(data, 0, len);
lenTotal += len;
}
entry.setSize(lenTotal);
entry.setTime(System.currentTimeMillis());
entry.setCrc(crc32.getValue());
}
catch (IOException ioe){
ioe.printStackTrace();
}
finally{
try { zos.closeEntry();} catch (IOException e) {}
}
}
}
Office Writer would be a better tool to use than POI for your requirement.
If all you want is a simple table without too much of formatting, I would use this simple trick. Use Java to generate the table as HTML using plain old table,tr,td tags and copy the rendered HTML table into the word document ;)
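The HTML-generating half of this trick might look like the sketch below. HtmlTable, escape, and render are hypothetical names introduced for illustration; the code only builds the markup string, and getting the rendered table into Word is still the manual copy step described above.

```java
import java.util.List;

final class HtmlTable {
    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders rows of cell text as a plain table using table, tr, and td tags. */
    static String render(List<List<String>> rows) {
        StringBuilder html = new StringBuilder("<table>");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table>").toString();
    }
}
```

Escaping is done before the cell text is wrapped in tags, so content such as "1 < 2" does not break the table structure.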
Click here for a Working example with source code.
This example generates MS-Word docs from Java, based on a template concept.
