Converting .docx file to pdf using apache poi drops images - java

I have a Word document (.docx) containing tables, paragraphs and images. I have been able to convert the file to PDF, but the PDF is missing the images. This is the code snippet I'm using:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(xwpfDocument, baos, options);
// try-with-resources closes the file, so no file handle is leaked
try (FileOutputStream out = new FileOutputStream(new File("/home/sam/test.pdf"))) {
    out.write(baos.toByteArray());
}
The final file, test.pdf, does not contain the images from the .docx. Is there something else I'm supposed to do?
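Before tuning converter options, it may help to confirm that the images are really embedded in the .docx rather than linked. A .docx is a ZIP package, and Word normally stores embedded pictures under word/media/, so the JDK alone can list them. This is a diagnostic sketch, not a fix for the converter; the class name DocxMediaLister is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxMediaLister {

    // Returns the names of all ZIP entries under "word/media/",
    // where Word normally keeps embedded images inside a .docx package.
    static List<String> listMediaEntries(InputStream docx) throws IOException {
        List<String> media = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().startsWith("word/media/")) {
                    media.add(entry.getName());
                }
            }
        }
        return media;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: DocxMediaLister <file.docx>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(listMediaEntries(in));
        }
    }
}
```

An empty list would suggest the pictures are linked or stored elsewhere, which points away from the PDF conversion step.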

Related

Unexpected different page sizes when merging PDFs in Java using iText

I have the following code snippet to merge two single-page PDF files (first and second):
byte[] codes = IOUtils.toByteArray(resource.getURI());
PdfReader first = new PdfReader(firstBytes);
PdfReader second = new PdfReader(secondBytes);
Document document = new Document(PageSize.A4);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfCopy copy = new PdfCopy(document, byteArrayOutputStream);
document.open();
copy.addDocument(first);
copy.addDocument(second);
document.close();
return byteArrayOutputStream.toByteArray();
Next I have the following test:
byte[] generated = new Merger(...).generate(...); // Location of the snippet above
File file = new File("dir", "generated.pdf");
FileUtils.writeByteArrayToFile(file , generated);
PdfReader pdfReader = new PdfReader(new FileInputStream(file));
assertThat(pdfReader.getNumberOfPages()).isEqualTo(2);
This test works fine locally and fails on our build server.
Locally, the generated PDF contains just the two A4 pages. On the build server there are three pages:
the first document
one blank page
the second document
The first two pages seem to be in letter format, while the last page seems to be an A4 page.
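To check which pages came out as Letter and which as A4, the page dimensions can be compared against the standard sizes in PostScript points (US Letter is 612 x 792 pt, A4 is about 595.28 x 841.89 pt). A minimal sketch, assuming the width and height come from something like iText 5's PdfReader.getPageSize(n); the class name and the 2-point tolerance are arbitrary choices for illustration.

```java
public class PageSizeGuess {

    // Standard sizes in PostScript points (1/72 inch).
    // A4 is 210 x 297 mm, US Letter is 8.5 x 11 in.
    private static final double A4_W = 595.28, A4_H = 841.89;
    private static final double LETTER_W = 612.0, LETTER_H = 792.0;
    private static final double TOLERANCE = 2.0; // points, to absorb rounding

    // Guesses the paper format from a page's width and height in points.
    // Landscape pages are normalised to portrait before comparing.
    static String guess(double width, double height) {
        double w = Math.min(width, height);
        double h = Math.max(width, height);
        if (close(w, A4_W) && close(h, A4_H)) return "A4";
        if (close(w, LETTER_W) && close(h, LETTER_H)) return "Letter";
        return "Other";
    }

    private static boolean close(double a, double b) {
        return Math.abs(a - b) <= TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(guess(595, 842)); // A4
        System.out.println(guess(612, 792)); // Letter
        System.out.println(guess(842, 595)); // A4 (landscape)
    }
}
```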
How do I fix this?
Edit: some extra info. The local OS is Windows; the build system runs Linux.
The first document was generated with the Flying Saucer library. The problem was fixed by setting the page size in the CSS.
My assumption of where the problem was located was completely wrong. Sorry.
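Setting the page size in the CSS usually means a CSS Paged Media rule such as @page { size: A4; }, which Flying Saucer honours, so the output no longer depends on a platform default. The helper below is a hypothetical sketch that injects such a rule into an XHTML string before rendering; in a real project the rule would more naturally live in the existing stylesheet.

```java
public class PageSizeCss {

    // CSS Paged Media rule that pins the page size to A4.
    static final String A4_PAGE_RULE = "@page { size: A4; }";

    // Inserts a <style> element with the @page rule just before </head>.
    // Assumes well-formed XHTML with a <head> element, as Flying Saucer expects.
    static String withA4PageSize(String xhtml) {
        int idx = xhtml.indexOf("</head>");
        if (idx < 0) {
            throw new IllegalArgumentException("XHTML has no </head> element");
        }
        String style = "<style type=\"text/css\">" + A4_PAGE_RULE + "</style>";
        return xhtml.substring(0, idx) + style + xhtml.substring(idx);
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Report</title></head><body>Hi</body></html>";
        System.out.println(withA4PageSize(page));
    }
}
```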

iText - Read PDFs in a loop and merge them one by one

I have a loop that reads one PDF file per iteration, and I want to add all of these PDFs to a final PDF, basically merging them into one new PDF file.
I tried concatenating the byte arrays inside the loop, like this:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (............){
outputStream.write(myInfo.getByteArray());
}
pdfreader = new PdfReader(outputStream.toByteArray());
FileOutputStream fileout = new FileOutputStream(file);
PdfStamper pdfStamper = new PdfStamper(pdfreader, fileout);
pdfStamper.close();
pdfreader.close();
The problem is that the final PDF does not contain all the PDFs; it contains only one.
I am also not sure this is the right way to do it.
Is there any other way to merge PDFs one by one?
Have a look at the documentation for PdfMerger: itextsupport.com/apidocs/iText7/7.0.3/com/itextpdf/kernel/utils/PdfMerger.html
I can prepare an example later if needed.
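Concatenating the byte arrays cannot produce a merged PDF: each input is a complete file with its own %PDF- header, cross-reference table and %%EOF trailer, so the result is several documents back to back, which is consistent with only one of them showing up. The fix is to open each input as its own document and copy its pages, which is what the PdfMerger class linked above is for. The sketch below only demonstrates the diagnosis with the JDK by counting %PDF- headers; it is a rough heuristic (the marker could in principle also occur inside binary stream data), and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PdfHeaderCounter {

    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Counts occurrences of the "%PDF-" header in the given bytes.
    // More than one hit suggests several complete PDF files were glued together.
    static int countPdfHeaders(byte[] data) {
        int count = 0;
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length && match; j++) {
                match = data[i + j] == MARKER[j];
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for two myInfo.getByteArray() results (not real PDFs).
        byte[] first = "%PDF-1.4\n...\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "%PDF-1.7\n...\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(first);
        out.write(second);
        System.out.println(countPdfHeaders(out.toByteArray())); // 2
    }
}
```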

How to convert InputStream to a PDF in Java, without damaging the file?

I have an InputStream which I would like to convert to a PDF, and save that PDF in a directory. Currently, my code is able to convert the InputStream to a PDF and the PDF does show up in the correct directory. However, when I try to open it, the file is damaged.
Here is the current code:
InputStream pAdESStream = signingServiceConnector.getDirectClient().getPAdES(this.statusReader.getStatusResponse().getpAdESUrl());
byte[] buffer = new byte[pAdESStream.available()];
pAdESStream.read(buffer);
File targetFile = new File(System.getProperty("user.dir") + "targetFile2.pdf");
OutputStream outStream = new FileOutputStream(targetFile);
outStream.write(buffer);
The InputStream originally comes from a PAdES file (https://en.wikipedia.org/wiki/PAdES). However, it should still be readable as a regular PDF.
Does anyone know how to convert the InputStream to a PDF, without getting a damaged PDF as a result?
Hello, it might be a bit late, but you can use the PDFBox API (or iText):
https://www.tutorialkart.com/pdfbox/create-write-text-pdf-file-using-pdfbox/
Here is a tutorial on the process. Good luck!
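A likely cause of the damage is the copy itself rather than the PAdES content: InputStream.available() only estimates how many bytes can be read without blocking (not the total length), a single read(buffer) may return fewer bytes than requested, and the output stream is never closed. In addition, System.getProperty("user.dir") + "targetFile2.pdf" drops the path separator, so the file lands in the parent directory under a mangled name. A minimal JDK-only sketch that copies the whole stream with Files.copy; the class name is hypothetical, and a byte array stands in for the stream returned by getPAdES(...).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SaveStreamToFile {

    // Copies the whole stream to the target file and closes the stream.
    // Files.copy keeps reading until end of stream, unlike available() + read().
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream src = in) {
            return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Paths.get(dir, name) inserts the separator between directory and file name.
        // The temp dir is used here only to keep the demo free of side effects;
        // the original code would use System.getProperty("user.dir").
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "targetFile2.pdf");
        // Stand-in for the stream returned by getPAdES(...).
        InputStream pAdESStream = new ByteArrayInputStream(
                "%PDF-1.7 demo".getBytes(StandardCharsets.US_ASCII));
        System.out.println(save(pAdESStream, target) + " bytes written to " + target);
    }
}
```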

Java byte[] to docx

I have a .doc file as a byte[].
Is it possible to convert it from a byte[] into a .docx file?
I tried just changing the file extension programmatically, but it does not work.
Any suggestions?
I generate the report using Eclipse BIRT.
Here is the code that saves the .doc:
// task is an IRunAndRenderTask
options = new RenderOptionBase();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
options.setOutputStream(bos);
options.setOutputFormat("doc");
if (parameters != null) {
    task.setParameterValues(parameters);
}
task.setRenderOption(options);
task.run();
return bos.toByteArray();
The problem is that we use BIRT 3.7, which does not support DocxRenderOption.
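Renaming cannot work because the formats use different containers: a legacy binary .doc is an OLE2 compound file, while a .docx is a ZIP package of XML parts. Depending on the emitter, a "doc" output may not even be a binary .doc (it may be XML, such as Word 2003 XML), so checking the leading bytes can tell which converter is needed. A minimal sketch; the class name and labels are hypothetical, while the OLE2 and ZIP signatures are the standard ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WordFormatSniffer {

    // First bytes of an OLE2 compound file, the container used by legacy .doc.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature "PK\3\4" that starts a ZIP file such as .docx.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // XML declaration, e.g. Word 2003 XML (no BOM handling in this sketch).
    private static final byte[] XML_MAGIC = "<?xml".getBytes(StandardCharsets.US_ASCII);

    static String sniff(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) return "OLE2 (.doc)";
        if (startsWith(data, ZIP_MAGIC)) return "ZIP (.docx or other OOXML)";
        if (startsWith(data, XML_MAGIC)) return "XML (e.g. Word 2003 XML)";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: WordFormatSniffer <file>");
            return;
        }
        System.out.println(sniff(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```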
Take a look at Aspose.Words for Java -- http://www.aspose.com/java/word-component.aspx
It has really good documentation too: http://www.aspose.com/docs/display/wordsjava/load+or+create+a+document
The code can be as simple as:
// Open a document.
Document doc = new Document("input.doc");
// Save document.
doc.save("output.docx");
Step 1: save the .doc bytes to a file.
Step 2: use this library to convert the file and save it as a .docx file.
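Step 1 can be done with the JDK alone by writing the BIRT byte[] to a temporary file whose path is then passed to the converter. A minimal sketch; the class and method names are hypothetical, and Step 2 is only indicated in a comment because it depends on Aspose.Words. Its Document class may also accept an InputStream (for example a ByteArrayInputStream), which would avoid the temporary file; check its API reference before relying on that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveReportBytes {

    // Step 1: write the report bytes to a temporary .doc file
    // so a file-based converter can open it by path.
    static Path saveToTempDoc(byte[] reportBytes) throws IOException {
        Path tmp = Files.createTempFile("report-", ".doc");
        Files.write(tmp, reportBytes);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] reportBytes = {1, 2, 3}; // stand-in for bos.toByteArray()
        Path doc = saveToTempDoc(reportBytes);
        System.out.println("Saved to " + doc);
        // Step 2 (Aspose.Words, as in the answer):
        // new Document(doc.toString()).save("output.docx");
        Files.deleteIfExists(doc); // clean up the demo file
    }
}
```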

Using PdfBox, how do I retrieve contents of PDDocument as a byte array?

I am currently using PdfBox as the driver for a pdf-file editor application.
I need the contents of the PdfBox representation of a pdf file (PDDocument) as a byte array.
Does anyone know how to do this?
I hope it's not too late...
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
document.save(byteArrayOutputStream);
document.close();
InputStream inputStream = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
And voilà! You now have both the byte array and an InputStream.
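If the bytes are needed in several places, the pattern from this answer can be wrapped in a small helper. The IOWriter interface and toBytes method are hypothetical names; the only PDFBox call assumed is PDDocument.save(OutputStream), which the answer already uses, so with PDFBox the call would be toBytes(document::save). A plain lambda stands in here so the sketch runs without PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SaveToBytes {

    // Like java.util.function.Consumer<OutputStream>, but allowed to throw
    // IOException, so a method reference such as document::save fits it.
    @FunctionalInterface
    interface IOWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Runs the writer against an in-memory buffer and returns the bytes.
    static byte[] toBytes(IOWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.writeTo(buffer);
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With PDFBox: byte[] pdf = toBytes(document::save); then close the document.
        byte[] bytes = toBytes(out -> out.write("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));
        InputStream in = new ByteArrayInputStream(bytes);
        System.out.println(bytes.length + " bytes, first byte: " + (char) in.read());
    }
}
```

As in the answer, the PDDocument should still be closed after saving; since it implements Closeable, a try-with-resources block around it is a natural fit.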
