Java PDFBox, how to get File object from PDDocument

I am trying to retrieve a File or InputStream instance from PDDocument without saving a PDDocument to the file system.
PDDocument doc= new PDDocument();
...
doc.save("D:\\document.pdf");
File f= new File("D:\\document.pdf");
Is there any method in PDFBox which returns File or InputStream from an existing PDDocument?

I solved it:
PDDocument doc=new PDDocument();
PDStream ps=new PDStream(doc);
InputStream is=ps.createInputStream();

I solved it this way (it creates a file, but in the temporary-file directory):
final PDDocument document = new PDDocument();
final File file = File.createTempFile(filename, ".pdf");
document.save(file);
and, if you need to, close the document afterwards:
document.close();
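If the temporary file is only needed long enough to read it back as a stream, the JDK can remove it once the stream is closed. The following is a minimal sketch using only java.nio; the class name TempFileStream and the method openSelfDeleting are made up for illustration, and a plain Files.write call stands in for document.save(file). Note that DELETE_ON_CLOSE is documented as a best-effort option.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TempFileStream {

    /** Opens a stream over the file; the file is removed (best effort) when the stream is closed. */
    static InputStream openSelfDeleting(Path file) throws IOException {
        return Files.newInputStream(file, StandardOpenOption.DELETE_ON_CLOSE);
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("document", ".pdf");
        // Stand-in for document.save(temp.toFile()); a real PDF would be written here.
        Files.write(temp, "placeholder".getBytes(StandardCharsets.US_ASCII));

        try (InputStream in = openSelfDeleting(temp)) {
            System.out.println("bytes available: " + in.available());
        }
        System.out.println("still exists: " + Files.exists(temp));
    }
}
```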

What if you first create the output stream?
PDDocument doc= new PDDocument();
File f= new File("D:\\document.pdf");
FileOutputStream fOut = new FileOutputStream(f);
doc.save(fOut);
Take a look at this
http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/PDDocument.html#save(java.io.OutputStream)

I am trying to retrieve a File or InputStream instance from PDDocument without saving a PDDocument to the file system.
[...]
Is there any method in PDFBox which returns File or InputStream from an existing PDDocument?
Obviously PDFBox cannot return a meaningful File object without saving a PDDocument to the file system.
It does not offer a method that provides an InputStream directly either, but it is easy to write code around it that does, e.g.:
InputStream docInputStream = null;
try ( ByteArrayOutputStream baos = new ByteArrayOutputStream();
PDDocument doc = new PDDocument() )
{
[...]
doc.save(baos);
docInputStream = new ByteArrayInputStream(baos.toByteArray());
}
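The same idea can be wrapped in a small reusable helper. This is only a sketch: StreamBridge, StreamWriter and toInputStream are hypothetical names, not part of PDFBox, and the demo writes a few placeholder bytes instead of a real document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamBridge {

    /** Hypothetical callback type: anything that can write itself to an OutputStream. */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the writer against an in-memory buffer and exposes the result as an InputStream. */
    static InputStream toInputStream(StreamWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.writeTo(buffer);
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a document: writes placeholder bytes instead of a real PDF.
        try (InputStream in = toInputStream(
                out -> out.write("%PDF-demo".getBytes(StandardCharsets.US_ASCII)))) {
            System.out.println("bytes available: " + in.available());
        }
    }
}
```

With PDFBox the call would presumably look like toInputStream(out -> doc.save(out)), assuming a version in which save(OutputStream) declares only IOException. The whole document is held in memory, so this suits small to medium PDFs.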


iText 7 Html to Pdf conversion and linking external file to the generated pdf

I am encountering an issue while merging two PDFs generated with iText.
I am new to iText 7.
I am creating one PDF from HTML and another PDF with an Excel (.xls) file embedded as an attachment.
I want to merge the two files.
Basically, I want to generate a PDF from HTML, attach an Excel document to it, and then write the combined result of these two PDFs to an output stream.
Below is the code I am using:
ByteArrayOutputStream htmlToPdfContent = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(htmlToPdfContent);
PdfDocument pdf = new PdfDocument(writer);
pdf.setTagged();
PageSize pageSize = PageSize.A4.rotate();
pdf.setDefaultPageSize(pageSize);
ConverterProperties properties = new ConverterProperties();
HtmlConverter.convertToPdf(htmlContent, pdf, properties);
FileUtils.cleanDirectory(new File(outputDir));
ByteArrayOutputStream pdfResult = new ByteArrayOutputStream();
PdfWriter writerResult = new PdfWriter(pdfResult);
PdfDocument pdfDocResult = new PdfDocument(writerResult);
PdfReader reader = new PdfReader(new ByteArrayInputStream(htmlToPdfContent.toByteArray()));
PdfDocument pdfDoc = new PdfDocument(reader);
pdfDoc.copyPagesTo(1, pdfDoc.getNumberOfPages(), pdfDocResult);
ByteArrayOutputStream pdfAttach = new ByteArrayOutputStream();
PdfDocument pdfLaunch = new PdfDocument(new PdfWriter(pdfAttach));
Rectangle rect = new Rectangle(36, 700, 100, 100);
byte[] embeddedFileContentBytes = Files.readAllBytes(Paths.get(excelPath));
PdfFileSpec fs = PdfFileSpec.createEmbeddedFileSpec(pdfLaunch, embeddedFileContentBytes, null, "test.xlsx", null, null);
PdfAnnotation attachment = new PdfFileAttachmentAnnotation(rect, fs)
.setContents("Click me");
pdfLaunch.addNewPage().addAnnotation(attachment);
PdfDocument appliedChanges = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfAttach.toByteArray())));
appliedChanges.copyPagesTo(1, appliedChanges.getNumberOfPages(), pdfDocResult);
try(OutputStream outputStream = new FileOutputStream(dest)) {
pdfResult.writeTo(outputStream);
}
This throws the following exception:
13:56:05.724 [main] ERROR com.itextpdf.kernel.pdf.PdfReader - Error occurred while reading cross reference table. Cross reference table will be rebuilt.
com.itextpdf.io.IOException: Error at file pointer 19,272.
at com.itextpdf.io.source.PdfTokenizer.throwError(PdfTokenizer.java:678)
at com.itextpdf.kernel.pdf.PdfReader.readXrefSection(PdfReader.java:801)
at com.itextpdf.kernel.pdf.PdfReader.readXref(PdfReader.java:774)
at com.itextpdf.kernel.pdf.PdfReader.readPdf(PdfReader.java:538)
at com.itextpdf.kernel.pdf.PdfDocument.open(PdfDocument.java:1818)
at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:238)
at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:221)
at com.mediaocean.prisma.order.command.infrastructure.pdf.itext.PdfAttachmentLaunch.main(PdfAttachmentLaunch.java:76)
Caused by: com.itextpdf.io.IOException: xref subsection not found.
... 8 common frames omitted
Exception in thread "main" com.itextpdf.kernel.PdfException: Trailer not found.
at com.itextpdf.kernel.pdf.PdfReader.rebuildXref(PdfReader.java:1064)
at com.itextpdf.kernel.pdf.PdfReader.readPdf(PdfReader.java:543)
at com.itextpdf.kernel.pdf.PdfDocument.open(PdfDocument.java:1818)
at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:238)
at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:221)
at com.mediaocean.prisma.order.command.infrastructure.pdf.itext.PdfAttachmentLaunch.main(PdfAttachmentLaunch.java:88)
13:56:05.773 [main] ERROR com.itextpdf.kernel.pdf.PdfReader - Error occurred while reading cross reference table. Cross reference table will be rebuilt.
com.itextpdf.io.IOException: PDF startxref not found.
at com.itextpdf.io.source.PdfTokenizer.getStartxref(PdfTokenizer.java:262)
at com.itextpdf.kernel.pdf.PdfReader.readXref(PdfReader.java:753)
at com.itextpdf.kernel.pdf.PdfReader.readPdf(PdfReader.java:538)
at com.itextpdf.kernel.pdf.PdfDocument.open(PdfDocument.java:1818)
at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:238)
at com.itextpdf.kernel.pdf.PdfDocument.<init>(PdfDocument.java:221)
at com.mediaocean.prisma.order.command.infrastructure.pdf.itext.PdfAttachmentLaunch.main(PdfAttachmentLaunch.java:88)
Please advise. Thanks in advance !!
Concerning revision 2 of your question
You changed your code differently than proposed in my answer to the first revision of your question: you now convert into the formerly unused PdfDocument pdf instead of directly into the ByteArrayOutputStream htmlToPdfContent.
This actually also is a possible fix of the problem identified in that answer. Thus, you don't get an exception here anymore:
PdfReader reader = new PdfReader(new ByteArrayInputStream(htmlToPdfContent.toByteArray()));
PdfDocument pdfDoc = new PdfDocument(reader);
Instead you now get an exception further down the flow, here:
PdfDocument appliedChanges = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfAttach.toByteArray())));
The reason is simple: you have not yet closed the PdfDocument pdfLaunch, which writes to the ByteArrayOutputStream pdfAttach, and only closing finalizes the PDF in the output stream. Thus, add the close():
ByteArrayOutputStream pdfAttach = new ByteArrayOutputStream();
PdfDocument pdfLaunch = new PdfDocument(new PdfWriter(pdfAttach));
[...]
pdfLaunch.addNewPage().addAnnotation(attachment);
pdfLaunch.close(); //<==== added
PdfDocument appliedChanges = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdfAttach.toByteArray())));
And you actually make the same mistake again shortly after: you store the contents of the ByteArrayOutputStream pdfResult to outputStream without closing the PdfDocument pdfDocResult, which writes to pdfResult. Thus, also add a close call there:
appliedChanges.copyPagesTo(1, appliedChanges.getNumberOfPages(), pdfDocResult);
pdfDocResult.close(); //<==== added
try(OutputStream outputStream = new FileOutputStream(dest)) {
pdfResult.writeTo(outputStream);
}
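The "close before you read the bytes" rule is not specific to iText; it applies to any writer that only completes its output on close. As a library-free illustration (an analogy using java.util.zip, not iText code; the class name CloseBeforeRead and the method tryDecode are made up), the sketch below takes one snapshot of the target buffer before close() and one after, and only the second can be read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class CloseBeforeRead {

    /** Tries to decode the buffer as GZIP; returns null if the data cannot be read completely. */
    static String tryDecode(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return null; // incomplete stream: the writer was never closed
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        GZIPOutputStream writer = new GZIPOutputStream(target);
        writer.write("hello".getBytes(StandardCharsets.UTF_8));

        // Snapshot taken too early, comparable to pdfAttach.toByteArray() before pdfLaunch.close().
        byte[] early = target.toByteArray();
        writer.close();
        byte[] complete = target.toByteArray();

        System.out.println("before close: " + tryDecode(early));
        System.out.println("after close:  " + tryDecode(complete));
    }
}
```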
Concerning revision 1 of your question
You use the ByteArrayOutputStream htmlToPdfContent as target of two distinct PDF generators, the PdfDocument pdf via the PdfWriter writer and the HtmlConverter.convertToPdf call:
ByteArrayOutputStream htmlToPdfContent = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(htmlToPdfContent);
PdfDocument pdf = new PdfDocument(writer);
pdf.setTagged();
PageSize pageSize = PageSize.A4.rotate();
pdf.setDefaultPageSize(pageSize);
ConverterProperties properties = new ConverterProperties();
HtmlConverter.convertToPdf(content, htmlToPdfContent, properties);
This makes the content of htmlToPdfContent a hodgepodge of the outputs of both of them, in particular not a valid PDF.
As you don't add any content to pdf, you can safely remove it and reduce the above excerpt to
ByteArrayOutputStream htmlToPdfContent = new ByteArrayOutputStream();
ConverterProperties properties = new ConverterProperties();
HtmlConverter.convertToPdf(content, htmlToPdfContent, properties);

Add Empty/Blank Page to PdfDocument java

Is there any way to add a blank page to an existing PdfDocument? I've created a method like this:
public void addEmptyPage(PdfDocument pdfDocument){
pdfDocument.addNewPage();
pdfDocument.close();
}
However, when I use it with a PdfDocument, it throws:
com.itextpdf.kernel.PdfException: There is no associate PdfWriter for making indirects.
at com.itextpdf.kernel.pdf.PdfObject.makeIndirect(PdfObject.java:228) ~[kernel-7.1.1.jar:?]
at com.itextpdf.kernel.pdf.PdfObject.makeIndirect(PdfObject.java:248) ~[kernel-7.1.1.jar:?]
at com.itextpdf.kernel.pdf.PdfPage.<init>(PdfPage.java:104) ~[kernel-7.1.1.jar:?]
at com.itextpdf.kernel.pdf.PdfDocument.addNewPage(PdfDocument.java:416) ~[kernel-7.1.1.jar:?]
Which is the correct way to insert a Blank page into a pdf document?
com.itextpdf.kernel.PdfException: There is no associate PdfWriter for making indirects.
That exception indicates that you initialize your PdfDocument with only a PdfReader, no PdfWriter. You don't show your PdfDocument instantiation code but I assume you do something like this:
PdfReader reader = new PdfReader(SOURCE);
PdfDocument document = new PdfDocument(reader);
Such documents are for reading only. (Actually you can do some minor manipulations but nothing as big as adding pages.)
If you want to edit a PDF, initialize your PdfDocument with both a PdfReader and a PdfWriter, e.g.
PdfReader reader = new PdfReader(SOURCE);
PdfWriter writer = new PdfWriter(DESTINATION);
PdfDocument document = new PdfDocument(reader, writer);
If you want to store the edited file at the same location as the original file,
you must not use the same file name as SOURCE in the PdfReader and as DESTINATION in the PdfWriter.
Either first write to a temporary file, close all participating objects, and then replace the original file with the temporary file:
PdfReader reader = new PdfReader("document.pdf");
PdfWriter writer = new PdfWriter("document-temp.pdf");
PdfDocument document = new PdfDocument(reader, writer);
...
document.close();
Path filePath = Path.of("document.pdf");
Path tempPath = Path.of("document-temp.pdf");
Files.move(tempPath, filePath, StandardCopyOption.REPLACE_EXISTING);
Or read the original file into a byte[] and initialize the PdfReader from that array:
PdfReader reader = new PdfReader(new ByteArrayInputStream(Files.readAllBytes(Path.of("document.pdf"))));
PdfWriter writer = new PdfWriter("document.pdf");
PdfDocument document = new PdfDocument(reader, writer);
...
document.close();
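The temporary-file variant can be factored into a generic helper. The following is a sketch with plain JDK calls only; ReplaceViaTempFile, Editor and editInPlace are hypothetical names, and the demo "edit" just copies the source and appends a marker where real code would run the PdfReader / PdfWriter pair.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ReplaceViaTempFile {

    /** Hypothetical callback: reads from source and writes the edited result to out. */
    @FunctionalInterface
    interface Editor {
        void edit(Path source, OutputStream out) throws IOException;
    }

    static void editInPlace(Path file, Editor editor) throws IOException {
        // Create the temp file next to the original so the final move stays in one directory.
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "edit-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(temp)) {
                editor.edit(file, out);
            } // the output is closed here, before the original is touched
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp); // nothing left to delete after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("document", ".txt");
        Files.write(file, "original".getBytes(StandardCharsets.UTF_8));

        // Stand-in edit: copy the source and append a marker.
        editInPlace(file, (source, out) -> {
            out.write(Files.readAllBytes(source));
            out.write(" + edited".getBytes(StandardCharsets.UTF_8));
        });

        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

If the edit fails halfway, the original file is still intact because it is only replaced after the temporary file has been written and closed.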

Creating PDF file in java using PDDocument results in corrupted PDF files

I'm trying to create temporary PDF files in Java using PDDocument. I'm employing the following method to create a temporary PDF file.
/* Create a temporary PDF file.*/
private File createPdf(String fileName) throws IOException {
final PDDocument document = new PDDocument();
final File file = File.createTempFile(fileName, ".pdf");
//write it
BufferedWriter bw = new BufferedWriter(new FileWriter(file));
bw.write("This is the temporary pdf file content");
bw.close();
document.save(file);
document.close();
return file;
}
This is the test.
@Test
public void testCreateAndMergePdfs() throws IOException {
Collection<File> pdfs = new ArrayList<>(Arrays.asList(createPdf("File1"), createPdf("File2")));
assertFalse(CollectionUtils.isEmpty(pdfs));
PdfPrintPojo pdfPrintPojo = new PdfPrintPojo(pdfs);
File mergedFile = service.createAndMergePDFs(pdfPrintPojo, "Merged");
assertNotNull(mergedFile);
List<File> list = new ArrayList<>(pdfs);
File file1 = list.get(0);
File file2 = list.get(1);
assertTrue(FileUtils.contentEquals(file1, file2));
}
What I'm trying to do here is to create and merge two PDF files. When I run the test, it creates two PDF files in the temp folder, for example, \AppData\Local\Temp\File16375814641476797612.pdf and \AppData\Local\Temp\File24102718409195239661.pdf and the merged file at \AppData\Local\Temp\Merged_merged_3755858389884894769.pdf. But the test fails at
assertTrue(FileUtils.contentEquals(file1, file2));
When I try to open the PDF files in the temp folder, it says that the PDF is corrupted. Also, I have no idea why the files are not being saved as File1 and File2. Can anyone help me with this?
Using the Apache PDFBox tutorial, I managed to create working PDF files. The method was changed as follows.
/* Create a temporary PDF file.*/
private File createPdf(String fileName) throws IOException {
// Create a document and add a page to it
final PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
// Create a new font object selecting one of the PDF base fonts
PDFont font = PDType1Font.HELVETICA_BOLD;
// Start a new content stream which will "hold" the to be created content
PDPageContentStream contentStream = new PDPageContentStream(document, page);
// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.newLineAtOffset(100, 700);
contentStream.showText("Hello World");
contentStream.endText();
// Make sure that the content stream is closed:
contentStream.close();
// Save the results and ensure that the document is properly closed:
File file = File.createTempFile(fileName, ".pdf");
document.save(file);
document.close();
return file;
}
As for the test, I took the approach of using PDDocument to load the files, then extract data as String using PDFTextStripper and using assertions on those Strings.
@Test
public void testCreateAndMergePdfs() throws IOException {
Collection<File> pdfs = new ArrayList<>(Arrays.asList(createPdf("File1"), createPdf("File2")));
assertFalse(CollectionUtils.isEmpty(pdfs));
PdfPrintPojo pdfPrintPojo = new PdfPrintPojo(pdfs);
File mergedFile = service.createAndMergePDFs(pdfPrintPojo, "Merged");
assertNotNull(mergedFile);
List<File> list = new ArrayList<>(pdfs);
/* Load the PDF files and extract data as String. */
PDDocument document1 = PDDocument.load(list.get(0));
PDDocument document2 = PDDocument.load(list.get(1));
PDDocument merged = PDDocument.load(mergedFile);
PDFTextStripper stripper = new PDFTextStripper();
String file1Data = stripper.getText(document1);
String file2Data = stripper.getText(document2);
String mergedData = stripper.getText(merged);
/* Assert that data from file 1 and 2 are equal with each other and merged file. */
assertEquals(file1Data, file2Data);
assertEquals(file1Data + file2Data, mergedData);
}
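As for the file names: File.createTempFile treats its first argument as a prefix and inserts generated characters before the suffix, which is why the files are not called File1.pdf and File2.pdf. A small sketch of the difference follows; TempFileNames and exactlyNamed are made-up names, and the helper only builds the path inside a fresh temporary directory without creating the file itself.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileNames {

    /** Returns a File with exactly the given name inside a fresh temporary directory. */
    static File exactlyNamed(String fileName) throws IOException {
        Path dir = Files.createTempDirectory("pdf-test");
        return dir.resolve(fileName).toFile();
    }

    public static void main(String[] args) throws IOException {
        // createTempFile puts a generated token between prefix and suffix.
        File generated = File.createTempFile("File1", ".pdf");
        System.out.println(generated.getName());

        // A dedicated temp directory keeps the exact name and still avoids collisions.
        File exact = exactlyNamed("File1.pdf");
        System.out.println(exact.getName());

        // Clean up the demo artifacts.
        generated.delete();
        exact.getParentFile().delete();
    }
}
```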
The way you compare the file contents could be changed. Could you try the following?
@Test
public void testCreateAndMergePdfs() throws IOException {
Assert.assertEquals(FileUtils.readLines(file1), FileUtils.readLines(file2));
}
Or you can try
byte[] file1Bytes = Files.readAllBytes(Paths.get("Path to File 1"));
byte[] file2Bytes = Files.readAllBytes(Paths.get("Path to File 2"));
String file1 = new String(file1Bytes, StandardCharsets.UTF_8);
String file2 = new String(file2Bytes, StandardCharsets.UTF_8);
assertEquals("The content in the strings should match", file1, file2);
Or
File first = new File("Path to File 1");
File second = new File("Path to File 2");
assertThat(first).hasSameContentAs(second);
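The suggestions above compare the files as text. For binary formats such as PDF, a byte-level comparison avoids any charset decoding. Here is a minimal sketch with JDK classes only (FileBytes and sameBytes are hypothetical names). Keep in mind that two PDFs generated separately may differ at the byte level even when they render the same, which is presumably why the other answer compares extracted text instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class FileBytes {

    /** Returns true only if both files contain exactly the same bytes. */
    static boolean sameBytes(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        Path first = Files.createTempFile("first", ".bin");
        Path second = Files.createTempFile("second", ".bin");
        Files.write(first, new byte[] {1, 2, 3});
        Files.write(second, new byte[] {1, 2, 3});
        System.out.println("identical: " + sameBytes(first, second));

        Files.write(second, new byte[] {1, 2, 4});
        System.out.println("after change: " + sameBytes(first, second));

        Files.delete(first);
        Files.delete(second);
    }
}
```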

PDF content into String from a blob

I want to extract PDF content into a String. I've done that previously using PDFBox, but it doesn't support many of the fonts in my PDFs.
I decided to use iText instead. How can I do that using getBinaryStream from a blob rather than a file on disk?
Blob blobPdf = ...;
File outputFile = new File("/tmp/blah/whatever.pdf");
FileOutputStream fout = new FileOutputStream(outputFile);
IOUtils.copy(blobPdf.getBinaryStream(), fout);
I want this sort of logic, but with the content ending up in a String variable instead. How can I do that?
#EDIT
This is my attempt
InputStream is = resultSet.getBinaryStream(3);
PdfReader reader = new PdfReader(is);
String text = PdfTextExtractor.getTextFromPage(reader, 1);
System.out.println(text);
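If a byte[] is more convenient than a stream, the blob's stream can be drained in memory without touching the disk. This is a sketch only: BlobBytes and readFully are made-up names, and a ByteArrayInputStream stands in for blobPdf.getBinaryStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BlobBytes {

    /** Drains the stream into memory and returns its bytes. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for blobPdf.getBinaryStream().
        InputStream blobStream = new ByteArrayInputStream(new byte[] {10, 20, 30});
        byte[] pdfBytes = readFully(blobStream);
        System.out.println("read " + pdfBytes.length + " bytes");
    }
}
```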

how to set attributes for existing pdf that contains only images using java itext?

I would like to set attributes to pdf before uploading it into a server.
Document document = new Document();
try
{
OutputStream file = new FileOutputStream({Localpath});
PdfWriter.getInstance(document, file);
document.open();
//Set attributes here
document.addTitle("TITLE");
document.close();
file.close();
} catch (Exception e)
{
e.printStackTrace();
}
But it's not working. The file is getting corrupted.
In a comment to another answer the OP clarified:
I want to set attributes to an existing pdf(not to create new pdf)
Obviously, though, his code creates a new document from scratch (as is obvious from the fact that a mere FileOutputStream is used to access the file, no reading, only writing).
To manipulate an existing PDF, one has to use a PdfReader / PdfStamper couple. Bruno Lowagie provided an example for that in his answer to the Stack Overflow question "iText setting Creation Date & Modified Date in sandbox.stamper.SuperImpose.java":
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
Map<String, String> info = reader.getInfo();
info.put("Title", "New title");
info.put("CreationDate", new PdfDate().toString());
stamper.setMoreInfo(info);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XmpWriter xmp = new XmpWriter(baos, info);
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
reader.close();
}
(ChangeMetadata.java)
As you see the code sets the metadata both in the ol'fashioned PDF info dictionary (stamper.setMoreInfo) and in the XMP metadata (stamper.setXmpMetadata).
Obviously src and dest should not be the same here.
Without a second file
In yet another comment the OP clarified that he had already tried a similar solution but that he wants to prevent the
Temporary existence of second file
This can easily be prevented by first reading the original PDF into a byte[] and then stamping to it as the target file. E.g. if File singleFile references the original file which is also to be the target file, you can implement:
byte[] original = Files.readAllBytes(singleFile.toPath());
PdfReader reader = new PdfReader(original);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(singleFile));
Map<String, String> info = reader.getInfo();
info.put("Title", "New title");
info.put("CreationDate", new PdfDate().toString());
stamper.setMoreInfo(info);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XmpWriter xmp = new XmpWriter(baos, info);
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
reader.close();
(UpdateMetaData test testChangeTitleWithoutTempFile)
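The order of operations is what makes this work: the file has to be read completely before the FileOutputStream is opened, because opening it truncates the file. A library-free sketch of just that ordering follows; OverwriteAfterRead and rewrite are hypothetical names, and a simple text replacement stands in for the stamping step.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

final class OverwriteAfterRead {

    static void rewrite(Path file, UnaryOperator<byte[]> transform) throws IOException {
        // Read everything first: once the FileOutputStream is opened, the file is truncated.
        byte[] original = Files.readAllBytes(file);
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            out.write(transform.apply(original));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("single", ".txt");
        Files.write(file, "old title".getBytes(StandardCharsets.UTF_8));

        // Stand-in for the stamping step: replace a word in the content.
        rewrite(file, bytes -> new String(bytes, StandardCharsets.UTF_8)
                .replace("old", "new")
                .getBytes(StandardCharsets.UTF_8));

        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The trade-off compared with a temporary file is that a failure during writing leaves the target truncated or partially written, so keeping the original bytes around until the write succeeds may be worthwhile.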
