I'm generating a PDDocument in Java with code like this...
HashMap<Integer, PDPageContentStream> mPageContentStreamMap = new HashMap<>();
PDDocument doc = new PDDocument();
for (int i = 1; i <= mNumPages; i++) {
    PDPage page = new PDPage(PDRectangle.A4);
    page.setRotation(90);
    PDPageContentStream pageContentStream = new PDPageContentStream(doc, page);
    mPageContentStreamMap.put(i, pageContentStream);
    doc.addPage(page);
}
Then later save and close the document like this...
for (int i : mPageContentStreamMap.keySet()) {
    mPageContentStreamMap.get(i).close();
}
doc.save("test-filename");
doc.close();
This works fine on the first run; however, when I run my program multiple times, I get the following error:
java.io.IOException: Scratch file already closed
at org.apache.pdfbox.io.ScratchFile.checkClosed(ScratchFile.java:390)
at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:78)
at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:403)
at org.apache.pdfbox.cos.COSStream.createOutputStream(COSStream.java:208)
at org.apache.pdfbox.pdmodel.common.PDStream.createOutputStream(PDStream.java:224)
at org.apache.pdfbox.pdmodel.PDPageContentStream.<init>(PDPageContentStream.java:259)
at org.apache.pdfbox.pdmodel.PDPageContentStream.<init>(PDPageContentStream.java:121)
If I re-run my program without the "doc.close();" line, this error goes away, but the output of the PDF is duplicated (i.e. a new PDF is generated, but with the content from the last PDF and the content from the current PDF).
Is there a way to close the stream and create multiple PDFs without running into the scratch file error?
It turned out that I had created a singleton object for my drawing logic, so after the first run the same objects were reused when they shouldn't have been, because the input (what was being drawn) had changed.
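The reuse problem described above can be sketched without PDFBox. In this minimal sketch, `DrawingJob` and `DrawingRunner` are hypothetical names (they are not from the original program); the job stands in for an object that owns per-run state, the way the real code owned a document and its content streams. The assumption is that creating one fresh job per run is an acceptable replacement for the shared singleton.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the drawing logic: it owns per-run state
// (a list of drawn items and a closed flag).
class DrawingJob {
    private final List<String> drawn = new ArrayList<>();
    private boolean closed = false;

    void draw(String item) {
        if (closed) {
            // Mirrors the "already closed" failure seen when a finished job is reused.
            throw new IllegalStateException("job already closed");
        }
        drawn.add(item);
    }

    // Marks the job as finished and returns a copy of what was drawn.
    List<String> finish() {
        closed = true;
        return new ArrayList<>(drawn);
    }
}

class DrawingRunner {
    // Creates one fresh job per run instead of sharing a singleton instance,
    // so nothing from a previous run can leak into the next one.
    static List<String> run(List<String> input) {
        DrawingJob job = new DrawingJob();
        for (String item : input) {
            job.draw(item);
        }
        return job.finish();
    }
}
```

With this shape, calling `DrawingRunner.run(...)` twice with different inputs yields two independent results, while calling `draw` on a job after `finish()` fails immediately instead of silently mixing old and new content.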
I am using version 2.0 of the PDFBox library. I need to open a PDF in a new browser tab, i.e. in print view.
We are migrating from iText to PDFBox; the existing iText code is below.
In iText, the PdfAction class is used to achieve this:
PdfAction action = new PdfAction(PdfAction.PRINTDIALOG);
and to apply the print JavaScript to the document:
copy.addJavaScript(action);
I need an equivalent solution with PDFBox.
Document document = new Document();
try{
outputStream=response.getOutputStream();
// step 2
PdfCopy copy = new PdfCopy(document, outputStream);
// step 3
document.open();
// step 4
PdfReader reader;
int n;
//add print dialog in Pdf Action to open file for preview.
PdfAction action = new PdfAction(PdfAction.PRINTDIALOG);
// loop over the documents you want to concatenate
Iterator i=mergepdfFileList.iterator();
while(i.hasNext()){
File f =new File((String)i.next());
is=new FileInputStream(f);
reader=new PdfReader(is);
n = reader.getNumberOfPages();
for (int page = 0; page < n; ) {
copy.addPage(copy.getImportedPage(reader, ++page));
}
copy.freeReader(reader);
reader.close();
is.close();
}
copy.addJavaScript(action);
// step 5
document.close();
}catch(IOException io){
throw io;
}catch(DocumentException e){
throw e;
}catch(Exception e){
throw e;
}finally{
outputStream.close();
}
I also tried the reference below, but could not find a print() method on the PDDocument type.
Reference Link
Please guide me on this.
This is how the file looks when displayed in a browser tab:
This code reproduces what your file has, a JavaScript action in the name tree in the JavaScript entry in the name dictionary in the document catalog. ("When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document" - PDF specification) There's probably an easier way to do this, e.g. with an OpenAction.
PDActionJavaScript javascript = new PDActionJavaScript("this.print(true);\n");
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDDocumentNameDictionary names = new PDDocumentNameDictionary(documentCatalog, new COSDictionary());
PDJavascriptNameTreeNode javascriptNameTreeNode = new PDJavascriptNameTreeNode();
Map<String, PDActionJavaScript> map = new HashMap<>();
map.put("0000000000000000", javascript);
javascriptNameTreeNode.setNames(map);
names.setJavascript(javascriptNameTreeNode);
document.getDocumentCatalog().setNames(names);
I have a pdf template which contains images and form fields.
I read this template and fill form fields per page and write to a temp pdf file. Then I read this file and copy to a master document to have multiple pages using the same template. Roughly as below:
Document masterDoc = ...
-- loop per page --
PdfWriter pfdWriter = new PdfWriter(tmpFileName);
PdfDocument pdf = new PdfDocument(new PdfReader(templateFile), pfdWriter);
Document doc = new Document(pdf);
// Set form fields
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
form.setDefaultJustification(0);
Map<String, PdfFormField> formFields = form.getFormFields();
formFields.get("key").setValue("value");
form.flattenFields();
doc.close();
try (PdfDocument resource = new PdfDocument(new PdfReader("pathToTmpFile"))) {
resource.copyPagesTo(1, 1, masterDoc.getPdfDocument());
}
-- end of loop
This approach is slow (depends on the template file size, but takes seconds not milliseconds).
Would it be possible to reuse the same template for every page without writing to and reading from temp files?
I read the documentation and guess it might be possible with a new-page event handler, but I couldn't figure it out.
I have downloaded a PDF file from a site, and on each page there is a hyperlink to this site in a rectangle. I want to remove the link from every page.
I am using PDFBox version 2.0.8.
I figured out that the link description is located in the Annots array of every page of the document. I deleted the Annots entry corresponding to the link. Of course, I set the needToBeUpdated flag to true on every node in the chain from the PDF catalog.
In debug mode I see that the readOnly flag is set to true in the AccessPermission object.
When I open the edited PDF file, all pages are empty, and for every page Acrobat Reader shows the following error:
There was an error processing a page. Invalid Function resource.
I have several questions:
Can I programmatically change the PDF file when the readOnly flag is set to true in the AccessPermission object?
Why do I get the error described above?
What do I need to do to remove the unnecessary link from each page so that every page displays properly in the PDF document?
Here is my code (sorry for the quality, this is only a draft):
File book = new File(path_to_pdf_file);
PDDocument document = PDDocument.load(book);
document.setAllSecurityToBeRemoved(true);
COSDictionary dictionary = document.getDocumentCatalog().getCOSObject();
dictionary.removeItem(COSName.PERMS);
dictionary.setNeedToBeUpdated(true);
((COSObject) document.getDocumentCatalog().getCOSObject().getItem(COSName.PAGES)).setNeedToBeUpdated(true);
dictionary = document.getDocumentCatalog().getPages().getCOSObject();
dictionary.setNeedToBeUpdated(true);
COSArray arr = (COSArray) dictionary.getDictionaryObject(COSName.KIDS);
arr.setNeedToBeUpdated(true);
COSArray arrayForLoop;
COSDictionary tempDic;
for (int k = 0; k < arr.size(); ++k) {
COSObject object = (COSObject) arr.get(k);
object.setNeedToBeUpdated(true);
dictionary = (COSDictionary) object.getObject();
dictionary.setNeedToBeUpdated(true);
arrayForLoop = (COSArray) dictionary.getItem(COSName.ANNOTS);
arrayForLoop.setNeedToBeUpdated(true);
arrayForLoop = (COSArray) arrayForLoop.getCOSObject();
arrayForLoop.setNeedToBeUpdated(true);
dictionary = (COSDictionary) arrayForLoop.get(0);
dictionary.setNeedToBeUpdated(true);
dictionary.removeItem(COSName.TYPE);
dictionary.removeItem(COSName.SUBTYPE);
dictionary.removeItem(COSName.RECT);
dictionary.removeItem(COSName.BORDER);
tempDic = (COSDictionary) dictionary.getItem(COSName.A);
tempDic.setNeedToBeUpdated(true);
dictionary.removeItem(COSName.A);
}
document.saveIncremental(new FileOutputStream(path_to_save_file));
document.close();
In the code above I iterate over every page and delete the Annots entry that corresponds to the link. I also used the saveIncremental method, which is why I marked all modified nodes from leaf to root as needing an update.
Thank you for your answers.
I created a PDF file with the data from an Excel file. I am not sure what happened, but when I tried to remove one PdfPage and insert it somewhere else, I got the warning "The removing page has already been flushed".
The code is quite simple:
PdfDocument pdf = ...;
....
PdfPage page = pdf.removePage(10);
pdf.addPage(1, page);
But I got a warning and an error:
[main] WARN com.itextpdf.kernel.pdf.PdfPage - The removing page has already been flushed.
Exception in thread "main" com.itextpdf.kernel.PdfException: flushed.page.cannot.be.added.or.inserted
at com.itextpdf.kernel.pdf.PdfDocument.checkAndAddPage(PdfDocument.java:1473)
at com.itextpdf.kernel.pdf.PdfDocument.addPage(PdfDocument.java:437)
I have tried the code above with some other PDF files, and removing and inserting pages works there. What could be the reason it fails with my PDF file?
Complete code used in my application:
PdfWriter writer;
PdfDocument pdfDocument;
Document document;
try {
writer = new PdfWriter(FileConfigurator.getAbsoluteResultFilePath(),
new WriterProperties().addXmpMetadata().setPdfVersion(PdfVersion.PDF_1_7));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
pdfDocument = new PdfDocument(writer);
// Initialize document
document = new Document(pdfDocument);
I have two PDF files (A1.pdf and B1.pdf). I want to replace some pages of the second file (B1.pdf) with pages from the first one (A1.pdf) programmatically. I am using the PDFBox library.
Here is my sample code:
try {
File file = new File("/Users/test/Desktop/A1.pdf");
PDDocument pdDoc = PDDocument.load(file);
PDDocument document = PDDocument.load(new File("/Users/test/Desktop/B1.pdf"));
document.removePage(3);
document.addPage((PDPage) pdDoc.getDocumentCatalog().getAllPages().get(0));
document.save("/Users/test/Desktop/"+"generatedPDFBox"+".pdf");
document.close();
}catch(Exception e){}
The idea is to replace the 3rd page. With this implementation, the page is instead appended after the last page of the output PDF. Can anyone help me implement this? If it is not possible with PDFBox, could you please suggest another Java library?
This solution creates a third PDF file with the contents you asked for. Note that page indexes are zero-based, so the "3" in your question must be a "2".
PDDocument a1doc = PDDocument.load(file1);
PDDocument b1doc = PDDocument.load(file2);
PDDocument resDoc = new PDDocument();
List<PDPage> a1Pages = a1doc.getDocumentCatalog().getAllPages();
List<PDPage> b1Pages = b1doc.getDocumentCatalog().getAllPages();
// replace the 3rd page of the 2nd file with the 1st page of the 1st one
for (int p = 0; p < b1Pages.size(); ++p)
{
if (p == 2)
resDoc.addPage(a1Pages.get(0));
else
resDoc.addPage(b1Pages.get(p));
}
resDoc.save(file3);
a1doc.close();
b1doc.close();
resDoc.close();
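The page-selection logic in the loop above does not depend on PDFBox and can be checked in isolation. The following is only a sketch: `PageReplaceSketch` and `replaceAt` are hypothetical names, and any `List` can stand in for the page lists.

```java
import java.util.ArrayList;
import java.util.List;

class PageReplaceSketch {
    // Returns a new list equal to target, except that the element at
    // zeroBasedIndex is replaced. The input list is left untouched.
    static <T> List<T> replaceAt(List<T> target, int zeroBasedIndex, T replacement) {
        if (zeroBasedIndex < 0 || zeroBasedIndex >= target.size()) {
            throw new IndexOutOfBoundsException("no page at index " + zeroBasedIndex);
        }
        List<T> result = new ArrayList<>(target.size());
        for (int p = 0; p < target.size(); ++p) {
            // Same decision as in the answer's loop: swap at the chosen index, copy otherwise.
            result.add(p == zeroBasedIndex ? replacement : target.get(p));
        }
        return result;
    }
}
```

For example, replacing index 2 in a four-element list of placeholders "B1" to "B4" with "A1" gives [B1, B2, A1, B4], which makes the zero-based indexing explicit: index 2 is the third page.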
If you want to work from the command line instead, look here:
https://pdfbox.apache.org/commandline/
Then use PDFSplit and PDFMerger.
I am not too familiar with how PDFBox works, but to answer your follow-up question: you can accomplish what you want in a fairly simple manner with the Datalogics APDFL SDK. A free trial exists in case you want to look into it. Here is a code snippet so you can see how it would be done:
Document Doc1 = new Document("/Users/test/Desktop/A1.pdf");
Document Doc2 = new Document("/Users/test/Desktop/B1.pdf");
/* Delete pages on the page range 3-3*/
Doc2.deletePages(3, 3);
/* LastPage is where in Doc2 you want to insert the page, Doc1 the document from which the page is coming from, 0 is the page number in Doc1 that will be inserted first, 1 is the number of pages that will be inserted (beginning from the page number specified in the previous parameter), and PageInsertFlags which would let you customize what gets / doesn't get copied */
Doc2.insertPages(Document.LastPage, Doc1, 0, 1, PageInsertFlags.All);
Doc2.save(EnumSet.of(SaveFlags.FULL), "out.pdf");
Alternatively, there is another method called replacePages which makes the deletion unnecessary. It all depends on what your end goal is, of course.