Exception when adding a PdfFormField to a big PDF - java

I am adding a PdfTextFormField over a Table cell using a custom renderer, as per the iText7 example code in CreateFormInTable.java. This works initially, until I create a Table on page 3 or later of the PDF, at which point I'm getting an exception:
Caused by: java.lang.NullPointerException
at com.itextpdf.kernel.pdf.PdfDictionary.get(PdfDictionary.java:552)
at com.itextpdf.kernel.pdf.PdfDictionary.getAsArray(PdfDictionary.java:156)
at com.itextpdf.kernel.pdf.PdfPage.getAnnotations(PdfPage.java:746)
at ...pdf.annot.PdfAnnotation.getPage(PdfAnnotation.java:435)
at ...forms.fields.PdfFormField.regenerateField(PdfFormField.java:1761)
at ...forms.fields.PdfFormField.setValue(PdfFormField.java:1038)
at ...forms.fields.PdfFormField.setValue(PdfFormField.java:999)
at ...forms.fields.PdfFormField.setValue(PdfFormField.java:994)
etc.
It seems fairly easy to reproduce, and I can provide a full code sample if you want, but a simple way to see the problem is to insert:
for (int i=1; i < 2; i++) // Change 2 to 3 and you get an NPE
{
Paragraph para = new Paragraph("Page "+ i);
doc.add( para );
doc.add( new AreaBreak( AreaBreakType.NEXT_PAGE ) );
}
straight after the Document constructor in the aforementioned iText7 Java sample file at:
http://developers.itextpdf.com/examples/form-examples/clone-create-fields-table#2350-createformintable.java
I've tested it on 7.0.1 and 7.0.2, with same result.

Well, currently some of the form-related functionality requires the whole PDF document structure to be in memory to operate. This means that no object can be flushed. But layout's DocumentRenderer flushes the pages when possible. The problem reproduces only for three or more pages because there is a small "window" of unflushed pages.
This is indeed not mentioned in the sample and can be improved in the future. In the current version, to get the desired PDF, you can set the Document to operate in "postpone flushing" mode using the following constructor:
Document doc = new Document(pdfDoc, PageSize.A4, false);

Related

Getting OutOfMemoryError with PDFBox Annotation constructAppearances() method

In a Nutshell
I've been working on a program that gets a pdf, highlights some words (via pdfbox Mark Annotation) and saves the new pdf.
In order to these annotations be visible on some viewers like pdf.js, it's needed to call the pdAnnotationTextMarkup.constructAppearances() before adding the mark annotation into the page Annotation list.
However, by doing so, I get an OutOfMemoryError when dealing with huge documents that would contain thousands of mark annotations.
I'd like to know if there's a way to prevent this from happening.
(this is a kind of a sequel of this ticket, but that's not much relevant for this one)
Technical Specification:
PDFBox 2.0.17
Java 11.0.6+10, AdoptOpenJDK
MacOS Catalina 10.15.2, 16gb, x86_64
My Code
//my pdf has 216 pages
for (int pageIndex = 0; pageIndex < numberOfPages; pageIndex++) {
PDPage page = document.getPage(pageIndex);
List<PDAnnotation> annotations = page.getAnnotations();
// each coordinate obj represents a hl annotation. crashing with 7.816 elements
for (CoordinatePoint coordinate : coordinates) {
PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
txtMark.setRectangle(pdRectangle);
txtMark.setQuadPoints(quadPoints);
txtMark.setColor(getColor());
txtMark.setTitlePopup(coordinate.getHintDescription());
txtMark.setReadOnly(true);
// this is what makes everything visible on pdf.js and what causes the Java heap space error
txtMark.constructAppearances();
annotations.add(txtMark);
}
}
Current Result
This is the heavy pdf doc that is leading to the issue:
https://pdfhost.io/v/I~nu~.6G_French_Intensive_Care_Society_International_congress_Ranimation_2016.pdf
My program tries to add 7.816 annotations to it throughout 216 pages.
and the stacktrace:
[main] INFO highlight.PDFAnnotation - Highlighting 13613_2016_Article_114.pdf...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.pdfbox.io.ScratchFile.<init>(ScratchFile.java:128)
at org.apache.pdfbox.io.ScratchFile.getMainMemoryOnlyInstance(ScratchFile.java:143)
at org.apache.pdfbox.cos.COSStream.<init>(COSStream.java:61)
at org.apache.pdfbox.pdmodel.interactive.annotation.handlers.PDAbstractAppearanceHandler.createCOSStream(PDAbstractAppearanceHandler.java:106)
at org.apache.pdfbox.pdmodel.interactive.annotation.handlers.PDHighlightAppearanceHandler.generateNormalAppearance(PDHighlightAppearanceHandler.java:136)
at org.apache.pdfbox.pdmodel.interactive.annotation.handlers.PDHighlightAppearanceHandler.generateAppearanceStreams(PDHighlightAppearanceHandler.java:59)
at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup.constructAppearances(PDAnnotationTextMarkup.java:175)
at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup.constructAppearances(PDAnnotationTextMarkup.java:147)
at highlight.PDFAnnotation.drawHLAnnotations(PDFAnnotation.java:288)
I've already tried to increase my jvm xmx and xms parameters to like -Xmx10g -Xms10g, which only postponed the crash a little bit.
What I Want
I want to prevent this memory issue and still be able to see my annotations in pdf.js viewer. Without calling constructAppearances the process is much more faster, I don't have this issue, but the annotations can only be seen on some pdf viewers, like Adobe.
Any suggestions? Am I doing anything wrong here or missing something?
In the upcoming version 2.0.19, construct the appearances like this:
annotation.constructAppearances(document);
In 2.0.18 and earlier, you need to initialize the appearance handler yourself:
setCustomAppearanceHandler(new PDHighlightAppearanceHandler(annotation, document));
That line can be removed in 2.0.19 as this is the default appearance handler.
Why all this? So that the document common memory space ("scratch file") is used in the annotation handler instead to create a new one each time (which is big). The later is done when calling new COSStream() instead of document.getDocument().createCOSStream().
All this is of course only important when doing many annotations.
related PDFBox issues: PDFBOX-4772 and PDFBOX-4080

iText 5.5.11 - bold text looks blurry after using PdfCleanUpProcessor

I need to remove some content from an existing pdf created with Jasper Reports in iText 5.5.11 but after running PdfCleanUpProcessor all bold text is blurry.
This is the code I'm using:
PdfReader reader = new PdfReader("input.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("output.pdf"));
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
cleanUpLocations.add(new PdfCleanUpLocation(1, new Rectangle(0f, 0f, 595f, 680f)));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
As already discussed here downgrading to itext-5.5.4 solves the problem, but in my case itext-5.5.11 is already in use for other reasons and so downgrading is not an option.
Is there another solution or workaround?
This are the pdf files before and after cleaning: BEFORE - AFTER
By comparing the before and after files it becomes clear that for some reason the PdfCleanUpProcessor falsely drops general graphics state operations (at least w, J, and d).
In your before document in particular the w operation is important for the text because a poor man's bold variant is used, i.e. instead of using an actual bold font the normal font is used and the text rendering mode is set to not only fill the glyph contours but also draw a line along it giving it a bold'ish appearance.
The width of that line is set to 0.23333 using a w operation. As that operation is missing in the after document, the default width value of 1 is used. Thus, the line along the contour now is 4 times as big as before resulting in a very fat appearance.
This issue has been introduced in commit d5abd23 (dated May 4th, 2015) which (among other things) added this block to PdfCleanUpContentOperator.invoke:
} else if (lineStyleOperators.contains(operatorStr)) {
if ("w" == operatorStr) {
cleanUpStrategy.getContext().setLineWidth(((PdfNumber) operands.get(0)).floatValue());
} else if ("J" == operatorStr) {
cleanUpStrategy.getContext().setLineCapStyle(((PdfNumber) operands.get(0)).intValue());
} else if ("j" == operatorStr) {
cleanUpStrategy.getContext().setLineJoinStyle(((PdfNumber) operands.get(0)).intValue());
} else if ("M" == operatorStr) {
cleanUpStrategy.getContext().setMiterLimit(((PdfNumber) operands.get(0)).floatValue());
} else if ("d" == operatorStr) {
cleanUpStrategy.getContext().setLineDashPattern(new LineDashPattern(((PdfArray) operands.get(0)),
((PdfNumber) operands.get(1)).floatValue()));
}
disableOutput = true;
This causes all lineStyleOperators to be dropped while at the same time an attempt was made to store the changed values in the cleanup strategy context. But of course using == for String comparisons in Java usually is a very bad idea, so since this version the line style operators were dropped for good in iText.
Actually this code had been ported from iTextSharp, and in C# == for the string type works entirely different; nonetheless, even in the iTextSharp version these stored values at first glance only seem to have been taken into account if paths were stroked, not if text rendering included stroking along the contour.
Later on in commit 9967627 (on the same day as the commit above) the inner if..else if..else.. has been removed with the comment Replaced PdfCleanUpGraphicsState with existing GraphicsState from itext.pdf.parser package, added missing parameters into the latter, only the disableOutput = true remained. This (also at first glance) appears to have fixed the difference between iText/Java and iTextSharp/.Net, but the line style values still are not considered if text rendering included stroking along the contour.
As a work-around consider removing the lines
} else if (lineStyleOperators.contains(operatorStr)) {
disableOutput = true;
from PdfCleanUpContentOperator.invoke. Now the line style operators are not dropped anymore and the text in your PDF after redaction looks like before. I have not checked for any side effects, though, so please test with a number of documents before even considering using that work-around in production.

APACHE POI autoSizeColumn with useMergedCells too slow

I try to apply autoSizeColumn on an Excel sheet. I'm using POI 3.10.1.
I apply autoSizeColumn at the end but the problem is than the process is too slow/long anyway:
On the sheet I've approximately 1000 lines and 20 columns... After 5 hours, I kill the process ...
I don't understand what is taking so long, 1000 lines and 20 columns doesn't appear so huge: Did I miss something? (nb: on a smaller file, it's working)
My simplified code below:
Workbook vWorkbook = getWorkbook();
Sheet vSheet = vWorkbook.createSheet("sheet");
CreationHelper vHelper = vWorkbook.getCreationHelper();
Drawing drawing = vSheet.createDrawingPatriarch();
Set<CellRangeAddress> vRegions = new HashSet<CellRangeAddress>();
//Parcours des lignes du document
MatrixDocument vMatrixDocument = getMatrixDocument();
List<MatrixRow> vListMatrixRows = vMatrixDocument.getRows();
int maxColNb = 0;
//Parcours des lignes de la grille.
for (MatrixRow vMatrixRow : vListMatrixRows)
{
//(...)
//create cells
//(...)
}
initColSpan(vListMatrixRows, vRegions);
//Gestion des colSpan et des RowSpan
for (CellRangeAddress vRegion : vRegions)
{
vSheet.addMergedRegion(vRegion);
}
for (int i = 0; i < maxColNb; ++i)
{
vSheet.autoSizeColumn(i, true);//Here the problem. spent more than 5 hour for 1000 lines and 20 columns
}
I've already read threads below :
http://stackoverflow.com/questions/16943493/apache-poi-autosizecolumn-resizes-incorrectly
http://stackoverflow.com/questions/15740100/apache-poi-autosizecolumn-not-working-right
http://stackoverflow.com/questions/23366606/autosizecolumn-performance-effect-in-apache-poi
http://stackoverflow.com/questions/18984785/a-poi-related-code-block-running-dead-slow
http://stackoverflow.com/questions/28564045/apache-poi-autosizecolumn-behaving-weird
http://stackoverflow.com/questions/18456474/apache-poi-autosizecolumn-is-not-working
But none solve my issue.
Any idea ?
PS : I tried to upload an example image of the Excel file but I don't find how to upload it.
Even after using Apache POI 3.12, I was facing the same issue for auto-sizing. Also auto-size doesn't work in Unix/Linux.
What I learnt from various forums is this:
1.You can try is using the SXSSF API, usually works much faster.
2. If not, then go for setColumnWidth method(I know its literally manual work for 20 columns)
My solution was only use Autosize on the last Excel line, like this:
if (i >= abaExcel.Itens.Count)
sheet.AutoSizeColumn(j);
Because merged regions cannot overlap without producing a corrupt document, POI may be checking the list of merged regions on a sheet for potential intersections before adding a merged region. This gives O(N) behavior for adding one region instead of the expected O(1).
addMergedRegion - with checking but slow
addMergedRegionUnsafe - without checking but fast
Documentation: read more about addMergedRegionUnsafe(...)

JPivot Display Mondrian Result

I am trying to display the result of a Mondrian query using JPivot. Many examples are showing how to use the tag library for JSP but I need to use the Java API, I looked at the documentation but I cannot understand how to use it to display the results in the table. Here is my code
Query query = connection.parseQuery(mdxQuery);
Result result = connection.execute(query);
result.print(new PrintWriter(System.out,true));
I would like to know if I can use the result object to build the jpivot table.
Thanks in advance!
First of all, using JPivot
is a pretty bad idea.
It was discontinued back in 2008.
There is a good project which is intended to replace the JPivot called Pivot4j. Despite it is currently under development (0.8 -> 0.9 version), Pivot4j can actually do the business.
However, if we're talking about your case:
result.print(new PrintWriter(System.out,true));
This string prints the HTML code with OLAP cube into your System.out.
You can write the HTML code in some output stream (like FileOuputStream), and then display it.
OutputStream out = new FileOutputStream("result.html");
result.print(new PrintWriter(out, true));
//then display this file in a browser
However, if you want to have the same interface as in JPivot, I don't think there is an easy way to do it without .jsp. In these case I strongly recommend you to try Pivot4j.
Good luck!

Printing Multiple PDFs from Java as a single print job (physical printing)

I would like to print multiple pdfs from java (using the java print service) in a single print job.
I would like to send multiple pdfs as a single job to the printer. This is so that all the documents in my 'batch' print together and are not interleaved with someone else's print jobs when I go pick them up from the printer.
A batch potentially consists of 1000s of print jobs.
I tried jpedal, but it does not support java.awt.print.Book
Book book = new Book();
PdfDecoder pdfDecoder = readFileApplyOptions("C:/Temp/singlepagetest.pdf", pageFormat);
book.append(pdfDecoder, pageFormat);
PdfDecoder pdfDecoderTwo = readFileApplyOptions("C:/Temp/printfax-test.pdf",pageFormat);
book.append(pdfDecoderTwo, pageFormat);
printJob.setPageable(book);
printJob.print();
only prints out the first pdf. How do I print multiple pdfs in a single job?
readFileAndApplyOptions() basically creates a new PdfDecoder object and returns it.
I also tried Sun's PDFRenderer PDFRenderer in a similar fashion (using the Book object), but my code still only prints out the first page only.
Has anyone encountered a similar issue before? Is there a solution I might be missing?
Not Java specific, but I've experience of this one in C#. I solved it by printing each document to a file (programmatically equivalent to checking the "PrintToFile" checkbox on a print dialog), then concatenated each file into a memory stream, which I passed to the Win32 API printer spool in raw format (since the output to file was already correctly formatted by default).
You might be able to use a similar technique in Java
I met the same difficulties when printing at once several JPanels of a JTabbedPane, each on a separate page. I gather them in a Book but it only prints the first page.
The Book class works well (right number of pages), but I suppose the problem comes from setPageable. Since a Book is not a Printable, I made it, and it works !
Workaround:
Design a PrintableBook class : extends Book, implements Printable
public class PrintableBook extends Book implements Printable {
Vector<Printable> pages;// NB: we assume pages are single
public PrintableBook() {
super();
pages = new Vector<Printable>();
}
public void add(Printable pp) {
append(pp, pp.getPageFormat());
pages.add(pp);
}
public int print(Graphics g, PageFormat pf, int pageIndex) {
if (pageIndex >= pages.size())
return NO_SUCH_PAGE;
else {
Printable pp = pages.elementAt(pageIndex);
return pp.print(g, pf, 0);
}
}
}
Then use printJob.setPrintable( printableBook ) instead of setPageable
You should merge all of your pdf documents in one document using iText library and then print the merged document page by page.
see Print a PDF Document in Java
AFAIK, you can't, multiple documents will be printed in multiple jobs.
A workaround could be join all the pdf into a single document and print them.
:-/

Categories