Getting OutOfMemoryError with PDFBox Annotation constructAppearances() method

In a Nutshell
I've been working on a program that takes a PDF, highlights some words (via PDFBox text markup annotations) and saves the new PDF.
For these annotations to be visible in some viewers such as pdf.js, pdAnnotationTextMarkup.constructAppearances() must be called before the markup annotation is added to the page's annotation list.
However, doing so causes an OutOfMemoryError on huge documents that would contain thousands of markup annotations.
I'd like to know if there's a way to prevent this from happening.
(This is something of a sequel to this ticket, but that is not very relevant here.)
Technical Specification:
PDFBox 2.0.17
Java 11.0.6+10, AdoptOpenJDK
MacOS Catalina 10.15.2, 16gb, x86_64
My Code
// my PDF has 216 pages
for (int pageIndex = 0; pageIndex < numberOfPages; pageIndex++) {
    PDPage page = document.getPage(pageIndex);
    List<PDAnnotation> annotations = page.getAnnotations();
    // each coordinate object represents one highlight annotation; crashes with 7,816 elements
    for (CoordinatePoint coordinate : coordinates) {
        PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
        txtMark.setRectangle(pdRectangle);
        txtMark.setQuadPoints(quadPoints);
        txtMark.setColor(getColor());
        txtMark.setTitlePopup(coordinate.getHintDescription());
        txtMark.setReadOnly(true);
        // this is what makes everything visible in pdf.js and what causes the Java heap space error
        txtMark.constructAppearances();
        annotations.add(txtMark);
    }
}
Current Result
This is the heavy PDF document that leads to the issue:
https://pdfhost.io/v/I~nu~.6G_French_Intensive_Care_Society_International_congress_Ranimation_2016.pdf
My program tries to add 7,816 annotations to it across 216 pages.
And the stack trace:
[main] INFO highlight.PDFAnnotation - Highlighting 13613_2016_Article_114.pdf...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.pdfbox.io.ScratchFile.<init>(ScratchFile.java:128)
at org.apache.pdfbox.io.ScratchFile.getMainMemoryOnlyInstance(ScratchFile.java:143)
at org.apache.pdfbox.cos.COSStream.<init>(COSStream.java:61)
at org.apache.pdfbox.pdmodel.interactive.annotation.handlers.PDAbstractAppearanceHandler.createCOSStream(PDAbstractAppearanceHandler.java:106)
at org.apache.pdfbox.pdmodel.interactive.annotation.handlers.PDHighlightAppearanceHandler.generateNormalAppearance(PDHighlightAppearanceHandler.java:136)
at org.apache.pdfbox.pdmodel.interactive.annotation.handlers.PDHighlightAppearanceHandler.generateAppearanceStreams(PDHighlightAppearanceHandler.java:59)
at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup.constructAppearances(PDAnnotationTextMarkup.java:175)
at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup.constructAppearances(PDAnnotationTextMarkup.java:147)
at highlight.PDFAnnotation.drawHLAnnotations(PDFAnnotation.java:288)
I've already tried increasing my JVM heap settings to -Xmx10g -Xms10g, which only postponed the crash a little.
What I Want
I want to prevent this memory issue and still be able to see my annotations in the pdf.js viewer. Without calling constructAppearances the process is much faster and I don't have this issue, but the annotations can only be seen in some PDF viewers, like Adobe.
Any suggestions? Am I doing anything wrong here or missing something?

In the upcoming version 2.0.19, construct the appearances like this:
annotation.constructAppearances(document);
In 2.0.18 and earlier, you need to initialize the appearance handler yourself:
annotation.setCustomAppearanceHandler(new PDHighlightAppearanceHandler(annotation, document));
That line can be removed in 2.0.19, as this is the default appearance handler there.
Why all this? So that the annotation handler uses the document's common memory space (the "scratch file") instead of creating a new one each time (which is big). The latter happens when new COSStream() is called instead of document.getDocument().createCOSStream().
All this is of course only important when creating many annotations.
Related PDFBox issues: PDFBOX-4772 and PDFBOX-4080

Related

Why does java.awt.Font.getStringBounds give different result on different machines?

I have an application that generates PDF reports (using JasperReports). However, when I run the application on my development laptop, the text fields have a slightly different size than when I generate the exact same report on the server. I eventually reduced the issue to the following code:
final Font font = Font.createFont(
Font.TRUETYPE_FONT,
MyTest.class.getResourceAsStream("/fonts/europa/Europa-Bold.otf")
).deriveFont(10f);
System.out.println(font);
System.out.println(font.getStringBounds(
"Text",
0,
4,
new FontRenderContext(null, true, true)
));
On my Laptop this prints:
java.awt.Font[family=Europa-Bold,name=Europa-Bold,style=plain,size=10]
java.awt.geom.Rectangle2D$Float[x=0.0,y=-9.90999,w=20.080002,h=12.669988]
On the server this prints:
java.awt.Font[family=Europa-Bold,name=Europa-Bold,style=plain,size=10]
java.awt.geom.Rectangle2D$Float[x=0.0,y=-7.6757812,w=20.06897,h=10.094452]
As you can see I actually ship the font file with the application, so I believe that there is no chance that both machines actually work with a different font.
I would have guessed that under these conditions the output of getStringBounds is system-independent. Obviously it is not. What could possibly cause the difference?

Disclaimer: I'm not a font development expert, just sharing my experience.
Yes, the result depends on native code. Even the newer JavaFX WebView, for example, depends on WebKit.
If you step through getStringBounds in a debugger, you will reach a point where the font manager factory decides which concrete font manager to load; the class name is taken from the system property sun.font.fontmanager.
Source code of sun.font.FontManagerFactory:
...
private static final String DEFAULT_CLASS;
static {
if (FontUtilities.isWindows) {
DEFAULT_CLASS = "sun.awt.Win32FontManager";
} else if (FontUtilities.isMacOSX) {
DEFAULT_CLASS = "sun.font.CFontManager";
} else {
DEFAULT_CLASS = "sun.awt.X11FontManager";
}
}
...
public static synchronized FontManager getInstance() {
...
String fmClassName = System.getProperty("sun.font.fontmanager", DEFAULT_CLASS);
}
Those DEFAULT_CLASS values are consistent with your observation ("Obviously it is not. What could possibly cause the difference?"): the font manager is chosen per platform.
The value of sun.font.fontmanager may be sun.awt.X11FontManager on some *nix systems, but it can be null on Windows, for instance, in which case the manager will be sun.awt.Win32FontManager.
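To check which value a given machine reports, a small probe along these lines can be run on both the laptop and the server. This is only a sketch: the class name FontManagerProbe and the helper describeFontManager are made up for illustration, and the "fnt manager:" label simply mirrors the output format used in the results in this answer.

```java
class FontManagerProbe {

    // Hypothetical helper: reports the sun.font.fontmanager system property.
    // When the property is unset, string concatenation yields the text "null".
    static String describeFontManager() {
        return "fnt manager: " + System.getProperty("sun.font.fontmanager");
    }

    public static void main(String[] args) {
        System.out.println(describeFontManager());
        // The OS name helps to see which DEFAULT_CLASS branch would apply.
        System.out.println("os.name: " + System.getProperty("os.name"));
    }
}
```

Comparing the two lines of output from both machines shows whether they even use the same font manager before looking at the bounds themselves.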
Each manager may in turn depend on a different underlying shaping/rendering engine or implementation (this may help).
The main reason could be the nature of fonts: they are mostly vector data, so depending on the platform and environment, rendered text can come out bigger or smaller. For example, Windows may apply desktop ClearType and the screen text size (DPI) setting to the requested text rendering.
It seems that even with the same sun.awt.X11FontManager on two machines, the results can vary (this may help too).
If you try the sample code on online compilers, you will see varying results.
Result on Ideone (https://ideone.com/AuQvMV): the code could not run, but stderr has some interesting information:
java.lang.UnsatisfiedLinkError: /opt/jdk/lib/libfontmanager.so: libfreetype.so.6: cannot open shared object file: No such file or directory
at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2430)
at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2487)
at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2684)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2638)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:827)
at java.base/java.lang.System.loadLibrary(System.java:1902)
at java.desktop/sun.font.FontManagerNativeLibrary$1.run(FontManagerNativeLibrary.java:57)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:310)
at java.desktop/sun.font.FontManagerNativeLibrary.<clinit>(FontManagerNativeLibrary.java:32)
at java.desktop/sun.font.SunFontManager$1.run(SunFontManager.java:270)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:310)
at java.desktop/sun.font.SunFontManager.<clinit>(SunFontManager.java:266)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:415)
at java.desktop/sun.font.FontManagerFactory$1.run(FontManagerFactory.java:82)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:310)
at java.desktop/sun.font.FontManagerFactory.getInstance(FontManagerFactory.java:74)
at java.desktop/java.awt.Font.getFont2D(Font.java:497)
at java.desktop/java.awt.Font.getFamily(Font.java:1410)
at java.desktop/java.awt.Font.getFamily_NoClientCode(Font.java:1384)
at java.desktop/java.awt.Font.getFamily(Font.java:1376)
at java.desktop/java.awt.Font.toString(Font.java:1869)
at java.base/java.lang.String.valueOf(String.java:3042)
at java.base/java.io.PrintStream.println(PrintStream.java:897)
at Ideone.main(Main.java:19)
Note the failed load of libfreetype, which is a native (C) font rendering library.
Result of coding ground (https://www.tutorialspoint.com/compile_java_online.php)
fnt manager: sun.awt.X11FontManager
java.awt.Font[family=Dialog,name=tahoma,style=plain,size=10]
java.awt.geom.Rectangle2D$Float[x=0.0,y=-9.282227,w=22.09961,h=11.640625]
Result of jdoodle (https://www.jdoodle.com/online-java-compiler/)
fnt manager: sun.awt.X11FontManager
java.awt.Font[family=Dialog,name=tahoma,style=plain,size=10]
java.awt.geom.Rectangle2D$Float[x=0.0,y=-9.839991,w=24.0,h=12.569988]
My machine
fnt manager: null
java.awt.Font[family=Tahoma,name=tahoma,style=plain,size=10]
java.awt.geom.Rectangle2D$Float[x=0.0,y=-10.004883,w=19.399414,h=12.0703125]
My Story (may help, you may try)
I had a similar issue some years ago, where text rendering with exactly the same embedded font failed on macOS/JDK 8 for complex text (lots of ligatures). Not just the sizes were off; ligatures, kerning, etc. were broken as well.
I fixed my issue (I cannot remember whether the sizing was fixed, but the broken ligatures certainly were) using another workaround, as follows:
InputStream is = Main.class.getResourceAsStream(fontFile);
Font newFont = Font.createFont(Font.TRUETYPE_FONT, is);
GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(newFont);
// later, load the font by constructing a Font instance
Font f = new Font(name/*name of the embedded font*/, style, size);
Registering the font with GraphicsEnvironment and then instantiating it via the Font constructor fixed our issue, so you may want to give it a try.
Solution
In the end, I dropped the JDK font stack for good (it is a real pain in the neck) and moved to a native HarfBuzz (shaping) + FreeType (rendering) implementation, which gave me peace of mind.
So...
• You may treat your production server as the reference for font advances and rendering (the easy way), and validate results against it rather than against the dev machine.
• Or use a standalone, cross-platform (and probably native) font shaping/rendering engine to make sure dev and production results are the same.

iText 5.5.11 - bold text looks blurry after using PdfCleanUpProcessor

I need to remove some content from an existing pdf created with Jasper Reports in iText 5.5.11 but after running PdfCleanUpProcessor all bold text is blurry.
This is the code I'm using:
PdfReader reader = new PdfReader("input.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("output.pdf"));
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
cleanUpLocations.add(new PdfCleanUpLocation(1, new Rectangle(0f, 0f, 595f, 680f)));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
As already discussed here, downgrading to itext-5.5.4 solves the problem, but in my case itext-5.5.11 is already in use for other reasons, so downgrading is not an option.
Is there another solution or workaround?
These are the PDF files before and after cleaning: BEFORE - AFTER

By comparing the before and after files it becomes clear that for some reason the PdfCleanUpProcessor falsely drops general graphics state operations (at least w, J, and d).
In your before document in particular the w operation is important for the text because a poor man's bold variant is used, i.e. instead of using an actual bold font the normal font is used and the text rendering mode is set to not only fill the glyph contours but also draw a line along it giving it a bold'ish appearance.
The width of that line is set to 0.23333 using a w operation. As that operation is missing in the after document, the default width value of 1 is used. Thus, the line along the contour is now more than 4 times as wide as before (1 / 0.23333 ≈ 4.3), resulting in a very fat appearance.
This issue has been introduced in commit d5abd23 (dated May 4th, 2015) which (among other things) added this block to PdfCleanUpContentOperator.invoke:
} else if (lineStyleOperators.contains(operatorStr)) {
if ("w" == operatorStr) {
cleanUpStrategy.getContext().setLineWidth(((PdfNumber) operands.get(0)).floatValue());
} else if ("J" == operatorStr) {
cleanUpStrategy.getContext().setLineCapStyle(((PdfNumber) operands.get(0)).intValue());
} else if ("j" == operatorStr) {
cleanUpStrategy.getContext().setLineJoinStyle(((PdfNumber) operands.get(0)).intValue());
} else if ("M" == operatorStr) {
cleanUpStrategy.getContext().setMiterLimit(((PdfNumber) operands.get(0)).floatValue());
} else if ("d" == operatorStr) {
cleanUpStrategy.getContext().setLineDashPattern(new LineDashPattern(((PdfArray) operands.get(0)),
((PdfNumber) operands.get(1)).floatValue()));
}
disableOutput = true;
This causes all lineStyleOperators to be dropped while at the same time an attempt was made to store the changed values in the cleanup strategy context. But of course using == for String comparisons in Java usually is a very bad idea, so since this version the line style operators were dropped for good in iText.
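The == pitfall can be reproduced in isolation. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration and are not iText code. It compares a string literal with an equal string created at run time, which is roughly the situation of an operator name read from a content stream.

```java
class StringEqualityDemo {

    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Value comparison: true if both strings contain the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "w";
        // Built at run time, so it is a distinct object from the literal.
        String parsed = new String(new char[] {'w'});

        System.out.println(sameReference(literal, parsed)); // prints false
        System.out.println(sameContent(literal, parsed));   // prints true
    }
}
```

With "w".equals(operatorStr) in place of "w" == operatorStr, the branches in the block above would be reached for run-time strings as well.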
Actually, this code had been ported from iTextSharp, and in C# == on the string type works entirely differently (it compares values); nonetheless, even in the iTextSharp version these stored values at first glance only seem to have been taken into account if paths were stroked, not if text rendering included stroking along the contour.
Later on in commit 9967627 (on the same day as the commit above) the inner if..else if..else.. has been removed with the comment Replaced PdfCleanUpGraphicsState with existing GraphicsState from itext.pdf.parser package, added missing parameters into the latter, only the disableOutput = true remained. This (also at first glance) appears to have fixed the difference between iText/Java and iTextSharp/.Net, but the line style values still are not considered if text rendering included stroking along the contour.
As a work-around consider removing the lines
} else if (lineStyleOperators.contains(operatorStr)) {
disableOutput = true;
from PdfCleanUpContentOperator.invoke. Now the line style operators are not dropped anymore and the text in your PDF after redaction looks like before. I have not checked for any side effects, though, so please test with a number of documents before even considering using that work-around in production.

Exception when adding a PdfFormField to a big PDF

I am adding a PdfTextFormField over a Table cell using a custom renderer, as per the iText7 example code in CreateFormInTable.java. This works initially, until I create a Table on page 3 or later of the PDF, at which point I'm getting an exception:
Caused by: java.lang.NullPointerException
at com.itextpdf.kernel.pdf.PdfDictionary.get(PdfDictionary.java:552)
at com.itextpdf.kernel.pdf.PdfDictionary.getAsArray(PdfDictionary.java:156)
at com.itextpdf.kernel.pdf.PdfPage.getAnnotations(PdfPage.java:746)
at ...pdf.annot.PdfAnnotation.getPage(PdfAnnotation.java:435)
at ...forms.fields.PdfFormField.regenerateField(PdfFormField.java:1761)
at ...forms.fields.PdfFormField.setValue(PdfFormField.java:1038)
at ...forms.fields.PdfFormField.setValue(PdfFormField.java:999)
at ...forms.fields.PdfFormField.setValue(PdfFormField.java:994)
etc.
It seems fairly easy to reproduce, and I can provide a full code sample if you want, but a simple way to see the problem is to insert:
for (int i=1; i < 2; i++) // Change 2 to 3 and you get an NPE
{
Paragraph para = new Paragraph("Page "+ i);
doc.add( para );
doc.add( new AreaBreak( AreaBreakType.NEXT_PAGE ) );
}
straight after the Document constructor in the aforementioned iText7 Java sample file at:
http://developers.itextpdf.com/examples/form-examples/clone-create-fields-table#2350-createformintable.java
I've tested it on 7.0.1 and 7.0.2, with same result.

Well, currently some of the form-related functionality requires the whole PDF document structure to be in memory to operate. This means that no object can be flushed. But layout's DocumentRenderer flushes the pages when possible. The problem reproduces only for three or more pages because there is a small "window" of unflushed pages.
This is indeed not mentioned in the sample and can be improved in the future. In the current version, to get the desired PDF, you can set the Document to operate in "postpone flushing" mode using the following constructor:
Document doc = new Document(pdfDoc, PageSize.A4, false);

Xtext ecore file size limited to 50kb?

The ecore file of my Xtext project exceeds 50 KB.
The workflow generation always works fine, but when I start the editor it crashes.
If I comment out some grammar rules, reducing the ecore file size to less than 50 KB, it works well. But as soon as it exceeds that limit, the following exception arises:
!MESSAGE com.sample.mydsl.ui.internal.MyDslActivator - Failed to create injector for com.sample.mydsl.MyDsl
...
Caused by: java.lang.RuntimeException: Missing serialized package: myDsl.ecore
at com.sample.mydsl.myDsl.impl.MyDslPackageImpl.loadPackage(MyDslPackageImpl.java:5897)
at com.sample.mydsl.myDsl.impl.MyDslPackageImpl.init(MyDslPackageImpl.java:1084)
at com.sample.mydsl.myDsl.MyDslPackage.<clinit>(MyDslPackage.java:58)
I am pretty sure that it's not the logic of the rules themselves, because I also tested by reducing the grammar to a working state and then extending it with mock rules to increase the file size. It crashed anyway...
I guess the problem lies deeper than the exception message shows.
My workflow is configured as follows:
fragment = parser.antlr.XtextAntlrGeneratorFragment auto-inject {
options = {
classSplitting=true
fieldsPerClass = "500"
methodsPerClass = "500"
}
}
The same settings are used for XtextAntlrUiGeneratorFragment.
Has anyone run into this problem before? I would be very grateful for any suggestions.

Exception created : java.lang.OutOfMemoryError

I have made some modifications to the code of an existing application. While testing, I am getting "Exception created : java.lang.OutOfMemoryError", but the error occurs only once in a while. Below is the code snippet where the error occurs:
}else if(subject.equals("Mobile")){
to=(String)hashMap.get("M_MOBILETOMAIL");
m_mobileoptionvalue=(String)parameters.get("m_mobileoptionvalue");
m_mobileq1value=(String)parameters.get("m_mobileq1value");
StringTokenizer m_tokenizer1 = new StringTokenizer(m_mobileq1value,"|");
while (m_tokenizer1.hasMoreTokens()){
m_mobileq1List.add(m_tokenizer1.nextToken());
}
m_mobileq2value=(String)parameters.get("m_mobileq2value");
StringTokenizer m_tokenizer2 = new StringTokenizer(m_mobileq2value,"|");
while (m_tokenizer2.hasMoreTokens()){
m_mobileq2List.add(m_mobileq2value);
}
m_mobileq3value=(String)parameters.get("m_mobileq3value");
StringTokenizer m_tokenizer3 = new StringTokenizer(m_mobileq3value,"|");
while (m_tokenizer3.hasMoreTokens()){
m_mobileq3List.add(m_mobileq3value);
}
m_mobileq4value=(String)parameters.get("m_mobileq4value");
m_mobileq4=(String)parameters.get("m_mobileq4");
}
The error I am getting is on the line
m_mobileq2List.add(m_mobileq2value);
I am also attaching the JVM logs:
exception created in one of the service methods of the servlet MailSend in application interact_assorted_intapp7. Exception created : java.lang.OutOfMemoryError
at java.util.ArrayList.newElementArray(ArrayList.java:94)
at java.util.ArrayList.growAtEnd(ArrayList.java:375)
at java.util.ArrayList.add(ArrayList.java:158)
at com.international.servlets.MailSend.doPost(MailSend.java:473)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:738)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:831)
I went through a few related posts but did not find a proper solution. Also, increasing the heap size is not an option.

while (m_tokenizer2.hasMoreTokens()){
m_mobileq2List.add(m_mobileq2value);
}
You are never moving your tokenizer forward, so once this condition is true it stays true, and the loop keeps adding the same string to your list until memory runs out. Try
while (m_tokenizer2.hasMoreTokens()){
m_mobileq2List.add(m_tokenizer2.nextToken());
}
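As a self-contained illustration of the corrected loop, here is a minimal sketch; the class name, the helper splitOnPipe and the sample input are made up, and only the StringTokenizer usage mirrors the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerLoopDemo {

    // Splits a pipe-delimited value into its tokens.
    static List<String> splitOnPipe(String value) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(value, "|");
        while (tokenizer.hasMoreTokens()) {
            // nextToken() consumes one token, so hasMoreTokens() eventually returns false.
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnPipe("yes|no|maybe")); // prints [yes, no, maybe]
    }
}
```

The same change applies to the m_tokenizer3 loop in the question, which has the identical problem.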

If you are running out of memory and you can't increase the heap size then all you can do is try and use less memory.
Attach a profiler of some kind to your application (most IDEs have one built in) and look at where the memory is going and what you can do to reduce it, or remove any potential resource leaks you may have.
It's also worth running FindBugs against your project and seeing if that finds anything. Again, it's available as a plugin for most IDEs.

The issue in your code is an infinite while loop. Change your code to
m_mobileq2List.add(m_tokenizer2.nextToken());
Also set all your Strings to null after use. Go for StringBuffer or StringBuilder instead of Strings whenever possible. If you are using any input/output streams, close them after use and set them to null. Setting large objects to null saves a lot of memory.
