Apache PDFBox Removes Horizontal Lines When Converting to PNG

I have a PDF whose horizontal and vertical lines disappear when I render it to a PNG. This is the PDF and what it should look like: https://drive.google.com/file/d/1sAXwnaoZ-QJn1Kbpw85hhzV_X5zwgfkA/view?usp=sharing
And here is the PNG of the PDF using PDFBox 2.0.13:
Why are those lines removed and how can I get them to be rendered in the PNG?

The problem (most likely) is that you have no Java ImageIO plugin for the JBIG2 image format installed: the missing lines and headings are actually JBIG2 images.
When I run the PDFBox PDF Debugger without such a plugin and open your PDF in it, it does not display the missing parts either; having added such a plugin to its classpath, it suddenly does display them.
For more details on the PDFBox dependencies, please read the PDFBox 2.0 Dependencies page. In particular:
JAI Image I/O
PDF supports embedded image files, however support for some formats require third party libraries which are distributed under terms incompatible with the Apache 2.0 license:
Reading JBIG2 images: JBIG2 ImageIO
Reading JPEG 2000 (JPX) images: JAI Image I/O Tools Core
Writing TIFF images requires JAI Image I/O Tools Core also.
These libraries are optional and will be loaded if present on the classpath, otherwise support for these image formats will be disabled and a warning will be logged when an unsupported image is encountered.
Maven dependencies for these components can be found in parent/pom.xml. Change the scope of the components if needed. Please make sure that any third party licenses are suitable for your project.
To include the JBIG2 library the following part can be included in your project pom.xml:
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.0</version>
</dependency>
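To confirm whether a reader for a given format is actually visible at runtime, a quick check against the standard ImageIO registry can help. This is only a minimal sketch: the class name ImageReaderCheck is made up for illustration, and "JBIG2" and "jpeg2000" are assumed to be the format names the optional plugins register.

```java
import javax.imageio.ImageIO;

final class ImageReaderCheck {

    /** Returns true if at least one ImageIO reader is registered for the format name. */
    static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // "png" ships with the JDK; the other two names are assumptions about
        // what the optional JBIG2 and JPEG 2000 plugins register.
        for (String format : new String[] {"png", "JBIG2", "jpeg2000"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

If the JBIG2 line prints false in the same JVM that runs the rendering, the plugin is probably not on that classpath.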

Related

Apache FOP: fox:external-document and fo:external-graphic

I've got a problem with Apache FOP when creating PDF files. I'm trying to include an image as a header and another PDF as an attachment.
When I run my application (compiled with Java 8) on Windows, it creates a PDF with the image header and the PDF attachment; when I run the same jar on an AS400 machine, I get a PDF with only the image header and no attachment.
My jar is created with the shade plugin. Is there a particular order of dependencies in the pom that gives the same result on Windows and AS400?
Thanks in advance.

Fop vs xmlgraphics jar conflict

I have a heavy monolithic guidewire application that has both fop.jar and xmlgraphics-commons.jar.
I am trying to add an image to a PDF file, which is supposed to be rendered with xmlgraphics-commons.jar. This jar has about 5 or 6 image renderers available.
When I try to print this PDF, the fop jar is used to render some other text onto the PDF, and for the image, the xmlgraphics-commons.jar within the fop jar is used. This particular version of the same jar somehow has just 2 image decoders, and although I tried to match the format (SVG/WMF), it still doesn't work. I get a 'No ImagePreloader found' error. The GW application works correctly on my local machine, but when deployed to WebSphere, it gives this error. It may have something to do with classloader policies.
I cannot use a custom class loader as I don't have control over the code here because the java call happens internally as I am using .pcf files for pdf printing.
I examined the fop and xmlgraphics-commons jars and can clearly see the difference in the META-INF/services files.
What can I do to get this working? Maybe there is a simple solution I am missing.
I can only work with these jars.
I checked some similar questions but they don't help me. The thread below might also help you understand the crux of my issue better.
Apache FOP in a Java Applet - No ImagePreloader found for data
I tried updating the 'org.apache.xmlgraphics.image.loader.spi.ImageLoader' file in META-INF/services of fop to point to the implementation from my external xmlgraphics jar, but got the same error.
If this application had been a maven one, I would have simply excluded this xmlgraphics within fop and used the external one.
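One way to see which service files each deployment really exposes is to ask the class loader for every copy of the service descriptor. The following is a diagnostic sketch only: the class name ServiceFileLister is invented here, and it assumes the preloaders are declared under META-INF/services/org.apache.xmlgraphics.image.loader.spi.ImagePreloader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ServiceFileLister {

    /** Lists every META-INF/services descriptor for the interface visible to the loader. */
    static List<URL> findServiceFiles(ClassLoader loader, String serviceInterface) {
        try {
            return Collections.list(loader.getResources("META-INF/services/" + serviceInterface));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed service interface name; adjust if your jars declare another one.
        String service = "org.apache.xmlgraphics.image.loader.spi.ImagePreloader";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (URL url : findServiceFiles(loader, service)) {
            System.out.println(url); // each URL shows which jar contributed a descriptor
        }
    }
}
```

Comparing the output locally and on the server should show whether the external xmlgraphics-commons jar is visible to the class loader that FOP uses.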

Trouble loading .png file using LibGDX

It loads the files badlogic.jpg and icon.png, but it won't load the start.png or stop.png files, even though they are in the same folder.
I am using Android Studio.

libGDX uses stb_image for image loading.
One of the limitations of stb_image is that it does not support 16-bit-per-channel PNGs, so you need to use an 8-bit .png image.
https://github.com/libgdx/libgdx/blob/master/gdx/jni/gdx2d/stb_image.h#L24
You may check your image by ImageMagick. You can also use pngcheck for the same.
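If neither tool is at hand, the bit depth can also be read straight from the PNG header, where it sits in the byte after the width and height fields of the IHDR chunk. A minimal sketch (the class name PngBitDepth is made up for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

final class PngBitDepth {
    private static final long PNG_SIGNATURE = 0x89504E470D0A1A0AL;
    private static final int IHDR = 0x49484452; // ASCII "IHDR"

    /** Returns the bits per channel stored in the IHDR chunk of the PNG data. */
    static int bitDepth(byte[] png) {
        if (png.length < 25) {
            throw new IllegalArgumentException("Too short to be a PNG");
        }
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, as PNG requires
        if (buf.getLong(0) != PNG_SIGNATURE) {
            throw new IllegalArgumentException("Missing PNG signature");
        }
        if (buf.getInt(12) != IHDR) {
            throw new IllegalArgumentException("First chunk is not IHDR");
        }
        // Layout: 8 signature + 4 length + 4 type + 4 width + 4 height, then bit depth.
        return png[24] & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PngBitDepth.java <file.png>");
            return;
        }
        System.out.println("bit depth: " + bitDepth(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

A result of 16 for start.png or stop.png would point to the limitation described above.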

How to configure BIRT Report Engine to load fonts directly from the classpath?

I am writing a Java application that uses BIRT to produce reports. I want to package custom fonts in a jar file and be able embed them in PDF reports.
I could extract fonts to the file system first and then point BIRT to the file system locations, but I wonder whether it is possible to configure BIRT to load fonts directly from the classpath?
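For reference, the file-system fallback mentioned above could look roughly like the sketch below. The class name ClasspathResourceExtractor and the resource path passed on the command line are hypothetical; only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ClasspathResourceExtractor {

    /** Copies a classpath resource to a temp file and returns the path of that file. */
    static Path extractToTempFile(Class<?> anchor, String resourcePath, String suffix) {
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resourcePath);
            }
            Path target = Files.createTempFile("font-", suffix);
            target.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ClasspathResourceExtractor.java </path/on/classpath.ttf>");
            return;
        }
        // The printed location is what a font configuration would then point to.
        System.out.println(extractToTempFile(ClasspathResourceExtractor.class, args[0], ".ttf"));
    }
}
```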

I consulted the source code of BIRT and found that it is impossible to configure BIRT to register embeddable fonts from the classpath. BIRT registers fonts by the paths specified in fontsConfig.xml. It uses iText's FontFactory. Surprisingly, FontFactory itself can register fonts from the classpath. But the developers of BIRT probably don't know about this feature, so BIRT doesn't register any font that is not on the file system, i.e. when File#exists() returns false.
Fortunately, FontFactory.register() is a static method, so there is a workaround: we can register fonts ourselves bypassing BIRT. We can do just the following before initializing BIRT:
FontFactory.register("/com/example/fonts/font1.ttf");
FontFactory.register("/com/example/fonts/font2.ttf");
I tried this, and fonts are correctly embedded in the PDF output.

Many thanks @dened.
Using your answer, I found that custom fonts can also be loaded as resources by:
Copying the fonts into the resources folder (e.g. src/main/resources for a Maven project)
In the BIRT engine initialisation code, registering the fonts without specifying the path. Just use the filenames, e.g.:
import com.lowagie.text.FontFactory;
...
FontFactory.register("gillsans.ttf");
FontFactory.register("GILLUBCD.TTF");
The FontFactory will search for and find these files in the resources folder. This works for BIRT runtime 4.4.2 with iText v2.1.7.
This seems to be a good way to load non-standard fonts into the BIRT runtime engine so that they work in generated PDFs. If this approach is used, the fonts don't need to be added to the system fonts folder or the JRE/lib/fonts folder, and the fontsConfig.xml files in BIRT's jars don't need to be edited... Everything is contained within the application.

Using Solr CELL's ExtractingRequestHandler to index/extract files from package formats

Can you use ExtractingRequestHandler and Tika with any of
the compressed file formats (zip, tar, gz, etc) to extract the content out for indexing?
I am sending Solr the archived.tar file using curl:
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true" -H 'Content-type:application/octet-stream' --data-binary "@/home/archived.tar"
The result I get when I query the document is that the file names inside the
archive are indexed as the "body_texts", but the content of those files is
not extracted or included. This is not the behavior I expected. Ref:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example.
When I send 1 of the actual documents inside the archive using the same curl
command the extracted content is then stored in the "body_texts" field. Am
I missing a step for the compressed files?
I have added all the extraction dependencies as indicated by mat in
http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
am able to successfully extract data from MS Word, PDF, HTML documents.
I'm using the following library versions.
Solr 1.40, Solr Cell 1.4.1, with Tika Core 0.4
Given everything I have read this version of Tika should support extracting
data from all files within a compressed file. Any help or suggestions would
be appreciated.

The short answer: Solr Cell 1.4.1 and Tika Core 0.6.
The long answer: After a lot of headaches I was able to get this working. I'll answer it both for people using Solr directly and for people using Solr with the Ruby library sunspot (which was my case).
Here is what I did: I used the https://github.com/tomasc/sunspot_cell plugin to extend sunspot and give it the attachment feature. (Ignore this step if you're not using Ruby/sunspot.)
v1.4.1 works for individual files but not for compressed files, so I had to explore a bit. I downloaded the v1.4.1 codebase from http://lucene.apache.org/solr/ and grabbed dist/apache-solr-cell-1.4.1.jar. Then I had to pull down the Tika libraries from the 1.5 branch: http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev/contrib/extraction/lib/.
You can download each library individually, or you can use svn to check out the whole branch:
svn co http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.5-dev
Or just check out the library folder:
svn co http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.5-dev/contrib/extraction/lib/
