I tried to print a PDF document from Java using PDFRenderer and ICEpdf.
In both cases some of the text came out rotated by 180 degrees while the images stayed correct.
With PDFRenderer all the text is rotated, and with ICEpdf only some of the lines.
Any idea why this is happening?
Sounds like a bug in pdf renderer. Have you tried their bug tracker? https://pdf-renderer.dev.java.net/servlets/ProjectIssues
Perhaps the PDF is broken. Also try to open the PDF using Ghostscript. Acrobat is too indulgent with broken or malformed PDF files (it automatically corrects some "syntax" errors in the PDF structure).
Sounds like the font you are using in the document isn't available on the system. I had the very same problem with java-generated eps files.
Related
The task is to write a Java program that analyzes a PDF file. PDFBox from Apache should be used.
The number of words, the number of images, the names of the fonts used, etc. are all no problem.
My problem is: how do I get all the font sizes used in the PDF file? I have read in many places that I have to use PDFTextStripper and writeString, but I don't see a solution.
So how do I get the font sizes in pt for a PDF file? Does anyone have an idea or a solution?
I used org.apache.pdfbox fontbox 2.0.22 and org.apache.pdfbox pdfbox 2.0.22 to convert a PDF to pictures. The conversion works normally on CentOS and Windows, but on CentOS on ARM the converted pictures are abnormal.
Below are the links to the full pictures.
https://drive.google.com/file/d/1IAa_VuHXA592AK_fkE6ur5wLq6QJ8KZe/view?usp=sharing
https://drive.google.com/file/d/1rMJtaA4CL5yFQLCRcgBJF2aVe0AUsik4/view?usp=sharing
On CentOS on ARM, with jdk-11.0.10_linux-aarch64_bin.tar.gz, the converted pictures come out correctly, thanks to the help of Tilman Hausherr.
I'm working on an e-learning project. I have PDF files and I have to validate whether each PDF file uses an RGB or a CMYK color profile.
If the color profile found is RGB, the file must be rejected. I have tried a lot but have not found suitable logic or an answer.
If anyone has an idea how I can do this in iText or another Java PDF library, please suggest it.
iText is mainly for producing PDF files programmatically (e.g. converting HTML to PDF, or producing PDF reports). It does not render pages, so you can't use it to check the color of a rendered pixel.
To check the color of a pixel in the PDF document, you need to render the page to a BufferedImage or similar, and then take the color of the pixel at a specific (x, y) position.
To render the PDF you could use a library like ICEpdf or JPedal.
There is a topic on SO about Java PDF renderer libraries: "Java PDF Renderer".
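Once a page has been rendered, the pixel lookup itself needs only the standard library. Here is a minimal sketch; the class name PixelColorCheck and the small test image standing in for a rendered page are assumptions made for illustration, and how you obtain the BufferedImage depends on the renderer you pick.

```java
import java.awt.image.BufferedImage;

/** Reads the color of one pixel from an already rendered page image. */
final class PixelColorCheck {

    /** Returns {red, green, blue} (0-255 each) of the pixel at (x, y). */
    static int[] rgbAt(BufferedImage page, int x, int y) {
        int argb = page.getRGB(x, y);      // packed as 0xAARRGGBB in the default sRGB model
        int red = (argb >> 16) & 0xFF;
        int green = (argb >> 8) & 0xFF;
        int blue = argb & 0xFF;
        return new int[] { red, green, blue };
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: a small image with one known pixel.
        BufferedImage page = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        page.setRGB(1, 2, 0xFF8000);
        int[] rgb = rgbAt(page, 1, 2);
        System.out.println(rgb[0] + "," + rgb[1] + "," + rgb[2]); // prints 255,128,0
    }
}
```

Keep in mind that a rendered BufferedImage is normally in RGB whatever color spaces the PDF uses internally, so this tells you the displayed color of a pixel, not whether the file declares RGB or CMYK.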
Is there any Java library for this?
How can I make text searchable using any Java library?
Open source or paid.
How do I apply OCR to a PDF using PDFBox?
How do I make PDF text searchable programmatically using PDFBox?
I searched a lot and didn't find any solution.
Can anyone paste code for OCR with PDFBox?
Try Apache PDFBox.
To extract text: Text extraction.
Any java library? How to make searchable text using any java library? Open source or Paid.
You can achieve this using Gnostice XtremeDocumentStudio for Java. For more details, follow the link below.
http://www.gnostice.com/nl_article.asp?id=289&t=How_to_convert_scanned_images_to_searchable_PDF_in_Java
FYI, in the article we have demonstrated how to convert a scanned image to a searchable PDF. In fact, the input can be any scanned document (image, PDF or DOCX).
Disclaimer: I work for Gnostice.
You can use PDFBox to extract images from a PDF file, and then use the OCR system of your choice (for example, Tesseract) to obtain the text. Alternatively, if the PDF is mixed text and images, you can use Ghostscript to create an image of each PDF page, and then run OCR.
If you then need a searchable PDF file, build a new PDF by writing the text first, and then drawing the image over top of the text. The text will be searchable, but you will only see the image.
Note that OCR engines like Tesseract and Google Vision will return positional information for each word, so you will be able to place the text in the correct position.
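Placing the text correctly mostly comes down to converting the OCR word boxes into PDF coordinates. A rough sketch of that conversion follows. It assumes the OCR engine reports boxes in pixels with the origin at the top-left of the image, that the image covers the whole page, and that the page's MediaBox starts at (0, 0) with no rotation. The class name OcrBoxMapper and the sample numbers are made up for illustration.

```java
/** Maps an OCR word box (pixels, origin top-left) to PDF user space (points, origin bottom-left). */
final class OcrBoxMapper {

    /** Returns {x, y, width, height} in points; (x, y) is the lower-left corner of the box. */
    static double[] toPdfBox(int left, int top, int right, int bottom,
                             double dpi, double pageHeightPt) {
        double scale = 72.0 / dpi;                 // one PDF point is 1/72 inch
        double x = left * scale;
        double y = pageHeightPt - bottom * scale;  // flip the vertical axis
        double width = (right - left) * scale;
        double height = (bottom - top) * scale;
        return new double[] { x, y, width, height };
    }

    public static void main(String[] args) {
        // Hypothetical word box on a 300 dpi scan of a page 842 pt high (roughly A4).
        double[] box = toPdfBox(300, 600, 900, 750, 300.0, 842.0);
        System.out.println("x=" + box[0] + " y=" + box[1]
                + " w=" + box[2] + " h=" + box[3]);
    }
}
```

The resulting x and y can serve as the text position when writing each word, and the box height gives a starting point for choosing the font size.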
I'm running into a problem trying to print a Crystal Report in Java where all of the text is being replaced with little box characters. The report blob is stored in an Oracle database, and I can preview it using Adobe Reader and see that it is properly formed with actual text. This blob is passed to a Java applet that uses PDFRenderer to print it.
My theory is that the problem lies in the fact that the Crystal Reports we generate use version 1.2 of PDF. There are also a number of Jasper reports that are generated as version 1.4, and these print correctly; it's only the 1.2 PDFs that have this problem.
Does PDFRenderer not support printing this version, or are there some additional steps I need to take to print those successfully?
Any help is greatly appreciated.
It's very unlikely that the issue you are seeing is due to the PDF version.
Especially with text content, the PDF spec gets very complex, and the probability is high that Crystal Reports creates content that either relies on some strange encoding or uses CID (multibyte) font techniques, and that PDFRenderer has a blind spot there.
You could try playing around with the report-side settings for encoding, font (Type1 / TrueType) and font embedding; maybe you will find an option that is better suited.
Does PDFRenderer display the PDFs if you use it as a viewer? PDFRenderer does not support later PDF versions (i.e. compressed objects), but 1.2 is fairly straightforward.
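To check the version theory, you can read the declared version straight from the file header, which starts with a marker such as %PDF-1.2. A small sketch using only the standard library; the class name PdfHeaderVersion is an assumption for illustration, and it looks only at the leading bytes you hand it.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Extracts the version declared in a PDF file header such as "%PDF-1.2". */
final class PdfHeaderVersion {

    /** Returns e.g. "1.2", or null if no "%PDF-" marker is found in the given leading bytes. */
    static String headerVersion(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        int start = text.indexOf("%PDF-");
        if (start < 0) {
            return null;
        }
        int pos = start + 5;
        int end = pos;
        // The version is a run of digits and dots, e.g. "1.4".
        while (end < text.length()
                && (Character.isDigit(text.charAt(end)) || text.charAt(end) == '.')) {
            end++;
        }
        return end > pos ? text.substring(pos, end) : null;
    }

    public static void main(String[] args) throws Exception {
        // Reads up to the first 1024 bytes of the file named on the command line.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] buf = new byte[1024];
            int n = in.read(buf);
            System.out.println(headerVersion(Arrays.copyOf(buf, Math.max(n, 0))));
        }
    }
}
```

Note that from PDF 1.4 onwards a /Version entry in the document catalog can override the header, so the header alone is only a first check.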