Could you please help me find a way to convert a PDF file to an XPS file programmatically in Java? Is it possible to do that with a freeware library?
It's not the best solution, but according to this thread you can use Ghostscript (invoked from the command line) to convert the PDF to images, and then create an XPS document from those images:
gswin32c.exe -Z, -sDEVICE=png16m -sOutputFile="%04do.png" "temp.pdf"
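If you want to launch Ghostscript from Java rather than from a console, something along these lines might work. This is only a sketch: the class name `GhostscriptPages` and its methods are made up for illustration, the `-dSAFER -dBATCH -dNOPAUSE` switches are my own addition so that Ghostscript exits without prompting, and the executable name depends on your installation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GhostscriptPages {

    /**
     * Builds the argument list for rendering each PDF page to a PNG.
     * gsExecutable is an assumption, e.g. "gswin32c.exe" on Windows or "gs" elsewhere.
     * outputPattern may contain %04d, which Ghostscript replaces with the page number.
     */
    static List<String> buildCommand(String gsExecutable, String pdfPath, String outputPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dSAFER");    // restrict file access (added here, not in the original command)
        cmd.add("-dBATCH");    // exit after the last file
        cmd.add("-dNOPAUSE");  // do not wait for a key press between pages
        cmd.add("-sDEVICE=png16m");
        cmd.add("-sOutputFile=" + outputPattern);
        cmd.add(pdfPath);
        return cmd;
    }

    /** Runs the command and returns the process exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: GhostscriptPages <gs-executable> <input.pdf>");
            return;
        }
        int exit = run(buildCommand(args[0], args[1], "%04d.png"));
        System.out.println("Ghostscript exit code: " + exit);
    }
}
```

Passing the arguments as a list means no shell is involved, so the `%04d` pattern and paths with spaces need no extra quoting.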
Is there any Java library for this?
How can I make text searchable using a Java library?
Open source or paid.
How do I apply OCR to a PDF using PDFBox?
How do I make PDF text searchable programmatically using PDFBox?
I searched a lot and didn't find any solution.
Can anyone post code for OCR with PDFBox?
Try Apache PDFBox.
To extract text: Text extraction.
You can achieve this using Gnostice XtremeDocumentStudio for Java. For more details, follow the link below.
http://www.gnostice.com/nl_article.asp?id=289&t=How_to_convert_scanned_images_to_searchable_PDF_in_Java
FYI, in the article we demonstrate how to convert a scanned image to a searchable PDF. In fact, the input can be any scanned document (images, PDF or DOCX).
Disclaimer: I work for Gnostice.
You can use PDFBox to extract images from a PDF file, and then use the OCR system of your choice (for example, Tesseract) to obtain the text. Alternatively, if the PDF is mixed text and images, you can use Ghostscript to create an image of each PDF page, and then run OCR.
If you then need a searchable PDF file, build a new PDF by writing the text first, and then drawing the image over top of the text. The text will be searchable, but you will only see the image.
Note that OCR engines like Tesseract and Google Vision will return positional information for each word, so you will be able to place the text in the correct position.
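To place each word, the OCR box has to be mapped from image pixels to PDF coordinates. A rough sketch of that mapping follows. It assumes the OCR engine reports boxes in pixels with the origin at the top-left of the page image, that the page was rendered at a known DPI, and that the PDF page is unrotated with its origin at the bottom-left and units of 1/72 inch (the PDF default). The class and method names are made up for illustration.

```java
public class OcrBoxToPdf {

    /**
     * Converts an OCR word box (pixels, origin top-left, y grows downward)
     * into PDF user space (points, origin bottom-left, y grows upward).
     * Returns {x, y, width, height}, where (x, y) is the lower-left corner of the box.
     */
    static double[] toPdfSpace(int left, int top, int width, int height,
                               double dpi, double pageHeightPt) {
        double scale = 72.0 / dpi;                         // pixels -> points
        double x = left * scale;
        double y = pageHeightPt - (top + height) * scale;  // flip the vertical axis
        return new double[] { x, y, width * scale, height * scale };
    }

    public static void main(String[] args) {
        // Hypothetical word box on a US Letter page (792 pt high) rendered at 300 DPI.
        double[] box = toPdfSpace(300, 600, 150, 60, 300.0, 792.0);
        System.out.printf("x=%.1f y=%.1f w=%.1f h=%.1f%n", box[0], box[1], box[2], box[3]);
    }
}
```

The lower-left corner is only an approximation of the text baseline; you would probably also choose the font size so that the drawn word is about as wide as the box.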
Currently, Adobe doesn't offer PostScript drivers for Solaris. I need to print a PDF file from a Java program, and I use PDFBox 1.8.0. I think that if I convert the PDF to PS using Java, I can print the file without any problem. Does anyone know how to convert PDF to PS using Java?
I want to make a simple word counter for my LaTeX documents so that I can double-check that my word count is accurate. More generally, it would be useful to know whether Java can read text from PDF files at all. Googling it brought nothing up, so I am thinking maybe not? If not, why?
You can't read text from a .pdf without a PDF file reader. Here are a couple of Java .pdf libraries:
Apache PDFBox
iText
See also this link for an example of Java text extraction with PDFBox:
http://pdfbox.apache.org/userguide/text_extraction.html
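Once the text is out of the PDF, the counting itself is plain Java. Here is a minimal sketch, assuming the text has already been extracted into a `String` (with PDFBox that would typically come from `PDFTextStripper.getText`). The `WordCounter` class is made up for illustration, and "word" here just means a run of non-whitespace characters, which may not match how your LaTeX tooling counts.

```java
public class WordCounter {

    /** Counts whitespace-separated tokens in the given text. */
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // One or more whitespace characters (spaces, tabs, line breaks) separate words.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String extracted = "Text  extracted\nfrom a PDF page";
        System.out.println(countWords(extracted) + " words");
    }
}
```

Text extracted from a PDF often includes page numbers, headers and hyphenated line breaks, so expect the count to differ a little from the source document.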
Can anybody tell me whether there is any API in Java (1.5) that converts a Microsoft SNP file to PDF format?
I found a command-line tool (Snp2Pdf.exe) that converts SNP documents to PDF format. We can invoke it from Java using Process.
I'm using a JTree to browse the content of a folder, and when a user clicks on a file I want the software to show a preview of it (a screenshot of its first page).
The files are mostly Office documents and PDFs.
I managed to do it for PDF files using a module downloaded from Sun, but I'd like to know if there is a way to do it for the other files using any software (JARs preferably) or even the built-in Windows API.
I was thinking of converting the file to PDF and then previewing that PDF, but this isn't optimal.
Any ideas?
I've got a similar problem, and the best I found after a couple of days of googling is the following.
Alfresco has the same problem and resolved it this way:
OpenOffice runs in server mode (socket), and Alfresco sends all office documents to it to be converted to PDF
Those PDFs are converted to .swf for a viewer using SWFTools
The .swf is embedded in the HTML
For images, I suppose it uses ImageMagick to create a small version of the file
Personally, I will try to implement it this way:
Convert office documents to PDF using OpenOffice in socket mode
Transform the first page of the PDF into a PNG using the JPedal library (the LGPL version)
Display that PNG to the end user
For images I would perhaps use ImageMagick too, but for now I'm using the Seam Image.scaleToFit API
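For the last step, shrinking the first-page PNG to a preview size can also be done with plain Java 2D. The sketch below is not Seam's `scaleToFit` implementation, just an assumed equivalent: it keeps the aspect ratio, never upscales, and paints onto an opaque RGB image (so transparent areas would come out black). The `Thumbnails` class name is made up.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class Thumbnails {

    /** Scales src down so it fits inside maxWidth x maxHeight, preserving the aspect ratio. */
    static BufferedImage scaleToFit(BufferedImage src, int maxWidth, int maxHeight) {
        double scale = Math.min((double) maxWidth / src.getWidth(),
                                (double) maxHeight / src.getHeight());
        scale = Math.min(scale, 1.0); // never enlarge small images
        int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(src.getHeight() * scale));

        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null); // draw the source scaled into the target size
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered first page (A4 at 150 DPI is 1240 x 1754 pixels).
        BufferedImage page = new BufferedImage(1240, 1754, BufferedImage.TYPE_INT_RGB);
        BufferedImage preview = scaleToFit(page, 200, 200);
        System.out.println(preview.getWidth() + " x " + preview.getHeight());
    }
}
```

In the JTree case you would load the PNG with `ImageIO.read` and show the result in a `JLabel` via `ImageIcon`.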
I had the same problem too and stumbled over this thread. Starting from Anthony's solution, I am using LibreOffice in socket mode to convert office documents directly to PNG. Unfortunately this isn't possible for PDFs. Here is a good overview of which ways are possible.
unoconv --connection 'socket,host=127.0.0.1,port=2220,tcpNoDelay=1;urp;StarOffice.ComponentContext' -f png -e PageRange=1 your_file_name.extension
A short reference on starting LibreOffice in socket mode: click me
I asked this a long time ago: solution