Multipage PDF Java - java

I currently have a program that takes a PDF saved to an Android device and calls an intent to display it. What I want to do is create a multipage PDF out of two PDF files saved to the device. Is there some sort of token or symbol that PDFs look for that tells the viewer to start a new page? What can I put between the two input streams to denote that I want a new PDF page?
I did a lot of research and tried to use iText, but it seemed to over-complicate my program. Is there a simple way to achieve my goal?
Thanks in advance!

Related

Is it possible to merge multiple pdf pages to one sheet of pdf and print in java?

I am trying to create a Java program that can read multiple PDF files and merge them into a single PDF file.
I then need to print the PDF, but while printing I need to merge multiple PDF pages onto a single sheet.
Even if that means creating a new PDF, that is fine for me.
I need an open source Java PDF manipulation library to handle this.
I know one solution is to select the "multiple pages per sheet" option while printing, but the printer I have access to doesn't have that capability.
Can anybody suggest a solution for this, or is it possible in any other programming language?

Is there any function in Android to add PDF files?

I am making an Android app which will contain several PDF files, so that when I share this application with my friends they will be able to download these PDF files. But I don't know how to add PDF files to an Android application. Is there any function to do so?
WebView would be the easiest form of PDF viewing. Maybe start from there and work your way up to something more demanding.

How to Make Existing PDF Text Searchable using any Java Library? With OCR

Is there any Java library for this?
How can I make PDF text searchable using any Java library?
It can be open source or paid.
How do I apply OCR to a PDF using PDFBox?
How do I make PDF text searchable programmatically using PDFBox?
I searched a lot but didn't find any solution.
Can anyone paste code for OCR with PDFBox?
Try Apache PDFBox.
To extract text: Textextraction.
You can achieve this using Gnostice XtremeDocumentStudio for Java. For more details, follow the link below.
http://www.gnostice.com/nl_article.asp?id=289&t=How_to_convert_scanned_images_to_searchable_PDF_in_Java
FYI, in the article, we have demonstrated how to convert scanned image to searchable PDF. In fact the input can be any scanned document (images, PDF or DOCX).
Disclaimer: I work for Gnostice.
You can use PDFBox to extract images from a PDF file, and then use the OCR system of your choice (for example, Tesseract) to obtain the text. Alternatively, if the PDF is mixed text and images, you can use Ghostscript to create an image of each PDF page, and then run OCR.
If you then need a searchable PDF file, build a new PDF by writing the text first, and then drawing the image on top of the text. The text will be searchable, but you will only see the image.
Note that OCR engines like Tesseract and Google Vision will return positional information for each word, so you will be able to place the text in the correct position.
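To make the placement step concrete, here is a minimal sketch of the coordinate conversion only. It assumes the OCR engine reports word boxes in pixels measured from the top-left corner of the page image, and that the PDF page uses the default coordinate system (origin at the bottom-left, 72 units per inch). The class and method names are made up for illustration and are not part of PDFBox or Tesseract.

```java
class OcrBoxToPdf {
    /** Converts a pixel length at the given scan resolution to PDF points (72 per inch). */
    static double pxToPt(double px, double dpi) {
        return px * 72.0 / dpi;
    }

    /**
     * Returns {x, y} in PDF points for the lower-left corner of a word box.
     * Assumption: the OCR box is measured from the top-left of the image,
     * while PDF user space starts at the bottom-left of the page.
     */
    static double[] lowerLeftInPoints(int left, int top, int height,
                                      double dpi, double pageHeightPt) {
        double x = pxToPt(left, dpi);
        // Flip the vertical axis: the bottom edge of the box is (top + height) px from the top.
        double y = pageHeightPt - pxToPt(top + height, dpi);
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        // Hypothetical input: an 842 pt tall page scanned at 300 dpi,
        // with a word box at left=300, top=150 that is 50 px tall.
        double[] p = lowerLeftInPoints(300, 150, 50, 300.0, 842.0);
        System.out.println("x=" + p[0] + " pt, y=" + p[1] + " pt");
    }
}
```

The resulting point is where the text for that word would start when written into the new PDF before the page image is drawn over it.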

How to read content of scanned pdf file in java / jsp or in javascript

How can I read the content of a scanned PDF file in Java/JSP or in JavaScript? Can you tell me how to achieve this in code?
Thanks in advance for any reply.
You can convert the scanned PDF to an image using Ghostscript and then feed it to an OCR engine, such as Tesseract. Take a look at VietOCR for an example implementation.
What you are trying to do (I think) is use OCR to extract text from an image PDF produced by a scanner. Java is probably the best choice for this. There are a number of options, depending on whether you are prepared to pay for software. Google for Java (or JavaScript), PDF and OCR.
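A rough sketch of that pipeline in Java, driving both tools as external processes. It assumes the `gs` and `tesseract` executables are installed and on the PATH (on Windows the Ghostscript console binary usually has a different name), and the class name, file names and 300 dpi resolution are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.util.List;

class ScanToText {
    /** Ghostscript command that renders every page of the PDF to a 24-bit PNG. */
    static List<String> ghostscriptCommand(String pdf, String pngPattern, int dpi) {
        return List.of("gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m",
                "-r" + dpi, "-sOutputFile=" + pngPattern, pdf);
    }

    /** Tesseract command that writes the recognized text to outputBase + ".txt". */
    static List<String> tesseractCommand(String png, String outputBase) {
        return List.of("tesseract", png, outputBase);
    }

    /** Runs a command, forwarding its output to this process, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: java ScanToText scanned.pdf");
            return;
        }
        // %03d in the pattern is expanded by Ghostscript to the page number.
        run(ghostscriptCommand(args[0], "page-%03d.png", 300));
        // Only the first page is recognized here; a real program would loop over all pages.
        run(tesseractCommand("page-001.png", "page-001"));
    }
}
```

Keeping the command construction separate from `run` makes it easy to log or adjust the arguments before anything is executed.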
IMO, this task is not something that should be done in a JSP. JSPs are best for rendering results ... not for generating them in the first place.
Actually, I am working on the same kind of project at the moment. I am doing it in the following steps, and the result works well.
The user uploads a scanned PDF to a PDFUploader servlet, which returns a server-side file name to the front end to indicate that the upload was successful.
The front end uses this file name and the default page 0 to ask a PDFReader servlet to retrieve the first page of the PDF file and display it at the front end. You can convert the page to an image, or use an iframe with the embedded PDF reader.
The front end uses this file name and the default page 0 to ask an OCRServlet to perform OCR. I am using WeOCR and Tesseract as my OCR engine on an Apache HTTP server. I have modified some parts of submit.cgi in the WeOCR server, since I know which formats the WeOCR server will receive. I still have some problems when I convert the scanned PDF to an image (I am using PDFBox).
Google for anything OCR related.
Your best bet will be to use an existing library such as http://asprise.com/product/ocr/index.php?lang=java

What is the best way to display multiple PDF files via browser?

I'm developing a web application using Flex and JSP.
I am having some performance issues with displaying multiple PDF files.
I am trying to display about 50-100 PDF files. I know that is a little crazy.
Hence, I made the project convert the PDF files to JPG format and display the JPG files.
I'm wondering if there is a way to decrease the file size of a PDF to the size of a JPG.
Additionally, I would like to find other ways that may improve the performance.
Does anyone know a good way to display many PDF files (that will be mostly just text) in a web application? Or should I just have it display JPG files?
If the PDF files are mostly text you should probably use HTML. Is there something that would prevent you from making regular pages from your PDFs?
You can convert the PDF to an RTF text file, then use the text from the RTF file to populate your HTML page, perhaps in a table.
Check out the Ghostscript library for doing this conversion.
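For the "populate your HTML page" step, a minimal sketch might look like the following. It assumes the text has already been extracted from the PDF into a plain Java string (the extraction itself is not shown), and it emits paragraphs rather than a table. The class and method names are invented for this example.

```java
class TextToHtml {
    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        // '&' must be replaced first so the entities added below are not escaped again.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps each blank-line-separated block of text in a <p> element. */
    static String toHtml(String text) {
        StringBuilder sb = new StringBuilder();
        // \R matches any line break; two of them with optional whitespace between mark a paragraph gap.
        for (String para : text.split("\\R\\s*\\R")) {
            String trimmed = para.trim();
            if (!trimmed.isEmpty()) {
                sb.append("<p>").append(escape(trimmed)).append("</p>\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml("First paragraph, with 1 < 2.\n\nSecond paragraph."));
    }
}
```

Serving text as HTML this way keeps the pages small compared with shipping either the original PDFs or JPG renderings.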
