I need to replace specific text or tags in a PDF file and save the result.
I tried iText, PDFBox, and other libraries, but none of them works correctly.
I am now planning to use the Sejda SDK, but I cannot find the code that actually does the replacement.
They have this functionality working in the desktop app and in the web app, but I have to dig through the code to find it.
Can anyone help, please? Thanks in advance.
That is not part of the open source SDK.
Related
I have a PDF viewer that I downloaded with svn from here: https://andpdf.svn.sourceforge.net/svnroot/andpdf
It works fine in the emulator, but it doesn't have a feature for searching text like Adobe PDF Reader does, so I want to add text search to it. Can anyone tell me how to do it? I am new to Android and don't know much about it. If I need to change my library for this, which library should I use and where can I get it? Please help if you know.
Thanks in Advance
Aamirkhan I.
I've been searching the internet for how to convert an HTML page into a PDF file using Java. I found a lot of pointers, but in short, they either don't work or are too difficult to implement. I also downloaded a commercial product, PD4ML; its API is something I'd be happy to work with, except that when I crawled a simple page on Wikipedia, I got an out of memory error (with Xmx set to 1024 M). Some approaches suggest converting HTML -> XHTML -> FO -> PDF. However, I am getting a lot of exceptions from the XHTML-to-FO XSL file, and judging by the documentation, it's not something I have enough time to understand right now.
Here are my questions/concerns.
1. Is there another cohesive API out there that will easily convert HTML to PDF (commercial or not)?
2. Is there a way I can simply capture an HTML page and store it as a single file? This approach would be similar to Internet Explorer's way of saving a web page as a web archive (single file, MHT format).
Any help is appreciated. (BTW, I know this question has been asked repeatedly, but beyond the original spirit of the question, I'm open to other approaches.) Thanks.
Try wkhtmltopdf, which uses WebKit. Another option (the one I'm currently using) is OpenOffice, remote-controlled via macros.
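A minimal sketch of driving wkhtmltopdf from Java, assuming the wkhtmltopdf binary is installed and on the PATH. The class and method names (WkHtmlToPdf, command, convert) are made up for illustration; the only thing taken from the tool itself is its basic "wkhtmltopdf <input> <output>" command form.

```java
import java.io.IOException;
import java.util.List;

final class WkHtmlToPdf {
    // Builds the command line: wkhtmltopdf <input URL or HTML file> <output PDF>.
    static List<String> command(String source, String outputPdf) {
        return List.of("wkhtmltopdf", source, outputPdf);
    }

    // Runs the external tool and returns its exit code (0 normally means success).
    // Assumption: wkhtmltopdf is installed and reachable through the PATH.
    static int convert(String source, String outputPdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(source, outputPdf))
                .inheritIO() // forward the tool's progress and error output to our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java WkHtmlToPdf.java <url-or-html-file> <output.pdf>
        int exitCode = convert(args[0], args[1]);
        System.out.println("wkhtmltopdf exited with code " + exitCode);
    }
}
```

Because the conversion runs in a separate process, a page that exhausts memory in the renderer should not take the JVM down with it, which may matter given the out of memory error described in the question.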
You may use the iText open source Java library for that, and read this
or use the YaHPConverter open source Java library.
or do this with the help of ICEpdf, a popular open source library
or use PD4ML, but it is not free; there is only a trial.
or use this, and this is the manual for it.
My 2 cents using opensource tools:
You can capture a screenshot with either Selenium or WebDriver to save the HTML page as an image file from your Java code. Once you have the image file, you can convert it to PDF, again from your Java code.
EDIT:
It seems you can do all that in one step using iText HTML to PDF.
I am not sure, but you could try:
1) Cobra HTML rendering engine http://lobobrowser.org/cobra.jsp
2) HTMLEditorKit, part of the JDK
3) JWebPane
Use the rendering kit to parse and render the HTML. The rendered output is a Swing component, which iText can then use to generate the PDF output.
You can try out Pdfcrowd. It is an easy to use commercial online API with many options and with support for Java.
It can create PDF either from web pages or raw HTML code.
I am using PD4ML to print a PDF file and it is working fine. Now I want to show that file directly in Acrobat without saving it. In the local version I am using
Program.launch(getFilePath());
It works fine, but in the web version I am unable to do that.
Can you please suggest something? It would be very helpful.
Thanks,
Vara Kumar PJD
The web isn't like your desktop, so forget about doing things on the web the way you do them on the desktop without at least some effort.
Know that you can't read PDF files on the web using Acrobat, or some other reader like Foxit Reader, without a browser plugin.
My recommendation: forget about doing it this way. Either serve your PDF as a file that can be downloaded, or read this SO post about embedding PDF in HTML.
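Serving the PDF over HTTP could look roughly like the sketch below. It uses the JDK's built-in com.sun.net.httpserver server only to keep the example self-contained; in a real web application the same two headers would be set on whatever response object your framework provides. The class name, the /report path, and port 8080 are placeholders, not anything from the question.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfHttpServer {
    // "inline" asks the browser to display the PDF itself (if it can);
    // "attachment" asks it to offer a download/open prompt instead.
    static String contentDisposition(String fileName, boolean inline) {
        return (inline ? "inline" : "attachment") + "; filename=\"" + fileName + "\"";
    }

    // Writes the PDF bytes to the response with the headers browsers look at.
    static void sendPdf(HttpExchange exchange, Path pdf, boolean inline) throws IOException {
        byte[] body = Files.readAllBytes(pdf);
        exchange.getResponseHeaders().set("Content-Type", "application/pdf");
        exchange.getResponseHeaders().set("Content-Disposition",
                contentDisposition(pdf.getFileName().toString(), inline));
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfHttpServer.java <path-to-pdf>, then open http://localhost:8080/report
        Path pdf = Path.of(args[0]);
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/report", exchange -> sendPdf(exchange, pdf, true));
        server.start();
    }
}
```

Whether the file then opens in a browser plugin, a built-in viewer, or a download dialog is decided by the user's browser, not by the server.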
I don't think this will be possible: showing a file outside the browser, in another application, without the user's consent is something browsers prevent for security reasons. The best you can do, as darioo pointed out in the earlier post, is to show the file in the browser or prompt the user to download/open it.
I am working on a Java application that requires me to display PDF documents within the application. I am not sure whether Java currently supports this or whether I will need a Java library to get this done.
Please, I need advice on how to go about this.
Thanks in advance.
See pdf-renderer
The PDF Renderer is just what the name implies: an open source, all Java library which renders PDF documents to the screen using Java2D.
Is there any free Java library for extracting text from PDF that is compatible with Google App Engine?
I've read about PDFJet, but it can't read PDF, can it?
Is there perhaps another way to extract text from PDF? I tried http://www.pdfdownload.org/, but unfortunately they don't handle non-English characters correctly.
iText now has a text parsing module (I'm one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
PdfBox does not run on GAE. It uses Java classes that are not allowed.
(GAE only permits these http://code.google.com/appengine/docs/java/jrewhitelist.html)
I have partially modified a very old version of PdfBox (0.7.3) to be GAE compliant. Now I'm able to extract text from PDF (whole page or rectangular area). I only modified the minimum needed for PDF text extraction, not the whole of PdfBox. :)
The idea was to remove references to java.awt.Rectangle etc., using my own "rectangle" class.
More info: http://fhtino.blogspot.com/2010/04/pdfbox-text-extration-gae.html
I modified the latest (1.8.0-Snapshot) version to run on Google App Engine. I had to disable one unit test, but it runs fine for simple text extraction.
Following a simple try-fail-fix approach, I had to modify 5 files in total. Pretty doable.
You'll also have to explicitly use a RandomAccessBuffer, like Fabrizio explained.
For the extra lazy, here's the compiled jar, the dependencies for text extraction, and the patch. Note that it might not work for every use case (e.g. rectangle-based extraction). I used it to extract the text of a whole page.
https://docs.google.com/folder/d/0B53n_gP2oU6iVjhOOVBNZHk0a0E/edit
I know there is http://pdfbox.apache.org/index.html
Apache PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents.
but I've never tested it.
Last month I finished extracting text from PDF files in my project. I used the Xpdf tool to get the text and the text coordinates, but I used it in Xcode (Objective-C). The tool is open source, written in C++, and can be used from many languages. However, I don't know whether Xpdf would work with your Java code. Anyway, you can try this tool.