Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I'm looking for an open source OCR library that runs on Linux. I need it to work for PNGs and PDFs. Mostly I would like to interface with this library from Java or Ruby. Any idea if there is anything available?
Regards.
Tesseract is a very good OCR engine: https://github.com/tesseract-ocr/tesseract
The project was launched by HP Labs and is now continued and sponsored by Google (for Google Books!). It is released under the Apache license, and it runs on Linux. It reads TIFF or PNG files; for PDFs, you will need to convert to one of these formats first. I suppose there is no binding, so you would have to invoke it as a subprocess.
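Invoking Tesseract as a subprocess from Java could look roughly like the sketch below. It assumes the "tesseract" binary is on the PATH and uses only its basic command-line form ("tesseract <image> <outputbase> [-l <lang>]", which writes the text to "<outputbase>.txt"); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that shells out to the tesseract command-line tool. */
public final class TesseractCli {
    private TesseractCli() {}

    /** Builds "tesseract <image> <outputBase> [-l <lang>]"; Tesseract appends ".txt" itself. */
    public static List<String> buildCommand(Path image, Path outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");                 // assumes the binary is on the PATH
        cmd.add(image.toString());
        cmd.add(outputBase.toString());
        if (lang != null && !lang.isEmpty()) {
            cmd.add("-l");
            cmd.add(lang);                    // e.g. "eng"; the language data must be installed
        }
        return cmd;
    }

    /** Runs Tesseract on one PNG or TIFF file and returns the recognised text. */
    public static String ocr(Path image, String lang) throws IOException, InterruptedException {
        Path workDir = Files.createTempDirectory("ocr-");
        Path outBase = workDir.resolve("page");
        Path txt = workDir.resolve("page.txt");
        try {
            Process p = new ProcessBuilder(buildCommand(image, outBase, lang))
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // only the .txt file is needed
                .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException("tesseract exited with code " + exit);
            }
            return Files.readString(txt, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(txt);
            Files.deleteIfExists(workDir);
        }
    }
}
```

Keeping the command construction in its own method makes it easy to add further Tesseract options later without touching the process handling.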
Cuneiform is free and does a decent job. You could invoke it as a subprocess, but there's no language binding that I know of. It won't read PDFs directly, but you can easily split PDFs that consist of scanned page images into individual images and feed them to Cuneiform. There are also scripts to reassemble the images and recognized text into a searchable PDF.
Try tesjeract, which uses JNI to call the Tesseract OCR API.
For PDFs, you'll need to convert them to images first, using Ghostscript, for instance.
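A minimal sketch of that conversion step, assuming Ghostscript is installed as "gs" (on Windows the console binary is usually "gswin64c"). It only builds the argument list; the class name is hypothetical, and 300 DPI is just a common choice for OCR input.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper that builds a Ghostscript PDF-to-PNG command line. */
public final class PdfToImages {
    private PdfToImages() {}

    /** Renders every page of the PDF to page-001.png, page-002.png, ... in outputDir. */
    public static List<String> buildCommand(Path pdf, Path outputDir, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return List.of(
            "gs",
            "-dSAFER",                 // restrict file access by the PDF itself
            "-dBATCH", "-dNOPAUSE",    // process all pages and exit without prompting
            "-sDEVICE=png16m",         // 24-bit colour PNG output
            "-r" + dpi,                // rendering resolution in DPI
            // Ghostscript replaces %03d with the zero-padded page number.
            "-sOutputFile=" + outputDir.resolve("page-%03d.png"),
            pdf.toString());
    }
}
```

The list can be passed straight to "new ProcessBuilder(cmd).inheritIO().start()", and each resulting PNG can then be handed to the OCR engine.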
Closed 7 years ago.
I have been working on my own project for the past few months and I have stumbled upon a problem. I need to attach some data from a database (in PDF format) to emails I will be sending to clients. I know this could be done using iText, but for a commercial licence they charge around $1,300 for 2,500 emails, which is insane.
Do you guys know any other library I could use in my application, which I plan to offer commercially? Any other idea that will help me solve this problem will be greatly appreciated!
Cheers
I used PDFBox in the distant past. Admittedly, I was using it for reading PDF files, but it did a good job and seemed well-designed.
First, you can try Flying Saucer.
Basically, it uses an old version of iText that was free.
Flying Saucer allows you to render PDF on the server side from an HTML template (CSS 2.1 is supported). We're using this solution (with Mustache templates) in our project.
Another option (valid for Google Chrome): you can do the PDF export on the client, just by calling window.print() and using Chrome's "Print to PDF" functionality.
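The template step before Flying Saucer could be sketched like this. It is a deliberately tiny, hypothetical stand-in for a real Mustache engine (only "{{key}}" substitution, no sections or partials); values are XML-escaped because Flying Saucer expects well-formed XHTML.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal {{key}} template filler; not a Mustache implementation. */
public final class TinyTemplate {
    // Matches {{name}} placeholders made of word characters and dots.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*\\}\\}");

    private TinyTemplate() {}

    /** Replaces each {{key}} with its XML-escaped value; unknown keys become empty. */
    public static String render(String template, Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object v = values.get(m.group(1));
            String escaped = escapeXml(v == null ? "" : v.toString());
            m.appendReplacement(out, Matcher.quoteReplacement(escaped));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Escapes the characters that would otherwise break well-formed XHTML. */
    static String escapeXml(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': b.append("&amp;"); break;
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                case '"': b.append("&quot;"); break;
                case '\'': b.append("&#39;"); break;
                default: b.append(c);
            }
        }
        return b.toString();
    }
}
```

The rendered string could then be passed to Flying Saucer's ITextRenderer (setDocumentFromString, then layout, then createPDF with an OutputStream); check those calls against the Flying Saucer version you actually use.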
You can use Apache's FOP library for generating PDF. It is released under Apache 2 licence.
https://xmlgraphics.apache.org/fop/
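FOP takes XSL-FO as input rather than HTML. As a sketch of what that input looks like, the hypothetical class below builds a minimal one-page A4 XSL-FO document with the JDK's StAX writer (no FOP dependency needed for this part).

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical builder for a minimal XSL-FO document, one fo:block per paragraph. */
public final class MinimalFo {
    public static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    private MinimalFo() {}

    public static String build(String... paragraphs) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartDocument();
        w.setPrefix("fo", FO_NS);
        w.writeStartElement("fo", "root", FO_NS);
        w.writeNamespace("fo", FO_NS);

        // Page geometry: a single page master named "A4".
        w.writeStartElement("fo", "layout-master-set", FO_NS);
        w.writeStartElement("fo", "simple-page-master", FO_NS);
        w.writeAttribute("master-name", "A4");
        w.writeAttribute("page-height", "297mm");
        w.writeAttribute("page-width", "210mm");
        w.writeAttribute("margin", "20mm");
        w.writeEmptyElement("fo", "region-body", FO_NS);
        w.writeEndElement(); // simple-page-master
        w.writeEndElement(); // layout-master-set

        // Content flows into the body region of pages that use the "A4" master.
        w.writeStartElement("fo", "page-sequence", FO_NS);
        w.writeAttribute("master-reference", "A4");
        w.writeStartElement("fo", "flow", FO_NS);
        w.writeAttribute("flow-name", "xsl-region-body");
        for (String p : paragraphs) {
            w.writeStartElement("fo", "block", FO_NS);
            w.writeAttribute("space-after", "6pt");
            w.writeCharacters(p); // the writer escapes &, < and >
            w.writeEndElement();
        }
        w.writeEndElement(); // flow
        w.writeEndElement(); // page-sequence
        w.writeEndElement(); // root
        w.writeEndDocument();
        w.close();
        return sw.toString();
    }
}
```

In FOP 2.x the usual embedding path is FopFactory.newInstance(baseUri), then fopFactory.newFop(MimeConstants.MIME_PDF, out), then a JAXP identity Transformer writing into new SAXResult(fop.getDefaultHandler()); verify these against the FOP release you depend on.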
Closed 5 years ago.
I have a user guide for my application that I would like to provide in both an HTML version and a PDF version (and possibly some other indexed version for JavaHelp). Are there any tools, preferably for Maven, that I could integrate into my build cycle to convert from HTML to PDF? Currently I have a Word document that I manually convert to PDF (and no HTML version is available), which is prone to errors and really just a pain.
Well, after a short search, I found http://www.xhtml2pdf.com/, and if you already have your HTML, it does the trick.
However, I prefer using a wiki for documentation. It has several advantages: it can be edited in parallel and in multiple languages, and many wikis offer both static HTML export and PDF export.
I would recommend DokuWiki (you can find its plugins at http://www.dokuwiki.org/plugins) because it's really easy to install and administer, but you can also use any other wiki that has PDF and HTML export.
You might use a tool like DocBook and write the documentation in a markup language (XML, in DocBook's case). Then use the tool to transform the source into the target formats, e.g. HTML and PDF.
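The DocBook pipeline is essentially XSLT: one XML source and one stylesheet per output format (the DocBook XSL stylesheets emit HTML directly, or XSL-FO that a formatter such as FOP turns into PDF). The JDK can run XSLT 1.0 by itself; the sketch below uses a tiny hypothetical stylesheet, not the real DocBook XSL, just to show the mechanics.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example of turning a small XML source into HTML with JAXP/XSLT. */
public final class DocTransform {
    // Toy stylesheet: <article><title/><para/>...</article> becomes an HTML page.
    public static final String TO_HTML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/article'>"
      + "<html><body><h1><xsl:value-of select='title'/></h1>"
      + "<xsl:for-each select='para'><p><xsl:value-of select='.'/></p></xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private DocTransform() {}

    /** Applies an XSLT 1.0 stylesheet to an XML document and returns the output. */
    public static String transform(String xml, String xslt) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

In a build, the same call would simply be run once per target stylesheet against the single XML source.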
Closed 7 years ago.
Since GAE has severe restrictions, such as "A Java application cannot use any classes used to write to the filesystem"...
Is there a good Java PDF library that can write the PDF to memory for streaming to the cloud?
You can use iText without limitations now. Since version 5.2.0, a patch is no longer needed.
Have a look at the following post for an example: Generate PDF using GAE and iText
According to this thread on Google Groups (requires authentication), PDFjet can be used on GAE (it has been slightly modified to replace files with streams in a few places). As they say in the thread:
It's a quite low-level library but should be ok for simple tasks.
As of now, both iText and JasperReports are listed as incompatible on the "Will it play in App Engine" page due to the dependence on several classes that are not in the JRE class whitelist.
Update (2010/09/26): As mentioned by Guido in a comment (and I thank him for that), some people claim they have an iText patch to make it compatible with GAE. Worth a try if you want to play with iText.
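To illustrate the "write to memory instead of the filesystem" idea without any library, the hypothetical class below hand-writes a minimal one-page PDF into a ByteArrayOutputStream. It is not how PDFjet or iText work internally, and it only handles a single line of Latin-1 text in Helvetica (one of the standard 14 fonts, so nothing needs embedding); for real documents a library is the better choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal PDF writer that never touches the filesystem. */
public final class TinyPdf {
    private TinyPdf() {}

    /** Returns the bytes of a one-page A4 PDF showing one line of text. */
    public static byte[] singlePage(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();

        String content = "BT /F1 24 Tf 72 770 Td (" + escape(text) + ") Tj ET";
        int contentLength = content.getBytes(StandardCharsets.ISO_8859_1).length;

        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
                + "/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
            "<< /Length " + contentLength + " >>\nstream\n" + content + "\nendstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };

        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset of "N 0 obj", needed by the xref table
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        int xrefOffset = out.size();
        write(out, "xref\n0 " + (objects.length + 1) + "\n");
        write(out, "0000000000 65535 f \n"); // every xref entry is exactly 20 bytes
        for (int off : offsets) {
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", off));
        }
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n");
        write(out, "startxref\n" + xrefOffset + "\n%%EOF\n");
        return out.toByteArray();
    }

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }
}
```

On GAE the returned byte array could be written directly to the servlet response or another stream.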
Closed 5 years ago.
Consider the requirements for embedding help in a Java desktop application (or applet):
Single source for content (such as AsciiDoc) to generate high-quality PDF manuals[1]
Hooks for context-sensitive help
Robust, simple, and well-documented API (under an hour to learn)
Small footprint (a sub-100 KB Java archive)[2]
Integrate as a docked MDI-style window, or a separate window
Free open source software
Google says:
JavaHelp
Java Programming Help
Help Authoring Tools
Which of these, or any others, would you recommend?
[1] Storing the content in AsciiDoc format would be ideal, so long as conversion is trivial.
[2] Up to 500 KB.
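The "hooks for context-sensitive help" requirement usually comes down to tagging components with a topic ID and resolving the focused component's ID when the user presses F1. JavaHelp provides a CSH class for this; the sketch below is a hypothetical, dependency-free version in plain Swing (ContextHelp, the client-property key and the showTopic callback are all made up for illustration).

```java
import java.awt.Component;
import java.awt.KeyboardFocusManager;
import java.awt.event.ActionEvent;
import java.awt.event.KeyEvent;
import java.util.function.Consumer;
import javax.swing.AbstractAction;
import javax.swing.JComponent;
import javax.swing.KeyStroke;

/** Hypothetical context-sensitive help hooks for Swing. */
public final class ContextHelp {
    // Hypothetical client-property key holding each component's help topic ID.
    private static final String HELP_ID_KEY = "contextHelp.id";

    private ContextHelp() {}

    /** Tags a component with the topic to show when it (or a child) has focus. */
    public static void setHelpId(JComponent c, String helpId) {
        c.putClientProperty(HELP_ID_KEY, helpId);
    }

    /** Walks up the containment hierarchy to the nearest component with a topic ID. */
    public static String findHelpId(Component c) {
        for (Component cur = c; cur != null; cur = cur.getParent()) {
            if (cur instanceof JComponent) {
                Object id = ((JComponent) cur).getClientProperty(HELP_ID_KEY);
                if (id != null) {
                    return id.toString();
                }
            }
        }
        return null; // no topic: the viewer could fall back to its table of contents
    }

    /** Binds F1 anywhere in root's window to "show the focused component's topic". */
    public static void installF1(JComponent root, Consumer<String> showTopic) {
        root.getInputMap(JComponent.WHEN_IN_FOCUSED_WINDOW)
            .put(KeyStroke.getKeyStroke(KeyEvent.VK_F1, 0), "showContextHelp");
        root.getActionMap().put("showContextHelp", new AbstractAction() {
            @Override
            public void actionPerformed(ActionEvent e) {
                Component focused = KeyboardFocusManager
                    .getCurrentKeyboardFocusManager().getFocusOwner();
                showTopic.accept(findHelpId(focused != null ? focused : root));
            }
        });
    }
}
```

Whatever help viewer is chosen only has to implement the showTopic callback, which keeps the hooks independent of the viewer.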
Have a look at DocBook - last time I looked it could generate PDF, HTML and JavaHelp from files written in DocBook XML.
A crash course is available at: http://opensource.bureau-cornavin.com/crash-course/
Definitely AsciiDoc, or its more recent cousin Asciidoctor.
Closed 5 years ago.
I'm after a library that can read and write JPEG image metadata. For example, if I wanted to embed a short description or story relating to the JPEG image in the image file itself and read it back, what development library or libraries would you recommend?
I'm not too fussed about which language (it's a new project), though I've tagged this question with languages I'm familiar with (I'd also consider other languages). Preferably something that's relatively cross-platform (Mac/Linux/Windows), such as Java, FreePascal/Lazarus, C++, Objective-C, etc. (to be honest, I'm not that familiar with cross-platform development, so I have no idea whether C# is a possibility). Aside from more popular platforms such as Java or .NET, it would be preferable for there to be no requirement to have any particular framework installed.
Try here: http://www.drewnoakes.com/code/exif/
It looks easy to use.
Exif does it all, but it is written in Perl.
Perl is cross-platform.
libjpeg is an excellent library written in C. It can be used for just about any type of JPEG manipulation. I have successfully compiled it on Windows, Unix and Linux.
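For the Java side of the question, the JDK's own javax.imageio (not one of the libraries named above) can store a short description in a JPEG COM (comment) segment and read it back: the standard "javax_imageio_1.0" metadata tree maps Text/TextEntry nodes to COM segments. This is a sketch under assumptions: the class name is hypothetical, and it assumes ASCII/Latin-1 descriptions, since COM text is handled as single-byte characters.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataFormatImpl;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper: JPEG comment round-trip using only javax.imageio. */
public final class JpegComment {
    private static final String STD = IIOMetadataFormatImpl.standardMetadataFormatName;

    private JpegComment() {}

    /** Encodes the image as JPEG with the description stored in a COM segment. */
    public static byte[] writeWithComment(BufferedImage img, String description) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        IIOMetadata meta = writer.getDefaultImageMetadata(
            ImageTypeSpecifier.createFromRenderedImage(img), null);

        // In the standard tree, each Text/TextEntry becomes a JPEG COM segment.
        IIOMetadataNode entry = new IIOMetadataNode("TextEntry");
        entry.setAttribute("keyword", "comment");
        entry.setAttribute("value", description);
        IIOMetadataNode text = new IIOMetadataNode("Text");
        text.appendChild(entry);
        IIOMetadataNode root = new IIOMetadataNode(STD);
        root.appendChild(text);
        meta.mergeTree(STD, root);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(img, null, meta), null);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Returns the text of every comment found in the JPEG's metadata. */
    public static List<String> readComments(byte[] jpeg) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(new ByteArrayInputStream(jpeg))) {
            ImageReader reader = ImageIO.getImageReaders(iis).next();
            try {
                reader.setInput(iis);
                Node root = reader.getImageMetadata(0).getAsTree(STD);
                List<String> found = new ArrayList<>();
                collect(root, found);
                return found;
            } finally {
                reader.dispose();
            }
        }
    }

    /** Depth-first search for TextEntry nodes, collecting their "value" attributes. */
    private static void collect(Node n, List<String> found) {
        if ("TextEntry".equals(n.getNodeName())) {
            found.add(((Element) n).getAttribute("value"));
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, found);
        }
    }
}
```

A COM segment is the simplest place for free-form text; EXIF or XMP fields are more widely displayed by photo tools but need more work to write.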