need to convert a pdf file to a doc file. I found different type of example to generate pdf file but not got pdf to doc.
What your asking is actually very difficult
I recommend you start here and look for a good parsing library. then you would have to write it out in .doc format. Inevitably a lot of the formatting and extra information would be lost. it would be a lot easier to output to docx format, but i assume thats not what your looking for.
I see few possible solutions:
Davisor Publishor 6.2 probably can be used, but it is commercial, and seems that generates only txt from pdf... just have a look
parse pdf with iText, and then
generate doc with Apache POI -
another way to try (free one ;)
look for command line tools, like
Convert PDF To DOC and execute
them from java
Otherwise take a look at Con's answer, there is a link to the list with java pdf processing libraries, maybe some library can do it directly, or can be used to parse pdf (better than iText), and then just use Apache POI to generate doc. Hope it helps ;)
Related
I'm looking for a solution in my "JAVA" project to convert documents (including MS-word files) to PDF. I already used OpenOffice+JodConvertor but the result most of the time is not as good as I expect. Does anyone knows a framwork or a solution rather OpenOffice?
Best
I would take a look at this.
http://poi.apache.org/
The basic API of JAVA that uses RTFEditorKit and HTMLEditorKit, is not able of recognize tags like <br/> and <table>.
So I have searched on internet a better way of converting HTML to RTF and i have found two solutions that seem to work.
JODConverter and HTML-to-RTFconverter. The first one needs OppenOffice installed to work and the second one uses DLL, so it can’t be used on Linux.
Does anyone know about other solution?
Thanks for any help!!!!
Do they want it in RTF or do they want it in Word format? There's a big difference.
Ensure your editor is generating XHTML (or convert it yourself with jtidy, htmlcleanup etc) then download the content as an XHTML but with a .doc extension and the MS Word mime type. Word 2003 or higher will open it as a word doc.
If it is valid html, you can use Apache-FOP.
There are stylesheets for transforming html to FO.
Apache FOP can write PDF and RTF as well.
http://www.torsten-horn.de/techdocs/java-xsl.htm#XSL-FO-Java
http://html2fo.sourceforge.net/index.html
You can take a look at RTF Template (http://rtftemplate.sourceforge.net/) Don't know if it fits your needs, but I used several times under Linux and was OK.
I already used the html-to-pdf and got the expected result. I have helped.
By RTF conversion there is an important issue to care about: a target RTF viewer. All of them declare RTF support, but, for instance, Notepad.exe can only show images in WMF format, it does not display headers and footers. TextEdit on MacOS can only deal with images embedded as a kind of active objects and has troubles with tables, OpenOffice is not tolerant to minor markup inconsistencies etc.
My favorite tool for HTML->RTF conversion is PD4ML - it produces clean, almost human-readable RTF markup and successfully solves another challenging problem for RTF generating tool - a support of nested tables (if you work with HTML - they are everywhere).
HI all
is there any kind of abstraction API over Apache POI/FOP allowing one to use the same API to write both Word and PDF documents ?
I'm not aware of a unified API for the two libraries you have mentioned.
However you may still have a couple of options using a single API:
Use Apache POI to generate the documents in Word format and then use a Word to PDF conversion library to create a PDF from the word document. Another commenter has suggested IText
Use OpenOffice via its Java API to create documents and export them in Microsoft Word or PDF format.
Docmosis will do what you require, assuming you mean a Java (or command line) API. It reads doc and odt files as templates, populates/manipulates via the Java API, and produces the output formats OpenOffice supports. Have a look at the online demo on the web site which lets you see various output formats to render a document in.
When I was working on previous project, I was sure the Apache/POI can be used for Microsoft Documents.
we have IText.jar which we can use it for PDF generation and alteration. please check this will help you.
How to create pdf with complex design views in Java?I have tried it using jasper reports.Is there Any Ideas for creating PDF for Income tax forms?.
A commonly used Java API to create PDF files is iText. Give it a look. API documentation can be found here, code examples can be found here, a tutorial can be found here.
A good but less widely known Java API is OOo API wherein you can create any OOo document to your taste and finally export to PDF.
Have you taken a look at the Apache PDFBox project. I believe you can create PDFs using this library, although it is more commonly used in Lucene to convert PDFs to text to allow indexing.
You could also try Docmosis or JODConverter to do the conversion as long as you can install OpenOffice somewhere. They work on many platforms and can be Java controlled and will save you the hassle of learning the OOo UNO API.
Design your complex PDF Form with the appropriate tools, something like Acrobat Professional. Then from your Java code, you generate an FDF file (Form Data Format) and let the PDF Reader do the merging or you do it from the server-side and stream back the result.
Possible solutions to process FDF are Adobe Java FDF Toolkit or Apache PDFBox.
one approach that requires very little programming is converting your Java object to XML using the Java Binding API for XML (JABX) and then use apache FOP (XSL-FO) to create the PDF from XML. The adavantage of this approach is that is almost 100% declarative, .i.e no programming involved other than executing jabx and apache fop. If you want a tool to create the XSL-FO template, look at J4L FO Designer
You can try ITextPDF.jar Add this jar to your application and please go through the examples to know more about the tags and design procedure used for creating a PDF Document. Check this link for a simple exmaple http://itextpdf.com/examples/iia.php?id=12
I generate a report in doc format using Jasper, Jasper uses JRRTFExporter to generate doc reports, but wen i try and compare doc reports using POI it throws exception stating some header issues. Is there any way to convert rtf to doc in jasper or any API available to convert rtf to doc? Please help!
There are some projects with handling rtf files, but all of them are third-party, so - not reliable, especially because rtf is not open format.
Most reliable solution is to use word automation. Script which makes word to open rtf file and save it as .doc file will consist of tree strings of code in any language and can be easily googled :)
You could use Docmosis to do this if you want, or the OpenOffice UNO API directly since you don't need any other features of Docmosis. If it is an option, using Docmosis to generate the doc from scratch will give you less overhead, but you have to wear the cost of migrating.
Load the RTF in Microsoft.Office.Interop.Word application using a Document object. After reading from the RTF you can save the loaded document object as a Word format (*.doc). I guess this can be a solution. I did this using c# .Net 3.5