The basic Java API, which uses RTFEditorKit and HTMLEditorKit, is not able to recognize tags like <br/> and <table>.
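For reference, here is a minimal sketch of the JDK-only route the question describes. It parses the HTML with HTMLEditorKit into a styled document and then writes that document out with RTFEditorKit. The class and method names are hypothetical. Plain text and simple character attributes should carry over. Per the question, structures such as <br/> and <table> do not survive this round trip, which is why external tools are being considered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

public class HtmlToRtfNaive {

    // Hypothetical helper: HTML string in, RTF string out, using only the JDK.
    public static String convert(String html) throws IOException, BadLocationException {
        // Step 1: let HTMLEditorKit parse the HTML into an HTMLDocument.
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        Document doc = htmlKit.createDefaultDocument();
        htmlKit.read(new StringReader(html), doc, 0);

        // Step 2: write the same styled document back out as RTF.
        // Elements RTFEditorKit does not understand (e.g. tables) are lost here.
        RTFEditorKit rtfKit = new RTFEditorKit();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        rtfKit.write(out, doc, 0, doc.getLength());

        // ISO-8859-1 maps every byte 1:1 to a char, so nothing is mangled on decode.
        return out.toString("ISO-8859-1");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```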
So I searched the internet for a better way of converting HTML to RTF, and I found two solutions that seem to work:
JODConverter and HTML-to-RTF converter. The first one needs OpenOffice installed to work, and the second one uses a DLL, so it can't be used on Linux.
Does anyone know of another solution?
Thanks for any help!
Do they want it in RTF or do they want it in Word format? There's a big difference.
Ensure your editor is generating XHTML (or convert it yourself with jtidy, htmlcleanup, etc.), then let the user download the content as XHTML but with a .doc extension and the MS Word MIME type. Word 2003 or higher will open it as a Word document.
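Here is a minimal sketch of the serving side. It assumes the XHTML has already been produced. It uses the JDK's built-in com.sun.net.httpserver server as a stand-in for whatever web framework you actually use. The path /report.doc and the file name are hypothetical. The parts that matter are the application/msword MIME type and a .doc file name in Content-Disposition.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class XhtmlAsWordDoc {

    // Starts a tiny HTTP server that serves the given XHTML as a Word download.
    // Passing port 0 lets the OS pick any free port.
    public static HttpServer start(int port, String xhtml) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        byte[] body = xhtml.getBytes(StandardCharsets.UTF_8);
        server.createContext("/report.doc", exchange -> {
            // Word MIME type plus a .doc file name, so the browser hands it to Word
            exchange.getResponseHeaders().set("Content-Type", "application/msword");
            exchange.getResponseHeaders().set("Content-Disposition",
                    "attachment; filename=\"report.doc\"");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<h1>Report</h1><table><tr><td>A</td><td>B</td></tr></table>"
                + "</body></html>";
        start(8080, xhtml);
        System.out.println("Serving on http://localhost:8080/report.doc");
    }
}
```

In a servlet container, the same idea amounts to setting the content type and the Content-Disposition header on the response.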
If it is valid HTML, you can use Apache FOP.
There are stylesheets for transforming HTML to XSL-FO.
Apache FOP can write PDF and RTF as well.
http://www.torsten-horn.de/techdocs/java-xsl.htm#XSL-FO-Java
http://html2fo.sourceforge.net/index.html
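This sketch illustrates the first step, turning (X)HTML into XSL-FO, using only the JDK's javax.xml.transform API. The embedded stylesheet is a hypothetical toy that maps each <p> to an fo:block. It assumes the input has no XHTML namespace. In practice you would use the stylesheets linked above instead. Rendering the FO to RTF or PDF is then Apache FOP's job. FOP appears only in a comment here because it is not part of the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlToFo {

    // Hypothetical toy stylesheet: one page master, each <p> becomes an fo:block.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/'>"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page'><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'><fo:flow flow-name='xsl-region-body'>"
      + "<xsl:apply-templates select='//p'/>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='p'><fo:block><xsl:value-of select='.'/></fo:block></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to well-formed, namespace-free (X)HTML and returns the FO.
    public static String toFo(String xhtml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter fo = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(fo));
        // With Apache FOP on the classpath, the transform result would instead go to
        // a Fop created for MimeConstants.MIME_RTF (or MIME_PDF) to get the final file.
        return fo.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toFo("<html><body><p>Hello</p><p>World</p></body></html>"));
    }
}
```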
You can take a look at RTF Template (http://rtftemplate.sourceforge.net/). I don't know if it fits your needs, but I have used it several times under Linux and it was OK.
I have already used the HTML-to-PDF conversion and got the expected result. I hope I have helped.
With RTF conversion, there is an important issue to keep in mind: the target RTF viewer. All viewers claim RTF support, but, for instance, Notepad.exe can only show images in WMF format and does not display headers and footers. TextEdit on MacOS can only deal with images embedded as a kind of active object and has trouble with tables. OpenOffice is intolerant of minor markup inconsistencies, and so on.
My favorite tool for HTML->RTF conversion is PD4ML. It produces clean, almost human-readable RTF markup. It also solves another challenging problem for RTF generation tools: support for nested tables (if you work with HTML, they are everywhere).
Related
I have a requirement to convert multiple DOCX files into HTML and, if possible, into RTF.
Docx4j seems to be a good Java library for doing this.
The HtmlExporterNG2.html method does not always give me the desired result. So I am thinking of modifying the stylesheet that is extracted from the DOCX file and then using it for this conversion. Since these DOCX files all have varying formatting, I cannot use a standard stylesheet.
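Some background for the stylesheet idea: a .docx file is a ZIP package, and the document's style definitions live in its word/styles.xml part. Below is a JDK-only sketch for extracting that part so you can inspect what each file defines before deciding how to override it. The class name is hypothetical, and the code assumes the part is UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxStyles {

    // Returns the raw XML of word/styles.xml, or null if the package has no such part.
    public static String readStylesXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/styles.xml");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: the part is UTF-8 encoded
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readStylesXml(args[0]));
    }
}
```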
Am I correct in thinking that tinkering with the stylesheet at runtime will work? And what are the important things I should be aware of?
I am running it as a standalone Java application on Java 6.
My question might be a bit vague, but I am looking for the right direction at this point.
@Jason I want to ignore certain formatting in the input DOCX, because the converted HTML had some extra spacing, junk characters, etc. added to it.
As a solution, I created a new XSLT. For the most part, it is very similar to the one in the sample, with a few minor tweaks. The new XSLT now converts the input DOCX file into properly formatted HTML (formatted as I need it) for IE8, Mozilla or Chrome.
I am trying to convert a PDF document to a single HTML file in Java. Most of the converters available online convert one PDF file into multiple HTML files. I want to convert the whole PDF into a single HTML file.
Any suggestions?
You could always write some code using the jsoup API to build a single document that incorporates the body of each of the HTML files. Combining styles and style sheets (CSS) might be trickier (especially if the original HTML uses 'id' attributes).
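Here is a rough sketch of the merging idea. To stay runnable with the JDK alone, it swaps jsoup for a naive regular expression that grabs everything between <body> and </body>. jsoup's real parser would be more robust. The class and method names are hypothetical. Each page is wrapped in its own <div> with a numbered id. <head> content such as CSS links is dropped, and that is exactly the tricky part mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlBodyMerger {

    // Everything between the opening <body ...> tag and the last </body>.
    // Naive regex, not a real HTML parser.
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        // Fall back to the whole input if no <body> element is found
        return m.find() ? m.group(1) : html;
    }

    // Concatenates the bodies of several HTML pages into one document.
    // Head content (CSS links, <style> blocks) is not carried over.
    public static String merge(List<String> pages) {
        StringBuilder out = new StringBuilder("<html><head><meta charset=\"UTF-8\"></head><body>\n");
        for (int i = 0; i < pages.size(); i++) {
            out.append("<div class=\"page\" id=\"page-").append(i + 1).append("\">")
               .append(extractBody(pages.get(i)))
               .append("</div>\n");
        }
        return out.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList(
                "<html><body><p>Page one</p></body></html>",
                "<html><body><p>Page two</p></body></html>")));
    }
}
```

If the source pages reuse the same id values, those still collide in the merged document. Prefixing them per page would be the next step.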
That said, I find it hard to believe there is no converter out there with a 'single document' option. I recommend searching further.
I think it should be possible to parse your PDF document with iText and then generate your HTML file.
I must admit I haven't checked whether it is doable, though.
Have you looked at http://www.jpedal.org/html_index.php? It has an option to write to a single file.
How can I merge individually selected PDF files into one PDF upon download?
I want to achieve the following:
http://annualreport2010.landsecurities.com/create-your-own-report.aspx
Do I require an ASP website, or could I do something similar using a static HTML site?
Static HTML won't do it.
You need something on the server side. The other answers have options that would work; I just wanted to also mention pdftk, which you could call from the server side. Be sure to escape all file names and the like, though, because you would have to use system calls.
pdftk is really easy. The very first example from their documentation shows how to merge several PDFs, named 1.pdf, 2.pdf and 3.pdf, into one PDF called 123.pdf:
pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
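If the server side is Java, ProcessBuilder is one way to make that call without worrying about shell escaping: it passes each file name as a separate argument and involves no shell. Below is a minimal sketch, assuming pdftk is installed and on the PATH. The input check is a hypothetical example of validating user-selected names, not a complete security policy.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PdftkMerge {

    // Builds the argument list for "pdftk in1.pdf in2.pdf ... cat output out.pdf".
    // Each name is its own argument, so spaces or ';' are never seen by a shell.
    public static List<String> buildCommand(List<String> inputs, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftk");
        cmd.addAll(inputs);
        cmd.add("cat");
        cmd.add("output");
        cmd.add(output);
        return cmd;
    }

    // Runs pdftk and waits for it to finish.
    public static void merge(List<String> inputs, String output)
            throws IOException, InterruptedException {
        for (String in : inputs) {
            // Hypothetical check: accept only existing files ending in .pdf
            if (!in.endsWith(".pdf") || in.startsWith("-") || !new File(in).isFile()) {
                throw new IllegalArgumentException("Rejected input: " + in);
            }
        }
        Process p = new ProcessBuilder(buildCommand(inputs, output))
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftk exited with code " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        merge(Arrays.asList("1.pdf", "2.pdf", "3.pdf"), "123.pdf");
    }
}
```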
For PHP, there is even pdftk-php, if you want to look into that.
I think the Apache PDFBox project could be a good fit for you.
In particular, take a look at the PDFMerger class.
You can also use iText, but in my opinion it is harder to use.
See if iText can help with merging. A quick search turns up many links, such as "Java: Merging multiple PDFs into a single PDF using iText".
There is Java PDF merging software at http://codesforus.blogspot.com/.
P.S. They link to this download page: http://messiahpsychoanalyst.org/Documents/Downloads.html#part1
You can find source code for PDF splitting and merging in PDF SAM (PDF Split and Merge) on SourceForge.
I want to be able to open documents containing text and one or two pictures from Java. The documents don't have to be pretty, but I need to be able to switch between documents relatively quickly. I'm trying to figure out the easiest way to do this.
I can save the documents in whatever format is easiest for me, for instance HTML or PDF. But it must be reasonably easy to modify the documents or generate new ones. I don't care whether the document is displayed within a Java frame or by an external tool, as long as the tool is common enough to be installed on most operating systems and I can switch documents quickly and without too much hassle. This is an internal tool, so it doesn't have to be of professional quality.
Unfortunately, various company limitations make it a real hassle to get approval for open-source packages that haven't been pre-approved. So I can't do the obvious thing and grab an open-source PDF or HTML reader for Java.
So, any suggestions on the easiest format for my documents and how to read it?
You can use XHTML. Your document will then be a directory that contains the HTML document and the image files as-is. You do not need anything beyond the JDK to implement this, and you can use any browser to view such a document. Modification is easy too.
Note: by XHTML I mean HTML that can be parsed with a regular XML parser. I think it is the best choice for you.
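Here is a minimal sketch of that idea using only the JDK. The standard XML parser reads the XHTML page (here just to list the images it references), and java.awt.Desktop opens the page in the default browser. The class and method names are hypothetical. The load-external-dtd feature is turned off so that a DOCTYPE does not trigger a download of the XHTML DTD.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlDocs {

    // Parses XHTML with the standard XML parser and returns the src of every <img>,
    // i.e. the image files expected to sit next to the page in its directory.
    public static List<String> imageSources(String xhtml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Do not fetch the external DTD named in a DOCTYPE
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder b = f.newDocumentBuilder();
        Document doc = b.parse(new InputSource(new StringReader(xhtml)));
        NodeList imgs = doc.getElementsByTagName("img");
        List<String> srcs = new ArrayList<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            srcs.add(((Element) imgs.item(i)).getAttribute("src"));
        }
        return srcs;
    }

    // Opens the page (and with it, its images) in the user's default browser.
    public static void openInBrowser(File page) throws IOException {
        if (Desktop.isDesktopSupported() && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
            Desktop.getDesktop().browse(page.toURI());
        } else {
            throw new IOException("No desktop browser available");
        }
    }

    public static void main(String[] args) throws IOException {
        openInBrowser(new File(args[0]));
    }
}
```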
Well, I have been looking for a Java-based PDF solution, and I guess we still don't have a clean way. All the solutions are primitive and kind of workarounds. There is no easy solution for this requirement:
1. Design a PDF template using an IDE (e.g. LiveCycle Designer, which is not free).
2. Then, at runtime, use Java to populate this PDF template with data, either from XML or from other data sources.
It is such a simple requirement, and NONE has a good "open-source and free" solution yet! Is anyone aware of one? I have been searching for a clean way out for 3-4 years now.
Eclipse BIRT comes close, but it does not handle barcode elements out of the box.
Jasper (iReport) is also good, but that tool does not have a table concept and is kind of annoying! Its barcode support is also not good.
XSL-FO has no free IDE for design.
I am looking for a better answer. Got one?
If it's a "simple requirement", you could create a report designer around iText and release it as FOSS yourself.
What are your key requirements? Does your input have to be a PDF? If so, you'll probably be working uphill for a long time yet. Obviously you want to inject data and output a PDF.
If your templates can be something other than PDF, you could try using the OpenOffice API to have OpenOffice manipulate documents and produce a PDF. JODReports or Docmosis would be better ways of interacting with OpenOffice, and Docmosis allows you to treat documents (.doc and .odt) as templates.
You can create a PDF file with AcroFields; the AcroField values can then be populated through the iText API.
Note: with OpenOffice you can create a PDF document with form fields.
http://blog.rubypdf.com/2007/08/01/freely-fill-pdf-form-with-the-help-of-itext-or-itextsharp/
You could use OpenOffice's UNO API. It looks rather heavyweight but at least you get something full-featured.
Have a look at XDocReport. You create your templates as Word .docx or OpenOffice .odt files, then turn them into populated PDF files with Java code.