Clipboard format for DOCX data - java

My Java application generates a document in DOCX format using DocX4J. I need to put it on the clipboard so that it can be pasted into Word. I know that Word will consume HTML, but I would rather not convert DOCX to HTML (I am not sure if DocX4J supports it, and I would rather not lose any formatting). What clipboard format (in Java terms, DataFlavor) can I use to send DOCX data to the clipboard so that Word will understand it?
I am doing a similar thing with an OpenOffice document, and for that I use:
DataFlavor odtFlavor = new DataFlavor("application/x-openoffice-embed-source-xml;"+
"representationclass=java.io.InputStream");
How should I represent the DOCX document itself? In case of OpenOffice ODT I pass the InputStream created from the ODT file.
I believe a similar question has been asked by David Thielen here: What are the clipboard formats for Microsoft Office where you can drop a chart? but there are no answers.
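For reference, the ODT approach described in the question can be sketched as a small Transferable that hands out an InputStream for a single flavor. The class name DocumentTransferable and the odtFlavor() helper are hypothetical names made up for this sketch; only the MIME string comes from the question. It does not answer which flavor Word accepts for DOCX.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper: exposes a document's bytes under exactly one clipboard flavor.
class DocumentTransferable implements Transferable {
    private final DataFlavor flavor;
    private final byte[] bytes;

    DocumentTransferable(DataFlavor flavor, byte[] bytes) {
        this.flavor = flavor;
        this.bytes = bytes.clone(); // defensive copy of the document content
    }

    // The flavor string from the question; with no "class" parameter given,
    // DataFlavor falls back to java.io.InputStream as the representation class.
    static DataFlavor odtFlavor() throws ClassNotFoundException {
        return new DataFlavor("application/x-openoffice-embed-source-xml;"
                + "representationclass=java.io.InputStream");
    }

    @Override
    public DataFlavor[] getTransferDataFlavors() {
        return new DataFlavor[] { flavor };
    }

    @Override
    public boolean isDataFlavorSupported(DataFlavor candidate) {
        return flavor.equals(candidate);
    }

    @Override
    public Object getTransferData(DataFlavor candidate) throws UnsupportedFlavorException {
        if (!isDataFlavorSupported(candidate)) {
            throw new UnsupportedFlavorException(candidate);
        }
        // A fresh stream per request, so the data can be pasted more than once
        InputStream stream = new ByteArrayInputStream(bytes);
        return stream;
    }
}
```

On a desktop (non-headless) JVM, such an object would typically be placed on the clipboard with Toolkit.getDefaultToolkit().getSystemClipboard().setContents(transferable, null).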

Worst case, docx4j can export to HTML, so you could do that.
Or you could use RTF. docx4j uses FOP to create PDF, so you could use
the XSL FO output to create RTF (FOP can do that - your mileage may
vary).
Not sure which of these will give you better quality. Possibly the
HTML (though what happens to images?).
Or you could make a basic docx to RTF converter.
There may be a way to use the docx format.
If you copy from Word, and look at it in ClipSpy (binary available in
the source download from CodeProject), you'll see "Embed Source" is
the data as a docx in OLE.
The question "how to reload saved "Embed Source" clipboard data?"
says you can write your own "Embed Source" by passing
Clipboard.SetData a stream object.
It seems to depend on whether you want copy/paste or drag/drop, though.
See also the question you linked, "What are the clipboard formats for Microsoft Office where you can drop a chart?", and
http://social.msdn.microsoft.com/Forums/en/worddev/thread/84263fb9-61c2-424a-a294-a12f69fd6b1b

Related

Why extracting tables in a converted docx work better than in the original PDF?

I'm trying to automatically extract tables from PDF files. I know there are several libraries and methods for Java and Python, but to my surprise, the method that has worked best for me is to convert my PDF to a DOCX document and extract the tables from there (thanks to: How to get pictures and tables from .docx document using apache poi?).
My question is this: assuming that the format conversion may lose information, why are my results better this way? Tabula hasn't been able to do better automatically. To understand this, I have looked for information (e.g. Extracting table contents from a collection of PDF files), but I'm still very confused.
P.S.: So far I have used https://github.com/thoqbk/traprange (a method based on PDFBox), How to extract table as text from the PDF using Python? (PyPDF2), and Tabula. When I get home I will post code and example cases; I'm writing from my smartphone.

How to append/paste bufferedImage into a Word or RTF document using Java?

I created a Microsoft Word document and tried to write the buffered image to it, but all I got was garbled text. Is there a way to write (preferably append) a buffered image to a DOC or RTF file?
I want to avoid using docx4j, iText, or any other external package due to some constraints. But if there is no other way, please let me know.
My code, in case anyone needs it for reference:
ps_file = new File("ps_file.doc");
ImageIO.write(i1, "jpg", ps_file);
Word documents have their own syntax for storing data, so you can't just append text (or image bytes) to them and expect it to work.
You will have to use a third-party library unless you're willing to reinvent the wheel.
You can however create an RTF file which stores the image. There's a question similar to it that's been answered here:
Programmatically adding Images to RTF Document
Obviously it's for C# but the same procedures can easily be applied in Java.
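A minimal Java sketch of that RTF approach, using only the standard library, could look like the following. The class and method names (RtfImageWriter, toRtf) are hypothetical, and the display size assumes 96 dpi, which is an assumption of this sketch rather than something stated above. The image is encoded as JPEG and written as hex inside a \pict\jpegblip group.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: wraps a BufferedImage in a minimal RTF document.
class RtfImageWriter {
    static String toRtf(BufferedImage image) throws IOException {
        // Encode the image as JPEG in memory (use an RGB image without alpha)
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "jpg", jpeg)) {
            throw new IOException("No JPEG writer available for this image type");
        }

        StringBuilder rtf = new StringBuilder("{\\rtf1\\ansi\n{\\pict\\jpegblip");
        rtf.append("\\picw").append(image.getWidth());
        rtf.append("\\pich").append(image.getHeight());
        // Desired display size in twips; assumes 96 dpi (1440 / 96 = 15 twips per pixel)
        rtf.append("\\picwgoal").append(image.getWidth() * 15);
        rtf.append("\\pichgoal").append(image.getHeight() * 15);
        rtf.append('\n');

        // RTF stores picture data as hexadecimal text
        byte[] data = jpeg.toByteArray();
        for (int i = 0; i < data.length; i++) {
            rtf.append(String.format("%02x", data[i] & 0xff));
            if (i % 64 == 63) {
                rtf.append('\n'); // keep lines short; line breaks inside hex data are ignored
            }
        }
        rtf.append("}\n}");
        return rtf.toString();
    }
}
```

The resulting string can be saved as an .rtf file with Files.write(path, rtf.getBytes(StandardCharsets.US_ASCII)). Appending to an existing RTF file would mean inserting the {\pict ...} group before the document's final closing brace rather than writing after it.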

Print xml in pdf using itext

I want to print XML to a PDF using iText in Java, nicely formatted, with colors and indentation as shown in Notepad++.
Is there any API or suggestion for this?
I have converted XHTML to pdf, via iText, using flying saucer for the rendering (previously xhtml renderer).
http://code.google.com/p/flying-saucer/
You can format using CSS, though I do remember it's slightly temperamental, however you can tweak it to get what you want, and end up with something nicely formatted.
I wasn't sure what you meant regarding Notepad++. I don't have PDF support there; a PDF just opens as binary file contents. Is there a PDF plugin you use?
::Answer updated after comments below.
Thanks for the comment, I understand the question much better now. I thought you wanted to output the data in the XML in the PDF, now I understand you want to see the raw XML itself in the PDF, formatted as you'd see XML formatted in Notepad, colours and all.
XML is a markup language designed to describe data, so you want to get this into a language that can describe the presentation and style as well as the data. I'd suggest:
1) Convert the XML to XHTML, so that all the XML (tags, attributes) is your content, and you have classes describing each type (for example, attribute names, attribute values, start tag, end tag). I don't know if you can use an XSLT library to transform it this way; otherwise you can write something yourself in Java, walking through the DOM and outputting it in the way you want. This way you can
2) Create CSS to style your classes as you want - e.g. have all attribute names as text color "red"
3) Use iText and flying saucer as above to convert the XHTML and CSS into PDF using Java, as described in original answer
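Steps 1 and 2 can be sketched with the JDK's built-in DOM parser. This is only an illustration under some assumptions: the class name XmlHighlighter and the CSS class names (tag, attr-name, attr-value) are invented here, comments and processing instructions are ignored, and the colours are arbitrary. The XHTML string it returns would then be handed to the renderer in step 3.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: renders raw XML as indented, CSS-styled XHTML.
class XmlHighlighter {
    static String toXhtml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        out.append("<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><style type=\"text/css\">")
           .append(".tag{color:blue}.attr-name{color:red}.attr-value{color:purple}")
           .append("pre{font-family:monospace}</style></head><body><pre>");
        render(doc.getDocumentElement(), 0, out);
        return out.append("</pre></body></html>").toString();
    }

    // Walks the DOM and emits one line per start tag, text node, and end tag.
    private static void render(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                indent(depth, out).append(escape(text)).append('\n');
            }
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // comments, processing instructions etc. are skipped in this sketch
        }
        indent(depth, out).append(span("tag", "<" + node.getNodeName()));
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.append(' ').append(span("attr-name", a.getNodeName())).append('=')
               .append(span("attr-value", "\"" + a.getNodeValue() + "\""));
        }
        out.append(span("tag", ">")).append('\n');
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            render(c, depth + 1, out);
        }
        indent(depth, out).append(span("tag", "</" + node.getNodeName() + ">")).append('\n');
    }

    private static StringBuilder indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        return out;
    }

    private static String span(String cssClass, String text) {
        return "<span class=\"" + cssClass + "\">" + escape(text) + "</span>";
    }

    // The XML markup becomes content, so its special characters must be escaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```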

Convert Word Document to PDF using Java

I want to convert an MS Word document to a PDF file using POI.jar (to read the MS Word content) and Itext.jar (to create the PDF file).
For plain text in MS Word, I am able to convert it into PDF. But I have a few images in the Word document, and I want to put those images into the PDF.
Could someone please help me out?
You're in luck: I just stumbled upon JODConverter. It uses OpenOffice to convert documents from Java, and it's very easy to use.
There isn't such a solution for free, you will have to buy something like Aspose components, but you can also save the Word document as HTML and use any of the available HTML-to-PDF tools to convert it to PDF using Java. One of them is wkhtmltopdf.

Print XML as PDF using PCL

I have an XML file which I want to print as a PDF using PCL. I am new to PCL. Can I use PCL to get the XML printed in PDF format directly, or do I need some intermediate process to create a PDF file and then use PCL to get it printed as PDF?
If you have an XML file, there are two ways to get a PDF file.
1. Create stylesheet for your xml, and use XEP
or
2. use just your xml and VisualXSL, which will help you create your pdf for print.
In addition, if you create your own XSL stylesheet, XEP can produce many types of PDF, for example PDF/A-1a or other conformance levels.
Both XEP and VisualXSL are RenderX products (http://www.renderx.com/tools/index.html), and they have trial versions that you can use. I have used both products many times and was satisfied.
You can also visit the forum, where you can find answers about how to use the products described above and how useful they are: http://cooltools.renderx.com
PCL is a printer control language. In other words, it consists of command bytes that you send to a (usually HP) printer, which then converts them to ink on a page. This is normally not the way to generate a PDF, since too much information from the original will be lost.
You will normally want to convert your XML to something describing the actual print you want to have. A reasonable choice for this is the XSL-FO XML dialect which, however, is not very nice to do by hand. You can then choose to convert your XML into DocBook XML which in turn has very nice style sheets for converting further on to XSL-FO and other formats.
You can then use Apache FOP to convert XSL-FO into many formats, one being PDF. If FOP turns out to be too limited, this also lets you replace it with one of several commercial XSL-FO rendering engines at a later date.
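The first half of that pipeline, XML to XSL-FO, can be sketched with the XSLT 1.0 processor that ships with the JDK. The stylesheet below is a toy written for this example (one A4 page, one fo:block per leaf element), and the class name XmlToFo is hypothetical; a real stylesheet, or the DocBook route mentioned above, would be far richer. The resulting XSL-FO string is what would then be passed to FOP or another XSL-FO renderer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns arbitrary XML into a very simple XSL-FO document.
class XmlToFo {
    // Illustrative stylesheet: prints "elementName: text" for every leaf element.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/'>"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name='page'"
        + " page-height='297mm' page-width='210mm' margin='20mm'>"
        + "<fo:region-body/></fo:simple-page-master></fo:layout-master-set>"
        + "<fo:page-sequence master-reference='page'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:for-each select='//*[not(*)]'>"
        + "<fo:block><xsl:value-of select='name()'/>: <xsl:value-of select='.'/></fo:block>"
        + "</xsl:for-each>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template></xsl:stylesheet>";

    static String toFo(String xml) throws Exception {
        // Compile the stylesheet with the JDK's built-in XSLT processor
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Keeping the XSL-FO step separate from the rendering step is what makes the renderer swappable later: only the code that consumes the XSL-FO output needs to change.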
