IO Issue - Byte Array Image into XHTML(FlyingSaucer) - java

I have a solution that inserts strings into an XHTML document and prints the results as Reports. My employer has asked if we could pull images off their SQL database (stored as byte arrays) to insert into the Reports.
I am using FlyingSaucer as the XHTML renderer, and I've been using the Java DOM to modify report templates stored in the Report Generator's package.
The only solution I can think of at the moment is to construct each image, save it as a file, reference the file in an img tag (or background-image) in the generated report, print the report and then delete the file. This seems really sloppy, and I imagine it will be very time-consuming.
I can't help but feel there must be a more elegant solution. Any suggestions for inserting a byte array into html?

Read the image and convert it into its Base64-encoded form:
InputStream image = getClass().getClassLoader().getResourceAsStream("image.png");
String encodedImage = BaseEncoding.base64().encode(ByteStreams.toByteArray(image));
I've used BaseEncoding and ByteStreams from Google Guava.
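On Java 8 or later, the JDK's own java.util.Base64 can stand in for Guava's BaseEncoding, and since your images already come out of the database as byte arrays, no stream reading is needed. A minimal sketch, assuming a PNG image; the class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUriExample {
    /** Builds a data URI such as "data:image/png;base64,..." from raw image bytes. */
    public static String toDataUri(byte[] imageBytes, String mimeType) {
        // java.util.Base64 (Java 8+) produces standard Base64 without line breaks
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    public static void main(String[] args) {
        // Stand-in for the bytes loaded from the SQL database
        byte[] fakeImage = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri(fakeImage, "image/png")); // data:image/png;base64,YWJj
    }
}
```

The returned string is what goes into the src attribute in the next step.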
Change the src attribute of the img element in your Document object:
Document doc = ...; // get Document from XHTMLPanel.getDocument() or create
                    // a new one using DocumentBuilderFactory
Element img = doc.getElementById("myImage");
// setAttribute also works when the img has no src attribute yet
img.setAttribute("src", "data:image/png;base64," + encodedImage);
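A caveat on that lookup: DOM's getElementById only finds elements whose id attribute is declared as an ID type (for example by a DTD), so on an XHTML document parsed without one it may return null. A sketch that avoids relying on that by scanning the img elements and using setAttribute, which also creates src if the element has none yet; the id "myImage" and the class name are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ImgSrcSetter {
    /** Parses an XHTML string into a DOM Document. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    /** Sets src on the first <img> whose id matches; returns false if none was found. */
    public static boolean setImageSrc(Document doc, String id, String dataUri) {
        NodeList images = doc.getElementsByTagName("img");
        for (int i = 0; i < images.getLength(); i++) {
            Element img = (Element) images.item(i);
            if (id.equals(img.getAttribute("id"))) {
                img.setAttribute("src", dataUri); // creates the attribute if missing
                return true;
            }
        }
        return false;
    }
}
```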
Unfortunately, FlyingSaucer does not support data URIs out of the box, so you'll have to create your own ReplacedElementFactory. Read the "Using Data URLs for embedding images in Flying Saucer generated PDFs" article; it contains a complete solution.
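Whatever the custom factory looks like, it has to turn the data URI back into image bytes before handing them to the PDF renderer. That decoding step alone can be sketched with the JDK. This assumes only base64 data URIs are used (not percent-encoded ones), and DataUriDecoder is a hypothetical helper, not part of Flying Saucer:

```java
import java.util.Base64;

public class DataUriDecoder {
    /**
     * Decodes "data:[<mediatype>];base64,<data>" into raw bytes.
     * Returns null if the string is not a base64 data URI.
     */
    public static byte[] decode(String uri) {
        if (uri == null || !uri.startsWith("data:")) {
            return null;
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            return null;
        }
        String header = uri.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            return null; // only the base64 form is handled in this sketch
        }
        // The MIME decoder ignores line breaks that may appear in long attribute values
        return Base64.getMimeDecoder().decode(uri.substring(comma + 1));
    }
}
```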


Replacing text in XWPFParagraph without changing format of the docx file

I am developing a font converter app that converts Unicode text to Krutidev/Shree Lipi (Marathi/Hindi) font text. The original docx file contains formatted words (e.g. color, font, text size, hyperlinks, etc.).
I want the final docx to keep the same formatting as the original docx after converting the words from Unicode to the other font.
Here is my code:
try {
    FileInputStream fileInputStream = new FileInputStream("StartDoc.docx");
    XWPFDocument document = new XWPFDocument(fileInputStream);
    Converter data = new Converter();
    for (XWPFParagraph p : document.getParagraphs()) {
        for (XWPFRun r : p.getRuns()) {
            String string2 = r.getText(0);
            if (string2 == null) {
                continue; // runs such as pictures or breaks have no text
            }
            // Strings are immutable, so keep the converted value
            // (assumes uniToShree returns the converted text)
            string2 = data.uniToShree(string2);
            r.setText(string2, 0);
        }
    }
    // Write the document to the file system
    FileOutputStream out = new FileOutputStream(new File("Output.docx"));
    document.write(out);
    out.close();
    System.out.println("Output.docx written successfully");
}
catch (IOException e) {
    System.out.println("We had an error while reading the Word Doc");
}
Thanks for the question.
I worked with POI some years ago, though on Excel workbooks rather than Word documents, but I'll still try to help you find the root cause of your error.
Java itself gives you good debugging information!
A good first step in narrowing down the error is not to replace the exception's own message with a fixed message of your own.
Try printing the result of e.getLocalizedMessage() or e.getMessage() and see what you get.
Printing the stack trace with printStackTrace() is also often useful for pinpointing where the error lies.
Share the output of these calls so we can help you debug the issue further.
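For example, the catch block could report the real exception alongside the fixed message. A small sketch; describe is a hypothetical helper name, and the exception thrown in main only simulates a failure:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

public class ExceptionReport {
    /** Returns the exception type, its message and its full stack trace as one string. */
    public static String describe(Exception e) {
        StringWriter trace = new StringWriter();
        e.printStackTrace(new PrintWriter(trace)); // same text printStackTrace() sends to stderr
        return e.getClass().getName() + ": " + e.getMessage() + System.lineSeparator() + trace;
    }

    public static void main(String[] args) {
        try {
            throw new IOException("StartDoc.docx (No such file or directory)");
        } catch (IOException e) {
            System.out.println("We had an error while reading the Word Doc");
            System.out.println(describe(e));
        }
    }
}
```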
[EDIT 1:]
So it seems you are able to process the file correctly as far as the font conversion of the data goes, but you are not able to reproduce the formatting of the original data in the converted file.
(Thus the message "We had an error while reading the Word Doc" is misleading ;) )
Now, there are two elements to a Word document:
Content
Structure or Schema
You are able to convert the data because you are working only on the content of your doc files.
In order to be able to retain the formatting of the contents, your solution needs to be aware of the formatting of the doc files as well and take care of that.
MS Word, which defines the .docx format, follows a particular set of schemas that define the formatting rules. These schemas are defined in Microsoft's XML namespace packages [1].
You can obtain the XML (HTML) form of the document quite easily (see the steps in [1] or the code in [2]) and even apply different schemas, or possibly your own schema definitions based on those in Microsoft's namespaces. You can do this either programmatically, which requires getting versed in XML, XSL and XSLT (w3schools [3] is a good starting point) and is hardly less complex than writing your own version of MS Word, or by using Word's built-in tools as shown in [1].
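Since a .docx file is a ZIP package whose main body, including the run-level formatting, lives in the word/document.xml entry, one quick way to see the structure behind the content is to read that entry directly. A minimal JDK-only sketch (the class name is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxXmlDump {
    /** Returns the XML of word/document.xml from a .docx stream, or null if absent. */
    public static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // Copy the current entry's bytes; read() stops at the end of the entry
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```

Running it on StartDoc.docx and on Output.docx and comparing the two outputs should show whether run properties such as w:rPr differ after the conversion.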
[1] https://www.microsoftpressstore.com/articles/article.aspx?p=2231769&seqNum=4
[2] https://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/testcases/org/apache/poi/hwpf/converter/TestWordToHtmlConverter.java
[3] https://www.w3schools.com/xml/
My answer gives only a cursory overview of how to achieve what you want; depending on your inclination and available time, use your discretion in choosing one path over the other.
Hope it helps!

Get Row and Col for embedded Object with POI

I'm currently working with Excel files (*.xlsm) and Apache POI, and I have been cracking my head over a task.
I receive some Excel files that have PDFs embedded in them, and I want to extract the PDFs and rename them based on the row and column they are in.
This seems odd, as I know the embedded objects are represented as images: they can occupy more than one cell and, technically, they are not "in" a cell.
The following code snippet lets me extract the embedded PDFs, but they are named OleObject[1..2..3 etc.], which doesn't give me any clue.
FileInputStream inStream = new FileInputStream(file);
XSSFWorkbook workbook = new XSSFWorkbook(inStream);
int i = 0; // counter used in the output file names
for (PackagePart pPart : workbook.getAllEmbedds()) {
    String contentType = pPart.getContentType();
    if (contentType.equals("application/vnd.openxmlformats-officedocument.oleObject")) {
        POIFSFileSystem fs = new POIFSFileSystem(pPart.getInputStream());
        // Read the PDF bytes from the OLE container's "CONTENTS" stream
        TikaInputStream stream = TikaInputStream.get(fs.createDocumentInputStream("CONTENTS"));
        byte[] bytes = IOUtil.toByteArray(stream);
        stream.close();
        OutputStream outStream = new FileOutputStream(new File(ROOT_DIRECTORY, "PDF" + i + ".pdf"));
        IOUtil.copy(bytes, outStream);
        outStream.close();
        i++;
    }
}
I wanted to know whether org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet will let me see the XML of the Excel sheet, so that maybe with that I can get the information I need, like this:
<oleObjects>
  <mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
    <mc:Choice Requires="x14">
      <oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4">
        <objectPr defaultSize="0" r:id="rId5">
          <anchor moveWithCells="1">
            <from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from>
            <to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to>
          </anchor>
        </objectPr>
      </oleObject>
    </mc:Choice>
    <mc:Fallback>
      <oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4"/>
    </mc:Fallback>
  </mc:AlternateContent>
</oleObjects>
The relevant part is the anchor inside objectPr:
<objectPr defaultSize="0" r:id="rId5">
  <anchor moveWithCells="1">
    <from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from>
    <to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to>
  </anchor>
</objectPr>
I guess it would be possible using the anchor information, but I just can't find out how to get it.
I hope this makes clear what I'm trying to do.
Thanks in advance.
I've looked at the source code in the current ooxml-schemas sources JARs, which you can find here: http://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.3/
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet extends org.apache.xmlbeans.XmlObject, which can give you the XML as a string via the inherited toString() method. Alternatively, you can quickly access the list of OLE objects in the worksheet by calling getOleObjects() on your CTWorksheet object (which you can obtain from an XSSFSheet with getCTWorksheet()).
/**
* Gets the "oleObjects" element
*/
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObjects getOleObjects();
CTOleObjects itself extends org.apache.xmlbeans.XmlObject and again you can get the XML using toString() for parsing, or get a list of org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject OLE objects for iteration using CTOleObjects.getOleObjectList().
/**
* Gets a List of "oleObject" elements
*/
java.util.List<org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject> getOleObjectList();
CTOleObject doesn't seem to have getter methods for the <from> and <to> child XML elements that would let you determine the columns, so I think you would need to do some XML parsing, or string searching, to get this information if it is contained in the string XML representation.
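The XML parsing suggested above could look roughly like this. It is a sketch that assumes you pass it namespace-well-formed XML text, such as the string returned by toString() on the CTWorksheet (a fragment copied by hand, like the one in the question, would need its xmlns declarations added). It matches elements by local name so the prefix does not matter, and it collects the from corner of every anchor in document order, which may also include anchors of other objects such as form controls if the sheet has any:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class OleAnchorReader {
    /** Returns {col, row} (zero-based) of every anchor's "from" corner, in document order. */
    public static List<int[]> readFromCells(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        List<int[]> cells = new ArrayList<>();
        // "*" matches any namespace, so xdr:, ns1: or no prefix all work
        NodeList froms = doc.getElementsByTagNameNS("*", "from");
        for (int i = 0; i < froms.getLength(); i++) {
            Element from = (Element) froms.item(i);
            cells.add(new int[] {childInt(from, "col"), childInt(from, "row")});
        }
        return cells;
    }

    /** Reads the integer text of the first descendant with the given local name. */
    private static int childInt(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("Missing <" + localName + "> in anchor");
        }
        return Integer.parseInt(nodes.item(0).getTextContent().trim());
    }
}
```

For the anchor in the question this yields col 8 and row 11; both are zero-based, so POI's new CellReference(11, 8).formatAsString() should turn them into "I12", which could then be used in the output file name.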
Hope this helps.

How to create a PDF/DOCX files in Java/Scala?

I am creating a web application that will accept some inputs from the user (like name, age, address, etc.) and generate predefined forms filled with that information for the user to download and print.
For example, an application form for a driving license or something along those lines. The backend will hold the format information for the document to be generated, and the rest of the information will be gathered from the user on the front end.
I am going to use Play Framework 2.5 with Java/Scala as the programming language, but right now I don't know whether there are any free libraries/APIs I can use for this document generation.
I need to be able to control the font size, style, indentation, paragraphs, page borders, page numbers, alignment, document headers and footers, page size (A4, Legal, etc.) and some other basic things. I also need documents in formats that are widely supported for editing and printing, such as PDF or DOCX. DOCX is preferred, so the user can edit something after downloading the document and before printing it.
I have used the Apache POI library to parse and create MS Word documents (including .docx files):
http://www.tutorialspoint.com/apache_poi_word/apache_poi_word_quick_guide.htm
It's not amazing but it's the best I've found :)
I have used docx4j, which can convert XHTML to DOCX (the XHTML import lives in the separate docx4j-ImportXHTML module).
For your requirement, you can save your format information as an XHTML template and insert the form input (name, age, address, etc.) into the template at runtime.
Here is some sample code for reference, from this link:
import java.io.File;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class XhtmlToDocx {
    public static void main(String[] args) throws Exception {
        String xhtml =
            "<table border=\"1\" cellpadding=\"1\" cellspacing=\"1\" style=\"width:100%;\"><tbody><tr><td>test</td><td>test</td></tr><tr><td>test</td><td>test</td></tr><tr><td>test</td><td>test</td></tr></tbody></table>";
        // To docx, with content controls
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        XHTMLImporterImpl xhtmlImporter = new XHTMLImporterImpl(wordMLPackage);
        // convert(xhtml, baseUrl) returns the docx content; no base URL is needed here
        wordMLPackage.getMainDocumentPart().getContent().addAll(
            xhtmlImporter.convert(xhtml, null));
        wordMLPackage.save(new File("D:/sample.docx"));
    }
}
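The runtime step described above, placing the form input into an XHTML template before conversion, can be sketched with the JDK alone. The ${name} placeholder syntax and the class name are assumptions of this sketch; the important part is escaping the user input so the result stays well-formed XHTML for the importer:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XhtmlTemplate {
    // Placeholders look like ${name}; this syntax is an assumption of the sketch
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)}");

    /** Replaces ${key} placeholders with XML-escaped values; unknown keys become empty. */
    public static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), "");
            // quoteReplacement keeps "$" and "\" in user input literal
            m.appendReplacement(out, Matcher.quoteReplacement(escapeXml(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Escapes the characters that would otherwise break well-formed XHTML. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The filled string can then be passed to the importer's convert(...) call in place of the hard-coded table string.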

How to add custom XML storage part to Word doc - preferably with docx4j

I'm trying to populate a Word content control with XML data using docx4j (version 3.2.1). I'm evaluating this in order to use it for invoice generation. The documents we want to generate are not very complicated so this looks like a good approach to me.
I have created the content control through Word 2010 dev tools. This is how I try to inject the XML into the docx (taken from this example):
WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(input_DOCX));
FileInputStream xmlStream = new FileInputStream(new File(input_XML));
Docx4J.bind(wordMLPackage, xmlStream, Docx4J.FLAG_BIND_INSERT_XML | Docx4J.FLAG_BIND_BIND_XML); // flags are combined with bitwise OR
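Separately from the missing part: bit flags such as Docx4J.FLAG_BIND_INSERT_XML and Docx4J.FLAG_BIND_BIND_XML are meant to be combined with bitwise OR (|), because AND (&) of two distinct single-bit flags is 0. A plain-Java illustration; the constants below are made-up values, not docx4j's actual ones:

```java
public class BindFlags {
    // Hypothetical single-bit flags for illustration only (not docx4j's constants)
    public static final int INSERT_XML = 1;      // binary 0001
    public static final int BIND_XML = 1 << 1;   // binary 0010
    public static final int REMOVE_SDT = 1 << 2; // binary 0100

    /** True if every bit of flag is set in flags. */
    public static boolean isSet(int flags, int flag) {
        return (flags & flag) == flag;
    }

    public static void main(String[] args) {
        int wrong = INSERT_XML & BIND_XML; // 0001 & 0010 == 0000: no flag set
        int right = INSERT_XML | BIND_XML; // 0001 | 0010 == 0011: both flags set
        System.out.println(wrong + " " + isSet(right, INSERT_XML) + " " + isSet(right, BIND_XML)); // 0 true true
    }
}
```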
I get the following exception:
org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't find CustomXmlDataStoragePart! exiting..
at org.docx4j.Docx4J.bind(Docx4J.java:300)
at org.docx4j.Docx4J.bind(Docx4J.java:271)
How can I add the CustomXmlDataStoragePart with docx4j, if it doesn't exist yet? Or should/can I do this in Word directly?
Note: I decided to prepare templates in Word directly, because later on these templates must be edited by non-technical users and I don't want to burden them with extra tools, if possible.
You say you "created the content control through Word 2010 dev tools". Unless you mean the Content Control Toolkit, you need to use that or, better, one of the OpenDoPE Word add-ins (not both).
These tools add a custom XML part to the docx and let you associate it with your content controls via XPath data bindings.
Then, when you invoke Docx4J.bind at runtime, docx4j finds that existing custom XML part and replaces it with the XML file you provide, which contains your runtime data.
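Before calling Docx4J.bind, it can help to confirm that the template actually contains a custom XML part. Word stores these parts under customXml/ in the package (typically customXml/item1.xml plus a matching itemProps1.xml), so a JDK-only diagnostic can list those ZIP entries. This is a sketch for checking the file, not how docx4j itself locates the part, and the class name is illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class CustomXmlCheck {
    /** Lists customXml/item*.xml entries (excluding itemProps*.xml) in a .docx stream. */
    public static List<String> customXmlItems(InputStream docx) throws IOException {
        List<String> items = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Relationship files live in customXml/_rels/ and are skipped by the prefix test
                if (name.startsWith("customXml/item") && !name.startsWith("customXml/itemProps")
                        && name.endsWith(".xml")) {
                    items.add(name);
                }
            }
        }
        return items;
    }
}
```

An empty list for your template would be consistent with the "Couldn't find CustomXmlDataStoragePart" exception.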

Unable to retrieve inline images from the mail body in Lotus Notes

I am unable to retrieve inline images/screenshots from the mail body in Lotus Notes using Java via
document.getItemValueString("Body")
With the above call I can retrieve the text in the mail body, but not the inline images.
Please suggest how to retrieve inline images from the mail body.
Thanks in advance.
LSP Jyothi
First of all: Body is a NotesRichTextItem. You would have to use the NotesRichTextItem methods and properties to get the inline image... if there were any for that purpose.
Inline images are not accessible by any means in LotusScript. To get them, you need to:
Export the document as XML
Find the part in the XML that represents the inline image
Take the Base64-encoded value there and convert it to binary, using the MIME classes for that (a trick).
Write the data to a file
There is a lot of code involved in doing this. I'll just post the "crucial" parts of the code here (untested, no syntax check, just as a starting point):
EDIT: Sorry, I am not a Java expert and only saw the "lotusscript" tag, so my example is LotusScript code. It should be similar in Java, and Base64 operations are already built into Java (java.util.Base64 since Java 8), so there is no need for the MIME trick.
Dim strDxl as String
Dim strFoundBase64 as String
Dim exporter as NotesDXLExporter
Dim stream as NotesStream
Dim docConvert as NotesDocument
Dim mimeEntity as NotesMimeEntity
Set exporter = session.CreateDXLExporter
exporter.ConvertNotesBitmapsToGIF = True
strDxl = exporter.Export(document)
'- Search through strDxl and find everything that is in the following tags:
'- <gif></gif>, <gif originalformat='notesbitmap'></gif>, <jpeg></jpeg>, <png></png>
strFoundBase64 = ...'assign text between tags
'- use Mime class to convert to binary
Set docConvert = New NotesDocument( document.ParentDatabase )
Set mimeEntity = docConvert.CreateMIMEEntity
Call mimeEntity.SetContentFromBytes(strFoundBase64, "image/gif", ENC_BASE64)
'- Write result to file
Set stream = session.CreateStream
Call stream.Open( "C:\Temp\image.gif", "binary")
Call mimeEntity.GetContentAsBytes(stream)
Call stream.Close()
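As the EDIT notes, in Java the Base64 step needs no MIME trick. A sketch of steps 3 and 4, assuming you already have the exported DXL as a String (obtaining it through the Java DXL exporter is not shown). It uses a regular expression instead of an XML parser, which is a simplification for these flat elements, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DxlImageExtractor {
    // <gif>, <gif originalformat='notesbitmap'>, <jpeg> and <png> elements, as listed above
    private static final Pattern IMAGE = Pattern.compile(
            "<(gif|jpeg|png)(?:\\s[^>]*)?>(.*?)</\\1>", Pattern.DOTALL);

    /** Decodes the Base64 content of every image element found in a DXL string. */
    public static List<byte[]> extractImages(String dxl) {
        List<byte[]> images = new ArrayList<>();
        Matcher m = IMAGE.matcher(dxl);
        while (m.find()) {
            // DXL wraps long Base64 text across lines; the MIME decoder ignores line breaks
            images.add(Base64.getMimeDecoder().decode(m.group(2)));
        }
        return images;
    }
}
```

Each returned array can then be written out with java.nio.file.Files.write(path, bytes), using a .gif, .jpg or .png extension to match the element it came from.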
