Converting Document object to Byte[]


I initialize a Document object like this:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
After that I build the XML by inserting data into the doc object.
Finally I write the contents to a file on my computer.
My question is: how do I write the contents of doc into a byte[]?
This is how I write the content to the XML file:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("changeOut.xml"));
// Output to console for testing
// StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

Pass an OutputStream instead of a File to the StreamResult constructor:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
StreamResult result = new StreamResult(bos);
transformer.transform(source, result);
byte[] array = bos.toByteArray();
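
For completeness, here is a minimal self-contained sketch of the same idea using only JDK classes. The class name `DocumentToBytes`, the helper `toBytes`, and the sample `order` element are made up for illustration. Setting the encoding explicitly is an addition beyond the answer above; it keeps the XML declaration and the actual bytes in agreement.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentToBytes {

    // Serializes a DOM Document into a byte[] without touching the file system.
    public static byte[] toBytes(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Assumption: UTF-8 is the desired encoding for the serialized XML.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(bos));
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("order"); // hypothetical sample data
        root.setAttribute("id", "42");
        root.setTextContent("widget");
        doc.appendChild(root);

        byte[] bytes = toBytes(doc);
        // Decode with the same charset that was used for serialization.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

A ByteArrayOutputStream holds everything in memory, so this suits documents of moderate size; for very large output a file or a streaming target may still be the better choice.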

This works for me:
public byte[] documentToByte(Document document)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
org.apache.xml.security.utils.XMLUtils.outputDOM(document, baos, true);
return baos.toByteArray();
}

Put a ByteArrayOutputStream where you have the File and you should be good.

Related

How to get PDF in byte[] without forming a file?

Below is my code snippet:
try (OutputStream out = new FileOutputStream(PDF_NAME)) {
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(new StreamSource(xsltFile));
Result res = new SAXResult(fop.getDefaultHandler());
transformer.transform(new StreamSource(IOUtils.toInputStream(xml, "UTF-8")), res);
}
byte[] inputFile = Files.readAllBytes(Paths.get(PDF_NAME));
String encodedFile = Base64.getEncoder().encodeToString(inputFile);
InventoryListSnapshot pojo = new InventoryListSnapshot(invList.getInventoryLayoutId(), invList.getProjectId(), invList.getAuthorUsername(), encodedFile);
repository.save(pojo);
It uses XSL-FO to generate the PDF in a file. I need to store this PDF, Base64-encoded, as a BLOB in the DB, so I don't need the file itself.
How can I save the PDF to the DB without creating a file?

You would change this:
OutputStream out = new FileOutputStream(PDF_NAME)
to
OutputStream out = new ByteArrayOutputStream()

Thanks, it works.
The new version is:
byte[] pdf;
try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(new StreamSource(xsltFile));
Result res = new SAXResult(fop.getDefaultHandler());
transformer.transform(new StreamSource(IOUtils.toInputStream(xml, "UTF-8")), res);
// Declaring the variable as ByteArrayOutputStream avoids the cast.
pdf = out.toByteArray();
}
String encodedFile = Base64.getEncoder().encodeToString(pdf);
InventoryListSnapshot pojo = new InventoryListSnapshot(invList.getInventoryLayoutId(), invList.getProjectId(), invList.getAuthorUsername(), encodedFile);
repository.save(pojo);
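
The same pattern applies to anything that writes to an OutputStream. Below is a rough sketch using only JDK classes; `InMemoryCapture`, `StreamWriter`, and `captureAsBase64` are hypothetical names, and the lambda in `main` merely stands in for the FOP transformation, which is not reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class InMemoryCapture {

    // Hypothetical callback type: anything that writes its result to a stream.
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws Exception;
    }

    // Runs the writer against an in-memory buffer and returns the bytes as Base64.
    public static String captureAsBase64(StreamWriter writer) throws Exception {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            writer.writeTo(buffer);
            return Base64.getEncoder().encodeToString(buffer.toByteArray());
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the PDF generation: write a few bytes to the stream.
        String encoded = captureAsBase64(
                out -> out.write("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(encoded);
    }
}
```

If the database column is a true BLOB, storing the raw byte[] may be enough; Base64 is only needed when the value has to travel as text.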

Creating PDF File from dynamic invoice and XSLT

I have been trying to generate PDF files for customer invoices for a long time. Invoices are saved as XML files, and each customer can have their own XSLT file for a custom invoice view (if not, a default XSLT is used).
My problem is transforming XML/(X)HTML files to PDF files. I have read about almost all the libraries for doing this and tried the transformation with almost all of them.
1) Apache FOP
http://www.javaworld.com/article/2071749/java-app-dev/convert-html-content-to-pdf-format.html
I transformed the invoice XML to XHTML using the default XSLT and JTidy, and then tried to convert the generated XHTML to PDF with the XSL-FO stylesheet provided by Antenna House. I only managed to generate a PDF file containing just the header, so no success. The code for this is below.
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(getClass().getResourceAsStream("/xslts/general.xslt"));
// xslt.setSystemId("/xslts/general.xslt");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
Transformer transformer = factory.newTransformer(xslt);
DOMSource domSource = new DOMSource(document);
transformer.transform(domSource, result);
String strResult = writer.toString();
Tidy tidy = new Tidy();
// tidy.setDropEmptyParas(true);
// tidy.setJoinStyles(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
tidy.setForceOutput(true);
ByteArrayInputStream boas = new ByteArrayInputStream(strResult.getBytes("UTF-8"));
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream baoOut = new FileOutputStream(new File("C:\\Users\\xxx\\out.pdf"));
Document tiedDoc = tidy.parseDOM(boas, bos);
DOMSource tiedDocDomSource = new DOMSource(tiedDoc);
StringWriter writer2 = new StringWriter();
StreamResult result2 = new StreamResult(writer2);
Transformer xsl2foTrans = factory.newTransformer(new StreamSource(getClass().getResourceAsStream("/xslt/xhtml2fo.xsl")));
xsl2foTrans.transform(tiedDocDomSource, result2);
// from here on
final FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());
File userConfig = new File("C:\\Users\\xxx\\Desktop\\pdfWork\\fop.xconf");
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
// configure foUserAgent as desired
// Setup output stream. Note: Using BufferedOutputStream
// for performance reasons (helpful with FileOutputStreams).
OutputStream out = baoOut;
out = new BufferedOutputStream(out);
// Construct fop with desired output format
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
// Setup JAXP using identity transformer
transformer = factory.newTransformer(); // identity transformer
// Setup input stream
Source src = new StreamSource(new StringReader(writer2.toString()));
// Resulting SAX events (the generated FO) must be piped through to FOP
Result res = new SAXResult(fop.getDefaultHandler());
// Start XSLT transformation and FOP processing
transformer.transform(src, res);
out.close();
2) IText
As far as I know, iText needs valid XHTML to generate PDF files. So I transform the invoice XML to HTML using default.xslt and then to XHTML using JTidy with the setXHTML option set to true. I managed to generate a PDF file from the resulting XHTML, but the PDF is not rendered well: the CSS in the style tag is not recognized. No success. The code for this is below.
StreamSource xslt = new StreamSource(getClass().getResourceAsStream("/xslt/general.xslt"));
// StreamSource xslt = new StreamSource(new FileInputStream(new File("C:\\Users\\XXX\\Desktop\\pdfWork\\firm.xslt")));
TransformerFactory factory = TransformerFactory.newInstance();
// Source xslt = new StreamSource(getClass().getResourceAsStream("/xslts/general.xslt"));
// xslt.setSystemId("/xslts/general.xslt");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
Transformer transformer = factory.newTransformer(xslt);
DOMSource domSource = new DOMSource(document);
transformer.transform(domSource, result);
String strResult = writer.toString();
Tidy tidy = new Tidy();
// tidy.setDropEmptyParas(true);
tidy.setJoinStyles(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
tidy.setForceOutput(true);
ByteArrayInputStream boas = new ByteArrayInputStream(strResult.getBytes("UTF-8"));
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream baoOut = new FileOutputStream(new File("C:\\Users\\XXX\\Desktop\\pdfWork\\out.pdf"));
tidy.parseDOM(boas, bos);
com.itextpdf.text.Document documentText = new com.itextpdf.text.Document(PageSize.LETTER); // PageSize.A4, 10.0F, 10.0F, 10.0F, 0.0F
PdfWriter pdfWriter = PdfWriter.getInstance(documentText, new FileOutputStream(new File("C:\\Users\\Onur\\Desktop\\pdfWork\\out.pdf")));
documentText.open();
// documentText.open();
// HTMLWorker htmlWorker = new HTMLWorker(documentText);
// htmlWorker.parse(new StringReader(IOUtils.toString(new ByteArrayInputStream(bos.toByteArray()), "UTF-8")));
// documentText.close();
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
worker.parseXHtml(pdfWriter, documentText, new StringReader(IOUtils.toString(new ByteArrayInputStream(bos.toByteArray()))));
documentText.close();
3) Flying Saucer
I followed almost the same steps as with iText. I managed to generate a PDF in which the CSS in the style tag is recognized. The only problem is that the table and some elements overflow; they do not fit on the page. I solved this with a page rule, as suggested in
"How can i make my html page to be fit in the pdf using Flying Saucer":
Document document = // invoice as document
StreamSource xslt = new StreamSource(getClass().getResourceAsStream("/xslt/general.xslt"));
TransformerFactory factory = TransformerFactory.newInstance();
// Source xslt = new StreamSource(getClass().getResourceAsStream("/xslts/general.xslt"));
// xslt.setSystemId("/xslts/general.xslt");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
Transformer transformer = factory.newTransformer(xslt);
DOMSource domSource = new DOMSource(document);
transformer.transform(domSource, result);
String strResult = writer.toString();
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
tidy.setForceOutput(true);
ByteArrayInputStream boas = new ByteArrayInputStream(strResult.getBytes("UTF-8"));
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream baoOut = new FileOutputStream(new File("C:\\Users\\XXX\\Desktop\\pdfWork\\out2.pdf"));
tidy.parseDOM(boas, bos);
System.out.println(bos.toString("UTF-8"));
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("/unicode/ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// renderer.setDocument();
renderer.setDocumentFromString(IOUtils.toString(new ByteArrayInputStream(bos.toByteArray())));
// renderer.setPDFVersion('');
renderer.layout();
renderer.createPDF(baoOut);
renderer.finishPDF();
baoOut.flush();
baoOut.close();
As I said, I managed to generate a PDF file from XHTML using Flying Saucer, but I had to add a page rule and some inline styling to general.xslt to do it. The problem is that each customer can have their own XSLT for the invoice view, so I don't want to touch or change the XSLT. general.xslt can be downloaded from the link below. How can I achieve this? Is it possible? Or what am I doing wrong? Thanks in advance!
http://www.efatura.gov.tr/dosyalar/kilavuzlar/UBL-TR1.2_Paketi.zip

How to convert Word DOCX to HTML using java

I am using the following code. It only converts .doc documents to HTML; I need to convert .docx documents to HTML.
try
{
HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("C:\\DOC.doc"));
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
String result = new String(out.toByteArray());
System.out.println(result);
ConvertDocxBigToXHTML html = new ConvertDocxBigToXHTML();
html.creatHTML(result);
}
catch(Exception e)
{
e.printStackTrace();
}
Can someone tell me what changes I have to make to this code?

How to read formatted text as HTML from MS Word (.doc) using POI?

I want to read the formatted text as HTML, for example <html><b>boldvalue</b><img src="link"></html>, and I also want to get the image through the link in the img tag. I'm using POI. Does POI have an option to return the data in HTML format like this?

Try this:
HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("D:\\temp\\seo\\1.doc"));
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
DocumentBuilderFactory.newInstance().newDocumentBuilder()
.newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
// Decode with the same charset that was set as the output encoding above.
String result = out.toString("UTF-8");
System.out.println(result);

Convert Word to HTML with Apache POI

I see that there is a converter called WordToHtmlConverter but the process method is not exposed. How should I pass a doc file and get HTML file (or OutputStream)?

This code is now working for me!
HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("D:\\temp\\seo\\1.doc"));
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
DocumentBuilderFactory.newInstance().newDocumentBuilder()
.newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
// Decode with the same charset that was set as the output encoding above.
String result = out.toString("UTF-8");
System.out.println(result);
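
The serialization half of this answer does not depend on POI and can be tried on its own. Here is a minimal sketch using only JDK classes; the class name `DomToHtml`, the helper `toHtml`, and the hand-built sample document are assumptions for illustration. In the answer above the DOM comes from `wordToHtmlConverter.getDocument()` instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToHtml {

    // Serializes a DOM tree with the "html" output method and returns it as a String.
    public static String toHtml(Document doc) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        // Decode with the charset that was requested as the output encoding.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the document produced by the Word converter.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element bold = doc.createElement("b");
        bold.setTextContent("boldvalue");
        body.appendChild(bold);
        html.appendChild(body);
        doc.appendChild(html);

        System.out.println(toHtml(doc));
    }
}
```

Decoding with an explicit charset matters because `new String(byte[])` uses the platform default, which may not be UTF-8 on every machine.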
