How to get the drawings from the apache POI XWPFDocument? - java

I tried to get the drawings from the XWPFDocument this way (my data.docx contains only one rectangle and its text).
XWPFDocument wordDocumentObj = new XWPFDocument(new FileInputStream(new File("data.docx")));
Iterator<IBodyElement> bodyElementIterator = wordDocumentObj.getBodyElementsIterator();
while(bodyElementIterator.hasNext()){
IBodyElement element = bodyElementIterator.next();
if (element instanceof XWPFParagraph) {
XWPFParagraph paragrapObj = (XWPFParagraph)element;
for(IRunElement irunObj : paragrapObj.getIRuns()) {
XWPFRun runObj = (XWPFRun)irunObj;
// I read whole the API doc, I think it is the only way to get the drawings
System.out.println(runObj.getCTR().getDrawingList());// No element returned
System.out.println(runObj.getCTR().getDrawingArray());// No element returned
}
}
}
Do you have any idea to get the drawings from the XWPFDocument?
Updated: The XML content of XWPFRun. I tried to extract the word file. There is no image in the /word/* directory:
<xml-fragment >
<mc:AlternateContent>
<mc:Choice Requires="wps">
<w:drawing>
<wp:anchor>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
<wps:wsp>
<wps:txbx>
<w:txbxContent>
<w:p w14:paraId="2744738E" w14:textId="0811E43C" w:rsidR="00832A19" w:rsidRDefault="00832A19" w:rsidP="00832A19">
<w:r>
<w:t>Some text here</w:t>
</w:r>
</w:p>
</w:txbxContent>
</wps:txbx>
</wps:wsp>
</a:graphicData>
</a:graphic>
</wp:anchor>
</w:drawing>
</mc:Choice>
<mc:Fallback>
<w:pict>
<v:rect w14:anchorId="684D682E" id="Rectangle 2" o:spid="_x0000_s1026" style="" fillcolor="#4f81bd [3204]" strokecolor="#243f60 [1604]" strokeweight="2pt">
<v:textbox>
<w:txbxContent>
<w:p w14:paraId="2744738E" w14:textId="0811E43C" w:rsidR="00832A19" w:rsidRDefault="00832A19" w:rsidP="00832A19">
<w:r>
<w:t>Some text here</w:t>
</w:r>
</w:p>
</w:txbxContent>
</v:textbox>
</v:rect>
</w:pict>
</mc:Fallback>
</mc:AlternateContent>
</xml-fragment>

The XML you provided shows that your Word document uses alternate content (mc:AlternateContent), which was introduced after Office Open XML was first published in 2007. Apache POI provides no methods to get that content, because it only provides methods for Office Open XML according to the ECMA-376 standard; the underlying ooxml-schemas were generated from that standard only.
So the drawing elements inside the AlternateContent elements can only be retrieved using XML (XPath) methods directly.
This could look like this:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import java.util.List;
import java.util.ArrayList;
public class WordGetAllDrawingsFromRuns {
private static List<CTDrawing> getAllDrawings(XWPFRun run) throws Exception {
CTR ctR = run.getCTR();
XmlCursor cursor = ctR.newCursor();
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:drawing");
List<CTDrawing> drawings = new ArrayList<CTDrawing>();
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
CTDrawing drawing = CTDrawing.Factory.parse(obj.newInputStream());
drawings.add(drawing);
}
return drawings;
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordDocument.docx"));
for (IBodyElement bodyElement : document.getBodyElements()) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph) bodyElement;
for(IRunElement runElement : paragraph.getIRuns()) {
if (runElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun) runElement;
List<CTDrawing> drawings = getAllDrawings(run);
System.out.println(drawings);
}
}
}
}
document.close();
}
}
The next problem is then how to get the contents out of the drawing elements, since <wps:wsp><wps:txbx> is also not part of Office Open XML according to ECMA-376, so the ooxml-schemas methods of CTDrawing cannot reach it either. If you need the text box contents from the drawing, this too is only possible using XML (XPath) methods directly.
This could then look like this:
private static CTTxbxContent getTextBoxContent(CTDrawing drawing) throws Exception {
XmlCursor cursor = drawing.newCursor();
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:txbxContent");
List<CTTxbxContent> txbxContents = new ArrayList<CTTxbxContent>();
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
CTTxbxContent txbxContent = CTTxbxContent.Factory.parse(obj.newInputStream());
txbxContents.add(txbxContent);
break;
}
CTTxbxContent txbxContent = null;
if (txbxContents.size() > 0) {
txbxContent = txbxContents.get(0);
}
return txbxContent;
}
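To illustrate why the XPath has to be chosen carefully, here is a minimal sketch that uses only the JDK's DOM and XPath classes rather than POI or XMLBeans. It assumes the run's XML is available as a String (for example from run.getCTR().xmlText(), which is an assumption about how you would obtain it), and the class name, method name and sample XML are made up for the illustration. Because the same text appears in both mc:Choice and mc:Fallback, the expression is restricted to the mc:Choice branch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextBoxTextExtractor {

    // Selects w:t elements inside w:txbxContent, but only below mc:Choice,
    // so the duplicate text in mc:Fallback is not returned a second time.
    private static final String TEXT_IN_CHOICE =
        "//*[local-name()='Choice']//*[local-name()='txbxContent']//*[local-name()='t']";

    // Hypothetical sample, reduced from the XML shown in the question.
    public static final String SAMPLE =
        "<w:r xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'"
        + " xmlns:mc='http://schemas.openxmlformats.org/markup-compatibility/2006'"
        + " xmlns:wps='http://schemas.microsoft.com/office/word/2010/wordprocessingShape'>"
        + "<mc:AlternateContent>"
        + "<mc:Choice Requires='wps'><w:drawing><wps:wsp><wps:txbx><w:txbxContent>"
        + "<w:p><w:r><w:t>Some text here</w:t></w:r></w:p>"
        + "</w:txbxContent></wps:txbx></wps:wsp></w:drawing></mc:Choice>"
        + "<mc:Fallback><w:pict><w:txbxContent>"
        + "<w:p><w:r><w:t>Some text here</w:t></w:r></w:p>"
        + "</w:txbxContent></w:pict></mc:Fallback>"
        + "</mc:AlternateContent></w:r>";

    public static List<String> textBoxTexts(String runXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace-aware parsing, so local-name() yields the unprefixed element name
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(runXml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(TEXT_IN_CHOICE, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textBoxTexts(SAMPLE));
    }
}
```

Matching on local-name() avoids having to register a namespace context, at the cost of ignoring which namespace an element actually belongs to.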

Related

How to read < as < from an XML? [duplicate]

I am new to XML. I want to read the following XML on the basis of request name. Please help me on how to read the below XML in Java -
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>
If your XML is a String, then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
document.getDocumentElement() returns the node that is the document element of the document (in your case <config>).
Once you have the rootElement, you can access the element's attributes (by calling rootElement.getAttribute(name)), its child elements, and so on; see the org.w3c.dom.Element documentation for more methods.
Bear in mind that this example builds an XML DOM tree in memory, so if you have a huge amount of XML data, the tree can be huge.
Update: here's an example that gets the text value of the <requestqueue> element:
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can effectively call it as,
String requestQueueName = getString("requestqueue", element);
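Putting the DOM pieces above together for the XML in the question, a lookup "on the basis of request name" might look like the sketch below. The class and method names are made up for the illustration; it simply walks the <Request> elements and compares their name attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RequestConfigDom {

    public static final String SAMPLE_XML =
        "<?xml version=\"1.0\"?>"
        + "<config>"
        + "<Request name=\"ValidateEmailRequest\">"
        + "<requestqueue>emailrequest</requestqueue>"
        + "<responsequeue>emailresponse</responsequeue>"
        + "</Request>"
        + "<Request name=\"CleanEmail\">"
        + "<requestqueue>Cleanrequest</requestqueue>"
        + "<responsequeue>Cleanresponse</responsequeue>"
        + "</Request>"
        + "</config>";

    // Returns the text of <childTag> inside the <Request> whose name attribute
    // equals requestName, or null if there is no such request or child.
    public static String lookup(String xml, String requestName, String childTag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        NodeList requests = document.getDocumentElement().getElementsByTagName("Request");
        for (int i = 0; i < requests.getLength(); i++) {
            Element request = (Element) requests.item(i);
            if (requestName.equals(request.getAttribute("name"))) {
                NodeList children = request.getElementsByTagName(childTag);
                return children.getLength() > 0 ? children.item(0).getTextContent() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup(SAMPLE_XML, "CleanEmail", "requestqueue"));
        System.out.println(lookup(SAMPLE_XML, "ValidateEmailRequest", "responsequeue"));
    }
}
```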
In case you just need one (first) value to retrieve from xml:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
In case you want to parse whole xml document use JSoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}
If you are just looking to get a single value from the XML you may want to use Java's XPath library. For an example see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// Selects the name attribute of the first <Request> under the <config> root
Node node = (Node) xPath.evaluate("/config/Request/@name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
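For the XML in the question, the XPath approach can also select a request by its name attribute directly. The following is a sketch under that assumption; the class and method names are made up, and the request name is bound through an XPath variable ($name) instead of being concatenated into the expression.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class RequestConfigXPath {

    public static final String SAMPLE_XML =
        "<config>"
        + "<Request name=\"ValidateEmailRequest\">"
        + "<requestqueue>emailrequest</requestqueue>"
        + "<responsequeue>emailresponse</responsequeue>"
        + "</Request>"
        + "<Request name=\"CleanEmail\">"
        + "<requestqueue>Cleanrequest</requestqueue>"
        + "<responsequeue>Cleanresponse</responsequeue>"
        + "</Request>"
        + "</config>";

    // Returns the text of <requestqueue> for the named request,
    // or an empty string when nothing matches.
    public static String requestQueueFor(String xml, String requestName) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Resolve the $name variable used in the expression below
        xPath.setXPathVariableResolver(
            variable -> "name".equals(variable.getLocalPart()) ? requestName : null);
        String expression = "/config/Request[@name=$name]/requestqueue";
        return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requestQueueFor(SAMPLE_XML, "CleanEmail"));
    }
}
```

The two-argument evaluate() returns the string value of the first matching node, which is convenient when only a single value is needed.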
There are a number of different ways to do this. You might want to check out XStream or JAXB; both have tutorials and examples.
If the XML is well formed, you can parse it into a Document and then use XPath to get the XML elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
From the XML string, create a Document and find the elements using XPath expressions.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get value in between two strings.
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
Output:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>
following links might help
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152
There are two general ways of doing that. You can either build a Document Object Model (DOM) of the XML file,
or use event-driven parsing (SAX), which is the alternative to the DOM representation. Of course there is much more to know about processing XML; for instance, if you are given an XML schema definition (XSD), you could use JAXB.
There are various APIs available to read and write XML files in Java.
I would prefer using StAX.
This can also be useful: Java XML APIs
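As a rough idea of what the StAX route looks like for the XML in the question, here is a sketch using the JDK's javax.xml.stream cursor API. The class and method names are made up; it collects each request's name and its <requestqueue> text into a map while streaming through the document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class RequestConfigStax {

    public static final String SAMPLE_XML =
        "<config>"
        + "<Request name=\"ValidateEmailRequest\">"
        + "<requestqueue>emailrequest</requestqueue>"
        + "<responsequeue>emailresponse</responsequeue>"
        + "</Request>"
        + "<Request name=\"CleanEmail\">"
        + "<requestqueue>Cleanrequest</requestqueue>"
        + "<responsequeue>Cleanresponse</responsequeue>"
        + "</Request>"
        + "</config>";

    // Maps each request name to the text of its <requestqueue> element.
    public static Map<String, String> requestQueues(String xml) throws Exception {
        Map<String, String> queues = new LinkedHashMap<>();
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        String currentRequest = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if ("Request".equals(reader.getLocalName())) {
                    // Remember which request the following child elements belong to
                    currentRequest = reader.getAttributeValue(null, "name");
                } else if ("requestqueue".equals(reader.getLocalName()) && currentRequest != null) {
                    queues.put(currentRequest, reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return queues;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requestQueues(SAMPLE_XML));
    }
}
```

Unlike DOM, nothing is kept in memory beyond what the loop itself stores, which matters mostly for large files.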
You can write a class that extends org.xml.sax.helpers.DefaultHandler and dispatches each tag to methods named
start_<tag_name>(Attributes attrs);
and
end_<tag_name>();
For your XML that would be, for example:
start_requestqueue(attrs);
etc.
Then extend that class to implement the XML configuration file parsers you want. Example:
...
public void startElement(String uri, String name, String qname,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException {
// Parameter types of the handler methods: (String uri, Attributes attrs)
Class[] args = new Class[2];
args[0] = uri.getClass();
args[1] = org.xml.sax.Attributes.class;
try {
// Map the tag to a handler method such as start_Request.
// Note: name is the local name, which may be empty unless the parser is namespace-aware.
String mname = name.replace("-", "");
java.lang.reflect.Method m =
getClass().getDeclaredMethod("start_" + mname, args);
m.invoke(this, new Object[] { uri, (org.xml.sax.Attributes)attrs });
}
catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
catch (NoSuchMethodException e) {
// No handler method is declared for this tag
throw new RuntimeException(e);
}
catch (java.lang.reflect.InvocationTargetException e) {
// Rethrow whatever the handler method threw as a SAXException
org.xml.sax.SAXException se =
new org.xml.sax.SAXException(e.getTargetException().getMessage());
se.setStackTrace(e.getTargetException().getStackTrace());
throw se;
}
}
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name=" + attrs.getValue(0));
}
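For comparison, a DefaultHandler for the XML in the question can also be written without reflection. The sketch below is not the dispatcher described above but a plain handler with made-up class and method names; it uses qName because the default SAXParserFactory is not namespace-aware.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RequestConfigSax extends DefaultHandler {

    public static final String SAMPLE_XML =
        "<config>"
        + "<Request name=\"ValidateEmailRequest\">"
        + "<requestqueue>emailrequest</requestqueue>"
        + "<responsequeue>emailresponse</responsequeue>"
        + "</Request>"
        + "<Request name=\"CleanEmail\">"
        + "<requestqueue>Cleanrequest</requestqueue>"
        + "<responsequeue>Cleanresponse</responsequeue>"
        + "</Request>"
        + "</config>";

    // request name -> (child tag -> text)
    private final Map<String, Map<String, String>> requests = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentRequest;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the new element
        if ("Request".equals(qName)) {
            currentRequest = attrs.getValue("name");
            requests.put(currentRequest, new LinkedHashMap<>());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Request".equals(qName)) {
            currentRequest = null;
        } else if (currentRequest != null) {
            // A child of the current <Request>, e.g. requestqueue or responsequeue
            requests.get(currentRequest).put(qName, text.toString().trim());
        }
    }

    public static Map<String, Map<String, String>> parse(String xml) throws Exception {
        RequestConfigSax handler = new RequestConfigSax();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.requests;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse(SAMPLE_XML));
    }
}
```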
Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to

Apache FOP - is there a way to embed font programmatically?

When creating a PDF using Apache FOP it is possible to embed a font with configuration file. The problem emerges when the app is a web application and it is necessary to embed a font that is inside WAR file (so treated as resource).
It is not acceptable to use particular container's folder structure to determine where exactly the war is located (when in configuration xml file we set tag to ./, it is set to the base folder of running container like C:\Tomcat\bin).
So the question is: does anyone know a way to embed a font programmatically?
After going through lots of FOP java code I managed to get it to work.
Descriptive version
The main idea is to force FOP to use a custom PDFRendererConfigurator that returns the desired font list when getCustomFontCollection() is executed.
To do that, we need to create a custom PDFDocumentHandlerMaker that returns a custom PDFDocumentHandler (from the makeIFDocumentHandler() method), which in turn returns our custom PDFRendererConfigurator (from the getConfigurator() method) that, as above, sets our custom font list.
Then just add the custom PDFDocumentHandlerMaker to the RendererFactory and it will work.
FopFactory > RendererFactory > PDFDocumentHandlerMaker > PDFDocumentHandler > PDFRendererConfigurator
Full code
FopTest.java
public class FopTest {
public static void main(String[] args) throws Exception {
// the XSL FO file
StreamSource xsltFile = new StreamSource(
Thread.currentThread().getContextClassLoader().getResourceAsStream("template.xsl"));
// the XML file which provides the input
StreamSource xmlSource = new StreamSource(
Thread.currentThread().getContextClassLoader().getResourceAsStream("employees.xml"));
// create an instance of fop factory
FopFactory fopFactory = new FopFactoryBuilder(new File(".").toURI()).build();
RendererFactory rendererFactory = fopFactory.getRendererFactory();
rendererFactory.addDocumentHandlerMaker(new CustomPDFDocumentHandlerMaker());
// a user agent is needed for transformation
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
// Setup output
OutputStream out;
out = new java.io.FileOutputStream("employee.pdf");
try {
// Construct fop with desired output format
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, out);
// Setup XSLT
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xsltFile);
// Resulting SAX events (the generated FO) must be piped through to
// FOP
Result res = new SAXResult(fop.getDefaultHandler());
// Start XSLT transformation and FOP processing
// That's where the XML is first transformed to XSL-FO and then
// PDF is created
transformer.transform(xmlSource, res);
} finally {
out.close();
}
}
}
CustomPDFDocumentHandlerMaker.java
public class CustomPDFDocumentHandlerMaker extends PDFDocumentHandlerMaker {
@Override
public IFDocumentHandler makeIFDocumentHandler(IFContext ifContext) {
CustomPDFDocumentHandler handler = new CustomPDFDocumentHandler(ifContext);
FOUserAgent ua = ifContext.getUserAgent();
if (ua.isAccessibilityEnabled()) {
ua.setStructureTreeEventHandler(handler.getStructureTreeEventHandler());
}
return handler;
}
}
CustomPDFDocumentHandler.java
public class CustomPDFDocumentHandler extends PDFDocumentHandler {
public CustomPDFDocumentHandler(IFContext context) {
super(context);
}
@Override
public IFDocumentHandlerConfigurator getConfigurator() {
return new CustomPDFRendererConfigurator(getUserAgent(), new PDFRendererConfigParser());
}
}
CustomPDFRendererConfigurator.java
public class CustomPDFRendererConfigurator extends PDFRendererConfigurator {
public CustomPDFRendererConfigurator(FOUserAgent userAgent, RendererConfigParser rendererConfigParser) {
super(userAgent, rendererConfigParser);
}
@Override
protected FontCollection getCustomFontCollection(InternalResourceResolver resolver, String mimeType)
throws FOPException {
List<EmbedFontInfo> fontList = new ArrayList<EmbedFontInfo>();
try {
FontUris fontUris = new FontUris(Thread.currentThread().getContextClassLoader().getResource("UbuntuMono-Bold.ttf").toURI(), null);
List<FontTriplet> triplets = new ArrayList<FontTriplet>();
triplets.add(new FontTriplet("UbuntuMono", Font.STYLE_NORMAL, Font.WEIGHT_NORMAL));
EmbedFontInfo fontInfo = new EmbedFontInfo(fontUris, false, false, triplets, null, EncodingMode.AUTO, EmbeddingMode.AUTO);
fontList.add(fontInfo);
} catch (Exception e) {
e.printStackTrace();
}
return createCollectionFromFontList(resolver, fontList);
}
}
Yes you can do this. You need to set FOP's first base directory programmatically.
fopFactory = FopFactory.newInstance();
// for image base URL : images from Resource path of project
String serverPath = request.getSession().getServletContext().getRealPath("/");
fopFactory.setBaseURL(serverPath);
// for fonts base URL : .ttf from Resource path of project
fopFactory.getFontManager().setFontBaseURL(serverPath);
Then use the FOP font config file. It will use the base path set above.
Just put your font files in the web application's resource folder and refer to that path in FOP's font config file.
After comment: reading the font config programmatically (not the preferred or clean way, but as requested)
// This is UNTESTED PSEUDOCODE, only meant to convey the logic
FontUris fontUris = new FontUris(new URI("<font.ttf relative path>"), null);
EmbedFontInfo fontInfo = new EmbedFontInfo(fontUris, "is kerning enabled boolean", "is advanced enabled boolean", null, "subFontName");
List<EmbedFontInfo> fontInfoList = new ArrayList<>();
fontInfoList.add(fontInfo);
//set base URL for Font Manager to use relative path of ttf file.
fopFactory.getFontManager().updateReferencedFonts(fontInfoList);
You can get more info for FOP's relative path https://xmlgraphics.apache.org/fop/2.2/configuration.html
The following approach may be useful for those who use PDFTranscoder.
Put the following xml template in the resources:
<?xml version="1.0" encoding="UTF-8"?>
<fop version="1.0">
<fonts>
<font kerning="no" embed-url="IBM_PLEX_MONO_PATH" embedding-mode="subset">
<font-triplet name="IBM Plex Mono" style="normal" weight="normal"/>
</font>
</fonts>
</fop>
Then one can load this xml and replace the line with font (IBM_PLEX_MONO_PATH) with the actual URI of the font from the resource bundle at runtime:
private val fopConfig = DefaultConfigurationBuilder()
.buildFromFile(javaClass.getResourceAsStream("/fonts/fopconf.xml")?.use {
val xml = BufferedReader(InputStreamReader(it)).use { bf ->
bf.readLines()
.joinToString("")
.replace(
"IBM_PLEX_MONO_PATH",
javaClass.getResource("/fonts/IBM_Plex_Mono/IBMPlexMono-Text.ttf")!!.toURI().toString()
)
}
val file = Files.createTempFile("fopconf", "xml")
file.writeText(xml)
file.toFile()
})
Now one can use this config with PDFTranscoder, and the custom fonts should be rendered and embedded in the PDF:
val pdfTranscoder = if (type == PDF) PDFTranscoder() else EPSTranscoder()
ContainerUtil.configure(pdfTranscoder, fopConfig)
val input = TranscoderInput(ByteArrayInputStream(svg.toByteArray()))
ByteArrayOutputStream().use { byteArrayOutputStream ->
val output = TranscoderOutput(byteArrayOutputStream)
pdfTranscoder.transcode(input, output)
byteArrayOutputStream.toByteArray()
}

Changing Default text of a Plain Text Content Control of a existing .docx file

I am given a .docx template that I need to populate in my Java application. Initially I planned to use Apache POI, since I was previously tasked with filling in an .xlsx template and it worked well. But based on my research, docx4j is more suitable for my case.
My case is that this .docx template uses Plain Text Content Controls.
Now, upon inspection to its XML structure, I see the <w:sdt> directly under <w:p> directly under the <w:body> tag.
<w:body>
...
<w:p w:rsidR="00ED05E8" w:rsidRPr="00DA4BE7" w:rsidRDefault="00AC5B37" w:rsidP="00BA6F7F">
...
<w:sdt>
<w:sdtPr>
<w:rPr>
<w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
<w:i/>
<w:sz w:val="24"/>
<w:szCs w:val="24"/>
<w:u w:val="single"/>
</w:rPr>
<w:alias w:val="Name of Office/Agency Name"/>
<w:tag w:val="Name of Office/Agency Name"/>
<w:id w:val="-781645881"/>
<w:placeholder>
<w:docPart w:val="DefaultPlaceholder_-1854013440"/>
</w:placeholder>
<w:text/>
</w:sdtPr>
<w:sdtEndPr/>
<w:sdtContent>
<w:r w:rsidR="00340180" w:rsidRPr="00616BA5">
<w:rPr>
<w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
<w:i/>
<w:sz w:val="24"/>
<w:szCs w:val="24"/>
<w:u w:val="single"/>
</w:rPr>
<w:t>(Name of Office/Agency Name)</w:t>
</w:r>
</w:sdtContent>
</w:sdt>
...
</w:body>
I want to change the text in the <w:t> of that <w:sdt> from "(Name of Office/Agency Name)" to a different String. The problem is that I do not know how, and I am stuck after these lines:
WordprocessingMLPackage document = WordprocessingMLPackage.load(new java.io.File(...));
MainDocumentPart mainDocument = document.getMainDocumentPart();
I have this w:id of -781645881, but I don't know what to do with this information. Is this even the itemId referred on this ContentControlsXmlEdit sample class from the docx4j site?
I cannot fetch that <w:sdt> node even after using the following code:
String itemId = "-781645881".toLowerCase();
CustomXmlDataStoragePart customXmlDataStoragePart = (CustomXmlDataStoragePart)wordMLPackage.getCustomXmlDataStorageParts().get(itemId);
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart.getData();
What should I do to be able to change the value of the plain text content control?
This answer is something I devised out of desperation, for the following reasons:
I have not yet mastered accessing things in Word's .xml programmatically using docx4j.
I extracted the exact .xml file of the .docx file I'm processing.
There is no storeItemID in the .xml of my .docx file.
Here is my utility class, written in Groovy:
import javax.xml.bind.JAXBElement
import org.apache.poi.openxml4j.exceptions.InvalidFormatException
import org.docx4j.openpackaging.packages.WordprocessingMLPackage
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
import org.docx4j.wml.CTBookmark
import org.docx4j.wml.P
import org.docx4j.wml.R
import org.docx4j.wml.SdtBlock
import org.docx4j.wml.SdtContent
import org.docx4j.wml.SdtRun
import org.docx4j.wml.Text
class WordReport {
private WordprocessingMLPackage document
private Map<String, String> contentControlMapping
private Map<String, Object> reportArgs
public WordReport(Map<String, Object> reportArgs) {
document = WordprocessingMLPackage.createPackage()
this.reportArgs = reportArgs
}
public WordprocessingMLPackage exportReport() {
return document
}
private String getNewMapping(String contentControlText) {
return contentControlMapping.get(contentControlText)
}
private boolean isMapped(String contentControlText) {
return contentControlMapping.containsKey(contentControlText)
}
protected void mapNewMapping() {
MainDocumentPart mainDocument = document.getMainDocumentPart()
List<Object> nodes = mainDocument.getJAXBNodesViaXPath("//w:sdt", false)
String key
SdtContent content
nodes.each { n ->
if(n instanceof SdtBlock) {
content = n.getSdtContent()
}
else if(n instanceof JAXBElement) {
if(n.getValue() instanceof SdtRun) {
content = n.getValue().getSdtContent()
}
}
content.getContent().each { sdtcc ->
if(sdtcc instanceof P) {
sdtcc.getContent().each { pc ->
pc.getContent().each { rc ->
println "rc.getValue().getClass(): " + rc.getValue().getClass()
if(rc.getValue() instanceof Text) {
key = rc.getValue().getValue()
isMapped(key) ? rc.getValue().setValue(getNewMapping(key)) : null
}
else if(rc.getValue() instanceof R) {
rc.getValue().getContent().each { rrc ->
if(rrc instanceof JAXBElement) {
key = rrc.getValue().getValue()
isMapped(key) ? rrc.getValue().setValue(getNewMapping(key)) : null
}
}
}
}
}
}
else if(sdtcc instanceof R) {
sdtcc.getContent().each { rc ->
if(rc instanceof JAXBElement) {
key = rc.getValue().getValue()
isMapped(key) ? rc.getValue().setValue(getNewMapping(key)) : null
}
}
}
else if(sdtcc instanceof JAXBElement) {
if(sdtcc.getValue() instanceof CTBookmark) {
}
else if(sdtcc.getValue() instanceof JAXBElement) {
key = sdtcc.getValue().getValue()
isMapped(key) ? sdtcc.getValue().setValue(getNewMapping(key)) : null
}
}
}
}
}
public void setMapping(Map contentControlMapping) {
this.contentControlMapping = contentControlMapping
}
}
The core part of this class is the mapNewMapping() method. Basically, it applies the mapping in the contentControlMapping variable to any <w:t> inside the <w:sdt> elements, regardless of whether it sits directly under the <w:sdt> content or is nested inside a <w:p> or <w:r>. I retrieve the list of all <w:sdt> elements using the getJAXBNodesViaXPath() method.
The limitation is that this supports only a limited set of combinations of P, R, CTBookmark, SdtBlock, SdtContent and SdtRun. If the <w:t> is found inside complex or deeply nested .xml that I have not anticipated, it will not be mapped. That is why I mentioned that I first read the .xml of the .docx file.
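One way around the nesting limitation is to search for every <w:t> descendant of a content control, whatever lies in between. The sketch below shows that idea with the plain JDK DOM API working on the document XML as a string; it is a different technique from the docx4j class above, not a drop-in replacement, and it does not cover reading or writing the .docx package. The class name, method names and the replacement text are made up, and content controls are matched by their <w:tag> value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SdtTextReplacer {

    public static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample, reduced from the XML shown in the question.
    public static final String SAMPLE =
        "<w:body xmlns:w='" + W + "'><w:p><w:sdt>"
        + "<w:sdtPr><w:tag w:val='Name of Office/Agency Name'/><w:text/></w:sdtPr>"
        + "<w:sdtContent><w:r><w:t>(Name of Office/Agency Name)</w:t></w:r></w:sdtContent>"
        + "</w:sdt></w:p></w:body>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Sets the text of every content control whose w:tag value equals tag.
    // Returns the number of content controls that were changed.
    public static int replace(Document doc, String tag, String newText) {
        int replaced = 0;
        NodeList sdts = doc.getElementsByTagNameNS(W, "sdt");
        for (int i = 0; i < sdts.getLength(); i++) {
            Element sdt = (Element) sdts.item(i);
            // Assumes the first w:tag descendant is this control's own tag
            NodeList tags = sdt.getElementsByTagNameNS(W, "tag");
            if (tags.getLength() == 0
                    || !tag.equals(((Element) tags.item(0)).getAttributeNS(W, "val"))) {
                continue;
            }
            // All w:t descendants, at any depth: the first gets the new text,
            // the rest are emptied so the old text does not linger in later runs.
            NodeList texts = sdt.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                texts.item(j).setTextContent(j == 0 ? newText : "");
            }
            if (texts.getLength() > 0) {
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(replace(doc, "Name of Office/Agency Name", "Department of Examples"));
    }
}
```

Nested content controls would need extra care, since the descendant search of an outer control also reaches the text of inner ones.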

Multi-page PDF generation from SVG with Java and Apache Batik

I have two simple SVG documents that I want to convert to a PDF such that each document is on one page in the PDF.
My first SVG document has two rectangles, and the second one is a black circle.
The code looks as follows:
import java.io.*;
import org.apache.batik.anim.dom.*;
import org.apache.batik.transcoder.*;
import org.w3c.dom.*;
public class MultiPagePdf {
public static void main(String[] args) {
MultiPagePDFTranscoder transcoder = new MultiPagePDFTranscoder();
try {
final DOMImplementation impl = SVGDOMImplementation.getDOMImplementation();
SVGOMDocument doc1 = (SVGOMDocument) impl.createDocument(SVGDOMImplementation.SVG_NAMESPACE_URI, "svg", null);
// 1st rectangle in doc1
Element el = doc1.createElementNS(SVGDOMImplementation.SVG_NAMESPACE_URI, "rect");
el.setAttributeNS(null, "width", "60");
el.setAttributeNS(null, "height", "60");
el.setAttributeNS(null, "fill", "none");
el.setAttributeNS(null, "stroke", "blue");
SVGOMSVGElement docEl = (SVGOMSVGElement) doc1.getDocumentElement();
docEl.appendChild(el);
// 2nd rectangle in doc1
Element ell = doc1.createElementNS(SVGDOMImplementation.SVG_NAMESPACE_URI, "rect");
ell.setAttributeNS(null, "x", "50");
ell.setAttributeNS(null, "y", "50");
ell.setAttributeNS(null, "width", "25");
ell.setAttributeNS(null, "height", "25");
ell.setAttributeNS(null, "fill", "green");
docEl.appendChild(ell);
final DOMImplementation impl2 = SVGDOMImplementation.getDOMImplementation();
SVGOMDocument doc2 = (SVGOMDocument) impl2.createDocument(SVGDOMImplementation.SVG_NAMESPACE_URI, "svg", null);
// circle in doc2
Element el2 = doc2.createElementNS(SVGDOMImplementation.SVG_NAMESPACE_URI, "circle");
el2.setAttributeNS(null, "cx", "130");
el2.setAttributeNS(null, "cy", "100");
el2.setAttributeNS(null, "r", "50");
SVGOMSVGElement docEl2 = (SVGOMSVGElement) doc2.getDocumentElement();
docEl2.appendChild(el2);
OutputStream outputStream = new FileOutputStream(new File("/C:/Users/ah/Documents/simpleMulti.pdf"));
TranscoderOutput transcoderOutput = new TranscoderOutput(outputStream);
Document[] doccs = { doc1, doc2 };
transcoder.transcode(doccs, null, transcoderOutput); // generate PDF doc
} catch (Exception e) {
e.printStackTrace();
}
}
}
I've printed both SVG documents and they look as they should:
1st SVG Document:
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" contentScriptType="text/ecmascript" zoomAndPan="magnify" contentStyleType="text/css" preserveAspectRatio="xMidYMid meet" version="1.0">
<rect fill="none" width="60" height="60" stroke="blue"/>
<rect fill="green" x="50" width="25" height="25" y="50"/>
</svg>
2nd SVG Document:
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" contentScriptType="text/ecmascript" zoomAndPan="magnify" contentStyleType="text/css" preserveAspectRatio="xMidYMid meet" version="1.0">
<circle r="50" cx="130" cy="100"/>
</svg>
I found code that should do what I'm after using Apache FOP; I have the following code to generate the PDF:
import java.io.*;
import java.util.*;
import org.apache.batik.transcoder.*;
import org.apache.fop.*;
import org.apache.fop.svg.*;
import org.w3c.dom.*;
public class MultiPagePDFTranscoder extends AbstractFOPTranscoder {
protected PDFDocumentGraphics2D graphics = null;
protected Map<String, Object> params = null;
public MultiPagePDFTranscoder() {
super();
}
protected void transcode(Document[] documents, String uri, TranscoderOutput output) throws TranscoderException {
graphics = new PDFDocumentGraphics2D(isTextStroked());
graphics.getPDFDocument().getInfo().setProducer("Apache FOP Version " + Version.getVersion() + ": PDF Transcoder for Batik");
try {
OutputStream out = output.getOutputStream();
if (!(out instanceof BufferedOutputStream)) {
out = new BufferedOutputStream(out);
}
for (int i = 0; i < documents.length; i++) {
Document document = documents[i];
super.transcode(document, uri, null);
int tmpWidth = 300;
int tmpHeight = 300;
if (i == 0) {
graphics.setupDocument(out, tmpWidth, tmpHeight);
} else {
graphics.nextPage(tmpWidth, tmpHeight);
}
graphics.setGraphicContext(new org.apache.xmlgraphics.java2d.GraphicContext());
graphics.transform(curTxf);
this.root.paint(graphics);
}
graphics.finish();
} catch (IOException ex) {
throw new TranscoderException(ex);
}
}
}
The PDF file is generated, but I have two problems with it.
The translation and scale of the SVG elements are not correct.
On the second page, the first document is present (I've tried with multiple pages, and all the previous documents are present on the current page).
I use Apache FOP 2.1 with Apache Batik 1.8.
Any help with either problem would be highly appreciated.
I'm also open to other solutions to my overall task (converting SVGs to multi-paged PDF).
I had a similar problem. I used the SVGConverter app in Batik (org.apache.batik.apps.rasterizer.SVGConverter) to convert each SVG to a single-page PDF, then joined the resulting PDFs into one file using PDFBox (org.apache.pdfbox.multipdf.PDFMergerUtility).
Convert SVG to single page PDF(s):
File outputFile = new File(pdfPath);
SVGConverter converter = new SVGConverter();
converter.setDestinationType(DestinationType.PDF);
converter.setSources(new String[] { svgPath });
converter.setDst(outputFile);
converter.execute();
Join the PDFs together:
File pdffile = new File(multipagepdfPath);
PDFMergerUtility pdf = new PDFMergerUtility();
pdf.setDestinationFileName(pdffile.getPath());
pdf.addSource(page1pdffile);
pdf.addSource(page2pdffile);
pdf.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
// The memory settings depend on whether you want to use RAM or temp files.
If you find a better solution please let me know.
The main problem I had was that the converter app is the only one in Batik that converts to PDF and keeps the sizes correct.
rsvg-convert can turn multiple svg-documents into a single PDF.
rsvg-convert -f pdf -o out.pdf file1.svg file2.svg file3.svg
or for all files in a folder:
rsvg-convert -f pdf -o out.pdf *.svg
Don't know how to do this with Java though. Just a thought...
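One way to drive this from Java is to launch rsvg-convert as an external process. The following is only a sketch under assumptions: the class name RsvgConvert and its methods are made up for illustration, and it assumes rsvg-convert is installed and on the PATH. The flags are the ones from the command line above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any library) that shells out to rsvg-convert.
final class RsvgConvert {

    // Builds: rsvg-convert -f pdf -o <outputPdf> <svg1> <svg2> ...
    static List<String> buildCommand(String outputPdf, List<String> svgFiles) {
        List<String> command = new ArrayList<>();
        command.add("rsvg-convert");
        command.add("-f");
        command.add("pdf");
        command.add("-o");
        command.add(outputPdf);
        command.addAll(svgFiles);
        return command;
    }

    // Runs the command and returns its exit code.
    // Assumes rsvg-convert is available on the PATH.
    static int run(String outputPdf, List<String> svgFiles)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(outputPdf, svgFiles))
                .inheritIO() // forward the tool's output and errors to this process
                .start();
        return process.waitFor();
    }
}
```

Note that ProcessBuilder does not go through a shell, so a pattern like *.svg is not expanded. The SVG files have to be listed explicitly (for example by listing the directory in Java first).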

Using Flying Saucer to Render Images to PDF In Memory

I'm using Flying Saucer to convert XHTML to a PDF document. I've gotten the code to work with just basic HTML and in-line CSS, however, now I'm attempting to add an image as a sort of header to the PDF. What I'm wondering is if there is any way whatsoever to add the image by reading in an image file as a Java Image object, then adding that somehow to the PDF (or to the XHTML -- like it gets a virtual "url" representing the Image object that I can use to render the PDF). Has anyone ever done anything like this?
Thanks in advance for any help you can provide!
I had to do that last week so hopefully I will be able to answer you right away.
Flying Saucer
The easiest way is to add the image you want as markup in your HTML template before rendering with Flying Saucer. Within Flying Saucer you will have to implement a ReplacedElementFactory so that you can replace any markup before rendering with the image data.
/**
* Replaced element in order to replace elements like
* <tt><div class="media" data-src="image.png" /></tt> with the real
* media content.
*/
public class MediaReplacedElementFactory implements ReplacedElementFactory {
private final ReplacedElementFactory superFactory;
public MediaReplacedElementFactory(ReplacedElementFactory superFactory) {
this.superFactory = superFactory;
}
@Override
public ReplacedElement createReplacedElement(LayoutContext layoutContext, BlockBox blockBox, UserAgentCallback userAgentCallback, int cssWidth, int cssHeight) {
Element element = blockBox.getElement();
if (element == null) {
return null;
}
String nodeName = element.getNodeName();
String className = element.getAttribute("class");
// Replace any <div class="media" data-src="image.png" /> with the
// binary data of `image.png` into the PDF.
if ("div".equals(nodeName) && "media".equals(className)) {
if (!element.hasAttribute("data-src")) {
throw new RuntimeException("An element with class `media` is missing a `data-src` attribute indicating the media file.");
}
InputStream input = null;
try {
input = new FileInputStream("/base/folder/" + element.getAttribute("data-src"));
final byte[] bytes = IOUtils.toByteArray(input);
final Image image = Image.getInstance(bytes);
final FSImage fsImage = new ITextFSImage(image);
if (fsImage != null) {
if ((cssWidth != -1) || (cssHeight != -1)) {
fsImage.scale(cssWidth, cssHeight);
}
return new ITextImageElement(fsImage);
}
} catch (Exception e) {
throw new RuntimeException("There was a problem trying to read a template embedded graphic.", e);
} finally {
IOUtils.closeQuietly(input);
}
}
return this.superFactory.createReplacedElement(layoutContext, blockBox, userAgentCallback, cssWidth, cssHeight);
}
@Override
public void reset() {
this.superFactory.reset();
}
@Override
public void remove(Element e) {
this.superFactory.remove(e);
}
@Override
public void setFormSubmissionListener(FormSubmissionListener listener) {
this.superFactory.setFormSubmissionListener(listener);
}
}
You will notice that I have hardcoded /base/folder here. It is the folder where the HTML file is located, and it serves as the root URL that Flying Saucer uses for resolving media. You can change it to the correct location and load it from wherever you want (Properties, for example).
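If the base folder should come from configuration rather than being hardcoded, something like the following could work. It is a minimal sketch under assumptions: the class MediaPaths and the system property name media.base.dir are invented for illustration and are not part of Flying Saucer. It also rejects data-src values that would point outside the base folder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for turning a data-src attribute into a file path.
final class MediaPaths {

    // Reads the base folder from a system property (the property name is an
    // assumption) and falls back to the hardcoded folder from the answer.
    static Path baseDirFromProperty() {
        return Paths.get(System.getProperty("media.base.dir", "/base/folder"));
    }

    // Resolves dataSrc against baseDir and refuses paths that escape it,
    // such as "../secret.png" or an absolute path.
    static Path resolve(Path baseDir, String dataSrc) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(dataSrc).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException(
                    "data-src escapes the media base folder: " + dataSrc);
        }
        return resolved;
    }
}
```

Inside the factory, the FileInputStream could then be opened on MediaPaths.resolve(MediaPaths.baseDirFromProperty(), element.getAttribute("data-src")).toFile() instead of on a concatenated string.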
HTML
Within your HTML markup you indicate somewhere a <div class="media" data-src="somefile.png" /> like so:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<style type="text/css">
#logo { /* something if needed */ }
</style>
</head>
<body>
<!-- Header -->
<div id="logo" class="media" data-src="media/logo.png" style="width: 177px; height: 60px" />
...
</body>
</html>
Rendering
Finally, you just need to register your ReplacedElementFactory with Flying Saucer when rendering:
String content = loadHtml();
ITextRenderer renderer = new ITextRenderer();
renderer.getSharedContext().setReplacedElementFactory(new MediaReplacedElementFactory(renderer.getSharedContext().getReplacedElementFactory()));
renderer.setDocumentFromString(content.toString());
renderer.layout();
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
renderer.createPDF(baos);
// baos.toByteArray();
I have been using Freemarker to generate the HTML from a template and then feeding the result to FlyingSaucer with great success. This is a pretty neat library.
What worked for me was embedding the image directly: convert the image to Base64 first and then embed it as a data URI:
byte[] image = ...
ITextRenderer renderer = new ITextRenderer();
renderer.setDocumentFromString("<html>\n" +
" <body>\n" +
" <h1>Image</h1>\n" +
" <div><img src=\"data:image/png;base64," + Base64.getEncoder().encodeToString(image) + "\"></img></div>\n" +
" </body>\n" +
"</html>");
renderer.layout();
renderer.createPDF(response.getOutputStream());
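The string concatenation above can be pulled into a small helper so that the data URI is built in one place. This is just a sketch: the class DataUri and its method names are made up for illustration, and it uses only java.util.Base64 from the JDK. Whether the data URI is actually rendered depends on the Flying Saucer version in use.

```java
import java.util.Base64;

// Hypothetical helper for embedding in-memory image bytes into XHTML.
final class DataUri {

    // Returns e.g. "data:image/png;base64,...." for the given bytes.
    static String of(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }

    // Returns a self-closed <img> element, which keeps the markup well-formed XHTML.
    static String imgTag(String mimeType, byte[] data) {
        return "<img src=\"" + of(mimeType, data) + "\" />";
    }
}
```

The result of DataUri.imgTag("image/png", image) can then be placed inside the document string passed to renderer.setDocumentFromString(...).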
Thanks, Alex, for the detailed solution. I'm using it and found that two more lines need to be added to make it work:
public ReplacedElement createReplacedElement(LayoutContext layoutContext, BlockBox blockBox, UserAgentCallback userAgentCallback, int cssWidth, int cssHeight) {
Element element = blockBox.getElement();
....
....
final Image image = Image.getInstance(bytes);
final int factor = ((ITextUserAgent)userAgentCallback).getSharedContext().getDotsPerPixel(); //Need to add this line
image.scaleAbsolute(image.getPlainWidth() * factor, image.getPlainHeight() * factor); //Need to add this line
final FSImage fsImage = new ITextFSImage(image);
....
....
We need to read the dots per pixel (DPP) value from the SharedContext and scale the image by it so that it is rendered at the right size in the PDF.
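To see what the scaling does numerically, here is a tiny worked example. It is a sketch only: the class DotsPerPixel is invented for illustration, and the value 20 is an assumed dots-per-pixel setting, so read the real value from the SharedContext as in the snippet above.

```java
// Hypothetical illustration of the arithmetic behind the two added lines.
final class DotsPerPixel {

    // Converts an image size in pixels to a size in layout dots.
    static float[] scale(float widthPx, float heightPx, int dotsPerPixel) {
        return new float[] { widthPx * dotsPerPixel, heightPx * dotsPerPixel };
    }

    public static void main(String[] args) {
        // Assumed factor of 20; a 177 x 60 pixel logo becomes 3540 x 1200 dots.
        float[] dots = scale(177f, 60f, 20);
        System.out.println(dots[0] + " x " + dots[1]);
    }
}
```

Without the multiplication, the unscaled pixel size would be interpreted in dots, so with a factor of 20 the image would come out at one twentieth of the intended size.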
Another suggestion:
We can directly extend ITextReplacedElementFactory instead of implementing ReplacedElementFactory. In that case we can set the ReplacedElementFactory in the SharedContext as follows:
renderer.getSharedContext().setReplacedElementFactory(new MediaReplacedElementFactory(renderer.getOutputDevice()));
