I have several documents and I want to combine them all into one docx file.
My code:
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
public class WordMerge {
private final OutputStream result;
private final List<InputStream> inputs;
private XWPFDocument first;
public WordMerge(OutputStream result) {
this.result = result;
inputs = new ArrayList<>();
}
public void add(InputStream stream) throws Exception{
inputs.add(stream);
OPCPackage srcPackage = OPCPackage.open(stream);
XWPFDocument src1Document = new XWPFDocument(srcPackage);
if(inputs.size() == 1){
first = src1Document;
} else {
CTBody srcBody = src1Document.getDocument().getBody();
first.getDocument().addNewBody().set(srcBody);
}
}
public void doMerge() throws Exception{
first.write(result);
}
public void close() throws Exception{
result.flush();
result.close();
for (InputStream input : inputs) {
input.close();
}
}
}
And this is how it is used:
public static void main(String[] args) throws Exception {
FileOutputStream faos = new FileOutputStream("/home/victor/result.docx");
WordMerge wm = new WordMerge(faos);
wm.add( new FileInputStream("/home/victor/001.docx") );
wm.add( new FileInputStream("/home/victor/002.docx") );
wm.doMerge();
wm.close();
}
It works, but unfortunately it gets a bit messy if any document other than the first contains lists. List bullets change to numbers and, worse, a list from the previous document is sometimes continued in the appended document. Say doc1 has an a, b, c list and the second has an unordered list; then the latter becomes d, e, f. (It follows the previous document's formatting.)
How can I make each merged document start on a new page and not follow the formatting of the previous document?
Your code only appends multiple CTBody elements to the document. But that is not how a Word document is structured. "It works" only because Microsoft Word is tolerant enough to interpret it. But it fails when it comes to references within the Word document structure.
For example, numbering definitions are referenced by IDs, and the same ID can stand for different definitions in different documents. So ID 1 in the first document might point to a decimal numbering while ID 1 in the second document might point to a bullet numbering. So the numIDs need to be merged and not only copied.
Embedded media (images, for example) are referenced by rIDs. So the CTBody only contains the IDs; the media itself is stored outside the document body. So if the document body refers to a picture having rID12 and this picture is not stored, then the document gets corrupted.
The same applies to many other document elements.
So that approach is not usable at all.
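The numbering clash can be illustrated without POI. The following is only a minimal sketch of the ID remapping idea, using plain JDK classes; NumIdRemapper and its methods are hypothetical names and not part of Apache POI. It assumes that one remapper is created per appended document and that the highest numID already used in the result document is known.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a POI class: maps numIDs of an appended document
// to fresh numIDs that are not yet used in the result document.
final class NumIdRemapper {
    private final Map<BigInteger, BigInteger> sourceToResult = new HashMap<>();
    private BigInteger nextFreeId;

    // highestIdInResult: assumed to be the largest numID already present in the result
    NumIdRemapper(BigInteger highestIdInResult) {
        this.nextFreeId = highestIdInResult.add(BigInteger.ONE);
    }

    // Returns the result numID for a source numID, allocating a new one on first use.
    BigInteger remap(BigInteger sourceNumId) {
        BigInteger mapped = sourceToResult.get(sourceNumId);
        if (mapped == null) {
            mapped = nextFreeId;
            nextFreeId = nextFreeId.add(BigInteger.ONE);
            sourceToResult.put(sourceNumId, mapped);
        }
        return mapped;
    }

    public static void main(String[] args) {
        // The result document already uses numIDs 1 and 2.
        NumIdRemapper remapper = new NumIdRemapper(BigInteger.valueOf(2));
        // numID 1 of the appended document must not reuse numID 1 of the result.
        System.out.println("source 1 -> result " + remapper.remap(BigInteger.ONE));
        // The same source numID always maps to the same result numID.
        System.out.println("source 1 -> result " + remapper.remap(BigInteger.ONE));
        System.out.println("source 2 -> result " + remapper.remap(BigInteger.valueOf(2)));
    }
}
```

In a real merge the abstract numbering definition behind each source numID has to be copied as well; the map only decides which ID the copied definition gets.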
What is needed is to traverse all body elements of the document that shall be appended, then append each body element found to the first document and update its references.
The following is a working draft to show the principle. It is not complete yet. For example, it does not fully handle hyperlinks, footnotes, comments, structured document tags and many more things. And as you see from the sheer amount of code needed, considering all possible things will be a very laborious task. To avoid even more code, I simply copy the underlying XML bean where possible. This also should be worked out better for production usage. But the principle should be clear from this code.
The code is commented where names of methods and variables are not self-explanatory.
The code is tested and works using the current Apache POI 5.1.0. Earlier versions are not tested and also should not be used, since they offer even less support for XWPF.
The code needs the full jar of all of the schemas as mentioned in the Apache POI FAQ.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.util.Units;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.math.BigInteger;
public class WordMerger {
private Map<BigInteger, BigInteger> numIDs = null; // to handle merging numID
public WordMerger() {
this.numIDs= new HashMap<BigInteger, BigInteger>();
}
private void traverseBodyElements(List<IBodyElement> bodyElements, IBody resultBody) throws Exception {
for (IBodyElement bodyElement : bodyElements) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
XWPFParagraph resultParagraph = createParagraphWithPPr(paragraph, resultBody);
traverseRunElements(paragraph.getIRuns(), resultParagraph);
} else if (bodyElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)bodyElement;
XWPFSDT resultSDT = createSDT(sDT, resultBody);
//ToDo: handle further issues ...
} else if (bodyElement instanceof XWPFTable) {
XWPFTable table = (XWPFTable)bodyElement;
XWPFTable resultTable = createTableWithTblPrAndTblGrid(table, resultBody);
traverseTableRows(table.getRows(), resultTable);
}
}
}
private XWPFSDT createSDT(XWPFSDT sDT, IBody resultBody) {
//not ready yet
//we simply add paragraphs to avoid corrupted documents
if (resultBody instanceof XWPFDocument) {
XWPFDocument resultDocument = (XWPFDocument)resultBody;
XWPFParagraph resultParagraph = resultDocument.createParagraph();
//ToDo: handle further issues ...
} else if (resultBody instanceof XWPFTableCell) {
XWPFTableCell resultTableCell = (XWPFTableCell)resultBody;
XWPFParagraph resultParagraph = resultTableCell.addParagraph();
//ToDo: handle further issues ...
} //ToDo: else others ...
//ToDo: handle SDT properly
return null;
}
private XWPFParagraph createParagraphWithPPr(XWPFParagraph paragraph, IBody resultBody) {
if (resultBody instanceof XWPFDocument) {
XWPFDocument resultDocument = (XWPFDocument)resultBody;
XWPFParagraph resultParagraph = resultDocument.createParagraph();
resultParagraph.getCTP().setPPr(paragraph.getCTP().getPPr());//simply copy the underlying XML bean to avoid more code
handleStyles(resultDocument, paragraph);
handleNumberings(paragraph, resultParagraph);
//ToDo: handle further issues ...
return resultParagraph;
} else if (resultBody instanceof XWPFTableCell) {
XWPFTableCell resultTableCell = (XWPFTableCell)resultBody;
XWPFParagraph resultParagraph = resultTableCell.addParagraph();
resultParagraph.getCTP().setPPr(paragraph.getCTP().getPPr());//simply copy the underlying XML bean to avoid more code
handleStyles(resultTableCell, paragraph);
//ToDo: handle further issues ...
return resultParagraph;
} //ToDo: else others ...
return null;
}
private void handleNumberings(XWPFParagraph paragraph, XWPFParagraph resultParagraph) {
//if we have numberings, we need merging the numIDs and abstract numberings of the two different documents
BigInteger numID = paragraph.getNumID();
if (numID == null) return;
BigInteger resultNumID = this.numIDs.get(numID);
if (resultNumID == null) {
XWPFDocument document = paragraph.getDocument();
XWPFNumbering numbering = document.createNumbering();
XWPFNum num = numbering.getNum(numID);
BigInteger abstractNumID = numbering.getAbstractNumID(numID);
XWPFAbstractNum abstractNum = numbering.getAbstractNum(abstractNumID);
XWPFAbstractNum resultAbstractNum = new XWPFAbstractNum((org.openxmlformats.schemas.wordprocessingml.x2006.main.CTAbstractNum)abstractNum.getCTAbstractNum().copy());
XWPFDocument resultDocument = resultParagraph.getDocument();
XWPFNumbering resultNumbering = resultDocument.createNumbering();
int pos = resultNumbering.getAbstractNums().size();
resultAbstractNum.getCTAbstractNum().setAbstractNumId(BigInteger.valueOf(pos));
BigInteger resultAbstractNumID = resultNumbering.addAbstractNum(resultAbstractNum);
resultNumID = resultNumbering.addNum(resultAbstractNumID);
XWPFNum resultNum = resultNumbering.getNum(resultNumID);
resultNum.getCTNum().setLvlOverrideArray(num.getCTNum().getLvlOverrideArray());
this.numIDs.put(numID, resultNumID);
}
resultParagraph.setNumID(resultNumID);
}
private void handleStyles(IBody resultBody, IBodyElement bodyElement) {
//if we have bodyElement styles we need merging those styles of the two different documents
XWPFDocument document = null;
String styleID = null;
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
document = paragraph.getDocument();
styleID = paragraph.getStyleID();
} else if (bodyElement instanceof XWPFTable) {
XWPFTable table = (XWPFTable)bodyElement;
if (table.getPart() instanceof XWPFDocument) {
document = (XWPFDocument)table.getPart();
styleID = table.getStyleID();
}
} //ToDo: else others ...
if (document == null || styleID == null || "".equals(styleID)) return;
XWPFDocument resultDocument = null;
if (resultBody instanceof XWPFDocument) {
resultDocument = (XWPFDocument)resultBody;
} else if (resultBody instanceof XWPFTableCell) {
XWPFTableCell resultTableCell = (XWPFTableCell)resultBody;
resultDocument = resultTableCell.getXWPFDocument();
} //ToDo: else others ...
if (resultDocument != null) {
XWPFStyles styles = document.getStyles();
XWPFStyles resultStyles = resultDocument.getStyles();
XWPFStyle style = styles.getStyle(styleID);
//merge each used styles, also the related ones
for (XWPFStyle relatedStyle : styles.getUsedStyleList(style)) {
if (resultStyles.getStyle(relatedStyle.getStyleId()) == null) {
resultStyles.addStyle(relatedStyle);
}
}
}
}
private XWPFTable createTableWithTblPrAndTblGrid(XWPFTable table, IBody resultBody) {
if (resultBody instanceof XWPFDocument) {
XWPFDocument resultDocument = (XWPFDocument)resultBody;
XWPFTable resultTable = resultDocument.createTable();
resultTable.removeRow(0);
resultTable.getCTTbl().setTblPr(table.getCTTbl().getTblPr());//simply copy the underlying XML bean to avoid more code
resultTable.getCTTbl().setTblGrid(table.getCTTbl().getTblGrid());//simply copy the underlying XML bean to avoid more code
handleStyles(resultDocument, table);
//ToDo: handle further issues ...
return resultTable;
} else if (resultBody instanceof XWPFTableCell) {
//ToDo: handle stacked tables
} //ToDo: else others ...
return null;
}
private void traverseRunElements(List<IRunElement> runElements, IRunBody resultRunBody) throws Exception {
for (IRunElement runElement : runElements) {
if (runElement instanceof XWPFFieldRun) {
XWPFFieldRun fieldRun = (XWPFFieldRun)runElement;
XWPFFieldRun resultFieldRun = createFieldRunWithRPr(fieldRun, resultRunBody);
traversePictures(fieldRun, resultFieldRun);
} else if (runElement instanceof XWPFHyperlinkRun) {
XWPFHyperlinkRun hyperlinkRun = (XWPFHyperlinkRun)runElement;
XWPFHyperlinkRun resultHyperlinkRun = createHyperlinkRunWithRPr(hyperlinkRun, resultRunBody);
traversePictures(hyperlinkRun, resultHyperlinkRun);
} else if (runElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun)runElement;
XWPFRun resultRun = createRunWithRPr(run, resultRunBody);
traversePictures(run, resultRun);
} else if (runElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)runElement;
//ToDo: handle SDT
}
}
}
private void copyTextOfRuns(XWPFRun run, XWPFRun resultRun) {
//copy all of the possible T contents of the runs
for (int i = 0; i < run.getCTR().sizeOfTArray(); i++) {
resultRun.setText(run.getText(i), i);
}
}
private XWPFFieldRun createFieldRunWithRPr(XWPFFieldRun fieldRun, IRunBody resultRunBody) {
if (resultRunBody instanceof XWPFParagraph) {
XWPFParagraph resultParagraph = (XWPFParagraph)resultRunBody;
XWPFFieldRun resultFieldRun = resultParagraph.createFieldRun();//createRun() returns a plain XWPFRun, so casting it to XWPFFieldRun would fail
resultFieldRun.getCTR().setRPr(fieldRun.getCTR().getRPr());//simply copy the underlying XML bean to avoid more code
//ToDo: handle field runs properly ...
handleRunStyles(resultParagraph.getDocument(), fieldRun);
//ToDo: handle further issues ...
return resultFieldRun;
} else if (resultRunBody instanceof XWPFSDT) {
//ToDo: handle SDT
}
return null;
}
private XWPFHyperlinkRun createHyperlinkRunWithRPr(XWPFHyperlinkRun hyperlinkRun, IRunBody resultRunBody) {
if (resultRunBody instanceof XWPFParagraph) {
XWPFParagraph resultParagraph = (XWPFParagraph)resultRunBody;
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHyperlink resultCTHyperLink = resultParagraph.getCTP().addNewHyperlink();
resultCTHyperLink.addNewR();
XWPFHyperlinkRun resultHyperlinkRun = new XWPFHyperlinkRun(resultCTHyperLink, resultCTHyperLink.getRArray(0), resultParagraph);
if (hyperlinkRun.getAnchor() != null) {
resultHyperlinkRun = resultParagraph.createHyperlinkRun(hyperlinkRun.getAnchor());
}
resultHyperlinkRun.getCTR().setRPr(hyperlinkRun.getCTR().getRPr());//simply copy the underlying XML bean to avoid more code
copyTextOfRuns(hyperlinkRun, resultHyperlinkRun);
//ToDo: handle external hyperlink runs properly ...
handleRunStyles(resultParagraph.getDocument(), hyperlinkRun);
//ToDo: handle further issues ...
return resultHyperlinkRun;
} else if (resultRunBody instanceof XWPFSDT) {
//ToDo: handle SDT
}
return null;
}
private XWPFRun createRunWithRPr(XWPFRun run, IRunBody resultRunBody) {
if (resultRunBody instanceof XWPFParagraph) {
XWPFParagraph resultParagraph = (XWPFParagraph)resultRunBody;
XWPFRun resultRun = resultParagraph.createRun();
resultRun.getCTR().setRPr(run.getCTR().getRPr());//simply copy the underlying XML bean to avoid more code
copyTextOfRuns(run, resultRun);
handleRunStyles(resultParagraph.getDocument(), run);
//ToDo: handle further issues ...
return resultRun;
} else if (resultRunBody instanceof XWPFSDT) {
//ToDo: handle SDT
}
return null;
}
private void handleRunStyles(IBody resultBody, IRunElement runElement) {
//if we have runElement styles we need merging those styles of the two different documents
XWPFDocument document = null;
String styleID = null;
if (runElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun)runElement;
document = run.getDocument();
styleID = run.getStyle();
} else if (runElement instanceof XWPFHyperlinkRun) {
XWPFHyperlinkRun run = (XWPFHyperlinkRun)runElement;
document = run.getDocument();
styleID = run.getStyle();
} else if (runElement instanceof XWPFFieldRun) {
XWPFFieldRun run = (XWPFFieldRun)runElement;
document = run.getDocument();
styleID = run.getStyle();
} //ToDo: else others ...
if (document == null || styleID == null || "".equals(styleID)) return;
XWPFDocument resultDocument = null;
if (resultBody instanceof XWPFDocument) {
resultDocument = (XWPFDocument)resultBody;
} else if (resultBody instanceof XWPFTableCell) {
XWPFTableCell resultTableCell = (XWPFTableCell)resultBody;
resultDocument = resultTableCell.getXWPFDocument();
} //ToDo: else others ...
if (resultDocument != null) {
XWPFStyles styles = document.getStyles();
XWPFStyles resultStyles = resultDocument.getStyles();
XWPFStyle style = styles.getStyle(styleID);
//merge each used styles, also the related ones
for (XWPFStyle relatedStyle : styles.getUsedStyleList(style)) {
if (resultStyles.getStyle(relatedStyle.getStyleId()) == null) {
resultStyles.addStyle(relatedStyle);
}
}
}
}
private void traverseTableRows(List<XWPFTableRow> tableRows, XWPFTable resultTable) throws Exception {
for (XWPFTableRow tableRow : tableRows) {
XWPFTableRow resultTableRow = createTableRowWithTrPr(tableRow, resultTable);
traverseTableCells(tableRow.getTableICells(), resultTableRow);
}
}
private XWPFTableRow createTableRowWithTrPr(XWPFTableRow tableRow, XWPFTable resultTable) {
XWPFTableRow resultTableRow = resultTable.createRow();
for (int i = resultTableRow.getTableCells().size(); i > 0; i--) { //table row should be empty at first
resultTableRow.removeCell(i-1);
}
resultTableRow.getCtRow().setTrPr(tableRow.getCtRow().getTrPr());//simply copy the underlying XML bean to avoid more code
//ToDo: handle further issues ...
return resultTableRow;
}
private void traverseTableCells(List<ICell> tableICells, XWPFTableRow resultTableRow) throws Exception {
for (ICell tableICell : tableICells) {
if (tableICell instanceof XWPFSDTCell) {
XWPFSDTCell sDTCell = (XWPFSDTCell)tableICell;
XWPFSDTCell resultSdtTableCell = createSdtTableCell(sDTCell, resultTableRow);
//ToDo: handle further issues ...
} else if (tableICell instanceof XWPFTableCell) {
XWPFTableCell tableCell = (XWPFTableCell)tableICell;
XWPFTableCell resultTableCell = createTableCellWithTcPr(tableCell, resultTableRow);
traverseBodyElements(tableCell.getBodyElements(), resultTableCell);
}
}
}
private XWPFSDTCell createSdtTableCell(XWPFSDTCell sDTCell, XWPFTableRow resultTableRow) {
//create at least a cell to avoid corrupted document
XWPFTableCell resultTableCell = resultTableRow.createCell();
//ToDo: handle SDTCell properly
//ToDo: handle further issues ...
return null;
}
private XWPFTableCell createTableCellWithTcPr(XWPFTableCell tableCell, XWPFTableRow resultTableRow) {
XWPFTableCell resultTableCell = resultTableRow.createCell();
resultTableCell.removeParagraph(0);
resultTableCell.getCTTc().setTcPr(tableCell.getCTTc().getTcPr());//simply copy the underlying XML bean to avoid more code
//ToDo: handle further issues ...
return resultTableCell;
}
private void traversePictures(IRunElement runElement, IRunElement resultRunElement) throws Exception {
List<XWPFPicture> pictures = null;
if (runElement instanceof XWPFFieldRun) {
XWPFFieldRun fieldRun = (XWPFFieldRun)runElement;
pictures = fieldRun.getEmbeddedPictures();
} else if (runElement instanceof XWPFHyperlinkRun) {
XWPFHyperlinkRun hyperlinkRun = (XWPFHyperlinkRun)runElement;//read the pictures from the source run, not from the result run
pictures = hyperlinkRun.getEmbeddedPictures();
} else if (runElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun)runElement;
pictures = run.getEmbeddedPictures();
} else if (runElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)runElement;
//ToDo: handle SDT
}
if (pictures != null) {
for (XWPFPicture picture : pictures) {
XWPFPictureData pictureData = picture.getPictureData();
XWPFPicture resultPicture = createPictureWithDrawing(runElement, picture, pictureData, resultRunElement);
}
}
}
private XWPFPicture createPictureWithDrawing(IRunElement runElement, XWPFPicture picture, XWPFPictureData pictureData, IRunElement resultRunElement) {
if (resultRunElement instanceof XWPFFieldRun) {
XWPFFieldRun fieldRun = (XWPFFieldRun)runElement;
XWPFFieldRun resultFieldRun = (XWPFFieldRun)resultRunElement;
XWPFPicture resultPicture = createPictureWithDrawing(fieldRun, resultFieldRun, picture, pictureData);
return resultPicture;
} else if (resultRunElement instanceof XWPFHyperlinkRun) {
XWPFHyperlinkRun hyperlinkRun = (XWPFHyperlinkRun)runElement;
XWPFHyperlinkRun resultHyperlinkRun = (XWPFHyperlinkRun)resultRunElement;
XWPFPicture resultPicture = createPictureWithDrawing(hyperlinkRun, resultHyperlinkRun, picture, pictureData);
return resultPicture;
} else if (resultRunElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun)runElement;
XWPFRun resultRun = (XWPFRun)resultRunElement;
XWPFPicture resultPicture = createPictureWithDrawing(run, resultRun, picture, pictureData);
return resultPicture;
} else if (resultRunElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)resultRunElement;
//ToDo: handle SDT
}
return null;
}
private XWPFPicture createPictureWithDrawing(XWPFRun run, XWPFRun resultRun, XWPFPicture picture, XWPFPictureData pictureData) {
try {
XWPFPicture resultPicture = resultRun.addPicture(
pictureData.getPackagePart().getInputStream(),
pictureData.getPictureType(),
pictureData.getFileName(),
Units.pixelToEMU((int)picture.getWidth()),
Units.pixelToEMU((int)picture.getDepth()));
String rId = resultPicture.getCTPicture().getBlipFill().getBlip().getEmbed();
resultRun.getCTR().setDrawingArray(0, run.getCTR().getDrawingArray(0));//simply copy the underlying XML bean to avoid more code
//but then correct the rID
String declareNameSpaces = "declare namespace a='http://schemas.openxmlformats.org/drawingml/2006/main'; ";
org.apache.xmlbeans.XmlObject[] selectedObjects = resultRun.getCTR().getDrawingArray(0).selectPath(
declareNameSpaces
+ "$this//a:blip");
for (org.apache.xmlbeans.XmlObject blipObject : selectedObjects) {
if (blipObject instanceof org.openxmlformats.schemas.drawingml.x2006.main.CTBlip) {
org.openxmlformats.schemas.drawingml.x2006.main.CTBlip blip = (org.openxmlformats.schemas.drawingml.x2006.main.CTBlip)blipObject;
if (blip.isSetEmbed()) blip.setEmbed(rId);
}
}
//remove rIDs to external hyperlinks to avoid a corrupt document
selectedObjects = resultRun.getCTR().getDrawingArray(0).selectPath(
declareNameSpaces
+ "$this//a:hlinkClick");
for (org.apache.xmlbeans.XmlObject hlinkClickObject : selectedObjects) {
if (hlinkClickObject instanceof org.openxmlformats.schemas.drawingml.x2006.main.CTHyperlink) {
org.openxmlformats.schemas.drawingml.x2006.main.CTHyperlink hlinkClick = (org.openxmlformats.schemas.drawingml.x2006.main.CTHyperlink)hlinkClickObject;
if (hlinkClick.isSetId()) hlinkClick.setId("");
//ToDo: handle pictures having hyperlinks properly
}
}
//ToDo: handle further issues ...
return resultPicture;
} catch (Exception ex) {
ex.printStackTrace();
}
return null;
}
public void merge(String firstFilePath, String secondFilePath, String resultFilePath) throws Exception {
XWPFDocument resultDocument = new XWPFDocument(new FileInputStream(firstFilePath));
XWPFDocument documentToAppend = new XWPFDocument(new FileInputStream(secondFilePath));
traverseBodyElements(documentToAppend.getBodyElements(), resultDocument);
documentToAppend.close();
FileOutputStream out = new FileOutputStream(resultFilePath);
resultDocument.write(out);
out.close();
resultDocument.close();
}
public static void main(String[] args) throws Exception {
WordMerger merger = new WordMerger();
merger.merge("./WordDocument1.docx", "./WordDocument2.docx", "./WordDocumentResult.docx");
}
}
Related
Let's assume I have a Word document with this body.
[Image: Word document before replacing images]
private void findImages(XWPFParagraph p) {
for (XWPFRun r : p.getRuns()) {
for (XWPFPicture pic : r.getEmbeddedPictures()) {
XWPFPicture picture = pic;
XWPFPictureData source = picture.getPictureData();
BufferedImage qrCodeImage = printVersionService.generateQRCodeImage("JASAW EMA WWS");
File imageFile = new File("image.jpg");
try {
ImageIO.write(qrCodeImage, "jpg", imageFile);
} catch (IOException e) {
e.printStackTrace();
}
try ( FileInputStream in = new FileInputStream(imageFile);
OutputStream out = source.getPackagePart().getOutputStream();
) {
byte[] buffer = new byte[2048];
int length;
while ((length = in.read(buffer)) > 0) {
out.write(buffer, 0, length);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
}
So this code replaces every image with a QR code.
But I have one problem.
[Image: Word document after replacing]
So my question is:
How can I replace only the image I chose, or how can I replace an inserted figure containing text with an image generated by my own function?
Detecting the picture and replacing the picture data will be the simplest approach. In the following answer I have shown how to detect and replace pictures by name: Java Apache POI: insert an image "infront the text". If you do not know the name of the embedded picture, a picture can also be detected by its alt text. To edit the alt text of a picture, open the context menu by right-clicking the picture and choose Edit A̲lt Text from that context menu.
In How to read alt text of image in word document apache.poi I have already shown how to read the alt text of an image.
So the code could look like:
import java.io.FileInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
public class WordReplacePictureData {
static org.apache.xmlbeans.XmlObject getInlineOrAnchor(org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture ctPictureToFind, org.apache.xmlbeans.XmlObject inlineOrAnchor) {
String declareNameSpaces = "declare namespace pic='http://schemas.openxmlformats.org/drawingml/2006/picture'; ";
org.apache.xmlbeans.XmlObject[] selectedObjects = inlineOrAnchor.selectPath(
declareNameSpaces
+ "$this//pic:pic");
for (org.apache.xmlbeans.XmlObject selectedObject : selectedObjects) {
if (selectedObject instanceof org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture) {
org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture ctPicture = (org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture)selectedObject;
if (ctPictureToFind.equals(ctPicture)) {
// this is the inlineOrAnchor for that picture
return inlineOrAnchor;
}
}
}
return null;
}
static org.apache.xmlbeans.XmlObject getInlineOrAnchor(XWPFRun run, XWPFPicture picture) {
org.openxmlformats.schemas.drawingml.x2006.picture.CTPicture ctPictureToFind = picture.getCTPicture();
for (org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDrawing drawing : run.getCTR().getDrawingList()) {
for (org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline inline : drawing.getInlineList()) {
org.apache.xmlbeans.XmlObject inlineOrAnchor = getInlineOrAnchor(ctPictureToFind, inline);
// if inlineOrAnchor is not null, then this is the inline for that picture
if (inlineOrAnchor != null) return inlineOrAnchor;
}
for (org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTAnchor anchor : drawing.getAnchorList()) {
org.apache.xmlbeans.XmlObject inlineOrAnchor = getInlineOrAnchor(ctPictureToFind, anchor);
// if inlineOrAnchor is not null, then this is the anchor for that picture
if (inlineOrAnchor != null) return inlineOrAnchor;
}
}
return null;
}
static org.openxmlformats.schemas.drawingml.x2006.main.CTNonVisualDrawingProps getNonVisualDrawingProps(org.apache.xmlbeans.XmlObject inlineOrAnchor) {
if (inlineOrAnchor == null) return null;
if (inlineOrAnchor instanceof org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline) {
org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline inline = (org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline)inlineOrAnchor;
return inline.getDocPr();
} else if (inlineOrAnchor instanceof org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTAnchor) {
org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTAnchor anchor = (org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTAnchor)inlineOrAnchor;
return anchor.getDocPr();
}
return null;
}
static String getSummary(org.openxmlformats.schemas.drawingml.x2006.main.CTNonVisualDrawingProps nonVisualDrawingProps) {
if (nonVisualDrawingProps == null) return "";
String summary = "Id:=" + nonVisualDrawingProps.getId();
summary += " Name:=" + nonVisualDrawingProps.getName();
summary += " Title:=" + nonVisualDrawingProps.getTitle();
summary += " Descr:=" + nonVisualDrawingProps.getDescr();
return summary;
}
static XWPFPicture getPictureByAltText(XWPFRun run, String altText) {
if (altText == null) return null;
for (XWPFPicture picture : run.getEmbeddedPictures()) {
String altTextSummary = getSummary(getNonVisualDrawingProps(getInlineOrAnchor(run, picture)));
System.out.println(altTextSummary);
if (altTextSummary.contains(altText)) {
return picture;
}
}
return null;
}
static void replacePictureData(XWPFPictureData source, String pictureResultPath) {
try ( FileInputStream in = new FileInputStream(pictureResultPath);
OutputStream out = source.getPackagePart().getOutputStream();
) {
byte[] buffer = new byte[2048];
int length;
while ((length = in.read(buffer)) > 0) {
out.write(buffer, 0, length);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
static void replacePicture(XWPFRun run, String altText, String pictureResultPath) {
XWPFPicture picture = getPictureByAltText(run, altText);
if (picture != null) {
XWPFPictureData source = picture.getPictureData();
replacePictureData(source, pictureResultPath);
}
}
public static void main(String[] args) throws Exception {
String templatePath = "./source.docx";
String resultPath = "./result.docx";
String altText = "Placeholder QR-Code";
String pictureResultPath = "./QR.jpg";
try ( XWPFDocument document = new XWPFDocument(new FileInputStream(templatePath));
FileOutputStream out = new FileOutputStream(resultPath);
) {
for (IBodyElement bodyElement : document.getBodyElements()) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
for (XWPFRun run : paragraph.getRuns()) {
replacePicture(run, altText, pictureResultPath);
}
}
}
document.write(out);
}
}
}
This replaces the picture or pictures having alt text "Placeholder QR-Code". All other pictures remain as they are.
Replacing shapes with pictures is very laborious, as shapes are stored in alternate content elements (with a choice holding the shape and a fallback), so the shape needs to be changed as well as the fallback. If one left the fallback untouched, applications which rely on that fallback would continue to show the old shape. Furthermore, detecting shapes by text box content is not really much simpler than detecting pictures by alt text content.
I can replace the text inside the table and footer, but I can't replace the text outside the table. I don't know why.
Any idea how to replace a placeholder like ${name} in a paragraph outside the table?
I want that in the Map.
public static boolean changWord(String inputUrl, String outputUrl, Map<String, String> textMap) {
// Template conversion default success
boolean changeFlag = true;
try {
File file = new File(outputUrl);
FileOutputStream stream = new FileOutputStream(file);
@SuppressWarnings("resource")
XWPFDocument document = new XWPFDocument(POIXMLDocument.openPackage(inputUrl));
WorderToNewWordUtils.changeText(document, textMap);
document.write(stream);
stream.close();
} catch (IOException e) {
e.printStackTrace();
changeFlag = false;
}
return changeFlag;
}
public static void changeText(XWPFDocument document, Map<String, String> textMap) {
for (XWPFParagraph p : document.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (checkText(text)) {
r.setText(changeValue(r.toString(), textMap), 0);
}
}
}
// Replace Text inside Table
for (XWPFTable tbl : document.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (checkText(text)) {
r.setText(changeValue(r.toString(), textMap), 0);
}
// System.out.println("Bevor Fußzeiler" + text);
}
}
}
}
}
// Replace Text in Footer
for (XWPFFooter footer : document.getFooterList()) {
for (XWPFParagraph paragraph1 : footer.getParagraphs()) {
for (XWPFRun r : paragraph1.getRuns()) {
String text = r.getText(0);
if (checkText(text)) {
r.setText(changeValue(r.toString(), textMap), 0);
}
// System.out.println("Nach Fußzeile" + text);
}
}
}
}
public static boolean checkText(String text) {
boolean check = false;
if (text.indexOf("$") != -1) {
check = true;
}
return check;
}
public static String changeValue(String value, Map<String, String> textMap) {
for (Map.Entry<String, String> textSet : textMap.entrySet()) {
// match template and replacement value format ${key}
String key = "${" + textSet.getKey() + "}";
if (value.indexOf(key) != -1) {
value = textSet.getValue();
}
}
return value;
}
public static void main(String[] args) {
// Template file address
String inputUrl = "D:\\Test.docx";
Map<String, String> testMap = new HashMap<>();
testMap.put("ja", "Nein");
testMap.put("red", "Blue");
testMap.put("No", "yes");
testMap.put("Preis", "999$");
testMap.put("Something", "Nothing");
testMap.put("nein", "Ja");
testMap.put("antwort", "Schöne");
testMap.put("name", "Sayer");
testMap.put("Test", "Email");
// .pdf if you want the Document in PDF Format
String outputUrl = "D:\\New-Test.docx";
WorderToNewWordUtils.changWord(inputUrl, outputUrl, testMap);
}
}
I found a solution. If you want it, send me your email.
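The solution itself is not shown in this thread. Two things can be read from the posted code, though: changeValue replaces the whole run text with the map value instead of only the ${key} part, and checkText throws a NullPointerException when a run has no text. The following is only a minimal sketch of a null-safe substitution on plain strings, using JDK classes only; PlaceholderReplacer and its method names are hypothetical. A placeholder that Word has split across several runs, which is a common cause of unreplaced text but is not confirmed for this case, would still need the runs of the paragraph to be joined first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: replaces ${key} placeholders inside a text with map values.
final class PlaceholderReplacer {

    // Null-safe check whether a text might contain a placeholder.
    static boolean hasPlaceholder(String text) {
        return text != null && text.contains("${");
    }

    // Replaces every ${key} occurrence, leaving the rest of the text intact.
    static String replacePlaceholders(String text, Map<String, String> values) {
        if (text == null) return null;
        String result = text;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace works on literal text, so "$" in keys or values is safe.
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Sayer");
        values.put("Preis", "999$");
        System.out.println(replacePlaceholders("Dear ${name}, the price is ${Preis}.", values));
    }
}
```

In the posted changeText method, such a function would be applied to r.getText(0) after the null check, and its result passed to r.setText(..., 0).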
I am using Apache POI to convert a Word document to HTML. I have a Word document that has a footnote which includes an external hyperlink. I am not able to get the hyperlink URL for that hyperlink. Here is my code:
List<CTHyperlink> links = paragraph.getCTP().getHyperlinkList();
log.debug("Count of hyperlinks="+links.size());
for (CTHyperlink ctHyperlink : links) {
String rId = ctHyperlink.getId();
log.debug("rid="+rId);
XWPFHyperlink link = document.getHyperlinkByID(rId);
if(link!=null) {
log.debug("link not NULL");
}else {
log.debug("link is NULL");
}
}
From the above code, I see that in my case, the count of hyperlinks is 2. I am getting the rId correctly as "rId1" and "rId2" but link is always coming as NULL.
In the OOXML, I see that the hyperlinks in the document are stored in package name "/word/_rels/document.xml.rels" while hyperlinks in the footnote are stored in the package name "/word/_rels/footnotes.xml.rels". Probably that is the reason why my link variable is coming as NULL. But I am not sure how to get the hyperlink element from the footnote relationship package.
You are correct. If the paragraph in your code snippet is in an XWPFAbstractFootnoteEndnote, then it lives in the package part /word/footnotes.xml or /word/endnotes.xml and not in /word/document.xml. XWPFDocument.getHyperlinkByID only returns the hyperlinks that belong to /word/document.xml.
The solution depends on where the paragraph in your code snippet comes from, which you have not shown.
The simplest solution is to get the XWPFHyperlinkRun from the XWPFParagraph and then get the XWPFHyperlink from that XWPFHyperlinkRun. If the parent package part of the XWPFHyperlinkRun is not the XWPFDocument, this must be done using the underlying PackageRelationship, since a hyperlink list currently exists only for XWPFDocument.
In "Unable to read all content in order of a word document (docx) in Apache POI" I have shown a basic example of how to traverse a Word document. I have now extended that code to traverse footnotes and endnotes as well as headers and footers, and to handle any XWPFHyperlinkRuns found.
Example:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import java.util.List;
public class WordTraverseAll {
static void traversePictures(List<XWPFPicture> pictures) throws Exception {
for (XWPFPicture picture : pictures) {
System.out.println(picture);
XWPFPictureData pictureData = picture.getPictureData();
System.out.println(pictureData);
}
}
static void traverseComments(XWPFRun run) throws Exception {
CTMarkup comr = null;
if (run.getCTR().getCommentReferenceList().size() > 0) {
comr = run.getCTR().getCommentReferenceList().get(0);
}
if (comr != null) {
XWPFComment comment = run.getDocument().getCommentByID(String.valueOf(comr.getId().intValue()));
System.out.println("Comment from " + comment.getAuthor() + ": " + comment.getText());
}
}
static void traverseFootnotes(XWPFRun run) throws Exception {
CTFtnEdnRef ftn = null;
if (run.getCTR().getFootnoteReferenceList().size() > 0) {
ftn = run.getCTR().getFootnoteReferenceList().get(0);
} else if (run.getCTR().getEndnoteReferenceList().size() > 0) {
ftn = run.getCTR().getEndnoteReferenceList().get(0);
}
if (ftn != null) {
XWPFAbstractFootnoteEndnote footnote =
ftn.getDomNode().getLocalName().equals("footnoteReference") ?
run.getDocument().getFootnoteByID(ftn.getId().intValue()) :
run.getDocument().getEndnoteByID(ftn.getId().intValue());
for (XWPFParagraph paragraph : footnote.getParagraphs()) {
traverseRunElements(paragraph.getIRuns());
}
}
}
static void traverseRunElements(List<IRunElement> runElements) throws Exception {
for (IRunElement runElement : runElements) {
if (runElement instanceof XWPFFieldRun) {
XWPFFieldRun fieldRun = (XWPFFieldRun)runElement;
//System.out.println(fieldRun.getClass().getName());
System.out.println(fieldRun);
traversePictures(fieldRun.getEmbeddedPictures());
} else if (runElement instanceof XWPFHyperlinkRun) {
XWPFHyperlinkRun hyperlinkRun = (XWPFHyperlinkRun)runElement;
//System.out.println(hyperlinkRun.getClass().getName());
String rId = hyperlinkRun.getHyperlinkId();
XWPFHyperlink hyperlink = null;
if (hyperlinkRun.getParent().getPart() instanceof XWPFAbstractFootnotesEndnotes) {
PackageRelationship rel = hyperlinkRun.getParent().getPart().getPackagePart().getRelationships().getRelationshipByID(rId);
hyperlink = new XWPFHyperlink(rId, rel.getTargetURI().toString());
} else if (hyperlinkRun.getParent().getPart() instanceof XWPFHeaderFooter) {
PackageRelationship rel = hyperlinkRun.getParent().getPart().getPackagePart().getRelationships().getRelationshipByID(rId);
hyperlink = new XWPFHyperlink(rId, rel.getTargetURI().toString());
} else if (hyperlinkRun.getParent().getPart() instanceof XWPFDocument) {
hyperlink = hyperlinkRun.getDocument().getHyperlinkByID(rId);
}
System.out.print(hyperlinkRun);
if (hyperlink != null) System.out.println("->" + hyperlink.getURL());
traversePictures(hyperlinkRun.getEmbeddedPictures());
} else if (runElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun)runElement;
//System.out.println(run.getClass().getName());
System.out.println(run);
traverseFootnotes(run);
traverseComments(run);
traversePictures(run.getEmbeddedPictures());
} else if (runElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)runElement;
System.out.println(sDT);
System.out.println(sDT.getContent());
//ToDo: The SDT may have traversable content too.
}
}
}
static void traverseTableCells(List<ICell> tableICells) throws Exception {
for (ICell tableICell : tableICells) {
if (tableICell instanceof XWPFSDTCell) {
XWPFSDTCell sDTCell = (XWPFSDTCell)tableICell;
System.out.println(sDTCell);
//ToDo: The SDTCell may have traversable content too.
} else if (tableICell instanceof XWPFTableCell) {
XWPFTableCell tableCell = (XWPFTableCell)tableICell;
//System.out.println(tableCell);
traverseBodyElements(tableCell.getBodyElements());
}
}
}
static void traverseTableRows(List<XWPFTableRow> tableRows) throws Exception {
for (XWPFTableRow tableRow : tableRows) {
//System.out.println(tableRow);
traverseTableCells(tableRow.getTableICells());
}
}
static void traverseBodyElements(List<IBodyElement> bodyElements) throws Exception {
for (IBodyElement bodyElement : bodyElements) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
//System.out.println(paragraph);
traverseRunElements(paragraph.getIRuns());
} else if (bodyElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)bodyElement;
System.out.println(sDT);
System.out.println(sDT.getContent());
//ToDo: The SDT may have traversable content too.
} else if (bodyElement instanceof XWPFTable) {
XWPFTable table = (XWPFTable)bodyElement;
//System.out.println(table);
traverseTableRows(table.getRows());
}
}
}
static void traverseHeaderFooterElements(XWPFDocument document) throws Exception {
for (XWPFHeader header : document.getHeaderList()) {
traverseBodyElements(header.getBodyElements());
}
for (XWPFFooter footer : document.getFooterList()) {
traverseBodyElements(footer.getBodyElements());
}
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordHavingHyperlinks.docx"));
System.out.println("===== Document body elements =====");
traverseBodyElements(document.getBodyElements());
System.out.println("===== Header and footer elements =====");
traverseHeaderFooterElements(document);
document.close();
}
}
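As background on why the lookup has to go through the owning part: each package part keeps its own relationship file, so "rId1" in the footnotes resolves against word/_rels/footnotes.xml.rels and not against the document's relationships. The following sketch uses only the JDK (Java 9 or later, no POI) to read the hyperlink targets from one relationship part of a zip. The class name FootnoteRelsDemo, its methods, and the sample data with the example.com URL are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FootnoteRelsDemo {
    static final String RELS_NS =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    static final String HYPERLINK_TYPE =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    // Returns rId -> target for all hyperlink relationships in the given .rels zip entry.
    static Map<String, String> hyperlinkTargets(InputStream docx, String relsEntryName) throws Exception {
        Map<String, String> targets = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(relsEntryName)) {
                    continue;
                }
                byte[] xml = zip.readAllBytes(); // reads only the current zip entry
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                NodeList rels = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getElementsByTagNameNS(RELS_NS, "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    Element rel = (Element) rels.item(i);
                    if (HYPERLINK_TYPE.equals(rel.getAttribute("Type"))) {
                        targets.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
                    }
                }
            }
        }
        return targets;
    }

    // Builds a tiny in-memory zip that contains only a made-up footnotes relationship part.
    static byte[] sampleZip() throws Exception {
        String rels = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<Relationships xmlns=\"" + RELS_NS + "\">"
            + "<Relationship Id=\"rId1\" Type=\"" + HYPERLINK_TYPE + "\""
            + " Target=\"https://example.com/\" TargetMode=\"External\"/>"
            + "</Relationships>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/_rels/footnotes.xml.rels"));
            zip.write(rels.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Zip entry names have no leading slash, unlike the package part names.
        Map<String, String> targets = hyperlinkTargets(
            new ByteArrayInputStream(sampleZip()), "word/_rels/footnotes.xml.rels");
        System.out.println(targets); // prints: {rId1=https://example.com/}
    }
}
```

For a real file, a FileInputStream on the .docx can be passed instead of the in-memory sample. Inside POI the PackageRelationship approach shown in the example above remains the more direct route.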
I am trying to search for a string in a docx file and replace it with some other text using Java and Apache POI, but the text is only replaced in seemingly random places.
I am also getting an ArrayIndexOutOfBoundsException in this line:
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val")[0];
public class WordReplaceTextInFormFields {
private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
boolean foundformfield = false;
for (XWPFParagraph paragraph : document.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
XmlCursor cursor = run.getCTR().newCursor();
cursor.selectPath(
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/@w:fldCharType");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
if ("begin".equals(((SimpleValue) obj).getStringValue())) {
cursor.toParent();
obj = cursor.getObject();
obj = obj.selectPath(
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val")[0];
if (ffname.equals(((SimpleValue) obj).getStringValue())) {
foundformfield = true;
} else {
foundformfield = false;
}
} else if ("end".equals(((SimpleValue) obj).getStringValue())) {
if (foundformfield)
return;
foundformfield = false;
}
}
if (foundformfield && run.getCTR().getTList().size() > 0) {
run.getCTR().getTList().get(0).setStringValue(text);
// System.out.println(run.getCTR());
}
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
replaceFormFieldText(document, "Text1", "Моя Компания");
replaceFormFieldText(document, "Text2", "Аксель Джоачимович Рихтер");
replaceFormFieldText(document, "Text3", "Доверенность");
document.write(new FileOutputStream("WordReplaceTextInFormFields.docx"));
document.close();
}
}
It misses some strings and does not replace text throughout the entire document. Please help with sample code.
I do something similar in my project at https://github.com/centic9/poi-mail-merge, which provides general mail-merge functionality based on POI. It takes a slightly different approach based on XmlBeans: it replaces strings in the full XML content of the document instead of in each paragraph separately.
private static void appendBody(CTBody src, String append, boolean first) throws XmlException {
XmlOptions optionsOuter = new XmlOptions();
optionsOuter.setSaveOuter();
String srcString = src.xmlText();
String prefix = srcString.substring(0,srcString.indexOf(">")+1);
final String mainPart;
// exclude template itself in first appending
if(first) {
mainPart = "";
} else {
mainPart = srcString.substring(srcString.indexOf(">")+1,srcString.lastIndexOf("<"));
}
String suffix = srcString.substring( srcString.lastIndexOf("<") );
String addPart = append.substring(append.indexOf(">") + 1, append.lastIndexOf("<"));
CTBody makeBody = CTBody.Factory.parse(prefix+mainPart+addPart+suffix);
src.set(makeBody);
}
}
See line 132 in MailMerge.java
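The string slicing that appendBody performs can be followed on plain XML strings. This is only a sketch under stated assumptions: the class name BodySpliceDemo and its methods are made up for illustration, and each string is assumed to start with the opening tag of a single outer element (no XML declaration, not self-closing).

```java
public class BodySpliceDemo {
    // Returns the content between the first opening tag and the last closing tag.
    static String inner(String xml) {
        return xml.substring(xml.indexOf(">") + 1, xml.lastIndexOf("<"));
    }

    // Appends the inner content of 'append' to 'src', keeping src's outer element.
    static String appendInner(String src, String append) {
        String prefix = src.substring(0, src.indexOf(">") + 1); // opening tag of src
        String suffix = src.substring(src.lastIndexOf("<"));    // closing tag of src
        return prefix + inner(src) + inner(append) + suffix;
    }

    public static void main(String[] args) {
        String first = "<w:body><w:p>one</w:p></w:body>";
        String second = "<w:body><w:p>two</w:p></w:body>";
        System.out.println(appendInner(first, second));
        // prints: <w:body><w:p>one</w:p><w:p>two</w:p></w:body>
    }
}
```

Because the merge happens on the serialized XML, anything inside the body text is copied as-is, which is also why a plain string replacement over that XML reaches text that a per-paragraph loop does not visit.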
I'm currently working on a large software project and need some help.
I added a FilteredTree to display a tree I created myself.
The following code is used to initialize it:
PatternFilter patternFilter = new PatternFilter();
dataTree = new FilteredTree(comp1, SWT.BORDER | SWT.MULTI | SWT.V_SCROLL, patternFilter, true);
dataTreeViewer = dataTree.getViewer();
dataTreeViewer.setContentProvider(getDataTreeContentProvider());
dataTreeViewer.setLabelProvider(getDataTreeLabelProvider());
dataTreeViewer.setInput(data);
The input is a tree represented by a single node, which has children, and so on.
If I type something into the filter, literally nothing happens. I can enter any nonsense and still every element is shown.
I will also add the code of my label provider. Maybe you can spot the mistake. This is driving me nuts because I'm wasting hours and hours on this little thing.
public String getText(Object element) {
if (element != null) {
// MatTreeNode
if (element instanceof MatTreeNode) {
MatTreeNode node = (MatTreeNode) element;
return node.getName();
}
// OtherData
if (element instanceof OtherData) {
OtherData data = (OtherData) element;
return data.getName();
}
}
return null;
}
In my real code, OtherData has a different class name. I'm not allowed to post that code for copyright reasons.
I hope one of you can help me.
Have a great weekend!
Best regards
LouBen3010
Complete ContentProvider:
public class RawdataTreeContentProvider extends TreeNodeContentProvider implements IContentProvider {
final static String[] EMPTY_ARRAY = {};
@Override
public Object[] getElements(Object inputElement) {
return getChildren(inputElement);
}
@Override
public Object[] getChildren(Object parentElement) {
// MatTreeNode
if (parentElement instanceof MatTreeNode) {
MatTreeNode parent = (MatTreeNode) parentElement;
LinkedList<MatTreeNode> children = parent.getChildren();
Iterator<MatTreeNode> iterator = children.iterator();
MatTreeNode[] res = new MatTreeNode[children.size()];
int i = 0;
while (iterator.hasNext()) {
MatTreeNode node = iterator.next();
res[i] = node;
i++;
}
return res;
}
// SimpleTreeNode
if (parentElement instanceof SimpleTreeNode) {
SimpleTreeNode treeNode = (SimpleTreeNode) parentElement;
return treeNode.getChildren().toArray();
}
return EMPTY_ARRAY;
}
@Override
public Object getParent(Object element) {
if (element instanceof MatTreeNode) {
return ((MatTreeNode) element).getParent();
}
if (element instanceof SimpleTreeNode) {
return ((SimpleTreeNode) element).getParent();
}
return null;
}
@Override
public boolean hasChildren(Object element) {
return getChildren(element).length > 0;
}
}
Complete LabelProvider:
public class RawdataTreeLabelProvider extends LabelProvider implements ILabelProvider {
private Map<ImageDescriptor, Image> imageCache = new HashMap<ImageDescriptor, Image>();
@Override
public String getText(Object element) {
if (element != null) {
// MatTreeNode
if (element instanceof MatTreeNode) {
MatTreeNode node = (MatTreeNode) element;
return node.getName();
}
// SimpleTreeNode
if (element instanceof SimpleTreeNode) {
SimpleTreeNode signal = (SimpleTreeNode) element;
return signal.getName();
}
}
return null;
}
@Override
public Image getImage(Object element) {
// SimpleTreeNode
if (element instanceof SimpleTreeNode) {
ImageDescriptor descriptor = null;
SimpleTreeNode treeNode = (SimpleTreeNode) element;
SignalData signalData = treeNode.getContent();
// Choose image by signal type
if (signalData.getType() == 0) {
// Input
descriptor = getImageDescriptor("login-16.png");
}
else if (signalData.getType() == 1) {
// Output
descriptor = getImageDescriptor("logout-16.png");
}
// Unknown signal type: no image available
if (descriptor == null) {
return null;
}
// Try to get the image from cache
Image image = imageCache.get(descriptor);
// If not in cache load the corresponding image
if (image == null) {
image = descriptor.createImage();
imageCache.put(descriptor, image);
}
return image;
}
else {
return super.getImage(element);
}
}
/*
* Custom methods
*/
public ImageDescriptor getImageDescriptor(String name) {
ClassLoader loader = getClass().getClassLoader();
URL url = loader.getResource(name);
return ImageDescriptor.createFromURL(url);
}
}