Editing Word Document using Apache POI [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I am trying to read a Word document template and replace the variables in it with user-supplied data, without changing the headings or styles of the template. I'm not sure whether this is the right approach, but this is how I started:
XWPFDocument docx = new XWPFDocument(
        new FileInputStream(
                "D://TestDocumentPrep/src/XXXXX_TestReport_URL_Document.docx"));
XWPFWordExtractor we = new XWPFWordExtractor(docx);
// getText() returns plain text only, so the template formatting is not part of this string
String textData = we.getText();
String newTestData = textData.replace("$var_source_code$", list.get(1))
        .replace("$var_rsvp_code$", list.get(2))
        .replace("$var_ssn$", list.get(3))
        .replace("$var_zip_code$", list.get(4))
        .replace("$var_point_for_business$",
                anotherData.getPointForBusiness())
        .replace("$var_E1_url$", anotherData.getE1url())
        .replace("$var_E2_url$", anotherData.getE2url())
        .replace("$var_E3_url$", anotherData.getE3url());
System.out.println(newTestData);
This is what I have done so far, but it reads the content of the Word document as a plain string and replaces the variables there. How do I put the replaced text back into the Word document while keeping the template's formatting?
Here I found something but Not exactly my solution
Here also I found something but not exact solution

Hi, I was able to find a solution.
Here is the code I used to edit my Word document and generate a new, edited document without changing the base template. In my case it worked with both the .doc and the .docx file I wanted to edit (note that XWPFDocument only reads the OOXML format, so a true binary .doc file would need HWPFDocument instead).
public void wordDocProcessor(AnotherVO anotherData, ArrayList<String> list,
        String source, String destination) throws IOException,
        InvalidFormatException {
    XWPFDocument doc = new XWPFDocument(OPCPackage.open(source
            + "XXXXX_TestReport_URL_Document.doc"));
    // The placeholders live in table cells: walk tables -> rows -> cells -> paragraphs -> runs
    for (XWPFTable tbl : doc.getTables()) {
        for (XWPFTableRow row : tbl.getRows()) {
            for (XWPFTableCell cell : row.getTableCells()) {
                for (XWPFParagraph p : cell.getParagraphs()) {
                    for (XWPFRun r : p.getRuns()) {
                        // Editing the run text in place keeps the run's formatting
                        String text = r.getText(0);
                        if (text != null
                                && text.contains("var_source_code")) {
                            text = text.replace("var_source_code",
                                    list.get(1));
                            r.setText(text, 0);
                        }
                        if (text != null && text.contains("var_rsvp_code")) {
                            text = text.replace("var_rsvp_code",
                                    list.get(2));
                            r.setText(text, 0);
                        }
                        if (text != null && text.contains("var_ssn")) {
                            text = text.replace("var_ssn", list.get(3));
                            r.setText(text, 0);
                        }
                        if (text != null && text.contains("var_zip_code")) {
                            text = text
                                    .replace("var_zip_code", list.get(4));
                            r.setText(text, 0);
                        }
                        if (text != null
                                && text.contains("var_point_for_business")) {
                            text = text.replace("var_point_for_business",
                                    anotherData.getPointForBusiness());
                            r.setText(text, 0);
                        }
                        if (text != null && text.contains("var_E1_url")) {
                            text = text.replace("var_E1_url",
                                    anotherData.getE1url());
                            r.setText(text, 0);
                        }
                        if (text != null && text.contains("var_E2_url")) {
                            text = text.replace("var_E2_url",
                                    anotherData.getE2url());
                            r.setText(text, 0);
                        }
                        if (text != null && text.contains("var_E3_url")) {
                            text = text.replace("var_E3_url",
                                    anotherData.getE3url());
                            r.setText(text, 0);
                        }
                    }
                }
            }
        }
    }
    // Close the output stream once the document has been written
    try (FileOutputStream out = new FileOutputStream(destination + list.get(0)
            + "_TestReport_URL_Document.doc")) {
        doc.write(out);
    }
}
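The repeated if blocks above all apply the same operation, so the placeholder handling could probably be driven by a map instead. The following is only a sketch in plain Java, without any POI calls: the class name PlaceholderReplacer, the method replacePlaceholders and the sample values are illustrative and not part of the original answer. In the POI code, the returned string would be what gets passed to r.setText(text, 0).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlaceholderReplacer {

    // Applies every placeholder -> value pair to the text.
    // Returns null for null input (a run without text) and the
    // unchanged text when no placeholder occurs in it.
    public static String replacePlaceholders(String text, Map<String, String> values) {
        if (text == null) {
            return null;
        }
        String result = text;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values for illustration only
        Map<String, String> values = new LinkedHashMap<>();
        values.put("var_source_code", "SRC-1");
        values.put("var_zip_code", "12345");
        // prints "Code: SRC-1, ZIP: 12345"
        System.out.println(replacePlaceholders("Code: var_source_code, ZIP: var_zip_code", values));
    }
}
```

Adding a new template variable then only means adding one more map entry rather than another if block.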

Related

Apache POI docx file content control parse

I'm trying to parse a docx file that contains content control fields (added through a dialog like the one in the reference image; mine is in another language).
I'm using the Apache POI library. I found this question on how to do it and used the same code:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import java.util.List;
import java.util.ArrayList;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlCursor;
import javax.xml.namespace.QName;

public class ReadWordForm {

    private static List<XWPFSDT> extractSDTsFromBody(XWPFDocument document) {
        XWPFSDT sdt;
        XmlCursor xmlcursor = document.getDocument().getBody().newCursor();
        QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
        List<XWPFSDT> allsdts = new ArrayList<XWPFSDT>();
        // Walk every XML token in the body and collect each w:sdt element
        while (xmlcursor.hasNextToken()) {
            XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
            if (tokentype.isStart()) {
                if (qnameSdt.equals(xmlcursor.getName())) {
                    if (xmlcursor.getObject() instanceof CTSdtRun) {
                        // CTSdtRun is a run-level (inline) content control
                        sdt = new XWPFSDT((CTSdtRun)xmlcursor.getObject(), document);
                        //System.out.println("inline: " + sdt);
                        allsdts.add(sdt);
                    } else if (xmlcursor.getObject() instanceof CTSdtBlock) {
                        // CTSdtBlock is a block-level content control
                        sdt = new XWPFSDT((CTSdtBlock)xmlcursor.getObject(), document);
                        //System.out.println("block: " + sdt);
                        allsdts.add(sdt);
                    }
                }
            }
        }
        return allsdts;
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument(new FileInputStream("WordDataCollectingForm.docx"));
        List<XWPFSDT> allsdts = extractSDTsFromBody(document);
        for (XWPFSDT sdt : allsdts) {
            //System.out.println(sdt);
            String title = sdt.getTitle();
            String content = sdt.getContent().getText();
            if (!(title == null) && !(title.isEmpty())) {
                System.out.println(title + ": " + content);
            } else {
                System.out.println("====sdt without title====");
            }
        }
        document.close();
    }
}
The problem is that this code doesn't see these fields in my docx file until I open it in LibreOffice and re-save it. So if a file created on Windows is fed into this code, it doesn't see the content control fields. But if I re-save the file in LibreOffice (using the same format), it starts to see these fields, even though some data is lost (the titles and tags of some fields). Can someone tell me what the reason might be and how to fix it so that the fields are found? Or is there perhaps an easier way using docx4j? Unfortunately there isn't much information on the internet about how to do this with these two libraries, or at least I didn't find it.
Example files are located on Google Drive. The first one doesn't work; the second one works (after it was opened in LibreOffice and a field was changed to one of the options).

According to your uploaded sample files your content controls are in a table. The code you had found only gets content controls from document body directly.
Tables are beastly things in Word as table cells may contain whole document bodies each. That's why content controls in table cells are strictly separated from content controls in main document body. Their ooxml class is CTSdtCell instead of CTSdtRun or CTSdtBlock and in apache poi their class is XWPFSDTCell instead of XWPFSDT.
If it is only about reading the content, then one could fall back to XWPFAbstractSDT which is the abstract parent class of XWPFSDTCell as well as of XWPFSDT. So following code should work:
private static List<XWPFAbstractSDT> extractSDTsFromBody(XWPFDocument document) {
    XWPFAbstractSDT sdt;
    XmlCursor xmlcursor = document.getDocument().getBody().newCursor();
    QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
    List<XWPFAbstractSDT> allsdts = new ArrayList<XWPFAbstractSDT>();
    while (xmlcursor.hasNextToken()) {
        XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
        if (tokentype.isStart()) {
            if (qnameSdt.equals(xmlcursor.getName())) {
                //System.out.println(xmlcursor.getObject().getClass().getName());
                if (xmlcursor.getObject() instanceof CTSdtRun) {
                    // run-level (inline) content control
                    sdt = new XWPFSDT((CTSdtRun)xmlcursor.getObject(), document);
                    //System.out.println("inline: " + sdt);
                    allsdts.add(sdt);
                } else if (xmlcursor.getObject() instanceof CTSdtBlock) {
                    // block-level content control
                    sdt = new XWPFSDT((CTSdtBlock)xmlcursor.getObject(), document);
                    //System.out.println("block: " + sdt);
                    allsdts.add(sdt);
                } else if (xmlcursor.getObject() instanceof CTSdtCell) {
                    // content control wrapping a table cell; no row or body is passed here
                    sdt = new XWPFSDTCell((CTSdtCell)xmlcursor.getObject(), null, null);
                    //System.out.println("cell: " + sdt);
                    allsdts.add(sdt);
                }
            }
        }
    }
    return allsdts;
}
But as you can see in the code line sdt = new XWPFSDTCell((CTSdtCell)xmlcursor.getObject(), null, null), the XWPFSDTCell totally loses its connection to the table and table row.
There is no proper method to get the XWPFSDTCell directly from an XWPFTable. So if one needs the XWPFSDTCell connected to its table, parsing the XML is needed as well. This could look like so:
private static List<XWPFSDTCell> extractSDTsFromTableRow(XWPFTableRow row) {
    XWPFSDTCell sdt;
    XmlCursor xmlcursor = row.getCtRow().newCursor();
    QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
    QName qnameTr = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "tr", "w");
    List<XWPFSDTCell> allsdts = new ArrayList<XWPFSDTCell>();
    while (xmlcursor.hasNextToken()) {
        XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
        if (tokentype.isStart()) {
            if (qnameSdt.equals(xmlcursor.getName())) {
                //System.out.println(xmlcursor.getObject().getClass().getName());
                if (xmlcursor.getObject() instanceof CTSdtCell) {
                    sdt = new XWPFSDTCell((CTSdtCell)xmlcursor.getObject(), row, row.getTable().getBody());
                    //System.out.println("cell: " + sdt);
                    allsdts.add(sdt);
                }
            }
        } else if (tokentype.isEnd()) {
            //we have to check whether we are at the end of the table row
            xmlcursor.push();
            xmlcursor.toParent();
            if (qnameTr.equals(xmlcursor.getName())) {
                break;
            }
            xmlcursor.pop();
        }
    }
    return allsdts;
}
And called from the document like so:
...
for (XWPFTable table : document.getTables()) {
    for (XWPFTableRow row : table.getRows()) {
        List<XWPFSDTCell> allTrsdts = extractSDTsFromTableRow(row);
        for (XWPFSDTCell sdt : allTrsdts) {
            //System.out.println(sdt);
            String title = sdt.getTitle();
            String content = sdt.getContent().getText();
            if (!(title == null) && !(title.isEmpty())) {
                System.out.println(title + ": " + content);
            } else {
                System.out.println("====sdt without title====");
                System.out.println(content);
            }
        }
    }
}
...
Using the current Apache POI 5.2.0 it is possible to get the XWPFSDTCell from an XWPFTableRow via XWPFTableRow.getTableICells. This returns a List of ICell, an interface which XWPFSDTCell also implements.
So the following code will get all XWPFSDTCell objects from the tables without the need for low-level XML parsing:
...
for (XWPFTable table : document.getTables()) {
    for (XWPFTableRow row : table.getRows()) {
        for (ICell iCell : row.getTableICells()) {
            if (iCell instanceof XWPFSDTCell) {
                XWPFSDTCell sdt = (XWPFSDTCell)iCell;
                //System.out.println(sdt);
                String title = sdt.getTitle();
                String content = sdt.getContent().getText();
                if (!(title == null) && !(title.isEmpty())) {
                    System.out.println(title + ": " + content);
                } else {
                    System.out.println("====sdt without title====");
                    System.out.println(content);
                }
            }
        }
    }
}
...

Apache POI: Remove Chart from Word Template file entirely

I am using a Word template with Excel graphs which I want to programmatically manipulate with the Java Apache POI library. For this I also need to be able to conditionally delete a Chart which is stored in this template.
Based on Axel Richter's post (Removing chart from PowerPoint slide with Apache POI) I think I am almost there, but when I open the updated Word file it gives an error about unreadable content. This is what I have so far:
PackagePart packagePartChart = xWPFChart.getPackagePart();
PackagePart packagePartWordDoc = xWPFDocument.getPackagePart();
OPCPackage packageWordDoc = packagePartWordDoc.getPackage();
// iterate over all relations the chart has and remove them
for (PackageRelationship chartrelship : packagePartChart.getRelationships()) {
    String partname = chartrelship.getTargetURI().toString();
    PackagePart part = packageWordDoc.getPartsByName(Pattern.compile(partname)).get(0);
    packageWordDoc.removePart(part);
    packagePartChart.removeRelationship(chartrelship.getId());
}
// now remove the chart itself from the word doc
Method removeRelation = POIXMLDocumentPart.class.getDeclaredMethod("removeRelation", POIXMLDocumentPart.class);
removeRelation.setAccessible(true);
removeRelation.invoke(xWPFDocument, xWPFChart);
If I unzip the Word file I correctly see that:
the relation between the Word document and the chart is deleted in '\word\_rels\document.xml.rels'
the chart itself is deleted in folder '\word\charts'
the relations between the chart and its supporting parts are deleted in folder '\word\charts\_rels'
the related chart items themselves are deleted:
StyleN / ColorsN in folder '\word\charts' and
Microsoft_Excel_WorksheetN in folder '\word\embeddings'
Does anybody have an idea what could be going wrong here?
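One detail in the snippet above that may be worth tightening, although it is probably not the cause of the unreadable-content error: Pattern.compile(partname) treats the part name as a regular expression, so the dot in a name such as chart1.xml matches any character. A small sketch in plain Java of quoting the name first; the class name PartNamePattern and the sample part names are made up for illustration.

```java
import java.util.regex.Pattern;

public class PartNamePattern {

    // Builds a pattern that matches the part name literally,
    // so regex metacharacters such as '.' lose their special meaning.
    public static Pattern literal(String partName) {
        return Pattern.compile(Pattern.quote(partName));
    }

    public static void main(String[] args) {
        String partName = "/word/charts/chart1.xml";   // hypothetical part name
        String lookalike = "/word/charts/chart1-xml";  // differs only where the dot is

        // Unquoted: '.' matches '-', so the lookalike is accepted
        System.out.println(Pattern.compile(partName).matcher(lookalike).matches()); // true
        // Quoted: only the exact name matches
        System.out.println(literal(partName).matcher(lookalike).matches());         // false
    }
}
```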

Indeed, finding the right paragraph holding the chart was a challenge. In the end, for simplicity's sake, I added a one-row, one-column placeholder table DIRECTLY before the chart, with a single cell containing a text, let's say 'targetWordString'. With the function below I locate the body element ID of this table:
private Integer iBodyElementIterator(XWPFDocument wordDoc, String targetWordString) {
    Iterator<IBodyElement> iter = wordDoc.getBodyElementsIterator();
    Integer bodyElementID = null;
    // Labeled loop: a match inside the inner run loop must also stop the outer search.
    // Note that if nothing matches, the index of the last body element is returned.
    search:
    while (iter.hasNext()) {
        IBodyElement elem = iter.next();
        bodyElementID = wordDoc.getBodyElements().indexOf(elem);
        if (elem instanceof XWPFParagraph) {
            XWPFParagraph paragraph = (XWPFParagraph) elem;
            for (XWPFRun runText : paragraph.getRuns()) {
                String text = runText.getText(0);
                Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Text: " + text);
                if (text != null && text.equals(targetWordString)) {
                    break search;
                }
            }
        } else if (elem instanceof XWPFTable) {
            if (((XWPFTable) elem).getRow(0) != null && ((XWPFTable) elem).getRow(0).getCell(0) != null) {
                // the first cell holds the name via the template
                String tableTitle = ((XWPFTable) elem).getRow(0).getCell(0).getText();
                if (tableTitle.equals(targetWordString)) {
                    break search;
                }
                Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Text: " + tableTitle);
            } else {
                Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Table removed!");
            }
        } else {
            Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Text: ?");
        }
    }
    return bodyElementID;
}
In the main part of the code I call this function to locate the table, then delete the chart first (ID + 1) and then the table (ID):
int elementIDToBeRemoved = iBodyElementIterator(xWPFWordDoc,targetWordString);
xWPFWordDoc.removeBodyElement(elementIDToBeRemoved + 1);
xWPFWordDoc.removeBodyElement(elementIDToBeRemoved);
It needs to be in this order because the IDs are renumbered as soon as an element in between is deleted; deleting the table first would in principle give its ID to the chart.
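The ordering rule can be illustrated without POI. The sketch below uses an ordinary java.util.List as a stand-in for the document's body elements; the class name IndexRemoval, the method removeWithFollower and the sample element names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexRemoval {

    // Removes the element at 'index' and the element directly after it.
    // The higher index goes first, so the lower index is still valid afterwards.
    public static <T> void removeWithFollower(List<T> elements, int index) {
        elements.remove(index + 1);
        elements.remove(index);
    }

    public static void main(String[] args) {
        // Stand-in for the body elements: placeholder table at 1, chart at 2
        List<String> body = new ArrayList<>(
                Arrays.asList("intro", "placeholder table", "chart", "outro"));
        removeWithFollower(body, 1);
        System.out.println(body); // prints [intro, outro]
    }
}
```

Removing index 1 first and then index 2 would instead delete "placeholder table" and "outro", because "chart" would have moved down to index 1.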

I did find a more structural solution that does not need dummy tables for positioning. The solution is built in two parts:
Determine relationship id of chart in word document
Remove chart based on relationship id and remove paragraph + runs
Determine relationship id of chart in word document
First I have determined with a boolean variable 'chartUsed' whether the chart needs to be deleted.
if (!chartUsed) {
    String chartPartName = xWPFChart.getPackagePart().getPartName().getName();
    String ridChartToBeRemoved = null;
    // iterate over all relations of the document to find the one that targets the chart
    for (RelationPart relation : wordDoc.getRelationParts()) {
        PackageRelationship relShip = relation.getRelationship();
        if (relShip.getTargetURI().toString().equals(chartPartName)) {
            ridChartToBeRemoved = relShip.getId();
            Core.getLogger(logNode).info("Chart with title " + chartTitle + " has relationship id " + relShip.getId());
            break;
        }
    }
    // only attempt the removal when a matching relationship was found
    if (ridChartToBeRemoved != null) {
        removeChartByRelId(wordDoc, ridChartToBeRemoved);
    }
}
(Core.getLogger is an internal API from the Mendix low code platform I use).
At the end I am calling the function removeChartByRelId which will remove the chart if the relation is found within a run of a paragraph:
Remove chart based on relationship id and remove paragraph + runs
private void removeChartByRelId(XWPFDocument wordDoc, String chartRelId) {
    Iterator<IBodyElement> iter = wordDoc.getBodyElementsIterator();
    Integer bodyElementID = null;
    while (iter.hasNext()) {
        IBodyElement elem = iter.next();
        bodyElementID = wordDoc.getBodyElements().indexOf(elem);
        if (elem instanceof XWPFParagraph) {
            Core.getLogger(logNode).trace("XWPFParagraph Body Element ID: " + bodyElementID);
            XWPFParagraph paragraph = (XWPFParagraph) elem;
            boolean removeParagraph = false;
            for (XWPFRun run : paragraph.getRuns()) {
                if (run.getCTR().getDrawingList().size() > 0) {
                    // a chart drawing references its chart part through the relationship id
                    String drawingXMLText = run.getCTR().getDrawingList().get(0).xmlText();
                    boolean isChart = drawingXMLText.contains("http://schemas.openxmlformats.org/drawingml/2006/chart");
                    String graphicDataXMLText = run.getCTR().getDrawingList().get(0).getInlineArray(0).getGraphic().getGraphicData().xmlText();
                    boolean contains = graphicDataXMLText.contains(chartRelId);
                    if (isChart && contains) {
                        Core.getLogger(logNode).info("Removing chart with relId " + chartRelId + " as it isnt used in data");
                        run.getCTR().getDrawingList().clear();
                        removeParagraph = true;
                    }
                }
            }
            if (removeParagraph) {
                // remove from the end: removing by ascending index would skip every other run
                for (int i = paragraph.getRuns().size() - 1; i >= 0; i--) {
                    paragraph.removeRun(i);
                }
            }
        }
    }
}

XWPFDocument - replacing text doesn't work

I want to replace tags with values in a docx document.
Here is a line of the document :
<Site_rattachement>, le <date_avenant>
I want to replace <Site_rattachement> and <date_avenant> by some value.
My code :
doc = new XWPFDocument(OPCPackage.open(docxFile));
for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            replaceIfNeeded(r, text, my_value);
        }
    }
}
But the first r.getText(0) gives me < instead of <Site_rattachement>.
The next occurrence gives me Site_rattachement.
The next occurrence gives me >.
Is there something wrong with my docx file?
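The three fragments reported above suggest that the tag is stored in several consecutive runs, which Word can do with text that looks contiguous on screen. One way to cope with that, sketched here in plain Java with a list of strings standing in for the run texts, is to join the fragments, replace on the joined text, and write the result back into the first fragment. The class name RunMerger, the method replaceAcrossFragments and the value "Paris" are assumptions for illustration; with POI each fragment would correspond to r.getText(0) and r.setText(..., 0), and any formatting that differs between the merged runs would be lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunMerger {

    // Joins all fragments, replaces the tag in the joined text, stores the
    // result in the first fragment and empties the remaining fragments.
    // Fragments are left untouched when the joined text does not contain the tag.
    public static void replaceAcrossFragments(List<String> fragments, String tag, String value) {
        if (fragments.isEmpty()) {
            return;
        }
        StringBuilder joined = new StringBuilder();
        for (String fragment : fragments) {
            if (fragment != null) {   // a run without text yields null
                joined.append(fragment);
            }
        }
        String text = joined.toString();
        if (!text.contains(tag)) {
            return;
        }
        fragments.set(0, text.replace(tag, value));
        for (int i = 1; i < fragments.size(); i++) {
            fragments.set(i, "");
        }
    }

    public static void main(String[] args) {
        // The tag is split over the first three fragments, as in the question
        List<String> fragments = new ArrayList<>(
                Arrays.asList("<", "Site_rattachement", ">", ", le ", "<date_avenant>"));
        replaceAcrossFragments(fragments, "<Site_rattachement>", "Paris");
        System.out.println(String.join("", fragments)); // prints "Paris, le <date_avenant>"
    }
}
```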

How to update a record after extracting

private List<String> getSCFData(int trdCustomerKy, Date lastRunDate, Date currentDate) throws TradeException {
    List<String> reportData = null;
    String paymentDate = EMPTY_STRING;
    String partyId = EMPTY_STRING;
    YOWDAO hdDAO = new YOWDAO(mConnection);
    List<YOWSCFExtractData> reportItems = hdDAO.getSCFData(trdCustomerKy, lastRunDate, currentDate);
    if (null != reportItems && reportItems.size() > 0) {
        reportData = new ArrayList<String>();
        mTracer.log("Total records retrieved: " + reportItems.size());
        for (YOWSCFExtractData data : reportItems) {
            String Source = (null != data.getSource()) ? data.getSource() : BLANK_STRING;
            String paymentCurrencyCd = (null != data.getPaymentCurrencyCd()) ? data.getPaymentCurrencyCd()
                    : BLANK_STRING;
            String sellerName = (null != data.getSellerName()) ? data.getSellerName() : BLANK_STRING;
            String paymentAmount = (null != data.getPaymentAmount()) ? data.getPaymentAmount() : BLANK_STRING;
            if (null != data.getPaymentDate()) {
                paymentDate = DateUtil.formatDate(data.getPaymentDate());
            }
            if (null != data.getapplCifId()) {
                partyId = hdDAO.getPartyId(mConfiguration.getCustomerKy(), data.getapplCifId());
            }
            String dataRow = StringUtils.join(new String[] { Source, data.getBankRef(), partyId, sellerName,
                    data.getPartyId(), paymentAmount, paymentDate, paymentCurrencyCd}, COMMA);
            reportData.add(dataRow);
        }
    }
    return reportData;
}
I am extracting data from an Oracle database. I want to update a column of the record once it has been fetched into a string. For example, after extracting data.getBankRef() I want to write some string back to the database for that record. How would I do that? I am using Hibernate.

What you can do is set whatever values you want on the data object and then save it through Hibernate. If you want to update an existing record, use session.saveOrUpdate(); if you want to save a new record, use session.save(). Hope that helps!

You can write a Hibernate update query
of the form UPDATE table_name SET column_name = (whatever you want) and call this query from your program. I think that would be easier.

Insert a line break inside a paragraph in XWPFDocument

I am writing values into a Word template using Apache POI 3.8. I replace specific strings (keys) in a Word file with the required values; e.g. the Word document has a paragraph containing the key %Entry1%, and I want to replace it with "Entry text line1 \nnew line". All keys and their replacement values are stored in a Map in my implementation.
Map<String, String> replacedElementsMap;
The code for HWPFDocument is:
Range range = document.getRange();
for (Map.Entry<String, String> entry : replacedElementsMap.entrySet()) {
    range.replaceText(entry.getKey(), entry.getValue());
}
This code works fine; I just have to put \n in the entry string for a line break. However, I can't find a similar method for XWPFDocument. My current code for XWPFDocument is:
List<XWPFParagraph> xwpfParagraphs = document.getParagraphs();
for (XWPFParagraph xwpfParagraph : xwpfParagraphs) {
    List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
    for (XWPFRun xwpfRun : xwpfRuns) {
        String xwpfRunText = xwpfRun.getText(xwpfRun.getTextPosition());
        for (Map.Entry<String, String> entry : replacedElementsMap.entrySet()) {
            if (xwpfRunText != null && xwpfRunText.contains(entry.getKey())) {
                xwpfRunText = xwpfRunText.replaceAll(entry.getKey(), entry.getValue());
            }
        }
        xwpfRun.setText(xwpfRunText, 0);
    }
}
Now the "\n" string doesn't result in a carriage return, and if I use xwpfRun.addCarriageReturn(); I just get a line break after the paragraph. How should I create new lines in XWPF correctly?

I have another solution and it is easier:
if (data.contains("\n")) {
    String[] lines = data.split("\n");
    run.setText(lines[0], 0); // set first line into XWPFRun
    for (int i = 1; i < lines.length; i++) {
        // add break and insert new text
        run.addBreak();
        run.setText(lines[i]);
    }
} else {
    run.setText(data, 0);
}
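One thing to keep in mind with the split above: String.split drops trailing empty strings by default, so a value that ends with blank lines would lose them before the breaks are added. A minimal sketch in plain Java of a splitter that keeps them; the class name LineSplitter and the method splitLines are illustrative, and accepting Windows-style "\r\n" as well is an extra assumption that goes beyond the original answer.

```java
public class LineSplitter {

    // Splits on "\n" or "\r\n". The limit -1 keeps trailing empty strings,
    // so blank lines at the end of the value are preserved.
    public static String[] splitLines(String value) {
        return value.split("\\r?\\n", -1);
    }

    public static void main(String[] args) {
        System.out.println("a\n\n".split("\n").length); // prints 1: trailing empties dropped
        System.out.println(splitLines("a\n\n").length);  // prints 3: "a", "", ""
    }
}
```

The resulting array could be fed to the loop above in place of data.split("\n").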

In the end, I had to create paragraphs manually. Basically, I split the replacement string into an array and create a new paragraph for each array element. Here is the code:
protected void replaceElementInParagraphs(List<XWPFParagraph> xwpfParagraphs,
        Map<String, String> replacedMap) {
    // repeat the search until a full pass completes without creating new paragraphs
    if (!searchInParagraphs(xwpfParagraphs, replacedMap)) {
        replaceElementInParagraphs(xwpfParagraphs, replacedMap);
    }
}

private boolean searchInParagraphs(List<XWPFParagraph> xwpfParagraphs, Map<String, String> replacedMap) {
    for (XWPFParagraph xwpfParagraph : xwpfParagraphs) {
        List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
        for (XWPFRun xwpfRun : xwpfRuns) {
            String xwpfRunText = xwpfRun.getText(xwpfRun.getTextPosition());
            for (Map.Entry<String, String> entry : replacedMap.entrySet()) {
                if (xwpfRunText != null && xwpfRunText.contains(entry.getKey())) {
                    if (entry.getValue().contains("\n")) {
                        String[] paragraphs = entry.getValue().split("\n");
                        entry.setValue("");
                        createParagraphs(xwpfParagraph, paragraphs);
                        return false;
                    }
                    xwpfRunText = xwpfRunText.replaceAll(entry.getKey(), entry.getValue());
                }
            }
            xwpfRun.setText(xwpfRunText, 0);
        }
    }
    return true;
}

private void createParagraphs(XWPFParagraph xwpfParagraph, String[] paragraphs) {
    if (xwpfParagraph != null) {
        for (int i = 0; i < paragraphs.length; i++) {
            // insert each new paragraph before the original one, copying alignment and numbering
            XmlCursor cursor = xwpfParagraph.getCTP().newCursor();
            XWPFParagraph newParagraph = document.insertNewParagraph(cursor);
            newParagraph.setAlignment(xwpfParagraph.getAlignment());
            newParagraph.getCTP().insertNewR(0).insertNewT(0).setStringValue(paragraphs[i]);
            newParagraph.setNumID(xwpfParagraph.getNumID());
        }
        document.removeBodyElement(document.getPosOfParagraph(xwpfParagraph));
    }
}
