I'm working with the code from a previous post, Replacing a text in Apache POI XWPF.
I have tried the code below and it runs, but I don't know if I am missing anything: the text is not replaced but appended to the end of what was searched for. For example, I created a basic Word document containing the text "test". When I run the code below, I get a new document with the text "testDOG".
I had to change the original code from String text = r.getText(0) to String text = r.toString() because I kept getting a NullPointerException while running the code.
import java.io.*;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class testPOI {
public static void main(String[] args) throws Exception{
String filepath = "F:\\MASTER_DOC.docx";
String outpath = "F:\\Test.docx";
XWPFDocument doc = new XWPFDocument(OPCPackage.open(filepath));
for (XWPFParagraph p : doc.getParagraphs()){
for (XWPFRun r : p.getRuns()){
String text = r.toString();
if(text.contains("test")) {
text = text.replace("test", "DOG");
r.setText(text);
}
}
}
doc.write(new FileOutputStream(outpath));
}
}
EDIT: Thanks for your help, everyone. I browsed around and found a solution in Replace table column value in Apache POI.
This method replaces search strings in paragraphs and can handle strings that span more than one run.
private long replaceInParagraphs(Map<String, String> replacements, List<XWPFParagraph> xwpfParagraphs) {
long count = 0;
for (XWPFParagraph paragraph : xwpfParagraphs) {
List<XWPFRun> runs = paragraph.getRuns();
for (Map.Entry<String, String> replPair : replacements.entrySet()) {
String find = replPair.getKey();
String repl = replPair.getValue();
TextSegement found = paragraph.searchText(find, new PositionInParagraph());
if ( found != null ) {
count++;
if ( found.getBeginRun() == found.getEndRun() ) {
// whole search string is in one Run
XWPFRun run = runs.get(found.getBeginRun());
String runText = run.getText(run.getTextPosition());
String replaced = runText.replace(find, repl);
run.setText(replaced, 0);
} else {
// The search string spans over more than one Run
// Put the Strings together
StringBuilder b = new StringBuilder();
for (int runPos = found.getBeginRun(); runPos <= found.getEndRun(); runPos++) {
XWPFRun run = runs.get(runPos);
b.append(run.getText(run.getTextPosition()));
}
String connectedRuns = b.toString();
String replaced = connectedRuns.replace(find, repl);
// The first Run receives the replaced String of all connected Runs
XWPFRun partOne = runs.get(found.getBeginRun());
partOne.setText(replaced, 0);
// Removing the text in the other Runs.
for (int runPos = found.getBeginRun()+1; runPos <= found.getEndRun(); runPos++) {
XWPFRun partNext = runs.get(runPos);
partNext.setText("", 0);
}
}
}
}
}
return count;
}
Your logic is not quite right. You need to collate all the text in the runs first and then do the replace. You also need to remove all runs for the paragraph and add a new single run if a match on "test" is found.
Try this instead:
public class testPOI {
public static void main(String[] args) throws Exception{
String filepath = "F:\\MASTER_DOC.docx";
String outpath = "F:\\Test.docx";
XWPFDocument doc = new XWPFDocument(new FileInputStream(filepath));
for (XWPFParagraph p : doc.getParagraphs()){
int numberOfRuns = p.getRuns().size();
// Collate text of all runs
StringBuilder sb = new StringBuilder();
for (XWPFRun r : p.getRuns()){
int pos = r.getTextPosition();
if(r.getText(pos) != null) {
sb.append(r.getText(pos));
}
}
// Continue if there is text and contains "test"
if(sb.length() > 0 && sb.toString().contains("test")) {
// Remove all existing runs, last to first, so the remaining indexes stay valid
for(int i = numberOfRuns - 1; i >= 0; i--) {
p.removeRun(i);
}
String text = sb.toString().replace("test", "DOG");
// Add new run with updated text; createRun() already appends it to the paragraph
XWPFRun run = p.createRun();
run.setText(text);
}
}
doc.write(new FileOutputStream(outpath));
}
}
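To see why the text has to be collated before replacing, here is a minimal plain-Java sketch. It does not use POI at all: each string stands in for the text of one run, and the class and method names (RunCollationSketch, replacePerRun, replaceCollated) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each String stands in for the text of one XWPFRun.
final class RunCollationSketch {

    // Replaces inside each run separately; a match split across two runs is missed.
    static List<String> replacePerRun(List<String> runs, String find, String repl) {
        List<String> out = new ArrayList<>();
        for (String run : runs) {
            out.add(run.replace(find, repl));
        }
        return out;
    }

    // Joins all runs, replaces once, and returns a single run holding the result.
    static List<String> replaceCollated(List<String> runs, String find, String repl) {
        String joined = String.join("", runs);
        List<String> out = new ArrayList<>();
        out.add(joined.replace(find, repl));
        return out;
    }

    public static void main(String[] args) {
        // Word may split "test" across runs, e.g. after an edit in the middle of the word.
        List<String> runs = Arrays.asList("This is a te", "st document");
        System.out.println(replacePerRun(runs, "test", "DOG"));
        System.out.println(replaceCollated(runs, "test", "DOG"));
    }
}
```

The price of collapsing everything into one run is that any formatting that differed between the original runs is lost.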
Worth noting: run.getTextPosition() returns -1 in most cases. That does no harm when a run has only one text position, but technically a run can have any number of text positions, and I have seen such cases. So the best way is to call getCTR() on the run and iterate over all of its text positions. The number of text positions equals ctrRun.sizeOfTArray().
Sample code:
for (XWPFRun run : p.getRuns()){
CTR ctrRun = run.getCTR();
int sizeOfCtr = ctrRun.sizeOfTArray();
for(int textPosition = 0; textPosition < sizeOfCtr; textPosition++){
String text = run.getText(textPosition);
if(text != null && text.contains("test")) {
text = text.replace("test", "DOG");
run.setText(text, textPosition);
}
}
}
Just change the text of every run in your paragraph, and then save the file.
This code worked for me:
XWPFDocument doc = new XWPFDocument(new FileInputStream(filepath));
for (XWPFParagraph p : doc.getParagraphs()) {
StringBuilder sb = new StringBuilder();
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null && text.contains("variable1")) {
text = text.replace("variable1", "valeur1");
r.setText(text, 0);
}
if (text != null && text.contains("variable2")) {
text = text.replace("variable2", "valeur2");
r.setText(text, 0);
}
if (text != null && text.contains("variable3")) {
text = text.replace("variable3", "valeur3");
r.setText(text, 0);
}
}
}
doc.write(new FileOutputStream(outpath));
My goal is to search for a word or a phrase in a Word .docx document, and add a comment to it. I have been referring to the sample code found here, here, and here with regards to adding comments using Apache POI. However, all three examples add comments to a whole paragraph (or even a whole table) rather than to a specific word, or run.
I have tried creating an XML cursor at the run level, but cannot cast it to the necessary CTMarkupRange to apply the start and end of the comment.
// Create comment
BigInteger cId = getCommentId(comments);
ctComment = comments.addNewComment();
ctComment.setAuthor("John Smith");
ctComment.setInitials("JS");
ctComment.setDate(new GregorianCalendar(Locale.getDefault()));
ctComment.addNewP().addNewR().addNewT().setStringValue("Test Comment");
ctComment.setId(cId);
// Set CommentRangeStart
String uri = CTMarkupRange.type.getName().getNamespaceURI();
String localPart = "commentRangeStart";
// XmlCursor cursor = p.getCTP().newCursor();
XmlCursor cursor = r.getCTR().newCursor();
cursor.toFirstChild();
cursor.beginElement(localPart, uri);
cursor.toParent();
CTMarkupRange commentRangeStart = (CTMarkupRange) cursor.getObject(); // This line throws a ClassCastException error
cursor.dispose();
commentRangeStart.setId(cId);
// Set CommentRangeEnd and CommentReference
p.getCTP().addNewCommentRangeEnd().setId(cId);
// p.getCTP().addNewR().addNewCommentReference().setId(cId);
r.getCTR().addNewCommentReference().setId(cId);
EDIT1: Snippet showing the logic for looping through the runs
for(XWPFParagraph p:paragraphs){
List<XWPFRun> runs = p.getRuns();
if (runs.size() > 0) {
for (XWPFRun r : runs) {
String text = r.getText(0);
for (Map.Entry<String, List<String>> entry : rules.entrySet()) {
String key = entry.getKey();
List<String> value = entry.getValue();
for (int i = 0; i < value.size(); i++) {
if (text != null && regexContains(text, value.get(i))) {
// Create comment
BigInteger cId = getCommentId(comments);
ctComment = comments.addNewComment();
ctComment.setAuthor("John Smith");
ctComment.setInitials("JS");
ctComment.setDate(new GregorianCalendar(Locale.getDefault()));
ctComment.addNewP().addNewR().addNewT().setStringValue(key);
ctComment.setId(cId);
// New snippet from Axel Richter
p.getCTP().addNewCommentRangeStart().setId(cId);
p.getCTP().addNewCommentRangeEnd().setId(cId);
p.getCTP().addNewR().addNewCommentReference().setId(cId);
}
}
}
}
}
}
This is not as difficult as you might think.
To comment a run inside a paragraph, the comment range start needs to be set before text run starts in paragraph. The comment range end needs to be set after text run ends in paragraph. This is exactly what my code examples had done already. Of course all paragraphs in my code examples have had only one text run.
In the following complete example the second comment applies to the word "second" only. To achieve this the paragraph has three text runs: the first with the text "Paragraph with the ", the second with the text "second", which carries the comment, and the third with the text " comment.".
import java.io.*;
import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;
import org.apache.xmlbeans.*;
import org.apache.poi.xwpf.usermodel.*;
import static org.apache.poi.ooxml.POIXMLTypeLoader.DEFAULT_XML_OPTIONS;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import javax.xml.namespace.QName;
import java.math.BigInteger;
import java.util.GregorianCalendar;
import java.util.Locale;
public class CreateWordWithComments {
//a method for creating the CommentsDocument /word/comments.xml in the *.docx ZIP archive
private static MyXWPFCommentsDocument createCommentsDocument(XWPFDocument document) throws Exception {
OPCPackage oPCPackage = document.getPackage();
PackagePartName partName = PackagingURIHelper.createPartName("/word/comments.xml");
PackagePart part = oPCPackage.createPart(partName, "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml");
MyXWPFCommentsDocument myXWPFCommentsDocument = new MyXWPFCommentsDocument(part);
String rId = document.addRelation(null, XWPFRelation.COMMENT, myXWPFCommentsDocument).getRelationship().getId();
return myXWPFCommentsDocument;
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument();
MyXWPFCommentsDocument myXWPFCommentsDocument = createCommentsDocument(document);
CTComments comments = myXWPFCommentsDocument.getComments();
CTComment ctComment;
XWPFParagraph paragraph;
XWPFRun run;
//first comment
BigInteger cId = BigInteger.ZERO;
ctComment = comments.addNewComment();
ctComment.setAuthor("Axel Ríchter");
ctComment.setInitials("AR");
ctComment.setDate(new GregorianCalendar(Locale.US));
ctComment.addNewP().addNewR().addNewT().setStringValue("The first comment.");
ctComment.setId(cId);
paragraph = document.createParagraph();
paragraph.getCTP().addNewCommentRangeStart().setId(cId); //comment range start is set before text run
run = paragraph.createRun();
run.setText("Paragraph with the first comment.");
paragraph.getCTP().addNewCommentRangeEnd().setId(cId); //comment range end is set after text run
paragraph.getCTP().addNewR().addNewCommentReference().setId(cId);
//paragraph without comment
paragraph = document.createParagraph();
run = paragraph.createRun();
run.setText("Paragraph without comment.");
//second comment
cId = cId.add(BigInteger.ONE);
ctComment = comments.addNewComment();
ctComment.setAuthor("Axel Ríchter");
ctComment.setInitials("AR");
ctComment.setDate(new GregorianCalendar(Locale.US));
ctComment.addNewP().addNewR().addNewT().setStringValue("The second comment. Comments the word \"second\".");
ctComment.setId(cId);
paragraph = document.createParagraph();
run = paragraph.createRun();
run.setText("Paragraph with the ");
paragraph.getCTP().addNewCommentRangeStart().setId(cId); //comment range start is set before text run
run = paragraph.createRun();
run.setText("second");
paragraph.getCTP().addNewCommentRangeEnd().setId(cId); //comment range end is set after text run
run = paragraph.createRun();
run.setText(" comment.");
paragraph.getCTP().addNewR().addNewCommentReference().setId(cId);
//write document
FileOutputStream out = new FileOutputStream("CreateWordWithComments.docx");
document.write(out);
out.close();
document.close();
}
//a wrapper class for the CommentsDocument /word/comments.xml in the *.docx ZIP archive
private static class MyXWPFCommentsDocument extends POIXMLDocumentPart {
private CTComments comments;
private MyXWPFCommentsDocument(PackagePart part) throws Exception {
super(part);
comments = CommentsDocument.Factory.newInstance().addNewComments();
}
private CTComments getComments() {
return comments;
}
@Override
protected void commit() throws IOException {
XmlOptions xmlOptions = new XmlOptions(DEFAULT_XML_OPTIONS);
xmlOptions.setSaveSyntheticDocumentElement(new QName(CTComments.type.getName().getNamespaceURI(), "comments"));
PackagePart part = getPackagePart();
OutputStream out = part.getOutputStream();
comments.save(out, xmlOptions);
out.close();
}
}
}
I'm trying to use PDFBox 2.0 to blank out or delete a text pattern (in my case I want to remove every "[QR]" string from the whole PDF), but I can't find anything that works for me.
I tried iText too, with the same result: nothing works.
The "[QR]" strings in my PDF were edited after the PDF was created; maybe that's why they don't appear in Tj operators?
My main:
replaceText(documentoPDF, "[QR]", "");
My method (I printed the Tj values and my pattern doesn't appear there):
public void replaceText(PDDocument documentoPDF, String searchString, String replacement) throws IOException{
for ( PDPage page : documentoPDF.getPages()){
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List<?> tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++){
Object next = tokens.get(j);
if (next instanceof Operator){
Operator op = (Operator) next;
String pstring = "";
int prej = 0;
//Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj"))
{
// Tj takes one operand, the string to display, so let's update that operand
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else
if (op.getName().equals("TJ"))
{
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++)
{
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString)
{
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
if (j == prej) {
pstring += string;
} else {
prej = j;
pstring = string;
}
}
}
System.out.println(pstring.trim());
if (searchString.equals(pstring.trim()))
{
COSString cosString2 = (COSString) previous.getObject(0);
cosString2.setValue(replacement.getBytes());
int total = previous.size()-1;
for (int k = total; k > 0; k--) {
previous.remove(k);
}
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(documentoPDF);
OutputStream out = updatedStream.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
out.close();
page.setContents(updatedStream);
}
documentoPDF.save("resources\\resultado\\nuevo.pdf");
}
This is an example PDF with some [QR] patterns: http://www.mediafire.com/file/9w3kkc4yozwsfms/file
If someone can help, I would appreciate it.
I can upload my entire project if needed.
Thanks in advance.
As already mentioned in the comments, the reason why your code doesn't work is simple: you completely ignore the encoding of the font of that text. In the content stream there actually are [( >) ( 4) ( 5) ( #) ] TJ instructions (the "spaces" before '>', '4', '5', and '#' actually are zero bytes, 0x00). Thus, apparently the encoding is some 16-bit encoding which additionally does not have ASCII naturally embedded.
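A small plain-Java sketch may make this concrete. It does not use PDFBox; it only takes the eight bytes quoted above and decodes them one byte per character, which is roughly what treating the operand as an ordinary string amounts to. The class name GlyphCodeSketch is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; the byte values are the TJ operands quoted above.
final class GlyphCodeSketch {

    // Decodes raw content-stream bytes one byte per character.
    static String naiveDecode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Four two-byte codes: 0x00 '>', 0x00 '4', 0x00 '5', 0x00 '#'
        byte[] operand = {0x00, '>', 0x00, '4', 0x00, '5', 0x00, '#'};
        String naive = naiveDecode(operand);
        // The search string never shows up, so replaceFirst/equals cannot match it.
        System.out.println(naive.contains("[QR]"));
        System.out.println(naive.length());
    }
}
```

The bytes are glyph codes, not characters; which code stands for '[', 'Q', 'R' or ']' is known only to the font, and this sketch does not model that mapping.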
To properly take the font into account one has to keep track of the current font. This means parsing the whole content stream and analyzing text font setting calls, save graphics state calls, and restore graphics state calls. Then you have to retrieve the proper font object from the correct resources.
All this actually is already done by the PDFBox content parsing framework used for e.g. text extraction. Thus, we can create a content stream editor around this framework.
Actually, this also has already been done, see the PdfContentStreamEditor from this answer.
As in case of your document the text pieces to delete are drawn by a single text drawing instruction each and each of these instructions draws only a text piece to remove, we can simply look at the text the current instruction draws and then decide whether to keep the instruction or not:
PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
final StringBuilder recentChars = new StringBuilder();
@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement)
throws IOException {
String string = font.toUnicode(code);
if (string != null)
recentChars.append(string);
super.showGlyph(textRenderingMatrix, font, code, displacement);
}
@Override
protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
String recentText = recentChars.toString();
recentChars.setLength(0);
String operatorString = operator.getName();
if (TEXT_SHOWING_OPERATORS.contains(operatorString) && "[QR]".equals(recentText))
{
return;
}
super.write(contentStreamWriter, operator, operands);
}
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
editor.processPage(page);
}
document.save("nuevo-noQrText.pdf");
(EditPageContent test testRemoveQrTextNuevo)
Depending on your PDFBox version the showGlyph method to override may have a fifth parameter; thus, please check the showGlyph signature of your PDFBox copy and adapt if this code does not work. Thanks to @DanielNorberg for the hint!
In the result the "[QR]" texts underneath the QR codes have vanished.
I have one PDF file, which contains 60 pages. Each page carries an invoice number, and some numbers repeat across several pages. I'm using Apache PDFBox.
import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.util.*;
import java.util.regex.*;
public class PDFtest1 {
public static void main(String[] args){
PDDocument pd;
try {
File input = new File("G:\\Sales.pdf");
// StringBuilder to store the extracted text
StringBuilder sb = new StringBuilder();
pd = PDDocument.load(input);
PDFTextStripper stripper = new PDFTextStripper();
// Add text to the StringBuilder from the PDF
sb.append(stripper.getText(pd));
Pattern p = Pattern.compile("Invoice No.\\s\\w\\d\\d\\d\\d\\d\\d\\d\\d\\d\\d");
// Matcher refers to the actual text where the pattern will be found
Matcher m = p.matcher(sb);
while (m.find()){
// group() method refers to the next number that follows the pattern we have specified.
System.out.println(m.group());
}
if (pd != null) {
pd.close();
}
} catch (Exception e){
e.printStackTrace();
}
}
}
I'm able to read all the invoice numbers using a Java regex.
The result is as follows:
run:
Invoice No. D0000003010
Invoice No. D0000003011
Invoice No. D0000003011
Invoice No. D0000003011
Invoice No. D0000003011
Invoice No. D0000003012
Invoice No. D0000003012
Invoice No. D0000003012
Invoice No. D0000003013
Invoice No. D0000003013
Invoice No. D0000003014
Invoice No. D0000003014
Invoice No. D0000003015
Invoice No. D0000003016
I need to split the PDF according to the invoice numbers. For example, all pages with Invoice No. D0000003011 should be merged into a single PDF, and so on.
How can I achieve this?
public static void main(String[] args) throws IOException, COSVisitorException
{
File input = new File("G:\\Sales.pdf");
PDDocument outputDocument = null;
PDDocument inputDocument = PDDocument.loadNonSeq(input, null);
PDFTextStripper stripper = new PDFTextStripper();
String currentNo = null;
for (int page = 1; page <= inputDocument.getNumberOfPages(); ++page)
{
stripper.setStartPage(page);
stripper.setEndPage(page);
String text = stripper.getText(inputDocument);
Pattern p = Pattern.compile("Invoice No.(\\s\\w\\d\\d\\d\\d\\d\\d\\d\\d\\d\\d)");
// Matcher refers to the actual text where the pattern will be found
Matcher m = p.matcher(text);
String no = null;
if (m.find())
{
no = m.group(1);
}
System.out.println("page: " + page + ", value: " + no);
PDPage pdPage = (PDPage) inputDocument.getDocumentCatalog().getAllPages().get(page - 1);
if (no != null && !no.equals(currentNo))
{
saveCloseCurrent(currentNo, outputDocument);
// create new document
outputDocument = new PDDocument();
currentNo = no;
}
if (no == null && currentNo == null)
{
System.out.println ("header page ??? " + page + " skipped");
continue;
}
// append page to current document
outputDocument.importPage(pdPage);
}
saveCloseCurrent(currentNo, outputDocument);
inputDocument.close();
}
private static void saveCloseCurrent(String currentNo, PDDocument outputDocument)
throws IOException, COSVisitorException
{
// save to new output file
if (currentNo != null)
{
// save document into file
File f = new File(currentNo + ".pdf");
if (f.exists())
{
System.err.println("File " + f + " exists?!");
System.exit(-1);
}
outputDocument.save(f);
outputDocument.close();
}
}
Beware:
this has not been tested with your file (because I don't have it);
the code makes the assumption that identical invoice numbers are always together;
your regular expression has been changed slightly;
make sure that the first and the last PDF files are correct, and check a few at random, and with different viewers if available;
verify that the total count of files is as expected;
the summed up size of all files will be bigger than the source file, this is because of the font resources;
use the 1.8.10 version. Don't use PDFBox 0.7.3.jar at the same time!
error handling is very basic, you need to change it;
update 19.8.2015:
it now supports pages with no invoice number, these will be appended.
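The grouping logic can be tried out without PDFBox. The following is only a sketch under stated assumptions: the page texts are passed in as plain strings (in the real code they come from the stripper), the class name InvoiceGroupingSketch is made up, and the pattern is a variant of the one above with the dot escaped and the whitespace kept out of the captured group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides which pages belong to which invoice number.
final class InvoiceGroupingSketch {

    // One word character followed by ten digits, e.g. D0000003011.
    private static final Pattern INVOICE = Pattern.compile("Invoice No\\.\\s(\\w\\d{10})");

    // Maps each invoice number to the 1-based page numbers that belong to it.
    // A page without a number is appended to the invoice in progress;
    // pages before the first number are skipped.
    static Map<String, List<Integer>> groupPages(List<String> pageTexts) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        String current = null;
        for (int i = 0; i < pageTexts.size(); i++) {
            Matcher m = INVOICE.matcher(pageTexts.get(i));
            if (m.find()) {
                current = m.group(1);
            }
            if (current == null) {
                continue; // header page before any invoice
            }
            groups.computeIfAbsent(current, k -> new ArrayList<>()).add(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "Invoice No. D0000003010 first",
                "Invoice No. D0000003011 second",
                "continuation page",
                "Invoice No. D0000003011 fourth");
        System.out.println(groupPages(pages));
    }
}
```

Because the pages are collected in a map, this variant does not rely on identical invoice numbers being adjacent; each list of page numbers can then be imported into its own output document.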
I've been trying to work through the examples FieldMailMerge and VariableReplace but can't seem to get a local test case running. I'm basically trying to start with one docx template document and have it create x docx documents from that one template with the variables replaced.
In the code below docx4jReplaceSimpleTest() tries to replace a single variable but fails to do so. The ${} values in the template files are removed as part of the processing therefore I believe it's finding them but not replacing them for some reason. I understand it could be due to formatting as explained in the comments of the sample code but for troubleshooting just to get something working I'm trying it anyways.
In the code below docx4jReplaceTwoPeopleTest(), the one I want to get working, I'm trying to do it in what I believe is the proper way, but that's not finding or replacing anything. It's not even removing the ${} from the docx file.
public static void main(String[] args) throws Exception
{
docx4jReplaceTwoPeopleTest();
docx4jReplaceSimpleTest();
}
private static void docx4jReplaceTwoPeopleTest() throws Exception
{
String docxFile = "C:/temp/template.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(docxFile));
List<Map<DataFieldName, String>> data = new ArrayList<Map<DataFieldName, String>>();
Map<DataFieldName, String> map1 = new HashMap<DataFieldName, String>();
map1.put(new DataFieldName("Person.Firstname"), "myFirstname");
map1.put(new DataFieldName("Person.Lastname"), "myLastname");
data.add(map1);
Map<DataFieldName, String> map2 = new HashMap<DataFieldName, String>();
map2.put(new DataFieldName("Person.Firstname"), "myFriendsFirstname");
map2.put(new DataFieldName("Person.Lastname"), "myFriendsLastname");
data.add(map2);
org.docx4j.model.fields.merge.MailMerger.setMERGEFIELDInOutput(OutputField.KEEP_MERGEFIELD);
int x=0;
for(Map<DataFieldName, String> docMapping: data)
{
org.docx4j.model.fields.merge.MailMerger.performMerge(wordMLPackage, docMapping, true);
wordMLPackage.save(new java.io.File("C:/temp/OUT__MAIL_MERGE_" + x++ + ".docx") );
}
}
private static void docx4jReplaceSimpleTest() throws Exception
{
String docxFile = "C:/temp/template.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(docxFile));
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("Person.Firstname", "myFirstname");
mappings.put("Person.Lastname", "myLastname");
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
documentPart.variableReplace(mappings);
wordMLPackage.save(new java.io.File("C:/temp/OUT_SIMPLE.docx") );
}
The docx file consists of the following text (no formatting is done):
This is a letter to someone
Hi ${Person.Firstname} ${Person.Lastname},
How are you?
Thank you again. I wish to see you soon ${Person.Firstname}
Regards,
Someone
Notice that I'm also trying to replace Person.Firstname at least twice as well. As the lastname is not even replaced I don't think this has anything to do with it but I'm adding it just in case.
I had the same issue, and of course I could not force users to do extra work when composing their Word documents, so I decided to write an algorithm that scans the whole document for expressions, appending run after run, inserts the replacement values, and removes the original expressions in a second pass. In case other people need it, below is what I did. I got the class from somewhere, so it may look familiar; I just added the method searchAndReplace().
package com.my.docx4j;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.Text;
public class Docx4j {
public static void main(String[] args) throws Docx4JException, IOException, JAXBException {
String filePath = "C:\\Users\\markamm\\Documents\\tmp\\";
String file = "Hello.docx";
Docx4j docx4j = new Docx4j();
WordprocessingMLPackage template = docx4j.getTemplate(filePath+file);
// MainDocumentPart documentPart = template.getMainDocumentPart();
List<Object> texts = getAllElementFromObject(
template.getMainDocumentPart(), Text.class);
searchAndReplace(texts, new HashMap<String, String>(){
{
this.put("${abcd_efg.soanother_hello_broken_shit}", "Company Name here...");
this.put("${I_dont_know}", "Hmmm lemme see");
this.put("${${damn.right_lol}", "Gotcha!!!");
this.put("${one_here_and}", "Firstname");
this.put("${one}", "ChildA");
this.put("${two}", "ChildB");
this.put("${three}", "ChildC");
}
@Override
public String get(Object key) {
// TODO Auto-generated method stub
return super.get(key);
}
});
docx4j.writeDocxToStream(template, filePath+"Hello2.docx");
}
public static void searchAndReplace(List<Object> texts, Map<String, String> values){
// -- scan all expressions
// Will later contain all the expressions used though not used at the moment
List<String> els = new ArrayList<String>();
StringBuilder sb = new StringBuilder();
int PASS = 0;
int PREPARE = 1;
int READ = 2;
int mode = PASS;
// to nullify
List<int[]> toNullify = new ArrayList<int[]>();
int[] currentNullifyProps = new int[4];
// Do scan of els and immediately insert value
for(int i = 0; i<texts.size(); i++){
Object text = texts.get(i);
Text textElement = (Text) text;
String newVal = "";
String v = textElement.getValue();
// System.out.println("text: "+v);
StringBuilder textSofar = new StringBuilder();
int extra = 0;
char[] vchars = v.toCharArray();
for(int col = 0; col<vchars.length; col++){
char c = vchars[col];
textSofar.append(c);
switch(c){
case '$': {
mode=PREPARE;
sb.append(c);
// extra = 0;
} break;
case '{': {
if(mode==PREPARE){
sb.append(c);
mode=READ;
currentNullifyProps[0]=i;
currentNullifyProps[1]=col+extra-1;
System.out.println("extra-- "+extra);
} else {
if(mode==READ){
// consecutive opening curl found. just read it
// but supposedly throw error
sb = new StringBuilder();
mode=PASS;
}
}
} break;
case '}': {
if(mode==READ){
mode=PASS;
sb.append(c);
els.add(sb.toString());
newVal +=textSofar.toString()
+(null==values.get(sb.toString())?sb.toString():values.get(sb.toString()));
textSofar = new StringBuilder();
currentNullifyProps[2]=i;
currentNullifyProps[3]=col+extra;
toNullify.add(currentNullifyProps);
currentNullifyProps = new int[4];
extra += sb.toString().length();
sb = new StringBuilder();
} else if(mode==PREPARE){
mode = PASS;
sb = new StringBuilder();
}
} break;
default: {
if(mode==READ) sb.append(c);
else if(mode==PREPARE){
mode=PASS;
sb = new StringBuilder();
}
}
}
}
newVal +=textSofar.toString();
textElement.setValue(newVal);
}
// remove original expressions
if(toNullify.size()>0)
for(int i = 0; i<texts.size(); i++){
if(toNullify.size()==0) break;
currentNullifyProps = toNullify.get(0);
Object text = texts.get(i);
Text textElement = (Text) text;
String v = textElement.getValue();
StringBuilder nvalSB = new StringBuilder();
char[] textChars = v.toCharArray();
for(int j = 0; j<textChars.length; j++){
char c = textChars[j];
if(null==currentNullifyProps) {
nvalSB.append(c);
continue;
}
// I know 100000 is too much!!! And so what???
int floor = currentNullifyProps[0]*100000+currentNullifyProps[1];
int ceil = currentNullifyProps[2]*100000+currentNullifyProps[3];
int head = i*100000+j;
if(!(head>=floor && head<=ceil)){
nvalSB.append(c);
}
if(j>currentNullifyProps[3] && i>=currentNullifyProps[2]){
toNullify.remove(0);
if(toNullify.size()==0) {
currentNullifyProps = null;
continue;
}
currentNullifyProps = toNullify.get(0);
}
}
textElement.setValue(nvalSB.toString());
}
}
private WordprocessingMLPackage getTemplate(String name)
throws Docx4JException, FileNotFoundException {
WordprocessingMLPackage template = WordprocessingMLPackage
.load(new FileInputStream(new File(name)));
return template;
}
private static List<Object> getAllElementFromObject(Object obj,
Class<?> toSearch) {
List<Object> result = new ArrayList<Object>();
if (obj instanceof JAXBElement)
obj = ((JAXBElement<?>) obj).getValue();
if (obj.getClass().equals(toSearch))
result.add(obj);
else if (obj instanceof ContentAccessor) {
List<?> children = ((ContentAccessor) obj).getContent();
for (Object child : children) {
result.addAll(getAllElementFromObject(child, toSearch));
}
}
return result;
}
private void replacePlaceholder(WordprocessingMLPackage template,
String name, String placeholder) {
List<Object> texts = getAllElementFromObject(
template.getMainDocumentPart(), Text.class);
for (Object text : texts) {
Text textElement = (Text) text;
if (textElement.getValue().equals(placeholder)) {
textElement.setValue(name);
}
}
}
private void writeDocxToStream(WordprocessingMLPackage template,
String target) throws IOException, Docx4JException {
File f = new File(target);
template.save(f);
}
}
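For comparison, the same idea can be sketched in a more compact form in plain Java. This is not the code above and does not use docx4j: the strings stand in for the values of consecutive Text elements, a regular expression replaces the hand-written state machine, and the names (PlaceholderSketch, replaceAcrossFragments) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model: each String stands in for the value of one Text element.
final class PlaceholderSketch {

    // Matches ${...} with no nested braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{[^{}]*\\}");

    // Replaces placeholders even when one is split over several fragments.
    // The replacement goes into the fragment where the placeholder starts;
    // the rest of the placeholder is dropped from the following fragments.
    // Placeholders without a mapping are left as they are.
    static List<String> replaceAcrossFragments(List<String> fragments, Map<String, String> values) {
        String joined = String.join("", fragments);
        int[] owner = new int[joined.length()]; // owner[i] = fragment holding character i
        int idx = 0;
        for (int f = 0; f < fragments.size(); f++) {
            for (int k = 0; k < fragments.get(f).length(); k++) {
                owner[idx++] = f;
            }
        }
        List<StringBuilder> out = new ArrayList<>();
        for (int f = 0; f < fragments.size(); f++) {
            out.add(new StringBuilder());
        }
        Matcher m = PLACEHOLDER.matcher(joined);
        int pos = 0;
        while (m.find()) {
            // Copy the untouched text before the match back to its own fragment.
            for (int i = pos; i < m.start(); i++) {
                out.get(owner[i]).append(joined.charAt(i));
            }
            out.get(owner[m.start()]).append(values.getOrDefault(m.group(), m.group()));
            pos = m.end();
        }
        for (int i = pos; i < joined.length(); i++) {
            out.get(owner[i]).append(joined.charAt(i));
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : out) {
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("${Person.Firstname}", "myFirstname");
        values.put("${Person.Lastname}", "myLastname");
        List<String> texts = Arrays.asList("Hi ${Person.", "Firstname} ${Person.Last", "name},");
        System.out.println(replaceAcrossFragments(texts, values));
    }
}
```

Text outside a placeholder stays in the fragment it came from, so the formatting of the surrounding runs is not disturbed.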
The issue is that I was trying to create the placeholders as just plain text within the docx file. What I should've been doing instead is using the MergeField functionality within Word which I didn't fully understand and appreciate, hence the confusion. Basically I didn't know that this is what was being meant within the documentation because I'd never used it, I just assumed it was still some kind of xml text replacement.
That being said, it's still fairly difficult to find a good explanation of this Word feature; after looking at a few dozen I still couldn't find a nice, clean one. The best explanation I was able to find can be found here. Basically you want to do Step 3.
That being said, once I created MergeFields in Word and ran the code, it worked perfectly. The method to use is docx4jReplaceTwoPeopleTest. The problem wasn't in the code but in my understanding of how it worked within Word.
I am able to read tables from a .doc file (see the following code):
public String readDocFile(String filename, String str) {
try {
InputStream fis = new FileInputStream(filename);
POIFSFileSystem fs = new POIFSFileSystem(fis);
HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
boolean intable = false;
boolean inrow = false;
for (int i = 0; i < range.numParagraphs(); i++) {
Paragraph par = range.getParagraph(i);
//System.out.println("paragraph "+(i+1));
//System.out.println("is in table: "+par.isInTable());
//System.out.println("is table row end: "+par.isTableRowEnd());
//System.out.println(par.text());
if (par.isInTable()) {
if (!intable) {//System.out.println("New table creating"+intable);
str += "<table border='1'>";
intable = true;
}
if (!inrow) {//System.out.println("New row creating"+inrow);
str += "<tr>";
inrow = true;
}
if (par.isTableRowEnd()) {
inrow = false;
} else {
//System.out.println("New text adding"+par.text());
str += "<td>" + par.text() + "</td>";
}
} else {
if (inrow) {//System.out.println("Closing Row");
str += "</tr>";
inrow = false;
}
if (intable) {//System.out.println("Closing Table");
str += "</table>";
intable = false;
}
str += par.text() + "<br/>";
}
}
} catch (Exception e) {
System.out.println("Exception: " + e);
}
return str;
}
Can anyone suggest how I can do the same with a .docx file?
I tried, but could not locate a replacement for the 'Range' class.
Please help.
By popular request, promoting a comment to an answer...
In the Apache POI code examples, you can find the XWPF SimpleTable example
This shows how to create a simple table, and how to create one with lots of fancy styling.
Assuming you just want a simple table from scratch, in a brand new document, then the code you need goes along the lines of:
// Start with a new document
XWPFDocument doc = new XWPFDocument();
// Add a 3 column, 3 row table
XWPFTable table = doc.createTable(3, 3);
// Set some text in the middle
table.getRow(1).getCell(1).setText("EXAMPLE OF TABLE");
// table cells have a list of paragraphs; there is an initial
// paragraph created when the cell is created. If you create a
// paragraph in the document to put in the cell, it will also
// appear in the document following the table, which is probably
// not the desired result.
XWPFParagraph p1 = table.getRow(0).getCell(0).getParagraphs().get(0);
XWPFRun r1 = p1.createRun();
r1.setBold(true);
r1.setText("The quick brown fox");
r1.setItalic(true);
r1.setFontFamily("Courier");
r1.setUnderline(UnderlinePatterns.DOT_DOT_DASH);
r1.setTextPosition(100);
// And at the end
table.getRow(2).getCell(2).setText("only text");
// Save it out, to view in word
FileOutputStream out = new FileOutputStream("simpleTable.docx");
doc.write(out);
out.close();
The following snippet uses Apache POI 5.0.0, and it works well when reading docx table data
public void readDocxTables(String docxFilePath) throws FileNotFoundException, IOException {
XWPFDocument doc = new XWPFDocument(new FileInputStream(docxFilePath));
for(XWPFTable table : doc.getTables()) {
for(XWPFTableRow row : table.getRows()) {
for(XWPFTableCell cell : row.getTableCells()) {
System.out.println("cell text: " + cell.getText());
}
}
}
}
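If the goal is the same HTML output as in the .doc version from the question, the cell texts collected in such a loop can be turned into markup with a small helper. This is only a sketch: the rows are passed in as plain lists of strings rather than POI objects, the class name HtmlTableSketch is made up, and the escaping of &, < and > is an addition that the original code did not have.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: renders rows of cell texts as an HTML table.
final class HtmlTableSketch {

    // Minimal escaping so cell text cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toHtml(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table border='1'>");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell)).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Amount"),
                Arrays.asList("A & B", "10"));
        System.out.println(toHtml(rows));
    }
}
```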
This does not use Apache POI, but I found it much easier with a third-party component.
Here is an example of how to get tables from a docx file.
Of course, this is just an idea in case you do not find a solution with POI.