I have written Java code that replaces string patterns in a template and then generates an output docx file using Apache POI. Replacing the patterns in headers and paragraphs was easy, but I ran into a problem when replacing inside text boxes. I am using the code provided by Axel Richter in "Replace text in text box of docx by using Apache POI", but it trims some white space in each run.
For example:
cp -r basedir destination
Becomes:
cp-r basedir destination
The part of the code responsible for this substitution is shown below (doc_buffer is an XWPFDocument; pattern and replacement are both Strings):
for (XWPFParagraph paragraph : doc_buffer.getParagraphs()) {
    XmlCursor cursor = paragraph.getCTP().newCursor();
    cursor.selectPath(
        "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:txbxContent/w:p/w:r");
    List<XmlObject> ctrsintxtbx = new ArrayList<XmlObject>();
    while (cursor.hasNextSelection()) {
        cursor.toNextSelection();
        XmlObject obj = cursor.getObject();
        ctrsintxtbx.add(obj);
    }
    for (XmlObject obj : ctrsintxtbx) {
        CTR ctr = CTR.Factory.parse(obj.toString());
        XWPFRun bufferrun = new XWPFRun(ctr, (IRunBody) paragraph);
        String text = bufferrun.getText(0);
        if ((text != null) && (text.contains(pattern))) {
            text = text.replaceAll(pattern, replacement);
            bufferrun.setText(text, 0);
        }
        obj.set(bufferrun.getCTR());
    }
}
If you need any additional information, please let me know.
Thanks in advance!
I have managed to find what was causing this. I'll post it here so that anyone with the same problem can see how I solved it.
The CTR.Factory.parse call used in the example takes a String, but if you check the XmlObject.Factory docs, there are many parse overloads that accept different parameter types. XmlObject.toString() returns pretty-printed XML, which appears to be where the white space gets changed. So I changed this line:
CTR ctr = CTR.Factory.parse(obj.toString());
to the overload that accepts an InputStream, and created a new InputStream from the XmlObject:
CTR ctr = CTR.Factory.parse(obj.newInputStream());
Related
My goal is to insert one docx into a specific place in another docx while keeping its styles and formatting. The second docx contains the word "placeholder"; I first have to find this word and then replace it with the content of the first docx, keeping the inserted document's styles and formats.
I have an idea: create a new docx, split the second docx at the "placeholder", put the first part into the new docx, then the whole first docx, and then the remaining part of the second docx. But how can I keep the styles and formats? I don't have images, tables, or anything like that, just text and formatting such as lists, tabs, and text styles.
Currently I am using Apache POI and Java. (I tried docx4j, but with less success.)
The example code below does a simple merge but nothing more. How can I find the "placeholder" word and insert my docx there?
public static void merge(InputStream src1, InputStream src2, OutputStream dest) throws Exception {
    OPCPackage src1Package = OPCPackage.open(src1);
    OPCPackage src2Package = OPCPackage.open(src2);
    XWPFDocument src1Document = new XWPFDocument(src1Package);
    CTBody src1Body = src1Document.getDocument().getBody();
    XWPFDocument src2Document = new XWPFDocument(src2Package);
    CTBody src2Body = src2Document.getDocument().getBody();
    appendBody(src1Body, src2Body);
    src1Document.write(dest);
}

private static void appendBody(CTBody src, CTBody append) throws Exception {
    XmlOptions optionsOuter = new XmlOptions();
    optionsOuter.setSaveOuter();
    String appendString = append.xmlText(optionsOuter);
    String srcString = src.xmlText();
    // opening <w:body ...> tag, the existing content, and the closing tag
    String prefix = srcString.substring(0, srcString.indexOf(">") + 1);
    String mainPart = srcString.substring(srcString.indexOf(">") + 1, srcString.lastIndexOf("<"));
    String suffix = srcString.substring(srcString.lastIndexOf("<"));
    // content of the body to append, without its outer tags
    String addPart = appendString.substring(appendString.indexOf(">") + 1, appendString.lastIndexOf("<"));
    CTBody makeBody = CTBody.Factory.parse(prefix + mainPart + addPart + suffix);
    src.set(makeBody);
}
Regarding docx4j: you can insert a docx at a specific location (e.g. in a table cell) using MergeDocx in our commercial Docx4j Enterprise.
You can get a trial version from https://www.plutext.com/m/index.php/products
Then see the MergeIntoTableCell sample and documentation.
Another solution: in my example, we can find the placeholder text in mainPart (indexOf / lastIndexOf / substring work better here than a regex), replace it with addPart, and we are ready to go.
Two possible problems:
1: if addPart contains numbered or bulleted lists, they can get messy after being added to the other document.
2: inserting pictures is not possible this way; they have to be handled differently.
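The splice described above can be sketched with plain string operations. This is only an illustration under assumptions: spliceAtPlaceholder is a hypothetical helper, mainPart and addPart are the strings built in appendBody, and the placeholder is assumed to sit unsplit inside a single top-level w:p paragraph (Word may split a word across several runs, in which case indexOf will not find it). If addPart ends with a w:sectPr element, it probably needs to be stripped first; that is not handled here.

```java
final class PlaceholderSplice {
    private static final String P_CLOSE = "</w:p>";

    /**
     * Replaces the whole paragraph that contains the placeholder with addPart.
     * Falls back to plain appending when the placeholder cannot be located.
     */
    static String spliceAtPlaceholder(String mainPart, String addPart, String placeholder) {
        int hit = mainPart.indexOf(placeholder);
        if (hit < 0) {
            return mainPart + addPart;
        }
        // Start of the enclosing paragraph: "<w:p>" or "<w:p " (with attributes).
        // Matching these exact forms avoids confusing it with "<w:pPr>".
        int start = Math.max(mainPart.lastIndexOf("<w:p>", hit),
                             mainPart.lastIndexOf("<w:p ", hit));
        int close = mainPart.indexOf(P_CLOSE, hit);
        if (start < 0 || close < 0) {
            return mainPart + addPart;
        }
        int end = close + P_CLOSE.length();
        return mainPart.substring(0, start) + addPart + mainPart.substring(end);
    }
}
```

In appendBody the result would take the place of mainPart + addPart before the body is parsed again.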
I want to print a pretty-printed JSON string (containing many \n line breaks) into a Word document. I tried the following, but docx4j prints all the content on one single line (without the line breaks). Ideally it should print the multi-line JSON as is, recognising the "\n" characters the string contains:
1)
wordMLPackage.getMainDocumentPart().addParagraphOfText({multiline pretty json String})
2)
ObjectFactory factory = Context.getWmlObjectFactory();
P p = factory.createP();
Text t = factory.createText();
t.setValue(text);
R run = factory.createR();
run.getContent().add(t);
p.getContent().add(run);
PPr ppr = factory.createPPr();
p.setPPr(ppr);
ParaRPr paraRpr = factory.createParaRPr();
ppr.setRPr(paraRpr);
wordMLPackage.getMainDocumentPart().addObject(p);
Looking for help. Thanks.
The docx file format doesn't treat \n as a newline.
So you'll need to split your string on \n, and either create a new P, or use w:br, like so:
Br br = wmlObjectFactory.createBr();
run.getContent().add( br);
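To make the splitting concrete, the sketch below builds the WordprocessingML for a single run from a multi-line string using only the JDK, so it shows the target markup rather than the docx4j calls. toRunXml and escapeXml are hypothetical helpers; xml:space="preserve" is added on the assumption that the leading indentation of pretty-printed JSON should survive.

```java
final class LineBreakMarkup {
    /** Escapes the characters that are not allowed in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Emits one w:t per line, separated by w:br elements, inside one w:r. */
    static String toRunXml(String text) {
        String[] lines = text.split("\n", -1); // -1 keeps trailing empty lines
        StringBuilder xml = new StringBuilder("<w:r>");
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                xml.append("<w:br/>");
            }
            xml.append("<w:t xml:space=\"preserve\">")
               .append(escapeXml(lines[i]))
               .append("</w:t>");
        }
        return xml.append("</w:r>").toString();
    }
}
```

With docx4j the same structure would be produced by adding Text and Br objects alternately to run.getContent().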
Hi, I'm trying to replace some text in a docx file, but I have problems when the text to be replaced is spread over multiple runs. So I tried this, but it erases everything in the document:
private void replaceText(XWPFParagraph p, String target, String replacement) {
    if (p.getRuns() != null) {
        String paragraph = p.getText();
        for (int i = 0; i < p.getRuns().size(); i++) {
            p.removeRun(i);
        }
        paragraph = paragraph.replace(target, replacement);
        XWPFRun r = new XWPFRun(CTR.Factory.newInstance(), p);
        r.setText(paragraph, 0);
    }
}
It will surely erase everything, because you are removing all the runs in the paragraph. The point to understand here is that the text of a paragraph is stored inside its runs; getText() simply returns the text of all the runs in the paragraph.
Removing all runs and adding just one new run will also disrupt the font and alignment of the text.
You are removing all the runs and then adding one run with the replaced text.
I believe this is not what you wish to achieve.
Just loop over the runs and replace the text inside them.
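Looping over the runs is enough when the target sits inside one run. For a target spread over several runs, as in the question, one possible approach is sketched below. It models the runs as a plain list of strings so that it runs without POI; RunTextReplacer is a hypothetical helper, and with POI each string would be read with run.getText(0) and written back with run.setText(text, 0). The replacement goes into the first run the match touches, so that run's formatting is the one that applies.

```java
import java.util.ArrayList;
import java.util.List;

final class RunTextReplacer {
    /** Returns a copy of the run texts with every occurrence of target replaced. */
    static List<String> replaceAll(List<String> runs, String target, String replacement) {
        List<String> out = new ArrayList<>(runs);
        if (target.isEmpty()) {
            return out;
        }
        int from = 0;
        while (true) {
            String joined = String.join("", out);
            int start = joined.indexOf(target, from);
            if (start < 0) {
                return out;
            }
            int end = start + target.length();
            int offset = 0;           // position of the current run in the joined text
            boolean inserted = false; // replacement is written only once per match
            for (int i = 0; i < out.size(); i++) {
                String text = out.get(i);
                int runStart = offset;
                int runEnd = offset + text.length();
                offset = runEnd;
                if (runEnd <= start || runStart >= end) {
                    continue; // this run does not overlap the match
                }
                int cutFrom = Math.max(start, runStart) - runStart;
                int cutTo = Math.min(end, runEnd) - runStart;
                out.set(i, text.substring(0, cutFrom)
                        + (inserted ? "" : replacement)
                        + text.substring(cutTo));
                inserted = true;
            }
            from = start + replacement.length(); // continue after the inserted text
        }
    }
}
```

Runs that end up empty are left in place here; whether to remove them from the paragraph is a separate decision.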
For one of my projects I chose a different route: I work on the underlying XML data and do a search/replace there, which usually works quite nicely.
See https://github.com/centic9/poi-mail-merge for the details, but basically I fetch the CTBody low-level item via
CTBody body = doc.getDocument().getBody();
And then read the full XML body text
// read the current full Body text
String srcString = body.xmlText();
then do the replacements.
Finally I create a new CTBody item with the new contents via
CTBody makeBody = CTBody.Factory.parse(resultStr);
See https://github.com/centic9/poi-mail-merge/blob/master/src/main/java/org/dstadler/poi/mailmerge/MailMerge.java#L81 for the full code-details as there are a few more things that are handled to make it work nicely.
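One detail to watch with this kind of search/replace is that the values end up inside raw XML, so characters such as & and < have to be escaped. The sketch below shows only that step and is not taken from poi-mail-merge: XmlBodyReplace is a hypothetical helper, and the ${name} placeholder syntax is an assumption made for the example.

```java
import java.util.Map;

final class XmlBodyReplace {
    /** Escapes the characters that are not allowed in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces each ${key} in the body XML with the escaped value for that key. */
    static String replaceAll(String bodyXml, Map<String, String> values) {
        String result = bodyXml;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", escapeXml(entry.getValue()));
        }
        return result;
    }
}
```

The returned string would then be handed to CTBody.Factory.parse as described above.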
I am trying to join two PostScript files to one with ghost4j 0.5.0 as follows:
final PSDocument[] psDocuments = new PSDocument[2];
psDocuments[0] = new PSDocument();
psDocuments[0].load("1.ps");
psDocuments[1] = new PSDocument();
psDocuments[1].load("2.ps");
psDocuments[0].append(psDocuments[1]);
psDocuments[0].write("3.ps");
During this simplified process I got the following exception message for the above "append" line:
org.ghost4j.document.DocumentException: java.lang.ClassCastException:
org.apache.xmlgraphics.ps.dsc.events.UnparsedDSCComment cannot be cast to
org.apache.xmlgraphics.ps.dsc.events.DSCCommentPage
So far I have not managed to find out what the problem is. Maybe there is a problem within one of the PostScript files?
So help would be appreciated.
EDIT:
I tested with ghostScript commandline tool:
gswin32.exe -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile="test.ps" --filename "1.ps" "2.ps"
which results in a document where 1.ps and 2.ps are merged into one(!) page (i.e. overlay).
When removing the --filename the resulting document will be a PostScript with two pages as expected.
The exception occurs because one of the 2 documents does not follow the Adobe Document Structuring Convention (DSC), which is mandatory if you want to use the Document append method.
Use the SafeAppenderModifier instead. There is an example here: http://www.ghost4j.org/highlevelapisamples.html (Append a PDF document to a PostScript document)
I think something is wrong in the document or in the XMLGraphics library, as it seems it cannot parse a part of it.
Here you can see the code in ghost4j that I think is failing (link):
DSCParser parser = new DSCParser(bais);
Object tP = parser.nextDSCComment(DSCConstants.PAGES);
while (tP instanceof DSCAtend)
    tP = parser.nextDSCComment(DSCConstants.PAGES);
DSCCommentPages pages = (DSCCommentPages) tP;
And here you can see why XMLGraphics may be responsible (link):
private DSCComment parseDSCComment(String name, String value) {
    DSCComment parsed = DSCCommentFactory.createDSCCommentFor(name);
    if (parsed != null) {
        try {
            parsed.parseValue(value);
            return parsed;
        } catch (Exception e) {
            //ignore and fall back to unparsed DSC comment
        }
    }
    UnparsedDSCComment unparsed = new UnparsedDSCComment(name);
    unparsed.parseValue(value);
    return unparsed;
}
It seems parsed.parseValue(value) threw an exception, which was swallowed by the catch block, so the method returned an UnparsedDSCComment that ghost4j did not expect.
I have a program that loads lines from a user file, then selects the last part of the String (which would be an int)
Here's the style it's saved in:
nameOfValue = 0
nameOfValue2 = 0
and so on. I have selected the value for sure - I debugged it by printing. I just can't seem to save it back in.
if (nameOfValue.equals(type)) {
    System.out.println(nameOfValue + " equals " + type);
    value.replace(value, Integer.toString(Integer.parseInt(value) + 1));
}
How would I save it back? I've tried a BufferedWriter, but it just erases everything in the file.
My suggestion is, save all the contents of the original file (either in memory or in a temporary file; I'll do it in memory) and then write it again, including the modifications. I believe this would work:
public static void replaceSelected(File file, String type) throws IOException {
    // we need to store all the lines
    List<String> lines = new ArrayList<String>();

    // first, read the file and store the changes
    BufferedReader in = new BufferedReader(new FileReader(file));
    String line = in.readLine();
    while (line != null) {
        if (line.startsWith(type)) {
            String sValue = line.substring(line.indexOf('=') + 1).trim();
            int nValue = Integer.parseInt(sValue);
            line = type + " = " + (nValue + 1);
        }
        lines.add(line);
        line = in.readLine();
    }
    in.close();

    // now, write the file again with the changes
    PrintWriter out = new PrintWriter(file);
    for (String l : lines)
        out.println(l);
    out.close();
}
And you'd call the method like this, providing the File you want to modify and the name of the value you want to select:
replaceSelected(new File("test.txt"), "nameOfValue2");
I think the most convenient way is:
1) Read the text file line by line using a BufferedReader.
2) For each line, find the int part using a regular expression and replace it with your new value.
3) Create a new file with the newly created text lines.
4) Delete the source file and rename your newly created file.
Please let me know if you need a Java program implementing the above algorithm.
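The steps above can be sketched as follows. This is a minimal illustration, not the answerer's program: IncrementInFile and its increment method are hypothetical names, the file is assumed to contain lines of the form name = integer, and Files.move with REPLACE_EXISTING stands in for the delete-and-rename step.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncrementInFile {
    // group 1: the name, group 2: the integer value
    private static final Pattern ENTRY = Pattern.compile("^(\\w+)\\s*=\\s*(-?\\d+)\\s*$");

    /** Increments the value stored under the given name and rewrites the file. */
    static void increment(Path file, String name) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "values", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = ENTRY.matcher(line);
                if (m.matches() && m.group(1).equals(name)) {
                    line = name + " = " + (Integer.parseInt(m.group(2)) + 1);
                }
                out.write(line);
                out.newLine();
            }
        }
        // replace the original only after the new file is completely written
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Writing to a separate file first means the original is still intact if something fails halfway through.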
Hard to answer without the complete code...
Is value a String? If so, replace will create a new String, but you are not saving it anywhere. Remember that Strings in Java are immutable.
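A small sketch of the difference, using a hypothetical ImmutableStringDemo class: the first replace call mirrors the one in the question and has no effect, while the second keeps the returned String.

```java
final class ImmutableStringDemo {
    /** Returns "<original>-><incremented>" to show which call had an effect. */
    static String bump(String value) {
        String next = Integer.toString(Integer.parseInt(value) + 1);
        value.replace(value, next);               // result discarded: value is unchanged
        String kept = value.replace(value, next); // result captured in a variable
        return value + "->" + kept;
    }
}
```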
You say you use a BufferedWriter; did you flush and close it? This is often the cause of values mysteriously disappearing when they should be there. This is exactly why Java has a finally keyword.
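As an alternative to an explicit finally block, try-with-resources closes (and thereby flushes) the writer even when an exception is thrown. A minimal sketch, with SafeWrite and writeLines as hypothetical names:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class SafeWrite {
    /** Writes the lines to the file, replacing any previous content. */
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        } // out.close() is called here, which also flushes the buffer
    }
}
```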
It is also difficult to answer without more details on your problem. What exactly are you trying to achieve? There may be simpler ways to do this that already exist.