How to replace bookmarks in ".docx" using POI without losing format? - java

I am trying to replace bookmarks with values.
private FileInputStream fis = new FileInputStream(new File("D:\\test.docx"));
private XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paraList = this.document.getParagraphs();
private final void procParaList(List<XWPFParagraph> paraList, String bookmarkName, String bookmarkValue) {
Iterator<XWPFParagraph> paraIter = null;
XWPFParagraph para = null;
List<CTBookmark> bookmarkList = null;
Iterator<CTBookmark> bookmarkIter = null;
CTBookmark bookmark = null;
XWPFRun run = null;
Node nextNode = null;
paraIter = paraList.iterator();
while (paraIter.hasNext()) {
para = paraIter.next();
bookmarkList = para.getCTP().getBookmarkStartList();
bookmarkIter = bookmarkList.iterator();
while (bookmarkIter.hasNext()) {
bookmark = bookmarkIter.next();
if (bookmark.getName().equals(bookmarkName)) {
run = para.createRun();
run.setText(bookmarkValue);
nextNode = bookmark.getDomNode().getNextSibling();
while (!(nextNode.getNodeName().contains("bookmarkEnd"))) {
para.getCTP().getDomNode().removeChild(nextNode);
nextNode = bookmark.getDomNode().getNextSibling();
}
para.getCTP().getDomNode().insertBefore(run.getCTR().getDomNode(), nextNode);
}
}
}
}
I am able to replace the bookmark with the value, but the inserted text does not keep the formatting (font family, font size, color, etc.) of the original bookmark text.
Can anyone please provide some advice?

As discussed earlier, I believe this is your exact use case (see the official archive link).
Please focus on the use of Node styleNode to copy the style information.
/**
* Replace the text - if any - contained between the bookmarkStart and its
* matching bookmarkEnd tag with the text specified. The technique used will
* resemble that employed when inserting text after the bookmark. In short,
* the code will iterate along the nodes until it encounters a matching
* bookmarkEnd tag. Each node encountered will be deleted unless it is the
* final node before the bookmarkEnd tag is encountered and it is a
* character run. If this is the case, then it can simply be updated to
* contain the text the user wishes to see inserted into the document. If
* the last node is not a character run, then it will be deleted, a new run
* will be created and inserted into the paragraph between the bookmarkStart
* and bookmarkEnd tags.
*
* @param run An instance of the XWPFRun class that encapsulates the text
* that is to be inserted into the document following the bookmark.
*/
private void replaceBookmark(XWPFRun run) {
Node nextNode = null;
Node styleNode = null;
Node lastRunNode = null;
Node toDelete = null;
NodeList childNodes = null;
Stack<Node> nodeStack = null;
boolean textNodeFound = false;
boolean foundNested = true;
int bookmarkStartID = 0;
int bookmarkEndID = -1;
int numChildNodes = 0;
nodeStack = new Stack<Node>();
bookmarkStartID = this._ctBookmark.getId().intValue();
nextNode = this._ctBookmark.getDomNode();
nodeStack.push(nextNode);
// Loop through the nodes looking for a matching bookmarkEnd tag
while (bookmarkStartID != bookmarkEndID) {
nextNode = nextNode.getNextSibling();
nodeStack.push(nextNode);
// If an end tag is found, does it match the start tag? If so, end
// the while loop.
if (nextNode.getNodeName().contains(Bookmark.BOOKMARK_END_TAG))
{
try {
bookmarkEndID = Integer.parseInt(
nextNode.getAttributes().getNamedItem(
Bookmark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
} catch (NumberFormatException nfe) {
bookmarkEndID = bookmarkStartID;
}
}
//else {
// Place a reference to the node on the nodeStack
// nodeStack.push(nextNode);
//}
}
// If the stack of nodes found between the bookmark tags is not empty
// then they have to be removed.
if (!nodeStack.isEmpty()) {
// Check the node at the top of the stack. If it is a run, get its
// style - if any - and apply to the run that will be replacing it.
//lastRunNode = nodeStack.pop();
lastRunNode = nodeStack.peek();
if ((lastRunNode.getNodeName().equals(Bookmark.RUN_NODE_NAME)))
{
styleNode = this.getStyleNode(lastRunNode);
if (styleNode != null) {
run.getCTR().getDomNode().insertBefore(
styleNode.cloneNode(true),
run.getCTR().getDomNode().getFirstChild());
}
}
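The excerpt above stops before getStyleNode is shown. As a rough illustration of the idea it describes, the sketch below uses only the JDK's DOM API on a hand-written fragment: it looks for the w:rPr (run properties) child of an existing run and clones it into the replacement run. The class name, the helper names and the sample XML are assumptions made for this example; they are not taken from the original Bookmark class or from POI.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RunStyleCopy {

    // Hypothetical stand-in for getStyleNode: first child named "w:rPr", or null.
    static Node findRunProperties(Node runNode) {
        for (Node child = runNode.getFirstChild(); child != null; child = child.getNextSibling()) {
            if ("w:rPr".equals(child.getNodeName())) {
                return child;
            }
        }
        return null;
    }

    // Clones the run properties of 'source' and makes the clone the first child of 'target'.
    static void copyRunProperties(Node source, Node target) {
        Node style = findRunProperties(source);
        if (style != null) {
            target.insertBefore(style.cloneNode(true), target.getFirstChild());
        }
    }

    // Parses an XML string; the default factory is not namespace-aware,
    // so node names keep their "w:" prefix.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simplified, made-up paragraph: a styled run followed by an unstyled one.
        Document doc = parse("<w:p xmlns:w=\"urn:example\">"
                + "<w:r><w:rPr><w:b/><w:color w:val=\"FF0000\"/></w:rPr><w:t>old</w:t></w:r>"
                + "<w:r><w:t>new</w:t></w:r>"
                + "</w:p>");
        Node oldRun = doc.getDocumentElement().getFirstChild();
        Node newRun = oldRun.getNextSibling();
        copyRunProperties(oldRun, newRun);
        System.out.println(newRun.getFirstChild().getNodeName());
    }
}
```

With POI the same two DOM calls would be applied to run.getCTR().getDomNode(), as the answer's code does.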

Related

Extract word document comments and the text they comment on

I need to extract Word document comments and the text they comment on. Below is my current solution, but it is not working as expected.
public class Main {
public static void main(String[] args) throws Exception {
var document = new Document("sample.docx");
NodeCollection<Paragraph> paragraphs = document.getChildNodes(PARAGRAPH, true);
List<MyComment> myComments = new ArrayList<>();
for (Paragraph paragraph : paragraphs) {
var comments = getComments(paragraph);
int commentIndex = 0;
if (comments.isEmpty()) continue;
for (Run run : paragraph.getRuns()) {
var runText = run.getText();
for (int i = commentIndex; i < comments.size(); i++) {
Comment comment = comments.get(i);
String commentText = comment.getText();
if (paragraph.getText().contains(runText + commentText)) {
myComments.add(new MyComment(runText, commentText));
commentIndex++;
break;
}
}
}
}
myComments.forEach(System.out::println);
}
private static List<Comment> getComments(Paragraph paragraph) {
@SuppressWarnings("unchecked")
NodeCollection<Comment> comments = paragraph.getChildNodes(COMMENT, false);
List<Comment> commentList = new ArrayList<>();
comments.forEach(commentList::add);
return commentList;
}
static class MyComment {
String text;
String commentText;
public MyComment(String text, String commentText) {
this.text = text;
this.commentText = commentText;
}
@Override
public String toString() {
return text + "-->" + commentText;
}
}
}
sample.docx contents are:
And the output is (which is incorrect):
factors-->This is word comment
%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
Expected output is:
factors-->This is word comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->First paragraph comment
Please help me with a better way of extracting Word document comments and the text they comment on. If you need additional details, let me know and I will provide them.
The commented text is marked by the special nodes CommentRangeStart and CommentRangeEnd. Each CommentRangeStart and CommentRangeEnd node has an Id that corresponds to the Id of the Comment the range is linked to. So you need to extract the content between the corresponding start and end nodes.
By the way, the code example in the Aspose.Words API reference shows how to print the contents of all comments and their comment ranges using a document visitor. It looks like exactly what you are looking for.
EDIT: You can use code like the following to accomplish your task. I did not provide the full code for extracting content between nodes; it is available on GitHub.
Document doc = new Document("C:\\Temp\\in.docx");
// Get the comments in the document.
Iterable<Comment> comments = doc.getChildNodes(NodeType.COMMENT, true);
Iterable<CommentRangeStart> commentRangeStarts = doc.getChildNodes(NodeType.COMMENT_RANGE_START, true);
Iterable<CommentRangeEnd> commentRangeEnds = doc.getChildNodes(NodeType.COMMENT_RANGE_END, true);
for (Comment c : comments)
{
System.out.println(String.format("Comment %d : %s", c.getId(), c.toString(SaveFormat.TEXT)));
CommentRangeStart start = null;
CommentRangeEnd end = null;
// Search for an appropriate start and end.
for (CommentRangeStart s : commentRangeStarts)
{
if (c.getId() == s.getId())
{
start = s;
break;
}
}
for (CommentRangeEnd e : commentRangeEnds)
{
if (c.getId() == e.getId())
{
end = e;
break;
}
}
if (start != null && end != null)
{
// Extract content between the start and end nodes.
// Code example how to extract content between nodes is here
// https://github.com/aspose-words/Aspose.Words-for-Java/blob/master/Examples/src/main/java/com/aspose/words/examples/programming_documents/document/ExtractContentBetweenCommentRange.java
}
else
{
System.out.println(String.format("Comment %d does not have a comment range", c.getId()));
}
}
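The step left as a comment in the code above, collecting the content between a matching start and end marker, can be illustrated without Aspose.Words. The sketch below is only a model of the idea: it uses the JDK's DOM parser on a simplified, made-up XML layout in which rangeStart and rangeEnd elements carry an id attribute, and it gathers the text seen while each range is open. The element names, the class name and the sample text are assumptions for this example, not the Aspose API or the real docx schema.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CommentRangeText {

    // Returns a map from range id to the text found between its start and end markers.
    static Map<String, String> extract(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Map<String, StringBuilder> open = new LinkedHashMap<>();
            Map<String, String> done = new LinkedHashMap<>();
            walk(root, open, done);
            return done;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first walk in document order; text is appended to every range that is currently open,
    // so overlapping ranges each receive their own copy of the text.
    private static void walk(Node node, Map<String, StringBuilder> open, Map<String, String> done) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            for (StringBuilder sb : open.values()) {
                sb.append(node.getNodeValue());
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            if ("rangeStart".equals(element.getTagName())) {
                open.put(element.getAttribute("id"), new StringBuilder());
            } else if ("rangeEnd".equals(element.getTagName())) {
                StringBuilder sb = open.remove(element.getAttribute("id"));
                if (sb != null) {
                    done.put(element.getAttribute("id"), sb.toString());
                }
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, open, done);
        }
    }

    public static void main(String[] args) {
        // Two overlapping ranges covering the same two runs.
        String xml = "<p><rangeStart id=\"1\"/><rangeStart id=\"2\"/>"
                + "<r>These factors act. </r><r>Many changes are required.</r>"
                + "<rangeEnd id=\"1\"/><rangeEnd id=\"2\"/></p>";
        extract(xml).forEach((id, text) -> System.out.println(text + "-->range " + id));
    }
}
```

Because every open range receives the text, two comments anchored to the same passage both report the full passage, which is the behaviour the expected output in the question asks for.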

Problem loading a XML file using Dom parser

I am new to Java programming and I have a problem reading an XML file. I am trying to save information from XML using a DOM parser. I load the XML into a Document and then try to save all the schedules of a radio channel in a NodeList, but the program repeatedly saves just the information of the first node. Where is the problem with my code?
NodeList episodeElement = doc.getElementsByTagName("schedule");
for (int i = 0; i < episodeElement.getLength(); i++) {
Node n = episodeElement.item(i);
if (n.getNodeType() == Node.ELEMENT_NODE && getSize(doc) != 0) {
Element e = (Element) n;
String title = e.getElementsByTagName("title").item(i).getTextContent();
NodeList nd = e.getElementsByTagName("description");
String description;
if (nd.getLength() > 0) {
description = nd.item(i).getTextContent();
}else {
description = null;
}
String startTime = e.getElementsByTagName("starttimeutc").item(i).getTextContent();
String endTime = e.getElementsByTagName("endtimeutc").item(i).getTextContent();
Program prog = new Program(id, title, description, startTime, endTime);
System.out.println(startTime);
programs.add(i, prog);
}
else {
System.out.println("No schedules found");
}
}
You haven't used the getChildNodes() method to traverse one layer down from the tag and loop over the result; that's why it only fetches the information of the first node.
Visit this link and you can find an excellent example.
https://www.youtube.com/watch?v=HfGWVy-eMRc
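As a complement to the video link, here is a minimal sketch of one way to structure the loop: iterate over the element that repeats once per program, and inside each of those elements read the first matching child (index 0, not the outer loop index). The XML layout is an assumption, since the question does not show the file; in particular the episode element name, the class name and the helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ScheduleReader {

    // Text of the first <tag> descendant of parent, or null if there is none.
    static String firstText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent() : null;
    }

    // Builds one summary line per element named recordTag.
    static List<String> readPrograms(String xml, String recordTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList records = doc.getElementsByTagName(recordTag);
            List<String> programs = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                // getElementsByTagName only returns elements, so the cast is safe.
                Element record = (Element) records.item(i);
                String title = firstText(record, "title");
                String description = firstText(record, "description");
                String start = firstText(record, "starttimeutc");
                programs.add(title + " | " + description + " | " + start);
            }
            return programs;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed layout: one <schedule> holding several hypothetical <episode> elements.
        String xml = "<schedule>"
                + "<episode><title>News</title><description>Morning</description>"
                + "<starttimeutc>06:00</starttimeutc></episode>"
                + "<episode><title>Music</title><starttimeutc>07:00</starttimeutc></episode>"
                + "</schedule>";
        readPrograms(xml, "episode").forEach(System.out::println);
    }
}
```

If the real file has a single schedule element wrapping many program elements, looping over "schedule" runs only once, which would match the symptom described in the question.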

ConcurrentModificationException when trying to replace XWPFHyperlink for XWPFRun

I am trying to replace a string pattern with another one that has a hyperlink, but I am getting java.util.ConcurrentModificationException. The lines of code the error points to don't make sense to me, so I wasn't able to find out what happened.
// Replace occurrences in all paragraphs
for (XWPFParagraph p : doc_buffer.getParagraphs()) {
List<XWPFRun> p_runs = p.getRuns();
if (p_runs != null) {
for (XWPFRun r : p_runs) {
String text = r.getText(0);
if ((text != null) && (text.contains(pattern))) {
if (pattern.equals("LINK_TO_DOCS")) {
//TODO
String h_url = "http://example.com/linktodocs/";
String h_text = replacement;
// Creates the link as an external relationship
XWPFParagraph temp_p = doc_buffer.createParagraph();
String id = temp_p.getDocument().getPackagePart().addExternalRelationship(h_url, XWPFRelation.HYPERLINK.getRelation()).getId();
// Binds the link to the relationship
CTHyperlink link = temp_p.getCTP().addNewHyperlink();
link.setId(id);
// Creates the linked text
CTText linked_text = CTText.Factory.newInstance();
linked_text.setStringValue(h_text);
// Creates a wordprocessing Run wrapper
CTR ctr = CTR.Factory.newInstance();
ctr.setTArray(new CTText[] {linked_text});
link.setRArray(new CTR[] {ctr});
r = new XWPFHyperlinkRun(link, r.getCTR(), r.getParent());
}
else {
text = text.replaceAll(pattern, replacement);
r.setText(text, 0);
}
}
}
}
}
Console error:
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
at java.util.ArrayList$Itr.next(ArrayList.java:859)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
at releasenotes.ReleaseNotesUpdater.replaceAllOccurrences(ReleaseNotesUpdater.java:263)
at releasenotes.ReleaseNotesUpdater.main(ReleaseNotesUpdater.java:85)
Besides this error, I would also like some advice on how to replace a string pattern with another one that has a hyperlink. I have searched, but I am a bit confused about how it works.
Edit:
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
public Iterator<E> iterator() {
return new Iterator<E>() {
private final Iterator<? extends E> i = c.iterator();
public boolean hasNext() {return i.hasNext();}
public E next() {return i.next();}
public void remove() {
throw new UnsupportedOperationException();
}
@Override
public void forEachRemaining(Consumer<? super E> action) {
// Use backing collection version
i.forEachRemaining(action);
}
};
}
at java.util.ArrayList$Itr.next(ArrayList.java:859)
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i];
}
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
I have found the solution so I am sharing if anyone has the same trouble.
To replace a common run with a Hyperlink run, simply do the following:
String h_url = "http://example.com/index.html";
String h_text = replacement;
// Creates the link as an external relationship
String id = r.getDocument().getPackagePart()
.addExternalRelationship(h_url, XWPFRelation.HYPERLINK.getRelation()).getId();
// Binds the link to the relationship
CTHyperlink link = r.getParagraph().getCTP().addNewHyperlink();
link.setId(id);
// Creates the linked text
CTText linked_text = CTText.Factory.newInstance();
linked_text.setStringValue(h_text);
// Creates a XML wordprocessing wrapper for Run
// The magic is here
CTR ctr = r.getCTR();
ctr.setTArray(new CTText[] { linked_text });
// Stylizing: reuse the run's rPr if it already has one, so the run ends up with a single rPr element
CTRPr rpr = ctr.isSetRPr() ? ctr.getRPr() : ctr.addNewRPr();
CTColor color = CTColor.Factory.newInstance();
color.setVal("0000FF");
rpr.setColor(color);
rpr.addNewU().setVal(STUnderline.SINGLE);
The code above is inside a loop that iterates over all runs in a paragraph (r is the current run), so you just have to call r.getCTR() to be able to edit the run.
The exception was happening because I was trying to modify the document structure while iterating over it, in this line:
XWPFParagraph temp_p = doc_buffer.createParagraph();
If anyone has questions, feel free to ask in the comments.
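The failure mode described above can be reproduced with plain collections, which may make the stack trace easier to read. In this sketch an unmodifiable view stands in for the list returned by getParagraphs(), and adding to the backing list stands in for createParagraph(); the class and method names are made up for the example. Iterating over a snapshot copy is one possible way around it when new elements really must be appended during the pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

class SnapshotIteration {

    // Adds to the backing list while a for-each loop walks a live view of it.
    // Returns true if the iterator detected the change and threw.
    static boolean failsOnLiveView() {
        List<String> backing = new ArrayList<>(Arrays.asList("p1", "p2"));
        List<String> view = Collections.unmodifiableList(backing);
        try {
            for (String p : view) {
                backing.add(p + "-copy"); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same work, but the loop walks a snapshot, so the backing list can grow safely.
    static List<String> appendCopies(List<String> backing) {
        List<String> view = Collections.unmodifiableList(backing);
        for (String p : new ArrayList<>(view)) {
            backing.add(p + "-copy");
        }
        return backing;
    }

    public static void main(String[] args) {
        System.out.println("live view throws: " + failsOnLiveView());
        System.out.println(appendCopies(new ArrayList<>(Arrays.asList("p1", "p2"))));
    }
}
```

Note that the view being unmodifiable does not help here: it only blocks changes made through the view, while the iterator still notices changes made to the underlying list.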

Insert a bulleted list from an ArrayList Apache POI XWPF

I have an array list that I want to use to create a new bullet list inside a document.
I already have numbering (with numbers) and I want to have both (numbers and bullets) on different lists.
My document is pre-populated with some data, and I have some tokens that determine where my data goes. For my list, I have a token like this one, and I am able to reach it:
{{tokenlist1}}
I want to do one of the following:
first option: reach my token, create a new bullet list and delete my token
second option: replace my token with my first element and continue my bullet list.
It would be really appreciated if the bullet style (square, round, check, ...) could stay the same as it is with the token.
EDIT
For those who want an answer, here's my solution.
Action
Map<String, Object> replacements = new HashMap<String, Object>();
replacements.put("{{token1}}", "changed text 1");
replacements.put("{{token2}}", "here is the text of token number 2");
replacements.put("{{tokenList1}}", tokenList1);
replacements.put("{{tokenList2}}", tokenList1);
templateWithToken = reportService.findAndReplaceToken(replacements, templateWithToken);
Service
public XWPFDocument findAndReplaceToken (Map<String, Object> replacements,
XWPFDocument document) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (int i = 0; i < paragraphs.size(); i++) {
XWPFParagraph paragraph = paragraphs.get(i);
List<XWPFRun> runs = paragraph.getRuns();
for (Map.Entry<String, Object> replPair : replacements
.entrySet()) {
String find = replPair.getKey();
Object repl = replPair.getValue();
TextSegment found =
paragraph.searchText(find, new PositionInParagraph());
if (found != null) {
if (repl instanceof String) {
replaceText(found, runs, find, repl);
} else if (repl instanceof ArrayList<?>) {
Iterator<?> iterArrayList =
((ArrayList) repl).iterator();
boolean isPassed = false;
while (iterArrayList.hasNext()) {
Object object = (Object) iterArrayList.next();
if (isPassed == false) {
replaceText(found, runs, find,
object.toString());
} else {
XWPFRun run = paragraph.createRun();
run.addCarriageReturn();
run.setText(object.toString());
}
isPassed = true;
}
}
}
}
}
return document;
}
private void replaceText(TextSegment found, List<XWPFRun> runs,
String find, Object repl) {
int biginRun = found.getBeginRun();
int biginRun2 = found.getEndRun();
if (found.getBeginRun() == found.getEndRun()) {
// whole search string is in one Run
XWPFRun run = runs.get(found.getBeginRun());
String runText = run.getText(run.getTextPosition());
String replaced = runText.replace(find, repl.toString());
run.setText(replaced, 0);
} else {
// The search string spans over more than one Run
// Put the Strings together
StringBuilder b = new StringBuilder();
for (int runPos = found.getBeginRun(); runPos <= found
.getEndRun(); runPos++) {
XWPFRun run = runs.get(runPos);
b.append(run.getText(run.getTextPosition()));
}
String connectedRuns = b.toString();
String replaced = connectedRuns.replace(find, repl.toString());
// The first Run receives the replaced String of all
// connected Runs
XWPFRun partOne = runs.get(found.getBeginRun());
partOne.setText(replaced, 0);
// Removing the text in the other Runs.
for (int runPos = found.getBeginRun() + 1; runPos <= found
.getEndRun(); runPos++) {
XWPFRun partNext = runs.get(runPos);
partNext.setText("", 0);
}
}
}

In ArrayList, how to remove subdirectory if its parent is already present in the list?

I have an ArrayList<String> containing paths of directories, like:
/home, /usr...
I want to write code that removes every path from the list whose parent directory is already in the list.
For example:
If the list contains:
/home
/home/games
then, /home/games should get removed as its parent /home is already in the list.
Below is the code:
for (int i = 0; i < checkedList.size(); i++) {
File f = new File(checkedList.get(i));
if(checkedList.contains(f.getParent()));
checkedList.remove(checkedList.get(i));
}
Above checkedList is a String arrayList.
The problem comes when the list contains:
/home
/home/games/minesweeper
Now the minesweeper folder will not get removed as its parent games is not in the list. How to remove these kinds of elements too?
Another possible solution would be using String.startsWith(String).
But of course you could take advantage of the parent functionality of the File class in order to handle relative directories and other particularities. A draft of the solution follows:
List<String> listOfDirectories = new ArrayList<String>();
listOfDirectories.add("/home/user/tmp/test");
listOfDirectories.add("/home/user");
listOfDirectories.add("/tmp");
listOfDirectories.add("/etc/test");
listOfDirectories.add("/etc/another");
List<String> result = new ArrayList<String>();
for (int i = 0; i < listOfDirectories.size(); i++) {
File current = new File(listOfDirectories.get(i));
File parent = current;
while ((parent = parent.getParentFile()) != null) {
if (listOfDirectories.contains(parent.getAbsolutePath())) {
current = parent;
}
}
String absolutePath = current.getAbsolutePath();
if (!result.contains(absolutePath)) {
result.add(absolutePath);
}
}
System.out.println(result);
This would print:
[/home/user, /tmp, /etc/test, /etc/another]
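The String.startsWith alternative mentioned at the start of this answer could look roughly like the sketch below. It assumes the list holds normalized absolute paths that use "/" as the separator; the class and method names are made up for the example. A trailing slash is appended to the potential ancestor before comparing, so that /home is not mistaken for an ancestor of /homework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TopLevelDirectories {

    // Keeps only the paths that have no ancestor elsewhere in the list.
    static List<String> keepTopLevel(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String candidate : paths) {
            boolean hasAncestor = false;
            for (String other : paths) {
                // Compare against "other/" so only whole path segments match.
                String prefix = other.endsWith("/") ? other : other + "/";
                if (!other.equals(candidate) && candidate.startsWith(prefix)) {
                    hasAncestor = true;
                    break;
                }
            }
            if (!hasAncestor && !result.contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
                "/home", "/home/games/minesweeper", "/homework", "/home/games");
        System.out.println(keepTopLevel(dirs));
    }
}
```

This builds a new list instead of removing from the one being iterated, which sidesteps the index-shifting problem in the question's loop. It compares every pair of paths, so it is only meant for lists of modest size.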
You can do some string manipulation to get the base directory of each string.
String path = checkedList.get(i);
int baseIndex = path.indexOf("/", 1);
// Take the substring only after confirming that a second slash exists;
// substring(0, -1) would throw StringIndexOutOfBoundsException.
if (baseIndex != -1 && checkedList.contains(path.substring(0, baseIndex)))
{
    checkedList.remove(path);
}
This gets the index of the second '/' and extracts the string up to that slash. If the second slash exists, it checks whether the list contains the base string and removes the current string if there is a match.
You can extract the root directory from your string and add it to a HashSet.
For example:
if you have /home/games, you can extract "home" from the string using substring operations, a regular expression, or whatever you prefer.
Before you add "home" to the HashSet, you must check whether it has already been added:
if (hashset.contains("home"))
{
// then it is already added
}
else
{
hashset.add("home");
}
Would doing the opposite work? If the parent is NOT found in your ArrayList, add the value to a final output ArrayList:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    // No semicolon after the condition, otherwise the add would run unconditionally
    if (!checkedList.contains(f.getParent())) {
        yourOutputList.add(checkedList.get(i));
    }
}
You should check every parent of each list item in turn.
I will assume that your list contains normalized absolute path File objects:
for (int i = 0; i < checkedList.size(); i++) {
    File curItem = checkedList.get(i);
    // getParentFile() returns a File; getParent() would return a String
    for (File curParent = curItem.getParentFile();
            curParent != null;
            curParent = curParent.getParentFile()) {
        if (checkedList.contains(curParent)) {
            checkedList.remove(curItem);
            i--; // the following element has shifted into index i
            break;
        }
    }
}
Actually, I would rewrite it with ListIterator:
for (ListIterator<File> iter = checkedList.listIterator(); iter.hasNext(); ) {
    File curItem = iter.next();
    for (File curParent = curItem.getParentFile();
            curParent != null;
            curParent = curParent.getParentFile()) {
        if (checkedList.contains(curParent)) {
            // Removing through the iterator keeps the iteration consistent
            iter.remove();
            break;
        }
    }
}
