Insert a bulleted list from an ArrayList Apache POI XWPF - java

I have an ArrayList that I want to use to create a new bulleted list inside a document.
The document already has numbered lists, and I want to have both kinds (numbered and bulleted) as separate lists.
My document is pre-populated with some data, and I use tokens to mark where my data should go. For my list, I have a token like the one below, and I am able to locate it:
{{tokenlist1}}
I want to do one of the following:
first option: locate my token, create a new bulleted list, and delete the token;
second option: replace the token with my first element and continue the bulleted list with the remaining elements.
It would be much appreciated if the bullet style (square, round, check, ...) could stay the same as the one applied to the token.

EDIT
For those who want an answer, here's my solution.
Action
Map<String, Object> replacements = new HashMap<>();
replacements.put("{{token1}}", "changed text 1");
replacements.put("{{token2}}", "here is the text of token number 2");
replacements.put("{{tokenList1}}", tokenList1);
replacements.put("{{tokenList2}}", tokenList1);
templateWithToken = reportService.findAndReplaceToken(replacements, templateWithToken);
Service
public XWPFDocument findAndReplaceToken(Map<String, Object> replacements,
        XWPFDocument document) {
    List<XWPFParagraph> paragraphs = document.getParagraphs();
    for (int i = 0; i < paragraphs.size(); i++) {
        XWPFParagraph paragraph = paragraphs.get(i);
        List<XWPFRun> runs = paragraph.getRuns();
        for (Map.Entry<String, Object> replPair : replacements.entrySet()) {
            String find = replPair.getKey();
            Object repl = replPair.getValue();
            TextSegment found =
                    paragraph.searchText(find, new PositionInParagraph());
            if (found != null) {
                if (repl instanceof String) {
                    replaceText(found, runs, find, repl);
                } else if (repl instanceof List<?>) {
                    boolean isFirst = true;
                    for (Object item : (List<?>) repl) {
                        if (isFirst) {
                            // the first element replaces the token itself
                            replaceText(found, runs, find, item.toString());
                            isFirst = false;
                        } else {
                            // further elements go on new lines of the same paragraph
                            XWPFRun run = paragraph.createRun();
                            run.addCarriageReturn();
                            run.setText(item.toString());
                        }
                    }
                }
            }
        }
    }
    return document;
}
private void replaceText(TextSegment found, List<XWPFRun> runs,
        String find, Object repl) {
    if (found.getBeginRun() == found.getEndRun()) {
        // The whole search string is in one run
        XWPFRun run = runs.get(found.getBeginRun());
        // getText(0) reads the run's first text element
        // (getTextPosition() is the raised/lowered baseline offset, not an index)
        String runText = run.getText(0);
        String replaced = runText.replace(find, repl.toString());
        run.setText(replaced, 0);
    } else {
        // The search string spans more than one run:
        // put the strings together
        StringBuilder b = new StringBuilder();
        for (int runPos = found.getBeginRun(); runPos <= found.getEndRun(); runPos++) {
            String text = runs.get(runPos).getText(0);
            if (text != null) {
                b.append(text);
            }
        }
        String replaced = b.toString().replace(find, repl.toString());
        // The first run receives the replaced string of all connected runs
        runs.get(found.getBeginRun()).setText(replaced, 0);
        // Remove the text from the other runs
        for (int runPos = found.getBeginRun() + 1; runPos <= found.getEndRun(); runPos++) {
            runs.get(runPos).setText("", 0);
        }
    }
}
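The carriage-return approach above keeps every list element inside the token's single paragraph, so Word shows them as one bullet item with line breaks. If each element should become its own bullet (the first option in the question), the following is a sketch under these assumptions: the token sits in a body-level paragraph (not in a table cell), that paragraph already carries the wanted bullet numbering in its paragraph properties, and the method is called after the loop over document.getParagraphs() has finished, because changing the body while iterating over it throws ConcurrentModificationException. BulletListFiller and replaceTokenWithList are hypothetical names.

```java
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlCursor;

// Hypothetical helper, not part of Apache POI.
public class BulletListFiller {

    /**
     * Inserts one paragraph per item directly before the token paragraph,
     * copying the token paragraph's <w:pPr> (style, numbering, indentation),
     * then removes the token paragraph. Only body-level paragraphs are supported.
     */
    public static void replaceTokenWithList(XWPFDocument document,
                                            XWPFParagraph tokenParagraph,
                                            List<?> items) {
        for (Object item : items) {
            // Fresh cursor on the token paragraph: insertNewParagraph puts the
            // new paragraph in front of it, so the items keep their order.
            XmlCursor cursor = tokenParagraph.getCTP().newCursor();
            XWPFParagraph newParagraph = document.insertNewParagraph(cursor);
            cursor.dispose();
            if (newParagraph == null) {
                throw new IllegalStateException("Token paragraph is not in the document body");
            }
            // Copy the paragraph properties so the bullet (numId/ilvl) looks the same.
            if (tokenParagraph.getCTP().isSetPPr()) {
                newParagraph.getCTP().setPPr(tokenParagraph.getCTP().getPPr());
            }
            newParagraph.createRun().setText(String.valueOf(item));
        }
        // Finally drop the paragraph that held the token.
        document.removeBodyElement(document.getPosOfParagraph(tokenParagraph));
    }
}
```

In findAndReplaceToken this would take the place of the carriage-return branch: remember the matching paragraph and its list during the loop, and call the helper once the loop is over.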

Related

ConcurrentModificationException when trying to replace XWPFHyperlink for XWPFRun

I am trying to replace a string pattern with another one containing a hyperlink, but I am getting a java.util.ConcurrentModificationException. The lines of code the error points to don't make sense to me, so I wasn't able to find out what happened.
// Replace occurrences in all paragraphs
for (XWPFParagraph p : doc_buffer.getParagraphs()) {
List<XWPFRun> p_runs = p.getRuns();
if (p_runs != null) {
for (XWPFRun r : p_runs) {
String text = r.getText(0);
if ((text != null) && (text.contains(pattern))) {
if (pattern.equals("LINK_TO_DOCS")) {
//TODO
String h_url = "http://example.com/linktodocs/";
String h_text = replacement;
// Creates the link as an external relationship
XWPFParagraph temp_p = doc_buffer.createParagraph();
String id = temp_p.getDocument().getPackagePart().addExternalRelationship(h_url, XWPFRelation.HYPERLINK.getRelation()).getId();
// Binds the link to the relationship
CTHyperlink link = temp_p.getCTP().addNewHyperlink();
link.setId(id);
// Creates the linked text
CTText linked_text = CTText.Factory.newInstance();
linked_text.setStringValue(h_text);
// Creates a wordprocessing Run wrapper
CTR ctr = CTR.Factory.newInstance();
ctr.setTArray(new CTText[] {linked_text});
link.setRArray(new CTR[] {ctr});
r = new XWPFHyperlinkRun(link, r.getCTR(), r.getParent());
}
else {
text = text.replaceAll(pattern, replacement);
r.setText(text, 0);
}
}
}
}
}
Console error:
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
at java.util.ArrayList$Itr.next(ArrayList.java:859)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
at releasenotes.ReleaseNotesUpdater.replaceAllOccurrences(ReleaseNotesUpdater.java:263)
at releasenotes.ReleaseNotesUpdater.main(ReleaseNotesUpdater.java:85)
Also, besides this error, I would like some advice on how to replace a string pattern with a hyperlink. I have searched, but I am a bit confused about how it works.
Edit:
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
public Iterator<E> iterator() {
return new Iterator<E>() {
private final Iterator<? extends E> i = c.iterator();
public boolean hasNext() {return i.hasNext();}
public E next() {return i.next();}
public void remove() {
throw new UnsupportedOperationException();
}
@Override
public void forEachRemaining(Consumer<? super E> action) {
// Use backing collection version
i.forEachRemaining(action);
}
};
}
at java.util.ArrayList$Itr.next(ArrayList.java:859)
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i];
}
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
final void checkForComodification() {
if (modCount != expectedModCount)
throw new ConcurrentModificationException();
}
I have found the solution, so I am sharing it in case anyone has the same trouble.
To replace a common run with a hyperlink run, simply do the following:
String h_url = "http://example.com/index.html";
String h_text = replacement;
// Creates the link as an external relationship
String id = r.getDocument().getPackagePart()
.addExternalRelationship(h_url, XWPFRelation.HYPERLINK.getRelation()).getId();
// Binds the link to the relationship
CTHyperlink link = r.getParagraph().getCTP().addNewHyperlink();
link.setId(id);
// Creates the linked text
CTText linked_text = CTText.Factory.newInstance();
linked_text.setStringValue(h_text);
// Reuses the run's own XML (CTR) instead of building a new one
// The magic is here
CTR ctr = r.getCTR();
ctr.setTArray(new CTText[] { linked_text });
// Styling: blue, single underline (a run has at most one <w:rPr>, so reuse it)
CTRPr rpr = ctr.isSetRPr() ? ctr.getRPr() : ctr.addNewRPr();
CTColor color = CTColor.Factory.newInstance();
color.setVal("0000FF");
rpr.setColor(color);
rpr.addNewU().setVal(STUnderline.SINGLE);
The code above is inside a loop that iterates over all runs in a paragraph (r is the current run), so you just have to call r.getCTR() to edit the run.
The exception was happening because I was modifying the document structure while iterating over it, in this line:
XWPFParagraph temp_p = doc_buffer.createParagraph();
If anyone has questions, feel free to ask in the comments.
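The failure above follows the general rule for Java collections: the stack trace shows that getParagraphs() hands out an unmodifiable view of the document's live ArrayList, so doc_buffer.createParagraph() changes the list the for-each loop is walking, and the loop's next call to next() throws. A minimal plain-Java sketch of the same mistake and of the usual fix (record what needs to change during the loop, apply it afterwards); ModifyWhileIterating and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

public class ModifyWhileIterating {

    // Mirrors the failing pattern: adding to the list a for-each loop is walking.
    static List<String> naive(List<String> items) {
        for (String item : items) {
            if (item.contains("LINK")) {
                items.add("new paragraph"); // structural change -> CME on the next next()
            }
        }
        return items;
    }

    // Safe pattern: collect the changes during the loop, apply them after it ends.
    static List<String> collectThenModify(List<String> items) {
        List<String> toAdd = new ArrayList<>();
        for (String item : items) {
            if (item.contains("LINK")) {
                toAdd.add("new paragraph");
            }
        }
        items.addAll(toAdd);
        return items;
    }
}
```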

How to extract key phrases from a given text with OpenNLP?

I'm using Apache OpenNLP and I'd like to extract the key phrases of a given text. I'm already gathering entities, but I would like to have key phrases.
The problem I have is that I can't use TF-IDF, because I don't have models for that and I only have a single text (not multiple documents).
Here is some code (a prototype, not so clean):
public List<KeywordsModel> extractKeywords(String text, NLPProvider pipeline) {
SentenceDetectorME sentenceDetector = new SentenceDetectorME(pipeline.getSentencedetecto("en"));
TokenizerME tokenizer = new TokenizerME(pipeline.getTokenizer("en"));
POSTaggerME posTagger = new POSTaggerME(pipeline.getPosmodel("en"));
ChunkerME chunker = new ChunkerME(pipeline.getChunker("en"));
ArrayList<String> stopwords = pipeline.getStopwords("en");
Span[] sentSpans = sentenceDetector.sentPosDetect(text);
Map<String, Float> results = new LinkedHashMap<>();
SortedMap<String, Float> sortedData = new TreeMap(new MapSort.FloatValueComparer(results));
float sentenceCounter = sentSpans.length;
float prominenceVal = 0;
int sentences = sentSpans.length;
for (Span sentSpan : sentSpans) {
prominenceVal = sentenceCounter / sentences;
sentenceCounter--;
String sentence = sentSpan.getCoveredText(text).toString();
int start = sentSpan.getStart();
Span[] tokSpans = tokenizer.tokenizePos(sentence);
String[] tokens = new String[tokSpans.length];
for (int i = 0; i < tokens.length; i++) {
tokens[i] = tokSpans[i].getCoveredText(sentence).toString();
}
String[] tags = posTagger.tag(tokens);
Span[] chunks = chunker.chunkAsSpans(tokens, tags);
for (Span chunk : chunks) {
if ("NP".equals(chunk.getType())) {
int npstart = start + tokSpans[chunk.getStart()].getStart();
int npend = start + tokSpans[chunk.getEnd() - 1].getEnd();
String potentialKey = text.substring(npstart, npend);
if (!results.containsKey(potentialKey)) {
boolean hasStopWord = false;
String[] pKeys = potentialKey.split("\\s+");
if (pKeys.length < 3) {
for (String pKey : pKeys) {
for (String stopword : stopwords) {
if (pKey.toLowerCase().matches(stopword)) {
hasStopWord = true;
break;
}
}
if (hasStopWord == true) {
break;
}
}
}else{
hasStopWord=true;
}
if (hasStopWord == false) {
int count = StringUtils.countMatches(text, potentialKey);
results.put(potentialKey, (float) (Math.log(count) / 100) + (float)(prominenceVal/5));
}
}
}
}
}
sortedData.putAll(results);
System.out.println(sortedData);
return null;
}
What it basically does is return the noun phrases and sort them by a prominence value (where in the text they appear) and by their counts.
But honestly, this doesn't work very well.
I also tried it with a Lucene analyzer, but the results were not very good either.
So how can I achieve what I want to do? I already know of KEA/Maui-indexer etc. (but I'm afraid I can't use them because of the GPL).
Also of interest: which other algorithms can I use instead of TF-IDF?
Example:
This text: http://techcrunch.com/2015/09/04/etsys-pulling-the-plug-on-grand-st-at-the-end-of-this-month/
Good output in my opinion: Etsy, Grand St., solar chargers, maker marketplace, tech hardware
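To experiment with the scoring in extractKeywords without loading the OpenNLP models, the filtering and scoring steps can be isolated in plain Java. This is a sketch, assuming the noun phrases have already been chunked per sentence and appear verbatim in the text. It keeps the question's formula (log(count)/100 + prominence/5, with prominence falling from 1 for the first sentence to 1/n for the last) and its "fewer than three words, no stopword" filter, but swaps the regex stopword match for a HashSet lookup and sorts a list instead of using a TreeMap with a value comparator; if such a comparator returns 0 for equal scores, the TreeMap silently merges phrases that tie. NounPhraseScorer is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NounPhraseScorer {

    /** Counts non-overlapping occurrences of needle in haystack. */
    static int countMatches(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int idx = haystack.indexOf(needle); idx != -1;
                idx = haystack.indexOf(needle, idx + needle.length())) {
            count++;
        }
        return count;
    }

    /** Returns the candidate phrases sorted by descending score. */
    static List<Map.Entry<String, Float>> score(String text,
                                                List<List<String>> phrasesPerSentence,
                                                Set<String> stopwords) {
        Map<String, Float> results = new LinkedHashMap<>();
        int sentences = phrasesPerSentence.size();
        for (int s = 0; s < sentences; s++) {
            // 1.0 for the first sentence, 1/n for the last one
            float prominence = (float) (sentences - s) / sentences;
            for (String phrase : phrasesPerSentence.get(s)) {
                if (results.containsKey(phrase)) {
                    continue; // keep the first (most prominent) occurrence
                }
                String[] words = phrase.split("\\s+");
                if (words.length >= 3) {
                    continue; // same length cut-off as the original code
                }
                boolean hasStopword = false;
                for (String w : words) {
                    if (stopwords.contains(w.toLowerCase())) {
                        hasStopword = true;
                        break;
                    }
                }
                if (hasStopword) {
                    continue;
                }
                // phrases are assumed to occur in text, so count >= 1
                int count = countMatches(text, phrase);
                results.put(phrase, (float) (Math.log(count) / 100) + prominence / 5);
            }
        }
        List<Map.Entry<String, Float>> sorted = new ArrayList<>(results.entrySet());
        sorted.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        return sorted;
    }
}
```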
Finally, I found something:
https://github.com/srijiths/jtopia
It uses POS tagging from OpenNLP or Stanford NLP and has an ASL 2.0 (Apache) license. I haven't measured precision and recall yet, but in my opinion it delivers great results.
Here is my code:
Configuration.setTaggerType("openNLP");
Configuration.setSingleStrength(6);
Configuration.setNoLimitStrength(5);
// if tagger type is "openNLP" then give the openNLP POS tagger path
//Configuration.setModelFileLocation("model/openNLP/en-pos-maxent.bin");
// if tagger type is "default" then give the default POS lexicon file
//Configuration.setModelFileLocation("model/default/english-lexicon.txt");
// if tagger type is "stanford "
Configuration.setModelFileLocation("Dont need that here");
Configuration.setPipeline(pipeline);
TermsExtractor termExtractor = new TermsExtractor();
TermDocument topiaDoc = termExtractor.extractTerms(text);
//logger.info("Extracted terms : " + topiaDoc.getExtractedTerms());
Map<String, ArrayList<Integer>> finalFilteredTerms = topiaDoc.getFinalFilteredTerms();
List<KeywordsModel> keywords = new ArrayList<>();
for (Map.Entry<String, ArrayList<Integer>> e : finalFilteredTerms.entrySet()) {
KeywordsModel keyword = new KeywordsModel();
keyword.setLabel(e.getKey());
keywords.add(keyword);
}
I modified the Configuration file a bit so that the POSModel is loaded from the pipeline instance.

Parse string, using default methods

I have used the following code to extract text from .odt files:
public class OpenOfficeParser {
StringBuffer TextBuffer;
public OpenOfficeParser() {}
//Process text elements recursively
public void processElement(Object o) {
if (o instanceof Element) {
Element e = (Element) o;
String elementName = e.getQualifiedName();
if (elementName.startsWith("text")) {
if (elementName.equals("text:tab")) // add tab for text:tab
TextBuffer.append("\\t");
else if (elementName.equals("text:s")) // add space for text:s
TextBuffer.append(" ");
else {
List children = e.getContent();
Iterator iterator = children.iterator();
while (iterator.hasNext()) {
Object child = iterator.next();
//If Child is a Text Node, then append the text
if (child instanceof Text) {
Text t = (Text) child;
TextBuffer.append(t.getValue());
}
else
processElement(child); // Recursively process the child element
}
}
if (elementName.equals("text:p"))
TextBuffer.append("\\n");
}
else {
List non_text_list = e.getContent();
Iterator it = non_text_list.iterator();
while (it.hasNext()) {
Object non_text_child = it.next();
processElement(non_text_child);
}
}
}
}
public String getText(String fileName) throws Exception {
TextBuffer = new StringBuffer();
//Unzip the openOffice Document
ZipFile zipFile = new ZipFile(fileName);
Enumeration entries = zipFile.entries();
ZipEntry entry;
while(entries.hasMoreElements()) {
entry = (ZipEntry) entries.nextElement();
if (entry.getName().equals("content.xml")) {
TextBuffer = new StringBuffer();
SAXBuilder sax = new SAXBuilder();
Document doc = sax.build(zipFile.getInputStream(entry));
Element rootElement = doc.getRootElement();
processElement(rootElement);
break;
}
}
System.out.println("The text extracted from the OpenOffice document = " + TextBuffer.toString());
return TextBuffer.toString();
}
}
Now my problem occurs when using the string returned by the getText() method.
I ran the program and extracted some text from an .odt file; here is a piece of the extracted text:
(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....
So I tried this:
System.out.println( TextBuffer.toString().split("\\n"));
The output I received was:
substring: [Ljava.lang.String;@505bb829
I also tried this:
System.out.println( TextBuffer.toString().trim() );
but there was no change in the printed string.
Why does this happen?
What can I do to parse that string correctly?
And if I wanted to put each substring that ends with "\n\n" into array[i], how could I do that?
Edit:
Sorry, I made a mistake in the example because I forgot that split() returns an array.
The problem is that it returns an array with only one element, so what I'm asking is why doing this:
System.out.println(Arrays.toString(TextBuffer.toString().split("\\n")));
has no effect on the string I showed in the example.
Also, this:
System.out.println( TextBuffer.toString().trim() );
has no effect on the original string; it just prints the original string.
To explain why I want to use split(): I want to parse that string and put each substring that ends with "\n" into an array element. Here is an example:
My original string:
(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....
After parsing, I would print each element of the array, and the output should be:
line 1: (no hi virtual x oy)
line 2: house cat
line 3: open it
line 4: trying to
and so on...
If I understood your question correctly, I would do something like this:
String str = "(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....";
List<String> al = new ArrayList<>(Arrays.asList(str.split("\\n")));
al.removeAll(Arrays.asList("", null)); // remove empty or null strings
for (int i = 0; i< al.size(); i++) {
System.out.println("Line " + i + " : " + al.get(i).trim());
}
Output
Line 0 : (no hi virtual x oy)
Line 1 : house cat
Line 2 : open it
Line 3 : trying to....
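The answer above works on a string literal that contains real newline characters. The reason split("\\n") and trim() appear to do nothing on the extracted text lies in the parser itself: TextBuffer.append("\\n") appends the two characters backslash and n, not a newline (and append("\\t") likewise appends backslash and t). The sample output shows exactly those literal characters, so there is no real line break to split on or trim. The cleaner fix is to append "\n" and "\t" in processElement. If the text has already been produced, here is a sketch that splits on the literal two-character sequence instead (LiteralNewlineSplit is an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralNewlineSplit {

    /** Splits on the two-character sequence backslash + 'n' and drops blank pieces. */
    static List<String> splitOnLiteralBackslashN(String s) {
        List<String> lines = new ArrayList<>();
        // The Java literal "\\\\n" is the regex \\n: a literal backslash followed by 'n'
        for (String part : s.split("\\\\n")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }
}
```

With the parser changed to append a real "\n", the split("\\n") code from the answer works directly on TextBuffer.toString().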

Insert a line break inside a paragraph in XWPFDocument

I am writing values into a Word template using Apache POI 3.8. I replace specific strings (keys) in a Word file with the required values; e.g. the Word document has a paragraph containing the key %Entry1%, and I want to replace it with "Entry text line1 \nnew line". All replaced keys and values are stored in a Map in my implementation.
Map<String, String> replacedElementsMap;
The code for HWPFDocument is:
Range range = document.getRange();
for(Map.Entry<String, String> entry : replacedElementsMap.entrySet()) {
range.replaceText(entry.getKey(), entry.getValue());
}
This code works fine; I just have to put \n in the entry string to get a line break. However, I can't find a similar method for XWPFDocument. My current code for XWPFDocument is:
List<XWPFParagraph> xwpfParagraphs = document.getParagraphs();
for(XWPFParagraph xwpfParagraph : xwpfParagraphs) {
List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
for(XWPFRun xwpfRun : xwpfRuns) {
String xwpfRunText = xwpfRun.getText(xwpfRun.getTextPosition());
for(Map.Entry<String, String> entry : replacedElementsMap.entrySet()) {
if (xwpfRunText != null && xwpfRunText.contains(entry.getKey())) {
xwpfRunText = xwpfRunText.replaceAll(entry.getKey(), entry.getValue());
}
}
xwpfRun.setText(xwpfRunText, 0);
}
}
Now the "\n" in the string doesn't result in a line break, and if I use xwpfRun.addCarriageReturn() I just get a line break after the paragraph. How should I create new lines in XWPF correctly?
I have another solution, and it is easier:
if (data.contains("\n")) {
String[] lines = data.split("\n");
run.setText(lines[0], 0); // set first line into XWPFRun
for(int i=1;i<lines.length;i++){
// add break and insert new text
run.addBreak();
run.setText(lines[i]);
}
} else {
run.setText(data, 0);
}
In the end, I had to create paragraphs manually. Basically, I split the replacement string into an array and create a new paragraph for each array element. Here is the code:
protected void replaceElementInParagraphs(List<XWPFParagraph> xwpfParagraphs,
Map<String, String> replacedMap) {
if (!searchInParagraphs(xwpfParagraphs, replacedMap)) {
replaceElementInParagraphs(xwpfParagraphs, replacedMap);
}
}
private boolean searchInParagraphs(List<XWPFParagraph> xwpfParagraphs, Map<String, String> replacedMap) {
for(XWPFParagraph xwpfParagraph : xwpfParagraphs) {
List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
for(XWPFRun xwpfRun : xwpfRuns) {
String xwpfRunText = xwpfRun.getText(xwpfRun.getTextPosition());
for(Map.Entry<String, String> entry : replacedMap.entrySet()) {
if (xwpfRunText != null && xwpfRunText.contains(entry.getKey())) {
if (entry.getValue().contains("\n")) {
String[] paragraphs = entry.getValue().split("\n");
entry.setValue("");
createParagraphs(xwpfParagraph, paragraphs);
return false;
}
xwpfRunText = xwpfRunText.replace(entry.getKey(), entry.getValue()); // literal replace, no regex
}
}
xwpfRun.setText(xwpfRunText, 0);
}
}
return true;
}
private void createParagraphs(XWPFParagraph xwpfParagraph, String[] paragraphs) {
if(xwpfParagraph!=null){
for (int i = 0; i < paragraphs.length; i++) {
XmlCursor cursor = xwpfParagraph.getCTP().newCursor();
XWPFParagraph newParagraph = document.insertNewParagraph(cursor);
newParagraph.setAlignment(xwpfParagraph.getAlignment());
newParagraph.getCTP().insertNewR(0).insertNewT(0).setStringValue(paragraphs[i]);
newParagraph.setNumID(xwpfParagraph.getNumID());
}
document.removeBodyElement(document.getPosOfParagraph(xwpfParagraph));
}
}

In ArrayList, how to remove a subdirectory if its parent is already present in the list?

I have an ArrayList<String> containing paths of directories, like:
/home, /usr...
I want to write code that removes every path from the list if the list already contains a parent directory of that element.
For example:
If the list contains:
/home
/home/games
then, /home/games should get removed as its parent /home is already in the list.
Below is the code:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    if (checkedList.contains(f.getParent())) {
        checkedList.remove(i);
        i--; // the next element has shifted into index i
    }
}
Above, checkedList is an ArrayList of Strings.
The problem comes when the list contains:
/home
/home/games/minesweeper
Now the minesweeper folder will not get removed, as its parent games is not in the list. How can I remove these kinds of elements too?
Another possible solution would be to use String.startsWith(String).
But of course you could take advantage of the parent functionality of the File class to handle relative directories and other particularities. Here is a draft of the solution:
List<String> listOfDirectories = new ArrayList<String>();
listOfDirectories.add("/home/user/tmp/test");
listOfDirectories.add("/home/user");
listOfDirectories.add("/tmp");
listOfDirectories.add("/etc/test");
listOfDirectories.add("/etc/another");
List<String> result = new ArrayList<String>();
for (int i = 0; i < listOfDirectories.size(); i++) {
File current = new File(listOfDirectories.get(i));
File parent = current;
while ((parent = parent.getParentFile()) != null) {
if (listOfDirectories.contains(parent.getAbsolutePath())) {
current = parent;
}
}
String absolutePath = current.getAbsolutePath();
if (!result.contains(absolutePath)) {
result.add(absolutePath);
}
}
System.out.println(result);
This would print:
[/home/user, /tmp, /etc/test, /etc/another]
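A plain String.startsWith check has a pitfall: "/homework".startsWith("/home") is true, so /homework would be treated as a subdirectory of /home. The following sketch avoids that and also handles the multi-level case from the question by comparing java.nio.file.Path objects, whose startsWith method matches whole path elements. It builds a new list instead of removing from checkedList while iterating over it. RemoveNestedDirectories and removeNested are illustrative names, and the example assumes Unix-style paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RemoveNestedDirectories {

    /** Keeps only the directories that are not inside another directory of the list. */
    static List<String> removeNested(List<String> dirs) {
        List<Path> paths = new ArrayList<>();
        for (String d : dirs) {
            paths.add(Paths.get(d).toAbsolutePath().normalize());
        }
        List<String> result = new ArrayList<>();
        for (Path candidate : paths) {
            boolean nested = false;
            for (Path other : paths) {
                // Path.startsWith compares whole name elements: /homework is NOT under /home
                if (!candidate.equals(other) && candidate.startsWith(other)) {
                    nested = true;
                    break;
                }
            }
            // skip nested entries and exact duplicates
            if (!nested && !result.contains(candidate.toString())) {
                result.add(candidate.toString());
            }
        }
        return result;
    }
}
```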
You can do some string manipulation to get the base directory of each string.
String path = checkedList.get(i);
int baseIndex = path.indexOf("/", 1);
// only take the substring when a second '/' exists; substring(0, -1) would throw
if (baseIndex != -1 && checkedList.contains(path.substring(0, baseIndex))) {
    checkedList.remove(i);
    i--; // the next element has shifted into index i
}
This gets the index of the second '/' and extracts the string up to that slash. If the second slash exists, it checks whether the list contains the base string and removes the current string if there's a match.
You can extract the root from your string and add it to a HashSet.
For example:
if you have /home/games, you can extract "home" from the string using substring operations, a regular expression, or whatever you want.
Before you add "home" to the HashSet, you must check whether it's already there:
if (hashSet.contains("home")) {
    // then it's already added
} else {
    hashSet.add("home");
}
Would doing the opposite work? If the parent is NOT found in your ArrayList, add the value to a final output ArrayList:
for (int i = 0; i < checkedList.size(); i++) {
    File f = new File(checkedList.get(i));
    if (!checkedList.contains(f.getParent())) {
        yourOutputList.add(checkedList.get(i));
    }
}
You should check every parent of each list item in turn.
I will assume that your list contains normalized, absolute-path File objects:
for (int i = 0; i < checkedList.size(); i++) {
    File curItem = checkedList.get(i);
    for (File curParent = curItem.getParentFile();
            curParent != null;
            curParent = curParent.getParentFile()) {
        if (checkedList.contains(curParent)) {
            checkedList.remove(i);
            i--; // the next element has shifted into index i
            break;
        }
    }
}
Actually, I would rewrite it with a ListIterator:
for (ListIterator<File> iter = checkedList.listIterator(); iter.hasNext(); ) {
    File curItem = iter.next();
    for (File curParent = curItem.getParentFile();
            curParent != null;
            curParent = curParent.getParentFile()) {
        if (checkedList.contains(curParent)) {
            iter.remove();
            break;
        }
    }
}
