I want to replace tags with values in a docx document.
Here is a line from the document:
<Site_rattachement>, le <date_avenant>
I want to replace <Site_rattachement> and <date_avenant> with actual values.
My code:
doc = new XWPFDocument(OPCPackage.open(docxFile));
for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            replaceIfNeeded(r, text, my_value);
        }
    }
}
But the first call to r.getText(0) returns < instead of <Site_rattachement>.
The next call returns Site_rattachement.
The one after that returns >.
Is there something wrong with my docx file?
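Nothing is necessarily wrong with the file: Word often stores what looks like one piece of text as several consecutive runs, so a tag can end up spread over more than one of them. Here is a minimal sketch of one way to cope, using plain strings as stand-ins for the run texts. The class and method names are hypothetical and no POI calls are involved. It joins the fragments, replaces the tag in the joined text, stores the result in the first fragment and blanks the others (which, applied to real runs, would give the whole text the formatting of the first run).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunMergeSketch {

    // Joins all fragments, replaces the tag in the joined text, and stores the
    // result in the first fragment; every other fragment becomes an empty string.
    // If the tag is not present, a plain copy of the input is returned.
    static List<String> replaceAcrossFragments(List<String> fragments, String tag, String value) {
        List<String> result = new ArrayList<>(fragments);
        if (result.isEmpty()) {
            return result;
        }
        StringBuilder joined = new StringBuilder();
        for (String fragment : result) {
            if (fragment != null) { // a run may carry no text at all
                joined.append(fragment);
            }
        }
        String text = joined.toString();
        if (!text.contains(tag)) {
            return result;
        }
        result.set(0, text.replace(tag, value));
        for (int i = 1; i < result.size(); i++) {
            result.set(i, "");
        }
        return result;
    }

    public static void main(String[] args) {
        // The fragments mimic what r.getText(0) returned in the question.
        List<String> runs = Arrays.asList("<", "Site_rattachement", ">", ", le ", "<date_avenant>");
        System.out.println(replaceAcrossFragments(runs, "<Site_rattachement>", "Paris"));
    }
}
```

With real runs, the same idea would mean writing the replaced text into the first run and an empty string into the remaining ones.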
I have an ArrayList that I want to use to create a new bullet list inside a document.
I already have a numbered list, and I want to have both kinds (numbered and bulleted) as separate lists.
My document is pre-populated with some data, and tokens mark where my data should go. For my list, I have a token like the one below, and I am able to locate it.
{{tokenlist1}}
I want to do one of the following:
first option: locate my token, create a new bullet list, and delete the token
second option: replace my token with my first element and continue the bullet list from there.
It would be really appreciated if the bullet style (square, round, check, ...) could stay the same as it is for the token.
EDIT
For those who want an answer, here is my solution.
Action
Map<String, Object> replacements = new HashMap<String, Object>();
replacements.put("{{token1}}", "texte changé 1");
replacements.put("{{token2}}", "ici est le texte du token numéro 2");
replacements.put("{{tokenList1}}", tokenList1);
replacements.put("{{tokenList2}}", tokenList1);
templateWithToken = reportService.findAndReplaceToken(replacements, templateWithToken);
Service
public XWPFDocument findAndReplaceToken (Map<String, Object> replacements,
XWPFDocument document) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (int i = 0; i < paragraphs.size(); i++) {
XWPFParagraph paragraph = paragraphs.get(i);
List<XWPFRun> runs = paragraph.getRuns();
for (Map.Entry<String, Object> replPair : replacements
.entrySet()) {
String find = replPair.getKey();
Object repl = replPair.getValue();
TextSegment found =
paragraph.searchText(find, new PositionInParagraph());
if (found != null) {
if (repl instanceof String) {
replaceText(found, runs, find, repl);
} else if (repl instanceof ArrayList<?>) {
Iterator<?> iterArrayList =
((ArrayList) repl).iterator();
boolean isPassed = false;
while (iterArrayList.hasNext()) {
Object object = (Object) iterArrayList.next();
if (isPassed == false) {
replaceText(found, runs, find,
object.toString());
} else {
XWPFRun run = paragraph.createRun();
run.addCarriageReturn();
run.setText(object.toString());
}
isPassed = true;
}
}
}
}
}
return document;
}
private void replaceText(TextSegment found, List<XWPFRun> runs,
String find, Object repl) {
if (found.getBeginRun() == found.getEndRun()) {
// whole search string is in one Run
XWPFRun run = runs.get(found.getBeginRun());
String runText = run.getText(run.getTextPosition());
String replaced = runText.replace(find, repl.toString());
run.setText(replaced, 0);
} else {
// The search string spans over more than one Run
// Put the Strings together
StringBuilder b = new StringBuilder();
for (int runPos = found.getBeginRun(); runPos <= found
.getEndRun(); runPos++) {
XWPFRun run = runs.get(runPos);
b.append(run.getText(run.getTextPosition()));
}
String connectedRuns = b.toString();
String replaced = connectedRuns.replace(find, repl.toString());
// The first Run receives the replaced String of all
// connected Runs
XWPFRun partOne = runs.get(found.getBeginRun());
partOne.setText(replaced, 0);
// Removing the text in the other Runs.
for (int runPos = found.getBeginRun() + 1; runPos <= found
.getEndRun(); runPos++) {
XWPFRun partNext = runs.get(runPos);
partNext.setText("", 0);
}
}
}
I am trying to read a Word document template and replace the variables in it with user-supplied data, without changing the headings or styles of the template. I am not sure whether this is the right approach, but this is how I started:
XWPFDocument docx = new XWPFDocument(
        new FileInputStream(
                "D://TestDocumentPrep/src/XXXXX_TestReport_URL_Document.docx"));
XWPFWordExtractor we = new XWPFWordExtractor(docx);
String textData = we.getText();
String newTestData = textData.replace("$var_source_code$", list.get(1))
        .replace("$var_rsvp_code$", list.get(2))
        .replace("$var_ssn$", list.get(3))
        .replace("$var_zip_code$", list.get(4))
        .replace("$var_point_for_business$",
                anotherData.getPointForBusiness())
        .replace("$var_E1_url$", anotherData.getE1url())
        .replace("$var_E2_url$", anotherData.getE2url())
        .replace("$var_E3_url$", anotherData.getE3url());
System.out.println(newTestData);
This is what I have done so far. But it reads the content of the Word document as a plain string and replaces the variables there. How do I put the replaced string back into the Word document while keeping the template's formatting?
Here I found something but Not exactly my solution
Here also I found something but not exact solution
Hi, I was able to find a solution.
Here is the code I used to edit my Word document. It works with both the .doc and .docx versions of the file I want to edit, and it generates a new, edited Word document without changing the base template.
public void wordDocProcessor(AnotherVO anotherData, ArrayList<String> list,
String sourse, String destination) throws IOException,
InvalidFormatException {
XWPFDocument doc = new XWPFDocument(OPCPackage.open(sourse
+ "XXXXX_TestReport_URL_Document.doc"));
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null
&& text.contains("var_source_code")) {
text = text.replace("var_source_code",
list.get(1));
r.setText(text, 0);
}
if (text != null && text.contains("var_rsvp_code")) {
text = text.replace("var_rsvp_code",
list.get(2));
r.setText(text, 0);
}
if (text != null && text.contains("var_ssn")) {
text = text.replace("var_ssn", list.get(3));
r.setText(text, 0);
}
if (text != null && text.contains("var_zip_code")) {
text = text
.replace("var_zip_code", list.get(4));
r.setText(text, 0);
}
if (text != null
&& text.contains("var_point_for_business")) {
text = text.replace("var_point_for_business",
anotherData.getPointForBusiness());
r.setText(text, 0);
}
if (text != null && text.contains("var_E1_url")) {
text = text.replace("var_E1_url",
anotherData.getE1url());
r.setText(text, 0);
}
if (text != null && text.contains("var_E2_url")) {
text = text.replace("var_E2_url",
anotherData.getE2url());
r.setText(text, 0);
}
if (text != null && text.contains("var_E3_url")) {
text = text.replace("var_E3_url",
anotherData.getE3url());
r.setText(text, 0);
}
}
}
}
}
}
// Close the output stream once the document has been written
try (FileOutputStream out = new FileOutputStream(destination + list.get(0)
        + "_TestReport_URL_Document.doc")) {
    doc.write(out);
}
}
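The repeated if blocks above all follow the same pattern, so they could be driven by a map of placeholder names to values. A minimal sketch with plain strings, where the class name, the method name and the sample values are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PlaceholderMapSketch {

    // Applies every placeholder -> value pair to the text using a literal
    // (non-regex) replacement. Returns null for null input, mirroring the
    // "text != null" checks in the code above.
    static String applyReplacements(String text, Map<String, String> replacements) {
        if (text == null) {
            return null;
        }
        String result = text;
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("var_ssn", "123-45-6789");  // sample value
        replacements.put("var_zip_code", "90210");   // sample value
        System.out.println(applyReplacements("SSN: var_ssn, ZIP: var_zip_code", replacements));
    }
}
```

Inside the run loop, the result would be written back with r.setText(newText, 0) whenever r.getText(0) is not null.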
I have used the following code to extract text from .odt files:
public class OpenOfficeParser {
StringBuffer TextBuffer;
public OpenOfficeParser() {}
//Process text elements recursively
public void processElement(Object o) {
if (o instanceof Element) {
Element e = (Element) o;
String elementName = e.getQualifiedName();
if (elementName.startsWith("text")) {
if (elementName.equals("text:tab")) // add tab for text:tab
TextBuffer.append("\\t");
else if (elementName.equals("text:s")) // add space for text:s
TextBuffer.append(" ");
else {
List children = e.getContent();
Iterator iterator = children.iterator();
while (iterator.hasNext()) {
Object child = iterator.next();
//If Child is a Text Node, then append the text
if (child instanceof Text) {
Text t = (Text) child;
TextBuffer.append(t.getValue());
}
else
processElement(child); // Recursively process the child element
}
}
if (elementName.equals("text:p"))
TextBuffer.append("\\n");
}
else {
List non_text_list = e.getContent();
Iterator it = non_text_list.iterator();
while (it.hasNext()) {
Object non_text_child = it.next();
processElement(non_text_child);
}
}
}
}
public String getText(String fileName) throws Exception {
TextBuffer = new StringBuffer();
//Unzip the openOffice Document
ZipFile zipFile = new ZipFile(fileName);
Enumeration entries = zipFile.entries();
ZipEntry entry;
while(entries.hasMoreElements()) {
entry = (ZipEntry) entries.nextElement();
if (entry.getName().equals("content.xml")) {
TextBuffer = new StringBuffer();
SAXBuilder sax = new SAXBuilder();
Document doc = sax.build(zipFile.getInputStream(entry));
Element rootElement = doc.getRootElement();
processElement(rootElement);
break;
}
}
System.out.println("The text extracted from the OpenOffice document = " + TextBuffer.toString());
return TextBuffer.toString();
}
}
My problem occurs when I use the string returned by the getText() method.
I ran the program and extracted some text from an .odt file; here is part of the extracted text:
(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....
So I tried this
System.out.println( TextBuffer.toString().split("\\n"));
The output I received was:
substring: [Ljava.lang.String;@505bb829
I also tried this:
System.out.println( TextBuffer.toString().trim() );
but the printed string did not change.
Why does this happen?
How can I parse that string correctly?
And if I wanted to store each substring that ends with "\n\n" in array[i], how could I do that?
edit:
Sorry, I made a mistake in the example: I forgot that split() returns an array.
The problem is that it returns an array with a single element, so what I am asking is why this:
System.out.println(Arrays.toString(TextBuffer.toString().split("\\n")));
has no effect on the string from my example.
Also this:
System.out.println( TextBuffer.toString().trim() );
has no effect on the original string; it just prints the original string.
I want to explain why I want to use split(): I want to parse that string and put each substring that ends with "\n" into an array element. Here is an example:
My original string:
(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....
After parsing, I would print each element of the array, and the output should be:
line 1: (no hi virtual x oy)\
line 2: house cat
line 3: open it
line 4: trying to
and so on.....
If I understood your question correctly, I would do something like this:
String str = "(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....";
List<String> al = new ArrayList<String>(Arrays.asList(str.split("\\n")));
al.removeAll(Arrays.asList("", null)); // remove empty or null strings
for (int i = 0; i < al.size(); i++) {
    System.out.println("Line " + i + " : " + al.get(i).trim());
}
Output
Line 0 : (no hi virtual x oy)
Line 1 : house cat
Line 2 : open it
Line 3 : trying to....
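One caveat about the example above: its str contains real newline characters. The sample printed in the question shows the two visible characters \ and n, which is what appending "\\n" to the buffer produces. If that is really what the extracted text contains, the regex "\\n" (a newline) never matches, and split() returns the whole string as a single element. A minimal sketch of splitting on the literal backslash-n instead (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class LiteralNewlineSketch {

    // Splits on the two-character sequence backslash + 'n' (not on a real
    // newline), trims each piece and drops the empty ones.
    static List<String> splitOnLiteralBackslashN(String text) {
        List<String> lines = new ArrayList<>();
        // "\\\\n" in Java source is the regex \\n: a literal backslash followed by n
        for (String part : text.split("\\\\n")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // "\\n" in a Java string literal is a backslash followed by 'n'
        String extracted = "(no hi virtual x oy)\\n\\n house cat \\n open it \\n\\n trying to....";

        // The regex "\\n" matches a real newline, which this text does not contain,
        // so the whole string comes back as one element.
        System.out.println(extracted.split("\\n").length);

        List<String> lines = splitOnLiteralBackslashN(extracted);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println("line " + (i + 1) + ": " + lines.get(i));
        }
    }
}
```

The other way around would be to append "\n" (a real newline) in processElement; the split("\\n") shown above then works unchanged.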
I am writing values into a Word template using Apache POI 3.8. I replace specific strings (keys) in a Word file with the required values; e.g. the Word document has a paragraph containing the key %Entry1%, and I want to replace it with "Entry text line1 \nnew line". All keys and their replacement values are stored in a Map in my implementation.
Map<String, String> replacedElementsMap;
The code for HWPFDocument is:
Range range = document.getRange();
for (Map.Entry<String, String> entry : replacedElementsMap.entrySet()) {
    range.replaceText(entry.getKey(), entry.getValue());
}
This code works fine; I just have to put \n in the replacement string for a line break. However, I can't find a similar method for XWPFDocument. My current code for XWPFDocument is:
List<XWPFParagraph> xwpfParagraphs = document.getParagraphs();
for (XWPFParagraph xwpfParagraph : xwpfParagraphs) {
    List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
    for (XWPFRun xwpfRun : xwpfRuns) {
        String xwpfRunText = xwpfRun.getText(xwpfRun.getTextPosition());
        for (Map.Entry<String, String> entry : replacedElementsMap.entrySet()) {
            if (xwpfRunText != null && xwpfRunText.contains(entry.getKey())) {
                xwpfRunText = xwpfRunText.replaceAll(entry.getKey(), entry.getValue());
            }
        }
        xwpfRun.setText(xwpfRunText, 0);
    }
}
Now the "\n" in the string doesn't produce a line break, and if I use xwpfRun.addCarriageReturn(); I only get a line break after the paragraph. How do I create new lines in XWPF correctly?
I have another solution and it is easier:
if (data.contains("\n")) {
    String[] lines = data.split("\n");
    run.setText(lines[0], 0); // set first line into XWPFRun
    for (int i = 1; i < lines.length; i++) {
        // add break and insert new text
        run.addBreak();
        run.setText(lines[i]);
    }
} else {
    run.setText(data, 0);
}
In the end, I had to create the paragraphs manually. Basically, I split the replacement string into an array and create a new paragraph for each array element. Here is the code:
protected void replaceElementInParagraphs(List<XWPFParagraph> xwpfParagraphs,
Map<String, String> replacedMap) {
if (!searchInParagraphs(xwpfParagraphs, replacedMap)) {
replaceElementInParagraphs(xwpfParagraphs, replacedMap);
}
}
private boolean searchInParagraphs(List<XWPFParagraph> xwpfParagraphs, Map<String, String> replacedMap) {
for(XWPFParagraph xwpfParagraph : xwpfParagraphs) {
List<XWPFRun> xwpfRuns = xwpfParagraph.getRuns();
for(XWPFRun xwpfRun : xwpfRuns) {
String xwpfRunText = xwpfRun.getText(xwpfRun.getTextPosition());
for(Map.Entry<String, String> entry : replacedMap.entrySet()) {
if (xwpfRunText != null && xwpfRunText.contains(entry.getKey())) {
if (entry.getValue().contains("\n")) {
String[] paragraphs = entry.getValue().split("\n");
entry.setValue("");
createParagraphs(xwpfParagraph, paragraphs);
return false;
}
xwpfRunText = xwpfRunText.replaceAll(entry.getKey(), entry.getValue());
}
}
xwpfRun.setText(xwpfRunText, 0);
}
}
return true;
}
private void createParagraphs(XWPFParagraph xwpfParagraph, String[] paragraphs) {
if(xwpfParagraph!=null){
for (int i = 0; i < paragraphs.length; i++) {
XmlCursor cursor = xwpfParagraph.getCTP().newCursor();
XWPFParagraph newParagraph = document.insertNewParagraph(cursor);
newParagraph.setAlignment(xwpfParagraph.getAlignment());
newParagraph.getCTP().insertNewR(0).insertNewT(0).setStringValue(paragraphs[i]);
newParagraph.setNumID(xwpfParagraph.getNumID());
}
document.removeBodyElement(document.getPosOfParagraph(xwpfParagraph));
}
}
I am trying to scrape a list of medicines from a website.
I am using Jsoup to parse the HTML.
Here is my code:
URL url = new URL("http://www.medindia.net/drug-price/index.asp?alpha=a");
Document doc1 = Jsoup.parse(url, 0);
Elements rows = doc1.getElementsByAttributeValue("style", "padding-left:5px;border-right:1px solid #A5A5A5;");
for (Element row : rows) {
    String htm = row.text();
    if (!(htm.equals("View Price") || htm.contains("Show Details"))) {
        System.out.println(htm);
        System.out.println();
    }
}
Here is the output that I am getting:
P.S. This is not the complete output; I couldn't take a screenshot of the complete output, so I am only showing part of it.
I need to know two things:
Question 1. Why am I getting an extra space in front of each drug name, and why am I getting an extra new line after some drug names?
Question 2. How do I resolve this issue?
A few things:
It's not the complete output because there's more than one page. I added a for loop that handles that for you.
You should probably trim the output using htm.trim().
You should probably skip printing when the text is empty (!htm.isEmpty()).
That website uses a non-breaking space (character code 160), which is outside the ASCII range and is not removed by trim(). I added a small fix for that (with .replace).
Here's the fixed code:
for (char page = 'a'; page <= 'z'; page++) {
    String urlString = String.format("http://www.medindia.net/drug-price/index.asp?alpha=%c", page);
    URL url = new URL(urlString);
    Document doc1 = Jsoup.parse(url, 0);
    Elements rows = doc1.getElementsByAttributeValue("style", "padding-left:5px;border-right:1px solid #A5A5A5;");
    for (Element row : rows) {
        String htm = row.text().replace((char) 160, ' ').trim();
        if (!(htm.equals("View Price") || htm.contains("Show Details")) && !htm.isEmpty()) {
            System.out.println(htm.trim());
            System.out.println();
        }
    }
}
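Why the .replace is needed in addition to trim(): String.trim() only strips characters with a code of 32 (space) or lower, so a non-breaking space (code 160) survives it. A minimal sketch with a made-up cell value (the class and method names are hypothetical):

```java
class NbspTrimSketch {

    // Turns non-breaking spaces (U+00A0, decimal 160) into ordinary spaces,
    // then trims, so they are removed from both ends.
    static String normalize(String s) {
        return s.replace('\u00A0', ' ').trim();
    }

    public static void main(String[] args) {
        String cell = "\u00A0Abacavir\u00A0"; // sample cell text, not taken from the site

        // trim() alone leaves both non-breaking spaces in place: the length stays 10
        System.out.println(cell.trim().length());

        // after normalizing, only the 8 letters remain
        System.out.println(normalize(cell).length());
    }
}
```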
One thing to do:
Use trim() in the println call: System.out.println(htm.trim());
UPDATED:
After a lot of effort I was able to parse all 80 medicines like this:
URL url = new URL("http://www.medindia.net/drug-price/index.asp?alpha=a");
Document doc1 = Jsoup.parse(url, 0);
Elements rows = doc1.select("td.ta13blue");
Elements rows1 = doc1.select("td.ta13black.tbold");
int cnt=0;
for(Element row : rows){
cnt++;
String htm = row.text().trim();
if(!(htm.equals("View Price")||htm.contains("Show Details") || htm.startsWith("Drug"))) {
System.out.println(cnt+" : "+htm);
System.out.println();
}
}
for(Element row1 : rows1){
cnt++;
String htm = row1.text().trim();
if(!(htm.equals("View Price")||htm.contains("Show Details") || htm.startsWith("Drug"))) {
System.out.println(cnt+" : "+htm);
System.out.println();
}
}
1) Selecting elements by their style attribute is quite fragile;
2) Calling a list of FIELDS (cells) "rows" is even more dangerous :)
3) If you open the page, you can see that the extra lines are added ONLY after the "black names", i.e. item names not wrapped in an anchor link.
Your problem is that the second field in those rows is neither Show Details nor View Price, and it is not empty either... it is:
<td bgcolor="#FFFFDB" align="center"
style="padding-left:5px;border-right:1px solid #A5A5A5;">
</td>
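It is a string containing a single space character. If that space is a non-breaking space (character code 160), trim() alone will not remove it, so convert it to a normal space first. Modify your code like this:
for (Element row : rows) {
    // Convert non-breaking spaces to plain spaces so that trim() can strip them
    String htm = row.text().replace((char) 160, ' ').trim(); // <!-- This one
    if (!(htm.equals("View Price")
            || htm.contains("Show Details")
            || htm.isEmpty())) { // <!-- And this one
        System.out.println(htm);
        System.out.println();
    }
}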