I have a string that alternates between text and chapter marks. I'd like to have it in a key-value-array where the key is the chapter name and the value is the chapter content. The text looks like this:
<chapter name="First chapter" />
Lorem ipsum dolor sit amet, consetetur sadipscing elitr.
<chapter name="Second chapter" />
Sed diam nonumy eirmod tempor invidunt ut labore et.
<chapter name="Third chapter" />
Dolore magna aliquyam erat, sed diam voluptua.
The resulting array is supposed to look like this:
[
{"First chapter", "Lorem ipsum dolor sit amet, consetetur sadipscing elitr."},
{"Second chapter", "Sed diam nonumy eirmod tempor invidunt ut labore et."},
{"Third chapter", "Dolore magna aliquyam erat, sed diam voluptua."}
]
How can I do this?
You can use a regular expression to locate the chapter name and the content. Your case is well suited to that.
The link below has a summary of regex in Java.
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
As suggested by @devd in this posting, a solution to the above case is XPath. There is an example here.
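The regular-expression approach can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name ChapterSplitter and the method split are hypothetical, and it assumes every chapter mark has exactly the form <chapter name="..." /> with the name in double quotes. Text before the first mark is ignored, and a repeated chapter name overwrites the earlier entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChapterSplitter {
    // Matches one chapter mark and captures the value of its name attribute.
    private static final Pattern MARK =
            Pattern.compile("<chapter\\s+name=\"([^\"]*)\"\\s*/>");

    // Returns chapter name -> chapter content, in document order.
    static Map<String, String> split(String text) {
        Map<String, String> chapters = new LinkedHashMap<>();
        Matcher m = MARK.matcher(text);
        String name = null;    // name of the chapter currently being read
        int contentStart = 0;  // index just after the current chapter mark
        while (m.find()) {
            if (name != null) {
                // The content of the previous chapter ends where this mark starts.
                chapters.put(name, text.substring(contentStart, m.start()).trim());
            }
            name = m.group(1);
            contentStart = m.end();
        }
        if (name != null) {
            // The last chapter runs to the end of the text.
            chapters.put(name, text.substring(contentStart).trim());
        }
        return chapters;
    }

    public static void main(String[] args) {
        String text = "<chapter name=\"First chapter\" />\n"
                + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr.\n"
                + "<chapter name=\"Second chapter\" />\n"
                + "Sed diam nonumy eirmod tempor invidunt ut labore et.\n";
        split(text).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A LinkedHashMap is used so that the chapters keep the order in which they appear in the text.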
I am trying to stitch together multiple multi-line strings together to create the effect of several columns of text. Consider the three text blocks below:
Lorem ipsum dolor si
t amet, consectetur
adipiscing elit, sed
do eiusmod tempor in
cididunt ut labore e
t dolore magna aliqu
a.
Volutpat consequat m
auris nunc congue ni
si vitae. Sed risus
ultricies tristique
nulla aliquet enim t
ortor at auctor.
Urna porttitor rhonc
us dolor purus non.
Interdum varius sit
amet mattis vulputat
e enim nulla.
The block width is fixed at 20 characters. Ignore the wrapping of words.
What I want to do is stitch or append these separate multi-line strings together to produce the following:
Lorem ipsum dolor si Volutpat consequat m Urna porttitor rhonc
t amet, consectetur auris nunc congue ni us dolor purus non.
adipiscing elit, sed si vitae. Sed risus Interdum varius sit
do eiusmod tempor in ultricies tristique amet mattis vulputat
cididunt ut labore e nulla aliquet enim t e enim nulla.
t dolore magna aliqu ortor at auctor.
a.
In this case, the column spacing is 4 characters wide.
Is anyone aware of a Java library or utility that facilitates this? If not implemented in Java, is there anything that could do this, that could be invoked from Java code?
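No library is named here, but the stitching described above is small enough to sketch in plain Java. This is only a sketch under assumptions: the class name ColumnStitcher and the method stitch are hypothetical, each block is assumed to be a string whose lines are no longer than the column width, and trailing spaces on each output row are dropped.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnStitcher {
    // Places the given multi-line blocks side by side.
    // width = fixed block width, gap = number of spaces between columns.
    static String stitch(int width, int gap, String... blocks) {
        List<String[]> columns = new ArrayList<>();
        int rowCount = 0;
        for (String block : blocks) {
            String[] lines = block.split("\\R"); // split on any line break
            columns.add(lines);
            rowCount = Math.max(rowCount, lines.length);
        }
        List<String> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            StringBuilder row = new StringBuilder();
            for (int c = 0; c < columns.size(); c++) {
                String[] lines = columns.get(c);
                // Shorter blocks contribute an empty cell on the remaining rows.
                row.append(r < lines.length ? lines[r] : "");
                if (c < columns.size() - 1) {
                    // Pad with spaces up to the start of the next column.
                    int nextColumnStart = (c + 1) * (width + gap);
                    while (row.length() < nextColumnStart) {
                        row.append(' ');
                    }
                }
            }
            // Drop trailing spaces left over from empty cells.
            int end = row.length();
            while (end > 0 && row.charAt(end - 1) == ' ') {
                end--;
            }
            row.setLength(end);
            rows.add(row.toString());
        }
        return String.join("\n", rows);
    }

    public static void main(String[] args) {
        String first = "Lorem ipsum dolor si\nt amet, consectetur\nadipiscing elit, sed";
        String second = "Volutpat consequat m\nauris nunc congue ni";
        System.out.println(stitch(20, 4, first, second));
    }
}
```

For the example in the question, the call would be stitch(20, 4, block1, block2, block3).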
Hello, how can I insert a floating text box at the very bottom of a Word document? I have attached an image. Thanks for your help.
This is not yet supported by the high-level XWPF classes of apache poi. But a *.docx file is simply a ZIP archive containing XML files in a directory structure. So we can create what we want using Word and then have a look at /word/document.xml in the resulting *.docx ZIP archive. Then we can try to reproduce that XML using the low-level ooxml-schemas classes.
The following example needs the full jar of all of the schemas ooxml-schemas-1.4.jar as mentioned in FAQ. It is tested using apache poi 4.1.1.
The example positions the text box at the bottom right of the page. But not all printers are able to print without borders, so the better choice would be mso-position-*-relative:margin instead of mso-position-*-relative:page. Then the page margins determine the bottom-right position.
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPicture;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTxbxContent;
import com.microsoft.schemas.vml.CTGroup;
import com.microsoft.schemas.vml.CTRect;
import com.microsoft.schemas.office.word.STWrapType;
import org.w3c.dom.Node;
public class CreateWordTextBox {
public static void main(String[] args) throws Exception {
String textBoxWidth = "200pt";
String textBoxHeight = "200pt";
XWPFDocument doc= new XWPFDocument();
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun run=paragraph.createRun();
run.setText("The Body text: ");
CTGroup ctGroup = CTGroup.Factory.newInstance();
CTRect ctRect = ctGroup.addNewRect();
ctRect.addNewWrap().setType(STWrapType.SQUARE);
ctRect.setStyle("position:absolute"
+ ";width:" + textBoxWidth
+ ";height:" + textBoxHeight
+ ";mso-position-horizontal:right"
+ ";mso-position-horizontal-relative:page"
//+ ";mso-position-horizontal-relative:margin"
+ ";mso-position-vertical:bottom"
+ ";mso-position-vertical-relative:page"
//+ ";mso-position-vertical-relative:margin"
);
CTTxbxContent ctTxbxContent = ctRect.addNewTextbox().addNewTxbxContent();
ctTxbxContent.addNewP().addNewR().addNewT().setStringValue("Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.");
Node ctGroupNode = ctGroup.getDomNode();
CTPicture ctPicture = CTPicture.Factory.parse(ctGroupNode);
run=paragraph.createRun();
CTR cTR = run.getCTR();
cTR.addNewPict();
cTR.setPictArray(0, ctPicture);
paragraph = doc.createParagraph();
FileOutputStream out = new FileOutputStream("WordTextBox.docx");
doc.write(out);
out.close();
}
}
I would like to have 2 sections on the same page in an XWPFDocument. The first section should have only 1 column, the second section should have 2 columns. Currently, I am using the following code:
CTBody body = document.getDocument().getBody();
// 1-column section
section = body.addNewSectPr();
columns = CTColumns.Factory.newInstance();
columns.setNum(new BigInteger("1"));
section.setCols(columns);
paragraph = document.createParagraph();
paragraph.getCTP().addNewPPr().setSectPr(section);
run = paragraph.createRun();
run.setText(firstSectionContent);
//2-column section
section = body.addNewSectPr();
columns = CTColumns.Factory.newInstance();
columns.setNum(new BigInteger("2"));
section.setCols(columns);
paragraph = document.createParagraph();
paragraph.getCTP().addNewPPr().setSectPr(section);
run = paragraph.createRun();
run.setText(secondSectionContent);
This produces 2 sections with the correct number of columns, but the sections are not on the same page. How can I apply a continuous section break instead of a next-page section break?
The CTSectPr needs a CTSectType whose value is STSectionMark.CONTINUOUS.
Example:
import java.io.File;
import java.io.FileOutputStream;
import java.math.BigInteger;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.Borders;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTColumns;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTColumn;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocGrid;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STDocGrid;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STSectionMark;
public class Word2ColumnPage {
public static void main(String[] args) throws Exception {
XWPFDocument document= new XWPFDocument();
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run=paragraph.createRun();
run.setText("One column on top. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.");
paragraph = document.createParagraph();
//paragraph with section setting for one column section above
paragraph = document.createParagraph();
CTSectPr ctSectPr = paragraph.getCTP().addNewPPr().addNewSectPr();
CTColumns ctColumns = ctSectPr.addNewCols();
ctColumns.setNum(BigInteger.valueOf(1));
//left column
paragraph = document.createParagraph();
run=paragraph.createRun();
run.setText("The left side");
paragraph = document.createParagraph();
run=paragraph.createRun();
run.setText("Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.");
paragraph = document.createParagraph();
// right column
//paragraph with column break
paragraph = document.createParagraph();
run=paragraph.createRun();
run.addBreak(BreakType.COLUMN);
run=paragraph.createRun();
run.setText("The right side");
//left border for the paragraphs on right side
paragraph.setBorderLeft(Borders.THREE_D_EMBOSS);
paragraph.getCTP().getPPr().getPBdr().getLeft().setSz(BigInteger.valueOf(20));
paragraph = document.createParagraph();
run=paragraph.createRun();
run.setText("Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.");
paragraph.setBorderLeft(Borders.THREE_D_EMBOSS);
paragraph.getCTP().getPPr().getPBdr().getLeft().setSz(BigInteger.valueOf(20));
paragraph = document.createParagraph();
paragraph.setBorderLeft(Borders.THREE_D_EMBOSS);
paragraph.getCTP().getPPr().getPBdr().getLeft().setSz(BigInteger.valueOf(20));
//paragraph with section break continuous for two column section above
paragraph = document.createParagraph();
ctSectPr = paragraph.getCTP().addNewPPr().addNewSectPr();
ctSectPr.addNewType().setVal(STSectionMark.CONTINUOUS);
ctColumns = ctSectPr.addNewCols();
ctColumns.setNum(BigInteger.valueOf(2));
ctColumns.setEqualWidth(STOnOff.OFF);
ctColumns.setSep(STOnOff.ON);
CTColumn ctColumn = ctColumns.addNewCol();
ctColumn.setW(BigInteger.valueOf(6000));
ctColumn.setSpace(BigInteger.valueOf(300));
ctColumn = ctColumns.addNewCol();
ctColumn.setW(BigInteger.valueOf(3000));
paragraph.setBorderLeft(Borders.THREE_D_EMBOSS);
paragraph.getCTP().getPPr().getPBdr().getLeft().setSz(BigInteger.valueOf(20));
paragraph = document.createParagraph();
run=paragraph.createRun();
run.setText("One column on bottom");
paragraph = document.createParagraph();
run=paragraph.createRun();
run.setText("Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.");
//section setting continuous for one column section above
CTDocument1 ctDocument = document.getDocument();
CTBody ctBody = ctDocument.getBody();
ctSectPr = ctBody.addNewSectPr();
ctSectPr.addNewType().setVal(STSectionMark.CONTINUOUS);
ctColumns = ctSectPr.addNewCols();
ctColumns.setNum(BigInteger.valueOf(1));
FileOutputStream out = new FileOutputStream("Word2ColumnPage.docx");
document.write(out);
out.close();
document.close();
}
}
Up to apache poi 4.1.2 this code needs the full jar of all of the schemas ooxml-schemas-*.jar as mentioned in FAQ-N10025.
The principles behind sections with multiple columns in Word are as follows:
By default, with no special settings, Word uses a one-column section.
If the section settings are to change, a paragraph carrying the section settings for the section above it is needed. All body elements above that paragraph use those settings. All body elements after that paragraph belong to a new section and use the settings of the next paragraph that carries section settings, or the section settings in the document body.
Finally, the section settings for the last section need to be in the body settings.
As of apache poi 5.0.0, org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff has been removed from poi-ooxml-full-5.0.0.jar, and CTColumns.setEqualWidth and CTColumns.setSep now take java.lang.Object as the parameter type.
So it now has to be:
//ctColumns.setEqualWidth(STOnOff.OFF);
ctColumns.setEqualWidth("0");
//ctColumns.setSep(STOnOff.ON);
ctColumns.setSep("1");
What I am trying to achieve is to match all words in a text, but ignore the words in any line that starts with 4 whitespaces.
Example
Text file to find words:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat.
    This must NOT be matched. Because it has 4 whitespaces at the beginning.
Lorem ipsum dolor sit amet. Ut enim ad minim veniam.
So, the words in the following line should NOT be considered matches for the pattern:
    This must NOT be matched. Because it has 4 whitespaces at the beginning.
Code
Here is my regex and it can find all words:
\\b[A-Za-z]+\\b
I know that Java's regex syntax has the ^ symbol for negation, but I only know how to use it in simpler expressions.
Maybe the following snippet could be a basis for what you want to achieve.
// requires: import java.util.Arrays;
String[] lines = {"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do",
        "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut",
        "enim ad minim veniam, quis nostrud exercitation ullamco laboris",
        "nisi ut aliquip ex ea commodo consequat.",
        "",
        "    This must NOT be matched. Because it has 4 whitespaces at the beginning.",
        "",
        "Lorem ipsum dolor sit amet. Ut enim ad minim veniam."};
for (String line : lines) {
    // skip every line that starts with four spaces
    if (!line.startsWith("    ")) {
        // split the remaining lines on runs of punctuation and whitespace
        String[] words = line.split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+");
        System.out.println("words = " + Arrays.toString(words));
    }
}
output
words = [Lorem, ipsum, dolor, sit, amet, consectetur, adipiscing, elit, sed, do]
words = [eiusmod, tempor, incididunt, ut, labore, et, dolore, magna, aliqua, Ut]
words = [enim, ad, minim, veniam, quis, nostrud, exercitation, ullamco, laboris]
words = [nisi, ut, aliquip, ex, ea, commodo, consequat]
words = []
words = []
words = [Lorem, ipsum, dolor, sit, amet, Ut, enim, ad, minim, veniam]
PS: the regex has been borrowed from this answer
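The following is a starting point:
(?<!\\s{4})\\b[A-Za-z]+\\b
It begins with a negative lookbehind, so it won't match a word that is immediately preceded by \s{4}. Note that this only excludes the first word after the four whitespace characters; the later words on the same line still match.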
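To skip every word on an indented line with a single pattern, one option is an alternation that first consumes a whole indented line and only otherwise captures a word. This is a sketch under assumptions: the class name IndentAwareWordFinder and the method words are hypothetical, and "4 whitespaces" is taken to mean four space characters at the start of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndentAwareWordFinder {
    // First alternative: a whole line that starts with four spaces (no capture).
    // Second alternative: a single word, captured in group 1.
    private static final Pattern WORD_OR_INDENTED_LINE =
            Pattern.compile("^ {4}.*$|\\b([A-Za-z]+)\\b", Pattern.MULTILINE);

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD_OR_INDENTED_LINE.matcher(text);
        while (m.find()) {
            // group(1) is null when the indented-line alternative matched.
            if (m.group(1) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet.\n"
                + "\n"
                + "    This must NOT be matched.\n"
                + "\n"
                + "Ut enim ad minim veniam.";
        System.out.println(words(text));
    }
}
```

Because the indented line is consumed as one match, the matcher resumes after it and never sees the words inside it.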
I have a text String, in this form
Lorem ipsum dolor sit amet,
consectetuer adipiscing elit
,
lo
sed diam
nonummy nibh
quis
nostrud exerci.
So it looks really bad when I set the text in a TextView.
I need the String to be loaded in this form
Lorem ipsum dolor sit amet,
consectetuer adipiscing elit,
lo sed diam nonummy nibh quis
nostrud exerci.
That is, each row should be filled (where possible) before a new line starts.
I cannot edit all the db entries to adjust the text.
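Use this code to replace all line-break characters in the text with a space (replacing them with an empty string would join lo and sed into losed). replaceAll returns a new String, so assign the result:
yourstring = yourstring.replaceAll("[\n\r]+", " ");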
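A slightly fuller sketch of the same idea, assuming that every run of whitespace (including line breaks) should become a single space and that a space left in front of a punctuation mark should be dropped. The class name TextNormalizer and the method normalize are hypothetical.

```java
class TextNormalizer {
    static String normalize(String text) {
        return text
                // collapse line breaks and repeated whitespace into one space
                .replaceAll("\\s+", " ")
                // remove the space that ends up before a punctuation mark
                .replaceAll(" +([,.;:!?])", "$1")
                .trim();
    }

    public static void main(String[] args) {
        String raw = "Lorem ipsum dolor sit amet,\n"
                + "consectetuer adipiscing elit\n"
                + ",\n"
                + "lo\n"
                + "sed diam\n"
                + "nonummy nibh\n"
                + "quis\n"
                + "nostrud exerci.";
        System.out.println(normalize(raw));
    }
}
```

The result is a single line; the view that displays it can then wrap the text to its own width.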