I'm using Apache POI to iterate over the tables in a docx file. Everything works fine, but if a table is inside a text box, my code doesn't see it: table.size() is 0.
XWPFDocument doc = new XWPFDocument(new FileInputStream(fileName));
List<XWPFTable> table = doc.getTables();
for (XWPFTable xwpfTable : table) {
List<XWPFTableRow> row = xwpfTable.getRows();
for (XWPFTableRow xwpfTableRow : row) {
List<XWPFTableCell> cell = xwpfTableRow.getTableCells();
for (XWPFTableCell xwpfTableCell : cell) {
if(xwpfTableCell != null){
List<XWPFTable> itable = xwpfTableCell.getTables();
if(itable.size()!=0){
for (XWPFTable xwpfiTable : itable) {
List<XWPFTableRow> irow = xwpfiTable.getRows();
for (XWPFTableRow xwpfiTableRow : irow) {
List<XWPFTableCell> icell = xwpfiTableRow.getTableCells();
for (XWPFTableCell xwpfiTableCell : icell) {
if(xwpfiTableCell!=null){
}
}
}
}
}
}
}
}
}
The following code parses a *.docx document at a low level and collects all tables in its document body.
The approach uses an org.apache.xmlbeans.XmlCursor to search for all w:tbl elements in document.xml. Each element found is added to a List<CTTbl>.
Because a text box rectangle shape provides fallback content in document.xml, we need to skip the mc:Fallback elements. Otherwise the tables within the text boxes would be collected twice.
Finally, we go through the List<CTTbl> and read the contents of all the tables.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTbl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTc;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;
import org.apache.xmlbeans.impl.values.XmlAnyTypeImpl;
import org.apache.xmlbeans.XmlCursor;
import javax.xml.namespace.QName;
import java.util.List;
import java.util.ArrayList;
public class WordReadAllTables {
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("22.docx"));
CTBody ctbody = document.getDocument().getBody();
XmlCursor xmlcursor = ctbody.newCursor();
QName qnameTbl = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "tbl", "w");
QName qnameFallback = new QName("http://schemas.openxmlformats.org/markup-compatibility/2006", "Fallback", "mc");
List<CTTbl> allCTTbls = new ArrayList<CTTbl>();
while (xmlcursor.hasNextToken()) {
XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
if (tokentype.isStart()) {
if (qnameTbl.equals(xmlcursor.getName())) {
if (xmlcursor.getObject() instanceof CTTbl) {
allCTTbls.add((CTTbl)xmlcursor.getObject());
} else if (xmlcursor.getObject() instanceof XmlAnyTypeImpl) {
allCTTbls.add(CTTbl.Factory.parse(xmlcursor.getObject().newInputStream()));
}
} else if (qnameFallback.equals(xmlcursor.getName())) {
xmlcursor.toEndToken();
}
}
}
for (CTTbl cTTbl : allCTTbls) {
StringBuffer tableHTML = new StringBuffer();
tableHTML.append("<table>\n");
for (CTRow cTRow : cTTbl.getTrList()) {
tableHTML.append(" <tr>\n");
for (CTTc cTTc : cTRow.getTcList()) {
tableHTML.append(" <td>");
for (CTP cTP : cTTc.getPList()) {
for (CTR cTR : cTP.getRList()) {
for (CTText cTText : cTR.getTList()) {
tableHTML.append(cTText.getStringValue());
}
}
}
tableHTML.append("</td>");
}
tableHTML.append("\n </tr>\n");
}
tableHTML.append("</table>");
System.out.println(tableHTML);
}
document.close();
}
}
This code needs the full jar of all of the schemas, ooxml-schemas-1.3.jar, as mentioned in faq-N10025.
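The skip-the-fallback idea can be tried without POI at all. The following is only a sketch under stated assumptions: it uses the JDK's StAX parser instead of XmlCursor, the class name FallbackSkippingTableCounter is made up for illustration, and SAMPLE is a heavily simplified, hypothetical stand-in for a real document.xml. It counts w:tbl elements and ignores everything nested inside mc:Fallback.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustration only: counts w:tbl elements in an XML string, ignoring mc:Fallback subtrees.
class FallbackSkippingTableCounter {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Hypothetical, heavily simplified stand-in for document.xml:
    // one table in the body, one in a text box (Choice) and its duplicate (Fallback).
    static final String SAMPLE =
        "<w:body xmlns:w=\"" + W_NS + "\" xmlns:mc=\"" + MC_NS + "\">"
        + "<w:tbl/>"
        + "<mc:AlternateContent>"
        + "<mc:Choice><w:tbl/></mc:Choice>"
        + "<mc:Fallback><w:tbl/></mc:Fallback>"
        + "</mc:AlternateContent>"
        + "</w:body>";

    static int countTables(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int tables = 0;
            int fallbackDepth = 0; // greater than 0 while we are inside an mc:Fallback element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (fallbackDepth > 0) {
                        fallbackDepth++; // nested element inside the fallback: ignore it
                    } else if (MC_NS.equals(reader.getNamespaceURI())
                            && "Fallback".equals(reader.getLocalName())) {
                        fallbackDepth = 1; // entering the fallback subtree
                    } else if (W_NS.equals(reader.getNamespaceURI())
                            && "tbl".equals(reader.getLocalName())) {
                        tables++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && fallbackDepth > 0) {
                    fallbackDepth--; // reaches 0 again when mc:Fallback itself closes
                }
            }
            reader.close();
            return tables;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTables(SAMPLE)); // prints 2: the fallback copy is not counted
    }
}
```

Without the fallbackDepth bookkeeping the sample would yield 3, which is the "tables twice" effect described above.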
package parser;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
public class App {
public static void main(String[] args) {
List<List<List<String>>> tablesResults = new LinkedList<>();
try {
XWPFDocument doc = new XWPFDocument(new FileInputStream("filename"));
List<IBodyElement> documentBody = doc.getBodyElements();
for (IBodyElement i: documentBody){
if (i.getElementType() == org.apache.poi.xwpf.usermodel.BodyElementType.TABLE){
XWPFTable table = (XWPFTable) i;
List<XWPFTableRow> tableRows = table.getRows();
List<List<String>> tableList = new LinkedList<>();
for (XWPFTableRow r: tableRows){
List<String> rowList = new LinkedList<>();
for (XWPFTableCell cell: r.getTableCells()){
rowList.add(cell.getText());
}
tableList.add(rowList);
}
tablesResults.add(tableList);
}
}
for (List<List<String>> table: tablesResults){
for(List<String> row: table){
for(String cell: row){
System.out.print(cell + ", ");
}
System.out.println();
}
System.out.println("-------------------------");
}
} catch (IOException ex) {
System.out.println("Exception:");
System.out.println(ex.toString());
}
}
}
I am not able to extract the checkboxes from the table cells, or another table. At present I am using Apache POI. I need your suggestions and help to parse the data from a Word document; in the next step I am going to compare this tabular data with another Word document.
picture of the table
I am using the Apache POI API to read an Excel file and check whether the requested quantity of a product is available. I can read the Excel file successfully with the code below, but I am unable to read the specific cell (the Quantity column) to check whether the requested product is available.
package practice;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Iterator;
public class ReadingExcel {
private static String brand = "Nike";
private static String shoeName = "Roshe One";
private static String colour = "Black";
private static String size = "9.0";
private static String quantity;
public static void main(String [] args){
try {
FileInputStream excelFile = new FileInputStream(new File("C:\\Users\\admin\\Downloads\\Project\\assets\\Shoe_Store_Storeroom.xlsx"));
Workbook workbook = new XSSFWorkbook(excelFile);
Sheet datatypeSheet = workbook.getSheetAt(0);
DataFormatter dataFormatter = new DataFormatter();
Iterator<Row> iterator = datatypeSheet.iterator();
while (iterator.hasNext()) {
Row currentRow = iterator.next();
Iterator<Cell> cellIterator = currentRow.iterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
String cellValue = dataFormatter.formatCellValue(cell);
System.out.print(cellValue + "\t\t");
}
System.out.println();
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
this is my excel file
Using opencsv 4.1, I do this:
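import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.lang.StringUtils;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
...
DataFormatter formatter = new DataFormatter();
try (InputStream excelFile = new FileInputStream(new File(fileName));
Workbook wb = new XSSFWorkbook(excelFile)) {
for (int numSheet = 0; numSheet < wb.getNumberOfSheets(); numSheet++) {
Sheet xssfSheet = wb.getSheetAt(numSheet);
xssfSheet.forEach(row -> {
Cell cell = row.getCell(0);
if (cell != null) {
// formatCellValue works for every cell type; getStringCellValue() throws on numeric cells
String value = formatter.formatCellValue(cell);
if (StringUtils.isNumeric(value)) {
// Use value here
}
}
});
}
}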
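Use the for loop below:
DataFormatter formatter = new DataFormatter(); // org.apache.poi.ss.usermodel.DataFormatter
for (int r = sh.getFirstRowNum() + 1; r <= sh.getLastRowNum(); r++) { // skip the header row, then iterate through all the rows of the sheet
Row currentRow = sh.getRow(r); // get the r'th row; null if the row was never created
if (currentRow == null) {
continue;
}
Cell quantityCell = currentRow.getCell(4); // assuming your quantity cell will always be at position 4; if that is not the case, check for the header value first
String currentCellValue = formatter.formatCellValue(quantityCell); // "" for a missing cell; also works for numeric cells
if (currentCellValue.equalsIgnoreCase(expectedValue)) { // expectedValue: the value you want to compare with
// do your further actions here
} else {
// do your further actions here
}
}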
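The "check for the header value first" advice can be sketched independently of POI. This is a minimal illustration, not the asker's actual data: the rows are plain lists of strings (as you would get by running DataFormatter over every cell), and the header names "Brand", "Shoe Name", "Colour", "Size" and "Quantity" as well as the class name QuantityLookup are assumptions, since the real spreadsheet is not shown.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: locate columns by header text, then read the matching row's quantity.
class QuantityLookup {
    // Hypothetical sample data; the first row is the header row.
    static final List<List<String>> SAMPLE_ROWS = Arrays.asList(
        Arrays.asList("Brand", "Shoe Name", "Colour", "Size", "Quantity"),
        Arrays.asList("Nike", "Roshe One", "Black", "9.0", "3"),
        Arrays.asList("Nike", "Roshe One", "White", "9.0", "0"));

    // Returns the Quantity cell of the first row matching all four criteria, or null if none matches.
    static String findQuantity(List<List<String>> rows, String brand, String shoeName,
                               String colour, String size) {
        if (rows.isEmpty()) {
            return null;
        }
        List<String> header = rows.get(0);
        int brandCol = column(header, "Brand");
        int nameCol = column(header, "Shoe Name");
        int colourCol = column(header, "Colour");
        int sizeCol = column(header, "Size");
        int quantityCol = column(header, "Quantity");
        for (List<String> row : rows.subList(1, rows.size())) {
            if (cell(row, brandCol).equalsIgnoreCase(brand)
                    && cell(row, nameCol).equalsIgnoreCase(shoeName)
                    && cell(row, colourCol).equalsIgnoreCase(colour)
                    && cell(row, sizeCol).equals(size)) {
                return cell(row, quantityCol);
            }
        }
        return null;
    }

    // Index of the column whose header equals the given name; fails loudly if it is missing.
    static int column(List<String> header, String name) {
        int index = header.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Missing column: " + name);
        }
        return index;
    }

    // Value at the given index, or "" when the row is shorter than the header.
    static String cell(List<String> row, int index) {
        return index < row.size() ? row.get(index) : "";
    }

    public static void main(String[] args) {
        System.out.println(findQuantity(SAMPLE_ROWS, "Nike", "Roshe One", "Black", "9.0")); // prints 3
    }
}
```

Looking the columns up by name keeps the code working if someone reorders the spreadsheet, at the cost of failing when a header is renamed.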
I'm a complete novice with Apache POI and I have already tried several things. My problem is that I have a few bookmarks in a docx file and I want to replace their values.
I have gotten as far as adding the text at the bookmark, but the previous value is still there.
My code:
InputStream fis = new FileInputStream(fileName);
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs)
{
//Here you have your paragraph;
CTP ctp = paragraph.getCTP();
// Get all bookmarks and loop through them
List<CTBookmark> bookmarks = ctp.getBookmarkStartList();
for(CTBookmark bookmark : bookmarks)
{
if(bookmark.getName().equals("Firma1234"))
{
System.out.println(bookmark.getName());
XWPFRun run = paragraph.createRun();
run.setText(lcFirma);
ctp.getDomNode().insertBefore(run.getCTR().getDomNode(), bookmark.getDomNode());
}
}
}
OutputStream out = new FileOutputStream(output);
document.write(out);
document.close();
out.close();
The value of "lcFirma" is "Firma".
The value of the bookmark is "Testmark".
My docx file before:
Testmark -> name=Firma1234
My docx file after:
FirmaTestmark
As I said, the text is inserted before the value of the bookmark instead of replacing it. How do I replace the text instead?
Greetings,
Kevin
I also had a similar requirement: setting the "Default text" field of a .docx bookmark. I was not able to do so, so I used this workaround: I replaced the entire paragraph containing the bookmark with text. Instead of the bookmark being populated with default text, I had a paragraph that held the bookmarked text. In my case the .docx finally had to be converted to a .pdf file, so the absence of the bookmark did not matter; the presence of the correct text was more important.
This is how I did it with Apache POI:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
/**
*
* @author binita.bharati@gmail.com
*
* This code will replace bookmark with plain text. A bookmark is seen as "Text Form Field" in a .docx file.
*
*/
public class BookmarkReplacer {
public static void main(String[] args) throws Exception {
replaceBookmark();
}
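private static String replaceBookmarkedPara(String input, String bookmarkTxt) {
// Assumes the bookmarked form field appears in the paragraph text as five
// EN SPACE characters (U+2002, decimal 8194). They are dropped, and
// bookmarkTxt is appended once, when the fifth one is seen.
char[] tmp = input.toCharArray();
StringBuilder sb = new StringBuilder();
int bookmarkedCharCount = 0;
for (int i = 0 ; i < tmp.length ; i++) {
int charCode = tmp[i]; // UTF-16 code unit, not an ASCII code
if (charCode == 8194) {
bookmarkedCharCount ++;
if (bookmarkedCharCount == 5) {
sb.append(bookmarkTxt);
}
}
else {
sb.append(tmp[i]);
}
}
return sb.toString();
}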
private static void removeAllRuns(XWPFParagraph paragraph) {
int size = paragraph.getRuns().size();
for (int i = 0; i < size; i++) {
paragraph.removeRun(0);
}
}
private static void insertReplacementRuns(XWPFParagraph paragraph, String replacedText) {
String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacedText, "\n");
for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
String part = replacementTextSplitOnCarriageReturn[j];
XWPFRun newRun = paragraph.insertNewRun(j);
newRun.setText(part);
if (j+1 < replacementTextSplitOnCarriageReturn.length) {
newRun.addCarriageReturn();
}
}
}
public static void replaceBookmark () throws Exception
{
InputStream fis = new FileInputStream("C:\\input.docx");
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs)
{
//Here you have your paragraph;
CTP ctp = paragraph.getCTP();
// Get all bookmarks and loop through them
List<CTBookmark> bookmarks = ctp.getBookmarkStartList();
for(CTBookmark bookmark : bookmarks)
{
if(bookmark.getName().equals("data_incipit") || bookmark.getName().equals("incipit_Codcli")
|| bookmark.getName().equals("Incipit_titolo"))
{
String paraText = paragraph.getText();
System.out.println("paraText = "+paraText +" for bookmark name "+bookmark.getName());
String replacementText = replaceBookmarkedPara(paraText, "haha");
removeAllRuns(paragraph);
insertReplacementRuns(paragraph, replacementText);
}
}
}
OutputStream out = new FileOutputStream("C:\\output.docx");
document.write(out);
document.close();
out.close();
}
}
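If the bookmark itself has to survive, another option is to remove whatever sits between the bookmark's start and end markers and put the new run there. The sketch below shows that idea on plain XML with the JDK's DOM parser, not with POI; the class name BookmarkContentReplacer and the SAMPLE paragraph are hypothetical, and it assumes w:bookmarkStart and its matching w:bookmarkEnd (same w:id) are siblings in the same paragraph. Since the question already works on ctp.getDomNode(), a similar sibling walk might be applicable there, but that is untested here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: replaces the content between w:bookmarkStart and its w:bookmarkEnd.
class BookmarkContentReplacer {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical paragraph modeled on the question: bookmark "Firma1234" wraps "Testmark".
    static final String SAMPLE =
        "<w:p xmlns:w=\"" + W_NS + "\">"
        + "<w:bookmarkStart w:id=\"0\" w:name=\"Firma1234\"/>"
        + "<w:r><w:t>Testmark</w:t></w:r>"
        + "<w:bookmarkEnd w:id=\"0\"/>"
        + "</w:p>";

    // Returns the paragraph's text after the replacement (unchanged if the bookmark is not found).
    static String replace(String paragraphXml, String bookmarkName, String newText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));
            NodeList starts = doc.getElementsByTagNameNS(W_NS, "bookmarkStart");
            for (int i = 0; i < starts.getLength(); i++) {
                Element start = (Element) starts.item(i);
                if (!bookmarkName.equals(start.getAttributeNS(W_NS, "name"))) {
                    continue;
                }
                Node end = findMatchingEnd(start);
                if (end == null) {
                    break; // end marker is not a following sibling: leave the paragraph untouched
                }
                // Drop the old content between the two markers.
                while (start.getNextSibling() != end) {
                    start.getParentNode().removeChild(start.getNextSibling());
                }
                // Insert a new run holding the replacement text just before the end marker.
                Element run = doc.createElementNS(W_NS, "w:r");
                Element text = doc.createElementNS(W_NS, "w:t");
                text.setTextContent(newText);
                run.appendChild(text);
                start.getParentNode().insertBefore(run, end);
                break;
            }
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process paragraph XML", e);
        }
    }

    // Finds the following sibling w:bookmarkEnd that carries the same w:id as the start marker.
    static Node findMatchingEnd(Element start) {
        String id = start.getAttributeNS(W_NS, "id");
        for (Node n = start.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && W_NS.equals(n.getNamespaceURI())
                    && "bookmarkEnd".equals(n.getLocalName())
                    && id.equals(((Element) n).getAttributeNS(W_NS, "id"))) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(replace(SAMPLE, "Firma1234", "Firma")); // prints Firma
    }
}
```

With the sample from the question this yields "Firma" instead of "FirmaTestmark", because the old run is removed before the new one is inserted.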
Try the code below:
private List<XWPFParagraph> collectParagraphs()
{
List<XWPFParagraph> paragraphs = new ArrayList<>();
paragraphs.addAll(this.document.getParagraphs());
for (XWPFTable table : this.document.getTables())
{
for (XWPFTableRow row : table.getRows())
{
for (XWPFTableCell cell : row.getTableCells())
paragraphs.addAll(cell.getParagraphs());
}
}
return paragraphs;
}
public List<String> getBookmarkNames()
{
List<String> bookmarkNames = new ArrayList<>();
Iterator<XWPFParagraph> paraIter = null;
XWPFParagraph para = null;
List<CTBookmark> bookmarkList = null;
Iterator<CTBookmark> bookmarkIter = null;
CTBookmark bookmark = null;
XWPFRun run = null;
// Get an Iterator for the XWPFParagraph object and step through them
// one at a time.
paraIter = collectParagraphs().iterator();
while (paraIter.hasNext())
{
para = paraIter.next();
// Get a List of the CTBookmark objects that the paragraph
// 'contains' and step through these one at a time.
bookmarkList = para.getCTP().getBookmarkStartList();
bookmarkIter = bookmarkList.iterator();
while (bookmarkIter.hasNext())
{
bookmark = bookmarkIter.next();
bookmarkNames.add(bookmark.getName());
}
}
return bookmarkNames;
}
I have a class:
class Node{
private Node parent;
private List<Node> children;
...
}
How can I export a tree of its items to Excel using Apache POI, to get a document like this (I need to indent only the first column of the table):
A
B
C
D
E
F
G
A simple solution would be to create a NodeWriter class that essentially writes the Node onto an Excel Spreadsheet:
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class NodeWriter {
public void write(Node tree, String filePathName) {
XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet sheet = workbook.createSheet("Tree");
writeHelp(0, 1, tree, sheet);
try (FileOutputStream outputStream = new FileOutputStream(filePathName)) {
workbook.write(outputStream);
workbook.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private void writeHelp(int indent, int rowNum, Node tree, XSSFSheet sheet) {
if (sheet.getRow(rowNum) != null) {
writeHelp(indent, rowNum+1, tree, sheet);
} else {
Row row = sheet.createRow(rowNum);
Cell cell = row.createCell(indent);
cell.setCellValue(tree.getNodeName());
for (Node child : tree.getChildren()) {
writeHelp(indent + 1, rowNum + 1, child, sheet);
}
}
}
}
I've made some assumptions about your Node class (a getNodeName() and a getChildren() method). This solution ensures that every node gets a new Row and that existing rows are not overwritten (as they would be if that if check weren't there in writeHelp).
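The row and column for each node can also be computed with a running counter instead of probing the sheet for the next free row. The following is a POI-free sketch under stated assumptions: TreeLayout and TreeNode are hypothetical stand-ins (TreeNode replaces the question's Node class), and each cell is rendered as a "row:column:name" string. With POI, the row and column values would go into sheet.createRow(row).createCell(column).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: assigns each tree node a row (pre-order position) and a column (depth).
class TreeLayout {
    // Hypothetical stand-in for the Node class from the question.
    static class TreeNode {
        final String name;
        final List<TreeNode> children;

        TreeNode(String name, TreeNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Returns one "row:column:name" entry per node, in depth-first pre-order.
    static List<String> layout(TreeNode root) {
        List<String> cells = new ArrayList<>();
        place(root, 0, cells);
        return cells;
    }

    private static void place(TreeNode node, int depth, List<String> cells) {
        int row = cells.size(); // the next free row is simply the number of nodes placed so far
        cells.add(row + ":" + depth + ":" + node.name);
        for (TreeNode child : node.children) {
            place(child, depth + 1, cells); // children are indented by one more column
        }
    }

    public static void main(String[] args) {
        // Arbitrary example tree; the real structure from the question is not known.
        TreeNode tree = new TreeNode("A",
                new TreeNode("B", new TreeNode("C")),
                new TreeNode("D"));
        System.out.println(layout(tree)); // prints [0:0:A, 1:1:B, 2:2:C, 3:1:D]
    }
}
```

Because the counter always points at the next unused row, no existing row can be overwritten and no search for a free row is needed.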
My solution uses merged cells. Thanks, and have a nice day.
It looks like this:
https://docs.oracle.com/cd/E36352_01/epm.1112/disclosure_mgmt_admin/new_files/image002.jpg
My solution with merging:
private int createHierarchy(Sheet sheet, Node node, int currentRowIdx, int nodeLevel) {
if(node.getParent() == null){
sheet.setColumnWidth(8, 1000);
Row row = sheet.createRow(currentRowIdx);
row.createCell(nodeLevel).setCellValue(node.getName());
row.createCell(9).setCellValue(node.getValue());
sheet.addMergedRegion(new CellRangeAddress(currentRowIdx, currentRowIdx, nodeLevel, 8));
nodeLevel++;
}
for (Node child : node.getChildren()) { // renamed: a loop variable called "node" would clash with the parameter
Row row = sheet.createRow(++currentRowIdx);
row.createCell(nodeLevel).setCellValue(child.getName());
row.createCell(9).setCellValue(child.getValue());
sheet.addMergedRegion(new CellRangeAddress(currentRowIdx, currentRowIdx, nodeLevel, 8));
currentRowIdx = createHierarchy(sheet, child, currentRowIdx, nodeLevel+1);
}
return currentRowIdx;
}
I have spent countless hours trying to find a solution to this. I have tried Apache POI, JExcel and JXLS, but nowhere have I found code that successfully reads checkbox (form control) values.
If anyone has found a working solution, it would be great if you could share it here. Thanks!
UPDATE
I have written code that reads the checkbox but it cannot determine whether it is checked or not.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.CommonObjectDataSubRecord;
import org.apache.poi.hssf.record.ObjRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.SubRecord;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
public class App {
private static final String path = "C:\\test.xls";
private static final String Workbook = "Workbook";
private static void readExcelfile() {
FileInputStream file = null;
try {
file = new FileInputStream(new File(path));
// Get the workbook instance for XLS file
HSSFWorkbook workbook = new HSSFWorkbook(file);
// Get first sheet from the workbook
HSSFSheet sheet = workbook.getSheetAt(0);
// Iterate through each rows from first sheet
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
switch (cell.getCellType()) {
case Cell.CELL_TYPE_BOOLEAN:
System.out.print(cell.getBooleanCellValue() + "\t\t");
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t\t");
break;
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t\t");
break;
}
}
System.out.println();
}
// file.close();
// FileOutputStream out = new FileOutputStream(
// new File(path));
// workbook.write(out);
// out.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (file != null)
file.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
private static void readCheckbox() {
FileInputStream file = null;
InputStream istream = null;
try {
file = new FileInputStream(new File(path));
POIFSFileSystem poifs = new POIFSFileSystem(file);
istream = poifs.createDocumentInputStream(Workbook);
HSSFRequest req = new HSSFRequest();
req.addListenerForAllRecords(new EventExample());
HSSFEventFactory factory = new HSSFEventFactory();
factory.processEvents(req, istream);
} catch (Exception ex) {
ex.printStackTrace();
} finally {
try {
if (file != null)
file.close();
if (istream != null)
istream.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
public static void main(String[] args) {
System.out.println("ReadExcelFile");
readExcelfile();
System.out.println("ReadCheckbox");
readCheckbox();
}
}
class EventExample implements HSSFListener {
public void processRecord(Record record) {
switch (record.getSid()) {
case ObjRecord.sid:
ObjRecord objRec = (ObjRecord) record;
List<SubRecord> subRecords = objRec.getSubRecords();
for (SubRecord subRecord : subRecords) {
if (subRecord instanceof CommonObjectDataSubRecord) {
CommonObjectDataSubRecord datasubRecord = (CommonObjectDataSubRecord) subRecord;
if (datasubRecord.getObjectType() == CommonObjectDataSubRecord.OBJECT_TYPE_CHECKBOX) {
System.out.println("ObjId: "
+ datasubRecord.getObjectId() + "\nDetails: "
+ datasubRecord.toString());
}
}
}
break;
}
}
}
Sorry for the late reply, but I ran into the same problem. I found a trick to determine the checkbox state.
In your example you are looping over the SubRecords and examining the CommonObjectDataSubRecord. But the value for the checkbox can be found in one of the SubRecord.UnknownSubRecord instances. This is unfortunately a private class, so you cannot call any methods on it, but its toString() reveals the data, and with a little regex the value can be found. Using the code below I managed to retrieve the state of the checkbox:
Pattern p = Pattern.compile("\\[sid=0x000A.+?\\[0(\\d),");
if (!(subRecord instanceof CommonObjectDataSubRecord)) {
Matcher m = p.matcher(subRecord.toString());
if (m.find()) {
String checkBit = m.group(1);
if (checkBit.length() == 1) {
boolean checked = "1".equals(checkBit);
checkBox.setChecked(checked);
}
}
}
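To see what the regex actually captures, it can be run against plain strings. This is only a sketch: the class name CheckboxStateParser is made up, and the sample strings are invented to match the shape the pattern expects (a "[sid=0x000A" marker followed by a bracketed list of hex bytes); the exact toString() output of your POI version may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: extracts the checked bit from a sub-record dump string.
class CheckboxStateParser {
    // Same pattern as above: the first byte after "[sid=0x000A ... [" is 00 or 01.
    static final Pattern CHECK_BIT = Pattern.compile("\\[sid=0x000A.+?\\[0(\\d),");

    // Returns TRUE or FALSE for a matching dump, or null when the pattern does not match.
    static Boolean parseChecked(String subRecordDump) {
        Matcher m = CHECK_BIT.matcher(subRecordDump);
        if (!m.find()) {
            return null;
        }
        return "1".equals(m.group(1));
    }

    public static void main(String[] args) {
        // Invented sample dumps, not real POI output.
        String checked = "UnknownSubRecord [sid=0x000A size=12 : [01, 00, 00, 00]]";
        String unchecked = "UnknownSubRecord [sid=0x000A size=12 : [00, 00, 00, 00]]";
        String other = "UnknownSubRecord [sid=0x0015 size=18 : [01, 00, 00, 00]]";
        System.out.println(parseChecked(checked));   // prints true
        System.out.println(parseChecked(unchecked)); // prints false
        System.out.println(parseChecked(other));     // prints null
    }
}
```

Since the capture group is a single \d, the length check in the snippet above is always satisfied once the pattern matches. Keep in mind that parsing toString() output is fragile and may break when the library changes its formatting.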
Now my challenge is to retrieve the checkbox value in an xlsx file...