Unable to read entire cell of a .doc file using Apache POI - java

In one of my projects I need to read images from a .doc file using Apache POI. Each row has a cell containing one or more images, which I need to read out alongside the text data.
So I tried the following code:
FileInputStream fileInputStream = new FileInputStream(file);
POIFSFileSystem poifsFileSystem = new POIFSFileSystem(fileInputStream);
HWPFDocument doc = new HWPFDocument(poifsFileSystem);
Range range = doc.getRange();
PicturesTable pictureTable = doc.getPicturesTable();
PicturesSource pictures = new PicturesSource(doc);
Paragraph tableParagraph = range.getParagraph(0);
Table table = range.getTable(tableParagraph);
TableRow row = table.getRow(0);
TableCell cell1 = row.getCell(0);
for (int j = 0; j < cell1.getParagraph(0).numCharacterRuns(); j++) {
CharacterRun cr = cell1.getParagraph(0).getCharacterRun(j);
if (pictureTable.hasPicture(cr)) {
logger.debug("Has picture If--");
Picture picture = pictures.getFor(cr);
logger.debug("pictures Description--" + picture.getDescription());
}
}
With this I can read images from a particular cell, but not all of them. I get an image that comes before the text and an image that sits between pieces of text, but not an image that comes after the text. Example: "image_1---some text---image_2 some text---.image_3". In this case image_3 is the only one I cannot read. What should I do so that I can read image_3 as well? I have searched a lot, but no luck so far. I hope someone knows a way to do this. Thanks in advance.

I am having problems with HWPFDocument, too. If you can convert the Word documents to .docx before processing, here is an example that works with XWPFDocument:
FileInputStream fileInputStream = new FileInputStream(file);
XWPFDocument doc = new XWPFDocument(fileInputStream);
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph para : cell.getParagraphs()) {
for (XWPFRun run : para.getRuns()) {
for (XWPFPicture pic : run.getEmbeddedPictures()) {
System.out.println(pic.getPictureData());
}
}
}
}
}
}


Cell style is lost or not displayed in Excel 97-2003 (.xls)

I am using the Apache POI library to export data to Excel. I have tried all the latest versions (3.17, 4.1.2, and 5.2.1).
I have a problem with Excel 97 (.xls) format in relation to cell styles. The cell style somehow is lost (or not displayed) after a certain number of columns.
Here is my sample code:
private void exportXls() {
try (
OutputStream os = new FileOutputStream("test.xls");
Workbook wb = new HSSFWorkbook();) {
Sheet sh = wb.createSheet("test");
Row r = sh.createRow(0);
for (int i = 0; i < 50; i++) {
Cell c = r.createCell(i);
c.setCellValue(i + 1);
CellStyle cs = wb.createCellStyle();
cs.setFillBackgroundColor(IndexedColors.WHITE.index);
cs.setFillPattern(FillPatternType.SOLID_FOREGROUND);
cs.setFillForegroundColor(IndexedColors.LIGHT_BLUE.getIndex());
c.setCellStyle(cs);
}
wb.write(os);
os.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
And here is the result as viewed in MS Excel 2019:
(Screenshot: viewed in MS Excel)
As you can see, the style/format is lost after the 43rd cell.
But when I open the same file in other applications, such as XLS Viewer Free (from the Microsoft Store) or Google Sheets (online), the style/format is still there and displays correctly.
(Screenshot: viewed in XLS Viewer Free)
(Screenshot: viewed in Google Sheets)
Could anyone please tell me what is going on here?
Did I miss something in my code?
Is there any hidden setting in MS Excel that causes this problem?
Thank you.

Creating a cell style for every single cell is not a good idea in Apache POI. In Excel, cell styles are stored at workbook level, and sheets and cells share them where possible.
All Excel versions also limit the number of distinct cell styles, and the limit for the binary *.xls format is lower than the one for the OOXML *.xlsx format.
That limit alone cannot explain the result you see, but Excel does not seem to cope well with 50 identical cell styles in one workbook. They are also a waste of memory, since all 50 cells could share a single style.
There are two solutions.
Create the cell styles at workbook level, outside the cell-creating loop, and only assign them to the cells inside the loop.
Example:
private static void exportXlsCorrect() {
try (
OutputStream os = new FileOutputStream("testCorrect.xls");
Workbook wb = new HSSFWorkbook();) {
CellStyle cs = wb.createCellStyle();
cs.setFillBackgroundColor(IndexedColors.WHITE.index);
cs.setFillPattern(FillPatternType.SOLID_FOREGROUND);
cs.setFillForegroundColor(IndexedColors.LIGHT_BLUE.getIndex());
Sheet sh = wb.createSheet("test");
Row r = sh.createRow(0);
for (int i = 0; i < 50; i++) {
Cell c = r.createCell(i);
c.setCellValue(i + 1);
c.setCellStyle(cs);
}
wb.write(os);
os.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
Sometimes you cannot know all the cell styles you need before creating the cells. In that case CellUtil can be used. Its method CellUtil.setCellStyleProperties sets specific style properties on a cell. A new cell style is created at workbook level only if no matching one exists yet; otherwise the existing style is reused.
Example:
private static void exportXlsUsingCellUtil() {
try (
OutputStream os = new FileOutputStream("testUsingCellUtil.xls");
Workbook wb = new HSSFWorkbook();) {
Sheet sh = wb.createSheet("test");
Row r = sh.createRow(0);
for (int i = 0; i < 50; i++) {
Cell c = r.createCell(i);
c.setCellValue(i + 1);
java.util.Map<java.lang.String,java.lang.Object> properties = new java.util.HashMap<java.lang.String,java.lang.Object>();
properties.put(org.apache.poi.ss.util.CellUtil.FILL_BACKGROUND_COLOR, IndexedColors.WHITE.index);
properties.put(org.apache.poi.ss.util.CellUtil.FILL_FOREGROUND_COLOR, IndexedColors.LIGHT_BLUE.getIndex());
properties.put(org.apache.poi.ss.util.CellUtil.FILL_PATTERN, FillPatternType.SOLID_FOREGROUND);
org.apache.poi.ss.util.CellUtil.setCellStyleProperties(c, properties);
}
wb.write(os);
os.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
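
The reuse that CellUtil performs can be pictured as a lookup keyed by the requested properties. The following is a minimal, POI-free sketch of that idea and not the library's actual implementation: the class StyleRegistry and its method names are hypothetical, and a "style" is reduced to a plain property map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a workbook's style table; not part of Apache POI.
class StyleRegistry {
    // Styles live at "workbook" level; a cell would only store an index into this list.
    private final List<Map<String, Object>> styles = new ArrayList<>();
    // Reverse lookup from a property set to the index of the style that holds it.
    private final Map<Map<String, Object>, Integer> indexByProperties = new HashMap<>();

    // Returns the index of an existing style with these properties, creating one only if needed.
    int styleIndexFor(Map<String, Object> properties) {
        Map<String, Object> key = new HashMap<>(properties); // defensive copy, never mutated later
        Integer existing = indexByProperties.get(key);
        if (existing != null) {
            return existing;
        }
        styles.add(key);
        int index = styles.size() - 1;
        indexByProperties.put(key, index);
        return index;
    }

    int styleCount() {
        return styles.size();
    }

    // Requests the same fill for 50 cells and reports how many styles were created.
    static int demo() {
        StyleRegistry registry = new StyleRegistry();
        for (int i = 0; i < 50; i++) {
            Map<String, Object> props = new HashMap<>();
            props.put("fillForegroundColor", "LIGHT_BLUE");
            props.put("fillPattern", "SOLID_FOREGROUND");
            registry.styleIndexFor(props);
        }
        return registry.styleCount();
    }
}
```

In this sketch, 50 requests for the same properties leave a single entry in the registry, which is the effect the two solutions above aim for in the real workbook.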

Updating .XLSM file using Apache POI

I am trying to update an existing .XLSM file using Apache POI. Every time I run my code I receive an error as shown below.
Exception in thread "main" java.lang.IllegalArgumentException: Attempting to write a row[1] in the range [0,9] that is already written to disk.
at org.apache.poi.xssf.streaming.SXSSFSheet.createRow(SXSSFSheet.java:136)
at com.log.test.Test.main(Test.java:41)
Basically, I want to use a macro-enabled Excel file as a standard template: using Java code, I want to make a copy of the template, update the data in some columns of one sheet, and save the file.
I am trying the sample code below:
OPCPackage pkg = OPCPackage.open(new File("C:/LogTest/testme.xlsm"));
XSSFWorkbook wb_template;
wb_template = new XSSFWorkbook(pkg);
System.out.println("package loaded");
SXSSFWorkbook wb = new SXSSFWorkbook(wb_template);
wb.setCompressTempFiles(true);
SXSSFSheet sh = (SXSSFSheet) wb.getSheet("Asset Names");
sh.setRandomAccessWindowSize(100);
for (int rownum = 1; rownum < 10; rownum++) {
Row row = sh.createRow(rownum);
for (int cellnum = 0; cellnum < 2; cellnum++) {
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue("hello");
}
}
FileOutputStream out = new FileOutputStream(new File("C:/output/new.xlsm"));
wb.write(out);
out.close();
wb.dispose();
System.out.println("Done !!!");
Can this be achieved using Apache POI, or do I need to use some other library?
(Screenshot: sample template)

Apache poi replace existing picture on header

Is there any way to replace an image in the header of a Word (.docx) file, by the name of the image, with Apache POI? I am thinking of something like this:
+--------------------------------+
+HEADER myimage.jpeg-+
+ -----------BODY------------+
+--------------------------------+
replaceImage("myimage.jpeg", newPictureInputStream,
"newPicture_name.jpeg");
Here is what I tried:
XWPFParagraph originalParagraph = null;
originalParagraph = getPictureParagraphInHead(lookingPictureName);
ListIterator<XWPFRun> it = originalParagraph.getRuns().listIterator();
XWPFRun replacedRun = null;
while (it.hasNext()) {
XWPFRun run = it.next();
int runIDX = it.nextIndex();
if (run.getEmbeddedPictures().size() > 0) {
XWPFRun newRun = null;
newRun = new XWPFRun(run.getCTR(), (IRunBody) originalParagraph);
originalParagraph.addRun(newRun);
originalParagraph.removeRun(originalParagraph.getRuns().indexOf(run));
break;
}
}

I'm not sure whether POI can give you the "filename" of the image. It is probably in the XML, so you might have to write your own method for finding the image.
To get the header:
XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(doc); // XWPFDocument
XWPFHeader header = policy.getDefaultHeader();
To delete the images, get the XWPFRun from your paragraph (or cell, row, table, ...):
CTR ctr = myRun.getCTR();
List<CTDrawing> images = ctr.getDrawingList();
// Remove from the last index down so that the indexes still to be visited stay valid.
for (int i = images.size() - 1; i >= 0; i--) {
ctr.removeDrawing(i);
}
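
Removing elements by index while counting upward is easy to get wrong, because each removal shifts the remaining elements down by one and the loop then steps over some of them. The following POI-free sketch illustrates this with a plain list; the class IndexedRemoval is hypothetical and only stands in for the drawing list of a run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; a List<String> stands in for a run's list of drawings.
class IndexedRemoval {
    // Counting upward: every removal shifts later elements down, so some are skipped.
    static List<String> removeForward(List<String> source) {
        List<String> items = new ArrayList<>(source);
        for (int i = 0; i < items.size(); i++) {
            items.remove(i);
        }
        return items;
    }

    // Counting downward: a removal never affects the indexes still to be visited.
    static List<String> removeBackward(List<String> source) {
        List<String> items = new ArrayList<>(source);
        for (int i = items.size() - 1; i >= 0; i--) {
            items.remove(i);
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> drawings = Arrays.asList("a", "b", "c");
        System.out.println(removeForward(drawings));  // one element survives
        System.out.println(removeBackward(drawings)); // nothing survives
    }
}
```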

Changing XLSX form control location with Apache POI

I have a number of .xlsm files containing form controls. I'd like to programmatically move a particular button down a few rows on each sheet. My first hope was to do something like this:
FileInputStream inputStream = new FileInputStream(new File("t.xlsm"));
XSSFWorkbook wb = new XSSFWorkbook(inputStream);
XSSFSheet xs = (XSSFSheet)wb.getSheetAt(1);
RelationPart rp = xs.getRelationParts().get(0);
XSSFDrawing drawing = (XSSFDrawing)rp.getDocumentPart();
for(XSSFShape sh : drawing.getShapes()){
XSSFClientAnchor a = (XSSFClientAnchor)sh.getAnchor();
if (sh.getShapeName().equals("Button 2")) {
a.setRow1(a.getRow1()+10);
a.setRow2(a.getRow2()+10);
}
}
However, the shape objects given by XSSFDrawing.getShapes() are copies and any changes to them are not reflected in the document after a wb.write().
I tried a couple other approaches, such as getting the CTShape and parsing the XML within but things quickly got hairy.
Is there a recommended way to manage form controls like this via POI?

I ended up fiddling directly with the XML:
wb = new XSSFWorkbook(new File(xlsmFile));
XSSFSheet s = wb.getSheet("TWO");
XmlObject[] subobj = s.getCTWorksheet().selectPath(declares+
" .//mc:AlternateContent/mc:Choice/main:controls/mc:AlternateContent/mc:Choice/main:control");
String targetButton = "Button 2";
int rowsDown = 10;
for (XmlObject obj : subobj) {
XmlCursor cursor = obj.newCursor();
cursor.push();
String attrName = cursor.getAttributeText(new QName("name"));
if (targetButton.equals(attrName)) { // also safe when a control has no name attribute
cursor.selectPath(declares+" .//main:from/xdr:row");
if (!cursor.toNextSelection()) {
throw new Exception();
}
int newRow = Integer.parseInt(cursor.getTextValue()) + rowsDown;
cursor.setTextValue(Integer.toString(newRow));
cursor.pop();
cursor.selectPath(declares+" .//main:to/xdr:row");
if (!cursor.toNextSelection()) {
throw new Exception();
}
newRow = Integer.parseInt(cursor.getTextValue()) + rowsDown;
cursor.setTextValue(Integer.toString(newRow));
}
cursor.dispose();
}
This moves the named button down 10 rows. I had to find out the button name myself by inspecting the file directly, since it may not be easy to get from Excel. I suspect this approach is very sensitive to the version of Excel in use.
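
The same "select the row nodes, parse, add, write back" step can be tried in isolation with the JDK's own DOM and XPath classes instead of XMLBeans. This is only a sketch under simplifying assumptions: the class AnchorShifter is hypothetical, and the XML it expects is a made-up, namespace-free fragment with control, from, to and row elements, not the real worksheet markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper working on a simplified, namespace-free XML fragment.
class AnchorShifter {
    // Adds rowsDown to every <row> below the control with the given name
    // and returns the new row values in document order.
    // Assumes controlName contains no single quote, since it is pasted into the XPath.
    static int[] shiftRows(String xml, String controlName, int rowsDown) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate(
                    "//control[@name='" + controlName + "']//row",
                    doc, XPathConstants.NODESET);
            int[] shifted = new int[rows.getLength()];
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                int newRow = Integer.parseInt(row.getTextContent().trim()) + rowsDown;
                row.setTextContent(Integer.toString(newRow)); // updates the in-memory document only
                shifted[i] = newRow;
            }
            return shifted;
        } catch (Exception e) {
            throw new IllegalStateException("Could not shift rows for " + controlName, e);
        }
    }
}
```

Unlike the XMLBeans code above, this sketch changes a throwaway in-memory document; it is meant for checking the selection and arithmetic, not for writing a workbook.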

Find a table in word and write in that table using java

I have a Word document that may contain any number of tables. Each table is identified by its name, which is written as a heading in its first cell. I need to find a table by its name and write into one of its cells. I tried using Apache POI for this but could not figure out how to use it for my purpose. Please refer to the attached screenshot if my description of the document is unclear.
Thanks
String fileName = "E:\\a1.doc";
if (args.length > 0) {
fileName = args[0];
}
InputStream fis = new FileInputStream(fileName);
POIFSFileSystem fs = new POIFSFileSystem(fis);
HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
for (int i=0; i<range.numParagraphs(); i++){
Paragraph tablePar = range.getParagraph(i);
if (tablePar.isInTable()) {
Table table = range.getTable(tablePar);
for (int rowIdx=0; rowIdx<table.numRows(); rowIdx++) {
TableRow row = table.getRow(rowIdx);
for (int colIdx=0; colIdx<row.numCells(); colIdx++) {
TableCell cell = row.getCell(colIdx);
System.out.println("column="+cell.getParagraph(0).text());
}
}
}
}
This is what I have tried, but it reads only the first table.

I think you have misunderstood how POI works here.
If you just want to read the tables, use TableIterator to fetch their content; otherwise you will get an exception complaining that the paragraph is not at the start of a table.
The code below assumes there is only one paragraph in every table cell.
InputStream fis = new FileInputStream(fileName);
POIFSFileSystem fs = new POIFSFileSystem(fis);
HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
TableIterator itr = new TableIterator(range);
while(itr.hasNext()){
Table table = itr.next();
for(int rowIndex = 0; rowIndex < table.numRows(); rowIndex++){
TableRow row = table.getRow(rowIndex);
for(int colIndex = 0; colIndex < row.numCells(); colIndex++){
TableCell cell = row.getCell(colIndex);
System.out.println(cell.getParagraph(0).text());
}
}
}

I think Apache POI is the way to go. It is not well documented, but the time spent researching how to use it may be worth it. A Word document is basically a hierarchical (tree) structure that you need to traverse to find the data you need.
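
The traversal itself (tables, then rows, then cells) can be sketched without POI. In the sketch below the document is modelled as nested lists of strings, and TableFinder with its methods is hypothetical; with HWPF the same loops would run over Table, TableRow and TableCell objects. The comparison trims the cell text on the assumption that the text read from a cell may carry a trailing control character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a document is a list of tables, a table a list of rows, a row a list of cell texts.
class TableFinder {
    // Returns the first table whose first cell (row 0, column 0) equals the given name, or null.
    static List<List<String>> findByName(List<List<List<String>>> tables, String name) {
        for (List<List<String>> table : tables) {
            if (!table.isEmpty() && !table.get(0).isEmpty()
                    && name.equals(table.get(0).get(0).trim())) {
                return table;
            }
        }
        return null;
    }

    // Overwrites one cell of the named table; returns false if the table or cell does not exist.
    static boolean writeCell(List<List<List<String>>> tables, String name,
                             int row, int col, String text) {
        List<List<String>> table = findByName(tables, name);
        if (table == null || row < 0 || col < 0
                || row >= table.size() || col >= table.get(row).size()) {
            return false;
        }
        table.get(row).set(col, text);
        return true;
    }

    public static void main(String[] args) {
        List<List<List<String>>> tables = new ArrayList<>();
        tables.add(Arrays.asList(Arrays.asList("Orders"), Arrays.asList("id", "qty")));
        tables.add(Arrays.asList(Arrays.asList("Customers"), Arrays.asList("name", "")));
        writeCell(tables, "Customers", 1, 1, "Alice");
        System.out.println(tables.get(1));
    }
}
```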
