How to remove user-defined style for XLS via ApachePOI? - java

I want to remove some predefined styles for XLS, for example "Good". For XLSX there is no problem: create a new CTCellStyle (unfortunately by reflection), setName("Good"), setBuiltinId(26) and setHidden(true). After that, Excel (2016) doesn't show the "Good" style. Can I do something like this for XLS?
EDIT
Sample code:
Hiding a style for XLSX - there is no problem:
StylesTable styleSource = xssfWorkbook.getStylesSource(); // xssfWorkbook is instance of XSSFWorkbook
try {
// get ctCellStyles (by reflection)
Field field = StylesTable.class.getDeclaredField("doc");
field.setAccessible(true);
Object obj = field.get(styleSource);
StyleSheetDocument ssd = (StyleSheetDocument) obj;
CTStylesheet ctStyleSheet = ssd.getStyleSheet();
CTCellStyles ctCellStyles = ctStyleSheet.getCellStyles();
// find style "Good"
for (int i = 0; i < ctCellStyles.sizeOfCellStyleArray(); i++) {
CTCellStyle ctCellStyle = ctCellStyles.getCellStyleArray(i);
if (ctCellStyle.getName().equals("Good")) {
XmlBoolean hiddenXml = XmlBoolean.Factory.newInstance();
hiddenXml.setStringValue("1");
ctCellStyle.xsetHidden(hiddenXml);
}
}
} catch (Exception e) {}
Hiding a style for XLS:
If the style exists in the workbook I can get it, but how do I set it as "hidden"?
try {
// get InternalWorkbook (by reflection)
Field field = HSSFWorkbook.class.getDeclaredField("workbook");
field.setAccessible(true);
Object iwb = field.get(hssfWorkbook); // hssfWorkbook is instance of HSSFWorkbook
InternalWorkbook internalWorkbook = (InternalWorkbook) iwb;
// find style "Good"
for (int xfIndex = 0; xfIndex < internalWorkbook.getNumRecords(); xfIndex++) {
// try to get every record as StyleRecord from internalWorkbook
StyleRecord styleRecord = internalWorkbook.getStyleRecord(xfIndex);
if (styleRecord != null && styleRecord.getName() != null) {
if (styleRecord.getName().equals("Good")) {
new DebugUtil(styleRecord.getName());
// TODO set here sth like "hidden" for styleRecord or maybe:
// get style with current id from workbook
HSSFCellStyle hssfCellStyle = hssfWorkbook.getCellStyleAt((short) xfIndex); // workbook is instance of org.apache.poi.ss.usermodel.Workbook
// TODO set here sth like "hidden" for hssfCellStyle
}
}
}
} catch (Exception e) {}
Even if I could mark a style as "hidden", there is another problem: if I iterate from 0 to internalWorkbook.getNumRecords() I only get the existing styles. So if I'm creating the workbook myself, I should probably create a new StyleRecord and/or HSSFCellStyle and mark it as "hidden". I tried this:
int size = internalWorkbook.getSize();
StyleRecord newStyleRecord = internalWorkbook.createStyleRecord(size);
HSSFCellStyle newHssfCellStyle = hssfWorkbook.createCellStyle();
newHssfCellStyle.setAlignment((short) 3); // align right, for tests, to see difference between original and created "Good" style
newStyleRecord.setName("Good");
// TODO set here sth like "hidden" for newStyleRecord and/or for newHssfCellStyle
This is the way to set my own "Good" style. If I don't do it, Excel (2016) will show the default "Good" style.

You should be able to use HSSFWorkbook.getCellStyleAt(int index) to access styles at a given position.

Related

SharedStringsTable payload before parsing [duplicate]

I am trying to open a large (>30 MB) .xlsx file and copy all the rows and columns (more than 200k rows) of a sheet into a sheet of a new workbook. I got an error on the following code:
FileInputStream fis = new FileInputStream(file);
XSSFWorkbook newWorkBook = new XSSFWorkbook(fis);
Increasing heap space does not help. After much research, I understand that a workaround is to use either XSSF and SAX (Event API) or XLSX2CSV.java. I just need to copy all the data from the old sheet to the new sheet. After trying SAX, I am stuck, as I am not sure how to get the values from the old sheet so that I can copy them to the new sheet. Also, empty cells are not included in the SST. I need to copy over all cells, including empty cells.
@Override
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
if(name.equals("c")) {
//print cell reference
System.out.print(attributes.getValue("r") + " - ");
// Figure out if value is an index in the SST
String cellType = attributes.getValue("t");
System.out.println("CellType " + cellType);
if(cellType != null && cellType.equals("s")) {
isNextString = true;
}else {
isNextString = false;
}
}
//Clear last content
lastContents = "";
}
I am using Java, though I can try C#.
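One possible way to recover values and keep empty cells is to track the column from each cell's r attribute and pad the gaps. The following is only a sketch under assumptions: SheetRowReader, its fields and the rowSink callback are hypothetical names, the shared strings are assumed to be loaded into a list beforehand, and inline strings, formulas and completely missing rows are not handled. It uses only the JDK SAX parser, not POI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: streams sheet XML and hands each row, gaps filled with "", to a callback. */
class SheetRowReader extends DefaultHandler {
    private final List<String> sharedStrings;       // assumed to be loaded beforehand
    private final Consumer<List<String>> rowSink;   // receives one dense row at a time
    private final StringBuilder text = new StringBuilder();
    private List<String> currentRow = new ArrayList<>();
    private boolean isSharedString;
    private boolean inValue;
    private int column;

    SheetRowReader(List<String> sharedStrings, Consumer<List<String>> rowSink) {
        this.sharedStrings = sharedStrings;
        this.rowSink = rowSink;
    }

    /** Converts the letters of a reference such as "AB12" to a zero-based column index. */
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length() && Character.isLetter(cellRef.charAt(i)); i++) {
            col = col * 26 + (Character.toUpperCase(cellRef.charAt(i)) - 'A' + 1);
        }
        return col - 1;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes attrs) {
        if (name.equals("row")) {
            currentRow = new ArrayList<>();
        } else if (name.equals("c")) {
            String ref = attrs.getValue("r");
            // The r attribute is optional; without it, assume the next free column.
            column = (ref == null) ? currentRow.size() : columnIndex(ref);
            isSharedString = "s".equals(attrs.getValue("t"));
            while (currentRow.size() < column) {
                currentRow.add("");                  // pad cells that were skipped in the XML
            }
            currentRow.add("");                      // placeholder in case the cell has no <v>
        } else if (name.equals("v")) {
            inValue = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inValue) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        if (name.equals("v")) {
            inValue = false;
            String raw = text.toString();
            // For t="s" the value is an index into the shared strings table.
            currentRow.set(column, isSharedString ? sharedStrings.get(Integer.parseInt(raw)) : raw);
        } else if (name.equals("row")) {
            rowSink.accept(currentRow);
        }
    }

    /** Parses sheet XML given as a string; wraps checked exceptions for brevity. */
    static void parse(String sheetXml, List<String> sharedStrings, Consumer<List<String>> rowSink) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)),
                    new SheetRowReader(sharedStrings, rowSink));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }
}
```

Because each row is passed to the callback as soon as it is complete, the rows do not have to be held in memory together; the callback could, for example, write them to a streaming writer such as POI's SXSSFWorkbook.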

How to detect excel cell reference style of a file using apache POI?

I receive an Excel file through the front end and I do not know which cell reference style (A1 or R1C1) the user prefers for that file. I want to display the header with the column positions as they appear in the file.
For example, if the file uses the R1C1 reference style, the column positions should be shown as 1, 2, 3... and for A1 references as A, B, C...
I want to achieve this using Java apache POI. Any lead in this will be helpful.
Thanks in advance.
The reference mode in use (either A1 or R1C1) can be stored in Excel files, but it may be omitted. In that case Excel falls back to the setting last used in the application.
In the old binary *.xls file format (HSSF) it is stored using a RefModeRecord in the worksheet's record stream. It is stored for each worksheet separately, although it cannot differ between sheets in the same workbook.
In the Office Open XML file format (*.xlsx, XSSF) it is stored in xl/workbook.xml, in the refMode attribute of the calcPr element.
Neither is directly supported by Apache POI so far. But if one knows the internal structure of the file formats, it can be set and read using the following code:
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.record.RecordBase;
import org.apache.poi.hssf.record.RefModeRecord;
import org.apache.poi.hssf.model.InternalSheet;
import java.lang.reflect.Field;
import java.util.List;
public class CreateExcelRefModes {
static void setRefMode(HSSFWorkbook hssfWorkbook, String refMode) throws Exception {
for (Sheet sheet : hssfWorkbook) {
HSSFSheet hssfSheet = (HSSFSheet)sheet;
Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalsheet = (InternalSheet)_sheet.get(hssfSheet);
Field _records = InternalSheet.class.getDeclaredField("_records");
_records.setAccessible(true);
@SuppressWarnings("unchecked")
List<RecordBase> records = (List<RecordBase>)_records.get(internalsheet);
RefModeRecord refModeRecord = null;
for (RecordBase record : records) {
if (record instanceof RefModeRecord) refModeRecord = (RefModeRecord)record;
}
if ("R1C1".equals(refMode)) {
if (refModeRecord == null) {
refModeRecord = new RefModeRecord();
records.add(records.size() - 1, refModeRecord);
}
refModeRecord.setMode(RefModeRecord.USE_R1C1_MODE);
} else if ("A1".equals(refMode)) {
if (refModeRecord == null) {
refModeRecord = new RefModeRecord();
records.add(records.size() - 1, refModeRecord);
}
refModeRecord.setMode(RefModeRecord.USE_A1_MODE);
}
}
}
static String getRefMode(HSSFWorkbook hssfWorkbook) throws Exception {
for (Sheet sheet : hssfWorkbook) {
HSSFSheet hssfSheet = (HSSFSheet)sheet;
Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalsheet = (InternalSheet)_sheet.get(hssfSheet);
Field _records = InternalSheet.class.getDeclaredField("_records");
_records.setAccessible(true);
@SuppressWarnings("unchecked")
List<RecordBase> records = (List<RecordBase>)_records.get(internalsheet);
RefModeRecord refModeRecord = null;
for (RecordBase record : records) {
if (record instanceof RefModeRecord) refModeRecord = (RefModeRecord)record;
}
if (refModeRecord == null) return "not specified";
if (refModeRecord.getMode() == RefModeRecord.USE_R1C1_MODE) return "R1C1";
if (refModeRecord.getMode() == RefModeRecord.USE_A1_MODE) return "A1";
}
return null;
}
static void setRefMode(XSSFWorkbook xssfWorkbook, String refMode) {
if ("R1C1".equals(refMode)) {
if (xssfWorkbook.getCTWorkbook().getCalcPr() == null) xssfWorkbook.getCTWorkbook().addNewCalcPr();
xssfWorkbook.getCTWorkbook().getCalcPr().setRefMode(org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.R_1_C_1);
} else if ("A1".equals(refMode)) {
if (xssfWorkbook.getCTWorkbook().getCalcPr() == null) xssfWorkbook.getCTWorkbook().addNewCalcPr();
xssfWorkbook.getCTWorkbook().getCalcPr().setRefMode(org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.A_1);
}
}
static String getRefMode(XSSFWorkbook xssfWorkbook) {
if (xssfWorkbook.getCTWorkbook().getCalcPr() == null) return "not specified";
if (xssfWorkbook.getCTWorkbook().getCalcPr().getRefMode() == org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.R_1_C_1) return "R1C1";
if (xssfWorkbook.getCTWorkbook().getCalcPr().getRefMode() == org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.A_1) return "A1";
return null;
}
public static void main(String[] args) throws Exception {
Workbook workbook = new XSSFWorkbook(); String filePath = "./CreateExcelRefModes.xlsx";
//Workbook workbook = new HSSFWorkbook(); String filePath = "./CreateExcelRefModes.xls";
Sheet sheet = workbook.createSheet();
if (workbook instanceof XSSFWorkbook) {
XSSFWorkbook xssfWorkbook = (XSSFWorkbook)workbook;
setRefMode(xssfWorkbook, "R1C1" );
//setRefMode(xssfWorkbook, "A1" );
System.out.println(getRefMode(xssfWorkbook));
} else if (workbook instanceof HSSFWorkbook) {
HSSFWorkbook hssfWorkbook = (HSSFWorkbook)workbook;
setRefMode(hssfWorkbook, "R1C1" );
//setRefMode(hssfWorkbook, "A1" );
System.out.println(getRefMode(hssfWorkbook));
}
FileOutputStream out = new FileOutputStream(filePath);
workbook.write(out);
out.close();
workbook.close();
}
}
But the question is: why? Microsoft Excel uses the A1 reference mode by default when storing formulas. In stored Excel files you will never find R1C1 formulas. Office Open XML stores formulas as strings in XML, and although the Office Open XML specification allows R1C1 there, even Microsoft Excel itself never stores R1C1 formula strings. The old binary *.xls format stores formulas as binary Ptg records, which are independent of their string representation. The conversion to R1C1 is done in the Excel GUI only, by the Excel application while parsing the file. In doing so, it keeps two versions of each formula in memory, one A1 and one R1C1, so both kinds are available in the GUI and in VBA.
But Apache POI does not support R1C1 formulas so far. If it had to, it would have to do the conversion programmatically, as the Excel application does. But that code is not publicly available and has not been reverse engineered by Apache POI so far.
With current Apache POI versions, reflection is no longer necessary. HSSFSheet has a method getSheet which returns the InternalSheet, and InternalSheet has a method getRecords which returns the List<RecordBase>.
So the code could be changed like this:
...
/*
Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalsheet = (InternalSheet)_sheet.get(hssfSheet);
*/
InternalSheet internalsheet = hssfSheet.getSheet();
/*
Field _records = InternalSheet.class.getDeclaredField("_records");
_records.setAccessible(true);
@SuppressWarnings("unchecked")
List<RecordBase> records = (List<RecordBase>)_records.get(internalsheet);
*/
List<RecordBase> records = internalsheet.getRecords();
...
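For the *.xlsx case, the refMode attribute described above can also be read without POI, since the file is a ZIP archive containing xl/workbook.xml. The following is a minimal sketch using only the JDK (Java 9 or later for readAllBytes); RefModeReader and its method names are hypothetical, and columnLabel illustrates how the header from the question could then be rendered as 1, 2, 3... or A, B, C...

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of Apache POI. */
final class RefModeReader {

    /** Reads calcPr/@refMode from the bytes of xl/workbook.xml; "not specified" if absent. */
    static String refModeFromWorkbookXml(byte[] workbookXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(workbookXml));
            NodeList calcPr = doc.getElementsByTagNameNS("*", "calcPr");
            if (calcPr.getLength() == 0) {
                return "not specified";
            }
            // getAttribute returns "" when the attribute is missing.
            String mode = ((Element) calcPr.item(0)).getAttribute("refMode");
            return mode.isEmpty() ? "not specified" : mode;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse workbook.xml", e);
        }
    }

    /** Scans an .xlsx stream (a ZIP archive) for xl/workbook.xml and reads the reference mode. */
    static String readRefMode(InputStream xlsx) {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if ("xl/workbook.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return refModeFromWorkbookXml(zip.readAllBytes());
                }
            }
            return "not specified";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Header label for a zero-based column index: "1", "2", ... for R1C1, otherwise "A", "B", ... */
    static String columnLabel(int zeroBasedIndex, String refMode) {
        if ("R1C1".equals(refMode)) {
            return Integer.toString(zeroBasedIndex + 1);
        }
        StringBuilder letters = new StringBuilder();
        for (int n = zeroBasedIndex + 1; n > 0; n = (n - 1) / 26) {
            letters.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return letters.toString();
    }
}
```

A caller would typically treat "not specified" like "A1" when rendering headers, since the file then carries no explicit preference.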

PDFBox inconsistent PDTextField behaviour after setValue

PDFBox setValue() is not setting the value for every PDTextField; only a few fields are saved. It fails for fields whose names, as returned by getFullyQualifiedName(), look similar.
Note: field.getFullyQualifiedName() returns { customdutiesa, customdutiesb, customdutiesc }. It works for customdutiesa, but not for customdutiesb, customdutiesc, etc.
@Test
public void testb3Generator() throws IOException {
File f = new File(inputFile);
outputFile = String.format("%s_b3-3.pdf", "123");
try (PDDocument document = PDDocument.load(f)) {
PDDocumentCatalog catalog = document.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
int i = 0;
for (PDField field : acroForm.getFields()) {
i=i+1;
if (field instanceof PDTextField) {
PDTextField textField = (PDTextField) field;
textField.setValue(Integer.toString(i));
}
}
document.getDocumentCatalog().getAcroForm().flatten();
document.save(new File(outputFile));
document.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
Input pdf link : https://s3-us-west-2.amazonaws.com/kx-filing-docs/b3-3.pdf
Output pdf link : https://kx-filing-docs.s3-us-west-2.amazonaws.com/123_b3-3.pdf
The problem is that under certain conditions PDFBox does not construct appearances for fields it sets the value of, and, therefore, during flattening completely forgets the field content:
// in case all tests fail the field will be formatted by acrobat
// when it is opened. See FreedomExpressions.pdf for an example of this.
if (actions == null || actions.getF() == null ||
widget.getCOSObject().getDictionaryObject(COSName.AP) != null)
{
... generate appearance ...
}
(org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(String))
I.e. if there is a JavaScript action for value formatting associated with the field and no appearance stream is yet present, PDFBox assumes it does not need to create an appearance (and probably would do it wrong anyways as it does not use that formatting action).
When the form is flattened afterwards, as in this use case, that assumption is obviously wrong.
To force PDFBox to generate appearances for those fields too, simply remove the actions before setting the field values:
if (field instanceof PDTextField) {
PDTextField textField = (PDTextField) field;
textField.setActions(null);
textField.setValue(Integer.toString(i));
}
(from FillAndFlatten test testLikeAbubakarRemoveAction)

How to update cell reference values using Apache POI

I am using Apache POI to create a new XSSFWorkbook from an existing one after updating some values. Suppose I have two worksheets (let's say worksheets A and B) in my existing workbook. Worksheet B has some cell references to worksheet A. If I modify those cell values in worksheet A and save the result as a new workbook, the corresponding cell values in worksheet B should be updated too, but they are not. How can I update them programmatically? Thank you.
My code:
public void createExcel(ClientData cd) throws FileNotFoundException, IOException, InvalidFormatException{
// create a new file
double[] dataHolder1= cd.getFinalData1(), param1 = cd.getRecord1Param();
double[] dataHolder2 = cd.getFinalData2(), param2 = cd.getRecord2Param();
double[] ncv = cd.getNcv();
String[] pname = cd.getName();
Workbook workbook = new XSSFWorkbook(OPCPackage.open(new FileInputStream("template/mncv.xlsx"))); // or sample.xls
//CreationHelper createHelper = workbook.getCreationHelper();
Sheet s=workbook.getSheetAt(0);
int counter = dataHolder1.length + param1.length +param2.length+dataHolder2.length;//+ param1.length + param2.length;
// r = s.getRow(0);
// r.getCell(0).setCellValue("Param1");
// r.getCell(1).setCellValue("Record1");
// r.getCell(2).setCellValue("Param2");
// r.getCell(3).setCellValue("Record2");
int i;
for(i=0;i<counter;i++){
if(i<param1.length){
for(int j=0;j<param1.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(param1[j]);
i++;
}
}else if(i<dataHolder1.length+param1.length && i>=param1.length){
for(int j=0;j<dataHolder1.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(dataHolder1[j]);
i++;
}
}else if(i<dataHolder1.length+param1.length+param2.length && i>=dataHolder1.length+param1.length){
for(int j=0;j<param2.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(param2[j]);
i++;
}
}else{
for(int j=0;j<dataHolder2.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(dataHolder2[j]);
i++;
}
}
// if(i<=param1.length){
// r.getCell(0).setCellValue(param1[i-1]);
// r.getCell(2).setCellValue(param2[i-1]);
//
// }
// r.getCell(0).setCellValue(param1[i]);
//r.getCell(3).setCellValue(dataHolder2[i-1]);
i--;
}
for(int k=0;k<ncv.length;k++){
r = s.getRow(i);
r.getCell(0).setCellValue(ncv[k]);
i++;
}
s = workbook.getSheetAt(1);
s.getRow(2).getCell(5).setCellValue(pname[0]+" "+pname[1]+" "+pname[2]);
s.getRow(3).getCell(5).setCellValue(cd.getAge());
s.getRow(4).getCell(5).setCellValue(cd.getGender());
try (FileOutputStream out = new FileOutputStream("workbook.xlsx")) {
//WorkbookEvaluator we = new WorkbookEvaluator(workbook);
workbook.write(out);
out.close();
XSSFFormulaEvaluator.evaluateAllFormulaCells((XSSFWorkbook) workbook);
}catch(Exception e){
System.out.println(e);
}
The Excel file format caches the result of formula evaluation to make opening the file quicker. This means that when you're done making changes to your file, you'll need to evaluate all of the formula cells to update their cached values. (Otherwise, when you load the file in Excel, in almost all cases it will still show the old value until you go into that cell.)
Luckily, Apache POI provides code to do that; see the Formula Evaluation documentation for details. (You can choose to recalculate only certain formulas, if you know that just those cells have changed, or recalculate everything.)
For any cell, say "B5", at runtime,
cell.getReference();
will give you the cell reference (in this example it returns "B5").
cell.getReference().toString().charAt(0);
will give you the column reference ("B" if the current cell is B5); note that this only works for single-letter columns. Now
cell.getRowIndex();
or
cell.getReference().toString().charAt(1);
will give you the row index (getRowIndex() is zero-based, while charAt(1) returns a single character and so only works for single-digit rows). Now you have the reference of the target cell. Just replace these characters with the references you have already created. This will update the cell references.
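Since charAt only looks at one character, references such as "AB123" need a slightly more general split. The following is a small sketch using only the JDK; A1Reference and its methods are hypothetical names for illustration. POI's own org.apache.poi.ss.util.CellReference class is probably the more robust choice when POI is on the classpath.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that splits an A1-style reference into column letters and row number. */
final class A1Reference {
    // Optional "$" markers (absolute references) are accepted and ignored.
    private static final Pattern A1 = Pattern.compile("\\$?([A-Za-z]+)\\$?(\\d+)");

    private static Matcher match(String ref) {
        Matcher m = A1.matcher(ref);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an A1-style reference: " + ref);
        }
        return m;
    }

    /** Column part in upper case, for example "AB" for "AB123". */
    static String columnLetters(String ref) {
        return match(ref).group(1).toUpperCase();
    }

    /** One-based row number as shown in Excel, for example 123 for "AB123". */
    static int rowNumber(String ref) {
        return Integer.parseInt(match(ref).group(2));
    }
}
```

With this, columnLetters("B5") gives the column part and rowNumber("B5") the one-based row, regardless of how many letters or digits the reference contains.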
The following solution worked for me
wb.setForceFormulaRecalculation(true);
// replace "wb" with your HSSFWorkbook/XSSFWorkbook object
