Apache Poi identify formula with reference to another workbook

I have a workbook from which I must remove all references to other workbooks. I am currently trying to parse the cell formulas to check whether they reference any Excel file.
For that I use this line
cell.getCellFormula().matches(".*\\[.*\\.xls[xm]?\\].*")
the issue with this is, that the cell looks like this in XML format:
<c r="K64" s="2128">
<f>[5]Segments!$AS$7/Annual!AF38</f>
<v>0.0</v>
</c>
As you can see, the formula doesn't actually contain .xls, .xlsx or .xlsm at all. As far as I know, [5] indicates a shared string which holds the actual path and therefore the actual value for the formula.
Now one could simply change the regex to .*\\[\d+\\].*, but I think that would be pretty error-prone. I also doubt that every external reference will look like this.
So my question is:
How can I identify formulas which reference an external workbook?
If you have any questions, feel free to ask.
EDIT:
I have prepared a sample Excel file showcasing the issue. It's available for download at workupload.com

The approach shown in Dynamically add External (Cross-Workbook) references is definitely the way to go. Go through all formula tokens; if one of them has an external sheet index, the formula refers to an external sheet.
Example using your uploaded file:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.formula.*;
import org.apache.poi.ss.formula.ptg.*;
import org.apache.poi.ss.formula.EvaluationWorkbook.ExternalSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFEvaluationWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook;
import java.io.FileInputStream;
public class ExcelReadExternalReference {
public static void main(String[] args) throws Exception {
String filePath = "TestExternalLinks.xlsx";
// String filePath = "TestExternalLinks.xls";
Workbook workbook = WorkbookFactory.create(new FileInputStream(filePath));
EvaluationWorkbook evalWorkbook = null;
if (workbook instanceof HSSFWorkbook) {
evalWorkbook = HSSFEvaluationWorkbook.create((HSSFWorkbook) workbook);
} else if (workbook instanceof XSSFWorkbook) {
evalWorkbook = XSSFEvaluationWorkbook.create((XSSFWorkbook) workbook);
}
Sheet sheet = workbook.getSheetAt(0);
EvaluationSheet evalSheet = evalWorkbook.getSheet(0);
for (Row row : sheet) {
for (Cell cell : row) {
if (cell.getCellType() == CellType.FORMULA) {
String cellFormula = cell.getCellFormula();
System.out.println(cellFormula);
EvaluationCell evaluationCell = evalSheet.getCell(cell.getRowIndex(), cell.getColumnIndex());
Ptg[] formulaTokens = evalWorkbook.getFormulaTokens(evaluationCell);
for (Ptg formulaToken : formulaTokens) {
int externalSheetIndex = -1;
if (formulaToken instanceof Ref3DPtg) {
Ref3DPtg refToken = (Ref3DPtg) formulaToken;
externalSheetIndex = refToken.getExternSheetIndex();
} else if (formulaToken instanceof Area3DPtg) {
Area3DPtg refToken = (Area3DPtg) formulaToken;
externalSheetIndex = refToken.getExternSheetIndex();
} else if (formulaToken instanceof Ref3DPxg) {
Ref3DPxg refToken = (Ref3DPxg) formulaToken;
externalSheetIndex = refToken.getExternalWorkbookNumber();
} else if (formulaToken instanceof Area3DPxg) {
Area3DPxg refToken = (Area3DPxg) formulaToken;
externalSheetIndex = refToken.getExternalWorkbookNumber();
}
if (externalSheetIndex >= 0) {
System.out.print("We have external sheet index: " + externalSheetIndex
+ ". So this formula refers to an external sheet in workbook: ");
ExternalSheet externalSheet = null;
if (workbook instanceof HSSFWorkbook) {
externalSheet = evalWorkbook.getExternalSheet(externalSheetIndex);
} else if (workbook instanceof XSSFWorkbook) {
externalSheet = evalWorkbook.getExternalSheet(null, null, externalSheetIndex);
}
String linkedFileName = externalSheet.getWorkbookName();
System.out.println(linkedFileName);
}
}
}
}
}
workbook.close();
}
}
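If only the formula string is available, the bracket check from the question can still serve as a rough pre-filter. The following is a minimal sketch, not a replacement for the token-based check above: the class and method names are made up for illustration, and it assumes that an external reference shows up either as a bracketed index such as [5] or as a bracketed file name such as [Book1.xlsx]. String literals are stripped first, and brackets that directly follow a name (structured table references) are ignored.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache POI.
class ExternalRefHeuristic {
    // Double-quoted string literals; a quote inside a literal is escaped by doubling it.
    private static final Pattern STRING_LITERAL = Pattern.compile("\"(?:[^\"]|\"\")*\"");

    // [5] or [Book1.xlsx], but not Table1[5]: the bracket must not follow a name character.
    private static final Pattern EXTERNAL_MARKER = Pattern.compile(
            "(?<![\\w.\\]])\\[(\\d+|[^\\[\\]]+\\.xl[a-z]{1,2})\\]",
            Pattern.CASE_INSENSITIVE);

    static boolean looksExternal(String formula) {
        // Blank out string literals so that text like "[5]" inside quotes is not matched.
        String withoutLiterals = STRING_LITERAL.matcher(formula).replaceAll("\"\"");
        return EXTERNAL_MARKER.matcher(withoutLiterals).find();
    }

    public static void main(String[] args) {
        System.out.println(looksExternal("[5]Segments!$AS$7/Annual!AF38"));
        System.out.println(looksExternal("SUM(A1:B2)"));
    }
}
```

Being a heuristic, this can still misfire on unusual formulas, so the formula tokens remain the more reliable source.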

Related

How to get SignatureLines excel apache poi

Good morning,
I created an Excel file with signature lines.
I'm trying to read those signature lines with the Apache POI library.
XSSFWorkbook w = new XSSFWorkbook(mp.getInputStream());
w.get......?
Any suggestion?
Thanks in advance,
Pablo
I see there is a class called XSSFSignatureLine, but I can't find any example of how to use it.

To get or set a signature line in an Excel sheet, Apache POI introduced XSSFSignatureLine in version 5.x. This class provides a method parse(XSSFSheet sheet), which gets one signature line per sheet. It seems Apache POI did not expect that there can be multiple signature lines per sheet.
The text data of the signature lines is stored in a VML drawing. So if only the text data is needed, one can get the sheet's VML drawing and select the data from that XML. Of course, the binary data of the signature lines cannot be obtained from that XML.
The following code sample shows both methods.
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import java.io.FileInputStream;
class ExcelGetSignatureLines {
static void getSignatureLines(XSSFSheet sheet) throws Exception {
XSSFSignatureLine signatureLine = new XSSFSignatureLine();
signatureLine.parse(sheet);
System.out.println("Found XSSFSignatureLine:");
System.out.println(signatureLine.getSuggestedSigner());
System.out.println(signatureLine.getSuggestedSigner2());
System.out.println(signatureLine.getSuggestedSignerEmail());
}
static void getSignatureLinesFromVMLDrawing(XSSFSheet sheet) throws Exception {
XSSFVMLDrawing vmlDrawing = sheet.getVMLDrawing(false);
if (vmlDrawing != null) {
org.apache.poi.schemas.vmldrawing.XmlDocument vmlDrawingDocument = vmlDrawing.getDocument();
String declareNameSpaces = "declare namespace v='urn:schemas-microsoft-com:vml'; "
+ "declare namespace o='urn:schemas-microsoft-com:office:office' ";
org.apache.xmlbeans.XmlObject[] selectedObjects = vmlDrawingDocument.selectPath(
declareNameSpaces
+ ".//v:shape//o:signatureline");
for (org.apache.xmlbeans.XmlObject object : selectedObjects) {
if (object instanceof com.microsoft.schemas.office.office.CTSignatureLine) {
com.microsoft.schemas.office.office.CTSignatureLine ctSignatureLine = (com.microsoft.schemas.office.office.CTSignatureLine)object;
System.out.println("Found CTSignatureLine:");
System.out.println(ctSignatureLine.getSuggestedsigner());
System.out.println(ctSignatureLine.getSuggestedsigner2());
System.out.println(ctSignatureLine.getSuggestedsigneremail());
}
}
}
}
public static void main(String[] args) throws Exception {
Workbook workbook = WorkbookFactory.create(new FileInputStream("./WorkbookHavingSignatureLines.xlsx"));
for (Sheet sheet : workbook ) {
if (sheet instanceof XSSFSheet) {
System.out.println("Sheet " + sheet.getSheetName());
getSignatureLines((XSSFSheet)sheet);
getSignatureLinesFromVMLDrawing((XSSFSheet)sheet);
}
}
workbook.close();
}
}
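Since the text data lives in plain XML, the same selection can be sketched without the POI schema classes. The sketch below assumes the VML drawing part is already available as a well-formed XML string, and that the signer attributes are named suggestedsigner, suggestedsigner2 and suggestedsigneremail (mirroring the getters used above). Whether they are namespace-qualified is an assumption, so both forms are tried. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that reads signature line texts from VML markup using only the JDK.
class VmlSignatureLineReader {
    static final String NS_OFFICE = "urn:schemas-microsoft-com:office:office";

    // Returns one {signer, signer2, email} triple per o:signatureline element.
    static List<String[]> readSigners(String vmlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations to avoid resolving external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(vmlXml.getBytes(StandardCharsets.UTF_8)));
            NodeList lines = doc.getElementsByTagNameNS(NS_OFFICE, "signatureline");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < lines.getLength(); i++) {
                Element line = (Element) lines.item(i);
                result.add(new String[] {
                        attr(line, "suggestedsigner"),
                        attr(line, "suggestedsigner2"),
                        attr(line, "suggestedsigneremail")});
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse VML drawing", e);
        }
    }

    // Tries the namespace-qualified attribute first, then the unqualified one.
    private static String attr(Element element, String localName) {
        String qualified = element.getAttributeNS(NS_OFFICE, localName);
        return qualified.isEmpty() ? element.getAttribute(localName) : qualified;
    }
}
```

Unlike XSSFSignatureLine.parse, this returns every signature line found in the drawing, but only their text attributes.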

How to detect excel cell reference style of a file using apache POI?

I get an Excel file through the front end and I do not know which cell reference style (A1 or R1C1) the user prefers for that file. I want to display the header with the column positions as they appear in the file.
For example, if the file uses the R1C1 reference style, the column positions should be shown as 1, 2, 3... and for A1 references as A, B, C...
I want to achieve this using Java apache POI. Any lead in this will be helpful.
Thanks in advance.

The reference mode in use (either A1 or R1C1) can be stored in the Excel file. It may also be omitted; Excel then defaults to the setting last used in the application.
In the old binary *.xls file format (HSSF) it is stored in a RefModeRecord in the worksheet's record stream. It is stored for each worksheet separately, although it cannot differ between sheets of the same workbook.
In the Office Open XML file format (*.xlsx, XSSF) it is stored in xl/workbook.xml, in the refMode attribute of the calcPr element.
Neither is directly supported by Apache POI up to now. But if one knows the internal structure of the file formats, it can be set and read using the following code:
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.record.RecordBase;
import org.apache.poi.hssf.record.RefModeRecord;
import org.apache.poi.hssf.model.InternalSheet;
import java.lang.reflect.Field;
import java.util.List;
public class CreateExcelRefModes {
static void setRefMode(HSSFWorkbook hssfWorkbook, String refMode) throws Exception {
for (Sheet sheet : hssfWorkbook) {
HSSFSheet hssfSheet = (HSSFSheet)sheet;
Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalsheet = (InternalSheet)_sheet.get(hssfSheet);
Field _records = InternalSheet.class.getDeclaredField("_records");
_records.setAccessible(true);
@SuppressWarnings("unchecked")
List<RecordBase> records = (List<RecordBase>)_records.get(internalsheet);
RefModeRecord refModeRecord = null;
for (RecordBase record : records) {
if (record instanceof RefModeRecord) refModeRecord = (RefModeRecord)record;
}
if ("R1C1".equals(refMode)) {
if (refModeRecord == null) {
refModeRecord = new RefModeRecord();
records.add(records.size() - 1, refModeRecord);
}
refModeRecord.setMode(RefModeRecord.USE_R1C1_MODE);
} else if ("A1".equals(refMode)) {
if (refModeRecord == null) {
refModeRecord = new RefModeRecord();
records.add(records.size() - 1, refModeRecord);
}
refModeRecord.setMode(RefModeRecord.USE_A1_MODE);
}
}
}
static String getRefMode(HSSFWorkbook hssfWorkbook) throws Exception {
for (Sheet sheet : hssfWorkbook) {
HSSFSheet hssfSheet = (HSSFSheet)sheet;
Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalsheet = (InternalSheet)_sheet.get(hssfSheet);
Field _records = InternalSheet.class.getDeclaredField("_records");
_records.setAccessible(true);
@SuppressWarnings("unchecked")
List<RecordBase> records = (List<RecordBase>)_records.get(internalsheet);
RefModeRecord refModeRecord = null;
for (RecordBase record : records) {
if (record instanceof RefModeRecord) refModeRecord = (RefModeRecord)record;
}
if (refModeRecord == null) return "not specified";
if (refModeRecord.getMode() == RefModeRecord.USE_R1C1_MODE) return "R1C1";
if (refModeRecord.getMode() == RefModeRecord.USE_A1_MODE) return "A1";
}
return null;
}
static void setRefMode(XSSFWorkbook xssfWorkbook, String refMode) {
if ("R1C1".equals(refMode)) {
if (xssfWorkbook.getCTWorkbook().getCalcPr() == null) xssfWorkbook.getCTWorkbook().addNewCalcPr();
xssfWorkbook.getCTWorkbook().getCalcPr().setRefMode(org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.R_1_C_1);
} else if ("A1".equals(refMode)) {
if (xssfWorkbook.getCTWorkbook().getCalcPr() == null) xssfWorkbook.getCTWorkbook().addNewCalcPr();
xssfWorkbook.getCTWorkbook().getCalcPr().setRefMode(org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.A_1);
}
}
static String getRefMode(XSSFWorkbook xssfWorkbook) {
if (xssfWorkbook.getCTWorkbook().getCalcPr() == null) return "not specified";
if (xssfWorkbook.getCTWorkbook().getCalcPr().getRefMode() == org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.R_1_C_1) return "R1C1";
if (xssfWorkbook.getCTWorkbook().getCalcPr().getRefMode() == org.openxmlformats.schemas.spreadsheetml.x2006.main.STRefMode.A_1) return "A1";
return null;
}
public static void main(String[] args) throws Exception {
Workbook workbook = new XSSFWorkbook(); String filePath = "./CreateExcelRefModes.xlsx";
//Workbook workbook = new HSSFWorkbook(); String filePath = "./CreateExcelRefModes.xls";
Sheet sheet = workbook.createSheet();
if (workbook instanceof XSSFWorkbook) {
XSSFWorkbook xssfWorkbook = (XSSFWorkbook)workbook;
setRefMode(xssfWorkbook, "R1C1" );
//setRefMode(xssfWorkbook, "A1" );
System.out.println(getRefMode(xssfWorkbook));
} else if (workbook instanceof HSSFWorkbook) {
HSSFWorkbook hssfWorkbook = (HSSFWorkbook)workbook;
setRefMode(hssfWorkbook, "R1C1" );
//setRefMode(hssfWorkbook, "A1" );
System.out.println(getRefMode(hssfWorkbook));
}
FileOutputStream out = new FileOutputStream(filePath);
workbook.write(out);
out.close();
workbook.close();
}
}
But the question is: why? Microsoft Excel uses A1 reference mode by default when storing formulas. In stored Excel files you will never find R1C1 formulas. Office Open XML stores formulas as strings in XML, and although the Office Open XML specification allows R1C1 there, even Microsoft Excel itself never stores R1C1 formula strings. The old binary *.xls file format stores formulas as binary Ptg records, which are independent of their string representation. The conversion to R1C1 happens only in the Excel GUI: the Excel application performs it while parsing the file and keeps two forms of each formula in memory, one A1 and one R1C1. So both forms are available in the GUI and in VBA.
Apache POI does not support R1C1 formulas so far. If it had to, it would need to do the conversion programmatically, as the Excel application does. But that code is not publicly available and has not been reverse engineered by Apache POI up to now.
With current Apache POI versions, reflection is no longer necessary. HSSFSheet has a method getSheet, which returns the InternalSheet, and InternalSheet has a method getRecords, which returns the List<RecordBase>.
So the code could be changed like this:
...
/*
Field _sheet = HSSFSheet.class.getDeclaredField("_sheet");
_sheet.setAccessible(true);
InternalSheet internalsheet = (InternalSheet)_sheet.get(hssfSheet);
*/
InternalSheet internalsheet = hssfSheet.getSheet();
/*
Field _records = InternalSheet.class.getDeclaredField("_records");
_records.setAccessible(true);
@SuppressWarnings("unchecked")
List<RecordBase> records = (List<RecordBase>)_records.get(internalsheet);
*/
List<RecordBase> records = internalsheet.getRecords();
...
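Once the reference mode is known (for example from getRefMode above), the header labels asked for in the question can be derived from the column index alone. This is a minimal sketch with a hypothetical class name. Falling back to letters when the mode is "not specified" or null is an assumption, since Excel itself would then use whatever was last set in the application.

```java
// Hypothetical helper: builds the column header label for a given reference mode.
class ColumnHeaderLabels {
    // Converts a zero-based column index to Excel letters: 0 -> A, 25 -> Z, 26 -> AA.
    static String toLetters(int columnIndex) {
        StringBuilder letters = new StringBuilder();
        // Bijective base-26: work on the one-based number and peel off the last letter each round.
        for (int n = columnIndex + 1; n > 0; n = (n - 1) / 26) {
            letters.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return letters.toString();
    }

    // "R1C1" yields one-based numbers; anything else (including "not specified") yields letters.
    static String label(int columnIndex, String refMode) {
        return "R1C1".equals(refMode) ? String.valueOf(columnIndex + 1) : toLetters(columnIndex);
    }

    public static void main(String[] args) {
        for (int c = 0; c < 3; c++) {
            System.out.println(label(c, "R1C1") + " / " + label(c, "A1"));
        }
    }
}
```

Apache POI's CellReference.convertNumToColString does the same letter conversion, if a POI dependency is available at that point anyway.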

Unexpected record type (org.apache.poi.hssf.record.HyperlinkRecord)

The problem:
I'm just trying to open an .xls file using the Apache POI 4.1.0 library, and it gives the same error as in a similar question from 4 years ago.
I already tried versions 3.12 to 3.16, including 3.13.
All versions can open blank .xls files and ones I filled in myself, but not this one.
This document is generated automatically and I need to make a program that accepts it.
I already made a .NET Standard C# library that works. I tried to use Xamarin Android, but it was a horror: the app weighs 50 MB vs. 3 MB due to various terrible SDK link errors, but that's a different story. So I decided to do it in Kotlin.
Code is from the documentation
You can check file on git
val inputStream = FileInputStream("./test.xls")
val wb = HSSFWorkbook(inputStream)
I expect no errors while opening xls.
Actual output is
Exception in thread "main" java.lang.RuntimeException: Unexpected record type (org.apache.poi.hssf.record.HyperlinkRecord)
at org.apache.poi.hssf.record.aggregates.RowRecordsAggregate.<init>(RowRecordsAggregate.java:97)
at org.apache.poi.hssf.model.InternalSheet.<init>(InternalSheet.java:183)
at org.apache.poi.hssf.model.InternalSheet.createSheet(InternalSheet.java:122)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:354)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:400)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:381)
at ru.plumber71.toolbox.ExcelParcerKt.main(ExcelParcer.kt:19)
at ru.plumber71.toolbox.ExcelParcerKt.main(ExcelParcer.kt)
The document will not be modified in any way. Any other library that can just read the dataset or strings from the .xls file would be OK too.

After some investigation I found the problem with your test.xls file.
According to the file format specification, all HyperlinkRecords should be together in the Hyperlink Table, which is contained in the Sheet Substream following the cell records. In your case the HyperlinkRecords are between other records (between NumberRecords and LabelSSTRecords). So I suspect it was not Excel that created that test.xls file.
Excel might be tolerant enough to open that file nevertheless. But you cannot expect Apache POI to tolerate all possible violations of the file format as well. If you open the file in Excel and re-save it, Apache POI is able to create the Workbook afterwards.
Apache POI cannot repair this the way Excel can. But one could read the POIFSFileSystem at a low level and filter out the HyperlinkRecords that sit between other records. That way one can read the content using Apache POI, except for the hyperlinks, of course.
Example:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.NameRecord;
import org.apache.poi.hssf.record.NameCommentRecord;
import org.apache.poi.hssf.record.HyperlinkRecord;
import org.apache.poi.hssf.record.RecordFactoryInputStream;
import org.apache.poi.hssf.record.RecordFactory;
import org.apache.poi.hssf.model.RecordStream;
import org.apache.poi.hssf.model.InternalWorkbook;
import org.apache.poi.hssf.model.InternalSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFName;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.util.CellReference;
import java.util.List;
import java.util.ArrayList;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Constructor;
class ExcelOpenHSSF {
public static void main(String[] args) throws Exception {
String fileName = "test(2).xls";
try (InputStream is = new FileInputStream(fileName);
POIFSFileSystem fileSystem = new POIFSFileSystem(is)) {
//find workbook directory entry
DirectoryNode directory = fileSystem.getRoot();
String workbookName = "";
for(String wbName : InternalWorkbook.WORKBOOK_DIR_ENTRY_NAMES) {
if(directory.hasEntry(wbName)) {
workbookName = wbName;
break;
}
}
InputStream stream = directory.createDocumentInputStream(workbookName);
//loop over all records and manipulate if needed
List<Record> records = new ArrayList<Record>();
RecordFactoryInputStream recStream = new RecordFactoryInputStream(stream, true);
//here we filter out the HyperlinkRecords that are between other records (NumberRecords and LabelSSTRecords in that case)
//System.out.println prints the problematic records
Record record1 = null;
Record record2 = null;
while ((record1 = recStream.nextRecord()) != null) {
record2 = recStream.nextRecord();
if (!(record1 instanceof HyperlinkRecord) && (record2 instanceof HyperlinkRecord)) {
System.out.println(record1);
System.out.println(record2);
records.add(record1);
} else if ((record1 instanceof HyperlinkRecord) && !(record2 instanceof HyperlinkRecord)) {
System.out.println(record1);
System.out.println(record2);
records.add(record2);
} else {
records.add(record1);
if (record2 != null) records.add(record2);
}
}
//now create the HSSFWorkbook
//see https://svn.apache.org/viewvc/poi/tags/REL_4_1_0/src/java/org/apache/poi/hssf/usermodel/HSSFWorkbook.java?view=markup#l322
InternalWorkbook internalWorkbook = InternalWorkbook.createWorkbook(records);
HSSFWorkbook wb = HSSFWorkbook.create(internalWorkbook);
int recOffset = internalWorkbook.getNumRecords();
Method convertLabelRecords = HSSFWorkbook.class.getDeclaredMethod("convertLabelRecords", List.class, int.class);
convertLabelRecords.setAccessible(true);
convertLabelRecords.invoke(wb, records, recOffset);
RecordStream rs = new RecordStream(records, recOffset);
while (rs.hasNext()) {
InternalSheet internelSheet = InternalSheet.createSheet(rs);
Constructor constructor = HSSFSheet.class.getDeclaredConstructor(HSSFWorkbook.class, InternalSheet.class);
constructor.setAccessible(true);
HSSFSheet hssfSheet = (HSSFSheet)constructor.newInstance(wb, internelSheet);
Field _sheets = HSSFWorkbook.class.getDeclaredField("_sheets");
_sheets.setAccessible(true);
@SuppressWarnings("unchecked")
List<HSSFSheet> sheets = (ArrayList<HSSFSheet>)_sheets.get(wb);
sheets.add(hssfSheet);
}
for (int i = 0 ; i < internalWorkbook.getNumNames() ; ++i){
NameRecord nameRecord = internalWorkbook.getNameRecord(i);
Constructor constructor = HSSFName.class.getDeclaredConstructor(HSSFWorkbook.class, NameRecord.class, NameCommentRecord.class);
constructor.setAccessible(true);
HSSFName name = (HSSFName)constructor.newInstance(wb, nameRecord, internalWorkbook.getNameCommentRecord(nameRecord));
Field _names = HSSFWorkbook.class.getDeclaredField("names");
_names.setAccessible(true);
@SuppressWarnings("unchecked")
List<HSSFName> names = (ArrayList<HSSFName>)_names.get(wb);
names.add(name);
}
//now the workbook is created properly
System.out.println(wb);
/*
//getting the data
DataFormatter formatter = new DataFormatter();
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
for (Cell cell : row) {
CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
System.out.print(cellRef.formatAsString());
System.out.print(" - ");
String text = formatter.formatCellValue(cell);
System.out.println(text);
}
}
*/
}
}
}

I was able to open a file of this "corrupted" type by using the JExcel API.
Apache POI also opens the file if it is manually re-saved using the Excel application (this may not be suitable for everyone).
Sorry for asking strange questions. Thank you all, and I hope someone finds this useful.
val inputStream = FileInputStream("./testCorrupted.xls")
val workbook = Workbook.getWorkbook(inputStream)
val sheet = workbook.getSheet(0)
val cell1 = sheet.getCell(0, 0)
print(cell1.contents + ":")

Java Apache POI Excel save as PDF

How can I convert/save excel file to pdf? I'm using java play framework to generate some excel files and now the requirement changes to pdf. I don't want to recode everything.
Is there a way to convert to pdf?
The excel files I'm generating are from a template; I read the excel template file, write changes, and save as new excel file. That way, the template is unchanged. It contains border, image, and other formatting.

You would need the following Java libraries and associated JAR files for the program to work.
POI v3.8
iText v5.3.4
Try this Example to convert XLS to PDF
The complete Java code that accepts Excel spreadsheet data as an input and transforms that to a PDF table data is provided below:
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
public class excel2pdf {
public static void main(String[] args) throws Exception{
FileInputStream input_document = new FileInputStream(new File("C:\\excel_to_pdf.xls"));
// Read workbook into HSSFWorkbook
HSSFWorkbook my_xls_workbook = new HSSFWorkbook(input_document);
// Read worksheet into HSSFSheet
HSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0);
// To iterate over the rows
Iterator<Row> rowIterator = my_worksheet.iterator();
//We will create output PDF document objects at this point
Document iText_xls_2_pdf = new Document();
PdfWriter.getInstance(iText_xls_2_pdf, new FileOutputStream("Excel2PDF_Output.pdf"));
iText_xls_2_pdf.open();
//we have two columns in the Excel sheet, so we create a PDF table with two columns
//Note: There are ways to make this dynamic in nature, if you want to.
PdfPTable my_table = new PdfPTable(2);
//We will use the object below to dynamically add new data to the table
PdfPCell table_cell;
//Loop through rows.
while(rowIterator.hasNext()) {
Row row = rowIterator.next();
Iterator<Cell> cellIterator = row.cellIterator();
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next(); //Fetch CELL
switch(cell.getCellType()) { //Identify CELL type
//you need to add more code here based on
//your requirement / transformations
case Cell.CELL_TYPE_STRING:
//Push the data from Excel to PDF Cell
table_cell=new PdfPCell(new Phrase(cell.getStringCellValue()));
//feel free to move the code below to suit to your needs
my_table.addCell(table_cell);
break;
}
//next line
}
}
//Finally add the table to PDF document
iText_xls_2_pdf.add(my_table);
iText_xls_2_pdf.close();
//we created our pdf file..
input_document.close(); //close xls
}
}
I hope this helps.

Add on to assylias's answer
The code from assylias above was very helpful to me in solving this problem. The answer from santhosh could be great if you don't care whether the resulting PDF looks exactly like Excel's own PDF export. However, if you are, say, filling out an Excel template using Apache POI and then trying to export it while preserving its look, without writing a ton of iText code just to get close to that look, then the VBS option is quite nice.
I'll share a Java version of the Kotlin code assylias has above in case that helps anyone. All credit to assylias for the general form of the solution.
In Java:
try {
//create a temporary file and grab the path for it
Path tempScript = Files.createTempFile("script", ".vbs");
//read all the lines of the .vbs script into memory as a list
//here we pull from the resources of a Gradle build, where the vbs script is stored
System.out.println("Path for vbs script is: '" + Main.class.getResource("xl2pdf.vbs").toString().substring(6) + "'");
List<String> script = Files.readAllLines(Paths.get(Main.class.getResource("xl2pdf.vbs").toString().substring(6)));
// append test.xlsm for file name. savePath was passed to this function
String templateFile = savePath + "\\test.xlsm";
templateFile = templateFile.replace("\\", "\\\\");
String pdfFile = savePath + "\\test.pdf";
pdfFile = pdfFile.replace("\\", "\\\\");
System.out.println("templateFile is: " + templateFile);
System.out.println("pdfFile is: " + pdfFile);
//replace the placeholders in the vbs script with the chosen file paths
for (int i = 0; i < script.size(); i++) {
script.set(i, script.get(i).replaceAll("XL_FILE", templateFile));
script.set(i, script.get(i).replaceAll("PDF_FILE", pdfFile));
System.out.println("Line " + i + " is: " + script.get(i));
}
//write the modified code to the temporary script
Files.write(tempScript, script);
//create a processBuilder for starting an operating system process
ProcessBuilder pb = new ProcessBuilder("wscript", tempScript.toString());
//start the process on the operating system
Process process = pb.start();
//tell the process how long to wait for timeout
Boolean success = process.waitFor(timeout, minutes);
if(!success) {
System.out.println("Error: Could not print PDF within " + timeout + minutes);
} else {
System.out.println("Process to run visual basic script for pdf conversion succeeded.");
}
} catch (Exception e) {
e.printStackTrace();
Alert saveAsPdfAlert = new Alert(AlertType.ERROR);
saveAsPdfAlert.setTitle("ERROR: Error converting to pdf.");
saveAsPdfAlert.setHeaderText("Exception message is:");
saveAsPdfAlert.setContentText(e.getMessage());
saveAsPdfAlert.showAndWait();
}
VBS:
Option Explicit
Dim objExcel, strExcelPath, objSheet
strExcelPath = "XL_FILE"
Set objExcel = CreateObject("Excel.Application")
objExcel.WorkBooks.Open strExcelPath
Set objSheet = objExcel.ActiveWorkbook.Worksheets(1)
objSheet.ExportAsFixedFormat 0, "PDF_FILE",0, 1, 0, , , 0
objExcel.ActiveWorkbook.Close
objExcel.Application.Quit
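The Java snippet above doubles every backslash because String.replaceAll treats backslashes and dollar signs in the replacement text specially. A possible simplification, sketched below with a hypothetical class name, is to use String.replace, which substitutes literally, or to keep replaceAll and wrap the value in Matcher.quoteReplacement. The placeholder names XL_FILE and PDF_FILE are the ones from the script above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that fills the placeholders of the VBS template.
class VbsTemplate {
    // Literal substitution: String.replace does not interpret '\' or '$' in the replacement.
    static List<String> fill(List<String> templateLines, String xlFile, String pdfFile) {
        List<String> filled = new ArrayList<>();
        for (String line : templateLines) {
            filled.add(line.replace("XL_FILE", xlFile).replace("PDF_FILE", pdfFile));
        }
        return filled;
    }

    // Same result with replaceAll, provided pattern and replacement are quoted first.
    static String fillWithRegex(String line, String placeholder, String value) {
        return line.replaceAll(Pattern.quote(placeholder), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        List<String> template = Arrays.asList(
                "strExcelPath = \"XL_FILE\"",
                "objSheet.ExportAsFixedFormat 0, \"PDF_FILE\",0, 1, 0, , , 0");
        for (String line : fill(template, "C:\\out\\test.xlsm", "C:\\out\\test.pdf")) {
            System.out.println(line);
        }
    }
}
```

Either way, the paths end up in the script with single backslashes, which is what VBScript expects inside a string literal.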

An alternative is to use a VB script and call it from Java.
Example:
xl2pdf.vbs
Option Explicit
Dim objExcel, strExcelPath, objSheet
strExcelPath = "$XL_FILE"
Set objExcel = CreateObject("Excel.Application")
objExcel.WorkBooks.Open strExcelPath
Set objSheet = objExcel.ActiveWorkbook.Worksheets(1)
objSheet.ExportAsFixedFormat 0, "$PDF_FILE",0, 1, 0, , , 0
objExcel.ActiveWorkbook.Close
objExcel.Application.Quit
In Java (actually kotlin, but easy to translate)
fun xl2pdf(xlFile: Path, pdfFile: Path, timeout: Long = 1, timeUnit: TimeUnit = TimeUnit.MINUTES) {
val tempScript = Files.createTempFile("script", ".vbs")
val script = Files.readAllLines(Paths.get("xl2pdf.vbs"))
.map { it.replace("\$XL_FILE", "$xlFile") }
.map { it.replace("\$PDF_FILE", "$pdfFile") }
Files.write(tempScript, script)
try {
val pb = ProcessBuilder("wscript", tempScript.toString())
val process = pb.start()
val success = process.waitFor(timeout, timeUnit)
if (!success) LOG.error("Could not print PDF within $timeout $timeUnit")
} catch (e: IOException) {
LOG.error("Error while printing Excel file to PDF", e)
}
}

<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.xls.free</artifactId>
<version>5.1.0</version>
</dependency>
import com.spire.xls.FileFormat;
import com.spire.xls.Workbook;
import java.io.File;
public class EIceblueConverter {
public static void main(String[] args) {
for (Sources xls : Sources.values()) {
if (isFileExists(xls)) convert(xls);
}
}
private static boolean isFileExists(Sources xls) {
File file = new File(xls.getPath());
return file.exists() && file.isFile();
}
private static void convert(Sources xls) {
Workbook workbook = new Workbook();
workbook.loadFromFile(xls.getPath());
workbook.getConverterSetting().setSheetFitToPage(true);
workbook.saveToFile(Util.getOutputPath(xls.getPath()), FileFormat.PDF);
}
}
Before converting, you should edit the view area in file.xls*
... and more converters, including an interesting solution: using LibreOffice to convert .xls* to .pdf.
(do test it in src/main/java/jodconverter/AppStarter.java)
https://github.com/fedor83/xlsToPdfConverter.git

Here is a full-fledged working example
Dependencies :
compile 'com.itextpdf:itextpdf:5.5.13.2'
compile 'org.apache.poi:poi-ooxml:5.0.0'
Java code:
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
public class Excel2PDF {
public static void main(String[] args) throws Exception {
Workbook my_xls_workbook = WorkbookFactory.create(new File("/Users/harshad/Desktop/excel.xlsx"));
Sheet my_worksheet = my_xls_workbook.getSheetAt(0);
short availableColumns = my_worksheet.getRow(0).getLastCellNum();
System.out.println("Available columns : " + availableColumns);
Iterator<Row> rowIterator = my_worksheet.iterator();
Document iText_xls_2_pdf = new Document();
PdfWriter.getInstance(iText_xls_2_pdf, new FileOutputStream("/Users/harshad/Desktop/excel.pdf"));
iText_xls_2_pdf.open();
PdfPTable my_table = new PdfPTable(availableColumns);
PdfPCell table_cell = null;
// DataFormatter renders any cell type (string, numeric, boolean, date) as displayed text,
// so there is no need to catch IllegalStateException and compare exception messages.
// Note: without a FormulaEvaluator, formula cells are rendered as their formula string.
DataFormatter formatter = new DataFormatter();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
table_cell = new PdfPCell(new Phrase(formatter.formatCellValue(cell)));
my_table.addCell(table_cell);
}
}
iText_xls_2_pdf.add(my_table);
iText_xls_2_pdf.close();
my_xls_workbook.close();
}
}

Using hashmap for POI Java XLSX

I have been trying to edit my code so that an XLSX file can be uploaded and read by the website. But after countless tries, the data I typed into the XLSX file is still not captured by the website. (E.g.: after downloading the XLSX template from the website, I should be able to type anything I want into the XLSX file and upload it again, so I do not need to keep adding new data by clicking "new" every single time. I can just type everything into that XLSX file at once and upload it right away.)
I was told to use a HashMap, but I am unsure how it works. The code I currently have only enables the website to capture the header titles, and I am not supposed to use jxl.
After removing the code that uses jxl, I encounter some errors (underlined in red).
public HashMap getConstructJXLList_xlsx(UploadedFile File, int Sheetindex) {
String _LOC = "[PageCodeBase: getConstructJXLList]";
HashMap _m = new HashMap();
InputStream _is = null;
try {
_is = File.getInputstream();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
XSSFWorkbook workbook;
XSSFSheet s;
try {
workbook = new XSSFWorkbook(_is);
s = workbook.getSheetAt(Sheetindex);
} catch (Exception e) {
System.out.println(_LOC + "1.0 " + " Test:");
int _totalc = getColumns(); //getColumns is being underline in red
int _totalr = getRows(); //getRows is being underline in red
// Header r=0
String[] _st = new String[_totalc];
//XSSFSheet sheet = null;
for (int _c = 0; _c < _totalc; _c++) {
_st[_c] = getCell(_c, 0); //getCell is being underline in red
}
_m.put("HEADER", _st);
System.out.println(_LOC + "1.0 " + " _m:" + _m);
// Data r=1 thereafter
List _l = new ArrayList();
for (int _r = 1; _r < _totalr; _r++) {
Object[] _o = new Object[_totalc];
String _s_r = null;
for (int _c = 0; _c < _totalc; _c++) {
_o[_c] = getCell(_c, _r);
String _cn = _o[_c].getClass().getName();
String _s_c = null;
if (!isEmptyNull(_s_c)) {
_s_r = "record_available";
}
}
if ((_o != null) && (_o.length != 0)) {
_l.add(_o);
}
}
_m.put("DATA", _l);
System.out.println(_LOC + "1.0 " + " _m:" + _m);
}
return _m;
}
Could you help me solve this? Why isn't any data being captured by the website? The error shown is "The method getColumns/getCell/getRows is undefined for the type PageCodeBase", and the suggested quick fix is to create a new method. After creating the new methods, however, I am unsure what to put in them. I have tried various examples (http://snippetjournal.wordpress.com/2014/02/05/read-xlsx-using-poi/) but I still can't get it to work.

I would recommend handling the Excel file with these interfaces and classes from the Apache POI API
org.apache.poi.ss.usermodel.Cell;
org.apache.poi.ss.usermodel.Row;
org.apache.poi.ss.usermodel.Sheet;
org.apache.poi.ss.usermodel.Workbook;
org.apache.poi.ss.usermodel.WorkbookFactory;
instead of XSSFWorkbook, XSSFSheet and so on.
Also, when reading from the file input stream, try doing it this way:
FileInputStream input = new FileInputStream(new File("C:\\Users\\admin\\Desktop\\Load_AcctCntr_Template.xlsx"));
Workbook workBook = WorkbookFactory.create(input);
workBook.getSheetAt(0);
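Independently of how the cells are read, the map layout the question is after (a "HEADER" entry with the column titles and a "DATA" entry with one array per row) could be sketched roughly as follows. This is only an illustration under assumptions: each row is assumed to have been converted to a list of strings already, for example with the POI interfaces listed above, and the names SheetMapSketch and toHeaderAndData are hypothetical, not part of POI or of the asker's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of POI: builds the HEADER/DATA map from rows of cell text.
public class SheetMapSketch {

    // rows.get(0) is treated as the header row; every later row becomes one String[] in "DATA".
    public static Map<String, Object> toHeaderAndData(List<List<String>> rows) {
        Map<String, Object> result = new HashMap<>();
        List<String[]> data = new ArrayList<>();
        if (rows.isEmpty()) {
            result.put("HEADER", new String[0]);
            result.put("DATA", data);
            return result;
        }
        String[] header = rows.get(0).toArray(new String[0]);
        result.put("HEADER", header);

        for (int r = 1; r < rows.size(); r++) {
            List<String> row = rows.get(r);
            String[] record = new String[header.length];
            boolean hasValue = false;
            for (int c = 0; c < header.length; c++) {
                // Rows shorter than the header are padded with empty strings.
                String value = c < row.size() && row.get(c) != null ? row.get(c).trim() : "";
                record[c] = value;
                hasValue |= !value.isEmpty();
            }
            // Skip rows in which every cell is empty.
            if (hasValue) {
                data.add(record);
            }
        }
        result.put("DATA", data);
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Account", "Centre"),
                Arrays.asList("1001", "A1"),
                Arrays.asList("", ""),
                Arrays.asList("1002"));
        Map<String, Object> m = toHeaderAndData(rows);
        System.out.println(Arrays.toString((String[]) m.get("HEADER")));
        @SuppressWarnings("unchecked")
        List<String[]> data = (List<String[]>) m.get("DATA");
        for (String[] record : data) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```

Keeping the map-building step separate from the file access means the same method works regardless of whether the rows come from an .xls or an .xlsx file.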

Use this:
FileInputStream input = new FileInputStream(new File("C:/Users/admin/Desktop/Load_AcctCntr_Template.xlsx"));
Workbook wb = WorkbookFactory.create(input);
As mentioned in user3661357's answer, use
Workbook instead of XSSFWorkbook,
Sheet instead of XSSFSheet,
etc.
Also read this:
Getting Exception(org.apache.poi.openxml4j.exception - no content type [M1.13]) when reading xlsx file using Apache POI?
Hint: use ALT+SHIFT+I in NetBeans to add the necessary imports.
A working example:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
public class POITest {
public static void test() {
try {
FileInputStream input = new FileInputStream(new File("C:/Users/kingslayer/Desktop/test/a.xlsx"));
Workbook wb = WorkbookFactory.create(input);
Sheet s = wb.getSheetAt(0);
Iterator<Row> rows = s.rowIterator();
while (rows.hasNext()) {
Row row = rows.next();
Iterator<Cell> cells = row.cellIterator();
while (cells.hasNext()) {
Cell cell = cells.next();
if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
System.out.print(cell.getStringCellValue() + "\t");
} else if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
System.out.print(cell.getNumericCellValue() + "\t");
} else if (cell.getCellType() == Cell.CELL_TYPE_BLANK) {
System.out.print("BLANK ");
} else {
System.out.print("Unknown cell type");
}
}
input.close();
}
} catch (IOException | InvalidFormatException ex) {
Logger.getLogger(POITest.class.getName()).log(Level.SEVERE, null, ex);
}
}
public static void main(String[] args) {
test();
}
}
The libraries that must be on the project classpath:
commons-codec-1.5.jar
commons-logging-1.1.jar
dom4j-1.6.1.jar
junit-3.8.1.jar
log4j-1.2.13.jar
poi-3.9-20121203.jar
poi-excelant-3.9-20121203.jar
poi-ooxml-3.9-20121203.jar
poi-ooxml-schemas-3.9-20121203.jar
poi-scratchpad-3.9-20121203.jar
stax-api-1.0.1.jar
xmlbeans-2.3.0.jar

1) get rid of POIFSFileSystem fs = new POIFSFileSystem(input); as you are not using it
2) input.close(); is called after first iteration of row
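Regarding point 2, one way to make sure a stream is closed only once, after all reading is done, is try-with-resources. The sketch below is only an illustration: a hypothetical in-memory stream (CountingStream) stands in for the FileInputStream, and no POI classes are involved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CloseOnceSketch {

    // Hypothetical stand-in for the FileInputStream; it only counts close() calls.
    public static class CountingStream extends ByteArrayInputStream {
        public int closeCalls = 0;

        public CountingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closeCalls++;
            super.close();
        }
    }

    // Reads every byte inside the loop; try-with-resources closes the stream
    // once when the try block is left, not on each iteration.
    public static int countBytes(InputStream source) {
        int total = 0;
        try (InputStream input = source) {
            while (input.read() != -1) {
                total++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        CountingStream stream = new CountingStream(new byte[] {1, 2, 3});
        int bytes = countBytes(stream);
        System.out.println(bytes + " bytes read, close() calls: " + stream.closeCalls);
    }
}
```

The same shape applies to row iteration: the loop over rows goes inside the try block, and no explicit close() call is needed in the loop body.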
