I need to read several xlsx files looking for data specific to an employee and, whenever I find data in a file, create another xlsx file whose name is the employee ID prepended to the name of the file the data was found in. E.g. there is an employee with emp ID 1 and there are several xlsx files such as A, B, C and so on; I need to look for data relating to emp ID 1 in each file, and for each file where I get a hit I need to create a file named 1_A.xlsx.
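To make the naming rule concrete, here is a minimal sketch of it. The class and method names are hypothetical and only illustrate the rule described above; they are not part of my actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputNames {
    /** Builds "<empId>_<source file name>" from the employee ID and the file that produced a hit. */
    static String forEmployee(String empId, Path sourceFile) {
        // getFileName() drops any directory part, so only "A.xlsx" is used
        return empId + "_" + sourceFile.getFileName().toString();
    }

    public static void main(String[] args) {
        System.out.println(forEmployee("1", Paths.get("data", "A.xlsx")));
    }
}
```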
I have built the logic and am using the Apache POI APIs for reading and writing, but my code throws an OutOfMemoryError after creating just the first output file, and it is unable to read the rest of the files.
I have tried using SXSSF instead of XSSF, but the same OOM happens.
Increasing the heap space is not an option for me.
Please help. Thanks in advance.
Here is a piece of the code:
//Reader:
List<Row> listOfRecords = new ArrayList<Row>();
// try-with-resources closes the stream and the workbook even if reading fails
try (FileInputStream fis = new FileInputStream(metaDataFile);
     XSSFWorkbook wb = new XSSFWorkbook(fis)) {
    XSSFSheet sheet = wb.getSheetAt(0);
    // collect every non-empty row of the first sheet
    for (Row row : sheet) {
        if (!isEmptyRow(row)) {
            listOfRecords.add(row);
        }
    }
}
//Writer
LOGGER.info("in createWorkbook");
// SXSSF keeps at most 200 rows in memory and flushes older rows to disk
Workbook empWorkbook = new SXSSFWorkbook(200);
Sheet empSheet = empWorkbook.createSheet("Itype Sheet For Emp_" + personnelNumber);
int rowNum = listOfRecords.size();
System.out.println("Creating excel");
for (int i = 0; i < rowNum; i++) {
    Row record = listOfRecords.get(i);
    // createRow(i), not createRow(i++): the loop header already increments i
    Row empRow = empSheet.createRow(i);
    if (!isEmptyRow(record)) {
        // getLastCellNum() already returns the last cell index plus one
        int colNum = record.getLastCellNum();
        for (int j = 0; j < colNum; j++) {
            Cell newCell = empRow.createCell(j);
            System.out.println("cellVal:" + String.valueOf(record.getCell(j)));
            newCell.setCellValue(String.valueOf(record.getCell(j)));
        }
    }
}
The writer method is called from within the reader.
Reading multiple xlsx files is indeed tricky business, but I finally solved it.
I had to break my code down several times to realise that the OOM error occurred because, after reading 3 files, no memory was left to process the rest.
xlsx files are compressed XML files. Reading one with the XSSF API loads the entire document into memory as an object tree, which quickly exhausts the heap. Switching to SXSSF does not help, because it only streams on the write side.
I found an excellent solution here:
https://github.com/monitorjbl/excel-streaming-reader
Hope this helps others who come here facing the same issue.
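To illustrate the point that an xlsx file is a ZIP archive of XML parts that can be read as a stream, here is a minimal JDK-only sketch. It is not how excel-streaming-reader is implemented. The class name XlsxRowCounter is made up, and the sketch assumes that worksheet parts are stored under xl/worksheets/ and that each row is a <row> element, as in files written by Excel.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XlsxRowCounter {
    /** Counts the row elements of the first worksheet part found, without building a DOM. */
    static int countRows(InputStream xlsx) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // worksheet XML needs no DTDs
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    // The parser pulls one event at a time, so memory use stays flat
                    XMLStreamReader xml = factory.createXMLStreamReader(zip);
                    int rows = 0;
                    while (xml.hasNext()) {
                        if (xml.next() == XMLStreamConstants.START_ELEMENT
                                && "row".equals(xml.getLocalName())) {
                            rows++;
                        }
                    }
                    xml.close();
                    return rows; // stop after the first worksheet
                }
            }
        }
        return 0; // no worksheet part found
    }
}
```

Because only one XML event is held at a time, memory use does not grow with the size of the sheet, which is the property a streaming reader relies on.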
How can I convert or save an Excel file as PDF? I'm using the Java Play Framework to generate some Excel files, and the requirement has now changed to PDF. I don't want to recode everything.
Is there a way to convert to PDF?
The Excel files I'm generating come from a template: I read the Excel template file, write my changes, and save the result as a new Excel file. That way, the template is unchanged. It contains borders, images, and other formatting.
You would need the following Java libraries and associated JAR files for the program to work.
POI v3.8
iText v5.3.4
Try this Example to convert XLS to PDF
The complete Java code that accepts Excel spreadsheet data as an input and transforms that to a PDF table data is provided below:
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
public class excel2pdf {
public static void main(String[] args) throws Exception{
FileInputStream input_document = new FileInputStream(new File("C:\\excel_to_pdf.xls"));
// Read workbook into HSSFWorkbook
HSSFWorkbook my_xls_workbook = new HSSFWorkbook(input_document);
// Read worksheet into HSSFSheet
HSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0);
// To iterate over the rows
Iterator<Row> rowIterator = my_worksheet.iterator();
//We will create output PDF document objects at this point
Document iText_xls_2_pdf = new Document();
PdfWriter.getInstance(iText_xls_2_pdf, new FileOutputStream("Excel2PDF_Output.pdf"));
iText_xls_2_pdf.open();
//we have two columns in the Excel sheet, so we create a PDF table with two columns
//Note: There are ways to make this dynamic in nature, if you want to.
PdfPTable my_table = new PdfPTable(2);
//We will use the object below to dynamically add new data to the table
PdfPCell table_cell;
//Loop through rows.
while(rowIterator.hasNext()) {
Row row = rowIterator.next();
Iterator<Cell> cellIterator = row.cellIterator();
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next(); //Fetch CELL
switch(cell.getCellType()) { //Identify CELL type
//you need to add more code here based on
//your requirement / transformations
case Cell.CELL_TYPE_STRING:
//Push the data from Excel to PDF Cell
table_cell=new PdfPCell(new Phrase(cell.getStringCellValue()));
//feel free to move the code below to suit to your needs
my_table.addCell(table_cell);
break;
}
//next line
}
}
//Finally add the table to PDF document
iText_xls_2_pdf.add(my_table);
iText_xls_2_pdf.close();
//we created our pdf file..
input_document.close(); //close xls
}
}
I hope this helps.
Add on to assylias's answer
The code from assylias above was very helpful to me in solving this problem. The answer from santhosh could be great if you don't care whether the resulting PDF looks exactly like an Excel PDF export would. However, if you are, say, filling out an Excel template using Apache POI and then trying to export it while preserving its look, without writing a ton of iText code just to get close to that look, then the VBS option is quite nice.
I'll share a Java version of the Kotlin code assylias has above in case that helps anyone. All credit to assylias for the general form of the solution.
In Java:
try {
    //create a temporary file and grab the path for it
    Path tempScript = Files.createTempFile("script", ".vbs");
    //read all the lines of the .vbs script into memory as a list
    //here we pull from the resources of a Gradle build, where the vbs script is stored
    //(toURI() works when the resource is a plain file on disk, not inside a jar)
    Path scriptSource = Paths.get(Main.class.getResource("xl2pdf.vbs").toURI());
    System.out.println("Path for vbs script is: '" + scriptSource + "'");
    List<String> script = Files.readAllLines(scriptSource);
    // append test.xlsm for file name. savePath was passed to this function
    String templateFile = savePath + "\\test.xlsm";
    String pdfFile = savePath + "\\test.pdf";
    System.out.println("templateFile is: " + templateFile);
    System.out.println("pdfFile is: " + pdfFile);
    //replace the placeholders in the vbs script with the chosen file paths
    //String.replace is literal, so the backslashes in the paths need no extra escaping
    for (int i = 0; i < script.size(); i++) {
        script.set(i, script.get(i).replace("XL_FILE", templateFile).replace("PDF_FILE", pdfFile));
        System.out.println("Line " + i + " is: " + script.get(i));
    }
    //write the modified code to the temporary script
    Files.write(tempScript, script);
    //create a processBuilder for starting an operating system process
    ProcessBuilder pb = new ProcessBuilder("wscript", tempScript.toString());
    //start the process on the operating system
    Process process = pb.start();
    //wait for the script to finish; timeout (a long) and minutes (a TimeUnit) are defined elsewhere
    boolean success = process.waitFor(timeout, minutes);
    if (!success) {
        System.out.println("Error: Could not print PDF within " + timeout + " " + minutes);
    } else {
        System.out.println("Process to run visual basic script for pdf conversion succeeded.");
    }
} catch (Exception e) {
    e.printStackTrace();
    Alert saveAsPdfAlert = new Alert(AlertType.ERROR);
    saveAsPdfAlert.setTitle("ERROR: Error converting to pdf.");
    saveAsPdfAlert.setHeaderText("Exception message is:");
    saveAsPdfAlert.setContentText(e.getMessage());
    saveAsPdfAlert.showAndWait();
}
VBS:
Option Explicit
Dim objExcel, strExcelPath, objSheet
strExcelPath = "XL_FILE"
Set objExcel = CreateObject("Excel.Application")
objExcel.WorkBooks.Open strExcelPath
Set objSheet = objExcel.ActiveWorkbook.Worksheets(1)
objSheet.ExportAsFixedFormat 0, "PDF_FILE",0, 1, 0, , , 0
objExcel.ActiveWorkbook.Close
objExcel.Application.Quit
An alternative is to use a VB script and call it from Java.
Example:
xl2pdf.vbs
Option Explicit
Dim objExcel, strExcelPath, objSheet
strExcelPath = "$XL_FILE"
Set objExcel = CreateObject("Excel.Application")
objExcel.WorkBooks.Open strExcelPath
Set objSheet = objExcel.ActiveWorkbook.Worksheets(1)
objSheet.ExportAsFixedFormat 0, "$PDF_FILE",0, 1, 0, , , 0
objExcel.ActiveWorkbook.Close
objExcel.Application.Quit
In Java (actually kotlin, but easy to translate)
fun xl2pdf(xlFile: Path, pdfFile: Path, timeout: Long = 1, timeUnit: TimeUnit = TimeUnit.MINUTES) {
val tempScript = Files.createTempFile("script", ".vbs")
val script = Files.readAllLines(Paths.get("xl2pdf.vbs"))
.map { it.replace("\$XL_FILE", "$xlFile") }
.map { it.replace("\$PDF_FILE", "$pdfFile") }
Files.write(tempScript, script)
try {
val pb = ProcessBuilder("wscript", tempScript.toString())
val process = pb.start()
val success = process.waitFor(timeout, timeUnit)
if (!success) LOG.error("Could not print PDF within $timeout $timeUnit")
} catch (e: IOException) {
LOG.error("Error while printing Excel file to PDF", e)
}
}
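Translated to Java, the placeholder substitution step could look like the sketch below. The class and method names are made up for illustration. It uses String.replace, which treats both the $ in the placeholders and the backslashes in Windows paths literally, so no escaping is needed (String.replaceAll would interpret them as regex syntax).

```java
import java.util.List;
import java.util.stream.Collectors;

final class VbsTemplate {
    /** Returns a copy of the script with $XL_FILE and $PDF_FILE replaced by the given paths. */
    static List<String> fill(List<String> scriptLines, String xlFile, String pdfFile) {
        return scriptLines.stream()
                // replace(CharSequence, CharSequence) performs a literal, non-regex substitution
                .map(line -> line.replace("$XL_FILE", xlFile).replace("$PDF_FILE", pdfFile))
                .collect(Collectors.toList());
    }
}
```

The resulting lines can then be written to the temporary .vbs file and run with wscript, as in the Kotlin version.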
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.xls.free</artifactId>
<version>5.1.0</version>
</dependency>
import com.spire.xls.FileFormat;
import com.spire.xls.Workbook;
import java.io.File;
public class EIceblueConverter {
public static void main(String[] args) {
for (Sources xls : Sources.values()) {
if (isFileExists(xls)) convert(xls);
}
}
private static boolean isFileExists(Sources xls) {
File file = new File(xls.getPath());
return file.exists() && file.isFile();
}
private static void convert(Sources xls) {
Workbook workbook = new Workbook();
workbook.loadFromFile(xls.getPath());
workbook.getConverterSetting().setSheetFitToPage(true);
workbook.saveToFile(Util.getOutputPath(xls.getPath()), FileFormat.PDF);
}
}
Before converting, you should adjust the view area in the .xls* file.
... and more converters, including an interesting solution: using LibreOffice to convert .xls* to .pdf.
(Try it out in src/main/java/jodconverter/AppStarter.java)
https://github.com/fedor83/xlsToPdfConverter.git
Here is a full-fledged working example.
Dependencies :
compile 'com.itextpdf:itextpdf:5.5.13.2'
compile 'org.apache.poi:poi-ooxml:5.0.0'
Java code:
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import java.util.Iterator;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
public class Excel2PDF {
public static void main(String[] args) throws Exception {
Workbook my_xls_workbook = WorkbookFactory.create(new File("/Users/harshad/Desktop/excel.xlsx"));
Sheet my_worksheet = my_xls_workbook.getSheetAt(0);
short availableColumns = my_worksheet.getRow(0).getLastCellNum();
System.out.println("Available columns : " + availableColumns);
Iterator<Row> rowIterator = my_worksheet.iterator();
Document iText_xls_2_pdf = new Document();
PdfWriter.getInstance(iText_xls_2_pdf, new FileOutputStream("/Users/harshad/Desktop/excel.pdf"));
iText_xls_2_pdf.open();
PdfPTable my_table = new PdfPTable(availableColumns);
PdfPCell table_cell = null;
// DataFormatter renders string, numeric, boolean, and date cells as the text Excel would display
// (for a formula cell it returns the formula text unless a FormulaEvaluator is also passed)
DataFormatter formatter = new DataFormatter();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// note: cellIterator() skips cells that were never created in the sheet
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
table_cell = new PdfPCell(new Phrase(formatter.formatCellValue(cell)));
my_table.addCell(table_cell);
}
}
iText_xls_2_pdf.add(my_table);
iText_xls_2_pdf.close();
my_xls_workbook.close();
}
}
I am having 2 issues using the apache POI to write data from a csv into an excel file.
The data consists of dates, and numbers
The issues are:
1) The numbers are written as strings.
2) Excel cannot read the date format (this messes the graphs up)
The code (that I received help with previously):
String name = "test";
Sheet sheet = wb.getSheet(name);
if (sheet == null) {
sheet = wb.createSheet(name);
}
int rowCount = 0;
Scanner scanner = new Scanner(new File("/tmp/" + name + ".csv"));
while (scanner.hasNextLine()) {
String[] rowData = scanner.nextLine().split(",");
for (int col = 0; col < rowData.length; col++) {
Row row = sheet.getRow(rowCount);
if (row == null)
row = sheet.createRow(rowCount);
Cell cell = row.getCell(col);
if (cell == null) {
cell = row.createCell(col);
}
cell.setCellValue(rowData[col]);
}
rowCount++;
}
wb.write(new FileOutputStream(excel));
}
1) I tried using Double.parseDouble(rowData[col]) when entering the data into the Excel file, but this gives an empty string error. I even set the cell format with style.setDataFormat(format.getFormat("#,##0.0000")); but it still does not work.
2) I tried using the date format cellStyle.setDataFormat(createHelper.createDataFormat().getFormat("m/d/yyyy hh:mm:ss")); but the Excel graphs still can't read this format. (When I manually copy and paste from the CSV file, it works.)
So basically, when copying data using Apache POI, none of the other data that relies on the copied cells is updated.
For example, if a cell holds the average of 100 cells and I manually copy data into those cells, it updates automatically. But when the data is copied through Java, the cell does not update.
Try parsing each field as a number first, and fall back to writing it as a string if that fails:
try {
double value = Double.parseDouble(rowData[col]);
cell.setCellValue(value);
} catch (NumberFormatException | NullPointerException e) {
String value = rowData[col];
cell.setCellValue(value);
}
(However, if all you need is for the file to open in Excel on double-click, you may not need Apache POI at all; you could simply copy the CSV file to a .xls file.)
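The same idea can be extended to the date column. Below is a minimal JDK-only sketch that classifies a CSV field as a number, a date-time, or plain text. The class name is hypothetical, and the date pattern is an assumption based on the m/d/yyyy hh:mm:ss format mentioned in the question; adjust it to whatever the CSV really contains.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class CsvFieldParser {
    // Assumed layout of the date column, e.g. 3/14/2015 09:26:53
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("M/d/yyyy H:mm:ss");

    /** Returns a Double, a LocalDateTime, or the trimmed String, in that order of preference. */
    static Object parse(String field) {
        String text = field == null ? "" : field.trim();
        if (text.isEmpty()) {
            return text; // avoids the "empty String" NumberFormatException
        }
        try {
            return Double.valueOf(text);
        } catch (NumberFormatException notANumber) {
            // fall through and try the date pattern
        }
        try {
            return LocalDateTime.parse(text, DATE_TIME);
        } catch (DateTimeParseException notADate) {
            return text; // neither a number nor a date: keep it as text
        }
    }
}
```

Each result can then be handed to the matching setCellValue overload (numeric, date, or String), with a date cell style applied to the date cells so that Excel displays them as dates rather than raw numbers.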
I am using Apache POI to create a new XSSFWorkbook from an existing one after updating some values. Suppose I have two worksheets (let's say worksheets A and B) in my existing workbook. Worksheet B has some cell references to worksheet A. If I modify those cell values in worksheet A and save the result as a new workbook, the corresponding cell values in worksheet B should be updated too, but they aren't. How can I update them programmatically? Thank you.
My code:
public void createExcel(ClientData cd) throws FileNotFoundException, IOException, InvalidFormatException{
// create a new file
double[] dataHolder1= cd.getFinalData1(), param1 = cd.getRecord1Param();
double[] dataHolder2 = cd.getFinalData2(), param2 = cd.getRecord2Param();
double[] ncv = cd.getNcv();
String[] pname = cd.getName();
Workbook workbook = new XSSFWorkbook(OPCPackage.open(new FileInputStream("template/mncv.xlsx"))); // or sample.xls
//CreationHelper createHelper = workbook.getCreationHelper();
Sheet s=workbook.getSheetAt(0);
int counter = dataHolder1.length + param1.length +param2.length+dataHolder2.length;//+ param1.length + param2.length;
// r = s.getRow(0);
// r.getCell(0).setCellValue("Param1");
// r.getCell(1).setCellValue("Record1");
// r.getCell(2).setCellValue("Param2");
// r.getCell(3).setCellValue("Record2");
int i;
for(i=0;i<counter;i++){
if(i<param1.length){
for(int j=0;j<param1.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(param1[j]);
i++;
}
}else if(i<dataHolder1.length+param1.length && i>=param1.length){
for(int j=0;j<dataHolder1.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(dataHolder1[j]);
i++;
}
}else if(i<dataHolder1.length+param1.length+param2.length && i>=dataHolder1.length+param1.length){
for(int j=0;j<param2.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(param2[j]);
i++;
}
}else{
for(int j=0;j<dataHolder2.length;j++){
r = s.getRow(i);
r.getCell(0).setCellValue(dataHolder2[j]);
i++;
}
}
// if(i<=param1.length){
// r.getCell(0).setCellValue(param1[i-1]);
// r.getCell(2).setCellValue(param2[i-1]);
//
// }
// r.getCell(0).setCellValue(param1[i]);
//r.getCell(3).setCellValue(dataHolder2[i-1]);
i--;
}
for(int k=0;k<ncv.length;k++){
r = s.getRow(i);
r.getCell(0).setCellValue(ncv[k]);
i++;
}
s = workbook.getSheetAt(1);
s.getRow(2).getCell(5).setCellValue(pname[0]+" "+pname[1]+" "+pname[2]);
s.getRow(3).getCell(5).setCellValue(cd.getAge());
s.getRow(4).getCell(5).setCellValue(cd.getGender());
try (FileOutputStream out = new FileOutputStream("workbook.xlsx")) {
//WorkbookEvaluator we = new WorkbookEvaluator(workbook);
workbook.write(out);
out.close();
XSSFFormulaEvaluator.evaluateAllFormulaCells((XSSFWorkbook) workbook);
}catch(Exception e){
System.out.println(e);
}
The Excel file format caches the result of formula evaluation, to make opening the file quicker. This means that when you're done making changes to your file, you'll need to evaluate all of the formula cells to update their cached values. (Otherwise, when you load the file in Excel, in almost all cases it'll still show the old value until you go into that cell.) The evaluation has to happen before the workbook is written; in your code it runs after workbook.write(out), so the recalculated values never reach the file.
Luckily, Apache POI provides code to do that; see the Formula Evaluation documentation for details. (You can choose to recalculate only certain formulas, if you know just those cells have changed, or do everything.)
For any cell, say "B5", at runtime,
cell.getReference();
will give you the cell reference (in this example it returns "B5").
cell.getReference().toString().charAt(0);
will give you the column reference ("B" if the current cell is B5). This only works for single-letter columns; for columns past Z, such as "AA", you need all of the leading letters. Now
cell.getRowIndex();
will give you the zero-based row index (4 for B5), OR
cell.getReference().toString().charAt(1);
will give you the row number as it appears in the reference ("5" for B5), but only for single-digit rows. Now you have the reference of the target cell. Just replace these characters with the references you have already created. This will update the cell references.
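Taking a single character only works for references like B5. A small sketch that splits any A1-style reference into its column letters and row number is shown below; the class name is made up, and it uses only the JDK. (POI also ships org.apache.poi.ss.util.CellReference for this kind of parsing, which may be preferable if you already depend on POI.)

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CellRefParts {
    // Optional $ markers, one or more letters, then one or more digits, e.g. B5, AA12, $C$7
    private static final Pattern A1 = Pattern.compile("\\$?([A-Za-z]+)\\$?([0-9]+)");

    /** Column letters of the reference, e.g. "AA" for "AA12". */
    static String column(String ref) {
        return match(ref).group(1).toUpperCase(Locale.ROOT);
    }

    /** One-based row number as written in the reference, e.g. 12 for "AA12". */
    static int row(String ref) {
        return Integer.parseInt(match(ref).group(2));
    }

    private static Matcher match(String ref) {
        Matcher m = A1.matcher(ref);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an A1-style reference: " + ref);
        }
        return m;
    }
}
```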
The following solution worked for me
wb.setForceFormulaRecalculation(true);
// replace "wb" with your HSSFWorkbook/XSSFWorkbook object
Sample Excel (sorry, I'm not allowed to attach an image):
TC No. | Title | Result
1 | State and Vin | Failed
2 | State and Reg Code | Passed
3 | Booking a Test Drive | Passed
public class sampleTest {
    public static void main(String[] args) throws Exception {
        int iTest = 2, iTest2 = 3;
        if (iTest == iTest2) {
            // It will pass a value into the Excel file (e.g. "Passed")
        } else {
            // It will pass a value into the Excel file (e.g. "Failed")
        }
    }
}
My program's goal is to generate a report from the Passed and Failed results of my tests. My main problem is how to read the existing results from the Excel file and write the value "Passed" or "Failed" under the Result column.
Download the Apache POI jar from here.
Go through these examples, which demonstrate how to read/write xls data from a Java program.
Sample help code:
public static void main(String[] args) throws IOException {
Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("sheet");
Row row = sheet.createRow((short) 0);
row.createCell(0).setCellValue(1.2);
row.createCell(1).setCellValue(wb.getCreationHelper().createRichTextString("This is a string"));
row.createCell(2).setCellValue(true);
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();
}
This might help you to get started.
The whole flow is:
Create a workbook => The main xls file
Then create a sheet
Then create a row.
For each row create as many cells as you want and fill the cells with different values
Write the workbook like a file.
There can be multiple types of cells; see this for more info.
To know how to read an excel file:
InputStream myxls = new FileInputStream("workbook.xls");
HSSFWorkbook wb = new HSSFWorkbook(myxls);
HSSFSheet sheet = wb.getSheetAt(0); // first sheet
HSSFRow row = sheet.getRow(0); // first row (indexes are zero-based)
HSSFCell cell = row.getCell(1); // second cell
if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
    System.out.println("The Cell was a String with value \" " + cell.getStringCellValue() + " \" ");
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
    System.out.println("The cell was a number " + cell.getNumericCellValue());
} else {
    System.out.println("The cell was nothing we're interested in");
}
myxls.close(); // release the input file
For more info see this
Use a library that acts as your interface to the Excel document. One option is Apache POI; Excel example code can be found here.
Another option is the Java Excel API.