Apache POI SAX XSSFReader reads wrong date format

Apache POI SAX XSSFReader reads wrong date format - java

Apache POI SAX reader implemented similar to this well known example https://github.com/pjfanning/poi-shared-strings-sample/blob/master/src/main/java/com/github/pjfanning/poi/sample/XLSX2CSV.java reads some date values not as they are presented in excel despite it is supposed to read "formatted value".
Value in excel file : 1/1/2019, "formatted value" read by reader : 1/1/19.
Any idea why there is a difference?
Apache POI version 3.17
Reader code:
package com.lopuch.sk.lita.is.importer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.log4j.Logger;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.util.ZipSecureFile;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.util.SAXHelper;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import com.lopuch.sk.lita.is.importer.fileImport.ExcelRowReadListener;
public class ExcelSaxImporter {
private static final Logger logger = Logger.getLogger(ExcelSaxImporter.class);
private ExcelRowReadListener listener;
public void setOnRowRead(ExcelRowReadListener listener) {
this.listener = listener;
}
public ExcelRowReadListener getListener() {
return listener;
};
public void process(byte[] fileByteArray)
throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
ZipSecureFile.setMinInflateRatio(0.0d);
OPCPackage opcpPackage = OPCPackage.open(new ByteArrayInputStream(fileByteArray));
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(opcpPackage);
XSSFReader xssfReader = new XSSFReader(opcpPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (iter.hasNext()) {
InputStream stream = iter.next();
processSheet(styles, strings, getHandler(), stream);
stream.close();
}
}
private SheetContentsHandler getHandler() {
return new SheetContentsHandler() {
private boolean firstCellOfRow = false;
private int currentRow = -1;
private int currentCol = -1;
// Maps column Letter name to its value.
// Does not contain key-value pair if cell value is null for
// currently
// processed column and row.
private Map<String, String> rowValues;
#Override
public void startRow(int rowNum) {
// Prepare for this row
firstCellOfRow = true;
currentRow = rowNum;
currentCol = -1;
rowValues = new HashMap<String, String>();
}
#Override
public void endRow(int rowNum) {
if (rowValues.keySet().size() == 0) {
logger.trace("Skipping calling rowRead() because of empty row");
} else {
ExcelSaxImporter.this.getListener().rowRead(rowValues);
}
}
#Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
if (firstCellOfRow) {
firstCellOfRow = false;
}
// gracefully handle missing CellRef here in a similar way
// as XSSFCell does
if (cellReference == null) {
cellReference = new CellAddress(currentRow, currentCol).formatAsString();
}
// Did we miss any cells?
int thisCol = (new CellReference(cellReference)).getCol();
currentCol = thisCol;
cellReference = cellReference.replaceAll("\\d","");
rowValues.put(cellReference, formattedValue);
}
#Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
};
}
/**
* Parses and shows the content of one sheet using the specified styles and
* shared-strings tables.
*
* #param styles
* #param strings
* #param sheetInputStream
*/
public void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, SheetContentsHandler sheetHandler,
InputStream sheetInputStream) throws IOException, ParserConfigurationException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
}

Difference in value displayed by excel and read by Apache POI comes from date formats that react to user language settings. From Excel:
Date formats that begin with an asterisk (*) responds to changes in regional date and time settings that are specified for the operating system.
Apache POI DataFormatter ignores these locale specific formats and returns default US format date. From Apache POI DataFormatter documentation:
Some formats are automatically "localized" by Excel, eg show as mm/dd/yyyy when loaded in Excel in some Locales but as dd/mm/yyyy in others. These are always returned in the "default" (US) format, as stored in the file.
To work around this behavior see answer to Java: excel to csv date conversion issue with Apache Poi

Related

Tensorflow lite model(.tflite) in android is always giving the same result for text classification

I am trying to classify the messages that i receive into 5 categories(sports, politics, business, tech, entertainment), but for every message I send the tflite model classify it as sports ONLY.
this is my model:
i am using Average Word Vector model to train and test my data. and it gives me correct predictions when testing it. However, when I integrate the model into android studio, the model always predict the message as sports with high accuracy(around 95%)
!pip install -q tflite-model-maker
import numpy as np
import os
from tflite_model_maker import configs
from tflite_model_maker import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker import TextClassifierDataLoader
import pandas as pd
import tensorflow as tf
assert tf.__version__.startswith('2')
data = pd.read_csv("bbc-text.csv")
print(data)
awv_spec = model_spec.get('average_word_vec')
awv_train_data = TextClassifierDataLoader.from_csv(
filename='bbc-text.csv',
text_column='text',
label_column='category',
model_spec=awv_spec,
is_training=True)
awv_test_data = TextClassifierDataLoader.from_csv(
filename='bbc-text1.csv',
text_column='text',
label_column='category',
model_spec=awv_spec,
is_training=False)
awv_model = text_classifier.create(awv_train_data, model_spec=awv_spec, epochs=20)
awv_model.evaluate(awv_test_data)
awv_model.export(export_dir='average_word_vec/')
and this is how i retrive info in android:
package com.example.letstalk.lib_interpreter;
import android.content.Context;
import android.content.res.AssetFileDescriptor;
import android.content.res.AssetManager;
import android.util.Log;
import com.example.letstalk.lib_interpreter.Result;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.support.metadata.MetadataExtractor;
public class TextClassificationClient {
private static final String TAG = "Interpreter";
private static final int SENTENCE_LEN = 256; // The maximum length of an input sentence.
// Simple delimiter to split words.
private static final String SIMPLE_SPACE_OR_PUNCTUATION = " |\\,|\\.|\\!|\\?|\n";
private static final String MODEL_PATH = "sentiment_analysis.tflite";
/*
* Reserved values in ImdbDataSet dic:
* dic["<PAD>"] = 0 used for padding
* dic["<START>"] = 1 mark for the start of a sentence
* dic["<UNKNOWN>"] = 2 mark for unknown words (OOV)
*/
private static final String START = "<START>";
private static final String PAD = "<PAD>";
private static final String UNKNOWN = "<UNKNOWN>";
/** Number of results to show in the UI. */
private static final int MAX_RESULTS = 3;
private final Context context;
private final Map<String, Integer> dic = new HashMap<>();
private final List<String> labels = new ArrayList<>();
private Interpreter tflite;
public TextClassificationClient(Context context) {
this.context = context;
}
/** Load the TF Lite model and dictionary so that the client can start classifying text. */
public void load() {
loadModel();
}
/** Load TF Lite model. */
private synchronized void loadModel() {
try {
// Load the TF Lite model
ByteBuffer buffer = loadModelFile(this.context.getAssets(), MODEL_PATH);
tflite = new Interpreter(buffer);
Log.v(TAG, "TFLite model loaded.");
// Use metadata extractor to extract the dictionary and label files.
MetadataExtractor metadataExtractor = new MetadataExtractor(buffer);
// Extract and load the dictionary file.
InputStream dictionaryFile = metadataExtractor.getAssociatedFile("vocab.txt");
loadDictionaryFile(dictionaryFile);
Log.v(TAG, "Dictionary loaded.");
// Extract and load the label file.
InputStream labelFile = metadataExtractor.getAssociatedFile("labels.txt");
loadLabelFile(labelFile);
Log.v(TAG, "Labels loaded.");
} catch (IOException ex) {
Log.e(TAG, "Error loading TF Lite model.\n", ex);
}
}
/** Free up resources as the client is no longer needed. */
public synchronized void unload() {
tflite.close();
dic.clear();
labels.clear();
}
/** Classify an input string and returns the classification results. */
public synchronized List<Result> classify(String text) {
// Pre-prosessing.
int[][] input = tokenizeInputText(text);
// Run inference.
Log.v(TAG, "Classifying text with TF Lite...");
float[][] output = new float[1][labels.size()];
tflite.run(input, output);
// Find the best classifications.
PriorityQueue<Result> pq =
new PriorityQueue<>(
MAX_RESULTS, (lhs, rhs) -> Float.compare(rhs.getConfidence(), lhs.getConfidence()));
for (int i = 0; i < labels.size(); i++) {
pq.add(new Result("" + i, labels.get(i), output[0][i]));
}
final ArrayList<Result> results = new ArrayList<>();
while (!pq.isEmpty()) {
results.add(pq.poll());
}
Collections.sort(results);
// Return the probability of each class.
return results;
}
/** Load TF Lite model from assets. */
private static MappedByteBuffer loadModelFile(AssetManager assetManager, String modelPath)
throws IOException {
try (AssetFileDescriptor fileDescriptor = assetManager.openFd(modelPath);
FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor())) {
FileChannel fileChannel = inputStream.getChannel();
long startOffset = fileDescriptor.getStartOffset();
long declaredLength = fileDescriptor.getDeclaredLength();
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);
}
}
/** Load dictionary from model file. */
private void loadLabelFile(InputStream ins) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(ins));
// Each line in the label file is a label.
while (reader.ready()) {
labels.add(reader.readLine());
}
}
/** Load labels from model file. */
private void loadDictionaryFile(InputStream ins) throws IOException {
BufferedReader reader = new BufferedReader(new InputStreamReader(ins));
// Each line in the dictionary has two columns.
// First column is a word, and the second is the index of this word.
while (reader.ready()) {
List<String> line = Arrays.asList(reader.readLine().split(" "));
if (line.size() < 2) {
continue;
}
dic.put(line.get(0), Integer.parseInt(line.get(1)));
}
}
/** Pre-prosessing: tokenize and map the input words into a float array. */
int[][] tokenizeInputText(String text) {
Log.d("hello", "tokenize: "+ text);
int[] tmp = new int[SENTENCE_LEN];
List<String> array = Arrays.asList(text.split(SIMPLE_SPACE_OR_PUNCTUATION));
int index = 0;
// Prepend <START> if it is in vocabulary file.
if (dic.containsKey(START)) {
tmp[index++] = dic.get(START);
}
for (String word : array) {
if (index >= SENTENCE_LEN) {
break;
}
tmp[index++] = dic.containsKey(word) ? dic.get(word) : (int) dic.get(UNKNOWN);
}
// Padding and wrapping.
Arrays.fill(tmp, index, SENTENCE_LEN - 1, (int) dic.get(PAD));
int[][] ans = {tmp};
return ans;
}
Map<String, Integer> getDic() {
return this.dic;
}
Interpreter getTflite() {
return this.tflite;
}
List<String> getLabels() {
return this.labels;
}
}

How can I go back to Main method in my code, And depending on Condition?

In this program I am Reading .xlsx file. And adding cell data to vector, if vector size is less-than 12 no need to read remaining data, and i need to go main method.
How can I do in my program ?
This is my Code :
package com.read;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Vector;
import org.apache.poi.openxml4j.opc.OPCPackage;
import java.io.InputStream;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class SendDataToDb {
public static void main(String[] args) {
SendDataToDb sd = new SendDataToDb();
try {
sd.processOneSheet("C:/Users/User/Desktop/New folder/Untitled 2.xlsx");
System.out.println("in Main method");
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public void processOneSheet(String filename) throws Exception {
System.out.println("executing Process Method");
OPCPackage pkg = OPCPackage.open(filename);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
System.out.println("count "+sst.getCount());
XMLReader parser = fetchSheetParser(sst);
// To look up the Sheet Name / Sheet Order / rID,
// you need to process the core Workbook stream.
// Normally it's of the form rId# or rSheet#
InputStream sheet2 = r.getSheet("rId2");
System.out.println("Sheet2");
InputSource sheetSource = new InputSource(sheet2);
parser.parse(sheetSource);
sheet2.close();
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
//System.out.println("EXECUTING fetchSheetParser METHOD");
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
System.out.println("Method :fetchSheetParser");
return parser;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*/
private class SheetHandler extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
Vector values = new Vector(20);
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
//long l = Long.valueOf(attributes.getValue("r"));
if(name.equals("c")){
columnNum++;
}
if(name.equals("c")) {
// Print the cell reference
// Figure out if the value is an index in the SST
String cellType = attributes.getValue("t");
if(cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// Clear contents cache
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
//System.out.println("Method :222222222");
// Process the last contents as required.
// Do now, as characters() may be called more than once
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if(name.equals("v")) {
values.add(lastContents);
}
if(name.equals("row")) {
System.out.println(values);
//values.setSize(50);
System.out.println(values.size()+" "+values.capacity());
//********************************************************
//I AM CHECKING CONDITION HERE, IF CONDITION IS TRUE I NEED STOP THE REMAINING PROCESS AND GO TO MAIN METHOD.
if(values.size() < 12)
values.removeAllElements();
//WHAT CODE I NEED TO WRITE HERE TO STOP THE EXECUTION OF REMAINING PROCESS AND GO TO MAIN
//***************************************************************
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
//System.out.println("method : 333333333333");
lastContents += new String(ch, start, length);
}
}
}
check the code in between lines of //******************************
and
//******************************************

You can throw a SAXException wherever you want the parsing to stop:
throw new SAXException("<Your message>")
and handle it in the main method.

After your checking, you should throw the Exception to get out from there and get it back to the main method.
throw new Exception("vector size has to be less than 12");

Java - XLSX parse & database export

I have an excel that filled about 50k-60k rows.
I have to make that excel content uploaded into MySQL, usually I use the apache poi to read and upload it into MySQL, but this file cannot be read using apache poi cause the file was to LARGE.
Can anybody guide me how to do that? Here is my sample code to upload the content into MySQL using apache poi (it works for some little xlsx files that contains 1000-2000 rows)
public static void uploadCrossSellCorpCard(FileItem file, String dbtable) {
System.out.println("UploadUtil Running" + file.getFileName().toString());
try {
for(int i = 0; i<=sheetx.getLastRowNum(); i++){
row = sheetx.getRow(i);
try{
int oc = (int) row.getCell(0).getNumericCellValue();
if((String.valueOf(oc).matches("[A-Za-z0-9]{3}"))){
String rm_name = row.getCell(1).getStringCellValue();
String company = row.getCell(2).getStringCellValue();
String product = row.getCell(3).getStringCellValue();
String detail = row.getCell(4).getStringCellValue();
String type = row.getCell(5).getStringCellValue();
String sql = "INSERT INTO " + dbtable + " VALUES('"
+ oc + "','" + rm_name + "','" + company + "','"
+ product + "','" + detail + "','" + type + "')";
save(sql);
System.out.println("Import rows " + i);
}
} catch (IllegalStateException e) {
e.printStackTrace();
} catch (NullPointerException e) {
System.out.println(e);
}
}
System.out.println("Success import xlsx to mysql table");
} catch (NullPointerException e){
System.out.println(e);
System.out.println("Select the file first before uploading");
}
}
Note: I use hibernate method for handle upload schema.. "save(sql)" is calling my hibernate method

You can try using Apache POI SAX - read the section --> XSSF and SAX (Event API) on https://poi.apache.org/spreadsheet/how-to.html
You can read entire excel with 60k rows or even 100k rows just like reading an xml file. only thing you need to take care is empty cell since xml tag for empty cell will just skip the cell it but you may like to update null value in db table for the cell representing empty value.
Solution --> you can read each row and fire insert statement in a loop. and keep watch on empty cell by monitoring cell address if gap occurs then check respective column name and accordingly update your insert statement with null value.
I hope this helps you. below sample code read excel and store it in ArrayList of ArrayList for tabular representation. I am printing message in console - "new row begins" before start reading and printing row. and cell number of each value before printing cell value itself.
I have not taken care of cell gaps for empty cell but that you can code it based on finding cell gap since in my case I don't have empty cell.
look for cell address in the console that helps you in spotting any gap and handling it as you wish.
Run this code and works fine for me. don't forget to add xmlbeans-2.3.0.jar
other then jars required by import statements.
import java.io.InputStream;
import java.util.ArrayList;
import org.apache.commons.lang3.time.DurationFormatUtils;
import org.apache.commons.lang3.time.StopWatch;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class ExcelToStringArray implements Cloneable {
public static ArrayList<ArrayList<StringBuilder>> stringArrayToReturn = new ArrayList<ArrayList<StringBuilder>>();
public static ArrayList<StringBuilder> retainedString;
public static Integer lineCounter = 0;
public ArrayList<ArrayList<StringBuilder>> GetSheetInStringArray(String PathtoFilename, String rId)
throws Exception {
ExcelToStringArray myParser = new ExcelToStringArray();
myParser.processOneSheet(PathtoFilename, rId);
return stringArrayToReturn;
}
public void processOneSheet(String PathtoFilename, String rId) throws Exception {
OPCPackage pkg = OPCPackage.open(PathtoFilename);
XSSFReader r = new XSSFReader(pkg);
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
InputStream sheet = r.getSheet(rId);
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
return parser;
}
private class SheetHandler extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
if (name.equals("row")) {
retainedString = new ArrayList<StringBuilder>();
if (retainedString.isEmpty()) {
stringArrayToReturn.add(retainedString);
retainedString.clear();
}
System.out.println("New row begins");
retainedString.add(new StringBuilder(lineCounter.toString()));
lineCounter++;
}
// c => cell
if (name.equals("c")) {
// Print the cell reference
System.out.print(attributes.getValue("r") + " - ");
// System.out.print(attributes.getValue("r") + " - ");
// Figure out if the value is an index in the SST
String cellType = attributes.getValue("t");
if (cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// Clear contents cache
lastContents = "";
}
public void endElement(String uri, String localName, String name) throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
if (nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if (name.equals("v")) {
System.out.println(lastContents);
// value of cell what it string or number
retainedString.add(new StringBuilder(lastContents));
}
}
public void characters(char[] ch, int start, int length) throws SAXException {
lastContents += new String(ch, start, length);
}
}
public static void main(String[] args) throws Exception {
StopWatch watch = new StopWatch();
watch.start();
ExcelToStringArray generate = new ExcelToStringArray();
// rID1 is first sheet in my workbook for rId2 for second sheet and so
// on.
generate.GetSheetInStringArray("D:\\Users\\NIA\\Desktop\\0000_MasterTestSuite.xlsx", "rId10");
watch.stop();
System.out.println(DurationFormatUtils.formatDurationWords(watch.getTime(), true, true));
System.out.println("done");
System.out.println(generate.stringArrayToReturn);
}
}

Low memory writing/reading with Apache POI

I'm trying to write a pretty large XLSX file (4M+ cells) and I'm having some memory issues.
I can't use SXSSF since I also need to read the existing cells in the template.
Is there anything I can do to reduce the memory footprint?
Perhaps combine streaming reading and streaming writing?

To handle large data with low memory, the best and I think the only option is SXSSF api-s.
If you need to read some data of the existing cells, I assume you do not need the entire 4M+ at the same time.
In such a case based on your application requirement, you can handle the window size yourself and keep in memory only the amount of data you need at a particular time.
You can start by looking at the example at :
http://poi.apache.org/spreadsheet/how-to.html#sxssf
Something as
SXSSFWorkbook wb = new SXSSFWorkbook(-1); // turn off auto-flushing and accumulate all rows in memory
// manually control how rows are flushed to disk
if(rownum % NOR == 0) {
((SXSSFSheet)sh).flushRows(NOR); // retain NOR last rows and flush all others
Hope this helps.

I used SAX parser to process events of the XML document presentation. This is
import com.sun.org.apache.xerces.internal.parsers.SAXParser;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
public class LowMemoryExcelFileReader {
private String file;
public LowMemoryExcelFileReader(String file) {
this.file = file;
}
public List<String[]> read() {
try {
return processFirstSheet(file);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private List<String []> readSheet(Sheet sheet) {
List<String []> res = new LinkedList<>();
Iterator<Row> rowIterator = sheet.rowIterator();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
int cellsNumber = row.getLastCellNum();
String [] cellsValues = new String[cellsNumber];
Iterator<Cell> cellIterator = row.cellIterator();
int cellIndex = 0;
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
cellsValues[cellIndex++] = cell.getStringCellValue();
}
res.add(cellsValues);
}
return res;
}
public String getFile() {
return file;
}
public void setFile(String file) {
this.file = file;
}
private List<String []> processFirstSheet(String filename) throws Exception {
OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ);
XSSFReader r = new XSSFReader(pkg);
SharedStringsTable sst = r.getSharedStringsTable();
SheetHandler handler = new SheetHandler(sst);
XMLReader parser = fetchSheetParser(handler);
Iterator<InputStream> sheetIterator = r.getSheetsData();
if (!sheetIterator.hasNext()) {
return Collections.emptyList();
}
InputStream sheetInputStream = sheetIterator.next();
BufferedInputStream bisSheet = new BufferedInputStream(sheetInputStream);
InputSource sheetSource = new InputSource(bisSheet);
parser.parse(sheetSource);
List<String []> res = handler.getRowCache();
bisSheet.close();
return res;
}
public XMLReader fetchSheetParser(ContentHandler handler) throws SAXException {
XMLReader parser = new SAXParser();
parser.setContentHandler(handler);
return parser;
}
/**
* See org.xml.sax.helpers.DefaultHandler javadocs
*/
private static class SheetHandler extends DefaultHandler {
private static final String ROW_EVENT = "row";
private static final String CELL_EVENT = "c";
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private List<String> cellCache = new LinkedList<>();
private List<String[]> rowCache = new LinkedList<>();
private SheetHandler(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
if (CELL_EVENT.equals(name)) {
String cellType = attributes.getValue("t");
if(cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
} else if (ROW_EVENT.equals(name)) {
if (!cellCache.isEmpty()) {
rowCache.add(cellCache.toArray(new String[cellCache.size()]));
}
cellCache.clear();
}
// Clear contents cache
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if(name.equals("v")) {
cellCache.add(lastContents);
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
lastContents += new String(ch, start, length);
}
public List<String[]> getRowCache() {
return rowCache;
}
}
}

jExcelApi - cell date format reusing doesn't work

I created a wrapper to jExcelApi classes to easily export lists of objects to Excel. To minimize object creation cell formats are created as static fields and reused in consecutive calls to export. But I have problems with Date format - the first call works good, but in all consecutive exports date cells have numeric format instead of date format. If I create a new object for date format instead of using static field, everything is fine. Is there any reason that using same format object for different sheets or workbooks fails?
Here is the code, with exception handling simplified, other data types omitted and probably some imports missing:
ExcelCellGenerator.java:
import jxl.write.WritableCell;
public interface ExcelCellGenerator<T> {
WritableCell getCell(int col, int row, T arg);
}
ExcelCellGeneratorFactory.java:
import jxl.write.DateFormat;
import jxl.write.DateTime;
import jxl.write.Label;
import jxl.write.NumberFormat;
import jxl.write.NumberFormats;
import jxl.write.WritableCell;
import jxl.write.WritableCellFormat;
import ExcelExporter.DateTimeExtractor;
final class ExcelCellGeneratorFactory {
private ExcelCellGeneratorFactory() {}
private static final WritableCellFormat DATE_FORMAT = new WritableCellFormat ( new DateFormat ("dd MMM yyyy hh:mm:ss")); // reusing this field fails
static public <T> ExcelCellGenerator<T> createDateCellGenerator(final DateTimeExtractor<T> extractor) {
return new ExcelCellGenerator<T>() {
public WritableCell getCell(int col, int row, T arg) {
return new DateTime(col, row, extractor.extract(arg), DATE_FORMAT);
// if there is new WritableCellFormat(new DateFormat(...)) instead of DATE_FORMAT, works fine
}
};
}
}
ExcelExporter.java:
import jxl.Workbook;
import jxl.write.DateFormat;
import jxl.write.DateTime;
import jxl.write.Label;
import jxl.write.NumberFormat;
import jxl.write.WritableCellFormat;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
public class ExcelExporter<T> {
// describe a column in Excel sheet
private static class ColumnDescription<T> {
public ColumnDescription() {}
// column title
private String title;
// a way to generate a value given an object to export
private ExcelCellGenerator<T> generator;
}
// all columns for current sheet
private List<ColumnDescription<T>> columnDescList = new ArrayList<ColumnDescription<T>>();
// export given list to Excel (after configuring exporter using addColumn function
// in row number rowStart starting with column colStart there will be column titles
// and below, in each row, extracted values from each rowList element
public byte[] exportList(int rowStart, int colStart, List<? extends T> rowList) {
final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
WritableWorkbook workbook;
try {
workbook = Workbook.createWorkbook(outputStream);
} catch (IOException e) {
e.printStackTrace();
}
final WritableSheet sheet = workbook.createSheet("Arkusz1", 0);
int currRow = rowStart;
try {
int currCol = colStart;
for (ColumnDescription<T> columnDesc : columnDescList) {
final Label label = new Label(currCol, currRow, columnDesc.title);
sheet.addCell(label);
currCol++;
}
currRow++;
for (T object : rowList) {
currCol = colStart;
for (ColumnDescription<T> columnDesc : columnDescList) {
sheet.addCell(columnDesc.generator.getCell(currCol, currRow, object));
currCol++;
}
currRow++;
}
workbook.write();
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
workbook.close();
} catch (Exception e) {
e.printStackTrace();
}
}
return outputStream.toByteArray();
}
// configure a Date column
public ExcelExporter<T> addColumn(String title, DateTimeExtractor<T> extractor) {
final ColumnDescription<T> desc = new ColumnDescription<T>();
desc.title = title;
desc.generator = ExcelCellGeneratorFactory.createDateCellGenerator(extractor);
columnDescList.add(desc);
return this;
}
// and test that shows the problem
public static void main(String []args) {
final ExcelExporter<Date> exporter = new ExcelExporter<Date>();
exporter.addColumn("Data", new DateTimeExtractor<Date>() {
public Date extract(Date date) {
return date;
}});
// this file looks OK
FileOutputStream ostream = new FileOutputStream("C:\\tmp\\test1.xls");
try {
ostream.write(exporter.exportList(0, 0, Collections.singletonList(new Date())));
} finally {
ostream.close();
}
// but in this file date is shown in cell with numeric format
final ExcelExporter<Date> exporter2 = new ExcelExporter<Date>();
exporter2.addColumn("Data", new DateTimeExtractor<Date>() {
public Date extract(Date date) {
return date;
}});
ostream = new FileOutputStream("C:\\tmp\\test2.xls");
try {
ostream.write(exporter2.exportList(0, 0, Collections.singletonList(new Date())));
} finally {
ostream.close();
}
}
}

Telcontar's answer was helpful as stating it's a feature, not a bug, but was not enough as not giving any link to FAQ or doc. So I did some research and found out a FAQ that says:
also, it's important that you Do Not declare your cell formats as static. As a cell format is added to a sheet, it gets assigned an internal index number.
So the answer is - formats cannot be reused in different sheets because they are not designed to be reused this way.

In jxl format objects can't be reused in multiple workbooks. i don't know why.

It is actually worse than that. The fonts and formats implicitly rely on "the workbook". The question of which workbook illuminates the problem. They appear to need to be reassigned after creating subsequent workbooks.
final WritableWorkbook workbook = Workbook.createWorkbook(response
.getOutputStream());
// We have to assign this every time we create a new workbook.
bodyText = new WritableCellFormat(WritableWorkbook.ARIAL_10_PT);
...
The API should be changed so that the constructors require as an argument the workbook they relate to, or the constructors should be private and the fonts and formats should be obtained from the workbook.
WritableCellFormat bodyText = new WritableCellFormat(workbook,
WritableWorkbook.ARIAL_10_PT);
or
WritableCellFormat bodyText = workbook.getCellFormat(
WritableWorkbook.ARIAL_10_PT);

you can try SmartXLS,the cell format can be reused anywhere you want.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Apache POI SAX XSSFReader reads wrong date format - java

Related

Tensorflow lite model(.tflite) in android is always giving the same result for text classification

How can I go back to Main method in my code, And depending on Condition?

Java - XLSX parse & database export

Low memory writing/reading with Apache POI

jExcelApi - cell date format reusing doesn't work

Categories

Resources