Excel to CSV conversion in Java: problems with currency representation

While converting from Excel to CSV using the jxl API, I face a problem: any currency amount greater than 999 is split into two columns,
e.g. $100,000 is split into 100 and 000.
Does anyone have reliable Java code that can convert Excel to CSV without this problem?
Regards,
nithen

Enclose the value in quotes; otherwise the comma is treated as a column separator:
"$100,000"

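If you write the CSV yourself, that quoting rule can be applied to every field. A minimal sketch, assuming you already have each cell's displayed text as a String (for example from jxl's Cell.getContents()); the class and method names are hypothetical. It follows the common RFC 4180 convention: wrap a field in double quotes if it contains a comma, a double quote or a line break, and double any embedded quotes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: quotes CSV fields following the usual RFC 4180 rules.
final class CsvFields {
    private CsvFields() {}

    /** Returns the field unchanged, or wrapped in quotes if it would otherwise break the row. */
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        // Embedded quotes are escaped by doubling them: a"b becomes "a""b"
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-extracted cell texts into one CSV line. */
    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvFields::escape).collect(Collectors.joining(","));
    }
}
```

For example, toCsvLine(Arrays.asList("Widget", "$100,000")) produces Widget,"$100,000", so the amount stays in one column.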
Related

Apache POI - Formula to find maximum length of characters in a column

Hi All,
My objective is to find the maximum length of characters in a column. To achieve that, I thought of using the piece of code below, but it does not give the expected result.
String range = "MAX(LEN(A1:A3))";
formulaCell.setCellFormula(range);
formulaEvaluator.evaluateInCell(formulaCell);
This is the issue I am facing:
The formula set in the code ends up as =MAX(LEN(@A1:A3)), so the formula returns the length of the cell in the same row as the formula cell.
I am not sure why Apache POI is adding '@' to the formula. Could someone explain this, or suggest an alternative way?
It is not Apache POI that puts the @ into the formula. Excel 365 does this (@ is its implicit intersection operator) because the part LEN(A1:A3) is wrong in a normal cell formula: LEN expects a single value, not multiple values or a cell range. So MAX(LEN(A1:A3)) has to be an array formula.
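So if you set the formula like this:
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellRange;
import org.apache.poi.ss.util.CellRangeAddress;
...
String formula = "MAX(LEN(A1:A3))";
// instead of cell.setCellFormula(formula), set it as an array formula on cell B1
CellRange<? extends Cell> cellRange = sheet.setArrayFormula(formula, CellRangeAddress.valueOf("B1"));
...
then Excel handles that formula correctly.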
However, POI's FormulaEvaluator cannot evaluate such array formulas correctly; it would evaluate only MAX(LEN(A1)).
So if Apache POI itself needs to know the maximum length of characters in a column, the only way is to iterate over all rows (cells) in that column, get the length of each cell's content, and take the maximum.
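The iteration described above can be sketched like this. It assumes Apache POI is on the classpath; measuring the text as DataFormatter displays it (with formula cells evaluated first) and skipping missing cells are choices made for this sketch, not part of the original answer, and the class name is hypothetical.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;

// Hypothetical helper: finds the longest displayed text in one column by iterating the rows.
final class ColumnMaxLength {
    private ColumnMaxLength() {}

    /** Length of the longest value in the zero-based column, as DataFormatter would display it. */
    static int maxLength(Sheet sheet, int columnIndex) {
        DataFormatter formatter = new DataFormatter();
        FormulaEvaluator evaluator =
                sheet.getWorkbook().getCreationHelper().createFormulaEvaluator();
        int max = 0;
        for (Row row : sheet) {                   // visits only rows that physically exist
            Cell cell = row.getCell(columnIndex); // null when the cell was never created
            if (cell == null) {
                continue;
            }
            // Formula cells are evaluated first, then formatted the way Excel would show them
            String text = formatter.formatCellValue(cell, evaluator);
            max = Math.max(max, text.length());
        }
        return max;
    }
}
```

Because the maximum is computed in Java, it does not depend on POI's limited array formula evaluation.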
Btw.: If anyone wonders why Excel 365 does not require the formula to be entered as an array formula using Ctrl+Shift+Enter, and does not show array markers ({=MAX(LEN(A1:A3))}) around it:
Excel 365 has a new feature called dynamic array formulas. This feature detects array formulas (in this case because LEN has a cell range as its argument) and marks such formulas as array formulas automatically. Additionally, it stores metadata in the cell indicating that the cell contains an array formula. So the special {...} markers are no longer needed. But Apache POI does not support this new feature.

How to avoid automatic data type changes in a .CSV file

I have created an application in Java and I use MySQL as the DBMS. The application has a button that exports my data to a .CSV file.
The problem is that when I open the .CSV file, the data is changed automatically. For example, 00001 becomes 1.
How can I stop Excel from automatically changing the data type?
If you open the *.csv directly in Excel, then all columns will be read as type General (meaning Excel guesses the value type automatically).
If you import the file as text into the current sheet, you can specify the value delimiter and also the type of each column.
Maybe this is related.
In that case the solution would be to save the value as
"=""00001"""
This will be interpreted as text by Excel.
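If the CSV is generated in Java, that wrapping can be produced per value. A minimal sketch with a hypothetical helper name: the value is first turned into the Excel formula ="00001", and that text is then quoted as a CSV field by doubling its quotes, which yields exactly "=""00001""". Keep in mind that Excel then treats the cell as a formula whose result is the text 00001.

```java
// Hypothetical helper: makes Excel show a value such as 00001 as text when the CSV is opened directly.
final class ExcelTextField {
    private ExcelTextField() {}

    static String asExcelText(String value) {
        // Step 1: build the Excel formula ="value" (quotes inside an Excel string literal are doubled)
        String formula = "=\"" + value.replace("\"", "\"\"") + "\"";
        // Step 2: quote the formula as a CSV field, doubling its quotes again
        return "\"" + formula.replace("\"", "\"\"") + "\"";
    }
}
```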
Have you tried prefixing it with a single quote?
Example: '00001
It's just a guess based on how that value would need to be typed into Excel for it to treat it as a string and not to convert the value into a numeric 1.
Another approach that might work: when creating the CSV, what value are you writing? Are you writing 00001 (that is, as a number) or "00001"?
You see, when this column is seen as a number, it is reasonable that you get 1 for 00001.
So, have you tried saving the values for that column as "00001"?

Reading a number as it is from an Excel file using Apache POI

I am using the Apache POI (version 3.9) framework for dealing with Excel files, i.e. reading Excel files.
I am having trouble with number types. For various reasons, users can write a number in the Excel file with either a comma or a dot as the decimal separator, i.e. 12.34 or 12,34.
Now, I would like to get the value as it is (if it is 12.34, I would like to get 12.34; if it is 12,34, I would like to get 12,34), but instead POI's Cell gives me a double, shown with a dot as the decimal separator.
So, if a cell value was 12,34, POI's Cell would give a double with the value 12.34.
This is not the behaviour I want. I would like to get the value that was entered (12,34).
How can I avoid or solve this?
I have searched Stack Overflow for similar problems and tried the solution given in the thread How can I read numeric strings in Excel cells as string (not numbers) with Apache POI?. I also tried the alternative org.apache.poi.ss.usermodel.DataFormatter class, and it works, but not when the value is 12,34 and the cell type is defined as Number in the Excel file itself.
If you are validating the numbers, you might have more luck converting your validation string into a number and comparing it with the value in the cell. The spreadsheet stores no information about the decimal or thousands separator. That is determined by the locale set in the application or OS and is display-only; it is not stored at all. So you will have to format the number as a String manually.
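Since only the number itself is stored, getting 12,34 back means choosing the locale yourself. A minimal sketch using java.text.NumberFormat, with a hypothetical class name; which locale the user's Excel used is an assumption the caller has to supply, because the file does not record it.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Formats a cell's numeric value (e.g. from getNumericCellValue()) with a caller-chosen locale.
final class LocaleDecimal {
    private LocaleDecimal() {}

    static String format(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        nf.setGroupingUsed(false);       // no thousands separators; only the decimal separator differs
        nf.setMaximumFractionDigits(10); // keep more digits than the default of 3
        return nf.format(value);
    }
}
```

With the same double, format(12.34, Locale.GERMANY) gives 12,34 and format(12.34, Locale.US) gives 12.34.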

Extract strings from a database table

I have a database table containing source code (not in a traditional language) which I want to parse using regex. Should I proceed row by row, or should I copy all of the rows to a text file and process them with regex?
I think it depends on the length of the code. If the code is short, it's better to parse it in one go (read all rows). But if the code is large, it's better to parse it chunk by chunk: say 100 rows, then another 100 rows, and so on. In the end, it depends on parsing performance.
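The chunk-by-chunk idea can be sketched as follows. This assumes the rows have already been fetched as strings (in practice they would come from JDBC, for example 100 rows per query); the class name, chunk size and pattern are illustrative. One caveat the sketch makes visible: a match that spans two chunks is not found, so chunk boundaries should fall where no construct is split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies a regex to the rows in fixed-size chunks instead of all at once.
final class ChunkedRegexScan {
    private ChunkedRegexScan() {}

    static List<String> findAll(List<String> rows, Pattern pattern, int chunkSize) {
        List<String> matches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            // Join the rows of this chunk so a match may span lines within the chunk
            String chunk = String.join("\n", rows.subList(start, end));
            Matcher m = pattern.matcher(chunk);
            while (m.find()) {
                matches.add(m.group());
            }
        }
        return matches;
    }
}
```

Measuring both variants (one large chunk versus several small ones) on real data is the way to settle the performance question.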

Reading string value from Excel with HSSF but it's double

I'm using HSSF-POI for reading Excel data. The problem is I have values in a cell that look like a number but really are strings. If I look at the format cell in Excel, it says the type is "text". Still the HSSF Cell thinks it's numeric. How can I get the value as a string?
If I try to use cell.getRichStringValue, I get exception; if cell.toString, it's not the exact same value as in Excel sheet.
Edit: until this gets resolved, I'll use
new BigDecimal(cell.getNumericCellValue()).toString()
The class you're looking for in POI is DataFormatter.
When Excel writes the file, some cells are stored as literal Strings, while others are stored as numbers. For the latter, a floating point value representing the cell is stored in the file, so when you ask POI for the value of the cell that's what it actually has.
Sometimes though, especially when doing Text Extraction (but not always), you want to make the cell value look like it does in Excel. It isn't always possible to get that exactly in a String (non full space padding for example), but the DataFormatter class will get you close.
If you're after a String of the cell, looking much as you had it looking in Excel, just do:
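import java.util.Locale;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellReference;
// Create a formatter, do this once
DataFormatter formatter = new DataFormatter(Locale.US);
.....
for (Cell cell : row) {
    CellReference ref = new CellReference(cell);
    // eg "The value of B12 is 12.4%"
    System.out.println("The value of " + ref.formatAsString() + " is " + formatter.formatCellValue(cell));
}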
The formatter returns String cells as-is, and for Numeric cells it applies the formatting rules from the cell's style to the cell's number.
If the documents you are parsing are always in a specific layout, you can change the cell type to "string" on the fly and then retrieve the value. For example, if column 2 should always be string data, set its cell type to string and then read it with the string-type get methods.
cell.setCellType(Cell.CELL_TYPE_STRING);
In my testing, changing the cell type did not modify the contents of the cell, but did allow it to be retrieved with either of the following approaches:
cell.getStringCellValue();
cell.getRichStringCellValue().getString();
Without an example of a value that is not converting properly, it is difficult to know if this will behave any differently than the cell.toString() approach you described in the description.
You mean HSSF-POI says cell.getCellType() == Cell.CELL_TYPE_NUMERIC, NOT Cell.CELL_TYPE_STRING as it should be?
I would think it's a bug in POI, but every cell contains a Variant, and a Variant has a type. It's hard to introduce a bug there, so instead I think Excel uses some extra data or a heuristic to report the field as text. The usual MS way, alas.
P.S. You cannot use any getString() on a Variant containing a number, as the binary representation of the Variant data depends on its type, and trying to get a string from what is actually a number would result in garbage, hence the exception.
The code below works for reading a cell of any type, as long as the cell contains a numeric value:
new BigDecimal(cell.getNumericCellValue());
e.g.
ase.setGss(new BigDecimal(hssfRow.getCell(3).getNumericCellValue()));
where the variable gss is of type BigDecimal.
Excel will convert anything that looks like a number, date or time from a string. See the MS Knowledge Base article, which basically suggests entering the number with an extra character that makes it a string.
You are probably dealing with an Excel problem. When you create the spreadsheet, the default cell format is General. With this format, Excel guesses the type based on the input, and this type is saved with each cell.
When you later change the cell format to Text, you are only changing the default. Excel doesn't change every existing cell's type automatically, and I haven't found a way to make it do so.
To confirm this, go to Excel, retype one of the numbers, and check whether HSSF now sees it as text.
You can also look at the real cell type by using this function:
=CELL("type", A1)
where A1 is the cell containing the number. It returns "l" (label) for text and "v" (value) for numbers.
The problem with Excel is that the default format is General. With this format, Excel stores numbers entered in the cell as numeric. You have to change the format to Text before entering the values. Re-entering the values after changing the format also works.
That will lead to little green triangles in the upper-left corner of the cells if the content looks like a number to Excel. If this is the case, the value really is stored as text.
With new BigDecimal(cell.getNumericCellValue()).toString() you will still have a lot of problems. For example, if you have identifying numbers (e.g. part numbers or classification numbers), you probably have cases with leading zeros, which will be a problem with the getNumericCellValue() approach.
I try to explain thoroughly to the party creating the files how to create the Excel files correctly, since I have to handle them with POI. If the files are uploaded by end users, I have even created a validation program that checks for the expected cell types when I know the columns in advance. As a by-product, you can also check various other things in the supplied files (e.g. whether the right columns are provided or mandatory values are present).
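The problems with the BigDecimal route are easy to see in isolation. A small sketch with hypothetical names: a part number typed as 00042 into a General cell is stored as the double 42.0, so the zeros cannot be recovered from it, and new BigDecimal(double) prints the exact binary value of the double rather than the decimal that was typed. BigDecimal.valueOf(double) goes through Double.toString and avoids the second issue, but not the first.

```java
import java.math.BigDecimal;

// Compares two ways of turning getNumericCellValue()'s double into text.
final class BigDecimalPitfalls {
    private BigDecimalPitfalls() {}

    // The approach from the question: exact binary expansion of the double
    static String viaConstructor(double value) {
        return new BigDecimal(value).toString();
    }

    // Uses Double.toString's shortest representation instead
    static String viaValueOf(double value) {
        return BigDecimal.valueOf(value).toString();
    }
}
```

Here viaConstructor(42.0) gives "42" and viaValueOf(42.0) gives "42.0": neither can bring back "00042". viaConstructor(12.34) gives a long binary expansion, while viaValueOf(12.34) gives "12.34".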
"The problem is I have values in a cell that look like a number" => look like number when viewed in Excel?
"but really are strings" => what does that mean? How do you KNOW that they really are strings?
"If I look at the format cell" => what's "the format cell"???
'... in Excel, it says the type is "text"' => Please explain.
"Still the HSSF Cell thinks it's numeric." => do you mean that the_cell.getCellType() returns Cell.CELL_TYPE_NUMERIC?
"How can I get the value as a string?" => if it's NUMERIC, get the numeric value using the_cell.getNumericCellValue(), then format it as a string any way you want to.
"If I try to use cell.getRichStringValue, I get exception;" => so it's not a string.
"if cell.toString, it's not the exact same value as in Excel sheet." => so cell.toString() doesn't format it the way that Excel formats it.
Whatever heuristic Excel uses to determine type is irrelevant to you. It's the RESULT of that decision as stored in the file and revealed by getCellType() that matters.
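Formatting a NUMERIC cell's value yourself, as suggested above, might look like this. A sketch with hypothetical names: one method drops the trailing .0 that Double.toString adds, and another zero-pads whole numbers to a width you must know in advance (e.g. from the column's specification), since the stored double no longer holds the leading zeros.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical helpers for turning a NUMERIC cell's double into the text you want.
final class NumericText {
    private NumericText() {}

    /** Plain decimal text without exponent or trailing zeros, e.g. 12.0 -> "12", 100000.0 -> "100000". */
    static String plain(double value) {
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    /** Whole number padded with leading zeros to a known width, e.g. (42.0, 5) -> "00042". */
    static String zeroPadded(double value, int width) {
        // Locale.ROOT keeps ASCII digits regardless of the default locale
        return String.format(Locale.ROOT, "%0" + width + "d", Math.round(value));
    }
}
```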
