xlsx analysis: where the XML value differs from the cell's displayed value - java

We are building a function that imports large xlsx files (more than 200 MB) into a database, using Apache POI to go through all the XML files and read the data.
The function is complete, but I have a question:
when I enter the value '1:16' in an xlsx cell, Excel automatically converts the stored type to a user-defined numeric format.
In the XML file you will see:
<c r="A1" s="1"><v>5.2777777777777778E-2</v></c>
I just need to get the value '1:16' back.
How can I do that?

The "number" 1:16 is converted by Excel to a time. Dates and times in Excel are stored as a number where the integer part is the number of days since the epoch and the decimal part is the fraction of the day that has elapsed.
So in your example:
= 0.0527777777777778 * 24 * 60 (hours per day * minutes per hour)
= 76 mins
= 1 hour 16 mins
With POI you will need to use a data formatter. Something like:
DataFormatter formatter = new DataFormatter(Locale.US);
if(DateUtil.isCellDateFormatted(cell)) {
String formattedData = formatter.formatCellValue(cell);
...
}
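The arithmetic above can be checked in plain Java without POI. This is only a sketch: the class and method names are made up for illustration, it rounds to the nearest minute, and it ignores seconds and the date part entirely.

```java
import java.util.Locale;

public class ExcelTimeFraction {

    /** Converts the fractional (time-of-day) part of an Excel serial value to "h:mm". */
    public static String toHoursAndMinutes(double serial) {
        // Keep only the fraction of the day; the integer part is the day count.
        double dayFraction = serial - Math.floor(serial);
        long totalMinutes = Math.round(dayFraction * 24 * 60);
        return String.format(Locale.ROOT, "%d:%02d", totalMinutes / 60, totalMinutes % 60);
    }

    public static void main(String[] args) {
        // Value taken from the <v> element in the question.
        System.out.println(toHoursAndMinutes(5.2777777777777778E-2)); // prints 1:16
    }
}
```

In real code the DataFormatter approach is preferable, because it applies whatever format the cell actually carries rather than assuming h:mm.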

Related

CSV parsing with Commons CSV - Quotes within quotes causing IOException

I am using Commons CSV to parse CSV content relating to TV shows. One of the shows has a show name which includes double quotes:
116,6,2,29 Sep 10,""JJ" (60 min)","http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"
The showname is "JJ" (60 min) which is already in double quotes. This is throwing an IOException java.io.IOException: (line 1) invalid char between encapsulated token and delimiter.
ArrayList<String> allElements = new ArrayList<String>();
CSVFormat csvFormat = CSVFormat.DEFAULT;
CSVParser csvFileParser = new CSVParser(new StringReader(line), csvFormat);
List<CSVRecord> csvRecords = null;
csvRecords = csvFileParser.getRecords();
for (CSVRecord record : csvRecords) {
int length = record.size();
for (int x = 0; x < length; x++) {
allElements.add(record.get(x));
}
}
csvFileParser.close();
return allElements;
CSVFormat.DEFAULT already sets withQuote('"')
I think that this CSV is not properly formatted as ""JJ" (60 min)" should be """JJ"" (60 min)" - but is there a way to get commons CSV to handle this or do I need to fix this entry manually?
Additional information: Other show names contain spaces and commas within the CSV entry and are placed within double quotes.
The problem here is that the quotes are not properly escaped, and your parser doesn't handle that. Try univocity-parsers, as this is the only parser for Java I know of that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:
//configure the parser to handle your situation
CsvParserSettings settings = new CsvParserSettings();
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
//create the parser
CsvParser parser = new CsvParser(settings);
//parse your line
String[] out = parser.parseLine("116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"");
for(String e : out){
System.out.println(e);
}
This will print:
116
6
2
29 Sep 10
"JJ" (60 min)
http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj
Hope it helps.
Disclosure: I'm the author of this library, it's open source and free (Apache 2.0 license)
Quoting mainly allows a field to contain separator characters. If embedded quotes in a field are not escaped, this can't work, so there isn't any point in using quotes. If your example value were "JJ", 60 Min, how would a parser know the comma is part of the field? The data format can't handle embedded commas reliably, so if you want to be able to do that, it is best to change the source to generate an RFC-compliant CSV format.
Otherwise, it looks like the data source is simply surrounding non-numeric fields with quotes and separating each field with a comma, so the parser needs to do the reverse. You should probably just treat the data as comma-delimited and strip the leading/trailing quotes yourself with removeStart/removeEnd (StringUtils in Apache Commons Lang).
You might use CSVFormat.withQuote(null), or forget about that and just use String.split(",").
You can use withEscape('\\') to ignore quotes within quotes:
CSVFormat csvFormat = CSVFormat.DEFAULT.withEscape('\\');
Reference: https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html
I think that having both quotations AND spaces in the same token is what confuses the parser. Try this:
CSVFormat csvFormat = CSVFormat.DEFAULT.withQuote('"').withQuote(' ');
That should fix it.
Example
For your input line:
String line = "116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";
Output is (and no exception is thrown):
[116, 6, 2, 29 Sep 10, ""JJ" (60 min)", "http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"]
No need for special parsers: just add a double quote in front of each embedded double quote:
116,6,2,29 Sep 10,"""JJ"" (60 min)",...
It's all specified in RFC 4180:
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
This is already implemented by CSVFormat.DEFAULT.
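To make the RFC 4180 rule concrete, here is a minimal hand-written sketch of a single-line parser that treats a doubled quote inside a quoted field as one literal quote. It is not how Commons CSV is implemented; the class name is invented for illustration, and it assumes well-formed input with no line breaks inside fields.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180LineParser {

    /** Splits one CSV line into fields; "" inside a quoted field becomes a literal quote. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote: keep one, skip the other
                    i++;
                } else {
                    inQuotes = false;    // closing quote of the field
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The line from the question, with the embedded quotes properly doubled.
        String line = "116,6,2,29 Sep 10,\"\"\"JJ\"\" (60 min)\","
                + "\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";
        for (String field : parseLine(line)) {
            System.out.println(field);
        }
    }
}
```

With the corrected input, the fifth field comes out as "JJ" (60 min), which is the value the question was after.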

BigDecimal without the exponential format --> 0.0000 from database being converted to 0E-4 in .txt file

I have to get value from a DB2 database and put it in a .txt file.
The value of a column is 0.0000
but in the .txt file it is coming as 0E-4.
When I take the value from the database, I put it in a String and then write it into a file.
How do I print 0.0000 in my .txt file?
Use DecimalFormat
Example :
double value = 123456789.6666;
System.out.println(value);
DecimalFormat formatter_1 = new DecimalFormat("#,###.0000");
String stValue = formatter_1.format(value);
System.out.println(stValue);
DecimalFormat formatter_2 = new DecimalFormat("###.0000");
stValue = formatter_2.format(value);
System.out.println(stValue);
Output :
1.234567896666E8
123,456,789.6666
123456789.6666
If the price_from_db column is declared as decimal(0,4) and there is no value in the database, it will default to 0.0000.
BigDecimal price;
If this value comes from the DB in exponential format, say 0E-11,
we can use price.toPlainString() in our Java code to get a value like 0.00000000000.
Worked for me!
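A small self-contained sketch of the toPlainString() suggestion follows. The class and helper names are illustrative only; the second helper, which fixes the number of decimals with setScale, is an addition of this sketch rather than part of the answer above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PlainDecimalDemo {

    /** Returns the value without exponent notation, keeping its own scale. */
    public static String toPlain(BigDecimal value) {
        return value.toPlainString();
    }

    /** Returns the value without exponent notation, rounded to a fixed number of decimals. */
    public static String toPlain(BigDecimal value, int decimals) {
        return value.setScale(decimals, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("0E-11"); // stands in for the value read from the DB
        System.out.println(price);                  // toString() may use exponent notation
        System.out.println(toPlain(price));         // prints 0.00000000000
        System.out.println(toPlain(price, 4));      // prints 0.0000
    }
}
```

The key point is to call toPlainString() on the BigDecimal itself before it is turned into a String by toString() or string concatenation.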

Error when grouping the rows of a file based on the date field

I have a large file with 10,000 rows, and each row has a date appended at the end. All the fields in a row are tab-separated. There are 10 distinct dates, and they have been randomly assigned across the 10,000 rows. I am now writing Java code to write all the rows with the same date into a separate file, so that each file holds the rows for one date.
I am trying to do it using string manipulation, but when I try to sort the rows based on date, I get an error on the date literal that says the literal is out of range. Here is the code that I used. Please have a look and let me know whether this is the right approach; if not, kindly suggest a better one. I tried changing the datatype to Long, but I still get the same error. A row in the file looks something like this:
Each field is tab separated and the fields are:
business id, category, city, biz.name, longitude, state, latitude, type, date
qarobAbxGSHI7ygf1f7a_Q ["Sandwiches","Restaurants"] Gilbert Jersey Mike's Subs -111.8120071 AZ 3.5 33.3788385 business 06012010
The code is:
File f=new File(fn);
if(f.exists() && f.length()>0)
{
BufferedReader br=new BufferedReader(new FileReader(fn));
BufferedWriter bw = new BufferedWriter(new FileWriter("FilteredDate.txt"));
String s=null;
while((s=br.readLine())!=null){
String[] st=s.split("\t");
if(Integer.parseInt(st[13])==06012010){
Thanks a lot for your time..
Try this,
List<String> sampleList = new ArrayList<String>();
sampleList.add("06012012");
sampleList.add("06012013");
sampleList.add("06012014");
sampleList.add("06012015");
//
//
String[] sampleArray = s.split(" ");
if (sampleArray != null)
{
String sample = sampleArray[sampleArray.length - 1];
if (sampleList.contains(sample))
{
stringBuilder.append(sample + "\n");
}
}
I suggest not using split, but rather:
String str = s.substring(s.lastIndexOf('\t') + 1);
In any case, you try to read st[13] when I see you only have 9 columns. You probably just need st[8].
One last thing, look at this post to learn what 06012010 really means: a leading 0 makes it an octal integer literal in Java, so it is safer to compare the date as a string.
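Putting these suggestions together, the grouping can be sketched as below. The date is kept as a String, so no numeric literal is involved. The class and method names are hypothetical, and the sketch groups rows in memory rather than writing the per-date files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupRowsByDate {

    /** Groups rows by their last tab-separated field, compared as text. */
    public static Map<String, List<String>> groupByLastField(List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // lastIndexOf returns -1 when there is no tab, so the whole row becomes the key.
            String date = row.substring(row.lastIndexOf('\t') + 1);
            groups.computeIfAbsent(date, key -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Shortened, made-up rows; only the trailing date field matters here.
        List<String> rows = Arrays.asList(
                "id1\tGilbert\t06012010",
                "id2\tPhoenix\t06012011",
                "id3\tTempe\t06012010");
        Map<String, List<String>> groups = groupByLastField(rows);
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " row(s)");
        }
    }
}
```

Each entry of the resulting map could then be written to its own file, for example one named after the date key.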

Writing in excel file with poi

I am reading a phone number from one Excel file and writing it into another Excel file using the following code:
cellph = row.getCell(3);
Object phone = cellph.getNumericCellValue();
String strphone = phone.toString();
cellfour.setCellType(cellfour.CELL_TYPE_STRING);
cellfour.setCellValue("0"+strphone);
It writes the phone number as 09.8546586. I want to write it as 098546586 (without the precision value). How can I do that?
Your problem isn't with the write. Your problem is with the read; that's what's giving you the floating-point number.
From your code and description, it looks like your phone numbers are stored in Excel as number cells, with an integer format applied to them. That means that when you retrieve the cell, you'll get a double, plus a cell format that tells you how to format it the way Excel does.
I think what you probably want to do is something more like:
DataFormatter formatter = new DataFormatter();
cellph = row.getCell(3);
String strphone = "(none available)";
if (cellph.getCellType() == Cell.CELL_TYPE_NUMERIC) {
// Number with a format
strphone = "0" + formatter.formatCellValue(cellph);
}
if (cellph.getCellType() == Cell.CELL_TYPE_STRING) {
// String, eg they typed ' before the number
strphone = "0" + cellph.getStringCellValue();
}
// For all other types, we'll show the none available message
cellfour.setCellType(cellfour.CELL_TYPE_STRING);
cellfour.setCellValue(strphone);
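The underlying issue can be reproduced without POI: a double holding a large whole number does not print as plain digits when converted with toString(). The sketch below is plain Java with an invented class name, and it uses DecimalFormat with a whole-number pattern as a stand-in for what DataFormatter does with an integer cell format.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class PhoneNumberFormatting {

    /** Formats a numeric cell value as a whole number and prepends the leading zero. */
    public static String withLeadingZero(double numericValue) {
        // Pattern "0" means: at least one integer digit, no decimals, no grouping.
        DecimalFormat wholeNumber =
                new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return "0" + wholeNumber.format(numericValue);
    }

    public static void main(String[] args) {
        double phone = 98546586d; // what getNumericCellValue() would hand back
        System.out.println("0" + Double.toString(phone)); // not plain digits
        System.out.println(withLeadingZero(phone));       // prints 098546586
    }
}
```

If leading zeros matter, storing phone numbers as text cells in the source sheet avoids the round trip through double altogether.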

Is there an elegant way to Generate Excel spreadsheet from List<POJO>? (JAVA)

In Java, is there an elegant way to generate an Excel spreadsheet from a List<POJO>?
There are two possible and radically different approaches:
Write a CSV file. That's comma-separated: you just write out your fields, separated by commas, into a file with a .csv extension. Excel can read that just fine and it's extremely simple.
Use Apache/Jakarta POI, a library, to write perfectly formatted, Office-compatible Excel files (Excel 95, 2003, ... various standards). This takes a bit more work.
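The CSV approach can be sketched in a few lines of plain Java. Everything here is illustrative: Player is a hypothetical two-field bean, the column names are made up, and the escaping covers only the basic RFC 4180 cases (commas, double quotes, line breaks).

```java
import java.util.Arrays;
import java.util.List;

public class SimpleCsvWriter {

    /** Hypothetical bean used only for this illustration. */
    public static final class Player {
        final long id;
        final String name;

        public Player(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Renders a header row plus one CSV row per bean, using CRLF line endings. */
    public static String toCsv(List<Player> players) {
        StringBuilder out = new StringBuilder("Id,Name\r\n");
        for (Player p : players) {
            out.append(p.id).append(',').append(escape(p.name)).append("\r\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Player> players = Arrays.asList(
                new Player(1, "Smith, John"),
                new Player(2, "Ann \"Ace\" Lee"));
        System.out.print(toCsv(players));
    }
}
```

Writing the returned string to a file with a .csv extension is enough for Excel to open it, although Excel will still guess the column types on its own.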
As a previous answer suggests, CSV is an easy way to do this, but Excel has a habit of inferring data types - for example, if a string looks like a number, it will be formatted as a number, even if you have double-quoted it. If you want more control, you can try generating Excel XML, which in your case may be using a template, and generating a table that looks a little bit like an HTML table. See an example of a simple Excel XML document.
You can try ssio
public class Player {
@SsColumn(index = 0, name = "Id")
private long id;
@SsColumn(index = 1) // the column name will be decided as "Birth Country"
private String birthCountry;
@SsColumn(index = 2, typeHandler = FullNameTypeHandler.class) //complex prop type
private FullName fullName;
@SsColumn(index = 3) //The enum's name() will be saved. Otherwise, use a typeHandler
private SportType sportType;
@SsColumn(index = 4, format = "yyyy/MM/dd") //date format
private LocalDate birthDate;
@SsColumn(index = 5, typeHandler = TimestampAsMillisHandler.class)
//if you prefer saving timestamp as number
private LocalDateTime createdWhen;
...
}
SaveParam<Player> saveParam =
//Excel-like file. For CSV, use "new CsvSaveParamBuilder()"
new OfficeSaveParamBuilder<Player>()
.setBeanClass(Player.class)
.setBeans(players)
.setOutputTarget(outputStream)
.build();
SsioManager ssioManager = SsioManagerFactory.newInstance();
SaveResult saveResult = ssioManager.save(saveParam);
