Determining Categorical vs. Numerical Data in CSV File - java

I have a CSV (comma-separated values) file that has a mixture of categorical and numerical data.
I would like to be able to determine whether each piece of data is categorical or numerical, so I can programmatically plot the data in a scatter plot that I created in Swing.
Any ideas on how to achieve this in Java? I'm looking for approaches--not code.

Use Double.parseDouble() or Integer.parseInt() to check for numerical data.
Both methods take a String as an argument and return a double or an int primitive. These methods will throw a NumberFormatException if the string you pass is not numerical.
I see from your comment that you have fixed column ordering. In that case you already know which data is numerical and which is categorical by its position in the line, so only parse numerical values in the columns that expect them. If you then catch this exception, it means your data file is malformed or you have a bug in your parsing logic. Of course, you have to strip out the dollar sign before parsing.
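If the column layout were not known in advance, the same parseDouble check could be applied to every value in a column. The following is only a minimal sketch of that idea: the class and method names are made up, and it assumes a leading dollar sign is the only decoration that needs stripping. Keep in mind that Double.parseDouble also accepts strings such as "NaN" or "1e5", so a stricter check may be needed for real data.

```java
import java.util.List;

final class ColumnTypeSketch {

    // Returns true if the value parses as a double once an optional
    // leading '$' is removed. (Hypothetical helper, not from the question.)
    static boolean isNumeric(String raw) {
        if (raw == null) {
            return false;
        }
        String s = raw.trim();
        if (s.startsWith("$")) {
            s = s.substring(1);
        }
        if (s.isEmpty()) {
            return false;
        }
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Treats a column as numerical only if every value in it is numeric;
    // anything else is considered categorical.
    static boolean isNumericColumn(List<String> values) {
        if (values.isEmpty()) {
            return false;
        }
        for (String v : values) {
            if (!isNumeric(v)) {
                return false;
            }
        }
        return true;
    }
}
```

A column for which isNumericColumn returns false would then be plotted as categorical.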

Related

Apache POI fails to read numerical String [duplicate]

When I read numeric cell values from Excel, I get the output with decimals, e.g. 79 is read as 79.0 and 0.00 is read as 0.0. The code I have written in my application is:
int type = cell.getCellType();
if (type == HSSFCell.CELL_TYPE_NUMERIC)
System.out.println(cell.getNumericCellValue());
This question comes up a lot on Stack Overflow, so you'd be well advised to look through many of the similar ones to see their answers!
Excel stores almost all numbers in the file format as floating point values, which is why POI will give you back a double for a numeric cell as that's what was really there. If you want it to look like it does in Excel, you need to apply the formatting rules defined against the cell to the number. Thus, you want to do exactly the same thing as in my answer here. To quote:
What you want to do is use the DataFormatter class. You pass this a cell, and it does its best to return you a string containing what Excel would show you for that cell. If you pass it a string cell, you'll get the string back. If you pass it a numeric cell with formatting rules applied, it will format the number based on them and give you the string back.
For your case, I'd assume that the numeric cells have an integer formatting rule applied to them. If you ask DataFormatter to format those cells, it'll give you back a string with the integer string in it.
All you need to do is:
// Only need one of these
DataFormatter fmt = new DataFormatter();
// Once per cell
String valueAsSeenInExcel = fmt.formatCellValue(cell);
I'm not 100% sure this would work, but I believe what you're looking for is DecimalFormat.
I assume you're using Apache POI.
You should consider playing with DataFormats. Here is some very minimal draft code.
Otherwise, if the data in your workbook is consistent, you might want to round the double returned by getNumericCellValue to an int.
System.out.println((int)Math.round(cell.getNumericCellValue()));
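If rounding every value is too blunt (it would also turn 79.5 into 80), a sketch along these lines only drops the fraction when the double is a whole number. It uses only the standard library rather than POI, and the class and method names are made up for illustration.

```java
final class CellValueText {

    // Renders whole-number doubles without ".0" and leaves everything
    // else to Double.toString. The magnitude guard keeps the cast to
    // long within a range where a double still holds exact integers.
    static String toDisplayText(double value) {
        boolean whole = value == Math.rint(value)
                && !Double.isInfinite(value)
                && Math.abs(value) < 1e15;
        if (whole) {
            return Long.toString((long) value);
        }
        return Double.toString(value);
    }
}
```

You would call it as CellValueText.toDisplayText(cell.getNumericCellValue()). Unlike DataFormatter, this ignores the cell's formatting rules entirely.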
Use the DataFormatter as follows; it will detect the format of the cell automatically and produce the string output:
DataFormatter formatter = new DataFormatter();
String df2 = formatter.formatCellValue(cell);
If you are still facing issues with this, you can use NumberToTextConverter:
https://poi.apache.org/apidocs/dev/org/apache/poi/ss/util/NumberToTextConverter.html
NumberToTextConverter.toText(cell.getNumericCellValue())

How to insert float values separated by a comma to Excel?

Hello to all once again :)
I have an ArrayList of Strings which contains various data. It is filled with integers, decimal numbers, simple strings and so on (but all of them are stored as Strings).
Until now I have been storing those values only as Strings in an Excel file, using the POI library with the following code:
cell.setCellValue(listOfResults.get(iterationNumber));
Now I have to face another problem. From time to time the results in this ArrayList are floats in the format *,* or integers and, as you can see, the floats use a comma as the decimal separator, i.e.:
0
1
1,23
213,23899
For the above data I have to set the cell type to NUMERIC (not general, as before). So I tried this:
CellStyle style = wb.createCellStyle();
style.setDataFormat(HSSFDataFormat.getBuiltinFormat("0,00"));
cell.setCellStyle(style);
cell.setCellValue(Float.parseFloat(listOfResults.get(iteration)));
// after this it fails with the following exception
java.lang.NumberFormatException: For input string: "2,26776"
And I know that the problem is connected with the comma in the value.
So please give me a hint in these two areas:
1.
HSSFDataFormat.getBuiltinFormat("0,00")
How do I set it properly so that it works for my example data ["0", "1", "1,23","21312,23999"] and so on?
2.
cell.setCellValue(Float.parseFloat(listOfResults.get(iteration)))
How do I properly parse a float with "," inside? I have tried DecimalFormatSymbols, but it doesn't work the way I want with POI.
Thanks a lot in advance!
I think you could just write the comma-formatted floats into your Excel file as strings. For that you do:
cell.setCellValue(listOfResults.get(iteration));
If you intend to use the cell value as a number in further calculations, then I think you only have to convert the string to a float:
float value = Float.parseFloat(listOfResults.get(iteration).replace(",", "."));
cell.setCellValue(value);
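An alternative to replacing the comma by hand is to let java.text.NumberFormat parse the value with a locale that uses a comma as the decimal separator. This is only a sketch: Locale.GERMANY is an assumption about where the data comes from, and the class name is made up. Note that in this locale a "." is treated as a grouping separator. On the format side, as far as I know the pattern passed to getBuiltinFormat uses "." for the decimal point regardless of locale, so "0.00" is the likely candidate rather than "0,00"; please verify that against the POI documentation.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class CommaDecimalParser {

    // Parses strings such as "0", "1,23" or "213,23899" into a double.
    // Throws NumberFormatException if the whole string is not a number.
    static double parse(String text) {
        NumberFormat nf = NumberFormat.getInstance(Locale.GERMANY);
        ParsePosition pos = new ParsePosition(0);
        Number n = nf.parse(text, pos);
        // NumberFormat stops at the first character it cannot use, so
        // check that the complete input was consumed.
        if (n == null || pos.getIndex() != text.length()) {
            throw new NumberFormatException("Not a comma-decimal number: " + text);
        }
        return n.doubleValue();
    }
}
```

The result can then be passed to cell.setCellValue(double) so that Excel stores it as a real number.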

Java: Simple format standard for various precision data

I'm trying to format output for user/report appeal, and there are two criteria I'm finding to be in a bit of conflict.
First, the decimal values should line up (format on "%12.10f", predicted integer value range 0-99)
Second, the decimal shouldn't trail an excessive series of zeroes.
For example, I have output that looks like
0.5252772000
0.2053628186
10.5234500000
But using a general formatting, I also end up with:
0.53260000000
0.52630000000
12.43540000000
This happens in certain cases, and it looks kind of garbage.
Is there a simple way to solve this problem? The only solution I can come up with at the moment involves pre-interrogating the data before printing (instead of formatting it during print) which, while technically not expensive, just bugs me as being redundant data handling (i.e. I have to go through all the data once to find the extrema of trailing zeroes, and then set the format so that I can go through the data again to print it).
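You can set a DecimalFormat. Each '#' in the pattern is an optional digit, so trailing zeroes are dropped; use as many '#' as the maximum number of decimal places you want to keep (ten here, to match "%12.10f"):
DecimalFormat format = new DecimalFormat("0.##########");
for (float f : yourFloats){
System.out.println(format.format(f));
}
This also works on doubles.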
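DecimalFormat on its own does not line up the decimal points. One possible way to get both, without a second pass over the data, is to trim the zeroes first and then left-pad the integer part to a fixed width. The sketch below assumes non-negative values and the 0-99 integer range from the question; the class name and the integerWidth parameter are made up.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class AlignedDecimals {

    // Formats the value with up to ten decimals (no trailing zeroes) and
    // pads on the left so the decimal point always lands in the same column.
    static String format(double value, int integerWidth) {
        // Locale.ROOT keeps '.' as the decimal separator on every machine.
        DecimalFormat trim = new DecimalFormat(
                "0.##########", DecimalFormatSymbols.getInstance(Locale.ROOT));
        String s = trim.format(value);
        int dot = s.indexOf('.');
        int intDigits = (dot < 0) ? s.length() : dot;
        StringBuilder sb = new StringBuilder();
        for (int i = intDigits; i < integerWidth; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }
}
```

With integerWidth set to 2, a value like 0.5326 gets one leading space and 12.4354 gets none, so the points line up while the right edge stays ragged.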

Android:how to get a value from a RSS feed and subtract it from a value in another RSS feed

Hey, I have these two RSS feeds - http://www.petrolprices.com/feeds/averages.xml?search_type=town&search_value=kilmarnock and http://www.petrolprices.com/feeds/averages.xml?search_type=town&search_value=glasgow. What I want to do is take a value from one RSS feed and subtract it from a value in the other RSS feed. So for example
132.9 - 133.1
How would I go about doing this?
The basic idea is that the user supplies the RSS URLs, and then the onClick takes all the values from each of the RSS feeds and compares them against each other, so that the user gets the difference, i.e. the overall money saved by selecting one or the other.
To my understanding, your question has a simple answer, and a more specifically helpful answer. I'll state the simple answer of how to convert character data to a number (whether an int, double, float, etc.) first for the record, specifically focusing on the exception cases, then delve into the detail that specifically applies to your problem.
Any time you have a String representation of something you believe is a certain type of number, you can call the appropriate valueOf() or parseXYZ() method for the target wrapper class. E.g. if you're looking for an integer and theInt is the String "42", Integer.valueOf(theInt) would return an Integer with the value 42, and Integer.parseInt(theInt) would return the int 42.
http://developer.android.com/reference/java/lang/Double.html
If theInt represented, say, "forty-two" or "42.0", either method would throw NumberFormatException. Parsing a floating-point number follows much the same process, except that "42.0" would parse correctly, and "42.0.0" will throw a NumberFormatException on Android. The whole string passed to one of these methods must be a valid number of the chosen type. With the integer methods, whitespace at the end will also throw the exception; Double.parseDouble in standard Java ignores leading and trailing whitespace, so trim the input yourself if in doubt.
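To make those exception cases concrete, here is a small sketch that wraps the two parse methods so that a bad string yields an empty Optional instead of an exception. The class and method names are invented for illustration.

```java
import java.util.OptionalDouble;
import java.util.OptionalInt;

final class SafeParse {

    // Empty for null, "forty-two", "42.0", "42 " and anything else that
    // Integer.parseInt rejects.
    static OptionalInt tryInt(String s) {
        if (s == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Empty for null, "42.0.0" and anything else Double.parseDouble rejects.
    static OptionalDouble tryDouble(String s) {
        if (s == null) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(s));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }
}
```

Whether an empty result should be treated as zero, skipped, or reported to the user depends on your app.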
If you're using a Scanner, you can use hasNextXYZ() and nextXYZ() to check for and get the next number, where XYZ can be any of the primitive types. The Scanner will operate on the next token, which it will define based on the delimiters you have set.
http://developer.android.com/reference/java/util/Scanner.html
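A rough sketch of the Scanner approach, assuming whitespace-delimited tokens and "." as the decimal separator (hence the explicit Locale.US); the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

final class ScannerNumbers {

    // Collects every token that is a valid double and skips the rest.
    static List<Double> readDoubles(String text) {
        List<Double> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            // Scanner number parsing is locale-sensitive; pin it down.
            sc.useLocale(Locale.US);
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    out.add(sc.nextDouble());
                } else {
                    sc.next(); // not a number, discard the token
                }
            }
        }
        return out;
    }
}
```

Checking hasNextDouble() before nextDouble() is what avoids the InputMismatchException you would otherwise get on a non-numeric token.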
"Great, so when, where, and how should one take the numbers in the XML and pass them to any of the above methods?" You should have a data structure to hold each value, which you populate as the XML is being parsed. Based on the state of things over at your related question, it is my understanding that parsing the XML into tokens has been solved. Therefore, update your parser to call the right String-to-number conversion method for the values of highest, average, and lowest elements. The Strings you need are already correctly trimmed and are passing through the parser at each stage.
Or, to decouple your code further, create an object to hold the data sets you will be comparing, then have the parser simply instantiate it and call setters. FuelData could be that object.
class FuelData {
    String KEY_TYPE;
    double highest;
    double average;
    double lowest;
    // if future support for currency types is needed, it would go here, hooked in to the units attribute in the XML

    FuelData(String type) { // call this every time a type is encountered while parsing the XML
        KEY_TYPE = type;
    }

    void setHighest(String val) { // here, val is the value of the "Highest" element
        try {
            highest = Double.parseDouble(val); // because you're not using a Scanner to parse
        } catch (NumberFormatException e) {
            // handle appropriately
        }
        // perhaps sanity check: disallow negatives, check not less than lowest, etc.
    }
    // and so on for average and lowest

    double computeSavings(FuelData pricesB) { // called by onClick
        // your subtraction goes here. Perhaps you decide it's reasonable to use this method
        // to compute savings for Regular vs. Premium and therefore do not check type,
        // perhaps you decide that's illogical and do check types.
        // Note: good arguments can be made that this method should exist in a different
        // class. I've placed it here for simplicity's sake.
        return average - pricesB.average; // placeholder assumption: difference of the averages
    }
}
Collect the FuelData in a logical way, that can be accessed after parsing has finished, such as feed1 being parsed into a Set of FuelData, feed2 to a set of FuelData, etc, then have onClick take all the FuelData that was parsed out of each of the RSS feeds, do the comparisons via computeSavings, and return the results.
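As a rough illustration of that last step, the sketch below compares two feeds that have already been reduced to a map from fuel type to average price. The class name, the map layout and the fuel type names used when calling it are assumptions made for the example, not something taken from the feeds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FeedComparison {

    // For every fuel type present in both feeds, returns the average in
    // feed A minus the average in feed B. Types missing from either feed
    // are left out of the result.
    static Map<String, Double> averageDifferences(Map<String, Double> feedA,
                                                  Map<String, Double> feedB) {
        Map<String, Double> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : feedA.entrySet()) {
            Double other = feedB.get(e.getKey());
            if (other != null) {
                diff.put(e.getKey(), e.getValue() - other);
            }
        }
        return diff;
    }
}
```

A negative difference means the first town is cheaper for that fuel type. Because the prices are doubles, expect small rounding noise in the results and round them before display.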

Java getBytes() method is stripping .00 from numbers in my String

I'm generating a CSV file containing a bunch of numbers without decimal points, and I'm required to append .00 in those cases. I'm using:
DecimalFormat f = new DecimalFormat("#.00");
So far so good; I can see a string that looks like this:
String myStringWithDecimalPoints = "124.00, 24567868.00, 5.00";
but when I do:
out.write(myStringWithDecimalPoints.getBytes());
I get in my csv:
124, 24567868, 5
Why is this happening?
Any workarounds? (it does have to be CSV and .00 must appear)
You have to be careful with how you view your data, especially in spreadsheets like Excel where the format of the output depends on the type of the cell. It may or may not show decimal values.
A note for the future, with a call like
out.write(myStringWithDecimalPoints.getBytes());
you can safely assume that Java is writing all the bytes to the OutputStream. If you're not seeing the same thing in the receiving side, then the receiving isn't being done like you would expect.
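One way to convince yourself of this is to write the bytes to an in-memory stream and decode them again. This is just a sketch with made-up class and method names, and it assumes UTF-8 rather than the platform default charset that a bare getBytes() uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class CsvBytesCheck {

    // Builds a line like the one in the question from whole numbers.
    static String formatRow(long... values) {
        DecimalFormat f = new DecimalFormat(
                "#.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(f.format(values[i]));
        }
        return sb.toString();
    }

    // Writes the string's bytes to a stream and decodes what was written.
    static String writeAndReadBack(String line) {
        byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bytes, 0, bytes.length);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

If the string that comes back still contains the ".00" suffixes, the bytes in the file are fine and the zeroes are being dropped by whatever displays the CSV; opening the file in a plain text editor instead of a spreadsheet should confirm that.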
Most likely there is some confusion between the value of your original string, and how your string appears when you output it in certain ways using your formatter. This does not necessarily mean that your "original" string has been altered.
We could say with greater certainty if you provided a more complete code example.
