Extracting fields from .csv file input line

I'm very new to the world of Java programming, and although I know this is a ridiculously easy question, I can't seem to phrase my searches in a way that turns up the answer I need...so hopefully someone from this community won't mind helping me.
My program needs to take an input line from a .csv file and split it into the fields of an array, using commas as delimiters. The fields of the array are then assigned to variables of different data types: char, int, float, and String. What I'm struggling with is the syntax for my String variables.
Here is part of my code:
public void parseCSV(String inputLine) {
    String[] splitFields;
    splitFields = inputLine.split(",");
    try {
        empNumber = Integer.parseInt(splitFields[0]);
        payType = splitFields[1].charAt(0);
        hourlyRate = Float.parseFloat(splitFields[2]);
        last name =
I need to assign the value at position 3 of my splitFields array to the variable lastName, which is a String. I just don't know how to write it. Help would be greatly appreciated!

A warning on your overall approach
Go with the other answers if you're doing a homework assignment with a simple csv file, but splitting a String on the comma character (,) will not work for more complicated CSVs. Example:
"Roberts, John", Chicago
This should be read as two cells, where the first cell is the string Roberts, John. Naive splitting on , will instead read it as three cells: "Roberts, then John", and then Chicago.
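To see the problem concretely, here is a minimal sketch that runs the naive split on the example line above. The class and method names (NaiveSplitDemo, naiveSplit) are made up for illustration.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Splits on every comma and ignores quotation marks entirely.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "\"Roberts, John\", Chicago";
        String[] cells = naiveSplit(line);
        // Three pieces come back, although the line has only two cells.
        System.out.println(cells.length);
        System.out.println(Arrays.toString(cells));
    }
}
```

The quoted comma is treated like any other comma, so the name is cut in half and the quotation marks stay attached to the pieces.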
What you should be doing (for robust code)
If you're writing serious, production-level code, you should use the Apache Commons CSV library to parse CSVs. There are enough tricky issues with commas and quotation marks, and enough variation in possible formats, that it makes sense to use a mature library. There's no reason to reinvent the wheel.
Another tool for parsing text
If you're a beginner, this might be opening up a can of worms, but a powerful tool for parsing/validating text input is "regular expressions." Regular expressions can be used to match a string against a pattern and to extract portions of a string. Once you have extracted a String from a specific cell of a csv, you could use a regular expression to validate that the String is in the format you're expecting.
While you're unlikely to really need regular expressions for this project, I thought I'd mention it.
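In case it helps, here is a small sketch of what such validation could look like. The two rules (an employee number is all digits, an hourly rate is digits with an optional decimal part) and the names FieldValidator, isEmpNumber and isHourlyRate are my own assumptions, not something taken from the question.

```java
import java.util.regex.Pattern;

class FieldValidator {
    // Assumed rule: an employee number is one or more digits.
    private static final Pattern EMP_NUMBER = Pattern.compile("\\d+");
    // Assumed rule: an hourly rate is digits with an optional decimal part.
    private static final Pattern HOURLY_RATE = Pattern.compile("\\d+(\\.\\d+)?");

    static boolean isEmpNumber(String field) {
        // matches() requires the whole (trimmed) field to fit the pattern.
        return EMP_NUMBER.matcher(field.trim()).matches();
    }

    static boolean isHourlyRate(String field) {
        return HOURLY_RATE.matcher(field.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmpNumber("1042"));   // true
        System.out.println(isHourlyRate("15.50")); // true
        System.out.println(isHourlyRate("15.5x")); // false
    }
}
```

Checking a field this way before calling Integer.parseInt or Float.parseFloat lets you report a bad line without relying on exceptions.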

String.split(...) returns a String[], so you really can just assign the element at a specific index to a String.
String s = "one two dog!";
String[] sa = s.split(" ");
String ns = sa[1]; // ns now equals "two"
so you can just write:
lastName = splitFields[index]; // works as long as index is within the array bounds
Please note that your last name variable contains a space (that might have been your problem); Java identifiers cannot contain spaces.
I also recommend being careful with the parse calls: Integer.parseInt(...) and Float.parseFloat(...) throw a NumberFormatException if the text is not a valid number.
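Putting those points together, a rough sketch of the whole method might look like this. The Employee class, the boolean return value and the trimming of each field are my assumptions; the field order (employee number, pay type, hourly rate, last name at index 3) follows the question.

```java
class Employee {
    int empNumber;
    char payType;
    float hourlyRate;
    String lastName;

    // Returns true if the line was parsed, false if it was malformed.
    boolean parseCSV(String inputLine) {
        String[] splitFields = inputLine.split(",");
        // Guard against short lines before indexing into the array.
        if (splitFields.length < 4 || splitFields[1].trim().isEmpty()) {
            return false;
        }
        try {
            // Parse into locals first so a failure leaves the object unchanged.
            int number = Integer.parseInt(splitFields[0].trim());
            char type = splitFields[1].trim().charAt(0);
            float rate = Float.parseFloat(splitFields[2].trim());
            empNumber = number;
            payType = type;
            hourlyRate = rate;
            lastName = splitFields[3].trim(); // already a String, no parsing needed
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        System.out.println(e.parseCSV("1042,H,15.5,Roberts")); // true
        System.out.println(e.lastName);                        // Roberts
        System.out.println(e.parseCSV("abc,H,15.5,Roberts"));  // false
    }
}
```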

Easy, it is already a String, so you do not have to perform additional parsing. The following assignment will do the trick:
lastName = splitFields[3];

Related

String.split() returns an array with an additional empty value

I'm working on a piece of code where I have to split a string into individual parts. The basic logic flow of my code is this: the numbers on the LHS below, i.e. 1, 2 and 3, are ids of objects. Once I split them out, I'd use these ids to get the respective values and replace the ids in the String below with those values. The string that I have is as follows -
String str = "(1+2+3)>100";
I've used the following code for splitting the string -
String[] arraySplit = str.split("\\>|\\<|\\=");
String[] finalArray = arraySplit[0].split("\\(|\\)|\\+|\\-|\\*");
Now the arrays that I get are as such -
arraySplit = [(1+2+3), 100];
finalArray = [, 1, 2, 3];
So, after the string is split, I'd replace the string with the values, i.e the string would now be, (20+45+50)>100 where 20, 45 and 50 are the respective values. (this string would then be used in SpEL to evaluate the formula)
I'm almost there, just that I'm getting an empty element at the first position. Is there a way to not get the empty element in the second array, i.e finalArray? Doing some research on this, I'm guessing it is splitting the string (1+2+3) and taking an empty element as a part of the string.
If that is the case, is there any method other than String.split() that would give me the same result?
Edit -
Here, (1+2+3)>100 is just an example. The round braces are part of a formula, and the string could also be as ((1+2+3)*(5-2))>100.
Edit 2 -
After splitting this String and doing some processing on it, I'm going to use this string in SpEL. So if there's a better solution that uses SpEL directly, that would be great too.
Also, currently I'm using this syntax for the formula - (1+2+3) * 4>100 - but if there's a way out by changing the formula syntax a bit, that would also be helpful, e.g. replacing the formula by ({#1}+{#2}+{#3}) * {#4}>100. In this case I'd use {# as the variable marker and get the numbers from it.
I hope this part is clear.
Edit 3 -
Just in case, SpEL is already part of my project, although I don't know much about it, so if there's a better solution using SpEL then it's more than welcome. The basic logic is described at the start of the question.

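If you take a look at the documentation of split(String regex, int limit) (emphasis is mine):
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
So the leading empty element is expected here, because the string starts with the delimiter (. The limit parameter does not help with it: a limit of 0, which is what split(String regex) already uses, only affects trailing empty strings:
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
To drop the leading element, skip index 0 or filter out empty strings after splitting.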
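A minimal sketch of filtering out the empty strings that split() leaves behind, assuming Java 8 or later for the stream calls. The class and method names are made up, and the character class is just a compact way of writing the delimiters from the question.

```java
import java.util.Arrays;

class SplitWithoutEmpties {
    // Splits on ( ) + - * and drops every empty piece.
    static String[] operands(String expression) {
        return Arrays.stream(expression.split("[()+\\-*]"))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // With limit 0 the leading empty string is still there.
        System.out.println("(1+2+3)".split("[()+\\-*]", 0).length);  // 4
        System.out.println(Arrays.toString(operands("(1+2+3)")));    // [1, 2, 3]
        System.out.println(Arrays.toString(operands("((1+2+3)*(5-2))")));
    }
}
```

Filtering also takes care of the empty pieces that appear between adjacent delimiters, such as )*( in the nested formula.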

If you keep things really simple, you may be able to get away with using a combination of regular expressions and string operations like split and replace.
However, it looks to me like you'd be better off writing a simple parser using ANTLR.
Take a look at Parsing an arithmetic expression and building a tree from it in Java and https://theantlrguy.atlassian.net/wiki/display/ANTLR3/Five+minute+introduction+to+ANTLR+3
Edit: I haven't used ANTLR in a while - it's now up to version 4, and there may be some significant differences, so make sure that you check the documentation for that version.
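For the really simple case, the regular-expression route could be sketched like this: find each id on the left-hand side with a Matcher and swap in its value, without splitting at all. I'm assuming that every number before the comparison operator is an id and that the values are available in a Map; IdSubstitution and substitute are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdSubstitution {
    private static final Pattern ID = Pattern.compile("\\d+");
    private static final Pattern COMPARISON = Pattern.compile("[<>=]");

    // Replaces every id left of the first comparison operator with its value.
    static String substitute(String formula, Map<String, String> valuesById) {
        Matcher cmp = COMPARISON.matcher(formula);
        int lhsEnd = cmp.find() ? cmp.start() : formula.length();
        String lhs = formula.substring(0, lhsEnd);
        String rest = formula.substring(lhsEnd); // operator and right-hand side, untouched

        Matcher m = ID.matcher(lhs);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = valuesById.get(m.group());
            if (value == null) {
                throw new IllegalArgumentException("No value for id " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.append(rest).toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("1", "20");
        values.put("2", "45");
        values.put("3", "50");
        System.out.println(substitute("(1+2+3)>100", values)); // (20+45+50)>100
    }
}
```

This keeps the brackets and operators in place, so the result can be handed to an expression evaluator as it is. Once formulas mix ids and literal numbers on the same side, a marker syntax or a real parser becomes the safer choice.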

Using regex in Java to extract string after first comma and before two capital letters and a comma

I am currently working with strings that follow this format:
4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,74.25,69.3,8.254125,151.804125
and I am trying to use regex to extract all the words and numbers as separate strings (as in MI, Wood, 33.0 and so forth), with one exception: I want to treat the part that follows the first comma as a single string, until we get to the all-caps field, so the regex would extract this:
[4] [Matt, Hopkins] [MI] [5.75] [Wood] and so forth.
Note that the name part can have no commas at all, i.e. [Hopkins], or more than one, i.e. [Matt, Jr., Hopkins]. The all-caps field describes a state and so always follows the same format.
I do not understand regex well enough to do that - so far I have only come up with
[a-zA-Z(?:\d*\.)?\d+-]+
which handles all fields alright, except the name.

You can do something like (my Java is a bit rusty and I'm posting this from a phone):
String[] values = data.split(",(?! )");
Java allows splitting a string on a regex, and this simple specimen uses a negative lookahead to ensure that you're only splitting on CSV commas, rather than the ones in names.
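Here is that split applied to the sample line from the question, as a quick sketch (LookaheadSplit and splitFields are names I made up). It relies on the commas inside the name being followed by a space while the field separators are not, which holds for the sample but is an assumption about the rest of the data.

```java
import java.util.Arrays;

class LookaheadSplit {
    // Splits on commas that are NOT followed by a space.
    static String[] splitFields(String data) {
        return data.split(",(?! )");
    }

    public static void main(String[] args) {
        String data = "4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,"
                + "74.25,69.3,8.254125,151.804125";
        String[] values = splitFields(data);
        System.out.println(values.length); // 13
        System.out.println(values[1]);     // Matt, Hopkins
        System.out.println(Arrays.toString(values));
    }
}
```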

Using regex might just make things harder for yourself here.
This looks like CSV data. You can use a CSV library to correctly parse this into individual fields (*):
String[] fields = YourCsvLibrary.parseRow(string); // or string.split(","), maybe.
and then recombine the fields as appropriate. For example, your regex's logic can be expressed via the following code:
String[] output = Arrays.copyOfRange(fields, 1, fields.length);
output[0] = fields[0];
output[1] = fields[1] + "," + fields[2];
(*) String.split(",") might work, provided the field data doesn't contain quotes, commas, newlines, etc.
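Since the question says the name can have one, two or more parts, here is a sketch of the recombining step that looks for the state field instead of assuming exactly two name parts. It assumes the state is the first field after the name that consists of exactly two capital letters, and it uses plain split(",") where a CSV library call would go; NameMerger and mergeName are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NameMerger {
    // Joins everything between the first field and the state field into one name.
    static List<String> mergeName(String line) {
        String[] fields = line.split(",");
        int stateIndex = -1;
        // The name occupies at least index 1, so the state is at index 2 or later.
        for (int i = 2; i < fields.length; i++) {
            if (fields[i].matches("[A-Z]{2}")) {
                stateIndex = i;
                break;
            }
        }
        if (stateIndex < 0) {
            throw new IllegalArgumentException("No state field found: " + line);
        }
        List<String> output = new ArrayList<>();
        output.add(fields[0]);
        output.add(String.join(",", Arrays.asList(fields).subList(1, stateIndex)));
        output.addAll(Arrays.asList(fields).subList(stateIndex, fields.length));
        return output;
    }

    public static void main(String[] args) {
        System.out.println(mergeName("4,Matt, Hopkins,MI,5.75,Wood"));
        System.out.println(mergeName("4,Hopkins,MI,5.75,Wood"));
        System.out.println(mergeName("4,Matt, Jr., Hopkins,MI,5.75,Wood"));
    }
}
```

A name part that itself consists of two capital letters would be mistaken for the state, so this is only as reliable as the data format.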

Effective String Splitting

I have a 'Text' File from which I have to read data row-by-row. File contains around 1330 Rows. I need to read each row (which is a String) and then split it into substrings which will be inserted as data into database.
I'm able to read the data from the file row-by-row.
I'm able to insert data into database as well.
The String that I have to split is approximately 2750 characters long. One option for splitting it is the substring(start, end) method. However, as the line has 2750 characters, the number of resulting substrings would be large, around 200 - 225 (I have a mapping that says which character range corresponds to which field in the XML).
Can someone suggest any other technique of splitting these strings?

I suspect that given your numbers, your initial approach would be well within any standard JVM memory constraints.
As ever, premature optimisation is the root of all evil. I would try a simple split, and look to refine it if you have issues. I suspect at 200 strings over a line of 2700 chars that you won't have problems.
Note that on older JVMs (before Java 7 update 6) the String object implemented a flyweight-style sharing: substring() didn't replicate the characters but merely returned a window onto the original String's data (char array). Newer JVMs copy the relevant characters instead, but at roughly 2750 characters per line the extra memory is negligible either way (for what it's worth).

Since you already have the start/end defined and don't seem to even need to parse the string, the substring call is probably the fastest way. The lookups in substring will be hitting array indexes, addresses in memory, so the lookup is probably O(1)... and then maybe Java will copy out the particular string needed, but that's going to have to happen anyway and will only be O(n) even for all substrings if there's no overlap.
substring doesn't actually change the underlying string, it's just going to copy out the relevant portion you're looking for on each call (if it even does that, it would be theoretically possible for it to return a kind of String that encapsulated the original string). Unless you have identified an actual performance problem, the simplest solution is the best one.
If you had to split on, for example, commas, I'd use a CSVReader library.
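If the mapping is a list of field widths, the substring approach can be driven by a simple loop rather than 200 hand-written calls. This is only a sketch under that assumption; FixedWidthSplitter and the widths array are made up, and short lines are padded with empty fields instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthSplitter {
    // Cuts the line into consecutive fields of the given widths.
    static List<String> split(String line, int[] widths) {
        List<String> fields = new ArrayList<>(widths.length);
        int start = 0;
        for (int width : widths) {
            int end = Math.min(start + width, line.length());
            // A field that starts past the end of the line becomes an empty string.
            fields.add(start < line.length() ? line.substring(start, end) : "");
            start += width;
        }
        return fields;
    }

    public static void main(String[] args) {
        int[] widths = {2, 3, 3}; // hypothetical mapping: three fields
        System.out.println(split("AB123XYZ", widths)); // [AB, 123, XYZ]
    }
}
```

Keeping the widths in one array (or loading them from the mapping) means the mapping can change without touching the splitting code.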

You can use the split() method of the String class to split the string, but for that the string has to contain some delimiter, such as a comma or a dash, and you split the string using that delimiter.
String str = "one-two-three";
String[] temp;
/* delimiter */
String delimiter = "-";
/* given string will be split by the argument delimiter provided. */
temp = str.split(delimiter);
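One thing to watch out for: the argument to split() is a regular expression, so a delimiter such as . or | has a special meaning. A small sketch of the usual workaround, Pattern.quote (the class and method names here are made up):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralDelimiterSplit {
    // Treats the delimiter as literal text rather than as a regex.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String str = "one.two.three";
        // "." matches any character, so every piece is empty and gets discarded.
        System.out.println(str.split(".").length);                  // 0
        System.out.println(Arrays.toString(splitLiteral(str, "."))); // [one, two, three]
    }
}
```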

What's the best way to have stringTokenizer split up a line of text into predefined variables

I'm not sure if the title is very clear, but basically what I have to do is read a line of text from a file and split it up into 8 different string variables. Each line will have the same 8 chunks in the same order (title, author, price, etc). So for each line of text, I want to end up with 8 strings.
The first problem is that the last two fields in the line may or may not be present, so I need to do something with stringTokenizer.hasMoreTokens, otherwise it will die messily when fields 7 and 8 are not present.
I would ideally like to do it in one while or for loop, but I'm not sure how to tell that loop what the order of the fields is going to be so it can fill all 8 (or 6) strings correctly. Please tell me there's a better way than using 8 nested if statements!
EDIT: The String.split solution definitely seems to be part of it, so I will use that instead of StringTokenizer. However, I'm still not sure about the best way of feeding the individual strings into the constructor. Would the best way be to have the class expect an array, and then just do something like this in the constructor:
isbn = line[1];
title = line[2];

The best way is to not use a StringTokenizer at all, but use String's split method. It returns an array of Strings, and you can get the length from that.
For each line in your file you can do the following:
String[] tokens = line.split("#");
tokens will now have 6 - 8 Strings. Use tokens.length (a field, not a method) to find out how many, then create your object from the array.
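As a rough sketch of "create your object from the array": pad the array to 8 entries so the constructor does not need separate branches for 6, 7 and 8 fields. The class name BookRecord, the accessor names and the field order are my guesses; only the # delimiter and the 6 - 8 field count come from above.

```java
import java.util.Arrays;

class BookRecord {
    private static final int REQUIRED = 6;
    private static final int TOTAL = 8;
    private final String[] fields;

    BookRecord(String line) {
        String[] tokens = line.split("#");
        if (tokens.length < REQUIRED || tokens.length > TOTAL) {
            throw new IllegalArgumentException(
                    "Expected " + REQUIRED + " to " + TOTAL + " fields, got " + tokens.length);
        }
        // Pad to 8 entries; optional fields that are absent become null.
        fields = Arrays.copyOf(tokens, TOTAL);
    }

    // Hypothetical field order: isbn first, title second.
    String isbn() { return fields[0]; }
    String title() { return fields[1]; }

    // The two optional fields; null when the line did not contain them.
    String seventh() { return fields[6]; }
    String eighth() { return fields[7]; }

    public static void main(String[] args) {
        BookRecord shortLine = new BookRecord("123#Title#Author#9.99#a#b");
        System.out.println(shortLine.title());   // Title
        System.out.println(shortLine.seventh()); // null
    }
}
```

Note that split() drops trailing empty strings, so a line ending in ## looks the same as a line where the optional fields are missing altogether.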

Regular expressions are the way to go. You can convert your incoming String into an array of Strings using the split method, which takes a regular expression as its delimiter:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)

Would a regular expression with capture groups work for you? You can certainly make parts of the expression optional.
An example line of data or three might be helpful.
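Without sample data this is guesswork, but to show the idea of an optional capture group, here is a tiny sketch with two required fields and one optional field separated by #. The delimiter, the field count and the names are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Two required fields, then an optional third one.
    private static final Pattern LINE =
            Pattern.compile("([^#]*)#([^#]*)(?:#([^#]*))?");

    // Returns the third field, or null if the line has only two fields.
    static String thirdField(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + line);
        }
        return m.group(3); // null when the optional group did not take part in the match
    }

    public static void main(String[] args) {
        System.out.println(thirdField("isbn#title#author")); // author
        System.out.println(thirdField("isbn#title"));        // null
    }
}
```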

Is this a CSV or similar file by any chance? If so, there are libraries to help you, for example Apache Commons CSV (link to alternatives on their page too). It will get you a String[] for each line in the file. Just check the array size to know what optional fields are present.
