Effective String Splitting - java

I have a text file that I need to read row by row. The file contains around 1330 rows. I need to read each row (which is a String) and then split it into substrings, which will be inserted as data into a database.
I'm able to read the data from the file row by row.
I'm able to insert data into the database as well.
Each String that I have to split is approximately 2750 characters long. One option for splitting it is the 'substring(start, end)' method. However, as each line has 2750 characters, the number of resulting substrings would be large, around 200 - 225 (I have a mapping that says which character range corresponds to which string in the XML).
Can someone suggest any other technique for splitting these strings?

I suspect that given your numbers, your initial approach would be well within any standard JVM memory constraints.
As ever, premature optimisation is the root of all evil. I would try a simple split, and look to refine it if you have issues. I suspect at 200 strings over a line of 2700 chars that you won't have problems.
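Note that in older JVMs (before Java 7 update 6) the String object implemented a kind of flyweight pattern. That is, substring() didn't replicate strings but merely reported back a window on the original String's data (char array), so an implementation using substring() used very little extra memory. Newer JVMs copy the characters into a new String instead, but for 200 or so substrings of a 2750-character line the extra memory is still negligible (for what it's worth).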

Since you already have the start/end positions defined and don't seem to need to parse the string at all, the substring call is probably the fastest way. Locating the start and end is just array indexing, so the lookup is O(1). Java may then copy out the characters of the particular substring, but that has to happen anyway, and it is only O(n) in total for all substrings if they don't overlap.
substring doesn't change the underlying string; it just copies out the relevant portion on each call (if it even does that: it would be theoretically possible for it to return a kind of String that wraps the original string). Unless you have identified an actual performance problem, the simplest solution is the best one.
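As a rough sketch of the substring approach, the start/end mapping can live in a table that one loop walks over, so 200+ fields don't mean 200+ hand-written calls. The field names, offsets and sample row below are made up for illustration; the real layout would come from your mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedWidthSplitter {
    /** One entry of the (hypothetical) mapping: a field name plus its character range. */
    static final class Field {
        final String name;
        final int start; // inclusive
        final int end;   // exclusive

        Field(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed layout for illustration only; the real table would have 200+ entries.
    static final Field[] LAYOUT = {
        new Field("id", 0, 4),
        new Field("name", 4, 14),
        new Field("city", 14, 22),
    };

    /** Cuts one row into named, trimmed fields using the layout table. */
    static Map<String, String> split(String row) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Field f : LAYOUT) {
            fields.put(f.name, row.substring(f.start, f.end).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // 4 + 10 + 8 = 22 characters, matching the assumed layout.
        String row = "0042" + "John      " + "Chicago ";
        System.out.println(split(row));
    }
}
```

Keeping the layout in data rather than in code also means a change in the mapping only touches the table.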
If you had to split on, for example, commas, I'd use a CSVReader library.

You can use the split() method of the String class, but for that the string has to contain some delimiter, such as a comma or a dash, on which it can be split.
String str = "one-two-three";
String[] temp;
/* delimiter */
String delimiter = "-";
/* given string will be split by the argument delimiter provided. */
temp = str.split(delimiter);

Related

Rearranging one string to another in Java

I am trying to find out whether part of a given string A can be rearranged into a given string B (Boolean output).
Since the algorithm must be at most O(n), to simplify things I used stringA.retainAll(stringB), so now I know that string A and string B consist of the same set of characters, and the whole task smells like regex.
And... reading about regex, I might now have two problems (c).
The question is: do I risk getting O(infinity) by using regex, or is it more efficient to use the Stream API to find out whether string A has enough copies of each character to cover every character of string B? Not to mention that regex syntax is not easy to read or build.
As of now, I can't use sorting (any sorting is at least n*log(n)) or hash sets and the like (as they eliminate duplicates in both strings).
Thank you.
You can use a HashMap<Character,Integer> to count the number of occurrences of each character of the first String. That would take linear time.
Then, for each Character of the second String, find if it's in the HashMap and decrement the counter (if it's still positive). This will also take linear time, and if you manage to decrement the counters for all the characters of the second String, you succeed.
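A minimal sketch of that counting approach follows. The class and method names are just placeholders, and it assumes that "rearranged" means every character of B must be available in A at least as many times as B uses it.

```java
import java.util.HashMap;
import java.util.Map;

class Rearrange {
    /** Returns true if all characters of b can be taken from a, respecting multiplicity. */
    static boolean canBuild(String a, String b) {
        // First pass: count how often each character occurs in a. O(|a|)
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : a.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Second pass: consume one occurrence per character of b. O(|b|)
        for (char c : b.toCharArray()) {
            Integer remaining = counts.get(c);
            if (remaining == null || remaining == 0) {
                return false; // a has no (more) copies of this character
            }
            counts.put(c, remaining - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canBuild("hello world", "lol")); // true
        System.out.println(canBuild("abc", "aab"));         // false
    }
}
```

Unlike a HashSet, the map keeps the duplicate counts, which is what the question was worried about losing.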

String.split() returns an array with an additional empty value

I'm working on a piece of code where I have to split a string into individual parts. The basic logic flow of my code is this: the numbers on the LHS below, i.e. 1, 2 and 3, are ids of objects. Once I split them out, I use these ids to get the respective values and replace the ids in the String below with those values. The string that I have is as follows -
String str = "(1+2+3)>100";
I've used the following code for splitting the string -
String[] arraySplit = str.split("\\>|\\<|\\=");
String[] finalArray = arraySplit[0].split("\\(|\\)|\\+|\\-|\\*");
Now the arrays that I get are as such -
arraySplit = [(1+2+3), 100];
finalArray = [, 1, 2, 3];
So, after the string is split, I'd replace the ids with the values, i.e. the string would now be (20+45+50)>100, where 20, 45 and 50 are the respective values. (This string would then be used in SpEL to evaluate the formula.)
I'm almost there, except that I'm getting an empty element at the first position. Is there a way to avoid the empty element in the second array, i.e. finalArray? From some research, I'm guessing that it splits the string (1+2+3) at the opening bracket and takes the empty text before it as an element.
If that is the case, is there any method other than String.split() that would give me the result I want?
Edit -
Here, (1+2+3)>100 is just an example. The round braces are part of a formula, and the string could also be as ((1+2+3)*(5-2))>100.
Edit 2 -
After splitting this String and doing some processing on it, I'm going to use the string in SpEL. So if there's a better solution that uses SpEL directly, that would be great too.
Also, I'm currently using this syntax for the formula: (1+2+3) * 4>100. If there's a way out by changing the formula syntax a bit, that would also be helpful, e.g. writing the formula as ({#1}+{#2}+{#3}) * {#4}>100; in that case I'd recognise the variables by the {# marker and get the numbers from there.
I hope this part is clear.
Edit 3 -
Just in case: SpEL is also available in my project, although I don't know much about it, so if there's a better solution using SpEL then it's more than welcome. The basic logic of the question is written at the start of the question in bold.
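If you take a look at the documentation of split(String regex, int limit) (emphasis is mine):
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
Note that specifying 0 as the limit param does not change this. It is what the one-argument split(String regex) already uses, and it only affects trailing empty strings:
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
So the leading empty element has to be skipped or filtered out separately.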
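One way to get rid of empty elements regardless of where they appear is to filter the split result. The sketch below assumes Java 8 streams are available; the class and method names are placeholders, and the character class [()+\-*] is just a shorter spelling of the alternation used in the question.

```java
import java.util.Arrays;

class SplitWithoutEmpties {
    /** Splits on brackets and arithmetic operators, then drops empty tokens. */
    static String[] operands(String expression) {
        return Arrays.stream(expression.split("[()+\\-*]"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(operands("(1+2+3)")));         // [1, 2, 3]
        System.out.println(Arrays.toString(operands("((1+2+3)*(5-2))"))); // [1, 2, 3, 5, 2]
    }
}
```

Filtering also handles the nested form from the first edit, where adjacent brackets and operators produce empty tokens in the middle of the array.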
If you keep things really simple, you may be able to get away with using a combination of regular expressions and string operations like split and replace.
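For the simple case, a sketch along those lines could skip the split entirely and substitute the ids in place with a Matcher. Everything here is illustrative: the class name, the map of values, and the assumption that every run of digits on the left-hand side is an id. It should only be applied to the part before the comparison operator, since a constant such as 100 on the right would otherwise be treated as an id too.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdSubstitution {
    private static final Pattern ID = Pattern.compile("\\d+");

    /** Replaces each run of digits with the value mapped to that id; unknown ids are left as-is. */
    static String substitute(String expression, Map<String, String> valuesById) {
        Matcher m = ID.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = valuesById.getOrDefault(m.group(), m.group());
            // quoteReplacement guards against '$' or '\' in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("1", "20");
        values.put("2", "45");
        values.put("3", "50");
        // Only the left-hand side is substituted; the comparison is appended unchanged.
        System.out.println(substitute("(1+2+3)", values) + ">100"); // (20+45+50)>100
    }
}
```

This keeps the brackets and operators intact, so the result can be handed to SpEL without reassembling the formula.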
However, it looks to me like you'd be better off writing a simple parser using ANTLR.
Take a look at Parsing an arithmetic expression and building a tree from it in Java and https://theantlrguy.atlassian.net/wiki/display/ANTLR3/Five+minute+introduction+to+ANTLR+3
Edit: I haven't used ANTLR in a while - it's now up to version 4, and there may be some significant differences, so make sure that you check the documentation for that version.

Extracting fields from .csv file input line

I'm very new to the world of Java programming, and although I know this is a ridiculously easy question, I can't seem to phrase my searches in a way that turns up the answer I need...so hopefully someone from this community won't mind helping me.
My program needs to take an input line from a .csv file and split it into the fields of an array, using commas as delimiters. The fields of the array are then assigned to variables of different data types: char, int, float, and String. What I'm struggling with is the formatting for my String variables.
Here is part of my code:
public void parseCSV(String inputLine) {
    String[] splitFields;
    splitFields = inputLine.split(",");
    try {
        empNumber = Integer.parseInt(splitFields[0]);
        payType = splitFields[1].charAt(0);
        hourlyRate = Float.parseFloat(splitFields[2]);
        last name =
I need to assign variable lastName, a String data type, to position 3 of my splitFields array. I just don't know how to format it. Help would be greatly appreciated!
A warning on your overall approach
Go with the other answers if you're doing a homework assignment with a simple csv file, but splitting a String on the comma character , will not work for more complicated CSVs. Example:
"Roberts, John", Chicago
This should be read as two cells, where the first cell is Roberts, John. Naive splitting on , will read it as three cells: the fragment "Roberts (with a stray opening quote), the fragment John" (with a stray closing quote), and Chicago.
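The problem is easy to reproduce. This small sketch (class and method names are placeholders) just runs the naive split on the example line and prints what comes back.

```java
class NaiveCsvSplit {
    /** Splits on every comma, ignoring quotes, exactly as the naive approach does. */
    static String[] naiveCells(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "\"Roberts, John\", Chicago";
        String[] cells = naiveCells(line);
        System.out.println(cells.length); // 3, not the intended 2
        for (String cell : cells) {
            System.out.println("[" + cell + "]");
        }
    }
}
```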
What you should be doing (for robust code)
If you're writing serious/production level code, you should use the Apache Commons CSV library to parse CSVs. There are enough tricky issues with commas and quotations, enough variation in possible formats that it makes sense to use a mature library. There's no reason to reinvent the wheel.
Another tool for parsing text
If you're a beginner, this might be opening up a can of worms, but a powerful tool for parsing/validating text input is "regular expressions." Regular expressions can be used to match a string against a pattern and to extract portions of a string. Once you have extracted a String from a specific cell of a csv, you could use a regular expression to validate that the String is in the format you're expecting.
While you're unlikely to really need regular expressions for this project, I thought I'd mention it.
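For the curious, validating a cell with a regular expression could look roughly like this. The patterns are assumptions made up for the example (an employee number of one to six digits, an hourly rate with at most two decimals), not rules taken from the question.

```java
import java.util.regex.Pattern;

class FieldValidation {
    // Assumed formats, for illustration only.
    static final Pattern EMP_NUMBER = Pattern.compile("\\d{1,6}");
    static final Pattern HOURLY_RATE = Pattern.compile("\\d+(\\.\\d{1,2})?");

    /** True if the whole cell (ignoring surrounding spaces) is 1 to 6 digits. */
    static boolean isValidEmpNumber(String cell) {
        return EMP_NUMBER.matcher(cell.trim()).matches();
    }

    /** True if the whole cell is digits, optionally followed by a dot and 1 or 2 digits. */
    static boolean isValidHourlyRate(String cell) {
        return HOURLY_RATE.matcher(cell.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmpNumber("1042"));   // true
        System.out.println(isValidEmpNumber("10a2"));   // false
        System.out.println(isValidHourlyRate("15.50")); // true
    }
}
```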
String.split(...) returns a String[], so you can simply assign the element at a specific index to a String.
String s = "one two dog!";
String[] sa = s.split(" ");
String ns = sa[1]; // ns now equals "two"
so you can just:
last_name = splitFields[index]; // this will work fine as long as index is within the array bounds.
Please mind that your last name variable contains a space (that might have been your problem).
I also recommend being careful with the parse calls: Integer.parseInt(...) and Float.parseFloat(...) will throw a NumberFormatException if you try to parse a non-numeric value.
Easy, it is already a String, so you do not have to perform additional parsing. The following assignment will do the trick:
lastName = splitFields[3];
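Putting the pieces together, a completed version of the method might look like the sketch below. The surrounding Employee class, the boolean return value and the trimming are my additions for the sake of a runnable example; the field names and positions follow the question.

```java
class Employee {
    int empNumber;
    char payType;
    float hourlyRate;
    String lastName;

    /** Fills the fields from one comma-separated line; returns false if the line is malformed. */
    boolean parseCSV(String inputLine) {
        String[] splitFields = inputLine.split(",");
        if (splitFields.length < 4 || splitFields[1].trim().isEmpty()) {
            return false; // too few fields, or nothing to take charAt(0) from
        }
        try {
            // Parse into locals first so a bad line doesn't leave the object half-updated.
            int number = Integer.parseInt(splitFields[0].trim());
            float rate = Float.parseFloat(splitFields[2].trim());
            empNumber = number;
            payType = splitFields[1].trim().charAt(0);
            hourlyRate = rate;
            lastName = splitFields[3].trim(); // already a String, no parsing needed
            return true;
        } catch (NumberFormatException e) {
            return false; // non-numeric employee number or rate
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        System.out.println(e.parseCSV("101,H,15.5,Smith") + " " + e.lastName);
    }
}
```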

Best algorithm possible to search for a letter in a string and put a ' before the letter efficiently?

I need to search thousands of strings in a list for the character '. Whenever I find the character ', I must put another ' before it, like this: ''
For example, imagine that I have 1000 strings in this list: List<String> strings. This is one of my strings:
"I have some Levi's shoes."
The algorithm must transform the string into: "I have some Levi''s shoes."
I must check all of the thousands of strings in my list.
Which is the most efficient way to achieve this?
Thanks
The simplest way is to iterate over the strings in your list and use replace(CharSequence, CharSequence) on each, assigning the results back into the list.
For a single string:
myString = myString.replace("'", "''");
For a List of strings:
for (int i = 0; i < myList.size(); i++) {
    myList.set(i, myList.get(i).replace("'", "''"));
}
As Jon Skeet pointed out, replace is better than replaceAll because you don't have to compile and run a regex for a simple character sequence.
You can use the replaceAll method or replace. Each simply iterates over the String and replaces the matches. Both of these methods compile a regex and use a StringBuffer. You don't need regex in your case, so you can probably get a slight boost from your own implementation; you can try a StringBuilder instead, since it's not synchronized.
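A hand-rolled version with StringBuilder might look like this. It is only a sketch (the class and method names are placeholders), and whether it actually beats replace on your JVM is something to measure rather than assume.

```java
class QuoteEscaper {
    /** Doubles every single quote in s without using regex. */
    static String escape(String s) {
        int first = s.indexOf('\'');
        if (first < 0) {
            return s; // no quote at all: return the original without allocating
        }
        StringBuilder sb = new StringBuilder(s.length() + 8);
        sb.append(s, 0, first); // copy the quote-free prefix in one go
        for (int i = first; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                sb.append('\''); // the extra quote goes in front of the original one
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("I have some Levi's shoes.")); // I have some Levi''s shoes.
    }
}
```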
You are essentially doing a worst case text search -- that being a single character. I think the only way to get a real speedup is to divide the work and use more threads to do it faster. Multi-core CPU or GPU can really speed up your search, and I know there are Java bindings / libraries for both.
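If you do want to spread the work over several cores, a parallel stream is probably the least invasive way to try it. This is a sketch assuming Java 8; the class and method names are placeholders, and for a few thousand short strings the threading overhead may well outweigh any gain, so measure before adopting it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelEscape {
    /** Returns a new list with every ' doubled; the input list is left untouched. */
    static List<String> escapeAll(List<String> strings) {
        return strings.parallelStream()
                .map(s -> s.replace("'", "''"))
                .collect(Collectors.toList()); // encounter order is preserved for a List source
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("I have some Levi's shoes.", "no quotes here");
        System.out.println(escapeAll(strings));
    }
}
```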
Try StringUtils.replace(String text, String searchString, String replacement) (Apache Commons Lang).

What's the best way to have stringTokenizer split up a line of text into predefined variables

I'm not sure if the title is very clear, but basically what I have to do is read a line of text from a file and split it up into 8 different string variables. Each line will have the same 8 chunks in the same order (title, author, price, etc). So for each line of text, I want to end up with 8 strings.
The first problem is that the last two fields in the line may or may not be present, so I need to do something with stringTokenizer.hasMoreTokens, otherwise it will die messily when fields 7 and 8 are not present.
I would ideally like to do it in one while or for loop, but I'm not sure how to tell that loop what the order of the fields is going to be so it can fill all 8 (or 6) strings correctly. Please tell me there's a better way than using 8 nested if statements!
EDIT: The String.split solution definitely seems to be part of it, so I will use that instead of stringTokenizer. However, I'm still not sure what the best way of feeding the individual strings into the constructor is. Would the best way be to have the class expect an array, and then just do something like this in the constructor:
isbn = line[1];
title = line[2];
The best way is to not use a StringTokenizer at all, but use String's split method. It returns an array of Strings, and you can get the length from that.
For each line in your file you can do the following:
String[] tokens = line.split("#");
tokens will now have 6 - 8 Strings. Use tokens.length (a field, not a method) to find out how many, then create your object from the array.
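A sketch of a constructor that takes the array could look like this. The "#" delimiter comes from the line above; the field order and the names field5, field6, optional7 and optional8 are invented, since the question only names isbn, title, author and price.

```java
import java.util.Arrays;

class Book {
    // Field order is an assumption; only isbn, title, author and price are named in the question.
    final String isbn, title, author, price, field5, field6;
    final String optional7, optional8; // null when the line omits them

    Book(String[] tokens) {
        if (tokens.length < 6 || tokens.length > 8) {
            throw new IllegalArgumentException("expected 6 to 8 fields, got " + tokens.length);
        }
        // copyOf pads the missing trailing slots with null, so no nested ifs are needed.
        String[] f = Arrays.copyOf(tokens, 8);
        isbn = f[0];
        title = f[1];
        author = f[2];
        price = f[3];
        field5 = f[4];
        field6 = f[5];
        optional7 = f[6];
        optional8 = f[7];
    }

    /** Convenience factory: splits one line of the file and builds the object. */
    static Book fromLine(String line) {
        return new Book(line.split("#"));
    }

    public static void main(String[] args) {
        Book b = fromLine("111#Some Title#Some Author#9.99#x#y");
        System.out.println(b.title + " / " + b.optional7);
    }
}
```

Note that Java arrays are zero-based, so the first field is index 0, and that split drops trailing empty strings, so a line ending in ## yields six tokens rather than eight.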
A regular expression is the way to go. You can convert your incoming String into an array of Strings using the split method:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)
Would a regular expression with capture groups work for you? You can certainly make parts of the expression optional.
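To illustrate the idea, here is a sketch with a shortened, made-up format: three required fields followed by up to two optional ones, separated by #. The real pattern would need eight groups and the actual delimiter, which the question doesn't state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalFields {
    // Three required fields, then up to two optional ones (assumed format, for illustration).
    static final Pattern LINE =
            Pattern.compile("([^#]*)#([^#]*)#([^#]*)(?:#([^#]*))?(?:#([^#]*))?");

    /** Returns five slots, with null for absent optional fields, or null if the line doesn't fit. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group() returns null for a group that didn't participate
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(parse("a#b#c")));
        System.out.println(java.util.Arrays.toString(parse("a#b#c#d#e")));
    }
}
```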
An example line of data or three might be helpful.
Is this a CSV or similar file by any chance? If so, there are libraries to help you, for example Apache Commons CSV (link to alternatives on their page too). It will get you a String[] for each line in the file. Just check the array size to know what optional fields are present.
