I have a CSV file formatted like this:
<F,Bird,20,10/> < A,Fish,5,11,2/>
I was wondering how to read in those values separately.
Would I have to get the whole line to an array?
I have thought of doing line.split("/>"), but then each piece would still contain the <, which I don't want.
If, on the other hand, I just separate it using line.split(",") and then assign each value accordingly, the values in the middle merge (10/> < A comes out as a single value), so that does not work either.
Is there a way to separate the string first without the <>/ symbols?
You can combine several delimiters in the regular expression passed to split, like this:
String line = "<F,Bird,20,10/> < A,Fish,5,11,2/>";
String[] lines = line.split("<|/> <|/>");
for (String item: lines) {
System.out.println(item);
}
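Output (with all your spaces kept; the input starts with <, so the first element is an empty string and the first printed line is blank):
F,Bird,20,10
 A,Fish,5,11,2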
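To get at the individual values, one option is to follow that split with a second split on commas. A minimal sketch, assuming every record is wrapped in < and /> and that no field contains a comma or an angle bracket (the class and method names are illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TagRecords {
    // Splits a line such as "<F,Bird,20,10/> < A,Fish,5,11,2/>" into one String[] per tag.
    static List<String[]> parse(String line) {
        List<String[]> records = new ArrayList<>();
        // Splitting on "<" or "/>" leaves the tag bodies plus empty or whitespace-only pieces.
        for (String chunk : line.split("<|/>")) {
            String body = chunk.trim();
            if (body.isEmpty()) {
                continue; // skip the leading "" and the blank gap between tags
            }
            // Split the body into fields, dropping any spaces around each comma.
            records.add(body.split("\\s*,\\s*"));
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] fields : parse("<F,Bird,20,10/> < A,Fish,5,11,2/>")) {
            System.out.println(Arrays.toString(fields));
        }
        // prints [F, Bird, 20, 10] and then [A, Fish, 5, 11, 2]
    }
}
```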
Try splitting your input string using the lookbehind (?<=/>):
String input = "<F,Bird,20,10/> < A,Fish,5,11,2/>";
input = input.replaceAll("\\s+", "");
String[] parts = input.split("(?<=/>)");
for (String part : parts) {
System.out.println(part.replaceAll("[<>/]", ""));
}
Note that I removed all whitespace from your string to make splitting cleaner; this also removes any spaces inside the field values. We could still try to split with arbitrary whitespace present, but it would be more work. From this point, you can easily access the CSV data contained within each tag.
Output:
F,Bird,20,10
A,Fish,5,11,2
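If you'd rather keep the whitespace inside values, one way to do the "more work" mentioned above is to match the tags instead of splitting between them. This sketch swaps the lookbehind split for Pattern/Matcher.find; it assumes tag bodies never contain < or >, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Captures the text between "<" and "/>", ignoring whitespace just inside the tag.
    private static final Pattern TAG = Pattern.compile("<\\s*([^<>]*?)\\s*/>");

    static List<String> extract(String input) {
        List<String> bodies = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 is the tag body without < and />
        }
        return bodies;
    }

    public static void main(String[] args) {
        System.out.println(extract("<F,Bird,20,10/> < A,Fish,5,11,2/>"));
        // prints [F,Bird,20,10, A,Fish,5,11,2]
    }
}
```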
I'm new to Java, and to regex in particular.
I have a CSV file that looks something like this:
col1,col2,clo3,col4
word1,date1,date2,port1,port2,... (any number of ports)
word2,date3,date4,
....
What I would like is to iterate over each line (I suppose I'll do it with a simple for loop) and get all the ports back.
I guess what I need is to fetch everything after the two dates, look for ,(\d+),? and collect the group that comes back.
My questions are:
1) Can it be done with one expression? (meaning, without storing the result in a string and then apply another regex)
2) Can I maybe incorporate the iteration over the lines into the regex?
There are many ways to do it; I will show a few for educational purposes.
I put your input in a String just for the example; you will have to read it from the file properly. I also store the results in a List and print them at the end:
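import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String source = "col1,col2,clo3,col4" + System.lineSeparator() +
                "word1,date1,date2,port1,port2,port3" + System.lineSeparator() +
                "word2,date3,date4";
        List<String> ports = new ArrayList<>();
        // insert one of the code blocks below here
        System.out.println(ports);
    }
}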
Using Scanner:
Scanner scanner = new Scanner(source);
scanner.useDelimiter("\\s|,");
while (scanner.hasNext()) {
String token = scanner.next();
if (token.startsWith("port"))
ports.add(token);
}
Using String.split:
String[] values = source.split("\\s|,");
for (String value : values) {
if (value.startsWith("port"))
ports.add(value);
}
Using Pattern-Matcher:
Matcher matcher = Pattern.compile("(port\\d+)").matcher(source);
while (matcher.find()) {
ports.add(matcher.group());
}
Output:
[port1, port2, port3]
If you know where the "ports" are located in the file, you can use that info to slightly increase performance by specifying the location and getting a substring.
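As a sketch of that positional idea, assuming every data line starts with exactly three non-port columns (word, date, date) and that the header line has already been skipped, you can split once and copy the tail of the array with no per-token test at all. The names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class PortsByPosition {
    // Assumption: columns 0-2 are word, date, date; every later column is a port.
    private static final int FIRST_PORT_COLUMN = 3;

    static List<String> ports(String line) {
        String[] fields = line.split(",");
        List<String> result = new ArrayList<>();
        // Copy everything from the first port column to the end of the line.
        for (int i = FIRST_PORT_COLUMN; i < fields.length; i++) {
            result.add(fields[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ports("word1,date1,date2,port1,port2,port3")); // [port1, port2, port3]
        System.out.println(ports("word2,date3,date4,"));                  // []
    }
}
```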
Yes, it can be done in one line:
1) first remove all non-port terms (those containing a non-digit)
2) then split the result of step 1 on commas
Here's the magic line:
String[] ports = line.replaceAll("(^|(?<=,))[^,]*[^,\\d][^,]*(,|$)", "").split(",");
The regex says "any term that has a non-digit" where a "term" is a series of characters between start-of-input/comma and comma/end-of-input.
Conveniently, split() doesn't return trailing empty strings, so there's no need to worry about any trailing comma left over after the replaceAll().
In Java 8 you can also do it in one line, and it is much more straightforward:
List<String> ports = Arrays.stream(line.split(",")).filter(s -> s.matches("\\d+")).collect(Collectors.toList());
This streams the result of a split on commas, filters out the elements that are not all-numeric, then collects the result.
Some test code:
String line = "foo,12-12-12,11111,2222,bar,3333";
String[] ports = line.replaceAll("(^|(?<=,))[^,]*[^,\\d][^,]*(,|$)", "").split(",");
System.out.println(Arrays.toString(ports));
Output:
[11111, 2222, 3333]
Java 8 gives the same output for:
String line = "foo,12-12-12,11111,2222,bar,3333,baz";
List<String> ports = Arrays.stream(line.split(",")).filter(s -> s.matches("\\d+")).collect(Collectors.toList());
System.out.println(ports);
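Neither one-liner iterates over the lines by itself (question 2 above). A minimal Java 8 sketch that wraps the stream version in a method and applies it to every line of a multi-line string; it keeps the answer's assumption that ports are purely numeric, and the sample input in main is made up:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NumericPorts {
    // Keeps only the comma-separated terms made up entirely of digits.
    static List<String> ports(String line) {
        return Arrays.stream(line.split(","))
                .filter(s -> s.matches("\\d+"))
                .collect(Collectors.toList());
    }

    // Applies ports() to each line of a multi-line string; "\\R" matches any line break (Java 8+).
    static List<String> allPorts(String text) {
        return Arrays.stream(text.split("\\R"))
                .flatMap(line -> ports(line).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(ports("foo,12-12-12,11111,2222,bar,3333,baz")); // [11111, 2222, 3333]
        System.out.println(allPorts("word1,date1,date2,80,443\nword2,date3,date4,8080")); // [80, 443, 8080]
    }
}
```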
I am trying to implement a way of taking in arguments for a photo album that I am building, but I am having a hard time figuring out how to tokenize the input.
Two sample inputs:
addPhoto "DSC_017.jpg" "DSC_017" "Fall colors"
addPhoto "DSC_018.jpg" "DSC_018" "Colorado Springs"
I would like this input to return a String array containing 4 elements, where
s[0] = "addPhoto"
s[1] = "DSC_017.jpg"
s[2] = "DSC_017"
s[3] = "Fall colors"
I looked into StringTokenizer and String.split but I'm not sure how to go about setting the delimiters.
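String line = "addPhoto \"DSC_018.jpg\" \"DSC_018\" \"Colorado Springs\"";
// split wherever a space is followed by an opening quote, so "Colorado Springs" stays together
String[] pieces = line.split(" \"");
for (String p : pieces) {
    // drop the remaining closing quote from each piece
    System.out.println(p.replaceAll("\"", ""));
}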
You might want to pull these arguments off the command-line args instead; the shell will do the quote handling for you. However, you'll only be able to do one addPhoto operation at a time.
If you can't do that, you might try one of these answers:
http://www.source-code.biz/snippets/java/5.htm
Parsing quoted text in java
Split a quoted string with a delimiter
Tokenizing a String but ignoring delimiters within quotes
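For reference, a common technique in those threads is to match tokens rather than split between them. The sketch below is my own illustration, not code from the linked answers: a double-quoted run becomes one token (without its quotes), and anything else is split on whitespace. It does not handle escaped quotes inside a quoted argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokenizer {
    // Either a double-quoted run (group 1, quotes excluded) or a run of non-whitespace (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> s = tokenize("addPhoto \"DSC_017.jpg\" \"DSC_017\" \"Fall colors\"");
        System.out.println(s.size()); // 4
        for (String token : s) {
            System.out.println(token);
        }
    }
}
```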
I used StringTokenizer to split a text file formatted like this:
FIRST NAME, MIDDLE NAME, LAST NAME
harry, rob, park
tom,,do
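while (File.hasNext()) // File is assumed to be a Scanner over the text file
{
    StringTokenizer strTokenizer = new StringTokenizer(File.nextLine(), ",");
    while (strTokenizer.hasMoreTokens())
    {
        String token = strTokenizer.nextToken();
        // ... assign token to the first, middle or last name ...
    }
}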
I have used this method to split it.
The problem is that when there is missing data (for example, Tom does not have a middle name), StringTokenizer treats the two consecutive commas as a single delimiter and registers the next value as the middle name.
How do I get it to register it as blank instead?
Based on Davio's answer, you can detect the blank and replace it with your own String:
String[] result = File.nextLine().split(",");
for (int x = 0; x < result.length; x++) {
    if (result[x].isEmpty()) {
        System.out.println("Undefined");
    } else {
        System.out.println(result[x]);
    }
}
Use String.split instead.
"tom,,do".split(",") will yield an array with 3 elements: tom, an empty string, and do.
It seems to me you have two solutions:
1) Double the comma in the file, as suggested by ToYonos, or
2) Count the tokens with the countTokens() method before you assign the values to the variables; if there are only 2 tokens, the person doesn't have a middle name.
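A sketch of the countTokens() idea, assuming each line always has a first and a last name so that the middle name is the only field that can be missing. With more missing fields the count alone can't tell which one is absent, and a line with fewer than 2 tokens would throw NoSuchElementException. The class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class MiddleNameByCount {
    // Returns {first, middle, last}; an absent middle name becomes "".
    static String[] names(String line) {
        StringTokenizer st = new StringTokenizer(line, ",");
        // StringTokenizer skips empty tokens, so a missing middle name leaves only 2 tokens.
        boolean hasMiddle = st.countTokens() == 3;
        String first = st.nextToken().trim();
        String middle = hasMiddle ? st.nextToken().trim() : "";
        String last = st.nextToken().trim();
        return new String[] {first, middle, last};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(names("harry, rob, park"))); // [harry, rob, park]
        System.out.println(Arrays.toString(names("tom,,do")));          // [tom, , do]
    }
}
```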
From the JavaDoc of StringTokenizer:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
So you don't need StringTokenizer.
An example with String.split:
String name = "name, ,last name";
String[] nameParts = name.split(",");
for (String part : nameParts) {
System.out.println("> " + part);
}
This produces the following output:
> name
>
> last name
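One caveat with String.split: it drops trailing empty strings, so a line whose last field is missing comes back one element short. Passing a negative limit keeps every field. A small sketch with made-up sample lines (the class and method names are illustrative):

```java
import java.util.Arrays;

class KeepEmptyFields {
    // Splits a comma-separated line, keeping empty fields (including trailing ones).
    static String[] fields(String line) {
        return line.split(",", -1); // negative limit: trailing empty strings are not removed
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("tom,,do".split(","))); // [tom, , do]
        System.out.println("harry,rob,".split(",").length);        // 2: the empty last name is dropped
        System.out.println(fields("harry,rob,").length);           // 3
    }
}
```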
I am having a difficult time figuring out how to split a string like the one following:
String str = "hi=bye,hello,goodbye,pickle,noodle";
This string was read from a text file, and I need to split it into the elements between the commas, so that each element ends up in its own string no matter what the text file contains. Keep in mind that each element could be any length, and there could be any number of elements that 'hi' is equal to. Any ideas? Thanks!
Use split!
String[] set = str.split(",");
Then access each string as you need it from set[...] (for example, if you want the 3rd string, use set[2]).
As a test, you can print them all out:
for(int i=0; i<set.length;i++){
System.out.println(set[i]);
}
If you need a slightly more advanced approach, I suggest Guava's Splitter class:
import com.google.common.base.Splitter;

Iterable<String> split = Splitter.on(',')
        .omitEmptyStrings()
        .trimResults()
        .split(" bye,hello,goodbye,, , pickle, noodle ");
This trims leading and trailing whitespace from each element and omits the empty ones. The class has more useful features, such as splitting a String into key/value pairs with withKeyValueSeparator.
str = str.substring(str.indexOf('=') + 1); // remove the "hi=" part
String[] set = str.split(",");
I'm wondering: do you mean to split it like this:
"hi=bye"
"hi=hello"
"hi=goodbye"
"hi=pickle"
"hi=noodle"
Because a simple split(",") will not do this. What's the purpose of having "hi=" in your given string?
If you just mean to chop hi= off the front of the string, do this instead:
String input = "hi=bye,hello,goodbye,pickle,noodle";
String[] hi = input.split(",");
hi[0] = (hi[0].split("="))[1];
for (String item : hi) {
System.out.println(item);
}
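If the key matters as well (or you want the hi=bye, hi=hello, ... form asked about above), a minimal sketch that splits only at the first = by passing a limit of 2. It assumes the line always contains an =, and the method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyValues {
    // Splits "hi=bye,hello,..." at the first '=' and returns the comma-separated values.
    static List<String> values(String line) {
        String[] keyAndRest = line.split("=", 2); // {"hi", "bye,hello,goodbye,pickle,noodle"}
        return Arrays.asList(keyAndRest[1].split(","));
    }

    // Rebuilds "key=value" pairs such as "hi=bye", "hi=hello", ...
    static List<String> pairs(String line) {
        String key = line.split("=", 2)[0];
        List<String> result = new ArrayList<>();
        for (String v : values(line)) {
            result.add(key + "=" + v);
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "hi=bye,hello,goodbye,pickle,noodle";
        System.out.println(values(str)); // [bye, hello, goodbye, pickle, noodle]
        System.out.println(pairs(str));  // [hi=bye, hi=hello, hi=goodbye, hi=pickle, hi=noodle]
    }
}
```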
I'm trying to split paragraphs of information from an array into a new array broken into individual words. I know that I need to use the String[] split(String regex) method, but I can't get the output right.
What am I doing wrong?
(assume that sentences[i] is the existing array)
String phrase = sentences[i];
String[] sentencesArray = phrase.split("");
System.out.println(sentencesArray[i]);
Thanks!
It might be just the console output going wrong. Try replacing the last line by
System.out.println(java.util.Arrays.toString(sentencesArray));
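The empty-string argument to phrase.split("") is suspect too. Try splitting on word boundaries instead; note that the spaces and punctuation between words then come back as separate elements:
phrase.split("\\b");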
You are using an empty expression for splitting; try phrase.split(" ") and work from there.
This does nothing useful:
String[] sentencesArray = phrase.split("");
You're splitting on the empty string, which returns an array of the individual characters in the string (in Java 7 and earlier, the array also starts with an empty string).
It's hard to tell from your question/code what you're trying to do, but if you want to split on words you need something like:
private static final Pattern SPC = Pattern.compile("\\s+"); // java.util.regex.Pattern
// ... then, inside your method:
String[] words = SPC.split(phrase);
The regex splits on one or more whitespace characters (spaces, tabs, line breaks), which is probably what you want.
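Putting that \\s+ pattern together with a loop over the existing sentences array, a minimal sketch could look like this. The sample sentences are made up, and punctuation stays attached to the words ("dog." would remain one token):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SentenceWords {
    private static final Pattern SPC = Pattern.compile("\\s+");

    // Collects the words of every sentence into one list.
    static List<String> words(String[] sentences) {
        List<String> result = new ArrayList<>();
        for (String phrase : sentences) {
            // trim() first: leading whitespace would otherwise produce an empty first element
            String trimmed = phrase.trim();
            if (trimmed.isEmpty()) {
                continue; // splitting "" would yield a single empty string
            }
            for (String word : SPC.split(trimmed)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sentences = {"The quick brown fox", "  jumps over\tthe dog "};
        System.out.println(words(sentences)); // [The, quick, brown, fox, jumps, over, the, dog]
    }
}
```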
String[] sentencesArray = phrase.split("");
Here the regex the phrase is split on is empty. If you want to split on a space character, use:
String[] sentencesArray = phrase.split(" ");
// note the single space between the quotes