I am getting a value that is a list of strings in string form, like this: "["a", "b"]". I would like to convert it to a list of strings. I can do this by stripping the leading and trailing brackets and then splitting on the comma. The problem is that I may also receive the value as a single string, "a", and I want to convert that to a list of strings too. Is there a way to generalize this?
One possible solution is to use Regex.
Your expression can look like this: "(.+?)"
. matches any character (except for line terminators)
+? is a lazy quantifier: it matches between one and unlimited times, as few times as possible, expanding as needed.
String tokens = "[\"a\", \"b,c\", \"test\"]";
String pattern = "\"(.+?)\"";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(tokens);
List<String> tokenList = new ArrayList<String>();
while (m.find()) {
    tokenList.add(m.group(1)); // group(1) is the text between the quotes; group() would keep the quotes
}
System.out.println(tokenList);
You can generalize it as follows:
String str = "\"[\"a\",\"b\"]\"";
String[] splitStrs = str.split("\"",7);
System.out.println(splitStrs[0]+" "+splitStrs[1]+" "+splitStrs[2]+" "+splitStrs[3]+" "+splitStrs[4]+" "+splitStrs[5]+" "+splitStrs[6]);
My output
[ a , b ]
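For what it's worth, here is a minimal sketch that covers both shapes from the question: the bracketed list and the single value. The class name QuotedListParser and the method toList are made up for illustration, and the sketch assumes that values never contain escaped quotes and that an unquoted, unbracketed input should be treated as one element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class QuotedListParser {
    // Lazily captures whatever sits between a pair of double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"(.+?)\"");

    static List<String> toList(String value) {
        List<String> result = new ArrayList<>();
        String trimmed = value.trim();
        Matcher m = QUOTED.matcher(trimmed);
        while (m.find()) {
            result.add(m.group(1)); // group(1) excludes the surrounding quotes
        }
        // Assumption: a bare value such as a (no quotes, no brackets) is a single element.
        if (result.isEmpty() && !trimmed.isEmpty() && !trimmed.startsWith("[")) {
            result.add(trimmed);
        }
        return result;
    }
}
```

Calling QuotedListParser.toList on the bracketed form and on the single quoted value both yield a List<String>, so the caller does not need to know which shape it received.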
I have a string below which I want to split in String array with multiple delimiters.
The delimiters are comma (,), semicolon (;), "OR" and "AND".
But I do not want to split on a comma if it's in brackets.
Example input:
device_name==device503,device_type!=GATEWAY;site_name<site3434 OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)
I am able to split the String with regex, but it doesn't handle commas in brackets correctly.
How can I fix this?
Pattern ptn = Pattern.compile("(,|;|OR|AND)");
String[] parts = ptn.split(query);
for(String p:parts){
System.out.println(p);
queryParams.add(p.trim());
}
You could use a negative look-ahead:
String[] parts = input.split(",(?![^()]*\\))|;| OR | AND ");
Or an uglier (but perhaps conceptually simpler) way you could do it would be to replace any commas within brackets with a temporary placeholder, then do the split and replace the placeholders with real commas in the results.
String input = "X,Y=((A,B),C) OR Z";
Pattern pattern = Pattern.compile("\\(.*\\)");
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group().replaceAll(",", "_COMMA_"));
}
matcher.appendTail(sb);
String[] parts = sb.toString().split("(,|;| OR | AND )");
for (String part : parts) {
System.out.println(part.replace("_COMMA_", ","));
}
Prints:
X
Y=((A,B),C)
Z
Alternatively, you could write your own little tokenizer that reads the input character-by-character using charAt(index) or define a grammar for an off-the-shelf parser.
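A hand-written tokenizer of the kind mentioned above could look roughly like the following. This is only a sketch: the class name DepthAwareSplitter is invented, the delimiter list is taken from the question, and it assumes brackets are balanced. It tracks the bracket depth and only recognizes a delimiter at depth zero, so nested brackets are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class DepthAwareSplitter {
    private static final String[] DELIMITERS = {",", ";", " OR ", " AND "};

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open '(' brackets
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
            // Delimiters only count outside of all brackets.
            String delimiter = (depth == 0) ? delimiterAt(input, i) : null;
            if (delimiter != null) {
                parts.add(current.toString().trim());
                current.setLength(0);
                i += delimiter.length();
            } else {
                current.append(c);
                i++;
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    // Returns the delimiter starting at index, or null if there is none.
    private static String delimiterAt(String input, int index) {
        for (String d : DELIMITERS) {
            if (input.startsWith(d, index)) {
                return d;
            }
        }
        return null;
    }
}
```

It is longer than a regex, but the depth counter makes the nesting rule explicit and easy to change.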
You can use negative look-ahead (?!...), which looks at the following characters, and if those characters match the pattern in brackets, the overall match will fail.
String query = "device_name==device503,device_type!=GATEWAY;site_name<site3434 OR country==India AND location==BLR; new_name=in=(Rajesh,Suresh)";
String[] parts = query.split("\\s*(,(?![^()]*\\))|;|OR|AND)\\s*");
for(String part: parts)
System.out.println(part);
Output:
device_name==device503
device_type!=GATEWAY
site_name<site3434
country==India
location==BLR
new_name=in=(Rajesh,Suresh)
So in this case we check whether the characters following the , are 0 or more characters which aren't either ( or ), followed by a ), and if this is true, the , match fails.
This won't work if you can have nested brackets.
Note:
String also has a split method (as used above), which is useful for simplicity's sake (but would be slower than reusing the same Pattern over and over again for multiple Strings).
You can add \\s* (0 or more whitespace characters) to your regex to remove any spaces before or after a delimiter.
If you're using | without anything before or after (e.g. "a|b|c"), you don't need to put it in brackets.
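To illustrate the note about reusing a Pattern, here is a small sketch that compiles the delimiter once and reuses it for every query. The class name QuerySplitter is made up, and the \\b word boundaries around OR and AND are my own addition (an assumption about the input) so that a field name such as ORDER is not split in the middle.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class QuerySplitter {
    // Compiled once; String.split would recompile the regex on every call.
    private static final Pattern DELIMITER = Pattern.compile(
            "\\s*(?:,(?![^()]*\\))|;|\\bOR\\b|\\bAND\\b)\\s*");

    static String[] split(String query) {
        return DELIMITER.split(query);
    }
}
```

Like the look-ahead version above, this still assumes brackets are not nested.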
How can I get the strings inside the brackets? See the code below.
String str = "C1<C2, C3<T1>>.C4<T2>.C5";
I need to get C1<C2, C3<T1>>, C4<T2>, and C5.
Here is what I tried:
Pattern pat = Pattern.compile("(\\w+(<[^>]+>)?)(.\\w+(<[^>]+>)?)*");
Matcher mat = pat.matcher(str);
but the result was
C1<C2, C3<T1>
There are 2 problems that I see with your code:
It seems like you are only printing the first match instead of looping through the results. Use while(mat.find()) to iterate through the list of matches.
Simplify your pattern to \\w+(<[^>]+>+)? to get C1<C2, C3<T1>>, C4<T2>, and C5.
RegEx pattern explained:
\\w+ = 1 or more alphanumeric or underscore characters
()? = 0 or 1 of what is in the parentheses
< = match the < character
[^>]+ = 1 or more characters that are not >
>+ = 1 or more > characters (An alternative would be >{1,2} if you want to enforce only either one or two > characters.)
Your resulting code should look like the following:
public static void main(String[] args)
{
String str = "C1<C2, C3<T1>>.C4<T2>.C5";
Pattern pat = Pattern.compile("\\w+(<[^>]+>+)?");
Matcher mat = pat.matcher(str);
while(mat.find()) {
System.out.println(mat.group());
}
}
If you just want a list of the parts, though, a much simpler way to accomplish this is to use split() instead of a Pattern and Matcher. You can split the string on ., save the pieces in an array, and then iterate through the array as desired.
That would be accomplished with the following:
String[] parts = str.split("\\.");
Just split on dots:
String[] parts = str.split("\\.");
This does what you want using the sample input in the question.
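One caveat: splitting on every dot would break if a type argument could itself contain a dot, and the >+ regex above does not really track nesting either. If that matters, a depth counter is one option. The sketch below is an assumption-laden illustration (the class name TopLevelDotSplitter is invented, and it assumes the angle brackets are balanced): it splits only on dots that are outside all angle brackets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class TopLevelDotSplitter {
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // number of currently open '<' brackets
        int start = 0; // index where the current part begins
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (c == '.' && depth == 0) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(input.substring(start));
        return parts;
    }
}
```

For the sample input in the question this gives the same three parts as str.split("\\.").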
I have this text tokenized as follows:
∅habbaz∅abdelkrim∅habbaz∅abdelkrim∅habbaz∅abdelkrim
I want to get every string between the character ∅. I have tried the following:
ArrayList<String> ta = new ArrayList<>();
String test=t2.getText();
String str = test;
Pattern pattern = Pattern.compile("∅(.*?)∅");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
ta.add(matcher.group(1));
}
t3.setText(ta.toString());
It's supposed to give me:
[habbaz, abdelkrim, habbaz, abdelkrim, habbaz, abdelkrim]
But it's giving me only:
[habbaz, habbaz, habbaz]
If you want to go with the regex solution, try this:
Pattern pattern = Pattern.compile("∅([^∅]*)");
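This pattern matches a ∅ followed by any number of non-∅ characters, which should do the trick. Your original pattern consumed the closing ∅ as part of each match, so the next match could not start at that delimiter and every second token was skipped.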
Use split:
String input = "∅habbaz∅abdelkrim∅habbaz∅abdelkrim∅habbaz∅abdelkrim";
String[] tokens = input.split("∅");
This will produce an array of those strings that are between your delimiter. Note that the first string in the array will be "", the empty string, because your input string starts with the delimiter ∅. To avoid this, take a substring of the input right before you split (if (input.startsWith("∅")) {input = input.substring(1);}), or process the resulting tokens to exclude any empty strings.
To turn the tokens into your ArrayList, use the following:
ArrayList<String> ta = new ArrayList<>(Arrays.asList(tokens));
Or you could just write:
List<String> ta = Arrays.asList(input.split("∅"));
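If you would rather drop the empty strings in one place, something along these lines might do. It is a sketch with invented names (DelimitedTokens, tokens); Pattern.quote is used so the delimiter is always treated as literal text, even if it happens to be a regex metacharacter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class DelimitedTokens {
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        // Pattern.quote makes split treat the delimiter literally.
        for (String token : input.split(Pattern.quote(delimiter))) {
            if (!token.isEmpty()) { // skips the leading "" and any empty tokens in between
                result.add(token);
            }
        }
        return result;
    }
}
```

Unlike Arrays.asList, the returned ArrayList can be modified afterwards.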
I have this code:
String s = "bla mo& lol!";
Pattern p = Pattern.compile("[a-z]");
String[] allWords = p.split(s);
I'm trying to get all the words that match this pattern into an array.
But I get the opposite.
I want my array to be:
allWords = {bla, mo, lol}
but I get:
allWords = { ,& ,!}
Is there a quick solution, or do I have to use the matcher and a while loop to insert them into an array?
Pattern p = Pattern.compile("[a-z]");
p.split(s);
means every [a-z] character is treated as a separator, not as an array element. You may want:
Pattern p = Pattern.compile("[^a-z]+");
You are splitting s AT the letters. split treats the pattern as the delimiter, so change your pattern to:
[^a-z]
The split method is given a delimiter, which is your Pattern.
It is the same mechanism as String.split with the syntax inverted: there you pass the pattern as an argument, and it likewise acts as the delimiter.
Since your delimiter is a character class of letters, the result you got is the expected one.
If you only want to keep words, try this:
String s = "bla mo& lol!";
// split on one or more non-word characters
Pattern p = Pattern.compile("\\W+");
String[] allWords = p.split(s);
System.out.println(Arrays.toString(allWords));
Output
[bla, mo, lol]
One simple way is:
String[] words = s.split("\\W+");
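For comparison, the Matcher loop the question asks about is not much longer. The sketch below (class name WordExtractor is invented) keeps the original idea of describing the words rather than the separators, here as runs of lowercase letters. One practical difference: split keeps a leading empty string when the input starts with a separator, while the Matcher loop never produces empty entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class WordExtractor {
    // A word is assumed to be a run of one or more lowercase ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    static List<String> words(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            result.add(m.group()); // the whole match is the word
        }
        return result;
    }
}
```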
I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.
I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:
String[] tokens = input.split("[^\\[\\d\\]]");
which produced the following:
[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]
Oh dear. So, I thought, "what would replaceAll do in this instance?":
String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");
which produced:
[0][1]
Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadoc says both methods use the Pattern class?
To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters and am I right in thinking the replaceAll works because the regex replaces all characters not matching "[", a number and "]"? What am I missing that means I expect them to produce similar output (OK that's three, and please don't answer "a clue?" to this one!).
Well, from what I can see the split does work: it gives you an array that holds the string split at each match, that is, at every character that is not a bracket or a digit.
As for the replaceAll, I think your assumption is right. It removes everything (replaces each match with "") that is not what you want.
From the API documentation:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
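To see the effect of the limit argument described in the quoted documentation, a tiny sketch may help. The class and method names are made up; the only real API used is the two-argument String.split and Arrays.toString. A limit of 0 drops trailing empty strings, while a negative limit keeps them.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any library.
class SplitLimitDemo {
    // Renders the result of input.split(regex, limit) as text so empty strings are visible.
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }
}
```

For example, show("boo:and:foo", "o", 0) gives [b, , :and:f], whereas show("boo:and:foo", "o", -1) gives [b, , :and:f, , ] with the two trailing empty strings kept.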
This is not a direct answer to your question, however I want to show you a great API that will suit your need.
Check out Splitter from Google Guava.
So for your example, you would use it like this:
Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);
//Now you get back an Iterable which you can iterate over. Much better than an Array.
for(String s : tokens) {
System.out.println(s);
}
This prints:
[0]
[1]
split splits on boundaries defined by the regex you provide, so it's no great surprise you're getting lots of entries — nearly all of the characters in the string match your regex and so, by definition, are boundaries on which a split should occur.
replaceAll replaces matches for your regex with the replacement you give it, which in your case is a blank string.
If you're trying to grab the 0 and the 1, it's a trivial loop:
String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile("\\[(\\d+)\\]");
Matcher m = pat.matcher(text);
List<String> results = new ArrayList<String>();
while (m.find()) {
results.add(m.group(1)); // Or just .group() if you want the [] as well
}
String[] tokens = results.toArray(new String[0]);
Or if it's always exactly two of them:
String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile(".*\\[(\\d+)\\].*\\[(\\d+)\\].*");
Matcher m = pat.matcher(text);
String[] tokens = new String[2];
if (m.find()) { // group() throws IllegalStateException if nothing matched
    tokens[0] = m.group(1);
    tokens[1] = m.group(2);
}
The problem is that split is the wrong operation here.
In Ruby, I'd tell you to use string.scan(/\[\d+\]/), which would give you the array ["[0]","[1]"].
Java doesn't have a single-method equivalent, but we can write a scan method as follows:
public List<String> scan(String string, String regex){
List<String> list = new ArrayList<String>();
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while(matcher.find()) {
list.add(matcher.group());
}
return list;
}
We can then call it as scan(string, "\\[\\d+\\]").
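As a side note, on Java 9 or later the same idea can be written without the explicit loop by using Matcher.results(), which returns a stream of MatchResult objects. The sketch below assumes Java 9+ and uses an invented class name, RegexScan; it is meant as an alternative, not a replacement for the loop above.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library. Requires Java 9+ for Matcher.results().
class RegexScan {
    static List<String> scan(String input, String regex) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()               // one MatchResult per match, in order
                .map(MatchResult::group) // the full matched text
                .collect(Collectors.toList());
    }
}
```

It is called the same way as the loop version, for example RegexScan.scan(text, "\\[\\d+\\]").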
The equivalent Scala code is:
"""\[\d+\]""".r findAllIn string