How to split string using regex in java [duplicate]

This question already has an answer here:
Split regex to extract Strings of contiguous characters
(1 answer)
Closed 7 years ago.
I have some string patterns. Each pattern consists of the two characters "A" and "B".
My patterns are like "AA" or "ABA" or "AABBABA" or ...
I want to split these patterns and the output for these examples must be like: {"AA"} or {"A","B","A"} or {"AA","BB","A","B","A"}
What I tried so far :
String pattern = "AABBABA"; //or whatever
String firstChar = pattern.substring(0, 1);
String[] split = pattern.split(firstChar);
for (String string : split) {
Log.i("findPattern", "Splitted Pattern: " + string + "");
}
The problem with my code is that it removes all strings that are equal to firstChar.
What regular expression should I use to split my patterns to separated strings?

The idea is that (.)\\1+ matches a run of two or more identical characters, and the alternative . matches any remaining single character. The alternation tries the run first, so collecting every match into a list gives the desired pieces.
String s = "AABBABA";
ArrayList<String> fields = new ArrayList<String>();
Pattern regex = Pattern.compile("(.)\\1+|.");
Matcher m = regex.matcher(s);
while(m.find()){
fields.add(m.group(0));
}
System.out.println(fields);
Output:
[AA, BB, A, B, A]
The same approach, with all of the example inputs placed in an array:
String s[] = {"AA", "ABA", "AABBABA"};
Pattern regex = Pattern.compile("(.)\\1+|.");
for(String i:s)
{
ArrayList<String> fields = new ArrayList<String>();
Matcher m = regex.matcher(i);
while(m.find()){
fields.add(m.group(0));
}
System.out.println(fields);
}
Output:
[AA]
[A, B, A]
[AA, BB, A, B, A]
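
Since the question asked specifically for a split regex, here is a minimal sketch of a lookaround-based alternative. It assumes you want to cut at every position where the previous character differs from the next one; the class and method names (RunSplitDemo, splitRuns) are made up for illustration.

```java
import java.util.Arrays;

class RunSplitDemo {
    // Splits at every position where the character before differs from the character after.
    static String[] splitRuns(String s) {
        // (?<=(.)) captures the character just before the current position;
        // (?!\1) requires that the next character is not that same character.
        // Both parts are zero-width, so no characters are consumed by the split.
        return s.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"AA", "ABA", "AABBABA"}) {
            System.out.println(Arrays.toString(splitRuns(s)));
        }
    }
}
```

The pattern also matches at the very end of the string, but split drops trailing empty strings by default, so no extra element shows up.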

I tried to follow your logic and came up with this code, which may not be exactly what you want:
String pattern = "AABBABA"; //or whatever
String firstChar = pattern.substring(0, 1);
String[] split = pattern.split("(?!" + firstChar + ")");
for (String strng : split)
{
System.out.println(strng); // System.console() can be null when no console is attached
}
Output:
AA
B
BA
BA
Or try the Matcher:
// String to be scanned to find the pattern.
String line = "AABBABA";
String pattern1 = "(A+|B+)";
Pattern r = Pattern.compile(pattern1);
Matcher m = r.matcher(line);
int count = 0;
while(m.find())
{
count++;
System.out.println(m.group(0)); // System.console() can be null when no console is attached
}
Output:
AA
BB
A
B
A

Instead of splitting, you can capture what you want using this pattern:
(A+|B+)

Related

extra space after parsing a string with regular expression

I have the following simple code:
String d = "_|,|\\.";
String s1 = "b,_a_.";
Pattern p = Pattern.compile(d);
String[] ss = p.split(s1);
for (String str : ss){
System.out.println(str.trim());
}
The output is (note the empty line between b and a):
b

a
Where does the extra space come from between b and a?

There is no extra space. You get an empty element in the resulting array because your regex matches only one character at a time, so when several delimiter characters appear in a row, the string is split at each of them and an empty string is left between them.
To avoid this, match a whole run of delimiters at once with the + (one or more) quantifier, either by wrapping the alternation in a non-capturing group, (?:_|,|\\.)+, or, better, by using a character class, [_,.]+:
String d = "(?:_|,|\\.)+"; // Or better: String d = "[_,.]+";
String s1 = "b,_a_.";
Pattern p = Pattern.compile(d);
String[] ss = p.split(s1);
for (String str : ss){
System.out.println(str.trim());
}

I was puzzled by this myself, but maybe what you want is to change your regex to
String d = "[_,\\.]+";
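
To make the empty element visible, here is a small sketch that prints the arrays with Arrays.toString instead of one element per line. The class and helper names (DelimiterRunDemo, splitOn) are made up for illustration.

```java
import java.util.Arrays;

class DelimiterRunDemo {
    // Thin wrapper so the two regexes can be compared side by side.
    static String[] splitOn(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String s1 = "b,_a_.";
        // One delimiter per match: an empty string appears between "," and "_".
        System.out.println(Arrays.toString(splitOn(s1, "_|,|\\."))); // [b, , a]
        // A run of delimiters per match: no empty element in the middle.
        System.out.println(Arrays.toString(splitOn(s1, "[_,.]+")));  // [b, a]
    }
}
```

The empty strings after the final "_." do not show up in either case, because split removes trailing empty strings by default.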

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?

There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG

Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match it and extract.
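
A minimal sketch of how that regex could be applied in Java, assuming the wanted part is always everything before the first lowercase ASCII letter. The class and method names (LeadingNonLowercaseDemo, prefixBeforeLowercase) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNonLowercaseDemo {
    // ^ anchors at the start; [^a-z]+ takes everything up to the first lowercase letter.
    private static final Pattern PREFIX = Pattern.compile("^[^a-z]+");

    // Returns the leading part of s that contains no lowercase letter, or "" if there is none.
    static String prefixBeforeLowercase(String s) {
        Matcher m = PREFIX.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] input = {"MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn"};
        for (String s : input) {
            System.out.println(prefixBeforeLowercase(s));
        }
    }
}
```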

Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);

You can do it like this:
String arr[] = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
for (String test : arr) {
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}

Split string without losing split character

I want to split a string in Java, but the normal split function drops the characters it splits on. For example:
String str = "123{456]789[012*";
I want to split the string on the characters {, [, ] and *, but I don't want to lose them. I want results like this:
part 1 = 123{
part 2 = 456]
part 3 = 789[
part 4 = 012*
Normally split function splits like this:
part 1 = 123
part 2 = 456
part 3 = 789
part 4 = 012
Is it possible?

You can use zero-width lookahead/behind expressions to define a regular expression that matches the zero-length string between one of your target characters and anything that is not one of your target characters:
(?<=[{\[\]*])(?=[^{\[\]*])
Pass this expression to String.split:
String[] parts = "123{456]789[012*".split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
If you have a block of consecutive delimiter characters this will split once at the end of the whole block, i.e. the string "123{456][789[012*" would split into four blocks "123{", "456][", "789[", "012*". If you used just the first part (the look-behind)
(?<=[{\[\]*])
then you would get five parts "123{", "456]", "[", "789[", "012*"
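
Here is a small sketch that runs both variants on the two strings mentioned above, so the difference for consecutive delimiters is easy to see. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class KeepDelimiterDemo {
    // Look-behind plus look-ahead: split only where a delimiter is followed by a non-delimiter.
    static String[] splitAfterDelimiterBlock(String s) {
        return s.split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
    }

    // Look-behind only: split after every single delimiter.
    static String[] splitAfterEachDelimiter(String s) {
        return s.split("(?<=[{\\[\\]*])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfterDelimiterBlock("123{456]789[012*")));
        System.out.println(Arrays.toString(splitAfterDelimiterBlock("123{456][789[012*")));
        System.out.println(Arrays.toString(splitAfterEachDelimiter("123{456][789[012*")));
    }
}
```

Both patterns are zero-width, so every character of the input ends up in exactly one of the parts.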

Using a positive lookbehind:
(?<={|\[|\]|\*)
String str = "123{456]789[012*";
String parts[] = str.split("(?<=\\{|\\[|\\]|\\*)");
System.out.println(Arrays.toString(parts));
Output:
[123{, 456], 789[, 012*]

I think you're looking for something like
String str = "123{456]789[012*";
String[] parts = new String[] {
str.substring(0,4), str.substring(4,8), str.substring(8,12),
str.substring(12)
};
System.out.println(Arrays.toString(parts));
Output is
[123{, 456], 789[, 012*]

You can use a Pattern and a Matcher to find each splitting character and the index just after it.
public static List<String> split(String string, String splitRegex) {
List<String> result = new ArrayList<String>();
Pattern p = Pattern.compile(splitRegex);
Matcher m = p.matcher(string);
int index = 0;
while (index < string.length()) {
if (m.find()) {
// Keep everything up to and including the matched delimiter.
int splitIndex = m.end();
result.add(string.substring(index, splitIndex));
index = splitIndex;
} else {
// No more delimiters: add the remainder and stop, otherwise the loop never ends.
result.add(string.substring(index));
break;
}
}
return result;
}
Example code:
public static void main(String[] args) {
System.out.println(split("123{456]789[012*","\\{|\\]|\\[|\\*"));
}
Output:
[123{, 456], 789[, 012*]

Get string inside parenthesis Java

I want to get the values inside []. For example, given this string:
String str ="[D][C][B][A]Hello world!";
I want an array that contains the items D, C, B and A. How should I do this?
Thank you in advance!

Try a regex if there is only one character inside each [].
Matcher#group(1) returns the text captured by the first parenthesized group ().
The escape character \ is needed before [ and ] because they are regex metacharacters.
Sample code:
String str = "[D][C][B][A]Hello world!";
List<Character> list = new ArrayList<Character>();
Pattern p = Pattern.compile("\\[(.)\\]");
Matcher m = p.matcher(str);
while (m.find()) {
list.add(m.group(1).charAt(0));
}
Character[] array = list.toArray(new Character[list.size()]);
System.out.println(Arrays.toString(array));
Try this one if there can be more than one character inside []:
String str = "[DD][C][B][A]Hello world!";
List<String> list = new ArrayList<String>();
Pattern p = Pattern.compile("\\[(\\w*)\\]");
Matcher m = p.matcher(str);
while (m.find()) {
list.add(m.group(1));
}
String[] array = list.toArray(new String[list.size()]);
System.out.println(Arrays.toString(array));
Pattern description
\w A word character: [a-zA-Z_0-9]
. Any character (may or may not match line terminators)
X* X, zero or more times
Read more in the Javadoc for java.util.regex.Pattern.
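
Note that \w* only matches letters, digits and underscores, so a bracket containing a space or punctuation would be skipped. If that matters, one possible variant is to capture anything except a closing bracket. This is only a sketch under that assumption, and the class and method names (BracketContentsDemo, bracketContents) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketContentsDemo {
    // \[ and \] match literal brackets; ([^\]]*) captures everything up to the next ']'.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]*)\\]");

    // Returns the contents of every [...] pair in s, in order of appearance.
    static List<String> bracketContents(String s) {
        List<String> result = new ArrayList<String>();
        Matcher m = BRACKETED.matcher(s);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bracketContents("[DD][C][B][A]Hello world!"));
        System.out.println(bracketContents("[a b][c-d]Hello world!"));
    }
}
```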

Here is a simple approach that uses only split:
String str ="[D][C][B][A]Hello world!";
String[] s = str.split("\\[");
StringBuilder b = new StringBuilder();
for (String o: s) {
if (o.length() > 0){
String[] s2 = o.split("\\]");
b.append(s2[0]);
}
}
char[] c = b.toString().toCharArray();
System.out.println(new String(c));

Working with a regular expression

I have a string of alphanumeric terms like the one below, and I want to extract the runs of letters into an array. I've written the following code.
String pro = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] p = pro.split("^([0-9].*)$");
Pattern pattern = Pattern.compile("([0-9].*)([A-z].*)");
Matcher matcher = pattern.matcher(pro.toString());
while (matcher.find())
{
System.out.println(matcher.group());
}
for(String s: p)
{
System.out.println(s);
}
System.out.println("End");
Output:
1a1a2aa3aaa4aaaa15aaaaa6aaaaaa
End
I also tried split with a regular expression, but that doesn't work either. I think my regular expression is wrong. I'm expecting an array containing all the runs of letters:
array[] = {'a', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa'}

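You could use the following, which splits on anything except alphabetic characters. Because the input starts with a digit, the first element of the result is an empty string, so skip it when printing.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] parts = s.split("[^a-zA-Z]+");
for (String m : parts) {
if (!m.isEmpty()) {
System.out.println(m);
}
}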
Using the Matcher method, you could do the following.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
Pattern p = Pattern.compile("[a-zA-Z]+");
Matcher m = p.matcher(s);
List<String> matches = new ArrayList<String>();
while (m.find()) {
matches.add(m.group());
}
System.out.println(matches); // => [a, a, aa, aaa, aaaa, aaaaa, aaaaaa]

If you want only the alphabetic characters, wouldn't it make more sense to use this expression instead: /([a-zA-Z]+)/g
You probably don't want ^ and $ in your expression, because they anchor it to the whole string; what you want is to find every match, which is what /g does.
Here is an online demo:
http://regex101.com/r/fI1eB8
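
Java has no /g flag; calling find() repeatedly plays that role. As a sketch, assuming Java 9 or later is available, Matcher.results() streams every match in one expression. The class and method names (AllMatchesDemo, letterRuns) are made up for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AllMatchesDemo {
    private static final Pattern LETTERS = Pattern.compile("[a-zA-Z]+");

    // Collects every run of ASCII letters in s, like a /g match in other regex flavors.
    // Matcher.results() requires Java 9 or later.
    static List<String> letterRuns(String s) {
        return LETTERS.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(letterRuns("1a1a2aa3aaa4aaaa15aaaaa6aaaaaa"));
    }
}
```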
