Regex: how to match two adjacent but different characters - Java

So I have a String, String s = "4433334552223";, that I would like to split into an array on every character change (between every pair of adjacent, different characters): String[] aRay = s.split("IDK"); I want the String array to contain {44, 3333, 4, 55, 222, 3} after the split().
I know how to do it with a loop, but I was wondering if there is a simple way to do this with a regex.

You can use a backreference to match repeated characters:
String s = "4433334552223";
Matcher m = Pattern.compile("(.)\\1*").matcher(s);
while (m.find()) {
System.out.println(m.group());
}

You can use the following code:
String input ="4433334552223";
final String PATTERN = "(.)(\\1*)";
Matcher m = Pattern.compile(PATTERN).matcher(input);
ArrayList<String> result = new ArrayList<String>();
while(m.find())
{
result.add(m.group(1)+m.group(2));
}
System.out.println(result.toString());
This produces the following output:
[44, 3333, 4, 55, 222, 3]
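If you really want a single split() call, as the question asks, one commonly used alternative is a zero-width pattern: a lookbehind captures the previous character and a negative lookahead checks that the next character is different, so the string is split at every character change. A minimal sketch; the class and method names (RunSplitter, splitRuns) are hypothetical:

```java
import java.util.Arrays;

public class RunSplitter {

    // Splits at every position where the previous character (captured
    // inside the lookbehind as group 1) is NOT followed by the same character.
    // The match is zero-width, so no characters are consumed.
    public static String[] splitRuns(String s) {
        return s.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        String s = "4433334552223";
        System.out.println(Arrays.toString(splitRuns(s))); // [44, 3333, 4, 55, 222, 3]
    }
}
```

Because the match is zero-width, no characters are lost; the empty string after the final split point (at the end of the input) is dropped by split()'s default removal of trailing empty strings.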


How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurrences?

CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
boolean isMatch = matcher.find();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < content.length(); i++) {
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code, content is the input string.
I am trying to find repeated occurrences of a character in the string and replace each run with the maximum allowed number of occurrences.
For example:
input - ("abaaadccc", 2)
output - "abaadcc"
Here aaa and ccc are replaced by aa and cc, as the maximum allowed repetition is 2.
In the above code, I found such occurrences and tried replacing them with -, and that works. But can someone help me with how to get the current character and replace the run with the allowed number of occurrences?
i.e. if aaa is found, it is replaced by aa.
Or is there an alternative method without using regex?
You can nest a second capturing group inside the first one and use the first group as the replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( first group - the letter repeated two times
([a-zA-Z]) second group - a single letter
\2 the same letter once more
)
\2+ the same letter at least once more (not part of the first group, so it is dropped)
Thus, the first group captures a replacement string.
It isn't hard to generalize this solution to a different maximum number of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa
Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
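As for the last question, a non-regex alternative is a single pass that tracks the length of the current run and only copies a character while the run is within the limit. A minimal sketch (RunLimiter and limitRuns are hypothetical names); unlike the regex versions above, it is case-sensitive and applies to every character, not only letters:

```java
public class RunLimiter {

    // Copies the input, skipping a character once the current run of
    // identical characters has already reached maxRepeats.
    public static String limitRuns(String input, int maxRepeats) {
        StringBuilder out = new StringBuilder(input.length());
        int runLength = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Extend the run if the character repeats, otherwise start a new run
            runLength = (i > 0 && c == input.charAt(i - 1)) ? runLength + 1 : 1;
            if (runLength <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRuns("abaaadccc", 2));       // abaadcc
        System.out.println(limitRuns("aaaaabbcccccaaa", 4)); // aaaabbccccaaa
    }
}
```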

Java Regex Matcher skipping the matches

Below is my Java code to delete all pairs of adjacent letters that match, but I am having problems with the Java Matcher class.
My Approach
I am trying to find all runs of successive repeated characters in the input, e.g.
aaa, bb, ccc, ddd
Next, replace each odd-length match with a single occurrence of the matched character, and each even-length match with "", i.e.
aaa -> a
bb -> ""
ccc -> c
ddd -> d
s has a single occurrence, so it is not matched by the regex pattern and is excluded from the substitution.
I am calling Matcher.appendReplacement to do conditional replacement of the patterns matched in input, based on the group length (even or odd).
Code:
public static void main(String[] args) {
String s = "aaabbcccddds";
int i=0;
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("([a-z])\\1+");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(i).length()%2==0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
i++;
}
m.appendTail(output);
System.out.println(output);
}
Input : aaabbcccddds
Actual Output : aaabbcccds (only replacing ddd with d but skipping aaa, bb and ccc)
Expected Output : acds
This can be done in a single replaceAll call like this:
String repl = str.replaceAll( "(?:(.)\\1)+", "" );
The regex (?:(.)\\1)+ matches one or more consecutive pairs of identical characters (for example aa, or bbcc) and replaces them with an empty string, which leaves a single character from every odd-length run.
Code using Pattern and Matcher:
final Pattern p = Pattern.compile( "(?:(.)\\1)+" );
Matcher m = p.matcher( "aaabbcccddds" );
String repl = m.replaceAll( "" );
//=> acds
You can try like that:
public static void main(String[] args) {
String s = "aaabbcccddds";
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(\\w)(\\1+)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(2).length()%2!=0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
}
m.appendTail(output);
System.out.println(output);
}
It is similar to yours, but your code uses the counter i as the group index, so each match checks a different group (group 0, then group 1, and so on), and group 1 on its own is only ever the single first character. That's why I introduce a second group that holds the adjacent repeated characters. Its length is one less than the run length, so I reverse the odd/even logic, and voila,
acds
is printed.
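For comparison, the smallest change to the original loop is to drop the counter and test the length of the whole match, m.group(), rather than a numbered group. A sketch under that assumption (PairRemover and removePairs are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairRemover {

    public static String removePairs(String s) {
        StringBuffer output = new StringBuffer();
        Matcher m = Pattern.compile("([a-z])\\1+").matcher(s);
        while (m.find()) {
            // m.group() is the whole run, e.g. "aaa" or "bb"
            if (m.group().length() % 2 == 0) {
                m.appendReplacement(output, "");   // even-length run: drop it
            } else {
                m.appendReplacement(output, "$1"); // odd-length run: keep one character
            }
        }
        m.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(removePairs("aaabbcccddds")); // acds
    }
}
```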
You don't need multiple if statements. Try:
(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)
Replace with $1
Java code:
str.replaceAll("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)", "$1");
Regex breakdown:
(?: Start of non-capturing group
(\\w) Capture a word character
(?:\\1\\1)+ Match an even number of the same character (an odd-length run in total)
| Or
(\\w) Capture a word character
\\2+ Match one or more of the same character
) End of non-capturing group
(?!\\1|\\2) Not followed by the captured character, so the whole run is consumed
Using Pattern and Matcher with StringBuffer:
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) m.appendReplacement(output, "$1");
m.appendTail(output);
System.out.println(output);
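On Java 9 or later, the conditional replacement can also be written without appendReplacement, using the Matcher.replaceAll overload that takes a Function<MatchResult, String>. A minimal sketch assuming Java 9+; PairRemoverJava9 and removePairs are hypothetical names. The returned text is still interpreted as a replacement string, so it is passed through Matcher.quoteReplacement:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairRemoverJava9 {

    // A letter followed by one or more copies of itself
    private static final Pattern RUNS = Pattern.compile("([a-z])\\1+");

    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>)
    public static String removePairs(String s) {
        return RUNS.matcher(s).replaceAll(mr ->
                mr.group().length() % 2 == 0
                        ? ""                                     // even-length run: drop it
                        : Matcher.quoteReplacement(mr.group(1))); // odd-length run: keep one
    }

    public static void main(String[] args) {
        System.out.println(removePairs("aaabbcccddds")); // acds
    }
}
```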

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?
There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match it and extract.
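In Java, that regex can be applied with find() and the matched prefix read from group(). A minimal sketch (PrefixExtractor and upperPrefix are hypothetical names); it returns an empty string when the input starts with a lowercase letter:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixExtractor {

    // Everything from the start of the string up to (not including) the first lowercase letter
    private static final Pattern PREFIX = Pattern.compile("^[^a-z]+");

    public static String upperPrefix(String s) {
        Matcher m = PREFIX.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] input = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
        for (String s : input) {
            System.out.println(upperPrefix(s));
        }
    }
}
```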
Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
You can do it like this:
String[] arr = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
// Compile the pattern once, outside the loop
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
for (String test : arr) {
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}

Java string split() Regex

I have a string like this,
["[number][name]statement_1.","[number][name]statement_2."]
I want to get only statement_1 and statement_2. I tried it this way:
String[] statement = message.trim().split("\\s*,\\s*");
but it gives ["[number][name]statement_1." and "[number][name]statement_2."]. How can I get only statement_1 and statement_2?
Match All instead of Splitting
Splitting and Match All are two sides of the same coin. In this case, Match All is easier.
You can use this regex:
(?<=\])[^\[\]"]+(?=\.)
In Java code:
Pattern regex = Pattern.compile("(?<=\\])[^\\[\\]\"]+(?=\\.)");
Matcher regexMatcher = regex.matcher(yourString);
while (regexMatcher.find()) {
// the match: regexMatcher.group()
}
In answer to your question about getting both matches separately:
Pattern regex = Pattern.compile("(?<=\\])[^\\[\\]\"]+(?=\\.)");
Matcher regexMatcher = regex.matcher(yourString);
if (regexMatcher.find()) {
String theFirstMatch = regexMatcher.group();
}
if (regexMatcher.find()) {
String theSecondMatch = regexMatcher.group();
}
Explanation
The lookbehind (?<=\]) asserts that what precedes the current position is a ]
[^\[\]"]+ matches one or more chars that are not [, ] or "
The lookahead (?=\.) asserts that the next character is a dot
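If you want every match rather than a fixed number of if (find()) checks, collect them in a loop and index the resulting list. A minimal sketch using the same pattern; StatementExtractor and statements are hypothetical names, and the input is assumed to be the string from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatementExtractor {

    // Text that follows a ']' and precedes a '.', containing no [, ] or "
    private static final Pattern STATEMENT = Pattern.compile("(?<=\\])[^\\[\\]\"]+(?=\\.)");

    public static List<String> statements(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = STATEMENT.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String message = "[\"[number][name]statement_1.\",\"[number][name]statement_2.\"]";
        List<String> found = statements(message);
        System.out.println(found);        // [statement_1, statement_2]
        System.out.println(found.get(1)); // statement_2
    }
}
```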
Reference
Match All and Split are Two Sides of the Same Coin
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
I somehow don't think that is your actual string, but you may try the following.
String s = "[\"[number][name]statement_1.\",\"[number][name]statement_2.\"]";
String[] parts = s.replaceAll("\\[.*?\\]", "").split("\\W+");
System.out.println(parts[0]); //=> "statement_1"
System.out.println(parts[1]); //=> "statement_2"
Is the string going to be, for example, [50][James]Loves cake?
Scanner scan = new Scanner(System.in);
System.out.println ("Enter string");
String s = scan.nextLine();
int last = s.lastIndexOf("]")+1;
String x = s.substring(last, s.length());
System.out.println (x);
Enter string
[21][joe]loves cake
loves cake
Process completed.
Use a regex instead.
With Java 7
// group 2: the text after the last ']', without the trailing '.'
final Pattern pattern = Pattern.compile("(^.*\\])(.+?)\\.?");
final String[] strings = { "[number][name]statement_1.", "[number][name]statement_2." };
final List<String> results = new ArrayList<String>();
for (final String string : strings) {
final Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
results.add(matcher.group(2));
}
}
System.out.println(results);
With Java 8
// group 2: the text after the last ']', without the trailing '.'
final Pattern pattern = Pattern.compile("(^.*\\])(.+?)\\.?");
final String[] strings = { "[number][name]statement_1.", "[number][name]statement_2." };
final List<String> results = Arrays.stream(strings)
.map(pattern::matcher)
.filter(Matcher::matches)
.map(matcher -> matcher.group(2))
.collect(Collectors.toList());
System.out.println(results);

Working with a regular expression

I have a string with alphanumeric terms like the one below. I want to extract the letters into an array. I've written the following code:
String pro = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] p = pro.split("^([0-9].*)$");
Pattern pattern = Pattern.compile("([0-9].*)([A-z].*)");
Matcher matcher = pattern.matcher(pro.toString());
while (matcher.find())
{
System.out.println(matcher.group());
}
for(String s: p)
{
System.out.println(s);
}
System.out.println("End");
Output:
1a1a2aa3aaa4aaaa15aaaaa6aaaaaa
End
I even tried to use split based on a regular expression, but that doesn't work either. I think my regular expression is wrong. I'm expecting output with all the letters in an array:
array[] = {'a', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa'}
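You could use the following, which splits on anything except alphabetic characters. Because the input starts with a digit, split() would return an empty first element, so strip the leading non-letters first:
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
// Remove leading non-letters; otherwise split() returns "" as the first element
String[] parts = s.replaceFirst("^[^a-zA-Z]+", "").split("[^a-zA-Z]+");
for (String m: parts) {
System.out.println(m);
}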
Using the Matcher method, you could do the following.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
Pattern p = Pattern.compile("[a-zA-Z]+");
Matcher m = p.matcher(s);
List<String> matches = new ArrayList<String>();
while (m.find()) {
matches.add(m.group());
}
System.out.println(matches); // => [a, a, aa, aaa, aaaa, aaaaa, aaaaaa]
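On Java 9 or later, Matcher.results() returns a stream of matches, which gives the String[] the question asks for directly. A minimal sketch assuming Java 9+; LetterRuns and letterRuns are hypothetical names:

```java
import java.util.Arrays;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class LetterRuns {

    // Requires Java 9+ for Matcher.results()
    public static String[] letterRuns(String s) {
        return Pattern.compile("[a-zA-Z]+")
                .matcher(s)
                .results()               // Stream<MatchResult>, one per find()
                .map(MatchResult::group) // the matched text
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
        System.out.println(Arrays.toString(letterRuns(s))); // [a, a, aa, aaa, aaaa, aaaaa, aaaaaa]
    }
}
```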
If you only want the alphabetic characters, wouldn't it make more sense to use this expression instead: /([a-zA-Z]+)/g
Using ^ and $ is not what you want in your expression, because you want to match every occurrence instead (the /g flag; in Java, that means calling find() in a loop).
Here is an online demo:
http://regex101.com/r/fI1eB8
