Working with a regular expression - java

I have a string of alphanumeric terms like the one below. I want to extract the runs of letters into an array. I've written the following code.
String pro = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] p = pro.split("^([0-9].*)$");
Pattern pattern = Pattern.compile("([0-9].*)([A-z].*)");
Matcher matcher = pattern.matcher(pro.toString());
while (matcher.find())
{
System.out.println(matcher.group());
}
for(String s: p)
{
System.out.println(s);
}
System.out.println("End");
Output:
1a1a2aa3aaa4aaaa15aaaaa6aaaaaa
End
I even tried to use split with a regular expression, but that does not work either. I think my regular expression is wrong. I'm expecting all the runs of letters in an array.
array[] = {'a', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa'}

You could use the following, which splits on anything except alphabetic characters.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] parts = s.split("[^a-zA-Z]+");
for (String m: parts) {
System.out.println(m);
}
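One caveat with the split approach: because the input starts with a digit, the first element of the resulting array is an empty string. A minimal sketch of filtering it out follows; the names LetterSplitSketch and splitLetters are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

final class LetterSplitSketch {
    // Splits on runs of non-letters and drops the empty leading element
    // that split() produces when the input starts with a delimiter.
    static List<String> splitLetters(String input) {
        List<String> parts = new ArrayList<>();
        for (String part : input.split("[^a-zA-Z]+")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitLetters("1a1a2aa3aaa4aaaa15aaaaa6aaaaaa"));
    }
}
```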
Using the Matcher method, you could do the following.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
Pattern p = Pattern.compile("[a-zA-Z]+");
Matcher m = p.matcher(s);
List<String> matches = new ArrayList<String>();
while (m.find()) {
matches.add(m.group());
}
System.out.println(matches); // => [a, a, aa, aaa, aaaa, aaaaa, aaaaaa]

If you only want alphabetic characters, wouldn't it make more sense to use this expression instead: /([a-zA-Z]+)/g
Anchoring with ^ and $ is probably not what you want, because it forces a single match against the whole string. What you want instead is every match, which is what /g does (in Java, that corresponds to calling find() in a loop).
Here is an online demo:
http://regex101.com/r/fI1eB8
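To see the difference in Java, here is a small sketch that collects every match of a pattern with a find() loop and runs it with three patterns: the unanchored one, an anchored variant, and the pattern from the question. The helper name findAll is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GlobalMatchSketch {
    // Collects every match of the regex, similar to what the /g flag
    // does in other regex flavours.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
        // Unanchored: one match per run of letters.
        System.out.println(findAll("[a-zA-Z]+", input));
        // Anchored: the whole string would have to be letters, so no match here.
        System.out.println(findAll("^[a-zA-Z]+$", input));
        // The question's pattern: the greedy .* swallows the rest of the string,
        // so there is a single match covering the whole input.
        System.out.println(findAll("([0-9].*)([A-z].*)", input));
    }
}
```

The last call is why the original code printed the full string once: after the leading digit, the greedy .* consumes everything and only backtracks far enough to leave one character for the second group.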

Related

Java Regex Matcher skipping the matches

Below is my Java code to delete all pairs of adjacent matching letters, but I am running into problems with the Java Matcher class.
My Approach
I am trying to find all successive repeated characters in the input e.g.
aaa, bb, ccc, ddd
Next replace the odd length match with the last matched pattern and even length match with "" i.e.
aaa -> a
bb -> ""
ccc -> c
ddd -> d
s occurs only once, so it is not matched by the regex pattern and is excluded from the substitution.
I am calling Matcher.appendReplacement to do conditional replacement of the patterns matched in input, based on the group length (even or odd).
Code:
public static void main(String[] args) {
String s = "aaabbcccddds";
int i=0;
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("([a-z])\\1+");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(i).length()%2==0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
i++;
}
m.appendTail(output);
System.out.println(output);
}
Input : aaabbcccddds
Actual Output : aaabbcccds (only replacing ddd with d but skipping aaa, bb and ccc)
Expected Output : acds
This can be done in a single replaceAll call like this:
String repl = str.replaceAll( "(?:(.)\\1)+", "" );
The regex (?:(.)\\1)+ matches every run of adjacent identical pairs. Replacing those with the empty string leaves a single character from each run of odd length.
Code using Pattern and Matcher:
final Pattern p = Pattern.compile( "(?:(.)\\1)+" );
Matcher m = p.matcher( "aaabbcccddds" );
String repl = m.replaceAll( "" );
//=> acds
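A self-contained sketch of the same replaceAll call, wrapped in a hypothetical helper named removePairs so it can be tried on a few inputs. Note that it makes a single pass: pairs that only become adjacent after a removal are not removed again.

```java
final class PairRemovalSketch {
    // Removes every run of adjacent identical pairs in one pass.
    static String removePairs(String s) {
        return s.replaceAll("(?:(.)\\1)+", "");
    }

    public static void main(String[] args) {
        System.out.println(removePairs("aaabbcccddds"));
        // "bb" is removed, but the two remaining a's are not re-examined.
        System.out.println(removePairs("abba"));
    }
}
```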
You can try it like this:
public static void main(String[] args) {
String s = "aaabbcccddds";
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(\\w)(\\1+)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(2).length()%2!=0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
}
m.appendTail(output);
System.out.println(output);
}
It is similar to yours, but the first group only ever captures the first character, so its length is always 1. That's why I introduce a second group, which holds the remaining adjacent characters. Since its length is the length of the run minus 1, I reverse the odd/even logic, and
acds
is printed.
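A variation on the same loop, sketched under the assumption that the original single-group pattern is kept: test the length of the whole match with m.group() (group 0) instead of using an incrementing group index. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RunCollapseSketch {
    // Replaces each run of repeated letters with "" (even length)
    // or with its first character (odd length).
    static String collapse(String s) {
        Matcher m = Pattern.compile("([a-z])\\1+").matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group() is the whole run, e.g. "aaa"; group(1) is just "a".
            boolean evenRun = m.group().length() % 2 == 0;
            m.appendReplacement(out, evenRun ? "" : "$1");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaabbcccddds"));
    }
}
```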
You don't need multiple if statements. Try:
(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)
Replace with $1
Java code:
str.replaceAll("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)", "$1");
Regex breakdown:
(?: Start of non-capturing group
(\\w) Capture a word character
(?:\\1\\1)+ Match one or more further pairs of the same character (odd run length in total)
| Or
(\\w) Capture a word character
\\2+ Match one or more repetitions of the same character
) End of non-capturing group
(?!\\1|\\2) Not followed by previous captured characters
Using Pattern and Matcher with StringBuffer:
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) m.appendReplacement(output, "$1");
m.appendTail(output);
System.out.println(output);
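A short sketch for trying the single-regex version on runs of different lengths. The wrapper name collapseRuns is hypothetical. The replacement relies on the fact that a group which did not take part in the match contributes nothing when referenced as $1.

```java
final class SingleRegexSketch {
    private static final String REGEX =
            "(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)";

    // Odd runs match the first alternative, so $1 holds their character.
    // Even runs fall through to the second alternative, where group 1
    // is unset and the replacement is effectively empty.
    static String collapseRuns(String s) {
        return s.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("aaabbcccddds"));
        System.out.println(collapseRuns("ddddd"));
    }
}
```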

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?
There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match and extract it.
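In Java this could look roughly like the sketch below. The helper name prefixBeforeLowercase is made up for the example, and returning an empty string when nothing matches is an assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UppercasePrefixSketch {
    private static final Pattern PREFIX = Pattern.compile("^[^a-z]+");

    // Returns everything before the first lowercase letter,
    // or "" if the input starts with a lowercase letter.
    static String prefixBeforeLowercase(String s) {
        Matcher m = PREFIX.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeLowercase("MOC.OOHAY#ABC.PQRqwertySDdd"));
        System.out.println(prefixBeforeLowercase("MOC.OOHAY#AB.JKLasDDbfn"));
        System.out.println(prefixBeforeLowercase("MOC.OOHAY#XZ.JKGposDDbfn"));
    }
}
```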
Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
You can do it like this:
String[] arr = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
// Compile the pattern once, outside the loop.
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
for (String test : arr) {
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}

How to split string using regex in java [duplicate]

I have some string patterns. Each pattern consists of two characters, "A" and "B".
My patterns are like "AA" or "ABA" or "AABBABA" or ...
I want to split these patterns and the output for these examples must be like: {"AA"} or {"A","B","A"} or {"AA","BB","A","B","A"}
What I tried so far :
String pattern = "AABBABA"; //or whatever
String firstChar = pattern.toString().substring(1, 2);
String[] split = pattern.split(firstChar);
for (String string : split) {
Log.i("findPattern", "Splitted Pattern: " + string + "");
}
The problem with my code is that it removes all strings that are equal to firstChar.
What regular expression should I use to split my patterns to separated strings?
The idea is that (.)\\1+ first matches any run of two or more repeated characters, and the |. alternative matches each remaining single character. Every match is then added to a list, which is printed at the end.
String s = "AABBABA";
ArrayList<String> fields = new ArrayList<String>();
Pattern regex = Pattern.compile("(.)\\1+|.");
Matcher m = regex.matcher(s);
while(m.find()){
fields.add(m.group(0));
}
System.out.println(fields);
Output:
[AA, BB, A, B, A]
With all of the inputs above defined in an array:
String s[] = {"AA", "ABA", "AABBABA"};
Pattern regex = Pattern.compile("(.)\\1+|.");
for(String i:s)
{
ArrayList<String> fields = new ArrayList<String>();
Matcher m = regex.matcher(i);
while(m.find()){
fields.add(m.group(0));
}
System.out.println(fields);
}
Output:
[AA]
[A, B, A]
[AA, BB, A, B, A]
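Since the question asks specifically for split, here is a sketch of a different technique that is not taken from the answer above: splitting on the zero-width position where the previous character differs from the next one, using a lookbehind that captures the previous character and a negative lookahead on that capture. The helper name splitRuns is an assumption for this example.

```java
import java.util.Arrays;

final class RunSplitSketch {
    // (?<=(.)) captures the character just before the current position;
    // (?!\\1) requires that the next character is not the same one.
    static String[] splitRuns(String s) {
        return s.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("AA")));
        System.out.println(Arrays.toString(splitRuns("ABA")));
        System.out.println(Arrays.toString(splitRuns("AABBABA")));
    }
}
```

The pattern also matches at the very end of the string, but split() discards trailing empty strings, so no extra element appears.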
I tried to follow your logic and got this code, though it is perhaps not what you want:
String pattern = "AABBABA"; //or whatever
String firstChar = pattern.toString().substring(1, 2);
String[] split = pattern.split("(?!" + firstChar + ")");
for (String strng : split)
{
System.out.println(strng); // System.console() can return null when no console is attached
}
Output:
AA
B
BA
BA
Or try the Matcher:
// String to be scanned to find the pattern.
String line = "AABBABA";
String pattern1 = "(A+|B+)";
Pattern r = Pattern.compile(pattern1);
Matcher m = r.matcher(line);
int count = 0;
while(m.find())
{
count++;
System.out.println(m.group(0)); // System.console() can return null when no console is attached
}
Output:
AA
BB
A
B
A
Instead of splitting, you may capture what you want using this pattern:
(A+|B+)

Get string inside parenthesis Java

I want to get the values inside []. For example, given this string:
String str ="[D][C][B][A]Hello world!";
I want an array which contains the items DCBA. How should I do this?
Thank you in advance!
Try a regex if there is only one character inside [].
Here Matcher#group(1) returns whatever was matched by the first capturing group, that is, the part of the pattern inside parentheses ().
The escape character \ is needed before [ and ] because both are regex metacharacters.
Sample code:
String str = "[D][C][B][A]Hello world!";
List<Character> list = new ArrayList<Character>();
Pattern p = Pattern.compile("\\[(.)\\]");
Matcher m = p.matcher(str);
while (m.find()) {
list.add(m.group(1).charAt(0));
}
Character[] array = list.toArray(new Character[list.size()]);
System.out.println(Arrays.toString(array));
Try this one if there is more than one character inside []:
String str = "[DD][C][B][A]Hello world!";
List<String> list = new ArrayList<String>();
Pattern p = Pattern.compile("\\[(\\w*)\\]");
Matcher m = p.matcher(str);
while (m.find()) {
list.add(m.group(1));
}
String[] array = list.toArray(new String[list.size()]);
System.out.println(Arrays.toString(array));
Pattern description
\w A word character: [a-zA-Z_0-9]
. Any character (may or may not match line terminators)
X* X, zero or more times
Read more in the Java regex Pattern documentation.
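If the bracketed content may contain characters that \w does not cover (spaces, hyphens and so on), one possible variation is a negated character class. This is a sketch of that idea rather than part of the answer above; the names BracketContentSketch and extract are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketContentSketch {
    // [^\]]* matches any run of characters that are not a closing bracket.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]*)\\]");

    static List<String> extract(String s) {
        List<String> items = new ArrayList<>();
        Matcher m = BRACKETED.matcher(s);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(extract("[D][C][B][A]Hello world!"));
        System.out.println(extract("[a b][x-y]rest"));
    }
}
```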
This code is simple:
String str ="[D][C][B][A]Hello world!";
String[] s = str.split("\\[");
StringBuilder b = new StringBuilder();
for (String o: s) {
if (o.length() > 0){
String[] s2 = o.split("\\]");
b.append(s2[0]);
}
}
char[] c = b.toString().toCharArray();
System.out.println(new String(c));

How to extract uppercase substrings from a String in Java?

I need a piece of code with which I can extract the substrings that are in uppercase from a string in Java.
For example:
"a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]"
I need to extract CC, BBBBBBB and AAAA.
You can do it with String[] split(String regex). The only problem can be with empty strings, but it's easy to filter them out:
String str = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
String[] substrings = str.split("[^A-Z]+");
for (String s : substrings)
{
if (!s.isEmpty())
{
System.out.println(s);
}
}
Output:
AAAA
BBBBBBB
CC
This should demonstrate the proper syntax and method. More details can be found here http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html and http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html
String myStr = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
Pattern upperCase = Pattern.compile("[A-Z]+");
Matcher matcher = upperCase.matcher(myStr);
List<String> results = new ArrayList<String>();
while (matcher.find()) {
results.add(matcher.group());
}
for (String s : results) {
System.out.println(s);
}
The [A-Z]+ part is the regular expression which does most of the work. There are a lot of strong regular expression tutorials if you want to look more into it.
If you just want to extract every run of uppercase letters, use [A-Z]+. If you want only whole uppercase words, so that HELLO is matched but the H in Hello is not, use \b[A-Z]+\b
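The difference between the two patterns is easiest to see on mixed-case input. The sketch below uses a made-up helper, findAll, and a made-up sample string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordBoundarySketch {
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "HELLO Hello WORLD";
        // Without boundaries the H of "Hello" is matched as well.
        System.out.println(findAll("[A-Z]+", sample));
        // With \b on both sides only all-uppercase words are matched.
        System.out.println(findAll("\\b[A-Z]+\\b", sample));
    }
}
```

On the input from the question both patterns give the same result, because every uppercase run there is delimited by non-word characters.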
I think you should use replaceAll with a regular expression to turn the characters you don't want into a delimiter, perhaps something like this:
str.replaceAll("[^A-Z]+", " ")
Trim any leading or trailing spaces from the result (replaceAll returns a new string; it does not modify str).
Then, if you wish, you can call split(" ") on the trimmed string.
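Put together, the three steps might look like the sketch below. The class and method names are invented for the example. One thing to watch for: if the input contains no uppercase letters at all, the result is an array holding a single empty string rather than an empty array.

```java
import java.util.Arrays;

final class ReplaceThenSplitSketch {
    static String[] uppercaseRuns(String str) {
        // 1. Turn every run of unwanted characters into a single space.
        String spaced = str.replaceAll("[^A-Z]+", " ");
        // 2. Drop the spaces left at either end.
        String trimmed = spaced.trim();
        // 3. Split on the remaining single spaces.
        return trimmed.split(" ");
    }

    public static void main(String[] args) {
        String input = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
        System.out.println(Arrays.toString(uppercaseRuns(input)));
    }
}
```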
This is probably what you're looking for:
import java.util.List;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MatcherDemo {
private static final String REGEX = "[A-Z]+";
private static final String INPUT = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object
Matcher m = p.matcher(INPUT);
List<String> sequences = new Vector<String>();
// collect each run of uppercase letters
while(m.find()) {
sequences.add(INPUT.substring(m.start(), m.end()));
}
System.out.println(sequences); // [AAAA, BBBBBBB, CC]
}
}
