How to extract uppercase substrings from a String in Java? - java

I need a piece of code with which I can extract the substrings that are in uppercase from a string in Java.
For example:
"a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]"
I need to extract CC BBBBBBB and AAAA

You can do it with String[] split(String regex). The only problem can be with empty strings, but it's easy to filter them out:
String str = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
String[] substrings = str.split("[^A-Z]+");
for (String s : substrings)
{
if (!s.isEmpty())
{
System.out.println(s);
}
}
Output:
AAAA
BBBBBBB
CC

This should demonstrate the proper syntax and method. More details can be found here http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html and http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html
String myStr = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
Pattern upperCase = Pattern.compile("[A-Z]+");
Matcher matcher = upperCase.matcher(myStr);
List<String> results = new ArrayList<String>();
while (matcher.find()) {
results.add(matcher.group());
}
for (String s : results) {
System.out.println(s);
}
The [A-Z]+ part is the regular expression which does most of the work. There are a lot of strong regular expression tutorials if you want to look more into it.

To extract every run of uppercase letters, use [A-Z]+. To extract only whole uppercase words, skipping uppercase letters that sit inside mixed-case words (HELLO is OK but Hello is not), use \b[A-Z]+\b
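The difference between the two patterns only shows up on mixed-case input. Here is a minimal sketch of it; the helper name findAll and the sample string are made up for illustration and are not part of the answers in this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UppercaseRuns {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "HELLO Hello WORLD"; // made-up input for illustration
        System.out.println(findAll("[A-Z]+", sample));       // [HELLO, H, WORLD]
        System.out.println(findAll("\\b[A-Z]+\\b", sample)); // [HELLO, WORLD]
    }
}
```

On the string from the question both patterns should give the same three results, because every uppercase run there is surrounded by punctuation.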

I think you should use a replaceAll regular expression to turn the characters you don't want into a delimiter, perhaps something like this:
str = str.replaceAll("[^A-Z]+", " ").trim();
Strings are immutable, so assign the result back; trim() removes any leading or trailing spaces.
Then, if you wish, you can call str.split(" ")

This is probably what you're looking for:
import java.util.List;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MatcherDemo {
private static final String REGEX = "[A-Z]+";
private static final String INPUT = "a:[AAAA|0.1;BBBBBBB|-1.90824;CC|0.0]";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
// get a matcher object
Matcher m = p.matcher(INPUT);
List<String> sequences = new Vector<String>();
while (m.find()) {
// the matched text; equivalent to m.group()
sequences.add(INPUT.substring(m.start(), m.end()));
}
System.out.println(sequences); // [AAAA, BBBBBBB, CC]
}
}

Related

Regex to convert set of small letters to capital letters in a String

Could someone please tell me how do I write a Regular expression which replaces all the "aeiou" chars found in my string to capital letters like "AEIOU" and vice versa?
I wanted to use replaceAll method of java String class but not sure about the regEx.
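This could be the solution.
String.replaceAll cannot change case on its own, and the Matcher.replaceAll overload that takes a function requires Java 9, so the code below walks the matches instead.
See also: Use Java and RegEx to convert casing in a string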
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";
public static void main(String[] args) {
char[] chars = EXAMPLE_TEST.toCharArray(); // transform your string into a char array
Pattern pattern = Pattern.compile("[aeiou]"); // compile your pattern
Matcher matcher = pattern.matcher(EXAMPLE_TEST); // create a matcher
while (matcher.find()) {
int index = matcher.start(); //index where match exist
chars[index] = Character.toUpperCase(chars[index]); // change char array where match
}
String s = new String(chars); // obtain your new string :-)
//ThIs Is my smAll ExAmplE strIng whIch I'm gOIng tO UsE fOr pAttErn mAtchIng.
System.out.println(s);
}
}
You can use the Pattern and Matcher classes. I wrote some quick code that should be clear (subtracting 32 from an ASCII lowercase letter gives its uppercase form; see the ASCII table).
String s = "Anthony";
Pattern pattern = Pattern.compile("[aeiou]");
Matcher matcher = pattern.matcher(s);
// work on a mutable copy so every match is kept, not just the last one
StringBuilder modified = new StringBuilder(s);
while (matcher.find())
{
modified.setCharAt(matcher.start(), (char) (s.charAt(matcher.start()) - 32));
}
String modifiedString = modified.toString();
System.out.println(modifiedString);
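The question also asks for the reverse direction ("and vice versa"), which the answers above do not cover. One possible sketch, assuming the goal is to flip the case of every vowel in a single pass, uses Matcher.appendReplacement, which works on older Java versions as well. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VowelCase {
    private static final Pattern VOWEL = Pattern.compile("[aeiouAEIOU]");

    // Hypothetical helper: flips the case of every vowel, leaves other characters alone.
    static String swapVowelCase(String input) {
        Matcher m = VOWEL.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = m.group().charAt(0);
            char swapped = Character.isUpperCase(c)
                    ? Character.toLowerCase(c)
                    : Character.toUpperCase(c);
            // A vowel never contains '$' or '\', so it is safe as a literal replacement.
            m.appendReplacement(sb, String.valueOf(swapped));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapVowelCase("Anthony")); // anthOny
    }
}
```

Using Character.toUpperCase and Character.toLowerCase avoids relying on the ASCII offset of 32.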

Extract values between commas without the quotation marks

Let's say I have a string such as 'John','Smith'. I want my regex to extract the values John and Smith from that string, without the commas and quotation marks. I looked around the site and found a solution that gets rid of the commas, but not the quotation marks.
This is the regex I tried (?:^|(?<=,))[^,]*
With that I get 'John' and 'Smith'. Of course, I could simply iterate over the Matcher like this and remove the quotation marks manually, but I was wondering if there's a more direct solution using regex without having to resort to replaceAll.
Pattern pat = Pattern.compile("(?:^|(?<=,))[^,]*");
Matcher matcher = pat.matcher("'John', 'Smith'");
List<String> matches = new ArrayList<>();
while (matcher.find()) {
matches.add(matcher.group().replaceAll("'", ""));
}
The following regex will work: "[^,']+".
Below is the updated code.
public static void main(String[] args) {
String regex = "[^,']+";
Pattern pat = Pattern.compile(regex);
Matcher matcher = pat.matcher("'John', 'Smith'");
List<String> matches = new ArrayList<>();
while (matcher.find()) {
matches.add(matcher.group());
}
System.out.println(matches);
}
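Output (the blank middle element is the space after the comma in this test input; with 'John','Smith' it does not appear):
[John,  , Smith]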
I tried this code and it gives output string without single quotes:
public class SubstringExample{
public static void main(String args[]){
String nameStr="'John','Smith'";
String newNameStr = nameStr.replaceAll("\'","");
System.out.println(newNameStr);
}}
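Another way to get the values without the quotation marks, and without a follow-up replaceAll, is to match each quoted value and read only the capturing group. This is a sketch under the assumption that values never contain a single quote themselves; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedValues {
    // Group 1 is the text between a pair of single quotes.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    // Hypothetical helper: returns the contents of every single-quoted value.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            values.add(m.group(1)); // the quotes are matched but not captured
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("'John', 'Smith'")); // [John, Smith]
    }
}
```

Because the commas and any spaces between values fall outside the quotes, they are skipped without extra handling.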

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?
There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match it and extract.
Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
You can do it like this:
String[] arr = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
// compile the pattern once, outside the loop
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
for (String test : arr) {
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}
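The (^[^a-z]+) suggestion above was given without code. A minimal sketch of it follows; the class and method names are hypothetical, and returning an empty string when the input starts with a lowercase letter is an assumption, not something the thread specifies:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UppercasePrefix {
    // Everything from the start of the string up to the first lowercase letter.
    private static final Pattern PREFIX = Pattern.compile("^[^a-z]+");

    // Hypothetical helper: returns the leading part with no lowercase letters.
    static String prefix(String input) {
        Matcher m = PREFIX.matcher(input);
        return m.find() ? m.group() : ""; // assumption: empty result when nothing matches
    }

    public static void main(String[] args) {
        String[] input = {
            "MOC.OOHAY#ABC.PQRqwertySDdd",
            "MOC.OOHAY#AB.JKLasDDbfn",
            "MOC.OOHAY#XZ.JKGposDDbfn"
        };
        for (String s : input) {
            System.out.println(prefix(s));
        }
    }
}
```

Unlike the patterns that spell out the dots and the # sign, this one makes no assumption about which non-lowercase characters appear in the prefix.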

Regex to exclude word from matches java code

Maybe someone could help me. I'm trying to include a regex in Java code that matches everything except the string ZZ78. I'd like to know what is missing in the regex I have.
The input string is str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78"
and I'm trying this regex: (?:(?![ZZ78]).)* but if you test this regex against the string at http://regexpal.com/
you'll see that it does not work completely.
str = new String ("ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78");
Pattern pattern = Pattern.compile("(?:(?![ZZ78]).)*");
the matched strings should be
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Update:
Hello Avinash Raj and Chthonic Project. Thanks so much for your help and the solutions provided.
I originally thought of the split method, but I was trying to avoid getting empty strings as a result
when, for example, the delimiter string is at the beginning or at the end of the main string.
Then I thought that a regex could help me extract everything except "ZZ78", avoiding
empty results in the output.
Below I show the code using the split method (Chthonic's) and the regex (Avinash's); both produce empty
strings if the commented-out "if()" conditions are not used.
Are those "if()" checks the only way to avoid printing empty strings, or could the regex
be tweaked a little to match only non-empty strings?
This is the code I have tested so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78")) {
//if ( !s.isEmpty() ) {
System.out.println("This is a match <<" + s + ">>");
//}
}
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
//if ( !matcher.group(1).isEmpty() ) {
System.out.println("This is a match <<" + matcher.group(1) + ">>");
//}
}
}
}
And the output (without using the "if()" checks):
########### Matches with Split ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
##########################################
########### Matches with Regex ###########
This is a match <<>>
This is a match <<ab57cd>>
This is a match <<efghZZ7ij#klm>>
This is a match <<noCODpqr>>
This is a match <<stuvw27z#xyz>>
This is a match <<>>
Thanks for help so far.
Thanks in advance
Update #2:
Excellent both of your answers and solutions. Now it works very nice. This is the final code I've tested with both solutions.
Many thanks again.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
System.out.println("########### Matches with Split ###########");
String str = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Arrays.stream(str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
System.out.println("##########################################");
System.out.println("########### Matches with Regex ###########");
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
}
}
And output:
########### Matches with Split ###########
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
##########################################
########### Matches with Regex ###########
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
The easiest way to do this is as follows:
public static void main(String[] args) {
String str = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
for (String s : str.split("ZZ78"))
System.out.println(s);
}
The output, as expected, is:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
If the pattern used to split the string is at the beginning (i.e. "ZZ78" in your example code), the first element returned will be an empty string, as you have already noted. To avoid that, all you need to do is filter the array. This is essentially the same as putting an if, but you can avoid the extra condition line this way. I would do this as follows (in Java 8):
String test_str = ...; // whatever string you want to test it with
Arrays.stream(test_str.split("ZZ78")).filter(s -> !s.isEmpty()).forEach(System.out::println);
You need to remove the character class, since [ZZ78] matches a single character from the given list (Z, 7 or 8) rather than the string ZZ78. Even then, (?:(?!ZZ78).)* alone won't give the matches you want. Consider ab57cdZZ78 as the input string. First, (?:(?!ZZ78).)* matches ab57cd and stops, because the negative lookahead (?!ZZ78) fails at the first Z. The regex engine then moves on to the second Z and checks (?!ZZ78) again. Because the text from the second Z onward is Z78, not ZZ78, the lookahead succeeds and Z78 gets matched. To prevent that, the pattern also has to consume the delimiter:
String s = "ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
ab57cd
efghZZ7ij#klm
noCODpqr
stuvw27z#xyz
Explanation:
((?:(?!ZZ78).)*) Captures zero or more characters into group 1, stopping before any occurrence of ZZ78.
(ZZ78|$) Captures the following ZZ78, or matches the end-of-line anchor, as group 2.
Group 1 therefore holds the text between delimiters; it is empty when two delimiters are adjacent or when the input starts or ends with ZZ78.
Update:
String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
Pattern regex = Pattern.compile("((?:(?!ZZ78).)*)(ZZ78|$)");
Matcher matcher = regex.matcher(s);
ArrayList<String> allMatches = new ArrayList<String>();
ArrayList<String> list = new ArrayList<String>();
while(matcher.find()){
allMatches.add(matcher.group(1));
}
for (String s1 : allMatches)
if (!s1.equals(""))
list.add(s1);
System.out.println(list);
Output:
[ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
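On the question of whether the regex itself can be tweaked to skip empty results: one option, sketched here as an assumption and not taken from either answer, is to require at least one character (+ instead of *) and add a lookbehind so that a piece can only begin at the start of the input or directly after a delimiter. The class and method names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyPieces {
    // (?<=^|ZZ78)     a piece starts at the beginning or right after a delimiter
    // (?:(?!ZZ78).)+  one or more characters, none of which begins a delimiter
    private static final Pattern PIECE = Pattern.compile("(?<=^|ZZ78)(?:(?!ZZ78).)+");

    // Hypothetical helper: returns the non-empty pieces between ZZ78 delimiters.
    static List<String> pieces(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "ZZ78ab57cdZZ78efghZZ7ij#klmZZ78noCODpqrZZ78stuvw27z#xyzZZ78";
        System.out.println(pieces(s)); // [ab57cd, efghZZ7ij#klm, noCODpqr, stuvw27z#xyz]
    }
}
```

Without the lookbehind, a match could begin in the middle of a delimiter (for example at Z78), which is the problem described in the explanation above. Filtering the split result is likely still the simpler choice in practice.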

Working with a regular expression

I have a string of alphanumeric terms like the one below. I want to extract the runs of letters into an array. I've written the following code.
String pro = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] p = pro.split("^([0-9].*)$");
Pattern pattern = Pattern.compile("([0-9].*)([A-z].*)");
Matcher matcher = pattern.matcher(pro.toString());
while (matcher.find())
{
System.out.println(matcher.group());
}
for(String s: p)
{
System.out.println(s);
}
System.out.println("End");
Output:
1a1a2aa3aaa4aaaa15aaaaa6aaaaaa
End
I even tried to use split based on a regular expression, but that does not work either. I think my regular expression is wrong. I'm expecting output with all the runs of letters in an array.
array[] = {'a', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa'}
You could use the following, which splits on anything except alphabetic characters. Because the input starts with a digit, the first element of the result is an empty string.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
String[] parts = s.split("[^a-zA-Z]+");
for (String m: parts) {
System.out.println(m);
}
Using the Matcher method, you could do the following.
String s = "1a1a2aa3aaa4aaaa15aaaaa6aaaaaa";
Pattern p = Pattern.compile("[a-zA-Z]+");
Matcher m = p.matcher(s);
List<String> matches = new ArrayList<String>();
while (m.find()) {
matches.add(m.group());
}
System.out.println(matches); // => [a, a, aa, aaa, aaaa, aaaaa, aaaaaa]
If you want only alphabetic characters, wouldn't it make more sense to use this expression instead: /([a-zA-Z]+)/g
Anchoring with ^ and $ is probably not what you want here, because you want to find all possible matches (the /g flag).
Here is an online demo:
http://regex101.com/r/fI1eB8
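One more detail in the question's pattern is worth a look: the class [A-z] spans ASCII 65 to 122, which also covers the six punctuation characters between Z and a ([, \, ], ^, _ and the backtick). A small sketch of the difference, with a made-up helper name and sample input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRange {
    // Hypothetical helper: counts how many matches regex finds in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "a_b^c"; // made-up input containing '_' and '^'
        System.out.println(countMatches("[A-z]+", sample));    // 1, the whole string
        System.out.println(countMatches("[a-zA-Z]+", sample)); // 3, one per letter
    }
}
```

This is why the answers above write the class as [a-zA-Z].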
