Java string split() Regex

I have a string like this:
["[number][name]statement_1.","[number][name]statement_2."]
I want to get only statement_1 and statement_2. I tried it this way:
String[] statement = message.trim().split("\\s*,\\s*");
but it gives ["[number][name]statement_1." and "[number][name]statement_2."]. How can I get only statement_1 and statement_2?

Match All instead of Splitting
Splitting and Match All are two sides of the same coin. In this case, Match All is easier.
You can use this regex:
(?<=\])[^\[\]"]+(?=\.)
See the matches in the regex demo.
In Java code:
Pattern regex = Pattern.compile("(?<=\\])[^\\[\\]\"]+(?=\\.)");
Matcher regexMatcher = regex.matcher(yourString);
while (regexMatcher.find()) {
// the match: regexMatcher.group()
}
To get the two matches separately, as you asked:
Pattern regex = Pattern.compile("(?<=\\])[^\\[\\]\"]+(?=\\.)");
Matcher regexMatcher = regex.matcher(yourString);
if (regexMatcher.find()) {
    String theFirstMatch = regexMatcher.group();
}
if (regexMatcher.find()) {
    String theSecondMatch = regexMatcher.group();
}
Explanation
The lookbehind (?<=\]) asserts that what precedes the current position is a ]
[^\[\]"]+ matches one or more chars that are not [, ] or "
The lookahead (?=\.) asserts that the next character is a dot
Reference
Match All and Split are Two Sides of the Same Coin
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
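The match-all loop from this answer can be wrapped into a small self-contained program. This is only a minimal sketch: the class name MatchAllDemo and the helper extractStatements are made up for illustration, and the input is the string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchAllDemo {
    // Same regex as in the answer: text that follows a ']', is followed by a '.',
    // and contains no '[', ']' or '"'.
    private static final Pattern STATEMENT =
            Pattern.compile("(?<=\\])[^\\[\\]\"]+(?=\\.)");

    // Hypothetical helper: collects every match into a list.
    static List<String> extractStatements(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = STATEMENT.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[\"[number][name]statement_1.\",\"[number][name]statement_2.\"]";
        System.out.println(extractStatements(input)); // [statement_1, statement_2]
    }
}
```

Collecting into a list avoids hard-coding the number of find() calls, which helps if the input may hold more than two statements.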

I somehow don't think that is your actual string, but you may try the following.
String s = "[\"[number][name]statement_1.\",\"[number][name]statement_2.\"]";
String[] parts = s.replaceAll("\\[.*?\\]", "").split("\\W+");
System.out.println(parts[0]); //=> "statement_1"
System.out.println(parts[1]); //=> "statement_2"

Is the string always going to look like, for example, [50][James]Loves cake? If so, you can take everything after the last ]:
Scanner scan = new Scanner(System.in);
System.out.println ("Enter string");
String s = scan.nextLine();
int last = s.lastIndexOf("]")+1;
String x = s.substring(last, s.length());
System.out.println (x);
Enter string
[21][joe]loves cake
loves cake
Process completed.
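The same idea works without reading from the console. A minimal sketch, assuming the statement is always whatever follows the last ]; the class and method names are hypothetical.

```java
class AfterLastBracket {
    // Hypothetical helper: returns everything after the last ']'.
    // If the string has no ']', lastIndexOf returns -1, so the whole string is returned.
    static String afterLastBracket(String s) {
        return s.substring(s.lastIndexOf(']') + 1);
    }

    public static void main(String[] args) {
        System.out.println(afterLastBracket("[21][joe]loves cake")); // loves cake
    }
}
```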

Use a regex instead.
With Java 7
final Pattern pattern = Pattern.compile("(^.*\\])(.+)?");
final String[] strings = { "[number][name]statement_1.", "[number][name]statement_2." };
final List<String> results = new ArrayList<String>();
for (final String string : strings) {
final Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
results.add(matcher.group(2));
}
}
System.out.println(results);
With Java 8
final Pattern pattern = Pattern.compile("(^.*\\])(.+)?");
final String[] strings = { "[number][name]statement_1.", "[number][name]statement_2." };
final List<String> results = Arrays.stream(strings)
.map(pattern::matcher)
.filter(Matcher::matches)
.map(matcher -> matcher.group(2))
.collect(Collectors.toList());
System.out.println(results);

Related

How to trim a string using regex?

I have this alphabet: {'faa','fa','af'}
and I have this string: "faaf"
I have this regex: "(faa|fa|af)*" which helps me match the string with the alphabet.
How do I make Java trim my string into: {fa,af}, which is the correct way to write the string: "faaf" based on my alphabet?
here is my code:
String regex = "(faa|fa|af)*";
String str = "faaf";
boolean isMatch = Pattern.matches(regex, str);
if (isMatch)
{
    // trim the string
    while (str.length() != 0)
    {
        Pattern pattern = Pattern.compile("^(faa|fa|af)(faa|fa|af)*$");
        Matcher mc = pattern.matcher(str);
        if (mc.find())
        {
            String l = mc.group(1);
            alphabet.add(l);
            str = str.substring(l.length());
            System.out.println("\n" + l);
        }
    }
}
Thanks to Aaron who helped me with this problem.

You need a loop.
Pattern pattern = Pattern.compile("(faa|fa|af)*"); // the question's regex; it already ends in *, so do not append another one
LinkedList<String> parts = new LinkedList<>();
while (!str.isEmpty()) {
    Matcher m = pattern.matcher(str);
    if (!m.matches()) { // Can only happen in the first loop step.
        break;
    }
    parts.addFirst(m.group(1)); // The last repetition of the group.
    str = str.substring(0, m.start(1));
}
String result = parts.stream().collect(Collectors.joining(", ", "{", "}"));
This relies on the fact that for a repeated group such as (X)+ or (X)*, m.group(1) yields the value of the last repetition of X.
Unfortunately the regex API does not offer a callback-style variant of matches that is invoked for each repetition, comparable to the overloaded replaceAll that takes a lambda working on a single MatchResult.
Note that matches applies to the entire string.
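The loop can be tried out as a complete program. This is a sketch under the assumption that the alphabet is exactly {faa, fa, af} as in the question; the class name AlphabetTokenizer and the method tokenize are illustrative only.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlphabetTokenizer {
    // The alphabet from the question, repeated zero or more times.
    private static final Pattern WORDS = Pattern.compile("(faa|fa|af)*");

    // Hypothetical helper: peels off the last alphabet word until nothing is left.
    static List<String> tokenize(String str) {
        LinkedList<String> parts = new LinkedList<>();
        while (!str.isEmpty()) {
            Matcher m = WORDS.matcher(str);
            if (!m.matches()) {
                break; // the input is not a concatenation of alphabet words
            }
            parts.addFirst(m.group(1));          // value of the last repetition
            str = str.substring(0, m.start(1));  // keep only what precedes it
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("faaf")); // [fa, af]
    }
}
```

Working from the end of the string is what makes this safe: the remaining prefix was already matched by the earlier repetitions, so it is guaranteed to match again on the next pass.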

How can I grep same format substrings within a long string by java regular expression?

For example, I want to grep both /css/screen/shared/styles.css and /css/screen/nol/styles.css from this long string:
#import "/css/screen/shared/styles.css";
#import "/css/screen/nol/styles.css";
Note that this long string contains 2 lines, it should look like this in java code:
String sentence = "#import \"/css/screen/nol/styles.css\";\n#import \"/css/screen/shared/styles.css\";";
So far, I have:
"#import\\s\"(.*?)\";\n"
it only identifies the "/css/screen/shared/styles.css", but ignores the "/css/screen/nol/styles.css".
Here is my code:
public static String getImportCSS(String sentence){
    String result = "";
    if(sentence.length() == 0) return null;
    if(sentence.indexOf("#import ") != -1){
        Pattern regex = Pattern.compile("#import\\s\"(.*)\";");
        Matcher regexMatcher = regex.matcher(sentence);
        if(regexMatcher.find()){
            for(int i = 0; i <= regexMatcher.groupCount(); i++){
                result = regexMatcher.group(1);
            }
        }
        return result;
    }
    return null;
}
What am I doing wrong here? Thanks!

You cannot match the second string because your regex has an LF (\n) at the end, and the last line of the input is not followed by one.
Remove it, and the pattern will find both strings. However, I'd advise using a negated character class [^"]* (zero or more characters other than a ") rather than lazy dot matching, since the strings should not contain a double quote:
#import\s*\"([^"]*)\";
See the regex demo
Java demo:
String str = "#import \"/css/screen/shared/styles.css\";\n#import \"/css/screen/nol/styles.css\";";
Pattern ptrn = Pattern.compile("#import\\s*\"([^\"]*)\";");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(1));
}

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?

There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG

Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match it and extract.
Regex Demo
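In Java, that regex could be applied roughly as follows. A minimal sketch: the class name UppercasePrefix and the method leadingPart are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UppercasePrefix {
    // From the start of the string, take everything up to the first lowercase letter.
    private static final Pattern PREFIX = Pattern.compile("^[^a-z]+");

    // Hypothetical helper: returns the leading part, or "" if the string starts with a lowercase letter.
    static String leadingPart(String s) {
        Matcher m = PREFIX.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(leadingPart("MOC.OOHAY#ABC.PQRqwertySDdd")); // MOC.OOHAY#ABC.PQR
    }
}
```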

Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
Demo.

You can do it like this:
String[] arr = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*"); // compile once, outside the loop
for (String test : arr) {
    Matcher m = p.matcher(test);
    if (m.find()) {
        System.out.println(m.group());
    }
}

Java regular expression to validate and extract some values

I want to extract all three parts of the following string in Java
MS-1990-10
The first part should always be 2 letters (A-Z)
The second part should always be a year
The third part should always be a number
Does anyone know how can I do that using Java's regular expressions?

You can do this using java's pattern matcher and group syntax:
Pattern datePatt = Pattern.compile("([A-Z]{2})-(\\d{4})-(\\d{2})");
Matcher m = datePatt.matcher("MS-1990-10");
if (m.matches()) {
String g1 = m.group(1);
String g2 = m.group(2);
String g3 = m.group(3);
}

Use Matcher's groups so you can get the parts that actually matched.
In a Matcher, whatever matches inside parentheses is captured and can be retrieved via the group() method. To use parentheses without capturing the match, use non-capturing parentheses: (?:xxx).
See also Pattern.
public static void main(String[] args) throws Exception {
String[] lines = { "MS-1990-10", "AA-999-12332", "ZZ-001-000" };
for (String str : lines) {
System.out.println(Arrays.toString(parse(str)));
}
}
private static String[] parse(String str) {
String regex = "";
regex = regex + "([A-Z]{2})";
regex = regex + "[-]";
// regex = regex + "([^0][0-9]+)"; // any year, no leading zero
regex = regex + "([12]{1}[0-9]{3})"; // 1000 - 2999
regex = regex + "[-]";
regex = regex + "([0-9]+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (!matcher.matches()) {
return null;
}
String[] tokens = new String[3];
tokens[0] = matcher.group(1);
tokens[1] = matcher.group(2);
tokens[2] = matcher.group(3);
return tokens;
}
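The difference between capturing and non-capturing parentheses can be seen in a short example. This is a sketch, not part of the answer above: the class GroupDemo and the method yearAndNumber are hypothetical, and the two-letter prefix is deliberately left uncaptured to show that it gets no group number.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // (?:...) groups without capturing, so only the year and the number are numbered groups.
    static final Pattern CODE = Pattern.compile("(?:[A-Z]{2})-(\\d{4})-(\\d+)");

    // Hypothetical helper: returns {year, number}, or null if the input does not match.
    static String[] yearAndNumber(String s) {
        Matcher m = CODE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(CODE.matcher("").groupCount());              // 2
        System.out.println(Arrays.toString(yearAndNumber("MS-1990-10"))); // [1990, 10]
    }
}
```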

This is a way to get all 3 parts with a regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String... args) {
        Pattern p = Pattern.compile("([A-Z]{2})-(\\d{4})-(\\d{2})");
        Matcher m = p.matcher("MS-1990-10");
        if (m.matches()) { // group() throws IllegalStateException if the match failed
            for (int i = 1; i <= m.groupCount(); i++)
                System.out.println(m.group(i));
        }
    }
}

String rule = "^[A-Z]{2}-[1-9][0-9]{3}-[0-9]{2}";
Pattern pattern = Pattern.compile(rule);
Matcher matcher = pattern.matcher(s);
This pattern matches years from 1000 to 9999; adjust it as needed.
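As a validation check, the rule could be used along these lines. A minimal sketch: the class CodeValidator and the method isValid are illustrative names, and matches() is used so that the whole input has to fit the rule.

```java
import java.util.regex.Pattern;

class CodeValidator {
    // The rule from the answer: two letters, a year from 1000 to 9999, two digits.
    private static final Pattern RULE = Pattern.compile("^[A-Z]{2}-[1-9][0-9]{3}-[0-9]{2}");

    // Hypothetical helper: matches() requires the entire string to match the rule.
    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("MS-1990-10")); // true
        System.out.println(isValid("MS-0990-10")); // false
    }
}
```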

Extract every complete word that contains a certain substring

I'm trying to write a function that extracts each word from a sentence that contains a certain substring e.g. Looking for 'Po' in 'Porky Pork Chop' will return Porky Pork.
I've tested my regex on regexpal but the Java code doesn't seem to work. What am I doing wrong?
private static String foo()
{
String searchTerm = "Pizza";
String text = "Cheese Pizza";
String sPattern = "(?i)\b("+searchTerm+"(.+?)?)\b";
Pattern pattern = Pattern.compile ( sPattern );
Matcher matcher = pattern.matcher ( text );
if(matcher.find ())
{
String result = "-";
for(int i=0;i < matcher.groupCount ();i++)
{
result+= matcher.group ( i ) + " ";
}
return result.trim ();
}else
{
System.out.println("No Luck");
}
}

In Java, to pass the \b word boundary to the regex engine you need to write it as \\b in a string literal. A plain \b in a string literal is the backspace character.
Judging by your example, you want to return all words that contain your substring. To do this, don't use for(int i=0;i < matcher.groupCount ();i++) but while(matcher.find()), since the group count covers the groups within a single match, not all matches.
If your search term can contain regex special characters, you should probably wrap it in Pattern.quote(searchTerm).
In your code you are trying to find "Pizza" in "Cheese Pizza", so I assume you also want to find words that are exactly the search term. Your regex would work for that, but you can change the last part, (.+?)?), to \\w*, and also add \\w* at the start if the substring should be matched in the middle of a word too (not only at the start).
So your code could look like this:
private static String foo() {
String searchTerm = "Pizza";
String text = "Cheese Pizza, Other Pizzas";
String sPattern = "(?i)\\b\\w*" + Pattern.quote(searchTerm) + "\\w*\\b";
StringBuilder result = new StringBuilder("-").append(searchTerm).append(": ");
Pattern pattern = Pattern.compile(sPattern);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
result.append(matcher.group()).append(' ');
}
return result.toString().trim();
}
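The backspace pitfall is easy to check directly. The following is a small sketch; the class WordBoundaryEscape and the helper findsWord are made-up names.

```java
import java.util.regex.Pattern;

class WordBoundaryEscape {
    // Hypothetical helper: true if the regex finds a match anywhere in the text.
    static boolean findsWord(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("\b".length());   // 1: a single backspace character
        System.out.println("\\b".length());  // 2: a backslash followed by 'b'

        // With "\b" the regex engine looks for literal backspace characters.
        System.out.println(findsWord("\bPizza\b", "Cheese Pizza"));   // false
        // With "\\b" it receives \b and treats it as a word boundary.
        System.out.println(findsWord("\\bPizza\\b", "Cheese Pizza")); // true
    }
}
```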

While the regex approach is certainly a valid method, I find it easier to think through when you split the words up by whitespace. This can be done with String's split method.
public List<String> doIt(final String inputString, final String term) {
    final List<String> output = new ArrayList<String>();
    final String[] parts = inputString.split("\\s+");
    for (final String part : parts) {
        if (part.contains(term)) { // also true when the word starts with the term
            output.add(part);
        }
    }
    return output;
}
Of course, it is worth noting that this effectively makes two passes through your input String: the first to find the whitespace characters to split on, and the second to look through each split word for your substring.
If a single pass is necessary, though, the regex path is better.

I find nicholas.hauschild's answer to be the best.
However if you really wanted to use regex, you could do it as such:
String searchTerm = "Pizza";
String text = "Cheese Pizza";
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchTerm)
+ "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
Pizza

The pattern should have been
String sPattern = "(?i)\\b("+searchTerm+"(?:.+?)?)\\b";
You want to capture the whole word that starts with the search term (here, pizza). The ?: ensures you don't capture part of it a second time.

Try this pattern:
String searchTerm = "Po";
String text = "Porky Pork Chop oPod zzz llPo";
// The longest alternative comes first; otherwise "oPod" would be cut short to "oPo".
Pattern p = Pattern.compile("\\p{Alpha}+" + searchTerm + "\\p{Alpha}+|\\p{Alpha}+" + searchTerm + "|" + searchTerm + "\\p{Alpha}+");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(">> " + m.group());
}

OK, here is the pattern in raw form (not as a Java string literal; you must double the backslashes yourself):
(?i)\b[a-z]*po[a-z]*\b
And that's all.
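Written as a Java string literal, the raw pattern might be used as follows. This is a sketch with the search term "po" hard-coded as in the raw pattern; the class WordsContaining and the method find are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsContaining {
    // The raw pattern (?i)\b[a-z]*po[a-z]*\b with every backslash doubled.
    private static final Pattern WORD_WITH_PO =
            Pattern.compile("(?i)\\b[a-z]*po[a-z]*\\b");

    // Hypothetical helper: returns every word that contains "po", ignoring case.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_WITH_PO.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("Porky Pork Chop")); // [Porky, Pork]
    }
}
```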
