Regular Expression in Java. Splitting a string using pattern and matcher - java

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a regular expression with Pattern and Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" needs to be split as ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always prints "NO MATCH". What am I doing wrong, and how can I fix it?
Thanks in advance

To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
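As for why the original attempt always prints NO MATCH: Matcher.matches() succeeds only when the pattern covers the entire input, and a pattern that describes a single / can never cover all of "Test1/Test2/Tt". Matcher.find() instead scans for the next occurrence anywhere in the input. A minimal sketch of the difference follows; the class name MatchesVsFind and the helper countSeparators are illustrative names, not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // The separator pattern from the question: a slash that is not
    // wrapped in single quotes on both sides.
    static final Pattern SEPARATOR = Pattern.compile("(?<!')/|/(?!')");

    // Counts separator occurrences by scanning the input with find().
    static int countSeparators(String input) {
        Matcher m = SEPARATOR.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "Test1/Test2/Tt";
        // matches() asks whether the WHOLE input is a single separator.
        System.out.println(SEPARATOR.matcher(line).matches());
        // find() looks for separators anywhere in the input.
        System.out.println(countSeparators(line));
    }
}
```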
Edit
If you do not want to use split, you might also use a pattern that skips a / only when it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
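A rough Java sketch of this lookaround variant, collecting the tokens into a list instead of printing them. The class name QuotedSlashTokens and the method tokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSlashTokens {
    // A token is a run of non-slash characters, optionally continued
    // across any slash that sits directly between two single quotes.
    static final Pattern TOKEN = Pattern.compile("[^/]+(?:(?<=')/(?=')[^/]*)*");

    // Collects every token found in the input, in order.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("One/Two/Three'/'3/Four"));
    }
}
```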

Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four

Here is a simple pattern that matches every / you want to split on:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows. There are 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You want to exclude only case 1, so we need a regex for the other three cases. I have written three similar regexes and joined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is a character other than ' (negated character class [^'])
\/ - matches / literally
(?=') - positive lookahead, asserts that what follows is '
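Since the question rules out .split, one way to use a separator pattern like this is to walk the matches with find() and cut the input at each match boundary. The sketch below assumes that approach; SeparatorSplit and splitOn are hypothetical names, and the slash is written without the backslash escape because Java does not require it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorSplit {
    // Three alternatives: slash followed by a quote, slash preceded by a quote,
    // and slash with no quote on either side. A slash quoted on both sides is skipped.
    static final Pattern SEPARATOR = Pattern.compile(
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])");

    // Cuts the input at every separator match, without calling String.split.
    static List<String> splitOn(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(input);
        int from = 0;
        while (m.find()) {
            parts.add(input.substring(from, m.start())); // text before this separator
            from = m.end();                              // resume after the separator
        }
        parts.add(input.substring(from));                // trailing piece
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOn("One/Two/Three'/'3/Four"));
    }
}
```

Note that the lookarounds require a character on each side, so a slash at the very start or end of the input is not treated as a separator by this pattern.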

Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+"; // the backslash must be doubled inside a Java string literal
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four

Related

Java Find Substring Inbetween Characters

I am very stuck. I use this format to read a player's name in a string, like so:
"[PLAYER_yourname]"
I have tried for a few hours and can't figure out how to read only the part after the '_' and before the ']' to get their name.
Could I have some help? I played around with sub strings, splitting, some regex and no luck. Thanks! :)
BTW: This question is different, if I split by _ I don't know how to stop at the second bracket, as I have other string lines past the second bracket. Thanks!
You can do:
String s = "[PLAYER_yourname]";
String name = s.substring(s.indexOf("_") + 1, s.lastIndexOf("]"));
You can use a substring. int x = str.indexOf('_') gives you the index where the '_' is found and int y = str.lastIndexOf(']') gives you the index where the last ']' is found. Then str.substring(x + 1, y) gives you the text from just after the underscore up to, but not including, the closing bracket.
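The question mentions that more text may follow the closing bracket, in which case lastIndexOf(']') could land on a later bracket. A hedged variant of the substring idea searches for the first ']' after the underscore and guards against missing markers. The names PlayerName and extract are illustrative only.

```java
import java.util.Optional;

class PlayerName {
    // Returns the text between the first '_' and the next ']' after it,
    // or an empty Optional if either marker is missing.
    static Optional<String> extract(String s) {
        int start = s.indexOf('_');
        if (start < 0) {
            return Optional.empty();
        }
        int end = s.indexOf(']', start + 1); // first closing bracket after the underscore
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(s.substring(start + 1, end));
    }

    public static void main(String[] args) {
        System.out.println(extract("[PLAYER_yourname]").orElse("NO MATCH"));
    }
}
```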
Using the regex matcher functions you could do:
String s = "[PLAYER_yourname]";
String p = "\\[[A-Z]+_(.+)\\]";
Pattern r = Pattern.compile(p);
Matcher m = r.matcher(s);
if (m.find( ))
System.out.println(m.group(1));
Result:
yourname
Explanation:
\[ matches the character [ literally
[A-Z]+ matches one or more uppercase letters from A to Z (case sensitive)
_ matches the character _ literally
1st Capturing group (.+) matches any character (except newline)
\] matches the character ] literally
This solution uses Java regex
String player = "[PLAYER_yourname]";
Pattern PLAYER_PATTERN = Pattern.compile("^\\[PLAYER_(.*?)]$");
Matcher matcher = PLAYER_PATTERN.matcher(player);
if (matcher.matches()) {
System.out.println( matcher.group(1) );
}
// prints yourname
You can do it like this:
public static void main(String[] args) throws InterruptedException {
String s = "[PLAYER_yourname]";
System.out.println(s.split("[_\\]]")[1]);
}
output: yourname
Try:
Pattern pattern = Pattern.compile(".*?_([^\\]]+)");
Matcher m = pattern.matcher("[PLAYER_yourname]");
if (m.find()) { // find(), not matches(): the pattern does not consume the closing ]
String name = m.group(1);
// name = "yourname"
}

How to match on single line for regex?

I have a regex to match a line and delete it, along with everything below it (while keeping everything above it).
Two Part Ask:
1) Why won't this pattern match the given String text below?
2) How can I be sure to just match on a single line and not multiple lines?
- The pattern has to be found on the same single line.
String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
+ "Everything should be deleted here but don't match this on this line" + "\n\n";
Pattern p = Pattern.compile("^(Please(\\s)(match)(\\s)(.*?)\\sthis\\s(.*))$", Pattern.DOTALL );
Matcher m = p.matcher(text);
if (m.find()) {
text = (m.replaceAll("")).replaceAll("[\n]+$", ""); // remove everything below at and below "Please match ... this"
System.out.println(text);
}
Expected Output:
Keep this.
You are complicating your life...
First, as I said in the comment, use Pattern.MULTILINE.
Then, to truncate the string from the beginning of the match, use .substring():
final Pattern p = Pattern.compile("^Please\\s+match\\b.*?this",
Pattern.MULTILINE);
final Matcher m = p.matcher(input);
return m.find() ? input.substring(0, m.start()) : input;
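For reference, the fragment above can be wrapped into a runnable method roughly as follows. The class name TruncateAtLine, the method truncate, and the final step that strips trailing newlines (taken from the question's own replaceAll) are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TruncateAtLine {
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    static final Pattern MARKER =
            Pattern.compile("^Please\\s+match\\b.*?this", Pattern.MULTILINE);

    // Keeps everything before the first matching line and drops trailing newlines.
    static String truncate(String input) {
        Matcher m = MARKER.matcher(input);
        String kept = m.find() ? input.substring(0, m.start()) : input;
        return kept.replaceAll("\\n+$", "");
    }

    public static void main(String[] args) {
        String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
                + "Everything should be deleted here but don't match this on this line" + "\n\n";
        System.out.println(truncate(text));
    }
}
```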
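Remove DOTALL to make sure the match stays on a single line, add MULTILINE so that ^ and $ anchor at line boundaries rather than only at the start and end of the input, and convert \s to a literal space:
Pattern p = Pattern.compile("^(Please( )(match)( )(.*?) this (.*))$", Pattern.MULTILINE);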
DOTALL makes a dot match newlines as well
\s can match any whitespace including new lines.
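These two points are easy to check in isolation. The sketch below uses made-up helper names (FlagEffects, dotCrossesNewline, matchesNewline) and tiny inputs unrelated to the question's text.

```java
import java.util.regex.Pattern;

class FlagEffects {
    // True if "a.b" matches the two-line input "a\nb" under the given flags.
    static boolean dotCrossesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    // True if the given regex matches a lone newline character.
    static boolean matchesNewline(String regex) {
        return Pattern.matches(regex, "\n");
    }

    public static void main(String[] args) {
        System.out.println(dotCrossesNewline(0));              // default: dot stops at line terminators
        System.out.println(dotCrossesNewline(Pattern.DOTALL)); // DOTALL: dot also matches "\n"
        System.out.println(matchesNewline("\\s"));             // \s includes newlines
        System.out.println(matchesNewline(" "));               // a literal space does not
    }
}
```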

pattern matching to detect special characters in a word

I am trying to identify any special characters ('?', '.', ',') at the end of a string in java. Here is what I wrote:
public static void main(String[] args) {
Pattern pattern = Pattern.compile("{.,?}$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - "+matcher.matches());
}
This returns a false when it's expected to be true. Please suggest.
Use "sure?".matches(".*[.,?]").
String#matches(...) implicitly anchors the regex as if it were wrapped in ^ and $, so there is no need to add them manually.
This is your code:
Pattern pattern = Pattern.compile("{.,?}$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - "+matcher.matches());
You have 2 problems:
You're using { and } instead of character class [ and ]
You're using Matcher#matches() instead of Matcher#find(). The matches method must match the entire input, while find searches for the pattern anywhere in the string.
Change your code to:
Pattern pattern = Pattern.compile("[.,?]$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - " + matcher.find());
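The two fixes suggested in this thread (find() with a trailing $, or matches() with a leading .*) should agree on ordinary single-line input. A small sketch comparing them, with illustrative class and method names:

```java
import java.util.regex.Pattern;

class TrailingPunctuation {
    // "$" pins the character class to the end of the input.
    static final Pattern TRAILING = Pattern.compile("[.,?]$");

    // Search variant: find() looks anywhere, and "$" restricts it to the end.
    static boolean endsWithPunctuationFind(String s) {
        return TRAILING.matcher(s).find();
    }

    // Whole-input variant: matches() must consume everything, hence the leading ".*".
    static boolean endsWithPunctuationMatches(String s) {
        return s.matches(".*[.,?]");
    }

    public static void main(String[] args) {
        System.out.println(endsWithPunctuationFind("Sure?"));
        System.out.println(endsWithPunctuationMatches("Sure?"));
        System.out.println(endsWithPunctuationFind("Sure"));
    }
}
```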
Try this
Pattern pattern = Pattern.compile(".*[.,?]");
...

Punctuation Regex in Java

First, I read the documentation at the following link:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
I want to find any punctuation character EXCEPT #',& but I don't quite understand how.
Here is my code:
public static void main( String[] args )
{
// String to be scanned to find the pattern.
String value = "#`~!#$%^";
String pattern = "\\p{Punct}[^#',&]";
// Create a Pattern object
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
// Now create matcher object.
Matcher m = r.matcher(value);
if (m.find()) {
System.out.println("Found value: " + m.groupCount());
} else {
System.out.println("NO MATCH");
}
}
Result is NO MATCH.
Is there a mistake somewhere?
Thanks
MRizq
You're matching two characters, not one. Using a (negative) lookahead should solve the task:
(?![#',&])\\p{Punct}
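In Java code, the lookahead version might be used along these lines. This is only a sketch: it relies on the default ASCII meaning of \p{Punct}, and the names PunctLookahead and findPunct are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctLookahead {
    // The lookahead rejects the four excluded characters before
    // \p{Punct} consumes exactly one punctuation character.
    static final Pattern ALLOWED_PUNCT = Pattern.compile("(?![#',&])\\p{Punct}");

    // Collects each allowed punctuation character found in the input.
    static List<String> findPunct(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ALLOWED_PUNCT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPunct("#`~!#$%^"));
    }
}
```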
You may use character subtraction here:
String pat = "[\\p{Punct}&&[^#',&]]";
The whole pattern represents a character class, [...], that contains a \p{Punct} POSIX character class, the && intersection operator and [^...] negated character class.
A Unicode modifier might be necessary if you plan to also match all Unicode punctuation:
String pat = "(?U)[\\p{Punct}&&[^#',&]]";
              ^^^^
The pattern matches any punctuation (with \p{Punct}) except #, ', the comma, and &.
If you need to exclude more characters, add them to the negated character class. Just remember to always escape -, \, ^, [ and ] inside a Java regex character class/set. E.g. adding a backslash and - might look like "[\\p{Punct}&&[^#',&\\\\-]]" or "[\\p{Punct}&&[^#',&\\-\\\\]]".
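As a quick check of the escaping advice, the sketch below uses the second form shown above, with the escaped hyphen placed before the escaped backslash, and without the (?U) flag. The class name ExtendedExclusion and the sample input are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtendedExclusion {
    // Punctuation, minus # ' , & and additionally minus - and \.
    // In the string literal, \\- is an escaped hyphen and \\\\ is an escaped backslash.
    static final Pattern PUNCT = Pattern.compile("[\\p{Punct}&&[^#',&\\-\\\\]]");

    // Collects each punctuation character that survives the exclusions.
    static List<String> findPunct(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PUNCT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The input contains a hyphen, a backslash, an exclamation mark and a hash.
        System.out.println(findPunct("a-b\\c!d#"));
    }
}
```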
Java demo:
String value = "#`~!#$%^,";
String pattern = "(?U)[\\p{Punct}&&[^#',&]]";
Pattern r = Pattern.compile(pattern); // Create a Pattern object
Matcher m = r.matcher(value); // Now create matcher object.
while (m.find()) {
System.out.println("Found value: " + m.group());
}
Output:
Found value: !
Found value: %

RegEX: how to match string which is not surrounded

I have a String "REC/LESS FEES/CODE/AU013423".
What regex would match "REC" and "AU013423" (anything that is not surrounded by slashes /)?
I am using /^>*/, which works and matches the string within the slashes, i.e. using this I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
I need help with this. Thanks
If you know that you're only looking for alphanumeric data you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+) If this matches you will have the two groups which contain the first & final text strings.
This code prints RECAU013423
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover valid characters
Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]
^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
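Filled in for the string from the question, the loop above could look like this. The class name OuterSegments and the method outer are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OuterSegments {
    // Either the run of non-slash characters at the very start,
    // or the run of non-slash characters at the very end.
    static final Pattern OUTER = Pattern.compile("^[^/]+|[^/]+$");

    // Returns the text before the first slash and after the last slash.
    static List<String> outer(String subject) {
        List<String> result = new ArrayList<>();
        Matcher m = OUTER.matcher(subject);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(outer("REC/LESS FEES/CODE/AU013423"));
    }
}
```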
