Regex + Java - how to capture trailing numbers and everything else - java

I'm trying to capture two things in the String "T3st12345".
I want to capture the trailing numbers ("12345") and also the name of the test ("T3st").
This is what I have right now to match the trailing numbers with Java's Matcher class:
Pattern pattern = Pattern.compile("([0-9]*$)");
Matcher matcher = pattern.matcher("T3st12345");
but it returns "no match found".
How can I make this work for the trailing numbers and how do I capture the name of the test as well?

You can use this regex with 2 captured groups:
^(.*?)(\d+)$
RegEx Demo
RegEx Breakup:
^: Start
(.*?): Captured group #1 that matches zero or more of any character (lazy)
(\d+): Captured group #2 that matches one or more digits before End
$: End
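As a rough sketch of how this pattern could be used from Java (the class and method names here are made up for illustration, and the input is assumed to be a single line): note that group() only works after matches() or find() has succeeded, which is probably where the "no match found" message in the question comes from.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class TrailingDigits {
    // Group 1: everything before the trailing digits (lazy); group 2: the trailing digits.
    private static final Pattern NAME_AND_NUMBER = Pattern.compile("^(.*?)(\\d+)$");

    // Returns {name, digits}, or null when the input does not end in a digit.
    static String[] split(String input) {
        Matcher m = NAME_AND_NUMBER.matcher(input);
        // matches() (or find()) has to run before group();
        // calling group() first throws IllegalStateException.
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("T3st12345");
        System.out.println(parts[0]); // T3st
        System.out.println(parts[1]); // 12345
    }
}
```

If the string has no trailing digits, split returns null here; whether that or an exception is the better choice depends on the caller.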

You may use the following regex:
Pattern pattern = Pattern.compile("(\\p{Alnum}+?)([0-9]*)");
Matcher matcher = pattern.matcher("T3st12345");
if (matcher.matches()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
See the Java demo
The (\\p{Alnum}+?)([0-9]*) pattern is used in the .matches() method (to require a full string match) and matches and captures into Group 1 one or more alphanumeric chars, as few as possible (+? is a lazy quantifier), and captures into Group 2 any zero or more digits.
Note that \\p{Alnum} can be replaced with a more explicit [a-zA-Z0-9].

Related

Regex to find a given number of characters after last underscore

I need to find two characters after the last underscore in given filename.
Example string:
sample_filename_AB12123321.pdf
I am using [^_]*(?=\.pdf), but it finds all the characters after the underscore like AB12123321.
I need to find the first two characters AB only.
Moreover, there is no way to access the code, I can only modify the regex pattern.
If you want to solve the problem using a regex you may use:
(?<=_)[^_]{2}(?=[^_]*$)
See regex demo.
Details
(?<=_) - an underscore must appear immediately to the left of the current position
[^_]{2} - any 2 chars other than underscore (this is the whole match; the pattern has no capturing group)
(?=[^_]*$) - immediately to the right of the current position, there must appear any 0+ chars other than underscore and then an end of string.
See the Java demo:
String s = "sample_filename_AB12123321.pdf";
Pattern pattern = Pattern.compile("(?<=_)[^_]{2}(?=[^_]*$)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(0));
}
Output: AB.

Regex to search multiple numbers - Numbers within number - Same pattern [duplicate]

Is there a regular expression that will capture all instances of an expression, regardless of whether or not they overlap?
E.g. in /abc/def/ghi if I want to capture all strings beginning with /. The regex (/.*) only returns the entire string, but I'd want it to match on /def/ghi and /ghi as well.
Sure, match an empty string and place a look-ahead after it that captures /.* in a capturing group:
Matcher m = Pattern.compile("(?=(/.*))").matcher("/abc/def/ghi");
while(m.find()) {
System.out.println(m.group(1));
}
would print:
/abc/def/ghi
/def/ghi
/ghi
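The same lookahead trick should carry over to overlapping runs of digits inside a number. A minimal sketch, assuming you want every 3-digit window (the helper name findAll and the \d{3} pattern are just for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class OverlappingMatches {
    // Collects group 1 of every match. The regex is expected to wrap its
    // capturing group in a lookahead, so each match consumes no input and
    // the next search starts one character further on.
    static List<String> findAll(String lookaheadRegex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(lookaheadRegex).matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("(?=(/.*))", "/abc/def/ghi")); // [/abc/def/ghi, /def/ghi, /ghi]
        System.out.println(findAll("(?=(\\d{3}))", "12345"));     // [123, 234, 345]
    }
}
```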

Regex: How to detect a pattern if an undesired sub pattern comes before the pattern?

I'm new to regex and I'm trying to use Java to detect a sequence of either: lowercase, uppercase, or digits, but not JUST digits separated by periods.
Restriction: No consecutive periods.
The sample String I have is: ###951.324.1###foo1.bar2.123proccess.this.subString
I currently have the following regex: ((\p{Alnum})+\.)+(\p{Alnum})+
I'm trying to have the pattern recognize foo1.bar2.123proccess.this.subString but my regex gives me 951.324.1 since it's a sub-pattern of the pattern I defined.
How would I go about detecting the subString foo1.bar2.123proccess.this.subString
I would imagine the general nature would be: The entire returned String should have at least 1 lowercase or uppercase char, but I'm hopelessly confused on how I would detect that in the String.
[a-zA-Z\d.]*[a-zA-Z][a-zA-Z\d.]*
This can be split into 3 parts:
[a-zA-Z\d.]* // optional sequence of letters/numbers/dots
[a-zA-Z] // MUST have a letter
[a-zA-Z\d.]* // optional sequence of letters/numbers/dots
Basically, "sandwiching" things that are required in optional things.
Try it here: https://regex101.com/r/VT4t2x/1
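For what it's worth, here is a small sketch of the "sandwich" pattern run from Java against the sample string in the question (class and method names are made up). Keep in mind that this pattern does not by itself reject consecutive periods, so that restriction would need separate handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class LetterSandwich {
    // Optional letters/digits/dots, one mandatory letter, optional letters/digits/dots.
    private static final Pattern AT_LEAST_ONE_LETTER =
            Pattern.compile("[a-zA-Z\\d.]*[a-zA-Z][a-zA-Z\\d.]*");

    // Returns the first run that contains at least one letter, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = AT_LEAST_ONE_LETTER.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "###951.324.1###foo1.bar2.123proccess.this.subString";
        // The digits-only run 951.324.1 is skipped because it contains no letter.
        System.out.println(firstMatch(s)); // foo1.bar2.123proccess.this.subString
    }
}
```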
You may use
String rx = "\\d+(?:\\.\\d+)+|(\\p{Alnum}+(?:\\.\\p{Alnum}+)+)";
See the regex demo (pattern adjusted since regex101 does not support Java POSIX character class syntax)
The point is to match and skip dot-separated digit chunks, and only match and capture what you need. See Java demo:
String s = "###951.324.1###abc.123";
String rx = "\\d+(?:\\.\\d+)+|(\\p{Alnum}+(?:\\.\\p{Alnum}+)+)";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
if (matcher.group(1) != null) {
System.out.println(matcher.group(1));
}
} // => abc.123

Regex expression to capture only words without numbers or symbols

I need some regex that given the following string:
"test test3 t3st test: word%5 test! testing t[st"
will match only words in a-z chars:
Should match: test testing
Should not match: test3 t3st test: word%5 test! t[st
I have tried ([A-Za-z])\w+ but word%5 should not be a match.
You may use
String patt = "(?<!\\S)\\p{Alpha}+(?!\\S)";
See the regex demo.
It will match 1 or more letters that are enclosed with whitespace or start/end of string locations. Alternative pattern is either (?<!\S)[a-zA-Z]+(?!\S) (same as the one above) or (?<!\S)\p{L}+(?!\S) (if you want to also match all Unicode letters).
Details:
(?<!\\S) - a negative lookbehind that fails the match if there is a non-whitespace char immediately to the left of the current location
\\p{Alpha}+ - 1 or more ASCII letters (same as [a-zA-Z]+, but if you use a Pattern.UNICODE_CHARACTER_CLASS modifier flag, \p{Alpha} will be able to match Unicode letters)
(?!\\S) - a negative lookahead that fails the match if there is a non-whitespace char immediately to the right of the current location.
See a Java demo:
String s = "test test3 t3st test: word%5 test! testing t[st";
Pattern pattern = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(0));
}
Output: test and testing.
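To illustrate the remark about Pattern.UNICODE_CHARACTER_CLASS, here is a short sketch comparing the same pattern with and without the flag. The input string and the helper name are made up for the example; the non-ASCII letter is written as a Unicode escape to stay independent of source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class UnicodeWords {
    // Returns every whitespace-delimited token made up only of \p{Alpha} chars,
    // compiled with the given Pattern flags (0 for none).
    static List<String> words(String input, int flags) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)", flags).matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "\u00fcber test"; // u-umlaut followed by "ber test"
        // Without the flag \p{Alpha} is ASCII-only, so only "test" qualifies.
        System.out.println(words(s, 0).size());                               // 1
        // With the flag \p{Alpha} also covers non-ASCII letters.
        System.out.println(words(s, Pattern.UNICODE_CHARACTER_CLASS).size()); // 2
    }
}
```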
Try this
Pattern tokenPattern = Pattern.compile("[\\p{L}]+");
[\\p{L}]+ matches each run of one or more letters
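One thing to watch with a bare letter class: it has no boundary checks, so it also picks up the letter runs inside mixed tokens such as test3 or t[st. A quick sketch comparing it with the lookaround version on the question's string (the helper name collect is made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class LetterRuns {
    // Returns the full text of every match of regex in input.
    static List<String> collect(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";
        // No boundaries: every run of letters matches, even inside test3 or t[st.
        System.out.println(collect("[\\p{L}]+", s));
        // [test, test, t, st, test, word, test, testing, t, st]

        // Whitespace boundaries: only tokens made up entirely of letters match.
        System.out.println(collect("(?<!\\S)\\p{L}+(?!\\S)", s));
        // [test, testing]
    }
}
```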

Regex to find integer or decimal from a string in java in a single group?

I am trying (\d+|\d+\.\d+) on this sample string:
Oats 124 0.99 V 1.65
but it is giving me decimal number in different groups when I am using pattern matcher classes in Java.
I want my answers in a single group.
You don't need separate patterns for integer and floating point numbers. Just make the decimal part optional and you can get both types of numbers from a single group.
(\d+(?:\.\d+)?)
Use the above pattern and get the numbers from group index 1.
DEMO
Code:
String s = "Oats 124 0.99 V 1.65";
Pattern regex = Pattern.compile("(\\d+(?:\\.\\d+)?)");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
124
0.99
1.65
Pattern explanation:
() Capturing group.
\d+ Matches one or more digits.
(?:) Non-capturing group.
(?:\.\d+)? Matches a dot followed by one or more digits. The ? after the non-capturing group makes the whole group optional.
OR
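Your regex will also work, but only if you swap the order of the alternatives, because alternation tries the left branch first and \d+ alone already matches the integer part of a decimal: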
(\d+\.\d+|\d+)
DEMO
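To see why the order of the alternatives matters, here is a small sketch that runs the three variants on the sample string (the helper name collect is made up for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this illustration only.
class NumberGroups {
    // Returns group 1 of every match of regex in input.
    static List<String> collect(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "Oats 124 0.99 V 1.65";
        // Integer branch first: \d+ succeeds before the decimal branch is tried.
        System.out.println(collect("(\\d+|\\d+\\.\\d+)", s)); // [124, 0, 99, 1, 65]
        // Decimal branch first: whole decimals are kept together.
        System.out.println(collect("(\\d+\\.\\d+|\\d+)", s)); // [124, 0.99, 1.65]
        // Optional fraction: same result with a single branch.
        System.out.println(collect("(\\d+(?:\\.\\d+)?)", s)); // [124, 0.99, 1.65]
    }
}
```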
Try this pattern:
\d+(?:\.\d+)?
Edit:
\d+ Match 1 or more digits
(?: Open non-capturing group
\. A literal '.' character
\d+ 1 or more digits
)? Close non-capturing group and make it optional
The question is not entirely clear, but the first problem I see is that . is a special character in regex meaning any character. You need to escape it with a backslash, as \. (written \\. inside a Java string literal). There are lots of regex cheat sheets out there, for example the JavaScript Regex Cheatsheet.
(\d+|\d+\.\d+)
