Regex expression to capture only words without numbers or symbols - java

I need some regex that given the following string:
"test test3 t3st test: word%5 test! testing t[st"
will match only words in a-z chars:
Should match: test testing
Should not match: test3 t3st test: word%5 test! t[st
I have tried ([A-Za-z])\w+ but word%5 should not be a match.

You may use
String patt = "(?<!\\S)\\p{Alpha}+(?!\\S)";
See the regex demo.
It matches one or more letters that are bounded on both sides by whitespace or by the start/end of the string. Alternatives are (?<!\S)[a-zA-Z]+(?!\S) (equivalent to the pattern above) or (?<!\S)\p{L}+(?!\S) (if you also want to match all Unicode letters).
Details:
(?<!\\S) - a negative lookbehind that fails the match if there is a non-whitespace char immediately to the left of the current location
\\p{Alpha}+ - 1 or more ASCII letters (same as [a-zA-Z]+, but if you use a Pattern.UNICODE_CHARACTER_CLASS modifier flag, \p{Alpha} will be able to match Unicode letters)
(?!\\S) - a negative lookahead that fails the match if there is a non-whitespace char immediately to the right of the current location.
See a Java demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "test test3 t3st test: word%5 test! testing t[st";
Pattern pattern = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
Output: test and testing.
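The ASCII and Unicode variants mentioned above can be compared side by side. This is only a minimal sketch: the class name WholeWordDemo and the helper findAll are made up for illustration, and the accented word at the end of the input is an extra added here to show the difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo {
    // ASCII letters only, bounded by whitespace or the string edges.
    static final Pattern ASCII = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)");
    // Any Unicode letter, same boundaries.
    static final Pattern UNICODE = Pattern.compile("(?<!\\S)\\p{L}+(?!\\S)");

    // Hypothetical helper: collects every match of the pattern in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The last word is "caf" + U+00E9, written as an escape to keep the source ASCII.
        String s = "test test3 t3st test: word%5 test! testing t[st caf\u00e9";
        System.out.println(findAll(ASCII, s));   // two matches: test, testing
        System.out.println(findAll(UNICODE, s)); // three matches: the accented word is included
    }
}
```

The ASCII pattern rejects the accented word as a whole rather than returning "caf", because the lookahead sees a non-whitespace character right after the last ASCII letter.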

Try this:
Pattern tokenPattern = Pattern.compile("[\\p{L}]+");
[\\p{L}]+ matches each run of one or more Unicode letters.
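For comparison, here is a small sketch of what this pattern returns on the question's input (the class and method names are hypothetical). Since the pattern has no boundary checks, it also returns the letter runs inside mixed tokens such as test3 and t[st, so it may need extra filtering for the original requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRunsDemo {
    // Hypothetical helper: every maximal run of Unicode letters in the input.
    static List<String> letterRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = Pattern.compile("[\\p{L}]+").matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";
        // Prints [test, test, t, st, test, word, test, testing, t, st]
        System.out.println(letterRuns(s));
    }
}
```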

Related

Split string using multiple patterns, where second pattern matches smaller parts of the first

I'm reading special "formatting codes" in a string and am trying to split the string so that I have those formatting codes and the string's text separated.
There are two "types" of formatting codes: "Encoded" hex colors: §x§7§3§7§5§f§f and other codes in the format of §r.
Given the example string: §x§7§3§7§5§f§f§ltest1 §rtest2
I need the larger pattern split as a whole, and then the smaller ones. I can do what I want on those patterns separately, but am having trouble combining them into a single regex. Because the second pattern matches pieces of the first pattern, it's just splitting everything into smaller groups.
I'm trying this:
for (String substr : "§x§7§3§7§5§f§f§ltest1 §rtest2".split("((?<=(§x(§[0-9a-f]){6}))|(?<=§[0-9a-z])|(?=§[0-9a-z]))")) {
System.out.println(substr);
}
My expected output is:
§x§7§3§7§5§f§f
§l
test1
§r
test2
My actual output is:
§x
§7
§3
§7
§5
§f
§f
§l
test1
§r
test2
When I split the expressions up into different split tests, they work, they're just not working together.
Instead of splitting, you could just use this simplified regex for matching:
§x(?:§[0-9a-f]){6}|§[0-9a-z]|[^§\s]+
RegEx Details:
§x(?:§[0-9a-f]){6}: Match §x followed by six §-prefixed hex digits (the encoded hex color)
|: OR
§[0-9a-z]: Match § followed by a single lowercase letter or digit
|: OR
[^§\s]+: Match 1+ non-whitespace and non-§ characters
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "§x(?:§[0-9a-f]){6}|§[0-9a-z]|[^§\\s]+";
final String string = "§x§7§3§7§5§f§f§ltest1 §rtest2";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
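If you also need to know which alternative produced each token, one option is to give the three alternatives names. The following is just a sketch under that assumption: the group names hex, code and text, the class FormatCodeTokenizer, and the KIND:value output format are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatCodeTokenizer {
    // \u00A7 is the section sign, escaped so the source stays pure ASCII.
    // Same three alternatives as above, wrapped in named groups.
    static final Pattern TOKEN = Pattern.compile(
            "(?<hex>\u00A7x(?:\u00A7[0-9a-f]){6})"
            + "|(?<code>\u00A7[0-9a-z])"
            + "|(?<text>[^\u00A7\\s]+)");

    // Hypothetical helper: returns tokens labelled as HEX, CODE or TEXT.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // A named group that did not take part in the match returns null.
            String kind = m.group("hex") != null ? "HEX"
                    : m.group("code") != null ? "CODE" : "TEXT";
            tokens.add(kind + ":" + m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "\u00A7x\u00A77\u00A73\u00A77\u00A75\u00A7f\u00A7f\u00A7ltest1 \u00A7rtest2";
        for (String token : tokenize(s)) {
            System.out.println(token);
        }
    }
}
```

The order of the alternatives matters: the hex color alternative comes first so that it wins over the shorter single-code alternative at the same position.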
You can use the following regex, which starts with a literal space character:
 ?((?:§[^§])(?=[^§])|[^§ ]{2,})
See it working here
How it works:
" ?" (a space followed by ?) optionally matches a space character
((?:§[^§])(?=[^§])|[^§ ]{2,}) captures either of the following:
(?:§[^§])(?=[^§]) matches the following:
(?:§[^§]) matches § followed by any character except §
(?=[^§]) is a lookahead ensuring that the next character exists and is not § (similar to (?!§), except that (?!§) also succeeds at the end of the string)
[^§ ]{2,} matches any character except § or space, two or more times
With the substitution of \n$1
Result:
§x§7§3§7§5§f§f
§l
test1
§r
test2
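In Java, the same substitution could be done with String.replaceAll. This is a rough sketch, not taken from the answer itself; the class SubstitutionDemo and the method onSeparateLines are hypothetical names.

```java
class SubstitutionDemo {
    // The section sign is written as the escape \u00A7 to keep the source pure ASCII.
    // The pattern starts with " ?" (an optional literal space).
    private static final String REGEX =
            " ?((?:\u00A7[^\u00A7])(?=[^\u00A7])|[^\u00A7 ]{2,})";

    // Hypothetical helper: moves each standalone code or text chunk to its own line.
    static String onSeparateLines(String input) {
        // "\n$1" inserts a newline followed by the captured chunk.
        return input.replaceAll(REGEX, "\n$1");
    }

    public static void main(String[] args) {
        String s = "\u00A7x\u00A77\u00A73\u00A77\u00A75\u00A7f\u00A7f\u00A7ltest1 \u00A7rtest2";
        System.out.println(onSeparateLines(s));
    }
}
```

Note that the hex color prefix is never matched here; it simply stays in place at the start of the string. Also, because of {2,}, a text chunk consisting of a single character would not be moved to its own line.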

Regex java a regular expression for extracting the first alphabetical characters

How can I extract the leading alphabetical characters in Java? For example, after applying a regex to the string "ABD123EZ13" I should get "ABD". Is this possible? I searched for a while and didn't find anything.
I found this regex:
String firstThreeCharacters = str.replaceAll("(?i)^[^a-z]*([a-z])[^a-z]*([a-z])[^a-z]*([a-z]).*$", "$1$2$3");
It extracts the first n characters, but it doesn't check whether the leading characters are alphabetical or not.
Other Examples:
"AAAA" => "AAAA"
"1231" => ""
"_abvbv" => ""
"abd_12df" => "abd"
You may use
String result = s.replaceFirst("(?s)\\P{L}.*", "");
See the regex demo
Details
(?s) - a Pattern.DOTALL modifier to make . match line break chars
\\P{L} - any char other than a Unicode letter
.* - any 0+ chars, up to the end of the string.
You do not need replaceAll since there will be only 1 replacement operation, replaceFirst is fine.
If you only need to handle ASCII letters, replace \\P{L} with \\P{Alpha}, which matches any char other than an ASCII letter.
Probably a matching approach will be easiest with ^\p{L}+ or ^\p{Alpha}+ patterns that match 1 or more letters from the start of the string only:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "abd_12df";
Pattern pattern = Pattern.compile("^\\p{L}+"); // or just Pattern.compile("^[a-zA-Z]+") to get the first one or more ASCII letters
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(0));
}
See the Java demo.
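Both approaches can be checked against the examples from the question. A minimal sketch, assuming an empty string is the desired result when there are no leading letters; the class LeadingLetters and its method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingLetters {
    private static final Pattern LEADING = Pattern.compile("^\\p{L}+");

    // Replacement approach: drop everything from the first non-letter onward.
    static String byReplace(String s) {
        return s.replaceFirst("(?s)\\P{L}.*", "");
    }

    // Matching approach: return the leading letters, or "" if there are none.
    static String byMatch(String s) {
        Matcher m = LEADING.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] samples = {"ABD123EZ13", "AAAA", "1231", "_abvbv", "abd_12df"};
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println("\"" + s + "\" => \"" + byReplace(s) + "\" / \"" + byMatch(s) + "\"");
        }
    }
}
```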

Regex + Java - how to capture trailing numbers and everything else

I'm trying to capture 2 things in a String "T3st12345"
I want to capture the trailing numbers ("12345") and also the name of the test "T3st".
This is what I have right now to match the trailing numbers with Java's Matcher class:
Pattern pattern = Pattern.compile("([0-9]*$)");
Matcher matcher = pattern.matcher("T3st12345");
but it returns "no match found".
How can I make this work for the trailing numbers and how do I capture the name of the test as well?
You can use this regex with 2 captured groups:
^(.*?)(\d+)$
RegEx Breakup:
^: Start
(.*?): Captured group #1 that matches zero or more of any character (lazy)
(\d+): Captured group #2 that matches one or more digits before End
$: End
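In Java this regex could be used roughly as follows. It is a sketch only; the class TrailingDigits and the method split are hypothetical names. Note that a Matcher has to attempt a match with find() or matches() before group() is called, otherwise group() throws an IllegalStateException with the message "No match found", which may be what happened in the question's snippet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingDigits {
    private static final Pattern NAME_AND_NUMBER = Pattern.compile("^(.*?)(\\d+)$");

    // Hypothetical helper: returns {name, number}, or null if the input does not end in digits.
    static String[] split(String input) {
        Matcher m = NAME_AND_NUMBER.matcher(input);
        if (!m.find()) {   // a match must be attempted before calling group()
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("T3st12345");
        System.out.println(parts[0]); // T3st
        System.out.println(parts[1]); // 12345
    }
}
```

The lazy (.*?) is what keeps the digit inside "T3st" in group 1: it grows one character at a time until the rest of the string is digits only.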
You may use the following regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("(\\p{Alnum}+?)([0-9]*)");
Matcher matcher = pattern.matcher("T3st12345");
if (matcher.matches()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
See the Java demo
The (\\p{Alnum}+?)([0-9]*) pattern is used in the .matches() method (to require a full string match) and matches and captures into Group 1 one or more alphanumeric chars, as few as possible (+? is a lazy quantifier), and captures into Group 2 any zero or more digits.
Note that \\p{Alnum} can be replaced with a more explicit [a-zA-Z0-9].

Java Regex match between string and a space

I have the following string in Java.
ActivityRecord{7615a77 u0 com.example.grano.example_project/.MainActivity t20}
I need to get the string MainActivity, i.e. the part between the /. and the space after the word.
So basically I'm looking for a regular expression able to catch something in the middle of given characters and a white space.
You can use the expression:
(?<=\/\.)\w+?(?=\s)
Broken down:
(?<=\/\.) - lookbehind for a literal / followed by a literal .
\w+? - one or more word characters (non-greedy)
(?=\s) - lookahead for a whitespace character
Test it here.
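A possible way to apply this expression in Java is sketched below. The class ActivityNameDemo and the method activityName are hypothetical; the forward slash needs no escaping in a Java regex, so it is written plainly here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActivityNameDemo {
    // Lookbehind for "/.", lazy word characters, lookahead for whitespace.
    private static final Pattern AFTER_SLASH_DOT = Pattern.compile("(?<=/\\.)\\w+?(?=\\s)");

    // Hypothetical helper: returns the name after "/.", or null if nothing matches.
    static String activityName(String record) {
        Matcher m = AFTER_SLASH_DOT.matcher(record);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String record = "ActivityRecord{7615a77 u0 com.example.grano.example_project/.MainActivity t20}";
        System.out.println(activityName(record)); // MainActivity
    }
}
```

Because both delimiters sit in lookarounds, the whole match is already the wanted text and no capture group is needed.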
Assuming the preceding text doesn't contain / and the text you want to isolate doesn't contain a space, you can use this:
str.replaceAll("^[^/]*/\\.([^ ]*).*$", "$1")
which matches from the start up to the first /, then requires /., captures everything from that point up to the first space, matches everything else, and replaces the whole string with the capture.
You can use a regex like /\.(.*?)\s with a Pattern and Matcher:
String str = ...;
String regex = "/\\.(.*?)\\s";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    // group 1 is the (.*?) part between '/.' and the whitespace
    System.out.println(matcher.group(1));
}
Output
MainActivity

Regex - How to recognize a String + white spaces + String

I need to recognize some pattern which goes like this:
[letters][some spaces][letters]
What I have done so far is this:
String regex = "[a-zA-Z]\\s+[a-zA-Z]";
The requirement says letters (plural):
[letters][some spaces][letters]
So you need to quantify the character class:
String regex = "[a-zA-Z]+\\s+[a-zA-Z]+";
[a-zA-Z]+ matches one or more letters. Here + is the quantifier that repeats [a-zA-Z] one or more times.
Whereas [a-zA-Z]\\s+[a-zA-Z] matches only a single letter before and after the whitespace.
If you want the entire string to follow this pattern, add anchors to the pattern as well:
String regex = "^[a-zA-Z]+\\s+[a-zA-Z]+$";
^ Anchors the regex at the start of the string.
$ Anchors the regex at the end of the string.
These anchors ensure that the first run of letters, [a-zA-Z]+, begins right at the start of the string (^), is followed by whitespace and then by letters again, and that this second run of letters is followed by the end of the string ($).
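The difference between the anchored and unanchored patterns can be seen in a short sketch like the one below. The class LettersSpacesLetters and its method names are invented for illustration.

```java
import java.util.regex.Pattern;

class LettersSpacesLetters {
    // Anchored: the whole string must be letters, whitespace, letters.
    private static final Pattern WHOLE = Pattern.compile("^[a-zA-Z]+\\s+[a-zA-Z]+$");
    // Unanchored: the shape may occur anywhere inside the string.
    private static final Pattern ANYWHERE = Pattern.compile("[a-zA-Z]+\\s+[a-zA-Z]+");

    static boolean isWhole(String s) {
        return WHOLE.matcher(s).find();
    }

    static boolean containsPair(String s) {
        return ANYWHERE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isWhole("hello   world"));      // true
        System.out.println(isWhole("hello world!"));       // false: trailing '!'
        System.out.println(containsPair("hello world!"));  // true: found inside the string
        System.out.println(isWhole("one two three"));      // false: three words
    }
}
```

If you call String.matches or Matcher.matches instead of find, the whole input has to match anyway, so the unanchored pattern behaves like the anchored one there.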
