Regex to match words of a certain length - java

I would like to know the regex to match words such that the words have a maximum length.
For example, if a word is at most 10 characters long, the regex should match; if the word is longer than 10 characters, the regex should not match.
I tried
^(\w{10})$
but that only gives me matches when the word is exactly 10 characters long. If the word is longer than 10 characters, it still matches, but only the first 10 characters.

I think you want \b\w{1,10}\b. The \b matches a word boundary.
Of course, you could also drop the \b anchors and use ^\w{1,10}$ instead. This will match a word of at most 10 characters as long as it's the only content of the string. I think this is what you were doing before.
Since it's Java, you'll actually have to escape the backslashes: "\\b\\w{1,10}\\b". You probably knew this already, but it's gotten me before.
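A minimal sketch of the difference between the two patterns above. The class and method names (MaxLengthWords, wordsUpTo, isSingleWordUpTo) are made up for illustration and are not part of the original answer. Note that String.matches already requires the whole string to match, so the ^ and $ anchors are left out there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaxLengthWords {
    // Collects every whole word of 1..max word characters found in the text.
    static List<String> wordsUpTo(String text, int max) {
        Pattern p = Pattern.compile("\\b\\w{1," + max + "}\\b");
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // True only when the entire string is a single word of 1..max word characters.
    static boolean isSingleWordUpTo(String s, int max) {
        return s.matches("\\w{1," + max + "}");
    }

    public static void main(String[] args) {
        // The 15-letter word is skipped entirely instead of being cut to 10 characters.
        System.out.println(wordsUpTo("short extraordinarily ok", 10)); // [short, ok]
        System.out.println(isSingleWordUpTo("short", 10));             // true
        System.out.println(isSingleWordUpTo("extraordinarily", 10));   // false
    }
}
```

The word-boundary form suits searching inside longer text, while the whole-string form suits validating a single input value.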

^\w{0,10}$ # allows words of up to 10 characters.
^\w{5,}$ # allows words of more than 4 characters.
^\w{5,10}$ # allows words of between 5 and 10 characters.

Length of characters to be matched.
{n,m} n <= length <= m
{n} length == n
{n,} length >= n
By default, the engine matches this pattern greedily. For example, if the input is 123456789, \d{2,5} will match 12345, the longest allowed length of 5.
If you want the engine to stop as soon as the minimum length of 2 is matched, use the lazy form \d{2,5}?
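The greedy and lazy behavior described above can be checked with a short sketch. The helper name firstMatch is hypothetical; it simply returns the first match that Matcher.find() reports.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLazy {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d{2,5}", "123456789"));  // greedy: 12345
        System.out.println(firstMatch("\\d{2,5}?", "123456789")); // lazy: 12
    }
}
```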

Method 1
Word boundaries would work perfectly here, such as with:
\b\w{3,8}\b
\b\w{2,}
\b\w{,10}\b
\b\w{5}\b
Java
Some languages, such as Java and C++, require the backslashes to be doubled inside string literals:
\\b\\w{3,8}\\b
\\b\\w{2,}
\\b\\w{,10}\\b
\\b\\w{5}\\b
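PS: \\b\\w{,10}\\b may not work in all languages or flavors. Java's java.util.regex rejects {,10} with a PatternSyntaxException, so write {0,10} or {1,10} there instead.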
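A small sketch for checking whether a given quantifier form is accepted by Java's regex engine. The class and method names are illustrative only; the check relies on Pattern.compile throwing PatternSyntaxException for syntax it does not support.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class QuantifierSyntaxCheck {
    // Returns true if java.util.regex accepts the pattern, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\b\\w{,10}\\b"));  // false: no lower bound given
        System.out.println(compiles("\\b\\w{0,10}\\b")); // true
        System.out.println(compiles("\\b\\w{1,10}\\b")); // true
    }
}
```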
Test 1
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "\\b\\w{3,8}\\b";
final String string = "words with length three to eight";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
}
}
}
Output 1
Full match: words
Full match: with
Full match: length
Full match: three
Full match: eight
Method 2
Another good-to-know method is to use negative lookarounds:
(?<!\w)\w{3,8}(?!\w)
(?<!\w)\w{2,}
(?<!\w)\w{,10}(?!\w)
(?<!\w)\w{5}(?!\w)
Java
(?<!\\w)\\w{3,8}(?!\\w)
(?<!\\w)\\w{2,}
(?<!\\w)\\w{,10}(?!\\w)
(?<!\\w)\\w{5}(?!\\w)
Test 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "(?<!\\w)\\w{1,10}(?!\\w)";
final String string = "words with length three to eight";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
}
}
}
Output 2
Full match: words
Full match: with
Full match: length
Full match: three
Full match: to
Full match: eight

I was looking for the same regex, but I also wanted to allow special characters and blank spaces. Here is the regex for that:
^[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{0,10}$

Simple, complete, and tested Java code for finding words of a certain length n:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The statements below go inside a method.
int n = 10;
// Match whole words of exactly n word characters.
String regex = "\\b\\w{" + n + "}\\b";
String str = "Hello, this is a test 1234567890";
ArrayList<String> words = new ArrayList<>();
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
words.add(matcher.group(0));
}
System.out.println(words); // prints [1234567890]
For more explanations and different options, see the other answers.

I liked Pardeep's answer, but I needed whole-word bounds in a string/title that can be any messed-up string an advertising department can think up.
\b\w([A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{1,22})\b
This should iterate through a string (tested in Notepad++) and get the largest group of words in the range, i.e. 1 to 22 characters here, without splitting mid-word.
Here was the final command for me in Python to add some LFs:
name = re.sub(r"\b(\w[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{1,22})\b","\\\1\\\n",name)

Related

RegEx for capturing special chars

I am trying to replace a string using a regular expression. What I need, basically, is to convert a compound assignment such as:
k*=i
into
k=k+i
In my example:
jregex.Pattern p=new jregex.Pattern("([a-z]|[A-Z])([a-z]|[A-Z]|\\d)*[\\+|\\*|\\-|\\/][=]([a-z]|[A-Z])*([a-z]|[A-Z]|\\d)");
Replacer r= new Replacer(p,"1=$1,2=$2,3=$3,4=$4,5=$5,6=$6,7=$7,8=$8");
String result=r.replace("k*=i");
The regex seems to not extract the special chars.
(in this example: +, -, *, /, =)
So what I get as result is:
1=k,2=,3=,4=i,5=,6=,7=,8=
(I can extract only the k & i)
How do I solve this problem?
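Here, we can design an expression similar to:
(.+)[*+/-]=(.+)
where we capture our k and i using the two capturing groups at the start and end:
(.+)
The hyphen is placed last in the character class so that it is a literal - rather than part of a range such as +-/, which would also match , and .
We can add more boundaries, if we wish, such as start and end anchors:
^(.+)[*+/-]=(.+)$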
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The hyphen comes last in the class so it is a literal, not the range +-/
final String regex = "(.+)[*+/-]=(.+)";
final String string = "k*=i\n"
+ "apple*=orange";
final String subst = "$1=$1+$2";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
You could use 3 capturing groups and capture the operator (*, +, / or -) in a character class.
([a-zA-Z])([*+/-])=([a-zA-Z])
That will match:
([a-zA-Z]) Capture group 1, match a-z A-Z
([*+/-]) Capture group 2, match * + / -
= Match literally
([a-zA-Z]) Capture group 3, match a-z A-Z
And replace with:
$1=$1$2$3
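As a rough illustration of the three-group pattern and replacement above, here is a sketch using the standard String.replaceAll. The class and method names are hypothetical, and, like the pattern itself, it only handles single-letter operands.

```java
final class CompoundAssignment {
    // Rewrites "x op= y" as "x=x op y" for single-letter operands,
    // keeping the captured operator in the result.
    static String expand(String statement) {
        return statement.replaceAll("([a-zA-Z])([*+/-])=([a-zA-Z])", "$1=$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(expand("k*=i")); // k=k*i
        System.out.println(expand("a-=b")); // a=a-b
    }
}
```

Because the operator is captured in group 2, the same replacement string works for all four operators.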

Regular expression to split between pipes except in brackets

I have the following text line:
|random|[abc|www.abc.org]|1024|
I would like to split these into 3 parts with a regular expression
random
[abc|www.abc.org]
1024
Currently the following result is achieved with expression \|
random
[abc
www.abc.org]
1024
My problem is that I cannot exclude the pipe symbol in the middle column surrounded by the brackets [].
If you have to use split, you can use the regex
\|(?=$|[^]]+\||\[[^]]+\]\|)
https://regex101.com/r/7OxmiY/1
It will match a pipe, then lookahead for either:
$, the end of the string, so that the final | is split on, or
[^]]+\|, non-] characters until a pipe is reached, ensuring that pipes inside []s will not be split upon, or
\[[^]]+\]\| - Same as above, except with literal [ and ]s surrounding the pattern
In Java:
String input = "|random|[abc|www.abc.org]|[test]|1024|";
String[] output = input.split("\\|(?=$|[^]]+\\||\\[[^]]+\\]\\|)");
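A self-contained sketch of the split approach with all three lookahead alternatives. The names PipeSplit and fields are made up for illustration. Two choices here are assumptions of this sketch rather than part of the answer: the closing bracket is escaped inside the character class ([^\\]]), and empty fields are filtered out, because splitting on the leading pipe produces an empty first element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PipeSplit {
    // Splits on pipes that are not inside [...] and drops empty fields
    // (the leading pipe would otherwise yield an empty first element).
    static List<String> fields(String line) {
        String[] parts = line.split("\\|(?=$|[^\\]]+\\||\\[[^\\]]+\\]\\|)");
        return Arrays.stream(parts)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fields("|random|[abc|www.abc.org]|1024|"));
        // [random, [abc|www.abc.org], 1024]
        System.out.println(fields("|random|[abc|www.abc.org]|[test]|1024|"));
        // [random, [abc|www.abc.org], [test], 1024]
    }
}
```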
You can use the following code:
final String regex = "(?<=\\|)\\[?[\\w.]+\\|?[\\w.]+\\]?(?=\\|)";
final String string = "|random|[abc|www.abc.org]|[test]|1024|";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
}
Output:
Full match: random
Full match: [abc|www.abc.org]
Full match: [test]
Full match: 1024
See here at regex101: https://regex101.com/r/Fcb3Wx/1

Find ALL matches of a regex pattern in Java - even overlapping ones [duplicate]

I have a String of the form:
1,2,3,4,5,6,7,8,...
I am trying to find all substrings in this string that contain exactly 4 digits. For this I have the regex [0-9],[0-9],[0-9],[0-9]. Unfortunately when I try to match the regex against my String, I never obtain all the substrings, only a part of all the possible substrings. For instance, in the example above I would only get:
1,2,3,4
5,6,7,8
although I expect to get:
1,2,3,4
2,3,4,5
3,4,5,6
...
How would I go about finding all matches corresponding to my regex?
For info, I am using Pattern and Matcher to find the matches:
Pattern pattern = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
Matcher matcher = pattern.matcher(myString);
List<String> matches = new ArrayList<String>();
while (matcher.find())
{
matches.add(matcher.group());
}
By default, successive calls to Matcher.find() start at the end of the previous match.
To search from a specific location instead, pass find() a start index that is one character past the start of the previous match.
In your case probably something like:
while (matcher.find(matcher.start()+1))
This works fine:
Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
public void test(String[] args) throws Exception {
String test = "0,1,2,3,4,5,6,7,8,9";
Matcher m = p.matcher(test);
if(m.find()) {
do {
System.out.println(m.group());
} while(m.find(m.start()+1));
}
}
printing
0,1,2,3
1,2,3,4
...
If you are looking for a pure regex based solution then you may use this lookahead based regex for overlapping matches:
(?=((?:[0-9],){3}[0-9]))
Note that your matches are available in captured group #1
Code:
final String regex = "(?=((?:[0-9],){3}[0-9]))";
final String string = "0,1,2,3,4,5,6,7,8,9";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
0,1,2,3
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
6,7,8,9
Some sample code without regex (since a regex does not seem necessary to me here, and I would assume it to be slower in this case). As written, it only works as long as each item is exactly 1 character long.
String s = "a,b,c,d,e,f,g,h";
// Each window of 4 items spans 7 characters; the start advances by 2 (one item plus its comma).
for (int i = 0; i + 7 <= s.length(); i += 2) {
System.out.println(s.substring(i, i + 7));
}
Output for this string:
a,b,c,d
b,c,d,e
c,d,e,f
d,e,f,g
e,f,g,h
As @OldCurmudgeon pointed out, find() by default starts looking from the end of the previous match. To position it right after the first matched element, make the first matched element a capturing group and use its end index:
Pattern pattern = Pattern.compile("(\\d,)\\d,\\d,\\d");
Matcher matcher = pattern.matcher("1,2,3,4,5,6,7,8,9");
List<String> matches = new ArrayList<>();
int start = 0;
while (matcher.find(start)) {
start = matcher.end(1);
matches.add(matcher.group());
}
System.out.println(matches);
results in
[1,2,3,4, 2,3,4,5, 3,4,5,6, 4,5,6,7, 5,6,7,8, 6,7,8,9]
This approach also works when each element is longer than one digit, provided the pattern is adjusted accordingly (for example, \d+ instead of \d).
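A sketch of the same capturing-group technique adapted to multi-digit numbers, assuming \d is replaced by \d+ throughout. The class and method names are illustrative and not from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OverlappingRuns {
    // Finds every run of four comma-separated numbers, including overlapping runs.
    static List<String> windowsOfFour(String csv) {
        // Group 1 is the first number plus its comma; the next search starts right after it.
        Pattern pattern = Pattern.compile("(\\d+,)\\d+,\\d+,\\d+");
        Matcher matcher = pattern.matcher(csv);
        List<String> matches = new ArrayList<>();
        int start = 0;
        while (matcher.find(start)) {
            start = matcher.end(1);
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(windowsOfFour("10,20,30,40,50,60"));
        // [10,20,30,40, 20,30,40,50, 30,40,50,60]
    }
}
```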

Java Regular Expression to check for fixed length and more

I am not even sure if regular expressions are the best way to do this. Here is the requirement on a string:
The length must be 13 characters.
The first 2 and last 2 characters are always letters.
Characters 3 to 11 are numeric.
Please suggest whether a regular expression is the best way to do this, and what the regular expression would look like to check such a thing.
Regards
Akhil
Use e.g.
"^[a-z]{2}[0-9]{9}[a-z]{2}$"
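The square brackets say what is allowed; 'a-z' means lowercase letters from a to z. The curly braces say how many must be there. ^ means no characters before this, and $ means no characters after. The usage below compiles the pattern with Pattern.CASE_INSENSITIVE, so uppercase letters are accepted as well.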
Usage:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatcherExample {
public static void main(String[] args) {
String text = "aa123456789bb";
String patternString = "^[a-z]{2}[0-9]{9}[a-z]{2}$";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
boolean matches = matcher.matches();
System.out.println("Matches: " + matches);
}
}

Regex to replace a repeating string pattern

I need to replace a repeated pattern within a word with each basic construct unit. For example
I have the string "TATATATA" and I want to replace it with "TA". Also I would probably replace more than 2 repetitions to avoid replacing normal words.
I am trying to do it in Java with replaceAll method.
I think you want this (works for any length of the repeated string):
String result = source.replaceAll("(.+)\\1+", "$1");
Or alternatively, to prioritize shorter matches:
String result = source.replaceAll("(.+?)\\1+", "$1");
It first matches a group of characters, and then the same group again one or more times (using a back-reference within the pattern itself). I tried it and it seems to do the trick.
Example
String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";
System.out.println(source.replaceAll("(.+?)\\1+", "$1"));
// HEY dude what's up? Trolo ye .0
You are better off using a precompiled Pattern here rather than String.replaceAll(), which recompiles the regex on every call. For instance:
private static final Pattern PATTERN
= Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");
//...
final Matcher m = PATTERN.matcher(input);
ret = m.replaceAll("$1");
edit: example:
public static void main(final String... args)
{
System.out.println("TATATA GHRGHRGHRGHR"
.replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}
This prints:
TA GHR
Since you asked for a regex solution:
(\\w)(\\w)(\\1\\2){2,}
(\w)(\w): matches a pair of consecutive word characters ((.)(.) would catch any consecutive pair of characters of any type), storing them in capturing groups 1 and 2. (\\1\\2) matches when the characters in those groups are repeated immediately afterward, and {2,} requires that repetition to occur two or more times ({2,10} would require between two and ten repetitions).
String s = "hello TATATATA world";
Pattern p = Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");
Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group());
//prints "TATATATA"
