How to match just 1 or 2 chars with regex - java

I want a regex that matches any word of 1 or 2 characters, for example: is, an, or, if, a.
I tried this:
int scount = 0;
String txt = "hello everyone this is just test aa ";
Pattern p2 = Pattern.compile("\\w{1,2}");
Matcher m2 = p2.matcher(txt);
while (m2.find()) {
    scount++;
}
but I got the wrong matches.

You probably want to use word boundary anchors:
Pattern p2 = Pattern.compile("\\b\\w{1,2}\\b");
These anchors match at the start/end of alphanumeric "words", that is, in positions before a \w character if there is no \w character before that, or after a \w character if there is no \w character after that.
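As a rough illustration of what the anchors change, the sketch below counts matches on the string from the question with and without \b. The class and method names (ShortWordCount, count) are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShortWordCount {
    // Counts how many times the regex finds a match in the text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String txt = "hello everyone this is just test aa ";
        // Without anchors, longer words are chopped into 1-2 character pieces.
        System.out.println(count("\\w{1,2}", txt));       // 15
        // With word boundaries, only whole words of 1 or 2 characters match.
        System.out.println(count("\\b\\w{1,2}\\b", txt)); // 2 (is, aa)
    }
}
```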

I think that you should be a bit more descriptive. Your current code returns 15 from the variable scount. That's not nothing.
If you want to count only the 1- or 2-letter words, leaving digits and underscores out of the count, I think that you would be better off with negative lookarounds:
Pattern.compile("(?i)(?<![a-z])[a-z]{1,2}(?![a-z])");
With the input hello everyone this is just 1 test aa, scount ends up as 2 (is and aa) and not 3 (is, 1, aa), which is what you would get when looking for only 1 or 2 consecutive \w between word boundaries.
Also, with hello everyone this is just test aa_, you get a count of 1 with \w (is), but 2 (is, aa) with the lookarounds.
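To make the comparison concrete, here is a small sketch that runs both patterns over the two inputs mentioned above and collects the matches. LetterWordFinder and findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterWordFinder {
    // 1 or 2 word characters (letters, digits, underscore) between word boundaries.
    static final Pattern WORD_CHARS = Pattern.compile("\\b\\w{1,2}\\b");
    // 1 or 2 letters with no letter directly before or after them.
    static final Pattern LETTERS_ONLY =
            Pattern.compile("(?i)(?<![a-z])[a-z]{1,2}(?![a-z])");

    // Collects every match of the pattern in the text, in order.
    static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String withDigit = "hello everyone this is just 1 test aa";
        String withUnderscore = "hello everyone this is just test aa_";
        System.out.println(findAll(WORD_CHARS, withDigit));        // [is, 1, aa]
        System.out.println(findAll(LETTERS_ONLY, withDigit));      // [is, aa]
        System.out.println(findAll(WORD_CHARS, withUnderscore));   // [is]
        System.out.println(findAll(LETTERS_ONLY, withUnderscore)); // [is, aa]
    }
}
```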

Related

Use regex to get 2 specific groups of substring

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00
Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it using regex each time.
I tried something like this https://regex101.com/r/4te0Lg/1 but it still does not work:
.Section(\d+(?:\.\d+)?).*/Jack/M
If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change, then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has 2 and only 2 decimal places.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example on how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher
import java.util.regex.*;
public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // The dot in the value is escaped so that it only matches a literal "."
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        Matcher m;
        String[] tokens = s.split(",(?=#)"); // split the sections into different strings
        for (String t : tokens) { // check every string that we got from the split
            m = p.matcher(t);
            if (m.matches()) { // if the string matches the pattern then print the capturing groups
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
            }
        }
    }
}
You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character if not directly followed by #Section and a digit
\bJack/M, Match the word Jack and /M,
\d+\h+ Match 1+ digits and 1+ spaces
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
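A minimal sketch of how that pattern could be used with Matcher.find() over the whole input. It assumes the sections sit on a single line, as in the string from the previous answer (by default the dot does not match line terminators), and the names JackSections and extract are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JackSections {
    // Tempered greedy token: the dot may not step onto the start of the next "#Section<digit>".
    private static final Pattern JACK = Pattern.compile(
            "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Maps each section number that contains Jack/M to its value, in input order.
    static Map<String, String> extract(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = JACK.matcher(s);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        System.out.println(extract(s)); // {250342=10.00, 251234=11.00}
    }
}
```

Unlike the split-then-match approach, this does not need the sections to be separated first; the lookahead keeps each match inside one section.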

Extracting words with - included upper lowercase not working for words it only extracts chars

I'm trying to extract several words from a string with a regex Matcher and Pattern. I spent some time on the regular expression below, but it doesn't work as expected: I'm able to extract the individual characters of the words I want, but not the entire words. Any help would be much appreciated.
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'";
        Pattern pattern = Pattern.compile("[((a-zA-Z1-9-0)/W)]");
        Matcher matcher = pattern.matcher(mebo);
        while (matcher.find()) {
            System.out.printf("Word is %s %n", matcher.group(0));
        }
    }
}
This is current output:
Word is 1 Word is 3 Word is 2 Word is 3 Word is 9 Word is 9 Word
is B Word is I Word is M Word is C Word is P Word is 1 Word is 2
Word is B Word is M Word is W Word is Q Word is - Word is C Word
is S Word is P Word is S Word is - Word is D Word is 1 Word is 0
Word is 1 Word is 9 Word is 2 Word is 2 Word is 9 Word is 2 Word
is 2 Word is 9
============
My expectation is to iterate entire words for example:
String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'"
word is 1323 word is 99BIMCP word is 1 word is 2 word is BMWQ-CSPS-D1
word is 0192 word is 29229
Judging from your regex, you want your matches to include letters, digits and -. You can use this:
`[\w-]+`
[\w-]+ - Matches one or more characters that are letters (a-z, A-Z), digits (0-9), _ or -.
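Plugged into the Matcher loop from the question, that pattern could look like the sketch below. WordExtractor and words are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // One or more word characters or hyphens in a row.
    private static final Pattern WORD = Pattern.compile("[\\w-]+");

    // Returns every run of word characters and hyphens found in the text.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'";
        for (String word : words(mebo)) {
            System.out.printf("Word is %s %n", word);
        }
    }
}
```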
The easiest solution here seems to be to drop the Matcher loop and just split the string instead. You want to allow digits, alphabetic characters and - in your words. Consider the following code:
for (String word : mebo.split("[^\\d\\w-]+")) {
    System.out.printf("Word is %s %n", word);
}
This should exhibit the desired behaviour. Note that without the + in the splitting pattern, this would generate empty strings between consecutive separator characters.
This splits the input string at every run of characters that are not allowed in your words, which is expressed with a negated character class.
I would suggest a regex split, followed by a regex replacement:
String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'";
String[] parts = mebo.split("\\s*,?\\s+");
for (String part : parts) {
    System.out.println(part.replaceAll("[']", ""));
}
1323
99BIMCP
1
2
BMWQ-CSPS-D1
0192
29229
The logic here is to split on whitespace, possibly including a comma separator. Then, we can do a regex replacement cleanup to remove stray characters such as single quotes. Double quotes and any other unwanted characters can easily be added to the character class used for replacement.
In general, regex alone may not suffice here, and you may need a parser to cover every edge case. Case in point, consider the following input line:
One, "Two or more", Three
My answer fails here, because it blindly splits on whitespace and does not know that whitespace inside a quoted string is not a separator. A regex would also fail here.

java regex minimum character not working

^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
This is the regex that should enforce the following conditions:
should start only with letters or digits,
contains letters, digits, dot and hyphen,
should not end with a hyphen.
It works for all conditions, but it fails when I try three-character inputs like
vu6
111
aaa
Validation only works properly from four characters onwards. Did I miss anything?
Reason why your regex doesn't work:
Breaking it into smaller pieces should help:
^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
[a-zA-Z1-9]: matches exactly one letter or digit (note that 1-9 leaves out 0)
[a-zA-Z1-9_\\.-]{2,64}: matches 2 to 64 characters, each a letter, a digit, "_", "." or "-"
[^\\.-]: matches exactly 1 character, which must not be "." or "-"
Together these pieces need at least 1 + 2 + 1 = 4 characters, which is why three-character inputs are rejected.
Solution:
You can use 2 simple regexes.
This answer assumes that the length of the string you want to match lies between 3 and 65 (both inclusive).
The first one validates the allowed characters and the length:
^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$
The second one checks that the string doesn't end with "." or "-":
[^\\.-]$
In Java:
Pattern pattern1 = Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");
Pattern pattern2 = Pattern.compile("[^\\.-]$");
Matcher m1 = pattern1.matcher(input);
// m2 must be built from pattern2, otherwise the ending is never checked
Matcher m2 = pattern2.matcher(input);
if (m1.find() && m2.find()) {
    System.out.println("found");
}
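For reference, a self-contained sketch of the two-pattern check, with the second matcher built from the second pattern. The class and method names (NameValidator, isValid) are assumptions made for this example, and the character classes are copied unchanged from the answer, so the digit 0 is still not accepted.

```java
import java.util.regex.Pattern;

class NameValidator {
    // Allowed characters and a total length of 3 to 65.
    private static final Pattern ALLOWED =
            Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");
    // The last character is neither '.' nor '-'.
    private static final Pattern GOOD_END = Pattern.compile("[^\\.-]$");

    // True only if both checks succeed.
    static boolean isValid(String input) {
        return ALLOWED.matcher(input).find() && GOOD_END.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"vu6", "111", "aaa", "ab", "abc-", "-abc"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With these two patterns the three-character inputs from the question pass, while ab (too short), abc- (ends with a hyphen) and -abc (starts with a hyphen) are rejected.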

What regex should I use to check a string only has numbers and 2 special characters ( - and , ) in Java?

Scenario: I want to check whether string contains only numbers and 2 predefined special characters, a dash and a comma.
My string contains numbers (0 to 9) and 2 special characters: a dash (-) defines a range and a comma (,) defines a sequence.
What I tried:
I tried the regex [0-9+-,]+, but it is not working as expected.
Possible inputs:
1-5
1,5
1-5,6
1,3,5-10
1-5,6-10
1,3,5-7,8,10
The regex should not accept these types of strings:
-----
1--4
,1,5
5,6,
5,4,-
5,6-
-5,6
Can anyone please help me create a regex for the above scenario?
You may use
^\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*$
Regex details:
^ - start of string
\d+ - 1 or more digits
(?:-\d+)? - an optional sequence of - and 1+ digits
(?:,\d+(?:-\d+)?)* - zero or more sequences of:
, - a comma
\d+(?:-\d+)? - same pattern as described above
$ - end of string.
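A short sketch of that pattern in Java, checked against some of the accepted and rejected inputs listed in the question. RangeListValidator and isValid are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

class RangeListValidator {
    // A number or range (e.g. 5 or 5-10), followed by zero or more comma-separated ones.
    private static final Pattern RANGE_LIST =
            Pattern.compile("^\\d+(?:-\\d+)?(?:,\\d+(?:-\\d+)?)*$");

    // matches() requires the whole input to fit the pattern.
    static boolean isValid(String input) {
        return RANGE_LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1-5", "1,3,5-7,8,10", "-----", "1--4", ",1,5", "5,6,", "5,6-", "-5,6"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```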
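Change your regex [0-9+-,]+ to [0-9,-]+. In the original character class, +-, is read as a range from + to , so the dash itself is never matched. Note that this only restricts which characters are allowed; it still accepts strings such as ----- or 5,6-.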
final String patternStr = "[0-9,-]+";
final Pattern p = Pattern.compile(patternStr);
String data = "1,3,5-7,8,10";
final Matcher m = p.matcher(data);
if (m.matches()) {
    System.out.println("SUCCESS");
} else {
    System.out.println("ERROR");
}

Mask mobile number in Java [duplicate]

I would like to mask the last 4 digits of the identity number (hkid)
A123456(7) -> A123***(*)
I can do this by below:
hkid.replaceAll("\\d{3}\\(\\d\\)", "***(*)")
However, is there a regular expression that matches each of the last 4 digits individually, so that every match can be replaced by "*"?
hkid.replaceAll(regex, "*")
Please help, thanks.
Jessie
Personally, I wouldn't do it with regular expressions:
char[] cs = hkid.toCharArray();
for (int i = cs.length - 1, d = 0; i >= 0 && d < 4; --i) {
    if (Character.isDigit(cs[i])) {
        cs[i] = '*';
        ++d;
    }
}
String masked = new String(cs);
This goes from the end of the string, looking for digit characters, which it replaces with a *. Once it's found 4 (or reaches the start of the string), it stops iterating, and builds a new string.
While I agree that a non-regex solution is probably the simplest and fastest, here's a regex to catch the last 4 digits regardless of whether there is a grouping or not: \d(?=(?:\D*\d){0,3}\D*$)
This expression is meant to match any digit that is followed by at most 3 more digits before hitting the end of the input.
A short breakdown of the expression:
\d matches a single digit
\D matches a single non-digit
(?=...) is a positive look-ahead: it has to match for the overall match to succeed, but the text it looks at isn't consumed
(?:...){0,3} is a non-capturing group with a quantity of 0 to 3 occurrences given.
$ matches the end of the input
So you could read the expression as follows: "match a single digit if it is followed by a sequence of 0 to 3 times any number of non-digits which are followed by a single digit and that sequence is followed by any number of non-digits and the end of the input" (sounds complicated, no?).
Some results when using input.replaceAll( "\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*" ):
input = "A1234567" -> output = "A123****"
input = "A123456(7)" -> output = "A123***(*)"
input = "A12345(67)" -> output = "A123**(**)"
input = "A1(234567)" -> output = "A1(23****)"
input = "A1234B567" -> output = "A123*B***"
As you can see in the last example the expression will match digits only. If you want to match letters as well either replace \d and \D with \w and \W (note that \w matches underscores as well) or use custom character classes, e.g. [02468] and [^02468] to match even digits only.
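A runnable sketch of the replaceAll call, wrapped in a helper so the listed results can be reproduced. LastFourMask and mask are names invented for this example.

```java
class LastFourMask {
    // Replaces each of the last 4 digits with '*', leaving all other characters alone.
    static String mask(String input) {
        return input.replaceAll("\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*");
    }

    public static void main(String[] args) {
        String[] samples = {"A1234567", "A123456(7)", "A12345(67)", "A1(234567)", "A1234B567"};
        for (String s : samples) {
            System.out.println(s + " -> " + mask(s));
        }
    }
}
```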
