Java-flavoured Regex : Match whole string if group is n chars - java

I'm trying to write a regex for a String validator. The string must be exactly 8 characters long and begin with a letter (lowercase or uppercase) or a digit. After that first character it may contain only letters (lowercase and uppercase), digits, or whitespace, and once a whitespace character appears, only whitespace may follow it.
For now, I have a pattern for the second part: [a-zA-Z0-9]{1,}\s*
I can't find a way to require that the whole match is exactly 8 characters long. I tried ^([a-zA-Z0-9]{1,}\s*){8}$ but it does not give the expected result.
Here are some test cases (with trailing whitespaces).
Valid :
9013
20130
89B
A5000000
Invalid :
9013
20130
90 90
123456789

There probably is a smart regex way to do it but you could also first check the length of the string:
input.length() == 8 && input.matches("[a-zA-Z0-9]+\\s*")
This is also probably more efficient than a complex regex.
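A minimal sketch of that length-plus-pattern check, run against a few of the question's test cases. The class and method names are hypothetical, and the padded samples assume the valid cases are filled with spaces up to 8 characters.

```java
class LengthThenPattern {
    // True when the input is exactly 8 chars long and consists of
    // letters/digits followed only by whitespace.
    static boolean isValid(String input) {
        return input.length() == 8 && input.matches("[a-zA-Z0-9]+\\s*");
    }

    public static void main(String[] args) {
        // Assumed samples: valid ones are padded with spaces to 8 characters.
        String[] samples = {"9013    ", "A5000000", "9013", "90 90   ", "123456789"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isValid(s));
        }
    }
}
```

Note that String.matches already requires the whole input to match, so no ^ or $ is needed here.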

You can use this lookahead based regex:
^[a-zA-Z0-9](?!.* [a-zA-Z0-9])[a-zA-Z0-9 ]{7}$
^[a-zA-Z0-9] matches an alpha-num char at start
(?!.* [a-zA-Z0-9]) is a negative lookahead that makes sure no space is followed by an alpha-num char anywhere later in the string.
[a-zA-Z0-9 ]{7}$ matches 7 chars containing alpha-num char or space.
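As a rough illustration, the lookahead pattern could be wrapped like this. The class name, method name, and sample inputs are assumptions for the sketch; the pattern only treats the literal space character as padding.

```java
import java.util.regex.Pattern;

class LookaheadValidator {
    // The lookahead-based pattern from the answer, compiled once.
    private static final Pattern EIGHT_CHARS =
        Pattern.compile("^[a-zA-Z0-9](?!.* [a-zA-Z0-9])[a-zA-Z0-9 ]{7}$");

    static boolean isValid(String input) {
        return EIGHT_CHARS.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Assumed samples, padded with plain spaces where needed.
        String[] samples = {"89B     ", "A5000000", "90 90   ", " 1234567", "20130"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isValid(s));
        }
    }
}
```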

Related

How to make a regex for decimal that has a defined length (comma included)

I am looking for a regex that matches a decimal with a length of 15 characters.
In the SWIFT documentation, the format is written as 3!a15d, where 3!a means [a-zA-Z]{3} and 15d means a decimal of 15 characters in length, comma included.
I tried the regex below :
([A-Z]{3}[0-9]{1,14}[,][0-9]{1})|([A-Z]{3}[0-9]{1,13}[,][0-9]{1,2})|([0-9]{1,12}[,][0-9]{1,3})|([0-9]{1,11}[,][0-9]{1,4})|([0-9]{1,10}[,][0-9]{1,5})|([0-9]{1,9}[,][0-9]{1,6})|([0-9]{1,8}[,][0-9]{1,7})|([0-9]{1,7}[,][0-9]{1,8})|([0-9]{1,6}[,][0-9]{1,9})|([0-9]{1,5}[,][0-9]{1,10})|([0-9]{1,4}[,][0-9]{1,11})|([0-9]{1,3}[,][0-9]{1,12})|([0-9]{1,2}[,][0-9]{1,13})|[0-9]{1}[,][0-9]{1,14}
But it didn't work.
Do you have any tips to help me?
You can use
^[a-zA-Z]{3}(?=[^,]*,[^,]*$)\d(?:,?\d){14}$
Details:
^ - start of string
[a-zA-Z]{3} - three ASCII letters
(?=[^,]*,[^,]*$) - exactly one comma must be present in the rest of the string
\d - a digit
(?:,?\d){14} - fourteen repetitions of an optional comma and a digit
$ - end of string.
Sample usage in Java to validate a string:
boolean isValid = text.matches("[a-zA-Z]{3}(?=[^,]*,[^,]*$)\\d(?:,?\\d){14}");
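A small sketch of how that check behaves on a few inputs. The class name and the sample values are made up for illustration. As written, the pattern accepts three letters followed by exactly fifteen digits with a single comma between two of them.

```java
import java.util.regex.Pattern;

class AmountValidator {
    // Same pattern as above; matches() anchors it to the whole input.
    private static final Pattern AMOUNT =
        Pattern.compile("[a-zA-Z]{3}(?=[^,]*,[^,]*$)\\d(?:,?\\d){14}");

    static boolean isValid(String text) {
        return AMOUNT.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        String[] samples = {
            "EUR12345678901234,5",  // 15 digits, one comma
            "EUR123456789012345",   // no comma
            "EUR1234567,8,901234",  // two commas
            "EUR1,5"                // too few digits
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```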

Java regex: find sequence of letter-digit combinations, allowing certain symbols

I am trying to arrive at a regex to detect tokens from a sentence. These tokens should be a combination of letters and digits (mandatory), with optional chars like , or .
Given the sentence:
M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.
It should find six tokens:
M5, 35mm, M5, 0.035mm, 6NB7, Go9IuN
I have tried the following which does not work:
Pattern alphanum=Pattern.compile("\\b(([A-Za-z].*[0-9])|([0-9].*[A-Za-z]))\\b");
Any suggestions please?
Thanks
You could use a positive lookahead to assert at least one digit, and then match at least one char a-zA-Z.
The .* part of your pattern overmatches, because it matches any char except a newline 0 or more times.
\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\b
Explanation
\b Word boundary
(?=[a-zA-Z0-9.,]*[0-9]) Assert at least 1 digit
[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]* Match at least 1 char a-zA-Z
\b Word boundary
In Java
final String regex = "\\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\\b";
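To collect the tokens, the pattern can be applied with Matcher.find() in a loop. This is only a sketch; the class and method names are hypothetical, and the sentence is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    private static final Pattern TOKEN = Pattern.compile(
        "\\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\\b");

    // Returns every letter-digit token found in the sentence, in order.
    static List<String> findTokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK "
            + "DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.";
        // prints [M5, 35mm, M5, 0.035mm, 6NB7, Go9IuN]
        System.out.println(findTokens(sentence));
    }
}
```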
Perhaps the following regex will do the job
(?=[A-Za-z,.]*\d)(?=[\d,.]*[A-Za-z])[A-Za-z\d,.]{2,}(?<![,.])
It starts with two positive lookaheads, which together form an AND condition.
The first lookahead (?=[A-Za-z,.]*\d) checks if a token contains at least one digit.
The second lookahead (?=[\d,.]*[A-Za-z]) checks if it contains at least one letter.
The actual match [A-Za-z\d,.]{2,} reads at least two letters, digits, , or ..
In the end it checks that the match does not end with those special characters: (?<![,.])
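A short sketch of this two-lookahead variant, mainly to show how the final lookbehind drops trailing punctuation. The class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoLookaheadTokens {
    private static final Pattern TOKEN = Pattern.compile(
        "(?=[A-Za-z,.]*\\d)(?=[\\d,.]*[A-Za-z])[A-Za-z\\d,.]{2,}(?<![,.])");

    // Collects all matches; the lookbehind makes the engine back off a trailing , or .
    static List<String> findTokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [0.035mm, 6NB7, Go9IuN]
        System.out.println(findTokens("The M5x is not here: 0.035mm, and 6NB7 plus a Go9IuN."
            .substring(22)));
    }
}
```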

Extract substring from end till first alphabet in java

I have a string of format: A-2-Q4567
More examples: AB-456-T12, A24-5-M12345, etc.
I want to extract the last numerical values out of these strings, which are: 4567, 12, 12345 respectively (which is the numerical value of the substring from the end till first non-numeric character is encountered)
I can split the string, get the last string from the splitted string array, and then do a parseInt after removing the non-numerical characters from it.
But is there a more elegant way of doing this?
You can use this regex: (\d+$). It returns the last sequence of digits in the string.
EDIT - some explanation:
The \d means any digit.
The + means one or more of the previous symbols. Since the previous symbol is a digit, then \d+ means "one or more digits".
The $ means the end of the string, so \d+$ is the last sequence of digits in the string.
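You can do this:
String getLastNumeric(String input)
{
    // Walk backwards from the end while the characters are digits.
    int start = input.length();
    while (start > 0 && Character.isDigit(input.charAt(start - 1)))
        start--;
    // Everything from 'start' to the end is the trailing number ("" if there is none).
    return input.substring(start);
}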
The regex solutions might be more elegant, but performance-wise I think the above is the best, because a regex match can be more expensive than a simple loop with a simple condition to evaluate.
Of course, the regex is more flexible: what if your requirements change and a dash "-" must now precede the numbers? With a regex it should just be a matter of changing one expression.
I put the regex version here, but remember: if you're sure your requirements won't change, I think the above solution is easier on the CPU:
Matcher matcher = Pattern.compile("(\\d+$)").matcher(input);
if (matcher.find())
    return matcher.group();
return "";
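For completeness, here is a self-contained sketch of the regex version applied to the examples from the question. The class and method names are hypothetical, and the Integer.parseInt call assumes each input really ends in digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastNumber {
    // Compiled once; \d+$ matches the run of digits at the end of the input.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("\\d+$");

    // Returns the trailing digits, or "" when the input does not end in a digit.
    static String lastDigits(String input) {
        Matcher m = TRAILING_DIGITS.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] samples = {"A-2-Q4567", "AB-456-T12", "A24-5-M12345"};
        for (String s : samples) {
            // prints 4567, 12 and 12345 on separate lines
            System.out.println(Integer.parseInt(lastDigits(s)));
        }
    }
}
```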

Replace last word in a string if it is 2 characters long using regex

I am trying to replace the last word of a string if it is 2 characters long, using a regex. I used [a-zA-Z]{2}$, but it matches the last 2 characters of any string. I don't want to replace the last word if it is not exactly 2 characters long. How can I do it?
You need to match a word boundary (\b) before the two letters:
\b[a-zA-Z]{2}$
This will match any two Latin letters that appear at the end of a string, as long as they are not preceded by a 'word' character (which is a Latin letter, digit, or underscore).
In case you want to replace the word even if it is preceded by a digit or underscore, you might want to use a lookbehind assertion, like this:
(?<![a-zA-Z])[a-zA-Z]{2}$
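A minimal sketch comparing the two patterns with String.replaceAll. The class name, method names, and the "XX" replacement are placeholders chosen for illustration.

```java
class LastWordReplacer {
    // Replaces a trailing two-letter word that starts at a word boundary.
    static String replaceWithBoundary(String s, String replacement) {
        return s.replaceAll("\\b[a-zA-Z]{2}$", replacement);
    }

    // Replaces two trailing letters unless another letter precedes them,
    // so a preceding digit or underscore does not block the match.
    static String replaceWithLookbehind(String s, String replacement) {
        return s.replaceAll("(?<![a-zA-Z])[a-zA-Z]{2}$", replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceWithBoundary("go to", "XX"));       // go XX
        System.out.println(replaceWithBoundary("hello two", "XX"));   // hello two
        System.out.println(replaceWithBoundary("item_ab", "XX"));     // item_ab
        System.out.println(replaceWithLookbehind("item_ab", "XX"));   // item_XX
    }
}
```

If the replacement text may contain $ or \, it would need to go through Matcher.quoteReplacement first.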
\\b\\w\\w\\b$ (written as a Java string literal) should work as well.
Edit: in fact \\b\\w\\w$ should be enough (or \b\w\w$ outside a Java string literal).
You could also use:
[^\p{Alpha}]\p{Alpha}{2}$
Use Alnum instead if digits count as words. This does, however, fail if the entire string is only two characters long.

regex to match a recurring pattern

I am trying to write a regex for java that will match the following string:
number,number,number (it could be this simple, or it could have a variable number of numbers, but each number has to have a comma after it; there will not be any white space, though)
Here was my attempt:
[[0-9],[0-9]]+
But it seems to match anything with a number in it.
You could try something along the lines of ([0-9]+,)*[0-9]+
This will match:
Only one number, e.g.: 7
Two numbers, e.g.: 7,52
Three numbers, e.g.: 7,52,999
etc.
This will not match:
Things with spaces, e.g.: 7, 52
A list ending with a comma, e.g.: 7,52,
Many other things out of the scope of this problem.
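The cases listed above can be checked with String.matches, which requires the whole input to match. A small sketch, with a hypothetical class and method name:

```java
class NumberList {
    // One or more numbers separated by single commas, no spaces, no trailing comma.
    static boolean isNumberList(String s) {
        return s.matches("([0-9]+,)*[0-9]+");
    }

    public static void main(String[] args) {
        String[] samples = {"7", "7,52", "7,52,999", "7, 52", "7,52,", ""};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isNumberList(s));
        }
    }
}
```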
I think this would work
\d+,(\d+,)+
Note that, as you asked, this only matches numbers that are each followed by a comma, and it needs at least two of them.
I guess you are starting with a String. Why don't you just use String.split(",") ?
^ means the start of a string and $ means the end. If you don't use those, you could match something in the middle (b would match inside "abc").
The + works on the element before it. b is an element, [0-9] is an element, and so are groups (things wrapped in parentheses).
So, the regex you want matches:
The start of the string ^
a number [0-9]+
one or more commas, each followed by a number (,[0-9]+)+
the end of the string $
or, ^[0-9]+(,[0-9]+)+$
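The effect of the anchors shows up when searching with Matcher.find(). The sketch below uses hypothetical names, and its patterns use [0-9]+ so that multi-digit numbers are accepted, which is an assumption made for this illustration.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Illustrative patterns: the same expression without and with anchors.
    static final Pattern UNANCHORED = Pattern.compile("[0-9]+(,[0-9]+)+");
    static final Pattern ANCHORED = Pattern.compile("^[0-9]+(,[0-9]+)+$");

    static boolean foundUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    static boolean foundAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(foundUnanchored("x1,22y")); // true: "1,22" is found in the middle
        System.out.println(foundAnchored("x1,22y"));   // false: must span the whole string
        System.out.println(foundAnchored("1,22"));     // true
    }
}
```

String.matches and Matcher.matches behave as if the pattern were anchored at both ends, so the explicit ^ and $ mainly matter when using find().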
Try the regex [\d,]* (written as the string literal [\\d,]*), e.g.:
Pattern p4 = Pattern.compile("[\\d,]*");
Matcher m4 = p4.matcher("12,1212,1212ad,v");
System.out.println(m4.find()); //prints true
System.out.println(m4.group());//prints 12,1212,1212
If you want to match at least one comma (,) and two numbers, e.g. 12,1212, then you may want to use the regex (\d+,)+\d+, written as the string literal (\\d+,)+\\d+. This regex matches a region made of one or more groups of a number (at least one digit) followed by a comma, and then a final number of at least one digit.
