Java Pattern matcher and RegEx


I need regex help, please. Basically, I need a pattern that matches the following strings:
G1:k6YxekrAP71LqRv[P:3]
G1:k6YxekrAP71LqRv[S:2,3,4|P:3]
G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]
G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]
"G1:k6YxekrAP71LqRv" and "P:3" are the main things to match.
I wrote the regex below to match the first string, but got lost with the rest:
G1:k6YxekrAP71LqRv(\[|\|)P:3(\||\])

If I am not mistaken, the strings all begin with G1:k6YxekrAP71LqRv.
After that comes [P:3] on its own, or with S:2,3,4| on the left, |R:2,3,4,5 on the right, or both. The values 2,3,4 and 2,3,4,5 could be any run of single digits separated by commas.
To match the full pattern you could use:
(G1:k6YxekrAP71LqRv)\[(?:S:(?:\d,)+\d\|)?(P:3)(?:\|R:(?:\d,)+\d)?\]
Explanation
(G1:k6YxekrAP71LqRv) # Match literally in group 1
\[ # Match [
(?: # Non capturing group
S: # Match literally
(?:\d,)+ # Match a digit followed by a comma, one or more times
\d\| # Followed by a digit and |
)? # Close group and make it optional
(P:3) # Capture P:3 in group 2
(?: # Non capturing group
\|R: # Match |R:
(?:\d,)+ # Match a digit followed by a comma, one or more times
\d # Followed by a digit
)? # Close group and make it optional
\] # Match ]
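As a minimal sketch (the class name FullPatternDemo is made up for illustration), the full pattern can be checked against the four sample strings plus one that should fail. It uses Matcher.matches(), so the entire string has to fit the pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FullPatternDemo {
    // The full pattern from above: group 1 = the G1 prefix, group 2 = P:3.
    static final Pattern FULL = Pattern.compile(
        "(G1:k6YxekrAP71LqRv)\\[(?:S:(?:\\d,)+\\d\\|)?(P:3)(?:\\|R:(?:\\d,)+\\d)?\\]");

    public static void main(String[] args) {
        String[] samples = {
            "G1:k6YxekrAP71LqRv[P:3]",
            "G1:k6YxekrAP71LqRv[S:2,3,4|P:3]",
            "G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]",
            "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]",
            "G1:k6YxekrAP71LqRv[P:33]" // should not match
        };
        for (String s : samples) {
            Matcher m = FULL.matcher(s);
            if (m.matches()) {
                System.out.println(s + " -> " + m.group(1) + ", " + m.group(2));
            } else {
                System.out.println(s + " -> no match");
            }
        }
    }
}
```

If you only need to locate the pattern inside a longer text, Matcher.find() works with the same pattern.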
Instead of (?:\d,)+\d you could also use 2,3,4 and 2,3,4,5 if you want to match those values literally.
To match the whole string when it starts with G1:k6YxekrAP71LqRv and contains P:3, you could use a positive lookahead (?=.*P:3):
\AG1:k6YxekrAP71LqRv(?=.*P:3).*\z
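A quick way to try the lookahead version in Java (the class name is hypothetical). Keep in mind that (?=.*P:3) only checks that the text P:3 occurs somewhere after the prefix, so it also accepts P:33:

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // \A and \z anchor the whole input; (?=.*P:3) only asserts that P:3 appears later on.
    static final Pattern PREFIX_WITH_P3 =
        Pattern.compile("\\AG1:k6YxekrAP71LqRv(?=.*P:3).*\\z");

    public static void main(String[] args) {
        String[] samples = {
            "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]",
            "G1:k6YxekrAP71LqRv[S:2,3,4]",
            "G1:k6YxekrAP71LqRv[P:33]"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + PREFIX_WITH_P3.matcher(s).find());
        }
    }
}
```

This prints true, false and true; the last one is the P:33 case.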

Solution:
"(G1:k6YxekrAP71LqRv)\\[.*(?<=\\||\\[)(P:3)(?=\\]|\\,|\\|)[^\\]]*\\]"
Explanation:
\\ - used to escape characters that have a special meaning in regex (inside a Java string literal, the backslash itself must be doubled)
(G1:k6YxekrAP71LqRv) - these characters are matched literally (capturing group #1, in parentheses "()")
\\[.* - the [ character, followed by any character zero or more times
(?<=\\||\\[)P:3 - positive lookbehind: P:3 must be preceded by | OR [
AND
P:3(?=\\]|\\,|\\|) - positive lookahead: P:3 must be followed by ] OR , OR | (if you don't want to match e.g. P:3,4, simply delete |\\, from the regex)
(P:3) - capturing group #2
[^\\]]* - zero or more characters other than ]
\\] - the ] character at the end of the match
Code to check pattern:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckPattern {
    public static void main(String[] args) {
        String s1 = "G1:k6YxekrAP71LqRv[P:3]";
        String s2 = "G1:k6YxekrAP71LqRv[S:2,3,4|P:3]";
        String s3 = "G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]";
        String s4 = "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]";
        String withCommaAfter = "G1:k6YxekrAP71LqRv[S:2,3,4|P:3,4]";
        String notMatch1 = "G1:k6YxekrAP71LqRv[P:33]";
        String notMatch2 = "G1:k6YxekrAP71LqRv[S:2,3,4|P:33]";
        // store the source strings so the results can be printed in a loop
        String[] sampleStrings = new String[] {s1, s2, s3, s4, withCommaAfter, notMatch1, notMatch2};
        Pattern p = Pattern.compile("(G1:k6YxekrAP71LqRv)\\[.*(?<=\\||\\[)(P:3)(?=\\]|\\,|\\|)[^\\]]*\\]");
        for (String s : sampleStrings) {
            System.out.println("Checked String: \"" + s + "\"");
            Matcher m = p.matcher(s);
            while (m.find()) { // for each match found, print the following lines to the console
                System.out.println("\t whole String : " + m.group());
                System.out.println("\t G1...qRv part : " + m.group(1));
                System.out.println("\t P:3 part : " + m.group(2) + "\n");
            }
        }
    }
}
Output you get if you want String withCommaAfter to be matched too (if you don't want it to be matched, delete |\\, from the regex):
Checked String: "G1:k6YxekrAP71LqRv[P:3]"
     whole String : G1:k6YxekrAP71LqRv[P:3]
     G1...qRv part : G1:k6YxekrAP71LqRv
     P:3 part : P:3

Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:3]"
     whole String : G1:k6YxekrAP71LqRv[S:2,3,4|P:3]
     G1...qRv part : G1:k6YxekrAP71LqRv
     P:3 part : P:3

Checked String: "G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]"
     whole String : G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]
     G1...qRv part : G1:k6YxekrAP71LqRv
     P:3 part : P:3

Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]"
     whole String : G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]
     G1...qRv part : G1:k6YxekrAP71LqRv
     P:3 part : P:3

Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:3,4]"
     whole String : G1:k6YxekrAP71LqRv[S:2,3,4|P:3,4]
     G1...qRv part : G1:k6YxekrAP71LqRv
     P:3 part : P:3

Checked String: "G1:k6YxekrAP71LqRv[P:33]"
Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:33]"

Related

Java - How to validate regex expression with multiple parentheses then extract the components from the string?

I have an input string like:
abc(123:456),def(135.666:3434.777),ghi("2015-06-07T09:01:05":"2015-07-08")
Basically, it is (naive idea with regex):
[a-zA-Z0-9]+(((number)|(quoted datetime)):((number)|(quoted datetime)))(,[a-zA-Z0-9]+(((number)|(quoted datetime)):((number)|(quoted datetime))))+?
How can I write a regex pattern in Java that validates that the input string follows this pattern, and then extract the values [a-zA-Z0-9]+ and ((number)|(quoted datetime)):((number)|(quoted datetime)) from it?

You can use
(\w+)\((\d+(?:\.\d+)?|\"[^\"]*\"):(\d+(?:\.\d+)?|\"[^\"]*\")\)
In Java, it can be declared as:
String regex = "(\\w+)\\((\\d+(?:\\.\\d+)?|\"[^\"]*\"):(\\d+(?:\\.\\d+)?|\"[^\"]*\")\\)";
Details:
(\w+) - Group 1: one or more word chars
\( - a ( char
(\d+(?:\.\d+)?|\"[^\"]*\") - Group 2: one or more digits optionally followed by . and one or more digits, or ", zero or more chars other than " and then a " char
: - a colon
(\d+(?:\.\d+)?|\"[^\"]*\") - Group 3: one or more digits optionally followed by . and one or more digits, or ", zero or more chars other than " and then a " char
\) - a ) char
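The answer gives the per-item pattern; the question also asks how to validate the whole input first. One way, sketched here under the assumption that items are separated by a single comma with no spaces, is to repeat the item pattern as ITEM(?:,ITEM)*, check it with matches(), and then collect the groups with find(). The names ParenPairs and parse are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenPairs {
    // One "name(value:value)" item, using the regex from the answer above.
    static final String ITEM =
        "(\\w+)\\((\\d+(?:\\.\\d+)?|\"[^\"]*\"):(\\d+(?:\\.\\d+)?|\"[^\"]*\")\\)";
    // Whole-input check: one item, then zero or more ",item" repetitions.
    static final Pattern WHOLE = Pattern.compile(ITEM + "(?:," + ITEM + ")*");
    static final Pattern ONE = Pattern.compile(ITEM);

    static List<String[]> parse(String input) {
        if (!WHOLE.matcher(input).matches()) {
            throw new IllegalArgumentException("Invalid input: " + input);
        }
        List<String[]> items = new ArrayList<>();
        Matcher m = ONE.matcher(input);
        while (m.find()) {
            // group 1 = name, group 2 = left value, group 3 = right value
            items.add(new String[] {m.group(1), m.group(2), m.group(3)});
        }
        return items;
    }

    public static void main(String[] args) {
        String input = "abc(123:456),def(135.666:3434.777),ghi(\"2015-06-07T09:01:05\":\"2015-07-08\")";
        for (String[] item : parse(input)) {
            System.out.println(item[0] + " -> " + item[1] + " : " + item[2]);
        }
    }
}
```

The quoted datetime values keep their surrounding quotes in groups 2 and 3, since the quotes are part of the matched alternative.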

How do I replace a certain char in between 2 strings using regex

I'm new to regex and have been trying to work this out on my own, but I can't seem to get it working. I have an input that contains start and end flags, and I want to replace a certain char, but only if it's between the flags.
For example, the start flag is START, the end flag is END, the char I'm trying to replace is ", and I would replace it with \".
I would call input.replaceAll(regex, "\\\\\"");
I tried writing a regex that matches only the relevant " chars, but so far I have only been able to match all chars between the flags, not just the " chars: (?<=START)(.*)(?=END)
Example input:
This " is START an " example input END string ""
START This is a "" second example END
This" is "a START third example END " "
Expected output:
This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "

Find all characters between START and END, and for those characters replace " with \".
To achieve this, apply a replacer function to all matches of characters between START and END:
string = Pattern.compile("(?<=START).*?(?=END)").matcher(string)
.replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
which produces your expected output.
Some notes on how this works.
This first step is to match all characters between START and END, which uses look arounds with a reluctant quantifier:
(?<=START).*?(?=END)
The ? after the .* changes the match from greedy (as many chars as possible while still matching) to reluctant (as few chars as possible while still matching). This prevents the middle quote in the following input from being altered:
START a"b END c"d START e"f END
A greedy quantifier will match from the first START all the way past the next END to the last END, incorrectly including c"d.
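The next step is, for each match, to replace " with \". The full match is group 0, or just MatchResult#group(), and we don't need regex for this replacement: a plain String#replace is enough (and yes, replace() replaces all occurrences). The replacement uses four backslashes because the string returned by the replacer function is itself processed as a replacement string, where \\ stands for one literal backslash: "\\\\\"" in Java source is the text \\", which the matcher then turns into \".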
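Here is the replacer approach as a complete program, assuming Java 9 or later (Matcher.replaceAll(Function) and List.of were added in Java 9). Instead of hand-doubling the backslashes, this sketch wraps the replacer's result in Matcher.quoteReplacement, which also keeps any $ in the matched text from being read as a group reference. Class and method names are hypothetical:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceBetweenFlags {
    // Everything between START and END, matched reluctantly; the flags stay outside the match.
    private static final Pattern BETWEEN = Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotesBetweenFlags(String input) {
        return BETWEEN.matcher(input).replaceAll(mr ->
            // String#replace turns each " into \" (the Java literal "\\\"" is the text \").
            // quoteReplacement escapes \ and $ so the matcher inserts that text unchanged.
            Matcher.quoteReplacement(mr.group().replace("\"", "\\\"")));
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
            "This \" is START an \" example input END string \"\"",
            "START This is a \"\" second example END",
            "This\" is \"a START third example END \" \"");
        for (String s : inputs) {
            System.out.println(escapeQuotesBetweenFlags(s));
        }
    }
}
```

Each line is processed separately here; since . does not match line terminators by default, a START and an END on different lines would not be paired.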

For now I've been able to solve it by creating 3 capture groups and repeatedly replacing the match until there are no more matches left. I even had to insert a replacement identifier, because replacing with \" would keep the " char there and create an infinite loop. When there are no more matches left, I replace my identifier, and I now get the expected result.
I still feel like there has to be a much cleaner way to do this using only one replace statement...
Code that worked for me:
class Playground {
    public static void main(String[] args) {
        String input = "\"ThSTARTis is a\" te\"\"stEND \" !!!";
        String regex = "(.*START.+)\"+(.*END+.*)";
        while (input.matches(regex)) {
            input = input.replaceAll(regex, "$1---replace---$2");
        }
        String result = input.replace("---replace---", "\\\"");
        System.out.println(result);
    }
}
Output:
"ThSTARTis is a\" te\"\"stEND " !!!
I would love any suggestions as to how I could solve this in a better/cleaner way.

Another option is to make use of the \G anchor with two capture groups. In the replacement, use both capture groups followed by \"
(?:(START)(?=.*END)|\G(?!^))((?:(?!START|END)(?>\\+\"|[^\r\n\"]))*)\"
Explanation
(?: Non capture group
(START)(?=.*END) Capture group 1, match START and assert there is END to the right
| Or
\G(?!^) Assert the position at the end of the previous match, but not at the start (with Pattern.MULTILINE, the start of any line)
) Close non capture group
( Capture group 2
(?: Non capture group
(?!START|END) Negative lookahead, assert not START or END directly to the right
(?>\\+\"|[^\r\n\"]) Atomic group: match one or more \ followed by ", or match any char except " or a newline
)* Close the non capture group and repeat it zero or more times
) Close group 2
\" Match "
For example:
String regex = "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"";
String string = "This \" is START an \" example input END string \"\"\n"
+ "START This is a \"\" second example END\n"
+ "This\" is \"a START third example END \" \"";
String subst = "$1$2\\\\\"";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceAll(subst);
System.out.println(result);
Output
This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "

regex to capture the string between a word and first occurrence of a character

I want to capture the string after the last slash and before the first occurrence of a backslash (\).
sample data:
sessionId=30a793b1-ed7e-464a-a630; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemSummary/amex","Accept":"application/json, application/javascript","sessionId":"ggh76734",
targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;
sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemList/basket","Accept":"application/json, application/javascript","sessionId":"ggh76734", targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;
sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; Url=https://www.example.com/mybook/order/newbooking/; sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemList/","Accept":"application/json, application/javascript","sessionId":"ggh76734",targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;List item
sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemList/basket?id=76734&para=jhjdfhj&type=new&ordertype=kjkf", "Accept":"application/json, application/javascript","sessionId":"ggh76734", targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;
Expecting the below output:
amex
basket
''(empty string)
basket
I have built the regex below to capture it, but it's not 100% accurate: it captures some additional parts.
Regex
\bmyreferer\\\":\\\"\S+\/(.*?)\\\",
Could you please help me to improve the regex to get desired output?

You could use a negated character class with a capture group:
\bmyreferer":"[^"]+/([^/"]*)"
\bmyreferer":" Match literally preceded by a word boundary
[^"]+/ Match 1+ times any char except ", followed by a /
( Capture group 1
[^/"]* Match zero or more chars other than / and " (so an empty string also matches)
)" Close group 1 and match "
Example code
String regex = "\\bmyreferer\":\"[^\"]+/([^/\"]*)\"";
String string = "sessionId=30a793b1-ed7e-464a-a630; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; ,\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemSummary/amex\",\"Accept\":\"application/json, application/javascript\",\"sessionId\":\"ggh76734\", targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;\n\n"
+ "sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW; ,\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/basket\",\"Accept\":\"application/json, application/javascript\",\"sessionId\":\"ggh76734\", targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;\n\n"
+ "sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; Url=https://www.example.com/mybook/order/newbooking/; sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; ,\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/\",\"Accept\":\"application/json, application/javascript\",\"sessionId\":\"ggh76734\",targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;List item";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Group 1 value: " + matcher.group(1));
}
Output
Group 1 value: amex
Group 1 value: basket
Group 1 value:
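In the fourth sample, the referer continues with a query string (basket?id=...), and the pattern above would capture basket?id=76734&para=jhjdfhj&type=new&ordertype=kjkf there instead of the expected basket (the Java demo input only includes the first three samples). One hedged variant, assuming the query string should always be dropped, excludes ? from group 1 and then skips an optional ?... part before the closing quote. The input below is trimmed to the myreferer fragments of the four samples, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefererLastSegment {
    // Group 1: text after the last '/' and before '?' or the closing quote.
    // (?:\?[^"]*)? skips an optional query string before the closing quote.
    static final Pattern REFERER =
        Pattern.compile("\\bmyreferer\":\"[^\"]+/([^/\"?]*)(?:\\?[^\"]*)?\"");

    static List<String> lastSegments(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = REFERER.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = String.join(";\n",
            ",\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemSummary/amex\",\"Accept\":\"application/json\"",
            ",\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/basket\",\"Accept\":\"application/json\"",
            ",\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/\",\"Accept\":\"application/json\"",
            ",\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/basket?id=76734&para=jhjdfhj&type=new&ordertype=kjkf\", \"Accept\":\"application/json\"");
        for (String seg : lastSegments(input)) {
            System.out.println("[" + seg + "]");
        }
    }
}
```

This prints [amex], [basket], [] and [basket], one per line. If a query string could itself contain a /, the greedy [^"]+/ would jump past it, so this variant relies on the query being slash-free, as in the samples.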

Regex java a regular expression to extract string except the last number

How can I extract all characters from a string except the last number (if it exists) in Java? I found how to extract the last number in a string using the regex [0-9.]+$, but I want the opposite.
Examples :
abd_12df1231 => abd_12df
abcd => abcd
abcd12a => abcd12a
abcd12a1 => abcd12a

What you might do is match, from the start of the string (^), one or more word characters (\w+) followed by a non-digit (\D):
^\w+\D
As suggested in the comments, you could expand the characters you want to match using a character class ^[\w-]+\D or if you want to match any character you could use a dot ^.+\D
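A minimal Java sketch of this first approach using Matcher.find() and group() (class and method names are hypothetical). Note that ^\w+\D needs at least two characters, one for \w+ and one for \D, so a one-letter input such as "a" gives no match and the method returns null:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WithoutLastNumber {
    // ^\w+\D : from the start, the longest run of word chars that ends in a non-digit.
    static final Pattern PREFIX = Pattern.compile("^\\w+\\D");

    static String prefix(String s) {
        Matcher m = PREFIX.matcher(s);
        return m.find() ? m.group() : null; // null when nothing matches
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abd_12df1231", "abcd", "abcd12a", "abcd12a1"}) {
            System.out.println(s + " => " + prefix(s));
        }
    }
}
```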

If you want to remove one or more digits at the end of the string, you may use
s = s.replaceFirst("[0-9]+$", "");
To also remove floats, use
s = s.replaceFirst("[0-9]*\\.?[0-9]+$", "");
Java demo:
List<String> strs = Arrays.asList("abd_12df1231", "abcd", "abcd12a", "abcd12a1", "abcd12a1.34567");
for (String str : strs)
System.out.println(str + " => \"" + str.replaceFirst("[0-9]*\\.?[0-9]+$", "") + "\"");
Output:
abd_12df1231 => "abd_12df"
abcd => "abcd"
abcd12a => "abcd12a"
abcd12a1 => "abcd12a"
abcd12a1.34567 => "abcd12a"
To actually match the substring from the start up to the last number, you may use
(?s)^(.*?)\d*\.?\d+$
Details
(?s) - a Pattern.DOTALL inline modifier
^ - start of string
(.*?) - Capturing group #1: any 0+ chars (including line break chars, because of (?s)), as few as possible
\d*\.?\d+ - an integer or float value
$ - end of string.
Java code:
String s = "abc234 def1.566";
Pattern pattern = Pattern.compile("(?s)^(.*?)\\d*\\.?\\d+$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}

With this Regex you could capture the last digit(s)
\d+$
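You could save those digits and then strip them from the end of the string. Be aware that string.replace(lastDigit, "") removes every occurrence of that digit sequence, not just the trailing one (for "a1b1" it returns "ab").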
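A minimal sketch of the capture-the-last-digits idea that avoids replace() altogether: Matcher.start() gives the index where the trailing digits begin, and the string is cut there. Class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripTrailingNumber {
    // One or more digits at the very end of the input.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("\\d+$");

    static String stripTrailingDigits(String s) {
        Matcher m = TRAILING_DIGITS.matcher(s);
        // m.start() is where the trailing digit run begins; keep everything before it.
        return m.find() ? s.substring(0, m.start()) : s;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abd_12df1231", "abcd", "abcd12a", "abcd12a1", "a1b1"}) {
            System.out.println(s + " => " + stripTrailingDigits(s));
        }
    }
}
```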

Regex to find words with letters and numbers separated or not by symbols

I need to build a regex that match words with these patterns:
Letters and numbers:
A35, 35A, B503X, 1ABC5
Letters and numbers separated by "-", "/", "\":
AB-10, 10-AB, A10-BA, BA-A10, etc...
I wrote this regex for it:
\b[A-Za-z]+(?=[(?<!\-|\\|\/)\d]+)[(?<!\-|\\|\/)\w]+\b|\b[0-9]+(?=[(?<!\-|\\|\/)A-Za-z]+)[(?<!\-|\\|\/)\w]+\b
It works partially, but it also matches words made of only letters or only numbers separated by symbols.
Example:
10-10, open-office, etc.
I don't want these to match.
I guess that my regex is very repetitive and somewhat ugly.
But it's what I have for now.
Could anyone help me?
I'm using java/groovy.
Thanks in advance.

Interesting challenge. Here is a Java program with a regex that picks out the types of "words" you are after:
import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "A35, 35A, B503X, 1ABC5 " +
            "AB-10, 10-AB, A10-BA, BA-A10, etc... " +
            "10-10, open-office, etc.";
        Pattern regex = Pattern.compile(
            "# Match special word having one letter and one digit (min).\n" +
            "\\b # Match first word having\n" +
            "(?=[-/\\\\A-Za-z]*[0-9]) # at least one number and\n" +
            "(?=[-/\\\\0-9]*[A-Za-z]) # at least one letter.\n" +
            "[A-Za-z0-9]+ # Match first part of word.\n" +
            "(?: # Optional extra word parts\n" +
            " [-/\\\\] # separated by -, / or backslash\n" +
            " [A-Za-z0-9]+ # Match extra word part.\n" +
            ")* # Zero or more extra word parts.\n" +
            "\\b # Start and end on a word boundary",
            Pattern.COMMENTS);
        Matcher regexMatcher = regex.matcher(s);
        while (regexMatcher.find()) {
            System.out.print(regexMatcher.group() + ", ");
        }
    }
}
Here is the correct output:
A35, 35A, B503X, 1ABC5, AB-10, 10-AB, A10-BA, BA-A10,
Note that the only complex regexes which are "ugly", are those that are not properly formatted and commented!

Just use this:
([a-zA-Z]+[-\/\\]?[0-9]+|[0-9]+[-\/\\]?[a-zA-Z]+)
In a Java string literal, the backslashes must be doubled (and / needs no escaping at all):
"([a-zA-Z]+[-/\\\\]?[0-9]+|[0-9]+[-/\\\\]?[a-zA-Z]+)"

Excuse me for writing my solution in Python; I don't know enough Java to write it in Java.
pat = re.compile('(?=(?:([A-Z])|[0-9])' ## This part verifies that
'[^ ]*' ## there are at least one
'(?(1)\d|[A-Z]))' ## letter and one digit.
'('
'(?:(?<=[ ,])[A-Z0-9]|\A[A-Z0-9])' # start of second group
'[A-Z0-9-/\\\\]*'
'[A-Z0-9](?= |\Z|,)' # end of second group
')',
re.IGNORECASE) # this group 2 catches the string
My solution captures the desired string in the second group: ((?:(?<=[ ,])[A-Z0-9]|\A[A-Z0-9])[A-Z0-9-/\\\\]*[A-Z0-9](?= |\Z|,))
The part before it verifies that the captured string contains at least one letter and at least one digit:
(?(1)\d|[A-Z]) is a conditional regex that means "if group(1) captured something, then there must be a digit here, otherwise there must be a letter"
The group(1) is ([A-Z]) in (?=(?:([A-Z])|[0-9])
(?:([A-Z])|[0-9]) is a non-capturing group that matches a letter (captured) OR a digit, so when it matches a letter, group(1) isn't empty
The flag re.IGNORECASE makes the pattern match both upper- and lowercase letters.
In the second group, I have to write (?:(?<=[ ,])[A-Z0-9]|\A[A-Z0-9]) because variable-length lookbehind assertions are not allowed. This part means one character that can't be '-', '/' or '\', preceded by a blank, a comma, or the start of the string.
Conversely, (?= |\Z|,) means "followed by a blank, the end of the string, or a comma".
This regex assumes that the characters '-', '/' and '\' can't be the first or the last character of a captured string. Is that right?
import re
pat = re.compile(r'(?=(?:([A-Z])|[0-9])'  ## (from here) This part verifies that
                 r'[^ ]*'                  # there are at least one
                 r'(?(1)\d|[A-Z]))'        ## (to here) letter and one digit.
                 r'((?:(?<=[ ,])[A-Z0-9]|\A[A-Z0-9])'
                 r'[A-Z0-9-/\\]*'
                 r'[A-Z0-9](?= |\Z|,))',
                 re.IGNORECASE)  # group 2 captures the string
ch = "ALPHA13 10 ZZ 10-10 U-R open-office ,10B a10 UCS5000 -TR54 code vg4- DV-3000 SEA 300-BR gt4/ui bn\\3K"
print([mat.group(2) for mat in pat.finditer(ch)])
s = ("A35, 35A, B503X,1ABC5 "
     "AB-10, 10-AB, A10-BA, BA-A10, etc... "
     "10-10, open-office, etc.")
print([mat.group(2) for mat in pat.finditer(s)])
Result:
['ALPHA13', '10B', 'a10', 'UCS5000', 'DV-3000', '300-BR', 'gt4/ui', 'bn\\3K']
['A35', '35A', 'B503X', '1ABC5', 'AB-10', '10-AB', 'A10-BA', 'BA-A10']

My first pass yields
(^|\s)(?!\d+[-/\\]?\d+(\s|$))(?![A-Z]+[-/\\]?[A-Z]+(\s|$))([A-Z0-9]+[-/\\]?[A-Z0-9]+)(\s|$)
Sorry, but it's not formatted for Java (you'll need to double the backslashes, e.g. \s becomes \\s in a Java string). Also, you can't use \b here: a word boundary is any position between a word character (letter, digit or underscore) and a non-word character, so it also matches next to - and /. That's why I used \s and the start and end of the string.
This is still a bit raw
EDIT
Version 2 is slightly better, but it could be improved for performance by using possessive quantifiers. It matches ABC76, AB-32, 3434-F etc., but not ABC or 19\23 etc.
((?<=^)|(?<=\s))(?!\d+[-/\\]?\d+(\s|$))(?![A-Z]+[-/\\]?[A-Z]+(\s|$))([A-Z0-9]+[-/\\]?[A-Z0-9]+)((?=$)|(?=\s))

A condition of the form (A OR NOT A) can be omitted, so the symbols can safely be ignored.
for (String word : "10 10-10 open-office 10B A10 UCS5000 code DV-3000 300-BR".split(" ")) {
    if (word.matches("(.*[A-Za-z].*[0-9].*)|(.*[0-9].*[A-Za-z].*)")) {
        // do something
    }
}
You didn't mention -x4, 4x-, 4-x-, -4-x or -4-x-; I expect them all to match.
My expression just looks for something-alpha-something-digits-something, where something might be alpha, digits or symbols, and the opposite: something-digits-something-alpha-something. If something else might occur, like !#$~()[]{} and so on, it would get longer.
Tested with scala:
scala> for (word <- "10 10-10 open-office 10B A10 UCS5000 code DV-3000 300-BR".split (" ")
     | if word.matches ("(.*[A-Za-z].*[0-9].*)|(.*[0-9].*[A-Za-z].*)")) yield word
res89: Array[java.lang.String] = Array(10B, A10, UCS5000, DV-3000, 300-BR)
Slightly modified to filter matches:
String s = "A35, 35A, B53X, 1AC5, AB-10, 10-AB, A10-BA, BA-A10, etc. -4x, 4x- -4-x- 10-10, oe-oe, etc";
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("\\b([^ ,]*[A-Za-z][^ ,]*[0-9])[^ ,]*|([^ ,]*[0-9][^ ,]*[A-Za-z][^ ,]*)\\b");
java.util.regex.Matcher matcher = pattern.matcher(s);
while (matcher.find()) { System.out.print(matcher.group() + "|"); }
But there is still an error that I can't find:
A35|35A|B53X|1AC5|AB-10|10-AB|A10-BA|BA-A10|-4x|4x|-4-x|
4x should be 4x-, and -4-x should be -4-x-.
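The stray results come from the \b anchors. In the alternation, the leading \b belongs only to the first alternative and the trailing \b only to the second, and \b needs a word character on one side and a non-word character on the other. Between a trailing - and the following space there is no such transition, so the engine backtracks and drops the -. A sketch that swaps the \b anchors for lookarounds, assuming space and comma are the only separators between words (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LettersAndDigits {
    // (?<![^ ,])                              token starts at input start or after a space/comma
    // (?=[^ ,]*[A-Za-z]) and (?=[^ ,]*[0-9])  the token holds at least one letter and one digit
    // [^ ,]+(?![^ ,])                         consume the whole token up to the next space/comma or end
    static final Pattern TOKEN = Pattern.compile(
        "(?<![^ ,])(?=[^ ,]*[A-Za-z])(?=[^ ,]*[0-9])[^ ,]+(?![^ ,])");

    static List<String> findWords(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A35, 35A, B53X, 1AC5, AB-10, 10-AB, A10-BA, BA-A10, etc. -4x, 4x- -4-x- 10-10, oe-oe, etc";
        System.out.println(String.join("|", findWords(s)));
    }
}
```

For that test string this prints A35|35A|B53X|1AC5|AB-10|10-AB|A10-BA|BA-A10|-4x|4x-|-4-x-, keeping the leading and trailing symbols that \b cut off.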
