I have a string from which I need to extract values using regex. Preferably only a single regex should be used, because it runs in a loop with 9 pre-existing regexes (i.e., so I can just add it to the ArrayList of available regexes).
The strings always follow this pattern:
[A-Z0-9]{4,8}, followed by either
[A-Z]{1}, or [A-Z0-9]{2}, or another [A-Z0-9]{4,8}
For example:
“A1B1C1 ABCD E FGHI JK X0Y0Z0”
I’d want this to return four matches.
A1B1C1 & ABCD E & FGHI JK & X0Y0Z0
I've been trying to match the first part of {4,8} characters, followed by a non-greedy match for {1,2}. For example:
[A-Z0-9]{4,8}(\\s{1}[A-Z0-9]{1,2})*? && [A-Z0-9]{4,8}(\\s{1}[A-Z]{1}|\\s{1}[A-Z0-9]{2})*?
But this never returns more than the first {4,8} characters.
You could use an optional part with a word boundary and an alternation to match either [A-Z0-9]{2} or [A-Z]
\b[A-Z0-9]{4,8}(?:\h+(?:[A-Z0-9]{2}|[A-Z]))?\b
\b Word boundary
[A-Z0-9]{4,8} Match 4 - 8 times A-Z0-9
(?: Non capture group
\h+ Match 1+ horizontal whitespace chars
(?:[A-Z0-9]{2}|[A-Z]) Match either 2 x A-Z0-9 or 1 x A-Z
)? Close non capture group and make it optional
\b Word boundary
In Java
String regex = "\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b";
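As a minimal sketch, the pattern can be applied to the sample input from the question with Matcher#find in a loop. The class name TokenExtractor and the method extract are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenExtractor {
    // Pattern from the answer: 4-8 chars of A-Z0-9, optionally followed by
    // horizontal whitespace and either two A-Z0-9 chars or a single letter.
    private static final Pattern TOKEN =
            Pattern.compile("\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b");

    // Collects every non-overlapping match, left to right.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A1B1C1, ABCD E, FGHI JK, X0Y0Z0]
        System.out.println(extract("A1B1C1 ABCD E FGHI JK X0Y0Z0"));
    }
}
```

The trailing \b is what stops "A1B1C1" from swallowing the "A" or "AB" of "ABCD": after those characters there is no word boundary, so the optional group is skipped.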
I created a regular expression in Java that handles 2 links at once:
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/test0218.pdf
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/TestTes-09-05-2018.pdf
Regex:
String REGEX_LINK = "https:..downloads.test.test.testagain.tes.test-test.test.";
Pattern pattern = Pattern.compile(REGEX_LINK + ".[\\w*/]*.((\\d{2}-\\d{2}-)?\\d{4}).pdf");
But now I have to create a regular expression that handles 3 links at once, and I don't know how to do that:
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/test0218.pdf
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/TestTes-09-05-2018.pdf
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/01-01-18_Testt_Testing_ASB_Test_Final.pdf
I need a single regular expression that extracts "0218" from the first link, "09-05-2018" from the second, and "01-01-18" from the third.
Does anyone have an idea how to do this?
You could match 2 times 2 digits with an optional hyphen, and then optionally 4 or 2 digits preceded by a hyphen.
Note that the pattern by itself does not verify a valid date.
(?<!\d)(\d{2}-?\d{2}(?:-(?:\d{4}|\d{2}))?)\S*\.pdf\b
Explanation
(?<!\d) Negative lookbehind, assert not a digit to the left
( Capture group 1
\d{2}-?\d{2} Match 2 digits, optional hyphen and 2 digits
(?:-(?:\d{4}|\d{2}))? Optionally match - and either 4 or 2 digits
) Close group 1
\S* Match optional non whitespace chars
\.pdf\b Match a dot and pdf followed by a word boundary
Or if there can not be any other digits following till the end of the string:
(?<!\d)(\d{2}-?\d{2}(?:-(?:\d{4}|\d{2}))?)[^\d\s]*\.pdf\b
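A short Java sketch of the first pattern applied to the three links from the question, reading capture group 1. The class name PdfDateExtractor and the method extract are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PdfDateExtractor {
    // Same pattern as above, with backslashes doubled for a Java string literal.
    private static final Pattern DATE_BEFORE_PDF = Pattern.compile(
            "(?<!\\d)(\\d{2}-?\\d{2}(?:-(?:\\d{4}|\\d{2}))?)\\S*\\.pdf\\b");

    // Returns the captured digits, or null when the link does not match.
    static String extract(String url) {
        Matcher m = DATE_BEFORE_PDF.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String base = "https://downloads.test.test.testagain.tes"
                + "/test-test/test/te25st24w/te43s5t25x/0twt42ts/";
        System.out.println(extract(base + "test0218.pdf"));           // 0218
        System.out.println(extract(base + "TestTes-09-05-2018.pdf")); // 09-05-2018
        System.out.println(extract(base + "01-01-18_Testt_Testing_ASB_Test_Final.pdf")); // 01-01-18
    }
}
```

The digit pairs inside the path segments (te25st24w and so on) are skipped because none of them is followed by two more digits.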
When running the command sc query <serviceName>, the following output is returned:
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
How can I extract the service state (RUNNING in this case) using a regex in Java?
You could match:
\b a word boundary
the word STATE
one or more whitespace characters \s+
a colon :
one or more whitespace characters \s+
one or more digits \d+
one or more whitespace characters \s+
Capture in a group (group 1) one or more non-whitespace characters (\S+), which in this case will be RUNNING
\bSTATE\s+:\s+\d+\s+(\S+)
In Java
\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)
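A minimal sketch of using this pattern with Matcher#find. The sample text is hard-coded instead of being read from the sc process, and its exact spacing is an assumption. The names ServiceState and parseState are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServiceState {
    private static final Pattern STATE =
            Pattern.compile("\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)");

    // Returns the state name from the text, or null if no STATE line is found.
    static String parseState(String scOutput) {
        Matcher m = STATE.matcher(scOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed layout of the command output; real spacing may differ.
        String sample =
                "        TYPE               : 10  WIN32_OWN_PROCESS\n"
              + "        STATE              : 4  RUNNING\n"
              + "                                (STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)\n"
              + "        WIN32_EXIT_CODE    : 0  (0x0)\n";
        System.out.println(parseState(sample)); // RUNNING
    }
}
```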
You can try the below regex
(.*?STATE\s*:\s*\d*\s*)(\w+)
and read group 2.
You can use this regex:
STATE.*?(\S+)\n
The captured group should be RUNNING.
The important part is the use of a lazy star operator *? that allows matching the whole word at the end of the line instead of a single letter.
P.S: depending on whether you match multiline regex or not, you might want to switch the \n for a $.
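To make the lazy versus greedy difference concrete, here is a small sketch that runs both variants and the $ alternative. The helper firstGroup and the class LazyVsGreedy are hypothetical names, and the one-line sample input is simplified.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String withNewline = "STATE : 4 RUNNING\nNEXT";
        String lastLine = "STATE : 4 RUNNING";

        // Lazy .*? gives up characters to \S+, so the whole word is captured.
        System.out.println(firstGroup("STATE.*?(\\S+)\\n", withNewline)); // RUNNING
        // Greedy .* keeps everything it can and leaves one letter for \S+.
        System.out.println(firstGroup("STATE.*(\\S+)\\n", withNewline));  // G
        // Without a trailing newline the \n variant cannot match at all.
        System.out.println(firstGroup("STATE.*?(\\S+)\\n", lastLine));    // null
        // With (?m), $ matches at a line end and at the end of the input.
        System.out.println(firstGroup("(?m)STATE.*?(\\S+)$", lastLine));  // RUNNING
    }
}
```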
There is another way to quickly get this value using .replaceFirst():
String result = s.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
See the regex demo. Since the string is always in this format, and is not that long, this approach is quite convenient to implement in Java.
Details
(?s) - a DOTALL modifier making . match any char including line break chars
.*? - any 0+ chars, as few as possible
STATE - substring STATE
[\d\s:]* - 0+ digits, whitespaces and :
(\w+) - Capturing group 1 (what we want to keep, later, we can refer to the value using $1 placeholder from the replacement pattern): 1+ word chars
.* - any 0+ chars as many as possible (up to the string end).
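One caveat worth sketching: String#replaceFirst returns the input unchanged when the regex does not match, so the caller may want to check for that case. The class ReplaceFirstState, the method stateOf, and the shortened sample text are assumptions made for this example.

```java
class ReplaceFirstState {
    static final String REGEX = "(?s).*?STATE[\\d\\s:]*(\\w+).*";

    // Replaces the whole input with group 1; returns the input as-is on no match.
    static String stateOf(String s) {
        return s.replaceFirst(REGEX, "$1");
    }

    public static void main(String[] args) {
        String sample = "TYPE : 10 WIN32_OWN_PROCESS\n"
                      + "STATE : 4 RUNNING\n"
                      + "WAIT_HINT : 0x0";
        System.out.println(stateOf(sample));          // RUNNING
        System.out.println(stateOf("no such field")); // no such field
    }
}
```

If a missing STATE line has to be detected reliably, a Matcher with find() and an explicit null or Optional result is probably the safer choice.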
I have text that looks like something like this:
1. Must have experience in Java 2. Team leader...
I want to render this in HTML as an ordered list. Now adding the </li> tag to the end is simple enough:
s = replace(s, ". ", "</li>");
But how do I go about replacing the 1., 2. etc with <li>?
I have the regular expression \d*\.$ which matches a number followed by a period, but the problem is that the number is only a substring of the text, so matching 1. Must have experience in Java 2. Team leader against \d*\.$ returns false.
Code
\d+\.\s+(.*?)\s*(?=\d+\.\s+|$)
Replace
<li>$1</li>\n
Results
Input
1. Must have experience in Java 2. Team leader...
Output
<li>Must have experience in Java</li>
<li>Team leader...</li>
Explanation
\d+ Match one or more digits
\. Match the dot character . literally
\s+ Match one or more whitespace characters
(.*?) Capture any character any number of times, but as few as possible, into capture group 1
\s* Match any number of whitespace characters
(?=\d+\.\s+|$) Positive lookahead ensuring that either of the following matches
\d+\.\s+
\d+ Match one or more digits
\. Match the dot character . literally
\s+ Match one or more whitespace characters
$ Assert position at the end of the line
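In Java, the pattern and replacement above could be applied with String#replaceAll. This is a sketch only; the class ListItems and the method toListItems are names invented for the example.

```java
class ListItems {
    // Each "N. text" chunk becomes "<li>text</li>" followed by a newline.
    static String toListItems(String s) {
        return s.replaceAll(
                "\\d+\\.\\s+(.*?)\\s*(?=\\d+\\.\\s+|$)",
                "<li>$1</li>\n");
    }

    public static void main(String[] args) {
        String input = "1. Must have experience in Java 2. Team leader...";
        // prints:
        // <li>Must have experience in Java</li>
        // <li>Team leader...</li>
        System.out.print(toListItems(input));
    }
}
```

The dots in "Team leader..." do not start a new item because the lookahead requires digits before the period.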
But how do I go about replacing the 1., 2. etc with <li>?
You can use String#replaceAll, which accepts a regex, instead of replace:
s = s.replaceAll("\\d+\\.\\s", "<li>");
Note
You don't need $ at the end of your regex.
You have to escape the dot . because it matches any character in regex.
You can use \s for one whitespace character, \s* for zero or more, or \s+ for one or more.
We want
<ol>
<li>one</li>
<li>two</li>
</ol>
This can be done as:
s = s.replaceAll("(?s)(\\d+\\.)\\s+(.*?\\.)\\s*", "<li>$2</li></ol>");
s = s.replaceFirst("<li>", "<ol><li>");
s = s.replaceAll("(?s)</li></ol><li>", "</li>\n<li>");
The trick is to first add </li></ol> with a spurious </ol> that should only remain after the last list item.
(?s) is the DOTALL notation, causing . to also match line breaks.
In case of more than one numbered list this will not do. Also it assumes one single sentence per list item.
I'm trying to create a regex pattern to match a specific string and return true if the string matches the pattern and false if it doesn't. Here are the conditions:
Must start with [ and end with ]
The items inside the brackets have to be separated by commas
Each item has to follow this regex pattern:
^[A-Za-z][A-Za-z0-9_]*$
How can I make one regex that checks for all these conditions?
Put the comma and the item pattern in a group that can repeat:
\[[A-Za-z][A-Za-z0-9_]*(?:,[A-Za-z][A-Za-z0-9_]*)*\]
This is as it should appear in the final string. Escape specials according to specific language.
In Java, \w without the Pattern.UNICODE_CHARACTER_CLASS flag actually matches the same as [a-zA-Z0-9_]. So, I'd use
String pat = "\\[[a-zA-Z]\\w*(?:,[a-zA-Z]\\w*)*]";
See the IDEONE demo. Use with String#matches, or you will have to add ^ (or \\A) at the beginning and $ (or \\z) at the end.
String pat = "\\[[a-zA-Z]\\w*(?:,[a-zA-Z]\\w*)*]";
System.out.println("[c1,T4,yu5]".matches(pat)); // prints true
Pattern explanation:
\\[ - a literal [
[a-zA-Z] - an English letter (same as \\p{Alpha})
\\w* - zero or more characters from [a-zA-Z0-9_] set
(?: - start of the non-capturing group matching...
, - a comma
[a-zA-Z]\\w* - see above
)* - ... zero or more times
] - a literal ] (does not require escaping outside of the character class to be treated literally).
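The difference between String#matches and an unanchored find can be sketched as follows. The class BracketListValidator and its method names are hypothetical. The anchored variant uses \A and \z because $ would also accept a trailing line break.

```java
import java.util.regex.Pattern;

class BracketListValidator {
    private static final String PAT = "\\[[a-zA-Z]\\w*(?:,[a-zA-Z]\\w*)*]";
    private static final Pattern UNANCHORED = Pattern.compile(PAT);
    private static final Pattern ANCHORED = Pattern.compile("\\A" + PAT + "\\z");

    // matches() requires the whole string to match the pattern.
    static boolean isValid(String s) {
        return s.matches(PAT);
    }

    // find() without anchors only checks that some substring matches.
    static boolean containsList(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // find() with \A and \z behaves like matches().
    static boolean isValidWithFind(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[c1,T4,yu5]"));    // true
        System.out.println(isValid("[c1,,T4]"));       // false
        System.out.println(isValid("x[c1]y"));         // false
        System.out.println(containsList("x[c1]y"));    // true
        System.out.println(isValidWithFind("x[c1]y")); // false
    }
}
```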
I have the following regex, which does not match all of my strings.
It finds AB-434 but doesn’t match TEMS-54534.
([a-zA-Z][a-zA-Z0-9_]+-[1-9][0-9]*)([^.]|\.[^0-9]|\.$|$)
Here are the sample inputs:
TEMS-54534
TEMS-5453
TEMS-1233
TEMS-12
CB-213
CB-2135
CB-12
ABC-2223
ABC-223
ABC-12
You seem to be looking for a pattern that starts with 1 ASCII letter followed with 1 or more alphanumeric or underscore characters followed with a - followed with one or more digits not starting with 0.
You can use
^[a-zA-Z][a-zA-Z0-9_]+-[1-9][0-9]*$
or
^[a-zA-Z]\w+-(?!0)\d+$
See the regex demo (and another one).
Explanation:
^ - start of string
[a-zA-Z][a-zA-Z0-9_]+ / [a-zA-Z]\w+ - an ASCII letter followed with 1+ alphanumerics/underscore chars
- - a hyphen
[1-9][0-9]* / (?!0)\d+ - a digit from the 1-9 range followed by 0+ digits of any value (you can restrict the length with a {min,max} limiting quantifier if need be)
$ - end of string
More details:
[a-zA-Z0-9_] can be written as \w (if no Pattern.UNICODE_CHARACTER_CLASS is used)
In Java, do not forget to use double backslashes to escape metacharacters and shorthand character classes
If the pattern is used with String#matches(), the ^ at the start and $ at the end of the pattern are redundant.
And a Java demo:
List<String> strs = Arrays.asList("TEMS-54534","TEMS-5453","TEMS-1233","TEMS-12","CB-213",
"CB-2135","CB-12","ABC-2223","ABC-223","ABC-12");
for (String str : strs)
System.out.println(str.matches("[a-zA-Z]\\w+-(?!0)\\d+"));