Regex - how to get windows service state in Java? - java

When I run the command sc query <serviceName>, the following output is returned:
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
How can I extract the service state (in this case, RUNNING) using a regex in Java?

You could match:
\b a word boundary
the word STATE followed by
one or more whitespace characters \s+
a colon :
one or more whitespace characters \s+
one or more digits \d+
one or more whitespace characters \s+
Capture in a group (group 1) one or more non-whitespace characters (\S+), which in this case will be RUNNING
\bSTATE\s+:\s+\d+\s+(\S+)
In Java
\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)
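A minimal sketch of applying this pattern with java.util.regex, assuming the sc query output is already available as a String (for example, read from the process's output stream). The class name ServiceStateExtractor and the method extractState are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServiceStateExtractor {
    // \bSTATE\s+:\s+\d+\s+(\S+) with group 1 holding the state name
    private static final Pattern STATE_PATTERN =
            Pattern.compile("\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)");

    // Returns the service state (for example "RUNNING"), or null if there is no STATE line
    static String extractState(String scOutput) {
        Matcher m = STATE_PATTERN.matcher(scOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String output = "TYPE : 10 WIN32_OWN_PROCESS\n"
                + "STATE : 4 RUNNING\n"
                + "(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)\n"
                + "WIN32_EXIT_CODE : 0 (0x0)\n";
        System.out.println(extractState(output)); // RUNNING
    }
}
```

find() is used rather than matches() because the STATE line sits in the middle of the output, and \s+ also copes with the wider column padding that real sc output may contain.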

You can try the regex below
(.*?STATE\s*:\s*\d*\s*)(\w+)
and take the value of group 2.
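A short sketch of reading group 2 with a Matcher; StateGroupTwo and state are hypothetical names, and the input is a shortened version of the sc query output from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StateGroupTwo {
    // Group 1: the STATE line up to and including the numeric code; group 2: the state word
    private static final Pattern STATE_PATTERN =
            Pattern.compile("(.*?STATE\\s*:\\s*\\d*\\s*)(\\w+)");

    // Returns group 2 (for example "RUNNING"), or null if no STATE line is present
    static String state(String scOutput) {
        Matcher m = STATE_PATTERN.matcher(scOutput);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String output = "TYPE : 10 WIN32_OWN_PROCESS\nSTATE : 4 RUNNING\n";
        System.out.println(state(output)); // RUNNING
    }
}
```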

You can use this regex:
STATE.*?(\S+)\n
The matched group should be RUNNING.
The important part is the lazy star operator *?, which lets the group match the whole word at the end of the line instead of a single letter (with a greedy .*, group 1 would capture only the final G).
P.S.: depending on whether you use multiline mode or not, you might want to replace the \n with $.
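A sketch of the $ variant, assuming the output is held in a single String; the names LazyStarState and state are hypothetical. With Pattern.MULTILINE, $ matches just before each line terminator, including \r\n, whereas the \n version cannot match Windows-style CRLF text, because \S+ cannot consume the \r that precedes the \n:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyStarState {
    // In MULTILINE mode, $ matches just before a line terminator (\n, \r\n, ...)
    private static final Pattern WITH_DOLLAR =
            Pattern.compile("STATE.*?(\\S+)$", Pattern.MULTILINE);

    static String state(String scOutput) {
        Matcher m = WITH_DOLLAR.matcher(scOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String lf = "STATE : 4 RUNNING\n(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)\n";
        String crlf = lf.replace("\n", "\r\n");
        System.out.println(state(lf));   // RUNNING
        System.out.println(state(crlf)); // RUNNING
        // The \n variant fails on CRLF text: \S+ cannot consume the \r before \n
        System.out.println(Pattern.compile("STATE.*?(\\S+)\\n").matcher(crlf).find()); // false
    }
}
```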

There is another way to quickly get this value using .replaceFirst():
String result = s.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
See the regex demo. Since the string is always in this format, and is not that long, this approach is quite convenient to implement in Java.
See the online Java demo.
Details
(?s) - a DOTALL modifier making . match any char including line break chars
.*? - any 0+ chars, as few as possible
STATE - substring STATE
[\d\s:]* - 0+ digits, whitespaces and :
(\w+) - Capturing group 1 (what we want to keep, later, we can refer to the value using $1 placeholder from the replacement pattern): 1+ word chars
.* - any 0+ chars as many as possible (up to the string end).
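The same one-liner wrapped in a hypothetical helper (ReplaceFirstState.state). Note that when STATE is not found, replaceFirst returns the input unchanged, so a caller may want to check for that case:

```java
class ReplaceFirstState {
    // (?s) lets . cross line breaks, so the whole input is replaced by group 1.
    // If STATE is not found, replaceFirst returns the input unchanged.
    static String state(String scOutput) {
        return scOutput.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
    }

    public static void main(String[] args) {
        String output = "TYPE : 10 WIN32_OWN_PROCESS\n"
                + "STATE : 4 RUNNING\n"
                + "(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)\n"
                + "WIN32_EXIT_CODE : 0 (0x0)\n";
        System.out.println(state(output)); // RUNNING
    }
}
```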

Related

Java Regex Troubles

I have a string from which I need to extract parts using regex. Preferably only a single regex should be used, since it runs in a loop with 9 pre-existing regexes (i.e., so I can just add it to the ArrayList of available regexes).
The strings will always follow this pattern:
between 4 and 8 characters from A-Z0-9 ({4,8}), followed by either
[A-Z]{1}, [A-Z0-9]{2}, or another [A-Z0-9]{4,8}
For example:
“A1B1C1 ABCD E FGHI JK X0Y0Z0”
I’d want this to return four matches.
A1B1C1 & ABCD E & FGHI JK & X0Y0Z0
I've been trying to match the first part of {4,8} characters, followed by a non-greedy match for {1,2}. For example:
[A-Z0-9]{4,8}(\\s{1}[A-Z0-9]{1,2})*? && [A-Z0-9]{4,8}(\\s{1}[A-Z]{1}|\\s{1}[A-Z0-9]{2})*?
But this never returns more than the first {4,8} characters.
You could use an optional part with a word boundary, and an alternation to match either [A-Z0-9]{2} or [A-Z]:
\b[A-Z0-9]{4,8}(?:\h+(?:[A-Z0-9]{2}|[A-Z]))?\b
\b Word boundary
[A-Z0-9]{4,8} Match 4 - 8 times A-Z0-9
(?: Non capture group
\h+ Match 1+ horizontal whitespace chars
(?:[A-Z0-9]{2}|[A-Z]) Match either 2 x A-Z0-9 or 1 x A-Z
)? Close non capture group and make it optional
\b Word boundary
In Java
String regex = "\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b";
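A sketch of collecting all matches with a find() loop, using the example string from the question; CodeGroups and findAll are hypothetical names (\h requires Java 8 or later):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeGroups {
    private static final Pattern P =
            Pattern.compile("\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b");

    // Collects every non-overlapping match, in order of appearance
    static List<String> findAll(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("A1B1C1 ABCD E FGHI JK X0Y0Z0"));
        // [A1B1C1, ABCD E, FGHI JK, X0Y0Z0]
    }
}
```

For A1B1C1 the optional group fails (AB is followed by C, so the closing \b cannot match), the group is skipped, and the next match starts at ABCD.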

Replace URL String with Integer characters located in the end of that String

I have a URL and tried to use a regex to replace everything except the integer at the end of the link.
The URL is something like
https://some.storage.com/test123456.bucket.com/folder/80.png
The regex I tried:
Integer.parseInt(string.replaceAll(".*[^\\d](\\d+)", "$1"))
The output of that regex is "80.png", but I need only "80". I also tried https://regex101.com, and as far as I can see, the main problem is that ".png" is not matched by my regex, so after the substitution this part is left in the result.
I'm a total beginner with regex, so I'd appreciate any help.
You may use
String result = string.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
See the regex demo.
NOTE: If there is no match, the result will be equal to the string value. If you do not want this behavior, instead of "(?:.*\\D)?(\\d+).*", use "(?:.*\\D)?(\\d+).*|.+".
Details
(?:.*\D)? - an optional (it must be optional because the Group 1 pattern might also be matched at the start of the string) sequence of
.* - any 0+ chars other than line break chars, as many as possible
\D - a non-digit
(\d+) - Group 1: any one or more digits
.* - any 0+ chars other than line break chars, as many as possible
The replacement is $1, the backreference to the Group 1 value, i.e., the last chunk of one or more digits in a string that has no line breaks.
Line breaks can be supported if you prepend the pattern with the (?s) inline DOTALL modifier, i.e. "(?s)(?:.*\\D)?(\\d+).*|.+".
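A sketch combining the replacement with Integer.parseInt, as in the question; TrailingNumber and lastNumber are hypothetical names. If the input contains no digits, replaceAll leaves it unchanged and parseInt throws NumberFormatException:

```java
class TrailingNumber {
    // Replaces the whole URL with its last run of digits, then parses it.
    // Throws NumberFormatException when the URL has no digits at all,
    // because replaceAll then leaves the string unchanged.
    static int lastNumber(String url) {
        String digits = url.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        String url = "https://some.storage.com/test123456.bucket.com/folder/80.png";
        System.out.println(lastNumber(url)); // 80
    }
}
```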

Java Pattern regex search between strings

Given the following strings (stringToTest):
G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]
G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]
And the Pattern:
Pattern p = Pattern.compile("G2:\\S+RQ:3,4");
if (p.matcher(stringToTest).find())
{
// Match
}
For string 1 I DON'T want to match, because RQ:3,4 is associated with the G3 section, not G2, and I want string 2 to match, as RQ:3,4 is associated with G2 section.
The problem with the current regex is that it's searching too far and reaching the RQ:3,4 eventually in case 1 even though I don't want to consider past the G2 section.
It's also possible that the stringToTest might be (just one section):
G2:7JAPjGdnGy8jxR8[RQ:3,4]
The strings 7JAPjGdnGy8jxR8 and jRo6pN8ZW9aglYz are variable length hashes.
Can anyone help me with the correct regex to use, to start looking at G2 for RQ:3,4 but stopping if it reaches the end of the string or -G (the start of the next section).
You may use this regex with a negative lookahead in between:
G2:(?:(?!G\d+:)\S)*RQ:3,4
RegEx Details:
G2:: Match literal text G2:
(?: Start a non-capture group
(?!G\d+:): Assert that there is no G<digits>: ahead of us
\S: Match a non-whitespace character
)*: End non-capture group. Match 0 or more of this
RQ:3,4: Match literal text RQ:3,4
In Java use this regex:
String re = "G2:(?:(?!G\\d+:)\\S)*RQ:3,4";
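A sketch that checks the two strings from the question plus the single-section case; G2Section and matches are hypothetical names:

```java
import java.util.regex.Pattern;

class G2Section {
    // Tempered token: consume non-whitespace chars only while no "G<digits>:" starts here
    private static final Pattern G2_RQ34 =
            Pattern.compile("G2:(?:(?!G\\d+:)\\S)*RQ:3,4");

    static boolean matches(String s) {
        return G2_RQ34.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // false
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // true
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:3,4]"));                            // true
    }
}
```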
The problem is that \S matches any non-whitespace char, and the regex engine parses the text from left to right. Once it finds G2:, it grabs all non-whitespace chars to the right (since \S+ is a greedy subpattern) and then backtracks to find the rightmost occurrence of RQ:3,4.
In a general case, you may use
String regex = "G2:(?:(?!-G)\\S)*RQ:3,4";
See the regex demo. (?:(?!-G)\S)* is a tempered greedy token that will match 0+ occurrences of a non-whitespace char that does not start a -G substring.
If the hyphen is only possible in front of the next section, you may subtract - from \S:
String regex = "G2:[^\\s-]*RQ:3,4"; // using a negated character class
String regex = "G2:[\\S&&[^-]]*RQ:3,4"; // using character class subtraction
See this regex demo. [^\s-]* will match 0 or more chars other than whitespace and -.
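A sketch comparing the three patterns from this answer against the question's inputs; all three should reject string 1 and accept string 2 and the single-section string. G2SectionVariants and matches are hypothetical names:

```java
import java.util.regex.Pattern;

class G2SectionVariants {
    static final String[] REGEXES = {
        "G2:(?:(?!-G)\\S)*RQ:3,4", // tempered greedy token: stop before "-G"
        "G2:[^\\s-]*RQ:3,4",       // negated class: no whitespace, no hyphen
        "G2:[\\S&&[^-]]*RQ:3,4"    // class subtraction: non-whitespace and not a hyphen
    };

    static boolean matches(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", // expected: false
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", // expected: true
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]"                             // expected: true
        };
        for (String regex : REGEXES) {
            for (String input : inputs) {
                System.out.println(regex + " -> " + matches(regex, input));
            }
        }
    }
}
```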
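Try to use [^\[] instead of \S in this regex: G2:[^\[]*\[RQ:3,4
[^\[] means any character but [. In Java, an unescaped [ inside a character class starts a nested class, so it has to be escaped; as a Java string literal: "G2:[^\\[]*\\[RQ:3,4".
(This assumes that strings like G2:7JAP[jGd]nGy8[]R8[RQ:3,4] are not possible.)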
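A sketch of the escaped version in Java, together with a check showing that the unescaped [^[] form does not compile in Java (Pattern.compile throws PatternSyntaxException for the unclosed nested class). G2BracketClass, matches and unescapedCompiles are hypothetical names:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class G2BracketClass {
    // [^\[] : any character except [ ; the [ must be escaped inside a Java character class
    private static final Pattern P = Pattern.compile("G2:[^\\[]*\\[RQ:3,4");

    static boolean matches(String s) {
        return P.matcher(s).find();
    }

    // Returns false because Java parses the unescaped [ as the start of a nested class
    static boolean unescapedCompiles() {
        try {
            Pattern.compile("G2:[^[]*\\[RQ:3,4");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // false
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:3,4]"));                            // true
        System.out.println(unescapedCompiles());                                              // false
    }
}
```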

Difficulties finding a Java regex equivalent to a JavaScript regex

So, what I am trying to do is:
I have a string:
Special Skills:
someText
could range
through multiple lines
Special Abilities:
another
someText
Background:
multiline
text
I've already managed to come up with the following regex. It works perfectly in JavaScript according to regexr.com, but not in Java, according to IntelliJ's built-in Check-Regex and freeformatter.com.
Special Abilities:\n(.*\n)+?(Special Skills:|Background:)
The expression should, first off, extract
Special Skills:
someText
could range
through multiple lines
Note that both the "Special Abilities" and "Background" sections are optional.
Since I am kind of stuck here, any help would be greatly appreciated!
You may add the end-of-string anchor $ as an extra alternative in the group at the end of the pattern, make sure . also matches carriage returns with the (?d) embedded flag (Pattern.UNIX_LINES), and wrap the repeated (.*\n) part in a capturing group so that all the text it matches is captured into one group (the inner (.*\n) can then become a non-capturing group):
(?d)Special Abilities:\r?\n((?:.*\n)*?)(Special Skills:|Background:|$)
See this regex demo.
Details
(?d) - . now matches any char but a newline
Special Abilities: - a literal text
\r?\n - a CRLF or LF line ending
((?:.*\n)*?) - Group 1: zero or more, but as few as possible, repetitions of 0+ chars other than an LF symbol followed by an LF symbol
(Special Skills:|Background:|$) - either of the three alternatives: Special Skills:, Background: or end of string ($).
An alternative expression:
(?ms)Special Abilities:\r?\n(.*?)(^Special Skills:|^Background:|\Z)
See this regex demo
Here, (?ms) defines the multiline and dotall modes (^ will match start of a line here and . will match all symbols). Instead of $, we need to use \Z - end of string anchor.
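A sketch that runs both patterns on the sample text from the question; SectionExtractor and section are hypothetical names, and the input is assumed to use \n line endings:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // First answer: UNIX_LINES so that . also matches \r
    static final Pattern UNIX_LINES_VERSION = Pattern.compile(
            "(?d)Special Abilities:\\r?\\n((?:.*\\n)*?)(Special Skills:|Background:|$)");
    // Alternative: MULTILINE + DOTALL with \Z as the end-of-input alternative
    static final Pattern MULTILINE_DOTALL_VERSION = Pattern.compile(
            "(?ms)Special Abilities:\\r?\\n(.*?)(^Special Skills:|^Background:|\\Z)");

    // Returns group 1 (the section body), or null if the pattern does not match
    static String section(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Special Skills:\nsomeText\ncould range\nthrough multiple lines\n"
                + "Special Abilities:\nanother\nsomeText\n"
                + "Background:\nmultiline\ntext";
        System.out.println(section(UNIX_LINES_VERSION, text));
        System.out.println(section(MULTILINE_DOTALL_VERSION, text));
    }
}
```

Both calls return "another\nsomeText\n" here. When the section runs to the end of the input, the (?d) version only matches if the last line ends with a newline, since each repetition of (?:.*\n) needs a \n and, without MULTILINE, $ matches only at the very end or before a final \n; the (?ms) version with \Z has no such restriction.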

How do I capture the text that is before and after a multiple regex matches in java?

Given a test string of:
I have a 1234 and a 2345 and maybe a 3456 id.
I would like to match all the IDs (the four digit numbers) AND at the same time get 12 characters of their surrounding text (before and after) (if any!)
So the matches should be:
BEFORE MATCH AFTER
Match #1: I have a- 1234 -and a 2345-
Match #2: -1234 and a- 2345 -and maybe a
Match #3: and maybe a- 3456 -id.
Here, - stands for a space character.
Note:
The BEFORE match of Match #1 is not 12 characters long (not many characters at the beginning of the string). Same with the AFTER match of Match #3 (not many characters after the last match)
Can I achieve these matches with a single regex in java?
My best attempt so far is to use a positive lookbehind and an atomic group (to get the surrounding text), but it fails at the beginning and the end of the string when there are not enough characters (like in my note above):
(?<=(.{12}))(\d{4})(?>(.{12}))
This matches only 2345. If I use a small enough value for the quantifiers (2 instead of 12, for example) then I correctly match all IDs.
Here is a link to my regex playground where I was trying my regexes:
http://regex101.com/r/cZ6wG4
If you look at the MatchResult (http://docs.oracle.com/javase/7/docs/api/java/util/regex/MatchResult.html) interface implemented by the Matcher class (http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html), you will find the methods start() and end(), which give you the index of the first character of the match and the index just after its last character within the input string. Once you have the indices, you can use some simple math and the substring method to extract the parts you want.
I hope this helps you, because I won't write the entire code for you.
There might be a way to do what you want purely with regex, but I think using the indices and substring is easier (and probably more reliable).
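A sketch of the start()/end() approach described above; SurroundingText and withContext are hypothetical names, and \b\d{4}\b is an assumed pattern for the four-digit IDs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurroundingText {
    // Returns {before, match, after} for every four-digit ID, with up to
    // width characters of context on each side (fewer near the string edges)
    static List<String[]> withContext(String s, int width) {
        List<String[]> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\b\\d{4}\\b").matcher(s);
        while (m.find()) {
            String before = s.substring(Math.max(0, m.start() - width), m.start());
            String after = s.substring(m.end(), Math.min(s.length(), m.end() + width));
            result.add(new String[] {before, m.group(), after});
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "I have a 1234 and a 2345 and maybe a 3456 id.";
        for (String[] r : withContext(s, 12)) {
            System.out.println("[" + r[0] + "] " + r[1] + " [" + r[2] + "]");
        }
    }
}
```

Because the context is computed from indices rather than consumed by the regex, neighbouring matches can share context: the second ID gets " 1234 and a " as its before text.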
You can do it in a single regex:
Pattern regex = Pattern.compile("(?<=^.{0,10000}?(.{0,12}))(\\d+)(?=(.{0,12}))");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    String before = regexMatcher.group(1); // up to 12 chars before the number
    String match = regexMatcher.group(2);  // the number itself
    String after = regexMatcher.group(3);  // up to 12 chars after the number
}
Explanation:
(?<= # Assert that the following can be matched before current position
^.{0,10000}? # Match as few characters as possible from the start of the string
(.{0,12}) # Match and capture up to 12 chars in group 1
) # End of lookbehind
(\d+) # Match and capture in group 2: one or more digits
(?= # Assert that the following can be matched here:
(.{0,12}) # Match and capture up to 12 chars in group 3
) # End of lookahead
You don't need a lookbehind or an atomic group for this, but you do need a lookahead:
(.{0,12}?)\b(\d+)\b(?=(.{0,12}))
I'm assuming your IDs are not enclosed in longer words (hence the \b). I used a reluctant quantifier in the leading portion ({0,12}?) to prevent it from consuming more than one ID when they're spaced close to each other, and in:
I have a 1234, 2345 and 1456 id.
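A sketch that runs this pattern on the question's sample string; LookaheadContext and matches are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadContext {
    // Group 1: consumed leading context; group 2: the ID; group 3: lookahead context
    private static final Pattern P =
            Pattern.compile("(.{0,12}?)\\b(\\d+)\\b(?=(.{0,12}))");

    static List<String[]> matches(String s) {
        List<String[]> result = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2), m.group(3)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] r : matches("I have a 1234 and a 2345 and maybe a 3456 id.")) {
            System.out.println("[" + r[0] + "] " + r[1] + " [" + r[2] + "]");
        }
    }
}
```

Because the leading (.{0,12}?) is consumed, it cannot reuse text that the previous match already covered: for 2345 the before part is " and a " rather than " 1234 and a ". The after part sits inside a lookahead and is not consumed, so it can overlap the next ID.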
