Difficulties finding a Java regex equivalent to a JavaScript regex - java

So, what I am trying to do is:
I have a string:
Special Skills:
someText
could range
through multiple lines
Special Abilities:
another
someText
Background:
multiline
text
I've already managed to come up with the following regex. It works perfectly in JavaScript according to regexr.com, but not in Java, according to IntelliJ's built-in Check RegExp and freeformatter.com.
Special Abilities:\n(.*\n)+?(Special Skills:|Background:)
The expression should, first off, extract
Special Skills:
someText
could range
through multiple lines
Mind that both the "Special Abilities" and "Background" sections are optional.
Since I am kind of stuck here, any help would be greatly appreciated!

Three changes make the pattern work in Java. Add the end-of-string anchor $ as a third alternative in the group at the end of the pattern. Enable the embedded (?d) flag (Pattern.UNIX_LINES) so that . also matches carriage returns. Finally, wrap the repeated group in an outer capturing group so that all the text it matches lands in a single group (the inner (.*\n) can then become non-capturing, and +? becomes *? to allow an empty section):
(?d)Special Abilities:\r?\n((?:.*\n)*?)(Special Skills:|Background:|$)
See this regex demo.
Details
(?d) - . now matches any char but a newline
Special Abilities: - a literal text
\r?\n - a CRLF or LF line ending
((?:.*\n)*?) - Group 1: zero or more, but as few as possible, repetitions of 0+ chars other than an LF symbol and then an LF symbol
(Special Skills:|Background:|$) - any of the three alternatives: Special Skills:, Background: or end of string ($).
An alternative expression:
(?ms)Special Abilities:\r?\n(.*?)(^Special Skills:|^Background:|\Z)
See this regex demo
Here, (?ms) defines the multiline and dotall modes (^ will match start of a line here and . will match all symbols). Instead of $, we need to use \Z - end of string anchor.
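A minimal Java sketch of both patterns follows, assuming the input uses LF line endings and that the section ends with a newline or is followed by another header. The class and method names (SectionExtractor, abilities) are illustrative only and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // First pattern from the answer: with (?d), '.' stops only at '\n'.
    static final Pattern LINE_BASED = Pattern.compile(
            "(?d)Special Abilities:\\r?\\n((?:.*\\n)*?)(Special Skills:|Background:|$)");

    // Alternative pattern: (?ms) turns on MULTILINE and DOTALL.
    static final Pattern DOTALL_BASED = Pattern.compile(
            "(?ms)Special Abilities:\\r?\\n(.*?)(^Special Skills:|^Background:|\\Z)");

    // Returns the body of the "Special Abilities" section, or null if there is no match.
    static String abilities(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Special Skills:\nsomeText\ncould range\n"
                + "Special Abilities:\nanother\nsomeText\n"
                + "Background:\nmultiline\ntext";
        System.out.print(abilities(LINE_BASED, text));
        System.out.print(abilities(DOTALL_BASED, text));
    }
}
```

Group 1 holds the section body in both cases; group 2 holds whichever header (or end of input) stopped the match.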

Related

Replace URL String with Integer characters located in the end of that String

I have a URL and am trying to use a regex to keep only the integer at the end of the link, removing everything else.
The URL is something like
https://some.storage.com/test123456.bucket.com/folder/80.png
Regex i tried to use:
Integer.parseInt(string.replaceAll(".*[^\\d](\\d+)", "$1"))
The output for that regex is "80.png", but I need only "80". I also tried this tool: https://regex101.com. As far as I can see, the main problem is that ".png" is not matched by my regex, so after the substitution it is left appended to the matched group.
I'm a total noob at regex, so I kindly ask for your help.
You may use
String result = string.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
See the regex demo.
NOTE: If there is no match, the result will be equal to the string value. If you do not want this behavior, instead of "(?:.*\\D)?(\\d+).*", use "(?:.*\\D)?(\\d+).*|.+".
Details
(?:.*\D)? - an optional (it must be optional because the Group 1 pattern might also be matched at the start of the string) sequence of
.* - any 0+ chars other than line break chars, as many as possible
\D - a non-digit
(\d+) - Group 1: any one or more digits
.* - any 0+ chars other than line break chars, as many as possible
The replacement is $1, the backreference to Group 1 value, actually, the last 1+ digit chunk in the string that has no line breaks.
Line breaks can be supported if you prepend the pattern with the (?s) inline DOTALL modifier, i.e. "(?s)(?:.*\\D)?(\\d+).*|.+".
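As a rough illustration, the variant with the "|.+" fallback can be wrapped in a small helper. The names LastNumber and lastDigits are made up for this sketch; the pattern itself is the one from the answer.

```java
class LastNumber {
    // Keeps only the last run of digits; inputs without any digit collapse to "".
    static String lastDigits(String s) {
        return s.replaceAll("(?:.*\\D)?(\\d+).*|.+", "$1");
    }

    public static void main(String[] args) {
        String url = "https://some.storage.com/test123456.bucket.com/folder/80.png";
        int value = Integer.parseInt(lastDigits(url));
        System.out.println(value);
    }
}
```

Because the fallback yields an empty string, a caller would want to check for "" before passing the result to Integer.parseInt.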

Regex - how to get windows service state in Java?

When running the command sc query <serviceName>, the following output is returned:
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
How can I extract the service state (in this case is RUNNING) using regex with Java?
You could match:
\b a word boundary
the word STATE followed by
one or more whitespace characters \s+
a colon :
one or more whitespace characters \s+
one or more digits \d+
one or more whitespace characters \s+
Capture in a group (group 1) one or more non-whitespace characters (\S+), which in this case will be RUNNING
\bSTATE\s+:\s+\d+\s+(\S+)
In Java
\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)
Test
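A short sketch of how this pattern might be used with Matcher in Java. The class name ServiceStateParser and the method parseState are hypothetical, and the sample text is the output quoted in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServiceStateParser {
    // Pattern from the answer, written as a Java string literal.
    private static final Pattern STATE =
            Pattern.compile("\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)");

    // Returns the state name, or an empty Optional if no STATE line is found.
    static Optional<String> parseState(String scOutput) {
        Matcher m = STATE.matcher(scOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "TYPE : 10 WIN32_OWN_PROCESS\n"
                + "STATE : 4 RUNNING\n"
                + "(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)\n"
                + "WIN32_EXIT_CODE : 0 (0x0)\n";
        System.out.println(parseState(sample).orElse("UNKNOWN"));
    }
}
```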
You can try the below regex
(.*?STATE\s*:\s*\d*\s*)(\w+)
and match group 2. See this link Regex Solution
You can use this regex:
STATE.*?(\S+)\n
The matched group should be RUNNING. You can see details and example here.
The important part is the use of a lazy star operator *? that allows matching the whole word at the end of the line instead of a single letter.
P.S: depending on whether you match multiline regex or not, you might want to switch the \n for a $.
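The difference between the lazy and the greedy quantifier can be seen in a small sketch. The helper name firstGroup is an assumption for illustration, and the input is assumed to use plain \n line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "STATE : 4 RUNNING\n(STOPPABLE)\n";
        // Lazy .*? gives up characters early, so (\S+) takes the whole last word.
        System.out.println(firstGroup("STATE.*?(\\S+)\\n", input)); // RUNNING
        // Greedy .* takes as much as it can, leaving (\S+) a single letter.
        System.out.println(firstGroup("STATE.*(\\S+)\\n", input));  // G
    }
}
```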
There is another way to quickly get this value using .replaceFirst():
String result = s.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
See the regex demo. Since the string is always in this format, and is not that long, this approach is quite convenient to implement in Java.
See the online Java demo.
Details
(?s) - a DOTALL modifier making . match any char including line break chars
.*? - any 0+ chars, as few as possible
STATE - substring STATE
[\d\s:]* - 0+ digits, whitespaces and :
(\w+) - Capturing group 1 (what we want to keep, later, we can refer to the value using $1 placeholder from the replacement pattern): 1+ word chars
.* - any 0+ chars as many as possible (up to the string end).
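A minimal sketch of the replaceFirst approach, using a hypothetical wrapper named StateViaReplace.state. Note that if the input contains no STATE, replaceFirst returns the input unchanged.

```java
class StateViaReplace {
    // Replaces the whole input with the word captured after STATE and its code.
    static String state(String scOutput) {
        return scOutput.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
    }

    public static void main(String[] args) {
        String sample = "TYPE : 10 WIN32_OWN_PROCESS\n"
                + "STATE : 4 RUNNING\n"
                + "(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)\n";
        System.out.println(state(sample));
    }
}
```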

java regex matching &[text(text - text text) !text]

I am currently working on creating a regex to split out all occurrences of Strings that match the following format: &[text(text - text text) !text]. Here text can be any char really, and the spacing is important. The text will be listed as shown.
I have tried the following regex but I cannot seem to get it to work:
&\\[([^\\]]*)\\]
Any help would be greatly appreciated.
You can replace text with \w+ to capture 1 or more word characters.
Assuming everything else was a literal, the following regular expression should work:
&\[\w+\(\w+ - \w+ \w+\) !\w+\]
You could also use [a-zA-Z] in place of \w if you would like. It is sometimes easier to understand since it explicitly describes the characters to match, a-z and A-Z inclusive.
&\[[a-zA-Z]+\([a-zA-Z]+ - [a-zA-Z]+ [a-zA-Z]+\) ![a-zA-Z]+\]
And for one character only, remove the +
&\[\w\(\w - \w \w\) !\w\]
&\[[a-zA-Z]\([a-zA-Z] - [a-zA-Z] [a-zA-Z]\) ![a-zA-Z]\]
P.S - I can't remember whether -, &, or ! count as regex symbols; if they do, you can make them literals by using \-, \&, or \!. (Outside a character class they are plain literals in Java, so the escapes are optional.)
P.P.S - In java you have to escape \ so \w becomes \\w in a string.
If you want to extract text as groups to work with them after:
&\\[(\\w+)\\((\\w+)\\s\\-\\s(\\w+)\\s(\\w+)\\)\\s!(\\w+)\\]
example
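To show how the captured pieces could be read back in Java, here is a small sketch based on the \w+ pattern with one capturing group per text slot. TagParser and parse are illustrative names, and the pattern assumes every text is made of word characters only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // One capturing group per "text" slot; spaces, "-" and "!" are literals.
    private static final Pattern TAG = Pattern.compile(
            "&\\[(\\w+)\\((\\w+) - (\\w+) (\\w+)\\) !(\\w+)\\]");

    // Returns the five captured parts of every occurrence found in the input.
    static List<List<String>> parse(String input) {
        List<List<String>> result = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            List<String> parts = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                parts.add(m.group(g));
            }
            result.add(parts);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("x &[a(b - c d) !e] y &[f(g - h i) !j]"));
    }
}
```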

How to match ^(d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
[] are used to match any of the things you specify inside the brackets, and are to be followed by a quantifier.
The second regexp will work: ^\(\d+\), so check your code.
Check also so there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, java can be tricky with escapes depending on if the regexp you type is directly translated to a regexp or is first a string literal. You may need to double escape your escapes.
In Java you have to escape parentheses, so "\\(\\d+\\)" should match (1) in case one and two. Adding ^ as you did, "^\\(\\d+\\)", will match only case1.
You have to use double backslashes within a Java string. Consider this
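The two behaviours can be checked with a short sketch. The class and method names below are made up for this example; the inputs are the two cases from the question.

```java
import java.util.regex.Pattern;

class ParenNumber {
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");
    private static final Pattern AT_START = Pattern.compile("^\\(\\d+\\)");

    // True if a parenthesised number occurs anywhere in the text.
    static boolean containsParenNumber(String s) {
        return ANYWHERE.matcher(s).find();
    }

    // True only if the text begins with a parenthesised number.
    static boolean startsWithParenNumber(String s) {
        return AT_START.matcher(s).find();
    }

    public static void main(String[] args) {
        String case1 = "(1) Hello, how are you?";
        String case2 = "Hi. (1) How're you doing?";
        System.out.println(containsParenNumber(case1) + " " + startsWithParenNumber(case1));
        System.out.println(containsParenNumber(case2) + " " + startsWithParenNumber(case2));
    }
}
```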
"\n" give you [line break]
"\\n" give you [backslash][n]
If you are going to downvote my post, at least comment to tell me WHY it's not useful.
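The string-literal escaping rule is easy to verify directly. The following sketch uses an illustrative class name, EscapeDemo, and a hypothetical helper isParenNumber.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // "\\(" in Java source is the two-character regex \( , a literal parenthesis.
    static boolean isParenNumber(String s) {
        return Pattern.matches("\\(\\d+\\)", s);
    }

    public static void main(String[] args) {
        System.out.println("\n".length());   // 1: a single line-break character
        System.out.println("\\n".length());  // 2: a backslash followed by n
        System.out.println(isParenNumber("(42)"));
    }
}
```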
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
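The lookbehind (?<=...) requires the following to appear directly before the match, without including it in the result:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A whitespace character, between 0 and 1 times \s?
The match itself then consists of:
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $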
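A hedged sketch of the lookbehind pattern in Java follows. AfterNumber and textAfterNumber are hypothetical names, and the inputs are the two cases from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterNumber {
    // The bounded lookbehind checks for "(digits)" plus optional whitespace before the match.
    private static final Pattern AFTER =
            Pattern.compile("(?<=[(][0-9]{1,9999}[)]\\s?)\\b.*$");

    // Returns the text following the parenthesised number, or null if there is none.
    static String textAfterNumber(String s) {
        Matcher m = AFTER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterNumber("(1) Hello, how are you?"));
        System.out.println(textAfterNumber("Hi. (1) How're you doing?"));
    }
}
```

Unlike the anchored pattern, this one does not require the number to be at the start of the text, so it also matches in case2.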

Regex for "* word"

Any Regex masters out there? I need a regular expression in Java that matches:
"RANDOMSTUFF SPECIFICWORD"
Including the quotation marks.
Thus I need
to match the first quote,
RANDOMSTUFF (any number of words with spaces between preceding SPECIFICWORD)
SPECIFICWORD (a specific word which I won't specify here.)
and the ending quote.
I don't want to match things such as:
RANDOMSTUFF SPECIFICWORD
"RANDOMSTUFF NOTTHESPECIFICWORD"
"RANDOMSTUFF SPECIFICWORD MORERANDOMSTUFF"
\".*\sSPECIFICWORD\"
If you don't want to allow quotes in between, use \"[^"]*\sSPECIFICWORD\"
. matches any character
* says 0 or more of the preceding character (in this case, 0 or more of any characters)
\s matches any whitespace character
SPECIFICWORD will be treated as a string literal, assuming there are no special characters (escape them if there are)
\" matches the quote
[^"] means any character except a quote (the ^ is what makes it 'except')
Also, this link could be useful. Regexes are powerful expressions and are applicable across virtually any language, so it would be a good thing to become comfortable with using them.
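The stricter pattern (no quotes allowed in between) can be tried against the examples from the question. In this sketch the class name QuotedWordMatcher is made up, and SPECIFICWORD stands in for the real word.

```java
import java.util.regex.Pattern;

class QuotedWordMatcher {
    // In a Java string literal the quote is written \" and \s becomes \\s.
    private static final Pattern QUOTED =
            Pattern.compile("\"[^\"]*\\sSPECIFICWORD\"");

    // True if the text contains a quoted phrase ending in the specific word.
    static boolean matches(String s) {
        return QUOTED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("\"RANDOMSTUFF SPECIFICWORD\""));
        System.out.println(matches("RANDOMSTUFF SPECIFICWORD"));
        System.out.println(matches("\"RANDOMSTUFF NOTTHESPECIFICWORD\""));
        System.out.println(matches("\"RANDOMSTUFF SPECIFICWORD MORERANDOMSTUFF\""));
    }
}
```

Only the first input matches; the other three are the cases the question asks to reject.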
EDIT:
As several other posters have pointed out, adding ^ to the beginning and $ to the end will only match if the entire line matches.
^ matches the beginning of the line
$ matches the end of the line
^".*\s+SPECIFICWORD"$
'^' matches 'from the start of the line'
" matches the opening quote
.* matches anything
\s+ matches 'any amount of whitespace, but at least some'
SPECIFICWORD" is a string literal
$ means 'this is the end of the line'
Note that ^ and $ are not always 'line'-based: by default, in Java and most other languages, they match the start and end of the whole string, and a 'multiline' mode (Pattern.MULTILINE or the (?m) flag in Java) makes them match at the start and end of every line instead.
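How ^ and $ behave in Java with and without Pattern.MULTILINE can be seen in a small sketch. AnchorModes and countMatches are illustrative names, and the two-line input is invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorModes {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String twoLines = "first \"x SPECIFICWORD\"\nsecond \"y SPECIFICWORD\"";
        String regex = "^.*\\sSPECIFICWORD\"$";
        // Default mode: ^ and $ anchor to the whole input, so nothing matches here.
        System.out.println(countMatches(Pattern.compile(regex), twoLines));
        // MULTILINE mode: ^ and $ anchor to each line, so both lines match.
        System.out.println(countMatches(Pattern.compile(regex, Pattern.MULTILINE), twoLines));
    }
}
```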
Will this string be matched on a line-by-line basis, or will it be searched for within a larger text? If it is matched line by line, you can add anchors to ensure that only a whole line matches.
^(\".*\sSPECIFICWORD\")$
Saying, at the start of the line, look for a double quote followed by zero or more random characters followed by a single whitespace, followed by the specific word, followed by a double quote at the end of the string.
Optionally, there are excellent tools for designing regex patterns and seeing what they match in real time.
Here are a couple of examples:
http://gskinner.com/RegExr/
http://regex101.com/r/zC3fM1
Try:
\"[\w\s]*\sSPECIFICWORD\"
Works like this:
\" matches opening quote
[\w\s]* matches zero or more of the characters from the following sets:
[a-zA-Z_0-9] (\w part)
[ \t\n\x0B\f\r] (\s part)
\s matches the whitespace character required before the word, so that NOTTHESPECIFICWORD is rejected
SPECIFICWORD matches the SPECIFICWORD
\" matches closing quote
