Java Pattern regex search between strings - java

Given the following strings (stringToTest):
G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]
G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]
And the Pattern:
Pattern p = Pattern.compile("G2:\\S+RQ:3,4");
if (p.matcher(stringToTest).find())
{
// Match
}
For string 1 I DON'T want a match, because RQ:3,4 is associated with the G3 section, not G2. I want string 2 to match, because there RQ:3,4 is associated with the G2 section.
The problem with the current regex is that it searches too far: in case 1 it eventually reaches the RQ:3,4 of the G3 section, even though I don't want to look past the G2 section.
It's also possible that the stringToTest might be (just one section):
G2:7JAPjGdnGy8jxR8[RQ:3,4]
The strings 7JAPjGdnGy8jxR8 and jRo6pN8ZW9aglYz are variable-length hashes.
Can anyone help me with the correct regex to use? It should start looking for RQ:3,4 at G2, but stop when it reaches the end of the string or -G (the start of the next section).

You may use this regex with a negative lookahead in between:
G2:(?:(?!G\d+:)\S)*RQ:3,4
RegEx Details:
G2:: Match literal text G2:
(?: Start a non-capture group
(?!G\d+:): Assert that we don't have G<digits>: (the next section header) ahead of us
\S: Match a non-whitespace character
)*: End non-capture group. Match 0 or more of this
RQ:3,4: Match literal text RQ:3,4
In Java use this regex:
String re = "G2:(?:(?!G\\d+:)\\S)*RQ:3,4";
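A minimal runnable sketch of this pattern in Java, using find() on the three sample strings from the question. The class and method names are illustrative, not from the original answer.

```java
import java.util.regex.Pattern;

public class G2SectionMatch {
    // Tempered greedy token: consume non-whitespace chars only while no
    // "G<digits>:" section header starts at the current position.
    private static final Pattern G2_RQ34 =
            Pattern.compile("G2:(?:(?!G\\d+:)\\S)*RQ:3,4");

    static boolean hasG2Rq34(String stringToTest) {
        return G2_RQ34.matcher(stringToTest).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", // RQ:3,4 belongs to G3 -> false
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", // RQ:3,4 belongs to G2 -> true
            "G2:7JAPjGdnGy8jxR8[RQ:3,4]"                             // single section -> true
        };
        for (String s : samples) {
            System.out.println(hasG2Rq34(s) + "  " + s);
        }
    }
}
```

Note that a G followed by a letter inside the hash (Gd, Gy) does not stop the token; only G, one or more digits and a colon does.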

The problem is that \S matches any non-whitespace char and the regex engine parses the text from left to right. Once it finds G2:, it grabs all non-whitespace chars to the right (since \S+ is a greedy subpattern) and then backtracks to find the rightmost occurrence of RQ:3,4.
In a general case, you may use
String regex = "G2:(?:(?!-G)\\S)*RQ:3,4";
(?:(?!-G)\S)* is a tempered greedy token that matches 0 or more non-whitespace chars, each of which does not start a -G substring.
If the hyphen is only possible in front of the next section, you may subtract - from \S:
String regex = "G2:[^\\s-]*RQ:3,4"; // using a negated character class
String regex = "G2:[\\S&&[^-]]*RQ:3,4"; // using character class subtraction
[^\s-]* (written [^\\s-]* in a Java string literal) matches 0 or more chars other than whitespace and -.
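A small sketch that checks the three variants from this answer against the question's sample strings and confirms that they agree. The class and helper names are hypothetical; the patterns are the answer's, written as Java string literals.

```java
import java.util.regex.Pattern;

public class G2HyphenVariants {
    static final Pattern[] PATTERNS = {
        Pattern.compile("G2:(?:(?!-G)\\S)*RQ:3,4"), // tempered greedy token: stop before "-G"
        Pattern.compile("G2:[^\\s-]*RQ:3,4"),       // negated class: no whitespace, no hyphen
        Pattern.compile("G2:[\\S&&[^-]]*RQ:3,4")    // class subtraction: \S minus '-'
    };

    // True if every variant's find() result equals the expected value.
    static boolean allAgree(String input, boolean expected) {
        for (Pattern p : PATTERNS) {
            if (p.matcher(input).find() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allAgree("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", false));
        System.out.println(allAgree("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]", true));
        System.out.println(allAgree("G2:7JAPjGdnGy8jxR8[RQ:3,4]", true));
    }
}
```

The two character-class variants avoid the per-character lookahead, which is why they only work under the stated assumption that - never appears inside a section.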

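Try using [^[] instead of \S in this regex: G2:[^[]*\[RQ:3,4
[^[] means any character but [. In Java, the [ inside the character class must be escaped, because an unescaped [ there starts a nested class: String re = "G2:[^\\[]*\\[RQ:3,4";
(This assumes that strings like G2:7JAP[jGd]nGy8[]R8[RQ:3,4] are not possible.)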
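A minimal sketch of this answer's pattern in Java, run against the question's first two samples. Class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class G2BracketMatch {
    // [^\[]* consumes the hash (anything but '['), then \[ requires the
    // section's own opening bracket to be followed directly by RQ:3,4.
    private static final Pattern P = Pattern.compile("G2:[^\\[]*\\[RQ:3,4");

    static boolean matches(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:1,2]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // false
        System.out.println(matches("G2:7JAPjGdnGy8jxR8[RQ:3,4]-G3:jRo6pN8ZW9aglYz[RQ:3,4]")); // true
    }
}
```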

Related

Regex for matching a character later in the string if a certain character is present before

Let's say I have the following string
['json.key']
I want a regex pattern that matches the entire string, because it contains the closing '] matching the opening ['.
But sometimes the [' and '] are absent, and that should be fine too:
jsonKey
But I don't want strings like these to match
['jsonKey
jsonKey']
Because they are missing the matching [' and '].
The current regex pattern I have for this is
(\[')?[\w-]+('])?
But this doesn't quite work because it lets the two last cases pass.
I need a regex pattern for both Java and JavaScript code. They are separate modules, so the patterns can differ.
In Java or JavaScript (lookbehind requires an ES2018+ engine) you can use alternation and lookarounds like this:
(?<!\S)(?:\['[\w-]+']|[\w-]+)(?!\S)
RegEx Details:
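(?<!\S): Assert that the previous char is not a non-whitespace char (i.e. start of string or whitespace)
(?:: Start non-capture group
\['[\w-]+']: Match [' + 1 or more word chars or hyphens + ']
|: OR
[\w-]+: Match 1+ of word char or hyphen
): End non-capture group
(?!\S): Assert that the next char is not a non-whitespace char (i.e. end of string or whitespace)
Note that [\w-] does not include a dot, so to also accept the question's ['json.key'], add it to both classes: [\w.-].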
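A sketch of the Java side, validating whole strings with matches(). Assumption: the key class is widened to [\w.-] so that ['json.key'] from the question is accepted; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class JsonKeyMatch {
    // Either ['key'] or a bare key; [\w.-] is the widened class (assumption).
    // The closing ] is escaped for clarity; Java would also accept it bare.
    private static final Pattern KEY = Pattern.compile(
            "(?<!\\S)(?:\\['[\\w.-]+'\\]|[\\w.-]+)(?!\\S)");

    // matches() requires the whole input to be a single key.
    static boolean isValidKey(String s) {
        return KEY.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"['json.key']", "jsonKey", "['jsonKey", "jsonKey']"}) {
            System.out.println(s + " -> " + isValidKey(s));
        }
    }
}
```

With matches() the lookarounds are redundant; they matter when the same pattern is used with find() to pick keys out of a longer, whitespace-separated text.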

Replace URL String with Integer characters located in the end of that String

I have a URL and tried to use a regex to replace the whole string with the integer at the end of the link.
The URL is something like
https://some.storage.com/test123456.bucket.com/folder/80.png
The regex I tried:
Integer.parseInt(string.replaceAll(".*[^\\d](\\d+)", "$1"))
The output of that regex is "80.png", but I need only "80". I also tried https://regex101.com, and as far as I can see, the main problem is that ".png" is not matched by my regex, so after the substitution it remains appended to the group value.
I'm a total beginner with regex, so I'd appreciate any help.
You may use
String result = string.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
NOTE: If there is no match, the result is the original string value, unchanged. If you do not want this behavior, use "(?:.*\\D)?(\\d+).*|.+" instead of "(?:.*\\D)?(\\d+).*"; a string without digits is then replaced with an empty string.
Details
(?:.*\D)? - an optional (it must be optional because the Group 1 pattern might also be matched at the start of the string) sequence of
.* - any 0+ chars other than line break chars, as many as possible
\D - a non-digit
(\d+) - Group 1: any one or more digits
.* - any 0+ chars other than line break chars, as many as possible
The replacement is $1, a backreference to the Group 1 value, i.e. the last run of 1+ digits in the string (provided the string has no line breaks).
Line breaks can be supported if you prepend the pattern with the (?s) inline DOTALL modifier, i.e. "(?s)(?:.*\\D)?(\\d+).*|.+".
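A minimal end-to-end sketch combining this replaceAll with the Integer.parseInt call from the question. The class and method names are illustrative.

```java
public class TrailingNumber {
    static int lastNumber(String url) {
        // Optional "anything ending in a non-digit" prefix, then the last
        // digit run (Group 1), then the rest of the string; keep only $1.
        String digits = url.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        String url = "https://some.storage.com/test123456.bucket.com/folder/80.png";
        System.out.println(lastNumber(url)); // 80
    }
}
```

If the URL contains no digits, replaceAll returns it unchanged and parseInt throws a NumberFormatException, so callers may want to catch that.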

Regex - how to get windows service state in Java?

Running the command sc query <serviceName> returns the following output:
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
How can I extract the service state (RUNNING in this case) using a regex in Java?
You could match:
\b a word boundary
the word STATE, followed by
one or more whitespace characters \s+
a colon :
one or more whitespace characters \s+
one or more digits \d+
one or more whitespace characters \s+
Capture in a group (group 1) one or more non-whitespace characters (\S+), which in this case will be RUNNING
\bSTATE\s+:\s+\d+\s+(\S+)
In Java
\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)
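A minimal sketch using this pattern with Matcher.find() and group(1). The sample text is the output shown in the question; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServiceState {
    private static final Pattern STATE =
            Pattern.compile("\\bSTATE\\s+:\\s+\\d+\\s+(\\S+)");

    // Returns the state name (e.g. RUNNING), or null if no STATE line is found.
    static String state(String scOutput) {
        Matcher m = STATE.matcher(scOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
                "TYPE : 10 WIN32_OWN_PROCESS",
                "STATE : 4 RUNNING",
                "(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)",
                "WIN32_EXIT_CODE : 0 (0x0)");
        System.out.println(state(output)); // RUNNING
    }
}
```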
You can try the regex below:
(.*?STATE\s*:\s*\d*\s*)(\w+)
and take the value of group 2.
You can use this regex:
STATE.*?(\S+)\n
The captured group should be RUNNING.
The important part is the lazy quantifier *?: with a greedy .* the group would capture only the last letter (G), while the lazy version lets (\S+) capture the whole word at the end of the line.
P.S.: depending on whether you use multiline mode or not, you might want to replace the \n with a $.
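A sketch of the P.S. variant: with the (?m) flag, $ matches just before a line terminator, which also covers Windows-style \r\n line endings. The \r\n input is an assumption about how the captured process output might look; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServiceStateMultiline {
    // (?m): $ matches before any line terminator, including the \r of \r\n.
    // A literal \n after (\S+) would not match there, since \S+ stops at \r.
    private static final Pattern STATE = Pattern.compile("(?m)STATE.*?(\\S+)$");

    // Returns the last non-whitespace token on the first STATE line, or null.
    static String state(String scOutput) {
        Matcher m = STATE.matcher(scOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String unix = "TYPE : 10 WIN32_OWN_PROCESS\nSTATE : 4 RUNNING\nWAIT_HINT : 0x0";
        String windows = unix.replace("\n", "\r\n");
        System.out.println(state(unix));    // RUNNING
        System.out.println(state(windows)); // RUNNING
    }
}
```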
There is another way to quickly get this value using .replaceFirst():
String result = s.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
Since the string always has this format and is not very long, this approach is quite convenient in Java.
Details
(?s) - a DOTALL modifier making . match any char including line break chars
.*? - any 0+ chars, as few as possible
STATE - substring STATE
[\d\s:]* - 0+ digits, whitespaces and :
(\w+) - Capturing group 1 (the part we want to keep; the replacement pattern refers to it with the $1 placeholder): 1+ word chars
.* - any 0+ chars as many as possible (up to the string end).
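A minimal sketch wrapping this replaceFirst call in a helper. The sample text is the question's output; the class and method names are illustrative.

```java
public class ServiceStateReplaceFirst {
    // The pattern spans the whole input, so the result is just Group 1.
    // If the input has no STATE, there is no match and it is returned unchanged.
    static String state(String scOutput) {
        return scOutput.replaceFirst("(?s).*?STATE[\\d\\s:]*(\\w+).*", "$1");
    }

    public static void main(String[] args) {
        String output = String.join("\n",
                "TYPE : 10 WIN32_OWN_PROCESS",
                "STATE : 4 RUNNING",
                "(STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)",
                "WIN32_EXIT_CODE : 0 (0x0)");
        System.out.println(state(output)); // RUNNING
    }
}
```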

Regex match a word that starts with a string

How can I write a regex that matches only one word, the one that starts with big?
I tried to build a regex from a start string and an end string: big as the start and \s (a space) as the end.
Consider this line: You are my big-big-big friend and also a brother
When I use the regex below, it gives me big-big-big friend and also a brother as the result
(.big.*\s)
But I expect the result big-big-big. The word can be at the start of the line or at the end. I want a regex that matches the full word starting with big.
Help would be appreciated.
The following regex may be used:
(?<!\S)big\S*
Details:
(?<!\S) - a negative lookbehind that makes sure there is start of string or a whitespace immediately to the left of the current location
big - a literal substring
\S* - any 0 or more chars other than whitespace chars
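A minimal sketch that collects every word starting with big using this pattern and a find() loop. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BigWords {
    // (?<!\S): start of input or whitespace right before "big";
    // \S*: the rest of the word up to the next whitespace.
    private static final Pattern BIG_WORD = Pattern.compile("(?<!\\S)big\\S*");

    static List<String> bigWords(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BIG_WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bigWords("You are my big-big-big friend and also a brother")); // [big-big-big]
        System.out.println(bigWords("ambiguous words")); // [] ("big" is mid-word)
    }
}
```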
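You can use the regex
(?!\s)big\S*
It'll match what you asked for in the example sentence.
Explanation:
(?!\s)
A negative lookahead: it asserts that the next character is not whitespace, without consuming anything. Since b is never whitespace, this check always succeeds here and says nothing about what comes before big.
big
Matches the literal text big
\S*
Matches any character that's NOT whitespace, 0 or more times
So:
(?!\s)big\S*
Finds big followed by anything that's not whitespace, until it hits whitespace. Because (?!\s) looks ahead rather than behind, the pattern can also match inside a word (for example, biguous in ambiguous); to require that the word starts with big, use a lookbehind such as (?<!\S) instead.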

regular expressions using java.util.regex API - java

How can I create a regular expression to search for strings with a given pattern? For example, I want to find all strings that match the wildcard pattern '*index.tx?'. This should find strings with values such as index.txt, mainindex.txt and somethingindex.txp.
Pattern pattern = Pattern.compile("*.html");
Matcher m = pattern.matcher("input.html");
This code is obviously not working.
You need to learn regular expression syntax. It is not the same as using wildcards. Try this:
Pattern pattern = Pattern.compile("^.*index\\.tx.$");
You may find the program RegexBuddy useful while you are learning regular expressions.
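A minimal sketch applying this answer's regex to the file names from the question with matches(). The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class IndexFileMatch {
    // Regex equivalent of the wildcard '*index.tx?':
    // .* for '*', \. for the literal dot, . for '?'.
    private static final Pattern P = Pattern.compile("^.*index\\.tx.$");

    static boolean isIndexFile(String name) {
        return P.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"index.txt", "mainindex.txt", "somethingindex.txp", "index.html"}) {
            System.out.println(name + " -> " + isIndexFile(name));
        }
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors are redundant here; they become significant with find().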
The code you posted does not work because:
dot . is a special regex character. It means one instance of any character.
* means any number of occurrences of the preceding token. In "*.html" it has nothing to repeat, so Pattern.compile throws a PatternSyntaxException.
therefore, .* means any number of occurrences of any character.
so you would need something like
Pattern pattern = Pattern.compile(".*\\.html.*");
the \\ is needed because we want to match a literal dot, which is a special regex character; in a Java string literal the backslash itself must be escaped.
this means: match a string that starts with any number of arbitrary characters, followed by a dot, followed by html, followed by anything.
* matches zero or more occurrences of the preceding token, so if you want to match zero or more of any character, use .* instead (. matches any char).
Modified regex should look something like this:
Pattern pattern = Pattern.compile("^.*\\.html$");
^ matches the start of the string
.* matches zero or more of any char
\\. matches the dot char (if not escaped it would match any char)
$ matches the end of the string
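Since the question starts from wildcard patterns, here is a sketch of a hypothetical globToRegex helper (not part of java.util.regex) that translates * to .* and ? to . and quotes everything else with Pattern.quote. It handles only those two wildcards, not bracket expressions or escapes.

```java
import java.util.regex.Pattern;

public class GlobToRegex {
    // Hypothetical helper: '*' -> ".*", '?' -> ".", other text quoted literally.
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    static boolean globMatches(String glob, String input) {
        return Pattern.compile(globToRegex(glob)).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(globMatches("*.html", "input.html"));        // true
        System.out.println(globMatches("*index.tx?", "mainindex.txt")); // true
        System.out.println(globMatches("*.html", "inputXhtml"));        // false: the dot is literal
    }
}
```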
