RegEx for combining multiple sequences - java

Like many people, I am struggling with what seems to be a "trivial" regex issue.
In a given text, whenever I encounter a word within {} brackets I need to extract it. At first I used
"\\{-?(\\w{3,})\\}"
and it worked OK, as long as the word didn't contain any whitespace or a special character like '.
For example, {Project} returns Project, but {Project Test} or {Project D'arce} don't return anything.
I know that for whitespace characters I need to use \s, but it is not at all clear to me how to add it to the pattern above. I tried:
"%\\{-?(\\w(\\s{3,})\\)\\}"))
but it does not work. Also, what if I want to allow words containing special characters like '? It's really frustrating.

How about matching any character inside {..} which is not }?
To do so, you can use a negated character class [^..] like [^}]. So your regex can look like
"\\{[^}]{3,}\\}"
But if you want to limit your regex to a specific set of characters, you can also use a character class to combine several characters and even predefined shorthand character classes like \w, \s, \d and so on.
So if you want to accept any word character \w, whitespace \s or ', your regex can look like
"\\{[\\w\\s']{3,}\\}"
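For illustration, here is a minimal sketch of how that last pattern could be used from Java to pull the words out. The class name BraceWordExtractor and the method extract are made up for this example, and the parentheses around the character class are an added assumption so that group(1) returns the text without the braces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceWordExtractor {
    // Same character class as above; the capture group is an assumption
    // added here so the braces are not part of the extracted text.
    private static final Pattern BRACED = Pattern.compile("\\{([\\w\\s']{3,})\\}");

    // Collects the content of every {...} that has at least 3 allowed chars.
    static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = BRACED.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Project, Project Test, Project D'arce]
        System.out.println(extract("{Project} and {Project Test} and {Project D'arce} and {ab}"));
    }
}
```

{ab} is skipped because the {3,} quantifier requires at least three characters between the braces.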

You could use a character class [\w\s'] and add to it whatever else you want to allow:
\{-?([\w\s']{3,})}
In Java
String regex = "\\{-?([\\w\\s']{3,})}";
If you want to prevent a match that consists only of whitespace (for example {   }), you could use a repeating group:
\{-?\h*([\w']{3,}(?:\h+[\w']+)*)\h*}
About the pattern
\{ Match { char
-? Optional hyphen
\h* Match 0+ times a horizontal whitespace char
([\w']{3,} Start the capture group and match 3 or more times either a word char or '
(?:\h+[\w']+)* Repeat 0+ times matching 1+ horizontal whitespace chars followed by 1+ chars listed in the character class
) Close the capture group
\h* Match 0+ times a horizontal whitespace char
} Match }
In Java
String regex = "\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}";
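As a rough sketch of how this second pattern behaves, the snippet below runs it over a few inputs and collects group 1. The class name TrimmedBraceExtractor and the method extract are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimmedBraceExtractor {
    // The repeating-group pattern from above, as a Java string literal.
    private static final Pattern BRACED =
            Pattern.compile("\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}");

    // Returns group 1 of every match, i.e. the words without the braces,
    // the optional hyphen and the surrounding horizontal whitespace.
    static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = BRACED.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Project, Project Test, Project D'arce]
        System.out.println(extract("{Project} { Project Test } {-Project D'arce} {   }"));
    }
}
```

Note that the first word still has to be at least 3 characters long, so something like {ab cd} would not match with this pattern.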

Related

Regex for matching a character later in the string if a certain character is present before

Let's say I have the following string
['json.key']
I want a regex pattern that will match the entire string because it contains the matching closing '] to the opening ['.
But sometimes the [' and '] don't have to exist, and it should be okay too.
jsonKey
But I don't want strings like these to match
['jsonKey
jsonKey']
Because they are missing the matching [' and '].
The current regex pattern I have for this is
(\[')?[\w-]+('])?
But this doesn't quite work because it lets the two last cases pass.
I need a regex pattern for both Java and JavaScript code. They are separate modules, so the two patterns could differ.
In Java or JavaScript you can use alternation and lookarounds like this (in JavaScript, the lookbehind needs an engine that supports ES2018):
(?<!\S)(?:\['[\w-]+']|[\w-]+)(?!\S)
RegEx Details:
(?<!\S): Assert that previous char is not a non-whitespace
(?:: Start non-capture group
\['[\w-]+']: Match ['<1+ word or hyphen chars>']
|: OR
[\w-]+: Match 1+ of word char or hyphen
): End non-capture group
(?!\S): Assert that next char is not a non-whitespace
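On the Java side, a minimal sketch of using this pattern could look like the following. The class name BracketedKeyFinder and the method findKeys are invented for the example, and the sample input uses keys without a dot, since [\w-] does not match a . character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketedKeyFinder {
    // Pattern from the answer above, written as a Java string literal.
    private static final Pattern KEY =
            Pattern.compile("(?<!\\S)(?:\\['[\\w-]+']|[\\w-]+)(?!\\S)");

    // Returns every whitespace-delimited token that is either a bare key
    // or a key wrapped in a complete [' ... '] pair.
    static List<String> findKeys(String text) {
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(text);
        while (m.find()) {
            keys.add(m.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [jsonKey, ['json-key'], fine]
        System.out.println(findKeys("jsonKey ['json-key'] ['broken alsoBroken'] fine"));
    }
}
```

The half-bracketed tokens ['broken and alsoBroken'] are rejected because the lookarounds force each alternative to span the whole token.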

How to check if substring is contained within word Java

I want to write a regex pattern that looks at a string to see if there is a "." followed by letters or numbers or both with no space in between.
Currently I have:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[a-z]+", testStr)
But this doesn't work if there are numbers or symbols after the ".". Can someone help me find a regex string that will return true for the string:
asdad-asdd/asdcs.pd(210)fsd
Just to reiterate, the criterion for a successful match is that the string contains any possible combination of letters, numbers, and/or symbols before and after a "."
You can match these strings by replacing [a-z]+ with [a-z\d\p{Punct}]+:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[a-z\\d\\p{Punct}]+", testStr)
The [a-z\d\p{Punct}]+ pattern matches lowercase ASCII letters, digits or punctuation. Add A-Z into the brackets if you plan to allow uppercase ASCII letters.
However, you might also match any non-whitespace chars with \S+:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.\\S+", testStr)
If you do not want to allow another dot:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[^\\s.]+", testStr)
Here, [^\\s.]+ matches one or more chars other than whitespace and . chars.
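To compare the variants side by side, here is a small sketch that runs the original pattern and the suggested replacements against the string from the question. The class name DotSuffixCheck and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

class DotSuffixCheck {
    // Original pattern from the question: only lowercase letters after the dot.
    static boolean lettersOnly(String s) {
        return Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[a-z]+", s);
    }

    // Lowercase letters, digits or punctuation after the dot.
    static boolean lettersDigitsPunct(String s) {
        return Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[a-z\\d\\p{Punct}]+", s);
    }

    // Any non-whitespace chars after the dot.
    static boolean anyNonWhitespace(String s) {
        return Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.\\S+", s);
    }

    public static void main(String[] args) {
        String testStr = "asdad-asdd/asdcs.pd(210)fsd";
        System.out.println(lettersOnly(testStr));        // prints false
        System.out.println(lettersDigitsPunct(testStr)); // prints true
        System.out.println(anyNonWhitespace(testStr));   // prints true
    }
}
```

Because Pattern.matches requires the whole input to match, a string such as "asdcs. pd" is still rejected by all three variants: the text after the dot contains a space.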

how to build validation regex with starting and ending character validations?

How can I build an explicit regex for a string with a letter at the start, underscores or digits in the middle, and a letter or digit at the end?
The pattern I have tried so far can be seen here, with test cases:
https://regex101.com/r/JedpJu/3
I want to filter out strings like the following:
_ (only underscore)
9a_d (string starting with numbers)
ad_ (ending with underscores)
EDIT
ad*d_rr (any special character apart from underscore also should not be allowed.)
You may use
^[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?$
which is the same as
^[A-Za-z](?:\w*[A-Za-z0-9])?$
In Java, you may use it with .matches() and omit the anchors:
s.matches("[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?")
s.matches("[A-Za-z](?:\\w*[A-Za-z0-9])?")
If the string may include line breaks use
s.matches("(?s)[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?")
s.matches("(?s)[A-Za-z](?:\\w*[A-Za-z0-9])?")
where (?s) enables . to match line break chars. Note that the patterns above contain no ., so they behave the same with or without the flag.
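A quick way to check the pattern against the cases from the question is a sketch like the one below. The class name IdentifierValidator and the method isValid are hypothetical; the regex is the .matches() version shown above.

```java
class IdentifierValidator {
    // Letter first, then optionally any word chars ending in a letter or digit.
    // String.matches() must match the whole string, so no anchors are needed.
    static boolean isValid(String s) {
        return s.matches("[A-Za-z](?:\\w*[A-Za-z0-9])?");
    }

    public static void main(String[] args) {
        String[] samples = {"_", "9a_d", "ad_", "ad*d_rr", "a", "ad_d", "a_1"};
        for (String s : samples) {
            // Only "a", "ad_d" and "a_1" are reported as valid.
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The optional group is what allows a single letter such as "a" to pass while still rejecting a trailing underscore.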

regex capture includes too much

I have a string from which I would like to capture everything after and including a colon, up to (but excluding) whitespace or a parenthesis.
Why does the following regex include the parenthesis in the string match?
:(.*?)[\(\)\s] or also :(.+?)[\)\s] (non-greedy) does not work.
Example input: WHERE t.operator_id = :operatorID AND (t.merchant_id = :merchantID) AND t.readerApplication_id = :readerApplicationID AND t.accountType in :accountTypes
It should extract :operatorID, :merchantID, :readerApplicationID, :accountTypes.
But for the second match my regexes extract :merchantID)
What is wrong and why?
Even if I use a more exact matching condition in the capture, it does not work: :([a-zA-z0-9_]+?)[\)\(\s]
Put your conditional "followed by space or paren" as a lookahead, so that it sees but doesn't match. Right now you are explicitly matching parentheses with [\(\)\s]:
:(.+?)(?=[\s\(\)])
https://regex101.com/r/im8KWF/1/
Or, use the built-in \b "word boundary", which is also a "zero-width" assertion and works here because the parameter names consist only of word characters*:
:(.+?)\b
https://regex101.com/r/FnnzGM/3/
*Definition of word boundary from regular-expressions.info:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
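Here is a minimal Java sketch of the lookahead approach applied to the example input. The class name NamedParamExtractor and the method extract are made up for the example. One deliberate change, which is an assumption on my part and not part of the pattern above: the lookahead also accepts |$, because the last parameter in the sample sits at the very end of the input, where no whitespace or parenthesis follows it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedParamExtractor {
    // Lookahead version; the "|$" alternative is an added assumption so that
    // a parameter at the end of the input is found as well.
    private static final Pattern PARAM = Pattern.compile(":(.+?)(?=[\\s()]|$)");

    // Returns each parameter including its leading colon (the whole match).
    static List<String> extract(String sql) {
        List<String> params = new ArrayList<>();
        Matcher m = PARAM.matcher(sql);
        while (m.find()) {
            params.add(m.group());
        }
        return params;
    }

    public static void main(String[] args) {
        String input = "WHERE t.operator_id = :operatorID AND (t.merchant_id = :merchantID) "
                + "AND t.readerApplication_id = :readerApplicationID AND t.accountType in :accountTypes";
        // prints [:operatorID, :merchantID, :readerApplicationID, :accountTypes]
        System.out.println(extract(input));
    }
}
```

Since the lookahead does not consume anything, the closing parenthesis after :merchantID stays out of the match.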

java regex matching &[text(text - text text) !text]

I am currently working on creating a regex to split out all occurrences of strings that match the following format: &[text(text - text text) !text]. Here text can really be any characters, and the spacing is important. The text will be listed as shown.
I have tried the following regex but I cannot seem to get it to work:
&\\[([^\\]]*)\\]
Any help would be greatly appreciated.
You can replace text with \w+ to match 1 or more word characters.
Assuming everything else was a literal, the following regular expression should work:
&\[\w+\(\w+ - \w+ \w+\) !\w+\]
You could also use [a-zA-Z] in place of \w if you would like. It is sometimes easier to understand since it explicitly describes the characters to match, a-z and A-Z inclusive.
&\[[a-zA-Z]+\([a-zA-Z]+ - [a-zA-Z]+ [a-zA-Z]+\) ![a-zA-Z]+\]
And for one character only, remove the +:
&\[\w\(\w - \w \w\) !\w\]
&\[[a-zA-Z]\([a-zA-Z] - [a-zA-Z] [a-zA-Z]\) ![a-zA-Z]\]
P.S. I can't remember whether -, & or ! count as regex symbols; if they do, you can make them literals by using \-, \& or \!.
P.P.S. In Java you have to escape \ in a string literal, so \w becomes \\w.
If you want to extract text as groups to work with them after:
&\\[(\\w+)\\((\\w+)\\s\\-\\s(\\w+)\\s(\\w+)\\)\\s!(\\w+)]
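As a rough sketch of working with those groups in Java: the class name AmpersandGroups and the method groups are invented for this example, and the pattern is the one above with every \w written as \\w so that it is a valid Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandGroups {
    // Five capture groups, one per "text" slot in &[text(text - text text) !text].
    private static final Pattern ITEM = Pattern.compile(
            "&\\[(\\w+)\\((\\w+)\\s\\-\\s(\\w+)\\s(\\w+)\\)\\s!(\\w+)]");

    // Returns the five captured words of the first match, or an empty list.
    static List<String> groups(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz, qux, end]
        System.out.println(groups("prefix &[foo(bar - baz qux) !end] suffix"));
    }
}
```

If the spacing differs from the expected layout, for example a missing space around the hyphen, the pattern finds nothing and the list stays empty.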
