java regex matching &[text(text - text text) !text] - java

I am currently working on a regex to pull out all occurrences of strings that match the following format: &[text(text - text text) !text]. Here text can be pretty much any characters, and the spacing is significant: the text will always be laid out exactly as shown.
I have tried the following regex but I cannot seem to get it to work:
&\\[([^\\]]*)\\]
Any help would be greatly appreciated.

You can replace text with \w+ to match 1 or more word characters.
Assuming everything else was a literal, the following regular expression should work:
&\[\w+\(\w+ - \w+ \w+\) !\w+\]
You could also use [a-zA-Z] in place of \w if you would like. It is sometimes easier to understand since it explicitly describes the characters to match, a-z and A-Z inclusive.
&\[[a-zA-Z]+\([a-zA-Z]+ - [a-zA-Z]+ [a-zA-Z]+\) ![a-zA-Z]+\]
And for one character only, remove the +
&\[\w\(\w - \w \w\) !\w\]
&\[[a-zA-Z]\([a-zA-Z] - [a-zA-Z] [a-zA-Z]\) ![a-zA-Z]\]
P.S. - I can't remember whether -, &, or ! count as regex metacharacters; if they do, you can make them literals by using \-, \&, or \!. (Outside a character class none of them is special, so the escapes are optional.)
P.P.S. - In Java you have to escape \ so \w becomes \\w in a string.
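To tie the two postscripts together, here is a minimal sketch of how the \w+ version might be used from Java to collect every occurrence. The class name TokenFinder, the method findTokens and the sample input are made up for illustration; only the regex comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // The regex &\[\w+\(\w+ - \w+ \w+\) !\w+\] as a Java string literal:
    // every backslash is doubled.
    private static final Pattern TOKEN =
            Pattern.compile("&\\[\\w+\\(\\w+ - \\w+ \\w+\\) !\\w+\\]");

    // Returns every substring of input that has the token format.
    static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        String input = "a &[foo(bar - baz qux) !end] b &[x(y - z w) !v] c &[broken]";
        // prints [&[foo(bar - baz qux) !end], &[x(y - z w) !v]]
        System.out.println(findTokens(input));
    }
}
```

Note that \w only covers letters, digits and underscore, so if "text" really can be any character this sketch would need a wider character class.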

If you want to extract text as groups to work with them after:
&\\[(\\w+)\\((\\w+)\\s\\-\\s(\\w+)\\s(\\w+)\\)\\s!(\\w+)]
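A short sketch of reading those groups back out with Matcher.group(n). The class TokenParts, the method parts and the sample token are assumptions for illustration; the pattern is the one above with all five slots captured and every backslash doubled for the Java string.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenParts {
    // One capturing group per "text" slot of &[text(text - text text) !text].
    private static final Pattern TOKEN = Pattern.compile(
            "&\\[(\\w+)\\((\\w+)\\s-\\s(\\w+)\\s(\\w+)\\)\\s!(\\w+)\\]");

    // Returns the five captured parts of the first token found, or null if there is none.
    static String[] parts(String input) {
        Matcher m = TOKEN.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i); // groups are numbered from 1
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample token.
        // prints [foo, bar, baz, qux, end]
        System.out.println(Arrays.toString(parts("x &[foo(bar - baz qux) !end] y")));
    }
}
```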

Related

Escaping ampersand in the regexp

I have to match words that end with an ampersand character. I came up with this regex: \w*\&\b. The same approach works correctly for any letter, for example \w*a\b, but when I use an escaped ampersand (as in the first example) it won't match words ending with it.
Btw I was using https://regex101.com/ to test my regexps.
Here's the same regex provided by @MonkeyZeus with a small correction to accept a whole word ending with multiple ampersands (e.g. wordendingwithmultiple&&&&):
\w+\&+(?=\W|$)
The only change is \&+ (one or more ampersands) instead of a single &.
You can use this:
\w+&(?=\W|$)
\w+ - require at least one word char
& - followed by an ampersand
(?=\W|$) - positive lookahead after the ampersand for a non-word char or the end of the line
Just make sure to double up on the backslashes for Java string escaping:
\\w+&(?=\\W|$)
https://regex101.com/r/WaSEyJ/1/
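For what it's worth, the original \w*\&\b fails because & is not a word character, so the \b after it only holds when a word character follows the ampersand. A rough Java sketch comparing the two, using the multi-ampersand variant; the class AmpersandWords, its method and the sample sentence are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandWords {
    // \w+&+(?=\W|$): word chars, one or more ampersands, then a non-word char or the end.
    private static final Pattern WORD_AMP = Pattern.compile("\\w+&+(?=\\W|$)");

    // The pattern from the question, kept only for comparison.
    static final Pattern ORIGINAL = Pattern.compile("\\w*&\\b");

    // Returns all words that end with one or more ampersands.
    static List<String> find(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_AMP.matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; "qux&x" is skipped because a word char follows the &.
        // prints [foo&, baz&&&, end&]
        System.out.println(find("foo& bar baz&&& qux&x end&"));
        // prints false: there is no word boundary between '&' and ' '
        System.out.println(ORIGINAL.matcher("foo& bar").find());
    }
}
```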

RegEx for combining multiple sequences

Like many people, I am struggling with what seems to be a "trivial" regex issue.
In a given text, whenever I encounter a word within {} brackets I need to extract it. At first I used
"\\{-?(\\w{3,})\\}"
and it worked OK,
as long as the word didn't contain any whitespace or a special character like '.
For example, {Project} returns Project, but {Project Test} or {Project D'arce} don't return anything.
I know that I need to use \s for whitespace characters, but it is not at all clear to me how to add it to the pattern above. I tried:
"%\\{-?(\\w(\\s{3,})\\)\\}"))
but it does not work. Also, what if I want to match words containing special characters like '? It's really frustrating.
How about matching any character inside {..} which is not }?
To do so you can use negated character class [^..] like [^}]. So your regex can look like
"\\{[^}]{3,}\\}"
But if you want to limit your regex only to some specific alphabet you can also use character class to combine many characters and even predefined shorthand character classes like \w \s \d and so on.
So if you want to accept any word character \w or whitespace \s or ' your regex can look like
"\\{[\\w\\s']{3,}\\}"
You could use a character class [\w\s'] and add to it whatever else you want to allow:
\{-?([\w\s']{3,})}
In Java
String regex = "\\{-?([\\w\\s']{3,})}";
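A minimal sketch of using that pattern to pull out the text between the braces via group 1. BraceWords and extract are illustrative names, and the input is assembled from the examples in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceWords {
    // Optional hyphen, then 3 or more word chars, whitespace chars or apostrophes.
    private static final Pattern BRACED = Pattern.compile("\\{-?([\\w\\s']{3,})}");

    // Collects group 1 (the text between the braces) for every match.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BRACED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // {ab} is skipped because it has fewer than 3 characters.
        // prints [Project, Project Test, Project D'arce]
        System.out.println(extract("{Project} {Project Test} {Project D'arce} {ab}"));
    }
}
```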
If you want to avoid matching braces that contain only whitespace (for example three spaces), you could use a repeating group:
\{-?\h*([\w']{3,}(?:\h+[\w']+)*)\h*}
About the pattern
\{ Match { char
-? Optional hyphen
\h* Match 0+ times a horizontal whitespace char
([\w']{3,} Start the capture group and match 3 or more times either a word char or '
(?:\h+[\w']+)*) Repeat 0+ times matching 1+ horizontal whitespace chars followed by 1+ chars from the character class, then close the group
\h* Match 0+ times a horizontal whitespace char
} Match }
In Java
String regex = "\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}";
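To illustrate the difference, here is a small sketch, assuming Java 8 or later for \h. TrimmedBraceWords and first are hypothetical names. It shows that the captured text comes back without the padding and that braces holding only spaces are rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimmedBraceWords {
    // Words of [\w'] separated by horizontal whitespace; padding stays outside group 1.
    private static final Pattern BRACED =
            Pattern.compile("\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}");

    // Returns the captured text of the first match, or null if nothing matches.
    static String first(String text) {
        Matcher m = BRACED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(first("{ Project Test }"));   // prints Project Test
        System.out.println(first("{-Project D'arce}"));  // prints Project D'arce
        System.out.println(first("{   }"));              // prints null
    }
}
```

One consequence of this pattern is that the first word must have at least three characters, so {ab cd} would not match.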

Difficulties finding a Java regex equivalent to a JavaScript regex

So, what I am trying to do is:
I have a string:
Special Skills:
someText
could range
through multiple lines
Special Abilities:
another
someText
Background:
multiline
text
I've already managed to come up with the following regex. It works perfectly in JavaScript according to regexr.com, but not in Java, according to IntelliJ's built-in Check-Regex and freeformatter.com.
Special Abilities:\n(.*\n)+?(Special Skills:|Background:)
The expression should, first off, extract
Special Skills:
someText
could range
through multiple lines
Mind that both the "Special Abilities" and "Background" sections are optional.
Since I am kind of stuck here, any help would be greatly appreciated!
You may add the end-of-string (line) anchor $ as an extra alternative in the group at the end of the pattern, make sure . matches carriage returns by using the (?d) embedded flag (Pattern.UNIX_LINES), and wrap (.*\n)+? in a capturing group so that all the text it matches lands in Group 1 (the inner (.*\n) can then become a non-capturing group):
(?d)Special Abilities:\r?\n((?:.*\n)*?)(Special Skills:|Background:|$)
Details
(?d) - only \n is treated as a line terminator, so . now matches any char but LF (including CR)
Special Abilities: - a literal text
\r?\n - a CRLF or LF line ending
((?:.*\n)*?) - Group 1: zero or more, but as few as possible, repetitions of 0+ chars other than an LF symbol and then an LF symbol
(Special Skills:|Background:|$) - either of the three alternatives: Special Skills:, Background: or end of string ($).
An alternative expression:
(?ms)Special Abilities:\r?\n(.*?)(^Special Skills:|^Background:|\Z)
Here, (?ms) defines the multiline and dotall modes (^ will match start of a line here and . will match all symbols). Instead of $, we need to use \Z - end of string anchor.
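A minimal sketch of running the (?ms) variant from Java, assuming the text uses \n line endings as in the question. SectionExtractor and abilities are made-up names; the sample text is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // (?m): ^ matches at line starts; (?s): . also matches line terminators.
    private static final Pattern ABILITIES = Pattern.compile(
            "(?ms)Special Abilities:\\r?\\n(.*?)(^Special Skills:|^Background:|\\Z)");

    // Returns the body of the "Special Abilities" section, or null if the header is absent.
    static String abilities(String text) {
        Matcher m = ABILITIES.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Special Skills:\nsomeText\ncould range\nthrough multiple lines\n"
                + "Special Abilities:\nanother\nsomeText\n"
                + "Background:\nmultiline\ntext";
        // prints the two lines "another" and "someText"
        System.out.print(abilities(text));
    }
}
```

Because of the \Z alternative, the same call still returns the section body when no "Background:" header follows it.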

Regex Match word that include a Dot

I have a question. Take this sentence, for example:
"HalloAnna daveca.nn dave anna ca. anna"
I only want to match the standalone "ca.".
My regex looks like this:
(?i)\b(ca\.)\b
But this doesn't work and I don't know why. Any ideas?
//Update
I execute it with:
testSource.replaceAll()
and with
pattern.matcher(testSource).replaceAll().
Neither works.
You must escape the dot and assert a non-word following:
(?i)\bca\.(?=\W)
You should use it like this:
Pattern.compile("(?i)\\b(ca\\.)(?=\\W)").matcher(a).replaceAll("SOME TEXT");
Without the Java escapes, that regex is (?i)\b(ca\.)(?=\W).
Every \ in a normal regex has to be escaped in a Java string as \\.
Also, you have a word boundary (\b) before the word, but \b only matches where a word character (letter, digit or underscore) meets a non-word character, or the other way around. In your case the match ends with a dot, which is not a word character, and it is followed by a space, so there is no boundary there and you can't use \b at the end. You can use \W instead, which requires a non-word character after the dot. So that the \W is not replaced along with the match, put it in a lookahead: (?=\W).
Another issue is the dot itself: an unescaped . matches any character, so to match a literal dot you have to escape it as \., which in a Java string becomes \\..
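Here is a small runnable sketch of the replacement on the sentence from the question. The class name CaReplacer and its replace method are invented for the example, and Matcher.quoteReplacement is only there so that a replacement containing $ or \ is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaReplacer {
    // \b before "ca", a literal dot, then a lookahead so the following char is not consumed.
    private static final Pattern CA = Pattern.compile("(?i)\\bca\\.(?=\\W)");

    // The pattern from the question, kept only to show that it finds nothing.
    static final Pattern ORIGINAL = Pattern.compile("(?i)\\b(ca\\.)\\b");

    // Replaces every standalone "ca." with the given literal text.
    static String replace(String source, String replacement) {
        return CA.matcher(source).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String source = "HalloAnna daveca.nn dave anna ca. anna";
        // prints HalloAnna daveca.nn dave anna SOME TEXT anna
        System.out.println(replace(source, "SOME TEXT"));
        // prints false: no word boundary between '.' and ' '
        System.out.println(ORIGINAL.matcher(source).find());
    }
}
```

Since the lookahead demands a following non-word character, a "ca." at the very end of the input would not be replaced by this sketch; (?=\W|$) would cover that case.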

How to match ^(d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
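[] is a character class: it matches a single character out of those listed inside the brackets, so ^[\(\d+\)] matches just one (, digit, + or ) at the start of the text, and it needs a quantifier after it to match more than one character.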
The second regexp will work: ^\(\d+\), so check your code.
Also check that there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, Java can be tricky with escapes, depending on whether the regexp you type is passed directly to the regex engine or is first a string literal. You may need to double escape your escapes.
In Java you have to escape parentheses, so "\\(\\d+\\)" should match (1) in both case1 and case2. Adding ^ as you did, "^\\(\\d+\\)", will match only case1.
You have to use double backslashes within a Java string. Consider this:
"\n" gives you [line break]
"\\n" gives you [backslash][n]
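Putting those answers together, a minimal sketch with the two sample lines from the question. LeadingNumber and its methods are hypothetical names. The question does not show the calling code, so the String.matches remark in the comments is only one possible reason for "nothing is working".

```java
import java.util.regex.Pattern;

class LeadingNumber {
    // ^\(\d+\) : a parenthesised number at the very start of the input.
    private static final Pattern AT_START = Pattern.compile("^\\(\\d+\\)");
    // \(\d+\) : the same marker anywhere in the input.
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");

    static boolean startsWithNumber(String s) {
        return AT_START.matcher(s).find();
    }

    static boolean containsNumber(String s) {
        return ANYWHERE.matcher(s).find();
    }

    public static void main(String[] args) {
        String case1 = "(1) Hello, how are you?";
        String case2 = "Hi. (1) How're you doing?";
        System.out.println(startsWithNumber(case1)); // prints true
        System.out.println(startsWithNumber(case2)); // prints false
        System.out.println(containsNumber(case2));   // prints true
        // String.matches requires the WHOLE string to match, so this prints false.
        System.out.println(case1.matches("^\\(\\d+\\)"));
    }
}
```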
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $
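A sketch of that lookbehind in Java, under the assumption that the goal is the text following the (number) marker. Java needs a lookbehind with a bounded maximum length, which is presumably why the pattern says {1,9999} rather than +. AfterNumber and textAfterNumber are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterNumber {
    // Lookbehind for "(digits)" plus an optional whitespace char, then the rest of the line.
    private static final Pattern AFTER =
            Pattern.compile("(?<=[(][0-9]{1,9999}[)]\\s?)\\b.*$");

    // Returns the text after the first "(number)" marker, or null if there is none.
    static String textAfterNumber(String s) {
        Matcher m = AFTER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterNumber("(1) Hello, how are you?"));   // prints Hello, how are you?
        System.out.println(textAfterNumber("Hi. (1) How're you doing?")); // prints How're you doing?
        System.out.println(textAfterNumber("no marker here"));            // prints null
    }
}
```

Unlike ^\(\d+\), this does not anchor the marker to the start of the text, so it matches both cases.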
