I am trying to match the "tab" and "newline" metacharacters, but not "space", with a regex in Java.
\s matches everything, i.e. tab, space and newline. But I don't want "space" to be matched.
How do I do that?
Thanks.
One way to do it is:
[^\\S ]
The negated character class makes this regex match anything except \\S (non-whitespace) and " " (space). So it matches whatever \\s matches, except the space.
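A quick way to check the negated class is to run it against a string that contains a space, a tab and a newline. This is only a minimal sketch: the class name, the helper mark and the sample string are made up for illustration, and the explicit list is included purely as a cross-check.

```java
import java.util.regex.Pattern;

public class WhitespaceExceptSpace {
    // Double negation: "not (non-whitespace or space)" = whitespace other than space.
    static final Pattern NEGATED = Pattern.compile("[^\\S ]");
    // Cross-check: the remaining members of \s listed one by one (\x0B is the vertical tab).
    static final Pattern EXPLICIT = Pattern.compile("[\\t\\n\\x0B\\f\\r]");

    // Replaces every match with '#' so the matched positions become visible.
    static String mark(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        String sample = "a b\tc\nd"; // hypothetical input: space, tab, newline
        System.out.println(mark(NEGATED, sample));  // a b#c#d
        System.out.println(mark(EXPLICIT, sample)); // a b#c#d
    }
}
```

The space between "a" and "b" survives in both cases, while the tab and the newline are replaced.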
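Explicitly list them inside [...] (a set of characters):
"[\\t\\n\\x0B\\f\\r]"
Here \\x0B is the vertical tab; in Java 8 and later \\v stands for any vertical whitespace character, not just the vertical tab.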
Related
I have a string from which I would like to capture everything after and including a colon, up to (but excluding) whitespace or a parenthesis.
Why does the following regex include the parenthesis in the string match?
:(.*?)[\(\)\s] or also :(.+?)[\)\s] (non-greedy) does not work.
Example input: WHERE t.operator_id = :operatorID AND (t.merchant_id = :merchantID) AND t.readerApplication_id = :readerApplicationID AND t.accountType in :accountTypes
Should extract :operatorID, :merchantID, :readerApplicationID, :accountTypes.
But for the second match my regexes extract :merchantID)
What is wrong and why?
Even if I use a more exact matching condition in the capture, it does not work: :([a-zA-z0-9_]+?)[\)\(\s]
Put your "followed by space or paren" condition in a lookahead, so that it is checked but not consumed. Right now you are explicitly matching the whitespace or parenthesis with [\(\)\s], so it becomes part of the match:
:(.+?)(?=[\s\(\)])
https://regex101.com/r/im8KWF/1/
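Or use the built-in \b "word boundary", another zero-width assertion. For identifiers made of word characters it has the same effect*, and unlike the lookahead it also succeeds at the very end of the input, so a trailing :accountTypes with nothing after it is still matched: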
:(.+?)\b
https://regex101.com/r/FnnzGM/3/
*Definition of word boundary from regular-expressions.info:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
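Both suggestions can be tried in Java against the query string from the question. This is a sketch rather than a definitive implementation: the class and method names are invented here, and the "|$" alternative in the lookahead is an addition of mine so that a parameter at the very end of the input is found as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedParams {
    // Lookahead: stop before whitespace or a parenthesis without consuming it.
    // "|$" (an assumption, not in the original answer) also accepts end of input.
    static final Pattern LOOKAHEAD = Pattern.compile(":(.+?)(?=[\\s()]|$)");
    // Word boundary: stop where a word character meets a non-word character or the end.
    static final Pattern BOUNDARY = Pattern.compile(":(.+?)\\b");

    // Collects every whole match (colon included) in order of appearance.
    static List<String> extract(Pattern pattern, String sql) {
        List<String> names = new ArrayList<>();
        Matcher m = pattern.matcher(sql);
        while (m.find()) {
            names.add(m.group()); // group(1) would be the name without the colon
        }
        return names;
    }

    public static void main(String[] args) {
        String sql = "WHERE t.operator_id = :operatorID AND (t.merchant_id = :merchantID) "
                + "AND t.readerApplication_id = :readerApplicationID AND t.accountType in :accountTypes";
        // Both print [:operatorID, :merchantID, :readerApplicationID, :accountTypes]
        System.out.println(extract(LOOKAHEAD, sql));
        System.out.println(extract(BOUNDARY, sql));
    }
}
```

Because the terminating character is only inspected, never consumed, the closing parenthesis after :merchantID stays out of the match.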
I am currently working on a regex to split out all occurrences of strings that match the following format: &[text(text - text text) !text]. Here text can be any characters really, and the spacing is important. The text will be laid out as shown.
I have tried the following regex but I cannot seem to get it to work:
&\\[([^\\]]*)\\]
Any help would be greatly appreciated.
You can replace text with \w+ to match one or more word characters.
Assuming everything else was a literal, the following regular expression should work:
&\[\w+\(\w+ - \w+ \w+\) !\w+\]
You could also use [a-zA-Z] in place of \w if you would like. It is sometimes easier to understand since it explicitly describes the characters to match, a-z and A-Z inclusive.
&\[[a-zA-Z]+\([a-zA-Z]+ - [a-zA-Z]+ [a-zA-Z]+\) ![a-zA-Z]+\]
And for one character only, remove the +
&\[\w\(\w - \w \w\) !\w\]
&\[[a-zA-Z]\([a-zA-Z] - [a-zA-Z] [a-zA-Z]\) ![a-zA-Z]\]
P.S. -, & and ! are not special outside a character class, so they need no escaping here; writing them as \-, \& or \! is harmless, though.
P.P.S. In Java you have to escape \ in a string literal, so \w becomes \\w.
If you want to extract the text as groups to work with afterwards:
&\\[(\\w+)\\((\\w+)\\s\\-\\s(\\w+)\\s(\\w+)\\)\\s!(\\w+)\\]
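To read the five text parts back out in Java, something along these lines should work. It is a sketch under assumptions: the class name, the parts helper and the sample input are hypothetical, and it uses the \w+ variant, so it only fits when each text consists of word characters.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketFormat {
    // &[text(text - text text) !text] with each text captured as its own group.
    static final Pattern FORMAT = Pattern.compile(
            "&\\[(\\w+)\\((\\w+) - (\\w+) (\\w+)\\) !(\\w+)\\]");

    // Returns the five captured texts of the first occurrence, or an empty array.
    static String[] parts(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i); // groups are numbered from 1
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the question.
        String input = "see &[alpha(beta - gamma delta) !omega] here";
        System.out.println(Arrays.toString(parts(input)));
        // [alpha, beta, gamma, delta, omega]
    }
}
```

Calling find() in a loop instead of once would walk through all occurrences in a longer input.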
I have a question. Take this sentence, for example:
"HalloAnna daveca.nn dave anna ca. anna"
And I only want to match the standalone "ca.".
My regex is:
(?i)\b(ca\.)\b
But this doesn't work and I don't know why. Any ideas ?
//Update
I execute it with:
testSource.replaceAll()
and with
pattern.matcher(testSource).replaceAll().
Neither works.
Keep the dot escaped and, instead of the trailing \b, assert that a non-word character follows:
(?i)\bca\.(?=\W)
You should use it like this:
Pattern.compile("(?i)\\b(ca\\.)(?=\\W)").matcher(a).replaceAll("SOME TEXT");
Which, without the Java string escapes, is the regex (?i)\b(ca\.)(?=\W).
Every \ in a regex has to be doubled in a Java string literal: \\.
Also, the word boundary (\b) before the word is fine, but a boundary exists only where a word character (letter, digit or underscore) sits next to a non-word character or the start or end of the string. In your case the pattern ends with a dot, which is not a word character, and the dot is followed by a space, which is not one either, so \b cannot match at the end. You can use \W instead, which requires a non-word character after the dot. But a plain \W would become part of the match and be replaced too, so put it in a lookahead, (?=\W), which checks for it without consuming it.
Also keep in mind that an unescaped . matches any character; to match a real dot you have to escape it as \., which in a Java string becomes \\..
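The difference between the two patterns can be seen by running both on the sentence from the question. This is a small sketch; the class name, the replace helper and the replacement text "circa" are placeholders chosen for illustration.

```java
import java.util.regex.Pattern;

public class StandaloneCa {
    // Pattern from the question: the trailing \b cannot match between "." and " ",
    // because neither of them is a word character.
    static final Pattern WITH_BOUNDARY = Pattern.compile("(?i)\\b(ca\\.)\\b");
    // Pattern from the answer: the lookahead requires a non-word character after the dot.
    static final Pattern WITH_LOOKAHEAD = Pattern.compile("(?i)\\b(ca\\.)(?=\\W)");

    static String replace(Pattern pattern, String input, String replacement) {
        return pattern.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String source = "HalloAnna daveca.nn dave anna ca. anna";
        // Unchanged: HalloAnna daveca.nn dave anna ca. anna
        System.out.println(replace(WITH_BOUNDARY, source, "circa"));
        // HalloAnna daveca.nn dave anna circa anna
        System.out.println(replace(WITH_LOOKAHEAD, source, "circa"));
    }
}
```

Note that (?=\W) needs some character after the dot, so a "ca." at the very end of the input would not match; a negative lookahead such as (?!\w) is one way to cover that case too.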
I have the string "<?xml version=2.0><rss>Feed</rss>" and wrote a regex to match it:
"<?xml.*<rss.*</rss>"
But if the input string contains \n, as in "\nFeed", the regex above doesn't match.
How do I modify my regex to allow a \n character inside the string?
The matching behavior of the dot can be controlled with a flag. In Java, by default the dot matches any character except line terminators such as \r and \n.
I'm not a Java programmer, but usually putting (?s) at the beginning of the pattern makes the dot match any character, including line terminators (in Java this is the same as the Pattern.DOTALL flag). So perhaps "(?s)<\\?xml.*<rss.*</rss>" works. Note that the ? after < has to be escaped; unescaped, it is a quantifier that makes the < optional.
But it would be better here to use "<\\?xml.*?<rss[\\s\\S]*?</rss>" as the pattern.
\s matches any whitespace character which includes line terminators and \S matches any non whitespace character. Both in square brackets results in matching any character.
For completeness: [\w\W] also always matches any character.
You can combine it with (\\n)* at the position where the newline may occur. The extra \ is there because \ is itself an escape character in a Java string literal.
Another option is to execute replaceAll("\\n","") before executing the regex.
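The options above can be compared side by side in Java. This is a sketch under assumptions: the class name and the matches helper are made up, the sample string is the one from the question with a newline added, and the ? after < is escaped in every pattern so that it is a literal question mark rather than a quantifier.

```java
import java.util.regex.Pattern;

public class FeedMatch {
    // Default dot: does not match line terminators.
    static final Pattern DEFAULT_DOT = Pattern.compile("<\\?xml.*<rss.*</rss>");
    // (?s) turns on DOTALL, so the dot also matches \n.
    static final Pattern DOTALL = Pattern.compile("(?s)<\\?xml.*<rss.*</rss>");
    // [\s\S] matches any character regardless of flags.
    static final Pattern ANY_CHAR_CLASS = Pattern.compile("<\\?xml.*?<rss[\\s\\S]*?</rss>");

    // True only if the whole input matches the pattern.
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String feed = "<?xml version=2.0><rss>\nFeed</rss>";
        System.out.println(matches(DEFAULT_DOT, feed));    // false
        System.out.println(matches(DOTALL, feed));         // true
        System.out.println(matches(ANY_CHAR_CLASS, feed)); // true
    }
}
```

Passing Pattern.DOTALL as the second argument of Pattern.compile has the same effect as the inline (?s).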
In all the tutorials I have read, they say that \s matches a whitespace. So why does this statement
System.out.println("line1 \n line2".replaceAll("\\s\\s*", " "));
produce this output:
line1 line2
Thanks for your response.
The string literal "\\s\\s*" is equivalent to the regular expression syntax \s\s* which matches "a whitespace character followed by zero or more whitespace characters".
A whitespace character is defined as [ \t\n\x0B\f\r], which includes spaces and newlines.
\\s matches a whitespace character, where the whitespace characters are [ \t\n\x0B\f\r]. It's not just a space. I suspect you took "whitespace" to mean only the space character. See the Pattern class documentation.
Also, you can replace your regex \\s\\s* with just \\s+.
"\\s\\s*" is the escaped version of \s\s*, which is the same as \s+.
It matches one or more of any whitespace characters. Whitespace characters are [ \t\n\x0B\f\r]. So each run of whitespace is replaced by a single space.
First, this regex is a bit silly: \\s\\s* will match one or more whitespace characters, since the \\s character class matches all whitespace.
But it can be expressed more simply as \\s+, which does exactly the same thing.
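That the two spellings behave identically on the string from the question can be checked with a few lines of Java. The class and method names below are illustrative only.

```java
public class CollapseWhitespace {
    // Original spelling: one whitespace character followed by zero or more.
    static String collapseVerbose(String input) {
        return input.replaceAll("\\s\\s*", " ");
    }

    // Shorter spelling: one or more whitespace characters.
    static String collapse(String input) {
        return input.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String text = "line1 \n line2";
        // The run "space, newline, space" is a single match, replaced by one space.
        System.out.println(collapseVerbose(text)); // line1 line2
        System.out.println(collapse(text));        // line1 line2
    }
}
```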