How to create regex that matches only not spaces and not letters? - java

I have String but very long.
I need to remove all not white spaces and all not letters :
My pattern is :
String content=readUrl();
content.replaceAll("(\\S)|(^\\[a-z])", "");
And this doesn't work.
Why does my regex replace everything?

You can use a negated character class:
(?i)[^\\sa-z]+
or better if you want to support all alphabets:
[^\\s\\p{L}]+

Breaking it down:
content.replaceAll("(\\S)|(^\\[a-z])", "");
will replace everything that matches the pattern. The | means a sequence matches the pattern if it matches the part on the left, or it matches the part on the right. The pattern on the left, (\\S), matches every character that isn't white space. So every character that isn't white space will match, and will be eliminated. I think the part on the right will only match if your string actually begins with the five characters [a-z] (since you've told it to treat the [ literally), so it doesn't really have an effect on the result. But the result will be that the result will be just white space.

Related

regex capture includes too much

I have a string from which I would like to caputre all after and including colon until (excluding) white space or paranthesis.
Why does the following regex include the paranthesis in the string match?
:(.*?)[\(\)\s] or also :(.+?)[\)\s] (non-greedy) does not work.
Example input: WHERE t.operator_id = :operatorID AND (t.merchant_id = :merchantID) AND t.readerApplication_id = :readerApplicationID AND t.accountType in :accountTypes
Should exctract :operatorID, :merchantID, :readerApplicationID, :accountTypes.
But my regexes extract for the second match :marchantID)
What is wrong and why?
Even if I use an exacter mapping condition in the capture, it does not work: :([a-zA-z0-9_]+?)[\)\(\s]
Put your conditional "followed by space or paren" as a lookahead, so that it sees but doesn't match. Right now you are explicitly matching parentheses with [\(\)\s]:
:(.+?)(?=[\s\(\)])
https://regex101.com/r/im8KWF/1/
Or, use the built-in \b "word boundary", which is also a "zero-width" assertion meaning the same thing*:
:(.+?)\b
https://regex101.com/r/FnnzGM/3/
*Definition of word boundary from regular-expressions.info:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a
word character. After the last character in the string, if the last
character is a word character. Between two characters in the string,
where one is a word character and the other is not a word character.

Wierd behaviour on regexp Matcher

My regexp below is supposed to filter out capital words with a length of 8-10, where 0-2 numbers may appear. It has been working for all of my tests, but for some reason it got stuck on the string below. And n.group(0) only contains an empty string instead of the matched "word".
static final Pattern PATTERN =
Pattern.compile("\\b(?=[A-Z\\d]{9,10}\\b)(?:[A-Z]*\\d){0,2}[A-Z]*\\b");
Matcher n = LONG_PASSWORD.matcher("foo ID:636152727 bar");
while (n.find()) {
String s = n.group(0);
resultArrayList.add(s);
}
Why does my pattern match ID:636152727?
Some examples that I want to filter out (which is working):
AAAAAAAAAA
1AAAAAAAAA
1AAAAAAAA1
etc...
I don't have a better solution to offer than the one in Ωmega's answer, but I think I can explain what's happening. What it boils down to is that the first \b and the last \b are matching the same spot: right after the colon.
That's the first place where the lookahead can match, since it's followed by nine digits and a word boundary. Then the next part of the regex tries to match two digits (interspersed with any number of uppercase letters) followed by a word boundary, and fails. So it tries to match just one digit (ditto), and fails again. Then it tries matching zero digits (interspersed with zero letters), and it succeeds, without advancing the match position. That position is still a word boundary, so the final \b succeeds as well.
A word boundary is just another zero-width assertion, like lookaheads and lookbehinds. There's no reason why two or more can't be applied at the same spot; you did that on purpose with the first word boundary and the lookahead. Some regex flavors treat it as an error if you apply a quantifier to an assertion (like \b+), but I don't think any of them would catch this problem. This is one of those rare instances where separate start-of-word and end-of-word assertions, like GNU's \< and \> or TCL's \y and \Y, would make a difference.
You need to use anchors ^ and $ »
Pattern.compile("^(?=[A-Z\\d]{9,10}$)(?:[A-Z]*\\d){0,2}[A-Z]*$");
Use this pattern:
"(?:^|(?<=\\s))(?=[A-Z\\d]{9,10}(?:\\s|$))(?:[A-Z]*\\d){0,2}[A-Z]*(?=\\s|$)"

Regex to accept only alphabets and spaces and disallowing spaces at the beginning and the end of the string

I have the following requirements for validating an input field:
It should only contain alphabets and spaces between the alphabets.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.
For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter. (I replaced your [a-zA-Z] by the Unicode code property for letters \p{L}). Then there can be a space followed by at least one letter, this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem in your expression ^(?!\s*$) is, that lookahead will fail, if there is only whitespace till the end of the string. If you want to disallow leading whitespace, just remove the end of string anchor inside the lookahead ==> ^(?!\s)[-a-zA-Z ]*$. But this still allows the string to end with whitespace. To avoid this look back at the end of the string ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think for this task a look around is not needed.
This should work if you use it with String.matches method. I assume you want English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it would be equivalent to
[ \t\n\x0B\f\r]
Which includes horizontal tab (09), line feed (10), carriage return (13), form feed (12), backspace (08), space (32).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"
Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses negative lookahead and negative lookbehind to disallow spaces at the beginning or at the end of the string, and requiring the match of the entire string.
I think the problem is there's a ? before the negation of white spaces, which means it is optional
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
at least one sequence of letters, then optional string with spaces but always ends with letters
I don't know if words in your accepted string can be seperated by more then one space. If they can:
^[a-zA-Z]+(( )+[a-zA-z]+)*$
If can't:
^[a-zA-Z]+( [a-zA-z]+)*$
String must start with letter (or few letters), not space.
String can contain few words, but every word beside first must have space before it.
Hope I helped.

regex help in java

I'm trying to compare following strings with regex:
#[xyz="1","2"'"4"] ------- valid
#[xyz] ------------- valid
#[xyz="a5","4r"'"8dsa"] -- valid
#[xyz="asd"] -- invalid
#[xyz"asd"] --- invalid
#[xyz="8s"'"4"] - invalid
The valid pattern should be:
#[xyz then = sign then some chars then , then some chars then ' then some chars and finally ]. This means if there is characters after xyz then they must be in format ="XXX","XXX"'"XXX".
Or only #[xyz]. No character after xyz.
I have tried following regex, but it did not worked:
String regex = "#[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"]";
Here the quotations (in part after xyz) are optional and number of characters between quotes are also not fixed and there could also be some characters before and after this pattern like asdadad #[xyz] adadad.
You can use the regex:
#\[xyz(?:="[a-zA-z0-9]+","[a-zA-z0-9]+"'"[a-zA-z0-9]+")?\]
See it
Expressed as Java string it'll be:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
What was wrong with your regex?
[...] defines a character class. When you want to match literal [ and ] you need to escape it by preceding with a \.
[a-zA-z][0-9] match a single letter followed by a single digit. But you want one or more alphanumeric characters. So you need [a-zA-Z0-9]+
Use this:
String regex = "#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\]";
When you write [a-zA-z][0-9] it expects a letter character and a digit after it. And you also have to escape first and last square braces because square braces have special meaning in regexes.
Explanation:
[a-zA-z0-9]+ means alphanumeric character (but not an underline) one or more times.
(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? means that expression in parentheses can be one time or not at all.
Since square brackets have a special meaning in regex, you used it by yourself, they define character classes, you need to escape them if you want to match them literally.
String regex = "#\\[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"\\]";
The next problem is with '"[a-zA-z][0-9]' you define "first a letter, second a digit", you need to join those classes and add a quantifier:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
See it here on Regexr
there could also be some characters before and after this pattern like
asdadad #[xyz] adadad.
Regex should be:
String regex = "(.)*#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\](.)*";
The First and last (.)* will allow any string before the pattern as you have mentioned in your edit. As said by #ademiban this (=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? will come one time or not at all. Other mistakes are also very well explained by Others +1 to all other.

java regex until certain word/text/characters

Please consider the following text :
That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it.
How can I get the following result :
That is, it matches at any position that has a non-word character to the
That is everything until left
input.replace("^(.*?)\\bleft.*$", "$1");
^ anchors to the beginning of the string
.*? matches as little as possible of any character
\b matches a word boundary
left matches the string literal "left"
.* matches the remainder of the string
$ anchors to the end of the string
$1 replaces the matched string with group 1 in ()
If you want to use any word (not just "left"), be careful to escape it. You can use Pattern.quote(word) to escape the string.
The answer is actually /(.*)\Wleft\w/ but it won't match anything in
That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it.
String result = inputString.replace("(.*?)left.*", "$1");

Categories