String pattern that prevents the first character being a whitespace - java

I'm having a bit of difficulty figuring out a pattern that will allow anything to be entered, as long as the first character isn't a whitespace.
I've tried
String pattern = "[^\\s][a-zA-Z0-9\\W ]+";"
and "([a-zA-Z0-9\\W]+)|(([a-zA-Z0-9\\W]+\\s[a-zA-Z0-9\\W]+)+)" as well as several other variants, with no success. Any help would be greatly appreciated.
I'm using Java btw

Does this work
^[^\s].*
The first caret denotes start of line, and the second negation.

Most regular expression matching defaults to searching anywhere in the string for the pattern. Since you are concerned specifically with the beginning of the string, you should prefix the entire regex with '^' to anchor the match to the beginning of the input.
String pattern = "^[^\\s][a-zA-Z0-9\\W ]+";
It can be a bit confusing since ^ has a very different meaning when it appears inside square brackets. Inside the brackets, as you know, it signals matching the complement of (ie all characters except) the set of characters listed in the brackets. Outside, it is simply an anchor for the beginning of the string.
In this non-bracketed use, it is the opposite of $ which anchors a match at the end of a string, eg /end$/ will match "friend" but not "ending" - you can read more about anchors at this URL: http://www.regular-expressions.info/anchors.html

Since you don't care about the rest, you can just use String.charAt(int) with Character.isSpaceChar(char), or String.codePointAt(int) with Character.isSpaceChar(int).
The second method is the correct way to deal with Unicode string and code point in astral plane, while the first method is broken, but usable when your input only has character from the Basic Multilingual Plane (BMP).
Code for the second method:
boolean startWithSpace = Character.isSpaceChar(input.codePointAt(0));
Character.isSpaceChar checks for any whitespace character according to Unicode. Not to be confused with Character.isWhitespace, which checks for whitespace character according to Java.

Related

Java Regular Expression for number of exactly 5 digits anywhere in the string

I'm trying to create a regular expression to parse a 5 digit number out of a string no matter where it is but I can't seem to figure out how to get the beginning and end cases.
I've used the pattern as follows \\d{5} but this will grab a subset of a larger number...however when I try to do something like \\D\\d{5}\\D it doesn't work for the end cases. I would appreciate any help here! Thanks!
For a few examples (55555 is what should be extracted):
At the beginning of the string
"55555blahblahblah123456677788"
In the middle of the string
"2345blahblah:55555blahblah"
At the end of the string
"1234567890blahblahblah55555"
Since you are using a language that supports them use negative lookarounds:
"(?<!\\d)\\d{5}(?!\\d)"
These will assert that your \\d{5} is neither preceded nor followed by a digit. Whether that is due to the edge of the string or a non-digit character does not matter.
Note that these assertions themselves are zero-width matches. So those characters will not actually be included in the match. That is why they are called lookbehind and lookahead. They just check what is there, without actually making it part of the match. This is another disadvantage of using \\D, which would include the non-digit character in your match (or require you to use capturing groups).

Regex to accept only alphabets and spaces and disallowing spaces at the beginning and the end of the string

I have the following requirements for validating an input field:
It should only contain alphabets and spaces between the alphabets.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.
For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter. (I replaced your [a-zA-Z] by the Unicode code property for letters \p{L}). Then there can be a space followed by at least one letter, this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem in your expression ^(?!\s*$) is, that lookahead will fail, if there is only whitespace till the end of the string. If you want to disallow leading whitespace, just remove the end of string anchor inside the lookahead ==> ^(?!\s)[-a-zA-Z ]*$. But this still allows the string to end with whitespace. To avoid this look back at the end of the string ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think for this task a look around is not needed.
This should work if you use it with String.matches method. I assume you want English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it would be equivalent to
[ \t\n\x0B\f\r]
Which includes horizontal tab (09), line feed (10), carriage return (13), form feed (12), backspace (08), space (32).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"
Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses negative lookahead and negative lookbehind to disallow spaces at the beginning or at the end of the string, and requiring the match of the entire string.
I think the problem is there's a ? before the negation of white spaces, which means it is optional
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
at least one sequence of letters, then optional string with spaces but always ends with letters
I don't know if words in your accepted string can be seperated by more then one space. If they can:
^[a-zA-Z]+(( )+[a-zA-z]+)*$
If can't:
^[a-zA-Z]+( [a-zA-z]+)*$
String must start with letter (or few letters), not space.
String can contain few words, but every word beside first must have space before it.
Hope I helped.

java regex: match input starting with non-number or empty string followed by specific pattern

I'm using Java regular expressions to match and capture a string such as:
0::10000
A solution would be:
(0::\d{1,8})
However, the match would succeed for the input
10::10000
as well, which is wrong. Therefore, I now have:
[^\d](0::\d{1,8})
which means it must lead with any character except a number, but that means there needs to be some character before the first zero. What I really want (and what I need help with) is to say "lead with a non-number or nothing at all."
In conclusion the final solution regular expression should match the following:
0::10000kjkj0::10000
and should not match the following:
10::10000
This site may be of use if someone wants to help.
Thanks.
You need a negative lookbehind:
(?<!\d)(0::\d{1,8})
It means "match 0::\d{1,8} not preceded by \d".

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] as far as I can tell from the (very large) input, the [?]'s are either white spaces either nothing at all; maybe there's some sort of encoding issue, also perhaps something to do with #text nodes (input is xml)
The * quantifier matches "zero or more", which means it will match a string that does not contain any of the characters in your class. Try the + quantifier, which means "One or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.lookingAt() instead of Matcher.matches, it will not require a full string match and you can use the regex [a-zA-Z0-9]+.
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
System.out.println("doesn't match.");
this prints "doesn't match."
The correct answer is a combination of the above answers. First I imagine your intended character match is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think it include all characters in the ASCII range between A and z, which is the letters plus a few extra (specifically [,\,],^,_,`).
A second potential problem as Martin mentioned is you may need to put in the start and end qualifiers, if you want the string to only consists of letters and numbers.
Finally you use the * operator which means 0 or more, therefore you can match 0 characters and matches will return true, so effectively your pattern will match any input. What you need is the + quantifier. So I will submit the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding space to the regex [a-zA-Z0-9 ]*. this should match any normal text with chars, number and spaces. If you want quotes and other special chars add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check input value is contained string and numbers? by using regex ^[a-zA-Z0-9]*$
if your value just contained numberString than its show match i.e, riz99, riz99z
else it will show not match i.e, 99z., riz99.z, riz99.9
Example code:
if(e.target.value.match('^[a-zA-Z0-9]*$')){
console.log('match')
}
else{
console.log('not match')
}
}
online working example

Unescaped "." still matches when used in a negation group

I made, what I believed to be, an error in a regular expression in Java recently but when I test my code I don't get the error I expect.
The expression I created was meant to replace a password in a string that I received from another source. The pattern I used went along the lines of: "password: [^\\s.]*", the idea being that it would match the word "password" the colon, a space, then any characters except for a space or a full-stop (period). I would then replace the instance with "password: XXXXXX" and therefore mask it.
The obvious error should be that I have forgotten to escape the full-stop. In otherwords the proper expression should have been "password: [^\\s\\.]*". Thing is, if I don't escape the full-stop the code still works!
Here's some sample code:
import java.util.regex.*;
public class SimpleRegexTest {
public static void main(String[] args) {
Pattern simplePattern = Pattern.compile("password: [^\\s.]*");
Matcher simpleMatcher = simplePattern.matcher("password: newpass. Enjoy.");
String maskedString = simpleMatcher.replaceAll("password: XXXXXX");
System.out.println(maskedString);
}
}
When I run the above code I get the following output:
password: XXXXXX. Enjoy.
Is this a special case, or have I completely missed something?
(edit: changed to "escape the full-stop")
Michael Borgwardt: I couldn't think of another term to describe what I was doing apart from "negation group", sorry for the ambiguity.
Aviator: In this case, no, a space won't be in the password. I didn't make the rules ;-).
(edit: doubled up the slashes in the non-code text so it displays properly, added the ^ which was in the code, but not the text :-/)
Sundar: Fixed the double slashes, SO seems to have it's own escape characters.
A period ('.' character) does not need to be escaped inside a character class [] in a regular expression.
From the API:
Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.
It looks like you got the negation operator mixed up for regex ranges.
In particular, my understanding is that you used the snippet [\s.]* to mean "any characters except for a space or a full-stop (period)." This would in fact be expressed as [^ .]*, using the caret to negate the characters in the set.
I don't know if this was just a typo in your post or what was actually in your code, but the regex as it stands in your question will match the word "password", a colon, a space, then any sequence of backslash characters, "s" characters or periods.

Categories