Validate string has no illegal characters - java

I'm trying to validate a string so that it only allows letters, numbers and these characters:
!"#$%&'()*+,-./:;<=>?#[\]^_`{|}~
I tried this, but it's not working: it lets me enter characters that are not in the regex. I'm still pretty new to Java, and something similar was working in JavaScript, but I can't figure out what's going on here. I think it behaves as if it only returns 4 when it can't find any of the listed characters.
Pattern allowedCharacters = Pattern.compile("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]");
if (!allowedCharacters.matcher(pw).find()){
return 4;
}
Any help is appreciated. Thanks
EDIT:
I also tried:
if (pw.matches("^[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]")){
return 4;
}
and
if (!pw.matches("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]")){
return 4;
}

matcher.find() checks whether the string contains a substring that matches the regex, so with
!matcher.find() you are checking that the tested string contains no match at all.
Consider using matcher.matches() to check whether the entire string is matched by the regex. In that case you will have to add a quantifier such as *, + or {n,m} to the character class to control the password length; otherwise it will only accept single-character passwords.
Here is a demo of what your code could look like:
// the quantifier goes right after the closing ] of the character class (the final + below)
if (pw.matches("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]+")){
System.out.println("password contains only valid characters");
} else {
System.out.println("invalid characters in password");
}
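To make the difference between find() and matches() concrete, here is a minimal sketch. The class name FindVsMatches and the shortened character class [A-Za-z0-9!#] are placeholders chosen for illustration, not the asker's full set.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // Hypothetical, shortened whitelist: letters, digits, '!' and '#'.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9!#]+");

    // True if SOME substring consists of allowed characters.
    public static boolean containsAllowed(String s) {
        return ALLOWED.matcher(s).find();
    }

    // True only if the WHOLE string consists of allowed characters.
    public static boolean onlyAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsAllowed("abc def")); // true: "abc" is enough for find()
        System.out.println(onlyAllowed("abc def"));     // false: the space is not allowed
        System.out.println(onlyAllowed("abc!#1"));      // true
    }
}
```

With find(), a single legal character anywhere is enough for a hit, which is why the original check let illegal input through.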
Update:
In your regex you are not escaping [, which makes [\]^_`{|}~-] a separate, nested character class that is merged into the outer character class. As a result the class will not include \ or [. If you really want to accept only alphanumeric characters and !"#$%&'()*+,-./:;<=>?#[]^_`{|}~ then consider using
"[\\w\\Q!\"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~\\E]+"
as regex.
\\w represents [a-zA-Z0-9_],
and \Q and \E delimit a quoted section, a mechanism that escapes metacharacters, even inside a character class.
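The two behaviours described here, an unescaped [ opening a nested class and \Q...\E quoting inside a class, can be checked with a small sketch. The class name and the tiny character sets are made up for illustration.

```java
import java.util.regex.Pattern;

class QuotingInCharClass {
    // Unescaped '[' opens a nested class: this is the union of {a} and {b, c}.
    public static final Pattern NESTED = Pattern.compile("[a[bc]]+");

    // \Q...\E quotes everything in between, so '[' and ']' become literal members.
    public static final Pattern QUOTED = Pattern.compile("[a\\Q[]\\E]+");

    public static void main(String[] args) {
        System.out.println(NESTED.matcher("abc").matches()); // true
        System.out.println(NESTED.matcher("[").matches());   // false: '[' is not a member
        System.out.println(QUOTED.matcher("a[]").matches()); // true
        System.out.println(QUOTED.matcher("b").matches());   // false
    }
}
```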

It's because you're using find() and not matches(). That said, I'd try the opposite: doing find on [^<legal chars>] (note the caret) to match an illegal character. It's faster because it'll stop as soon as it hits something illegal. Also, start with the simple legal characters, then move up from there. Regular expressions can get hard to read, and adding one char at a time that has special meaning is easier than adding them all at once.
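A minimal sketch of that inverted check might look like this. The legal set used here (letters, digits, '_' and '-') is only a hypothetical starting point to be extended one character at a time.

```java
import java.util.regex.Pattern;

class IllegalCharFinder {
    // Negated class: matches any single character that is NOT in the legal set.
    // The hyphen is placed last so that it is read as a literal.
    private static final Pattern ILLEGAL = Pattern.compile("[^A-Za-z0-9_-]");

    public static boolean hasIllegalChar(String s) {
        return ILLEGAL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasIllegalChar("abc_1-2")); // false
        System.out.println(hasIllegalChar("abc 1"));   // true: the space is illegal
        System.out.println(hasIllegalChar(""));        // false: nothing illegal in an empty string
    }
}
```

Note that an empty string contains no illegal character, so a minimum length would have to be checked separately.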

Using other answers from this question, I found this to work for me. Nothing needs to be escaped between the \Q and \E. They do that for you.
Pattern whitelist = Pattern.compile("^[\\w\\s\\Q!\"#$%&'()*+,-.\\/:;<=>?#[]^_`{|}~\\E]+$");
if (!whitelist.matcher(pw).matches()) {
// error
}

Related

Validating a mathematical expression in java

I am trying to validate if the string "expression" as in the code below is a formula.
String expression = request.getParameter(FORMULA);
if(!Pattern.matches("[a-zA-Z0-9+-*/()]", expression)){return new AjaxMessage(AjaxMessage.ResponseStatusEnum.FAILURE, getJsonString(, "Manager.invalid.formula" , null));
}
Example values for expression are a+b/2, (a+b)*2, (john-Max), etc., just for context. The variable names in the formula may vary, and the arithmetic expression contains only the special characters +-/()*. As you can see I tried to validate using regex (new to regex), but I think it's not possible as I don't know the length of the variable names.
Is there a way to achieve a validation using regex or any other library in java?
Thanks in advance.
The reason is that you are using characters with special meaning in regex. You need to escape those characters. I have just modified your regex to make it work.
Code:
List<String> expressions = new ArrayList<String>();
expressions.add("a+b/2");
expressions.add("(a+b)*2");
expressions.add("john-Max");
expressions.add("etc[");
for (String expression : expressions) {
if (!Pattern.matches("[a-zA-Z0-9\\+\\-\\*/\\(\\)]*", expression)) {
System.out.println("NOT match");
} else {
System.out.println("MATCH");
}
}
}
OUTPUT:
MATCH
MATCH
MATCH
NOT match
You're using special characters in your regex; you need to escape them using \.
It should look like [a-zA-Z0-9+\\-*/()]. This only tests one character; you need to add a * at the end to test multiple characters.
Edit (thanks Toto): because [] tests a single character, it's called a character class (nothing to do with a Java class), so only the - is considered special here: unescaped, +-* is read as a range from + to *. For a regex without the brackets, you would need to escape the other special characters.
Special characters have special meaning in a regex and won't be interpreted as the characters they are (for example, parentheses are used to make groups, * means 0 or more of the previous character, etc.).
About character class: https://docs.oracle.com/javase/tutorial/essential/regex/char_classes.html
More info:
http://www.regular-expressions.info/characters.html and
http://www.regular-expressions.info/refcharacters.html
I use this site to test my regexes (note that regex engine may vary !):
https://regex101.com/
As said in the comments, a mathematical expression is more than just a set of allowed characters, so if you want to validate one, you'll have to do more manual checking.
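As a rough illustration of why the character check alone is not enough, the sketch below pairs it with one extra structural test. The parenthesesBalanced helper is a hypothetical addition, not something from the answers, and it covers only one of the many rules a real expression validator would need.

```java
import java.util.regex.Pattern;

class ExpressionCharCheck {
    // Same character class as in the answers above; '-' is escaped so it is not a range.
    private static final Pattern CHARS = Pattern.compile("[a-zA-Z0-9+\\-*/()]*");

    public static boolean hasOnlyExpressionChars(String s) {
        return CHARS.matcher(s).matches();
    }

    // Hypothetical extra check: every ')' must close an earlier '('.
    public static boolean parenthesesBalanced(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false; // a ')' appeared before its '('
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyExpressionChars("(a+b)*2")); // true
        System.out.println(hasOnlyExpressionChars("a+*("));    // true, although it is not a formula
        System.out.println(parenthesesBalanced("a+*("));       // false
    }
}
```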

Java RegEx pattern is invalid when trying to exclude commas

I'm building a function to validate usernames, and in this case I want to accept alphabetic characters only. I'm matching the provided user input against this regex:
[1-9!##$%&*()_+=|<>?{}\\[\\]~-,]
This is the method that makes use of the regex:
public static String purgeInvalidLogin(String failedLogin, String pattern) {
Pattern special = Pattern.compile (pattern);
String purgedLogin = failedLogin.replaceAll(special.pattern(), ""); // remove any special characters before moving on
purgedLogin = StringUtils.deleteWhitespace(purgedLogin);
return purgedLogin;
}
However when trying to run this I get this message:
Illegal character range near index 25 [!##$%&*()_+=|<>?{}[]~-,] ^
which only happened once I added the comma. I've also tried the expression [!##$%&*()_+=|<>?{}[]~-\,] (escaping the comma) to no avail. I'm wondering how I can use the regex properly to exclude commas making use of my method above.
Thanks in advance.
Escape the hyphen just before it. It is interpreted as defining a range of characters, as soon as you add another character (the comma) after it.
[1-9!##$%&*()_+=|<>?{}\\[\\]~\\-,]
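The effect of the hyphen's position can be reproduced with a small sketch. The compiles helper and the three-character classes are made up for illustration; they isolate the ~-, part of the original pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HyphenRange {
    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[~-,]"));   // false: '~' (126) to ',' (44) is a reversed range
        System.out.println(compiles("[~\\-,]")); // true: the escaped hyphen is a literal
        System.out.println(compiles("[~,-]"));   // true: a trailing hyphen is a literal
    }
}
```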
You want to accept only alpha chars, and you are doing this by listing every possible illegal character. I think you have got this backwards: it would be better to look for what you do want (which would be a much shorter regex) and flag non-matches.
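A whitelist version of the purge could be sketched as follows, assuming "alphabetic" means ASCII letters only. The class and method names are placeholders, not the asker's code.

```java
class LoginPurger {
    // Removes every character that is not an ASCII letter,
    // which also covers digits, punctuation and whitespace.
    public static String keepLettersOnly(String login) {
        return login.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersOnly("jo,hn_99 doe!")); // johndoe
    }
}
```

Because whitespace is outside the allowed set too, this variant would not need a separate whitespace-removal step.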

Replacing illegal character in fileName

In Java, I have a file name string. I want to replace all illegal characters with '_', keeping only a-z, 0-9, -, . and _
I tried the following code, but it did not work:
myString = myString.replaceAll("[\\W][^\\.][^-][^_]", "_");
You need to replace everything but [a-zA-Z0-9.-].
The ^ within the brackets stands for "NOT".
myString = myString.replaceAll("[^a-zA-Z0-9\\.\\-]", "_");
If you are looking for options on the Windows platform, you can try the solution below, which keeps every character other than \/:*?"<>| in the file name.
fileName = fileName.replaceAll("[\\\\/:*?\"<>|]", "_");
Keep it simple.
myString = myString.replaceAll("[^a-zA-Z0-9.-]", "_");
http://ideone.com/TINsr4
Even simpler
myString = myString.replaceAll("[^\\w.-]", "_");
Predefined Character Classes:
\w A word character: [a-zA-Z_0-9]
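For reference, a small sketch of the \w variant in use. The class name and sample file names are invented; note that \w also keeps uppercase letters and, by default, only covers ASCII.

```java
class FileNameSanitizer {
    // Replaces everything except word characters, '.' and '-' with '_'.
    public static String sanitize(String name) {
        return name.replaceAll("[^\\w.-]", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("my file(1).txt")); // my_file_1_.txt
        System.out.println(sanitize("report-2.pdf"));   // report-2.pdf (unchanged)
        System.out.println(sanitize("caf\u00e9.txt"));  // caf_.txt: non-ASCII letter is replaced
    }
}
```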
I know there have been some answers here already, but I would like to point out that I had to alter the given suggestions slightly.
filename.matches("^.*[^a-zA-Z0-9._-].*$")
This is what I had to use for .matches in Java to get the desired results. I am not sure if this is 100% correct, but this is how it worked for me, it would return true if it encountered any character other than a-z A-Z 0-9 (.) (_) and (-).
I would like to know if there are any flaws with my logic here.
In previous answers I've seen some discussion of what should or should not be escaped. For this example, I've gotten away without escaping anything, but you should escape the (-) minus character to be safe as it will "break" your expression unless it is at the end of the list.
The (.) dot character doesn't have to be escaped within square brackets ([]), it would seem, but it will not hurt you if you do escape it.
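On the question of flaws: one that can be demonstrated is that . does not match line terminators by default, so the matches() variant can miss invalid input containing line breaks. The sketch below compares it with a find() on the negated class alone; the class and method names are made up.

```java
import java.util.regex.Pattern;

class InvalidNameCheck {
    private static final Pattern INVALID = Pattern.compile("[^a-zA-Z0-9._-]");

    // Variant from the answer: matches() needs .* on both sides to cover the whole string.
    public static boolean invalidViaMatches(String s) {
        return s.matches("^.*[^a-zA-Z0-9._-].*$");
    }

    // Alternative: find() looks for a single invalid character anywhere.
    public static boolean invalidViaFind(String s) {
        return INVALID.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(invalidViaMatches("a b.txt")); // true
        System.out.println(invalidViaFind("a b.txt"));    // true
        System.out.println(invalidViaMatches("a\n\nb"));  // false: .* cannot cross the second line break
        System.out.println(invalidViaFind("a\n\nb"));     // true
    }
}
```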
Please see Java Patterns for more details.
If you want to allow more than just [A-Za-z0-9], check the MS naming conventions, and don't forget to filter out "...Characters whose integer representations are in the range from 1 through 31,...".

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]'s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
The * quantifier matches "zero or more", which means it will also match an empty string, one that contains none of the characters in your class. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), a full-string match is not required and you can use the regex [a-zA-Z0-9]+ (Matcher.lookingAt() also drops the full-string requirement, but it only matches at the start of the input).
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
System.out.println("doesn't match.");
this prints "doesn't match."
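A quick way to convince yourself that the anchors are redundant with matches() is to compare both forms on a few inputs. This is a sketch with made-up class and method names.

```java
class AnchorsWithMatches {
    public static boolean plain(String s) {
        return s.matches("[a-zA-Z0-9]*");
    }

    public static boolean anchored(String s) {
        return s.matches("^[a-zA-Z0-9]*$");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc1", "a ", "", "abc\n"}) {
            // Both columns agree for every input, because matches() already
            // requires the pattern to cover the entire string.
            System.out.println(plain(s) + " " + anchored(s));
        }
    }
}
```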
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
A second potential problem, as Martin mentioned, is that you may need to add the start and end anchors if you want the string to consist only of letters and numbers.
Finally, you use the * quantifier, which means 0 or more, so the pattern can match 0 characters and matches() will return true for an empty string. What you need is the + quantifier. So the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
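The two points above, * versus + and the A-z typo, can be checked with a short sketch. The class and method names are invented for the example.

```java
class QuantifierAndRange {
    public static boolean star(String s) {
        return s.matches("[a-zA-Z0-9]*");
    }

    public static boolean plus(String s) {
        return s.matches("[a-zA-Z0-9]+");
    }

    // The typo version: A-z also covers [ \ ] ^ _ and the backtick.
    public static boolean typoRange(String s) {
        return s.matches("[a-zA-z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(star(""));         // true: zero characters are enough for *
        System.out.println(plus(""));         // false: + needs at least one character
        System.out.println(plus("a_b"));      // false: '_' is not alphanumeric
        System.out.println(typoRange("a_b")); // true: '_' falls inside A-z
    }
}
```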
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex, as in [a-zA-Z0-9 ]*? This should match any normal text with letters, numbers and spaces. If you want quotes and other special chars, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check whether an input value contains only letters and digits by using the regex ^[a-zA-Z0-9]*$
If the value contains only letters and digits, it shows "match", e.g. riz99, riz99z;
otherwise it shows "not match", e.g. 99z., riz99.z, riz99.9
Example code:
if(e.target.value.match('^[a-zA-Z0-9]*$')){
console.log('match')
}
else{
console.log('not match')
}
}

Unescaped "." still matches when used in a negation group

I made, what I believed to be, an error in a regular expression in Java recently but when I test my code I don't get the error I expect.
The expression I created was meant to replace a password in a string that I received from another source. The pattern I used went along the lines of: "password: [^\\s.]*", the idea being that it would match the word "password" the colon, a space, then any characters except for a space or a full-stop (period). I would then replace the instance with "password: XXXXXX" and therefore mask it.
The obvious error should be that I have forgotten to escape the full-stop. In other words, the proper expression should have been "password: [^\\s\\.]*". Thing is, if I don't escape the full-stop the code still works!
Here's some sample code:
import java.util.regex.*;
public class SimpleRegexTest {
public static void main(String[] args) {
Pattern simplePattern = Pattern.compile("password: [^\\s.]*");
Matcher simpleMatcher = simplePattern.matcher("password: newpass. Enjoy.");
String maskedString = simpleMatcher.replaceAll("password: XXXXXX");
System.out.println(maskedString);
}
}
When I run the above code I get the following output:
password: XXXXXX. Enjoy.
Is this a special case, or have I completely missed something?
(edit: changed to "escape the full-stop")
Michael Borgwardt: I couldn't think of another term to describe what I was doing apart from "negation group", sorry for the ambiguity.
Aviator: In this case, no, a space won't be in the password. I didn't make the rules ;-).
(edit: doubled up the slashes in the non-code text so it displays properly, added the ^ which was in the code, but not the text :-/)
Sundar: Fixed the double slashes, SO seems to have its own escape characters.
A period ('.' character) does not need to be escaped inside a character class [] in a regular expression.
From the API:
Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.
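A short sketch that checks this behaviour directly. The class and method names are placeholders; the pattern and the sample input are the ones from the question.

```java
class DotInCharClass {
    // Unescaped dot inside the character class, as in the question.
    public static String maskUnescaped(String text) {
        return text.replaceAll("password: [^\\s.]*", "password: XXXXXX");
    }

    // Escaped dot inside the character class; behaves identically.
    public static String maskEscaped(String text) {
        return text.replaceAll("password: [^\\s\\.]*", "password: XXXXXX");
    }

    public static void main(String[] args) {
        String input = "password: newpass. Enjoy.";
        System.out.println(maskUnescaped(input)); // password: XXXXXX. Enjoy.
        System.out.println(maskEscaped(input));   // password: XXXXXX. Enjoy.
        System.out.println("a".matches("."));     // true: outside a class, '.' is a wildcard
        System.out.println("a".matches("[.]"));   // false: inside a class, '.' is a literal dot
    }
}
```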
It looks like you got the negation operator mixed up for regex ranges.
In particular, my understanding is that you used the snippet [\s.]* to mean "any characters except for a space or a full-stop (period)." This would in fact be expressed as [^ .]*, using the caret to negate the characters in the set.
I don't know if this was just a typo in your post or what was actually in your code, but the regex as it stands in your question will match the word "password", a colon, a space, then any sequence of backslash characters, "s" characters or periods.
