Regular expression that allows specific special characters in Java

I need a regular expression for a string that contains only alphanumeric characters, #, underscore (_) and full stop (.), and no blank spaces. I also need one for alphanumeric characters that allows spaces. I tried these regexes:
^[_A-Za-z0-9-\\.\\#]$ and ^[A-Za-z0-9-\\s]$
CODE:
private static final String Username_REGEX ="^[_A-Za-z0-9.#-]$";
public static boolean isUsername(EditText editText, boolean required) {
return isValid(editText, Username_REGEX,Username_MSG, required);
}
public static boolean isValid(EditText editText, String regex, String errMsg, boolean required) {
String text = editText.getText().toString().trim();
editText.setError(null);
if ( required && !hasTextemt(editText) ) return false;
if (required && !Pattern.matches(regex, text)) {
editText.setError(errMsg);
return false;
};
return true;
}
public static boolean hasTextemt(EditText editText) {
String text = editText.getText().toString().trim();
editText.setError(null);
if (text.length() == 0) {
editText.setError(emt);
return false;
}
return true;
}
Is this correct? I am not getting the expected result. Can anyone guide me?

Move the dash (-) to the end of the character class:
^[_A-Za-z0-9.#-]+$
and
^[A-Za-z0-9\\s-]+$
Between two characters, a dash denotes a range.
Edit: You also need a + quantifier to match one or more of the characters in the character class.
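A minimal sketch of the two corrected patterns in plain Java. It takes a String rather than an EditText so it runs outside Android; the class and method names are hypothetical. Matcher.matches() already requires the whole input to match, so ^ and $ are redundant here but harmless.

```java
import java.util.regex.Pattern;

class UsernameRegexDemo {
    // Letters, digits, '_', '.', '#' and '-'; one or more, no spaces.
    // The dash is last in the class, so it is a literal '-' rather than a range.
    static final Pattern USERNAME = Pattern.compile("^[_A-Za-z0-9.#-]+$");

    // Letters, digits, whitespace and '-'; one or more.
    static final Pattern ALNUM_SPACES = Pattern.compile("^[A-Za-z0-9\\s-]+$");

    static boolean isUsername(String s) {
        return USERNAME.matcher(s).matches();
    }

    static boolean isAlnumWithSpaces(String s) {
        return ALNUM_SPACES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUsername("john_doe.#1"));        // true
        System.out.println(isUsername("john doe"));           // false: contains a space
        System.out.println(isAlnumWithSpaces("John Doe 42")); // true
        System.out.println(isAlnumWithSpaces("john_doe"));    // false: '_' is not allowed
    }
}
```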

I am assuming that you are getting this input via an EditText widget. In the layout XML file you can add the following attribute so that the field accepts only the specified characters:
android:digits="abcdefghijklmnopqrstuvwxyz0123456789,.-#_"
Note that it won't allow any capital letters.
Just add any characters you want your user to be able to enter. If you are not worried about the pattern or the number of occurrences of any character, then you don't need a regex at all.
Hope it helps

Try
"[\\w#\\.]+" //for alphanumeric, #, .
"[\\w\\s]+" //for alphanumeric, spaces
Add ^ and $ if you need the pattern to match the whole input (String.matches() already requires a full match).
PS: For testing regexp I always use RegexPlanet (not spam :P)
Hope it helps.
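To show why the anchors matter only for partial matching, here is a small sketch (hypothetical class and method names) comparing Matcher.find(), which accepts any matching substring, with Matcher.matches(), which must consume the whole input.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // \w already covers letters, digits and '_'; '#' and '.' are added explicitly.
    static final Pattern ALLOWED = Pattern.compile("[\\w#\\.]+");

    // find(): succeeds if ANY part of the input matches, so "ab cd" passes.
    static boolean containsAllowedRun(String s) {
        return ALLOWED.matcher(s).find();
    }

    // matches(): the entire input must match, which behaves like ^...$.
    static boolean isEntirelyAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsAllowedRun("ab cd")); // true
        System.out.println(isEntirelyAllowed("ab cd"));  // false
        System.out.println(isEntirelyAllowed("a.b#c"));  // true
    }
}
```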

You are only missing a quantifier. In your expression ^[_A-Za-z0-9.#-]$, the character class [_A-Za-z0-9.#-] matches exactly one character out of the class. To allow repeated characters, you need to define a quantifier.
* (short for {0,}) matches 0 or more characters (so it also allows the empty string!)
+ (short for {1,}) matches 1 or more characters
{n,m} matches at least n and at most m characters.
So your regex would look like
^[_A-Za-z0-9.#-]+$
if you require 1 or more characters, or
^[_A-Za-z0-9.#-]{6,20}$
if you want at least 6 characters and at most 20.
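A quick sketch of the length-limited variant next to the * form (method names are hypothetical). Because String.matches() must match the whole string, the ^ and $ are omitted.

```java
class LengthQuantifierDemo {
    // {6,20}: at least 6 and at most 20 characters from the class.
    static boolean hasValidLength(String s) {
        return s.matches("[_A-Za-z0-9.#-]{6,20}");
    }

    // * allows zero repetitions, so the empty string is accepted.
    static boolean starAllowsEmpty(String s) {
        return s.matches("[_A-Za-z0-9.#-]*");
    }

    public static void main(String[] args) {
        System.out.println(hasValidLength("abc12"));  // false: only 5 characters
        System.out.println(hasValidLength("abc123")); // true
        System.out.println(starAllowsEmpty(""));      // true
    }
}
```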
Other things:
You can replace _A-Za-z0-9 with \w. Be aware that in Java \w is ASCII-only ([a-zA-Z_0-9]) by default; it covers letters and digits from all languages only when the Pattern.UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag) is set.
A-Za-z is also ASCII-only, so you may want to look at Unicode properties: \p{L}, for example, matches a letter in any language.
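A small sketch of the difference, using "café" as input (written with the escape \u00e9 for é to avoid source-encoding issues). The class and method names are hypothetical; UNICODE_CHARACTER_CLASS requires Java 7 or later.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // Default: \w is [a-zA-Z_0-9] only.
    static boolean asciiWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    // With UNICODE_CHARACTER_CLASS, \w also covers letters and digits of other scripts.
    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    // \p{L} matches a letter of any language without any flag.
    static boolean anyLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";
        System.out.println(asciiWord(cafe));   // false
        System.out.println(unicodeWord(cafe)); // true
        System.out.println(anyLetters(cafe));  // true
    }
}
```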

You're missing a plus sign (meaning one or more) at the end of the character class, and you can simplify considerably:
^[\\w.#]+$
Characters within a character class lose their special meanings, so they don't need to be escaped. The exceptions are the square brackets, the backslash, a ^ in first position, a - between two characters and, in Java, a doubled &&.
For alphanumeric characters and spaces only, that is, combinations of letters, numbers and spaces:
^[a-zA-Z0-9 ]+$
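A minimal check (hypothetical names) that '.' and '#' are literal inside the class, while a bare '.' outside a class matches any character:

```java
class CharClassLiteralDemo {
    // Inside [...], '.' and '#' are plain characters, so no escaping is needed.
    static boolean isWordDotHash(String s) {
        return s.matches("[\\w.#]+");
    }

    // Letters, digits and the space character only (no tabs or newlines).
    static boolean isAlnumOrSpace(String s) {
        return s.matches("[a-zA-Z0-9 ]+");
    }

    public static void main(String[] args) {
        System.out.println(isWordDotHash("a.b#c_1"));        // true
        System.out.println(isWordDotHash("a;b"));            // false: '.' in the class is literal
        System.out.println("a;b".matches(".+"));             // true: outside a class '.' is any char
        System.out.println(isAlnumOrSpace("Hello World 2")); // true
        System.out.println(isAlnumOrSpace("tab\there"));     // false: a tab is not a space
    }
}
```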

Related

Java check if string only contains english keyboard letters

I want to disallow users from using any special characters in their name.
They should be able to use the whole english keyboard, so
a-z, 0-9, [], (), &, ", %, $, ^, °, #, *, +, ~, §, ., ,, -, ', =, }{
and so on. So they should be allowed to use every "normal" English character that can be typed on a keyboard.
How can I check that?
Use a regex to check that the name consists of English letters.
Solution 1:
if(name.matches("[a-zA-Z]+")) {
// Accept name
}
else {
// Ask to enter again
}
Solution 2:
while(!name.matches("[a-zA-Z]+")) {
// Ask to enter again and assign the new input to name, otherwise this loops forever
}
// Accept name
We can do it like this:
String str = "My string";
System.out.println(str.matches("^[a-zA-Z][a-zA-Z\\s]+$"));//true
str = "My string1";
System.out.println(str.matches("^[a-zA-Z][a-zA-Z\\s]+$"));//false
You can use a regular expression for this.
Since you have lots of characters that have special meaning in a regular expression, I recommend putting them in a separate string and quoting them:
String specialCharacters = "-[]()&...";
Pattern allowedCharactersPattern = Pattern.compile("[A-Za-z0-9" + Pattern.quote(specialCharacters) + "]*");
boolean containsOnlyAllowedCharacters(String str) {
return allowedCharactersPattern.matcher(str).matches();
}
As for how to obtain the string of special characters in the first place, there is no way to list all the characters that can be typed with the user's current keyboard layout. In fact, since there are ways to type any Unicode character at all such a list would be useless anyway.
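A self-contained version of the Pattern.quote approach. The concrete list of special characters is an assumption taken from the question (° and § written as \u00b0 and \u00a7); adjust it to your own definition. Pattern.quote wraps the list in \Q...\E, which Java's regex compiler also honours inside a character class, so none of these characters is treated as regex syntax.

```java
import java.util.regex.Pattern;

class AllowedCharactersDemo {
    // Assumed list of allowed punctuation, based on the question.
    static final String SPECIAL_CHARACTERS = "[]()&\"%$^\u00b0#*+~\u00a7.,-'={}";

    // Letters, digits and the quoted punctuation; * also accepts the empty string.
    static final Pattern ALLOWED = Pattern.compile(
            "[A-Za-z0-9" + Pattern.quote(SPECIAL_CHARACTERS) + "]*");

    static boolean containsOnlyAllowedCharacters(String str) {
        return ALLOWED.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyAllowedCharacters("abc(1)+[x]{y}")); // true
        System.out.println(containsOnlyAllowedCharacters("na\u00efve"));    // false: 'ï' not listed
        System.out.println(containsOnlyAllowedCharacters("a b"));          // false: space not listed
    }
}
```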
I find the requirement quite strange, in that I can't see the rationale behind accepting § but not, say, å, and I have not checked the list of characters you want to accept in any detail.
But, it seems to me that what you're asking is to accept any character whose codepoint value is less than 0x0080, with the oddball exception of § (0x00A7). So I'd code it to make that check explicitly, and not get involved with regular expressions. I assume you want to exclude control characters, even though they can be typed on an English keyboard.
Pseudocode:
for each character ch in string
if ch < 0x0020 || (ch >= 0x007f && ch != '§')
then it's not allowed
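The pseudocode translates directly into Java; the class and method names below are hypothetical. As in the pseudocode, § (U+00A7) is the only exception above 0x7F, so the ° from the question would still be rejected unless it is added the same way.

```java
class KeyboardCharCheck {
    // Allowed: printable ASCII (0x20..0x7E) plus the section sign (U+00A7).
    // Control characters (< 0x20), DEL (0x7F) and everything else above 0x7F are rejected.
    static boolean isAllowed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x0020 || (ch >= 0x007F && ch != '\u00a7')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("John O'Neil #1 (test)")); // true
        System.out.println(isAllowed("caf\u00e9"));             // false
        System.out.println(isAllowed("\u00a7 5"));              // true
    }
}
```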
Your requirements are oddly stated, though, in that you want to disallow "special characters" but allow "!##$%6&*()_+", for example. What's your definition of "special character"?
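For an arbitrary definition of 'allowable characters' I'd use a BitSet:
import java.util.BitSet;
class AllowedChars {
static final BitSet valid = new BitSet();
static {
valid.set('A', 'Z'+1); // set(from, to) excludes 'to', hence the +1
valid.set('a', 'z'+1);
valid.set('0', '9'+1);
valid.set('.');
valid.set('_');
// ...etc...
}
// then check every character of the input
static boolean isValid(String str) {
for (int j=0; j<str.length(); j++)
if (!valid.get(str.charAt(j)))
return false; // ...illegal...
return true;
}
}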

Regex to identify strings containing a particular symbol?

I have a set of inputs: ++++, ----, +-+-. Out of these inputs I want the strings containing only + symbols.
If you want to see if a String contains nothing but + characters, write a loop to check it:
private static boolean containsOnly(String input, char ch) {
if (input.isEmpty())
return false;
for (int i = 0; i < input.length(); i++)
if (input.charAt(i) != ch)
return false;
return true;
}
Then call it to check:
System.out.println(containsOnly("++++", '+')); // prints: true
System.out.println(containsOnly("----", '+')); // prints: false
System.out.println(containsOnly("+-+-", '+')); // prints: false
UPDATE
If you must do it using regex (worse performance), then you can do any of these:
// escape special character '+'
input.matches("\\++")
// '+' not special in a character class
input.matches("[+]+")
// if "+" is dynamic value at runtime, use quote() to escape for you,
// then use a repeating non-capturing group around that
input.matches("(?:" + Pattern.quote("+") + ")+")
Replace final + with * in each of these, if an empty string should return true.
The regular expression for checking if a string is composed of only one repeated symbol is
^(.)\1*$
If you only want lines composed of '+', then it's
^\++$, or ^++*$ if your regex implementation does not support + (meaning "one or more").
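For a sequence of the same symbol, use
(.)\1+
as the regular expression. Matched against the whole string (as String.matches() does), this accepts +++ and --- but not +--; with Matcher.find(), +-- would also match because it contains --.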
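A sketch of the backreference patterns in Java (hypothetical names). The \1 refers to whatever character group 1 captured, so the whole string must repeat that one character; the last two methods show how matches() and find() differ for (.)\1+.

```java
import java.util.regex.Pattern;

class RepeatedSymbolDemo {
    // (.) captures one character; \1* repeats exactly that character zero or more times.
    private static final Pattern ONE_REPEATED_SYMBOL = Pattern.compile("(.)\\1*");
    // (.)\1+ needs at least two identical characters in a row.
    private static final Pattern RUN_OF_TWO = Pattern.compile("(.)\\1+");

    static boolean isOneRepeatedSymbol(String s) {
        return ONE_REPEATED_SYMBOL.matcher(s).matches();
    }

    // matches(): the whole string must be a single run of one character.
    static boolean isRunOfTwoOrMore(String s) {
        return RUN_OF_TWO.matcher(s).matches();
    }

    // find(): a run anywhere in the string is enough.
    static boolean containsRunOfTwoOrMore(String s) {
        return RUN_OF_TWO.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isOneRepeatedSymbol("++++"));  // true
        System.out.println(isOneRepeatedSymbol("+-+-"));  // false
        System.out.println(isRunOfTwoOrMore("+--"));      // false
        System.out.println(containsRunOfTwoOrMore("+--")); // true
    }
}
```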
Regex pattern: ^[^\+]*?\+[^\+]*$
This will only permit one plus sign per string.
Explanation:
^ #From start of string
[^\+]* #Match 0 or more non plus characters
\+ #Match 1 plus character
[^\+]* #Match 0 or more non plus characters
$ #End of string
Edit: I just read the comments under the question. I didn't actually steal the commented regex (it just happens to be intellectual convergence).
Whoops: when using matches(), you can drop the ^ and $ anchors, since matches() already has to consume the whole string.
input.matches("[^\\+]*?\\+[^\\+]*")

Remove Special Characters For A Pattern Java

I want to remove these characters from a String:
+ - ! ( ) { } [ ] ^ ~ : \
also I want to remove them:
/*
*/
&&
||
I don't mean removing every & or |: I want to remove & and | only when doubled (&& and ||), and / and * only as the pairs /* and */.
How can I do that efficiently and fast in Java?
Example:
a:b+c1|x||c*(?)
will be:
abc1|xc*?
This can be done via a long, but actually very simple regex.
String aString = "a:b+c1|x||c*(?)";
String sanitizedString = aString.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(sanitizedString);
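If this runs often, one option is to compile the pattern once and reuse it, since String.replaceAll compiles its regex on every call. A minimal sketch, with a hypothetical class and method name, using the same regex as above:

```java
import java.util.regex.Pattern;

class SpecialCharRemover {
    // Single characters: + - ! ( ) { } [ ] ^ ~ : \
    // Two-character sequences: /* */ && ||
    private static final Pattern SPECIAL = Pattern.compile(
            "[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|");

    static String sanitize(String input) {
        return SPECIAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a:b+c1|x||c*(?)")); // abc1|xc*?
    }
}
```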
I think that the java.lang.String.replaceAll(String regex, String replacement) is all you need:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String).
There are two ways to do that:
1)
import java.util.ArrayList;
ArrayList<String> arrayList = new ArrayList<String>();
arrayList.add("+");
arrayList.add("-");
arrayList.add("!");
arrayList.add("||");
arrayList.add("&&");
arrayList.add("(");
arrayList.add(")");
arrayList.add("{");
arrayList.add("}");
arrayList.add("[");
arrayList.add("]");
arrayList.add("~");
arrayList.add("^");
arrayList.add(":");
arrayList.add("\\");
arrayList.add("/*");
arrayList.add("*/");
String string = "a:b+c1|x||c*(?)";
// String.replace is literal (no regex) and does nothing when the token is absent,
// so no contains() check is needed
for (int i = 0; i < arrayList.size(); i++) {
string = string.replace(arrayList.get(i), "");
}
System.out.println(string); // abc1|xc*?
2)
String string = "a:b+c1|x||c*(?)";
string = string.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(string);
Thomas wrote on How to remove special characters from a string?:
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".
Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would mean "all characters in the range : to ,").
So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape the backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a Unicode separator. Note that \p{Z} covers space characters plus the line and paragraph separators U+2028 and U+2029, but not control characters such as \t or \n, so tabs and newlines are removed too. Also note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which matches almost everything, since letters are not whitespace and vice versa.
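A small sketch comparing the three replaceAll variants on accented input (hypothetical class and method names). Note how [^\\w\\s] drops the ASCII-foreign "é", while [^\\p{L}\\p{Z}] keeps it but removes digits, punctuation and newlines.

```java
class LetterFilterDemo {
    // Keeps ASCII word characters and whitespace; "é" is not in \w by default.
    static String keepWordCharsAndWhitespace(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    // Keeps letters of any language and Unicode separators (\n and \t are not separators).
    static String keepLettersAndSeparators(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Removes any punctuation or symbol character.
    static String removePunctuationAndSymbols(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]", "");
    }

    public static void main(String[] args) {
        String input = "Caf\u00e9 No. 5!";
        System.out.println(keepWordCharsAndWhitespace(input)); // Caf No 5
        System.out.println(keepLettersAndSeparators(input));   // Café No (with a trailing space)
        System.out.println(removePunctuationAndSymbols("a+b=c!, d")); // abc d
    }
}
```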

Perl5Matcher.matches(input, pattern) is returning true for input containing semicolon even when semicolon is not in pattern

I have a string MyString = "AP;"; or any other number of strings containing ;
When I attempt to validate that MyString matches a pattern
eg. MyPattern = "^[a-zA-Z0-9 ()+-_.]*$";
Which I believe should allow alphanumerics and the characters ()+-_. but not ;
However the below statement is returning True!
Pattern sepMatchPattern = sepMatchCompiler.compile("^[a-zA-Z0-9 ()+-_.]*$");
Perl5Matcher matcher = new Perl5Matcher();
if (matcher.matches("AP;", sepMatchPattern)) {
return true;
} else {
return false;
}
Can anyone explain why the semicolon keeps getting allowed through?
The problem lies in the regular expression that you have defined: ^[a-zA-Z0-9 ()+-_.]*$. Within this regular expression is a character class of alpha (upper and lower), numeric, space, parentheses, and some punctuation. One of the punctuation characters is an unescaped period.
Outside a character class a period means any character, but inside one it is already literal, so it is not what lets the semicolon through.
Escaping it is harmless, though:
Pattern sepMatchPattern = sepMatchCompiler.compile("^[a-zA-Z0-9 ()+-_\\.]*$");
Edit:
It turns out that there is another item that I missed in there that has special meaning. The hyphen in the character class of "+-_" does not mean "plus, hyphen, or underscore". Rather, it means all the characters from 0x2B to 0x5F (inclusive). A quick test shows that ^[+-_]*$ also matches AP; because A and P are 0x41 and 0x50 and the notorious semicolon is 0x3B - all within the range of 0x2B to 0x5F.
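The quick test described above can be reproduced with a sketch like the following. It uses java.util.regex rather than the Jakarta ORO Perl5Matcher from the question, assuming both interpret this character class the same way (Perl-style ranges); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class HyphenRangeDemo {
    // "+-_" is parsed as the range U+002B..U+005F, which contains ';' (U+003B),
    // 'A' (U+0041) and 'P' (U+0050).
    static final Pattern BROKEN = Pattern.compile("^[a-zA-Z0-9 ()+-_.]*$");
    // With the hyphen escaped, '+', '-', '_' and '.' are listed individually.
    static final Pattern FIXED = Pattern.compile("^[a-zA-Z0-9 ()+\\-_\\.]*$");

    static boolean brokenMatches(String s) {
        return BROKEN.matcher(s).matches();
    }

    static boolean fixedMatches(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(brokenMatches("AP;"));          // true
        System.out.println(fixedMatches("AP;"));           // false
        System.out.println(fixedMatches("AP-1 (x_y.z)")); // true
    }
}
```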
The correct regular expression is:
"^[a-zA-Z0-9 ()+\\-_\\.]*$"

java regex to filter out non-English text

I found a few references to regex filtering out non-English but none of them is in Java, aside from the fact that they are all referring to somewhat different problems than what I am trying to solve:
1. Replace all non-English characters with a space.
2. Create a method that returns true if a string contains any non-English character.
By "English text" I mean not only actual letters and numbers but also punctuation.
So far, what I have been able to come with for goal #1 is quite simple:
String.replaceAll("\\W", " ")
In fact, so simple that I suspect that I am missing something... Do you spot any caveats in the above?
As for goal #2, I could simply trim() the string after the above replaceAll(), then check if it's empty. But... Is there a more efficient way to do this?
In fact, so simple that I suspect that I am missing something... Do you spot any caveats in the above?
\W is equivalent to [^\w], and \w is equivalent to [a-zA-Z_0-9]. Using \W will replace everything that isn't an ASCII letter, a digit, or an underscore, including punctuation, accented letters, tabs and newline characters. Whether or not that's a problem is really up to you.
By "English text" I mean not only actual letters and numbers but also punctuation.
In that case, you might want to use a character class which omits punctuation; something like
[^\w.,;:'"]
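A minimal sketch of both goals with that class (hypothetical names). For goal #2 it uses find(), which stops at the first offending character instead of building a new string; adding \s to the class there is an assumption so that ordinary spaces and line breaks are not reported as non-English.

```java
import java.util.regex.Pattern;

class NonEnglishReplacer {
    // Goal #1: anything outside [\w.,;:'"] becomes a space (whitespace included).
    static String replaceNonEnglish(String s) {
        return s.replaceAll("[^\\w.,;:'\"]", " ");
    }

    // Goal #2: true as soon as one character outside the allowed set is found.
    private static final Pattern NON_ENGLISH = Pattern.compile("[^\\w\\s.,;:'\"]");

    static boolean containsNonEnglish(String s) {
        return NON_ENGLISH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(replaceNonEnglish("na\u00efve!"));   // "na ve "
        System.out.println(containsNonEnglish("It's fine."));  // false
        System.out.println(containsNonEnglish("na\u00efve"));  // true
    }
}
```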
Create a method that returns true if a string contains any non-English character.
Use Pattern and Matcher.
Pattern p = Pattern.compile("\\W");
boolean containsSpecialChars(String string)
{
Matcher m = p.matcher(string);
return m.find();
}
This works for me
private static boolean isEnglish(String text) {
CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder();
CharsetEncoder isoEncoder = Charset.forName("ISO-8859-1").newEncoder();
return asciiEncoder.canEncode(text) || isoEncoder.canEncode(text);
}
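One caveat worth checking: ISO-8859-1 is a superset of US-ASCII, so the || above reduces to the ISO-8859-1 test, which also accepts accented Latin letters such as é. A small sketch (hypothetical names; StandardCharsets requires Java 7+) that separates the two checks:

```java
import java.nio.charset.StandardCharsets;

class EncoderCheckDemo {
    // A fresh encoder per call, since CharsetEncoder instances are not thread-safe.
    static boolean isAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    static boolean isLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";
        System.out.println(isAscii(cafe));  // false
        System.out.println(isLatin1(cafe)); // true
    }
}
```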
Here is my solution. I assume the text may contain English words, punctuation marks and standard ascii symbols such as #, %, # etc.
private static final String IS_ENGLISH_REGEX = "^[ \\w \\d \\s \\. \\& \\+ \\- \\, \\! \\# \\# \\$ \\% \\^ \\* \\( \\) \\; \\\\ \\/ \\| \\< \\> \\\" \\' \\? \\= \\: \\[ \\] ]*$";
private static boolean isEnglish(String text) {
if (text == null) {
return false;
}
return text.matches(IS_ENGLISH_REGEX);
}
Assuming an English word is made up of characters from [a-zA-Z_0-9]:
To return true if a string contains any non-English character, use string.matches:
return !string.matches("^\\w+$");
