Test for English only A-Z upper case of a character - java

I need to test whether a character is an uppercase letter in A-Z only, not any other Unicode uppercase character or a letter from another language.
I was reading the documentation for Character.isUpperCase. It seems like it would pass any Unicode character that is considered uppercase, even one that is not between A and Z, including uppercase characters from languages other than English.
Do I just need to use regular expressions, or am I misreading the Character.isUpperCase documentation?
Thanks

From the documentation you linked:
Many other Unicode characters are uppercase too.
So yes, using isUpperCase will match things other than A-Z.
One way to do the test though is like this.
boolean isUpperCaseEnglish(char c){
return c >= 'A' && c <= 'Z';
}

isUpperCase indeed does not promise the character is between 'A' and 'Z'. You could use a regex (this one tests a whole string, and because of the * it also accepts an empty one):
String s = ...;
Pattern p = Pattern.compile("[A-Z]*");
Matcher m = p.matcher(s);
boolean matches = m.matches();

Character.isUpperCase() does accept things based off of other languages. For instance, Ω would be considered uppercase.
But you can do a check to make sure it is between A and Z:
public static boolean isUpperCaseInEnglish(char c) {
return (c >= 'A' && c <= 'Z');
}
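To make the difference concrete, here is a small sketch that runs both checks side by side. The class name UpperCaseCheck is only for illustration, and the non-ASCII characters are written as Unicode escapes so the source file encoding does not matter.

```java
// Illustrative sketch; the class name is hypothetical.
final class UpperCaseCheck {

    // True only for the 26 ASCII capitals 'A'..'Z'.
    static boolean isUpperCaseEnglish(char c) {
        return c >= 'A' && c <= 'Z';
    }

    public static void main(String[] args) {
        char omega = '\u03A9';   // GREEK CAPITAL LETTER OMEGA
        char eAcute = '\u00C9';  // LATIN CAPITAL LETTER E WITH ACUTE

        // Print both results for a few sample characters.
        for (char c : new char[] {'A', 'z', '5', omega, eAcute}) {
            System.out.printf("U+%04X isUpperCase=%b isUpperCaseEnglish=%b%n",
                    (int) c, Character.isUpperCase(c), isUpperCaseEnglish(c));
        }
    }
}
```

Character.isUpperCase reports true for the omega and the accented E, while the range check rejects both.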

Related

Java check if string only contains english keyboard letters

I want to disallow users from using any special characters in their name.
They should be able to use the whole english keyboard, so
a-z, 0-9, [], (), &, ", %, $, ^, °, #, *, +, ~, §, ., ,, -, ', =, }{
and so on. So they should be allowed to use every "normal" english character which you can type with your keyboard.
How can I check that?
Use a regex to match a name made up of English letters only.
Solution 1:
if(name.matches("[a-zA-Z]+")) {
// Accept name
}
else {
// Ask to enter again
}
Solution 2:
while(!name.matches("[a-zA-Z]+")) {
// Ask to enter again
}
// Accept name
We can do it like this:
String str = "My string";
System.out.println(str.matches("^[a-zA-Z][a-zA-Z\\s]+$"));//true
str = "My string1";
System.out.println(str.matches("^[a-zA-Z][a-zA-Z\\s]+$"));//false
You can use a regular expression for this.
Since you have lots of characters that have special meaning in a regular expression, I recommend putting them in a separate string and quoting them:
String specialCharacters = "-[]()&...";
Pattern allowedCharactersPattern = Pattern.compile("[A-Za-z0-9" + Pattern.quote(specialCharacters) + "]*");
boolean containsOnlyAllowedCharacters(String str) {
return allowedCharactersPattern.matcher(str).matches();
}
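A self-contained version of this approach might look like the sketch below. The contents of SPECIAL_CHARACTERS are an assumption chosen for illustration, not the list from the question, and the class name is hypothetical. It uses + instead of * so that an empty name is rejected.

```java
import java.util.regex.Pattern;

// Illustrative sketch; class name and the allow-list contents are assumptions.
final class NameValidator {

    // Hypothetical allow-list of punctuation (plus space); adjust as required.
    private static final String SPECIAL_CHARACTERS = "-[]()&\"%$^#*+~.,'={} ";

    // Pattern.quote makes characters such as '-', ']' and '^' literal
    // inside the character class.
    private static final Pattern ALLOWED = Pattern.compile(
            "[A-Za-z0-9" + Pattern.quote(SPECIAL_CHARACTERS) + "]+");

    // True when every character of str is in the allow-list and str is not empty.
    static boolean containsOnlyAllowedCharacters(String str) {
        return ALLOWED.matcher(str).matches();
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call.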
As for how to obtain the string of special characters in the first place, there is no way to list all the characters that can be typed with the user's current keyboard layout. In fact, since there are ways to type any Unicode character at all, such a list would be useless anyway.
I find the requirement to be quite strange, in that I can't see the rationale behind accepting § but not, say, å, and I have not checked the list of characters you want to accept in any detail.
But, it seems to me that what you're asking is to accept any character whose codepoint value is less than 0x0080, with the oddball exception of § (0x00A7). So I'd code it to make that check explicitly, and not get involved with regular expressions. I assume you want to exclude control characters, even though they can be typed on an English keyboard.
Pseudocode:
for each character ch in string
if ch < 0x0020 || (ch >= 0x007f && ch != '§')
then it's not allowed
Your requirements are oddly stated though, in that you want to disallow "special characters" but allow '!##$%6&*()_+' for example. What's your definition of "special character"?
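The pseudocode above could be written in Java roughly as follows. This is a sketch under the same assumptions as the pseudocode (printable ASCII plus the section sign); the class and method names are made up for illustration.

```java
// Illustrative sketch; class and method names are hypothetical.
final class BasicLatinCheck {

    private static final char SECTION_SIGN = '\u00A7'; // the section sign

    // Accepts printable ASCII (U+0020..U+007E) plus the section sign;
    // rejects control characters, DEL and everything above U+007F.
    static boolean isAllowed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            boolean printableAscii = ch >= 0x0020 && ch < 0x007F;
            if (!printableAscii && ch != SECTION_SIGN) {
                return false;
            }
        }
        return true;
    }
}
```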
For arbitrary definition of 'allowable characters' I'd use a bitset.
// requires: import java.util.BitSet;
static BitSet valid = new BitSet();
static {
    valid.set('A', 'Z'+1); // the end index is exclusive, hence the +1
    valid.set('a', 'z'+1);
    valid.set('0', '9'+1);
    valid.set('.');
    valid.set('_');
    ...etc...
}
then
for (int j=0; j<str.length(); j++)
    if (!valid.get(str.charAt(j)))
        ...illegal...
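Wrapped into a complete class, the BitSet idea might look like this sketch. The set of allowed characters is just the one from the fragment above, and the names AllowedChars and isValid are assumptions for illustration.

```java
import java.util.BitSet;

// Illustrative sketch; class and method names are hypothetical.
final class AllowedChars {

    private static final BitSet VALID = new BitSet(128);

    static {
        VALID.set('A', 'Z' + 1);  // set(from, to) excludes 'to', hence the + 1
        VALID.set('a', 'z' + 1);
        VALID.set('0', '9' + 1);
        VALID.set('.');
        VALID.set('_');
        // Add further characters here as needed.
    }

    // True when every char of str has its bit set in VALID.
    static boolean isValid(String str) {
        for (int j = 0; j < str.length(); j++) {
            if (!VALID.get(str.charAt(j))) {
                return false;
            }
        }
        return true;
    }
}
```

BitSet.get returns false for any index that was never set, so characters outside ASCII are rejected without extra range checks.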

Converting a lowercase char to uppercase without using an if statement

How can I convert a lowercase char to uppercase without using an if statement? That is, without code like this:
if(c >= 'a' && c <= 'z')
{
c = (char) (c - 32);
}
You can use this:
char uppercase = Character.toUpperCase(c);
Use Character.toUpperCase(char):
Converts the character argument to uppercase using case mapping information from the UnicodeData file.
For example, Character.toUpperCase('a') returns 'A'.
So the full code you probably want is:
c = Character.toUpperCase(c);
If you are sure that your characters are ASCII letters, you can clear the bit that makes a letter lowercase, since uppercase and lowercase Latin letters differ by only one bit (0x20) in the ASCII table.
You can simply do:
char upper = (char) (c & 0x5F);
You can use the ternary operator. For your case, try something like this:
c = (c >= 'a' && c <= 'z') ? (char) (c - 32) : c;
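The three approaches suggested in these answers behave differently on input that is not an ASCII letter. The sketch below puts them next to each other; the class and method names are invented for illustration.

```java
// Illustrative sketch; class and method names are hypothetical.
final class AsciiUpper {

    // Library call: uses Unicode case mapping, so it also handles non-ASCII letters.
    static char viaLibrary(char c) {
        return Character.toUpperCase(c);
    }

    // Bit mask: only correct when c is already known to be an ASCII letter.
    static char viaMask(char c) {
        return (char) (c & 0x5F);
    }

    // Ternary: converts ASCII lowercase and leaves everything else untouched.
    static char viaTernary(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - 32) : c;
    }
}
```

Note that viaMask changes characters such as digits ('1' is 0x31 and becomes 0x11), so it is only safe after the input has been validated.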

How to use Unicode to split Japanese from English

I have a string variable which is a paragraph containing both English and Japanese words.
I want to split Japanese from English.
So I want to use the Unicode code point to decide whether a character falls into U+0000 to U+007F (the Basic Latin block).
But I don't know how to write the Java code to get a char's Unicode value, or how to compare it.
Can anyone give me a sample?
public void split(String str){
char[]cstr=str.toCharArray();
String en = "";
String jp = "";
for(char c: cstr){
//(1) To Unicode?
//(2) How to check whether fall into \u0000 ~ \u007F
if(is_en) en+=c;
else jp+=c;
}
}
Assuming the string you have is 16-bit Unicode, and that you aren't trying to go to full Unicode, you can use:
if ('\u0000' <= c && c <= '\u007f') {
    // c is English
} else {
    // c is other
}
I don't know, however, that this does exactly what you want. Many of the characters in that range are actually punctuation, for instance. And I found a reference here to a set of Unicode characters that are a mix of Roman and "half-width kanji". Just be aware that differentiating between all the Unicode characters that might represent English letters and all others might not be this simple; it will depend on your environment.
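Filling in the asker's split method with this comparison gives something like the sketch below. A Java char is already a UTF-16 code unit, so no conversion step is needed before comparing. The class name and the choice to return a two-element array are assumptions made for illustration.

```java
// Illustrative sketch; class name and return shape are hypothetical.
final class ScriptSplitter {

    // Returns {basicLatinPart, everythingElse}.
    static String[] split(String str) {
        StringBuilder en = new StringBuilder();
        StringBuilder other = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c <= '\u007F') {
                en.append(c);      // Basic Latin, including spaces and punctuation
            } else {
                other.append(c);   // anything outside Basic Latin
            }
        }
        return new String[] {en.toString(), other.toString()};
    }
}
```

StringBuilder avoids the repeated string copying that += causes in a loop. If a finer distinction is needed, Character.UnicodeScript.of(codePoint) can identify scripts such as HIRAGANA, KATAKANA or HAN, at the cost of iterating by code point.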

java check the first and last char are uppercase

I am trying to achieve this.
I have a string of 9 characters (always the same length). I also know that the first and last characters must always be alphabetic, and the rest in the middle are digits. How do I check for that?
I have this logic so far; syntax is my problem:
string samplestring;
samplestring = a1234567B
If(samplestring.length() == 9 && samplestring.substring(0,1).uppercase && samplestring.substring(8,9) && samplestring.THE REST OF THE CHAR IN THE MIDDLE ARE DIGITS)
{
println("yes this is correct");
}
else
{
println("retype");
}
Please don't mind the simple English; I just want to know the syntax, and I hope the logic is there.
Also, can you please show me how to convert the lowercase letters to uppercase?
A regular expression would be suitable:
String s = "A2345678Z";
if (s.matches("[A-Z][0-9]{7}[A-Z]"))
{
}
Regular expression explained:
[A-Z] means any uppercase letter
[0-9]{7} means 7 digits
// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("^[A-Za-z]\\d+[A-Za-z]$");
Matcher m = p.matcher("A1234567B");
if (m.matches()) {
//
}
Edit:
If there are always seven digits, you can replace the \\d+ with \\d{7}
String str="A12345678B";
char first = str.charAt(0);
char last = str.charAt(str.length()-1);
if(Character.isUpperCase(first)&& Character.isUpperCase(last)){
//do something
}
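Putting the pieces together, and covering the follow-up about converting lowercase letters, one possible sketch validates the format first and then uppercases the whole string. The class name, the method name and the choice to return null for malformed input are assumptions for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch; names and the null-on-failure convention are hypothetical.
final class CodeFormat {

    // One ASCII letter, exactly seven digits, one ASCII letter (9 characters in total).
    private static final Pattern FORMAT =
            Pattern.compile("[A-Za-z][0-9]{7}[A-Za-z]");

    // Returns the upper-cased code, or null when the input is malformed.
    static String normalize(String input) {
        if (input == null || !FORMAT.matcher(input).matches()) {
            return null;
        }
        return input.toUpperCase(Locale.ROOT);
    }
}
```

Passing Locale.ROOT keeps the conversion independent of the machine's default locale.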

Regex for checking if a string is strictly alphanumeric

How can I check if a string contains only digits and letters, i.e. is alphanumeric?
Assuming you want to check for ASCII alphanumeric characters, try this:
"^[a-zA-Z0-9]*$". Use this regex with String.matches(regex); it returns true if the string is alphanumeric (or empty) and false otherwise.
public boolean isAlphaNumeric(String s){
String pattern= "^[a-zA-Z0-9]*$";
return s.matches(pattern);
}
If it will help, read this for more details about regex: http://www.vogella.com/articles/JavaRegularExpressions/article.html
In order to be unicode compatible:
^[\pL\pN]+$
where
\pL stands for any letter
\pN stands for any number
It's 2016 or later and things have progressed. This matches Unicode alphanumeric strings:
^[\\p{IsAlphabetic}\\p{IsDigit}]+$
See the reference (section "Classes for Unicode scripts, blocks, categories and binary properties"). There's also this answer that I found helpful.
See the documentation of Pattern.
Assuming US-ASCII alphabet (a-z, A-Z), you could use \p{Alnum}.
A regex to check that a line contains only such characters is "^[\\p{Alnum}]*$".
That also matches empty string. To exclude empty string: "^[\\p{Alnum}]+$".
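The difference between the ASCII-only class and the Unicode properties mentioned above can be seen in a short sketch. The class and method names are made up for illustration; both patterns use + so that the empty string is rejected.

```java
// Illustrative sketch; class and method names are hypothetical.
final class AlnumPatterns {

    // \p{Alnum} is ASCII-only by default: [a-zA-Z0-9].
    static boolean asciiAlnum(String s) {
        return s.matches("\\p{Alnum}+");
    }

    // Unicode binary properties: letters and digits from any script.
    static boolean unicodeAlnum(String s) {
        return s.matches("[\\p{IsAlphabetic}\\p{IsDigit}]+");
    }
}
```

A string containing an accented letter, such as "h\u00E9llo", fails asciiAlnum but passes unicodeAlnum.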
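Use POSIX character classes (this bracket syntax works in POSIX-style regex flavors; java.util.regex does not support it and spells the class \p{Alnum} instead):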
^[[:alnum:]]*$
Pattern pattern = Pattern.compile("^[a-zA-Z0-9]*$");
Matcher matcher = pattern.matcher("Teststring123");
if(matcher.matches()) {
// yay! alphanumeric!
}
Try [0-9a-zA-Z]+ to accept only letters and digits, with at least one character.
It may need modification, so test it first:
http://www.regexplanet.com/advanced/java/index.html
Pattern pattern = Pattern.compile("^[0-9a-zA-Z]+$");
Matcher matcher = pattern.matcher(phoneNumber);
if (matcher.matches()) {
}
To consider all Unicode letters and digits, Character.isLetterOrDigit can be used. In Java 8, this can be combined with String#codePoints and IntStream#allMatch.
boolean alphanumeric = str.codePoints().allMatch(Character::isLetterOrDigit);
To include [a-zA-Z0-9_], you can use \w.
So myString.matches("\\w*"). (.matches must match the entire string so ^\\w*$ is not needed. .find can match a substring)
https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
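The point about matches versus find is easy to check with a small sketch. The class and method names here are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Illustrative sketch; class and method names are hypothetical.
final class MatchesVsFind {

    private static final Pattern WORD_CHARS = Pattern.compile("\\w+");

    // matches(): the pattern must cover the entire input.
    static boolean wholeString(String s) {
        return WORD_CHARS.matcher(s).matches();
    }

    // find(): it is enough for some substring to match.
    static boolean anySubstring(String s) {
        return WORD_CHARS.matcher(s).find();
    }
}
```

For "abc 123", wholeString returns false because of the space, while anySubstring returns true because "abc" alone matches.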
If you want to include foreign language letters as well, you can try:
String string = "hippopotamus";
if (string.matches("^[\\p{L}0-9']+$")){
// string is alphanumeric, do something here...
}
Or, if you want to allow a specific special character (for example # or a space) but no others, add it to the character class:
String string = "#somehashtag";
if(string.matches("^[\\p{L}0-9'#]+$")){
// string is alphanumeric plus #, do something here...
}
A regex that accepts only strictly alphanumeric strings, meaning they contain at least one letter and at least one digit and nothing else (digits alone or letters alone are rejected).
For example:
special characters (not allowed)
123 (not allowed)
asdf (not allowed)
1235asdf (allowed)
String name="^(?=.*[a-zA-Z])(?=.*\\d)[a-zA-Z\\d]+$";
To check if a String is alphanumeric, you can use a method that goes through every character in the string and checks if it is alphanumeric.
public static boolean isAlphaNumeric(String s){
for(int i = 0; i < s.length(); i++){
char c = s.charAt(i);
if(!Character.isDigit(c) && !Character.isLetter(c))
return false;
}
return true;
}
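One caveat with a char-based loop is that characters outside the Basic Multilingual Plane are stored as two surrogate chars, neither of which counts as a letter on its own. The sketch below contrasts the loop with a code-point-based check; the class and method names are hypothetical.

```java
// Illustrative sketch; class and method names are hypothetical.
final class CodePointAlnum {

    // Char-based: a supplementary character is seen as two surrogates.
    static boolean byChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isDigit(c) && !Character.isLetter(c)) {
                return false;
            }
        }
        return true;
    }

    // Code-point-based: each supplementary character is tested as a whole.
    static boolean byCodePoint(String s) {
        return s.codePoints().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A, a letter outside the BMP.
        String boldA = new String(Character.toChars(0x1D400));
        System.out.println(byChar(boldA) + " " + byCodePoint(boldA));
    }
}
```

Both versions return true for an empty string, so add a length check if empty input should be rejected.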
