Converting a lowercase char to uppercase without using an if statement - java

How can I convert a lowercase char to uppercase without using an if statement? I.e., without code like this:
if (c >= 'a' && c <= 'z')
{
c = (char) (c - 32);
}

You can use this:
char uppercase = Character.toUpperCase(c);

Use Character.toUpperCase(char):
Converts the character argument to uppercase using case mapping information from the UnicodeData file.
For example, Character.toUpperCase('a') returns 'A'.
So the full code you probably want is:
c = Character.toUpperCase(c);
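As a rough illustration of how Character.toUpperCase behaves on different inputs, here is a minimal sketch. The class name ToUpperCaseDemo and the helper toUpper are made up for this example; only Character.toUpperCase comes from the JDK.

```java
final class ToUpperCaseDemo {
    // Hypothetical helper: a thin wrapper so the behaviour is easy to exercise.
    static char toUpper(char c) {
        return Character.toUpperCase(c);
    }

    public static void main(String[] args) {
        System.out.println(toUpper('a'));      // lowercase letter: mapped to its uppercase form
        System.out.println(toUpper('Z'));      // already uppercase: returned unchanged
        System.out.println(toUpper('7'));      // not a letter: returned unchanged
        System.out.println(toUpper('\u00E9')); // non-ASCII letter (e with acute accent) is mapped too
    }
}
```

Because characters without an uppercase mapping are returned unchanged, no range check is needed before the call.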

If you are sure that your characters are ASCII letters, you can clear the bit that marks them as lowercase, since uppercase and lowercase Latin letters differ by a single bit (0x20) in the ASCII table.
You can simply do:
char upper = (char) (c & 0x5F);
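A small sketch of the mask trick and its limits, assuming the input really is an ASCII letter. The names AsciiMaskDemo and maskToUpper are hypothetical. Note the cast back to char, which Java requires because c & 0x5F is an int expression.

```java
final class AsciiMaskDemo {
    // Keeps bits 0-4 and bit 6 only, so bit 0x20 (the "lowercase" bit) is cleared.
    // Only meaningful when c is known to be an ASCII letter.
    static char maskToUpper(char c) {
        return (char) (c & 0x5F);
    }

    public static void main(String[] args) {
        System.out.println(maskToUpper('a'));       // ASCII lowercase letter becomes uppercase
        System.out.println(maskToUpper('Q'));       // ASCII uppercase letter is left as it is
        System.out.println((int) maskToUpper('1')); // a digit is silently turned into another code
    }
}
```

The last line is the reason for the caveat: the mask has no idea whether its input is a letter, so anything else is silently corrupted.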

You can use the ternary operator. For your case, try something like this:
c = (c >= 'a' && c <= 'z') ? (char) (c - 32) : c;

Related

Adding alphabets to a Java HashSet

I want to add all lowercase letters to a Java HashSet. For this I use the following code snippet:
for(char c = 'a'; c <= 'z'; c++)
set.add(c);
Is there a better way to do this, like one in which I don't have to iterate over all the letters?
In Java 8 using IntStream, I think you should be able to get all lowercase Character values like this:
Set<Character> set = IntStream.rangeClosed(Character.MIN_VALUE, Character.MAX_VALUE)
.filter(Character::isLowerCase)
.mapToObj(i -> Character.valueOf((char) i))
.collect(Collectors.toSet());
This Set<Character> contains 1,402 Character objects (the exact count depends on the Unicode version your JDK supports).
If you don't want to box the char values as Character objects, and don't mind using a third-party library, you can do the following with Eclipse Collections:
CharSet charSet = IntInterval.zeroTo(Character.MAX_VALUE)
.asLazy()
.select(Character::isLowerCase)
.collectChar(i -> (char) i)
.toSet();
This CharSet contains 1,402 char values.
If you don't want ALL lowercase letters, then just change your range to the ones you are looking for. For example:
CharSet charSet = IntInterval.fromTo('a', 'z')
.asLazy()
.collectChar(i -> (char) i)
.toSet();
Note: I am a committer for Eclipse Collections
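For comparison, the same 'a' to 'z' range can probably be collected with the JDK alone. This is a sketch, not part of the answer above; the class name LowercaseSetDemo and the method asciiLowercase are made up for illustration, and a TreeSet is used only so that the printed order is predictable.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LowercaseSetDemo {
    // Builds a sorted set holding the 26 ASCII lowercase letters.
    static Set<Character> asciiLowercase() {
        return IntStream.rangeClosed('a', 'z')
                .mapToObj(i -> (char) i) // each int is cast to char and boxed to Character
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        Set<Character> letters = asciiLowercase();
        System.out.println(letters.size());
        System.out.println(letters);
    }
}
```

The stream still visits every letter internally, so this is a matter of style rather than a way to avoid the iteration.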

Java String/Char charAt() Comparison

I have seen various comparisons that you can do with the charAt() method.
However, I can't really understand a few of them.
String str = "asdf";
str.charAt(0) == '-'; // What does it mean when it's equal to '-'?
char c = '3';
if (c < '9') // How are char variables compared with the `<` operator?
Any help would be appreciated.
// What does it mean when it's equal to '-'?
Every letter and symbol is a character. You can look at the first character of a String and check for a match.
In this case you get the first character and check whether it is the minus character. The minus sign is (char) 45; see below.
// How are char variables compared with the < operator?
In Java, all characters are actually 16-bit unsigned numbers. Each character has a number based on its Unicode value, e.g. '9' is (char) 57. This comparison is true for any character whose code is less than the code for '9', e.g. a space.
The first character of your string is 'a' which is (char) 97 so (char) 97 < (char) 57 is false.
String str = "asdf";
String output = " ";
if(str.charAt(0) == '-'){
// What does it mean when it's equal to '-'?
output= "- exists in the first index of the String";
}
else {
output="- doesn't exist in the first index of the String";
}
System.out.println(output);
It checks whether that char is at index 0; it is a comparison.
As for if (c < '9'), the ASCII values of c and '9' are compared. I don't know why you would check whether the ASCII value of c is smaller than the ASCII value of '9', though.
If you want to get the ASCII value of any char, you can do this:
char character = 'c';
int ascii = character;
System.out.println(ascii);
str.charAt(0) == '-' evaluates to a boolean, in this case false.
if (c < '9') compares the ASCII value of '3' with the ASCII value of '9' and again yields a boolean.
str.charAt(0) == '-'
This expression is true if the character at index 0 is '-' and false otherwise.
if (c < '9')
This compares the ASCII value of c with the ASCII value of '9', in this case 51 and 57 respectively.
char is a primitive type in Java, which means a char is not a complex object. As a consequence, every time you compare two chars, you are directly comparing their values.
Java characters are defined according to the original Unicode specification, which gave each character a 16-bit value. These are the values that Java compares when you write something like c > '3' or str.charAt(0) == '-'.
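The comparisons discussed in these answers can be tried out with a short sketch like the one below. The class and method names (CharCompareDemo, startsWithMinus, isBelowNine) are invented for the example; the isEmpty() guard is an addition so that an empty string does not throw.

```java
final class CharCompareDemo {
    // True when the first character of s is the minus sign.
    static boolean startsWithMinus(String s) {
        return !s.isEmpty() && s.charAt(0) == '-';
    }

    // Compares the numeric code of c with the numeric code of '9'.
    static boolean isBelowNine(char c) {
        return c < '9';
    }

    public static void main(String[] args) {
        System.out.println((int) '-');               // numeric code of the minus sign
        System.out.println((int) '3');               // numeric code of the digit 3
        System.out.println((int) '9');               // numeric code of the digit 9
        System.out.println(startsWithMinus("asdf")); // first character is 'a'
        System.out.println(isBelowNine('3'));        // digit with a smaller code than '9'
        System.out.println(isBelowNine('a'));        // letters have larger codes than digits
    }
}
```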

Test for English only A-Z upper case of a character

I need to test a character for uppercase only A-Z. Not any other special unicode or other languages.
I was reading the documentation for Character.isUpperCase. It seems like it would pass for a Unicode character that is considered uppercase but is not technically between A-Z. It also seems like it would pass uppercase characters from languages other than English.
Do I just need to use regular expressions, or am I reading Character.isUpperCase incorrectly?
Thanks
From the documentation you linked:
Many other Unicode characters are uppercase too.
So yes, using isUpperCase will match things other than A-Z.
One way to do the test though is like this.
boolean isUpperCaseEnglish(char c){
return c >= 'A' && c <= 'Z';
}
isUpperCase indeed does not promise the character is between 'A' and 'Z'. You could use a regex:
String s = ...;
Pattern p = Pattern.compile("[A-Z]*");
Matcher m = p.matcher(s);
boolean matches = m.matches();
Character.isUpperCase() does accept characters from other languages. For instance, Ω would be considered uppercase.
But you can do a check to make sure it is between A and Z:
public static boolean isUpperCaseInEnglish(char c) {
return (c >= 'A' && c <= 'Z');
}
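To see the difference between the two checks side by side, here is a small sketch. UpperCaseEnglishDemo is a hypothetical wrapper class around the range check from the answers. It also shows a detail of the regex variant: the pattern [A-Z]* accepts the empty string, so [A-Z]+ (or [A-Z] for a single character) may be closer to what is wanted.

```java
final class UpperCaseEnglishDemo {
    // Range check limited to the 26 ASCII capital letters.
    static boolean isUpperCaseEnglish(char c) {
        return c >= 'A' && c <= 'Z';
    }

    public static void main(String[] args) {
        char omega = '\u03A9'; // Greek capital letter omega
        System.out.println(Character.isUpperCase(omega)); // Unicode-aware check
        System.out.println(isUpperCaseEnglish(omega));    // A-Z only check
        System.out.println(isUpperCaseEnglish('Q'));
        System.out.println("".matches("[A-Z]*"));         // star quantifier allows zero characters
        System.out.println("".matches("[A-Z]+"));         // plus quantifier needs at least one
    }
}
```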

Why is char ch = 4 (without '') not an error?

I want to know why char ch = 5; (for example)
is not an error. And why, if I print
System.out.println(Character.isDigit(ch));
// output
false
is the result false?
Because 5 is an integer literal that can be converted to a char. It is not the character '5' however.
A char is represented by two bytes in memory, and Java converts the integer 5 to the char with that code.
The character '5' has hexadecimal code 35, not 5, in the ASCII table, so the char with code 5 is not '5' and is thus not a "digit".
Try this example:
char ch = 97;
JOptionPane.showMessageDialog(null,"ch = "+ch);
The output would be: ch = a
It doesn't give an error even though 97 is written without quotes, because 97 is the ASCII code for the character 'a'. Likewise, char ch = 5 stores the character with code 5, which is not a digit, and that's why you are getting false as a result.
If you write ch = 5, the value is automatically converted to the char with that ASCII code.
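The difference between the number 5 and the character '5' can be made visible with a sketch like this one. CharLiteralDemo and describe are hypothetical names used only for illustration.

```java
final class CharLiteralDemo {
    // Reports the numeric code of c and whether Character.isDigit accepts it.
    static String describe(char c) {
        return "code=" + (int) c + " digit=" + Character.isDigit(c);
    }

    public static void main(String[] args) {
        char fromNumber = 5;  // constant int that fits in a char: accepted by the compiler
        char fromQuotes = '5'; // the digit character itself
        char letter = 97;      // the code of the letter a

        System.out.println(describe(fromNumber));
        System.out.println(describe(fromQuotes));
        System.out.println(letter);
    }
}
```

The assignment without quotes compiles only because the value is a compile-time constant that fits in a char; assigning an int variable to a char would need an explicit cast.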

Java Character literals value with getNumericValue()

Why do I get the same results for both upper- and lowercase literals? For instance:
char ch1 = 'A';
char ch2 = 'a';
char ch3 = 'Z';
char ch4 = 'z';
print("ch1 -- > " + Integer.toBinaryString(Character.getNumericValue(ch1)));
print("ch2 -- > " + Integer.toBinaryString(Character.getNumericValue(ch2)));
print("ch3 -- > " + Integer.toBinaryString(Character.getNumericValue(ch3)));
print("ch4 -- > " + Integer.toBinaryString(Character.getNumericValue(ch4)));
As results I get:
ch1 -- > 1010
ch2 -- > 1010
ch3 -- > 100011
ch4 -- > 100011
And I don't really see the difference between 'A' and 'a'. Even if I write the character literals as Unicode escapes (\u0041 for 'A' and \u0061 for 'a') I get the same results.
It's behaving exactly as documented:
The letters A-Z in their uppercase ('\u0041' through '\u005A'), lowercase ('\u0061' through '\u007A'), and full width variant ('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have numeric values from 10 through 35.
Basically this means that when parsing hex (say), 0xfa == 0xFA, as you'd expect.
I'd only expect case to matter when using something like base64.
Judging from the commentary, you're actually looking for the codepoints of the characters, rather than their numeric value, so I'll just isolate that into an answer. The getNumericValue() function returns what the character means as a number when interpreting its glyph, it does not return the codepoint of a character. For instance, getNumericValue('5') returns 5 as an int, not the codepoint of 5.
To use the codepoints, just use your variables or the char literals as they are. char is a numeric datatype. For instance, System.out.println((int)'a'); will print 97, quite simply.
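A short sketch contrasting the two notions may help. NumericValueDemo and its two helper methods are invented for this example; Character.getNumericValue and Integer.toBinaryString are the JDK calls used in the question.

```java
final class NumericValueDemo {
    // What the character means as a number (letters count as 10 through 35).
    static int numericValue(char c) {
        return Character.getNumericValue(c);
    }

    // The character's own code: a plain widening conversion from char to int.
    static int codePoint(char c) {
        return c;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'a', 'Z', 'z', '5'}) {
            System.out.println(c
                    + " numericValue=" + numericValue(c)
                    + " codePoint=" + codePoint(c)
                    + " binary=" + Integer.toBinaryString(codePoint(c)));
        }
    }
}
```

With the codepoint in hand, 'A' and 'a' no longer collapse to the same binary string.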
