Counting every single letter in an array with ASCII table (Unicode code) - java

I am new at Java and I could not understand this structure:
public static int[] upperCounter(String str) {
final int NUMCHARS = 26;
int[] upperCounts = new int[NUMCHARS];
char c;
for (int i = 0; i < str.length(); i++) {
c = str.charAt(i);
if (c >= 'A' && c <= 'Z')
upperCounts[c-'A']++;
}
return upperCounts;
}
This method works, but what does list[c-'A']++; mean?

c - 'A' is taking a character in the range ['A' .. 'Z'], and subtracting 'A' to create a numerical value in the range [0 .. 25] so it can be used as an array index.
upperCounts[c - 'A']++ increments the occurrence count for the character c using its corresponding index c - 'A'.
Effectively, the loop is generating an array of character type counts.
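As a small illustration of the explanation above, the sketch below prints the index computed for a few characters and then the non-zero counts for a sample string. The class name UpperCountDemo and the sample input are made up for this example; upperCounter follows the method from the question.

```java
public class UpperCountDemo {
    // Same logic as the method in the question.
    public static int[] upperCounter(String str) {
        final int NUMCHARS = 26;
        int[] upperCounts = new int[NUMCHARS];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                upperCounts[c - 'A']++; // 'A' -> index 0, 'B' -> 1, ..., 'Z' -> 25
            }
        }
        return upperCounts;
    }

    public static void main(String[] args) {
        // Arithmetic on chars works on their numeric codes and yields an int.
        System.out.println('A' - 'A'); // 0
        System.out.println('C' - 'A'); // 2
        System.out.println('Z' - 'A'); // 25

        int[] counts = upperCounter("ABBA Cab"); // hypothetical sample input
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // (char) ('A' + i) maps the index back to its letter
                System.out.println((char) ('A' + i) + ": " + counts[i]);
            }
        }
    }
}
```

For this input the lower-case a and b are skipped by the range check, so only A, B and C get non-zero counts.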

This is really advanced syntax, let me try to break it down:
c - 'a'
c is the character read from the string in the loop, while 'a' is a character literal whose integer value is given by the ASCII table (the question's code uses 'A', but the idea is the same). Subtracting one from the other produces an integer result.
list[c - 'a']
This integer value is then used to index into the int[] array, selecting the nth element of list, which is itself an integer.
list[c - 'a']++;
The ++ operator adds one to that value.

It means that you increment (++) the value of the element at index c - 'A' in the list array.
c is a variable holding the current character of the string.
'A' refers to the Unicode code point of the letter A (65 decimal). The letter B is 66 decimal, and so on.

The value in a char variable can be represented as an integer. The letter A is 65 and a is 97 (see the ASCII table for more letters).
list[c-'A']++;
This code means: take the value of c (which is between 65 and 90 because of the check if (c >= 'A' && c <= 'Z')) and subtract the value of 'A' (i.e. 65). The result is an index into the array list, and the element at that index is increased by one.
Example: c is C:
C = 67
A = 65
C - A = 2
Therefore index 2 will be changed. Index 2 is the third element, like C is the third letter in the alphabet.

Related

How do I check whether a char is == to a number and why doesn't this work? [duplicate]

char c = '0';
int i = 0;
System.out.println(c == i);
Why does this always return false?
Although this question is very unclear, I am pretty sure the poster wants to know why this prints false:
char c = '0';
int i = 0;
System.out.println(c == i);
The answer is because every printable character is assigned a unique code number, and that's the value that a char has when treated as an int. The code number for the character 0 is decimal 48, and obviously 48 is not equal to 0.
Why aren't the character codes for the digits equal to the digits themselves? Mostly because the first few codes, especially 0, are too special to be used for such a mundane purpose.
The char c = '0' has the ASCII code 48. It is this number, not the digit 0, that is compared to s. If you want to compare c with s, you can do either:
if(c == s) // compare ascii code of c with s
This will be true if c = '0' and s = 48.
or
if(c == s + '0') // compare the digit represented by c
// with the digit represented by s
This will be true if c = '0' and s = 0.
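The two comparisons can be tried side by side. The following is only a sketch: the class name CharDigitCompare and the helper sameDigit are made up for the example, and it assumes c holds one of the characters '0' to '9'.

```java
public class CharDigitCompare {
    // Hypothetical helper: true if the digit character c shows the number i (0 to 9).
    static boolean sameDigit(char c, int i) {
        return c - '0' == i; // subtraction binds tighter than ==
    }

    public static void main(String[] args) {
        char c = '0';
        int i = 0;

        System.out.println((int) c);         // 48, the code of the character '0'
        System.out.println(c == i);          // false: compares 48 with 0
        System.out.println(c == i + '0');    // true: compares 48 with 0 + 48
        System.out.println(sameDigit(c, i)); // true: compares 48 - 48 with 0

        // Character.digit returns the numeric value of a digit character in the given radix.
        System.out.println(Character.digit(c, 10) == i); // true
    }
}
```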
A char and an int can be compared directly, but the comparison uses the character's code rather than the digit it displays. To compare by digit value, one option is to convert the char to a String and then parse that String into an integer:
char c = '0';
int i = 0;
String s = String.valueOf(c);
System.out.println(Integer.parseInt(s) == i);
This prints true.
Hope this helps.
Thanks
You're saying that s is an Integer and c (from what I see) is a char, so there you go, that's the problem: an Integer vs. char comparison.

Converting char "a" to number 0 using java function

So I'm learning about functions and methods, and trying to create a function that would allow me to replace a letter with a number, so that "a" would be 0, "b" would be 1, and so on. I don't know ASCII at all, and have only got as far as writing a very long if/else statement, but I don't even know if I'm on the right track. I'm trying to find a way to write the function without a long conditional statement and with fewer lines of code.
This is the new code I have written with suggestions:
public class CaesarCipher {
/*
* create function that converts a letter to a number
* ex. a -> 0, b -> 1, etc...
*/
static char letterToNumber (char firstLetter){
if (firstLetter < 'a' || firstLetter > 'z') {
}
return firstLetter;
}
static int numberToLetter (int firstNumer){
if (firstNumber < '0' || firstNumber '25'){
}
return firstNumber;
}
public static void main(String[] args) {
char a = 0;
// TODO Auto-generated method stub
System.out.println (letterToNumber (a)); //suppose to compile to convert a -> the number 0
System.out.println(numberToLetter (1)); //compile to convert 1 -> the letter b
}
}
The simplest approach is just to subtract the literal 'a'... which will implicitly convert both your input letter and the 'a' to int:
public int convert(char letter) {
if (letter < 'a' || letter > 'z') {
throw new IllegalArgumentException("Only lower-case ASCII letters are valid");
}
return letter - 'a';
}
The nice thing about this solution is that it's reasonably "obviously correct" (with the assumption that the letters 'a' to 'z' are consecutive in UTF-16). You don't need to include any magic integer values.
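For completeness, here is a sketch of both directions the question asks about. The method names letterToNumber and numberToLetter are borrowed from the question; the class name LetterNumberDemo and the range check in numberToLetter are assumptions made for the example.

```java
public class LetterNumberDemo {
    // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
    static int letterToNumber(char letter) {
        if (letter < 'a' || letter > 'z') {
            throw new IllegalArgumentException("Only lower-case ASCII letters are valid");
        }
        return letter - 'a';
    }

    // 0 -> 'a', 1 -> 'b', ..., 25 -> 'z'
    static char numberToLetter(int number) {
        if (number < 0 || number > 25) {
            throw new IllegalArgumentException("Only numbers from 0 to 25 are valid");
        }
        // 'a' + number is an int, so it has to be cast back to char
        return (char) ('a' + number);
    }

    public static void main(String[] args) {
        System.out.println(letterToNumber('a')); // 0
        System.out.println(numberToLetter(1));   // b
    }
}
```

Note that the two methods return different types: the letter-to-number direction returns an int, while the reverse direction needs the cast to char so that the letter rather than its code is printed.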
char letter = 'a';
int letterAscii = (int) letter;
int asciiOffsetOfA = 97;
int positionInAlphabet=letterAscii-asciiOffsetOfA;
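Use this in combination with String.toCharArray() and String.toLowerCase() on your input String.
The ASCII value of 0 is 48, a is 97 and A is 65. So to convert a lower-case letter to the corresponding digit character you subtract 49 (97 - 48), and for a capital letter you subtract 17 (65 - 48). The same goes for B/b and 1, C/c and 2, etc., up to J/j and 9.
int smallChar = 'a' - 49; // equals 48, the code of the character '0'
int capitalChar = 'A' - 17; // equals 48, the code of the character '0'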

Why am i getting totally different output when using charAt in a conditional operator

I was expecting the output 1 for this code, but instead I am getting the output 49.
The code is
public static void main(String[] args) {
String str = "1+21";
int pos = -1;
int c;
c = (++pos < str.length()) ? str.charAt(pos) : -1;
System.out.println(c);
}
The result of someCondition ? a : b has the common type of a and b. In this case, the common type of str.charAt(pos) (a char) and -1 (an int) is int. That means your str.charAt(pos) value is being converted to an int, which yields its Unicode code point; in this case that is the same as its ASCII value.
49 is the code point for the character '1'.
If you're trying to get c to be the digit '1', the easiest thing to do is to subtract the code point for '0':
c = (++pos < str.length()) ? (str.charAt(pos) - '0') : -1;
This works because the digits are sequential in Unicode, starting with '0'. By subtracting the value of the char '0' (that is, the int 48) from a digit character, you get the value you want:
'0' - '0' = 48 - 48 = 0
'1' - '0' = 49 - 48 = 1
...
'9' - '0' = 57 - 48 = 9
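The difference between the character code and the digit value can be seen in a short sketch. The class name ConditionalTypeDemo and the helper digitAt are made up for the example, and digitAt assumes the character at pos is a digit (it does not check, so '+' would give a negative number).

```java
public class ConditionalTypeDemo {
    // Hypothetical helper: digit value of the char at pos, or -1 if pos is out of range.
    static int digitAt(String str, int pos) {
        return (pos >= 0 && pos < str.length()) ? (str.charAt(pos) - '0') : -1;
    }

    public static void main(String[] args) {
        String str = "1+21";

        // char and int operands: the whole conditional expression has type int
        int code = (0 < str.length()) ? str.charAt(0) : -1;

        System.out.println(code);            // 49, the code of '1'
        System.out.println((char) code);     // 1, the character itself
        System.out.println(digitAt(str, 0)); // 1, the digit value
        System.out.println(digitAt(str, 2)); // 2
        System.out.println(digitAt(str, 4)); // -1, past the end of the string
    }
}
```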
The charAt method returns the char value at the position that you pass. Here you are assigning that to an int variable, which means you get the integer representation of that char value.
In your case:
int c = "1+21".charAt(0); // the actual char is '1', and its ASCII code is 49

The Character Class in Java

Here is a short program that counts the letters of any given word entered by the user.
I'm trying to figure out what the following lines actually do in this program:
counts[s.charAt(i) - 'a']++; // I don't understand what the - 'a' is doing
System.out.println((char)('a' + i) // I don't get what the 'a' + i actually does.
import java.util.Scanner;
public class Listing9_3 {
public static void main(String[] args) {
//Create a scanner
Scanner input = new Scanner (System.in);
System.out.println("Enter a word to find out the occurences of each letter: ");
String s = input.nextLine();
//Invoke the count Letters Method to count each letter
int[] counts = countLetters(s.toLowerCase());
//Display results
for(int i = 0; i< counts.length; i++){
if(counts[i] != 0)
System.out.println((char)('a' + i) + " appears " +
counts[i] + ((counts[i] == 1)? " time" : " times"));
// I don't understand what the 'a' + i is doing
}
}
public static int[] countLetters(String s) {
int[] counts = new int [26]; // 26 letters in the alphabet
for(int i = 0; i < s.length(); i++){
if(Character.isLetter(s.charAt(i)))
counts[s.charAt(i) - 'a']++;
// I don't understand what the - 'a' is doing
}
return counts;
}
}
Characters are a kind of integer in Java; the integer is a number associated with the character on the Unicode chart. Thus, 'a' is actually the integer 97; 'b' is 98, and so on in sequence up through 'z'. So s.charAt(i) returns a character; assuming that it is a lower-case letter in the English alphabet, subtracting 'a' from it gives the result 0 for 'a', 1 for 'b', 2 for 'c', and so on.
You can see the first 4096 characters of the Unicode chart at http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF (and there will be references to other pages of the chart as well). You'll see 'a' there as U+0061 (which is hex, = 97 decimal).
Because you want your array to contain only the count of each letter from 'a' to 'z'.
So to index each letter's count correctly within the array, you need a mapping letter -> index with 'a' -> 0, 'b' -> 1, up to 'z' -> 25.
Each character is represented by a 16-bit integer value (so from 0 to 65,535). You're only interested in the letters 'a' to 'z', which have the values 97 to 122.
How would you get the mapping?
This can be done using the trick s.charAt(i) - 'a'.
This ensures that the value returned by this operation is between 0 and 25, provided s.charAt(i) is a character between 'a' and 'z' (you're converting the user's input to lower case and using Character.isLetter, which is enough as long as the input contains only English letters).
Hence you get the desired mapping to count the occurrences of each letter in the word.
On the other hand, (char)('a' + i) does the reverse operation. i varies from 0 to 25 and you get the letters from 'a' to 'z' respectively. You just need to cast the result of the addition to char, otherwise you would see its Unicode value printed.
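One caveat is that Character.isLetter also returns true for letters outside 'a' to 'z' (for example U+00E9, e with an acute accent), and for such a character the index would fall outside the array. The sketch below swaps in an explicit range check instead; that is a variation on the program from the question, not the original code. The class name LetterCountDemo and the sample input are made up.

```java
public class LetterCountDemo {
    // Counts 'a'..'z' only; every other character, including accented letters, is skipped.
    public static int[] countLetters(String s) {
        int[] counts = new int[26]; // one slot per letter
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++; // letter -> index 0..25
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input containing one non-English letter (U+00E9).
        int[] counts = countLetters("Caf\u00e9 banana".toLowerCase());
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                // index 0..25 -> letter, the reverse of the mapping above
                System.out.println((char) ('a' + i) + " appears " + counts[i]
                        + ((counts[i] == 1) ? " time" : " times"));
            }
        }
    }
}
```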
counts[s.charAt(i) - 'a']++; // I don't understand what the - 'a' is doing
Assume charAt(i) is 'z'.
Then 'z' - 'a' is equal to 25 (subtracting the Unicode/ASCII values),
so counts[25]=counts[25]+1; // just keeps track of count of each character

Display the number of the characters in a string

I have a Java question: I am writing a program to read a string and display the number of characters in that string. I found some example code but I don't quite understand the last part - can anyone help?
int[] count = countLetters(line.toLowerCase());
for (int i=0; i<count.length; i++)
{
if ((i + 1) % 10 == 0)
System.out.println( (char) ('a' + i)+ " " + count[i]);
else
System.out.print( (char) ('a' + i)+ " " + count[i]+ " ");
}
public static int[] countLetters(String line)
{
int[] count = new int[26];
for (int i = 0; i<line.length(); i++)
{
if (Character.isLetter(line.charAt(i)))
count[(int)(line.charAt(i) - 'a')]++;
}
return count;
}
Your last loop is :
For every character we test if it's a letter, if yes, we increment the counter relative to that character. Which means, 'a' is 0, 'b' is 1 ... (in other words, 'a' is 'a'-'a' which is 0, 'b' is 'b'-'a' which is 1 ...).
This is a common way to count the number of occurrences of characters in a string.
The code you posted counts not the length of the string, but the number of occurrences of alphabet letters that occur in the lowercased string.
Character.isLetter(line.charAt(i))
retrieves the character at position i and returns true if it is a letter.
count[(int)(line.charAt(i) - 'a')]++;
increments the count at index character - 'a', which is in the range 0 to 25.
The result of the function is an array of 26 integers containing the counts per letter.
The for loop over the counts array ends the printed line after every 10th count and uses
(char) ('a' + i)
to print the letter that the count belongs to.
I guess you are counting the occurrences of letters, not characters ('5' is also a character).
The last part:
for (int i = 0; i<line.length(); i++)
{
if (Character.isLetter(line.charAt(i)))
count[(int)(line.charAt(i) - 'a')]++;
}
It iterates over the input line and checks for each character whether it is a letter. If it is, it increments the count for that letter. The count is kept in an array of 26 integers (for the 26 letters in the Latin alphabet). The count for the letter 'a' is kept at index 0, the letter 'b' at 1, and 'z' at 25. To get the index, the code subtracts the value of 'a' from the letter's value (each character is not only a character/glyph, but also a numeric value). So if the letter is 'a', it subtracts the value of 'a' from itself, which gives 0; for 'b' the result is 1, and so on.
In the method countLetters, the for loop goes through all characters in the line. The if checks to make sure it's a letter, otherwise it will be ignored.
line.charAt() yields the single character at position i. The type of this is char.
Now deep inside Java, a char is just a number corresponding to a character code. Lowercase 'a' has a character code of 97, 'b' is 98 and so on. (int) forces conversion from char to int. So we take the character code, let's say it's a 'b' so the code is 98, and we subtract the code for 'a', which is 97, so we get the offset 1 (from the beginning of the alphabet). For any letter in the alphabet, the offset will be between 0 and 25 (inclusive).
So we use that offset as an index into the array count and use ++ to increment it. Then later the loop in the top part of the program can print out the counts.
The loop at the top is using the reverse "trick" to convert those offsets from 0 to 25 back into letters from a to z.
The 'last part', the implementation of the loop, is really hard to understand. Close to obfuscation ;) Here's a refactoring of the count method, split into two methods: a general one for all chars and a special one for just the lower-case letters:
public static int[] countAllASCII(String line) {
int[] count = new int[256];
char[] chars = line.toCharArray();
for (char c : chars) {
int index = (int) c;
if (index < 256) {
count[index]++;
}
}
return count;
}
public static int[] countLetters(String line) {
int[] countAll = countAllASCII(line);
int[] result = new int[26];
System.arraycopy(countAll, (int) 'a', result, 0, 26);
return result;
}
General idea: the countAllASCII method just counts all chars with a code below 256. Yes, the array is bigger, but at these sizes nobody cares today. The advantage: I don't have to test whether each char is a letter. The second method just copies the area of interest into a new (resulting) array and returns it.
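To check that copying a slice gives the same counts as indexing directly, the two approaches can be compared in a small sketch. All names here (CountCompareDemo, countAllBelow256, countLettersBySlice, countLettersDirect) are made up for the example, and the direct version uses an explicit 'a' to 'z' range check rather than Character.isLetter.

```java
import java.util.Arrays;

public class CountCompareDemo {
    // Counts every char whose code is below 256, as in the refactoring above.
    static int[] countAllBelow256(String line) {
        int[] count = new int[256];
        for (char c : line.toCharArray()) {
            if (c < 256) {
                count[c]++; // a char used as an index is widened to int
            }
        }
        return count;
    }

    // Copies the 26 slots starting at index 'a' (97) into a new array.
    static int[] countLettersBySlice(String line) {
        int[] result = new int[26];
        System.arraycopy(countAllBelow256(line), 'a', result, 0, 26);
        return result;
    }

    // Direct indexing with c - 'a' and an explicit range check.
    static int[] countLettersDirect(String line) {
        int[] result = new int[26];
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c >= 'a' && c <= 'z') {
                result[c - 'a']++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "hello, world"; // hypothetical sample input
        System.out.println(Arrays.equals(countLettersBySlice(line),
                countLettersDirect(line))); // true
    }
}
```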
EDIT
I'd changed my code for a less unfriendly comment as well. Thanks anyway, Bombe.
