I have a quick question relating to ASCII and encoding. I am looking to take the input from a user - for example: "cat" - and turn it into a code. The code is as follows:
All lower case letters are converted to capital letters.
The first letter in the encoded message is stored as its ASCII code value.
All subsequent letters are represented as the offset between the current letter and the previous letter.
For example: "cat" = 67 -2 19 since "C" = 67, "A" is -2 letters away from "C", and "T" is 19 letters away from "A". Any help would be greatly appreciated!
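import java.util.Scanner;

public class Encode {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        String s = input.next().toUpperCase();
        int ascii = s.charAt(0); // first letter: its ASCII code
        System.out.println(ascii);
        for (int i = 1; i < s.length(); i++) {
            ascii = s.charAt(i - 1); // code of the previous letter
            System.out.println(s.charAt(i) - ascii); // offset from previous letter
        }
    }
}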
To convert to capital letters, use the toUpperCase() method, which returns the string with all letters capitalized.
To get the ASCII code of a letter, simply assign the character to an int variable (in this case the first letter of the string, so int ascii = s.charAt(0);).
To get the offsets, use a for loop starting at index 1 (the second letter): in each iteration, take the ASCII code of the previous character and subtract it from the current one.
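The steps above can be put together into a self-contained sketch. The class and method names (OffsetCode, encode, decode) are hypothetical, and the decoder is an extra assumption: the question only asks for encoding, but reversing the offsets with a running total is a quick way to check the result.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetCode {
    // Encode: first letter as its ASCII code, then the difference
    // between each letter and the one before it.
    static List<Integer> encode(String word) {
        String s = word.toUpperCase();
        List<Integer> code = new ArrayList<>();
        if (s.isEmpty()) {
            return code;
        }
        code.add((int) s.charAt(0));
        for (int i = 1; i < s.length(); i++) {
            code.add(s.charAt(i) - s.charAt(i - 1));
        }
        return code;
    }

    // Decode (hypothetical extra): keep a running total of the offsets
    // and turn each total back into a character.
    static String decode(List<Integer> code) {
        StringBuilder sb = new StringBuilder();
        int current = 0;
        for (int i = 0; i < code.size(); i++) {
            current = (i == 0) ? code.get(0) : current + code.get(i);
            sb.append((char) current);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> code = encode("cat");
        System.out.println(code);         // [67, -2, 19]
        System.out.println(decode(code)); // CAT
    }
}
```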
I have a capital letter defined in a variable string, and I want to output the next and previous letters in the alphabet. For example, if the variable was equal to 'C', I would want to output 'B' and 'D'.
One way:
String value = "C";
int charValue = value.charAt(0);
String next = String.valueOf((char) (charValue + 1));
String previous = String.valueOf((char) (charValue - 1));
System.out.println(next);     // D
System.out.println(previous); // B
If you mean the alphabet ('ABC'), it is split into two sequences, a-z and A-Z. The simplest way, I think, is to use a char variable and increment it by one.
char letter='c';
letter++; // (letter=='d')
The same goes for decrementing:
char letter='c';
letter--; // (letter=='b')
The thing is that the letters a-z are represented by 97-122 and A-Z by 65-90, so if the case of the letter matters, you need to pay attention to it (for example, incrementing 'z' gives '{', not 'a').
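Since each case occupies its own range, plain ++ runs off the end of it. If you want the letter after 'Z' to be 'A' (and the one before 'a' to be 'z'), one option is to compute the position within the alphabet and wrap it with Math.floorMod. This is a sketch: the wrap-around behavior and the name shift are assumptions, not something the question asked for, and non-letters are returned unchanged.

```java
public class LetterShift {
    // Shifts a Latin letter by `offset` positions, wrapping within its own case.
    // Non-letters are returned unchanged (an assumption for this sketch).
    static char shift(char letter, int offset) {
        char base;
        if (letter >= 'a' && letter <= 'z') {
            base = 'a';
        } else if (letter >= 'A' && letter <= 'Z') {
            base = 'A';
        } else {
            return letter;
        }
        // floorMod keeps the position in 0..25 even for negative offsets
        int index = Math.floorMod(letter - base + offset, 26);
        return (char) (base + index);
    }

    public static void main(String[] args) {
        System.out.println(shift('C', 1));  // D
        System.out.println(shift('C', -1)); // B
        System.out.println(shift('Z', 1));  // A
        System.out.println(shift('a', -1)); // z
    }
}
```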
If you are limited to the Latin alphabet, you can use the fact that the letters in the ASCII table are ordered alphabetically, so:
System.out.println((char) ('C' + 1));
System.out.println((char) ('C' - 1));
outputs D and B.
What you do is add a char and an int, which effectively adds the int to the char's numeric code (its ASCII/Unicode value). When you cast back to char, that code is converted back to a character.
All the answers are correct, but none seems to give a full explanation, so I'll try. Just like any other type, a char is stored as a number (16 bits in Java). Unlike other non-numeric types, the mapping from the stored numbers to the characters they represent is well known: Java uses Unicode (UTF-16), whose first 128 values are the ASCII table. The Java compiler treats chars as 16-bit numbers, so you can do the following:
System.out.print((int)'A'); // prints 65
System.out.print((char)65); // prints A
For this reason, ++, -- and other arithmetic operations apply to chars and provide a way to increment/decrement their values.
Note that the cast wraps around once the value exceeds 16 bits:
System.out.print((char)65601); // also prints A
System.out.print((char)-65471); // also prints A
P.S. This also applies to Kotlin:
println('A'.code) // prints 65 (before Kotlin 1.5: 'A'.toInt())
println(65.toChar()) // prints A
println(65601.toChar()) // prints A
println((-65471).toChar()) // prints A
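The wrap-around is the same as keeping only the low 16 bits of the int, i.e. reducing it modulo 65,536 (2^16). A small sketch comparing the plain cast with two explicit equivalents; the class and helper names are hypothetical:

```java
public class CharWrap {
    // A narrowing cast from int to char keeps only the low 16 bits...
    static char viaMask(int n) {
        return (char) (n & 0xFFFF);
    }

    // ...which is the same as reducing the value modulo 65,536.
    static char viaFloorMod(int n) {
        return (char) Math.floorMod(n, 65536);
    }

    public static void main(String[] args) {
        for (int n : new int[] {65, 65601, -65471}) {
            // each line prints the same character three times: A A A
            System.out.println((char) n + " " + viaMask(n) + " " + viaFloorMod(n));
        }
    }
}
```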
Just like this:
System.out.printf("%c\n",letter);
letter++;
Is there a java library to convert special characters into decimal equivalent?
example:
input: "©™®"
output: "& #169; & #8482; & #174;" (the space after & is only there for this question; typed without it, the page renders the entity as the special character)
Thank you!
This can be achieved simply with String.format(). Each representation is just the character's value in decimal, zero-padded to 4 digits and wrapped between &# and ;.
The only tricky part is deciding which characters are "special". Here I've assumed anything that is not a digit, whitespace, or letter.
StringBuilder output = new StringBuilder();
String input = "Foo bar ©™® baz";
for (char each : input.toCharArray()) {
if (Character.isAlphabetic(each) || Character.isDigit(each) || Character.isWhitespace(each)) {
output.append(each);
} else {
output.append(String.format("&#%04d;", (int) each));
}
}
System.out.println(output.toString());
You just need to fetch the integer value of the character as mentioned in How do I get the decimal value of a unicode character in Java?.
As per the Oracle Java documentation:
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
Assuming your characters fall within that range (U+0000 to U+FFFF), you can get the decimal equivalent of each character in your string directly.
String text = "©™®";
char[] cArr = text.toCharArray();
for (char c : cArr)
{
int value = c; // get the decimal equivalent of the character
String result = "& #" + value + ";"; // space after & kept only for display, as in the question
System.out.println(result);
}
Output:
& #169;
& #8482;
& #174;
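Both loops above work on char values, so a character outside the 16-bit range (for example the emoji U+1F600) arrives as two surrogate chars and would be encoded as two separate, meaningless entities. A sketch that iterates over code points instead (String.codePoints(), Java 8+), reusing the first answer's notion of "special" and its %04d padding; the class and method names are hypothetical:

```java
public class EntityEncoder {
    // Replaces every code point that is not a letter, digit or whitespace
    // with a decimal numeric character reference such as &#0169;.
    static String toEntities(String input) {
        StringBuilder output = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (Character.isAlphabetic(cp) || Character.isDigit(cp) || Character.isWhitespace(cp)) {
                output.appendCodePoint(cp); // keep ordinary characters as-is
            } else {
                output.append(String.format("&#%04d;", cp)); // one entity per code point
            }
        });
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEntities("Foo bar \u00A9\u2122\u00AE baz"));
        System.out.println(toEntities("\uD83D\uDE00")); // U+1F600: one entity, not two
    }
}
```

The first call prints Foo bar &#0169;&#8482;&#0174; baz, and the second prints a single &#128512; instead of one entity per surrogate half.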
I want to know how to recognize and print the next character in the ASCII sequence when the input is not a letter, for example a space or "!".
I know that for a letter we can get its ASCII value by using
char character = 'a';
int ascii = (int) character;
Then, by adding 1 to it and converting it back to char, we can get the next value in the sequence.
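You can use:
char character = 'a';
char next = (char) (character + 1); // 'b'
This works for any character, not just letters. The cast back to char matters: if the result is stored in an int, println prints the number instead of the character.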
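Since ' ' is 32, '!' is 33 and '"' is 34, the same +1 gives the next character for punctuation and spaces too. A sketch that stays within the printable ASCII range (' ' to '~', i.e. 32 to 126) and wraps from '~' back to ' '; the range check, the wrap-around, and the name nextPrintable are assumptions added for illustration:

```java
public class NextAscii {
    // Returns the next character in the printable ASCII range ' ' (32) to '~' (126),
    // wrapping from '~' back to ' '. Anything outside that range is rejected.
    static char nextPrintable(char c) {
        if (c < ' ' || c > '~') {
            throw new IllegalArgumentException("not printable ASCII: " + (int) c);
        }
        return c == '~' ? ' ' : (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println("[" + nextPrintable(' ') + "]"); // [!]
        System.out.println("[" + nextPrintable('!') + "]"); // ["]
        System.out.println("[" + nextPrintable('~') + "]"); // [ ]
    }
}
```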
In the Currency.java file there is this line:
private static final int A_TO_Z = ('Z' - 'A') + 1;
What does this mean? I haven't seen this before. What is A_TO_Z's value, and why does it use 'Z' instead of a number?
With this expression you are treating chars as ints, using character's Unicode value instead of the character itself.
'Z' - 'A' + 1
Will become
90 - 65 + 1 (=26)
'Z' is a char with an integral value of 90.
'A' is a char with an integral value of 65.
90 - 65 + 1 = 26
Nasty. 'A' is the char literal for ASCII value of A (65 in decimal). 'Z' is 90. So A_TO_Z is 26, the number of letters in the English alphabet.
Characters have numeric values according to their position in the character table. That expression exploits the fact that the letters from A to Z have consecutive values in the underlying encoding table, so subtracting the first value from the last (+ 1) gives the length of the English alphabet. The actual numeric values are unimportant here, and the code is more or less self-explanatory to the reader. If an encoding spread the letters out differently, the expression would become incorrect (in Java this can't happen, since char values are UTF-16 code units, where A to Z are always consecutive).
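A constant like this is handy for sizing or bound-checking arrays indexed by letter, where the index is computed the same way (c - 'A'). A sketch of a letter-frequency counter; only the A_TO_Z expression comes from Currency.java, while the class and method names are hypothetical:

```java
public class LetterCount {
    private static final int A_TO_Z = ('Z' - 'A') + 1; // 26

    // Counts occurrences of each letter A-Z, ignoring case and skipping non-letters.
    static int[] count(String text) {
        int[] counts = new int[A_TO_Z];
        for (char c : text.toUpperCase().toCharArray()) {
            int index = c - 'A'; // same arithmetic as A_TO_Z itself
            if (index >= 0 && index < A_TO_Z) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("Hello, World");
        System.out.println("A_TO_Z = " + A_TO_Z);                        // 26
        System.out.println("L appears " + counts['L' - 'A'] + " times"); // 3
    }
}
```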
Does anyone have an idea about what could be going on here?
The first block shows what I would normally expect to see: the first character of a string is at index '0'. Here the 'problem' string is commented out and replaced by an identical-looking string that has never been run before.
public void finderTest(){
String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
//String wordOne = "abc"; // old, pre-used string, used to hold a comma.
String wordOne = "abc";// new, never run before with a comma
String wordTwo = "and";
System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
System.out.println();
System.out.println("All of wordOne: "+"'"+wordOne+"'");
System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
System.out.println();
System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}
Which gives output:
/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H
All of wordOne: 'abc'
Type of character at index '0' in wordOne: 2 // okay
Character at index '0' in wordOne: a // okay
Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/
The second block has the 'new' string commented out, and the first character of 'wordOne' appears to be nothing. It isn't a null character or a newline. I had been using that variable to find commas in 'theDoc', but when I ran it, index '0' held nothing and index '1' held the comma. If I copy and paste the string, the problem remains; commenting it out or deleting it gets rid of the issue.
public void finderTest(){
String theDoc = "Hello, I want this to work, and work well! Do you think it will work, and if not, why not?";
String wordOne = "abc"; // now running old string, used to hold comma
//String wordOne = "abc";
String wordTwo = "and";
System.out.println("Type of character at index '0' in theDoc: "+Character.getType(theDoc.charAt(0)));
System.out.println("Character at index '0' in theDoc: "+theDoc.charAt(0));
System.out.println();
System.out.println("All of wordOne: "+"'"+wordOne+"'");
System.out.println("Type of character at index '0' in wordOne: "+Character.getType(wordOne.charAt(0)));
System.out.println("Character at index '0' in wordOne: "+wordOne.charAt(0));
System.out.println();
System.out.println("Type of Character at index '0' in wordTwo: "+Character.getType(wordTwo.charAt(0)));
System.out.println("Character at index '0' in wordTwo: "+wordTwo.charAt(0));
}
Which gives output:
/*
Type of character at index '0' in theDoc: 1
Character at index '0' in theDoc: H
All of wordOne: 'abc'
Type of character at index '0' in wordOne: 16 // What does this mean?
Character at index '0' in wordOne: // where is the a? (well, it's in wordOne index '1'... but why??)
Type of Character at index '0' in wordTwo: 2
Character at index '0' in wordTwo: a
*/
Is there something about commas or symbols in Java that would cause an issue like this? I tried using character arrays and cleaning the workspace to rebuild everything, and nothing has changed. This is a big problem for finding the indices of n-grams within sentences, since some grams are things like ", and". At one point last night it was working, and then all of a sudden it stopped. I'm quite confused.
Any ideas?
I tried pasting your example into Eclipse and it told me this:
Some characters cannot be mapped using "Cp1252" character encoding.
and pointed me to the first character in the string:
String wordOne = "abc";
It appears there is a hidden (non-printable) character between the " and the a.
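Character type 16 is Character.FORMAT (Unicode general category Cf), the category of invisible formatting characters such as U+200B ZERO WIDTH SPACE, U+202B RIGHT-TO-LEFT EMBEDDING and U+FEFF ZERO WIDTH NO-BREAK SPACE. These are unprintable; you can print the character's hex value to confirm which one it is.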
Your string contains a character you're having trouble seeing (before the 'a'). There are dozens of characters in the Unicode set which have no meaningful visual representation - this is probably one of them.
The '16' is the character type, one of these constants:
COMBINING_SPACING_MARK, CONNECTOR_PUNCTUATION, CONTROL, CURRENCY_SYMBOL, DASH_PUNCTUATION, DECIMAL_DIGIT_NUMBER, ENCLOSING_MARK, END_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, FORMAT, INITIAL_QUOTE_PUNCTUATION, LETTER_NUMBER, LINE_SEPARATOR, LOWERCASE_LETTER, MATH_SYMBOL, MODIFIER_LETTER, MODIFIER_SYMBOL, NON_SPACING_MARK, OTHER_LETTER, OTHER_NUMBER, OTHER_PUNCTUATION, OTHER_SYMBOL, PARAGRAPH_SEPARATOR, PRIVATE_USE, SPACE_SEPARATOR, START_PUNCTUATION, SURROGATE, TITLECASE_LETTER, UNASSIGNED, UPPERCASE_LETTER
All of these are defined in the Character class, and 16 is Character.FORMAT. Rather than relying on the raw numbers, compare against those constants, or better yet, use Character.getName (Java 7+) to get the human-readable name of the character.
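To find out what the hidden character actually is, and to get rid of it, a sketch like the following could help. The string "\u202Babc" is an assumption standing in for the question's wordOne, since the real invisible character did not survive copying; the class and helper names are hypothetical. Character.getName(int) requires Java 7+, and String.codePoints() requires Java 8+.

```java
public class HiddenChars {
    // Prints each code point with its hex value, general category (getType)
    // and Unicode name, which makes invisible characters easy to spot.
    static void describe(String s) {
        s.codePoints().forEach(cp -> System.out.printf(
                "U+%04X type=%d name=%s%n", cp, Character.getType(cp), Character.getName(cp)));
    }

    // Removes every format character (Unicode category Cf, i.e. Character.FORMAT).
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String wordOne = "\u202Babc"; // assumed: an invisible RLE before "abc"
        describe(wordOne);
        System.out.println(stripFormatChars(wordOne).charAt(0)); // a
    }
}
```

Stripping every Cf character is a blunt tool: it also removes characters such as U+200D ZERO WIDTH JOINER, which some scripts and emoji sequences rely on, so in real text you may prefer to delete only the specific code point you found.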