Which declarations are valid? - java

Select the three correct answers (valid declarations).
(a) char a = '\u0061';
(b) char 'a' = 'a';
(c) char \u0061 = 'a';
(d) ch\u0061r a = 'a';
(e) ch'a'r a = 'a';
Answer: (a), (c) and (d)
Book:
A Programmer's Guide to Java SCJP Certification (Third Edition)
Can someone please explain why options (c) and (d) are valid? The IDE (IntelliJ IDEA) shows them in red, saying:
Cannot resolve symbol 'u0063'

The compiler recognises Unicode escapes and translates them to UTF-16 code units before it does anything else with the source. ch\u0061r therefore becomes char, which is a valid primitive type. That makes option (d) correct.
3.3. Unicode Escapes
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) for the indicated hexadecimal value, and passing all other characters unchanged.
\u0061 is translated to a, which is a valid Java letter that can be used to form an identifier. That makes option (c) correct.
3.8. Identifiers
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
Identifier:
IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
JavaLetter {JavaLetterOrDigit}
JavaLetter:
any Unicode character that is a "Java letter"
JavaLetterOrDigit:
any Unicode character that is a "Java letter-or-digit"
A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.
The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII dollar sign ($, or \u0024) and underscore (_, or \u005f). The dollar sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The underscore may be used in identifiers formed of two or more characters, but it cannot be used as a one-character identifier due to being a keyword.
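The two Character methods named in the quoted specification can be called directly to see how a given character is classified. The following is only a minimal sketch; the class name IdentifierCharCheck, the method classify and the sample characters are made up for illustration.

```java
// Illustrative sketch: the class and method names are assumptions for this example.
class IdentifierCharCheck {

    // Describes how a single character may be used inside an identifier.
    static String classify(char c) {
        if (Character.isJavaIdentifierStart(c)) {
            return "Java letter (may start an identifier)";
        }
        if (Character.isJavaIdentifierPart(c)) {
            return "Java letter-or-digit only (may not start an identifier)";
        }
        return "not allowed in an identifier";
    }

    public static void main(String[] args) {
        char[] samples = {'a', '$', '_', '7', '/', '\''};
        for (char c : samples) {
            System.out.println("'" + c + "' -> " + classify(c));
        }
    }
}
```

The letter, the dollar sign and the underscore are reported as Java letters, the digit as a letter-or-digit only, and the slash and the single quote as not allowed, which is why options (b) and (e) cannot be parsed as declarations.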

\u0061 means a. You can use \u0061 instead of a, therefore:
char \u0061 = 'a';
is the same as
char a = 'a';
and
ch\u0061r a = 'a';
is the same as
char a = 'a';
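The three valid options can be tried out in one small class. This is a sketch under the assumption that it is compiled with javac rather than judged by IDE highlighting; the class name EscapeDemo and the method names are hypothetical.

```java
// Illustrative sketch: EscapeDemo and its method names are assumptions for this example.
class EscapeDemo {

    static char viaEscapedIdentifier() {
        char \u0061 = 'x';   // option (c): after translation this declares a variable named a
        return a;            // the plain spelling refers to the same variable
    }

    static char viaEscapedKeyword() {
        ch\u0061r b = 'y';   // option (d): after translation the type name reads char
        return b;
    }

    static char viaEscapedLiteral() {
        return '\u0061';     // option (a): the escape inside the quotes becomes the letter a
    }

    public static void main(String[] args) {
        System.out.println(viaEscapedIdentifier());
        System.out.println(viaEscapedKeyword());
        System.out.println(viaEscapedLiteral());
    }
}
```

Running main prints x, y and a on separate lines. The comments deliberately avoid spelling out a Unicode escape, because the translation step also applies inside comments.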

Related

Difference between \p{Alpha} and \p{L} in Java

As I understand it, \p{L} includes all letters in Unicode, while \p{Alpha} is similar but covers only Latin (ASCII) letters. At work I have both a Latin 'A' and a Cyrillic 'A', and \p{Alpha} in old Java code doesn't match Cyrillic symbols as letters. From my tests, \p{L} is the solution for me. Can you give me some advice on this situation and on what I should use in Java code? The page http://www.regular-expressions.info/posixbrackets.html uses \p{Alpha} for Java code.
Actually, \p{Alpha} is a POSIX character class that matches non-ASCII letters only when it is used together with the UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag), while \p{L} always matches Unicode letters. Note that you can also write \p{L} as \pL or \p{IsL}.
See more reference details:
Both \p{L} and \p{IsL} denote the category of Unicode letters.
POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
Have a look at the following demo:
String l = "Abc";
String c = "Абв";
System.out.println(l.matches("\\p{Alpha}+")); // => true
System.out.println(c.matches("\\p{Alpha}+")); // => false
System.out.println(c.matches("(?U)\\p{Alpha}+")); // => true
System.out.println(l.matches("\\p{L}+")); // => true
System.out.println(c.matches("\\p{L}+")); // => true
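The same comparison can be written with precompiled patterns, passing Pattern.UNICODE_CHARACTER_CLASS as a flag instead of embedding (?U) in the expression. This is only a sketch; the class name AlphaVsLetter and the constant names are made up, and the Cyrillic sample is written with escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Illustrative sketch: AlphaVsLetter and its constant names are assumptions for this example.
class AlphaVsLetter {

    // POSIX class in its default, US-ASCII-only mode.
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");

    // Same expression, with the flag that the embedded (?U) form switches on.
    static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);

    // Unicode general category "letter"; needs no flag.
    static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");

    public static void main(String[] args) {
        String cyrillic = "\u0410\u0431\u0432"; // the three Cyrillic letters from the demo
        System.out.println(ASCII_ALPHA.matcher(cyrillic).matches());
        System.out.println(UNICODE_ALPHA.matcher(cyrillic).matches());
        System.out.println(ANY_LETTER.matcher(cyrillic).matches());
    }
}
```

This prints false, true and true. If old code cannot be switched to the flag safely, replacing \p{Alpha} with \p{L} in the specific expressions that must accept Cyrillic is the smaller change.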

How to include backslash in String variable name (in java)

I want to include a backslash in a String variable name. How can I do that?
Ex:
String Cd_St_SSLC/PUC;
/ (forward slashes) are not allowed in names because / is a reserved character. A / causes a compile-time error unless it is used for division, as part of a comment (//, /** */, or /* */), inside a string literal ("//") or as a character literal ('/'). Operators cannot be part of a variable's name.
The Java™ Tutorials
Variables
Naming
Every programming language has its own set of rules and conventions for the kinds of names that you're allowed to use, and the Java programming language is no different. The rules and conventions for naming your variables can be summarized as follows:
Variable names are case-sensitive. A variable's name can be any legal identifier, an unlimited-length sequence of Unicode letters and digits, beginning with a letter, the dollar sign "$", or the underscore character "_". The convention, however, is to always begin your variable names with a letter, not "$" or "_". Additionally, the dollar sign character, by convention, is never used at all. You may find some situations where auto-generated names will contain the dollar sign, but your variable names should always avoid using it. A similar convention exists for the underscore character; while it's technically legal to begin your variable's name with "_", this practice is discouraged. White space is not permitted.
Subsequent characters may be letters, digits, dollar signs, or underscore characters. Conventions (and common sense) apply to this rule as well. When choosing a name for your variables, use full words instead of cryptic abbreviations. Doing so will make your code easier to read and understand. In many cases it will also make your code self-documenting; fields named cadence, speed, and gear, for example, are much more intuitive than abbreviated versions, such as s, c, and g. Also keep in mind that the name you choose must not be a keyword or reserved word.
If the name you choose consists of only one word, spell that word in all lowercase letters. If it consists of more than one word, capitalize the first letter of each subsequent word. The names gearRatio and currentGear are prime examples of this convention. If your variable stores a constant value, such as static final int NUM_GEARS = 6, the convention changes slightly, capitalizing every letter and separating subsequent words with the underscore character. By convention, the underscore character is never used elsewhere.
See also 1/2
The Java language specification for identifiers.
The Java® Language Specification: Java SE 7 Edition
Chapter 3. Lexical Structure
3.8. Identifiers
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
Identifier:
IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
JavaLetter
IdentifierChars JavaLetterOrDigit
JavaLetter:
any Unicode character that is a Java letter (see below)
JavaLetterOrDigit:
any Unicode character that is a Java letter-or-digit (see below)
3.12. Operators
37 tokens are the operators, formed from ASCII characters.
Operator: one of
= > < ! ~ ? :
== <= >= != && || ++ --
+ - * / & | ^ % << >> >>>
+= -= *= /= &= |= ^= %= <<= >>= >>>=
See also 2/2
The following method Character.isUnicodeIdentifierPart can determine "if the character may be part of a Unicode identifier".
Method: Java.lang.Character.isUnicodeIdentifierPart()
Description
The java.lang.Character.isUnicodeIdentifierPart(char ch) [method] determines if the specified character may be part of a Unicode identifier as other than the first character.
A character may be part of a Unicode identifier if and only if one of the following statements is true:
it is a letter
it is a connecting punctuation character (such as '_')
it is a digit
it is a numeric letter (such as a Roman numeral character)
it is a combining mark
it is a non-spacing mark
isIdentifierIgnorable returns true for this character.
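For Java identifiers specifically, the language specification quoted above relies on Character.isJavaIdentifierStart and Character.isJavaIdentifierPart rather than the Unicode variants. A minimal sketch of a whole-name check built on those two methods follows; the class name NameCheck and the method firstIllegalIndex are hypothetical, and keywords and the literals true, false and null are deliberately not checked.

```java
// Illustrative sketch: NameCheck and firstIllegalIndex are assumptions for this example.
class NameCheck {

    // Returns the index of the first character that cannot appear at its
    // position in a Java identifier, or -1 if every character is acceptable.
    // An empty name is reported as failing at index 0.
    static int firstIllegalIndex(String name) {
        if (name.isEmpty()) {
            return 0;
        }
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            boolean ok = (i == 0)
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            if (!ok) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one character
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("Cd_St_SSLC/PUC")); // position of the slash
        System.out.println(firstIllegalIndex("Cd_St_SSLC_PUC")); // no illegal character
    }
}
```

This prints 10 and -1: the slash at index 10 is the only character that rules out the first name.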
That's a forward slash, and not legal in a Java variable name because it is the division operator.
int a = b/c;
I suggest you take the Java naming conventions into consideration. You can read more about them in "Thinking in Java", or at http://java.about.com/od/javasyntax/a/nameconventions.htm... It is good practice to avoid characters like '/'; maybe you can replace it with '_'.
Roxana
I advise you not to use String Cd_St_SSLC/PUC, because / is not legal in a Java variable name. If you want a meaningful name, use String Cd_St_SSLC_PUC with an underscore instead.

Are all names identifiers?

Section 6.2 of the Java Language Specification contains the following code example:
class Test {
public static void main(String[] args) {
Class c = System.out.getClass();
System.out.println(c.toString().length() +
args[0].length() + args.length);
}
}
And it states:
the identifiers Test, main, and the first occurrences of args and c are not names. Rather, they are used in declarations to specify the names of the declared entities. The names String, Class, System.out.getClass, System.out.println, c.toString, args, and args.length appear in the example.
But are the names like Class and String also identifiers? What is an identifier exactly?
An identifier is a type of a token. From the specification of the lexical structure of Java:
3.8. Identifiers
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
Identifier:
IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
JavaLetter
IdentifierChars JavaLetterOrDigit
JavaLetter:
any Unicode character that is a Java letter (see below)
JavaLetterOrDigit:
any Unicode character that is a Java letter-or-digit (see below)
A "Java letter" is a character for which the method
Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method
Character.isJavaIdentifierPart(int) returns true.
The "Java letters" include uppercase and lowercase ASCII Latin letters
A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical
reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or
\u0024). The $ character should be used only in mechanically generated
source code or, rarely, to access pre-existing names on legacy
systems.
The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).
Letters and digits may be drawn from the entire Unicode character set,
which supports most writing scripts in use in the world today,
including the large sets for Chinese, Japanese, and Korean. This
allows programmers to use identifiers in their programs that are
written in their native languages.
An identifier cannot have the same spelling (Unicode character
sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null
literal (§3.10.7), or a compile-time error occurs.
An identifier is a user defined symbol.
It allows the compiler to differentiate between bindings to objects of the same type in the symbol table.
This might answer your 2nd question:
http://www.cafeaulait.org/course/week2/08.html
Identifiers are the names of variables, methods, classes, packages and
interfaces. Unlike literals they are not the things themselves, just
ways of referring to them.

Messed up with Java Declaration

Why do Java constants behave so strangely (Unicode escape versus normal representation)? See the example below.
Note: all code is in the Java language.
char a = '\u0061'; //This is correct
char 'a' = 'a'; //This gives compile time error
char \u0061 = 'a'; //this is correct no error
ch\u0061r a = 'a'; //This too works
ch'a'r a = 'a'; // This really is confusing compile time error
Why does the last declaration not work, whereas ch\u0061r a='a'; works?
You cannot put literals ('a') in the middle of identifiers.
The line
char 'a' = 'a';
does not compile because there is no identifier, and you cannot assign one literal to another.
Unicode is permitted, however. It is just hard to read :-)
You can not put literal characters, 'a', in identifiers. You can use unicode, \u0061, though.
This isn't confusing at all. You're randomly scattering single quotes around and expecting them to be irrelevant. In the first case, you're assigning the value of the single character \u0061 to a char variable. Then you're trying to use a character literal as a variable name, which doesn't work. Then you're using a Unicode-formatted character (not quoted) as a variable name, which is okay. Perhaps you're confusing Java's quote rules with shell?
You can find the reason in specification of literals
Unicode composite characters are different from the decomposed characters.
Identifier:
IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
JavaLetter
IdentifierChars JavaLetterOrDigit
JavaLetter:
any Unicode character that is a Java letter (see below)
JavaLetterOrDigit:
any Unicode character that is a Java letter-or-digit (see below)

Why can some ASCII characters not be expressed in the form '\uXXXX' in Java source code?

I stumbled over this (again) today:
class Test {
char ok = '\n';
char okAsWell = '\u000B';
char error = '\u000A';
}
It does not compile:
Invalid character constant in line 4.
The compiler seems to insist that I write '\n' instead. I see no reason for this, yet it's very annoying.
Is there a logical explanation why characters that have a special notation (like \t, \n, \r) must be expressed in that form in Java source?
Unicode escapes are replaced by the characters they denote, so your line is replaced by the compiler with:
char error = '
';
which is not a valid Java statement.
This is dictated by the Language Specification:
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
This can lead to surprising stuff, for example, this is a valid Java program (it contains hidden unicode characters) - courtesy of Peter Lawrey:
public static void main(String[] args) {
for (char c‮h = 0; c‮h < Character.MAX_VALUE; c‮h++) {
if (Character.isJavaIdentifierPart(c‮h) && !Character.isJavaIdentifierStart(c‮h)) {
System.out.printf("%04x <%s>%n", (int) c‮h, "" + c‮h);
}
}
}
Unicode escape sequences like \u000a are replaced by the actual characters they represent before the Java compiler does anything else with the source code. And so, your program eventually ends up at
char ch = '
';
So the \u000a in your source code is replaced internally by a linefeed character. Note that this happens before the compiler actually reads and interprets your source code.
Referring to the Java Language Specification:
It is a compile-time error for a line terminator (§3.4) to appear after the opening ' and before the closing '.
And as we all know by heart, \n is a line terminator. Quoting:
LineTerminator:
the ASCII LF character, also known as "newline"
the ASCII CR character, also known as "return"
the ASCII CR character followed by the ASCII LF character
Other symbols that could cause problems are \, ' and " for example.
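Given that rule, a line feed has to be written in a form that survives the early translation step. The sketch below shows a few spellings that all yield the character with code 10; the class name LineFeedDemo and the constant names are made up for illustration. The Unicode escape for the line feed is intentionally absent from the code, comments included, because it would be translated there as well.

```java
// Illustrative sketch: LineFeedDemo and its constant names are assumptions for this example.
class LineFeedDemo {

    static final char ESCAPE_SEQUENCE = '\n';     // ordinary escape sequence, handled when the literal is read
    static final char OCTAL_ESCAPE = '\12';       // octal escape for code 10
    static final char FROM_HEX = (char) 0x0A;     // cast from the numeric code
    static final char FROM_CONSTANT = 10;         // a constant int that fits in a char needs no cast

    public static void main(String[] args) {
        System.out.println((int) ESCAPE_SEQUENCE);
        System.out.println((int) OCTAL_ESCAPE);
        System.out.println((int) FROM_HEX);
        System.out.println((int) FROM_CONSTANT);
    }
}
```

Each line of output is 10. The escape sequence is the most readable choice; the numeric forms mainly help when the code is generated.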
I think the reason is that \uXXXX sequences are expanded when the code is being parsed, see JLS §3.2. Lexical Translations.
It is described in 3.3. Unicode Escapes, http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html. Javac first finds the \uxxxx sequences in the .java file and replaces them with the real characters, and only then compiles. In the case of
char error = '\u000A';
\u000A will be replaced with the newline character (code 10) and the actual text will be
char error = '
';
Because the compiler treats them the same as unescaped text.
This is valid code:
class \u00C9 {}
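That an escaped name and its unescaped spelling really are the same name can be checked with reflection. The sketch below uses a field rather than a class, an assumption made only so that no class file with a non-ASCII name has to be written; the class name EscapedFieldName and the method hasFieldNamed are hypothetical.

```java
// Illustrative sketch: EscapedFieldName and hasFieldNamed are assumptions for this example.
class EscapedFieldName {

    // Declares a field whose name is the single letter E with an acute accent.
    static int \u00C9 = 42;

    // Reports whether this class declares a field with the given name.
    static boolean hasFieldNamed(String name) {
        try {
            EscapedFieldName.class.getDeclaredField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasFieldNamed("\u00C9")); // the escape in the string is translated too
        System.out.println(hasFieldNamed("E"));      // the unaccented letter is a different name
        System.out.println(\u00C9);                  // reads the field through its escaped spelling
    }
}
```

This prints true, false and 42.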
