Reading the Java Code Conventions document from 1997, I saw this in an example on P16 about variable naming conventions:
int i;
char *cp;
float myWidth;
The second declaration is of interest - to me it looks a lot like how you might declare a pointer in C. It gives a syntax error when compiling under Java 8.
Just out of curiosity: was this ever valid syntax? If so, what did it mean?
It's a copy-paste error, I suppose.
From JLS 1 (which is really not that easy to find!), the section on local variable declarations states that such a declaration, in essence, is a type followed by an identifier. Note that there is no special reference made about *, but there is special reference made about [] (for arrays).
char is our type, so the only possibility that remains is that *cp is an identifier. The section on Identifiers states
An identifier is an unlimited-length sequence of Java letters and Java
digits, the first of which must be a Java letter.
...
A Java letter is a character for which the method Character.isJavaLetter (§20.5.17) returns true
And the JavaDoc for that method states:
A character is considered to be a Java letter if and only if it is a
letter (§20.5.15) or is the dollar sign character '$' (\u0024) or the
underscore ("low line") character '_' (\u005F).
so foo, _foo and $foo were fine, but *foo was never valid.
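In current JDKs, Character.isJavaLetter is deprecated in favour of Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. A minimal sketch of the rule quoted above, assuming a hypothetical helper isValidIdentifier (not a JDK method) that ignores reserved words such as int:

```java
public class IdentifierCheck {
    // Hypothetical helper: checks only the character rule for identifiers.
    // It does NOT reject keywords (e.g. "int") or the literals true/false/null.
    static boolean isValidIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"foo", "_foo", "$foo", "*cp"}) {
            System.out.println(candidate + " -> " + isValidIdentifier(candidate));
        }
    }
}
```

With this check, foo, _foo and $foo pass, while *cp fails on its first character.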
If you want a more up-to-date Java style guide, Google's style guide is arguably the most commonly referenced.
It appears that this is a generic coding style document for C-like languages with some Java-specific additions. See, for example, also the next page:
Do not use the assignment operator in a place where it can be easily confused with the equality operator. Example:
if (c++ = d++) { // AVOID! Java disallows.
…
}
It does not make sense to tell a programmer to avoid something that is a syntax error anyway, so the only conclusion we can draw from this is that the document is not 100% Java-specific.
Another possibility is that it was meant as a coding style for the entire Java system, including the C++ parts of the JRE and JDK.
Note that Sun abandoned the coding style document long before Oracle came into the picture. They restricted themselves to specifying what the language is, not how to use it.
Invalid syntax!
It's just a copy/paste mistake.
The * token in a declaration only makes sense in C, which has pointer types; Java has references but no pointer declarations.
In Java, * never appears in a declarator; in expressions it is only the multiplication operator.
Looking at GenericWhitespaceCheck in Checkstyle docs,
Left angle bracket (<):
should be preceded with whitespace only in generic methods definitions.
should not be preceded with whitespace when it is precede method name or following type name.
should not be followed with whitespace in all cases.
Right angle bracket (>):
should not be preceded with whitespace in all cases.
should be followed with whitespace in almost all cases, except diamond operators and when preceding method name.
I am not sure I fully understand the reasoning behind why < should not be followed by a space and why > should not be preceded by one.
In other words, why is Map<String> the convention over Map < String > ?
Is this only because, as the number of parameters and the nesting depth increase, the version without spaces is more readable?
For example, is Map<String, List<String>> more readable than Map < String, List < String > > ?
Also, as a general question, is there some repository or guide which explains the reasons behind the Checkstyle conventions?
The introduction to an early tutorial on generics (from 2004) says (emphasis mine):
This tutorial is aimed at introducing you to generics. You may be familiar with
similar constructs from other languages, most notably C++ templates. If so, you’ll soon
see that there are both similarities and important differences. If you are not familiar
with look-a-alike constructs from elsewhere, all the better; you can start afresh, without
unlearning any misconceptions.
This is acknowledging that Java generics look like C++ templates. C++ templates also conventionally omit the space after the opening <, and before the closing >.
The conventions around Java generics most likely follow from the way they were written in these early tutorials.
Although I have no evidence or research to base my theory on, I'd reason as follows:
Cohesion
A (kind of language-philosophical) rationale could be:
The parameterization of types (the major role of generics), as in Map<String, Object>, belongs to the type name in the same way that parentheses and parameters belong to the method name. So adding parameters to a signature should follow a consistent spacing rule: no space around the parameterizing brackets (neither in a generic type's parameter definition, nor in a method's parameter definition).
Thus the angle brackets coherently define the "type signature" and should stay as close to the type as possible (semantically and spatially), which means no space should break this relation.
Readability
From the (Clean Code) perspective there is a clear benefit for avoiding spaces:
Spaces around angle brackets make them easy to misread or misinterpret as comparison operators.
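To illustrate the point, here is a small sketch (the class and method names are made up for this example). The compiler accepts both spellings of the generic type, so the rule is purely about readability; the spaced form simply looks a lot like the comparison on the last lines:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericSpacing {
    static int demo() {
        // Conventional spelling: brackets hug the type name.
        Map<String, List<String>> compact = new HashMap<String, List<String>>();
        // Also legal, but the brackets now resemble comparison operators.
        Map < String, List < String > > spaced = new HashMap < String, List < String > > ();

        compact.put("k", new ArrayList<String>());
        spaced.put("k", new ArrayList<String>());

        int a = 1, b = 2;
        boolean less = a < b; // a genuine comparison, conventionally written with spaces

        return compact.size() + spaced.size() + (less ? 1 : 0);
    }
}
```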
Most coding styles limit the number of characters per line (the column length). Reducing whitespace makes for shorter lines that are easier to read.
For example, the Google code style for Java has a column limit of 100 characters.
Coding styles depend on the community behind the language, so I would recommend checking their standards.
In Java, variable names start with a letter, a currency character ($), etc., but not with a digit, :, or .
Simple question: why is that?
Why doesn't the compiler allow variable declarations such as
int 7dfs;
Simply put, it would break facets of the language grammar.
For example, would 7f be a variable name, or a floating point literal with a value of 7?
You can conjure others too: if . was allowed then that would clash with the member selection operator: would foo.bar be an identifier in its own right, or would it be the bar field of an object instance foo?
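A short sketch of both clashes (Foo and its field bar are hypothetical names used only for illustration). The lexer already has a meaning for 7f and for foo.bar, so neither could also be an identifier:

```java
public class LiteralVsIdentifier {
    // Hypothetical class used only to demonstrate member selection.
    static class Foo {
        int bar = 42;
    }

    static float sevenAsFloat() {
        return 7f; // 7f is a float literal, not a name
    }

    static int memberSelection() {
        Foo foo = new Foo();
        return foo.bar; // '.' selects the field bar of the object foo
    }

    // int 7dfs;  // would not compile: the lexer reads the double literal 7d first
}
```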
Because the Java Language specification says so:
IdentifierChars:
JavaLetter {JavaLetterOrDigit}
So - yes, an identifier must start with a letter; it can't start with a digit.
The main reasons behind that:
it is simply what most people expect
it makes parsing source code (much) easier when you restrict the "layout" of identifiers; for example it reduces the possible ambiguities between literals and variable names.
I want to determine whether a given string is a valid Java expression (according to Java's syntax).
For example:
object.apply()
x == 2
(x != null) && x.alive
Are all valid expressions in Java.
But:
object.apply();
==
for(int i=1; i < n; ++i) i.print();
Are not valid expressions in Java (some are valid statements, but this is not what I'm looking for).
Is there a simple solution? (like isJavaIdentifierStart and isJavaIdentifierPart when one wants to determine whether a string is a valid identifier)
You need to parse the expression the same way the Java compiler would parse it, following the Java language standard specification.
Building your own parser from scratch is not a good idea; the Java syntax has gotten complicated in the last decade. You should find an existing Java parser and reuse that so you don't have to reinvent the wheel incorrectly.
JavaCC and ANTLR are both available as Java tools, and have Java grammars defined for them. I suggest you consider them as prime candidates. A complication is that these parsers parse full programs, not expressions. You can fix that by modifying the grammar to make expression a goal rule, and then fixing any grammar conflicts that may produce; I would not expect many.
A more complex issue: just because the syntax is valid, doesn't mean the expression is valid. I'm pretty sure that the syntax of java will accept:
"abc" * 17.2
as valid syntax.
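As a rough way to try this without modifying a grammar, one can wrap the string in a dummy method and run only the parse phase of the JDK's own compiler. This is a sketch under assumptions, not the approach described above: it needs a JDK (not a bare JRE), it uses the JDK-specific com.sun.source.util.JavacTask API, the names SyntaxOnlyCheck, Probe and parsesAsExpression are made up, and a crafted string such as `1); return (2` can slip past the wrapper.

```java
import com.sun.source.util.JavacTask;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Arrays;

public class SyntaxOnlyCheck {
    // In-memory source file, so nothing has to be written to disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Probe.java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Hypothetical helper: true if the wrapped source parses without errors.
    // Only the parser runs, so names and types are never resolved.
    static boolean parsesAsExpression(String expr) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }
        // The newline keeps a trailing "//" comment in expr from hiding the ")".
        String source = "class Probe { Object probe() { return (" + expr + "\n); } }";
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null,
                Arrays.asList(new StringSource(source)));
        try {
            task.parse(); // syntax analysis only: no attribution, no code generation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                return false;
            }
        }
        return true;
    }
}
```

Under this check "abc" * 17.2 is accepted, because nothing looks at the operand types.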
If you want to verify the validity of the expression, you have to type-check it, using the context in which the expression will be evaluated to provide the background type information. Otherwise one will accept this as valid:
s * d // expression that parses correctly, but isn't valid
when the background knowledge is this:
Object s;
char d;
Doing a full type check is much, much harder. As a practical matter, you'll need a full Java compiler front end, which parses and does the type checking.
Parser generators (e.g., ANTLR, JavaCC) provide zero help doing this.
So you either use the Java compiler or search for a Java front end; there are a few. [Full disclosure: my company provides one that can do this].
Nope, there is definitely not a simple way to check whether a String is valid Java code. I can think of only two ways.
1. Export to a file and compile it
You can save the String as a file with the .java suffix and compile it. According to the result of the compilation, you can say whether the String is valid or not.
2. Java parser
You may find a library able to do that. Take a look at JavaCC. Here I cite from their site:
A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
The notion of a "valid Java expression" is ... rubbery.
For example:
1 == true
is syntactically valid, but a Java compiler would reject it because == cannot be used with operands that have those types. Then:
x.length() == 42
may or may not be valid, depending on the declared type of x.
If you are simply interested in whether an expression is syntactically valid, then a parser for a subset of the Java language is sufficient.
On the other hand, if you want to check if the expression would be compilable when embedded into a Java program, then the simplest approach is to embed the expression in an equivalent context and compile it with a real Java compiler.
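A minimal sketch of that last approach using the standard javax.tools API, assuming a JDK is available at run time. The names ExpressionCompileCheck, Probe and compiles are made up for this example, and the "equivalent context" is reduced to a list of method parameters supplied by the caller. An expression of type void (a call to a void method) is rejected by this wrapper, and the temporary output directory is not cleaned up.

```java
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ExpressionCompileCheck {
    // In-memory source file handed to the compiler.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Probe.java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Hypothetical helper: 'parameters' declares the names the expression may use,
    // e.g. "String x, int n". Returns true if the wrapped expression compiles.
    static boolean compiles(String parameters, String expr) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }
        String source = "class Probe { Object probe(" + parameters + ") { return ("
                + expr + "\n); } }";
        try {
            Path out = Files.createTempDirectory("probe"); // class files go here
            // The collector swallows error messages that would otherwise go to stderr.
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            Boolean ok = compiler.getTask(
                    null, null, diagnostics,
                    Arrays.asList("-d", out.toString()), null,
                    Arrays.asList(new StringSource(source))).call();
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For instance, compiles("String x", "x.length() == 42") succeeds, while the same expression with "int x" is rejected, and so is 1 == true.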
You can create a parser with ANTLR
and you can define your own rules.
What is the difference between these two errors, lexical and semantic?
int d = "orange";
inw d = 4;
Would the first one be a semantic error? Since you can't assign a literal to an int? As for the second one the individual tokens are messed up so it would be lexical? That is my thought process, I could be wrong but I'd like to understand this a little more.
There are really three commonly recognized levels of interpretation: lexical, syntactic and semantic. Lexical analysis turns a string of characters into tokens, syntactic builds the tokens into valid statements in the language and semantic interprets those statements correctly to perform some algorithm.
Your first error is semantic: while all the tokens are legal, it's not legal in Java to assign a string constant to an int variable.
Your second error could be classified as lexical (as the string "inw" is not a valid keyword) or as syntactic ("inw" could be the name of a variable but it's not legal syntax to have a variable name in that context).
A semantic error can also be something that is legal in the language but does not represent the intended algorithm. For example: "1" + n is perfectly valid code but if it is intending to do an arithmetic addition then it has a semantic error. Some semantic errors can be picked up by modern compilers but ones such as these depend on the intention of the programmer.
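A small sketch of that "1" + n case (the class and method names are made up). Both methods compile; only the programmer knows which one was intended:

```java
public class ConcatVsAdd {
    // String concatenation: the int is converted to text and appended.
    static String concat(int n) {
        return "1" + n;
    }

    // Arithmetic addition: the text is converted to a number first.
    static int add(int n) {
        return Integer.parseInt("1") + n;
    }

    public static void main(String[] args) {
        System.out.println(concat(2)); // prints 12
        System.out.println(add(2));    // prints 3
    }
}
```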
See the answers to whats-the-difference-between-syntax-and-semantics for more details.
could anyone tell me the difference between Terminal and non-terminal symbol in the case of Java?
Does Terminal mean a Keyword and non-terminal any common string literal?
In grammars, a terminal is some form of token (keyword, identifier, symbol, literal, etc.) whilst a non-terminal refers to a rule.
So both a keyword and a literal string would be terminals. A statement would be non-terminal.
(That's probably a really bad description. Read the dragon book.)
EDIT (not by original answerer): I'd never heard of the dragon book, so here's a reference.
Your question is rather unclear. Are you talking about the formal grammar which describes the Java language? If so, everything you see in a syntactically valid Java file is (part of) a terminal.
A string is 'in' the language described
by some grammar if it can be produced
by applying the production rules of
the grammar until only terminals
remain.
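The quoted definition can be sketched with a toy grammar. The rules below are invented for illustration and are not the JLS grammar; in particular, a real Java parser treats an identifier as a single token (a terminal), whereas this toy expands Identifier into one fixed name so that the derivation ends in concrete text. Symbols that have a rule are non-terminals; everything else is a terminal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TinyGrammar {
    // Toy production rules: each non-terminal maps to one sequence of symbols.
    static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Statement", Arrays.asList("LocalVarDecl", ";"));
        RULES.put("LocalVarDecl", Arrays.asList("Type", "Identifier"));
        RULES.put("Type", Arrays.asList("int"));
        RULES.put("Identifier", Arrays.asList("i"));
    }

    // A symbol with no rule cannot be expanded further: it is a terminal.
    static boolean isTerminal(String symbol) {
        return !RULES.containsKey(symbol);
    }

    // Apply the production rules until only terminals remain.
    static List<String> derive(String start) {
        List<String> symbols = new ArrayList<>();
        symbols.add(start);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<String> next = new ArrayList<>();
            for (String s : symbols) {
                if (isTerminal(s)) {
                    next.add(s);
                } else {
                    next.addAll(RULES.get(s));
                    changed = true;
                }
            }
            symbols = next;
        }
        return symbols;
    }

    public static void main(String[] args) {
        System.out.println(derive("Statement")); // prints [int, i, ;]
    }
}
```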
Perhaps you should check out the Java Language Specification.