Error that is neither syntactic nor semantic?

Error that is neither syntactic nor semantic? - java

I had this question on a homework assignment (don't worry, already done):
[Using your favorite imperative language, give an example of
each of ...] An error that the compiler can neither catch nor easily generate code to
catch (this should be a violation of the language definition, not just a
program bug)
From "Programming Language Pragmatics" (3rd ed) Michael L. Scott
My answer, call main from main by passing in the same arguments (in C and Java), inspired by this. But I personally felt like that would just be a semantic error.
To me this question's asking how to producing an error that is neither syntactic nor semantic, and frankly, I can't really think of situation where it wouldn't fall in either.
Would it be code that is susceptible to exploitation, like buffer overflows (and maybe other exploitation I've never heard about)? Some sort of pit fall from the structure of the language (IDK, but lazy evaluation/weak type checking)? I'd like a simple example in Java/C++/C, but other examples are welcome.

Undefined behaviour springs to mind. A statement invoking UB is neither syntactically nor semantically incorrect, but rather the result of the code cannot be predicted and is considered erroneous.
An example of this would be (from the Wikipedia page) an attempt to modify a string-constant:
char * str = "Hello world!";
str[0] = 'h'; // undefined-behaviour here
Not all UB-statements are so easily identified though. Consider for example the possibility of signed-integer overflow in this case, if the user enters a number that is too big:
// get number from user
char input[100];
fgets(input, sizeof input, stdin);
int number = strtol(input, NULL, 10);
// print its square: possible integer-overflow if number * number > INT_MAX
printf("%i^2 = %i\n", number, number * number);
Here there may not necessarily be signed-integer overflow. And it is impossible to detect it at compile- or link-time since it involves user-input.

Statements invoking undefined behavior1 are semantically as well as syntactically correct but make programs behave erratically.
a[i++] = i; // Syntax (symbolic representation) and semantic (meaning) both are correct. But invokes UB.
Another example is using a pointer without initializing it.
Logical errors are also neither semantic nor syntactic.
1. Undefined behavior: Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Here's an example for C++. Suppose we have a function:
int incsum(int &a, int &b) {
return ++a + ++b;
}
Then the following code has undefined behavior because it modifies an object twice with no intervening sequence point:
int i = 0;
incsum(i, i);
If the call to incsum is in a different TU from the definition of the function, then it's impossible to catch the error at compile time, because neither bit of code is inherently wrong on its own. It could be detected at link time by a sufficiently intelligent linker.
You can generate as many examples as you like of this kind, where code in one TU has behavior that's conditionally undefined for certain input values passed by another TU. I went for one that's slightly obscure, you could just as easily use an invalid pointer dereference or a signed integer arithmetic overflow.
You can argue how easy it is to generate code to catch this -- I wouldn't say it's very easy, but a compiler could notice that ++a + ++b is invalid if a and b alias the same object, and add the equivalent of assert (&a != &b); at that line. So detection code can be generated by local analysis.

Related

Bugs found by FindBugs plugin Eclipse

using bug finder plugin, I found this bugs but does not understand why it was seen as bug in the code. Does anybody know and give me proper explanation regarding these? Thanks.
Source Code - https://drive.google.com/open?id=1gAyHFcdHBShV-9oC5G7GeOtCGf7bXoso;
Patient.java:17 Patient.generatePriority() uses the nextDouble method of Random to generate a random integer; using nextInt is more efficient [Of Concern(18), Normal confidence]
public int generatePriority(){
Random random = new Random();
int n = 5;
return (int)(random.nextDouble()*n);
}
ExaminationRoom.java:25 ExaminationRoom defines equals and uses Object.hashCode() [Of Concern(16), Normal confidence]
public boolean equals(ExaminationRoom room){
if (this.getWaitingPatients().size() == room.getWaitingPatients().size()){
return true;
}
else {
return false;
}
}
ExaminationRoom.java:15 ExaminationRoom defines compareTo(ExaminationRoom) and uses Object.equals() [Of Concern(16), Normal confidence]
// Compares sizes of waiting lists
#Override
public int compareTo(ExaminationRoom o) {
if (this.getWaitingPatients().size() > o.getWaitingPatients().size()){
return 1;
}
else if (this.getWaitingPatients().size() < o.getWaitingPatients().size()){
return -1;
}
return 0;
}
Hospital.java:41 Bad month value of 12 passed to new java.util.GregorianCalendar(int, int, int) in Hospital.initializeHospital() [Scary(7), Normal confidence]
doctors.add(new Doctor("Hermione", "Granger", new GregorianCalendar(1988, 12, 10), Specialty.PSY, room102));
Person.java:29 Return value of String.toLowerCase() ignored in Person.getFullName() [Scariest(3), High confidence]
public String getFullName(){
firstName.toLowerCase();
Character.toUpperCase(firstName.charAt(0));
lastName.toLowerCase();
Character.toUpperCase(lastName.charAt(0));
return firstName + " " + lastName;
}

Don’t create a new Random object each time.
Use random.nextInt(n).
Define a hashCode method in ExaminationRoom.
Having your compareTo method be inconsistent with equals() may or may not be OK.
Use LocalDate instead of GregorianCalendar.
Pick up and use the return values from String.toLowerCase() and Character.toUpperCase().
Consider SpotBugs as a newer alternative to FindBugs.
Details
Random
Creating a new Random object each time you need one gives your poorer pseudo-random numbers with a high risk of numbers being repeated. Declare a static variable holding the Random object outside your method and initialize it in the declaration (Random is thread-safe, so you can safely do that). For drawing a pseudo-random number from 0 through 4 use
int n = 5;
return random.nextInt(n);
It’s not only more efficient (as FindBugs says), I first of all find it much more readable.
hashCode()
#Override
public int hashCode() {
return Objects.hash(getWaitingPatients());
}
compareTo()
The equals method you have shown us seems to contradict FindBugs here. It does look a bit funny, though. Are two waiting rooms considered the same if they have the same number of waiting patients? Please think again. If you end up deciding that they are not equal but should be sorted into the same spot without discrimination, your compareTo method is inconsistent with equals(). If so, please insert a comment stating this fact. If you want FindBugs not to report this as a bug in subsequent analyses, you have two options:
Insert an annotation telling FindBugs to ignore the “bug”.
Create a FindBugs ignore XML file including this point.
I’m sorry I don’t remember the detauls of each, but your search engine should be helpful.
Don’t use GregorianCalendar
The GregorianCalendar class is poorly designed and long outdated. I suggest you evict it from your code and use LocalDate from java.time, the modern Java date and time API, instead.
doctors.add(new Doctor("Hermione", "Granger", LocalDate.of(1988, Month.DECEMBER, 10), Specialty.PSY, room102));
String.toLowerCase()
This was already treated in the other answer. Changing a name to have the first letter in upper case and the rest in lower case is not as simple as it sounds.
firstName.toLowerCase();
Character.toUpperCase(firstName.charAt(0));
The first of these two lines doesn’t modify the string firstName because strings have been designed to be immutable and toLowerCase() to return a new string with all letters in lower case (according to the rules of the default locale of the JVM, confusing). The second line also doesn’t modify any character because Java is call-by-value (look it up), so no method can modify a variable passed as argument. You’re not even passing a variable, but the return value from a different method. Also Character.toUpperCase() returns a new char in lower case.
What you need to do is pick up the values returned from these two method calls, use a substring operation for removing the first letter from the lower-case version of the name and concatenate the upper-case version of that letter with the remainder of the lower-case string. If it’s complicated, I am sure that your search engine can find examples of where and how it is done.
A bit of an aside: You may want to think twice before forcing Dr. Jack McNeil to be written as Mcneil and Dr. Ludwig von Saulsbourg as Von saulsbourg.
SpotBugs
It’s only something I have heard, I haven’t checked myself. The source code of FindBugs has been taken over by a project called SpotBugs. They say that SpotBugs is being developed more actively than FindBugs. So you may consider switching. I am myself a happy SpotBugs user in my daily work.
Links
What issues should be considered when overriding equals and hashCode in Java?
Documentation of Comparable explaining what it means that compareTo() is inconsistent with equals() and how to go about it.
Oracle tutorial: Date Time explaining how to use java.time.
Chicago Hope listing Jack McNeil as Orthopedic Surgeon.
Dr. Jack mentioning Dr. Ludwig von Saulsbourg on the cast.
SpotBugs

First thing to remember about "bug finder" tools, is that they are usually only guidelines. With that said:
The class GregorianCalendar counts months from 0, meaning 0 is January, 11 is December. 12 represents the 13th month which doesn't exist. Since the function expects an int, and you gave it an int, no compiler error is generated, even though this is certainly a bug. This article does a good job explaining the reasons to upgrade, and give examples of how to use the new APIs: https://www.baeldung.com/java-8-date-time-intro
If in doubt, you can always check the documentation. In this case, the class Calendar (which GregorianCalendar extends) delcares a static constant public static final int JANUARY = 0; This confirms that january is indeed 0, but also indicates that we can use this constant in our code. You might find new GregorianCalendar(1988, Calendar.JANUARY, 10) to be a bit more readable.
You may also want to consider switching to the more modern and standard systems used to deal with time. The Java 8 Time libraries are the "new standard", and are definitely worth looking into.
Secondly, Strings are immutable in Java. This means that once a String is created, its value can never be changed. This may be counter to your intuitions, as you may have seen code such as:
String s = "hello";
s = s + " world";
However, this doesn't modify the string s. Instead, s + " world" creates a new String, and assigns it to the variable s.
Similarly, s.toLowerCase() doesn't change what s is, it only generates a new String which you must assign.
You probably want firstName = firstName.toLowerCase();
With your first example, nothing immediately jumps out to me as "bad", but if you look at the messages generated by your tool, they label the first example as "Of Concern", but label the others (e.g. the string.toLowerCase() example) as "Scary"/"Scariest". Although I am not familiar with this tool in particular, I imagine this is indicating more of a "Code Smell", rather than an actual bug.
Perhaps look into Unit Testing, if you want to reassure yourself that your code works.

Why isn't the ArrayIndexOutOfBoundsException a compile time error?

Can someone explain to me why ArrayIndexOutOfBoundsException is a run-time exception instead of a compile-time error?
In obvious cases when the indexes are negative or greater than the array size, I don't see why it cannot be a compile-time error.
Edited: especially when the size of the array and even the indexing is known at compile time, for example int[] a = new int[10]; a[-1]=5; This should be a compilation error.

The size of the arrays may be defined only at runtime (for instance, the simplest case, if the size of an array depends on the user input).
Therefore it would be impossible to check at compile time for such kind of exceptions, by checking the accesses of an array without actually know its bounds (size).

Because it can't be detected at compile-time all the time.

Entering a[-1] = 5; is something only novices would do (as Richard Tingle said). So it's not worth the effort to update the language standard just for that kind of error. A more interesting case would be a[SOME_CONSTANT] = 5; where SOME_CONSTANT was defined as static final int SOME_CONSTANT = -1; (or some expression involving only constants that computes to -1) in some other class. Even then, however, if the compiler flagged this as an error, it might catch cases where the programmer has put a[SOME_CONSTANT] = 5; in an if statement that has already checked for negative values of the constant. (I'm assuming here that SOME_CONSTANT is a constant whose value could change if the application's requirements change.) So while the language could, in theory, make it illegal to write an array indexing operation that can't possibly succeed, there are good reasons not to.
P.S. This is a real issue. The Ada language does do some compile-time checking for static expressions that can't succeed, but it doesn't check this case, and there has been some discussion in the last few weeks about whether it should, or whether compilers should be allowed (but not required) to reject programs with array indexing that is known to fail.

There is no way to check all indexes at compile time, because they can be variables and its values can change at runtime. If you have array[i] and i is the result of reading a file, you can evaluate i when executing the program. Even if you use a variable, remember that you can reassign your array changing its capacity. Again, this can be checked only ar runtime.
Check this question for more information: Runtime vs Compile time.

As well as agreeing with the fact that array size can't be checked at compile time, I want to add another note on the limit of the size of an array, which is expected to be in the range of primitive int:
// This compiles, because the size evaluates to an integer.
int[] array = new int[Integer.MAX_VALUE + 1];
// This doesn't compile.
int[] array = new int[Long.MAX_VALUE];
And this error is because of length field (int) of arrays which are special Java objects.

When working with pointers it's possible to have negative indexes and not have an error if you have correctly reserved the memory position you will access. Here is an example. When working with low-level programming languages things like this one are very frequently done but they don't have a lot of sense in high-level languages, at least for me.
int arr[10];
int* p = &arr[2];
int x = p[-2]; // valid: accesses arr[0]
if you try to do:
arr[-5] //you will access and invalid block of memory, this why you get the error.
this may result a very helpful and interesting:
http://www-ee.eng.hawaii.edu/~tep/EE160/Book/chap7/subsection2.1.3.2.html

Execution priority of expressions

In the following line of code:
x = x.times(x).plus(y);
in what order are these expressions going to be executed?
Will it be like:
x = (x + y)*x
or x = (x^2) + y,
or something else and why?
Links to documentation about the specific subject will be highly appreciated as I had no luck with my search. Apparently I don't know where to look at and what to look for.
Thank you.

These are methods; the fact that they are called "plus" and "times" doesn't mean that they'll necessarily follow the behaviour of the built-in + and * operators.
So x.times(x) will be executed first. This will return a reference to an object, on which plus(y) will then be executed. The return value of this will then be assigned to x. It's equivalent to:
tmp = x.times(x);
x = tmp.plus(y);

Here's a link to a documentation which most likely contains the required answer (probably at 15.7). It's highly technical and verbose but not inaccessible to most people (I believe).
However, it seems that you're just starting programming, so you'll be better off reading other answers here, and programming more to get an intuitive feel (not exactly a 'feel', as it's systematic and rigourous) of the order of operations etc...
Don't be afraid to write "throw-away" code (which you can incidentally save too) to find out things you don't know if you don't know where else to look for the answer. You can always google more intensively or dive through the language specs at a latter date. You'll learn faster this way. :)
One simple way to find out is to write something like this:
class Number{
private int number;
public Number(int x){
number = x;
}
public Number times(Number x){
System.Out.PrintLn("times");
return number * x;
}
public Number plus(Number x){
System.Out.PrintLn("plus");
return number + x;
}
}

Method chains get executed from left to right, with each method using the result from the previous method, so it will be x = (x^2) + y.

What you're referring to in the algebraic expressions is operator precedence - evaluating multiplications before addition, for example. The Java compiler knows about these rules for expressions, and will generate code to evaluate them as you expect.
For method calling, there are no "special rules". When given x = x.times(x).plus(y); the compiler only knows that to evaluate x.times(x).plus(y), it first needs to know what x is, so it can call times on it. Likewise, it then needs to know what x.times(x) is so it can call the plus method on that result. Hence, this type of statement is parsed left to right : (x * x) + y.
Some languages allow the creation of functions that are "infix" with user supplied precedence. (such as Haskell : See http://www.haskell.org/tutorial/functions.html, section "Fixity declarations"). Java is, alas, not one of them.

It's going to be executed in left-to-right order, as
x = (x.times(x)).plus(y)
The other way:
x = x.(times(x).plus(y))
doesn't even make sense to me. You would have to rewrite it as
x = x.times(x.plus(y))
to make sense of it, but the fact that the second x is contained within times() while the y is outside it rules out that interpretation.

The reason the documentation doesn't say anything about this is probably that such expressions follow the normal rules for how a statement like a.b().c().d() is evaluated: from left to right. We start with a and call the function b() on it. Then, we call c() on the result of that call, and we call d() on the result of c(). Hence, x.times(x).plus(y) will first perform the multiplication, then the addition.

In regards to for(), why use i++ rather than ++i?

Perhaps it doesn't matter to the compiler once it optimizes, but in C/C++, I see most people make a for loop in the form of:
for (i = 0; i < arr.length; i++)
where the incrementing is done with the post fix ++. I get the difference between the two forms. i++ returns the current value of i, but then adds 1 to i on the quiet. ++i first adds 1 to i, and returns the new value (being 1 more than i was).
I would think that i++ takes a little more work, since a previous value needs to be stored in addition to a next value: Push *(&i) to stack (or load to register); increment *(&i). Versus ++i: Increment *(&i); then use *(&i) as needed.
(I get that the "Increment *(&i)" operation may involve a register load, depending on CPU design. In which case, i++ would need either another register or a stack push.)
Anyway, at what point, and why, did i++ become more fashionable?
I'm inclined to believe azheglov: It's a pedagogic thing, and since most of us do C/C++ on a Window or *nix system where the compilers are of high quality, nobody gets hurt.
If you're using a low quality compiler or an interpreted environment, you may need to be sensitive to this. Certainly, if you're doing advanced C++ or device driver or embedded work, hopefully you're well seasoned enough for this to be not a big deal at all. (Do dogs have Buddah-nature? Who really needs to know?)

It doesn't matter which you use. On some extremely obsolete machines, and in certain instances with C++, ++i is more efficient, but modern compilers don't store the result if it's not stored. As to when it became popular to postincriment in for loops, my copy of K&R 2nd edition uses i++ on page 65 (the first for loop I found while flipping through.)

For some reason, i++ became more idiomatic in C, even though it creates a needless copy. (I thought that was through K&R, but I see this debated in other answers.) But I don't think there's a performance difference in C, where it's only used on built-ins, for which the compiler can optimize away the copy operation.
It does make a difference in C++, however, where i might be a user-defined type for which operator++() is overloaded. The compiler might not be able to assert that the copy operation has no visible side-effects and might thus not be able to eliminate it.

As for the reason why, here is what K&R had to say on the subject:
Brian Kernighan
you'll have to ask dennis (and it might be in the HOPL paper). i have a
dim memory that it was related to the post-increment operation in the
pdp-11, though beyond that i don't know, so don't quote me.
in c++ the preferred style for iterators is actually ++i for some subtle
implementation reason.
Dennis Ritchie
No particular reason, it just became fashionable. The code produced
is identical on the PDP-11, just an inc instruction, no autoincrement.
HOPL Paper
Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few ‘auto-increment’ memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger
motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.

For integer types the two forms should be equivalent when you don't use the value of the expression. This is no longer true in the C++ world with more complicated types, but is preserved in the language name.
I suspect that "i++" became more popular in the early days because that's the style used in the original K&R "The C Programming Language" book. You'd have to ask them why they chose that variant.

Because as soon as you start using "++i" people will be confused and curios. They will halt there everyday work and start googling for explanations. 12 minutes later they will enter stack overflow and create a question like this. And voila, your employer just spent yet another $10

Going a little further back than K&R, I looked at its predecessor: Kernighan's C tutorial (~1975). Here the first few while examples use ++n. But each and every for loop uses i++. So to answer your question: Almost right from the beginning i++ became more fashionable.

My theory (why i++ is more fashionable) is that when people learn C (or C++) they eventually learn to code iterations like this:
while( *p++ ) {
...
}
Note that the post-fix form is important here (using the infix form would create a one-off type of bug).
When the time comes to write a for loop where ++i or i++ doesn't really matter, it may feel more natural to use the postfix form.
ADDED: What I wrote above applies to primitive types, really. When coding something with primitive types, you tend to do things quickly and do what comes naturally. That's the important caveat that I need to attach to my theory.
If ++ is an overloaded operator on a C++ class (the possibility Rich K. suggested in the comments) then of course you need to code loops involving such classes with extreme care as opposed to doing simple things that come naturally.

At some level it's idiomatic C code. It's just the way things are usually done. If that's your big performance bottleneck you're likely working on a unique problem.
However, looking at my K&R The C Programming Language, 1st edition, the first instance I find of i in a loop (pp 38) does use ++i rather than i++.

Im my opinion it became more fashionable with the creation of C++ as C++ enables you to call ++ on non-trivial objects.
Ok, I elaborate: If you call i++ and i is a non-trivial object, then storing a copy containing the value of i before the increment will be more expensive than for say a pointer or an integer.

I think my predecessors are right regarding the side effects of choosing postincrement over preincrement.
For it's fashonability, it may be as simple as that you start all three expressions within the for statement the same repetitive way, something the human brain seems to lean towards to.

I would add up to what other people told you that the main rule is: be consistent. Pick one, and do not use the other one unless it is a specific case.

If the loop is too long, you need to reload the value in the cache to increment it before the jump to the begining.
What you don't need with ++i, no cache move.

In C, all operators that result in a variable having a new value besides prefix inc/dec modify the left hand variable (i=2, i+=5, etc). So in situations where ++i and i++ can be interchanged, many people are more comfortable with i++ because the operator is on the right hand side, modifying the left hand variable
Please tell me if that first sentence is incorrect, I'm not an expert with C.

Using a "pseudo operator" to distinguish simple repetition from general for loops

I would like to know other people's opinion on the following style of writing a for loop:
for (int rep = numberOfReps; rep --> 0 ;) {
// do something that you simply want to repeat numberOfReps times
}
The reason why I invented this style is to distinguish it from the more general case of for loops. I only use this when I need to simply repeat something numberOfReps times and the body of the loop does not use the values of rep and numberofReps in any way.
As far as I know, standard Java for example doesn't have a simple way of saying "just repeat this N times", and that's why I came up with this. I'd even go as far as saying that the body of the loop must not continue or break, unless explicitly documented at the top of the for loop, because as I said the whole purpose is to make the code easier to understand by coming up with a distinct style to express simple repetitions.
The idea is that if what you're doing is not simple (dependency on value of an inreasing/decreasing index, breaks, continues, etc), then use the standard for loop. If what you are doing is simple repetition, on the other hand, then this distinct style communicates that "fact" (once you know the purpose of the style, of course).
I said "fact" because the style can be abused, of course. I'm operating under the assumption that you have competent programmers whose objective is to make their code easier to understand, not harder.
A comment was made that allude to the principle that for should only be used for simple iteration, and while should be used otherwise (e.g. if the loop variables are modified in the body).
If that's the case, then I'm merely extending that principle to say that if it's even simpler than your simple for loops (i.e. you don't even care about the iteration index, or whether it's increasing or decreasing, etc, you just want to repeat doing something N times), then use the winking arrow for loop construct instead.
What a coincidence, Josh Bloch just tweeted the following:
Goes-to Considered Harmful:
public static void main(String[] a) {
int i = 10;
while (i --> 0) /* i goes-to 0 */ {
System.out.println(i);
}
}
Unfortunately no explanation was given, but it seems that at least this pseudo operator has a name. It has also been discussed before on SO: What is the name of this operator: “-->”?

You have the language-agnostic tag, but this question isn't really language agnostic. That pattern would be fine if there wasn't already a well established idiom for doing something n times in your language.
You go on to mention Java, whicha already has a well-established idiom for doing something n times:
for (int i = 0; i < numberOfReps; i++) {
// do something that you simply want to repeat numberOfReps times
}
While your pattern works just as well, it's confusing to others. When I first saw it my thoughts were:
What's that weird arrow?
Why is that line winking at me?
Unless you develop a pattern that has a significant advantage over the standard idiom, it's best to stick with the standard so your fellow coders don't end up scratching their heads.

Nearly every language these days has lambda, so you can write a function like
nTimes(n, body)
that takes an int and a lambda, and more directly communicate intent. In F#, for example
let nTimes(n,f) =
for i in 1..n do f()
nTimes(3, fun() -> printfn "Hello")
or if you prefer extension methods
type System.Int32 with
member this.Times(f) =
for i in 1..this do f()
(3).Times(fun() -> printfn "Hello")

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.