Java "new String[-1]" passes compilation. How come? - java

While fiddling around in Java, I initialized a new String array with a negative length.
i.e. -
String[] arr = new String[-1];
To my surprise, the compiler didn't complain about it.
Googling didn't bring up any relevant answers. Can anyone shed some light on this matter?
Many thanks!

The reason is that the JLS allows this, and a compiler that flagged it as a compilation error would be rejecting valid Java code.
It is specified in JLS 15.10.1. Here's the relevant snippet:
"... If the value of any DimExpr expression is less than zero, then a NegativeArraySizeException is thrown."
Now if the Java compiler flagged the code as an error, then that specified behaviour could not occur ... in that specific code.
Furthermore, there's no text that I can find that "authorizes" the compiler to reject this in the "obvious mistake" cases involving compile-time constant expressions like -1. (And who is to say it really was a mistake?)
The next question, of course, is 'why does the JLS allow this?'
You'd need to ask the Java designers. However, I can think of some (mostly) plausible reasons:
This was originally overlooked, and there's no strong case for fixing it. (Noting that fixing it breaks source code compatibility.)
It was considered to be too unusual / edge case to be worth dealing with.
It would potentially cause problems for people writing source code generators. (Imagine, having to write code to evaluate compile-time constant expressions in order that you don't generate non-compilable code. With the current JLS spec, you can simply generate the code with the "bad" size, and deal with the exception (or not) if the code ever gets executed.)
Maybe someone had a plan to add "unarrays" to Java :-)
Other answers have suggested that the compiler could / should "flag" this case. If "flagging" means outputting a warning message, that is certainly permitted by the JLS. However, it is debatable whether the compiler should do this. On the one hand, if the above code was written by mistake, then it would be useful to have that mistake flagged. On the other hand, if it was not a mistake (or the "mistake" was not relevant) then the warning would be noise, or worse. Either way, this is something that you would need to discuss with the maintainer(s) for the respective compiler(s).

I see no reason why this couldn't be flagged up at compile time (at least as a warning), since this unconditionally throws NegativeArraySizeException when executed.
I've done some quick experiments with my compiler, and it seems surprisingly relaxed about this sort of thing. It issues no warning about integer divide by zero in constant expressions, out-of-bounds array access with constant indices etc.
From this I conclude that the general pattern here is to trust the programmer.
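The cases mentioned above can be tried directly. This is a minimal sketch (the class and method names are made up for illustration) in which each method contains an expression built from constants that always throws when executed. Depending on the compiler and its lint settings a warning may be printed, but the code is expected to compile.

```java
// Hypothetical demo class: each method holds a constant expression that always throws.
final class ConstantFailures {

    static String negativeSize() {
        try {
            String[] arr = new String[-1];      // compiles, fails when executed
            return "created array of length " + arr.length;
        } catch (NegativeArraySizeException e) {
            return "NegativeArraySizeException";
        }
    }

    static String divideByZero() {
        try {
            int x = 1 / 0;                      // integer division by a constant zero
            return "x = " + x;
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    static String constantIndex() {
        int[] a = new int[2];
        try {
            return "a[5] = " + a[5];            // constant index past the end of the array
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(negativeSize());
        System.out.println(divideByZero());
        System.out.println(constantIndex());
    }
}
```

In all three methods the failure only shows up when the method is actually called.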

The compiler is only responsible for checking that the code is valid Java (syntax and static rules such as types), not for working out what your code will do when it runs.
So it is reasonable that the compiler does not report an error here: as far as the language is concerned, there is nothing invalid in your code.
In Java, arrays are allocated at runtime, which is absolutely fine. If the length had to be checked at compile time, how would the compiler check the following code?
// the length is supplied at runtime and may be any int value
void t(int length) {
String[] strings = new String[length];
}
When a negative value is passed as the length of a new array, a runtime exception is thrown:
public class Main {
public static void main(String[] args) {
String[] v = new String[-1];
}
}
with error:
Exception in thread "main" java.lang.NegativeArraySizeException
at Main.main(Main.java:5)

The Java compiler takes an integer as the length of an array. It can be a variable or a compile-time constant. The length of an array is established when the array is created. After creation, its length is fixed.
The compiler should flag a negative compile-time constant as the length of an array. It just does not do so. If the length is a negative number, you will get a NegativeArraySizeException at run time.

Related

Why is "new String();" a statement but "new int[0];" not?

I just randomly tried seeing if new String(); would compile and it did (because according to Oracle's Java documentation on "Expressions, Statements, and Blocks", one of the valid statement types is "object creation"),
However, new int[0]; is giving me a "not a statement" error.
What's wrong with this? Aren't I creating an array object with new int[0]?
EDIT:
To clarify this question, the following code:
class Test {
void foo() {
new int[0];
new String();
}
}
causes a compiler error on new int[0];, whereas new String(); on its own is fine. Why is one not acceptable and the other one is fine?
The reason is a somewhat overengineered spec.
The idea behind expressions not being valid statements is that they accomplish nothing whatsoever. 5 + 2; does nothing on its own. You must assign it to something, or pass it to something, otherwise why write it?
There are exceptions, however: Expressions which, on their own, will (or possibly will) have side effects. For example, whilst this is illegal:
void foo(int a) {
a + 1;
}
This is not:
void foo(int a) {
a++;
}
That is because, on its own, a++ is not completely useless, it actually changes things (a is modified by doing this). Effectively, 'ignoring the value' (you do nothing with a + 1 in that first snippet) is acceptable if the act of producing the value on its own causes other stuff to happen: After all, maybe that is what you were after all along.
For that reason, invoking methods is also a legitimate expression statement, and in fact it is quite common to invoke methods (even ones that don't return void) and ignore the return value. For void methods it's even the only legal way to invoke them.
Constructors can run arbitrary code, much like methods, and so can have side effects. It is extremely unlikely, and quite bad code style, if this method:
void doStuff() {
new Something();
}
is 'sensible' code, but it could in theory be written, bad as it may be: The constructor of the Something class may do something useful and perhaps that's all you want to do here: Make that constructor run, do the useful thing, and then take the created object and immediately toss it in the garbage. Weird, but, okay. You're the programmer.
Contrast with:
new Something[10];
This is different: The compiler knows what the array 'constructor' does. And what it does is nothing useful - it creates an object and returns a reference to the object, and that is all that happens. If you then instantly toss the reference in the garbage, then the entire operation was a complete waste of time, and surely you did not intend to do nothing useful with such a bizarre statement, so the compiler designers thought it best to just straight up disallow you from writing it.
This 'oh dear that code makes no sense therefore I shall not compile it' is very limited and mostly an obsolete aspect of the original compiler spec; it's never been updated and this is not a good way to trust that code is sensible; there's all sorts of linter tools out there that go vastly further in finding you code that just cannot be right, so if you care about that sort of thing, invest in learning those.
Nevertheless, the Java 1.0 spec had this stuff baked in and there is no particularly good reason to drop this aspect of the Java spec; therefore it remains, and constructing a new array is not a valid ExpressionStatement.
As JLS §14.8 states, a ClassInstanceCreationExpression is in the list of valid expression statements. Follow that term to its definition and you'll find that it specifically refers to invoking constructors, and not to array construction.
Thus, the JLS is specific and requires this behaviour. javac is simply following the spec.
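A small sketch of the distinction described in this answer, assuming a made-up Something class whose constructor has a visible side effect (it bumps a counter). The rejected array statement is left as a comment so that the rest compiles.

```java
// Hypothetical example: Something and its counter are invented for illustration.
final class StatementDemo {
    static int constructed = 0;

    static final class Something {
        Something() {
            constructed++;                      // side effect of running the constructor
        }
    }

    static int run() {
        constructed = 0;
        new Something();                        // allowed: class instance creation as a statement
        // new Something[10];                   // rejected by javac: "not a statement"
        Something[] slots = new Something[10];  // allowed: array creation inside a declaration
        return constructed;                     // creating the array ran no Something constructor
    }

    public static void main(String[] args) {
        System.out.println("constructor calls: " + run());
    }
}
```

The array creation only becomes legal once its value is used, here by initializing a local variable.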

Integer return types in Java

So I have a quick question about Java.
So in C, some methods (especially the main) look like this:
int main(){
printf("Test");
return 0;
}
The int return type as some C developers on here may know shows if the method returns an error or not, with 0 being the return showing that there isn't an error. So I thought, well, if I did something like this as a method in Java:
public int test(){
return 0;
}
Would the integer return show that there is/isn't an error returned?
Would the integer return show that there is/isn't an error returned?
Not usually, although every API designer can make their own choices.
Normally, in Java, an error is modelled as an Exception and is thrown rather than returned. (As is the case with modern C++, as I understand it.)
So for instance, if you had a function that parsed integers, you might have:
int parseInt(String str) throws NumberFormatException {
// ...implementation
}
If parseInt returns, it's returned successfully. Otherwise, it throws. You'd use it like this:
try {
int value = parseInt(str);
doSomethingWith(value);
// more main-line code here
// ...
}
catch (NumberFormatException ex) {
// Deal with the exceptional case of an invalid input
}
This tutorial on Oracle's Java site goes into detail around exceptions and exception handling.
Now, sometimes you may have a method that should return an int or a flag indicating a non-exceptional condition indicating the int isn't available. Since it's normal, you expect it to happen, it shouldn't be an exception, but you still need to flag it up somehow. In that situation, one option is to use Integer instead of int and return null in the non-exception case where the int isn't available. Integer is an object type corresponding to the primitive int. Java's auto-boxing and auto-unboxing makes it fairly straightforward to use Integer in these situations.
What ends up being "exceptional" or "non-exception" is very much a judgement call, and consequently you see different API designers doing things slightly differently.
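As a rough illustration of the Integer-or-null option, here is a sketch with a hypothetical helper (FirstNegative.find is not from any library) that returns the first negative element of an array, or null when there is none. Having no negative element is an expected outcome here, so no exception is involved.

```java
// Hypothetical helper: returns the first negative element, or null when there is none.
final class FirstNegative {

    static Integer find(int[] values) {
        for (int v : values) {
            if (v < 0) {
                return v;                   // auto-boxed from int to Integer
            }
        }
        return null;                        // expected, non-exceptional "no value" case
    }

    public static void main(String[] args) {
        Integer hit = find(new int[] {3, -4, 5});
        if (hit != null) {
            int value = hit;                // auto-unboxed back to int
            System.out.println("found " + value);
        }
        System.out.println(find(new int[] {1, 2}) == null ? "none" : "unexpected");
    }
}
```

The caller has to test for null before unboxing; unboxing a null Integer throws a NullPointerException.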
In C main's return value is copied into the exit code value of the process.
In Java, the JVM is really what is running, and it just "starts" the user provided set of classes according to the method public static void main(String[] args) in the class you provide to the command line.
So your class doesn't actually exit, the java program exits. That said, there is a way to get the java program to return with a specific exit code. You call System.exit(5) and java will exit with the exit code 5.
Note that exit codes are not generally portable. In C, for example, the portable way to report success is to return EXIT_SUCCESS (the C standard also treats 0 as success) and the portable way to report failure is EXIT_FAILURE; the meaning of any other value depends on the environment.
Some systems don't support exit code (they are rare). Exit codes can be different than 0 for success (again relatively rare). Java avoided all of these because it was promoting the throwing of Exceptions, over the reporting of error codes. That said, sometimes to integrate with other systems, you really do need to emit an exit code. Just don't think of it as a very "java" way of doing things. An exception that gets to the top of the Stack in the JVM gives a stack trace dump which generally allows one to find and fix the issue far easier than any exit code.
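If you do need an exit code for integration purposes, one possible shape is sketched below. The ExitCodeDemo class, its run method, and the choice of 2 as the failure code are assumptions made for this example; only System.exit and Integer.parseInt are real library calls. Keeping the logic in a method that returns an int means System.exit is called in exactly one place.

```java
// Hypothetical sketch: errors are exceptions internally and become an exit code only at the edge.
final class ExitCodeDemo {

    // Returns the exit status instead of exiting, so the logic can be called from other code.
    static int run(String[] args) {
        try {
            if (args.length == 0) {
                throw new IllegalArgumentException("expected one integer argument");
            }
            int value = Integer.parseInt(args[0]);  // throws NumberFormatException on bad input
            System.out.println("parsed " + value);
            return 0;                               // conventional "success"
        } catch (IllegalArgumentException e) {      // NumberFormatException is a subclass
            System.err.println("error: " + e.getMessage());
            return 2;                               // arbitrary non-zero code chosen for this sketch
        }
    }

    public static void main(String[] args) {
        System.exit(run(args));                     // the JVM exits with the given status
    }
}
```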
You're conflating a few things here. Processes on most operating systems have an exit code. This, by convention, often signals success or error of the process. In C this exit code is surfaced as the return value of the entry point.
Furthermore, in C it's a common pattern that functions return some special value (often 0 or NULL) on error, since there are no exceptions to signal that. Again, just a convention.
In Java both of those are fairly rare. Exceptions are often a nicer way of signalling an unexpected error condition, and the entry point doesn't have a return value.
Would the integer return show that there is/isn't an error returned?
That depends entirely on how the function that called it intends to use that value. An integer return value may be used for an error code, or it may be used for a count of things (length of a string, number of elements in a tree, shabazzes frobbed), or it may indicate a value on a scale (temperature, elevation, dollars, etc.).
In some cases a value of 0 is an out-of-band value (not valid for the given domain, like the zero terminator at the end of a C-style string or a NULL pointer). In other cases it's a perfectly valid data value (temperature, elevation, heading).
Notice that in Java main returns void, not int. Java's philosophy of detecting and handling errors is different from C's.

"Local variable is redundant" using Java

Why is the following giving me a "local variable is redundant error"?
public double depreciationAmount() {
double depreciationAmount = (cost * percentDepreciated);
return depreciationAmount;
}
Why is the following giving me a "local variable is redundant error"?
Because you can trivially write this without using a local variable.
public double depreciationAmount() {
return cost * percentDepreciated;
}
Hence the local variable is deemed to be unnecessary / redundant by the checker.
However, I surmise that this is not a compiler error. It might be a compiler warning, or more likely it is a style checker or bug checker warning. It is something you could ignore without any risk to the correctness of your code ... as written.
Also, I would predict that once the code has been JIT compiled (by a modern HotSpot JIT compiler ...) there would be no performance difference between the two versions.
I won't attempt to address the issue as to whether the warning is appropriate [1]. If you feel it is inappropriate, then another answer to this question explains how to suppress it.
[1] Except to say that it is too much to expect current generation style checkers to know when so-called explaining variables are needed. First you'd need to get a statistically significant [2] group of developers to agree on measurable [3] criteria for when the variables are needed, and when they aren't.
[2] Yea, I know. Abuse of terminology.
[3] They must be measurable, and there needs to be consensus on what the thresholds should be if this is to be implemented by a checker.
Although not the case here, if having a redundant local variable is desired (I've had one time where this was the case - without getting into specifics), here's how to suppress this specific warning.
@SuppressWarnings("UnnecessaryLocalVariable")
public double depreciationAmount() {
double depreciationAmount = (cost * percentDepreciated);
return depreciationAmount;
}
You only use the value of depreciationAmount to return it, when you could have just written return (cost * percentDepreciated);.
Why is the following giving me a "local variable is redundant error"?
I believe this message is wrong. Your depreciationAmount variable assignment is totally fine. Moreover, I always prefer this kind of assignment before return, because it helps to avoid confusion while debugging.
In this example, the getValue() method returns an expression result, instead of assigning the expression result to a variable.
Now when I use a debugger watch to see the result of the expression, I get confused: my program ends with the wrong result and the debugger watch values are inconsistent. It would be easy to avoid this if I had a variable assigned before the return statement:
Integer value = 1 + getCounter();
return value;
instead of:
return 1 + getCounter();
Now I can put a breakpoint at the return statement and know what was the result of the expression, before it was returned. Also I do not need the expression in the watch any more, and code will be executed correctly while debugging.
In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary. In the above code, you can simply return:
(cost * percentDepreciated)

Why can't ArrayStoreExceptions be caught by the Java compiler?

I understand what an ArrayStoreException is. My question is: why isn't this caught by the compiler?
This might be an odd example, but say you do this:
HashMap[] h = new LinkedHashMap[4];
h[0] = new PrinterStateReasons();
Why can't the compiler recognize that this isn't valid?
Because the information you've given the compiler allows what you're doing. It's only the runtime state that's invalid. Your h variable is declared as being a HashMap[], which means that as far as h is concerned, any HashMap (including an instance of any subclass of HashMap) is a valid element. PrinterStateReasons extends HashMap, and so h[0] = new PrinterStateReasons(); is a perfectly valid statement. Similarly, since LinkedHashMap extends HashMap, the statement HashMap[] h = new LinkedHashMap[4]; is a perfectly valid statement. It's only at runtime that you try to store a PrinterStateReasons object as an element in a LinkedHashMap array, which you can't do because a PrinterStateReasons is not a LinkedHashMap.
The two statements you've given are contiguous, but of course the generalized reality is far more complex. Consider:
HashMap[] h = foo.getHashMapArray();
h[0] = new PrinterStateReasons();
// ... elsewhere, in some `Foo` class -- perhaps compiled
// completely separately from the code above, perhaps
// even by a completely different team and even a different
// compiler -- and only combined with the code above at runtime...
public HashMap[] getHashMapArray() {
return new LinkedHashMap[4];
}
Well, I suppose that a smart compiler would be able to statically analyse that h can never be anything other than a LinkedHashMap[] at line 2.
But without that analysis (potentially rather complex, though perhaps not in this simple case) the compiler can't really know what is assigned to h. You can assign a PrinterStateReasons to a HashMap variable, just not to a LinkedHashMap one.
What the compiler can't catch is when you refer to a specific type array using a more generic type array:
String[] s = new String[10];
Object[] o = s;
o[0] = new Integer(5);
The compiler can't detect it because it has no way, from the variable declaration, to know the actual type of the array.
Note that you won't have this problem with generic collections, because although a String[] is also an Object[], a List<String> is not a List<Object>. One more reason to prefer collections over arrays.
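The contrast between arrays and generic collections can be seen in a short sketch. The class and method names are invented for illustration, and the line that the compiler rejects is left as a comment so the rest compiles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of array covariance versus generic invariance.
final class CovarianceDemo {

    // Arrays: the assignment compiles, the bad store is only detected at runtime.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];       // legal: String[] is also an Object[]
        try {
            objects[0] = Integer.valueOf(5);    // compiles, but the array is really a String[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Generics: the equivalent mistake is rejected by the compiler.
    static int genericListSize() {
        List<String> strings = new ArrayList<>();
        // List<Object> objects = strings;      // does not compile: incompatible types
        strings.add("ok");
        return strings.size();
    }

    public static void main(String[] args) {
        System.out.println("array store failed at runtime: " + arrayStoreFails());
        System.out.println("list size: " + genericListSize());
    }
}
```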
The Java Language Specification says that this is a valid program. A compiler that flagged it as an error would not be implementing the specification, and therefore would not be a compliant Java compiler. The real harm of a non-compliant compiler is that it leads to people writing source code that is not portable; e.g. code that compiles with one compiler but not another.
The best that a compiler could legally do is to warn you that the code will always throw an exception. For instance, some compilers will warn you about null pointer exceptions that will always be thrown. I guess they don't do this in the array index case because the analysis is more complicated, and the mistake is made less often.

Is -1 a magic number? An anti-pattern? A code smell? Quotes and guidelines from authorities [duplicate]

I've seen -1 used in various APIs, most commonly when searching into a "collection" with zero-based indices, usually to indicate the "not found" index. This "works" because -1 is never a legal index to begin with. It seems that any negative number should work, but I think -1 is almost always used, as some sort of (unwritten?) convention.
I would like to limit the scope to Java at least for now. My questions are:
What are the official words from Sun regarding using -1 as a "special" return value like this?
What quotes are there regarding this issue, from e.g. James Gosling, Josh Bloch, or even other authoritative figures outside of Java?
What were some of the notable discussions regarding this issue in the past?
This is a common idiom in languages where the types do not include range checks. An "out of bounds" value is used to indicate one of several conditions. Here, the return value indicates two things: 1) was the character found, and 2) where was it found.
The use of -1 for not found and a non-negative index for found succinctly encodes both of these into one value, and the fact that not-found does not need to return an index.
In a language with strict range checking, such as Ada or Pascal, the method might be implemented as (pseudo code)
bool indexOf(c:char, position:out Positive);
Positive is a subtype of int, but restricted to non-negative values.
This separates the found/not-found flag from the position. The position is provided as an out parameter - essentially another return value. It could also be an in-out parameter, to start the search from a given position. Use of -1 to indicate not-found would not be allowed here since it violates range checks on the Positive type.
The alternatives in Java are:
throw an exception: this is not a good choice here, since not finding a character is not an exceptional condition.
split the result into several methods, e.g. boolean indexOf(char c); int lastFoundIndex();. This implies the object must hold on to state, which will not work in a concurrent program, unless the state is stored in thread-local storage, or synchronization is used - all considerable overheads.
return the position and found flag separately: such as boolean indexOf(char c, Position pos). Here, creating the position object may be seen as unnecessary overhead.
create a multi-value return type, such as:
class FindIndex {
boolean found;
int position;
}
FindIndex indexOf(char c);
although it clearly separates the return values, it suffers object creation overhead. Some of that could be mitigated by passing the FindIndex as a parameter, e.g.
FindIndex indexOf(char c, FindIndex start);
Incidentally, multiple return values were going to be part of Java (Oak), but were axed prior to 1.0 to cut time to release. James Gosling says he wishes they had been included. It's still a wished-for feature.
My take is that the use of magic values is a practical way of encoding a multi-valued result (a flag and a value) in a single return value, without requiring excessive object creation overhead.
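On newer Java versions (8 and later), java.util.OptionalInt is one ready-made form of such a flag-plus-value result. The sketch below is an alternative to the options listed above, not something this answer proposes, and the find method is hypothetical. It still allocates an object when a value is present, so the overhead argument applies to it as well.

```java
import java.util.OptionalInt;

// Hypothetical wrapper around String.indexOf that separates "found" from "position".
final class FindDemo {

    static OptionalInt find(String s, char c) {
        int i = s.indexOf(c);               // -1 when c does not occur in s
        return i >= 0 ? OptionalInt.of(i) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        OptionalInt hit = find("abc", 'c');
        if (hit.isPresent()) {
            System.out.println("found at " + hit.getAsInt());
        }
        System.out.println("z present: " + find("abc", 'z').isPresent());
    }
}
```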
However, if using magic values, it's much nicer to work with if they are consistent across related api calls. For example,
// get everything after the first c
int index = str.indexOf('c');
String afterC = str.substring(index);
Java falls short here, since passing -1 to substring will cause an IndexOutOfBoundsException. Instead, it might have been more consistent for substring to return "" when invoked with -1, if negative values are considered to start at the end of the string. Critics of magic values for error conditions say that the return value can be ignored (or assumed to be positive). A consistent API that handles these magic values in a useful way would reduce the need to check for -1 and allow for cleaner code.
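With the API as it stands, the check for -1 has to be written by hand. A minimal sketch, with a made-up helper name, that returns "" when the character is absent (which is the behaviour suggested above, not what the JDK does):

```java
// Hypothetical helper: everything from the first occurrence of c onwards, or "" if c is absent.
final class FromFirst {

    static String fromFirst(String str, char c) {
        int index = str.indexOf(c);                     // -1 when c is not in str
        return index >= 0 ? str.substring(index) : "";
    }

    // Shows what happens when the -1 is passed straight through to substring.
    static boolean substringRejectsMinusOne(String str) {
        try {
            str.substring(-1);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromFirst("abcde", 'c'));
        System.out.println(fromFirst("abde", 'c').isEmpty());
        System.out.println(substringRejectsMinusOne("abde"));
    }
}
```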
Is -1 a magic number?
In this context, not really. There is nothing special about -1 ... apart from the fact that it is guaranteed to be an invalid index value by virtue of being negative.
An anti-pattern?
No. To qualify as an anti-pattern there would need to be something harmful about this idiom. I see nothing harmful in using -1 this way.
A code smell?
Ditto. (It is arguably better style to use a named constant rather than a bare -1 literal. But I don't think that is what you are asking about, and it wouldn't count as "code smell" anyway, IMO.)
Quotes and guidelines from authorities
Not that I'm aware of. However, I would observe that this "device" is used in various standard classes. For example, String.indexOf(...) returns -1 to say that the character or substring could not be found.
As far as I am concerned, this is simply an "algorithmic device" that is useful in some cases. I'm sure that if you looked back through the literature, you will see examples of using -1 (or 0 for languages with one-based arrays) this way going back to the 1960's and before.
The choice of -1 rather than some other negative number is simply a matter of personal taste, and (IMO) not worth analyzing in this context.
It may be a bad idea for a method to return -1 (or some other value) to indicate an error instead of throwing an exception. However, the problem here is not the value returned but the fact that the method is requiring the caller to explicitly test for errors.
The flip side is that if the "condition" represented by -1 (or whatever) is not an "error" / "exceptional condition", then returning the special value is both reasonable and proper.
Both Java and JavaScript use -1 when an index isn't found. Since the index is always 0-n it seems a pretty obvious choice.
//JavaScript
var url = 'example.com/foo?bar&admin=true';
if(url.indexOf('&admin') != -1){
alert('we likely have an insecure app!');
}
I find this approach (which I've used when extending Array-type elements to have a .indexOf() method) to be quite normal.
On the other hand, you can try the PHP approach e.g. strpos() but IMHO it gets confusing as there are multiple return types (it returns FALSE when not found)
-1 as a return value is slightly ugly but necessary. The alternatives to signal a "not found" condition are IMHO all much worse:
You could throw an Exception, but this isn't ideal because Exceptions are best used to signal unexpected conditions that require some form of recovery or propagated failure. Not finding an occurrence of a substring is actually pretty expected. Also Exception throwing has a significant performance penalty.
You could use a compound result object with (found,index) but this requires an object allocation and more complex code on the part of the caller to inspect the result.
You could separate out two separate function calls for contains and indexOf - however this is again quite cumbersome for the caller and also results in a performance hit as both calls would be O(n) and require a full traversal of the String.
Personally, I never like to refer to the -1 constant: my test for not-found is always something like:
int i = someString.indexOf("substring");
if (i>=0) {
// do stuff with found index
} else {
// handle not found case
}
It is good practice to define a final class variable for all constant values in your code.
But it is generally accepted to use 0, 1, -1, "" (empty string) without an explicit declaration.
This is an inheritance from C, where only a single primitive value could be returned. In Java you can also return a single object.
So for new code, return an object of a base type, with the subtype indicating the problem (to be checked with instanceof), or throw a "not found" exception.
For existing special values, make -1 a constant in your code, named accordingly (NOT_FOUND), so the reader can tell the meaning without having to check the Javadocs.
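A minimal sketch of the named-constant suggestion. The IntSearch class and its indexOf method are invented for this example; only the NOT_FOUND name comes from the text above.

```java
// Hypothetical search helper that names its special return value.
final class IntSearch {

    /** Returned by indexOf when the target does not occur in the array. */
    static final int NOT_FOUND = -1;

    static int indexOf(int[] values, int target) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) {
                return i;
            }
        }
        return NOT_FOUND;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        int position = indexOf(data, 25);
        if (position == NOT_FOUND) {            // reads as intent, no bare -1 at the call site
            System.out.println("25 is not in the array");
        }
    }
}
```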
The same practice as with null applies to -1. It's been discussed many times.
e.g. Java api design - NULL or Exception
It's used because it's the first invalid value you encounter in 0-based arrays. As you know, not all types can hold null or nothing, so you need "something" to signify nothing.
I would say it's not official; it has just become an (unwritten) convention because it's very sensible for the situation. Personally, I wouldn't call it an issue either. API design is also down to the author, but guidelines can be found online.
As far as I know, such values are called sentinel values, although most common definitions differ slightly from this scenario.
Languages such as Java chose not to support passing by reference (which I think is a good idea), so while the values of individual arguments are mutable, the variables passed to a function remain unaffected. As a consequence of this, you can only have one return value of only one type. So what you do is choose an otherwise invalid value of a valid type, and return it to transport additional semantics, because the return value is not actually the return value of the operation but a special signal.
Now I guess, the cleanest approach would be to have a contains and an indexOf method, the second of which would throw an exception, if the element you're asking for is not in the collection. Why? Because one would expect the following to be true:
someCollection.objectAtIndex(someCollection.indexOf(someObject)) == someObject
What you're likely to get is an exception because -1 is out of bounds, while the actual reason why this plausible relation is not true is, that someObject is not an element of someCollection, and that is why the inner call should raise the exception.
Now as clean and robust, as this may be, it has two key flaws:
Both operations would usually cost you O(n) (unless you have an inverse map within the collection), so you're better off if you do just one.
It is really quite verbose.
In the end, it's up to you to decide. This is a matter of philosophy. I'd call it a "semantic hack" to achieve both shortness & speed at the cost of robustness. Your call ;)
greetz
back2dos
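The contains-plus-throwing-indexOf design described in the answer above could be sketched as follows. The StrictLookup class and its methods are hypothetical; List.indexOf, List.contains and NoSuchElementException are the real JDK pieces used underneath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical strict lookup: a missing element is reported where the lookup happens.
final class StrictLookup {

    static <T> int indexOfOrThrow(List<T> items, T item) {
        int index = items.indexOf(item);        // standard lookup, -1 when absent
        if (index < 0) {
            throw new NoSuchElementException("not an element: " + item);
        }
        return index;
    }

    // Helper for the demo: true when the strict lookup refuses the item.
    static <T> boolean rejects(List<T> items, T item) {
        try {
            indexOfOrThrow(items, item);
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        if (letters.contains("b")) {            // first O(n) pass
            // second O(n) pass; the round trip through the index holds
            System.out.println(letters.get(indexOfOrThrow(letters, "b")));
        }
        System.out.println("z rejected: " + rejects(letters, "z"));
    }
}
```

The two passes in main are the cost mentioned in the first of the two flaws.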
like why 51% means everything among shareholders of a company, since it's the best nearest and makes sense rather than -2 or -3 ...
