Why isn't the ArrayIndexOutOfBoundsException a compile time error?

Can someone explain to me why ArrayIndexOutOfBoundsException is a run-time exception instead of a compile-time error?
In obvious cases when the indexes are negative or greater than the array size, I don't see why it cannot be a compile-time error.
Edited: especially when the size of the array and even the indexing is known at compile time, for example int[] a = new int[10]; a[-1]=5; This should be a compilation error.

The size of the arrays may be defined only at runtime (for instance, the simplest case, if the size of an array depends on the user input).
Therefore the compiler cannot, in general, check array accesses for this kind of exception, because it does not know the array's bounds (size).
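A minimal sketch of this point, in Java. The class and method names (BoundsDemo, inBounds, literalCase) are made up for illustration, and the length parameter stands in for user input:

```java
// Hypothetical demo class; the names are illustrative, not from the original post.
class BoundsDemo {
    // The array length is only known when the method is called (e.g. from user input).
    static boolean inBounds(int length, int index) {
        int[] a = new int[length];
        try {
            a[index] = 5;      // the JVM checks the index on every access
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;      // reported at run time, not by the compiler
        }
    }

    // Even with a literal size and a literal index, javac accepts this code;
    // the failure only shows up when the statement is executed.
    static boolean literalCase() {
        int[] a = new int[10];
        try {
            a[-1] = 5;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println(inBounds(length, 3));
        System.out.println(literalCase());
    }
}
```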

Because it can't be detected at compile-time all the time.

Entering a[-1] = 5; is something only novices would do (as Richard Tingle said). So it's not worth the effort to update the language standard just for that kind of error. A more interesting case would be a[SOME_CONSTANT] = 5; where SOME_CONSTANT was defined as static final int SOME_CONSTANT = -1; (or some expression involving only constants that computes to -1) in some other class. Even then, however, if the compiler flagged this as an error, it might catch cases where the programmer has put a[SOME_CONSTANT] = 5; in an if statement that has already checked for negative values of the constant. (I'm assuming here that SOME_CONSTANT is a constant whose value could change if the application's requirements change.) So while the language could, in theory, make it illegal to write an array indexing operation that can't possibly succeed, there are good reasons not to.
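The guarded-constant case described above might look like the following sketch. SOME_CONSTANT comes from the answer; the class name, the fill method, and the meaning given to -1 are assumptions for illustration:

```java
// Hypothetical example of an index constant protected by a check.
class ConstantIndexDemo {
    // Assumed meaning for this sketch: -1 says "no slot configured".
    // The value could change if the application's requirements change.
    static final int SOME_CONSTANT = -1;

    static int[] fill() {
        int[] a = new int[10];
        // The indexing below can never succeed with the current constant,
        // but it is also never executed, so rejecting it would be unhelpful.
        if (SOME_CONSTANT >= 0) {
            a[SOME_CONSTANT] = 5;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(fill().length);
    }
}
```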
P.S. This is a real issue. The Ada language does do some compile-time checking for static expressions that can't succeed, but it doesn't check this case, and there has been some discussion in the last few weeks about whether it should, or whether compilers should be allowed (but not required) to reject programs with array indexing that is known to fail.

There is no way to check all indexes at compile time, because they can be variables whose values change at runtime. If you have array[i] and i is the result of reading a file, you can evaluate i only when executing the program. Even if the index is a constant, remember that the array variable can be reassigned to an array with a different capacity. Again, this can be checked only at runtime.
Check this question for more information: Runtime vs Compile time.

As well as agreeing with the fact that array size can't be checked at compile time, I want to add another note on the limit of the size of an array, which is expected to be in the range of primitive int:
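// This compiles, because the size expression has type int. The sum overflows to
// Integer.MIN_VALUE, though, so it throws NegativeArraySizeException at run time.
int[] array = new int[Integer.MAX_VALUE + 1];
// This doesn't compile: a long cannot be used as an array size.
int[] other = new int[Long.MAX_VALUE];
The second error arises because arrays are special Java objects whose length field is an int.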
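A small sketch of what happens with the first allocation at run time. The class and method names (ArraySizeDemo, tryAllocate) are hypothetical:

```java
// Hypothetical demo class; names are illustrative.
class ArraySizeDemo {
    // Tries to allocate an int[] of the given size and reports what happened.
    static String tryAllocate(int size) {
        try {
            int[] array = new int[size];
            return "allocated " + array.length;
        } catch (NegativeArraySizeException e) {
            return "NegativeArraySizeException";
        }
    }

    public static void main(String[] args) {
        // int arithmetic wraps around, so this sum is Integer.MIN_VALUE.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
        System.out.println(tryAllocate(Integer.MAX_VALUE + 1));
    }
}
```

The compiler accepts the size because its type is int; the negative value is rejected only when the allocation is attempted.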

When working with pointers, it's possible to use negative indexes without an error, provided the memory position you access has been correctly reserved. Here is an example. In low-level programming languages things like this are done very frequently, but they don't make a lot of sense in high-level languages, at least to me.
int arr[10];
int* p = &arr[2];
int x = p[-2]; // valid: accesses arr[0]
If you try to do:
arr[-5] // accesses memory outside the array: undefined behavior in C, with no guaranteed error
This may be helpful and interesting:
http://www-ee.eng.hawaii.edu/~tep/EE160/Book/chap7/subsection2.1.3.2.html

Related

Array Length in Java - Performance [duplicate]

Let's say I create an array of ints with length 10, i.e.
int[] array = new int[10];
At some point in my code, I want to compare the value of an int variable, let's call it var, with the length of the array.
I would like to know if this piece of code:
if(var == array.length) { // stuff }
and this piece of code:
if(var == 10) { // stuff }
which do exactly the same thing, have also the same performance.
In other words, I would like to know the internal mechanics that the JVM (?) uses to find the length of the array (I don't say "to return" since length is a field, not a method). Does it make use of iteration? Because if it does, then the 2nd piece of code would be faster than the 1st one.
EDIT: Similar question regarding array.length cost (even though focusing more to its use in for loops):
What is the Cost of Calling array.length

.length is a property, so it certainly does not iterate. Still, the value of the property is fetched at runtime, so the second version may be marginally faster (it is a comparison with a constant).
Still, the first implementation is far preferable:
It makes your code much more maintainable.
You can alter the length of the array in only one place.
You will never feel the performance difference unless you pass through this if literally millions of times a second.
EDIT: By the way, you can tell for yourself that this is a property: there are no parentheses after the name. As far as I know, there is no way in Java to make a field access do additional computation; it just retrieves the value.

.length is a property of the array, not a function. Thus, the result would be available immediately, with no iteration necessary.

From the Java Language Specification (§10.7, Array Members):
The members of an array type are all of the following:
The public final field length, which contains the number of components
of the array. length may be positive or zero.
length is a final field of the array, so no iteration is required when writing the following code.
if(var == array.length) { // stuff }
And it is good coding practice indeed.

The length property of an array is extracted in constant (O(1)) time - there is no iteration needed. It's also good practice to use this.
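For illustration, the two comparisons from the question can be put side by side. javac compiles array.length to the single arraylength bytecode instruction, which can be checked with javap -c. The class and method names here are hypothetical:

```java
// Illustrative class; the names are not from the original posts.
class LengthDemo {
    static final int[] ARRAY = new int[10];

    // Reads the array's length field; no loop over the elements is involved.
    static boolean matchesLength(int var) {
        return var == ARRAY.length;
    }

    // Hard-coded variant: it silently goes wrong if the array size is ever changed.
    static boolean matchesLiteral(int var) {
        return var == 10;
    }

    public static void main(String[] args) {
        System.out.println(matchesLength(10));
        System.out.println(matchesLiteral(10));
    }
}
```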

Error that is neither syntactic nor semantic?

I had this question on a homework assignment (don't worry, already done):
[Using your favorite imperative language, give an example of
each of ...] An error that the compiler can neither catch nor easily generate code to
catch (this should be a violation of the language definition, not just a
program bug)
From "Programming Language Pragmatics" (3rd ed) Michael L. Scott
My answer was to call main from main, passing in the same arguments (in C and Java), inspired by this. But I personally felt like that would just be a semantic error.
To me, this question is asking how to produce an error that is neither syntactic nor semantic, and frankly, I can't really think of a situation where it wouldn't fall into either category.
Would it be code that is susceptible to exploitation, like buffer overflows (and maybe other exploits I've never heard about)? Some sort of pitfall from the structure of the language (IDK, but lazy evaluation/weak type checking)? I'd like a simple example in Java/C++/C, but other examples are welcome.

Undefined behaviour springs to mind. A statement invoking UB is neither syntactically nor semantically incorrect, but rather the result of the code cannot be predicted and is considered erroneous.
An example of this would be (from the Wikipedia page) an attempt to modify a string-constant:
char * str = "Hello world!";
str[0] = 'h'; // undefined-behaviour here
Not all UB-statements are so easily identified though. Consider for example the possibility of signed-integer overflow in this case, if the user enters a number that is too big:
// get number from user
char input[100];
fgets(input, sizeof input, stdin);
int number = strtol(input, NULL, 10);
// print its square: possible integer-overflow if number * number > INT_MAX
printf("%i^2 = %i\n", number, number * number);
Here there may not necessarily be signed-integer overflow. And it is impossible to detect it at compile- or link-time since it involves user-input.
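For comparison, since the question also asks about Java: there the same computation is not undefined. int arithmetic wraps around silently, and Math.multiplyExact can be used to detect the overflow at run time. A rough sketch, with a hypothetical class name and a string parameter standing in for the user's input:

```java
// Hypothetical Java counterpart of the C snippet above.
class SquareDemo {
    static String square(String input) {
        int number = Integer.parseInt(input.trim());
        try {
            // multiplyExact throws ArithmeticException instead of wrapping around.
            return number + "^2 = " + Math.multiplyExact(number, number);
        } catch (ArithmeticException e) {
            return number + "^2 overflows int";
        }
    }

    public static void main(String[] args) {
        System.out.println(square("12"));
        System.out.println(square("46341"));
    }
}
```

As in the C version, nothing here can be diagnosed at compile time, because the value only exists once the program runs.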

Statements invoking undefined behavior [1] are semantically as well as syntactically correct but make programs behave erratically.
a[i++] = i; // Both the syntax (symbolic representation) and the semantics (meaning) are correct, but this invokes UB.
Another example is using a pointer without initializing it.
Logical errors are also neither semantic nor syntactic.
[1] Undefined behavior: Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Here's an example for C++. Suppose we have a function:
int incsum(int &a, int &b) {
return ++a + ++b;
}
Then the following code has undefined behavior because it modifies an object twice with no intervening sequence point:
int i = 0;
incsum(i, i);
If the call to incsum is in a different TU from the definition of the function, then it's impossible to catch the error at compile time, because neither bit of code is inherently wrong on its own. It could be detected at link time by a sufficiently intelligent linker.
You can generate as many examples as you like of this kind, where code in one TU has behavior that's conditionally undefined for certain input values passed by another TU. I went for one that's slightly obscure, you could just as easily use an invalid pointer dereference or a signed integer arithmetic overflow.
You can argue how easy it is to generate code to catch this -- I wouldn't say it's very easy, but a compiler could notice that ++a + ++b is invalid if a and b alias the same object, and add the equivalent of assert (&a != &b); at that line. So detection code can be generated by local analysis.

Multidimensional arrays with different sizes

I just had an idea to test something out and it worked:
String[][] arr = new String[4][4];
arr[2] = new String[5];
for(int i = 0; i < arr.length; i++)
{
System.out.println(arr[i].length);
}
The output obviously is:
4
4
5
4
So my questions are:
Is this good or bad style of coding?
What could this be good for?
And most of all, is there a way to create such a construct in the declaration itself?
Also... why is it even possible to do?

Is this good or bad style of coding?
Like anything, it depends on the situation. There are situations where jagged arrays (as they are called) are in fact appropriate.
What could this be good for?
Well, for storing data sets with different lengths in one array. For instance, if we had the strings "hello" and "goodbye", we might want to store their character arrays in one structure. These char arrays have different lengths, so we would use a jagged array.
And most of all, is there a way to create such a construct in the declaration itself?
Yes:
char[][] x = {{'h','e','l','l','o'},
{'g','o','o','d','b','y','e'}};
Also... why is it even possible to do?
Because it is allowed by the Java Language Specification, §10.6.
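The "hello"/"goodbye" example can also be built from the strings themselves instead of being typed out as an initializer. A minimal sketch; the class and method names are assumptions:

```java
// Hypothetical helper; names are illustrative.
class JaggedChars {
    // Builds one row per word; each row is exactly as long as its word.
    static char[][] toJagged(String... words) {
        char[][] rows = new char[words.length][];   // row lengths left unspecified
        for (int i = 0; i < words.length; i++) {
            rows[i] = words[i].toCharArray();
        }
        return rows;
    }

    public static void main(String[] args) {
        for (char[] row : toJagged("hello", "goodbye")) {
            System.out.println(row.length);
        }
    }
}
```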

(1) This is a fine style of coding; there's nothing wrong with it. I've created jagged arrays myself for different problems in the past.
(2) This is good because you might need to store data in this way. Data stored this way can save memory, and it can be a natural way to map items more efficiently in certain scenarios.
(3) In a single line, without explicitly populating the array? No. This is the closest thing I can think of:
int[][] test = new int[10][];
test[0] = new int[100];
test[1] = new int[500];
This allows you to populate the rows with arrays of different lengths. I prefer this approach to populating with values like so:
int[][] test = new int[][]{{1,2,3},{4},{5,6,7}};
because it is more readable, and more practical to type when dealing with large ragged arrays.
(4) It's possible for the reasons given in (2). People have valid reasons for needing ragged arrays, so the creators of the language gave us a way to do it.

(1) While nothing is technically/functionally/syntactically wrong with it, I would say it is bad coding style because it breaks the assumption provided by the initialization of the object (String[4][4]). This, ultimately, is up to user preference; if you're the only one reading it, and you know exactly what you're doing, it would be fine. If other people share/use your code, it adds confusion.
(2) The only concept I could think of is if you had multiple arrays to be read in, but didn't know the size of them beforehand. However, it would make more sense to use ArrayList<String> in that case, unless the added overhead was a serious matter.
(3) I'm not sure what you're asking about here. Do you mean, can you somehow specify individual array lengths in that initial declaration? The answer to that is no.
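(4) It's possible because a two-dimensional array in Java is really an array of references to separate one-dimensional arrays. Assigning to arr[2] does not extend or shrink the existing row (array lengths are fixed once allocated); it replaces the reference with a newly allocated array, and the old row becomes eligible for garbage collection once nothing else refers to it.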
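A short sketch of what the assignment in the question does to the rows. The class and method names are hypothetical:

```java
// Hypothetical demo; names are illustrative.
class RowReplacementDemo {
    // Collects the length of every row of a two-dimensional array.
    static int[] rowLengths(String[][] arr) {
        int[] lengths = new int[arr.length];
        for (int i = 0; i < arr.length; i++) {
            lengths[i] = arr[i].length;
        }
        return lengths;
    }

    // Same steps as in the question: allocate 4x4, then swap in a longer third row.
    static String[][] build() {
        String[][] arr = new String[4][4];
        arr[2] = new String[5];
        return arr;
    }

    // The original row is a separate object and keeps its length of 4.
    static boolean oldRowUnchanged() {
        String[][] arr = new String[4][4];
        String[] oldRow = arr[2];
        arr[2] = new String[5];
        return oldRow.length == 4 && arr[2].length == 5 && arr[2] != oldRow;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(rowLengths(build())));
        System.out.println(oldRowUnchanged());
    }
}
```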

Existing solution to "smart" initial capacity for StringBuilder

I have a piece of logging- and tracing-related code that is called often throughout the code base, especially when tracing is switched on. StringBuilder is used to build a String. The strings have a reasonable maximum length, I suppose on the order of hundreds of chars.
Question: Is there an existing library that does something like this:
// In reality StringBuilder is final, so this would have to be a
// delegating class instead, which is quite big because of all
// the append() overloads.
public class SmarterBuilder extends StringBuilder {
    private final AtomicInteger capRef;
    SmarterBuilder(AtomicInteger capRef) {
        // super(...) has to be the first statement of the constructor.
        // Optionally save memory at the expense of worst-case resizes:
        // super(capRef.get() * 3 / 4);
        super(capRef.get());
        this.capRef = capRef;
    }
    public void syncCap() {
        // call when the string is fully built
        int cap;
        do {
            cap = capRef.get();
            if (cap >= length()) break;
        } while (!capRef.compareAndSet(cap, length()));
    }
}
To take advantage of this, my logging-related class would have a shared capRef variable with suitable scope.
(Bonus Question: I'm curious, is it possible to do syncCap() without looping?)
Motivation: I know the default capacity of StringBuilder is always too small. I could (and currently do) throw in an ad-hoc initial capacity value of 100, which results in a resize in some cases, but not always. However, I do not like magic numbers in the source code, and this feature is a case of "optimize once, use in every project".

Make sure you do the performance measurements to make sure you really are getting some benefit for the extra work.
As an alternative to a StringBuilder-like class, consider a StringBuilderFactory. It could provide two static methods, one to get a StringBuilder, and the other to be called when you finish building a string. You could pass it a StringBuilder as argument, and it would record the length. The getStringBuilder method would use statistics recorded by the other method to choose the initial size.
There are two ways you could avoid looping in syncCap:
Synchronize.
Ignore failures.
The argument for ignoring failures in this situation is that you only need a random sampling of the actual lengths. If another thread is updating at the same time you are getting an up-to-date view of the string lengths anyway.
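A minimal sketch of the factory idea, assuming Java 8 or later. StringBuilderFactory and its method names are hypothetical, not an existing library. It uses AtomicInteger.accumulateAndGet with Math::max, which removes the explicit loop from the calling code (the method still retries a compare-and-set internally):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper sketched from the description above; not an existing library class.
final class StringBuilderFactory {
    // Largest finished length seen so far; 16 is StringBuilder's default capacity.
    private static final AtomicInteger CAPACITY = new AtomicInteger(16);

    private StringBuilderFactory() {}

    // Hands out a builder pre-sized to the largest string recorded so far.
    static StringBuilder getStringBuilder() {
        return new StringBuilder(CAPACITY.get());
    }

    // Call when the string is fully built: records its length and returns the string.
    static String finish(StringBuilder sb) {
        CAPACITY.accumulateAndGet(sb.length(), Math::max);
        return sb.toString();
    }

    static int currentCapacity() {
        return CAPACITY.get();
    }

    public static void main(String[] args) {
        StringBuilder sb = getStringBuilder();
        sb.append("trace: entering method with a fairly long diagnostic message");
        System.out.println(finish(sb));
        System.out.println(currentCapacity());
    }
}
```

Tracking the maximum means the capacity never shrinks; whether that trade-off pays off is exactly what the suggested performance measurements would have to show.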

You could store the length of each string in a statistics array. Run your app, and at shutdown take the 90th percentile of your string lengths (sort all the length values and take the value at array position sortedStrings.size() * 0.9).
That way you get an initial StringBuilder size in which 90% of your strings will fit.
Update
The value could be hard-coded (as Java does with the value 10 in ArrayList), read from a config file, or calculated automatically in a test phase. But the percentile calculation is not free, so it is best to run your project for some time, measure the 90th percentile on the fly inside the SmartBuilder, output it from time to time, and later change the property file to use that value.
That way you would get optimal results for each project.
Or, to go one step further, let your smart builder update that value in the config file from time to time.
But none of this is worth the effort unless the data has millions of entries, like digital road maps.
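The percentile step could be sketched as follows. CapacityStats and percentile are hypothetical names, and the position is clamped so that a fraction of 1.0 does not run past the end of the array:

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative.
class CapacityStats {
    // Returns the value at position (count * fraction) of the sorted samples.
    static int percentile(int[] lengths, double fraction) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("no samples recorded");
        }
        int[] sorted = lengths.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int pos = (int) (sorted.length * fraction);
        return sorted[Math.min(pos, sorted.length - 1)];
    }

    public static void main(String[] args) {
        int[] recorded = {40, 10, 30, 20};
        System.out.println(percentile(recorded, 0.5));
    }
}
```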

Troubleshooting Java code that refuses to cooperate

The string called "code" doesn't seem to read. Why is that and how do I fix it?
My code (the snippet that causes problems):
String code;
for(int z = 0; z<x;z= z+0) // Repeat once for every character in the input string remaining
{
for(int y=0;y<2;y++) //Repeat twice
{
c = (char)(r.nextInt(26) + 'a'); //Generate a random character (lowercase)
ca = Character.toString(c);
temp = code;
code = temp + ca; //Add a random character to the encoded string
}
My error report:
--------------------Configuration: <Default>--------------------
H:\Java\Compiler.java:66: variable code might not have been initialized
temp = code;
^
1 error
Process completed.
(I am using JCreator 5.00, Java 7.)
(Yes, the error report looks stupid, but Stack Overflow reads it as code.)

What value would code have if x is zero? The answer is it would have no value at all (not even null). You could just initialize it to an empty string if you like:
String code = "";

Java requires that every variable is initialized before its value is used. In this example, there is a fairly obvious case in which the variable is used before it is assigned. The Java Language Spec (JLS) doesn't allow this. (If it did, the behaviour of programs would be unpredictable, including ... potentially ... JVM crashes.)
In other cases, the compiler complains when in fact the variable in question is always initialized (or so it seems). Rather than "understanding" your code, or trying to derive a logical proof of initialization, the compiler follows a specified procedure for deciding if the variable is definitely assigned. This procedure is conservative in nature, and the answer it gives is either "it is initialized" or "it might not be initialized". Hence the wording of the compilation error message.
Here is an example in which the compiler will complain, even though it is "obvious" that the variable is initialized before use:
boolean panic;
for (int i = 0; i < 10; i += 2) {
if (i % 2 == 1 && panic) { // compilation error here
System.out.println("Panic!!");
}
}
The definite assignment rules (specified in the JLS) say that panic is NOT definitely initialized at the point indicated. It is a simple matter for a person who understands the basics of formal methods to prove that i % 2 == 1 will always be false. However, the compiler can't. (And even if it could, the code is still in error given JLS rules.)
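For contrast, here is a sketch of two forms the compiler does accept. The class and method names are made up; the encode method loosely mirrors the loop from the question:

```java
// Hypothetical demo class; names are illustrative.
class DefiniteAssignmentDemo {
    // No initializer is needed here: every path through the if/else assigns code.
    static String describe(int x) {
        String code;
        if (x > 0) {
            code = "positive";
        } else {
            code = "non-positive";
        }
        return code;
    }

    // Here an initializer is required: the loop body may run zero times,
    // so without it code would not be definitely assigned at the return.
    static String encode(int x) {
        String code = "";
        for (int z = 0; z < x; z++) {
            code = code + 'a';
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(describe(1));
        System.out.println(encode(3));
    }
}
```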

You've declared a reference, but you've never initialized it. Initialize code by changing the first line to
String code = "";
Edit: Zavior pointed out that you can pull an initialized string from the cache rather than allocate space for a new one.
But why are you assigning code to temp and then temp plus something else to code? It can simply be code = code + ca.
