Sometimes it would be convenient to have an easy way of doing the following:
Foo a = dosomething();
if (a != null) {
    if (a.isValid()) {
        ...
    }
}
My idea was to have some kind of static "default" methods for uninitialized variables, like this:
class Foo {
    public boolean isValid() {
        return true;
    }

    public static boolean isValid() {
        return false;
    }
}
And now I could do this…
Foo a = dosomething();
if (a.isValid()) {
    // In our example case -> variable is initialized and the "normal" method gets called
} else {
    // In our example case -> variable is null
}
So, if a == null, the static "default" method from our class gets called; otherwise the method of our object gets called.
Is there some keyword I'm missing to do exactly this, or is there a reason why this is not already implemented in programming languages like Java/C#?
Note: this example is not very breathtaking even if it worked; however, there are examples where this would be - indeed - very nice.
It's very slightly odd; ordinarily, x.foo() runs the foo() method as defined by the object that the x reference is pointing to. What you propose is a fallback mechanism where, if x is null (is referencing nothing), we don't look at the object that x is pointing to (there's nothing it's pointing at, so that is impossible), but instead we look at the type of x, the variable itself, and ask this type: hey, can you give me the default impl of foo()?
The core problem is that you're assigning a definition to null that it just doesn't have. Your idea requires a redefinition of what null means, which means the entire community needs to go back to school. I think the current definition of null in the java community is some nebulous, ill-defined cloud of confusion, so this is probably a good idea, but it is a huge commitment, and it is extremely easy for the OpenJDK team to dictate a direction and for the community to just ignore it. The OpenJDK team should be very hesitant to try to 'solve' this problem by introducing a language feature, and they are.
Let's talk about the definitions of null that make sense, which definition of null your idea specifically caters to (to the detriment of the other interpretations!), and how catering to that specific idea is already easy to do in current java. In other words: what you propose sounds outright daft to me, in that it's just unnecessary and forces an opinion of what null means down everybody's throats for no reason.
Not applicable / undefined / unset
This definition of null is exactly how SQL defines it, and it has the following properties:
There is no default implementation available. By definition! How can one define what the size is of, say, an unset list? You can't say 0; you have no idea what the list is supposed to be. The very point is that interaction with an unset/not-applicable/unknown value should immediately lead to a result that represents either [A] the programmer messed up - the fact that they think they can interact with this value means they programmed a bug; they made an assumption about the state of the system which does not hold - or [B] the unset nature is infectious: the operation returns the notion 'unknown / unset / not applicable' as its result.
SQL chose the [B] route: any interaction with NULL in SQL land is infectious. For example, even NULL = NULL in SQL is NULL, not TRUE or FALSE. It also means that all booleans in SQL are tri-state, but this actually 'works', in that one can honestly fathom this notion. If I ask you: Hey, are the lights on?, then there are 3 reasonable answers: Yes, No, and 'I can't tell you right now; I don't know'.
In my opinion, java as a language is meant for this definition as well, but has mostly chosen the [A] route: throw an NPE to let everybody know there is a bug, and to let the programmer get to the relevant line extremely quickly. NPEs are easy to solve, which is why I don't get why everybody hates NPEs. I love NPEs. So much better than some default behaviour that is usually, but not always, what I intended (objectively speaking, it is better to have 50 bugs that each take 3 minutes to solve than one bug that takes an entire working day, by a large margin!). This definition 'works' with the language:
Uninitialized fields, and uninitialized values in an array begin as null, and in the absence of further information, treating it as unset is correct.
They are, in fact, infectiously erroneous: virtually all attempts to interact with them result in an exception, except ==, but that is intentional, for the same reason that in SQL IS NULL will return TRUE or FALSE and not NULL: now we're actually talking about the pointer nature of the reference itself ("foo" == "foo" can be false if the 2 strings aren't the same ref: clearly == in java between objects is about the references themselves and not about the objects referenced).
A key aspect to this is that null has absolutely no semantic meaning, at all. Its lack of semantic meaning is the point. In other words, null doesn't mean that a value is short or long or blank or indicative of anything in particular. The only thing it does mean is that it means nothing. You can't derive any information from it. Hence, foo.size() is not 0 when foo is unset/unknown - the question 'what is the size of the object foo is pointing at' is unanswerable, in this definition, and thus NPE is exactly right.
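To make that last point concrete, here is a tiny self-contained illustration (plain Java; nothing here is assumed beyond the language itself):

public class NullDemo {
    public static void main(String[] args) {
        String s = null;
        System.out.println(s == null);  // true: == compares references and never dereferences
        System.out.println(s.length()); // throws NullPointerException: this asks an unanswerable question
    }
}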
Your idea would hurt this interpretation - it would confound matters by giving answers to unanswerable questions.
Sentinel / 'empty'
null is sometimes used as a value that does have semantic meaning. Something specific. For example, if you ever wrote this, you're using this interpretation:
if (x == null || x.isEmpty()) return false;
Here you've assigned a semantic meaning to null - the same meaning you assigned to an empty string. This is common in java and presumably stems from some bass-ackwards notion of performance. For example, in the eclipse ecj java parser system, all empty arrays are represented with null pointers. For example, the definition of a method has a field Argument[] arguments (for the method parameters; 'argument' is slightly the wrong word, but it is used to store the param definitions); however, for methods with zero parameters, the semantically correct choice is obviously new Argument[0]. That is NOT what ecj fills the Abstract Syntax Tree with, though, and if you are hacking around on the ecj code and assign new Argument[0] to this field, other code will mess up, as it just wasn't written to deal with this.
This is in my opinion bad use of null, but is quite common. And, in ecj's defense, it is about 4 times faster than javac, so I don't think it's fair to cast aspersions at their seemingly deplorably outdated code practices. If it's stupid and it works it isn't stupid, right? ecj also has a better track record than javac (going mostly by personal experience; I've found 3 bugs in ecj over the years and 12 in javac).
This kind of null does get a lot better if we implement your idea.
The better solution
What ecj should have done, to get the best of both worlds: make a public constant for it! new Argument[0], the object, is entirely immutable. You need to make a single instance, once, ever, for an entire JVM run. The JVM itself does this; try it: List.of() returns the 'singleton empty list'. So does Collections.emptyList() for the old-timers in the crowd. All lists 'made' with Collections.emptyList() are actually just refs to the same singleton 'empty list' object. This works because the lists these methods make are entirely immutable.
The same can and generally should apply to you!
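A minimal sketch of that pattern (the class and constant names here are made up for illustration):

public final class Constants {
    // One shared instance for the entire JVM run; a zero-length array is effectively immutable.
    public static final String[] NO_STRINGS = new String[0];

    private Constants() {} // no instances needed
}

Callers then return or compare against Constants.NO_STRINGS instead of null, and looping over it simply does nothing.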
If you ever write this:
if (x == null || x.isEmpty())
then you messed up if we go by the first definition of null, and you're simply writing needlessly wordy, but correct, code if we go by the second definition. You've come up with a solution to address this, but there's a much, much better one!
Find the place where x got its value, and address the boneheaded code that decided to return null instead of "". You should in fact emphatically NOT be adding null checks to your code, because it's far too easy to get into this mode where you almost always do it, and therefore you rarely actually have null refs - but it's just slices of swiss cheese laid on top of each other: there may still be holes, and then you get NPEs. Better to never check, so you get NPEs very quickly in the development process: somebody returned null where they should be returning "" instead.
Sometimes the code that made the bad null ref is out of your control. In that case, do the same thing you should always do when working with badly designed APIs: Fix it ASAP. Write a wrapper if you have to. But if you can commit a fix, do that instead. This may require making such an object.
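A hedged sketch of such a wrapper (UserApi here stands in for an imagined badly-behaved third-party API; all names are hypothetical):

interface UserApi {
    String userName(int id); // documented to sometimes return null
}

final class SafeUserApi {
    private final UserApi delegate;

    SafeUserApi(UserApi delegate) { this.delegate = delegate; }

    String userName(int id) {
        String name = delegate.userName(id);
        return name == null ? "" : name; // normalize null to the "" sentinel
    }
}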
Sentinels are awesome
Sometimes sentinel objects (objects that 'stand in' for this default / blank take, such as "" for strings, List.of() for lists, etc.) can be a bit more fancy than this. For example, one can imagine using LocalDate.of(1800, 1, 1) as a sentinel for a missing birthdate, but do note that this particular instance is not a great idea: it does crazy stuff. If you write code to determine the age of a person, it will suddenly claim that someone is 212 years old, a completely wrong answer, which is significantly worse than throwing an exception. With the exception you know you have a bug sooner, and you get a stacktrace that lets you find it in literally 500 milliseconds (just click the line, voila: that is the exact line you need to look at right now to fix the problem).
But you could make a date sentinel object that does do some things (such as: it CAN print itself; sentinel.toString() doesn't throw an NPE but prints something like 'unset date'), while for other things it throws an exception. For example, .getYear() would throw.
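Since LocalDate itself is final and can't be subclassed, in practice such a sentinel needs its own small wrapper type. A hypothetical sketch (all names made up for illustration):

import java.time.LocalDate;
import java.util.Objects;

public final class MaybeDate {
    private static final MaybeDate UNSET = new MaybeDate(null);

    private final LocalDate value; // null only inside the single UNSET instance

    private MaybeDate(LocalDate value) { this.value = value; }

    public static MaybeDate of(LocalDate d) { return new MaybeDate(Objects.requireNonNull(d)); }
    public static MaybeDate unset() { return UNSET; }

    public int getYear() {
        if (value == null) throw new IllegalStateException("date is unset"); // unanswerable question
        return value.getYear();
    }

    @Override public String toString() {
        return value == null ? "unset date" : value.toString(); // printing always works
    }
}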
You can also make more than one sentinel. If you want a sentinel that means 'far future', that's trivially made (LocalDate.of(9999, 12, 31) is pretty good already), and you can also have one as 'for as long as anyone remembers', e.g. 'distant past'. That's cool, and not something your proposal could ever do!
You will have to deal with the consequences, though. In some small ways the java ecosystem's definitions don't mesh with this, and null would perhaps have been a better stand-in. For example, the equals contract clearly states that a.equals(a) must always hold, and yet, just like in SQL NULL = NULL isn't TRUE, you probably don't want missingDate.equals(missingDate) to be true; that's conflating the meta with the value: you can't actually tell me that 2 missing dates are equal. By definition: the dates are missing. You do not know if they are equal or not. It is not an answerable question. And yet we can't implement the equals method of missingDate as return false; (or, better yet - as you also can't really know they aren't equal - throw an exception), as that breaks the contract (equals methods must have the identity property and must not throw, as per their own javadoc, so we can't do either of those things).
Dealing with null better
There are a few things that make dealing with null a lot easier:
Annotations: APIs can and should be very clear in communicating when their methods can return null and what that means. Annotations that turn that documentation into compiler-checked documentation are awesome. Your IDE can start warning you, as you type, that null may occur and what that means, and will say so in auto-complete dialogs too. And it's all entirely backwards compatible in all senses of the word: no need to start considering giant swaths of the java ecosystem as 'obsolete' (unlike Optional, which mostly sucks).
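A minimal sketch of what that looks like; javax.annotation.Nullable here is the JSR-305 annotation (from com.google.code.findbugs:jsr305) - which annotation library you pick is a project choice, and the interface itself is made up for illustration:

import javax.annotation.Nullable;

interface UserDirectory {
    // Returns the phone number on file, or null if this user has none.
    @Nullable
    String phoneNumberOf(String userId);
}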
Optional, except this is a non-solution. The type isn't orthogonal: you can't write a method that takes a List<MaybeOptionalOrNot<String>> and works on both List<String> and List<Optional<String>>, even though a method that merely checks the 'is it some or is it none?' state of all list members, and doesn't add anything (except maybe shuffle things around), would work equally well on both. And yet you just can't write it. This is bad, and it means all usages of Optional must be 'unrolled' on the spot, and e.g. Optional<X> should show up pretty much never, ever, as a parameter type or field type. Only as a return type, and even that is dubious - I'd just stick to what Optional was made for: the return type of Stream terminal operations.
Adopting it also isn't backwards compatible. For example, hashMap.get(key) should, in all possible interpretations of what Optional is for, obviously return an Optional<V>, but it doesn't, and it never will, because java doesn't break backwards compatibility lightly and breaking that is obviously far too heavy an impact. The only real solution is to introduce java.util2 and a complete incompatible redesign of the collections API, which is splitting the java ecosystem in twain. Ask the python community (python2 vs. python3) how well that goes.
Use sentinels, use them heavily, make them available. If I were designing LocalDate, I'd have created LocalDate.FAR_FUTURE and LocalDate.DISTANT_PAST (but let it be clear that I think Stephen Colebourne, who designed JSR310, is perhaps the best API designer out there. But nothing is so perfect that it can't be complained about, right?)
Use API calls that allow defaulting. Map has this.
Do NOT write this code:
String phoneNr = phoneNumbers.get(userId);
if (phoneNr == null) return "Unknown phone number";
return phoneNr;
But DO write this:
return phoneNumbers.getOrDefault(userId, "Unknown phone number");
Don't write:
Map<Course, List<Student>> participants;

void enrollStudent(Student student) {
    List<Student> participating = participants.get(econ101);
    if (participating == null) {
        participating = new ArrayList<Student>();
        participants.put(econ101, participating);
    }
    participating.add(student);
}
instead write:
Map<Course, List<Student>> participants;

void enrollStudent(Student student) {
    participants.computeIfAbsent(econ101, k -> new ArrayList<Student>())
                .add(student);
}
and, crucially, if you are writing APIs, ensure things like getOrDefault, computeIfAbsent, etc. are available so that the users of your API don't have to deal with null nearly as much.
You can write a static test() method like this:
import java.util.function.Predicate;

static <T> boolean test(T object, Predicate<T> validation) {
    return object != null && validation.test(object);
}
and
static class Foo {
    public boolean isValid() {
        return true;
    }
}

static Foo dosomething() {
    return new Foo();
}
public static void main(String[] args) {
    Foo a = dosomething();
    if (test(a, Foo::isValid))
        System.out.println("OK");
    else
        System.out.println("NG");
}
output:
OK
If dosomething() returns null, it prints NG.
Not exactly, but take a look at Optional:
Optional.ofNullable(dosomething())
.filter(Foo::isValid)
.ifPresent(a -> ...);
int y=3;
int z=(--y) + (y=10);
When executed in C, the value of z evaluates to 20,
but when the same expression is executed in Java, z evaluates to 12.
Can anyone explain why this is happening and what is the difference?
when executed in C language the value of z evaluates to 20
No it does not. This is undefined behavior, so z could get any value. Including 20. The program could also theoretically do anything, since the standard does not say what the program should do when encountering undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior
As a rule of thumb, never modify a variable twice in the same expression.
It's not a good duplicate, but this will explain things a bit deeper. The reason for undefined behavior here is sequence points. Why are these constructs using pre and post-increment undefined behavior?
In C, when it comes to arithmetic operators, like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those has side effects, your program becomes unpredictable. Here is an example:
#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    int x = foo() + bar();
}
What will this program print? Well, we don't know. I'm not entirely sure if this snippet invokes undefined behavior or not, but regardless, the output is not predictable. I asked a question, Is it undefined behavior to use functions with side effects in an unspecified order?, about that, so I'll update this answer later.
Some other operators do have a specified (left to right) order of evaluation, like || and &&, and this feature is used for short-circuiting. For instance, if we use the above example functions and write foo() && bar(), only the foo() function will be executed.
I'm not very proficient in Java, but for completeness, I want to mention that Java basically does not have undefined or unspecified behavior except for very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer
There are 3 parts to this answer:
How this works in C (unspecified behaviour)
How this works in Java (the spec is clear on how this should be evaluated)
Why is there a difference.
For #1, you should read klutt's fantastic answer.
For #2 and #3, you should read this answer.
How does it work in java?
Unlike C's, java's language specification is far more precise. For example, C doesn't even tell you how many bits the data type int is supposed to have, whereas the java lang spec does: 32 bits. Even on 64-bit processors and a 64-bit java implementation.
The java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'), thus first --y is evaluated, which is clearly 2 (with the side effect of making y 2), then y=10 is evaluated, which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated, which is clearly 12.
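A runnable version of that reasoning, if you want to see it for yourself:

public class EvalOrder {
    public static void main(String[] args) {
        int y = 3;
        // The JLS evaluates the operands of + strictly left to right.
        int z = (--y) + (y = 10); // (--y) yields 2, then (y = 10) yields 10
        System.out.println(z);    // always prints 12 on a conforming JVM
        System.out.println(y);    // prints 10
    }
}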
Obviously, a language like java is just better; after all, undefined behaviour is pretty much a bug by definition, whatever was wrong with the C lang spec writers to introduce this crazy stuff?
The answer is: performance.
In C, your source code is turned into machine code by the compiler, and the machine code is then interpreted by the CPU. A 2-step model.
In java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then interpreted by the CPU. A 3-step model.
If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: Compilation.
So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: At the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.
In java, you get a second step, and that's where java does its optimizations: At runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose; at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could) - it also means that cost/benefit analysis now results in: The lang spec should be clear as day.
So java code is never undefined behaviour?
Not so. Java has a memory model which includes a ton of undefined behaviour:
class X { int a, b; }

X instance = new X();

new Thread() { public void run() {
    int a = instance.a;
    int b = instance.b;
    instance.a = 5;
    instance.b = 6;
    System.out.print(a);
    System.out.print(b);
}}.start();

new Thread() { public void run() {
    int a = instance.a;
    int b = instance.b;
    instance.a = 1;
    instance.b = 2;
    System.out.print(a);
    System.out.print(b);
}}.start();
is undefined in java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: How can the read of a 'work' but the read of b then fail?
For the exact same reason your C code produces arbitrary answers:
Optimization.
The cost/benefit of 'hardcoding' in the spec exactly how this code would behave would have a large cost to it: you'd take away most of the room for optimization. So java paid the cost, and now has a langspec that is ambiguous whenever you modify/read the same fields from different threads without establishing so-called 'happens-before' guards using e.g. synchronized.
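For completeness, a minimal sketch of what 'establishing happens-before' can look like; funneling all access through synchronized methods on one object removes the ambiguity (this is a generic illustration, not tied to the exact code above):

class SharedPair {
    private int a, b;

    synchronized void set(int newA, int newB) {
        a = newA;
        b = newB;
    }

    synchronized int[] get() {
        return new int[] { a, b }; // reads can no longer observe a 'torn' a/b pair
    }
}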
When executed in C language the value of z evaluates to 20
It is not the truth. The compiler you use happens to evaluate it to 20. Another compiler can evaluate it in a completely different way: https://godbolt.org/z/GcPsKh
This kind of behaviour is called Undefined Behaviour.
In your expression you have two problems.
The order of evaluation (except for the logical operators) is not specified in C (it is Unspecified Behaviour)
In this expression there is also a problem with sequence points (Undefined Behaviour)
I recently saw a discussion in an SO chat, but with no clear conclusions, so I ended up asking here.
Is this for historical reasons or consistency with other languages? When looking at the signatures of compareTo of various languages, it returns an int.
Why doesn't it return an enum instead? For example, in C# we could do:
enum CompareResult {LessThan, Equals, GreaterThan};
and :
public CompareResult CompareTo(Employee other) {
    if (this.Salary < other.Salary) {
        return CompareResult.LessThan;
    }
    if (this.Salary == other.Salary) {
        return CompareResult.Equals;
    }
    return CompareResult.GreaterThan;
}
In Java, enums were introduced after this concept (I don't remember about C#) but it could have been solved by an extra class such as:
public final class CompareResult {
    public static final CompareResult LESS_THAN = new CompareResult();
    public static final CompareResult EQUALS = new CompareResult();
    public static final CompareResult GREATER_THAN = new CompareResult();

    private CompareResult() {}
}
and
interface Comparable<T> {
    CompareResult compareTo(T obj);
}
I'm asking this because I don't think an int represents well the semantics of the data.
For example in C#,
l.Sort(delegate(int x, int y)
{
    return Math.Min(x, y);
});
and its twin in Java 8,
l.sort(Integer::min);
both compile, because Min/min respect the contract of the comparator interface (take two ints and return an int).
Obviously the results in both cases are not the ones expected. If the return type were CompareResult, it would have caused a compile error, thus forcing you to implement a "correct" behavior (or at least making you aware of what you are doing).
A lot of semantics is lost with this return type (and it can potentially cause some bugs that are difficult to find), so why design it like this?
[This answer is for C#, but it probably also applies to Java to some extent.]
This is for historical, performance and readability reasons. It potentially increases performance in two places:
Where the comparison is implemented. Often you can just return "(lhs - rhs)" (if the values are numeric types). But this can be dangerous: See below!
The calling code can use <= and >= to naturally represent the corresponding comparison. This will use a single IL (and hence processor) instruction compared to using the enum (although there is a way to avoid the overhead of the enum, as described below).
For example, we can check if a lhs value is less than or equal to a rhs value as follows:
if (lhs.CompareTo(rhs) <= 0)
    ...
Using an enum, that would look like this:
if (lhs.CompareTo(rhs) == CompareResult.LessThan ||
    lhs.CompareTo(rhs) == CompareResult.Equals)
    ...
That is clearly less readable and is also inefficient since it is doing the comparison twice. You might fix the inefficiency by using a temporary result:
var compareResult = lhs.CompareTo(rhs);
if (compareResult == CompareResult.LessThan || compareResult == CompareResult.Equals)
    ...
It's still a lot less readable IMO - and it's still less efficient since it's doing two comparison operations instead of one (although I freely admit that it is likely that such a performance difference will rarely matter).
As raznagul points out below, you can actually do it with just one comparison:
if (lhs.CompareTo(rhs) != CompareResult.GreaterThan)
    ...
So you can make it fairly efficient - but of course, readability still suffers. ... != GreaterThan is not as clear as ... <=
(And if you use the enum, you can't avoid the overhead of turning the result of a comparison into an enum value, of course.)
So this is primarily done for reasons of readability, but also to some extent for reasons of efficiency.
Finally, as others have mentioned, this is also done for historical reasons. Functions like C's strcmp() and memcmp() have always returned ints.
Assembler compare instructions also tend to be used in a similar way.
For example, to compare two integers in x86 assembler, you can do something like this:
CMP AX, BX           ; compare AX and BX
JLE lessThanOrEqual  ; jump to lessThanOrEqual if AX <= BX
or
CMP AX, BX
JG greaterThan ; jump to greaterThan if AX > BX
or
CMP AX, BX
JE equal ; jump to equal if AX == BX
You can see the obvious comparisons with the return value from CompareTo().
Addendum:
Here's an example which shows that it's not always safe to use the trick of subtracting the rhs from the lhs to get the comparison result:
int lhs = int.MaxValue - 10;
int rhs = int.MinValue + 10;
// Since lhs > rhs, we expect (lhs-rhs) to be +ve, but:
Console.WriteLine(lhs - rhs); // Prints -21: WRONG!
Obviously this is because the arithmetic has overflowed. If you had 'checked' arithmetic turned on for the build, the code above would in fact throw an exception.
For this reason, the optimization of using subtraction to implement comparison is best avoided. (See comments from Eric Lippert below.)
Let's stick to bare facts, with an absolute minimum of handwaving and/or unnecessary/irrelevant/implementation-dependent details.
As you already figured out yourself, compareTo is as old as Java ("Since: JDK1.0" in the Integer JavaDoc); Java 1.0 was designed to be familiar to C/C++ developers, and mimicked a lot of their design choices, for better or worse. Also, Java has a backwards-compatibility policy - thus, once implemented in the core lib, the method is almost bound to stay in it forever.
As to C/C++ - strcmp/memcmp, which have existed for as long as string.h, so essentially as long as the C standard library, return exactly the same values (or rather, compareTo returns the same values as strcmp/memcmp) - see e.g. C ref - strcmp. At the time of Java's inception, going that way was the logical thing to do. There weren't any enums in Java at that time, no generics, etc. (all that came in >= 1.5).
The very choice of return values for strcmp is quite obvious - first and foremost, there are 3 basic results of a comparison, so selecting +1 for "bigger", -1 for "smaller" and 0 for "equal" was the logical thing to do. Also, as pointed out, you can get the value easily by subtraction, and returning int allows you to easily use it in further calculations (in a traditional C type-unsafe way), while also allowing an efficient single-op implementation.
If you need/want to use your enum-based typesafe comparison interface - you're free to do so, but since the convention of strcmp returning +1/0/-1 is as old as contemporary programming, it actually does convey semantic meaning, in the same way null can be interpreted as an unknown/invalid value, or an out-of-bounds int value (e.g. a negative number supplied for a positive-only quantity) can be interpreted as an error code. Maybe it's not the best coding practice, but it certainly has its pros, and is still commonly used e.g. in C.
On the other hand, asking "why does the standard library of language XYZ conform to legacy standards of language ABC" is itself moot, as it can only be accurately answered by the very language designer who implemented it.
TL;DR it's that way mainly because it was done that way in legacy versions for legacy reasons and POLA for C programmers, and is kept that way for backwards-compatibility & POLA, again.
As a side note, I consider this question (in its current form) too broad to be answered precisely, highly opinion-based, and borderline off-topic on SO due to directly asking about Design Patterns & Language Architecture.
This practice comes from comparing integers this way, and using a subtract between first non-matching chars of a string.
Note that this practice is dangerous with things that are partially comparable while using a -1 to mean that a pair of things was incomparable. This is because it could create a situation of a < b and b < a (which the application might use to define "incomparable"). Such a situation can lead to loops that don't terminate correctly.
An enumeration with values {lt,eq,gt,incomparable} would be more correct.
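A sketch of what such an enumeration could look like in Java (hypothetical; no such type exists in the standard library):

public enum PartialOrdering {
    LESS_THAN,
    EQUAL,
    GREATER_THAN,
    INCOMPARABLE // neither a < b nor b < a holds
}

A comparator returning this type would make 'incomparable' impossible to confuse with 'less than'.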
My understanding is that this is done because you can order the results (i.e., the ordering is transitive: if A < B and B < C, then A < C). For example, if you have three objects (A,B,C), you can compare A->B and B->C, and use the resulting values to order them properly. The documentation also strongly recommends (though does not require) that compareTo be consistent with equals, i.e. that A.compareTo(B) == 0 exactly when A.equals(B).
See java's comparator documentation.
This is due to performance reasons.
If you need to compare ints, as often happens, you can simply return their difference; in fact, comparisons are often implemented as subtractions.
As an example:
public class MyComparable implements Comparable<MyComparable> {
    public int num;

    public int compareTo(MyComparable x) {
        // Careful: this is only safe when the difference cannot overflow an int
        // (see the overflow discussion in the answer above).
        return num - x.num;
    }
}
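If the values can span the full int range, that subtraction can overflow. A minimal overflow-safe variant uses the standard library's Integer.compare, available since Java 7:

public class MyComparable implements Comparable<MyComparable> {
    public int num;

    @Override
    public int compareTo(MyComparable x) {
        return Integer.compare(num, x.num); // negative, zero or positive; never overflows
    }
}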
We were having this discussion with my colleagues about inner assignments such as:
return result = myObject.doSomething();
or
if ( null == (point = field.getPoint()) )
Are these acceptable or should they be replaced by the following and why?
int result = myObject.doSomething();
return result;
or
Point point = field.getPoint();
if ( null == point)
The inner assignment is harder to read and easier to miss. In a complex condition it can even be skipped during evaluation, which can cause errors.
E.g. this will be a hard-to-find error, if the condition evaluation prevents the assignment:
if (i == 2 && null == (point = field.getPoint())) ...
If i == 2 is false, the point variable will not be assigned.
if ( null == (point = field.getPoint()) )
Pros:
One less line of code
Cons:
Less readable.
Doesn't restrict point's scope to the statement and its code block.
Doesn't offer any performance improvements as far as I am aware
Might not always be executed (when there is a condition preceding it that evaluates to false).
Cons outweigh pros 4 / 1 so I would avoid it.
This is mainly a question of code readability. Avoid inner assignments to keep your code readable, as you will not get any other improvements from them.
Functionally? Not necessarily.
For readability? Definitely yes.
They should be avoided. Reducing the number of identifiers/operations per line will increase readability and improve internal code quality. Here's an interesting study on the topic: http://dl.acm.org/citation.cfm?id=1390647
So bottom line, splitting up
return result = myObject.doSomething();
into
result = myObject.doSomething();
return result;
will make it easier for others to understand and work with your code. At the same time, it wouldn't be the end of the world if there were a couple inner assignments sprinkled throughout your code base, so long as they're easily understandable within their context.
Well, the first one is not exactly an inner assignment, but in the second case... it reduces readability. But in some cases, like the one below,
while ( null == (point = field.getPoint()) );
it's good to write it this way
In both cases the first form is harder to read, and will make you want to change it whenever you want to inspect the value in a debugger. I don't know how often I've cursed "concise" code when step-debugging.
There are a very few cases where inner assignments reduce program complexity, for example in if (x != null && y != null && (c = f(x, y)) > 0) {...}, where you really only need the assignment in the case when it is executed in the complex condition.
But in most cases inner assignments reduce readability and they easily can be missed.
I think inner assignments are a relic of the first versions of the C programming language in the seventies, when compilers didn't do any optimization and the work of optimizing the code was left to the programmers. In that time inner assignments were faster, because it was not necessary to read the value back from the variable, but today, with fast computers and optimizing compilers, this point doesn't count any more. Nevertheless some C programmers were used to them. I think Sun introduced inner assignments to Java only because they wanted to be similar to C and make it easy for C programmers to change to Java.
Always work and aim for code readability, not writability. The same goes for stuff like a > b ? x : y;
There are probably many developers out there who have no issues reading your first code snippet, but most of them are used to the second snippet.
The more verbose form also makes it easier to follow in a Debugger such as Eclipse. I often split up single line assignments so the intermediate values are more easily visible.
Although not directly asked by the OP, a similar case is function calls as method arguments: they may save lines but are harder to debug:
myFunction(funcA(), funcB());
does not show the return types and is harder to step through. It's also more error-prone if the two values are of the same type.
I don't find any harm in using inner assignments. It saves a few lines of code (though I'm sure it doesn't improve compile or execution time, or memory). The only drawback is that to someone else it might appear cumbersome.
I am not able to understand the point of the Option[T] class in Scala. I mean, I am not able to see any advantages of None over null.
For example, consider the code:
object Main {
  class Person(name: String, var age: Int) {
    def display = println(name + " " + age)
  }

  def getPerson1: Person = {
    // returns a Person instance or null
  }

  def getPerson2: Option[Person] = {
    // returns either Some[Person] or None
  }

  def main(argv: Array[String]): Unit = {
    val p = getPerson1
    if (p != null) p.display

    getPerson2 match {
      case Some(person) => person.display
      case None => /* Do nothing */
    }
  }
}
Now suppose, the method getPerson1 returns null, then the call made to display on first line of main is bound to fail with NPE. Similarly if getPerson2 returns None, the display call will again fail with some similar error.
If so, then why does Scala complicate things by introducing a new value wrapper (Option[T]) instead of following a simple approach used in Java?
UPDATE:
I have edited my code as per Mitch's suggestion. I am still not able to see any particular advantage of Option[T]. I have to test for the exceptional null or None in both cases. :(
If I have understood correctly from Michael's reply, the only advantage of Option[T] is that it explicitly tells the programmer that this method could return None. Is this the only reason behind this design choice?
You'll get the point of Option better if you force yourself to never, ever, use get. That's because get is the equivalent of "ok, send me back to null-land".
So, take that example of yours. How would you call display without using get? Here are some alternatives:
getPerson2 foreach (_.display)

for (person <- getPerson2) person.display

getPerson2 match {
  case Some(person) => person.display
  case _ =>
}

getPerson2.getOrElse(new Person("Unknown", 0)).display
None of these alternatives will let you call display on something that does not exist.
As for why get exists, Scala doesn't tell you how your code should be written. It may gently prod you, but if you want to fall back to no safety net, it's your choice.
You nailed it here:
the only advantage of Option[T] is that it explicitly tells the programmer that this method could return None?
Except for the "only". But let me restate that in another way: the main advantage of Option[T] over T is type safety. It ensures you won't be sending a T method to an object that may not exist, as the compiler won't let you.
You said you have to test for nullability in both cases, but if you forget -- or don't know -- you have to check for null, will the compiler tell you? Or will your users?
Of course, because of its interoperability with Java, Scala allows nulls just as Java does. So if you use Java libraries, if you use badly written Scala libraries, or if you use badly written personal Scala libraries, you'll still have to deal with null pointers.
Other two important advantages of Option I can think of are:
Documentation: a method type signature will tell you whether an object is always returned or not.
Monadic composability.
The latter one takes much longer to fully appreciate, and it's not well suited to simple examples, as it only shows its strength on complex code. So, I'll give an example below, but I'm well aware it will hardly mean anything except for the people who get it already.
for {
  person <- getUsers
  email <- person.getEmail // Assuming getEmail returns Option[String]
} yield (person, email)
Compare:
val p = getPerson1 // a potentially null Person
val favouriteColour = if (p == null) null else p.favouriteColour
with:
val p = getPerson2 // an Option[Person]
val favouriteColour = p.map(_.favouriteColour)
The monadic bind property, which appears in Scala as the flatMap function (with map as its close cousin), allows us to chain operations on objects without worrying about whether they are 'null' or not.
Take this simple example a little further. Say we wanted to find all the favourite colours of a list of people.
// list of (potentially null) Persons
for (person <- listOfPeople) yield if (person == null) null else person.favouriteColour
// list of Options[Person]
listOfPeople.map(_.map(_.favouriteColour))
listOfPeople.flatMap(_.map(_.favouriteColour)) // discards all None's
Or perhaps we would like to find the name of a person's father's mother's sister:
// with potential nulls
val father = if (person == null) null else person.father
val mother = if (father == null) null else father.mother
val sister = if (mother == null) null else mother.sister
// with options
val fathersMothersSister = getPerson2.flatMap(_.father).flatMap(_.mother).flatMap(_.sister)
I hope this sheds some light on how options can make life a little easier.
The difference is subtle. Keep in mind that to truly be a function, it must return a value - null is not really considered to be a "normal return value" in that sense; it's more of a bottom type/nothing.
But, in a practical sense, when you call a function that optionally returns something, you would do:
getPerson2 match {
  case Some(person) => // handle a person
  case None => // handle nothing
}
Granted, you can do something similar with null - but this makes the semantics of calling getPerson2 obvious by virtue of the fact that it returns Option[Person] (a nice practical thing, as opposed to relying on someone reading the doc, and getting an NPE because they didn't).
I will try and dig up a functional programmer who can give a stricter answer than I can.
For me, options become really interesting when handled with for-comprehension syntax. Taking Synesso's preceding example:
// with potential nulls
val father = if (person == null) null else person.father
val mother = if (father == null) null else father.mother
val sister = if (mother == null) null else mother.sister
// with options
val fathersMothersSister = for {
  father <- person.father
  mother <- father.mother
  sister <- mother.sister
} yield sister
If any of the assignments is None, fathersMothersSister will be None, but no NullPointerException will be raised. You can then safely pass fathersMothersSister to a function taking Option parameters without worrying. So you don't check for null, and you don't care about exceptions. Compare this to the java version presented in Synesso's example.
You have pretty powerful composition capabilities with Option:
def getURL: Option[URL]
def getDefaultURL: Option[URL]

val (host, port) = (getURL orElse getDefaultURL)
  .map(url => (url.getHost, url.getPort))
  .getOrElse(throw new IllegalStateException("No URL defined"))
Maybe someone else pointed this out, but I didn't see it:
One advantage of pattern-matching with Option[T] vs. null-checking is that Option is a sealed class, so the Scala compiler will issue a warning if you neglect to code either the Some or the None case. There is a compiler flag that will turn such warnings into errors. So it's possible to prevent the failure to handle the "doesn't exist" case at compile time rather than at runtime. This is an enormous advantage over the use of the null value.
It's not there to help you avoid a null check; it's there to force a null check. The point becomes clear when your class has 10 fields, two of which could be null. And your system has 50 other similar classes. In the Java world, you try to prevent NPEs on those fields using some combination of mental horsepower, naming convention, or maybe even annotations. And every Java dev fails at this to a significant degree. The Option class not only makes "nullable" values visually clear to any developers trying to understand the code, but allows the compiler to enforce this previously unspoken contract.
[ copied from this comment by Daniel Spiewak ]
If the only way to use Option were to pattern match in order to get values out, then yes, I agree that it doesn't improve at all over null. However, you're missing a *huge* class of its functionality. The only compelling reason to use Option is if you're using its higher-order utility functions. Effectively, you need to be using its monadic nature. For example (assuming a certain amount of API trimming):
val row: Option[Row] = database fetchRowById 42
val key: Option[String] = row flatMap { _ get "port_key" }
val value: Option[MyType] = key flatMap (myMap get)
val result: MyType = value getOrElse defaultValue
There, wasn't that nifty? We can actually do a lot better if we use for-comprehensions:
val value = for {
  row <- database fetchRowById 42
  key <- row get "port_key"
  value <- myMap get key
} yield value

val result = value getOrElse defaultValue
You'll notice that we are *never* checking explicitly for null, None or any of its ilk. The whole point of Option is to avoid any of that checking. You just string computations along and move down the line until you *really* need to get a value out. At that point, you can decide whether or not you want to do explicit checking (which you should never have to do), provide a default value, throw an exception, etc.
I never, ever do any explicit matching against Option, and I know a lot of other Scala developers who are in the same boat. David Pollak mentioned to me just the other day that he uses such explicit matching on Option (or Box, in the case of Lift) as a sign that the developer who wrote the code doesn't fully understand the language and its standard library.
I don't mean to be a troll hammer, but you really need to look at how language features are *actually* used in practice before you bash them as useless. I absolutely agree that Option is quite uncompelling as *you* used it, but you're not using it the way it was designed.
One point that nobody else here seems to have raised is that while you can have a null reference, there is a distinction introduced by Option.
That is, you can have Option[Option[A]], which would be inhabited by None, Some(None), and Some(Some(a)), where a is one of the usual inhabitants of A. This means that if you have some kind of container, and want to be able to store null pointers in it, and get them out, you need to pass back some extra boolean value to know if you actually got a value out. Warts like this abound in the java containers APIs, and some lock-free variants can't even provide them.
null is a one-off construction: it doesn't compose with itself, it is only available for reference types, and it forces you to reason in a non-total fashion.
For instance, when you check
if (x == null) ...
else x.foo()
you have to carry around in your head throughout the else branch that x != null and that this has already been checked. However, when using something like Option
x match {
  case None => ...
  case Some(y) => y.foo
}
you know y is not None by construction - and you'd know it wasn't null either, if it weren't for Hoare's billion-dollar mistake.
Option[T] is a monad, which is really useful when you are using higher-order functions to manipulate values.
I'll suggest you read the articles listed below; they are really good articles that show you why Option[T] is useful and how it can be used in a functional way.
Martians vs Monads: Null Considered Harmful
Monads are Elephants Part 1
Adding on to Randall's teaser of an answer, understanding why the potential absence of a value is represented by Option requires understanding what Option shares with many other types in Scala—specifically, types modeling monads. If one represents the absence of a value with null, that absence-presence distinction can't participate in the contracts shared by the other monadic types.
If you don't know what monads are, or if you don't notice how they're represented in Scala's library, you won't see what Option plays along with, and you can't see what you're missing out on. There are many benefits to using Option instead of null that would be noteworthy even in the absence of any monad concept (I discuss some of them in the "Cost of Option / Some vs null" scala-user mailing list thread here), but talking about it in isolation is kind of like talking about a particular linked list implementation's iterator type, wondering why it's necessary, all the while missing out on the more general container/iterator/algorithm interface. There's a broader interface at work here too, and Option provides a presence-and-absence model of that interface.
I think the key is found in Synesso's answer: Option is not primarily useful as a cumbersome alias for null, but as a full-fledged object that can then help you out with your logic.
The problem with null is that it is the lack of an object. It has no methods that might help you deal with it (though as a language designer you can add increasingly long lists of features to your language that emulate an object if you really feel like it).
One thing Option can do, as you've demonstrated, is to emulate null; you then have to test for the extraordinary value "None" instead of the extraordinary value "null". If you forget, in either case, bad things will happen. Option does make it less likely to happen by accident, since you have to type "get" (which should remind you that it might be null, er, I mean None), but this is a small benefit in exchange for an extra wrapper object.
Where Option really starts to show its power is helping you deal with the concept of I-wanted-something-but-I-don't-actually-have-one.
Let's consider some things you might want to do with things that might be null.
Maybe you want to set a default value if you have a null. Let's compare Java and Scala:
String s = (input==null) ? "(undefined)" : input;
val s = input getOrElse "(undefined)"
In place of a somewhat cumbersome ?: construct we have a method that deals with the idea of "use a default value if I'm null". This cleans up your code a little bit.
Maybe you want to create a new object only if you have a real value. Compare:
File f = (filename==null) ? null : new File(filename);
val f = filename map (new File(_))
Scala is slightly shorter and again avoids sources of error. Then consider the cumulative benefit when you need to chain things together as shown in the examples by Synesso, Daniel, and paradigmatic.
It isn't a vast improvement, but if you add everything up, it's well worth it everywhere save very high-performance code (where you want to avoid even the tiny overhead of creating the Some(x) wrapper object).
The match usage isn't really that helpful on its own except as a device to alert you about the null/None case. When it is really helpful is when you start chaining it, e.g., if you have a list of options:
val a = List(Some("Hi"), None, Some("Bye"))

a match {
  case List(Some(x), _*) => println("We started with " + x)
  case _ => println("Nothing to start with.")
}
Now you get to fold the None cases and the List-is-empty cases all together in one handy statement that pulls out exactly the value you want.
Null return values are only present for compatibility with Java. You should not use them otherwise.
It is really a programming style question. Using Functional Java, or by writing your own helper methods, you could have your Option functionality but not abandon the Java language:
http://functionaljava.org/examples/#Option.bind
Just because Scala includes it by default doesn't make it special. Most aspects of functional languages are available in that library and it can coexist nicely with other Java code. Just as you can choose to program Scala with nulls you can choose to program Java without them.
Admitting in advance that it is a glib answer, Option is a monad.
Actually I share the doubt with you. About Option, it really bothers me that 1) there is a performance overhead, as there are a lot of "Some" wrappers created everywhere, and 2) I have to use a lot of Some and Option in my code.
So to see the advantages and disadvantages of this language design decision, we should take alternatives into consideration. As Java just ignores the problem of nullability, it's not an alternative. An actual alternative is provided by the Fantom programming language: it has nullable and non-nullable types, and ?. and ?: operators instead of Scala's map/flatMap/getOrElse. I see the following bullets in the comparison:
Option's advantage:
simpler language - no additional language constructs required
uniform with other monadic types
Nullable's advantage:
shorter syntax in typical cases
better performance (as you don't need to create new Option objects and lambdas for map, flatMap)
So there is no obvious winner here. And one more note: there is no principal syntactic advantage to using Option. You can define something like:
def nullableMap[T >: Null <: AnyRef](value: T, f: T => T): T =
  if (value == null) null else f(value)
Or use some implicit conversions to get pretty syntax with dots.
The real advantage of having explicit option types is that you are able to not use them in 98% of all places, and thus statically preclude null exceptions. (And in the other 2% the type system reminds you to check properly when you actually access them.)
Another situation where Option works is with types that cannot hold a null value. It is not possible to store null in an Int, Float, Double, etc., but with an Option you can use None.
In Java, you would need to use the boxed versions (Integer, ...) of those types.