What are some common Java pitfalls/gotchas for C++ programmer? - java

As the question says, what are some common/major issues that C++ programmers face when switching to Java? I am looking for some broad topic names or examples and day to day adjustments that engineers had to make. I can then go and do an in-depth reading on this.
I am specifically interested in opinions of engineers who have worked in C++ for years and had to work with Java but any pointers from others or even book recommendations are more than welcome.

In C++ you'd use destructors to clean up file descriptors, database connections and the like. The naive equivalent is to use finalizers. Don't. Ever.
Instead use this pattern:
OutputStream os;
try {
os = ...
// do stuff
} finally {
try { os.close(); } catch (Exception e) { }
}
You'll end up doing stuff like that a lot.
If you specify no access modifer, in Java the members are package-private by default, unlike C++ in which they are private. Package-private is an annoying access level meaning it's private but anything in the same package can access it too (which is an idiotic default access level imho);
There is no stack/heap separation. Everything is created on the heap (well, that's not strictly true but we'll pretend it is);
There is no pass-by-reference;
The equivalent to function pointers is anonymous interfaces.

My biggest hurdle crossing from C++ to Java was ditching procedural code. I was very used to tying all my objects together within procedures. Without procedural code in java, I made circular references everywhere. I had to learn how to call objects from objects without them being dependents of each other. It was the Biggest hurdle but the easiest to overcome.
Number 2 personal issue is documentation. JavaDoc is useful but to many java projects are under the misconception that all that is needed is the JavaDoc. I saw much better documentation in C++ projects. This may just be a personal preference for documentation outside of the code.
Number 3. There are in fact pointers in java, just no pointer arithmetic. In java they are called references. Don't think that you can ignore where things are pointing at, it will come back with a big bite.
== and .equals are not equal.
== will look at the pointer(reference) while .equals will look at the value that the reference is pointing at.

Generics (instead of templates), specifically the way they were implemented using type erasure.

Since you mention book recommendations, definitely read Effective Java, 2nd ed.—it addresses most of the pitfalls I've seen listed in the answers.

Creating a reference by accident when one was thinking of a copy constructor:
myClass me = new myClass();
myClass somebodyElse = me; /* A reference, not a value copied into an independent instance! */
somebodyElse.setPhoneNumber(5551234);
/* Hey... how come my phone doesn't work anymore?!?!? */

No multiple inheritance, and every class implicitly derives from java.lang.Object (which has a number of important methods you definitely have to know and understand)
You can have a sort of multiple inheritance by implementing interfaces
No operator overloading except for '+' (for Strings), and definitely none you can do yourself
No unsigned numerical types, except char, which shouldn't really be used as a numerical type. If you have to deal with unsigned types, you have to do a lot of casting and masking.
Strings are not null-terminated, instead they are based on char arrays and as such are immutable. As a consequence of this, building a long String by appending with += in a loop is O(n^2), so don't do it; use a StringBuilder instead.

Getting used to having a garbage collector. Not being able to rely on a destructor to clean up resources that the GC does not handle.
Everything is passed by value, because references are passed instead of objects.
No copy constructor, unless you have a need to clone. No assignment operator.
All methods are virtual by default, which is the opposite of C++.
Explicit language support for interfaces - pure virtual classes in C++.

It's all the little bitty syntax differences that got me. Lack of destructors.
On the other hand, being able to write a main for each class (immensely handy or testing) is real nice; after you get used to it, the structure and tricks available with jar files are real nice; the fact that the semantics are completely defined (eg, int is the same everywhere) is real nice.

My worst problem was keeping in mind the ownership of memory at all times. In C++, it's a necessary thing to do, and it creates some patterns in developer's mind that are hard to overcome. In Java, I can forget about it (to a very high degree, anyway), and this enables some algorithms and approaches that would be exceedingly awkward in C++.

There are no objects in Java, there are only references to objects. E.g :
MyClass myClass; // no object is created unlike C++.
But :
MyClass myClass = new MyClass(); // Now it is a valid java object reference.

The best book of Java "gotchas" that I've read is Java Puzzlers: Traps, Pitfalls, and Corner Cases. It's not specifically aimed at C++ developers, but it is full of examples of things you want to look out for.

Specifying a method parameter as final doesn't mean what you at first think it means
private void doSomething(final MyObject myObj){
...
myObj.setSomething("this will change the obj in the calling method too");
...
}
because java is pass by value it is doing what you're asking, just not immediately obvious unless you understand how java passes the value of the reference rather than the object.

Another notable one is the keyword final and const. Java defines the const as a reserved keyword but doesn't specify much of its usage. Also
object1=object2
doesn't copy the objects it changes the reference

All methods are virtual.
Parameterized types (generics) don't actually create code parameter-specific code (ie, List<String> uses the same bytecode as List<Object>; the compiler is the only thing that complains if you try to put an Integer in the former).
Varargs is easy.

Related

Scala closures compared to Java innerclasses -> final VS var

I've first asked this question about the use of final with anonymous inner classes in Java:
Why do we use final keyword with anonymous inner classes?
I'm actually reading the Scala book of Martin Odersky. It seems Scala simplifies a lot of Java code, but for Scala closures I could notice a significant difference.
While in Java we "simulate" closures with an anonymous inner class, capturing a final variable (which will be copied to live on the heap instead of the stack) , it seems in Scala we can create a closure which can capture a val, but also a var, and thus update it in the closure call!
So it is like we can use a Java anonymous innerclass without the final keyword!
I've not finished reading the book, but for now i didn't find enough information on this language design choice.
Can someone tell me why Martin Odersky, who really seems to take care of function's side effects, choose closures to be able to capture both val and var, instead of only val?
What are the benefits and drawbacks of Java and Scala implementations?
Thanks
Related question:
With Scala closures, when do captured variables start to live on the JVM heap?
An object can be seen a bag of closures that share access to the same environment and that environment is usually mutable. So why make closures generated from anonymous functions less powerful?
Also, other languages with mutable variables and anonymous functions work the same way. Principle of lease astonishment. Java is actually WEIRD in not allowing mutable variables to be captured by inner classes.
And sometimes they're just darn useful. For example a self modifying thunk to create your own variant of lazy or future processing.
Downsides? They have all the downsides of shared mutable state.
Here are some benefits and drawbacks.
There is a principle in language design that the cost of something should be apparent to the programmer. (I first saw this in Holt's Design and Definition of the Turing Language, but I forget the name he gave it.) Two things that look the same should cost the same. Two local vars should have similar costs. This is in Java's favour. Two local vars in Java are implemented the same and so cost the same regardless of whether one of them is mentioned in an inner class. Another point in Java's favour is that in most cases the captured variable is indeed final, so there is little cost to the programmer to be prevented from capturing nonfinal local vars. Also the insistence on final simplifies the compiler, since it means that all local variables are stack variables.
There is another principle of language design that says be orthogonal. If a val can be captured why not a var. As long as there is a sensible interpretation, why put in restrictions. When languages are not orthogonal enough, they seems perverse. On the other-hand, languages that have too much orthogonality may have complex (and hence buggy and/or late and/or few) implementations. For example Algol 68 had orthogonality in spades, but implementing it was not easy, meaning few implementations and little uptake. Pascal (designed at about the same time) had all sorts of inelegant restrictions that made writing compilers easier. The result was lots of implementations and lots of uptake.

what is placement new operator in C++ and does java has it?

I heard about placement new operator of C++. I am confused what it is. However, I can see where it can be used under a question in stackoverflow. I am also confused whether we have this in java or not.
So my question is very precise: What is placement new operator and do we have something like it in java?
Note please, don't be confused with other questions on stackoverflow: they are not duplicate of this question.
The following article explains the meaning of placement new in C++: http://www.glenmccl.com/nd_cmp.htm
This term itself is relevant for overloaded new statement. Since Java does not allow to overload operators at all and specifically new operator the placement new is irrelevant for Java.
But you have several alternatives.
Using factory or builder pattern
Using wrapper/decorator pattern (probably together with factory) that allows changin some class functionality by wrapping its methods.
Aspect oriented programming. It works almost like decorator pattern but can be implemented using byte code modifiction.
Class loader interception
The term "placement new" itself is somewhat ambiguous. The term is used
in two different ways in the C++ standard, and thus by the C++
community.
The first meaning refers to any overloaded operator new function
which has more than one parameter. The additional parameters can be
used for just about anything—there are two examples in the
standard itself: operator new(size_t, void*) and operator new(size_t,
std::nothrow_t const&).
The second meaning refers to the specific overload operator new(size_t,
void*), which is used in fact to explicitly call the constructor of an
object on memory obtained from elsewhere: to separate allocation
from initialization. (It will be used in classes like std::vector,
for example, where capacity() may be greater than size().)
In Java, memory management is integrated into the language, and is not
part of the library, so there can be no equivalents.
Placement new allows to specify custom allocators that take extra parameters.
There is also a predefined placement allocator that takes as extra parameter a pointer and that just returns as result of allocation that pointer, basically allowing your code to create objects at the address you specify.
You can however define other types of allocators that take other parameters, for example our debug allocator takes as extra parameters the filename and the line on which the allocation is performed. Storing this extra information with the allocated object allows us to track back to the source code where has been created a certain object instance that for example got leaked or overwritten or used after deallocation.
AFAIK Java works at an higher conceptual level and has no pointer concept (only the null pointer exception ;-) ). Memory is just a black magic box and the programmer never use the idea of memory address.
I only knew Java 1.1 and back then decided to not invest time on that commercial product so may be the logical level of Java lowered enough today to reach the random access memory concept.

Learning from Java to C... ...What is the biggest challenge?

I had experience on Java, because of some results, I need to code in C, is it difficult to switch from Java to C? And what is the biggest different between these two languages?
Without question, first and foremost, it's the manual memory management.
Second is that C has no objects so C code will tend to be structured very differently to Java code.
Edit: a little anecdote: back 15 or so years ago when it was common to log on to your local ISP on a UNIX command prompt when PPP was still pretty new and when university campuses still had an abundance of dumb terminals to UNIX servers, many had a program called fortune that would run when we logged on and output a random geeky platitude. I literally laughed out loud one day when I logged in an read:
C -- a programming language that
combines the power of assembly
language with the flexibility of
assembly language.
It's funny because it's true: C is the assembly language of modern computing. That's not a criticism, merely an observation.
Perhaps the most difficult concept is learning how to handle pointers and memory management yourself. Java substantially abstracts many concepts related to pointers, but in C, you'll have to understand how pointers are related to one another, and to the other concepts in the language.
Besides pointers and memory management, C is not an object-oriented language. You can organize your code to meet some object based concepts, but you will miss some features like inheritance, interfaces and polimorphism.
Java has a massive standard library, while C's is tiny. You'll find yourself re-inventing every kind of wheel over and over again. Even if you're writing the fifteenth linked list library of your career, you might very well make the same mistakes you've made before.
C has no standard containers besides arrays, few algorithms, no standard access to networking, graphics, web anything, xml anything, and so on. You have to really know what you're doing not to accidentally invoke undefined behaviour, cause memory corruption, resource leaks, crashes, etc. It's not for tourists.
Good luck.
Apart from Standard C being a non-OO language, the standard library being very small (and in some places, just plain bad), the goriness of manual memory handling, the complete lack of threading utilities (or even awareness of multithreading), the lax "type system" and being built for single-byte character sets, I think the greatest conceptual difference is that you have to have a clear notion of ownership of objects (or memory chunks, as it becomes in C).
It's always good practice to specify ownership of objects, but for non-GCed languages it is paramount. When you pass a pointer to another function, will that function assume ownership of the pointer or will it just "borrow" it from you for the duration of the call? When you write a function taking a pointer argument, does it make sense for you to assume ownership of the pointer, or does the pointee continue living after the function terminates?
People have covered the big differences: memory management, pointers, no fancy objects (just plain structs). So I will list a couple of minor things:
You have to declare things at the beginning of a block of code, not just when you want to use them for the first time
Automatic type conversion between pointers, booleans, ints and just about anything. This is typical C code: if (!ptr) { /* null pointer detected */ }
No bounds-checking in any sense, and fancy pointer arithmetic allowed. It is explicitly legal to reference things outside their bounds: ptr2 = ptr + 10; ptr2[-10] ++; is equivalent to ptr[0] ++;.
Strings are zero-terminated char arrays. Forgetting this and the previous point causes all sorts of bugs and security holes.
Headers should be separated from implementation code (.h and .c), and must explicitly reference each other via #include statements, avoiding any circular dependencies. There is a preprocessor, and compile-time macros (such as #includes) are a vital part of the language.
And finally -- C has a goto statement. But, if you ever use it, Dijkstra will rise from the grave and haunt you.
Java is a garbage collected environment, C is not. C has pointers, Java does not. Java methods use pass by reference more often, implicitly, whereas C you must explicitly indicate when you are passing by reference. Lot more considerations to be sure.

What is the reason for these PMD rules?

DataflowAnomalyAnalysis: Found
'DD'-anomaly for variable 'variable'
(lines 'n1'-'n2').
DataflowAnomalyAnalysis: Found
'DU'-anomaly for variable 'variable'
(lines 'n1'-'n2').
DD and DU sound familiar...I want to say in things like testing and analysis relating to weakest pre and post conditions, but I don't remember the specifics.
NullAssignment: Assigning an Object to
null is a code smell. Consider
refactoring.
Wouldn't setting an object to null assist in garbage collection, if the object is a local object (not used outside of the method)? Or is that a myth?
MethodArgumentCouldBeFinal: Parameter
'param' is not assigned and could be
declared final
LocalVariableCouldBeFinal: Local
variable 'variable' could be declared
final
Are there any advantages to using final parameters and variables?
LooseCoupling: Avoid using
implementation types like
'LinkedList'; use the interface
instead
If I know that I specifically need a LinkedList, why would I not use one to make my intentions explicitly clear to future developers? It's one thing to return the class that's highest up the class path that makes sense, but why would I not declare my variables to be of the strictest sense?
AvoidSynchronizedAtMethodLevel: Use
block level rather than method level
synchronization
What advantages does block-level synchronization have over method-level synchronization?
AvoidUsingShortType: Do not use the
short type
My first languages were C and C++, but in the Java world, why should I not use the type that best describes my data?
DD and DU anomalies (if I remember correctly—I use FindBugs and the messages are a little different) refer to assigning a value to a local variable that is never read, usually because it is reassigned another value before ever being read. A typical case would be initializing some variable with null when it is declared. Don't declare the variable until it's needed.
Assigning null to a local variable in order to "assist" the garbage collector is a myth. PMD is letting you know this is just counter-productive clutter.
Specifying final on a local variable should be very useful to an optimizer, but I don't have any concrete examples of current JITs taking advantage of this hint. I have found it useful in reasoning about the correctness of my own code.
Specifying interfaces in terms of… well, interfaces is a great design practice. You can easily change implementations of the collection without impacting the caller at all. That's what interfaces are all about.
I can't think of many cases where a caller would require a LinkedList, since it doesn't expose any API that isn't declared by some interface. If the client relies on that API, it's available through the correct interface.
Block level synchronization allows the critical section to be smaller, which allows as much work to be done concurrently as possible. Perhaps more importantly, it allows the use of a lock object that is privately controlled by the enclosing object. This way, you can guarantee that no deadlock can occur. Using the instance itself as a lock, anyone can synchronize on it incorrectly, causing deadlock.
Operands of type short are promoted to int in any operations. This rule is letting you know that this promotion is occurring, and you might as well use an int. However, using the short type can save memory, so if it is an instance member, I'd probably ignore that rule.
DataflowAnomalyAnalysis: Found
'DD'-anomaly for variable 'variable'
(lines 'n1'-'n2').
DataflowAnomalyAnalysis: Found
'DU'-anomaly for variable 'variable'
(lines 'n1'-'n2').
No idea.
NullAssignment: Assigning an Object to
null is a code smell. Consider
refactoring.
Wouldn't setting an object to null assist in garbage collection, if the object is a local object (not used outside of the method)? Or is that a myth?
Objects in local methods are marked to be garbage collected once the method returns. Setting them to null won't do any difference.
Since it would make less experience developers what is that null assignment all about it may be considered a code smell.
MethodArgumentCouldBeFinal: Parameter
'param' is not assigned and could be
declared final
LocalVariableCouldBeFinal: Local
variable 'variable' could be declared
final
Are there any advantages to using final parameters and variables?
It make clearer that the value won't change during the lifecycle of the object.
Also, if by any chance someone try to assign a value, the compiler will prevent this coding error at compile type.
consider this:
public void businessRule( SomeImportantArgument important ) {
if( important.xyz() ){
doXyz();
}
// some fuzzy logic here
important = new NotSoImportant();
// add for/if's/while etc
if( important.abc() ){ // <-- bug
burnTheHouse();
}
}
Suppose that you're assigned to solve some mysterious bug that from time to time burns the house.
You know what wast the parameter used, what you don't understand is WHY the burnTHeHouse method is invoked if the conditions are not met ( according to your findings )
It make take you a while to findout that at some point in the middle, somone change the reference, and that you are using other object.
Using final help to prevent this kind of things.
LooseCoupling: Avoid using
implementation types like
'LinkedList'; use the interface
instead
If I know that I specifically need a LinkedList, why would I not use one to make my intentions explicitly clear to future developers? It's one thing to return the class that's highest up the class path that makes sense, but why would I not declare my variables to be of the strictest sense?
There is no difference, in this case. I would think that since you are not using LinkedList specific functionality the suggestion is fair.
Today, LinkedList could make sense, but by using an interface you help your self ( or others ) to change it easily when it wont.
For small, personal projects this may not make sense at all, but since you're using an analyzer already, I guess you care about the code quality already.
Also, helps less experienced developer to create good habits. [ I'm not saying you're one but the analyzer does not know you ;) ]
AvoidSynchronizedAtMethodLevel: Use
block level rather than method level
synchronization
What advantages does block-level synchronization have over method-level synchronization?
The smaller the synchronized section the better. That's it.
Also, if you synchronize at the method level you'll block the whole object. When you synchronize at block level, you just synchronize that specific section, in some situations that's what you need.
AvoidUsingShortType: Do not use the
short type
My first languages were C and C++, but in the Java world, why should I not use the type that best describes my data?
I've never heard of this, and I agree with you :) I've never use short though.
My guess is that by not using it, you'll been helping your self to upgrade to int seamlessly.
Code smells are more oriented to code quality than performance optimizations. So the advice are given for less experienced programmers and to avoid pitfalls, than to improve program speed.
This way, you could save a lot of time and frustrations when trying to change the code to fit a better design.
If it the advise doesn't make sense, just ignore them, remember, you are the developer at charge, and the tool is just that a tool. If something goes wrong, you can't blame the tool, right?
Just a note on the final question.
Putting "final" on a variable results in it only be assignable once. This does not necessarily mean that it is easier to write, but it most certainly means that it is easier to read for a future maintainer.
Please consider these points:
any variable with a final can be immediately classified in "will not change value while watching".
by implication it means that if all variables which will not change are marked with final, then the variables NOT marked with final actually WILL change.
This means that you can see already when reading through the definition part which variables to look out for, as they may change value during the code, and the maintainer can spend his/her efforts better as the code is more readable.
Wouldn't setting an object to null
assist in garbage collection, if the
object is a local object (not used
outside of the method)? Or is that a
myth?
The only thing it does is make it possible for the object to be GCd before the method's end, which is rarely ever necessary.
Are there any advantages to using final parameters and variables?
It makes the code somewhat clearer since you don't have to worry about the value being changed somwhere when you analyze the code. More often then not you don't need or want to change a variable's value once it's set anyway.
If I know that I specifically need a
LinkedList, why would I not use one to
make my intentions explicitly clear to
future developers?
Can you think of any reason why you would specifically need a
LinkedList?
It's one thing to
return the class that's highest up the
class path that makes sense, but why
would I not declare my variables to be
of the strictest sense?
I don't care much about local variables or fields, but if you declare a method parameter of type LinkedList, I will hunt you down and hurt you, because it makes it impossible for me to use things like Arrays.asList() and Collections.emptyList().
What advantages does block-level synchronization have over method-level synchronization?
The biggest one is that it enables you to use a dedicated monitor object so that only those critical sections are mutually exclusive that need to be, rather than everything using the same monitor.
in the Java world, why should I not
use the type that best describes my
data?
Because types smaller than int are automtically promoted to int for all calculations and you have to cast down to assign anything to them. This leads to cluttered code and quite a lot of confustion (especially when autoboxing is involved).
AvoidUsingShortType: Do not use the short type
List item
short is 16 bit, 2's compliment in java
a short mathmatical operaion with anything in the Integer family outside of another short will require a runtime sign extension conversion to the larger size. operating against a floating point requires sign extension and a non-trivial conversion to IEEE-754.
can't find proof, but with a 32 bit or 64 bit register, you're no longer saving on 'processor instructions' at the bytecode level. You're parking a compact car in a a semi-trailer's parking spot as far as the processor register is concerned.
If your are optimizing your project at the byte code level, wow. just wow. ;P
I agree on the design side of ignoring this pmd warning, just weigh accurately describing your object with a 'short' versus the incurred performance conversions.
in my opinion, the incurred performance hits are miniscule on most machines. ignore the error.
What advantages does block-level
synchronization have over method-level
synchronization?
Synchronize a method is like do a synchronize(getClass()) block, and blocks all the class.
Maybe you don't want that

Does Java need closures?

I've been reading a lot lately about the next release of Java possibly supporting closures. I feel like I have a pretty firm grasp on what closures are, but I can't think of a solid example of how they would make an Object-Oriented language "better". Can anyone give me a specific use-case where a closure would be needed (or even preferred)?
As a Lisp programmer I would wish that the Java community understands the following difference: functions as objects vs. closures.
a) functions can be named or anonymous. But they can also be objects of themselves. This allows functions to be passed around as arguments, returned from functions or stored in data structures. This means that functions are first class objects in a programming language.
Anonymous functions don't add much to the language, they just allow you to write functions in a shorter way.
b) A closure is a function plus a binding environment. Closures can be passed downwards (as parameters) or returned upwards (as return values). This allows the function to refer to variables of its environment, even if the surrounding code is no longer active.
If you have a) in some language, then the question comes up what to do about b)? There are languages that have a), but not b). In the functional programming world a) (functions) and b (functions as closures) is nowadays the norm. Smalltalk had a) (blocks are anonymous functions) for a long time, but then some dialects of Smalltalk added support for b) (blocks as closures).
You can imagine that you get a slightly different programming model, if you add functions and closures to the language.
From a pragmatic view, the anonymous function adds some short notation, where you pass or invoke functions. That can be a good thing.
The closure (function plus binding) allows you for example to create a function that has access to some variables (for example to a counter value). Now you can store that function in an object, access it and invoke it. The context for the function object is now not only the objects it has access to, but also the variables it has access to via bindings. This is also useful, but you can see that variable bindings vs. access to object variables now is an issue: when should be something a lexical variable (that can be accessed in a closure) and when should it be a variable of some object (a slot). When should something be a closure or an object? You can use both in the similar ways. A usual programming exercise for students learning Scheme (a Lisp dialect) is to write a simple object system using closures.
The result is a more complicated programming language and a more complicated runtime model. Too complicated?
They don't make an Object-Oriented language better. They make practical languages more practical.
If you're attacking a problem with the OO hammer - represent everything as interactions between objects - then a closure makes no sense. In a class-based OO language, closures are the smoke-filled back rooms where stuff gets done but no one talks about it afterwards. Conceptually, it is abhorrent.
In practice, it's extremely convenient. I don't really want to define a new type of object to hold context, establish the "do stuff" method for it, instantiate it, and populate the context... i just want to tell the compiler, "look, see what i have access to right now? That's the context i want, and here's the code i want to use it for - hold on to this for me 'till i need it".
Fantastic stuff.
The most obvious thing would be a pseudo-replacement for all those classes that just have a single method called run() or actionPerformed() or something like that. So instead of creating a Thread with a Runnable embedded, you'd use a closure instead. Not more powerful than what we've got now, but much more convenient and concise.
So do we need closures? No. Would they be nice to have? Sure, as long as they don't feel bolted on, as I fear they would be.
I suppose for supporting core functional programming concepts, you need closures. Makes the code more elegant and composable with the support for closures. Also, I like the idea of passing around lines of code as parameters to functions.
There are some very useful 'higher order functions' which can do operations on lists using closures. Higher order functions are functions having 'function objects' as parameters.
E.g. it is a very common operation to apply some transformation to every element in a list. This higher order function is commonly called 'map' or 'collect'. (See the *. spread operator of Groovy).
For example to square each element in a list without closures you would probably write:
List<Integer> squareInts(List<Integer> is){
List<Integer> result = new ArrayList<Integer>(is.size());
for (Integer i:is)
result.add(i*i);
return result;
}
Using closures and map and the proposed syntax, you could write it like that:
is.map({Integer i => i*i})
(There is a possible performance problem here regarding boxing of primitive types.)
As explained by Pop Catalin there is another higher order function called 'select' or 'filter': It can be used to get all the elements in a list complying to some condition. For example:
Instead of:
void onlyStringsWithMoreThan4Chars(List<String> strings){
List<String> result = new ArrayList<String>(str.size()); // should be enough
for (String str:strings)
if (str.length() > 4) result.add(str);
return result;
}
Instead you could write something like
strings.select({String str => str.length() > 4});
using the proposal.
You might look at the Groovy syntax, which is an extension of the Java language to support closures right now. See the chapter on collections of the Groovy User Guide for more examples what to do with closures.
A remark:
There is perhaps some clarification needed regarding the term 'closure'. What I've shown above are strictly spoken no closures. They are just 'function objects'.
A closure is everything which can capture - or 'close over' - the (lexical) context of the code surrounding it. In that sense there are closures in Java right now, i.e. anonymous classes:
Runnable createStringPrintingRunnable(final String str){
return new Runnable(){
public void run(){
System.out.println(str); // this accesses a variable from an outer scope
}
};
}
Java doesn't need closures, an Object oriented language can do everything a closure does using intermediate objects to store state or do actions (in Java's case inner classes).
But closures are desirable as a feature because they greatly simplify the code and increase readability and as a consequence the maintainability of the code.
I'm no Java specialist but I'm using C# 3.5 and closures are one of my favorite features of the language, for example take the following statement as an example:
// Example #1 with closures
public IList<Customer> GetFilteredCustomerList(string filter) {
//Here a closure is created around the filter parameter
return Customers.Where( c => c.Name.Contains(filter)).ToList();
}
now take an equivalent example that doesn't use closures
//Example #2 without closures, using just basic OO techniques
public IList<Customer> GetFilteredCustomerList(string filter) {
return new Customers.Where( new CustomerNameFiltrator(filter));
}
...
public class CustomerNameFiltrator : IFilter<Customer> {
private string _filter;
public CustomerNameFiltrator(string filter) {
_filter = filter;
}
public bool Filter(Customer customer) {
return customer.Name.Contains( _filter);
}
}
I know this is C# and not Java but the idea is the same, closures are useful for conciseness, and make code shorter and more readable. Behind the scenes, the closures of C# 3.5 do something that's looks very similar to example #2 meaning the compiler creates a private class behind the scenes and passes the 'filter' parameter to it.
Java doesn't need closures to work, as a developer you don't need them either, but, they are useful and provide benefits so that means that they are desirable in a language that is a production language and one of it's goals is productivity.
I've been reading a lot lately about the next release of Java possibly supporting closures. I feel like I have a pretty firm grasp on what closures are, but I can't think of a solid example of how they would make an Object-Oriented language "better."
Well, most people who use the term "closure" actually mean "function object", and in this sense, function objects make it possible to write simpler code in certain circumstances such as when you need custom comparators in a sort function.
For example, in Python:
def reversecmp(x, y):
return y - x
a = [4, 2, 5, 9, 11]
a.sort(cmp=reversecmp)
This sorts the list a in reverse order by passing the custom comparison functoin reversecmp. The addition of the lambda operator makes things even more compact:
a = [4, 2, 5, 9, 11]
a.sort(cmp=lambda x, y : y - x)
Java does not have function objects, so it uses "functor classes" to simulate them. In Java you do the equivalent operation by implementing a custom version of the Comparator class, and passing that to the sort function:
class ReverseComparator implements Comparator {
public compare(Object x, Object y) {
return (Integer) y - (Integer) x;
}
...
List<Integer> a = Arrays.asList(4, 2, 5, 9, 11);
Collections.sort(a, new ReverseComparator());
As you can see, it gives the same effect as closures, but is clumsier and more verbose. However, the addition of anonymous inner classes obviates most of the pain:
List<Integer> a = Arrays.asList(4, 2, 5, 9, 11);
Comparator reverse = new Comparator() {
public Compare(Object x, Object y) {
return (Integer) y - (Integer) x;
}
}
Collections.sort(a, reverse);
So I would say that the combination of functor classes + anonymous inner classes in Java is sufficient to compensate for the lack of true function objects, making their addition unnecessary.
Java has had closures since 1.1, just in a very cumbersome and limited way.
They are often useful wherever you have a callback of some description. A common case is to abstract away control flow, leaving the interesting code to call an algoritm with a closure that has no external control flow.
A trivial example is for-each (although Java 1.5 already has that). Whilst you can implement a forEach method in Java as it stands, it's far too verbose to be useful.
An example which already makes sense with existing Java is implementing the "execute around" idiom, whereby resource acquisition and release is abstracted. For instance, file open and close can be done within try/finally, without the client code having to get the details right.
When closures finally arrive in Java, I will gleefully get rid of all my custom comparator classes.
myArray.sort( (a, b) => a.myProperty().compareTo(b.myProperty() );
...looks a helluva lot better than...
myArray.sort(new Comparator<MyClass>() {
public int compare(MyClass a, MyClass b) {
return a.myProperty().compareTo(b.myProperty();
}
});
A few people have said, or implied, that closures are just syntactic sugar - doing what you could already do with anonymous inner classes and making it more convenient to pass parameters in.
They are syntactic sugar in the same sense that Java is syntactic sugar for assembler (that "assembler" could be bytecode, for sake of argument). In other words they raise they level of abstraction, and this is an important concept.
Closures promote the concept of the function-as-object to a first class entity - one that increases the expressiveness of code, rather than cluttering it with even more boilerplate.
An example that's close to my heart has already been mentioned by Tom Hawtin - implementing the Execute Around idiom, which is just about the only way to get RAII into Java. I wrote a blog entry on exactly that subject a couple of years ago when I first heard closures might be coming.
Ironically, I think the very reason that closures would be good for Java (more expressiveness with less code) may be what rattles many Java advocates. Java has a mindset of "spell everything out the long way". That and the fact that closures are a nod towards a more functional way of doing things - which I also see as a Good Thing, but may water down the pure OO message that many in the Java community hold dear.
I have been thinking a lot about the topic of this very interesting question in
the last few days. First of all, if I have understood correctly, Java already has
some basic notion of closures (defined through anonymous classes) but the new feature
that is going to be introduced is the support for closures based on anonymous functions.
This extension will definitely make the language more expressive but I am not sure
if it really fits with the rest of the language.
Java has been designed as an object-oriented language with no support for functional programming: Will the new semantics be easy to understand? Java 6 does not even have functions, will Java 7 have anonymous functions but no "normal" functions?
My impression is that as new programming styles or paradigms like functional
programming become more popular, everyone wants to use them in their
favourite OOP language. This is understandable: one wants to continue to use
a language they're familiar with while adopting new features. But in this way
a language can become really complex and lose coherence.
So my attitude at the moment is to stick to Java 6 for OOP (I hope Java 6 will still
be supported for a while) and, in case I really get interested in doing OOP + FP,
to take a look at some other language like Scala (Scala was defined to be multi-
paradigm from the beginning and can be well integrated with Java) rather than switching
to Java 7.
I think Java owes its success to the fact that it combines a simple language with very
powerful libraries and tools, and I do not think that new features like closures will
make it a better programming language.
Now that JDK8 is about to be released there is more information available that can enrich the answers to this question.
Bria Goetz, language architect at Oracle, has published a series of papers (yet drafts) on the current state of lambda expressions in Java. It does also cover closures as they are planning to release them in the upcoming JDK, which should be code complete around January 2013 and should be released around midyear 2013.
The State of Lambda: in the first page or two this article attempts to answer the question presented here. Although I still found it short in arguments, but is is full of examples.
The State of Lambda - Libraries Edition: this is also very interesting because it covers advantages like lazy evaluation and parallelism.
The Translation of Lambda Expressions: which basically explains the desugaring process done by the Java compiler.
As a java developer who is trying to teach themselves lisp in an attempt to become a better programmer, I would say that I would like to see the Josh Block proposal for closures implemented. I find myself using anonymous inner classes to express things like what to do with each element of a list when aggregating some data. To would be nice to represent that as a closure, instead of having to create an abstract class.
Closures in an imperative language (examples: JavaScript, C#, the forthcoming C++ refresh) are not the same as anonymous inner classes. They need to be able to capture modifiable references to local variables. Java's inner classes can only capture local final variables.
Almost any language feature can be criticised as non-essential:
for, while, do are all just syntactic sugar over goto/if.
Inner classes are syntactic sugar over classes with a field pointing to the outer class.
Generics are syntactic sugar over casts.
Exactly the same "non-essential" argument should have blocked the inclusion of all the above features.
Java Closure Examples
Not only that benjismith, but I love how you can just do...
myArray.sort{ it.myProperty }
You only need the more detailed comparator you've shown when the natural language comparison of the property doesn't suit your needs.
I absolutely love this feature.
What about readability and maintainability...one-liner closures are harder to understand and debug, imo
Software has looong life and you can get people with rudimentary knowledge of the language to maintain it...So spread out logic better than one-liners for easy maintenance...You generally don't have a software star looking after software after its release...
You might want to look at Groovy, a language that's mostly compatible with Java, and runs on the JRE, but supports Closures.
The lack of binding in anonymous function [i.e. if the variables (and method arguments if there is an enclosing method) of the outer context are declared final then they are available but not otherwise], I don't quite understand what that restriction actually buys.
I use "final" profusely anyways. So, if my intent was to use the same objects inside the closure, I would indeed declare those objects final in the enclosing scope. But what would be wrong in letting the "closure [java a.i.c.]" just get a copy of the reference as if passed via a constructor (well that in fact is how it is done).
If the closure wants to overwrite the reference, so be it; it will do so without changing the copy that the enclosing scope sees.
If we argue that that would lead to unreadable code (e.g. maybe it's not straight-forward to see what the object reference is at the time of the constructor call for the a.i.c.), then how about at least making the syntax less verbose? Scala? Groovy?

Categories