I am keen to look into Scala, and have one basic question I can't seem to find an answer to:
in general, is there a difference in performance and memory usage between Scala and Java?
Scala makes it very easy to use enormous amounts of memory without realizing it. This is usually very powerful, but occasionally can be annoying. For example, suppose you have an array of strings (called array), and a map from those strings to files (called mapping). Suppose you want to get all files that are in the map and come from strings of length greater than two. In Java, you might write:
int n = 0;
for (String s : array) {
    if (s.length() > 2 && mapping.containsKey(s)) n++;
}
String[] bigEnough = new String[n];
n = 0;
for (String s : array) {
    if (s.length() <= 2) continue;
    bigEnough[n++] = mapping.get(s);
}
Whew! Hard work. In Scala, the most compact way to do the same thing is:
val bigEnough = array.filter(_.length > 2).flatMap(mapping.get)
Easy! But, unless you're fairly familiar with how the collections work, what you might not realize is that this way of doing things creates an extra intermediate array (with filter) and an extra object for every element of the array (with mapping.get, which returns an Option). It also creates two function objects (one for the filter and one for the flatMap), though that is rarely a major issue since function objects are small.
So basically, the memory usage is, at a primitive level, the same. But Scala's libraries have many powerful methods that let you create enormous numbers of (usually short-lived) objects very easily. The garbage collector is usually pretty good with that kind of garbage, but if you go in completely oblivious to what memory is being used, you'll probably run into trouble sooner in Scala than Java.
Note that the Computer Language Benchmarks Game Scala code is written in a rather Java-like style in order to get Java-like performance, and thus has Java-like memory usage. You can do this in Scala: if you write your code to look like high-performance Java code, it will be high-performance Scala code. (You may be able to write it in a more idiomatic Scala style and still get good performance, but it depends on the specifics.)
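To make that concrete, here is a sketch (mine, not from the original answer) of what such Java-like Scala can look like for the same task, assuming array is an Array[String] and mapping is a scala.collection.Map[String, java.io.File]:

import scala.collection.mutable.ArrayBuffer

val buf = new ArrayBuffer[java.io.File](array.length)
var i = 0
while (i < array.length) {
  val s = array(i)
  // contains + apply means two lookups, but no Option and no intermediate array is allocated
  if (s.length > 2 && mapping.contains(s)) buf += mapping(s)
  i += 1
}
val bigEnough = buf.toArray   // Array[java.io.File]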
I should add that per amount of time spent programming, my Scala code is usually faster than my Java code since in Scala I can get the tedious not-performance-critical parts done with less effort, and spend more of my attention optimizing the algorithms and code for the performance-critical parts.
I'm a new user, so I'm not able to add a comment to Rex Kerr's answer above (allowing new users to "answer" but not "comment" is a very odd rule btw).
I signed up simply to respond to the "phew, Java is so verbose and such hard work" insinuation of Rex's popular answer above. While you can of course write more concise Scala code, the Java example given is clearly bloated. Most Java developers would code something like this:
List<File> bigEnough = new ArrayList<File>();
for (String s : array) {
    if (s.length() > 2 && mapping.get(s) != null) {
        bigEnough.add(mapping.get(s));
    }
}
And of course, if we are going to pretend that Eclipse doesn't do most of the actual typing for you and that every character saved really makes you a better programmer, then you could code this:
List b=new ArrayList();
for(String s:array)
if(s.length()>2 && mapping.get(s) != null) b.add(mapping.get(s));
Now not only did I save the time it took me to type full variable names and curly braces (freeing me to spend 5 more seconds to think deep algorithmic thoughts), but I can also enter my code in obfuscation contests and potentially earn extra cash for the holidays.
Write your Scala like Java, and you can expect almost identical bytecode to be emitted - with almost identical metrics.
Write it more "idiomatically", with immutable objects and higher-order functions, and it'll be a bit slower and a bit larger. The one exception to this rule of thumb is when generic code uses type parameters carrying the @specialized annotation; this generates even larger bytecode, but it can outpace Java's performance by avoiding boxing/unboxing.
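As a small illustrative sketch (mine, not from this answer) of what specialization looks like:

// Without @specialized, Box[Int] would box every Int into java.lang.Integer;
// with it, the compiler also emits an Int-primitive variant of the class.
class Box[@specialized(Int) T](val value: T) {
  def map[@specialized(Int) U](f: T => U): Box[U] = new Box(f(value))
}

val b = new Box(42)        // uses the Int-specialized variant, no boxing of 42
val doubled = b.map(_ * 2)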
Also worth mentioning is the fact that more memory / less speed is an inevitable trade-off when writing code that can be run in parallel. Idiomatic Scala code is far more declarative in nature than typical Java code, and is often a mere 4 characters (.par) away from being fully parallel.
So if
Scala code takes 1.25x longer than Java code in a single thread
It can be easily split across 4 cores (now common even in laptops)
for a parallel run time of (1.25 / 4 =) 0.3125x the original Java
Would you then say that the Scala code is now comparatively 25% slower, or 3x faster?
The correct answer depends on exactly how you define "performance" :)
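For illustration only, here is a sketch of what that four-character change looks like (the data and functions here are made up; the 1.25x and 4-core figures above are the answerer's, not measured from this code):

val data = (1 to 1000000).toVector
def expensiveTransform(x: Int): Int = x * x
def isInteresting(x: Int): Boolean = x % 7 == 0

// Sequential: runs on one core
val result = data.map(expensiveTransform).filter(isInteresting)

// Parallel: the same pipeline spread across the available cores
// (on Scala 2.13+ this needs the separate scala-parallel-collections module)
val resultPar = data.par.map(expensiveTransform).filter(isInteresting).seq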
Computer Language Benchmarks Game:
Speed test java/scala 1.71/2.25
Memory test java/scala 66.55/80.81
So, these benchmarks say that Java is roughly 24% faster and Scala uses roughly 21% more memory.
All-in-all it's no big deal and should not matter in real world apps, where most of the time is consumed by database and network.
Bottom line: If Scala makes you and your team (and people taking project over when you leave) more productive, then you should go for it.
Others have answered this question with respect to tight loops although there seems to be an obvious performance difference between Rex Kerr's examples that I have commented on.
This answer is really targeted at people who might investigate the need for tight-loop optimisation as a design flaw.
I am relatively new to Scala (about a year or so) but the feel of it, thus far, is that it allows you to defer many aspects of design, implementation and execution relatively easily (with enough background reading and experimentation :)
Deferred Design Features:
Abstract Types
Explicitly Typed Self References
Views
Mixins
Deferred Implementation Features:
Variance Annotations
Compound Types
Local Type Inference
Deferred Execution Features: (sorry, no links)
Thread-safe lazy values
Pass-by-name
Monadic stuff
These features, to me, are the ones that help us to tread the path to fast, tight applications.
Rex Kerr's examples differ in which aspects of execution are deferred. In the Java example, allocation of memory is deferred until its size is calculated, whereas the Scala example defers the mapping lookup. To me, they seem like completely different algorithms.
Here's what I think is more of an apples to apples equivalent for his Java example:
val bigEnough = array.collect {
  case k if k.length > 2 && mapping.contains(k) => mapping(k)
}
No intermediary collections, no Option instances etc.
This also preserves the collection type so bigEnough's type is Array[File] - Array's collect implementation will probably be doing something along the lines of what Mr Kerr's Java code does.
The deferred design features I listed above would also allow Scala's collection API developers to implement that fast Array-specific collect implementation in future releases without breaking the API. This is what I'm referring to with treading the path to speed.
Also:
val bigEnough = array.withFilter(_.length > 2).flatMap(mapping.get)
The withFilter method that I've used here instead of filter fixes the intermediate collection problem but there is still the Option instance issue.
One example of simple execution speed in Scala is with logging.
In Java we might write something like:
if (logger.isDebugEnabled())
logger.debug("trace");
In Scala, this is just:
logger.debug("trace")
because the message parameter to debug in Scala has the type "=> String" which I think of as a parameter-less function that executes when it is evaluated, but which the documentation calls pass-by-name.
EDIT {
Functions in Scala are objects, so there is an extra object here. For my work, the weight of a trivial object is a price worth paying to remove the possibility of a log message being needlessly evaluated.
}
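A minimal, self-contained sketch (mine; the Logger trait and names here are made up) of how such a by-name debug signature can be written:

trait Logger {
  def isDebugEnabled: Boolean

  // `msg: => String` is a by-name parameter: the argument expression is only
  // evaluated if and when `msg` is used inside the method body.
  def debug(msg: => String): Unit =
    if (isDebugEnabled) println(s"DEBUG: $msg")
}

object QuietLogger extends Logger { val isDebugEnabled = false }

// The interpolation (and anything expensive inside it) is never evaluated here,
// because debug logging is disabled.
QuietLogger.debug(s"trace: result=${(1 to 1000000).sum}")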
This doesn't make the code faster but it does make it more likely to be faster and we're less likely to have the experience of going through and cleaning up other people's code en masse.
To me, this is a consistent theme within Scala.
Hard-coded examples fail to capture why Scala is faster, though they do hint at it a bit.
I feel that it's a combination of code re-use and the ceiling of code quality in Scala.
In Java, awesome code is often forced to become an incomprehensible mess and so isn't really viable within production quality APIs as most programmers wouldn't be able to use it.
I have high hopes that Scala could allow the einsteins among us to implement far more competent APIs, potentially expressed through DSLs. The core APIs in Scala are already far along this path.
@higherkinded's presentation on the subject, Scala Performance Considerations, does some Java/Scala comparisons.
Tools:
ScalaMeter
scala-benchmarking-template
Great blogpost:
Nanotrusting the Nanotime
Java and Scala both compile down to JVM bytecode, so the difference isn't that big. The best comparison you can get is probably on the Computer Language Benchmarks Game, which essentially says that Java and Scala both have the same memory usage. Scala is only slightly slower than Java on some of the benchmarks listed, but that could simply be because the implementations of the programs are different.
Really though, they're both so close it's not worth worrying about. The productivity increase you get by using a more expressive language like Scala is worth so much more than minimal (if any) performance hit.
The Java example is really not an idiom for typical application programs.
Such optimized code might be found in a system library method. But then it would use an array of the right type, i.e. File[], and would not risk an ArrayIndexOutOfBoundsException (the two loops use different filter conditions for counting and adding).
My version would be (always (!) with curly braces, because I don't like to spend an hour searching for a bug that was introduced by saving the 2 seconds it takes to hit a single key in Eclipse):
List<File> bigEnough = new ArrayList<File>();
for (String s : array) {
    if (s.length() > 2) {
        File file = mapping.get(s);
        if (file != null) {
            bigEnough.add(file);
        }
    }
}
But I could bring you a lot of other ugly Java code examples from my current project. I tried to avoid the common copy&modify style of coding by factoring out common structures and behaviour.
In my abstract DAO base class I have an abstract inner class for the common caching mechanism. For every concrete model object type there is a subclass of the abstract DAO base class, in which the inner class is subclassed to provide an implementation for the method which creates the business object when it is loaded from the database. (We can not use an ORM tool because we access another system via a proprietary API.)
This subclassing and instantiation code is not at all clear in Java and would be very readable in Scala.
Consider the following pseudo-C++ code:
vector v;
... filling vector here and doing stuff ...
assert(is_sorted(v));
auto x = std::find(v, elementToSearchFor);
find has linear runtime, because it's called on a vector, which can be unsorted. But at that line in that specific program we know that either: The program is incorrect (as in: it doesn't run to the end if the assertion fails) or the vector to search for is sorted, therefore allowing a binary search find with O(log n). Optimizing it into a binary search should be done by a good compiler.
This is only the simplest worst-case behavior I have found so far (more complex assertions may allow even more optimization).
Do some compilers do this? If yes, which ones? If not, why don't they?
Appendix: Some higher-level languages (especially functional ones) may do this easily, so this is more about C/C++/Java/similar languages.
Rice's Theorem basically states that non-trivial properties of code cannot be computed in general.
The relationship between is_sorted being true and a faster-than-linear search being possible is a non-trivial property of the program after is_sorted is asserted.
You can arrange for explicit connections between is_sorted and the ability to use various faster algorithms. The way you communicate this information in C++ to the compiler is via the type system. Maybe something like this:
template<typename C>
struct container_is_sorted {
C c;
// forward a bunch of methods to `c`.
};
Then, you'd invoke a container-based algorithm that would either use a linear search on most containers, or a sorted search on containers wrapped in container_is_sorted.
This is a bit awkward in C++. In a system where variables could carry different compiler-known type-like information at different points in the same stream of code (types that mutate under operations) this would be easier.
Ie, suppose types in C++ had a sequence of tags like int{positive, even} you could attach to them, and you could change the tags:
int x;
make_positive(x);
Operations on a type that did not actively preserve a tag would automatically discard it.
Then assert( {is sorted}, foo ) could attach the tag {is sorted} to foo. Later code could then consume foo and have that knowledge. If you inserted something into foo, it would lose the tag.
Such tags might be run time (that has cost, however, so unlikely in C++), or compile time (in which case, the tag-state of a given variable must be statically determined at a given location in the code).
In C++, due to the awkwardness of such stuff, we instead by habit simply note it in comments and/or use the full type system to tag things (rvalue vs lvalue references are an example that was folded into the language proper).
So the programmer is expected to know it is sorted, and invoke the proper algorithm given that they know it is sorted.
Well, there are two parts to the answer.
First, let's look at assert:
7.2 Diagnostics <assert.h>
1 The header defines the assert and static_assert macros and refers to another macro, NDEBUG, which is not defined by <assert.h>. If NDEBUG is defined as a macro name at the point in the source file where <assert.h> is included, the assert macro is defined simply as
#define assert(ignore) ((void)0)
The assert macro is redefined according to the current state of NDEBUG each time that <assert.h> is included.
2 The assert macro shall be implemented as a macro, not as an actual function. If the macro definition is suppressed in order to access an actual function, the behavior is undefined.
Thus, there is nothing left in release-mode to give the compiler any hint that some condition can be assumed to hold.
Still, there is nothing stopping you from redefining assert with an implementation-defined __assume in release-mode yourself (take a look at __builtin_unreachable() in clang / gcc).
Let's assume you have done so. Now, the condition tested could be really complicated and expensive. Thus, you really want to annotate it so it does not ever result in any run-time work. Not sure how to do that.
Let's grant that your compiler even allows that, for arbitrary expressions.
The next hurdle is recognizing what the expression actually tests, and how that relates to the code as written and any potentially faster, but under the given assumption equivalent, code.
This last step results in an immense explosion of compiler-complexity, by either having to create an explicit list of all those patterns to test or building a hugely-complicated automatic analyzer.
That's no fun, and just about as complicated as building SkyNET.
Also, you really do not want to use an asymptotically faster algorithm on a data-set which is too small for asymptotic time to matter. That would be a pessimization, and you just about need precognition to avoid such.
Assertions are (usually) compiled out in the final code. Meaning, among other things, that the code could (silently) fail (by retrieving the wrong value) due to such an optimization, if the assertion was not satisfied.
If the programmer (who put the assertion there) knew that the vector was sorted, why didn't he use a different search algorithm? What's the point in having the compiler second-guess the programmer in this way?
How does the compiler know which search algorithm to substitute for which, given that they all are library routines, not a part of the language's semantics?
You said "the compiler". But compilers are not there for the purpose of writing better algorithms for you. They are there to compile what you have written.
What you might have asked is whether the library function std::find should be implemented to detect whether it can do something better than a linear search. In reality it might be possible if the user has passed in std::set iterators or even std::unordered_set iterators, and the STL implementer knows the details of those iterators and can make use of them, but not in general and not for vector.
assert itself only applies in debug mode, and optimisations are normally wanted in release mode. Also, a failed assert aborts the program; it does not switch to a different library algorithm.
Essentially, there are collections provided for faster lookup and it is up to the programmer to choose it and not the library writer to try to second guess what the programmer really wanted to do. (And in my opinion even less so for the compiler to do it).
In the narrow sense of your question, the answer is that they do it if they can, but mostly they can't, because the language isn't designed for it and assert expressions are too complicated.
If assert() is implemented as a macro (as it is in C++), and it has not been disabled (by defining NDEBUG in C++), and the expression can be evaluated at compile time (or can be data-traced), then the compiler will apply its usual optimisations. That doesn't happen often.
In most cases (and certainly in the example you gave) the relationship between the assert() and the desired optimisation is far beyond what a compiler can do without assistance from the language. Given the very low level of meta-programming capability in C++ (and Java) the ability to do this is quite limited.
In the wider sense I think what you're really asking for is a language in which the programmer can make assertions about the intention of the code, from which the compiler can choose between different translations (and algorithms). There have been experimental languages attempting to do that, and Eiffel had some features in that direction, but I'm not aware of any mainstream compiled languages that can do it.
Optimizing it into a binary search should be done by a good compiler.
No! A linear search results in a much more predictable branch. If the array is short enough, linear search is the right thing to do.
Apart from that, even if the compiler wanted to, the list of ideas and notions it would have to know about would be immense and it would have to do nontrivial logic on them. This would get very slow. Compilers are engineered to run fast and spit out decent code.
You might spend some time playing with formal verification tools whose job is to figure out everything they can about the code they're fed in, which asserts can trip, and so forth. They're often built without the same speed requirements compilers have and consequently they're much better at figuring things out about programs. You'll probably find that reasoning rigorously about code is rather harder than it looks at first sight.
I just started learning Scala, and I'm having the time of my life. Today I also just heard about Scala for Android, and I'm totally hooked.
However, considering that Scala is kind of like a superset of Java in that it is Java++ (I mean Java with more stuff included) I am wondering exactly how the Scala code would work in Android?
Also, would the performance of an Android application be impacted if written with Scala? That is, if there's any extra work needed to interpret Scala code.
To expand a little bit on @Aneesh's answer -- yes, there is no extra work to interpret Scala bytecode, because it is exactly the same bits and pieces as Java bytecode.
Note that there is also a Java bytecode => Dalvik bytecode step when you run code in Android.
But using the same bricks, one person can build a bikeshed and another can build a town hall. For example, because the language encourages immutability, Scala generates a lot of short-lived objects. For mature JVMs like HotSpot this hasn't been a big deal for about a decade, but for Dalvik it is a problem (prior to recent versions, object pooling and tight reuse of already created objects was one of the most common performance tips, even for Java).
Next, writing val isn't the same as writing final Foo bar = .... Internally, this code is represented as a field + getter (unless you prefix val with private[this], which is translated to a plain final field). var is translated into field + getter + setter. Why is it important?
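To make the translation concrete, here is a small sketch (mine) of the three cases; why it matters follows below.

class Sample {
  val a = 1                // compiles to: private final field + public getter a()
  var b = 2                // compiles to: private field + getter b() + setter b_=(...)
  private[this] val c = 3  // compiles to: a plain final field, accessed directly
}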
Old versions of Android (prior to 2.2) had no JIT at all, so going through those accessors turned out to be roughly a 3x-7x penalty compared to direct field access. And finally, while Google instructs you to avoid inner classes and prefer packages instead, Scala will create a lot of inner classes even if you don't write any yourself. Consider this code:
object Foo extends App {
  List(1,2,3,4)
    .map(x => x * 2)
    .filter(x => x % 3 == 0)
    .foreach(print)
}
How many inner classes will be created? You could say none, but if you run scalac you will see:
Foo$$anonfun$1.class // Map
Foo$$anonfun$2.class // Filter
Foo$$anonfun$3.class // Foreach
Foo$.class // Companion object class, because you've used `object`
Foo$delayedInit$body.class // Delayed init functionality that is used by App trait
Foo.class // Actual class
So there will be some penalty, especially if you write idiomatic Scala code with a lot of immutability and syntax sugar. The thing is that it highly depends on your deployment (do you target newer devices?) and actual code patterns (you always can go back to Java or at least write less idiomatic code in performance critical spots) and some of the problems mentioned there will be addressed by language itself (the last one) in the next versions.
See also Stack Overflow question Is using Scala on Android worth it? Is there a lot of overhead? Problems? for possible problems that you may encounter during development.
I've found this paper showing some benchmarks between the two languages:
http://cse.aalto.fi/en/midcom-serveattachmentguid-1e3619151995344619111e3935b577b50548b758b75/denti_ngmast.pdf
I've not read the entire article, but in the end it seems they give the point to Java:
In conclusion, we feel that Scala will not play a major role in the mobile application development in the future, due to the importance of keeping a low energy consumption on the device. The strong point of the Scala language is the way its components scale, which is not of major importance in mobile devices where applications tend to not scale at all and stay small.
Credits to Mattia Denti and Jukka K. Nurminen from Aalto University.
While Scala "will just work", because it is compiled to byte code just like Java, there are performance issues to consider. Idiomatic Scala code tends to create a lot more temporary objects, and the Dalvik VM doesn't take too kindly to these.
Here are some things you should be careful with when using Scala on Android:
Vectors, as they can be wasteful (always take up arrays of 32 items, even if you hold just one item)
Method chaining on collections - you should use .view wherever possible, to avoid creating redundant intermediate collections (see the sketch after this list).
Boxing - Scala will box your primitives in various cases, like using Generics, using Option[Int], and in some anonymous functions.
for loops can waste memory; consider replacing them with while loops in sensitive sections
Implicit conversions - calls like str.indexWhere(...) will allocate a wrapper object on the string. Can be wasteful.
Scala's Map will allocate an Option[V] every time you access a key. I've had to replace it with Java's HashMap on occasion.
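Here is a small sketch (mine; the data is made up) of the .view and while-loop points above:

case class Item(name: String, active: Boolean)
val items = List(Item("a", active = true), Item("b", active = false))
val prices = Array(10, 20, 30)

// Chaining on a view: intermediate collections are not materialized;
// only the final toList forces the work.
val names: List[String] = items.view.filter(_.active).map(_.name).toList

// A while loop in a hot path avoids the closure allocation of a for/foreach.
var total = 0
var i = 0
while (i < prices.length) {
  total += prices(i)
  i += 1
}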
Of course, you should only optimize the places that are bottlenecks after using a profiler.
You can read more about the suggestions above in this blog post:
http://blogs.microsoft.co.il/dorony/2014/10/07/scala-performance-tips-on-android/
The Scala compiler will create JVM bytecode. So basically, at the lower level, it's just like running Java. So the performance will be equivalent to Java.
However, how the Scala compiler creates the byte code may have some impact on performance. I'd say since Scala is new and due to possible inefficiency in its byte code generation, Scala would be a bit slower than Java on Android, though not by a lot.
Edit 2:
Does a fully object-oriented implementation give high performance? Most frameworks are written using the full power of object orientation. However, reflection is also heavily used to achieve things like AOP and dependency injection, and the use of reflection affects performance to a certain extent.
So, is it good practice to use reflection? Is there some alternative to reflection in programming language constructs? To what extent should reflection be used?
Reflection is, in itself and by nature, slow. See this question for more details.
This is caused by a few reasons. Jon Skeet explains it nicely:
Check that there's a parameterless constructor
Check the accessibility of the parameterless constructor
Check that the caller has access to use reflection at all
Work out (at execution time) how much space needs to be allocated
Call into the constructor code (because it won't know beforehand that the constructor is empty)
Basically, reflection has to perform all the above steps before invocation, whereas normal method invocation has to do much less.
The JITted code for instantiating B is incredibly lightweight. Basically it needs to allocate enough memory (which is just incrementing a pointer unless a GC is required) and that's about it - there's no constructor code to call really; I don't know whether the JIT skips it or not but either way there's not a lot to do.
With that said, there are many cases where Java is not dynamic enough to do what you want, and reflection provides a simple and clean alternative. Consider the following scenario:
You have a large number of classes which represent various items, e.g. a Car, a Boat, and a House.
They all extend/implement the same class: LifeItem.
Your user inputs one of 3 strings, "Car", "Boat", or "House".
Your goal is to access a method of LifeItem based on the parameter.
The first approach that comes to mind is to build an if/else structure, and construct the wanted LifeItem. However, this is not very scalable and can become very messy once you have dozens of LifeItem implementations.
Reflection can help here: it can be used to dynamically construct a LifeItem object based on its name, so a "Car" input would get dispatched to a Car constructor. Suddenly, what could have been hundreds of lines of if/else code turns into a simple line of reflection. The latter scenario would not be as valid on a Java 7+ platform due to the introduction of switch statements with Strings, but even then, a switch with hundreds of cases is something I'd want to avoid. Here's what the difference in cleanliness would look like in most cases:
Without reflection:
public static void main(String[] args) {
    String input = args[0];
    if (input.equals("Car"))
        doSomething(new Car(args[1]));
    else if (input.equals("Boat"))
        doSomething(new Boat(args[1]));
    else if (input.equals("House"))
        doSomething(new House(args[1]));
    // ... possibly dozens more if/else statements
}
Whereas by utilizing reflection, it could turn into:
public static void main(String[] args) {
    String input = args[0];
    try {
        doSomething((LifeItem) Class.forName(input).getConstructor(String.class).newInstance(args[1]));
    } catch (Exception ie) {
        System.err.println("Invalid input: " + input);
    }
}
Personally, I'd say the latter is neater, more concise, and more maintainable than the first. In the end it's a personal preference, but that's just one of the many cases where reflection is useful.
Additionally, when using reflection, you should attempt to cache as much information as possible. In other words, employ simple, logical things, like not calling get(Declared)Method everywhere if you can help it: rather, store it in a variable so you don't have the overhead of refetching the reference whenever you want to use it.
So those are the two extremes of the pros and cons of reflection. To sum it up: if reflection improves your code's readability (as it would in the presented scenario), by all means go for it. And if you do, just think about reducing the number of get* reflection calls: those are the easiest to trim.
While reflection is more expensive than "traditional code", premature optimization is the root of all evil. From a decade of empirical evidence, I assume that a method invoked via reflection will hardly affect performance unless it is invoked from a heavy loop, and even so there have been some performance enhancements to reflection:
Certain reflective operations, specifically Field, Method.invoke(), Constructor.newInstance(), and Class.newInstance(), have been rewritten for higher performance. Reflective invocations and instantiations are several times faster than in previous releases.
-- Enhancements in J2SDK 1.4
Note that method lookup (i.e. Class.getMethod) is not mentioned above, and choosing the right Method object usually requires additional steps, such as traversing the class hierarchy while asking for the "declared method" in case it is not public. So I tend to save the found Method in a suitable map whenever possible, so that the next time the cost is only that of a Map.get() and Method.invoke(). I guess that any well-written framework can handle this correctly.
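A minimal sketch of such a cache (written in Scala here only for brevity; it targets the same JVM reflection API, and the names are made up):

import java.lang.reflect.Method
import scala.collection.concurrent.TrieMap

// Cache Method lookups so repeated reflective calls only pay for a map lookup
// plus Method.invoke, not a getMethod search every time.
object MethodCache {
  private val cache = TrieMap.empty[(Class[_], String), Method]

  def invokeNoArg(target: AnyRef, methodName: String): AnyRef = {
    val method = cache.getOrElseUpdate(
      (target.getClass, methodName),
      target.getClass.getMethod(methodName))
    method.invoke(target)
  }
}

// The expensive getMethod lookup happens only on the first call per (class, name) pair.
MethodCache.invokeNoArg("hello", "length")   // returns java.lang.Integer(5)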
One should also consider that certain optimizations are not possible when reflection is used (such as method inlining or escape analysis; see Java HotSpot™ Virtual Machine Performance Enhancements). But this doesn't mean that reflection has to be avoided at all costs.
However, I think that the decision to use reflection should be based on other criteria, such as code readability, maintainability, design practices, etc. When using reflection in your own code (as opposed to using a framework that internally uses reflection), one risks transforming compile-time errors into run-time errors, which are harder to debug. In some cases, one could replace the reflective invocation with a traditional OOP pattern such as Command or Abstract Factory.
I can give you one example (but sorry, I can't show you the test results, because it was a few months ago). I wrote an XML library (custom, project-oriented) which replaced some old DOM parser code with classes + annotations. My code was half the size of the original. I did tests, and yes, reflection was more expensive, but not by much (something like 0.3 seconds out of 14-15 seconds of execution, a loss of about 2%). In places where code is executed infrequently, reflection can be used with a small performance loss.
Moreover, I am sure, that my code can be improved for better performance.
So, I suggest these tips:
Use reflection if you can do it in a way that is beautiful, compact and laconic;
Do not use reflection if your code will be executed many, many times;
Use reflection if you need to project a huge amount of information from another source (XML files, for example) into a Java application;
The best use for reflection and annotations is code that is executed only once (pre-loaders).
I am looking into improving my programming skills (actually I try to do my best to suck less each year, as our Jeff Atwood put it), so I was thinking of reading up on metaprogramming and self-explanatory code.
I am looking for something like an idiot's guide to this (free books for download, online resources). Also I want more than your average wiki page and also something language agnostic or preferably with Java examples.
Do you know of such resources that will allow me to efficiently put all of this into practice? (I know experience counts for a lot here, but I'd like to build that experience while avoiding the usual flow of bad decisions - experience - good decisions.)
EDIT:
Something of the likes of this example from the Pragmatic Programmer:
...implement a mini-language to control a simple drawing package... The language consists of single-letter commands. Some commands are followed by a single number. For example, the following input would draw a rectangle:
P 2 # select pen 2
D # pen down
W 2 # draw west 2cm
N 1 # then north 1
E 2 # then east 2
S 1 # then back south
U # pen up
Thank you!
Welcome to the wonderful world of meta-programming :) Meta-programming actually relates to many things. I will try to list what comes to my mind:
Macro. The ability to extend the syntax and semantics of a programming language was explored first under the terminology "macro". Several languages have constructions which resemble macros, but the language of choice is of course Lisp. If you are interested in meta-programming, understanding Lisp and its macro system (and the homoiconic nature of the language, where code and data have the same representation) is definitively a must. If you want a Lisp dialect that runs on the JVM, go for Clojure. A few resources:
Clojure mini language
Beating the Averages (why Lisp is a secret weapon)
There is otherwise plenty of resource about Lisp.
DSL. The ability to extend a language's syntax and semantics is now rebranded under the term "DSL". The easiest way to create a DSL is with the interpreter pattern. Then come internal DSLs with fluent interfaces and external DSLs (as per Fowler's terminology). Here is a nice video I watched recently:
DSL: what, why, how
The other answers already pointed to resources in this area.
Reflection. Meta-programming is also inseparable from reflection. The ability to reflect on the program structure at run-time is immensely powerful. It's important then to understand what introspection, intercession and reification are. IMHO, reflection permits two broad categories of things: 1. the manipulation of data whose structure is not known at compile time (the structure of the data is provided at run-time and the program still works reflectively); 2. powerful programming patterns such as dynamic proxies, factories, etc. Smalltalk is the language of choice to explore reflection, where everything is reflective. But I think Ruby is also a good candidate for that, with a community that leverages meta-programming (though I don't know much about Ruby myself).
Smalltalk: a reflective language
Magritte: a meta-driven approach to empower developers and end-users
There is also a rich literature on reflection.
Annotations. Annotations could be seen as a subset of the reflective capabilities of a language, but I think it deserves its own category. I already answered once what annotations are and how they can be used. Annotations are meta-data that can be processed at compile-time or at run-time. Java has good support for it with the annotation processor tool, the Pluggable Annotation Processing API, and the mirror API.
Byte-code or AST transformation. This can be done at compile-time or at run-time. It is a somewhat low-level approach, but it can also be considered a form of meta-programming (in a sense, it is the equivalent of macros for non-homoiconic languages).
DSL with Groovy (There is an example at the end that shows how you can plug your own AST transformation with annotations).
Conclusion: Meta-programming is the ability for a program to reason about itself or to modify itself, just as meta Stack Overflow is the place to ask questions about Stack Overflow itself. Meta-programming is not one specific technique, but rather an ensemble of concepts and techniques.
Several things fall under the umbrella of meta-programming. From your question, you seem more interested in the macro/DSL part. But everything is ultimately related, so the other aspects of meta-programming are also definitively worth looking at.
PS: I know that most of the links I've provided are not tutorials, or introductory articles. These are resources that I like which describe the concept and the advantages of meta-programming, which I think is more interesting
I've mentioned C++ template metaprogramming in my comment above. Let me therefore provide a brief example using C++ template meta-programming. I'm aware that you tagged your question with java, yet this may be insightful. I hope you will be able to understand the C++ code.
Demonstration by example:
Consider the following recursive function, which generates the Fibonacci series (0, 1, 1, 2, 3, 5, 8, 13, ...):
unsigned int fib(unsigned int n)
{
return n >= 2 ? fib(n-2) + fib(n-1) : n;
}
To get an item from the Fibonacci series, you call this function -- e.g. fib(5) --, and it will compute the value and return it to you. Nothing special so far.
But now, in C++ you can re-write this code using templates (somewhat similar to generics in Java) so that the Fibonacci series won't be generated at run-time, but during compile-time:
// fib(n) := fib(n-2) + fib(n-1)
template <unsigned int n>
struct fib // <-- this is the generic version fib<n>
{
static const unsigned int value = fib<n-2>::value + fib<n-1>::value;
};
// fib(0) := 0
template <>
struct fib<0> // <-- this overrides the generic fib<n> for n = 0
{
static const unsigned int value = 0;
};
// fib(1) := 1
template <>
struct fib<1> // <-- this overrides the generic fib<n> for n = 1
{
static const unsigned int value = 1;
};
To get an item from the Fibonacci series using this template, simply retrieve the constant value -- e.g. fib<5>::value.
Conclusion ("What does this have to do with meta-programming?"):
In the template example, it is the C++ compiler that generates the Fibonacci series at compile-time, not your program while it runs. (This is obvious from the fact that in the first example, you call a function, while in the template example, you retrieve a constant value.) You get your Fibonacci numbers without writing a function that computes them! Instead of programming that function, you have programmed the compiler to do something for you that it wasn't explicitly designed for... which is quite remarkable.
This is therefore one form of meta-programming:
Metaprogramming is the writing of computer programs that write or manipulate other programs (or themselves) as their data, or that do part of the work at compile time that
would otherwise be done at runtime.
-- Definition from the Wikipedia article on metaprogramming, emphasis added by me.
(Note also the side-effects in the above template example: As you make the compiler pre-compute your Fibonacci numbers, they need to be stored somewhere. The size of your program's binary will increase proportionally to the highest n that's used in expressions containing the term fib<n>::value. On the upside, you save computation time at run-time.)
From your example, it seems you are talking about domain specific languages (DSLs), specifically, Internal DSLs.
Here is a large list of books about DSL in general (about DSLs like SQL).
Martin Fowler has a book that is a work in progress and is currently online.
Ayende wrote a book about DSLs in boo.
Update: (following comments)
Metaprogramming is about creating programs that control other programs (or their data), sometimes using a DSL. In this respect, batch files and shell scripts can be considered to be metaprogramming as they invoke and control other programs.
The example you have shows a DSL that may be used by a metaprogram to control a painting program.
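To give a concrete feel for the interpreter-pattern route, here is a minimal sketch (mine, written in Scala rather than Java purely for brevity) of an interpreter for that single-letter drawing language; the "drawing" is just println:

// The program from the question, as plain text.
val program = """P 2
                |D
                |W 2
                |N 1
                |E 2
                |S 1
                |U""".stripMargin

program.split("\n").map(_.trim).filter(_.nonEmpty).foreach { line =>
  line.split("\\s+").toList match {
    case "P" :: pen :: Nil => println(s"select pen $pen")
    case "D" :: Nil        => println("pen down")
    case "U" :: Nil        => println("pen up")
    case dir :: dist :: Nil if "WNES".contains(dir) => println(s"draw $dir $dist cm")
    case other             => println(s"unknown command: $other")
  }
}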
Tcl started out as a way of making domain-specific languages that didn't suck once they grew in complexity to the point where they needed general programming capabilities. Moreover, it remains very easy to add your own commands, precisely because that is still an important use case for the language.
If you want an implementation integrated with Java, Jacl is an implementation of Tcl in Java which provides scriptability focused towards DSLs, as well as access to any Java object.
(Metaprogramming is writing programs that write programs. Some languages do it far more than others. To pick up on a few specific cases: Lisp is the classic example of a language that does a lot of metaprogramming; C++ tends to relegate it to templates rather than permitting it at runtime; scripting languages all tend to find metaprogramming easier because their implementations are written to be more flexible that way, though that's just a matter of degree.)
Well, in the Java ecosystem, I think the simplest way to implement a mini-language is to use scripting languages like Groovy or Ruby (yes, I know, Ruby is not a native citizen of the Java ecosystem). Both offer rather good DSL specification mechanisms that will let you do this with far more simplicity than the Java language would:
Writing DSL in Groovy
Creating Ruby DSL
There are, however, pure Java alternatives, but I think they'll be a little harder to implement.
You can have a look at the eclipse modeling project, they've got support for meta-models.
There's a course on Pluralsight about Metaprogramming which might be a good entry point https://app.pluralsight.com/library/courses/understanding-metaprogramming/table-of-contents
I've been using Java almost since it first came out but have over the last five years gotten burnt out with how complex it's become to get even the simplest things done. I'm starting to learn Ruby at the recommendation of my psychiatrist, uh, I mean my coworkers (younger, cooler coworkers - they use Macs!). Anyway, one of the things they keep repeating is that Ruby is a "flexible" language compared to older, more beaten-up languages like Java but I really have no idea what that means. Could someone explain what makes one language "more flexible" than another? Please. I kind of get the point about dynamic typing and can see how that could be of benefit for conciseness. And the Ruby syntax is, well, beautiful. What else? Is dynamic typing the main reason?
Dynamic typing doesn't come close to covering it. For one big example, Ruby makes metaprogramming easy in a lot of cases. In Java, metaprogramming is either painful or impossible.
For example, take Ruby's normal way of declaring properties:
class SoftDrink
attr_accessor :name, :sugar_content
end
# Now we can do...
can = SoftDrink.new
can.name = 'Coke' # Not a direct ivar access — calls can.name=('Coke')
can.sugar_content = 9001 # Ditto
This isn't some special language syntax — it's a method on the Module class, and it's easy to implement. Here's a sample implementation of attr_accessor:
class Module
  def attr_accessor(*symbols)
    symbols.each do |symbol|
      define_method(symbol) { instance_variable_get "@#{symbol}" }
      define_method("#{symbol}=") { |val| instance_variable_set("@#{symbol}", val) }
    end
  end
end
This kind of functionality allows you a lot of, yes, flexibility in how you express your programs.
A lot of what seem like language features (and which would be language features in most languages) are just normal methods in Ruby. For another example, here we dynamically load dependencies whose names we store in an array:
dependencies = %w(yaml haml hpricot sinatra couchfoo)
block_list = %w(couchfoo) # Wait, we don't really want CouchDB!
dependencies.each {|mod| require mod unless block_list.include? mod}
It's also because Ruby is classless (in the Java sense) but totally object-oriented (the properties pattern), so you can call any method, even one that is not defined, and you still get a last chance to respond to the call dynamically, for example by creating methods on the fly as necessary. Also, Ruby doesn't need compilation, so you can easily update a running application if you want to. And an object can suddenly inherit from another class/object at any time during its lifetime through mixins, which is another point of flexibility. Anyway, I agree with the kids that this language called Ruby, which has actually been around as long as Java, is very flexible and great in many ways, but I still haven't been able to agree that it's beautiful (syntax-wise); C is more beautiful IMHO (I'm a sucker for brackets). But beauty is subjective; the other qualities of Ruby are objective.
Blocks, closures, many things. I'm sure some much better answers will appear in the morning, but for one example here's some code I wrote ten minutes ago - I have an array of scheduled_collections, some of which have already happened, others which have been voided, canceled, etc. I want to return an array of only those that are pending. I'm not sure what the equivalent Java would be, but I imagine it's not this one-line method:
def get_all_pending
scheduled_collections.select{ |sc| sc.is_pending? }
end
A simpler example of the same thing is:
[0,1,2,3].select{|x| x > 1}
Which will produce [2,3]
Things I like
less code to get your point across
passing around code blocks (Proc, lambdas) is fun and can result in tinier code. e.g. [1, 2, 3].each{|x| puts "Next element #{x}"}
has the scripting roots of Perl - very nice for slicing and dicing routine stuff like parsing files with regexps, etc.
the core data structure class API like Hash and Array is nicely done.
Metaprogramming (owing to its dynamic nature) - ability to create custom DSLs (e.g. Rails can be termed a DSL for WebApps written in Ruby)
the community that is spawning gems for just about anything.
Mixins. Altering a Ruby class to add new functionality is trivially easy.
Duck typing refers to the fact that types are considered equivalent based on which methods they implement, not on their declared type. To take a concrete example, many methods in Ruby take an IO-like object to operate on a stream. This means that the object has to implement enough functions to be able to pass as an IO-type object (it has to sound enough like a duck).
In the end it means that you have to write less code than in Java to do the same thing. Not everything is great about dynamic languages, though. You more or less give up all of the compile-time type checking that Java (and other strongly/statically typed languages) gives you. Ruby simply has no idea if you're about to pass the wrong object to a method; that will give you a runtime error. Also, it won't give you a runtime error until the code is actually called.
Just for laughs, a fairly nasty example of the flexibility of the language:
class Fixnum
  def +(other)
    self - other
  end
end
puts 5 + 3
# => 2