Java vs Scala Types Hierarchy

Java vs Scala Types Hierarchy - java

Currently, I am learning Scala and I noticed that Type Hierarchy in Scala is much more consistent. There is Any type which is really super type of all types, unlike Java Object which is only a super type of all reference types.
Java Examples
Java approach led to introduction of Wrapper Classes for primitives, Auto-boxing. It also led to having types which cannot be e.g. keys in HashMaps. All those things adds to complexity of the language.
Integer i = new Integer(1); // Is it really needed? 1 is already an int.
HashMap<int, String> // Not valid, as int is not an Object sub type.
Question
It seems like a great idea to have all types in one hierarchy. This leads to the question: Why there is no single hierarchy of all types in Java? There is division between primitive types and reference types. Does it have some advantages? Or was it bad design decision?

That's a rather broad question.
Many different avenues exist to explain this. I'll try to name some of them.
Java cares about older code; scala mostly does not.
Programming languages are in the end defined by their community; a language that nobody but you uses is rather handicapped, as you're forced to write everything yourself. That does mean that the way the community tends to do things does reflect rather strongly on whether a language is 'good' or not. The java community strongly prefers reasonable backwards compatibility (reasonable as in: If there is a really good reason not to be entirely backwards compatible, for example because the feature you're breaking is very rarely used, or almost always used in ways that's buggy anyway, that's okay), the scala community tends to flock from one hip new way of doing things to the other, and any library that isn't under very active development either does not work at all anymore or trying to integrate it into more modern scala libraries is a very frustrating exercise.
Java doesn't work like that. This can be observed, for example, in generics: Generics (the T in List<T>) weren't in java 1.0; they were introduced in java 1.5, and they were introduced in a way that all existing code would just continue to work fine, all libraries, even without updates, would work allright with newer code, and adapting existing code to use generics did not require picking new libraries or updating much beyond adding the generics to the right places in the source file.
But that came at a cost: erasure. And because the pre-1.5 List class worked with Objects, generics had to work with Object as an implicit bound. Java could introduce an Any type but it would be mostly useless; you couldn't use it in very many places.
Erasure means that, in java, generics are mostly a figment of the compiler's imagination: That's why, given an instance of a list, you cannot ask it what its component type is; it simply does not know. You can write List<String> x = ...; String y = x.get(0); and that works fine, but that is because the compiler injects an invisible cast for you, and it knows this cast is fine because the generics give the compiler a framework to judge that this cast will never cause a ClassCastException (barring explicit attempts to mess with it, which always comes with a warning from the compiler when you do). But you can't cast an Object to an int, and for good reason.
The Scala community appears to be more accepting of a new code paradigm that just doesn't interact with the old; they'll refactor all their code and leave some older library by the wayside more readily.
Java is more explicit than scala is.
Scalac will infer tons of stuff, that's more or less how the language is designed (see: implicit). For some language features you're forced to just straight up make a call: You're trading clarity for verbosity. There where you are forced to choose, java tends to err on the side of clarity. This shows up specifically when we're talking about silent heap and wrapper hoisting: Java prefers not to do it. Yes, there's auto-boxing (which is silent wrapping), but silently treating int which, if handled properly, is orders of magnitude faster than a wrapped variant, as the wrapped variant for an entire collection just so you can write List<int> is a bridge too far for java: Presumably it would be too difficult to realize that you're eating all the performance downsides.
That's why java doesn't 'just' go: Eh, whatever, we'll introduce an Any type and tape it all together at runtime by wrapping stuff silently.
primitives are performant.
In java (and as scala runs on the JVM, scala too), there are really only 9 types: int, long, double, short, float, boolean, char, byte, and reference. As in, when you have an int variable, in memory, it is literally just that value, but if you have a String variable, the string lives in the heap someplace and the value you're passing around everywhere is a pointer to it. Given that you can't directly print the pointer or do arithmetic on it, in java we like to avoid the term and call it a 'reference' instead, but make no mistake: That's just a pointer with another name.
pointers are inherently memory wasting and less performant. There are excellent reasons for this tradeoff, but it is what it is. However, trying to write code that can deal with a direct value just as well as a reference is not easy. Moving this complexity into your face by making it relatively difficult to writing code that is agnostic (which is what the Any type is trying to accomplish) is one way to make sure the programmers don't ever get confused about it.
The future
Add up the 3 things above and hopefully it is now clear that an Any type either causes a lot of downsides, or, that it would be mostly useless (you couldn't use it anywhere).
However, there is good news on the horizon. Google for 'Project Valhalla' and 'java value types'. This is a difficult endeavour that will attempt to allow a lot of what an Any type would bring you, including, effectively, primitives in generics. In a way that integrates with existing java code, just like how java's approach to closures meant that java did not need to make scala's infamous Function8<A,B,C,D,E,F,G,H,R> and friends. Doing it right tends to be harder, so it took quite a while, and project valhalla isn't finished yet. But when it is, you WILL be able to write List<int> list = new ArrayList<int>, AND use Any types, and it'll all be as performant as can be, and integrate with existing java code as best as possible. Project Valhalla is not part of JDK14 and probably won't make 15 either.

Related

How to get a field reference in java?

i am trying to get a compile time safe field reference in java, done not with reflection and Strings, but directly referencing the field. Something like
MyClass::myField
I have tried the usual reflection way, but you need to reference the fields as strings, and this is error prone in case of a rename, and will not throw a compile time error
EDIT: just want to clarify that my end goal is to get the field NAME for entity purposes, such as reference the entity field in a query, and not the value

Unfortunately, you might as well want to wish for a unicorn. The notion of 'a field reference', in the sense that you are asking for, simply isn't part of java-the-language.
That MyClass::myThing syntax works only for methods. There's simply no such thing for fields. It's unfortunate.
It's very difficult to give objective reasons for the design decisions of any language; it either requires spelunking through the designer's collective heads which requires magic or science fiction, or asking them to spill the beans, which they're probably not going to do in a stack overflow question. Sometimes (and more recent java features, such as this one), design is debated in public. Specifically, you can search for the openjdk lamba-dev mailing list where no doubt this question was covered. You'll need to go through, and I'm not exaggerating, tens of thousands of posts, but, the good news is, it's searchable.
But, I can guess / dig through my own memory as I spent some time discussing Project Lambda as it was designed:
Direct field access isn't common in the java ecosystem. The language allows direct field access but few java programs are written that way, so why make a language feature that would only be immediately useful and familiar to an exotic bunch.
The infrastructure required is also rather significant - a method lambda isn't allowed to be written in java unless you use it in a context that makes it possible for the compiler to 'treat' the lambda as a type - specifically, a #FunctionalInterface - any interface that contains exactly 1 method (other than methods that already exist in j.l.Object itself). In other words, this is fine:
Function<String, String> f = String::toLowerCase;
But this is not:
Object o = String::toLowerCase;
So, let's imagine for a moment that field refs did exist. What does that mean? What is the 'type' of the expression MyClass::myField? Perhaps a new concept: An interface with 2 methods; one of them takes no arguments and returns a T, the other wants a T and returns nothing (to match the act of reading the field, and writing it), but where it's also acceptable if it's a FunctionalInterface that is either one of those, perhaps? That sounds complicated.
The general mindset of the java design team right now (and has been for a while) is not to overcomplicate matters: Do not add features unless you have a good reason. After all, if it turns out that the community really clamours for field refs, they can be added. But, if on the other hand, they were added but nobody uses them, they can't be removed (and thus you've now permanently made the language more complicated and reduced room for future language features for a thing nobody uses and which most style guides tell you to actively avoid).
That's, I'm pretty sure, why they don't exist.

Can a language ever have compile-time checking but the characteristics of dynamic typing?

Upon reading the following:
A lot of people define static typing and dynamic typing with respect
to the point at which the variable types are checked. Using this
analogy, static typed languages are those in which type checking is
done at compile-time, whereas dynamic typed languages are those in
which type checking is done at run-time.
This analogy leads to the analogy we used above to define static and
dynamic typing. I believe it is simpler to understand static and
dynamic typing in terms of the need for the explicit declaration of
variables, rather than as compile-time and run-time type checking.
Source
I was thinking that the two ways we define static and dynamic typing: compile-time checking and explicit type declaration are a bit like apples and oranges. A characteristic in all statically typed languages (from my knowledge) is the reference variables have a defined type. Can there be a language that has the benefits of compile-time checking (like Java) but also the ability to have variables unbounded to a specific type (like Python)?
Note: Not exactly type inference in a language like Java, because the variables are still assigned a type, just implicitly. This theoretical language wouldn't have reference types, so there would be no casting. I'm trying to avoid the use of "static typing" vs "dynamic typing" because of the confusion.

There could be, but should there be?
Imagine in hypothetical-pseudo-C++:
class Object
{
public:
virtual Object invoke(const char *name, std::list<Object> args);
virtual Object get_attr(const char *name);
virtual const Object &set_attr(const char *name, const Object &src);
};
And that you have a language that arranges:
to make Object class the root base class of all classes
syntactic sugar to turn blah.frabjugate() into blah.invoke("frabjugate") and
blah.x = 10 into blah.set_attr("x", 10)
Add to this something combining attributes of boost::variant and boost::any and you have a pretty good start. All the dynamicism (both good and runtime bugs bad) of Python with the eloquence and rigidity (yay!) of C++ or Java. With added run-time bloat and efficiency of hash-table lookups vs. call/jmp machine instructions.
In languages like Python, when you call blah.do_it() it has to do potentially multiple hash table lookups of the string "do_it" to find out if your instance blah or its class has a callable thing called "doit" every time it is called. This is the most extreme late-binding that could be imaged:
flarg.do_it() # replaces flarg.do_it()
flarg.do_it() # calls a different flarg.do_it()
You could have your hypothetical language give some control over when the binding occurs. C++-like standard methods are crudely static bound to the apparent reference type, not the real instance type. C++ virtual methods are late-bound to the object instance type. Python-like attributes and methods are extremely late bound to the current version of the object instance.
I think you could definitely program in a strong static typed language in a dynamic style, just as you could build an interpreter in a language like C++ or Java. Some syntax hooks could make it look a little more seamless. But maybe you could do the same in reverse: maybe a Python decorator that automatically checks argument types, or a MetaClass that does it at compile time? [no, I don't think this is possible...]
I think you should view it as a union of features. but you'd get both the best and the worst of both worlds...

Can there be a language that has the benefits of compile-time checking (like Java) but also the ability to have variables unbounded to a specific type (like Python)?
Actually mostly language have support for both, so yes. The difference is which form is preferred/easier and generally used. Java prefers static types but also supports dynamic casts and reflection.
This theoretical language wouldn't have reference types, so there would be no casting.
You have to consider that language also need to perform reasonably well so you have to consider how they will be implemented. You could have a super type but this makes optimisation very hard and you code will most likely either run slowly or use much more resources.
The more popular languages tend to make pragmatic implementation choices. They are not purely one type or another and are willing to borrow styles even if they don't handle them as cleanly as a "pure" language.
what exactly do they allow the compiler or programmer to do that dynamic types can't?
It is generally accepted that the quicker you find a bug, the cheaper it is to fix. When you first start programming, the cost of maintenance isn't high in your mind, but once you have much more experience you will realise that a successful project costs far more to maintain than it did to develop and fixing long standing bugs can be really costly.
static languages have two advantages
you pick up bugs sooner rather than later. The sooner the better. With dynamic languages you might never discover a bug if the code is never run.
the cost of maintenance is easier. Static languages make clearer the assumption made when the code was first written and are more likely to detect issues if you don't have enough test coverage (btw, you never have enough test coverage)

No you cannot. The difference here boils down to early binding versus late binding. Early binding means matching everything up on the binary level upfront, fixing it in code. The result is rigid, type-safe and fast code. Late binding means there is some kind of runtime interpretation involved. This results in flexiblility (potentially unsafe) at the cost of performance.
The two approaches are different on a technical level (compilation versus interpretation) and the programmer would have to choose which is desired when, which would defeat the benefit of having both in the first place.
In languages that use a (common) language runtime however you do get some of what you are asking for through reflection. But it is organized differently and still type-safe. It is not the implicit kind of binding you refer to but requires a bit of work and awareness from the programmer.

As far as what is possible with static types that is impossible with dynamic types: nothing. They are both Turing complete
The value of static types is finding bugs early. In Python, something as simple as a misspelled name isn't caught until you run the program, and even then only if the line of code with the misspelling is run.
class NuclearReactor():
def turn_power_off(self):
...
def shut_down_cleanly(self):
self.turn_power_of()

Performance of Scala for Android

I just started learning Scala, and I'm having the time of my life. Today I also just heard about Scala for Android, and I'm totally hooked.
However, considering that Scala is kind of like a superset of Java in that it is Java++ (I mean Java with more stuff included) I am wondering exactly how the Scala code would work in Android?
Also, would the performance of an Android application be impacted if written with Scala? That is, if there's any extra work needed to interpret Scala code.

To explain a little bit #Aneesh answer -- Yes, there is no extra work to interpret Scala bytecode, because it is exactly the same bits and pieces as Java bytecode.
Note that there is also a Java bytecode => Dalvik bytecode step when you run code in Android.
But using the same bricks one can build a bikeshed and the other guy can build a townhall. For example, due to the fact that language encourages immutability Scala generates a lot of short living objects. For mature JVMs like HotSpot it is not a big deal for about a decade. But for Dalvik it is a problem (prior to recent versions object pooling and tight reuse of already created objects was one of the most common performance tips even for Java).
Next, writing val isn't the same as writing final Foo bar = .... Internally, this code is represented as a field + getter (unless you prefix val with private [this] which will be translated to the usual final field). var is translated into field + getter + setter. Why is it important?
Old versions of Android (prior to 2.2) had no JIT at all, so this turns out to about 3x-7x penalty compared to the direct field access. And finally, while Google instructs you to avoid inner classes and prefer packages instead, Scala will create a lot of inner classes even if you don't write so. Consider this code:
object Foo extends App {
List(1,2,3,4)
.map(x => x * 2)
.filter(x => x % 3 == 0)
.foreach(print)
}
How many inner classes will be created? You could say none, but if you run scalac you will see:
Foo$$anonfun$1.class // Map
Foo$$anonfun$2.class // Filter
Foo$$anonfun$3.class // Foreach
Foo$.class // Companion object class, because you've used `object`
Foo$delayedInit$body.class // Delayed init functionality that is used by App trait
Foo.class // Actual class
So there will be some penalty, especially if you write idiomatic Scala code with a lot of immutability and syntax sugar. The thing is that it highly depends on your deployment (do you target newer devices?) and actual code patterns (you always can go back to Java or at least write less idiomatic code in performance critical spots) and some of the problems mentioned there will be addressed by language itself (the last one) in the next versions.
Original picture source was Wikipedia.
See also Stack Overflow question Is using Scala on Android worth it? Is there a lot of overhead? Problems? for possible problems that you may encounter during development.

I've found this paper showing some benchmarks beetween the two languages:
http://cse.aalto.fi/en/midcom-serveattachmentguid-1e3619151995344619111e3935b577b50548b758b75/denti_ngmast.pdf
I've not read the entire article but in the end seems they will give a point to Java:
In conclusion, we feel that Scala will not play a major role in the
mobile application development in the future, due to the importance of
keeping a low energy consumption on the device. The strong point of
the Scala language is the way its components scale, which is not of
major importance in mobile devices where applications tend to not
scale at all and stay small.
Credits to Mattia Denti and Jukka K. Nurminen from Aalto University.

While Scala "will just work", because it is compiled to byte code just like Java, there are performance issues to consider. Idiomatic Scala code tends to create a lot more temporary objects, and the Dalvik VM doesn't take too kindly to these.
Here are some things you should be careful with when using Scala on Android:
Vectors, as they can be wasteful (always take up arrays of 32 items, even if you hold just one item)
Method chaining on collections - you should use .view wherever possible, to avoid creating redundant collections.
Boxing - Scala will box your primitives in various cases, like using Generics, using Option[Int], and in some anonymous functions.
for loops can waste memory, consider replacing them with while loops in sensitive sections
Implicit conversions - calls like str.indexWhere(...)will allocate a wrapper object on the string. Can be wasteful.
Scala's Map will allocate an Option[V] every time you access a key. I've had to replace it with Java's HashMap on occasion.
Of course, you should only optimize the places that are bottlenecks after using a profiler.
You can read more about the suggestions above in this blog post:
http://blogs.microsoft.co.il/dorony/2014/10/07/scala-performance-tips-on-android/

Scala compiler will create JVM byte code. So basically at the lower level, it's just like running Java. So the performance will be equivalent to Java.
However, how the Scala compiler creates the byte code may have some impact on performance. I'd say since Scala is new and due to possible inefficiency in its byte code generation, Scala would be a bit slower than Java on Android, though not by a lot.

Keeping track of what's in a Collection in pre-generics Java?

For a bunch of reasons that (believe it or not) are not as unsound as you may think, we are still (sigh) using Java 1.4 to build and run our code (though we plan to finally move to Java 7 by the end of the year).
Our existing code that uses Collection classes doesn't do a very good job of making it clear what is expected to be in the Collection. Obviously, you can read the code and see what the downcasts end up being done and infer from that, but you can't just look at a method declaration and know what the Collection object that is a method argument or method return value actually holds.
In new code that I'm writing and when I am in older code that uses Collections, I've been adding in-line comments to Collections declarations to show what would have been declared if generics were being used. For example:
Map/*<String, Set<Integer>>*/ theMap = new HashMap/*<String, Set<Integer>>*/();
or
List/*<Actions>*/ someMethod(List/*<Job>*/ jobs);
In keeping with the frowning at subjectivity here at SO, rather than asking what you think of this (though admittedly I'd like to know -- I do find it a bit ugly but still like having the type info there) I'd instead just ask what, if anything, you do to make it clear what is being held by pre-generics Collection objects.

What we recommended back in the old days -- and I was a Java Architect at Sun when Java 1.1 was the New Thing -- was to write a class around the structure (I don't think 1.1 even had Collection as a base class) so that the typecasts happned in code you control instead of in user code. So, for example, something like
public class ArrayOfFoo {
Object [] ary; // ctor left as exercise
public void set(int index, Foo value){
ary[index] = (Object) value; // cast strictly not needed, any Foo is an Object
}
public void get(int index){
return (Foo) ary[index]; // cast needed, not every Object is a Foo
}
}
Sounds like the code base you have isn't built to this convention; if you're writing new code, there's no reason you can't start. Failing that, your convention isn't bad, but it's easy to forget the cast and then have to search to find out why you're getting a bad cast exception. It's mildly better to resort of some variant on Hungarian notation, or the Smalltalk 'aVariable' convention, by encoding the type in the names, so that you use
Object fooAry = new Object[aZillion];
fooAry[42] = new Foo();
Foo aFoo = fooAry[42];

Use clear variable identifiers such as jobList, actionList, or dictionaryMap. If you're concerned with the type of objects they contain, you could even make it a convention to always let the identifier of a Collection hint about which type of objects it holds.
The inlined comments aren't that idea actually. When I ported a 1.5 project back to 1.4 I did just that (instead of removing the type parameters). It worked out quite well.

I'd recommend writing tests. For various reasons:
You should be writing tests anyway!
You can assert the type of a collection member very easily to ensure that all your code paths are adding the right types to the collection
You can use the test to write code that serves as an "example" of how to use the collection correctly

If you just need binary compatibility to 1.4 you could consider using a tool to downgrade the class files back to 1.4 and thus start to develop in 1.6 or 1.7 right now. You would of course need to avoid any API that hasn't been there in 1.4 (unfortunately you can't compile code with generics against the 1.4 jars directly as they don't declare any generic types). The Bytecode is still the same (at least with 1.6, I don't know for sure about 1.7). One free tool that can do the trick is ProGuard. It can do much more sophisticated things and can also remove all traces of generics in the class files. Just turn off the obfuscation and optimization if you don't need it. It will also warn you if some missing API was used in the processed code if you feed it the 1.4 libraries.
I'm aware that is considered a hack by many but we had a similar requirement where we needed some code to still run on a Personal Java VM (this is essentially Java 1.1) and several other exotic VMs and this approach worked quite well. We started with ProGuard and then made our own tool for the task to be able to implement a few workarounds for some Bugs in the diverse VMs.

Java -> Python?

Besides the dynamic nature of Python (and the syntax), what are some of the major features of the Python language that Java doesn't have, and vice versa?

List comprehensions. I often find myself filtering/mapping lists, and being able to say [line.replace("spam","eggs") for line in open("somefile.txt") if line.startswith("nee")] is really nice.
Functions are first class objects. They can be passed as parameters to other functions, defined inside other function, and have lexical scope. This makes it really easy to say things like people.sort(key=lambda p: p.age) and thus sort a bunch of people on their age without having to define a custom comparator class or something equally verbose.
Everything is an object. Java has basic types which aren't objects, which is why many classes in the standard library define 9 different versions of functions (for boolean, byte, char, double, float, int, long, Object, short). Array.sort is a good example. Autoboxing helps, although it makes things awkward when something turns out to be null.
Properties. Python lets you create classes with read-only fields, lazily-generated fields, as well as fields which are checked upon assignment to make sure they're never 0 or null or whatever you want to guard against, etc.'
Default and keyword arguments. In Java if you want a constructor that can take up to 5 optional arguments, you must define 6 different versions of that constructor. And there's no way at all to say Student(name="Eli", age=25)
Functions can only return 1 thing. In Python you have tuple assignment, so you can say spam, eggs = nee() but in Java you'd need to either resort to mutable out parameters or have a custom class with 2 fields and then have two additional lines of code to extract those fields.
Built-in syntax for lists and dictionaries.
Operator Overloading.
Generally better designed libraries. For example, to parse an XML document in Java, you say
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("test.xml");
and in Python you say
doc = parse("test.xml")
Anyway, I could go on and on with further examples, but Python is just overall a much more flexible and expressive language. It's also dynamically typed, which I really like, but which comes with some disadvantages.
Java has much better performance than Python and has way better tool support. Sometimes those things matter a lot and Java is the better language than Python for a task; I continue to use Java for some new projects despite liking Python a lot more. But as a language I think Python is superior for most things I find myself needing to accomplish.

I think this pair of articles by Philip J. Eby does a great job discussing the differences between the two languages (mostly about philosophy/mentality rather than specific language features).
Python is Not Java
Java is Not Python, either

One key difference in Python is significant whitespace. This puts a lot of people off - me too for a long time - but once you get going it seems natural and makes much more sense than ;s everywhere.
From a personal perspective, Python has the following benefits over Java:
No Checked Exceptions
Optional Arguments
Much less boilerplate and less verbose generally
Other than those, this page on the Python Wiki is a good place to look with lots of links to interesting articles.

With Jython you can have both. It's only at Python 2.2, but still very useful if you need an embedded interpreter that has access to the Java runtime.

Apart from what Eli Courtwright said:
I find iterators in Python more concise. You can use for i in something, and it works with pretty much everything. Yeah, Java has gotten better since 1.5, but for example you can iterate through a string in python with this same construct.
Introspection: In python you can get at runtime information about an object or a module about its symbols, methods, or even its docstrings. You can also instantiate them dynamically. Java has some of this, but usually in Java it takes half a page of code to get an instance of a class, whereas in Python it is about 3 lines. And as far as I know the docstrings thing is not available in Java

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.