I am a C++ / Java programmer and the main paradigm I happen to use in everyday programming is OOP. In some thread I read a comment that Type classes are more intuitive in nature than OOP. Can someone explain the concept of type classes in simple words so that an OOP guy like me can understand it?
First, I am always very suspicious of claims that this or that program structure is more intuitive. Programming is counter-intuitive and always will be because people naturally think in terms of specific cases rather than general rules. Changing this requires training and practice, otherwise known as "learning to program".
Moving on to the meat of the question, the key difference between OO classes and Haskell typeclasses is that in OO a class (even an interface class) is both a type and a template for new types (descendants). In Haskell a typeclass is only a template for new types. More precisely, a typeclass describes a set of types that share a common interface, but it is not itself a type.
So the typeclass "Num" describes numeric types with addition, subtraction and multiplication operators. The "Integer" type is an instance of "Num", meaning that Integer is a member of the set of types that implement those operators.
So I can write a summation function with this type:
sum :: Num a => [a] -> a
The bit to the left of the "=>" operator says that "sum" will work for any type "a" that is an instance of Num. The bit to the right says it takes a list of values of type "a" and returns a single value of type "a" as a result. So you could use it to sum a list of Integers or a list of Doubles or a list of Complex values, because they are all instances of "Num". The implementation of "sum" will use the "+" operator of course, which is why you need the "Num" type constraint.
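If it helps to see this in a more OO-flavoured setting, here is a rough Scala sketch of the same idea (all names are illustrative, not standard library code): the trait plays the role of the typeclass, and implicit values play the role of instance declarations.

trait Num[A] {
  def plus(x: A, y: A): A
  def zero: A
}

object Num {
  implicit val numInt: Num[Int] = new Num[Int] {
    def plus(x: Int, y: Int) = x + y
    def zero = 0
  }
  implicit val numDouble: Num[Double] = new Num[Double] {
    def plus(x: Double, y: Double) = x + y
    def zero = 0.0
  }
}

object SumDemo {
  // The constraint "Num a =>" becomes an implicit parameter:
  def sum[A](xs: List[A])(implicit num: Num[A]): A =
    xs.foldLeft(num.zero)(num.plus)
  // sum(List(1, 2, 3)) == 6, sum(List(1.5, 2.5)) == 4.0
  // sum(List("a", "b")) does not compile: no Num[String] in scope
}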
However you cannot write this:
sum :: [Num] -> Num
because "Num" is not a type.
This distinction between type and typeclass is why we don't talk about inheritance and descendants of types in Haskell. There is a sort of inheritance for typeclasses: you can declare one typeclass as a descendant of another. The descendant describes a subset of the types described by the parent.
An important consequence of all this is that you can't have heterogeneous lists in Haskell. In the "sum" example you can pass it a list of integers or a list of doubles, but you cannot mix doubles and integers in the same list. This looks like a tricky restriction; how would you implement the old "cars and lorries are both types of vehicle" example? There are several answers depending on the problem you are actually trying to solve, but the general principle is that you do your indirection explicitly using first-class functions rather than implicitly using virtual functions.
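Here's one possible shape of that, sketched in Scala for an OO reader (all names invented): rather than a heterogeneous list of vehicle objects, you keep a homogeneous list of just the operations you need.

object FleetDemo {
  // Represent each "vehicle" by a record of first-class functions.
  case class VehicleOps(describe: () => String, wheelCount: () => Int)

  def car(model: String): VehicleOps =
    VehicleOps(() => "car " + model, () => 4)

  def lorry(axles: Int): VehicleOps =
    VehicleOps(() => "lorry with " + axles + " axles", () => axles * 2)

  // The list is homogeneous: every element has the same type.
  val fleet: List[VehicleOps] = List(car("Mini"), lorry(3))
  // fleet.map(v => v.describe()) works without any virtual dispatch.
}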
Well, the short version is: Type classes are what Haskell uses for ad-hoc polymorphism.
...but that probably didn't clarify anything for you.
Polymorphism should be a familiar concept to people from an OOP background. The key point here, however, is the difference between parametric and ad-hoc polymorphism.
Parametric polymorphism means functions that operate on a structural type that itself is parameterized by other types, such as a list of values. Parametric polymorphism is pretty much the norm everywhere in Haskell; C# and Java call it "generics". Basically, a generic function does the same thing to a specific structure, no matter what the type parameters are.
Ad-hoc polymorphism, on the other hand, means a collection of distinct functions, doing different (but conceptually related) things depending on types. Unlike parametric polymorphism, ad-hoc polymorphic functions need to be specified individually for each possible type they can be used with. Ad-hoc polymorphism is thus a generalized term for a variety of features found in other languages, such as function overloading in C/C++ or class-based dispatch polymorphism in OOP.
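A compact way to see the contrast (a Scala sketch with made-up names; Scala expresses ad-hoc polymorphism through its type class pattern):

object PolyDemo {
  // Parametric: one body that works uniformly for every element type.
  def count[A](xs: List[A]): Int = xs.length

  // Ad-hoc: behaviour differs per type; one definition per type.
  trait Show[A] { def show(a: A): String }
  implicit val showInt: Show[Int] = new Show[Int] {
    def show(a: Int) = "Int(" + a + ")"
  }
  implicit val showBool: Show[Boolean] = new Show[Boolean] {
    def show(a: Boolean) = if (a) "yes" else "no"
  }
  def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)
  // describe(42) == "Int(42)", describe(true) == "yes"
}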
A major selling point of Haskell's type classes over other forms of ad-hoc polymorphism is greater flexibility due to allowing polymorphism anywhere in the type signature. For instance, most languages won't distinguish overloaded functions based on return type; type classes can.
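Here's a small sketch of return-type dispatch in Scala (illustrative names): which instance gets selected depends only on the type the caller expects back.

object ReturnTypeDemo {
  trait Default[A] { def value: A }
  implicit val defaultInt: Default[Int] = new Default[Int] { def value = 0 }
  implicit val defaultString: Default[String] = new Default[String] { def value = "" }

  def default[A](implicit d: Default[A]): A = d.value

  val n: Int    = default   // selects Default[Int], yields 0
  val s: String = default   // selects Default[String], yields ""
}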
Interfaces as found in many OOP languages are somewhat similar to Haskell's type classes--you specify a group of function names/signatures that you want to treat in an ad-hoc polymorphic fashion, then explicitly describe how various types can be used with those functions. Haskell's type classes are used similarly, but with greater flexibility: you can write arbitrary type signatures for the type class functions, with the type variable used for instance selection appearing anywhere you like, not just as the type of an object that methods are being called on.
Some Haskell compilers--including the most popular, GHC--offer language extensions that make type classes even more powerful, such as multi-parameter type classes, which let you do ad-hoc polymorphic function dispatch based on multiple types (similar to what's called "multiple dispatch" in OOP).
To try to give you a bit of the flavor of it, here's some vaguely Java/C#-flavored pseudocode:
interface IApplicative<>
{
    IApplicative<T> Pure<T>(T item);
    IApplicative<U> Map<T, U>(Function<T, U> mapFunc, IApplicative<T> source);
    IApplicative<U> Apply<T, U>(IApplicative<Function<T, U>> apFunc, IApplicative<T> source);
}

interface IReducible<>
{
    U Reduce<T, U>(Function<T, U, U> reduceFunc, U seed, IReducible<T> source);
}
Note that we're, among other things, defining an interface over a generic type and defining a method where the interface type appears only as the return type, Pure. Not apparent is that every use of the interface name should mean the same type (i.e., no mixing different types that implement the interface), but I wasn't sure how to express that.
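As an aside, Scala can express this constraint directly, because a type parameter can itself range over type constructors (higher-kinded types); writing F[_] guarantees that every occurrence refers to the same implementing type. A sketch:

import scala.language.higherKinds

// A sketch of the same interfaces in Scala; F is the implementing
// type constructor, so F[T] and F[U] always refer to one type.
trait Applicative[F[_]] {
  def pure[T](item: T): F[T]
  def map[T, U](f: T => U, source: F[T]): F[U]
  def ap[T, U](f: F[T => U], source: F[T]): F[U]
}

trait Reducible[F[_]] {
  def reduce[T, U](f: (T, U) => U, seed: U, source: F[T]): U
}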
In C++/etc., "virtual methods" are dispatched according to the type of the implicit this/self argument. (The method is looked up in a function table that the object implicitly points to.)
Type classes work differently, and can do everything that "interfaces" can and more. Let's start with a simple example of something that interfaces can't do: Haskell's Read type-class.
ghci> -- this is a Haskell comment, like using "//" in C++
ghci> -- and ghci is an interactive Haskell shell
ghci> 3 + read "5" -- Haskell syntax is different, in C: 3 + read("5")
8
ghci> sum (read "[3, 5]") -- [3, 5] is a list containing 3 and 5
8
ghci> -- let us find out the type of "read"
ghci> :t read
read :: (Read a) => String -> a
read's type is (Read a) => String -> a, which means for every type that implements the Read type-class, read can convert a String to that type. This is dispatch based on return type, impossible with "interfaces".
This can't be done in C++ et al.'s approach, where the function table is retrieved from the object: here, you don't even have the relevant object until after read returns it, so how could you call it?
A key implementation difference from interfaces that allows this to happen is that the function table isn't pointed to inside the object; it is passed separately by the compiler to the called functions.
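To make the mechanism concrete, here is a sketch of that "dictionary translation" in Scala (invented names); the function table becomes an ordinary extra argument:

object DictDemo {
  // The "function table", reified as a plain value.
  final case class NumDict[A](plus: (A, A) => A, zero: A)

  // "sum :: Num a => [a] -> a" after dictionary translation:
  def sumWith[A](dict: NumDict[A], xs: List[A]): A =
    xs.foldLeft(dict.zero)(dict.plus)

  val intDict = NumDict[Int](_ + _, 0)
  // sumWith(intDict, List(1, 2, 3)) == 6
  // The dictionary travels beside the data, not inside the objects.
}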
Additionally, in C++/etc., when you define a class you are also responsible for implementing its interfaces. This means that you can't just invent a new interface and make Int or std::vector implement it.
In Haskell you can, and it isn't by "monkey patching" like in Ruby. Haskell has a good name-spacing scheme that means that two type classes can both have a function of the same name and a type can still implement both.
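A sketch of that namespacing point, in Scala's type class pattern (invented names); the two render methods never collide because each lives in its own "interface", and both are implemented for a pre-existing type:

object NamespaceDemo {
  trait PrettyPrint[A] { def render(a: A): String }
  trait Csv[A]         { def render(a: A): String }

  // Both "interfaces" are implemented for Int, long after Int was defined.
  implicit val prettyInt: PrettyPrint[Int] = new PrettyPrint[Int] {
    def render(a: Int) = "the number " + a
  }
  implicit val csvInt: Csv[Int] = new Csv[Int] {
    def render(a: Int) = a.toString
  }

  def pretty[A](a: A)(implicit p: PrettyPrint[A]): String = p.render(a)
  def csv[A](a: A)(implicit c: Csv[A]): String = c.render(a)
  // pretty(7) == "the number 7", csv(7) == "7"
}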
This allows Haskell to have many simple classes like Eq (types that support equality checking), Show (types that can be printed to a String), Read (types that can be parsed from a String), Monoid (types that have a concatenation operation and an empty element) and many more, and allows for even the primitive types like Int to implement the appropriate type-classes.
With the richness of type-classes, people tend to program against more general types, which gives more reusable functions; and since general types allow less freedom in the implementation, they may even produce fewer bugs!
tldr: type-classes == awesome
In addition to what xtofl and camccann have already written in their excellent answers, a useful thing to notice when comparing Java's interfaces to Haskell's type classes is the following:
Java interfaces are closed, meaning that the set of interfaces any given class implements is decided once and for all when and where it is defined;
Haskell's type classes are open, meaning that any type (or group of types for multi-parameter type classes) can be made a member of any type class at any time, as long as suitable definitions can be provided for the functions defined by the type class.
This openness of type classes (and Clojure's protocols, which are very similar) is a very useful property; it is quite usual for a Haskell programmer to come up with a new abstraction and immediately apply it to a range of problems involving pre-existing types through clever use of type classes.
A type class can be compared to the concept of 'implementing' an interface. If some data type in Haskell implements the "Show" interface, it can be used with all functions that expect a "Show" object.
With OOP, you inherit both interface and implementation. Haskell type-classes allow these to be separated. Two utterly unrelated types can both expose the same interface.
Perhaps more importantly, Haskell allows class implementations to be added "after the fact". That is, I can invent some new type-class of my own, and then go and make all the standard pre-defined types be instances of this class. In an OO language, you [usually] can't easily add a new method to an existing class, no matter how useful that would be.
In a modern functional language like Scala, type variance is inherent in the type. Here's e.g. Scala's Function1:
trait Function1[-T1, +R] { ... }
It is contravariant in the parameter type and covariant in the return type. And here's Java's counterpart:
interface Function<T,R> { ... }
Now, to express a variance relationship, the special bounded-wildcard syntax is used. For example, Stream's map function is declared as
<R> Stream<R> map(Function<? super T, ? extends R> mapper);
Here, Java shifts the declaration of variance relationship from the type itself to its use as a param in some method signature.
Here's my question. Would I be amiss to say that there cannot be any legitimate usages of Function<T,R> that are not contravariant in T and covariant in R? In other words, does Java's way offer useful extra flexibility not found in Scala, or is it just a lot of repetitive unwieldy boilerplate?
For Function specifically, no. Function defines exactly one abstract method, apply, which uses T contravariantly and R covariantly. But Function isn't what they had in mind when they designed that feature.
When the Java devs designed call-site variance, they were imagining classes that had both covariant and contravariant uses. For instance, in principle, the E in List<E> must be invariant. It appears in covariant position in get and in contravariant position in add.
So the rationale was this. Suppose we have a type hierarchy X <= Y <= Z. That is, X is a class that subclasses Y, and Y in turn subclasses Z. A List<Y> can do anything with type Y. It can have Ys added to the end, and a user can retrieve elements of type Y from it. But it can never be a List<Z> or a List<X>, since adding to a List<X> would be unsound, and so would retrieving as a List<Z>.
But we can express our intention. List<? extends Y> is a type we can only ever read from. It actually can take a List<Z> under the hood, since a list of Z elements is genuinely still (at least for covariant methods) a list of Y elements. We can get elements from this list, but we can't add to the end of it, since we said we're using the type argument in covariant position but add uses the type argument contravariantly. Essentially, List<? extends Y> is a smaller interface that includes some of the methods from the actual interface List.
The same is true of List<? super Y>. We can't read elements from it as Ys, since we don't know that every element is of type Y. But we can add to it, since we know the list at least supports elements of type Y. We can use all of the contravariant methods, like add, but none of the covariant ones.
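Scala makes the same split visible with its use-site variance syntax; a small sketch (invented names):

import scala.collection.mutable

object UseSiteVarianceDemo {
  class Z; class Y extends Z; class X extends Y

  // Seq[_ <: Y]    ~  List<? extends Y> : safe to read Ys out of
  // Buffer[_ >: Y] ~  List<? super Y>   : safe to write Ys into
  def copyYs(src: Seq[_ <: Y], dst: mutable.Buffer[_ >: Y]): Unit =
    src.foreach(dst += _)

  val xs = Seq(new X, new X)        // a Seq[X]
  val zs = mutable.Buffer.empty[Z]  // a Buffer[Z]
  copyYs(xs, zs)                    // fine: X <: Y for reading, Z >: Y for writing
}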
For a type like List that uses its type arguments in different ways, the call-site variance makes some amount of sense. For a special-purpose interface like Function that does one thing, it makes little sense.
That was the Java developers' rationale some twenty years ago when generics were added to Java. A lot has happened since then. If someone wrote an interface like List in today's world, an interface with upwards of 20 abstract methods, half of which have "this method may not be supported and might just throw UnsupportedOperationException" baked into the contract, they'd rightly be laughed off the stage.
Today's world is one of small, tight interfaces. We follow the SOLID principles. An interface does one thing and does it well. If an interface defines more than two or three (non-defaulted, non-inherited) methods, we give pause and ask if we can make it more modular. And we try to design systems that are more immutable by design, to support scaling and concurrency. We have records, or data classes or whatever your favorite language calls them, that are immutable by default.
So twenty years ago, the idea of a massive super-interface that does twenty things and that can be narrowed down dynamically via type projections seemed pretty cool. Today, it makes far more sense to specify the variance at the declaration site, since most interfaces are small and have a clear use case in mind.
The scala.collection.Seq trait defines three abstract, non-inherited methods (apply, iterator, and length), and all of those use the type argument covariantly, so Seq is defined with a covariant type. The corresponding mutable trait adds one more method (update), which uses its type argument contravariantly, so it has an invariant argument.
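A miniature of that split, sketched (this is not the real library code):

trait ReadOnlySeq[+A] {          // covariant: A appears only as output
  def apply(i: Int): A
  def length: Int
}

trait MutableSeq[A] extends ReadOnlySeq[A] {  // invariant: A is also an input
  def update(i: Int, elem: A): Unit
}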
In Scala, if you want to modify a sequence, you take a scala.collection.mutable.Seq. If you want to read, you take a scala.collection.Seq. And those interfaces are small enough and narrow enough in purpose that having several of them doesn't hurt code quality (especially since traits and classes in Scala are cheap to write, compared to the boilerplate necessary in Java to make even a simple class).
Actually, Scala supports both declaration and use site variance. Specifically, you can specify bounded wildcards just like in Java.
This already hints that declaration site variance cannot replace use site variance in all cases. The reason is that a declaration can only be variant if it is variant in all possible uses. If some uses are variant but other uses are not, we can't use declaration site variance, but we can use use site variance.
For instance, class Array[A] cannot be declared variant, but the method appendedAll from ArrayOps can employ use site variance
def appendedAll[B >: A](suffix: Array[_ <: B])(implicit arg0: ClassTag[B]): Array[B]
since it uses covariant methods of suffix.
In other words, does Java's way offer useful extra flexibility not found in Scala, or is it just a lot of repetitive unwieldy boilerplate?
Suppose that class Foo extends class Parent and is in turn extended by class Child.
Then, as you know, we can pass an instance of ArrayList<Foo> to a method that takes a List<? extends Parent>, or to a method that takes a List<? super Child>.
A reason that a method might take List<? extends Parent> is if it only reads from the list (it never writes to it), and can happily support any element that's an instance of Parent (because it doesn't need anything specific to Foo or another subtype).
A reason that a method might take List<? super Child> is if it only writes to the list (it never reads from it), and the elements that it writes are always instances of Child (so it doesn't care whether the list can accept arbitrary instances of Foo or another supertype).
That said, yes, it is a lot of repetitive unwieldy boilerplate! As a result, it's not uncommon to come across a method that could take a List<? extends Parent> or a List<? super Child> but instead just takes a List<Parent> or a List<Child> (respectively).
I am not an expert on this, but it can be argued that Java's wildcard syntax in the Function interface offers more flexibility in expressing variance: variance is declared specifically at the point where the type is used as a method parameter or return type, rather than being inherent to the type itself, as with Scala's Function1. On the other hand, it can also be seen as repetitive and unwieldy boilerplate. Which way you see it depends on the developer and the use case.
From the text "Java Generics and Collections" by Naftalin and Wadler, a passage states that though Integer is a subtype of Number, List<Integer> is not a subtype of List<Number>.
This prevents one from using, polymorphically, references to the former in places where one might traditionally expect the latter to be allowed.
My question is this: since the substitution principle does not apply between List<Integer> and List<Number>, does the restriction arise, in all cases and in general, wherever a class and a specific type argument are combined (as in 'List<Integer>')? Here I mean substitution with respect to a parameterized type like 'List<Integer>' taken as a whole, as opposed to 'List' or 'Integer' separately.
Or alternatively, is the restriction instead defined through some mechanism that specifies whether and which classes are subtypes (and thus when and where it applies), as one does through the usual extends and implements mechanism?
Essentially, I do not understand the mechanism that determines whether the substitution principle applies or does not apply in such cases.
many thanks
If I understood the question correctly: in Java, in the situation you describe, the Liskov substitution principle does not apply because List<Integer> is not a subtype of List<Number>, and being a subtype is a prerequisite for the Liskov substitution principle.
That being said, the relation between List<Integer> and List<Number> can be described by covariance and contravariance, which model an expectation that could be formulated as follows.
As Integer is a subtype of Number, meaning every implementation for Number can be used for Integer, one would expect the same to apply to types which instantiate the same generic template but use Integer and Number as type arguments.
However, to my understanding, this is a different mechanism, which is also discussed in this question for generics.
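A quick sketch of why that expectation must be rejected for mutable containers (in Scala for brevity; the List<Integer> versus List<Number> story in Java is identical):

object WhyInvariance {
  import scala.collection.mutable

  val ints: mutable.Buffer[Int] = mutable.Buffer(1, 2, 3)

  // If Buffer[Int] were a subtype of Buffer[AnyVal], this would compile:
  // val nums: mutable.Buffer[AnyVal] = ints   // rejected by the compiler
  // nums += 3.14                              // ...and would then put a Double
  //                                           // into what is still ints
}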
I'm trying to understand the benefit of a programming language being statically typed, and through that, I'm wondering why we need to include the type in a declaration. Does it serve any purpose other than to make the type explicit? If that's all it does, I don't see the point. I understand that static typing allows for type checking at compile time, but if we leave out the explicit type declaration, can't Java still infer the type at compile time?
For example, let's say we have in Java:
myClass test = new myClass();
Isn't the type declaration unnecessary here? If I'm not mistaken, this is static binding, and Java should know test is of type myClass without explicit declaration of type even at compile-time.
Response to possible duplicate: this is not a question regarding static vs. dynamic type, but rather about type inference in statically typed languages, as explained in the accepted answer.
There are statically typed languages that allow you to omit the type declaration. This is called type inference. The downsides are that it's tougher to design (for the language designers), tougher to implement (for the compiler writers), and can be tougher to understand when something goes wrong (for programmers). The problem with the last one of those is that if many (or all) of your types are inferred, the compiler can't really tell you much more than "the types aren't all consistent" — often via a cryptic message.
In a trivial case like the one you cite, yes, it's easy. But as you get farther from the trivial case, the system quickly grows in complexity.
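For a taste of what fuller inference looks like, here are some Scala equivalents; the comments show what the compiler infers. (More recent Java versions also added limited local-variable inference via var.)

object InferenceDemo {
  val test = new StringBuilder   // inferred: StringBuilder
  val xs   = List(1, 2, 3)       // inferred: List[Int]
  def twice(x: Int) = x * 2      // result type inferred: Int
}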
Java does actually do a bit of type inference, in very limited forms. For instance, in this snippet:
List<String> emptyStrings = Collections.emptyList();
... the compiler has inferred that the method call emptyList returns a List<String>, and not just a List<T> where the type T is unspecified. The non-inferred version of that line (which is also valid Java) is:
List<String> emptyStrings = Collections.<String> emptyList();
It is necessary where inheritance is involved: the declared type can deliberately differ from the class of the object being created.
For example:
Building build1 = new House();
Building build2 = new SkyScraper();
The same applies with polymorphism.
You can then collect all the Buildings into one array, for example. If build1 were declared as a House and build2 as a SkyScraper, you couldn't do this.
Scala
Where can differences between a class and a type be observed in Scala and why is this distinction important?
Is it only a consideration from the language design point-of-view or has it "practical" impact when programming Scala?
Or is it fundamental to "securing the boundaries" of the type system (Nothing, Null come to my mind)?
Java
How many of the considerations/differences/problems mentioned above can also be recognized in Java?
(See What is the difference between Type and Class? as a language-agnostic introduction.)
When you say "type" I'm going to assume you mean static type mostly. But I'll talk about dynamic types shortly.
A static type is a property of a portion of a program that can be statically proven (static means "without running it"). In a statically typed language, every expression has a type whether you write it or not. For instance, in the C-ish "int x = a * b + c - d", a, b, c, and d have types, a * b has a type, a * b + c has a type and a * b + c - d has a type. But we've only annotated x with a type. In other languages, such as Scala, C#, Haskell, SML, and F#, even that wouldn't be necessary.
Exactly what properties are provable depends on the type checker.
A Scala style class, on the other hand, is just the specification for a set of objects. That specification includes some type information and includes a lot of implementation and representation details such as method bodies and private fields, etc. In Scala a class also specifies some module boundaries.
Many languages have types but don't have classes and many languages have classes but don't have (static) types.
There are several observable differences between types and classes. List[String] is a type but not a class. In Scala, List is a class but normally not a type (it's actually a higher kinded type). In C# List isn't a type of any sort, and in Java it's a "raw type".
Scala offers structural types. {def foo : Bar} means any object that provably has a foo method that returns a Bar, regardless of class. It's a type, but not a class.
Types can be abstracted using type parameters. When you write def foo[T](x : T) = ..., then inside the body of foo T is a type. But T is not a class.
Types can be virtual in Scala (i.e., "abstract type members"), but classes can't be virtual in Scala today (although there's a boilerplate-heavy way to encode virtual classes: https://wiki.scala-lang.org/display/SIW/VirtualClassesDesign)
Now, dynamic types. Dynamic types are properties of objects that the runtime automatically checks before performing certain operations. In dynamically typed class-based OO languages there's a strong correlation between types and classes. The same thing happens on JVM languages such as Scala and Java, which have operations that can only be checked dynamically, such as reflection and casting. In those languages, "type erasure" more or less means that the dynamic type of most objects is the same as their class. More or less. That's not true of, e.g., arrays, which typically aren't erased, so that the runtime can tell the difference between Array[Int] and Array[String].

But remember my broad definition: "dynamic types are properties of objects that the runtime automatically checks." When you use reflection it is possible to send any message to any object. If the object supports that message then everything works out. Thus it makes sense to talk of all objects that can quack like a duck as a dynamic type, even though it's not a class. That's the essence of what the Python and Ruby communities call "duck typing."

Also, by my broad definition, even "zeroness" is a dynamic type in the sense that, in most languages, the runtime automatically checks numbers to make sure you don't divide by zero. There are very, very few languages that can prove that statically by making zero (or not-zero) a static type.
Finally, as others have mentioned, there are types like int which don't have a class as an implementation detail, types like Null and Any which are a bit special but COULD have classes and don't, and types like Nothing which doesn't even have any values, let alone a class.
Okay, I'll bite... James has a good answer, so I'm going to try a different tack and give a more down-to-earth viewpoint.
Broadly speaking, a class is something that can be instantiated. Singleton objects (Scala), traits (Scala) and interfaces (Java) are also commonly considered to be classes. This makes sense, as singletons are still instantiated (via compiler-generated code) and an interface can be instantiated as part of a subclass.
Which brings us to the second point: classes are the primary unit of design in most object-oriented languages (though not the prototype-based ones like JavaScript). Polymorphism and subclassing are both defined in terms of classes. Classes also provide a namespace and visibility controls.
Types are a very different beast: every possible value that the system can express will have one or more types, and these can sometimes be equated to classes. For example:
(Int) => String // both the type and class are Function1[Int,String]
"hello world" // class and type are String
You also get some interesting differences between Scala and Java:
7 // both the class and type are Int in Scala
// in Java there's no class and the type is Integer.TYPE
println("hello world") // the return type is Unit, of class Unit
// Java has void as a type, but no corresponding class
error("oops") // the type and class are both "Nothing"
and the really fun types that aren't classes at all. For example, this.type always refers to the unique type of this. It's unique to a single instance and isn't even compatible with other instances of the same class.
There are also abstract types and type parameters. For example:
type A // 'A' is an undetermined abstract type
// to be made concrete in a subclass
class Seq[T] { ... } // T is a type, but not a class
Seq is interesting as it's a class, but not a type. More accurately, it's a "type constructor": something that will construct a valid type when supplied with the necessary type parameter. Another term for type constructors is "higher-kinded types". I personally don't like this term, as "type constructor" encourages me to think in terms of supplying types like any other form of argument - a mental model that has served me well for Scala.
"higher-kinded" rightly implies that Seq has a "kind", which is * => *, this notation states that Seq will take a single type and yield a single type (this is similar to curried notation for describing functions). By way of comparison, the kind of Map is * => * => * because it takes two type parameters.
A type can be useful by itself, without any instances. One example of this is the "phantom type". Here is an example for Java: http://michid.wordpress.com/2008/08/13/type-safe-builder-pattern-in-java/
In that example we have public static class Initializer<HA, HB>, where HA and HB take some types (represented by the abstract classes TRUE and FALSE), without ever being instantiated.
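Here is a minimal phantom-type sketch in Scala (invented names); the parameter S is never instantiated with a value and exists only so the compiler can reject ill-ordered calls:

object PhantomDemo {
  sealed trait Open
  sealed trait Closed

  final class Door[S] private ()
  object Door {
    def closed: Door[Closed]               = new Door
    def open(d: Door[Closed]): Door[Open]  = new Door
    def close(d: Door[Open]): Door[Closed] = new Door
  }

  val d = Door.open(Door.closed)   // fine
  // Door.open(d)  -- does not compile: d is already a Door[Open]
}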
I hope this shows that types and classes are different things, and that types can be useful by themselves.
(Java only) I'd say a type is a set of objects. Object o is of type X if o is a member of set X. Type X is a subtype of Y if set X is a subset of Y.
For every class C (not interface) there is a set of objects, created from new C(...). Interestingly, we rarely care about this set. (But every object does belong to a set like this, a fact that may be useful.)
For every class C, there is a type t(C), generally referred to as "the type C", which is the set of all objects that can be created from new S(...) where S is C or a subclass of C.
Similarly, for every interface I, there is a type t(I), "the type I", which is the set of all objects that can be created from new S(...) where S implements I.
Obviously, if class S is a subclass of C, type S is a subtype of type C. Similarly for interfaces.
There is a null type, which is the empty set. The null type is a subtype of every type.
There is a set of all objects, which is the type Object. It's a super type of every type.
So far, this formalism is pretty useless: a type is basically the same as a class or an interface, and the subtype relation is basically the subclass/subinterface relation. The triviality is a good thing; it kept the language understandable! But enter generics: there are now more complicated types, and operations like unions and intersections of types. Types are no longer only classes and interfaces, and subtype relations are much richer and harder to understand.
I have the following problem:
Given a Guice type literal TypeLiteral<T> template and a class Class c implementing or extending T, construct a type Type t which is equivalent to c with all type variables instantiated so as to be compatible with template.
If c has no type variables, it's easy; c is the type in question. However, if c has type variables, then I need to do the following:
1. Find the type in c's inheritance and implementation hierarchy corresponding to the raw type of T.
2. Walk through the type parameter structure, finding any type variable uses and their corresponding types in template.
3. Use the Guice Types helper functions to create a type from c instantiated with the types found in step 2.
Of course, there are error cases and it might not be complete. If it can't find matching uses of all type variables, it will fail. There might be other cases as well. However, if I have this:
class CS<I> implements S<Map<I,Float>> {
    // some stuff
}
and a type literal TypeLiteral<S<Map<String,Float>>> (say), I want to get a type which represents CS fully instantiated to match the type literal - here, CS<String>.
It looks like reflection provides enough information to accomplish this, but the logic looks complex and error-prone. Is there an existing library which exposes this logic?
TypeLiteral.getSupertype() should do the trick:
TypeLiteral<?> t = TypeLiteral.get(x).getSupertype(y);
This problem is an instance of the unification problem, and as such the standard unification algorithm is applicable and not as complicated as I initially thought. Further, this instance of the problem allows for some significant simplifying assumptions, as one of the trees will contain no variables. 200 lines of Java later, I have a working solution.