I searched through SO for a while and I could not find a definite and general answer, only some contradictory and particular opinions. [1]
So I would like to know what is the relationship between duck typing and generic programming? ( DT < GP , DT == GP, DT > GP ) . By generic programming I refer to ,in particular, C++ templates or Java generics, but a general answer, related to the concepts, if it's possible, would be welcomed.
I know that generic programming will be handled at compile time, while duck typing will be handled at runtime, but appart of this I do not know how to position them.
Lastly, I do not want to start a debate, so I would prefer answers like reasons for, reasons against.
[1] What's the relationship between C++ template and duck typing?
I have encountered two different definitions of "Duck Typing". No doubt there are people who will tell you that one of them is "correct" and the other is "incorrect". I merely attempt to document that they're both used rather than tell you that they're both "correct", but personally I see nothing wrong with the broader meaning.
1) runtime-only typing. Type is a property of objects, not variables, and hence necessarily when you come to call a method on an object, or otherwise use properties that it has by virtue of its type, the presence or absence of that method is discovered at runtime[*]. So if it "looks like a duck and quacks like a duck" (i.e. if it turns out to have a quack() function) then it "is" a duck (anyway, you can treat it like one). By this definition of course C++ templates fall at the first hurdle, they use static typing.
2) A name used more generally for the principle that if it looks like a duck and quacks like a duck then it is a duck, to mean any setup in which interfaces are implicitly defined by the operations performed by the consumer of the interface, rather than interfaces being explicitly advertised by the producer (whatever implements the interface). By this definition, C++ templates do use a kind of duck-typing, but whether something "looks like a duck" is determined at compile time, not at runtime, based on its static type rather than its dynamic type. "Check at compile time, can this variable's static type quack?", not "check at runtime, can this object quack?".
The dispute seems to me in effect to be over whether Python "owns" the term, so that only Python-like type systems can be called duck typing, or whether others are free to appropriate the term to mean a similar concept in a different context. Whatever you think it should mean, it seems irresponsible to use a "jokey" term and demand that everyone understands the same formal definition from it. SO is not a suitable forum to tell you what a term "should" mean, unless there's an authoritative source that you're asking about, like a dictionary or any academic papers that define it formally. I think it can tell you what it's actually used to mean.
"Generic programming" can make use of duck typing or not, depending on the exact details. C++ generic containers do use "type 2" duck-typing, because for a general type in C++ it's not guaranteed that you can copy it, compare it, get a hash value, etc. Java generic containers don't, Object has enough methods already to make it hashable etc.
Conversely, I suspect that anything you do that uses duck typing, can reasonably be referred to as "generic programming". So I suppose, in the terms you asked for, GP > DT in that duck typing (either definition) is a strict subset of the vast range of stuff that can be called "generic".
[*] well, in some cases your dynamic language could have some static analysis that proves the case one way or the other, but the language demands the ability to defer this check until runtime for cases where the static analysis cannot conclusively say.
This is really a question of vocabulary. In it's most general sense,
generic programming is unrelated to the issue of compile-time vs.
run-time: it's a solution to a general problem. A good example of
where generic programming is run-time is Python, but it's also possible
to implement run-time generic programming in C++ (at a significant cost
in execution time).
Duck typing is an orthogonal concept, and is usually used to imply
runtime typing. Again, the most frequently cited modern example is
Python, but many, many languages, starting with Lisp, have used it in
the past. As a general rule, both C++ and Java have made an explicit
choice not to support duck typing. It's a trade-off: safety vs.
flexibility (or compile-time errors vs. run-time errors).
Java doesn't support duck typing in the language. It does support reflection which can achieve the same thing. It doesn't have any relationship with Java's Generics as far as I can see, in fact getting them to work together is a real pain.
For me, "duck typing" means there is no explicit conformance relationship. If something walks like a duck and talks like a duck, it can be treated like a duck, and doesn't need to explicitly declare that it is a duck. In C++ terms, it doesn't need to inherit from a Duck base class: inheritance is a way of declaring that one class conforms to the interface of another explicitly.
This notion is orthogonal to whether type-checking happens at run-time or compile-time. Languages like Smalltalk provide duck-typing that happens at run-time (and inheritance is used to reuse implementation, not to declare conformance of interface). C++ templates are a form of duck-typing that happens at compile-time.
And that last sentence is the answer to the question.
Related
Upon reading the following:
A lot of people define static typing and dynamic typing with respect
to the point at which the variable types are checked. Using this
analogy, static typed languages are those in which type checking is
done at compile-time, whereas dynamic typed languages are those in
which type checking is done at run-time.
This analogy leads to the analogy we used above to define static and
dynamic typing. I believe it is simpler to understand static and
dynamic typing in terms of the need for the explicit declaration of
variables, rather than as compile-time and run-time type checking.
Source
I was thinking that the two ways we define static and dynamic typing: compile-time checking and explicit type declaration are a bit like apples and oranges. A characteristic in all statically typed languages (from my knowledge) is the reference variables have a defined type. Can there be a language that has the benefits of compile-time checking (like Java) but also the ability to have variables unbounded to a specific type (like Python)?
Note: Not exactly type inference in a language like Java, because the variables are still assigned a type, just implicitly. This theoretical language wouldn't have reference types, so there would be no casting. I'm trying to avoid the use of "static typing" vs "dynamic typing" because of the confusion.
There could be, but should there be?
Imagine in hypothetical-pseudo-C++:
class Object
{
public:
virtual Object invoke(const char *name, std::list<Object> args);
virtual Object get_attr(const char *name);
virtual const Object &set_attr(const char *name, const Object &src);
};
And that you have a language that arranges:
to make Object class the root base class of all classes
syntactic sugar to turn blah.frabjugate() into blah.invoke("frabjugate") and
blah.x = 10 into blah.set_attr("x", 10)
Add to this something combining attributes of boost::variant and boost::any and you have a pretty good start. All the dynamicism (both good and runtime bugs bad) of Python with the eloquence and rigidity (yay!) of C++ or Java. With added run-time bloat and efficiency of hash-table lookups vs. call/jmp machine instructions.
In languages like Python, when you call blah.do_it() it has to do potentially multiple hash table lookups of the string "do_it" to find out if your instance blah or its class has a callable thing called "doit" every time it is called. This is the most extreme late-binding that could be imaged:
flarg.do_it() # replaces flarg.do_it()
flarg.do_it() # calls a different flarg.do_it()
You could have your hypothetical language give some control over when the binding occurs. C++-like standard methods are crudely static bound to the apparent reference type, not the real instance type. C++ virtual methods are late-bound to the object instance type. Python-like attributes and methods are extremely late bound to the current version of the object instance.
I think you could definitely program in a strong static typed language in a dynamic style, just as you could build an interpreter in a language like C++ or Java. Some syntax hooks could make it look a little more seamless. But maybe you could do the same in reverse: maybe a Python decorator that automatically checks argument types, or a MetaClass that does it at compile time? [no, I don't think this is possible...]
I think you should view it as a union of features. but you'd get both the best and the worst of both worlds...
Can there be a language that has the benefits of compile-time checking (like Java) but also the ability to have variables unbounded to a specific type (like Python)?
Actually mostly language have support for both, so yes. The difference is which form is preferred/easier and generally used. Java prefers static types but also supports dynamic casts and reflection.
This theoretical language wouldn't have reference types, so there would be no casting.
You have to consider that language also need to perform reasonably well so you have to consider how they will be implemented. You could have a super type but this makes optimisation very hard and you code will most likely either run slowly or use much more resources.
The more popular languages tend to make pragmatic implementation choices. They are not purely one type or another and are willing to borrow styles even if they don't handle them as cleanly as a "pure" language.
what exactly do they allow the compiler or programmer to do that dynamic types can't?
It is generally accepted that the quicker you find a bug, the cheaper it is to fix. When you first start programming, the cost of maintenance isn't high in your mind, but once you have much more experience you will realise that a successful project costs far more to maintain than it did to develop and fixing long standing bugs can be really costly.
static languages have two advantages
you pick up bugs sooner rather than later. The sooner the better. With dynamic languages you might never discover a bug if the code is never run.
the cost of maintenance is easier. Static languages make clearer the assumption made when the code was first written and are more likely to detect issues if you don't have enough test coverage (btw, you never have enough test coverage)
No you cannot. The difference here boils down to early binding versus late binding. Early binding means matching everything up on the binary level upfront, fixing it in code. The result is rigid, type-safe and fast code. Late binding means there is some kind of runtime interpretation involved. This results in flexiblility (potentially unsafe) at the cost of performance.
The two approaches are different on a technical level (compilation versus interpretation) and the programmer would have to choose which is desired when, which would defeat the benefit of having both in the first place.
In languages that use a (common) language runtime however you do get some of what you are asking for through reflection. But it is organized differently and still type-safe. It is not the implicit kind of binding you refer to but requires a bit of work and awareness from the programmer.
As far as what is possible with static types that is impossible with dynamic types: nothing. They are both Turing complete
The value of static types is finding bugs early. In Python, something as simple as a misspelled name isn't caught until you run the program, and even then only if the line of code with the misspelling is run.
class NuclearReactor():
def turn_power_off(self):
...
def shut_down_cleanly(self):
self.turn_power_of()
What does it mean to say "with inheritance you're locked into compile-time decisions about code behavior".
I suggest this post from Donal Fellows on Programmers,
Some languages are pretty strongly static, and only allow the
specification of the inheritance relationship between two classes at
the time of definition of those classes. For C++, definition time is
practically the same as compilation time. (It's slightly different in
Java and C#, but not very much.) Other languages allow much more
dynamic reconfiguration of the relationship of classes (and class-like
objects in Javascript) to each other; some go as far as allowing the
class of an existing object to be modified, or the superclass of a
class to be changed. (This can cause total logical chaos, but can also
model real world nasties quite well.)
But it is important to contrast this to composition, where the
relationship between one object and another is not defined by their
class relationship (i.e., their type) but rather by the references
that each has in relation to the other. General composition is a very
powerful and ubiquitous method of arranging objects: when one object
needs to know something about another, it has a reference to that
other object and invokes methods upon it as necessary. As soon as you
start looking for this super-fundamental pattern, you'll find it
absolutely everywhere; the only way to avoid it is to put everything
in one object, which would be massively dumb! (There's also stricter
UML composition/aggregation, but that's not what the GoF book is
talking about there.)
One of the things about the composition relationship is that
particular objects do not need to be hard-bound to each other. The
pattern of concrete objects is very flexible, even in very static
languages like C++. (There is an upside to having things very static:
it is possible to analyse the code more closely and — at least
potentially — issue better code with less overhead.) To recap,
Javascript, as with many other dynamic languages, can pretend it
doesn't use compilation at all; just pretence, of course, but the
fundamental language model doesn't require transformation to a fixed
intermediate format (e.g., a “binary executable on disk”). That
compilation which is done is done at runtime, and can be easily redone
if things vary too much. (The fascinating thing is that such a good
job of compilation can be done, even starting from a very dynamic
basis…)
Some GoF patterns only really make sense in the context of a language
where things are fairly static. That's OK; it just means that not all
forces affecting the pattern are necessarily listed. One of the key
points about studying patterns is that it helps us be aware of these
important differences and caveats. (Other patterns are more universal.
Keep your eyes open for those.)
I am trying to build a Java to C++ trans-compiler (i.e. Java code goes in, semantically "equivalent" (more or less) C++ code comes out).
Not considering garbage collection, the languages are quite familiar, so the overall process works quite well already. One issue, however, are generics which do not exist in C++. Of course, the easiest way would be to perform erasure as done by the java compiler. However, the resulting C++ code should be nice to handle, so it would be good if I would not lose generic type information, i.e., it would be good, if the C++ code would still work with List<X> instead of List. Otherwise, the C++ code would need explicit casting everywhere where such generics are used. This is bug-prone and inconvenient.
So, I am trying to find a way to somehow get a better representation for generics. Of course, templates seem to be a good candidate. Although they are something completely different (metaprogramming vs. compile-time only type enhancement), they could still be useful. As long as no wildcards are used, just compiling a generic class to a template works reasonably well. However, as soon as wildcards come into play, things get really messy.
For example, consider the following java constructor of a list:
class List<T>{
List(Collection<? extends T> c){
this.addAll(c);
}
}
//Usage
Collection<String> c = ...;
List<Object> l = new List<Object>(c);
how to compile this? I had the idea of using chainsaw reinterpret cast between templates. Then, the upper example could be compiled like that:
template<class T>
class List{
List(Collection<T*> c){
this.addAll(c);
}
}
//Usage
Collection<String*> c = ...;
List<Object*> l = new List<Object*>(reinterpret_cast<Collection<Object*>>(c));
however, the question is whether this reinterpret cast produces the expected behaviour. Of course, it is dirty. But will it work? Usually, List<Object*> and List<String*> should have the same memory layout, as their template parameter is only a pointer. But is this guaranteed?
Another solution I thought of would be replacing methods using wildcards by template methods which instanciate each wildcard parameter, i.e., compile the constructor to
template<class T>
class List{
template<class S>
List(Collection<S*> c){
this.addAll(c);
}
}
of course, all other methods involving wildcards, like addAll would then also need template parameters. Another problem with this approach would be handling wildcards in class fields for example. I cannot use templates here.
A third approach would be a hybrid one: A generic class is compiled to a template class (call it T<X>) and an erased class (call it E). The template class T<X> inherits from the erased class E so it is always possible to drop genericity by upcasting to E. Then, all methods containing wildcards would be compiled using the erased type while others could retain the full template type.
What do you think about these methods? Where do you see the dis-/advantages of them?
Do you have any other thoughts of how wildcards could be implemented as clean as possible while keeping as much generic information in the code as possible?
Not considering garbage collection, the languages are quite familiar, so the overall process works quite well already.
No. While the two languages actually look rather similar, they are significantly different as to "how things are done". Such 1:1 trans-compilations as you are attempting will result in terrible, underperforming, and most likely faulty C++ code, especially if you are looking not at a stand-alone application, but at something that might interface with "normal", manually-written C++.
C++ requires a completely different programming style from Java. This begins with not having all types derive from Object, touches on avoiding new unless absolutely necessary (and then restricting it to constructors as much as possible, with the corresponding delete in the destructor - or better yet, follow Potatoswatter's advice below), and doesn't end at "patterns" like making your containers STL-compliant and passing begin- and end-iterators to another container's constructor instead of the whole container. I also didn't see const-correctness or pass-by-reference semantics in your code.
Note how many of the early Java "benchmarks" claimed that Java was faster than C++, because Java evangelists took Java code and translated it to C++ 1:1, just like you are planning to do. There is nothing to be won by such transcompilation.
An approach you haven't discussed is to handle generic wildcards with a wrapper class template. So, when you see Collection<? extends T>, you replace it with an instantiation of your template that exposes a read-only[*] interface like Collection<T> but wraps an instance of Collection<?>. Then you do your type erasure in this wrapper (and others like it), which means the resulting C++ is reasonably nice to handle.
Your chainsaw reinterpret_cast is not guaranteed to work. For instance if there's multiple inheritance in String, then it's not even possible in general to type-pun a String* as an Object*, because the conversion from String* to Object* might involve applying an offset to the address (more than that, with virtual base classes)[**]. I expect you'll use multiple inheritance in your C++-from-Java code, for interfaces. OK, so they'll have no data members, but they will have virtual functions, and C++ makes no special allowance for what you want. I think with standard-layout classes you could probably reinterpret the pointers themselves, but (a) that's too strong a condition for you, and (b) it still doesn't mean you can reinterpret the collection.
[*] Or whatever. I forget the details of how the wildcards work in Java, but whatever's supposed to happen when you try to add a T to a List<? extends T>, and the T turns out not to be an instance of ?, do that :-) The tricky part is auto-generating the wrapper for any given generic class or interface.
[**] And because strict aliasing forbids it.
If the goal is to represent Java semantics in C++, then do so in the most direct way. Do not use reinterpret_cast as its purpose is to defeat the native semantics of C++. (And doing so between high-level types almost always results in a program that is allowed to crash.)
You should be using reference counting, or a similar mechanism such as a custom garbage collector (although that sounds unlikely under the circumstances). So these objects will all go to the heap anyway.
Put the generic List object on the heap, and use a separate class to access that as a List<String> or whatever. This way, the persistent object has the generic type that can handle any ill-formed means of accessing it that Java can express. The accessor class contains just a pointer, which you already have for reference counting (i.e. it subclasses the "native" reference, not an Object for the heap), and exposes the appropriately downcasted interface. You might even be able to generate the template for the accessor using the generics source code. If you really want to try.
I am a long time OO programmer and a functional programming newbie. From my little exposure algebraic data types only look like a special case of inheritance to me where you only have one level hierarchy and the super class cannot be extended outside the module.
So my (potentially dumb) question is: If ADTs are just that, a special case of inheritance (again this assumption may be wrong; please correct me in that case), then why does inheritance gets all the criticism and ADTs get all the praise?
Thank you.
I think that ADTs are complementary to inheritance. Both of them allow you to create extensible code, but the way the extensibility works is different:
ADTs make it easy to add new functionality for working with existing types
You can easily add new function that works with ADT, which has a fixed set of cases
On the other hand, adding new case requires modifying all functions
Inheritance makes it easy to add new types when you have fixed functionality
You can easily create inherited class and implement fixed set of virtual functions
On the other hand, adding a new virtual function requires modifying all inherited classes
Both object-oriented world and functional world developed their ways to allow the other type of extensibility. In Haskell, you can use typeclasses, in ML/OCaml, people would use dictionary of functions or maybe (?) functors to get the inhertiance-style extensibility. On the other hand, in OOP, people use the Visitor pattern, which is essentially a way to get something like ADTs.
The usual programming patterns are different in OOP and FP, so when you're programming in a functional language, you're writing the code in a way that requires the functional-style extensibility more often (and similarly in OOP). In practice, I think it is great to have a language that allows you to use both of the styles depending on the problem you're trying to solve.
Tomas Petricek has got the fundamentals exactly right; you might also want to look at Phil Wadler's writing on the "expression problem".
There are two other reasons some of us prefer algebraic data types over inheritance:
Using algebraic data types, the compiler can (and does) tell you if you have forgotten a case or if a case is redundant. This ability is especially useful when there are many more operations on things than there are kinds of thing. (E.g., many more functions than algebraic datatypes, or many more methods than OO constructors.) In an object-oriented language, if you leave a method out of a subclass, the compiler can't tell whether that's a mistake or whether you intended to inherit the superclass method unchanged.
This one is more subjective: many people have noted that if inheritance is used properly and aggressively, the implementation of an algorithm can easily be smeared out over a half a dozen classes, and even with a nice class browser at can be hard to follow the logic of the program (data flow and control flow). Without a nice class browser, you have no chance. If you want to see a good example, try implementing bignums in Smalltalk, with automatic failover to bignums on overflow. It's a great abstraction, but the language makes the implementation difficult to follow. Using functions on algebraic data types, the logic of your algorithm is usually all in one place, or if it is split up, its split up into functions which have contracts that are easy to understand.
P.S. What are you reading? I don't know of any responsible person who says "ADTs good; OO bad."
In my experience, what people usually consider "bad" about inheritance as implemented by most OO languages is not the idea of inheritance itself but the idea of subclasses modifying the behavior of methods defined in the superclass (method overriding), specifically in the presence of mutable state. It's really the last part that's the kicker. Most OO languages treat objects as "encapsulating state," which amounts to allowing rampant mutation of state inside of objects. So problems arise when, for example, a superclass expects a certain method to modify a private variable, but a subclass overrides the method to do something completely different. This can introduce subtle bugs which the compiler is powerless to prevent.
Note that in Haskell's implementation of subclass polymorphism, mutable state is disallowed, so you don't have such issues.
Also, see this objection to the concept of subtyping.
I am a long time OO programmer and a functional programming newbie. From my little exposure algebraic data types only look like a special case of inheritance to me where you only have one level hierarchy and the super class cannot be extended outside the module.
You are describing closed sum types, the most common form of algebraic data types, as seen in F# and Haskell. Basically, everyone agrees that they are a useful feature to have in the type system, primarily because pattern matching makes it easy to dissect them by shape as well as by content and also because they permit exhaustiveness and redundancy checking.
However, there are other forms of algebraic datatypes. An important limitation of the conventional form is that they are closed, meaning that a previously-defined closed sum type cannot be extended with new type constructors (part of a more general problem known as "the expression problem"). OCaml's polymorphic variants allow both open and closed sum types and, in particular, the inference of sum types. In contrast, Haskell and F# cannot infer sum types. Polymorphic variants solve the expression problem and they are extremely useful. In fact, some languages are built entirely on extensible algebraic data types rather than closed sum types.
In the extreme, you also have languages like Mathematica where "everything is an expression". Thus the only type in the type system forms a trivial "singleton" algebra. This is "extensible" in the sense that it is infinite and, again, it culminates in a completely different style of programming.
So my (potentially dumb) question is: If ADTs are just that, a special case of inheritance (again this assumption may be wrong; please correct me in that case), then why does inheritance gets all the criticism and ADTs get all the praise?
I believe you are referring specifically to implementation inheritance (i.e. overriding functionality from a parent class) as opposed to interface inheritance (i.e. implementing a consistent interface). This is an important distinction. Implementation inheritance is often hated whereas interface inheritance is often loved (e.g. in F# which has a limited form of ADTs).
You really want both ADTs and interface inheritance. Languages like OCaml and F# offer both.
Method signature in Java:
public List<String> getFilesIn(List<File> directories)
similar one in ruby
def get_files_in(directories)
In the case of Java, the type system gives me information about what the method expects and delivers. In Ruby's case, I have no clue what I'm supposed to pass in, or what I'll expect to receive.
In Java, the object must formally implement the interface. In Ruby, the object being passed in must respond to whatever methods are called in the method defined here.
This seems highly problematic:
Even with 100% accurate, up-to-date documentation, the Ruby code has to essentially expose its implementation, breaking encapsulation. "OO purity" aside, this would seem to be a maintenance nightmare.
The Ruby code gives me no clue what's being returned; I would have to essentially experiment, or read the code to find out what methods the returned object would respond to.
Not looking to debate static typing vs duck typing, but looking to understand how you maintain a production system where you have almost no ability to design by contract.
Update
No one has really addressed the exposure of a method's internal implementation via documentation that this approach requires. Since there are no interfaces, if I'm not expecting a particular type, don't I have to itemize every method I might call so that the caller knows what can be passed in? Or is this just an edge case that doesn't really come up?
What it comes down to is that get_files_in is a bad name in Ruby - let me explain.
In java/C#/C++, and especially in objective C, the function arguments are part of the name. In ruby they are not.
The fancy term for this is Method Overloading, and it's enforced by the compiler.
Thinking of it in those terms, you're just defining a method called get_files_in and you're not actually saying what it should get files in. The arguments are not part of the name so you can't rely on them to identify it.
Should it get files in a directory? a drive? a network share? This opens up the possibility for it to work in all of the above situations.
If you wanted to limit it to a directory, then to take this information into account, you should call the method get_files_in_directory. Alternatively you could make it a method on the Directory class, which Ruby already does for you.
As for the return type, it's implied from get_files that you are returning an array of files. You don't have to worry about it being a List<File> or an ArrayList<File>, or so on, because everyone just uses arrays (and if they've written a custom one, they'll write it to inherit from the built in array).
If you only wanted to get one file, you'd call it get_file or get_first_file or so on. If you are doing something more complex such as returning FileWrapper objects rather than just strings, then there is a really good solution:
# returns a list of FileWrapper objects
def get_files_in_directory( dir )
end
At any rate. You can't enforce contracts in ruby like you can in java, but this is a subset of the wider point, which is that you can't enforce anything in ruby like you can in java. Because of ruby's more expressive syntax, you instead get to more clearly write english-like code which tells other people what your contract is (therein saving you several thousand angle brackets).
I for one believe that this is a net win. You can use your newfound spare time to write some specs and tests and come out with a much better product at the end of the day.
I would argue that although the Java method gives you more information, it doesn't give you enough information to comfortably program against.
For example, is that List of Strings just filenames or fully-qualified paths?
Given that, your argument that Ruby doesn't give you enough information also applies to Java.
You're still relying on reading documentation, looking at the source code, or calling the method and looking at its output (and decent testing of course).
While I love static typing when I'm writing Java code, there's no reason that you can't insist upon thoughtful preconditions in Ruby code (or any kind of code for that matter). When I really need to insist upon preconditions for method params (in Ruby), I'm happy to write a condition that could throw a runtime exception to warn of programmer errors. I even give myself a semblance of static typing by writing:
def get_files_in(directories)
unless File.directory? directories
raise ArgumentError, "directories should be a file directory, you bozo :)"
end
# rest of my block
end
It doesn't seem to me that the language prevents you from doing design-by-contract. Rather, it seems to me that this is up to the developers.
(BTW, "bozo" refers to yours truly :)
Method Validation via duck-typing:
i = {}
=> {}
i.methods.sort
=> ["==", "===", "=~", "[]", "[]=", "__id__", "__send__", "all?", "any?", "class", "clear", "clone", "collect", "default", "default=", "default_proc", "delete", "delete_if", "detect", "display", "dup", "each", "each_key", "each_pair", "each_value", "each_with_index", "empty?", "entries", "eql?", "equal?", "extend", "fetch", "find", "find_all", "freeze", "frozen?", "gem", "grep", "has_key?", "has_value?", "hash", "id", "include?", "index", "indexes", "indices", "inject", "inspect", "instance_eval", "instance_of?", "instance_variable_defined?", "instance_variable_get", "instance_variable_set", "instance_variables", "invert", "is_a?", "key?", "keys", "kind_of?", "length", "map", "max", "member?", "merge", "merge!", "method", "methods", "min", "nil?", "object_id", "partition", "private_methods", "protected_methods", "public_methods", "rehash", "reject", "reject!", "replace", "require", "respond_to?", "select", "send", "shift", "singleton_methods", "size", "sort", "sort_by", "store", "taint", "tainted?", "to_a", "to_hash", "to_s", "type", "untaint", "update", "value?", "values", "values_at", "zip"]
i.respond_to?('keys')
=> true
i.respond_to?('get_files_in')
=> false
Once you've got that reasoning down, method signatures are moot because you can test them in the function dynamically. ( this is partially due to not being able do do signature-match-based-function-dispatch, but this is more flexible because you can define unlimited combinations of signatures )
def get_files_in(directories)
fail "Not a List" unless directories.instance_of?('List')
end
def example2( *params )
lists = params.map{|x| (x.instance_of?(List))?x:nil }.compact
fail "No list" unless lists.length > 0
p lists[0]
end
x = List.new
get_files_in(x)
example2( 'this', 'should', 'still' , 1,2,3,4,5,'work' , x )
If you want a more assurable test, you can try RSpec for Behaviour driven developement.
Short answer: Automated unit tests and good naming practices.
The proper naming of methods is essential. By giving the name get_files_in(directory) to a method, you are also giving a hint to the users on what the method expects to get and what it will give back in return. For example, I would not expect a Potato object coming out of get_files_in() - it just doesn't make sense. It only makes sense to get a list of filenames or more appropriately, a list of File instances from that method. As for the concrete type of the list, depending on what you wanted to do, the actual type of List returned is not really important. What's important is that you can somehow enumerate the items on that list.
Finally, you make that explicit by writing unit tests against that method - showing examples on how it should work. So that if get_files_in suddenly returns a Potato, the test will raise an error and you'll know that the initial assumptions are now wrong.
Design by contract is a much subtler principle than just specifying the argument type an return type. Other answers here concentrate much on good naming, which is important. I could go on an on about the many ways in which the name get_files_in is ambiguous. But good naming is just an outward consequence of a deeper principle of having good contracts and designing by them. Names are always a bit ambiguous, and good pragmatic linguistics is a product of good thinking.
You can consider contracts the design principles, and they are frequently hard and boring to state in an abstract form. An untyped language requires that the programmer thinks about contracts for real, that she understands them a deeper level than just as type constraints. If there is a team, the team members must all mean and abide by the same contracts. They must be dedicated thinkers and must spend time together discussing concrete examples in order to establish shared understanding of contracts.
The same requirements apply to the API user: The user must first memorize the documentation, and then she is able to gradually understand the contracts, and start loving the API if the contracts are thoughtfully crafted (or hating it if otherwise).
This is connected to duck typing. A contract must give clue as to what happens regardless of the type of the method inputs. So the contract must be understood in a deeper, more generalized way. This answer itself might seem a bit inconcrete, or even haughty, for which I apologize. I am simply trying to say that the duck is not a lie, the duck means that one thinks about one's problem on a higher level of abstraction. The designers, the programmers, the mathematicians are all different names for the same capability, and mathematicians know that there are many levels of aptitude in mathematics, where mathematicians on a next higher level easily solve problems which those on lower levels find too hard to solve. The duck means that your programming has to be good mathematics, and it restricts the successful developers and users to only those, who are able to do so.
It's by no means a maintenance nightmare, just another way of working, that calls for consistence in the API and good documentation.
Your concern seems related to the fact that any dynamic language is a dangerous tool, that cannot enforce API input/output contracts. The fact is, while chosing static may seem safer, the better thing you can do in both worlds is to keep a good set of tests that verify not only the type of the data returned (which is the only thing the Java compiler can verify and enforce), but also it's correctness and inner workings(Black box/white box testing).
As a side note, I don't know about Ruby, but in PHP you can use #phpdoc tags to hint the IDE (Eclipse PDT) about the data types returned by a certain method.
I made a half-baked attempt at something like dbc for Ruby a few years ago, may give folks some ideas about how to move forward with a more comprehensive solution:
https://github.com/justinwiley/higher-expectations