Why is javaslang CharSeq final?

We're all familiar with the argument about why String is final in Java.
However, I was wondering why javaslang's CharSeq is final too.
Given javaslang's FP inspirations and the fact that Haskell allows type synonyms, I would have thought this would be a good opportunity to make CharSeq non-final, perhaps with its methods final instead.
A non-final CharSeq would then allow me to extend it with an empty body to create a good approximation of type synonyms. This would avoid the boilerplate of the tiny type pattern for cases where the additional type safety is desired.
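To illustrate, here is roughly what I mean (CustomerName is a hypothetical tiny type of my own):
import javaslang.collection.CharSeq;

// What I wish were possible (does NOT compile today, because CharSeq is final):
//
//     class CustomerName extends CharSeq { }  // type synonym via an empty body
//
// Instead, the tiny type pattern requires wrapper boilerplate like this:
final class CustomerName {
    private final CharSeq value;

    CustomerName(CharSeq value) { this.value = value; }

    CharSeq value() { return value; }
}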
I'm sure there is a good reason why this is not the design, which is why I am asking here.
UPDATED 16-Dec-2016: I've raised an enhancement request with the javaslang team on github as issue #1764.

All the same reasons for and against apply.
Java's designers believed that guaranteeing immutability was worth losing the ability to subclass.
Javaslang's designers agree, and took the same approach with their "String", CharSeq.
The only internally consistent way to disagree with this design would be to also disagree with String's design, which has stood the test of time.
Sometimes we wish we could subclass String, and we can't - but it doesn't sting us often. If you really want to, you could write your own String that proxies all methods to a java.lang.String delegate. You could subclass that as much as you like. It would be your own responsibility not to introduce mutability.
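A minimal sketch of such a delegating wrapper (MyString and Name are hypothetical names; only part of the CharSequence surface is shown):
// A subclassable "string" that forwards to an immutable java.lang.String.
// Subclasses must take care not to introduce mutability themselves.
public class MyString implements CharSequence {
    private final String delegate;

    public MyString(String delegate) { this.delegate = delegate; }

    @Override public int length() { return delegate.length(); }
    @Override public char charAt(int index) { return delegate.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return new MyString(delegate.substring(start, end));
    }
    @Override public String toString() { return delegate; }
}

// Subclassing is now possible, e.g. as a type synonym:
class Name extends MyString {
    Name(String s) { super(s); }
}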

Related

Wondering about Microstream class StorageConfiguration

There are two questions about the MicroStream database and its class StorageConfiguration:
1) What is the difference between the methods New() and Builder() and the DEFAULT construct?
2) Why are the methods written with an uppercase first letter? That does not seem to follow Java naming conventions.
Thanks for any answers!
I am the MicroStream lead developer and I can gladly answer those questions.
To 1)
"New" is a "static factory method" for the type itself.
"Builder" is a static factory method for a "builder" instance of the type.
Both terms can be perfectly googled for more information about them.
As a quick service, some starting points:
"static factory method":
https://www.baeldung.com/java-constructors-vs-static-factory-methods
"builder pattern":
https://en.wikipedia.org/wiki/Builder_pattern
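As a hedged illustration of the two (a hypothetical Storage type, not the actual MicroStream code; the nested "Default" class is explained below):
public interface Storage {
    void store(Object instance);

    // "New": a static factory method returning a ready-to-use instance
    static Storage New() {
        return new Default(System.getProperty("java.io.tmpdir"));
    }

    // "Builder": a static factory method returning a builder for the type
    static Builder Builder() {
        return new Builder();
    }

    final class Builder {
        private String directory = System.getProperty("java.io.tmpdir");

        public Builder directory(String directory) { this.directory = directory; return this; }

        public Storage create() { return new Default(directory); }
    }

    final class Default implements Storage {
        private final String directory;

        Default(String directory) { this.directory = directory; }

        @Override public void store(Object instance) { /* persist to directory (elided) */ }
    }
}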
--
To what is actually your second question, about the "DEFAULT" construct:
If I may, there is no "DEFAULT" construct, but "Default".
(Conventions are important ... mostly. See below.)
"Default" is simply the default implementation (= class) of the interface StorageConfiguration.
Building a software architecture directly on classes quickly turns out to be too rigid and thus bad design. Referencing and instantiating classes directly creates a lot of hardcoded dependencies on one single implementation that can't be changed or made more flexible later on. Inheritance is actually only very rarely flexible enough to solve arising architectural flexibility problems. Interfaces, on the other hand, only define a type; the actual class implementing it hardly matters and can even be easily interchanged. For example, by designing only via interfaces, every instance can easily be "wrapped" by any desired logic using the decorator pattern, e.g. adding a logging aspect to a type.
There is a good article with an anecdote about James Gosling (the inventor of Java) named "Why extends is evil" that describes this:
https://www.javaworld.com/article/2073649/why-extends-is-evil.html
So:
"Default" is just the default class implementing the interface it is nested in. It makes sense to name such a class "Default", doesn't it? There can be other classes next to it, like "Wrapper" or "LazyInitializing" or "Dummy" or "Randomizing" or whatever.
This design pattern is used in the entire code of MicroStream, giving it an incredibly flexible and powerful architecture. For example:
With a single line of code, every part of MicroStream (every single "gear" in the machine) can be replaced by a custom implementation. One that does things differently (maybe better?) or fixes a bug without even needing a new MicroStream version. Or one that adds logging or customized exception handling or that introduces object communication where there normally is none. Maybe directly with the application logic (but at your own risk!). Anything is possible, at least inside the boundaries of the interfaces.
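For example, continuing the hypothetical Storage sketch from above, a logging decorator needs nothing but the interface:
// Because callers only know the Storage interface, any implementation
// can be wrapped without them noticing.
final class LoggingStorage implements Storage {
    private final Storage delegate;

    LoggingStorage(Storage delegate) { this.delegate = delegate; }

    @Override public void store(Object instance) {
        System.out.println("storing " + instance);
        delegate.store(instance);
    }
}

// Usage: Storage storage = new LoggingStorage(Storage.New());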
Thinking in interfaces might be confusing in the beginning (which is why a lot of developers "burn mark" interfaces with a counterproductive "I" prefix; it hurts me every time I see that), but THEY are the actual design types in Java. Classes are only their implementation vehicles and next to irrelevant on the design level.
--
To 2)
I think a more fitting term for "static factory method" is "pseudo-constructor". It is a method that acts as a public API constructor for that type, but it isn't an actual constructor. Following the argumentation about the design advantages of such constructor-encapsulating static methods, the question arose for me of the best, consistent naming pattern. The JDK gives some horribly bad examples that should not be copied, like "of" or "get". Those names hardly carry the meaning of the method's purpose.
It should be as short but still as descriptive as possible. "create" or "build" would be okay, but are they really the best option? "new" would be best, but ironically, that is a keyword associated with the very constructors that are supposed to be hidden from the public API. "neW" or "nEw" would look extremely ugly and would be cumbersome to type. But what about "New"? Yes, it's not strictly Java naming convention. But there already is one kind of method that is an exception to the general naming rule. Which one? Constructors! It's not "new person(...)" but "new Person(...)": a method beginning with a capital letter, since the beginning of Java. So if the static method is to take the place of a constructor, wouldn't it be quite logical and a very good signal to apply that same exception ... or ... "extension" of the naming convention to it, too? So ... "New" it is. Perfectly short, perfectly clear. Also no longer than, and VERY similar to, the original constructors: "Person.New" instead of "new Person".
The "naming convention extension" that fits BOTH naming exceptions alike is: "every static method that starts with a capital letter is guaranteed to return a new instance of that type." Not a cached one. Always a new one. (This can sometimes be crucial to guarantee the correctness of algorithms.)
This also has some neat side effects. For example:
The pseudo-constructor method for creating a new instance of "StorageConfigurationBuilder" can be "StorageConfiguration.Builder()". It is self-explaining, simple, clear.
Or if there is a method "public static Vector Normalized(Vector v)", it implicitly tells you that the passed instance will not be changed, but a new instance will be returned for the normalized vector value. It's like suddenly having the option to give constructors proper names. Instead of a sea of different "Vector(...)" methods and having to rely on the JavaDoc to indirectly explain their meaning, the explanation is right there in the name: "New(...)", "Normalized(...)", "Copy(...)", etc.
AND it also plays along very nicely with the nested-Default-class pattern: no need to write "new StorageConfiguration.Default()" (which would be bad because it is too hardcoded, anyway); just "StorageConfiguration.New()" suffices. It will internally create and return a new "StorageConfiguration.Default" instance. And should that internal logic ever change, it won't even be noticeable to the API user.
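A hedged sketch of that convention on a hypothetical Vector (zero-length handling omitted):
public final class Vector {
    private final double x, y;

    private Vector(double x, double y) { this.x = x; this.y = y; }

    // Capitalized statics: each is guaranteed to return a NEW instance.
    public static Vector New(double x, double y) { return new Vector(x, y); }

    public static Vector Copy(Vector v) { return new Vector(v.x, v.y); }

    public static Vector Normalized(Vector v) {
        double length = Math.sqrt(v.x * v.x + v.y * v.y);
        return new Vector(v.x / length, v.y / length);
    }
}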
Why do I do that if no one else does?
If one thinks about it, that cannot be a valid argument. I stick VERY closely to standards and conventions as far as they make sense. They do about 99% of the time, but if they contain a problem (like forbidding a static method to be called "new") or lack a perfectly reasonable feature (like PersonBuilder b = Person.Builder(), or choosing properly speaking names for constructors), then, after careful thought, I br... extend them as needed. This is called innovation. If no one else had that insight so far, bad for them, not for me. The question is not why an inventor creates an improvement, but why no one else has done it so far. If there is an obvious possibility for improvement, it can't be a valid reason not to do it just because no one else did it. Such thinking causes stagnation and the death of progress. Like locking oneself into a 1970s data-storage technology for over 40 years instead of just doing the obviously easier, faster, more direct, better way.
I suggest viewing the capital-letter method naming extension as a testimony to innovation: if a new idea objectively brings considerably more advantages than disadvantages, it should - or almost MUST - be done.
I hereby invite everyone to adopt it.

ImmutableList vs List - what should I cast it as?

I know it's commonly accepted to cast all List implementations down to List, whether it is a variable, a method return, or a method parameter using an ArrayList, CopyOnWriteArrayList, etc.
List<Market> mkts = new ArrayList<>();
When I'm using a Guava ImmutableList, I have the sense it can arguably be an exception to this rule (especially if I'm building in-house, complicated business applications and not a public API). Because if I cast it down to List, the deprecated mutator methods will no longer be flagged as deprecated. Also, it is no longer identified as an immutable object, which is a very important part of its functionality and identity.
List<Market> mkts = ImmutableList.of(mkt1,mkt2,mkt3);
Therefore it makes sense to pass it around as an ImmutableList, right? I could even argue that it's a good policy for an internal API to only accept ImmutableList, so mutability and multithreading on the client side won't wreck anything inside the library.
ImmutableList<Market> mkts = ImmutableList.of(mkt1,mkt2,mkt3);
I know there is a risk of ImmutableList itself becoming deprecated, and the day Oracle decides to create its own ImmutableList will require a lot of refactoring. But is it arguable that the pros of maintaining an ImmutableList cast can outweigh the cons?
I agree with your rationale. If you are using the Guava collection library and your lists are immutable then passing them as ImmutableList is a good idea.
However:
I know there is a risk of ImmutableList itself becoming deprecated, and the day Oracle decides to create its own ImmutableList will require a lot of refactoring.
The first scenario seems unlikely, but it is a risk you take whenever you use any 3rd-party library. The flipside, though, is that you could choose not to upgrade your application's Guava version if they (Google) gratuitously deprecated a class or method that you relied on.
UPDATE
Louis Wasserman (who works for Google) said in a comment:
"Guava provides pretty strong compatibility guarantees for non-#Beta APIs."
So we can discount the possibility of gratuitous API changes.
The second scenario is even more unlikely (IMO). And you can be sure that if Oracle did add an immutable list class or interface, that would not require you to refactor. Oracle tries really hard to avoid breaking existing code when they enhance the standard APIs.
But having said that, it is really up to you to weigh up the pros and cons ... and how you would deal with the cons should they eventuate.
Unfortunately, there's no corresponding interface in Java (and most probably never will be). So my take is to pretend that ImmutableList is an interface. :D But seriously, it adds important information which shouldn't get lost.
The ancient rule it all comes from actually states something like "program against interfaces". IIRC, at the time the rule was created, there was no Java around, and "interface" meant the programming interface, i.e., the contract, not a Java interface.
A method like
void strange(ArrayList list) {...}
is obviously strange, as there's no reason not to use List. A signature containing ImmutableList has a good reason.
I know there is a risk of ImmutableList itself becoming deprecated, and the day Oracle decides to create its own ImmutableList will require a lot of refactoring.
You mean Java 18? Let's see, but Guava's ImmutableList is pretty good and there's not much point in designing such a class differently. So you can hope that most changes will be in your imports only. And by 2050 there'll be worse problems than this.
Keep using List rather than ImmutableList! There is no problem with that, and there are several reasons why your API should not start using ImmutableList explicitly:
ImmutableList is Guava only and unlikely to become standard Java at any point. Don't tie your code and coding habits to a third party library (even if it is a cool one like Guava).
Using immutable objects is good practice in Java and of particular importance when developing an API (see Effective Java Item 15 - minimize mutability). It is a general concept that can be taken for granted and does not need to be conveyed in the name of interfaces. Equally, you would not consider calling a User class that is designed for inheritance UserThatCanBeSubclassed.
In the name of stability your API should NEVER start modifying a List that was passed into it and ALWAYS make a defensive copy when passing a List to a client. Introducing ImmutableList here would lure you and the clients of your API into a false sense of security and entice them to violate that rule.
I understand your dilemma.
Personally, I would advise keeping List as the reference type (to be future-proof and benefit from polymorphism) and using an @Immutable annotation to convey the information that it is immutable.
Annotations are more visible than plain javadoc comments, and you can even use the one from JSR-305 (ex-JCIP).
Some static analysis tools can even detect it and verify that your object is not mutated.
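A minimal sketch of that advice (MarketSnapshot is hypothetical; javax.annotation.concurrent.Immutable is the variant shipped in the JSR-305 jar):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.annotation.concurrent.Immutable; // JCIP's version lives in net.jcip.annotations

@Immutable // documents the contract; some static analysis tools can verify it
public final class MarketSnapshot {
    private final List<Market> markets;

    public MarketSnapshot(List<Market> markets) {
        // defensive, unmodifiable copy so the annotation's promise holds
        this.markets = Collections.unmodifiableList(new ArrayList<>(markets));
    }

    public List<Market> markets() { return markets; } // declared as plain List
}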
I would rather stay with just List for method parameters. There is not much benefit in forcing the caller to pass an ImmutableList - it's your own method and you won't mutate the list anyway, and the method stays more reusable and generic.
As a return type, I would go with ImmutableList to let method users know that this list cannot be modified.
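A minimal sketch of that split, assuming the asker's Market type and Guava on the classpath:
import java.util.List;
import com.google.common.collect.ImmutableList;

public class MarketRepository {
    // Parameter stays List, so callers may pass any implementation;
    // the return type is ImmutableList, so callers know it cannot change.
    public ImmutableList<Market> snapshot(List<Market> markets) {
        return ImmutableList.copyOf(markets); // cheap no-op if already immutable
    }
}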

Having pairs of static and instanced methods that perform the same tasks?

While developing a two-dimensional vector class as part of a math library, I'm considering having static and instance method pairs for stylistic and usability reasons. That is, two equivalent functions but one is static & non-mutating, and the other is instanced & mutating. I know I'm not the first person to consider this problem (See here, for example) but I haven't found any information that directly addresses it.
Pros of having static and instance method pairs:
Some people prefer to use one or the other and in some cases being able to choose makes code easier to read.
It is implied that static methods are not mutating when both static and instanced methods are provided. This can make the calling code much clearer, e.g.:
someVector = Vector2d.add(vec1, vec2);
someVector = (new Vector2d(vec1)).add(vec2); // does the same thing although more convoluted.
// similarly adding directly to a vector is simpler with a mutator method.
someVector.add(vec2);
someVector = Vector2d.add(someVector, vec2);
This is especially important when long chains of function calls are used, which is common with vectors.
In-place operations can be faster computationally than creating a new instance for every operation. The user decides when performance is important. For users of a Vector class, performance may be important as vectors are frequently used in computationally expensive code.
Pros of having only static or instance methods, but not both:
No significant code redundancy. Easier to maintain.
Less bloat. The javadocs will be almost half the size.
Not necessary to inform users that static methods never mutate and non-getter instanced methods always mutate.
How frowned upon is having static/instance method pairs? Is it used in any major libraries?
Is the pattern "static methods don't mutate, instance methods do" widely known?
I think your concept of providing both static/immutable and instance/mutable methods is a good one. I think the distinction is easy to explain and will be easy for the API users to understand and remember.
I think your API implementation code will not have redundant business logic. You will find that you repeat a pattern where the static implementation creates a new instance and calls the instance method on that new instance.
Given that I am lazy, I would look at building a bit of infrastructure that would auto-generate the static methods, their javadoc and their unit tests at compile-time. This would be overkill if you have 10 methods, but becomes a big win if you have 1,000 methods.
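A hedged sketch of that delegation pattern on the asker's Vector2d (method names assumed):
public class Vector2d {
    private double x, y;

    public Vector2d(double x, double y) { this.x = x; this.y = y; }

    public Vector2d(Vector2d v) { this(v.x, v.y); }

    // Instance method: mutates in place, returns this for chaining.
    public Vector2d add(Vector2d other) {
        this.x += other.x;
        this.y += other.y;
        return this;
    }

    // Static method: never mutates its arguments; it simply delegates
    // to the instance method on a fresh copy.
    public static Vector2d add(Vector2d a, Vector2d b) {
        return new Vector2d(a).add(b);
    }
}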
On the first part, "static methods don't mutate", that's widely used in OOP. I haven't heard of it being expressed explicitly. But it is common sense: "If you change an object, why would the method be static if it could be an instance method?" So I completely agree with the "static methods don't mutate".
On the second part, "instance methods do [mutate]", that's actually not as widely used. It rather depends on whether you design for immutability or mutability. Examples from the Java API: java.lang.String is immutable, java.util.Date is mutable (most likely by accident / bad design), java.lang.StringBuilder is mutable intentionally (that's its purpose). Mutability can lead to defensive cloning in order to protect the code from mutation bugs. Whether this really is a problem depends on a few things:
Is it an API others will use? You never know how they will use your code... IMO it's more important to protect API code from mutation bugs than normal code.
How good is the unit test coverage? Would your unit tests find all the mutation bugs that might sneak in? If you follow TDD properly (Uncle Bob's 3 Laws of TDD), and it's non-API code, mutation bugs are very unlikely to sneak in without being instantly discovered.
If you have code that has to protect itself against mutation bugs using defensive cloning, how often is that code called? If defensive clones are created frequently, it might be better to use immutable objects than mutable objects. Basically it comes down to weighing the number of calls to read-only methods (which would defensively clone) on associated classes against the number of calls to mutator methods on the class itself.
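To make the defensive-cloning cost in the last point concrete, a small sketch with java.util.Date, the classic mutable offender:
import java.util.Date;

public final class Meeting {
    private final Date start; // Date is mutable, so it must be cloned defensively

    public Meeting(Date start) {
        this.start = new Date(start.getTime()); // defensive copy on the way in
    }

    public Date start() {
        return new Date(start.getTime()); // defensive copy on the way out
    }
}
If start() is called frequently, every call allocates a copy; with an immutable type such as java.time.Instant, both copies simply disappear.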
Personally, I prefer immutable objects. I'm a fan of final (if I could change Java, I would make final the default for all fields and variables, and introduce a keyword var to make them non-final), and I try to do functional programming in Java as much as possible, although it is not a functional programming language. From my experience I know that I spend significantly less time debugging my code than others (actually I run the Java debugger maybe twice a year or so). I do not have enough empirical data and proper analysis to establish any kind of causal relationship between experience, immutability, functional programming and correctness, so I will only say that I believe immutability and functional programming help with correctness, and you will have to come up with your own judgement on this.
Concluding on the second part, "instance methods do [mutate]" is the widely used assumption in case the object is mutable anyway, otherwise instance methods would clone.

Why ADTs are good and Inheritance is bad?

I am a long time OO programmer and a functional programming newbie. From my little exposure, algebraic data types look like a special case of inheritance where you only have a one-level hierarchy and the superclass cannot be extended outside the module.
So my (potentially dumb) question is: If ADTs are just that, a special case of inheritance (again, this assumption may be wrong; please correct me in that case), then why does inheritance get all the criticism and ADTs get all the praise?
Thank you.
I think that ADTs are complementary to inheritance. Both of them allow you to create extensible code, but the way the extensibility works is different:
ADTs make it easy to add new functionality for working with existing types
You can easily add a new function that works with an ADT, which has a fixed set of cases
On the other hand, adding a new case requires modifying all functions
Inheritance makes it easy to add new types when you have fixed functionality
You can easily create an inherited class and implement the fixed set of virtual functions
On the other hand, adding a new virtual function requires modifying all inherited classes
Both the object-oriented world and the functional world developed their own ways to allow the other type of extensibility. In Haskell, you can use typeclasses; in ML/OCaml, people would use a dictionary of functions or maybe (?) functors to get the inheritance-style extensibility. On the other hand, in OOP, people use the Visitor pattern, which is essentially a way to get something like ADTs.
The usual programming patterns are different in OOP and FP, so when you're programming in a functional language, you're writing the code in a way that requires the functional-style extensibility more often (and similarly in OOP). In practice, I think it is great to have a language that allows you to use both of the styles depending on the problem you're trying to solve.
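A hedged sketch of the two extensibility axes in modern Java (sealed types and switch patterns, Java 21):
// A closed "ADT": the compiler knows all cases.
sealed interface Shape permits Circle, Square {}
record Circle(double radius) implements Shape {}
record Square(double side) implements Shape {}

class Geometry {
    // ADT-style extensibility: adding a new function over the fixed cases is trivial.
    static double area(Shape s) {
        return switch (s) { // exhaustiveness is checked; no default branch needed
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Square q -> q.side() * q.side();
        };
    }
    // Adding a new case (say, Triangle) forces every such switch to change.
    // With ordinary inheritance it is the mirror image: new subclasses are cheap,
    // but a new virtual method touches every subclass.
}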
Tomas Petricek has got the fundamentals exactly right; you might also want to look at Phil Wadler's writing on the "expression problem".
There are two other reasons some of us prefer algebraic data types over inheritance:
Using algebraic data types, the compiler can (and does) tell you if you have forgotten a case or if a case is redundant. This ability is especially useful when there are many more operations on things than there are kinds of thing. (E.g., many more functions than algebraic datatypes, or many more methods than OO constructors.) In an object-oriented language, if you leave a method out of a subclass, the compiler can't tell whether that's a mistake or whether you intended to inherit the superclass method unchanged.
This one is more subjective: many people have noted that if inheritance is used properly and aggressively, the implementation of an algorithm can easily be smeared out over half a dozen classes, and even with a nice class browser it can be hard to follow the logic of the program (data flow and control flow). Without a nice class browser, you have no chance. If you want to see a good example, try implementing bignums in Smalltalk, with automatic failover to bignums on overflow. It's a great abstraction, but the language makes the implementation difficult to follow. Using functions on algebraic data types, the logic of your algorithm is usually all in one place, or if it is split up, it's split up into functions which have contracts that are easy to understand.
P.S. What are you reading? I don't know of any responsible person who says "ADTs good; OO bad."
In my experience, what people usually consider "bad" about inheritance as implemented by most OO languages is not the idea of inheritance itself but the idea of subclasses modifying the behavior of methods defined in the superclass (method overriding), specifically in the presence of mutable state. It's really the last part that's the kicker. Most OO languages treat objects as "encapsulating state," which amounts to allowing rampant mutation of state inside of objects. So problems arise when, for example, a superclass expects a certain method to modify a private variable, but a subclass overrides the method to do something completely different. This can introduce subtle bugs which the compiler is powerless to prevent.
Note that in Haskell's implementation of subclass polymorphism, mutable state is disallowed, so you don't have such issues.
Also, see this objection to the concept of subtyping.
I am a long time OO programmer and a functional programming newbie. From my little exposure algebraic data types only look like a special case of inheritance to me where you only have one level hierarchy and the super class cannot be extended outside the module.
You are describing closed sum types, the most common form of algebraic data types, as seen in F# and Haskell. Basically, everyone agrees that they are a useful feature to have in the type system, primarily because pattern matching makes it easy to dissect them by shape as well as by content and also because they permit exhaustiveness and redundancy checking.
However, there are other forms of algebraic datatypes. An important limitation of the conventional form is that they are closed, meaning that a previously-defined closed sum type cannot be extended with new type constructors (part of a more general problem known as "the expression problem"). OCaml's polymorphic variants allow both open and closed sum types and, in particular, the inference of sum types. In contrast, Haskell and F# cannot infer sum types. Polymorphic variants solve the expression problem and they are extremely useful. In fact, some languages are built entirely on extensible algebraic data types rather than closed sum types.
In the extreme, you also have languages like Mathematica where "everything is an expression". Thus the only type in the type system forms a trivial "singleton" algebra. This is "extensible" in the sense that it is infinite and, again, it culminates in a completely different style of programming.
So my (potentially dumb) question is: If ADTs are just that, a special case of inheritance (again this assumption may be wrong; please correct me in that case), then why does inheritance gets all the criticism and ADTs get all the praise?
I believe you are referring specifically to implementation inheritance (i.e. overriding functionality from a parent class) as opposed to interface inheritance (i.e. implementing a consistent interface). This is an important distinction. Implementation inheritance is often hated whereas interface inheritance is often loved (e.g. in F# which has a limited form of ADTs).
You really want both ADTs and interface inheritance. Languages like OCaml and F# offer both.

Can excessive use of final hurt more than do good?

Why are people so emphatic about making every variable within a class "final"? I don't believe that there is any true benefit to adding final to private local variables, or really to use final for anything other than constants and passing variables into anonymous inner classes.
I'm not looking to start any sort of flame war, I just honestly want to know why this is so important to some people. Am I missing something?
Intent. Other people modifying your code won't change values they aren't supposed to change.
Compiler optimizations can be made if the compiler knows a field's value will never change.
Also, if EVERY variable in a class is final (as you refer to in your post), then you have an immutable class (as long as you don't expose references to mutable properties) which is an excellent way to achieve thread-safety.
The downside is that
annoy it is hard
annoy to read
annoy code or anything
annoy else when it all
annoy starts in the
annoy same way
Other than the obvious usage for creating constants and preventing subclassing/overriding, it is a personal preference in most cases, since many believe the benefits of "showing programmer intent" are outweighed by the cost in code readability. Many prefer a little less verbosity.
As for optimisations, that is a poor reason for using it (meaningless in many cases). It is the worst form of micro optimisation and in the days of JIT serves no purpose.
I would suggest using it if you prefer, and not using it if that is what you prefer. Since it will all come down to religious arguments in many cases, don't worry about it.
It marks that I'm not expecting that value to change, which is free documentation. The practice is because it clearly communicates the intent of that variable and forces the compiler to verify that. Beyond that, it allows the compiler to make optimizations.
It's important because immutability is important, particularly when dealing with a shared memory model. If something is immutable then it's thread-safe, and that makes it a good enough argument to follow as a best practice.
http://www.artima.com/intv/blochP.html
One benefit for concurrent programming which hasn't been mentioned yet:
Final fields are guaranteed to be initialized when the execution of the constructor is completed.
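A small sketch of why that guarantee matters for safe publication (assuming the usual Java memory model reading):
public final class Point {
    private final int x; // final fields: once the constructor finishes, their
    private final int y; // values are visible to any thread that sees the Point

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }
}
// If x and y were not final, a thread receiving the reference without
// synchronization could, in principle, observe a half-constructed Point.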
A project I'm currently working on is set up in such a way that whenever one presses "save" in Eclipse, the final modifier is added to every variable or field that is not changed in the code. And it hasn't hurt anybody yet.
There are many good reasons to use final, as noted elsewhere. One place where it is not worth it, IMO, is on parameters to a method. Strictly speaking, the keyword adds value here, but the value is not high enough to withstand the ugly syntax. I'd prefer to express that kind of information through unit tests.
I think use of final on values that are internal to a class is overkill unless the class is likely to be inherited. The only advantage is around the compiler optimizations, which surely may be of benefit.
