Why isn't commons-lang in the Java standard API?

It seems like every Java project I join or start always has commons-lang as a dependency, and for good reason. commons-lang has tons of classes and utility methods that are pretty standard fare in the standard APIs of most other languages. Why hasn't Sun/Oracle/JCP adopted some of the things in commons-lang into the standard API?

As pointed out already, some features in the commons API have made it into Java, often implemented (IMHO) better than they were originally in the commons library. Enums are the classic example.
In terms of why they don't adopt more of commons-lang, well, with some classes there's the element of confusion. Take StrBuilder for example: it's more powerful than the Java StringBuilder and it's extensible. But I'm not sure I'd be for adding such a class to the Java core API. StringBuilder/StringBuffer are perfectly good enough for most purposes, and having another one in there would really just become a bit confusing. They couldn't really alter StringBuilder in a way that would accommodate all of the changes either, because that could break existing code. Even if they did add one, what about when someone else came along with another, more powerful version? StrBuilder2? Before long everything's a big mess (some argue that the core API already is, let alone with such additions).
And as always with these things, the big point is what should be included from commons-lang. Some people would probably want to see the MutableXXX classes added, others the XXXUtils classes, others the time package... there isn't really a common consensus.
The other big thing is that the Java developers have to be a lot more careful what goes in the core Java API than the Apache developers do for commons-lang. If a crappy design in commons-lang is superseded in a future release, the old one can be deprecated and subsequently removed (indeed this seems to be what happens.) In the core Java API it needs to stay for backwards compatibility reasons, just causing more clutter.
For what it's worth though I do think more of the functionality in commons-lang should probably be included. I can just see the reasons, at least in part, why it's not.

Historically Apache Commons implemented some of the features that later were introduced in Java 5, such as enums and annotations. Their implementation was sufficiently different to make integration difficult.


How to make inline taglets (which require com.sun) more cross-platform? Is there a non-Oracle/more-cross-platform javadoc parser?

I'm writing a library that inserts already unit-tested example code (its source-code, output, and any input files) into JavaDoc, with lots of customization possibilities. The main way of using this library is with inline taglets, such as
{@.codelet.and.out my.package.AGreatExample}
{@.codelet my.package.AGreatExample}
{@.file.textlet examples\doc-files\an_input_file.txt}
{@.codelet.and.out my.package.AGreatExample%eliminateCommentBlocksAndPackageDecl()}
Since custom taglets (and even doclets) require com.sun, this means they're not nearly as cross platform as Java itself. (Not sure if this is relevant, but the word "javadoc"--and even the substring "doc"--is not in the Java 8 Language Specifications.)
I don't like the idea of writing a library that's limited in this way. So what do I do? My thoughts so far are that
In order to take advantage of the existing javadoc parser, I stick with the com.sun taglets. However, I make this reliance on com.sun as "thin" as can be. That is, I put as little code in the taglet class as possible, leaving the bulk of the code elsewhere, where there is no reliance on com.sun.
I work towards creating my own parser, which only searches for my specific taglets. This is a pain, but not too horrible. You iterate through the lines of each Java source file, searching for \{@\.myTagletName (.*?)\}. Once you capture that text, it's pretty much the same as the code within the com.sun taglet.
This parser would have to be run before executing javadoc, and would therefore require a duplicate directory structure. (1) your original code, with the unparsed custom tags, (2) the duplicate of that code, with parsed-output. I'd copy all code to the duplicate directory, and then parse only those Java files known to have these taglets (classes that are "registered" in some way with the parser).
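A minimal sketch of the stand-alone scan described above, assuming the tags are written as {@.tagName argument} and using only java.util.regex. The class and method names (TagletScanner, findTaglets) are hypothetical and not part of any existing library; a real pre-processor would also have to write the expanded text to the duplicate directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds inline tags of the form {@.tagName argument} in source text.
class TagletScanner {

    // Returns the argument text of every occurrence of the given tag, in order of appearance.
    static List<String> findTaglets(String source, String tagName) {
        // Pattern.quote keeps the dots in names such as "codelet.and.out" literal.
        Pattern pattern = Pattern.compile("\\{@\\." + Pattern.quote(tagName) + " (.*?)\\}");
        Matcher matcher = pattern.matcher(source);
        List<String> arguments = new ArrayList<String>();
        while (matcher.find()) {
            arguments.add(matcher.group(1));
        }
        return arguments;
    }

    public static void main(String[] args) {
        String line = " * See {@.codelet my.package.AGreatExample} and {@.codelet.and.out my.package.Other}";
        System.out.println(findTaglets(line, "codelet"));
        System.out.println(findTaglets(line, "codelet.and.out"));
    }
}
```

Because the tag name must be followed by a space, searching for "codelet" does not also pick up "codelet.and.out" tags.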
Is this a reasonable approach? Is there a more cross-platform javadoc/taglet parser out there already, so I don't have to roll my own? Is there anything cross-platform that is taglet-like already out there? Is JavaDoc itself not cross platform, or just custom taglets and doclets?
I'd like a rough perspective on how many people I'm locking out of my library because of this decision (to use inline taglets), but mostly I'm looking for a long term solution.
(Despite my Java 8 link above, I'm using Java 7.)
Credit to @fge for the taglet suggestion, which is more elegant than my original idea, and to @Michael for the ominous-but-helpful com.sun warnings.
First, note that there is a difference between sun.* and com.sun.* dependencies. The sun.* namespace contains classes that implement Oracle's Java Virtual Machine. You should not use such dependencies because the Oracle JVM's internal API can change in future releases and because this namespace may not be provided by other, non-Oracle JVM implementations. (In practice, even Android's JVM ships with one of the more widely used sun.* classes.)
Then there is the com.sun.* namespace, which was used by Sun Microsystems for implementing its Java applications. An example of legitimate use of com.sun.* dependencies is Sun's Jersey framework, which was originally deployed in the com.sun.jersey.* namespace. (For the sake of completeness, note that Jersey has been deployed in the org.glassfish.jersey.* namespace since version 2.0, which is incompatible with the Jersey 1 API.) For further reference, note how Oracle does not even mention the com.sun.* namespace when discussing the problems that are imposed by using the sun.* namespace. Also, see this related question on Stack Overflow.
Therefore, using com.sun.* dependencies is a different deal compared to sun.* dependencies. By using com.sun.* classes, you lock yourself into a specific library's API rather than into a specific JVM. For example, you can avoid direct use of the com.sun.jersey.* namespace by using the standardized JAX-RS javax.ws.rs.* namespace. In this sense, com.sun.* dependencies are product-specific and proprietary and must not be confused with Java's standardized APIs, which are usually found in the javax.* namespace.
If I were you, I would stick with the taglets, which are a mature and recognized implementation. Oracle is pretty determined not to break APIs (otherwise, they would probably also move the taglets to com.oracle.*) and I see no reason why they would suddenly change the taglet package structure. And if they did, you would merely need to update your code. If your application breaks for a new Java release, your users will come looking for an update of your software. Because you do not run the taglet project, I agree with you that detaching your logic from a foreign API is in general a good idea, as it is for any dependency. Also, using taglets for your use case pretty much follows the KISS and DRY principles.

Java SE and Scala Standard Library - cases when one of them is preferable

We all know that one can use Java libraries from Scala and vice versa. But even skimming the surface of Java SE and the Scala standard library, we can notice that many parts of them solve identical or at least similar problems. The trivial examples are collections, concurrency and IO. I am not an expert in either of the two, but I suspect that in general Java SE is broader in size while Scala SL contains more conceptually advanced features (such as actors). The question is: if we have access to both libraries and have an opportunity to use both languages, are there some recommendations for when we should choose Java SE features over Scala SL?
In general, when writing in Scala, I would advise always using the Scala libraries over the Java ones. My advice on specific areas would be:
Collections - Scala's are much better, and I would always prefer them over the Java equivalents. Scala does however lack a mutable TreeMap, so if you need that sort of structure you'll have to go back to Java.
Concurrency - Scala's concurrency features wrap Java's and are more advanced. I'd always pick them.
IO - I think this is one area where Scala is narrower in what it supports. I would generally use a Scala Source when possible, but there may be more unusual situations where you'll have to drop back to Java IO (or possibly use a third party library).
Swing - Last time I looked, Scala's swing wrapping wasn't complete, so if you're doing a lot of Swing related stuff, you might take the decision to use Java's swing components everywhere for consistency.
Scala Libraries fit into two general categories:
Original Scala Libraries. These are entirely (or almost entirely) written in Scala. Usually, people will write libraries from scratch for good reasons. Maybe Java lacks a similar library, or maybe whoever wrote the Scala one thinks the Java equivalent has serious limitations.
Collections is one such example.
Scala Wrappers over Java Libraries. In these cases, Scala uses an adapter pattern (or one of the other similar patterns) to provide a Scala-friendly API. These APIs are more fluent, integrate well with important Scala classes (such as collections and Option), and often make use of powerful Scala features such as traits to decrease boilerplate.
These libraries rarely offer more functionality than what Java provides, but they reduce boilerplate enormously and make code using them more idiomatic. Often, however, they present just a subset of the total functionality provided by Java. Depending on the library, it may or may not be possible or easy to extend it by accessing the underlying Java classes.
Scala Swing is a great example of these.
In the particular case of scala.io, that is not so much a library as a crude wrapper just to handle simple common scripting tasks with an idiomatic Scala API. It's adequate for that (and certainly much kinder on my eyes than java.io), but not for any serious I/O. There's a real I/O library for Scala currently undergoing evaluation for adoption.
Another example I like a lot is scala.sys.process. It wraps Java's Process and ProcessBuilder, providing almost all of the functionality and adding some. Furthermore, you can use most of the Java internals if needed (the sole exception is Process itself, which isn't really very useful).
My advice is to use Scala libraries where they exist and fit your needs, extend them if they are mostly adequate, but reach for Java libraries without hesitation otherwise. After all, having a high degree of interoperability with Java is a key feature of Scala.

Clojure libraries lacking (so use Java)...?

From Clojure it is easy enough to use Java libraries...but what libraries does Clojure not have that are best done with Java?
It isn't easy to give a straightforward answer to this question, because it would first be necessary to define the difference between a Clojure library and a Java library. (Even more so, because Clojure is a Java library :))
Ok, let's start with the premise that a Clojure library is any library written in Clojure and simply ignore the Java code in the Clojure implementation itself. But what if a given library uses some Java dependency, like, say, one of the Apache Commons libraries? Would it still qualify as a Clojure library and not a Java library?
My own criterion (and I am guessing yours, too) for the difference between the two is whether or not the library exposes a Clojure-style interface with namespaces, functions, sequences or a Java-style interface with classes, methods and collections.
It is almost trivial to write Clojure wrappers around such Java libraries. In my experience that is very useful if you want to fit the library's functionality into the overall functional design of your application. A simple example would be if you want to map a Java method over a sequence. You can either use an ad-hoc anonymous function to wrap the method call, or a named function from your wrapper layer. If you do such things very often, the second approach may be better suited, at least for the most commonly used methods.
So, my conclusion is that any Java library should be easy to convert to a Clojure library. All that is needed is to write a wrapper for it.
Another conclusion is that it may not be needed at all. If all you want is to call the method, you may still just call the method and avoid all the architecture astronautics. :)
One potential answer may be a bytecode library like ASM http://asm.ow2.org/
But honestly, with time, any library in Java can be written in Clojure. Even Java code that compiles to different bytecode can be replicated if Clojure uses ASM underneath.
I strongly prefer Clojure as a language for development in general, but there are several good reasons I have found for using Java libraries or writing Java code in preference to Clojure:
Leveraging mature Java libraries - some Java libraries are truly excellent and very mature. From a pragmatic perspective, you are much better off directly using Java libraries like Netty, Swing or Joda Time rather than trying to utilise or invent some Clojure alternative. Sometimes there are Clojure wrappers for these libraries but these are mostly still in a somewhat experimental / immature state.
High performance code - I do quite a lot of data and image processing where maximum performance is essential. This rules out pretty much any approach that adds overhead (such as lazy sequences, temporary object creation), so idiomatic Clojure won't fit the bill. You could probably get there with very unidiomatic Clojure (lots of tight imperative loops and primitive array manipulation, for example...) but if you're going to write this kind of code it's often actually simpler and cleaner in Java.
APIs with mutable semantics - if the APIs you are relying upon depend upon mutable objects, Clojure code to interface with these APIs can become a bit ugly and unidiomatic. Sometimes writing Java in these cases is simpler.
The good news is that because the interoperability between Clojure and Java is so good, there isn't really any issue with mixing Clojure and Java code in a project. As a result, most of my projects are a mix of Clojure and Java code - I use whichever one is most appropriate for the task at hand.
Libraries for building GUIs come to mind.
Lots of APIs. In fact, Clojure itself is built on top of many sturdy Java APIs like the java.util.Collection API. And well-known Clojure APIs like Incanter are built on top of libraries like Parallel Colt and JFreeChart.
I can't find the quote at the moment, but Rich said something to the effect of "Clojure should use Java where possible" and not wrap Java unnecessarily. The principle is to embrace the Java platform instead of fighting it. So the general advice becomes:
If a good Java library exists, use it; if not, write one in Clojure.

Strategies for migrating medium-sized code base from Java 1.4.2 to Java 5

I'm in the process of reviewing a code base (~20K LOC) and trying to determine how to migrate it from 1.4.2 to 5. Obviously, it's not an overnight project, and the suggestion which I have received is to write new code against Java 5 and migrate the old code in a piecemeal fashion. Also, I'm no expert in the new features in Java 5 (i.e. I know of them, but have never written any for production use).
My questions:
What features of Java 5 are typically used in production code? (i.e. generics, auto-boxing, etc.) Are there features to be avoided / not considered to be best-practices?
What are the best refactoring strategies which I can use to migrate a code base of this size? (i.e. make changes to classes one at a time only when a class is edited, etc.) Objective - reduce risk on the code base. Limitation - resources to do refactoring.
Any advice is appreciated - thanks in advance.
UPDATE - a year too late, but better late than never? =)
Thank you for all of the comments - lots of great points of view. In the life of a software developer, there's always going to be the projects you strive to finish but never get around to because of something more "urgent".
With respect to the use of Java 5 (at that time), it was something which was required in the client's production environment, so that was why we did not use Java 6.
I found that the stronger typing for collections, enums and unboxing of primitives were the features I tended to apply the most, both to old and new code. The refactoring was fairly straightforward, but code comprehension improved significantly and standards became easier to enforce. The feature I had the most trouble with was generics; I think it's a concept which I still haven't had a chance to fully grasp and appreciate yet, so it was difficult for me to find previous cases where the application of generics was appropriate.
Thanks again to everyone who contributed to this thread and apologies for the late follow up.
Java 5 is almost completely backwards compatible with Java 1.4. Typically, the only change you must make when you migrate is to rename any identifiers called enum in the 1.4 code, because enum is a keyword in Java 5.
The full list of potential compatibility problems is listed here:
http://java.sun.com/j2se/1.5.0/compatibility.html
The only other one that I've run into in practice is related to the change in the JAXP implementation. In our case, it simply meant removing xerces.jar from the classpath.
As far as refactoring goes, I think that migrating your collection classes to use the new strongly-typed generic versions and removing unnecessary casting is a good idea. But as another poster pointed out, changing to generic collections tends to work best if you work in vertical slices. Otherwise, you end up having to add casting to the code to make the generic types compatible with the non-generic types.
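To make the "removing unnecessary casting" point concrete, here is a small sketch of the same method before and after the collection is given a type parameter. The class and method names are made up for illustration and do not come from the code base in the question.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsMigration {

    // Java 1.4 style: a raw List, so every read needs a cast that can only fail at run time.
    @SuppressWarnings("rawtypes")
    static int totalLengthRaw(List names) {
        int total = 0;
        for (int i = 0; i < names.size(); i++) {
            String name = (String) names.get(i);
            total += name.length();
        }
        return total;
    }

    // Java 5 style: the element type is checked by the compiler and the cast disappears.
    static int totalLengthGeneric(List<String> names) {
        int total = 0;
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("alpha");
        names.add("beta");
        // A generic list can still be passed to the raw method, which is what
        // allows old and new code to coexist while a vertical slice is migrated.
        System.out.println(totalLengthRaw(names));
        System.out.println(totalLengthGeneric(names));
    }
}
```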
Another feature I like to use when I'm migrating code is the @Override annotation. It helps to catch inheritance problems when you're refactoring code.
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Override.html
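As a rough illustration of what the @Override annotation buys you during refactoring (the class names here are invented for the example):

```java
class OverrideDemo {
    public static void main(String[] args) {
        BaseReport report = new SummaryReport();
        System.out.println(report.title());
    }
}

class BaseReport {
    String title() {
        return "report";
    }
}

class SummaryReport extends BaseReport {
    // @Override asks the compiler to verify that a superclass method is really overridden.
    // If BaseReport.title() is later renamed or its signature changes, this method becomes
    // a compile error instead of silently turning into an unrelated method.
    @Override
    String title() {
        return "summary";
    }
}
```

Note that in Java 5 the annotation can only be placed on methods that override a superclass method; applying it to interface method implementations is accepted from Java 6 onwards.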
The new concurrency library is very useful if your code uses threading. For example, you may be able to replace home-grown thread pools with a ThreadPoolExecutor.
http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#concurrency
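A minimal sketch of what replacing a hand-written pool might look like, assuming the work can be expressed as independent tasks. The pool size of 4, the unbounded queue and the squareAll helper are arbitrary choices for the example, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolDemo {

    // Squares each input on a fixed-size pool instead of hand-managed worker threads.
    static List<Integer> squareAll(List<Integer> inputs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final Integer input : inputs) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return input * input;
                    }
                }));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // lets the worker threads exit once queued tasks are done
        }
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<Integer>();
        inputs.add(1);
        inputs.add(2);
        inputs.add(3);
        System.out.println(squareAll(inputs));
    }
}
```

Results are collected in submission order, so the output list lines up with the input list regardless of which worker finishes first.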
I would definitely take the approach of updating the code as you change it during normal maintenance. Other than the compatibility issues, I don't think there is a compelling reason to use the new Java 5 features unless you're already changing the code for other reasons.
There is one very real issue with the "viral" nature of generics: once you start introducing them at a given layer in an architecture, you generally want to introduce them at the layers above and below as well. I have found that introducing generics is probably best done in full "verticals". But you do not have to do all the verticals at once.
This is a really hard question to answer because it depends on what code will be affected and how critical that code is.
First and foremost, when migration is a nontrivial undertaking, do yourself a favour and upgrade to the latest version of Java, which would be Java 6 not Java 5. Java 6 has been out for a year and a half or more and is mature. There's no reason to not pick it over Java 5 (imho).
Secondly, like any software project, your goal should be to get something into production as soon as you possibly can. So you need to identify a slice of your system. The smaller the better; the more non-critical, the better.
The other thing to do is just try starting up your app under Java 6 and seeing what breaks. It might be worse than you expected. It might be much better.
The other thing you'll probably need to be aware of is that by the sounds of it you will have jars/libraries in your app that have since been deprecated. Some may not even be compatible with Java beyond 1.4.2. You will probably want to upgrade all of these to the latest version as well.
This will probably mean more stuff breaking but using old/deprecated APIs is just kicking the can down the street and causes you other problems.
There are exceptions to this where upgrading can have far-reaching consequences. Axis1 to Axis2 comes to mind. Those situations require more careful thought.
As for what features are used... all of them pretty much. I can't think of any that should be avoided off the top of my head.
Also, I just noticed the size of your project: ~20K LOC. That's actually quite small (eg I've written an app about that size in the last 3 months by myself).
Lastly, this also depends on how easily you will find things that break. If you have good unit test coverage then great. That's pretty rare though. If you can just run through the app and reliably find problems it's not too bad.
The problematic situations are where scenarios are hard to test and it's likely you won't uncover problems straight away. That calls for more caution.
You would want to migrate stuff that doesn't work in the transition from 1.4 to 5 (not sure what that would be), but I'd be wary of migrating stuff for the sake of it.
If you do take this route, some questions:
Do you have comprehensive test coverage? If not, you should write unit tests for the code you're going to be migrating.
Do you have components that are widely used within your codebase? If so, they are probably candidates to be migrated in terms of their API (e.g. using generics etc.)
In terms of what's widely used from Java 5: generics are important and make your life a lot easier. I don't see autoboxing too much, nor enums (this is all relative). Varargs almost never. Annotations are useful for frameworks, but I only consume these; I don't think I've ever implemented one myself.
20 (non-comment) kloc should be small enough to insert generics with a big bang. Obviously make sure your code compiles and runs on Java SE 5 first. The relatively easy thing about generics is that adding them makes very little change to semantics (certain overload resolutions can change because of implicit casts - Iterator<char[]> iter; ... System.out.println(iter.next()); as a bad example off the top of my head).
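The overloading caveat can be reproduced with a small sketch. It uses two made-up show overloads in place of System.out.println(Object) and System.out.println(char[]) so that the selected overload is visible in the result; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class OverloadShift {

    static String show(Object value) {
        return "Object overload";
    }

    static String show(char[] value) {
        return "char[] overload";
    }

    // Pre-generics style: next() has static type Object, so show(Object) is chosen.
    @SuppressWarnings("rawtypes")
    static String viaRawIterator(List<char[]> words) {
        Iterator iter = words.iterator();
        return show(iter.next());
    }

    // After adding the type parameter, next() has static type char[], so the
    // compiler now picks show(char[]) without any other change to the call site.
    static String viaGenericIterator(List<char[]> words) {
        Iterator<char[]> iter = words.iterator();
        return show(iter.next());
    }

    public static void main(String[] args) {
        List<char[]> words = new ArrayList<char[]>();
        words.add("hello".toCharArray());
        System.out.println(viaRawIterator(words));
        System.out.println(viaGenericIterator(words));
    }
}
```

With the real println the effect is the same: the raw version prints the array's identity string, while the generic version prints the characters.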
In some cases, adding generics will highlight conceptual problems with the code. Using one Map as two maps with disjoint key sets, for example. TreeMap is an example in the Java library where a single class has two distinct modes (using Comparator<T> or Comparable<T>).
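The two TreeMap modes can be seen side by side in a short sketch; the keys and the choice of String.CASE_INSENSITIVE_ORDER are just for illustration.

```java
import java.util.TreeMap;

class TreeMapModes {

    // Mode 1: no comparator, so keys are ordered by their own Comparable implementation.
    static String firstKeyNaturalOrder() {
        TreeMap<String, Integer> map = new TreeMap<String, Integer>();
        map.put("B", 1);
        map.put("a", 2);
        return map.firstKey();
    }

    // Mode 2: an explicit Comparator decides the ordering instead.
    static String firstKeyWithComparator() {
        TreeMap<String, Integer> map =
                new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
        map.put("B", 1);
        map.put("a", 2);
        return map.firstKey();
    }

    public static void main(String[] args) {
        System.out.println(firstKeyNaturalOrder());   // upper-case letters sort first
        System.out.println(firstKeyWithComparator()); // case is ignored
    }
}
```

The type parameter K cannot express "Comparable unless a Comparator is supplied", which is why the class has to accept any key type and check at run time.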
Things like enhanced-for and auto-boxing are very local and can be added piecemeal. enums are rarer and might take some thinking about how you are actually going to use them.
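As a sketch of how local these changes are, the following hypothetical word counter uses the enhanced for loop and autoboxing inside a single method body, without affecting any caller:

```java
import java.util.HashMap;
import java.util.Map;

class LocalFeatures {

    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : words) {                          // enhanced for over an array
            Integer previous = counts.get(word);
            int count = (previous == null) ? 0 : previous;   // auto-unboxing Integer -> int
            counts.put(word, count + 1);                     // autoboxing int -> Integer
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[] {"a", "b", "a"}));
    }
}
```

The null check before unboxing matters: unboxing a null Integer throws a NullPointerException, which is one of the few traps when retrofitting autoboxing into old code.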
I think you're going about this the wrong way. Your plan shouldn't be to update all current code to Java 1.5, your plan should be to ensure that all current code runs exactly the same in 1.5 as it did in 1.4.2, and that all future code written will work fine in 1.5.
I've gone through a few transitions like this with code bases of various sizes. The goal was always to make sure we had a ton of unit tests so that we could easily plug in 1.5 and run our tests through it. We actually encountered about 10 problems, mostly related to regular expression libraries not supporting something or supporting something differently.
Write all new code in 1.5 then, and if you change an older class for whatever reason, spend a minute and implement generics, but there's no reason to refactor everything. That sounds a bit dangerous to me if you don't have the tests in place.

Genericized commons collection

I'm astonished that the Apache Commons Collections project still hasn't got around to making their library generics-aware. I really like the features provided by this library, but the lack of support for generics is a big turn-off. There is a Lavalabs fork of Commons Collections which does support generics, which seems to claim backward compatibility, but when I tried updating to this version, my web application failed to start (in JBoss).
My questions are:
Whether anyone has successfully updated from Commons Collections to the fork mentioned above
If Commons Collections has any plans to add support for generics
BTW, I'm aware of Google collections, but am reluctant to use it until the API stabilises.
Cheers,
Don
Consider Google Collections. From their Javalobby interview:
[Google Collections is] built with Java 5 features: generics, enums, covariant return types, etc. When writing Java 5 code, you want a collections library that takes full advantage of the language. In addition, we put enormous effort into making the library complete, robust, and consistent with the JDK collection classes.
There are contributions. Check out the JIRA issues.
There is also a JDK5 branch.
We would like to add generics and update Commons Collections to 1.5 (and 1.6). The biggest problem is how to address backwards compatibility, and people have very different opinions there. For some of the Commons components, the newer JDKs almost ask for a rewrite, IMHO.
During ApacheCon I felt the urge across several people to get this moving though. It's just a big task.
Feel free to show up on dev@commons.apache.org
cheers,
Torsten
Given that the last word in Jakarta's own internal debate was in Dec 07, I would say that Apache will not embrace generics, leaving the field open for something Java5 friendly like Google Collections.
I say, bite the bullet and switch to google-collections, at least for new code.
I know you're concerned about stability, but the google-collections library is VERY close to stable for 1.0 release -- hang out on the dev list or watch their reported issues, they are already very very cautious about changes, especially breaking ones. Any incompatibilities between the current release and the (seemingly imminent) 1.0 final are going to be extremely tiny.
Also, if you're worried about stability, pick a version (e.g. the current one, 1.0 RC4), and... just don't upgrade. Sure, you won't get any new features, but commons-collections hasn't been updated in a meaningful way in several years, so are you really any worse off? At least you're frozen on something with generics and (IMHO) a much better API.
The general backward-compatibility problem is that the package org.apache.commons.collections has been renamed to org.apache.commons.collections15. I don't know the reason for this change. Try renaming it back, recompiling the library and running your application again.
I have found this issue using Clirr tool on commons-collections-3.2.1.jar (from Apache) and collections-generic-4.01.jar (from Lavalabs).
I can't imagine what reason you could have not to use Google Collections. It's quite simple to use that library.
For my work I use both Apache Collections and Google Collections.
Can you explain more about why you can't use Google Collections?
regards
There's a genericised port of Commons Collections 3.1 available here, which we've been using for a few years now. Does the job nicely, and since it's based strictly on the existing Commons source, it has a stable API.
It could use updating to conform to Commons Collections 3.2, though.
Have a read of this blog post on collections; it provides a complete understanding of the collections framework.
http://tech.konnectingtheworld.com/2010/09/a-note-on-java-collections/
If you feel that your query has not been answered, get in touch with me. I shall try to provide as much information as I can.
