Understanding enumset for enumerators


As far as I understand, it would be much easier and clearer to use EnumSet, since that Set was designed specifically for use with enum constants. My question is whether we should consider using EnumSet every time we need to maintain a collection of enum constants. For instance, I have the following enum:
public enum ReportColumns{
ID,
NAME,
CURRENCY;
//It may contain many more constants than shown here
public int getMaintenanceValue(){
//impl
}
}
And I need to use a collection of these enum constants in a method:
public void persist(Collection<ReportColumns> cols){
List<Integer> ints = new LinkedList<>();
for(ReportColumns c: cols){
ints.add(c.getMaintenanceValue());
}
//do some persistence-related operations
}
So, if I don't care whether the collection is ordered, should I use EnumSet<E> every time to improve performance?

As a rule of thumb, whenever you create a collection of enums and don't care about their order, you should create an EnumSet. As you mentioned, it would give you a slight increase in performance (in fact, most static code analysis tools I know actually warn about not using it).
For a method declaration though, I wouldn't. As another rule of thumb, methods should use the "highest" type possible that still makes sense. The public contract of this method should be "give me a bunch of enums, and I'll persist them". The method shouldn't care what collection is passed, so it makes no sense forcing the parameter type to be an EnumSet. If the concrete type you pass is indeed an EnumSet you'll get all the performance benefits anyway.
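To make the two rules of thumb concrete, here is a minimal sketch. The enum mirrors the one in the question, but the maintenance values and the PersistDemo class are made up for illustration. The method accepts any Collection, while the caller chooses EnumSet as the concrete type.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;

// Stand-in for the question's enum; the maintenance values are invented for this sketch.
enum ReportColumns {
    ID(10), NAME(20), CURRENCY(30);

    private final int maintenanceValue;

    ReportColumns(int maintenanceValue) {
        this.maintenanceValue = maintenanceValue;
    }

    int getMaintenanceValue() {
        return maintenanceValue;
    }
}

final class PersistDemo {
    // The parameter stays a Collection: a List, a HashSet, or an EnumSet is accepted.
    static List<Integer> maintenanceValues(Collection<ReportColumns> cols) {
        List<Integer> ints = new ArrayList<>();
        for (ReportColumns c : cols) {
            ints.add(c.getMaintenanceValue());
        }
        return ints;
    }

    public static void main(String[] args) {
        // The caller decides on EnumSet; it iterates in declaration order.
        Collection<ReportColumns> cols = EnumSet.of(ReportColumns.CURRENCY, ReportColumns.ID);
        System.out.println(maintenanceValues(cols));
    }
}
```

The method signature never mentions EnumSet, so callers that already hold a List can pass it unchanged.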

Related

Java variable type Collection for HashSet or other implementations?

I have often seen declarations like List<String> list = new ArrayList<>(); or Set<String> set = new HashSet<>(); for fields in classes. For me it makes perfect sense to use the interfaces for the variable types to provide flexibility in the implementation. The examples above do still define which kind of Collections have to be used, respectively which operations are allowed and how it should behave in some cases (due to docs).
Now consider the case where actually only the functionality of the Collection (or even the Iterable) interface is required to use the field in the class and the kind of Collection doesn't actually matter or I don't want to overspecify it. So I choose for example HashSet as implementation and declare the field as Collection<String> collection = new HashSet<>();.
Should the field then actually be of type Set in this case? Is this kind of declaration bad practice, and if so, why? Or is it good practice to specify the actual type as loosely as possible (while still providing all required methods)? The reason I ask is that I have hardly ever seen such a declaration, and lately I find myself more and more in situations where I only need the functionality of the Collection interface.
Example:
// Only need Collection features, but decided to use a LinkedList
private final Collection<Listener> registeredListeners = new LinkedList<>();
public void init() {
ExampleListener listener = new ExampleListener();
registerListenerSomewhere(listener);
registeredListeners.add(listener);
listener = new ExampleListener();
registerListenerSomewhere(listener);
registeredListeners.add(listener);
}
public void reset() {
for (Listener listener : registeredListeners) {
unregisterListenerSomewhere(listener);
}
registeredListeners.clear();
}

Since your example uses a private field it doesn't matter all that much about hiding the implementation type. You (or whoever is maintaining this class) can always just go look at the field's initializer to see what it is.
Depending on how it's used, though, it might be worth declaring a more specific interface for the field. Declaring it to be a List indicates that duplicates are allowed and that ordering is significant. Declaring it to be a Set indicates that duplicates aren't allowed and that ordering is not significant. You might even declare the field to have a particular implementation class if there's something about it that's significant. For example, declaring it to be LinkedHashSet indicates that duplicates aren't allowed but that ordering is significant.
The choice of whether to use an interface, and what interface to use, becomes much more significant if the type appears in the public API of the class, and on what the compatibility constraints on this class are. For example, suppose there were a method
public ??? getRegisteredListeners() {
return ...
}
Now the choice of return type affects other classes. If you can change all the callers, maybe it's no big deal, you just have to edit other files. But suppose the caller is an application that you have no control over. Now the choice of interface is critical, as you can't change it without potentially breaking the applications. The rule here is usually to choose the most abstract interface that supports the operations you expect callers to want to perform.
Most of the Java SE APIs return Collection. This provides a fair degree of abstraction from the underlying implementation, but it also provides the caller a reasonable set of operations. The caller can iterate, get the size, do a contains check, or copy all the elements to another collection.
Some code bases use Iterable as the most-abstract interface to return. All it does is allow the caller to iterate. Sometimes this is all that's necessary, but it might be somewhat limiting compared to Collection.
Another alternative is to return a Stream. This is helpful if you think the caller might want to use stream's operations (such as filter, map, find, etc.) instead of iterating or using collection operations.
Note that if you choose to return Collection or Iterable, you need to make sure that you return an unmodifiable view or make a defensive copy. Otherwise, callers could modify your class's internal data, which would probably lead to bugs. (Yes, even an Iterable can permit modification! Consider getting an Iterator and then calling the remove() method.) If you return a Stream, you don't need to worry about that, since you can't use a Stream to modify the underlying source.
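A minimal sketch of the two options just mentioned, an unmodifiable view and a defensive copy. The ListenerRegistry class and its method names are hypothetical, and listeners are plain Strings to keep it short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical registry used only to illustrate the two return strategies.
final class ListenerRegistry {
    private final List<String> registeredListeners = new ArrayList<>();

    void register(String listener) {
        registeredListeners.add(listener);
    }

    // Read-only view: reflects later registrations, but rejects add/remove,
    // including Iterator.remove().
    Collection<String> getRegisteredListeners() {
        return Collections.unmodifiableCollection(registeredListeners);
    }

    // Defensive copy: a snapshot that is independent of the internal list.
    Collection<String> snapshotOfRegisteredListeners() {
        return new ArrayList<>(registeredListeners);
    }
}
```

The view is cheap but live; the copy costs an allocation but does not change when the registry changes afterwards.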
Note that I turned your question about the declaration of a field into a question about the declaration of method return types. There is this idea of "program to the interface" that's quite prevalent in Java. In my opinion it doesn't matter very much for local variables (which is why it's usually fine to use var), and it matters little for private fields, since those (almost) by definition affect only the class in which they're declared. However, the "program to the interface" principle is very important for API signatures, so those cases are where you really need to think about interface types. Private fields, not so much.
(One final note: there is a case where you need to be concerned about the types of private fields, and that's when you're using a reflective framework that manipulates private fields directly. In that case, you need to think of those fields as being public -- just like method return types -- even though they're not declared public.)

As with all things, it's a question of tradeoffs. There are two opposing forces.
The more generic the type, the more freedom the implementation has. If you use Collection you're free to use an ArrayList, HashSet, or LinkedList without affecting the user/caller.
The more generic the return type, the fewer features are available to the user/caller. A List provides index-based lookup. A SortedSet makes it easy to get contiguous subsets via headSet, tailSet, and subSet. A NavigableSet provides efficient O(log n) binary search lookup methods. If you return Collection, none of these are available. Only the most generic access functions can be used.
Furthermore, the sub-types guarantee special properties that Collection does not: Sets hold unique items. SortedSets are sorted. Lists have an order; they're not unordered bags of items. If you use Collection then the user/caller can't necessarily assume that these properties hold. They may be forced to code defensively and, for instance, handle duplicate items even if you know there won't be duplicates.
A reasonable decision process might be:
If O(1) indexed access is guaranteed, use List.
If elements are sorted and unique, use SortedSet or NavigableSet.
If element uniqueness is guaranteed and order is not, use Set.
Otherwise, use Collection.
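The tradeoff can be seen in a small sketch, assuming a hypothetical ReturnTypeDemo class holding a sorted set of scores: the same data is exposed once as Collection and once as NavigableSet, and only the latter offers range and nearest-match queries.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical class; the data and method names are invented for this sketch.
final class ReturnTypeDemo {
    private static final NavigableSet<Integer> SCORES =
            new TreeSet<>(Arrays.asList(10, 20, 30, 40));

    // Generic return type: callers can iterate, test membership, and ask for the size.
    static Collection<Integer> scoresAsCollection() {
        return new TreeSet<>(SCORES);
    }

    // Specific return type: callers additionally get sorted-range and nearest-match queries.
    static NavigableSet<Integer> scoresAsNavigableSet() {
        return new TreeSet<>(SCORES);
    }

    public static void main(String[] args) {
        NavigableSet<Integer> scores = scoresAsNavigableSet();
        System.out.println(scores.headSet(30)); // elements strictly less than 30
        System.out.println(scores.floor(25));   // greatest element less than or equal to 25
        System.out.println(scoresAsCollection().contains(20));
    }
}
```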

It really depends on what you want to do with the collection object.
Collection<String> cSet = new HashSet<>();
Collection<String> cList = new ArrayList<>();
In this case, if you want, you can do:
cSet = cList;
But if you declare the variable like this:
Set<String> cSet = new HashSet<>();
then the assignment above is not permitted, though you can still construct a new list from the set using the copy constructor:
Set<String> set = new HashSet<>();
List<String> list = new ArrayList<>();
list = new ArrayList<>(set);
So basically depending on the usage you can use Collection or Set interface.

Is it appropriate to extend List to add fields

The question is framed for List but easily applies to others in the java collections framework.
For example, I would say it is certainly appropriate to make a new List sub-type to store something like a counter of additions since it is an integral part of the list's operation and doesn't alter that it "is a list". Something like this:
public class ChangeTrackingList<E> extends ArrayList<E> {
private int changeCount;
...
@Override public boolean add(E e) {
changeCount++;
return super.add(e);
}
// other methods likewise overridden as appropriate to track change counter
}
However, what about adding additional functionality out of the knowledge of a list ADT, such as storing arbitrary data associated with a list element? Assuming the associated data was properly managed when elements are added and removed, of course. Something like this:
public class PayloadList<E> extends ArrayList<E> {
private Object[] payload;
...
public void setData(int index, Object data) {
... // manage 'payload' array
payload[index] = data;
}
public Object getData(int index) {
... // manage 'payload' array, error handling, etc.
return payload[index];
}
}
In this case I have altered that it is "just a list" by adding not only additional functionality (behavior) but also additional state. That is certainly part of the purpose of type specification and inheritance, but is there an implicit restriction (taboo, deprecation, poor practice, etc.) that applies specially to Java collection types?
For example, when referencing this PayloadList as a java.util.List, one will mutate and iterate as normal and ignore the payload. This is problematic when it comes to something like persistence or copying which does not expect a List to carry additional data to be maintained. I've seen many places that accept an object, check to see that it "is a list" and then simply treat it as java.util.List. Should they instead allow arbitrary application contributions to manage specific concrete sub-types?
Since this implementation would constantly produce issues in instance slicing (ignoring sub-type fields) is it a bad idea to extend a collection in this way and always use composition when there is additional data to be managed? Or is it instead the job of persistence or copying to account for all concrete sub-types including additional fields?

This is purely a matter of opinion, but personally I would advise against extending classes like ArrayList in almost all circumstances, and favour composition instead.
Even your ChangeTrackingList is rather dodgy. What does
list.addAll(Arrays.asList("foo", "bar"));
do? Does it increment changeCount twice, or not at all? It depends on whether ArrayList.addAll() uses add(), which is an implementation detail you should not have to worry about. You would also have to keep your methods in sync with the ArrayList methods. For example, suppose addAll(Collection<? extends E> collection) is implemented on top of add() (as the inherited AbstractCollection.addAll() is). If a future release decided to check first whether collection instanceof ArrayList, and if so use System.arraycopy to copy the data directly, you would have to change your addAll() method to increment changeCount by collection.size() only if the collection is an ArrayList (otherwise it gets done in add()).
Also if a method is ever added to List (this happened with forEach() and stream() for example) this would cause problems if you were using that method name to mean something else. This can happen when extending abstract classes too, but at least an abstract class has no state, so you are less likely to be able to cause too much damage by overriding methods.
I would still use the List interface, and ideally extend AbstractList. Something like this
public final class PayloadList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> list;
private final Object[] payload;
// details missing
}
That way you have a class that implements List and makes use of ArrayList without you having to worry about implementation details.
(By the way, in my opinion, the class AbstractList is amazing. You only have to override get() and size() to have a functioning List implementation and methods like containsAll(), toString() and stream() all just work.)
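As a rough sketch of this approach applied to the change counter from the question (CountingList and getChangeCount are hypothetical names), the class below extends AbstractList and delegates storage to a private ArrayList. It relies on the documented behaviour that AbstractList.add(E) calls add(size(), e) and that the inherited addAll() adds elements one by one.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical composition-based take on ChangeTrackingList.
final class CountingList<E> extends AbstractList<E> implements RandomAccess {
    private final List<E> delegate = new ArrayList<>();
    private int changeCount;

    @Override public E get(int index) { return delegate.get(index); }

    @Override public int size() { return delegate.size(); }

    // add(E) and addAll(...) inherited from the abstract classes funnel into this method.
    @Override public void add(int index, E element) {
        delegate.add(index, element);
        changeCount++;
    }

    @Override public E set(int index, E element) {
        E previous = delegate.set(index, element);
        changeCount++;
        return previous;
    }

    @Override public E remove(int index) {
        E removed = delegate.remove(index);
        changeCount++;
        return removed;
    }

    int getChangeCount() { return changeCount; }

    public static void main(String[] args) {
        CountingList<String> list = new CountingList<>();
        list.addAll(Arrays.asList("foo", "bar"));
        System.out.println(list + " changes=" + list.getChangeCount());
    }
}
```

Because every mutation goes through three overridden methods, the count no longer depends on how ArrayList happens to implement addAll().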

One aspect you should consider is that all classes that inherit from AbstractList are value classes. That means that they have meaningful equals(Object) and hashCode() methods, therefore two lists are judged to be equal even if they are not the same instance of any class.
Furthermore, the equals() contract from AbstractList allows any list to be compared with the current list - not just a list with the same implementation.
Now, if you add a value item to a value class when you extend it, you need to include that value item in the equals() and hashCode() methods. Otherwise you will allow two PayloadList lists with different payloads to be considered "the same" when somebody uses them in a map, a set, or just a plain equals() comparison in any algorithm.
But in fact, it's impossible to extend a value class by adding a value item! You'll end up breaking the equals() contract, either by breaking symmetry (A plain ArrayList containing [1,2,3] will return true when compared with a PayloadList containing [1,2,3] with a payload of [a,b,c], but the reverse comparison won't return true). Or you'll break transitivity.
This means that basically, the only proper way to extend a value class is by adding non-value behavior (e.g. a list that logs every add() and remove()). The only way to avoid breaking the contract is to use composition. And it has to be composition that does not implement List at all (because again, other lists will accept anything that implements List and gives the same values when iterating it).
This answer is based on item 8 of Effective Java, 2nd Edition, by Joshua Bloch

If the class is not final, you can always extend it. Everything else is subjective and a matter of opinion.
My opinion is to favor composition over inheritance, since in the long run, inheritance produces low cohesion and high coupling, which is the opposite of a good OO design. But this is just my opinion.

The following is all just opinion; the question invites opinionated answers (I think it's borderline inappropriate for SO).
While your approach is workable in some situations, I'd argue it's bad design and very brittle. It's also pretty complicated to cover all loopholes (maybe even impossible, for example when the list is sorted by means other than List.sort).
If you require extra data to be stored and managed it would be better integrated into the list items themselves, or the data could be associated using existing collection types like Map.
If you really need an association list type, consider making it not an instance of java.util.List, but a specialized type with specialized API. That way no accidents are possible by passing it where a plain list is expected.
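One way to read the first suggestion, as a sketch only: a small hypothetical wrapper type (Tagged) carries the payload together with the element, so an ordinary List keeps the two in step even when it is sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical item type that carries its own payload.
final class Tagged<E> {
    private final E element;
    private final Object payload;

    Tagged(E element, Object payload) {
        this.element = element;
        this.payload = payload;
    }

    E element() { return element; }

    Object payload() { return payload; }
}

final class TaggedListDemo {
    static List<Tagged<String>> sample() {
        List<Tagged<String>> items = new ArrayList<>();
        items.add(new Tagged<>("beta", 2));
        items.add(new Tagged<>("alpha", 1));
        // Sorting moves each payload along with its element.
        items.sort((a, b) -> a.element().compareTo(b.element()));
        return items;
    }

    public static void main(String[] args) {
        for (Tagged<String> item : sample()) {
            System.out.println(item.element() + " -> " + item.payload());
        }
    }
}
```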

Is it more efficient to use the class, e.g. Hashtable than the interface, e.g. Map?

Will the compiler generate better code if it knows the actual class it will be working with vs the interface?
For example, I refer to my actual class like so:
Hashtable<String,String> foo() {
Hashtable<String,String> table = new Hashtable<String,String>(100);
....
return table;
}
...
Hashtable<String,String> tbl = foo();
VS
Map<String,String> foo() {
Map<String,String> table = new Hashtable<String,String>(100);
....
return table;
}
...
Map<String,String> tbl = foo();
Will the first form be more efficient?
OK, summarizing the answers now. I wish I could mark both Thomas and Tagir as correct, but I can't.
Thomas is correct in that the "correct" behavior is to use the abstract interface (Map) rather than the concrete implementation (Hashtable). This is the proper abstraction of the data and allows for the underlying implementation to be changed at will.
Tagir is correct in that exposing the concrete class allows certain compiler optimizations -- possibly highly significant optimizations. However, knowing if this will work or not requires knowledge of the compiler internals or benchmarking, and is not portable. It probably does not work for Android.
Finally, if you care about performance, don't use Hashtable; it's obsolete and clunky. If you really care about performance, consider using arrays instead.

In terms of runtime efficiency both should be equal as in both cases a Hashtable is used.
In terms of design using Map would be better in most cases, i.e. when it is irrelevant which implementation of Map is used. Generally you should use interfaces where possible so that you can replace the implementation, e.g. use a HashMap instead.
The difference between Hashtable and HashMap, for example, is mainly one of thread safety: Hashtable is synchronized and thus thread-safe, while HashMap yields better performance due to the lack of synchronization. If you use Map in your interface, you could also use ConcurrentHashMap without having to change the caller and get thread safety along with performance (although I'm not sure how much difference there would be between ConcurrentHashMap and Hashtable).
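A short sketch of that swap, with a hypothetical TableFactory class and a made-up entry: because foo() is declared to return Map, replacing Hashtable with ConcurrentHashMap only touches the method body.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory; the key and value are placeholders.
final class TableFactory {
    // Declared as Map, so callers never learn which implementation backs it.
    static Map<String, String> foo() {
        Map<String, String> table = new ConcurrentHashMap<>(100);
        table.put("key", "value");
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> tbl = foo();
        System.out.println(tbl.get("key"));
    }
}
```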

Apart from the fact that using Hashtable is itself a crime against performance (acquires a lock on each method call), the advantage of using a concrete class will only give a slight advantage to non-JIT-compiled code because the class's vtable can be reached directly, as opposed to a linear search through the itable lookup array, plus a dereference to the actual itable. The overhead of this search would become significant only if a class implemented many interfaces, which is not the case for Hashtable.
Unless your call sites dispatch to many different implementations of Map (hardly likely), the JIT-compiled code will be as efficient as if you used the concrete type on the variable. And for the simplest case where you say
Map<K,V> m = new Hashtable<>();
even the interpreter can statically determine that m will always point to a Hashtable and optimize accordingly.

Sometimes it's possible that returning a concrete class may improve the performance. For example, consider that you obtain something from the map:
Map<String,String> tbl = foo();
String result = tbl.get(something);
If tbl is declared as a concrete class like Hashtable (and you have no Hashtable subclasses that override the get method), the call can easily be devirtualized and even inlined (this can be done even by the simpler C1 "client" compiler). If you use the interface, it is harder to devirtualize. As far as I know, C1 cannot devirtualize an interface call with many implementors at all. The C2 "server" compiler can do it, but it has to rely on the type profile. Thus your code should be hot enough, and the profile should not be polluted with other types.
However, while in some specific benchmarks the difference can be orders of magnitude, in most production code it is negligible and unlikely to be the performance bottleneck.

emptyList() vs emptySet(): is there any reason to choose one over the other if an instance of Collection is needed?

In the JDK, there's Collections.emptyList() and Collections.emptySet(). Both are useful in their own right. But sometimes all that is needed is an empty, immutable instance of Collection. To me, there's no reason to choose one over the other, as both implement all operations of Collection efficiently and with the same results. Yet each time I need such an empty collection I ponder which one to use for a second or two.
I do not expect to gain a deeper understanding of the collections framework from an answer to this question but maybe there's a subtle reason I could use to justify choosing one over the other without thinking about it ever again.
An answer should state at least one reason for preferring one of Collections.emptyList() and Collections.emptySet() over the other in a context where they're functionally equivalent. An answer is better if the stated reason is near the top of this list:
There's a case where the type system is happier with one over the other (e.g. type inference allows shorter code with one than the other).
There is a performance difference, maybe in some special case (e.g. if the empty collection is passed as an argument to some of the collection framework's static or instance methods like Collections.sort() or Collection.removeAll()).
Choosing one over the other "makes more sense" in the general case, if you think about it.
Examples where this question arises
To give some context, here are two examples where I am in need of an empty, unmodifiable collection.
This is an example of an API that allows creating some object by optionally specifying a collection of objects that are used in the creation. The second method just calls the first one with an empty collection:
static void createObjectWithTheseThings(Collection<Thing> things) {
...
}
static void createObjectWithoutAnyThings() {
createObjectWithTheseThings(Collections.emptyXXX());
}
This is an example of an Entity with state represented by an immutable collection stored in a non-final field. On initialization the field should be set to an empty collection:
class Example {
// Initialized to an empty collection.
private Collection<T> containedThings = Collections.emptyXXX();
...
}

Unfortunately I don't have an answer that will make the top of your priority list but if I were you I'd settle on
Collections.emptySet
Type inference was your first priority but I don't know if the choice can/should influence that given you were looking for an emptyCollection()
On the second priority, think about any api that takes in a collection which performs differently (accidentally/intentionally) based on the sub-interfaces of the concrete object passed in. Aren't they more likely to offer varied performance based on the concrete implementations (as with an ArrayList or LinkedList) instead? The empty set/list are not modeled on any empty data structures anyway; they are dummy implementations - hence no real difference
Based on java's modelling of these interfaces (which admittedly is not ideal), a Collection is very similar to a Set. In fact I think the methods are almost exactly the same. Logically too it looks OK with List being the specific-sub type that adds additional ordering concerns.
Now Collection and Set looking very similar(java-wise) brings up a question. If you are using a Collection type, it is clear it is not a list you want. Now the question is are you sure you don't mean a Set. If you don't, then are you using something like a Bag (surely there must be concrete instances which are not empty in the overall logic). So if you are concerned with say a Bag, then shouldn't it be up to the Bag api to provide an emptyBag() method? Just wondering. btw, I'd stick with emptySet() in the meantime :)

For the emptyXXX() methods, it really doesn't matter: both collections are empty (and unmodifiable, so they always stay empty). They are equally suited to all operations Collection offers.
Take a look at what Collections really gives you there: special implementations (the instances are shared across calls!). All relevant operations are dummy implementations that either return a constant result or immediately throw. Even iterator() is just a dummy with no state.
It won't make any notable difference at all.
Edit: You could say that, for the special case of emptyList()/emptySet(), they are the same semantically and complexity-wise at the Collection interface level. All operations available on Collection are implemented by emptySet()/emptyList() as O(1) operations. And since they both follow the contract defined by Collection, they are semantically identical too.
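A quick way to check this claim at the Collection level is sketched below; EmptyCollectionDemo and rejectsAdd are hypothetical names. Both empty collections report the same size, contain nothing, and reject additions.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical demo class comparing the two empty collections through the Collection interface.
final class EmptyCollectionDemo {
    // Returns true if the collection refuses to accept a new element.
    static boolean rejectsAdd(Collection<String> c) {
        try {
            c.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Collection<String> asList = Collections.emptyList();
        Collection<String> asSet = Collections.emptySet();
        System.out.println(asList.size() == asSet.size());
        System.out.println(asList.iterator().hasNext() || asSet.iterator().hasNext());
        System.out.println(rejectsAdd(asList) && rejectsAdd(asSet));
    }
}
```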

The only situation I can imagine this making a difference is if the code that will use your Collection does something like this:
Collection<T> collection = ...
List<T> asAList;
if (collection instanceof List) {
asAList = (List<T>) collection;
} else {
asAList = new ArrayList<T>(collection);
}
Obviously in a case like this you would want to use emptyList(), while if the secret target type was a Set, you'd want emptySet().
Otherwise, in terms of what "makes more sense", I agree with @ac3's logic that a generic Collection is like a Bag, and an empty immutable Set and empty immutable Bag are pretty much the same thing. However, a person very used to using immutable lists might find those easier to think of.

is there a performance hit when using enum.values() vs. String arrays?

I'm using enumerations to replace String constants in my java app (JRE 1.5).
Is there a performance hit when I treat the enum as a static array of names in a method that is called constantly (e.g. when rendering the UI)?
My code looks a bit like this:
public String getValue(int col) {
return ColumnValues.values()[col].toString();
}
Clarifications:
I'm concerned with a hidden cost related to enumerating values() repeatedly (e.g. inside paint() methods).
I can now see that all my scenarios include some int => enum conversion - which is not Java's way.
What is the actual price of extracting the values() array? Is it even an issue?
Android developers
Read Simon Langhoff's answer below, which was pointed out earlier by Geeks On Hugs in the accepted answer's comments. Enum.values() must do a defensive copy.

For enums, in order to maintain immutability, the backing array is cloned every time you call the values() method. This means that it will have a performance impact. How much depends on your specific scenario.
I have been monitoring my own Android app and found that this simple call used 13.4% of CPU time in my specific case.
In order to avoid cloning the values array, I decided to simply cache the values in a private field and then loop through those values whenever needed:
private final static Protocol[] values = Protocol.values();
After this small optimisation my method call only took a negligible 0.0% of CPU time.
In my use case, this was a welcome optimisation, however, it is important to note that using this approach is a tradeoff of mutability of your enum. Who knows what people might put into your values array once you give them a reference to it!?
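One way to keep the cached array from leaking is to hold it privately inside the enum and expose only lookups. This is a sketch with made-up constants and a hypothetical fromOrdinal method, not the answer's actual code.

```java
// Hypothetical enum; the constants are placeholders.
enum Protocol {
    HTTP, FTP, SMTP;

    // Cached once at class initialization; kept private so outside code cannot modify it.
    private static final Protocol[] VALUES = values();

    // Index lookup that does not clone the backing array on every call.
    static Protocol fromOrdinal(int ordinal) {
        return VALUES[ordinal];
    }

    public static void main(String[] args) {
        System.out.println(fromOrdinal(1));
        // Each values() call hands back a fresh copy, so the references differ.
        System.out.println(values() == values());
    }
}
```

Since callers only ever see individual constants, the mutability concern above does not arise with this variant.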

Enum.values() gives you a reference to an array, and iterating over an array of enums costs the same as iterating over an array of strings. Meanwhile, comparing enum values to other enum values can actually be faster than comparing strings to strings.
Meanwhile, if you're worried about the cost of invoking the values() method versus already having a reference to the array, don't worry. Method invocation in Java is (now) blazingly fast, and any time it actually matters to performance, the method invocation will be inlined by the compiler anyway.
So, seriously, don't worry about it. Concentrate on code readability instead, and use Enum so that the compiler will catch it if you ever try to use a constant value that your code wasn't expecting to handle.
If you're curious about why enum comparisons might be faster than string comparisons, here are the details:
It depends on whether the strings have been interned or not. For Enum objects, there is always only one instance of each enum value in the system, and so each call to Enum.equals() can be done very quickly, just as if you were using the == operator instead of the equals() method. In fact, with Enum objects, it's safe to use == instead of equals(), whereas that's not safe to do with strings.
For strings, if the strings have been interned, then the comparison is just as fast as with an Enum. However, if the strings have not been interned, then the String.equals() method actually needs to walk the list of characters in both strings until either one of the strings ends or it discovers a character that is different between the two strings.
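The difference can be illustrated with a small sketch; Column and ComparisonDemo are hypothetical names. Enum constants are singletons, so == is reliable, whereas a string built at runtime is a distinct object until it is interned.

```java
// Hypothetical enum and demo class for comparing identity and equality.
enum Column { ID, NAME }

final class ComparisonDemo {
    // Enum lookup by name returns the single shared instance.
    static boolean sameEnumInstance() {
        return Column.valueOf("ID") == Column.ID;
    }

    // A string created at runtime is a different object from the literal.
    static boolean sameStringInstance() {
        String runtime = new String("ID");
        return runtime == "ID";
    }

    // equals() compares the characters, so it succeeds regardless of identity.
    static boolean equalStrings() {
        return new String("ID").equals("ID");
    }

    // After interning, the runtime string refers to the same object as the literal.
    static boolean sameAfterIntern() {
        return new String("ID").intern() == "ID";
    }

    public static void main(String[] args) {
        System.out.println(sameEnumInstance());
        System.out.println(sameStringInstance());
        System.out.println(equalStrings());
        System.out.println(sameAfterIntern());
    }
}
```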
But again, this likely doesn't matter, even in Swing rendering code that must execute quickly. :-)
@Ben Lings points out that Enum.values() must do a defensive copy, since arrays are mutable and it's possible you could replace a value in the array that is returned by Enum.values(). This means that you do have to consider the cost of that defensive copy. However, copying a single contiguous array is generally a fast operation, assuming that it is implemented "under the hood" using some kind of memory-copy call, rather than naively iterating over the elements in the array. So, I don't think that changes the final answer here.

As a rule of thumb: before thinking about optimizing, do you have any evidence that this code could slow down your application?
Now, the facts.
Enums are, for a large part, syntactic sugar handled during compilation. As a consequence, the values() method defined for an enum class is backed by a static array (that is, one created at class initialization), with performance that can be considered roughly equivalent to that of a plain array.

If you're concerned about performance, then measure.
From the code, I wouldn't expect any surprises, but 90% of all performance guesswork is wrong. If you want to be safe, consider moving the enums up into the calling code (i.e. public String getValue(ColumnValues value) {return value.toString();}).

use this:
private enum ModelObject { NODE, SCENE, INSTANCE, URL_TO_FILE, URL_TO_MODEL,
ANIMATION_INTERPOLATION, ANIMATION_EVENT, ANIMATION_CLIP, SAMPLER, IMAGE_EMPTY,
BATCH, COMMAND, SHADER, PARAM, SKIN }
private static final ModelObject int2ModelObject[] = ModelObject.values();

If you're iterating through your enum values just to look for a specific value, you can statically map the enum values to integers. This pushes the performance impact on class load, and makes it easy/low impact to get specific enum values based on a mapped parameter.
import java.util.HashMap;
import java.util.Map;

public enum ExampleEnum {
value1(1),
value2(2),
valueUndefined(Integer.MAX_VALUE);
private final int enumValue;
// Lookup table from the mapped int to its constant, filled once at class load.
private static final Map<Integer, ExampleEnum> enumMap = new HashMap<Integer, ExampleEnum>();
ExampleEnum(int value){
enumValue = value;
}
static {
for (ExampleEnum exampleEnum: ExampleEnum.values()) {
enumMap.put(exampleEnum.enumValue, exampleEnum);
}
}
// Falls back to valueUndefined when no constant is mapped to the given int.
public static ExampleEnum getExampleEnum(int value) {
return enumMap.containsKey(value) ? enumMap.get(value) : valueUndefined;
}
}

I think yes. And it is more convenient to use Constants.
