Java syntax scenario where Collection<Foo>[] is necessary and/or valid

I came across an interesting piece of code while learning MorphAdorner. The code is below:
Collection<Object>[] nodes = someFunction();
My question is: in what scenario is the following declaration necessary and/or valid?
Collection<Object>[] nodes
I have seen:
Collection<Object[]> nodes
But I cannot think of a scenario where I would need an array of collections. So again, the question is: when would this be used?
This is the javadoc:
java.util.Set<java.lang.String>[] findNames(java.lang.String text)
Returns names from text.

First of all,
Collection<Object[]> nodes;
and
Collection<Object>[] nodes;
are two different things. The first is a collection of arrays, whereas the second is an array of collections.
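As to when you'd use the latter, my answer would be "rarely". While conceptually this is pretty simple, Java arrays and generics don't play together nicely: the compiler rejects new Collection<Object>[10] as generic array creation, so building such an array requires a raw type and an unchecked cast.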
It is therefore more common to see
ArrayList<Collection<Object>> nodes;
which is similar but much easier to deal with.
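A minimal sketch of why the list form is easier to deal with: Java rejects `new Collection<Object>[2]` at compile time (generic array creation), so an array of collections has to be created as a raw array plus an unchecked cast, whereas the list of collections needs no tricks. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class ArrayOfCollectionsDemo {

    // Array of collections: `new Collection<Object>[2]` does not compile,
    // so a raw array is created and the unchecked warning is suppressed.
    @SuppressWarnings("unchecked")
    static Collection<Object>[] asArray() {
        Collection<Object>[] nodes = (Collection<Object>[]) new Collection[2];
        nodes[0] = new ArrayList<>();
        nodes[1] = new ArrayList<>();
        nodes[0].add("first");
        return nodes;
    }

    // List of collections: no raw types or casts required.
    static List<Collection<Object>> asList() {
        List<Collection<Object>> nodes = new ArrayList<>();
        nodes.add(new ArrayList<>());
        nodes.add(new ArrayList<>());
        nodes.get(0).add("first");
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(asArray()[0]);    // prints [first]
        System.out.println(asList().get(0)); // prints [first]
    }
}
```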
As to whether findNames() is an example of good design, my main objection is that it's completely impossible to guess from the function signature what the elements of the array are supposed to represent (or how many there are). For this reason, I would have done it differently, probably returning a custom class with two clearly named accessors.

The code compiles, so it's probably valid Java.
It's necessary when the designer couldn't think of a better solution.
Which leaves us with: Is it good design?
Maybe but probably not. One scenario would be if the function always returns two or three collections (i.e. more than one but the number never changes).
You could create an object for this but since this is Java, this would take many, many, many lines of deadly boring code. It would also mean that you would have to come up with some useful names for each collection.
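For what it's worth, on Java 16 or later a record keeps such a holder short. A sketch, assuming the function always returns exactly two sets; the names personNames and placeNames are invented for illustration, since the source does not say what findNames' array elements mean.

```java
import java.util.Set;

// Hypothetical named holder replacing a Set<String>[] of fixed length 2.
// Requires Java 16+ (records).
record FoundNames(Set<String> personNames, Set<String> placeNames) {

    // Compact constructor: store unmodifiable copies of the inputs.
    FoundNames {
        personNames = Set.copyOf(personNames);
        placeNames = Set.copyOf(placeNames);
    }
}

class FoundNamesDemo {
    public static void main(String[] args) {
        FoundNames names = new FoundNames(Set.of("Alice"), Set.of("Paris"));
        System.out.println(names.personNames()); // prints [Alice]
        System.out.println(names.placeNames());  // prints [Paris]
    }
}
```

The accessor names now document what each collection holds, which the bare array cannot do.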
Taking the Javadoc you posted into account, it seems the length of the array depends on the number of sentences in the text.
So in this scenario, I would return a List of Collections (since the order of sentences never changes and you might want to get them by index).
The designer might argue that callers can add elements to a list but not to an array; I'd address that by returning an unmodifiable list.
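A sketch of that suggestion, assuming the per-sentence sets are already available as the Set<String>[] that findNames returns. The helper name and the sample names are made up; the array is copied and wrapped so callers can read by index but cannot add or remove elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class NamesPerSentence {

    // Copies the array (so later changes to it are not visible) and wraps
    // the copy in a read-only List: one set of names per sentence, in order.
    static List<Set<String>> toUnmodifiableList(Set<String>[] perSentence) {
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(perSentence)));
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Set<String>[] raw = new Set[] { Set.of("Alice"), Set.of("Bob", "Carol") };
        List<Set<String>> names = toUnmodifiableList(raw);
        System.out.println(names.get(1).size()); // prints 2
        try {
            names.add(Set.of());
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```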
So in this example, I'd say it's bad design.

Collection<Object>[] nodes
is valid: it declares an array of Collection<Object>.
Whereas,
Collection<Object[]> nodes
means we are getting a Collection that contains arrays of Objects.
My question is in what scenario is this declaration necessary and/or valid:
Consider a situation where you have several sets of objects, say people, with each set holding the people of a particular country, and you create an array of sets of a fixed size. In that case you use the following syntax:
@SuppressWarnings("unchecked")
Set<People>[] setOfPeople = new Set[5]; // people of 5 different countries; new TreeSet<People>[5] would not compile (generic array creation)
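A runnable sketch of the same idea. It uses String instead of the answer's People class (a TreeSet of People would also require People to be Comparable or a Comparator to be supplied); the class and method names are hypothetical. Note that the array slots start out null, so each country needs its own set.

```java
import java.util.Set;
import java.util.TreeSet;

class PeopleByCountry {

    // `new TreeSet<String>[n]` is a compile error (generic array creation),
    // so the array is created raw and the unchecked warning is suppressed.
    @SuppressWarnings("unchecked")
    static Set<String>[] emptyCountrySets(int countries) {
        Set<String>[] sets = new Set[countries];
        // Array slots start out null; give each country its own sorted set.
        for (int i = 0; i < sets.length; i++) {
            sets[i] = new TreeSet<>();
        }
        return sets;
    }

    public static void main(String[] args) {
        Set<String>[] setOfPeople = emptyCountrySets(5);
        setOfPeople[0].add("Bob");
        setOfPeople[0].add("Alice");
        setOfPeople[3].add("Chen");
        System.out.println(setOfPeople[0]); // prints [Alice, Bob] (TreeSet keeps sorted order)
    }
}
```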

Related

Can we insert elements of multiple types in a List?

While working on a program, I wondered whether it is possible to add elements of multiple types, e.g. Integer, String, Long, to a List without making it accept everything of type Object.
I want to restrict the list to accept elements of only these three types. Is this possible?
There are a few solutions that I don't want to use:
1) Create a POJO with a field for each of the three types and insert that POJO.
2) Create a base class with data-type-specific wrapper subclasses. In this case, the user has to know about this abstraction when creating objects of the different classes.
Can this be done in a better and more generic way?
As an alternative, you can implement your own collection type. It could have three add methods, one for each type you want to accept, while the default add method throws an UnsupportedOperationException.
This however might not be the ideal solution, since it might introduce bugs if you don't have a full understanding of how the collection you are implementing/extending should work internally.
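A minimal sketch of that custom collection, assuming it extends java.util.AbstractList backed by a private ArrayList. The class name is hypothetical. AbstractList's inherited add(Object) already throws UnsupportedOperationException by default, so only the three typed overloads need to be written.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Accepts Integer, String and Long through its own add overloads only.
class IntStringLongList extends AbstractList<Object> {

    private final List<Object> items = new ArrayList<>();

    public boolean add(Integer value) { return items.add(value); }
    public boolean add(String value)  { return items.add(value); }
    public boolean add(Long value)    { return items.add(value); }

    @Override
    public Object get(int index) { return items.get(index); }

    @Override
    public int size() { return items.size(); }

    public static void main(String[] args) {
        IntStringLongList list = new IntStringLongList();
        list.add(1);     // resolves to add(Integer)
        list.add("two"); // resolves to add(String)
        list.add(3L);    // resolves to add(Long)
        try {
            list.add(4.0); // no typed overload matches, so add(Object) is chosen and throws
        } catch (UnsupportedOperationException e) {
            System.out.println("Double rejected");
        }
        System.out.println(list); // prints [1, two, 3]
    }
}
```

This only helps while the static type is IntStringLongList: through a List<Object> reference every call resolves to add(Object) and throws, which is exactly the kind of surprise the caveat above warns about.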
You have Number to use with, well, any kind of number.
But String and Number have no useful common supertype (apart from marker interfaces such as Serializable).
So, List<Object> is effectively your only choice there.
And keep in mind: when you really have to deal with different types of elements - maybe List isn't the correct abstraction to use?!
In other words: if you have values of different types that belong together, you should rather consider creating a specific class to wrap around those values.
A better solution would be to use Map<TypeOfData, List<Type>> rather than hacking List to achieve it.
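One way to read Map<TypeOfData, List<Type>> is to key the map by Class object, a variant of the "typesafe heterogeneous container" pattern described in Effective Java. The class name and the allowed-type check are assumptions for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One list per allowed type, keyed by the type's Class object.
class ValuesByType {

    private static final Set<Class<?>> ALLOWED = Set.of(Integer.class, String.class, Long.class);

    private final Map<Class<?>, List<Object>> values = new HashMap<>();

    <T> void add(Class<T> type, T value) {
        if (!ALLOWED.contains(type)) {
            throw new IllegalArgumentException("Type not allowed: " + type.getName());
        }
        // type.cast checks the value at runtime as well.
        values.computeIfAbsent(type, k -> new ArrayList<>()).add(type.cast(value));
    }

    <T> List<T> get(Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object o : values.getOrDefault(type, Collections.emptyList())) {
            result.add(type.cast(o));
        }
        return result;
    }

    public static void main(String[] args) {
        ValuesByType v = new ValuesByType();
        v.add(Integer.class, 1);
        v.add(String.class, "two");
        v.add(Long.class, 3L);
        List<String> strings = v.get(String.class);
        System.out.println(strings); // prints [two]
    }
}
```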
Although there are probably some hacked together work-around that may help you do this, the simple answer to your question is "no, this can't be done". Part of programming is using the right tool for the right job. In this case, it's almost certain that a List is the wrong tool to help you do this job.

Best way to write a Java function that modifies an object

I'm currently moving from C++ to Java for work, and without const and pointers I'm having difficulty making sure my intent is always clear. One of the largest problems I'm having is with returning a modified object of the same type.
Take for example a filter function. It's used to filter out values.
public List<Integer> filter(List<Integer> values) {
...
}
Here everything is Serializable, so we could copy the whole list first, then modify the copy and return it. That seems a little pointlessly inefficient, though, especially if the list is large. Copying your inputs every time also looks quite clumsy.
We could pass it in normally, modify it, and make it clear from the name that we are doing so:
public void modifyInputListWithFilter(List<Integer> values) {
...
}
This is the cleanest approach I can think of: you can copy the list beforehand if you need to, otherwise just pass it in. However, I would still rather not modify input parameters.
We could make the List a member variable of the class we are in and add the filter method to the current class. Chances are though that now our class is doing more than one thing.
We could consider moving the List to its own class, with filter as one of its methods. That seems a little excessive for one variable, though; we'll quickly have more classes than we can keep track of. Also, if we only use this strategy and more than just filtering happens to the list, that class will inevitably start doing more than one thing.
So what is the best way of writing this and why?
The short answer is that there is not a single best way. Different scenarios will call for different approaches. Different design patterns will call for different approaches.
You've suggested two approaches, and either one of them might be valid depending on the scenario.
I will say that there is nothing inherently wrong with modifying a List that you pass into a function: take a look at the Collections.sort() function, for example. There is also nothing wrong with returning a copy of the List instead.
The only "rule" here is a Liskov rule (not the Liskov rule): your function should do whatever its documentation says it will do. If you're going to modify the List, make sure your documentation says so. If you aren't, make sure it says that instead.
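A sketch of both approaches side by side, each with its contract stated in the Javadoc. The filtering criterion (keep non-negative values) and the method names are hypothetical stand-ins for whatever the real filter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class FilterStyles {

    /** Returns a new list containing only the non-negative values; the input is not modified. */
    static List<Integer> filterNonNegative(List<Integer> values) {
        return values.stream()
                .filter(v -> v >= 0)
                .collect(Collectors.toList());
    }

    /** Removes negative values from the given list in place (mutates its argument, like Collections.sort). */
    static void removeNegativeInPlace(List<Integer> values) {
        values.removeIf(v -> v < 0);
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>(List.of(3, -1, 4, -5));
        List<Integer> copy = filterNonNegative(input);
        System.out.println(input + " " + copy); // prints [3, -1, 4, -5] [3, 4]
        removeNegativeInPlace(input);
        System.out.println(input); // prints [3, 4]
    }
}
```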

What is the benefit of immediate down-casting?

I've been looking at a lot of code recently (for my own benefit, as I'm still learning to program), and I've noticed a number of Java projects (from what appear to be well respected programmers) wherein they use some sort of immediate down-casting.
I actually have multiple examples, but here's one that I pulled straight from the code:
public Set<Coordinates> neighboringCoordinates() {
HashSet<Coordinates> neighbors = new HashSet<Coordinates>();
neighbors.add(getNorthWest());
neighbors.add(getNorth());
neighbors.add(getNorthEast());
neighbors.add(getWest());
neighbors.add(getEast());
neighbors.add(getSouthEast());
neighbors.add(getSouth());
neighbors.add(getSouthWest());
return neighbors;
}
And from the same project, here's another (perhaps more concise) example:
private Set<Coordinates> liveCellCoordinates = new HashSet<Coordinates>();
In the first example, you can see that the method has a return type of Set<Coordinates> - however, that specific method will always only return a HashSet - and no other type of Set.
In the second example, liveCellCoordinates is initially defined as a Set<Coordinates>, but is immediately turned into a HashSet.
And it's not just this single, specific project - I've found this to be the case in multiple projects.
I am curious as to what the logic is behind this? Is there some code-conventions that would consider this good practice? Does it make the program faster or more efficient somehow? What benefit would it have?
When you are designing a method signature, it is usually better to only pin down what needs to be pinned down. In the first example, by specifying only that the method returns a Set (instead of a HashSet specifically), the implementer is free to change the implementation if it turns out that a HashSet is not the right data structure. If the method had been declared to return a HashSet, then all code that depended on the object being specifically a HashSet instead of the more general Set type would also need to be revised.
A realistic example would be if it was decided that neighboringCoordinates() needed to return a thread-safe Set object. As written, this would be very simple to do—replace the last line of the method with:
return Collections.synchronizedSet(neighbors);
As it turns out, the Set object returned by synchronizedSet() is not assignment-compatible with HashSet. Good thing the method was declared to return a Set!
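A small sketch of that change, using String cell labels as a hypothetical stand-in for the project's Coordinates class. Because the return type is Set, switching to the synchronized wrapper touches only the return statement, and the returned object is indeed not a HashSet.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class NeighborsDemo {

    // Declared as Set, so the implementation can change without touching callers.
    static Set<String> neighboringCells() {
        Set<String> neighbors = new HashSet<>();
        neighbors.add("north");
        neighbors.add("south");
        // Making the result thread-safe only changes this line.
        return Collections.synchronizedSet(neighbors);
    }

    public static void main(String[] args) {
        Set<String> cells = neighboringCells();
        System.out.println(cells instanceof HashSet); // prints false
        System.out.println(cells.size());             // prints 2
    }
}
```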
A similar consideration applies to the second case. Code in the class that uses liveCellCoordinates shouldn't need to know anything more than that it is a Set. (In fact, in the first example, I would have expected to see:
Set<Coordinates> neighbors = new HashSet<Coordinates>();
at the top of the method.)
Because now if they change the type in the future, any code depending on neighboringCoordinates does not have to be updated.
Let's say you had:
HashSet<Coordinates> c = neighboringCoordinates();
Now, let's say they change their code to use a different implementation of Set. Guess what, you have to change your code too.
But, if you have:
Set<Coordinates> c = neighboringCoordinates();
As long as their collection still implements Set, they can change whatever they want internally without affecting your code.
Basically, it's just being the least specific possible (within reason) for the sake of hiding internal details. Your code only cares that it can access the collection as a set. It doesn't care what specific type of set it is, if that makes sense. So why couple your code to HashSet?
In the first example, that the method will always only return a HashSet is an implementation detail that users of the class should not have to know. This frees the developer to use a different implementation if it is desirable.
The design principle in play here is "always prefer specifying abstract types".
Set is abstract; there is no concrete class called Set, since it's an interface, which is by definition abstract. The method's contract is to return a Set; it's up to the developer to choose what kind of Set to return.
You should do this with fields as well, e.g.:
private List<String> names = new ArrayList<String>();
not
private ArrayList<String> names = new ArrayList<String>();
Later, you may want to change to using a LinkedList; specifying the abstract type allows you to do this with no code changes (except for the initialization, of course).
The question is how you want to use the variable. For example, is it important in your context that it is a HashSet? If not, you should say what you need, and that is just a Set.
Things would be different if you used, e.g., TreeSet here. Then you would lose the information that the Set is sorted, and if your algorithm relies on this property, changing the implementation to HashSet would be a disaster. In this case the best solution would be to write SortedSet<Coordinates> set = new TreeSet<Coordinates>();. Or imagine you wrote List<String> list = new LinkedList<String>();: that's OK if you want to use list just as a list, but you would no longer be able to use the LinkedList as a deque, because methods like offerFirst or peekLast are not on the List interface.
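A sketch of "as specific as needed", with hypothetical method names: declaring the variable as SortedSet keeps first() available, and declaring the LinkedList as Deque keeps offerFirst and peekLast available, while still not committing to the concrete class.

```java
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.SortedSet;
import java.util.TreeSet;

class SpecificEnough {

    // Declared as SortedSet: the "sorted" promise is part of the type,
    // so first() (the smallest element) is available.
    static String smallest(String... items) {
        SortedSet<String> set = new TreeSet<>(Arrays.asList(items));
        return set.first();
    }

    // Declared as Deque rather than List, so LinkedList's double-ended
    // methods such as offerFirst and peekLast remain usable.
    static String lastAfterPushingToFront(String... items) {
        Deque<String> deque = new LinkedList<>();
        for (String item : items) {
            deque.offerFirst(item);
        }
        return deque.peekLast();
    }

    public static void main(String[] args) {
        System.out.println(smallest("pear", "apple", "fig"));       // prints apple
        System.out.println(lastAfterPushingToFront("a", "b", "c")); // prints a
    }
}
```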
So the general rule is: Be as general as possible, but as specific as needed. Ask yourself what you really need. Does a certain interface provide all functionality and promises you need? If yes, then use it. Else be more specific, use another interface or the class itself as type.
Here is another reason. It's because more general (abstract) types have fewer behaviors which is good because there is less room to mess up.
For example, let's say you wrote List<User> users = getUsers(); when you could have used a more abstract type: Collection<User> users = getUsers(); Now Bob might wrongly assume that your method returns users in alphabetical order and introduce a bug. Had you used Collection, there wouldn't have been such confusion.
It's quite simple.
In your example, the method returns Set. From an API designer's point of view this has one significant advantage, compared to returning HashSet.
If at some point the programmer decides to use SuperPerformantSetForDirections, he can do so without changing the public API, as long as the new class implements Set.
The trick is "code to the interface".
The reason for this is that in 99.9% of the cases you just want the behavior from HashSet/TreeSet/WhateverSet that conforms to the Set-interface implemented by all of them. It keeps your code simpler, and the only reason you actually need to say HashSet is to specify what behavior the Set you need has.
As you may know, HashSet is relatively fast but returns items in seemingly random order. TreeSet is a bit slower, but returns items in sorted order (alphabetical order for strings). Your code does not care, as long as it behaves like a Set.
This gives simpler code, easier to work with.
Note that the typical choice for a Set is HashSet, for a Map HashMap, and for a List ArrayList. If you use a non-typical (for you) implementation, there should be a good reason for it (like needing alphabetical sorting), and that reason should be put in a comment next to the new statement. That makes life easier for future maintainers.

Java: Should I always replace Arrays for ArrayLists?

Well, it seems to me ArrayLists make it easier to expand the code later on both because they can grow and because they make using Generics easier. However, for multidimensional arrays, I find the readability of the code is better with standard arrays.
Anyway, are there some guidelines on when to use one or the other? For example, I'm about to return a table from a function (int[][]), but I was wondering if it wouldn't be better to return a List<List<Integer>> or a List<int[]>.
Unless you have a strong reason otherwise, I'd recommend using Lists over arrays.
There are some specific cases where you will want to use an array (e.g. when you are implementing your own data structures, or when you are addressing a very specific performance requirement that you have profiled and identified as a bottleneck) but for general purposes Lists are more convenient and will offer you more flexibility in how you use them.
Where you are able to, I'd also recommend programming to the abstraction (List) rather than the concrete type (ArrayList). Again, this offers you flexibility if you decide to change the implementation details in the future.
To address your readability point: if you have a complex structure (e.g. ArrayList of HashMaps of ArrayLists) then consider either encapsulating this complexity within a class and/or creating some very clearly named functions to manipulate the structure.
Choose a data structure implementation and interface based on primary usage:
Random Access: use List for variable type and ArrayList under the hood
Appending: use Collection for variable type and LinkedList under the hood
Loop and process: use Iterable and see the above for use under the hood based on producer code
Use the most abstract interface possible when handing around data. That said don't use Collection when you need random access. List has get(int) which is very useful when random access is needed.
Typed collections like List<String> make up for the syntactical convenience of arrays.
Don't use Arrays unless you have a qualified performance expert analyze and recommend them. Even then you should get a second opinion. Arrays are generally a premature optimization and should be avoided.
Generally speaking, you are far better off using an interface rather than a concrete type. The concrete type makes it hard to rework the internals of the function in question. For example, if you return int[][] you have to do all of the computation upfront. If you return List<List<Integer>> you can lazily do computation during iteration (or even concurrently in the background) if it is beneficial.
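A sketch of the lazy option, assuming a hypothetical table whose cell (row, col) is simply row * col. Extending java.util.AbstractList means each cell is computed when it is read, so nothing is stored up front, yet callers just see a List<List<Integer>>.

```java
import java.util.AbstractList;
import java.util.List;

// A read-only square table whose cells are computed on access.
// Hypothetical content: cell (row, col) holds row * col.
class LazyTable extends AbstractList<List<Integer>> {

    private final int size;

    LazyTable(int size) { this.size = size; }

    @Override
    public List<Integer> get(int row) {
        if (row < 0 || row >= size) throw new IndexOutOfBoundsException("row " + row);
        return new AbstractList<Integer>() {
            @Override
            public Integer get(int col) {
                if (col < 0 || col >= size) throw new IndexOutOfBoundsException("col " + col);
                return row * col; // computed now, never stored
            }

            @Override
            public int size() { return size; }
        };
    }

    @Override
    public int size() { return size; }

    public static void main(String[] args) {
        List<List<Integer>> table = new LazyTable(1000); // no cells allocated
        System.out.println(table.get(12).get(34)); // prints 408
    }
}
```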
The List is more powerful:
You can resize the list after it has been created.
You can create a read-only view onto the data.
It can be easily combined with other collections, like Set or Map.
The array works on a lower level:
Its content can always be changed.
Its length can never be changed.
It uses less memory.
You can have arrays of primitive data types.
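A compact sketch of the points in both lists; the class and helper names are made up. The array holds primitives and its contents change freely, but growing it means allocating a new array; the list resizes itself, and Collections.unmodifiableList gives a read-only view that still reflects changes made through the original list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListVsArray {

    // An array's length is fixed, so "growing" it means allocating a new array.
    static int[] grow(int[] array, int newLength) {
        return Arrays.copyOf(array, newLength);
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3}; // primitives allowed, no boxing
        array[0] = 10;           // contents can always be changed
        array = grow(array, 4);  // new array of length 4, last slot is 0

        List<Integer> list = new ArrayList<>(List.of(1, 2, 3)); // boxed Integers
        list.add(4);                                            // resizes as needed
        List<Integer> view = Collections.unmodifiableList(list); // read-only view
        list.add(5);                                             // visible through the view

        System.out.println(array.length + " " + view.size()); // prints 4 5
        try {
            view.set(0, 99);
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```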
I wanted to point out that Lists can hold the wrappers for the primitive data types that would otherwise need to be stored in an array (i.e., a class such as Double that has only one field: a double). Since Java 5, the compiler converts to and from these wrappers implicitly (autoboxing), at least most of the time, so the fact that Lists cannot hold primitives directly should not be a consideration for the vast majority of use cases.
For completeness: the only time that I have seen Java fail to implicitly convert from a primitive wrapper was when those wrappers were composed in a higher order structure: It could not convert a Double[] into a double[].
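A sketch of both behaviours, with a hypothetical helper name: single values box and unbox automatically when going in and out of a List<Double>, but a Double[] has to be unboxed element by element to get a double[]. (On Java 8+, Arrays.stream(boxed).mapToDouble(Double::doubleValue).toArray() does the same.)

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {

    // Double[] -> double[] is not converted automatically; unbox each element.
    static double[] unbox(Double[] boxed) {
        double[] result = new double[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            result[i] = boxed[i]; // single-element auto-unboxing (NullPointerException if null)
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> values = new ArrayList<>();
        values.add(1.5);               // double autoboxed to Double
        double first = values.get(0);  // Double auto-unboxed to double
        double[] primitives = unbox(new Double[] {2.0, 3.0});
        System.out.println(first + primitives[1]); // prints 4.5
    }
}
```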
It mostly comes down to flexibility/ease of use versus efficiency. If you don't know how many elements will be needed in advance, or if you need to insert in the middle, ArrayLists are a better choice. An ArrayList is backed by an array, so you'll want to consider using the ensureCapacity method for performance. Arrays are preferred if you have a fixed size in advance and won't need inserts, etc.
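A short sketch of pre-sizing an ArrayList, with a hypothetical squares helper: the ArrayList(int initialCapacity) constructor and ensureCapacity(int) both reserve room in the backing array so that a known number of appends does not trigger repeated internal growth.

```java
import java.util.ArrayList;

class CapacityDemo {

    // Pre-sizing avoids repeated growth of the internal array
    // when the final size is known in advance.
    static ArrayList<Integer> squares(int count) {
        ArrayList<Integer> list = new ArrayList<>(count); // initial capacity
        for (int i = 0; i < count; i++) {
            list.add(i * i);
        }
        return list;
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = squares(5);
        list.ensureCapacity(1_000); // reserve room before a large batch of appends
        list.add(25);
        System.out.println(list); // prints [0, 1, 4, 9, 16, 25]
    }
}
```

Capacity is only a performance hint; size() still reports the number of elements actually added.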

Ways to fill a list in Java

I would like to know your opinions on which you find is a better approach to have a list filled up by a different method. I know there isn't a definite answer, but I would like to see reasonable pros and cons.
Approach 1.
private List<Snap> snapList;
snapList = getSnapList();
Approach 2.
private List<Snap> snapList = new ArrayList<Snap>();
fillSnapList(snapList);
Thanks,
Matyas
Why not follow the Java API's Collections class and make your fill methods static (if it makes sense and is independent of object state).
Collections.fill( mylist, 0 );
like
MyListFiller.fill( myList, args );
At any rate, creating a filler interface makes sense if the way the list is filled is likely to change. If you're not really "filling", but returning object state of some kind, just have the given method build the List and return it.
public List<Object> getMyStuff()
{
//build and return my stuff
}
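A sketch of both suggestions in one hypothetical utility class (String stands in for the question's Snap type). Unlike Collections.fill, which overwrites the elements a list already has, this made-up fill appends new ones; build shows the alternative of creating and returning the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical static utility in the style of java.util.Collections.
final class SnapListFiller {

    private SnapListFiller() {} // no instances

    // Appends `count` generated labels to the given list (mutates its argument).
    static void fill(List<String> target, int count) {
        for (int i = 0; i < count; i++) {
            target.add("snap-" + i);
        }
    }

    // Builds and returns a new list instead of filling one that is passed in.
    static List<String> build(int count) {
        List<String> result = new ArrayList<>();
        fill(result, count);
        return result;
    }

    public static void main(String[] args) {
        List<String> mine = new ArrayList<>();
        SnapListFiller.fill(mine, 2);
        System.out.println(mine);     // prints [snap-0, snap-1]
        System.out.println(build(1)); // prints [snap-0]
    }
}
```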
It depends on the situation.
A method like getSnapList() is appropriate in situations like the following:
The method you're writing doesn't want to care about where the list came from.
The method shouldn't know what kind of list it's getting - for example, if you want to change to using a LinkedList, then you can do it in getSnapList() instead of all the methods that call fillSnapList().
You will only ever want to fill new lists.
A method like fillSnapList() is appropriate in situations like the following:
You may want to fill the list more than one time.
You may want to vary the way the list is filled (i.e. what gets put into it).
You need to fill a list that someone else hands you.
You need to share the list among more than one class or object, and you might need to refill it at some point in its lifespan.
I like approach 1 better than approach 2, because the list is really the output of the method that you're calling. Approach 1 makes that more clear than approach 2.
Also, approach 1 gives the method the opportunity to return an unmodifiable list. You might want this if the list should be filled once and not modified later.
Java is not a functional programming language, but the first approach is more in the functional programming style than the second approach (in functional programming, immutability and avoiding mutable state are important ideas - and one important advantage of those is that they make concurrent programming easier, which is also useful in a non-functional programming language).
One con for the first option is that the method name you have chosen (getSnapList()) is often considered a simple accessor, i.e. it returns the reference to the field snapList. In your design, it is implied that you will be creating the list if it doesn't exist and filling it with data, which would introduce a side effect into the normal idiom.
Due to this and as it is better to be explicit, I prefer the second option.
I prefer approach #1 because the method can be overridden by a subclass that wants to use a different List implementation. Also, I think that naming a factory method as a getter is confusing; I would rather name it newSnapList() or createSnapList().
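A sketch of that design, with hypothetical class names and String standing in for Snap: the factory method is named createSnapList() and is protected, so a subclass can switch the List implementation without touching callers.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical base class; String stands in for the question's Snap type.
class SnapSource {

    // Factory method named as a creator, not a getter; subclasses may override it.
    protected List<String> createSnapList() {
        List<String> snaps = new ArrayList<>();
        snaps.add("snap-1");
        return snaps;
    }
}

// Hypothetical subclass that prefers a LinkedList.
class LinkedSnapSource extends SnapSource {

    @Override
    protected List<String> createSnapList() {
        List<String> snaps = new LinkedList<>(super.createSnapList());
        snaps.add("snap-2");
        return snaps;
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        List<String> snaps = new LinkedSnapSource().createSnapList();
        System.out.println(snaps.getClass().getSimpleName() + " " + snaps); // prints LinkedList [snap-1, snap-2]
    }
}
```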
