Automatically merge several collections to one - java

I have some Guava Functions like Function<String,Set<String>>. Using those with FluentIterable.transform() leads to a FluentIterable<Set<String>>, however I need a FluentIterable<String>. So my idea now would be to subclass FluentIterable<E> and add a new method transform2() which simply merges everything to one collection before returning it.
The original transform method looks like this:
public final <T> FluentIterable<T> transform(Function<? super E, T> function) {
return from(Iterables.transform(iterable, function));
}
I thought of something like this for my subclass and transform2() method:
public abstract class FluentIterable2<E> extends FluentIterable<E>
{
public final <T> FluentIterable<T> transform2(Function<? super E, Collection<T>> function) {
// (PROBLEM 1) Eclipse complains: The field FluentIterable<E>.iterable is not visible
Iterable<Collection<T>> iterables = Iterables.transform(iterable, function);
// (PROBLEM 2) Collection<T> merged = new Collection<T>(); // I need a container / collection - which one?
for(Collection<T> iterable : iterables)
{
// merged.addAll(iterable);
}
// return from(merged);
}
}
Currently I have two problems with my new subclass, marked above with PROBLEM 1 and PROBLEM 2
PROBLEM 1: The iterable field in the original FluentIterable class is private - what can I do about this? Can I create a new private field with the same name in my subclass, will this then be OK? What about methods in my subclass that call super.someMethod() which uses this field? Will they then use the field of the super class, which probably has a different value?
PROBLEM 2: I need some generic collection where I can combine the content of several collections, but Collection is an interface, so I can't instantiate it. So, which class can I use there?
It would be acceptable if the solution only works with sets, though I'd prefer a solution that works with sets and lists.
Thanks for any hint on this!

Does FluentIterable.transformAndConcat(stringToSetFunction) not work for your use case?

Why subclass FluentIterable just to do this? You just need a simple loop:
Set<String> union = Sets.newHashSet();
for (Set<String> set : fluentIterableOfSets) {
union.addAll(set);
}

Use FluentIterable.transformAndConcat(f), where f is a Function mapping an element to some kind of iterable over the element type.
In your case, let's say your Function<String, Set<String>> is called TOKENIZE, and your initial Iterable<String> is called LINES.
Then to get a Set<String> holding all the distinct tokens in LINES, do this:
Iterable<String> LINES = ...;
Function<String, Set<String>> TOKENIZE = ...;
Set<String> TOKENS = FluentIterable.from(LINES)
.transformAndConcat(TOKENIZE)
.toSet();
But consider JB Nizet's answer carefully. Try it both ways and see which works better.
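If Guava is not available, the same merge step can be sketched with the plain JDK. This is only a stand-in for what transformAndConcat does, not Guava's implementation; the class name FlattenSketch and the whitespace tokenizer are made up for illustration. It also shows one possible answer to PROBLEM 2: a LinkedHashSet (or an ArrayList, if duplicates should be kept) works as the container.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

final class FlattenSketch {
    // Applies f to every element and merges all resulting collections into one set.
    // f may return a Set, a List, or any other Collection.
    static <E, T> Set<T> flattenToSet(Iterable<E> input,
                                      Function<? super E, ? extends Collection<? extends T>> f) {
        Set<T> merged = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (E element : input) {
            merged.addAll(f.apply(element));
        }
        return merged;
    }

    // Hypothetical tokenizer standing in for the question's Function<String, Set<String>>.
    static Set<String> tokenize(String line) {
        return new LinkedHashSet<>(Arrays.asList(line.split("\\s+")));
    }
}
```

Calling FlattenSketch.flattenToSet(lines, FlattenSketch::tokenize) then gives one Set<String> with all distinct tokens.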

Related

Java: Create a Generic Method out of multiple methods

I came across a piece of code where two methods have very similar functionalities, return the same type, but are different.
private Set<String> extractDeviceInfo(List<Device> devices){
Set<String> sets= new HashSet<>();
for (Device item : devices) {
sets.add(item.getDeviceName());
}
return sets;
}
private Set<String> extractDeviceInfoFromCustomer(List<Customer> customers){
Set<String> sets= new HashSet<>();
for (Customer c : customers) {
sets.add(c.getDeviceName());
}
return sets;
}
As you can see from the code above, both methods are returning the same Set and retrieving the same data.
I'm trying to attempt to create a generic method out of it and did some research but couldn't find anything that could solve this issue.
If I understand this correctly, using generics, I can define generic parameters in the method and then pass parameters as well as the class type when calling the method. However, I am not sure what to do afterwards.
For example, how can I call getDeviceName() on a generic type when the compiler doesn't know whether that type has such a method?
I will really appreciate if someone could tell me whether this is possible and how to achieve the desired result.
Thanks
UPDATE: Creating an interface and then having implementations looks like a good solution, but it feels like overkill when the goal is just to refactor a couple of methods to avoid boilerplate.
I've noticed that Class objects can be passed as a parameter and that they have methods like getMethod() etc.
I was wondering if it was possible to create a generic method where you pass the class as well as the method name, and the method then resolves that at runtime,
eg.
private <T> Set<String> genericMethod(Class<T> clazz, String methodName ){
clazz.resolveMethod(methodName);
}
So basically, I could do this when calling the method:
genericMethod(Customer.class,"getDeviceName");
I believe there's one language where this was achievable but not sure if you can do it in Java, although, a few years back I remember reading about resolving string into java code so they get compiled at runtime.
Both Device and Customer should implement the same interface where the method getDeviceName is defined:
interface Marker {
String getDeviceName();
}
class Device implements Marker { ... }
class Customer implements Marker { ... }
I named it Marker, but it's up to you to name it reasonably. Then, the method might look like:
private Set<String> extractDeviceInfo(List<? extends Marker> markers) {
return markers.stream().map(Marker::getDeviceName).collect(Collectors.toSet());
}
It allows the next type variations:
extractDeviceInfo(new ArrayList<Device>());
extractDeviceInfo(new ArrayList<Customer>());
extractDeviceInfo(new ArrayList<Marker>());
99% of the time, Andrew's answer is the solution. But another approach is to pass the extracting function as a parameter.
This can be useful for some reporting or if you need to be able to extract values from an instance in multiple ways using the same method.
public static <T, U> Set<U> extractInfo(List<T> data, Function<T, U> function){
return data.stream().map(function).collect(Collectors.toSet());
}
Example :
public class Dummy{
private String a;
private long b;
public Dummy(String a, long b){ this.a = a; this.b = b; }
public String getA(){return a; }
public long getB(){return b; }
}
List<Dummy> list = new ArrayList<>();
list.add(new Dummy("A1", 1));
list.add(new Dummy("A2", 2));
list.add(new Dummy("A3", 3));
Set<String> setA = extractInfo(list, Dummy::getA); // A1, A2, A3
Set<Long> setB = extractInfo(list, Dummy::getB); // 1, 2, 3
Using reflection in Java comes with a performance hit. In your case, it's probably not worth it.
There is nothing wrong with your original code. If there are fewer than 3 places using it, DO NOT refactor. If there are more than 3 places and you expect more to come, you can refactor using @andrew's method.
In my opinion, you should not refactor code just for the sake of refactoring.
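For completeness, the reflective variant the question asks about (pass a class and a method name, resolve at runtime) could look roughly like the sketch below. The class name ReflectiveExtractor and the method name extract are invented here; Class.getMethod and Method.invoke are the standard JDK reflection calls. Note that a typo in the method name is only caught at runtime, which is one more reason to prefer the interface or Function approaches.

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReflectiveExtractor {
    // Looks up a public zero-argument method by name and calls it on every item.
    static <T> Set<String> extract(List<? extends T> items, Class<T> type, String getterName) {
        try {
            Method getter = type.getMethod(getterName);
            Set<String> result = new HashSet<>();
            for (T item : items) {
                // String.valueOf turns whatever the getter returns into a String
                result.add(String.valueOf(getter.invoke(item)));
            }
            return result;
        } catch (ReflectiveOperationException e) {
            // Covers a missing method, an inaccessible method, or an exception thrown by it
            throw new IllegalArgumentException(
                    "Cannot call " + getterName + " on " + type.getName(), e);
        }
    }
}
```

With the classes from the question, a call would be something like extract(customers, Customer.class, "getDeviceName").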

Typing a generic type but not its own type in Java Generics

I need to type method signature so it accepts 2 equally typed parameters of different particular concrete subtypes.
Is it possible to code something like this with generics? How would you solve it? (The case is absolutely an example)
public <T extends List<?>> T<String> sum(T<Integer> sublistOfInts, T<Boolean> sublistOfBooleans){
/*fusion both lists*/
return sublistOfStrings;
}
EDIT: In the end, what I am looking for is a way for the compiler to pass:
ArrayList<String> myList = sum(new ArrayList<Integer>(), new ArrayList<Boolean>());
but not:
ArrayList<String> myList = sum(new ArrayList<Double>(), new ArrayList<Boolean>());
nor
ArrayList<String> myList = sum(new LinkedList<Integer>(), new ArrayList<Boolean>());
(...)
EDIT 2: I found a better example. Imagine an interface Tuple, with child classes Duple, Triple, ...; it would be perfectly nice to have something like
<T extends Tuple<?>> T<String> reset( T<String> input, T<Boolean> listToNull){
T copy = input.copy();
for (int i=0; i<input.size();i++){
if (listToNull.get(i)){
copy.set(i,null);
}
}
}
What I suggest you do instead
First, get rid of the method argument generics. There's no reason to force a caller to provide ArrayList<Integer> and ArrayList<Boolean> when you want to return an ArrayList<String>. Just accept any List<Integer> and List<Boolean>, and leave it to your method to turn them into the appropriate return List.
Since you know that you want to return some sort of List of String you can write your parameter as <T extends List<String>> and your return type as simply T.
That leaves us with the hard part: getting your method to instantiate an object of unknown type. That's hard. You can't just do new T();. You need to invoke something that will produce a T on your behalf. Luckily, Java 8 provides a Functional Interface for Supplier<T>. You just need to invoke the get() method to get your ArrayList<String> or whatever else you might want. The part that's painful is that your invoker needs to provide their own Supplier. But I think that's as good as it gets in Java 8.
Here's the code:
public <T extends List<String>> T sum(
List<Integer> sublistOfInts,
List<Boolean> sublistOfBooleans,
Supplier<T> listMaker) {
T sublistOfStrings = listMaker.get();
/*fusion of both lists*/
return sublistOfStrings;
}
At least this compiles:
ArrayList<String> myNewList = thing.<ArrayList<String>>sum(intList, boolList, ArrayList::new);
And this does not:
ArrayList<String> myNewList = thing.<ArrayList<String>>sum(intList, boolList, LinkedList::new);
You can even leave off the type parameter on the invocation. This compiles:
ArrayList<String> myNewList = thing.sum(intList, boolList, ArrayList::new);
And this does not:
ArrayList<String> myNewList = thing.sum(intList, boolList, LinkedList::new);
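Here is a self-contained sketch of the Supplier-based signature with the "fusion" step filled in. What the fusion should actually do is not stated in the question, so appending the string form of every element is purely an assumption, as is the class name SupplierSum.

```java
import java.util.List;
import java.util.function.Supplier;

final class SupplierSum {
    // The caller picks the concrete List type through listMaker; the compiler
    // ties the return type T to whatever that supplier produces.
    static <T extends List<String>> T sum(List<Integer> sublistOfInts,
                                          List<Boolean> sublistOfBooleans,
                                          Supplier<T> listMaker) {
        T sublistOfStrings = listMaker.get();
        // Assumed fusion: append the string form of each element, ints first.
        for (Integer i : sublistOfInts) {
            sublistOfStrings.add(String.valueOf(i));
        }
        for (Boolean b : sublistOfBooleans) {
            sublistOfStrings.add(String.valueOf(b));
        }
        return sublistOfStrings;
    }
}
```

Passing ArrayList::new yields an ArrayList<String>, and passing LinkedList::new yields a LinkedList<String>, without any cast at the call site.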
Why you can't just do what you're asking
In brief, it's because type arguments can't themselves be parameterized. And that's because we don't know how many type arguments they themselves would take, nor the restrictions that might be placed on them.
Take the relatively obscure class RoleList. It extends ArrayList<Object>, so it fits List<?>. But it doesn't take a type argument at all. So if someone invoked your sum() method with RoleList, that would require in your example:
RoleList<Integer> intList = // something
RoleList<Boolean> boolList = // something
RoleList<String> myNewList = thing.sum(intList, boolList);
That clearly can't work since it requires an unparameterized type to take type arguments. And if you took off the type arguments like so:
RoleList intList = // something
RoleList boolList = // something
RoleList myNewList = thing.sum(intList, boolList);
Then your method needs to be able to accept two List<Object> arguments and return a value of List<Object>. And that violates your basic premise, that you be able to control such things.
In reality, RoleList should not be allowed here at all, because you can't ever guarantee that one instance will contain only Integers, another only Booleans, and a third only Strings. A compiler that allowed RoleList here would necessarily have weaker type checking than we have now.
So the bottom line is that you just can't do what you're asking because Java just isn't built that way.
Why that's ok
You can still get complete type safety inside your sum() method using my suggested method, above. You make sure that the incoming Lists contain only Integer or Boolean values, respectively. You make sure that the caller can rely on the return of a specific subtype of List containing only String values. All of the guarantees that make a difference are there.
There are two things that strike me about the above. How are you instantiating sublistOfStrings, and what advantages do you expect to get above using plain old inheritance?
There are a couple of ways of instantiating T<String>. You could have a factory check the class of your arguments, and instantiate it based on that. Or you could do something like:
(List<String>)sublistOfInts.getClass().newInstance()
But you can't just go new T<String>(). So you're basing the implementation of your return type off of the type of one of your arguments anyway (unless there's a way I haven't thought of).
Also, specifying that both arguments are of type T doesn't mean they have exactly the same concrete type. For instance:
sum((int)1, (long)2L); // valid
sum((int)2, (double)2.0D); // valid ... etc
public <T extends Number> T sum(T a, T b) {
return a;
}
So you aren't enforcing that sublistOfInts and sublistOfBooleans are both of type say ArrayList, and therefore you can return an ArrayList. You still need to write code to check what type of List<?> you'll want to return based on the arguments.
I think you're better off not using generics, and using something like this:
public List<String> sum(List<Integer> sublistOfInts, List<Boolean> sublistOfBooleans) {
// Determine what subclass of list you want to instantiate based on `sublistOfInts` and `sublistOfBools`
// Call factory method or newInstance to instantiate it.
// Sum, and return.
}
You can still call it with subtypes of List<?>. I don't believe there's any advantage you could get from generics even if Java did let you do it (which it doesn't, because it can't parameterize T like that).
I know what you have is just an example but if you only want to return a single list that contains the String value of all the contents in a group of other lists you could just specify a method that takes a varargs of unbounded lists.
public List<String> sum(List<?>... lists) {
List<String> sublistOfStrings = new ArrayList<String>();
for(List<?> list : lists) {
for(Object obj : list) {
sublistOfStrings.add(obj.toString());
}
}
return sublistOfStrings;
}

Method to convert an Arraylist into a Set

I coded the following method to convert my Arraylist into a set:
public static Set<Animal> toSet(){
Set<Animal> aniSet = new HashSet<Animal>(animals);
return aniSet;
}
I would like to do this instead :
public static Set<Animal> toSet(){
return HashSet<Animal>(animals);
}
Why do I get an error message that says it cannot find variable HashSet? Do I need to store a variable first?
EDIT : had to add new before my Hashset. Coding makes me feel so dumb :')
There are two problems with this code:
You forget that the animals have to come from somewhere; I don't think the first example compiles either; and
you forgot to use new when creating a new HashSet<Animal>.
This is probably the intended behavior:
public static <T> Set<T> toSet(Collection<? extends T> data){
return new HashSet<T>(data);
}
You can then call it with:
ArrayList<Animal> animals = new ArrayList<>();
//do something with the animals list
//...
Set<Animal> theSet = Foo.<Animal>toSet(animals);
By using a generic static method, you can call it with any type you like. By using Collection<? extends T> you are furthermore not limited to an ArrayList<T>, but can use any kind of Collection (LinkedList, HashSet, TreeSet, ...). Finally, the element type of that collection does not even have to be Animal itself. You could convert an ArrayList<Cat> into a HashSet<Animal>.
Note however that there is not much use in this method: calling it is not much shorter than using the constructor directly. The only real advantage I see is that you encapsulate which Set<T> you are going to use, such that if you later change your mind to TreeSet<T> all methods calling this toSet method will generate a TreeSet<T> instead of a HashSet<T>.
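A small sketch of the Cat-to-Animal conversion mentioned above. Animal and Cat are empty placeholder classes, and ToSetSketch is a made-up holder class; only the toSet method is taken from the answer.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class ToSetSketch {
    // Placeholder types for illustration only
    static class Animal { }
    static class Cat extends Animal { }

    // Accepts a collection of T or of any subtype of T
    static <T> Set<T> toSet(Collection<? extends T> data) {
        return new HashSet<T>(data);
    }
}
```

Given a List<ToSetSketch.Cat> cats, the assignment Set<ToSetSketch.Animal> animals = ToSetSketch.toSet(cats); compiles because the wildcard lets T be inferred as Animal.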

OO pattern for dividing a collection into groups

I have a list of MyObjects which I need to divide into three groups:
Known good (keep)
Known bad (reject)
Unrecognized (raise alert)
MyObject contains various properties which must be examined to determine which of the 3 groups to put the object in.
My initial implementation (Java) just takes a List in its constructor and does the triage there. Pseudocode:
class MyObjectFilterer {
public MyObjectFilterer(List<MyObject> list) {
// triage items here
}
public List<MyObject> getGood() {
// return sub-list of good items
}
public List<MyObject> getBad() {
// return sub-list of bad items
}
public List<MyObject> getUnrecognized() {
// return sub-list of unrecognized items
}
}
Any issues with this implementation? Is there a better OO choice?
I would probably prefer a static factory method to do the filtering, that then calls a private constructor that takes the three filtered lists, following the good code practice of never doing any serious work in a constructor. Other than that, this looks fine.
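A rough sketch of that shape, with String standing in for MyObject and an invented prefix rule standing in for the real classification logic (both are assumptions, as is the class name Triage):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Triage {
    private final List<String> good;
    private final List<String> bad;
    private final List<String> unrecognized;

    // The constructor only stores already-filtered lists.
    private Triage(List<String> good, List<String> bad, List<String> unrecognized) {
        this.good = Collections.unmodifiableList(good);
        this.bad = Collections.unmodifiableList(bad);
        this.unrecognized = Collections.unmodifiableList(unrecognized);
    }

    // Static factory: all of the triage work happens here.
    static Triage of(List<String> items) {
        List<String> good = new ArrayList<>();
        List<String> bad = new ArrayList<>();
        List<String> unrecognized = new ArrayList<>();
        for (String item : items) {
            if (item.startsWith("ok")) {          // hypothetical "known good" rule
                good.add(item);
            } else if (item.startsWith("bad")) {  // hypothetical "known bad" rule
                bad.add(item);
            } else {
                unrecognized.add(item);
            }
        }
        return new Triage(good, bad, unrecognized);
    }

    List<String> getGood() { return good; }
    List<String> getBad() { return bad; }
    List<String> getUnrecognized() { return unrecognized; }
}
```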
There may be multiple approaches. If the problem is generic / repetitive enough, you could define an interface with a method to classify the objects.
interface Selector {
public boolean isGood(MyObject myObject);
public boolean isBad(MyObject myObject);
public boolean isUnknown(MyObject myObject);
}
That way you could change the logic implementation easily.
Another idea would be to use the Chain of Responsibility pattern.
Your MyObjectFilterer contains a reference to three objects: GoodFilterer, BadFilterer and UnrecognizedFilterer. Each of them has the following methods: add(MyObject object), getObjects() and addFilter(). Of course they have to implement a common interface, Filterer.
With the addFilter method you can build the chain, so that the GoodFilterer contains a reference to the BadFilterer, and this one contains a reference to the UnrecognizedFilterer.
Now you go through your list of MyObjects and call the add method on the GoodFilterer (the first one in the chain). Inside the add method you decide whether the object is good. If it is, you keep it and the work is done; if not, you pass it on to the BadFilterer.
You keep your three methods for getting the good, bad and unrecognized objects, but each one delegates to the getObjects() method of the corresponding Filterer.
The benefit is that the logic deciding whether an object is good, bad or unrecognized is now separated.
The downside is that you need 3 new classes and 1 interface.
But like I said, this is just another idea for what you could do.
You should simplify as much as possible. Just make a static method in MyObjectFilter with the following signature:
public static List filterMyObjects(List data, Group group)
Group is an enumeration with three values, and it can be used as an attribute of the MyObject class.
I might try something like:
enum MyObjectStatus {
GOOD, BAD, UNRECOGNIZED;
}
class MyObjectFilterer {
private MyObjectStatus getStatus(MyObject obj) {
// classify logic here, returns appropriate enum value
}
// ListMultimap return type below is from Google Guava
public ListMultimap<MyObjectStatus, MyObject> classify(List<MyObject> objects) {
ListMultimap<MyObjectStatus, MyObject> map = ArrayListMultimap.create();
for(MyObject obj: objects) {
map.put(getStatus(obj), obj);
}
return map;
}
}
Call classify() to get a Multimap, and extract each category as needed with something like:
List<MyObject> good = map.get(GOOD);
List<MyObject> bad = map.get(BAD);
List<MyObject> unknown = map.get(UNRECOGNIZED);
A nice thing about this solution is you don't have to create/publish accessor methods for each category (unless you want to), and if new categories are created, you also don't add new accessors -- just the new enum and the additional classifier logic.
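If you'd rather not pull in Guava, a similar grouping can be sketched with the JDK's Collectors.groupingBy. Integer stands in for MyObject here and the sign-based rule in statusOf is invented for illustration. One difference to keep in mind: a plain Map returns null for a category with no entries, whereas a Guava Multimap returns an empty collection.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class StatusGrouping {
    enum Status { GOOD, BAD, UNRECOGNIZED }

    // Hypothetical classification rule: positive is good, negative is bad.
    static Status statusOf(int value) {
        if (value > 0) {
            return Status.GOOD;
        }
        if (value < 0) {
            return Status.BAD;
        }
        return Status.UNRECOGNIZED;
    }

    // Groups the values by status into an EnumMap, preserving encounter order per group.
    static Map<Status, List<Integer>> classify(List<Integer> values) {
        return values.stream().collect(
                Collectors.groupingBy(StatusGrouping::statusOf,
                                      () -> new EnumMap<>(Status.class),
                                      Collectors.toList()));
    }
}
```

Use map.getOrDefault(status, Collections.emptyList()) when reading a category that might be empty.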

How do I perform an action on each element of a List and return the result (without affecting the original of course)?

How do I write a static method in Java that will take a List, perform an action on each element, and return the result (without affecting the original of course)?
For example, if I want to add 2 to each element what goes in the ... here? The concrete return type must be the same, e.g. if my List is a LinkedList with values 1,2,3 I should get back a LinkedList with values 3,4,5. Similarly for ArrayList, Vector, Stack etc, which are all Lists.
I can see how to do this using multiple if (lst instanceof LinkedList) ... etc... any better way?
import java.util.List;
public class ListAdd {
static List<Integer> add2 (List<Integer> lst) {
...
return result;
}
}
There are already many answers, but I'd like to show you a different way to think of this problem.
The operation you want to perform is known as map in the world of functional programming. It is something we do really all the time in functional languages.
Let M<A> be some kind of container (in your case, M would be List, and A would be Integer; however, the container can be lots of other things). Suppose you have a function that transforms As into Bs, that is, f: A -> B. Let's write this function as of type F<A, B>, to use a notation closer to Java. Note that you can have A = B, as in the example you give (in which A = B = Integer).
Then, the operation map is defined as follows:
M<B> map(M<A>, F<A, B>)
That is, the operation will return a M<B>, presumably by applying F<A, B> to each A in M<A>.
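To make the definition concrete, here is a minimal eager version of map for lists in plain Java, using java.util.function.Function in the role of F<A, B>. The class name MapSketch is made up, and returning an ArrayList is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class MapSketch {
    // M<B> map(M<A>, F<A, B>) with M fixed to List
    static <A, B> List<B> map(List<A> input, Function<? super A, ? extends B> f) {
        List<B> result = new ArrayList<>(input.size());
        for (A a : input) {
            result.add(f.apply(a)); // the input list is only read, never modified
        }
        return result;
    }
}
```

Adding 2 to each element is then MapSketch.map(ns, n -> n + 2).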
In practice...
There's a brilliant library developed by Google, called Guava, which brings lots of functional idioms to Java.
In Guava, the map operation is called transform, and it can operate on any Iterable. It has also more specific implementations that work directly on lists, sets, etc.
Using Guava, the code you want to write would look like this:
static List<Integer> add2(List<Integer> ns) {
return Lists.transform(ns, new Function<Integer, Integer>() {
@Override public Integer apply(Integer n) { return n + 2; }
});
}
Simple as that.
This code won't touch the original list, it will simply provide a new list that calculates its values as needed (that is, the values of the newly created list won't be calculated unless needed -- it's called a lazy operation).
As a final consideration, it is not possible for you to be absolutely sure that you will be able to return exactly the same implementation of List. And as many others pointed out, unless there's a very specific reason for this, you shouldn't really care. That's why List is an interface, you don't care about the implementation.
Fundamentally, the List interface doesn't make any guarantees that you'll have a way to duplicate it.
You may have some luck with various techniques:
Using clone() on the passed in List, although it may throw, or (since it is protected in Object) simply not be accessible
Use reflection to look for a public no-argument constructor on the passed-in List
Try to serialize and deserialize it in order to perform a "deep clone"
Create some sort of factory and build in knowledge of how to duplicate each different kind of List your code may encounter (What if it's a wrapper created by unmodifiableList(), or some oddball custom implementation backed by a RandomAccessFile?)
If all else fails, either throw, or return an ArrayList or a Vector for lack of better options
You could use reflection to look for a public zero-arg constructor on the result of lst.getClass() and then call newInstance() on it to obtain the List into which you'll place your results. The Java Collections Framework recommends that any general-purpose Collection implementation offer a zero-arg constructor. That way, your results will be of the same runtime class as the argument.
Here is a variant which neither copies nor modifies the original list. Instead, it wraps the original list in another object.
public List<Integer> add2(final List<Integer> lst) {
return new AbstractList<Integer>() {
public int size() {
return lst.size();
}
public Integer get(int index) {
return 2 + lst.get(index);
}
};
}
The returned list is not modifiable, but will change whenever the original list changes.
(This implements the iterator on top of index access, so it will be slow for a linked list. In that case it is better to base the wrapper on AbstractSequentialList.)
Of course, the resulting list will not be of the same class as the original list.
Use this solution only if you really only need a read-only "plus two" view of your original list, not if you want a modified copy with similar properties.
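For the linked-list case, a sketch of the same view built on AbstractSequentialList might look like this. It delegates to the backing list's own ListIterator, so no index-based access happens during iteration. The class and method names are made up, and the mutating iterator methods are deliberately unsupported to keep the view read-only.

```java
import java.util.AbstractSequentialList;
import java.util.List;
import java.util.ListIterator;

final class PlusTwoView {
    static List<Integer> add2View(final List<Integer> lst) {
        return new AbstractSequentialList<Integer>() {
            @Override public int size() {
                return lst.size();
            }

            @Override public ListIterator<Integer> listIterator(int index) {
                final ListIterator<Integer> it = lst.listIterator(index);
                return new ListIterator<Integer>() {
                    @Override public boolean hasNext() { return it.hasNext(); }
                    @Override public Integer next() { return it.next() + 2; }
                    @Override public boolean hasPrevious() { return it.hasPrevious(); }
                    @Override public Integer previous() { return it.previous() + 2; }
                    @Override public int nextIndex() { return it.nextIndex(); }
                    @Override public int previousIndex() { return it.previousIndex(); }
                    // Read-only view: all mutating operations are rejected.
                    @Override public void remove() { throw new UnsupportedOperationException(); }
                    @Override public void set(Integer e) { throw new UnsupportedOperationException(); }
                    @Override public void add(Integer e) { throw new UnsupportedOperationException(); }
                };
            }
        };
    }
}
```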
The whole point of using an interface, in this case List, is to abstract the fact that the implementation is hidden behind the interface.
Your intention is clear to me, however: the Cloneable interface supports creating a new instance with the same state. This interface might not be implemented by your List.
Often it's a good idea to rethink this situation: why do you need to clone the List in this place, this class? Shouldn't your list-creator be responsible for cloning the list? Or shouldn't the caller, who knows the type, make sure he passes in a clone of his list?
Probably, if you look for the semantics as you defined it, you can implement all your supported Lists:
static Vector<Integer> addTwo(Vector<Integer> vector) {
Vector<Integer> copy = null; // TODO: copy the vector
return addTwo_mutable(copy);
}
static ArrayList<Integer> addTwo(ArrayList<Integer> aList) {
ArrayList<Integer> copy = null; // TODO: copy the array list
return addTwo_mutable(copy);
}
static LinkedList<Integer> addTwo(LinkedList<Integer> lList) {
LinkedList<Integer> copy = null; // TODO: copy the linked list
return addTwo_mutable(copy);
}
private static <T extends List<Integer>> T addTwo_mutable(T list) {
return list; // TODO: implement
}
And when you don't support a data type, you'll even get a nice compiler error saying that the specified method does not exist.
(code not tested)
Just to show you that what you want to do is not possible in the general case, consider the following class:
final class MyList extends ArrayList<Integer> {
private MyList() {
super.add(1);
super.add(2);
super.add(3);
}
private static class SingletonHolder {
private static final MyList instance = new MyList();
}
public static MyList getInstance() {
return SingletonHolder.instance;
}
}
It is a singleton (also a lazy, thread-safe singleton, by the way); its only instance can be obtained from MyList.getInstance(). You cannot use reflection reliably (because the constructor is private; for you to use reflection, you'd have to rely on proprietary, non-standard, non-portable APIs, or on code that could break due to a SecurityManager). So, there's no way for you to return a new instance of this list, with different values.
It's final as well, so that you cannot return a child of it.
Also, it would be possible to override every method of ArrayList that would modify the list, so that it would be really an immutable singleton.
Now, why would you want to return the exact same implementation of List?
OK well someone mentioned reflection. It seems to be an elegant solution:
import java.util.*;
public class ListAdd {
static List<Integer> add2 (List<Integer> lst) throws Exception {
List<Integer> result = lst.getClass().newInstance();
for (Integer i : lst) result.add(i + 2);
return result;
}
}
Concise, but it throws a checked exception, which is not nice.
Also, wouldn't it be nicer if we could use the method on concrete types as well, e.g. if a is an ArrayList with values 1, 2, 3, we could call add2(a) and get an ArrayList back? So in an improved version, we could make the signature generic:
static <T extends List<Integer>> T add2 (T lst) {
T res;
try {
res = (T) lst.getClass().newInstance();
} catch (InstantiationException e) {
throw new IllegalArgumentException(e);
} catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
for (Integer i : lst) res.add(i + 2);
return res;
}
I think throwing a runtime exception is the least bad option if a list without a nullary constructor is passed in. I don't see a way to ensure that it has one. (Java 8 type annotations to the rescue maybe?) Returning null would be kind of useless.
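On newer JDKs, Class.newInstance() is deprecated (since Java 9) in favour of going through a Constructor object. A sketch of the same generic method using getConstructor().newInstance() follows; the class name ReflectiveAdd2 is made up, and wrapping every reflective failure in IllegalArgumentException is just one possible choice.

```java
import java.util.List;

final class ReflectiveAdd2 {
    static <T extends List<Integer>> T add2(T lst) {
        T res;
        try {
            // getConstructor() only finds a public no-arg constructor
            @SuppressWarnings("unchecked")
            T fresh = (T) lst.getClass().getConstructor().newInstance();
            res = fresh;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "No usable public no-arg constructor on " + lst.getClass().getName(), e);
        }
        for (Integer i : lst) {
            res.add(i + 2);
        }
        return res;
    }
}
```

The caveats discussed above still apply: a list class without a public nullary constructor ends up in the exception branch.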
The downside of using this signature is that we can't return an ArrayList etc as the default, as we could have done as an alternative to throwing an exception, since the return type is guaranteed to be the same type as that passed in. However, if the user actually wants an ArrayList (or some other default type) back, he can make an ArrayList copy and use the method on that.
If anyone with API design experience reads this, I would be interested to know your thoughts on which is the preferable option: 1) returning a List that needs to be explicitly cast back into the original type, but enabling a return of a different concrete type, or 2) ensuring the return type is the same (using generics), but risking exceptions if (for example) a singleton object without a nullary constructor is passed in?
