Share values between several transformations? - java

Imagine I have the following List of values:
List<String> values = Lists.newArrayList("a", "a", "b", "c");
Now I want to add an index to all values, so that one ends up with this as list:
a1 a2 b1 c1 // imagine numbers as subscript
I want to use a FluentIterable and its transform method for that, so something like this:
from(values).transform(addIndexFunction);
The problem is that addIndexFunction needs to know how often the index has already been increased. Think of a2: when adding the index to this a, the function needs to know that there was already an a1.
So, is there some kind of best practice for doing such a thing? My current idea is to create a Map with each letter as key, so:
Map<String,Integer> counters = new HashMap<>();
// the following should be generated automatically, but for the sake of this example it's done manually...
counters.put("a", 0);
counters.put("b", 0);
counters.put("c", 0);
and then modify my transform call:
from(values).transform(addIndexFunction(counters));
As Map is an object and passed by reference, I can now share the counter state between the transformations, right? Feedback, better ideas? Is there some built-in mechanism for such things in Guava?
Thanks for any hint!

Use a Multiset to replace the HashMap and you're good to go, following @Perception's suggestion to encapsulate the Multiset in the Function itself and aggregate data as the function is applied.

Don't use transform here, or your iterable will have different values every time you iterate over it, and will generally behave very weirdly. (It's also somewhat frowned upon to have state in a Function.)
Instead, do a proper for loop with a Multiset helper:
Multiset<String> counts = HashMultiset.create();
List<Subscript> result = Lists.newArrayList();
for (String value : values) {
    // Multiset.add(element, occurrences) returns the count BEFORE the add, so add 1 to start at 1
    int count = counts.add(value, 1) + 1;
    result.add(new Subscript(value, count));
}
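For comparison, here is a minimal sketch of the same loop using only the JDK, assuming Guava's Multiset is not available. The class name IndexedValues and the helper addIndexes are hypothetical, and the result is a plain String such as "a1" rather than the Subscript type used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexedValues {
    // Appends a running per-value occurrence number to each element.
    static List<String> addIndexes(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String value : values) {
            // merge returns the updated count, so the first occurrence yields 1
            int count = counts.merge(value, 1, Integer::sum);
            result.add(value + count);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a1, a2, b1, c1]
        System.out.println(addIndexes(Arrays.asList("a", "a", "b", "c")));
    }
}
```

Because the counting happens once in an eager loop, iterating over the result repeatedly always gives the same values, which is the problem with a stateful transform.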

Related

Using removeAll() or something else to compare two lists

I have two lists, let's call them list A and list B. Both of these lists contain names and there are no duplicates (they are unique values). Every name in list B can be found in list A. I want to find which names are missing from list B in order to insert those missing names into a database. Basic example:
List<String> a = new ArrayList<>(Arrays.asList("name1", "name2", "name3", "name4","name5","name6"));
List<String> b = new ArrayList<>(Arrays.asList("name1", "name2", "name4", "name6"));
a.removeAll(b);
//iterate through and insert into my database here
From what I've searched removeAll() seems to be a go-to answer. In my case, I am dealing with a wide range of possible quantities. It could be anywhere between 500 to 50,000 names. Will removeAll() suffice for this? I've read that removeAll() is O(n^2) which may not be a problem with very small quantities, but with larger quantities, it sounds like it could be. I'd imagine it also depends on the user's patience as to when it would be considered a problem? Ultimately I'm wondering if there is a better way to do this without adding a huge amount of complexity as I do appreciate simplicity (to a point).
If the only thing you're doing with these lists is inserting them into a database, you shouldn't really care about the order of the elements. You could use HashSets instead of ArrayLists and get O(n) performance instead of O(n^2). As a side bonus, using a Set will ensure the values in a and b are really unique.
50000 is a very small amount of data. Unless you're doing this repeatedly, anything reasonable would likely be good enough.
One way to implement this:
List<String> a = new ArrayList<>(Arrays.asList("name1", "name2", "name3", "name4", "name5", "name6"));
Set<String> b = new HashSet<>(Arrays.asList("name1", "name2", "name4", "name6"));
for (String s : a) {
    if (!b.contains(s)) {
        insertToDb(s);
    }
}
or using the Stream API in Java 8:
List<String> result = a.stream().filter(s -> !b.contains(s)).collect(Collectors.toList());
// alternatively: Set<String> result = a.stream().filter(s -> !b.contains(s)).collect(Collectors.toSet());
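Put together as a self-contained sketch (the class name MissingNames and the helper missingFrom are hypothetical, and the database insert is left out), the HashSet lookup could look like this:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class MissingNames {
    // Returns the names in 'all' that do not occur in 'present', keeping the order of 'all'.
    static List<String> missingFrom(List<String> all, Collection<String> present) {
        // Copy into a HashSet so each contains() check takes expected constant time
        Set<String> lookup = new HashSet<>(present);
        return all.stream()
                .filter(name -> !lookup.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("name1", "name2", "name3", "name4", "name5", "name6");
        List<String> b = Arrays.asList("name1", "name2", "name4", "name6");
        // prints [name3, name5]
        System.out.println(missingFrom(a, b));
    }
}
```

Unlike a.removeAll(b), this leaves both input lists unmodified.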

String Array as Key of HashMap

I need to solve two problems for our project. (1) I have to find a way to keep an array (String[] or int[]) as a key of a Map. The requirement is that if the contents of two arrays are equal (String[] a={"A","B"}, String[] b={"B","A"}), then they should be considered equal/same keys, i.e., if I use a or b as a key of the Map, then a.equals(b) should be true.
I found that Java Sets add up the hash codes of all the objects stored in them. This makes it possible to compare two HashSets to see whether they are equal, which means this mechanism allows comparing two Java Sets based on their contents.
So for the above problem I can use Sets as keys of the Map, but I want to use arrays as keys. Any ideas for this?
(2) Next, we are interested in an efficient partial key matching mechanism, for instance to see whether any key in the Map contains a portion of the array, i.e., something like Key.contains(new String[]{"A"}).
Please share your ideas or any alternative ways of doing this. I am concerned with space- and time-optimal implementations, as this will be used in data stream processing projects, so space and time really are an issue.
Q1 - You can't use bare arrays as HashMap keys if you want key equality based on the array elements. Arrays inherit equals(Object) and hashCode() implementations from java.lang.Object, and they are based on object identity, not array contents.
The best alternative I can think of is to wrap the arrays as (immutable) lists.
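A minimal sketch of the wrapping idea follows; the class ArrayKeys and the helpers listKey and setKey are hypothetical names. Note that a List key compares element by element in order, so {"A","B"} and {"B","A"} are different list keys. The Set variant is an addition, not part of the suggestion above, for the case where order should be ignored as the question requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ArrayKeys {
    // Order-sensitive key: equal only if the same elements appear in the same order.
    static List<String> listKey(String[] array) {
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(array)));
    }

    // Order-insensitive key (assumption: duplicates inside an array do not matter).
    static Set<String> setKey(String[] array) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(array)));
    }

    public static void main(String[] args) {
        String[] a = {"A", "B"};
        String[] b = {"B", "A"};

        Map<List<String>, String> byList = new HashMap<>();
        byList.put(listKey(a), "value");
        System.out.println(byList.get(listKey(b))); // prints null: order differs

        Map<Set<String>, String> bySet = new HashMap<>();
        bySet.put(setKey(a), "value");
        System.out.println(bySet.get(setKey(b))); // prints value: same contents
    }
}
```

Both helpers copy the array, so later changes to the array cannot corrupt a key that is already stored in the map.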
Q2 - I don't think there is a simple efficient way to do this. The best I can think of are:
Extract all possible subarrays of each array and make each one an alternative key in the hash table. The problem is that the keys will take O(N M^2) space, where M is the average (?) number of strings in the primary key String[]s. Lookup will still be O(1).
Build an inverted index that gives the location of each string in all of the keys, then do a "phrase search" for a sequence of strings in the key space. That should scale better in terms of space usage, but lookup is a lot more expensive. And it is complicated.
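As a rough illustration of the inverted-index idea, here is a simplified sketch. It only records which keys contain each string, not the position of the string inside the key, so it answers "which keys contain all of these strings" and ignores order; a true phrase search would also need to store positions. The class name InvertedIndex and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    private final List<String[]> keys = new ArrayList<>();
    // Maps each string to the ids of all keys that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Registers a key and returns its id (its position in the key list).
    int addKey(String[] key) {
        int id = keys.size();
        keys.add(key);
        for (String s : key) {
            postings.computeIfAbsent(s, k -> new HashSet<>()).add(id);
        }
        return id;
    }

    // Returns the ids of all keys that contain every string in 'part'.
    Set<Integer> keysContaining(String[] part) {
        Set<Integer> result = null;
        for (String s : part) {
            Set<Integer> ids = postings.getOrDefault(s, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids); // intersect with the keys seen so far
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addKey(new String[]{"A", "B"});
        index.addKey(new String[]{"B", "C"});
        index.addKey(new String[]{"C"});
        System.out.println(index.keysContaining(new String[]{"B"}));      // prints [0, 1]
        System.out.println(index.keysContaining(new String[]{"B", "C"})); // prints [1]
    }
}
```

Space grows with the total number of strings across all keys rather than with the number of subarrays, at the cost of a set intersection per lookup.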
I tried using lambda expressions in Java 8 to solve your problem.
For Problem 1:
String[] arr1 = {"A","B","A","C","D"};
// the LinkedHashSet drops duplicates while keeping the first-seen order
List<String> list1 = new ArrayList<String>(new LinkedHashSet<>(Arrays.asList(arr1)));
list1.stream().forEach(x -> System.out.println(x));
If you would like to check whether two such lists are equal, I suggest sorting them first and then comparing.
Of course, it's much better to use a Set and its hashCode for the comparison.
For Problem 2 (some variables from above are reused):
String[] arr2 = {"A"};
List<String> list2 = new ArrayList<String>(Arrays.asList(arr2)); // assume list2's elements are also unique
int numOfKeyContain = (int) list1.stream().filter(list2::contains).count();
System.out.println(numOfKeyContain); // number of elements of list1 that also appear in list2

How to iterate and remove keys from Map<String , Map<String, Set<String>>>

I have a map like this:
Map<String, Map<String, Set<String>>> sampleMap = new HashMap<>();
and the data in this map looks like this:
sampleMap={2014={A=[1, 2], B=[3]}, 2015={A=[1,2], B=[1,2]}, 2016={A=[1,2], B=[3,4]}};
I want to remove keys from the map based on this input: List<String> filter; with values like this:
filterArray : [2014, 2015]
That is, iterate through the ArrayList values one by one and check whether each value matches any key in the HashMap.
If a key is matched, keep it.
If a key is not matched, remove that key from the map.
In other words, I always want to keep only the matched keys in the map, compared against the input values passed.
In this case, as the ArrayList values are [2014,2015],
only the keys 2014 and 2015 should remain in my map. So,
Data before removal:
sampleMap={2014={A=[1, 2], B=[3]}, 2015={A=[1,2], B=[1,2]}, 2016={A=[1,2], B=[3,4]}};
Data after removal:
sampleMap={2014={A=[1, 2], B=[3]}, 2015={A=[1,2], B=[1,2]}};
I tried it this way. However, I want to know whether this is the correct approach, or whether it is prone to any exceptions.
Iterator<Map.Entry<String, Map<String, Set<String>>>> iter = sampleMap.entrySet().iterator();
while (iter.hasNext()) {
    Map.Entry<String, Map<String, Set<String>>> entry = iter.next();
    logger.info("Keys : " + entry.getKey());
    // remove every entry whose key is not in the filter list
    if (!filterArray.contains(entry.getKey())) {
        iter.remove();
    }
}
Use retainAll() on the keySet:
map.keySet().retainAll(list);
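A small runnable sketch of the retainAll() approach, with a hypothetical class RetainKeys and helper keepOnly. It works because keySet() is a live view of the map: removing a key from the view removes the whole entry.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class RetainKeys {
    // Removes, in place, every entry whose key is not contained in 'wanted'.
    static <V> void keepOnly(Map<String, V> map, Collection<String> wanted) {
        map.keySet().retainAll(wanted);
    }

    public static void main(String[] args) {
        // TreeMap is used here only so that the printed key order is predictable
        Map<String, Map<String, Set<String>>> sampleMap = new TreeMap<>();
        for (String year : Arrays.asList("2014", "2015", "2016")) {
            Map<String, Set<String>> inner = new HashMap<>();
            inner.put("A", new HashSet<>(Arrays.asList("1", "2")));
            sampleMap.put(year, inner);
        }
        keepOnly(sampleMap, Arrays.asList("2014", "2015"));
        System.out.println(sampleMap.keySet()); // prints [2014, 2015]
    }
}
```

Keys in the filter that are not present in the map are simply ignored, and no explicit iterator is needed.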
Seems reasonable. I might have a couple pieces of advice.
First of all, whenever I see nested collections I always wonder if there should be a class or two in there. If this is a one-time task then don't worry about it, but if you want to reuse this code you might want to think about creating a class for your inner map/set... but if it's really this simple then it's no big deal.
Secondly, if you are using Java 8, filtering with a stream would be cleaner (and it can be run in parallel via parallelStream(), although that does not happen automatically). I can give you the Groovy solution for what you are trying to do, but I'm not familiar enough with Java 8 streams to do it correctly.
def filteredStructure = structure.findAll { entry -> entry.key.equalsIgnoreCase("2014") || entry.key.equalsIgnoreCase("2015") }
The java version should be really similar.
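One possible Java 8 counterpart is sketched below; it is not the answerer's code. The class KeyFilter and the helper filterKeys are hypothetical, and unlike the in-place approaches it builds a new map. Collectors.toMap rejects null values, so this assumes the map contains none.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class KeyFilter {
    // Returns a new map holding only the entries whose key matches one of 'wanted', ignoring case.
    static <V> Map<String, V> filterKeys(Map<String, V> structure, Collection<String> wanted) {
        return structure.entrySet().stream()
                .filter(entry -> wanted.stream().anyMatch(w -> w.equalsIgnoreCase(entry.getKey())))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, String> structure = new HashMap<>();
        structure.put("2014", "first");
        structure.put("2015", "second");
        structure.put("2016", "third");
        Map<String, String> filtered = filterKeys(structure, Arrays.asList("2014", "2015"));
        System.out.println(filtered.containsKey("2016")); // prints false
        System.out.println(filtered.size());              // prints 2
    }
}
```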

Generating possible unordered pairs of combinations from a Hashset

I realise that this question may have been asked many times before,
but for this particular application using loops won't really work because I can't index into a set.
What I'm looking to do is get the set of possible unordered pairs from the data in a HashSet as efficiently as possible.
So if my HashSet contained A, B, C, D, E,
then the following combinations are possible: AB, AC, AD, AE, BC, BD, BE, CD, CE, DE
What options do I have available to achieve this efficiently?
Any ideas would be greatly appreciated.
As far as efficiency goes, there aren't too many options out there: you need to produce a set on the order of N^2 items (N(N-1)/2 pairs, to be exact), meaning that the running time will also be at least of the same order.
Since enumerating a set is linear, two nested loops will deal with this as efficiently as any other method would.
The loop on the outside should iterate the collection from the beginning. The loop on the inside should start at the position of the outer loop's iterator, incremented by one position.
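The two nested loops can be sketched as follows. This version copies the set into a list first so that the inner loop can start one position past the outer one by index; the class PairGenerator and the helper unorderedPairs are hypothetical names, and each pair is represented as a two-element list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PairGenerator {
    // Returns every unordered pair of distinct elements exactly once.
    static <T> List<List<T>> unorderedPairs(Set<T> set) {
        List<T> items = new ArrayList<>(set); // linear-time copy to gain index access
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            // start at i + 1 so that (x, x) and the mirrored pair (y, x) are never produced
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(Arrays.asList(items.get(i), items.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // LinkedHashSet is used only to make the output order predictable
        Set<String> set = new LinkedHashSet<>(Arrays.asList("A", "B", "C", "D", "E"));
        List<List<String>> pairs = unorderedPairs(set);
        System.out.println(pairs.size()); // prints 10
        System.out.println(pairs.get(0)); // prints [A, B]
    }
}
```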
You can still index your data: just add an additional HashMap<Your_Class, Integer> map to store the index of each item.
HashSet<Your_Class> set = ...//HashSet that contains data
int index = 0;
HashMap<Your_Class,Integer> map = new HashMap<>();
for(Your_Class item : set){
    map.put(item, index++);
}
//Generate all pairs of those indexes, then translate each index back to its item (this needs the reverse lookup, index -> item)
So, in the example case, A has index 0, B has index 1, and so on. Each index pair 01, 02, 03, ... then just needs to be converted back into AB, AC, ...
There aren't too many options. You can wrap the two objects of each pair in an immutable class like this:
public class Arrangement<T> {
    private final T object1;
    private final T object2;
    public Arrangement(T object1, T object2) {
        this.object1 = object1;
        this.object2 = object2;
    }
    // get methods...
}
Set<Arrangement<MyType>> mySet = new HashSet<>();
mySet.add(new Arrangement<>(myObject1, myObject2));
Something like this!

Question regarding Java's LinkedList class

I have a question regarding the LinkedList class in Java.
I have a scenario wherein I need to add or set an index based on whether the index exists in the LinkedList or not. Pseudo-code of what I want to achieve:
if index a exists within the linkedlist ll
ll.set(a,"arbit")
else
ll.add(a,"arbit")
I did go through the Javadocs for the LinkedList class but did not come across anything relevant.
Any ideas ?
Thanks
p1ng
What about using a Map for this:
Map<Integer, String> map = new HashMap<Integer, String>();
// ...
int a = 5;
map.put(a, "arbit");
Even if a already exists, put will just replace the old String.
Searching in a linked list is not very efficient (O(n)). Have you considered using a different data structure, e.g. a HashMap, which would give you O(1) access time?
If you need sequential access as well as keyed access you might want to try a LinkedHashMap, available since Java 1.4:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/LinkedHashMap.html
Map<Integer, String> is definitely a good (the best?) way to go here.
Here's an option for keeping with LinkedList if that's for some bizarre reason a requirement. It has horrible runtime performance and disallows null, since null now becomes an indicator that an index isn't occupied.
String toInsert = "arbit";
int a = 5;
//grow the list to allow index a
while ( a >= ll.size() ) {
    ll.add(null);
}
//set index a to the new value
ll.set(a, toInsert);
If you're going to take this gross road, you might be better off with an ArrayList.
Why is it so bad? Say you had only one element at index 100,000. This implementation would require 100,000 entries in the list pointing to null. This results in horrible runtime performance and memory usage.
LinkedList cannot have holes inside, so you can't have list [1,2,3,4] and then ll.add(10,10), so I think there's something wrong with your example. Use either Map or search for some other sparse array
It looks like you're trying to use a as a key, and don't state whether you have items at index i < a. If you call ll.set(a, ...) when ll.size() <= a, or ll.add(a, ...) when ll.size() < a, you'll end up with an IndexOutOfBoundsException.
And if you add an item at index a the previous item at a will now be at a+1.
In this case it would be best to remove item at a first (if it exists) then add item "arbit" into a. Of course, the condition above re: ll.size() <=a still applies here.
If the order of the results is important, a different approach could use a HashMap<Integer,String> to create your dataset, then extract the keys using keySet(), sort them in their natural order (they're numeric after all), and then extract the values from the map while iterating over the sorted keys. Nasty, but does what you want... Or create your own OrderedMap class that does the same...
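The sort-the-keys approach might look like the sketch below; the class SortedKeyIteration and the helper valuesInKeyOrder are hypothetical names. As an aside that goes beyond the suggestion above, the JDK's java.util.TreeMap already keeps its keys sorted, so it could stand in for a hand-written ordered map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedKeyIteration {
    // Returns the map's values ordered by ascending key.
    static List<String> valuesInKeyOrder(Map<Integer, String> map) {
        List<Integer> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys); // natural (numeric) order
        List<String> values = new ArrayList<>();
        for (Integer key : keys) {
            values.add(map.get(key));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(5, "arbit");
        map.put(1, "first");
        map.put(5, "replaced"); // put on an existing key overwrites the old value
        System.out.println(valuesInKeyOrder(map)); // prints [first, replaced]

        // Alternative: a TreeMap iterates in key order without an explicit sort
        System.out.println(new TreeMap<>(map).values()); // prints [first, replaced]
    }
}
```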
Could you expand on why you need to use a LinkedList? Is ordering of the results important?
