Java collection for this use case - java

Let's say we have a bunch of Car objects.
Each Car has some distinguishing properties e.g. manufacturer, model, year, etc. (these can be used to create distinct hashCodes).
Each Car has a List of PurchaseOffer objects (a PurchaseOffer object contains pricing/retailer info).
We receive Lists of Cars from several different sources, each Car with a single PurchaseOffer.
Thing is, these lists may overlap - a Car can appear in more than one list.
We wish to aggregate the lists into a single collection of Cars where each Car holds all encountered PurchaseOffers for it.
My problem is choosing which collection to use in this aggregation process:
It feels natural to use java.util.HashSet to hold our Cars: that way, when going over the different lists of Cars, we can check whether a Car already exists in the Set in amortized O(1).
However, you cannot retrieve an element from a Set (in our case, when we encounter a Car that already exists in the Set, we would like to retrieve that Car from the Set based on its identifying hashCode and add PurchaseOffers to it).
I can use a HashMap where each Car's hashCode maps to the actual Car object, but it probably isn't the textbook solution since it is unsafe: I would have to make sure myself that every hashCode maps to a Car with that hashCode, so there could be inconsistency.
Of course, I can make a dedicated data structure that guarantees this consistency, but shouldn't one already exist?
Can anyone suggest the data structure I am after, or point out a design mistake?
Thanks.

Since this is a many-to-many relationship, you need a bi-directional multi-map. Car is the key for the first one, with a List of PurchaseOffer as the value. The PurchaseOffer is the key for the second one, with a List of Cars as the value.
The underlying implementation is two HashMaps.
Put an API on top of it to get the behavior you need. Or see if Google Collections can help you. It's a combination of a BiMap and two MultiMaps.

I think that you really do need (at least) a HashMap<Car, List<PurchaseOffer>> ... as suggested by @Andreas_D
Your objection that each Car already has a List<PurchaseOffer> is beside the point. The list in the HashMap is the aggregate list, containing all PurchaseOffer objects from all Car objects that stand for the same physical car.
The point of creating a new list is to avoid changing the original lists on the original Car objects. (If that was not a concern, then you could pick one instance of Car from the set that represent a physical car, and merge the PurchaseOffer objects from the others into that list.)
I'm not entirely sure why @duffymo suggested a bi-directional map, but I think it is because the different Car objects from different sources may have complementary (or contradictory) information for the same physical car. By keeping all instances, you avoid discarding information. (Once again, if you are happy to mutate and/or discard information, you could attempt to merge the information about each individual car into a single Car object.)
If you really didn't care about preserving information and were prepared to merge stuff willy-nilly then the following approach would probably work:
HashMap<Car, Car> map = new HashMap<Car, Car>(...);
for (Car car : carsToBeAggregated) {
    // look up the first instance seen for this physical car
    Car master = map.get(car);
    if (master == null) {
        map.put(car, car);
    } else {
        master.offers.addAll(car.offers);
        // optionally, merge other Car information from car to master
    }
}
You should NOT be trying to use the Car.hashCode() as a key for anything. Hashcode values are not unique identifiers: there is a distinct possibility that two different cars will end up with the same hashcode value. If you attempt to use them as if they were unique identifiers you'll get into trouble ...

The basic data structure should be a HashMap<Car, List<PurchaseOffer>>. This allows for storing and retrieving all offers for one selected car.
Now you may have to find a suitable implementation for Car.equals() (and a matching Car.hashCode()) to ensure that "cars" coming from different sources are really the same. What about basing equals() on a unique identifier for a real-world car (VIN)?
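A minimal sketch of that aggregation, assuming each car can be identified by a VIN string. The vin, model, retailer and price fields and the OfferAggregator class are hypothetical names used only for illustration; here the Car is reduced to its identifying part and the offers live in the map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical identifying part of a car: equals/hashCode use only the VIN.
final class Car {
    final String vin;
    final String model;

    Car(String vin, String model) {
        this.vin = vin;
        this.model = model;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Car && vin.equals(((Car) o).vin);
    }

    @Override
    public int hashCode() {
        return vin.hashCode();
    }
}

// Hypothetical offer holding retailer and pricing info.
final class PurchaseOffer {
    final String retailer;
    final double price;

    PurchaseOffer(String retailer, double price) {
        this.retailer = retailer;
        this.price = price;
    }
}

final class OfferAggregator {
    private final Map<Car, List<PurchaseOffer>> offersByCar = new HashMap<>();

    // Creates the list on first sight of a car, then appends the offer.
    void add(Car car, PurchaseOffer offer) {
        offersByCar.computeIfAbsent(car, k -> new ArrayList<>()).add(offer);
    }

    // Returns all offers collected for an equal car, or an empty list.
    List<PurchaseOffer> offersFor(Car car) {
        return offersByCar.getOrDefault(car, Collections.<PurchaseOffer>emptyList());
    }

    int distinctCars() {
        return offersByCar.size();
    }
}
```

Because the map relies on equals() as well as hashCode(), two different cars that happen to share a hash code still end up under separate keys.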

I would prefer to use a HashMap<Car, List<PurchaseOffer>>, as suggested before (Andreas, Stephen), mainly if the Car object does not hold the list of PurchaseOffers.
Otherwise I would consider using a HashMap<Car, Car> or, better IMO, a HashMap<ID, Car> if there is a unique ID for each Car.
You cannot simply map the Car's hashCode to the Car, as mentioned in the question, since distinct Cars can have the same hashCode!
(Anyway, I would create a separate class for storing and managing the Cars. This would contain the HashMap, or whatever is used, so it's easy to change the implementation without needing to change its interface.)

Create your own custom class that extends HashSet, and override the contains(Object o) method to check whether the hash code is the same and return the result accordingly. Add an object to the set if and only if the set does not already contain it.

How about defining a new custom Aggregation class? Define hashCode() such that the ID of the car acts as the key, and override equals() accordingly. Define a custom method for accepting your original car and do a union operation on the lists. Finally, store the custom objects in a HashSet to achieve constant-time lookup.
In purist terms, aggregation is a behavior beyond the scope of a single object. Visitor pattern tries to address a similar problem.
Alternatively if you have a sql datastore, a simple select using group by would do the trick.

Welp, yeah, HashMap<Car, List<PurchaseOffer>> would be perfect if it wasn't for the fact that
each Car contains a List<PurchaseOffer> as a property. Can say that a Car object is composed
of two parts: an identifying part (let's say each car indeed has a unique VIN), and the list of
PurchaseOffers.
In this case, split the Car class into two classes: a CarType class with the identifying attributes, and the list part (maybe both used together by Car). Then use Map<CarType, List<PurchaseOffer>> for your data structure (or MultiMap<CarType, PurchaseOffer>).

//alt. 1
List<Offer> offers;
List<Car> cars;
Map<Car, List<Offer>> mapCarToOffers;
Map<Offer, List<Car>> mapOfferToCars;
public List<Offer> getOffersForCar(Car aCar);
public List<Car> getCarsForOffer(Offer anOffer);
Alternative 1 would make use of the hashCode() of Car and Offer
//alt. 2
List<Offer> offers;
List<Car> cars;
Map<Integer, List<Offer>> mapCarIdToOffers;
Map<Integer, List<Car>> mapOfferIdToCars;
public List<Offer> getOffersForCarId(int aCarId);
public List<Car> getCarsForOfferId(int anOfferId);
Alternative 2 would make use of the hashCode() of Integer. This would allay your concerns about "safety" as the hash codes for Integer objects should not overlap where the values are unique. This incurs the additional overhead of having to maintain unique IDs for each Car and Offer object, however, I am guessing that you probably already have those from your business requirements.
Note, you may choose to use other classes as alternative to ints for ID's (e.g. String).
For both alternatives, implement the Lists with ArrayList or LinkedList - which one is better is up to you to determine based on other requirements, such as the frequency of insertion/deletion vs lookup. Implement the Maps with HashMap - see comments above about how hash codes are used.
As a side note, in our software we use both of the above to represent similar types of many-to-many data, very similar to your use case.
Both alternatives work very well.
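As a rough sketch of alternative 2, the two maps can be kept behind one small class so that both directions are always updated together. The CarOfferIndex class and its link method are hypothetical, and for brevity the sketch stores only the integer IDs rather than the Car and Offer objects themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way index between car IDs and offer IDs.
final class CarOfferIndex {
    private final Map<Integer, List<Integer>> offerIdsByCarId = new HashMap<>();
    private final Map<Integer, List<Integer>> carIdsByOfferId = new HashMap<>();

    // Records the association in both maps so the two views stay consistent.
    void link(int carId, int offerId) {
        offerIdsByCarId.computeIfAbsent(carId, k -> new ArrayList<>()).add(offerId);
        carIdsByOfferId.computeIfAbsent(offerId, k -> new ArrayList<>()).add(carId);
    }

    List<Integer> getOfferIdsForCarId(int carId) {
        return offerIdsByCarId.getOrDefault(carId, Collections.<Integer>emptyList());
    }

    List<Integer> getCarIdsForOfferId(int offerId) {
        return carIdsByOfferId.getOrDefault(offerId, Collections.<Integer>emptyList());
    }
}
```

Funnelling every update through a single method is what keeps the two maps from drifting apart.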

Why not use an object database for this? You could store any object graph you wanted, and you'd get a search API with which you could do any relationship/retrieval mechanism you wanted. A simple collection could work, but it sounds like you want a more complex relationship than a collection would provide. Look into db4o (http://db4o.com) - it's very powerful for this sort of thing.

Related

Sort and associate objects with unique identifier in a collection

I'm working my way through an assignment and got stuck on step 5; I would appreciate any help.
Carefully study the class structure in Products.java.
Design a generic container called GenericOrder that acts as a collection of an arbitrary number of objects in Products.java. Design a mechanism that gives each instance of the container a unique identifier. Implement as many methods as necessary. You must use Java generics features.
Design and implement a subclass of GenericOrder called ComputerOrder that takes an arbitrary number of different classes of ComputerPart objects, Peripheral objects, and Service objects. Implement as many methods as necessary.
Design and implement a subclass of GenericOrder called PartyTrayOrder that takes an arbitrary number of different classes of Cheese objects, Fruit objects, and Service objects. Implement as many methods as necessary.
Design and implement a class called OrderProcessor. You must implement at least the following methods:
accept; // this method accepts a GenericOrder or any of its subclass objects and stores it in any internal collection of OrderProcessor.
process; // this method sorts all accepted orders in the internal collection of GenericOrder into collections of ComputerPart, Peripheral, Cheese, Fruit, and Service. You must associate each object with the unique identifier. You may refer to the TwoTuple.java example in the text book.
dispatchXXX; // this method simulates the dispatch of the sorted collections. For example, the method dispatchComputerParts() should produce this output:
Motherboard – name=Asus, price=$37.5, order number=123456
Motherboard – name=Asus, price=$37.5, order number=987654
RAM – name=Kingston, size=512, price=$25.0, order number=123456
Create a client class to test OrderProcessor. You will need to create a datagenerator for testing purpose. It is not mandatory but you may use a variation of Data Generator in TIJ pages 637 to 638.
Here is what I have for Q5
public abstract class OrderProcessor<T> {
    private ArrayList<T> dataCollection = new ArrayList<T>();
    public void accept(T item){
        dataCollection.add(item);
    }
    public void process(){
        Collections.sort(dataCollection);
    }
    public List getDataCollection(){
        return dataCollection;
    }
}
In its current state, Collections.sort(dataCollection); doesn't compile because it does not accept T, and if I change the ArrayList to String, any function used from other subclasses won't work because they all use T. Any help would be greatly appreciated.
thanks in advance.
EDIT: Since you want to partition your orders and not sort, you can use something like this:
dataCollection.stream().collect(
Collectors.groupingBy(order -> order.getIdentifier())
)
This groups them by their identifiers and puts them into a Map. The order.getIdentifier() part is just a placeholder for whatever you want to use to divide them up. The return type will be Map<TypeOfIdentifier, List<T>>.
For this to work, though, your T has to be of some specific type (T extends Product perhaps?) so you can get the identifier. Since I don't know the code for differentiating between different products, I can't put the exact code here.
The Javadoc for Collectors
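A small sketch of the grouping idea, assuming a hypothetical Product class with a getCategory() method standing in for the assignment's real product classes (which are not shown in the question):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the classes in Products.java.
final class Product {
    private final String category;
    private final String name;

    Product(String category, String name) {
        this.category = category;
        this.name = name;
    }

    String getCategory() {
        return category;
    }

    String getName() {
        return name;
    }
}

final class GroupingDemo {
    // Partitions the products into one list per category.
    static Map<String, List<Product>> byCategory(List<Product> products) {
        return products.stream()
                .collect(Collectors.groupingBy(Product::getCategory));
    }
}
```

Each map value is a List of the elements that share a key, which is why no sorting (and no Comparable bound) is needed for this step.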
As for why Collections.sort wasn't working for you (although you don't need Collections.sort anyway):
T must implement the Comparable interface (for example, declared as T extends Comparable<T>), because obviously you can't sort objects of just any type. The Comparable interface has a compareTo method that defines the ordering.
An alternative would be to write a custom Comparator, which defines a single method, compare, that takes two objects of type T and returns an int whose sign represents the order (for numeric keys it's conceptually the first argument minus the second, although Integer.compare is safer because subtraction can overflow). For this, you would need to use Collections.sort(dataCollection, customComparator).
You can define your comparator with a lambda expression, but I can't help you beyond that because I have no idea how you want to sort your objects.
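For illustration only, here is a sketch of the Comparator route. SortableProcessor is a hypothetical variant of the question's OrderProcessor that receives its ordering through the constructor, so T itself does not have to be Comparable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical variant of OrderProcessor that sorts with a supplied Comparator.
final class SortableProcessor<T> {
    private final List<T> dataCollection = new ArrayList<>();
    private final Comparator<? super T> order;

    SortableProcessor(Comparator<? super T> order) {
        this.order = order;
    }

    void accept(T item) {
        dataCollection.add(item);
    }

    // Compiles for any T because the ordering comes from the Comparator.
    void process() {
        Collections.sort(dataCollection, order);
    }

    List<T> getDataCollection() {
        return dataCollection;
    }
}
```

A lambda or method reference can supply the ordering, for example new SortableProcessor<String>(Comparator.comparingInt(String::length)) to sort strings by length.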

Can JUNG library make edges based on predefined properties?

I have some data of let’s say type Person. This Person has a phone-number property but also a calling and a called phone-number properties.
class Person {
String id;
String displayName;
String phoneNr;
String callingNr; // or List<String> callingNrs;
String calledNr; // or List<String> calledNrs;
}
What I want is to put a bunch of those Person objects in a Graph instance and then render the relationships in a view. Ideally the components drawn in the view are interactive, meaning you can click on a node/vertex to highlight its edges (and maybe more).
I tried JUNG, but in the documentation, I see some examples that I have to, kind of, define the relationships between Person objects myself, like below:
Graph.addEdge("edge-name", personA.phoneNr, personB.phoneNr);
I’m new to JUNG, but maybe there’s a way to tell JUNG about the properties of Person and that JUNG knows how to connect them?
Is this possible with JUNG? Or do I need another type of library? If so, can someone please suggest one I can use?
Here is what I would do:
Make a java.util.Map of each person's phone number (key) to an instance of the Person (value). That is your reverse number lookup.
Populate your reverse number lookup map by iterating over your collection of people using the PhoneNr as the key and the Person instance as the value.
Next, I would create an edge class 'PhoneCall' that contains information like 'time of call' and 'duration of call' (more or less info, depending on what you have available).
To add edges to your graph, iterate over your collection of Person instances, and for each Person, iterate over the collection of calling numbers. For each calling number, use the reverse number lookup map to get the person calling and make a directed edge to connect the calling person to the current person.
Do something similar for each Person's collection of called numbers.
Your graph nodes will be Person instances, and your edges will be PhoneCall instances that connect one Person to another. Be sure to add an equals and hashCode method to your Person class and to your PhoneCall class so that they will work properly (and duplicates will be detected and hopefully ignored).
Hope this helps!
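The steps above can be sketched in plain Java, without any JUNG calls, as a method that builds the reverse lookup map and returns the list of edges. It assumes the List<String> variant of callingNrs from the question; the PhoneCall fields and the CallGraphBuilder class are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Person {
    final String id;
    final String phoneNr;
    final List<String> callingNrs; // numbers that called this person

    Person(String id, String phoneNr, List<String> callingNrs) {
        this.id = id;
        this.phoneNr = phoneNr;
        this.callingNrs = callingNrs;
    }
}

// Hypothetical edge type: a directed call from one person to another.
final class PhoneCall {
    final Person from;
    final Person to;

    PhoneCall(Person from, Person to) {
        this.from = from;
        this.to = to;
    }
}

final class CallGraphBuilder {
    static List<PhoneCall> buildEdges(List<Person> people) {
        // Reverse number lookup: phone number -> person owning it.
        Map<String, Person> byNumber = new HashMap<>();
        for (Person p : people) {
            byNumber.put(p.phoneNr, p);
        }

        List<PhoneCall> edges = new ArrayList<>();
        for (Person callee : people) {
            for (String nr : callee.callingNrs) {
                Person caller = byNumber.get(nr);
                if (caller != null) { // skip numbers that belong to nobody we know
                    edges.add(new PhoneCall(caller, callee));
                }
            }
        }
        return edges;
    }
}
```

Each resulting PhoneCall, together with its from and to persons, could then be handed to the graph with an addEdge call like the one shown in the question.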

Data structure to represent Teams and Players

There is a Team object that contains a list of players, List<Player>. All teams need to be stored in a Teams collection.
Conditions:
1. If a new player needs to be added to a particular team, that team is retrieved from Teams and the Player is added to the Players list of that team.
2. Each Team object in the Teams collection needs to be unique based on the team name.
3. Team objects in the collection need to be sorted by team name.
Considerations:
In this scenario, when I use List<Team>, I can achieve 1 and 3, but uniqueness cannot be satisfied.
If I use TreeSet<Team>, 2 and 3 can be achieved. But as there is no get method on TreeSet, a particular team cannot be selected.
So I ended up using TreeMap<teamName, Team>. This makes all of 1, 2, and 3 possible, but I think it's not a good way to do it.
Which data structure is ideal for this use case? Preferably from Java collections.
You can utilize your TreeSet if you wish. However, if you're going to use the Set interface, you can use remove(Object o) instead of get: remove the object, make your modifications, then add it back into the set.
I think extending (i.e. creating a subclass from) ArrayList or LinkedList and overriding the set(), add(), addAll(), remove(), and removeRange() methods in such way that they ensure the uniqueness and sortedness conditions (invariants) would be a very clean design. You can also implement a binary search method in your class to quickly find a team with a given name.
ArrayList is a better choice to base your class on, if you aren't going to add or remove teams too frequently. ArrayList would give you O(n) insertion and removal, but O(log n) cost for element access and ensuring uniqueness if you use binary search (where n is the number of elements in the array).
See the generics tutorial for subclassing generics.
How about using Guava's Multimap? More precisely, a SetMultimap. Specifically, a SortedSetMultimap. Even more specifically, its TreeMultimap implementation (1).
Explanations:
In a Multimap, a key points not to a single value, but rather to a Collection of values.
This means you can bind a collection of several Player values to a single Team key, so that's Req1 solved.
In a SetMultimap, the keys are unique.
This gets your Req2 solved.
In a SortedSetMultimap, the values are also sorted.
While you don't specifically care for this, it's nice to have.
In a TreeMultimap, the key set and each key's collection of values are sorted.
This gets your Req3 sorted (see what I did there?)
Usage:
TreeMultimap<Team, Player> ownership = TreeMultimap.create(); // static factory; Team and Player must be Comparable
ownership.put(team1, playerA);
ownership.put(team1, playerB);
ownership.put(team2, playerC);
Collection<Player> playersOfTeamA = ownership.get(team1); // contains playerA, playerB
SortedSet<Team> allTeams = ownership.keySet(); // contains team1, team2
Gotchas:
Remember to set equals and hashCode correctly on your Team object to use its name.
Alternatively, you could use the static create(Comparator<? super K> keyComparator, Comparator<? super V> valueComparator) which provides a purpose-built comparison if you do not wish to change the natural ordering of Team. (use Ordering.natural() for the Player comparator to keep its natural ordering - another nice Guava thing). In any case, make sure it is compatible with equals!
Multimaps are not Maps, because putting a new value for a key does not remove the previously held value (that's the whole point), so make sure you understand it. (For instance, it still holds that you cannot put the same key-value pair twice...)
(1): I am unsure whether SortedSetMultimap is sufficient. Its Javadoc states that the values are sorted, but nothing is said of the keys. Does anyone know any better?
(2) I assure you, I'm not affiliated to Guava in any way. I just find it awesome!
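For comparison, the TreeMap<teamName, Team> approach from the question can be sketched with the JDK alone. The Teams wrapper class and its method names are hypothetical, and players are plain strings here to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

final class Team {
    final String name;
    final List<String> players = new ArrayList<>();

    Team(String name) {
        this.name = name;
    }
}

// Hypothetical wrapper: unique by name, sorted by name, retrievable by name.
final class Teams {
    private final TreeMap<String, Team> teamsByName = new TreeMap<>();

    // Creates the team on first use, then adds the player to it.
    void addPlayer(String teamName, String player) {
        teamsByName.computeIfAbsent(teamName, Team::new).players.add(player);
    }

    Team get(String teamName) {
        return teamsByName.get(teamName);
    }

    // Values are iterated in ascending key order, i.e. sorted by team name.
    Collection<Team> sortedTeams() {
        return teamsByName.values();
    }
}
```

Hiding the map behind a small class keeps the team name in one place conceptually, even though it is stored both as the key and inside the Team.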

Java Map of Map

I need to do a look-up table based on two keys. I am building a mileage look-up chart similar to what is seen in the back of road maps. A sample of a chart can be found here. If you know the starting city is x and the ending city is y you look to find the intersection to find out the total miles.
When I first started attacking this problem I thought of using two maps, City being an enum of my cities of interest.
Map<City, Map<City, Integer>> map;
But as I researched, I am seeing warnings about Maps that have values of type Map. Is there an easier solution to my problem that I might be overlooking? With this being 66x66 (col*row), I want to make sure I do it right the first time and don't have to redo the data entry.
As a note I will be saving all my values into a database for easy update and retrieval so the solution would need to be easy to map with JPA or Hibernate etc.
Thanks in advance.
It'd be easier if you do this:
Map<Pair<City, City>, Integer> map;
That is: create a new generic class, let's call it Pair, that represents a pair of cities, and use it as the key to your Map. Of course, don't forget to override hashCode() and equals() in Pair. And take a look at @increment1's answer, he's right: if the distance from city A to B is the same as the distance from B to A, then there's no need to store two pairs of cities; a single pair will do, no matter the order used to add the cities to the Map.
Notice that this is the strategy used by ORMs (for instance, JPA) when mapping composite keys in a database: create a new class (Pair in the example) that encapsulates all the objects used as keys, it'll be much easier to manage this way: conceptually, there's only one key - even if internally that key is composed of several elements.
Make a map of Paths, where Path is a custom class that holds two cities. Remember to override equals and hashCode.
Edit: Why are there 66x66 paths? Is the mileage different depending on which way you go (there probably is a small difference, but do you have that data)? If not, you can discard more than half of those entries (the half is obvious; the 'more' part is because entries like New York to New York no longer need to be saved with 0).
You should create a simple class that contains two City references, from and to, and which overrides equals and hashCode appropriately. Then use that as your key.
Similar to other answers, I suggest creating a city pair class to be your map key (thus avoid a map of maps). One difference I would make, however, would be to make the city pair class order agnostic in regards to the cities in its hashCode and equals methods.
I.e. Make CityPair(Seattle,LA) equal to CityPair(LA,Seattle).
The advantage of this is that you would then not duplicate any unnecessary entries in your map automatically.
I would achieve this by having hashCode and equals always consider city1 to be the city with the lower ordinal value (via Enum.ordinal()) in your enum.
Alternatively, try this simple unordered pair implementation given in another question and answer.
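A minimal sketch of such an order-agnostic key, normalizing by Enum.ordinal() as described above. The City constants, the MileageChart wrapper, and the sample distance are placeholders for illustration, not real data.

```java
import java.util.HashMap;
import java.util.Map;

// Placeholder enum; the real one would list all 66 cities.
enum City { SEATTLE, LOS_ANGELES, NEW_YORK }

final class CityPair {
    private final City first;
    private final City second;

    CityPair(City a, City b) {
        // Normalize so the city with the lower ordinal always comes first.
        if (a.ordinal() <= b.ordinal()) {
            first = a;
            second = b;
        } else {
            first = b;
            second = a;
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CityPair)) {
            return false;
        }
        CityPair other = (CityPair) o;
        return first == other.first && second == other.second;
    }

    @Override
    public int hashCode() {
        return 31 * first.ordinal() + second.ordinal();
    }
}

final class MileageChart {
    private final Map<CityPair, Integer> miles = new HashMap<>();

    void put(City a, City b, int distance) {
        miles.put(new CityPair(a, b), distance);
    }

    // Returns null when no distance has been stored for the pair.
    Integer get(City a, City b) {
        return miles.get(new CityPair(a, b));
    }

    int size() {
        return miles.size();
    }
}
```

Since the normalization happens in the constructor, equals and hashCode can stay simple and both lookup directions hit the same entry.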
If you're using Eclipse Collections, you can use MutableObjectIntMap and Pair.
MutableObjectIntMap<Pair<City, City>> map = ObjectIntHashMap.newMap();
map.put(Tuples.pair(newYorkCity, newark), 10);
map.put(Tuples.pair(newYorkCity, orlando), 1075);
Assert.assertEquals(10, map.get(Tuples.pair(newYorkCity, newark)));
Assert.assertEquals(1075, map.get(Tuples.pair(newYorkCity, orlando)));
Pair is built into the framework so you don't have to write your own. MutableObjectIntMap is similar to a Map<Object, Integer> but optimized for memory. It's backed by an Object array and an int array and thus avoids storing Integer wrapper objects.
Note: I am a committer for Eclipse collections.
To do the same as the graphic, I would use a 2D array.
// index is the city code:
int[][] distances;
Store the city codes in a
Map<String, Integer> cityNameToCodeMap;
Use it as follows:
Integer posA = cityNameToCodeMap.get("New York");
// TODO check posA and posB for null, if city does not exist
Integer posB = cityNameToCodeMap.get("Los Angeles");
int distance = distances[posA][posB];
Reason for this design:
The matrix in the graphic is not a sparse matrix; it is full.
For that case a 2D array uses the least memory.
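A compact sketch of the array-backed table, including the null check left as a TODO above. The DistanceTable class and its methods are hypothetical, and it assumes distances are symmetric, which the original answer does not state.

```java
import java.util.HashMap;
import java.util.Map;

final class DistanceTable {
    private final Map<String, Integer> cityNameToCode = new HashMap<>();
    private final int[][] distances;

    // Assigns each city name a code equal to its position in the argument list.
    DistanceTable(String... cityNames) {
        for (int i = 0; i < cityNames.length; i++) {
            cityNameToCode.put(cityNames[i], i);
        }
        distances = new int[cityNames.length][cityNames.length];
    }

    // Assumption: the distance is the same in both directions.
    void set(String a, String b, int miles) {
        int i = code(a);
        int j = code(b);
        distances[i][j] = miles;
        distances[j][i] = miles;
    }

    int get(String a, String b) {
        return distances[code(a)][code(b)];
    }

    // Fails loudly instead of hitting a NullPointerException on unboxing.
    private int code(String name) {
        Integer c = cityNameToCode.get(name);
        if (c == null) {
            throw new IllegalArgumentException("Unknown city: " + name);
        }
        return c;
    }
}
```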
There is another way to do this that may work for you. Basically, you want to create a class called something like CityPair. It would take two arguments to its constructor, the start and end cities, and would override the hashCode function to generate a unique hash based on the two inputs. The CityPair could then be used as the key in a HashMap<CityPair, Integer>.
If there are only 66 cities, then your hashing function could look something like this:
//first assign each city an id, 0-65 and call it city.getID()
@Override public int hashCode()
{
    return (city1.getID() << 16) | city2.getID();
}
Of course, as noted in the comments and in other answers, you will also want to override the method declared as:
public boolean equals(Object)
from Object so that the map can handle hash collisions correctly.

map vs ?? for storing name and score

I used hashmap to store data.
The problem is that I just noticed a HashMap can't hold the same key more than once.
What else should I use to store data that looks like this:
Name1 100.0
Name2 99.8
Name3 121.5
...
The other thing I'm trying to do is show the data for one particular person when I look up that key.
So, is there a way to store more than one value for one key, or should I use another type of storage?
A HashMap can effectively hold several values per key if you store the values in another data structure, such as a linked list or a tree, at each key. Then you just have to decide how to handle the collisions.
Edit:
HashMap
["firstKey"] => LinkedList of (3,4,5)
["secondKey"] => null
["thirdKey"] => LinkedList of (3)
To expand on Matthew Cox's answer, you could extend the Hashtable class so that it automatically manages your lists for you and gives the appearance of having duplicate keys.
The Google Guava library contains some collection types that allow more than one element per key. The Multimap is the first one that comes to mind.
http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Multimap.html
Guava in general contains a lot of very convenient utilities; I think it's worth checking out.
If you can't use an external library, you can simply (like Matthew Cox said) combine a Map and a List with Map<K, List<V>>. But that is a bit more inconvenient to work with, since you have to initialise a list for every key.
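On Java 8 or later, Map.computeIfAbsent takes care of most of that list initialisation. A short sketch, where ScoreBook and its method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper around Map<K, List<V>> for names and scores.
final class ScoreBook {
    private final Map<String, List<Double>> scoresByName = new HashMap<>();

    // Creates the list the first time a name is seen, then appends the score.
    void addScore(String name, double score) {
        scoresByName.computeIfAbsent(name, k -> new ArrayList<>()).add(score);
    }

    // Returns every score stored for the name, or an empty list.
    List<Double> scoresOf(String name) {
        return scoresByName.getOrDefault(name, Collections.<Double>emptyList());
    }
}
```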
I'd rather go with my own datamodel and store that in a list, or map if you want fast access, e.g.
public class Player {
private String name;
private List<Float> scores;
}
The advantages:
you can easily see what the structure wants to express
you can easily extend it (e.g. add aliases for the player, or calculate the average score of player 1)
