How do I guarantee the order of items in a collection

I have a list of objects, and each object in the list has a position that must not change unless it is changed explicitly; in other words, the objects are in a queue. Which collection should I use in my entity class to store these objects, and how should that collection be annotated?
I currently have this:
@Entity
class Foo {
    ...
    @OneToMany(mappedBy = "foo", cascade = CascadeType.ALL)
    List<Bar> bars = new ArrayList<Bar>();
    ...
}
If this is not possible with pure JPA: I'm using EclipseLink as my JPA provider, so EclipseLink-specific annotations are OK if nothing else helps.
EDIT: Note, people, the problem is not that Java wouldn't preserve the order; I know most collections do. The problem is that I don't know a smart way to make JPA preserve the order. Having an order ID and making the query order by it is possible, but in my case maintaining the order ID is laborious (because the user interface allows reordering of items), and I'm looking for a smarter way to do it.

If you want this to be ordered after round-tripping to SQL, you should provide some sort of ordering ID within the entity itself - SQL is naturally set-oriented rather than list-oriented. You can then either sort after you fetch them back, or make sure your query specifies the ordering too.
If you give the entity an auto-generated integer ID this might work, but I wouldn't like to guarantee it.
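As a minimal sketch of the "ordering ID" idea, assuming a hypothetical QueuedBar class with an integer position field (neither name comes from the question): the list can be sorted after fetching, and the positions can be rewritten from the list index whenever an item is moved. This is plain Java with no JPA involved, so it only illustrates the bookkeeping, not the mapping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the Bar entity; "position" plays the role of the ordering ID.
class QueuedBar {
    final String name;
    int position;

    QueuedBar(String name, int position) {
        this.name = name;
        this.position = position;
    }
}

class QueueOrdering {
    // Restore queue order after the rows come back in arbitrary order.
    static void sortByPosition(List<QueuedBar> bars) {
        bars.sort(Comparator.comparingInt(b -> b.position));
    }

    // Move one element, then rewrite every position from the list index,
    // so the calling code never has to maintain the numbers by hand.
    static void move(List<QueuedBar> bars, int from, int to) {
        bars.add(to, bars.remove(from));
        for (int i = 0; i < bars.size(); i++) {
            bars.get(i).position = i;
        }
    }

    public static void main(String[] args) {
        List<QueuedBar> bars = new ArrayList<>();
        bars.add(new QueuedBar("c", 2));
        bars.add(new QueuedBar("a", 0));
        bars.add(new QueuedBar("b", 1));
        sortByPosition(bars);   // a, b, c
        move(bars, 2, 0);       // c, a, b with positions 0, 1, 2
        for (QueuedBar b : bars) {
            System.out.println(b.position + " " + b.name);
        }
    }
}
```

Renumbering the whole list on every move is the simplest choice; it touches every row, which is usually fine for short queues.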

Use a sort order ID, as Jon suggested, then add an @OrderBy annotation below the @OneToMany. This orders the collection by the specified field whenever it is loaded.
As an example, if you add a new field called "sortId" to Bar, Foo would look like this:
@Entity
class Foo {
    ...
    @OneToMany(mappedBy = "foo", cascade = CascadeType.ALL)
    @OrderBy("sortId ASC")
    List<Bar> bars = new ArrayList<Bar>();
    ...
}

You can:
Sort a List before creation
Sort a List after creation
Use a collection that sorts on insert, such as TreeMap or TreeSet

A LinkedList implements the Queue interface in Java and allows you to add things in the middle.
TBH, most of the collections are ordered, aren't they?
Check the docs; most say whether or not they are ordered.

It's worth trying LinkedList instead of ArrayList; however, as Jon said, you need to find a way of persisting the order information.
A solution will probably involve assigning an order number to each entry and storing the entries in a SortedMap, then converting that into a List, if a List is what you need.
However, the ORM could potentially be clever enough to do all the conversions for you if you stored the collection as a LinkedList.

Related

How to properly modify attached hibernate collections without too many side effects

I have the following code:
I have a unidirectional one-to-many relationship between Article and Comments:
@Entity
public class Article {
    @OneToMany(orphanRemoval=true)
    @JoinColumn(name = "article_id")
    private List<Comment> comments = new ArrayList<>();
    …
}
I set orphanRemoval=true in order to mark the "child" entity to be removed when it's no longer referenced from the "parent" entity, e.g. when you remove the child entity from the corresponding collection of the parent entity.
Here is an example:
@Service
public class MyService {
    public Article modifyComment(Long articleId) {
        Article article = repository.findById(articleId);
        List<Comment> comments = article.getComments();
        // Calls a method which removes some comments from the collection based on some logic
        removeSomeComments(comments); // side effect
        modifyComments(comments); // side effect
        .....
        return repository.save(article);
    }
}
So I have some statements that perform some actions on the collection, which will then get persisted in the database. In the example above I am getting the article from the database, performing some mutations on the object, by deleting/modifying some comments and then saving it in the database.
I am not sure what the cleanest way is to modify collections of objects without having too many side effects, which lead to error-prone code (my code is more complex and requires multiple mutations on the collection).
Since I am inside a transaction, any changes to the collection (adding, deleting or modifying children) will be persisted the next time the transaction is committed.
However, I tried to refactor this code and write it in more expressive functional style:
public Article modifyComment(Long articleId) {
    Article article = repository.findById(articleId);
    List<Comment> updatedComments = article.getComments().stream()
        .filter(some logic..) //remove some comments from the list based on a filter
        .sorted()
        .filter(again some logic) //do more stuff
        .collect(Collectors.toList());
    article.setComments(updatedComments); // assigns a new list to the entity
    return repository.save(article);
}
I like this approach more, as it is short, concise and more expressive.
However, this won't work, since it throws:
A collection with cascade=“all-delete-orphan” was no longer referenced by the owning entity instance
That's because I am assigning a new list (updatedComments).
If I want to remove or modify children of the parent, I have to modify the contents of the existing list instead of assigning a new list.
So I had to do this at the end:
article.getComments().clear();
article.getComments().addAll(updatedComments);
repository.save(article);
Do you consider the second example good practice?
I am not sure how to work with collections in JPA.
My business logic is more complex, and I want to avoid having 3-4 methods that mutate a given collection (attached to a Hibernate session) which was passed in as a parameter.
I think the second example has less potential for side effects because it doesn't mutate any input parameter. What do you think?
(I am using Spring Boot 2.2.5.)

You can turn the predicate logic used in your filter
.filter(some logic..) //remove some comments from the list based on a filter
into a predicate for removeIf and perform the modification in place:
Article article = repository.findById(articleId);
article.getComments().removeIf(...inverse of some logic...); // mutates the existing list
return repository.save(article);
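A rough illustration of the in-place approach, using plain strings as a stand-in for Comment and a made-up rule (drop blank comments, then sort the rest). The class and method names are hypothetical. The point is only that removeIf and sort change the existing list instead of producing a new one, which is what a collection mapped with orphanRemoval needs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class InPlaceUpdate {
    // Hypothetical rule: drop blank comments, then sort what is left.
    // Both calls change the list that was passed in; no new list is created.
    static void prune(List<String> comments) {
        comments.removeIf(c -> c.trim().isEmpty()); // inverse of a "keep non-blank" filter
        comments.sort(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<String> comments = new ArrayList<>(Arrays.asList("zeta", "  ", "alpha", ""));
        prune(comments);
        System.out.println(comments); // [alpha, zeta]
    }
}
```

Keeping all the mutation inside one small method like this limits the side effects to a single, clearly named place.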

Which approach is better for the database and for sorting speed?

I currently have a list
@OneToMany(mappedBy = "movie", orphanRemoval = true, cascade = CascadeType.ALL, fetch = FetchType.LAZY)
@OrderBy("date ASC")
private List<MovieReleaseDateEntity> releaseDates = new ArrayList<>();
so the entities are sorted by date when they are fetched.
However, as you know, Lists are not the most suitable for JPA and it is recommended to use a Set. That's why I have a different idea:
@OneToMany(mappedBy = "movie", orphanRemoval = true, cascade = CascadeType.ALL, fetch = FetchType.LAZY)
private SortedSet<MovieReleaseDateEntity> releaseDates = new TreeSet<>();
with the entity implementing Comparable:
public int compareTo(MovieReleaseDateEntity o) {
    return this.date.compareTo(o.date);
}
Which way is better? The first with a List or the second with a Set?

Databases are designed to execute an ORDER BY efficiently, provided the column used in the ORDER BY is indexed.
So you should favor ordering in the database if you can adjust the indexes on your tables.
"However, as you know, Lists are not the most suitable for JPA and it is recommended to use a Set."
I would rather say that you should not abuse List mappings in an entity, since the number of elements that is acceptable in a mapped collection is limited for common-sense reasons (cardinality).
You can also look at it more simply: if you have few elements, using a Set is probably acceptable in terms of performance.
Otherwise, using a List with an ORDER BY done by the database should probably be favored.
In any case, use the structure (List or Set) that best fits your use case, and then measure the actual performance before changing your design in the hope of better performance.

The first approach (@OrderBy("date ASC")) sorts at the database level, while the second sorts in memory in Java. If the returned list is large, it's better to use the first approach.
If the returned collection is fairly small and you need some complex sort logic (which is not your case, since you sort by a single field), sorting with a SortedSet for an entity that implements Comparable might be justified.
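One thing that may be worth checking before switching to the SortedSet variant: a TreeSet treats two elements as duplicates when compareTo returns 0, so with a compareTo based on the date alone, two release dates on the same day would collapse into one entry. A small sketch with a hypothetical ReleaseDate class (no JPA mapping; the country field is an assumption made for the example):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for MovieReleaseDateEntity, compared by date only.
class ReleaseDate implements Comparable<ReleaseDate> {
    final String country;
    final LocalDate date;

    ReleaseDate(String country, LocalDate date) {
        this.country = country;
        this.date = date;
    }

    @Override
    public int compareTo(ReleaseDate o) {
        return this.date.compareTo(o.date);
    }
}

class SortedSetPitfall {
    // A TreeSet keeps only one element per distinct compareTo result.
    static int sizeInTreeSet(List<ReleaseDate> dates) {
        return new TreeSet<ReleaseDate>(dates).size();
    }

    // A sorted List keeps every element, including those with equal dates.
    static List<ReleaseDate> sortedCopy(List<ReleaseDate> dates) {
        List<ReleaseDate> copy = new ArrayList<>(dates);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<ReleaseDate> dates = Arrays.asList(
                new ReleaseDate("US", LocalDate.of(2020, 1, 10)),
                new ReleaseDate("UK", LocalDate.of(2020, 1, 10)),
                new ReleaseDate("PL", LocalDate.of(2019, 12, 25)));
        System.out.println(sizeInTreeSet(dates));     // 2
        System.out.println(sortedCopy(dates).size()); // 3
    }
}
```

If equal dates can occur, the comparison would need a tie-breaker (for example a second field) for the Set variant to keep all rows.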

Basic difference between <bag> and <list>

I am learning Hibernate and how collections are used in it. I know that a bag is used for mapping a property of type Collection or List. I also know that the difference between a bag and a list is that a bag is an unordered collection type that allows duplicates, whereas a list maintains the insertion order of its elements.
1> But apart from this, is there any other difference between the two?
2> I read in one book that,
bag is the lack of objects to be used as keys for the elements in
the bag, which decreases performance when updating or deleting
elements. When an element of the bag changes, Hibernate must update
all of the elements since there is no way for Hibernate to find out
which element has changed
Does anyone have any idea about this?

Your definition is correct. A bag works like a list without an index (you don't know the order of the elements), so it's similar to a Set that allows duplicates.
The most important thing to know is that Hibernate can implicitly map your collections as a bag if you don't use an index column in a one-to-many relation. This may decrease the performance of delete/update statements, and it's good to be aware of this.
Here you can find how it works internally:
http://assarconsulting.blogspot.co.uk/2009/08/why-hibernate-does-delete-all-then-re.html

When you don't need the insertion-order capability of a list but want to allow duplicate values, you can go for a bag. You can't use a set here because it doesn't allow duplicate values.

a) Bag
I just want to add one more point. There are two types of bags: 1) bags without IDs and 2) bags with IDs.
In a bag without IDs, when you remove any element, the entire bag is cleared and the remaining elements are inserted again.
But in a bag with IDs, only the element that has been removed is deleted; the rest of the elements are not affected.
@ElementCollection
@CollectionTable(name = "account_user", joinColumns = @JoinColumn(name = "user_id"))
@CollectionId(columns = { @Column(name = "account_user_id") }, generator = "sequence", type = @Type(type = "long"))
@Column(name = "account_provider")
private Collection<String> accountSet = new ArrayList<String>();
So if you are using a bag, always try to use an ID bag unless you have a good reason to use another kind.
b) List
Lists are also of two types: lists with order and lists without order.
A list without order is similar to a bag without IDs.
@ElementCollection
@CollectionTable(name = "account_user", joinColumns = @JoinColumn(name = "user_id"))
@Column(name = "account_provider")
private List<String> accountSet = new ArrayList<String>();
In a list with order, the data structure maintains an index. So, if you remove one of the elements, the rest of the elements are shifted automatically.
Hence, this type of list is used to maintain the order in which the elements are inserted into the list.
@ElementCollection
@OrderColumn(name = "account_provider_order")
@CollectionTable(name = "account_user", joinColumns = @JoinColumn(name = "user_id"))
@Column(name = "account_provider")
private List<String> accountSet = new ArrayList<String>();
Also note that although the ordering is persisted in a separate column of the table, it doesn't appear in the object state when you fetch. Hence, it is just used for internal operations.

To initialize or not initialize JPA relationship mappings?

In one-to-many JPA associations, is it considered a best practice to initialize relationships to empty collections? For example:
@Entity
public class Order {
    @Id
    private Integer id;
    // should the line items be initialized with an empty array list or not?
    @OneToMany(mappedBy="order")
    List<LineItem> lineItems = new ArrayList<>();
}
In the above example, is it better to define lineItems with a default value of an empty ArrayList or not? What are the pros and cons?

JPA itself doesn't care whether the collection is initialized or not. When retrieving an Order from the database with JPA, JPA will always return an Order with a non-null list of OrderLines.
Why: because an Order can have 0, 1 or N lines, and that is best modeled with an empty, one-sized or N-sized collection. If the collection was null, you would have to check for that everywhere in the code. For example, this simple loop would cause a NullPointerException if the list was null:
for (OrderLine line : order.getLines()) {
...
}
So it's best to make that an invariant by always having a non-null collection, even for newly created instances of the entity. That makes the production code creating new orders safer and cleaner. That also makes your unit tests, using Order instances not coming from the database, safer and cleaner.
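To illustrate the invariant, here is a small sketch with hypothetical PlainOrder and PlainLine classes (plain Java, no JPA annotations). A freshly constructed order can be iterated and added to without any null check:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical order line; only a quantity, to keep the sketch short.
class PlainLine {
    final int quantity;

    PlainLine(int quantity) {
        this.quantity = quantity;
    }
}

// Hypothetical order whose collection is never null, even when newly created.
class PlainOrder {
    private List<PlainLine> lines = new ArrayList<>();

    List<PlainLine> getLines() {
        return lines;
    }

    // Safe for a brand-new order: the loop simply runs zero times.
    int totalQuantity() {
        int total = 0;
        for (PlainLine line : getLines()) {
            total += line.quantity;
        }
        return total;
    }

    public static void main(String[] args) {
        PlainOrder order = new PlainOrder();
        System.out.println(order.totalQuantity()); // 0
        order.getLines().add(new PlainLine(3));
        System.out.println(order.totalQuantity()); // 3
    }
}
```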

I would also recommend using Guava's immutable collections, e.g.,
import com.google.common.collect.ImmutableList;
// ...
@OneToMany(mappedBy="order")
List<LineItem> lineItems = ImmutableList.of();
This idiom never creates a new empty list, but reuses a single instance representing an empty list (the type does not matter). This is a very common practice of functional programming languages (Scala does this too) and reduces to zero the overhead of having empty objects instead of null values, making any efficiency argument against the idiom moot.
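A caveat that seems worth testing before adopting this: code that adds to the collection of a newly created entity will fail if the default value is immutable. The sketch below uses Collections.emptyList() from the JDK as a stand-in for Guava's ImmutableList.of() (both reject add with an UnsupportedOperationException); the Basket class is hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical class whose collection defaults to a shared immutable empty list.
class Basket {
    List<String> items = Collections.emptyList();

    // Returns false when the default list rejects modification.
    static boolean canAddTo(Basket basket) {
        try {
            basket.items.add("apple");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canAddTo(new Basket())); // false
    }
}
```

So this default only suits entities whose collections are replaced, never mutated, before they are persisted.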

I would rather prefer a utility like this:
public static <T> void forEach(Collection<T> values, Consumer<T> consumer) {
if (values != null) values.stream().forEach(consumer);
}
and use it in code like:
Utils.forEach(entity.getItems(), item -> {
// deal with item
});

My suggestion would be to not initialize them.
We ran into a situation where we initialized our collections, then retrieved the same entity essentially twice in succession. After the second retrieve, a lazy-loaded collection that should have had data was empty after calling its getter. If we called the getter after the first retrieve, on the other hand, the collection did load the data. The theory is that the second retrieve got a managed entity from the session that had its collection initialized to empty and appeared to be already loaded or modified, and therefore no lazy load took place. The solution was to NOT initialize the collections. This way we could retrieve the entity multiple times in the transaction and have its lazy-loaded collections load correctly.
One more item to note: in a different environment, the behavior was different. The collection was lazy loaded just fine when calling the collection's getter on the entity that was retrieved the second time in the same transaction.
Unfortunately, I don't have information on what was different between the two environments. It appears, although we didn't prove it 100% and didn't identify the implementations, that different JPA implementations work differently with respect to initialized collections.
We were using Hibernate; we just don't know which version we were using on each of the two platforms.

Java collection for this use case

Let's say we have a bunch of Car objects.
Each Car has some distinguishing properties e.g. manufacturer, model, year, etc. (these can be used to create distinct hashCodes).
Each car has a List of PurchaseOffer objects (a PurchaseOffer object contains pricing/retailer info).
We receive Lists of Cars from several different sources, each Car with a single PurchaseOffer.
Thing is, these lists may overlap - a Car can appear in more than one list.
We wish to aggregate the lists into a single collection of Cars where each Car holds all encountered PurchaseOffers for it.
My problem is choosing which collection to use in this aggregation process:
It feels natural to use java.util.HashSet for holding our cars; that way, when going over the different lists of Cars, we can check whether a car already exists in the Set in amortized O(1).
However, you cannot retrieve an element from a Set (in our case, when we encounter a Car that already exists in the Set, we would like to retrieve that Car from the Set based on its identifying hashCode and add PurchaseOffers to it).
I can use a HashMap where each Car's hashCode maps to the actual Car object, but it probably isn't the textbook solution since it is unsafe: I would have to make sure myself that every hashCode maps to a Car with that hashCode, so there could be inconsistency.
Of course, I can make a dedicated data structure that guarantees this consistency, but shouldn't one already exist?
Can anyone suggest the data structure I am after, or point out a design mistake?
Thanks.

Since this is a many-to-many relationship, you need a bi-directional multi-map. Car is the key for the first one, with a List of PurchaseOffer as the value. The PurchaseOffer is the key for the second one, with a List of Cars as the value.
The underlying implementation is two HashMaps.
Put an API on top of it to get the behavior you need. Or see if Google Collections can help you. It's a combination of a BiMap and two MultiMaps.

I think that you really do need (at least) a HashMap<Car, List<PurchaseOffer>> ... as suggested by @Andreas_D
Your objection that each Car already has a List<PurchaseOffer> is beside the point. The list in the HashMap is the aggregate list, containing all PurchaseOffer objects from all Car objects that stand for the same physical car.
The point of creating a new list is to avoid changing the original lists on the original Car objects. (If that was not a concern, then you could pick one instance of Car from the set that represents a physical car, and merge the PurchaseOffer objects from the others into that list.)
I'm not entirely sure why @duffymo suggested a bi-directional map, but I think it is because the different Car objects from different sources may have complementary (or contradictory) information for the same physical car. By keeping all instances, you avoid discarding information. (Once again, if you are happy to mutate and/or discard information, you could attempt to merge the information about each individual car into a single Car object.)
If you really didn't care about preserving information and were prepared to merge stuff willy-nilly, then the following approach would probably work:
HashMap<Car, Car> map = new HashMap<Car, Car>(...);
for (Car car : carsToBeAggregated) {
    Car master = map.get(car);
    if (master == null) {
        // first time this car is seen: it becomes the master copy
        map.put(car, car);
    } else {
        master.offers.addAll(car.offers);
        // optionally, merge other Car information from car to master
    }
}
You should NOT be trying to use the Car.hashCode() as a key for anything. Hashcode values are not unique identifiers: there is a distinct possibility that two different cars will end up with the same hashcode value. If you attempt to use them as if they were unique identifiers you'll get into trouble ...

The basic data structure should be a HashMap<Car, List<PurchaseOffer>>. This allows for storing and retrieving all offers for one selected car.
Now you may have to find a suitable implementation for Car.equals() to ensure that "cars" coming from different sources are really the same. What about basing equals() on a unique identifier for a real-world car (VIN)?
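A minimal sketch of that idea, assuming a hypothetical VinCar class whose equals() and hashCode() depend on the VIN only, with offers simplified to strings. The class names and the aggregate method are made up for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: the VIN alone decides whether two objects are the same car.
final class VinCar {
    final String vin;
    final List<String> offers = new ArrayList<>(); // offers simplified to strings

    VinCar(String vin, String offer) {
        this.vin = vin;
        this.offers.add(offer);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof VinCar && vin.equals(((VinCar) o).vin);
    }

    @Override
    public int hashCode() {
        return vin.hashCode();
    }
}

class OfferAggregator {
    // Collects every offer seen for a car into one aggregate list per car,
    // without touching the lists on the incoming VinCar objects.
    static Map<VinCar, List<String>> aggregate(List<List<VinCar>> sources) {
        Map<VinCar, List<String>> byCar = new LinkedHashMap<>();
        for (List<VinCar> source : sources) {
            for (VinCar car : source) {
                byCar.computeIfAbsent(car, k -> new ArrayList<>()).addAll(car.offers);
            }
        }
        return byCar;
    }

    public static void main(String[] args) {
        List<VinCar> first = Arrays.asList(new VinCar("VIN1", "offer A"), new VinCar("VIN2", "offer B"));
        List<VinCar> second = Arrays.asList(new VinCar("VIN1", "offer C"));
        Map<VinCar, List<String>> result = aggregate(Arrays.asList(first, second));
        System.out.println(result.get(new VinCar("VIN1", "ignored"))); // [offer A, offer C]
    }
}
```

The map handles hash collisions itself through equals(), so no manual consistency check on hash codes is needed.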

I would prefer to use a HashMap<Car, List<PurchaseOffer>>, as suggested before (Andreas, Stephen), mainly if the Car object does not hold the list of PurchaseOffers.
Otherwise I would consider using a HashMap<Car, Car> or, better IMO, a HashMap<ID, Car> if there is a unique ID for each Car.
You cannot simply map the Car's hashCode to the Car, as mentioned in the question, since distinct Cars can have the same hashCode!
(Anyway, I would create a separate class for storing and managing the Cars. This would contain the HashMap, or whichever structure you choose, so it's easy to change the implementation without needing to change its interface.)
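To make the collision problem concrete, here is a small sketch that keys a map by hashCode(), using plain strings instead of Car objects. The strings "Aa" and "BB" are a well-known pair with the same hash code in Java, so the second entry silently overwrites the first. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class HashCodeAsKey {
    // Keys a map by hashCode(), the approach the question considers.
    static Map<Integer, String> indexByHash(String... names) {
        Map<Integer, String> byHash = new HashMap<>();
        for (String name : names) {
            byHash.put(name.hashCode(), name); // a colliding name replaces the earlier one
        }
        return byHash;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(indexByHash("Aa", "BB").size());     // 1
    }
}
```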

Create your own custom class that extends HashSet, override the contains(Object o) method to check whether the hash code is the same and return the result accordingly, and add an object to the set if and only if the set does not already contain that object.

How about defining a new custom Aggregation class? Define hashCode() such that the ID of the car acts as the key, and override equals() accordingly. Define a custom method for accepting your original car and doing a union operation on the lists. Finally, store the custom objects in a HashSet to achieve constant-time lookup.
In purist terms, aggregation is a behavior beyond the scope of a single object. The Visitor pattern tries to address a similar problem.
Alternatively, if you have a SQL datastore, a simple SELECT using GROUP BY would do the trick.

"Welp, yeah, HashMap<Car, List<PurchaseOffer>> would be perfect if it wasn't for the fact that each Car contains a List<PurchaseOffer> as a property. Can say that a Car object is composed of two parts: an identifying part (let's say each car indeed has a unique VIN), and the list of PurchaseOffers."
In this case, split the Car class into two classes: a CarType class with the identifying attributes, and the list part (maybe both used together by Car). Then use Map<CarType, List<PurchaseOffer>> for your data structure (or MultiMap<CarType, PurchaseOffer>).

//alt. 1
List<Offer> offers;
List<Car> cars;
Map<Car, List<Offer>> mapCarToOffers;
Map<Offer, List<Car>> mapOfferToCars;
public List<Offer> getOffersForCar(Car aCar);
public List<Car> getCarsForOffer(Offer anOffer);
Alternative 1 would make use of the hashCode() of Car and Offer.
//alt. 2
List<Offer> offers;
List<Car> cars;
Map<Integer, List<Offer>> mapCarIdToOffers;
Map<Integer, List<Car>> mapOfferIdToCars;
public List<Offer> getOffersForCarId(int aCarId);
public List<Car> getCarsForOfferId(int anOfferId);
Alternative 2 would make use of the hashCode() of Integer. This would allay your concerns about "safety", as the hash codes for Integer objects do not overlap where the values are unique. This incurs the additional overhead of having to maintain unique IDs for each Car and Offer object; however, I am guessing that you probably already have those from your business requirements.
Note, you may choose to use other classes as alternative to ints for ID's (e.g. String).
For both alternatives, implement the Lists with ArrayList or LinkedList - which one is better is up to you to determine based on other requirements, such as the frequency of insertion/deletion vs lookup. Implement the Maps with HashMap - see comments above about how hash codes are used.
As a side note, in our software we use both of the above to represent similar types of many-to-many data, very similar to your use case.
Both alternatives work very well.

Why not use an object database for this? You could store any object graph you wanted, and you'd get a search API with which you could do any relationship/retrieval mechanism you wanted. A simple collection could work, but it sounds like you want a more complex relationship than a collection would provide. Look into db4o (http://db4o.com) - it's very powerful for this sort of thing.
