To initialize or not initialize JPA relationship mappings? - java

In one to many JPA associations is it considered a best practice to initialize relationships to empty collections? For example.
@Entity
public class Order {
    @Id
    private Integer id;
    // should the line items be initialized with an empty array list or not?
    @OneToMany(mappedBy="order")
    List<LineItem> lineItems = new ArrayList<>();
}
In the above example is it better to define lineItems with a default value of an empty ArrayList or not? What are the pros and cons?

JPA itself doesn't care whether the collection is initialized or not. When retrieving an Order from the database with JPA, JPA will always return an Order with a non-null list of OrderLines.
Why: because an Order can have 0, 1 or N lines, and that is best modeled with an empty, one-sized or N-sized collection. If the collection was null, you would have to check for that everywhere in the code. For example, this simple loop would cause a NullPointerException if the list was null:
for (OrderLine line : order.getLines()) {
...
}
So it's best to make that an invariant by always having a non-null collection, even for newly created instances of the entity. That makes the production code creating new orders safer and cleaner. That also makes your unit tests, using Order instances not coming from the database, safer and cleaner.
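A minimal sketch of that invariant, with the JPA annotations left out so that it compiles without any persistence library. The names `SketchOrder`, `SketchLine`, `addLine` and `countLines` are illustrative assumptions, not part of the question's model:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the line entity (annotations omitted).
class SketchLine {
    final String product;

    SketchLine(String product) {
        this.product = product;
    }
}

// Hypothetical stand-in for the Order entity (annotations omitted).
class SketchOrder {
    // Initialized eagerly, so a freshly constructed order never exposes null.
    private List<SketchLine> lines = new ArrayList<>();

    List<SketchLine> getLines() {
        return lines;
    }

    void addLine(SketchLine line) {
        lines.add(line);
    }

    // Safe on a brand-new instance: the loop body simply never runs.
    int countLines() {
        int count = 0;
        for (SketchLine line : getLines()) {
            count++;
        }
        return count;
    }
}
```

A unit test can then call `new SketchOrder().countLines()` directly, without any null check and without going through the database.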

I would also recommend using Guava's immutable collections, e.g.,
import com.google.common.collect.ImmutableList;
// ...
@OneToMany(mappedBy="order")
List<LineItem> lineItems = ImmutableList.of();
This idiom never creates a new empty list, but reuses a single instance representing an empty list (the type does not matter). This is a very common practice of functional programming languages (Scala does this too) and reduces to zero the overhead of having empty objects instead of null values, making any efficiency argument against the idiom moot.

I would prefer a utility like this:
public static <T> void forEach(Collection<T> values, Consumer<T> consumer) {
if (values != null) values.stream().forEach(consumer);
}
and use it in code like:
Utils.forEach(entity.getItems(), item -> {
// deal with item
});

My suggestion would be to not initialize them.
We ran into a situation where we initialized our collections, then retrieved the same entity essentially twice in succession. After the second retrieval, a lazy-loaded collection that should have had data was empty when we called its getter. If we called the getter after the first retrieval, on the other hand, the collection did load the data. Our theory is that the second retrieval got a managed entity from the session whose collection had been initialized to empty and so appeared to be already loaded or modified, and therefore no lazy load took place. The solution was to NOT initialize the collections. This way we could retrieve the entity multiple times in the transaction and have its lazy-loaded collections load correctly.
One more item to note: in a different environment, the behavior was different. The collection was lazy loaded just fine when calling the collection's getter on the entity that was retrieved the second time in the same transaction.
Unfortunately I don't have information on what was different between the two environments. It appears - although we didn't prove it 100% and didn't identify the implementations - that different JPA implementations work differently with respect to initialized collections.
We were using Hibernate; we just don't know which version was in use on each of the two platforms.

Related

How to properly modify attached hibernate collections without too many side effects

I have the following code:
I have a unidirectional one-to-many relationship between Article and Comments:
@Entity
public class Article {
    @OneToMany(orphanRemoval=true)
    @JoinColumn(name = "article_id")
    private List<Comment> comments = new ArrayList<>();
    …
}
I set orphanRemoval=true in order to mark the "child" entity for removal when it is no longer referenced from the "parent" entity, e.g. when you remove the child entity from the corresponding collection of the parent entity.
Here is an example:
@Service
public class MyService {
    public Article modifyComment(Long articleId) {
        Article article = repository.findById(articleId);
        List<Comment> comments = article.getComments();
        // Calls a method which removes some comments from the collection based on some logic
        removeSomeComments(comments); //side effect
        modifyComments(comments); //side effect
        .....
        return repository.save(article);
    }
}
So I have some statements that perform some actions on the collection, which will then get persisted in the database. In the example above I am getting the article from the database, performing some mutations on the object, by deleting/modifying some comments and then saving it in the database.
I am not sure what the cleanest way is to modify collections of objects without too many side effects, which lead to error-prone code (my code is more complex and requires multiple mutations on the collection).
Since I am inside a transaction, any changes to the collection (adding, deleting or modifying children) will be persisted when the transaction commits.
However, I tried to refactor this code and write it in more expressive functional style:
public Article modifyComment(Long articleId) {
    Article article = repository.findById(articleId);
    List<Comment> updatedComments = article.getComments().stream()
        .filter(some logic..) //remove some comments from the list based on a filter
        .sorted()
        .filter(again some logic) //do more stuff
        .collect(Collectors.toList());
    article.add(updatedComments);
    return repository.save(article);
}
I like this approach more, as it is short, concise and more expressive.
However this won't work since it throws:
A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance
That's because I am assigning a new list (updatedComments) .
If I want to remove or modify children from the parent I have to modify the contents of the list instead of assigning a new list.
So I had to do this at the end:
article.getComments().clear();
article.getComments().addAll(updatedComments);
repository.save(article);
Do you consider the second example a good practice?
I am not sure how to work with collections in JPA.
My business logic is more complex and I want to avoid having 3-4 methods that mutate a given collection (attached to a Hibernate session) which was passed in as a parameter.
I think the second example has less potential for side effects because it doesn't mutate any input parameter. What do you think?
(I am using Spring-Boot 2.2.5)
You can actually try and turn the predicate logic used in your filter
.filter(some logic..) //remove some comments from the list based on a filter
to be used within removeIf and perform the modification as:
Article article = repository.findById(articleId);
article.getComments().removeIf(...inverse of some logic...); //this
return repository.save(article);
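As a dependency-free sketch of why this helps: both `removeIf` and the `clear()` plus `addAll()` combination change the contents of the existing list object instead of swapping in a new one, so whatever is tracking that list instance keeps seeing the same reference. The class `CommentFilterSketch`, its two helpers, and the use of plain strings in place of `Comment` entities are assumptions made for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class CommentFilterSketch {
    // Keeps only the elements matching 'keep', mutating the list that was passed in.
    static <T> void pruneInPlace(List<T> comments, Predicate<T> keep) {
        comments.removeIf(keep.negate());
    }

    // Replaces the contents while keeping the same list instance (clear + addAll).
    static <T> void replaceContents(List<T> target, List<T> replacement) {
        // Copy first, in case 'replacement' is a view backed by 'target'.
        List<T> copy = new ArrayList<>(replacement);
        target.clear();
        target.addAll(copy);
    }
}
```

The stream pipeline from the question can still compute the new contents; only the final step needs to go through one of these in-place operations rather than an assignment.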

Java variable type Collection for HashSet or other implementations?

I have often seen declarations like List<String> list = new ArrayList<>(); or Set<String> set = new HashSet<>(); for fields in classes. For me it makes perfect sense to use the interfaces for the variable types to provide flexibility in the implementation. The examples above do still define which kind of Collections have to be used, respectively which operations are allowed and how it should behave in some cases (due to docs).
Now consider the case where actually only the functionality of the Collection (or even the Iterable) interface is required to use the field in the class and the kind of Collection doesn't actually matter or I don't want to overspecify it. So I choose for example HashSet as implementation and declare the field as Collection<String> collection = new HashSet<>();.
Should the field then actually be of type Set in this case? Is this kind of declaration bad practice and, if so, why? Or is it good practice to specify the actual type as little as possible (while still providing all required methods)? The reason I ask is that I have hardly ever seen such a declaration, and lately I find myself more and more in the situation where I only need the functionality of the Collection interface.
Example:
// Only need Collection features, but decided to use a LinkedList
private final Collection<Listener> registeredListeners = new LinkedList<>();
public void init() {
ExampleListener listener = new ExampleListener();
registerListenerSomewhere(listener);
registeredListeners.add(listener);
listener = new ExampleListener();
registerListenerSomewhere(listener);
registeredListeners.add(listener);
}
public void reset() {
for (Listener listener : registeredListeners) {
unregisterListenerSomewhere(listener);
}
registeredListeners.clear();
}
Since your example uses a private field it doesn't matter all that much about hiding the implementation type. You (or whoever is maintaining this class) can always just go look at the field's initializer to see what it is.
Depending on how it's used, though, it might be worth declaring a more specific interface for the field. Declaring it to be a List indicates that duplicates are allowed and that ordering is significant. Declaring it to be a Set indicates that duplicates aren't allowed and that ordering is not significant. You might even declare the field to have a particular implementation class if there's something about it that's significant. For example, declaring it to be LinkedHashSet indicates that duplicates aren't allowed but that ordering is significant.
The choice of whether to use an interface, and what interface to use, becomes much more significant if the type appears in the public API of the class, and on what the compatibility constraints on this class are. For example, suppose there were a method
public ??? getRegisteredListeners() {
return ...
}
Now the choice of return type affects other classes. If you can change all the callers, maybe it's no big deal; you just have to edit other files. But suppose the caller is an application that you have no control over. Now the choice of interface is critical, as you can't change it without potentially breaking the applications. The rule here is usually to choose the most abstract interface that supports the operations you expect callers to want to perform.
Most of the Java SE APIs return Collection. This provides a fair degree of abstraction from the underlying implementation, but it also provides the caller a reasonable set of operations. The caller can iterate, get the size, do a contains check, or copy all the elements to another collection.
Some code bases use Iterable as the most-abstract interface to return. All it does is allow the caller to iterate. Sometimes this is all that's necessary, but it might be somewhat limiting compared to Collection.
Another alternative is to return a Stream. This is helpful if you think the caller might want to use stream's operations (such as filter, map, find, etc.) instead of iterating or using collection operations.
Note that if you choose to return Collection or Iterable, you need to make sure that you return an unmodifiable view or make a defensive copy. Otherwise, callers could modify your class's internal data, which would probably lead to bugs. (Yes, even an Iterable can permit modification! Consider getting an Iterator and then calling the remove() method.) If you return a Stream, you don't need to worry about that, since you can't use a Stream to modify the underlying source.
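The options discussed above can be sketched side by side. This assumes a hypothetical `ListenerRegistrySketch` class that stores listener names as plain strings; none of these method names come from the question:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

class ListenerRegistrySketch {
    private final List<String> listeners = new ArrayList<>();

    void register(String listener) {
        listeners.add(listener);
    }

    // Read-only view: cheap, reflects later registrations, rejects mutation.
    Collection<String> view() {
        return Collections.unmodifiableCollection(listeners);
    }

    // Defensive copy: a snapshot that the caller may modify freely.
    Collection<String> snapshot() {
        return new ArrayList<>(listeners);
    }

    // Stream: callers can filter and map, but get no handle for changing the list.
    Stream<String> stream() {
        return listeners.stream();
    }

    // Leaky: the internal list typed as Iterable still allows Iterator.remove().
    Iterable<String> leaky() {
        return listeners;
    }
}
```

Calling `add` on the result of `view()` throws `UnsupportedOperationException`, whereas an iterator obtained from `leaky()` can silently remove registered listeners.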
Note that I turned your question about the declaration of a field into a question about the declaration of method return types. There is this idea of "program to the interface" that's quite prevalent in Java. In my opinion it doesn't matter very much for local variables (which is why it's usually fine to use var), and it matters little for private fields, since those (almost) by definition affect only the class in which they're declared. However, the "program to the interface" principle is very important for API signatures, so those cases are where you really need to think about interface types. Private fields, not so much.
(One final note: there is a case where you need to be concerned about the types of private fields, and that's when you're using a reflective framework that manipulates private fields directly. In that case, you need to think of those fields as being public -- just like method return types -- even though they're not declared public.)
As with all things, it's a question of tradeoffs. There are two opposing forces.
The more generic the type, the more freedom the implementation has. If you use Collection you're free to use an ArrayList, HashSet, or LinkedList without affecting the user/caller.
The more generic the return type, the fewer features are available to the user/caller. A List provides index-based lookup. A SortedSet makes it easy to get contiguous subsets via headSet, tailSet, and subSet. A NavigableSet provides efficient O(log n) binary search lookup methods. If you return Collection, none of these are available. Only the most generic access functions can be used.
Furthermore, the sub-types guarantee special properties that Collection does not: Sets hold unique items. SortedSets are sorted. Lists have an order; they're not unordered bags of items. If you use Collection then the user/caller can't necessarily assume that these properties hold. They may be forced to code defensively and, for instance, handle duplicate items even if you know there won't be duplicates.
A reasonable decision process might be:
If O(1) indexed access is guaranteed, use List.
If elements are sorted and unique, use SortedSet or NavigableSet.
If element uniqueness is guaranteed and order is not, use Set.
Otherwise, use Collection.
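To make the tradeoff concrete, here is a short sketch of operations that are only available once the more specific interface is part of the signature. The class `InterfaceChoiceSketch` and its method names are assumptions chosen for illustration:

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;

class InterfaceChoiceSketch {
    // List: positional access is part of the contract.
    static int second(List<Integer> values) {
        return values.get(1);
    }

    // SortedSet: a contiguous range, here every element strictly below 'limit'.
    static SortedSet<Integer> below(SortedSet<Integer> values, int limit) {
        return values.headSet(limit);
    }

    // NavigableSet: nearest-match lookup, here the greatest element <= 'target', or null.
    static Integer atMost(NavigableSet<Integer> values, int target) {
        return values.floor(target);
    }
}
```

None of these three methods could be written against a plain `Collection<Integer>` parameter without first copying or scanning the elements.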
It really depends on what you want to do with the collection object.
Collection<String> cSet = new HashSet<>();
Collection<String> cList = new ArrayList<>();
In this case you can, if you want, do:
cSet = cList;
But if you declare the variable as:
Set<String> cSet = new HashSet<>();
the above assignment is not permitted, though you can construct a new list from the set using the copy constructor:
Set<String> set = new HashSet<>();
List<String> list = new ArrayList<>();
list = new ArrayList<>(set);
So basically depending on the usage you can use Collection or Set interface.

emptyList() vs emptySet(), is there any reason to choose one over the other if an instance of Collection is needed?

In the JDK, there's Collections.emptyList() and Collections.emptySet(). Both are useful in their own right. But sometimes all that is needed is an empty, immutable instance of Collection. To me, there's no reason to choose one over the other, as both implement all operations of Collection in an efficient way and with the same results. Yet each time I need such an empty collection, I ponder which one to use for a second or two.
I do not expect to gain a deeper understanding of the collections framework from an answer to this question but maybe there's a subtle reason I could use to justify choosing one over the other without thinking about it ever again.
An answer should state at least one reason for preferring one of Collections.emptyList() and Collections.emptySet() over the other in a context where they're functionally equivalent. An answer is better if the stated reason is near the top of this list:
There's a case where the type system is happier with one over the other (e.g. type inference allows shorter code with one than the other).
There is a performance difference, maybe in some special case (e.g. if the empty collection is passed as an argument to some of the collection framework's static or instance methods like Collections.sort() or Collection.removeAll()).
Choosing one over the other "makes more sense" in the general case, if you think about it.
Examples where this question arises
To give some context, here are two examples where I am in need of an empty, unmodifiable collection.
This is an example of an API that allows creating some object by optionally specifying a collection of objects that are used in the creation. The second method just calls the first one with an empty collection:
static void createObjectWithTheseThings(Collection<Thing> things) {
...
}
static void createObjectWithoutAnyThings() {
createObjectWithTheseThings(Collections.emptyXXX());
}
This is an example of an Entity with state represented by an immutable collection stored in a non-final field. On initialization the field should be set to an empty collection:
class Example {
// Initialized to an empty collection.
private Collection<T> containedThings = Collections.emptyXXX();
...
}
Unfortunately I don't have an answer that will make the top of your priority list but if I were you I'd settle on
Collections.emptySet
Type inference was your first priority but I don't know if the choice can/should influence that given you were looking for an emptyCollection()
On the second priority, think about any api that takes in a collection which performs differently (accidentally/intentionally) based on the sub-interfaces of the concrete object passed in. Aren't they more likely to offer varied performance based on the concrete implementations (as with an ArrayList or LinkedList) instead? The empty set/list are not modeled on any empty data structures anyway; they are dummy implementations - hence no real difference
Based on java's modelling of these interfaces (which admittedly is not ideal), a Collection is very similar to a Set. In fact I think the methods are almost exactly the same. Logically too it looks OK with List being the specific-sub type that adds additional ordering concerns.
Now Collection and Set looking very similar(java-wise) brings up a question. If you are using a Collection type, it is clear it is not a list you want. Now the question is are you sure you don't mean a Set. If you don't, then are you using something like a Bag (surely there must be concrete instances which are not empty in the overall logic). So if you are concerned with say a Bag, then shouldn't it be up to the Bag api to provide an emptyBag() method? Just wondering. btw, I'd stick with emptySet() in the meantime :)
For the emptyXXX() methods, it really doesn't matter at all: since both are empty (and unmodifiable, so they always stay empty), they are equally suited to all operations Collection offers.
Take a look at what Collections really gives you there: Special implementations (the instances are shared across calls!). All relevant operations are dummy implementations that either return a constant result or immediately throw. Even iterator() is just a dummy with no state.
It won't make any notable difference at all.
Edit: You could say that, for the special case of emptyList/emptySet, they are the same semantically and complexity-wise at the Collection interface level. All operations available on Collection are implemented by emptySet/emptyList as O(1) operations. And since both follow the contract defined by Collection, they are semantically identical too.
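A small sketch that exercises this claim through the Collection interface only. The class `EmptyCollectionSketch` and its helpers are hypothetical; note that the check tries to add an element, so it should only be pointed at collections you do not mind modifying if they turn out to be mutable:

```java
import java.util.Collection;
import java.util.Collections;

class EmptyCollectionSketch {
    // Returns true if 'c' behaves as an empty, unmodifiable Collection.
    static boolean isEmptyAndReadOnly(Collection<String> c) {
        if (!c.isEmpty() || c.size() != 0 || c.contains("x") || c.iterator().hasNext()) {
            return false;
        }
        try {
            c.add("x");
            return false; // mutation succeeded, so the collection is not read-only
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Both JDK empty collections pass the same Collection-level checks.
    static boolean bothQualify() {
        return isEmptyAndReadOnly(Collections.<String>emptyList())
            && isEmptyAndReadOnly(Collections.<String>emptySet());
    }
}
```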
The only situation I can imagine this making a difference is if the code that will use your Collection does something like this:
Collection<T> collection = ...
List<T> asAList;
if (collection instanceof List) {
asAList = (List<T>) collection;
} else {
asAList = new ArrayList<T>(collection);
}
Obviously in a case like this you would want to use emptyList(), while if the secret target type was a Set, you'd want emptySet().
Otherwise, in terms of what "makes more sense", I agree with @ac3's logic that a generic Collection is like a Bag, and an empty immutable Set and empty immutable Bag are pretty much the same thing. However, a person very used to using immutable lists might find those easier to think of.

Regarding unmodifiable collection

Consider the following code below
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Test {
public static void main(String[] args) {
List<Integer> intList1=new ArrayList<Integer>();
List<Integer> intList2;
intList1.add(1);
intList1.add(2);
intList1.add(3);
intList2=Collections.unmodifiableList(intList1);
intList1.add(4);
for(int i=0;i<4;i++)
{
System.out.println(intList2.get(i));
}
}
}
The result of the above code is
1
2
3
4
In the above code we create an unmodifiable List intList2 from the contents of the List intList1. But after the Collections.unmodifiable statement when I make a change to intList1 that change reflects to intList2. How is this possible ?
You need to read the Javadoc for Collections.unmodifiableList
Returns an unmodifiable view of the specified list.
This means that the returned view is unmodifiable. If you have the original reference you can change the collection. If you change the collection then changes will be reflected in the view.
This has the advantage of being very fast, i.e. you don't need to copy the collection, but the disadvantage that you noted - the resulting collection is a view.
In order to create a truly unmodifiable collection you would need to copy then wrap:
intList2=Collections.unmodifiableList(new ArrayList<>(intList1));
This copies the contents of intList1 into another collection then wraps that collection in the unmodifiable variable. No reference to the wrapped collection exists.
This is expensive - the entire underlying datastore (an array in this case) needs to be duplicated, which (generally) takes O(n).
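The difference between the view and the copy-then-wrap approach can be sketched as two helpers. The class `ViewVersusCopySketch` and its method names are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewVersusCopySketch {
    // View: shares storage with 'source', so later changes to 'source' show through.
    static List<Integer> readOnlyView(List<Integer> source) {
        return Collections.unmodifiableList(source);
    }

    // Copy then wrap: an O(n) snapshot that nothing else holds a mutable reference to.
    static List<Integer> readOnlySnapshot(List<Integer> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```

If both helpers are applied to a three-element list and a fourth element is then added to the original, the view reports four elements while the snapshot still reports three.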
Google Guava provides immutable collections which solve some of the problems of making defensive copies:
If a collection is already immutable it is not copied again
Provide an interface which can be used to explicitly state that a collection is immutable
Provide numerous static factory methods to generate immutable collections
But speed is still the key concern when using immutable copies of collections rather than unmodifiable views.
It should be noted that the usual use for Collections.unmodifiableXXX is to return from a method, for example a getter:
public Collection<Thing> getThings() {
return Collections.unmodifiableCollection(things);
}
In this case there are two things to note:
The user of getThings cannot access things so the unmodifiability cannot be broken.
It would be very expensive to copy things each time a getter were called.
In summary the answer to your question is a little more complex than you might have expected and there are a number of aspects to consider when passing collections around in your application.
From the Javadoc of Collections.unmodifiableList:
Returns an unmodifiable view of the specified list. This method allows modules to provide users with "read-only" access to internal lists.
It prevents the returned list from being modified, but the original list itself can still be.
In your code you are
intList2=Collections.unmodifiableList(intList1);
creating an unmodifiable list in intList2. So you are free to make changes to intList1
but you are not allowed to make any changes to intList2
try this:
intList2.add(4);
you will get the following exception:
java.lang.UnsupportedOperationException
at java.util.Collections$UnmodifiableCollection.add(Unknown Source)
Collections.unmodifiableList returns a "read-only" view of the internal list. While the object that was returned is not modifiable the original list that it references can be modified. Both objects point to the same object in memory so it will reflect changes made.
Here is a good explanation of what is happening.
That happens because the unmodifiable list is internally backed by the first list. If you really want it to be unmodifiable, you shouldn't use the first list any more.

How to have variables with dynamic data types in Java?

I need a UserProfile class that is just that: a user profile. This user profile has some vital user data, of course, but it also needs to hold lists of messages sent by the user's friends.
I need to save these messages in LinkedList, ArrayList, HashMap and TreeMap. But only one at a time and not duplicate the message for each data structure. Basically, something like a dynamic variable type where I could pick the data type for the messages.
Is this, somehow, possible in Java? Or my best approach is something like this? I mean, have 2 different classes (for the user profile), one where I host the messages as Map<K,V> (and then I use HashMap and TreeMap where appropriately) and another class where I host them as List<E> (and then I use LinkedList and ArrayList where appropriately). And probably use a super class for the UserProfile so I don't have to duplicate variables and methods for fields like data, age, address, etc...
Any thoughts?
First of all, you are not duplicating a message by adding it to different collections at the same time - you only store distinct references to the same object. (Well, unless a message is represented as a primitive type like long... but these can't be added to collections anyway.)
Why can't you have all those collections within the same UserProfile? This would allow you to access messages by key or index, and iterate through them any way you like.
A LinkedHashMap might also be an interesting option for you, as it guarantees iteration order, so in a way it behaves similarly to a List regarding iteration, while still being a Map. Ultimately, it boils down to how you want to access the messages of a given user, which you haven't detailed.
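A minimal sketch of the LinkedHashMap idea, assuming messages are plain strings keyed by a numeric id. The class `MessageStoreSketch` and its methods are hypothetical and only illustrate keyed lookup combined with ordered iteration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MessageStoreSketch {
    // Insertion-ordered map: lookup by id, iteration in the order messages arrived.
    private final Map<Long, String> messagesById = new LinkedHashMap<>();

    void add(long id, String text) {
        messagesById.put(id, text);
    }

    String byId(long id) {
        return messagesById.get(id);
    }

    // A new list holding references to the same message objects, in arrival order.
    List<String> inArrivalOrder() {
        return new ArrayList<>(messagesById.values());
    }
}
```

The list returned by `inArrivalOrder()` copies only references, so no message is duplicated by being reachable through both the map and the list.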
Update: @Snake, you can only store references to objects in Java collections. A primitive long value thus can't be stored directly, only by converting it to a Long object first. Note that since Java 5, this conversion may be implicit due to autoboxing, so you don't see it in the code, but it still happens nevertheless - e.g.
List<Long> list = new ArrayList<Long>();
list.add(1L); // the primitive value is boxed into a Long object,
// which is then added to the list
long value = list.get(0); // the value of the Long object in the list is unboxed
// and assigned to the primitive variable
If this is a university project, then I suspect that what you are meant to do is this:
Collection mycoll;
mycoll = new ArrayList();
for (Message m:message) {
// do stuff and measure the performance
}
// do other stuff and measure the performance
mycoll = new LinkedList();
// do the same stuff as a above and measure the performance again
Map mymap = new HashMap(); // a Map is not a Collection, so it needs a variable of its own
//... and so on
As stated above, adding an object to a collection doesn't duplicate it.
I ended up using what I described in the first post:
Is this, somehow, possible in Java? Or my best approach is something like this? I mean, have 2 different classes (for the user profile), one where I host the messages as Map<K,V> (and then I use HashMap and TreeMap where appropriately) and another class where I host them as List<E> (and then I use LinkedList and ArrayList where appropriately). And probably use a super class for the UserProfile so I don't have to duplicate variables and methods for fields like data, age, address, etc...
