Multi-Tenant Sequence generator in Hibernate - java

I have a Client entity with orgId and clientId as a composite key. When I insert a new Client, I have to generate the clientId sequentially per orgId. To do that, I keep the last clientId of every orgId in a separate table, then select it, add 1, and update it.
@Entity
@Table(name = "ftb_client")
public class Client implements Serializable {

    @Id
    @Column(name = "ORG_ID")
    protected String orgId;

    @Id
    @Column(name = "CLIENT_ID")
    protected int clientId;

    @Column(name = "CLIENT_NAME_ENG")
    private String clientNameEng;

    //....
}
@Entity
@Table
public class MySeq implements Serializable {

    @Id
    protected String orgId;

    private int lastClientId;

    //....
}
public int getNewClientId(String orgId) {
    MySeq mySeq = getSession()
            .createQuery("from MySeq where orgId = :orgId", MySeq.class)
            .setParameter("orgId", orgId)
            .setLockMode(LockModeType.PESSIMISTIC_WRITE)
            .uniqueResult();
    mySeq.setLastClientId(mySeq.getLastClientId() + 1);
    return mySeq.getLastClientId();
}
But this leads to duplicate id generation when there are thousands of concurrent requests. So, to make it thread-safe, I have to use pessimistic locking so that multiple requests do not generate the same clientId. But now the problem is that the lock is not released until the transaction ends, and concurrent requests stay blocked for a long time.
Instead of holding a lock, if I could use a separate sequence per orgId, then id generation could also run concurrently. I want to execute the sequence manually, determining the sequence name at runtime, something like client_sequence_[orgId], and run it to generate the id.
And I also want to make it database-independent, or at least for Oracle, MySQL, and Postgres.
I want to know whether this is possible, or whether there is another approach.

It doesn't matter whether you use PESSIMISTIC_WRITE or not; a lock will be acquired anyway if you update the entity. The difference is that in the case you describe the lock is acquired eagerly, which prevents lost writes.
Usually this is solved by running the sequence increment in a separate transaction. To improve performance, you should increment by a batching factor, e.g. 10, and keep those 10 values in an in-memory queue to serve from. When the queue is empty, you ask for another 10 values, and so on.
Hibernate implements this behind the scenes with org.hibernate.id.enhanced.TableGenerator along with org.hibernate.id.enhanced.PooledOptimizer. So if you know the sequences you need upfront, I would recommend using these tools for that purpose. You can also roll something similar yourself if you like, as sketched below.
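A minimal sketch of the hand-rolled variant, assuming Spring-managed transactions and a JPA EntityManager (the class names SequenceBlocks and ClientIdAllocator are illustrative, not an established API):

import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Component
class SequenceBlocks {

    @PersistenceContext
    private EntityManager em;

    // REQUIRES_NEW: the pessimistic lock on the MySeq row is held only for
    // this short increment, not for the caller's whole transaction.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public int reserve(String orgId, int blockSize) {
        MySeq seq = em.createQuery("from MySeq where orgId = :orgId", MySeq.class)
                .setParameter("orgId", orgId)
                .setLockMode(LockModeType.PESSIMISTIC_WRITE)
                .getSingleResult();
        int start = seq.getLastClientId();
        seq.setLastClientId(start + blockSize);
        return start; // ids start+1 .. start+blockSize now belong to this caller
    }
}

@Component
class ClientIdAllocator {

    private static final int BLOCK_SIZE = 10;

    // One queue of pre-reserved ids per orgId.
    private final ConcurrentMap<String, Queue<Integer>> pools = new ConcurrentHashMap<>();

    private final SequenceBlocks blocks;

    ClientIdAllocator(SequenceBlocks blocks) {
        this.blocks = blocks;
    }

    public int nextClientId(String orgId) {
        Queue<Integer> pool = pools.computeIfAbsent(orgId, k -> new ConcurrentLinkedQueue<>());
        Integer id = pool.poll();
        while (id == null) { // refill when the block is exhausted
            int start = blocks.reserve(orgId, BLOCK_SIZE);
            for (int i = 1; i <= BLOCK_SIZE; i++) {
                pool.add(start + i);
            }
            id = pool.poll(); // another thread may have drained the refill
        }
        return id;
    }
}

Note the trade-off: reserved-but-unused values are lost if the process restarts, so ids stay sequential per orgId but may have gaps, which is the usual behavior of pooled optimizers.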

Related

Using UUID vs @Id annotation in a Spring project

I'm very new to Spring/Spring Boot and have seen different approaches in tutorials regarding the model classes used to represent database objects. I was just wondering when it's appropriate to use which.
Approach 1:
A basic class to model a user object
public class User {

    private final UUID id;
    // other fields

    public User(UUID id, <other fields>) {
        this.id = id;
        // set other fields
    }
}
In the repository layer, we might have a DAO which looks something like
@Repository
public interface UserDao {

    public int createUser(UUID id, <other fields>);

    // other CRUD operations
}
When the user doesn't supply a valid UUID (or omits it), a default method could generate one by calling UUID.randomUUID().
Approach 2:
Instead of using a UUID as the unique identifier, with something like Hibernate/JPA we annotate the User class in the model package with @Entity and annotate the PK field with @Id:
@Entity
public class User {

    @Id
    private final long id;
    // other fields
}
@Id is the most commonly used approach in Hibernate. It maps a Java String / BigDecimal / long attribute to the identifier, and via @GeneratedValue you can specify four generation strategies: AUTO, IDENTITY, SEQUENCE and TABLE.
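For illustration, a minimal sketch of a sequence-backed key (the generator and sequence names here are made up):

import javax.persistence.*;

@Entity
public class User {

    // SEQUENCE pulls ids from a database sequence; IDENTITY uses an
    // auto-increment column; TABLE emulates a sequence with a table;
    // AUTO lets the provider pick one for the dialect.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "user_seq")
    @SequenceGenerator(name = "user_seq", sequenceName = "user_id_seq", allocationSize = 50)
    private Long id;

    // other fields
}

Note the id field can no longer be final once the provider assigns it.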
UUIDs are used when you want your primary key to be globally unique. I can think of a few scenarios where you might want this -
You have data in multiple databases and your keys need to be unique across different databases.
You need your generated id value even before you persist your record in your database for specific business purposes.
But the downside is that UUIDs are long and may cost more in terms of storage space.
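If you want both, i.e. a UUID key that the provider generates for you, a minimal sketch (GenerationType.UUID assumes JPA 3.1 / Hibernate 6+; older Hibernate versions offer equivalents through their own UUID generators):

import java.util.UUID;
import jakarta.persistence.*;

@Entity
public class User {

    // The provider generates the UUID at persist time, so the id is
    // available before the INSERT without a database round trip.
    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    // other fields
}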

H2 Database generation strategy is leaving gaps between id values

I'm working on a REST API using Spring. I have this class, whose ids are generated automatically:
@Entity
public class Seller implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String name;
    private double tasa;

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public double getTasa() {
        return tasa;
    }

    public void setTasa(double tasa) {
        this.tasa = tasa;
    }
}
I added some endpoints to create, delete and get a seller from the DB. My problem arises when I delete a seller and then create a new one: I was expecting to get the lowest available value for the id, but what it actually does is use some kind of counter/sequence. In my second POST I was expecting a JSON with id = 1; instead I received a 2. I tried using the TABLE and IDENTITY strategies but the unwanted behavior continued. So my question is: how can I achieve the behavior I desire? I don't want gaps between my sellers' ids.
In general, databases are designed to be incremental. The ID is not generated based on the content of the tables; instead it is generated from a sequence (or something similar). In your example you have only a few records, but imagine a database with a lot of records: the database generates IDs from a sequence precisely to avoid reading the data, which would be an expensive process.
If the ID is not relevant to the business, then this behavior doesn't affect your process (like the message id in a chat).
If the ID is important, I recommend redefining the delete process; you probably need to preserve all the ids, like a customer id.
If you want to avoid gaps and still allow deleting records, the recommendation is to generate the id yourself, but then you have to deal with problems like concurrency.
I tried using TABLE and IDENTITY strategies but the unwanted behavior continued.
This is not unwanted behaviour. Check how primary keys are generated.
So my question is: how can I achieve the behavior I desire? I don't want gaps between my seller's ids
One way to achieve this is to drop @GeneratedValue(strategy = GenerationType.AUTO) and set the id manually from the program, where you can put any logic you want.
Setting the primary key manually is not recommended, though. If you want this behaviour, you can use another field, like a seller_code, for it.
Another question here which is similar to this.
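For completeness, a sketch of what the manual approach could look like (SellerService is an illustrative name; it assumes Seller drops @GeneratedValue and gains a setId, and the serialization below is exactly the concurrency cost mentioned above):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class SellerService {

    @PersistenceContext
    private EntityManager em;

    // SERIALIZABLE keeps two concurrent inserts from reading the same
    // max(id); without it this read-then-insert races and produces
    // duplicate keys, which is why gap-tolerant generated keys are
    // usually preferred.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public Seller createSeller(Seller seller) {
        Long maxId = em.createQuery("select max(s.id) from Seller s", Long.class)
                .getSingleResult();
        seller.setId(maxId == null ? 1L : maxId + 1);
        em.persist(seller);
        return seller;
    }
}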

Spring Data JPA - concurrent Bulk inserts/updates

At the moment I am developing a Spring Boot application which mainly pulls product review data from a message queue (~5 concurrent consumers) and stores it in a MySQL DB. Each review can be uniquely identified by its reviewIdentifier (String), which is the primary key and can belong to one or more products (e.g. products with different colors). Here is an excerpt of the data model:
@Entity
public class ProductPlacement implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "product_placement_id")
    private long id;

    @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "productPlacements")
    private Set<CustomerReview> customerReviews;
}
@Entity
public class CustomerReview implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "customer_review_id")
    private String reviewIdentifier;

    @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    @JoinTable(
            name = "tb_miner_review_to_product",
            joinColumns = @JoinColumn(name = "customer_review_id"),
            inverseJoinColumns = @JoinColumn(name = "product_placement_id")
    )
    private Set<ProductPlacement> productPlacements;
}
One message from the queue contains 1-15 reviews and a productPlacementId. Now I want an efficient method to persist the reviews for the product. There are basically two cases to consider for each incoming review:
The review is not in the database -> insert review with reference to the product that is contained in the message
The review is already in the database -> just add the product reference to the Set productPlacements of the existing review.
Currently my method for persisting the reviews is not optimal. It looks as follows (using Spring Data JpaRepositories):
@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    for (CustomerReview review : customerReviews) {
        CustomerReview cr = customerReviewRepository.findOne(review.getReviewIdentifier());
        if (cr != null) {
            cr.getProductPlacements().add(placement);
            customerReviewRepository.saveAndFlush(cr);
        } else {
            Set<ProductPlacement> productPlacements = new HashSet<>();
            productPlacements.add(placement);
            review.setProductPlacements(productPlacements);
            cr = review;
            customerReviewRepository.saveAndFlush(cr);
        }
    }
}
Questions:
I sometimes get ConstraintViolationExceptions because of violating the unique constraint on the reviewIdentifier. This is obviously because I (concurrently) check whether the review is already present and then insert or update it. How can I avoid that?
Is it better to use save() or saveAndFlush() in my case? I get ~50-80 reviews per second. Will Hibernate flush automatically if I just use save(), or will it result in greatly increased memory usage?
Update to question 1: Would a simple @Lock on my review repository prevent the unique-constraint exception?
@Lock(LockModeType.PESSIMISTIC_WRITE)
CustomerReview findByReviewIdentifier(String reviewIdentifier);
What happens when findByReviewIdentifier returns null? Can Hibernate lock the reviewIdentifier for a potential insert even though the method returns null?
Thank you!
From a performance point of view, I would consider evaluating the solution with the following changes.
Changing from bidirectional ManyToMany to bidirectional OneToMany
I had the same question about which one is more efficient in terms of the DML statements that get executed. Quoting from Typical ManyToMany mapping versus two OneToMany:
The option one might be simpler from a configuration perspective, but it yields less efficient DML statements.
Use the second option because whenever the associations are controlled by #ManyToOne associations, the DML statements are always the most efficient ones.
Enable the batching of DML statements
Enabling batching support results in fewer round trips to the database to insert/update the same number of records.
Quoting from batch INSERT and UPDATE statements
hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true
Reduce the number of saveAndFlush calls
The current code loads the ProductPlacement and then calls saveAndFlush for every single review, which rules out any batching of DML statements.
Instead, I would load the ProductPlacement entity, add the List<CustomerReview> customerReviews to its Set<CustomerReview> customerReviews field, and finally call merge once at the end (see the sketch after this list), with these two changes:
Making ProductPlacement entity owner of the association i.e., by moving mappedBy attribute onto Set<ProductPlacement> productPlacements field of CustomerReview entity.
Making the CustomerReview entity implement equals and hashCode using the reviewIdentifier field. I believe reviewIdentifier is unique and user-assigned.
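A rough sketch of the resulting method shape (assuming the remapped association above and a getCustomerReviews accessor; note this alone does not remove the concurrent unique-constraint race from question 1):

@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    // Adding to the owning side is enough to persist the links; the Set plus
    // the natural-key equals/hashCode de-duplicates within one message.
    placement.getCustomerReviews().addAll(customerReviews);
    // One save/flush at the end lets Hibernate batch the statements,
    // given hibernate.jdbc.batch_size and order_inserts/updates are set.
    productPlacementRepository.save(placement);
}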
Finally, as you do performance tuning with these changes, baseline your performance with the current code. Then make the changes and compare whether they really result in any significant performance improvement for your solution.

JPA EclipseLink 2 query performance

APPLICATION and ENVIRONMENT
Java EE / JSF 2.0 / JPA enterprise application, which contains a web and an EJB module. I am generating PDF documents which contain evaluated data queried via JPA.
I am using MySQL as database, with MyISAM engine on all tables. JPA Provider is EclipseLink with cache set to ALL. FetchType.EAGER is used at relationships.
AFTER RUNNING NETBEANS PROFILER
Profiler results show that the following method is called the most. In this session it had 3858 invocations, with ~80 seconds from request to response; this takes up 80% of the CPU time. There are 680 entries in the Question table.
public Question getQuestionByAzon(String azon) {
    try {
        return (Question) em.createQuery("SELECT q FROM Question q WHERE q.azonosito = :a")
                .setParameter("a", azon)
                .getSingleResult();
    } catch (NoResultException e) {
        return null;
    }
}
The Question entity:
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
public abstract class Question implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @Column(unique = true)
    private String azonosito;

    @Column(nullable = false)
    @Basic(optional = false)
    private String label;

    @Lob
    @Column(columnDefinition = "TEXT")
    private String help;

    private int quizNumber;
    private String type;

    @ManyToOne
    private Category parentQuestion;

    ...
    //getters and setters, equals() and hashCode() implementations
}
There are four entities extending Question.
The column azonosito should be used as primary key, but I don't see this as the main reason for low performance.
I am interested in suggestions for optimization. Feel free to ask if you need more information!
EDIT: See my answer summarizing the best results.
Thanks in advance!
Using LAZY is a good start; I would recommend you always make everything LAZY if you are at all concerned about performance.
Also ensure that you are using weaving (Java SE agent, Java EE/Spring, or static), as LAZY OneToOne and ManyToOne depend on it.
Changing the Id to your other field would be a good idea, if you always query on it and it is unique. You should also check why your application keeps executing the same query over and over.
You should make the query a NamedQuery rather than a dynamic query.
In EclipseLink you could also enable the query cache on the query (once it is a named query); this will enable cache hits on the query result.
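Something along these lines, as a sketch (the query name is made up; eclipselink.query-results-cache is EclipseLink's hint for caching query results):

@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@NamedQuery(
        name = "Question.findByAzonosito",
        query = "SELECT q FROM Question q WHERE q.azonosito = :a",
        hints = @QueryHint(name = "eclipselink.query-results-cache", value = "true"))
public abstract class Question implements Serializable { ... }

// Named queries are parsed once at startup, and the hint lets repeated
// executions with the same parameter value hit the query result cache.
Question q = em.createNamedQuery("Question.findByAzonosito", Question.class)
        .setParameter("a", azon)
        .getSingleResult();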
Have you got a unique index on the azonosito column in your database? Maybe that will help.
I would also suggest fetching only the fields you really need, so maybe some of them could be lazy, e.g. Category.
Since changing fetch type of relationship to LAZY dramatically improved performance of your application, perhaps you don't have an index for foreign key of that relationship. If so, you need to create it.
In this answer I will summarize what was the best solution for that particular query.
First of all, I set the azonosito column as primary key and modified my entities accordingly. This is necessary because the EclipseLink object cache works with em.find:
public Question getQuestionByAzon(String azon) {
    // em.find returns null when no row matches, so no
    // NoResultException handling is needed here
    return em.find(Question.class, azon);
}
Now, instead of using a QUERY_RESULT_CACHE on a @NamedQuery, I configured the Question entity like this:
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@Cache(size = 1000, type = CacheType.FULL)
public abstract class Question implements Serializable { ... }
This means an object cache of maximum size 1000 will be maintained of all Question entities.
Profiler results (~16000 invocations):
QUERY_RESULT_CACHE: ~28000ms
#Cache(size=1000, type=CacheType.FULL): ~7500ms
Of course execution time gets shorter after the first execution.

JPA Best Practice to update an Entity with Collections

I am using JPA in a GlassFish container. I have the following model (not complete):
@Entity
public class Node {

    @Id
    private String serial;

    @Version
    @Column(updatable = false)
    protected Integer version;

    private String name;

    @ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE})
    private Set<LUN> luns = new HashSet<LUN>();
}

@Entity
public class LUN {

    @Id
    private String wwid;

    @Version
    @Column(updatable = false)
    protected Integer version;

    private String vendor;
    private String model;
    private Long capacity;

    @ManyToMany(mappedBy = "luns")
    private Set<Node> nodes = new HashSet<Node>();
}
This information will be updated daily. Now my question is: what is the best practice to do this?
My first approach was to build the Node objects (with LUNs) anew on the client every day and merge them into the database via a service (I wanted to let JPA do the work).
So far I have done some tests without LUNs. I have the following service method in a stateless EJB:
public void updateNode(Node node) {
    if (!nodeInDB(node)) {
        LOGGER.log(Level.INFO, "persisting node {0} the first time", node.toString());
        em.persist(node);
    } else {
        LOGGER.log(Level.INFO, "merging node {0}", node.toString());
        node = em.merge(node);
    }
}
The test:
@Test
public void addTest() throws Exception {
    Node node = new Node();
    node.setName("hostname");
    node.setSerial("serial");
    nodeManager.updateNode(node);
    nodeManager.updateNode(node);
    node.setName("newhostname");
    nodeManager.updateNode(node);
}
This works without the @Version field. With the @Version field I get an OptimisticLockException.
Is that the wrong approach? Do I always have to perform an em.find(...) and then modify the managed entity via getters and setters?
Any help is appreciated.
BR Rene
The @Version annotation is used to enable optimistic locking.
When you use optimistic locking, each successful write to your table increases a version counter, which is read and compared every time you persist your entities. If the version read when you first find your entity doesn't match the version in the table at write time, an exception is thrown.
Your program updates the table several times after reading the version column only once. Therefore, by the second time you call persist() or merge(), the version numbers no longer match, and your query fails. This is the expected behavior when using optimistic locking: you were trying to overwrite a row that had changed since you first read it.
To answer your last question: you need to re-read the changed @Version information after every write to your database. You can do this by calling em.refresh().
You should, however, consider rethinking your strategy: optimistic locks are best used on transactions that ensure data consistency while the user performs changes. These usually read the data, display it to the user, wait for changes, and then persist the data after the user has finished the task. You wouldn't really want or need to write the same data rows several times in this context, because every one of those writes could fail due to optimistic locking; it would complicate things rather than simplify them.
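For the daily refresh, the find-then-modify pattern might look like this sketch (assuming Node exposes the usual accessors; getSerial and getLuns are not shown in the question):

// Update through a managed instance, read inside the same transaction,
// so the version is checked exactly once at commit.
public void updateNode(Node incoming) {
    Node managed = em.find(Node.class, incoming.getSerial());
    if (managed == null) {
        em.persist(incoming); // first sighting: plain insert
    } else {
        managed.setName(incoming.getName()); // mutate the managed copy;
        managed.getLuns().clear();           // changes are flushed at commit
        managed.getLuns().addAll(incoming.getLuns());
    }
}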
