In my app I'm modelling an Invoice. In my country (Italy) every invoice must have a unique sequential number with no holes, and the sequence has to restart from 1 every year.
I have thought long and hard about the best way to implement this, but I have not found a good guide. For now I have a JpaRepository with a custom synchronized save() method in which I get the last number used:
SELECT MAX(numero) FROM Invoice WHERE YEAR(date) = :year
The problem with this approach is that it is not very safe: the developer has to know that saving must only be done through that particular service.
Instead, I'd prefer an approach that is hidden from the developer.
I thought of using a @PrePersist method in an @EntityListeners class. This sounds good, but getting hold of the entity manager inside such a class is not so simple, so maybe it is not the optimal place.
Finally, I thought about a Hibernate Interceptor...
Please give me some hints. The problem seems quite generic, so maybe there is already a good practice to follow.
Thanks
This problem can be broken down into the following requirements:
Sequentially unique: Generate numbers in a sequence, starting from a given value (say, 1000001) and then always incrementing by a fixed value (say, 1).
No gaps: There must not be any gaps between the numbers. So, if the first number generated is 1000001, the increment is 1 and 200 numbers have been generated so far, the latest number should be 1000200.
Concurrency: Multiple processes must be able to generate the numbers at the same time.
Generation at creation: The numbers must be generated at the time of creation of a record.
No exclusive locks: No exclusive locks should be required for generating the numbers.
Any solution can only comply with 4 out of these 5 requirements. For example, if you want to guarantee 1-4, each process will need to take locks so that no other process can generate and use the same number that it has generated. Therefore, imposing 1-4 as requirements means that 5 has to be given up. Similarly, if you want to guarantee 1, 2, 4 and 5, you need to make sure that only one process (thread) generates a number at a time, because uniqueness cannot be guaranteed in a concurrent environment without locking. Continue this logic and you will see why it is impossible to guarantee all of these requirements at the same time.
Now, the solution depends on which one out of 1-5 you are willing to sacrifice. If you are willing to sacrifice #4 but not #5, you can run a batch process during idle hours to generate the numbers. However, if you put this list in front of a business user (or a finance guy), they will ask you to comply with 1-4 as #5 is a purely technical issue (to them) and therefore they would not want to be bothered with it. If that is the case, a possible strategy is:
Perform all the computation required to generate an invoice upfront, keeping the invoice number generation step as the very last step. This ensures that any exceptions that can occur happen before the number is generated, and that the lock is held for a very short time, thereby not affecting the concurrency or performance of the application too much.
Keep a separate table (for example, DOCUMENT_SEQUENCE) to keep track of the last generated number.
Just before saving an invoice, take an exclusive row-level lock on the sequence table (say, isolation level SERIALIZABLE), find the required sequence value to use and save the invoice immediately. This should not take too much time because reading a row, incrementing its value and saving a record should be a short enough operation. If possible, make this short transaction a nested transaction to the main one.
Keep a decent-enough database timeout so that concurrent threads waiting for a SERIALIZABLE lock do not time out too fast.
Keep this whole operation in a retry loop, retrying at least 10 times before giving up completely. This ensures that if the lock queue builds up too fast, the operation is still tried a few times before giving up totally. Many commercial packages have retry counts as high as 40, 60 or 100.
In addition to this, if possible and allowed by your database design guidelines, put a unique constraint on the invoice number column so that duplicate values are not stored at any cost.
Spring gives you all the tools to implement this.
Transactions: Through the @Transactional annotation.
Serialization: Through the isolation attribute of the @Transactional annotation.
Database access: Through Spring JDBC, Spring ORM and Spring Data JPA.
Retries: Through Spring Retry.
I have a sample app that demonstrates using all these pieces together.
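For illustration, here is a minimal sketch of how these pieces might fit together (this is not the sample app itself; the DOCUMENT_SEQUENCE table with NAME and NEXT_VALUE columns, and the per-year row naming, are assumptions):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class InvoiceNumberService {

    private final JdbcTemplate jdbc;

    public InvoiceNumberService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // SERIALIZABLE isolation makes concurrent readers of the same sequence
    // row conflict; Spring Retry re-runs the losers up to 10 times.
    @Retryable(maxAttempts = 10)
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public long nextInvoiceNumber(int year) {
        String name = "INVOICE_" + year;
        long current = jdbc.queryForObject(
                "SELECT NEXT_VALUE FROM DOCUMENT_SEQUENCE WHERE NAME = ?",
                Long.class, name);
        jdbc.update(
                "UPDATE DOCUMENT_SEQUENCE SET NEXT_VALUE = ? WHERE NAME = ?",
                current + 1, name);
        return current;
    }
}

This assumes @EnableRetry is present on a configuration class and that a DOCUMENT_SEQUENCE row for each year is seeded upfront; call nextInvoiceNumber as the very last step before saving the invoice, as described above.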
Related
I have a requirement in my project.
I want to generate an increasing unique sequence number which will be mapped to a specific field (interchange id) in the output XML file.
The customer wants to generate some alerts around this number, so they are very specific that the number should be:
1. Unique 2. Increasing.
So now I have two approaches to this case:
Generate the number with the help of an Oracle sequence. But the problem again is that they do not want to hit the database unnecessarily.
Generate it in Java with the help of a static variable. But I feel that's not foolproof: if my application or server restarts, the static variable will start from 0 again, in which case the numbers will not be unique.
So my question is whether we can get this easily in MuleSoft; any idea is appreciated.
TIA.
Use the static field, but grab the value from the database. Given that there's no requirement that the numbers be consecutive, you can increment the database value by, for example, 100 each time; then you need to hit the database only for every 100th number that you produce.
Obtaining a sequence value from the database does not qualify as "hitting the database unnecessarily".
It is necessary in order to obtain a unique sequence value in an efficient and scalable way.
While sequence contention isn't unheard of in an Oracle database, it is usually not the biggest problem you have in a busy database. And it is one of the easiest things to fix: increase the sequence cache.
If you do know (as opposed to just assuming) that you will get a performance problem, then you might think about increasing the sequence increment to a very high number. Then, when you start your application, you call nextval once and get the upper limit of numbers you can hand out inside your Java code without risking anything. If you reach that limit, call nextval again to get the next slice of numbers. This essentially combines your static variable with the sequence persistence that Oracle offers.
But again: I doubt that calling the sequence for each number will get you into trouble any time soon (and if it does, you probably have other performance problems that are far bigger).
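If you do go down that road, a minimal sketch of the slice-based approach described above might look like this (the sequence name INVOICE_SEQ and the increment of 100 are assumptions; the sequence must be created with a matching INCREMENT BY 100):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class SliceIdGenerator {

    private final DataSource dataSource;
    private long next;   // next value to hand out
    private long limit;  // exclusive upper bound of the current slice

    public SliceIdGenerator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // One NEXTVAL call reserves a block of 100 ids for this JVM; the
    // database is only hit again when the block is exhausted.
    public synchronized long nextId() throws SQLException {
        if (next >= limit) {
            try (Connection c = dataSource.getConnection();
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT INVOICE_SEQ.NEXTVAL FROM DUAL")) {
                rs.next();
                next = rs.getLong(1);
                limit = next + 100;
            }
        }
        return next++;
    }
}

Note that ids reserved but not used before a restart are lost, so this trades gaplessness for fewer database round trips.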
Another reason why it is preferable to use something like the database for this is that a static variable only works while you have a single JVM instance.
If you ever need to scale out to more nodes, this becomes a very brittle pattern (unless you can avoid it by using GUIDs instead, but those won't fulfil your incrementing requirement in this case).
Instead of the database you could use another efficient mechanism such as memcached or Redis.
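For example, with the Jedis client (the host and key name are assumptions), Redis INCR gives you an atomic, increasing counter that is shared across all nodes:

import redis.clients.jedis.Jedis;

public class RedisIdSource {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // INCR is atomic on the server, so every caller in every JVM
            // gets a distinct, increasing value.
            long id = jedis.incr("interchange-id");
            System.out.println("next id: " + id);
        }
    }
}

Bear in mind that uniqueness across Redis restarts depends on its persistence configuration, so this is a sketch of the idea rather than a drop-in solution.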
During localhost development, the IDs generated by GAE start with 1.
However, in a real GAE deployment in the cloud, the IDs generated even for the first entities are quite long, like 5639412304721232. Is there a workaround to make the first entities start with 1, 2, 3 and so on?
One might suggest using sharded counters, and yes, I've used them; however, some suggest that sharded counters should not be used for this, as the app might get the same count twice because they are eventually consistent.
In this case what could be the best solution?
The official post explaining the switch from sequential to 'scattered' ids is here.
The instructions for reverting to sequential behaviour are here, but note the warning that this option will eventually be removed.
The 'best' solution depends on what you need and why. You'll get better datastore performance with scattered ids but, honestly, you might not notice much difference if your app gets a small number of requests and makes light use of the datastore. If that's the case, you can roll your own sequential ids based on a simple entity with a property that holds the current high-watermark id, and rely on having a low transaction rate to keep you from running into limits on the number of transactions per entity.
Reliably handing out sequential ids without gaps in a distributed system is challenging.
Be aware that you may run into problems if you create a lot of entities very quickly, with sequential Long IDs. This post gives you an explanation why.
In theory there's a choice of auto ID generation policies, with scattered IDs being the default since 1.8.1, but the old monotonically increasing legacy policy is to be deprecated for the reasons discussed in the linked post.
If you're using a sharded counter, you will avoid this but, as you say, you may encounter other issues.
You might try using allocate_ids. We use this to get smaller integer values for system-generated ids. In Python, using a db kind:
from google.appengine.ext import db

# Reserve one id from this kind's allocator; allocate_ids returns a
# (start, end) tuple describing the reserved range.
model_key = db.Key.from_path('your_kind_name', 1)
key_batch = db.allocate_ids(model_key, 1)
id_new = key_batch[0]
idkey = db.Key.from_path('your_kind_name', id_new)
I would assign the key's identifier as the strings "1", "2", "3"... and so on, generating them from a sequencer. You can check to see if the entity already exists with a get_or_insert() function.
Similarly, you can use the auto-increment solution by storing the sequence number in an entity.
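A rough sketch of that entity-based counter, using the low-level Java datastore API (the kind and key names are assumptions); the transaction keeps concurrent increments from stepping on each other, at the cost of the per-entity write-rate limit mentioned above:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class SequenceCounter {

    public static long nextId() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            Key key = KeyFactory.createKey("Counter", "sequence");
            Entity counter;
            try {
                counter = ds.get(txn, key);
            } catch (EntityNotFoundException e) {
                counter = new Entity(key);       // first use: start at 0
                counter.setProperty("value", 0L);
            }
            long next = (Long) counter.getProperty("value") + 1;
            counter.setProperty("value", next);
            ds.put(txn, counter);
            txn.commit();
            return next;
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}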
I'm trying to implement the active record pattern using Java/JDBC and MySQL, along with optimistic locking for concurrency handling.
Now, I have a 'version_number' field for all the records in a table which is incremented after every update.
There seem to be 2 strategies for implementing this:
When the application requests data, it also stores the corresponding version number of each of the objects (i.e. records). On updating, the version number is 'sent down' to the data layer, which uses it in the UPDATE...SET...WHERE query for optimistic locking.
The application DOES NOT store the version number, but only some parts of the object (as opposed to an entire row of data). For optimistic locking to succeed, the data layer (active record) needs to first fetch the 'row' from the DB, get the version number and then fire the same UPDATE...SET...WHERE query to update the record.
In the former there is the 'first fetch' and then an update. In the latter case you do have a 'first fetch' but also a fetch right before an update.
The question is this: by design, which is the better approach? Is it okay/safe/correct to have all the data, including the version number, stored in the web application's front-end (JavaScript/HTML)? Or is it better to take the performance hit of a read before each update?
Is there a 'right way' to implement this design? I'm not sure how current implementations of active record handle this (Ruby, Play, ActiveJDBC, etc.). If I'm to implement it 'raw' in JDBC, what's the right design decision in this case?
This is neither a matter of performance nor security, the two approaches are functionally different and achieve different goals.
With the first approach you are optimistically locking the row for the user's entire "think time." If User 1 loads the screen and then User 2 makes changes, User 1's changes will fail and they will see an error telling them they were looking at out-of-date data.
With the second approach you are only protecting against interleaving writes between competing request threads. User 1 may load a page, then User 2 makes changes; when User 1 hits submit, their changes will go through and blow away User 2's changes. User 1 may have made a decision based on outdated information and never know.
It's a matter of which behaviour is the one you want for your business rules, not of one or the other being technically "correct." They are both valid; they do different things.
ActiveJDBC implements version 1. With version 2, you might introduce race conditions.
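As a minimal JDBC sketch of strategy 1 (the table and column names are assumptions): the caller sends back the version it originally read, and the update count tells you whether you lost the race:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdate {

    // Returns false if another transaction updated the row since
    // 'version' was read, i.e. an optimistic-lock conflict.
    public boolean updateTitle(Connection conn, long id, long version,
                               String newTitle) throws SQLException {
        String sql = "UPDATE document SET title = ?, "
                   + "version_number = version_number + 1 "
                   + "WHERE id = ? AND version_number = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, newTitle);
            ps.setLong(2, id);
            ps.setLong(3, version);
            return ps.executeUpdate() == 1; // 0 rows means someone got there first
        }
    }
}

Strategy 2 would look the same, except the version passed in comes from a SELECT issued immediately before this UPDATE instead of from the client.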
Part of my project requires that we maintain stats for our customers' products. More or less, we want to show our customers how often their products have been viewed on the site.
Therefore we want to create some form of Product Impressions Counter. I do not just mean a counter when we land on the specific product page, but when the product appears in search results and in our product directory lists.
I was thinking that after calling the DB I would extract the specific product ids and pass them to a service that would then insert them into the stats tables. Another idea is some form of singleton buffer writer which writes to the DB after the buffer reaches a certain size.
Has anyone ever encountered this in their projects, and do they have any ideas they would like to share?
And / or does anyone know of any framework or tools that could aid this development?
Any input would be really appreciated.
As long as you don't have performance problems, do not over-engineer your design. On the other hand, depending on how big the site is, it seems that you are going to have performance problems due to the huge number of writes.
I think real-time updates would have a huge performance impact. It is also very likely that you would update the same data multiple times in a short period. Another thing is that, although interesting, these statistics are not mission-critical and shouldn't affect normal system operation. Final thought: inconsistencies and minor inaccuracies are IMHO acceptable in this use case.
Taking all this into account, I would temporarily hold the statistics in memory and flush them periodically, as you've suggested. This has the additional benefit of merging events for the same product: if between two flushes some product was visited 10 times, you only perform one update, not 10.
Technically, you can use properly synchronized singleton with background thread (a lot of handcrafting) or some intelligent cache with write-behind technology.
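A minimal sketch of such an in-memory, periodically flushed counter (the flush interval and the StatsDao interface are assumptions, not an existing framework):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ImpressionCounter {

    public interface StatsDao {
        void addImpressions(long productId, long count); // hypothetical DAO
    }

    private final ConcurrentHashMap<Long, AtomicLong> counts = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public ImpressionCounter(StatsDao statsDao) {
        // Flush merged counts once a minute: 10 views of one product
        // between flushes become a single database update.
        scheduler.scheduleAtFixedRate(() -> flush(statsDao), 1, 1, TimeUnit.MINUTES);
    }

    public void recordImpression(long productId) {
        counts.computeIfAbsent(productId, id -> new AtomicLong()).incrementAndGet();
    }

    private void flush(StatsDao statsDao) {
        for (Map.Entry<Long, AtomicLong> e : counts.entrySet()) {
            long seen = e.getValue().getAndSet(0); // atomically take the tally
            if (seen > 0) {
                statsDao.addImpressions(e.getKey(), seen);
            }
        }
    }
}

Counts accumulated since the last flush are lost on a crash, which matches the "minor inaccuracy is acceptable" assumption above.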
I have a SELECT query with a lot of IF conditions, which I can do either in the query itself (using the DB machine's CPU) or in my Java code (using the server machine's CPU).
Is there any preferred approach here (to put conditions in DB Vs in mid-tier)?
UPDATE: My query is a join over more than 2 tables. I am using a left join to combine them, and some rows have a corresponding row in the 2nd table while others do not. I need a default value for those columns when there is no corresponding row in the 2nd table.
SELECT CASE WHEN t2.col1 IS NULL
            THEN 'default' ELSE t2.col1
       END
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
If it's really something that the DB cannot do any faster than the app server, and which actually reduces the load on the DB server if moved to the app server, then I'd move it to the app server.
The reason: if you reach the limits of your hardware, it's much easier to have multiple app servers than to have a clustered database.
However, the second condition above should be tested thoroughly: many things will not reduce (or even increase) the DB load if moved away from the DB.
Update: For the kind of thing you need, I doubt whether the first condition is satisfied - have you tested it? A simple CASE is completely insignificant, unless the condition or the branches contain some very expensive calculations.
Yes, though I would suggest another approach, one that adds no load to the app server and minimal load to the DBMS. It's a little hard to answer the question since you haven't provided a concrete example but I'll give it a shot.
My preferred solution is to get rid of the if conditions totally if you can. At a bare minimum, you can re-jig your database schema to move the cost of calculation away from the select (which happens a lot) and into the insert/update (which happens less often).
That's the normal case; I have seen databases that write more frequently than they read, but they're the exception rather than the rule.
By way of example, let's say you store person information and you want to get a list of people whose first name is more than 5 characters long. Don't ask why, I'm the customer, you have to give me what I want :-)
Rather than a monstrous select statement to (possibly) split apart the name and count the characters in it, do that as an insert/update trigger when the data enters the table - that's the only time when the value can change after all.
Put that calculation in another column (indexed) and use that in your select. The cost of the calculation is amortised over all the selects, which will be blindingly fast.
It will take up more storage space but, if you compare the number of database "how can I make this faster?" questions against the number of "how can I use less space?" questions, you'll find the former greatly outweigh the latter.
And, yes, it does mean you store redundant data but the triggers mitigate the possibility of losing ACID properties. It's okay to bend rules if you know the possible consequences and how best to avoid them.
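To make the example concrete, here is a sketch of that trigger-based setup for MySQL, driven through plain JDBC (the person table, first_name_len column and trigger names are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FirstNameLengthSetup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/people", "user", "password");
             Statement st = conn.createStatement()) {
            // Store the calculated value once, at write time...
            st.execute("ALTER TABLE person ADD COLUMN first_name_len INT");
            st.execute("CREATE TRIGGER person_bi BEFORE INSERT ON person FOR EACH ROW "
                     + "SET NEW.first_name_len = CHAR_LENGTH(NEW.first_name)");
            st.execute("CREATE TRIGGER person_bu BEFORE UPDATE ON person FOR EACH ROW "
                     + "SET NEW.first_name_len = CHAR_LENGTH(NEW.first_name)");
            // ...and index it so the frequent selects stay cheap.
            st.execute("CREATE INDEX idx_person_fn_len ON person (first_name_len)");
        }
    }
}

With that in place, the select becomes a plain indexed comparison: WHERE first_name_len > 5.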
Based on your update, you should put the workload on to the machine where it causes the least impact. That may be the DBMS, it may be the app server, it may even be on the client side (of the app server) itself since that would distribute the cost across a lot of machines rather than concentrating it at a single point.
You should measure, not guess! Set up realistic performance test systems along with realistic production-quality data, then try the different approaches. That's the only real way to be certain.