I have a requirement in my project.
I want to generate an increasing, unique sequence number that will be mapped to a specific field (interchange id) in the output XML file.
The customer wants to generate some alerts around this number, so they are very specific that the number should be
1. Unique 2. Increasing.
So now I have two approaches to this case:
1. Generate a sequence with the help of an Oracle sequence. But the problem is, again, that they do not want to hit the database unnecessarily.
2. Generate it in Java with the help of a static variable. But I feel that's not foolproof: if my application or server restarts, the static variable will start from 0 again, and in that case the number will not be unique.
So my question is: can we get something like this easily in MuleSoft? Any ideas are appreciated.
TIA.
Use the static field, but grab the value from the database. Given that there's no requirement that the numbers be consecutive, you can increment the database value by, e.g., 100 each time; then you need to hit the database only for every 100th number you produce.
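A minimal sketch of that idea in Java (the sequence name, block size, and DDL are assumptions, not a definitive implementation):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class BlockAllocatingIdGenerator {
    // Assumes: CREATE SEQUENCE interchange_seq INCREMENT BY 100;
    private static final int BLOCK_SIZE = 100;
    private final DataSource dataSource;
    private long next;   // next id to hand out
    private long limit;  // first id beyond the current block

    public BlockAllocatingIdGenerator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Hands out ids from the current block; hits the database
    // only once every BLOCK_SIZE ids.
    public synchronized long nextId() throws SQLException {
        if (next >= limit) {
            next = fetchSequenceValue();
            limit = next + BLOCK_SIZE;
        }
        return next++;
    }

    private long fetchSequenceValue() throws SQLException {
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT interchange_seq.NEXTVAL FROM dual")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}

Because each block starts where the sequence value points, a restart simply skips to the next unclaimed block instead of reusing numbers.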
Obtaining a sequence value from the database does not qualify as "hitting the database unnecessarily".
It is necessary in order to obtain a unique sequence value in an efficient and scalable way.
While sequence contention isn't unheard of in an Oracle database, it is usually not the biggest problem you have in a busy database. And it is one of the easiest things to fix: increase the sequence cache.
If you do know (as opposed to just assuming) that you will get a performance problem, then you might think about increasing the sequence increment to a very high number. Then, when you start your application, you call nextval and get the upper limit of numbers you can hand out inside your Java code without risking anything. If you reach that limit, call nextval again to get the next slice of numbers. Essentially, this combines your static variable with the sequence persistence that Oracle offers.
But again: I doubt that calling the sequence for each number will get you into trouble any time soon (and if it does, you will probably have other performance problems that are far bigger).
Another point as to why it is preferable to use something like the database for this: a static variable will only work while you have a single JVM instance.
If you ever need to scale out to more nodes, this becomes a very brittle pattern (unless you can avoid it by using GUIDs instead, but those won't fulfil your incrementing requirement in this case).
Instead of 'the' database, you could use another efficient mechanism like Memcached or Redis?
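For example, Redis's INCR is atomic on the server, so every node gets a unique, increasing value without any coordination in the application. A sketch using the Jedis client (the key name and connection details are assumptions):

import redis.clients.jedis.Jedis;

public class RedisIdGenerator {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // INCR is atomic server-side, so concurrent callers in
            // different JVMs can never receive the same value.
            long id = jedis.incr("interchange:id");
            System.out.println("next interchange id: " + id);
        }
    }
}

Note that if the numbers must survive a Redis restart, persistence (RDB/AOF) has to be configured accordingly.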
I would like to have a persistent, distributed counter. My idea is to use a database sequence. Only a sequence. I do not want to have a table, because I would not populate it. I just need a sequence of unique numbers.
I don't want to use a naive select mys-seq.nextval from dual (or org.springframework.jdbc.support.incrementer.OracleSequenceMaxValueIncrementer), because I would like to use the sequence's caching ability: I do not want to hit the database every time I need a new number.
I guess I should use org.hibernate.id.enhanced.SequenceStyleGenerator, but I cannot find any example of how to use it "standalone", without an entity.
Unfortunately, all the examples I found describe how to configure entity id generation with the sequence.
PS. I have a Spring Boot app.
I found a simple solution to my problem: I can treat each number from the database sequence as a range of numbers to use. For example, if the sequence returns 5, it means that the reserved range for my counter is 5000 - 5999.
With that solution I hit the database once per thousand numbers.
Initially I thought I had to use the database's internal sequence-number caching, but I can achieve the same result with trivial application-level caching.
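A sketch of that mapping, assuming a helper that performs the actual select against the sequence:

import java.util.function.LongSupplier;

public class RangeCounter {
    private static final long RANGE_SIZE = 1000;
    private final LongSupplier sequence; // e.g. wraps "select my_seq.nextval from dual"
    private long next;     // next number to hand out
    private long rangeEnd; // exclusive end of the reserved range

    public RangeCounter(LongSupplier sequence) {
        this.sequence = sequence;
    }

    // A sequence value of 5 reserves 5000 - 5999, so the database
    // is hit only once per thousand numbers.
    public synchronized long next() {
        if (next >= rangeEnd) {
            next = sequence.getAsLong() * RANGE_SIZE;
            rangeEnd = next + RANGE_SIZE;
        }
        return next++;
    }
}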
In my app I'm modelling an Invoice. In my country (Italy), every invoice must have a unique sequential number without holes, and the numbering has to restart from 1 every year.
I thought long and hard about the best way to implement this, but I have not found a good guide on it. For now I have a JpaRepository with my custom synchronized save() method, in which I get the last id used:
SELECT MAX(numero) FROM Invoice WHERE YEAR(date) = :year
The problem with this approach is that it is not very safe, because the developer has to know that the save should be done only through that particular service.
Instead, I'd prefer an approach that is hidden from the developer.
I thought of using a @PrePersist method in an @EntityListeners class. This sounds good, but getting the entity manager inside that class is not so simple... so maybe it is not the optimal place.
Finally, I thought about a Hibernate Interceptor...
Please give me some hints. This seems like a fairly generic problem, so maybe there is already a good practice to follow.
Thanks
This problem can be broken down into the following requirements:
Sequentially unique: Generate numbers in a sequence, starting from a given value (say, 1000001) and then always incrementing by a fixed value (say, 1).
No gaps: There must not be any gaps between the numbers. So, if the first number generated is 1000001, the increment is 1, and 200 numbers have been generated so far, the latest number should be 1000200.
Concurrency: Multiple processes must be able to generate the numbers at the same time.
Generation at creation: The numbers must be generated at the time of creation of a record.
No exclusive locks: No exclusive locks should be required for generating the numbers.
Any solution can comply with only 4 out of these 5 requirements. For example, if you want to guarantee 1-4, each process will need to take locks so that no other process can generate and use the same number it has generated; therefore, imposing 1-4 as requirements means that 5 has to be given up. Similarly, if you want to guarantee 1, 2, 4 and 5, you need to make sure that only one process (thread) generates a number at a time, because uniqueness cannot be guaranteed in a concurrent environment without locking. Continue this logic and you will see why it is impossible to guarantee all of these requirements at the same time.
Now, the solution depends on which one of 1-5 you are willing to sacrifice. If you are willing to sacrifice #4 but not #5, you can run a batch process during idle hours to generate the numbers. However, if you put this list in front of a business user (or a finance person), they will ask you to comply with 1-4, since #5 is a purely technical issue (to them) and they will not want to be bothered with it. If that is the case, a possible strategy is:
Perform all possible computation required to generate an invoice upfront, keeping the invoice number generation step as the very last step. This ensures that any exceptions that can occur happen before the number is generated, and also that the lock is held for a very short amount of time, thereby not affecting the concurrency or performance of the application too much.
Keep a separate table (for example, DOCUMENT_SEQUENCE) to keep a track of the last generated number.
Just before saving an invoice, take an exclusive row-level lock on the sequence table (say, isolation level SERIALIZABLE), find the required sequence value to use and save the invoice immediately. This should not take too much time because reading a row, incrementing its value and saving a record should be a short enough operation. If possible, make this short transaction a nested transaction to the main one.
Keep a decent-enough database timeout so that concurrent threads waiting for a SERIALIZABLE lock do not time out too fast.
Keep this whole operation in a retry loop, retrying at least 10 times before giving up completely. This will ensure that if the lock queue builds up too fast, the operations are still tried a few times before giving up totally. Many commercial packages have retry counts as high as 40, 60 or 100.
In addition to this, if possible and allowed by your database design guidelines, put a unique constraint on the invoice number column so that duplicate values are not stored at any cost.
Spring gives you all the tools to implement this.
Transactions: Through the @Transactional annotation.
Serialization: Through the isolation attribute of the @Transactional annotation.
Database access: Through Spring JDBC, Spring ORM and Spring Data JPA.
Retries: Through Spring Retry.
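A rough sketch of how those pieces could fit together (this is not the sample app mentioned below; the DOCUMENT_SEQUENCE layout, names, and the assumption that a row per (name, year) is seeded beforehand are all mine):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class InvoiceNumberService {
    private final JdbcTemplate jdbc;

    public InvoiceNumberService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Retried up to 10 times when concurrent transactions collide on the
    // SERIALIZABLE lock; requires @EnableRetry on a configuration class.
    @Retryable(maxAttempts = 10)
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public long nextInvoiceNumber(int year) {
        // Assumes the (name, year) row was inserted when the year started.
        long last = jdbc.queryForObject(
                "SELECT last_value FROM document_sequence WHERE name = ? AND year = ?",
                Long.class, "invoice", year);
        long next = last + 1;
        jdbc.update(
                "UPDATE document_sequence SET last_value = ? WHERE name = ? AND year = ?",
                next, "invoice", year);
        return next;
    }
}

The invoice itself would then be saved immediately after the number is obtained, in the same transaction, as described in the strategy above.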
I have a sample app that demonstrates using all these pieces together.
During localhost development, the IDs generated by GAE start with 1.
However, in a real GAE deployment in the cloud, the IDs generated even for the first entities are quite long, like 5639412304721232. Is there a workaround to make the first entities start with 1, 2, 3... and so on?
One might suggest using sharded counters, and yes, I've used those, but some suggest that sharded counters should not be used here, since the app might get the same count twice, as they are eventually consistent.
In this case what could be the best solution?
The official post explaining the switch from sequential to 'scattered' ids is here.
The instructions for reverting to sequential behaviour are here, but note the warning that this option will eventually be removed.
The 'best' solution depends on what you need and why. You'll get better datastore performance with scattered ids, but honestly, you might not notice much difference if your app gets a small number of requests and makes light use of the datastore. If that's the case, you can roll your own sequential ids based on a simple entity with a property that holds the current high-watermark id, and rely on having a low transaction rate to keep you from running into limits on the number of transactions per entity.
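A rough sketch of that high-watermark approach with the low-level Java datastore API (kind, key, and property names are assumptions):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class SequentialIds {
    public static long nextId(String sequenceName) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            Key key = KeyFactory.createKey("Sequence", sequenceName);
            Entity counter;
            try {
                counter = ds.get(txn, key);
            } catch (EntityNotFoundException e) {
                counter = new Entity(key);            // first use: start at 0
                counter.setProperty("highWatermark", 0L);
            }
            long next = (Long) counter.getProperty("highWatermark") + 1;
            counter.setProperty("highWatermark", next);
            ds.put(txn, counter);
            txn.commit();                             // contention here limits throughput
            return next;
        } finally {
            if (txn.isActive()) txn.rollback();
        }
    }
}

Every call transacts on the same entity, which is exactly the per-entity transaction-rate limit mentioned above.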
Reliably handing out sequential ids without gaps in a distributed system is challenging.
Be aware that you may run into problems if you create a lot of entities very quickly, with sequential Long IDs. This post gives you an explanation why.
In theory there's a choice of auto ID generation policies, with scattered IDs being the default since 1.8.1, but the old monotonically increasing legacy policy is to be deprecated for the reasons discussed in the linked post.
If you're using a sharded counter, you will avoid this but, as you say, you may encounter other issues.
You might try using allocate_ids. We use this to get smaller integer values for system-generated ids. In Python, using a db kind:
from google.appengine.ext import db

# Reserve a batch of one id for the kind; allocate_ids returns (start, end).
model_key = db.Key.from_path('your_kind_name', 1)
key_batch = db.allocate_ids(model_key, 1)
id_new = key_batch[0]  # first (and only) id in the reserved range
idkey = db.Key.from_path('your_kind_name', id_new)
I would assign the key's identifier as the strings "1", "2", "3"... and so on, generating them from a sequencer. You can check whether the entity already exists with the get_or_insert() function.
Similarly, you can use the auto-increment solution by storing the sequence number in an entity.
I have about 10,000,000 records inside a Redis database. I received a single-columned CSV file with about 100,000 strings which correspond to keys in my Redis database. For each of these strings I need to increment the corresponding value in Redis by one. Normally the INCR command is used for incrementing, but is there a way to make this faster than a loop that iterates 100,000 times and sends one INCR command at a time? Is there a way to do a bulk update?
First of all, every Redis driver has a "pipeline" feature for executing batch commands. You don't need to send the INCR commands one by one; you can send them to the Redis server together.
Second, if there are duplicate keys in your 100,000 strings, use the "INCRBY" command. For instance, if the file contains "k1, k2, k1", you can use "INCRBY k1 2" instead of two "INCR k1" commands.
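A sketch combining both points with the Jedis client (reading the CSV is omitted; the keys are assumed to be loaded into a list already):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class BulkIncrement {
    public static void increment(List<String> keysFromCsv) {
        // Collapse duplicates first: "k1, k2, k1" becomes {k1=2, k2=1}.
        Map<String, Long> counts = new HashMap<>();
        for (String key : keysFromCsv) {
            counts.merge(key, 1L, Long::sum);
        }
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Pipeline pipeline = jedis.pipelined();
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                pipeline.incrBy(e.getKey(), e.getValue()); // one INCRBY per distinct key
            }
            pipeline.sync(); // flush all queued commands in a single batch
        }
    }
}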
Note: the following is pure speculation and needs testing to be verified :)
@Mark_H's answer is textbook (+1), but I have a wild idea that you can test if you want. Assuming (and that's a big assumption) that your 10M or so keys are serializable, and that given the position of a key in the sequence you can derive the relevant key's name (e.g. if the names are based on a continuous numerical identifier), how about preparing a bitstring and having the set bits indicate an increment operation?
Such a bit value would be about 1.2MB in size, but the alternative is sending 100K ops (pipelined or not), so this is more network-efficient. What about performance? The next part of the idea is to write a small Lua script that accepts this value and does the INCR for the relevant keys. I suspect it would perform equally well, if not better.
Keep us updated if you try this ;)
P.S. Another hidden assumption in my answer is that you're only adding 1's, but that can be resolved by repeating the approach for other numbers/additional keys.
I want to store different kinds of counters for my users.
Platform: Java
E.g. I have identified:
currentNumRecords
currentNumSteps
currentNumFlowsInterval1440
currentNumFlowsInterval720
currentNumFlowsInterval240
currentNumFlowsInterval60
currentNumFlowsInterval30
etc.
Each of the counters above needs to be reset at the beginning of each month for each user. The value of each counter can be unpredictably high with peaks etc. (I mean that a lot of things are counted, so I want to think about a scalable solution).
Now my question is what approach to take:
a) Should I have a separate column for each counter on the user table and do things like 'UPDATE ... SET counterColumn = counterColumn + 1'?
b) Should I put all the values in some kind of JSON/XML and store it in a single column? (In this case I always have to update all values at once.)
The disadvantage I see is row locking on the user table every time a single counter is incremented.
c) Should I have a separate counter table with 3 columns (userid, name, counter), do one INSERT for each count, and have a background job compute aggregates that are written to the user table? In that case, would it be OK to store the aggregated counters as JSON inside a column in the user table?
d) Should I do everything in MySQL, or also use another technology? I have also thought about using another solution for storing the counters and only keeping the aggregates in MySQL. E.g. I have experimented with Apache Cassandra's distributed counters. My concern is with transactions, which Cassandra does not have.
I need the counters to be exact because they are used for billing, so I don't know if Cassandra is a good fit here, although its scalability seems tempting.
What about Redis for storing the counters and writing the aggregates to MySQL? Does Redis have features that help here? Or should I just store everything in a simple Java HashMap in memory, have an aggregation background thread, and not use another technology?
In summary, I am concerned about:
reducing row locking
having exact counters (transactions?)
Thanks for your ideas :)
You're sort of saying contradictory things.
The number of counts can be huge, or at least unpredictable, per user.
To me this means they must be uniform, like an array. It is not possible to have an unbounded amount of heterogeneous data, unless you have an unbounded amount of code and an unbounded number of developer hours to expend.
If they are uniform, they should be flattened into a table user_counter where each row is of the form (user_id, counter_name, counter_value). However, you will need to think carefully about what sort of indices you will need, etc. Resetting them all to zero or some default value at the beginning of the month is one SQL query.
Basically (c). Options (a) and (b) are absurd, and MySQL is still a suitable technology for this.
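For (c), a single statement per count keeps the write path simple. A sketch with plain JDBC against MySQL, assuming a unique key on (user_id, counter_name):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class CounterDao {
    // Inserts the row on first use, otherwise increments it in place;
    // this locks only the single counter row, not the user row.
    private static final String UPSERT =
            "INSERT INTO user_counter (user_id, counter_name, counter_value) "
          + "VALUES (?, ?, 1) "
          + "ON DUPLICATE KEY UPDATE counter_value = counter_value + 1";

    public void increment(Connection con, long userId, String counterName)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(UPSERT)) {
            ps.setLong(1, userId);
            ps.setString(2, counterName);
            ps.executeUpdate();
        }
    }
}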
Your requirement is not that atypical. In general, this is statistical data bound to a session, user, etc., that is written frequently.
The first thing to do is to split things, if that hasn't been done already: make a mostly read-only database and collect this data separately, with a separate user table for the normal properties.
The statistical data could be held in an in-memory table. You could also use means other than a database, such as a message queue or session attributes.