I really want to make heavy use of @Asynchronous to speed up my web application, so I want to understand it a bit better in order to avoid using this annotation incorrectly.
I know that the business logic inside a method annotated this way is handled in a separate thread, so the user won't have to wait for it. I have two methods that persist data:
public void persist(Object object) {
    em.persist(object);
}

@Asynchronous
public void asynPersist(Object object) {
    em.persist(object);
}
I have a couple of scenarios I want to ask about: which of these is not OK to do?
1. B does not depend on A
A a = new A();
asynPersist(a); // Is it risky to asynPersist(a) here?
B b = new B();
persist(b);
// Can't asynPersist(b) here because I need `b` to be immediately
// reflected in the view, or should I asynPersist(b) as well?
2. Same as the first scenario, but B now depends on A. Should I asynPersist(a)?
3. A and B are not related
A a = new A();
persist(a); // Since I need the content of `a` to be reflected in the view
B b = new B();
asynPersist(b); // If I don't need the content of `b` to be immediately reflected in the view, can I use async here?
EDIT: Hi @arjan, thank you so much for your post. Here is another scenario I want to ask your expertise on. Please let me know if my case does not make sense to you.
4. Assume User has an attribute called `count` of type `int`:
User user = null;

public void incVote() {
    user = userDAO.getUserById(userId);
    user.setCount(user.getCount() + 1);
    userDAO.merge(user);
}

public User getUser() { // Accessor method of user
    return user;
}
If I understand you correctly, if my method getUserById uses @Asynchronous, then the line user.setCount(user.getCount() + 1); will block until the user is returned, is that correct? So in this case the use of @Asynchronous is useless, correct?
If the method merge (which merges all changes of user back to the database) uses @Asynchronous, and in my JSF page I have something like this
<p:commandButton value="Increment" actionListener="#{myBean.incVote}" update="cnt"/>
<h:outputText id="cnt" value="#{myBean.user.count}" />
So the button invokes the method incVote(), then sends an Ajax request to tell the outputText to update itself. Will this create a race condition (remember we made merge asynchronous)? When the button tells the outputText to update itself, it invokes the accessor method getUser(); will the line return user; block to wait for the asynchronous userDAO.merge(user), or might there be a race condition here (so that count might not display the correct result), and is it therefore not recommended to do this?
There are quite a few places where you can take advantage of @Asynchronous. With this annotation you can write your application as intended by the Java EE specification: don't do explicit multi-threading, but let work be done by managed thread pools.
In the first place you can use this for "fire and forget" actions. E.g. sending an email to a user could be done in an @Asynchronous annotated method. The user does not need to wait for your code to contact the mail server, negotiate the protocol, etc. It's a waste of everyone's time to let the main request-processing thread wait for this.
Likewise, maybe you do some audit logging when a user logs in to your application and logs off again. Both of these persist actions are perfect candidates to put in asynchronous methods. It's senseless to let the user wait for such backend administration.
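For illustration, a minimal sketch of such a fire-and-forget method (the AuditService bean and the AuditEntry entity are made up for this example):

import javax.ejb.Asynchronous;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class AuditService {

    @PersistenceContext
    private EntityManager em;

    // The caller returns immediately; the container runs this on a pooled thread.
    // The caller's transaction does not propagate to @Asynchronous methods,
    // so this persist happens in its own transaction.
    @Asynchronous
    public void logEvent(String userId, String action) {
        em.persist(new AuditEntry(userId, action)); // AuditEntry: hypothetical entity
    }
}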
Then there is a class of situations where you need to fetch multiple data items that can't be (easily) fetched using a single query. For instance, I often see apps that do:
User user = userDAO.getByID(userID);
Invoice invoice = invoiceDAO.getByUserID(userID);
PaymentHistory paymentHistory = paymentDAO.getHistoryByuserID(userID);
List<Order> orders = orderDAO.getOpenOrdersByUserID(userID);
If you execute this as-is, your code will first go to the DB and wait for the user to be fetched. It sits idle in between. Then it goes to fetch the invoice and waits again, etc.
This can be sped up by doing these individual calls asynchronously:
Future<User> futureUser = userDAO.getByID(userID);
Future<Invoice> futureInvoice = invoiceDAO.getByUserID(userID);
Future<PaymentHistory> futurePaymentHistory = paymentDAO.getHistoryByuserID(userID);
Future<List<Order>> futureOrders = orderDAO.getOpenOrdersByUserID(userID);
As soon as you actually need one of those objects, the code will automatically block if the result isn't there yet. This allows you to overlap fetching of individual items and even overlap other processing with fetching. For example, your JSF life cycle might already go through a few phases until you really need any of those objects.
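On the implementation side, such a DAO method could look roughly like this; a sketch assuming an EJB session bean, where AsyncResult is just a convenience wrapper for the return value:

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class UserDAO {

    @PersistenceContext
    private EntityManager em;

    // The container intercepts the call, runs it on a pooled thread and hands
    // the caller a Future right away; the method body itself stays plain and synchronous.
    @Asynchronous
    public Future<User> getByID(long userID) {
        return new AsyncResult<>(em.find(User.class, userID));
    }
}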
The usual advice that multi threaded programming is hard to debug doesn't really apply here. You're not doing any explicit communication between threads and you're not even creating any threads yourself (which are the two main issues this historical advice is based upon).
For the following case, using asynchronous execution would be useless:
Future<User> futureUser = userDAO.getUserById(userId);
User user = futureUser.get(); // block happens here
user.setCount(user.getCount() + 1);
If you do something asynchronously and right thereafter wait for the result, the net effect is a sequential call.
will the line return user; block to wait for the asynchronous userDAO.merge(user)
I'm afraid you're not totally getting it yet. The return statement has no knowledge about any operation going on for the instance being processed in another context. This is not how Java works.
In my previous example, the getUserByID method returned a Future. The code automatically blocks on the get() operation.
So if you have something like:
public class SomeBean {

    Future<User> futureUser;

    public String doStuff() {
        futureUser = dao.getByID(someID);
        return "";
    }

    public User getUser() {
        try {
            return futureUser.get(); // blocks in case the result is not there yet
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
Then in case of the button triggering an AJAX request and the outputText rendering itself with a binding to someBean.user, then there is no race condition. If the dao already did its thing, futureUser will immediately return an instance of type User. Otherwise it will automatically block until the User instance is available.
Regarding doing the merge() operation asynchronously in your example: this might run into race conditions. If your bean is in view scope and the user quickly presses the button again (e.g. perhaps having double-clicked the first time) before the original merge is done, an increment might happen on the same instance that the first merge invocation is still persisting.
In this case you have to clone the User object first before sending it to the asynchronous merge operation.
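A minimal sketch of that idea, assuming User has (or is given) a copy constructor and the DAO exposes an asynchronous merge variant (both are assumptions for the example):

public void incVote() {
    user = userDAO.getUserById(userId);
    user.setCount(user.getCount() + 1);

    // Hand the asynchronous merge its own snapshot, so a quick second click
    // that mutates 'user' cannot affect the copy that is still being persisted.
    User snapshot = new User(user); // hypothetical copy constructor
    userDAO.mergeAsync(snapshot);   // hypothetical @Asynchronous variant of merge()
}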
The simple examples I started this answer with are pretty safe, as they are about doing an isolated action or about doing reads with an immutable type (the userID; assume it is an int or a String) as input.
As soon as you start passing mutable data into asynchronous methods, you'll have to be absolutely certain that no mutation is done to that data afterwards. Otherwise, stick to the simple rule of only passing in immutable data.
You should not use async this way if any process that follows the async piece depends on its outcome. If you persist data that a later thread needs, you'll have a race condition, which is a bad idea.
I think you should take a step back before you go this route. Write your app as recommended by Java EE: single threaded, with threads handled by the container. Profile your app and find out where the time is being spent. Make a change, reprofile, and see if it had the desired effect.
Multi-threaded apps are hard to write and debug. Don't do this unless you have a good reason and solid data to support your changes.
Related
I have a service method where I request an entity by ID from the database. If the entity's attribute paid == false, I set it to true and do something. If paid == true, it just returns.
@Override
@Transactional(rollbackFor = {ServiceException.class})
public void handleIntentSucceeded(PaymentIntent intent) throws ServiceException {
    LOGGER.trace("handleIntentSucceeded({})", intent);
    CreditCharge charge = transactionRepository.findByPaymentIntentId(intent.getId());
    if (charge.getPaid()) {
        return;
    }
    // do some stuff
    charge.setPaid(true);
    transactionRepository.save(charge);
}
Now, if there are multiple requests with the same intent at the same time, this method would no longer be consistent: the first request receives the charge with paid == false and does "some stuff", and if a second request reaches this method before the first one has saved the charge with paid == true, it would also do "some stuff", even though the first request already did. Is this a correct conclusion?
To be sure that only one request can process this method at a time, and to avoid "some stuff" being done multiple times, I could set the transaction to @Transactional(isolation = Isolation.SERIALIZABLE). That way a request can process this method/transaction only after the previous request has committed its transaction.
Is this the best approach or is there a better way?
One solution, as already mentioned above, is to use optimistic locking. However, an OptimisticLockingException will lead to a failed HTTP request. If that is a problem, you can handle the exception.
But if you are sure that you will not run multiple instances of the application and there are no big performance requirements, or you simply want to deal with the problem later and use a "workaround" until then, you can make the method synchronized (https://www.baeldung.com/java-synchronized). That way the Java runtime ensures that the method cannot run in parallel.
I would probably look for a way of optimistically locking the record (e.g. using some kind of update counter), so that only the first of the concurrent transactions changing the paid property completes successfully.
Any subsequent transaction that was trying to modify the same entity in the meantime would then fail, and the actions done during "do some stuff" would roll back.
Optimistic vs. Pessimistic locking
Edit: the REPEATABLE_READ isolation level (as suggested in one of the comments) might also behave similarly to optimistic locking, though this might depend on the implementation.
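A minimal sketch of such optimistic locking with JPA, assuming a version column can be added to the CreditCharge entity (javax or jakarta persistence, depending on your stack):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class CreditCharge {

    @Id
    private Long id;

    private boolean paid;

    // JPA increments this on every update; a concurrent transaction that read
    // the old version fails with an OptimisticLockException when it commits,
    // so only the first "unpaid -> paid" update goes through.
    @Version
    private long version;

    // getters and setters omitted
}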
I have a method
@Transactional
public void updateSharedStateByCommunity(List[] idList)
This method is called from the following REST API:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    // call updateSharedStateByCommunity
}
Now, the ID list is very large, e.g. 200000 entries. When I try to process it, it takes a lot of time and a timeout error occurs on the client side.
So I want to split it into two calls with a list size of 100000 each.
But the problem is that these are then treated as 2 independent transactions.
NB: the 2 calls are just an example; the list could be divided into more calls if the number of IDs grows even larger.
I need the two separate calls to run in a single transaction: if either of the 2 calls fails, all operations should be rolled back.
Also, on the client side we need to show a progress dialog, so I can't just increase the timeout.
The most obvious direct answer to your question IMO is to slightly change the code:
@RequestMapping(method = RequestMethod.POST)
public ret_type updateUser(param) {
    updateSharedStateByCommunityBlocks(resolveIds);
}
...
And in the service introduce a new method with the following functionality (if you can't change the code of the service, provide an intermediate class that you'll call from the controller):
@Transactional
public void updateSharedStateByCommunityBlocks(List<String> resolveIds) {
    List<String>[] blocks = split(resolveIds, 100000); // 100000 - bulk size
    for (List<String> block : blocks) {
        updateSharedStateByCommunity(block);
    }
}
If this method is in the same service, the @Transactional on the original updateSharedStateByCommunity won't do anything, so it will work. If you put this code into some other class, it will also work, since the default propagation level of Spring transactions is REQUIRED.
So it addresses your strict requirements: you wanted a single transaction and you've got it; all the code now runs in the same transaction. Each call now runs with 100000 IDs rather than all of them, and everything is synchronous :)
However, this design is problematic for many different reasons.
It doesn't allow you to track the progress (and show it to the user), as you stated yourself in the last sentence of the question. REST is synchronous.
It assumes that the network is reliable and that waiting for 30 minutes is technically not a problem (leaving aside the UX and the 'nervous' user that will have to wait :) )
In addition to that, the network equipment can force closing the connection (like load balancers with pre-configured request timeout).
That's why people suggest some kind of asynchronous flow.
You can still use the async flow: spawn the task, and after each bulk update some shared state, in-memory in the case of a single instance, or persistent (like a database) in the case of a cluster.
So the interaction with the client will change (see the sketch after these steps):
Client calls "updateUser" with 200000 ids
Service responds "immediately" with something like "I've got your request, here is a request ID, ping me once in a while to see what happens."
Service starts an async task and processes the data chunk by chunk in a single transaction.
Client calls the "get" method with that ID and the server reads the progress from the shared state.
Once ready, the "get" method responds with "done".
If something fails during the transaction execution, the rollback is done, and the process updates the database status with "failure".
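A rough sketch of that flow in Spring: the worker and controller names, the CommunityService wrapper around the chunked @Transactional update shown above, and the in-memory status map are all assumptions for the example; a clustered setup would persist the status in a table instead.

import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.*;

@Service
class BulkUpdateWorker {

    // requestId -> status; in-memory, so this only works for a single instance
    private final Map<String, String> status = new ConcurrentHashMap<>();

    private final CommunityService communityService; // hypothetical wrapper around the chunked @Transactional update

    BulkUpdateWorker(CommunityService communityService) {
        this.communityService = communityService;
    }

    // Requires @EnableAsync; must be invoked through the Spring proxy, i.e. from another bean.
    @Async
    public void process(String requestId, List<String> ids) {
        status.put(requestId, "RUNNING"); // a very early poll may still see UNKNOWN
        try {
            communityService.updateSharedStateByCommunityBlocks(ids); // one transaction over all chunks
            status.put(requestId, "DONE");
        } catch (RuntimeException e) {
            status.put(requestId, "FAILED"); // the transaction rolled back
        }
    }

    public String statusOf(String requestId) {
        return status.getOrDefault(requestId, "UNKNOWN");
    }
}

@RestController
class UpdateController {

    private final BulkUpdateWorker worker;

    UpdateController(BulkUpdateWorker worker) {
        this.worker = worker;
    }

    @PostMapping("/users/update")
    public String updateUser(@RequestBody List<String> ids) {
        String requestId = UUID.randomUUID().toString();
        worker.process(requestId, ids); // returns immediately
        return requestId;               // the client polls with this ID
    }

    @GetMapping("/users/update/{requestId}")
    public String progress(@PathVariable String requestId) {
        return worker.statusOf(requestId);
    }
}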
You can also use more modern technologies to notify the server (web sockets for example), but it's kind of out of scope for this question.
Another thing to consider here: from what I know, processing 200000 objects should take much less than 30 minutes; it's not that much for modern RDBMSs.
Of course, without knowing your use case it's hard to tell what happens there, but maybe you can optimize the flow itself (using bulk operations, reducing the number of requests to the DB, caching, and so forth).
My preferred approach in those scenarios is to make the call asynchronous (Spring Boot allows this using the @Async annotation), so the client doesn't have to wait for the processing to finish before getting an HTTP response. The notification could be done via a WebSocket that pushes a message to the client with the progress every X items processed.
Surely it will add more complexity to your application, but if you design the mechanism properly, you'll be able to reuse it for any other similar operation you may face in the future.
The @Transactional annotation accepts a timeout (although not all underlying implementations will support it). I would argue against trying to split the IDs into two calls, and instead try to fix the timeout (after all, what you really want is a single, all-or-nothing transaction). You can set timeouts for the whole application instead of on a per-method basis.
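For example (a sketch; the timeout attribute is in seconds and the value is only illustrative):

// One all-or-nothing transaction, but with a far more generous timeout.
@Transactional(timeout = 1800)
public void updateSharedStateByCommunity(List[] idList) {
    // ...
}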
From a technical point of view, it can be done with the org.springframework.transaction.annotation.Propagation#NESTED propagation. The NESTED behavior makes nested Spring transactions use the same physical transaction, but sets savepoints between nested invocations, so inner transactions may roll back independently of the outer transaction, or let the rollback propagate. The limitation is that it only works with org.springframework.jdbc.datasource.DataSourceTransactionManager.
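A sketch of that shape; the two methods need to live in separate beans (or at least be invoked through the Spring proxy) for the annotations to apply, and split() is a hypothetical helper:

import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class BulkUpdater {

    private final BlockUpdater blockUpdater;

    public BulkUpdater(BlockUpdater blockUpdater) {
        this.blockUpdater = blockUpdater;
    }

    @Transactional
    public void updateAll(List<String> idList) {
        for (List<String> block : split(idList, 100000)) { // split() is a hypothetical helper
            blockUpdater.updateBlock(block);
        }
    }
}

@Service
class BlockUpdater {

    // Same physical transaction, but a savepoint per invocation: a failing block
    // can roll back to its savepoint without dooming the whole outer transaction.
    @Transactional(propagation = Propagation.NESTED)
    public void updateBlock(List<String> block) {
        // ...
    }
}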
But for a really large dataset it still takes more time to process and keeps the client waiting, so from a solution point of view an async approach may be better, though it depends on your requirements.
I wanted to set up this variable called userName, which should get a new value inside the ValueEventListener. However, when I set the new value inside the listener, it doesn't change; the output is still "".
private fun getName() {
    var userName = ""
    val user = fbAuth.currentUser
    var uid = user!!.uid

    mDatabase = FirebaseDatabase.getInstance().getReference("users")
    mDatabase.addValueEventListener(object : ValueEventListener {
        override fun onCancelled(p0: DatabaseError) {
            TODO("not implemented")
        }

        override fun onDataChange(snapshot: DataSnapshot) {
            userName = snapshot.child(uid).child("name").getValue().toString()
        }
    })
    println(userName)
}
Expected output: John (value of name child),
current output: ""
The listener is asynchronous; if you put the println statement below the userName = line, it will print.
In fact, go ahead and do that and observe the timestamps: which one prints first, the empty one or the one inside the callback?
The var is being modified by the callback, but the println executes first, long before (in computer time, that is) Firebase emits its value.
Additionally, I would invert the order of the mDatabase lines.
You are essentially requesting a value and then listening for results; the result may have already been emitted. You should add the listener first, then request the data.
Update: what if I need the value for another callback?
Welcome to the world of asynchronous programming :-)
What you describe is a set of independent asynchronous operations. You need value A, and value B, but you can't obtain value B, until you have value A. Both are asynchronous and take time, but you don't have time on the main thread, or rather, you have ~16ms to compute, measure, and draw your screen so the OS can keep up with 60 frames per second. That's not a lot of time and part of the reason why asynchronous programming exists!
This other answer already provides a working sample of what you need. This other external link has a more concrete example of the Observer Listener pattern.
In short, what you want is an instance of an object which can be invoked once an operation completes.
In a regular synchronous function, each statement is executed after the other, and no statement will be executed until the previous one has finished; all statements are therefore blocking statements.
For example:
var times = 2
var message = "Hello"
var world = "World"
println("The message is $message $times")
println(world)
Will print:
The message is Hello 2
World
This is because the execution point goes from one line to the next, waiting for the previous one to finish. If one operation takes time, the thread will be blocked (from performing anything else) until that operation completes and the execution point can move to the next instruction.
As you can imagine, the Main Thread in iOS and Android (and well, Windows, macOS, Linux, etc) cannot be blocked, or the OS wouldn't be able to respond to your touches and other things happening (for e.g., on a mobile phone, an incoming phone call wouldn't be able to be processed if the UI is not responsive and you cannot tap "answer").
This is why we use other "threads" to off-load things that are not super fast. This comes with a mindset change, as well as correct planning, for things are now more complicated.
Let's see a simple example (some pseudo-code, so bear with any obvious glaring mistakes; this is just to illustrate the point, not to write a solution).
fun main() {
    var hello = "Hello"
    var message = thisTakesTime()
    println("The message is $hello $message")
    println(hello)
}

fun thisTakesTime(): String {
    // do something that takes 1 second (1000 ms)
    Thread.sleep(1000)
    return "World"
}
This will print
The message is Hello World
Hello
As you can see, nothing changed, except that for one entire second the main thread was unresponsive. If you were to run this on Android, for example, it would work, but your app would not respond for a second, during the Thread.sleep. One second is fast; try 10 seconds. That exceeds the Android operating system's limit of 5 seconds of main-thread unresponsiveness before it decides the ANR (application not responding) dialog is needed; this is the infamous "It looks like XXX application is not responding, wait or close".
What can you do?
Initially, if you have too many callbacks (where callback A cannot execute until callback B finishes, and callback B cannot execute until callback C finishes), and you start nesting them like that, you end up in the infamous callback hell (described for JavaScript, but valid for any language/platform).
Basically, tracking all these asynchronous callbacks and ensuring that by the time the response comes your next callback is ready, and so forth, is a pain. It also introduces exponential complexity: if, for example, callback C fails in the middle, you now have to let callback B know that C failed and therefore B has to fail too, which in turn has to let callback A (the original!) know that B failed, so A has to do something about it. Does A need to know that B failed because of C, or does A only care about B, with the reasons behind B's failure being irrelevant?
Well, as you can see, even talking about this gets complicated and messy and I didn't even cover other possible scenarios, equally as complex.
What I'm trying to say here is not that you shouldn't use callbacks; it's that you have to carefully plan where and when to use them.
Kotlin has alternatives for reducing or removing callback hell by using coroutines, but these are a moderately advanced topic and also require a fundamental change in how you design your components and pieces.
All in all, for your use case, remember the golden rule of OOP: make small, concrete classes that do very few things and do them well. If you need to start adding too many if()s all over the place, then chances are you're mixing business logic, random decisions, and "what about" cases all over the place.
Imagine you have a class that processes Location data and uploads it to a server.
You may be tempted to:
Write all the code in the Activity/Fragment (or ViewModel); this quickly becomes a mess.
Create a LocationUtils with static methods (or a singleton pattern); a mess already, but also hard to test and mock. What if you need more than one type of processing? Or what if you want to store the results in a database; are you going to add more static methods?
Create a small LocationProcessor class that receives two points (lat/long), does the processing in a small function, and returns the processed data. Then create another class called LocationUploader that receives clean input from a processor and uploads it to a server. None of these classes should think about "what if I don't have permissions, what if the user turns location off", etc. Those problems exceed the responsibility of a class whose intention is to process location coordinates, nothing else; other classes should be responsible for them. Remember: small classes, small responsibilities == less to worry about in a single file.
Conclusion?
Well, at this point there are better answers that will give you the copy-paste version of what you're looking for. I believe the concept to take away from this wall of text is that, in order to write modern, testable, and simple functional code, a change in how you plan things must happen.
Long story short: when things are not synchronous, you need to keep something (an object) ready to be called back (hence the name callback), listening (or observing) (hence why we call them listeners or observers) for the emission of something (usually called an Observable, because it can be "observed").
Good luck!
Yes, the listener is asynchronous; it will only work if you print the variable inside the onDataChange method.
However, you can use a callback strategy to wait for Firebase to return the data. Something like this:
interface MyCallback {
    fun onCallback(value: String)
}

fun readData(myCallback: MyCallback) {
    mDatabase.addValueEventListener(object : ValueEventListener {
        override fun onCancelled(error: DatabaseError) {
            // handle the error
        }

        override fun onDataChange(snapshot: DataSnapshot) {
            val value = snapshot.child(uid).child("name").getValue().toString()
            myCallback.onCallback(value)
        }
    })
}

fun test() {
    readData(object : MyCallback {
        override fun onCallback(value: String) {
            println(value)
        }
    })
}
As Martin says, it's an asynchronous operation; you should handle the text output after the asynchronous process has completed:
mDatabase.addValueEventListener(object : ValueEventListener {
    override fun onCancelled(p0: DatabaseError) {
        TODO("not implemented")
    }

    override fun onDataChange(snapshot: DataSnapshot) {
        userName = snapshot.child(uid).child("name").getValue().toString()
        println(userName) // --> asynchronous request has ended, show the name
    }
})
I'm making a series of connections asynchronously via MySQL, and I have a class which contains a bunch of easily accessible static methods to update/remove/clear/get/etc. data.
The issue I'm confronted with is that the getter methods won't return the proper value (practically ever), because they return before the async connection gets a chance to update the value to be returned.
Example:
public static int getSomething(final UUID user)
{
    Connection c = StatsMain.getInstance().getSQL().getConnection();
    PreparedStatement ps;
    try
    {
        ps = c.prepareStatement("select something from stats where uuid=?");
        ps.setString(1, user.toString());
        ResultSet result = ps.executeQuery();
        result.next(); // move to the first (and only) row
        return result.getInt("something");
    }
    catch (SQLException e)
    {
        return -1; // can't return false from an int method; use a sentinel instead
    }
}
(Not copy & pasted, but pretty close)
I realize I can get a 'callback' effect by passing an interface to the method and doing it that way, but that becomes very tedious when the database stores 10 values for a key.
Sounds like you're looking for Future (available since Java 5) or CompletableFuture, which is new in Java 8.
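A rough sketch of the CompletableFuture variant, reusing the classes from the question (the error handling and the -1 "not found" sentinel are just for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

public static CompletableFuture<Integer> getSomethingAsync(final UUID user)
{
    return CompletableFuture.supplyAsync(() -> {
        Connection c = StatsMain.getInstance().getSQL().getConnection();
        try (PreparedStatement ps = c.prepareStatement("select something from stats where uuid=?"))
        {
            ps.setString(1, user.toString());
            try (ResultSet result = ps.executeQuery())
            {
                return result.next() ? result.getInt("something") : -1;
            }
        }
        catch (SQLException e)
        {
            throw new RuntimeException(e); // completes the future exceptionally
        }
    });
}

The caller then chains work onto the future instead of blocking for the value, e.g. getSomethingAsync(uuid).thenAccept(value -> updateScoreboard(value)), where updateScoreboard is whatever hypothetical code should consume the result.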
Solution 1:
The best method I've come up with is to have a thread with a loop in it that waits for MySQL to return values and responds to each value. This is rather like the callback in the get routine, but you only have the one loop. Of course, the loop has to know what to do with each possible returned piece of data.
This means rethinking a bit how your program works. Instead of: ask a question, get an answer, use the answer, you have two completely independent operations. The first is: ask a question, then forget about it. The second is: get an answer, then, knowing nothing about the question, use the answer. It's a completely different approach, and you need to get your head around it before using it.
(One possible further advantage of this approach is that MySQL end can now send data without being prompted. You have the option of feeding changes made by another user to your user in real time.)
Solution 2:
My other solution is simpler in some ways, but it can have you firing off lots of threads. Just have your getSomething method block until it has the answer and returns. To keep your program from hanging, just put the whole block of code that calls the method in its own thread.
Hybrid:
You can use both solutions together. The first one makes for cleaner code, but the second lets you answer a specific question when you get the reply. (If you get a "Customer Name" from the DB, and you have a dozen fields it could go in, it might help to know that you did ask for this field specifically, and that you asked because the user pushed a button to put the value in a specific text box on the screen.)
Lastly:
You can avoid a lot of multithreading headaches by using InvokeLater to put all changes to your data structures on your EventQueue. This can nicely limit the synchronization problems. (On the other hand, having 20 or 30 threads going at once can make good use of all your computer's cores, if you like to live dangerously.)
You may want to stick with synchronized calls, but if you do want to go asynchronous, this is how I'd do it. It's not too bad once you get some basic tools written and get your brain to stop thinking synchronously.
I'm not quite sure exactly how to go about this... so it may take me a few tries to get this question right. I have an annotation for caching the results of a method. My code is a private fork for now, but the part I'm working on starts from here:
https://code.google.com/p/cache4guice/source/browse/trunk/src/org/cache4guice/aop/CacheInterceptor.java#46
I have annotated a method that I want cached; it runs a VERY slow query that sometimes takes a few minutes. The problem is that my async web app keeps getting new users coming in and asking for the same data, while the getSlowData() method hasn't completed yet.
So something like this:
@Cached
public getSlowData() {
...
}
Inside the interceptor, we check the cache and find that it's not cached, which passes us down to:
return getResultAndCache(methodInvocation, cacheKey);
I've never gotten comfortable with the whole concept of concurrency. I think what I need is to mark that getResultAndCache(), for the given getSlowData(), has already been kicked off, and that subsequent requests should wait for that result.
Thanks for any thoughts or advice!
Most cache implementations synchronize calls to 'get' and 'set', but that's only half of the equation. What you really need to do is make sure that only one thread enters the 'check if loaded and load if not there' part. For most situations, the cost of serializing thread access may not be worth it if there is (1) no risk and (2) little cost to loading the data multiple times through parallel threads (comment here if you need more clarification on this). Since this annotation is used universally, I would suggest creating a second annotation, something like '@ThreadSafeCached', whose invoke method will look like this:
Object cacheElement = cache.get(cacheKey);
if (cacheElement != null) {
    LOG.debug("Returning element in cache: {}", cacheElement);
} else {
    synchronized (<something>) {
        // double-checked locking, works in Java SE 5 and newer
        if ((cacheElement = cache.get(cacheKey)) == null) {
            // a second check to make sure a previous thread didn't load it
            cacheElement = getResultAndCache(methodInvocation, cacheKey);
        } else {
            LOG.debug("Returning element in cache: {}", cacheElement);
        }
    }
}
return cacheElement;
Now, I left out the part about what you synchronize on. It would be optimal to lock on the item being cached, since that way threads that are not loading this particular cache item never have to wait. If that's not possible, another crude approach is to lock on the annotation class itself. This is obviously less efficient, but if you have no control over the cache-loading logic (it seems like you do), it's an easy way out!
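On Java 8+, one way to get per-key locking without hand-rolling double-checked locking is to keep one FutureTask per cache key in a ConcurrentHashMap. This is only a sketch of the idea, not the cache4guice API:

import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class PerKeyLoader {

    // At most one in-flight load per cache key; computeIfAbsent guarantees
    // that concurrent callers all see the same FutureTask.
    private final ConcurrentMap<Object, FutureTask<Object>> inFlight = new ConcurrentHashMap<>();

    public Object load(Object cacheKey, Callable<Object> loader)
            throws InterruptedException, ExecutionException {
        FutureTask<Object> task = inFlight.computeIfAbsent(cacheKey, k -> new FutureTask<>(loader));
        task.run();            // only the first caller actually runs the loader; later calls are no-ops
        try {
            return task.get(); // everyone blocks here until the single load has finished
        } finally {
            inFlight.remove(cacheKey, task); // allow a fresh load after this one completes
        }
    }
}

In the interceptor, loader would wrap getResultAndCache(methodInvocation, cacheKey), so threads asking for other keys never wait on this one.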