Why would I ever want to load an Objectify entity asynchronously? And what does asynchronous loading actually mean?
According to the Objectify documentation on loading, the following way of loading an entity is asynchronous:
// Simple key fetch, always asynchronous
Result<Thing> th = ofy().load().key(thingKey);
And if I want a load to perform synchronously then I should do this:
Thing th = ofy().load().key(thingKey).now(); // added .now()
To me, asynchronous means that the action will take place later at some unspecified time. For saving, asynchronous makes sense because the datastore operation may need some time to finish on its own without blocking the application code.
But with loading, does asynchronous mean the load will take place at another time? How is that even possible in Java? I thought the variable Result<Thing> th had to be updated when the line of code Result<Thing> th = ofy().load().key(thingKey); finishes executing.
As a novice it's taken me a long time to figure this out (see for instance Objectify error "You cannot create a Key for an object with a null #Id" in JUnit).
So I have a few questions:
1] Why would I ever want to load an Objectify entity asynchronously?
2] What does asynchronous loading actually mean?
3] What is the conceptual link between now() for loading and now() for saving?
Synchronous Load (source)
Thing th = ofy().load().key(thingKey).now();
Synchronous Save (source)
ofy().save().entity(thing1).now();
4] Why isn't synchronous the default behavior for saving and loading?
Response from Google Cloud Support to support case 05483551:
“Asynchronous” in the context of Java means the use of “Futures” or Future-like constructs. A Future in java[1] is an object that represents an operation that doesn’t necessarily need to be performed and completed by the time the next line begins executing in the current thread.
A call to an asynchronous function in Java will return a Future immediately, representing the promise that a background “thread” will work on the computation/network call while the next line of the code continues to execute, not needing that result yet. When the method .get() is called on the Future object, either the result is returned, having been obtained in time, or the thread will wait until the result is obtained, passing execution to the next line after the .get() call only once this happens.
In Objectify, Futures were avoided, and instead the Result interface was defined[2], for reasons related to exceptions being thrown that made it painful to develop on the basis of Futures. They work in almost identical fashion, however. Where a regular Future has the method .get(), the Result interface (implemented by several different concrete classes depending on what kind of Objectify call you’re doing) has .now(), which retrieves the result or blocks the thread until it’s available.
The reason why you might want to load an entity asynchronously is when you have a request handler or API method that needs an Entity later in the function, but has some other computation to do as well, unrelated to the Entity. You can kick off the load for the entity in the first line, obtaining a Result, and then only call .now() on the Result once your other unrelated code has finished its execution. If you waited for the point when you call .now() to actually initiate the load, you might have your response handler/API method just waiting around for the result, instead of doing useful computations.
Finally, the conceptual link between .now() for loading and .now() for saving is that both operations happen in the background, and are only finally forced, blocking the execution thread, when .now() is called on the Result-interface-implementing object that is returned by the call to save() or load().
I hope this has helped explain the asynchronous constructs in Java Objectify for you. If you have any further questions or issues, feel free to include these in your reply, and I'll be happy to help.
Sincerely,
Nick
Technical Solutions Representative
Cloud Platform Support
[1] http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
[2] http://objectify-appengine.googlecode.com/svn/trunk/javadoc/com/googlecode/objectify/Result.html
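Stripped of the Objectify specifics, the mechanism Nick describes can be sketched with a plain java.util.concurrent Future; the executor, sleep, and return value below are illustrative stand-ins, not Objectify code:

```java
import java.util.concurrent.*;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Kick off the "load" in the background; submit() returns immediately.
        Future<String> result = pool.submit(() -> {
            Thread.sleep(100); // simulate network latency to the datastore
            return "thing-from-datastore";
        });

        // Unrelated work runs here while the load is in flight...

        // get() blocks only if the result is not ready yet -- the analogue of Result.now().
        String thing = result.get();
        System.out.println(thing);

        pool.shutdown();
    }
}
```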
Asynchronous operations start a network fetch to the backend and then let your code continue executing. The advantage of async operations is that you can run several of them in parallel:
Result<Thing> th1 = ofy().load().key(thingKey1);
Result<Thing> th2 = ofy().load().key(thingKey2);
Result<Thing> th3 = ofy().load().key(thingKey3);
th1.now();
th2.now();
th3.now();
This executes significantly faster than calling now() immediately each time. Note this is a bad example because you probably would use a batch get (which also parallelizes the operation) but you can have several queries, saves, deletes, etc running simultaneously.
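The same overlap can be reproduced without Objectify using plain Futures; this sketch simulates each load as a roughly 100 ms network call, so the three waits overlap instead of adding up:

```java
import java.util.List;
import java.util.concurrent.*;

public class ParallelFetch {
    // Simulates one network round trip of ~100 ms.
    static String fetch(String key) throws InterruptedException {
        Thread.sleep(100);
        return "value-for-" + key;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);

        // Start all three fetches before waiting on any of them.
        List<Future<String>> results = List.of(
                pool.submit(() -> fetch("key1")),
                pool.submit(() -> fetch("key2")),
                pool.submit(() -> fetch("key3")));

        // Total wall time is ~100 ms, not ~300 ms, because the waits overlap.
        for (Future<String> r : results) {
            System.out.println(r.get());
        }
        pool.shutdown();
    }
}
```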
now() always forces synchronous completion, blocking until done.
The Google Cloud Datastore was designed to give users the best of both the relational and non-relational worlds. It is a NoSQL database that offers eventual consistency for improved scalability, but also gives you the option to choose strong consistency.
This article by Google, Balancing Strong and Eventual Consistency with Google Cloud Datastore, will go a long way to answering some of your questions. It explains the eventual consistency model which is key to understanding how the datastore works under the hood in relation to your question.
Related
This is a general question about Android development and the use of coroutines. I am relatively new to Android development and have created an application using the MVVM architecture pattern.
I am currently having a problem where I insert into a table and retrieve an ID back in an observer with LiveData.
I then need to use this ID immediately to insert into another table to act as a foreign key.
One table defines the entry and the other the fields associated to that entry.
My issue is that the insertion of the initial ID is happening in the background, so by the time the ID is returned to the activity an error has already been thrown up.
I need some way of either:
waiting for the ID to be returned, or
having the insertion run in the foreground (but I am unsure how to do this).
I have seen that one solution is to use coroutines, but this seems to be a Kotlin-only solution.
Does anyone know of a solution that would work in android java to immediately retrieve the ID of insertion in the activity to use for the next insert?
*I am using a Room SQL database.
Ok, correct me if I'm wrong, but what I think you want is a way to chain asynchronous operations together in a synchronous way.
So you have one operation which needs to insert into a table asynchronously, and another operation which needs to use the id from the result of the first operation to insert into another table.
So your second operation requires the first operation to have finished before it runs. But your first operation is running in the background so the question arises; "How do I make sure not to fire the second operation until the first one has finished?".
This is the concept of "chaining" asynchronous calls. Or, in other words, performing asynchronous calls in a synchronous fashion.
Because you need to use Java you won't be able to use Kotlin coroutines (because that's a Kotlin language feature). Fortunately, there are several methods for achieving this in Java.
I personally would recommend the use of RX Java. There are loads of operators for combining asynchronous operations. The one you'd probably want for this use case is called flatMap, which is an operator which blocks on the first operations result before invoking the second operation, with the results of the first one as argument(s).
However, RX is quite a big dependency to add and also has quite a learning curve. So, choosing to use this tool will depend on how prevalent this kind of problem is in your code base.
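Another lightweight possibility, assuming you can use java.util.concurrent.CompletableFuture (available from API level 24 on Android), is to chain the two inserts with thenApply/thenCompose, which play roughly the role RX's flatMap plays. The two DAO stand-ins below are hypothetical:

```java
import java.util.concurrent.CompletableFuture;

public class ChainedInserts {
    // Hypothetical stand-ins for the two Room DAO calls.
    static long insertEntry() { return 42L; }                        // returns the new row id
    static String insertFields(long entryId) { return "fields-for-" + entryId; }

    public static void main(String[] args) {
        String result = CompletableFuture
                .supplyAsync(ChainedInserts::insertEntry)            // first insert, off the main thread
                .thenApply(ChainedInserts::insertFields)             // runs only once the id is available
                .join();                                             // block here only if you must
        System.out.println(result);
    }
}
```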
Another option is to set up a shared single-thread executor used to issue both operations on the same background thread. Because it is a single background thread, as long as you issue the commands to the executor sequentially, they will execute sequentially, but off the main thread. So, assuming your Room DB functions are blocking (i.e. when you issue them, the current thread waits for the operation to complete), you can chain the operations like so:
// Create a shared single-threaded executor to run both operations on the same background thread
private final Executor sharedSingleThreadExecutor = Executors.newSingleThreadExecutor();

private void doThingAThenThingB() {
    // Sequentially call thing A on the shared background thread
    sharedSingleThreadExecutor.execute(() -> doThingA());

    // Sequentially call thing B on the shared background thread; it runs only after A completes
    sharedSingleThreadExecutor.execute(() -> doThingB());
}
I've spent a lot of time looking at this, and there are a ton of ways to run work in the background in Java (I'm specifically looking at Java 8 solutions, it should be noted).
Ok, so here is my (generic) situation - please note this is an example, so don't spend time over the way it works/what it's doing:
Someone requests something via an API call
The API retrieves some data from a datastore
However, I want to cache this aggregated response in some caching system
I need to call a cache API (via REST) to cache this response
I do not want to wait until this call is done before returning the response to the original API call
Some vague code structure:
@GET
// ... other API annotations
public Response myAPIMethod() {
    // get data from the datastore
    Object o = getData();
    // submit a request to cache the data, without blocking
    saveDataToCache(o);
    // return the response to the client
    return Response.ok(o).build();
}
What is the "best" (optimal, safest, standard) way to run saveDataToCache in the background without having to wait before returning data? Note that this caching should not occur too often (maybe a couple of times a second).
I attempted this a couple of ways, specifically with CompletableFutures but when I put in some logging it seemed that it always waited before returning the response (I did not call get).
Basically the connection from the client might close, before that caching call has finished - but I want it to have finished :) I'm not sure if the rules are the same as this is during the lifetime of a client connection.
Thanks in advance for any advice, let me know if anything is unclear... I tried to define it in a way understandable to those without the domain knowledge of what I'm trying to do (which I cannot disclose).
You could consider adding the objects to cache into a BlockingQueue and have a separate thread taking from the queue and storing into cache.
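A minimal sketch of that queue-plus-worker idea, with a hypothetical storeInCache standing in for the REST call to the cache API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CacheWriter {
    private final BlockingQueue<Object> toCache = new LinkedBlockingQueue<>();

    CacheWriter() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Object o = toCache.take();   // blocks until something is queued
                    storeInCache(o);             // slow REST call happens off the request thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);                  // don't keep the JVM alive just for the cache
        worker.start();
    }

    void submit(Object o) { toCache.offer(o); }  // returns immediately; request thread never waits

    void storeInCache(Object o) { System.out.println("cached " + o); }

    public static void main(String[] args) throws Exception {
        CacheWriter w = new CacheWriter();
        w.submit("response-1");
        Thread.sleep(200);                       // give the daemon thread time to drain the queue
    }
}
```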
As per the comments, the cache API is already asynchronous (it actually returns a Future). I suppose it creates and manages an internal ExecutorService or receives one at startup.
My point is that there's no need to take care of the objects to cache, but of the returned Futures. Asynchronous behavior is actually provided by the cache client.
One option would be to just ignore the Future returned by this client. The problem with this approach is that you lose the chance to take corrective action in case an error occurs when attempting to store the object in the cache. In fact, you would never know that something went wrong.
Another option would be to take care of the returned Future. One way is with a Queue, as suggested in another answer, though I'd use a ConcurrentLinkedQueue instead, since it's unbounded and you have mentioned that adding objects to the cache would happen as much as twice a second. You could offer() the Future to the queue as soon as the cache client returns it and then, in another thread, that would be running an infinite loop, you could poll() the queue for a Future and, if a non null value is returned, invoke isDone() on it. (If the queue returns null it means it's empty, so you might want to sleep for a few milliseconds).
If isDone() returns true, you can safely invoke get() on the future, surrounded by a try/catch block that catches any ExecutionException and handles it as you wish. (You could retry the operation on the cache, log what happened, etc).
If isDone() returns false, you could simply offer() the Future to the queue again.
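Putting those pieces together, one pass of that poll/isDone/get loop might look like the following sketch (ConcurrentLinkedQueue as suggested; the failed future in main simulates a cache write gone wrong):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class FutureReaper {
    private final ConcurrentLinkedQueue<Future<?>> pending = new ConcurrentLinkedQueue<>();

    // Call with the Future the cache client returns.
    void track(Future<?> f) { pending.offer(f); }

    // One pass of the reaper loop described above.
    void drainOnce() {
        Future<?> f;
        while ((f = pending.poll()) != null) {
            if (!f.isDone()) {
                pending.offer(f);                // not ready yet: re-queue and back off
                return;
            }
            try {
                f.get();                         // safe: isDone() was true, so this won't block
            } catch (ExecutionException e) {
                // Retry the cache operation, log it, etc.
                System.out.println("cache write failed: " + e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        FutureReaper reaper = new FutureReaper();
        reaper.track(CompletableFuture.failedFuture(new RuntimeException("timeout")));
        reaper.drainOnce();
    }
}
```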
Now, here we're talking about handling errors from asynchronous operations of a cache. I wouldn't do anything and let the future returned by the cache client go in peace. If something goes wrong, the worst thing that may happen is that you'd have to go to the datastore again to retrieve the object.
I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was a question about dealing with Java Futures in Scala here: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Using synchronous API and wrapping blocking code in future and mark blocking:
Future {
  blocking {
    cache.get(key) // synchronous blocking call
  }
}
Using asynchronous Java API and do polling every n ms on Java Future to check if the future completed (like described in one of the answers above in the linked question above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
I always go with the first option, but I am doing it in a slightly different way: I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I am providing a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// I create a separate ec for each blocking client/resource/api I use
Future {
  cache.get(key) // synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit; I like to mention it explicitly
So all the blocking calls will use a dedicated execution context (i.e. a thread pool), separate from your main execution context, which remains responsible for non-blocking work.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the blocking approach and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext that takes a naive approach to the question of how many threads you will need: it spawns a new thread whenever all the existing threads in the pool are busy. So it is actually an uncontrolled ExecutionContext: it could create lots of threads and lead to problems like an out-of-memory error. The solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case where this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
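The Java analogue of that dedicated ExecutionContext is simply a bounded pool reserved for the blocking client; the pool size and the cacheGet stand-in below are assumptions, not memcached API calls:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingCallPool {
    // Bounded pool reserved for blocking clients; keeps them off the main pool.
    private static final ExecutorService memcachedPool = Executors.newFixedThreadPool(100);

    // Stand-in for the synchronous, blocking memcached call.
    static String cacheGet(String key) throws InterruptedException {
        Thread.sleep(50);
        return "cached-" + key;
    }

    public static void main(String[] args) {
        CompletableFuture<String> value = CompletableFuture.supplyAsync(() -> {
            try {
                return cacheGet("user:1");
            } catch (InterruptedException e) {
                throw new CompletionException(e);
            }
        }, memcachedPool);                       // blocking work runs on the dedicated pool

        System.out.println(value.join());
        memcachedPool.shutdown();
    }
}
```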
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottlenecks or run out of memory or threads when the application runs under increased load.
The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware.
I have a long-running job that updates thousands of entity groups. I want to kick off a second job afterwards that will have to assume all of those items have been updated. Since there are so many entity groups, I can't do it in a transaction, so I've just scheduled the second job to run 15 minutes after the first completes, using task queues.
Is there a better way?
Is it even safe to assume that 15 minutes gives a promise that the datastore is in sync with my previous calls?
I am using high replication.
In the Google I/O videos about HRD, they give a list of ways to deal with eventual consistency. One of them was to "accept it". Some updates (like Twitter posts) don't need to be consistent with the next read. But they also said something like "hey, we're only talking milliseconds to a couple of seconds before they are consistent". Is that time frame documented anywhere else? Is it safe to assume that waiting 1 minute after a write before reading again will mean all my previous writes are there in the read?
The mention of that is at the 39:30 mark in this video http://www.youtube.com/watch?feature=player_embedded&v=xO015C3R6dw
I don't think there is any built in way to determine if the updates are done. I would recommend adding a lastUpdated field to your entities and updating it with your first job, then check for the timestamp on the entity you're updating with the 2nd before running... kind of a hack but it should work.
Interested to see if anybody has a better solution. Kinda hope they do ;-)
This is automatic as long as you are getting entities without changing the read policy to Eventual. The HRD writes data to a majority of the relevant datastore servers before returning. If you are calling the asynchronous version of put(), you'll need to call get() on all the Future objects before you can be sure the write has completed.
If however you are querying for the items in the first job, there's no way to be sure that the index has been updated.
So, for example, if you are updating a property on every entity (but not creating any entities) and then retrieving all entities of that kind, you can do a keys-only query followed by a batch get (which is approximately as fast and cheap as a normal query) and be sure that you have all updates applied.
On the other hand, if you're adding new entities or updating a property in the first process that the second process queries, there's no way to be sure.
I did find this statement:
With eventual consistency, more than 99.9% of your writes are available for queries within a few seconds.
at the bottom of this page:
http://code.google.com/appengine/docs/java/datastore/hr/overview.html
So, for my application, a 0.1% chance of it not being there on the next read is probably OK. However, I do plan to redesign my schema to make use of ancestor queries.
I'm currently working on a (rather large) pet project of mine, a Swing application that by its very nature needs to be multi-threaded. Almost any user interaction might fetch data from remote servers over the internet. Since I control neither those servers nor the internet itself, long response times are inevitable. A Swing UI obviously cannot repaint itself while the EDT is busy, so all remote server calls need to be executed by background thread(s).
My problem:
Data fetched by the background threads gets 'enriched' with data from a local (in-memory) database (remote server returns IDs/references to data in the local database). This data later eventually gets passed to the EDT where it becomes part of the view model. Some entities are not completely initialized at this point (lazy-fetching enabled) so the user might trigger lazy-fetching by e.g. scrolling in a JTable. Since the hibernate session is already closed this will trigger a LazyInitializationException. I can't know when lazy-fetching might be triggered by the user so creating a session on demand/attaching the detached object will not work here.
I 'solved' this problem by:
using a single (synchronized , since Session instances are not thread-safe) Session for the whole application
disabling lazy-fetching completely
While this works, the application's performance has suffered greatly (sometimes being close to unusable). The slowdown is mainly caused by the large number of objects that are now fetched by each query.
I'm currently thinking about changing the application's design to 'Session-per-thread' and migrating all entities fetched by non-EDT threads to the EDT thread's Session (similar to this posting on the Hibernate forums).
Side-note: Any problems related to database updates do not apply since all database entities are read-only (reference data).
Any other ideas on how to use Hibernate with lazy-loading in this scenario ?
Don't expose the Session itself in your data API. You can still load lazily; just make sure that the hydration is done from the 'data' thread each time. You could use a block (a Runnable, or some kind of command class, is probably the best Java can do for you here, unfortunately) that is wrapped by code performing the load asynchronously on the 'data' thread. In your UI code (on the UI thread, of course), field some kind of 'data is ready' event posted by the data service. You can then get the data from the event and use it in the UI.
You could have a look at Ebean ORM. It is session-less and lazy loading just works. This doesn't answer your question but rather proposes an alternative.
I know Ebean has built in support for asynchronous query execution which may also be interesting for your scenario.
Maybe worth a look.
Rob.
There are two distinct problems, that should get resolved seperately:
Handling of Hibernate Sessions in Swing Applications. Let me recommend my own article, regarding this problem: http://blog.schauderhaft.de/2008/09/28/hibernate-sessions-in-two-tier-rich-client-applications/
The basic idea is to have a session for every frame, excluding modal frames, which use the session of the spawning frame. It is not easy, but it works, meaning you won't get any LazyInitializationExceptions anymore.
How to get your GUI thread separated from the back end.
I recommend keeping the Hibernate objects strictly on the back-end thread they originate from. Only give wrapper objects to the EDT. If a wrapper object is asked for a value, it creates a request which gets passed to the back-end thread, which will eventually return the value.
I'd envision three kinds of wrapper Implementations:
Async: requests the value and gets notified when the value is available. It returns immediately with some dummy value. On notification it fires a PropertyChange event in order to inform the GUI about the 'changed' value (changed from unknown to a real value).
Sync: requests the value and waits for it to be available.
Timed: a mixture of the two, waiting for a short time (e.g. 0.01 seconds) before returning. This would avoid plenty of change events compared to the async version.
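A rough sketch of the async wrapper variant, using PropertyChangeSupport in place of the JGoodies ValueModel; the loaded value is a hypothetical stand-in for the Hibernate fetch, and a real GUI would re-dispatch the change event onto the EDT:

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncValue {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private volatile String value = "<loading>";   // dummy value until the backend answers

    void addListener(PropertyChangeListener l) { changes.addPropertyChangeListener(l); }

    // Fire the request on the backend thread; notify listeners once the value arrives.
    void load(ExecutorService backend) {
        backend.execute(() -> {
            String loaded = "real-value";          // stand-in for the Hibernate fetch
            String old = value;
            value = loaded;
            changes.firePropertyChange("value", old, loaded); // GUI re-dispatches to the EDT
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService backend = Executors.newSingleThreadExecutor();
        AsyncValue v = new AsyncValue();
        v.addListener(e -> System.out.println("value -> " + e.getNewValue()));
        v.load(backend);
        backend.shutdown();
        backend.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```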
As a basis for these wrappers a recommend the ValueModel of the JGoodies Binding library: http://www.jgoodies.com/downloads/libraries.html
Obviously you need to take care that any action is only performed on actually loaded values, but since you don't plan on doing updates this shouldn't be too much of an issue.
Let me end with a warning: I have thought about it a lot, but never actually tried it, so move with care.