I am trying to implement a servlet for GPS monitoring and want to create a simple cache, because I think it will be faster than making an SQL request for every HTTP request. The simple scheme:
In the init() method, I read one point for each vehicle into a HashMap (vehicle id = key, location in JSON = value). After that, some requests try to read these points and some requests try to update them (one vehicle updates one item). Of course I want to minimize synchronization, so I read the javadoc:
http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
If I am right, no synchronization is needed in my task, because I only perform "not a structural modification == changing the value associated with a key that an instance already contains". Is that a correct statement?
Use ConcurrentHashMap; it doesn't synchronize on a single lock for the whole map, but relies on fine-grained locking and atomic operations.
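A minimal sketch of such a cache, assuming the vehicle id is a Long and the location is kept as a JSON string (all names here are illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocationCache {
    // vehicle id -> latest location as a JSON string
    private final Map<Long, String> locations = new ConcurrentHashMap<Long, String>();

    // called once from init(): pre-load one point per vehicle
    public void preload(Map<Long, String> initial) {
        locations.putAll(initial);
    }

    // called by read requests
    public String read(long vehicleId) {
        return locations.get(vehicleId);
    }

    // called by update requests; atomically replaces the stored value
    public void update(long vehicleId, String locationJson) {
        locations.put(vehicleId, locationJson);
    }
}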
Wrong. Adding an item to the hash map is a structural modification (and to implement a cache you must add items at some point).
Use java.util.concurrent.ConcurrentHashMap.
If all the entries are read into the HashMap in init() and afterwards are only read or have their values modified, then yes, in theory the other threads do not need to sync. However, visibility problems might still arise because threads can cache values locally, so a ConcurrentHashMap would be better.
Perhaps rather than implementing the cache yourself, use the simple cache implementation found in the Guava library.
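A rough sketch of what that could look like with Guava's CacheBuilder (the size and expiry values are placeholders you would tune):

import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaLocationCache {
    // bounded cache with write expiry; the numbers are placeholders
    private final Cache<Long, String> locations = CacheBuilder.newBuilder()
            .maximumSize(10000)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build();

    public void update(long vehicleId, String locationJson) {
        locations.put(vehicleId, locationJson);
    }

    public String read(long vehicleId) {
        return locations.getIfPresent(vehicleId); // null if not cached
    }
}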
Caching is not an easy problem - but it is a known one. Before starting, I would carefully measure whether you really do have a performance problem, and whether caching actually solves it. You may think it should, and you may be right. You may also be horrendously wrong depending on the situation ("Premature optimization is the root of all evil"), so measure.
That said, do not implement a cache yourself; use a library that does it for you. I have personally had good experience with Ehcache.
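If you go with Ehcache, a rough 2.x-style sketch could look like this (the cache name and values are placeholders; real setups usually configure ehcache.xml instead of relying on defaults):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheSketch {
    public static void main(String[] args) {
        // uses the default (failsafe) configuration; production setups configure ehcache.xml
        CacheManager manager = CacheManager.create();
        manager.addCache("locations");
        Cache cache = manager.getCache("locations");

        cache.put(new Element(42L, "{\"lat\":1.0,\"lon\":2.0}"));
        Element hit = cache.get(42L);
        if (hit != null) {
            System.out.println(hit.getObjectValue());
        }
        manager.shutdown();
    }
}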
If I understand correctly, you have two types of request:
Read from cache
Write to cache (to update the value)
In this case, you may potentially try to write to the same map twice at the same time, which is what the docs are referring to.
If all requests go through the same piece of code (e.g. an update method which can only be called from one thread) you will not need synchronisation.
If your system is multi-threaded and you have more than one thread or piece of code that writes to the map, you will need to externally synchronise your map or use a ConcurrentHashMap.
For clarity, the reason you need synchronisation is that if you have two threads both trying to update the JSON value for the same key, who wins? This is either left up to chance or causes exceptions or, worse, buggy behaviour.
Any time you modify the same element from two threads, you need to synchronise on that code or, better still, use a thread-safe version of the data structure if that is applicable.
Related
I'm trying to remove an entry from the Caffeine cache manually. I have two attempts but I suspect that there are some problems with both of them:
This one seems like it could suffer from a race condition.
cache.get(key);
cache.invalidate(key);
This one seems to be bypassing the methods of the cache itself, so I'm not sure if there are strange side effects that result.
cache.asMap().remove(key);
Is there a standard way to do this or a method of the cache that I'm missing here?
You should use cache.asMap().remove(key) as you suspected. The other call delegates to this, but does not return the value because that is not idiomatic for a cache.
The Cache interface is opinionated for how one should commonly use a cache, while the asMap() view is more raw to allow for advanced operations. For example, you generally wouldn't iterate over a cache (e.g. memcached doesn't allow this), but if you need to then the Map provides that support. All calls flow into the same backing structure, so there will be no inconsistency. The APIs merely try to nudge users towards best practices, but strive to not block a developer from getting their work done safely and correctly.
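A small sketch of the suggested call (key and value types are arbitrary):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class RemoveAndReturn {
    public static void main(String[] args) {
        Cache<String, String> cache = Caffeine.newBuilder().maximumSize(100).build();
        cache.put("key", "value");

        // removes the entry and returns the previous value (or null if absent)
        String removed = cache.asMap().remove("key");
        System.out.println(removed); // prints "value"
    }
}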
In my Java web application, there is an action which updates the order object and saves it in the DB through an AJAX call (POST request).
The method saveOrder() performs this action.
If multiple users perform the same action, there should be a lock on this method, so that the write transaction is performed with the latest data.
The class file code is as follows
public class OrderLoader extends JSONProcessSimple {

    @Override
    public JSONObject exec(JSONObject jsonsent) throws JSONException, ServletException {
        // 'array' is built from jsonsent in the original code (omitted here)
        JSONObject result = this.saveOrder(array);
        return result;
    }

    public JSONObject saveOrder(JSONArray jsonarray) throws JSONException {
        JSONObject jsonResponse = new JSONObject();
        // Write operation on DB
        return jsonResponse;
    }
}
Is this possible through a synchronized approach? Please suggest a solution.
Thanks in advance!
Depending on the architecture of your application (does it run in multiple parallel instances in a clustered environment?), there is no simple solution; if it is executed in only one VM, synchronized could be an approach. Also, have a look at the java.util.concurrent.locks package.
For a more sophisticated, distributed approach, you could implement a DB-based lock.
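If you stay single-VM, a sketch of the locking idea could look like this (class and method names are placeholders, not your actual code):

import java.util.concurrent.locks.ReentrantLock;

public class OrderSaver {
    // single-VM only: every request that saves an order goes through this lock
    private static final ReentrantLock SAVE_LOCK = new ReentrantLock();

    public void saveOrder(Object orderData) {
        SAVE_LOCK.lock();
        try {
            // re-read the latest order state from the DB here,
            // then perform the write inside one transaction
        } finally {
            SAVE_LOCK.unlock();
        }
    }
}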
A better solution would be to check your database isolation and your SQL. Perhaps you need a SERIALIZABLE connection or a transaction manager. That's server-side.
It's easy to add the synchronized keyword, but I think the problem goes beyond that.
http://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html
http://www.precisejava.com/javaperf/j2ee/JDBC.htm
Synchronization alone would not do the job, since you'd block until the first request is saved and then you'd still have to check whether the other orders are newer or not.
As the others have already stated, there's no easy solution and it depends on your architecture and environment.
You might want to try an optimistic locking approach, i.e. each update checks a version column and increments it if the version matches. Something like ... SET version = x + 1 WHERE version = x, and then you check whether any rows have been updated or not.
That would not guarantee that the latest order is the one saved, but it would prevent lost updates. You could, however, adapt that approach to only update the database whenever you have newer data (maybe based on some date and then use WHERE date > x).
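A sketch of that idea with plain JDBC (the table and column names are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticOrderUpdate {
    // returns true if the update won, false if another write changed the row first
    public boolean updateOrder(Connection con, long orderId, String payload, int expectedVersion)
            throws SQLException {
        String sql = "UPDATE orders SET payload = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        PreparedStatement ps = con.prepareStatement(sql);
        try {
            ps.setString(1, payload);
            ps.setLong(2, orderId);
            ps.setInt(3, expectedVersion);
            return ps.executeUpdate() == 1; // 0 rows -> lost the race, re-read and retry
        } finally {
            ps.close();
        }
    }
}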
EDIT:
Since you're using Hibernate, I'd look into Hibernate's optimistic locking. That would at least handle concurrent edits since only the first one would succeed. If non-concurrent edits result in OptimisticLockExceptions you are probably missing a (re)read somewhere.
By a concurrent edit I mean: user A reads the object, changes it and then triggers the write. In between, user B has also read the object and triggers a write later. The writes are not concurrent, but user B didn't see the changes of user A, which might result in lost updates.
In your case it would depend on what operations are done on an order. In some cases you might safely reread the order just before persisting the changes (e.g. when adding positions, it might even be ok to do so when deleting them - if the position doesn't exist you just do nothing) while in other cases you might want to report the concurrent edit (e.g. when two users edit the quantity of the same position, the order header etc.)
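For reference, Hibernate's optimistic locking usually just means adding a version property to the entity, roughly like this (the entity and field names are illustrative, not your mapping):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class PurchaseOrder {
    @Id
    private Long id;

    // Hibernate checks and increments this on every update; a concurrent edit
    // makes the losing flush fail with an optimistic lock exception
    @Version
    private Integer version;

    private String status;

    // getters and setters omitted
}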
If your method updates orders for different users, then you don't need any synchronization, as every thread hitting the method will have its own copy, so it's not a concern.
But if all users are acting on the same data, you can perform the DB write between beginTransaction and endTransaction calls. This should make concurrent writes thread-safe.
I need to update a whole collection in a background thread, but read operations might take place at the same time. It takes about 3 seconds to update the collection when I benchmark it. Is there any way to lock a collection while updating it? I tried creating a new collection, inserting all the documents into it, and renaming it to the original collection with "dropTarget=true", but I am not sure how safe and stable that is in terms of sharding. I read that renameCollection is incompatible with sharding.
It would be great if someone could suggest a good approach.
Thanks.
You presented two possible strategies to update your collection: one updating it in place with a lock on it, and the other one using a temporary collection.
As the MongoDB documentation clearly states, renameCollection will not work for sharded collections (http://docs.mongodb.org/manual/reference/command/renameCollection/). From my understanding this means the collection you want to rename isn't sharded; since you need to delete the other collection before you do the actual renaming, you'll most likely lose any previously kept sharding information, so you would need to reactivate sharding. I highly discourage using the two-collection approach, especially if you're sharding your data.
You would need to get all the data from your sharded collection and store it centrally; once you're done updating, you would need to rename the collection and shard it again. This will cause a lot of I/O for your whole system, especially for the client doing the update.
Depending on your system architecture (with a single point of entry), you could easily hold a global flag telling you whether the collection update is currently running, and forbid other write operations while it is set.
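A minimal sketch of such a flag, assuming a single JVM in front of MongoDB (the names are illustrative):

import java.util.concurrent.atomic.AtomicBoolean;

public class CollectionUpdateGuard {
    // set while the background rebuild is running
    private static final AtomicBoolean UPDATING = new AtomicBoolean(false);

    // other writers check this before touching the collection
    public static boolean writesAllowed() {
        return !UPDATING.get();
    }

    public static void runBackgroundUpdate(Runnable update) {
        if (!UPDATING.compareAndSet(false, true)) {
            return; // an update is already in progress
        }
        try {
            update.run();
        } finally {
            UPDATING.set(false);
        }
    }
}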
For multiple entry points into your MongoDB you might try $isolated, but that doesn't work with sharded collections, and I'm not sure whether it allows read operations; the documentation isn't very clear.
Is it strictly forbidden to write any data while the update is in progress? What type of updates do you perform? Can they influence each other, or would concurrent writes be possible?
I have a method that reads values from a file (the reading method) and calls another method from a Data Access Object (DAO) class which stores these values in the DB (the storing method). My question is: should I pass all the values read in the reading method as a list to the storing method (which saves hundreds of method calls, but introduces the need to create the list in the reading method and iterate over it in the storing method), or should I make a separate call to the storing method for each value (which means hundreds of method calls, but no list creation and iteration)? Which approach is more efficient from a performance and good-practice point of view?
You have multiple values, so someone has to iterate over the values. The only question is: who?
Generally speaking: as soon as you talk about DB access (reading and writing), the overhead of allocating a list, iterating over it and doing some method calls is negligible compared to the overhead of one DB call (we're talking about hundreds of calls, not billions, right?). A DB call is in most cases something like a remote procedure call, and writing to a database will write to disk - another great performance sucker.
So in most cases you gain performance if you minimize the calls to the DB. And that can be done more easily if your DB access layer knows the complete job. If you give the DAO a list, it can use things like prepared statements and batch updates. Given only bits and pieces, those things are impossible to do with good performance.
You should read all of them at once, as a List, and use something like a JDBC batch update to store them all at once or in batches.
Doing them all one by one, however, is definitely a no-no. It will cause too many network/IO operations.
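A rough JDBC batch sketch of the storing method (the table and column names are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ValueDao {
    // stores all values with one statement and one (or a few) round trips
    public void saveAll(Connection con, List<String> values) throws SQLException {
        String sql = "INSERT INTO readings (value) VALUES (?)";
        con.setAutoCommit(false);
        PreparedStatement ps = con.prepareStatement(sql);
        try {
            for (String v : values) {
                ps.setString(1, v);
                ps.addBatch();
            }
            ps.executeBatch();
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            ps.close();
        }
    }
}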
The only way to be sure is to time the various options. However, since you are hitting a database in this process, that is likely to be dominant. The variant that reduces the database access is liable to be the fastest. If both variants have the same impact on the database then I'd expect them to take similar amounts of time. Calling methods and creating objects are orders of magnitude faster than communicating with your database.
Accessing the hard disk is the biggest overhead here. Do the reading from the hard disk once, if you can. If not, read into a buffer and refill it after you're done with it. This operation is suitable for threading; give it a try.
Database resources are limited, so database hits should be minimized for better performance.
If you compare the cost of a database hit with the cost of operating on a list, the latter is negligible.
So create a List and try to do a batch operation instead of one operation per record.
Less database interaction is good practice and also the better-performing option.
To answer your question with respect to good practice, you have to consider the usage of the DAO method across your application. If you see it being called multiple times for multiple values from a single point of origin, then I think you should pass a List to the method. But if most callers pass only a single value, then you should define it at that granularity.
What others said about DB access still holds. Try to batch multiple DB accesses if that is possible in your case.
I have a Hibernate entity in my code. I fetch it and, based on the value of one of its properties, say "isProcessed", go on and:
change the value of "isProcessed" to "Yes" (the property that I checked)
add some task to a DelayedExecutor.
In my performance test, I have found that if I hammer the function, a classic dirty-read scenario happens and I add too many tasks to the Executor, all of which will be executed. I can't deduplicate the objects in the queue based on equality or anything else; Java will simply execute everything that gets added.
How can I use Hibernate's dirty-object mechanism to check "isProcessed" before adding the task to the executor? Would that work?
I hope I have been clear enough.
If you can do all of your queries to dispatch your tasks using the same Session, you can probably patch something together. The caveat is that you have to understand how Hibernate's caching mechanisms (yes, that's plural) work. The first-level cache that is associated with the Session is going to be the key here. Also, it's important to know that executing a query and hydrating objects will not look into and return objects from the first-level cache... the right hand is not talking to the left hand.
So, to accomplish what you're trying to do (assuming you can keep using the same Session...if you can't do this, then I think you're out of luck) you can do the following:
execute your query
for each returned object, re-load it with Session's get method
check the isProcessed flag and dispatch if need be
By calling get, you'll be sure to get the object from the first-level cache...where all the dirty objects pending flush are held.
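Put together, a sketch of those steps could look like this (the Task entity, its flag and the HQL are placeholders for your own classes and query):

import java.util.List;
import org.hibernate.Session;

public class TaskDispatcher {

    // minimal stand-in for the entity from the question; all names are assumptions
    public static class Task {
        private Long id;
        private boolean processed;
        public Long getId() { return id; }
        public boolean isProcessed() { return processed; }
        public void setProcessed(boolean processed) { this.processed = processed; }
    }

    public void dispatchPending(Session session) {
        List<?> candidates = session.createQuery("from Task t where t.processed = false").list();
        for (Object row : candidates) {
            Task fromQuery = (Task) row;
            // get() goes through the Session's first-level cache, so it returns the
            // same (possibly dirty, not yet flushed) instance the Session already holds
            Task current = (Task) session.get(Task.class, fromQuery.getId());
            if (!current.isProcessed()) {
                current.setProcessed(true); // mark before scheduling to avoid duplicate work
                // hand the work off to your DelayedExecutor here
            }
        }
    }
}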
For background, this is an extremely well-written and helpful document about hibernate caching.