Technology Versions:
Hibernate 3.6.5
Hibernate Search 3.4.0
Lucene (lucene-core-*.jar) 3.1.0
Spring 3.1.0.M2
Hibernate (Search) Configuration:
<prop key="hibernate.search.default.indexBase">/some/dir</prop>
<prop key="hibernate.search.default.directory_provider">org.hibernate.search.store.FSDirectoryProvider</prop>
<prop key="hibernate.search.default.locking_strategy">native</prop>
<prop key="hibernate.search.default.exclusive_index_use">false</prop>
The issue:
Our application is deployed in Amazon (AWS cloud), but we've faced this issue in our local clusters as well:
The application design is such that a thread is spawned from within the main (web) application, and from that thread we need to update an indexed entity. Basically it's a status monitor thread which reads a .status file and updates the database every 30 seconds or so. This keeps happening for about 10 minutes to half an hour on average.
The issue we see is that every few days we need to regenerate the indexes, because Hibernate Search stops returning anything for the entity in question (the one discussed above).
I went through a few forums, and it seems to be suggested that only a single thread should update the Lucene indexes. But it is also stated that index writing is thread-safe. So even if multiple threads are writing to the same index, I still expect that it should not cause this issue (of nothing being returned by the search). That is to say, I may get a stale status for the entity in question, but still, something should be returned.
We're using the default IndexReader/Writer implementation of Hibernate Search.
Any help would be highly appreciated. Thanks.
Here are some thoughts.
I went through a few forums, and it seems to be suggested that only a single thread should update the Lucene indexes.
That's not generally true. Lucene and Hibernate Search allow multiple index writers, BUT access to the index must be properly synchronized, which happens via Lucene's org.apache.lucene.store.LockFactory. The lock factory is configurable, and you are using the native one via the property *hibernate.search.default.locking_strategy*. The problem might be that this strategy is file based. I don't know much about the internal workings of Amazon's distributed file system, but I would imagine that file locks just don't work in this case. You might need to implement a custom locking strategy.
But it is also stated that index writing is thread-safe. So even if multiple threads are writing to the same index, I still expect that it should not cause this issue.
Correct, provided the locking works.
An alternative is to work without locks (setting *hibernate.search.default.locking_strategy* to none), provided you can guarantee that only your update thread will ever write to the index. Aside from the update thread, do you have automatic indexing enabled? If so, try turning it off (provided your use case allows it).
OK, for those who have landed here looking for a solution, here's what we finally did, which seems to have addressed the issue (we've load-tested this fix on AWS and have had no issues so far):
We wrote our own implementation of a lock factory based on Lucene's NativeFSLockFactory. We modified the obtain method to retry obtaining the lock a couple of times before giving up, with a short lag (sleep) between retries to handle NFS latencies.
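For reference, a rough sketch of the idea (not our exact code; the class name, retry count and sleep are illustrative, and you should verify the Lucene 3.x Lock/LockFactory signatures against the version you use, as well as how your Hibernate Search version lets you plug a custom factory into hibernate.search.default.locking_strategy):

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Lock;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.NativeFSLockFactory;

// Hypothetical sketch: delegate to NativeFSLockFactory, but retry obtain() a few
// times with a short sleep so transient NFS latency doesn't make us give up.
public class RetryingNativeFSLockFactory extends LockFactory {

    private final NativeFSLockFactory delegate;
    private final int maxRetries;
    private final long retrySleepMillis;

    public RetryingNativeFSLockFactory(File lockDir, int maxRetries, long retrySleepMillis)
            throws IOException {
        this.delegate = new NativeFSLockFactory(lockDir);
        this.maxRetries = maxRetries;
        this.retrySleepMillis = retrySleepMillis;
    }

    @Override
    public Lock makeLock(String lockName) {
        final Lock inner = delegate.makeLock(lockName);
        return new Lock() {
            @Override
            public boolean obtain() throws IOException {
                for (int attempt = 0; attempt <= maxRetries; attempt++) {
                    if (inner.obtain()) {
                        return true;                      // lock acquired
                    }
                    try {
                        Thread.sleep(retrySleepMillis);   // give NFS a moment before retrying
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                return false;                             // give up after maxRetries attempts
            }

            @Override
            public void release() throws IOException {
                inner.release();
            }

            @Override
            public boolean isLocked() throws IOException {
                return inner.isLocked();
            }
        };
    }

    @Override
    public void clearLock(String lockName) throws IOException {
        delegate.clearLock(lockName);
    }
}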
We tested this fix with Lucene's LockVerifyServer and observed that under high load, a few of the lock requests have to wait, but eventually every lock request is satisfied. In practice, this translates to a successful update of the index files.
Thank you Hardy for showing us the path. :)
Update: Yesterday we had to bump up the retry count to a higher number (~30), because we saw an index update slip through with the previous value of 3. Things have been fine since.
As I understand it, a LockAcquisitionException happens when a thread tries to update a row that is locked by another thread. (Please correct me if I am wrong.)
So I tried to simulate it as follows:
I locked a row using dbVisualizer, then used the application to run an update query on the same record. In the end, I just hit the global transaction timeout instead of a LockAcquisitionException with reason code 68.
Thus, I am thinking that my understanding is wrong and that a LockAcquisitionException does not happen this way. Can you kindly advise, or give a simple example that produces a LockAcquisitionException?
You will get LockAcquisitionException (SQLCODE=-911 SQLERRMC=68) as a result of a lock timeout.
It may be unhelpful to compare the actions of dbVisualizer with Hibernate, because they may use different classes/methods and settings at the JDBC level, which can influence the exception details. What matters is that at the Db2 level both experienced SQLCODE=-911 with SQLERRMC=68, regardless of the exception name they report for the lock timeout.
You can get a lock-timeout on statements like UPDATE or DELETE or INSERT or SELECT (and others including DDL and commands), depending on many factors.
All lock-timeouts have one thing in common: one transaction waited too long and got rolled back because another transaction did not commit quickly enough.
Lock-timeout diagnosis and Lock-Timeout avoidance are different topics.
The length of time to wait for a lock can be set at the database level, connection level, or statement level, according to the design chosen, including mixing these. You can also adjust how Db2 behaves for locking by adjusting database parameters like CUR_COMMIT and LOCKTIMEOUT, and by adjusting the isolation level at the statement or connection level.
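For example, the lock wait can be narrowed per connection (and thus per statement batch) via the CURRENT LOCK TIMEOUT special register. A hedged JDBC sketch; the connection details, schema and table are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class LockTimeoutDemo {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders; adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://dbhost:50000/MYDB", "user", "password")) {
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                // Override the database-wide LOCKTIMEOUT for this connection only:
                // wait at most 5 seconds for a lock before Db2 raises SQLCODE -911, reason 68.
                stmt.execute("SET CURRENT LOCK TIMEOUT 5");

                // Any subsequent statement on this connection now uses the 5-second limit.
                stmt.executeUpdate("UPDATE MYSCHEMA.MYTABLE SET STATUS = 'DONE' WHERE ID = 42");
            }
            conn.commit();
        }
    }
}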
It's wise to ensure accurate diagnosis before thinking about avoidance.
As you are running Db2-LUW v10.5.0.9, consider careful study of this page and all related links:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/t0055072.html
There are many situations that can lead to a lock timeout, so it's better to know exactly which situation is relevant for your case(s).
Avoiding lock-conflicts is a matter of both configuration and transaction design so that is a bigger topic. The configuration can be at Db2 level or at application layer or both.
Sometimes bugs cause lock-timeouts, for example when app-server threads have a database-connection that is hung and has not committed and is not being cleaned up correctly by the application.
You should diagnose the participants in the lock timeout. There are different ways to do lock-conflict diagnosis on Db2-LUW so choose the one that works for you.
One simple diagnosis tool that still works on v10.5.0.9 is the Db2 registry variable DB2_CAPTURE_LOCKTIMEOUT=ON, even though the method is deprecated. You can set this variable (and unset it) on the fly without needing a service outage. So if you have a recreatable scenario that results in SQLCODE=-911 SQLERRMC=68 (lock timeout), you can switch on this variable, repeat the test, then switch off the variable. If the variable is switched on and a lock timeout happens, Db2 will write a new text file containing information about the participants in the locking situation, showing details that help you understand what is happening and letting you consider ways to resolve the issue once you have enough facts. You don't want to keep this variable permanently set, because it can impact performance and fill up the Db2 diagnostics file system if you get a lot of lock timeouts, so you have to be careful. Read about this variable in the Knowledge Center at this page:
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005657.html
You diagnose the lock-timeout by careful study of the contents of these files, although of course it's necessary to understand the details also. This is a regular DBA activity.
Another method is to use db2pdcfg -catch with a custom db2cos script, to decide what to do after Db2 throws the -911. This needs scripting skills and it lets you decide exactly what diagnostics to collect after the -911 and where to store those diagnostics.
Another method which involves much more work but potentially pays more dividends is to use an event monitor for locking. The documentation is at:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0054074.html
Be sure to study the "Related concepts" and "Related tasks" pages also.
I am fairly new to mongo, so what I'm trying to achieve here might not be possible. My research so far is inconclusive...
My scenario is the following: I have an application which may have multiple instances running. These instances are processing some data, and when that processing fails, they write the ID of the failed item in a mongo collection ("error").
From time to time I want to retry processing those items. So, at fixed intervals, the application reads all the IDs from the collection, after which it deletes all the records. Now, this is an obvious race condition. Two instances may read the very same data, which would double the work to be done. Some IDs may also be missed like this.
My question would be the following: is there any way I can read and delete those records in a distributed, atomic way? I was thinking about locking the collection, but I have found no support for this so far in the Java driver's documentation. I also looked for a findAndDrop()-like method, but no luck so far.
I am aware of techniques like leader election, which most probably would solve this problem, but I wanted to see if it can be done in an easier way.
You can use a BlockingQueue with a multiple-producer, single-consumer approach: you have multiple producers producing IDs, and a single consumer that deletes them.
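This is an in-JVM construct, so it only helps if all producers and the consumer run in the same process. A minimal sketch of the idea:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FailedIdQueueDemo {

    // Shared queue: many producer threads offer failed IDs, one consumer drains them.
    private static final BlockingQueue<String> failedIds = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        // Producers: in the real app these would be the processing threads that hit an error.
        for (int i = 0; i < 3; i++) {
            final int producerNo = i;
            new Thread(() -> failedIds.offer("failed-item-" + producerNo)).start();
        }

        // Single consumer: takes each ID exactly once, so no two workers retry the same item.
        new Thread(() -> {
            try {
                while (true) {
                    String id = failedIds.take();   // blocks until an ID is available
                    System.out.println("Retrying " + id);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}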
In the end, I found no way to implement this with MongoDB.
However, since this is a Heroku app, I stored the IDs in Redis instead. A library I found implements a distributed Redis lock for Jedis, so this workaround solved my problem.
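I don't know which library the poster used, so here is only a rough sketch of the underlying idea with plain Jedis (key name and TTL are invented; a real implementation also has to close the gap between SETNX and EXPIRE, which dedicated lock libraries handle for you):

import redis.clients.jedis.Jedis;

public class RetryLockSketch {

    private static final String LOCK_KEY = "error-retry-lock";   // illustrative key name
    private static final int LOCK_TTL_SECONDS = 60;               // auto-expire so a crashed instance can't hold the lock forever

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Only one instance wins the SETNX; the others skip this retry round.
            if (jedis.setnx(LOCK_KEY, "owner-" + System.currentTimeMillis()) == 1L) {
                jedis.expire(LOCK_KEY, LOCK_TTL_SECONDS);
                try {
                    // ... read the failed IDs, delete them, and re-enqueue the work here ...
                } finally {
                    jedis.del(LOCK_KEY);   // release the lock when done
                }
            }
        }
    }
}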
I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
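For what it's worth, switching on JDO's optimistic version checking is usually just a matter of a version field on the entity. A hypothetical sketch with the standard javax.jdo annotations (field names are invented, and you should check how the App Engine JDO plugin maps this):

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import javax.jdo.annotations.Version;
import javax.jdo.annotations.VersionStrategy;

// Hypothetical entity showing standard JDO optimistic locking via a version number.
// With this in place, committing a transaction against a stale copy fails instead of
// silently overwriting newer data.
@PersistenceCapable
@Version(strategy = VersionStrategy.VERSION_NUMBER)
public class CalculatedItem {

    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long id;

    @Persistent
    private double value1;

    @Persistent
    private double calculated;

    // getters/setters omitted for brevity
}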
Google's JDO plugin obviously uses the low-level API's setProperty() method under the covers. The log even tells you what low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e., look up the keys of the entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get throughput of at least one completed transaction per second (maybe a bit less if you have XG transactions).
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
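If you stay on JDO or the low-level API rather than Objectify, the retry loop might look roughly like the sketch below. This targets the low-level datastore API; the property name, method names and retry count are illustrative, not taken from the question:

import java.util.ConcurrentModificationException;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class CalculationUpdater {

    private static final int MAX_RETRIES = 5;   // arbitrary retry budget

    /** Writes the calculated value inside a transaction, retrying on write conflicts. */
    public static void storeCalculatedValue(Key entityKey, double calculated)
            throws EntityNotFoundException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            Transaction txn = datastore.beginTransaction();
            try {
                // Re-read inside the transaction so we work on the latest committed state.
                Entity entity = datastore.get(txn, entityKey);

                // Only touch the calculated property; the user-edited values stay as they are.
                entity.setProperty("calculated", calculated);

                datastore.put(txn, entity);
                txn.commit();
                return;   // success
            } catch (ConcurrentModificationException e) {
                // Someone else committed first (e.g. the user edited the entity): retry.
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
        throw new ConcurrentModificationException("Gave up after " + MAX_RETRIES + " attempts");
    }
}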
The way I see it, you can either prevent the first task from updating the object when certain values have changed since the task was first launched,
or you can embed the object's values within the task request, so that the second calc task will restore the object state with consistent value and calculated members.
I'm using ehcache with hibernate and I'd like to use the blocking or SelfPopulating cache to avoid the issues presented in http://ehcache.org/documentation/constructs-0_5.html#mozTocId722946
An expensive operation is required, say rendering a large web page, which takes 30 seconds. The page is not considered stale until it is 5 minutes old. The page is hit very heavily and will be hit an average of 20 times per minute each 5 minutes.
Do I have to do this programmatically as http://ehcache.org/documentation/cache_decorators.html suggests or is there a declarative (in xml) way to do so?
Thanks a lot.
There is no way to do this in ehcache.xml since you must register the class with the CacheManager before the cache config is read.
So you must use the code mentioned in the docs, and you must run this code before you do anything with Hibernate. A simple way to do this is to use the hibernate.cache.provider_class property, which tells Hibernate which factory to use for the cache. Have a look at the source of an existing implementation, which should give you an idea of what you need to do.
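To give a flavour of the programmatic part from the decorators doc, here is a rough sketch (the cache name and the expensive lookup are placeholders; this is not a drop-in solution, and it still has to run before Hibernate touches the cache):

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

public class SelfPopulatingSetup {

    public static void decorate(CacheManager cacheManager) {
        // "pageCache" is a placeholder name that must already be defined in ehcache.xml.
        Ehcache plainCache = cacheManager.getEhcache("pageCache");

        // The factory is called on a cache miss; other threads asking for the same key block
        // until it returns, so the expensive work is done only once.
        CacheEntryFactory factory = new CacheEntryFactory() {
            public Object createEntry(Object key) throws Exception {
                return renderExpensivePage((String) key);   // placeholder for the 30-second job
            }
        };

        SelfPopulatingCache selfPopulating = new SelfPopulatingCache(plainCache, factory);

        // Re-register the decorated cache under the same name so callers get the decorated one.
        cacheManager.replaceCacheWithDecoratedCache(plainCache, selfPopulating);
    }

    private static String renderExpensivePage(String key) {
        return "rendered content for " + key;   // stand-in for the real rendering
    }
}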
Perhaps this question is not very clear, but I didn't find better words for the heading, which describes the problem I'd like to deal with, in short.
I want to restrict access from a java desktop application to postgres.
The background:
Suppose you have two apps running, and the first application has to do some complex calculations on the basis of data in the DB. To pin down the immutability of the data in the DB, I'd like to lock the DB against insert, update and delete operations. On the client side I think it's impossible to handle this behaviour satisfactorily. So I thought about using a little server-side Java app which works like a proxy. Its task is to pass through CRUD (Create, Read, Update, Delete) operations until it gets a command to lock. After a lock, it rejects all CUD operations until it gets an unlock command from the locking client or a timeout is reached.
Questions:
What do you think about this approach?
Is it possible to lock a Database while using such an approach?
Would you prefer Java SE or Java EE as server-side java app?
Thanks in advance.
Why not use transactions in your operations? The database has features to maintain data integrity itself; there's no need to resort to a brute-force operation such as a total database lock.
This locking mechanism you describe sounds like it would be a pain for the users. Are the users initiating the lock, or is the software itself? If it's the users, you can expect some problems when Bob hits lock and then goes to lunch for 2 hours, forgetting to unlock the database first...
Indeed... there are a few proper ways to deal with this problem.
Just lock the tables in your code. PostgreSQL has commands for locking entire tables that you could run from your client application.
Pick a transaction isolation level that doesn't have the problem of reading data that was committed after your txn started (BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ).
Of these, by far the most efficient is to use repeatable read as your isolation level. Postgres supports this quite efficiently, and it will give you a consistent view of the data without such heavy locking of the db.
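For illustration, here is a hedged JDBC sketch of both options (table name and connection details are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ConsistentReadDemo {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            conn.setAutoCommit(false);

            // Option 2: repeatable read. Every query in this transaction sees the same snapshot,
            // taken when the transaction starts, regardless of concurrent commits.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

            try (Statement stmt = conn.createStatement()) {
                // Option 1 (heavier): explicitly lock the table against writers for the
                // duration of the transaction. Usually unnecessary if repeatable read is enough.
                // stmt.execute("LOCK TABLE area_of_responsibility IN SHARE MODE");

                try (ResultSet rs = stmt.executeQuery(
                        "SELECT id, value FROM area_of_responsibility")) {
                    while (rs.next()) {
                        // ... feed the rows into the long-running calculation ...
                    }
                }
            }
            conn.commit();   // releases the snapshot (and any table lock)
        }
    }
}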
Yeah, I thought about transactions, but in this case I can't use them. I'm sorry I didn't mention it exactly. So assume the following simple case:
A calculation closes one area of responsibility. After the calculation, a new one is opened and new inserts are dedicated to it. But while the calculation is in progress, inserts, updates or deletes are not allowed on the data of the (currently calculated) area of responsibility. Moreover, a delete is strictly prohibited in any case, because the data has to be archived.
So IMO the use of transactions doesn't fit this requirement. Or did I miss something?
PS (off topic) @jsight: I recently read that internally Postgres maps "repeatable read" to "serializable", so using "repeatable read" gets you more restriction than you would perhaps expect.