We're seeing OptimisticLockingExceptions in a Camunda process with the following Scenario:
The process consists of one UserTask followed by one Gateway and one ServiceTask. The UserTask executes
runtimeService.setVariable(execId, "object", out);`.
taskService.complete(taskId);
The following ServiceTask uses "object" as input variable (does not modify it) and, upon completion throws said OptimisticLockingException. My problem seems to originate from the fact, that taskService.complete() immediately executes the ServiceTask, prior to flushing the variables set in the UserTask.
I've had another, related issue, which occured, when in one UserTask I executed runtimeService.setVariable(Map<Strong, Boolean>) and tried to access the members of the Map as transition-guards in a gateway following that UserTask.
I've found the following article: http://forums.activiti.org/content/urgenterror-updated-another-transaction-concurrently which seems somehow related to my issue. However, I'm not clear on the question whether this is (un)wanted behaviour and how I can access a DelegateExecution-Object from a UserTask.
After long and cumbersome search we think, we have nailed two issues with camunda which (added together) lead to the Exception from the original question.
Camunda uses equals on serialized objects (represented by byte-arrays) to determine, whether process variables have to be written back to the database. This even happens when variables are only read and not set. As equals is defined by pointer-identity on arrays, a serializabled-Object is never determined "equal" if it has been serialized more than once. We have found, that a single runtimeService.setVariable() leads to four db-updates at the time of completeTask() (One for setVariable itself, the other three for various camunda-internal validation actions). We think this is a bug and will file a bug report to camunda.
Obviously there are two ways to set variables. One way is to use runtimeService.setVariable(), the other is to use delegateTask/delegateExecution.setVariable(). There is some flaw when using both ways at the same time. While we cannot simplify our setup to a simple unit-test, we have identified several components which have to be involved for the Exception to occur:
2.1 We are using a TaskListener to set up some context-variables at the start of Tasks this task-listener used runtimeService.setVariable() instead of delegateTask.setVariable(). After we changed that, the Exception vanished.
2.2 We used (and still use) runtimeService.setVariable() during Task-Execution. After we switched to completeTask(Variables) and omitted the runtimeService.setVariable() calls, the Exception vanished as well. However, this isn't a permanent solution as we have to store process variables during task execution.
2.3 The exception occured only in combination when process variables where read or written by the delegate<X>.getVariable() way (either by our code or implicitly in the camunda implementation of juel-parsing with gateways and serviceTasks or completeTask(HashMap))
Thanks a lot for all your input.
You could consider using an asynchronous continuation on the service task. This will make sure that the service task is executed inside a new transaction / command context.
Consider reading the camunda documentation on transactions and asynchronous continuations.
The DelegateExecution object is meant for providing service task (JavaDelegate) implementations access to process instance variables. It is not meant to be used from a User Task.
Related
This may be a stupid question. However, I would like to know if I have something like this - rdd.mapPartitions(func). Should the logic in func be threadsafe?
Thanks
The short answer is no, it does not have to be thread safe.
The reason for this is that spark divides the data between partitions. It then creates a task for each partition and the function you write would run within that specific partition as a single threaded operation (i.e. no other thread would access the same data).
That said, you have to make sure you do not create thread "unsafety" manually by accessing resources which are not the RDD data. For example, if you create a static object and access that, it might cause issues as multiple tasks might run in the same executor (JVM) and access it as well. That said, you shouldn't be doing something like that to begin with unless you know exactly what you are doing...
Any function passed to the mapPartitions (or any other action or transformation) has to be thread safe. Spark on JVM (this is not necessarily true for guest languages) uses executor threads and doesn't guarantee any isolation between individual tasks.
This is particularly important when you use resources which are not initialized in the function, but passed with the closure like for example objects initialized in the main function, but referenced in the function.
It goes without saying you should not modify any of the arguments unless it is explicitly allowed.
When you do "rdd.mapPartitions(func)", the func may actually execute in a different jvm!!! Thread does not have significance across JVM.
If you are running in local mode, and using global state or thread unsafe functions, the job might work as expected but the behaviours is not defined or supported.
I work with the camunda BPM process engine and think it is important to understand some concepts. At the moment I struggle a little bit with the concept of Process Executions and Variable Scopes.
To understand what happens during a process execution I designed the following demo process and marked the activities inside the same execution with the same color. I could do this because I debugged the execution id inside each activity.
I understand most of it. What surprised me is that an input parameter opens a new execution (Task 1.3). Thanks meyerdan for clarification on this.
What I do not understand is that "Task 2.2" is inside the same execution of "Task 2.1". A quote from the camunda documentation about Executions is
Internally, the process engine creates two concurrent executions
inside the process instance, one for each concurrent path of
execution.
So I would have exepcted that Task 2.1 / Task 2.2 and Task 3.1 each live inside an own execution.
Is anyone able to explain this?
My main motivation to understand this is the impact it has on process variable scopes. I did not figure out so far what the Java API methods
VariableScope#getVariable / VariableScope#setVariable
VariableScope#getVariableLocal / VariableScope#setVariableLocal
really do. I first thought that the "Local" variants only refer to the current execution and the other ones only refer to the process instance execution - but that seems to be only half of the truth. These are getters and setters where I miss JavaDoc painfully ;-) Bonus points for also explaining this!
Thanks!
You will find the process in a Maven project with an executable JUnit test on GitHub.
Have a look at Variable Scopes and Variable Visibility
A quote from the documentation (Java Object API) about the setVariable method:
Note that this code sets a variable at the highest possible point in
the hierarchy of variable scopes. This means, if the variable is
already present (whether in this execution or any of its parent
scopes), it is updated. If the variable is not yet present, it is
created in the highest scope, i.e. the process instance. If a variable
is supposed to be set exactly on the provided execution, the local
methods can be used.
I am confused about the applicability of multi threading in general...
I am creating an application which executes some code which has been saved in xml format. The work is to use apache http client and retrieve some data from websites...More than 1 website can be visited by one block of code in xml...
Now I want that if 2 users have created their own respective codes and saved them in XML, then each user's 'job' (ie block of code in xml format) runs in a separate thread.
I have with me code to execute one user's code...Now I want that multiple persons' code can be run in parallel. But I have some doubts--
(1) The Apache HTTP client provides a way of multithreaded communication, currently I am simply using the default HTTP client- this same client can be made to visit multiple websites, one after the other- as per code block in xml. Am I correct in thinking that I do not need to change my code so that it uses the recommended multithreaded communication?
(2) I am thinking of creating a servlet that when invoked, executes one block of xml code. So to execute 2 blocks of code as given by 2 different users, I will have to invoke this servlet twice. I am going to deploy this application using Amazon Elastic Beanstalk, so what I am confused about is, do I need to use multi threading at all in my program? Can I not simply invoke the existing code (which is used to execute one block of code at a time) from the servlet? And I do want to keep processing of the different blocks of XML code separate from each other, so I dont think I should use multi threading here.. Am I correct in my assumption?
Running it one after the other as per your 1st option will not be considered 'concurrent' .
Coming to the servlet method , the way you describe it will work concurrently , but you also need to think about how many users concurrently ? Since for each user , there would be a separate request , there would be some network latency involved for multiple calls. You need to think about all these factors before going ahead with this option
Since you have the code for one user's job , you can define a thread class which has userid as an attribute. In the run() method call the code for a particular user's job.
Now create two threads and set the appropriate userid for each thread and spawn them off.
If the number of users are more , you can look at using Java's Thread Pool Executor .
Since you are going to use a servlet container then it's going to manage multithreading for you. Every servlet request will be executed in a different thread. In that scenario one servlet call would execute on block of code from provided XML in a single threaded manner. If there are several sites declared per block of code they would be visited serially. Other user in the same time may call the same server with other block of code running in parallel with the first one.
We have a system that uses threading so that it can concurrently handle different bits of functionality in parallel. We would like to find a way to tie all log entries for a particular "transaction" together. Normally, one might use 'threadName' to gather these together, but clearly that fails in a multithreaded situation.
Short of passing a 'transaction key' down through every method call, I can't see a way to tie these together. And passing a key into every single method is just ugly.
Also, we're kind of tied to Java logging, as our system is built on a modified version of it. So, I would be interested in other platforms for examples of what we might try, but switching platforms is highly unlikely.
Does anyone have any suggestions?
Thanks,
Peter
EDIT: Unfortunately, I don't have control over the creation of the threads as that's all handled by a workflow package. Otherwise, the idea of caching the ID once for each thread (on ThreadLocal maybe?) then setting that on the new threads as they are created is a good idea. I may try that anyway.
You could consider creating a globally-accessible Map that maps a Thread's name to its current transaction ID. Upon beginning a new task, generate a GUID for that transaction and have the Thread register itself in the Map. Do the same for any Threads it spawns to perform the same task. Then, when you need to log something, you can simply lookup the transaction ID from the global Map, based on the current Thread's name. (A bit kludgy, but should work)
This is a perfect example for AspectJ crosscuts. If you know the methods that are being called you can put interceptors on them and bind dynamically.
This article will give you several options http://www.ibm.com/developerworks/java/library/j-logging/
However you mentioned that your transaction spans more than one thread, take a look at how log4j cope with binding additional information to current thread with MDC and NDC classes. It uses ThreadLocal as you were advised before, but interesting thing is how log4j injects data into log messages.
//In the code:
MDC.put("RemoteAddress", req.getRemoteAddr());
//In the configuration file, add the following:
%X{RemoteAddress}
Details:
http://onjava.com/pub/a/onjava/2002/08/07/log4j.html?page=3
http://wiki.apache.org/logging-log4j/NDCvsMDC
How about naming your threads to include the transaction ID? Quick and Dirty, admittedly, but it should work (until you need the thread name for something else or you start reusing threads in a thread pool).
If you are logging, then you must have some kind of logger object. You should have a spearate instance in each thread.
add a method to it called setID(String id).
When it is initialized in your thread, set a unique ID using the method.
prepend the set iD to each log entry.
A couple people have suggested answers that have the newly spawned thread somehow knowing what the transaction ID is. Unless I'm missing something, in order to get this ID into the newly spawned thread, I would have to pass it all the way down the line into the method that spawns the thread, which I'd rather not do.
I don't think you need to pass it down, but rather the code responsible for handing work to these threads needs to have the transactionID to pass. Wouldn't the work-assigner have this already?
Suppose that I have a method called doSomething() and I want to use this method in a multithreaded application (each servlet inherits from HttpServlet).I'm wondering if it is possible that a race condition will occur in the following cases:
doSomething() is not staic method and it writes values to a database.
doSomething() is static method but it does not write values to a database.
what I have noticed that many methods in my application may lead to a race condition or dirty read/write. for example , I have a Poll System , and for each voting operation, a certain method will change a single cell value for that poll as the following:
[poll_id | poll_data ]
[1 | {choice_1 : 10, choice_2 : 20}]
will the JSP/Servlets app solve these issues by itself, or I have to solve all that by myself?
Thanks..
It depends on how doSomething() is implemented and what it actually does. I assume writing to the database uses JDBC connections, which are not threadsafe. The preferred way of doing that would be to create ThreadLocal JDBC connections.
As for the second case, it depends on what is going on in the method. If it doesn't access any shared, mutable state then there isn't a problem. If it does, you probably will need to lock appropriately, which may involve adding locks to every other access to those variables.
(Be aware that just marking these methods as synchronized does not fix any concurrency bugs. If doSomething() incremented a value on a shared object, then all accesses to that variable need to be synchronized since i++ is not an atomic operation. If it is something as simple as incrementing a counter, you could use AtomicInteger.incrementAndGet().)
The Servlet API certainly does not magically make concurrency a non-issue for you.
When writing to a database, it depends on the concurrency strategy in your persistence layer. Pessimistic locking, optimistic locking, last-in-wins? There's way more going on when you 'write to a database' that you need to decide how you're going to handle. What is it you want to have happen when two people click the button at the same time?
Making doSomething static doesn't seem to have too much bearing on the issue. What's happening in there is the relevant part. Is it modifying static variables? Then yes, there could be race conditions.
The servlet api will not do anything for you to make your concurrency problems disappear. Things like using the synchronized keyword on your servlets are a bad idea because you are basically forcing your threads to be processed one at a time and it ruins your ability to respond quickly to multiple users.
If you use Spring or EJB3, either one will provide threadlocal database connections and the ability to specify transactions. You should definitely check out one of those.
Case 1, your servlet uses some code that accesses a database. Databases have locking mechanisms that you should exploit. Two important reasons for this: the database itself might be used from other applications that read and write that data, it's not enough for your app to deal with contending with itself. And: your own application may be deployed to a scaled, clustered web container, where multiple copies of your code are executing on separate machines.
So, there are many standard patterns for dealing with locks in databases, you may need to read up on Pessimistic and Optimistic Locking.
The servlet API and JBC connection pooling gives you some helpful guarantees so that you can write your servlet code without using Java synchronisation provided your variables are in method scope, in concept you have
Start transaction (perhaps implicit, perhaps on entry to an ejb)
Get connection to DB ( Gets you a connection from pool, associated with your tran)
read/write/update code
Close connection (actually keeps it for your thread until your transaction commits)
Commit (again maybe implictly)
So your only real issue is dealing with any contentions in the DB. All of the above tends to be done rather more nicely using things such as JPA these days, but under the covers thats more or less what's happening.
Case 2: static method, this presumably implies that you now keep everything in a memory structure. This (barring remote invocation of some sort) impies a single JVM and you managing your own locking. Should your JVM or machine crash I guess you lose your data. If you care about your data then using a DB is probably better.
OR, how about a completely other approach: servlet simply records the "vote" by writing a message to a persistent JMS queue. Have some other processes pick up the votes from the queue and adds them up. You won't give immediate feedback to the voter this way, but you decouple the user's experience from the actual (in similar scenarios) quite complex processing .
I thing that the best solution for your problem is to use something like "synchronized" keyword and wait/notify!