We have a system that uses threading to handle different bits of functionality in parallel. We would like to find a way to tie together all log entries for a particular "transaction". Normally, one might use 'threadName' to gather these, but that clearly fails once a transaction spans multiple threads.
Short of passing a 'transaction key' down through every method call, I can't see a way to tie these together. And passing a key into every single method is just ugly.
Also, we're fairly tied to Java logging, as our system is built on a modified version of it. So I would be interested in examples from other platforms of what we might try, but actually switching platforms is highly unlikely.
Does anyone have any suggestions?
Thanks,
Peter
EDIT: Unfortunately, I don't have control over the creation of the threads, as that's all handled by a workflow package. Otherwise, the idea of caching the ID once per thread (in a ThreadLocal, maybe?) and then setting it on new threads as they are created is a good one. I may try that anyway.
You could consider creating a globally-accessible Map that maps a Thread's name to its current transaction ID. Upon beginning a new task, generate a GUID for that transaction and have the Thread register itself in the Map. Do the same for any Threads it spawns to perform the same task. Then, when you need to log something, you can simply lookup the transaction ID from the global Map, based on the current Thread's name. (A bit kludgy, but should work)
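A minimal sketch of that registry, assuming a ConcurrentHashMap keyed by thread name (class and method names here are hypothetical):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry mapping a thread's name to its current transaction ID.
class TransactionRegistry {
    private static final Map<String, String> TXN_BY_THREAD = new ConcurrentHashMap<>();

    // Call at the start of a task, and again from each thread spawned for the same task.
    static void register(String transactionId) {
        TXN_BY_THREAD.put(Thread.currentThread().getName(), transactionId);
    }

    // Generate a GUID for a new transaction and register the current thread under it.
    static String newTransaction() {
        String id = UUID.randomUUID().toString();
        register(id);
        return id;
    }

    // Look up the current thread's transaction ID when logging.
    static String currentTransaction() {
        return TXN_BY_THREAD.get(Thread.currentThread().getName());
    }

    // Remove the entry when the task ends, or pooled threads will keep stale IDs.
    static void clear() {
        TXN_BY_THREAD.remove(Thread.currentThread().getName());
    }
}
```

The clear() call matters with thread pools: a reused thread would otherwise report the previous task's transaction ID.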
This is a perfect example for AspectJ crosscuts. If you know the methods that are being called you can put interceptors on them and bind dynamically.
This article will give you several options: http://www.ibm.com/developerworks/java/library/j-logging/
However, you mentioned that your transaction spans more than one thread. Take a look at how log4j copes with binding additional information to the current thread with its MDC and NDC classes. It uses a ThreadLocal as you were advised before, but the interesting thing is how log4j injects the data into log messages.
//In the code:
MDC.put("RemoteAddress", req.getRemoteAddr());
//In the configuration file, add the following:
%X{RemoteAddress}
Details:
http://onjava.com/pub/a/onjava/2002/08/07/log4j.html?page=3
http://wiki.apache.org/logging-log4j/NDCvsMDC
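Since the asker is tied to Java logging, which has no built-in MDC, a hand-rolled sketch of the same idea is possible with an InheritableThreadLocal (so spawned threads inherit the parent's ID) and a custom Formatter. All names here are hypothetical:

```java
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Minimal MDC-style context for java.util.logging (which has no built-in MDC).
// InheritableThreadLocal means threads spawned by a task inherit its transaction ID.
class LogContext {
    private static final InheritableThreadLocal<String> TXN_ID = new InheritableThreadLocal<>();

    static void setTransactionId(String id) { TXN_ID.set(id); }
    static String getTransactionId() { return TXN_ID.get(); }

    // A Formatter that prepends the transaction ID to every record,
    // mimicking log4j's %X{...} conversion pattern.
    static class TxnFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            String txn = TXN_ID.get();
            return "[txn=" + (txn == null ? "-" : txn) + "] "
                    + record.getMessage() + System.lineSeparator();
        }
    }
}
```

One caveat: an asynchronous handler would format records on a different thread, where the ThreadLocal holds the wrong value; in that case the ID should be captured at publish time instead.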
How about naming your threads to include the transaction ID? Quick and Dirty, admittedly, but it should work (until you need the thread name for something else or you start reusing threads in a thread pool).
If you are logging, then you must have some kind of logger object. You should have a separate instance in each thread.
Add a method to it called setID(String id).
When the logger is initialized in your thread, set a unique ID using that method.
Prepend the set ID to each log entry.
A couple people have suggested answers that have the newly spawned thread somehow knowing what the transaction ID is. Unless I'm missing something, in order to get this ID into the newly spawned thread, I would have to pass it all the way down the line into the method that spawns the thread, which I'd rather not do.
I don't think you need to pass it down, but rather the code responsible for handing work to these threads needs to have the transactionID to pass. Wouldn't the work-assigner have this already?
Please note: Although this question mentions Java, I think it's an OOP/concurrency problem at heart and can probably be answered by anyone with significant programming experience.
So I'm building a ConfigurationLoader that will read a Configuration POJO from a remote service and make it available via an API. A few things:
As soon as the ConfigurationLoader is asked for the Configuration the first time, a background thread (worker) will ping the remote service every, say, 30 seconds, for updates and then apply those updates to the Configuration instance; and
If the Configuration is modified, the background worker will be notified of the change and will push the "new" Configuration to the remote service;
Both the ConfigurationLoader and the Configuration must be thread-safe
So when I think "thread safety" the first thing I think of is immutability, which leads me towards excellent projects like Immutables. The problem is that Configuration can't be immutable because we need to be able to change it on the client-side and then let the loader ripple those changes back to the server.
My next thought was to try to make both ConfigurationLoader and Configuration singletons, but the problem there is that the ConfigurationLoader takes a lot of arguments to instantiate, and as this excellent answer points out, a singleton that takes arguments in construction is not a true singleton.
// Groovy pseudo-code
class Configuration {
    // Immutable? Singleton? Other?
}

class ConfigurationLoader {
    // private fields
    Configuration configuration

    ConfigurationLoader(int fizz, boolean buzz, Foo foo, List<Bar> bars) {
        this.fizz = fizz
        this.buzz = buzz
        // etc.
    }

    Configuration loadConfiguration() {
        if (configuration == null) {
            // Create background worker that will constantly update
            // 'configuration'
        }
        return configuration
    }
}
What are my options here? How do I create both the loader and the config to be thread-safe, where the config is changeable by the client-side (on-demand) or asynchronously by a background worker thread?
The problem is that Configuration can't be immutable because we need to be able to change it
It can still be immutable, you just create a new one for every change ("copy-on-write").
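A minimal sketch of that copy-on-write idea, with hypothetical config fields and an AtomicReference as the publication point:

```java
import java.util.concurrent.atomic.AtomicReference;

// Configuration itself stays immutable; every change produces a new instance
// that is swapped in atomically. Field names are hypothetical examples.
final class Configuration {
    final int pollIntervalSeconds;
    final String endpoint;

    Configuration(int pollIntervalSeconds, String endpoint) {
        this.pollIntervalSeconds = pollIntervalSeconds;
        this.endpoint = endpoint;
    }

    // "Change" = build a new immutable copy with one field replaced.
    Configuration withEndpoint(String newEndpoint) {
        return new Configuration(pollIntervalSeconds, newEndpoint);
    }
}

class ConfigurationHolder {
    private final AtomicReference<Configuration> current =
            new AtomicReference<>(new Configuration(30, "http://example.com"));

    Configuration get() { return current.get(); }

    // Both the client side and the background worker publish changes this way.
    void update(Configuration newConfig) { current.set(newConfig); }
}
```

Readers always see a consistent snapshot, and the loader's worker can ripple any newly published instance back to the remote service.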
What are my options here?
First thing you'll have to think about: How do you want to react to configuration changes in concurrently running tasks? Basically, you have three options:
Ignore configuration change until the task is done
E.g. some directory your code writes files to: finish writing the current file to the current target dir, put new files in the new dir. Writing some bytes into /new/path/somefile won't be a good idea if you never created that file. Your best option here is probably an immutable Configuration object stored in a field of your task instance (e.g. at task creation; in that case you can also make that field final for clarity). This usually works best if your code is designed as a collection of small, isolated tasks.
Pros: Config never changes within a single task, so this is simple to get thread-safe and easy to test.
Cons: Config updates never make it to already running tasks.
Make your tasks check for config changes
E.g. your task regularly sends some data to an email address. Keep a central store for your config (as in your pseudo-code) and re-fetch it at some interval (e.g. between collecting data and sending the mail) from your task code. This usually works best for long-running or permanent tasks.
Pros: Config can change during a task run, but still somewhat simple to get safe - just make sure you have some memory barrier in place for reading the config (make your private configuration field volatile, use an AtomicReference, guard it with a lock, whatever).
Cons: Task code will be harder to test than first option. Config values may still be outdated between checks.
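The second option might look like this minimal sketch (field and method names are hypothetical); the key points are the volatile field as the memory barrier and reading it once per unit of work:

```java
// A long-running task re-reads the shared config between work units instead
// of caching it once. The volatile field is the memory barrier mentioned
// above; an AtomicReference or a lock would work just as well.
class ReportTask {
    private volatile String recipient;   // hypothetical config value

    ReportTask(String initialRecipient) { this.recipient = initialRecipient; }

    // Called by whatever component applies config updates (e.g. the loader's worker).
    void updateRecipient(String newRecipient) { this.recipient = newRecipient; }

    // One unit of work: read the config once at the start and use that value
    // consistently for the whole unit, so it cannot change mid-step.
    String sendReport() {
        String to = recipient;
        // collectData(); sendMail(to);  -- hypothetical work
        return "sent to " + to;
    }
}
```

A config update lands between units of work, never inside one, which is usually the behaviour you want here.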
Signal config changes to your tasks
Basically option two, but the other way around. Whenever config changes, interrupt your tasks, handle the interrupt as a "config needs updating" message, continue/restart with new config.
Pros: Config values are never outdated.
Cons: This is the hardest to get right. Some might even argue that you cannot get it right, because interruption should only be used for task abortion. There is only a very minor benefit (if any) over the second option if you place your task's update checks at the right spots. Don't do this unless you have a good reason to.
You need a singleton to pull this off, but your singleton isn't the immutable thing. It's the thread-safe thing. Make your singleton (Configuration) contain a simple Properties object or similar and protect access to it with synchronization. Your ConfigurationLoader somehow knows of this Configuration singleton and, under synchronization, sets new instances of the Properties object when it detects change.
I'm pretty sure Apache Commons Configuration does something like this.
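A rough sketch of that shape, with hypothetical names: the singleton is just a thread-safe holder that swaps in whole Properties snapshots under synchronization.

```java
import java.util.Properties;

// The singleton is the thread-safe thing, not the immutable thing: reads and
// whole-snapshot replacement are both guarded by the instance lock.
class ConfigSingleton {
    private static final ConfigSingleton INSTANCE = new ConfigSingleton();
    private Properties props = new Properties();

    private ConfigSingleton() {}
    static ConfigSingleton getInstance() { return INSTANCE; }

    synchronized String get(String key) { return props.getProperty(key); }

    // Called by the loader when it detects a change on the remote service.
    synchronized void replace(Properties fresh) { this.props = fresh; }
}
```

Because the loader replaces the entire Properties object rather than mutating it in place, readers never see a half-applied update.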
We're seeing OptimisticLockingExceptions in a Camunda process with the following Scenario:
The process consists of one UserTask followed by one Gateway and one ServiceTask. The UserTask executes
runtimeService.setVariable(execId, "object", out);
taskService.complete(taskId);
The following ServiceTask uses "object" as an input variable (it does not modify it) and, upon completion, throws said OptimisticLockingException. My problem seems to originate from the fact that taskService.complete() immediately executes the ServiceTask, prior to flushing the variables set in the UserTask.
I've had another, related issue, which occurred when, in one UserTask, I executed runtimeService.setVariable() with a Map<String, Boolean> and tried to access the members of the Map as transition guards in a gateway following that UserTask.
I've found the following article: http://forums.activiti.org/content/urgenterror-updated-another-transaction-concurrently which seems somehow related to my issue. However, I'm not clear on the question whether this is (un)wanted behaviour and how I can access a DelegateExecution-Object from a UserTask.
After a long and cumbersome search, we think we have nailed down two issues with Camunda which (added together) lead to the exception from the original question.
Camunda uses equals on serialized objects (represented by byte arrays) to determine whether process variables have to be written back to the database. This even happens when variables are only read and not set. As equals is defined by reference identity on arrays, a serialized object is never determined "equal" if it has been serialized more than once. We have found that a single runtimeService.setVariable() leads to four DB updates at the time of completeTask() (one for setVariable itself, the other three for various Camunda-internal validation actions). We think this is a bug and will file a bug report with Camunda.
Obviously there are two ways to set variables. One is to use runtimeService.setVariable(); the other is to use delegateTask/delegateExecution.setVariable(). There is some flaw when using both at the same time. While we could not reduce our setup to a simple unit test, we have identified several components which have to be involved for the exception to occur:
2.1 We are using a TaskListener to set up some context variables at the start of tasks; this TaskListener used runtimeService.setVariable() instead of delegateTask.setVariable(). After we changed that, the exception vanished.
2.2 We used (and still use) runtimeService.setVariable() during task execution. After we switched to completeTask(Variables) and omitted the runtimeService.setVariable() calls, the exception vanished as well. However, this isn't a permanent solution, as we have to store process variables during task execution.
2.3 The exception occurred only when process variables were read or written via delegate<X>.getVariable() (either by our code or implicitly in the Camunda implementation of JUEL parsing with gateways and serviceTasks, or completeTask(HashMap)).
Thanks a lot for all your input.
You could consider using an asynchronous continuation on the service task. This will make sure that the service task is executed inside a new transaction / command context.
Consider reading the camunda documentation on transactions and asynchronous continuations.
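In BPMN XML, an asynchronous continuation is declared with Camunda's asyncBefore attribute on the task; the element id, name, and delegate expression below are hypothetical placeholders:

```xml
<!-- Marking the service task asyncBefore gives it its own job and
     transaction/command context, so the user task's variable writes
     are flushed to the database before the service task runs. -->
<serviceTask id="myServiceTask"
             name="Process object"
             camunda:asyncBefore="true"
             camunda:delegateExpression="${myDelegate}" />
```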
The DelegateExecution object is meant for providing service task (JavaDelegate) implementations access to process instance variables. It is not meant to be used from a User Task.
This is a recent interview question to my friend:
How would you handle a situation where users enter some data on a screen and, let's say, 5 of them click the Submit button at the same time?
(By "same time", the interviewer insisted that they are the same down to the nanosecond.)
My answer was just to make the method that handles the request synchronized and only one request can acquire the lock on the method at a given time.
But it looks like the interviewer kept insisting there was a "better way" to handle it.
Another approach is to handle locking at the database level, but I don't think that is "better".
Are there any other approaches? This seems to be a fairly common problem.
If you have only one network card, you can only have one request coming down it at once. ;)
The answer he is probably looking for is something like:
Make the servlet stateless so requests can be executed concurrently.
Use components which allow thread-safe concurrent access, like the Atomic* or Concurrent* classes.
Use locks only where you absolutely have to.
What I prefer to do is make the service so fast it can respond before the next request can come in. ;) Though I don't have the overhead of Java EE or databases to worry about.
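Putting those points together, a minimal sketch (class and method names are hypothetical): the handler keeps no per-request state, and the one piece of shared state is an atomic counter rather than a synchronized method, so "simultaneous" submits never queue behind a single lock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stateless request handler: everything a request needs is local or passed in,
// so any number of threads can run handle() concurrently. The only shared
// state is lock-free.
class SubmitHandler {
    private static final AtomicLong SUBMISSIONS = new AtomicLong();

    long handle(String payload) {
        // validate(payload); persist(payload);  -- hypothetical work,
        // all on local variables, no instance fields touched between requests
        return SUBMISSIONS.incrementAndGet();   // atomic, no synchronized block
    }
}
```

Five clicks arriving "at the same nanosecond" simply become five independent invocations; only the single atomic increment is serialized, and only at the hardware level.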
Does it matter that they click at the same time e.g. are they both updating the same record on a database?
A synchronized method will not cut it, especially if it's a webapp distributed amongst multiple JVMs. Also the synchronized method may block, but then the other threads would just fire after the first completes and you'd have lost writes.
So locking at database level seems to be the option here i.e. if the record has been updated, report an error back to the users whose updates were serviced after the first.
You do not have to worry about this, as the web server launches each request in an isolated thread and manages it.
But if you have some shared resource, like a file used for logging, then you need to handle the concurrency yourself and lock it, both within a request and across requests.
We have a Java application that processes requests. Naturally, there is a thread pool with threads named with some insignificant names like "processor-1", "processor-2" etc. We are wondering whether it's a good idea to rename the processing thread each time it gets a request to a name that would indicate which request is being processed.
I have a feeling that something is wrong about it, but cannot think of a good reason. I can remember several cases of analyzing multi-threading issues from thread dumps and it was very important to me to know whether I see the same thread in two consecutive dumps or not. However, I realize that I could look at the thread ID instead. In log4j logs it's a common practice to write the thread name and I remember many cases of filtering by it. But still I'm not sure I have a good "line of defense" here.
What is the common practice? Is it a good idea to keep renaming threads?
There are related questions like Renaming Threads in Java or Should threads in Java be named for easier debugging? but they do not address this specific issue.
It is a good idea.
The only purpose of a thread's name is diagnosis, so you are encouraged to name/rename threads in whatever way best helps you understand what's going on at runtime.
Especially for threads in a thread pool: their identities are meaningless, and it's functionally equivalent to having a new thread per task. We care only about the task names, and the thread name is a convenient place to store a task name.
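One way to do this without leaking task names into the pool is a wrapper that restores the original thread name in a finally block. A hedged sketch, with hypothetical names:

```java
// Temporarily rename the pooled thread to the task/request ID, so thread
// dumps and log patterns like log4j's %t show which task is running, then
// restore the original name so the pool's own naming stays intact.
class NamedTask implements Runnable {
    private final String taskName;
    private final Runnable body;

    NamedTask(String taskName, Runnable body) {
        this.taskName = taskName;
        this.body = body;
    }

    @Override
    public void run() {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(original + " [" + taskName + "]");
        try {
            body.run();
        } finally {
            current.setName(original);   // don't leak task names into the pool
        }
    }
}
```

Submitting `new NamedTask("req-42", work)` to an ExecutorService then shows "processor-1 [req-42]" in dumps while the task runs, and plain "processor-1" afterwards.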
I would give each thread a meaningful, but not dynamically changing, name.
Storing information about which request is being processed sounds fine, but storing that in the thread name smells. You could log that information, or store it in a ThreadLocal, or...?
Suppose that I have a method called doSomething() and I want to use it in a multithreaded application (each servlet inherits from HttpServlet). I'm wondering whether a race condition can occur in the following cases:
doSomething() is not a static method and it writes values to a database.
doSomething() is a static method but it does not write values to a database.
I have noticed that many methods in my application may lead to a race condition or a dirty read/write. For example, I have a poll system, and for each voting operation a certain method will change a single cell value for that poll, as follows:
[poll_id | poll_data ]
[1 | {choice_1 : 10, choice_2 : 20}]
Will the JSP/Servlet app handle these issues by itself, or do I have to solve them all myself?
Thanks..
It depends on how doSomething() is implemented and what it actually does. I assume writing to the database uses JDBC connections, which are not thread-safe. The preferred way of doing that would be to use ThreadLocal JDBC connections.
As for the second case, it depends on what is going on in the method. If it doesn't access any shared, mutable state then there isn't a problem. If it does, you probably will need to lock appropriately, which may involve adding locks to every other access to those variables.
(Be aware that just marking these methods as synchronized does not, by itself, fix the concurrency bugs. If doSomething() incremented a value on a shared object, then all accesses to that variable would need to be synchronized, since i++ is not an atomic operation. If it is something as simple as incrementing a counter, you could use AtomicInteger.incrementAndGet().)
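For the poll example above, a sketch of that counter variant (names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicInteger;

// i++ on a shared int is a read-modify-write sequence and can lose updates
// under contention; AtomicInteger makes the whole increment one atomic step.
class VoteCounter {
    private final AtomicInteger choice1 = new AtomicInteger();

    int vote() {
        return choice1.incrementAndGet();   // atomic, no synchronized needed
    }

    int total() { return choice1.get(); }
}
```

Note this only protects the in-memory count; persisting it to the poll_data cell still needs the database-level handling discussed below.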
The Servlet API certainly does not magically make concurrency a non-issue for you.
When writing to a database, it depends on the concurrency strategy in your persistence layer. Pessimistic locking, optimistic locking, last-in-wins? There's way more going on when you 'write to a database' that you need to decide how you're going to handle. What is it you want to have happen when two people click the button at the same time?
Making doSomething static doesn't seem to have too much bearing on the issue. What's happening in there is the relevant part. Is it modifying static variables? Then yes, there could be race conditions.
The servlet api will not do anything for you to make your concurrency problems disappear. Things like using the synchronized keyword on your servlets are a bad idea because you are basically forcing your threads to be processed one at a time and it ruins your ability to respond quickly to multiple users.
If you use Spring or EJB3, either one will provide threadlocal database connections and the ability to specify transactions. You should definitely check out one of those.
Case 1: your servlet uses some code that accesses a database. Databases have locking mechanisms that you should exploit. Two important reasons for this: the database itself might be used by other applications that read and write that data, so it's not enough for your app to handle contention only with itself. And: your own application may be deployed to a scaled, clustered web container, where multiple copies of your code are executing on separate machines.
So, there are many standard patterns for dealing with locks in databases, you may need to read up on Pessimistic and Optimistic Locking.
The servlet API and JDBC connection pooling give you some helpful guarantees, so you can write your servlet code without Java synchronisation provided your variables are in method scope. Conceptually, you have:
Start transaction (perhaps implicit, perhaps on entry to an EJB)
Get connection to DB (gets you a connection from the pool, associated with your transaction)
Read/write/update code
Close connection (actually kept for your thread until your transaction commits)
Commit (again, maybe implicitly)
So your only real issue is dealing with any contention in the DB. All of the above tends to be done rather more nicely using things such as JPA these days, but under the covers that's more or less what's happening.
Case 2: static method. This presumably implies that you now keep everything in a memory structure. That (barring remote invocation of some sort) implies a single JVM and you managing your own locking. Should your JVM or machine crash, I guess you lose your data. If you care about your data, then using a DB is probably better.
OR, how about a completely different approach: the servlet simply records the "vote" by writing a message to a persistent JMS queue. Some other process picks the votes up from the queue and adds them up. You won't give immediate feedback to the voter this way, but you decouple the user's experience from what is (in scenarios like this) quite complex processing.
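As an illustration of the decoupling only (not of the JMS API itself), here is an in-memory stand-in using a BlockingQueue; a real deployment would put a persistent JMS queue between the two sides, and all names here are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// The request thread only enqueues the vote; a separate consumer drains the
// queue and does the tallying, so contention never touches the request path.
class VoteQueue {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Map<String, AtomicInteger> tally = new ConcurrentHashMap<>();

    // Called from the servlet/request thread: cheap, never blocks on the DB.
    void submit(String choice) { queue.add(choice); }

    // Called by a background consumer in a loop. A real consumer would block
    // on take(); poll() is used here so the sketch never blocks.
    boolean drainOne() {
        String choice = queue.poll();
        if (choice == null) return false;
        tally.computeIfAbsent(choice, k -> new AtomicInteger()).incrementAndGet();
        return true;
    }

    int count(String choice) {
        AtomicInteger c = tally.get(choice);
        return c == null ? 0 : c.get();
    }
}
```

The trade-off is exactly the one named above: counts are eventually consistent, so the voter cannot be shown the final tally in the same request.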
I think the best solution for your problem is to use something like the synchronized keyword with wait/notify!