I'm noticing some strange behavior in my app that smells like a lack of thread-safety. I'm working on reproducing it, but in the meantime I wanted to ensure I'm making the right assumptions about how the class that contains my endpoint handlers is used from a threading perspective. Most of what happens is opaque to me, because I'm not the one instantiating the class in the first place. To state the obvious, it must be some black magic in Endpoints.
MY ASSUMPTION
An instance of the class that holds my endpoint handlers is created for every single request that comes into my app. Based upon that assumption, it's ok for that class to have non-thread-safe objects that get used by my handlers.
MY FEAR
The instances of Endpoint handler classes are reused across requests.
So, which is it? Regardless of the answer, I think it would make sense for me to remove the ambiguity in my app and assume the worst, because I don't think I have any control over how Endpoints behaves. In my case, I'm creating a JDO/DataNucleus PersistenceManager (not thread-safe) when constructing the class housing my endpoint handlers. I should probably just create it in each handler as a local, or use a ThreadLocal.
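For example, something like this per handler (just a sketch; PMF and MyEntity are placeholders for whatever factory singleton and entity class the app actually uses):
import javax.jdo.PersistenceManager;

public class MyEndpoint {
    // Sketch only: PMF.get() stands for the app's PersistenceManagerFactory singleton.
    // The factory is thread-safe; the PersistenceManager is not, so each handler
    // gets its own local instance and closes it before returning.
    public MyEntity getEntity(Long id) {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        try {
            return pm.getObjectById(MyEntity.class, id);
        } finally {
            pm.close();
        }
    }
}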
I can probably also fashion a test to prove one or the other. I'll post back an answer to my own question if I do.
Related
I'm fairly new to Java and joining a project that leverages the DDD pattern (supposedly). I come from a strong python background and am fairly anal about unit test driven design. That said, one of the challenges of moving to Java is the testability of Service layers.
Our REST-like project stack is laid out as follows:
ServiceHandlers, which handle request/response, etc., and call specific implementations of IService (e.g. DocumentService)
DocumentService - handles auditing, permission checking, etc., with methods such as makeOwner(session, user, doc)
Currently, something like DocumentService has repository dependencies injected via Guice. In a public method like DocumentService.makeOwner, we want to ensure the session user is an admin as well as check whether the target user is already an owner (leveraging the injected repositories). This results in some duplicated code - the same resolve-the-user-and-check-membership/permissions logic for each of the two users involved. To eliminate this redundant code, I want to make a sort of super-simple isOwner(user, doc) call that I can concisely mock out for various test scenarios (such as throwing an exception when the user can't be resolved, etc.). Here is where my googling fails me.
If I put this in the same class as DocumentService, I can't mock it while testing makeOwner in the same class (due to Mockito limitations), even though it somewhat feels like it should go there (option 1).
If I put it in a lower class like DocumentHelpers, it feels slightly funny, but I can easily mock it out. Also, DocumentHelpers needs the injected repository as well, which is fine with Guice (option 2).
I should add that there are numerous spots of this nature in our infant code base that are currently untestable, because methods make non-static calls to helper-like methods in the same *Service class that aren't used by the upper ServiceHandler class. However, at this stage, I can't tell if this is poor design or just fine.
So I ask more experienced Java developers:
Does introducing "Service Helpers" seem like a valid solution?
Is this counter to DDD principles?
If not, is there a more DDD-friendly naming convention for this aside from "Helpers"?
3 bits to add:
My googling has mostly come up with debates over "helpers" as static utility methods for stateless operations like date formatting, which doesn't fit my issue.
I don't want to use PowerMock since it breaks code coverage and is pretty ugly to use.
In python I'd probably call the "Service Helper" layer described above as internal_api, but that seems to have a different meaning in Java, especially since I need the classes to be public to unit test them.
Any guidance is appreciated.
That the user who initiates the action must be an admin looks like an application-level access control concern. DDD doesn't have much of an opinion about how you should do that. For testability and separation-of-concerns purposes, though, it might be a better idea to have some kind of separate non-static class rather than a method in the same service or a static helper.
Checking that the future owner is already an owner (if I understand correctly) might be a different animal. It could be an invariant in your domain. If so, the preferred way is to rely on an Aggregate to enforce that rule. However, it's not clear from your description whether Document is an aggregate and if it or another aggregate contains the data needed to tell if a user is owner.
Alternatively, you could verify the rule at the Application layer level, but that means your domain model could become inconsistent if the state change is triggered by something other than that Application layer.
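If Document is indeed the aggregate and knows its owners, a hypothetical sketch of that rule living inside it (names invented):
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the Document aggregate guards its own ownership invariant.
public class Document {
    private final Set<UserId> owners = new HashSet<UserId>();

    public void makeOwner(UserId candidate) {
        if (owners.contains(candidate)) {
            throw new IllegalStateException("User is already an owner of this document");
        }
        owners.add(candidate);
    }
}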
As I learn more about DDD, my question doesn't seem to be all that DDD-related; it's more about the general hierarchy of the code structure and the interactions between the layers. We ended up going with a separate DocumentServiceHelpers class that could be mocked out. This contains methods like isOwner that we can mock to return true or false as needed, to test our DocumentService handling more easily. Thanks to everyone for playing along.
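For reference, the shape of what we ended up with is roughly this (a sketch; the real class names, repository calls and test fixtures differ):
// A plain injectable class, so it can be mocked in DocumentService's tests.
public class DocumentServiceHelpers {
    private final DocumentRepository repository;

    @Inject
    public DocumentServiceHelpers(DocumentRepository repository) {
        this.repository = repository;
    }

    public boolean isOwner(User user, Document doc) {
        return repository.findOwners(doc).contains(user); // hypothetical repository method
    }
}

// Inside a DocumentService unit test class, Mockito can then stub it per scenario:
@Test
public void refusesToMakeOwnerWhenTargetUserIsAlreadyOwner() {
    DocumentServiceHelpers helpers = Mockito.mock(DocumentServiceHelpers.class);
    Mockito.when(helpers.isOwner(targetUser, doc)).thenReturn(true);
    DocumentService service = new DocumentService(helpers /*, other mocked dependencies */);
    // ... assert that service.makeOwner(session, targetUser, doc) fails as expected
}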
Within Java you can create an Observer-Observable set of classes in which the Observable can call the Observer. You can also, in Java, explicitly reference an owning class instance from a child instance of another class and call the owning instance's public methods.
Which is the better approach to take? Which is more beneficial in different scenarios, one example being Multi-Threading?
The Observer Pattern should be used whenever you don't know or don't care who is observing you. This is the key concept in event-driven programming. You don't have any control over who is observing or what they do when you broadcast your events. Like you already mentioned in your comments, this is great for decoupling classes.
An example of a usage could be in a plugin-architecture:
You write a basic mail-server that broadcasts whenever a mail is received. You could then have a spam-plugin that validates the incoming mail, an auto-reply service that sends a reply, a forward service that redirects the mail and so on. Your plain mail server (the observable) doesn't know anything about spam, replies or forwarding. It just shouts out "Hey, a new mail is here" not knowing if anyone is listening. Then each of the plugins (the observers) does their own special thing, not knowing anything about each other. This system is very flexible and could easily be extended.
But the flexibility provided by the Observer Pattern is a two-edged sword. In the mail-server example, each plugin handles the incoming mail in total isolation from the others. This makes it impossible to set up rules like "don't reply to or forward spam", because the observers don't know about each other - and even if they did, they wouldn't know in what order they are executed or have completed. So for the basic mail server to solve this problem, it'll need to have references to the instances that do the spam/reply/forward actions.
So the Observer Pattern provides flexibility. You could easily add a new anti-virus plugin later, without having to modify your plain mail server code. The cost of this flexibility is loss of control of the flow of actions.
The reference approach gives you total control of the flow of actions. However, you would need to modify your plain mail server code if you ever need to add support for an anti-virus plugin.
I hope this example gives you some ideas of the pros and cons of each approach.
With regard to multi-threading, neither approach is inherently favorable over the other.
If I want to update a cache every minute, or do something else every hour, where should I put my code (Java)? My feeling is: not in the servlets. Can you help me with this?
You need to use cron jobs:
Scheduled Tasks With Cron for Java
This is exactly what they have been designed for.
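For example, a minimal sketch of how this hangs together on App Engine (the URL, schedule and class names are just placeholders):
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: a cron.xml entry such as
//   <cron>
//     <url>/tasks/refresh-cache</url>
//     <description>Refresh the cache</description>
//     <schedule>every 1 minutes</schedule>
//   </cron>
// makes App Engine issue an HTTP GET to that URL on schedule, which lands here.
public class RefreshCacheServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        CacheRefresher.refresh(); // hypothetical helper holding the actual cache-update logic
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}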
The answer by Andrei Volgin is correct, and you need to pursue the link.
However, I want to address the 'not in the servlets' part of your questions. I think you are asking from a design perspective whether the code should reside inside the servlet class. I have answered this for myself recently.
The way Crons and Tasks are implemented by GAE, the code will be called via servlets, as these are background URL calls. So, theoretically, the code can be in the servlet class itself. If you are using a framework like Spring, you will probably have one entry point servlet and your own handlers/managers/services. In this case, you can write the code in the handler.
In my project, I created a single entry point servlet for all UI related processing. When I needed to implement the first Task Queue I created another entry point servlet for the queues/crons and then coded inside new handlers.
In general, your app design would look something like this:
UI ---> Servlet Entry Point 1 ---> Generic Business Logic Handler ---> Specific Business Logic Handler --> System Services Handler ---> System Services
Instead of the UI, we now have Queues/Crons calling the system, but generally (as in my case) the cron calls code that is more 'internal'. For example, send-mail is implemented as a queued task that needs to call the System Services Handler directly, bypassing two business logic layers. Similarly, ftp-today's-transactions is a cron that needs to call System Services directly, bypassing the business logic layers.
It makes sense NOT to call System Services directly from Servlet Entry Point 1 just because you happen to have it at hand and configured in web.xml. It makes more sense to create another entry point for queues and crons, which are more 'internal'.
The code then resides in the next-level class (sometimes called a Handler), and you can continue to maintain the hierarchy of layers if you are using packages to enforce it.
You will then not feel bad about calling something system-level directly from the servlet level, because this will be a separate, specifically secured access interface that is defined to call directly into those services.
Just to make it more intuitive, my two servlets are called
Thin - Thin Http Interface on NudeBusinessObjects [All BOs extend this, and there is a non-HTTP interface]
Thiq - Thiq Http Interface on Queues
Thin just ensures the required parameters are present and passes to handler. It always calls com.mybusiness classes which in turn call com.mysystem classes if they need to.
Thiq has more code, needs secure credentials even when invoked automatically, does more complicated validation, and generally has defined high-level behaviour for failures across crons/tasks. It always calls com.mysystem classes.
Just my two cents. It isn't too big a thing, and if you keep only one entry point and achieve the same effect by writing things in handlers, or even servlets, it isn't the end of the world. It just looks ugly when you make an architecture diagram.
To explain the use case in as few words as possible: we have a set of controllers (and associated services) that we want to deprecate. In the same project we have introduced fancy new stuff to take their place. For simplicity's sake, all the same REST endpoints are implemented by both sets of work, the only difference being that the first set of controllers is namespaced under /v1/ and the second under /v2/.
To verify that all this new work actually works in production, the goal was to incrementally push traffic in that direction. Whether it is "make 5% of the traffic go to the new stuff" or "for all calls dealing with orders from Factory X go to the new stuff" or some other piece of routing logic.
I'm trying not to touch the old code (that would require retesting all of it), so I figured I could just hijack how Spring maps a request to a controller. Initially I thought I could extend RequestMappingHandlerMapping and override the lookupHandlerMethod() call, but while that method is protected in the abstract base class, the all-important handlerMethods and urlMap maps are private, leaving me in the lurch when it comes to returning a different but existing handlerMethod.
I'd like to leverage as much as I can from the AbstractHandlerMethodMapping and RequestMappingHandlerMapping classes. Currently, without access to the handler maps, I'm debating whether to use them as-is but change the lookupPath string, creating my own HttpServletRequest wrapper in which I've edited the request URI to match my "new" controllers.
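Roughly, what I have in mind is something like this (just a sketch; the routing rule and class name are made up):
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.filter.OncePerRequestFilter;

// Sketch: for the chosen slice of traffic, hand Spring a wrapped request whose
// reported URI says /v2/ instead of /v1/, so the existing mappings do the rest.
public class V2RoutingFilter extends OncePerRequestFilter {
    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        if (request.getRequestURI().startsWith("/v1/") && routeToV2(request)) {
            chain.doFilter(new HttpServletRequestWrapper(request) {
                @Override
                public String getRequestURI() {
                    return super.getRequestURI().replaceFirst("/v1/", "/v2/");
                }
                @Override
                public String getServletPath() {
                    return super.getServletPath().replaceFirst("/v1/", "/v2/");
                }
            }, response);
        } else {
            chain.doFilter(request, response);
        }
    }

    private boolean routeToV2(HttpServletRequest request) {
        // placeholder routing rule, e.g. 5% of traffic; swap in whatever logic is needed
        return ThreadLocalRandom.current().nextInt(100) < 5;
    }
}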
Any ideas? I'm hoping there is something I am missing.
As far as I know, the Servlet 3 spec introduces an asynchronous processing feature. Among other things, this means that the same thread can and will be reused for processing other, concurrent HTTP requests. This isn't revolutionary, at least for people who have worked with NIO before.
Anyway, this leads to another important consequence: no ThreadLocal variables as temporary storage for request data. Because if the same thread suddenly becomes the carrier thread for a different HTTP request, request-local data will be exposed to that other request.
All of that is my pure speculation based on reading articles, I haven't got time to play with any Servlet 3 implementations (Tomcat 7, GlassFish 3.0.X, etc.).
So, the questions:
Am I correct to assume that ThreadLocal will cease to be a convenient hack to keep the request data?
Has anybody played with any of Servlet 3 implementations and tried using ThreadLocals to prove the above?
Apart from storing data inside HTTP Session, are there any other similar easy-to-reach hacks you could possibly advise?
EDIT: don't get me wrong. I completely understand the dangers and ThreadLocal being a hack. In fact, I always advise against using it in similar context. However, believe it or not, thread context has been used far more frequently than you probably imagine. A good example would be Spring's OpenSessionInViewFilter which, according to its Javadoc:
This filter makes Hibernate Sessions available via the current thread, which will be autodetected by transaction managers.
This isn't strictly ThreadLocal (haven't checked the source) but already sounds alarming. I can think of more similar scenarios, and the abundance of web frameworks makes this much more likely.
Briefly speaking, many people have built their sand castles on top of this hack, with or without awareness. Therefore Stephen's answer is understandable but not quite what I'm after. I would like confirmation of whether anyone has actually tried this and was able to reproduce the failing behaviour, so this question can be used as a reference point for others trapped by the same problem.
Async processing shouldn't bother you unless you explicitly ask for it.
For example, a request can't be made async if the servlet, or any filter in the request's filter chain, is not marked with <async-supported>true</async-supported>. Therefore, you can still use regular practices for regular requests.
Of course, if you actually need async processing, you need to use appropriate practices. Basically, when a request is processed asynchronously, its processing is broken into parts. These parts don't share thread-local state; however, you can still use thread-local state inside each of those parts, though you have to manage the state manually between the parts.
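For illustration, a minimal async servlet using the annotation form of that flag (a sketch; names invented):
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: async is opt-in. Without asyncSupported = true here (or
// <async-supported>true</async-supported> in web.xml), startAsync() throws.
@WebServlet(urlPatterns = "/async-demo", asyncSupported = true)
public class AsyncDemoServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        final AsyncContext ctx = req.startAsync();
        ctx.start(new Runnable() {
            public void run() {
                // runs later, possibly on a different thread: ThreadLocal state
                // set in doGet() is not reliably visible here
                ctx.complete();
            }
        });
    }
}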
(Caveat: I've not read the Servlet 3 spec in detail, so I cannot say for sure that the spec says what you think it does. I'm just assuming that it does ...)
Am I correct to assume that ThreadLocal will cease to be a convenient hack to keep the request data?
Using ThreadLocal was always a poor approach, because you always ran the risk that information would leak when a worker thread finished one request and started on another one. Storing stuff as attributes in the ServletRequest object was always a better idea.
Now you've simply got another reason to do it the "right" way.
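For example, something along these lines (a sketch; the attribute name and user lookup are invented):
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Sketch: per-request data travels with the request, not with the thread.
public class CurrentUserFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        req.setAttribute("com.example.currentUser", resolveUser(req)); // hypothetical lookup
        chain.doFilter(req, resp);
        // downstream code reads it back with:
        // User user = (User) req.getAttribute("com.example.currentUser");
    }

    private Object resolveUser(ServletRequest req) {
        return null; // placeholder
    }

    public void init(FilterConfig config) {}
    public void destroy() {}
}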
Has anybody played with any of Servlet 3 implementations and tried using ThreadLocals to prove the above?
That's not the right approach. It only tells you about the particular behaviour of a particular implementation under the particular circumstances of your test. You cannot generalize.
The correct approach is to assume that it will sometimes happen if the spec says it can ... and design your webapp to take account of it.
(Fear not! Apparently, in this case, this does not happen by default. Your webapp has to explicitly enable the async processing feature. If your code is infested with thread locals, you would be advised not to do this ...)
Apart from storing data inside HTTP Session, are there any other similar easy-to-reach hacks you could possibly advise?
Nope. The only right answer is storing request-specific data in the ServletRequest or ServletResponse object. Even storing it in the HTTP Session can be wrong, since there can be multiple requests active at the same time for a given session.
NOTE: Hacks follow. Use with caution, or really just don't use.
So long as you continue to understand which thread your code is executing in, there's no reason you can't use a ThreadLocal safely.
// tl is some ThreadLocal; set it for the duration of the work and always remove
// it afterwards, so nothing leaks to the next request handled by this thread.
try {
    tl.set(value);
    doStuffUsingThreadLocal();
} finally {
    tl.remove();
}
It's not as if your call stack is switched out randomly. Heck, if there are ThreadLocal values you want to set deep in the call stack and then use further out, you can hack that too:
import java.util.HashSet;
import java.util.Set;

public class Nasty {
    static ThreadLocal<Set<ThreadLocal<?>>> cleanMe =
        new ThreadLocal<Set<ThreadLocal<?>>>() {
            protected Set<ThreadLocal<?>> initialValue() {
                return new HashSet<ThreadLocal<?>>();
            }
        };

    static void register(ThreadLocal<?> toClean) {
        cleanMe.get().add(toClean);
    }

    static void cleanup() {
        // remove every ThreadLocal registered for this thread, then forget them
        Set<ThreadLocal<?>> toClean = cleanMe.get();
        for (ThreadLocal<?> tl : toClean) {
            tl.remove();
        }
        toClean.clear();
    }
}
Then you register your ThreadLocals as you set them, and clean up in a finally clause somewhere. This is all shameful wankery that you probably shouldn't do. I'm sorry I wrote it but it's too late :/
I'm still wondering why people use the rotten javax.servlet API to actually implement their servlets. What I do:
I have a base class HttpRequestHandler which has private fields for request and response, a handle() method that can throw Exception, plus some utility methods to get/set parameters, attributes, etc. I rarely need more than 5-10% of the servlet API, so this isn't as much work as it sounds.
In the servlet handler, I create an instance of this class and then forget about the servlet API.
I can extend this handler class and add all the fields and data that I need for the job. No huge parameter lists, no thread local hacking, no worries about concurrency.
I have a utility class for unit tests that creates a HttpRequestHandler with mock implementations of request and response. This way, I don't need a servlet environment to test my code.
This solves all my problems because I can get the DB session and other things in the init() method or I can insert a factory between the servlet and the real handler to do more complex things.
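Roughly along these lines (a simplified sketch; the method names here are invented for illustration):
import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Simplified sketch of the wrapper idea; the real class has more utility methods.
public abstract class HttpRequestHandler {
    private HttpServletRequest request;
    private HttpServletResponse response;

    // called by the servlet (or a factory) once per request
    public void service(HttpServletRequest request, HttpServletResponse response) throws Exception {
        this.request = request;
        this.response = response;
        handle();
    }

    protected abstract void handle() throws Exception;

    protected String param(String name) { return request.getParameter(name); }
    protected void attr(String name, Object value) { request.setAttribute(name, value); }
    protected void write(String text) throws IOException { response.getWriter().write(text); }
}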
You are psychic! (+1 for that)
My aim is ... to get a proof this has stopped working in Servlet 3.0 container
Here is the proof that you were asking for.
Incidentally, it is using the exact same OEMIV filter that you mentioned in your question and, guess what, it breaks async servlet processing!
Edit: Here is another proof.
One solution is not to use ThreadLocal but rather a singleton that contains a static array of the objects you want to make global. This object would contain a "threadName" field that you set. You first set the current thread's name (in doGet, doPost) to some random unique value (like a UUID), then store it as part of the object that contains the data you want stored in the singleton. Then, whenever some part of your code needs to access the data, it simply goes through the array, checks for the object whose threadName matches the thread that is currently running, and retrieves that object. You'll need to add some cleanup code to remove the object from the array when the HTTP request completes.
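A sketch of that registry idea (using a ConcurrentHashMap rather than a raw array, which is the same shape but simpler to show):
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class RequestDataRegistry {
    private static final ConcurrentMap<String, Object> DATA = new ConcurrentHashMap<String, Object>();

    // in doGet/doPost: tag the current thread with a unique name and register the data
    public static void register(Object value) {
        String key = UUID.randomUUID().toString();
        Thread.currentThread().setName(key);
        DATA.put(key, value);
    }

    // anywhere later in the same request: look up by the current thread's name
    public static Object get() {
        return DATA.get(Thread.currentThread().getName());
    }

    // in a finally block when the request completes
    public static void cleanup() {
        DATA.remove(Thread.currentThread().getName());
    }
}
Note that because the lookup is keyed off the current thread's name, this carries the same caveat as ThreadLocal under async processing: the code that reads the data has to run on the thread that was tagged in doGet/doPost, unless the key is passed along explicitly.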