Should heavy-use objects be created on application/session level in Coldfusion?


I'm running ColdFusion 8 / MySQL 5.0.88.
My application's main feature is a search function which, on submit, triggers an AJAX request calling a CFC method. The method assembles the HTML, gzips it, and returns the gzipped HTML as the AJAX response.
This is the gzip part:
<cfscript>
var result="";
var text=createObject("java","java.lang.String").init(arguments[1]);
var dataStream=createObject("java","java.io.ByteArrayOutputStream").init();
var compressDataStream=createObject("java","java.util.zip.GZIPOutputStream").init(dataStream);
compressDataStream.write(text.getBytes());
compressDataStream.finish();
compressDataStream.close();
</cfscript>
I am a little reluctant about using createObject here, especially since this script will be called over and over again by every user.
Question:
Would it increase performance if I created the objects at the application or session level, or at least checked for the existence of each object before re-creating it? What's the best way to handle this?

If your use of objects is like what's in the code snippet in the question, I'd not put anything into any scope longer-lived than request. The reasons being:
The objects you are instantiating are not re-usable (Strings are immutable, and the output streams don't look re-usable either)
Even if they were re-usable, the objects in question aren't thread-safe. They can't be shared between concurrent requests, so application scope isn't appropriate and actually session scope probably isn't safe either as concurrent requests for the same session can easily occur.
The objects you're using there are probably very low overhead to create, so there'd be little benefit to trying to cache them, if you could.
If you have objects that are really resource intensive, then caching and pooling them can make sense (e.g. Database Connections), but it's considerable effort to get right, so you need to be sure that you need it first.
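To make the first and third points concrete, here is a minimal Java sketch of the same gzip step. The names GzipSketch and gzip are made up for illustration, and UTF-8 is an assumption (the question's code uses the platform default charset). Every object is a local variable created per call, so nothing is shared between concurrent requests and there is nothing worth caching.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original application.
class GzipSketch {
    // Compresses the text using objects that live only for this call.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources closes the stream, which also writes the gzip trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Calling GzipSketch.gzip(html) from any number of threads at once is safe, because no state outlives the call.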

Related

Implementing shared data structure (e.g. cache) as a concurrentHashMap in a singleton bean

I am implementing an HTTP API using the Spring MVC framework.
I want to store some data between requests and between sessions. The data needs to be readable and modifiable by multiple requests in completely independent sessions, but it only needs to exist in-memory while the application is running, it does not need to be persisted to a database, and it does not need to be shared between any scaled-up, multi-node, multi-process server backend design, just one per (e.g.) Tomcat instance is completely fine. Consider for example a cache or something logging some short-lived metrics about the application-specific data coming in through the requests.
I am assuming the usual way would be to use an in-memory database or something like Redis.
However, this being my first venture into web stuff and coming from c++ parallel computing personally, this seems like an extremely over-engineered and inefficient solution to me.
Can I not just create a singleton bean containing a ConcurrentHashMap of my required types, inject it as a dependency into my Controller, and be done with it? I never see anyone talk about this anywhere, even though it seems to be the simplest solution by far to me. Is there something about how Spring MVC or Tomcat works that makes this impossible?

Basically, yes. "A singleton ConcurrentHashMap" can be used as a cache.
But, I'd go with something that works like a map but has an API that is specifically tailored to caches. Fortunately, such a thing exists.
Guava is a 'general utilities' project (just a bunch of useful utility classes; lots of them now seem a bit pointless in the sense that java.util and co have these too, but Guava is over 10 years old and those equivalents didn't exist back then), and one of the most useful things it has is a 'Cache' class. It's a Map with bonus features.
I strongly suggest you use it and follow its API designs. It's got a few things that map doesn't have:
You can set up an eviction system; various strategies are available. You can allow k/v pairs to expire X milliseconds after being created, or optionally X milliseconds after the last time they were read. Or simply guarantee that the cache will never exceed some set size, removing the least recently accessed (or written - again, your choice) k/v pair if needed.
The obvious 'get a value' API call isn't .get() like with map, it's a variant where you provide the key as well as a computation function that would calculate the value; the Cache object will just return the cache value if it exists, but if not, it will run the computation, store it in the cache, and return that. Making your life a lot easier, you just call the get method, pass in the key and the computer, and continue, not having to care about whether the computation function is used or not.
You get some control over concurrent calculations too - if 2 threads simultaneously end up wanting the value for key K which isn't in the cache, should both threads just go compute it, or should one thread be paused to wait for the other's calculation? That's also not entirely trivial to write in a ConcurrentHashMap.
Some fairly fancy footwork - weak keying/valuing: You can set things up such that if the key is garbage collected, the k/v pair gets evicted (eventually) too. This is tricky (string keys don't really work here, for example, and sometimes your value refers to your key in which case the existence of the value would mean your key can't be GCed, making this principle worthless - so you need to design your key and value classes carefully), but can be very powerful.
I believe you can also get just the guava cache stuff on its own, but if not - you know where to look: Add guava as a dependency to your project, fire up an instance of CacheBuilder, read the javadocs, and you're off :)
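For comparison, the plain ConcurrentHashMap approach from the question can be sketched with only java.util.concurrent. The class name SimpleCache and its methods are hypothetical, and the Spring wiring (registering one instance as a singleton bean) is left out. computeIfAbsent covers the "return the cached value, or compute and store it" behaviour, but there is no eviction or expiry here, which is the gap a dedicated cache API fills.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical in-memory cache; intended as one shared instance per application.
class SimpleCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();

    // Returns the cached value, computing and storing it on the first request for a key.
    // ConcurrentHashMap applies the loader at most once per absent key.
    V get(K key, Function<? super K, ? extends V> loader) {
        return entries.computeIfAbsent(key, loader);
    }

    // Manual invalidation: entries never expire on their own in this sketch.
    void invalidate(K key) {
        entries.remove(key);
    }

    int size() {
        return entries.size();
    }
}
```

A controller would call something like cache.get(id, this::loadExpensiveValue), where loadExpensiveValue stands for whatever computation is being cached.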

Is it a bad practice to use a ThreadLocal Object for storing web request metadata?

I am working on a j2ee webapp divided in several modules. I have some metadata such as user name and preferences that I would like to access from everywhere in the app, and maybe also gather data similar to logging information but specific to a request and store it in those metadata so that I could optionally send it back as debug information to the user.
Aside from passing a generic context object through every method from the upper presentation classes down to the lower-level DAOs, or using AOP, the only solution that came to mind was using a thread-local "Context" object (very similar to a session, BTW) and adding a filter that binds it on the incoming request and unbinds it on the response.
But such a thing feels a little hacky, since it breaks several patterns and could make things complicated when it comes to testing and debugging. So I wanted to ask: in your experience, is it OK to proceed like this?

ThreadLocal is a hack to make up for bad design and/or architecture. It's a terrible practice:
It's a pool of one or more global variables and global variables in any language are bad practice (there's a whole set of problems associated with global variables - search it on the net)
It may lead to memory leaks in any J2EE container that manages its own threads, if you don't handle it well.
What's even worse practice is to use the ThreadLocal in the various layers.
Data communicated from one layer to another should be passed using Transfer Objects (a standard pattern).
It's hard to think of a good justification for using ThreadLocal. Perhaps if you need to communicate some values between 2 layers that have a third/middle layer between them, and you don't have the means to make changes to that middle layer. But if that's the case, I would look for a better middle layer.
In any case, if you store the values at one specific point in the code and retrieve them at another single point, then it may be excusable; otherwise you just never know what side effects any executing method may have on the values in the ThreadLocal.

Personally I prefer passing a context object, as the fact that the same thread is used for processing is an artifact of the implementation, and you shouldn't rely on such artifacts. The moment you want to use other threads, you'll hit a wall.
If those states are encapsulated in a Context object, I think that's clean enough.
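A minimal sketch of the context-object approach, with hypothetical names throughout (RequestContext, GreetingService, GreetingDao). The context is created once per request and handed down explicitly, so nothing depends on which thread happens to run the code. The note list is a plain ArrayList, which assumes one thread works on a request at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-request context: user metadata plus collected debug notes.
final class RequestContext {
    private final String userName;
    private final List<String> debugNotes = new ArrayList<>();

    RequestContext(String userName) { this.userName = userName; }

    String userName() { return userName; }
    void note(String message) { debugNotes.add(message); }
    List<String> debugNotes() { return debugNotes; }
}

// Lower layer: receives the context as an ordinary parameter.
class GreetingDao {
    String loadGreeting(RequestContext ctx) {
        ctx.note("dao: loaded greeting for " + ctx.userName());
        return "Hello, " + ctx.userName();
    }
}

// Upper layer: passes the same context object down.
class GreetingService {
    private final GreetingDao dao = new GreetingDao();

    String greet(RequestContext ctx) {
        ctx.note("service: greet called");
        return dao.loadGreeting(ctx);
    }
}
```

In a test, the context is just another argument: build a RequestContext by hand, call the service, and inspect debugNotes() afterwards.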

When it comes to testing, the best tool is dependency injection. It lets you inject fake dependencies into the object under test.
And all dependency injection frameworks (Spring, CDI, Guice) have the concept of a scope (where request is one of these scopes). Under the hood, beans stored in the request scope are indeed associated with a ThreadLocal variable, but this is all done by the dependency injection framework.
What I would do is thus to use a DI framework, which would make request-scope objects available anywhere, but without having to look them up, which would break testability. Just inject a request-scoped object where you want to use it, and the DI framework will retrieve it for you.

You must know that a servlet container can (and will) re-use threads for requests, so if you do use ThreadLocals, you'll need to clean up after yourself once the request is finished (perhaps using a filter).
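The thread re-use problem can be reproduced without a servlet container. In this sketch a single-thread executor stands in for the container's worker pool; the class name ThreadReuseDemo and the value "alice" are invented for illustration. The try/finally in the first task plays the role that a filter would play in a real webapp.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a one-thread pool stands in for a container's worker threads.
class ThreadReuseDemo {
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two "requests" on the same pooled thread and returns what the second one reads.
    static String secondRequestSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // First request: binds a value and optionally removes it in finally.
            Runnable firstRequest = () -> {
                try {
                    CURRENT_USER.set("alice");
                } finally {
                    if (cleanUp) CURRENT_USER.remove();
                }
            };
            pool.submit(firstRequest).get();

            // Second request: never sets the value, only reads it.
            Callable<String> secondRequest = CURRENT_USER::get;
            return pool.submit(secondRequest).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the cleanup, the second request reads the first request's value; with it, the second request reads null.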

If you are the only developer in the project and you think you gain something: just do it! Because it is your time. But, be prepared to revert the decision and reorganize the code base later, as should be always the case.
Let's say there are ten developers on the project. Everybody might like to have their own thread-local variable to pass on parameters like currency, locale, roles; maybe it even becomes a HashMap...
I think in the end, not everything that is feasible should be done. Complexity will strike back at you...

ThreadLocal can lead to a memory leak if we do not set it to null manually once it's out of scope.

Alternative to session variable: is it serialization?

This is a question I always wanted to ask. We always read that it is better to use request object when we have to carry data from one page to other. Now let's say I have about 10 different data items that I need on 4-5 pages. Is it better to use a session variable, or is there an alternative to that? In my app I have about 10 menus where each menu performs different operations. In each such menu I have such different data which are not common between menus. Which is the best way to handle this?

For that kind of problem (navigation) I prefer a stateless approach, that is, passing the info in the URL or request body. A stateful approach is harder in the end, less scalable, consumes more memory for each user and, like any other global variable, has to be handled very carefully.
Remember that HTTP is a stateless protocol, so you should prefer a stateless design. The stateful approach is just a kind of trick that both sides (client and server) use to achieve the magic you know as session variables.
Send the required info in the request!

Servlet 3 spec and ThreadLocal

As far as I know, Servlet 3 spec introduces asynchronous processing feature. Among other things, this will mean that the same thread can and will be reused for processing another, concurrent, HTTP request(s). This isn't revolutionary, at least for people who worked with NIO before.
Anyway, this leads to another important thing: no ThreadLocal variables as a temporary storage for the request data. Because if the same thread suddenly becomes the carrier thread to a different HTTP request, request-local data will be exposed to another request.
All of that is my pure speculation based on reading articles, I haven't got time to play with any Servlet 3 implementations (Tomcat 7, GlassFish 3.0.X, etc.).
So, the questions:
Am I correct to assume that ThreadLocal will cease to be a convenient hack to keep the request data?
Has anybody played with any of Servlet 3 implementations and tried using ThreadLocals to prove the above?
Apart from storing data inside HTTP Session, are there any other similar easy-to-reach hacks you could possibly advise?
EDIT: don't get me wrong. I completely understand the dangers and ThreadLocal being a hack. In fact, I always advise against using it in similar context. However, believe it or not, thread context has been used far more frequently than you probably imagine. A good example would be Spring's OpenSessionInViewFilter which, according to its Javadoc:
This filter makes Hibernate Sessions available via the current thread, which will be autodetected by transaction managers.
This isn't strictly ThreadLocal (haven't checked the source) but already sounds alarming. I can think of more similar scenarios, and the abundance of web frameworks makes this much more likely.
Briefly speaking, many people have built their sand castles on top of this hack, with or without awareness. Therefore Stephen's answer is understandable but not quite what I'm after. I would like to get a confirmation whether anyone has actually tried and was able to reproduce failing behaviour so this question could be used as a reference point to others trapped by the same problem.

Async processing shouldn't bother you unless you explicitly ask for it.
For example, a request can't be made async unless the servlet and every filter in the request's filter chain are marked with <async-supported>true</async-supported>. Therefore, you can still use regular practices for regular requests.
Of course, if you actually need async processing, you need to use appropriate practices. Basically, when a request is processed asynchronously, its processing is broken into parts. These parts don't share thread-local state. However, you can still use thread-local state inside each of those parts, though you have to manage the state manually between the parts.
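The "manage the state manually between the parts" step can be imitated with plain executors. This is only a sketch of the threading behaviour, not of the Servlet 3 API: the two single-thread pools stand in for the threads that run each part, and the names AsyncPartsDemo, REQUEST_ID and "req-1" are invented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: each pool represents the thread that runs one part of a request.
class AsyncPartsDemo {
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    // Returns { value part 2 reads before re-binding, value it reads after re-binding }.
    static String[] run() {
        ExecutorService partOne = Executors.newSingleThreadExecutor();
        ExecutorService partTwo = Executors.newSingleThreadExecutor();
        try {
            // Part 1 uses thread-local state and hands the value back explicitly.
            Callable<String> first = () -> {
                REQUEST_ID.set("req-1");
                try {
                    return REQUEST_ID.get();
                } finally {
                    REQUEST_ID.remove();
                }
            };
            String carried = partOne.submit(first).get();

            // Part 2 runs on a different thread, so it must re-bind the carried value itself.
            Callable<String[]> second = () -> {
                String seenBeforeRebind = REQUEST_ID.get();
                REQUEST_ID.set(carried);
                try {
                    return new String[] { seenBeforeRebind, REQUEST_ID.get() };
                } finally {
                    REQUEST_ID.remove();
                }
            };
            return partTwo.submit(second).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            partOne.shutdown();
            partTwo.shutdown();
        }
    }
}
```

The first element of the result is null, because the second thread never saw the first thread's value; the second element is the value that was carried over by hand.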

(Caveat: I've not read the Servlet 3 spec in detail, so I cannot say for sure that the spec says what you think it does. I'm just assuming that it does ...)
Am I correct to assume that ThreadLocal will cease to be a convenient hack to keep the request data?
Using ThreadLocal was always a poor approach, because you always ran the risk that information would leak when a worker thread finished one request and started on another one. Storing stuff as attributes in the ServletRequest object was always a better idea.
Now you've simply got another reason to do it the "right" way.
Has anybody played with any of Servlet 3 implementations and tried using ThreadLocals to prove the above?
That's not the right approach. It only tells you about the particular behaviour of a particular implementation under the particular circumstances of your test. You cannot generalize.
The correct approach is to assume that it will sometimes happen if the spec says it can ... and design your webapp to take account of it.
(Fear not! Apparently, in this case, this does not happen by default. Your webapp has to explicitly enable the async processing feature. If your code is infested with thread locals, you would be advised not to do this ...)
Apart from storing data inside HTTP Session, are there any other similar easy-to-reach hacks you could possibly advise?
Nope. The only right answer is storing request-specific data in the ServletRequest or ServletResponse object. Even storing it in the HTTP Session can be wrong, since there can be multiple requests active at the same time for a given session.

NOTE: Hacks follow. Use with caution, or really just don't use.
So long as you continue to understand which thread your code is executing in, there's no reason you can't use a ThreadLocal safely.
try {
    tl.set(value);
    doStuffUsingThreadLocal();
} finally {
    tl.remove();
}
It's not as if your call stack is switched out randomly. Heck, if there are ThreadLocal values you want to set deep in the call stack and then use further out, you can hack that too:
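import java.util.HashSet;
import java.util.Set;

public class Nasty {
    // Per-thread registry of the ThreadLocals that were set on this thread.
    static ThreadLocal<Set<ThreadLocal<?>>> cleanMe =
        new ThreadLocal<Set<ThreadLocal<?>>>() {
            protected Set<ThreadLocal<?>> initialValue() {
                return new HashSet<ThreadLocal<?>>();
            }
        };

    static void register(ThreadLocal<?> toClean) {
        cleanMe.get().add(toClean);
    }

    // Removes every ThreadLocal registered on the current thread, then empties the registry.
    static void cleanup() {
        Set<ThreadLocal<?>> registered = cleanMe.get();
        for (ThreadLocal<?> tl : registered)
            tl.remove();
        registered.clear();
    }
}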
Then you register your ThreadLocals as you set them, and clean up in a finally clause somewhere. This is all shameful wankery that you probably shouldn't do. I'm sorry I wrote it but it's too late :/

I'm still wondering why people use the rotten javax.servlet API to actually implement their servlets. What I do:
I have a base class HttpRequestHandler which has private fields for request, response and a handle() method that can throw Exception plus some utility methods to get/set parameters, attributes, etc. I rarely need more than 5-10% of the servlet API, so this isn't as much work as it sounds.
In the servlet handler, I create an instance of this class and then forget about the servlet API.
I can extend this handler class and add all the fields and data that I need for the job. No huge parameter lists, no thread local hacking, no worries about concurrency.
I have a utility class for unit tests that creates a HttpRequestHandler with mock implementations of request and response. This way, I don't need a servlet environment to test my code.
This solves all my problems because I can get the DB session and other things in the init() method or I can insert a factory between the servlet and the real handler to do more complex things.

You are psychic ! (+1 for that)
My aim is ... to get a proof this has stopped working in Servlet 3.0 container
Here is the proof that you were asking for.
Incidentally, it is using the exact same OEMIV filter that you mentioned in your question and, guess what, it breaks Async servlet processing !
Edit: Here is another proof.

One solution is to not use ThreadLocal but rather use a singleton that contains a static array of the objects you want to make global. Each object would contain a "threadName" field that you set. You first set the current thread's name (in doGet, doPost) to some random unique value (like a UUID), then store it as part of the object that contains the data you want stored in the singleton. Then whenever some part of your code needs to access the data, it simply goes through the array, checks for the object with the threadName of the currently running thread, and retrieves it. You'll need to add some cleanup code to remove the object from the array when the HTTP request completes.
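A rough sketch of that idea follows. It substitutes a ConcurrentHashMap keyed by thread name for the static array described above, purely to keep the lookup short and safe under concurrent access. The class and method names (ThreadNameRegistry, bind, current, unbind) are hypothetical. The same caveats as for ThreadLocal apply: the entry must be removed when the request completes, and the lookup only works while the request stays on the same thread.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry keyed by the current thread's name.
class ThreadNameRegistry {
    private static final Map<String, Object> DATA = new ConcurrentHashMap<>();

    // Call at the start of request handling: renames the thread and stores the data.
    static void bind(Object requestData) {
        String id = UUID.randomUUID().toString();
        Thread.currentThread().setName(id);
        DATA.put(id, requestData);
    }

    // Looks up the data for whichever thread is running the caller.
    static Object current() {
        return DATA.get(Thread.currentThread().getName());
    }

    // Call when the request completes; otherwise the entry stays in the map.
    static void unbind() {
        DATA.remove(Thread.currentThread().getName());
    }
}
```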

Java Filters Performance Question

I have two questions. The first is: do filters add a lot of overhead to a request? We have a filter that is set to run on the URL pattern /*. This means it also runs on all the image requests. I think that this is not good for performance, but my co-workers think that it doesn't matter if the filter runs 5 or 6 times per request, because the filter only has a couple of if statements.
The second: is there a way to have the filter run once per request, ignoring the image requests?
Thanks Doug

Measuring is knowing. If well-written, I'd say it's negligible. But if it's, for example, grabbing the session regardless of whether it's been created (and thus there's a chance that it will unnecessarily be created), then it may have a noticeable impact on performance and/or memory usage, because creating sessions isn't cheap per se and sessions are stored in the server's memory for a longer term than requests.
You may want to replace the url-pattern of /* by *.jsp or to move the restricted pages to a specific folder, e.g. /secured, /private, /pages, etc and alter the url-pattern accordingly to /secured/*, /private/*, /pages/*, etc and put all the static content in a different place, e.g. /static. This way the filter won't be invoked for static content anymore.

First, I agree with the Profile-first approach.
Second, as far as I know, it depends. Web servers use the same technique to invoke a specific servlet (or JSP) as they use for filters.
If the filter is filtering a static resource (e.g. a jpg file), it's a bit of a waste.
If the filter is filtering a dynamic resource (e.g. a servlet), it's negligible.
(Most Java web frameworks, like Struts and JBoss Seam, use filters heavily.)

It is almost never useful to speculate about the performance implications of code without first profiling it. Unless the code being proposed in the filters is doing some operations you know to be slow, measure first before optimising.
Remember that even though, when you are writing a servlet, it may seem like the only thing that happens is the code in your doGet() or doPost() methods, a lot of other things happen before your servlet/filter code gets invoked. The servlet container processes the HTTP request, bundles it up in Java objects, and does all sorts of other processing before it hands over to your code.
If your servlet filters really are only a couple of if statements operating on data that is cheap to get (such as the request itself), it is unlikely this is going to be an issue for you.
