Is it bad practice to utilize many threads? (through SwingWorkers) - java

My Java (Swing) application creates a new SwingWorker object whenever it needs to, e.g., download data from the Internet while doing something else at the same time (think: displaying a loader). However, when I monitor the threads that get created, their number can quickly reach ~100.
Is this bad practice? If so, what's the proper way to do it? Doesn't the GC automatically clean up unused threads?

Yes, it is bad practice when you put no upper bound on the number of threads (or on resources in general).
In this case you had better use a thread pool that contains at most a fixed number of threads (say, 25). You can either create them all at startup or create them lazily on demand.
Implement a simple request manager for the pool that hands the resources out to requesters (and, when it runs out of resources, either queues the requests or simply denies them).
In this way, cleaning them in the end will also be easy and obvious.
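A minimal sketch of such a bounded pool, using only java.util.concurrent. The class name BoundedWorkerPool and the limits are illustrative assumptions, not from the answer: at most maxThreads threads ever exist, up to maxQueued further requests wait in a queue, and anything beyond that is denied.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedWorkerPool {
    private final ThreadPoolExecutor executor;

    public BoundedWorkerPool(int maxThreads, int maxQueued) {
        executor = new ThreadPoolExecutor(
                maxThreads, maxThreads,                       // never more than maxThreads threads
                30, TimeUnit.SECONDS,                         // idle threads are reclaimed after 30 s
                new ArrayBlockingQueue<Runnable>(maxQueued),  // extra requests wait here
                new ThreadPoolExecutor.AbortPolicy());        // beyond that, requests are denied
        executor.allowCoreThreadTimeOut(true); // threads start lazily and die when idle
    }

    /** Returns false if the request was denied because both the pool and the queue are full. */
    public boolean submit(Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Highest number of threads that ever existed at the same time. */
    public int largestPoolSize() {
        return executor.getLargestPoolSize();
    }

    /** Cleanup is a single call: stop accepting work and wait for running tasks. */
    public void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Since javax.swing.SwingWorker implements RunnableFuture, a worker could in principle be passed to submit(worker) instead of calling worker.execute(); treat that as a suggestion to verify in your own application.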

Related

Assign a few connections from a connection pool to each specific task

I have a connection pool with 50 connections.
I want to dedicate 10 of these to task A and 20 to task B.
Is this good practice? Is this possible in Java when creating connection pools or thread pools?
Sharing resources between different participants is a good practice.
It is also a well-established practice not to do that blindly, but in the context of priorities/goals/policies. Very often that happens at a higher level (think of load balancing), but of course you could also build it into your application directly.
But to my knowledge, there is no simple mechanism in the standard Java libraries to do that.
Long story short: if you want something like this, you might have to step back and implement your own solution. In other words, you create your own connection pool that knows about the "different" tasks and lets you give them priorities; the pool then decides, based on those policies, who will be served next.
On the downside, implementing something like this can get complicated pretty quickly. Thus my first advice: go for two independent pools, experiment, and see how things work out for you. Only when you find that this solution is too inefficient should you start building your own "load balancing"!
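A rough sketch of the "two independent pools" suggestion, assuming a hypothetical SimplePool class and using plain String placeholders where real code would create connections (for example through a DataSource). Each pool is filled eagerly with its own fixed number of resources, so task A can never take more than its 10 and never touches task B's 20.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class SimplePool<T> {
    private final BlockingQueue<T> idle;

    public SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create every resource up front
        }
    }

    /** Borrows a resource, waiting up to timeoutMs; returns null if none became free. */
    public T borrow(long timeoutMs) throws InterruptedException {
        return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Returns a borrowed resource; throws IllegalStateException if the pool is already full. */
    public void giveBack(T resource) {
        idle.add(resource);
    }

    public int available() {
        return idle.size();
    }
}
```

Usage would be along the lines of new SimplePool<>(10, factoryForA) and new SimplePool<>(20, factoryForB); only if this proves too wasteful is a shared, priority-aware pool worth the extra complexity.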

Should I use ThreadLocal in this high traffic multi-threaded scenario?

Scenario
We are developing an API that will handle around 2-3 million hits per hour in a multi-threaded environment. The server is Apache Tomcat 7.0.64.
We have a custom object with a lot of data; let's call it XYZDataContext. When a new request comes in, we associate an XYZDataContext object with the request context: one XYZDataContext object per request. We will be spawning various threads in parallel to serve that request and collect/process data from/into the XYZDataContext object. The threads that process things in parallel need access to this XYZDataContext object, and to avoid passing this object around everywhere in the application (to various objects/methods/threads), we are thinking of making it a ThreadLocal. Threads will use data from the XYZDataContext object and will also update data in it.
When a thread finishes, we plan to merge the data from the updated XYZDataContext object in the spawned child thread into the main thread's XYZDataContext object.
My questions:
Is this a good approach?
Threadpool risks: Tomcat maintains a thread pool, and I read that using ThreadLocal with thread pools is a disaster, because a thread is not GCed per se but reused, so the references to the ThreadLocal objects will not get GCed. This results in keeping huge objects we no longer need in memory, eventually leading to OutOfMemory issues...
UNLESS they are referenced as weak references, so that they get GCed immediately.
We're using Java 1.7 (OpenJDK). I looked at the source code for ThreadLocal, and although ThreadLocalMap.Entry is a WeakReference, it is not associated with a ReferenceQueue, and the comment on the Entry constructor says "since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space."
I guess this works great for caches but is not the best thing in our case. I would like the ThreadLocal XYZDataContext object to be GCed immediately. Will the ThreadLocal.remove() method be effective here?
Is there any way to force the space to be freed in the next GC run?
Is this the right scenario for ThreadLocal objects? Or are we abusing the ThreadLocal concept and using it where it shouldn't be used?
My gut feeling tells me you're on the wrong path. Since you already have a central context object (one for all threads) and you want to access it from multiple threads at the same time I would go with a Singleton hosting the context object and providing threadsafe methods to access it.
Instead of manipulating multiple properties of your context object separately, I would strongly suggest doing all manipulations at once. Best would be to pass a single object containing all the properties you want to change in your context object.
For example:
Singleton.getInstance().adjustContext(ContextAdjuster contextAdjuster)
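A minimal sketch of that shape, assuming the context is a plain Map<String, Object> and that ContextAdjuster is a hypothetical one-method interface; only the getInstance/adjustContext names come from the answer. Because the whole batch of changes runs inside one synchronized method, other threads never observe a half-applied update.

```java
import java.util.HashMap;
import java.util.Map;

public final class ContextSingleton {
    /** Hypothetical callback carrying all the changes one thread wants to make at once. */
    public interface ContextAdjuster {
        void adjust(Map<String, Object> data);
    }

    private static final ContextSingleton INSTANCE = new ContextSingleton();
    private final Map<String, Object> data = new HashMap<>(); // guarded by "this"

    private ContextSingleton() { }

    public static ContextSingleton getInstance() {
        return INSTANCE;
    }

    /** Applies a whole batch of changes atomically with respect to other callers. */
    public synchronized void adjustContext(ContextAdjuster contextAdjuster) {
        contextAdjuster.adjust(data);
    }

    /** Reads under the same lock, so earlier adjustments are visible. */
    public synchronized Object get(String key) {
        return data.get(key);
    }
}
```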
You might also want to consider using a threadsafe queue, filling it up with ContextAdjuster objects from your threads and finally processing it in the Context's thread.
Google for things like Concurrent, Blocking and Nonblocking Queue in Java. I am sure you'll find tons of example code.
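One way the queue variant could look, assuming LinkedBlockingQueue as the thread-safe queue and java.util.function.Consumer standing in for the ContextAdjuster objects (QueuedContext is a hypothetical name). Worker threads only enqueue changes; the thread that owns the context applies them later, so the map itself needs no locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class QueuedContext {
    private final BlockingQueue<Consumer<Map<String, Object>>> pending = new LinkedBlockingQueue<>();
    private final Map<String, Object> data = new HashMap<>(); // touched only by the owning thread

    /** Called from any worker thread: enqueue the change, never touch data directly. */
    public void submit(Consumer<Map<String, Object>> adjuster) {
        pending.add(adjuster);
    }

    /** Called only from the context's own thread, e.g. after the workers have finished. */
    public void applyPending() {
        Consumer<Map<String, Object>> next;
        while ((next = pending.poll()) != null) {
            next.accept(data);
        }
    }

    /** Owning thread only. */
    public Object get(String key) {
        return data.get(key);
    }
}
```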

Pool of threads tied to resources on the server

I have a Java servlet that operates with a heavy-weight and thread-unsafe resource to handle user requests. The resource is an object that needs a long time to be instantiated (up to 10 seconds) and takes a large amount of memory.
But when the object is allocated, it takes a short time to run its method I need to process a request.
There can be several such resources, different from each other.
Each request comes with an ID, which identifies a particular resource.
I wish to implement a pool of such resources, so that requests with the same IDs will not instantiate a new object, but will pick one from the pool.
The scheme is as follows:
after the request has been received, the servlet checks whether a resource with the requested ID is in the pool;
if not, the servlet creates one and provides it;
if the resource is already instantiated, the request goes into a queue to be executed, and doPost waits for it.
The operation over different resources must be concurrent, but synchronized within the same resource.
I am new to multithreading in Java, and ThreadPoolExecutor does not seem to be usable as is, so I would appreciate advice on how to implement the scheme described above. Thanks.
You are correct - ThreadPoolExecutor is not what you want. It is simply a pool of threads to run tasks with, not a shared resource collection.
What you want is a cache. It needs to create a resource and return it to requesting threads to use, and reuse the things it returned previously. Also, the resource returned must be thread-safe (So if your underlying resources are not, you may need to write synchronized wrappers for them).
There are a number of thread-safe caches around, quite a few of them open source. Try those out; it shouldn't be too difficult to configure them for your use case (it seems fairly typical).
It is possible, and not too difficult, to implement a makeshift cache of your own, but you're far better off using a third-party solution if you are new to multithreading.
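If you do go the makeshift route, here is a sketch of a per-ID cache matching the scheme in the question, assuming a hypothetical ResourceCache class and a factory function for the slow constructor. Different IDs run concurrently, requests for the same ID are serialized on that ID's lock (in effect a synchronized wrapper), and each resource is created only once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

public class ResourceCache<R> {
    /** Holder for one ID's resource; its monitor serializes all work on that resource. */
    private static final class Slot<V> {
        V resource; // created lazily, guarded by synchronized (this slot)
    }

    private final ConcurrentMap<String, Slot<R>> slots = new ConcurrentHashMap<>();
    private final Function<String, R> factory; // e.g. the slow, memory-hungry constructor

    public ResourceCache(Function<String, R> factory) {
        this.factory = factory;
    }

    /** Runs job against the resource for id, creating the resource on first use. */
    public <T> T withResource(String id, Function<? super R, ? extends T> job) {
        // Cheap: only allocates an empty slot, so slow construction never blocks the map itself.
        Slot<R> slot = slots.computeIfAbsent(id, k -> new Slot<>());
        synchronized (slot) { // same id: callers wait here; different ids: run in parallel
            if (slot.resource == null) {
                slot.resource = factory.apply(id); // only the first caller pays the creation cost
            }
            return job.apply(slot.resource);
        }
    }
}
```

Two limitations worth knowing: waiting callers are not guaranteed to proceed in FIFO order, and nothing is ever evicted, so memory grows with the number of distinct IDs. A ready-made cache library typically handles eviction for you.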

Use threads as "sessions"

I am developing a text-based game, a MUD. I have the base functions of the program ready, and now I would like to allow more than one client to connect at a time. I plan to use threads to accomplish that.
In my game I need to store information such as the current position or health points of each player. I could hold it in the database, but as it will change very quickly, sometimes every second, using the database would be inefficient (am I right?).
My question is: can threads behave as "sessions", i.e., hold some data unique to each user?
If yes, could you direct me to some resources that I could use to help me understand how it works?
If no, what do you suggest? Is database a good option or would you recommend something else?
Cheers,
Eleeist
Yes, they can, but this is a mind-bogglingly stupid way to do things. For one thing, it permanently locks you into a "one thread per client" model. For another thing, it makes it difficult (maybe even impossible) to implement interactions between users, which I'm sure your MUD has.
Instead, have a collection of some kind that stores your users, with data on each user. Save persistent data to the database, but you don't need to update ephemeral data on every change.
One way to handle this is to have a "changed" boolean in each user. When you make a critical change to a user, write them to the database immediately. But if it's a routine, non-critical change, just set the "changed" flag. Then have a thread come along every once in a while and write out changed users to the database (and clear the "changed" flag).
Use appropriate synchronization, of course!
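A sketch of the "changed flag" approach, assuming a hypothetical PlayerRegistry, a Player with only HP and a room number, and a saveToDb callback standing in for real database code. Damage is treated as a critical change and saved at once; moving is routine and only sets the flag, which a scheduled background thread flushes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class PlayerRegistry {
    /** Per-player state; every mutable field is guarded by the Player's own lock. */
    public static final class Player {
        public final String name;
        private int hp;
        private int room;
        private boolean changed;

        Player(String name, int hp) { this.name = name; this.hp = hp; }

        public synchronized int hp() { return hp; }
        public synchronized int room() { return room; }
        synchronized void adjustHp(int delta) { hp += delta; }
        synchronized void moveTo(int newRoom) { room = newRoom; changed = true; }
        /** Returns whether there was an unsaved change, clearing the flag. */
        synchronized boolean takeChanged() { boolean c = changed; changed = false; return c; }
    }

    private final Map<String, Player> players = new ConcurrentHashMap<>();
    private final Consumer<Player> saveToDb; // hypothetical persistence hook, e.g. a JDBC UPDATE

    public PlayerRegistry(Consumer<Player> saveToDb) { this.saveToDb = saveToDb; }

    public Player join(String name) {
        return players.computeIfAbsent(name, n -> new Player(n, 100));
    }

    /** Critical change: persisted immediately. */
    public void damage(String name, int amount) {
        Player p = players.get(name);
        p.adjustHp(-amount);
        saveToDb.accept(p);
    }

    /** Routine change: only flagged; the periodic flush writes it out later. */
    public void move(String name, int room) { players.get(name).moveTo(room); }

    /** Saves every flagged player and clears the flags; returns how many were saved. */
    public int flushChanged() {
        int saved = 0;
        for (Player p : players.values()) {
            if (p.takeChanged()) { saveToDb.accept(p); saved++; }
        }
        return saved;
    }

    /** Runs flushChanged() every periodSeconds on a single background thread. */
    public ScheduledExecutorService startFlusher(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::flushChanged, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```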
A Thread per connection / user session won't scale. Only N threads can actually run at the same time, where N is the number of physical cores / processors your machine has. The amount of memory in your machine also limits how many threads you can create at a time, and some operating systems impose arbitrary limits as well.
There is nothing magical about Threads in handling multiple clients. They will just make your code more complicated and less deterministic and thus harder to reason about what is actually happening when you start hunting logic errors.
A Thread per connection / user session would be an anti-pattern!
Threads should be stateless workers that pull things off concurrent queues and process the data.
Look at concurrent maps for caching (or use an appropriate caching solution), process the data, and then move on to something else. See java.util.concurrent for all the primitive classes you need to implement this correctly.
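A minimal sketch of stateless workers, assuming a hypothetical Command type and a per-player command counter as the only "session" state. The worker threads hold nothing themselves: they take commands from a LinkedBlockingQueue and update a ConcurrentHashMap keyed by player. Shutdown uses one "poison pill" per worker, queued behind all real commands.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class StatelessWorkers {
    /** Hypothetical client command: who sent it and what it says. */
    public static final class Command {
        final String player;
        final String text;
        public Command(String player, String text) { this.player = player; this.text = text; }
    }

    private static final Command POISON = new Command("", ""); // tells a worker to exit
    private final BlockingQueue<Command> inbox = new LinkedBlockingQueue<>();
    private final Map<String, Integer> commandsSeen = new ConcurrentHashMap<>(); // per-player state
    private final Thread[] workers;

    public StatelessWorkers(int n) {
        workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(this::workLoop, "worker-" + i);
            workers[i].start();
        }
    }

    /** Called by whatever code reads from the client connections. */
    public void submit(Command c) { inbox.add(c); }

    private void workLoop() {
        try {
            while (true) {
                Command c = inbox.take(); // block until there is work
                if (c == POISON) return;
                commandsSeen.merge(c.player, 1, Integer::sum); // atomic per-key update
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets queued commands drain, then stops every worker. */
    public void shutdown() throws InterruptedException {
        for (int i = 0; i < workers.length; i++) inbox.add(POISON); // one pill per worker
        for (Thread t : workers) t.join();
    }

    public int seen(String player) { return commandsSeen.getOrDefault(player, 0); }
}
```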
Instead of worrying about threads and thread-safety, I'd use an in-memory SQL database like HSQLDB to store session information. Among other benefits, if your MUD turns out to be the next Angry Birds, you could more easily scale the thing up.
Definitely you can use threads as sessions. But it's a bit off the mark.
The main point of threads is concurrent, asynchronous execution. Most probably, you don't want events received from your MUD clients to happen in a parallel, uncontrolled order.
To ensure consistency of the world I'd use an in-memory database to store the game world. I'd serialize updates to it, or at least some updates to it. Imagine two players in parallel hitting a monster with HP 100. Each deals 100 damage. If you don't serialize the updates, you could end up giving credit for 100 damage to both players. Imagine two players simultaneously taking loot from the monster. Without proper serialization they could end up each with their own copy of the loot.
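To make the two examples concrete, here is a sketch that serializes the updates in plain Java rather than through an in-memory database (a swapped-in technique; Monster is a hypothetical class). hit() is synchronized, so at most 100 damage in total can ever be credited, and AtomicReference.getAndSet(null) hands the loot to exactly one caller.

```java
import java.util.concurrent.atomic.AtomicReference;

public class Monster {
    private int hp; // guarded by "this"
    private final AtomicReference<String> loot;

    public Monster(int hp, String loot) {
        this.hp = hp;
        this.loot = new AtomicReference<>(loot);
    }

    /** Serialized damage: returns how much damage this hit is actually credited with. */
    public synchronized int hit(int damage) {
        int dealt = Math.min(damage, hp); // a dead monster (hp 0) credits nothing
        hp -= dealt;
        return dealt;
    }

    /** Only the first caller gets the loot; everyone after that gets null. */
    public String takeLoot() {
        return loot.getAndSet(null);
    }
}
```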
Threads, on the other hand, are good for asynchronous communication with clients. Use threads for that, unless something else (like a web server) does that for you already.
ThreadLocal is your friend! :)
http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
ThreadLocal provides storage on the Thread itself. So the exact same call from 2 different threads will return/store different data.
The biggest danger is data leaking between users that share a Thread. You would have to be absolutely sure that, whenever a Thread previously used by one user is reused for a different user, you reset/clear the data.
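A sketch of that set/clear discipline, assuming a hypothetical SessionContext holder that stores only the current user's name. The try/finally guarantees ThreadLocal.remove() runs even if the work throws, so a pooled thread never carries one user's data into the next request.

```java
public final class SessionContext {
    // Hypothetical per-user data, stored on the current thread only.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    private SessionContext() { }

    public static String currentUser() {
        return CURRENT_USER.get();
    }

    /** Runs work "as" a user and always clears the slot afterwards. */
    public static void runAs(String user, Runnable work) {
        CURRENT_USER.set(user);
        try {
            work.run();
        } finally {
            CURRENT_USER.remove(); // without this, a reused thread would still report this user
        }
    }
}
```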

Use of Stanford Parser in Web Service

I need to use the Stanford Parser in a web service. As SentenceParser loads a big object, I will make sure it is a singleton, but in that case, is it thread-safe? (No, according to http://nlp.stanford.edu/software/parser-faq.shtml.) How else could it be done efficiently? One option is locking the object while it is being used.
Any idea how the people at Stanford are doing this for http://nlp.stanford.edu:8080/parser/ ?
If contention is not a factor, locking (synchronization) would be one option, as you mentioned, and it might be good enough.
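A minimal sketch of the locking option, assuming a hypothetical generic SharedParser wrapper; P stands for whatever parser class you use, and no Stanford API is assumed. Every call goes through one synchronized method, so the single expensive instance is used by only one thread at a time.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public final class SharedParser<P> {
    private final P parser; // the one expensive, non-thread-safe instance

    public SharedParser(Supplier<P> factory) {
        parser = factory.get();
    }

    /** Callers are serialized on this object's lock; job must not leak the parser. */
    public synchronized <R> R use(Function<P, R> job) {
        return job.apply(parser);
    }
}
```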
If there is contention, however, I see three general options.
(1) instantiating it every time
Just instantiate it as a local variable every time you perform parsing. Local variables are trivially safe. The instantiation is not free of course, but it may be acceptable depending on the specific situation.
(2) using threadlocals
If instantiation turns out to be costly, consider using threadlocals. Each thread would retain its own copy of the parser, and the parser instance would be reused on a given thread. Threadlocals are not without problems, however. First, a thread-local value may not be garbage collected until it is removed (or set to null) or the holding thread goes away, so there is a memory concern if there are too many of them. Second, beware of reuse: if these parsers are stateful, you need to make sure to clean up and restore the initial state so that subsequent use of the thread-local instance does not suffer side effects from the previous use.
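A sketch of option (2), assuming a hypothetical PerThreadParser wrapper built on ThreadLocal.withInitial (Java 8+); the factory is whatever creates your parser. release() calls ThreadLocal.remove(), which addresses the garbage-collection concern; resetting a stateful parser between uses is not handled here.

```java
import java.util.function.Supplier;

public final class PerThreadParser<P> {
    private final ThreadLocal<P> local;

    public PerThreadParser(Supplier<P> factory) {
        // The first get() on each thread creates that thread's own instance.
        local = ThreadLocal.withInitial(factory);
    }

    /** This thread's parser, reused on every call from the same thread. */
    public P get() {
        return local.get();
    }

    /** Drops this thread's instance so it can be garbage collected. */
    public void release() {
        local.remove();
    }
}
```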
(3) pooling
Pooling is in general no longer recommended, but if the object sizes are truly large so that you need to have a hard limit on the number of instances you can allow, then using an object pool might be the best option.
I don't know how the people at Stanford have implemented their service, but I would build such a service on a messaging framework such as http://www.rabbitmq.com/. Your front-end service would receive documents and use a message queue to communicate (store documents and retrieve results) with several workers that execute the NLP parsing. After finishing processing, the workers store the results into a queue that is consumed by the front-end service. This architecture lets you dynamically add new workers under high load, which matters especially because NLP tagging takes time: up to several seconds per document.
