MultiThreadedHttpConnectionManager with each HttpClient needing unique cookies and host - java

I've been reading the Apache javadocs and have concluded that the correct way to handle a heavily multithreaded application that uses HttpClient is a static singleton MultiThreadedHttpConnectionManager behind a global HttpClient, with each thread having its own instance of HttpState for its cookies and its own HostConfiguration. Is this correct? Obviously I want to avoid the threads interfering with each other's state.
Also, one of the problems I've been noticing is that when these threads end, the sockets associated with the HttpClient remain open (checking /proc/###/fd), even after HttpMethod.releaseConnection() is called in the finally block. This is a problem because the default fd limit is 1024, and it will be breached after a few days. What's the best way to deal with these leftover sockets?
The only applicable method I can see for closing them is MultiThreadedHttpConnectionManager.closeIdleConnections(timeout) - is this correct?
That means I'm going to have to either spawn another thread to call this method periodically, or just call it every time a thread finishes, which seems quite clunky. Any advice?
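For reference, the setup I'm describing looks roughly like this (a sketch against Commons HttpClient 3.x; the class and method names are illustrative, and reap() is the periodic cleanup I have in mind):

import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

public class SharedClient {
    // one manager and one client, shared by every thread
    private static final MultiThreadedHttpConnectionManager MANAGER =
            new MultiThreadedHttpConnectionManager();
    private static final HttpClient CLIENT = new HttpClient(MANAGER);

    // each thread passes its own HttpState (cookies) and HostConfiguration
    public static String fetch(HostConfiguration host, HttpState state, String path)
            throws Exception {
        GetMethod get = new GetMethod(path);
        try {
            CLIENT.executeMethod(host, get, state);
            return get.getResponseBodyAsString();
        } finally {
            get.releaseConnection(); // returns the connection to the pool; does not close the socket
        }
    }

    // would be called periodically, or whenever a worker thread finishes
    public static void reap() {
        MANAGER.closeIdleConnections(30000); // close sockets idle for 30s or more
        MANAGER.deleteClosedConnections();
    }
}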
Thanks a lot

Related

Thread local behaviour in spring boot

As we know, Tomcat has approximately 200 threads and Jetty has its own default thread count in its thread pool. So if we set something in a ThreadLocal per request, will it stay in the thread for the thread's lifetime, or will Tomcat clear the ThreadLocal after each request?
If we set something in userContext in a filter, do we need to clear it every time the filter exits?
Or will the web server create a new thread every time, if we don't have a thread pool configuration?
public static final ThreadLocal<UserContextDto> userContext = new ThreadLocal<>();
Yes, you need to clear ThreadLocal. Tomcat won't clear ThreadLocals.
No, a new thread is not created every time. A thread from the pool is used to serve a request, and returned to the pool once the request is complete.
This applies not only to Tomcat but also to Jetty and Undertow. Creating a thread for every request is expensive in terms of both resources and time.
No, Tomcat will not clear ThreadLocals that your code creates, which means they will remain and could pollute subsequent requests.
So whenever you create one, make sure you clear it out before the request (or whatever invoked your code) exits.
It should also be noted that subsequent requests - even using the identical URL - could well be executed in a totally different thread, so ThreadLocals are not a mechanism for saving state between requests. For this, something like SessionBeans could be used.
If you put something in a ThreadLocal in a Thread that is not 100% under your control (i.e. one in which you are invoked from other code, like for a HTTP request), you need to clear whatever you set before you leave your code.
A try/finally structure is a good way to do that.
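A minimal sketch of that pattern, assuming the javax.servlet Filter API (Servlet 4.0+, where init and destroy have default implementations) and the userContext/UserContextDto names from the question:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class UserContextFilter implements Filter {
    // the ThreadLocal from the question (UserContextDto is the question's type)
    public static final ThreadLocal<UserContextDto> userContext = new ThreadLocal<>();

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        try {
            userContext.set(new UserContextDto()); // populate the per-request context here
            chain.doFilter(request, response);
        } finally {
            // always runs, even if the chain throws, so the pooled thread is left clean
            userContext.remove();
        }
    }
}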
A threadpool can't do it for you, because the Java API does not provide a way to clear a thread's ThreadLocal variables. (Which is arguably a shortcoming in the Java API)
Not doing so risks a memory leak, although it's bounded by the size of the thread pool if you have one.
Once the same thread gets assigned again to the code that knows about the ThreadLocal, you'll see the old value from the previous request if you didn't remove it. It's not good to depend on that. It could lead to hard to trace bugs, security holes, etc.

Number of threads for NioEventLoopGroup with persistent connections

I would like to use Java Netty to create a TCP server for a large number of persistent connections from clients. In other words, imagine that there are 1000 client devices out there, and all of them create and maintain a persistent connection to the TCP server. There will be a reasonable amount of traffic (mostly lines of text) going back and forth across each of these persistent connections. How can I determine the best number of threads to use in the boss and worker groups of NioEventLoopGroup?
My understanding is that when the connection is created, Netty creates a SimpleChannelInboundHandler<String> object to handle the connection. When the connection is created then the handler channelActive method is called, and every time it gets a new message from the client, the method messageReceived gets called (or channelRead0 method in Netty 4.0.24).
Is my understanding correct?
What happens if I have long running code to run in messageReceived - do I need to launch this code in yet another thread (java.util.Thread)?
What happens if my messageReceived method blocks on something or takes a long time to complete? Does that bring Netty to a grinding halt?
Basically I need to write a TCP socket server that can serve a large number of persistent connections as quickly as possible.
Is there any guidance available on number of threads for NioEventLoopGroup and on how to use any threads inside the handler?
Any help would be greatly appreciated.
How can I determine the best number of threads to use in the boss and worker groups for NioEventLoopGroup?
About the boss threads: since you say you need persistent connections, there is no sense in using a lot of boss threads, because boss threads are only responsible for accepting new connections. So I would use only one boss thread.
The number of worker threads should depend on your processor cores.
Don't forget to add -XmsYYYYM and -XmxYYYYM as your VM arguments, because without them you can face the case where your JVM is not using all cores.
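A minimal sketch of that sizing, assuming Netty 4.x (a no-argument NioEventLoopGroup defaults to roughly 2 * available cores):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

EventLoopGroup bossGroup = new NioEventLoopGroup(1);  // one thread is enough to accept connections
EventLoopGroup workerGroup = new NioEventLoopGroup(); // defaults to 2 * available cores

ServerBootstrap bootstrap = new ServerBootstrap()
        .group(bossGroup, workerGroup)
        .channel(NioServerSocketChannel.class)
        .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                // add your codecs and SimpleChannelInboundHandler<String> here
            }
        });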
What happens if I have long running code to run in messageReceived - do I need to launch this code in yet another thread (java.util.Thread)?
Do you really need to? Probably you should think about doing your logic another way; if not, then you should probably consider OIO with a new thread for each connection.
What happens if my messageReceived method blocks on something or takes a long time to complete?
You should avoid using thread blocking actions in your handlers.
Does that bring Netty to a grinding halt?
Yep, it does.
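If the blocking work truly can't be avoided, one common Netty 4.x pattern is to run the slow handler on a separate EventExecutorGroup so the I/O event loops stay free (a sketch; MySlowHandler is a hypothetical handler name, and pipeline is the ChannelPipeline inside your ChannelInitializer):

import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

// a pool dedicated to slow handlers, separate from the I/O event loops
EventExecutorGroup blockingGroup = new DefaultEventExecutorGroup(16);

// channelRead0 of this handler now runs on blockingGroup
// instead of on the worker event loop
pipeline.addLast(blockingGroup, "slowHandler", new MySlowHandler());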

Increasing HTTP client timeout

I'll get right into the subject
I have a server that runs a music recommendation system (for some kind of application).
The server has a very large database,
so I made the recommendation system a singleton.
My problem is that
the first time the singleton is constructed, it has to process the training data and connect to the database a lot, which is a time-consuming operation.
Thanks to the singleton, this has to run only the first time; afterwards the results of the constructor can be used right away.
My problem is that on the first HTTP request from my PC to the server, the browser times out and the singleton object is never created on the server.
I think my solution would be to extend the browser's wait time until the server finishes its computation and returns a result; however,
if someone has a better solution, I'd be greatly in their debt.
I really need an easily applicable solution that requires minimal effort, because the delivery deadline is closing in and I need to wrap up the project as fast as possible.
Thanks again
A few comments/suggestions:
Increasing the timeout is one way, but it's not a sure-shot way of solving the problem. The time taken by the recommendation system may not always be the same over time.
I suggest another approach to solve this. Not sure if it's an option for you, but would it be possible to create the recommendation system asynchronously, in a separate thread, so that server start-up is not held back by it?
If you can do the above, then provision a flag which indicates that the recommendation system has started.
Meanwhile, if you receive any request, first check the flag; if the flag indicates that the recommendation system has not yet started, return some meaningful message/status.
This way you will get the response immediately, and based on the response you can implement retries on the client side.
Please note that this will be a substantial change on the server side. Just an opinion to improve things further, and a foolproof way of avoiding the timeout.
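A sketch of that flag idea (the names RecommenderHolder and RecommendationSystem are illustrative, not from the question; requires Java 8 for the lambda):

import java.util.concurrent.atomic.AtomicReference;

public class RecommenderHolder {
    private static final AtomicReference<RecommendationSystem> INSTANCE =
            new AtomicReference<>();

    // call once at server start-up; returns immediately while training runs in the background
    public static void initAsync() {
        new Thread(() -> INSTANCE.set(new RecommendationSystem())).start();
    }

    // request handlers call this; null means "still training", so return
    // e.g. HTTP 503 with a Retry-After header and let the client retry
    public static RecommendationSystem getIfReady() {
        return INSTANCE.get();
    }
}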
You can increase the connection and socket timeouts as shown below. Note that the socket timeout is the one that governs how long the client waits for the server's response:
final HttpParams httpParams = new BasicHttpParams();
// 60 second connection timeout: time allowed to establish the TCP connection
HttpConnectionParams.setConnectionTimeout(httpParams, 60000);
// 60 second socket timeout: time allowed to wait for response data
HttpConnectionParams.setSoTimeout(httpParams, 60000);
HttpClient httpClient = new DefaultHttpClient(httpParams);

About Java ServerSocket Accept: busy-waiting?

I'm reading TCP/IP Sockets in Java; about ServerSocket, it says:
When we call accept() on that ServerSocket instance, if a new connection is pending, accept() returns immediately; otherwise it blocks until either a connection comes in or the timer expires, whichever comes first. This allows a single thread to handle multiple connections. Unfortunately, the approach requires that we constantly poll all sources of I/O, and that kind of “busy waiting” approach again introduces a lot of overhead from cycling through connections just to find out that they have nothing to do.
As I understand it, shouldn't the thread be "notified" when a connection comes in, and thus not be "busy waiting"? Did I misunderstand something?
-----------------EDIT----------------------
The whole paragraph is as below:
Because of these complications, some programmers prefer to stick with a single-threaded approach, in which the server has only one thread, which deals with all clients—not sequentially, but all at once. Such a server cannot afford to block on an I/O operation with any one client, and must use nonblocking I/O exclusively. Recall that with nonblocking I/O, we specify the maximum amount of time that a call to an I/O method may block (including zero). We saw an example of this in Chapter 4, where we set a timeout on the accept operation (via the setSoTimeout() method of ServerSocket). When we call accept() on that ServerSocket instance, if a new connection is pending, accept() returns immediately; otherwise it blocks until either a connection comes in or the timer expires, whichever comes first. This allows a single thread to handle multiple connections. Unfortunately, the approach requires that we constantly poll all sources of I/O, and that kind of “busy waiting” approach again introduces a lot of overhead from cycling through connections just to find out that they have nothing to do.
It's mostly nonsense, even taking the entire quotation into account. Either you are using blocking I/O, in which case you need a thread per connection, and another per accept() loop; or you are using non-blocking I/O, in which case you have select(); or, from Java 7, you are using asynchronous I/O, in which case it is all callbacks into completion handlers. In none of these cases do you need to poll or busy-wait.
I think he must be referring to using blocking mode with very short timeouts, but it's really most unclear.
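For contrast, this is what the non-polling alternative looks like with a java.nio Selector (a sketch, Java 7+; port 8080 is arbitrary). select() sleeps until a registered channel is actually ready, so nothing spins:

import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

Selector selector = Selector.open();
ServerSocketChannel server = ServerSocketChannel.open();
server.bind(new InetSocketAddress(8080));
server.configureBlocking(false);
server.register(selector, SelectionKey.OP_ACCEPT);

while (true) {
    selector.select(); // blocks until at least one channel is ready; no polling loop
    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
    while (keys.hasNext()) {
        SelectionKey key = keys.next();
        keys.remove();
        if (key.isAcceptable()) {
            SocketChannel client = server.accept();
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ); // now reads are notified too
        }
    }
}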

Common practices to avoid timeouts / starvation in Java?

I have a web service that writes files to disk and other data to a database. The entire operation takes 1-2 seconds for each write.
The service can, though it is unlikely, be called from several clients at the same time. Let's assume that 20 clients call the web service at the same time; the write operations must be synchronized. In that case, some clients can get a timeout exception because they have to wait too many seconds.
Are there any good practices for solving this kind of situation? As it is now, the methods are synchronized (and that can cause the starvation/timeouts).
Should I let all threads into the write method by removing the synchronized keyword and put their tasks into a task queue to avoid timeouts? Is that the correct way to get around this?
Removing the synchronized and putting the work into a task queue will not, by itself, help you (because that's effectively what synchronized is doing for you). However, if you respond to the web request as soon as you put the task on the queue, then you will reduce your response time. But that comes at the cost of some reliability: the user will get a confirmation that the work is done when it may not really have been done yet (the system could crash before the work is done).
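A sketch of that queue variant (writeFilesAndDatabase is a stand-in for the 1-2 second operation): a single-threaded executor serializes the writes, so synchronized is no longer needed, and you choose whether to reply before or after the task completes.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// one worker thread: tasks run strictly one at a time, in submission order
ExecutorService writeQueue = Executors.newSingleThreadExecutor();

Future<?> pending = writeQueue.submit(() -> writeFilesAndDatabase());
// respond to the client now ("accepted" rather than "done"), or call
// pending.get() to block until the write has really finished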
Francis Upton's practice is indeed an accepted practice.
Another one is making the synchronization more fine-grained. Instead of synchronizing all read/write methods of a class, you can synchronize access to just the exact invariants that need it.
And better yet, get rid of synchronization altogether. This is possible using the java.util.concurrent package. This package introduces collections that use non-blocking algorithms (implemented in Java using Compare-And-Swap atomic instructions). These collections, such as ConcurrentHashMap, enable much better throughput when scaling.
You can read more about it in this article.
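For instance, a minimal sketch of a lock-free update with ConcurrentHashMap (Java 8+ for merge; the map and key are illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

ConcurrentMap<String, Integer> hitCounts = new ConcurrentHashMap<>();
// atomic read-modify-write: no synchronized block required
hitCounts.merge("/index", 1, Integer::sum);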
In this type of implementation (slow service under increasing load) you want to make as much as possible async, including the timeout processing (if server-based) and the required I/O. Don't hold up your client response threads waiting for either of these time-consuming operations, to preserve the server's responsiveness to new requests, but instead fire off the required operations (maybe to a dynamic thread pool) and let callbacks process the results, whether timeout, complete I/O, or errors.
Send the appropriate response depending on what happens first, but be prepared to roll back I/O if you send an error/timeout message and then a completed I/O arrives (due to a race condition between I/O and timer). This implies transactional semantics are required in the server.
This is an area that gets increasingly complex as your load grows, but good design early on should allow you to scale. Ideally, the client-servicing threads should not block at all.
