Concurrent output from java application to elasticsearch - java

I am researching if it is possible to have multiple threads output to elasticsearch concurrently using the transport client and bulk upload apis. Specifically, I want to have multiple transport clients or bulk upload api instances run on their own threads and handle input to elasticsearch. My specific reason for wanting to do this is so I can create a load balancing algorithm to handle a very large number of json messages efficiently. I have been googling for some time and can't find any documentation on this type of thing, or anyone else asking similar questions. Additionally, I am new to elasticsearch. Does anyone have any insight on this, some literature they could share, or a good place to start? Thanks.

An idea on how you can achieve this is to have a static class that acts as a wrapper for an elastic Client object. You can then spawn several threads in whatever code you are executing using the ExecutorService. The ExecutorService includes many utility methods, detailed in the link, that might help you manage your processing. These threads would then call into the static class to get the client object when doing processing, prepare their bulk requests, and then send them.
If you are lazy, you can just have loops that execute indefinitely and have sleep calls to help prevent overloading.
A few caveats to watch out for:
1) Be very mindful of Elasticsearch's Thread pool and queue sizes. Do not submit data to ES faster than your hardware can handle. If you are submitting data to ES too fast such that you are overloading the queue, bulk requests will be aborted. Do not increase the bulk queue size unless you need to and know your hardware can keep up and prevent overload. Increasing the queue size if you are running into roadblocks will only delay the inevitable. If you are overloading the bulk, include a way to throttle requests in your code.
2) Partition up your bulk requests by type/index. I am not 100% sure how ES handles bulk requests under the hood, but I have noticed some inconsistent behavior in the queue size when shoving tons requests to different indexes in one bulk request. It would make sense that Elasticsearch partitions up the requests to prevent tons of useless seqs and optimize shard/node traversal, but I have noticed that the queue size goes up much quicker if you mix.

Related

Serving single HTTP request with multiple threads

Angular 4 application sends a list of records to a Java spring MVC application that has been deployed in Websphere 8 Servlet container. The list is then inserted into to a temp table. After the batch insert, a procedure call is made in order to do some calculations and return results. Depending on the size of the list that was inserted into temp table it may take anywhere between: 3000ms( N ~ 500 ), 6000ms( N ~ 1000 ), 50,000+ms ( N > 2000 ).
My asendach would be to create chunks of data and simultaneously send them to database for processing. After threads (Futures) return results I would aggregate them and return back to the client. To sum up, I would split a synchronous call into multiple asynchronous processes(simultaneously executed) and return back to the client over the same thread that initiated HTTP call - landed into my controller.
Everything would be fine and I would not be asking this questions if a more experienced colleague of mine was not strongly disagreeing with this approach. His reasoning is that using this approach is prone to exceptions due to thread interrupts / timeouts / semaphores and so on. Hi is going as far as saying that multithreading should be avoided within a web container because it can crash the Servlet container in case it runs out of threads.
He proposes that we should have the browser send multiple AJAX requests and aggregates/present data in chunks.
Can you please help me understand which approach is better and why?
I would say that your approach is much better.
Threads created by application logic aren't application container threads and limited only by operating system. While each AJAX request uses a thread from application container. So the second approach reduces throughput and increases the possibility of reaching application container limit while and the first one not. Performance also should be considered because it's much cheaper to create a thread than to send a request over network. Plus each network requests uses additional resources for authentication/authorization/encryption etc.
It's definetely harder to write correct multithread code and it can easily prone to errors. However it shouldn't stop you from doing it because concurrency can significantly increase your performance. It's pretty straightforward to handle interrupts and timeouts using Future and you for sure don't need semaphores here.
Exposing this logic to client looks like breaking of encapsulation. Imagine that you use rest api which forces you to send multiple request by splitting you data in chunks. What chunk size should i use? How to deal with timeouts/interrupts? How many requests should i sent? etc. You will have almost the same challenges in both approaches, but it's much easier to deal with them using specially designed for this libraries like ExecutorService and Future.

We can only use a blockingqueue or any other data structures for Threadpool task queue?

Hi I am a newbie in Concurrent programming with java. of all the examples I saw in concurrent programming whenever we use to define a task queue people used different implementations of blockingqueue.
why only blockingqueue? what are the advantages and disadvantages?
why not any other data structures?
Ok, i can't address exactly why unspecified code you looked at uses certain data structures and not other ones. But Blocking queues have nice properties. Holding only a fixed number of elements and forcing producers who would insert items over that limit to wait is actually a feature.
Limiting the queue size helps keep the application safe from a badly-behaved producer, which otherwise could fill the queue with entries until the application ran out of memory. Obviously it's faster to insert a task into the task wueue thsn it is to execute it, an executor is going to be at risk for getting bombarded with work.
Also making the producer wait applies back pressure to the system. That way the queue lets the producer know it's falling behind and not accepting more work. It's better for the producer to wait than it is for it to keep hammering the queue; back pressure lets the system degrade gracefully.
So you have a data structure that is easy to understand, has practical benefits for building applications and seems like a natural fit for a task queue. Of course people are going to use it.

Java, JMS shutdown connection when receive all response.

I have problem with counting responses from response queue. I mean, once per day we run a job which gather some data from db and send them to queue. When we receive all responses we should shutdown connection. The problem is how we can check if all responses arrived ? Keeping this in global variable is risky because of concurrence issue. Any idea ? I am quite new in JMS so maybe solution is obvious but I dont see it.
I don't know what your stack is or whatever tools you might be using to accomplish this but I've got this in mind and this might help you out (hopefully).
Generate a hash for each job you plan on queuing and store it in a concurrent list/map. (i.e: ConcurrentHashMap)
Send the job to the queue.
Once the job is done and sends back a response, reproduce the hash and store it a separate concurrent list/map that holds all the jobs that are done.
Now that you have two lists of all the jobs supposed to be executed and the jobs that you got a response from. There multiple ways to accomplish this. If you lookup Java Concurrency, you'd find plenty of tutorials and documentation. I like to use CyclicBarrierandCountDownLatch`. If plan on using any of these methods, take extra precautions to prevent your application from hanging or worse, a filthy memory leak.
OR, you could simply check on how many queuing requests and responses you've and if they are equal to each other, drop the connection.

Common practices to avoid timeouts / starvation in Java?

I have a web-service that write files to disk and other stuff to database. The entire operation takes 1-2 seconds for each write.
The service can, bur that is unlikely, be called from several clients at the same time. Let´s assume that 20 clients call the webservice at the same time, the write operations must be synchronized. In that case, some clients can get a time out exception because they have to wait to many seconds.
Are there any good practices to solve these kind of situations? As it is now, the methods are synchronized (and that can cause the starvation/timeouts).
Should I let all threads get into the write method by removing the synchronized keyword and put their task into a task queue to avoid a timeout? Is that the correct way to get arount this?
Removing the synchronized and putting it into a task queue by itself will not help you (because that's effectively what the synchronized is doing for you). However if you respond to the web request as soon as you put it on the queue, then you will reduce your response fime. But at the cost of some reliability as the user will get a confirmation that the work is done and the work will not really have been done (the system could crash before the work is done).
Francis Upton's practice is indeed an accepted practice.
Another one, is making more fine grained synchronization. Instead of synchronizing all read/write methods of a class, you can synchronize access of the exact invariants that should be synchronized.
And yet even better, is to get rid of synchronization altogether. This is possible using the java.util.concurrent package. This package introduce new collections that use Non-Blocking Algorithms (implemented in java using Compare-Ans-Swap atomic instructions). These collections, such as ConcurrentHashMap, enable much better throughput when scaling.
You can read more about it in this article.
In this type of implementation (slow service under increasing load) you want to make as much as possible async, including the timeout processing (if server-based) and the required I/O. Don't hold up your client response threads waiting for either of these time-consuming operations, to preserve the server's responsiveness to new requests, but instead fire off the required operations (maybe to a dynamic thread pool) and let callbacks process the results, whether timeout, complete I/O, or errors.
Send the appropriate response depending on what happens first, but be prepared to roll back I/O if you send an error/timeout message and then a completed I/O arrives (due to a race condition between I/O and timer). This implies transactional semantics are required in the server.
This is an area that get increasingly complex as your load grows but good design early on should allow you to scale as load grows. Ideally the client servicing threads should not block at all.

nonblocking-io vs blocking-io on raw data throughput

In apache HTTPComponent document there is a statement:
Contrary to the popular belief, the performance of NIO in terms of raw data throughput is significantly lower than that of blocking I/O."
Is that true? Can someone explain this in more details? And what is a typical use case where
request / response handling needs to be decoupled
Non blocking IO should be used when you can handle the request, dispatch it for processing on some other execution context (different thread, RPC call to another server, some other async mechanism) and release the web-server's thread to handle more incoming requests. When the processing of the response will be completed, a response handling thread will be invoked, and it will send the response to the client.
I would recommend reading netty documentation for better understanding of the concept.
As for higher throughput: When your server sends/recieves large amounts of data, all those context switches, and passing data between threads, can really hurt overall performance. Think of it like this: you receive a large request (PUT request with a large file). All you need to do is to save it to disk, and return OK. Starting to toss it between threads could result in few more mem-copy operations that would have been needed in case you've just threw it to disk in the same thread. And handling this operation in async manner would not improve performance: though you could have released the request handling thread back to web-server's thread pool and let it process other requests, your main performance bottleneck is your disk IO, and in this case - trying to save more files simultaneously, would only make things slower.
I hope I was clear enough. Please feel free to ask more questions in comments if you need more explanations.
The first statement is true only when the number of concurrent requests is relatively small (rather in tens than thousands). It's all about using many threads (blocking) instead of one or few threads (non-blocking). Let's say you want to write an application which only downloads a file from remote server. If your application need to download only one file at a time you need only one thread. But if you have a crawler which runs thousands of HTTP requests then you need to have thousands of threads (or use limited number of threads + NIO instead). For so big number of threads the problem is context switching which can slow down your application dramatically (therefore for this number of concurrent requests NIO is better).
But let's back to your question. Why NIO can be slower in terms of raw data throughput ? The reason is the amount of CPU time used by NIO driven applications. For such case in blocking model your code is doing only one thing - waiting for data (it executes recv() operation in a loop). In the NIO application the logic is much more complicated: in a loop the code is using the selector to select a set of keys (which involves epoll_wait system call on Linux, Oracle JVM), then iterate through the set, pick up a channel for every key and then read the data from the channel (read() operation in OS). In standard blocking model all you do is to execute the recv() system function. In summary: NIO driven application in such case use more CPU time and generates more mode switch operations because of higher number of system calls (by saying mode switch I mean the switch from user to kernel mode). Therefore the time needed to download the file will be higher.

Categories