I'm using Apache Commons Pool to store a set of CouchbaseClient objects. When doing load tests (still with a small load), after a couple of thousand operations (using a pool of 100 connections), some CouchbaseClient objects start throwing IllegalStateException and then shut down.
I would like to know if there is a way to check whether a CouchbaseClient object is still valid for use. The exception is thrown when a "set" operation is called on the object, so checking that the client is valid before using it would be an acceptable solution.
I'm still not sure what the origin of the exception is. However, when I do a sort of validation of the client before returning it to the pool, the problem stops happening. The issue is that this validation (calling a set operation to test whether the client is working) is too brute-force and impacts performance. I would like to find a lighter way of checking this.
The most common cause of IllegalStateException is that it wasn't possible to add an operation to the input queue. See the Bullet-Proof Futures and Listeners section in the Couchbase Java Developer Guide on how to handle it.
Note, however, that 100 CouchbaseClient objects is a lot! You generally don't need more than one or two CouchbaseClient objects per application: the recommended usage is asynchronous, and there is an internal thread pool handling the actual low-level operations.
I'd recommend looking at the Couchbase Java Developer Guide, specifically the section on Understanding and Using Asynchronous Operations.
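To make the "one or two clients per application" advice concrete, here is a minimal sketch of a single shared client holder. The CouchbaseClient type is stubbed as a plain class so the snippet is self-contained; in real code it would be the SDK's client class, which is thread-safe and multiplexes operations over its own internal connections.

```java
// Sketch: share one client per JVM instead of pooling many.
// The CouchbaseClient class below is a stand-in stub, NOT the real SDK type.
public class ClientHolder {
    // Hypothetical stub for com.couchbase.client.CouchbaseClient.
    static class CouchbaseClient { }

    // One instance for the whole application; created once, reused everywhere.
    private static final CouchbaseClient CLIENT = new CouchbaseClient();

    public static CouchbaseClient get() {
        return CLIENT;
    }
}
```

Every caller gets the same instance, so there is nothing to validate or return to a pool.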
Related
I have a project that reads data from many different providers; some via SOAP, some via HTTP, etc. Some of these providers also have a restriction on the number of concurrent connections to them. For example, provider A may allow unlimited connections, provider B may only allow 2, and provider C may allow 5.
I'm decent with Micronaut, but I'm unaware of anything built into it that would allow me to limit connections to specific URLs as necessary. So, my first thought is to create a per-provider thread limit (perhaps using RxJava's scheduler system? I believe you can create custom ones using Java's Executor class) and let that do the work of queuing for me. I think I could also go the more manual route of creating a ConcurrentMap and storing the number of active connections in that, but that seems messier and more error-prone.
Any advice would be greatly appreciated! Thanks!
Limiting thread counts is suitable only if the network connections are made by threads, that is, synchronously. Micronaut can also make asynchronous connections, and then limiting the number of threads won't work. It's better to limit the number of connections directly. Create an intermediate proxy object which has the same interface as Micronaut and passes all incoming requests to the real Micronaut. It also has a limit parameter, and each time a request is passed, it decrements the limit. When the limit reaches 0, the proxy object stops passing requests, keeping them in an input queue. As soon as a request finishes, it signals the proxy object, which passes one request from the input queue if any, or simply increments the limit.
The simplest implementation of the proxy is a thread with a BlockingQueue for input requests and a Semaphore for the limit. But if there are many providers and creating a thread per provider is too expensive, the proxy can be implemented as an asynchronous object.
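A minimal sketch of the limiting idea, using only java.util.concurrent. The ProviderLimiter name and the blocking call() wrapper are illustrative, not part of Micronaut or RxJava; the Semaphore plays the role of the "limit" counter described above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// One limiter per provider; the Semaphore caps concurrent in-flight requests.
public class ProviderLimiter {
    private final Semaphore permits;

    public ProviderLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Blocks until a connection slot is free, runs the request, then releases
    // the slot (even if the request throws).
    public <T> T call(Callable<T> request) throws Exception {
        permits.acquire();
        try {
            return request.call();
        } finally {
            permits.release();
        }
    }
}
```

Usage would be something like `providerB = new ProviderLimiter(2)` for the provider that allows only two connections, then wrapping every outbound call in `providerB.call(...)`.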
I have a problem with counting responses from a response queue. Once per day we run a job that gathers some data from the database and sends it to a queue. When we receive all the responses, we should shut down the connection. The problem is how to check whether all responses have arrived. Keeping a count in a global variable is risky because of concurrency issues. Any ideas? I am quite new to JMS, so maybe the solution is obvious, but I don't see it.
I don't know what your stack is or what tools you might be using to accomplish this, but I have the following in mind, and it might help you out (hopefully):
Generate a hash for each job you plan on queuing and store it in a concurrent list/map (e.g. ConcurrentHashMap).
Send the job to the queue.
Once the job is done and sends back a response, reproduce the hash and store it in a separate concurrent list/map that holds all the jobs that are done.
Now you have two lists: all the jobs that are supposed to be executed, and all the jobs you got a response from. There are multiple ways to compare them. If you look up Java concurrency, you'll find plenty of tutorials and documentation. I like to use CyclicBarrier and CountDownLatch. If you plan on using any of these, take extra precautions to prevent your application from hanging or, worse, a filthy memory leak.
Or you could simply count how many queuing requests and responses you have and, when they are equal, drop the connection.
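If you know up front how many jobs were sent, the counting approach collapses to a CountDownLatch from the standard library. The ResponseCounter class and its method names below are just an illustrative sketch, not an existing API:

```java
import java.util.concurrent.CountDownLatch;

// Sized to the number of jobs sent; each response counts down by one.
public class ResponseCounter {
    private final CountDownLatch remaining;

    public ResponseCounter(int jobsSent) {
        this.remaining = new CountDownLatch(jobsSent);
    }

    // Called from the JMS MessageListener for each response; thread-safe,
    // so no global-variable races.
    public void onResponse() {
        remaining.countDown();
    }

    // Blocks until every response has arrived; then it is safe to shut
    // down the connection.
    public void awaitAll() throws InterruptedException {
        remaining.await();
    }

    public boolean allArrived() {
        return remaining.getCount() == 0;
    }
}
```

In practice you would also use the timed `await(timeout, unit)` overload so a lost response cannot hang the shutdown forever.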
I am researching if it is possible to have multiple threads output to elasticsearch concurrently using the transport client and bulk upload apis. Specifically, I want to have multiple transport clients or bulk upload api instances run on their own threads and handle input to elasticsearch. My specific reason for wanting to do this is so I can create a load balancing algorithm to handle a very large number of json messages efficiently. I have been googling for some time and can't find any documentation on this type of thing, or anyone else asking similar questions. Additionally, I am new to elasticsearch. Does anyone have any insight on this, some literature they could share, or a good place to start? Thanks.
An idea on how you can achieve this is to have a static class that acts as a wrapper for an elastic Client object. You can then spawn several threads in whatever code you are executing using the ExecutorService. The ExecutorService includes many utility methods, detailed in the link, that might help you manage your processing. These threads would then call into the static class to get the client object when doing processing, prepare their bulk requests, and then send them.
If you are lazy, you can just have loops that execute indefinitely and have sleep calls to help prevent overloading.
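A rough sketch of that shape, with the Elasticsearch Client stubbed as a placeholder class so the snippet runs standalone. The real transport Client is thread-safe and would be shared the same way; each worker task prepares and sends its own bulk request through it.

```java
import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkWorkers {
    // Hypothetical stand-in for org.elasticsearch.client.Client; "send"
    // models submitting a bulk request and returning how many docs went out.
    static class Client {
        int send(List<String> docs) { return docs.size(); }
    }

    // Single shared client instance, as the answer above suggests.
    static final Client CLIENT = new Client();

    public static int runWorkers(int threads, List<String> docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            // Each worker builds and sends its own bulk request via the shared client.
            futures.add(pool.submit(() -> CLIENT.send(docs)));
        }
        int sent = 0;
        for (Future<Integer> f : futures) sent += f.get();
        pool.shutdown();
        return sent;
    }
}
```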
A few caveats to watch out for:
1) Be very mindful of Elasticsearch's Thread pool and queue sizes. Do not submit data to ES faster than your hardware can handle. If you are submitting data to ES too fast such that you are overloading the queue, bulk requests will be aborted. Do not increase the bulk queue size unless you need to and know your hardware can keep up and prevent overload. Increasing the queue size if you are running into roadblocks will only delay the inevitable. If you are overloading the bulk, include a way to throttle requests in your code.
2) Partition your bulk requests by type/index. I am not 100% sure how ES handles bulk requests under the hood, but I have noticed some inconsistent behavior in the queue size when shoving tons of requests to different indexes in one bulk request. It would make sense for Elasticsearch to partition the requests to prevent lots of useless seeks and optimize shard/node traversal, but I have noticed that the queue size grows much faster if you mix indexes.
One Spring service is implemented in one Java deployment unit (JVM); another Spring service is implemented in another JVM. I am making a service call from the first JVM to the second. The service interface could be either REST or SOAP over HTTP. I need a single transaction spanning multiple JVMs, meaning that if any service fails, everything must be rolled back. How can I do this? Any code examples?
Use global transactions (i.e., JTA),
Use XA resources (RDBMS and JMS connections), do "Full XA with 2PC".
For further reference on the Spring transaction management, including the JTA/XA scenario, read: http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#transaction
REST faces the exact same problem as SOAP-based web services with regards to atomic transactions. There is no stateful connection, and every operation is immediately committed; performing a series of operations means other clients can see interim states.
Unless, of course, you take care of this by design. First, ask yourself: do I have a standard set of atomic operations? This is commonly the case. For example, for a banking operation, removing a sum from one account and adding the same sum to a different account is often a required atomic operation. But rather than exporting just the primitive building blocks, the REST API should provide a single "transfer" operation, which encapsulates the entire process. This provides the desired atomicity, while also making client code much simpler. This approach is known as low-granularity services, or high-level batch operations.
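A toy sketch of the "single high-level operation" idea. The Bank class, account names, and the synchronized in-memory map are made up for illustration; a real service would back transfer() with a database transaction, but the API shape is the point: clients call one atomic operation instead of separate debit and credit endpoints.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Bank {
    private final Map<String, Long> balances = new ConcurrentHashMap<>();

    public Bank(Map<String, Long> initial) {
        balances.putAll(initial);
    }

    // The whole transfer succeeds or fails as a unit; no other caller can
    // ever observe the interim state where money has left one account only.
    public synchronized void transfer(String from, String to, long amount) {
        long fromBal = balances.get(from);
        if (fromBal < amount) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balances.put(from, fromBal - amount);
        balances.put(to, balances.get(to) + amount);
    }

    public long balanceOf(String account) {
        return balances.get(account);
    }
}
```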
If there is no simple, pre-defined set of desired atomic operation sequences, the problem is more severe. A common solution is the batch command pattern: define one REST method to demarcate the beginning of a transaction, and another to demarcate its end (a 'commit' request). Anything sent between these two operations is queued by the server but not committed until the commit request is sent.
This pattern complicates the server significantly -- it must maintain a state per client. Normally, the first operation ('begin transaction') returns a transaction ID (TID), and all subsequent operations, up to and including the commit, must include this TID as a parameter.
It is a good idea to enforce a timeout on transactions: if too much time has passed since the initial 'begin transaction' request, or since the last step, the server has the right to abort the transaction. This prevents a potential DoS attack that causes the server to waste resources by keeping too many transactions open. The client design must keep in mind that each operation must be checked for a timeout response.
It is also a good idea to allow the client to abort a transaction, by providing a 'rollback' API.
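The begin/queue/commit/rollback machinery described above could be sketched like this. The TxRegistry name, the string-typed operations, and the UUID-based TIDs are all illustrative assumptions; timeout reaping (discussed above) would remove stale entries from the same map the way rollback() does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class TxRegistry {
    // Per-client transaction state: TID -> queued-but-uncommitted operations.
    private final Map<String, List<String>> pending = new ConcurrentHashMap<>();

    // 'begin transaction': returns the TID the client must send with every
    // subsequent request.
    public String begin() {
        String tid = UUID.randomUUID().toString();
        pending.put(tid, Collections.synchronizedList(new ArrayList<>()));
        return tid;
    }

    public void add(String tid, String operation) {
        List<String> ops = pending.get(tid);
        if (ops == null) throw new IllegalStateException("unknown or expired TID");
        ops.add(operation);
    }

    // 'commit': atomically removes and returns the queued operations for the
    // caller to apply; a second commit with the same TID fails.
    public List<String> commit(String tid) {
        List<String> ops = pending.remove(tid);
        if (ops == null) throw new IllegalStateException("unknown or expired TID");
        return ops;
    }

    // 'rollback': discard everything queued under this TID.
    public void rollback(String tid) {
        pending.remove(tid);
    }
}
```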
The usual care required in designing code that uses multiple concurrent transactions applies as usual in this complex design scenario. If at all possible, try to limit the use of transactions, and support high-level batch operations instead.
I take no credit for this information; I'm just pointing you to it. Credit goes to this article.
Also please read Transactions in REST?
You can get some handy code samples here http://www.it-soa.eu/en/resp/atomicrest/userguide/index.html
MongoDB introduced Bulk() in version 2.6. I checked the APIs, and it seems great to me.
Before this API, if I needed to do a bulk insert, I had to store documents in a List, then use insert() to insert the whole List. In a multi-threaded environment, concurrency also had to be considered.
Is there a queue/buffer implemented inside the Bulk API? Each time I put something into the bulk before execute(), the data is stored in the queue/buffer, is that right?
Thus, I don't need to write my own queue/buffer, just use Bulk.insert() or Bulk.find().update(), is that right?
Could someone tell me more about the queue? Do I still need to worry about concurrency issues?
Since a Bulk is created with db.collection.initializeUnorderedBulkOp(), if a bulk instance is not released, will it stay connected to the MongoDB server?
As for the basic idea of "do you need to store your own list?": not really, but I suppose it all depends on what you are doing.
For a basic idea of the internals of what is happening under the Bulk Operations API the best place to look is at the individual command forms for each type of operation. So the relevant manual section is here.
So you can think of the "Bulk" interface as being a list or collection of all of the operations that you add to it. And you can pretty much add to that as much as you wish to (within certain memory and practical constraints) and consider that the "drain" method for this "queue" is the .execute() method.
As noted in the documentation there, regardless of how many operations you "queue", this will only actually send to the server in groups of at most 1000 operations at a time. The other thing to keep in mind is that there is no governance that makes sure these 1000 operation requests actually fit under the 16MB BSON limit. So that is still a hard limit with MongoDB, and you can only effectively form one "request" at a time that totals less than that limit in size when sending to the server.
So generally speaking, it is often more practical to make your own "execute/drain" requests to the server once per every 1000 or fewer entries. Mileage may vary on this, but there are some considerations to make here.
With respect to "Ordered" versus "UnOrdered" operations requests: in the former case, all queued operations will be aborted in the event of an error being generated in the batch sent, meaning of course all operations occurring after the error is encountered.
In the latter case, for "UnOrdered" operations, no fatal error is reported; rather, the WriteResult that is returned contains a "list" of any errors that were encountered. In addition, "UnOrdered" means that the operations are not necessarily "applied" in any particular order, so you cannot "queue" operations that rely on something else in the "queue" being processed before they are applied.
So there is the concern of how large a WriteResult you are going to get, and indeed how you handle that response in your application. As stated earlier, mileage may vary as to whether you end up with a very large response or a smaller, more manageable one.
As far as concurrency is concerned, there is really one thing to consider here. Even though you are sending many instructions to the server in a single call and not waiting for individual transfers and acknowledgements, the server is still only really processing one instruction at a time. These are either ordered, as implied by the initialize method, or "un-ordered" where that is chosen, in which case the operations can run in "parallel" as it were on the server until the batch is drained.
But there is no "lock" until the "batch" completes, so it is not a substitute for a "transaction"; don't make that mistake as a design point. The same MongoDB rules apply, but the benefit here is "one write to the server" and "one response back", rather than one for each operation.
Finally, as to whether some "server connection" is held here by the API: the answer is no, there is not. As the pointers above to the command internals show, this "queue" building is purely client-side. There is no communication with the server in any way until the .execute() method is called. This is by design, and actually half the point, as mainly we don't want to be sending data to the server each time you add an operation. It is all done at once.
So "Bulk Operations" are a "client side queue". Everything is stored within the client side until the .execute() "drains" the queue and sends the operations to the server all at once. A response is then given from the server containing all of the results from the operations sent that you can handle however you wish.
Also, once .execute() is called, no more operations can be "queued" to the bulk object, and neither can .execute() be called again. Depending on implementation, you can have some further examination of the "Bulk" object and results. But the general case is where you need to send more "bulk" operations, you re-initialize and start again, just as you would with most queue systems.
Summing up:
Yes. The object effectively "queues" operations.
You don't need your own lists. The methods are "list builders" in themselves.
Operations are either "Ordered" or "Un-Ordered" as far as sequence, but all operations are individually processed by the server as per normal MongoDB rules. No transactions.
The "initialize" commands do not talk to the server directly and do not "hold connections" in themselves. The only method that actually "talks" to the server is .execute()
So it is a really good tool. You get much better write performance than you do from the legacy command implementations. But do not expect it to offer functionality beyond what MongoDB basically does.