For some reason we plan to use the Kestrel queue in our project. We run some daemons, and the main problem is how to fetch data from the queue effectively and with low CPU utilization. The way we implemented fetching is: if we fail to fetch data from the queue more than 5 times, we sleep the thread for 100ms to reduce CPU utilization.
while (running) {
    try {
        LoginLogQueueEntry data = kestrelQueue.fetch();
        if (null != data && data.isLegal()) {
            entryCacheList.add(data); // add the data to the local cache
            resetStatus();
        } else {
            failedCount++;
            // if there is no data in Kestrel and the local cache is not empty,
            // insert the data into the MySQL database
            if (failedCount == 1 && !entryCacheList.isEmpty()) {
                resetStatus();
                insertLogList(entryCacheList); // insert current data into the database
                entryCacheList.clear();        // empty the local cache
            }
            if (failedCount >= 5 && entryCacheList.isEmpty()) {
                // failed 5 times; sleep the current thread
                failedCount = 0;
                Thread.sleep((sleepTime + MIN_SLEEP_TIME) % MAX_SLEEP_TIME);
            }
        }
        // insert 1000 rows at once
        if (entryCacheList.size() >= 1000) {
            insertLogList(entryCacheList);
            entryCacheList.clear();
        }
    } catch (Exception e) {
        logger.warn(e.getMessage());
    }
}
Is there any other good way to do this? The perfect way, I think, would be for the queue to notify the worker that data has arrived so the worker can fetch it.
See the "Blocking Fetches" section at http://robey.lag.net/2008/11/27/scarling-to-kestrel.html
Blocking reads are described here, under "Memcache commands": https://github.com/robey/kestrel/blob/master/docs/guide.md
You can add option flags to a get command by separating them with slashes, so to fetch an item from the "jobs" queue, waiting up to one second:
get jobs/t=1000
If nothing shows up on the queue in one second, you'll get the same empty response, just one second later than you're getting it now. :)
It's important to tune your response timeout when you do this. If you use a blocking read with a timeout of one second, but your client library's response timeout is 500 milliseconds, the library will disconnect from the server before the blocking read is finished. So make sure the response timeout is greater than the timeout you're using in the read request.
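For illustration, here is a minimal sketch of what a blocking-read loop could look like from Java, assuming the spymemcached client (Kestrel speaks the memcache protocol); the queue name, port, and the process() handler are assumptions, not part of the question:

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactory;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;

// The operation timeout must exceed the t=1000 blocking-read timeout,
// or the client will disconnect before the server answers.
ConnectionFactory cf = new ConnectionFactoryBuilder()
        .setOpTimeout(2000)
        .build();
MemcachedClient client = new MemcachedClient(cf,
        AddrUtil.getAddresses("localhost:22133")); // Kestrel's default port

while (running) {
    // Blocks server-side for up to 1 second; returns null if the queue stays empty.
    Object data = client.get("jobs/t=1000");
    if (data != null) {
        process(data); // hypothetical handler
    }
    // No manual sleep needed: the server-side wait keeps CPU usage low.
}

This replaces the failed-count/sleep logic entirely: the thread parks inside the get instead of spinning.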
You need to use a blocking get. I couldn't track down the API docs, but I found an article suggesting that it's possible in kestrel.
I have a requirement where I read a bunch of rows (thousands) from a SQL DB using Spring Batch and call a REST Service to enrich content before writing them on a Kafka topic.
When using the Spring Reactive webClient, how do I limit the number of active non-blocking service calls? Should I somehow introduce a Flux in the loop after I read data using Spring Batch?
(I understand the usage of delayElements, and that it serves a different purpose, as when a single GET service call brings in a lot of data and you want the server to slow down -- here, though, my use case is a bit different in that I have many WebClient calls to make and would like to limit the number of calls to avoid out-of-memory issues, but still gain the advantages of non-blocking invocations.)
Very interesting question. I pondered it and came up with a couple of ideas on how this could be done; hopefully some of them help you with your investigation.
Unfortunately, I'm not familiar with Spring Batch. However, this sounds like a problem of rate limiting, or the classical producer-consumer problem.
So, we have a producer that produces so many messages that our consumer cannot keep up, and the buffering in the middle becomes unbearable.
The problem I see is that your Spring Batch process, as you describe it, is not working as a stream or pipeline, but your reactive Web client is.
So, if we were able to read the data as a stream, then as records start getting into the pipeline those would get processed by the reactive web client and, using back-pressure, we could control the flow of the stream from producer/database side.
The Producer Side
So, the first thing I would change is how records get extracted from the database. We need to control how many records get read from the database at the time, either by paging our data retrieval or by controlling the fetch size and then, with back pressure, control how many of those are sent downstream through the reactive pipeline.
So, consider the following (rudimentary) database data retrieval, wrapped in a Flux.
Flux<String> getData(DataSource ds) {
    return Flux.create(sink -> {
        try {
            Connection con = ds.getConnection();
            con.setAutoCommit(false);
            // Note: the (type, concurrency) overload is needed here; passing only
            // TYPE_FORWARD_ONLY would hit the autoGeneratedKeys overload instead.
            PreparedStatement stm = con.prepareStatement(
                    "SELECT order_number FROM orders WHERE order_date >= '2018-08-12'",
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stm.setFetchSize(1000);
            ResultSet rs = stm.executeQuery();
            sink.onRequest(batchSize -> {
                try {
                    for (int i = 0; i < batchSize; i++) {
                        if (!rs.next()) {
                            // no more data, close resources!
                            rs.close();
                            stm.close();
                            con.close();
                            sink.complete();
                            break;
                        }
                        sink.next(rs.getString(1));
                    }
                } catch (SQLException e) {
                    // TODO: close resources here
                    sink.error(e);
                }
            });
        } catch (SQLException e) {
            // TODO: close resources here
            sink.error(e);
        }
    });
}
In the example above:
I control the amount of records we read per batch to be 1000 by setting a fetch size.
The sink will send the amount of records requested by the subscriber (i.e. batchSize) and then wait for it to request more using back pressure.
When there are no more records in the result set, then we complete the sink and close resources.
If an error occurs at any point, we send back the error and close resources.
Alternatively, I could have used paging to read the data, which probably simplifies resource handling at the cost of reissuing a query on every request cycle.
You may consider also doing something if subscription is cancelled or disposed (sink.onCancel, sink.onDispose) since closing the connection and other resources is fundamental here.
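As a hedged sketch of that last point, cleanup could be wired into the sink right after the resources are created, reusing rs, stm, and con from the example above:

// Runs when the subscription is cancelled or terminates for any reason,
// so resources are released even if the subscriber walks away early.
sink.onDispose(() -> {
    try {
        rs.close();
        stm.close();
        con.close();
    } catch (SQLException e) {
        // already tearing down; just log it
    }
});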
The Consumer Side
On the consumer side you register a subscriber that requests messages only 1000 at a time, and it will only request more once it has processed that batch.
getData(source).subscribe(new BaseSubscriber<String>() {

    private int messages = 0;

    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        subscription.request(1000);
    }

    @Override
    protected void hookOnNext(String value) {
        // make http request
        System.out.println(value);
        messages++;
        if (messages % 1000 == 0) {
            // when we're done with a batch,
            // we're ready to request more
            upstream().request(1000);
        }
    }
});
In the example above, when subscription starts it requests the first batch of 1000 messages. In the onNext we process that first batch, making http requests using the Web client.
Once the batch is complete, then we request another batch of 1000 from the publisher, and so on and so on.
And there you have it! Using back pressure, you control how many open HTTP requests you have at a time.
My example is very rudimentary and it will require some extra work to make it production ready, but I believe this hopefully offers some ideas that can be adapted to your Spring Batch scenario.
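One more possible refinement for the WebClient concern specifically: rather than hand-rolling a subscriber, you could let Reactor cap the number of in-flight calls via flatMap's concurrency argument. A hedged sketch, where the webClient instance and the /enrich endpoint are assumptions:

getData(source)
        .flatMap(value -> webClient.get()
                .uri("/enrich/{id}", value)   // hypothetical enrichment endpoint
                .retrieve()
                .bodyToMono(String.class),
                10)                           // at most 10 concurrent requests
        .subscribe(enriched -> {
            // write the enriched record to the Kafka topic here
        });

flatMap propagates back pressure upstream, so the database source above is only asked for more rows as requests complete.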
As I wrote in the title, we need one thread to notify, or execute a method of, another thread. This implementation is part of a long-polling setup. In the following text I describe and show my implementation.
So the requirements are:
UserX sends a request from the client to the server (poll action) immediately after he gets the response to the previous one. In the service, a Spring async method is executed in which the thread immediately checks a cache for new data in the database. I know a cache is usually used for methods where a specific input yields a specific output. That is not the case here, because I use the cache to reduce database calls, and the output of my method is always different. So the cache helps me store a notification telling me whether I should check the database or not. This check runs in a while loop that ends when the thread finds a notification in the cache to read the database, or when the time expires.
Assume that the UserX thread (poll action) is currently in the while loop, checking the cache.
At that moment, UserY (push action) sends some data to the server; the data is stored in the database in a separate thread, and the userId of the recipient is also stored in the cache.
So when UserX checks the cache, he finds the id of the recipient (which equals his own id in this case), breaks the loop, and fetches the data.
So in my implementation I use the Google Guava cache, which allows manual writes.
private static Cache<Long, Long> cache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build();
In the create method I store the id of the user who should read the data.
public void create(Data data) {
    dataRepository.save(data);
    // Guava's Cache has no save() and rejects null values,
    // so put any non-null marker; the poller only checks presence.
    cache.put(data.getRecipient(), System.currentTimeMillis());
    System.out.println("SAVED " + data.getRecipient() + " in " + Thread.currentThread().getName());
}
and here is the method that polls for data:
@Async
public CompletableFuture<List<Data>> pollData(Long previousMessageId, Long userId) throws InterruptedException {
    // check the db first; if there is new data there is no need to loop and wait
    List<Data> data = findRecent(previousMessageId, userId);
    // data not found, so enter the loop for some time
    if (data.size() == 0) {
        short c = 0;
        while (c < 100) {
            // check if new data was added; if yes, break the loop
            if (cache.getIfPresent(userId) != null) {
                break;
            }
            c++;
            Thread.sleep(1000);
            System.out.println("SEQUENCE: " + c + " in " + Thread.currentThread().getName());
        }
        // check the database at the end of the loop or after breaking out of it
        data = findRecent(previousMessageId, userId);
    }
    // clear the cache entry for that recipient and return the result
    cache.invalidate(userId); // Guava's Cache has invalidate(key), not clear(key)
    return CompletableFuture.completedFuture(data);
}
After UserX gets the response, he sends a poll request again and the whole process repeats.
Can you tell me whether this application design for long polling in Java (Spring) is correct, or whether a better way exists? The key point is that when a user makes a poll request, the request should be held open waiting for new data for some time, not answered immediately. The solution shown above works, but the question is whether it will also work for many users (1000+). I worry about it because the paused threads could slow down other requests once no threads are available in the pool. Thanks in advance for your effort.
Check out WebSockets. Spring supports them from version 4 onwards. They don't require the client to initiate polling; instead the server pushes data to the client in real time.
Check the below:
https://spring.io/guides/gs/messaging-stomp-websocket/
http://www.baeldung.com/websockets-spring
Note - WebSockets open a persistent connection between client and server, and thus may use more resources with a large number of users. So, if you are not looking for real-time updates and are fine with some delay, then polling might be a better approach. Also, not all browsers support WebSockets.
Web Sockets vs Interval Polling
Longpolling vs Websockets
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
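If you do go the WebSocket route, a minimal sketch of enabling STOMP messaging in Spring 4+ could look like this (the endpoint path and destination prefixes are assumptions):

import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        config.enableSimpleBroker("/topic");              // server pushes to subscribers here
        config.setApplicationDestinationPrefixes("/app"); // client-to-server messages
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/updates").withSockJS();    // SockJS fallback for older browsers
    }
}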
In your current approach, if you are concerned about the large number of threads running on the server for multiple users, you can trigger the polling from the front end every time instead. That way only short-lived request threads are triggered from the UI, looking for any update in the cache. If there is an update, another call can be made to retrieve the data. However, don't hit the server every other second as you are doing now, or you will have high CPU utilization and user request threads may also suffer. You should do some optimization on your timing.
Instead of hitting the cache after a delay of 1 sec for 100 times, you can apply an intelligent algorithm by analyzing the pattern of cache/DB update over a period of time.
By knowing the pattern, you can trigger the polling in an exponential back off manner to hit the cache when the update is most likely expected. This way you will be hitting the cache less frequently and more accurately.
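For illustration, a minimal sketch of such an exponential back-off loop, reusing the cache and userId from the question (the starting delay, the cap, and the deadline are assumptions):

long delay = 250;                                  // start with a short wait (ms)
final long maxDelay = 8_000;                       // cap the back-off
final long deadline = System.currentTimeMillis() + 60_000;

while (System.currentTimeMillis() < deadline) {
    if (cache.getIfPresent(userId) != null) {
        break;                                     // update found, stop waiting
    }
    Thread.sleep(delay);
    delay = Math.min(delay * 2, maxDelay);         // back off exponentially
}

Compared with the fixed one-second loop, this checks frequently right after the request arrives, when an update is most likely, and rarely afterwards.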
channel.basicQos(1);
long deliveryTag = 0;
while (true) {
    GetResponse res = channel.basicGet(TEST_QUEUE, false);
    if (res != null) {
        deliveryTag = res.getEnvelope().getDeliveryTag();
    }
    // handle all messages if the condition is true
    if (condition) {
        // nack all previously unhandled messages
        channel.basicNack(deliveryTag - 1, true, true);
        // ack the current message only
        channel.basicAck(deliveryTag, false);
    } else {
        // do not handle the current message and continue to get the next one
    }
}
Q1.
I'm not sure if I can use both nack and ack at the same time.
Can I use deliveryTag - 1 to indicate all previous messages?
In short, I want to skip all messages which do not meet the if condition.
If the current message meets the condition, then nack all skipped messages and ack the current one.
By doing this, I want to delay handling some particular messages.
Q2.
I'm afraid that if I write while (true) and there are multiple workers running, channel.basicQos(1) will not work as expected.
Should I write code like this to limit the count? Or how should I write it to ensure that all workers can get messages evenly?
int prefetch = 1;
int count = 0;
while (count++ <= prefetch) {
}
Q3.
I've noticed the worker program will not terminate as long as the connection is open.
How long will the connection be open, and do I need to close it manually?
Finally,
RabbitMQ Java client API vs. AmqpTemplate vs. RabbitTemplate: which one is more suitable in this case (not using the MessageListener (ChannelAwareMessageListener) model)?
Q1 - it should work ok. Have you tried it and found problems? Yes, the tag is incremented for each delivery.
Q2 - basicQos has no bearing on basicGet() - it's only used with basicConsume().
Q3 - You need to close the connection when you are complete.
Finally; it depends. If you want Spring's higher level support (message conversion etc), then use it; if you want to deal with the raw data API, don't use Spring.
The RabbitTemplate doesn't directly support basicGet with user managed acks/nacks, except via its execute method with a channel callback.
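For illustration, a hedged sketch of what that execute/ChannelCallback route could look like (the queue name is an assumption):

Boolean handled = rabbitTemplate.execute(channel -> {
    GetResponse res = channel.basicGet("test.queue", false); // manual-ack basicGet
    if (res == null) {
        return false; // queue is empty
    }
    long tag = res.getEnvelope().getDeliveryTag();
    // ... inspect res.getBody() and decide whether to ack or nack ...
    channel.basicAck(tag, false);
    return true;
});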
While querying ES extensively, I get
Failed to execute [org.elasticsearch.action.search.SearchRequest#59e634e2] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23#75bd024b
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:79)
at org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:551)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:228)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:83)
on a quite regular basis.
My plan, now, is to pause the query requests until the queue load is lower than x. You can query the client for its stats
client.admin().cluster().threadPool().stats().iterator();
But since my client is not a data node (I presume that's why), I get queue=0 back, while the server node throws the above error.
I know why this gets thrown, and I know how to update the setting, but that just postpones this error, and creates others...
How do I ask the cluster nodes what their queue load is?
PS: I'm using the Java Api
What I've tried, without the requested result (a blank line indicates another attempt, unless otherwise specified):
// Nodes stats
final NodesStatsResponse nodesStatsResponse = client.admin().cluster().prepareNodesStats().execute().actionGet();
final NodeStats nodeStats = nodesStatsResponse.getNodes()[0];
final String nodeId = nodeStats.getNode().getId(); // need this later on

// same as before, but with an explicit NodesStatsRequest (with id)
final NodesStatsResponse response = client.admin().cluster().nodesStats(new NodesStatsRequest(nodeId)).actionGet();
final NodeStats[] nodeStats2 = response.getNodes();
for (NodeStats nodeStats3 : nodeStats2) {
    Stats stats = nodeStats3.getThreadPool().iterator().next();
}

// Cluster?
final ClusterStatsRequest clusterStatsRequest = new ClusterStatsRequestBuilder(client.admin().cluster()).request();
final ClusterStatsResponse clusterStatsResponse = client.admin().cluster().clusterStats(clusterStatsRequest).actionGet();
final ClusterStatsNodes clusterStatsNodes = clusterStatsResponse.getNodesStats();

// Nodes info?
final NodesInfoResponse infoResponse = client.admin().cluster().nodesInfo(new NodesInfoRequest(nodeId)).actionGet(); // here
final NodeInfo[] nodeInfos = infoResponse.getNodes();
for (final NodeInfo nodeInfo : nodeInfos) {
    final ThreadPoolInfo info = nodeInfo.getThreadPool();
    final Iterator<Info> infoIterator = info.iterator();
    while (infoIterator.hasNext()) {
        final Info realInfo = infoIterator.next();
        SizeValue sizeValue = realInfo.getQueueSize();
        // if null, skip (was expecting a NullPointerException here, but the thread just disappeared)
        if (sizeValue == null)
            continue;
        // normal queue size, no load (oddly found 1000 (expected), and one of 200 on one node?)
        final long queueSize = sizeValue.getSingles();
    }
}
The issue is that some of the processes need to be served instantly (e.g. user requests), whereas others may wait if the database is too busy (background processes). Preferably, I'd assign a certain share of the queue to processes serving immediate requests, and the other share to background processes (but I haven't seen such an option).
Update
It appears, which I didn't expect, that you can overload the queue with a single bulk query, when the total number of separate searches exceeds 1000 (with x shards, or x indices, each search fans out into x tasks, so 1000/x searches fill the queue). So bulking is not an option, unless you can make it a single query. So when you target 700 search results at once (taking the above statement into account), you need to know whether more than 300 items already reside in the queue, because then it will start throwing.
To sum up:
Assume the load per call is the maximum bulk request, so I cannot combine requests. How, then, can I start pausing requests before Elasticsearch starts throwing the exception stated above, so that I can pause one part of my application but not the other? If I know the queue is, say, half full, the background process must sleep for some time. How do I know the (approximate) queue load?
The way you are trying to look at the queue usage is wrong, as you are not looking at the correct statistics.
Have a look at this piece of code:
final NodesStatsResponse response = client.admin().cluster().prepareNodesStats().setThreadPool(true).execute().actionGet();
final NodeStats[] nodeStats2 = response.getNodes();
for (NodeStats nodeStats3 : nodeStats2) {
    ThreadPoolStats stats = nodeStats3.getThreadPool();
    if (stats != null) {
        for (ThreadPoolStats.Stats threadPoolStat : stats) {
            System.out.println("node `" + nodeStats3.getNode().getName() + "` has pool `"
                    + threadPoolStat.getName() + "` with current queue size " + threadPoolStat.getQueue());
        }
    }
}
First of all, you need setThreadPool(true) to get the thread pool statistics back; otherwise it will be null.
Secondly, you need ThreadPoolStats, not ThreadPoolInfo, which is for thread pool settings.
So, it's your second attempt, but incomplete. The 1000 you were seeing was the setting itself (the max queue size), not the actual load.
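Building on that, a hedged sketch of how this statistic could drive the pausing you described; the pool name "search" and the threshold are assumptions, and the classes are the same ones used in the snippet above:

// Returns true if any node's search pool queue is above the threshold,
// in which case background work should back off for a while.
boolean searchQueueBusy(Client client, int threshold) {
    final NodesStatsResponse response = client.admin().cluster()
            .prepareNodesStats().setThreadPool(true).execute().actionGet();
    for (NodeStats node : response.getNodes()) {
        for (ThreadPoolStats.Stats stats : node.getThreadPool()) {
            if ("search".equals(stats.getName()) && stats.getQueue() > threshold) {
                return true;
            }
        }
    }
    return false;
}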
I'm hoping this is not the answer; source: https://www.elastic.co/guide/en/elasticsearch/guide/current/_monitoring_individual_nodes.html#_threadpool_section:
Bulk Rejections
If you are going to encounter queue rejections, it will most likely be
caused by bulk indexing requests. It is easy to send many bulk
requests to Elasticsearch by using concurrent import processes. More
is better, right?
In reality, each cluster has a certain limit at which it can not keep
up with ingestion. Once this threshold is crossed, the queue will
quickly fill up, and new bulks will be rejected.
This is a good thing. Queue rejections are a useful form of back
pressure. They let you know that your cluster is at maximum capacity,
which is much better than sticking data into an in-memory queue.
Increasing the queue size doesn’t increase performance; it just hides
the problem. If your cluster can process only 10,000 docs per second,
it doesn’t matter whether the queue is 100 or 10,000,000—your cluster
can still process only 10,000 docs per second.
The queue simply hides the performance problem and carries a real risk
of data-loss. Anything sitting in a queue is by definition not
processed yet. If the node goes down, all those requests are lost
forever. Furthermore, the queue eats up a lot of memory, which is not
ideal.
It is much better to handle queuing in your application by gracefully
handling the back pressure from a full queue. When you receive bulk
rejections, you should take these steps:
1. Pause the import thread for 3–5 seconds.
2. Extract the rejected actions from the bulk response, since it is probable that many of the actions were successful. The bulk response will tell you which succeeded and which were rejected.
3. Send a new bulk request with just the rejected actions.
4. Repeat from step 1 if rejections are encountered again.
Using this procedure, your code naturally adapts to the load of your cluster and naturally backs off.
Rejections are not errors: they just mean you should try again later.
Particularly the part "When you receive bulk rejections, you should take these steps" I don't like. We should be able to handle oncoming problems beforehand.
I have an application that makes HTTP requests to a site, then retrieves the responses, inspects them, and if they contain specific keywords, writes both the HTTP request and response to an XML file. The application uses a spider to map out all the URLs of a site and then sends requests (each URL in the sitemap is fed to a separate thread that sends the request). This way I won't know when all the requests have been sent. At the end of all requests I want to convert the XML file to some other format. So in order to find out when the requests have ended, I use the following strategy:
I store the time of each request in a variable (when a new request is sent at a time later than the time in the variable, the variable is updated). I also start a thread to monitor this time, and if the difference between the current time and the time in the variable is more than 1 min, I know that the sending of requests has ceased. I use the following code for this purpose:
class monitorReq implements Runnable {
    Thread t;

    monitorReq() {
        t = new Thread(this);
        t.start();
    }

    public void run() {
        while ((new Date().getTime() - last_request.getTime() < 60000)) {
            try {
                Thread.sleep(30000); // sleep for 30 secs before checking again
            } catch (InterruptedException e) { // Thread.sleep throws InterruptedException, not IOException
                e.printStackTrace();
            }
        }
        System.out.println("Last request happened 1 min ago at : " + last_request.toString());
        // call method for conversion of file
    }
}
Is this approach correct? Or is there a better way in which I can implement the same thing?
Your current approach is not reliable. You will run into race conditions if one thread updates the time while the other thread reads it at the same time. It will also be difficult to coordinate the processing of requests across multiple threads. And you are assuming that a task finishes within 60 seconds.
The following are better approaches.
If you know the number of requests you are going to make beforehand, you can use a CountDownLatch:
public static void main(String[] args) throws InterruptedException {
    int noOfRequests = ..; // known beforehand
    final CountDownLatch doneSignal = new CountDownLatch(noOfRequests);
    // spawn threads or use an executor service to perform the downloads
    for (int i = 0; i < noOfRequests; i++) {
        new Thread(new Runnable() {
            public void run() {
                // perform the download
                doneSignal.countDown();
            }
        }).start();
    }
    doneSignal.await(); // this will block till all threads are done
}
If you don't know the number of requests beforehand, then you can use an ExecutorService to perform the downloads / processing using a thread pool:
public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newCachedThreadPool();
    while (moreRequests) {
        executor.execute(new Runnable() {
            public void run() {
                // perform processing
            }
        });
    }
    // finished submitting all requests for processing; wait for completion
    executor.shutdown(); // the method is shutdown(), not shutDown()
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
}
General notes:
classes in Java should start with Capital Letters
there seems to be no synchronization between your threads; access to last_request should probably be synchronized
using System.currentTimeMillis() would save you some object-creation overhead
swallowing an exception like this is not a good practice
Answer:
Your way of doing it is acceptable. There is not much busy waiting, and the idea is as simple as it gets. Which is good.
I would consider lowering the wait time; there is so little data that even running this loop every second will not take much processing power, and it will certainly improve the reaction time of your app.