KafkaProducer: Difference between `callback` and returned `Future`? - java

The KafkaProducer send method both returns a Future and accepts a Callback.
Is there any fundamental difference between using one mechanism over the other to execute an action upon completion of the sending?

Looking at the documentation you linked to, it seems the main difference between the Future and the Callback lies in who initiates the "request is finished, what now?" question.
Let's say we have a customer C and a baker B, and C asks B to make him a nice cookie. There are two possible ways the baker can return the delicious cookie to the customer.
Future
The baker accepts the request and tells the customer: Ok, when I'm finished I'll place your cookie here on the counter. (This agreement is the Future.)
In this scenario, the customer is responsible for checking the counter (Future) to see if the baker has finished his cookie or not.
blocking
The customer stays near the counter and looks at it until the cookie is put there (Future.get()) or the baker puts an apology there instead (Error : Out of cookie dough).
non-blocking
The customer does some other work, and once in a while checks if the cookie is waiting for him on the counter (Future.isDone()). If the cookie is ready, the customer takes it (Future.get()).
Callback
In this scenario the customer, after ordering his cookie, tells the baker: When my cookie is ready please give it to my pet robot dog here, he'll know what to do with it (This robot is the Callback).
Now, when the cookie is ready, the baker gives it to the dog and tells him to run back to its owner. The baker can continue baking the next cookie for another customer.
The dog runs back to the customer and starts wagging its artificial tail to make the customer aware that his cookie is ready.
Notice how the customer didn't have any idea when the cookie would be given to him, nor was he actively polling the baker to see if it was ready.
That's the main difference between the two scenarios: who is responsible for initiating the "your cookie is ready, what do you want to do with it?" question. With the Future, the customer is responsible for checking when it's ready, either by actively waiting or by polling every now and then. In the case of the callback, the baker will call back to the provided function.
I hope this answer gives you a better insight into what a Future and a Callback actually are. Once you get the general idea, you could try to find out on which thread each specific thing is handled, when a thread is blocked, or in what order everything completes. Writing some simple programs that print statements like "main client thread: cookie received" could be a fun way to experiment with this.
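Taking up that suggestion, here is a minimal sketch of the two styles using only java.util.concurrent, no Kafka required. All names (Bakery, orderCookie, the "robot dog" callback) are invented for the analogy:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

public class Bakery {
    private final ExecutorService baker = Executors.newSingleThreadExecutor();

    // Future style: the customer holds a Future and decides when to check it.
    public Future<String> orderCookie() {
        return baker.submit(() -> "cookie");
    }

    // Callback style: the baker's own thread hands the result to the "robot dog".
    public void orderCookie(BiConsumer<String, Exception> robotDog) {
        baker.submit(() -> {
            try {
                robotDog.accept("cookie", null);   // baker initiates delivery
            } catch (Exception e) {
                robotDog.accept(null, e);          // or delivers the apology instead
            }
        });
    }

    public void close() { baker.shutdown(); }
}
```

With the Future variant, the calling thread blocks on get() (or polls isDone()); with the callback variant, the baker's thread delivers the cookie while the customer keeps working.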

The asynchronous approach
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata rm, Exception ex) { ... }
});
gives you better throughput compared to the synchronous
RecordMetadata rm = producer.send(record).get();
since you don't wait for acknowledgements in the first case.
Also, in the asynchronous approach ordering is not guaranteed, whereas in the synchronous one it is - each message is sent only after the previous acknowledgement is received.
Another difference is that with a synchronous call you can stop sending messages immediately after an exception occurs, whereas with an asynchronous call some messages will already have been sent before you discover that something is wrong and take action.
Also note that in the asynchronous approach the number of messages that are "in flight" is controlled by the max.in.flight.requests.per.connection parameter.
Apart from the synchronous and asynchronous approaches you can use the fire-and-forget approach, which is almost the same as the synchronous one but without processing the returned metadata - just send the message and hope it reaches the broker (knowing that it most likely will, and that the producer will retry in case of recoverable errors). There is still a chance that some messages will be lost:
producer.send(record); // the returned Future<RecordMetadata> is ignored
To summarize:
Fire and Forget - fastest one, but some messages could be lost;
Synchronous - slowest, use it if you cannot afford to lose messages;
Asynchronous - something in between.
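To see the three call styles side by side without a running broker, here is a sketch against a hypothetical FakeProducer that mimics the shape of KafkaProducer.send; the class and its "ack:" metadata are stand-ins, not Kafka API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical stand-in for KafkaProducer so the three styles can run locally.
public class FakeProducer {
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    // Fire and forget / synchronous: caller decides whether to block on the Future.
    public Future<String> send(String record) {
        return send(record, null);
    }

    // Asynchronous: the I/O thread invokes the callback once "acknowledged".
    public Future<String> send(String record, BiConsumer<String, Exception> callback) {
        return io.submit(() -> {
            String metadata = "ack:" + record;          // pretend broker acknowledgement
            if (callback != null) callback.accept(metadata, null);
            return metadata;
        });
    }

    public void close() { io.shutdown(); }
}
```

Usage mirrors the summary: send("r1") with the Future ignored is fire and forget, send("r2").get() is synchronous, and send("r3", callback) is asynchronous.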

The main difference is whether you want to block the calling thread waiting for the acknowledgment.
The following, using the Future.get() method, blocks the current thread until the send completes, before performing some action.
producer.send(record).get()
// Do some action
When using a Callback to perform some action, the code executes in the I/O thread, so it's non-blocking for the calling thread.
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        // Do some action
    }
});
Though the docs say it will only 'generally' execute in the producer's I/O thread:
Note that callbacks will generally execute in the I/O thread of the producer and so should be reasonably fast or they will delay the sending of messages from other threads. If you want to execute blocking or computationally expensive callbacks it is recommended to use your own Executor in the callback body to parallelize processing.
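Following the docs' advice, here is a sketch of handing the expensive part of a callback off to your own Executor; the class and method names are made up for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallbackOffload {
    // Our own pool for expensive work, so the producer's I/O thread is not delayed.
    private static final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Sketch of a callback body: do only cheap bookkeeping inline, then push the
    // heavy processing onto the worker pool and return immediately.
    public static Future<String> handleAck(String metadata) {
        return workers.submit(() -> "processed " + metadata);
    }
}
```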

My observations based on The Kafka Producer documentation:
Future gives you access to synchronous processing
A Future might not guarantee acknowledgement; my understanding is that a Callback executes only after acknowledgement
Callback gives you access to fully non-blocking asynchronous processing.
There are also guarantees on the ordering of execution for a callback on the same partition
Callbacks for records being sent to the same partition are guaranteed
to execute in order.
My other opinion is that the Future return object and the Callback 'pattern' represent two different programming styles, and I think this is the fundamental difference:
The Future represents Java's Concurrency Model Style.
The Callback represents Java's Lambda Programming Style (because Callback actually satisfies the requirement for a Functional Interface)
You can probably end up coding similar behaviors with both the Future and the Callback styles, but in some use cases one style might be more advantageous than the other.
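To illustrate the functional-interface point: any single-abstract-method interface with the same shape as Kafka's Callback (one onCompletion method) can be implemented either as an anonymous class or as a lambda. The Ack interface below is a stand-in, not Kafka's actual type:

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackStyles {
    // Same single-abstract-method shape as Kafka's producer Callback.
    @FunctionalInterface
    interface Ack { void onCompletion(String metadata, Exception exception); }

    public static List<String> demo() {
        List<String> log = new ArrayList<>();

        Ack anonymous = new Ack() {                       // pre-Java-8 anonymous-class style
            @Override
            public void onCompletion(String m, Exception e) { log.add("anon:" + m); }
        };
        Ack lambda = (m, e) -> log.add("lambda:" + m);    // equivalent lambda style

        anonymous.onCompletion("ok", null);
        lambda.onCompletion("ok", null);
        return log;
    }
}
```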

send() is the method that starts publishing a message to the Kafka cluster. The send() method is an asynchronous call: it accumulates the message in a buffer and returns immediately. This can be combined with linger.ms to batch-publish messages for better performance. We can handle exceptions and control the flow either synchronously, by calling get() on the returned Future, or asynchronously, with a callback.
Each method has its own pros and cons and can be decided based on use cases.
Asynchronous send (Fire & Forget):
We call the send method as below to publish a message without waiting for any success or error response.
producer.send(new ProducerRecord<String, String>("topic-name", "key", "value"));
In this scenario the producer does not wait for one message to complete before sending the next. In case of an exception the producer retries based on the retries config parameter, but if the message still fails after retrying, the producer never knows about it. We may lose some messages this way, but if a little message loss is acceptable, this approach provides high throughput and low latency.
Synchronous send
A simple way to send a message synchronously is to use the get() method:
RecordMetadata recMetadata = producer.send(new ProducerRecord<String, String>("topic-name", "key", "value")).get();
producer.send() returns a Future of RecordMetadata, and calling its get() method waits for the reply from Kafka: an exception in case of an error, or the RecordMetadata in case of success. RecordMetadata contains the offset, partition, and timestamp, which can be logged. It's slow, but gives high reliability and a guarantee that the message was delivered.
Asynchronous send with callback
We can also call the send() method with a callback function, which receives a response once the message completes. This is good if you want to send messages asynchronously - not waiting for the job to complete, but still handling errors or updating the status of message delivery.
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception ex) { ... }
});
Note: Please don't confuse acks & retries with the asynchronous send call. acks and retries apply to every send call, whether synchronous or asynchronous; the only difference is how you handle the returned metadata and the failure scenarios. For example, with an asynchronous send the acks and retries rules still apply, but on an independent thread, without blocking other threads from sending records in parallel. The only challenge is that we won't be aware of failures, or of when a message completed successfully, until the callback fires.
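The configuration keys mentioned in this answer (acks, retries, max.in.flight.requests.per.connection) are set in the producer's Properties; the values below are only illustrative:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties props() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative broker address
        props.put("acks", "all");      // wait for all in-sync replicas to acknowledge
        props.put("retries", "3");     // retried on recoverable errors, sync or async
        props.put("max.in.flight.requests.per.connection", "1");  // preserve ordering across retries
        return props;
    }
}
```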

Related

DSL Integration Flows with retry mechanism and how it works

I have implemented a retry mechanism which works well based on the following:
https://github.com/spring-projects/spring-integration-samples/issues/237
The application consumes events from kafka, transforms those events and sends them as an HTTP request to a remote service, so it's in the integration flow that sends the HTTP request where the retry mechanism is implemented.
I was worried about sending the requests to the remote service in the same order as they come in from Kafka during a temporary failure (network glitch), to avoid one request overriding another, but fortunately it looks like the order is kept - keep me honest here.
It seems that during the retry process all events coming in are "put on hold" and once the remote service is back up before the last try, all events are sent.
I would like to know two things here:
Am I correct with my assumption? Is this how the retry mechanism works by default?
I'm worried about the events getting back (or stack) up due to the amount of time it takes to finish the current flow execution. Is there something here I should take into consideration?
I think I might use an ExecutorChannel so that events could get processed in parallel, but by doing that I wouldn't be able to keep the order of the events.
Thanks.
Your assumption is correct. The retry is done within the same thread, and that thread is blocked from handling the next event until the send succeeds or the retries are exhausted. It really does happen on the same Kafka consumer thread, so no new records are pulled from the topic until the retry is done.
It would not be a correct architecture to shift this logic onto a new thread, e.g. using an ExecutorChannel, since Kafka is based on offset commits, which cannot be done out of order.
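A minimal sketch of the blocking, in-thread retry behavior described above (the withRetry name and shape are hypothetical, not Spring Integration API):

```java
import java.util.concurrent.Callable;

public class BlockingRetry {
    // Retry in the calling thread: the next event cannot be processed until this
    // one succeeds or the attempts are exhausted, mirroring the consumer-thread retry.
    public static <T> T withRetry(int maxAttempts, long backoffMs, Callable<T> action)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) Thread.sleep(backoffMs);  // simple fixed backoff
            }
        }
        throw last;
    }
}
```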

Waiting for a HTTP request in middle of the main thread

I have a queue and a consumer written in Java for it. After consuming, we execute an HTTP call to a downstream partner; this is a one-way asynchronous call. After this request, the downstream partner sends an HTTP request back to our system with the response to the initial asynchronous call. This response is needed by the same thread that executed the initial asynchronous call, which means we need to expose an endpoint the downstream system can call to send the response back. I would like to know how I can implement a requirement like this.
PS: We could also receive the same response via a different web service and update a database row with it. But I'm not sure how to stop the main thread and watch that database row until the response is needed.
Hope you understood what I want with this requirement.
My response is based on some assumptions. (I didn't wait for your response to my comment, since I found the problem had some other interesting features anyhow.)
the downstream partner will send an HTTP request back to our system
This necessitates that you have a listening port (ie, a server) running on this side. This server could be in the same JVM or a different one. But...
This response is needed for the same thread
This is a little confusing, because at a high level it is usually not the thread itself that we want to reuse programmatically, but the object (no matter on which thread). To reuse threads, you may consider using an ExecutorService. I have tried to depict what you might do in this diagram.
Here are the steps:
"Queue Item Consumer" consumes item from the queue and sends the request to the downstream system.
This instance of the "Queue Item Consumer" is cached for handling the request from the downstream system.
There is a listener running at some port within the same JVM to which the downstream system sends its request.
The listener forwards this request to the "right" cached instance of the "Queue Item Consumer" (you have to figure out a way to do this based on your caching mechanism). Maybe some header has to be present in the request from the downstream system to identify the right handler on this side.
Hope this works for you.
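Step 2's caching of the "right" instance can be sketched as a correlation map keyed by some id carried in a header of the downstream system's request (all names here are hypothetical):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ResponseCorrelator {
    private final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Called by the queue-item consumer right after sending the downstream request.
    public CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    // Called by the listener when the downstream system's HTTP request arrives;
    // the correlation id would come from a header on that request.
    public void onDownstreamResponse(String correlationId, String body) {
        CompletableFuture<String> future = pending.remove(correlationId);
        if (future != null) future.complete(body);
    }
}
```

The consumer then blocks (or continues asynchronously) on the returned future instead of on a thread.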

Returning synchronous message from service, but then doing asynchronous processing - concern about hanging threads?

Essentially I've written a service in Java that will do initial synchronous processing (a couple simple calls to other web services). Then, after that processing is done, I return an acknowledgement message to the caller, saying I've verified their request and there is now downstream processing happening in the background asynchronously.
In a nutshell, what I'm concerned about is the complexity of the async processing. The sum of those async calls can take up to 2-3 minutes depending on certain parameters sent. My thought here is: what if there's a lot of traffic at once hitting my service, and there are a bunch of hanging threads in the background, doing a large chunk of processing. Will there be bad data as a result? (like one request getting mixed in with a previous request etc)
The code follows this structure:
Validation of headers and params in body
Synchronous processing
Return acknowledgement message to the caller
Asynchronous processing
For #4, I've simply made a new thread and call a method that does all the async processing within it. Like:
new Thread() {
    @Override
    public void run() {
        try {
            makeDownstreamCalls(arg1, arg2, arg3, arg4);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}.start();
I'm basically wondering about unintended consequences of lots of traffic hitting my service. An example I'm thinking about: a thread executing downstream calls for request A, and then another request comes in, and a new thread has to be made to execute downstream calls for request B. How is request B handled in this situation, and what happens to request A, which is still in-progress? Will the async calls in request A just terminate in this case? Or can each distinct request, and thread, execute in parallel just fine and complete, without any strange consequences?
Well, the answer depends on your code, of which you posted a small part, so my answer contains some guesswork. I'll assume that we're talking about some sort of multi-threaded server which accepts client requests, and that those request come to some handleRequest() method which performs the 4 steps you've mentioned. I'll also assume that the requests aren't related in any way and don't affect each other (so for instance, the code doesn't do something like "if a thread already exists from a previous request then don't create a new thread" or anything like that).
If that's the case, then your handleRequest() method can be simultaneously invoked by different server threads concurrently. And each will execute the four steps you've outlined. If two requests happen simultaneously, then a server thread will execute your handler for request A, and a different one will execute it for B at the same time. If during the processing of a request, a new thread is created, then one will be created for A, another for B. That way, you'll end up with two threads performing makeDownstreamCalls(), one with A's parameters one with B's.
In practice, that's probably a pretty bad idea. The more threads your program will create, the more context-switching the OS has to do. You really don't want the number of requests to increase the number of threads in your application endlessly. Modern OSes are capable of handling hundreds or even thousands of threads (as long as they're bound by IO, not CPU), but it comes at a cost. You might want to consider using a Java executor with a limited number of threads to avoid crushing your process or even OS.
If there's too much load on a server, you can't expect your application to handle all of it. Process what you can within the limits of the application and reject further requests; deliberately rejecting requests when you're fully loaded is known as "load shedding". Accepting every request when overloaded just means your application crashes and none of the requests get processed.
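A sketch of the bounded-executor idea: a fixed pool with a bounded queue that rejects submissions when saturated, instead of creating a new thread per request (the class name is invented):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedAsyncRunner {
    // Four worker threads and a backlog of at most 100 tasks. Under overload,
    // AbortPolicy makes submit throw RejectedExecutionException (load shedding)
    // rather than letting the thread count grow without bound.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),
            new ThreadPoolExecutor.AbortPolicy());

    public Future<?> submitDownstreamCalls(Runnable task) {
        return pool.submit(task);
    }

    public void shutdown() { pool.shutdown(); }
}
```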

Async API design client

Let's say I create an async REST API in Spring MVC with Java 8's CompletableFuture.
How is this called from the client? If it's non-blocking, does the endpoint return something before processing? I.e.
@RequestMapping("/") // GET method
public CompletableFuture<String> meth() throws InterruptedException {
    Thread.sleep(10000);
    String result = "lol";
    return CompletableFuture.completedFuture(result);
}
How does this exactly work? (The code above is just something I made up on the spot.)
When I send a GET request from, say, Google Chrome at localhost:3000/, then what happens? I'm a newbie to async APIs and would like some help.
No, the client doesn't know it's asynchronous. It'll have to wait for the result normally. It's just the server side that benefits from freeing up a worker thread to handle other requests.
In this version it's pointless, because CompletableFuture.completedFuture() creates a completed Future immediately.
However in a more complex piece of code, you might return a Future that is not yet complete. Spring will not send the response body until some other thread calls complete() on this Future.
Why not just use a new thread? Well, you could - but in some situations it might be more efficient not to. For example you might put a task into an Executor to be handled by a small pool of threads.
Or you might fire off a JMS message asking for the request to be handled by a completely separate machine. A different part of your program will respond to incoming JMS messages, find the corresponding Future and complete it. There is no need for a thread dedicated to this HTTP request to be alive while the work is being done on another system.
Very simple example:
@RequestMapping("/employeenames/{id}")
public CompletableFuture<String> getName(@PathVariable String id) {
CompletableFuture<String> future = new CompletableFuture<>();
database.asyncSelect(
name -> future.complete(name),
"select name from employees where id = ?",
id
);
return future;
}
I've invented a plausible-ish API for an asynchronous database client here: asyncSelect(Consumer<String> callback, String preparedstatement, String... parameters). The point is that it fires off the query, then does not block the thread waiting for the DB to respond. Instead it leaves a callback (name -> future.complete(name)) for the DB client to invoke when it can.
This is not about improving API response times -- we do not send an HTTP response until we have a payload to provide. This is about using the resources on the server more efficiently, so that while we're waiting for the database to respond it can do other things.
There is a related, but different concept, of asynch REST, in which the server responds with 202 Accepted and a header like Location: /queue/12345, allowing the client to poll for the result. But this isn't what the code you asked about does.
CompletableFuture was introduced by Java to make complex asynchronous programming easier to handle. It lets the programmer combine and cascade async calls, and offers the static utility methods runAsync and supplyAsync to abstract away the manual creation of threads.
These methods dispatch tasks to Java's common thread pool by default, or to a custom thread pool if one is provided as an optional argument.
If a CompletableFuture is returned by an endpoint method and complete() is never called on it, the request will hang until it times out.
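A small sketch of the two supplyAsync variants mentioned above, one on the common pool and one on a custom Executor:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SupplyAsyncDemo {
    public static String fetch() throws Exception {
        // Runs on the common ForkJoinPool by default...
        CompletableFuture<String> common = CompletableFuture.supplyAsync(() -> "lol");

        // ...or on a custom pool when one is passed as the optional second argument.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<String> custom = CompletableFuture.supplyAsync(() -> "lol", pool);

        String result = common.get() + ":" + custom.get();
        pool.shutdown();
        return result;
    }
}
```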

After creating SQS message, how can I monitor when it gets deleted?

My program creates a message in a SQS queue and then needs to wait for one of the workers pulling work on the queue to process it. I want to monitor the status of a message to determine when it gets deleted, since that would be my indicator that the work is done. But I can't figure out a way to do this with the SQS API.
SendMessageRequest msgRequest = new SendMessageRequest(SQS_QUEUE_URL, messageBody);
SendMessageResult result = sqsClient.sendMessage(msgRequest);
String msgId = result.getMessageId();
// so, in theory, this is what I WANT to do...
while (!sqsClient.wasThisMessageDeletedYet(msgId)) {
    Thread.sleep(1000L);
}
// continue, confident that because the message was deleted, I can rely upon the fact that the result of the Worker is now stashed where it's supposed to be
What's the right way to do "wasThisMessageDeletedYet(id)"?
I'm afraid such an API endpoint doesn't exist; looking at the API reference (http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/Welcome.html), you could see that there are no methods for querying messages.
Maybe you could try "change message visibility", but that:
has side effects
requires the receipt handle, which you only obtain when receiving the message
So I suppose your best bet is to store that state in some external database (if you want to stay in Amazon land, maybe DynamoDB?), with a simple message id -> boolean mapping indicating whether each message has been processed.
Another (but similar) option is for the consumer to publish its status to a response queue. The wait will have to be done asynchronously (via a Future, perhaps).
Obviously there is overhead in processing, as well as programming complexity, due to the asynchronous nature of the interactions. But typically it is done this way.
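The external-store approach can be sketched as follows; the in-memory map stands in for the real database (e.g. a DynamoDB table), and all names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WorkStatusPoller {
    // Stands in for the external store: messageId -> processed?
    private final Map<String, Boolean> statusStore = new ConcurrentHashMap<>();

    // Written by the worker once it has finished and deleted the SQS message.
    public void markProcessed(String messageId) {
        statusStore.put(messageId, Boolean.TRUE);
    }

    // Poll until the worker has flagged the message, or give up at the deadline.
    public boolean waitUntilProcessed(String messageId, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (Boolean.TRUE.equals(statusStore.get(messageId))) return true;
            Thread.sleep(pollMs);
        }
        return false;
    }
}
```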
