I'm using the AWS SDK for Java with the buffered async SQS client (AmazonSQSBufferedAsyncClient) to batch requests and reduce costs.
When my application shuts down, I want to ensure that no messages are left waiting in the buffer, but there's no .flush() method that I can see on the client.
Does AmazonSQSBufferedAsyncClient.shutdown() flush my messages when called? I looked at the source code and it's unclear. The method calls shutdown() on each QueueBuffer it holds, but inside QueueBuffer.shutdown() it says
public void shutdown() {
    //send buffer does not require shutdown, only
    //shut down receive buffer
    receiveBuffer.shutdown();
}
Further, the documentation for .shutdown() says:
Shuts down this client object, releasing any resources that might be
held open. This is an optional method, and callers are not expected
to call it, but can if they want to explicitly release any open
resources. Once a client has been shutdown, it should not be used to
make any more requests.
For this application, I need to ensure no messages get lost while being buffered. Do I need to handle this manually using the normal AmazonSQSClient instead of the buffering/async one?
As of version 1.11.37 of the SDK, there is a configuration parameter in QueueBufferConfig just for this purpose.
AmazonSQSBufferedAsyncClient bufClient =
    new AmazonSQSBufferedAsyncClient(
        realAsyncClient,
        new QueueBufferConfig()
            .withFlushOnShutdown(true)
    );
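With that flag set, calling shutdown() on the buffered client should flush the send buffer before releasing resources. As a minimal sketch, you could tie this to JVM shutdown with a shutdown hook, reusing the bufClient built above (whether a hook fits depends on how your application terminates):

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    // With withFlushOnShutdown(true), shutdown() drains any buffered
    // sends before shutting the client down.
    bufClient.shutdown();
}));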
There is a method to explicitly flush the buffer, but it is not accessible, and I actually could not find any call to it in Amazon's code. It seems like something is missing.
When you call shutdown() on the async client, it executes the following code:
public void shutdown() {
    for (QueueBuffer buffer : buffers.values()) {
        buffer.shutdown();
    }
    realSQS.shutdown();
}
And QueueBuffer#shutdown() looks like this:
/**
 * Shuts down the queue buffer. Once this method has been called, the
 * queue buffer is not operational and all subsequent calls to it may fail
 */
public void shutdown() {
    //send buffer does not require shutdown, only
    //shut down receive buffer
    receiveBuffer.shutdown();
}
So it seems like they are intentionally not calling sendBuffer.shutdown(), which is the method that would flush every message in the buffer that has not yet been sent.
Did you find a case where you shut down the SQS client and it lost messages? It looks like they are aware of this and that case should not happen, but if you want to be sure you can call that method via reflection, which is really nasty but will satisfy your needs.
// ReflectionUtils below is Spring's org.springframework.util.ReflectionUtils
AmazonSQSBufferedAsyncClient asyncSqsClient = <your initialization code of the client>;
Field buffersField = ReflectionUtils.findField(AmazonSQSBufferedAsyncClient.class, "buffers");
ReflectionUtils.makeAccessible(buffersField);
LinkedHashMap<String, Object> buffers =
    (LinkedHashMap<String, Object>) ReflectionUtils.getField(buffersField, asyncSqsClient);
// QueueBuffer is package-private, so it has to be looked up by name
// (Class.forName throws ClassNotFoundException)
Class<?> queueBufferClass = Class.forName("com.amazonaws.services.sqs.buffered.QueueBuffer");
Field sendBufferField = ReflectionUtils.findField(queueBufferClass, "sendBuffer");
ReflectionUtils.makeAccessible(sendBufferField);
for (Object buffer : buffers.values()) {
    SendQueueBuffer sendQueueBuffer = (SendQueueBuffer) ReflectionUtils.getField(sendBufferField, buffer);
    sendQueueBuffer.flush(); // finally
}
Something like that should work, I guess. Let me know!
I need to call an upstream service (Azure Blob Service) to push data into an OutputStream, which I then need to turn around and push back to the client, through Akka. Without Akka (just servlet code), I'd get the ServletOutputStream and pass it to the Azure service's method.
The closest thing I've been able to stumble upon, and clearly this is wrong, is something like this:
Source<ByteString, OutputStream> source =
    StreamConverters.asOutputStream().mapMaterializedValue(os -> {
        blobClient.download(os);
        return os;
    });
ResponseEntity responseEntity = HttpEntities.create(ContentTypes.APPLICATION_OCTET_STREAM, preAuthData.getFileSize(), source);
sender().tell(new RequestResult(responseEntity, StatusCodes.OK), self());
The idea is that I'm calling an upstream service to get an OutputStream populated by calling
blobClient.download(os);
It seems like the lambda function gets called and returns, but then it fails afterwards because there's no data or something. As if I'm not supposed to have that lambda function do the work, but perhaps return some object that does the work? Not sure.
How does one do this?
The real issue here is that the Azure API is not designed for back-pressure. There is no way for the output stream to signal back to Azure that it is not ready for more data. To put it another way: if Azure pushes data faster than you are able to consume it, there will have to be some ugly buffer overflow failure somewhere.
Accepting this fact, the next best thing we can do is:
Use Source.lazySource to only start downloading data when there is downstream demand (i.e. the source is being run and data is being requested).
Put the download call on some other thread so that it continues executing without blocking the source from being returned. One way to do this is with a Future (I'm not sure what Java best practices are, but it should work fine either way). Although it won't matter initially, you may need to choose an execution context other than system.dispatcher - it all depends on whether download is blocking or not.
I apologize in advance if this Java code is malformed - I use Akka with Scala, so this is all from looking at the Akka Java API and Java syntax reference.
ResponseEntity responseEntity = HttpEntities.create(
    ContentTypes.APPLICATION_OCTET_STREAM,
    preAuthData.getFileSize(),
    // Wait until there is downstream demand to initialize the source...
    Source.lazySource(() -> {
        // Pre-materialize the OutputStream before the source starts running
        Pair<OutputStream, Source<ByteString, NotUsed>> pair =
            StreamConverters.asOutputStream().preMaterialize(system);
        // Start writing into the download stream in a separate thread
        Futures.future(() -> {
            blobClient.download(pair.first());
            return pair.first();
        }, system.getDispatcher());
        // Return the source - it should start running since `lazySource` indicated demand
        return pair.second();
    })
);
sender().tell(new RequestResult(responseEntity, StatusCodes.OK), self());
The OutputStream in this case is the "materialized value" of the Source, and it will only be created once the stream is run (or "materialized" into a running stream). Running it is out of your control, since you hand the Source to Akka HTTP, which will later actually run your source.
.mapMaterializedValue(matval -> ...) is usually used to transform the materialized value, but since it is invoked as part of materialization, you can use it to perform side effects such as sending the matval in a message, just like you have figured out; there isn't necessarily anything wrong with that, even if it looks funky. It is important to understand that the stream will not complete its materialization and become running until that lambda completes. This means problems if download() is blocking rather than forking the work off to a different thread and immediately returning.
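For illustration, a sketch of that side-effecting style; downloadActor here is a hypothetical actor that would run the blocking download off the materialization thread:

Source<ByteString, OutputStream> source =
    StreamConverters.asOutputStream()
        .mapMaterializedValue(os -> {
            // Side effect at materialization time: hand the OutputStream to
            // another actor instead of calling the blocking download(os) here.
            downloadActor.tell(os, ActorRef.noSender());
            return os;
        });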
There is however another solution: Source.preMaterialize(), it materializes the source and gives you a Pair of the materialized value and a new Source that can be used to consume the already started source:
Pair<OutputStream, Source<ByteString, NotUsed>> pair =
    StreamConverters.asOutputStream().preMaterialize(system);
OutputStream os = pair.first();
Source<ByteString, NotUsed> source = pair.second();
Note that there are a few additional things to think about in your code. Most importantly, if the blobClient.download(os) call blocks until it is done and you call it from the actor, you must make sure that your actor does not starve the dispatcher and stop other actors in your application from executing (see the Akka docs: https://doc.akka.io/docs/akka/current/typed/dispatchers.html#blocking-needs-careful-management ).
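As a sketch of isolating such a blocking call, one option (outside Akka's dispatchers entirely) is a small dedicated executor; the pool size here is an arbitrary assumption:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A small pool reserved for blocking Azure downloads.
ExecutorService blockingPool = Executors.newFixedThreadPool(4);

CompletableFuture.runAsync(() -> {
    // The blocking call runs here, so actor dispatcher threads stay free.
    blobClient.download(os);
}, blockingPool);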
So if I have a socket server, I can accept each socket and pass it to an executor:
while (true) {
    Socket conn = socketServ.accept();
    Runnable task = new Runnable() {
        @Override
        public void run() {
            try {
                server.executor(conn);
            } catch (IOException e) {
            }
        }
    };
    exec1.execute(task);
}
Doing this allows my server to run on multiple threads instead of blocking a single one. And because I also have a reference to that socket... called "conn", I can successfully return messages as well.
Now I have an RMI interface, which basically lets me call methods back and forth.
For example, if I had this method:
public MusicServerResponseImpl CreatePlayerlist(String Name, UserObjectImpl uo) throws RemoteException {
    MusicServerResponseImpl res = new MusicServerResponseImpl();
    return res;
}
Which returns a serializable object. My concern is that when this method gets called, I think it is going to be called on the main thread of the server, and thus will block that thread and hurt parallelism.
What I think the solution is, is to have every single RMI method also create a task for an executor... to speed up the execution of everything. The issue I am seeing, however, is that unlike the socket case, where I have an object to send information back through, I am unsure how I would return a response from the RMI method without somehow having to block the thread.
Does that make sense? Basically I am asking how I can execute in parallel with RMI methods while still being able to return results!
Thanks for the help!
Does that make sense?
No. Concurrent calls are natively supported.
See this documentation page and look for the property named maxConnectionThreads.
You could also have tested your assumption by, for example, printing the current thread name in your server code and executing concurrent calls to see what happens.
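A minimal sketch of that check, reusing the method from the question:

public MusicServerResponseImpl CreatePlayerlist(String Name, UserObjectImpl uo) throws RemoteException {
    // Concurrent calls typically arrive on separate RMI connection threads,
    // e.g. "RMI TCP Connection(2)-127.0.0.1", not on a single main thread.
    System.out.println("Handling call on thread: " + Thread.currentThread().getName());
    return new MusicServerResponseImpl();
}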
I am using the StreamObserver class found in the grpc-java project to set up some bidirectional streaming.
When I run my program, I make an undetermined number of requests to the server, and I only want to call onCompleted() on the requestObserver once I have finished making all of the requests.
Currently, to solve this, I am using a variable "inFlight" to keep track of the requests that have been issued, and when a response comes back, I decrement "inFlight". So, something like this.
// issuing requests
while (haveRequests) {
    MessageRequest request = mkRequest();
    this.requestObserver.onNext(request);
    this.inFlight++;
}
// response observer
StreamObserver<Message> responseObserver = new StreamObserver<Message>() {
    @Override
    public void onNext(Message response) {
        if (--inFlight == 0) {
            requestObserver.onCompleted();
        }
        // work on message
    }
    // other methods
};
A bit pseudo-codey, but this logic works. However, I would like to get rid of the "inFlight" variable if possible. Is there anything within the StreamObserver class that allows this sort of functionality, without the need of an additional variable to track state? Something that would tell the number of requests issued and when they completed.
I've tried inspecting the object within the IntelliJ IDE debugger, but nothing is popping out to me.
To answer your direct question, you can simply call onCompleted() after the while loop, once all the messages have been passed to onNext(). Under the hood, gRPC will send what is called a "half close", indicating that it won't send any more messages, but it is willing to receive them. Specifically:
// issuing requests
while (haveRequests) {
    MessageRequest request = mkRequest();
    this.requestObserver.onNext(request);
}
// half-close once everything has been sent; no counter needed
requestObserver.onCompleted();
This ensures that all the messages are sent, and in the order that you sent them. On the server side, when it sees the corresponding onCompleted callback, it can half-close its side of the connection by calling onCompleted on its observer. (There are two observers on the server side: one for receiving info from the client, one for sending info.)
Back on the client side, you just need to wait for the server to half close to know that all messages were received and processed. Note that if there were any errors, you would get an onError callback instead.
If you don't know how many requests you are going to make on the client side, you might consider using an AtomicInteger and calling decrementAndGet when you get back a response. If the return value is 0, you'll know all the requests have completed, as in the sketch below.
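A rough sketch of that approach, reusing the names from the question (ordering and error handling omitted for brevity):

AtomicInteger inFlight = new AtomicInteger();

StreamObserver<Message> responseObserver = new StreamObserver<Message>() {
    @Override
    public void onNext(Message response) {
        // work on message, then retire it from the in-flight count
        if (inFlight.decrementAndGet() == 0) {
            requestObserver.onCompleted();
        }
    }
    // onError / onCompleted omitted
};

// issuing requests
while (haveRequests) {
    inFlight.incrementAndGet();
    requestObserver.onNext(mkRequest());
}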
I have a multi-threaded app which uses a producer class to produce messages. Earlier, I was using the code below to create a producer for each request, where a KafkaProducer was newly built with each request:
KafkaProducer<String, byte[]> producer = new KafkaProducer<String, byte[]>(prop);
ProducerRecord<String, byte[]> data = new ProducerRecord<String, byte[]>(topic, objBytes);
producer.send(data, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            isValidMsg[0] = false;
            exception.printStackTrace();
            saveOrUpdateLog(msgBean, producerType, exception);
            logger.error("ERROR: Unable to produce message.", exception);
        }
    }
});
producer.close();
Then I read the Kafka docs on the producer and came to learn that we should use a single producer instance for good performance.
So I created a single instance of KafkaProducer inside a singleton class.
Now, when and where should we close the producer? Obviously, if we close the producer after the first send request, subsequent sends won't find the producer, throwing:
java.lang.IllegalStateException: Cannot send after the producer is closed.
Or, how can we reconnect to the producer once it has been closed?
And what is the problem if the program crashes or hits exceptions?
Generally, calling close() on the KafkaProducer is sufficient to make sure all inflight records have completed:
/**
 * Close this producer. This method blocks until all previously sent requests complete.
 * This method is equivalent to <code>close(Long.MAX_VALUE, TimeUnit.MILLISECONDS)</code>.
 * <p>
 * <strong>If close() is called from {@link Callback}, a warning message will be logged and close(0, TimeUnit.MILLISECONDS)
 * will be called instead. We do this because the sender thread would otherwise try to join itself and
 * block forever.</strong>
 * <p>
 *
 * @throws InterruptException If the thread is interrupted while blocked
 */
If your producer is being used throughout the lifetime of your application, don't close it until you get a termination signal; then call close(). As said in the documentation, the producer is safe to use in a multi-threaded environment, and hence you should reuse the same instance.
If you're sharing your KafkaProducer in multiple threads, you have two choices:
Call close() from a shutdown callback registered via Runtime.getRuntime().addShutdownHook from your main execution thread (sketched just below)
Have your multi-threaded methods race to close it, and only allow a single one to win.
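A sketch of 1, assuming producer is the shared KafkaProducer instance:

// Register once, e.g. during application startup.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    // close() blocks until all previously sent records have completed.
    producer.close();
}));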
A rough sketch of 2 would possibly look like this:
object KafkaOwner {
  private var producer: KafkaProducer = ???
  @volatile private var isClosed = false

  def close(): Unit = {
    if (!isClosed) {
      producer.close()
      isClosed = true
    }
  }

  def instance: KafkaProducer = {
    this.synchronized {
      if (!isClosed) producer
      else {
        producer = new KafkaProducer()
        isClosed = false
        producer
      }
    }
  }
}
As described in the javadoc for KafkaProducer:
public void close()
Close this producer. This method blocks until all previously sent requests complete.
This method is equivalent to close(Long.MAX_VALUE, TimeUnit.MILLISECONDS).
src: https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#close()
So you don't need to worry that your messages won't be sent, even if you call close immediately after send.
If you plan to use the KafkaProducer more than once, then close it only after you've finished using it. If you still want the guarantee that your message is actually sent before your method completes, rather than waiting in a buffer, then use KafkaProducer#flush(), which will block until the current buffer is sent. You can also block on Future#get() if you prefer.
There is also one caveat to be aware of if you don't plan to ever close your KafkaProducer (e.g. in short-lived apps, where you just send some data and the app immediately terminates after sending). The KafkaProducer IO thread is a daemon thread, which means the JVM will not wait until this thread finishes to terminate the VM. So, to ensure that your messages are actually sent use KafkaProducer#flush(), no-arg KafkaProducer#close() or block on Future#get().
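To make those options concrete, a minimal sketch (the topic name, props, and payload are illustrative, not from the original question):

Producer<String, byte[]> producer = new KafkaProducer<>(props);

Future<RecordMetadata> future =
    producer.send(new ProducerRecord<>("my-topic", payload));

// Option 1: block until this particular record is acknowledged.
RecordMetadata metadata = future.get();

// Option 2: block until everything currently in the buffer has been sent.
producer.flush();

// Option 3: no-arg close() also waits for in-flight sends, then releases resources.
producer.close();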
The Kafka producer is supposed to be thread-safe and frugal with its thread pool. You might want to use
producer.flush();
instead of
producer.close();
leaving the producer open until program termination, or until you're sure you won't need it any more.
If you still want to close the producer, then recreate it on demand.
producer = new KafkaProducer<String, byte[]>(prop);
Regarding Java NIO2.
Suppose we have the following to listen to client requests...
asyncServerSocketChannel.accept(null, new CompletionHandler<AsynchronousSocketChannel, Object>() {
    @Override
    public void completed(final AsynchronousSocketChannel asyncSocketChannel, Object attachment) {
        // Put the execution of the completion handler on another thread so that
        // we don't block another channel being accepted.
        executer.submit(new Runnable() {
            public void run() {
                handle(asyncSocketChannel);
            }
        });
        // accept another connection
        asyncServerSocketChannel.accept(null, this);
    }

    @Override
    public void failed(Throwable exc, Object attachment) {
        // TODO Auto-generated method stub
    }
});
This code will accept a client connection, process it, and then accept another.
To communicate with the server the client opens up an AsyncSocketChannel and fires the message.
The Completion handler completed() method is then invoked.
However, this means that if the client wants to send another message on the same AsyncSocket instance, it can't.
It has to create another AsyncSocket instance, which I believe means another TCP connection, which is a performance hit.
Any ideas how to get around this?
Or, to put the question another way, any ideas how to make the same asyncSocketChannel receive multiple CompletionHandler completed() events?
edit:
My handling code is like this...
public void handle(AsynchronousSocketChannel asyncSocketChannel) {
ByteBuffer readBuffer = ByteBuffer.allocate(100);
try {
// read a message from the client, timeout after 10 seconds
Future<Integer> futureReadResult = asyncSocketChannel.read(readBuffer);
futureReadResult.get(10, TimeUnit.SECONDS);
String receivedMessage = new String(readBuffer.array());
// some logic based on the message here...
// after the logic is a return message to client
ByteBuffer returnMessage = ByteBuffer.wrap((RESPONSE_FINISHED_REQUEST + " " + client
+ ", " + RESPONSE_COUNTER_EQUALS + value).getBytes());
Future<Integer> futureWriteResult = asyncSocketChannel.write(returnMessage);
futureWriteResult.get(10, TimeUnit.SECONDS);
} ...
So that's it: my server reads a message from the async channel and returns an answer.
The client blocks until it gets the answer, but this is OK; I don't care if the client blocks.
When this is finished, the client tries to send another message on the same async channel, and it doesn't work.
There are two phases of a connection and two different kinds of completion handlers.
The first phase handles a connection request; this is what you have programmed (BTW, as Jonas said, there is no need to use another executor). The second phase (which can be repeated multiple times) issues an I/O request and handles its completion. For this, you have to supply a memory buffer holding the data to read or write, and you did not show any code for this. Once you do the second phase, you'll see that there is no such problem as you described: "if the client wants to send another message on the same AsyncSocket instance it can't".
One problem with NIO2 is that, on the one hand, the programmer has to avoid multiple async operations of the same kind (accept, read, or write) on the same channel (or else an error occurs), and on the other hand, has to avoid blocking waits in handlers. This problem is solved in the df4j-nio2 subproject of the df4j actor framework, where both AsyncServerSocketChannel and AsyncSocketChannel are represented as actors. (df4j is developed by me.)
First, you should not use an executor like you have in the completed method; the completed method is already handled on a new worker thread.
In your completed method for .accept(...), you should call asyncSocketChannel.read(...) to read the data. The client can just send another message on the same socket; that message will be handled with a new call to the completed method, perhaps by another worker thread on your server, as in the sketch below.
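A minimal sketch of that pattern: a read handler that re-arms itself after each message, so one channel can receive many completed() events (the buffer size and message handling are illustrative):

void startReading(AsynchronousSocketChannel channel) {
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    channel.read(buffer, buffer, new CompletionHandler<Integer, ByteBuffer>() {
        @Override
        public void completed(Integer bytesRead, ByteBuffer buf) {
            if (bytesRead == -1) {
                return; // client closed the connection
            }
            buf.flip();
            String message = java.nio.charset.StandardCharsets.UTF_8.decode(buf).toString();
            // ... react to the message, possibly channel.write(...) a response ...
            buf.clear();
            // Re-arm the read so the same channel can receive the next message.
            channel.read(buf, buf, this);
        }

        @Override
        public void failed(Throwable exc, ByteBuffer buf) {
            // log the error and close the channel
        }
    });
}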