Is this a realistic expectation of a distributed mechanism? - java

I've been evaluating ActiveMQ as a candidate message broker. I've written some test code to try and get an understanding of ActiveMQ's performance limitations.
I can produce a failure state in the broker by sending messages as fast as possible like this:
try {
while(true) {
byte[] payload = new byte[(int) (Math.random() * 16384)];
BytesMessage message = session.createBytesMessage();
message.writeBytes(payload);
producer.send(message);
} catch (JMSException ex) { ... }
I was surprised that the line
producer.send(message);
blocks when the broker enters a failed state. I was hoping that some exception would be thrown, so there would be some indication that the broker has failed.
I realize that my test code is spamming the broker, and I expect the broker to fail. However, I would prefer that the broker failed "loudly" as opposed to simply blocking.
Is this an unrealistic expectation?
Update:
Uri's answer references an ActiveMQ bug report that was filed in March. The bug description includes a proposal that sounds like what I'm looking for: "if the request on the transport had a timeout (this is to catch failure scenarios, so something that's not expected to reasonably happen), things would have errored out rather than building waiting threads."
However, after 8 months the bug is currently unassigned with a single vote. So I guess the question still stands, is this something ActiveMQ should (will?) implement?

You are testing the 'slow consumer' and producer flowcontrol issue all message brokers have to deal with. Do you wanna fail producers, block them or spool to disk?
Basically the out of the box default in ActiveMQ is to block producers. But you can configure message cursors to spool to disk.
BTW you've not said if you are using queues/topics or persistent/non-persistent; if you are using non persistent topics there are other strategies you can use for discarding messages etc.

Apprently there's a known issue, not sure if it's been fixed:
https://issues.apache.org/activemq/browse/AMQ-1625

Not sure about ActiveMQ config, but other JMS providers have various configuration options - so you maybe able to get ActiveMQ to do as you wish in that situation.
I know Fiorano has options to specify whether providers block or not in this situation.

Related

Spring Cloud Stream - notice and handle errors in broker

I am fairly new to developing distributed applications with messaging, and to Spring Cloud Stream in particular. I am currently wondering about best practices on how to deal with errors on the broker side.
In our application, we need to both consume and produce messages from/to multiple sources/destinations like this:
Consumer side
For consuming, we have defined multiple #Beans of type java.util.function.Consumer. The configuration for those looks like this:
spring.cloud.stream.bindings.consumeA-in-0.destination=inputA
spring.cloud.stream.bindings.consumeA-in-0.group=$Default
spring.cloud.stream.bindings.consumeB-in-0.destination=inputB
spring.cloud.stream.bindings.consumeB-in-0.group=$Default
This part works quite well - wenn starting the application, the exchanges "inputA" and "inputB" as well as the queues "inputA.$Default" and "inputB.$Default" with corresponding binding are automatically created in RabbitMQ.
Also, in case of an error (e.g. a queue is suddenly not available), the application gets notified immediately with a QueuesNotAvailableException and continuously tries to re-establish the connection.
My only question here is: Is there some way to handle this exception in code? Or, what are best practices to deal with failures like this on broker side?
Producer side
This one is more problematic. Producing messages is triggered by some internal logic, we cannot use function #Beans here. Instead, we currently rely on StreamBridge to send messages. The problem is that this approach does not trigger creation of exchanges and queues on startup. So when our code calls streamBridge.send("outputA", message), the message is sent (result is true), but it just disappears into the void since RabbitMQ automatically drops unroutable messages.
I found that with this configuration, I can at least get RabbitMQ to create exchanges and queues as soon as the first message is sent:
spring.cloud.stream.source=produceA;produceB
spring.cloud.stream.default.producer.requiredGroups=$Default
spring.cloud.stream.bindings.produceA-out-0.destination=outputA
spring.cloud.stream.bindings.produceB-out-0.destination=outputB
I need to use streamBridge.send("produceA-out-0", message) in code to make it work, which is not too great since it means having explicit configuration hardcoded, but at least it works.
I also tried to implement the producer in a Reactor style as desribed in this answer, but in this case the exchange/queue also is not created on application startup and the sent message just disappears even though the return status of the sending method is "OK".
Failures on the broker side are not registered at all with this approach - when I simulate one e.g. by deleting the queue or the exchange, it is not registered by the application. Only when another message is sent, I get in the logs:
ERROR 21804 --- [127.0.0.1:32404] o.s.a.r.c.CachingConnectionFactory : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'produceA-out-0' in vhost '/', class-id=60, method-id=40)
But still, the result of StreamBridge#send was true in this case. But we need to know that sending did actually fail at this point (we persist the state of the sent object using this boolean return value). Is there any way to accomplish that?
Any other suggestions on how to make this producer scenario more robust? Best practices?
EDIT
I found an interesting solution to the producer problem using correlations:
...
CorrelationData correlation = new CorrelationData(UUID.randomUUID().toString());
messageHeaderAccessor.setHeader(AmqpHeaders.PUBLISH_CONFIRM_CORRELATION, correlation);
Message<String> message = MessageBuilder.createMessage(payload, messageHeaderAccessor.getMessageHeaders());
boolean sent = streamBridge.send(channel, message);
try {
final CorrelationData.Confirm confirm = correlation.getFuture().get(30, TimeUnit.SECONDS);
if (correlation.getReturned() == null && confirm.isAck()) {
// success logic
} else {
// failed logic
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
// failed logic
} catch (ExecutionException | TimeoutException e) {
// failed logic
}
using these additional configurations:
spring.cloud.stream.rabbit.default.producer.useConfirmHeader=true
spring.rabbitmq.publisher-confirm-type=correlated
spring.rabbitmq.publisher-returns=true
This seems to work quite well, although I'm still clueless about the return value of StreamBridge#send, it is always true and I cannot find information in which cases it would be false. But the rest is fine, I can get information on issues with the exchange or the queue from the correlation or the confirm.
But this solution is very much focused on RabbitMQ, which causes two problems:
our application should be able to connect to different brokers (e.g. Azure Service Bus)
in tests we use Kafka binder and I don't know how to configure the application context to make it work in this case, too
Any help would be appreciated.
On the consumer side, you can listen for an event such as the ListenerContainerConsumerFailedEvent.
https://docs.spring.io/spring-amqp/docs/current/reference/html/#consumer-events
On the producer side, producers only know about exchanges, not any queues bound to them; hence the requiredGroups property which causes the queue to be bound.
You only need spring.cloud.stream.default.producer.requiredGroups=$Default - you can send to arbitrary destinations using the StreamBridge and the infrastructure will be created.
#SpringBootApplication
public class So70769305Application {
public static void main(String[] args) {
SpringApplication.run(So70769305Application.class, args);
}
#Bean
ApplicationRunner runner(StreamBridge bridge) {
return args -> bridge.send("foo", "test");
}
}
spring.cloud.stream.default.producer.requiredGroups=$Default

i.grpc.internal.AbstractClientStream - Received data on closed stream meaning

I have a Spring boot application (v2.2.10.RELEASE) that subscribes to multiple topics in pubSub and pulls async data and sends it to somewhere else. I am not using SpringGCP, just native google libraries
this is my subscriber setting:
// Instantiate an asynchronous message receiver.
MessageReceiver receiver =
(PubsubMessage message, AckReplyConsumer consumer) -> {
messages.add(message);
consumer.ack();
};
Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
.setParallelPullCount(2)
.setFlowControlSettings(flowControlSettings)
.setCredentialsProvider(credentialsProvider)
.setExecutorProvider(executorProvider)
//.setChannelProvider()
.build();
With high traffic and big messages (2 - 4 kb) I encounter this info message:
[grpc-default-worker-ELG-1-1] INFO i.grpc.internal.AbstractClientStream - Received data on closed stream
first of all, I don't fully understand what that means? all that I noticed was that when this happens the delivered duplicated messages increase. so I assumed it meant that pubSub tried to reach the subscriber with some messages but the subscriber for some reason was not ready so pubSub will try to deliver the messages again. and hence more duplicates, is that right?
would this problem be solved using the TransportChannelProvider in subscribers? my understanding of the poorly written documentation, that this will create a new channel for delivery when the current in-use channel is closed, hence get rid of the previous log message.
if yes, how do I define the channel target string? and where can I find A NameResolver-compliant URI for the mangagedChannel. the snippet I mean is this:
private TransportChannelProvider getChannelProvider() {
ManagedChannel channel = ManagedChannelBuilder.forTarget(target).usePlaintext(true).build();
return FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel));
}
I am pretty new to GCP so sorry if my question is not coherent enough
Using a custom TransportChannelProvider won't solve this type of issue. This is more likely an issue deeper down in the stack, e.g., at the gRPC level. There have been some open issues for this type of error [1, 2].
With regard to why it is causing duplicates, it is possible that the messages are getting delivered via a stream that is already closed (which aligns with the error message) because they were trapped in a lower-level buffer at the gRPC layer and therefore ended up being duplicates of messages that were subsequently delivered and processed via another stream. This could be a version of the issue discussed in the documentation around large backlogs of small messages. There was a fix for this issue in v1.109.0 of the Java client library, so if you are using a version older than that, it is worth updating.
If duplicates continue to be an issue, it would be best to reach out to support with the name of your subscription and the message IDs of some of the duplicate messages so that they can look at the delivery patterns for those messages and further diagnose if these redeliveries are unexpected.

Http Websocket as Akka Stream Source

I'd like to listen on a websocket using akka streams. That is, I'd like to treat it as nothing but a Source.
However, all official examples treat the websocket connection as a Flow.
My current approach is using the websocketClientFlow in combination with a Source.maybe. This eventually results in the upstream failing due to a TcpIdleTimeoutException, when there are no new Messages being sent down the stream.
Therefore, my question is twofold:
Is there a way – which I obviously missed – to treat a websocket as just a Source?
If using the Flow is the only option, how does one handle the TcpIdleTimeoutException properly? The exception can not be handled by providing a stream supervision strategy. Restarting the source by using a RestartSource doesn't help either, because the source is not the problem.
Update
So I tried two different approaches, setting the idle timeout to 1 second for convenience
application.conf
akka.http.client.idle-timeout = 1s
Using keepAlive (as suggested by Stefano)
Source.<Message>maybe()
.keepAlive(Duration.apply(1, "second"), () -> (Message) TextMessage.create("keepalive"))
.viaMat(Http.get(system).webSocketClientFlow(WebSocketRequest.create(websocketUri)), Keep.right())
{ ... }
When doing this, the Upstream still fails with a TcpIdleTimeoutException.
Using RestartFlow
However, I found out about this approach, using a RestartFlow:
final Flow<Message, Message, NotUsed> restartWebsocketFlow = RestartFlow.withBackoff(
Duration.apply(3, TimeUnit.SECONDS),
Duration.apply(30, TimeUnit.SECONDS),
0.2,
() -> createWebsocketFlow(system, websocketUri)
);
Source.<Message>maybe()
.viaMat(restartWebsocketFlow, Keep.right()) // One can treat this part of the resulting graph as a `Source<Message, NotUsed>`
{ ... }
(...)
private Flow<Message, Message, CompletionStage<WebSocketUpgradeResponse>> createWebsocketFlow(final ActorSystem system, final String websocketUri) {
return Http.get(system).webSocketClientFlow(WebSocketRequest.create(websocketUri));
}
This works in that I can treat the websocket as a Source (although artifically, as explained by Stefano) and keep the tcp connection alive by restarting the websocketClientFlow whenever an Exception occurs.
This doesn't feel like the optimal solution though.
No. WebSocket is a bidirectional channel, and Akka-HTTP therefore models it as a Flow. If in your specific case you care only about one side of the channel, it's up to you to form a Flow with a "muted" side, by using either Flow.fromSinkAndSource(Sink.ignore, mySource) or Flow.fromSinkAndSource(mySink, Source.maybe), depending on the case.
as per the documentation:
Inactive WebSocket connections will be dropped according to the
idle-timeout settings. In case you need to keep inactive connections
alive, you can either tweak your idle-timeout or inject ‘keep-alive’
messages regularly.
There is an ad-hoc combinator to inject keep-alive messages, see the example below and this Akka cookbook recipe. NB: this should happen on the client side.
src.keepAlive(1.second, () => TextMessage.Strict("ping"))
I hope I understand your question correctly. Are you looking for asSourceOf?
path("measurements") {
entity(asSourceOf[Measurement]) { measurements =>
// measurement has type Source[Measurement, NotUsed]
...
}
}

Sending large number of messages using spring jmsTemplate

Is there a best practice or guidance for sending persistent messages with asyncSend set to true.
We don't have transaction manager configured
We have ~40k-50k messages which are sent using jmsTemplate configured with
org.apache.activemq.pool.PooledConnectionFactory
We have a for loop which iterates over messages list and send them using
jmsTemplate.convertAndSend(destination, msg)
We see lot of message loss on frequent basis, when we turn off asyncSend we get the reliability but the producer performance drops by 95%
A bit of speculation as the question is not very detailed but anyway.
Depending on configuration, ActiveMQ might have memory limits on queues (might as well differ between persistent and non persistent messages). So when memory is up, your asyncSend calls will ignore warnings and continue to deliver messages to the "black hole" until memory is freed by the consumer.
There is no silver bullet to allow max performance and max reliability. Unfortunately.
However, I would try setting a producerWindowSize on the connection factory to allow some specified amount of data before a broker ack is received. Exact value is something you need to try out and depends on scenario as well as broker config/resources.
I solved this using a ProducerCallback
List<String> messageTexts = prepareListOfMessaeTexts();
ProducerCallback producerCallback = (session, producer) -> {
Topic destination = session.createTopic(myTopicName);
for (String messageText : myMessagmessageTextseBodies) {
producer.send(destination, session.createTextMessage(messageText));
}
return null;
};
jmsTemplate.execute(producerCallback);

Why do my RabbitMQ channels keep closing?

I'm debugging some Java code that uses Apache POI to pull data out of Microsoft Office documents. Occasionally, it encounter a large document and POI crashes when it runs out of memory. At that point, it tries to publish the error to RabbitMQ, so that other components can know that this step failed and take the appropriate actions. However, when it tries to publish to the queue, it gets a com.rabbitmq.client.AlreadyClosedException (clean connection shutdown; reason: Attempt to use closed channel).
Here's the error handler code:
try {
//Extraction and indexing code
}
catch(Throwable t) {
// Something went wrong! We'll publish the error and then move on with
// our lives
System.out.println("Error received when indexing message: ");
t.printStackTrace();
System.out.println();
String error = PrintExc.format(t);
message.put("error", error);
if(mime == null) {
mime = "application/vnd.unknown";
}
message.put("mime", mime);
publish("IndexFailure", "", MessageProperties.PERSISTENT_BASIC, message);
}
For completeness, here's the publish method:
private void publish(String exch, String route,
AMQP.BasicProperties props, Map<String, Object> message) throws Exception{
chan.basicPublish(exch, route, props,
JSONValue.toJSONString(message).getBytes());
}
I can't find any code within the try block that appears to close the RabbitMQ channel. Are there any circumstances in which the channel could be closed implicitly?
EDIT: I should note that the AlreadyClosedException is thrown by the basicPublish call inside publish.
An AMQP channel is closed on a channel error. Two common things that can cause a channel error:
Trying to publish a message to an exchange that doesn't exist
Trying to publish a message with the immediate flag set that doesn't have a queue with an active consumer set
I would look into setting up a ShutdownListener on the channel you're trying to use to publish a message using the addShutdownListener() to catch the shutdown event and look at what caused it.
Another reason in my case was that by mistake I acknowledged a message twice. This lead to RabbitMQ errors in the log like this after the second acknowledgment.
=ERROR REPORT==== 11-Dec-2012::09:48:29 ===
connection <0.6792.0>, channel 1 - error:
{amqp_error,precondition_failed,"unknown delivery tag 1",'basic.ack'}
After I removed the duplicate acknowledgement then the errors went away and the channel did not close anymore and also the AlreadyClosedException were gone.
I'd like to add this information for other users who will be searching for this topic
Another possible reason for Receiving a Channel Closed Exception is when Publishers and Consumers are accessing Channel/Queue with different queue declaration/settings
Publisher
channel.queueDeclare("task_queue", durable, false, false, null);
Worker
channel.queueDeclare("task_queue", false, false, false, null);
From RabbitMQ Site
RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that
Apparently, there are many reasons for the AMQP connection and/or channels to close abruptly. In my case, there was too many unacknowledged messages on the queue because the consumer didn't specify the prefetch_count so the connection was getting terminated every ~1min. Limiting the number of unacknowledged messages by setting the consumer's prefetch count to a non-zero value fixed the problem.
channel.basicQos(100);
For those who wonder why their consuming channels are closing, check if you try to Ack or Nack a delivery more than once.
In the rabbitmq log you would see messages like:
operation basic.ack caused a channel exception precondition_failed:
unknown delivery tag ...
I also had this problem. The reason for my case was that, first I built the queue with durable = false and in the log file I had this error message when I switched durable to true:
"inequivalent arg 'durable' for queue 'logsQueue' in vhost '/':
received 'true' but current is 'false'"
Then, I changed the name of the queue and it worked for me. I assumed that the RabbitMQ server keeps the record of the built queues somewhere and it cannot change the status from durable to non-durable and vice versa.
Again I made durable=false for the new queue and this time I got this error
"inequivalent arg 'durable' for queue 'logsQueue1' in vhost '/':
received 'false' but current is 'true'"
My assumption was true. When I listed the queues in rabbitMQ server by:
rabbitmqctl list_queues
I saw both queues in the server.
To summarize, 2 solutions are:
1. renaming the name of the queue which is not a good solution
2. resetting rabbitMQ by:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app

Categories