I was playing around with an AWS SQS FIFO queue locally in localstack, using the AWS Java SDK v2 and Spring Boot.
I created one endpoint to send messages through a publisher, and three endpoints to receive/poll messages from the queue via three consumers, all in Spring Boot controller classes.
I created the FIFO queue with the following properties -
RECEIVE_MESSAGE_WAIT_TIME_SECONDS = 20 seconds (long poll)
VISIBILITY_TIMEOUT = 60 seconds
FIFO_QUEUE = true
CONTENT_BASED_DEDUPLICATION = true
Each consumer could fetch at most 3 messages per poll request (at least 1 if available, up to 3).
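For reference, this is roughly how the queue and the consumer-side receive call are configured. This is a minimal sketch with the AWS SDK v2 against localstack; the endpoint, region, and dummy credentials are assumptions, not taken from the repo.
import java.net.URI;
import java.util.Map;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.CreateQueueRequest;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class FifoQueueSetup {
    public static void main(String[] args) {
        // Localstack endpoint and dummy credentials are assumptions for a local run.
        SqsClient sqs = SqsClient.builder()
                .endpointOverride(URI.create("http://localhost:4566"))
                .region(Region.US_EAST_1)
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("test", "test")))
                .build();

        // Queue created with the attributes listed above.
        String queueUrl = sqs.createQueue(CreateQueueRequest.builder()
                .queueName("dev-priyam-fifo-queue.fifo")
                .attributes(Map.of(
                        QueueAttributeName.FIFO_QUEUE, "true",
                        QueueAttributeName.CONTENT_BASED_DEDUPLICATION, "true",
                        QueueAttributeName.VISIBILITY_TIMEOUT, "60",
                        QueueAttributeName.RECEIVE_MESSAGE_WAIT_TIME_SECONDS, "20"))
                .build()).queueUrl();

        // Each consumer polls at most 3 messages per request, long-polling for up to 20 seconds.
        ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(3)
                .waitTimeSeconds(20)
                .build();
        sqs.receiveMessage(receive).messages()
                .forEach(m -> System.out.println(m.body()));
    }
}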
I published 5 messages to the queue (in order). They are -
Message Group Id | Deduplication Id
-----------------------------------
A | A1
A | A2
A | A3
A | A4
A | A5
From the logs -
2022-06-01 16:13:26.474 INFO 27918 --- [nio-9099-exec-1] c.p.sqs.service.SqsPublisherServiceImpl : sendMsgRequest SendMessageRequest(QueueUrl=http://localhost:4566/000000000000/dev-priyam-fifo-queue.fifo, MessageBody={"id":"A1"}, MessageDeduplicationId=A1, MessageGroupId=A)
2022-06-01 16:13:26.600 INFO 27918 --- [nio-9099-exec-2] c.p.sqs.service.SqsPublisherServiceImpl : sendMsgRequest SendMessageRequest(QueueUrl=http://localhost:4566/000000000000/dev-priyam-fifo-queue.fifo, MessageBody={"id":"A2"}, MessageDeduplicationId=A2, MessageGroupId=A)
2022-06-01 16:13:26.700 INFO 27918 --- [nio-9099-exec-3] c.p.sqs.service.SqsPublisherServiceImpl : sendMsgRequest SendMessageRequest(QueueUrl=http://localhost:4566/000000000000/dev-priyam-fifo-queue.fifo, MessageBody={"id":"A3"}, MessageDeduplicationId=A3, MessageGroupId=A)
2022-06-01 16:13:26.785 INFO 27918 --- [nio-9099-exec-4] c.p.sqs.service.SqsPublisherServiceImpl : sendMsgRequest SendMessageRequest(QueueUrl=http://localhost:4566/000000000000/dev-priyam-fifo-queue.fifo, MessageBody={"id":"A4"}, MessageDeduplicationId=A4, MessageGroupId=A)
2022-06-01 16:13:26.843 INFO 27918 --- [nio-9099-exec-5] c.p.sqs.service.SqsPublisherServiceImpl : sendMsgRequest SendMessageRequest(QueueUrl=http://localhost:4566/000000000000/dev-priyam-fifo-queue.fifo, MessageBody={"id":"A5"}, MessageDeduplicationId=A5, MessageGroupId=A)
I then started polling from the consumers at random. My observations are below -
1. A1, A2 and A3 were polled. They were polled but not deleted (intentionally), so they went back onto the queue once the visibility timeout (60 seconds) expired.
2. In the next poll, A3 and A4 were polled. Again, they were polled but not deleted, so they went back onto the queue after 60 seconds.
3. In the next poll, A4 and A5 were polled. Again, they were polled but not deleted, so they went back onto the queue after 60 seconds.
4. In the next poll (and every poll after that), only A5 was polled, and I kept getting only A5 from then on.
Now I want to understand why I am getting this behaviour. The whole selling point of FIFO is ordered delivery (per message group ID). My expectation after step 1 was that the next poll (step 2) would return A1, or A1 A2, or A1 A2 A3 - but this didn't happen.
Can anyone explain what is happening here?
My github repo : https://github.com/tahniat-ashraf/java-aws-sqs-101
I believe this is a known issue in localstack when using both CONTENT_BASED_DEDUPLICATION=true and providing a MessageDeduplicationId.
SQS supports either content-based deduplication or manual deduplication via a deduplication ID; it does not support both.
Try running this against an actual SQS queue - or change your configuration as described in the localstack issue.
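For example, with CONTENT_BASED_DEDUPLICATION=true you could drop the explicit MessageDeduplicationId and let SQS derive it from the message body. A minimal sketch, assuming sqsClient is an SqsClient from the AWS SDK v2 and using the queue URL from the logs above:
// Sketch: rely on content-based deduplication only, no explicit MessageDeduplicationId.
SendMessageRequest request = SendMessageRequest.builder()
        .queueUrl("http://localhost:4566/000000000000/dev-priyam-fifo-queue.fifo")
        .messageBody("{\"id\":\"A1\"}")
        .messageGroupId("A")   // still required for a FIFO queue
        .build();
sqsClient.sendMessage(request);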
I am using QFJ 2.1.1 and testing my application against a FIX simulator (also running QFJ 2.1.1).
There are two FIX sessions.
Each side runs both an initiator and an acceptor.
The problem workflow looks like this:
1. simulator/acceptor <--- New Order Single <--- application/initiator
2. simulator/acceptor ---> ACK ---> application/initiator
3. simulator/initiator ---> New Order Single ---> application/acceptor
4. simulator/initiator <--- ACK <--- application/acceptor
The order of FIX messages processed by QFJ in the simulator is 1, 2, 3, 4.
The order of FIX messages processed by QFJ in the application is 1, 3, 2, 4.
That is, the application was called back with 3 (NewOrderSingle) before it was called back with 2 (the ACK).
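For context, "called back" here means QFJ invoking the Application callbacks for each session. A minimal sketch of such a handler, assuming QFJ's ApplicationAdapter convenience base class; the class name and logging are illustrative, not the actual application code:
import quickfix.ApplicationAdapter;
import quickfix.FieldNotFound;
import quickfix.Message;
import quickfix.SessionID;
import quickfix.field.MsgType;

// Sketch: QFJ delivers each inbound application-level message for a session via fromApp().
public class QFIXManager extends ApplicationAdapter {
    @Override
    public void fromApp(Message message, SessionID sessionID) throws FieldNotFound {
        String msgType = message.getHeader().getString(MsgType.FIELD);
        System.out.println("onAppReceived(): " + sessionID + " msgType=" + msgType);
    }
}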
Here are the QFJ log snippets from the simulator showing 2 and 3 being sent on two sessions:
2
2019-12-12 10:23:12.820 [928630][QFJ Message Processor][INFO ] <-- OUTBOUND VENDOR: 8=FIX.4.2|9=180|35=8|34=2|
52=20191212-15:23:12.820|11=287:MACGREGOR-37392703:45037843|17=BYHWG|20=0|37=SIM:287:MACGREGOR-37392703:45037843|38=10000|39=0|54=2|55=MSFT|150=0|10=132|[:]
3
2019-12-12 10:23:12.820 [928630][QFJ Message Processor][INFO ] <-- OUTBOUND ATS: 8=FIX.4.2|9=208|35=D|34=2|52=20191212-15:23:12.820|11=GSET:287:MACGREGOR-37392703:45037843|18=M|21=1|38=10000|40=P|44=153.3
500|54=2|55=MSFT|60=20191212-15:23:09.205|110=0|8011=287:MACGREGOR-37392703:45037843|10=207|[:]
Here are the QFJ log snippets from the application showing 3 being received before 2 on the two sessions:
3
2019-12-12 10:23:12.824 [31181][QFIXManager][INFO ] FIX onAppReceived(): AQUA->GSET, message=[11=GSET:287:MACG
REGOR-37392703:45037843 35=D 18=M 44=153.3500] {11=GSET:287:MACGREGOR-37392703:45037843, 44=153.3500, 55=MSFT,
34=2, 56=AQUA, 35=D, 8011=287:MACGREGOR-37392703:45037843, 49=GSET, 38=10000, 18=M, 110=0, 8=FIX.4.2, 9=208,
60=20191212-15:23:09.205, 40=P, 52=20191212-15:23:12.820, 21=1, 54=2, 10=207} [:]
2
2019-12-12 10:23:12.827 [31184][QFIXManager][INFO ] FIX onAppReceived(): AQUABORG->GSETBORG, message=[11=287:M
ACGREGOR-37392703:45037843 37=SIM:287:MACGREGOR-37392703:45037843 35=8 39=0 150=0] {11=287:MACGREGOR-37392703:
45037843, 55=MSFT, 34=2, 56=AQUABORG, 35=8, 37=SIM:287:MACGREGOR-37392703:45037843, 49=GSETBORG, 38=10000, 17=
BYHWG, 39=0, 150=0, 8=FIX.4.2, 9=180, 52=20191212-15:23:12.820, 20=0, 54=2, 10=132}
As you can see, the application messages 2 and 3 are received out of order.
How can I prevent this in my QFJ application?
I create a Source of consumer records using Reactive Kafka as follows:
val settings = ConsumerSettings(system, keyDeserializer, valueDeserializer)
.withBootstrapServers(bootstrapServers)
.withGroupId(groupName)
// what offset to begin with if there's no offset for this group
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
// do we want to automatically commit offsets?
.withProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
// auto-commit offsets every 1 second, in the background
.withProperty(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000")
// reconnect every 1 second, when disconnected
.withProperty(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, "1000")
// every session lasts 30 seconds
.withProperty(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
// send heartbeat every 10 seconds i.e. 1/3 * session.timeout.ms
.withProperty(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000")
// how many records to fetch in each poll()
.withProperty(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100")
Consumer.atMostOnceSource(settings, Subscriptions.topics(topic)).map(_.value)
I have 1 instance of Kafka running on my local machine. I push values into the topic via the console producer and see them printed out. Then I kill Kafka, and restart it to see if the source reconnects.
These are how my logs proceed:
* Connection with /192.168.0.1 disconnected
java.net.ConnectException: Connection refused
* Give up sending metadata request since no node is available
* Consumer interrupted with WakeupException after timeout. Message: null. Current value of akka.kafka.consumer.wakeup-timeout is 3000 milliseconds
* Resuming partition test-events-0
* Error while fetching metadata with correlation id 139 : {test-events=INVALID_REPLICATION_FACTOR}
* Sending metadata request (type=MetadataRequest, topics=test-events) to node 0
* Sending GroupCoordinator request for group mytestgroup to broker 192.168.0.1:9092 (id: 0 rack: null)
* Consumer interrupted with WakeupException after timeout. Message: null. Current value of akka.kafka.consumer.wakeup-timeout is 3000 milliseconds
* Received GroupCoordinator response ClientResponse(receivedTimeMs=1491797713078, latencyMs=70, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=166,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group mytestgroup
* Error while fetching metadata with correlation id 169 : {test-events=INVALID_REPLICATION_FACTOR}
* Received GroupCoordinator response ClientResponse(receivedTimeMs=1491797716169, latencyMs=72, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=196,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group mytestgroup
09:45:16.169 [testsystem-akka.kafka.default-dispatcher-16] DEBUG o.a.k.c.c.i.AbstractCoordinator - Group coordinator lookup for group mytestgroup failed: The group coordinator is not available.
09:45:16.169 [testsystem-akka.kafka.default-dispatcher-16] DEBUG o.a.k.c.c.i.AbstractCoordinator - Coordinator discovery failed for group mytestgroup, refreshing metadata
* Initiating API versions fetch from node 2147483647
* Offset commit for group mytestgroup failed: This is not the correct coordinator for this group.
* Marking the coordinator 192.168.43.25:9092 (id: 2147483647 rack: null) dead for group mytestgroup
* The Kafka consumer has closed.
How do I make sure that this Source reconnects and continues processing the logs?
I think you need at least 2 brokers. If one fails, the other can take over while you restart the failed one.
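If you do add a second broker, the topic also needs a replication factor greater than 1 for another broker to take over. A minimal sketch with Kafka's AdminClient; the broker addresses, topic name, and partition/replica counts are assumptions:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Both brokers listed so the admin client can reach whichever one is up.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.0.1:9092,192.168.0.2:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition, replication factor 2: each partition has a copy on both brokers.
            NewTopic topic = new NewTopic("test-events", 1, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}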
My problem is that the Storm KafkaSpout stopped consuming messages from the Kafka topic after a period of time. When debug is enabled in Storm, the log file looks like this:
2016-07-05 03:58:26.097 o.a.s.d.task [INFO] Emitting: packet_spout __metrics [#object[org.apache.storm.metric.api.IMetricsConsumer$TaskInfo 0x2c35b34f "org.apache.storm.metric.api.IMetricsConsumer$TaskInfo#2c35b34f"] [#object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x798f1e35 "[__ack-count = {default=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x230867ec "[__sendqueue = {sojourn_time_ms=0.0, write_pos=5411461, read_pos=5411461, overflow=0, arrival_rate_secs=0.0, capacity=1024, population=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x7cdec8eb "[__complete-latency = {default=0.0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x658fc59 "[__skipped-max-spout = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3c1f3a50 "[__receive = {sojourn_time_ms=4790.5, write_pos=2468305, read_pos=2468304, overflow=0, arrival_rate_secs=0.20874647740319383, capacity=1024, population=1}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x262d7906 "[__skipped-inactive = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x73648c7e "[kafkaPartition = {Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPICallCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMax=null, Partition{host=slave103:9092, topic=packet, partition=12}/lostMessageCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMean=null, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPIMessageCount=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x4e43df61 "[kafkaOffset = {packet/totalLatestCompletedOffset=154305947, packet/partition_12/spoutLag=82472754, packet/totalEarliestTimeOffset=233919465, packet/partition_12/earliestTimeOffset=233919465, packet/partition_12/latestEmittedOffset=154307691, packet/partition_12/latestTimeOffset=236778701, packet/totalLatestEmittedOffset=154307691, packet/partition_12/latestCompletedOffset=154305947, packet/totalLatestTimeOffset=236778701, packet/totalSpoutLag=82472754}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x49fe816b "[__transfer-count = {__ack_init=0, default=0, __metrics=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x63e2bdc0 "[__fail-count = {}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3b17bb7b "[__skipped-throttle = 1086120]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x1315a68c "[__emit-count = {__ack_init=0, default=0, __metrics=0}]"]]]
2016-07-05 03:58:55.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.946 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __metrics_tick, id: {}, [60]
My test topology is really simple: one KafkaSpout and a Counter Bolt. When the topology works fine, the value between FOR and TUPLE is a positive number; when the topology stops consuming messages, the value becomes negative. So I'm curious what causes the Processing received message FOR -2 TUPLE problem, and how can I fix it?
By the way, my experiment environment is:
OS: Red Hat Enterprise Linux Server release 7.0 (Maipo)
Kafka: 0.10.0.0
Storm: 1.0.1
With help from the Storm mailing list I was able to tune the KafkaSpout and resolve the issue. The following settings work for me:
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);             // cap on un-acked tuples per spout task
config.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);          // disable the automatic backpressure mechanism
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // executor receive queue size
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);    // executor send queue size
I tested by sending 20k-50k message batches with a 1 second pause between bursts. Each message was 2048 bytes.
I am running a 3 node cluster, my topology has 4 spouts, and the topic has 64 partitions.
After 200M messages it is still working.
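For context, here is a minimal sketch of how these settings might be applied when building and submitting a topology like the one described. The ZooKeeper address, topic name, and the counting bolt are assumptions, not taken from the original code:
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class PacketTopology {

    // Hypothetical counting bolt standing in for the original "Counter Bolt".
    public static class CounterBolt extends BaseBasicBolt {
        private long count = 0;

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            count++;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no output streams
        }
    }

    public static void main(String[] args) throws Exception {
        Config config = new Config();
        config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
        config.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
        config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
        config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);

        // ZooKeeper address, topic and zkRoot are assumptions.
        SpoutConfig spoutConfig = new SpoutConfig(new ZkHosts("slave103:2181"), "packet", "/packet", "packet_spout");

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("packet_spout", new KafkaSpout(spoutConfig), 4);
        builder.setBolt("counter_bolt", new CounterBolt(), 4).shuffleGrouping("packet_spout");

        StormSubmitter.submitTopology("packet-topology", config, builder.createTopology());
    }
}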
Check if the producer is actually writing to the topic you expect.
Make sure that the spouts can reach Kafka at the network level. You can check this with telnet.
Can the spouts reach ZooKeeper? Check this with telnet as well.
Source: KafkaSpout is not receiving anything from Kafka
If all three of the above check out, then:
Kafka has a fixed retention window per topic. Once retention is exceeded, it drops messages from the tail.
So here is what might be happening: the rate at which you are pushing data into Kafka is faster than the rate at which the consumers can consume the messages.
Source : Storm-kafka spout not fast enough to process the information
I am seeing this issue in my Kafka Java client where the consumer stops consuming after polling a few messages. It's not that the consumer hangs; it simply cannot find messages in the topic partitions and polls 0 messages. I have 4 partitions configured for the topic and 2 consumers in the consumer group.
Consumer Log:
Thread-5:2016-05-11 at 07:35:21.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:31.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:41.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:51.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Here, the log suggests that this consumer is connected to partitions 0 and 1 but is unable to consume any messages.
Consumer Offset:
Group Topic Pid Offset logSize Lag Owner
test-consumer test 0 1147335 1150034 2699 none
test-consumer test 1 1147471 1150033 2562 none
test-consumer test 2 1150035 1150035 0 none
test-consumer test 3 1150031 1150031 0 none
This shows that my topic has 2699 and 2562 messages pending on partitions 0 and 1 respectively.
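For reference, a minimal sketch of the kind of poll loop behind the log lines above; the bootstrap servers, topic, group name, and the use of the newer poll(Duration) overload are assumptions:
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullFromQueue {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("test"));
            while (true) {
                // Poll for up to 10 seconds; with 2 consumers in the group, each gets a subset of the 4 partitions.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
                System.out.println("polled " + records.count() + " messages");
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.partition() + ":" + record.offset() + " " + record.value());
                }
            }
        }
    }
}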
I've been pounding my head on this for a while; I hope someone can help.
I have some JavaPNS code which pushes fine to my device when run from my local machine. However, when I copy that code over to my server, everything runs fine with no errors, but I never get the alert on my device.
After comparing the logs from my server with those from my local box, I noticed I never get the flushing message on the server. I am using the JavaPNS queue with 30 threads. In both cases, local box and server, I am sending fewer than 30 alerts.
// Imports assume the JavaPNS 2.x package layout and Log4j 1.x.
import java.io.File;

import javapns.Push;
import javapns.communication.exceptions.KeystoreException;
import javapns.devices.Device;
import javapns.devices.implementations.basic.BasicDevice;
import javapns.notification.Payload;
import javapns.notification.PushNotificationPayload;
import javapns.notification.transmission.PushQueue;

import org.apache.log4j.BasicConfigurator;
import org.json.JSONException;

public class PushWorker
{
    private PushQueue queue = null;

    public PushWorker(File keystore, String password, boolean production) throws KeystoreException
    {
        BasicConfigurator.configure();
        int threads = 30;
        this.queue = Push.queue(keystore, password, production, threads);
        queue.start();
    }

    public void push(String message, String sound, String token, String eventId) throws JSONException
    {
        BasicDevice bd = new BasicDevice();
        bd.setToken(token);

        PushNotificationPayload payload = PushNotificationPayload.complex();
        payload.addAlert(message);
        payload.addSound(sound);
        payload.addCustomDictionary("eid", eventId);

        push(payload, bd);
    }

    private void push(Payload payload, Device device)
    {
        queue.add(payload, device);
    }
}
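For what it's worth, the class is driven roughly like this; the keystore path, password, token and event ID below are placeholders, not the real values:
// Hypothetical usage: keystore path, password, token and event ID are placeholders.
File keystore = new File("/path/to/apns-dev.p12");
PushWorker worker = new PushWorker(keystore, "keystorePassword", false); // false = sandbox gateway
worker.push("blah, blah my alert", "default", "<device token>", "193790");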
----BELOW is flush message from my local box --------------
4872 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - Flushing
4872 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - At this point, the entire 139-bytes message has been streamed out successfully through the SSL connection
4872 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - Notification sent on first attempt
Can I force the flush somehow from the queue?
----------------------BELOW is the server JavaPNS logging-----------------------
0 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.communication.ConnectionToAppleServer - Creating SSLSocketFactory
16 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.communication.ConnectionToAppleServer - Creating SSLSocketFactory
49 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.communication.ConnectionToAppleServer - Creating SSLSocket to gateway.sandbox.push.apple.com:2195
49 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.communication.ConnectionToAppleServer - Creating SSLSocket to gateway.sandbox.push.apple.com:2195
177 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - Initialized Connection to Host: [gateway.sandbox.push.apple.com] Port: [2195]: 5117f31e[SSL_NULL_WITH_NULL_NULL: Socket[addr=gateway.sandbox.push.apple.com/17.172.233.65,port=2195,localport=56015]]
...
...
DEBUG javapns.notification.Payload - Adding alert [blah, blah my alert]
14767 [main] DEBUG javapns.notification.Payload - Adding sound [default]
14767 [main] DEBUG javapns.notification.Payload - Adding custom Dictionary [eid] = [193790]
14776 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - Building Raw message from deviceToken and payload
14776 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - Built raw message ID 16777217 of total length 135
14777 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - Attempting to send notification: {"aps":{"sound":"default","alert":"blah, blah my alert"},"eid":"193790"}
14777 [JavaPNS grouped notification thread in QUEUE mode] DEBUG javapns.notification.PushNotificationManager - to device: [my device number]
And that's it, no flush...
Do you have iptables configured on the server or some other firewall device?
Maybe port 2195 needs to be allowed.
Try using telnet to test:
$ telnet gateway.sandbox.push.apple.com 2195
Do you have SELinux enabled? Check the following setting:
$ getsebool -a | grep httpd
httpd_can_network_connect --> ?
Make sure it is on.
Did you check the documentation on the javapns repo site?
Let us know how it goes.
Bill