Cluster instability with TCPPING protocol

Cluster instability with TCPPING protocol - java

I have 8 different processes distributed across 6 different servers with the following TCP/TCPPING protocol configuration:
<config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<TCP
bind_port="${jgroups.tcp.bind_port:16484}"
bind_addr="${jgroups.tcp.bind_addr:127.0.0.1}"
recv_buf_size="20M"
send_buf_size="20M"
max_bundle_size="64K"
sock_conn_timeout="300"
use_fork_join_pool="true"
thread_pool.min_threads="10"
thread_pool.max_threads="100"
thread_pool.keep_alive_time="30000" />
<TCPPING
async_discovery="true"
initial_hosts="${jgroups.tcpping.initial_hosts:127.0.0.1[16484]}"
port_range="5" #/>
<MERGE3 min_interval="10000" max_interval="30000" />
<FD_SOCK get_cache_timeout="10000"
cache_max_elements="300"
cache_max_age="60000"
suspect_msg_interval="10000"
num_tries="10"
sock_conn_timeout="10000"/>
<FD timeout="10000" max_tries="10" />
<VERIFY_SUSPECT timeout="10000" num_msgs="5"/>
<BARRIER />
<pbcast.NAKACK2
max_rebroadcast_timeout="5000"
use_mcast_xmit="false"
discard_delivered_msgs="true" />
<UNICAST3 />
<pbcast.STABLE
stability_delay="1000"
desired_avg_gossip="50000"
max_bytes="4M" />
<AUTH
auth_class="com.qfs.distribution.security.impl.CustomAuthToken"
auth_value="distribution_password"
token_hash="SHA" />
<pbcast.GMS
print_local_addr="true"
join_timeout="10000"
leave_timeout="10000"
merge_timeout="10000"
num_prev_mbrs="200"
view_ack_collection_timeout="10000"/>
</config>
The cluster keeps splitting in subgroups, then merges again and again which results in high memory usages. I can also see in the logs a lot of "suspect" warning resulting from the periodic heartbeats sent by all other cluster members. Am I missing something ?
EDIT
After enabling gc logs, nothing suspect appeared to me. On the other hand, I've noticed this jgroups logs appearing a lot:
WARN: lonlx21440_FrtbQueryCube_QUERY_29302: I was suspected by woklxp00330_Sba-master_DATA_36219; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
DEBUG: lonlx21440_FrtbQueryCube_QUERY_29302: closing expired connection for redlxp00599_Sba-master_DATA_18899 (121206 ms old) in send_table
DEBUG: I (redlxp00599_Sba-master_DATA_18899) will be the merge leader
DEBUG: redlxp00599_Sba-master_DATA_18899: heartbeat missing from lonlx21503_Sba-master_DATA_2175 (number=1)
DEBUG: redlxp00599_Sba-master_DATA_18899: suspecting [lonlx21440_FrtbQueryCube_QUERY_29302]
DEBUG: lonlx21440_FrtbQueryCube_QUERY_29302: removed woklxp00330_Sba-master_DATA_36219 from xmit_table (not member anymore)enter code here
and this one
2020-08-31 16:35:34.715 [ForkJoinPool-3-worker-11] org.jgroups.protocols.pbcast.GMS:116
WARN: lonlx21440_FrtbQueryCube_QUERY_29302: failed to collect all ACKs (expected=6) for view [redlxp00599_Sba-master_DATA_18899|104] after 2000ms, missing 6 ACKs from (6) lonlx21503_Sba-master_DATA_2175, lonlx11179_DRC-master_DATA_15999, lonlx11184_Rrao-master_DATA_31760, lonlx11179_Rrao-master_DATA_25194, woklxp00330_Sba-master_DATA_36219, lonlx11184_DRC-master_DATA_49264
I still can;'t figure out where the instability comes from.
Thanks

Any instability is not due to TCPPING protocol - this belongs to the Discovery protocol family and its purpose is to find new members, not kick them out of the cluster.
You use both FD_SOCK and FD to find if members left, and then VERIFY_SUSPECT to confirm that the node is not reachable. The setting seems pretty normal.
First thing to check is your GC logs. If you experience STW pauses longer than, say, 15 seconds, chances are that the cluster disconnect because of unresponsiviness due to GC.
If your GC logs are fine, increase logging level for FD, FD_SOCK and VERIFY_SUSPECT to TRACE and see what's going on.

Related

Redistribution fails for large messages in ActiveMQ Artemis

I'm using ActiveMQ Artemis version 2.19.1, and I'm facing an issue in a 6-node (3 masters) cluster where redistribution is failing for large messages with the below warning logs:
23:35:05,551 WARN [org.apache.activemq.artemis.core.server] AMQ222303: Redistribution by Redistributor[TEST_QUEUE/2244] of messageID = 196,950,715 failed: java.lang.UnsupportedOperationException: Method not supported with Large Messages
at org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage.getData(AMQPLargeMessage.java:311) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.anyMessageAnnotations(AMQPMessage.java:1374) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.hasScheduledDeliveryTime(AMQPMessage.java:1352) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1499) [artemis-server-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.core.server.cluster.impl.Redistributor$1.run(Redistributor.java:169) [artemis-server-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.19.1.jar:2.19.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_322]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.19.1.jar:2.19.1]
Later I see broker is removing consumers with below warning since properties=null:
00:10:28,280 WARN [org.apache.activemq.artemis.core.server] AMQ222151: removing consumer which did not handle a message, consumer=ServerConsumerImpl [id=57, filter=null, binding=LocalQueueBinding [address=TEST_QUEUE, queue=QueueImpl[name=TEST_QUEUE, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=localhost], temp=false]#9c000b7, filter=null, name=TEST_QUEUE, clusterName=TEST_QUEUE1e359f55-c92b-11ec-b908-005056a3af3f]], message=Reference[157995028]:RELIABLE:AMQPLargeMessage( [durable=true, messageID=157995028, address=TEST_QUEUE, size=0, scanningStatus=SCANNED, applicationProperties={VER=05, trackingId=62701757c80c3004d037ded6}, messageAnnotations={}, properties=null, extraProperties = TypedProperties[_AMQ_AD=TEST_QUEUE]]: java.lang.IllegalArgumentException: Array must not be empty or null
at org.apache.qpid.proton.codec.CompositeReadableBuffer.append(CompositeReadableBuffer.java:688) [proton-j-0.33.10.jar:]
at org.apache.qpid.proton.engine.impl.DeliveryImpl.send(DeliveryImpl.java:345) [proton-j-0.33.10.jar:]
at org.apache.qpid.proton.engine.impl.SenderImpl.send(SenderImpl.java:74) [proton-j-0.33.10.jar:]
at org.apache.activemq.artemis.protocol.amqp.proton.ProtonServerSenderContext$LargeMessageDeliveryContext.deliverInitialPacket(ProtonServerSenderContext.java:686) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
at org.apache.activemq.artemis.protocol.amqp.proton.ProtonServerSenderContext$LargeMessageDeliveryContext.deliver(ProtonServerSenderContext.java:587) [artemis-amqp-protocol-2.19.1.jar:2.19.1]
I have 6 consumers for this queue if one message out of many (let's say 1,000) is large - it should process other messages but, the processing is stopped completely with 0 consumers on the queue.

When you send large messages, within the first block sent the large message is parsed for properties. Most likely you are using large properties in a way the server is not being able to parse the first few bytes of the message. You should avoid using large properties and let the large portion only for the client.
We tried following up with many tests on this JIRA, and the only plausible scenarios would be through large properties, or some incomplete message your client is generating in a way the server can't parse it.
https://issues.apache.org/jira/browse/ARTEMIS-3837
if you provide a reproducer showing how the property is failing to be parsed we will follow up with a possible fix.
Please move any discussion regarding this to the JIRA. It could be a bug caused by your anti-pattern.

Kafka producer huge memory usage (leak?)

We have a 3 broker Kafka 0.10.1.0 deployment in production. There are some applications which have Kafka Producers embedded in them which send application logs to a topic. This topic has 10 partitions with replication factor of 3.
We are observing that memory usage on some of these application servers keep shooting through the roof intermittently. After taking heapdump we found out that top suspects were:
**org.apache.kafka.common.network.Selector -**
occupies 352,519,104 (24.96%) bytes. The memory is accumulated in one instance of "byte[]" loaded by "<system class loader>".
**org.apache.kafka.common.network.KafkaChannel -**
occupies 352,527,424 (24.96%) bytes. The memory is accumulated in one instance of "byte[]" loaded by "<system class loader>"
Both of these were holding about 352MB of space. 3 such instances, so they were consuming about 1.2GB of memory.
Now regarding usage of producers. Not a huge amount of logs are being sent to Kafka cluster. It is about 200 msgs/sec. Only one producer object is being used throughout application. Async send function is used.
What could be the cause of such huge memory usage? Is this some sort of memory leak in this specific Kafka version?
Here's Kafka Producer config being used in production.
kafka.bootstrap.servers=x.x.x.x:9092,x.x.x.x:9092,x.x.x.x:9092
kafka.acks=0
kafka.key.serializer=org.apache.kafka.common.serialization.StringSerializer
kafka.value.serializer=org.apache.kafka.common.serialization.StringSerializer
kafka.max.block.ms=1000
kafka.request.timeout.ms=1000
kafka.max.in.flight.requests.per.connection=1
kafka.retries=0
kafka.compression.type=gzip
kafka.security.protocol=SSL
kafka.ssl.truststore.location=/data/kafka/kafka-server-truststore.jks
kafka.ssl.truststore.password=XXXXXX
kafka.linger.ms=300
logger.level=INFO
Here's a section from GC log showing Kafka network thread allocation
<allocation-stats totalBytes="3636833992" >
<allocated-bytes non-tlh="3525405200" tlh="111428792" />
<largest-consumer threadName="kafka-producer-network-thread | producer-1" threadId="0000000033A26700" bytes="3525287448" />
</allocation-stats>
<gc-op id="591417" type="scavenge" timems="21.255" contextid="591414" timestamp="2018-09-19T17:55:32.938">
<scavenger-info tenureage="14" tenuremask="4000" tiltratio="89" />
<memory-copied type="nursery" objects="61155" bytes="6304384" bytesdiscarded="3968416" />
<memory-copied type="tenure" objects="1199" bytes="230312" bytesdiscarded="38656" />
<finalization candidates="461" enqueued="316" />
<ownableSynchronizers candidates="18" cleared="5" />
<references type="soft" candidates="231" cleared="0" enqueued="0" dynamicThreshold="23" maxThreshold="32" />
<references type="weak" candidates="20" cleared="2" enqueued="1" />
<references type="phantom" candidates="2" cleared="0" enqueued="0" />
</gc-op>
<gc-end id="591418" type="scavenge" contextid="591414" durationms="21.715" usertimems="11.640" systemtimems="0.125" timestamp="2018-09-19T17:55:32.939" activeThreads="64">
<mem-info id="591419" free="4226106664" total="6049234944" percent="69">
<mem type="nursery" free="3855164752" total="4294967296" percent="89">
<mem type="allocate" free="3855164752" total="3865444352" percent="99" />
<mem type="survivor" free="0" total="429522944" percent="0" />
</mem>
<mem type="tenure" free="370941912" total="1754267648" percent="21">
<mem type="soa" free="362646600" total="1740233728" percent="20" />
<mem type="loa" free="8295312" total="14033920" percent="59" />
</mem>
<pending-finalizers system="315" default="1" reference="1" classloader="0" />
<remembered-set count="4110" />
</mem-info>
</gc-end>
<cycle-end id="591420" type="scavenge" contextid="591414" timestamp="2018-09-19T17:55:32.940" />
<allocation-satisfied id="591421" threadId="0000000033A26700" bytesRequested="352518920" />
<af-end id="591422" timestamp="2018-09-19T17:55:32.962" />
<exclusive-end id="591423" timestamp="2018-09-19T17:55:32.962" durationms="45.987" />

There can be many reasons.But if you need to optimise,you can try out the below things:
replica.fetch.max.bytes-Buffer size of each partition.Number of partitions multiplied by the size of the largest message does not exceed available memory.
Same applies for consumers- fetch.message.max.bytes,max.partition.fetch.bytes-The maximum amount of data per-partition the server will return. Check pointing of missing data can be done by using- replica.high.watermark.checkpoint.interval.ms which will tune the throughput.
2.batch.size(shouldn't exceed available memory) and linger.ms(sets the maximum time to buffer data,if its asynchronous)

Spring-amqp - Message processing delayed

We have a Java/spring/tomcat application deployed on a RHEL 7.0 VM, which uses AlejandroRivera/embedded-rabbitmq and it starts the Rabbitmq server as soon as the war gets deployed, and it connects to it. We have multiple queues that we use to handle and filter out events.
The flow is something like this:
event that we received -> publish event queue -> listener class filters events --> publish to another queue for processing
-> we publish to yet another queue for logging.
The issue is:
Processing starts normally, we can see messages flowing though the queues, but after some time the listener class, stops receiving events. It seems like we were able to publish it to the RabbitMQ channel, but it never got out of the queue to the listener.
This seems to start degrading causing events to be processed after some time, rising up till minutes. The load isn't as high, it's like around 200 events, from which we care about it's only a handful of them.
What we tried:
Initially the queues had pre-fetch set to 1, and consumers to be min of 2 and max of 5, we removed pre-fetch and we added more consumers as max concurrency setting, but the issue is still there, the delay just takes longer to present, but after a few minutes, the processing starts to take around 20/30 seconds.
We see in the logs that we published the event to the queue, and we see the log that we got it off the queue with a delay. So there's nothing running in our code in the middle to generate this delay.
As far as we can tell, the rest of the queues seem to process messages properly, but it's this one that gets in this stuck mode..
The errors that I see, are the following, but I'm usure what it means and if it's related:
Jun 4 11:16:04 server: [pool-3-thread-10] ERROR com.rabbitmq.client.impl.ForgivingExceptionHandler - Consumer org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$InternalConsumer#70dfa413 (amq.ctag-VaWc-hv-VwcUPh9mTQTj7A) method handleDelivery for channel AMQChannel(amqp://agent#127.0.0.1:5672/,198) threw an exception for channel AMQChannel(amqp://agent#127.0.0.1:5672/,198)
Jun 4 11:16:04 server: java.io.IOException: Unknown consumerTag
Jun 4 11:16:04 server: at com.rabbitmq.client.impl.ChannelN.basicCancel(ChannelN.java:1266)
Jun 4 11:16:04 server: at sun.reflect.GeneratedMethodAccessor180.invoke(Unknown Source)
Jun 4 11:16:04 server: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jun 4 11:16:04 server: at java.lang.reflect.Method.invoke(Method.java:498)
Jun 4 11:16:04 server: at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:955)
Jun 4 11:16:04 server: at com.sun.proxy.$Proxy119.basicCancel(Unknown Source)
Jun 4 11:16:04 server: at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$InternalConsumer.handleDelivery(BlockingQueueConsumer.java:846)
Jun 4 11:16:04 server: at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)
Jun 4 11:16:04 server: at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:100)
Jun 4 11:16:04 server: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Jun 4 11:16:04 server: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Jun 4 11:16:04 server: at java.lang.Thread.run(Thread.java:748)
This one happens on shutdown of the application, but I've seen it happen while the application is still running..
2018-06-05 13:22:45,443 ERROR CachingConnectionFactory$DefaultChannelCloseLogger - Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 109, class-id=60, method-id=120)
I'm not sure how to address these two errors, nor if they are related.
Here's my Spring config:
<!-- Queues -->
<rabbit:queue id="monitorIncomingEventsQueue" name="MonitorIncomingEventsQueue"/>
<rabbit:queue id="interestingEventsQueue" name="InterestingEventsQueue"/>
<rabbit:queue id="textCallsEventsQueue" name="TextCallsEventsQueue"/>
<rabbit:queue id="callDisconnectedEventQueue" name="CallDisconnectedEventQueue"/>
<rabbit:queue id="incomingCallEventQueue" name="IncomingCallEventQueue"/>
<rabbit:queue id="eventLoggingQueue" name="EventLoggingQueue"/>
<!-- listeners -->
<bean id="monitorListener" class="com.example.rabbitmq.listeners.monitorListener"/>
<bean id="interestingEventsListener" class="com.example.rabbitmq.listeners.InterestingEventsListener"/>
<bean id="textCallsEventListener" class="com.example.rabbitmq.listeners.TextCallsEventListener"/>
<bean id="callDisconnectedEventListener" class="com.example.rabbitmq.listeners.CallDisconnectedEventListener"/>
<bean id="incomingCallEventListener" class="com.example.rabbitmq.listeners.IncomingCallEventListener"/>
<bean id="eventLoggingEventListener" class="com.example.rabbitmq.listeners.EventLoggingListener"/>
<rabbit:listener-container connection-factory="connectionFactory" message-converter="defaultMessageConverter" concurrency="5" max-concurrency="40" acknowledge="none">
<rabbit:listener queues="interestingEventsQueue" ref="interestingEventsListener" method="handleIncomingMessage"/>
</rabbit:listener-container>
<rabbit:listener-container connection-factory="connectionFactory" message-converter="defaultMessageConverter" concurrency="5" max-concurrency="20" acknowledge="none">
<rabbit:listener queues="textCallsEventsQueue" ref="textCallsEventListener" method="handleIncomingMessage"/>
</rabbit:listener-container>
<rabbit:listener-container connection-factory="connectionFactory" message-converter="defaultMessageConverter" concurrency="5" max-concurrency="20" acknowledge="none">
<rabbit:listener queues="callDisconnectedEventQueue" ref="callDisconnectedEventListener" method="handleIncomingMessage"/>
</rabbit:listener-container>
<rabbit:listener-container connection-factory="connectionFactory" message-converter="defaultMessageConverter" concurrency="5" max-concurrency="30" acknowledge="none">
<rabbit:listener queues="incomingCallEventQueue" ref="incomingCallEventListener" method="handleIncomingMessage"/>
</rabbit:listener-container>
<rabbit:listener-container connection-factory="connectionFactory" message-converter="defaultMessageConverter" concurrency="1" max-concurrency="3" acknowledge="none">
<rabbit:listener queues="monitorIncomingEventsQueue" ref="monitorListener" method="handleIncomingMessage"/>
</rabbit:listener-container>
<rabbit:listener-container connection-factory="connectionFactory" message-converter="defaultMessageConverter" concurrency="5" max-concurrency="10" acknowledge="none">
<rabbit:listener queues="EventLoggingQueue" ref="eventLoggingEventListener" method="handleLoggingEvent"/>
</rabbit:listener-container>
<rabbit:connection-factory id="connectionFactory" host="${host.name}" port="${port.number}" username="${user.name}" password="${user.password}" connection-timeout="20000"/>
I've read here that the delay on processing could be caused by a network problem, but in this case the server and the app are on the same VM. It's a locked down environment, so most ports aren't open, but I doubt that's what's wrong.
More logs: https://pastebin.com/4QMFDT7A
Any help is appreciated,
Thanks,

I need to see much more log than that - this is the smoking gun:
Storing...Storing delivery for Consumer#a2ce092: tags=[{}]
The (consumer) tags is empty, which means the consumer has already been canceled at that time (for some reason, which should appear earlier in the log).
If there's any chance you could reproduce with 1.7.9.BUILD-SNAPSHOT, I added some TRACE level logging which should help diagnosing this.
EDIT
In reply to your recent comment on rabbitmq-users...
Can you try with fixed concurrency? Variable concurrency in Spring AMQP's container is often not very useful because consumers will typically only be reduced if the entire container is idle for some time.
It might explain, however, why you are seeing consumers being canceled.
Perhaps there are/were some race conditions in that logic; using a fixed number of consumers (don't specify max...) will avoid that; if you can try it, it will at least eliminate that as a possibility.
That said, I am confused (I didn't notice this in your Stack Overflow configuration); with acknowledge="none" there should be no acks being sent to the broker (NONE is used to set the autoAck)
String consumerTag = this.channel.basicConsume(queue, this.acknowledgeMode.isAutoAck(), ...
and
public boolean isAutoAck() {
return this == NONE;
}
Are you sending acks from your code? If so , the ack mode should be MANUAL. I can't see a scenario where the container will send an ack for a NONE ack mode.

Jgroups cluster works for few hours/couple of days and then fails sending messages.

We have production environment using jgroups 2.6.13. Cluster is created within a subnet with different machines.
Jgroups view shows all members in cluster and works fine for few hours even sometimes couple of days. However then fails sending packets even when cluster shows all members in the cluster view.
What is best way to debug in production without stopping processes. Which utilities in jgroups 2.6.13 would help simulate this.
Below is the stack configuration
<config>
<UDP mcast_addr="${jgroups.udp.mcast_addr:224.0.0.37}" mcast_port="${jgroups.udp.mcast_port:45576}" tos="8" ucast_recv_buf_size="20000000" ucast_send_buf_size="640000" mcast_recv_buf_size="25000000" mcast_send_buf_size="640000" loopback="false" discard_incompatible_packets="true" max_bundle_size="64000" max_bundle_timeout="30" use_incoming_packet_handler="true" ip_ttl="${jgroups.udp.ip_ttl:2}" enable_bundling="true" enable_diagnostics="true" thread_naming_pattern="cl" use_concurrent_stack="true" thread_pool.enabled="true" thread_pool.min_threads="2" thread_pool.max_threads="8" thread_pool.keep_alive_time="5000" thread_pool.queue_enabled="true" thread_pool.queue_max_size="1000" thread_pool.rejection_policy="Run" oob_thread_pool.enabled="true" oob_thread_pool.min_threads="1" oob_thread_pool.max_threads="8" oob_thread_pool.keep_alive_time="5000" oob_thread_pool.queue_enabled="false" oob_thread_pool.queue_max_size="100" oob_thread_pool.rejection_policy="Run"/>
<SNPING timeout="2000" num_initial_members="3" bind_port="34343" subnets="10.30.21.0,10.30.25.0"/>
<MERGE2 max_interval="30000" min_interval="10000"/>
<FD_SOCK/>
<FD timeout="10000" max_tries="5" shun="true"/>
<VERIFY_SUSPECT timeout="1500"/>
<pbcast.NAKACK use_stats_for_retransmission="false" exponential_backoff="150" use_mcast_xmit="true" gc_lag="0" retransmit_timeout="50,300,600,1200" discard_delivered_msgs="true"/>
<UNICAST timeout="300,600,1200"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/>
<VIEW_SYNC avg_send_interval="60000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000" shun="false" view_bundling="true"/>
</config>

How can I improve the tcp connection performance?

I have the following code:
<int-ip:tcp-connection-factory id="client"
type="client" host="${netSocketServer}" port="${netPort}"
single-use="true" so-timeout="${netSoTimeOut}" />
<int:channel id="input" />
<int-ip:tcp-outbound-gateway id="outGateway"
request-channel="input" reply-channel="reply" connection-factory="client"
request-timeout="${netRequestTimeout}" reply-timeout="${netReplyTimeout}" />
<int:channel id="reply" datatype="java.lang.String" />
I must speed up my connection with other server.
Can I make an inprove on that code?
I log the following connection times to make me think abount the slow connection.
Waiting time was: 5985
Waiting time was: 6015
Waiting time was: 1578,
Waiting time was: 5610,
Waiting time was: 5735,
Waiting time was: 1734,
Waiting time was: 1797,
Waiting time was: 1515,
Waiting time was: 1469,
Waiting time was: 6003,
Waiting time was: 6656
Thanks in advance.

You haven't provided the units.
If they're milliseconds you have a serious network problem unrelated to your Spring-Integration config.
If they're microseconds, that looks normal, there's nothing wrong.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Cluster instability with TCPPING protocol - java

Related

Redistribution fails for large messages in ActiveMQ Artemis

Kafka producer huge memory usage (leak?)

Spring-amqp - Message processing delayed

Jgroups cluster works for few hours/couple of days and then fails sending messages.

How can I improve the tcp connection performance?

Categories

Resources