Kafka Client Connection Pooling - java

Does it make sense to pool producer/consumer connections for Kafka clients?
Does Kafka internally maintain a list of connection objects that are initialized and ready to use?
We'd like to minimize connection-creation time, so that there is no additional overhead when sending or receiving messages.
Currently we're using GenericObjectPool from the Apache commons-pool library to keep connections around.
Any help will be appreciated.

Kafka clients maintain their own connections to the cluster.
Both the Producer and the Consumer keep connections alive to the brokers they are interacting with. If they stop interacting with a broker, the connection is closed after connections.max.idle.ms. This setting also exists on the broker, so you may want to check with your admin whether that value has been changed.
So in most cases, once started, Kafka clients don't create many new connections but just reuse the ones created at startup.
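To illustrate the idle-timeout setting mentioned above, here is a minimal sketch of setting it explicitly on the client side. It only builds a java.util.Properties object; KafkaClientSettings and withIdleTimeout are made-up names, the broker address is a placeholder, and the serializer settings a real producer needs are left out.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): builds settings that could be
// passed to new KafkaProducer<>(props) or new KafkaConsumer<>(props).
// Serializer/deserializer settings are omitted for brevity.
class KafkaClientSettings {
    static Properties withIdleTimeout(String bootstrapServers, long idleMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Connections that stay idle longer than this are closed by the client.
        props.setProperty("connections.max.idle.ms", Long.toString(idleMs));
        return props;
    }
}
```

For example, KafkaClientSettings.withIdleTimeout("localhost:9092", 540000L) keeps idle connections for nine minutes. I believe 540000 ms is the client-side default, but treat that as an assumption and check the documentation for your client version.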

Related

How to check if connection is the performance bottleneck

I am using Apache Ignite with a Java application and observing that response times increase with increasing concurrency. I noticed that there is only one connection established between the Java application and the Ignite server. How can I confirm whether that is the bottleneck? Thread dumps reveal that some threads are waiting in the Socket.Read method. Is that related to the number of connections?
As of Ignite 2.7.6, the thin client establishes only one connection to the server node. Yes, it can become a bottleneck when used from multiple threads.
I can recommend either having one IgniteClient instance per thread, or using some kind of connection pool.
Also, Ignite 2.8 introduces Partition Awareness (release is planned for today), where a thin client connection is established to every specified server node, and key-based requests are dispatched to primary nodes. This may help in your case as well.
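The "some kind of connection pool" option could look roughly like the sketch below. It is a generic fixed-size pool written against the JDK only; SimpleClientPool and its method names are made up for illustration, T stands in for a client such as IgniteClient, and the factory you pass in would be responsible for creating the real client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; T stands in for a client such as IgniteClient.
class SimpleClientPool<T> {
    private final BlockingQueue<T> idle;

    SimpleClientPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create all clients up front
        }
    }

    // Borrows a client, runs the action, and always hands the client back.
    <R> R withClient(Function<T, R> action) {
        T client;
        try {
            client = idle.take(); // blocks while every client is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a client", e);
        }
        try {
            return action.apply(client);
        } finally {
            idle.add(client); // a slot is free because we took one above
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

With a pool of N clients, at most N requests are in flight at once and each uses its own connection. Closing the clients on shutdown and replacing broken ones are left out here.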
Have you tried the tools that ship with the JDK (JVisualVM) or, better, YourKit to identify where you're losing time?

Robust web Socket Streaming to Kafka in Java

I need to record from an unreliable web socket connection and stream into Kafka.
Our Kafka cluster is pretty reliable and we can make it highly available.
What is the best approach to make the web socket connection as reliable as possible? I would like to minimize data loss.
One solution would be to have multiple processes or web socket clients listening and streaming to multiple Kafka topics. Then do a filter with Kafka streams. This only works if each message that I obtain has a unique id, which is not always the case.
Another solution would be to monitor the web socket connection and restart or reset it. But then I might have data loss. Or rely on web socket heartbeats?
Or code my own error handlers?
Which frameworks / libraries from the Java ecosystem do the best job in this space? Currently, I use the web socket client from org.java-websocket.
I think this is a pretty standard use case in web socket and also in Kafka producer development, so I am sure I do not need to reinvent the wheel.

ActiveMQ client: share connection between sessions?

I'm using the ActiveMQ client library to connect my server application to ActiveMQ. Several different consumers and producers run in individual threads. How should the relationship between ActiveMQConnectionFactory, ActiveMQConnection and ActiveMQSession be?
one connection factory per JVM
one connection to the broker per JVM or n connections, one per consumer
n sessions, one per consumer (the Javadoc seems to strongly suggest this)
Have a look at "How do I use JMS efficiently?".
You should also think about using connection pooling.

RabbitMQ and relationship between channel and connection

The RabbitMQ Java client has the following concepts:
Connection - a connection to a RabbitMQ server instance
Channel - ???
Consumer thread pool - a pool of threads that consume messages off the RabbitMQ server queues
Queue - a structure that holds messages in FIFO order
I'm trying to understand the relationship, and more importantly, the associations between them.
I'm still not quite sure what a Channel is, other than the fact that this is the structure that you publish and consume from, and that it is created from an open connection. If someone could explain to me what the "Channel" represents, it might help clear a few things up.
What is the relationship between Channel and Queue? Can the same Channel be used to communicate with multiple Queues, or does it have to be 1:1?
What is the relationship between Queue and the Consumer Pool? Can multiple Consumers be subscribed to the same Queue? Can multiple Queues be consumed by the same Consumer? Or is the relationship 1:1?
A Connection represents a real TCP connection to the message broker, whereas a Channel is a virtual connection (AMQP connection) inside it. This way you can use as many (virtual) connections as you want inside your application without overloading the broker with TCP connections.
You can use one Channel for everything. However, if you have multiple threads, it's suggested to use a different Channel for each thread.
Channel thread-safety in Java Client API Guide:
Channel instances are safe for use by multiple threads. Requests into
a Channel are serialized, with only one thread being able to run a
command on the Channel at a time. Even so, applications should prefer
using a Channel per thread instead of sharing the same Channel across
multiple threads.
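To illustrate the channel-per-thread advice quoted above, here is a minimal sketch that hands each thread its own channel-like object on first use. It uses only the JDK; PerThreadChannels is a made-up name and C stands in for com.rabbitmq.client.Channel. In a real application the supplier would call createChannel() on one shared Connection and wrap the checked IOException it can throw.

```java
import java.util.function.Supplier;

// Hypothetical helper: gives every thread its own channel-like object.
// C stands in for com.rabbitmq.client.Channel.
class PerThreadChannels<C> {
    private final ThreadLocal<C> local;

    PerThreadChannels(Supplier<C> opener) {
        // The opener runs once per thread, the first time current() is called there.
        this.local = ThreadLocal.withInitial(opener);
    }

    // Returns the calling thread's channel, opening it on first use.
    C current() {
        return local.get();
    }
}
```

Closing each channel when its thread finishes is not handled here and would need to be added for long-running applications.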
There is no direct relation between Channel and Queue. A Channel is used to send AMQP commands to the broker. This can be the creation of a queue or similar, but these concepts are not tied together.
Each Consumer runs in its own thread allocated from the consumer thread pool. If multiple Consumers are subscribed to the same Queue, the broker uses round-robin to distribute the messages between them equally. See Tutorial two: "Work Queues".
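The round-robin behaviour described above can be pictured with a toy model. This is not RabbitMQ code, just a JDK-only sketch under simplifying assumptions: RoundRobinQueue is a made-up class, messages are plain strings, and delivery happens synchronously instead of being buffered by a broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-in for one queue with several subscribed consumers.
class RoundRobinQueue {
    private final List<Consumer<String>> consumers = new ArrayList<>();
    private int next = 0;

    void subscribe(Consumer<String> consumer) {
        consumers.add(consumer);
    }

    // Delivers each message to exactly one consumer, taking turns.
    void publish(String message) {
        if (consumers.isEmpty()) {
            // A real broker would keep the message queued instead.
            throw new IllegalStateException("no consumers subscribed");
        }
        consumers.get(next).accept(message);
        next = (next + 1) % consumers.size();
    }
}
```

With two subscribers, the first receives messages 1, 3, 5, ... and the second receives 2, 4, 6, ..., so each message is handled by only one of them.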
It is also possible to attach the same Consumer to multiple Queues.
You can think of Consumers as callbacks. They are called every time a message arrives on a Queue the Consumer is bound to. In the Java client, each Consumer has a method handleDelivery(...), which is the callback method. What you typically do is subclass DefaultConsumer and override handleDelivery(...). Note: if you attach the same Consumer instance to multiple queues, this method will be called by different threads, so take care of synchronization if necessary.
A good conceptual understanding of what the AMQP protocol does "under the hood" is useful here. I would offer that the documentation and API that AMQP 0.9.1 chose to deploy makes this particularly confusing, so the question itself is one which many people have to wrestle with.
TL;DR
A connection is the physical negotiated TCP socket with the AMQP server. Properly-implemented clients will have one of these per application, thread-safe, sharable among threads.
A channel is a single application session on the connection. A thread will have one or more of these sessions. The AMQP 0.9.1 architecture is that these are not to be shared among threads, and should be closed/destroyed when the thread that created them is finished with them. They are also closed by the server when various protocol violations occur.
A consumer is a virtual construct that represents the presence of a "mailbox" on a particular channel. The use of a consumer tells the broker to push messages from a particular queue to that channel endpoint.
Connection Facts
First, as others have correctly pointed out, a connection is the object that represents the actual TCP connection to the server. Connections are specified at the protocol level in AMQP, and all communication with the broker happens over one or more connections.
Since it's an actual TCP connection, it has an IP Address and Port #.
Protocol parameters are negotiated on a per-client basis as part of setting up the connection (a process known as the handshake).
It is designed to be long-lived; there are few cases where connection closure is part of the protocol design.
From an OSI perspective, it probably resides somewhere around Layer 6.
Heartbeats can be set up to monitor the connection status, as TCP does not contain anything in and of itself to do this.
It is best to have a dedicated thread manage reads and writes to the underlying TCP socket. Most, if not all, RabbitMQ clients do this. In that regard, they are generally thread-safe.
Relatively speaking, connections are "expensive" to create (due to the handshake), but practically speaking, this really doesn't matter. Most processes really will only need one connection object. But, you can maintain connections in a pool, if you find you need more throughput than a single thread/socket can provide (unlikely with current computing technology).
Channel Facts
A Channel is the application session that is opened for each piece of your app to communicate with the RabbitMQ broker. It operates over a single connection, and represents a session with the broker.
As it represents a logical part of application logic, each channel usually exists on its own thread.
Typically, all channels opened by your app will share a single connection (they are lightweight sessions that operate on top of the connection). Connections are thread-safe, so this is OK.
Most AMQP operations take place over channels.
From an OSI Layer perspective, channels are probably around Layer 7.
Channels are designed to be transient; part of the design of AMQP is that the channel is typically closed in response to an error (e.g. re-declaring a queue with different parameters before deleting the existing queue).
Since they are transient, channels should not be pooled by your app.
The server uses an integer to identify a channel. When the thread managing the connection receives a packet for a particular channel, it uses this number to work out which channel/session the packet belongs to.
Channels are not generally thread-safe as it would make no sense to share them among threads. If you have another thread that needs to use the broker, a new channel is needed.
Consumer Facts
A consumer is an object defined by the AMQP protocol. It is neither a channel nor a connection, instead being something that your particular application uses as a "mailbox" of sorts to drop messages.
"Creating a consumer" means that you tell the broker (using a channel via a connection) that you would like messages pushed to you over that channel. In response, the broker will register that you have a consumer on the channel and begin pushing messages to you.
Each message pushed over the connection will reference both a channel number and a consumer identifier (the consumer tag). In that way, the connection-managing thread (in this case, within the Java API) knows what to do with the message; then, the channel-handling thread also knows what to do with the message.
Consumer implementation has the widest amount of variation, because it's literally application-specific. In my implementation, I chose to spin off a task each time a message arrived via the consumer; thus, I had a thread managing the connection, a thread managing the channel (and by extension, the consumer), and one or more task threads for each message delivered via the consumer.
Closing a connection closes all channels on the connection. Closing a channel closes all consumers on the channel. It is also possible to cancel a consumer (without closing the channel). There are various cases when it makes sense to do any of the three things.
Typically, the implementation of a consumer in an AMQP client will allocate one dedicated channel to the consumer to avoid conflicts with the activities of other threads or code (including publishing).
As for what you mean by consumer thread pool, I suspect that the Java client is doing something similar to what I programmed my client to do (mine was based on the .NET client, but heavily modified).
I found this article, which explains all aspects of the AMQP model, of which the channel is one. I found it very helpful in rounding out my understanding:
https://www.rabbitmq.com/tutorials/amqp-concepts.html
Some applications need multiple connections to an AMQP broker. However, it is undesirable to keep many TCP connections open at the same time because doing so consumes system resources and makes it more difficult to configure firewalls. AMQP 0-9-1 connections are multiplexed with channels that can be thought of as "lightweight connections that share a single TCP connection".
For applications that use multiple threads/processes for processing, it is very common to open a new channel per thread/process and not share channels between them.
Communication on a particular channel is completely separate from communication on another channel, therefore every AMQP method also carries a channel number that clients use to figure out which channel the method is for (and thus, which event handler needs to be invoked, for example).
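The role of the channel number can be pictured with a small JDK-only sketch. This is a toy model, not the client library's implementation: ChannelDemux is a made-up class, frames are reduced to a channel number plus a string payload, and handlers are plain callbacks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of one connection carrying frames for many channels.
class ChannelDemux {
    private final Map<Integer, Consumer<String>> handlers = new HashMap<>();

    // Registers the handler that receives frames for this channel number.
    void openChannel(int channelNumber, Consumer<String> handler) {
        handlers.put(channelNumber, handler);
    }

    // Routes a frame to the handler registered for its channel number.
    boolean dispatch(int channelNumber, String payload) {
        Consumer<String> handler = handlers.get(channelNumber);
        if (handler == null) {
            return false; // unknown channel; a real client would treat this as an error
        }
        handler.accept(payload);
        return true;
    }
}
```

Every frame arrives over the same socket, and the number it carries is the only thing that decides which channel's handler sees it.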
There is a relationship between them: a TCP connection can have multiple Channels.
Channel: a virtual connection inside a connection. Publishing to or consuming messages from a queue is all done over a channel. Connection: a TCP connection between your application and the RabbitMQ broker.
In a multi-threaded architecture, you may need a separate connection per thread. That may lead to underutilization of the TCP connections, and it also adds overhead for the operating system, which has to establish as many TCP connections as are required at peak times. The performance of the system could be drastically reduced. This is where the channel comes in handy: it creates virtual connections inside a TCP connection. It immediately reduces the overhead on the OS, and it also allows us to perform asynchronous operations in a faster, more reliable, and concurrent way.

How do I take advantage of Connection Pooling with Apache Active MQ?

I'd like to know how to properly use connection pooling with Active MQ.
Currently I have a Connection Factory that creates a new connection every time I want to send a message.
I'd like to be able to pool Connections so I don't incur the overhead of connecting every time.
You need to use the activemq-pool module and its PooledConnectionFactory.
See http://activemq.apache.org/spring-support.html for some more info on the topic
