Jzmq producer doesn't block when consumer is full - java

I have set up a ZeroMQ pipeline over a VPN. However, the producer does not take the consumer's consumption capacity into account: it keeps sending messages regardless, and RAM consumption has grown enormously.
I want to find the reason behind this problem. Maybe it's due to the UDP VPN channel.

Q : ( find the reason behind this problem ) Maybe it's due to the UDP VPN channel.
Disclaimer, given the ultimate Information Asymmetry is still pending here :
Well, given there are so far Zero-pieces-of-information (the less the Community-promoted MCVE ~ Minimum-Complete-Verifiable-Example of a code-based problem formulation), either about the UDP channel, or about the composition, configuration and interconnection(s) of the "producer" and "consumer" entities, except for mentioning ZeroMQ itself, this answer can be and will be based only on generally available knowledge.
Answer :
ZeroMQ framework is based on a few, rather cardinal, principles :
Rule 1 ) it is Broker-less - i.e. the resulting ecosystem resembles a network of independently working agents.
Rule 2 ) user can implement whatever add-on capability, so as to extend the Rule 1.
This said, if you configure the application-domain behaviour of your ZeroMQ-interconnected agents in a proper manner, the local agent may receive some indication from the remote agent(s) about its (their) problems with RAM ( for details about possible restrictive configurations of the sets of Tx/Rx-queues, see the well-published API documentation ).
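As a hedged illustration (a PUSH/PULL sketch using JeroMQ; the endpoint and queue size are placeholders), bounding the Tx-queue via the high-water mark makes the producer block instead of buffering without limit:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class BoundedProducer {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket push = ctx.createSocket(SocketType.PUSH);
            push.setSndHWM(1000);                       // cap the Tx-queue at 1000 messages
            push.connect("tcp://consumer-host:5555");   // placeholder endpoint
            byte[] payload = new byte[256];
            while (!Thread.currentThread().isInterrupted()) {
                // once the HWM is reached, send() blocks instead of
                // queueing further messages, keeping producer RAM bounded
                push.send(payload, 0);
            }
        }
    }
}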
Finally :
It is almost certain that neither UDP nor the VPN has anything to do with the "problem",
( the producer doesn't consider the consumption capacity of the consumer. )
as this is a by-design property of the ZeroMQ concept, unless one implements an application-domain-specific distributed-FSA layer atop the trivial ZeroMQ Scalable Formal Communication Pattern archetypes, one that would provide such app-level add-on signalling/messaging among the otherwise autonomous agents ( ZeroMQ Context()-instance equipped agent-alike entities ).
If interested, feel free to read more about ZeroMQ here & get inspired by its beauties and powers.

Related

GRPC: make high-throughput client in Java/Scala

I have a service that transfers messages at a quite high rate.
Currently it is served by akka-tcp and it makes 3.5M messages per minute. I decided to give grpc a try.
Unfortunately it resulted in much lower throughput: ~500k messages per minute and even less.
Could you please recommend how to optimize it?
My setup
Hardware: 32 cores, 24 GB heap.
grpc version: 1.25.0
Message format and endpoint
Message is basically a binary blob.
Client streams 100K - 1M and more messages into the same request (asynchronously); the server doesn't respond with anything, and the client uses a no-op observer:
service MyService {
    rpc send (stream MyMessage) returns (stream DummyResponse);
}

message MyMessage {
    int64 someField = 1;
    bytes payload = 2; // not huge
}

message DummyResponse {
}
Problems:
Message rate is low compared to the akka implementation.
I observe low CPU usage, so I suspect that the grpc call is actually blocking internally, despite claims to the contrary. Calling onNext() indeed doesn't return immediately, but there is also GC on the table.
I tried to spawn more senders to mitigate this issue but didn't get much improvement.
My findings
Grpc actually allocates an 8KB byte buffer for each message when serializing it. See the stack trace:
java.lang.Thread.State: BLOCKED (on object monitor)
at com.google.common.io.ByteStreams.createBuffer(ByteStreams.java:58)
at com.google.common.io.ByteStreams.copy(ByteStreams.java:105)
at io.grpc.internal.MessageFramer.writeToOutputStream(MessageFramer.java:274)
at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:230)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:53)
at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:252)
at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:473)
at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:457)
at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:346)
Any help with best practices on building high-throughput grpc clients appreciated.
I solved the issue by creating several ManagedChannel instances per destination. Although articles say a single ManagedChannel can spawn enough connections by itself, so that one instance is enough, that wasn't true in my case.
Performance is in parity with akka-tcp implementation.
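For reference, a hedged sketch of that workaround (host, port and pool size are illustrative; the generated stub you build on top is whatever your service defines) - a small round-robin pool of channels:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Spread streams over several ManagedChannel instances, i.e. several
// HTTP/2 connections, instead of multiplexing everything onto one.
public class ChannelPool {
    private final List<ManagedChannel> channels = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    public ChannelPool(String host, int port, int size) {
        for (int i = 0; i < size; i++) {
            channels.add(ManagedChannelBuilder.forAddress(host, port)
                    .usePlaintext()
                    .build());
        }
    }

    // hand out channels round-robin; create one stub per stream on top of these
    public ManagedChannel next() {
        return channels.get(Math.floorMod(counter.getAndIncrement(), channels.size()));
    }
}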
Interesting question. Computer network packets are encoded using a stack of protocols, and each protocol is built on top of the specification of the one beneath it. Hence the performance (throughput) of a protocol is bounded by the performance of the protocol used to build it, since you are adding extra encoding/decoding steps on top of the underlying one.
For instance, gRPC is built on top of HTTP/2, which is a protocol at the Application layer, or L7, and as such its performance is bound by the performance of HTTP. Now HTTP itself is built on top of TCP, which is at the Transport layer, or L4, so we can deduce that gRPC throughput cannot be larger than that of equivalent code served directly at the TCP layer.
In other words: if your server is able to handle raw TCP packets, how would adding new layers of complexity (gRPC) improve performance?
I'm quite impressed with how good Akka TCP has performed here :D
Our experience was slightly different. We were working on much smaller instances using Akka Cluster. For Akka remoting, we changed from Akka TCP to UDP using Artery and achieved a much higher rate + lower and more stable response time. There is even a config in Artery helping to balance between CPU consumption and response time from a cold start.
My suggestion is to use some UDP-based framework which also takes care of transmission reliability for you (e.g. that Artery UDP), and just serialize using Protobuf, instead of using full-fledged gRPC. The HTTP/2 transmission channel is not really suited to high-throughput, low-latency purposes.

How to make ZMQ pub client socket buffer messages while sub server socket is down

Given 2 applications, where
application A is using a publisher client to continuously stream data to application B, which has a sub server socket to accept that data, how can we configure the pub client socket in application A such that when B is unavailable (e.g. being redeployed or restarted) A buffers all the pending messages, and when B becomes available again the buffered messages go through and the socket catches up with the real-time stream?
In a nutshell, how do we make a PUB CLIENT socket buffer messages with some limit while the SUB SERVER is unavailable?
The default behaviour for the PUB client is to drop messages while in the mute state, but it would be great if we could change that to a size-limited buffer. Is that possible with zmq, or do I need to do it at the application level...
I've tried setting HWM and LINGER on my sockets, but if I'm not wrong they only cover the slow-consumer case, where my publisher is connected to the subscriber, but the subscriber is so slow that the publisher starts to buffer messages (hwm will limit the number of those messages)...
I'm using jeromq since I'm targeting the JVM platform.
First of all, welcome to the world of Zen-of-Zero, where latency matters most.
PROLOGUE :
ZeroMQ was designed by Pieter HINTJENS' team of ultimately experienced masters - Martin SUSTRIK to be named first. The design was professionally crafted so as to avoid any unnecessary latency. So, asking about having a (limited) persistence? No, sir, not confirmed - the PUB/SUB Scalable Formal Communication Pattern Archetype will not have it built-in, right because of the added problems and decreased performance and scalability ( add-on latency, add-on processing, add-on memory-management ).
If one needs a (limited) persistence (for absent remote-SUB-side agent(s)' connections), feel free to implement it on the app-side, or one may design and implement a new ZMTP-compliant behaviour-pattern Archetype, extending the ZeroMQ framework, if such work reaches a stable and publicly accepted state. But do not ask for the high-performance, latency-shaved standard PUB/SUB, polished to almost linear scalability ad astra, to be modified in this direction. That is definitely not the way to go.
Solution ?
The app-side may easily implement your added logic, using dual-pointer circular buffers, working as a sort of (app-side-managed) Persistence-PROXY sitting in front of the PUB-sender.
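A minimal sketch of such a dual-pointer circular buffer (capacity and payload type are illustrative assumptions), which an app-side Persistence-PROXY could drain into the PUB-sender:

// Dual-pointer ring buffer: the writer overwrites the oldest entry once
// full, so memory stays bounded while the remote side is absent.
public class RingBuffer {
    private final byte[][] slots;
    private long head = 0;   // next write position
    private long tail = 0;   // next read position

    public RingBuffer(int capacity) {
        slots = new byte[capacity][];
    }

    public synchronized void put(byte[] msg) {
        slots[(int) (head++ % slots.length)] = msg;
        if (head - tail > slots.length) {
            tail = head - slots.length;   // buffer full: drop the oldest message
        }
    }

    public synchronized byte[] poll() {   // returns null when nothing is buffered
        return (tail < head) ? slots[(int) (tail++ % slots.length)] : null;
    }
}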
Your design may be successful in squeezing some additional sauce out of the ZeroMQ internals if it also uses the recently made available built-in ZeroMQ socket_monitor component to set up an additional control layer and receive there a stream of events as seen from "inside" the PUB-side Context-instance, where some additional network- and connection-management-related events may shed more light on your (app-side-managed) Persistence-PROXY.
Yet, be warned that
The zmq_socket_monitor() method supports only connection-oriented transports, that is, TCP, IPC, and TIPC.
so one may straight away forget about this in case any of the ultimately interesting transport-classes { inproc:// | norm:// | pgm:// | epgm:// | vmci:// } was planned to be used.
Heads up !
There are inaccurate, if not wrong, pieces of information from our Community's honourable member smac89, who tried his best to address your additional interest expressed in the comment:
"...zmq optimizes publishing on topics? like if you keep publishing on some 100char long topic rapidly, is it actually sending the topic every time or it maps to some int and sends the int subsequently...?"
telling you:
"It will always publish the topic. When I use the pub-sub pattern, I usually publish the topic first and then the actual message, so in the subscriber I just read the first frame and ignore it and then read the actual message"
ZeroMQ does not work this way. There is no such thing as a "separate" <topic> followed by a <message-body>; rather the opposite.
The TOPIC and the mechanisation of topic-filtering work in a very different way.
1) you never know who .connect()-s: i.e. one can be almost sure that versions 2.x through 4.2+ will handle the topic-filtering in different manners. The ZMTP:RFC defines initial capability-version handshaking, letting the Context-instance decide which version of topic-filtering will have to be used: ver 2.x used to move all messages to all peers, letting every SUB-side ( of ver 2.x+ ) be delivered the message ( and letting the SUB-side Context-instance process the local topic-list filtering ), whereas ver 4.2+ is sure to perform the topic-list filtering on the PUB-side Context-instance ( CPU-usage grows, network-transport load does the opposite ), so your SUB-side will never be delivered a byte of "useless", read "not-subscribed-to", messages.
2) ( you may, but ) there is no need to separate a "topic" into the first frame of a thus-implied multi-frame message. Perhaps just the opposite: it is rather an anti-pattern to do this in high-performance, low-latency distributed-system design.
The topic-filtering process is defined and works byte-wise, from left to right, pattern-matching each topic-list member value against the delivered message payload.
Adding extra data and extra frame-management processing only increases the end-to-end latency and processing overhead. It is never a good idea to do this instead of proper distributed-system design work.
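To make the byte-wise prefix matching concrete, here is a small hedged JeroMQ illustration (endpoint and topic string are placeholders):

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class PrefixFilterDemo {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket sub = ctx.createSocket(SocketType.SUB);
            sub.connect("tcp://pub-host:5556");              // placeholder endpoint
            // left-to-right, byte-wise prefix match: this subscription
            // delivers any message whose payload starts with "NASDAQ."
            sub.subscribe("NASDAQ.".getBytes(ZMQ.CHARSET));
            byte[] msg = sub.recv(0);  // "NASDAQ.AAPL 172.50" matches; "NYSE.IBM 141.20" does not
        }
    }
}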
EPILOGUE :
There are no easy wins, nor any low-hanging fruit, in professional distributed-systems design, still less so if low-latency or ultra-low-latency are the design targets.
On the other hand, be sure that the ZeroMQ framework was made with this in mind, and these efforts were crowned with a stable, ultimately performant, well-balanced set of tools for smart (by design), fast (in operation) and scalable (as hell may envy) signalling/messaging services that people love to use right because of this design wisdom.
Wish you live happy with ZeroMQ as it is and feel free to add any additional set of features "in front" of the ZeroMQ layer, inside your application suite of choice.
I'm posting a quick update since the other two answers (though very informative) were actually wrong, and I don't want others to be misinformed by my accepted answer. Not only can you do this with zmq, it is actually the default behaviour.
The trick is that if your publisher client has never connected to the subscriber server, it keeps dropping messages (and that is why I was thinking it does not buffer messages); but if your publisher has connected to the subscriber and you then restart the subscriber, the publisher will buffer messages until the HWM is reached, which is exactly what I asked for... So, in short, the publisher wants to know there is someone on the other end accepting messages; only after that will it buffer messages...
Here is some sample code which demonstrates this (you might need to do some basic edits to compile it).
I used only this dependency: org.zeromq:jeromq:0.5.1.
zmq-publisher.kt
fun main() {
    val uri = "tcp://localhost:3006"
    val context = ZContext(1)
    val socket = context.createSocket(SocketType.PUB)
    socket.hwm = 10000
    socket.linger = 0
    "connecting to $uri".log()
    socket.connect(uri)

    fun publish(path: String, msg: Msg) {
        ">> $path | ${msg.json()}".log()
        socket.sendMore(path)
        socket.send(msg.toByteArray())
    }

    var count = 0
    while (notInterrupted()) {
        val msg = telegramMessage("message : ${++count}")
        publish("/some/feed", msg)
        println()
        sleepInterruptible(1.second)
    }
}
and of course zmq-subscriber.kt
fun main() {
    val uri = "tcp://localhost:3006"
    val context = ZContext(1)
    val socket = context.createSocket(SocketType.SUB)
    socket.hwm = 10000
    socket.receiveTimeOut = 250
    "connecting to $uri".log()
    socket.bind(uri)
    socket.subscribe("/some/feed")

    while (true) {
        val path = socket.recvStr() ?: continue
        val bytes = socket.recv()
        val msg = Msg.parseFrom(bytes)
        "<< $path | ${msg.json()}".log()
    }
}
Try running the publisher first without the subscriber; when you then launch the subscriber, you'll have missed all the messages so far... Now, without restarting the publisher, stop the subscriber, wait for some time, and start it again.
Here is an example of one of my services actually benefiting from this...
This is the structure:
[current service] sub:server <= pub:client [service being restarted] sub:server <=* pub:client [multiple publishers]
Because I restart the service in the middle, all the publishers start buffering their messages. The final service, which was observing ~200 messages per second, sees the rate drop to 0 (those 1 or 2 remaining are heartbeats), then a sudden burst of 1000+ messages comes in as all the publishers flush their buffers (the restart took about 5 seconds)... I am not losing a single message here...
Note that you must have the subscriber:server <= publisher:client pairing (this way the publisher knows "there is only one place I need to deliver these messages to"). You can try binding on the publisher and connecting on the subscriber, but you will no longer see the publisher buffering messages, simply because it is questionable whether a subscriber that just disconnected did so because it no longer needs the data or because it failed.
As we've discussed in the comments, there is no way for the publisher to buffer messages while having nothing connected to it; it will simply drop any new messages:
From the docs:
If a publisher has no connected subscribers, then it will simply drop all messages.
This means your buffer needs to live outside of zeromq's care. Your buffer could then be a list, a database, or any other method of storage of your choosing, but you cannot use your publisher for doing that.
Now the next problem is dealing with how to detect that a subscriber has connected/disconnected. This is needed to tell us when we need to start reading from the buffer/filling the buffer.
I suggest using Socket.monitor and listening for ZMQ_EVENT_CONNECTED and ZMQ_EVENT_DISCONNECTED, as these will tell you when a client has connected or disconnected and thus enable you to switch to filling your buffer of choice. Of course, there might be other ways of doing this that do not directly involve zeromq, but that's up to you to decide.
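A hedged JeroMQ sketch of that monitor loop (endpoint names are placeholders; the event API shown is the one JeroMQ exposes, to the best of my knowledge):

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class MonitorDemo {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket pub = ctx.createSocket(SocketType.PUB);
            // ask the socket to emit connect/disconnect events on an inproc endpoint
            pub.monitor("inproc://pub.events", ZMQ.EVENT_CONNECTED | ZMQ.EVENT_DISCONNECTED);
            ZMQ.Socket events = ctx.createSocket(SocketType.PAIR);
            events.connect("inproc://pub.events");
            pub.bind("tcp://*:3006");                        // placeholder endpoint
            while (!Thread.currentThread().isInterrupted()) {
                ZMQ.Event e = ZMQ.Event.recv(events);
                if (e.getEvent() == ZMQ.EVENT_CONNECTED) {
                    // subscriber arrived: drain your external buffer into the PUB socket
                } else if (e.getEvent() == ZMQ.EVENT_DISCONNECTED) {
                    // subscriber gone: start diverting messages into the buffer
                }
            }
        }
    }
}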

What is the maximum number of topics an ActiveMQ broker is able to handle?

I have the following generic problem: given N sources of information, with M possible types of info for each source, what is the most efficient way to handle the topics hierarchy in ActiveMQ?
Typically N can be 100s to 10000s, while M should be 10.
Option 1
Have a hierarchy like
source1.*
source2.*
...
and a smart consumer that (on the application side) just drops the types of info not required.
Option 2
Have a hierarchy like
source1.type1
source1.type2
source1.type3
...
source2.type1
source2.type2
source2.type3
...
with a dumb consumer that accepts every message.
=================================
Option 1 probably allows more sources but requires more work on the consumer side (and more traffic on the network), while Option 2 should be more efficient in network traffic (and hopefully performance) but could be much heavier on broker resource consumption.
What's the best option?
Thank you very much
cghersi
There is nothing wrong with multiple sources publishing to the same topic. Generally it's a good idea to keep topics and queues down to manageable levels.
I would go for topics Type1, Type2, ..., Type10.
You can attach metadata to published messages as String properties. That way consumers can subscribe only to the data they really want, using a JMS selector that might include things such as data source, info type, date, priority, or what not.
MessageConsumer consumer = session.createConsumer(topic, "source = 'EU-market'");
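On the producing side, the counterpart is just setting those String properties before sending; a hedged fragment (assuming the same session and an already-created producer; "infoType" is a hypothetical extra property):

// Producer side: attach the metadata the consumers will select on.
TextMessage message = session.createTextMessage(payload);
message.setStringProperty("source", "EU-market");
message.setStringProperty("infoType", "type3");   // hypothetical extra property
producer.send(message);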
If there were 10 times 1000 topics, each client would have to open listeners on multiple topics to get all the information. Typically, each listener requires a thread, and threads require a good portion of stack allocation. Better to let the clients decide via a selector.
This does not answer the question of the maximum number of topics. Topics and queues add overhead to the broker: they are registered as JMX MBeans and shown in the web console GUI, and they may also allocate internal threads and memory resources. To figure out whether a given number of topics works, test it with the setup intended to run it. Your mileage may vary.

Java, Massive message processing with queue manager (trading)

I would like to design a simple application (without J2EE and JMS) that can process a massive amount of messages (as in trading systems).
I have created a service that can receive messages and place them in a queue so that the system won't get stuck when overloaded.
Then I created a service (QueueService) that wraps the queue and has a pop method that pops a message off the queue and returns null if there are no messages; this method is marked "synchronized" for the next step.
I have created a class that knows how to process a message (MessageHandler) and another class that can "listen" for messages on a new thread (MessageListener). The thread has a "while(true)" loop and keeps trying to pop a message.
If a message was returned, the thread calls the MessageHandler class and, when it's done, asks for another message.
Now, I have configured the application to open 10 MessageListeners to allow parallel message processing.
I now have 10 threads that are looping all the time.
Is that a good design??
Can anyone reference me to some books or sites on how to handle such a scenario??
Thanks,
Ronny
It seems from your description that you are on the right path, with one little exception: you have implemented a busy wait on the retrieval of messages from the queue.
A better way is to block your threads in the synchronized popMessage() method, calling wait() on the queue resource when no more messages can be popped. When adding message(s) to the queue, the waiting threads are woken up via notifyAll(); one or more threads will get a message and the rest re-enter the wait() state.
This way the distribution of CPU resources will be smoother.
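A minimal sketch of that scheme (class and method names are illustrative; Message stands for your own message type):

// Consumers block inside popMessage() instead of spinning in a loop.
public class QueueService {
    private final java.util.ArrayDeque<Message> queue = new java.util.ArrayDeque<>();

    public synchronized void push(Message m) {
        queue.addLast(m);
        notifyAll();                       // wake up any waiting listener threads
    }

    public synchronized Message popMessage() throws InterruptedException {
        while (queue.isEmpty()) {
            wait();                        // releases the monitor until push() notifies
        }
        return queue.removeFirst();
    }
}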
I understand that queuing providers like WebSphere and Sonic cost money, but there's always JBoss Messaging, FUSE with ActiveMQ, and others. Don't try to make a better JMS than JMS. Most JMS providers have persistence capabilities that provide fault tolerance if the queue or app server dies. Don't reinvent the wheel.
Reading between the lines a little, it sounds like you're not using a JMS provider such as MQ. Your solution sounds mostly fine; however, I would question your reasons for not using JMS.
You mention something about trading; I can confirm that a lot of trading systems use JMS, with and without J2EE. If you really want high performance, reliability and peace of mind, don't reinvent the wheel by writing your own queuing system: take a look at some of the JMS providers and their client APIs.
karl
Event loop
How about using an event loop/message pump instead? I actually learned this technique from watching the excellent node.js video presentation by Ryan Dahl, which I think you should really watch if you haven't already.
You push at most 10 messages from thread A to thread B (blocking if full). Thread A has an unbounded LinkedBlockingQueue; thread B has a bounded ArrayBlockingQueue of size 10 (new ArrayBlockingQueue(10)). Both thread A and thread B have an endless "while loop"; thread B processes messages as they become available from the ArrayBlockingQueue. This way you will only have 2 endless "while loops". As a side note, it might even be better to use 2 ArrayBlockingQueues, given the following sentence from the specification:
Linked queues typically have higher throughput than array-based queues but less predictable performance in most concurrent applications.
Of course, the array-backed queue has the disadvantage that it uses more memory, because you have to set its size up front (too small is bad, as it will block when full; too big could also be a problem if memory is low).
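Here is a hedged sketch of that pump with java.util.concurrent (receive() and handle() stand in for your actual message source and handler):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagePump {
    public static void main(String[] args) {
        BlockingQueue<String> handoff = new ArrayBlockingQueue<>(10);

        Thread a = new Thread(() -> {                  // receiver thread
            try {
                while (true) handoff.put(receive());   // blocks while the queue is full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread b = new Thread(() -> {                  // processing thread
            try {
                while (true) handle(handoff.take());   // blocks while the queue is empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        a.start();
        b.start();
    }

    static String receive() { return "msg"; }          // placeholder message source
    static void handle(String msg) { /* process */ }   // placeholder handler
}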
Accepted solution:
In my opinion you should prefer my solution over the accepted solution. The reason is that, if at all possible, you should only use the java.util.concurrent package. Writing proper threaded code is hard; when you make a mistake, you end up with deadlocks, starvation, etc.
Redis:
Like others have already mentioned, you should use a JMS-style provider for this. My suggestion is something along those lines, but in my opinion simpler to use and install. First of all, I assume your server is running Linux. I would advise you to install Redis; Redis is really awesome/fast, and you could also use it as your datastore. It has blocking list operations which you can use. Redis persists your data to disk, and in a very efficient manner.
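As an illustration, a hedged Jedis-based sketch of such a blocking Redis list used as a queue (host and key name are placeholders):

import java.util.List;
import redis.clients.jedis.Jedis;

public class RedisQueueWorker {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost")) {
            while (true) {
                // BLPOP blocks until someone RPUSHes onto "trades"
                // (timeout 0 = wait forever); the result is [key, value]
                List<String> item = jedis.blpop(0, "trades");
                System.out.println("processing " + item.get(1));
            }
        }
    }
}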
Good luck!
While it is now showing its age, Practical .NET for Financial Markets demonstrates some of the universal concepts you should consider when developing a financial trading system. Although it is geared toward .NET, you should be able to translate the general concepts to Java.
The separation of listening for the message and its processing seems sensible to me. Having a scalable number of processing threads is also good; you can tune the number as you find out how much parallel processing works on your platform.
The bit I'm less happy about is the way the threads poll for message arrival - here you're doing busy work, and if you add sleeps to reduce that, then you don't react immediately to message arrival. The JMS APIs and MDBs take a more event-driven approach. I would take a look at how that's implemented in an open-source JMS so that you can see alternatives. [I also endorse the opinion that re-inventing JMS yourself is probably a bad idea.] The thing to bear in mind is that as your systems get more complex, you add more queues, and more polling busy-work has a greater impact.
The other concern that I have is that you will hit the limitations of using a single machine. First, you may allow greater scalability by allowing listeners to be on many machines. Second, you have a single point of failure. Clearly, solving this sort of thing is where the messaging vendors make their money. This is another reason why Buy rather than Build tends to be a win for complex middleware.
You need a very light, super-fast, scalable queuing system. Try the Hazelcast distributed queue!
It is a distributed implementation of java.util.concurrent.BlockingQueue. Check out the documentation for details.
Hazelcast is actually a little more than a distributed queue; it is a transactional, distributed implementation of queue, topic, map, multimap, lock and executor service for Java.
It is released under the Apache license.

Distributed event handling mechanism for Java

I'm looking for a reasonably fast event handling mechanism in Java to generate and handle events across different JVMs running on different hosts.
For event handling across multiple threads in a single JVM, I found some good candidates like Jetlang. But in my search for a distributed equivalent, I couldn't find anything lightweight enough to offer good performance.
Does anyone know of any implementations that fit the bill?
Edit:
Putting numbers on performance is a bit difficult. But, for example, if you implement a heartbeating mechanism using events and the heartbeat interval is 5 seconds, the heartbeat receiver should receive a sent heartbeat within, say, a second or two.
Generally, a lightweight implementation gives good performance. An event handling mechanism involving a web server, or any kind of centralized hub requiring powerful hardware (definitely not lightweight) to give good performance, is not what I'm looking for.
Hazelcast Topic is a distributed pub-sub messaging solution.
public class Sample implements MessageListener {

    public static void main(String[] args) {
        Sample sample = new Sample();
        Topic topic = Hazelcast.getTopic("default");
        topic.addMessageListener(sample);
        topic.publish("my-message-object");
    }

    public void onMessage(Object msg) {
        System.out.println("Message received = " + msg);
    }
}
Hazelcast also supports events on distributed queue, map, set, list. All events are ordered too.
Regards,
-talip
http://www.hazelcast.com
Depending on your use case, Terracotta may be an excellent choice.
AMQP (Advanced Message Queuing Protocol) is probably what you're looking for -- more details: http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol
It is used by financial-services companies for their high-performance requirements -- Apache has an implementation going -- http://cwiki.apache.org/qpid/
OpenAMQ - http://www.openamq.org/ - is an older reference implementation.
For distributed event processing you could use Esper. It can process up to 500,000 events/s on dual-CPU 2 GHz Intel hardware, and it's very stable; many banks use this solution. It supports JMS input and output adapters based on Spring JMS templates, so you can use any JMS implementation for event processing, e.g. ActiveMQ.
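For flavour, a hedged sketch against the classic (pre-8) Esper API, using the heartbeat scenario from the question (the event type and EPL statement are illustrative):

import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class HeartbeatWatch {
    public static class Heartbeat {
        private final String host;
        public Heartbeat(String host) { this.host = host; }
        public String getHost() { return host; }
    }

    public static void main(String[] args) {
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider();
        engine.getEPAdministrator().getConfiguration().addEventType(Heartbeat.class);
        // fire when a host goes quiet for 10 seconds (illustrative EPL pattern)
        EPStatement stmt = engine.getEPAdministrator().createEPL(
                "select h.host as host from pattern [every h=Heartbeat -> "
                + "(timer:interval(10 sec) and not Heartbeat(host=h.host))]");
        stmt.addListener((newEvents, oldEvents) ->
                System.out.println("missed heartbeat from " + newEvents[0].get("host")));
        engine.getEPRuntime().sendEvent(new Heartbeat("node-1"));
    }
}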
ZeroMQ - http://www.zeromq.org/
Although this is a transport layer, it can be tailored for event handling.
Whichever tool you use, I'd recommend hiding the middleware APIs from your application logic. For example, if you used the Apache Camel approach to hiding middleware, you could then easily switch from AMQP to SEDA to JMS to ActiveMQ to JavaSpaces to your own custom MINA transport, based on your exact requirements.
If you want to use a message broker, I'd recommend Apache ActiveMQ, which is the most popular and powerful open source message broker, with the largest and most active community behind it, both inside Apache and outside it.
Take a look at Akka (http://akka.io/). It offers a distributed actor model in the same vein as Erlang for the JVM, with both Java and Scala APIs.
You need to implement the Observer design pattern for distributed event handling in Java. I am using event streaming with a MongoDB capped collection and observers to achieve this.
You can build an architecture in which your trigger publishes a document into the capped collection and your observer thread waits for it using a tailable cursor.
If you did not understand what I said above, you need to brush up on your MongoDB and Java skills.
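A hedged sketch with the (modern) MongoDB Java driver - the URI, database and collection names are placeholders, and the capped collection must already exist:

import com.mongodb.CursorType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class CappedCollectionObserver {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost")) {
            MongoCollection<Document> events =
                    client.getDatabase("app").getCollection("events");  // capped collection
            // a tailable-await cursor blocks server-side until new documents arrive
            try (MongoCursor<Document> cursor =
                         events.find().cursorType(CursorType.TailableAwait).iterator()) {
                while (cursor.hasNext()) {
                    System.out.println("event: " + cursor.next().toJson());  // handle event
                }
            }
        }
    }
}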
If a JMS implementation isn't for you, then you may be interested in an XMPP approach. There are multiple implementations, and there is also a Publish-Subscribe extension.
The Avis event router might be suitable for your needs. It's fast enough for near-real-time event delivery, such as sending mouse events for remote mouse control (an application we use it for daily).
Avis is also being used for chat, virtual presence, and smart room automation where typically 10-20 computers are communicating over an Avis-based messaging bus. Its commercial cousin (Mantara Elvin) is used for high-volume commercial trade event processing.
