Communication of separate processes through Esper events - java

I am trying to let multiple Java processes exchange events using Esper. One process should send events; the other prepares a query and reacts according to the reported events.
When both operations are done within the same Java process, everything works fine. But when I use two different processes, they just don't see each other.
I am wondering what the key to this communication is. I used the same name for the provider; that is all I could think of so far.
The Producer:
String aType = espertest.dummy.A.class.getName();
Configuration cepConfig = new Configuration();
cepConfig.addEventType("A",aType);
EPServiceProvider epService = EPServiceProviderManager.getProvider("DummyProvider", cepConfig);
Object o = new A();
epService.getEPRuntime().sendEvent(o);
The Consumer:
String aType = A.class.getName();
String expression = "select count(*) from "+aType + "";
System.out.println("Our Query: " + expression);
Configuration cepConfig = new Configuration();
cepConfig.addEventType("A",aType);
EPServiceProvider epService = EPServiceProviderManager.getProvider("DummyProvider", cepConfig);
EPStatement statement = epService.getEPAdministrator().createEPL(expression);
DummyListener listener = new DummyListener();
statement.addListener(listener);
System.out.println("Anything");
try{
A a = new A();
epService.getEPRuntime().sendEvent(a);
Thread.sleep(60000);
} catch (Exception e) {
System.out.println("Exception ");
}
The consumer tries to count the events of type A. It also sends an instance of A as a test, and this works fine. The listener is called as expected.
The code above is just an excerpt.

You need to configure middleware (a message queue, distributed cache, networked file system, socket connection, etc.) to get the events from the producer JVM to the consumer JVM. If you can deploy the producer and consumer to a container that supports Apache Camel (e.g. ServiceMix), it should be trivial to stand up a prototype that uses ActiveMQ to transport your objects into Esper, since Camel has support for both products. A rough JMS sketch follows the flow outline below.
JVM 1
  From Data Source
  To CEP Engine 1
  To Message Queue
JVM 2 (could also host the MQ broker)
  From Message Queue
  To CEP Engine 2
  To Destination
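To make the message-queue idea concrete, here is a minimal, untested sketch bridging the two JVMs with plain JMS and ActiveMQ instead of Camel. The broker URL, the queue name "esper.events" and the use of ObjectMessage are assumptions for illustration; A would need to implement Serializable, and epService is the engine you already configure in the consumer.
// Producer JVM (uses javax.jms.* and org.apache.activemq.ActiveMQConnectionFactory):
// publish the event to a queue instead of the local Esper runtime.
ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
Connection connection = factory.createConnection();
connection.start();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageProducer producer = session.createProducer(session.createQueue("esper.events"));
producer.send(session.createObjectMessage(new A())); // A must be Serializable

// Consumer JVM: same connection/session setup against the same broker, then hand
// every received object to the local Esper engine that holds the statement and listener.
MessageConsumer consumer = session.createConsumer(session.createQueue("esper.events"));
consumer.setMessageListener(message -> {
    try {
        epService.getEPRuntime().sendEvent(((ObjectMessage) message).getObject());
    } catch (JMSException e) {
        e.printStackTrace();
    }
});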
Update:
If the producer and consumer can be threads in the same JVM, then the issue may be in the consumer. I cannot see where the consumer does anything with the event from the producer. Try something like this instead (the Esper service reference is passed to the producer and consumer, and the consumer is reworked with an update method to handle the results of the select statement).
Test Driver:
public Driver() {
String aType = espertest.dummy.A.class.getName();
Configuration cepConfig = new Configuration();
cepConfig.addEventType("A",aType);
EPServiceProvider epService = EPServiceProviderManager.getProvider("DummyProvider", cepConfig);
Consumer c = new Consumer(epService);
Producer p = new Producer(epService);
}
Producer:
public Producer(EPServiceProvider epsp) {
Object o = new A();
epsp.getEPRuntime().sendEvent(o);
}
Consumer:
public Consumer(EPServiceProvider epsp) {
String expression = "select * from A"; // the EPL statement to subscribe to
EPStatement statement = epsp.getEPAdministrator().createEPL(expression);
statement.setSubscriber(this);
}
public void update(A event) {
System.out.println("Consumer received event!");
}

Related

Redis Streams one message per consumer with Java

I'm trying to implement a Java application with Redis Streams where every consumer consumes exactly one message: like a pipeline/queue where every consumer takes exactly one message, processes it, and after finishing takes the next message in the stream that has not been processed yet.
What works is that every message is consumed by exactly one consumer (with xreadgroup).
I started with this tutorial from Redis Labs.
The code:
RedisClient redisClient = RedisClient.create("redis://pw@host:port");
StatefulRedisConnection<String, String> connection = redisClient.connect();
RedisCommands<String, String> syncCommands = connection.sync();
try {
syncCommands.xgroupCreate(XReadArgs.StreamOffset.from(STREAM_KEY, "0-0"), ID_READ_GROUP);
} catch (RedisBusyException redisBusyException) {
System.out.println(String.format("\t Group '%s' already exists", ID_READ_GROUP));
}
System.out.println("Waiting for new messages ");
while (true) {
List<StreamMessage<String, String>> messages = syncCommands.xreadgroup(
Consumer.from(ID_READ_GROUP, ID_WORKER), XReadArgs.StreamOffset.lastConsumed(STREAM_KEY));
if (!messages.isEmpty()) {
System.out.println(messages.size()); // number of messages fetched in this call
for (StreamMessage<String, String> message : messages) {
System.out.println(message.getId());
Thread.sleep(5000);
syncCommands.xack(STREAM_KEY, ID_READ_GROUP, message.getId());
}
}
}
My current problem is that a consumer takes more than one message from the queue, and in some situations the other consumers are waiting while one consumer is processing 10 messages at once.
Thanks in advance!
Notice that XREADGROUP accepts a COUNT argument.
See the Lettuce JavaDoc for xreadgroup on how to do this: you pass the count via XReadArgs.
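A minimal sketch of the changed read call, reusing the STREAM_KEY, ID_READ_GROUP and ID_WORKER constants from your code:
// Ask Redis for at most one entry per XREADGROUP call, so each consumer picks up
// exactly one message, processes it, acks it, and only then reads again.
List<StreamMessage<String, String>> messages = syncCommands.xreadgroup(
        Consumer.from(ID_READ_GROUP, ID_WORKER),
        XReadArgs.Builder.count(1),
        XReadArgs.StreamOffset.lastConsumed(STREAM_KEY));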

Flink Kafka - how to make App run in Parallel?

I am creating an app in Flink to
Read messages from a topic
Do some simple processing on them
Write the result to a different topic
My code does work; however, it does not run in parallel.
How do I do that?
It seems my code runs on only one thread/block.
On the Flink Web Dashboard:
The app goes to running status
But there is only one block shown in the overview subtasks
And Bytes Received/Sent and Records Received/Sent are always zero (no update)
Here is my code. Please help me learn how to split my app so it can run in parallel, and tell me whether I am writing it correctly.
public class SimpleApp {
public static void main(String[] args) throws Exception {
// create execution environment INPUT
StreamExecutionEnvironment env_in =
StreamExecutionEnvironment.getExecutionEnvironment();
// event time characteristic
env_in.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// production Ready (Does NOT Work if greater than 1)
env_in.setParallelism(Integer.parseInt(args[0].toString()));
// configure kafka consumer
Properties properties = new Properties();
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("auto.offset.reset", "earliest");
// create a kafka consumer
final DataStream<String> consumer = env_in
.addSource(new FlinkKafkaConsumer09<>("test", new
SimpleStringSchema(), properties));
// filter data
SingleOutputStreamOperator<String> result = consumer.filter(new
FilterFunction<String>(){
@Override
public boolean filter(String s) throws Exception {
return s.substring(0, 2).contentEquals("PS");
}
});
// Process Data
// Transform String Records to JSON Objects
SingleOutputStreamOperator<JSONObject> data = result.map(new
MapFunction<String, JSONObject>()
{
@Override
public JSONObject map(String value) throws Exception
{
JSONObject jsnobj = new JSONObject();
if(value.substring(0, 2).contentEquals("PS"))
{
// 1. Raw Data
jsnobj.put("Raw_Data", value.substring(0, value.length()-6));
// 2. Comment
int first_index_comment = value.indexOf("$");
int last_index_comment = value.lastIndexOf("$") + 1;
// - set comment
String comment =
value.substring(first_index_comment, last_index_comment);
comment = comment.substring(0, comment.length()-6);
jsnobj.put("Comment", comment);
}
else {
jsnobj.put("INVALID", value);
}
return jsnobj;
}
});
// Write JSON to Kafka Topic
data.addSink(new FlinkKafkaProducer09<JSONObject>("localhost:9092",
"FilteredData",
new SimpleJsonSchema()));
env_in.execute();
}
}
My code does work, but it seems to run on only a single thread (one block shown in the web interface; no data passing through, hence the bytes sent/received are never updated).
How do I make it run in parallel?
To run your job in parallel you can do two things:
Increase the parallelism of your job at the env level, i.e. do something like
StreamExecutionEnvironment env_in =
StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(4);
But this would only increase parallelism on the Flink side after the data is read, so if the source is producing data faster it might not be fully utilized.
To fully parallelize your job, set up multiple partitions for your Kafka topic, ideally matching the parallelism you want for your Flink job. So you might want to do something like the following when creating your Kafka topic (and see the source-parallelism sketch after the command):
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 3 --partitions 4 --topic test
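With four partitions in place, you can also give the Kafka source an explicit parallelism so each subtask reads its own partition. A minimal sketch reusing the names from your code (the value 4 is just an example matching the partition count):
// Each of the 4 source subtasks then consumes from its own Kafka partition.
DataStream<String> consumer = env_in
        .addSource(new FlinkKafkaConsumer09<>("test", new SimpleStringSchema(), properties))
        .setParallelism(4);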

Vert.x performance drop when starting with -cluster option

I'm wondering if anyone has experienced the same problem.
We have a Vert.x application whose ultimate purpose is to insert 600 million rows into a Cassandra cluster. We are testing the speed of Vert.x in combination with Cassandra by running tests with smaller amounts.
If we run the fat jar (built with the Shade plugin) without the -cluster option, we are able to insert 10 million records in about a minute. When we add the -cluster option (eventually we will run the Vert.x application in a cluster), it takes about 5 minutes for 10 million records to be inserted.
Does anyone know why?
We know that the Hazelcast config will create some overhead, but we never thought it would be 5 times slower. This implies we would need 5 EC2 instances in a cluster to get the same result as 1 EC2 instance without the cluster option.
As mentioned, everything runs on EC2 instances:
2 Cassandra servers on t2.small
1 Vert.x server on t2.2xlarge
You are actually running into corner cases of the Vert.x Hazelcast cluster manager.
First of all, you are using a worker verticle to send your messages (30000001). Under the hood Hazelcast is blocking, and when you send a message from a worker, version 3.3.3 does not take that into account. We recently added a fix, https://github.com/vert-x3/issues/issues/75 (not present in 3.4.0.Beta1 but present in 3.4.0-SNAPSHOT), that will improve this case.
Second, when you send all your messages at the same time, you run into another corner case that prevents the Hazelcast cluster manager from using its cache of the cluster topology. This topology cache is usually updated after the first message has been sent, and sending all the messages in one shot prevents the cache from being used (short explanation: HazelcastAsyncMultiMap#getInProgressCount will be > 0 and prevents the cache from being used), so you pay the penalty of an expensive lookup (hence the cache).
If I use Bertjan's reproducer with 3.4.0-SNAPSHOT + Hazelcast and the following change (send one message to the destination, wait for the reply, and only then send all the messages), I get a lot of improvement.
Without clustering : 5852 ms
With clustering with HZ 3.3.3: 16745 ms
With clustering with HZ 3.4.0-SNAPSHOT + initial message : 8609 ms
I also believe you should not use a worker verticle to send that many messages; instead, send them from an event-loop verticle in batches. Perhaps you could explain your use case and we can think about the best way to solve it.
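A minimal sketch of what I mean by batching from an event-loop verticle; the batch size and the 1 ms period are arbitrary choices, and TestCluster1 is the class from your reproducer:
import io.vertx.core.AbstractVerticle;
import io.vertx.core.json.Json;
import java.time.LocalDateTime;

public class BatchingProviderVerticle extends AbstractVerticle {
    private static final int TOTAL = 30_000_000;
    private static final int BATCH = 10_000;
    private int next = 1;

    @Override
    public void start() {
        // Send a bounded batch per timer tick instead of 30M sends from one parallel stream,
        // so the event loop never blocks and the topology cache can be used after the first sends.
        vertx.setPeriodic(1, timerId -> {
            int end = Math.min(next + BATCH, TOTAL + 1);
            for (; next < end; next++) {
                vertx.eventBus().send("clustertest1",
                        Json.encode(new TestCluster1(next, "abc", LocalDateTime.now())));
            }
            if (next > TOTAL) {
                vertx.cancelTimer(timerId);
            }
        });
    }
}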
When you enable clustering (of any kind) in an application, you make it more resilient to failures, but you also add a performance penalty.
For example your current flow (without clustering) is something like:
client ->
vert.x app ->
in memory same process eventbus (negligible) ->
handler -> cassandra
<- vert.x app
<- client
Once you enable clustering:
client ->
vert.x app ->
serialize request ->
network request cluster member ->
deserialize request ->
handler -> cassandra
<- serialize response
<- network reply
<- deserialize response
<- vert.x app
<- client
As you can see, there are many encode/decode operations required plus several network calls, and this all gets added to your total request time.
In order to achieve the best performance you need to take advantage of locality: the closer you are to your data store, the faster it usually is.
Just to add the code of the project. I guess that would help.
Sender verticle:
public class ProviderVerticle extends AbstractVerticle {
@Override
public void start() throws Exception {
IntStream.range(1, 30000001).parallel().forEach(i -> {
vertx.eventBus().send("clustertest1", Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
});
}
@Override
public void stop() throws Exception {
super.stop();
}
}
And the inserter verticle
public class ReceiverVerticle extends AbstractVerticle {
private int messagesReceived = 1;
private Session cassandraSession;
@Override
public void start() throws Exception {
PoolingOptions poolingOptions = new PoolingOptions()
.setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 3)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 1)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 3)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 20)
.setMaxQueueSize(32768)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 20);
Cluster cluster = Cluster.builder()
.withPoolingOptions(poolingOptions)
.addContactPoints(ClusterSetup.SEEDS)
.build();
System.out.println("Connecting session");
cassandraSession = cluster.connect("kiespees");
System.out.println("Session connected:\n\tcluster [" + cassandraSession.getCluster().getClusterName() + "]");
System.out.println("Connected hosts: ");
cassandraSession.getState().getConnectedHosts().forEach(host -> System.out.println(host.getAddress()));
PreparedStatement prepared = cassandraSession.prepare(
"insert into clustertest1 (id, value, created) " +
"values (:id, :value, :created)");
PreparedStatement preparedTimer = cassandraSession.prepare(
"insert into timer (name, created_on, amount) " +
"values (:name, :createdOn, :amount)");
BoundStatement timerStart = preparedTimer.bind()
.setString("name", "clusterteststart")
.setInt("amount", 0)
.setTimestamp("createdOn", new Timestamp(new Date().getTime()));
cassandraSession.executeAsync(timerStart);
EventBus bus = vertx.eventBus();
System.out.println("Bus info: " + bus.toString());
MessageConsumer<String> cons = bus.consumer("clustertest1");
System.out.println("Consumer info: " + cons.address());
System.out.println("Waiting for messages");
cons.handler(message -> {
TestCluster1 tc = Json.decodeValue(message.body(), TestCluster1.class);
if (messagesReceived % 100000 == 0)
System.out.println("Message received: " + messagesReceived);
BoundStatement boundRecord = prepared.bind()
.setInt("id", tc.getId())
.setString("value", tc.getValue())
.setTimestamp("created", new Timestamp(new Date().getTime()));
cassandraSession.executeAsync(boundRecord);
if (messagesReceived % 100000 == 0) {
BoundStatement timerStop = preparedTimer.bind()
.setString("name", "clusterteststop")
.setInt("amount", messagesReceived)
.setTimestamp("createdOn", new Timestamp(new Date().getTime()));
cassandraSession.executeAsync(timerStop);
}
messagesReceived++;
//message.reply("OK");
});
}
@Override
public void stop() throws Exception {
super.stop();
cassandraSession.close();
}
}

How to check whether Kafka Server is running?

I want to check whether the Kafka server is running before starting production and consumption jobs. It is in a Windows environment, and here's my Kafka server code in Eclipse...
Properties properties = new Properties();
properties.setProperty("broker.id", "1");
properties.setProperty("port", "9092");
properties.setProperty("log.dirs", "D://workspace//");
properties.setProperty("zookeeper.connect", "localhost:2181");
Option<String> option = Option.empty();
KafkaConfig config = new KafkaConfig(properties);
KafkaServer kafka = new KafkaServer(config, new CurrentTime(), option);
kafka.startup();
In this case, if (kafka != null) is not enough, because it is always true. So is there any way to know that my Kafka server is running and ready for a producer? I need to check this because otherwise some of the initial data packets are lost.
All Kafka brokers must be assigned a broker.id. On startup a broker will create an ephemeral node in ZooKeeper with a path of /brokers/ids/$id. As the node is ephemeral, it will be removed as soon as the broker disconnects, e.g. by shutting down.
You can view the list of the ephemeral broker nodes like so:
echo dump | nc localhost 2181 | grep brokers
The ZooKeeper client interface exposes a number of commands; dump lists all the sessions and ephemeral nodes for the cluster.
Note, the above assumes:
You're running ZooKeeper on the default port (2181) on localhost, and that localhost is the leader for the cluster
Your zookeeper.connect Kafka config doesn't specify a chroot env for your Kafka cluster i.e. it's just host:port and not host:port/path
You can install the kafkacat tool on your machine.
For example, on Ubuntu you can install it using
apt-get install kafkacat
Once kafkacat is installed, you can use the following command to connect:
kafkacat -b <your-ip-address>:<kafka-port> -t test-topic
Replace <your-ip-address> with your machine's IP address.
<kafka-port> can be replaced by the port on which Kafka is running; normally it is 9092.
If kafkacat is able to make the connection when you run the command above, it means Kafka is up and running.
I used the AdminClient API.
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("connections.max.idle.ms", 10000);
properties.put("request.timeout.ms", 5000);
try (AdminClient client = KafkaAdminClient.create(properties))
{
ListTopicsResult topics = client.listTopics();
Set<String> names = topics.names().get();
if (names.isEmpty())
{
// case: if no topic found.
}
return true;
}
catch (InterruptedException | ExecutionException e)
{
// Kafka is not available
return false;
}
For Linux, run "ps aux | grep kafka" and see if the Kafka properties are shown in the results, e.g. /path/to/kafka/server.properties.
Paul's answer is very good, and it is actually how Kafka and ZooKeeper work together from a broker point of view.
I would say that another easy option to check whether a Kafka server is running is to create a simple KafkaConsumer pointing at the cluster and try some action, for example listTopics(). If the Kafka server is not running, you will get a TimeoutException, which you can handle with a try-catch block. A Java sketch of the same idea follows the Scala function below.
def validateKafkaConnection(kafkaParams : mutable.Map[String, Object]) : Unit = {
val props = new Properties()
props.put("bootstrap.servers", kafkaParams.get("bootstrap.servers").get.toString)
props.put("group.id", kafkaParams.get("group.id").get.toString)
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val simpleConsumer = new KafkaConsumer[String, String](props)
simpleConsumer.listTopics()
}
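If you prefer Java, a roughly equivalent sketch; the timeout, the group id, and the listTopics(Duration) overload (available in newer kafka-clients versions) are assumptions:
// Build a throwaway KafkaConsumer (java.time.Duration import assumed) and treat a
// TimeoutException from listTopics() as "broker not reachable".
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "connection-check");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
try (KafkaConsumer<String, String> probe = new KafkaConsumer<>(props)) {
    probe.listTopics(Duration.ofSeconds(5)); // throws TimeoutException if no broker answers
    System.out.println("Kafka is reachable");
} catch (org.apache.kafka.common.errors.TimeoutException e) {
    System.out.println("Kafka is not reachable");
}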
A good option is to use AdminClient, as below, before starting to produce or consume messages:
private static final int ADMIN_CLIENT_TIMEOUT_MS = 5000;
try (AdminClient client = AdminClient.create(properties)) {
client.listTopics(new ListTopicsOptions().timeoutMs(ADMIN_CLIENT_TIMEOUT_MS)).listings().get();
} catch (ExecutionException ex) {
LOG.error("Kafka is not available, timed out after {} ms", ADMIN_CLIENT_TIMEOUT_MS);
return;
}
Firstly you need to create AdminClient bean:
@Bean
public AdminClient adminClient(){
Map<String, Object> configs = new HashMap<>();
configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
StringUtils.arrayToCommaDelimitedString(new Object[]{"your bootstrap server address"}));
return AdminClient.create(configs);
}
Then, you can use this script:
while (true) {
Map<String, ConsumerGroupDescription> groupDescriptionMap =
adminClient.describeConsumerGroups(Collections.singletonList(groupId))
.all()
.get(10, TimeUnit.SECONDS);
ConsumerGroupDescription consumerGroupDescription = groupDescriptionMap.get(groupId);
log.debug("Kafka consumer group ({}) state: {}",
groupId,
consumerGroupDescription.state());
if (consumerGroupDescription.state().equals(ConsumerGroupState.STABLE)) {
boolean isReady = true;
for (MemberDescription member : consumerGroupDescription.members()) {
if (member.assignment() == null || member.assignment().topicPartitions().isEmpty()) {
isReady = false;
}
}
if (isReady) {
break;
}
}
log.debug("Kafka consumer group ({}) is not ready. Waiting...", groupId);
TimeUnit.SECONDS.sleep(1);
}
This script checks the state of the consumer group every second until the state is STABLE. Because all consumers are then assigned to topic partitions, you can conclude that the server is running and ready.
You can use the code below to check whether any brokers are available, i.e. whether the server is running.
import org.I0Itec.zkclient.ZkClient;
public static boolean isBrokerRunning(){
boolean flag = false;
ZkClient zkClient = new ZkClient(endpoint.getZookeeperConnect(), 10000);//, kafka.utils.ZKStringSerializer$.MODULE$);
if(zkClient!=null){
int brokersCount = zkClient.countChildren(ZkUtils.BrokerIdsPath());
if(brokersCount > 0){
logger.info("Following Broker(s) {} is/are available on Zookeeper.",zkClient.getChildren(ZkUtils.BrokerIdsPath()));
flag = true;
}
else{
logger.error("ERROR:No Broker is available on Zookeeper.");
}
zkClient.close();
}
return flag;
}
I found an OnError event in the Confluent Kafka (.NET) client:
consumer.OnError += Consumer_OnError;
private void Consumer_OnError(object sender, Error e)
{
Debug.Log("connection error: "+ e.Reason);
ConsumerConnectionError(e);
}
And its documentation in code:
//
// Summary:
// Raised on critical errors, e.g. connection failures or all brokers down. Note
// that the client will try to automatically recover from errors - these errors
// should be seen as informational rather than catastrophic
//
// Remarks:
// Executes on the same thread as every other Consumer event handler (except OnLog
// which may be called from an arbitrary thread).
public event EventHandler<Error> OnError;

Apache ActiveMQ fall back mechanism

I am using Apache ActiveMQ with Spring. The problem I am facing is that I create a producer on one machine, let's say Machine1, and one consumer on a second machine, let's say Machine2.
I create the producer on Machine1 through a simple servlet call, and then create a consumer on Machine2.
The problem is: if my producer is unable to send any data packet within a specified time, I want to delete my consumer and queue from Machine2.
Is there any way I can have my consumer and queue auto-deleted, and perform some business logic, if I do not get any packet from the producer within a specified time?
connectionFactory = new ActiveMQConnectionFactory(ActiveMQConnection.DEFAULT_USER,ActiveMQConnection.DEFAULT_PASSWORD,ConnectorURL);
connection = connectionFactory.createConnection();
connection.start();
session = connection.createSession(transacted, Session.AUTO_ACKNOWLEDGE);
destination = session.createQueue(queueID+"");
connection = connectionFactory.createConnection();
connection.start();
consumer = session.createConsumer(destination);
Basically this code creates the consumer for my application. Then I assign this consumer to my application listener, which listens for any message the producer sends to the consumer.
ScenarioExecutionQueueListenerImpl executionQueueListener = new ScenarioExecutionQueueListenerImpl(scenario,result, host);
beanFactory.autowireBean(executionQueueListener);
connection.setExceptionListener(executionQueueListener);
Message message = consumer.receive();
consumer.setMessageListener(executionQueueListener);
executionQueueListener.setConsumer(consumer);
executionQueueListener.onMessage(message);
I would not set up a MessageListener for that case, but just use the consumer.receive() method. The MessageListener is more for time-independent/asynchronous consuming.
public void run() {
    Message m = consumer.receive(timeout_value_in_millisec);
    if (m != null) {
        // got a message, handle it
        processMessage(m);
    } else {
        // no message received in the specified time
    }
    // close session, connection etc.
}
public void processMessage(Message msg) {
}
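Tying that back to the question: once receive() times out you can run your fallback logic and tear everything down. A rough sketch using the connection, session, consumer and destination variables from your own code; note that destroyDestination is specific to ActiveMQConnection and, as far as I know, only works once no consumers are attached to the queue, so treat that part as something to verify:
Message m = consumer.receive(30000); // wait at most 30 seconds for the producer
if (m == null) {
    // nothing arrived in time: run your fallback business logic, then tear down
    consumer.close();
    session.close();
    // optional, ActiveMQ-specific: remove the now-unused queue from the broker
    ((ActiveMQConnection) connection).destroyDestination((ActiveMQDestination) destination);
    connection.close();
}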
