Kafka with Domain Events - java

In my event-driven project I have messages of two kinds: Commands, and the Events produced in response.
These Command and Event messages express the domain, so they hold complex types from the domain.
Example:
RegisterClientCommand(Name, Email)
ClientRegisteredEvent(ClientId)
There are tens more of these command and event pairs in the domain.
I was thinking of something like:
RawMessage(payloadMap, sequenceId, createdOn)
The payload would hold the domain class name of the message and the message fields.
I was also reading about the Avro format, but defining a schema for every message seems like a lot of work.
What's the best practice in terms of the message format that's actually transmitted through the Kafka brokers?

There's no single "best" way to do it; it all depends on the expertise in your team/organization and the specific requirements of your project.
Kafka itself is indifferent to what messages actually contain. Most of the time, it just sees message values and keys as opaque byte arrays.
Whatever you end up defining your RawMessage as on the Java side, it will have to be serialized to a byte array to be produced into Kafka, because that's what KafkaProducer requires. Maybe it's a custom string serializer you already have, maybe you can serialize a POJO to JSON using Jackson or something similar, or maybe you simply send a huge comma-delimited string as the message. It's completely up to you.
What's important is that the consumer, when it pulls the message from the Kafka topic, is able to correctly and reliably read the data from each field in the message, without any errors, version conflicts, etc. Most serde/schema mechanisms that exist, like Avro, Protobuf or Thrift, try to make this job easier for you, especially for complex things like making sure new messages are backwards-compatible with previous versions of the same message.
Most people end up with some combination of:
Serde mechanisms for creating the byte arrays to produce into Kafka, some popular ones are Avro, Protobuf, Thrift.
Raw JSON strings
A huge string with some kind of internal/custom format that can be parsed/analyzed.
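As a minimal sketch of that last option (all names here are made up, not a standard API), a RawMessage-style envelope could be flattened to a delimited string and back using only the standard library:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical envelope codec: the type name plus a pipe-delimited field list.
public class EnvelopeCodec {

    // Producer side: "ClientRegisteredEvent|clientId=42" -> bytes for Kafka.
    public static byte[] encode(String type, String... fields) {
        return (type + "|" + String.join("|", fields))
                .getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: split the bytes back into type and fields.
    public static String[] decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).split("\\|");
    }
}
```

Note that this breaks as soon as a field value contains the delimiter, which is exactly the kind of edge case serde frameworks like Avro and Protobuf handle for you.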
Some companies use a centralized schema service. This is so your data consumers don't have to know ahead of time what schema the message contains; they just pull down the message and request the corresponding schema from the service. Confluent has its own schema registry solution that has supported Avro for years and, as of a few weeks ago, officially supports Protobuf as well. This is not required, and if you own the producer/consumer end-to-end you might decide to handle the serialization yourself, but a lot of people are used to it.
Depending on the message type, sometimes you want compression: the messages could be very repetitive and/or large, so you'd end up saving quite a bit of storage and bandwidth by sending compressed messages, at the cost of some CPU usage and latency. This could also be handled by yourself on the producer/consumer side, compressing the byte arrays after they've been serialized, or you can request message compression directly on the producer side (look for compression.type in the Kafka docs).
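If you do handle compression yourself on the producer/consumer side, the standard library's java.util.zip is enough. A minimal sketch (class and method names are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Gzip {

    // Compress an already-serialized message payload before producing it.
    public static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams don't really throw
        }
        return buf.toByteArray();
    }

    // Decompress on the consumer side after polling the record.
    public static byte[] decompress(byte[] zipped) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(zipped))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Repetitive payloads shrink dramatically; very small ones can actually grow, because of the fixed gzip header and trailer overhead.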

Related

Object to bytes array in Java

I'm working on a proprietary TCP protocol. This protocol sends and receives messages with a specific sequence of bytes.
I have to be compliant with this protocol, and I can't change it.
So my input/output looks something like this:
\x01\x08\x00\x01\x00\x00\x01\xFF
\x01 - Message type
\x01 - Message type
\x00\x01 - Length
\x00\x00\x01 - Transaction
\xFF - Body
The sequence of fields is important, and I want only the values of the fields in my serialization, nothing about the structure of the class.
I'm working on a Java controller that uses this protocol, and I thought I could define the message structures in specific classes and serialize/deserialize them, but I was naive.
First of all I tried ObjectOutputStream, but it outputs the entire structure of the object, when I need only the values in a specific order.
Someone already faced this problem:
Java - Object to Fixed Byte Array
and solved it with a dedicated Marshaller.
But I was searching for a more flexible solution.
For text serialization and deserialization I've found:
http://jeyben.github.io/fixedformat4j/
which defines the schema of the line with annotations. But it outputs a String, not a byte[]. So 1 is output as the string "1", which is represented differently depending on the encoding, and often with more bytes.
What I was searching for is something that, given the order of my class properties, will convert each property into bytes (based on its internal representation) and append them to a byte[].
Do you know some library used for that purpose?
Or a simple way to do that, without coding a serialization algorithm for each of my entities?
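To illustrate, a hand-rolled version for a layout like the one above could be done with java.nio.ByteBuffer (the field widths below just mirror my example, and the names are my own labels), but I'd have to write one of these per message type, which is what I'm trying to avoid:

```java
import java.nio.ByteBuffer;

public class FixedMarshaller {

    // Packs fields in a fixed order: 1-byte type, 2-byte length,
    // 3-byte transaction id, then the body bytes.
    public static byte[] pack(byte type, short length, int transaction, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 3 + body.length);
        buf.put(type);
        buf.putShort(length);                 // big-endian by default
        buf.put((byte) (transaction >>> 16)); // 3-byte field, highest byte first
        buf.put((byte) (transaction >>> 8));
        buf.put((byte) transaction);
        buf.put(body);
        return buf.array();
    }
}
```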
Serialization just isn't easy; it sounds from your question like you feel you can just invoke something and out rolls compact, simple, versionable, universal data you can then put on the wire. What you need to fix is to scratch the word 'just' from that sentence. You're going to have to invest some time and care.
As you figured out already, Java's baked-in serialization has a ton of downsides. Don't use it.
There are various serializers. Popular ones include GSON and Jackson, which let you serialize Java objects into JSON. This isn't particularly efficient, and it's string-based. These sound like crucial downsides, but they really aren't; see below.
You can also spend a little more time specifying the exact format and use protobuf, which lets you write a quite lean and simple data protocol (and protobuf is available for many languages, if you eventually want to write a participant in this protocol in a non-Java language).
So, those are the good options: Go to JSON via Jackson or GSON, or, use protobuf.
But JSON is a string.
You can turn a string into bytes trivially using str.getBytes(StandardCharsets.UTF_8). This cannot fail due to charset encoding differences (as long as you also 'decode' in the same fashion: turn the bytes back into a string with new String(theBytes, StandardCharsets.UTF_8)). UTF-8 is guaranteed to be available on all JVMs; if it is not there, your JVM is as broken as a JVM that is missing the String class - not something to worry about.
But JSON is inefficient.
Zip it up, of course. You can trivially wrap an InputStream or an OutputStream so that gzip compression is applied, which is simple, available on just about every platform, and fast (it's not the most cutting-edge compression algorithm, but squeezing out the last few bytes is usually not worth it) - and zipped-up JSON can often be more efficient than carefully hand-rolled protobuf, even.
The one downside is that it's 'slow', but on modern hardware, note that the overhead of encrypting and decrypting this data (which you should obviously be doing!) is usually multiple orders of magnitude more involved. A modern CPU is simply very, very fast - creating JSON and zipping it up is going to take 1% of CPU or less, even if you are shipping the collected works of Shakespeare every second.
If an Arduino running on batteries needs to process this data, go with uncompressed, unencrypted protobuf-based data. If you are Facebook writing the WhatsApp protocol, the IaaS cost saved by not having to unzip and decode JSON pales in comparison to what you spend just running the servers, but at that scale it's worth the development effort.
In just about every other case, just toss gzipped JSON on the line.
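As a sketch of that recommendation (UTF-8 bytes plus gzip-wrapped streams; the JSON string here is hand-written in place of a real Jackson/GSON call, and the class name is made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class JsonOverWire {

    // JSON string -> gzipped UTF-8 bytes, ready for the wire.
    public static byte[] toWire(String json) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Gzipped bytes -> JSON string, decoded with the same charset it was encoded with.
    public static String fromWire(byte[] wire) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(wire))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```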

Is it possible to send a byte array in a JSON over the socket.io Java client

This is actually two questions in one, but they are closely related.
I have read plenty of times that you have to send either JSON/string or binary data over a websocket like socket.io, but that you cannot mix these types. But then I was puzzled to find the following example in the official documentation of the socket.io Java client implementation:
// Sending an object
JSONObject obj = new JSONObject();
obj.put("hello", "server");
obj.put("binary", new byte[42]);
socket.emit("foo", obj);

// Receiving an object
socket.on("foo", new Emitter.Listener() {
    @Override
    public void call(Object... args) {
        JSONObject obj = (JSONObject) args[0];
    }
});
where the "binary" element of that JSON is clearly binary, as the name suggests. The documentation talks about socket.io using org.json, but I couldn't find anywhere that this library supports adding binary data to JSON.
Is this functionality now supported? If so, what is socket.io doing in the background? Is it splitting the emit into two separate messages and then re-merging them, or is it simply encoding the binary data as Base64?
A bit of background.
I am trying to add private chat functionality to my app, so that a user can have multiple private two-party, audio-message-based chat conversations with several other users. I am having trouble figuring out how to tell my server where to forward each message. If I use JSON, I can simply add a sender and a receiver to the JSON and have the server read the receiver's id and forward the message accordingly. But I am not sure how to handle messages containing only binary data; I have no idea how to attach metadata (such as sender and receiver ids) to them so that the server knows to whom they are addressed. I have heard the suggestion of sending a JSON with a sender id, a receiver id and an MD5 hash of the file I am trying to send, then sending the binary data separately and having the server match the two messages via the MD5 signature, but that seems to come with problems of its own: I don't know how calculating the MD5 of a ton of audio files will affect server performance, and there is also the issue of potentially receiving the audio byte array before the JSON specifying its destination has arrived.
There is always the alternative of encoding my audio files in Base64 and sending them as JSON, as I have been doing so far, but I have been told this is a bad practice and should be avoided, as it inflates payload sizes.
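What I have been doing so far looks roughly like this (a simplified sketch: names are made up, and the string handling is naive in place of a real JSON library):

```java
import java.util.Base64;

public class ChatEnvelope {

    // Wrap audio bytes plus routing metadata into a single JSON string.
    // Base64 inflates the payload by roughly 33%, which is the drawback
    // I was told about.
    public static String wrap(String senderId, String receiverId, byte[] audio) {
        String b64 = Base64.getEncoder().encodeToString(audio);
        return "{\"sender\":\"" + senderId + "\",\"receiver\":\""
                + receiverId + "\",\"audio\":\"" + b64 + "\"}";
    }

    // Naive extraction on the other end; a real implementation would
    // use an actual JSON parser.
    public static byte[] unwrapAudio(String json) {
        String b64 = json.substring(json.indexOf("\"audio\":\"") + 9,
                                    json.lastIndexOf("\""));
        return Base64.getDecoder().decode(b64);
    }
}
```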
I feel like there are a bunch of messaging apps already out there, and I bet at least some are based on websockets. I would like to know if there are any best practices on how to route binary data over a websocket to a specific receiving connection.
I'll upvote any answer to the questions above, as well as any hint on how to tackle the problem mentioned in the background part.

What is the better approach to write large JSON data in Java: 'getOutputStream' or 'getWriter'?

The scenario is that I need to write a very large JSON data set, which is consumed by mobile applications. In the Java service application, the JSON is written as below:
response.getWriter().println(mainjson);
getWriter is taking too much time to write all the data.
I've heard about getOutputStream as well. Which is faster for large JSON data?
Any help will be appreciated :-)
It depends on how you retrieve the data and whether your JSON serializer has a streaming api available.
At the moment you are probably operating in three separate steps:
Retrieving all your data
Serializing it to JSON string
Writing the JSON response.
If you are spending a substantial amount of time on the retrieval and serialization parts by themselves, then you can potentially speed things up by using streams. However, this requires your data retrieval and JSON serializer to support streams.
When using streams, instead of doing everything in sequential steps you basically set up a pipeline that allows you to start writing the response a bit earlier. This is not guaranteed to be faster, though; it depends on where your particular bottleneck occurs. If it's almost all an issue with the IO to the client, then you are not going to see a substantial difference.
Also
Something else to look into: check whether you are compressing your response to the user. Gzip can have a substantial impact on the size of text data and may reduce your payload enough to make this a non-issue.
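As a hedged sketch of the streaming idea (with a plain List standing in for real data retrieval, and a generic OutputStream standing in for response.getOutputStream()):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class StreamingJson {

    // Write each already-serialized element straight to the response
    // stream instead of first concatenating one huge String in memory.
    public static void writeArray(List<String> jsonItems, OutputStream out) {
        try {
            out.write('[');
            for (int i = 0; i < jsonItems.size(); i++) {
                if (i > 0) out.write(',');
                out.write(jsonItems.get(i).getBytes(StandardCharsets.UTF_8));
            }
            out.write(']');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real servlet you would pass response.getOutputStream() (ideally buffered) as the second argument, and a streaming serializer such as Jackson's JsonGenerator would replace the pre-serialized strings.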

Is it possible to convert an Avro message of a type unknown at runtime into a specific Java class

I'm looking into Avro as a serialization format to publish events in a Java application.
What I'd like to do is convert some Avro bytes into an instance of a specific Java class, which I generated using the Avro Maven plugin, and vice versa.
I'd like to do this because it would allow the developers using my code to subscribe to a specific event and receive an instance of the specific generated class representing that specific event (maybe casting it to the Event class from an Object, but not having to touch any Avro specific code).
I can do this in a specific way, by writing code using the SpecificDatumReader and passing in the generated class to specify which class I expect. Unfortunately, this would require writing code for every generated class. An alternative would be using the GenericDatumReader, but this wouldn't give me an instance of the generated class I would want. I think I want something in between these two solutions, get a specific object as the output but have the flexibility of the generic approach.
I'm thinking of a solution in which I check the schema of the serialized message and create a SpecificDatumReader for this, which in turn creates the instance of the generated class.
Is this at all possible? How would I go about this? Any help is appreciated!
Some more specific contextual information: I'm publishing and subscribing to these events using RabbitMQ in a Spring application. Spring offers a RabbitTemplate for easy use of RabbitMQ and this class allows you to set a MessageConverter. What I'd like to do is create a generic MessageConverter which uses the Avro schema I created to turn bytes into a number of possible Java objects (generated by the Maven plugin) and Java objects into bytes. The latter (objects to bytes) sounds doable but I have no idea how to go about the former (bytes to objects).
Update December 29th 2016: None of the suggested solutions worked for us. Eventually we stepped away from Avro and went for a completely different solution. Therefore, I won't accept the suggested answer as it didn't help me and I can't vouch for its correctness.
There's no Avro API to get the record type from an Avro binary encoding because the Avro binary encoding does not have that data. A record is encoded by encoding the values of its fields in the order that they are declared. In other words, a record is encoded as just the concatenation of the encodings of its fields. Field values are encoded per their schema. That's why Avro must be provided the schema originally used to encode the data in order to decode it.
Because the bytes output by Avro's binary encoding don't specify the record type, you must send the record type alongside Avro's binary encoded data. Change the messages sent to RabbitMQ to be a composition of (1) the schema used to encode the data and (2) the Avro binary encoded data. If you don't want to blow up the message size by including the schema, you can include an identifier for the schema instead. The consumer program would retrieve the schema by that identifier from a schema registry. A custom MessageConverter can extract the schema from the message to see which of the many record types is in the message.
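A sketch of that composition, assuming a made-up 4-byte schema-id prefix (similar in spirit to Confluent's wire format, but not their exact spec):

```java
import java.nio.ByteBuffer;

public class SchemaFraming {

    // Prefix the Avro-encoded payload with a 4-byte schema id so the
    // consumer knows which schema (and generated class) to use.
    public static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + avroPayload.length);
        buf.putInt(schemaId);
        buf.put(avroPayload);
        return buf.array();
    }

    // Consumer side: read the id, then hand the remaining bytes to the
    // SpecificDatumReader looked up by that schema id.
    public static int schemaId(byte[] framed) {
        return ByteBuffer.wrap(framed).getInt();
    }

    public static byte[] payload(byte[] framed) {
        byte[] out = new byte[framed.length - 4];
        System.arraycopy(framed, 4, out, 0, out.length);
        return out;
    }
}
```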

Apache MQ Scanning Message

I am new to Apache ActiveMQ message queues.
While reading (consuming) messages from the queue, the dequeue count increases and the message is deleted from MQ storage.
I want to scan a message without deleting it from the queue and without changing the dequeue count; I just want to scan the message and store it locally or print it to the output.
Can anybody suggest how to do this? I want to implement it in Java.
What you need is an ActiveMQQueueBrowser. You can find example code here.
But you need to be careful with this approach. Messaging queues are not designed for this kind of access; only some implementations (like ActiveMQ) provide this access type for special use cases. It should be used only if really necessary, and you need to understand the limitations of this:
The returned enumeration might not fetch the full content of the queue
The enumeration might contain a message that has already been dequeued by the time you process it
etc.
