How does Kafka Schema registration happen in Spring Cloud Stream?

I am trying to understand how to use Spring Cloud Stream with the Kafka Binder.
Currently, I am trying to register an AVRO schema with my Confluent Schema Registry and send messages to a topic.
I am unable to understand how the schema registration is done by Spring Cloud Stream behind the scenes.
Let's take this example from the Spring Cloud Stream samples.
The AVRO schema is located in src/resources/avro.
When the mvn compile goal is run, the POJO for the AVRO schema is generated and the producer can post data.
But what I am not able to understand is how Spring Cloud Stream registers the AVRO schema.
@Autowired
StreamBridge streamBridge;

@Bean
public Supplier<Sensor> supplier() {
    return () -> {
        Sensor sensor = new Sensor();
        sensor.setId(UUID.randomUUID().toString() + "-v1");
        sensor.setAcceleration(random.nextFloat() * 10);
        sensor.setVelocity(random.nextFloat() * 100);
        sensor.setTemperature(random.nextFloat() * 50);
        return sensor;
    };
}

@Bean
public Consumer<Sensor> receiveAndForward() {
    return s -> streamBridge.send("sensor-out-0", s);
}

@Bean
Consumer<Sensor> receive() {
    return s -> System.out.println("Received Sensor: " + s);
}
Is it done when the beans are created?
Or is it done when the first message is sent? If so, how does Spring Cloud Stream know where to find the .avsc file?
Basically, what is happening under the hood?
There seems to be no mention of this in the docs.
Thanks.

Your serialization strategy (in this case, AVRO) is always handled in the serializers (for producers) and deserializers (for consumers).
You can have Avro (de)serialized keys and/or Avro (de)serialized values, which means you should pass KafkaAvroSerializer.class / KafkaAvroDeserializer.class to the producer/consumer configs, respectively. On top of this, you must also pass schema.registry.url in the client config.
So behind the scenes, Spring Cloud Stream makes your application Avro-compatible when it creates your producers/consumers (using the configs found in application.properties or elsewhere). Your clients will connect to the schema registry on startup (the logs will tell you if the connection failed), but no schema registration is done out of the box.
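For reference, a minimal sketch of what that wiring could look like with the Confluent Avro serializer in application.properties (the binding names supplier-out-0 / receive-in-0 and the registry URL are assumptions based on the sample above; adjust them to your app):
# hand the (de)serialization to the Confluent Avro serde and point it at the registry
spring.cloud.stream.kafka.binder.producer-properties.value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
spring.cloud.stream.kafka.binder.producer-properties.schema.registry.url=http://localhost:8081
spring.cloud.stream.kafka.binder.consumer-properties.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
spring.cloud.stream.kafka.binder.consumer-properties.schema.registry.url=http://localhost:8081
# let the Kafka clients do the encoding/decoding instead of Spring's message converters
spring.cloud.stream.bindings.supplier-out-0.producer.use-native-encoding=true
spring.cloud.stream.bindings.receive-in-0.consumer.use-native-decoding=true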
Schema registration is done when the first message is sent. If you haven't noticed already, the generated POJOs contain the schemas themselves, so Spring Cloud Stream doesn't need the .avsc files at all. For example, my last generated Avro POJO contained (line 4):
@org.apache.avro.specific.AvroGenerated
public class AvroBalanceMessage extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
    private static final long serialVersionUID = -539731109258473824L;
    public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroBalanceMessage\",\"namespace\":\"tech.nermindedovic\",\"fields\":[{\"name\":\"accountNumber\",\"type\":\"long\",\"default\":0},{\"name\":\"routingNumber\",\"type\":\"long\",\"default\":0},{\"name\":\"balance\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"default\":\"0.00\"},{\"name\":\"errors\",\"type\":\"boolean\",\"default\":false}]}");
    public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
    .......
When a producer sends this POJO, it communicates with the registry about the current version of the schema. If the schema is not in the registry, the registry stores it and identifies it by ID. The producer then sends the message with its schema ID to the Kafka broker. On the other side, the consumer gets the message and checks whether it has seen that ID before (it is cached so you don't always have to retrieve the schema from the registry); if it hasn't, it asks the registry for the schema belonging to the message.
A bit outside the scope of Spring Cloud Stream, but one can also use the Schema Registry REST API to manually register schemas.
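If you do want to register a schema manually, a rough sketch against the Schema Registry REST endpoint (the URL, subject name, and dummy schema are placeholders) could look like this:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ManualSchemaRegistration {
    public static void main(String[] args) throws Exception {
        // POST /subjects/{subject}/versions with the Avro schema wrapped in a JSON envelope
        String body = "{\"schema\": \"{\\\"type\\\":\\\"string\\\"}\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/subjects/sensor-value/versions"))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // the registry answers with the schema ID, e.g. {"id":1}
    }
}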

Related

Spring Cloud Stream Kafka: Specify JSON Type Mappings

I have a Kafka consumer that is implemented using Spring's Kafka Streams API. The consumer looks something like this:
@Bean
public Consumer<KStream<String, Foo>> fooProcess() {
    return input -> input
        .foreach((key, value) -> {
            processFoo(value);
        });
}
The problem I'm having is that messages consumed from this topic are serialized as type some.package.foo, but my application uses some.other.package.foo. I know it's possible to map these two types using standard Spring Kafka, but I can't for the life of me figure out how to specify this mapping when using the streams API.
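(For context, the plain Spring Kafka mapping I'm referring to is the JsonDeserializer type-mapping property, which would look roughly like this in application.properties, with illustrative package names:)
# map the producer's type id onto my local class when deserializing
spring.kafka.consumer.properties.spring.json.type.mapping=some.package.Foo:some.other.package.Foo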
Any guidance would be greatly appreciated!

Spring Cloud @StreamListener condition deprecated, what is the alternative?

We have multiple consumer applications listening to the same Kafka topic, and a producer sets a message header when sending a message to the topic so that a specific instance can evaluate the header and process the message, e.g.
@StreamListener(target = ITestSink.CHANNEL_NAME, condition = "headers['franchiseName'] == 'sydney'")
public void fullfillOrder(@Payload TestObj message) {
    log.info("sydney order request received message is {}", message.getName());
}
In Spring Cloud Stream 3.0.0 @StreamListener is deprecated, and I could not find the equivalent of the condition property when using functions.
Any suggestion?
Though I was not able to find an equivalent for the functional approach either, I do have a suggestion.
The @StreamListener annotation's condition does not change the fact that the application must consume the message, read its header, and filter out specific records before passing them to the listener (fullfillOrder()). So it's safe to assume you're consuming every message that hits the topic regardless (via the event receiver that Spring Cloud implements for us under the hood), but the listener only gets executed when header == sydney.
If there were a way to configure that event receiver (to discard messages before they hit the listener), I would suggest looking into that. If not, I would resort to filtering out any non-sydney messages before doing any processing. With Spring Cloud's functional approach, that would look something like this:
@Bean
public Consumer<Message<TestObj>> fulfillOrder() {
    return msg -> {
        // to get a header: msg.getHeaders().get(key, valueType)
        // filter out non-matching messages here
    };
}
or
@Bean
public Consumer<ConsumerRecord<?, TestObj>> fulfillOrder() {
    return msg -> {
        // msg.headers().lastHeader("franchiseName").value() -> filter them out
    };
}
Other:
The code above assumes you're integrating the kafka-client API with Spring Cloud Stream via spring-cloud-stream-binder-kafka. Based on the tags listed, I will note that Spring Cloud Stream has two binders for Kafka: one for the Kafka client library and one for the Kafka Streams library.
Setting Spring Cloud and frameworks aside, the high-level DSL in Kafka Streams doesn't give you access to headers, but the low-level Processor API does. From the example, it seems like you're using the client binder and not spring-cloud-stream-binder-kafka-streams (the Kafka Streams binder). I haven't seen an implementation of Spring Cloud Stream plus the Kafka Streams binder using the low-level Processor API, so I can't tell if that was the aim.
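For what it's worth, if the Kafka Streams binder were the aim, a sketch of reading the header through the Processor API (via transformValues; the class name is made up, TestObj and the header name come from the question) might look like:
import org.apache.kafka.common.header.Header;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;

public class FranchiseFilteringTransformer implements ValueTransformerWithKey<String, TestObj, TestObj> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context; // gives access to the current record's headers
    }

    @Override
    public TestObj transform(String key, TestObj value) {
        Header franchise = context.headers().lastHeader("franchiseName");
        if (franchise != null && "sydney".equals(new String(franchise.value()))) {
            return value;   // keep sydney records
        }
        return null;        // drop everything else (filter out the nulls downstream)
    }

    @Override
    public void close() { }
}
// usage: stream.transformValues(FranchiseFilteringTransformer::new).filter((k, v) -> v != null) ...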

set header on rsocket messages with spring boot

So I've started playing with RSocket and Spring Boot 2.2 to see if I can use it in my projects, but I'm running into a bit of trouble.
Normally, with spring messaging I define a listener method like the following:
@MessageMapping("addGeolocation")
public Mono<Boolean> addGeolocation(@Header("metadata") MmeMetadata metadata, @Payload String geolocation) { ... }
My understanding is that with rsocket I should be able to use the same logic, but when I'm defining the client I couldn't find an easy way to set message headers.
Currently I'm stuck with this:
boolean outcome = rSocketRequester.route("addGeolocation").metadata(...?).data(geolocationWKT).block();
Is the metadata a replacement for headers? That method signature seems a little too generic to be used like headers. If I put a Map in it, will Spring be able to decode headers out of it?
Thank you,
Fernando
Please see this question: RSocket Metadata - custom object.
I used it as a starting point for my solution.
The term 'header' actually means some custom metadata here. So, in order to get the correct value you need to configure the metadataExtractorRegistry. For Spring Boot, do it this way (code in Kotlin):
val CUSTOM_MIMETYPE = MimeType.valueOf("<some custom mime type>")
val CUSTOM_HEADER = "<the name of a header>"
...
@Bean
fun rSocketStrategiesCustomizer(): RSocketStrategiesCustomizer {
    return RSocketStrategiesCustomizer { strategies: RSocketStrategies.Builder ->
        strategies.metadataExtractorRegistry {
            it.metadataToExtract(CUSTOM_MIMETYPE, String::class.java, CUSTOM_HEADER)
        }
    }
}
The type of the data object can be anything, not necessarily a String. There is a default String encoder/decoder, so I didn't provide one in the code. For your own type you can provide one of the existing encoders/decoders (JSON, for example) or create your own:
@Bean
fun rSocketStrategiesCustomizer(): RSocketStrategiesCustomizer {
    return RSocketStrategiesCustomizer { strategies: RSocketStrategies.Builder ->
        strategies.metadataExtractorRegistry {
            it.metadataToExtract(CUSTOM_MIMETYPE, YourType::class.java, CUSTOM_HEADER)
        }
        .decoder(Jackson2JsonDecoder())
        .encoder(Jackson2JsonEncoder())
    }
}
After you've configured registry as above, use the header name defined in the registry in your controller:
@MessageMapping("addGeolocation")
public Mono<Boolean> addGeolocation(@Header(CUSTOM_HEADER) String metadata, @Payload String geolocation) { ... }
And in order to send that header, use the following code:
boolean outcome = rSocketRequester.route("addGeolocation")
        .metadata("value", CUSTOM_MIMETYPE)
        .data(geolocationWKT)
        .retrieveMono(Boolean.class)
        .block();
Hope this helps
Instead of a bag of name-value pairs (i.e. headers), RSocket uses metadata which can be in any format (i.e. MIME type) and it can be composite metadata with multiple types of metadata each formatted differently. So you can have one section with routing metadata, another with security, yet another with tracing, and so on.
To achieve something similar to headers, you can send name-value pairs as JSON-formatted metadata. Now on the server side, you'll need to provide a hint to Spring for how to extract a Map (of headers) from the metadata of incoming requests. To do that you can configure a MetadataExtractor and that's described in this section of the docs. Once that's configured, the extracted Map becomes the headers of the message and can be accessed from #MessageMapping methods as usual (via MessageHeaders, #Header, etc).
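A rough Java sketch of that server-side configuration (the MIME type string and class names are assumptions; use whatever your client actually sends):
import java.util.Map;

import org.springframework.boot.rsocket.messaging.RSocketStrategiesCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.util.MimeType;

@Configuration
public class RSocketMetadataConfig {

    private static final MimeType HEADERS_MIMETYPE = MimeType.valueOf("application/vnd.myapp.headers+json");

    @Bean
    public RSocketStrategiesCustomizer headerMetadataCustomizer() {
        return strategies -> strategies.metadataExtractorRegistry(registry ->
                registry.metadataToExtract(
                        HEADERS_MIMETYPE,
                        new ParameterizedTypeReference<Map<String, String>>() {},
                        // copy every JSON entry into the message headers
                        (jsonMap, outputHeaders) -> outputHeaders.putAll(jsonMap)));
    }
}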

Creating a generic Avro sink in Spring Cloud Stream / Dataflow

I am trying to create a generic receiver for Avro messages in Spring Cloud Data Flow but I am running into some bother. The setup I currently have is a processor converting my input data into an Avro message and pushing this out to the sink. I am using the Spring Schema Registry server and I can see the schema being POSTed to it and successfully stored; I can also see it being successfully retrieved from the registry server by my sink.
If I place my Avro-generated object from the processor into the sink application and configure my sink like so, with the type declared, it works perfectly.
@StreamListener(Sink.INPUT)
public void logHandler(DataRecord data) {
    LOGGER.info("data='{}'", data.toString());
}
However, I would like to make it so that my sink does not need to be aware of the schema ahead of time, e.g. use the schema from the schema registry and access the fields through data.get("fieldName").
I was hoping to accomplish this through use of the Avro GenericRecord like so:
@StreamListener(Sink.INPUT)
public void logHandler(GenericRecord data) {
    LOGGER.info("data='{}'", data.toString());
}
But this throws an exception into the logs:
2017-12-05 12:10:15,206 DEBUG -L-2 o.s.w.c.RestTemplate:691 - GET request for "http://192.168.99.100:8990/datarecord/avro/v1" resulted in 200 (null)
org.springframework.messaging.converter.MessageConversionException: No schema can be inferred from type org.apache.avro.generic.GenericRecord and no schema has been explicitly configured.
Is there a way to accomplish what I am trying to do?

Spring cloud stream to support routing messages dynamically

I want to create a common project (using Spring Cloud Stream) to route messages to different (consumer) projects dynamically according to message content, with RabbitMQ as the message broker.
Does Spring Cloud Stream support this? If not, is there a proposed way to accomplish it? Thanks.
You can achieve that by setting the spring.cloud.stream.dynamicDestinations property to a list of destination names (if you know the names beforehand) or leaving it empty. The BinderAwareChannelResolver takes care of dynamically creating/binding the outbound channel for these dynamic destinations.
There is an out-of-the-box router application available which does a similar thing.
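As a quick sketch, that property can be set in application.properties like this (the destination names here are just examples):
# leave unset/empty to allow any destination to be created on demand
spring.cloud.stream.dynamicDestinations=customer-events,order-events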
You can use StreamBridge with the topic name, and Spring Cloud will bind it to the destination automatically at runtime.
@Autowired
private StreamBridge streamBridge;

public void sendDynamically(Message<?> message, String topicName) {
    streamBridge.send(topicName, message);
}
https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/spring-cloud-stream.html#_streambridge_and_dynamic_destinations
