Cannot query local state store in Kafka Streams Application - java

I'm building a Kafka Streams application with spring-kafka to group records by key and apply some business logic. I'm following the configuration described in the spring-kafka streams documentation, but when I try to retrieve a value from the local store I get the following error:
org.apache.kafka.streams.errors.InvalidStateStoreException: The state store, user-data-response-count, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1053)
at com.umantis.management.service.UserDataManagementService.broadcastUserDataRequest(UserDataManagementService.java:121)
Here is my KafkaStreamsConfiguration:
@Configuration
@EnableConfigurationProperties(EventsKafkaProperties.class)
@EnableKafka
@EnableKafkaStreams
public class KafkaConfiguration {

    @Value("${app.kafka.streams.application-id}")
    private String applicationId;

    // This contains both the bootstrap servers and the schema registry url
    @Autowired
    private EventsKafkaProperties eventsKafkaProperties;

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public StreamsConfig streamsConfig() {
        Map<String, Object> props = new HashMap<>();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, this.eventsKafkaProperties.getBrokers());
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
        props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, this.eventsKafkaProperties.getSchemaRegistryUrl());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return new StreamsConfig(props);
    }

    @Bean
    public KGroupedStream<String, UserDataResponse> responseKStream(StreamsBuilder streamsBuilder, TopicUtils topicUtils) {
        final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", this.eventsKafkaProperties.getSchemaRegistryUrl());
        final Serde<UserDataResponse> valueSpecificAvroSerde = new SpecificAvroSerde<>();
        valueSpecificAvroSerde.configure(serdeConfig, false);
        return streamsBuilder
                .stream("myTopic", Consumed.with(Serdes.String(), valueSpecificAvroSerde))
                .groupByKey();
    }
}
And here is my service code failing on getKafkaStreams().store:
@Slf4j
@Service
public class UserDataManagementService {

    private static final String RESPONSE_COUNT_STORE = "user-data-response-count";

    @Autowired
    private StreamsBuilderFactoryBean streamsBuilderFactory;

    public UserDataResponse broadcastUserDataRequest() {
        this.responseGroupStream.count(Materialized.as(RESPONSE_COUNT_STORE));

        if (!this.streamsBuilderFactory.isRunning()) {
            throw new KafkaStoreNotAvailableException();
        }

        // here we should have a single running kafka instance
        ReadOnlyKeyValueStore<String, Long> countStore =
                this.streamsBuilderFactory.getKafkaStreams().store(RESPONSE_COUNT_STORE, QueryableStoreTypes.keyValueStore());
        ...
    }
Context: I'm running the app as a single instance in a Spring Boot test, and I'm ensuring the Kafka instance is in a running state. I've searched the Apache documentation on this issue, but my case does not appear to match.
Can anyone point out what I'm doing wrong and suggest a possible solution?
I'm quite new to Kafka Streams, so any help would be highly appreciated.

OK, I just saw that I was checking whether the streams builder factory was running, but not whether the Kafka Streams instance itself was actually running.
Polling on streamsBuilderFactory.getKafkaStreams().state() until it reaches RUNNING solved the issue.
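For anyone hitting the same thing, here is a minimal sketch of that polling (the 30-second timeout and sleep interval are assumptions; the store name and exception come from the code above):

// Wait until the Kafka Streams instance is actually RUNNING before querying the store.
KafkaStreams kafkaStreams = this.streamsBuilderFactory.getKafkaStreams();
long deadline = System.currentTimeMillis() + 30_000; // assumed timeout
while (kafkaStreams.state() != KafkaStreams.State.RUNNING) {
    if (System.currentTimeMillis() > deadline) {
        throw new KafkaStoreNotAvailableException();
    }
    try {
        Thread.sleep(100);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new KafkaStoreNotAvailableException();
    }
}
ReadOnlyKeyValueStore<String, Long> countStore =
        kafkaStreams.store(RESPONSE_COUNT_STORE, QueryableStoreTypes.keyValueStore());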

Related

How to run multiple instances of the same app as producer/consumer in Kafka

I have a simple REST API backed by an H2 database, so my plan is that when I run multiple instances of the same app, each will have its own in-memory database. Now I want to synchronize these databases between them. I thought Kafka would be a good solution, so for example when the instance on port 8080 gets a POST, it should also be propagated to all other instances. My app acts as a producer and consumer at the same time, and I do not know why only one instance receives the message.
The code:
@EnableKafka
@Configuration
public class KafkaProducerConfigForDepartment {

    @Value(value = "${kafka.bootstrapAddress}")
    private String bootstrapAddress;

    @Bean
    public ProducerFactory<String, MessageEventForDepartment> producerFactoryForDepartment() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, MessageEventForDepartment> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactoryForDepartment());
    }
}
@Configuration
public class KafkaTopicConfig {

    @Value(value = "${kafka.bootstrapAddress}")
    private String bootstrapAddress;

    @Bean
    public ConsumerFactory<String, MessageEventForDepartment> consumerFactoryForDepartments() {
        Map<String, Object> props = new HashMap<>();
        props.put(JsonDeserializer.TRUSTED_PACKAGES, "*");
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "groupId");
        return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), new JsonDeserializer<>(MessageEventForDepartment.class));
    }

    @Bean
    public NewTopic topic1() {
        return TopicBuilder.name("topic12")
                .partitions(10)
                .replicas(10)
                .build();
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, MessageEventForDepartment>
            kafkaListenerContainerFactoryForDepartments() {
        ConcurrentKafkaListenerContainerFactory<String, MessageEventForDepartment> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactoryForDepartments());
        return factory;
    }
}
@Component
@Slf4j
public class DepartmentKafkaService {

    @Autowired
    private DepartmentService departmentService;

    @KafkaListener(topics = "topic12", groupId = "groupId", containerFactory = "kafkaListenerContainerFactoryForDepartments")
    public void listenGroupFoo(MessageEventForDepartment message) {
        log.info(message.toString());
    }
}
Why is this happening? Or maybe my approach is not a good one. What are your thoughts?
Have you considered Kafka Streams? In my opinion, what you want is already provided by the internal RocksDB store and the GlobalKTable implementation in Kafka Streams.
RocksDB will behave exactly like the H2 database you mentioned, and the GlobalKTable functionality allows you to broadcast the current state to all running KafkaStreams instances and read the data with ease.
Example:
Producer part:
@RestController
class MessageEventForDepartmentController {

    @Autowired
    KafkaTemplate<String, MessageEventForDepartment> kafkaTemplate;

    @PostMapping(path = "/departments", consumes = "application/json")
    @ResponseStatus(HttpStatus.ACCEPTED)
    void sendMessageEventForDepartment(@RequestBody MessageEventForDepartment event) {
        kafkaTemplate.send("topic-a", event.getId(), event);
    }
}
Consumer part - KafkaStreams GlobalKTable
@Component
public class StreamsBuilderMessageEventForDepartment {

    @Autowired
    void buildPipeline(StreamsBuilder streamsBuilder) {
        KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore("MessageEventForDepartmentGlobalStateStore");

        Materialized<String, MessageEventForDepartment, KeyValueStore<Bytes, byte[]>> materialized =
                Materialized.<String, MessageEventForDepartment>as(storeSupplier)
                        .withKeySerde(Serdes.String())
                        .withValueSerde(new JsonSerde<>(MessageEventForDepartment.class));

        GlobalKTable<String, MessageEventForDepartment> messagesCount =
                streamsBuilder.globalTable("topic-a", materialized);
    }
}
Read data from RocksDB
@RestController
class MessageEventForDepartmentReadModelController {

    @Autowired
    KafkaStreams kafkaStreams;

    @GetMapping(path = "/departments")
    MessageEventForDepartment getMessageEventForDepartment(String eventId) {
        ReadOnlyKeyValueStore<String, MessageEventForDepartment> store =
                kafkaStreams.store(StoreQueryParameters.fromNameAndType("MessageEventForDepartmentGlobalStateStore", QueryableStoreTypes.keyValueStore()));
        return store.get(eventId);
    }
}
The reason why only one instance of the application receives each message is that each instance has the same ConsumerConfig.GROUP_ID_CONFIG. Kafka's consumer protocol is such that each consumer group gets each message delivered once (obviously, there's a lot more nuance to it, but this is basically how it works).
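If every instance really does need to see every message without moving to Kafka Streams, one common workaround is to give each instance its own consumer group, for example by making the group id unique per instance. A rough sketch against the consumer factory above (the UUID suffix is just an assumption):

@Bean
public ConsumerFactory<String, MessageEventForDepartment> consumerFactoryForDepartments() {
    Map<String, Object> props = new HashMap<>();
    props.put(JsonDeserializer.TRUSTED_PACKAGES, "*");
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    // Unique group per running instance, so every instance gets its own copy of each message.
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "departments-" + UUID.randomUUID());
    return new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(),
            new JsonDeserializer<>(MessageEventForDepartment.class));
}

Note that the groupId attribute on the @KafkaListener would also need to be removed or made unique, since it overrides the group.id configured in the consumer factory.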
Pawel's suggestion to use Kafka Streams is a good one; a GlobalKTable would provide what you want.
Luca Pette wrote a great primer on Kafka Streams here: https://lucapette.me/writing/getting-started-with-kafka-streams/
My understanding of your question is that you are running multiple instances of the same app, each with its own in-memory database, and for eventual consistency you are going with Kafka Streams.
My suggestions:
RabbitMQ queue mirroring solves the same problem; Kafka also supports mirroring, see the doc: https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27846330#content/view/27846330
Consider a Redis cluster or a master/slave setup for the in-memory DB.

SpringBoot Embedded Kafka to produce Event using Avro Schema

I have created the below test class to produce an event using AvroSerializer.
@SpringBootTest
@EmbeddedKafka(partitions = 1, brokerProperties = { "listeners=PLAINTEXT://localhost:9092", "port=9092" })
@TestPropertySource(locations = ("classpath:application-test.properties"))
@ContextConfiguration(classes = { TestAppConfig.class })
@DirtiesContext
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
class EntitlementEventsConsumerServiceImplTest {

    @Autowired
    EmbeddedKafkaBroker embeddedKafkaBroker;

    @Bean
    MockSchemaRegistryClient mockSchemaRegistryClient() {
        return new MockSchemaRegistryClient();
    }

    @Bean
    KafkaAvroSerializer kafkaAvroSerializer() {
        return new KafkaAvroSerializer(mockSchemaRegistryClient());
    }

    @Bean
    public DefaultKafkaProducerFactory producerFactory() {
        Map<String, Object> props = KafkaTestUtils.producerProps(embeddedKafkaBroker);
        props.put(KafkaAvroSerializerConfig.AUTO_REGISTER_SCHEMAS, false);
        return new DefaultKafkaProducerFactory(props, new StringSerializer(), kafkaAvroSerializer());
    }

    @Bean
    public KafkaTemplate<String, ApplicationEvent> kafkaTemplate() {
        KafkaTemplate<String, ApplicationEvent> kafkaTemplate = new KafkaTemplate(producerFactory());
        return kafkaTemplate;
    }
}
But when I send an event using kafkaTemplate().send(appEventsTopic, applicationEvent); I get the below exception.
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema Not Found; error code: 404001
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getIdFromRegistry(MockSchemaRegistryClient.java:79)
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getId(MockSchemaRegistryClient.java:273)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:82)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53)
at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:902)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:781)
at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:562)
at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:363)
When I use MockSchemaRegistryClient, why is it trying to look up the schema?
schema.registry.url=mock://localhost.something
Basically anything with mock as the prefix will do the job.
Refer to https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerDeConfig.java
Also set auto.register.schemas=true.
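Putting those two settings together in the test's producer factory might look roughly like this (a sketch; the mock://test URL is arbitrary, and the serializers are configured through the props so they pick up the mock URL):

@Bean
public DefaultKafkaProducerFactory<String, ApplicationEvent> producerFactory() {
    Map<String, Object> props = KafkaTestUtils.producerProps(embeddedKafkaBroker);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
    // Any schema.registry.url with the mock:// prefix switches the serializer to an in-memory registry.
    props.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "mock://test");
    // Let the serializer register the schema itself instead of failing with "Schema Not Found".
    props.put(KafkaAvroSerializerConfig.AUTO_REGISTER_SCHEMAS, true);
    return new DefaultKafkaProducerFactory<>(props);
}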
You are configuring the producer not to auto-register new schemas when producing a message, so it just tries to fetch the schema from the Schema Registry and does not find it there.
I also did not see you set up the schema registry URL, so I guess it is taking the default values.
As to your question, the mock imitates the work of a real schema registry, but it has its clear disadvantages:
/**
 * Mock implementation of SchemaRegistryClient that can be used for tests. This version is NOT
 * thread safe. Schema data is stored in memory and is not persistent or shared across instances.
 */
You can look at the source for more information:
https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/MockSchemaRegistryClient.java#L47

How to reuse a util Java class in other Karate projects?

I was working with the Karate framework to test my REST service and it works great. However, I have a service that consumes a message from a Kafka topic, then persists it in Mongo, and finally notifies Kafka.
I made a Java producer in my Karate project; it is called from JS so it can be used by a feature.
Then I have a consumer to check the message.
Feature:
* def kafkaProducer = read('../js/KafkaProducer.js')
JS:
function(kafkaConfiguration) {
  var Producer = Java.type('x.y.core.producer.Producer');
  var producer = new Producer(kafkaConfiguration);
  return producer;
}
Java:
public class Producer {

    private static final Logger LOGGER = LoggerFactory.getLogger(Producer.class);
    private static final String KEY = "C636E8E238FD7AF97E2E500F8C6F0F4C";

    private KafkaConfiguration kafkaConfiguration;
    private ObjectMapper mapper;
    private AESEncrypter aesEncrypter;

    public Producer(KafkaConfiguration kafkaConfiguration) {
        kafkaConfiguration.getProperties().put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        kafkaConfiguration.getProperties().put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");
        this.kafkaConfiguration = kafkaConfiguration;
        this.mapper = new ObjectMapper();
        this.aesEncrypter = new AESEncrypter(KEY);
    }

    public String produceMessage(String payload) {
        // Just notify kafka with payload and return id of payload
    }
The other class:
public class KafkaConfiguration {

    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaConfiguration.class);

    private Properties properties;

    public KafkaConfiguration(String host) {
        try {
            properties = new Properties();
            properties.put(BOOTSTRAP_SERVERS_CONFIG, host);
            properties.put(ConsumerConfig.GROUP_ID_CONFIG, "karate-integration-test");
            properties.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset123");
            properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
            properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        } catch (Exception e) {
            LOGGER.error("Fail creating the consumer...", e);
            throw e;
        }
    }

    public Properties getProperties() {
        return properties;
    }

    public void setProperties(Properties properties) {
        this.properties = properties;
    }
}
I'd like to use the producer code with an annotation, like Cucumber does:
@Then("^Notify kafka with payload (-?\\d+)$")
public void validateResult(String payload) throws Throwable {
    new Producer(kafkaConfiguration).produceMessage(payload);
}
and in the feature use
Then Notify kafka with payload "{example:value}"
I want to do this because I want to reuse that code in a base project so it can be included in other projects.
If annotations don't work, maybe you can suggest another way to do it.
The answer is simple, use normal Java / Maven concepts. Move the common Java code to the "main" packages (src/main/java). Now all you need to do is build a JAR and add it as a dependency to any Karate project.
The last piece of the puzzle is this: use the classpath: prefix to refer to any features or JS files in the JAR. Karate will be able to pick them up.
EDIT: Sorry, Karate does not support Cucumber or step definitions. It has a much simpler approach. Please read this for details: https://github.com/intuit/karate/issues/398

Schema Registry Issue with Kafka Streams TopologyTestDriver with Avro record

I am trying to test the kafka streams using the TopologyTestDriver.
I am sharing the code snippet and the error I am facing.
public class ToplogyTest extends AvroSourceJsonTopologyTestSupport {

    private static final String MOCK_SCHEMA_REGISTRY_URL = "mock://test:8081";

    private TopologyTestDriver testDriver;
    private TestInputTopic<String, GenericRecord> inputTopic;
    private MockSchemaRegistryClient schemaRegistryClient;
    private final Properties props;

    public ToplogyTest() {
        super();
        props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streamsTest");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
        props.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, MOCK_SCHEMA_REGISTRY_URL);
        schemaRegistryClient = new MockSchemaRegistryClient();
    }

    @BeforeEach
    public void setup() throws Exception {
        // Created the topology
        // Create test driver
        testDriver = new TopologyTestDriver(topology, props);

        // Create Serdes used for test record keys and values
        Serde<String> stringSerde = Serdes.String();
        Serde<GenericRecord> avroSerde = new GenericAvroSerde();
        final Map<String, String> avroSerdeConfig = new HashMap<>();
        avroSerdeConfig.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, MOCK_SCHEMA_REGISTRY_URL);
        avroSerde.configure(avroSerdeConfig, false);

        inputTopic = testDriver.createInputTopic(
                "input-topic",
                stringSerde.serializer(),
                avroSerde.serializer());
    }

    @Test
    public void Test() {
        inputTopic.pipeInput("null", record);
    }
Error:
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
    Suppressed: java.lang.IllegalArgumentException: Please always: Initialize the test driver at the end of setup before each test using provided method; Close the test driver after each test using provided method.
Caused by: java.net.MalformedURLException: unknown protocol: mock
    at java.net.URL.<init>(URL.java:618)
    at java.net.URL.<init>(URL.java:508)
    at java.net.URL.<init>(URL.java:457)
    at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:152)
What I understand from this is that I am not able to register the mock schema registry. Has anyone faced a similar issue?
I haven't tried with GenericAvroSerde, but SpecificAvroSerde has a constructor that takes a SchemaRegistryClient as a parameter.
When you create the GenericAvroSerde, pass in your instance of mockSchemaRegistryClient so that it uses your mock, forcing it not to use the SchemaRegistryClient it would otherwise create itself (with the no-arg constructor).
Also, remove the property default.value.serde=GenericAvroSerde. If you configure this in your test like you normally would in the code under test, you configure an instance of GenericAvroSerde without the mock again, and at runtime it will still attempt to connect to the schema.registry.url.
I don't know if this is how MockSchemaRegistryClient is intended to be used, but this approach worked for me with SpecificAvroSerde.
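For reference, a rough sketch of wiring the serde to the mock client (GenericAvroSerde also has a constructor taking a SchemaRegistryClient; the field names are the ones from the question):

// Build the serde around the mock client so it never makes a real HTTP call.
Serde<GenericRecord> avroSerde = new GenericAvroSerde(schemaRegistryClient);
Map<String, String> avroSerdeConfig = new HashMap<>();
avroSerdeConfig.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, MOCK_SCHEMA_REGISTRY_URL);
avroSerde.configure(avroSerdeConfig, false); // false = this is a value serde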
According to the error, mock:// is not a valid URI scheme. For your test, it'll need to be http:// even if it's not used.

Is there a better way to implement multitenant using kafka?

I'm trying to implement a multi-tenant microservice using Spring Boot. I have already implemented the web layer and the persistence layer. On the web layer, I've implemented a filter which sets the tenant id in a prototype bean (using ThreadLocalTargetSource); on the persistence layer I've used Hibernate multi-tenancy configuration (schema per tenant). They work fine; data is persisted in the appropriate schema. Currently I am implementing the same behaviour on the messaging layer, using the spring-kafka library. So far it works the way I expected, but I'd like to know if there is a better way to do it.
Here is my code:
This is the class that manages a KafkaMessageListenerContainer:
@Component
public class MessagingListenerContainer {

    private final MessagingProperties messagingProperties;
    private KafkaMessageListenerContainer<String, String> container;

    @PostConstruct
    public void init() {
        ContainerProperties containerProps = new ContainerProperties(
                messagingProperties.getConsumer().getTopicsAsList());
        containerProps.setMessageListener(buildCustomMessageListener());
        container = createContainer(containerProps);
        container.start();
    }

    @Bean
    public MessageListener<String, String> buildCustomMessageListener() {
        return new CustomMessageListener();
    }

    private KafkaMessageListenerContainer<String, String> createContainer(
            ContainerProperties containerProps) {
        Map<String, Object> props = consumerProps();
        …
        return container;
    }

    private Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        …
        return props;
    }

    @PreDestroy
    public void finish() {
        container.stop();
    }
}
This is the CustomMessageListener:
@Slf4j
public class CustomMessageListener implements MessageListener<String, String> {

    @Autowired
    private TenantStore tenantStore; // Prototype Bean

    @Autowired
    private List<ServiceListener> services;

    @Override
    public void onMessage(ConsumerRecord<String, String> record) {
        log.info("Tenant {} | Payload: {} | Record: {}", record.key(),
                record.value(), record.toString());
        tenantStore.setTenantId(record.key()); // Currently the tenant is being set as the key
        services.stream().forEach(sl -> sl.onMessage(record.value()));
    }
}
This is a test service which would use the message data and tenant:
@Slf4j
@Service
public class ConsumerService implements ServiceListener {

    private final MessagesRepository messages;
    private final TenantStore tenantStore;

    @Override
    public void onMessage(String message) {
        log.info("ConsumerService {}, tenant {}", message, tenantStore.getTenantId());
        messages.save(new Message(message));
    }
}
Thanks for your time!
Just to be clear (correct me if I'm wrong): you are using the same topic(s) for all your tenants, and the way you distinguish each tenant's messages is by the message key, which in your case is the tenant id.
A slight improvement can be made by using message headers to store the tenant id instead of the key. That way you are not limited to partitioning messages by tenant.
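A sketch of what the listener could look like with a header instead of the key (the header name "tenant-id" is an assumption; whoever produces the message would have to set it):

@Override
public void onMessage(ConsumerRecord<String, String> record) {
    // Resolve the tenant from a record header, leaving the key free for partitioning decisions.
    Header tenantHeader = record.headers().lastHeader("tenant-id");
    String tenantId = tenantHeader != null
            ? new String(tenantHeader.value(), StandardCharsets.UTF_8)
            : null;
    tenantStore.setTenantId(tenantId);
    services.forEach(sl -> sl.onMessage(record.value()));
}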
Although the model you describe works, it has a major security issue: if someone gets access to your topic, you will be leaking the data of all your tenants.
A more secure approach is to use topic naming conventions and ACLs (access control lists). You can find a short explanation here. In a nutshell, you include the name of the tenant in the topic's name, either as a suffix or a prefix.
e.g. orders_tenantA, orders_tenantB or tenantA_orders, tenantB_orders
Then, using ACLs, you can restrict which applications can connect to those specific topics. This scenario is also helpful if one of your tenants needs to connect one of their applications directly to your Kafka cluster.
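With that layout, the producing side would resolve the topic from the tenant before sending, along these lines (the prefix scheme is just the example above; kafkaTemplate and tenantStore are assumed to be in scope):

// Send to the tenant-specific topic, e.g. "orders_tenantA".
String topic = "orders_" + tenantStore.getTenantId();
kafkaTemplate.send(topic, message);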
