I have been studying Apache Kafka for the past two weeks and I have managed to understand how Kafka works and how the Kafka producer and consumer work. Now I want to design a small Java program that sends my Apache Tomcat 9 logs and metrics to Kafka, so they can be used for log aggregation. I have searched for how to do this and which methods or tools I have to learn, and I came across the Log4j jar, through which I can produce custom logs in Apache Tomcat, but I don't know how to send those logs to Kafka. Please give me some guidance on how to write this program if anyone has done this kind of work before.
Thank you.
As commented, you would use a KafkaAppender on the application/server side, pointed at your Kafka brokers, to send data to them; Kafka doesn't request data from your applications.
You can also write logs directly to disk and use any combination of log processors such as Filebeat, Fluent Bit, or rsyslog, all of which have Kafka integrations.
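Since the question asks for a small Java program, here is a minimal sketch of a producer that reads a Tomcat log file from disk and publishes each line to a Kafka topic with the plain producer API. The broker address, topic name, and log file path are assumptions; a real shipper would also keep tailing the file for new lines rather than exit.

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TomcatLogProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = Files.newBufferedReader(
                     Paths.get("/opt/tomcat/logs/catalina.out"))) {           // assumed log file path
            String line;
            while ((line = reader.readLine()) != null) {
                // one Kafka record per log line; "tomcat-logs" is an assumed topic name
                producer.send(new ProducerRecord<>("tomcat-logs", line));
            }
            producer.flush(); // make sure buffered records are actually sent before exiting
        }
    }
}
```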
I'm quite new to Kafka, and as one of my first projects I'm trying to create a Kafka producer in Java which will read events from Wikipedia/Wikimedia and post them to the relevant topics.
I'm looking at https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams and https://stream.wikimedia.org/v2/ui/#/ for references on the wikipedia API.
I followed the basic guides for creating Kafka producers in Java, but they mainly rely on events created locally on my machine.
When looking at solutions that read events from a remote server, I see they are using libraries which are not Kafka-native (e.g. spring.io).
Is there a way to set up my producer with native Kafka libraries that come as part of the kafka installation package?
Spring just wraps the native Kafka libraries to ease development and configuration. It is not required, and so, yes, you can essentially do the same thing they do, but with less overhead.
mainly rely on events created locally on my machine
Because that is easier to demo, and it is an implementation detail. If you pull from a remote server, that data becomes a "local" in-memory data structure at some point anyway.
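As an illustration, here is a minimal sketch that pulls the Wikimedia recentchange stream with Java 11's built-in HttpClient and forwards each event using only the plain Kafka producer API. The broker address and topic name are assumptions; the stream URL comes from the EventStreams documentation linked in the question.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import java.util.stream.Stream;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WikimediaChangesProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://stream.wikimedia.org/v2/stream/recentchange")).build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Stream<String> lines = http.send(request, HttpResponse.BodyHandlers.ofLines()).body()) {
            lines.filter(line -> line.startsWith("data:"))                 // keep only SSE payload lines
                 .map(line -> line.substring("data:".length()).trim())     // strip the SSE prefix
                 .forEach(json -> producer.send(
                         new ProducerRecord<>("wikimedia.recentchange", json))); // assumed topic name
        }
    }
}
```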
We have an application that is already using Spring Cloud Stream with RabbitMQ; some endpoints of the application are sending messages to RabbitMQ. Now we want new endpoints to start sending messages to Kafka, while the existing endpoints continue using RabbitMQ with Spring Cloud Stream. I am not sure if this is even possible, because it means we would have to include both the Kafka and Rabbit binder dependencies in pom.xml. What configuration changes do we need to make in the yml file so that the app understands which bindings are for Kafka and which bindings are for Rabbit? Many thanks.
Yes, it is possible. This is what we call a multi-binder scenario, and it is one of the core features, built specifically to support the use case you are describing.
Here is where you can find more information - https://docs.spring.io/spring-cloud-stream/docs/3.2.1/reference/html/spring-cloud-stream.html#multiple-binders
Also, here is an example that actually provides configuration using both Kafka and Rabbit. While the example is centered around CloudEvents, you can ignore that part and concentrate strictly on the configuration related to the Rabbit and Kafka binders - https://github.com/spring-cloud/spring-cloud-function/tree/main/spring-cloud-function-samples/function-sample-cloudevent-stream
Feel free to ask follow-up questions once you get familiar with that.
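In the meantime, here is a minimal sketch of what the multi-binder section of application.yml could look like. The binder names, binding names, and destinations are assumptions; adjust them to match your own bindings.

```yaml
spring:
  cloud:
    stream:
      binders:
        rabbitBinder:
          type: rabbit          # existing RabbitMQ binder
        kafkaBinder:
          type: kafka           # new Kafka binder
      bindings:
        toRabbit-out-0:
          destination: existing-exchange
          binder: rabbitBinder  # these endpoints keep using RabbitMQ
        toKafka-out-0:
          destination: new-topic
          binder: kafkaBinder   # new endpoints publish to Kafka
```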
I have a use case where I want to move messages from SQS to a Kafka topic. The framework to be used is Spring Boot, so whenever I run my code it should start moving the messages. I searched for articles but found very few. I am looking for some boilerplate code to start with that follows best practices, and for guidance on how to proceed further.
Thanks in advance.
You need to make yourself familiar with the Enterprise Integration Patterns and their Spring Integration implementation.
To take messages from AWS SQS you would need to use an SqsMessageDrivenChannelAdapter from the Spring Integration for AWS extension. To post records to an Apache Kafka topic you need a KafkaProducerMessageHandler from the spring-integration-kafka module.
Then you wire everything together via an IntegrationFlow bean in your Spring Boot configuration.
Of course you can use Spring Cloud for AWS and Spring for Apache Kafka directly.
The choice is yours, but it is better to follow best practices and develop a real integration solution.
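A rough sketch of that wiring is shown below, assuming Spring Integration 6.x with the spring-integration-aws and spring-integration-kafka modules on the classpath; the queue name and topic name are made up, and the exact constructor arguments can differ between versions.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.aws.inbound.SqsMessageDrivenChannelAdapter;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.kafka.dsl.Kafka;
import org.springframework.kafka.core.KafkaTemplate;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;

@Configuration
public class SqsToKafkaFlowConfig {

    @Bean
    public IntegrationFlow sqsToKafkaFlow(SqsAsyncClient sqsClient,
                                          KafkaTemplate<String, String> kafkaTemplate) {
        return IntegrationFlow
                // receive messages from the SQS queue ("my-sqs-queue" is an assumed name)
                .from(new SqsMessageDrivenChannelAdapter(sqsClient, "my-sqs-queue"))
                // publish each message to a Kafka topic ("my-kafka-topic" is an assumed name)
                .handle(Kafka.outboundChannelAdapter(kafkaTemplate).topic("my-kafka-topic"))
                .get();
    }
}
```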
Apache Kafka offers multiple ways to ingest data from different sources, e.g. Kafka Connect, the Kafka producer API, etc., and you need to be careful when selecting a specific Kafka component, keeping things such as retry behavior and scalability in mind.
The best solution in this case would be to use the Amazon SQS Source Connector to ingest data from AWS SQS into a Kafka topic, and then write your consumer application to do whatever is necessary with the stream of records on that topic.
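For the consumer side, a minimal sketch with the plain Kafka consumer API could look like this; the broker address, group id, and topic name are assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SqsTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sqs-topic-processor");      // assumed consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("sqs-messages")); // assumed topic written by the connector
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // replace this with whatever processing your use case requires
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```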
How can we ingest data into Elasticsearch through Java without Logstash and Beats? Is there any option like Kafka, or some way of using only Java without any additional tools?
I am not sure why you don't want to consider Filebeat --> Elasticsearch, but yes, there are other ways to send your logs to Elasticsearch.
Also, you did not mention what the source is: whether you want to ingest application logs, a database, etc. I am assuming you want to send microservice logs, but the options below hold good for sending other data too.
Since you don't want to use Filebeat, you will have to add custom code to collect, refine, format, and publish the logs:
You can use the Kafka Sink Connector for Elasticsearch to move all your logs from a Kafka topic into Elasticsearch.
You can also use UDP to send logs (client) and listen for them (server), then buffer them and ingest into Elasticsearch, as sketched below.
You can develop a common library that holds all this code and use it in all your Java applications.
Simple UDP client/server code - https://github.com/suren03/udp-server-client
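Here is a minimal sketch of the UDP option: a client that sends a log line and a listener that receives it. The port is an assumption, and the Elasticsearch ingest step is left as a comment.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpLogShipper {

    // Server side: listen for log lines; in a real setup you would buffer
    // them and bulk-index the batch into Elasticsearch from here.
    static void listen(int port) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(port)) {
            byte[] buffer = new byte[8192];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet);
                String logLine = new String(packet.getData(), 0, packet.getLength(),
                        StandardCharsets.UTF_8);
                System.out.println("received: " + logLine); // replace with Elasticsearch ingest
            }
        }
    }

    // Client side: applications send their log lines over UDP.
    static void send(String logLine, String host, int port) throws Exception {
        byte[] data = logLine.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        Thread listener = new Thread(() -> {
            try { listen(9999); } catch (Exception e) { e.printStackTrace(); } // assumed port
        });
        listener.setDaemon(true);
        listener.start();
        Thread.sleep(500); // give the listener a moment to bind before sending
        send("2024-01-01 12:00:00 INFO demo - hello elastic", "localhost", 9999);
        Thread.sleep(500); // let the listener print before the JVM exits
    }
}
```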
I have a producer spitting out messages on a CometD topic. I need to pick up and process the stream of messages from this topic. I have probably spent the last two hours trying to find a way to ingest the messages from CometD directly into Apache Beam, and I seem to be hitting a wall. I know that I have the following options:
Get the messages from the topic, write the data to Kafka, and then push the data from Kafka to Beam.
Get the messages from the topic, write the data to Pub/Sub on GCP, and then push it through to Apache Beam.
Both of the options above seem to add an extra component to the architecture. Is there a better way to do this? Any examples? Code samples? Pointers?
I am not aware of anybody yet having written or started a CometD connector for Beam. The current connectors are listed at https://beam.apache.org/documentation/io/built-in/. To write your own, you could try mimicking the code of one of the other basic streaming connectors, such as AMQP, MQTT or JMS (Kafka and Pubsub are very advanced and I don't advise guiding your implementation by their source code).
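If you do go with the Kafka route from the question, the built-in KafkaIO connector makes the Beam side straightforward. Here is a minimal sketch; the broker address and topic name are assumptions, and it needs the beam-sdks-java-io-kafka dependency.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CometdViaKafkaPipeline {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            .apply("ReadFromKafka", KafkaIO.<String, String>read()
                    .withBootstrapServers("localhost:9092")            // assumed broker address
                    .withTopic("cometd-events")                        // assumed topic name
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withoutMetadata())                                // yields KV<key, value> pairs
            .apply("ExtractPayload", MapElements
                    .into(TypeDescriptors.strings())
                    .via((KV<String, String> kv) -> kv.getValue()));
            // ...further transforms / sinks for the message payloads go here...

        pipeline.run().waitUntilFinish();
    }
}
```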