I am working on a small project using Spring Boot, Kafka, and Spark. So far I have been able to create a Kafka producer in one project and a Spark-Kafka direct stream as a consumer.
I am able to see messages pass through and things seem to be working as intended. However, I have a REST endpoint in the project that is running the consumer. Whenever I disable the direct stream, the endpoint works fine; when I have the stream running, Postman says there is no response, and I see nothing in the server logs indicating that a request was ever received.
The Spark consumer is started by a bean at project launch. Is this keeping the normal server on localhost:8080 from being started?
Initially I was kicking off the StreamingContext by annotating it as a Bean. I instead made the application implement CommandLineRunner, and in the overridden run method I called the method that kicks off the StreamingContext. That allowed the embedded Tomcat server to start and fixed the issue.
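A minimal sketch of that fix, assuming a hypothetical SparkStreamLauncher component whose startStream() method builds and starts the JavaStreamingContext (both names are illustrative):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ConsumerApplication implements CommandLineRunner {

    // Hypothetical component that creates the JavaStreamingContext and the
    // Kafka direct stream; the name is an assumption for this example.
    @Autowired
    private SparkStreamLauncher sparkStreamLauncher;

    public static void main(String[] args) {
        SpringApplication.run(ConsumerApplication.class, args);
    }

    @Override
    public void run(String... args) {
        // Started here instead of in a @Bean method, so bean creation (and
        // therefore startup of the embedded server) is not blocked.
        sparkStreamLauncher.startStream();
    }
}

CommandLineRunner.run is invoked only after the application context has been refreshed and the embedded server is listening, which is why the endpoint becomes reachable again.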
The problem: I have a spring boot service running on K8s. Generally API calls can be served by any pod of my service, but for a particular use case we have a requirement to propagate the call to all instances of the service.
A bit of googling led me to https://discuss.kubernetes.io/t/how-to-broadcast-message-to-all-the-pod/10002 where they suggest using
kubectl get endpoints cache -o yaml
and proceeding from there. This is fine for a human or a CLI environment, but how do I accomplish the same from within my Java service, aside from executing the above command via Process and parsing the output?
Essentially I want a way to do what the above command is doing but in a more java-friendly way.
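If the goal is to do in code what that kubectl command does, one option is a Kubernetes client library rather than shelling out. A rough sketch with the fabric8 kubernetes-client, assuming the Endpoints object is named cache in the default namespace (both taken from the linked example) and that the pod's service account is allowed to read Endpoints:

import io.fabric8.kubernetes.api.model.EndpointAddress;
import io.fabric8.kubernetes.api.model.EndpointSubset;
import io.fabric8.kubernetes.api.model.Endpoints;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class EndpointLister {

    public static void main(String[] args) {
        // Picks up the in-cluster service account (or local kubeconfig) automatically.
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            // Equivalent of: kubectl get endpoints cache -o yaml
            Endpoints endpoints = client.endpoints()
                    .inNamespace("default")   // assumption: the service lives in "default"
                    .withName("cache")        // assumption: the Endpoints object is named "cache"
                    .get();

            for (EndpointSubset subset : endpoints.getSubsets()) {
                for (EndpointAddress address : subset.getAddresses()) {
                    // One address per pod backing the service; each can be called
                    // directly, e.g. http://<ip>:<port>/propagateme
                    System.out.println(address.getIp());
                }
            }
        }
    }
}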
It seems like your Spring Boot service should be listening to a message queue. When one instance receives the HTTP request on the /propagateme endpoint, it publishes an event to a Propagation topic that all the other instances are subscribed to; when the instances receive a message from the topic, they perform the specific action.
See JMS https://spring.io/guides/gs/messaging-jms/
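A rough sketch of that idea with Spring JMS; the destination name, endpoint path, and action performed are placeholders:

import org.springframework.jms.annotation.JmsListener;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PropagationController {

    private final JmsTemplate jmsTemplate;

    public PropagationController(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
    }

    // Whichever pod receives the HTTP call publishes an event for everyone.
    @PostMapping("/propagateme")
    public void propagate() {
        jmsTemplate.convertAndSend("propagation", "refresh"); // destination name is a placeholder
    }

    // Every instance subscribes to the destination and performs the action locally.
    @JmsListener(destination = "propagation")
    public void onPropagation(String event) {
        // perform the instance-local action here
    }
}

Note that for every instance to receive the message the destination has to be a topic (pub/sub), e.g. by setting spring.jms.pub-sub-domain=true; with a queue only one consumer would get it.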
I am working on moving a Spring sink microservice from my local machine to the OpenShift platform. Inside of my microservice I create a KafkaTemplate like this:
@Autowired
KafkaTemplate<String, String> kafkaTemplate;
Using this method to send Kafka messages works perfectly fine on my local machine; when I move to OpenShift, however, I get this error:
Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
I find this a little confusing, because when my microservice comes up I can see the address of the Kafka servers in the logs, and because other Kafka functions seem to work fine (I am able to read from the Kafka topic that my processor microservice is writing into). The way I have been able to work around this issue is to manually create a ProducerFactory and set ProducerConfig.BOOTSTRAP_SERVERS_CONFIG equal to the bootstrap server address I found in the logs.
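For reference, that workaround looks roughly like the following; the broker address shown is a placeholder and would ideally come from configuration rather than being hard-coded:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        // Placeholder: the broker address seen in the startup logs.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "my-kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(props);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}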
So I guess I have a couple of questions. One: is KafkaTemplate only wired to run locally, or is there a way to wire it to use the bootstrap server assigned after the service goes up? If not, is there a way to get the bootstrap server that I am seeing in the logs? Obviously my service has access to it at some point, so is there some way I could dynamically set BOOTSTRAP_SERVERS_CONFIG to the value that is being printed in the logs?
Or is there a better way to write out to a Kafka topic from a Spring sink microservice?
Any help would be much appreciated!
I am working on a Spring Boot application that uses Spring Integration flows that have Kafka topics as their source. Our integration flow starts using an interface containing SubscribableChannels with springframework.cloud.stream.annotation.Input and Output annotations. These are configured to read from Kafka via Cloud Config with spring.cloud.stream.kafka.bindings.
When the app first starts up it immediately begins reading from the Kafka topics. This is a problem as the app needs to initialize some local, non-persistable databases before it can start correctly processing incoming Kafka messages.
We are currently using a @PostConstruct to populate these in-memory databases before Kafka starts, but this is suboptimal as the app can't use Eureka, Feign, etc., to reliably find a healthy service that has the latest data for the in-memory database.
For a variety of reasons the architecture can't be changed such that the in-memory database is shared or prepopulated. Just know that when I call it an in-memory database I'm simplifying things a bit, it's actually another service, of sorts.
What is the best way to start a Spring Boot app such that an Integration Flow that reads from Kafka starts in a paused state and can be unpaused after some other process completes?
I assume you use KafkaMessageDrivenChannelAdapter and, according to your mention of the Spring Integration Java DSL, Kafka.messageDrivenChannelAdapter() to be exact. That one can be configured with an id and autoStartup(false), so it isn't going to start consuming the Kafka topic immediately. Whenever you are ready to consume, you can start() this component, obtaining it as a Lifecycle from the application context using the mentioned id.
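A sketch of what that might look like; the topic name, adapter id, handler, and consumer factory wiring are assumptions:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.kafka.dsl.Kafka;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class PausedKafkaFlowConfig {

    // The adapter gets an explicit id and autoStartup(false), so nothing is
    // consumed from the topic when the application context starts.
    @Bean
    public IntegrationFlow kafkaInboundFlow(ConsumerFactory<String, String> consumerFactory) {
        return IntegrationFlows
                .from(Kafka.messageDrivenChannelAdapter(consumerFactory, "myTopic") // topic is a placeholder
                        .id("kafkaInboundAdapter")
                        .autoStartup(false))
                .handle(message -> System.out.println(message.getPayload())) // placeholder handler
                .get();
    }
}

Once the in-memory database is ready, applicationContext.getBean("kafkaInboundAdapter", Lifecycle.class).start() begins consumption.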
Or you can send an appropriate message to the Control Bus.
UPDATE
If you deal with Spring Cloud Stream and the Kafka Binder, you should consider injecting a BindingsEndpoint bean and calling its changeState(@Selector String name, State state) with the name of your binding and State.STOPPED. When your in-memory DB is ready, you call it again with State.STARTED: https://docs.spring.io/spring-cloud-stream/docs/Elmhurst.RELEASE/reference/htmlsingle/#_binding_visualization_and_control
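A rough sketch of that approach, assuming the actuator bindings endpoint is enabled and that the binding is called input (use whatever your binding name actually is):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.stream.endpoint.BindingsEndpoint;
import org.springframework.stereotype.Component;

@Component
public class KafkaBindingControl {

    @Autowired
    private BindingsEndpoint bindingsEndpoint;

    // Call early, before the in-memory database is populated.
    public void pauseConsumption() {
        bindingsEndpoint.changeState("input", BindingsEndpoint.State.STOPPED); // "input" is a placeholder binding name
    }

    // Call once the in-memory database is ready.
    public void resumeConsumption() {
        bindingsEndpoint.changeState("input", BindingsEndpoint.State.STARTED);
    }
}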
I made a web application in Java Spring MVC. In this application, traffic will be around 10,000 file uploads per minute, so in the controller we wrote code to push the file's byte data to a Kafka producer.
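For reference, the controller-to-Kafka handoff described above might look roughly like this; the topic name and endpoint path are placeholders, and the producer's value serializer would need to be a byte-array serializer:

import java.io.IOException;

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class FileUploadController {

    private final KafkaTemplate<String, byte[]> kafkaTemplate;

    public FileUploadController(KafkaTemplate<String, byte[]> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // The controller only hands the bytes off to Kafka; processing happens elsewhere.
    @PostMapping("/upload")
    public String upload(@RequestParam("file") MultipartFile file) throws IOException {
        kafkaTemplate.send("file-uploads", file.getOriginalFilename(), file.getBytes()); // topic is a placeholder
        return "queued";
    }
}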
Where should I implement the consumer that receives the file and does the processing, in Java? I am thinking of making a service which will run separately, like a Windows service. Is this a good way to do it?
We have a requirement, where we have to run many async background processes which accesses DBs, Kafka queues, etc. As of now, we are using Spring Batch with Tomcat (exploded WAR) for the same. However, we are facing certain issues which I'm unable to solve using Spring Batch. I was thinking of other frameworks to use, but couldn't find any that solves all my problems.
It would be great to know if there exists a framework which solves the following problems:
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
WHAT I WANT: Bundle all the jars and run each job as a separate process. The framework should store the PID and should be able to manage (stop/force-kill) the job on demand. This way, when we want to update a JAR, the existing process won't be hindered (however, we should be able to stop the existing process from UI), and no other job (running or not) will also be touched.
I have looked at hot-update of JARs in Tomcat, but I'm skeptical whether to use such a mechanism in production.
Sub-question: Will OSGi integrate with Spring Batch? If so, is it possible to run each job as a separate container with all JARs embedded in it?
Spring Batch doesn't have a master-slave architecture.
WHAT I WANT: There should be a master, where the list of jobs are specified. There should be slave machines (workers), which are specified to master in a configuration file. There should exist a scheduler in the master, which when needed to start a job, should assign a slave a job (possibly load-balanced, but not necessary) and the slave should update the DB. The master should be able to send and receive data from the slaves (start/stop/kill any job, give me update of running jobs, etc.) so that it can be displayed on a UI.
This way, in case I have a high load, I should be able to just add machines into the cluster and modify the master configuration file and the load should get balanced right away.
Spring Batch doesn't have an in-built alerting mechanism in case of job stall/failure.
WHAT I WANT: I should be able to set up alerts for jobs in case of failure. If necessary, a job should have a timeout so that it can notify the user (probably via email) or be force-stopped when it crosses a specified threshold.
Maybe Vert.x can do the trick.
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
Vert.x allows you to build microservices. Each Vert.x instance is able to communicate with other instances. If you stop one, the others can still work (as long as they are not dependent on it; e.g. if you stop the master, the slaves will fail).
Vert.x is not an application server.
There's no monolithic Vert.x instance into which you deploy applications.
You just run your apps wherever you want to.
Spring Batch doesn't have a master-slave architecture
Since Vert.x is event driven, you can easily create a master-slave architecture. For example, handle the HTTP requests in one Vert.x instance and dispatch them between several other instances depending on the nature of the request.
Spring Batch doesn't have an in-built alerting mechanism in case of job stall/failure.
In Vert.x, you can set a timeout for each message and handle failures.
Sending with timeouts
When sending a message with a reply handler you can specify a timeout in the DeliveryOptions.
If a reply is not received within that time, the reply handler will be called with a failure.
The default timeout is 30 seconds.
Send Failures
Message sends can fail for other reasons, including:
There are no handlers available to send the message to
The recipient has explicitly failed the message using fail
In all cases the reply handler will be called with the specific failure.
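A small sketch of that pattern on the Vert.x event bus; the address name, payload, and timeout value are arbitrary:

import io.vertx.core.Vertx;
import io.vertx.core.eventbus.DeliveryOptions;

public class JobDispatchExample {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // "Slave": consumes job requests from the event bus and replies when done.
        vertx.eventBus().consumer("jobs.run", message -> {
            // ... run the batch job here ...
            message.reply("done");
        });

        // "Master": dispatches a job and gets a failure if no reply arrives in time.
        DeliveryOptions options = new DeliveryOptions().setSendTimeout(5000); // 5s timeout, arbitrary
        vertx.eventBus().request("jobs.run", "job-42", options, reply -> {
            if (reply.succeeded()) {
                System.out.println("Job finished: " + reply.result().body());
            } else {
                // No handler available, the handler called fail(), or the timeout expired.
                System.err.println("Job failed or timed out: " + reply.cause().getMessage());
            }
        });
    }
}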
EDIT: There are other frameworks for doing microservices in Java. Dropwizard is one of them, but I can't say much more about it.