Classpath issue preventing a pipeline from running on Google Dataflow

I'm new to a project where we have a spring-boot application running on GKE receiving (via Kafka) and publishing events via Pub/Sub. Consumers of these events might want to have these events replayed and we want them to request this via the REST API of our application. Since the application stores the events in GCS before publishing, we thought Apache Beam pipelines run with DataFlow should do the trick.
One "replay request" might result in multiple pipelines, since the events in GCS are stored in folder structures containing the date (e.g. gs://<entity>/2020/12/13/event.json) and depending on how much history the consumer needs, we create a pipeline per day of events.
I'm fairly confident that the logic of defining and submitting pipelines is correct, since the application is able to perform this on a local Kubernetes cluster with the DirectRunner.
On DataFlow I run into the issue summarized here. Spawning a worker (org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness) fails due to a classpath issue:
Caused by: java.lang.NoClassDefFoundError: org/apache/beam/sdk/options/PipelineOptions
I can see that my jar, which should contain the correct dependencies, is on the classpath when Dataflow spawns the worker (most parameters omitted):
java
-cp
/opt/google/dataflow/batch/libshuffle_v1.jar:/opt/google/dataflow/batch/dataflow-worker.jar:/opt/google/dataflow/slf4j/jcl_over_slf4j.jar:/opt/google/dataflow/slf4j/log4j_over_slf4j.jar:/opt/google/dataflow/slf4j/log4j_to_slf4j.jar:/var/opt/google/dataflow/app-6BkavP-0nx4wHMC__85sdbCjJQa7QcQcOxGSQL5huMU.jar
...
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness
Because I suspected a clash with the google-dataflow.jar, I experimented with different scopes for the Beam dependencies, but I haven't seen any change. I'm not sure where to look next. I'm using Beam library version 2.27.0, and these are the Beam dependencies referenced in my pom.xml:
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-direct-java</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
<version>${beam.version}</version>
</dependency>
Any advice is much appreciated.

The class org/apache/beam/sdk/options/PipelineOptions lives in the core Java SDK, in the artifact beam-sdks-java-core. It is not baked into the Dataflow worker; it is expected to be among the staged files.
The DataflowRunner by default will attempt to stage every file that it finds on the classpath. If there is anything about your environment or application that affects its ability to do this, you will need to add the SDK dependency yourself.
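A quick way to narrow this down is to check, from inside the submitting application, what the JVM classpath actually contains and whether the SDK class resolves at all. The following is a minimal sketch using only the JDK; the class name ClasspathCheck is made up for illustration. One possible cause (an assumption, not confirmed in this thread) is a spring-boot executable jar, where dependencies are nested inside a single jar and java.class.path lists only that one file.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Beam or Dataflow. */
final class ClasspathCheck {

    /** Returns the entries of the JVM's java.class.path system property. */
    static List<String> classpathEntries() {
        String classpath = System.getProperty("java.class.path", "");
        return Arrays.asList(classpath.split(File.pathSeparator));
    }

    /** Returns true if the named class can be located without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A single entry here suggests a fat jar whose nested jars
        // cannot be enumerated as separate files.
        classpathEntries().forEach(System.out::println);
        System.out.println("PipelineOptions loadable: "
                + isLoadable("org.apache.beam.sdk.options.PipelineOptions"));
    }
}
```

If the class loads locally but the printed classpath does not list the individual Beam jars, the files that get staged are the place to look.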


Spring Dependency Mess - conflict with Spring Boot 2.5.4 and Spring Cloud 3.0.3

I am trying to update an application which already pulls in the kitchen sink (or perhaps a few, they're joined at the hip) and I am sorting through version conflicts.
I want to update to Spring Boot 2.5+ and also use Spring Cloud Consul - I am attempting to pull in:
spring-cloud-starter-consul-discovery:3.0.3
spring-boot:2.5.4
For bonus points, within spring-cloud-starter-consul-discovery, I am seeing that it pulls in reactor-core:3.4.6 and at the same time reactor-extra:3.4.3 (which pulls in reactor-core:3.4.5). The list goes on and on ...
https://search.maven.org/artifact/org.springframework.cloud/spring-cloud-starter-consul-discovery/3.0.3/jar - original point of contention is that it pulls in spring boot 2.4.6 ... it was advertised as supporting 2.5+, then shouldn't the version reference 2.5+?
https://search.maven.org/artifact/org.springframework.cloud/spring-cloud-loadbalancer/3.0.3/jar - this to me is just plain laziness, right below reactor-core is reactor-extra, why wouldn't the Spring developers make extra pull in the same version of core? See: https://search.maven.org/artifact/io.projectreactor.addons/reactor-extra/3.4.3/jar
While this is a trivial problem to solve, it shouldn't be my problem. Am I missing something, or is this just the way it is and I shouldn't expect more?
First of all, you need to look at the compatibility matrix between the Spring Cloud and Spring Boot dependencies. Then you need (for example) to define your own BOM, in which you import:
the correct cloud dependencies bom
spring boot dependencies bom
Internally, these BOMs in turn import other BOMs, for example the Consul one you are interested in, which is at version 2.2.8.RELEASE. Look at the properties tag in that file and you will see this:
<spring-cloud-consul.version>2.2.8.RELEASE</spring-cloud-consul.version>
specifically:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-consul-dependencies</artifactId>
<version>${spring-cloud-consul.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
You can then look at the specific Consul BOM and see that the version of consul-discovery is:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-consul-discovery</artifactId>
<version>${project.version}</version>
</dependency>
The same pattern can be used to find out which version comes from where for the reactor dependencies.
From the 10 minutes I invested in this, I don't see a cloud-dependencies BOM that would include spring-cloud-starter-consul-discovery:3.0.3.
You could still try to force a certain version of a dependency. We just recently had such a problem in spring-cloud-kubernetes, internally.
This may or may not work, though.
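Whether a forced version actually won can be verified at runtime by asking a loaded class where it came from. The sketch below uses only the JDK; ResolvedArtifact is a made-up name, and the version is read from the jar manifest's Implementation-Version entry, which is optional, so "unknown" is a normal result for some libraries.

```java
import java.security.CodeSource;

/** Hypothetical helper that reports which jar a class was loaded from. */
final class ResolvedArtifact {

    /** Describes the manifest version and code location of the given class. */
    static String describe(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();

        // JDK classes have no code source; application classes point at a jar or directory.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "(JDK runtime)"
                : source.getLocation().toString();

        return type.getName()
                + " version=" + (version == null ? "unknown" : version)
                + " from=" + location;
    }
}
```

In the application you would pass a class from the library in question, for example reactor.core.publisher.Flux.class for reactor-core, and compare the reported jar name with the version you expected.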

Camel AMQP - AMQConnectionFactory ClassNotFound

I'm using Camel 2.13.3 and trying to establish a connection via AMQP to a remote ActiveMQ instance.
According to the Camel AMQP docs it should be sufficient to add the following dependency:
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-amqp</artifactId>
<version>2.13.1</version>
</dependency>
It then indicates that you should configure the jms component to use a connection factory supplied by the QPID project. The docs page uses org.apache.qpid.amqp_1_0.jms.impl.ConnectionFactoryImpl, and the results of other google searches indicate that org.apache.qpid.client.AMQConnectionFactory could be used.
However, the org.apache.qpid dependencies do not appear to have been added to the project and, unsurprisingly, I get a ClassNotFoundException when I run it.
I considered downloading the qpid dependency separately, but their web site seems to indicate that the qpid client project has been deprecated and replaced by something else ( QPID Messaging API if I remember correctly )
Can anyone point me in the right direction?
"should be sufficient"
The Camel docs you linked to do not state that. They only say that this dependency is needed and say nothing about additional dependencies. I just looked inside the jar you're using, and it does not contain the qpid-client classes, so you should add that dependency to your pom as well. For AMQP 0.x, there is a good chance you'll need the JMS spec dependency too:
<dependency>
<groupId>org.apache.qpid</groupId>
<artifactId>qpid-client</artifactId>
<version>0.32</version> <!-- replace with appropriate version -->
</dependency>
<dependency>
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-jms_1.1_spec</artifactId>
<version>1.0</version>
</dependency>
If you're using AMQP 1.0, add this instead:
<dependency>
<groupId>org.apache.qpid</groupId>
<artifactId>qpid-jms-client</artifactId>
<version>0.3.0</version>
</dependency>

How to exchange signals between applications?

I have two unconnected applications. One is the main app that performs the business logic and CRUD on database.
A second app periodically rebuilds a database cache (a long-running task). I want to send a signal to the main app when the rebuild starts and when it's finished, as the main app should take specific actions while the rebuild takes place.
How could I best achieve this using spring-boot?
With spring-boot you can use JMS simply by adding the ActiveMQ dependencies:
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-jms</artifactId>
</dependency>
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-broker</artifactId>
</dependency>
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-pool</artifactId>
</dependency>
In the yml config, you would start the ActiveMQ JMS broker in one application either by not specifying broker-url at all, because the spring.activemq.in-memory property defaults to true (http://docs.spring.io/spring-boot/docs/current/reference/html/common-application-properties.html),
or by configuring it like this:
activemq:
  broker-url: failover:(vm://localhost:61616?connectionTimeout=3000)
and connect to it from the other application like this:
activemq:
  broker-url: failover:(tcp://machineoftheotherapplication:61616?connectionTimeout=3000)
You might need to consider whether your messages need persistent, reliable delivery, meaning that if you send a message while the other application is not running, it would get the message after it starts up again.
If your application works with http requests you could just add a special controller which would process the requests from the second app.
Another option would be a JMX request.
However, security should be considered.
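The HTTP option can be sketched without any framework. The example below uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for the "special controller"; in a spring-boot app this would be a regular controller instead. The class name RebuildSignalEndpoint and the paths /rebuild/started and /rebuild/finished are assumptions made for illustration. It binds to the loopback address only, which is one simple way to limit exposure.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical signal endpoint hosted by the main app. */
final class RebuildSignalEndpoint {
    private final AtomicBoolean rebuilding = new AtomicBoolean(false);
    private final HttpServer server;

    /** Starts the endpoint on the given port; port 0 picks a free one. */
    RebuildSignalEndpoint(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/rebuild/started", exchange -> {
            rebuilding.set(true);
            exchange.sendResponseHeaders(204, -1); // 204 No Content, no body
            exchange.close();
        });
        server.createContext("/rebuild/finished", exchange -> {
            rebuilding.set(false);
            exchange.sendResponseHeaders(204, -1);
            exchange.close();
        });
        server.start();
    }

    /** The main app consults this flag before taking its special actions. */
    boolean isRebuilding() {
        return rebuilding.get();
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    /** What the cache-rebuild app would call; returns the HTTP status code. */
    static int send(int port, String event) throws IOException {
        URL url = new URL("http://127.0.0.1:" + port + "/rebuild/" + event);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        int status = connection.getResponseCode();
        connection.disconnect();
        return status;
    }
}
```

Unlike the JMS approach, a signal sent while the main app is down is simply lost, so the rebuild app would need its own retry logic if that matters.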

Dropwizard as a non blocking task execution component

I'm in the process of evaluating dropwizard for a mission critical component of our production system.
What I need to implement is a command line tool with RESTful support (there is little need to provide a REST API; I mostly need to communicate with external API systems for B2B), logging, dependency injection and some sort of non-blocking I/O operations for maximum performance.
My question is whether someone has experience with this particular framework being ready for a production system, and whether there are alternative solutions for lightweight non-blocking operations (something like Celery on Python).
Finally, does Dropwizard support Java 1.8?
Many thanks for the help in advance
dropwizard-sundial lets you schedule jobs in dropwizard. The GitHub README has more samples, but here's a quick sneak peek:
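import org.knowm.sundial.annotations.CronTrigger;
import org.knowm.sundial.exceptions.JobInterruptException;

// Runs every five seconds, as defined by the cron expression.
@CronTrigger(cron = "0/5 * * * * ?")
public class SampleJob extends org.knowm.sundial.Job {
    @Override
    public void doRun() throws JobInterruptException {
        // Do something interesting...
    }
}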
When your dropwizard application initializes, sundial hooks in and starts its scheduler. You can then configure tasks via a package in yaml, via tasks, or via xml. The admin task it registers can also be used to manage jobs (create/trigger/etc).
One thing to notice is that the dropwizard-sundial package itself pulls in its own dropwizard-core and dropwizard-util. Very likely these will conflict with the versions you already use, which would cause a NoClassDefFoundError or NoSuchMethodError. My solution is to exclude them from dropwizard-sundial and use your own:
<dependency>
<groupId>org.knowm</groupId>
<artifactId>dropwizard-sundial</artifactId>
<version>1.0.0.0</version>
<exclusions>
<exclusion>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-core</artifactId>
</exclusion>
<exclusion>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-util</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
</exclusion>
</exclusions>
</dependency>
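To confirm that an exclusion removed the duplicate, you can ask the class loader for every location that provides a given class. This is a minimal JDK-only sketch; DuplicateClassFinder is a made-up name. More than one location for the same class usually means two jars ship it, and the first one on the classpath wins.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that lists every classpath location providing a class. */
final class DuplicateClassFinder {

    /** Returns all URLs from which the class file could be loaded. */
    static List<URL> locationsOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resource));
    }
}
```

In a dropwizard project you would pass a class that both versions ship, for example io.dropwizard.Application, and expect exactly one location once the exclusions are in place.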

Java - Salesforce PartnerConnection Query request hangs

I have written a simple program to test Java to Salesforce integration. I followed the steps mentioned in the links below:
Salesforce Api Partner Examples
Sample Query Calls
But when I execute these, the program hangs at the step
QueryResult qr = partnerConnection.query(soqlQuery);
I'm not sure what is happening here - any advice would be welcome.
If you are using an outdated version of the SDK and running against a new endpoint, your program will hang.
To fix this, use the latest version of the SDK as well as point to the latest endpoint.
For example, I used:
/services/Soap/u/34.0
as the endpoint and the following dependency versions in Maven:
<dependency>
<groupId>com.force.api</groupId>
<artifactId>force-wsc</artifactId>
<version>34.0</version>
</dependency>
<dependency>
<groupId>com.force.api</groupId>
<artifactId>force-partner-api</artifactId>
<version>34.0</version>
</dependency>
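A mismatch like this can be caught before the first query by comparing the version embedded in the endpoint path with the SDK version declared in the pom. The following is a small sketch under that assumption; SoapEndpointVersion and its methods are made-up names, not part of the Salesforce SDK, and the SDK version is passed in as a plain string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check; not part of force-wsc or force-partner-api. */
final class SoapEndpointVersion {
    // Matches the API version in paths such as /services/Soap/u/34.0
    private static final Pattern VERSION =
            Pattern.compile("/services/Soap/[a-z]/(\\d+\\.\\d+)");

    /** Extracts the API version from a SOAP endpoint URL or path. */
    static String apiVersionOf(String endpoint) {
        Matcher matcher = VERSION.matcher(endpoint);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No API version in endpoint: " + endpoint);
        }
        return matcher.group(1);
    }

    /** Returns true when the endpoint version equals the SDK version. */
    static boolean matchesSdk(String endpoint, String sdkVersion) {
        return apiVersionOf(endpoint).equals(sdkVersion);
    }
}
```

For the setup above, SoapEndpointVersion.matchesSdk("/services/Soap/u/34.0", "34.0") holds, while an SDK at a different version would fail the check.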
