Kafka Connect cannot cast custom storage sink partitioner to Partitioner interface - java

I need to create a custom partitioner for the Kafka Connect S3 sink plugin.
I've extended the HourlyPartitioner in a custom class using Kotlin:
class RawDumpHourlyPartitioner<T> : HourlyPartitioner<T>() {
...
}
and changed my connector config accordingly to use the custom class:
"partitioner.class": "co.myapp.RawDumpHourlyPartitioner",
I've then built our jar (we use the Shadow plugin) and included it in a custom Docker image based on the Kafka Connect image (the image version matches the dependency versions we use in the project):
FROM gradle:6.0-jdk8 as builder
WORKDIR /app
ADD . .
RUN gradle clean shadowJar
FROM confluentinc/cp-kafka-connect:5.3.2
COPY --from=builder /app/build/libs/kafka-processor-0.1-all.jar /usr/share/java/kafka/kafka-processor.jar
When the connector starts I get this error:
ERROR WorkerSinkTask{id=staging-raw-dump-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
java.lang.ClassCastException: co.myapp.RawDumpHourlyPartitioner cannot be cast to io.confluent.connect.storage.partitioner.Partitioner
To double-check, I created a Java file that tries to instantiate the class, and it didn't throw any error:
import io.confluent.connect.storage.partitioner.Partitioner;

public class InstantiateTest {

    public static void main(String[] args) throws ClassNotFoundException, IllegalAccessException, InstantiationException {
        Class<? extends Partitioner<?>> partitionerClass =
                (Class<? extends Partitioner<?>>) Class.forName("co.myapp.RawDumpHourlyPartitioner");
        Partitioner<?> partitioner = partitionerClass.newInstance();
    }
}

Looking at the Kafka Connect guide, it says:
A Kafka Connect plugin is simply a set of JAR files where Kafka Connect can find an implementation of one or more connectors, transforms, and/or converters. Kafka Connect isolates each plugin from one another so that libraries in one plugin are not affected by the libraries in any other plugins. This is very important when mixing and matching connectors from multiple providers.
This means that since I'm using the S3 sink connector, I have to put my jar with the custom partitioner in the directory of the S3 plugin.
Moving the jar file to /usr/share/java/kafka-connect-s3 solved the issue.
In the comments I've mentioned that my jar also includes a custom subject name strategy that we use in the main Kafka Connect config (the environment variables); in that case the jar needs to be in the /usr/share/java/kafka folder.
Update: as cricket_007 mentioned, it's better to put the custom partitioner jar into the /usr/share/java/kafka-connect-storage-common folder, which is where all the other partitioners are.
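When debugging this kind of ClassCastException it can also help to confirm which classloader provides the Partitioner interface and which one loads the custom class. A small diagnostic sketch, assuming both classes are visible on the classpath where it runs:
import io.confluent.connect.storage.partitioner.Partitioner;

public class ClassLoaderCheck {

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> custom = Class.forName("co.myapp.RawDumpHourlyPartitioner");
        // If these two print different classloaders, there are two copies of the
        // Partitioner class in play, and a class implementing one copy cannot be
        // cast to the other -- which is exactly the ClassCastException above.
        System.out.println("Partitioner interface loaded by: " + Partitioner.class.getClassLoader());
        System.out.println("Custom partitioner loaded by:    " + custom.getClassLoader());
    }
}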

Which folder the partitioner class needs to go into depends on which sink connector you use. In our case we were using Confluent Kafka 5.5 with the Azure Data Lake Storage Gen2 sink connector.
For that we needed to write a custom partitioner similar to the following repo on GitHub.
We then placed the custom JAR in the following path:
/usr/share/confluent-hub-components/confluentinc-kafka-connect-azure-data-lake-gen2-storage/lib/
After which our connector class worked successfully!

Related

Custom LiquibaseDataTypes not found in docker image classpath

I am trying to build a custom Liquibase Docker image (based on the official liquibase/liquibase:4.3.5 image) for running database migrations in Kubernetes.
I am using some custom types for the database which are implemented using the @DataTypeInfo annotation and extending existing LiquibaseDataTypes like liquibase.datatype.core.VarcharType (class discovery is implemented using the META-INF/services/liquibase.datatype.LiquibaseDataType mechanism introduced in Liquibase 4+).
These extensions are implemented inside their own Maven module called "schema-impl", which generates a schema-impl.jar. Everything was working fine when running the migrations as part of the app startup process, but now we want this to be done by the dedicated Docker image.
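For reference, such a custom type is roughly of the following shape. This is only a hedged sketch: the class name is hypothetical, the "my-string" name is taken from the error further down, and the @DataTypeInfo attributes are assumed to be the standard ones:
import liquibase.database.Database;
import liquibase.datatype.DataTypeInfo;
import liquibase.datatype.DatabaseDataType;
import liquibase.datatype.LiquibaseDataType;
import liquibase.datatype.core.VarcharType;

// Illustrative custom type; the class is registered for discovery via
// META-INF/services/liquibase.datatype.LiquibaseDataType in the schema-impl jar.
@DataTypeInfo(
        name = "my-string",
        aliases = {},
        minParameters = 0,
        maxParameters = 1,
        priority = LiquibaseDataType.PRIORITY_DEFAULT + 1)
public class MyStringType extends VarcharType {

    @Override
    public DatabaseDataType toDatabaseDataType(Database database) {
        // Delegate to the normal VARCHAR mapping; the real types customize this.
        return super.toDatabaseDataType(database);
    }
}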
The only information in the Liquibase documentation regarding this topic is the "Drivers and extensions" section from this document. According to this, I added the schema-impl.jar into the /liquibase/classpath directory during the image building process and also modified the liquibase.docker.properties in order to add this jar file explicitly inside the classpath property:
classpath: /liquibase/changelog:/liquibase/classpath:/liquibase/classpath/schema-impl.jar
liquibase.headless: true
However, when I try to run my changesets with the docker image, I am always getting an error because it cannot find the custom type definition:
liquibase.exception.DatabaseException: ERROR: type "my-string" does not exist
Any help would be really appreciated. Thanks in advance.
Ok, I found it. Basically the problem was that I needed to include the classpath in the entrypoint command, not in the liquibase.docker.properties file (which seems to be useless for this use case), like this:
--classpath=/liquibase/changelog:/liquibase/classpath/schema-impl.jar

Cannot load custom File System on Flink's shadow jar

I needed some metadata on my S3 objects, so I had to override the S3 file system provided by Flink.
I followed this guide to the letter and now I have a custom file system which works on my local machine, when I run my application in the IDE.
Now I am trying to use it on a local Kafka cluster or on my Docker deployment, and I keep getting this error: Could not find a file system implementation for scheme 's3c'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
I package my application using shadowJar, using the following configuration:
shadowJar {
    configurations = [project.configurations.flinkShadowJar]
    mainClassName = "dev.vox.collect.delivery.Application"
    mergeServiceFiles()
}
I have my service file at src/main/resources/META-INF/services/org.apache.flink.core.fs.FileSystemFactory, which contains a single line with the fully qualified name of my factory: dev.vox.collect.delivery.filesystem.S3CFileSystemFactory
If I unzip my shadow jar I can see that its org.apache.flink.core.fs.FileSystemFactory file lists both my factory and the others declared by Flink, which should be correct:
dev.vox.collect.delivery.filesystem.S3CFileSystemFactory
org.apache.flink.fs.s3hadoop.S3FileSystemFactory
org.apache.flink.fs.s3hadoop.S3AFileSystemFactory
When I use the S3 file system provided by Flink everything works; it is just mine that does not.
I am assuming the service loader is not loading my factory, either because it does not find it or because it is not declared correctly.
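A quick way to check that assumption is to list the factories the ServiceLoader actually sees when run against the packaged jar. A minimal sketch, assuming FileSystemFactory exposes getScheme():
import java.util.ServiceLoader;

import org.apache.flink.core.fs.FileSystemFactory;

public class ListFileSystemFactories {

    public static void main(String[] args) {
        // Iterate over every FileSystemFactory registered via META-INF/services
        // on the current classpath and print its scheme and implementing class.
        for (FileSystemFactory factory : ServiceLoader.load(FileSystemFactory.class)) {
            System.out.println(factory.getScheme() + " -> " + factory.getClass().getName());
        }
    }
}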
How can I make it work? Am I missing something?

Janus Graph - gremlin-server - Java Client - Cannot find class file for apache.commons.configuration

I am trying to access the JanusGraph on Cassandra through the Java client, but I am not able to use the properties file through the Client class.
public static void main(String[] args) {
    Cluster cluster = Cluster.open("remote.yaml");
    // Cluster cluster2 = Cluster.build();
    Client client = cluster.connect();
    graph = JanusGraphFactory.open("conf/janusgraph-cassandra-solr.properties");
}
error:
Description Resource Path Location Type The project was not built
since its build path is incomplete. Cannot find the class file for
org.apache.commons.configuration.Configuration. Fix the build path
then try building this project janusgraph Unknown Java Problem
This question was also asked on the janusgraph-users Google Group. My answer from that thread:
org.apache.commons.configuration.Configuration is not found in Apache commons-lang. It is in Apache commons-configuration.
There are a lot of dependencies in the JanusGraph/TinkerPop stack, so your best approach would be to use a tool like Apache Maven or Gradle to manage the dependencies for your project, rather than adding the jars one by one. Please refer to the code examples included with the distribution.

Using Spring boot/cloud with Amazon AWS lambda does not inject values

I have an AWS lambda RequestHandler class which is invoked directly by AWS. Eventually I need to get it working with Spring Boot because I need it to be able to retrieve data from Spring Cloud configuration server.
The problem is that the code works if I run it locally from my own dev environment but fails to inject config values when deployed on AWS.
@Configuration
@EnableAutoConfiguration
@ComponentScan("my.package")
public class MyClass implements com.amazonaws.services.lambda.runtime.RequestHandler<I, O> {
    public O handleRequest(I input, Context context) {
        ApplicationContext applicationContext = new SpringApplicationBuilder()
                .main(getClass())
                .showBanner(false)
                .web(false)
                .sources(getClass())
                .addCommandLineProperties(false)
                .build()
                .run();
        log.info(applicationContext.getBean(SomeConfigClass.class).foo);
        // prints cloud-injected value when running from local dev env
        //
        // prints "${path.to.value}" literal when running from AWS
        // even though Spring Boot starts successfully without errors
        return null; // rest of the handler omitted
    }
}
@Configuration
public class SomeConfigClass {

    @Value("${path.to.value}")
    public String foo;
}
src/main/resources/bootstrap.yml:
spring:
  application:
    name: my_service
  cloud:
    config:
      uri: http://my.server
      failFast: true
      profile: localdev
What have I tried:
using regular Spring MVC, but this doesn't have integration with @Value injection/Spring Cloud.
using @PropertySource - but found out it doesn't support .yml files
verified to ensure the config server is serving requests to any IP address (there's no IP address filtering)
running curl to ensure the value is brought back
verified to ensure that .jar actually contains bootstrap.yml at jar root
verified to ensure that .jar actually contains Spring Boot classes. FWIW I'm using Maven shade plugin which packages the project into a fat .jar with all dependencies.
Note: AWS Lambda does not support environment variables, and therefore I cannot set anything like spring.application.name (neither as an environment variable nor as a -D parameter). Nor can I control the underlying classes which actually launch MyClass - this is completely transparent to the end user. I just package the jar and provide the entry point (class name); the rest is taken care of.
Is there anything I could have missed? Any way I could debug this better?
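One low-tech way to debug it is to build the context the same way the handler does and ask the Environment directly what it resolved. This is only a sketch under the assumption that the same SpringApplicationBuilder setup is reused; the PropertyDebug class name is hypothetical:
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.context.ConfigurableApplicationContext;

public class PropertyDebug {

    public static void main(String[] args) {
        // Build the context roughly as handleRequest does, then query the
        // resolved property instead of relying on @Value injection.
        ConfigurableApplicationContext context = new SpringApplicationBuilder()
                .sources(PropertyDebug.class)
                .web(false)
                .addCommandLineProperties(false)
                .run();
        // If this prints the literal "${path.to.value}", the config-server property
        // source was never added; if it prints the real value, injection should work too.
        System.out.println(context.getEnvironment().getProperty("path.to.value"));
        context.close();
    }
}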
After a bit of debugging I have determined that the issue is with using the Maven Shade plugin. Spring Boot looks for a META-INF/spring.factories file in its autoconfigure jar; see here for some information on this. In order to package a Spring Boot jar correctly you need to use the Spring Boot Maven Plugin and set it up to run its repackage goal. The reason it works in your local IDE is that you are not running the Shade-packaged jar. They do some special magic in their plugin to get things in the right spot that the Shade plugin is unaware of.
I was able to create some sample code that initially was not injecting values but works now that I used the correct plugin. See this GitHub repo to check out what I have done.
I did not connect it with Spring Cloud but now that the rest of the Spring Boot injection is working I think it should be straightforward.
As I mentioned in the comments you may want to consider just a simple REST call to get the cloud configuration and inject it yourself to save on the overhead of loading a Spring application with every request.
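For that last option, here is a minimal sketch of such a call, assuming the standard Spring Cloud Config Server endpoint /{application}/{profile} and the my_service/localdev coordinates from bootstrap.yml (the ConfigFetcher class name is hypothetical):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.stream.Collectors;

public class ConfigFetcher {

    // Fetch the raw configuration JSON for my_service/localdev straight from the
    // config server, avoiding a full Spring context per Lambda invocation.
    public static String fetchConfig() throws Exception {
        URL url = new URL("http://my.server/my_service/localdev");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchConfig());
    }
}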
UPDATE: For Spring Boot 1.4.x you must provide this configuration in the Spring Boot plugin:
<configuration>
<layout>MODULE</layout>
</configuration>
If you do not, then by default the new behavior of the plugin is to put all of your jars under BOOT-INF, as the intent is for the jar to be executable and have the bootstrap process load it. I found this out while working on adding a warning for the situation encountered here. See https://github.com/spring-projects/spring-boot/issues/5465 for reference.

Hadoop Configured getConf() returning null

I have a Spring MVC application running on tomcat which submits MapReduce jobs and analyzes results. My Spring Batch tasklet is able to successfully call an MR driver class and run the job. The driver class extends Configured and implements Tool and is easily able to manipulate HDFS files. The maven module containing the driver class and MR code is added as a dependency to the webapp module.
For analysis, I created a new class in the webapp module which extends Configured. This class is supposed to read an HDFS file and analyze it. However when I try to create the FileSystem object I am getting a null pointer exception.
public class ReportAnalyzer extends Configured {

    public void analyze(String path) throws Exception {
        FileSystem hdfs = FileSystem.get(getConf()); // <-- NPE here
        // create Path, etc.
    }
}
Is there anything else that needs to be done in order to get the FileSystem object? The Hadoop dependencies are added to the webapp via the mapreduce module.
You either have to add a constructor that passes the Configuration up to Configured(Configuration conf):
public ReportAnalyzer(Configuration conf) {
    super(conf);
}
or call setConf() on the instance before calling analyze().
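A minimal sketch of the calling side, assuming the constructor variant above; the namenode address and HDFS path are hypothetical:
import org.apache.hadoop.conf.Configuration;

public class ReportAnalyzerExample {

    public static void main(String[] args) throws Exception {
        // Inside a webapp there is no ToolRunner to inject a Configuration,
        // so build one explicitly and hand it to the analyzer.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical namenode address

        ReportAnalyzer analyzer = new ReportAnalyzer(conf);
        analyzer.analyze("/reports/output/part-r-00000"); // hypothetical HDFS path
    }
}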
