Hadoop distcp from a spring boot application - ClassNotFoundException - java

I am trying to submit a DistCp job from a Spring Boot application on a REST API call.
Spring version: 1.5.13.RELEASE
Hadoop version: 2.7.3
Below is the code I am using to instantiate and run DistCp:
Configuration configuration = new Configuration(); // assumes the cluster's core-site.xml/hdfs-site.xml are on the classpath
List<Path> srcPathList = new ArrayList<Path>();
srcPathList.add(new Path("hdfs://<cluster>/tmp/<user>/source"));
Path targetPath = new Path("hdfs://<cluster>/tmp/<user>/destination");
DistCpOptions distCpOptions = new DistCpOptions(srcPathList, targetPath);
DistCp distCp = new DistCp(configuration, distCpOptions);
Job job = distCp.execute();
The job is submitted to the cluster successfully, but it then fails there with a ClassNotFoundException. Below is the exception:
INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED;
cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class org.apache.hadoop.tools.mapred.CopyOutputFormat not found
Why does this happen? Any pointers would be very helpful. Thanks!

I found the reason by inspecting the job.jar on the NodeManager machine. The structure of job.jar is:
BOOT-INF/classes/xxx
which is not a layout the MapReduce classloader can work with.
I replaced the jar packaging with war, and that works:
<packaging>war</packaging>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<!-- exclude the embedded Tomcat -->
<exclusions>
<exclusion>
<artifactId>spring-boot-starter-tomcat</artifactId>
<groupId>org.springframework.boot</groupId>
</exclusion>
</exclusions>
</dependency>
<!-- servlet API, provided by the external container at runtime -->
<dependency>
<groupId>org.apache.tomcat</groupId>
<artifactId>tomcat-servlet-api</artifactId>
<version>7.0.47</version>
<scope>provided</scope>
</dependency>
...
Then add a startup class:
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.boot.web.support.SpringBootServletInitializer;
public class SpringBootStartApplication extends SpringBootServletInitializer {
@Override
protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
//
return builder.sources(xxxPortalApplication.class);
}
}

Related

Gremlin query on Janusgraph through Spark. Error: Provider org.janusgraph.hadoop.serialize.JanusGraphKryoShimService could not be instantiated

Current Architecture
Description
I am using JanusGraph 0.6.2 for graph processing.
GCP Bigtable as the JanusGraph backend/database.
Spark 3.0.0 with Hadoop 2.7 for data processing, set up locally (planning to set up the environment in GCP after the POC).
Gremlin Client and Java 11 as clients to run Spark jobs, for queries such as traversals and node lookups, through SparkGraphComputer.
Problem
I am able to trigger a query job to do the node count on Spark using the Gremlin Client, but I am facing issues triggering a query job using the Java APIs.
Expectation
Trigger a query job using the Java APIs.
The Apache Spark setup is done.
Configuration working for the Gremlin Client:
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
#
# JanusGraph HBase InputFormat configuration
#
#janusgraphmr.ioformat.conf.storage.backend=hbase
#janusgraphmr.ioformat.conf.storage.hostname=localhost
#janusgraphmr.ioformat.conf.storage.port=8586
#janusgraphmr.ioformat.conf.storage.hbase.table=janusgraph
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase2_x.BigtableConnection
janusgraphmr.ioformat.conf.storage.hbase.ext.google.bigtable.project.id= *****
janusgraphmr.ioformat.conf.storage.hbase.ext.google.bigtable.instance.id= *****
janusgraphmr.ioformat.conf.storage.hbase.table= ******
janusgraphmr.ioformat.conf.storage.hbase.ext.hbase.regionsizecalculator.enable=false
# This defines the indexing backend configuration used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.index.search.backend=elasticsearch
janusgraphmr.ioformat.conf.index.search.hostname=localhost
#
# SparkGraphComputer Configuration
#
spark.master=spark://RINMAC1714:7077
spark.executor.memory=1g
spark.executor.extraClassPath=/Users/rohit.pahan/portables/janusgraph-0.6.2/lib/*
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
The above configuration works and I get the result (screenshot omitted).
Java API configuration, which is not working for me:
GraphTraversalProvider.java
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.commons.configuration.Configuration;
import org.apache.tinkerpop.gremlin.hadoop.Constants;
public class GraphTraversalProvider {
public static Configuration makeLocal() {
return make(true);
}
public static Configuration makeRemote() {
return make(false);
}
private static Configuration make(boolean local) {
final Configuration hadoopConfig = new BaseConfiguration();
hadoopConfig.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph");
hadoopConfig.setProperty(Constants.GREMLIN_HADOOP_GRAPH_READER, "org.janusgraph.hadoop.formats.hbase.HBaseInputFormat");
hadoopConfig.setProperty(Constants.GREMLIN_HADOOP_GRAPH_WRITER, "org.apache.hadoop.mapreduce.lib.output.NullOutputFormat");
hadoopConfig.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, true);
hadoopConfig.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, "none");
hadoopConfig.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, "output");
hadoopConfig.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.backend", "hbase");
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.hbase.ext.hbase.client.connection.impl", "com.google.cloud.bigtable.hbase2_x.BigtableConnectio");
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.hbase.ext.google.bigtable.project.id", "******");
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.hbase.ext.google.bigtable.instance.id", "*******");
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.hbase.table", "******");
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.storage.hbase.ext.hbase.regionsizecalculator.enable", false);
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.index.search.backend", "elasticsearch");
hadoopConfig.setProperty("janusgraphmr.ioformat.conf.index.search.hostname", "localhost");
if (local) {
hadoopConfig.setProperty("spark.master", "local[*]"); // Run Spark locally with as many worker threads as logical cores on your machine.
} else {
hadoopConfig.setProperty("spark.master", "spark://MAC1714:7077");
}
hadoopConfig.setProperty("spark.executor.memory", "1g");
hadoopConfig.setProperty(Constants.SPARK_SERIALIZER, "org.apache.spark.serializer.KryoSerializer");
hadoopConfig.setProperty("spark.kryo.registrator", "org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator");
hadoopConfig.setProperty("spark.kryo.registrationRequired","false");
return hadoopConfig;
}
}
Main Class
import org.apache.commons.configuration.Configuration;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
public class RunSparkJob {
public static void main(String[] args) throws Exception {
runSpark();
}
private static void runSpark() throws Exception {
Configuration config = GraphTraversalProvider.makeRemote();
Graph hadoopGraph = GraphFactory.open(config);
Long totalVertices = hadoopGraph.traversal().withComputer(SparkGraphComputer.class).V().count().next();
System.out.println("IT WORKED: " + totalVertices);
hadoopGraph.close();
}
}
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.2.6.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.janus</groupId>
<artifactId>janus-spark</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>janus-spark</name>
<description>Demo project for Spring Boot</description>
<properties>
<janus.version>0.6.2</janus.version>
<spark.version>3.0.0</spark.version>
<gremlin.version>3.4.6</gremlin.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/org.janusgraph/janusgraph-bigtable -->
<dependency>
<groupId>org.janusgraph</groupId>
<artifactId>janusgraph-bigtable</artifactId>
<version>${janus.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.janusgraph/janusgraph-hadoop -->
<dependency>
<groupId>org.janusgraph</groupId>
<artifactId>janusgraph-hadoop</artifactId>
<version>${janus.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.janusgraph/janusgraph-hbase -->
<dependency>
<groupId>org.janusgraph</groupId>
<artifactId>janusgraph-hbase</artifactId>
<version>${janus.version}</version>
</dependency>
<dependency>
<groupId>org.janusgraph</groupId>
<artifactId>janusgraph-solr</artifactId>
<version>${janus.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo -->
<dependency>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
<version>2.16</version>
</dependency>
<!--
<dependency>
<groupId>com.twitter</groupId>
<artifactId>chill_2.13</artifactId>
<version>0.10.0</version>
</dependency>-->
<!-- GREMLIN -->
<dependency>
<groupId>org.apache.tinkerpop</groupId>
<artifactId>spark-gremlin</artifactId>
<version>${gremlin.version}</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.tinkerpop</groupId>
<artifactId>hadoop-gremlin</artifactId>
<version>${gremlin.version}</version>
</dependency>
<!-- SPARK -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>27.0-jre</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Error Logs
SLF4J: Found binding in [jar:file:/Users/rohit.pahan/portables/janusgraph-0.6.2/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/rohit.pahan/portables/janusgraph-0.6.2/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/rohit.pahan/.m2/repository/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/rohit.pahan/.m2/repository/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
0 [main] WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
Exception in thread "main" java.lang.IllegalStateException: java.util.ServiceConfigurationError: org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.KryoShimService: Provider org.janusgraph.hadoop.serialize.JanusGraphKryoShimService could not be instantiated
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:88)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:150)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:68)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:135)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:40)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:240)
at com.janus.app.services.RunSparkJob.runSpark(RunSparkJob.java:20)
at com.janus.app.services.RunSparkJob.main(RunSparkJob.java:14)
Caused by: java.util.concurrent.ExecutionException: java.util.ServiceConfigurationError: org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.KryoShimService: Provider org.janusgraph.hadoop.serialize.JanusGraphKryoShimService could not be instantiated
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.VertexProgramStep.processNextStart(VertexProgramStep.java:68)
... 8 more
Caused by: java.util.ServiceConfigurationError: org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.KryoShimService: Provider org.janusgraph.hadoop.serialize.JanusGraphKryoShimService could not be instantiated
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:804)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:722)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1393)
at org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.KryoShimServiceLoader.load(KryoShimServiceLoader.java:97)
at org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.KryoShimServiceLoader.applyConfiguration(KryoShimServiceLoader.java:58)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:248)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.tinkerpop.shaded.kryo.serializers.FieldSerializer" for class: java.util.concurrent.atomic.AtomicLong
at org.apache.tinkerpop.shaded.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:67)
at org.apache.tinkerpop.shaded.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
at org.apache.tinkerpop.shaded.kryo.Kryo.newDefaultSerializer(Kryo.java:380)
at org.apache.tinkerpop.shaded.kryo.Kryo.getDefaultSerializer(Kryo.java:364)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoTypeReg.registerWith(GryoTypeReg.java:122)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoMapper.createMapper(GryoMapper.java:101)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoMapper.createMapper(GryoMapper.java:75)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoReader.<init>(GryoReader.java:71)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoReader.<init>(GryoReader.java:64)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoReader$Builder.create(GryoReader.java:302)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoPool.createPool(GryoPool.java:126)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoPool.access$100(GryoPool.java:40)
at org.apache.tinkerpop.gremlin.structure.io.gryo.GryoPool$Builder.create(GryoPool.java:227)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.HadoopPools.initialize(HadoopPools.java:51)
at org.janusgraph.hadoop.serialize.JanusGraphKryoShimService.<init>(JanusGraphKryoShimService.java:30)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:780)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at jdk.internal.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at org.apache.tinkerpop.shaded.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:54)
... 29 more
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field private volatile long java.util.concurrent.atomic.AtomicLong.value accessible: module java.base does not "opens java.util.concurrent.atomic" to unnamed module #1d9b7cce
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:177)
at java.base/java.lang.reflect.Field.setAccessible(Field.java:171)
at org.apache.tinkerpop.shaded.kryo.serializers.FieldSerializer.buildValidFields(FieldSerializer.java:306)
at org.apache.tinkerpop.shaded.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:239)
at org.apache.tinkerpop.shaded.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:182)
at org.apache.tinkerpop.shaded.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:155)
... 34 more
Process finished with exit code 1
I am still exploring JanusGraph and its processing capabilities with Spark. I have given all the details here; let me know if any more details are required. It is a very new tech stack for me, and I would be grateful for any help.
<properties>
<janus.version>0.6.2</janus.version>
<spark.version>3.0.0</spark.version>
<gremlin.version>3.4.6</gremlin.version>
</properties>
JanusGraph 0.6.2 depends on TinkerPop 3.5.3.
Mixing in other TinkerPop versions (gremlin.version is 3.4.6 here) can easily lead to this kind of problem.
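For illustration only (not part of the original answer), the alignment could look like this in the properties block of the pom above; verify the exact TinkerPop version shipped with JanusGraph 0.6.2 before committing to it:
<properties>
    <janus.version>0.6.2</janus.version>
    <spark.version>3.0.0</spark.version>
    <!-- assumed: match the TinkerPop line JanusGraph 0.6.2 is built against -->
    <gremlin.version>3.5.3</gremlin.version>
</properties>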

How to specify GZip properties in Spring boot 2 application

I have a Spring Boot 2 application with REST API clients. One API downloads a large byte array (around 85 MB), so I am willing to compress it using GZip encoding. I added the following properties to the application.properties file:
server.compression.enabled=true
server.compression.min-response-size=1024
server.compression.mime-types=application/octet-stream
The default compression reduces the file size, but it increases processing time considerably. I saw that GZip encoding has compression levels from 0-9.
How do I set the compression level in the application.properties file?
I resolved my issue by changing the embedded server to Jetty.
First, exclude the embedded Tomcat and add the Jetty dependency in pom.xml:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jetty</artifactId>
</dependency>
Then add a custom JettyServletWebServerFactory as follows:
@Configuration
public class GZipConfig {
@Bean
public JettyServletWebServerFactory jettyServletWebServerFactory() {
JettyServletWebServerFactory factory = new JettyServletWebServerFactory();
factory.addServerCustomizers((Server server) -> {
GzipHandler gzipHandler = new GzipHandler();
gzipHandler.setInflateBufferSize(1);
gzipHandler.setHandler(server.getHandler());
gzipHandler.setIncludedMethods("GET", "POST", "DELETE", "PUT");
gzipHandler.setCompressionLevel(Deflater.BEST_SPEED);
HandlerCollection handlerCollection = new HandlerCollection(gzipHandler);
server.setHandler(handlerCollection);
});
return factory;
}
}
Here we can set the compression level.
Thanks

Send Apache Camel Actuator Metrics to Prometheus

I am trying to forward/add the Actuator Camel metrics from /actuator/camelroutes (route metrics like number of exchanges/transactions) to the Prometheus Actuator endpoint. Is there a way for me to configure Camel to add those metrics to the PrometheusMeterRegistry?
I have tried adding:
camel.component.metrics.metric-registry=io.micrometer.prometheus.PrometheusMeterRegistry
in my application.properties according to the documentation here: https://camel.apache.org/components/latest/metrics-component.html
But still nothing relating to Apache Camel is displayed in actuator/prometheus
Here are the dependencies I am using with Spring Boot 2.1.9 and Apache Camel 2.24.2:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-metrics-starter</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
I got the Camel route metrics working in the /actuator/prometheus endpoint.
Use the camel-micrometer-starter dependency, as stated in @claus-ibsen's comment:
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-micrometer-starter</artifactId>
</dependency>
Set the following in your properties file:
camel.component.metrics.metric-registry=prometheusMeterRegistry
Then set the CamelContext to use the MicrometerRoutePolicyFactory and MicrometerMessageHistoryFactory. The code below is placed in a @Configuration class:
@Configuration
public class AppConfig {
@Bean
public CamelContextConfiguration camelContextConfiguration() {
return new CamelContextConfiguration() {
@Override
public void beforeApplicationStart(CamelContext camelContext) {
camelContext.addRoutePolicyFactory(new MicrometerRoutePolicyFactory());
camelContext.setMessageHistoryFactory(new MicrometerMessageHistoryFactory());
}
@Override
public void afterApplicationStart(CamelContext camelContext) {
}
};
}
}
You need to trigger an exchange in a route for the metrics to appear in /actuator/prometheus.
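For completeness, a minimal sketch of a route that produces such exchanges; the route id, timer endpoint, and period are made-up values, not something from the original answer:
import org.apache.camel.builder.RouteBuilder;
import org.springframework.stereotype.Component;

// Hypothetical sample route: once the timer fires, CamelRoutePolicy_* metrics
// for "sample-route" start appearing under /actuator/prometheus.
@Component
public class SampleRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("timer:metrics-demo?period=10000") // triggers an exchange every 10 seconds
            .routeId("sample-route")
            .log("exchange processed, so Micrometer records route metrics");
    }
}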
Here are the metrics made available to Prometheus:
CamelMessageHistory_seconds_count
CamelMessageHistory_seconds_max
CamelRoutePolicy_seconds_max
CamelRoutePolicy_seconds_count
CamelRoutePolicy_seconds_sum
You can use the JMX Exporter jar for Prometheus to get more detailed metrics from Camel's JMX. I wanted to avoid this approach because it would mean that each Camel Spring Boot app I have would use two ports: one for the JMX metrics and one for the Actuator metrics.
There is a camel-micrometer-starter dependency you should use instead, which integrates with Micrometer. You can then use the Micrometer route policy from that dependency to let it monitor all your routes. See the docs at: https://camel.apache.org/components/2.x/micrometer-component.html
I can see the metrics by keeping these dependencies intact
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-management</artifactId>
</dependency>
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-metrics</artifactId>
</dependency>
<dependency>
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-micrometer-starter</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Why don't I see the actual processor and bean names rather than generated ids like process3 and bean1? (See the sketch after these samples.)
CamelMessageHistory_seconds_sum{camelContext="AppName",nodeId="process3",routeId="AppNameRoute",serviceName="MicrometerMessageHistoryService",} 0.041466
CamelMessageHistory_seconds_count{camelContext="AppName",nodeId="bean1",routeId="AppNameRoute",serviceName="MicrometerMessageHistoryService",} 100.0
CamelMessageHistory_seconds_sum{camelContext="AppName",nodeId="bean1",routeId="AppNameRoute",serviceName="MicrometerMessageHistoryService",} 4.8417576

Can't connect to Phoenix using JDBC in Spring Boot

I have a Spring Boot application where I am trying to configure a Phoenix DataSource, but I am getting a "No suitable driver" error.
#Bean(name="phoenixDataSource")
#DependsOn(value = "placeholderConfigurer")
public DataSource phoenixDataSource() {
SimpleDriverDataSource phoenixDataSource = new SimpleDriverDataSource();
phoenixDataSource.setUrl( "jdbc:phoenix:localhost" );
try {
Class<?> driverClass = this.getClass().getClassLoader().loadClass("org.apache.phoenix.jdbc.PhoenixDriver");
phoenixDataSource.setDriverClass((Class<? extends Driver>) driverClass);
} catch( ClassNotFoundException e ) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return phoenixDataSource;
}
#Bean(name = "phoenixJdbcTemplate")
public JdbcTemplate phoenixJdbcTemplate(#Qualifier("phoenixDataSource") DataSource ds) {
return new JdbcTemplate(ds);
}
First, you need to find out whether you have access to connect or not.
Connect to sqlline using /usr/hdp/current/phoenix-client/bin/sqlline.py:
/usr/hdp/current/phoenix-client/bin/sqlline.py <Zoo-keeper-url>:2181:/hbase-unsecure
If your HBase is not set up as unsecure, you need to find out whether it is protected by Kerberos or by Ranger authorization; you can find the required information in your logs.
Now you have the following three options to connect.
ZooKeeper URL, non-secure:
"jdbc:phoenix:<Zookeeper_host_name>:<port_number>:/hbase-unsecure" // with no password
ZooKeeper URL, secure:
"jdbc:phoenix:<Zookeeper_host_name>:<port_number>:<secured_Zookeeper_node>:<user_name>"
Thin driver (Query Server) URL:
jdbc:phoenix:thin:url=<scheme>://<server-hostname>:<port>;authentication=vaquarkhan
The default ZooKeeper port is 2181.
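Before wiring the DataSource bean, a quick plain-JDBC check can confirm connectivity; a minimal sketch, assuming the non-secure ZooKeeper URL form and a made-up host name:
import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical quorum; replace with your ZooKeeper host, port and znode
        String url = "jdbc:phoenix:zk-host:2181:/hbase-unsecure";
        // phoenix-core on the classpath registers PhoenixDriver via JDBC driver auto-loading
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to Phoenix: " + !conn.isClosed());
        }
    }
}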
You can use the following code to set up the connection; make sure you have already added the dependencies to your POM file.
POM:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.3.1</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.8.0_05</version>
<scope>system</scope>
<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.7.0-HBase-1.1</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>sqlline</groupId>
<artifactId>sqlline</artifactId>
<version>1.1.9</version>
</dependency>
Java code:
package com.khan.vaquar.config;
import javax.sql.DataSource;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.datasource.SimpleDriverDataSource;
/**
* Database Configures for Phoenix database.
*/
@Configuration
public class DBConfig {
@Bean
public DataSource dataSource() {
return new SimpleDriverDataSource(new org.apache.phoenix.jdbc.PhoenixDriver(),
"jdbc:phoenix:<Zookeeper-URL>:<PORT_NO>:/hbase-unsecure");
}
@Bean
public NamedParameterJdbcTemplate databasePhoenixJdbcTemplate() {
JdbcTemplate template = new JdbcTemplate(this.dataSource());
template.setQueryTimeout(1500);
return new NamedParameterJdbcTemplate(template);
}
}
Inside your repository, use it for the connection:
@Autowired
private NamedParameterJdbcTemplate databasePhoenixJdbcTemplate;
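For example, a minimal repository method using that template; the table and column names are made up for illustration:
import java.util.Collections;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.stereotype.Repository;

@Repository
public class UserRepository {

    @Autowired
    private NamedParameterJdbcTemplate databasePhoenixJdbcTemplate;

    // Hypothetical USERS table with ID and NAME columns
    public List<String> findNamesById(long id) {
        return databasePhoenixJdbcTemplate.queryForList(
                "SELECT NAME FROM USERS WHERE ID = :id",
                Collections.singletonMap("id", id),
                String.class);
    }
}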
A few useful links:
https://phoenix.apache.org/server.html
https://community.cloudera.com/t5/Support-Questions/How-to-pass-user-with-Phoenix-url/td-p/96707
https://community.cloudera.com/t5/Community-Articles/Phoenix-JDBC-Client-Setup/ta-p/244284
https://community.cloudera.com/t5/Support-Questions/SQuirreL-on-phoenix-Sandbox/m-p/153362
https://community.cloudera.com/t5/Community-Articles/Phoenix-Part-4-working-with-Ranger/ta-p/249174
There are two kinds of drivers: thin and thick.
Your code is using the thick driver,
so you have to add the phoenix-core jar file to your classpath.
I'm using the HDP 3.0.1.0-187 Phoenix server.
My Gradle configuration is below:
implementation('org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0')

Kafka stream not working in spark job

I wrote code to get data from the "topicTest1" Kafka queue, but I am not able to print data from the consumer. An error occurred; it is shown below.
Below is my code to consume the data:
public static void main(String[] args) throws Exception {
// StreamingExamples.setStreamingLogLevels();
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount").setMaster("local[*]");
// Create the context with 2 seconds batch size
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(100));
int numThreads = Integer.parseInt("3");
Map<String, Integer> topicMap = new HashMap<>();
String[] topics = "topicTest1".split(",");
for (String topic : topics) {
topicMap.put(topic, numThreads);
}
JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, "9.98.171.226:9092", "1",
topicMap);
messages.print();
jssc.start();
jssc.awaitTermination();
}
I am using the following dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>1.6.1</version>
</dependency>
Below is the error I got:
Exception in thread "dispatcher-event-loop-0" java.lang.NoSuchMethodError: scala/Predef$.$conforms()Lscala/Predef$$less$colon$less; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar by sun.misc.Launcher$AppClassLoader#4b69b358) called from class org.apache.spark.streaming.scheduler.ReceiverSchedulingPolicy (loaded from file:/C:/Users/Administrator/.m2/repository/org/apache/spark/spark-streaming_2.11/1.6.2/spark-streaming_2.11-1.6.2.jar by sun.misc.Launcher$AppClassLoader#4b69b358).
at org.apache.spark.streaming.scheduler.ReceiverSchedulingPolicy.scheduleReceivers(ReceiverSchedulingPolicy.scala:138)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receive$1.applyOrElse(ReceiverTracker.scala:450)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)16/11/14 13:38:00 INFO ForEachDStream: metadataCleanupDelay = -1
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)
Another error:
Exception in thread "JobGenerator" java.lang.NoSuchMethodError: scala/Predef$.$conforms()Lscala/Predef$$less$colon$less; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar by sun.misc.Launcher$AppClassLoader#4b69b358) called from class org.apache.spark.streaming.scheduler.ReceivedBlockTracker (loaded from file:/C:/Users/Administrator/.m2/repository/org/apache/spark/spark-streaming_2.11/1.6.2/spark-streaming_2.11-1.6.2.jar by sun.misc.Launcher$AppClassLoader#4b69b358).
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:114)
at org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:203)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Make sure that you use the correct versions. Let's say you use the following Maven dependency:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
So the artifact is spark-streaming-kafka_2.10.
Now check whether you are using the correct Kafka version:
cd /KAFKA_HOME/libs
Now find kafka_YOUR-VERSION-sources.jar.
If you have kafka_2.10-0xxxx-sources.jar you are fine! :)
If you use different versions, just change the Maven dependencies OR download the correct Kafka version.
After that, check your Spark version. Make sure you use the correct versions:
groupId: org.apache.spark
artifactId: spark-core_2.xx
version: xxx
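As a sketch of what a consistent set could look like for the versions used in the question, with every Spark artifact on the same Scala suffix (_2.10) and the same Spark version:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.1</version>
</dependency>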
