I built Spark against Scala 2.11 using the following steps:
./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
After building Spark successfully, I tried to initialize Spark from inside an Akka actor.
My Main class looks like this:
ActorSystem system = ActorSystem.create("ClusterSystem");
Inbox inbox = Inbox.create(system);
ActorRef sparkActorRef = system.actorOf(SparkActor.props(mapOfArguments), "sparkActor");
inbox.send(sparkActorRef, "start");
The spark actor looks like:
public class SparkActor extends UntypedActor{
private static Logger logger = LoggerFactory.getLogger(SparkActor.class);
final Map<String,Object> configurations;
final SparkConf sparkConf;
private int sparkBatchDuration;
public static Props props(final Map<String,Object> configurations) {
return Props.create(new Creator<SparkActor>() {
private static final long serialVersionUID = 1L;
@Override
public SparkActor create() throws Exception {
return new SparkActor(configurations);
}
});
}
public SparkActor(Map<String,Object> configurations) {
this.configurations = configurations;
this.sparkConf =initializeSparkConf(configurations);
ActorRef mediator = DistributedPubSub.get(getContext().system()).mediator();
mediator.tell(new DistributedPubSubMediator.Subscribe("data", getSelf()), getSelf());
}
private SparkConf initializeSparkConf(Map<String, Object> mapOfArgs) {
SparkConf conf = new SparkConf();
Configuration sparkConf = (Configuration) mapOfArgs.get(StreamingConstants.MAP_SPARK_CONFIGURATION);
Iterator it = sparkConf.getKeys();
while(it.hasNext()){
String propertyKey = (String)it.next();
String propertyValue = sparkConf.getString(propertyKey);
conf.set(propertyKey.trim(), propertyValue.trim());
}
conf.setMaster(sparkConf.getString(StreamingConstants.SET_MASTER));
return conf;
}
@Override
public void onReceive(Object arg0) throws Exception {
if ((arg0 instanceof String) && arg0.toString().equalsIgnoreCase("start")) {
logger.info("Going to start");
sparkConf.setAppName(StreamingConstants.APP_NAME);
logger.debug("App name set to {}. Beginning spark execution",StreamingConstants.APP_NAME);
Configuration kafkaConfiguration = (Configuration) configurations.get(StreamingConstants.MAP_KAFKA_CONFIGURATION);
sparkBatchDuration = Integer.parseInt((String)configurations.get(StreamingConstants.MAP_SPARK_DURATION));
//Initializing Kafka configurations.
String[] eplTopicsAndThreads = kafkaConfiguration.getString(StreamingConstants.EPL_QUEUE).split(",");
Map<String,Integer> mapofeplTopicsAndThreads = new TreeMap<>();
for (String item : eplTopicsAndThreads){
String topic = item.split(StreamingConstants.EPL_QUEUE_SEPARATOR)[0];
Integer numberOfThreads= Integer.parseInt(item.split(StreamingConstants.EPL_QUEUE_SEPARATOR)[1]);
mapofeplTopicsAndThreads.put(topic, numberOfThreads);
}
//Creating a receiver stream in spark
JavaPairReceiverInputDStream<String,String> receiverStream = null;
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(sparkBatchDuration));
receiverStream = KafkaUtils.createStream(ssc,
kafkaConfiguration.getString(StreamingConstants.ZOOKEEPER_SERVER_PROPERTY),
kafkaConfiguration.getString(StreamingConstants.KAFKA_GROUP_NAME),
mapofeplTopicsAndThreads);
JavaDStream<String> javaRdd = receiverStream.map(new SparkTaskTupleHelper());
javaRdd.foreachRDD(new Function<JavaRDD<String>, Void>() {
@Override
public Void call(JavaRDD<String> jsonData) throws Exception {
//Code to process some data from kafka
return null; // a Function returning Void still has to return a value
}
});
ssc.start();
ssc.awaitTermination();
}
}
}
I start my Spark application with:
./spark-submit --class com.sample.Main --master local[8] ../executables/spark-akka.jar
and get the following exception on startup:
Uncaught error from thread [ClusterSystem-akka.actor.default-dispatcher-3] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ClusterSystem]
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at akka.cluster.pubsub.protobuf.DistributedPubSubMessageSerializer.<init>(DistributedPubSubMessageSerializer.scala:42)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:161)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:200)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.serialization.Serialization.serializerOf(Serialization.scala:165)
at akka.serialization.Serialization$$anonfun$3.apply(Serialization.scala:174)
at akka.serialization.Serialization$$anonfun$3.apply(Serialization.scala:174)
at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
at akka.serialization.Serialization.<init>(Serialization.scala:174)
at akka.serialization.SerializationExtension$.createExtension(SerializationExtension.scala:15)
at akka.serialization.SerializationExtension$.createExtension(SerializationExtension.scala:12)
at akka.actor.ActorSystemImpl.registerExtension(ActorSystem.scala:713)
at akka.actor.ExtensionId$class.apply(Extension.scala:79)
at akka.serialization.SerializationExtension$.apply(SerializationExtension.scala:12)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:175)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1913)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1904)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:253)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:53)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:252)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:450)
at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:864)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:134)
at com.sample.SparkActor.onReceive(SparkActor.java:106)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
Options I have already tried:
1) Rebuilt Spark with Akka 2.4.4 and got a NoSuchMethodError for toRootLowerCase.
2) Tried to reuse the Akka 2.3.11 that ships with Spark and still got the same exception, this time at ClusterSettings.scala.
I have looked at similar errors on Stack Overflow and found that they were caused by a Scala version mismatch. But having built everything with Scala 2.11 and using Akka 2.4.4, I expected all jars to be on the same Scala version.
Am I missing a step?
My pom file for your reference.
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<slf4j.version>1.7.6</slf4j.version>
<log4j.version>2.0-rc1</log4j.version>
<commons.cli.version>1.2</commons.cli.version>
<kafka.version>0.8.2.2</kafka.version>
<akka.version>2.4.4</akka.version>
<akka.version.old>2.4.4</akka.version.old>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.8</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-actor_2.11</artifactId>
<version>${akka.version}</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-cluster_2.11</artifactId>
<version>${akka.version}</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-kernel_2.11</artifactId>
<version>${akka.version}</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-cluster-tools_2.11</artifactId>
<version>${akka.version}</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-remote_2.11</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-slf4j_2.11</artifactId>
<version>2.4.4</version>
</dependency>
If I remove the cluster jars and the DistributedPubSub code and use plain remoting (i.e. akka.tcp), no errors are shown and everything works fine. I would like to know why DistributedPubSub throws this error.
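Since a NoSuchMethodError like the one above usually points to two different builds of a library meeting on the classpath, one way to narrow it down is to print where a suspect class is actually loaded from at runtime. Below is a minimal sketch, not a definitive diagnosis: the class name WhereIsClass and the method locationOf are made up for illustration, and the default class names in main are only examples taken from the stack trace.

```java
import java.security.CodeSource;

public class WhereIsClass {

    /**
     * Returns the jar or directory a class was loaded from.
     * Classes from the JDK itself have no code source, so a note is returned instead.
     */
    static String locationOf(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, WhereIsClass.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap / JDK runtime)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Example names only; pass your own class names as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"scala.Predef$", "akka.actor.ActorSystem"};
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Running this with the same classpath as the failing application (for example by calling locationOf from the driver's main method) shows whether scala.Predef$ and the Akka classes come from the jars you expect or from a different assembly.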
I am developing an Azure Function app. A function in the app receives messages from Azure Event Hub and should update Azure Digital Twins. I create the DigitalTwinsClient instance as shown below:
@FunctionName("eventGridMonitorString")
public void eventHubProcessor(
@EventHubTrigger(name = "msg", eventHubName = "", connection = "EventHubConnectionString") String message,
final ExecutionContext context) {
// context.getLogger().info(message);
String adtUrl = System.getenv("ADT_SERVICE_URL");
context.getLogger().info("ADTURl : " + adtUrl);
DigitalTwinsClient client = new DigitalTwinsClientBuilder().credential(new ClientSecretCredentialBuilder()
.tenantId("my_tenant_id").clientId("my_client_id")
.clientSecret("my_client_secret").build()).endpoint(adtUrl).buildClient();
Iterable<DigitalTwinsModelData> modelList = client.listModels();
Iterator<DigitalTwinsModelData> it = modelList.iterator();
while (it.hasNext()) {
DigitalTwinsModelData model = it.next();
context.getLogger().info("" + model.getDtdlModel());
}
for (DigitalTwinsModelData model : modelList) {
context.getLogger().info("Created model: " + model.getModelId());
}
}
This code works fine in my local Java application, but when I deploy it to the Azure Function app it fails with the error below:
2021-05-27T07:42:35.173 [Error] Executed 'Functions.eventGridMonitorString' (Failed, Id=12a87102-78a3-4e2e-8715-b4401091d753, Duration=116ms)
Result: Failure
Exception: NoClassDefFoundError: Could not initialize class reactor.netty.http.client.HttpClientConfig
Stack: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.microsoft.azure.functions.worker.broker.JavaMethodInvokeInfo.invoke(JavaMethodInvokeInfo.java:22)
at com.microsoft.azure.functions.worker.broker.JavaMethodExecutorImpl.execute(JavaMethodExecutorImpl.java:54)
at com.microsoft.azure.functions.worker.broker.JavaFunctionBroker.invokeMethod(JavaFunctionBroker.java:57)
at com.microsoft.azure.functions.worker.handler.InvocationRequestHandler.execute(InvocationRequestHandler.java:33)
at com.microsoft.azure.functions.worker.handler.InvocationRequestHandler.execute(InvocationRequestHandler.java:10)
at com.microsoft.azure.functions.worker.handler.MessageHandler.handle(MessageHandler.java:45)
at com.microsoft.azure.functions.worker.JavaWorkerClient$StreamingMessagePeer.lambda$onNext$0(JavaWorkerClient.java:92)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class reactor.netty.http.client.HttpClientConfig
at reactor.netty.http.client.HttpClientConnect.<init>(HttpClientConnect.java:84)
at reactor.netty.http.client.HttpClient.create(HttpClient.java:393)
at com.azure.core.http.netty.NettyAsyncHttpClientBuilder.build(NettyAsyncHttpClientBuilder.java:91)
at com.azure.core.http.netty.implementation.ReactorNettyClientProvider.createInstance(ReactorNettyClientProvider.java:14)
at com.azure.core.implementation.http.HttpClientProviders.createInstance(HttpClientProviders.java:58)
at com.azure.core.http.HttpClient.createDefault(HttpClient.java:50)
at com.azure.core.http.HttpClient.createDefault(HttpClient.java:40)
at com.azure.core.http.HttpPipelineBuilder.build(HttpPipelineBuilder.java:62)
at com.azure.digitaltwins.core.DigitalTwinsClientBuilder.buildPipeline(DigitalTwinsClientBuilder.java:151)
at com.azure.digitaltwins.core.DigitalTwinsClientBuilder.buildAsyncClient(DigitalTwinsClientBuilder.java:193)
at com.azure.digitaltwins.core.DigitalTwinsClientBuilder.buildClient(DigitalTwinsClientBuilder.java:160)
at com.ey.azurefunctions.PolarDelightFunctionApp.Function.eventHubProcessor(Function.java:32)
... 16 more
Am I missing something, or is there an issue with the code above?
Edit 1: I have added the Maven dependencies below to my project.
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-digitaltwins-core</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-identity</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-core-http-netty</artifactId>
<version>1.7.1</version> <!-- {x-version-update;com.azure:azure-core-http-netty;dependency} -->
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.49.Final</version>
</dependency>
<dependency>
<groupId>io.projectreactor.netty</groupId>
<artifactId>reactor-netty</artifactId>
<version>1.0.7</version>
</dependency>
<dependency>
<groupId>io.projectreactor.netty</groupId>
<artifactId>reactor-netty-http</artifactId>
<version>1.0.7</version>
</dependency>
<dependency>
<groupId>io.projectreactor</groupId>
<artifactId>reactor-bom</artifactId>
<version>Dysprosium-SR20</version>
<type>pom</type>
</dependency>
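A note on reading the trace above: the JVM reports "Could not initialize class X" when the static initialization of X already failed on an earlier attempt, so the original cause is normally in an earlier log entry, as an ExceptionInInitializerError. The sketch below reproduces that behavior in isolation. It is only an illustration of the JVM mechanism, not of the Azure or Reactor Netty code; the names InitFailureDemo, Broken and touch are hypothetical.

```java
public class InitFailureDemo {

    // Hypothetical class whose static initialization always fails.
    static class Broken {
        static final int VALUE = fail();

        static int fail() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    /** Touches Broken and returns a short description of whatever was thrown. */
    static String touch() {
        try {
            return "value=" + Broken.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        // First access: the initializer runs and fails (ExceptionInInitializerError).
        System.out.println(touch());
        // Later accesses: the class is marked unusable (NoClassDefFoundError,
        // "Could not initialize class ...").
        System.out.println(touch());
    }
}
```

So when only the NoClassDefFoundError shows up in the log, it is worth searching the earlier entries of the same worker process for the first failure, which names the real cause.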
I am trying to write Java client code for Apache Spark 3.0.1. First, here are the pom.xml dependencies:
<dependencies>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.10.2</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-csv</artifactId>
<version>2.11.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.7.0</version>
</dependency>
<dependency>
<groupId>org.mongodb.spark</groupId>
<artifactId>mongo-spark-connector_2.12</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jsr310</artifactId>
<version>2.12.1</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>3.12.7</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.12</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.12</artifactId>
<version>3.1.0</version>
</dependency>
</dependencies>
I wrote the Java client code using the Spark Structured Streaming API:
SparkSession spark = SparkSession.builder().master("local[*]").appName("KafkaMongo_StrctStream").getOrCreate();
Dataset<Row> inputDF = spark.read().format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "topicForMongoDB").option("startingOffsets", "earliest").load().selectExpr("CAST(value AS STRING)");
Encoder<Document> mongoEncode = Encoders.bean(Document.class);
Dataset<Row> tempDF = inputDF.map(row -> { //map function throws the exception.
String[] parameters = new String[row.mkString().split(",").length];
CsvMapper csvMapper = new CsvMapper();
parameters = csvMapper.readValue(row.mkString(), String[].class);
DateTimeFormatter formatter = DateTimeFormatter.ISO_DATE;
EntityMongoDB data = new EntityMongoDB();//LocalDate.parse(parameters[2], formatter), Float.valueOf(parameters[3]), parameters[4], parameters[5], parameters[6], parameters[7], parameters[8], parameters[9]);
String jsonInString = csvMapper.writeValueAsString(data);
Document doc = new Document(Document.parse(jsonInString));
return doc;
}, mongoEncode).toDF();
But this code does not run because of the exception below:
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method map(Function1<Row,Document>, Encoder<Document>) is ambiguous for the type Dataset<Row>
I cannot see any error in this code, because it worked without exceptions on Apache Spark 2.4. Is this unresolved compilation problem caused by the Spark version? Kindly tell me how to solve this issue.
= Updated =
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.bson.Document;
import com.aaa.etl.pojo.EntityMongoDB;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
For your information, I also attach the EntityMongoDB class source,
@Data
@AllArgsConstructor
@NoArgsConstructor
public class EntityMongoDB implements Serializable {
@JsonFormat(pattern="yyyy-MM-dd")
@JsonDeserialize(using = LocalDateDeserializer.class)
private LocalDate date;
private float value;
private String id;
private String title;
private String state;
private String frequency_short;
private String units_short;
private String seasonal_adjustment_short;
}
I was upgrading from Spark 2.x to 3.x and found that this error occurs when moving from Scala 2.11 to 2.12. For example, the artifact below has the same problem.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>2.4.7</version>
</dependency>
The fix I found was to avoid the inline lambda. In the example above, you can move the mapping into its own MapFunction class.
public class DocumentMapper implements MapFunction<Row, Document> {
@Override
public Document call(Row row) throws Exception {
String[] parameters = new String[row.mkString().split(",").length];
CsvMapper csvMapper = new CsvMapper();
parameters = csvMapper.readValue(row.mkString(), String[].class);
DateTimeFormatter formatter = DateTimeFormatter.ISO_DATE;
EntityMongoDB data = new EntityMongoDB();//LocalDate.parse(parameters[2], formatter), Float.valueOf(parameters[3]), parameters[4], parameters[5], parameters[6], parameters[7], parameters[8], parameters[9]);
String jsonInString = csvMapper.writeValueAsString(data);
Document doc = new Document(Document.parse(jsonInString));
return doc;
}
}
Then reference this in the mapping call.
Dataset<Row> tempDF = inputDF.map(new DocumentMapper(), mongoEncode).toDF();
Hope this helps other upgraders out there.
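For background on why the lambda becomes ambiguous: with Scala 2.12, scala.Function1 is itself usable as a Java functional interface, so a bare lambda matches both the Scala and the Java overload of map. The sketch below reproduces this with stand-in types so it runs without Spark. ScalaFn, MapFn and Box are hypothetical names that only mimic the shape of Function1, MapFunction and Dataset. It also shows an alternative to a separate class: casting the lambda to the Java interface, which in Spark would be a cast to MapFunction<Row, Document>.

```java
public class AmbiguousMapDemo {

    // Stand-in for scala.Function1 (a functional interface from Java's point of view in Scala 2.12).
    interface ScalaFn<T, R> {
        R apply(T t);
    }

    // Stand-in for org.apache.spark.api.java.function.MapFunction.
    interface MapFn<T, R> {
        R call(T t) throws Exception;
    }

    // Stand-in for Dataset<T>, with the same pair of overloads as Dataset.map.
    static final class Box<T> {
        private final T value;

        Box(T value) {
            this.value = value;
        }

        <U> String map(ScalaFn<T, U> f, String encoder) {
            return "scala:" + f.apply(value);
        }

        <U> String map(MapFn<T, U> f, String encoder) throws Exception {
            return "java:" + f.call(value);
        }
    }

    static String run() throws Exception {
        Box<String> box = new Box<>("a,b");
        // box.map(s -> s.length(), "enc");  // does not compile: reference to map is ambiguous
        // The cast names the target interface, so only the MapFn overload applies.
        return box.map((MapFn<String, Integer>) s -> s.length(), "enc");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "java:3"
    }
}
```

Both approaches give the compiler an argument with an explicit type: a named class such as DocumentMapper, or a cast lambda. Which one reads better is a matter of taste.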
Here is my Solr data model:
@SolrDocument(solrCoreName = "solrData")
public class SolrData {
@Id
@Indexed(name = "id", type = "string")
String id;
@Indexed(name = "name", type = "string")
String name;
This is the Solr configuration:
@Configuration
@EnableSolrRepositories(basePackages={"com.ows.repository.solrRepository"}, multicoreSupport=true)
@ComponentScan
public class SolrConfig {
static final String SOLR_HOST = "http://localhost:8983/solr/";
@Bean
public SolrClient solrClient() {
return new HttpSolrClient.Builder(SOLR_HOST).build();
}
@Bean
public SolrTemplate solrTemplate(SolrClient solrClient) throws Exception {
return new SolrTemplate(solrClient);
}
}
The repository,
public interface SolrProductRepository extends SolrCrudRepository<SolrData, String> {
List<SolrData> findByName(String name);
}
The index controller,
@Autowired
SolrProductRepository solrProductRepository;
@RequestMapping("/solrindex")
public void solrIndex(Model model) {
SolrData solrData = new SolrData();
solrData.setName("You know Who");
solrProductRepository.save(solrData);
}
POM.xml
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-solr</artifactId>
<version>2.1.6.RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-common</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>6.6.0</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>4.3.10.RELEASE</version>
</dependency>
With the above settings, when I index through the index controller, I get the following error (updated with the complete error message):
org.springframework.data.solr.UncategorizedSolrException: org.apache.solr.common.SolrInputDocument cannot be cast to java.util.Map; nested exception is java.lang.ClassCastException: org.apache.solr.common.SolrInputDocument cannot be cast to java.util.Map
at org.springframework.data.solr.core.SolrTemplate.execute(SolrTemplate.java:224)
at org.springframework.data.solr.core.SolrTemplate.saveBean(SolrTemplate.java:330)
at org.springframework.data.solr.core.SolrTemplate.saveBean(SolrTemplate.java:318)
at org.springframework.data.solr.core.SolrTemplate.saveBean(SolrTemplate.java:300)
at org.springframework.data.solr.repository.support.SimpleSolrRepository.save(SimpleSolrRepository.java:149)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.executeMethodOn(RepositoryFactorySupport.java:504)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:489)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.invoke(RepositoryFactorySupport.java:461)
Caused by: java.lang.ClassCastException: org.apache.solr.common.SolrInputDocument cannot be cast to java.util.Map
at org.springframework.data.solr.core.convert.MappingSolrConverter.write(MappingSolrConverter.java:62)
at org.springframework.data.solr.core.SolrTemplate.convertBeanToSolrInputDocument(SolrTemplate.java:1132)
at org.springframework.data.solr.core.SolrTemplate$4.doInSolr(SolrTemplate.java:335)
at org.springframework.data.solr.core.SolrTemplate$4.doInSolr(SolrTemplate.java:330)
at org.springframework.data.solr.core.SolrTemplate.execute(SolrTemplate.java:220)
... 129 more
SOLVED
I solved the problem with the settings below.
The Solr configuration file is changed to:
@Configuration
@EnableSolrRepositories(basePackages={"com.ows.rokomari.repository.solrRepository"}, multicoreSupport=true)
@ComponentScan
public class SolrConfig {
static final String SOLR_HOST = "http://localhost:8983/solr";
@Bean
public SolrClient solrClient() {
return new HttpSolrClient(SOLR_HOST);
}
@Bean
public SolrTemplate solrTemplate(SolrClient solrClient) throws Exception {
return new SolrTemplate(solrClient);
}
}
The pom.xml file is changed to below settings,
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-solr</artifactId>
<version>2.1.6.RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-common</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>4.3.10.RELEASE</version>
</dependency>
Now everything works fine.
I included solr-common because the project throws an error without it. My project is a Spring project running on version 4 with some other old dependencies, so I guess the updated Solr-related dependencies conflict with the existing ones, and this setup resolves the conflict.
I used spring-data-solr, which is a bit different from using SolrJ directly. A SolrJ implementation can be found here
I am working with Spark 1.6.0 and Cassandra 3.1.1, and I tried to connect to a Cassandra database from a Java Spark application. There is no error while building, but I get the following error when I run the application:
Exception in thread "main" java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:51)
at com.datastax.spark.connector.cql.CassandraConnector$.log(CassandraConnector.scala:144)
at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
at com.datastax.spark.connector.cql.CassandraConnector$.logDebug(CassandraConnector.scala:144)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:154)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$4.apply(CassandraConnector.scala:151)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$4.apply(CassandraConnector.scala:151)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:36)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:61)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:72)
at com.test.cassandra.spark.Main.generateData(Main.java:30)
at com.test.cassandra.spark.Main.run(Main.java:21)
at com.test.cassandra.spark.Main.main(Main.java:163)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My code:
import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import java.io.Serializable;
public class Main implements Serializable {
private transient SparkConf sconf;
private static final String keySpaceName = "java_api";
private static final String primaryTableName = "test_cassandra";
private Main(SparkConf conf) {
this.sconf = conf;
}
private void run() {
JavaSparkContext sc = new JavaSparkContext(sconf);
generateData(sc);
sc.stop();
}
private void generateData(JavaSparkContext sc) {
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
try (Session session = connector.openSession()) {
System.out.println("connected to cassandra");
session.execute("DROP KEYSPACE IF EXISTS java_api");
session.execute("CREATE KEYSPACE java_api WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
session.execute("CREATE TABLE java_api.sales (id UUID PRIMARY KEY, product INT, price DECIMAL)");
session.execute("CREATE TABLE java_api.summaries (product INT PRIMARY KEY, summary DECIMAL)");
System.out.println("connected");
}
}
public static void main(String[] args) {
if (args.length != 2) {
System.err
.println("Syntax: com.datastax.spark.demo.Main <Spark Master URL> <Cassandra contact point>");
System.exit(1);
}
SparkConf conf = new SparkConf()
.set("spark.cassandra.connection.host", "localhost")
.set("spark.cassandra.connection.native.port", "9042");
conf.setAppName("Java API demo");
conf.setMaster(args[0]);
//conf.set("spark.cassandra.connection.host", "127.0.0.1");
Main app = new Main(conf);
app.run();
}
}
My pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.test</groupId>
<artifactId>cassandra-spark</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<!--Spark Cassandra Connector -->
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.5.0-M3</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.5.0-M3</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.0.0-rc1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.0</version>
</dependency>
</dependencies>
</project>
This may happen because some class has changed incompatibly since the currently executing method was last compiled.
A different Java version, for example, can cause this.
See the answers to this question:
Spark streaming StreamingContext.start() - Error starting receiver 0
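One way to investigate an incompatible class change is to print where the JVM actually loaded a suspect class from, so you can see which JAR (and therefore which version) won on the classpath. The following is only a minimal sketch: the class name JarLocator and the default class name passed in main are assumptions for illustration, not part of the original project.

```java
import java.security.CodeSource;

public class JarLocator {

    // Returns a human-readable description of where the given class was loaded from.
    public static String describe(String className) {
        try {
            // 'false' avoids running static initializers; we only want to locate the class.
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null) {
                // Typically the case for core JDK classes.
                return className + " loaded from an unknown location (no code source)";
            }
            return className + " loaded from " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " not found on the classpath";
        }
    }

    public static void main(String[] args) {
        // The default class name is only an example; pass any fully qualified name.
        String name = args.length > 0 ? args[0] : "org.apache.spark.SparkConf";
        System.out.println(describe(name));
    }
}
```

Running it with the same classpath as the failing application shows whether the class comes from the JAR you expect.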
This issue seems to be caused by a logging conflict between Spark and Cassandra. I was getting this error while using the dependency below:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.2"
Switching to the following Cassandra connector version resolved the issue:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.5"
I am trying to write an application for real-time processing with Apache Storm, Kafka, and Trident,
but when initializing TridentKafkaConfig I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: kafka/api/OffsetRequest
at storm.kafka.KafkaConfig.<init>(KafkaConfig.java:43)
at storm.kafka.trident.TridentKafkaConfig.<init>(TridentKafkaConfig.java:30)
at spout.TestSpout.<clinit>(TestSpout.java:22)
at IOTTridentTopology.initializeTridentTopology(IOTTridentTopology.java:31)
at IOTTridentTopology.main(IOTTridentTopology.java:26)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.ClassNotFoundException: kafka.api.OffsetRequest
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 10 more
My spout class is:
public class TestSpout extends OpaqueTridentKafkaSpout {
private static TridentKafkaConfig config;
private static BrokerHosts HOSTS = new ZkHosts(TridentConfig.ZKHOSTS);
private static String TOPIC = "test";
private static int BUFFER_SIZE = TridentConfig.BUFFER_SIZE;
static{
config = new TridentKafkaConfig(HOSTS, TOPIC);
config.scheme = new SchemeAsMultiScheme(new RawScheme());
config.bufferSizeBytes = BUFFER_SIZE;
}
public TestSpout(TridentKafkaConfig config) {
super(config);
}
public TestSpout() {
super(config);
}
}
Main class:
public static void main(String[] args) {
initializeTridentTopology();
}
private static void initializeTridentTopology() {
TridentTopology topology = new TridentTopology();
TestSpout spout = new TestSpout();
//////////////// test //////////////////////
topology.newStream("testspout", spout).each(spout.getOutputFields(), new TestFunction(), new Fields());
/////////////// end test ///////////////////
LocalCluster cluster = new LocalCluster();
Config config = new Config();
config.setDebug(false);
config.setMaxTaskParallelism(1);
config.registerSerialization(storm.kafka.trident.GlobalPartitionInformation.class);
config.registerSerialization(java.util.TreeMap.class);
config.setNumWorkers(5);
config.setFallBackOnJavaSerialization(true);
cluster.submitTopology("KafkaTrident", config, topology.build());
}
And my pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>IOT</groupId>
<artifactId>ver0.1</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.3</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.9.3</version>
</dependency>
</dependencies>
</project>
I tried different versions of storm-kafka (0.9.3, 0.9.4, 0.9.5, 0.9.6, and 0.10.0) and storm-core (0.9.3, 0.9.4, and 0.9.6),
but I still see the same error.
By googling I found this link, but ...
ClassNotFoundException: kafka.api.OffsetRequest
After some more googling I found this link:
https://github.com/wurstmeister/storm-kafka-0.8-plus-test
and found my answer in its pom.xml file.
Adding the dependency below, with a compatible version of Kafka, resolved the problem:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>0.9.0.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
If you deploy a Storm topology with LocalCluster, you need to add the Kafka library to your dependencies (for Storm 0.10.0):
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.9.2</artifactId>
<version>0.8.1.1</version>
</dependency>
The kafka.api.OffsetRequest class is missing because org.apache.kafka is a provided dependency of storm-kafka, so Maven does not pull it in transitively:
http://mvnrepository.com/artifact/org.apache.storm/storm-kafka/0.10.0
See the Provided Dependencies section on that page for details.
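Since a provided dependency only shows up as a failure at runtime, it can help to check for the required classes before building the topology. The following is a minimal sketch, assuming a hypothetical helper class named ClasspathCheck. Only the class name kafka.api.OffsetRequest comes from the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    // Returns the subset of the given class names that cannot be loaded.
    public static List<String> missing(String... classNames) {
        List<String> absent = new ArrayList<String>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // 'false' means: locate the class but do not run its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                absent.add(name);
            } catch (LinkageError e) {
                // The class was found but could not be linked (for example, a version conflict).
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // kafka.api.OffsetRequest is the class reported in the stack trace.
        List<String> absent = missing("kafka.api.OffsetRequest");
        if (absent.isEmpty()) {
            System.out.println("All required classes are on the classpath");
        } else {
            System.err.println("Missing classes, check your Kafka dependency: " + absent);
        }
    }
}
```

If the check reports kafka.api.OffsetRequest as missing, the Kafka artifact has to be declared explicitly in the pom.xml, as shown above.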