elasticsearch with RSS plugin hangs after restart - java

I have a clean ES server on my Mac with the RSS plugin. After adding several sources, the server hangs whenever I restart it (which I do to add additional plugins, or when restarting the Mac).
Is there a way to prevent this, or to restore the current installation?
Log:
[2015-01-14 09:43:27,870][WARN ][discovery.zen.ping.multicast] [Mar-Vell] received ping response ping_response{node [[Asmodeus][JpUdhq0FRYKr7RqDqcK7WQ][Dorons-MBP.Home][inet[/10.0.0.7:9300]]], id[1089], master [[Asmodeus][JpUdhq0FRYKr7RqDqcK7WQ][Dorons-MBP.Home][inet[/10.0.0.7:9300]]], hasJoinedOnce [true], cluster_name[elasticsearch]} with no matching id [2]
[2015-01-14 09:43:30,724][DEBUG][action.admin.cluster.health] [Mar-Vell] no known master node, scheduling a retry
[2015-01-14 09:43:57,943][WARN ][discovery.zen ] [Mar-Vell] failed to connect to master [[Asmodeus][JpUdhq0FRYKr7RqDqcK7WQ][Dorons-MBP.Home][inet[/10.0.0.7:9300]]], retrying...
org.elasticsearch.transport.ConnectTransportException: [Asmodeus][inet[/10.0.0.7:9300]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:807)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:741)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:714)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:150)
    at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:441)
    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:393)
    at org.elasticsearch.discovery.zen.ZenDiscovery.access$6000(ZenDiscovery.java:80)
    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1318)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.common.netty.channel.ConnectTimeoutException: connection timed out: /10.0.0.7:9300
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more
[2015-01-14 09:44:00,729][DEBUG][action.admin.cluster.health] [Mar-Vell] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2015-01-14 09:44:01,048][WARN ][discovery.zen.ping.multicast] [Mar-Vell] received ping response ping_response{node [[Asmodeus][JpUdhq0FRYKr7RqDqcK7WQ][Dorons-MBP.Home][inet[/10.0.0.7:9300]]], id[1095], master [[Asmodeus][JpUdhq0FRYKr7RqDqcK7WQ][Dorons-MBP.Home][inet[/10.0.0.7:9300]]], hasJoinedOnce [true], cluster_name[elasticsearch]} with no matching id [3]
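For what it's worth (no answer was posted here), the log itself gives a hint: the node keeps receiving multicast pings advertising a master at 10.0.0.7:9300 that it can never connect to, which on 1.x-era Elasticsearch is a common symptom of multicast zen discovery latching onto a stale node entry (or another instance on the LAN) after a restart. A minimal sketch of the usual workaround, assuming a single-node development install, is to bind to localhost and disable multicast in config/elasticsearch.yml; this does not touch the data directory, so the existing indices and river definitions should come back once the node forms its own cluster:
# sketch for a single-node dev install; both keys are standard 1.x settings
network.host: 127.0.0.1
discovery.zen.ping.multicast.enabled: false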

Related

The Kafka script kafka-consumer-groups.sh throws "Timed out waiting for a node assignment"

I have a Kafka cluster running with 3 brokers.
<node-ip>:10092 //broker-0
<node-ip>:10093 //broker-1
<node-ip>:10094 //broker-2
Broker-1 (<node-ip>:10093) is in a not-ready state (due to some readiness failure), but the other 2 brokers are running fine.
However, when I use the kafka-consumer-groups.sh script with a running broker's address as the bootstrap server, I get the following error.
kafka#mirror-maker-0:/opt/kafka/bin$ /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server <node-ip>:10094 --describe --group c2-c1-consumer-group --state
[2022-03-14 10:24:16,008] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2022-03-14 10:24:17,086] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2022-03-14 10:24:18,206] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2022-03-14 10:24:19,458] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Error: Executing consumer group command failed due to org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeGroups(api=DESCRIBE_GROUPS)
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeGroups(api=DESCRIBE_GROUPS)
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$describeConsumerGroups$1(ConsumerGroupCommand.scala:543)
at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.map(JavaCollectionWrappers.scala:309)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeConsumerGroups(ConsumerGroupCommand.scala:542)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsState(ConsumerGroupCommand.scala:620)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:373)
at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:72)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:59)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeGroups(api=DESCRIBE_GROUPS)
Could someone please help me understand:
Why is it connecting to a broker I did not mention (the log shows 10093, but I passed :10094)?
Is there any way to make it use only the specified bootstrap servers?
One more thing:
When I run kafka-topics.sh with the running broker's address as the bootstrap server, it returns a response.
Thanks
I faced a similar issue: I was able to read the topics, but I could not list or describe the groups. I solved it by setting a large timeout. Can you also try a large timeout?
./kafka-consumer-groups.sh --command-config kafka.properties --bootstrap-server brokers --group group --describe --timeout 100000
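As for why the client contacts a broker you never passed: the bootstrap server is most likely only used for the initial metadata fetch, after which describing a group is routed to that group's coordinator broker, which here appears to be the broker that is down. That would also explain why kafka-topics.sh works, since topic metadata can be served by any live broker. The same describe call can be made from the Java AdminClient; a minimal sketch (group name and address taken from the question, timeout values are examples):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class DescribeGroupState {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "<node-ip>:10094");
        // Equivalent of the --timeout flag above: give the client time to reach the coordinator
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "100000");
        props.put(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, "100000");
        try (AdminClient admin = AdminClient.create(props)) {
            ConsumerGroupDescription desc = admin
                    .describeConsumerGroups(Collections.singleton("c2-c1-consumer-group"))
                    .describedGroups()
                    .get("c2-c1-consumer-group")
                    .get(); // blocks until the coordinator answers or the timeout fires
            System.out.println(desc.groupId() + " state: " + desc.state());
        }
    }
}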

Flink 1.9 : Standalone Cluster : Failed to transfer file from TaskExecutor id : `AskTimeoutException`

Background
Trying to create an Apache Flink standalone cluster.
Environment : AWS
Job Manager : 1
Task Manager : 2
Config :
FLINK_PLUGINS_DIR : /usr/local/flink-1.9.1/plugins
io.tmp.dirs : /tmp/flink
jobmanager.execution.failover-strategy : region
jobmanager.heap.size : 1024m
jobmanager.rpc.address : job manager ip
jobmanager.rpc.port : 6123
jobstore.cache-size : 52428800
jobstore.expiration-time : 3600
parallelism.default : 4
slot.idle.timeout : 50000
slot.request.timeout : 300000
task.cancellation.interval : 30000
task.cancellation.timeout : 180000
task.cancellation.timers.timeout : 7500
taskmanager.exit-on-fatal-akka-error : false
taskmanager.heap.size : 1024m
taskmanager.network.bind-policy : "ip"
taskmanager.numberOfTaskSlots : 2
taskmanager.registration.initial-backoff: 500ms
taskmanager.registration.timeout : 5min
taskmanager.rpc.port : 50100-50200
web.tmpdir : /tmp/flink-web-74cce811-17c0-411e-9d11-6d91edd2e9b0
Instance Type : t2.medium (2 CPUs, 4 GB memory)
Security Group ports opened : 6123, 8081, 50100 - 50200
OS : CentOS Linux release 7.6.1810 (Core)
Java :
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
Cluster is up and running properly
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://ip:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://ip:8081.
org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@ip:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager.
org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@ip:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering TaskManager with ResourceID f2c7f664378b40ce44463713ae98e1c4 (akka.tcp://flink@TaskManager1Ip:38566/user/taskmanager_0) at ResourceManager
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering TaskManager with ResourceID 354a785f637751fb3b034618a47480ed (akka.tcp://flink@TaskManager2Ip:34400/user/taskmanager_0) at ResourceManager
The UI shows all the cluster details.
Problem
Task submission does not work:
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:871)
at akka.dispatch.OnComplete.internal(Future.scala:263)
at akka.dispatch.OnComplete.internal(Future.scala:261)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:644)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
... 9 more
2020-02-04 23:25:16,125 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/resourcemanager#-1545644127]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
at java.lang.Thread.run(Thread.java:748)
Can somebody shed some light on this? Is the problem related to ports/firewall, or is some setting messed up?
The issue was with security group port permissions. When the entire range 0 - 65535 was opened, everything started working. Still, this is not good enough for a production system, so eventually, for the job workers, the key taskmanager.data.port in the flink-conf.yaml config file was assigned a particular port, and that did the trick. This way the task managers could be configured to listen on a specific, known port.
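A minimal flink-conf.yaml sketch of that approach (the data port value is an example, not from the original post; whatever value you choose must also be opened in the security group — here it reuses the already-opened 50100-50200 range):
# RPC port range from the config above; data port pinned instead of random
taskmanager.rpc.port: 50100-50200
taskmanager.data.port: 50199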

Flink cluster deployed with "Fencing token not set exception"

What does this exception mean?
I am trying to deploy a Flink cluster (v1.5.2) with 3 nodes in HA mode (ZooKeeper).
I have following flink-conf.yaml settings:
high-availability: zookeeper
high-availability.storageDir: /flink/ha
high-availability.zookeeper.quorum: {node1_ip}:2181,{node2_ip}:2181,{node3_ip}:2181
high-availability.jobmanager.port: 50010
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.path.namespace: /default_ns
The ZooKeeper cluster is running.
After start-cluster.sh is executed, I have only one working node. The other 2 nodes return
{"errors":["Could not retrieve the redirect address of the current leader. Please try to refresh."]} from the web UI
and the following exception in flink-root-standalonesession-.log:
2018-08-07 18:55:22,081 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(aec7a76447f8d44131605f5c10fb4fdc, LocalRpc
...
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(aec7a76447f8d44131605f5c10fb4fdc, LocalRpcInvocation(requestRestAddress(T
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:59)

Cassandra sstableloader - Connection refused

I am trying to bulk load a 4-node Cassandra 3.0.10 cluster with some data. I've successfully generated the SSTables following the documentation; however, I cannot get sstableloader to load them.
I get the following java.net.ConnectException: Connection refused
bin/sstableloader -v -d localhost test-data/output/si_test/messages/
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /home/ubuntu/cassandra/apache-cassandra-3.0.10/test-data/output/si_test/messages/mc-1-big-Data.db through .../messages/mc-61-big-Data.db (61 sstables in the same directory) to [localhost/127.0.0.1]
ERROR 14:46:24 [Stream #3d0c24e0-cc43-11e6-8c9f-615437259231] Streaming error occurred
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_111]
at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_111]
at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_111]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_111]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_111]
at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:255) ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:84) ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:242) ~[apache-cassandra-3.0.10.jar:3.0.10]
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [apache-cassandra-3.0.10.jar:3.0.10]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
progress: total: 100% 0 MB/s(avg: 0 MB/s)WARN 14:46:24 [Stream #3d0c24e0-cc43-11e6-8c9f-615437259231] Stream failed
Streaming to the following hosts failed:
[localhost/127.0.0.1]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:120)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540)
at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:248)
at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The utility seems to connect to the cluster ("Established connection to initial hosts"); however, it does not stream the data.
What I've tried so far to debug the issue:
Dropped the target keyspace (and got a different error), then created it again via cqlsh
I can telnet to each node of the cluster through ports 9042 and 7000
Enabled thrift using nodetool enablethrift
EDIT
Here's the output of netstat -an | grep 7000. The nodes have only one network interface, and Cassandra is listening on it. It has also established connections with all the other nodes on port 7000.
tcp 0 0 172.31.3.88:7000 0.0.0.0:* LISTEN
tcp 0 0 172.31.3.88:7000 172.31.3.86:54348 ESTABLISHED
tcp 0 0 172.31.3.88:7000 172.31.3.87:60661 ESTABLISHED
tcp 0 0 172.31.3.88:53061 172.31.3.87:7000 ESTABLISHED
tcp 0 0 172.31.3.88:7000 172.31.11.43:36984 ESTABLISHED
tcp 0 0 172.31.3.88:51412 172.31.11.43:7000 ESTABLISHED
tcp 0 0 172.31.3.88:54018 172.31.3.87:7000 ESTABLISHED
tcp 0 0 172.31.3.88:7000 172.31.3.87:40667 ESTABLISHED
tcp 0 0 172.31.3.88:34469 172.31.3.86:7000 ESTABLISHED
tcp 0 0 172.31.3.88:43658 172.31.3.86:7000 ESTABLISHED
tcp 0 0 172.31.3.88:7000 172.31.11.43:49487 ESTABLISHED
tcp 0 0 172.31.3.88:40798 172.31.11.43:7000 ESTABLISHED
tcp 0 0 172.31.3.88:7000 172.31.3.86:51537 ESTABLISHED
EDIT 2
Changing the initial host from 127.0.0.1 to the actual address of the node in the network results in a com.datastax.driver.core.exceptions.TransportException:
bin/sstableloader -v -d 172.31.3.88 test-data/output/si_test/messages/
All host(s) tried for query failed (tried: /172.31.3.88:9042 (com.datastax.driver.core.exceptions.TransportException: [/172.31.3.88] Cannot connect))
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /172.31.3.88:9042 (com.datastax.driver.core.exceptions.TransportException: [/172.31.3.88] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.init(Cluster.java:163)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:334)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:309)
at com.datastax.driver.core.Cluster.connect(Cluster.java:251)
at org.apache.cassandra.utils.NativeSSTableLoaderClient.init(NativeSSTableLoaderClient.java:70)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:159)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:104)
Any suggestion is appreciated.
Thanks
That's it trying to connect to the storage port (7000). It's most likely binding to a different interface than 127.0.0.1. You can check what it's binding to with netstat -an | grep 7000. You may want to double-check any firewall or iptables settings.
UPDATE: it's not bound to 127.0.0.1 (the default) but to 172.31.3.88, so call sstableloader -v -d 172.31.3.88 test-data/output/si_test/messages/
Also, if you have SSL enabled (server_encryption_options in cassandra.yaml), you need to use port 7001 and configure it to match. If you can telnet to 7000, it's most likely not that, though.
Worth noting that enabling Thrift is not necessary in 3.0.10; sstableloader no longer uses it (in older versions it was used to read the schema).
Solved by changing the rpc_address in the cassandra.yaml file from the default localhost to the actual address of each node.
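For reference, a cassandra.yaml sketch of that fix, using the address from the netstat output above (set the corresponding address on each node, then restart it; note that if you bind rpc_address to 0.0.0.0 instead, you must also set broadcast_rpc_address):
# per-node sketch; 172.31.3.88 is this node's interface from the netstat output
rpc_address: 172.31.3.88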

Spark and Java: Exception thrown in awaitResult

I am trying to connect to a Spark cluster running within a virtual machine (IP 10.20.30.50, port 7077) from a Java application and run the word count example:
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("spark://10.20.30.50:7077").setAppName("wordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> textFile = sc.textFile("hdfs://localhost:8020/README.md");
String result = Long.toString(textFile.count());
JavaRDD<String> words = textFile.flatMap((FlatMapFunction<String, String>) s -> Arrays.asList(s.split(" ")).iterator());
JavaPairRDD<String, Integer> pairs = words.mapToPair((PairFunction<String, String, Integer>) s -> new Tuple2<>(s, 1));
JavaPairRDD<String, Integer> counts = pairs.reduceByKey((Function2<Integer, Integer, Integer>) (a, b) -> a + b);
counts.saveAsTextFile("hdfs://localhost:8020/tmp/output");
sc.stop();
return result;
The Java application shows the following stack trace:
Running Spark version 2.0.1
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Changing view acls to: lii5ka
Changing modify acls to: lii5ka
Changing view acls groups to:
Changing modify acls groups to:
SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lii5ka); groups with view permissions: Set(); users with modify permissions: Set(lii5ka); groups with modify permissions: Set()
Successfully started service 'sparkDriver' on port 61267.
Registering MapOutputTracker
Registering BlockManagerMaster
Created local directory at /private/var/folders/4k/h0sl02993_99bzt0dzv759000000gn/T/blockmgr-51de868d-3ba7-40be-8c53-f881f97ced63
MemoryStore started with capacity 2004.6 MB
Registering OutputCommitCoordinator
Logging initialized @48403ms
jetty-9.2.z-SNAPSHOT
Started o.s.j.s.ServletContextHandler@1316e7ec{/jobs,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@782de006{/jobs/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2d0353{/jobs/job,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@381e24a0{/jobs/job/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@1c138dc8{/stages,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@b29739c{/stages/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@63f6de31{/stages/stage,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2a04ddcb{/stages/stage/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2af9688e{/stages/pool,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@6a0c5bde{/stages/pool/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@3f5e17f8{/storage,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@33b86f5d{/storage/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5264dcbc{/storage/rdd,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5a3ebf85{/storage/rdd/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@159082ed{/environment,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@6522c585{/environment/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@115774a1{/executors,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@3e3a3399{/executors/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2f2c5959{/executors/threadDump,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5c51afd4{/executors/threadDump/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@76893a83{/static,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@19c07930{/,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@54eb0dc0{/api,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5953786{/stages/stage/kill,null,AVAILABLE}
Started ServerConnector@2eeb8bd6{HTTP/1.1}{0.0.0.0:4040}
Started @48698ms
Successfully started service 'SparkUI' on port 4040.
Bound SparkUI to 0.0.0.0, and started at http://192.168.0.104:4040
Connecting to master spark://10.20.30.50:7077...
Successfully created connection to /10.20.30.50:7077 after 25 ms (0 ms spent in bootstraps)
Connecting to master spark://10.20.30.50:7077...
Still have 2 requests outstanding when connection from /10.20.30.50:7077 is closed
Failed to connect to master 10.20.30.50:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
Caused by: java.io.IOException: Connection from /10.20.30.50:7077 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:128) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:109) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:257) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
... 1 common frames omitted
In the Spark Master log on 10.20.30.50, I get the following error message:
16/11/05 14:47:20 ERROR OneForOneStrategy: Error while decoding incoming Akka PDU of length: 1298
akka.remote.transport.AkkaProtocolException: Error while decoding incoming Akka PDU of length: 1298
Caused by: akka.remote.transport.PduCodecException: Decoding PDU failed.
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:167)
at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:580)
at akka.remote.transport.ProtocolStateActor$$anonfun$4.applyOrElse(AkkaProtocolTransport.scala:375)
at akka.remote.transport.ProtocolStateActor$$anonfun$4.applyOrElse(AkkaProtocolTransport.scala:343)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at akka.actor.FSM$class.processEvent(FSM.scala:604)
at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:269)
at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:598)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:592)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.remote.transport.ProtocolStateActor.aroundReceive(AkkaProtocolTransport.scala:269)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
at akka.remote.WireFormats$AkkaProtocolMessage.<init>(WireFormats.java:6643)
at akka.remote.WireFormats$AkkaProtocolMessage.<init>(WireFormats.java:6607)
at akka.remote.WireFormats$AkkaProtocolMessage$1.parsePartialFrom(WireFormats.java:6703)
at akka.remote.WireFormats$AkkaProtocolMessage$1.parsePartialFrom(WireFormats.java:6698)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at akka.remote.WireFormats$AkkaProtocolMessage.parseFrom(WireFormats.java:6821)
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
... 19 more
Additional Information
The example works fine when I use new SparkConf().setMaster("local") instead
I can connect to the Spark Master with spark-shell --master spark://10.20.30.50:7077 on the very same machine
This looks like a network error at first glance, but it is actually NOT: it is a Spark version mismatch in disguise. You should point to the correct version of the Spark jars, mostly the assembly jars.
This issue can happen due to a version mismatch in a Hadoop RPC call that uses Protocol Buffers:
a protocol message being parsed is invalid in some way, e.g. it contains a malformed varint or a negative byte length.
In my experience with protobuf, InvalidProtocolBufferException can happen only when the message could not be parsed (programmatically, if you are parsing a protobuf message, maybe the message length is zero or the message is corrupted...).
Spark uses Akka actors for message passing between the master/driver and the workers, and internally Akka uses Google's protobuf to communicate. See the method below from AkkaPduCodec.scala:
override def decodePdu(raw: ByteString): AkkaPdu = {
try {
val pdu = AkkaProtocolMessage.parseFrom(raw.toArray)
if (pdu.hasPayload) Payload(ByteString(pdu.getPayload.asReadOnlyByteBuffer()))
else if (pdu.hasInstruction) decodeControlPdu(pdu.getInstruction)
else throw new PduCodecException("Error decoding Akka PDU: Neither message nor control message were contained", null)
} catch {
case e: InvalidProtocolBufferException ⇒ throw new PduCodecException("Decoding PDU failed.", e)
}
}
But in your case, since it is a version mismatch, a message from the newer protobuf version cannot be parsed by the older version of the parser, or something along those lines.
If you are using Maven, please also review your other dependencies.
It turned out that I had Spark version 1.5.2 running in the virtual machine and used version 2.0.1 of the Spark library in Java. I fixed the issue by using the appropriate Spark library version in my pom.xml, which is:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
Another problem (that occurred later) was that I also had to pin the Scala version with which the library was built; this is the _2.10 suffix in the artifactId.
Basically, @RamPrassad's answer pointed me in the right direction but didn't give clear advice on what I needed to do to fix my problem.
By the way: I couldn't update Spark in the virtual machine, since it came with the HortonWorks distribution...
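If you need to confirm a suspected mismatch before touching the pom, one low-effort check (a sketch, not something from the original answers) is to print the library version on the driver's classpath and compare it with the version shown on the master's web UI (http://10.20.30.50:8080 in a default standalone setup):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Runs entirely locally, so it works even when the cluster connection fails
SparkConf conf = new SparkConf().setMaster("local").setAppName("versionCheck");
JavaSparkContext sc = new JavaSparkContext(conf);
System.out.println("Driver-side Spark library version: " + sc.version());
sc.stop();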
This error also happens when you have strange characters that you want to insert into a table, for example:
INSERT INTO mytable
(select 'GARDENﬠy para el uso para el control de rat쟣om﷠rata' as text)
So the solution is to change the text encoding, or to remove these special characters, like this:
INSERT INTO mytable
(select regexp_replace('GARDENﬠy para el uso para el control de rat쟣om﷠rata',
'[\u0100-\uffff]', '') as text)
I got this same error while trying to read in data from my SQL table. My particular issue was that I did not give the SQL user sufficient permissions to read the data.
My SQL user had permission to run a SELECT statement, but when I looked at the log via select * from stv_recents (on Redshift), I saw that it also performs an UNLOAD command to S3; see the Redshift documentation on GRANT for the permissions needed.
Not sure if this is still relevant for you, seeing that this was almost 6 years ago, but I see this in your error:
SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lii5ka); groups with view permissions: Set(); users with modify permissions: Set(lii5ka); groups with modify permissions: Set()
which looks like you also didn't have sufficient permissions set on your SQL user.
