Spark and Java: Exception thrown in awaitResult - java

I am trying to connect to a Spark cluster running in a virtual machine (IP 10.20.30.50, port 7077) from a Java application and run the word count example:
SparkConf conf = new SparkConf().setMaster("spark://10.20.30.50:7077").setAppName("wordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> textFile = sc.textFile("hdfs://localhost:8020/README.md");
String result = Long.toString(textFile.count());
JavaRDD<String> words = textFile.flatMap((FlatMapFunction<String, String>) s -> Arrays.asList(s.split(" ")).iterator());
JavaPairRDD<String, Integer> pairs = words.mapToPair((PairFunction<String, String, Integer>) s -> new Tuple2<>(s, 1));
JavaPairRDD<String, Integer> counts = pairs.reduceByKey((Function2<Integer, Integer, Integer>) (a, b) -> a + b);
counts.saveAsTextFile("hdfs://localhost:8020/tmp/output");
sc.stop();
return result;
The Java application shows the following stack trace:
Running Spark version 2.0.1
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Changing view acls to: lii5ka
Changing modify acls to: lii5ka
Changing view acls groups to:
Changing modify acls groups to:
SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lii5ka); groups with view permissions: Set(); users with modify permissions: Set(lii5ka); groups with modify permissions: Set()
Successfully started service 'sparkDriver' on port 61267.
Registering MapOutputTracker
Registering BlockManagerMaster
Created local directory at /private/var/folders/4k/h0sl02993_99bzt0dzv759000000gn/T/blockmgr-51de868d-3ba7-40be-8c53-f881f97ced63
MemoryStore started with capacity 2004.6 MB
Registering OutputCommitCoordinator
Logging initialized @48403ms
jetty-9.2.z-SNAPSHOT
Started o.s.j.s.ServletContextHandler@1316e7ec{/jobs,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@782de006{/jobs/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2d0353{/jobs/job,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@381e24a0{/jobs/job/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@1c138dc8{/stages,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@b29739c{/stages/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@63f6de31{/stages/stage,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2a04ddcb{/stages/stage/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2af9688e{/stages/pool,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@6a0c5bde{/stages/pool/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@3f5e17f8{/storage,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@33b86f5d{/storage/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5264dcbc{/storage/rdd,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5a3ebf85{/storage/rdd/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@159082ed{/environment,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@6522c585{/environment/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@115774a1{/executors,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@3e3a3399{/executors/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2f2c5959{/executors/threadDump,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5c51afd4{/executors/threadDump/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@76893a83{/static,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@19c07930{/,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@54eb0dc0{/api,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5953786{/stages/stage/kill,null,AVAILABLE}
Started ServerConnector@2eeb8bd6{HTTP/1.1}{0.0.0.0:4040}
Started @48698ms
Successfully started service 'SparkUI' on port 4040.
Bound SparkUI to 0.0.0.0, and started at http://192.168.0.104:4040
Connecting to master spark://10.20.30.50:7077...
Successfully created connection to /10.20.30.50:7077 after 25 ms (0 ms spent in bootstraps)
Connecting to master spark://10.20.30.50:7077...
Still have 2 requests outstanding when connection from /10.20.30.50:7077 is closed
Failed to connect to master 10.20.30.50:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
Caused by: java.io.IOException: Connection from /10.20.30.50:7077 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:128) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:109) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:257) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
... 1 common frames omitted
In the Spark Master log on 10.20.30.50, I get the following error message:
16/11/05 14:47:20 ERROR OneForOneStrategy: Error while decoding incoming Akka PDU of length: 1298
akka.remote.transport.AkkaProtocolException: Error while decoding incoming Akka PDU of length: 1298
Caused by: akka.remote.transport.PduCodecException: Decoding PDU failed.
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:167)
at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:580)
at akka.remote.transport.ProtocolStateActor$$anonfun$4.applyOrElse(AkkaProtocolTransport.scala:375)
at akka.remote.transport.ProtocolStateActor$$anonfun$4.applyOrElse(AkkaProtocolTransport.scala:343)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at akka.actor.FSM$class.processEvent(FSM.scala:604)
at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:269)
at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:598)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:592)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.remote.transport.ProtocolStateActor.aroundReceive(AkkaProtocolTransport.scala:269)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
at akka.remote.WireFormats$AkkaProtocolMessage.<init>(WireFormats.java:6643)
at akka.remote.WireFormats$AkkaProtocolMessage.<init>(WireFormats.java:6607)
at akka.remote.WireFormats$AkkaProtocolMessage$1.parsePartialFrom(WireFormats.java:6703)
at akka.remote.WireFormats$AkkaProtocolMessage$1.parsePartialFrom(WireFormats.java:6698)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at akka.remote.WireFormats$AkkaProtocolMessage.parseFrom(WireFormats.java:6821)
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
... 19 more
Additional Information
The example works fine when I use new SparkConf().setMaster("local") instead
I can connect to the Spark Master with spark-shell --master spark://10.20.30.50:7077 on the very same machine

This looks like a network error at first (but actually is NOT): it is a Spark version mismatch in disguise. Point your application to the correct version of the Spark jars, mostly the assembly jars.
This issue can happen due to a version mismatch in a Hadoop RPC call that uses Protocol Buffers: an InvalidProtocolBufferException is thrown when a protocol message being parsed is invalid in some way, e.g. it contains a malformed varint or a negative byte length.
In my experience with protobuf, an InvalidProtocolBufferException can only happen when the message could not be parsed (programmatically: if you are parsing a protobuf message, maybe the message length is zero or the message is corrupted...).
Spark uses Akka actors for message passing between the master/driver and the workers, and internally Akka uses Google's protobuf to communicate. See the method below from AkkaPduCodec.scala:
override def decodePdu(raw: ByteString): AkkaPdu = {
  try {
    val pdu = AkkaProtocolMessage.parseFrom(raw.toArray)
    if (pdu.hasPayload) Payload(ByteString(pdu.getPayload.asReadOnlyByteBuffer()))
    else if (pdu.hasInstruction) decodeControlPdu(pdu.getInstruction)
    else throw new PduCodecException("Error decoding Akka PDU: Neither message nor control message were contained", null)
  } catch {
    case e: InvalidProtocolBufferException ⇒ throw new PduCodecException("Decoding PDU failed.", e)
  }
}
But in your case, since the versions mismatch, a message written by the new protobuf version cannot be parsed by the old version of the parser, or something along those lines.
If you are using Maven, please also review your other dependencies.
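One quick sanity check (my suggestion, not part of the original answer): print the Spark version your driver classpath actually loads and compare it with the version banner in the master's web UI (port 8080 by default). A minimal sketch, assuming a local master just to inspect the classpath:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class VersionCheck {
    public static void main(String[] args) {
        // Run locally; we only want to see which Spark version is on the classpath.
        SparkConf conf = new SparkConf().setMaster("local").setAppName("versionCheck");
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Driver-side Spark version: " + sc.version());
        sc.stop();
    }
}
If the printed version differs from the one the master reports, align the pom dependency accordingly.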

It turned out that I had Spark version 1.5.2 running in the virtual machine while I was using version 2.0.1 of the Spark library in Java. I fixed the issue by using the matching Spark library version in my pom.xml, which is
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
</dependency>
Another problem (that occurred later) was that I also had to pin the Scala version against which the library was built. This is the _2.10 suffix in the artifactId.
Basically, @RamPrassad's answer pointed me in the right direction but didn't give clear advice on what I needed to do to fix my problem.
By the way: I couldn't update Spark in the virtual machine, since it came as part of the HortonWorks distribution...

This error also happens when you have strange characters that you want to insert into a table, for example:
INSERT INTO mytable
(select 'GARDENﬠy para el uso para el control de rat쟣om﷠rata' as text)
So the solution is to change the text encoding or to remove these special characters, like this:
INSERT INTO mytable
(select regexp_replace('GARDENﬠy para el uso para el control de rat쟣om﷠rata',
'[\u0100-\uffff]', '') as text)
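If the cleanup has to happen in application code before the INSERT rather than in SQL, a minimal Java sketch of the same idea (the character range mirrors the regexp_replace above; adjust it to whatever your column encoding accepts):
// Strip everything from U+0100 upwards, mirroring the regexp_replace above.
String raw = "GARDENﬠy para el uso para el control de rat쟣om﷠rata";
String cleaned = raw.replaceAll("[\\u0100-\\uffff]", "");
// cleaned now contains only characters below U+0100 and is safe to insert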

I got this same error while trying to read data from my SQL table. My particular issue was that I had not given the SQL user sufficient permissions to read the data.
My SQL user had permission to run a SELECT statement, but when I looked at the log via select * from stv_recents (on Redshift) I saw that it also performs an UNLOAD command to S3. Here's a resource on GRANTing permissions on Redshift.
Not sure if this is still relevant for you seeing that this was almost 6 years ago, but I see this in your error:
SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lii5ka); groups with view permissions: Set(); users with modify permissions: Set(lii5ka); groups with modify permissions: Set()
which looks like you also didn't have sufficient permissions set on your SQL user
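For illustration, a hedged sketch of granting read access over JDBC; the exact GRANT statement, connection URL, and credentials are placeholders and depend on your Redshift schema and objects:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GrantExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; run as a user allowed to GRANT.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:redshift://cluster-host:5439/mydb", "admin", "secret");
             Statement stmt = conn.createStatement()) {
            stmt.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO myuser");
        }
    }
}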

Related

Elasticsearch RestClient Connection reset by peer

In my AWS VPC I have an ES cluster with 2 nodes. On top of those nodes I have a load balancer. In the same VPC I have a microservice that accesses Elasticsearch via RestHighLevelClient version 7.5.2.
I create the client in the following manner :
public class ESClientWrapper {

    @Getter
    private RestHighLevelClient client;

    public ESClientWrapper() throws IOException {
        FileInputStream propertiesFile = new FileInputStream("/var/elastic.properties");
        Properties properties = new Properties();
        properties.load(propertiesFile);

        RestClientBuilder builder = RestClient.builder(new HttpHost(
                properties.getProperty("host"),
                Integer.parseInt(properties.getProperty("port"))
        ));
        this.client = new RestHighLevelClient(builder);
    }
}
When my microservice doesn't receive requests for a long time (12h or so), the first request sent afterwards (or a few of the ones after it) fails with the following error:
2020-09-09 07:03:13.106 INFO 1 --- [nio-8080-exec-1] c.a.a.services.CustomersMetadataService : Trying to add the following role : {role=a2}
2020-09-09 07:03:13.106 INFO 1 --- [nio-8080-exec-1] c.a.a.e.repositories.ESRepository : Trying to insert the following document to app-index : {role=a2}
2020-09-09 07:03:13.109 ERROR 1 --- [nio-8080-exec-1] c.a.a.e.dal.ESRepository : Failed to add customer : {role=a2}
java.io.IOException: Connection reset by peer
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:871) ~[elasticsearch-rest-high-level-client-7.5.2.jar!/:7.5.2]
....
....
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.35.jar!/:9.0.35]
at java.base/java.lang.Thread.run(Thread.java:836) ~[na:na]
Caused by: java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:na]
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:na]
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[na:na]
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[na:na]
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[na:na]
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358) ~[na:na]
at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:231) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
... 1 common frames omitted
2020-09-09 07:06:55.109 INFO 1 --- [nio-8080-exec-2] c.a.a.services.MyService : Trying to add the following role : {role=a2}
2020-09-09 07:06:55.109 INFO 1 --- [nio-8080-exec-2] c.a.a.e.repositories.ESRepository : Trying to insert the following document to index app-index: {role=a2}
2020-09-09 07:06:55.211 INFO 1 --- [nio-8080-exec-2] c.a.a.e.dal.ESRepository : IndexResponse[index=app-index,type=_doc,id=x532323272533321870287,version=1,result=created,seqNo=70,primaryTerm=1,shards={"total":2,"successful":2,"failed":0}]
As you can see, 3 minutes after the failed request the next request was handled successfully by ES. What can kill the request? I checked the Elasticsearch logs and didn't see any indication of a killed connection. The microservice is in the same VPC as Elasticsearch, so the traffic isn't passing through any firewall that might kill it.
I found an issue on GitHub that suggested increasing the default connection timeout, but I'm wondering if the issue here is really a timeout problem and if increasing the default is really the best solution.
Also, I found this bug opened in their repo regarding the same problem, but without any answers.
UPDATE
I noticed that this happens even when my service has been up for only 10 minutes. My service started and sent a query to ES and everything worked well. After 10 minutes I sent an insert request and it failed with connection reset by peer.
In the end I didn't find a problem in my configuration/implementation. It seems like a bug in the implementation of Elasticsearch's RestHighLevelClient.
I implemented a retry mechanism that wraps the RestHighLevelClient and retries the query if I get the same error. I used the Spring @Retry annotation for this solution.
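A minimal sketch of such a retry wrapper, assuming Spring Retry is on the classpath and @EnableRetry is declared on a configuration class (Spring Retry's annotation is spelled @Retryable; class and method names here are placeholders):
import java.io.IOException;

import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class RetryingESRepository {

    private final RestHighLevelClient client;

    public RetryingESRepository(RestHighLevelClient client) {
        this.client = client;
    }

    // Retry up to 3 times with a short backoff when the connection is reset.
    @Retryable(value = IOException.class, maxAttempts = 3, backoff = @Backoff(delay = 500))
    public IndexResponse index(IndexRequest request) throws IOException {
        return client.index(request, RequestOptions.DEFAULT);
    }
}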
I was facing the same issue. Everything worked fine, but after some time a single request got refused.
The solution (in my case) was to set the keepalive property of the TCP connection:
final RestClientBuilder restClientBuilder = RestClient.builder(...);
restClientBuilder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
    .setDefaultIOReactorConfig(IOReactorConfig.custom()
        .setSoKeepAlive(true)
        .build()));
Found here:
https://github.com/elastic/elasticsearch/issues/65213
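For completeness, a sketch of the builder from the snippet above actually applied to a RestHighLevelClient (host, port, and scheme are placeholders):
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

// Build the client with TCP keepalive enabled so idle connections are not
// silently dropped between requests.
RestClientBuilder builder = RestClient.builder(new HttpHost("es-host", 9200, "https"));
builder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
        .setDefaultIOReactorConfig(IOReactorConfig.custom()
                .setSoKeepAlive(true)
                .build()));
RestHighLevelClient client = new RestHighLevelClient(builder);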

Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read

I am trying to read from BigQuery using the Java BigQueryIO.read method, but I am getting the error below.
public POutput expand(PBegin pBegin) {
    final String queryOperation = "select query";
    return pBegin
        .apply(BigQueryIO.readTableRows().fromQuery(queryOperation));
}
2020-06-08 19:32:01.391 IST Error message from worker: java.io.IOException: Failed to start reading from source: org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter@77f0db34
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:792)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read
org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.advance(UnboundedReadFromBoundedSource.java:467)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.access$300(UnboundedReadFromBoundedSource.java:446)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.advance(UnboundedReadFromBoundedSource.java:298)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.start(UnboundedReadFromBoundedSource.java:291)
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:787)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)
I suppose that this issue might be connected with a missing tempLocation pipeline execution parameter when you are using DataflowRunner for cloud execution.
According to the documentation:
If tempLocation is not specified and gcpTempLocation is, tempLocation will not be populated.
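A minimal sketch of setting tempLocation explicitly, assuming the DataflowRunner (the bucket path is a placeholder):
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class TempLocationExample {
    public static void main(String[] args) {
        DataflowPipelineOptions options =
                PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        // Explicit staging area for the BigQuery export files.
        options.setTempLocation("gs://your-bucket/temp"); // placeholder bucket
        Pipeline pipeline = Pipeline.create(options);
        // ... apply(BigQueryIO.readTableRows().fromQuery(...)) here ...
        pipeline.run();
    }
}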
Since this is just my presumption, I'll also encourage you to inspect the native Apache Beam runtime logs to gather more evidence about the issue, as long as the Stackdriver logs don't reflect the full picture of the problem.
A separate Jira tracker thread, BEAM-9043, was raised about this vaguely worded error message.
Feel free to add more specific information to your original question for any further concerns or essential updates.

FusionAuth incomplete reindex with AWS Elasticsearch

I am migrating from a self-hosted Elasticsearch FusionAuth search to an AWS Elasticsearch Service solution.
I have a new FusionAuth app EC2 instance reading from the in-use database that is configured to use the new Elasticsearch service.
On triggering a reindex from the new app instance I see that only around 60k or 62.5k documents are being written to the new index when I am expecting roughly 6mil.
I see no errors from AWS's Elasticsearch Service and in the app's logs I can see: (endpoint intentionally omitted)
Feb 13, 2020 10:18:46.116 AM INFO io.fusionauth.api.service.search.ElasticSearchClientProvider - Connecting to FusionAuth Search Engine at [https://vpc-<<omitted>>.eu-west-1.es.amazonaws.com]
13-Feb-2020 11:19:55.176 INFO [http-nio-9011-exec-3] org.apache.coyote.http11.Http11Processor.service Error parsing HTTP request header
Note: further occurrences of HTTP header parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in method name. HTTP method names must be tokens
at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:430)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:684)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:808)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
"/usr/local/fusionauth/logs/fusionauth-app.log" [readonly] 43708L, 4308629C 42183,1 96%
at io.fusionauth.api.service.search.client.domain.documents.IndexUser.<init>(IndexUser.java:79)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.lambda$index$1(ElasticsearchSearchEngine.java:140)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.index(ElasticsearchSearchEngine.java:140)
at io.fusionauth.api.service.user.ReindexRunner$ReindexWorker.run(ReindexRunner.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-14" java.lang.NullPointerException
at io.fusionauth.api.service.search.client.domain.documents.IndexUser.<init>(IndexUser.java:79)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.lambda$index$1(ElasticsearchSearchEngine.java:140)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.index(ElasticsearchSearchEngine.java:140)
at io.fusionauth.api.service.user.ReindexRunner$ReindexWorker.run(ReindexRunner.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-13" java.lang.NullPointerException
Exception in thread "Thread-11" java.lang.NullPointerException
Exception in thread "Thread-12" java.lang.NullPointerException
Feb 18, 2020 10:23:29.064 AM INFO io.fusionauth.api.service.user.ReindexRunner - Reindex completed in [86797] ms or [86] seconds.
Although there are some exceptions, there is also a "Reindex completed" INFO log at the end.
Not knowing the ins and outs of Elasticsearch, I'm also not sure where to start investigating the NullPointerException.
It looks like the re-index operation is hitting an exception, which is likely the cause of the truncated index.
Exception in thread "Thread-14" java.lang.NullPointerException
at io.fusionauth.api.service.search.client.domain.documents.IndexUser.<init>(IndexUser.java:79)
This code assumes that you have a username or an email address. This should be enforced by the FusionAuth APIs. But in this case, for this exception to occur, the email and username must both be NULL.
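For illustration only (this is not the actual FusionAuth source), the failing assumption looks roughly like this:
// Hypothetical sketch: the indexer picks email or username and dereferences
// the result, so a user with both fields NULL throws a NullPointerException.
static String indexKey(String email, String username) {
    String login = (email != null) ? email : username;
    return login.toLowerCase(); // NPE when both email and username are null
}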
How did you get users into the db, using the Import API, User API, or something else?
In theory you should find at least one user with a NULL value for the email and username.
This query - or similar - should find the offending user; then we need to identify how this user was added to FusionAuth.
SELECT email, username from identities WHERE email IS NULL OR username IS NULL

Error querying Hive tables when using 'Order by' or 'Group by'

I recently installed DbVisualizer to query Hive tables. I installed it on my Mac and downloaded/installed the JDBC jar files for Hive from this website: https://s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/hive-jdbc4/1.0.42.1054/Simba_HiveJDBC41_1.0.42.1054.zip
When I connected to our DB and tested the queries, a simple select would work:
select *
from table_name
limit 10
But when I added 'order by' or 'group by':
select *
from table_name
order by rollingtime
limit 10
I got the following error, and I have no clue why. Did anyone have a similar error and know how to fix it?
09:56:17 START Executing for: 'NewDev' [Hive], Database: Hive, Schema: sdc
09:56:17 FAILED [SELECT - 0 rows, 0.504 secs] [Code: 500051, SQL State: HY000] [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 2, SQL state: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1516123265840_0008_8_00, diagnostics=[Task failed, taskId=task_1516123265840_0008_8_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.NoClassDefFoundError: Could not initialize class org.apache.tez.runtime.library.api.TezRuntimeConfiguration
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:107)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
, errorMessage=Cannot recover from this error:java.lang.NoClassDefFoundError: Could not initialize class org.apache.tez.runtime.library.api.TezRuntimeConfiguration
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:107)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:186)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1516123265840_0008_8_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1516123265840_0008_8_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1516123265840_0008_8_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1, Query: select *
from nomura_qa_mblock_capacity_stage
order by rollingtime
limit 10.
select *
from nomura_qa_mblock_capacity_stage
order by rollingtime
limit 10;
09:56:17 END Execution 1 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.504/0.000 secs [0 successful, 1 errors]
This is sometimes caused by an inability to write to the configured or hardcoded .staging directory, usually because of an incorrect path or invalid permissions.
The MapReduce exec engine is more verbose than the Tez engine in helping to identify the culprit. You can choose it by running this query in your Hive shell:
SET hive.execution.engine=mr
You may then be able to see the following error:
Permission denied: user=dbuser, access=WRITE,
inode="/user/dbuser/.staging":hdfs:hdfs:drwxr-xr-x
In this case, the "dbuser" staging directory is specified to a non-existent path, it should be /home/dbuser/.staging
At runtime, before executing any queries that perform actions which would need to be staged (sort, order, group, distribute, etc) along with setting the exec engine to mr as previously shown, you would set the staging path to a valid parent directory such as your user's home dir by running the following query:
SET yarn.app.mapreduce.am.staging-dir=/home/dbuser/.staging
Depending on your version and environment, if that directive doesn't work, you can try:
SET hive.exec.stagingdir=/home/dbuser/.staging
Of course, change "dbuser" to your actual home directory (or any other to which you have read/write access). The .staging directory, assuming write access by the user running the query will automatically be created.
More info at http://doc.mapr.com/display/MapR/Default+mapred+Parameters
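If you are applying these settings from a Java client rather than the DbVisualizer SQL commander, a hedged sketch over JDBC (URL, credentials, and table name are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveStagingExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hive-host:10000/sdc", "dbuser", "secret");
             Statement stmt = conn.createStatement()) {
            // Session-level settings suggested above, applied before the query.
            stmt.execute("SET hive.execution.engine=mr");
            stmt.execute("SET yarn.app.mapreduce.am.staging-dir=/home/dbuser/.staging");
            try (ResultSet rs = stmt.executeQuery(
                    "select * from table_name order by rollingtime limit 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}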

SSTable loader streaming failed giving java.io.IOException: Connection reset by peer

I am trying to use sstableloader to stream data to a Cassandra database, which is in fact on the same node. It used to work when I was using DSE 2.2, but when I upgraded to DSE 4.5 and made all the relevant changes in the cassandra.yaml file, it stopped working, and now it throws an error like this:
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of demo/test_yale/demo-test_yale-jb-2-Data.db demo/test_yale/demo-test_yale-jb-1-Data.db to [/127.0.0.1]
Streaming session ID: 02225ef0-1c17-11e4-a1ea-5f2d4f6a32c1
progress: [/127.0.0.1 1/2 (88%)] [total: 88% - 2147483647MB/s (avg: 14MB/s)]ERROR 16:36:29,029 [Stream #02225ef0-1c17-11e4-a1ea-5f2d4f6a32c1] Streaming error occurred
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:98)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at com.ning.compress.lzf.LZFChunk.writeCompressedHeader(LZFChunk.java:77)
at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:132)
at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:151)
at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:101)
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:363)
at java.lang.Thread.run(Thread.java:745)
WARN 16:36:29,032 [Stream #02225ef0-1c17-11e4-a1ea-5f2d4f6a32c1] Stream failed
Streaming to the following hosts failed:
[/127.0.0.1]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
I have even tried assigning the actual IP address of the node to listen_address, broadcast_address, and rpc_address in the cassandra.yaml file, but the same error occurs.
Can anyone be of assistance, please?
It's worth looking at your system.log (its location is specified in cassandra/conf/logback.xml), as suggested by Zanson.
In my case the issue was simply exhausted disk space on the node:
ERROR [STREAM-IN-/xx.xx.xx.xx] 2016-08-02 10:50:31,125 StreamSession.java:505 - [Stream #8420bfa0-589c-11e6-9512-235b1f79cf1b] Streaming error occurred
java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method) ~[na:1.8.0_101]
at java.io.RandomAccessFile.write(RandomAccessFile.java:525) ~[na:1.8.0_101]
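A trivial way to check for this condition from Java before streaming (the path is a placeholder for your data_file_directories from cassandra.yaml):
import java.io.File;

public class DiskSpaceCheck {
    public static void main(String[] args) {
        File dataDir = new File("/var/lib/cassandra/data"); // placeholder path
        System.out.printf("Usable space on %s: %.1f GB%n",
                dataDir.getPath(), dataDir.getUsableSpace() / 1e9);
    }
}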
