I am trying to evaluate Hive LLAP on a Hortonworks HDP 2.6 cluster.
Unfortunately, I get a java.lang.RuntimeException: ORC split generation failed when trying to execute queries:
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
... 4 more
]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : AM_CPU_MILLISECONDS: 840
INFO : AM_GC_TIME_MILLIS: 23
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
INFO : Resetting the caller context to HIVE_SSN_ID:2415846c-fb92-480f-b869-240a0b0f30ed
INFO : Completed executing command(queryId=hive_20170831085623_51baf1df-5823-459e-80cf-76fa1f81789f); Time taken: 0.342 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)
Here's some information to complete the picture:
I've enabled LLAP through the Ambari interface, installed HiveServer2 Interactive, and restarted the services. I am using Beeline version 1.2.1000.2.6.0 to connect to it.
The table being queried is stored in ORC format. The ORC files are generated by an ETL pipeline and written by Java code using ORC Core. I have no problem querying the table without LLAP through the regular HiveServer2.
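For reference, the writer side follows the standard ORC Core pattern, roughly like the sketch below (the schema, path, and values are placeholders, not the actual pipeline code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder schema; the real table has many more columns.
    TypeDescription schema = TypeDescription.fromString("struct<id:bigint,bytes:bigint>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/gprs_records.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector id = (LongColumnVector) batch.cols[0];
    LongColumnVector bytes = (LongColumnVector) batch.cols[1];
    for (long r = 0; r < 100_000; r++) {
      int row = batch.size++;
      id.vector[row] = r;
      bytes.vector[row] = r * 10;
      if (batch.size == batch.getMaxSize()) {  // flush a full batch
        writer.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size != 0) {
      writer.addRowBatch(batch);
    }
    writer.close();
  }
}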
Any advice is really appreciated. Thank you all!
The issue seems to be related to several known problems with ORC split generation. Try running the query with the following setting (it can be set per session with a SET statement in Beeline before the query):
hive.exec.orc.split.strategy=BI
https://community.hortonworks.com/questions/131609/hive-llap-orc-split-generation-failed.html
Your ORC file was written by a newer ORC writer than the reader in your Hive build knows about, so OrcFile.WriterVersion.from() indexes past the end of its version table; that is where the ArrayIndexOutOfBoundsException: 5 comes from. (You can verify the mismatch with the small check sketched after the steps below.)
Is your Hive version 2.1.1? If so:
1. Apply this patch to the ORC sources:
https://patch-diff.githubusercontent.com/raw/apache/orc/pull/75.patch
2. Recompile the Hive project.
3. Replace hive-exec-2.1.1.jar and hive-orc-2.1.1.jar in the lib directory with the rebuilt jars.
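To confirm the version mismatch before rebuilding anything, you can read the writer version recorded in the file tail with a recent orc-core on the classpath (a small sketch; the file path comes from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class CheckWriterVersion {
  public static void main(String[] args) throws Exception {
    // Use a newer orc-core jar than the one bundled with Hive, otherwise this
    // check fails with the same ArrayIndexOutOfBoundsException.
    Reader reader = OrcFile.createReader(new Path(args[0]),
        OrcFile.readerOptions(new Configuration()));
    System.out.println("file version:   " + reader.getFileVersion());
    System.out.println("writer version: " + reader.getWriterVersion());
  }
}

If it prints a writer version that your Hive's bundled ORC classes do not know about, patching or upgrading the reader side as described above is the fix.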
Related
When I use Spark with rdd.saveAsTextFile(path, LzopCodec.class), I get an exception like the one below. Is this caused by a Java buffer size limit?
Maybe there is one row whose string size is over 2 GB? (A quick way to check for oversized rows is sketched after the stack trace.)
Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:810)
at org.apache.hadoop.io.Text.encode(Text.java:451)
at org.apache.hadoop.io.Text.set(Text.java:198)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1505)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1504)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:125)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:123)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1413)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:135)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3224 in stage 22.0 failed 4 times, most recent failure: Lost task 3224.3 in stage 22.0 (TID 10280, hadoop12.fj8.yiducloud.cn, executor 14): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:151)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:79)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:810)
at org.apache.hadoop.io.Text.encode(Text.java:451)
at org.apache.hadoop.io.Text.set(Text.java:198)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1505)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1504)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:125)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:123)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1413)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:135)
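One way to check the over-2 GB suspicion is to scan the RDD for oversized rows before writing (a rough sketch using the Java API; the input path and threshold are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class FindOversizedRows {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("find-oversized-rows");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<String> rows = sc.textFile(args[0]); // or however the RDD is built
      // Text.encode sizes an int-based ByteBuffer from the character count, so a row
      // whose encoded size approaches 2 GB can overflow that size to a negative value,
      // which is exactly what ByteBuffer.allocate rejects with IllegalArgumentException.
      long oversized = rows.filter(line -> line.length() > 500_000_000).count();
      System.out.println("rows longer than ~500M chars: " + oversized);
    }
  }
}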
I know this kind of question has been asked previously, but I still couldn't find a solution after reading those posts, so I have decided to ask it again here.
I am working on a multi-threaded Java application that runs HQL queries over JDBC against a Hive environment. I have a bunch of Hive SQL queries that I execute in parallel from multiple threads, and I get the following exception when the query count gets large (for example, when I run more than 100 queries). Can someone please take a look and help me with this?
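For context, the execution pattern is roughly the one below (a simplified sketch; the JDBC URL, credentials, pool size, and queries are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelHiveQueries {
  public static void main(String[] args) throws Exception {
    List<String> queries = Arrays.asList("...");  // 100+ HQL statements in practice
    ExecutorService pool = Executors.newFixedThreadPool(16);
    for (String hql : queries) {
      pool.submit(() -> {
        // One connection per task, since Hive JDBC objects are not thread-safe.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver:10000/default", "user", "");
             Statement st = con.createStatement()) {
          st.execute(hql);
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
  }
}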
2020-06-16 06:00:45,314 ERROR [main]: Terminal exception
java.lang.Exception: Map step agg_cas_auth_reinstate_derive failed.
at com.mine.idn.magellan.ParallelExecGraph.execute(ParallelExecGraph.java:198)
at com.mine.idn.magellan.WarehouseSession.executeMap(WarehouseSession.java:332)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:872)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:778)
at com.mine.idn.magellan.StandAloneEnv.executeAndExit(StandAloneEnv.java:642)
at com.mine.idn.magellan.StandAloneEnv.main(StandAloneEnv.java:77)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Bad file descriptor
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Bad file descriptor
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2850)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2685)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2591)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1077)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:190)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:599)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:599)
at org.apache.hadoop.mapred.JobClient.getJobInner(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:639)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: java.io.IOException: Bad file descriptor
at java.io.FileInputStream.close0(Native Method)
at java.io.FileInputStream.access$000(FileInputStream.java:49)
at java.io.FileInputStream$1.close(FileInputStream.java:336)
at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
at java.io.FileInputStream.close(FileInputStream.java:334)
at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2676)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2661)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2741)
... 22 more
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at com.mine.idn.magellan.WarehouseSession.executeMapStep(WarehouseSession.java:797)
at com.mine.idn.magellan.WarehouseSession.access$000(WarehouseSession.java:23)
at com.mine.idn.magellan.WarehouseSession$ParallelExecResources.executeMapStep(WarehouseSession.java:91)
at com.mine.idn.magellan.ParallelExecGraph$Node.run(ParallelExecGraph.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What I don't understand is why the Hadoop framework is throwing a Bad file descriptor exception. My Java code invokes the Hadoop/Hive code, and that is where the exception is thrown.
One more thing: the issue is intermittent, not consistent. If I re-run the same application, it goes through in most cases.
Thank you for your help.
this works well:
scan = new Scan(startRow, stopRow);
this throws an exception sometimes:
scan = new Scan(stopRow, startRow);
scan.setReversed(true);
The exception is thrown while traffic is around 100 req/s. There are actually no timeouts; the exception fires immediately for 1-10% of requests.
hbase: 0.98.12-hadoop2
hadoop: 2.7.0
cluster in AWS, 10 datanodes: d2.4xlarge
I think it may be related to this issue, but I'm not using any filters: http://apache-hbase.679495.n3.nabble.com/Exception-during-a-reverse-scan-with-filter-td4069721.html
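For completeness, the reversed scan is used roughly like this (a sketch; the table name and row keys are placeholders). For a reversed scan the first argument is the row to start from, so it has to be the lexicographically larger key, which is why the forward-scan startRow/stopRow are swapped:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReversedScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (HTable table = new HTable(conf, "my_table")) {
      byte[] startRow = Bytes.toBytes("row-00001");
      byte[] stopRow = Bytes.toBytes("row-99999");
      // Reversed: start from the larger key and walk down towards the smaller one.
      Scan scan = new Scan(stopRow, startRow);
      scan.setReversed(true);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {   // iteration (hasNext) is where the exception surfaces
          System.out.println(Bytes.toString(r.getRow()));
        }
      }
    }
  }
}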
java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
at com.socialbakers.broker.client.hbase.htable.AbstractHtableListScanner.scanToList(AbstractHtableListScanner.java:30)
at com.socialbakers.broker.client.hbase.htable.AbstractHtableListSingleScanner.invokeOperation(AbstractHtableListSingleScanner.java:23)
at com.socialbakers.broker.client.hbase.htable.AbstractHtableListSingleScanner.invokeOperation(AbstractHtableListSingleScanner.java:11)
at com.socialbakers.broker.client.hbase.AbstractHbaseApi.endPointMethod(AbstractHbaseApi.java:40)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.socialbakers.broker.client.Route.invoke(Route.java:241)
at com.socialbakers.broker.client.handler.EndpointHandler.invoke(EndpointHandler.java:173)
at com.socialbakers.broker.client.handler.EndpointHandler.process(EndpointHandler.java:69)
at com.thetransactioncompany.jsonrpc2.server.Dispatcher.process(Dispatcher.java:196)
at com.socialbakers.broker.client.RejectableRunnable.run(RejectableRunnable.java:38)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:430)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:333)
at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:91)
... 15 more
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 2 But the nextCallSeq got from client: 1; request=scanner_id: 27700695 number_of_rows: 100 close_scanner: false next_call_seq: 1
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3231)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30946)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2093)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
at sun.reflect.GeneratedConstructorAccessor16.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:287)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:115)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:91)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:375)
... 17 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 2 But the nextCallSeq got from client: 1; request=scanner_id: 27700695 number_of_rows: 100 close_scanner: false next_call_seq: 1
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3231)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30946)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2093)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31392)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:173)
... 21 more
I am very new to Camus and Hadoop, and I am running into an exception. I am trying to write some Avro files to HDFS, and I keep getting the following error block:
[EtlMultiOutputRecordWriter] - ExceptionWritable key: topic=_schemas partition=0leaderId=0 server= service= beginOffset=0 offset=0 msgSize=1024 server= checksum=0 time=1450371931447 value: java.lang.Exception
at com.linkedin.camus.etl.kafka.common.KafkaReader.getNext(KafkaReader.java:108)
at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:232)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
... 14 more
I looked up line 108 in com.linkedin.camus.etl.kafka.common.KafkaReader.getNext and found it to be this: MessageAndOffset msgAndOffset = messageIter.next();.
I am using io.confluent.camus.etl.kafka.coders.AvroMessageDecoder as my decoder and com.linkedin.camus.example.DummySchemaRegistry as my schema registry.
At the end of the logs I get another line indicating an error from one of the HDFS files: Error from file [hdfs://localhost:9000/user/username/exec/2015-12-17-17-05-25/errors-m-00000]. The errors-m-00000 file contains a somewhat readable beginning, but then changes to an undecipherable string:
SEQ*com.linkedin.camus.etl.kafka.common.EtlKey5com.linkedin.camus.etl.kafka.common.ExceptionWritable*org.apache.hadoop.io.compress.DefaultCodec|Ò ∫±ß˝}pºHí$ò¸·:0schemasQ∞∆øÿxúïîÀN√0E7l‡+∫»¢lFMõ>á*êxU®™ËzÍmàc[ÆÕ„XÚÕÿqZ%#[ÿD±gÓô…¯∆üGœ¯Ç¿Q,·Úçë2ô'«hZL¿3ëSöXÿ5ê·ê„Sé‡ÇÖpÎS¬î4,…LËÕ¥Î{û}wFßáâ*M)>%&uZÑCfi“˚#rKÌÔ¡flÌu^Í%†B∂"Xa*•⁄0ÔQÕpùGzùidy&ñªkT…śԈ≥-#0>›…∆RG∫.ˇÅ¨«JÚ®sÃ≥Ö¡\£Rîfi˚ßéT≥D#%T8ãW®ÚµÌ∫4N˙©W∫©mst√—Ô嶥óhÓ$C~#S+Ñâ{ãÇfl¡ßí⁄L´ÏíÙºÙΩ5wfÃjM¬∏_Äò5RØ£
Ë"Eeúÿëx{ÆÏ«{XW÷XM€O¨-C#É¡Òl•ù9§‰õö2ó:wɲ%Œ-N∫ˇbFXˆ∑:àá5fyQÑ‘ö™:roõ1⁄5•≠≈˚yM0±ú?»ÃW◊.h≈I´êöNæ
[û3
At the end it appears that a Hadoop job has run, but a commit never takes place, based on the timing report:
Job time (seconds):
pre setup 1.0 (11%)
get splits 1.0 (11%)
hadoop job 4.0 (44%)
commit 0.0 (0%)
Total: 0 minutes 9 seconds
Any help or an idea of where to look to resolve this would be greatly appreciated. Thank you.
I'm loading a large number of nodes and relationships into an embedded Neo4j database. After about 10,000 inserts, it dies. If I stay under that point, then everything works great. Queries return as they should, as do inserts. It looks like somehow a database file is getting deleted in the middle of the inserts, which is causing everything to fall apart. My database builds itself from scratch, so if I completely delete my graphdb folder and restart it, it runs exactly the same every time. So how do you handle large embedded Neo4j databases?
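For reference, the insert loop is roughly the following (a heavily simplified sketch against the Neo4j 2.x embedded API; the labels, properties, and batch size are placeholders, not my real loader):

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class BulkInsertSketch {
  private static final int BATCH_SIZE = 1000;

  public static void main(String[] args) {
    GraphDatabaseService db = new GraphDatabaseFactory()
        .newEmbeddedDatabase("/home/user/graphdb");
    Transaction tx = db.beginTx();
    try {
      for (int i = 1; i <= 50_000; i++) {
        Node n = db.createNode(DynamicLabel.label("Item"));
        n.setProperty("id", i);
        if (i % BATCH_SIZE == 0) {   // commit and start a fresh transaction
          tx.success();
          tx.close();
          tx = db.beginTx();
        }
      }
      tx.success();
    } finally {
      tx.close();
      db.shutdown();
    }
  }
}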
Here are the pertinent errors.
From the Java output side
The transactions start failing to commit:
WorkerThread exception::org.neo4j.graphdb.TransactionFailureException::Unable to commit transaction
org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:140)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.graphdb.TransactionFailureException: commit threw exception
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:500)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
... 4 more
Caused by: javax.transaction.xa.XAException
Then it informs me that there's a missing file in the database:
saction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
... 7 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:814)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:699)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:632)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:533)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 8 more
Caused by: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:47)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.refreshSearcher(LuceneLabelScanStore.java:159)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:82)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:811)
... 15 more
Then I can no longer get a new transaction:
WorkerThread exception::org.neo4j.graphdb.TransactionFailureException::Unable to get transaction.
org.neo4j.graphdb.TransactionFailureException: Unable to get transaction.
at org.neo4j.kernel.InternalAbstractGraphDatabase.transactionRunning(InternalAbstractGraphDatabase.java:1064)
at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:1037)
at org.neo4j.kernel.TransactionBuilderImpl.begin(TransactionBuilderImpl.java:43)
at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:1024)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.transaction.SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.neo4j.kernel.impl.transaction.KernelHealth.assertHealthy(KernelHealth.java:61)
at org.neo4j.kernel.impl.transaction.TxManager.assertTmOk(TxManager.java:339)
at org.neo4j.kernel.impl.transaction.TxManager.getTransaction(TxManager.java:725)
at org.neo4j.kernel.InternalAbstractGraphDatabase.transactionRunning(InternalAbstractGraphDatabase.java:1060)
... 7 more
Caused by: javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
... 4 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:814)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:699)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:632)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:533)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 8 more
Caused by: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:47)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.refreshSearcher(LuceneLabelScanStore.java:159)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:82)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:811)
... 15 more
From the messages.log side
It starts off with memory-mapping errors. These actually appear when the database first comes online, but more of them trickle in before it totally dies:
2015-01-27 21:51:29.112+0000 ERROR [org.neo4j]: [/home/user/graphdb/neostore.nodestore.db] Unable to memory map Unable to map pos=0 recordSize=15 totalSize=1048575
org.neo4j.kernel.impl.nioneo.store.MappedMemException: Unable to map pos=0 recordSize=15 totalSize=1048575
at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.<init>(MappedPersistenceWindow.java:59)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.allocateNewWindow(PersistenceWindowPool.java:656)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.expandBricks(PersistenceWindowPool.java:617)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:144)
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:546)
at org.neo4j.kernel.impl.nioneo.store.NodeStore.forceGetRecord(NodeStore.java:149)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreIndexStoreView$NodeStoreScan.run(NeoStoreIndexStoreView.java:327)
at org.neo4j.kernel.impl.api.index.IndexPopulationJob.indexAllNodes(IndexPopulationJob.java:212)
at org.neo4j.kernel.impl.api.index.IndexPopulationJob.run(IndexPopulationJob.java:107)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid argument
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:875)
at org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.map(StoreFileChannel.java:57)
at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.<init>(MappedPersistenceWindow.java:53)
... 13 more
Then I get this error in messages.log (I've figured out that this is the point at which everything falls apart):
2015-01-27 21:59:41.516+0000 ERROR [org.neo4j]: setting TM not OK. Kernel has encountered some problem, please perform neccesary action (tx recovery/restart) null
javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:814)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:699)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:632)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:533)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 8 more
Caused by: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:47)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.refreshSearcher(LuceneLabelScanStore.java:159)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:82)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:811)
... 15 more
2015-01-27 21:59:41.519+0000 ERROR [org.neo4j]: TM error tx commit commit threw exception
Any ideas on what's causing that .frq file to disappear?
We resolved it in a side conversation: the maximum number of open files was too low (4000), which is also reported at startup.
That causes Lucene to break internally.
After increasing the limit, the OP could import the data successfully.