error when opening a lucene index: java.io.IOException: Map failed

error when opening a lucene index: java.io.IOException: Map failed - java

I have a problem which I think is the same as that described here:
Error when opening a lucene index: Map failed
However the solution does not apply in this case so I am providing more details and asking again.
The index is created using Solr 5.3
The line of code causing the exception is:
IndexReader indexReader = DirectoryReader.open(FSDirectory.open(Paths.get("the_path")));
The exception stacktrace is:
Exception in thread "main" java.io.IOException: Map failed: MMapIndexInput(path="/mnt/fastdata/ac1zz/JATE/solr-5.3.0/server/solr/jate/data_aclrd/index/_5t.tvd") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 434505698 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'), and 'sysctl vm.max_map_count'. More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:907)
at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:265)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:239)
at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:144)
at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:91)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:120)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at uk.ac.shef.dcs.jate.app.AppATTF.extract(AppATTF.java:39)
at uk.ac.shef.dcs.jate.app.AppATTF.main(AppATTF.java:33)
The suggested solutions as in the exception message do not work in this case because I am running the application on a server and I do not have permissions to change those.
Namely,
ulimit -v unlimited
prints: "-bash: ulimit: virtual memory: cannot modify limit: Operation not permitted"
and
sysctl -w vm.max_map_count=10000000
gives:"error: permission denied on key 'vm.max_map_count'"
Is there anyway I can solve this?
Thanks

I have found a solution and so I am answering myself.
If you really cannot set ulimit or vm.max_map_count, the only solution, according to http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html, is to configure your solr (or if you work with Lucene api, choose explicitly) to use SimpleFSDirectory (if windows) or NIOFSDirectory, both are slower than the default.
For example
DirectoryReader.open(new NIOFSDirectory(Paths.get("path_to_index"), FSLockFactory.getDefault()))

Related

Spark on mesos Executors failing with OOM Errors

We are using spark 2.0.2 managed by a DCOS system that fetch data from a Kafka 1.0.0 messaging service and writes parquet in a hdfs system.
Every thing was working ok, but when we increase the number of topics in Kafka, our spark executors began to crash constantly with OOM errors:
java.lang.OutOfMemoryError: Java heap space
at org.apache.parquet.column.values.dictionary.IntList.initSlab(IntList.java:90)
at org.apache.parquet.column.values.dictionary.IntList.<init>(IntList.java:86)
at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.<init>(DictionaryValuesWriter.java:93)
at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainDoubleDictionaryValuesWriter.<init>(DictionaryValuesWriter.java:422)
at org.apache.parquet.column.ParquetProperties.dictionaryWriter(ParquetProperties.java:139)
at org.apache.parquet.column.ParquetProperties.dictWriterWithFallBack(ParquetProperties.java:178)
at org.apache.parquet.column.ParquetProperties.getValuesWriter(ParquetProperties.java:203)
at org.apache.parquet.column.impl.ColumnWriterV1.<init>(ColumnWriterV1.java:83)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.newMemColumn(ColumnWriteStoreV1.java:68)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.getColumnWriter(ColumnWriteStoreV1.java:56)
at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.<init>(MessageColumnIO.java:183)
at org.apache.parquet.io.MessageColumnIO.getRecordWriter(MessageColumnIO.java:375)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:99)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:217)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:175)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:146)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:113)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:87)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:62)
at org.apache.parquet.avro.AvroParquetWriter.<init>(AvroParquetWriter.java:47)
at npm.parquet.ParquetMeasurementWriter.ensureOpenWriter(ParquetMeasurementWriter.java:91)
at npm.parquet.ParquetMeasurementWriter.write(ParquetMeasurementWriter.java:75)
at npm.ingestion.spark.StagingArea$Measurements.store(StagingArea.java:100)
at npm.ingestion.spark.StagingArea$StagingAreaStorage.store(StagingArea.java:80)
at npm.ingestion.spark.StagingArea.add(StagingArea.java:40)
at npm.ingestion.spark.Kafka2HDFSPM$SubsetProcessor.sendToStagingArea(Kafka2HDFSPM.java:207)
at npm.ingestion.spark.Kafka2HDFSPM$SubsetProcessor.consumeRecords(Kafka2HDFSPM.java:193)
at npm.ingestion.spark.Kafka2HDFSPM$SubsetProcessor.process(Kafka2HDFSPM.java:169)
at npm.ingestion.spark.Kafka2HDFSPM$FetchSubsetsAndStore.call(Kafka2HDFSPM.java:133)
at npm.ingestion.spark.Kafka2HDFSPM$FetchSubsetsAndStore.call(Kafka2HDFSPM.java:111)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
18/03/20 18:41:13 ERROR [Executor task launch worker-0] SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.parquet.column.values.dictionary.IntList.initSlab(IntList.java:90)
at org.apache.parquet.column.values.dictionary.IntList.<init>(IntList.java:86)
at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.<init>(DictionaryValuesWriter.java:93)
at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainDoubleDictionaryValuesWriter.<init>(DictionaryValuesWriter.java:422)
at org.apache.parquet.column.ParquetProperties.dictionaryWriter(ParquetProperties.java:139)
at org.apache.parquet.column.ParquetProperties.dictWriterWithFallBack(ParquetProperties.java:178)
at org.apache.parquet.column.ParquetProperties.getValuesWriter(ParquetProperties.java:203)
at org.apache.parquet.column.impl.ColumnWriterV1.<init>(ColumnWriterV1.java:83)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.newMemColumn(ColumnWriteStoreV1.java:68)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.getColumnWriter(ColumnWriteStoreV1.java:56)
at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.<init>(MessageColumnIO.java:183)
at org.apache.parquet.io.MessageColumnIO.getRecordWriter(MessageColumnIO.java:375)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:99)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:217)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:175)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:146)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:113)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:87)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:62)
at org.apache.parquet.avro.AvroParquetWriter.<init>(AvroParquetWriter.java:47)
at npm.parquet.ParquetMeasurementWriter.ensureOpenWriter(ParquetMeasurementWriter.java:91)
at npm.parquet.ParquetMeasurementWriter.write(ParquetMeasurementWriter.java:75)
at npm.ingestion.spark.StagingArea$Measurements.store(StagingArea.java:100)
at npm.ingestion.spark.StagingArea$StagingAreaStorage.store(StagingArea.java:80)
at npm.ingestion.spark.StagingArea.add(StagingArea.java:40)
at npm.ingestion.spark.Kafka2HDFSPM$SubsetProcessor.sendToStagingArea(Kafka2HDFSPM.java:207)
at npm.ingestion.spark.Kafka2HDFSPM$SubsetProcessor.consumeRecords(Kafka2HDFSPM.java:193)
at npm.ingestion.spark.Kafka2HDFSPM$SubsetProcessor.process(Kafka2HDFSPM.java:169)
at npm.ingestion.spark.Kafka2HDFSPM$FetchSubsetsAndStore.call(Kafka2HDFSPM.java:133)
at npm.ingestion.spark.Kafka2HDFSPM$FetchSubsetsAndStore.call(Kafka2HDFSPM.java:111)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:218)
We tried to increase the available the executors memory, review the code, but we couldn't find anything wrong.
Another info: we are using RDDs in spark.
Have someone encountered a similar problem, that already been solved

What is the heap configuration for the executor? By default, Java will autotune its heap according to machine memory. You need to change it to fit in your container with -Xmx setting.
See this article about running Java in the container
https://github.com/fabianenardon/docker-java-issues-demo/tree/master/memory-sample

Ehcache Initial table allocation failed

One of our web application runs within Tomcat 7 which is deployed on AS400 server, and it is using Ehcache as cache component swap data into disk and reduce memory usage.
Few weeks ago, when we try to deploy this application for one of our customer, it fails at startup. And log shows:
Caused by: java.lang.IllegalStateException: Cache 'data' creation in EhcacheManager failed.
at org.ehcache.core.EhcacheManager.createCache(EhcacheManager.java:288)
at org.ehcache.core.EhcacheManager.init(EhcacheManager.java:567)
... 7 more
Caused by: org.ehcache.StateTransitionException: Initial table allocation failed.
Initial Table Size (slots) : 64
Allocation Will Require : 1KB
Table Page Source : org.terracotta.offheapstore.disk.paging.MappedPageSource#bc8a4ca2
at org.ehcache.core.StatusTransitioner$Transition.succeeded(StatusTransitioner.java:209)
at org.ehcache.core.Ehcache.init(Ehcache.java:567)
at org.ehcache.core.EhcacheManager.createCache(EhcacheManager.java:261)
... 8 more
Caused by: java.lang.IllegalArgumentException: Initial table allocation failed.
Initial Table Size (slots) : 64
Allocation Will Require : 1KB
Table Page Source : org.terracotta.offheapstore.disk.paging.MappedPageSource#bc8a4ca2
at org.terracotta.offheapstore.OffHeapHashMap.<init>(OffHeapHashMap.java:219)
at org.terracotta.offheapstore.AbstractLockedOffHeapHashMap.<init>(AbstractLockedOffHeapHashMap.java:71)
at org.terracotta.offheapstore.AbstractOffHeapClockCache.<init>(AbstractOffHeapClockCache.java:76)
at org.terracotta.offheapstore.disk.persistent.AbstractPersistentOffHeapCache.<init>(AbstractPersistentOffHeapCache.java:43)
at org.terracotta.offheapstore.disk.persistent.PersistentReadWriteLockedOffHeapClockCache.<init>(PersistentReadWriteLockedOffHeapClockCache.java:36)
at org.ehcache.impl.internal.store.disk.factories.EhcachePersistentSegmentFactory$EhcachePersistentSegment.<init>(EhcachePersistentSegmentFactory.java:73)
at org.ehcache.impl.internal.store.disk.factories.EhcachePersistentSegmentFactory.newInstance(EhcachePersistentSegmentFactory.java:60)
at org.ehcache.impl.internal.store.disk.factories.EhcachePersistentSegmentFactory.newInstance(EhcachePersistentSegmentFactory.java:37)
at org.terracotta.offheapstore.concurrent.AbstractConcurrentOffHeapMap.<init>(AbstractConcurrentOffHeapMap.java:106)
at org.terracotta.offheapstore.concurrent.AbstractConcurrentOffHeapCache.<init>(AbstractConcurrentOffHeapCache.java:48)
at org.terracotta.offheapstore.disk.persistent.AbstractPersistentConcurrentOffHeapCache.<init>(AbstractPersistentConcurrentOffHeapCache.java:52)
at org.ehcache.impl.internal.store.disk.EhcachePersistentConcurrentOffHeapClockCache.<init>(EhcachePersistentConcurrentOffHeapClockCache.java:52)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore.createBackingMap(OffHeapDiskStore.java:279)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore.getBackingMap(OffHeapDiskStore.java:167)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore.access$600(OffHeapDiskStore.java:95)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore$Provider.init(OffHeapDiskStore.java:460)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore$Provider.initStore(OffHeapDiskStore.java:456)
at org.ehcache.impl.internal.store.disk.OffHeapDiskStore$Provider.initAuthoritativeTier(OffHeapDiskStore.java:507)
at org.ehcache.impl.internal.store.tiering.TieredStore$Provider.initStore(TieredStore.java:472)
at org.ehcache.core.EhcacheManager$8.init(EhcacheManager.java:499)
at org.ehcache.core.StatusTransitioner.runInitHooks(StatusTransitioner.java:135)
at org.ehcache.core.StatusTransitioner.access$000(StatusTransitioner.java:33)
at org.ehcache.core.StatusTransitioner$Transition.succeeded(StatusTransitioner.java:194)
this code triggered this is:
CacheConfiguration<String, String[]> dconf = CacheConfigurationBuilder
.newCacheConfigurationBuilder(String.class, String[].class, ResourcePoolsBuilder.heap(11)
.disk(3, MemoryUnit.GB, false))
.withExpiry(Expirations.timeToLiveExpiration(Duration.of(30, TimeUnit.MINUTES)))
.build();
dataCacheManager = CacheManagerBuilder.newCacheManagerBuilder()
.with(CacheManagerBuilder.persistence(new File(cacheFolder, "requestdata"))) //$NON-NLS-1$
.withCache(CACHE_NAME_DATA,dconf)
.build(true);
which surprised us because it has never happened before, we have deployed it for some other customers' server (Windows, As400, linux), none of them has this issues.
This is really a headache, we spend weeks try to figure it out, read source code, tuning jvm parameters, googling around..., nothing except one unanswered post: https://groups.google.com/forum/#!topic/ehcache-users/ApFAe5nYxuA
Is there anyone can help us one this? thanks ahead!

The Ehcache 3 disk store uses java.nio.MappedByteBuffer which require access to direct memory.
There is no documented default MaxDirectMemorySize in Java and the same JVM on different OS can behave differently.
If you have not already set the flag -XX:MaxDirectMemorySize=3G when launching your application, it could be the cause of that exception you see.

Apache-Flink Quickstart - reading CSV file error : Futures timed out after [10000 milliseconds]

I want to read CSV file using Flink-API locally, by the following code:
csvPath="data/weather.csv";
List<Tuple2<String, Double>> csv= env.readCsvFile(csvPath)
.types(String.class,Double.class).collect();
I tried some files in different size(from 800mb to 6gb). Sometimes the operation is completed successfully and sometimes it is not, because of the following timeout exception:
Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.ready(package.scala:169)
at org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:439)
at org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:408)
at org.apache.flink.client.LocalExecutor.stop(LocalExecutor.java:127)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:195)
at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:923)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
at org.apache.flink.simpleCSV.run(simpleCSV.java:83)
how can I fix this problem? increase this timeout programmatically? Or should I put a config file somewhere? Is there a specific heap size that I should set based on the file size?

collect() transfers the data from the cluster to the local client. This does only work for very small data sets (< 10 MB).
If you have larger data sets, you need to process them on the cluster and emit the results through an output format, e.g., write it to a file.

If you are debugging this program, you can set a break point at the constructor of org.apache.flink.api.java.LocalEnvironment (the constructor with config) and run the following command to change the timeout to 200 seconds (Alt+F8 in IntelliJ Idea):
config.setString("akka.ask.timeout", "200 s")
To find LocalEnvironment class in IntelliJ Idea, press Ctr+n, and check "Include non-project classes in the pop-up window, then type "LocalEnvironment" in the edit box.

Questions about native calltrace using Java

We're debugging an error that causing a crash in a Tomcat web application.
The application uses 2 3rd-party apps over jni, one of the 3rd-parties using SmartHeap (it is a memory management library for c/c++ applications), the other don't (it is webMethods broker version 5).
The strange thing is I see in the crash log that webMethods calls its native methods to initiate a connection to the broker server, but if I print the call trace of the thread where the crash happened using WinDbg (loading the minidump file created when the JVM crashed), it contains calls to SmartHeap functions. Now i feel I'm a bit lost... because I've checked, and found no references to this dll from the webMethods binaries.
(actually a memory allocation is called)
My question is how is it possible?
I mean anybody could describe how this part is working? Because I thought that the interpreted/compiled and native frames are called in a fixed order (it is logical).
maybe the call stack is invalid? (now we have many dump files with almost the same call trace)
or the call trace (the calling order of the native functions) is valid, only some of the functions have been reordered before calling (like a lazy object has to be generated before sending it to the webMethods broker, but i don't see any sign of this)
I'm querying the call trace on the dump file by calling ".ecxr" and "kv", the output is:
0:060> .ecxr
eax=4d330554 ebx=4d350010 ecx=4d330010 edx=00000000 esi=4d350010 edi=00000000
eip=4c912f15 esp=4bf1dad0 ebp=3574884d iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
shsmp!shi_allocSmall2+0x195:
4c912f15 8b4d00 mov ecx,dword ptr [ebp] ss:0023:3574884d=????????
0:060> k
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr
4bf1daec 4c912bbd shsmp!shi_allocSmall2+0x195
4bf1dafc 4c91b973 shsmp!MemAllocPtr+0x5d
*** WARNING: Unable to verify checksum for awssl50jn.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for awssl50jn.dll -
4bf1db14 49abc38d shsmp!shi_malloc_dbg+0x23
WARNING: Stack unwind information not available. Following frames may be wrong.
4bf1db3c 49abeca2 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xa1cd
4bf1db48 49ab5e66 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xcae2
4bf1db4c 49ab5e55 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0x3ca6
4bf1db60 49ab667d awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0x3c95
4bf1db80 49abdbbc awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0x44bd
4bf1dc20 4c912f4f awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xb9fc
4bf1dc78 49abd607 shsmp!shi_allocSmall2+0x1cf
00000000 00000000 awssl50jn!Java_COM_activesw_api_client_ssl_AwSSLNative_getSecurityInfo+0xb447`
Any help would be appreciated!

Unable to get resource from jedis

After running my application, i am getting this error after around 5 mins.
Even though i am returning the resource after use, i keep getting this.
I have built jedis-2.2.2-SNAPSHOT.jar from the jedis code base, since its not released yet
I had set the minIdle = 100, maxIdle=200 & maxActive=200. At the time of this exception, the connection count to redis was 122 from my application
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:42)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:442)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
at redis.clients.util.Pool.getResource(Pool.java:40)
... 6 more

Did you check that redis is still up & running ?
If not, investigate why it died.
try a redis-cli in a terminal if you can. "info" would give you more details.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

error when opening a lucene index: java.io.IOException: Map failed - java

Related

Spark on mesos Executors failing with OOM Errors

Ehcache Initial table allocation failed

Apache-Flink Quickstart - reading CSV file error : Futures timed out after [10000 milliseconds]

Questions about native calltrace using Java

Unable to get resource from jedis

Categories

Resources