Ignite/GridGain generated project: open file limit issue

I'm trying to cache a large dataset from several tables. My server runs CentOS with 8 GB of RAM and 500 GB of disk space.
I configured my storage policy to persist data, and after hitting an open file limit error I tried raising the limit to 2,000,000 with the following steps:
vi /etc/sysctl.conf
fs.file-max = 2000000 (2 million)
:wq
sysctl -p
Even with this change, and after running chmod -x on the work directory, I still get the following error:
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin]]
class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:448)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:337)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:478)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:462)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:853)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:694)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getOrAllocatePartitionMetas(GridCacheOffheapManager.java:1679)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:1507)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2137)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:429)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4261)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3407)
at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:771)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.loadEntry(GridDhtCacheAdapter.java:683)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.access$600(GridDhtCacheAdapter.java:103)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$5.apply(GridDhtCacheAdapter.java:633)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$5.apply(GridDhtCacheAdapter.java:629)
at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter$3.apply(GridCacheStoreManagerAdapter.java:535)
at org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore$1.call(CacheAbstractJdbcStore.java:469)
at org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore$1.call(CacheAbstractJdbcStore.java:433)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:56)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:43)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:420)
... 23 more
Nov 24, 2019 4:54:51 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin]]
What can I do to fix it?

Adding the following configuration was enough for me to avoid this exception:
vi /etc/security/limits.conf
root soft nofile 10240
root hard nofile 20480
Then I appended the inotify watch limit to /etc/sysctl.conf:
fs.inotify.max_user_watches=524288
Here root is the name of my user account.
The values are experimental. I'm not sure they are safe, but I haven't seen any notable issue on my VM.
I kept the previous configuration (fs.file-max) in place.
A reboot was needed.
Credit to @Stephen Darlington.
Just to explain what's going on here: fs.file-max sets an overall limit for the operating system. The entries in limits.conf set limits for each user. The only other thing I would add is that if you're running Ignite as a user other than root (recommended), you'd change that user's limits.
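To see which limit actually applies to a running JVM (rather than the system-wide fs.file-max), one option is to ask the JVM itself. The following is a minimal sketch, not part of the original answer: the class name FdLimits is made up for illustration, and it relies on com.sun.management.UnixOperatingSystemMXBean, which is only available on Unix-like platforms with HotSpot-style JDKs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports the file descriptor counts seen by this JVM.
final class FdLimits {

    /** Returns {open, max} descriptor counts, or null if the platform does not expose them. */
    static long[] fileDescriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // e.g. on Windows
    }

    public static void main(String[] args) {
        long[] counts = fileDescriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this platform");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

If the reported maximum is still small after editing limits.conf, the per-user limit probably has not been picked up by the account that launches the node.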

Related

Testcontainers + db2 issue

I am having an issue with db2 using testcontainers. I keep receiving a connection refused error.
When running db2 with:
docker run: I am able to connect with dbvis.
the fabric8 Maven plugin to start the db2 container: again I am able to connect with dbvis.
Testcontainers: I put a breakpoint in the JUnit 5 test, attempt to access db2, and receive the connection refused error.
My db2 testcontainers configuration:
@Testcontainers
public class ArchiveTest {
    @Container
    private static final Db2Container DB2 = new Db2Container("ibmcom/db2:11.5.7.0").withPrivilegedMode(true)
            .acceptLicense().withUsername("db2inst1").withPassword("password").withDatabaseName("BPMF")
            .withEnv("ARCHIVE_LOGS", "false").withEnv("PERSISTENT_HOME", "false");
    // test methods omitted
}
The db2 logs from docker:
(*) Previous setup has not been detected. Creating the users...
(*) Creating users ...
(*) Creating instance ...
DB2 installation is being initialized.
Total estimated time for all tasks to be performed: 309 second(s)
Total number of tasks to be performed: 4
Estimated time 1 second(s)
Description: Setting default global profile registry variables
Task #1 start
Task #1 end
Estimated time 5 second(s)
Description: Initializing instance list
Task #2 start
Task #2 end
Estimated time 300 second(s)
Description: Configuring DB2 instances
Task #3 start
Task #3 end
The execution completed successfully.
Task #4 end
Estimated time 3 second(s)
Description: Updating global profile registry
Task #4 start
For more information see the DB2 installation log at "/tmp/db2icrt.log.72".
(*) Fixing /etc/services file for DB2 ...
DBI1070I Program db2icrt completed successfully.
DBI1446I The db2icrt command is running.
chown: cannot access '/database/config/db2inst1/sqllib/adm/fencedid': No such file or directory
03/16/2022 10:26:18 0 0 SQL1032N No start database manager command was issued.
SQL1032N No start database manager command was issued. SQLSTATE=57019
(*) Cataloging existing databases
(*) Applying Db2 license ...
ls: cannot access /database/data/db2inst1/NODE0000: No such file or directory
LIC1426I This product is now licensed for use as outlined in your License Agreement. USE OF THE PRODUCT CONSTITUTES ACCEPTANCE OF THE TERMS OF THE IBM LICENSE AGREEMENT, LOCATED IN THE FOLLOWING DIRECTORY: "/opt/ibm/db2/V11.5/license/en_US.iso88591"
LIC1402I License added successfully.
(*) Updating DBM CFG parameters ...
(*) Saving the checksum of the current nodelock file ...
successfully.
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed
successfully.
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed
successfully.
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed
(*) Remounting /database with suid...
No Cgroup memory limit detected, instance memory will follow automatic tuning
(*) Nothing appears in the Db2 directory. will skip update/upgrade.
(*) Code level is the same. No update/upgrade needed.
DB2 State : Operable
Starting DB2...
DB2 has not been started
SQL1063N DB2START processing was successful.
03/16/2022 10:26:29 0 0 SQL1063N DB2START processing was successful.
(*) Creating database BPMF ...
(*) User chose to create BPMF database
DB20000I The CREATE DATABASE command completed successfully.
DB20000I The ACTIVATE DATABASE command completed successfully.
(*) Instance and database will not be auto configured. AUTOCONFIG has been set to false.
(*) Log archiving will not be configured as ARCHIVE_LOGS has been set to false.
(*) Skipping TEXT_SEARCH setup for database BPMF because TEXT_SEARCH is not configured for the instance ...
ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519
(*) All databases are now active.
(*) Setup has completed.
2022-03-16 12:27:28 | INFO | [main] d.5.7.0]:503 - Container ibmcom/db2:11.5.7.0 started in PT1M27.212S
The error from java is:
Caused by: com.ibm.db2.jcc.am.DisconnectNonTransientConnectionException: [jcc][t4][2043][11550][4.25.13] Exception java.net.ConnectException: Error opening socket to server /127.0.0.1 on port 50,000 with message: Connection refused: connect. ERRORCODE=-4499, SQLSTATE=08001
at com.ibm.db2.jcc.am.b6.a(b6.java:338)
at com.ibm.db2.jcc.am.b6.a(b6.java:435)
at com.ibm.db2.jcc.t4.a0.a(a0.java:445)
at com.ibm.db2.jcc.t4.a0.<init>(a0.java:96)
at com.ibm.db2.jcc.t4.a.b(a.java:366)
at com.ibm.db2.jcc.t4.b.newAgent_(b.java:2148)
at com.ibm.db2.jcc.am.Connection.initConnection(Connection.java:839)
at com.ibm.db2.jcc.am.Connection.<init>(Connection.java:784)
at com.ibm.db2.jcc.t4.b.<init>(b.java:350)
at com.ibm.db2.jcc.DB2SimpleDataSource.getConnection(DB2SimpleDataSource.java:233)
at com.ibm.db2.jcc.DB2SimpleDataSource.getConnection(DB2SimpleDataSource.java:200)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:471)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:113)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.osjava.datasource.SJDataSource.getConnection(SJDataSource.java:115)
at org.osjava.datasource.SJDataSource.getConnection(SJDataSource.java:106)
at org.osjava.datasource.SJDataSource.getConnection(SJDataSource.java:88)
at org.flywaydb.core.internal.jdbc.JdbcUtils.openConnection(JdbcUtils.java:48)
... 105 common frames omitted
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at com.ibm.db2.jcc.t4.x.run(x.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at com.ibm.db2.jcc.t4.a0.a(a0.java:431)
... 121 common frames omitted
Caused by: org.springframework.beans.BeanInstantiationException:
Failed to instantiate [org.flywaydb.core.Flyway]: Factory method 'migration' threw exception; nested exception is org.flywaydb.core.internal.exception.FlywaySqlException: Unable to obtain connection from database: [jcc][t4][2043][11550][4.25.13] Exception java.net.ConnectException: Error opening socket to server /127.0.0.1 on port 50,000 with message: Connection refused: connect. ERRORCODE=-4499, SQLSTATE=08001
I have confirmed my JDBC parameters are correct...so I am at a bit of a loss where it is going wrong.
EDIT 1: db2 is running:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7046334e6c8 ibmcom/db2:11.5.0.0a "/var/db2_setup/lib/…" About a minute ago Up About a minute 22/tcp, 55000/tcp, 60006-60007/tcp, 0.0.0.0:53444->50000/tcp wizardly_cartwright
ccfe6845bfb1 testcontainers/ryuk:0.3.3 "/app" About a minute ago Up About a minute 0.0.0.0:53439->8080/tcp testcontainers-ryuk-99222438-9340-47ca-b6d2-0a13bfe50f9d
EDIT2: docker-for-java command parameters:
AbstrDockerCmd:34 - Cmd: org.testcontainers.shaded.com.github.dockerjava.core.command.CreateContainerCmdImpl#7df60067[name=<null>,hostName=<null>,domainName=<null>,user=<null>,attachStdin=<null>,attachStdout=<null>,attachStderr=<null>,portSpecs=<null>,tty=<null>,stdinOpen=<null>,stdInOnce=<null>,env={DB2INSTANCE=db2inst1,AUTOCONFIG=false,ARCHIVE_LOGS=false,DB2INST1_PASSWORD=password,PERSISTENT_HOME=false,DBNAME=BPMF,LICENSE=accept},cmd={},healthcheck=<null>,argsEscaped=<null>,entrypoint=<null>,image=ibmcom/db2:11.5.0.0a,volumes=Volumes(volumes=[]),workingDir=<null>,macAddress=<null>,onBuild=<null>,networkDisabled=<null>,exposedPorts=ExposedPorts(exposedPorts=[50000/tcp]),stopSignal=<null>,stopTimeout=<null>,hostConfig=HostConfig(binds=[], blkioWeight=null, blkioWeightDevice=null, blkioDeviceReadBps=null, blkioDeviceWriteBps=null, blkioDeviceReadIOps=null, blkioDeviceWriteIOps=null, memorySwappiness=null, nanoCPUs=null, capAdd=null, capDrop=null, containerIDFile=null, cpuPeriod=null, cpuRealtimePeriod=null, cpuRealtimeRuntime=null, cpuShares=null, cpuQuota=null, cpusetCpus=null, cpusetMems=null, devices=null, deviceCgroupRules=null, deviceRequests=null, diskQuota=null, dns=null, dnsOptions=null, dnsSearch=null, extraHosts=[], groupAdd=null, ipcMode=null, cgroup=null, links=[], logConfig=LogConfig(type=null, config=null), lxcConf=null, memory=null, memorySwap=null, memoryReservation=null, kernelMemory=null, networkMode=null, oomKillDisable=null, init=null, autoRemove=null, oomScoreAdj=null, portBindings={50000/tcp=[Lcom.github.dockerjava.api.model.Ports$Binding;#393881f0}, privileged=true, publishAllPorts=null, readonlyRootfs=null, restartPolicy=null, ulimits=null, cpuCount=null, cpuPercent=null, ioMaximumIOps=null, ioMaximumBandwidth=null, volumesFrom=[], mounts=null, pidMode=null, isolation=null, securityOpts=null, storageOpt=null, cgroupParent=null, volumeDriver=null, shmSize=null, pidsLimit=null, runtime=null, tmpFs=null, utSMode=null, usernsMode=null, 
sysctls=null, consoleSize=null),labels={org.testcontainers=true, org.testcontainers.sessionId=090442b0-8cc4-4f6e-b07e-1afdfed5ec15},shell=<null>,networkingConfig=<null>,ipv4Address=<null>,ipv6Address=<null>,aliases=<null>,authConfig=<null>,platform=<null>]
As I am using simple-jndi and have the JDBC parameters in property files, the port in the JDBC URL is not 50000. I'm looking into how to set it to a specific port, as the default is specified in the Db2Container class.
EDIT 1: It's mentioned in the docs:
Note that this exposed port number is from the perspective of the container.
From the host's perspective Testcontainers actually exposes this on a random free port. This is by design, to avoid port collisions that may arise with locally running software or in between parallel test runs.

Please make sure to use DB2.getJdbcUrl() or similar to access the container after starting it. Testcontainers publishes the exposed ports of the container to a random free host port, and this dynamic port needs to be injected into your system under test at runtime. Depending on framework and libraries, there are different ways to achieve this: configuration properties in Spring, or, in the worst case, templating config files.
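As a rough illustration of injecting the dynamic port at runtime, the sketch below builds the JDBC URL from a host and mapped port and publishes it as a system property. It is deliberately library-free: in a real test the host and port would come from the started container, for example DB2.getHost() and DB2.getMappedPort(50000), or you would simply use DB2.getJdbcUrl(). The class name Db2TestUrl and the property name test.db2.url are hypothetical, and how simple-jndi or Spring picks the value up depends on your setup.

```java
// Hypothetical helper: turns the container's host and mapped port into a JDBC URL.
final class Db2TestUrl {

    /** Builds a Db2 JDBC URL using the port as seen from the host, not the container. */
    static String jdbcUrl(String host, int mappedPort, String database) {
        return "jdbc:db2://" + host + ":" + mappedPort + "/" + database;
    }

    /** Publishes the URL under an assumed property name before the application starts. */
    static void publish(String host, int mappedPort, String database) {
        System.setProperty("test.db2.url", jdbcUrl(host, mappedPort, database));
    }

    public static void main(String[] args) {
        // 53444 is the host port shown in the docker ps output above; it changes on every run.
        publish("localhost", 53444, "BPMF");
        System.out.println(System.getProperty("test.db2.url"));
    }
}
```

The point is that the URL has to be computed after the container has started, because the host port is not known before then.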

hadoop error: No MD5 file found corresponding to image file

I have an old Hadoop system that hasn't been used for years. When I tried to restart the cluster (1 master, 2 slaves, all on Linux), I got an error on the namenode.
Error output:
2021-03-18 20:18:28,628 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Failed to load image from FSImageFile(file=/home/xxx/tmp/hadoop/name/current/fsimage_0000000000000480607, cpktTxId=0000000000000480607)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:651)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:627)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:594)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
Caused by: java.io.IOException: No MD5 file found corresponding to image file /home/xxx/tmp/hadoop/name/current/fsimage_0000000000000480607
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:736)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:632)
... 9 more
2021-03-18 20:18:28,631 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2021-03-18 20:18:28,633 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
More info:
One of the slaves has bad disk blocks on its datanode partition, so I removed that partition from /etc/fstab in order to get Linux to boot. That slave's data is therefore lost.
What I have tried:
Starting the cluster with all 3 nodes: got the above error.
Starting the cluster without the bad slave (only 2 nodes): still got the above error.
Questions:
A. What does the error mean?
B. Is it related to the bad slave?
C. Is there any way to recover without reformatting the HDFS filesystem on the namenode?

There should be a file called:
/home/xxx/tmp/hadoop/name/current/fsimage_0000000000000480607.md5
in the same location as the image file. Its contents look like this:
177e5f4ed0b7f43eb9e274903e069da4 *fsimage_0000000000000014367
Simply get the MD5 sum of your fsimage file:
md5sum fsimage_0000000000000480607
Then create a new .md5 file that looks like:
xxxxxx *fsimage_0000000000000480607
where xxxxxx is the checksum printed by the md5sum command.
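The same steps can be sketched in Java, in case md5sum is not at hand. This is only an illustration: the class name FsImageMd5 is made up, the line layout ("checksum, space, asterisk, file name") is copied from the sample above, and the trailing newline is an assumption. Back up the name directory before writing anything into it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: recreates a missing "<image>.md5" file next to an fsimage file.
final class FsImageMd5 {

    /** Computes the MD5 checksum of a file as a lowercase hex string. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Writes "<md5> *<image file name>" into a sibling file named "<image>.md5". */
    static Path writeMd5File(Path image) throws IOException, NoSuchAlgorithmException {
        String line = md5Hex(image) + " *" + image.getFileName() + "\n";
        Path md5File = image.resolveSibling(image.getFileName() + ".md5");
        Files.write(md5File, line.getBytes(StandardCharsets.UTF_8));
        return md5File;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path to the fsimage file as the only argument.
        System.out.println("Wrote " + writeMd5File(Paths.get(args[0])));
    }
}
```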

How do you configure the java heap size for zookeeper?

I am trying to set up Zookeeper 3.4.10 on an Ubuntu 18.04LTS Azure VM.
Per the ZooKeeper Administrator's Guide
"incorrect Java heap size
You should take special care to set your Java max heap size correctly. In particular, you should not create a situation in which ZooKeeper swaps to disk..."
The guide does not give instructions on how to set the max heap size. I've already done a little searching and found suggestions to create a java.env in the zookeeper/conf directory. I've done this and have tried setting the variable using two different methods I've found during research:
Attempt #1:
export JVMFLAGS="-Xmx6144m"
Attempt #2
#!/bin/bash
export CLASSPATH="~/zookeeper-3.4.10/conf/log4j.properties"
export JVMFLAGS="-Xmx6144m"
After making these changes, and restarting zookeeper, I checked the java heap size with:
java -XshowSettings:vm
And the max heap size is not changing.
What are the proper steps for configuring the max heap for zookeeper?

The binary release of ZooKeeper contains a bin directory with the following files that are relevant here:
zkServer.sh
zkEnv.sh
zkEnv.sh defines the location of all config files and some JVM tuning knobs, such as the JVM heap size. The heap size can be changed with the shell variable ZK_SERVER_HEAP (in MB).
Use the following command to set a custom heap size:
cd bin/
ZK_SERVER_HEAP=128 ./zkServer.sh start-foreground
In the output of recent versions of ZooKeeper you can find the following lines:
2021-01-14 17:24:18,400 [myid:] - INFO [main:Environment@98] - Server environment:os.memory.free=114MB
2021-01-14 17:24:18,400 [myid:] - INFO [main:Environment@98] - Server environment:os.memory.max=128MB
2021-01-14 17:24:18,400 [myid:] - INFO [main:Environment@98] - Server environment:os.memory.total=128MB
Other JVM options can be set using SERVER_JVMFLAGS, for instance to use a non-default GC:
ZK_SERVER_HEAP=128 SERVER_JVMFLAGS="-XX:+UseShenandoahGC" ./zkServer.sh start-foreground
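One likely reason the check in the question showed no change is that java -XshowSettings:vm starts a new JVM and prints that JVM's own settings, not those of the ZooKeeper process that is already running. The heap limit is a per-process value, as this minimal sketch shows: it reports the maximum heap of whatever JVM it runs in. The class name HeapCheck is made up for illustration.

```java
// Hypothetical helper: prints the maximum heap of the JVM it is running in.
final class HeapCheck {

    /** Maximum heap this JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + "MB");
    }
}
```

Running it with and without an -Xmx flag gives different numbers, which is why the setting has to be verified on the ZooKeeper process itself, for example through the startup log lines above.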

ES Query Exception in Storm Crawler

I am using following packages
Apache zookeeper 3.4.14
Apache storm 1.2.3
Apache Maven 3.6.2
ElasticSearch 7.2.0 (hosted locally)
Java 1.8.0_252
aws ec2 medium instance with 4GB ram
I used this command to increase the virtual memory available to the JVM (earlier it reported an error about the JVM not having enough memory):
sysctl -w vm.max_map_count=262144
I created the Maven project with:
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler -DarchetypeArtifactId=storm-crawler-elasticsearch-archetype -DarchetypeVersion=LATEST
Command used for submitting topology
storm jar target/newscrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local es-crawler.flux --sleep 30000
When I run this command, it reports that my topology was submitted successfully, and the status index in Elasticsearch shows FETCH_ERROR along with the URL from seeds.txt.
The content index shows no hits in Elasticsearch.
In the worker.log file there were many exceptions of the following type:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_252]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_252]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) [stormjar.jar:?]
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) [stormjar.jar:?]
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) [stormjar.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
2020-06-12 10:31:14.635 c.d.s.e.p.AggregationSpout Thread-46-spout-executor[17 17] [INFO] [spout #7] Populating buffer with nextFetchDate <= 2020-06-12T10:30:50Z
2020-06-12 10:31:14.636 c.d.s.e.p.AggregationSpout Thread-32-spout-executor[19 19] [INFO] [spout #9] Populating buffer with nextFetchDate <= 2020-06-12T10:30:50Z
2020-06-12 10:31:14.636 c.d.s.e.p.AggregationSpout pool-13-thread-1 [ERROR] [spout #7] Exception with ES query
There are following logs in worker.log related to elasticsearch
'Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A1&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"}],"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"},"status":503}
'
'
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A8&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]},"status":503}
'
I have checked health of shards, they are in green status.
Earlier I was using Java 11, with which I was not able to submit the topology, so I switched to Java 8.
Now the topology is submitted successfully, but no data is injected into Elasticsearch.
I want to know whether there is a version incompatibility between Java and Elasticsearch, or with any other package.

Use an absolute path for the seed file and run it in remote mode. The local mode should be used mostly for debugging.
The sleep parameter is (I think) in milliseconds. The command above means that the topology will run for 30 seconds only, which doesn't give it much time to do anything.

Failing to set up Zookeeper cluster for Pulsar

I am trying to set up a Zookeeper cluster for Pulsar. I am following the instructions here, but I keep failing.
In my setup, I have two nodes, that should be part of the cluster. Since I need to deploy bookie to the same nodes, I executed
$ PULSAR_EXTRA_OPTS="-Dstats_server_port=8001" bin/pulsar-daemon start zookeeper
to start zookeeper. Afterwards, I am trying to init the cluster using this command:
bin/pulsar initialize-cluster-metadata \
--cluster pulsar-cluster-1 \
--zookeeper 10.100.100.77:2181 \
--configuration-store 10.100.100.77:2181 \
--web-service-url http://10.100.100.77:8080 \
--broker-service-url pulsar://10.100.100.77:6650 \
But I keep getting this error:
17:12:24.146 [main-SendThread(10.100.100.77:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: 10.100.100.77/10.100.100.77:2181: Verbindungsaufbau abgelehnt
17:12:25.251 [main-SendThread(10.100.100.77:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.100.100.77/10.100.100.77:2181. Will not attempt to authenticate using SASL (unknown error)
I read here that I need to have an odd number of nodes, so I added a virtual machine on one of the nodes. When I start ZooKeeper on it, it doesn't print an error message, but shows:
$ PULSAR_EXTRA_OPTS="-Dstats_server_port=8001" bin/pulsar-daemon start zookeeper
doing start zookeeper ...
starting zookeeper, logging to /home/host1/apache-pulsar-2.4.0/logs/pulsar-zookeeper-host1-VirtualBox.log
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in version 11.0 and will likely be removed in a future release.
[AppClassLoader@27c170f0] info AspectJ Weaver Version 1.9.2 built on Wednesday Oct 24, 2018 at 15:43:33 GMT
[AppClassLoader@27c170f0] info register classloader jdk.internal.loader.ClassLoaders$AppClassLoader@27c170f0
[AppClassLoader@27c170f0] info using configuration file:/home/host1/apache-pulsar-2.4.0/lib/org.apache.pulsar-pulsar-zookeeper-utils-2.4.0.jar!/META-INF/aop.xml
[AppClassLoader@27c170f0] info using configuration file:/home/host1/apache-pulsar-2.4.0/lib/org.apache.pulsar-pulsar-zookeeper-2.4.0.jar!/META-INF/aop.xml
[AppClassLoader@27c170f0] info register aspect org.apache.pulsar.zookeeper.SerializeUtilsAspect
[AppClassLoader@27c170f0] info register aspect org.apache.pulsar.broker.zookeeper.aspectj.ClientCnxnAspect
However, the ZooKeeper service is not started, even though the setup is very similar to that of its host, and I can't figure out why.
Any ideas how I could proceed from here? Thanks in advance!

The first error you posted indicates that the connection to 10.100.100.77:2181 was refused ("Verbindungsaufbau abgelehnt" is German for "connection refused"), which means the ZK server isn't listening on that host and port. You should first confirm that ZK is up and running and check the ZK log for any errors.
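A quick way to confirm whether anything is listening on that host and port is a plain TCP connect. The sketch below is an assumption-level illustration (the class name PortProbe is made up): it only tells you that a socket accepts connections, not that ZooKeeper is healthy or has formed a quorum.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unknown host, and similar failures end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Address and port taken from the question.
        System.out.println(isListening("10.100.100.77", 2181, 2000));
    }
}
```

A result of false here matches the "connection refused" message in the client log, and the next step is the ZooKeeper server log on that machine.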
HTH

I found the solution. The original error was indeed caused by not having an odd number of nodes. The third (virtual) one wouldn't start because ZooKeeper's data directory was in the wrong location. Now that the third server has started, the configuration step also passed successfully.
