I am experiencing an embedded Infinispan cache issue where nodes time out on re-joining the cluster.
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 7 from vvshost
at org.infinispan.remoting.transport.impl.SingleTargetRequest.onTimeout(SingleTargetRequest.java:64)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)
The only way I can get the node to re-join is to switch off the cache and delete all local cache persistence files.
Here is the configuration which I am using:
Transport:
  TransportConfigurationBuilder - defaultClusteredBuild
  JMX Statistics - Enabled
  Duplicate domains - Allowed
Cache Manager:
  Manager Class - EmbeddedCacheManager
  Memory - Memory Size: 0
  Persistence: Single File Store
    async: disabled
  Clustering Cache Mode - CacheMode.DIST_SYNC
The configuration seems right to me, but the value of remote-timeout is 15000 milliseconds by default. Increase the timeout until you stop getting the error.
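If you configure the cache programmatically, the change amounts to roughly the sketch below (the cache name and the 60-second value are illustrative; the clustered DIST_SYNC part mirrors the setup described in the question):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class RemoteTimeoutExample {
    public static void main(String[] args) {
        // Clustered transport, as in the question.
        GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();

        ConfigurationBuilder cache = new ConfigurationBuilder();
        cache.clustering()
             .cacheMode(CacheMode.DIST_SYNC)
             // Default is 15000 ms; give remote calls and state transfer more time
             // so a re-joining node does not hit ISPN000476. The value is illustrative.
             .remoteTimeout(60_000);

        EmbeddedCacheManager manager = new DefaultCacheManager(global.build());
        manager.defineConfiguration("myDistCache", cache.build()); // hypothetical cache name
        manager.getCache("myDistCache");
    }
}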
Hope it helps
Our Ignite server is showing the following error.
We are not performing any queries or Ignite tasks; we are just using the put and get features of the caches.
We do have persistence enabled.
We did connect Ignite Visor, and it shows nothing anomalous within the cluster topology or with any of the caches.
Any insight into this error would be appreciated.
[15:01:03,943][SEVERE][sys-stripe-4-#5][FailureProcessor] A critical problem with persistence data structures was detected. Please make backup of persistence storage and WAL files for further analysis. Persistence storage path: null WAL path: /ignite/wal WAL archive path: /ignite/walarchive
[15:01:03,946][SEVERE][sys-stripe-4-#5][FailureProcessor] No deadlocked threads detected.
....
[15:01:04,135][SEVERE][sys-stripe-4-#5][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-2113909708, val2=1125985806188550]], msg=Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=20, val=2AC8D168CC86654C1879E6999D1CCFD7, hasValBytes=true], hash=741925420, cacheId=0]]
Edit: server-side stack trace:
[18:40:50,919][SEVERE][sys-stripe-4-#5][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-2113909708, val2=1125985806188550]], msg=Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=20, val=2AC8D168CC86654C1879E6999D1CCFD7, hasValBytes=true], hash=741925420, cacheId=0]]]]
class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-2113909708, val2=1125985806188550]], msg=Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=20, val=2AC8D168CC86654C1879E6999D1CCFD7, hasValBytes=true], hash=741925420, cacheId=0]]
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1953)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1765)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1748)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2794)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2342)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2589)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2049)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1866)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1725)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3228)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:143)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:284)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:279)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1908)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1529)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1422)
at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Record is too long [capacity=134217728, size=134219738]
at org.apache.ignite.internal.processors.cache.persistence.wal.SegmentedRingByteBuffer.offer0(SegmentedRingByteBuffer.java:214)
at org.apache.ignite.internal.processors.cache.persistence.wal.SegmentedRingByteBuffer.offer(SegmentedRingByteBuffer.java:193)
at org.apache.ignite.internal.processors.cache.persistence.wal.filehandle.FileWriteHandleImpl.addRecord(FileWriteHandleImpl.java:243)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:858)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:811)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4338)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(GridCacheMapEntry.java:6492)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6244)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5928)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4021)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3915)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2042)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1920)
... 27 more
The issue is caused by inserting a record whose size is greater than the actual WAL buffer size.
Ignite uses a WAL buffer to store serialized WAL records before writing them to the WAL file, so the size of each WAL record must be less than the actual size of the WAL buffer.
By default, the WAL buffer size is equal to the configured WAL segment size property. However, if IGNITE_WAL_MMAP is disabled, the WAL buffer size is limited by the WAL buffer size property, which defaults to WAL segment size / 4.
As a workaround, you can try increasing the WAL buffer size using the properties mentioned above. More details regarding that can be found here.
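For example, with programmatic configuration the relevant settings look roughly like this (a sketch only; the 256 MiB values are illustrative and should simply be larger than your biggest record):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalBufferSizeExample {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Persistence is enabled, as in the question.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // When IGNITE_WAL_MMAP is enabled the buffer follows the segment size;
        // otherwise it defaults to segment size / 4, so set it explicitly as well.
        storageCfg.setWalSegmentSize(256 * 1024 * 1024);
        storageCfg.setWalBufferSize(256 * 1024 * 1024);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Activate the cluster so persistent caches become available.
            ignite.cluster().active(true);
        }
    }
}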
I am developing an application using Play Framework (version 2.8.0) and Java (version 1.8) with an Oracle database (version 12c).
There are only zero or one hits to the database in a day, and I am getting the error below.
java.sql.SQLRecoverableException: IO Error: Socket read timed out
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:919)
at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:2005)
at com.zaxxer.hikari.pool.PoolBase.quietlyCloseConnection(PoolBase.java:138)
at com.zaxxer.hikari.pool.HikariPool.lambda$closeConnection$1(HikariPool.java:447)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Socket read timed out
at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:174)
at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82)
at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101)
at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98)
at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
at oracle.jdbc.driver.T4C7Ocommoncall.doOLOGOFF(T4C7Ocommoncall.java:62)
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:908)
... 6 common frames omitted
db {
  default {
    driver=oracle.jdbc.OracleDriver
    url="jdbc:oracle:thin:#XXX.XXX.XXX.XX:XXXX/XXXXXXX"
    username="XXXXXXXXX"
    password="XXXXXXXXX"
    hikaricp {
      dataSource {
        cachePrepStmts = true
        prepStmtCacheSize = 250
        prepStmtCacheSqlLimit = 2048
      }
    }
  }
}
It seems to be caused by an inactive database connection. How can I solve this?
Please let me know if any other information is required.
You can enable TCP keepalive for JDBC, either by setting the corresponding driver directive or by adding "ENABLE=BROKEN" to the connection string.
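For example, a thin-driver URL using a full connect descriptor with ENABLE=BROKEN would look roughly like this in your Play configuration (host, port, and service name are placeholders):

db {
  default {
    url = "jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=TCP)(HOST=dbhost)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=myservice)))"
  }
}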
Usually Cisco/Juniper firewalls cut off a TCP connection when it has been inactive for more than an hour.
The Linux kernel, on the other hand, only starts sending keepalive probes after two hours (tcp_keepalive_time). So if you decide to turn TCP keepalive on, you will also need root access to lower this kernel tunable to a smaller value (10-15 minutes).
Moreover, HikariCP should not keep any connection open for longer than 30 minutes by default.
So if your firewall, the Linux kernel, and HikariCP all use default settings, then this error should not occur in your system.
See the official HikariCP documentation:
maxLifetime:
This property controls the maximum lifetime of a connection in the
pool. An in-use connection will never be retired, only when it is
closed will it then be removed. On a connection-by-connection basis,
minor negative attenuation is applied to avoid mass-extinction in the
pool. We strongly recommend setting this value, and it should be
several seconds shorter than any database or infrastructure imposed
connection time limit. A value of 0 indicates no maximum lifetime
(infinite lifetime), subject of course to the idleTimeout setting. The
minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30
minutes)
I have added the configuration below for HikariCP in the configuration file, and it is working fine.
## Database Connection Pool
play.db.pool = hikaricp
play.db.prototype.hikaricp.connectionTimeout=120000
play.db.prototype.hikaricp.idleTimeout=15000
play.db.prototype.hikaricp.leakDetectionThreshold=120000
play.db.prototype.hikaricp.validationTimeout=10000
play.db.prototype.hikaricp.maxLifetime=120000
I use Spring Boot version 2.0.2 to build a web application with the default connection pool, HikariCP.
The HikariCP debug log shows the correct connection pool size (2), but the Spring Boot metrics show a connection creation count of 1.
Did I misunderstand something?
Thanks in advance.
application.yml is below:
spring:
  datasource:
    minimum-idle: 2
    maximum-pool-size: 7
Log:
DEBUG 8936 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - After cleanup stats (total=2, active=0, idle=2, waiting=0)
URL for metrics: http://localhost:8080/xxx/metrics/hikaricp.connections.creation
Response:
{
  name: "hikaricp.connections.creation",
  measurements:
  [
    {
      statistic: "COUNT",
      value: 1 <--- I think this should be 2
    },
    ...
  ]
}
What you are seeing is HikariCP's fail-fast check behaviour with regard to how metrics are tracked at this stage.
(I actually dug into this, as I didn't know the answer beforehand.)
At that point a MetricsTracker isn't set yet, so the initial connection creation isn't counted. If the initial connection can be established, HikariCP simply keeps it; in your case, only the next connection creation is counted.
If you really want the metric value to be "correct", you can set spring.datasource.hikari.initialization-fail-timeout=-1. The behaviour is described in HikariCP's README under initializationFailTimeout.
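In application.yml form, mirroring the configuration in the question, that would be:

spring:
  datasource:
    hikari:
      initialization-fail-timeout: -1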
Whether you really need a "correct" value is debatable, as you'll only miss that initial count. Ideally you'll want to reason about connection creation counts within a specific time window, e.g. the rate of connection creations per minute, to determine whether you dispose of connections from the pool too early.
I have an application running on WebSphere Application Server 6.1.0.43, and I'm having slowdown issues when trying to invoke a remote service.
The slowdown is in the findGroupAndGetConnection method of the class OutboundConnectionCache.
According to the IBM APAR PK94494:
The delay occurs after the client-side JAX-RPC handler (if present) is invoked and before the actual SOAP message is sent to the provider.
Because the delay occurs in the IBM web services engine, this problem
can be difficult to detect.
A com.ibm.ws.webservices.engine.transport.*=all trace will show entries similar to these which repeat:
[8/19/09 18:08:29:658 GMT] 00000047 OutboundConne 1 Enter:
WSWS3595I: Current pool size: 25. Connections-in-use size: 0.
Configured pool size: 25
In addition, that same trace spec will show long delays in executing
the .findGroupAndGetConnection() method:
[8/19/09 18:08:03:428 GMT] 00000047 OutboundConne >
OutboundConnectionCache.findGroupAndGetConnection()
WAITING_THREADS_THRESHOLD is 5 Entry
[8/19/09 18:08:38:358 GMT] 00000047 OutboundConne <
OutboundConnectionCache.findGroupAndGetConnection() Exit
And they recommend the following:
Reduce the 'com.ibm.websphere.webservices.http.connectionPoolCleanUpTime' property from the default of 180 to 120 seconds.
Increase the max connections 'com.ibm.websphere.webservices.http.maxConnection' property from the default of 25 to 50. This will also require increasing the web container thread pool size to 100.
Before changing the default properties, I decided to monitor Web Container thread usage, and I noticed that the maximum thread pool size (50) is never reached, but the minimum pool size (10) is reached very often, forcing connections to be destroyed and recreated.
Could running over the minimum pool size be causing this slowdown? Should I increase the minimum pool size? Or is my problem something other than the HTTP outbound connection pool?
After running my application, I am getting this error after around 5 minutes.
Even though I am returning the resource after use, I keep getting this.
I have built jedis-2.2.2-SNAPSHOT.jar from the Jedis code base, since it's not released yet.
I had set minIdle=100, maxIdle=200, and maxActive=200. At the time of this exception, the connection count from my application to Redis was 122.
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:42)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:442)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
at redis.clients.util.Pool.getResource(Pool.java:40)
... 6 more
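For reference, the way I borrow and return connections is essentially the minimal sketch below (host, port, and the key are placeholders; maxTotal is the commons-pool2 name for what older releases called maxActive):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class JedisPoolUsage {
    public static void main(String[] args) {
        JedisPoolConfig cfg = new JedisPoolConfig();
        cfg.setMinIdle(100);
        cfg.setMaxIdle(200);
        cfg.setMaxTotal(200); // commons-pool2 equivalent of maxActive

        JedisPool pool = new JedisPool(cfg, "localhost", 6379); // placeholder host/port

        Jedis jedis = pool.getResource();
        try {
            jedis.set("someKey", "someValue"); // placeholder key/value
        } finally {
            // Return the connection even when an exception is thrown,
            // otherwise the pool leaks and getResource() eventually times out.
            pool.returnResource(jedis);
        }

        pool.destroy();
    }
}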
Did you check that Redis is still up and running?
If not, investigate why it died.
Try redis-cli in a terminal if you can; the "info" command will give you more details.