We are troubleshooting the exception below. Our environment versions:
openjdk version "11.0.18"
nginx/1.14.0
com.squareup.okhttp3:okhttp:3.8.0
When we test in our local environment, without nginx and the classic load balancer, we do not get any errors.
When the application runs in our live environment, we sometimes get the error; it is not regular. For example, I checked the requests: the counts below are runs of consecutive successes between failures, e.g. 15 requests succeed, then we get an exception; then 13 succeed, then another error. It is neither regular nor linear.
15, 13, 13, 11, 14, 17, 30, 30, 11, 2, 37, 27, 6, 2, 5, 21, 10, 29, 13, 51.
I found some nginx configuration suggestions involving proxy_redirect off; and proxy_buffering off;, and I also found some related bugs reported against nginx 1.15, but I could not confirm that those bugs affect the 1.14 version we run.
root#sandbox-gateway-1b.teller-sandbox [~] # cat /srv/www/gateway/logs/spring.log | grep '13:52:27.746' -A 30
2023-02-08 13:52:27.746 [https-jsse-nio-32752-exec-16] INFO JavaUtilLogger : [principal : null | request-id : 66420bde-abbe-4e9e-ba89-2a5b6e6390e9] HTTP CALL EXCEPTION: https://sandbox-api.example.com/api/v1/auth/token
okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
    at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream.java:384)
    at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.java:286)
    at okio.Buffer.writeAll(Buffer.java:1143)
    at okio.RealBufferedSource.readString(RealBufferedSource.java:203)
    at okhttp3.ResponseBody.string(ResponseBody.java:182)
    at common.restclient.HttpClient.doCall(HttpClient.java:92)
    at common.restclient.HttpClient.call(HttpClient.java:53)
    at common.restclient.RestClientImpl.postWithoutHeaders(RestClientImpl.java:45)
    at example.client.exampleClientImpl.login(exampleClientImpl.java:37)
    at com.dux.gateway.core.adapters.example.example.loginToexample(example.java:74)
    at com.dux.gateway.core.adapters.example.example.processTransaction(example.java:47)
    at com.dux.gateway.core.adapters.example.example.processTransaction(example.java:22)
    at com.dux.gateway.core.TransactionProcessorLoggerDecorator.processTransaction(TransactionProcessorLoggerDecorator.java:26)
    at com.dux.gateway.core.TransactionProcessorValidatorDecorator.processTransaction(TransactionProcessorValidatorDecorator.java:24)
    at com.dux.gateway.core.PaymentGateway.execute(PaymentGateway.java:87)
    at com.dux.gateway.api.controller.TransactionController.createTransaction(TransactionController.java:86)
    at jdk.internal.reflect.GeneratedMethodAccessor562.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:197)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:141)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:894)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1063)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:963)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:909)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:681)
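Since the exception is thrown from okhttp3.internal.http2, one experiment I am considering is pinning the client to HTTP/1.1 to rule out the HTTP/2 path between the client and the remote API. A minimal sketch of the client setup (standard OkHttp 3.x builder calls; this is not our production code):

import java.util.Collections;
import okhttp3.OkHttpClient;
import okhttp3.Protocol;

// Sketch: restrict the client to HTTP/1.1 so ALPN cannot negotiate h2
// with the remote side. If the CANCEL stream resets disappear, the
// problem is in the HTTP/2 path rather than in our request handling.
OkHttpClient client = new OkHttpClient.Builder()
        .protocols(Collections.singletonList(Protocol.HTTP_1_1))
        .retryOnConnectionFailure(true) // retries transparently on stale pooled connections (this is the default, shown for clarity)
        .build();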
I'm working on switching Keycloak (v3.4.0.Final) from using embedded Infinispan to a dedicated remote Infinispan cluster (v8.2.8.Final). I've gone through the upgrade process to use the Infinispan cluster as a remote-store in lower environments without issues. In my production setting I am running into a timeout exception in InfinispanCacheInitializer.
Where the error happens in Keycloak: https://github.com/keycloak/keycloak/blob/3.4.2.Final/model/infinispan/src/main/java/org/keycloak/models/sessions/infinispan/initializer/InfinispanCacheInitializer.java#L117
ERROR [org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer] (ServerService Thread Pool -- 54) ExecutionException when computed future. Errors: 13: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException
at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:850)
at org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer.startLoading(InfinispanCacheInitializer.java:102)
at org.keycloak.models.sessions.infinispan.initializer.DBLockBasedCacheInitializer.startLoading(DBLockBasedCacheInitializer.java:75)
at org.keycloak.models.sessions.infinispan.initializer.CacheInitializer.loadSessions(CacheInitializer.java:41)
at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$2.run(InfinispanUserSessionProviderFactory.java:150)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)
at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.loadPersistentSessions(InfinispanUserSessionProviderFactory.java:137)
at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$1.onEvent(InfinispanUserSessionProviderFactory.java:108)
at org.keycloak.services.DefaultKeycloakSessionFactory.publish(DefaultKeycloakSessionFactory.java:68)
at org.keycloak.services.resources.KeycloakApplication$2.run(KeycloakApplication.java:165)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)
at org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:158)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.jboss.resteasy.core.ConstructorInjectorImpl.construct(ConstructorInjectorImpl.java:150)
at org.jboss.resteasy.spi.ResteasyProviderFactory.createProviderInstance(ResteasyProviderFactory.java:2298)
at org.jboss.resteasy.spi.ResteasyDeployment.createApplication(ResteasyDeployment.java:340)
at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:253)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:120)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.init(HttpServletDispatcher.java:36)
at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)
at org.wildfly.extension.undertow.security.RunAsLifecycleInterceptor.init(RunAsLifecycleInterceptor.java:78)
at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:103)
at io.undertow.servlet.core.ManagedServlet$DefaultInstanceStrategy.start(ManagedServlet.java:250)
at io.undertow.servlet.core.ManagedServlet.createServlet(ManagedServlet.java:133)
at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:565)
at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:536)
at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42)
at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)
at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:578)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:100)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
at org.jboss.threads.JBossThread.run(JBossThread.java:320)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:88)
at org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart.getResult(DefaultExecutorService.java:1083)
at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.innerGet(DefaultExecutorService.java:868)
at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:848)
... 44 more
Overview
Keycloak version: 3.4.0.Final (I'm aware this is an older version; it's in use with a custom implementation and not easy to upgrade)
Startup script: ExecStart={{ keycloak_jboss_home }}/bin/standalone.sh -b using standalone.xml
Infinispan version: 8.2.8.Final
Switching from the embedded local cache to a remote-store configuration for the following caches (a sketch of the wiring is shown after the list):
users (distributed)
sessions (replicated)
authenticationSessions (replicated)
offlineSessions (replicated)
loginFailures (replicated)
authorization (replicated)
offline_user_session count: ~3 million
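For reference, each cache is pointed at the remote cluster following the pattern from the Keycloak cross-DC documentation; a sketch of one cache from my standalone.xml (the remote-servers name remote-cache is specific to my setup, not a Keycloak default):

<replicated-cache name="offlineSessions" mode="SYNC">
    <!-- remote-store pushes entries to the dedicated Infinispan cluster -->
    <remote-store cache="offlineSessions" remote-servers="remote-cache"
                  passivation="false" fetch-state="false" purge="false"
                  preload="false" shared="true">
        <property name="rawValues">true</property>
        <property name="marshaller">org.keycloak.cluster.infinispan.KeycloakHotRodMarshallerFactory</property>
    </remote-store>
</replicated-cache>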
In an attempt to test a cache sync of this size from the database, I've updated the timeout configuration in the standalone.xml and standalone.conf files.
keycloak/keycloak-3.4.0.Final/standalone/configuration/standalone.xml
Updated the coordinator timeout to 3 hours and commented out query-timeout:
<subsystem xmlns="urn:jboss:domain:transactions:4.0">
<core-environment>
<process-id>
<uuid/>
</process-id>
</core-environment>
<recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>
<object-store path="tx-object-store" relative-to="jboss.server.data.dir"/>
<coordinator-environment default-timeout="10800"/>
</subsystem>
....
....
..
<!-- <timeout>
<query-timeout>15</query-timeout>
</timeout>-->
/keycloak/keycloak-3.4.0.Final/bin/standalone.conf
Added the management blocking timeout to JAVA_OPTS:
JAVA_OPTS="$JAVA_OPTS -Djboss.modules.system.pkgs=$JBOSS_MODULES_SYSTEM_PKGS -Djava.awt.headless=true -Djboss.as.management.blocking.timeout=10800"
I'd like to note that after the first few attempts at getting this to work, when reverting back to the embedded cache on the Keycloak nodes, the data synced fine, without any timeout errors, in around 1.5 hours.
After Keycloak starts up, it takes about 60 minutes before it begins syncing offline user sessions. Looking at the queries being run by Keycloak, I can see that the timeout errors occur about 5-10 minutes after it starts syncing offline_user_session records into the offlineSessions cache.
The queries being run before it times out are:
delete from OFFLINE_CLIENT_SESSION where not (exists (select persistent1_.USER_SESSION_ID from OFFLINE_USER_SESSION persistent1_ where persistent1_.USER_SESSION_ID=OFFLINE_CLIENT_SESSION.USER_SESSION_ID))
update OFFLINE_USER_SESSION set LAST_SESSION_REFRESH=$1
DELETE FROM JGROUPSPING WHERE own_addr=$1 AND cluster_name=$2
select count(persistent0_.OFFLINE_FLAG) as col_0_0_ from OFFLINE_USER_SESSION persistent0_ where persistent0_.OFFLINE_FLAG=$1
select userrolema0_.ROLE_ID as col_0_0_ from USER_ROLE_MAPPING userrolema0_ where userrolema0_.USER_ID=$1
select userentity0_.ID as ID1_76_, userentity0_.CREATED_TIMESTAMP as CREATED_2_76_, userentity0_.EMAIL as EMAIL3_76_, userentity0_.EMAIL_CONSTRAINT as EMAIL_CO4_76_, userentity0_.EMAIL_VERIFIED as EMAIL_VE5_76_, userentity0_.ENABLED as ENABLED6_76_, userentity0_.FEDERATION_LINK as FEDERATI7_76_, userentity0_.FIRST_NAME as FIRST_NA8_76_, userentity0_.LAST_NAME as LAST_NAM9_76_, userentity0_.NOT_BEFORE as NOT_BEF10_76_, userentity0_.REALM_ID as REALM_I11_76_, userentity0_.SERVICE_ACCOUNT_CLIENT_LI
select attributes0_.USER_ID as USER_ID4_72_0_, attributes0_.ID as ID1_72_0_, attributes0_.ID as ID1_72_1_, attributes0_.NAME as NAME2_72_1_, attributes0_.USER_ID as USER_ID4_72_1_, attributes0_.VALUE as VALUE3_72_1_ from USER_ATTRIBUTE attributes0_ where attributes0_.USER_ID=$1
select persistent0_.OFFLINE_FLAG as OFFLINE_1_47_, persistent0_.USER_SESSION_ID as USER_SES2_47_, persistent0_.DATA as DATA3_47_, persistent0_.LAST_SESSION_REFRESH as LAST_SES4_47_, persistent0_.REALM_ID as REALM_ID5_47_, persistent0_.USER_ID as USER_ID6_47_ from OFFLINE_USER_SESSION persistent0_ where persistent0_.OFFLINE_FLAG=$1 order by persistent0_.USER_SESSION_ID limit $2 offset $3
I set up the Infinispan WebConsole UI so I can watch the progress of the cache sync. Every time, it gets about 15k entries in (out of the ~3 million) before the timeouts start.
I am not positive what the issue is here, since syncing offline sessions from the database works fine with the embedded cache; with the remote Infinispan setup there seems to be a problem either with the batching of the queries or with another configuration I am missing on the Keycloak or Infinispan side.
Update: further testing
We set up a test environment with a snapshot of the database containing 3.5 million OUS/OCS records. The RDS instance was provisioned with 5500 IOPS. After upgrading to Keycloak version 5.0, timeouts were still happening, but running VACUUM ANALYZE on the entire database resolved the issue and we were able to successfully stand up remote Infinispan. However, after those successful runs, we ran into the same timeouts in our live environment, and VACUUM ANALYZE did not fix the issue there.
I am using the following packages:
Apache zookeeper 3.4.14
Apache storm 1.2.3
Apache Maven 3.6.2
ElasticSearch 7.2.0 (hosted locally)
Java 1.8.0_252
AWS EC2 medium instance with 4 GB RAM
I used this command to increase the virtual memory map count (earlier it was showing an error about the JVM not having enough memory):
sysctl -w vm.max_map_count=262144
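To keep that setting across reboots, it can also go into /etc/sysctl.conf (standard sysctl behavior; shown here as a sketch):

# persist the memory-map limit that Elasticsearch requires
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
sysctl -p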
I created the Maven package with:
mvn archetype:generate -DarchetypeGroupId=com.digitalpebble.stormcrawler -DarchetypeArtifactId=storm-crawler-elasticsearch-archetype -DarchetypeVersion=LATEST
Command used for submitting the topology:
storm jar target/newscrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local es-crawler.flux --sleep 30000
When I run this command, it shows that my topology is submitted successfully. In the Elasticsearch status index I see FETCH_ERROR along with the URL from seeds.txt, and the content index shows no hits in Elasticsearch.
In the worker.log file there were many exceptions of the following type:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_252]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_252]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) [stormjar.jar:?]
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) [stormjar.jar:?]
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) [stormjar.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
2020-06-12 10:31:14.635 c.d.s.e.p.AggregationSpout Thread-46-spout-executor[17 17] [INFO] [spout #7] Populating buffer with nextFetchDate <= 2020-06-12T10:30:50Z
2020-06-12 10:31:14.636 c.d.s.e.p.AggregationSpout Thread-32-spout-executor[19 19] [INFO] [spout #9] Populating buffer with nextFetchDate <= 2020-06-12T10:30:50Z
2020-06-12 10:31:14.636 c.d.s.e.p.AggregationSpout pool-13-thread-1 [ERROR] [spout #7] Exception with ES query
There are also the following Elasticsearch-related entries in worker.log:
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A1&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"}],"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"},"status":503}
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A8&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 503 Service Unavailable]
{"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]},"status":503}
I have checked the health of the shards; they are in green status.
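For reference, this is how I checked (the standard Elasticsearch health endpoints; the host is localhost as in the logs above):

curl -s 'http://localhost:9200/_cluster/health?pretty'
curl -s 'http://localhost:9200/_cat/shards?v'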
Earlier I was using Java 11, with which I was not able to submit the topology, so I switched to Java 8. Now the topology is submitted successfully, but no data is ingested into Elasticsearch.
I want to know whether there is a version incompatibility between Java and Elasticsearch, or with any other package.
Use an absolute path for the seed file and run it in remote mode. Local mode should be used mostly for debugging.
The sleep parameter is (I think) in milliseconds. The command above means the topology will run for only 30 seconds, which doesn't give it much time to do anything.
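A sketch of what remote submission looks like, assuming the same jar and flux file as above:

storm jar target/newscrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote es-crawler.flux

In remote mode the topology keeps running on the cluster until you kill it, so no --sleep is needed; just make sure any paths in the flux/seed configuration are absolute so the workers can resolve them.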
I'm trying to cache a large dataset of some tables. My server is CentOS-based with 8 GB of RAM and 500 GB of disk space.
I configured my local storage policy to persist, and after hitting a file-open-limit issue I tried to raise the limit to 2,000,000 by following these steps:
vi /etc/sysctl.conf
fs.file-max = 2000000 (2 million)
:wq
sysctl -p
But even with this fix, and after running chmod -x on the work directory, I am still getting the following error:
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin]]
class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:448)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:337)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:478)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:462)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:853)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:694)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getOrAllocatePartitionMetas(GridCacheOffheapManager.java:1679)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:1507)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2137)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:429)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4261)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3407)
at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:771)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.loadEntry(GridDhtCacheAdapter.java:683)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.access$600(GridDhtCacheAdapter.java:103)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$5.apply(GridDhtCacheAdapter.java:633)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$5.apply(GridDhtCacheAdapter.java:629)
at org.apache.ignite.internal.processors.cache.store.GridCacheStoreManagerAdapter$3.apply(GridCacheStoreManagerAdapter.java:535)
at org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore$1.call(CacheAbstractJdbcStore.java:469)
at org.apache.ignite.cache.store.jdbc.CacheAbstractJdbcStore$1.call(CacheAbstractJdbcStore.java:433)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:56)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:43)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:420)
... 23 more
Nov 24, 2019 4:54:51 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to initialize partition file: /home/grid-gain-server/gridgain-community-8.7.7/work/db/node00-3273af50-1e97-47fa-a237-29e7dfc2d987/cache-COrderCache/part-56.bin]]
What can I do to fix it?
Adding the following configuration was enough for me to avoid this exception:
vi /etc/security/limits.conf
root soft nofile 10240
root hard nofile 20480
Then in /etc/sysctl.conf I appended the max watcher config
fs.inotify.max_user_watches=524288
Note that root is my user account name.
The values are experimental; I'm not sure whether they are safe, but I haven't had any notable issues in my VM.
I didn't drop the previous configuration.
A reboot was needed.
Credit to @Stephen Darlington.
Just to explain what's going on here: fs.file-max sets an overall limit for the operating system, while the entries in limits.conf set limits for each user. The only other thing I would add is that if you're running Ignite as a user other than root (recommended), you'd change that user's limits.
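For example, to verify the effective limit and set one for a dedicated user (a sketch; "ignite" here is a hypothetical service account, not from the question):

# effective open-file limit for the current shell user
ulimit -n

# /etc/security/limits.conf entries for a non-root service user
ignite soft nofile 10240
ignite hard nofile 20480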
I have a configuration problem that has me stumped. I have a couple webapps that run in Tomcat and are connected and accessed through Apache httpd. I previously used Tomcat 7 and Apache 2.2, and I installed Tomcat 9 and Apache 2.4 and loaded my webapps. I read up on the configuration changes, and I thought I adjusted as needed, but for some reason only one of my two apps is accessible. That should rule a lot of things out, since the one works just fine.
I will add my abbreviated Apache httpd config below. I did adjust the Order deny,allow stuff to Require all granted in the conf file. I wonder if it's related to the JkMount directives, but this is how it worked in Apache 2.2. Could it be related to one of the webapps running as ROOT (/)? I do see some errors in my mod_jk.log, such as:
[info] jk_open_socket::jk_connect.c (817): connect to 127.0.0.1:8010 failed (errno=61)
[info] ajp_connect_to_endpoint::jk_ajp_common.c (1068): (worker1) Failed opening socket to (127.0.0.1:8010) (errno=61)
[error] ajp_send_request::jk_ajp_common.c (1728): (worker1) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=61)
[info] ajp_service::jk_ajp_common.c (2778): (worker1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
..
[info] ajp_service::jk_ajp_common.c (2778): (worker1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[error] ajp_service::jk_ajp_common.c (2799): (worker1) connecting to tomcat failed (rc=-3, errors=1, client_errors=0).
[info] jk_handler::mod_jk.c (2995): Service error=-3 for worker=worker1
Any help is greatly appreciated!
Apache 2.4 httpd.conf
Listen 80
LoadModule ssl_module modules/mod_ssl.so
LoadModule jk_module modules/mod_jk.so
JkWorkersFile conf/workers.properties
JkShmFile "logs/mod_jk.shm"
JkLogFile "logs/mod_jk.log"
JkLogLevel info
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkMount / worker1
JkMount /* worker1
JkMount /webapp2 worker1
JkMount /webapp2/* worker1
ServerName sub.mydomain.com:80
Include conf/extra/httpd-ssl.conf
Apache 2.4 httpd-ssl.conf
Listen 443
Protocols h2 http/1.1
SSLCipherSuite ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
SSLProxyCipherSuite HIGH:MEDIUM:!MD5:!RC4:!3DES
SSLHonorCipherOrder on
SSLProtocol all -SSLv3
SSLProxyProtocol all -SSLv3
SSLPassPhraseDialog builtin
SSLSessionCache "shmcb:C:/Program Files/Apache Software Foundation/Apache24/logs/ssl_scache(512000)"
SSLSessionCacheTimeout 300
<VirtualHost *:80>
ServerName sub.mydomain.com
Redirect permanent / https://sub.mydomain.com/
</VirtualHost>
<VirtualHost _default_:443>
ServerName sub.mydomain.com:443
<Location />
Require all granted
</Location>
<Location /webapp2>
Require all granted
</Location>
SSLEngine on
SSLCertificateFile "C:/ssl/mycert.crt"
SSLCertificateKeyFile "C:/ssl/mykey.key"
SSLCertificateChainFile "C:/ssl/mycabundle.crt"
</VirtualHost>
Apache 2.4 workers.properties
worker.list=worker1
worker.worker1.type=ajp13
worker.worker1.host=localhost
worker.worker1.port=8010
Tomcat 9 server.xml
<Connector port="8010" URIEncoding="utf-8" protocol="AJP/1.3" redirectPort="8443" />
By the way, this is in Windows.
OK, I finally figured this out. I was looking in the wrong place. I tested a different way, and it turned out the Apache-to-Tomcat connection was actually working for the second webapp as well.

The problem actually occurred in PHP code on another server trying to access a resource in this second webapp (and that is the second webapp's sole purpose). Apparently, when I switched from Apache httpd 2.2 to 2.4, the method used in that remote PHP code was no longer able to successfully POST to the webapp resource and retrieve a result, even though the code hadn't changed at all. That made it look at first like the webapp was inaccessible. When I changed the PHP method used for the POST from fsockopen()/fwrite()/fgets()/etc. to file_get_contents(), it worked.

More granular error reporting and a more thorough test early on would have helped, but wow, what a bugger of a problem. I never would have guessed that would be a problem, and I wonder why it stopped working after the change; something else to research, or perhaps another question. I don't know how to explain the errors in the mod_jk.log. Perhaps I had something wrong temporarily, but there are no more errors currently.
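For anyone hitting the same thing, the working PHP call ended up being roughly this (a sketch, assuming a form-encoded POST; the URL and fields are placeholders, not the real resource):

<?php
// Stream-context POST via file_get_contents(), which replaced the
// fsockopen()/fwrite()/fgets() sequence that stopped working under 2.4.
$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(['param' => 'value']), // placeholder fields
        'timeout' => 30,
    ],
]);
$result = file_get_contents('https://sub.mydomain.com/webapp2/resource', false, $context);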
If you are on Linux, you should try issuing "setenforce 0".
Then, to check whether it was successful, issue "getenforce"; you should get "Permissive".
All of this is in the Linux shell.
I went this way 2 months ago.
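In shell terms (note that setenforce 0 is temporary and resets at reboot):

setenforce 0   # switch SELinux to permissive mode for this boot
getenforce     # should now print "Permissive"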
I installed a Tomcat environment on my test server (Fedora 26). Everything is from stock packages. I've also installed and set up an Nginx reverse proxy in front. tomcat-users.xml is set up, and I can log in to the app manager as expected.
Now, when I try to deploy a WAR to it, I get a critical failure in my Nginx log:
2017/09/25 15:12:21 [crit] 13878#0: *36 open() "/var/lib/nginx/tmp/client_body/000000XXXX" failed (13: Permission denied), client: 200.x.x.x, server: some-sandbox.com, request: "POST /manager/html/upload?org.apache.catalina.filters.CSRF_NONCE=XXXXXXXxxxx HTTP/1.1", host: "some-sandbox.com", referrer: "https://some-sandbox.com/manager/html/upload?org.apache.catalina.filters.CSRF_NONCE=XXXXXXXxxxx
Nginx then returns a 500 Internal Server Error to the browser.
What could I have gotten wrong? Any suggestions on how to tackle this?
Thanks.
Apparently there is some permission issue with the temporary upload folder /var/lib/nginx/tmp. I've made sure the whole path is owned by the correct system user, but the issue still exists.
So, to work around the issue, I decided to configure Nginx to skip buffering the client body altogether. For my purposes, there is no practical value in buffering before proxying.
Nginx 1.7.11 introduced a new proxy_request_buffering directive. If you set it to off, buffering is disabled, and hence any permission issue will not affect the upload.
So my server section has this:
location / {
proxy_request_buffering off;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Server $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_pass http://127.0.0.1:8080/;
}
You can check the current user's privileges on the file.
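For example, with standard tools (a sketch; the path comes from the error message above):

ps -o user= -C nginx | sort -u              # which user the nginx workers run as
namei -l /var/lib/nginx/tmp/client_body     # ownership/permissions along the path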