I have a Redis server that holds almost 1,500,000 keys under heavy traffic (500 requests per second, about 50 write operations per request), running on Windows Server 2008 R2. It works well, with very low response times. However, when the snapshot process starts, RAM usage keeps increasing and then, a couple of minutes later, goes back to normal.
Here is a screenshot of when it happens:
And here is the Redis snapshot configuration in the conf file:
save 900 1
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
Here is the memory info from the Redis client:
Is this normal? If not, how can I fix it?
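For reference, a hedged sketch of how the memory and persistence stats could be captured around a snapshot from a Java client, assuming the Jedis library (host and port are placeholders; any client that can issue INFO works the same way):

import redis.clients.jedis.Jedis;

// Capture the fields that matter while the snapshot runs:
// used_memory / used_memory_peak (INFO memory) and rdb_bgsave_in_progress (INFO persistence).
try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
    System.out.println(jedis.info("memory"));
    System.out.println(jedis.info("persistence"));
}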
I came across this issue during a load test, where we see a considerable increase in the application's response time when new pods are scaled up by the Kubernetes HPA. The HPA we have set targets 75% CPU utilization, and a minimum of 3 pods are already running. So, for example:
As you can see, the response time increases drastically; the peaks in this image are the times when new pods scale up. Even allowing for the Java application taking some time to start and warm up the JVM, the requests served almost drop to zero during this period.
Any clue what could be causing the issue?
Make sure that you have the correct readiness probe for your pod. It seems that the new pod gets the Ready status before it is actually ready to serve traffic.
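One common way to back such a probe is a readiness endpoint in the application itself that only reports healthy once warm-up is finished; the pod's httpGet readiness probe then points at that path. A minimal sketch, assuming a plain servlet container (the /ready path, class name, and warm-up flag are illustrative):

import java.util.concurrent.atomic.AtomicBoolean;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Kubernetes marks the pod Ready (and routes traffic to it) only once this returns 200.
@WebServlet("/ready")
public class ReadinessServlet extends HttpServlet {
    // Flipped to true by the application after caches are loaded and warm-up is done.
    public static final AtomicBoolean WARMED_UP = new AtomicBoolean(false);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        resp.setStatus(WARMED_UP.get() ? 200 : 503);
    }
}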
We are doing a load test on Apache Ignite.
We have 1 DB server and 1 Tomcat app server.
Both machines have this setup.
CPU: Intel i7
Speed: 2.6 GHz
Cores: 4
RAM: 16 GB
Disk: 500 GB
Configuration ->
App server Java Heap -> Xms512m, Xmx3072m.
DB server Java Heap -> Xms512m, Xmx3072m.
DB server persistence -> true
DB server Offheap Max size -> 3072m.
Write throttling enabled.
Client failure detection timeout set to 10000ms
Failure detection timeout set to 30000ms
Query thread pool size is the default -> 8
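For reference, the DB-server settings above correspond roughly to an IgniteConfiguration like the following sketch (built against the Ignite 2.x API; the values come from the list above, everything else is illustrative):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();

// Off-heap data region of 3072m with native persistence and write throttling enabled.
DataStorageConfiguration storage = new DataStorageConfiguration();
DataRegionConfiguration region = storage.getDefaultDataRegionConfiguration();
region.setMaxSize(3072L * 1024 * 1024);
region.setPersistenceEnabled(true);
storage.setWriteThrottlingEnabled(true);
cfg.setDataStorageConfiguration(storage);

// Failure detection timeouts as listed above.
cfg.setClientFailureDetectionTimeout(10_000);
cfg.setFailureDetectionTimeout(30_000);

// Query thread pool size left at the default of 8.
cfg.setQueryThreadPoolSize(8);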
Scenario ->
Via the Tomcat app server, I started 500 threads that run business logic to set and get data from Ignite. Code-wise, there is semaphore locking around cache access, and threads are usually in a blocked state while other threads are using the resources. After running for, say, 3-4 hours, the app server threw the warning below.
"org.apache.ignite.logger.java.JavaLogger" "warning" "WARNING" "" "294" "Communication SPI session write timed out (consider increasing 'socketWriteTimeout' configuration property) [remoteAddr=xxxxx/xxxxx:47100, writeTimeout=2000]" "" "" "" "" "" "" "" "" "" "" "" "" "" "ROOT" "{""service"":""xxxx"",""logger_name"":""org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi""}"
I've also seen quite a few "Possible too long JVM pause" messages before that warning, showing around 500 ms to 1000 ms of delay.
A few minutes after the exception was thrown, the client got disconnected and queries threw this error ->
org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Query was canceled, client node disconnected.
While it was running fine, I enabled JProfiler just to see how it was behaving in the app server JVM, and I could see a lot of threads in the "Blocked" state. Since it's a 4-core machine, I can see at most 12-15 app server threads being executed at a time (using logical cores). I then quit the profiler and let it run for another 2-3 hours until the exception occurred.
Although in reality we won't spawn that many threads, and in production we will have hundreds of cores on the servers, it's important for us to understand how to set up a deployment that will scale to meet the need of spawning many threads.
Can someone please explain?
Sounds like a long GC pause is causing the client node to get segmented. What's the longest "Possible too long JVM pause" message you see? Have you tried increasing failureDetectionTimeout? Or decreasing the heap size of the client nodes?
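If it does come down to the write timeout and segmentation, the knobs mentioned in the warning and in this answer look roughly like this on the IgniteConfiguration (a sketch only; the values are illustrative, not recommendations):

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

IgniteConfiguration cfg = new IgniteConfiguration();

// Raise the communication write timeout the warning complains about (default is 2000 ms).
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setSocketWriteTimeout(10_000);
cfg.setCommunicationSpi(commSpi);

// Give GC pauses more headroom before a node is considered failed.
cfg.setFailureDetectionTimeout(60_000);
cfg.setClientFailureDetectionTimeout(60_000);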
I am caching my serialized POJOs (4 MB - 8 MB objects) concurrently into a Couchbase server with the Couchbase Java client (couchbase-client-1.4.3).
for (int i = 0; i < 20; i++) {
    // cacheObject() stands in for the code that writes one object to Couchbase
    new Thread(this::cacheObject).start();
    Thread.sleep(500); // the less sleep time, the more cache failures :(
}
I have 2 replicated servers. The client can cache small objects, but when the object size increases, it throws exceptions:
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: 192.168.0.1/192.168.0.2:11210
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:167)
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:140)
I found similar questions and answers. However, I am not in a position to add more memory, as the applications that use the Couchbase client have their own memory concerns. I did, however, try adding JVM arguments such as -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=500.
This is how I create the Couchbase cache client:
CouchbaseConnectionFactoryBuilder cfb = new CouchbaseConnectionFactoryBuilder();
cfb.setFailureMode(FailureMode.Retry);
cfb.setMaxReconnectDelay(5000);
cfb.setOpTimeout(15000);
cfb.setOpQueueMaxBlockTime(10000);
client = new CouchbaseClient(cfb.buildCouchbaseConnection(uris, BUCKET_TYPE, PASSWORD));
I tried larger time gaps to make caching succeed and avoid timeouts, but that doesn't work either. In our real live applications, 7 or 8 cache operations can usually happen within a second. The applications cannot hold up processing until a cache write completes successfully. (If they wait, then caching is useless because of the time it consumes; going directly to the database is always cheaper!)
Please, can anyone let me know how I can improve my Couchbase client (since I have hardware and JVM limitations, I am looking for a way to improve the client) to avoid such timeouts and improve performance? Can't I do the serialization and compression outside of the Couchbase client and handle it myself?
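On that last idea, here is a hedged sketch of serializing and GZIP-compressing the value yourself before handing it to the client from the snippet above (compressAndSet is an illustrative helper, not part of the client API; reads would need the matching decompression step):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

// Serialize and GZIP the POJO ourselves so only a compressed byte[] is handed to the client.
byte[] compress(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
        out.writeObject(value);
    }
    return bytes.toByteArray();
}

void compressAndSet(String key, Serializable value) throws IOException {
    // The default transcoder should store a byte[] as-is, so the large object
    // travels over the wire already compressed.
    client.set(key, 0, compress(value));
}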
Update: my Couchbase setup.
- I am caching serialized objects of 5 to 10 MB.
- I have 2 nodes on different machines.
- Each PC has 4 GB RAM. CPUs: one PC has 2 cores, the other has 4 cores. (Is that not enough?)
- The client application runs on the PC that has 4 cores.
- I configured a LAN just for this testing.
- Both OSes are Ubuntu 14; one PC is 32-bit, the other 64-bit.
- The Couchbase version is the latest Community Edition, couchbase-server-community_2.2.0_x86_64. (Is this buggy? :( )
- Couchbase client: Couchbase-Java-Client-1.4.3
- There are 100 threads started with a 500 ms gap. Each thread caches into Couchbase.
Also, I checked the system monitoring. The PC on which the CB node and the client run shows higher CPU and RAM usage, but the other replicated PC (with weaker hardware) does not show much usage and looks normal.
EDIT: Can this happen because of a client-side issue, or because of the CB server? Any ideas, please?
When I send about 100 users to my web service, I get responses and the web service performs fine, but when I test with 1000 concurrent users, none of the requests get a reply.
I am using JMeter for testing.
When I send 1000 concurrent users, my GlassFish admin panel times out in the browser and only opens after 4-5 minutes. The same happens for the WSDL URL.
I have tested my web service on our LAN, and it handles 2000 queries without any issues.
Please help me find a solution.
Edit 1.0
Some more findings
Following your recommendation, what I did was simply return a string from the web service call: no lookup, no DAO, nothing... just returning a string.
The thread pool is 2000; no issues there.
Now, when I ran JMeter for 1000 users, they ran much faster and responses were returned for ~200 requests.
So this means that my PC running Windows 7 with an i5 processor and 4 GB RAM is outperforming a HostGator dedicated server with 4 GB RAM and an 8-core Xeon 5*** :(
This is not what I am paying $220 a month for...
Correct me if my finding is wrong: I tested my app on a LAN between two PCs locally and it can process 2000+ messages smoothly.
Edit 1.1
After a lot of reading and experimenting, I have come to the conclusion that network latency is responsible for this behavior.
I increased the bean pool size in GlassFish's admin panel, and it helped raise the number of concurrent users to 300, but the issue arises again no matter how many beans I keep in the pool.
So, friends, the question is: please suggest some other settings I can change in GlassFish's admin panel to remove this issue at the root!
You need to add some performance logging for the various steps that your service performs. Does it do multiple steps? Is computation slow? Is database access slow? Does your connection pool not scale well? Do things need to be tweaked in the web server to allow for such high concurrency? You'll need to measure these things to find the bottlenecks so you can eliminate them.
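A minimal sketch of what such step-level timing could look like (the step methods and logger name are placeholders, not your actual code):

import java.util.logging.Logger;

private static final Logger PERF = Logger.getLogger("perf");

public String handle(String input) {
    long t0 = System.nanoTime();
    String data = lookup(input);        // placeholder: your lookup/DAO step
    long t1 = System.nanoTime();
    String result = compute(data);      // placeholder: your computation step
    long t2 = System.nanoTime();
    PERF.info(String.format("lookup=%dms compute=%dms",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000));
    return result;
}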
I had the same problem on a server (with 200+ simultaneous users). I studied the official GlassFish tuning guide, but there is a very important parameter that doesn't appear there. I used JMeter too, and in my case the response time increased exponentially while the server's processor stayed low.
In the GlassFish admin console (Configurations / server-config / Network Config / Thread Pools / http-thread-pool) you can see how many users your server can handle. (The parameters are different in GlassFish 2 and 3.)
Max Queue Size: The maximum number of threads in the queue. A value of -1 indicates that there is no limit to the queue size.
Max Thread Pool Size: The maximum number of threads in the thread pool
Min Thread Pool Size: The minimum number of threads in the thread pool
Idle Thread Timeout: The maximum amount of time that a thread can remain idle in the pool. After this time expires, the thread is removed from the pool.
I recommend setting Max Thread Pool Size to 100 or 200 to fix the problem.
You can also set other JVM options, for example:
-Xms / -Xmx
-server
-XX:ParallelGCThreads
-XX:+AggressiveOpts
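With concrete (purely illustrative) values, these would be added as JVM options in the GlassFish admin console, for example:

-server
-Xms2048m
-Xmx2048m
-XX:ParallelGCThreads=4
-XX:+AggressiveOpts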
I hope it helps.
I created a web service, both client and server, and I wanted to do some performance testing. I used JMeter with a sample test plan to execute it. Up to 3000 requests, JBoss processed them, but beyond 3000 requests some of them are not processed (in the sense of "Can't open connection: Connection refused"). Where do I have to make changes to handle more than 10000 requests at the same time? Is it a JBoss issue or system throughput?
JMeter config: 300 threads, 1 sec ramp-up and 10 loops.
System (server config): Windows 7, 4 GB RAM
Where do I have to make changes to handle more than 10000 requests at the same time?
Ten thousand concurrent requests in Tomcat (which I believe is what JBoss uses) is quite a lot. In a typical setup (with a blocking I/O connector) you need one thread per HTTP connection, which is far too many for an ordinary JVM: on a 64-bit server machine each thread needs 1 MiB of stack by default (check out the -Xss parameter), so 10000 threads would need roughly 10 GiB for stacks alone, and you only have 4 GiB.
Moreover, the number of context switches will kill your performance. You would need hundreds of cores to handle all these connections effectively, and if your requests are I/O- or database-bound, you'll see a bottleneck elsewhere anyway.
That being said, you need a different approach: either try non-blocking I/O or asynchronous servlets (available since Servlet 3.0), or... scale out. By default Tomcat can handle 100-200 concurrent connections (a reasonable default) and a similar number of connections are queued; everything above that is rejected, which is probably what you are experiencing.
See also
Advanced IO and Tomcat
Asynchronous Support in Servlet 3.0
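To illustrate the asynchronous servlet option mentioned above, a minimal Servlet 3.0 sketch (the URL pattern, pool size, and placeholder work are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// asyncSupported frees the container thread while the slow work runs elsewhere,
// so a small connector pool can keep many requests in flight.
@WebServlet(urlPatterns = "/work", asyncSupported = true)
public class AsyncWorkServlet extends HttpServlet {
    private final ExecutorService pool = Executors.newFixedThreadPool(50);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        pool.submit(() -> {
            try {
                ctx.getResponse().getWriter().write("done"); // placeholder work
            } catch (Exception e) {
                // log the failure; the request is still completed below
            } finally {
                ctx.complete();
            }
        });
    }
}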
There are two common problems that I can think of.
First, if you run JBoss on Linux as a normal user, you can run into 'Too many open files' if you did not edit the limits.conf file. See https://community.jboss.org/thread/155699. Each open socket counts as an 'open file' on Linux, so the OS could be blocking your connections because of this.
Second, the maximum thread pool size for incoming connections is 200 by default. This limits the number of concurrent requests, i.e. requests that are in progress at the same time. If you have JMeter running 300 threads, the JBoss connector thread pool should be larger. In JBoss 6 you can find this in jboss-web.sar/server.xml; look for 'maxThreads' on the Connector element: http://docs.jboss.org/jbossweb/latest/config/http.html.
200 is the recommended maximum for a single-core CPU. Above that, the context switches start to add too much overhead, as Tomasz says. So for production use, only increase it to 400 on a dual core, 800 on a quad core, etc.