We have a small search application running on-premises. The back end uses Apache Solr 6.6.2 for indexing and storage; the front end is PHP behind an Apache2 web server.
These services run on a single server with 48 cores and 96 GB of RAM. The index is expected to hold about 200 million documents, each with at most 20 fields, and most fields are both indexed and stored.
We expect hundreds of thousands of simultaneous requests. What is the best Solr configuration to handle that? We started Solr with a 20 GB heap and stress tested it, but performance begins to degrade at around 100 concurrent users. Where is the problem, and what is the right way to address it?
We have also tested Solr in SolrCloud mode, but performance did not improve much. We expected that any memory problem would surface as an OutOfMemoryError, but nothing like that happened. We have only changed the schema to match our requirements and set the heap size on the command line; all other settings are defaults.
Here are a few references we have already consulted:
https://wiki.apache.org/solr/SolrPerformanceProblems
https://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
We have 200 million records in each collection and 200 collections in total. We have 5 servers, each with 8 cores and 64 GB of RAM.
I would suggest splitting your setup across multiple servers.
Replicate the data on each server so that requests are divided among them. The more servers you have, the faster you will be able to respond.
Note: keep the 2F+1 replication rule in mind: with 5 servers you should have at least 3 replicas. I would suggest going with 5 replicas (one replica per server).
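If you are running SolrCloud, replication per server translates into the collection's replicationFactor. A minimal SolrJ sketch (the collection name, config set, and ZooKeeper addresses are invented for illustration) that creates a collection with one full copy on each of 5 servers:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateReplicatedCollection {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble address is a placeholder for this sketch.
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                .build()) {
            // 1 shard with replicationFactor 5: a full replica on each of the 5 servers.
            CollectionAdminRequest
                    .createCollection("search", "searchConfigSet", 1, 5)
                    .process(client);
        }
    }
}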
If you plan to handle hundreds of thousands of requests per second, you will need more than one server no matter how big it is, even if only for HA/DR purposes. So I would recommend using SolrCloud, sharding the index across multiple machines with multiple replicas, just to start.
Beyond that, the devil is in the details:
How fast do you need queries to be (median and 99th percentile)? This will help you size CPU and memory.
How complex are your queries?
Are you using filters? (These require more heap memory; see the sketch below.)
How fast is your disk access?
Will you be adding data in real time (which affects your autoCommit and soft commit settings)?
But first and foremost, you need to get away from "one big box" thinking.
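To illustrate the filter point above: filter query results are cached in Solr's heap-resident filterCache, so heavy fq usage raises heap requirements. A hedged SolrJ sketch (the URL and field names are invented):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        // Base URL and field names are placeholders for this sketch.
        try (HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/search").build()) {
            SolrQuery query = new SolrQuery("title:laptop");
            // Each distinct fq entry is cached in the filterCache, which lives on the heap.
            query.addFilterQuery("category:electronics");
            query.addFilterQuery("inStock:true");
            query.setRows(10);
            QueryResponse response = client.query(query);
            System.out.println("Found " + response.getResults().getNumFound() + " documents");
        }
    }
}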
We've been using Hazelcast for a number of years but I'm new to the group.
We have a cluster formed by a dedicated Java application (its sole purpose is to provide the cluster). It uses the 3.8.2 jars and runs on JDK 1.8.0_192 on Linux (CentOS 7).
The cluster manages relatively static data (i.e. a few updates a day or week), although an update may involve changing a 2 MB chunk of data. We're using the default partitioning config with 271 partitions across 6 cluster members. There are between 40 and 80 clients, and each client connection should be long-lived and stable.
"Occasionally" we get into a situation where the Java app that's providing the cluster repeatedly restarts and any client that attempts to write to the cluster is unable to do so. We've had issues in the past where the cluster app runs out of memory due to limits on the JVM command line. We've previously increased these and (to the best of my knowledge) the process restarts are no longer caused by OutOfMemory exceptions.
I'm aware we're running a very old version and many people will suggest simply updating. This is work we will carry out but we're attempting to diagnose the existing issue with the system we have in front of us.
What I'm looking for here is suggestions for the kinds of investigation to carry out or queries to run, either periodically while the system is healthy or while it is in this failed state.
We regularly use tools such as netstat, tcpdump, Wireshark, and top (I'm sure there are more) when diagnosing issues like this, but have been unable to establish a convincing root cause.
Any help greatly appreciated.
Thanks,
Dave
Further to the problem description:
Our only way to resolve the issue is to bounce the cluster completely, i.e. stop all the members and then restart the cluster.
Ideally we'd have a system that remained stable and could recover from whatever "event" causes the issue we're seeing.
This may involve config or code changes.
Updating entries of 2 MB in size has many consequences: large serialization/deserialization costs, fat packets on the network, the cost of accommodating those chunks in the JVM heap, and so on. An ideal entry size is under 30-40 KB.
For your immediate problem, start with GC diagnosis. You can use jstat to investigate memory usage patterns. If you are running into a lot of full GCs and/or back-to-back full GCs, you will need to adjust your heap settings. Also check the network bandwidth, which is usually the prime suspect when fat packets are traveling through the network.
All of the above are just band-aid solutions; you should really look at breaking your entries down into smaller ones.
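A rough sketch of that last suggestion, assuming values are stored in a Hazelcast IMap; the map name, key scheme, and chunk size are invented for illustration, not anything Hazelcast requires:

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import java.nio.ByteBuffer;
import java.util.Arrays;

// Store a large value as several small entries keyed by "<key>:<chunkIndex>"
// so that each serialized object and each network packet stays small.
public class ChunkedWriter {
    private static final int CHUNK_SIZE = 32 * 1024; // ~32 KB, within the suggested range

    public static void put(HazelcastInstance hz, String key, byte[] value) {
        IMap<String, byte[]> map = hz.getMap("chunked-data");
        int chunks = (value.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        map.put(key + ":count", ByteBuffer.allocate(4).putInt(chunks).array());
        for (int i = 0; i < chunks; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, value.length);
            map.put(key + ":" + i, Arrays.copyOfRange(value, from, to));
        }
    }
}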
Wowza is causing us trouble: it does not scale past 6k concurrent users and sometimes freezes at a few hundred. It crashes and starts killing sessions, and we have to step in and restart Wowza multiple times per streaming event.
Our server specs:
DL380 Gen10
2x Intel Xeon Silver 4110 / 2.1 GHz
64 GB RAM
300 GB HDD
Dedicated 10 Gbps network
Some servers run CentOS 6, others CentOS 7
Java 1.8.0_20, 64-bit
Wowza Streaming Engine 4.2.0
I asked and was told:
Wowza scales easily to millions if you put a CDN in front of it (which is trivially easy to do). 6K users off a single instance simply ain't happening. For one, Java maxes out at around 4 Gbps per JVM instance. So even if you have a 10G NIC on the machine, you'll want to run multiple instances if you want to use the full bandwidth.
And:
How many 720p streams can you do on a 10 Gb network @ 2 Mbps? Without network overhead, it's about 5,000. With the limitation of Java at 4 Gbps, it's only 2,000 per instance.
Then if you do manage to utilize that 10 Gb network and saturate it, what happens to all the other applications people are accessing on other servers?
If they want more streams, they need edge servers in multiple data centers or have to somehow get more 10 Gb networks installed.
That's for streaming only. No idea what transcoding would add in terms of CPU load and disk IO.
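The numbers in that quote are just bandwidth divided by per-stream bitrate. A quick sanity check, assuming 2 Mbps per 720p stream as the quote does:

public class StreamCapacity {
    public static void main(String[] args) {
        double bitratePerStreamMbps = 2.0;    // assumed 720p bitrate from the quote
        double nicBandwidthMbps = 10_000.0;   // 10 Gbps NIC
        double claimedJvmLimitMbps = 4_000.0; // the quoted "4 Gbps per JVM" claim

        // 10,000 / 2 = 5,000 streams on the NIC; 4,000 / 2 = 2,000 streams per JVM.
        System.out.println("Streams on a 10 Gbps NIC:  " + (int) (nicBandwidthMbps / bitratePerStreamMbps));
        System.out.println("Streams per JVM at 4 Gbps: " + (int) (claimedJvmLimitMbps / bitratePerStreamMbps));
    }
}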
So I began looking for an alternative to Wowza. Due to the nature of our business, we can't use a CDN or cloud hosting except with a very few clients; everything must be hosted in-house, in the client's datacenter.
I found this article and reached out to the author to ask him about Flussonic and how it compares to Wowza. He said:
I can't speak to the 4 Gbps limit that you're seeing in Java. It's also possible that your Wowza instance is configured incorrectly. We'd need to look at your configuration parameters to see what's happening.
We've had great success scaling both Wowza and Flussonic media servers by pairing them with our peer-to-peer (p2p) CDN service. That was the whole point of the article we wrote in 2016.
Because we reduce the number of HTTP requests that the server has to handle by up to 90% (or more), we increase the capacity of each server by 10x, meaning each server can handle 10x the number of concurrent viewers.
Even on the Wowza forum, some people say that Java maxes out at around 5 Gbps per JVM instance. Others say that this number is incorrect or made up.
Due to the nature of our business, this number, as silly as it is, matters a great deal to us. If Java cannot handle more than 7k viewers per instance, we need to hold meetings and discuss what to do about Wowza.
So is it true that Java maxes out at around 4 Gbps or 5 Gbps per JVM instance?
We are evaluating Apache Ignite for our product. In our scenario we may have 10,000 caches, so I tried this with the Yardstick benchmark framework. I found that when the number of caches climbs to 8192, the Ignite server starts behaving abnormally. The test is expected to finish after 1 minute, since I set that duration in the configuration, but it kept running for over 10 minutes and I had to kill it.
If I set the number of caches to 4096, the test finishes in 1 minute as expected.
So the question: does Apache Ignite support 10,000 caches?
Each cache uses around 20 MB of heap for its data structures (per node). Multiply that by 10,000 and you have 200 GB right there. In practice, Java will not work with that much heap.
Why do you need 10,000 caches anyway? At the very least, consider using cache groups. The best approach would be to have a few caches and route between them.
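As a hedged illustration of the cache group suggestion (cache and group names are invented), caches placed in the same group share internal structures, which reduces the per-cache overhead described above:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheGroupExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            for (int i = 0; i < 100; i++) {
                CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("cache-" + i);
                // Caches in the same group share internal data structures.
                cfg.setGroupName("sharedGroup");
                ignite.getOrCreateCache(cfg);
            }
        }
    }
}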
I am developing a web application in Scala. It's a simple application that takes data on a port from clients (JSON or Protobuf), does some computation using a database server, and then replies to the client with a JSON/Protobuf object.
It's not a very heavy application, 1,000 lines of code at most. It creates a thread for every client request. Right now the time between receiving a request and replying is between 20 and 40 ms.
I need advice on what kind of hardware/setup I should use to serve 3,000+ such requests per second. I need to procure hardware to put in my data center.
If anybody has experience deploying Java apps at scale, please advise. Should I use one big box with 2-4 Xeon 5500s and 32 GB of RAM, or multiple smaller machines?
UPDATE: we don't have many clients, only 3 or 4, and all requests will come from them.
If each request takes 30 ms on average, a single core can handle only about 33 requests per second. Assuming your app scales linearly (the best scenario you can expect), you will need roughly 100 cores to reach 3,000 req/s, which is more than 2-4 Xeons provide.
Worse, if your app relies on IO or on a database (like most useful applications), you will get sublinear scaling and may need a lot more...
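That estimate is straightforward arithmetic; a quick check of the numbers used above:

public class CapacityEstimate {
    public static void main(String[] args) {
        double avgRequestMs = 30.0;     // measured 20-40 ms per request, take ~30 ms
        double targetReqPerSec = 3000.0;

        double reqPerSecPerCore = 1000.0 / avgRequestMs;         // ~33 req/s per core
        double coresNeeded = targetReqPerSec / reqPerSecPerCore; // ~90 cores at perfect linear scaling

        System.out.printf("Per core: %.0f req/s, cores needed: %.0f%n", reqPerSecPerCore, coresNeeded);
    }
}

With any real-world overhead on top of the ~90 cores of perfect scaling, 100 cores is a reasonable lower bound.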
So the first thing to do is to analyze and optimize the application. Here are a few tips:
Creating a thread is expensive; try to create a limited number of threads and reuse them across requests (in Java, see ExecutorService, for example; a minimal sketch follows this list).
If your app is IO-intensive, try to reduce IO calls as much as possible, use an in-memory cache, and give non-blocking IO a try.
If your app depends on a database, consider caching, and try a distributed solution if possible.
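A minimal sketch of the thread-reuse tip, with illustrative names (the answer only points at ExecutorService; this is one simple way to use it):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RequestPool {
    // Reuse a fixed pool of worker threads instead of spawning one thread per request.
    // Size it to the core count, plus headroom if request handling blocks on IO.
    private static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public static void handle(Runnable request) {
        POOL.submit(request); // queued and executed by an existing worker thread
    }

    public static void shutdown() {
        POOL.shutdown();
    }
}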
I'm trying to speed test Jetty (to compare it with Apache) for serving dynamic content.
I'm testing this with three client threads, each requesting again as soon as a response comes back.
These are running on a local box (a MacBook Pro on OS X 10.5.8). Apache is pretty much straight out of the box (XAMPP distribution), and I've tested Jetty 7.0.2 and 7.1.6.
Apache is giving me spiky times: response times up to 2000 ms, but an average of 50 ms, and if you remove the spikes (about 2%) the average is 10 ms per call. (This was against a PHP hello world page.)
Jetty is giving me no spikes, but response times of about 200 ms.
This was calling the localhost:8080/hello/ example that is distributed with Jetty, starting Jetty with java -jar start.jar.
This seems slow to me, and I'm wondering if it's just me doing something wrong.
Any suggestions on how to get better numbers out of Jetty would be appreciated.
Thanks
Well, since I am successfully running a site with some traffic on Jetty, I was pretty surprised by your observation.
So I just tried your test, with the same result.
Then I decompiled the Hello servlet that comes with Jetty, and I had to laugh: it really includes the following line:
Thread.sleep(200L);
You can see for yourself.
My own experience with Jetty performance: I ran multi-threaded load tests on my real-world app and saw a throughput of about 1000 requests per second on my dev workstation...
Note also that your speed test is really just a latency test, which is fine as long as you know what you are measuring. Jetty does trade off latency for throughput, so there are often servers with lower latency but also lower throughput.
Realistic traffic for a web server is not 3 very busy connections: one browser will open 6 connections, so that represents half a user. More realistic traffic is many hundreds or thousands of connections, each of them mostly idle.
Have a read of my blogs on this subject:
https://webtide.com/truth-in-benchmarking/
and
https://webtide.com/lies-damned-lies-and-benchmarks-2/
You should definitely check it with a profiler. Here are instructions on how to set up remote profiling with Jetty:
http://sujitpal.sys-con.com/node/508048/mobile
Speeding up or performance tuning any application or server is really hard to get right, in my experience. You'll need to benchmark several times with different workload models to determine your peak load. Once you define the peak load for the configuration/environment mix you need to tune, you may have to run 5+ iterations of your benchmark. Check the configuration of both Apache and Jetty in terms of the number of worker threads processing requests, and get them to match if possible. Here are some recommendations:
Consider the differences between the two environments (GC in Jetty: consider setting your min and max heap sizes to the same value before running your test).
The load should come from another box. If you don't have a second box/PC/server, take your CPU cores into account and pin the test client to specific CPUs; do the same for Jetty/Apache.
This only applies if you can't get another machine to act as the stress agent.
Run several workload models.
When modeling the test, use the following two stages:
Stage 1: one thread for each configuration, for 30 minutes.
Stage 2: start with 1 thread and go up to 5, increasing the count at 10-minute intervals.
Based on the metrics from Stage 2, define a thread count for the real test and run that many threads concurrently for 1 hour.
Correlate the metrics (response times) from your testing app with resource usage on the server hosting the application (use sar, top, and other Unix commands to track CPU and memory); some other process might be impacting your app. (Memory matters mainly for Apache; Jetty is constrained by the JVM memory configuration, so its memory usage should not change much once the server is up and running.)
Be aware of the HotSpot compiler: methods have to be called many times (on the order of 1,000 or more invocations) before they are compiled into native code, so warm the server up before you start measuring.
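A minimal sketch of warming up before timing (the doRequest placeholder stands in for whatever code path is being measured):

public class WarmedBenchmark {
    static void doRequest() {
        // Placeholder for the code path you want to measure.
    }

    public static void main(String[] args) {
        // Warm-up phase: give HotSpot a chance to compile doRequest() to native code.
        for (int i = 0; i < 20_000; i++) {
            doRequest();
        }

        // Measured phase.
        int iterations = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            doRequest();
        }
        double avgMs = (System.nanoTime() - start) / 1_000_000.0 / iterations;
        System.out.printf("avg %.4f ms per call%n", avgMs);
    }
}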