Does Java max out at around 4Gbps per JVM instance?

Wowza is causing us trouble: it doesn't scale past 6k concurrent users, and sometimes freezes with only a few hundred. It crashes and starts killing sessions, and we have to step in and restart Wowza multiple times per streaming event.
Our server specs:
DL380 Gen10
2x Intel Xeon Silver 4110 / 2.1 GHz
64 GB RAM
300 GB HDD
Dedicated 10 Gb network
Some servers running CentOS 6, others CentOS 7
Java 1.8.0_20, 64-bit
Wowza Streaming Engine 4.2.0
I asked and was told:
Wowza scales easily to millions if you put a CDN in front of it (which
is trivially easy to do). 6K users off a single instance simply ain’t
happening. For one, Java maxes out at around 4Gbps per JVM instance.
So even if you have a 10G NIC on the machine, you’ll want to run
multiple instances if you want to use the full bandwidth.
And:
How many 720p streams can you do on a 10Gb network @ 2Mbps?
Without network overhead, it's about 5,000.
With the limitation of Java at 4Gbps, it's only 2,000 per instance.
Then if you do manage to utilize that 10Gb network and saturate it,
what happens to all other applications people are accessing on other
servers?
If they want more streams, they need edge servers in multiple data centers, or have to somehow get more 10Gb networks installed.
That’s for streaming only. No idea what transcoding would add in terms
of CPU load and disk IO.
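For reference, the numbers in that quote are plain bandwidth division; here is a quick sketch of the arithmetic (the 4Gbps per-JVM figure is exactly the unverified claim this question is about):

public class StreamCapacity {
    public static void main(String[] args) {
        double nicMbps = 10_000;       // dedicated 10Gb NIC
        double claimedJvmMbps = 4_000; // the disputed per-JVM ceiling
        double streamMbps = 2;         // one 720p stream at 2Mbps

        // 10 Gbps / 2 Mbps = 5,000 streams, ignoring network overhead
        System.out.println((long) (nicMbps / streamMbps));
        // 4 Gbps / 2 Mbps = 2,000 streams per instance, if the claim held
        System.out.println((long) (claimedJvmMbps / streamMbps));
    }
}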
So I began looking for an alternative to Wowza. Due to the nature of our business, we can't use a CDN or cloud hosting except with very few clients. Everything has to be hosted in-house, in the client's datacenter.
I found this article and reached out to the author to ask him about Flussonic and how it compares to Wowza. He said:
I can't speak to the 4 Gbps limit that you're seeing in Java. It's
also possible that your Wowza instance is configured incorrectly. We'd
need to look at your configuration parameters to see what's happening.
We've had great success scaling both Wowza and Flussonic media servers
by pairing them with our peer-to-peer (p2p) CDN service. That was the
whole point of the article we wrote in 2016.
Because we reduce the number of HTTP requests that the server has to
handle by up to 90% (or more), we increase the capacity of each server
by 10x - meaning each server can handle 10x the number of concurrent
viewers.
Even on the Wowza forum, some people say that Java maxes out at around 5Gbps per JVM instance. Others say that this number is incorrect or made up.
Due to the nature of our business, this number, as silly as it may be, matters a great deal to us. If Java cannot handle more than 7k viewers per instance, we need to hold meetings and discuss what to do about Wowza.
So is it true that Java maxes out at around 4Gbps or 5Gbps per JVM instance?
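(One way to sanity-check the claim before holding any meetings is to measure what a single JVM can actually push. The loopback sketch below exercises the JVM's socket path and kernel copies rather than the NIC, so it gives a rough upper bound for the JVM side alone; the buffer sizes and the 10-second window are arbitrary choices.)

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class LoopbackThroughput {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Writer thread: pushes a reused 64 KB direct buffer as fast as it can.
        Thread writer = new Thread(() -> {
            try (SocketChannel out = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
                while (true) {
                    buf.clear();
                    out.write(buf);
                }
            } catch (IOException done) {
                // the reader closed the connection; the test is over
            }
        });
        writer.setDaemon(true);
        writer.start();

        // Reader side: count the bytes received over a 10-second window.
        try (SocketChannel in = server.accept()) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            long bytes = 0;
            long deadline = System.nanoTime() + 10_000_000_000L;
            while (System.nanoTime() < deadline) {
                buf.clear();
                int n = in.read(buf);
                if (n < 0) break;
                bytes += n;
            }
            System.out.printf("~%.2f Gbit/s over loopback%n", bytes * 8 / 10.0 / 1e9);
        }
    }
}

If the printed number comes out well above 4-5 Gbit/s, whatever ceiling you are hitting lives in the streaming stack or its configuration rather than in the JVM itself.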

Related

Cluster gets into state where members restart repeatedly and clients cannot update the data in the cluster

We've been using Hazelcast for a number of years but I'm new to the group.
We have a cluster formed by a dedicated Java application (its sole purpose is to provide the cluster). It's using the 3.8.2 jars and running JDK 1.8.0_192 on Linux (CentOS 7).
The cluster manages relatively static data (i.e. a few updates a day or week), although an update may involve changing a 2MB chunk of data. We're using the default partitioning config with 271 partitions across 6 cluster members. There are between 40 and 80 clients. Each client connection should be long-lived and stable.
"Occasionally" we get into a situation where the Java app that's providing the cluster repeatedly restarts and any client that attempts to write to the cluster is unable to do so. We've had issues in the past where the cluster app runs out of memory due to limits on the JVM command line. We've previously increased these and (to the best of my knowledge) the process restarts are no longer caused by OutOfMemory exceptions.
I'm aware we're running a very old version and many people will suggest simply updating. This is work we will carry out but we're attempting to diagnose the existing issue with the system we have in front of us.
What I'm looking for here are any suggestions regarding the types of investigation to carry out and the queries to run (either periodically while the system is healthy, or while it is in this failed state).
We regularly use tools such as netstat, tcpdump, wireshark and top (I'm sure there are more) when diagnosing issues like this, but we have been unable to establish a convincing root cause.
Any help greatly appreciated.
Thanks,
Dave
As per the problem description.
Our only way to resolve the issue is to bounce the cluster completely, i.e. stop all the members and then restart the cluster.
Ideally we'd have a system that remained stable and could recover from whatever "event" causes the issue we're seeing.
This may involve config or code changes.
Updating 2MB entries has many consequences: large serialization/deserialization costs, fat packets on the network, the cost of accommodating those chunks in the JVM heap, etc. An ideal entry size is < 30-40KB.
For your immediate problem, start with GC diagnosis. You can use jstat to investigate memory-usage patterns (for example, jstat -gcutil <pid> 1000 prints generation occupancy and GC counts every second). If you are running into a lot of full GCs and/or back-to-back full GCs, you will need to adjust your heap settings. Also check the network bandwidth, which is usually the prime suspect when fat packets are traveling through the network.
All of the above are just band-aid solutions; you should really look to break your entries down into smaller ones, as sketched below.
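A minimal sketch of that chunking against the Hazelcast 3.x IMap API; the map name, key scheme, and chunk size are illustrative, and note that a chunked update is no longer a single atomic put:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ChunkedEntries {
    static final int CHUNK_SIZE = 32 * 1024; // stay in the ~30-40KB sweet spot

    // Split one large value into CHUNK_SIZE pieces stored under derived keys.
    static void putChunked(IMap<String, byte[]> map, String key, byte[] value) {
        int chunks = (value.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (int i = 0; i < chunks; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, value.length);
            map.put(key + "#" + i, Arrays.copyOfRange(value, from, to));
        }
        // Record the chunk count so readers know how many pieces to fetch.
        map.put(key + "#meta", ByteBuffer.allocate(4).putInt(chunks).array());
    }

    // Reassemble the original value from its chunks.
    static byte[] getChunked(IMap<String, byte[]> map, String key) {
        int chunks = ByteBuffer.wrap(map.get(key + "#meta")).getInt();
        ByteBuffer out = ByteBuffer.allocate(chunks * CHUNK_SIZE);
        for (int i = 0; i < chunks; i++) {
            out.put(map.get(key + "#" + i));
        }
        out.flip();
        return Arrays.copyOf(out.array(), out.limit());
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, byte[]> map = hz.getMap("reference-data");
        putChunked(map, "datasetA", new byte[2 * 1024 * 1024]); // a 2MB value
        System.out.println(getChunked(map, "datasetA").length); // 2097152
        hz.shutdown();
    }
}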

How to estimate the authentication time

I designed an authentication protocol in Java. The average execution time for a single authentication on my desktop computer is 2.87 ms. My computer has the following specification: Windows 10 with a 1.99 GHz Intel Core i7 and 8GB of RAM.
If a number of users, say 10, perform the authentication simultaneously, what is the total computation time? Can I just say (2.87*10)?
A typical Core i7 CPU has between 4 and 8 cores and, with two-way hyper-threading, can execute 8..16 hardware threads in parallel (number of cores times two).
Assuming
you have an i7-8650U (1.9GHz, 4 cores, hyper-threaded),
your Java server is multithreaded (which should be the case if you use any popular implementation like Tomcat, Jetty, or the like), and
there are no other CPU-intensive workloads running,
you can say your server can handle 8 authentication requests simultaneously: 4 at full speed plus 4 more at ~30% speed thanks to hyper-threading, i.e. 2.87ms*1.3 ≈ 3.73ms for every batch of 8 users.
10 users will thus take around 3.73ms + 2.87ms ≈ 6.6ms.
What's important when measuring Java performance, however, is to measure steady state under load, in order to take garbage collection overhead into account. When measuring a single request, you may often miss the garbage collection step entirely.
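A rough sketch of such a steady-state measurement with a plain thread pool; authenticate() is a stand-in for the real routine, and a proper harness such as JMH would give more trustworthy numbers:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AuthLoadTest {
    // Stand-in for the real ~2.87 ms authentication routine.
    static void authenticate() {
        // ... cryptographic work goes here ...
    }

    public static void main(String[] args) throws InterruptedException {
        // Warm up so the JIT has compiled the hot path before we measure.
        for (int i = 0; i < 1_000; i++) authenticate();

        int concurrentUsers = 10;
        int requests = 10_000;
        ExecutorService pool = Executors.newFixedThreadPool(concurrentUsers);
        CountDownLatch done = new CountDownLatch(requests);

        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                authenticate();
                done.countDown();
            });
        }
        done.await();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.0f authentications/second at concurrency %d%n",
                requests / seconds, concurrentUsers);
        pool.shutdown();
    }
}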
You can't say that the time will be multiplied by the number of users.
First of all, because it would mean that your authentication mechanism runs on a single thread, which would be terrible.
Also, the time measured on your computer will certainly differ in a production or staging environment. You will have to add network hops; on the other hand, those servers will be doing only this work. Either way, you cannot compare that time to the one measured on your personal computer.
If you want to test this kind of thing, use performance/load-testing tools. I can recommend JMeter, an open-source tool from the Apache Foundation, or Gatling, another open-source tool.
They are designed for exactly this use case. With them you could call your authentication API with, for example, 100 users over 10 seconds, and the reports produced at the end would give you your answer.

Apache Solr handling hundreds of thousands of requests

We have a small search app in a local context. For the back end, we are using Apache Solr 6.6.2 for data indexing and storage. The front end is PHP on an Apache2 web server.
We have a server with 48 cores and 96 GB RAM where these services are installed. The expected index size is about 200 million documents, and each document can have at most 20 fields. Most fields are both indexed and stored.
The expected simultaneous requests can be in the hundreds of thousands at a time. So what would be the best configuration of Apache Solr to handle this? We started Solr with 20 GB RAM and stress tested it, but performance starts to degrade near 100 users. Where is the problem? What is the optimal approach here?
We have also tested Solr in SolrCloud mode, but the performance did not improve much. We were expecting that if there were a memory problem there would be an OOM exception, but nothing like that happened. We have just changed the schema according to our requirements and changed the memory via the command line. All other settings are default.
The following are a few references that we have consulted already:
https://wiki.apache.org/solr/SolrPerformanceProblems
https://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
We have 200 million records in each collection and we have 200 collections. We have 5 servers, each with 8 cores and 64 GB RAM.
I would suggest you split your deployment across multiple servers.
Replicate the data on each server so that requests get divided among them; the more servers, the quicker you'll be able to respond.
Note: just understand the replication factor. By the 2F+1 formula, tolerating F failed servers requires 2F+1 replicas, so with 5 servers you should have at least 3 replicas. I'd suggest you go with 5 replicas (1 replica on each server).
If you plan to handle hundreds of thousands of requests per second, you will need more than one server, no matter how big it is, even if it's just for HA/DR purposes. So I would recommend using SolrCloud and sharding the index across multiple machines with multiple replicas, just to start.
Beyond that, the devil is in the details:
How fast do you want queries to perform (median and 99th percentile)? This will help you size CPU and memory needs.
How complex are your queries?
Are you using filters? (These require more heap memory.)
How fast is your disk access?
Will you be adding data in real time (impacting your autoCommit and soft-commit settings)? See the config sketch below.
But first and foremost you need to get away from "one big box" thinking.
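For the commit settings mentioned in the list above, a typical solrconfig.xml starting point looks like the following; the intervals are workload-dependent, so treat the values as illustrative:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to stable storage, but don't open a new searcher. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make newly added documents visible to searches every second. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>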

Increase Solr search concurrency

Short story: I am not able to run more than 2 simultaneous searches on Solr 5 (same story with 4.10) from the same client process. Is there a flag in a configuration file that I missed? It's a proven fact that it's not a hardware problem or a software (client) problem. See below for the full story.
Long story:
I need to build a word-based search engine (fields generally contain only one word/value; even for a multi-value field, each value will be a single word), and 60-70% of the searches are without wildcards. The expected core size is around 50K documents with an average of 20 fields. The collection is expected to be updated around once per week (probably even less), so I don't really care about indexing time. I guess we can safely assume there will be no writes, just reads, so we can minimize the probability of locks and other concurrency issues. Also, the most "expensive" query in my test takes (per Solr's qtime) around 150ms. I have a batch of 10K randomly generated searches and, no matter what I do, I am not able to finish them in less than 5 minutes. No matter how many threads I open on the client side, no matter what values I set in the configuration files... the processor stays at 30-40% tops, with only 30% memory used.
What I have tried:
solr5 + jetty on a single-core virtual machine with 3GB RAM;
solr5 + jetty on a dual-core virtual machine with 6GB RAM (4GB for java);
solr5 + tomcat6 on a dual-core virtual machine with 6GB RAM;
Using netstat -a -n | grep #port, for #1 and #2 I only saw 2 active connections (ESTABLISHED) at any given time, but no more; for #3 I had, besides those 2 active connections, another 10-15 in TIME_WAIT (not active).
I am somewhat lost here... I am not a Java ninja and I am not savvy with Java-related products and their configuration. I used 2 different servlet containers with pretty much the same problem. IMO, it's obvious that something is throttling the active connections, and I don't know how to find out what and why.
As a side note (I am not sure if it is important or not): I copied the same tool to another machine, started the "stress" test at the same time as the one on my machine, and noticed that the number of active connections doubled (via netstat), resource usage was only a little higher than in the single-machine test, and the execution time was identical on both machines: 5 minutes.
So, what should I do to remove this limit - or at least to increase it?
As usual, the problem lies between the chair and the keyboard. :(
The client was written in C# using the plain old WebRequest class, which obeys the system limit on concurrent HTTP calls made to the same address (to avoid DoS).
After reading this article, I realized where the problem was. The following tweak in app.config solved the issue:
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="300" />
  </connectionManagement>
</system.net>
It finished all those requests in around one minute with 16 open threads. The active connections were also visible in netstat. (The same limit can also be raised programmatically via ServicePointManager.DefaultConnectionLimit.)

Can I use JVM to implementing high loaded TCP/IP single-machine server? What about GC settings? [closed]

I'm trying to implement a highly loaded single-machine TCP/IP server.
I have several constraints:
The server must support up to 8 connections.
Each connection must receive up to 25 megabytes per second. I think the average aggregate speed of all connections will be about 100 megabytes per second 90% of the time, but the system must work stably in the worst case too.
I need to read the network input and split it into messages (my classes). Messages vary from hundreds of bytes to 10-15 megabytes. Messages are very simple: several fields and, in the case of big messages, an array of bytes.
I need to record (write to a very big plain file) the data from each connection. It's guaranteed that the disk system can write at such speed; I'll use several enterprise-class SSDs.
There is a process which processes the data after it is written to disk; after some time the data will be deleted. That is, it needs about 50% of the processor resources.
The server must work 24*7*365.
The server must run on commodity-class hardware (8GB RAM, i7 (i5 preferable) processor). My project has restrictions on hardware size (a very small box, low heat and low power consumption) and price. Unfortunately, I can't change this, whatever you propose.
Is there anyone here who has implemented highly loaded systems on the JVM with commodity hardware? As I understand it, the operating system buffers network input, so GC delays don't matter in such a situation, do they? What can I read about the JVM (GC)?
Maybe it's not a high-load system; you are welcome to discuss.
Language / Runtime is the least of your worries:
You aren't going to get 200MB a second over a single network interface, not even a 1Gb one. You will need at least two 1Gb network interfaces, bonded on both ends, to push anything near that speed through commodity hardware. Even 100MB per second isn't feasible on a 1Gb interface: that is approximately 0.8Gb a second, well over the real-world sustained rate of about 0.6Gb/s that I see on a single connection even with jumbo frames enabled. And that leaves no headroom for bursts.
This is I/O-bound at the hardware level; software is the least of your worries. And these interfaces need to support jumbo frames on both ends and on all the switches, routers and other hardware in between.
Ethernet Maximum Rates, Generation, Capturing & Monitoring
Software:
What you propose can be written in any reasonably performant high-level language. Java, Python, Ruby, Erlang or .NET would all be capable. Your hardware constraints are what you are not going to be able to overcome.
Hardware:
I would say you are going to be hard-pressed to get that kind of throughput on a single commodity machine. Regardless of the class of SSD, a dedicated RAID controller is most likely the only way you will get the I/O you want.
Server must support up to 8 connections.
I can't imagine how this would be a problem. You can handle thousands, if not tens of thousands, of connections with Java.
Each connection must receive up to 25 megabytes per second. I think the average aggregate speed of all connections will be about 100 megabytes per second 90% of the time, but the system must work stably in the worst case too.
This means you need at least a 1 Gb/s connection. Java can handle this rate with one thread.
Note: 100 MB/s is about 8.6 TB/day. You will have to find a way to minimise disk usage if you plan to use SSDs.
The server must run on commodity-class hardware (8GB RAM, i7 (i5 preferable) processor). My project has restrictions on hardware size (a very small box, low heat and low power consumption) and price. Unfortunately, I can't change this, whatever you propose.
I have no idea why you would buy multiple enterprise SSDs and yet only 8 GB of RAM (I assume you mean gigabytes, not Gb = gigabits). My 8-year-old son had 8 GB in his games PC 18 months ago. I'm not sure why you would buy an i5 for a performance system either; I would get an i7, if not a Xeon. Not because it can't be done with an i5, but because you will spend far more in development time than you will save on hardware.
The rest seems entirely achievable.
Is there anyone here who has implemented highly loaded systems on the JVM with commodity hardware?
Many have; it depends on what you are trying to do.
As I understand it, the operating system buffers network input, so GC delays don't matter in such a situation, do they?
Network buffers don't make any difference to GC. They can help if you have very bursty behaviour, but if you only need 1 Gb/s I wouldn't worry about it so much.
The most important thing to do is to manage your allocation rate by reducing usage. A memory profiler can help you do that.
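As a sketch of what a low-allocation receive loop can look like: one direct buffer is allocated up front and reused for every read, so the steady-state loop creates no garbage for the GC to collect (the port and the framing logic are placeholders):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ReusedBufferReceiver {
    public static void main(String[] args) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000)); // placeholder port
        try (SocketChannel conn = server.accept()) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB, reused
            while (true) {
                buf.clear();
                int n = conn.read(buf);
                if (n < 0) break; // peer closed the connection
                buf.flip();
                handle(buf);      // parse message frames in place, no copies
            }
        }
    }

    // Placeholder: a real implementation would decode the message framing here.
    static void handle(ByteBuffer buf) {
        buf.position(buf.limit()); // pretend everything was consumed
    }
}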
What can I read about the JVM (GC)?
I would spend your time reducing the need for GC, especially since you have so little memory. Most servers these days have well over 32 GB, e.g. 256 GB to 4 TB. It sounds like your main problem will be relatively high data rates with a small amount of memory. This is fine if the work you do on each message is simple, but without more information, I would start with at least 64 GB.
