Java chat application bandwidth usage?

I've tried to look around for data concerning how much of a bandwidth hog a chat application is.
In this case it would probably be a Java/AJAX implementation, or simply just Java, using a client/server relationship.
I want to find out how much bandwidth such a system would use when it's written in Java. The benchmark could be 15-20 users from all over the world, peaking at maybe 8 or 10 connected at a time. I know it might seem vague, but I simply can't seem to find data on this specific situation.
Can anyone point me to some resources regarding this? Or chip in if possible?

Unless the chat application is sending photos or files, it will use a trivial amount of data. With a maximum of ten users at once, you could wrap the messages in a bandwidth hog of XML and I would still stick with my answer: it will use a trivial amount of bandwidth.
Say all ten of your users are fast typists and very chatty. They type non-stop at 100 words per minute. Break that down into 10 sentences of roughly 10 words each per minute, and wrap each of these in a message to the server. Add some XML data describing who the message came from and whether it is private to another user or sent to a group, and maybe you get to 1K per message. So each user sends 1K to the server every 6 seconds, and with 10 users the server receives 10K every 6 seconds.
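Putting that back-of-the-envelope arithmetic into code (the message size and rates are just the assumptions above, not measurements):

    // Back-of-the-envelope bandwidth estimate for the chat scenario above.
    // Assumptions (from the answer, not measurements): 10 users, ~1 KB per
    // message, one message per user every 6 seconds.
    public class ChatBandwidthEstimate {
        public static void main(String[] args) {
            int users = 10;
            int bytesPerMessage = 1024;               // generous XML-wrapped message
            double messagesPerUserPerSecond = 1.0 / 6.0;

            double bytesPerSecond = users * bytesPerMessage * messagesPerUserPerSecond;
            double kilobitsPerSecond = bytesPerSecond * 8 / 1000;

            System.out.printf("~%.0f bytes/s, ~%.1f kbit/s to the server%n",
                    bytesPerSecond, kilobitsPerSecond);
            // Prints roughly 1707 bytes/s, ~13.7 kbit/s - well within a 56K modem.
        }
    }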
So by my estimate, we could connect your server to a 56K modem from 1995 and you'll be fine.

The reason you can't find data about this is that there's nothing particularly Java- or AJAX-related here. Bandwidth usage depends on the data you send and receive over the network, and is therefore determined by the protocol you design to pass data around; it has nothing to do with whether you use Java only, AJAX in combination with Java, CGI scripts, PL/I or Assembler.
You can code a chat application in Assembler that will be a worse bandwidth hog than a chat application coded in Java.
In order to know your bandwidth impact, you need to analyze your data model, data flow and your overall communication protocol: namely, what data is being sent, in what structure, and how frequently.
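As a rough illustration that the wire format, not the language, drives bandwidth, here is a small sketch comparing the same chat message in a verbose XML envelope and in a minimal delimited form; the field names and message text are made up for the example:

    import java.nio.charset.StandardCharsets;

    // Same chat message, two hypothetical wire formats. The bandwidth cost is a
    // property of the protocol you design, not of the language you write it in.
    public class WireFormatComparison {
        public static void main(String[] args) {
            String from = "alice";
            String to = "lobby";
            String text = "See you at the raid tonight?";

            String xml = "<message><from>" + from + "</from><to>" + to
                    + "</to><body>" + text + "</body></message>";
            String compact = from + "|" + to + "|" + text;

            System.out.println("XML envelope : " + xml.getBytes(StandardCharsets.UTF_8).length + " bytes");
            System.out.println("Compact form : " + compact.getBytes(StandardCharsets.UTF_8).length + " bytes");
        }
    }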

Related

Need advice: Is saving messages from a JMS queue to Hadoop HBase a good solution?

I'm new to the Hadoop world and I've been tasked to research solutions to ingest data from our current JMS Queues into our Hadoop cluster.
So far on my quest to becoming a data ingestion expert... I've scoured the web, going through books and tutorials for a couple of weeks now. I've managed to write a simple Java service which listens to one of our queues and simply writes the incoming messages to an HBase HTable.
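For reference, a minimal sketch of what such a listener might look like; this is an assumption about the poster's service (using the newer HBase Connection/Table API rather than HTable), with the table name, column family and row-key scheme as placeholders and error handling left out:

    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Minimal JMS-to-HBase listener: every incoming text message is stored as-is
    // in one cell, keyed by arrival timestamp plus the JMS message id.
    public class QueueToHBaseListener implements MessageListener {

        private final Connection hbase;

        public QueueToHBaseListener() throws Exception {
            Configuration conf = HBaseConfiguration.create();
            this.hbase = ConnectionFactory.createConnection(conf);
        }

        @Override
        public void onMessage(Message message) {
            try (Table table = hbase.getTable(TableName.valueOf("raw_messages"))) {
                String body = ((TextMessage) message).getText();
                String rowKey = System.currentTimeMillis() + "-" + message.getJMSMessageID();

                Put put = new Put(Bytes.toBytes(rowKey));
                put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("json"), Bytes.toBytes(body));
                table.put(put);
            } catch (Exception e) {
                throw new RuntimeException("Failed to persist JMS message", e);
            }
        }
    }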
After completing this proof of concept I have a couple of questions I would like to ask the community of Hadoop/HBase/data ingestion experts. Before I ask, let me describe a little bit of my scenario and scope.
We receive approximately 30,000 messages per day from our JMS Queue
These messages are JSON objects which can range anywhere from 1 MB to 20 MB each
Needs to be near real time
We would like to continuously save these messages into Hadoop for future analytics and historical reference for years to come
We don't need to parse the incoming messages, just store them. (The current line of thinking is to write another service which will parse these messages and save them into a proper schema later; the reason is to avoid bottlenecking during message ingestion.)
My "proof of concept" Java service works, but I don't know whether this solution is the best for my scenario, especially in a production environment.
Is this a good approach/solution for my case scenario?
If not, what other technologies would be a good fit for what I'm trying to do?
Is using HBase for this overkill?
Is saving up to 20 MB in a single cell a good idea, especially if we plan to continuously append messages to this table with no purging?
Appreciate any input, thanks!
Is this a good approach/solution for my case scenario?
If not, what other technologies would be a good fit for what I'm trying to do?
Flume could be another option; it provides a JMS source and HBase/Hive sinks.
Is using HBase for this overkill?
It may not be, if the number of messages grows large over time given your daily input volume.
What is the purpose of storing the messages in HBase if you plan to parse them out again into another store?
Depending on your needs, another service (a MapReduce job, for example) could consume the JMS messages, process them and output to whatever final destination you have in mind instead of HBase, unless you need to keep the original messages for the long term.

Measuring bandwidth/speed on several Java sockets?

I have coded a server in Java that will have several clients connected to it. I want to be able to see how much data is sent to each client, so that I can make decisions such as allowing more clients or reducing their number, or increasing/decreasing the frequency at which data is sent.
How can I do that?
I'm currently using Java's Socket API, but if another library gives me this more easily, then a change can be made. The server will run on a Linux flavor, likely Ubuntu, so an OS-specific answer is welcome too.
When you write data to the socket, you need to remember how much you sent. There really isn't a smarter way to do this.
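One simple way to do that bookkeeping is to wrap each client's output stream in a counting wrapper; a rough sketch:

    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.concurrent.atomic.AtomicLong;

    // Wraps a socket's output stream and counts every byte written to it.
    // Wrap once per client: new CountingOutputStream(socket.getOutputStream()).
    public class CountingOutputStream extends FilterOutputStream {

        private final AtomicLong bytesWritten = new AtomicLong();

        public CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            bytesWritten.incrementAndGet();
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);      // delegate directly to avoid double counting
            bytesWritten.addAndGet(len);
        }

        public long getBytesWritten() {
            return bytesWritten.get();
        }
    }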
Generally speaking, you would allow the server to have a limited number of connections. Trying to tune the system based on bandwidth restrictions is very hard to get right.
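If you go the connection-limit route, a Semaphore in front of the accept loop is enough; a minimal sketch (the limit of 50 and the port are arbitrary placeholders):

    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.Semaphore;

    // Caps the number of simultaneously connected clients instead of trying to
    // tune by bandwidth. Each connection releases its permit when it closes.
    public class LimitedServer {
        public static void main(String[] args) throws Exception {
            final Semaphore slots = new Semaphore(50);   // arbitrary cap
            try (ServerSocket server = new ServerSocket(9000)) {
                while (true) {
                    slots.acquire();                     // wait for a free slot
                    final Socket client = server.accept();
                    new Thread(() -> {
                        try (Socket c = client) {
                            // ... handle the client ...
                        } catch (Exception ignored) {
                        } finally {
                            slots.release();
                        }
                    }).start();
                }
            }
        }
    }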

Java/Scala resource consumption and load

I am developing a web application in Scala. It's a simple application that takes data on a port from clients (JSON or Protobufs), does some computation using a database server, and then replies to the client with a JSON/Protobuf object.
It's not a very heavy application, 1000 lines of code max. It creates a thread on every client request. The time it currently takes between getting a request and replying is between 20 and 40 ms.
I need advice on what kind of hardware/setup I should use to serve 3000+ such requests per second. I need to procure hardware to put in my data center.
Anybody who has experience deploying Java apps at scale, please advise: should I use one big box with 2-4 Xeon 5500s and 32 GB of RAM, or multiple smaller machines?
UPDATE: we don't have many clients, only 3-4 of them; all requests will come from these.
If each request takes 30 ms on average and keeps a core busy for that time, a single core can handle only about 30 requests per second (1000 ms / 30 ms ≈ 33). Supposing that your app scales linearly (the best scenario you can expect), you will need at least 100 cores to reach 3000 req/s, which is far more than 2-4 Xeons provide.
Worse, if your app relies on IO or on a DB (like most useful applications), you will get sublinear scaling and may need a lot more...
So the first thing to do is to analyze and optimize the application. Here are a few tips:
Creating a thread is expensive; create a limited number of threads and reuse them across requests (in Java, see ExecutorService for example; there is a sketch after this list).
If your app is IO-intensive, try to reduce IO calls as much as possible, use an in-memory cache, and give non-blocking IO a try.
If your app depends on a database, consider caching and try a distributed solution if possible.
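For the first point, a minimal sketch of a fixed thread pool serving socket requests instead of spawning a thread per request (the pool size, port and handler are placeholders):

    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Reuse a fixed pool of worker threads instead of creating one per request.
    public class PooledServer {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors() * 2); // rough sizing
            try (ServerSocket server = new ServerSocket(8080)) {
                while (true) {
                    Socket client = server.accept();
                    pool.submit(() -> handle(client));   // reuse an existing thread
                }
            }
        }

        static void handle(Socket client) {
            try (Socket c = client) {
                // ... read the request, query the DB, write the reply ...
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }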

BlazeDS Polling Interval set to 0: Unwanted side-effects?

tl;dr: Setting the polling-interval to 0 has given my performance a huge boost, but I am worried about possible problems down the line.
In my application, I am doing a fair amount of publishing from our Java server to our Flex client, publishing on a variety of topics and sub-topics.
Recently, we have been on a round of performance improvements system-wide, and the messaging layer was proving to be a big bottleneck.
A few minutes ago I discovered that setting the <polling-interval-millis> property in our services-config.xml to 0 causes published messages, even when there are lots of them, to be picked up by the client almost instantly, instead of with the 3-second delay that is the default polling interval. This has obviously had a tremendous impact.
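For reference, the change discussed here lives in a channel definition along these lines (a sketch only; the ids, endpoint URL and surrounding configuration will differ in your own services-config.xml):

    <channel-definition id="my-polling-amf" class="mx.messaging.channels.AMFChannel">
        <endpoint url="http://{server.name}:{server.port}/{context.root}/messagebroker/amfpolling"
                  class="flex.messaging.endpoints.AMFEndpoint"/>
        <properties>
            <polling-enabled>true</polling-enabled>
            <!-- 0 = clients poll again as soon as the previous poll returns -->
            <polling-interval-millis>0</polling-interval-millis>
        </properties>
    </channel-definition>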
So I'm pretty happy with the current performance; the only thing is, I'm a bit nervous about unintended side-effects of this change. In particular, I am worried about our Flash client slowing way down, and about way too much unwanted traffic.
My preliminary testing has not borne out this fear, but before I commit the change to our repository, I was hoping that somebody with experience with this stuff would chime in.
Unfortunately your question is too general for a specific answer. I'll write some ideas below; maybe they are helpful.
Decreasing the value from 3 seconds to 0 means that you receive new data much faster. If your Flex client uses this data to make complex computations, it is possible to slow the client down or to show obsolete data (it is a known pattern, see http://help.adobe.com/en_US/LiveCycleDataServicesES/3.1/Developing/WS3a1a89e415cd1e5d1a8a18fb122bdc0aad5-8000Update.html ). You need to understand how the data is processed and probably do some client-side benchmarking.
The server will also have to handle more requests, so it would be good to identify the maximum number of requests per second it can handle. For that, use a tool like JMeter to detect the maximum capacity of your system; after that you can do some calculations to figure out how many requests per second you will have once you reduce the interval from 3 to 0, taking into account, for example, that the number of clients is growing by 10% per month.
The main idea is that you should do some performance testing against your API and save the scripts, so you can see whether future modifications slow the system down too much. Without this, it is quite hard to guess whether changing configuration parameters is OK.
You might want to try out long polling. For our WebLogic servers, we don't get any problems unless we let the poll request go to 5 minutes, so we keep it to 4, then give it a 1-second rest before starting again. We have a couple of hundred total users, with 60-70 on it hard core all day. The thing to keep in mind is that you're basically turning intermittent user requests into what amounts to almost always-connected telnet sessions. Depending on the browser your users are using, there can be implications from that as well, but overall we've been very pleased.

Measuring latency

I'm working on a multiplayer project in Java and I am trying to refine how I gather my latency measurement results.
My current setup is to send a batch of UDP packets at regular intervals that get timestamped by the server and returned; latency is then calculated and recorded. I take a number of samples and work out the average to get the latency.
Does this seem like a reasonable solution to work out the latency on the client side?
I would have the client timestamp the outgoing packet, and have the response preserve the original timestamp. This way you can compute the roundtrip latency while side-stepping any issues caused by the server and client clocks not being exactly synchronized.
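A rough client-side sketch of that approach using a DatagramSocket (the host, port and packet layout are made up for the example):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.ByteBuffer;

    // Client-side round-trip measurement: put a nanoTime timestamp in the ping
    // packet; the server echoes the packet back unchanged, so only the client's
    // clock is ever used and clock synchronization does not matter.
    public class PingClient {
        public static void main(String[] args) throws Exception {
            try (DatagramSocket socket = new DatagramSocket()) {
                InetAddress server = InetAddress.getByName("game.example.com");

                byte[] out = ByteBuffer.allocate(Long.BYTES)
                        .putLong(System.nanoTime()).array();
                socket.send(new DatagramPacket(out, out.length, server, 7777));

                byte[] in = new byte[Long.BYTES];
                DatagramPacket reply = new DatagramPacket(in, in.length);
                socket.receive(reply);                   // server echoes the payload

                long sentAt = ByteBuffer.wrap(reply.getData()).getLong();
                long rttMillis = (System.nanoTime() - sentAt) / 1_000_000;
                System.out.println("round-trip time: " + rttMillis + " ms");
            }
        }
    }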
You could also timestamp the packets used in your game protocol, so you have more data to feed into your statistics. (This method also avoids the overhead of an additional burst of data: you simply use the data you are already exchanging to do your stats.)
You could also start to use other metrics (for example variance) in order to make a more accurate estimation of your connection quality.
If you haven't really started your project yet, consider using a networking framework like KryoNet, which has RMI and efficient serialisation and which will automatically send ping requests using UDP. You can get the ping time values easily.
If you are measuring round-trip latency, factors like clock drift and the precision of the hardware clock and OS API will affect your measurement. Without spending money on hardware, the closest you can get is using RDTSC instructions. But RDTSC doesn't come without its own problems; you have to be careful how you call it.
