Measuring latency - Java

I'm working on a multiplayer project in Java and I am trying to refine how I gather my latency measurement results.
My current setup is to send a batch of UDP packets at regular intervals; they get timestamped by the server and returned, and then latency is calculated and recorded. I take a number of samples and work out the average to get the latency.
Does this seem like a reasonable solution to work out the latency on the client side?

I would have the client timestamp the outgoing packet, and have the response preserve the original timestamp. This way you can compute the round-trip latency while side-stepping any issues caused by the server and client clocks not being exactly synchronized.
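A minimal sketch of that idea over a plain DatagramSocket (the port number and the echo behaviour of the server are assumptions):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.ByteBuffer;

    public class PingClient {
        public static void main(String[] args) throws Exception {
            try (DatagramSocket socket = new DatagramSocket()) {
                InetAddress server = InetAddress.getByName("localhost"); // assumed echo server
                // Stamp the packet with the client's own clock.
                byte[] out = ByteBuffer.allocate(Long.BYTES).putLong(System.nanoTime()).array();
                socket.send(new DatagramPacket(out, out.length, server, 9876)); // port is an assumption

                // The server is assumed to echo the payload back unchanged.
                byte[] in = new byte[Long.BYTES];
                DatagramPacket response = new DatagramPacket(in, in.length);
                socket.receive(response);

                long sentAt = ByteBuffer.wrap(response.getData()).getLong();
                System.out.println("round-trip: " + (System.nanoTime() - sentAt) / 1_000_000 + " ms");
            }
        }
    }

Since only the client's clock is ever read, any offset between the two machines' clocks cancels out of the measurement.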

You could also timestamp the packets already used in your game protocol, so you have more data to feed into your statistics. (This method also avoids the overhead of an additional burst of traffic: you simply piggyback your stats on the data you are already exchanging.)
You could also start to use other metrics (for example, variance) in order to make a more accurate estimate of your connection quality.
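For instance, a small helper for the mean and variance of the collected samples (a sketch; the samples are assumed to be round-trip times in milliseconds):

    public final class LatencyStats {
        // Returns {mean, variance} of the samples.
        static double[] meanAndVariance(long[] samplesMs) {
            double mean = 0;
            for (long s : samplesMs) mean += s;
            mean /= samplesMs.length;

            double variance = 0;
            for (long s : samplesMs) variance += (s - mean) * (s - mean);
            variance /= samplesMs.length; // population variance; divide by n-1 for a sample estimate
            return new double[] { mean, variance };
        }
    }

A high variance at a low mean points to jitter rather than a consistently slow link, which is worth distinguishing for a multiplayer game.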

If you haven't really started your project yet, consider using a networking framework like KryoNet, which offers RMI and efficient serialisation and will automatically send ping requests using UDP. You can read the ping time values easily.

If you are measuring round-trip latency, factors like clock drift and the precision of the hardware clock and OS APIs will affect your measurement. Without spending money on hardware, the closest you can get is the RDTSC instruction. But RDTSC doesn't come without its own problems; you have to be careful how you call it.
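In Java you can't issue RDTSC yourself anyway; the closest portable tool is System.nanoTime(), a monotonic timer that the JVM typically backs with the TSC where it is reliable. A sketch:

    public class Timing {
        public static void main(String[] args) {
            long t0 = System.nanoTime();        // monotonic: immune to wall-clock adjustments
            double acc = 0;                     // some work to measure
            for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
            long elapsed = System.nanoTime() - t0;
            System.out.println(elapsed / 1_000_000 + " ms (" + acc + ")");
        }
    }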

Related

Java Chat application bandwidth usage?

I've tried to look around for data concerning how much of a bandwidth hog a chat application is.
In this case, maybe with a Java/AJAX implementation, or simply just Java, using a client/server relationship.
I want to find out how much bandwidth such a system would use when it's written in Java. The benchmark could be 15-20 users from all over the world, peaking at maybe 8 or 10 connected at a time. I know it might seem vague, but I simply can't seem to find data on this specific situation.
Can anyone point me to some resources regarding this? Or chip in if possible?
Unless the chat application is sending photos or files, it will use a trivial amount of data. With a max of ten users connected at once, you could wrap the messages in a bandwidth hog of XML and I would still stick with my answer: it will use a trivial amount of bandwidth.
Say all ten of your users are fast typists and very chatty. They type non-stop at 100 words per minute. Break that down to 10 sentences per minute and wrap each of these in a message to the server. Add some XML data describing who the message came from and whether it is private to another user or sent to a group of users, and maybe you could get 1K per message. So each user is then sending 1K to the server every 6 seconds. With 10 users, we get 10K sent to the server every 6 seconds.
So by my estimate, we could connect your server to a 56K modem from 1995 and you'll be fine.
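Running the numbers from above: 10K every 6 seconds is roughly 1.7 KB/s, or about 13 kbit/s of inbound traffic, which sits comfortably within a 56 kbit/s modem's capacity.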
The reason you can't find data about this is because there's nothing particularly Java- or AJAX-related here. Bandwidth usage depends on the data you send/receive over the network, and is therefore dependent upon the protocol that you design to pass data around; it has nothing to do with whether you use Java only, or AJAX in combination with Java, or CGI scripts, PL/I or Assembler.
You can code a chat application in Assembler that will be a worse bandwidth hog than a chat application coded in Java.
In order to know your bandwidth impact, you need to analyze your data model, data flow and your overall communication protocol: namely, what data is being sent, in what structure, and how frequently.

Time synchronization

I am creating a web application in Java in which I need to run a reverse timer in the client browser. I have planned to send the remaining time from the server to the client and then tick the timer using JavaScript.
My questions are:
1. Does the clock tick rate vary with different systems?
2. Is there any better way to do this?
Does the clock tick rate vary with different systems?
Yes; it's the result of really, really small differences in the frequencies of the quartz crystals used in chipsets. So if you do not synchronize your clocks now and then, they will diverge.
However, if you're not designing a satellite, a remote control for ballistic missiles, or life-support devices, you really should not care.
Is there any better way to do this?
Yes. If:
your reverse clock counts down from a year, or at least a month, or
you are running your client on a device with a broken or really inaccurate clock,
then you may use the NTP protocol to make sure the client and the server clocks are synchronized. There are NTP libraries available for both JavaScript and Java.
@npe's solution with NTP will do, but is theoretically incorrect:
Even if the clocks are perfectly synced, you will send the client the remaining time. But that message needs to travel over the net, so by the time the client receives it, it won't be correct anymore.
A better approach would be to send the end time to the client. That is an absolute value, hence not affected by network lag, and the client can do the countdown by calculating the remaining time locally.
That said, the other answers about NTP are of course still necessary.
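A minimal sketch of the absolute-end-time approach in Java (the 90-second countdown is an arbitrary example, and it assumes the clocks are already NTP-synced as discussed above):

    public class CountdownExample {
        public static void main(String[] args) throws InterruptedException {
            // Server side: ship an absolute end time rather than a remaining time.
            long endTimeMillis = System.currentTimeMillis() + 90_000;

            // Client side: recompute the remaining time locally on every tick,
            // so network lag in delivering endTimeMillis cannot make it stale.
            long remaining;
            while ((remaining = endTimeMillis - System.currentTimeMillis()) > 0) {
                System.out.println(remaining / 1000 + " seconds left");
                Thread.sleep(1000);
            }
            System.out.println("done");
        }
    }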

How to use an NTP server when scheduling tasks in a Java app?

I know NTP servers can be used to synchronize your computer's system clock. But can NTP be used by an application that wants to schedule things in sync with other systems?
Scenario: You are developing a Java app (perhaps to run in an ESB like Mule) and you won't necessarily be able to control the time of the machine on which it will run. Can your app use an NTP server to obtain the time and schedule tasks to run based on that time?
Let's say you're using Quartz as the scheduler and perhaps joda-time for handling times (if that's useful). The time doesn't have to be super precise; I just want to make sure it's not ahead of the remote systems by much.
If you're not super worried about drift, and assuming that the machines aren't just randomly changing time, then you could ping an NTP server to get what time it thinks it is, compare that to the time your local machine thinks it is, calculate the differential, and finally schedule your task in local time.
So, for example, say that the NTP server says that it's 12:30, but your local machine says that it is 12:25. And you want your task to go off at 13:00 NTP time.
So, 12:25 - 12:30 = -0:05. 13:00 + (-0:05) = 12:55, therefore you schedule your task for 12:55.
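A sketch of that calculation using the NTP client from Apache Commons Net (mentioned in the addenda below); pool.ntp.org and the 30-minute target are placeholders:

    import java.net.InetAddress;
    import org.apache.commons.net.ntp.NTPUDPClient;
    import org.apache.commons.net.ntp.TimeInfo;

    public class NtpOffset {
        public static void main(String[] args) throws Exception {
            NTPUDPClient client = new NTPUDPClient();
            client.setDefaultTimeout(5_000);
            TimeInfo info = client.getTime(InetAddress.getByName("pool.ntp.org"));
            info.computeDetails();                // fills in offset and delay
            long offsetMillis = info.getOffset(); // NTP time minus local time
            client.close();

            // To fire at a target time expressed on the NTP clock,
            // convert it back to the local clock before scheduling:
            long targetNtpMillis = System.currentTimeMillis() + offsetMillis + 30 * 60_000L;
            long localFireMillis = targetNtpMillis - offsetMillis; // e.g. 13:00 - 0:05 = 12:55
            System.out.println("schedule at (local epoch ms): " + localFireMillis);
        }
    }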
Addenda --
I can't speak to the naivety of an implementation; I'm not familiar enough with the protocol.
In the end it comes down to what level of practical accuracy is acceptable to you. NTP is used to synchronize time between systems; one of the problems it solves, by being continually invoked, is clock creep. If you use the "NTP ping, schedule with offset" technique and the scheduled time is, say, 8 hours in the future, there's a very real possibility of clock creep: although you wanted the task to go off at "12:55", by the time 12:55 rolls around your local clock could have drifted away from the original NTP server, since the clocks have not been synced (at all) in the meantime and the job has not been rescheduled to compensate.
Obviously, the longer the period between the original scheduling and the actual execution, the greater the potential for drift. This is an artifact no matter how good the original NTP ping is. If you do not plan on rescheduling these tasks as they get close to execution time in order to compensate for drift, then odds are any "reasonable" implementation of NTP will suit.
There's the Apache Commons Net library, which has an NTP client. Some complain that it uses System.currentTimeMillis(), which has (had?) resolution issues (10-15 ms) on Windows. System.nanoTime() addresses this, and you could easily change the library to use that and rebuild it.
My intuition tells me that NTP requires hardware clock adjustments to keep pace. So if you don't have access to the hardware, you cannot do it.
However, if a precision of a few seconds is enough, you could periodically fetch a sample time from a server, calculate the skew against the system clock, and adjust the scheduled times of the jobs accordingly.
But can NTP be used by an application that wants to schedule things in sync with other systems?
I've never heard of it being used that way. However, there's nothing to stop you implementing a client for the Network Time Protocol (RFC 1305). A full NTP implementation is probably overkill, but you can also use the protocol in SNTP mode (RFC 2030).
You probably want to set up and use a local NTP server if you want high availability and reasonable accuracy.
A Google search indicates that there are a number of Java NTP clients out there ...

BlazeDS Polling Interval set to 0: Unwanted side-effects?

tl;dr: Setting the polling-interval to 0 has given my performance a huge boost, but I am worried about possible problems down the line.
In my application, I am doing a fair amount of publishing from our Java server to our Flex client, publishing on a variety of topics and sub-topics.
Recently, we have been on a round of performance improvements system-wide, and the messaging layer was proving to be a big bottleneck.
A few minutes ago, I discovered that setting the <polling-interval-millis> property in our services-config.xml to 0 causes published messages, even when there are lots of them, to be recognized by the client almost instantly, instead of after the 3-second delay that is the default polling interval. This has obviously had a tremendous impact.
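For context, the property lives on the polling channel definition in services-config.xml; a sketch along the lines of the standard BlazeDS samples (the channel id and endpoint URL are placeholders):

    <channel-definition id="my-polling-amf" class="mx.messaging.channels.AMFChannel">
        <endpoint url="http://{server.name}:{server.port}/{context.root}/messagebroker/amfpolling"
                  class="flex.messaging.endpoints.AMFEndpoint"/>
        <properties>
            <polling-enabled>true</polling-enabled>
            <polling-interval-millis>0</polling-interval-millis>
        </properties>
    </channel-definition>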
So, I'm pretty happy with the current performance, only thing is, I'm a bit nervous about unintended side-effects caused by this change. In particular, I am worried about our Flash client slowing way down, and of way too much unwanted traffic.
My preliminary testing has not borne out this fear, but before I commit the change to our repository, I was hoping that somebody with experience with this stuff would chime in.
Unfortunately your question is too general for a specific answer. I'll write some ideas below; maybe they are helpful.
Decreasing the value from 3 seconds to 0 means that you are receiving new data way faster. If your Flex client uses this data to make complex computations, it is possible to slow down your client or to show obsolete data (it is a known pattern, see http://help.adobe.com/en_US/LiveCycleDataServicesES/3.1/Developing/WS3a1a89e415cd1e5d1a8a18fb122bdc0aad5-8000Update.html ). You need to understand how the data is processed, and probably do some client benchmarking.
The server will also have to handle more requests, so it would be good to identify the maximum number of requests per second it can handle. For that, you will need a tool like JMeter to detect the maximum capacity of your system; after that you can estimate how many requests per second you will have once the interval drops from 3 seconds to 0, taking into account things like the number of clients increasing by 10% per month.
The main idea is that you should do some performance testing of your APIs and save the scripts, so you can see whether future modifications slow down the system too much. Without this, it is quite hard to guess whether it is OK to change configuration parameters.
You might want to try out long-polling. For our WebLogic servers, we don't get any problems unless we let the poll request go to 5 minutes, so we keep it to 4, then give it a 1-second rest before starting again. We have a couple of hundred total users, with 60-70 on it hard-core all day. The thing to keep in mind is that you're basically turning intermittent user requests into what amounts to almost-always-connected telnet sessions. Depending on the browser your users are using, there can be implications from that as well, but overall we've been very pleased.

Sporadic behavior by machines under stress

We are doing some Java stress runs (involving network I/O). Initially everything is fine and the system responds very fast (average latency in the test: 2 ms). But hours later, when I redo the same test, I observe that the performance goes down (20-60 ms). It's the same JAR files, the same JVM, and the same LAN over which the stress is running. I don't understand the reason for this behavior.
The LAN is 1 Gbps, and given the stress requirements I'm sure we are not using all of it.
So my questions:
Can it be because of some switches in the LAN?
Does the machine slow down after some time? (The machines were restarted about six months ago, well before the stress runs started; they are RHEL5, 64-bit quad-core Xeons.)
What is the general way to debug such issues?
A few questions...
How much of the environment is under your control, and are you putting any measures in place to ensure it's consistent for each run? I.e., are you sharing the network with other systems? Is the machine you're using dedicated solely to your stress testing?
The way I'd look at this is to start gathering details on what your machine and code are up to. That means using perfmon (Windows) or sar (Unix) to find out what the OS and hardware are doing, and attaching a profiler to make sure your code is doing the same thing and to help pinpoint where the bottleneck is occurring from a code perspective.
Nothing terribly detailed, but something I hope will help get you started.
The general way is "measure everything". This, in particular, might mean:
Ensure the time on all servers is the same (use NTP or something similar);
Measure how long it takes to generate a request (what if the request generator has a bug?);
Measure when each request leaves the client machine(s), or at least how long the I/O takes. Sometimes it is enough to know the average time over many requests;
Measure when each request arrives;
Measure how long it takes to generate a response;
Measure how long it takes to send the response.
You can probably start from the 5th item, as this is (you believe) your critical chain. But it is best to log as much as you can, since, according to what you've said yourself, it takes hours of running to produce different results.
If you don't want to modify your code, look for places where you can sniff the data without intervening (e.g. define a servlet filter in your web.xml).
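A sketch of such a filter (javax.servlet era, matching a web.xml-based deployment; where you log the timing is up to you):

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    public class TimingFilter implements Filter {
        public void init(FilterConfig config) {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            long start = System.nanoTime();    // monotonic clock for intervals
            try {
                chain.doFilter(req, res);      // let the request proceed untouched
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("request took " + elapsedMs + " ms");
            }
        }
    }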
