Precise time measurement in Java

Java gives access to two methods to get the current time: System.nanoTime() and System.currentTimeMillis(). The first one gives a result in nanoseconds, but its actual accuracy is much worse than that (many microseconds).
Is the JVM already providing the best possible value for each particular machine?
Otherwise, is there some Java library that can give finer measurement, possibly by being tied to a particular system?

The problem with getting super precise time measurements is that some processors can't/don't provide such tiny increments.
As far as I know, System.currentTimeMillis() and System.nanoTime() are the best measurements you will be able to find.
Note that both return a long value.

It's a bit pointless in Java measuring time down to the nanosecond scale; an occasional GC hit will easily wipe out any kind of accuracy this may have given. In any case, the documentation states that whilst it gives nanosecond precision, it's not the same thing as nanosecond accuracy; and there are operating systems which don't report nanoseconds in any case (which is why you'll find answers quantized to 1000 when accessing them; it's not luck, it's limitation).
Not only that, but depending on how the feature is actually implemented by the OS, you might find quantized results coming through anyway (e.g. answers that always end in 64 or 128 instead of intermediate values).
It's also worth noting that the purpose of the method is to find the time difference between some (nearby) start time and now; if you take System.nanoTime() at the start of a long-running application and then take System.nanoTime() a long time later, it may have drifted quite far from real time. So you should only really use it for periods of less than 1s; if you need a longer running time than that, milliseconds should be enough. (And if it's not, then make up the last few numbers; you'll probably impress clients and the result will be just as valid.)

Unfortunately, I don't think Java RTS is mature enough at this moment.
Java does try to provide the best value (it delegates to native code that queries the kernel time). However, the JVM specs make this coarse-time-measurement disclaimer mainly because of things like GC activity and the limits of the underlying system.
Certain GC activities will block all threads even if you are running a concurrent GC.
The default Linux clock tick precision is only 10ms. Java cannot do any better if the Linux kernel does not support it.
I haven't figured out how to address #1 unless your app does not need to do GC. A decent, mid-sized application will probably spend tens of milliseconds on GC pauses now and then. You are probably out of luck if your precision requirement is below 10ms.
As for #2, you can tune the Linux kernel to give more precision. However, you also get less out of your box because the kernel now context-switches more often.
Perhaps we should look at it from a different angle. Is there a reason that Ops needs precision of 10ms or lower? Is it okay to tell Ops that precision is at 10ms AND to also look at the GC log around that time, so they know the time is accurate to ±10ms when there is no GC activity around that time?

If you are looking to record some type of phenomenon on the order of nanoseconds, what you really need is a real-time operating system. The accuracy of the timer will greatly depend on the operating system's implementation of its high resolution timer and the underlying hardware.
However, you can still stay with Java since there are RTOS versions available.

JNI:
Create a simple function to access the Intel RDTSC instruction or the PMCCNTR register of co-processor p15 in ARM.
Pure Java:
You can possibly get better values if you are willing to delay until a clock tick. You can spin checking System.nanoTime() until the value changes. If you know, for instance, that on your platform the value of System.nanoTime() changes every 10000 loop iterations by an amount DELTA, then the actual event time was finalNanoTime - DELTA * ITERATIONS / 10000. You will need to "warm up" the code before taking actual measurements.
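A rough sketch of that spin-until-tick idea (doWork() is a placeholder for whatever you are timing; the calibration described above is omitted):
// Spin until nanoTime() ticks over, so the measurement starts right at a clock edge.
long previous = System.nanoTime();
long start;
while ((start = System.nanoTime()) == previous) {
    // busy-wait for the next reported tick
}
doWork();                                      // placeholder for the operation being timed
long elapsed = System.nanoTime() - start;      // elapsed time measured from the tick boundary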
Hack (for profiling, etc, only):
If garbage collection is throwing you off, you could always measure the time using a high-priority thread running in a second JVM which doesn't create objects. Have it spin incrementing a long in shared memory which you use as a clock.

Related

Best approach for dealing with time measures? [duplicate]

My goal is to write a framework for measuring method execution or transaction time and for processing the measurements, i.e. storing, analysis, etc. A transaction may include calls to external systems, waiting synchronously or asynchronously for the results.
There have already been some questions around that topic, like
"How do I time a method's execution"
"Measure execution time for a Java method"
"System.currentTimeMillis vs System.nanoTime"
And all the answers boil down to three approaches for taking the time
System.currentTimeMillis()
System.nanoTime()
Instant.now() and Duration (since Java 8)
I know all of these have some implications:
System.currentTimeMillis()
The result of this method depends on the platform. On Linux you get 1ms resolution; on Windows you get 10ms (single core) to ~15ms (multi-core). So it's OK for measuring long-running operations or multiple executions of short-running ops.
System.nanoTime()
You get a high-resolution time measure, with nanosecond precision (but not necessarily nanosecond accuracy), and you get an overflow after 292 years (I could live with that).
Instant.now() and Duration
Since Java 8 there is the new time API. An Instant has a second field and a nanosecond field, so on top of the object reference it uses two long values (same for Duration). You also get nanosecond precision, depending on the underlying clock (see "Java 8 Instant.now() with nanosecond resolution?"). The instantiation is done by invoking Instant.now(), which maps down to System.currentTimeMillis() for the normal system clock.
Given the facts, it becomes apparent that the best precision is only achievable with System.nanoTime(), but my question targets more of a best practice for dealing with the measures in general, which includes not only taking the measure but also handling it.
Instant and Duration provide the best API support (calculating, comparing, etc.) but have OS-dependent precision in the standard case, and more overhead for memory and for creating a measure (object construction, deeper call stack).
System.nanoTime() and System.currentTimeMillis() have different levels of precision and only basic "API" support (math operations on long), but they are faster to obtain and smaller to keep in memory.
So what would be the best approach? Are there any implications I didn't think of? Are there any alternatives?
You are focusing too much on the unimportant detail of the precision. If you want to measure/profile the execution of certain operations, you have to make sure that these operation run long enough to make the measurement unaffected by one-time artifacts, small differences in thread scheduling timing, garbage collection or HotSpot optimization. In most cases, if the differences become smaller than the millisecond scale, they are not useful to draw conclusions from them.
The more important aspect is whether the tools are designed for your task. System.currentTimeMillis() and all other wall-clock based APIs, whether they are based on currentTimeMillis() or not, are designed to give you a clock which is intended to be synchronized with Earth’s rotation and its path around the Sun, which loads it with the burden of Leap Seconds and other correction measures, not to speak of the fact that your computer’s clock may be out of sync with the wall clock anyway and get corrected, e.g. via NTP updates, in the worst case jumping right when you are trying to measure your elapsed time, perhaps even backwards.
In contrast, System.nanoTime() is designed to measure elapsed time (exactly what you want to do) and nothing else. Since its return value has an unspecified origin and may even be negative, only differences between two values returned by this method make any sense at all. You will find this even in the documentation:
The values returned by this method become meaningful only when the difference between two such values, obtained within the same instance of a Java virtual machine, is computed.
So when you want to measure and process the elapsed time of your method execution or transactions, System.nanoTime() is the way to go. Granted, it only provides a naked long value, but it isn’t clear what kind of API support you want. Since points of time are irrelevant and even distracting here, you’ll have a duration only, which you may convert to other time units or, if you want to use the new time API, you can create a Duration object using Duration.ofNanos(long), allowing you to add and subtract duration values and compare them, but there isn’t much more you could do. You must not mix them up with wall-clock or calendar based durations…
As a final note, the documentation is a bit imprecise about the limitation. If you are calculating the difference between two values returned by System.nanoTime(), a numerical overflow isn’t bad per se. Since the counter has an unspecified origin, the start value of your operation might be close to Long.MAX_VALUE whereas the end value is close to Long.MIN_VALUE because the JVM’s counter had an overflow. In this case, calculating the difference will cause another overflow, producing a correct value for the difference. But if you store that difference in a signed long, it can hold at most 2⁶³ nanoseconds, limiting the difference to max 292 years, but if you treat it as unsigned long, e.g. via Long.compareUnsigned and Long.toUnsignedString, you may handle even 2⁶⁴ nanoseconds duration, in other words you can measure up to 584 years of elapsed time this way, if your computer doesn’t break in-between…
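As a minimal sketch of that pattern (doTransaction() is a placeholder for the measured work):
long start = System.nanoTime();
doTransaction();                                   // placeholder for the method or transaction being measured
long elapsedNanos = System.nanoTime() - start;     // still correct even if the JVM's counter wrapped in between

// API support via java.time, valid while the difference fits in a signed long (up to ~292 years):
java.time.Duration elapsed = java.time.Duration.ofNanos(elapsedNanos);

// Treating the difference as unsigned (Long.compareUnsigned, Long.toUnsignedString) extends the range to ~584 years:
String readable = Long.toUnsignedString(elapsedNanos);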
I would recommend using getThreadCpuTime from ThreadMXBean (see also https://stackoverflow.com/a/7467299/185031). If you want to measure the execution time of a method, you are most of the time not so much interested in the wall clock, but more in the CPU execution time.
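A rough sketch of what that could look like (doWork() is a placeholder; the types come from java.lang.management):
ThreadMXBean threads = ManagementFactory.getThreadMXBean();   // java.lang.management
if (threads.isCurrentThreadCpuTimeSupported()) {
    long cpuStart = threads.getCurrentThreadCpuTime();        // CPU time of the current thread, in nanoseconds
    doWork();                                                  // placeholder for the measured method
    long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
    System.out.println("CPU time: " + cpuNanos + " ns");
}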

How long is one millisecond in the Java world?

I have different reasons for asking this question.
What was the decision for switching to microseconds in JSR 310: Date and Time API based on?
If I measure time with System.currentTimeMillis(), how can I interpret 1ms? How many method-calls, how many sysouts, how many HashMap#pushs?
I'm absolutely aware of the low scientific standard of this question, but I'd like to have some default values for java operations.
Edit:
I was talking about:
long t1 = System.currentTimeMillis();
// do random stuff
System.out.println(System.currentTimeMillis() - t1);
What was the decision for switching to microseconds in JSR 310: Date and Time API based on?
Modern hardware has had microsecond-precision clocks for quite some time; and it is not new to JSR 310. Consider TimeUnit, which appeared in 1.5 (along with System.nanoTime(), see below) and has a MICROSECONDS value.
If I measure time with System.currentTimeMillis() how can I interpret 1ms?
As accurate as the hardware clock/OS primitive combo allows. There will always be some skew. But unless you do "real" real time (and you shouldn't be doing Java in this case), you will likely never notice.
Note also that this method measures the number of milliseconds since the epoch, and it depends on clock adjustments. This is unlike System.nanoTime(), which relies on an onboard tick counter that never "decrements" with time.
FWIW, Timer{,Task} uses System.currentTimeMillis() to measure time, while the newer ScheduledThreadPool uses System.nanoTime().
As to:
How many method-calls, how many sysouts, how many HashMap#pushs (in 1 ms)
impossible to tell! Method calls depend on what methods do, sysouts depend on the speed of your stdout (try and sysout on a 9600 baud serial port), pushes depend on memory bus speed/CPU cache etc; you didn't actually expect an accurate answer for that, did you?
A System.currentTimeMillis() millisecond is almost exactly a millisecond, except when the system clock is being corrected, e.g. using the Network Time Protocol (NTP). NTP can cause significant leaps in time, either forward or backward, but this is rare. If you want a monotonically increasing clock with more resolution, use System.nanoTime() instead.
How many method-calls, how many sysouts, how many HashMap#pushs (in 1 ms)
Empty method calls can be eliminated, so only sizable method calls matter. What the method does is more important. You can expect between 1 and 10000 method calls in a millisecond.
sysouts are very dependent on the display and whether it has been paused. Under normal conditions, you can expect to get 1 to 100 lines of output in 1 ms, depending on line length and the display. If the stream is paused for any reason, you might not get any.
HashMap has a put(), not a push(), and you can get around 1000 to 10000 of these in a millisecond. The problem is you have to have something worth putting, and this usually takes longer.
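If you want your own (very unscientific) numbers, a quick sketch along these lines gives a rough count on your machine:
// Count how many HashMap.put() calls complete in roughly one millisecond (1,000,000 ns).
java.util.Map<Integer, Integer> map = new java.util.HashMap<>();
long start = System.nanoTime();
int puts = 0;
while (System.nanoTime() - start < 1_000_000) {
    map.put(puts, puts);
    puts++;
}
System.out.println(puts + " puts in ~1 ms");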
The answers of Peter Lawrey and fge are correct so far, but I would add the following detail considering your statement about microsecond resolution of JSR-310. This new API uses nanosecond resolution, not microsecond resolution.
The reason for this is not that clocks based on System.currentTimeMillis() might achieve such a high degree of precision. No, such clocks just count milliseconds and are sensitive to external clock adjustments, which can even cause big jumps in time far beyond any sub-second level. The support for nanoseconds is rather motivated by the need to support nanosecond-based timestamps in many databases (not all!).
It should be noted, though, that this "precision" is not real-time accuracy; it serves more to avoid duplicated timestamps (which are often created via mixed clock-and-counter mechanisms, not by measuring scientifically accurate real time). Such database timestamps are often used as primary keys by some people, hence their requirement for non-duplicate, monotonically increasing timestamps.
The alternative System.nanoTime() is intended and designed to show better monotonically increasing behaviour, although I would not bet my life on that in some multi-core environments. But here you do get nanosecond-like differences between timestamps, so JSR-310 can at least give calculatory support via classes like java.time.Duration (but again, not necessarily scientifically accurate nanosecond differences).

Timing a block of code with C++ and Java

I am trying to compare the accuracy of timing methods with C++ and Java.
With C++ I usually use CLOCKS_PER_SEC: I run the block of code I want to time for a certain amount of time and then calculate how long it took, based on how many times the block was executed.
With Java I usually use System.nanoTime().
Which one is more accurate, the one I use for C++ or the one I use for Java? Is there any other way to time in C++ so I don't have to repeat the piece of code to get a proper measurement? Basically, is there a System.nanoTime() method for C++?
I am aware that both use system calls which cause considerable latencies. How does this distort the real value of the timing? Is there any way to prevent this?
Every method has errors. Before you spend a great deal of time on this question, you have to ask yourself "how accurate do I need my answer to be"? Usually the solution is to run a loop / piece of code a number of times, and keep track of the mean / standard deviation of the measurement. This is a good way to get a handle on the repeatability of your measurement. After that, assume that latency is "comparable" between the "start time" and "stop time" calls (regardless of what function you used), and you have a framework to understand the issues.
Bottom line: clock() function typically gives microsecond accuracy.
See https://stackoverflow.com/a/20497193/1967396 for an example of how to go about this in C (in that instance, using a usec-precision clock). There's also the ability to use ns timing; see for example the answer to "clock_gettime() still not monotonic - alternatives?", which uses clock_gettime(CLOCK_MONOTONIC_RAW, &tSpec);
Note that you have to extract seconds and nanoseconds separately from that structure.
Be careful using System.nanoTime() as it is still limited by the resolution that the machine you are running on can give you.
Also there are complications timing Java as the first few times through a function will be a lot slower until they get optimized for your system.
Virtually all modern systems use pre-emptive multi threading and multiple cores, etc - so all timings will vary from run to run. (For example if control gets switched away from your thread while it in the method).
To get reliable timings you need to
Warm up the system by running the thing you are timing a few hundred times before starting.
Run the code for a good number of times and average the results.
The reliability issues are the same for any language, so they apply just as well to C as to Java. C may not need the warm-up loop, but you will still need to take a lot of samples and average them.
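A bare-bones sketch of that warm-up-and-average pattern (iteration counts and doWork() are arbitrary placeholders):
for (int i = 0; i < 1000; i++) {
    doWork();                               // warm-up so the JIT has compiled and optimized the code path
}
int runs = 100;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
    doWork();                               // the code being timed
}
long averageNanos = (System.nanoTime() - start) / runs;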

Timing a remote call in a multithreaded java program

I am writing a stress test that will issue many calls to a remote server. I want to collect the following statistics after the test:
Latency (in milliseconds) of the remote call.
Number of operations per second that the remote server can handle.
I can successfully get (2), but I am having problems with (1). My current implementation is very similar to the one shown in this other SO question. And I have the same problem described in that question: latency reported by using System.currentTimeMillis() is longer than expected when the test is run with multiple threads.
I analyzed the problem and I am positive the problem comes from the thread interleaving (see my answer to the other question that I linked above for details), and that System.currentTimeMillis() is not the way to solve this problem.
It seems that I should be able to do it using java.lang.management, which has some interesting methods like:
ThreadMXBean.getCurrentThreadCpuTime()
ThreadMXBean.getCurrentThreadUserTime()
ThreadInfo.getWaitedTime()
ThreadInfo.getBlockedTime()
My problem is that even though I have read the API, it is still unclear to me which of these methods will give me what I want. In the context of the other SO question that I linked, this is what I need:
long start_time = rightMethodToCall();
result = restTemplate.getForObject("Some URL", String.class);
long difference = (rightMethodToCall() - start_time);
So that the difference gives me a very good approximation of the time that the remote call took, even in a multi-threaded environment.
Restriction: I'd like to avoid protecting that block of code with a synchronized block because my program has other threads that I would like to allow to continue executing.
EDIT: Providing more info.:
The issue is this: I want to time the remote call, and just the remote call. If I use System.currentTimeMillis() or System.nanoTime(), AND if I have more threads than cores, then it is possible that I could have this thread interleaving:
Thread1: long start_time ...
Thread1: result = ...
Thread2: long start_time ...
Thread2: result = ...
Thread2: long difference ...
Thread1: long difference ...
If that happens, then the difference calculated by Thread2 is correct, but the one calculated by Thread1 is incorrect (it would be greater than it should be). In other words, for the measurement of the difference in Thread1, I would like to exclude the time of lines 4 and 5. Is this time that the thread was WAITING?
Summarizing the question in a different way in case it helps other people understand it better (this quote is how @jason-c put it in his comment):
[I am] attempting to time the remote call, but running the test with multiple threads just to increase testing volume.
Use System.nanoTime() (but see updates at end of this answer).
You definitely don't want to use the current thread's CPU or user time, as user-perceived latency is wall clock time, not thread CPU time. You also don't want to use the current thread's blocking or waiting time, as it measures per-thread contention times which also doesn't accurately represent what you are trying to measure.
System.nanoTime() will return relatively accurate results (although granularity is technically only guaranteed to be as good or better than currentTimeMillis(), in practice it tends to be much better, generally implemented with hardware clocks or other performance timers, e.g. QueryPerformanceCounter on Windows or clock_gettime on Linux) from a high resolution clock with a fixed reference point, and will measure exactly what you are trying to measure.
long start_time = System.nanoTime();
result = restTemplate.getForObject("Some URL",String.class);
long difference = (System.nanoTime() - start_time);
long milliseconds = difference / 1000000;
System.nanoTime() does have its own set of issues, but be careful not to get whipped up in paranoia; for most applications it is more than adequate. You just wouldn't want to use it for, say, precise timing when sending audio samples to hardware or something (which you wouldn't do directly in Java anyway).
Update 1:
More importantly, how do you know the measured values are longer than expected? If your measurements are showing true wall clock time, and some threads are taking longer than others, that is still an accurate representation of user-perceived latency, as some users will experience those longer delay times.
Update 2 (based on clarification in comments):
Much of my above answer is still valid then; but for different reasons.
Using per-thread time does not give you an accurate representation because a thread could be idle/inactive while the remote request is still processing, and you would therefore exclude that time from your measurement even though it is part of perceived latency.
Further inaccuracies are introduced by the remote server taking longer to process the simultaneous requests you are making - this is an extra variable that you are adding (although it may be acceptable as representative of the remote server being busy).
Wall time is also not completely accurate because, as you have seen, variances in local thread overhead may add extra latency that isn't typically present in single-request client applications (although this still may be acceptable as representative of a client application that is multi-threaded, but it is a variable you cannot control).
Of those two, wall time will still get you closer to the actual result than per-thread time, which is why I left the previous answer above. You have a few options:
You could do your tests on a single thread, serially -- this is ultimately the most accurate way to achieve your stated requirements.
You could avoid creating more threads than cores, e.g. a fixed-size thread pool with affinities bound to each core (tricky: Java thread affinity) and the measurements running as tasks on each. Of course this still adds variables due to synchronization of underlying mechanisms that are beyond your control. This may reduce the risk of interleaving (especially if you set the affinities), but you still do not have full control over e.g. other threads the JVM is running or other unrelated processes on the system.
You could measure the request handling time on the remote server; of course this does not take network latency into account.
You could continue using your current approach and do some statistical analysis on the results to remove outliers.
You could choose not to measure this at all, and simply do user tests and wait for a comment on it before attempting to optimize it (i.e. measure it with people, who are what you're developing for anyway). If the only reason to optimize this is for UX, it could very well be the case that users have a pleasant experience and the wait time is totally acceptable.
Also, none of this makes any guarantees that other unrelated threads on the system won't be affecting your timings, but that is why it is important to both a) run your test multiple times and average (obviously) and b) set an acceptable timing error that you are OK with (do you really need to know this to, e.g., 0.1ms accuracy?).
Personally, I would either do the first, single-threaded approach and let it run overnight or over a weekend, or use your existing approach and remove outliers from the result and accept a margin of error in the timings. Your goal is to find a realistic estimate within a satisfactory margin of error. You will also want to consider what you are going to ultimately do with this information when deciding what is acceptable.

Is System.nanoTime() completely useless?

As documented in the blog post Beware of System.nanoTime() in Java, on x86 systems, Java's System.nanoTime() returns the time value using a CPU specific counter. Now consider the following case I use to measure time of a call:
long time1 = System.nanoTime();
foo();
long time2 = System.nanoTime();
long timeSpent = time2 - time1;
Now in a multi-core system, it could be that after measuring time1, the thread is scheduled to a different processor whose counter is less than that of the previous CPU. Thus we could get a value in time2 which is less than time1. Thus we would get a negative value in timeSpent.
Considering this case, isn't System.nanoTime() pretty much useless for now?
I know that changing the system time doesn't affect nanotime. That is not the problem I describe above. The problem is that each CPU will keep a different counter since it was turned on. This counter can be lower on the second CPU compared to the first CPU. Since the thread can be scheduled by the OS to the second CPU after getting time1, the value of timeSpent may be incorrect and even negative.
This answer was written in 2011 from the point of view of what the Sun JDK of the time running on operating systems of the time actually did. That was a long time ago! leventov's answer offers a more up-to-date perspective.
That post is wrong, and nanoTime is safe. There's a comment on the post which links to a blog post by David Holmes, a realtime and concurrency guy at Sun. It says:
System.nanoTime() is implemented using the QueryPerformanceCounter/QueryPerformanceFrequency API [...] The default mechanism used by QPC is determined by the Hardware Abstraction layer(HAL) [...] This default changes not only across hardware but also across OS versions. For example Windows XP Service Pack 2 changed things to use the power management timer (PMTimer) rather than the processor timestamp-counter (TSC) due to problems with the TSC not being synchronized on different processors in SMP systems, and due the fact its frequency can vary (and hence its relationship to elapsed time) based on power-management settings.
So, on Windows, this was a problem up until WinXP SP2, but it isn't now.
I can't find a part II (or more) that talks about other platforms, but that article does include a remark that Linux has encountered and solved the same problem in the same way, with a link to the FAQ for clock_gettime(CLOCK_REALTIME), which says:
Is clock_gettime(CLOCK_REALTIME) consistent across all processors/cores? (Does arch matter? e.g. ppc, arm, x86, amd64, sparc).
It should or it's considered buggy.
However, on x86/x86_64, it is possible to see unsynced or variable freq TSCs cause time inconsistencies. 2.4 kernels really had no protection against this, and early 2.6 kernels didn't do too well here either. As of 2.6.18 and up the logic for detecting this is better and we'll usually fall back to a safe clocksource.
ppc always has a synced timebase, so that shouldn't be an issue.
So, if Holmes's link can be read as implying that nanoTime calls clock_gettime(CLOCK_REALTIME), then it's safe-ish as of kernel 2.6.18 on x86, and always on PowerPC (because IBM and Motorola, unlike Intel, actually know how to design microprocessors).
There's no mention of SPARC or Solaris, sadly. And of course, we have no idea what IBM JVMs do. But Sun JVMs on modern Windows and Linux get this right.
EDIT: This answer is based on the sources it cites. But I still worry that it might actually be completely wrong. Some more up-to-date information would be really valuable. I just came across a link to a four-year-newer article about Linux's clocks which could be useful.
I did a bit of searching and found that if one is being pedantic then yes it might be considered useless...in particular situations...it depends on how time sensitive your requirements are...
Check out this quote from the Java Sun site:
The real-time clock and System.nanoTime() are both based on the same system call and thus the same clock.
With Java RTS, all time-based APIs (for example, Timers, Periodic Threads, Deadline Monitoring, and so forth) are based on the high-resolution timer. And, together with real-time priorities, they can ensure that the appropriate code will be executed at the right time for real-time constraints. In contrast, ordinary Java SE APIs offer just a few methods capable of handling high-resolution times, with no guarantee of execution at a given time. Using System.nanoTime() between various points in the code to perform elapsed time measurements should always be accurate.
Java also has a caveat for the nanoTime() method:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary time (perhaps in the future, so values may be negative). This method provides nanosecond precision, but not necessarily nanosecond accuracy. No guarantees are made about how frequently values change. Differences in successive calls that span greater than approximately 292.3 years (2⁶³ nanoseconds) will not accurately compute elapsed time due to numerical overflow.
It would seem that the only conclusion that can be drawn is that nanoTime() cannot be relied upon as an accurate value. As such, if you do not need to measure times that are mere nanoseconds apart, then this method is good enough, even if the resulting returned value is negative. However, if you need higher precision, they appear to recommend that you use Java RTS.
So to answer your question... no, nanoTime() is not useless... it's just not the most prudent method to use in every situation.
Since Java 7, System.nanoTime() is guaranteed to be safe by JDK specification. System.nanoTime()'s Javadoc makes it clear that all observed invocations within a JVM (that is, across all threads) are monotonic:
The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine; other virtual machine instances are likely to use a different origin.
JVM/JDK implementation is responsible for ironing out the inconsistencies that could be observed when underlying OS utilities are called (e. g. those mentioned in Tom Anderson's answer).
The majority of other old answers to this question (written in 2009–2012) express FUD that was probably relevant for Java 5 or Java 6 but is no longer relevant for modern versions of Java.
It's worth mentioning, however, that despite the JDK guaranteeing nanoTime()'s safety, there have been several bugs in OpenJDK that made it fail to uphold this guarantee on certain platforms or under certain circumstances (e. g. JDK-8040140, JDK-8184271). There are no open (known) bugs in OpenJDK wrt nanoTime() at the moment, but the discovery of a new such bug or a regression in a newer release of OpenJDK shouldn't shock anybody.
With that in mind, code that uses nanoTime() for timed blocking, interval waiting, timeouts, etc. should preferably treat negative time differences (timeouts) as zeros rather than throw exceptions. This practice is also preferable because it is consistent with the behaviour of all timed wait methods in all classes in java.util.concurrent.*, for example Semaphore.tryAcquire(), Lock.tryLock(), BlockingQueue.poll(), etc.
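As an illustration, a minimal sketch of that clamping practice (the helper name is made up; it assumes java.util.concurrent.locks.Lock and java.util.concurrent.TimeUnit):
static boolean tryLockBefore(Lock lock, long deadlineNanos) throws InterruptedException {
    long remaining = deadlineNanos - System.nanoTime();   // deadlineNanos was computed earlier from nanoTime()
    if (remaining < 0) {
        remaining = 0;                                     // treat a negative difference as an already-expired timeout
    }
    return lock.tryLock(remaining, TimeUnit.NANOSECONDS);
}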
Nonetheless, nanoTime() should still be preferred over currentTimeMillis() for implementing timed blocking, interval waiting, timeouts, etc., because the latter is subject to the "time going backward" phenomenon (e. g. due to server time correction), i. e. currentTimeMillis() is not suitable for measuring time intervals at all. See this answer for more information.
Instead of using nanoTime() for code execution time measurements directly, specialized benchmarking frameworks and profilers should preferably be used, for example JMH and async-profiler in wall-clock profiling mode.
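For example, a minimal JMH benchmark could look roughly like this (class and method names are illustrative; JMH has to be on the classpath and is normally run via its Maven or Gradle integration):
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConcatBenchmark {
    @Benchmark
    public String concat() {
        // JMH handles warm-up, forking and statistics; returning the result prevents dead-code elimination.
        return "prefix-" + System.nanoTime();
    }
}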
No need to debate, just use the source.
Here, SE 6 for Linux, make your own conclusions:
jlong os::javaTimeMillis() {
  timeval time;
  int status = gettimeofday(&time, NULL);
  assert(status != -1, "linux error");
  return jlong(time.tv_sec) * 1000 + jlong(time.tv_usec / 1000);
}

jlong os::javaTimeNanos() {
  if (Linux::supports_monotonic_clock()) {
    struct timespec tp;
    int status = Linux::clock_gettime(CLOCK_MONOTONIC, &tp);
    assert(status == 0, "gettime error");
    jlong result = jlong(tp.tv_sec) * (1000 * 1000 * 1000) + jlong(tp.tv_nsec);
    return result;
  } else {
    timeval time;
    int status = gettimeofday(&time, NULL);
    assert(status != -1, "linux error");
    jlong usecs = jlong(time.tv_sec) * (1000 * 1000) + jlong(time.tv_usec);
    return 1000 * usecs;
  }
}
Linux corrects for discrepancies between CPUs, but Windows does not. I suggest you assume System.nanoTime() is only accurate to around 1 microsecond. A simple way to get a more accurate timing is to call foo() 1000 or more times and divide the total time by 1000.
Absolutely not useless. Timing aficionados correctly point out the multi-core problem, but in real-world applications it is often radically better than currentTimeMillis().
When calculating graphics positions in frame refreshes nanoTime() leads to MUCH smoother motion in my program.
And I only test on multi-core machines.
I have seen a negative elapsed time reported from using System.nanoTime(). To be clear, the code in question is:
long startNanos = System.nanoTime();
Object returnValue = joinPoint.proceed();
long elapsedNanos = System.nanoTime() - startNanos;
and variable 'elapsedNanos' had a negative value. (I'm positive that the intermediate call took less than 293 years as well, which is the overflow point for nanos stored in longs :)
This occurred using an IBM v1.5 JRE 64bit on IBM P690 (multi-core) hardware running AIX. I've only seen this error occur once, so it seems extremely rare. I do not know the cause - is it a hardware-specific issue, a JVM defect - I don't know. I also don't know the implications for the accuracy of nanoTime() in general.
To answer the original question, I don't think nanoTime is useless - it provides sub-millisecond timing, but there is an actual (not just theoretical) risk of it being inaccurate which you need to take into account.
This doesn't seem to be a problem on a Core 2 Duo running Windows XP and JRE 1.5.0_06.
In a test with three threads I don't see System.nanoTime() going backwards. The processors are both busy, and threads go to sleep occasionally to provoke moving threads around.
[EDIT] I would guess that it only happens on physically separate processors, i.e. that the counters are synchronized for multiple cores on the same die.
No, it's not... It just depends on your CPU, check High Precision Event Timer for how/why things are differently treated according to CPU.
Basically, read the source of your Java and check what your version does with the function, and if it works against the CPU you will be running it on.
IBM even suggests you use it for performance benchmarking (a 2008 post, but updated).
I am linking to what essentially is the same discussion where Peter Lawrey is providing a good answer.
Why I get a negative elapsed time using System.nanoTime()?
Many people mentioned that in Java System.nanoTime() could return negative time. I apologize for repeating what other people have already said.
nanoTime() is not a clock but a CPU cycle counter.
The return value is divided by the frequency to look like time.
The CPU frequency may fluctuate.
When your thread is scheduled on another CPU, there is a chance of getting a nanoTime() value that results in a negative difference. That's logical. Counters across CPUs are not synchronized.
In many cases, you could get quite misleading results but you wouldn't be able to tell because delta is not negative. Think about it.
(unconfirmed) I think you may get a negative result even on the same CPU if instructions are reordered. To prevent that, you'd have to invoke a memory barrier serializing your instructions.
It'd be cool if System.nanoTime() returned coreID where it executed.
Java is cross-platform, and nanoTime is platform-dependent. If you use Java, then don't use nanoTime. I found real bugs across different JVM implementations with this function.
The Java 5 documentation also recommends using this method for the same purpose.
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
Java 5 API Doc
Also, System.currentTimeMillis() changes when you change your system's clock, while System.nanoTime() doesn't, so the latter is safer for measuring durations.
nanoTime is extremely unreliable for timing. I tried it out on my basic primality-testing algorithms and it gave answers that were literally one second apart for the same input. Don't use that ridiculous method. I need something that is more accurate and precise than currentTimeMillis, but not as bad as nanoTime.
