This question already has answers here:
Do not use System.out.println in server side code
(9 answers)
Closed 6 years ago.
I've got a simple program that I got from my Java programming book, just added a bit to it.
package personal;
public class SpeedTest {
public static void main(String[] args) {
double DELAY = 5000;
long startTime = System.currentTimeMillis();
long endTime = (long)(startTime + DELAY);
long index = 0;
while (true) {
double x = Math.sqrt(index);
long now = System.currentTimeMillis();
if (now >= endTime) {
break;
}
index++;
}
System.out.println(index + " loops in " + (DELAY / 1000) + " seconds.");
}
}
This returns 128478180 loops in 5.0 seconds.
If I add System.out.println(x); before the if statement, then my number of loops in 5 seconds goes down to the 400,000s, is that due to latency in the System.out.println()? Or is it just that x was not being calculated when I wasn't printing it out?
Anytime you "do output" within a very-busy loop, in any programming language whatsoever, you introduce two possibly-very-significant delays:
The data must be converted to printable characters, then be written to whatever display/device it might be going to ... and ...
"The act of outputting anything" obliges the process to synchronize itself with any-and-every-other process that might also be generating output.
One alternative strategy that is often used for this purpose is a "trace table." This is an in-memory array, of some fixed size, which contains strings. Entries are added to this table in a "round-robin" fashion: the oldest entry is continuously being replaced by the newest one. This strategy provides a history without requiring output. (The only requirement that remains is that anyone which is adding an entry to the table, or reading from it, must synchronize their activities e.g. using a mutex.)
Processes which wish to display the contents of the trace-table should grab the mutex, make an in-memory copy of the content-of-interest, then release the mutex before preparing their output. In this way, the various processes which are contributing entries to the trace-table will not be delayed by I/O-associated sources of delay.
Related
I have a generator which generates events for Flink CEP, code for which is given below. Basically, I am using Thread.sleep() and I have read somewhere that java can't sleep less than 1 millisecond even we use System.nanoTime(). Code for the generator is
public class RR_interval_Gen extends RichParallelSourceFunction<RRIntervalStreamEvent> {
Integer InputRate ; // events/second
Integer Sleeptime ;
Integer NumberOfEvents;
public RR_interval_Gen(Integer inputRate, Integer numberOfEvents ) {
this.InputRate = inputRate;
Sleeptime = 1000 / InputRate;
NumberOfEvents = numberOfEvents;
}
#Override
public void run(SourceContext<RRIntervalStreamEvent> sourceContext) throws Exception {
long currentTime;
Random random = new Random();
int RRInterval;
int Sensor_id;
for(int i = 1 ; i <= NumberOfEvents ; i++) {
Sensor_id = 2;
currentTime = System.currentTimeMillis();
// int randomNum = rand.nextInt((max - min) + 1) + min;
RRInterval = 10 + random.nextInt((20-10)+ 1);
RRIntervalStreamEvent stream = new RRIntervalStreamEvent(Sensor_id,currentTime,RRInterval);
synchronized (sourceContext.getCheckpointLock())
{
sourceContext.collect(stream);
}
Thread.sleep(Sleeptime);
}
}
#Override
public void cancel() {
}
}
I will specify my requirement here in simple words.
I want generator class to generate events, let's say an ECG stream at 1200 Hz. This generator will accept parameters like input rate and total time for which we have to generate the stream.
So far so good, the issue is that I need to send more than 1000 events / second. How can I do this by using generator function which is generating values U[10,20]?
Also please let me know if I am using wrong way to generate x number of events / second in the above below.
Sleeptime = 1000 / InputRate;
Thanks in advance
The least sleep time in Windows systems is ~ 10 ms and in Linux and Macintosh is 1 millisecond as mentioned here.
The granularity of sleep is generally bound by the thread scheduler's
interrupt period. In Linux, this interrupt period is generally 1ms in
recent kernels. In Windows, the scheduler's interrupt period is
normally around 10 or 15 milliseconds
Through my research, I learned that using the nano time sleep in java will not help as the issue in at OS level. If you want to send data at arrival rate > 1000 in a controlled way, then it can be done using Real-Time Operating Systems (RTOS), as they can sleep for less then a millisecond. Now, I have come up with another way of doing it, but in this solution, the interarrival times will not be constantly distributed.
Let's say you want arrival rate of 3000 events/ second, then you can create a for loop which iterates 3 times to send data in each iteration and then sleep for 1ms. So for the 3 tuples, the interarrival time will be close to one another, but the issue will be solved. This may be a stupid solution but it works.
Please let me know if there is some better solution to this.
I was struggling since 2 days to understand what is going on with c++ threadpool performance compared to a single thread, then I decided to do the same on java, this is when I noticed that the behaviour is same on c++ and java.. basically my code is simple straight forward.
package com.examples.threading
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
public class ThreadPool {
final static AtomicLong lookups = new AtomicLong(0);
final static AtomicLong totalTime = new AtomicLong(0);
public static class Task implements Runnable
{
int start = 0;
Task(int s) {
start = s;
}
#Override
public void run()
{
for (int j = start ; j < start + 3000; j++ ) {
long st = System.nanoTime();
boolean a = false;
long et = System.nanoTime();
totalTime.getAndAdd((et - st));
lookups.getAndAdd(1l);
}
}
}
public static void main(String[] args)
{
// change threads from 1 -> 100 then you will get different numbers
ExecutorService executor = Executors.newFixedThreadPool(1);
for (int i = 0; i <= 1000000; i++)
{
if (i % 3000 == 0) {
Task task = new Task(i);
executor.execute(task);
System.out.println("in time " + (totalTime.doubleValue()/lookups.doubleValue()) + " lookups: " + lookups.toString());
}
}
executor.shutdown();
while (!executor.isTerminated()) {
;
}
System.out.println("in time " + (totalTime.doubleValue()/lookups.doubleValue()) + " lookups: " + lookups.toString());
}
}
now same code when you run with different pool number say like 100 threads, the overall elapsed time will change.
one thread:
in time 36.91493612774451 lookups: 1002000
100 threads:
in time 141.47934530938124 lookups: 1002000
the question is, the code is same why the overall elapsed time is different what is exactly going on here..
You have a couple of obvious possibilities here.
One is that System.nanoTime may serialize internally, so even though each thread is making its call separately, it may internally execute those calls in sequence (and, for example, queue up calls as they come in). This is particularly likely when nanoTime directly accesses a hardware clock, such as on Windows (where it uses Windows' QueryPerformanceCounter).
Another point at which you get essentially sequential execution is your atomic variables. Even though you're using lock-free atomics, the basic fact is that each has to execute a read/modify/write as an atomic sequence. With locked variables, that's done by locking, then reading, modifying, writing, and unlocking. With lock-free, you eliminate some of the overhead in doing that, but you're still stuck with the fact that only one thread can successfully read, modify, and write a particular memory location at a given time.
In this case the only "work" each thread is doing is trivial, and the result is never used, so the optimizer can (and probably will) eliminate it entirely. So all you're really measuring is the time to read the clock and increment your variables.
To gain at least some of the speed back, you could (for one example) give thread thread its own lookups and totalTime variable. Then when all the threads finish, you can add together the values for the individual threads to get an overall total for each.
Preventing serialization of the timing is a little more difficult (to put it mildly). At least in the obvious design, each call to nanoTime directly accesses a hardware register, which (at least with most typical hardware) can only happen sequentially. It could be fixed at the hardware level (provide a high-frequency timer register that's directly readable per-core, guaranteed to be synced between cores). That's a somewhat non-trivial task, and (more importantly) most current hardware just doesn't include such a thing.
Other than that, do some meaningful work in each thread, so when you execute in multiple threads, you have something that can actually use the resources of your multiple CPUs/cores to run faster.
This is just a hypothetical question, but could be a way to get around an issue I have been having.
Imagine you want to be able to time a calculation function based not on the answer, but on the time it takes to calculating. So instead of finding out what a + b is, you wish to continue perform some calculation while time < x seconds.
Look at this pseudo code:
public static void performCalculationsForTime(int seconds)
{
// Get start time
int millisStart = System.currentTimeMillis();
// Perform calculation to find the 1000th digit of PI
// Check if the given amount of seconds have passed since millisStart
// If number of seconds have not passed, redo the 1000th PI digit calculation
// At this point the time has passed, return the function.
}
Now I know that I am horrible, despicable person for using precious CPU cycles to simple get time to pass, but what I am wondering is:
A) Is this possible and would JVM start complaining about non-responsiveness?
B) If it is possible, what calculations would be best to try to perform?
Update - Answer:
Based on the answers and comments, the answer seems to be that "Yes, this is possible. But only if it is not done in Android main UI thread, because the user's GUI will be become unresponsive and will throw an ANR after 5 seconds."
A) Is this possible and would JVM start complaining about non-responsiveness?
It is possible, and if you run it in the background, neither JVM nor Dalvik will complain.
B) If it is possible, what calculations would be best to try to perform?
If the objective is to just run any calculation for x seconds, just keep adding 1 to a sum until the required time has reached. Off the top of my head, something like:
public static void performCalculationsForTime(int seconds)
{
// Get start time
int secondsStart = System.currentTimeMillis()/1000;
int requiredEndTime = millisStart + seconds;
float sum = 0;
while(secondsStart != requiredEndTime) {
sum = sum + 0.1;
secondsStart = System.currentTimeMillis()/1000;
}
}
You can and JVM won't complain if your code is not part of some complex system that actually tracks thread execution time.
long startTime = System.currentTimeMillis();
while(System.currentTimeMillis() - startTime < 100000) {
// do something
}
Or even a for loop that checks time only every 1000 cycles.
for (int i = 0; ;i++) {
if (i % 1000 == 0 && System.currentTimeMillis() - startTime < 100000)
break;
// do something
}
As for your second question, the answer is probably calculating some value that can always be improved upon, like your PI digits example.
I went for an interview and there i faced one question which is something like this.
int RetrieveAmount(String User) Throws UnableToRetrieveAmountException() {
int amount = 0;
int amount1 = RetrieveFromSystem1(User);
int amount2 = RetrieveFromSystem2(User);
amount = amount1+amount2;
return amount;
}
RetrieveAmount is a syncronozed function
1). Modify the code such that RetrieveFromSystemX(String user) should run independent of each other i.e. if the 1st one is taking 10 seconds and second function also takes 10 second to execute then they should be run in parallel.
2). RetrieveFromSystemX() is taking more time then a time out should happen.
Could anyone give me some pointers on this.
For the first part, I can use the executors with fixed thread pool of number of threads as 2 and can have two separate locks locking each one of these functions. Now two threads can act in parallel to RetrieveAmount(). Please let me know if i am thinking in the rite direction or not.
Could anyone guide me of the 2nd part of the question.
1) Both RetrieveFromSystem1(user) and RetrieveFromSystem2(user) can have there own threads started in the body of the methods. And there is no restriction on these two to be synchronized, it should work.
2) The thread class should have code like
long startTime = System.currentTimeMillis(); while(true) { if( System.currentTimeMillis - startTime < 10000) psudeocode: getData(); else Thread.currentThread().interrupt()};
In its run()
I came across simple java program with two for loops. The question was whether these for loops will take same time to execute or first will execute faster than second .
Below is programs :
public static void main(String[] args) {
Long t1 = System.currentTimeMillis();
for (int i = 999; i > 0; i--) {
System.out.println(i);
}
t1 = System.currentTimeMillis() - t1;
Long t2 = System.currentTimeMillis();
for (int j = 0; j < 999; j++) {
System.out.println(j);
}
t2 = System.currentTimeMillis() - t2;
System.out.println("for loop1 time : " + t1);
System.out.println("for loop2 time : " + t2);
}
After executing this I found that first for loop takes more time than second. But after swapping there location the result was same that is which ever for loop written first always takes more time than the other. I was quite surprised with result. Please anybody tell me how above program works.
The time taken by either loop will be dominated by I/O (i.e. printing to screen), which is highly variable. I don't think you can learn much from your example.
The first loop will allocate 1000 Strings in memory while the second loop, regardsless of working forwards or not, can use the already pre-allocated objects.
Although working with System.out.println, any allocation should be neglible in comparison.
Long (and other primitive wrappers) has cache (look here for LongCache class) for values -128...127. It is populated at first loop run.
i think, if you are going to do a real benchmark, you should run them in different threads and use a higher value (not just 1000), no IO (printing output during execution time), and not to run them sequentially, but one by one.
i have an experience executing the same code a few times may takes different execution time.
and in my opinion, both test won't be different.