Sudoku timing irregularities - java

I wrote a Sudoku puzzle solver using brute force recursion. Now, I wanted to see how long it would take to solve 10 puzzles of similar types. So, I made a folder called easy and placed 10 "easy" puzzles in the folder. When I run the solver the first time it may take 171 ms, the second time it takes 37 ms, and the third run takes 16 ms. Why the different time for solving the exact same problems over again? Shouldn't the time be consistent?
The second problem is that is only displaying the last puzzle solved even though I tell it to repaint the screen after loading the puzzle and again after solving it. If I load just a single puzzle without solving it it will show the initial puzzle state. If I then call the Solve method the final solution is drawn on screen. Here is my method that solves multiple puzzles.
void LoadFolderAndSolve() throws FileNotFoundException {
String folderName = JOptionPane.showInputDialog("Enter folder name");
long startTime = System.currentTimeMillis();
for (int i = 1; i < 11; i++) {
String fileName = folderName + "/puzzle" + i + ".txt";
ReadPuzzle(filename); // this has a call to repaint to show the initial puzzle
SolvePuzzle(); // this has a call to repaint to show the solution
// If I put something to delay here, puzzle 1-9 is still not shown only 10.
}
long finishTime = System.currentTimeMillis();
long difference = finishTime - startTime;
System.out.println("Time in ms - " + difference);
}

The first time it runs the JVM needs to load the classes, create the objects you're using etc - it takes more time. Further, it always takes time for the JVM to "start kicking", which is why, when profiling, usually running a few thousands of loops and dividing the result to get a better estimation.
For the second problem it's impossible to help you without seeing the code, but a good guess would be that you're not "flushing" the data.

Related

Why is a particular Guava Stopwatch.elapsed() call much later than others? (output in post)

I am working on a small game project and want to track time in order to process physics. After scrolling through different approaches, at first I had decided to use Java's Instant and Duration classes and now switched over to Guava's Stopwatch implementation, however, in my snippet, both of those approaches have a big gap at the second call of runtime.elapsed(). That doesn't seem like a big problem in the long run, but why does that happen?
I have tried running the code below as both in focus and as a Thread, in Windows and in Linux (Ubuntu 18.04) and the result stays the same - the exact values differ, but the gap occurs. I am using the IntelliJ IDEA environment with JDK 11.
Snippet from Main:
public static void main(String[] args) {
MassObject[] planets = {
new Spaceship(10, 0, 6378000)
};
planets[0].run();
}
This is part of my class MassObject extends Thread:
public void run() {
// I am using StringBuilder to eliminate flushing delays.
StringBuilder output = new StringBuilder();
Stopwatch runtime = Stopwatch.createStarted();
// massObjectList = static List<MassObject>;
for (MassObject b : massObjectList) {
if(b!=this) calculateGravity(this, b);
}
for (int i = 0; i < 10; i++) {
output.append(runtime.elapsed().getNano()).append("\n");
}
System.out.println(output);
}
Stdout:
30700
1807000
1808900
1811600
1812400
1813300
1830200
1833200
1834500
1835500
Thanks for your help.
You're calling Duration.getNano() on the Duration returned by elapsed(), which isn't what you want.
The internal representation of a Duration is a number of seconds plus a nano offset for whatever additional fraction of a whole second there is in the duration. Duration.getNano() returns that nano offset, and should almost never be called unless you're also calling Duration.getSeconds().
The method you probably want to be calling is toNanos(), which converts the whole duration to a number of nanoseconds.
Edit: In this case that doesn't explain what you're seeing because it does appear that the nano offsets being printed are probably all within the same second, but it's still the case that you shouldn't be using getNano().
The actual issue is probably some combination of classloading or extra work that has to happen during the first call, and/or JIT improving performance of future calls (though I don't think looping 10 times is necessarily enough that you'd see much of any change from JIT).

SlidingWindows for slow data (big intervals) on Apache Beam

I am working with Chicago Traffic Tracker dataset, where new data is published every 15 minutes. When new data is available, it represents records off by 10-15 minutes from the "real time" (example, look for _last_updt).
For example, at 00:20, I get data timestamped 00:10; at 00:35, I get from 00:20; at 00:50, I get from 00:40. So the interval that I can get new data "fixed" (every 15 minutes), although the interval on timestamps change slightly.
I am trying to consume this data on Dataflow (Apache Beam) and for that I am playing with Sliding Windows. My idea is to collect and work on 4 consecutive datapoints (4 x 15min = 60min), and ideally update my calculation of sum/averages as soon as a new datapoint is available. For that, I've started with the code:
PCollection<TrafficData> trafficData = input
.apply("MapIntoSlidingWindows", Window.<TrafficData>into(
SlidingWindows.of(Duration.standardMinutes(60)) // (4x15)
.every(Duration.standardMinutes(15))) . // interval to get new data
.triggering(AfterWatermark
.pastEndOfWindow()
.withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()))
.withAllowedLateness(Duration.ZERO)
.accumulatingFiredPanes());
Unfortunately, looks like when I receive a new datapoint from my input, I do not get a new (updated) result from the GroupByKey that I have after.
Is this something wrong with my SlidingWindows? Or am I missing something else?
One issue may be that the watermark is going past the end of the window, and dropping all later elements. You may try giving a few minutes after the watermark passes:
PCollection<TrafficData> trafficData = input
.apply("MapIntoSlidingWindows", Window.<TrafficData>into(
SlidingWindows.of(Duration.standardMinutes(60)) // (4x15)
.every(Duration.standardMinutes(15))) . // interval to get new data
.triggering(AfterWatermark
.pastEndOfWindow()
.withEarlyFirings(AfterProcessingTime.pastFirstElementInPane())
.withLateFirings(AfterProcessingTime.pastFirstElementInPane()))
.withAllowedLateness(Duration.standardMinutes(15))
.accumulatingFiredPanes());
Let me know if this helps at all.
So #Pablo (from my understanding) gave the correct answer. But I had some suggestions that would not fit in a comment.
I wanted to ask whether you need sliding windows? From what I can tell, fixed windows would do the job for you and be computationally simpler as well. Since you are using accumulating fired panes, you don't need to use a sliding window since your next DoFn function will already be doing an average from the accumulated panes.
As for the code, I made changes to the early and late firing logic. I also suggest increasing the windowing size. Since you know the data comes every 15 minutes, you should be closing the window after 15 minutes rather than on 15 minutes. But you also don't want to pick a window which will eventually collide with multiples of 15 (like 20) because at 60 minutes you'll have the same problem. So pick a number that is co-prime to 15, for example 19. Also allow for late entries.
PCollection<TrafficData> trafficData = input
.apply("MapIntoFixedWindows", Window.<TrafficData>into(
FixedWindows.of(Duration.standardMinutes(19))
.triggering(AfterWatermark.pastEndOfWindow()
// fire the moment you see an element
.withEarlyFirings(AfterPane.elementCountAtLeast(1))
//this line is optional since you already have a past end of window and a early firing. But just in case
.withLateFirings(AfterProcessingTime.pastFirstElementInPane()))
.withAllowedLateness(Duration.standardMinutes(60))
.accumulatingFiredPanes());
Let me know if that solves your issue!
EDIT
So, I could not understand how you computed the above example, so I am using a generic example. Below is a generic averaging function:
public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {
public static class Accum {
int sum = 0;
int count = 0;
}
#Override
public Accum createAccumulator() { return new Accum(); }
#Override
public Accum addInput(Accum accum, Integer input) {
accum.sum += input;
accum.count++;
return accum;
}
#Override
public Accum mergeAccumulators(Iterable<Accum> accums) {
Accum merged = createAccumulator();
for (Accum accum : accums) {
merged.sum += accum.sum;
merged.count += accum.count;
}
return merged;
}
#Override
public Double extractOutput(Accum accum) {
return ((double) accum.sum) / accum.count;
}
}
In order to run it you would add the line:
PCollection<Double> average = trafficData.apply(Combine.globally(new AverageFn()));
Since you are currently using accumulating firing triggers, this would be the simplest coding way to solve the solution.
HOWEVER, if you want to use a discarding fire pane window, you would need to use a PCollectionView to store the previous average and pass it as a side input to the next one in order to keep track of the values. This is a little more complex in coding but would definitely improve performance since constant work is done every window, unlike in accumulating firing.
Does this make enough sense for you to generate your own function for discarding fire pane window?

Java: Why is calling a method for the first time slower?

Recently, I was writing a plugin using Java and found that retrieving an element(using get()) from a HashMap for the first time is very slow. Originally, I wanted to ask a question on that and found this (No answers though). With further experiments, however, I notice that this phenomenon happens on ArrayList and then all the methods.
Here is the code:
public class Test {
public static void main(String[] args) {
long startTime, stopTime;
// Method 1
System.out.println("Test 1:");
for (int i = 0; i < 20; ++i) {
startTime = System.nanoTime();
testMethod1();
stopTime = System.nanoTime();
System.out.println((stopTime - startTime) + "ns");
}
// Method 2
System.out.println("Test 2:");
for (int i = 0; i < 20; ++i) {
startTime = System.nanoTime();
testMethod2();
stopTime = System.nanoTime();
System.out.println((stopTime - startTime) + "ns");
}
}
public static void testMethod1() {
// Do nothing
}
public static void testMethod2() {
// Do nothing
}
}
Snippet: Test Snippet
The output would be like this:
Test 1:
2485ns
505ns
453ns
603ns
362ns
414ns
424ns
488ns
325ns
426ns
618ns
794ns
389ns
686ns
464ns
375ns
354ns
442ns
404ns
450ns
Test 2:
3248ns
700ns
538ns
531ns
351ns
444ns
321ns
424ns
523ns
488ns
487ns
491ns
551ns
497ns
480ns
465ns
477ns
453ns
727ns
504ns
I ran the code for a few times and the results are about the same. The first call would be even longer(>8000 ns) on my computer(Windows 8.1, Oracle Java 8u25).
Apparently, the first calls is usually slower than the following calls(Some calls may be longer in random cases).
Update:
I tried to learn some JMH, and write a test program
Code w/ sample output: Code
I don't know whether it's a proper benchmark(If the program has some problems, tell me), but I found that the first warm-up iterations spend more time(I use two warm-up iterations in case the warm-ups affect the results). And I think that the first warm-up should be the first call and is slower. So this phenomenon exists, if the test is proper.
So why does it happen?
You're calling System.nanoTime() inside a loop. Those calls are not free, so in addition to the time taken for an empty method you're actually measuring the time it takes to exit from nanotime call #1 and to enter nanotime call #2.
To make things worse, you're doing that on windows where nanotime performs worse compared to other platforms.
Regarding JMH: I don't think it's much help in this situation. It's designed to measure by averaging many iterations, to avoid dead code elimination, account for JIT warmup, avoid ordering dependence, ... and afaik it simply uses nanotime under the hood too.
Its design goals pretty much aim for the opposite of what you're trying to measure.
You are measuring something. But that something might be several cache misses, nanotime call overhead, some JVM internals (class loading? some kind of lazy initialization in the interpreter?), ... probably a combination thereof.
The point is that your measurement can't really be taken at face value. Even if there is a certain cost for calling a method for the first time, the time you're measuring only provides an upper bound for that.
This kind of behaviour is often caused by the compiler or RE. It starts to optimize the execution after the first iteration. Additionally class loading can have an effect (I guess this is not the case in your example code as all classes are loaded in the first loop latest).
See this thread for a similar problem.
Please keep in mind this kind of behaviour is often dependent on the environment/OS it's running on.

Time based reservoir sampling in java?

I've devised a way to do reservoir sampling in java, the code I used is here.
I've put in a huge file to be read now, and it takes about 40 seconds to read the lot before out putting the results to screen, and then reading the lot again. The file is too big to store in memory and just pick a random sample from that.
I was hoping I could write an extra while loop in there to get it to out put my reservoirList at a set period of time, and not just after it finished scanning the file.
Something like:
long startTime = System.nanoTime();
timeElapsed = 0;
while(sc.hasNext()) //avoid end of file
do{
long currentTime = System.nanoTime();
timeElapsed = (int) TimeUnit.MILLISECONDS.convert(startTime-currentTime,
TimeUnit.NANOSECONDS);
//sampling code goes here
}while(timeElapsed%5000!=0)
return reservoirList;
} return reservoirList;
But this outputs a bunch (not the full length of my ReservoirList) of lines and then a whole stream (a few hundred?) of the same line.
Is there a more elegant way to do this? One that, perhaps, works if possible.
I've cheated. For now I'm outputting every X lines read from file, where X is large enough to give me a nice time delay between each sample. I use the count from the sampling program to work out when this is.
do {
//sampling which includes a count++
}while(count%5000!=0)
One final note, I intialise counts to 1 to stop it outputting the first ten lines as a sample.
If anyone has a better, time based, solution, let me know.

Time tracking in Java: using currentTimeMills()

I got an interesting "time-travel" problem today, using the following code:
for (int i = 0; i < 1; i++){
long start = System.currentTimeMillis();
// Some code here
System.out.print(i + "\t" + (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
// Some code here
System.out.println("\t" + (System.currentTimeMillis() - start));
}
And I got the result
0 15 -606
And it seems that it is not repeatable. Anyone has any clues on what happened inside during the running time? Just curious...
New edit: I used a small test to confirmed the answers below. I run the program and change the system time during the run, and finally repeat the "time-travel":
0 -3563323 163
Case closed. Thanks guys!
More words: both currentTimeMillis() and nanoTime() are system-timer based, so they will be not monotonic if the system timer is updated (turned back, specifically). It is better to use some internet-based timer for such cases.
System.currentTimeMillis() depends on the system time. So it could be modified by third party systems.
For measuring time is System.nanoTime() the better option.
I recall something like time adjustments made to the systems time once in a while too match actual time. And since currentTimeMillis relies on the system clock that might have happened. Also are you synchronizing with a time server, that could also be.

Categories