What is the best way to push 20 million entities into a Java Map object?
Without multi-threading it takes ~40 seconds.
Using ForkJoinPool it takes ~25 seconds, where I have created 2 tasks and each of these tasks pushes 10 million entities.
I believe that these two tasks are running on 2 different cores.
Question: when I create 1 task that pushes 10 million entries, it takes ~9 seconds. Then why does running 2 tasks, each pushing 10 million entries, take ~26 seconds? Am I doing something wrong?
Is there a different way to insert 20 million entries that takes less than 10 seconds?
Without seeing your code, the most probable cause of these poor performance results is garbage collection activity. To demonstrate it, I wrote the following program:
import java.lang.management.ManagementFactory;
import java.util.*;
import java.util.concurrent.*;
public class TestMap {
// we assume NB_ENTITIES is divisible by NB_TASKS
static final int NB_ENTITIES = 20_000_000, NB_TASKS = 2;
static Map<String, String> map = new ConcurrentHashMap<>();
public static void main(String[] args) {
try {
System.out.printf("running with nb entities = %,d, nb tasks = %,d, VM args = %s%n", NB_ENTITIES, NB_TASKS, ManagementFactory.getRuntimeMXBean().getInputArguments());
ExecutorService executor = Executors.newFixedThreadPool(NB_TASKS);
int entitiesPerTask = NB_ENTITIES / NB_TASKS;
List<Future<?>> futures = new ArrayList<>(NB_TASKS);
long startTime = System.nanoTime();
for (int i=0; i<NB_TASKS; i++) {
MyTask task = new MyTask(i * entitiesPerTask, (i + 1) * entitiesPerTask - 1);
futures.add(executor.submit(task));
}
for (Future<?> f: futures) {
f.get();
}
long elapsed = System.nanoTime() - startTime;
executor.shutdownNow();
System.gc();
Runtime rt = Runtime.getRuntime();
long usedMemory = rt.maxMemory() - rt.freeMemory();
System.out.printf("processing completed in %,d ms, usedMemory after GC = %,d bytes%n", elapsed/1_000_000L, usedMemory);
} catch (Exception e) {
e.printStackTrace();
}
}
static class MyTask implements Runnable {
private final int startIdx, endIdx;
public MyTask(final int startIdx, final int endIdx) {
this.startIdx = startIdx;
this.endIdx = endIdx;
}
@Override
public void run() {
long startTime = System.nanoTime();
for (int i=startIdx; i<=endIdx; i++) {
map.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
}
long elapsed = System.nanoTime() - startTime;
System.out.printf("task[%,d - %,d], completed in %,d ms%n", startIdx, endIdx, elapsed/1_000_000L);
}
}
}
At the end of the processing, this code computes an approximation of the used memory by calling System.gc() immediately followed by Runtime.maxMemory() - Runtime.freeMemory(). This shows that the map with 20 million entries takes approximately 2.2 GB (about 2.38 billion bytes), which is considerable. I ran it with 1 and 2 threads, for various values of the -Xmx and -Xms JVM arguments; here are the resulting outputs (just to be clear: 2560m = 2.5g):
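The used-memory figure above treats maxMemory() as the committed heap, which is reasonable here because -Xms equals -Xmx in every run. A slightly more general estimate, sketched below, subtracts freeMemory() from totalMemory() (the heap the JVM has committed so far), so it also works when the heap has not grown to its maximum. The class name HeapUsage is hypothetical, and the result is still only approximate because System.gc() is merely a request.

```java
// Hypothetical helper (not part of the answer's program): estimates heap in use.
final class HeapUsage {
    private HeapUsage() {}

    /** Approximate bytes of heap in use: committed heap minus its free part. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the currently committed heap; it only matches
        // maxMemory() once the heap is fully grown (e.g. when -Xms equals -Xmx).
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // only a request, so the figure below is approximate
        System.out.printf("used heap ~ %,d bytes%n", usedHeapBytes());
    }
}
```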
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 19,999,999], completed in 11,781 ms
processing completed in 11,782 ms, usedMemory after GC = 2,379,068,760 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 9,999,999], completed in 8,269 ms
task[10,000,000 - 19,999,999], completed in 12,385 ms
processing completed in 12,386 ms, usedMemory after GC = 2,379,069,480 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms3g, -Xmx3g]
task[0 - 19,999,999], completed in 12,525 ms
processing completed in 12,527 ms, usedMemory after GC = 2,398,339,944 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms3g, -Xmx3g]
task[0 - 9,999,999], completed in 12,220 ms
task[10,000,000 - 19,999,999], completed in 12,264 ms
processing completed in 12,265 ms, usedMemory after GC = 2,382,777,776 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms4g, -Xmx4g]
task[0 - 19,999,999], completed in 7,363 ms
processing completed in 7,364 ms, usedMemory after GC = 2,402,467,040 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms4g, -Xmx4g]
task[0 - 9,999,999], completed in 5,466 ms
task[10,000,000 - 19,999,999], completed in 5,511 ms
processing completed in 5,512 ms, usedMemory after GC = 2,381,821,576 bytes
running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms8g, -Xmx8g]
task[0 - 19,999,999], completed in 7,778 ms
processing completed in 7,779 ms, usedMemory after GC = 2,438,159,312 bytes
running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms8g, -Xmx8g]
task[0 - 9,999,999], completed in 5,739 ms
task[10,000,000 - 19,999,999], completed in 5,784 ms
processing completed in 5,785 ms, usedMemory after GC = 2,396,478,680 bytes
These results can be summarized in the following table:
--------------------------------
heap | exec time (ms) for:
size (gb) | 1 thread | 2 threads
--------------------------------
2.5 | 11782 | 12386
3.0 | 12527 | 12265
4.0 | 7364 | 5512
8.0 | 7779 | 5785
--------------------------------
I also observed that, for the 2.5g and 3g heap sizes, there was high CPU activity, with spikes at 100% throughout the processing time, due to GC activity, whereas for 4g and 8g it was only observed at the end, due to the System.gc() call.
To conclude:
If your heap is sized inappropriately, garbage collection will kill any performance gain you would hope to obtain. You should make it large enough to avoid the side effects of long GC pauses.
You must also be aware that using a concurrent collection such as ConcurrentHashMap has a significant performance overhead. To illustrate this, I slightly modified the code so that each task uses its own HashMap, and at the end all the maps are aggregated (with Map.putAll()) into the map of the first task. The processing time fell to around 3200 ms.
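A minimal sketch of the per-task variant described in the last point, assuming the same key and value strings as the program above. The class and method names (PerTaskMaps, build) are hypothetical, and this is not necessarily the exact modified code behind the 3200 ms figure. Each task fills a private, pre-sized HashMap, so no synchronization is needed while inserting, and the putAll() merge runs on a single thread at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one private HashMap per task, merged into the first task's map at the end.
final class PerTaskMaps {
    private PerTaskMaps() {}

    static Map<String, String> build(int nbEntities, int nbTasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(nbTasks);
        try {
            int perTask = nbEntities / nbTasks; // assumes nbEntities % nbTasks == 0
            List<Future<Map<String, String>>> futures = new ArrayList<>(nbTasks);
            for (int t = 0; t < nbTasks; t++) {
                final int start = t * perTask, end = start + perTask;
                futures.add(executor.submit(() -> {
                    // Pre-size so the local map never rehashes (default load factor 0.75).
                    Map<String, String> local = new HashMap<>((int) (perTask / 0.75f) + 1);
                    for (int i = start; i < end; i++) {
                        local.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
                    }
                    return local;
                }));
            }
            Map<String, String> result = futures.get(0).get();
            for (int t = 1; t < nbTasks; t++) {
                result.putAll(futures.get(t).get()); // single-threaded merge
            }
            return result;
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling PerTaskMaps.build(20_000_000, 2) mirrors the earlier two-task run; it still needs a generously sized heap for the GC reasons above.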
An addition probably takes one CPU cycle, so if your CPU runs at 3 GHz, that's about 0.3 nanoseconds. Do it 20M times and that becomes about 6,000,000 nanoseconds, or 6 milliseconds. So your measurement is more affected by the overhead of starting threads, thread switching, JIT compilation, etc. than by the operation you are trying to measure.
Garbage collection may also play a role as it may slow you down.
I suggest you use a specialized library for microbenchmarking, such as JMH.
Thanks to assylias's post, which helped me write my response.
While I have not tried multiple threads, I did try all 7 appropriate Map types of the 10 provided by Java 11.
My results were all substantially faster than your reported 25 to 40 seconds. My results for 20,000,000 entries of <String, UUID> are more like 3-9 seconds for any of the 7 map classes.
I am using Java 13 on:
Model Name: Mac mini
Model Identifier: Macmini8,1
Processor Name: Intel Core i5
Processor Speed: 3 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 9 MB
Memory: 32 GB
Preparing.
size of instants: 20000000
size of uuids: 20000000
Running test.
java.util.HashMap took: PT3.645250368S
java.util.WeakHashMap took: PT3.199812894S
java.util.TreeMap took: PT8.97788412S
java.util.concurrent.ConcurrentSkipListMap took: PT7.347253106S
java.util.concurrent.ConcurrentHashMap took: PT4.494560252S
java.util.LinkedHashMap took: PT2.78054883S
java.util.IdentityHashMap took: PT5.608737472S
My code:
// Imports and class wrapper added so the snippet compiles; the class name is arbitrary.
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.TimeUnit;
public class MapTiming
{
public static void main ( String[] args )
{
System.out.println( "Preparing." );
int limit = 20_000_000; // 20_000_000
Set < String > instantsSet = new TreeSet <>(); // Use `Set` to forbid duplicates.
List < UUID > uuids = new ArrayList <>( limit );
while ( instantsSet.size() < limit )
{
instantsSet.add( Instant.now().toString() );
}
List < String > instants = new ArrayList <>( instantsSet );
for ( int i = 0 ; i < limit ; i++ )
{
uuids.add( UUID.randomUUID() );
}
System.out.println( "size of instants: " + instants.size() );
System.out.println( "size of uuids: " + uuids.size() );
System.out.println( "Running test." );
// Using 7 of the 10 `Map` implementations bundled with Java 11.
// Omitting `EnumMap`, as it requires enums for the key.
// Omitting `Map.of` because it is for literals.
// Omitting `Hashtable` because it is outmoded, replaced by `ConcurrentHashMap`.
List < Map < String, UUID > > maps = List.of(
new HashMap <>( limit ) ,
new WeakHashMap <>( limit ) ,
new TreeMap <>() ,
new ConcurrentSkipListMap <>() ,
new ConcurrentHashMap <>( limit ) ,
new LinkedHashMap <>( limit ) ,
new IdentityHashMap <>( limit )
);
for ( Map < String, UUID > map : maps )
{
long start = System.nanoTime();
for ( int i = 0 ; i < instants.size() ; i++ )
{
map.put( instants.get( i ) , uuids.get( i ) );
}
long stop = System.nanoTime();
Duration d = Duration.of( stop - start , ChronoUnit.NANOS );
System.out.println( map.getClass().getName() + " took: " + d );
// Free up memory. Setting the loop variable to null would release nothing,
// because the `maps` list still references every map; clearing drops the entries.
map.clear();
System.gc(); // Request garbage collector do its thing. No guarantee!
try
{
Thread.sleep( TimeUnit.SECONDS.toMillis( 4 ) ); // Wait for garbage collector to hopefully finish. No guarantee!
}
catch ( InterruptedException e )
{
e.printStackTrace();
}
}
System.out.println("Done running test.");
}
}
And here is a table I wrote comparing the various Map implementations.
I had an interview yesterday. I couldn't figure out a solution to one programming problem and I'd like to get some ideas here. The problem is:
I need to implement a TimeWindowBuffer in Java, which stores the numbers a user continuously receives over time. The buffer has a maxBufferSize. The user wants to know the average value over the past several seconds, a timeWindow passed in by the user (so this is a sliding window). We can get the current time from the system (e.g. System.currentTimeMillis() in Java). The TimeWindowBuffer class looks like this:
public class TimeWindowBuffer {
private int maxBufferSize;
private int timeWindow;
public TimeWindowBuffer(int maxBufferSize, int timeWindow) {
this.maxBufferSize = maxBufferSize;
this.timeWindow = timeWindow;
}
public void addValue(long value) {
...
}
public double getAvg() {
...
return average;
}
// other auxiliary methods
}
Example:
Say a user receives a number every second (in general the user may not receive numbers at a fixed rate) and wants to know the average value of the past 5 seconds.
Input:
maxBufferSize = 5, timeWindow = 5 (s)
numbers={-5 4 -8 -8 -8 1 6 1 8 5}
Output (I list the formula here for illustration, but the user only needs the result):
-5 / 1 (t=1)
(-5 + 4) / 2 (t=2)
(-5 + 4 - 8) / 3 (t=3)
(-5 + 4 - 8 - 8) / 4 (t=4)
(-5 + 4 - 8 - 8 - 8) / 5 (t=5)
(4 - 8 - 8 - 8 + 1) / 5 (t=6)
(-8 - 8 - 8 + 1 + 6) / 5 (t=7)
(-8 - 8 + 1 + 6 + 1) / 5 (t=8)
(-8 + 1 + 6 + 1 + 8) / 5 (t=9)
(1 + 6 + 1 + 8 + 5) / 5 (t=10)
Since the data structure of the TimeWindowBuffer is not specified, I've been thinking about keeping a pair of value and its added time. So my declaration of underlying buffer is like this:
private ArrayList<Pair> buffer = new ArrayList<Pair>(maxBufferSize);
where
class Pair {
private long value;
private long time;
...
}
Since the Pair is added in time order, I could do a binary search on the list and calculate the average of the numbers that fall into the timeWindow. The problem is the buffer has a maxBufferSize (although ArrayList doesn't) and I have to remove the oldest value when the buffer is full. And that value could still satisfy the timeWindow but now it goes off the record and I will never know when it expires.
I'm stuck here for now.
I don't need a direct answer, but I'd like some discussion or ideas here. Please let me know if anything about the problem or my description is confusing.
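The binary-search lookup mentioned in the question can be sketched as a lower-bound search over the time-ordered entries. This assumes times are non-decreasing integers (e.g. seconds) and that no entry is newer than now; the names are hypothetical, parallel lists stand in for the question's Pair class, and the sketch does not address the maxBufferSize eviction problem the question raises.

```java
import java.util.List;

// Sketch: entries are appended in time order, so the first entry still inside
// the window can be found with a lower-bound binary search.
final class WindowSearch {
    private WindowSearch() {}

    /** Index of the first time >= cutoff in a non-decreasing list, or size() if none. */
    static int firstAtOrAfter(List<Long> times, long cutoff) {
        int lo = 0, hi = times.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (times.get(mid) < cutoff) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Average of values whose time lies in (now - window, now]; NaN if none. */
    static double average(List<Long> times, List<Long> values, long now, long window) {
        int from = firstAtOrAfter(times, now - window + 1); // integer times: > now - window
        long sum = 0;
        for (int i = from; i < times.size(); i++) sum += values.get(i);
        int n = times.size() - from;
        return n == 0 ? Double.NaN : (double) sum / n;
    }
}
```

With the example numbers stored at t = 1..10, average(times, values, 10, 5) evaluates (1 + 6 + 1 + 8 + 5) / 5 = 4.2, matching the t=10 line.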
I enjoy little puzzles like this. I did not compile this code, nor did I take into account everything you would need for production usage. For example, I did not design a way to set a missed value to 0, i.e. if a value does not come in at every tick.
But this will give you another way to think of it....
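import java.util.Timer;
import java.util.TimerTask;
public class TickTimer
{
private volatile int tick = 0; // written by the timer thread, read by callers
private final Timer timer = new Timer(true); // daemon thread, so it does not keep the JVM alive
public TickTimer(double timeWindow)
{
timer.scheduleAtFixedRate(new TickerTask(),
0, // initial delay
Math.round(1000/timeWindow)); // interval in ms
}
private class TickerTask extends TimerTask
{
public void run ()
{
tick++; // only the single timer thread increments, so volatile is enough
}
}
public int getTicks()
{
return tick;
}
}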
public class TimeWindowBuffer
{
private final int[] buffer;
private final TickTimer timer;
private final Object bufferSync = new Object();
public TimeWindowBuffer(int maxBufferSize, double timeWindow)
{
buffer = new int[maxBufferSize];
timer = new TickTimer(timeWindow);
}
public void add(int value)
{
synchronized (bufferSync)
{
// each tick owns one slot; old slots are overwritten as the tick count wraps around
buffer[timer.getTicks() % buffer.length] = value;
}
}
public int averageValue()
{
int sum = 0;
synchronized (bufferSync)
{
for (int i: buffer)
{
sum += i;
}
}
return sum/buffer.length;
}
}
Your question could be summarized as using constant memory to compute some statistics on a stream.
To me it's a heap (priority queue) with time as the key and value as the value, and least time on the top.
When you receive a new (time,value), add it to the heap. If the heap size is greater than the buffer size, just remove the root node in the heap, until the heap is small enough.
Also by using a heap you can get the minimum time in the buffer (i.e. the heap) in O(1) time, so just remove the root (the node with the minimum time) until all out-dated pairs are cleared.
For statistics, keep an integer sum. When you add a new pair to the heap, sum = sum + value of pair. When you remove the root from the heap, sum = sum - value of root.
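A sketch of the heap-plus-running-sum idea, using java.util.PriorityQueue as the min-heap. The class name is hypothetical, and the current time is passed in as a parameter (instead of calling System.currentTimeMillis()) so the window logic can be exercised deterministically; a value counts as inside the window when its time is greater than now - timeWindow.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: min-heap of {time, value} pairs ordered by time, plus a running sum.
final class HeapWindowAverage {
    private final int maxBufferSize;
    private final long timeWindow; // same unit as the times passed in
    // The root of the heap is always the oldest pair.
    private final PriorityQueue<long[]> heap =
            new PriorityQueue<>(Comparator.comparingLong((long[] p) -> p[0]));
    private long sum = 0;

    HeapWindowAverage(int maxBufferSize, long timeWindow) {
        this.maxBufferSize = maxBufferSize;
        this.timeWindow = timeWindow;
    }

    void addValue(long time, long value) {
        heap.add(new long[] {time, value});
        sum += value;
        while (heap.size() > maxBufferSize) {
            sum -= heap.poll()[1]; // buffer full: drop the oldest pair
        }
    }

    double getAvg(long now) {
        // Drop pairs that are no longer inside (now - timeWindow, now].
        while (!heap.isEmpty() && heap.peek()[0] <= now - timeWindow) {
            sum -= heap.poll()[1];
        }
        return heap.isEmpty() ? Double.NaN : (double) sum / heap.size();
    }
}
```

Feeding the question's numbers at t = 1..10 with maxBufferSize = 5 and timeWindow = 5 yields 4.2 for getAvg(10). Because the question's pairs arrive in time order, an ArrayDeque used as a FIFO would give the same results with O(1) removals; the heap additionally tolerates pairs that arrive out of order.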
I am running this code and getting unexpected results. I expect that the loop which adds the primitives would perform much faster, but the results do not agree.
import java.util.*;
public class Main {
public static void main(String[] args) {
StringBuilder output = new StringBuilder();
long start = System.currentTimeMillis();
long limit = 1000000000; //10^9
long value = 0;
for(long i = 0; i < limit; ++i){}
long i;
output.append("Base time\n");
output.append(System.currentTimeMillis() - start + "ms\n");
start = System.currentTimeMillis();
for(long j = 0; j < limit; ++j) {
value = value + j;
}
output.append("Using longs\n");
output.append(System.currentTimeMillis() - start + "ms\n");
start = System.currentTimeMillis();
value = 0;
for(long k = 0; k < limit; ++k) {
value = value + (new Long(k));
}
output.append("Using Longs\n");
output.append(System.currentTimeMillis() - start + "ms\n");
System.out.print(output);
}
}
Output:
Base time
359ms
Using longs
1842ms
Using Longs
614ms
I have tried running each individual test in its own Java program, but the results are the same. What could cause this?
Small detail: I'm running Java 1.6.
Edit:
I asked 2 other people to try out this code. One gets exactly the same strange results that I get. The other gets results that actually make sense! I asked the person who got normal results to give us his class binary. We ran it and we STILL got the strange results, so the problem is not at compile time (I think). I'm running 1.6.0_31, the person who gets normal results is on 1.6.0_16, and the person who gets strange results like I do is on 1.7.0_04.
Edit: I get the same results with a Thread.sleep(5000) at the start of the program. I also get the same results with a while loop around the whole program (to see if the times would converge to normal times after Java was fully started up).
I suspect that this is a JVM warmup effect. Specifically, the code is being JIT compiled at some point, and this is distorting the times that you are seeing.
Put the whole lot in a loop, and ignore the times reported until they stabilize. (But note that they won't entirely stabilize. Garbage is being generated, and therefore the GC will need to kick in occasionally. This is liable to distort the timings, at least a bit. The best way to deal with this is to run a huge number of iterations of the outer loop, and calculate / display the average times.)
Another problem is that the JIT compiler on some releases of Java may be able to optimize away the stuff you are trying to test:
It could figure out that the creation and immediate unboxing of the Long objects could be optimized away. (Thanks Louis!)
It could figure out that the loops are doing "busy work" ... and optimize them away entirely. (The value of value is not used once each loop ends.)
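The warm-up advice above can be sketched as a small harness that discards the first iterations, averages the remaining ones with System.nanoTime(), and prints the computed sum so the loop result is actually used, which makes it harder for the JIT to remove the loop as dead code. The class name and iteration counts are arbitrary assumptions, and a dedicated harness such as JMH handles these pitfalls far more thoroughly.

```java
// Sketch: warm up, then average many measured runs, and consume the result.
final class WarmupBench {
    private WarmupBench() {}

    static long sumPrimitive(long limit) {
        long value = 0;
        for (long j = 0; j < limit; ++j) value += j;
        return value;
    }

    public static void main(String[] args) {
        final long limit = 10_000_000L;
        final int warmup = 10, measured = 20; // arbitrary counts; tune as needed
        long sink = 0; // accumulates results so they are not discarded
        for (int i = 0; i < warmup; i++) sink += sumPrimitive(limit); // timings ignored
        long totalNanos = 0;
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            sink += sumPrimitive(limit);
            totalNanos += System.nanoTime() - start;
        }
        System.out.printf("avg %.3f ms per run (sink=%d)%n", totalNanos / 1e6 / measured, sink);
    }
}
```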
FWIW, it is generally recommended that you use Long.valueOf(long) rather than new Long(long) because the former can make use of a cached Long instance. However, in this case, we can predict that there will be a cache miss in all but the first few loop iterations, so the recommendation is not going to help. If anything, it is likely to make the loop in question slower.
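To see the cache in action, the sketch below compares the references returned by Long.valueOf. OpenJDK's implementation caches the values -128 to 127, so a loop counting up from 0 would only hit the cache for its first 128 iterations; treat the exact range as an implementation detail rather than something to rely on. The class name is hypothetical.

```java
// Sketch: reference comparison shows whether Long.valueOf reused a cached instance.
final class LongCacheDemo {
    private LongCacheDemo() {}

    static boolean sameInstance(long v) {
        return Long.valueOf(v) == Long.valueOf(v); // deliberate reference comparison
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));   // inside OpenJDK's cached range
        System.out.println(sameInstance(1_000)); // outside it: a new Long each call on OpenJDK
    }
}
```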
UPDATE
I did some investigation of my own, and ended up with the following:
import java.util.*;
public class Main {
public static void main(String[] args) {
while (true) {
test();
}
}
private static void test() {
long start = System.currentTimeMillis();
long limit = 10000000; //10^7
long value = 0;
for(long i = 0; i < limit; ++i){}
long t1 = System.currentTimeMillis() - start;
start = System.currentTimeMillis();
for(long j = 0; j < limit; ++j) {
value = value + j;
}
long t2 = System.currentTimeMillis() - start;
start = System.currentTimeMillis();
for(long k = 0; k < limit; ++k) {
value = value + (new Long(k));
}
long t3 = System.currentTimeMillis() - start;
System.out.print(t1 + " " + t2 + " " + t3 + " " + value + "\n");
}
}
which gave me the following output.
28 58 2220 99999990000000
40 58 2182 99999990000000
36 49 157 99999990000000
34 51 157 99999990000000
37 49 158 99999990000000
33 52 158 99999990000000
33 50 159 99999990000000
33 54 159 99999990000000
35 52 159 99999990000000
33 52 159 99999990000000
31 50 157 99999990000000
34 51 156 99999990000000
33 50 159 99999990000000
Note that the first two columns are pretty stable, but the third one shows a significant speedup on the 3rd iteration ... probably indicating that JIT compilation has occurred.
Interestingly, before I separated out the test into a separate method, I didn't see the speedup on the 3rd iteration. The numbers all looked like the first two rows. And that seems to be saying that the JVM (that I'm using) won't JIT compile a method that is currently executing ... or something like that.
Anyway, this demonstrates (to me) that there should be a warm up effect. If you don't see a warmup effect, your benchmark is doing something that is inhibiting JIT compilation ... and therefore isn't meaningful for real applications.
I'm surprised, too.
My first guess would have been inadvertent autoboxing, but that's clearly not an issue in your example code.
This link might give a clue:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Long.html
valueOf
public static Long valueOf(long l)
Returns a Long instance representing the specified long value. If a new Long instance is not required, this method should generally be used in preference to the constructor Long(long), as this method is likely to yield significantly better space and time performance by caching frequently requested values.
Parameters:
l - a long value.
Returns:
a Long instance representing l.
Since:
1.5
But yes, I would expect using a wrapper (e.g. "Long") to take MORE time, and MORE space. I would not expect using the wrapper to be three times FASTER!
================================================================================
ADDENDUM:
I got these results with your code:
Base time 6878ms
Using longs 10515ms
Using Longs 428022ms
I'm running JDK 1.6.0_16 on a pokey 32-bit, single-core CPU.
OK, here's a slightly different version, along with my results (running JDK 1.6.0_16 on a pokey 32-bit, single-core CPU):
import java.util.*;
/*
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 343 896 3431 6025
1 342 957 3401 5796
2 342 881 3379 5742
*/
public class LongTest {
private static int limit = 100000000;
private static int ntimes = 3;
private static final long[] base = new long[ntimes];
private static final long[] primitives = new long[ntimes];
private static final long[] wrappers1 = new long[ntimes];
private static final long[] wrappers2 = new long[ntimes];
private static void test_base (int idx) {
long start = System.currentTimeMillis();
for (int i = 0; i < limit; ++i){}
base[idx] = System.currentTimeMillis() - start;
}
private static void test_primitive (int idx) {
long value = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < limit; ++i){
value = value + i;
}
primitives[idx] = System.currentTimeMillis() - start;
}
private static void test_wrappers1 (int idx) {
long value = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < limit; ++i){
value = value + new Long(i);
}
wrappers1[idx] = System.currentTimeMillis() - start;
}
private static void test_wrappers2 (int idx) {
long value = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < limit; ++i){
value = value + Long.valueOf(i);
}
wrappers2[idx] = System.currentTimeMillis() - start;
}
public static void main(String[] args) {
for (int i=0; i < ntimes; i++) {
test_base (i);
test_primitive(i);
test_wrappers1 (i);
test_wrappers2 (i);
}
System.out.println ("Test Base longs Longs/new Longs/valueOf");
System.out.println ("---- ---- ----- --------- -------------");
for (int i=0; i < ntimes; i++) {
System.out.printf (" %2d %6d %6d %6d %6d\n",
i, base[i], primitives[i], wrappers1[i], wrappers2[i]);
}
}
}
=======================================================================
5.28.2012:
Here are some additional timings, from a faster (but still modest), dual-core CPU running Windows 7/64 and running the same JDK revision 1.6.0_16:
/*
PC 1: limit = 100,000,000, ntimes = 3, JDK 1.6.0_16 (32-bit):
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 343 896 3431 6025
1 342 957 3401 5796
2 342 881 3379 5742
PC 2: limit = 1,000,000,000, ntimes = 5,JDK 1.6.0_16 (64-bit):
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 3 2 5627 5573
1 0 0 5494 5537
2 0 0 5475 5530
3 0 0 5477 5505
4 0 0 5487 5508
PC 2: "for loop" counters => long; limit = 10,000,000,000, ntimes = 5:
Test Base longs Longs/new Longs/valueOf
---- ---- ----- --------- -------------
0 6278 6302 53713 54064
1 6273 6286 53547 53999
2 6273 6294 53606 53986
3 6274 6325 53593 53938
4 6274 6279 53566 53974
*/
You'll notice:
I'm not using StringBuilder, and I defer all of the I/O until the end of the program.
"long" primitive is consistently equivalent to a "no-op"
"Long" wrappers are consistently much, much slower
"new Long()" is slightly faster than "Long.valueOf()"
Changing the loop counters from "int" to "long" makes the first two columns ("base" and "longs") much slower.
"JIT warmup" is negligible after the first few iterations...
... provided I/O (like System.out) and potentially memory-intensive activities (like StringBuilder) are moved outside of the actual test sections.
I have a question about Java serialization.
I'm simply writing out 10 int arrays of 2^28 elements each (about 1 GB per array) to my hard disk (I know that's kind of big, but I need it that way), using a FileOutputStream and a BufferedOutputStream in combination with a DataOutputStream. Before each serialization I create a new FileOutputStream and all the other streams, and afterwards I flush and close my streams.
Problem:
The first serialization takes about 2 seconds; afterwards the time increases up to 17 seconds and stays at this level. What's the problem here? If I step into the code, I can see that the FileOutputStreams take a huge amount of time for writeByte(...). Is this because the HDD cache is full? How can I avoid this? Can I clear it?
Here is my simple code:
public static void main(String[] args) throws IOException {
System.out.println("### Starting test");
for (int k = 0; k < 10; k++) {
System.out.println("### Run nr ... " + k);
// Creating the test array....
int[] testArray = new int[(int) Math.pow(2, 28)];
for (int i = 0; i < testArray.length; i++) {
if (i % 2 == 0) {
testArray[i] = i;
}
}
BufferedDataOutputStream dataOut = new BufferedDataOutputStream(
new FileOutputStream("e:\\test" + k + "_" + 28 + ".dat"));
// Serializing...
long start = System.nanoTime();
dataOut.write(testArray);
System.out.println((System.nanoTime() - start) / 1000000000.0
+ " s");
dataOut.flush();
dataOut.close();
}
}
where dataOut.write(int[], 0, end) is:
public void write(int[] i, int start, int len) throws IOException {
for (int ii = start; ii < start + len; ii += 1) {
if (count + 4 > buf.length) {
checkBuf(4);
}
buf[count++] = (byte) (i[ii] >>> 24);
buf[count++] = (byte) (i[ii] >>> 16);
buf[count++] = (byte) (i[ii] >>> 8);
buf[count++] = (byte) (i[ii]);
}
}
and checkBuf is:
protected void checkBuf(int need) throws IOException {
if (count + need > buf.length) {
out.write(buf, 0, count);
count = 0;
}
}
BufferedDataOutputStream, which extends BufferedOutputStream, comes with the FITS framework. It simply combines BufferedOutputStream with DataOutputStream to reduce the number of method calls when you write big arrays (which makes it a lot faster, up to 10 times).
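For comparison, the same idea (avoid one method call per byte by converting the int[] to bytes in large chunks) can be sketched with only the standard library, using a ByteBuffer's int view and a FileChannel. The class name and the 4 MiB chunk size are assumptions. This produces the same big-endian layout as DataOutputStream.writeInt, but it cannot make the disk itself sustain a higher write rate, which is what the answers below identify as the real limit.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: bulk-convert an int[] to big-endian bytes chunk by chunk and write via a FileChannel.
final class IntArrayWriter {
    private IntArrayWriter() {}

    static void write(Path file, int[] data) throws IOException {
        final int chunkInts = 1 << 20; // 1 Mi ints = 4 MiB per chunk (arbitrary choice)
        ByteBuffer buf = ByteBuffer.allocateDirect(chunkInts * 4).order(ByteOrder.BIG_ENDIAN);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int off = 0; off < data.length; off += chunkInts) {
                int n = Math.min(chunkInts, data.length - off);
                buf.clear();
                buf.asIntBuffer().put(data, off, n); // bulk copy into the byte buffer
                buf.limit(n * 4);                    // expose exactly the bytes just filled
                while (buf.hasRemaining()) ch.write(buf);
            }
        }
    }
}
```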
Here is the output:
Starting benchmark
STARTING RUN 0
2.001972271
STARTING RUN 1
1.986544604
STARTING RUN 2
15.663881232
STARTING RUN 3
17.652161328
STARTING RUN 4
18.020969301
STARTING RUN 5
11.647542466
STARTING RUN 6
Why does the time increase so much?
Thank you,
Eeth
In this program I populate 1 GB of int values and "force" them to be written to disk.
String dir = args[0];
for (int i = 0; i < 24; i++) {
long start = System.nanoTime();
File tmp = new File(dir, "deleteme." + i);
tmp.deleteOnExit();
RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
final MappedByteBuffer map = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1 << 30);
IntBuffer array = map.order(ByteOrder.nativeOrder()).asIntBuffer();
for (int n = 0; n < array.capacity(); n++)
array.put(n, n);
map.force();
((DirectBuffer) map).cleaner().clean();
raf.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.1f seconds to write 1 GB%n", time / 1e9);
}
With each file forced to disk, each write takes about the same amount of time.
Took 7.7 seconds to write 1 GB
Took 7.5 seconds to write 1 GB
Took 7.7 seconds to write 1 GB
Took 7.9 seconds to write 1 GB
Took 7.6 seconds to write 1 GB
Took 7.7 seconds to write 1 GB
However, if I comment out the map.force(); I see this profile.
Took 0.8 seconds to write 1 GB
Took 1.0 seconds to write 1 GB
Took 4.9 seconds to write 1 GB
Took 7.2 seconds to write 1 GB
Took 7.0 seconds to write 1 GB
Took 7.2 seconds to write 1 GB
Took 7.2 seconds to write 1 GB
It appears that it will buffer about 2.5 GB, which is about 10% of my main memory, before it slows down.
You can clear your cache by waiting for the previous writes to finish.
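One way to do that waiting explicitly is to sync each file before timing the next one, sketched below with FileDescriptor.sync() on the underlying FileOutputStream. The class and method names are hypothetical. Because the sync is inside the timed section, each run includes the time for the data to actually reach the device, so the reported times should stay roughly constant instead of starting fast and then jumping.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch: write ints big-endian, then block until the OS has pushed them to the device.
final class SyncedWrite {
    private SyncedWrite() {}

    /** Writes the ints, syncs the file to disk, and returns the elapsed seconds. */
    static double writeAndSync(File file, int[] data) throws IOException {
        long start = System.nanoTime();
        try (FileOutputStream fos = new FileOutputStream(file);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(fos, 1 << 16))) {
            for (int v : data) out.writeInt(v);
            out.flush();        // hand all buffered bytes to the OS
            fos.getFD().sync(); // wait until the OS has written them to the device
        }
        return (System.nanoTime() - start) / 1e9;
    }
}
```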
Basically you have 1 GB of data, and the sustained write speed of your disk appears to be about 60 MB/s, which is reasonable for a SATA hard drive. At that rate 1 GB takes roughly 17 seconds, which matches the steady-state times in your output. If you get speeds higher than this, it's because the data hasn't really been written to disk yet and is actually still in memory.
If you want this to be faster, you can use a memory-mapped file. This has the benefit of writing to disk in the background as you are populating the "array", i.e. it can finish writing almost as soon as you finish setting the values.
Another option is to get a faster drive. A single 250 GB SSD drive can sustain writes of around 200 MB/s. Using multiple drives in a RAID configuration can also increase write speed.
The first writes may just be filling up your hard drive's cache without actually writing to disk yet.