MultiThread runs slower than single process

For a school assignment I was asked to write a simple program that creates 1000 text files, each with a random number of lines, counts the total number of lines either with multiple threads or in a single thread, and then deletes the files.
Now a strange thing happens during testing: counting all files linearly is always a bit faster than counting them with multiple threads, which has sparked quite the academic theorizing session within my classroom circle.
When using Scanner to read the files, everything works as intended: the 1000 files are read in around 500 ms linearly and 400 ms threaded.
Yet when I use BufferedReader, the times drop to around 110 ms linear and 130 ms threaded.
Which part of the code causes this bottleneck, and why?
EDIT: Just to clarify, I'm not asking why Scanner is slower than BufferedReader.
The full compilable code (though you should change the file creation output path):
import java.io.*;
import java.util.Random;
import java.util.Scanner;
/**
* Builds text files with random amount of lines and counts them with
* one process or multi-threading.
* @author Hazir
*/
// CLASS MATALA_4A START:
public class Matala_4A {
/* Finals: */
private static final String MSG = "Hello World";
/* Privates: */
private static int count;
private static Random rand;
/* Private Methods: */
/**
* Draws the next value from the random generator.
* @return The new random value.
*/
private static synchronized int getRand() {
return rand.nextInt(1000);
}
/**
* Increments the lines-read counter by a value.
* @param val The amount to be incremented by.
*/
private static synchronized void incrementCount(int val) {
count+=val;
}
/**
* Sets lines-read counter to 0 and Initializes random generator
* by the seed - 123.
*/
private static void Initialize() {
count=0;
rand = new Random(123);
}
/* Public Methods: */
/**
* Creates n files with random amount of lines.
* @param n The amount of files to be created.
* @return String array with all the file paths.
*/
public static String[] createFiles(int n) {
String[] array = new String[n];
for (int i=0; i<n; i++) {
array[i] = String.format("C:\\Files\\File_%d.txt", i+1);
try ( // Try with Resources:
FileWriter fw = new FileWriter(array[i]);
PrintWriter pw = new PrintWriter(fw);
) {
int numLines = getRand();
for (int j=0; j<numLines; j++) pw.println(MSG);
} catch (IOException ex) {
System.err.println(String.format("Failed Writing to file: %s",
array[i]));
}
}
return array;
}
/**
* Deletes all the files whose file paths are specified
* in the fileNames array.
* @param fileNames The files to be deleted.
*/
public static void deleteFiles(String[] fileNames) {
for (String fileName : fileNames) {
File file = new File(fileName);
if (file.exists()) {
file.delete();
}
}
}
/**
* Creates numFiles amount of files.<br>
* Counts how many lines are in all the files via Multi-threading.<br>
* Deletes all the files when finished.
* @param numFiles The amount of files to be created.
*/
public static void countLinesThread(int numFiles) {
Initialize();
/* Create Files */
String[] fileNames = createFiles(numFiles);
Thread[] running = new Thread[numFiles];
int k=0;
long start = System.currentTimeMillis();
/* Start all threads */
for (String fileName : fileNames) {
LineCounter thread = new LineCounter(fileName);
running[k++] = thread;
thread.start();
}
/* Join all threads */
for (Thread thread : running) {
try {
thread.join();
} catch (InterruptedException e) {
// Shouldn't happen.
}
}
long end = System.currentTimeMillis();
System.out.println(String.format("threads time = %d ms, lines = %d",
end-start,count));
/* Delete all files */
deleteFiles(fileNames);
}
/**
* Creates numFiles amount of files.<br>
* Counts how many lines are in all the files in one process.<br>
* Deletes all the files when finished.
* @param numFiles The amount of files to be created.
*/
@SuppressWarnings("CallToThreadRun")
public static void countLinesOneProcess(int numFiles) {
Initialize();
/* Create Files */
String[] fileNames = createFiles(numFiles);
/* Iterate Files*/
long start = System.currentTimeMillis();
LineCounter thread;
for (String fileName : fileNames) {
thread = new LineCounter(fileName);
thread.run(); // same process
}
long end = System.currentTimeMillis();
System.out.println(String.format("linear time = %d ms, lines = %d",
end-start,count));
/* Delete all files */
deleteFiles(fileNames);
}
public static void main(String[] args) {
int num = 1000;
countLinesThread(num);
countLinesOneProcess(num);
}
/**
* Auxiliary class designed to count the amount of lines in a text file.
*/
// NESTED CLASS LINECOUNTER START:
private static class LineCounter extends Thread {
/* Privates: */
private String fileName;
/* Constructor: */
private LineCounter(String fileName) {
this.fileName=fileName;
}
/* Methods: */
/**
* Reads a file and counts the amount of lines it has.
*/
@Override
public void run() {
int count=0;
try ( // Try with Resources:
FileReader fr = new FileReader(fileName);
//Scanner sc = new Scanner(fr);
BufferedReader br = new BufferedReader(fr);
) {
String str;
for (str=br.readLine(); str!=null; str=br.readLine()) count++;
//for (; sc.hasNext(); sc.nextLine()) count++;
incrementCount(count);
} catch (IOException e) {
System.err.println(String.format("Failed Reading from file: %s",
fileName));
}
}
} // NESTED CLASS LINECOUNTER END;
} // CLASS MATALA_4A END;

The bottleneck is the disk.
You can effectively access the disk with only one thread at a time, so using multiple threads doesn't help; instead, the overhead of thread switching slows down overall performance.
Multithreading pays off only if you need to split your work while waiting for long I/O operations on different sources (for example network and disk, two different disks, or many network streams), or if you have a CPU-intensive operation that can be split between different cores.
Remember that for a good multithreaded program you always need to take into consideration:
the context-switch time between threads
whether long I/O operations can be done in parallel or not
whether CPU-intensive computation is present or not
whether the CPU computations can be split into subproblems or not
the complexity of sharing data between threads (semaphores or synchronization)
the difficulty of reading, writing and maintaining multithreaded code compared to a single-threaded application

There can be different factors:
Most important is avoiding disk access from multiple threads at the same time (but since you are on an SSD, you might get away with that). On a normal hard disk, however, switching from one file to another could cost you 10 ms of seek time (depending on how the data is cached).
1000 threads is too many; try number of cores * 2. Otherwise too much time is lost just switching contexts.
Try using a thread pool. Total times are between 110 ms and 130 ms, and part of that comes from creating the threads.
Do more work in the test in general. A 110 ms measurement isn't very accurate, and it also depends on what other processes or threads are running at the time.
Try switching the order of your tests to see if it makes a difference (caching could be an important factor):
countLinesThread(num);
countLinesOneProcess(num);
Also, depending on the system, currentTimeMillis() might have a resolution of 10 to 15 ms, so it isn't very accurate for timing short runs.
long start = System.currentTimeMillis();
long end = System.currentTimeMillis();
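A minimal sketch of a less noisy measurement, assuming that repeating the run and keeping the fastest result is acceptable for the comparison. It uses System.nanoTime() instead of currentTimeMillis(); the class name TimingSketch, its methods and the dummy workload are hypothetical and not part of the original program.

```java
import java.util.concurrent.TimeUnit;

public class TimingSketch {

    /** Runs the task once and returns the elapsed time in nanoseconds. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Runs the task several times and returns the fastest run, to reduce noise. */
    public static long bestOfNanos(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, timeNanos(task));
        }
        return best;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for the line counting (assumption).
        Runnable work = () -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            if (sum == 42) {
                System.out.println("unlikely"); // keeps the result in use
            }
        };
        long best = bestOfNanos(work, 10);
        System.out.println("best of 10 runs: " + TimeUnit.NANOSECONDS.toMicros(best) + " us");
    }
}
```

The first few runs also act as a warm-up for the JIT compiler, which is one more reason not to trust a single short measurement.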

The number of threads used is very important. A single process trying to switch between 1000 threads (you have created a new thread per file) is probably the main reason for it being slower.
Try using, let's say, 10 threads to read the 1000 files; you should then see a noticeable speed increase.
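The suggestion of a small, fixed number of threads could be sketched with an ExecutorService as below. This is only an illustration under assumptions: PooledLineCounter, countAll and createSampleFiles are hypothetical names, the files are written to a temporary directory rather than C:\Files, and whether it actually beats the linear version still depends on the disk and the file cache.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledLineCounter {

    /** Counts the lines of a single file. */
    public static long countLines(Path file) {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts the lines of all files using a fixed number of worker threads. */
    public static long countAll(List<Path> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path file : files) {
                Callable<Long> task = () -> countLines(file);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Creates n sample files in a temporary directory (test data, an assumption). */
    public static List<Path> createSampleFiles(int n, int linesPerFile) {
        try {
            Path dir = Files.createTempDirectory("lines");
            List<Path> files = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                Path f = dir.resolve("File_" + (i + 1) + ".txt");
                Files.write(f, Collections.nCopies(linesPerFile, "Hello World"));
                files.add(f);
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = createSampleFiles(1000, 50);
        System.out.println("lines = " + countAll(files, 10));
    }
}
```

Each task returns its own count through a Future, so no shared, synchronized counter is needed.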

If the actual time needed for the computation is negligible compared to the time needed for I/O, the potential multithreading benefits are negligible as well: one thread is well able to saturate the I/O and will then do a very quick computation; more threads cannot accelerate things much. Instead, the usual threading overheads apply, plus possibly a locking penalty in the I/O implementation that actually decreases throughput.
I think the potential benefits are greatest when the CPU time needed to deal with a data chunk is long compared to the time needed to obtain it from disk. In that case all threads except the one currently reading (if any) can compute, and execution speed should scale nicely with the number of cores. Try checking large prime number candidates from a file, or cracking encrypted lines (which, kind of, amounts to the same thing, silly enough).
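As a rough illustration of a CPU-bound workload that can be split across cores, the sketch below counts primes in a range with a parallel stream. It uses simple trial division instead of reading candidates from a file, and the names PrimeCountSketch, isPrime and countPrimes are made up for this example.

```java
import java.util.stream.LongStream;

public class PrimeCountSketch {

    /** Deterministic trial division; deliberately CPU-heavy for large n. */
    public static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Counts the primes in [from, to), spreading the checks over the available cores. */
    public static long countPrimes(long from, long to) {
        return LongStream.range(from, to)
                .parallel()
                .filter(PrimeCountSketch::isPrime)
                .count();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long primes = countPrimes(1, 2_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(primes + " primes found in " + ms + " ms");
    }
}
```

Removing the .parallel() call gives the single-threaded version for comparison; here almost all of the time is computation, so the parallel run should scale with the number of cores much better than the file-reading test does.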

Related

OpenCL kernel slower than normal Java loop

I've been looking into OpenCL for use with optimizing code and running tasks in parallel to achieve greater speed over pure Java. Now I'm having a bit of an issue.
I've put together a Java program using LWJGL which, as far as I can tell, should be able to do nearly identical tasks -- in this case adding elements from two arrays together and storing the result in another array -- in two separate ways: one with pure Java, and the other with an OpenCL kernel. I'm using System.currentTimeMillis() to keep track of how long each one takes for arrays with a large number of elements (~10,000,000). For whatever reason, the pure Java loop seems to execute around 3 to 10 times faster than the CL program, depending on array size. My code is as follows (imports omitted):
public class TestCL {
private static final int SIZE = 9999999; //Size of arrays to test, this value is changed sometimes in between tests
private static CLContext context; //CL Context
private static CLPlatform platform; //CL platform
private static List<CLDevice> devices; //List of CL devices
private static CLCommandQueue queue; //Command Queue for context
private static float[] aData, bData, rData; //float arrays to store test data
//---Kernel Code---
//The actual kernel script is here:
//-----------------
private static String kernel = "kernel void sum(global const float* a, global const float* b, global float* result, int const size){\n" +
"const int itemId = get_global_id(0);\n" +
"if(itemId < size){\n" +
"result[itemId] = a[itemId] + b[itemId];\n" +
"}\n" +
"}";
public static void main(String[] args){
aData = new float[SIZE];
bData = new float[SIZE];
rData = new float[SIZE]; //Only used for CPU testing
//arbitrary testing data
for(int i=0; i<SIZE; i++){
aData[i] = i;
bData[i] = SIZE - i;
}
try {
testCPU(); //How long does it take running in traditional Java code on the CPU?
testGPU(); //How long does the GPU take to run it w/ CL?
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* Test the CPU with pure Java code
*/
private static void testCPU(){
long time = System.currentTimeMillis();
for(int i=0; i<SIZE; i++){
rData[i] = aData[i] + bData[i];
}
//Print the time FROM THE START OF THE testCPU() FUNCTION UNTIL NOW
System.out.println("CPU processing time for " + SIZE + " elements: " + (System.currentTimeMillis() - time));
}
/**
* Test the GPU with OpenCL
* @throws LWJGLException
*/
private static void testGPU() throws LWJGLException {
CLInit(); //Initialize CL and CL Objects
//Create the CL Program
CLProgram program = CL10.clCreateProgramWithSource(context, kernel, null);
int error = CL10.clBuildProgram(program, devices.get(0), "", null);
Util.checkCLError(error);
//Create the Kernel
CLKernel sum = CL10.clCreateKernel(program, "sum", null);
//Error checker
IntBuffer eBuf = BufferUtils.createIntBuffer(1);
//Floatbuffer for the first array of floats
FloatBuffer aBuf = BufferUtils.createFloatBuffer(SIZE);
aBuf.put(aData);
aBuf.rewind();
CLMem aMem = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_ONLY | CL10.CL_MEM_COPY_HOST_PTR, aBuf, eBuf); //Access flags are from the kernel's point of view: the kernel only reads this input
Util.checkCLError(eBuf.get(0));
//And the second
FloatBuffer bBuf = BufferUtils.createFloatBuffer(SIZE);
bBuf.put(bData);
bBuf.rewind();
CLMem bMem = CL10.clCreateBuffer(context, CL10.CL_MEM_READ_ONLY | CL10.CL_MEM_COPY_HOST_PTR, bBuf, eBuf); //Access flags are from the kernel's point of view: the kernel only reads this input
Util.checkCLError(eBuf.get(0));
//Memory object to store the result
CLMem rMem = CL10.clCreateBuffer(context, CL10.CL_MEM_WRITE_ONLY, SIZE * 4, eBuf); //Access flags are from the kernel's point of view: the kernel only writes the result
Util.checkCLError(eBuf.get(0));
//Get time before setting kernel arguments
long time = System.currentTimeMillis();
sum.setArg(0, aMem);
sum.setArg(1, bMem);
sum.setArg(2, rMem);
sum.setArg(3, SIZE);
final int dim = 1;
PointerBuffer workSize = BufferUtils.createPointerBuffer(dim);
workSize.put(0, SIZE);
//Actually running the program
CL10.clEnqueueNDRangeKernel(queue, sum, dim, null, workSize, null, null, null);
CL10.clFinish(queue);
//Write results to a FloatBuffer
FloatBuffer res = BufferUtils.createFloatBuffer(SIZE);
CL10.clEnqueueReadBuffer(queue, rMem, CL10.CL_TRUE, 0, res, null, null);
//How long did it take?
//Print the time FROM THE SETTING OF KERNEL ARGUMENTS UNTIL NOW
System.out.println("GPU processing time for " + SIZE + " elements: " + (System.currentTimeMillis() - time));
//Cleanup objects
CL10.clReleaseKernel(sum);
CL10.clReleaseProgram(program);
CL10.clReleaseMemObject(aMem);
CL10.clReleaseMemObject(bMem);
CL10.clReleaseMemObject(rMem);
CLCleanup();
}
/**
* Initialize CL objects
* @throws LWJGLException
*/
private static void CLInit() throws LWJGLException {
IntBuffer eBuf = BufferUtils.createIntBuffer(1);
CL.create();
platform = CLPlatform.getPlatforms().get(0);
devices = platform.getDevices(CL10.CL_DEVICE_TYPE_GPU);
context = CLContext.create(platform, devices, eBuf);
queue = CL10.clCreateCommandQueue(context, devices.get(0), CL10.CL_QUEUE_PROFILING_ENABLE, eBuf);
Util.checkCLError(eBuf.get(0));
}
/**
* Cleanup after CL completion
*/
private static void CLCleanup(){
CL10.clReleaseCommandQueue(queue);
CL10.clReleaseContext(context);
CL.destroy();
}
}
Here are a few example console results from various tests:
CPU processing time for 10000000 elements: 24
GPU processing time for 10000000 elements: 88
CPU processing time for 1000000 elements: 7
GPU processing time for 1000000 elements: 10
CPU processing time for 100000000 elements: 193
GPU processing time for 100000000 elements: 943
Is there something wrong with my code that's causing the CL version to run slower, or is that actually to be expected in cases such as this? If it is the latter, then when is CL preferable?

I revised the test to do something which I believe is more computationally expensive than simple addition.
Regarding the CPU test, the line:
rData[i] = aData[i] + bData[i];
was changed to:
rData[i] = (float)(Math.sin(aData[i]) * Math.cos(bData[i]));
And in the CL kernel, the line:
result[itemId] = a[itemId] + b[itemId];
was changed to:
result[itemId] = sin(a[itemId]) * cos(b[itemId]);
I'm now getting console results such as:
CPU processing time for 1000000 elements: 154
GPU processing time for 1000000 elements: 11
CPU processing time for 10000000 elements: 8699
GPU processing time for 10000000 elements: 98
(The CPU is taking longer than I'd like to bother with for tests of 100000000 elements.)
For checking accuracy, I added checks that compare an arbitrary element of rData and res to ensure they're the same. I omitted the result here, as it should suffice to say that they were equal.
Now that the function is more complicated(two trigonometric functions being multiplied together), it appears that the CL kernel is much more efficient than the pure Java loop.

Java multi-threading programme not using a lot of CPU

I am a beginner in programming and Java, and this is my first multi-core program. The problem is that my program never uses more than 13% of my CPU. I do not know whether I am doing it the right way or not.
How do I compute faster and use more CPU resources?
My program consists of three classes:
The "Main" class, which instantiates the Work objects with a number of threads
A "T1" class that extends Thread and contains the work to be performed
A "Work" class that launches the desired number of threads and displays the time taken by all threads to perform the work
Here is the code of my Main class:
public static void main(String[] args) {
System.out.println("Number of CPUs available = " + Runtime.getRuntime().availableProcessors()); //Display the number of CPUs available
int iteration = 100000000; // Define a number of itterations to do by all threads
/*
Instantiates each work with a different number of threads (1, 4, 8, 12, and 24)
*/
Work t1 = new Work(1);
Work t4 = new Work(4);
Work t8 = new Work(8);
Work t12 = new Work(12);
Work t24 = new Work(24);
/*
Launch the work for each thread with the specified number of iterations
*/
t1.goWork(iteration);
t4.goWork(iteration);
t8.goWork(iteration);
t12.goWork(iteration);
t24.goWork(iteration);
}
And here the Work class code:
public class Work {
static long time; // A variable that each thread increases by the time it takes to complete its task.
static int itterationPerThread; // A variable that stores the number of iterations each thread has to do.
static int finish; // A variable that each thread increments when it finishes its task; used to wait until all threads have completed.
private int numberOfThreads; // The number of threads to launch.
/**
*
* The constructor, set the number Of threads to run
* @param numberOfThreads
*/
public Work(int numberOfThreads)
{
this.numberOfThreads = numberOfThreads; //Set the number of threads
}
/**
*
* Launches the number of threads specified in the constructor and distributes the iterations among them.
* The method does nothing until every thread has completed its task, then prints the time needed for all threads to complete.
* @param itterationPerThread the total number of iterations, which is divided among the threads
*/
public void goWork(int itterationPerThread)
{
finish = 0; //Reset the variable in the case that we call the method more than one time
time = 0; //Reset the variable in the case that we call the method more than one time
this.itterationPerThread = itterationPerThread/numberOfThreads; // Divide the given number of iterations by the number of threads specified in the constructor
for (int i=0; i<numberOfThreads; i++) //Launch the specified number of threads
{
new T1().run();
}
while (finish != numberOfThreads) //Do nothing until all thread as completed their task
{
}
System.out.println("Time for " + numberOfThreads + " thread = " + time + " ms"); //Display the total time
}
}
And finally my T1 class:
public class T1 extends Thread{
@Override
public void run()
{
long before = System.currentTimeMillis();
for (int i=0; i<Work.itterationPerThread; i++) //Get the thread busy with a number of itterations
{
Math.cos(2.1545); //Do something...
}
long after = System.currentTimeMillis(); //Compute the elapsed time
Work.time += after - before; //Increase the static variable in Work.java by the time elapsed for this thread
Work.finish++; // Increase the static variable in Work.java when the thread has finished its job
}
}
The program gives me the following output on my machine (four physical cores, eight with hyper-threading):
Number of CPUs available = 8
Time for 1 thread = 11150 ms
Time for 4 thread = 4630 ms
Time for 8 thread = 2530 ms
Time for 12 thread = 2530 ms
Time for 24 thread = 2540 ms
According to my CPU this result seems correct, but my CPU usage never exceeds 13%.
I found the following Stack Overflow post, but I did not really find an answer to my question.

Instead of calling Thread.run(), which merely executes what your thread is supposed to do, you should call Thread.start(), which creates a new thread and calls run() on that new thread.
Right now you are running run() on your main thread, without creating a new thread. Since you see 13% CPU load, I expect you have 8 cores: one fully loaded core out of eight is 100% / 8 = 12.5%.
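The difference between the two calls can be made visible by recording which thread executes the body. The following is a small sketch; RunVersusStart and executingThread are hypothetical names used only for this illustration.

```java
public class RunVersusStart {

    /** Returns the name of the thread that actually executed the task body. */
    public static String executingThread(boolean useStart) {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread().getName(), "worker");
        if (useStart) {
            t.start();     // the body runs on the new thread named "worker"
            try {
                t.join();  // wait for it; join() also makes the write to seen[0] visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            t.run();       // an ordinary method call: the body runs on the calling thread
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("run():   executed on " + executingThread(false));
        System.out.println("start(): executed on " + executingThread(true));
    }
}
```

With run() the reported name is that of the caller (typically "main"), which is why the original program never used more than one core.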
Even better would be to create a custom implementation of the interface Runnable, instead of extending Thread. You can then run it on a thread as follows:
Thread t = new Thread(new MyRunnableTask());
t.start();
This is the common way because it gives you the flexibility (later on) to use more advanced mechanisms, such as ExecutorService.
EDIT:
As also noted in some of the comments, you are changing the same variables (the static ones in Work) from several threads. You should never do this without synchronization, because it allows for race conditions. For instance, incrementing a variable can cause one, as explained here.
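The lost-update problem with a shared counter can be sketched as follows. The class SharedCounterSketch and its fields are hypothetical; AtomicLong is one possible fix among several (synchronized blocks or per-thread results combined after join() would work as well).

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedCounterSketch {

    static long plainCounter;                                 // unsynchronized: increments can be lost
    static final AtomicLong atomicCounter = new AtomicLong(); // thread-safe alternative

    /** Starts the given number of threads; each increments both counters n times. */
    public static void runIncrements(int threads, int n) {
        plainCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < n; i++) {
                    plainCounter++;                   // read-modify-write, not atomic
                    atomicCounter.incrementAndGet();  // atomic increment
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        runIncrements(8, 1_000_000);
        System.out.println("expected: " + (8 * 1_000_000L));
        System.out.println("plain:    " + plainCounter + " (may be lower)");
        System.out.println("atomic:   " + atomicCounter.get());
    }
}
```

On a multi-core machine the plain counter usually ends up below the expected value, while the atomic one always matches it.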

Thank you all for answering my question.
Yes, the JVM does not calculate Math.cos(2.1545) on each iteration, so, as suggested, I tried Math.cos(i) in the original program, and there is a big difference!
As for the multithreading, as suggested, I created a custom implementation of the Runnable interface instead of extending Thread, and I now call start() instead of run().
I now use the join() method to wait until the threads finish, and I removed the static variables.
Now the program uses the full CPU with the correct number of threads.
Just for information, here is my new code for the Work class:
public class Work {
private Thread[] threadArray; //An array to store a specified number of new threads in the constructor
/**
*
* The constructor, set to the number Of threads to run
* @param numberOfThreads
*/
public Work(int numberOfThreads)
{
threadArray = new Thread[numberOfThreads];
}
/**
*
* Launches the number of threads specified in the constructor and distributes the iterations among them.
* The method waits until every thread has completed its task, then prints the time needed for all threads to complete.
* @param itterationForAllThread the total number of iterations to be done by all threads
*/
public void goWork(int itterationForAllThread)
{
long time = 0; // A variable used to compute the elapsed time
int itterationPerThread; // A variable that store the number of itterations Per Thread to do
itterationPerThread = itterationForAllThread/threadArray.length; //Divide the given number of itteration by the number of tread specified in the constructor
long before = System.currentTimeMillis(); //Take the start time before launching the threads, so that their whole run is measured
for(int i=0; i<threadArray.length; i++) //Launch the specified number of threads
{
threadArray[i] = new Thread(new T1(itterationPerThread)); //Create a new thread
threadArray[i].start(); //Start the job
}
for (Thread thread : threadArray) //For each thread wait until it finish
{
try {
thread.join(); //Wait for the thread as finish
}
catch (InterruptedException ex)
{
ex.printStackTrace();
}
}
long after = System.currentTimeMillis();
time = after - before; //Compute the time elapsed
System.out.println("Time for " + threadArray.length + " Thread = " + time + " ms"); //Display the total time for the number of threads
}
}
And here the T1 class:
public class T1 implements Runnable{
private int iterrattionPerThread;
T1(int iterrattionPerThread)
{
this.iterrattionPerThread=iterrattionPerThread;
}
@Override
public void run()
{
for(int i=0; i<iterrattionPerThread; i++) //Get the thread busy with a number of iterations
{
Math.cos(i); //Do something that the JVM cannot cache and that needs to be recalculated every iteration
}
}
}
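One caveat with loops like this: the JIT compiler is allowed to drop a computation whose result is never used, so the measured time may not reflect the intended work. A common way to guard against that is to consume the result, as in this sketch (ConsumeResultSketch and sumOfCosines are hypothetical names):

```java
public class ConsumeResultSketch {

    /** Accumulates the results so that the calls cannot simply be discarded as unused. */
    public static double sumOfCosines(int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += Math.cos(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double result = sumOfCosines(100_000_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        // Printing the result keeps it observable from outside the loop.
        System.out.println("sum = " + result + ", time = " + ms + " ms");
    }
}
```

In a Runnable, the per-thread sum could be stored in a field and read after join().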

What could cause a java process to get gradually decreasing share of CPU?

I have a very simple Java program that prints out 1 million random numbers. On Linux, I observed the %CPU that this program takes during its lifespan: it starts off at 98% and then gradually decreases to 2%, making the program very slow. What are some of the factors that might cause the program to gradually get less CPU time?
I've tried running it with nice -20 but I still see the same results.
EDIT: running the program with /usr/bin/time -v I'm seeing an unusual amount of involuntary context switches (588 voluntary vs 16478 involuntary), which suggests that the OS is letting some other higher priority process run.

It boils down to two things:
I/O is expensive, and
Depending on how you're storing the numbers as you go along, that can have an adverse effect on performance as well.
If you're mainly doing System.out.println(randInt) in a loop a million times, then that can get expensive. I/O isn't one of those things that comes for free, and writing to any output stream costs resources.
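If the cost is dominated by a million small writes, one option is to send the output through a single large buffer and flush it once at the end. The sketch below assumes exactly that; BufferedPrintSketch, printRandoms and the 64 KB buffer size are illustrative choices, not something taken from the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class BufferedPrintSketch {

    /** Writes count random integers, one per line, through one buffered writer. */
    public static void printRandoms(OutputStream out, int count, long seed) {
        Random rand = new Random(seed);
        try {
            BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(out, StandardCharsets.UTF_8), 1 << 16);
            for (int i = 0; i < count; i++) {
                writer.write(Integer.toString(rand.nextInt()));
                writer.write('\n');
            }
            writer.flush(); // flush once at the end; the caller's stream is left open
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        printRandoms(System.out, 1_000_000, 123L);
    }
}
```

Whether this helps depends on where the output goes: if the consumer of the output (a terminal, a pipe, a slow disk) is the limiting factor, the program will still spend most of its time waiting.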

I would start by profiling via JConsole or VisualVM to see what it's actually doing when it has low CPU %. As mentioned in comments there's a high chance it's blocking, e.g. waiting for IO (user input, SQL query taking a long time, etc.)

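If your application is I/O bound (for example, waiting for responses from network calls, or for disk reads and writes), it spends most of its time blocked rather than computing, so a low CPU share is expected.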

If you want to try and balance everything, you should create a queue to hold numbers to print, then have one thread generate them (the producer) and the other read and print them (the consumer). This can easily be done with a LinkedBlockingQueue.
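import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class PrintQueueExample {
// static, because the queue is shared by the static main method and the static nested PrinterThread
private static final BlockingQueue<Integer> printQueue = new LinkedBlockingQueue<Integer>();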
public static void main(String[] args) throws InterruptedException {
PrinterThread thread = new PrinterThread();
thread.start();
for (int i = 0; i < 1000000; i++) {
int toPrint = ...(i) ;
printQueue.put(Integer.valueOf(toPrint));
}
thread.interrupt();
thread.join();
System.out.println("Complete");
}
private static class PrinterThread extends Thread {
@Override
public void run() {
try {
while (true) {
Integer toPrint = printQueue.take();
System.out.println(toPrint);
}
} catch (InterruptedException e) {
// Interruption comes from main, means processing numbers has stopped
// Finish remaining numbers and stop thread
List<Integer> remainingNumbers = new ArrayList<Integer>();
printQueue.drainTo(remainingNumbers);
for (Integer toPrint : remainingNumbers)
System.out.println(toPrint);
}
}
}
}
There may be a few problems with this code, but this is the gist of it.
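A variant of the same producer/consumer idea, sketched with a sentinel object ("poison pill") instead of an interrupt to signal the end of the data, and with a bounded queue so the producer cannot run arbitrarily far ahead. This is a swapped-in technique, not the code above; PoisonPillSketch, DONE and the capacity of 1024 are assumptions, and the consumer collects the numbers in a list as a stand-in for printing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillSketch {

    private static final Object DONE = new Object(); // sentinel marking the end of the data

    /** Produces 0..count-1 on the calling thread while a consumer thread collects them. */
    public static List<Integer> produceAndConsume(int count) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>(1024); // bounded
        List<Integer> consumed = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Object item = queue.take();
                    if (item == DONE) {
                        break;                      // producer is finished
                    }
                    consumed.add((Integer) item);   // stand-in for System.out.println
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < count; i++) {
                queue.put(i);                       // blocks while the queue is full
            }
            queue.put(DONE);
            consumer.join();                        // makes the consumer's writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<Integer> numbers = produceAndConsume(1_000_000);
        System.out.println("consumed " + numbers.size() + " numbers");
    }
}
```

Because the sentinel travels through the queue after all the data, the consumer never has to drain leftovers in an exception handler.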

Attempting to create a stable game engine loop

I'm writing a fairly simple 2D multiplayer-over-network game. Right now, I find it nearly impossible to create a stable loop. By stable I mean a loop inside which certain calculations are done and which repeats at strict intervals (let's say every 25 ms; that's what I'm fighting for right now). I haven't faced many severe hindrances so far except for this one.
In this game, several threads are running, in both the server and the client applications, assigned to various tasks. Take, for example, the engine thread in my server application. In this thread, I try to create the game loop using Thread.sleep, taking into account the time taken by the game calculations. Here's my loop, placed within the run() method. The Tick() function is the payload of the loop; it simply contains ordered calls to other methods that do the constant game updating.
long engFPS = 40;
long frameDur = 1000 / engFPS;
long lastFrameTime;
long nextFrame;
<...>
while(true)
{
lastFrameTime = System.currentTimeMillis();
nextFrame = lastFrameTime + frameDur;
Tick();
if(nextFrame - System.currentTimeMillis() > 0)
{
try
{
Thread.sleep(nextFrame - System.currentTimeMillis());
}
catch(Exception e)
{
System.err.println("TSEngine :: run :: " + e);
}
}
}
The major problem is that Thread.sleep just loves to betray your expectations about how long it will sleep. It can easily put the thread to rest for a much longer or much shorter time, especially on some machines with Windows XP (I've tested it myself; WinXP gives really nasty results compared to Win7 and other OSes). I've poked around the internet quite a lot, and the result was disappointing. It seems to be the fault of the thread scheduler of the OS we're running on, and its so-called granularity. As far as I understand, at regular intervals this scheduler checks the demands of every thread in the system and, in particular, puts them to sleep or wakes them up. When the re-checking interval is short, like 1 ms, things may seem smooth. However, WinXP is said to have a granularity as high as 10 or 15 ms. I've also read that not only Java programmers, but those using other languages, face this problem as well.
Knowing this, it seems almost impossible to make a stable, sturdy, reliable game engine. Nevertheless, they're everywhere.
I really wonder by which means this problem can be fought or circumvented. Could someone more experienced give me a hint on this?

Don't rely on the OS or any timer mechanism to wake your thread or invoke some callback at a precise point in time or after a precise delay. It's just not going to happen.
The way to deal with this is instead of setting a sleep/callback/poll interval and then assuming that the interval is kept with a high degree of precision, keep track of the amount of time that has elapsed since the previous iteration and use that to determine what the current state should be. Pass this amount through to anything that updates state based upon the current "frame" (really you should design your engine in a way that the internal components don't know or care about anything as concrete as a frame; so that instead there is just state that moves fluidly through time, and when a new frame needs to be sent for rendering a snapshot of this state is used).
So for example, you might do:
long maxWorkingTimePerFrame = 1000 / FRAMES_PER_SECOND; //this is optional
long lastStartTime = System.currentTimeMillis();
while(true)
{
long now = System.currentTimeMillis(); //read the clock once so no time is lost between the two statements
long elapsedTime = now - lastStartTime;
lastStartTime = now;
Tick(elapsedTime);
//enforcing a maximum framerate here is optional...you don't need to sleep the thread
long processingTimeForCurrentFrame = System.currentTimeMillis() - lastStartTime;
if(processingTimeForCurrentFrame < maxWorkingTimePerFrame)
{
try
{
Thread.sleep(maxWorkingTimePerFrame - processingTimeForCurrentFrame);
}
catch(Exception e)
{
System.err.println("TSEngine :: run :: " + e);
}
}
}
Also note that you can get finer timer resolution by using System.nanoTime() in place of System.currentTimeMillis().
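To illustrate why passing the elapsed time into the update makes the result independent of how precisely the loop is timed, here is a small sketch using System.nanoTime(). The State class, its constant-velocity movement and the name DeltaTimeSketch are invented for the example; a real Tick(elapsedTime) would update the whole game state in the same spirit.

```java
public class DeltaTimeSketch {

    /** Hypothetical game state: a position moving at a constant velocity (units per second). */
    static final class State {
        double position;
        final double velocity;

        State(double velocity) {
            this.velocity = velocity;
        }

        /** Advances the state by the time that actually elapsed, whatever it was. */
        void tick(long elapsedNanos) {
            position += velocity * (elapsedNanos / 1_000_000_000.0);
        }
    }

    /** Feeds a sequence of frame durations to a fresh state and returns the final position. */
    public static double simulate(long[] frameNanos, double velocity) {
        State state = new State(velocity);
        for (long dt : frameNanos) {
            state.tick(dt);
        }
        return state.position;
    }

    public static void main(String[] args) throws InterruptedException {
        State state = new State(10.0);
        long last = System.nanoTime();
        for (int frame = 0; frame < 20; frame++) {
            Thread.sleep(25);            // imprecise, and that is fine
            long now = System.nanoTime();
            state.tick(now - last);      // use the measured time, not the requested 25 ms
            last = now;
        }
        System.out.println("position after roughly 0.5 s: " + state.position);
    }
}
```

Two uneven frames of 10 ms and 40 ms move the object exactly as far as two even frames of 25 ms, so jitter in the sleep only affects how often the state is sampled, not where it ends up.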

You may get better results with
LockSupport.parkNanos(long nanos)
although it complicates the code a bit compared to sleep().
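A sketch of how parkNanos could be used to wait for an absolute deadline. Since parkNanos may return early, the remaining time is re-checked in a loop; ParkUntilSketch and sleepUntil are hypothetical names, and the achievable precision still depends on the OS scheduler.

```java
import java.util.concurrent.locks.LockSupport;

public class ParkUntilSketch {

    /** Blocks until System.nanoTime() has reached the deadline, or the thread is interrupted. */
    public static void sleepUntil(long deadlineNanos) {
        long remaining;
        while ((remaining = deadlineNanos - System.nanoTime()) > 0) {
            LockSupport.parkNanos(remaining);        // may return early, hence the loop
            if (Thread.interrupted()) {
                Thread.currentThread().interrupt();  // restore the flag and give up waiting
                return;
            }
        }
    }

    public static void main(String[] args) {
        long frameNanos = 25_000_000L;               // 25 ms per frame
        long next = System.nanoTime() + frameNanos;
        for (int frame = 0; frame < 5; frame++) {
            sleepUntil(next);
            long lateBy = System.nanoTime() - next;
            System.out.println("frame " + frame + " woke up " + lateBy / 1000 + " us late");
            next += frameNanos;                      // absolute deadlines do not accumulate drift
        }
    }
}
```

Scheduling against absolute deadlines means a late wake-up in one frame shortens the wait for the next one instead of shifting every later frame.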

Maybe this helps you.
It's from David Brackeen's book Developing Games in Java,
and it averages the most recent elapsed-time samples to smooth out timer granularity and fake a more fluent frame rate:
public class TimeSmoothie {
/**
How often to recalc the frame rate
*/
protected static final long FRAME_RATE_RECALC_PERIOD = 500;
/**
Don't allow the elapsed time between frames to be more than 100 ms
*/
protected static final long MAX_ELAPSED_TIME = 100;
/**
Take the average of the last few samples during the last 100ms
*/
protected static final long AVERAGE_PERIOD = 100;
protected static final int NUM_SAMPLES_BITS = 6; // 64 samples
protected static final int NUM_SAMPLES = 1 << NUM_SAMPLES_BITS;
protected static final int NUM_SAMPLES_MASK = NUM_SAMPLES - 1;
protected long[] samples;
protected int numSamples = 0;
protected int firstIndex = 0;
// for calculating frame rate
protected int numFrames = 0;
protected long startTime;
protected float frameRate;
public TimeSmoothie() {
samples = new long[NUM_SAMPLES];
}
/**
Adds the specified time sample and returns the average
of all the recorded time samples.
*/
public long getTime(long elapsedTime) {
addSample(elapsedTime);
return getAverage();
}
/**
Adds a time sample.
*/
public void addSample(long elapsedTime) {
numFrames++;
// cap the time
elapsedTime = Math.min(elapsedTime, MAX_ELAPSED_TIME);
// add the sample to the list
samples[(firstIndex + numSamples) & NUM_SAMPLES_MASK] =
elapsedTime;
if (numSamples == samples.length) {
firstIndex = (firstIndex + 1) & NUM_SAMPLES_MASK;
}
else {
numSamples++;
}
}
/**
Gets the average of the recorded time samples.
*/
public long getAverage() {
long sum = 0;
for (int i=numSamples-1; i>=0; i--) {
sum+=samples[(firstIndex + i) & NUM_SAMPLES_MASK];
// if the average period is already reached, go ahead and return
// the average.
if (sum >= AVERAGE_PERIOD) {
return Math.round((double)sum / (numSamples-i));
}
}
return Math.round((double)sum / numSamples);
}
/**
Gets the frame rate (number of calls to getTime() or
addSample() in real time). The frame rate is recalculated
every 500ms.
*/
public float getFrameRate() {
long currTime = System.currentTimeMillis();
// calculate the frame rate every 500 milliseconds
if (currTime > startTime + FRAME_RATE_RECALC_PERIOD) {
frameRate = (float)numFrames * 1000 /
(currTime - startTime);
startTime = currTime;
numFrames = 0;
}
return frameRate;
}
}

Read large files in Java

I need advice from someone who knows Java and its memory issues very well.
I have a large file (something like 1.5GB) and I need to split it into many smaller files (100, for example).
I know generally how to do it (using a BufferedReader), but I would like to know if you have any advice regarding memory, or tips on how to do it faster.
My file contains text, it is not binary, and it has about 20 characters per line.

To save memory, do not unnecessarily store or duplicate the data in memory (i.e. do not assign it to variables outside the loop). Process the output immediately, as soon as the input comes in.
It doesn't really matter whether you use BufferedReader or not. It will not cost significantly more memory, as some implicitly seem to suggest, and at most it affects performance by a few percent. The same applies to NIO: it improves scalability, not memory use, and only becomes interesting when you have hundreds of threads working on the same file.
Just loop through the file, write each line to the other file as soon as you read it, count the lines, and when the count reaches 100, switch to the next file, and so on.
Kickoff example:
String encoding = "UTF-8";
int maxlines = 100;
BufferedReader reader = null;
BufferedWriter writer = null;
try {
reader = new BufferedReader(new InputStreamReader(new FileInputStream("/bigfile.txt"), encoding));
int count = 0;
for (String line; (line = reader.readLine()) != null;) {
if (count++ % maxlines == 0) {
close(writer);
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("/smallfile" + (count / maxlines) + ".txt"), encoding));
}
writer.write(line);
writer.newLine();
}
} finally {
close(writer);
close(reader);
}
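The kickoff example calls a close(...) helper that the answer does not define. As a self-contained variant of the same idea, here is a minimal sketch that relies on try-with-resources instead. The class name LineSplitter, the method split, and the demo file names are assumptions made for illustration, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper class; the names here are illustrative only. */
final class LineSplitter {

    /**
     * Copies the lines of input into smallfile0.txt, smallfile1.txt, ...
     * inside outDir, each holding at most maxLines lines.
     * Returns the number of files written.
     */
    static int split(Path input, Path outDir, int maxLines) throws IOException {
        int files = 0;
        long lines = 0;
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            for (String line; (line = reader.readLine()) != null; lines++) {
                // Every maxLines lines, finish the current file and start the next one.
                if (lines % maxLines == 0) {
                    if (writer != null) {
                        writer.close();
                    }
                    writer = Files.newBufferedWriter(
                            outDir.resolve("smallfile" + files++ + ".txt"),
                            StandardCharsets.UTF_8);
                }
                // Only one line is held in memory at a time.
                writer.write(line);
                writer.newLine();
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary 250-line file created just for this example.
        Path dir = Files.createTempDirectory("split-demo");
        Path input = dir.resolve("bigfile.txt");
        try (BufferedWriter w = Files.newBufferedWriter(input, StandardCharsets.UTF_8)) {
            for (int i = 0; i < 250; i++) {
                w.write("line " + i);
                w.newLine();
            }
        }
        System.out.println(split(input, dir, 100) + " files written to " + dir);
    }
}
```

The reader is closed by try-with-resources, while the writer is closed by hand because it is replaced every maxLines lines.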

First, if your file contains binary data, then using BufferedReader would be a big mistake (because you would be converting the data to String, which is unnecessary and could easily corrupt the data); you should use a BufferedInputStream instead. If it's text data and you need to split it along linebreaks, then using BufferedReader is OK (assuming the file contains lines of a sensible length).
Regarding memory, there shouldn't be any problem if you use a decently sized buffer (I'd use at least 1MB to make sure the HD is doing mostly sequential reading and writing).
If speed turns out to be a problem, you could have a look at the java.nio packages, which are supposedly faster than java.io.

You could consider using memory-mapped files, via FileChannel.
They are generally a lot faster for large files, but there are performance trade-offs that could make them slower, so YMMV.
Related answer: Java NIO FileChannel versus FileOutputstream performance / usefulness
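As a rough illustration of the memory-mapped approach, the sketch below counts newline bytes by mapping a file in chunks with FileChannel.map. The class name MappedLineCounter and the chunk size are assumptions made for this example. Whether it beats a plain BufferedReader depends on the system, so measure before relying on it.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

/** Hypothetical example class; the names here are illustrative only. */
final class MappedLineCounter {

    /** Counts '\n' bytes by mapping the file in chunks of at most chunkSize bytes. */
    static long countNewlines(Path file, int chunkSize) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            // A single mapping cannot exceed Integer.MAX_VALUE bytes, so map piece by piece.
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min(chunkSize, size - pos);
                MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file created just for this example.
        Path file = Files.createTempFile("mapped-demo", ".txt");
        Files.write(file, Collections.nCopies(1000, "Hello World"), StandardCharsets.UTF_8);
        System.out.println(countNewlines(file, 64 * 1024) + " newlines in " + file);
    }
}
```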

This is a very good article:
http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/
In summary, for great performance, you should:
Avoid accessing the disk.
Avoid accessing the underlying operating system.
Avoid method calls.
Avoid processing bytes and characters individually.
For example, to reduce the access to disk, you can use a large buffer. The article describes various approaches.

Does it have to be done in Java? I.e. does it need to be platform independent? If not, I'd suggest using the 'split' command in *nix. If you really wanted, you could execute this command via your Java program. While I haven't tested it, I imagine it would perform faster than whatever Java IO implementation you could come up with.

You can use java.nio, which is faster than the classic input/output streams:
http://java.sun.com/javase/6/docs/technotes/guides/io/index.html

Yes.
I also think that using read() with arguments, such as read(char[] cbuf, int off, int len), is a better way to read such a large file
(e.g. read(buffer, 0, buffer.length)).
I have also run into missing values when using a BufferedReader instead of a BufferedInputStream on a binary input stream, so BufferedInputStream is the much better choice in that case.
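A minimal sketch of the bulk read(char[], int, int) pattern mentioned above, copying from any Reader to any Writer. The class name BulkCharCopy and the buffer size are assumptions made for illustration. For binary data the same loop applies with an InputStream, an OutputStream and a byte[] buffer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical example class; the names here are illustrative only. */
final class BulkCharCopy {

    /** Copies all characters from in to out using one reusable buffer; returns the count. */
    static long copy(Reader in, Writer out, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        long total = 0;
        int n;
        // read(...) returns the number of chars actually read, or -1 at end of stream.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        long copied = copy(new StringReader("first line\nsecond line\n"), out, 8);
        System.out.println(copied + " chars copied");
    }
}
```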

package all.is.well;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import junit.framework.TestCase;
/**
* @author Naresh Bhabat
*
* The following implementation helps to deal with extra-large files in Java.
* This program was tested with a 2GB input file.
* There are some points where extra logic can be added in the future.
* Please note: to deal with a binary input file, read bytes from the file object instead of reading lines.
* It uses RandomAccessFile, which is almost like a streaming API.
* ****************************************
* Notes regarding the executor framework and its timings.
* Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
* Total time required for reading and writing the text, in seconds:
* for 10 threads: 349.317
* for 100 threads: 464.042
* for 1000 threads: 466.538
* for 10000 threads: 479.701
*/
public class DealWithHugeRecordsinFile extends TestCase {
static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
static volatile RandomAccessFile fileToWrite;
static volatile RandomAccessFile file;
static volatile String fileContentsIter;
static volatile int position = 0;
public static void main(String[] args) throws IOException, InterruptedException {
long currentTimeMillis = System.currentTimeMillis();
try {
fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw");//for random write,independent of thread obstacles
file = new RandomAccessFile(FILEPATH, "r");//for random read,independent of thread obstacles
seriouslyReadProcessAndWriteAsynch();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Thread currentThread = Thread.currentThread();
System.out.println(currentThread.getName());
long currentTimeMillis2 = System.currentTimeMillis();
double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
System.out.println("Total time required for reading the text in seconds " + time_seconds);
}
/**
* Something asynchronously serious.
* @throws IOException
*/
public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
ExecutorService executor = Executors.newFixedThreadPool(10);//pls see for explanation in comments section of the class
while (true) {
String readLine = file.readLine();
if (readLine == null) {
break;
}
Runnable genuineWorker = new Runnable() {
@Override
public void run() {
// do hard processing here in this thread,i have consumed
// some time and ignore some exception in write method.
writeToFile(FILEPATH_WRITE, readLine);
// System.out.println(" :" +
// Thread.currentThread().getName());
}
};
executor.execute(genuineWorker);
}
executor.shutdown();
while (!executor.isTerminated()) {
}
System.out.println("Finished all threads");
file.close();
fileToWrite.close();
}
/**
* @param filePath
* @param data
*/
private static void writeToFile(String filePath, String data) {
try {
// fileToWrite.seek(position);
data = "\n" + data;
if (!data.contains("Randomization")) {
return;
}
System.out.println("Let us do something time consuming to make this thread busy"+(position++) + " :" + data);
System.out.println("Lets consume through this loop");
int i=1000;
while(i>0){
i--;
}
fileToWrite.write(data.getBytes());
throw new Exception();
} catch (Exception exception) {
System.out.println("exception was thrown but still we are able to proceed further"
+ " \n This can be used for marking failure of the records");
//exception.printStackTrace();
}
}
}

Don't use read() without arguments.
It's very slow.
It is better to read into a buffer and write that buffer to the file quickly.
Use BufferedInputStream, because it supports binary reading.
That's all.

Unless you accidentally read in the whole input file instead of reading it line by line, your primary limitation will be disk speed. You may want to start with a file containing 100 lines, write it to 100 different files with one line in each, and make the triggering mechanism work on the number of lines written to the current file. That program will scale easily to your situation.

MultiThread runs slower than single process - java

The number of threads used is very important. A single process trying to switch between 1000 threads (you create a new thread per file) is probably the main reason it is slower. Try using, say, 10 threads to read the 1000 files; you should then see a noticeable speed increase.
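One way to try this is a fixed-size thread pool that shares the files among a small number of worker threads. The sketch below is an assumption about how that could look, not the asker's code: the class name PooledLineCounter, its methods, and the demo files are made up for illustration, and any speed-up still depends on disk and OS caching.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example class; the names here are illustrative only. */
final class PooledLineCounter {

    /** Counts the lines of a single file. */
    static long countLines(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long n = 0;
            while (reader.readLine() != null) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts the lines of all files using a pool of the given size. */
    static long countAll(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path file : files) {
                Callable<Long> task = () -> countLines(file);
                results.add(pool.submit(task));
            }
            // Summing the per-file results here avoids a shared synchronized counter.
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo: 20 temporary files with 1..20 lines each, counted by 4 threads.
        List<Path> files = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            Path p = Files.createTempFile("count-demo", ".txt");
            Files.write(p, Collections.nCopies(i, "Hello World"), StandardCharsets.UTF_8);
            files.add(p);
        }
        System.out.println("total lines: " + countAll(files, 4));
    }
}
```

Each task returns its own count, so the threads never contend on a synchronized counter the way the original incrementCount does.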
