Increase speed of Java program for Machine Learning

I am doing machine learning in Java using GATE Learning. I have a huge dataset of documents to learn from. When running from NetBeans I was getting a Java heap space error, so I set the -Xmx parameter to 1600MB. Now I no longer get the heap space error, but the program takes a very long time to run (it had been running for 90 minutes when I lost patience and killed the process).
I do not understand whether I should add more RAM (currently 4GB), upgrade my OS (currently XP SP3; I have heard Vista and Windows 7 make better use of RAM and the processor), or upgrade my processor (currently a dual-core E5500 at 2.80 GHz).
Please give me some insight into what I can do to make this process run faster!
Thanks Rishabh

Before you can answer what will make it run faster, you have to find the bottleneck.
I'm not very familiar with Windows, but there is some sort of system load monitoring widget, IIRC.
What I would do is as follows:
Create some datasets of increasing sizes (more documents)
Run your program against those datasets
On each run, work out whether the CPU maxes out, whether memory maxes out and the machine starts swapping, or whether the whole thing is I/O bound
Then fix the one that is causing the problem.
Just for context, it's not unusual for ML algorithms to take a long time to run on large datasets. You can use the approach above to plot run time as the input dataset grows; at least then you'll know whether your program would have finished in 100 minutes or 100 centuries.
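If it helps, here is a minimal sketch of that measurement loop in Java. The dataset names and the process() method are placeholders for however you load documents and invoke the GATE pipeline:

    import java.util.List;

    public class ScalingBenchmark {
        public static void main(String[] args) {
            // Hypothetical dataset directories of increasing size (100, 500, 1000, 5000 documents)
            List<String> datasets = List.of("docs-100", "docs-500", "docs-1000", "docs-5000");

            for (String dataset : datasets) {
                long start = System.nanoTime();
                process(dataset);                                   // your GATE learning run goes here
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("%s took %d ms%n", dataset, elapsedMs);
            }
        }

        private static void process(String dataset) {
            // Placeholder: load the documents in 'dataset' and run the learning pipeline
        }
    }

Plotting these timings against dataset size gives you a rough idea of how the run time scales before you commit to a full run.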

Get a profiler such as VisualVM or YourKit, start your program, connect the profiler to the running program, and find out which methods and objects are your bottleneck. Then at least you know where to start improving your program.

How to prevent physical memory consumption when running parallel Java processes

I have a big list (up to 500,000 entries) of functions.
My task is to generate a graph for each function (this can be done independently of the other functions) and dump the output to a file (it can be several files).
The process of generating the graphs can be time consuming.
I also have a server with 40 physical cores and 128GB of RAM.
I have tried to implement parallel processing using Java threads and an executor pool, but it does not seem to use all of the processor's resources.
On some inputs the program takes up to 25 hours to run, and only 10-15 cores are working according to htop.
So the second thing I tried was to create 40 distinct processes (using Runtime.exec) and split the list among them.
This method uses all of the processor's resources (100% load on all 40 cores) and speeds things up by about 5x over the previous approach (it takes only 5 hours, which is reasonable for my task).
But the problem with this method is that each Java process runs separately and consumes memory independently of the others. In some scenarios all 128GB of RAM is consumed after 5 minutes of parallel work. The workaround I am using now is to call System.gc() in each process whenever Runtime.totalMemory() > 2GB. This slows down overall performance a bit (8 hours on the previous input) but keeps memory usage within reasonable bounds.
But this configuration only works for my server. If you run it on a server with 40 cores and 64GB of RAM, you need to tune the Runtime.totalMemory() > 2GB condition.
So the question is: what is the best way to avoid such aggressive memory consumption?
Is it normal practice to run separate processes to do parallel jobs?
Is there any other parallel mechanism in Java (maybe fork/join?) that uses 100% of the processor's physical resources?
You don't need to call System.gc() explicitly! The JVM will do it automatically when needed, and almost always does it better. You should, however, set the max heap size (-Xmx) to a number that works well.
If your program won't scale further, you have some kind of congestion. You can either analyse your program and your Java and system settings to figure out why, or run it as multiple processes. If each process is multi-threaded, you may get better performance using 5-10 processes instead of 40.
Note that you may get higher performance with more than one thread per core. Fiddle around with 1-8 threads per core and see if throughput increases.
From your description it sounds like you have 500,000 completely independent items of work and that each work item doesn't really need much memory. If that is true, then memory consumption isn't really an issue: as long as each process has enough memory that it doesn't have to GC very often, GC isn't going to affect the total execution time by much. Just make sure you don't keep dangling references to objects you no longer need.
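As a rough sketch of the single-JVM, fixed-pool approach (so all tasks share one heap and you only tune -Xmx once); loadFunctions() and generateGraph() are placeholders for your own code:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class GraphRunner {
        public static void main(String[] args) throws InterruptedException {
            List<String> functions = loadFunctions();                         // your 500,000 work items
            int threads = Runtime.getRuntime().availableProcessors();         // e.g. 40 on your server
            ExecutorService pool = Executors.newFixedThreadPool(threads);

            for (String fn : functions) {
                pool.submit(() -> generateGraph(fn));                          // each task is independent
            }
            pool.shutdown();                                                   // stop accepting new tasks
            pool.awaitTermination(7, TimeUnit.DAYS);                           // wait for all graphs to finish
        }

        private static List<String> loadFunctions() { return List.of(); }     // placeholder
        private static void generateGraph(String fn) { /* compute and write the graph */ }
    }

With a single process like this, the whole run is bounded by one -Xmx setting, and objects that are no longer referenced can be collected without ever calling System.gc() yourself.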
One of the problems here is that it is still surprisingly hard to find out how many threads, cores, etc. are actually available.
My personal suggestion: there are several articles in the Java Specialists newsletter that take a very deep dive into this subject.
For example this one: http://www.javaspecialists.eu/archive/Issue135.html
or a more recent one, on "the number of available processors": http://www.javaspecialists.eu/archive/Issue220.html
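For reference, the number the JVM itself reports (and which those articles discuss in detail) can be printed like this:

    public class CpuCount {
        public static void main(String[] args) {
            // The number of processors the JVM believes it can use; this may differ from the
            // physical core count under hyper-threading, CPU affinity or container limits
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }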

Why is multithreaded Java Program not faster on 'super' Linux server vs laptop Win7?

Intro
So far I have been working on a piece of software which I am now testing to see the benefit of concurrency. I am testing the same software on two different systems:
System 1: 2 x Intel(R) Xeon(R) CPU E5-2665 @ 2.40GHz with a total of 16 cores, 64GB of RAM, running Scientific Linux 6.1 and Java SE Runtime Environment (build 1.7.0_11-b21).
System 2: Lenovo ThinkPad T410 with an Intel i5 processor @ 2.67GHz, 4 cores, 4GB of RAM, running Windows 7 64-bit and Java SE Runtime Environment (build 1.7.0_11-b21).
Details: The program simulates patients with type 1 diabetes. It does some importing (reading from CSV), some numerical computation (Dopri54 + Newton) and some exporting (writing to CSV).
I have exclusive rights to the server, so there should be no noise at all.
Results
These are my results:
Now, as you can see, system 1 is only just as fast as system 2, despite being a much more powerful machine. I have no idea why this is the case, and I am confident that the test setup is the same on both. The number of threads goes from 10 to 100.
Question:
Why do the two runs have similar execution times despite system 1 being significantly more powerful than system 2?
UPDATE!
Now, I have thought a bit about what you guys said about it being an I/O or memory issue. So I thought that if I could reduce the file size, it would speed up the program, right? I managed to reduce the import file size by a factor of 5, but there was no performance improvement at all. Do you guys still think it is the same problem?
Since you write .csv files, it is possible that the bottleneck is not your computation power, but the write rate of your hard disk.
Almost certainly this means that either CPU time is not the bottleneck for this application, or that something about it is making it resistant to effective parallelization, or both.
For example if reading the data from disk is actually the limiting factor then faster disks are what matters, not faster processors.
If it's running out of memory, then that will be a bigger bottleneck.
If it takes more time to spawn each thread than the actual processing inside the thread.
etc.
In this sort of optimization work, metrics are king. You need hard, solid numbers for how long things are taking and where in your program you are losing that time. Only then can you see where to focus your efforts and whether they are effective.
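One rough way to get such numbers is to time the three phases of a single simulation separately. The method names below are placeholders for your own import, computation and export code:

    public class PhaseTiming {
        public static void main(String[] args) {
            long t0 = System.nanoTime();
            Object data = importCsv();                        // read phase
            long t1 = System.nanoTime();
            Object result = simulate(data);                   // Dopri54 + Newton phase
            long t2 = System.nanoTime();
            exportCsv(result);                                // write phase
            long t3 = System.nanoTime();

            System.out.printf("import %d ms, compute %d ms, export %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000);
        }

        private static Object importCsv()        { return null; }   // placeholder
        private static Object simulate(Object d) { return null; }   // placeholder
        private static void exportCsv(Object r)  { }                // placeholder
    }

If the import and export phases dominate on both machines, faster CPUs and more cores will not help; the disk is the limit.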

Caused by: java.lang.OutOfMemoryError: Java heap space

MY GOAL:
I want to run my application for 1000 users.
NOW
I am trying to run it for 100 users. During the application run, I'd like to do some processing for each user that will take a minimum of one hour per user, so I'm using one thread per user.
ERROR
Caused by: java.lang.OutOfMemoryError: Java heap space
I've tried to figure out what this means, but I'm not really sure how to resolve it.
Can anybody help me?
This error means that your program needs more memory than your JVM allowed it to use!
Therefore you pretty much have two options:
Increase the amount of memory your program is allowed to use with the -Xmx option (for instance, -Xmx1024m for 1024 MB)
Modify your program so that it needs less memory, by using smaller data structures and getting rid of objects that are no longer used at some point in your program
As Peter Lawrey pointed out, using a profiler to see what your program is doing in such situations is generally a good idea.
Use a producer/consumer pattern with a limited number of worker threads.
100+ threads is ridiculous - no wonder your application is exploding.
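A minimal sketch of that pattern, assuming each user's processing is self-contained (processUser() below is a placeholder): a small fixed pool of workers consumes tasks from a bounded queue instead of starting one thread per user.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class UserProcessor {
        public static void main(String[] args) throws InterruptedException {
            // A small, fixed pool of worker threads consuming from a bounded queue.
            // CallerRunsPolicy makes the submitting thread run a task itself when the
            // queue is full, so work never piles up unbounded in memory.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    8, 8, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(16),
                    new ThreadPoolExecutor.CallerRunsPolicy());

            for (int userId = 1; userId <= 100; userId++) {
                final int id = userId;
                pool.execute(() -> processUser(id));      // producer side: one task per user
            }
            pool.shutdown();
            pool.awaitTermination(7, TimeUnit.DAYS);      // each task may run for an hour or more
        }

        private static void processUser(int id) {
            // Placeholder for the hour-long per-user processing
        }
    }

With a pool like this, only the live tasks hold memory at any moment, so the heap requirement stays roughly constant whether you have 100 or 1000 users.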
You haven't provided any information which indicates the problem is any different from all the answers already given on Stack Overflow regarding this error, either:
You are using too much memory and you need to use a memory profiler to reduce it.
You are setting the maximum memory too low and you need to increase the maximum memory with -mx or -Xmx
I suspect that since you want 1000 users to run processes which take an hour each, you may need more resources than you have, e.g. 1000 cores perhaps? I suggest you look at how much hardware you need based on the CPU, memory, disk IO and network IO required to run the users at an acceptable level, e.g. measure 20 users and multiply that by 50.
You can try increasing the JVM heap space when you launch your application. You can try setting it to 2GB with -Xmx2g. If you're running 32-bit Java I think 2GB is as high as you can go, but if you have a 64-bit JVM you should be able to go higher.
Edit:
Example: java -Xmx2g MyApp
I check two areas when there is an out of memory error:
Is the memory allocated to the JVM sufficient? If not, increase it using -Xmx.
Check the code thoroughly; more than 90% of the time I have found the error to be caused by some loop going recursive, or never terminating, under some border condition.
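As a purely hypothetical illustration of that second point, a border condition that never lets a loop terminate can fill the heap long before anything else looks wrong:

    import java.util.ArrayList;
    import java.util.List;

    public class BoundaryBug {
        public static void main(String[] args) {
            List<int[]> chunks = new ArrayList<>();
            int remaining = 10;
            while (remaining > 0) {
                chunks.add(new int[1_000_000]);   // keeps allocating roughly 4MB per iteration
                // Bug: the decrement only happens under a border condition that is never true
                // here, so 'remaining' never reaches zero and the list grows until the heap
                // is exhausted with java.lang.OutOfMemoryError: Java heap space.
                if (remaining > 10) {
                    remaining--;
                }
            }
        }
    }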

Java Random Slowdowns on Mac OS

I have a Java program for doing a set of scientific calculations across multiple processors by breaking it into pieces and running each piece in a different thread. The problem is trivially partitionable so there's no contention or communication between the threads. The only common data they access are some shared static caches that don't need to have their access synchronized, and some data files on the hard drive. The threads are also continuously writing to the disk, but to separate files.
My problem is that sometimes when I run the program I get very good speed, and sometimes when I run the exact same thing it runs very slowly. If I see it running slowly and ctrl-C and restart it, it will usually start running fast again. It seems to set itself into either slow mode or fast mode early on in the run and never switches between modes.
I have hooked it up to jconsole and it doesn't seem to be a memory problem. When I have caught it running slowly, I've tried connecting a profiler to it but the profiler won't connect. I've tried running with -Xprof but the dumps between a slow run and fast run don't seem to be much different. I have tried using different garbage collectors and different sizings of the various parts of the memory space, also.
My machine is a Mac Pro with a striped RAID partition. The CPU usage never drops off whether it's running slowly or quickly, which you would expect to see if the threads were spending too much time blocking on reads from the disk, so I don't think it is a disk read problem.
My question is: what types of problems in my code could cause this? Or could this be an OS problem? I haven't been able to duplicate it on a Windows machine, but I don't have a Windows machine with a similar RAID setup.
You might have a thread that has gone into an endless loop.
Try connecting with VisualVM and use the Thread monitor.
https://visualvm.dev.java.net
You may have to connect before the problem occurs.
I second that you should be looking at this with a profiler, in the threads view: how many threads there are, what states they are in, etc. It might be an odd race condition happening every now and then. It could also be that instrumenting the classes with profiler hooks (which causes a slowdown) sorts the race condition out, and you will see no slowdown with the profiler attached :/
Please have a look at this post, or rather the answer, where a cache contention problem is mentioned.
Are you spawning the same number of threads each time? Is that number less than or equal to the number of hardware threads available on your platform? That number can be checked, or estimated with fair accuracy.
Please post any findings!
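If the profiler won't attach while the slowdown is happening, one rough fallback is to have the program dump its own thread states and CPU times via the standard java.lang.management API, for example:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadDumper {
        public static void dump() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                // getThreadCpuTime returns -1 if CPU time measurement is unsupported/disabled
                long cpuMs = mx.getThreadCpuTime(info.getThreadId()) / 1_000_000;
                System.out.printf("%-40s %-15s cpu=%d ms%n",
                        info.getThreadName(), info.getThreadState(), cpuMs);
            }
        }
    }

Calling this periodically (or from a shutdown hook) during a slow run should show whether one thread is burning CPU in a loop or whether threads are stuck in a particular state.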
Do you have a tool to measure CPU temperature? The OS might be throttling the CPU to deal with temperature issues.
Is it possible that your program is being paged to disk sometimes? In this case, you will need to look at the memory usage of the operating system as whole, rather than just your program. I know from experience there is a huge difference in runtime performance when memory is being continually paged to the disk and back.
I don't know much about OSX, but in linux the "free" command is useful for this purpose.
Another thing that might cause this slowdown is log files. I've seen logging code that slowed the system down incrementally as the log files grew. It's possible that your threads are synchronizing on a log file which is growing in size; then, when you restart your program, a new log file is used.

Java memory consumption, "top" and HP-UX

We ship Java applications that run on Linux, AIX and HP-UX (PA-RISC). We seem to struggle to get acceptable levels of performance on HP-UX from applications that work just fine in the other two environments. This is true of both execution time and memory consumption.
Although I have yet to find a definitive article on "why", I believe that measuring memory consumption using "top" is a crude approach, due to things like shared code giving misleading results. However, it's about all we have to go on at a customer site where memory consumption on HP-UX has become an issue. It only became an issue when we moved from Java 1.4 to Java 1.5 (on HP-UX 11.23 PA-RISC). By "an issue", I mean that the machine stopped being able to create new processes because we had exhausted all 16GB of physical memory.
By measuring the total "free memory" before and after, we are trying to gauge how much is consumed by a Java application. I wrote a quick app that stores 10,000 random 64-bit strings in an ArrayList and tried this approach to measuring consumption on Linux and HP-UX under Java 1.4 and Java 1.5.
The results:
HP-UX, Java 1.4: ~60MB
HP-UX, Java 1.5: ~150MB
Linux, Java 1.4: ~24MB
Linux, Java 1.5: ~16MB
Can anyone explain why these results might arise? Is this some idiosyncrasy of the way "top" measures free memory? Does Java 1.5 on HP really consume 2.5 times more memory than Java 1.4?
Thanks.
The JVMs might just have different default parameters. The heap will grow to the size you have configured to allow. The default on the Sun VM is a certain percentage of the RAM in the machine; that is to say, Java will, by default, use more memory on a machine that has more memory.
I'd be really surprised if the HP-UX VM hadn't had lots of tuning for this sort of thing by HP. I'd suggest you fiddle with the parameters on both and figure out the smallest max heap size you can use without hurting performance or throughput.
I don't have an HP box right now to test my hypothesis. However, if I were you, I would use a profiler like JConsole (which comes with the JDK) or YourKit to measure what is happening.
That said, it appears that you started measuring after you saw something amiss, so I'm not discounting that it's happening; I'm just pointing you at something I'd have done in the same situation.
First, it's not clear what you measured with the "10,000 random 64-bit strings" test. You're supposed to start the application, measure its bootstrap memory footprint, and then run your test. It could easily be that Java 1.5 acquires more heap right after startup (due to heap manager settings, for instance).
Second, we do run Java apps under 1.4, 1.5 and 1.6 under HP-UX, and they don't demonstrate any special memory requirements. We have Itanium hardware, though.
Third, why do you use top? Why not just print Runtime.getRuntime().totalMemory()?
Fourth, by adding values to an ArrayList you create memory fragmentation. An ArrayList has to double its internal storage now and then. Depending on GC settings and the ArrayList.ensureCapacity() implementation, the amount of uncollected memory may differ dramatically between 1.4 and 1.5.
Essentially, instead of figuring out the cause of the problem, you have run a random test that gives you no useful information. You should run a profiler on the application to figure out where the memory is going.
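For what it's worth, here is a minimal sketch of the in-JVM measurement suggested above, taken around the allocation instead of before/after readings from top. The string generation is only a stand-in for whatever the original test stored, and System.gc() is just a hint to make the numbers slightly more comparable:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class HeapFootprint {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.gc();                                           // hint only; not guaranteed to run
            long before = rt.totalMemory() - rt.freeMemory();

            List<String> strings = new ArrayList<>(10_000);
            Random rnd = new Random();
            for (int i = 0; i < 10_000; i++) {
                strings.add(Long.toBinaryString(rnd.nextLong()));  // a random 64-bit value as a string
            }

            System.gc();
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.printf("Approx. used by the list: %d KB%n", (after - before) / 1024);
        }
    }

Comparing this number across JVMs isolates what your own objects cost, independently of how much virtual memory the process maps according to top.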
You might also want to look at the problem you are trying to solve... I don't imagine there are many problems that eat 16GB of memory that aren't due for a good round of optimization.
Are you launching multiple VMs? Are you reading large datasets into memory, and not discarding them quickly enough? etc etc etc.
