I am trying to calculate the factorial of very large numbers using threads, but the non-threaded function is faster. How can I use parallel computing with threads?
public class Faktoriyel implements Runnable{
private Sayi sayi;
public Sayi faktoriyelSonuc;
public Faktoriyel(Sayi sayi){
this.sayi = sayi;
}
@Override
public void run() {
BigInteger fact = new BigInteger("1");
for (int i = 1 ;i <= sayi.GetSayi().longValue() ; i++) {
fact = fact.multiply(new BigInteger(i + ""));
}
faktoriyelSonuc = new Sayi(fact.toString());
System.out.println(faktoriyelSonuc.GetSayi());
}
}
This is the main class:
public class Project1{
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
long baslangicSeri = System.nanoTime();
System.out.println(SeriFaktoriyel(new Sayi("200000")));
long bitisSeri = System.nanoTime();
double SerigecenSure = (double)(bitisSeri-baslangicSeri)/1000000000;
System.out.println("Seri Hesaplama : "+SerigecenSure+" saniye");
long baslangicParalel = System.nanoTime();
ExecutorService havuz = Executors.newFixedThreadPool(10);
havuz.execute(new Faktoriyel(new Sayi("200000")));
havuz.shutdown();
while(!havuz.isTerminated()){ }
long bitisParalel = System.nanoTime();
double gecenSure = (double)(bitisParalel-baslangicParalel)/1000000000;
System.out.println("Paralel hesaplama : "+gecenSure+" saniye");
}
public static String SeriFaktoriyel(Sayi sayi){
BigInteger fact = new BigInteger("1");
for (int i = 1; i <= sayi.GetSayi().longValue() ; i++) {
fact = fact.multiply(new BigInteger(i + ""));
}
return fact.toString();
}
}
There are two things I can point out that hurt the performance of your threaded version:
System.out.println() is a system call with significant overhead, and it is called inside the timed region of the threaded version.
You are using a thread pool of size 10. Unless your computer has 10 cores, your program suffers from redundant context switches. (You will get better performance with a thread pool sized to the actual number of cores in your PC.)
If this is unfamiliar to you, I would recommend reading about context switches :)
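Note also that the posted code submits a single Faktoriyel task to the pool, so only one thread does any work. A minimal sketch of how the multiplication could be split across threads is shown below. The class name ParallelFactorial and its methods are hypothetical, not part of the question's code, and the pool is sized from availableProcessors() as suggested above. Whether this beats the serial loop depends on n and on the machine, since the final combination of the partial products still runs on one thread.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the original question's code.
class ParallelFactorial {

    // Multiplies all integers in [from, to] (inclusive); returns 1 for an empty range.
    static BigInteger rangeProduct(long from, long to) {
        BigInteger product = BigInteger.ONE;
        for (long i = from; i <= to; i++) {
            product = product.multiply(BigInteger.valueOf(i));
        }
        return product;
    }

    // Splits 1..n into at most `parts` contiguous ranges, multiplies each range
    // as a separate pool task, then combines the partial products on the caller's thread.
    static BigInteger factorial(long n, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            long chunk = Math.max(1, (n + parts - 1) / parts); // ceiling division
            List<Future<BigInteger>> partials = new ArrayList<>();
            for (long from = 1; from <= n; from += chunk) {
                final long lo = from;
                final long hi = Math.min(n, from + chunk - 1);
                Callable<BigInteger> task = () -> rangeProduct(lo, hi);
                partials.add(pool.submit(task));
            }
            BigInteger result = BigInteger.ONE;
            for (Future<BigInteger> partial : partials) {
                result = result.multiply(partial.get()); // waits for that range to finish
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a partial product", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partial product failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        BigInteger result = factorial(20_000, cores);
        System.out.println("20000! has " + result.bitLength() + " bits");
    }
}
```

The result is deliberately not printed in full, so that console output does not dominate any timing you wrap around the call.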
I have a piece of the source code in java8:
public class Test {
public static void main(String[] args) {
testObject(1.3);
testObject(1.4);
}
private static void testObject(double num) {
System.out.println("test:" + num);
long sta = System.currentTimeMillis();
int size = 10000000;
Object[] o = new Object[(int) (size * num)];
for (int i = 0; i < size; i++) {
o[i] = "" + i;
}
System.out.println("object[]: " + (System.currentTimeMillis() - sta) + " ms");
}
}
Execution result:
test:1.3
object[]: 7694 ms
test:1.4
object[]: 3826 ms
Why is the running time so different when the array length is 1.4 * size?
I wanted to see how Java array assignment works, but I couldn't find anything on Google.
In addition, keep in mind that System.currentTimeMillis() returns wall-clock time. If your OS reschedules during the for loop and a different process gets the CPU, the wall-clock time increases even though your program is not executing.
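One common way to reduce the influence of scheduling noise is to repeat the measurement and keep the shortest run. The sketch below assumes that approach. BestOfTiming and bestOfNanos are hypothetical names, and System.nanoTime() still measures elapsed time rather than CPU time, so this only reduces the effect and does not remove it.

```java
// Hypothetical helper for repeated timing; not from the original post.
class BestOfTiming {

    // Runs the task `runs` times and returns the shortest elapsed time in nanoseconds.
    // Returns Long.MAX_VALUE if runs <= 0, because nothing was measured.
    static long bestOfNanos(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        // A smaller version of the loop from the question.
        Runnable fill = () -> {
            Object[] o = new Object[1_000_000];
            for (int i = 0; i < o.length; i++) {
                o[i] = "" + i;
            }
        };
        System.out.println("best of 5: " + bestOfNanos(fill, 5) / 1_000_000 + " ms");
    }
}
```

The early runs also act as a warm-up for the JIT compiler, which is another reason a single measurement can mislead.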
I'm currently trying to increase the performance of my software by implementing the producer-consumer pattern. In my particular case I have a producer that sequentially creates Rows and multiple consumers that perform some task for a given batch of rows.
The problem I'm facing now is that when I measure the performance of my Producer-Consumer pattern, I can see that the producer's running time massively increases and I don't understand why this is the case.
So far I mainly profiled my code and did micro-benchmarking yet the results did not lead me to the actual problem.
public class ProdCons {
static class Row {
String[] _cols;
Row() {
_cols = Stream.generate(() -> "Row-Entry").limit(5).toArray(String[]::new);
}
}
static class Producer {
private static final int N_ITER = 8000000;
final ExecutorService _execService;
final int _batchSize;
final Function<Row[], Consumer> _f;
Producer(final int batchSize, final int nThreads, Function<Row[], Consumer> f) throws InterruptedException {
_execService = Executors.newFixedThreadPool(nThreads);
_batchSize = batchSize;
_f = f;
// init all threads to exclude their generation time
startThreads();
}
private void startThreads() throws InterruptedException {
List<Callable<Void>> l = Stream.generate(() -> new Callable<Void>() {
@Override
public Void call() throws Exception {
Thread.sleep(10);
return null;
}
}).limit(4).collect(Collectors.toList());
_execService.invokeAll(l);
}
long run() throws InterruptedException {
final long start = System.nanoTime();
int idx = 0;
Row[] batch = new Row[_batchSize];
for (int i = 0; i < N_ITER; i++) {
batch[idx++] = new Row();
if (idx == _batchSize) {
_execService.submit(_f.apply(batch));
batch = new Row[_batchSize];
idx = 0;
}
}
final long time = System.nanoTime() - start;
_execService.shutdownNow();
_execService.awaitTermination(100, TimeUnit.MILLISECONDS);
return time;
}
}
static abstract class Consumer implements Callable<String> {
final Row[] _rowBatch;
Consumer(final Row[] data) {
_rowBatch = data;
}
}
static class NoOpConsumer extends Consumer {
NoOpConsumer(Row[] data) {
super(data);
}
@Override
public String call() throws Exception {
return null;
}
}
static class SomeConsumer extends Consumer {
SomeConsumer(Row[] data) {
super(data);
}
@Override
public String call() throws Exception {
String res = null;
for (int i = 0; i < 1000; i++) {
res = "";
for (final Row r : _rowBatch) {
for (final String s : r._cols) {
res += s;
}
}
}
return res;
}
}
public static void main(String[] args) throws InterruptedException {
final int nRuns = 10;
long totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new NoOpConsumer(data)).run();
}
System.out.println("Avg time with NoOpConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new SomeConsumer(data)).run();
}
System.out.println("Avg time with SomeConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
}
}
Actually, since the consumers run in different threads than the producer, I would expect the running time of the producer not to be affected by the consumers' workload. However, running the program I get the following output:
#1 Thread, #100 batch size
Avg time with NoOpConsumer: 0.7507254368s
Avg time with SomeConsumer: 1.5334749871s
Note that the time measurement covers only the production time, not the consumer time, and that with no jobs submitted at all the run takes ~0.6 s on average.
Even more surprising is that when I increase the number of threads from 1 to 4, I get the following results (4-cores with hyperthreading).
#4 Threads, #100 batch size
Avg time with NoOpConsumer: 0.7741189636s
Avg time with SomeConsumer: 2.5561667638s
Am I doing something wrong? What am I missing? Currently I have to believe that the running time differences are due to context switches or anything related to my system.
Threads are not completely isolated from one another.
It looks like your SomeConsumer class allocates a lot of memory, and this produces garbage collection work that is shared between all threads, including your producer thread.
It also accesses a lot of memory, which can knock the memory used by the producer out of L1 or L2 cache. Accessing real memory takes a lot longer than accessing cache, so this can make your producer take longer as well.
Note also that I didn't actually verify that you're measuring the producer time properly, and it's easy to make mistakes there.
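One way to check the garbage collection hypothesis is to read the JVM's accumulated GC time before and after the workload. The following is a rough sketch under that assumption. GcProbe is a hypothetical name, and the loop only imitates the repeated string concatenation in SomeConsumer; it is not the original benchmark.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe; sums what the JVM's collectors report about themselves.
class GcProbe {

    // Total accumulated collection time in milliseconds across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        String res = "";
        for (int i = 0; i < 10_000; i++) {
            res += "Row-Entry"; // each += allocates a new, longer String
        }
        long after = totalGcMillis();
        System.out.println("length " + res.length()
                + ", GC time during loop: " + (after - before) + " ms");
    }
}
```

If the GC time grows noticeably while the consumers are busy, that supports the explanation above. Replacing the += concatenation with a StringBuilder would be one way to cut the allocation.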
I have two threads and they are both reading the same static variable (some big object - an array with 500_000_000 ints).
The two threads are each pinned to a CPU (1 and 2) using CPU affinity, to minimize jitter.
Do you know if the two threads will slow each other down because the static variable is read by both threads running on different CPUs?
import net.openhft.affinity.AffinityLock;
public class BigObject {
public final int[] array = new int[500_000_000];
public static final BigObject bo_static = new BigObject();
public BigObject() {
for( int i = 0; i<array.length; i++){
array[i]=i;
}
}
public static void main(String[] args) {
final Boolean useStatic = true;
Integer n = 2;
for( int i = 0; i<n; i++){
final int k = i;
Runnable r = new Runnable() {
@Override
public void run() {
BigObject b;
if( useStatic){
b = BigObject.bo_static;
}
else{
b = new BigObject();
}
try (AffinityLock al = AffinityLock.acquireLock()) {
while(true){
long nt1 = System.nanoTime();
double sum = 0;
for( int i : b.array){
sum+=i;
}
long nt2 = System.nanoTime();
double dt = (nt2-nt1)*1e-6;
System.out.println(k + ": sum " + sum + " " + dt);
}
}
}
};
new Thread(r).start();
}
}
}
Thanks
In your case there won't be a slowdown from doing it multi-threaded: since you're only doing reads, there is no need to invalidate any shared state between your CPUs.
Depending on the background load there could be bus limitations and so on, but if the affinity is defined at the OS level as well, there would be more inter-CPU and inter-core communication, in an easily prefetched manner (since you access the data sequentially), than memory-to-CPU communication. Background load would affect performance in the single-threaded case as well, so there's no need to argue about it.
If the whole system is dedicated to your program, then you would have roughly 20 GB/s of memory bandwidth on modern CPUs, which is more than enough for your data set.
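As a back-of-the-envelope check, the arithmetic can be written out. This is only a sketch: BandwidthEstimate is a hypothetical name, and the bandwidth figure is the rough one quoted above, read here as about 20 GB/s, which is an assumption and not a measurement.

```java
// Hypothetical estimate; the bandwidth figure is an assumption, not a measurement.
class BandwidthEstimate {

    // Bytes occupied by the elements of an int[] of the given length (array header ignored).
    static long intArrayBytes(long length) {
        return length * Integer.BYTES;
    }

    // Seconds needed to stream `bytes` once at the given bandwidth in bytes per second.
    static double secondsPerPass(long bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long bytes = intArrayBytes(500_000_000L); // array length from the question
        double bandwidth = 20e9;                  // ~20 GB/s, assumed from the answer
        System.out.println(bytes + " bytes per pass");
        System.out.println("one thread:  " + secondsPerPass(bytes, bandwidth) + " s per pass");
        System.out.println("two threads: " + secondsPerPass(2 * bytes, bandwidth)
                + " s per pass if both stream from main memory");
    }
}
```

Comparing these estimates with the dt values printed by the two threads gives a hint whether the loop is limited by memory bandwidth or by the summation itself.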
I have an issue where the performance of a simple loop (see code below in LoopTest.performTest) varies dramatically between runs, but is consistent for the lifetime of the process.
For example, when run from within WebLogic/Tomcat it may achieve an average of 3.5 billion iterations per second. Restart and it may only achieve 20 million iterations per second. This will remain consistent for the lifetime of the process. When it has been run directly from the command line, on every occasion, it has run fast.
This has been run under Linux, Windows, Tomcat and WebLogic. The slow execution occurs more regularly in WebLogic than in Tomcat.
Test Specifics
The test code moves any potential OS calls (timing) to before and after the test, and uses loops of differing sizes, so that any slow OS calls would show up as an apparent performance improvement as the loop size increases.
The number of iterations performed by the test is determined by time (see runTest) rather than being fixed, because of the large variation in performance, so the code is more complex than might initially be expected.
public static abstract class PerformanceTest {
private final String name;
public PerformanceTest(String name) {
this.name = name;
}
/**
* Return value to ensure loops etc not optimised away.
*
* @param loopCount
* @return
*/
public abstract long performTest(final long loopCount);
public String getName() {
return name;
}
}
private static class LoopTest extends PerformanceTest {
LoopTest() {
super("Loop");
}
@Override
public long performTest(final long loopCount) {
long sum=0;
for(long i=0;i<loopCount;i++) {
sum+=i;
}
return sum;
}
}
public static List<PerformanceTest> loadTests() {
List<PerformanceTest> performanceTests = new ArrayList<PerformanceTest>();
performanceTests.add(new LoopTest());
return performanceTests;
}
public static void main(String[] argv) {
int maxDuration = 30;
if (argv.length == 1) {
maxDuration = Integer.parseInt(argv[0]);
}
List<PerformanceTest> tests = loadTests();
for(PerformanceTest test : tests) {
runTest(test, maxDuration);
}
}
public static void runTest(PerformanceTest test, int maxDuration) {
System.out.println("Processing " + test.getName());
long stopDuration = 1000 * maxDuration;
long estimatedDuration = 1;
long priorDelta = 1;
long loopCount=10;
while (estimatedDuration < stopDuration) {
long startTime = System.currentTimeMillis();
test.performTest(loopCount);
long endTime = System.currentTimeMillis();
long delta = endTime - startTime;
estimatedDuration = delta * Math.max(10, delta / Math.min(estimatedDuration, priorDelta));
if (estimatedDuration <= 0) {
estimatedDuration = 1;
}
priorDelta = delta;
if (priorDelta <= 0) {
priorDelta = 1;
}
if (delta > 0) {
double itemsPerSecond = 1000 * (double)loopCount / (double)delta;
DecimalFormat formatter;
if (itemsPerSecond < 1) {
formatter = new DecimalFormat( "#,###,###,##0.000");
} else if (itemsPerSecond < 10) {
formatter = new DecimalFormat( "#,###,###,##0.0");
} else {
formatter = new DecimalFormat( "#,###,###,##0");
}
System.out.println(" " + loopCount + " : Duration " + delta + ", Items Per-Second: " + formatter.format(itemsPerSecond));
}
loopCount*=10;
}
}
I have written a Java program, and I want to see how much RAM it uses when it runs. Is there any way to see how much RAM my program uses? I mean something like measuring the program's running time, which can be done by writing this code before and after calling the main code:
long startTime = System.currentTimeMillis();
new Main();
long endTime = System.currentTimeMillis();
System.out.println("Total Time: " + (endTime - startTime));
You can use the following class. By implementing the Instantiator interface, you can run the same process several times to get a precise view of the memory consumption:
public class SizeOfUtil {
private static final Runtime runtime = Runtime.getRuntime();
private static final int OBJECT_COUNT = 100000;
/**
* Return the size of an object instantiated using the instantiator
*
* @param instantiator
* @return byte size of the instantiated object
*/
static public int execute(Instantiator instantiator) {
runGarbageCollection();
usedMemory();
Object[] objects = new Object[OBJECT_COUNT + 1];
long heapSize = 0;
for (int i = 0; i < OBJECT_COUNT + 1; ++i) {
Object object = instantiator.execute();
if (i > 0)
objects[i] = object;
else {
object = null;
runGarbageCollection();
heapSize = usedMemory();
}
}
runGarbageCollection();
long heap2 = usedMemory(); // Take an after heap snapshot:
final int size = Math.round(((float) (heap2 - heapSize)) / OBJECT_COUNT);
for (int i = 1; i < OBJECT_COUNT + 1; ++i)
objects[i] = null;
objects = null;
return size;
}
private static void runGarbageCollection() {
for (int r = 0; r < 4; ++r){
long usedMem1 = usedMemory();
long usedMem2 = Long.MAX_VALUE;
for (int i = 0; (usedMem1 < usedMem2) && (i < 500); ++i) {
runtime.runFinalization();
runtime.gc();
Thread.yield();
usedMem2 = usedMem1;
usedMem1 = usedMemory();
}
}
}
private static long usedMemory() {
return runtime.totalMemory() - runtime.freeMemory();
}
}
Implement the interface
public interface Instantiator { Object execute(); }
With the code you want to test
public void sizeOfInteger(){
int size = SizeOfUtil.execute(new Instantiator(){
@Override public Object execute() {
return new Integer (3);
}
});
System.out.println(Integer.class.getSimpleName() + " size = " + size + " bytes");
}
source : Java Tutorial Java Size of objects
I think this should help:
VisualVM
It came bundled with the JDK up to JDK 8 (it is a separate download since JDK 9) and has many features that help you monitor memory usage.
You can get a very close value by comparing the free memory of the JVM before and after your program loads. The difference is very close to the memory usage of your program. To get the JVM's free memory, use
Runtime.getRuntime().freeMemory()
To get the memory usage, do this:
public static void main (String args[]){
long initial = Runtime.getRuntime().freeMemory(); //this must be the first line of code executed
//run your program ... load gui etc
long memoryUsage = initial - Runtime.getRuntime().freeMemory(); // free memory shrinks as the program allocates
}
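Comparing freeMemory() alone can mislead when the JVM grows the heap in between, because totalMemory() changes too. A slightly more robust sketch, using the same totalMemory() - freeMemory() calculation as usedMemory() in SizeOfUtil above, might look like this. HeapUsage is a hypothetical name, and the array stands in for your program.

```java
// Hypothetical helper; reports heap in use as seen by the Runtime API.
class HeapUsage {
    private static final Runtime RUNTIME = Runtime.getRuntime();

    // Heap currently in use: what the JVM has reserved minus what is still free.
    static long usedBytes() {
        return RUNTIME.totalMemory() - RUNTIME.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedBytes();
        int[] data = new int[10_000_000]; // stand-in for "run your program" (about 40 MB)
        data[0] = 1;
        long after = usedBytes();
        System.out.println("approx. bytes used: " + (after - before)
                + " (array length " + data.length + ")");
    }
}
```

The figure is still approximate, since a garbage collection between the two readings changes the result.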