Creating threads is slower on AMD systems compared to Intel systems - java

In a study project we were asked to create 100,000 threads to get a feeling for how long it takes to create a lot of threads, and why it is more efficient to use tasks instead.
However, we found that the same "create and start 100,000 threads" code runs a lot slower on a modern AMD Ryzen system than on some older Intel systems (even a notebook). We did some benchmarking with different JDKs, all on Java 16 (older versions didn't make a difference).
import java.util.ArrayList;
import java.util.List;
public class ThreadCreator {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> startedThreads = new ArrayList<>();
        long startTime = System.currentTimeMillis();
        // Create and start 100,000 threads that do nothing
        for (int i = 0; i < 100_000; i++) {
            Thread t = new Thread(() -> {});
            t.start();
            startedThreads.add(t);
        }
        // Wait for all of them to finish before stopping the clock
        for (Thread t : startedThreads) {
            t.join();
        }
        System.out.println("Duration: " + (System.currentTimeMillis() - startTime));
    }
}
The benchmark results:
AMD Ryzen 7 3700X System (Java 16, Ubuntu 20.04):
Adopt OpenJDK (Hotspot): 13882ms
Adopt OpenJDK (OpenJ9): 7521ms
Intel i7-8550U System (Fedora 34, Java 16):
Adopt OpenJDK (Hotspot): 5321ms
Adopt OpenJDK (OpenJ9): 3089ms
Intel i5-6600k System (Windows 10, Java 16):
Adopt OpenJDK (Hotspot): 29433ms (maybe related to the low memory of this system)
Adopt OpenJDK (OpenJ9): 5119ms
The OpenJ9 JVM cuts the time to roughly half on both the AMD and the Intel systems. However, the AMD system never reaches the times of the Intel systems. The AMD system only runs at 10% CPU utilisation during this test.
What might be the reason why creating threads is so much slower on AMD systems compared to Intel systems?
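For comparison, the task-based approach that the exercise hints at could be sketched roughly as follows. This is only an illustration under assumptions: the class name `TaskRunner`, the method `runTasks` and the pool size (one worker per available processor) are made up here and are not part of the original benchmark.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskRunner {

    // Runs taskCount trivial tasks on a fixed-size pool and returns how many completed.
    static int runTasks(int taskCount) throws InterruptedException {
        // Assumption: one worker thread per available processor.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Each task only increments a counter, like the empty threads in the benchmark.
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // accept no new tasks, let the queued ones finish
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        int done = runTasks(100_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks completed in " + millis + " ms");
    }
}
```

Only a handful of operating system threads are created here, so the thread creation cost that the benchmark measures is paid a few times instead of 100,000 times.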

I have a Ryzen 3700 system running Windows 10 and I got the following results:
Duration: 5.813002900 seconds
100000 tasks completed.
The program I ran, written in Ada, is:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;
procedure Main is
   protected counter is
      procedure add;
      function report return Natural;
   private
      count : Natural := 0;
   end counter;
   protected body counter is
      procedure add is
      begin
         count := count + 1;
      end add;
      function report return Natural is
      begin
         return count;
      end report;
   end counter;
   task type worker;
   task body worker is
   begin
      counter.add;
   end worker;
   type worker_access is access worker;
   type list is array (Positive range 1 .. 100000) of worker_access;
   start_time : Time;
   end_time : Time;
begin
   start_time := Clock;
   declare
      The_List : list;
   begin
      for I in The_List'Range loop
         The_List (I) := new worker;
      end loop;
   end;
   end_time := Clock;
   Put_Line
     ("Duration:" & Duration'Image (end_time - start_time) & " seconds");
   Put_Line (Natural'Image (counter.report) & " tasks completed.");
end Main;
This program creates a protected object used to count the number of tasks (similar to Java threads) executed. The protected procedure named add only allows one task at a time to increment the count within the protected object.
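The inner block within the main procedure is meant to achieve the effect of a Java join. Strictly speaking, tasks created with an allocator depend on the master in which the access type is declared (here Main), so the guaranteed wait happens when Main ends rather than at the end of the inner block. Note that a timing of 5.813 seconds is the same as 5813 milliseconds.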

Related

How to use Java's getCpuLoad() method?

I tried to get current CPU load:
while (true) {
    OperatingSystemMXBean os = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    System.out.println("CPU Load: " + os.getCpuLoad() * 100 + "%");
}
To my surprise, it is always 0! Note that I am running stress --cpu 4 at the same time, so the actual CPU load is around 20%. I checked the JDK documentation:
Returns the "recent cpu usage" for the operating environment.
However, if I add Thread.sleep(1000) inside the while loop, the result is as expected. It seems that some time must pass between calls to getCpuLoad(), because it needs to collect system statistics over that interval.
I also tried printing the CPU load after sleeping (without any loop, just printing twice):
Thread.sleep(1000);
OperatingSystemMXBean os = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
Thread.sleep(1000);
System.out.println(os.getCpuLoad() * 100 + "%"); // still always 0%
Thread.sleep(1000);
System.out.println(os.getCpuLoad() * 100 + "%"); // about 20%
The result shows that the first call to getCpuLoad() always returns 0. I find this really confusing.
Is it a pitfall of the JDK's implementation? Why doesn't the JDK documentation warn about this?
My settings:
OS: 5.15.91-1-MANJARO
JDK: openjdk 17.0.6 2023-01-17 LTS
Good one.
On my machine (macOS 13.2), JDK 17.0.6 demonstrates the same behaviour.
It does seem to be an implementation detail.
Please see the code at the link below:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/macosx/native/libmanagement_ext/UnixOperatingSystem.c
if (last_used == 0 || last_total == 0) {
    // First call, just set the last values
    last_used = used;
    last_total = total;
    // return 0 since we have no data, not -1 which indicates error
    return 0;
}
It's a bummer, though, that they didn't specify this in the docs.
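Given that behaviour, one way to work around it is to make a throwaway first call and only trust a reading taken after a pause. The following is a minimal sketch, assuming JDK 14 or later (where `getCpuLoad()` exists on `com.sun.management.OperatingSystemMXBean`); the class name `CpuLoadSampler` and the helper `sampleCpuLoad` are hypothetical names used only for illustration.

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class CpuLoadSampler {

    // Primes the counters with a first call, waits, then returns the second reading.
    // The result is in [0.0, 1.0], or negative if the platform cannot report a value.
    static double sampleCpuLoad(long intervalMillis) throws InterruptedException {
        OperatingSystemMXBean os =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        os.getCpuLoad();              // first call: only establishes a baseline
        Thread.sleep(intervalMillis); // give the JVM an interval to measure over
        return os.getCpuLoad();       // second call: load over the interval
    }

    public static void main(String[] args) throws InterruptedException {
        double load = sampleCpuLoad(1000);
        System.out.println("CPU Load: " + load * 100 + "%");
    }
}
```

The one-second interval mirrors the `Thread.sleep(1000)` used in the question; shorter intervals may work but give noisier readings.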

Java standalone application behaves inconsistently on long runs in Windows 10

Can anyone please help with this problem?
We have a Java desktop application (JDK 8/JRE 8, 32-bit) that runs on Windows 10.
We are capturing the time each run takes on a large data set:
Run1: 76min:24sec
Run2: 80min:34sec
Run3: 57min:8sec
Run4: 76min:50sec
We cannot predict how long the next run will take.
Windows 10 configuration:
Sockets: 1
Number of cores: 8
Number of logical processors: 8
Base speed: 1.99 GHz
RAM: 16 GB
Note: If we run the same application on a Windows 10 machine with 8 GB of RAM, each run takes much longer, nearly 3 hours.
In the code we captured the execution time of each line at attribute level, but the differences were only milliseconds, and they sometimes increased and sometimes decreased while the application was running.
Code:
Block nextSubBlock = this.getNextSubBlock();
while (nextSubBlock != null && !this.endOfFile()) {
    while (!this.blockReached(nextSubBlock) && !this.endOfFile()) {
        this.processAttribute(this.next());
    }
    nextSubBlock.processBlock();
    nextSubBlock = this.getNextSubBlock();
}
while (!this.endOfBlock() && !this.endOfFile()) {
    this.processAttribute(this.next());
}
this.setStaticCounters();
this.processRepeatedBlocks();

Performance issue between Solr 1.4.0 and 4.6.0

I updated my Solr version from 1.4.0 to 4.6.0 and now we are facing several performance issues.
a) If I use the embedded version, it's very slow.
b) Using http, I have these average times:
1.4: 151ms
4.6: 301ms
c) I saw that JavaBinCodec changed from version 1 to 2. Does anybody know if this could be the problem?
Note 1: I ran the test many times, discarding the first run because of server warm-up.
Note 2: The documents returned are very big (3k lines each in the XML view).
Any help would be appreciated.
The code used for the test (shown here for Solr 4.6):
import java.util.GregorianCalendar;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
public class Main {
    private static HttpSolrServer server;
    public static void main(String[] args) throws Exception {
        String url = "http://foo.bar/myIndex";
        server = new HttpSolrServer(url);
        for (int i = 0; i < 10; i++) {
            search();
        }
    }
    public static void search() throws Exception {
        SolrQuery solrQuery = new SolrQuery();
        solrQuery.setQuery("foo:bar");
        solrQuery.setStart(0);
        solrQuery.setRows(20);
        // QUERY
        long before = new GregorianCalendar().getTimeInMillis();
        server.query(solrQuery);
        long after = new GregorianCalendar().getTimeInMillis();
        System.out.println(after - before);
    }
}
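As a side note on the measurement itself, the warm-up handling described in Note 1 can be made explicit in the test code. The sketch below is an assumption-laden illustration, not part of the original test: `QueryTimer` and `medianMillis` are hypothetical names, the workload is a placeholder `Runnable`, and in the real test it would wrap `server.query(solrQuery)` (catching its checked exceptions).

```java
import java.util.Arrays;

public class QueryTimer {

    // Runs the action warmupRuns times without timing it, then measuredRuns times
    // with timing, and returns the (upper) median duration in milliseconds.
    static double medianMillis(Runnable action, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            action.run(); // discarded: server and JIT warm-up
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime(); // monotonic clock, suited to intervals
            action.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos[measuredRuns / 2] / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for the Solr query.
        Runnable placeholder = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("median ms: " + medianMillis(placeholder, 1, 10));
    }
}
```

Reporting a median over several runs makes the 1.4 and 4.6 figures less sensitive to a single slow request than an average would be.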
Solr 4.6 runs on Java 6 or higher. When using Java 7, Solr recommends installing at least Update 1 and discourages the use of experimental -XX JVM options. The latest JVM versions may also affect Solr's performance. You can find an overview of Solr issues caused by the JVM at the link below.
http://wiki.apache.org/lucene-java/JavaBugs
CPU, disk and memory requirements are based on the many choices made in implementing Solr (document size, number of documents, and number of hits retrieved to name a few).
However, if you are using ZooKeeper, you can try several things to improve the performance of Solr:
Move ZooKeeper to another disk. If the index is huge, the number of I/O calls from Solr to ZooKeeper will degrade overall performance.
Increase the ZooKeeper timeout period.
Log GC times; I found pauses of up to 20s on ZooKeeper boxes.
Use the recommendations at http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning to tune the heap.

Force startup a computer automatically?

We know how to force a computer to shut down using Java. For example, the following code works fine for a forced shutdown:
public static void main(String arg[]) throws IOException{
    Runtime runtime = Runtime.getRuntime();
    Process proc = runtime.exec("shutdown -s -t 0");
    System.exit(0);
}
Now suppose I want to force a computer that is shut down to start up at a particular time. Is it possible to do this in Java or any other language?
You need something to trigger the startup. The best way to trigger this is Wake-on-LAN.
If you want to do this in Java, this might be a good resource.
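To give an idea of what Wake-on-LAN involves in Java, here is a minimal sketch. The "magic packet" is 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast. The class name `WakeOnLan`, the helper names, and the choice of UDP port 9 are assumptions for illustration; the target machine's network card and firmware must also have Wake-on-LAN enabled.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class WakeOnLan {

    // Builds the magic packet: 6 x 0xFF, then the 6-byte MAC address 16 times (102 bytes).
    static byte[] magicPacket(String mac) {
        String[] parts = mac.split("[:-]");
        if (parts.length != 6) {
            throw new IllegalArgumentException("expected 6 octets: " + mac);
        }
        byte[] macBytes = new byte[6];
        for (int i = 0; i < 6; i++) {
            macBytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++) {
            packet[i] = (byte) 0xFF;
        }
        for (int offset = 6; offset < packet.length; offset += 6) {
            System.arraycopy(macBytes, 0, packet, offset, 6);
        }
        return packet;
    }

    // Sends the packet as a UDP broadcast; port 9 is a common (assumed) choice.
    static void send(String mac, String broadcastAddress) throws Exception {
        byte[] data = magicPacket(mac);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(broadcastAddress), 9));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            send(args[0], args[1]);
        } else {
            System.out.println("usage: java WakeOnLan <mac> <broadcast-address>");
        }
    }
}
```

The program sending the packet has to run on another machine that is already powered on, which is also how a start at a particular time would have to be scheduled.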
In addition to Wake-on-LAN, some server-grade hardware has IPMI devices that are connected to the motherboard and can control power as well as provide serial console output over a network connection. This management controller is running all the time, but I'm not familiar with any you can load your own code onto.
You can control this device remotely, from any language including Java, to power on a server that is off.
http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface
If your BIOS supports Advanced Power Management (APM) version 1.2 or later, it should be possible to wake it from sleep/standby or hibernation based on a timer. On Windows an end user can do this through Task Scheduler, and if you wish to do it programmatically you can use the Task Scheduler interfaces.
I don't know how you would do this through Java, but here is some example C++ code (it uses the C++ COM call syntax) that will create a task to wake the computer up 2 minutes in the future:
#include <windows.h>
#include <mstask.h>
#include <string.h>
#include <time.h>
int main() {
    HRESULT hr = CoInitialize(NULL);
    if (SUCCEEDED(hr)) {
        ITaskScheduler *scheduler;
        hr = CoCreateInstance(CLSID_CTaskScheduler, NULL, CLSCTX_INPROC_SERVER, IID_ITaskScheduler, (void**)&scheduler);
        if (SUCCEEDED(hr)) {
            ITask *task;
            hr = scheduler->NewWorkItem(L"Wake Timer", CLSID_CTask, IID_ITask, (LPUNKNOWN*)&task);
            if (SUCCEEDED(hr)) {
                WORD index;
                ITaskTrigger *trigger;
                hr = task->CreateTrigger(&index, &trigger);
                if (SUCCEEDED(hr)) {
                    // Fire once, two minutes from now
                    time_t t = time(NULL) + 120;
                    struct tm *ltime = localtime(&t);
                    TASK_TRIGGER triggertime;
                    memset(&triggertime, 0, sizeof(triggertime));
                    triggertime.cbTriggerSize = sizeof(TASK_TRIGGER);
                    triggertime.wBeginYear = ltime->tm_year+1900;
                    triggertime.wBeginMonth = ltime->tm_mon+1;
                    triggertime.wBeginDay = ltime->tm_mday;
                    triggertime.wStartHour = ltime->tm_hour;
                    triggertime.wStartMinute = ltime->tm_min;
                    triggertime.TriggerType = TASK_TIME_TRIGGER_ONCE;
                    trigger->SetTrigger(&triggertime);
                    trigger->Release();
                }
                // TASK_FLAG_SYSTEM_REQUIRED is what allows the task to wake the machine
                task->SetFlags(TASK_FLAG_DELETE_WHEN_DONE|TASK_FLAG_SYSTEM_REQUIRED|TASK_FLAG_RUN_ONLY_IF_LOGGED_ON);
                task->SetAccountInformation(L"", NULL);
                IPersistFile *file;
                hr = task->QueryInterface(IID_IPersistFile, (void**)&file);
                if (SUCCEEDED(hr)) {
                    file->Save(NULL, TRUE);
                    file->Release();
                }
                task->Release();
            }
            scheduler->Release();
        }
        CoUninitialize();
    }
    return 0;
}
Presumably, if you can do this on Windows, there are equivalent APIs for other operating systems.
I did manage to find a similar question floating around on the internet, so I'll post the link here in case you find it helpful. (This was the thread I found: http://www.coderanch.com/t/440680/gc/interact-Windows-Task-Scheduler-Java)
First of all though, I might add that Java is a language that must run in a Virtual Machine - there are no two ways around it. I'm not well versed in 'low-level' programming, such as programming at closer to BIOS level, which is sort of where we are heading with this.
As the question was explicitly about Java, the best I could come up with from research, is (if you're really wanting to use Java for something), using the JAVA-COM (JACOB) http://sourceforge.net/projects/jacob-project/ which allows you to hook into the Windows Task Scheduler http://msdn.microsoft.com/en-us/library/aa383581%28VS.85%29.aspx via the COM language (AF
As far as I am aware, because Java needs to be in a virtual machine to run, there would be no way of getting it to do an action such as turning on a PC - let's not even get into issues of whether such an action would require administrator or above privileges.
Hope that helps.

Can't get past 2542 Threads in Java on 4GB iMac OSX 10.6.3 Snow Leopard (32bit)

I am running the following program trying to figure out how to configure my JVM to get the maximum number of threads my machine can support. For those that might not know, Snow Leopard ships with Java 6.
I tried starting it with the defaults and with the following command lines; I always get the Out of Memory Error at thread 2542, no matter what the JVM options are set to.
java TestThreadStackSizes 100000
java -Xss1024 TestThreadStackSizes 100000
java -Xmx128m -Xss1024 TestThreadStackSizes 100000
java -Xmx2048m -Xss1024 TestThreadStackSizes 100000
java -Xmx2048m -Xms2048m -Xss1024 TestThreadStackSizes 100000
No matter what I pass it, I get the same result: Out of Memory Error at 2542.
public class TestThreadStackSizes
{
    public static void main(final String[] args)
    {
        Thread.currentThread().setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(final Thread t, final Throwable e)
            {
                System.err.println(e.getMessage());
                System.exit(1);
            }
        });
        int numThreads = 1000;
        if (args.length == 1)
        {
            numThreads = Integer.parseInt(args[0]);
        }
        for (int i = 0; i < numThreads; i++)
        {
            try
            {
                Thread t = new Thread(new SleeperThread(i));
                t.start();
            }
            catch (final OutOfMemoryError e)
            {
                throw new RuntimeException(String.format("Out of Memory Error on Thread %d", i), e);
            }
        }
    }
    private static class SleeperThread implements Runnable
    {
        private final int i;
        private SleeperThread(final int i)
        {
            this.i = i;
        }
        public void run()
        {
            try
            {
                System.out.format("Thread %d about to sleep\n", this.i);
                Thread.sleep(1000 * 60 * 60);
            }
            catch (final InterruptedException e)
            {
                throw new RuntimeException(e);
            }
        }
    }
}
Any ideas on how I can affect these results?
I wrote this program to figure out what a Windows Server 2003 box is capable of, because I am getting these "out of memory, can't create native thread" errors at very low thread counts, like a couple of hundred. I needed to see what a particular box was capable of with different -Xss parameters, and then I ran into this arbitrary limit on OSX.
2542 seems like an arbitrary number:
I shut down all programs except the one terminal window I was running my test from and got to 2545, which told me it was an arbitrary limit.
To get the thread limits on OSX 10.6.3, run:
> sysctl kern.num_threads
kern.num_threads: 2560
and
> sysctl kern.num_taskthreads
kern.num_taskthreads: 2560
The 2560 number matches up with the 2542 and 2545 because there are obviously other threads running in the background.
According to the official documentation, kern.num_taskthreads cannot be adjusted in the desktop version of OSX.
According to the Apple Developer documentation, the thread stack size should be at least 64K, so your -Xss1024 is ignored. But even with 64K per thread, the thread stack memory consumption comes to only about 160MB, so this shouldn't be the problem. Threads could also consume memory from a more limited pool, or there could simply be a limit on the number of threads you can have per process or user.
You need to find out the maximum number of threads the operating system supports on your system.
On Linux you can do something like:
cat /proc/sys/kernel/threads-max
to get the max, and to set it you can do something like:
echo 10000 > /proc/sys/kernel/threads-max
Also try running with:
-XX:-UseBoundThreads
and report back the results.
Do you really expect to have this many threads running concurrently for up to an hour? I don't think so. I have worked on an application that processed hundreds of documents, converted them between different formats, wrote proper logs to a database and stored specific information as well, and it still finished in seconds.
What you should take care of is coding wisely to avoid creating too many threads. Instead, use a thread pool provided by Java so that the same threads can be reused when needed; that will give better performance. Also keep synchronized blocks as small as possible to avoid bottlenecks in your execution.
thanks.
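The thread reuse described above can be observed directly. The following is a small sketch, with the hypothetical names `PoolReuseDemo` and `distinctWorkers` and an arbitrary pool size of 4 chosen only for illustration: many tasks are submitted, but the number of distinct worker threads never exceeds the pool size.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolReuseDemo {

    // Submits taskCount tasks to a pool of poolSize threads and returns
    // how many distinct worker threads actually ran them.
    static int distinctWorkers(int poolSize, int taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet(); // thread-safe set
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> workerNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return workerNames.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = distinctWorkers(4, 10_000);
        System.out.println("10000 tasks ran on " + workers + " thread(s)");
    }
}
```

With this structure the per-process thread limit discussed above is never approached, because the number of native threads is bounded by the pool size rather than by the number of tasks.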
