I found a source describing that the default GC changes depending on the available resources. It seems that the JVM uses either G1 GC or Serial GC depending on hardware and OS.
The serial collector is selected by default on certain hardware and operating system configurations.
Can someone point me to a more detailed source on what the specific criteria are and how they apply in a Dockerized/Kubernetes environment?
In other words:
Could setting the pod's CPU request in k8s to e.g. 1500m make the JVM use Serial GC, and would changing it to 2 CPUs switch the default to G1 GC? Do the thresholds for which GC is used depend on the JVM version (11 vs 17)?
In JDK 11 and 17, the Serial collector is used when only one CPU is available. Otherwise G1 is selected.
If you limit the number of CPUs available to your container to one, the JVM selects Serial instead of the default G1.
JDK11
1 CPU
docker run --cpus=1 --rm -it eclipse-temurin:11 java -Xlog:gc* -version
[0.004s][info][gc] Using Serial
It uses Serial
More than one CPU
docker run --cpus=2 --rm -it eclipse-temurin:11 java -Xlog:gc* -version
[0.008s][info][gc ] Using G1
It uses G1
JDK17
1 CPU
docker run --cpus=1 --rm -it eclipse-temurin:17 java -Xlog:gc* -version
[0.004s][info][gc] Using Serial
It uses Serial
More than one CPU
docker run --cpus=2 --rm -it eclipse-temurin:17 java -Xlog:gc* -version
[0.007s][info][gc] Using G1
It uses G1
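To check from inside the application which collector a given container configuration ended up with, rather than reading the -Xlog output, a minimal sketch along these lines may help. It only uses the standard java.lang.management API; the class name GcProbe is made up for illustration. On HotSpot the reported names are typically "G1 Young Generation" and "G1 Old Generation" for G1, and "Copy" and "MarkSweepCompact" for Serial, but treat those strings as an assumption to verify on your JDK build.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class GcProbe {
    /** Names of the garbage collectors registered in the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // The CPU count here is what the JVM sees after container limits are applied.
        System.out.println("CPUs seen by the JVM: " + rt.availableProcessors());
        System.out.println("Max heap (bytes):     " + rt.maxMemory());
        System.out.println("Collectors:           " + collectorNames());
    }
}
```

Running it under the same --cpus=1 and --cpus=2 limits as above lets you compare the CPU count the JVM sees with the collector it picked.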
In current OpenJDK, G1 GC is chosen by default for a "server class machine", and Serial GC otherwise. A "server class machine" is defined as a system with 2 or more non-HT CPUs and 2 or more GiB of RAM.
The exact algorithm can be found in src/hotspot/share/runtime/os.cpp:
// This is the working definition of a server class machine:
// >= 2 physical CPU's and >=2GB of memory, with some fuzz
// because the graphics memory (?) sometimes masks physical memory.
// If you want to change the definition of a server class machine
// on some OS or platform, e.g., >=4GB on Windows platforms,
// then you'll have to parameterize this method based on that state,
// as was done for logical processors here, or replicate and
// specialize this method for each platform. (Or fix os to have
// some inheritance structure and use subclassing. Sigh.)
// If you want some platform to always or never behave as a server
// class machine, change the setting of AlwaysActAsServerClassMachine
// and NeverActAsServerClassMachine in globals*.hpp.
bool os::is_server_class_machine() {
  // First check for the early returns
  if (NeverActAsServerClassMachine) {
    return false;
  }
  if (AlwaysActAsServerClassMachine) {
    return true;
  }
  // Then actually look at the machine
  bool result = false;
  const unsigned int server_processors = 2;
  const julong server_memory = 2UL * G;
  // We seem not to get our full complement of memory.
  // We allow some part (1/8?) of the memory to be "missing",
  // based on the sizes of DIMMs, and maybe graphics cards.
  const julong missing_memory = 256UL * M;
  /* Is this a server class machine? */
  if ((os::active_processor_count() >= (int)server_processors) &&
      (os::physical_memory() >= (server_memory - missing_memory))) {
    const unsigned int logical_processors =
      VM_Version::logical_processors_per_package();
    if (logical_processors > 1) {
      const unsigned int physical_packages =
        os::active_processor_count() / logical_processors;
      if (physical_packages >= server_processors) {
        result = true;
      }
    } else {
      result = true;
    }
  }
  return result;
}
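To try out how that check reacts to different CPU and memory limits without starting containers, the same arithmetic can be mirrored as a pure function. This is only a sketch of the logic quoted above: the class and parameter names are invented, the Never/AlwaysActAsServerClassMachine flags are left out, and the three inputs stand in for os::active_processor_count(), os::physical_memory() and VM_Version::logical_processors_per_package().

```java
// Hypothetical re-statement of the quoted HotSpot check, for experimentation only.
class ServerClassCheck {
    static final long M = 1024L * 1024L;
    static final long G = 1024L * M;

    static boolean isServerClass(int activeProcessors,
                                 long physicalMemoryBytes,
                                 int logicalProcessorsPerPackage) {
        final int serverProcessors = 2;
        final long serverMemory = 2L * G;
        final long missingMemory = 256L * M; // tolerated shortfall, as in the C++ code

        if (activeProcessors < serverProcessors
                || physicalMemoryBytes < serverMemory - missingMemory) {
            return false;
        }
        if (logicalProcessorsPerPackage > 1) {
            // Hyper-threaded case: count packages, not logical CPUs.
            int physicalPackages = activeProcessors / logicalProcessorsPerPackage;
            return physicalPackages >= serverProcessors;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isServerClass(1, 4 * G, 1)); // false: only one CPU
        System.out.println(isServerClass(2, 4 * G, 1)); // true
        System.out.println(isServerClass(2, 1 * G, 1)); // false: below 2G - 256M
    }
}
```

Note that the memory threshold matters as well as the CPU count: with two CPUs but only 1 GiB of memory the function returns false, which by the description above would mean Serial rather than G1.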
I tried to understand how -XX:ReservedCodeCacheSize=512m works but it did not get applied when running java as follows:
java --version -XX:ReservedCodeCacheSize=512m
It is simply set to the default 48M on x86 at this point:
define_pd_global(uintx, ReservedCodeCacheSize, 48*M);
And is then increased fivefold at this point:
// Increase the code cache size - tiered compiles a lot more.
if (FLAG_IS_DEFAULT(ReservedCodeCacheSize)) {
  FLAG_SET_ERGO(uintx, ReservedCodeCacheSize,
                MIN2(CODE_CACHE_DEFAULT_LIMIT, (size_t)ReservedCodeCacheSize * 5));
}
causing the reserved code space to be 48*5 M instead of the value I configured:
size_t cache_size = ReservedCodeCacheSize;
//...
ReservedCodeSpace rs = reserve_heap_memory(cache_size);
I first thought that ReservedCodeCacheSize was a development option and therefore could not be overridden, but it is marked as product here, so this is not the case.
What's wrong and why was the option silently ignored?
--version is a terminal option. JVM flags should precede terminal options.
Try java -XX:ReservedCodeCacheSize=512m --version
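To confirm from inside the JVM whether the flag was picked up, one option is to read it back through the HotSpot diagnostic MXBean. This is a minimal sketch that assumes a HotSpot-based JDK where com.sun.management.HotSpotDiagnosticMXBean is available; the class name FlagCheck is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; relies on the HotSpot-specific diagnostic bean.
class FlagCheck {
    /** Looks up a VM option by name; throws IllegalArgumentException if it does not exist. */
    static VMOption option(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    public static void main(String[] args) {
        VMOption opt = option("ReservedCodeCacheSize");
        // The origin indicates where the current value came from.
        System.out.println(opt.getName() + " = " + opt.getValue()
                + " (origin: " + opt.getOrigin() + ")");
    }
}
```

Started as java -XX:ReservedCodeCacheSize=512m FlagCheck, the reported value would be expected to reflect the 512m setting, while placing the flag after the class name passes it to the program as an ordinary argument and leaves the JVM default in place.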
On my university’s high end computing cluster, I use the following script to run a Java program (repls.class):
#$ -S /bin/bash
#$ -q serial
#$ -l h_vmem=10G
source /etc/profile
module add java
export CLASSPATH=$CLASSPATH:`pwd`
java -Xmx3000M -Xms128M -Djava.awt.headless=true -classpath ".:./netlogo/app/netlogo-6.1.0.jar" repls 1 1 1 5 "E,16070608MAT,16070608MAT,L160706-08-MATHS-R3.txt,L160706-08-MATHS.csv"
(The cluster operating system is CentOS Linux, with job submission handled by Son of Grid Engine.)
‘repls.class’ starts NetLogo, running my program ‘VizSim19Calib.nlogo’ headlessly. It also sets several global variables for the run.
‘VizSim19Calib.nlogo’ runs many simulations (replications – but testing with 5).
The problem is that each simulation is taking approx. 3 s to run, whereas on my own desktop each simulation takes approx. 1.5 s!
It doesn’t matter what settings I use for virtual memory, heap or stack – even doubling these makes no difference, viz.: #$ -l h_vmem=20G and -Xmx6000M -Xms256M
Why does the simulation run so slowly?
Could the location of the NetLogo class and jar files be responsible?
They are in directories under my home folder.
My Java program ‘repls.java’ is basically:
import org.nlogo.headless.HeadlessWorkspace;
public class repls {
    public static void main(String[] args) {
        try { …
            HeadlessWorkspace workspace =
                HeadlessWorkspace.newInstance() ;
            try {
                workspace.open("VizSim19Calib.nlogo",false);
                workspace.command("startup");
                workspace.command("set Test? false");
                workspace.command("set SIMULATION-RUN-ID " + args[0]);
                …
                workspace.command(
                    "RunOneLessonParamReps SelectedLessonData #Replications"
                );
                workspace.dispose();
            }
            catch(Exception ex) {
                …
            }
        }
        catch (NumberFormatException e) {
            …
        }
    }
}
I realized that NetLogo was using all 4 cores on my desktop, but only 1 on one of the university cluster's single compute nodes, by default. I increased this to 4 then 8 cores, and the speed improvement is as desired! I'll try 16 tonight. I think the matter is closed.
We are running two separate Java programs on the OS and JVM listed below.
Operating System : HP-UX 11.11
JVM Used : 1.6
Program 1:
• This program monitors a folder for new files using Apache VFS.
• I am using multithreading in this program, and it creates 5 threads at runtime to process the files in the folder being monitored (I use an ExecutorService for this).
• This program runs on an infinite loop.
• I am using “ManagementFactory class“ to get the PID of this program and I write it to a txt file.
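For reference, the PID lookup described in the last point can be sketched as follows. It assumes the runtime name has the form pid@hostname, which is what HotSpot-style JVMs usually report but is not guaranteed by the API, so it should be checked on the HP-UX JVM in question. The class name PidWriter and the optional file argument are made up for illustration.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; written without Java 7+ features so it also fits a 1.6 JVM.
class PidWriter {
    /** Extracts the PID from a runtime name of the assumed form "pid@hostname". */
    static long parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            throw new IllegalArgumentException("Unexpected runtime name: " + runtimeName);
        }
        return Long.parseLong(runtimeName.substring(0, at));
    }

    public static void main(String[] args) throws IOException {
        long pid = parsePid(ManagementFactory.getRuntimeMXBean().getName());
        System.out.println("PID: " + pid);
        if (args.length > 0) {
            // Write the PID to the file named by the first argument.
            FileWriter out = new FileWriter(args[0]);
            try {
                out.write(Long.toString(pid));
            } finally {
                out.close();
            }
        }
    }
}
```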
Program 2:
• In this program I get the PID of “Program 1” from the text file, and I want to find all the active threads of “Program 1”.
• Along with the active threads, I would like to know whether these 5 threads of “Program 1” are currently running or have completed.
Is it possible to fetch the threads of another Java program from the JVM, based on its PID?
You can get the threads of a program from Java by issuing platform-specific commands, or by using JNA (Java Native Access) to call platform-specific APIs. For instance, under Linux you can issue:
ps uH p <PID_OF_U_PROCESS> | wc -l
The ps command gives you several details about the process, for instance whether it is a zombie process or how much memory and CPU it is using.
Under Windows you would use a similar command.
You can do this by using JMX. Something like
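// Needs: com.sun.tools.attach.VirtualMachine and VirtualMachineDescriptor (tools.jar on JDK 6-8),
// sun.management.ConnectorAddressLink (internal API), javax.management.MBeanServerConnection,
// javax.management.remote.JMXConnectorFactory and JMXServiceURL, java.lang.management.*, java.io.IOException
for (final VirtualMachineDescriptor vmd : VirtualMachine.list()) {
    try {
        int pid = Integer.parseInt(vmd.id());
        if (pid == myPidOfInterest) {
            // May be null if the target JVM has not started its local management agent
            String address = ConnectorAddressLink.importFrom(pid);
            MBeanServerConnection connection =
                    JMXConnectorFactory.connect(new JMXServiceURL(address)).getMBeanServerConnection();
            ThreadMXBean threadMxBean = ManagementFactory.newPlatformMXBeanProxy(
                    connection, "java.lang:type=Threading", ThreadMXBean.class);
            // Now you have the ThreadMXBean you can find out all kinds of things about the threads
            for (long threadId : threadMxBean.getAllThreadIds()) {
                System.out.println(threadMxBean.getThreadInfo(threadId));
            }
        }
    } catch (NumberFormatException e) {
        // ignore descriptors whose id is not a numeric PID
    } catch (IOException e) {
        e.printStackTrace(); // could not read the connector address or connect
    }
}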
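The ThreadMXBean interface is the same whether it is a remote proxy, as above, or the local bean, so the reporting part can be tried out inside a single JVM first. The sketch below starts 5 pool threads, as Program 1 does, and lists their names and states. The thread name prefix file-worker- and the class name are made up for illustration; a remote proxy could be passed to describeThreads instead of the local bean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: inspects the current JVM's own worker threads.
class LocalThreadDump {
    /** Returns "name -> state" for every live thread whose name starts with the prefix. */
    static List<String> describeThreads(ThreadMXBean bean, String namePrefix) {
        List<String> lines = new ArrayList<String>();
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info != null && info.getThreadName().startsWith(namePrefix)) {
                lines.add(info.getThreadName() + " -> " + info.getThreadState());
            }
        }
        return lines;
    }

    static List<String> demo() {
        final CountDownLatch started = new CountDownLatch(5);
        final CountDownLatch release = new CountDownLatch(1);
        final AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(5, new ThreadFactory() {
            public Thread newThread(Runnable r) {
                return new Thread(r, "file-worker-" + counter.incrementAndGet());
            }
        });
        for (int i = 0; i < 5; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    started.countDown();
                    try {
                        release.await(); // stand-in for real file processing
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        try {
            started.await(); // make sure all 5 workers are running before sampling
            return describeThreads(ManagementFactory.getThreadMXBean(), "file-worker-");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Giving the pool threads recognisable names makes it much easier to pick them out of the full thread list. Threads that have already finished no longer appear at all, so "completed" has to be inferred from their absence.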
import java.io.*;
import java.util.*;
public class ReadPropertiesFile {
    public static void main(String[] args) throws Throwable {
        Properties prop = new Properties();
        prop.load(new FileInputStream(
                "C:\\Windows\\Sun\\Java\\Deployment\\deployment.properties"));
        String Xmx = prop.getProperty("deployment.javaws.jre.0.args",
                "This is Default");
        // Compare string contents rather than references
        if (!"This is Default".equals(Xmx))
        {
            System.setProperty("javaplugin.vm.options","\"Xmx\"");
        }
        long maxMemory = Runtime.getRuntime().maxMemory();
        System.out.println("JVM maxMemory also equals to maximum heap size of JVM: "
                + maxMemory);
    }
}
It should print a maxMemory value of around 96 MB (for 2 GB RAM) when nothing is specified in deployment.properties, and 512 MB when deployment.javaws.jre.0.args=-Xmx512m is set explicitly. But in both cases I am getting the result 259522560.
The JVM memory parameters for a HotSpot Java implementation can only be set via command-line options when the JVM is launched. Setting them in the system properties, either before or after the JVM is launched, will have no effect.
What you are trying to do simply won't work.
By the time any Java code is able to run, it is too late to change the heap size settings. It is the same whether you run your code using the java command, using web start, using an applet runner, using an embedded JVM in a browser ... or any other means.
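A small sketch can make this visible: it prints the -Xmx argument the JVM was actually launched with (if any) and the maximum heap before and after a System.setProperty call. The class name HeapSettings is made up for illustration, and the expectation that the two maxMemory values match follows from the explanation above rather than from any guarantee in the API.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical demo class, not part of any library.
class HeapSettings {
    /** Returns the last -Xmx argument in the list, or null if none is present. */
    static String findXmx(List<String> jvmArgs) {
        String found = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx")) {
                found = arg;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself at launch, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-Xmx at launch: " + findXmx(jvmArgs));

        System.out.println("maxMemory before setProperty: " + Runtime.getRuntime().maxMemory());
        // Setting a property at runtime does not resize the already-configured heap.
        System.setProperty("javaplugin.vm.options", "-Xmx512m");
        System.out.println("maxMemory after setProperty:  " + Runtime.getRuntime().maxMemory());
    }
}
```

Running it once as java HeapSettings and once as java -Xmx512m HeapSettings shows the difference between a launch-time option and a property set from code.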
The attached simple Java code should load all available CPU cores when started with the right parameters. For instance, you start it with
java VMTest 8 int 0
and it will start 8 threads that do nothing but loop and add 2 to an integer: something that runs in registers and does not even allocate new memory.
The problem we are facing is that we cannot get a 24-core machine (AMD, 2 sockets with 12 cores each) fully loaded when running this simple program (with 24 threads, of course). Similar things happen with 2 programs of 12 threads each, or on smaller machines.
So our suspicion is that the JVM (Sun JDK 6u20 on Linux x64) does not scale well.
Did anyone see similar things or has the ability to run it and report whether or not it runs well on his/her machine (>= 8 cores only please)? Ideas?
I tried that on Amazon EC2 with 8 cores too, but the virtual machine seems to behave differently from a real box, so the load pattern there is very strange.
package com.test;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
public class VMTest
{
    public class IntTask implements Runnable
    {
        @Override
        public void run()
        {
            int i = 0;
            while (true)
            {
                i = i + 2;
            }
        }
    }

    public class StringTask implements Runnable
    {
        @Override
        public void run()
        {
            int i = 0;
            String s;
            while (true)
            {
                i++;
                s = "s" + Integer.valueOf(i);
            }
        }
    }

    public class ArrayTask implements Runnable
    {
        private final int size;

        public ArrayTask(int size)
        {
            this.size = size;
        }

        @Override
        public void run()
        {
            int i = 0;
            String[] s;
            while (true)
            {
                i++;
                s = new String[size];
            }
        }
    }

    public void doIt(String[] args) throws InterruptedException
    {
        final String command = args[1].trim();
        ExecutorService executor = Executors.newFixedThreadPool(Integer.valueOf(args[0]));
        for (int i = 0; i < Integer.valueOf(args[0]); i++)
        {
            Runnable runnable = null;
            if (command.equalsIgnoreCase("int"))
            {
                runnable = new IntTask();
            }
            else if (command.equalsIgnoreCase("string"))
            {
                runnable = new StringTask();
            }
            else if (command.equalsIgnoreCase("array"))
            {
                // Third argument is the array size allocated on each iteration
                runnable = new ArrayTask(Integer.valueOf(args[2]));
            }
            Future<?> submit = executor.submit(runnable);
        }
        executor.awaitTermination(1, TimeUnit.HOURS);
    }

    public static void main(String[] args) throws InterruptedException
    {
        if (args.length < 3)
        {
            System.err.println("Usage: VMTest threadCount taskDef size");
            System.err.println("threadCount: Number 1..n");
            System.err.println("taskDef: int string array");
            System.err.println("size: size of memory allocation for array, ");
            System.exit(-1);
        }
        new VMTest().doIt(args);
    }
}
I don't see anything wrong with your code.
However, unfortunately, you can't specify the processor affinity in Java. So, this is actually left up to the OS, not the JVM. It's all about how your OS handles threads.
You could split your Java threads into separate processes and wrap them up in native code, to put one process per core. This does, of course, complicate communication, as it will be inter-process rather than inter-thread. Anyway, this is how popular grid computing applications like BOINC work.
Otherwise, you're at the mercy of the OS to schedule the threads.
I would guess this is inherent to the JVM/OS and not necessarily your code. Check the various JVM performance tuning docs from Sun, e.g. http://ch.sun.com/sunnews/events/2009/apr/adworkshop/pdf/5-1-Java-Performance.pdf which suggests using numactl on Linux to set the affinity.
Good luck!
Apparently your VM is running in so-called "client" mode, where all Java threads are mapped to one native OS thread and consequently are run by one single CPU core. Try invoking the JVM with the -server switch; this should correct the problem.
If you get an: Error: no 'server' JVM found, you'll have to copy the server directory from a JDK's jre\bin directory to JRE's bin.
uname -a
2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
http://browse.geekbench.ca/geekbench2/view/182101
Java 1.6.0_20-b02
16 cores; the program consumed 100% CPU as shown by vmstat.
Interestingly, I came to this article because I suspect my application is not utilizing all the cores: the CPU utilisation never increases, but the response time starts deteriorating.
I've noticed even on C that a tight loop often has issues like that. You'll also see pretty vast differences depending on OS.
Depending on the reporting tool you are using, it may not report the CPU used by some core services.
Java tends to be pretty friendly. You might try the same thing on Linux but set the process priority to some negative number and see how it acts.
Setting thread priorities inside the app may help a little too if your JVM isn't using green threads.
Lots of variables.
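One way to take the reporting tool out of the picture is to measure throughput from inside the JVM: run the same busy loop on 1, 2, ... N threads for a fixed time and compare the total iteration counts. If the total grows roughly in line with the thread count, the threads are landing on separate cores. The sketch below is a bounded variant of the int test from the question, not the original program; the class name ScalingProbe and the 500 ms duration are arbitrary choices.

```java
// Hypothetical measurement harness; results vary with OS, hardware and JIT warm-up.
class ScalingProbe {
    private static volatile boolean stop;

    /** Runs `threads` busy loops for roughly `millis` ms; returns each thread's iteration count. */
    static long[] run(int threads, long millis) {
        stop = false;
        final long[] counts = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long n = 0;
                    while (!stop) {
                        n++;
                    }
                    counts[id] = n; // visible to the caller after join()
                }
            });
            workers[t].start();
        }
        try {
            Thread.sleep(millis);
            stop = true;
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            stop = true;
            Thread.currentThread().interrupt();
        }
        return counts;
    }

    static long total(long[] counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= cpus; threads *= 2) {
            long sum = total(run(threads, 500));
            System.out.println(threads + " thread(s): " + sum + " iterations in total");
        }
    }
}
```

The first round includes JIT warm-up, so it is worth running the whole sequence twice and reading only the second set of numbers.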