BigQueue disk space not clearing - java

I am using a Java persistent queue named BigQueue, which stores its data on disk. bigQueue.gc() is supposed to clear the used disk space, but it is not doing so: disk usage keeps increasing.
IBigQueue bigQueue = new BigQueueImpl("/home/test/BigQueueNew", "demo1");
for (int i = 0; i < 10000; i++) {
ManagedObject mo = new ManagedObject();
mo.setName("Aravind " + i);
bigQueue.enqueue(serialize(mo));
}
while (!bigQueue.isEmpty()) {
ManagedObject mo = (ManagedObject) deserialize(bigQueue.dequeue());
System.out.println("Key Dqueue ME");
}
bigQueue.close();
// bigQueue.removeAll(); bigQueue.gc();; System.out.println("Big Queue is " + bigQueue.isEmpty() +" Size is "+bigQueue.size());

In case someone is looking at this as well.
If you are using Java 11 on Ubuntu, this could be a known issue. Refer to the link below.
Until it is fixed upstream, you could download the source and fix it yourself.
https://github.com/bulldog2011/bigqueue/issues/39

Related

Iterative GraphFrames AggregateMessages hitting memory limits

I'm using GraphFrames' aggregateMessages capability to build a custom clustering algorithm. I tested this algorithm on a small sample dataset (~100 items) and verified that it works. But when I run it on my real dataset of 50k items, I get OOM errors after ~10 iterations. Interestingly, the first few iterations are processed in a couple of minutes and memory stays in the normal range. After iteration 6, memory usage creeps up to ~30 GB and the job eventually fails. I am running this on a 2-node cluster with 16 cores and 32 GB.
Since this is an iterative algorithm and memory only increases after each iteration, I wonder if I need to release memory somehow. I added the unpersist calls at the end of the loop, but that hasn't helped.
Are there any other efficiencies I could use? Are there best practices around using GraphFrames in an iterative setting?
Another thing I've noticed is that the executors page of the Spark UI shows used "storage memory" of ~300 MB, but the Spark process is in fact taking ~30 GB. I am not sure whether this is a memory leak.
while ( true ) {
System.out.println("["+new Date()+"] Running " + i);
Dataset<Row> lastRoutesDs = groups;
Dataset<Row> groupUnwind = groups.withColumn("id", explode(col("routeItems")));
GraphFrame gf = new GraphFrame(groupUnwind, edgesDs);
Dataset<Row> lvl1 = gf.aggregateMessages()
.sendToSrc(when(
callUDF("contains_in_array_str", AggregateMessages.dst().getField("routeItems"),
AggregateMessages.src().getField("id")).equalTo(false),
struct(AggregateMessages.dst().getField("routeItems").as("routeItems"),
AggregateMessages.dst().getField("routeScores").as("routeScores"),
AggregateMessages.dst().getField("grpId").as("grpId"),
AggregateMessages.dst().getField("grpScore").as("grpScore"),
AggregateMessages.edge().getField("score").as("edgeScore"))))
.agg(collect_set(AggregateMessages.msg()).as("incomings"))
.withColumn("inItem", explode(col("incomings")))
.groupBy("id", "inItem.grpId")
.agg(first("inItem.routeItems").as("routeItems"), first("inItem.routeScores").as("routeScores"),
first("inItem.grpScore").as("grpScore"), collect_list("inItem.edgeScore").as("inScores"))
.groupBy("grpId")
.agg(bestRouteAgg.apply(col("routeItems"), col("routeScores"), col("inScores"), col("grpScore"),
col("id"), col("grpScore")).as("best"))
.withColumn("newScore", callUDF("calcRouteScores", expr("size(best.routeItems)+1"),
col("best.routeScores"), col("best.inScores")))
.withColumn("edgeCount", expr("size(best.routeScores)"))
.persist(StorageLevel.MEMORY_AND_DISK());
lvl1
.filter("newScore > " + groupMaxScore)
.withColumn("itr", lit(i))
.select("grpId", "best.routeItems","best.routeScores", "best.grpScore", "edgeCount", "itr")
.write()
.mode(SaveMode.Append)
.json(workspaceDir + "clusters-rank-collect");
if (lvl1.count() == 0) {
System.out.println("****** End reached " + i);
break;
}
Dataset<Row> newGroups = lvl1.filter("newScore <= " + groupMaxScore)
.withColumn("routeItems_new",
callUDF("merge2Array", col("best.routeItems"), array(col("best.newNode"))))
.withColumn("routeScores_new",
callUDF("merge2ArrayDouble", col("best.routeScores"), col("best.inScores")))
.select(col("grpId"), col("routeItems_new").as("routeItems"),
col("routeScores_new").as("routeScores"), col("newScore").as("grpScore"));
if (i > 0 && (i % 2) == 0) {
newGroups = newGroups
.checkpoint();
}
newGroups = newGroups
.persist(StorageLevel.DISK_ONLY());
System.out.println( newGroups.count() );
groups.unpersist();
lastRoutesDs.unpersist();
groupUnwind.unpersist();
lvl1.unpersist();
groups = newGroups;
i++;
}

Sigar ProcCpu gather method always returns 0 for percentage value

I'm using Sigar to try and get the CPU and memory usage of individual processes (under Windows). I am able to get these stats correctly for the system as a whole with the below code :
Sigar sigar = new Sigar();
long totalMemory = sigar.getMem().getTotal() / 1024 /1024;
model.addAttribute("totalMemory", totalMemory);
double usedPercentage = sigar.getMem().getUsedPercent();
model.addAttribute("usedPercentage", String.format( "%.2f", usedPercentage));
double freePercentage = sigar.getMem().getFreePercent();
model.addAttribute("freePercentage", String.format( "%.2f", freePercentage));
double cpuUsedPercentage = sigar.getCpuPerc().getCombined() * 100;
model.addAttribute("cpuUsedPercentage", String.format( "%.2f", cpuUsedPercentage));
This displays the following quite nicely in my web page :
Total System Memory : 16289 MB
Used Memory Percentage : 66.81 %
Free Memory Percentage : 33.19 %
CPU Usage : 30.44 %
Now I'm trying to get info from individual processes such as Java and SQL Server and, while the memory is correctly gathered, the CPU usage for both processes is ALWAYS 0. Below is the code I'm using :
Sigar sigar = new Sigar();
List<ProcessInfo> processes = new ArrayList<>();
ProcessFinder processFinder = new ProcessFinder(sigar);
long[] javaPIDs = null;
Long sqlPID = null;
try
{
javaPIDs = processFinder.find("Exe.Name.ct=" + "java.exe");
sqlPID = processFinder.find("Exe.Name.ct=" + "sqlservr.exe")[0];
}
catch (Exception ex)
{}
int i = 0;
while (i < javaPIDs.length)
{
Long javaPID = javaPIDs[i];
ProcessInfo javaProcess = new ProcessInfo();
javaProcess.setPid(javaPID);
javaProcess.setName("Java");
ProcMem javaMem = new ProcMem();
javaMem.gather(sigar, javaPID);
javaProcess.setMemoryUsage(javaMem.getResident() / 1024 / 1024);
MultiProcCpu javaCpu = new MultiProcCpu();
javaCpu.gather(sigar, javaPID);
javaProcess.setCpuUsage(String.format("%.2f", javaCpu.getPercent() * 100));
processes.add(javaProcess);
i++;
}
if (sqlPID != null)
{
ProcessInfo sqlProcess = new ProcessInfo();
sqlProcess.setPid(sqlPID);
sqlProcess.setName("SQL Server");
ProcMem sqlMem = new ProcMem();
sqlMem.gather(sigar, sqlPID);
sqlProcess.setMemoryUsage(sqlMem.getResident() / 1024 / 1024);
ProcCpu sqlCpu = new MultiProcCpu();
sqlCpu.gather(sigar, sqlPID);
sqlProcess.setCpuUsage(String.format( "%.2f", sqlCpu.getPercent()));
processes.add(sqlProcess);
}
model.addAttribute("processes", processes);
I have tried both ProcCpu and MultiProcCpu, and both always return 0.0 even when I can see Java using 15% CPU in Task Manager. Documentation for the Sigar library is virtually nonexistent, but the research I did suggests that I am doing this correctly.
Does anyone know what I'm doing wrong?
Thanks!
I found the issue while continuing to search online. Basically, the Sigar library can only report correct CPU values after some time has passed. My problem was that I was initializing a new Sigar instance every time the page was displayed. I made the Sigar instance a field of my Spring controller, so it is shared across requests, and now it returns correct percentages.
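The fix makes sense if a CPU percentage is derived from the difference between two samples, since a fresh instance has no earlier sample to compare against. The sketch below models that idea in plain Java. DeltaCpuSampler and its sample method are hypothetical names used for illustration only, and the code is not taken from Sigar's internals.

```java
/**
 * Hypothetical model of delta-based CPU sampling (not Sigar's real code).
 * A usage figure needs two samples, so the first call can only return 0.
 */
final class DeltaCpuSampler {
    private long lastWallNanos = -1; // wall-clock time of the previous sample
    private long lastCpuNanos = -1;  // process CPU time at the previous sample

    /**
     * Returns CPU time used per unit of wall-clock time since the previous call
     * (1.0 means one fully busy core), or 0.0 when there is no earlier sample.
     */
    synchronized double sample(long wallNanos, long cpuNanos) {
        if (lastWallNanos < 0 || wallNanos <= lastWallNanos) {
            // No usable baseline yet: record this sample and report zero.
            lastWallNanos = wallNanos;
            lastCpuNanos = cpuNanos;
            return 0.0;
        }
        double usage = (double) (cpuNanos - lastCpuNanos) / (wallNanos - lastWallNanos);
        lastWallNanos = wallNanos;
        lastCpuNanos = cpuNanos;
        return usage;
    }
}
```

In this model, creating a new sampler for every request always takes the first-call branch, which matches the symptom of a percentage that is always 0.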

download cover art from musicbrainz with java

I have been struggling for a couple of hours with how to link a disc ID to a MusicBrainz MBID.
So, using dietmar-steiner / JMBDiscId
JMBDiscId discId = new JMBDiscId();
if (discId.init(PropertyFinder.getProperty("libdiscid.path")))
{
String musicBrainzDiscID = discId.getDiscId(PropertyFinder.getProperty("cdrom.path"));
}
or musicbrainzws2-java
Disc controller = new Disc();
String drive = PropertyFinder.getProperty("cdrom.path");
try {
DiscWs2 disc =controller.lookUp(drive);
log.info("DISC: " + disc.getDiscId() + " match: " + disc.getReleases().size() + " releases");
....
I can extract a disc ID for freedb or MusicBrainz easily (more or less), but I have not found a way of calculating the ID that I need to download cover art via the CoverArtArchiveClient from last.fm.
CoverArtArchiveClient client = new DefaultCoverArtArchiveClient();
try
{
UUID mbid = UUID.fromString("mbid to locate release");
fm.last.musicbrainz.coverart.CoverArt coverArt = client.getByMbid(mbid);
Theoretically, I assume I could use the data collected by musicbrainzws2-java to trigger a search and then use the MBID from the result, but that cannot be the best way to do it.
I would be happy about any push in the right direction.
Cheers,
Ed.
You don't calculate the MBID. The MBID is attached on every entity you retrieve from MusicBrainz.
When getting releases by DiscID you get a list. Each entry is a release and has an MBID, accessible with getId():
for (ReleaseWs2 rel : disc.getReleases()){
log.info("MBID: " + rel.getId() + ", String: " + rel.toString());
}
You then probably want to try the CoverArtArchive (CAA) for every release and take the first cover art you get.
Unfortunately I don't know of any API documentation for musicbrainzws2 on the web. I recommend running javadoc on all source files.
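The "try every release and take the first cover art" step could be sketched as below. This is only an illustration: FirstCoverArt and firstAvailable are hypothetical names, and the lookup parameter stands in for a call such as client.getByMbid(mbid). It assumes the lookup returns null when a release has no cover art, which should be checked against the client you use.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

final class FirstCoverArt {
    /**
     * Tries each release MBID in order and returns the first non-null result.
     * `lookup` is a stand-in for the real cover art client call (assumption).
     */
    static <T> Optional<T> firstAvailable(List<UUID> mbids, Function<UUID, T> lookup) {
        for (UUID mbid : mbids) {
            T art = lookup.apply(mbid);
            if (art != null) {
                return Optional.of(art);
            }
        }
        return Optional.empty();
    }
}
```

The list of MBIDs would come from the releases returned for the disc, converting each rel.getId() string with UUID.fromString.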

Java code to get the size of a BusinessObjects Report

I want to get the size of an SAP BusinessObjects report through Java code. My main concern is the size of the data we fetch through query, e.g., select si_id,si_name, si_size from ci_infoobjects where si_kind='webi'
How do I get the size of a particular report?
As Hardik Patel stated in the comments, once you have an IInfoObject:
IFiles ifiles = object.getFiles();
IFile boFile = null;
long reportSize=0;
for (int k=0; k<ifiles.size(); k++)
{
boFile = (IFile) ifiles.get(k);
System.out.println("Size : " + boFile.getSize());
reportSize += boFile.getSize();
}
System.out.println("Size : " + reportSize);

Get OS-level system information

I'm currently building a Java app that could end up being run on many different platforms, but primarily variants of Solaris, Linux and Windows.
Has anyone been able to successfully extract information such as the current disk space used, CPU utilisation and memory used in the underlying OS? What about just what the Java app itself is consuming?
Preferably I'd like to get this information without using JNI.
You can get some limited memory information from the Runtime class. It really isn't exactly what you are looking for, but I thought I would provide it for the sake of completeness. Here is a small example. Edit: You can also get disk usage information from the java.io.File class. The disk space usage stuff requires Java 1.6 or higher.
public class Main {
public static void main(String[] args) {
/* Total number of processors or cores available to the JVM */
System.out.println("Available processors (cores): " +
Runtime.getRuntime().availableProcessors());
/* Total amount of free memory available to the JVM */
System.out.println("Free memory (bytes): " +
Runtime.getRuntime().freeMemory());
/* This will return Long.MAX_VALUE if there is no preset limit */
long maxMemory = Runtime.getRuntime().maxMemory();
/* Maximum amount of memory the JVM will attempt to use */
System.out.println("Maximum memory (bytes): " +
(maxMemory == Long.MAX_VALUE ? "no limit" : maxMemory));
/* Total memory currently available to the JVM */
System.out.println("Total memory available to JVM (bytes): " +
Runtime.getRuntime().totalMemory());
/* Get a list of all filesystem roots on this system */
File[] roots = File.listRoots();
/* For each filesystem root, print some info */
for (File root : roots) {
System.out.println("File system root: " + root.getAbsolutePath());
System.out.println("Total space (bytes): " + root.getTotalSpace());
System.out.println("Free space (bytes): " + root.getFreeSpace());
System.out.println("Usable space (bytes): " + root.getUsableSpace());
}
}
}
The java.lang.management package does give you a whole lot more info than Runtime - for example it will give you heap memory (ManagementFactory.getMemoryMXBean().getHeapMemoryUsage()) separate from non-heap memory (ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage()).
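A minimal sketch of reading both memory areas follows. MemoryReport and describe are illustrative names, not part of any library; the MXBean calls themselves are from the standard java.lang.management package.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

final class MemoryReport {
    /** Formats used/committed/max for one memory area; max is -1 when undefined. */
    static String describe(String label, MemoryUsage usage) {
        return String.format("%s: used=%d committed=%d max=%d (bytes)",
                label, usage.getUsed(), usage.getCommitted(), usage.getMax());
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("Heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("Non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```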
You can also get process CPU usage (without writing your own JNI code), but you need to cast the java.lang.management.OperatingSystemMXBean to a com.sun.management.OperatingSystemMXBean. This works on Windows and Linux, I haven't tested it elsewhere.
For example ... call the getCpuUsage() method more frequently to get more accurate readings.
public class PerformanceMonitor {
private int availableProcessors = getOperatingSystemMXBean().getAvailableProcessors();
private long lastSystemTime = 0;
private long lastProcessCpuTime = 0;
public synchronized double getCpuUsage()
{
if ( lastSystemTime == 0 )
{
baselineCounters();
return 0; // no baseline yet, so no usage can be computed on the first call
}
long systemTime = System.nanoTime();
long processCpuTime = 0;
if ( getOperatingSystemMXBean() instanceof OperatingSystemMXBean )
{
processCpuTime = ( (OperatingSystemMXBean) getOperatingSystemMXBean() ).getProcessCpuTime();
}
double cpuUsage = (double) ( processCpuTime - lastProcessCpuTime ) / ( systemTime - lastSystemTime );
lastSystemTime = systemTime;
lastProcessCpuTime = processCpuTime;
return cpuUsage / availableProcessors;
}
private void baselineCounters()
{
lastSystemTime = System.nanoTime();
if ( getOperatingSystemMXBean() instanceof OperatingSystemMXBean )
{
lastProcessCpuTime = ( (OperatingSystemMXBean) getOperatingSystemMXBean() ).getProcessCpuTime();
}
}
}
I think the best method out there is to use the SIGAR API by Hyperic. It works for most of the major operating systems (darn near anything modern) and is very easy to work with. The developers are very responsive on their forum and mailing lists. I also like that it is Apache licensed. They provide a ton of examples in Java too!
SIGAR == System Information, Gathering And Reporting tool.
There's a Java project that uses JNA (so no native libraries to install) and is in active development. It currently supports Linux, OSX, Windows, Solaris and FreeBSD and provides RAM, CPU, Battery and file system information.
https://github.com/oshi/oshi
For windows I went this way.
com.sun.management.OperatingSystemMXBean os = (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
long physicalMemorySize = os.getTotalPhysicalMemorySize();
long freePhysicalMemory = os.getFreePhysicalMemorySize();
long freeSwapSize = os.getFreeSwapSpaceSize();
long commitedVirtualMemorySize = os.getCommittedVirtualMemorySize();
Here is the link with details.
You can get some system-level information by using System.getenv(), passing the relevant environment variable name as a parameter. For example, on Windows:
System.getenv("PROCESSOR_IDENTIFIER")
System.getenv("PROCESSOR_ARCHITECTURE")
System.getenv("PROCESSOR_ARCHITEW6432")
System.getenv("NUMBER_OF_PROCESSORS")
For other operating systems the presence/absence and names of the relevant environment variables will differ.
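Because a variable may be absent on a given platform, a small null-safe helper can be useful. The sketch below is illustrative only: EnvInfo and envOrDefault are made-up names, and the "n/a" fallback is an arbitrary choice.

```java
final class EnvInfo {
    /** Returns the variable's value, or the fallback when it is not set. */
    static String envOrDefault(String name, String fallback) {
        String value = System.getenv(name);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Windows-specific variable from the list above; usually absent elsewhere.
        System.out.println("CPU count (env): " + envOrDefault("NUMBER_OF_PROCESSORS", "n/a"));
        // System properties are a more portable source for basic OS facts.
        System.out.println("OS name: " + System.getProperty("os.name"));
    }
}
```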
Add OSHI dependency via maven:
<dependency>
<groupId>com.github.dblock</groupId>
<artifactId>oshi-core</artifactId>
<version>2.2</version>
</dependency>
Get a battery capacity left in percentage:
SystemInfo si = new SystemInfo();
HardwareAbstractionLayer hal = si.getHardware();
for (PowerSource pSource : hal.getPowerSources()) {
System.out.println(String.format("%n %s # %.1f%%", pSource.getName(), pSource.getRemainingCapacity() * 100d));
}
Have a look at the APIs available in the java.lang.management package. For example:
OperatingSystemMXBean.getSystemLoadAverage()
ThreadMXBean.getCurrentThreadCpuTime()
ThreadMXBean.getCurrentThreadUserTime()
There are loads of other useful things in there as well.
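A short sketch of calling these three methods is shown below. ManagementSnapshot and currentThreadCpuNanos are illustrative names; the guard for unsupported platforms is an addition and not part of the original answer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

final class ManagementSnapshot {
    /** CPU time of the calling thread in nanoseconds, or -1 if unsupported or disabled. */
    static long currentThreadCpuNanos() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.isCurrentThreadCpuTimeSupported() ? threads.getCurrentThreadCpuTime() : -1L;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Negative when the platform does not provide a load average.
        System.out.println("1-minute load average: " + os.getSystemLoadAverage());

        System.out.println("Thread CPU time (ns): " + currentThreadCpuNanos());
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (threads.isCurrentThreadCpuTimeSupported()) {
            // User-mode portion of the same thread's CPU time.
            System.out.println("Thread user time (ns): " + threads.getCurrentThreadUserTime());
        }
    }
}
```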
Usually, to get low level OS information you can call OS specific commands which give you the information you want with Runtime.exec() or read files such as /proc/* in Linux.
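As an illustration of the file-reading option, the sketch below reads /proc/loadavg directly instead of spawning a process. It is Linux-only and assumes the usual layout where the first three fields are the 1, 5 and 15 minute load averages. ProcLoadAvg and parse are illustrative names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ProcLoadAvg {
    /** Parses the first three fields of a /proc/loadavg line. */
    static double[] parse(String line) {
        String[] fields = line.trim().split("\\s+");
        return new double[] {
            Double.parseDouble(fields[0]),
            Double.parseDouble(fields[1]),
            Double.parseDouble(fields[2])
        };
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/loadavg");
        if (!Files.isReadable(path)) {
            System.out.println("/proc/loadavg is not available on this system");
            return;
        }
        String line = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        double[] load = parse(line);
        System.out.printf("Load averages: %.2f %.2f %.2f%n", load[0], load[1], load[2]);
    }
}
```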
CPU usage isn't straightforward. java.lang.management via com.sun.management.OperatingSystemMXBean.getProcessCpuTime comes close (see Patrick's excellent code snippet above), but note that it only gives access to the time the CPU spent in your process. It won't tell you about CPU time spent in other processes, or even CPU time spent doing system activities related to your process.
For instance, I have a network-intensive Java process: it's the only thing running and the CPU is at 99%, but only 55% of that is reported as "processor CPU".
Don't even get me started on "load average", as it's next to useless despite being the only CPU-related item on the MX bean. If only Sun in their occasional wisdom exposed something like "getTotalCpuTime"...
For serious CPU monitoring, SIGAR, mentioned by Matt, seems the best bet.
On Windows, you can run the systeminfo command and retrieve its output, for instance with the following code:
private static class WindowsSystemInformation
{
static String get() throws IOException
{
Runtime runtime = Runtime.getRuntime();
Process process = runtime.exec("systeminfo");
BufferedReader systemInformationReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
StringBuilder stringBuilder = new StringBuilder();
String line;
while ((line = systemInformationReader.readLine()) != null)
{
stringBuilder.append(line);
stringBuilder.append(System.lineSeparator());
}
return stringBuilder.toString().trim();
}
}
If you are using the JRockit VM, here is another way of getting VM CPU usage. The Runtime bean can also give you CPU load per processor. I have used this only on Red Hat Linux to observe Tomcat performance. You have to enable JMX remote in catalina.sh for this to work.
JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://my.tomcat.host:8080/jmxrmi");
JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
MBeanServerConnection conn = jmxc.getMBeanServerConnection();
ObjectName name = new ObjectName("oracle.jrockit.management:type=Runtime");
Double jvmCpuLoad =(Double)conn.getAttribute(name, "VMGeneratedCPULoad");
It is still under development, but you can already use jHardware.
It is a simple library that scrapes system data using Java. It works on both Linux and Windows.
ProcessorInfo info = HardwareInfo.getProcessorInfo();
//Get named info
System.out.println("Cache size: " + info.getCacheSize());
System.out.println("Family: " + info.getFamily());
System.out.println("Speed (Mhz): " + info.getMhz());
//[...]
Here is one simple way to get OS-level information, which I tested on my Mac and which works well:
OperatingSystemMXBean osBean =
(OperatingSystemMXBean)ManagementFactory.getOperatingSystemMXBean();
return osBean.getProcessCpuLoad();
You can find many relevant metrics of the operating system here
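A self-contained variant of the snippet above might look like the following. It assumes a JVM that ships the com.sun.management extension, as HotSpot-based JDKs do, and guards the cast for JVMs that do not. ProcessCpuLoad and read are illustrative names.

```java
import java.lang.management.ManagementFactory;

final class ProcessCpuLoad {
    /**
     * Returns the JVM's recent CPU load between 0.0 and 1.0, or a negative value
     * when the figure is unavailable or the bean is not the com.sun variant.
     */
    static double read() {
        java.lang.management.OperatingSystemMXBean bean =
                ManagementFactory.getOperatingSystemMXBean();
        if (bean instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) bean).getProcessCpuLoad();
        }
        return -1.0;
    }

    public static void main(String[] args) {
        System.out.println("Process CPU load: " + read());
    }
}
```

The first reading after JVM start may be negative or zero, so polling the value periodically tends to give more meaningful numbers.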
To get the system load average for the last 1, 5 and 15 minutes from Java code, you can execute the command cat /proc/loadavg and interpret its output as below:
Runtime runtime = Runtime.getRuntime();
BufferedReader br = new BufferedReader(
new InputStreamReader(runtime.exec("cat /proc/loadavg").getInputStream()));
String avgLine = br.readLine();
System.out.println(avgLine);
List<String> avgLineList = Arrays.asList(avgLine.split("\\s+"));
System.out.println(avgLineList);
System.out.println("Average load 1 minute : " + avgLineList.get(0));
System.out.println("Average load 5 minutes : " + avgLineList.get(1));
System.out.println("Average load 15 minutes : " + avgLineList.get(2));
To get the physical system memory, execute the command free -m and interpret its output as below:
Runtime runtime = Runtime.getRuntime();
BufferedReader br = new BufferedReader(
new InputStreamReader(runtime.exec("free -m").getInputStream()));
String line;
String memLine = "";
int index = 0;
while ((line = br.readLine()) != null) {
if (index == 1) {
memLine = line;
}
index++;
}
// total used free shared buff/cache available
// Mem: 15933 3153 9683 310 3097 12148
// Swap: 3814 0 3814
List<String> memInfoList = Arrays.asList(memLine.split("\\s+"));
int totalSystemMemory = Integer.parseInt(memInfoList.get(1));
int totalSystemUsedMemory = Integer.parseInt(memInfoList.get(2));
int totalSystemFreeMemory = Integer.parseInt(memInfoList.get(3));
System.out.println("Total system memory in mb: " + totalSystemMemory);
System.out.println("Total system used memory in mb: " + totalSystemUsedMemory);
System.out.println("Total system free memory in mb: " + totalSystemFreeMemory);
You can do this with Java/COM integration. By accessing WMI features you can get all of this information.
Not exactly what you asked for, but I'd recommend checking out ArchUtils and SystemUtils from commons-lang3. These also contain some relevant helper facilities, e.g.:
import static org.apache.commons.lang3.ArchUtils.*;
import static org.apache.commons.lang3.SystemUtils.*;
System.out.printf("OS architecture: %s\n", OS_ARCH); // OS architecture: amd64
System.out.printf("OS name: %s\n", OS_NAME); // OS name: Linux
System.out.printf("OS version: %s\n", OS_VERSION); // OS version: 5.18.16-200.fc36.x86_64
System.out.printf("Is Linux? - %b\n", IS_OS_LINUX); // Is Linux? - true
System.out.printf("Is Mac? - %b\n", IS_OS_MAC); // Is Mac? - false
System.out.printf("Is Windows? - %b\n", IS_OS_WINDOWS); // Is Windows? - false
System.out.printf("JVM name: %s\n", JAVA_VM_NAME); // JVM name: Java HotSpot(TM) 64-Bit Server VM
System.out.printf("JVM vendor: %s\n", JAVA_VM_VENDOR); // JVM vendor: Oracle Corporation
System.out.printf("JVM version: %s\n", JAVA_VM_VERSION); // JVM version: 11.0.12+8-LTS-237
System.out.printf("Username: %s\n", getUserName()); // Username: johndoe
System.out.printf("Hostname: %s\n", getHostName()); // Hostname: garage-pc
var processor = getProcessor();
System.out.printf("CPU arch: %s\n", processor.getArch()); // CPU arch: BIT_64
System.out.printf("CPU type: %s\n", processor.getType()); // CPU type: X86
