Java heap analysis with OQL: count unique strings

I'm doing a memory analysis of an existing Java application. Is there an equivalent of SQL's 'group by' in OQL, so that I can count objects that have the same value but are different instances?
select count(*)
from java.lang.String s
group by s.toString()
I'd like to get a list of duplicated strings along with the number of duplicates. The purpose is to find the strings with the highest counts so that they can be optimized using String.intern().
Example:
"foo" 100
"bar" 99
"lazy fox" 50
etc...

The following is based on the answer by Peter Dolberg and can be used in the VisualVM OQL Console:
var counts={};
var alreadyReturned={};
filter(
sort(
map(heap.objects("java.lang.String"),
function(heapString){
if( ! counts[heapString.toString()]){
counts[heapString.toString()] = 1;
} else {
counts[heapString.toString()] = counts[heapString.toString()] + 1;
}
return { string:heapString.toString(), count:counts[heapString.toString()]};
}),
'lhs.count < rhs.count'),
function(countObject) {
if( ! alreadyReturned[countObject.string]){
alreadyReturned[countObject.string] = true;
return true;
} else {
return false;
}
}
);
It starts with a map() call over all String instances. For each String it creates or increments an entry in the counts object and returns an object with a string and a count field.
The resulting array will contain one entry for each String instance, each having a count value one larger than the previous entry for the same String.
The result is then sorted on the count field and the result looks something like this:
{
count = 1028.0,
string = *null*
}
{
count = 1027.0,
string = *null*
}
{
count = 1026.0,
string = *null*
}
...
(in my test the String "*null*" was the most common).
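The last step is to filter this using a function that returns true only for the first occurrence of each String. Because the list is sorted by descending count, that first occurrence carries the total for the String. The filter uses the alreadyReturned object to keep track of which Strings have already been included.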
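For readers who want to see the same logic outside the OQL console, the count-then-sort idea can be sketched in plain Java. This is only an illustration over an in-memory list and does not read a heap dump; the class name DuplicateStringCounter and its method are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateStringCounter {

    // Groups equal values together and returns (value, count) entries,
    // most frequent first. Hypothetical helper, not part of any OQL API.
    static List<Map.Entry<String, Integer>> countDuplicates(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            // Create the entry with 1, or add 1 to the existing count.
            counts.merge(value, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("foo", "bar", "foo", "lazy fox", "foo", "bar");
        for (Map.Entry<String, Integer> e : countDuplicates(sample)) {
            System.out.println("\"" + e.getKey() + "\" " + e.getValue());
        }
    }
}
```

Unlike the OQL version, this keeps one entry per distinct value from the start, so no separate de-duplication step is needed.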

I would use Eclipse Memory Analyzer instead.

Sadly, there isn't an equivalent to "group by" in OQL. I'm assuming you're talking about the OQL that is used in jhat and VisualVM.
There is an alternative, though. If you use pure JavaScript syntax instead of the "select x from y" syntax then you have the full power of JavaScript to work with.
Even so, the alternative way of getting the information you're looking for isn't simple. For example, here's an OQL "query" that will perform the same task as your query:
var set={};
sum(map(heap.objects("java.lang.String"),function(heapString){
if(set[heapString.toString()]){
return 0;
}
else{
set[heapString.toString()]=true;
return 1;
}
}));
In this example a regular JavaScript object mimics a set (a collection with no duplicates). As the map function goes through each string, the set is used to determine whether the string has already been seen. Duplicates don't count toward the total (return 0) but new strings do (return 1).

A far more efficient query:
var countByValue = {};
// Scroll the strings
heap.forEachObject(
function(strObject) {
var key = strObject.toString();
var count = countByValue[key];
countByValue[key] = count ? count + 1 : 1;
},
"java.lang.String",
false
);
// Transform the map into array
var mapEntries = [];
for (var i = 0, keys = Object.keys(countByValue), total = keys.length; i < total; i++) {
mapEntries.push({
count : countByValue[keys[i]],
string : keys[i]
});
}
// Sort the counts
sort(mapEntries, 'rhs.count - lhs.count');

I'm posting my solution to a similar problem, along with what I learned on the way, for others' reference.
var counts = {};
var alreadyReturned = {};
top(
filter(
sort(
map(heap.objects("java.lang.ref.Finalizer"),
function (fobject) {
var className = classof(fobject.referent)
if (!counts[className]) {
counts[className] = 1;
} else {
counts[className] = counts[className] + 1;
}
return {string: className, count: counts[className]};
}),
'rhs.count-lhs.count'),
function (countObject) {
if (!alreadyReturned[countObject.string]) {
alreadyReturned[countObject.string] = true;
return true;
} else {
return false;
}
}),
"rhs.count > lhs.count", 10);
The previous code will output the top 10 classes referenced by java.lang.ref.Finalizer.
Tips:
1. Passing a function to sort (instead of an expression string) did NOT work on my Mac.
2. The classof function returns the class of the referent. (I first tried fobject.referent.toString(), which returned a lot of org.netbeans.lib.profiler.heap.InstanceDump values. This cost me a lot of time.)

Method 1
You can select all the strings and then use the terminal to aggregate them.
Increase the OQL limit in the VisualVM config files
Restart VisualVM
Run an OQL query to get all the strings
Copy and paste them into vim
clean the data with vim macros so there's
sort | uniq -c to get the counts.
Method 2
Use a tool to dump all the fields of the class you're interested in (https://github.com/josephmate/DumpHprofFields can do it)
Use bash to select the strings you're interested in
Use bash to aggregate

Related

Removing an input from a recursive method

Good morning! I received a problem statement asking for a method that returns all possible combinations of the characters of an input String, e.g.
if ABC is passed then it returns [A, AB, BC, ABC, AC, B, C]
if ABCD is passed then it returns [A, AB, BC, CD, ABC, AC, ACD, B, BCD, BD, ABD, AD, C, D, ABCD]
This means AB and BA are always treated as the same, and ABC, BAC and ACB are also the same.
I ended up writing the code below, and it seems to work (though I'm not sure).
public static Set<String> getAnyPermutations(String s,String strInput) {
Set<String> resultSet = new HashSet<>();
char[] inp = strInput.toCharArray();
for(int i=0; i<inp.length; i++) {
String temp =s+String.valueOf(inp[i]);
resultSet.add(temp);
if(i+1<=inp.length)
resultSet.addAll(getAnyPermutations(temp, String.valueOf(Arrays.copyOfRange(inp, i+1, inp.length))));
}
return resultSet;
}
My question is: I want to remove the first parameter (String s) from the method, since it is used for internal computations only. If that is not possible, I want to make sure that the user always passes "" or that I can reset it to "" for the first (non-recursive) call of this method. I am confused about how to do that inside a recursive function.
Also, please add a comment if you think it can fail in any other situation.
Condition: everything has to be done inside this function only; no other method can be created.
All has to be done inside this function only, no other function can be created.
Then you can't do it. The function has no (reasonable)* way of knowing whether it called itself or was called by another function.
There are lots of solutions involving creating another function. One that might fit your requirements, depending on how they're actually expressed, would be to have the function define a lambda to do the work, and have the lambda call itself. E.g., getAnyPermutations wouldn't actually be recursive, it would contain a recursive function.
But that may be out of bounds depending on the exact meaning of the quote above, since the lambda is another function, just not one that can be accessed from the outside.
* The unreasonable way is by examining a stack trace, which you can get from Thread.currentThread().getStackTrace().
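One way to read the suggestion above is to keep a single public method and put the recursion in a helper declared inside it. The sketch below is an assumption about how that could look. A Java lambda cannot refer to itself through the local variable it is being assigned to, so the sketch uses a local class (Helper, a made-up name) instead of a lambda; the class name Combinations is also hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class Combinations {

    // Single entry point; the prefix parameter is hidden inside the method.
    static Set<String> getAnyPermutations(String strInput) {
        Set<String> resultSet = new HashSet<>();

        // Local class standing in for the "function that calls itself".
        // It is invisible outside getAnyPermutations.
        class Helper {
            void collect(String prefix, String rest) {
                for (int i = 0; i < rest.length(); i++) {
                    String temp = prefix + rest.charAt(i);
                    resultSet.add(temp);
                    // Recurse on the characters after position i only,
                    // so AB is produced but BA is not.
                    collect(temp, rest.substring(i + 1));
                }
            }
        }

        new Helper().collect("", strInput);
        return resultSet;
    }
}
```

Whether this satisfies the "no other function" condition depends on how strictly it is meant, since Helper.collect is still a second function, just one that cannot be called from outside.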
You can always transform a recursive method into its iterative equivalent - e.g. see
Way to go from recursion to iteration.
In the iterative version it's easy to not expose the state parameter (you now just need to initialize it at the beginning of the iterative method).
This is not very practical in general (but I believe that the purpose of the question is more theoretical, otherwise it's always a good solution to just expose another method).
Furthermore, in this particular situation you might consider this simple iterative approach (though it is not obtained by directly translating the given code):
public static Set<String> getAnyPermutations(String strInput) {
Set<String> resultSet = new HashSet<>();
char[] inp = strInput.toCharArray();
for (int bitMask = 0; bitMask < (1 << inp.length); bitMask++) {
StringBuilder str = new StringBuilder();
for (int i = 0; i < inp.length; i++) {
if ((bitMask & (1 << i)) != 0) {
str.append(inp[i]);
}
}
if (str.length() > 0) {
resultSet.add(str.toString());
}
}
return resultSet;
}
You can change the current method to be a private one and interface it with a public method with one argument e.g.:
private static Set<String> getAnyPermutations(String s,String strInput) {
Set<String> resultSet = new HashSet<>();
char[] inp = strInput.toCharArray();
for(int i=0; i<inp.length; i++){
String temp =s+String.valueOf(inp[i]);
resultSet.add(temp);
if(i+1<=inp.length)
resultSet.addAll(getAnyPermutations(temp, String.valueOf(Arrays.copyOfRange(inp, i+1, inp.length))));
}
return resultSet;
}
Now, you can expose a one argument method to the user which in turn will call the above method, e.g.:
public static Set<String> getAnyPermutations(String strInput) {
return getAnyPermutations("", strInput);
}
Update
If you can't create any other method at all then the only alternative would be to use var-args. However, that requires a change in the implementation and doesn't actually prevent the user from passing multiple values.
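A minimal sketch of the var-args idea, assuming the prefix is carried in an optional trailing argument. The class name VarArgsCombinations is made up for the example, and the body is adapted from the question's code rather than copied from it.

```java
import java.util.HashSet;
import java.util.Set;

final class VarArgsCombinations {

    // The optional second argument carries the prefix during recursion.
    // Callers are expected to pass only strInput, but nothing enforces that.
    static Set<String> getAnyPermutations(String strInput, String... prefix) {
        String s = prefix.length > 0 ? prefix[0] : "";
        Set<String> resultSet = new HashSet<>();
        for (int i = 0; i < strInput.length(); i++) {
            String temp = s + strInput.charAt(i);
            resultSet.add(temp);
            // Recursive call passes the accumulated prefix explicitly.
            resultSet.addAll(getAnyPermutations(strInput.substring(i + 1), temp));
        }
        return resultSet;
    }
}
```

A call such as getAnyPermutations("ABC") starts with an empty prefix, while a call such as getAnyPermutations("ABC", "X") still compiles, which is the weakness mentioned above.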
You can rewrite this particular algorithm so that it doesn't need to carry a state through to the recursively called invocation.
(Java-centric pseudocode):
Set<String> getAnyPermutations(String str) {
if(str.length() == 0) {
return Collections.emptySet();
}
String head = str.substring(0,1);
String tail = str.substring(1);
Set<String> permutationsOfTail = getAnyPermutations(tail);
Set<String> result = new HashSet<>();
// Head on its own
// For input 'ABC', adds 'A'
result.add(head);
// All permutations that do not contain head
// For input 'ABC', adds 'B', 'C', 'BC'
result.addAll(permutationsOfTail);
// All permutations that contain head along with some other elements
// For input 'ABC', adds 'AB', 'AC', 'ABC'
for(String tailPerm : permutationsOfTail) {
result.add(head + tailPerm);
}
return result;
}
This meets your aim of not creating any extra methods -- but note that it would be cleaner code if the for loop was extracted into a new method Set<String> prefixEachMember(String prefix, Set<String> strings) allowing result.addAll(prefixEachMember(head,permutationsOfTail)).
However it's not always possible to do this, and sometimes you do want to carry state. One way is the way you've asked to avoid, but I'm going to include it in my answer because it's a clean and common way of achieving the aim.
public Foo myMethod(Bar input) {
return myMethod(new HashSet<Baz>(), input);
}
private Foo myMethod(Set<Baz> state, Bar input) {
if(...) {
return ...;
} else {
...
return myMethod(..., ...);
}
}
Here, the first method is your public API, in which the collector/state parameter is not required. The second method is a private worker method, which you initially call with an empty state object.
Another option is to refer to an object field. I would recommend against this, however, because it gets confusing when recursive code refers to a global object.

How do you write a counter method to save space and time?

I am working with interface Multiset which is then used by two different classes: ArrayListMultiset and CounterMultiset
The ArrayListMultiset simply uses the .add method to put something in the list. So in a loop like,
Multiset<String> set = new ArrayListMultiset<String>();
for(int i = 0; i < 10000; i++)
{
set.add("Hello");
}
this will cause the program to add Hello to a list 10,000 times.
Next we have CounterMultiset. It stores Pair objects (another class that takes (T, Integer), where T is the String "Hello" and Integer is the number of times it has been added). I have written it like so:
public void add(Multiset<T> item)
{
if(!contains(item))
{
Pair newpair = new Pair(item, 0);
pairs.add(newpair);
}
for(int i = 0; i < pairs.size(); i++)
{
if(pairs.get(i).getFirst() == item)
{
pairs.get(i).changeSecond();
}
}
}
changeSecond() increments the second number in the Object by 1 to show that the word Hello has tried to be added again.
My question is, is this an appropriate way to save space and time for a program? When would it be faster to use a Counter and when would it be faster to simply add "Hello" 10,000 times?
Hello is an interned string in your code.
You will not have a copy of Hello for each element of ArrayListMultiset; each element will be a reference to the same String pool object.
Which is faster for get/put depends (I assume) on the underlying data structures.
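To make the data-structure point concrete: a counter backed by a hash map stores each distinct element once and finds it in roughly constant time, whereas scanning a list of pairs costs time proportional to the number of distinct elements on every add. The sketch below assumes a HashMap is acceptable as the underlying structure. The class HashCounterMultiset is hypothetical and does not implement the asker's Multiset interface.

```java
import java.util.HashMap;
import java.util.Map;

final class HashCounterMultiset<T> {
    private final Map<T, Integer> counts = new HashMap<>();

    // Adds one occurrence of item; lookup uses equals()/hashCode(), not ==.
    void add(T item) {
        counts.merge(item, 1, Integer::sum);
    }

    // Number of times item has been added (0 if never).
    int count(T item) {
        return counts.getOrDefault(item, 0);
    }

    // Number of distinct elements stored.
    int distinctSize() {
        return counts.size();
    }
}
```

Adding "Hello" 10,000 times to this structure leaves one map entry with a count of 10,000, while a list-based multiset would hold 10,000 references.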

New to Java, why doesn't this code work? i++ is 'dead code', and the function does not return a variable of desired type; even though it does

I am making a little program in Java that acts like a "library", only with video games. In the program you should be able to add, delete and edit games; you should also be able to list all the games in the "library".
To be able to delete and edit games, I have decided to implement a function that will return a list of all the elements in the list that matches the query String that I give it, and then the user will have to choose between a numbered list of all the returned results.
This is my code:
public static ArrayList<GameStorage> findElement(ArrayList<GameStorage> gameList, String query) {
ArrayList<GameStorage> temp = new ArrayList<GameStorage>();
for(int i = 0; i < gameList.size(); i++) {
if(gameList.get(i).getName().contains(query)) {
temp.add(gameList.get(i));
}
return temp;
}
}
I initialize an empty GameStorage ArrayList, and use this to store all the desired elements and then return it. However, this does not work at all and Eclipse says that the i++ part is supposedly 'dead code' (and this supposedly means that the code never is reached), the function also says that I do not return a result of the desired type ArrayList<GameStorage>, even though I do. I don't know what I've done wrong.
Could someone perhaps enlighten me?
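The return should come after the loop, not be the last statement inside the loop body. Because it is inside the body, the method returns during the first iteration and i++ is never reached. And if the list is empty the loop body never runs, so the compiler sees a path with no return at all. Change it like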
for(int i = 0; i < gameList.size(); i++) {
if(gameList.get(i).getName().contains(query)) {
temp.add(gameList.get(i));
}
}
return temp;
You could also use a for-each loop like
for (GameStorage gs : gameList) {
if (gs.getName().contains(query)) {
temp.add(gs);
}
}
return temp;
And in Java 8+ you might implement the entire method [1] with a filter and Collectors
public static List<GameStorage> findElement(List<GameStorage> gameList, String query) {
return gameList.stream().filter(x -> x.getName().contains(query))
.collect(Collectors.toList());
}
[1] And I would prefer to program to the List interface.
You can also make your code shorter with a Java 8+ lambda, for example:
gameList.forEach(k -> {
if (k.getName().contains(query)) {
temp.add(k);
}
});

How to return multiple conditions?

What should I return at the end of my for loop? I'm trying to display the totals for all three parties (numDemocrats, numRepblican and numIndepent) by
calculating and then printing the number of democrats (party is "D"),
republicans (party is "R"), and independents (party is anything else).
I'm currently looping over the MemberOfCongress ArrayList returned by parseMembersOfCongress and counting up how many of each party type there are.
Also in my loop I need to check which party the current member belongs to and increment the proper variable. After the loop completes I then print the totals.
public void printPartyBreakdownInSenate()
{
CongressDataFetcher.fetchSenateData(congressNum);
}
{
ArrayList<MemberOfCongress> parseMembersOfCongress; String jsonString;
}
{
System.out.println("Number of Members of the Senate of the " + "&congressNum=" + "?chamber=");
}
public String[]
{
int numDemocrats = 0;
int numRepblican = 0;
int numIndepent = 0;
ArrayList<MemberOfCongress> members;
for (MemberOfCongress memberParty : members) {
if (memberParty.getParty() == "D" ) {
numDemocrats++;
}
else if (memberParty.getParty() == "R" ){
numRepblican++;
}
else if (memberParty.getParty() == "null"){
numIndepent++;
}
}
return ???;
}
Firstly, I'm 99% positive you cannot return multiple values unless you return an array, an ArrayList or a map.
But as a workaround you could do one of the following.
1). Return a String array of party members.
2). Return a 2D array mapping name to age or something similar.
3). Return a hashmap of the data with a custom class of information mapped to a name.
4). Use getters to get different pieces of the data at time or all at once.
Java (like the majority of programming languages) allows only a single return value from a method. There are lots of good reasons for this.
If you need to return multiple values then you will need a separate class for which your method can return a reference to an instance.
For example, in your case:
public enum Party {
REPUBLICAN, DEMOCRAT, OTHER;
}
// Assumes MemberOfCongress.getParty() returns a Party; Collectors.counting() yields Long values.
public Map<Party, Long> senatorsByParty(List<MemberOfCongress> senators) {
return senators.stream()
.collect(Collectors.groupingBy(MemberOfCongress::getParty, Collectors.counting()));
}
Apologies if you are not familiar with the Java 8 syntax here. The stream functions are really just saying 'take all the senators, group them by party and then count them'. The key point is that you are returning a map from each party to the number of senators in it.
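The same idea can be written as a plain loop. The sketch below assumes the party arrives as a String code such as "D" or "R", as in the question, and uses a list of codes as a stand-in for calling getParty() on each member. The class PartyBreakdown and its methods are made up for the example. Note that Strings are compared with equals() here, because == compares object references rather than contents.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class PartyBreakdown {

    enum Party { DEMOCRAT, REPUBLICAN, OTHER }

    // Maps a raw party code to the enum; anything other than "D" or "R"
    // (including null) counts as OTHER.
    static Party fromCode(String code) {
        if ("D".equals(code)) return Party.DEMOCRAT;
        if ("R".equals(code)) return Party.REPUBLICAN;
        return Party.OTHER;
    }

    // Counts codes per party; every party appears in the result, possibly with 0.
    static Map<Party, Integer> countByParty(List<String> partyCodes) {
        Map<Party, Integer> counts = new EnumMap<>(Party.class);
        for (Party p : Party.values()) {
            counts.put(p, 0);
        }
        for (String code : partyCodes) {
            counts.merge(fromCode(code), 1, Integer::sum);
        }
        return counts;
    }
}
```

The caller gets all three totals back in one return value and can print them with, for example, counts.get(PartyBreakdown.Party.DEMOCRAT).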

Exponentially increasing amounts of time to repeat a function

I've written my own math parser and for some reason it takes increasing amounts of time to parse when I tried to profile the parser.
For testing I used this input: Cmd.NUM_9,Cmd.NUM_0,Cmd.NUM_0,Cmd.DIV,Cmd.NUM_2,Cmd.ADD,Cmd.NUM_6,Cmd.MULT,Cmd.NUM_3
Single execution ~1.7ms
3000 repeats ~ 1,360ms
6000 repeats ~ 5,290ms
9000 repeats ~11,800ms
The profiler says 64% of the time was spent on this function:
This is my function to allow implicit multiplication.
private void enableImplicitMultiplication(ArrayList<Cmd> input) {
int input_size = input.size();
if (input_size<2) return;
for (int i=0; i<input_size; i++) {
Cmd cmd = input.get(i);
if (i>0) {
Cmd last = input.get(i-1);
// [EXPR1, EXPR2] => [EXPR1, MULT, EXPR2]
boolean criteria1 = Cmd.exprnCmds.contains(cmd) && Cmd.exprnCmds.contains(last);
// [CBRAC, OBRAC] => [CBRAC, MULT, OBRAC]
// [NUM_X, OBRAC] => [NUM_X, MULT, OBRAC]
boolean criteria2 = cmd==Cmd.OBRAC && (last==Cmd.CBRAC || Cmd.constantCmds.contains(last));
// [CBRAC, NUM_X] => [CBRAC, MULT, NUM_X]
boolean criteria3 = last==Cmd.CBRAC && Cmd.constantCmds.contains(cmd);
if (criteria1 || criteria2 || criteria3) {
input.add(i++, Cmd.MULT);
}
}
}
}
What's going on here??
I executed the repeats like this:
public static void main(String[] args) {
Cmd[] iArray = {
Cmd.NUM_9,Cmd.NUM_0,Cmd.NUM_0,Cmd.DIV,Cmd.NUM_2,Cmd.ADD,Cmd.NUM_6,Cmd.MULT,Cmd.NUM_3
};
ArrayList<Cmd> inputArray = new ArrayList<Cmd>(Arrays.asList(iArray));
DirtyExpressionParser parser = null;
int repeat=9000;
double starttime = System.nanoTime();
for (int i=0; i<repeat; i++) {
parser = new DirtyExpressionParser(inputArray);
}
double endtime = System.nanoTime();
System.out.printf("Duration: %.2f ms%n",(endtime-starttime)/1000000);
System.out.println(parser.getResult());
}
Constructor looks like this:
public DirtyExpressionParser(ArrayList<Cmd> inputArray) {
enableImplicitMultiplication(inputArray); //executed once for each repeat
splitOnBrackets(inputArray); //resolves inputArray into Expr objects for each bracket-group
for (Expr expr:exprArray) {
mergeAndSolve(expr);
}
}
Your microbenchmark code is altogether wrong: microbenchmarking on the JVM is a craft in its own right and is best left to dedicated tools such as jmh or Google Caliper. You don't warm up the code, don't control for GC pauses, and so on.
One detail which does come out by analyzing your code is this:
you reuse the same ArrayList for all repetitions of the function call;
each function call may insert an element into the list;
insertion is a heavyweight operation on ArrayList: the whole contents of the list after the inserted element must be copied.
You should at least create a fresh ArrayList for each invocation, but that will not make your whole methodology correct.
From our discussion in the comments I diagnose the following issue you may have with understanding your code:
In Java there is no such thing as a variable whose value is an object. The value of the variable is a reference to the object. Therefore when you say new DirtyExpressionParser(inputArray), the constructor does not receive its own private copy of the list, but rather a reference to the one and only ArrayList you have instantiated in your main method. The next constructor call gets this same list, but now modified by the earlier invocation. This is why your list is growing all the time.
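The effect can be reproduced in isolation. The following sketch uses a made-up method, insertMarker, as a stand-in for a constructor that mutates its argument; it is not the asker's parser. It compares passing the same list on every call with passing a fresh copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedListDemo {

    // Stand-in for code that inserts into the list it receives (hypothetical).
    static void insertMarker(List<String> input) {
        input.add(1, "MULT");
    }

    // Passes the same list object every time and returns its final size.
    static int sharedSize(int repeats) {
        List<String> input = new ArrayList<>(Arrays.asList("9", "0", "0"));
        for (int i = 0; i < repeats; i++) {
            insertMarker(input); // same object each time, so it keeps growing
        }
        return input.size();
    }

    // Passes a fresh copy every time; the original list never changes.
    static int copiedSize(int repeats) {
        List<String> input = new ArrayList<>(Arrays.asList("9", "0", "0"));
        for (int i = 0; i < repeats; i++) {
            insertMarker(new ArrayList<>(input)); // callee mutates only the copy
        }
        return input.size();
    }
}
```

With the shared list, the size grows by one per call, so later calls work on ever longer input. With the copy, each call sees the original three elements.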
