I am having some trouble using the Groovy TemplateEngines in Java without running into OOM errors. When creating a lot of different templates, it seems to me that a lot of script classes are created on the heap, which are then never garbage collected.
I use Java 8. When running this code with -Xmx32M, about 3000 iterations are possible; after that an OOM error is thrown.
Here is my code:
import groovy.text.SimpleTemplateEngine;
import groovy.text.Template;
import groovy.text.TemplateEngine;

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) throws Exception {
        String groovy = "XX-${i}";
        for (int i = 0; i < 1000000000; i++) {
            TemplateEngine e = new SimpleTemplateEngine();
            Template t = e.createTemplate(groovy);
            Map<String, Object> binding = new HashMap<>();
            binding.put("i", i);
            String res = t.make(binding).toString();
            if (i % 100 == 0) {
                System.out.println("->" + res);
            }
        }
    }
}
I also tried different variations and ClassLoaders, but in essence the results are always the same. As I can't find any current issues about this, I guess I am missing something.
Could anyone enlighten me?
Tino
Here is your problem: https://bugs.openjdk.java.net/browse/JDK-8037342.
Each time the parser runs, it creates a new unique class named after the number of parses done so far. For instance, after a while the class names look like
groovy.runtime.metaclass.SimpleTemplateScript4237MetaClass
groovy.runtime.metaclass.SimpleTemplateScript4238MetaClass
After a while the ClassLoader's parallelLockMap will fill the heap and nothing is eligible to be GC'd. It's sort of like an OOM PermGen error.
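If the template text itself does not change between iterations, one workaround (a minimal sketch, not from the bug report) is to compile the template once and reuse it with fresh bindings, so no new script classes are generated per iteration:

import groovy.text.SimpleTemplateEngine;
import groovy.text.Template;

import java.util.HashMap;
import java.util.Map;

public class ReuseTemplate {
    public static void main(String[] args) throws Exception {
        // Compile once: only one SimpleTemplateScript class is generated.
        Template t = new SimpleTemplateEngine().createTemplate("XX-${i}");
        for (int i = 0; i < 1000000000; i++) {
            Map<String, Object> binding = new HashMap<>();
            binding.put("i", i);
            String res = t.make(binding).toString(); // reuse the compiled template
            if (i % 100 == 0) {
                System.out.println("->" + res);
            }
        }
    }
}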
Use Apache Commons Text. It is a fast and efficient alternative to SimpleTemplateEngine.
Map<String, Object> binding = new HashMap<>();
binding.put("i", 42); // example value
StrSubstitutor sb = new StrSubstitutor(binding); // org.apache.commons.text.StrSubstitutor
String value = sb.replace("XX-${i}"); // "XX-42"
I had been struggling with that problem for a while, and came up with the following workaround.
Just call clear after running your script:
https://gist.github.com/jpozorio/38f26120e6346dfd74cecd7a147028aa
I have a method that starts creating JSON files in each of the folders in my tree.
public static void fill(List<String> subFoldersPaths) {
    for (int i = 0; i < subFoldersPaths.size(); i++) {
        String fullFileName = subFoldersPaths.get(i) + FILE_NAME;
        String formatFullFileName = String.format(fullFileName, i) + "%d";
        Runnable runnable = new JsonCreator(formatFullFileName);
        new Thread(runnable).start();
    }
}
List<String> subFoldersPaths is a list that contains paths to each folder in order.
Here is my folder structure:
I want each folder to be filled with files in a separate thread, one file every 0.08 seconds. But my class does not fill every folder.
Here is a class that implements Runnable, which should perform the filling:
import com.epam.lab.model.Author;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import net.andreinc.mockneat.MockNeat;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.FileWriter;
import java.io.IOException;

public class JsonCreator implements Runnable {
    private static Logger logger = LogManager.getLogger();
    private static String fileName;
    private static final int FILES_COUNT = 100;

    public JsonCreator(String s) {
        this.fileName = s;
    }

    @Override
    public void run() {
        for (int i = 0; i < FILES_COUNT; i++) {
            try {
                String formatFullFileName = String.format(fileName, i) + ".json";
                FileWriter fileWriter = new FileWriter(formatFullFileName);
                fileWriter.write(createJsonString());
                fileWriter.close();
                Thread.sleep(80);
            } catch (IOException | InterruptedException e) {
                logger.error("File was not created", e);
            }
        }
    }

    private static String createJsonString() {
        MockNeat mockNeat = MockNeat.threadLocal();
        Gson gson = new GsonBuilder()
                .setPrettyPrinting()
                .create();
        String json = mockNeat
                .reflect(Author.class)
                .field("authorName", mockNeat.names().first())
                .field("authorSurname", mockNeat.names().last())
                .map(gson::toJson)
                .val();
        return json;
    }
}
But this class does not fill every folder with files (maybe there is a problem with the file names); I cannot figure it out.
And I want each folder below "foo" to be filled, in a separate thread, with FILES_COUNT JSON files.
Some examples of the algorithm's execution: the folder structure is created randomly, so it is almost always different, but this does not affect the fact that files are not created in all folders.
Your code is buggy; you cannot ever use that FileWriter constructor. Use new FileWriter(formatFullFileName, StandardCharsets.UTF_8), which is only in JDK 11. If you're not on JDK 11, you can't use FileWriter at all (it uses the platform default encoding, and that is not acceptable; JSON must be in UTF-8 as per the JSON spec, and you have no guarantee that UTF-8 is your platform default).
You aren't guarding your FileWriter with an ARM block (try-with-resources); you should add that.
In the initial block, formatFullFileName is a variable that holds a format string. In the run() method, it's the opposite: there it holds the result of running a String.format op on one. That makes your code very hard to read.
Most likely your file names are incorrect. You should be using List<Path>, which would have removed any doubt. If your List<String> subFoldersPaths contains, for example, /home/misnomer/project/foo/1stLayerSubFolder0, and the constant FILE_NAME (which you did not include in your paste) is, say, example, then the path for the very first file to be created becomes /home/misnomer/project/foo/1stLayerSubFolder0example0.json, which is not what you wanted: you're missing a slash.
NB: If using the newer Path API, writing a string to a file becomes vastly simpler: Files.writeString(path, string) (JDK 11+) is all you need. Note that the Files API defaults to UTF-8, unlike most other parts of the Java libraries that involve turning strings into bytes or vice versa.
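For illustration, a minimal sketch of the corrected write loop under these suggestions (JDK 11+; the "example" base name is a stand-in, since the FILE_NAME constant was not included in the paste):

import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: resolving the file name against the folder's Path makes a missing
// slash impossible, and the Files API defaults to UTF-8.
static void fillFolder(Path folder) throws Exception {
    for (int i = 0; i < FILES_COUNT; i++) {
        Path file = folder.resolve("example" + i + ".json"); // "example" is a stand-in
        Files.writeString(file, createJsonString());
        Thread.sleep(80);
    }
}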
The paste needs more info, or you should debug this on your own: print when you write a file, preferably including the thread name (you can get it with Thread.currentThread().getName()). That's how programming works: you don't just stare at it, go "heck, I dunno, better ask Stack Overflow!" and then give up. You debug it. Use a debugger, or if you can't or don't want to, use the poor man's debugger: add a whole bunch of System.out.println statements. Go through your code and imagine (write it down if you have to) what each step is doing. Then add a println statement that confirms this. The very place where what the program says it is doing does not match what you thought it would do? That's where a bug is. Fix it, and keep going until all bugs are eliminated.
I'm very new to WALA and am trying to work through some simple examples to get a feel for it. I'm trying to build a call graph for the very simple class below:
public class Example {
    public static void main(String[] args) {
        int x = 1;
        int y = 1;
        int z = x + y;
        Math.pow(x, y); // issue here
    }
}
My WALA code (simplified somewhat) is:
import com.ibm.wala.ipa.callgraph.*;
import com.ibm.wala.ipa.callgraph.impl.Util;
import com.ibm.wala.ipa.cha.ClassHierarchy;
import com.ibm.wala.util.WalaException;
import com.ibm.wala.util.config.AnalysisScopeReader;
...
AnalysisScope scope = AnalysisScopeReader.makeJavaBinaryAnalysisScope(jar, null);
ClassHierarchy cha = ClassHierarchy.make(scope);
Iterable<Entrypoint> entryPoints = Util.makeMainEntrypoints(scope, cha);
AnalysisOptions opts = new AnalysisOptions(scope, entryPoints);
AnalysisCache cache = new AnalysisCache();
CallGraphBuilder cgBuilder = Util.makeZeroCFABuilder(opts, cache, cha, scope);
CallGraph cg = cgBuilder.makeCallGraph(opts, null);
It works fine when the example doesn't have any calls to other methods inside main, but otherwise it just hangs (stuck in cgBuilder.makeCallGraph).
Any advice is much appreciated.
Here are some options that might help make your run a bit faster.
1) Consider removing the reflectionOptions from your analysis options. This will not be great for more complex code, but for the basic example it might help. You can do so with:
options.setReflectionOptions(ReflectionOptions.NONE);
2) Try using a different builder, for example:
ZeroXCFABuilder.make(cha, options, cache, null, null,
ZeroXInstanceKeys.ALLOCATIONS | ZeroXInstanceKeys.CONSTANT_SPECIFIC);
There are more options, so check ZeroXInstanceKeys to see which options you might be willing to use.
3) Finally, and this is probably going to give you a good run time, add exclusions:
String exclusionFile = p.getProperty("exclusions");
AnalysisScope scope = AnalysisScopeReader.makeJavaBinaryAnalysisScope(appJar,
        exclusionFile != null ? new File(exclusionFile) : null);
Please note the following regex structure of an exclusion file:
java\/awt\/.*
javax\/swing\/.*
sun\/awt\/.*
sun\/swing\/.*
com\/sun\/.*
sun\/.*
No spaces, single entry per line, etc.
This should help.
I've found a bottleneck in my app that keeps growing as the data in my files grows (see the attached VisualVM screenshot below).
Below is the getFileContentsAsList code. How can this be made better performance-wise? I've read several posts on efficient file I/O, and some have suggested Scanner as a way to read efficiently from a file. I've also tried Apache Commons' readFileToString, but that isn't fast either.
The data file that's causing the app to run slower is 8 KB... that doesn't seem too big to me.
I could convert to an embedded database like Apache Derby if that seems like a better route. Ultimately I'm looking for whatever will help the application run faster (it's a Java 1.7 Swing app, BTW).
Here's the code for getFileContentsAsList:
public static List<String> getFileContentsAsList(String filePath) throws IOException {
    if (ReceiptPrinterStringUtils.isNullOrEmpty(filePath)) throw new IllegalArgumentException("File path must not be null or empty");
    Scanner s = null;
    List<String> records = new ArrayList<String>();
    try {
        s = new Scanner(new BufferedReader(new FileReader(filePath)));
        s.useDelimiter(FileDelimiters.RECORD);
        while (s.hasNext()) {
            records.add(s.next());
        }
    } finally {
        if (s != null) {
            s.close();
        }
    }
    return records;
}
The size of an ArrayList is multiplied by 1.5 when necessary, so the number of resizes is O(log(N)). (Doubling was used in Vector.) I would certainly use an O(1) LinkedList here, and BufferedReader.readLine() rather than a Scanner, if I were trying to speed it up. It's hard to believe that the time to read one 8 KB file is seriously a concern. You can read millions of lines in a second.
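A minimal sketch of that variant, assuming records are newline-delimited (the original code uses a custom FileDelimiters.RECORD delimiter, so adjust as needed):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

// Sketch: one record per line, read with BufferedReader.readLine().
public static List<String> getFileContentsAsList(String filePath) throws IOException {
    List<String> records = new LinkedList<String>();
    try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
        String line;
        while ((line = reader.readLine()) != null) {
            records.add(line);
        }
    }
    return records;
}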
So, file I/O gets to be REALLY expensive if you do it a lot... as seen in my screenshot and original code, getFileContentsAsList, which contains file I/O calls, gets invoked quite a bit (18,425 times). VisualVM is a real gem of a tool for pointing out bottlenecks like these!
After contemplating various ways to improve performance, it dawned on me that possibly the best way is to do file I/O calls as little as possible. So I decided to use private static variables to hold the file contents and to only do file I/O in the static initializer and when a file is written to. As my application is (fortunately) not doing excessive writing (but excessive reading), this makes for a much better performing application.
Here's the source for the entire class that contains the getFileContentsAsList method. I took a snapshot of that method and it now runs in 57.2 ms (down from 3116 ms). Also, it was my longest running method and is now my 4th longest running method. The top 5 longest running methods now run for a total of 498.8 ms, as opposed to the ones in the original screenshot, which ran for a total of 3812.9 ms. That's a percentage decrease of about 87%
[100 * (3812.9 - 498.8) / 3812.9].
package com.mbc.receiptprinter.util;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import org.apache.commons.io.FileUtils;
import com.mbc.receiptprinter.constant.FileDelimiters;
import com.mbc.receiptprinter.constant.FilePaths;
/*
 * Various File utility functions. This class uses the Apache Commons FileUtils class.
 */
public class ReceiptPrinterFileUtils {

    private static Map<String, String> fileContents = new HashMap<String, String>();
    private static Map<String, Boolean> fileHasBeenUpdated = new HashMap<String, Boolean>();

    static {
        for (FilePaths fp : FilePaths.values()) {
            File f = new File(fp.getPath());
            try {
                FileUtils.touch(f);
                fileHasBeenUpdated.put(fp.getPath(), false);
                fileContents.put(fp.getPath(), FileUtils.readFileToString(f));
            } catch (IOException e) {
                ReceiptPrinterLogger.logMessage(ReceiptPrinterFileUtils.class,
                        Level.SEVERE,
                        "IOException while performing FileUtils.touch in static block of ReceiptPrinterFileUtils", e);
            }
        }
    }

    public static String getFileContents(String filePath) throws IOException {
        if (ReceiptPrinterStringUtils.isNullOrEmpty(filePath)) throw new IllegalArgumentException("File path must not be null or empty");
        File f = new File(filePath);
        if (fileHasBeenUpdated.get(filePath)) {
            fileContents.put(filePath, FileUtils.readFileToString(f));
            fileHasBeenUpdated.put(filePath, false);
        }
        return fileContents.get(filePath);
    }

    public static List<String> convertFileContentsToList(String fileContents) {
        List<String> records = new ArrayList<String>();
        if (fileContents.contains(FileDelimiters.RECORD)) {
            records = Arrays.asList(fileContents.split(FileDelimiters.RECORD));
        }
        return records;
    }

    public static void writeStringToFile(String filePath, String data) throws IOException {
        fileHasBeenUpdated.put(filePath, true);
        FileUtils.writeStringToFile(new File(filePath), data);
    }

    public static void writeStringToFile(String filePath, String data, boolean append) throws IOException {
        fileHasBeenUpdated.put(filePath, true);
        FileUtils.writeStringToFile(new File(filePath), data, append);
    }
}
ArrayLists have good read performance, and also good write performance IF the length does not change very often. In your application the length changes very often (the backing array is grown when it is full and an element is added), and your application then needs to copy the contents into a new, longer array.
You could use a LinkedList, where new elements are appended and no copy actions are needed:
List<String> records = new LinkedList<String>();
Or you could initialize the ArrayList with the approximate final number of records. This will reduce the number of copy actions:
List<String> records = new ArrayList<String>(2000);
I have the following code, where I am using superList and subList, and I want to check that subList is actually a sublist of superList.
My objects do not implement the hashCode or equals methods. I have created a similar situation in the test below. When I run the test, the results show a very big performance difference between the JDK collections and Commons Collections. After running the test I am getting the following output:
Time Lapsed with Java Collection API 8953 MilliSeconds & Result is true
Time Lapsed with Commons Collection API 78 MilliSeconds & Result is true
My question is: why are the Java collections so slow in processing the containsAll operation? Am I doing something wrong there? I have no control over the collection types; I am getting them from legacy code. I know that if I used a HashSet for superList I would get big performance gains with the JDK containsAll operation, but unfortunately that is not possible for me.
package com.mycompany.tests;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

import org.apache.commons.collections.CollectionUtils;
import org.junit.Before;
import org.junit.Test;

public class CollectionComparison_UnitTest {

    private Collection<MyClass> superList = new ArrayList<MyClass>();
    private Collection<MyClass> subList = new HashSet<MyClass>(50000);

    @Before
    public void setUp() throws Exception {
        for (int i = 0; i < 50000; i++) {
            MyClass myClass = new MyClass(i + "A String");
            superList.add(myClass);
            subList.add(myClass);
        }
    }

    @Test
    public void testIt() {
        long startTime = System.currentTimeMillis();
        boolean isSubList = superList.containsAll(subList);
        System.out.println("Time Lapsed with Java Collection API "
                + (System.currentTimeMillis() - startTime)
                + " MilliSeconds & Result is " + isSubList);

        startTime = System.currentTimeMillis();
        isSubList = CollectionUtils.isSubCollection(subList, superList);
        System.out.println("Time Lapsed with Commons Collection API "
                + (System.currentTimeMillis() - startTime)
                + " MilliSeconds & Result is " + isSubList);
    }
}

class MyClass {
    String myString;

    MyClass(String myString) {
        this.myString = myString;
    }

    String getMyString() {
        return myString;
    }
}
Different algorithms:
ArrayList.containsAll() offers O(N*N), while CollectionUtils.isSubCollection() offers O(N+N+N).
ArrayList.containsAll is inherited from AbstractCollection.containsAll and is a simple loop checking all elements in a row. Each step is a slow linear search. I don't know how CollectionUtils works, but it's not hard to do it much faster than with the simple loop. Converting the second list to a HashSet is a sure win. Sorting both lists and going through them in parallel could be even better.
EDIT:
The CollectionUtils source code makes it clear. They're converting both collections to "cardinality maps", which is a simple and general way to implement many operations. In some cases it may not be a good idea, e.g., when the first list is empty or very short, you in fact lose time. In your case it's a huge win in comparison to AbstractCollection.containsAll, but you could do even better.
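For illustration, a minimal sketch of the cardinality-map idea (not the actual CollectionUtils source): count how often each element occurs in both collections, then check that the super-collection covers every count.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch: O(N + M) expected time, relying on hashing (identity-based here,
// since MyClass inherits hashCode/equals from Object).
static <T> boolean isSubCollectionSketch(Collection<T> sub, Collection<T> sup) {
    Map<T, Integer> subCounts = new HashMap<T, Integer>();
    for (T t : sub) subCounts.merge(t, 1, Integer::sum);
    Map<T, Integer> supCounts = new HashMap<T, Integer>();
    for (T t : sup) supCounts.merge(t, 1, Integer::sum);
    for (Map.Entry<T, Integer> e : subCounts.entrySet()) {
        if (supCounts.getOrDefault(e.getKey(), 0) < e.getValue()) {
            return false;
        }
    }
    return true;
}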
Addendum years later
The OP wrote
I know if I use HashSet for superList then I would get big performance gains using JDK containsAll operation, but unfortunately that is not possible for me.
and that's wrong. Classes without hashCode and equals inherit them from Object and can be used with a HashSet, and everything works perfectly. Except that each object is then unique, which may be unintended and surprising, but the OP's test superList.containsAll(subList) does exactly the same thing.
So the quick solution would be
new HashSet<>(superList).containsAll(subList)
You should at least try the tests in the opposite order. Your results may very well just show that the JIT compiler is doing its job well :-)
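For example (a sketch): give each code path one untimed pass before measuring, so JIT compilation is not charged to whichever test happens to run first.

// Untimed warm-up pass; discard the results.
superList.containsAll(subList);
CollectionUtils.isSubCollection(subList, superList);
// ...then run the timed measurements as in testIt() above...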
I have a problem to solve using FSTs.
Basically, I'll be writing a morphological parser, and at this moment I have to work with large transducers. Performance is the big issue here.
Recently I worked in C++ on other projects where performance matters, but now I'm considering Java, because of Java's benefits and because Java is getting better.
I studied some comparisons between Java and C++, but I cannot decide which language I should use for this specific problem, because it depends on the library in use.
I can't find much information about Java libraries, so my question is: are there any open-source Java libraries with good performance, like the RWTH FSA Toolkit, which an article I read describes as the fastest C++ library?
Thanks all.
What are the "benefits" of Java, for your purposes? What specific problem does that platform solve that you need? What is the performance constraint you must consider? Were the "comparisons" fair, because Java is actually extremely difficult to benchmark. So is C++, but you can at least get some algorithmic boundary guarantees from STL.
I suggest you look at OpenFst and the AT&T finite-state transducer tools. There are others out there, but I think your worry about Java puts the cart before the horse: focus on what solves your problem well.
Good luck!
http://jautomata.sourceforge.net/ and http://www.cs.duke.edu/csed/jflap/ are Java-based finite state machine libraries, although I don't have experience using them, so I cannot comment on their efficiency.
I'm one of the developers of the morfologik-stemming library. It's pure Java and its performance is very good, both when you build the automaton and when you use it. We use it for morphological analysis in LanguageTool.
The problem here is the minimum size of your objects in Java. In C++, without virtual methods and runtime type identification, your objects weigh exactly their content. And the time your automata take to manipulate memory has a big impact on performance.
I think that would be the main reason for choosing C++ over Java.
OpenFST is a C++ finite state transducer framework that is really comprehensive. Some people from CMU ported it to Java for use in their natural language processing.
A blog post series describing it.
The code is located on svn.
Update:
I ported it to Java here.
Lucene has an excellent implementation of FST, which is easy to use and high performance, making query engines like Elasticsearch and Solr deliver very fast sub-second term-based queries. Let me take an example:
import com.google.common.base.Preconditions;
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.GrowableByteArrayDataOutput;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;
import java.io.IOException;
public class T {

    private final String[] inputValues = {"cat", "dog", "dogs"};
    private final long[] outputValues = {5, 7, 12};

    // https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/fst/package-summary.html
    public static void main(String[] args) throws IOException {
        T t = new T();
        FST<Long> fst = t.buildFSTInMemory();
        System.out.println(String.format("memory used for fst is %d bytes", fst.ramBytesUsed()));
        t.searchFST(fst);
        byte[] bytes = t.serialize(fst);
        System.out.println(String.format("length of serialized fst is %d bytes", bytes.length));
        fst = t.deserialize(bytes);
        t.searchFST(fst);
    }

    private FST<Long> buildFSTInMemory() throws IOException {
        // Input values (keys). These must be provided to Builder in Unicode sorted order!
        // Use Collections.sort() to sort inputValues first.
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
        BytesRef scratchBytes = new BytesRef();
        IntsRefBuilder scratchInts = new IntsRefBuilder();
        for (int i = 0; i < inputValues.length; i++) {
            // scratchBytes.copyChars(inputValues[i]);
            scratchBytes.bytes = inputValues[i].getBytes();
            scratchBytes.offset = 0;
            scratchBytes.length = inputValues[i].length();
            builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
        }
        FST<Long> fst = builder.finish();
        return fst;
    }

    private FST<Long> deserialize(byte[] bytes) throws IOException {
        DataInput in = new ByteArrayDataInput(bytes);
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        FST<Long> fst = new FST<Long>(in, outputs);
        return fst;
    }

    private byte[] serialize(FST<Long> fst) throws IOException {
        final int capacity = 32;
        GrowableByteArrayDataOutput out = new GrowableByteArrayDataOutput(capacity);
        fst.save(out);
        return out.getBytes();
    }

    private void searchFST(FST<Long> fst) throws IOException {
        for (int i = 0; i < inputValues.length; i++) {
            Long value = Util.get(fst, new BytesRef(inputValues[i]));
            Preconditions.checkState(value == outputValues[i], "fatal error");
        }
    }
}