I'm very new to WALA and trying to work through some simple examples to get a feel for it. I'm trying to build a call graph for the very simple class below
public class Example {
    public static void main(String[] args) {
        int x = 1;
        int y = 1;
        int z = x + y;
        Math.pow(x, y); // issue here
    }
}
My WALA code (simplified somewhat) is:
import com.ibm.wala.ipa.callgraph.*;
import com.ibm.wala.ipa.callgraph.impl.Util;
import com.ibm.wala.ipa.cha.ClassHierarchy;
import com.ibm.wala.util.WalaException;
import com.ibm.wala.util.config.AnalysisScopeReader;
...
AnalysisScope scope = AnalysisScopeReader.makeJavaBinaryAnalysisScope(jar, null);
ClassHierarchy cha = ClassHierarchy.make(scope);
Iterable<Entrypoint> entryPoints = Util.makeMainEntrypoints(scope, cha);
AnalysisOptions opts = new AnalysisOptions(scope, entryPoints);
AnalysisCache cache = new AnalysisCache();
CallGraphBuilder cgBuilder = Util.makeZeroCFABuilder(opts, cache, cha, scope);
CallGraph cg = cgBuilder.makeCallGraph(opts, null);
It works fine when the example doesn't have any calls to other methods inside main, but otherwise it just hangs (stuck in cgBuilder.makeCallGraph).
Any advice is much appreciated.
Here are some options that might help make your run a bit faster.
1) Consider disabling reflection handling in your analysis options. This is not great for more complex code, but for a basic example it might help. You can do so with:
options.setReflectionOptions(ReflectionOptions.NONE);
2) Try using a different builder, for example:
ZeroXCFABuilder.make(cha, options, cache, null, null,
ZeroXInstanceKeys.ALLOCATIONS | ZeroXInstanceKeys.CONSTANT_SPECIFIC);
There are more options, so check ZeroXInstanceKeys to see which ones you might want to use.
3) Finally, and this is probably going to give you the biggest speedup: add exclusions.
String exclusionFile = p.getProperty("exclusions");
AnalysisScope scope = AnalysisScopeReader.makeJavaBinaryAnalysisScope(appJar,
        exclusionFile != null ? new File(exclusionFile) : null);
Please note the regex structure of an exclusion file:
java\/awt\/.*
javax\/swing\/.*
sun\/awt\/.*
sun\/swing\/.*
com\/sun\/.*
sun\/.*
No spaces, single entry per line, etc.
This should help.
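For reference, here is a rough sketch that puts all three suggestions together, mirroring the API used in your snippet (appJar is your jar path, and exclusions.txt is an assumed file name following the format above):
import java.io.File;
import com.ibm.wala.ipa.callgraph.*;
import com.ibm.wala.ipa.callgraph.AnalysisOptions.ReflectionOptions;
import com.ibm.wala.ipa.callgraph.impl.Util;
import com.ibm.wala.ipa.cha.ClassHierarchy;
import com.ibm.wala.util.config.AnalysisScopeReader;
...
// Exclusions keep JDK GUI/internal classes out of the analysis scope.
AnalysisScope scope = AnalysisScopeReader.makeJavaBinaryAnalysisScope(appJar, new File("exclusions.txt"));
ClassHierarchy cha = ClassHierarchy.make(scope);
Iterable<Entrypoint> entryPoints = Util.makeMainEntrypoints(scope, cha);
AnalysisOptions options = new AnalysisOptions(scope, entryPoints);
// 1) no reflection handling for this toy example
options.setReflectionOptions(ReflectionOptions.NONE);
AnalysisCache cache = new AnalysisCache();
// 2) the plain 0-CFA builder; swap in ZeroXCFABuilder.make(...) if you want to experiment
CallGraphBuilder cgBuilder = Util.makeZeroCFABuilder(options, cache, cha, scope);
CallGraph cg = cgBuilder.makeCallGraph(options, null);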
I am having some trouble using the Groovy TemplateEngines in Java without running into OOM. When creating a lot of different templates, it seems to me that a lot of scripts are created on the heap, which are then never garbage collected.
I use Java 8. When running this code with -Xmx32M, about 3000 iterations are possible. After that an OOM error is thrown.
Here is my code:
import groovy.text.SimpleTemplateEngine;
import groovy.text.Template;
import groovy.text.TemplateEngine;

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) throws Exception {
        String groovy = "XX-${i}";
        for (int i = 0; i < (1000000000); i++) {
            TemplateEngine e = new SimpleTemplateEngine();
            Template t = e.createTemplate(groovy);
            Map<String, Object> binding = new HashMap<>();
            binding.put("i", i);
            String res = t.make(binding).toString();
            if (i % 100 == 0) {
                System.out.println("->" + res);
            }
        }
    }
}
I also tried different variations and ClassLoaders, but in essence the results are always the same. As I can't find any open issues about this, I guess I am missing something.
Could anyone enlighten me?
Tino
Here is your problem: https://bugs.openjdk.java.net/browse/JDK-8037342.
Each time the parser runs, it creates a new unique class based on the number of parses done so far. For instance, after a while the class names look like:
groovy.runtime.metaclass.SimpleTemplateScript4237MetaClass
groovy.runtime.metaclass.SimpleTemplateScript4238MetaClass
After a while the ClassLoader's parallelLockMap fills the heap and nothing is eligible to be GC'd. It's sort of like an OOM PermGen error.
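One way to sidestep this in the loop above, if (as in the sample code) the template text itself does not change, is to parse the template once and only vary the binding. This is just a sketch of that idea, not a fix for the underlying JDK issue:
import groovy.text.SimpleTemplateEngine;
import groovy.text.Template;

import java.util.HashMap;
import java.util.Map;

public class TestReuse {
    public static void main(String[] args) throws Exception {
        // Parse the template once; only one SimpleTemplateScript class is generated.
        Template t = new SimpleTemplateEngine().createTemplate("XX-${i}");
        for (int i = 0; i < 1_000_000_000; i++) {
            Map<String, Object> binding = new HashMap<>();
            binding.put("i", i);
            String res = t.make(binding).toString();
            if (i % 100 == 0) {
                System.out.println("->" + res);
            }
        }
    }
}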
Use Apache Commons Text. It is a fast and efficient alternative to SimpleTemplateEngine.
Map<String, Object> binding = new HashMap<>();
binding.put("i", 1);
String templateString = "XX-${i}";
StrSubstitutor sb = new StrSubstitutor(binding);
String value = sb.replace(templateString);
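For reference, here is a self-contained sketch of the question's loop rewritten with Commons Text. It assumes the commons-text dependency is on the classpath; StringSubstitutor is the current name of StrSubstitutor in that library:
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.text.StringSubstitutor;

public class TestCommonsText {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000_000; i++) {
            Map<String, Object> binding = new HashMap<>();
            binding.put("i", i);
            // Plain string interpolation: no script classes are generated per iteration.
            String res = new StringSubstitutor(binding).replace("XX-${i}");
            if (i % 100 == 0) {
                System.out.println("->" + res);
            }
        }
    }
}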
I had been struggling with that problem for a while and came up with the following workaround: just call clear after running your script.
https://gist.github.com/jpozorio/38f26120e6346dfd74cecd7a147028aa
I'm writing a multithreaded Java optimization algorithm that launches several instances of the same class in parallel, to improve running time. This class itself uses other classes.
The algorithm searches through the search space for an optimal solution by means of random moves. So, if I run several instances of it, I should take advantage of my system's cores and improve the search by widening the search space.
I've noticed that the first instance runs well, but the others seem to share the running objects of the first one, picking up the information they hold, even after it has finished.
That's not what I want; I want each instance to be isolated from the others.
I'm using Executor Services:
Code:
ExecutorService executorService = Executors.newCachedThreadPool();
ExecutorCompletionService<float[][]> service = new ExecutorCompletionService<float[][]>(executorService);
IteratedGreedy[] ig = new IteratedGreedy[instances];
Future<float[][]>[] future = new Future[instances];
float[][][] solutions = new float[instances][][];

// launching instances:
for (int i = 0; i < instances; i++) {
    String path = "\\" + i + ".txt";
    ig[i] = new IteratedGreedy(path);
    future[i] = service.submit(ig[i]);
}

// retrieving solutions:
for (int i = 0; i < instances; i++) {
    solutions[i] = future[i].get();
}
As you may guess, the IteratedGreedy class uses its own classes internally.
Any help is appreciated.
The problem is that, somewhere in the code, there's a class with a global static variable:
static float[][] matrix;
And then a method uses it:
void someMethod() {
    int f = matrix[i][j];
}
The solution is to change the way the method obtains the matrix: pass it in (or keep it as a non-static instance field) instead of reading the shared static field:
float[][] matrix; // no longer static

void someMethod(float[][] matrix) {
    int f = matrix[i][j];
}
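In other words, each IteratedGreedy instance should own its own data. A minimal sketch of that idea follows; the field and the loadMatrix helper here are hypothetical, not from the original code:
import java.util.concurrent.Callable;

public class IteratedGreedy implements Callable<float[][]> {
    // Instance field: every IteratedGreedy gets its own matrix, nothing is shared between threads.
    private final float[][] matrix;

    public IteratedGreedy(String path) {
        this.matrix = loadMatrix(path); // hypothetical loader for this instance's data
    }

    @Override
    public float[][] call() {
        // ... run the search using this.matrix only ...
        return matrix;
    }

    private float[][] loadMatrix(String path) {
        // read the file at 'path' and build the matrix (placeholder)
        return new float[0][0];
    }
}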
I have this constructor:
public Revaluator(File model, PrintStream ps) {
    modelFile = model;
    rsession = Rsession.newInstanceTry(ps, null);
    rsession.eval("library(e1071)");
    rsession.load(modelFile);
}
I want to load a model and predict with it.
The problem is that Rsession.newInstanceTry(ps, null) always returns the same session, so if I load another model, like:
Revaluator re1=new Revaluator(new File("model1.RData"),System.out);
Revaluator re2=new Revaluator(new File("model2.RData"),System.out);
both re1 and re2 end up using the same model, since the variable name is model and only the last one loaded survives.
The evaluate function:
public REXP evaluate(Object[] arr) {
    // J2Rarray just creates a string from the array, like "1,2,true,'hello',false"
    String expression = String.format("predict(model, c(%s))", J2Rarray(arr));
    REXP ans = rsession.eval(expression);
    return ans;
}
I need to load about 250 predictors. Is there a way to get every instance of Rsession as a new, separate R session?
You haven't pasted all of your code in your question, so before trying the (complicated) way below, please rule out the simple causes and make sure that your fields modelFile and rsession are not declared static :-)
If they are not:
It seems that the way R sessions are created is OS dependent.
On Unix it relies on the multi-session ability of R itself; on Windows it starts with port 6311 and checks whether it is still free. If it's not, the port is incremented and checked again, and so on.
Maybe something goes wrong when checking for free ports (which OS are you working on?).
You could try to configure the ports manually and explicitly start different local R servers like this:
// 'p' here is the PrintStream you want to log to (e.g. the one passed to your constructor).
Logger simpleLogger = new Logger() {
    public void println(String string, Level level) {
        if (level == Level.WARNING) {
            p.print("! ");
        } else if (level == Level.ERROR) {
            p.print("!! ");
        }
        p.println(string);
    }

    public void close() {
        p.close();
    }
};
RserverConf serverConf = new RserverConf(null, staticPortCounter++, null, null, null);
Rdaemon server = new Rdaemon(serverConf, this);
server.start(null);
rsession = Rsession.newInstanceTry(serverConf);
If that does not work, please show more code of your Revaluator class and give details about which OS you are running on. Also, there should be several log outputs (at least if the log level is configured accordingly). Please paste the logged messages as well.
Maybe it could also help to get the source code of rsession from Google Code and use a debugger to set a breakpoint in Rsession.begin(). Maybe that can help figure out what goes wrong.
I'm thinking about using HBase as a source for one of my MapReduce jobs. I know that TableInputFormat specifies one input split (and thus one mapper) per Region. However, this seems inefficient. I'd really like to have multiple mappers working on a given Region at once. Can I achieve this by extending TableInputFormatBase? Can you please point me to an example? Furthermore, is this even a good idea?
Thanks for the help.
You need a custom input format that extends InputFormat. You can get an idea of how to do this from the answer to the question "I want to scan lots of data (range based queries), what all optimizations I can do while writing the data so that scan becomes faster". This is a good idea if the data processing time is much greater than the data retrieval time.
Not sure if you can specify multiple mappers for a given region, but consider the following:
If you think one mapper per region is inefficient (maybe your data nodes don't have enough resources, like number of CPUs), you can perhaps specify smaller region sizes in the file hbase-site.xml (see the example at the end of this answer).
Here's the page with the default config options if you want to look into changing that:
http://hbase.apache.org/configuration.html#hbase_default_configurations
Please note that by making the region size small, you will be increasing the number of files in your DFS, and this can limit the capacity of your Hadoop DFS depending on the memory of your namenode. Remember, the namenode's memory usage is directly related to the number of files in your DFS. This may or may not be relevant to your situation, as I do not know how your cluster is being used. There is never a silver-bullet answer to these questions!
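If you do go down that path, the property that controls the maximum region size is hbase.hregion.max.filesize. A rough example for hbase-site.xml; the 1 GB value is only an illustration and should be tuned for your cluster:
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- a region is split once its store files grow past this size (in bytes) -->
  <value>1073741824</value>
</property>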
1. It's absolutely fine; just make sure the key sets are mutually exclusive between the mappers.
2. Make sure you aren't creating too many clients, as this may lead to a lot of GC, since the HBase block cache churns during HBase reads.
Using this MultipleScanTableInputFormat, you can use the MultipleScanTableInputFormat.PARTITIONS_PER_REGION_SERVER configuration to control how many mappers should execute against a single region server. The class will group all the input splits by their location (region server), and the RecordReader will properly iterate through all the aggregated splits for the mapper.
Here is the example:
https://gist.github.com/bbeaudreault/9788499#file-multiplescantableinputformat-java-L90
This is how it creates the aggregated splits for a single mapper:
private List<InputSplit> getAggregatedSplits(JobContext context) throws IOException {
    final List<InputSplit> aggregatedSplits = new ArrayList<InputSplit>();
    final Scan scan = getScan();

    for (int i = 0; i < startRows.size(); i++) {
        scan.setStartRow(startRows.get(i));
        scan.setStopRow(stopRows.get(i));
        setScan(scan);
        aggregatedSplits.addAll(super.getSplits(context));
    }

    // set the state back to where it was..
    scan.setStopRow(null);
    scan.setStartRow(null);
    setScan(scan);

    return aggregatedSplits;
}
And this is how it partitions the splits by region server:
@Override
public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> source = getAggregatedSplits(context);
    if (!partitionByRegionServer) {
        return source;
    }

    // Partition by regionserver
    Multimap<String, TableSplit> partitioned = ArrayListMultimap.<String, TableSplit>create();
    for (InputSplit split : source) {
        TableSplit cast = (TableSplit) split;
        String rs = cast.getRegionLocation();
        partitioned.put(rs, cast);
    }
    // ... (see the gist above for the rest of the method)
This would be useful if you want to scan large regions (hundreds of millions of rows) with a conditional scan that finds only a few records. It will prevent a ScannerTimeoutException.
package org.apache.hadoop.hbase.mapreduce;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class RegionSplitTableInputFormat extends TableInputFormat {

    public static final String REGION_SPLIT = "region.split";

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
        Configuration conf = context.getConfiguration();
        int regionSplitCount = conf.getInt(REGION_SPLIT, 0);
        List<InputSplit> superSplits = super.getSplits(context);
        if (regionSplitCount <= 0) {
            return superSplits;
        }

        List<InputSplit> splits = new ArrayList<InputSplit>(superSplits.size() * regionSplitCount);

        for (InputSplit inputSplit : superSplits) {
            TableSplit tableSplit = (TableSplit) inputSplit;
            System.out.println("splitting by " + regionSplitCount + " " + tableSplit);

            byte[] startRow0 = tableSplit.getStartRow();
            byte[] endRow0 = tableSplit.getEndRow();
            boolean discardLastSplit = false;
            if (endRow0.length == 0) {
                endRow0 = new byte[startRow0.length];
                Arrays.fill(endRow0, (byte) 255);
                discardLastSplit = true;
            }
            byte[][] split = Bytes.split(startRow0, endRow0, regionSplitCount);
            if (discardLastSplit) {
                split[split.length - 1] = new byte[0];
            }
            for (int regionSplit = 0; regionSplit < split.length - 1; regionSplit++) {
                byte[] startRow = split[regionSplit];
                byte[] endRow = split[regionSplit + 1];
                TableSplit newSplit = new TableSplit(tableSplit.getTableName(), startRow, endRow,
                        tableSplit.getLocations()[0]);
                splits.add(newSplit);
            }
        }

        return splits;
    }
}
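A rough sketch of how you might wire this input format into your job setup; the table name and split count below are just example values:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

// inside your job-setup method:
Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, "my_table");        // example table name
conf.setInt(RegionSplitTableInputFormat.REGION_SPLIT, 4);  // split each region into 4 mapper ranges

Job job = Job.getInstance(conf, "conditional-scan");
job.setInputFormatClass(RegionSplitTableInputFormat.class);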
I have a problem to solve using FSTs.
Basically, I'm building a morphological parser, and at the moment I have to work with large transducers. Performance is the big issue here.
Recently I worked in C++ on other projects where performance matters, but now I'm considering Java, because of Java's benefits and because Java is getting better.
I studied some comparisons between Java and C++, but I cannot decide which language I should use for this specific problem, because it depends on the library in use.
I can't find much information about Java libraries, so my question is: are there any open source Java libraries with good performance, like the RWTH FSA Toolkit, which an article I read says is the fastest C++ library?
Thanks all.
What are the "benefits" of Java, for your purposes? What specific problem does that platform solve that you need? What is the performance constraint you must consider? Were the "comparisons" fair, because Java is actually extremely difficult to benchmark. So is C++, but you can at least get some algorithmic boundary guarantees from STL.
I suggest you look at OpenFst and the AT&T finite-state transducer tools. There are others out there, but I think your worry about Java puts the cart before the horse-- focus on what solves your problem well.
Good luck!
http://jautomata.sourceforge.net/ and http://www.cs.duke.edu/csed/jflap/ are Java-based finite state machine libraries, although I don't have experience using them, so I cannot comment on their efficiency.
I'm one of the developers of the morfologik-stemming library. It's pure Java and its performance is very good, both when you build the automaton and when you use it. We use it for morphological analysis in LanguageTool.
The problem here is the minimum size of your objects in Java. In C++, without virtual methods and run-time type identification, your objects weigh exactly as much as their content. And the time your automata take to manipulate memory has a big impact on performance.
I think that should be the main reason for choosing C++ over Java.
OpenFST is a C++ finite state transducer framework that is really comprehensive. Some people from CMU ported it to Java for use in their natural language processing.
There is a blog post series describing it.
The code is located on SVN.
Update:
I ported it to Java here
Lucene has an excellent implementation of FST, which is easy to use and high performance; it is what lets query engines like Elasticsearch and Solr deliver very fast, sub-second term-based queries. Let me give an example:
import com.google.common.base.Preconditions;
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.GrowableByteArrayDataOutput;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

import java.io.IOException;

public class T {
    private final String inputValues[] = {"cat", "dog", "dogs"};
    private final long outputValues[] = {5, 7, 12};

    // https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/fst/package-summary.html
    public static void main(String[] args) throws IOException {
        T t = new T();
        FST<Long> fst = t.buildFSTInMemory();
        System.out.println(String.format("memory used for fst is %d bytes", fst.ramBytesUsed()));
        t.searchFST(fst);

        byte[] bytes = t.serialize(fst);
        System.out.println(String.format("length of serialized fst is %d bytes", bytes.length));

        fst = t.deserialize(bytes);
        t.searchFST(fst);
    }

    private FST<Long> buildFSTInMemory() throws IOException {
        // Input values (keys). These must be provided to Builder in Unicode sorted order!
        // Use Collections.sort() to sort inputValues first.
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
        BytesRef scratchBytes = new BytesRef();
        IntsRefBuilder scratchInts = new IntsRefBuilder();
        for (int i = 0; i < inputValues.length; i++) {
            // scratchBytes.copyChars(inputValues[i]);
            scratchBytes.bytes = inputValues[i].getBytes();
            scratchBytes.offset = 0;
            scratchBytes.length = inputValues[i].length();
            builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
        }
        FST<Long> fst = builder.finish();
        return fst;
    }

    private FST<Long> deserialize(byte[] bytes) throws IOException {
        DataInput in = new ByteArrayDataInput(bytes);
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        FST<Long> fst = new FST<Long>(in, outputs);
        return fst;
    }

    private byte[] serialize(FST<Long> fst) throws IOException {
        final int capacity = 32;
        GrowableByteArrayDataOutput out = new GrowableByteArrayDataOutput(capacity);
        fst.save(out);
        return out.getBytes();
    }

    private void searchFST(FST<Long> fst) throws IOException {
        for (int i = 0; i < inputValues.length; i++) {
            Long value = Util.get(fst, new BytesRef(inputValues[i]));
            Preconditions.checkState(value == outputValues[i], "fatal error");
        }
    }
}