I want to save a random forest regression model in PMML from R and load it in Spark (Scala or Java). Unfortunately, I am running into issues in the second step.
A minimal example of saving a random forest regression model as PMML in R is provided below.
When I try to load this model from Scala or Java using jpmml (see code below), I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: http://www.dmg.org/PMML-4_3
I can overcome this error by editing the xml file: the "xmlns" attribute of the "PMML" tag contains the url that appears in the error message. If I remove the url completely, or change 4_3 to 4_2, this error disappears. However, a new error message appears:
Exception in thread "main" org.jpmml.evaluator.UnsupportedFeatureException (at or around line 19): MiningModel
Do you have any suggestions or ideas on how to solve this specific error or, more generally, how to load in Scala a PMML created in R?
Thank you!
Update: the problem, as answered by @user1808924, was the version of the JPMML library. The code quoted below now works fine once the correct libraries are loaded, for example from the Maven Central Repository:
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-evaluator</artifactId>
<version>1.3.6</version>
</dependency>
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-model</artifactId>
<version>1.3.7</version>
</dependency>
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-spark</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
Minimal example of saving a random forest regression model as PMML in R:
library(randomForest)
library(r2pmml)
data(mtcars)
MPGmodel.rf <- randomForest(mpg~., mtcars, ntree=5, do.trace=1)
# with package "r2pmml", convert model to pmml version 4.3 and save to xml:
r2pmml(MPGmodel.rf, "MPGmodel-r2pmml.pmml")
Loading the model in Scala:
import java.io.File
import org.jpmml.evaluator.Evaluator
import org.jpmml.spark.EvaluatorUtil
val fileNamePmml = "MPGmodel-r2pmml.pmml"
val pmmlFile = new File(fileNamePmml)
// the "UnsupportedFeature MiningModel" error appears here:
val myEvaluator: Evaluator = EvaluatorUtil.createEvaluator(pmmlFile)
I've also tried to load the model using Java, with identical error messages:
import org.dmg.pmml.PMML;
import org.jpmml.evaluator.ModelEvaluator;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import java.io.*;
import java.util.Scanner;
import java.io.ByteArrayInputStream;
String fileNamePmml = "MPGmodel-r2pmml.pmml";
File pmmlFile = new File(fileNamePmml);
// the pmml file is successfully loaded as a string:
String pmmlString = null;
pmmlString = new Scanner(pmmlFile).useDelimiter("FILEFINISHESHERE").next();
// a PMML object is successfully created from the pmml string:
PMML myPmml = null;
try(InputStream is = new ByteArrayInputStream(pmmlString.getBytes())){
myPmml = org.jpmml.model.PMMLUtil.unmarshal(is);
}
// the "UnsupportedFeature MiningModel" error appears here:
ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
ModelEvaluator<?> modelEvaluator = modelEvaluatorFactory.newModelEvaluator(myPmml);
You're using a legacy JPMML library, which was discontinued 3+ years ago. Naturally, it doesn't support new PMML features (such as PMML 4.2 and 4.3 schemas) that have been added since then.
Simply upgrade to the JPMML-Evaluator library. As a bonus, your code will be much shorter and cleaner.
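For reference, with the upgraded artifacts listed in the update above, building the evaluator reduces to a few lines. A minimal Java sketch, reusing only the classes already shown in the question's own snippet (not the only way to do it):
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.ModelEvaluatorFactory;
// load and unmarshal the PMML file directly, no string round-trip needed
File pmmlFile = new File("MPGmodel-r2pmml.pmml");
PMML pmml = null;
try (InputStream is = new FileInputStream(pmmlFile)) {
    pmml = org.jpmml.model.PMMLUtil.unmarshal(is);
}
// build the evaluator for the MiningModel (the random forest)
Evaluator evaluator = ModelEvaluatorFactory.newInstance().newModelEvaluator(pmml);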
You could use PMML4S to load the PMML model in Scala, for example:
import org.pmml4s.model.Model
val model = Model.fromFile("MPGmodel-r2pmml.pmml")
val result = model.predict(data)
The input data can be a map, a list of key-value pairs, an array, a JSON string, or PMML4S's Series.
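For example, a single prediction with a map of feature values might look like the sketch below; the column names are simply the mtcars predictors the model above was trained on, and the exact output fields depend on the PMML:
// hypothetical input row (first row of mtcars, without mpg)
val input = Map(
  "cyl" -> 6.0, "disp" -> 160.0, "hp" -> 110.0, "drat" -> 3.9, "wt" -> 2.62,
  "qsec" -> 16.46, "vs" -> 0.0, "am" -> 1.0, "gear" -> 4.0, "carb" -> 4.0)
// returns a map of result fields, e.g. the predicted mpg
val result = model.predict(input)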
Related
I have upgraded my application from Wicket 1.x to 8.x.
I am facing an issue converting an Excel file into PDF format.
I am using the dependency below:
<dependency>
<groupId>net.sf.jodconverter</groupId>
<artifactId>jodconverter</artifactId>
<version>3.0-beta-4</version>
</dependency>
I am using these imports:
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.office.ExternalOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeConnectionProtocol;
import org.artofsolving.jodconverter.office.OfficeManager;
I am getting an error on the line below when calling the buildOfficeManager() method:
OfficeManager officeManager = eomc.buildOfficeManager();
The exception thrown on that line is:
java.lang.ClassNotFoundException: com.sun.star.connection.NoConnectException
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1358)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1180)
at org.artofsolving.jodconverter.office.ExternalOfficeManager.<init>(ExternalOfficeManager.java:55)
at org.artofsolving.jodconverter.office.ExternalOfficeManagerConfiguration.buildOfficeManager(ExternalOfficeManagerConfiguration.java:50)
I am using the following system parameters:
[openofficeHome=C:/Program Files/openoffice.org3, hostname=127.0.0.1, port=8100, protocol=SOCKET]
Below is the code in more detail:
ExternalOfficeManagerConfiguration eomcTest = new ExternalOfficeManagerConfiguration();
eomcTest.setConnectOnStart(true);
eomcTest.setConnectionProtocol(ooConfig.getProtocol());
if (OfficeConnectionProtocol.PIPE.equals(ooConfig.getProtocol())) {
eomcTest.setPipeName("officePipe");
} else {
eomcTest.setPortNumber(ooConfig.getPort());
}
OfficeManager officeManager = eomcTest.buildOfficeManager();
officeManager.start();
OfficeDocumentConverter officeDocConverter = new OfficeDocumentConverter(officeManager);
resultFile = File.createTempFile(sheetName, TypeOfFile.PDF.getFileExtension());
officeDocConverter.convert(tempFile, resultFile);
fout.close();
officeManager.stop();
Could anyone let me know why buildOfficeManager() is throwing this error and how to resolve it? It would be much appreciated.
According to https://search.maven.org/search?q=fc:com.sun.star.connection.NoConnectException you need to add the org.libreoffice:libreoffice (or the old org.libreoffice:ridl) dependency to your Maven pom.xml.
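For example (the version number here is only an illustration; pick a current one from Maven Central):
<dependency>
<groupId>org.libreoffice</groupId>
<artifactId>ridl</artifactId>
<version>5.4.2</version>
</dependency>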
I don't see net.sf.jodconverter at https://search.maven.org/search?q=jodconverter. You may try a more recent version of it - probably any of the ones listed here: https://search.maven.org/search?q=g:org.jodconverter
I have resolved this issue, and the above code works fine to convert an Excel file into a PDF file with the jodconverter API.
In my case, the Excel file and the PDF file had the same name, which caused the PDF download link to return the same Excel file. After changing the PDF's name, the issue was resolved.
I'm trying ANTLR 4.8. I'm having problems coding a correct main file that calls the lexer and parser classes.
After correctly processing my ANTLR g4 file (generating all the files and classes provided by ANTLR), I've coded the following main Java file:
import java.io.*;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.runtime.*;
import org.antlr.runtime.TokenSource;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
public class example3Ppal {
public static void main(String[] args){
try{
CharStream input = CharStreams.fromFileName(args[0]);
//Create a Lexer with the previously created CharStream
example3Lexer mylexer = new example3Lexer(input);
//Conecting lexer and parser
CommonTokenStream tokens = new CommonTokenStream((TokenSource) mylexer);
example3Parser myparser = new example3Parser((TokenStream) tokens);
myparser.operation();
} catch (java.lang.RuntimeException re) {
System.out.println(re.getMessage());
} catch (IOException e) {
e.printStackTrace();
}
}
}
I started from an older main file, so I had to change from ANTLRFileStream and the like to CharStreams. Everything seems to work until I try to connect the lexer and the parser.
Following the examples provided on the ANTLR website, a lexer object should be enough to create a "CommonTokenStream" object, which in turn should be enough to create a parser object.
Well, I first tried without any casts, but both Eclipse and NetBeans asked me to cast the "mylexer" and "tokens" objects. I don't understand why, because the lexer superclass implements the "TokenSource" interface, just as "CommonTokenStream" implements the "TokenStream" interface. In addition, both environments let me use a "CommonTokenStream" constructor without any arguments, although this constructor doesn't exist in the ANTLR documentation.
I've read many comments here about similar questions, but I haven't found any that could be applied to my situation.
The result is that it compiles but when I run the program I receive the following error message:
"mylexer cannot be cast to org.antlr.runtime.TokenSource"
There are no prior installations of ANTLR on my computer; the "antlr-4.8-complete" jar file is correctly included as an external jar in the projects, and it is also included in the CLASSPATH environment variable. I don't know why what should work isn't working for me; could somebody help? I'm beginning to think about reinstalling Java, Eclipse, NetBeans and ANTLR.
Thanks in advance.
You're mixing imports from ANTLR 4 and ANTLR 3. Any import that doesn't have v4 in it is importing classes from ANTLR 3 (which is possible because the antlr-4.8-complete jar also bundles ANTLR 3, since ANTLR 4 itself is built using ANTLR 3).
If you switch all your imports to org.antlr.v4 and remove the casts, the code should work.
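For reference, a sketch of the relevant part of your main with v4-only imports and no casts (based on the code in the question):
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
CharStream input = CharStreams.fromFileName(args[0]);
// the generated lexer accepts a v4 CharStream directly
example3Lexer mylexer = new example3Lexer(input);
// a v4 CommonTokenStream takes the lexer as-is, no cast needed
CommonTokenStream tokens = new CommonTokenStream(mylexer);
// the generated parser takes the v4 token stream as-is
example3Parser myparser = new example3Parser(tokens);
myparser.operation();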
I'm trying to create a custom transformer in Spark 2.4.0. Saving it works fine. However, when I try to load it, I get the following error:
java.lang.NoSuchMethodException: TestTransformer.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:496)
at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
at TestTransformer$.load(<console>:40)
... 31 elided
This suggests to me that it can't find my transformer's constructor, which doesn't really make sense to me.
MCVE:
import org.apache.spark.sql.{Dataset, DataFrame}
import org.apache.spark.sql.types.{StructType}
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
class TestTransformer(override val uid: String) extends Transformer with DefaultParamsWritable{
def this() = this(Identifiable.randomUID("TestTransformer"))
override def transform(df: Dataset[_]): DataFrame = {
val columns = df.columns
df.select(columns.head, columns.tail: _*)
}
override def transformSchema(schema: StructType): StructType = {
schema
}
override def copy(extra: ParamMap): TestTransformer = defaultCopy[TestTransformer](extra)
}
object TestTransformer extends DefaultParamsReadable[TestTransformer]{
override def load(path: String): TestTransformer = super.load(path)
}
val transformer = new TestTransformer("test")
transformer.write.overwrite().save("test_transformer")
TestTransformer.load("test_transformer")
Running this (I'm using a Jupyter notebook) leads to the above error. I've tried compiling and running it as a .jar file, with no difference.
What puzzles me is that the equivalent PySpark code works fine:
from pyspark.sql import SparkSession, DataFrame
from pyspark.ml import Transformer
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
class TestTransformer(Transformer, DefaultParamsWritable, DefaultParamsReadable):
def transform(self, df: DataFrame) -> DataFrame:
return df
TestTransformer().save('test_transformer')
TestTransformer.load('test_transformer')
How can I make a custom Spark transformer that can be saved and loaded?
I can reproduce your problem in spark-shell.
Trying to find the source of the problem, I looked into the DefaultParamsReadable and DefaultParamsReader sources and saw that they use Java reflection.
https://github.com/apache/spark/blob/v2.4.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
lines 495-496
val instance =
cls.getConstructor(classOf[String]).newInstance(metadata.uid).asInstanceOf[Params]
I think scala REPLs and Java reflection aren't good friends.
If you run this snippet (after yours):
new TestTransformer().getClass.getConstructors
you'll get the following output:
res1: Array[java.lang.reflect.Constructor[_]] = Array(public TestTransformer($iw), public TestTransformer($iw,java.lang.String))
It is true! TestTransformer.<init>(java.lang.String) doesn't exist.
I found 2 workarounds:
Compiling your code with sbt into a jar, then including it in spark-shell with :require, worked for me (you mentioned you tried a jar, though I don't know how).
Pasting the code into spark-shell with :paste -raw worked fine as well (sketched below). I suppose -raw prevents the REPL from doing shenanigans to your classes.
See: https://docs.scala-lang.org/overviews/repl/overview.html
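For illustration, the second workaround is roughly this (the class body being the one from the question; output omitted):
scala> :paste -raw
// paste the whole TestTransformer class and companion object here, then press Ctrl+D
scala> new TestTransformer("test").write.overwrite().save("test_transformer")
scala> TestTransformer.load("test_transformer")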
I'm not sure how you can adapt either of these to Jupyter, but I hope this info is useful to you.
NOTE: I actually used spark-shell in spark 2.4.1
In Java, we should first add weka.jar to our classpath; then we can call all the classification or clustering algorithms in WEKA in the form of the following code:
import weka.classifiers.trees.RandomForest;
...
RandomForest rf = new RandomForest(); // RandomForest object
But unfortunately, we cannot import the LibSVM algorithm this way, because there is no such class in weka.jar.
So, my question is: how do I import LibSVM into my Java code? Any help will be appreciated :)
Firstly, I'd like to say there are many ways to solve this problem. The solution described here is quite simple, while other answers on StackOverflow are not described in enough detail and cost me a lot of time to verify. So I'm happy to share it with all WEKA beginners :)
a) Download the LibSVM.jar from the Maven Central Repository. Note that this LibSVM.jar is different from the libsvm.jar developed by Chih-Chung Chang and Chih-Jen Lin;
b) Add the LibSVM.jar to the classpath of our Java project;
c) Call the LibSVM classifier when you need it; see the following Java code.
import weka.classifiers.functions.LibSVM; // contained in LibSVM.jar
import weka.classifiers.Evaluation;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import java.util.Random;
String path = "file/train.arff";
Instances train = DataSource.read(path); // load the dataset
train.setClassIndex(train.numAttributes() - 1); // set the class index to the last attribute
LibSVM svm = new LibSVM(); // create the SVM classifier
svm.buildClassifier(train); // train it
Evaluation eval = new Evaluation(train);
eval.crossValidateModel(svm, train, 10, new Random(1)); // 10-fold cross-validation
See: https://weka.wikispaces.com/LibSVM
Use Weka's package manager to install LibSVM. Suppose "weka.jar" is in your current folder, then run this:
java -cp weka.jar weka.core.WekaPackageManager -install-package LibSVM
During the installation, it shows:
[DefaultPackageManager] Tmp file: /tmp/LibSVM1.0.107382715397815864641.zip
[DefaultPackageManager] Installing: Description.props
[DefaultPackageManager] Installing: LibSVM.jar
[DefaultPackageManager] Installing: build_package.xml
...
You can see that "LibSVM.jar" is installed somewhere. In my case, it is at:
/home/john/wekafiles/packages/LibSVM/LibSVM.jar
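If you then want to call the classifier from your own Java code, one option is to put that jar (plus whatever it depends on) on the classpath next to weka.jar; a sketch, where MyWekaApp is a placeholder for your main class:
java -cp weka.jar:/home/john/wekafiles/packages/LibSVM/LibSVM.jar MyWekaApp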
I just want to do some 2D matrix operations using JavaRDD, and I looked into this link: https://spark.apache.org/docs/latest/mllib-data-types.html. I tried the exact sample code given there, but Eclipse doesn't seem to recognize mllib in the first place. Here is my code snippet (the same as in the above link):
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.Matrices;
import org.apache.spark.mllib.linalg.QRDecomposition;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;
JavaRDD<Vector> rows = ... // a JavaRDD of local vectors
// Create a RowMatrix from an JavaRDD<Vector>.
RowMatrix mat = new RowMatrix(rows.rdd());
// Get its size.
long m = mat.numRows();
long n = mat.numCols();
// QR decomposition
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
I am using Spark 2.0.2. Where am I going wrong? Do I need a Maven dependency? I checked my Spark home directory, and I have the mllib and mllib-local directories in it.
Check your pom.xml to see if there is a spark-mllib dependency. If not, get the right version from here: https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.11
At the time of writing, the latest version is:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.11 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.1.0</version>
</dependency>
Also make sure that the scope of your spark-mllib dependency in pom.xml is not set to runtime.
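In other words, if the dependency entry contains a <scope>runtime</scope> line, remove it (or change it to compile): runtime-scoped dependencies are not on the compile classpath, so Eclipse will not resolve the imports. For example:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.1.0</version>
<scope>runtime</scope> <!-- remove this line, or change runtime to compile -->
</dependency>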