How to use MLeap DenseTensor in Java

I am using MLeap to run a PySpark logistic regression model in a Java program. Once I run the pipeline I get a DefaultLeapFrame object with one row: Stream(Row(1.3,12,3.6,DenseTensor([D#538613b3,List(2)),1.0), ?).
But I am not sure how to actually inspect the DenseTensor object. When I call getTensor(3) on this row I just get an Object back. I am not familiar with Scala, but that seems to be how this is meant to be used. In Java, how can I get at the values inside this DenseTensor?
Here is roughly what I am doing. I'm guessing Object is not the right type...
DefaultLeapFrame df = leapFrameSupport.select(frame2, Arrays.asList("feat1", "feat2", "feat3", "probability", "prediction"));
Tensor<Object> tensor = df.dataset().head().getTensor(3);
Thanks

The MLeap documentation for the Java DSL is not great, but I was able to look over some unit tests (link) that pointed me to the right class to use. In case anyone else is interested, this is what I did.
DefaultLeapFrame df = leapFrameSupport.select(frame, Arrays.asList("feat1", "feat2", "feat3", "probability", "prediction"));
TensorSupport tensorSupport = new TensorSupport();
List<Double> tensor_vals = tensorSupport.toArray(df.dataset().head().getTensor(3));
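As a follow-up: the returned List<Double> can then be read like any other Java list. A small sketch, assuming a binary logistic regression whose probability tensor holds two entries in the usual Spark ML order (P(label=0) first, P(label=1) second); the ordering is an assumption, so verify it against your model:
// tensor_vals comes from the snippet above.
double negativeProb = tensor_vals.get(0);  // assumed P(label = 0)
double positiveProb = tensor_vals.get(1);  // assumed P(label = 1)
System.out.println("P(label=1) = " + positiveProb);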

Related

Reuse production code in tests: good or bad idea? (Or can I use the same code in test and production?)

We have a piece of production code in our application that reads raw DB rows, something like this:
List<Map<String, Object>> results =
    txnNamedJdbcTemplate.queryForList(
        transactionDbQueries.getProperty(QUERY_FETCH_REPORT_DETAILS).trim(), paramMap);
It then does a whole load of field transformations to produce an object of the desired structure:
private Report extractReportData(long reportId, List<Map<String, Object>> results) {
    Map<String, Object> reportRow = results.get(0);
    Timestamp completeTs = (Timestamp) reportRow.getOrDefault(RS_PARAM_END_DATETIME, null);
    Timestamp lastOpenedTs =
        (Timestamp) reportRow.getOrDefault(RS_PARAM_LAST_OPENED_DATETIME, null);
    String reportData =
        reportRow.get(RS_PARAM_REPORT_DATA) == null
            ? StringUtils.EMPTY
            : ((PGobject) reportRow.get(RS_PARAM_REPORT_DATA)).getValue();
    Duration executionTime =
        reportRow.containsKey(RS_PARAM_DURATION)
            ? Duration.ofSeconds(Long.parseLong(reportRow.get(RS_PARAM_DURATION).toString()))
            : null;
    String reportRunLevel = (String) reportRow.getOrDefault(RS_PARAM_ACCESS_LEVEL, null);
    boolean reportOpened = (Boolean) reportRow.getOrDefault(RS_PARAM_OPENED_STATUS, Boolean.FALSE);
    String reportCategory = (String) reportRow.getOrDefault(RS_PARAM_REPORT_CATEGORY, null);
    Long scheduledId =
        reportRow.get(RS_PARAM_SCHEDULED_ID) != null
            ? Long.parseLong(reportRow.get(RS_PARAM_SCHEDULED_ID).toString())
            : null;
    return Report.builder()
        .reportId(reportId)
        .reportName((String) reportRow.get(RS_PARAM_REPORT_NAME))
        .reportType((String) reportRow.get(RS_PARAM_REPORT_TYPE))
        .reportCategory(reportCategory)
        .reportStatusDesc(
            ReportStatus.values()[(Integer) reportRow.get(RS_PARAM_STATUS_ID) - 1].getDesc())
        .submittedBy((String) reportRow.get(RS_PARAM_USER_NAME))
        .submittedById((int) reportRow.get(RS_PARAM_USER_ID))
        .submittedTime((Timestamp) reportRow.get(RS_PARAM_SUBMIT_DATETIME))
        .completedTime(completeTs)
        .lastOpeningTime(lastOpenedTs)
        .reportData(reportData)
        .reportRunLevel(reportRunLevel)
        .opened(reportOpened)
        .executionTime(executionTime)
        .scheduledId(scheduledId)
        .build();
}
I know it's not the prettiest bit of code, but it's a legacy system and that's beside the point.
Now, I had to test that code to make sure we can read back the same object and verify its fields, so I considered the following 3 options:
1. Clone this code in a test class. I've built a test utility that does just that. Clearly, that resulted in duplication of the same un-pretty code, which is not ideal.
2. Extract this code into some utility class and let both production and test code use it, for the sake of avoiding duplication.
3. Change the access modifier of the production method and test it directly.
This is an integration test that uses a proper DB instance, so mocks won't do - we need to read actual data. The example is not the best and is only for illustration. The main question: do we reuse production code in the tests, or is it better to duplicate it?
I strongly lean toward option #1, on the premise that if we introduce a bug in the transformation in the production code, how else would we detect it? For that, I believe keeping the test's copy segregated is the best way.
Are there any other opinions or reasons behind these, please?
If you are going to duplicate the code in test scope, think about what happens when a bug fix is added later - will your test still alert you to a change in your production code flow? Also think about how you will maintain the duplicated code in the future.
Usually tests are not only meant to validate your present code. They also document the system and give you feedback that flags any changes or breakages in the code.
If you can't write a test case for a piece of code, then it's time to refactor it.
On a practical note, it's not always a good idea to dismantle an existing, working piece; it may need a bit of time, in-depth knowledge of the history of the code base, and a strong test suite. But it's worth getting our hands a little dirty.
I would suggest any one of the options below:
Refactor your code, align it with the Single Responsibility principle and cover each unit with a test case (as you stated, you can at least move the mapping to some util class - a sketch of this follows below).
Write an integration test.
Use PowerMock or a similar tool to test private methods (note: some organisations may not like tools like PowerMock because of security concerns).
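To make the first suggestion concrete: pull the row-to-Report mapping out of the service into its own class (say a ReportRowMapper with a mapRow(long, Map<String, Object>) method holding the same body as extractReportData) and unit-test it against a hand-built row. This is only a sketch; the class, method and getter names are assumptions, and the RS_PARAM_* constants are the ones from the production code:
import static org.junit.Assert.assertEquals;

import java.sql.Timestamp;
import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

// RS_PARAM_* constants statically imported from the production class (name omitted).
public class ReportRowMapperTest {

    // Hypothetical mapper class holding the extracted extractReportData logic.
    private final ReportRowMapper mapper = new ReportRowMapper();

    @Test
    public void mapsCompletedTimeAndCategoryFromRawRow() {
        // Build the raw row by hand instead of duplicating the transformation code.
        Map<String, Object> row = new HashMap<>();
        row.put(RS_PARAM_END_DATETIME, Timestamp.valueOf("2024-01-01 10:00:00"));
        row.put(RS_PARAM_REPORT_CATEGORY, "FINANCE");

        Report report = mapper.mapRow(42L, row);

        assertEquals("FINANCE", report.getReportCategory());
        assertEquals(Timestamp.valueOf("2024-01-01 10:00:00"), report.getCompletedTime());
    }
}
With that split, the integration test against the real DB only has to verify that the query returns sane raw rows; the field transformations are covered once, in one place.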

How to create a tensorflow serving client for the 'wide and deep' model?

I've created a model based on the 'wide and deep' example (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/wide_n_deep_tutorial.py).
I've exported the model as follows:
m = build_estimator(model_dir)
m.fit(input_fn=lambda: input_fn(df_train, True), steps=FLAGS.train_steps)
results = m.evaluate(input_fn=lambda: input_fn(df_test, True), steps=1)
print('Model statistics:')
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
print('Done training!!!')

# Export model
export_path = sys.argv[-1]
print('Exporting trained model to %s' % export_path)
m.export(
    export_path,
    input_fn=serving_input_fn,
    use_deprecated_input_fn=False,
    input_feature_key=INPUT_FEATURE_KEY)
My question is, how do I create a client to make predictions from this exported model? Also, have I exported the model correctly?
Ultimately I need to be able to do this in Java too. I suspect I can do that by generating Java classes from the proto files using gRPC.
The documentation is very sketchy, hence why I am asking here.
Many thanks!
I wrote a simple tutorial Exporting and Serving a TensorFlow Wide & Deep Model.
TL;DR
To export an estimator there are four steps:
Define features for export as a list of all features used during estimator initialization.
Create a feature config using create_feature_spec_for_parsing.
Build a serving_input_fn suitable for use in serving using input_fn_utils.build_parsing_serving_input_fn.
Export the model using export_savedmodel().
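A rough sketch of what those four steps can look like in code, assuming the wide_columns and deep_columns lists from the tutorial and the contrib.learn APIs of that TensorFlow generation (module paths moved around between releases, so treat them as assumptions):
from tensorflow.contrib.layers import create_feature_spec_for_parsing
from tensorflow.contrib.learn.python.learn.utils import input_fn_utils

# 1. All features used when the estimator was built
#    (assumption: these lists come from the wide & deep tutorial code).
feature_columns = wide_columns + deep_columns

# 2. Feature spec describing how to parse serialized tf.Example protos.
feature_spec = create_feature_spec_for_parsing(feature_columns)

# 3. Serving input function that expects serialized tf.Example protos.
serving_input_fn = input_fn_utils.build_parsing_serving_input_fn(feature_spec)

# 4. Export a SavedModel that tensorflow_model_server can load.
m.export_savedmodel(export_path, serving_input_fn)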
To run a client script properly you need to do the following steps:
Create and place your script somewhere in the /serving/ folder, e.g. /serving/tensorflow_serving/example/
Create or modify corresponding BUILD file by adding a py_binary.
Build and run a model server, e.g. tensorflow_model_server.
Create, build and run a client that sends a tf.Example to our tensorflow_model_server for the inference.
For more details look at the tutorial itself.
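As for the client, the pattern used by the serving examples (e.g. inception_client.py) is to serialize a tf.Example and send it in a PredictRequest over gRPC. A hedged sketch, assuming the model was exported under the name 'wide_and_deep', the server runs on localhost:9000, and the feature names and input key match your export (all of these are placeholders):
from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

# Build a tf.Example with the same feature names the model was trained on
# (the feature names and values here are placeholders).
example = tf.train.Example(features=tf.train.Features(feature={
    'age': tf.train.Feature(int64_list=tf.train.Int64List(value=[35])),
    'education': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'Bachelors'])),
}))

channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'wide_and_deep'
request.inputs['inputs'].CopyFrom(
    tf.contrib.util.make_tensor_proto([example.SerializeToString()], shape=[1]))

result = stub.Predict(request, 10.0)  # 10 second timeout
print(result)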
Just spent a solid week figuring this out. First off, m.export is going to be deprecated in a couple of weeks, so instead of that block, use: m.export_savedmodel(export_path, input_fn=serving_input_fn).
That means you then have to define serving_input_fn(), which of course is supposed to have a different signature than the input_fn() defined in the wide and deep tutorial. Namely, moving forward, it seems input_fn()-style functions are supposed to return an InputFnOps object, defined here.
Here's how I figured out how to make that work:
import tensorflow as tf

from tensorflow.contrib.learn.python.learn.utils import input_fn_utils
from tensorflow.python.ops import array_ops
from tensorflow.python.framework import dtypes

def serving_input_fn():
    features, labels = input_fn()
    features["examples"] = tf.placeholder(tf.string)

    serialized_tf_example = array_ops.placeholder(dtype=dtypes.string,
                                                  shape=[None],
                                                  name='input_example_tensor')
    inputs = {'examples': serialized_tf_example}
    labels = None  # these are not known in serving!
    return input_fn_utils.InputFnOps(features, labels, inputs)
This is probably not 100% idiomatic, but I'm pretty sure it works. For now.

How to configure Stanford QNMinimizer to get results similar to scipy.optimize.minimize L-BFGS-B

I want to configure the QNMinimizer from the Stanford CoreNLP library so that it produces optimization results close to those of the scipy.optimize L-BFGS-B implementation, or at least to get a standard L-BFGS configuration that is suitable for most problems. I set the standard parameters as follows:
The python example I want to copy:
scipy.optimize.minimize(neuralNetworkCost, input_theta, method = 'L-BFGS-B', jac = True)
My attempt to do the same in Java:
QNMinimizer qn = new QNMinimizer(10, true);
qn.terminateOnMaxItr(batch_iterations);
//qn.setM(10);
output = qn.minimize(neuralNetworkCost, 1e-5, input, 15000);
What I need is a solid, general L-BFGS configuration that is suitable for most problems.
I'm also not sure whether I need to set any of these parameters for a standard L-BFGS configuration:
useAveImprovement = ?;
useRelativeNorm = ?;
useNumericalZero = ?;
useEvalImprovement = ?;
Thanks for your help in advance, I'm new to this field.
Resources for Information:
Stanford Core NLP QNMinimizer:
http://nlp.stanford.edu/nlp/javadoc/javanlp-3.5.2/edu/stanford/nlp/optimization/QNMinimizer.html#setM-int-
https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/optimization/QNMinimizer.java
Scipy Optimize L-BFGS-B:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
Thanks in advance!
What you have should be just fine. (Have you actually had any problems with it?)
Setting termination both on max iterations and max function evaluations is probably overkill, so you might omit the last argument to qn.minimize(), but it seems from the documentation that scipy does use both with a default value of 15000.
In general, using the robustOptions (with a second argument of true as you do) should give a reliable minimizer, similar to the pgtol convergence criterion of scipy. The other options are for special situations or just to experiment with how they work.
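For anyone who wants a self-contained starting point, here is a minimal sketch that minimizes a toy quadratic with QNMinimizer; the cost function is a stand-in for neuralNetworkCost, and the constructor and minimize arguments mirror the ones shown in the question:
import edu.stanford.nlp.optimization.DiffFunction;
import edu.stanford.nlp.optimization.QNMinimizer;

public class QNMinimizerExample {

    public static void main(String[] args) {
        // Toy objective f(x, y) = (x - 3)^2 + (y + 1)^2, minimum at (3, -1).
        DiffFunction cost = new DiffFunction() {
            @Override
            public int domainDimension() {
                return 2;
            }

            @Override
            public double valueAt(double[] x) {
                return Math.pow(x[0] - 3, 2) + Math.pow(x[1] + 1, 2);
            }

            @Override
            public double[] derivativeAt(double[] x) {
                return new double[] { 2 * (x[0] - 3), 2 * (x[1] + 1) };
            }
        };

        // Same setup as in the question: memory of 10, robust options enabled.
        QNMinimizer qn = new QNMinimizer(10, true);
        qn.terminateOnMaxItr(1000);

        double[] initial = { 0.0, 0.0 };
        double[] minimum = qn.minimize(cost, 1e-5, initial, 15000);

        System.out.printf("x = %.4f, y = %.4f%n", minimum[0], minimum[1]);
    }
}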

JPivot Display Mondrian Result

I am trying to display the result of a Mondrian query using JPivot. Many examples show how to use the JSP tag library, but I need to use the Java API. I looked at the documentation, but I cannot understand how to use it to display the results in a table. Here is my code:
Query query = connection.parseQuery(mdxQuery);
Result result = connection.execute(query);
result.print(new PrintWriter(System.out,true));
I would like to know if I can use the result object to build the jpivot table.
Thanks in advance!
First of all, using JPivot is a pretty bad idea. It was discontinued back in 2008.
There is a good project intended to replace JPivot, called Pivot4j. Although it is still under development (0.8 -> 0.9 version), Pivot4j can actually do the job.
However, if we're talking about your case:
result.print(new PrintWriter(System.out,true));
This line prints the HTML representation of the OLAP cube to your System.out.
You can write the HTML to some output stream (like a FileOutputStream) and then display it:
OutputStream out = new FileOutputStream("result.html");
result.print(new PrintWriter(out, true));
//then display this file in a browser
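If you would rather build the table yourself from the Result object, the mondrian.olap API lets you walk the axes and cells directly. A rough sketch for a simple two-axis query (the coordinate handling is simplified and only meant as a starting point):
import java.util.List;

import mondrian.olap.Axis;
import mondrian.olap.Cell;
import mondrian.olap.Member;
import mondrian.olap.Position;
import mondrian.olap.Result;

public class ResultPrinter {

    // Prints a two-axis result as rows of members followed by cell values.
    public static void printTwoAxisResult(Result result) {
        Axis columns = result.getAxes()[0];
        Axis rows = result.getAxes()[1];

        List<Position> columnPositions = columns.getPositions();
        List<Position> rowPositions = rows.getPositions();

        for (int r = 0; r < rowPositions.size(); r++) {
            StringBuilder line = new StringBuilder();
            for (Member member : rowPositions.get(r)) {
                line.append(member.getName()).append(' ');
            }
            for (int c = 0; c < columnPositions.size(); c++) {
                Cell cell = result.getCell(new int[] { c, r });
                line.append("| ").append(cell.getFormattedValue()).append(' ');
            }
            System.out.println(line);
        }
    }
}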
However, if you want the same interface as in JPivot, I don't think there is an easy way to do it without JSP. In that case I strongly recommend trying Pivot4j.
Good luck!

Compilation issue with EMF Compare code

Version of EMF Compare: 2.1.0 M6 (2013/03/19 17:50)
I am trying to use standalone compare as explained in this guide. I get the below compilation error
The method setMatchEngine(IMatchEngine) is undefined for the type EMFCompare.Builder
for the below code
// Configure EMF Compare
IEObjectMatcher matcher = DefaultMatchEngine.createDefaultEObjectMatcher(UseIdentifiers.NEVER);
IComparisonFactory comparisonFactory = new DefaultComparisonFactory(new DefaultEqualityHelperFactory());
IMatchEngine matchEngine = new DefaultMatchEngine(matcher, comparisonFactory);
EMFCompare comparator = EMFCompare.builder().setMatchEngine(matchEngine).build();
I see that setMatchEngine has been replaced by some other API, as shown in the figure below. I am not sure how to specify the new match engine using that API.
These APIs have changed for M6 (the APIs are now in their final 2.1.0 state as far as removals are concerned). A good source of "how to use the APIs" is the unit tests of EMF Compare, if you have the code in your workspace.
For your particular use case, the code would look as such:
IMatchEngine.Factory factory = new MatchEngineFactoryImpl(UseIdentifiers.NEVER);
IMatchEngine.Factory.Registry matchEngineRegistry = new MatchEngineFactoryRegistryImpl();
matchEngineRegistry.add(factory);
EMFCompare comparator = EMFCompare.builder().setMatchEngineFactoryRegistry(matchEngineRegistry).build();
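For completeness, here is a hedged sketch of actually running a comparison with that comparator; left and right are assumed to be already-loaded Notifiers (for example, Resources loaded through a ResourceSet), and the third argument to the scope is the common ancestor, or null for a two-way comparison:
import org.eclipse.emf.compare.Comparison;
import org.eclipse.emf.compare.Diff;
import org.eclipse.emf.compare.scope.DefaultComparisonScope;
import org.eclipse.emf.compare.scope.IComparisonScope;

// Compare the two models within a standalone comparison scope.
IComparisonScope scope = new DefaultComparisonScope(left, right, null);
Comparison comparison = comparator.compare(scope);

// Inspect (or later merge) the detected differences.
for (Diff diff : comparison.getDifferences()) {
    System.out.println(diff);
}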
Note that using the default registry (EMFCompare.builder().build();) would be enough in most cases... except when you really can't let EMF Compare use the IDs :p.
[edit: a small note: we have now updated the wiki with accurate information, thanks for the feedback ;)]
