TensorFlow 2.0 & Java API

(note, I've resolved my problem and posted the code at the bottom)
I'm playing around with TensorFlow, and the backend processing must take place in Java. I've taken one of the models from https://developers.google.com/machine-learning/crash-course and saved it with tf.saved_model.save(my_model, "house_price_median_income") (inside a Docker container). I copied the model out and loaded it into Java (using the 2.0 library built from source, because I'm on Windows).
I can load the model and run it:
try (SavedModelBundle model = SavedModelBundle.load("./house_price_median_income", "serve")) {
    try (Session session = model.session()) {
        Session.Runner runner = session.runner();
        float[][] in = new float[][]{ {2.1518f} };
        Tensor<?> jack = Tensor.create(in);
        runner.feed("serving_default_layer1_input", jack);
        float[][] probabilities = runner.fetch("StatefulPartitionedCall").run().get(0).copyTo(new float[1][1]);
        for (int i = 0; i < probabilities.length; ++i) {
            System.out.println(String.format("-- Input #%d", i));
            for (int j = 0; j < probabilities[i].length; ++j) {
                System.out.println(String.format("Class %d - %f", i, probabilities[i][j]));
            }
        }
    }
}
The above is hardcoded to an input and output but I want to be able to read the model and provide some information so the end-user can select the input and output, etc.
I can get the inputs and outputs with the python command: saved_model_cli show --dir ./house_price_median_income --all
What I want to do is get the inputs and outputs via Java, so my code doesn't need to execute a Python script to get them. I can get the operations via:
Graph graph = model.graph();
Iterator<Operation> itr = graph.operations();
while (itr.hasNext()) {
    GraphOperation e = (GraphOperation) itr.next();
    System.out.println(e);
}
This outputs both the inputs and outputs as "operations", BUT how do I know whether an operation is an input and/or an output? The Python tool uses the SignatureDef, but that doesn't seem to appear in the TensorFlow 2.0 Java API at all. Am I missing something obvious, or is it just missing from the TensorFlow 2.0 Java library?
NOTE: I've sorted my issue with the help of the answer below. Here is my full bit of code in case somebody would like it in the future. Note this is TF 2.0 and uses the SNAPSHOT mentioned below. I make a few assumptions, but it shows how to pull the input and output names and then use them to run the model.
import java.util.Map;

import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.exceptions.TensorFlowException;
import org.tensorflow.proto.framework.SignatureDef;
import org.tensorflow.proto.framework.TensorInfo;
import org.tensorflow.tools.ndarray.FloatNdArray;
import org.tensorflow.tools.ndarray.StdArrays;
import org.tensorflow.types.TFloat32;
public class v2tensor {
    public static void main(String[] args) {
        try (SavedModelBundle savedModel = SavedModelBundle.load("./house_price_median_income", "serve")) {
            SignatureDef modelInfo = savedModel.metaGraphDef().getSignatureDefMap().get("serving_default");
            TensorInfo input1 = null;
            TensorInfo output1 = null;
            Map<String, TensorInfo> inputs = modelInfo.getInputsMap();
            for (Map.Entry<String, TensorInfo> input : inputs.entrySet()) {
                if (input1 == null) {
                    input1 = input.getValue();
                    System.out.println(input1.getName());
                }
                System.out.println(input);
            }
            Map<String, TensorInfo> outputs = modelInfo.getOutputsMap();
            for (Map.Entry<String, TensorInfo> output : outputs.entrySet()) {
                if (output1 == null) {
                    output1 = output.getValue();
                }
                System.out.println(output);
            }
            try (Session session = savedModel.session()) {
                Session.Runner runner = session.runner();
                FloatNdArray matrix = StdArrays.ndCopyOf(new float[][]{ { 2.1518f } });
                try (Tensor<TFloat32> jack = TFloat32.tensorOf(matrix)) {
                    runner.feed(input1.getName(), jack);
                    try (Tensor<TFloat32> rezz = runner.fetch(output1.getName()).run().get(0).expect(TFloat32.DTYPE)) {
                        TFloat32 data = rezz.data();
                        data.scalars().forEachIndexed((i, s) -> {
                            System.out.println(s.getFloat());
                        });
                    }
                }
            }
        } catch (TensorFlowException ex) {
            ex.printStackTrace();
        }
    }
}

What you need to do is read the SavedModelBundle metadata as a MetaGraphDef; from there you can retrieve the input and output names from the SignatureDef, like in Python.
In TF Java 1.* (i.e. the client you are using in your example), the proto definitions are not available out of the box from the tensorflow artifact; you need to add a dependency on org.tensorflow:proto as well and deserialize the result of SavedModelBundle.metaGraphDef() into a MetaGraphDef proto.
In TF Java 2.* (the new client, currently only available as snapshots from here), the protos are available right away, so you can simply call this line to retrieve the right SignatureDef:
savedModel.metaGraphDef().getSignatureDefMap().get("serving_default")
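For the TF Java 1.* route, a minimal sketch of that deserialization could look like this. The class name SignatureDump is just for illustration; it assumes org.tensorflow:proto is on the classpath (which provides org.tensorflow.framework.MetaGraphDef) and reuses the model directory from the question:

// Minimal sketch for TF Java 1.x: in that client, SavedModelBundle.metaGraphDef()
// returns the serialized protobuf bytes, which can be parsed with the proto artifact.
import org.tensorflow.SavedModelBundle;
import org.tensorflow.framework.MetaGraphDef;
import org.tensorflow.framework.SignatureDef;

public class SignatureDump {
    public static void main(String[] args) throws Exception {
        try (SavedModelBundle model = SavedModelBundle.load("./house_price_median_income", "serve")) {
            MetaGraphDef metaGraph = MetaGraphDef.parseFrom(model.metaGraphDef());
            SignatureDef signature = metaGraph.getSignatureDefMap().get("serving_default");
            // Each entry maps a logical name to the underlying tensor name you can feed/fetch.
            signature.getInputsMap().forEach((name, info) ->
                    System.out.println("input:  " + name + " -> " + info.getName()));
            signature.getOutputsMap().forEach((name, info) ->
                    System.out.println("output: " + name + " -> " + info.getName()));
        }
    }
}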

Related

Iceberg table does not see the generated Parquet file

In my use case, a table in Iceberg format is created. It only receives APPEND operations, as it records events from a time-series stream. To evaluate the use of the Iceberg format for this use case, I created a simple Java program that generates 27,600 rows. Both the metadata and the Parquet file were created, but I can't access them via the Java API (https://iceberg.apache.org/docs/latest/java-api-quickstart/). I'm using HadoopCatalog and FileAppender<GenericRecord>. It is important to say that I can read the generated Parquet file with the pyarrow and datafusion modules from a Python 3 script, and its contents are correct!
I believe my program must be missing a call to some method that links the generated Parquet file to the table created in the catalog.
NOTE: I'm only using Apache Iceberg's Java API, version 1.0.0.
There is an org.apache.iceberg.Transaction object in the API that accepts an org.apache.iceberg.DataFile, but I haven't seen examples of how to use it, and I don't know whether it's useful for solving this problem either.
See the program below:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.*;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.data.GenericRecord;
import org.apache.iceberg.data.parquet.GenericParquetWriter;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.parquet.Parquet;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
import org.apache.iceberg.types.Types;

import java.io.File;
import java.io.IOException;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

import static org.apache.iceberg.types.Types.NestedField.optional;
import static org.apache.iceberg.types.Types.NestedField.required;

public class IcebergTableAppend {
    public static void main(String[] args) {
        System.out.println("Appending records ");
        Configuration conf = new Configuration();
        String lakehouse = "/tmp/iceberg-test";
        conf.set(CatalogProperties.WAREHOUSE_LOCATION, lakehouse);
        Schema schema = new Schema(
                required(1, "hotel_id", Types.LongType.get()),
                optional(2, "hotel_name", Types.StringType.get()),
                required(3, "customer_id", Types.LongType.get()),
                required(4, "arrival_date", Types.DateType.get()),
                required(5, "departure_date", Types.DateType.get()),
                required(6, "value", Types.DoubleType.get()));
        PartitionSpec spec = PartitionSpec.builderFor(schema)
                .month("arrival_date")
                .build();
        TableIdentifier id = TableIdentifier.parse("bookings.rome_hotels");
        String warehousePath = "file://" + lakehouse;
        Catalog catalog = new HadoopCatalog(conf, warehousePath);
        // rm -rf /tmp/iceberg-test/bookings
        Table table = catalog.createTable(id, schema, spec);
        List<GenericRecord> records = Lists.newArrayList();
        // generating a bunch of records
        for (int j = 1; j <= 12; j++) {
            int NUM_ROWS_PER_MONTH = 2300;
            for (int i = 0; i < NUM_ROWS_PER_MONTH; i++) {
                GenericRecord rec = GenericRecord.create(schema);
                rec.setField("hotel_id", (long) (i * 2) + 10000);
                rec.setField("hotel_name", "hotel_name-" + i + 1000);
                rec.setField("customer_id", (long) (i * 2) + 20000);
                rec.setField("arrival_date",
                        LocalDate.of(2022, j, (i % 23) + 1)
                                .plus(1, ChronoUnit.DAYS));
                rec.setField("departure_date",
                        LocalDate.of(2022, j, (i % 23) + 5));
                rec.setField("value", (double) i * 4.13);
                records.add(rec);
            }
        }
        File parquetFile = new File(
                lakehouse + "/bookings/rome_hotels/arq_001.parquet");
        FileAppender<GenericRecord> appender = null;
        try {
            appender = Parquet.write(Files.localOutput(parquetFile))
                    .schema(table.schema())
                    .createWriterFunc(GenericParquetWriter::buildWriter)
                    .build();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        try {
            appender.addAll(records);
        } finally {
            try {
                appender.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
I found out how to fix the Java program.
Just add the lines below to the end of the main method:
PartitionKey partitionKey = new PartitionKey(table.spec(), table.schema());
DataFile dataFile = DataFiles.builder(table.spec())
        .withPartition(partitionKey)
        .withInputFile(Files.localInput(parquetFile))
        .withMetrics(appender.metrics())
        .withFormat(FileFormat.PARQUET)
        .build();
Transaction t = table.newTransaction();
t.newAppend().appendFile(dataFile).commit();
// commit all changes to the table
t.commitTransaction();
Also add the dependency below to your POM file:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>3.3.4</version>
</dependency>
This avoids the runtime error shown below:
java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.input.FileInputFormat
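As a quick sanity check that the committed DataFile is now visible through the table, a short read-back can be appended after t.commitTransaction(). This is only an illustrative sketch, assuming the generic reader classes (org.apache.iceberg.data.IcebergGenerics, org.apache.iceberg.data.Record, org.apache.iceberg.io.CloseableIterable) are imported:

// Illustrative check (not part of the original answer): scan the table back
// through the generic reader and count the rows the committed DataFile exposed.
long rowCount = 0;
try (CloseableIterable<Record> rows = IcebergGenerics.read(table).build()) {
    for (Record row : rows) {
        rowCount++;
    }
} catch (IOException e) {
    throw new RuntimeException(e);
}
System.out.println("Rows visible in bookings.rome_hotels: " + rowCount);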

How to pass input data to an existing tensorflow 2.x model in Java?

I'm taking my first steps with TensorFlow. After creating a simple model for the MNIST data in Python, I now want to import this model into Java and use it for classification. However, I can't manage to pass the input data to the model.
Here is the Python code for model creation:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32')
train_images /= 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32')
test_images /= 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
NrTrainimages = train_images.shape[0]
NrTestimages = test_images.shape[0]

import os
import numpy as np
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K

# Network architecture
model = Sequential()
mnist_inputshape = train_images.shape[1:4]
# Convolutional block 1
model.add(Conv2D(32, kernel_size=(5, 5),
                 activation='relu',
                 input_shape=mnist_inputshape,
                 name='Input_Layer'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Convolutional block 2
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
# Prediction block
model.add(Flatten())
model.add(Dense(128, activation='relu', name='features'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax', name='Output_Layer'))
model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
LOGDIR = "logs"
my_tensorboard = TensorBoard(log_dir=LOGDIR,
                             histogram_freq=0,
                             write_graph=True,
                             write_images=True)
my_batch_size = 128
my_num_classes = 10
my_epochs = 5
history = model.fit(train_images, train_labels,
                    batch_size=my_batch_size,
                    callbacks=[my_tensorboard],
                    epochs=my_epochs,
                    use_multiprocessing=False,
                    verbose=1,
                    validation_data=(test_images, test_labels))
score = model.evaluate(test_images, test_labels)
modeldir = 'models'
model.save(modeldir, save_format='tf')
For Java, I am trying to adapt the App.java code published here.
I am struggling with replacing this snippet:
Tensor result = s.runner()
        .feed("input_tensor", inputTensor)
        .feed("dropout/keep_prob", keep_prob)
        .fetch("output_tensor")
        .run().get(0);
While in this code a particular named input tensor is used to pass the data, my model only has layers and no individually named tensors. Thus, the following doesn't work:
Tensor<?> result = s.runner()
        .feed("Input_Layer/kernel", inputTensor)
        .fetch("Output_Layer/kernel")
        .run().get(0);
How do I pass the data to and get the output from my model in Java?
With the newest version of TensorFlow Java, you don't need to search for the names of the input/output tensors yourself, whether from the model signature or from the graph. You can simply call the following:
try (SavedModelBundle model = SavedModelBundle.load("./model", "serve");
     Tensor<TFloat32> image = TFloat32.tensorOf(...); // There are many ways to pass your image bytes here
     Tensor<TFloat32> result = model.call(image).expect(TFloat32.DTYPE)) {
    System.out.println("Result is " + result.data().getFloat());
}
TensorFlow Java will automatically take care of mapping your input/output tensors to the right nodes.
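As one way to fill in the tensorOf(...) placeholder above, here is a minimal sketch that builds a [1, 28, 28, 1] MNIST input with the NdArray helpers used earlier on this page (StdArrays.ndCopyOf / TFloat32.tensorOf); the pixels array is only a stand-in for your real, normalized image data:

// Sketch only (names and array contents are placeholders): build a
// [1, 28, 28, 1] MNIST input from a float array and run the bundled model.
float[][][][] pixels = new float[1][28][28][1]; // fill with your normalized image, values in [0, 1]
FloatNdArray batch = StdArrays.ndCopyOf(pixels);
try (SavedModelBundle model = SavedModelBundle.load("./model", "serve");
     Tensor<TFloat32> image = TFloat32.tensorOf(batch);
     Tensor<TFloat32> result = model.call(image).expect(TFloat32.DTYPE)) {
    // The model ends in a 10-way softmax, so print every class score.
    result.data().scalars().forEachIndexed((coords, s) ->
            System.out.println("class " + coords[coords.length - 1] + ": " + s.getFloat()));
}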
I finally managed to find a solution. To get all the tensor names in the graph, I used the following code:
for (Iterator it = smb.graph().operations(); it.hasNext();) {
    Operation op = (Operation) it.next();
    System.out.println("Operation name: " + op.name());
}
From this, I figured out that the following works:
SavedModelBundle smb = SavedModelBundle.load("./model", "serve");
Session s = smb.session();
Tensor<Float> inputTensor = Tensor.<Float>create(imagesArray, Float.class);
Tensor<Float> result = s.runner()
        .feed("serving_default_Input_Layer_input", inputTensor)
        .fetch("StatefulPartitionedCall")
        .run().get(0).expect(Float.class);

How to generate custom triples with OpenIEDemo.java provided by stanford-nlp

I have trained custom NER and Relation Extraction models, and I have verified that triples are generated with the CoreNLP server. But when I use OpenIEDemo.java
to generate triples, it produces triples with the relations "has" and "have" only, not the relations my Relation Extraction model was trained on.
I'm loading the custom NER and Relation Extraction models while running the same script. Here is my OpenIEDemo.java file:
package edu.stanford.nlp.naturalli;

import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

import java.util.Collection;
import java.util.List;
import java.util.Properties;

/**
 * A demo illustrating how to call the OpenIE system programmatically.
 * You can call this code with:
 *
 * <pre>
 * java -mx1g -cp stanford-openie.jar:stanford-openie-models.jar edu.stanford.nlp.naturalli.OpenIEDemo
 * </pre>
 *
 */
public class OpenIEDemo {

    private OpenIEDemo() {} // static main

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, depparse, natlog, openie");
        props.setProperty("ner.model", "./ner/ner-model.ser.gz");
        props.setProperty("sup.relation.model", "./relation_extractor/relation_model_pipeline.ser.ser");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate an example document.
        String text;
        if (args.length > 0) {
            text = args[0];
        } else {
            text = "Obama was born in Hawaii. He is our president.";
        }
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);

        // Loop over sentences in the document
        int sentNo = 0;
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println("Sentence #" + ++sentNo + ": " + sentence.get(CoreAnnotations.TextAnnotation.class));

            // Print SemanticGraph
            System.out.println(sentence.get(SemanticGraphCoreAnnotations.EnhancedDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));

            // Get the OpenIE triples for the sentence
            Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);

            // Print the triples
            for (RelationTriple triple : triples) {
                System.out.println(triple.confidence + "\t" +
                        triple.subjectLemmaGloss() + "\t" +
                        triple.relationLemmaGloss() + "\t" +
                        triple.objectLemmaGloss());
            }

            // Alternately, to only run e.g., the clause splitter:
            List<SentenceFragment> clauses = new OpenIE(props).clausesInSentence(sentence);
            for (SentenceFragment clause : clauses) {
                System.out.println(clause.parseTree.toString(SemanticGraph.OutputFormat.LIST));
            }
            System.out.println();
        }
    }
}
Thanks in advance.
As the OpenIE module of StanfordCoreNLP does not use the custom relation model (I don't know why), I cannot use my custom Relation Extraction model with this code. Instead, I had to run the StanfordCoreNLP server pipeline, adding the paths to my custom NER and Relation Extraction models in the server.properties file (as sketched below), and generate the triples that way. If someone knows the reason why OpenIE does not use a custom Relation Extraction model, please comment; it would be very useful for others.
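A minimal server.properties along those lines might look like the following sketch; the property names mirror the ones set programmatically in OpenIEDemo.java above, and the model paths are the ones from the question (adjust them to wherever your models actually live):

# Sketch of a CoreNLP server.properties pointing at the custom models
# (property names and paths taken from the question's OpenIEDemo.java).
annotators = tokenize, ssplit, pos, lemma, depparse, natlog, openie
ner.model = ./ner/ner-model.ser.gz
sup.relation.model = ./relation_extractor/relation_model_pipeline.ser.ser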

RDF4J only scheduling 5 Queries against a Triple Store

I have some more issues with handling semantic data technologies:
I have a GraphDB triplestore running locally on my machine and try to run some SPARQL queries against it using RDF4J and Java. As you can see from the code below, 10 queries should be launched in a row. However, only 5 get launched (I see numbers 0-4 in the console). The problem seems to be that I am limited to 5 open HTTP connections for some reason. Calling repConn.close() does not seem to change anything. Any ideas, anyone?
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class testmain {

    public HTTPRepository rep;
    public RepositoryConnection repConn;

    public static void main(String[] args) {
        testmain test = new testmain();
        test.rep = new HTTPRepository("http://localhost:7200/repositories/test01");
        //test.repConn = test.rep.getConnection();
        for (int i = 0; i < 10; i++) {
            test.repConn = test.rep.getConnection();
            String queryString = "select ?archiveID where { ?video <http://www.some.ns/ontology##hasArchiveID> ?archiveID .}";
            try {
                TupleQuery tupleQuery = test.repConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
                TupleQueryResult queryResult = tupleQuery.evaluate();
            } finally {
                test.repConn.close();
            }
            System.out.println(i);
        }
    }
}
You also need to close the query result, otherwise repConn.close() does not do anything:
try {
    TupleQuery tupleQuery = test.repConn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
    TupleQueryResult queryResult = tupleQuery.evaluate();
    queryResult.close(); // this should solve your issue
} finally {
    test.repConn.close();
}
Or, even better, use the new RDF4J streams API (QueryResults.stream(gqr)). That closes everything for you. http://docs.rdf4j.org/migration/ (point 2.6.5)
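Since RepositoryConnection and TupleQueryResult are both AutoCloseable in RDF4J, another option is to write the loop body with try-with-resources so that both are closed even if the query fails. A minimal sketch of that variant, reusing the repository and query from the question:

// Sketch: let try-with-resources close the connection and the result set.
for (int i = 0; i < 10; i++) {
    String queryString = "select ?archiveID where { ?video <http://www.some.ns/ontology##hasArchiveID> ?archiveID .}";
    try (RepositoryConnection conn = test.rep.getConnection()) {
        TupleQuery tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
        try (TupleQueryResult queryResult = tupleQuery.evaluate()) {
            while (queryResult.hasNext()) {
                queryResult.next(); // consume (or process) each binding set
            }
        }
    }
    System.out.println(i);
}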

Sonar WS : How to get the total unresolved bugs count using Sonar Webservice

I am trying to get the total number of unresolved Bugs and Vulnerabilities in a particular project using sonar-ws-5.6.jar.
I tried passing the type BUG to the search query, but I still get all unresolved issues; the type parameter is not taken into account.
How do I get the exact number of Bugs and Vulnerabilities using the web service?
Here is my code to connect to SonarQube and get the data:
import java.util.ArrayList;
import java.util.List;

import org.sonarqube.ws.Issues.SearchWsResponse;
import org.sonarqube.ws.client.HttpConnector;
import org.sonarqube.ws.client.WsClient;
import org.sonarqube.ws.client.WsClientFactories;
import org.sonarqube.ws.client.issue.SearchWsRequest;

public class SonarTest {

    static String resourceKey = "com.company.projectname:parent";

    public static void main(String[] args) {
        try {
            // Get Issue
            HttpConnector httpConnector = HttpConnector.newBuilder().url("http://localhost:9000").credentials("admin", "admin").build();
            SearchWsRequest issueSearchRequest = new SearchWsRequest();
            issueSearchRequest.setPageSize(1000);
            issueSearchRequest.setResolved(false);
            List<String> bugTypesList = new ArrayList<String>();
            bugTypesList.add("BUG");
            issueSearchRequest.setTypes(bugTypesList);
            WsClient wsClient = WsClientFactories.getDefault().newClient(httpConnector);
            SearchWsResponse issuesResponse = wsClient.issues().search(issueSearchRequest);
            System.out.println(issuesResponse.getIssuesList());
            System.out.println("DONE");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Note: I am using SonarQube 5.6 with Java 1.8.
As of now, I am iterating over the response and counting:
List<Issue> issueList = issuesResponse.getIssuesList();
int bugCount = 0;
for (Issue issue : issueList) {
    if (issue.getType() == RuleType.BUG) {
        bugCount++;
    }
}
Well, you caught a bug! I used your code and found out that the types parameter was not properly passed from the WsClient to the actual HTTP query.
So thanks for sharing your issue; SONAR-7871 has been opened to have it addressed.
I am using ComponentWsRequest to get the total number of bugs.
We can pass the metric key to get the required value.
Here is the code which gives me the total number of bugs:
List<String> VALUE_METRIC_KEYS = Arrays.asList("bugs");
ComponentWsRequest componentWsRequest = new ComponentWsRequest();
componentWsRequest.setComponentKey(resourceKey);
componentWsRequest.setMetricKeys(VALUE_METRIC_KEYS);
ComponentWsResponse componentWsResponse = wsClient.measures().component(componentWsRequest);
List<Measure> measureList = componentWsResponse.getComponent().getMeasuresList();
for (Measure measure : measureList) {
    System.out.println(measure);
}
We can use any of the metric keys to get the respective values:
"quality_gate_details","reliability_rating","reliability_remediation_effort","vulnerabilities","security_rating","security_remediation_effort","code_smells","sqale_rating","sqale_debt_ratio","effort_to_reach_maintainability_rating_a","sqale_index","ncloc","lines","statements","functions","classes","files","directories","duplicated_lines_density","duplicated_blocks","duplicated_lines","duplicated_files","complexity","function_complexity","file_complexity","class_complexity","comment_lines_density","comment_lines","public_api","public_documented_api_density","public_undocumented_api","violations","open_issues","reopened_issues","confirmed_issues","false_positive_issues","wont_fix_issues"
