How can I use feed and fetch functions in TensorFlowInferenceInterface? - java

I want to use the feed and fetch functions in TensorFlowInferenceInterface, but I don't understand their arguments.
public void feed(String inputName, float[] src, long... dims)
public void fetch(String outputName, float[] dst)
Here is the TensorFlowInferenceInterface source:
https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/android/java/org/tensorflow/contrib/android/TensorFlowInferenceInterface.java
I am using Android Studio and want to import a model trained on MNIST.
Here is the program that creates the protocol buffer file:
import tensorflow as tf
import shutil
import os.path
if os.path.exists("./tmp/beginner-export"):
    shutil.rmtree("./tmp/beginner-export")
# Import data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./tmp/data/", one_hot=True)
g = tf.Graph()
with g.as_default():
    # Create the model
    x = tf.placeholder("float", [None, 784])
    W = tf.Variable(tf.zeros([784, 10]), name="vaiable_W")
    b = tf.Variable(tf.zeros([10]), name="variable_b")
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    # Define loss and optimizer
    y_ = tf.placeholder("float", [None, 10])
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
    sess = tf.Session()
    # Train
    init = tf.initialize_all_variables()
    sess.run(init)
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        train_step.run({x: batch_xs, y_: batch_ys}, sess)
    # Test trained model
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}, sess))
    # Store variable
    _W = W.eval(sess)
    _b = b.eval(sess)
    sess.close()
# Create new graph for exporting
g_2 = tf.Graph()
with g_2.as_default():
    # Reconstruct graph
    x_2 = tf.placeholder("float", [None, 784], name="input")
    W_2 = tf.constant(_W, name="constant_W")
    b_2 = tf.constant(_b, name="constant_b")
    y_2 = tf.nn.softmax(tf.matmul(x_2, W_2) + b_2, name="output")
    sess_2 = tf.Session()
    init_2 = tf.initialize_all_variables()
    sess_2.run(init_2)
    graph_def = g_2.as_graph_def()
    tf.train.write_graph(graph_def, './tmp/beginner-export',
                         'beginner-graph.pb', as_text=False)
    # Test trained model
    y__2 = tf.placeholder("float", [None, 10])
    correct_prediction_2 = tf.equal(tf.argmax(y_2, 1), tf.argmax(y__2, 1))
    accuracy_2 = tf.reduce_mean(tf.cast(correct_prediction_2, "float"))
    print(accuracy_2.eval({x_2: mnist.test.images, y__2: mnist.test.labels}, sess_2))
The input placeholder is named "input".
The output node is named "output".
Please tell me how to use feed and fetch.

Here is some sample code with comments; I hope it makes things clear.
private static final String INPUT_NODE = "input:0"; // input tensor name
private static final String OUTPUT_NODE = "output:0"; // output tensor name
private static final String[] OUTPUT_NODES = {"output:0"};
private static final int OUTPUT_SIZE = 10; // number of classes
private static final int INPUT_SIZE = 784; // size of the input
// INPUT_IMAGE is a float[] of length INPUT_SIZE holding the MNIST image pixels
float[] result = new float[OUTPUT_SIZE]; // get the output probabilities for each class
inferenceInterface.feed(INPUT_NODE, INPUT_IMAGE, 1, INPUT_SIZE); //1-D input (1,INPUT_SIZE)
inferenceInterface.run(OUTPUT_NODES);
inferenceInterface.fetch(OUTPUT_NODE, result);
With the version of the Android TensorFlow library that I'm using, I need to feed a 1-D input, so the TensorFlow code has to be modified accordingly:
x_2 = tf.placeholder("float", [None, 1, 784], name="input")  # 1-D input
x_2 = tf.reshape(x_2, [-1, 784])  # reshape to match the model's requirements
Hope this helps.
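As a complement to the snippet above, here is a minimal sketch of the two pieces that surround feed and fetch: building the float[] that is passed as src, and reading the predicted class out of the float[] that fetch fills as dst. It does not call TensorFlow itself. The class and method names (MnistIo, toInput, argMax) are hypothetical, and scaling pixels from 0..255 to 0..1 is an assumption about how the training data was normalized.

```java
// Hypothetical helpers around feed()/fetch(); nothing here is part of TensorFlow.
class MnistIo {
    static final int INPUT_SIZE = 784;  // 28 x 28 pixels, matching the "input" placeholder
    static final int OUTPUT_SIZE = 10;  // one softmax probability per digit

    // Flattens grayscale pixels (0..255) into the float[] passed to feed() as src.
    // Assumption: the model was trained on pixels scaled to the range 0..1.
    static float[] toInput(int[] pixels) {
        if (pixels.length != INPUT_SIZE) {
            throw new IllegalArgumentException("expected " + INPUT_SIZE + " pixels");
        }
        float[] src = new float[INPUT_SIZE];
        for (int i = 0; i < INPUT_SIZE; i++) {
            src[i] = pixels[i] / 255.0f;
        }
        return src;
    }

    // Returns the index of the largest value in the array that fetch() filled as dst.
    static int argMax(float[] dst) {
        int best = 0;
        for (int i = 1; i < dst.length; i++) {
            if (dst[i] > dst[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With these helpers, the sequence from the answer would read feed(INPUT_NODE, MnistIo.toInput(pixels), 1, INPUT_SIZE), then run(OUTPUT_NODES), then fetch(OUTPUT_NODE, result), and MnistIo.argMax(result) would be the predicted digit. The product of the dims passed to feed (1 x 784 here) has to match the length of src.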

Related

How to change Random location onClickListener android google maps api

Hi, I'm new to this and I'm trying to implement a button that generates a location polyline. I got the code from https://github.com/Moneemsaadaoui/Gradientpoly, and the code that generates the polyline looks like this:
generate.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View view) {
        double randomValuex = 36.046851 + ((36.203712 - 36.046851) * r.nextDouble());
        double randomValuex2 = 36.046851 + ((36.203712 - 36.046851) * r.nextDouble());
        double randomValuey = 8.269289 + ((10.486982 - 8.269289) * r.nextDouble());
        double randomValuey2 = 8.269289 + ((10.486982 - 8.269289) * r.nextDouble());
        from = new LatLng(randomValuex, randomValuey);
        to = new LatLng(randomValuex2, randomValuey2);
        //Setting up our awesome gradient 🌈🌈
        gradientPoly.setApiKey("API_KEY")
                .setStartPoint(from).setEndPoint(to)
                .setStartColor(Color.parseColor("#1eb5ab"))
                .setWidth(11).setEndColor(Color.parseColor("#ff0098"))
                .DrawPolyline();
    }
});
My question is: how do I change the randomly generated locations to fixed locations that I specify? Sorry for my bad English.
Just replace the random values with your own.
Change these:
double randomValuex = 36.046851 + ((36.203712 - 36.046851) * r.nextDouble());
double randomValuex2 = 36.046851 + ((36.203712 - 36.046851) * r.nextDouble());
double randomValuey = 8.269289 + ((10.486982 - 8.269289) * r.nextDouble());
double randomValuey2 = 8.269289 + ((10.486982 - 8.269289) * r.nextDouble());
To this:
// declare these as fields of your class
static final double MY_X_LOCATION = 36.0;  // or whatever you want
static final double MY_Y_LOCATION = 36.1;  // or whatever you want
static final double MY_X2_LOCATION = 36.2; // or whatever you want
static final double MY_Y2_LOCATION = 36.3; // or whatever you want
double myValueX = MY_X_LOCATION;
double myValueX2 = MY_X2_LOCATION;
double myValueY = MY_Y_LOCATION;
double myValueY2 = MY_Y2_LOCATION;
And then pass them here:
from = new LatLng(myValueX, myValueY);
to = new LatLng(myValueX2, myValueY2);
(You can store the constants in your class or somewhere else.
You can also pass the values as parameters of a method and set up
this OnClickListener inside that method.)
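The difference between the random and the fixed version can be sketched without the Maps SDK. The snippet below uses a plain {x, y} double array in place of LatLng; the class and method names are hypothetical, and the coordinate ranges are the ones from the question.

```java
import java.util.Random;

// Hypothetical stand-in: a {x, y} double pair plays the role of LatLng here.
class Endpoints {
    // The question's formula: a uniform draw between min and max.
    static double randomIn(double min, double max, Random r) {
        return min + (max - min) * r.nextDouble();
    }

    // Random endpoint inside the bounding box used in the question.
    static double[] randomPoint(Random r) {
        return new double[] {
            randomIn(36.046851, 36.203712, r),
            randomIn(8.269289, 10.486982, r)
        };
    }

    // Fixed endpoint: the caller chooses the values, so every click yields the same point.
    static double[] fixedPoint(double x, double y) {
        return new double[] {x, y};
    }
}
```

In the listener, the two values of a fixed point would then be passed to new LatLng(...) in place of the random ones.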

How can I feed a sparse placeholder in a TensorFlow model from Java

I'm trying to calculate the best match for a given address with the kNN algorithm in TensorFlow, which works pretty well. But when I try to export the model and use it in our Java environment, I get stuck on how to feed the sparse placeholders from Java.
Here is a heavily stripped-down version of the Python part, which returns the smallest distance between the test name and the best reference name. So far this works as expected. When I export the model and import it in my Java program, it always returns the same value (the distance for the placeholder's default). I assume that the Python function sparse_from_word_vec(word_vec) isn't part of the model, which would make sense to me, but then how should I create this sparse tensor? My input is a single string and I need to create a matching sparse tensor (value) to calculate the distance. I also searched for a way to generate the sparse tensor on the Java side, but without success.
import tensorflow as tf
import pandas as pd
d = {'NAME': ['max mustermann',
              'erika musterfrau',
              'joseph haydn',
              'johann sebastian bach',
              'wolfgang amadeus mozart']}
df = pd.DataFrame(data=d)
input_name = tf.placeholder_with_default('max musterman',(), name='input_name')
output_dist = tf.placeholder(tf.float32, (), name='output_dist')
test_name = tf.sparse_placeholder(dtype=tf.string)
ref_names = tf.sparse_placeholder(dtype=tf.string)
output_dist = tf.edit_distance(test_name, ref_names, normalize=True)
def sparse_from_word_vec(word_vec):
    num_words = len(word_vec)
    indices = [[xi, 0, yi] for xi,x in enumerate(word_vec) for yi,y in enumerate(x)]
    chars = list(''.join(word_vec))
    return(tf.SparseTensorValue(indices, chars, [num_words,1,1]))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    t_data_names=tf.constant(df['NAME'])
    reference_names = [el.decode('UTF-8') for el in (t_data_names.eval())]
    sparse_ref_names = sparse_from_word_vec(reference_names)
    sparse_test_name = sparse_from_word_vec([str(input_name.eval().decode('utf-8'))]*5)
    feeddict={test_name: sparse_test_name,
              ref_names: sparse_ref_names,
              }
    output_dist = sess.run(output_dist, feed_dict=feeddict)
    output_dist = tf.reduce_min(output_dist, 0)
    print(output_dist.eval())
    tf.saved_model.simple_save(sess,
                               "model-simple",
                               inputs={"input_name": input_name},
                               outputs={"output_dist": output_dist})
And here is my Java method:
public void run(ApplicationArguments args) throws Exception {
    log.info("Loading model...");
    SavedModelBundle savedModelBundle = SavedModelBundle.load("/model", "serve");
    byte[] test_name = "Max Mustermann".toLowerCase().getBytes("UTF-8");
    List<Tensor<?>> output = savedModelBundle.session().runner()
            .feed("input_name", Tensor.<String>create(test_name))
            .fetch("output_dist")
            .run();
    System.out.println("Nearest distance: " + output.get(0).floatValue());
}
I was able to get your example working. I have a few comments on your Python code before diving in.
You use the variable output_dist for three different kinds of value throughout the code. I'm not a Python expert, but I think that's bad practice. You also never actually use the input_name placeholder, except for exporting it as an input. Lastly, tf.saved_model.simple_save is deprecated, and you should use tf.saved_model.Builder instead.
Now for the solution.
Looking at the libtensorflow jar file using the command jar tvf libtensorflow-x.x.x.jar (thanks to this post), you can see that there are no useful bindings for creating a sparse tensor (maybe make a feature request?). So we have to change the input to a dense tensor, then add operations to the graph that convert it to sparse. In your original code the sparse conversion happened on the Python side, which means that the graph loaded in Java wouldn't contain any ops for it.
Here is the new python code:
import tensorflow as tf
import pandas as pd
def model():
    # use dense tensors, then convert to sparse for edit_distance
    test_name = tf.placeholder(shape=(None, None), dtype=tf.string, name="test_name")
    ref_names = tf.placeholder(shape=(None, None), dtype=tf.string, name="ref_names")
    # Java does not play well with the empty character, so use "/" instead
    test_name_sparse = tf.contrib.layers.dense_to_sparse(test_name, "/")
    ref_names_sparse = tf.contrib.layers.dense_to_sparse(ref_names, "/")
    output_dist = tf.edit_distance(test_name_sparse, ref_names_sparse, normalize=True)
    # output the index of the closest ref name
    min_idx = tf.argmin(output_dist)
    return test_name, ref_names, min_idx
# Python code to be replicated in Java
def pad_string(s, max_len):
    return s + ["/"] * (max_len - len(s))
d = {'NAME': ['joseph haydn',
              'max mustermann',
              'erika musterfrau',
              'johann sebastian bach',
              'wolfgang amadeus mozart']}
df = pd.DataFrame(data=d)
input_name = 'max musterman'
# length to which the dense reference input is padded
max_len = max([len(n) for n in df['NAME']])
# no need to pad test_input: all rows have the same length
test_input = [list(input_name)]*len(df['NAME'])
# pad every reference name to max_len with "/"
ref_input = list(map(lambda x: pad_string(x, max_len), [list(n) for n in df['NAME']]))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    test_name, ref_names, min_idx = model()
    # run a test to make sure the model works
    feeddict = {test_name: test_input,
                ref_names: ref_input,
                }
    out = sess.run(min_idx, feed_dict=feeddict)
    print("test output:", out)
    # save the model with the new Builder API
    signature_def_map= {
        "predict": tf.saved_model.signature_def_utils.predict_signature_def(
            inputs= {"test_name": test_name, "ref_names": ref_names},
            outputs= {"min_idx": min_idx})
    }
    builder = tf.saved_model.Builder("model")
    builder.add_meta_graph_and_variables(sess, ["serve"], signature_def_map=signature_def_map)
    builder.save()
And here is the Java code to load and run it. There is probably a lot of room for improvement here (Java isn't my main language), but it gives you the idea.
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.TensorFlow;
import org.tensorflow.SavedModelBundle;
import java.util.ArrayList;
import java.util.List;
import java.util.Arrays;
public class Test {
public static byte[][] makeTensor(String s, int padding) throws Exception
{
int len = s.length();
int extra = padding - len;
byte[][] ret = new byte[len + extra][];
for (int i = 0; i < len; i++) {
String cur = "" + s.charAt(i);
byte[] cur_b = cur.getBytes("UTF-8");
ret[i] = cur_b;
}
for (int i = 0; i < extra; i++) {
byte[] cur = "/".getBytes("UTF-8");
ret[len + i] = cur;
}
return ret;
}
public static byte[][][] makeTensor(List<String> l, int padding) throws Exception
{
byte[][][] ret = new byte[l.size()][][];
for (int i = 0; i < l.size(); i++) {
ret[i] = makeTensor(l.get(i), padding);
}
return ret;
}
public static void main(String[] args) throws Exception {
System.out.println("Loading model...");
SavedModelBundle savedModelBundle = SavedModelBundle.load("model", "serve");
List<String> str_test_name = Arrays.asList("Max Mustermann",
"Max Mustermann",
"Max Mustermann",
"Max Mustermann",
"Max Mustermann");
List<String> names = Arrays.asList("joseph haydn",
"max mustermann",
"erika musterfrau",
"johann sebastian bach",
"wolfgang amadeus mozart");
//get the max length for each array
int pad1 = str_test_name.get(0).length();
int pad2 = 0;
for (String var : names) {
if(var.length() > pad2)
pad2 = var.length();
}
byte[][][] test_name = makeTensor(str_test_name, pad1);
byte[][][] ref_names = makeTensor(names, pad2);
//use try-with-resources so that close() is called on the tensors
try(Tensor t_test_name = Tensor.<String>create(test_name))
{
try (Tensor t_ref_names = Tensor.<String>create(ref_names))
{
List<Tensor<?>> output = savedModelBundle.session().runner()
.feed("test_name", t_test_name)
.feed("ref_names", t_ref_names)
.fetch("ArgMin")
.run();
System.out.println("Index of nearest reference name: " + output.get(0).longValue());
}
}
}
}
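To cross-check what the graph returns, the same quantity can be computed in plain Java. This is only a sketch under two assumptions: that tf.edit_distance computes the Levenshtein distance between its first argument (the hypothesis) and its second (the truth), and that normalize=True divides that distance by the length of the truth sequence. The class and method names are hypothetical, and reference strings are assumed to be non-empty.

```java
// Hypothetical reference implementation for checking the model's output by hand.
class EditDistanceCheck {
    // Classic two-row Levenshtein distance, counted in chars.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Distance divided by the length of the reference (truth) string.
    static double normalized(String test, String ref) {
        return (double) levenshtein(test, ref) / ref.length();
    }

    // Index of the reference name closest to test; the first one wins on ties.
    static int nearest(String test, String[] refs) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < refs.length; i++) {
            double d = normalized(test, refs[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}
```

For the lower-case input "max musterman" and the reference list from the answer, nearest picks index 1, the position of "max mustermann". Note that the comparison is case-sensitive, so "Max Mustermann" and "max mustermann" differ by two characters.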

Show line on map which connects points - receiving an empty map

I want to plot a line which connects 2 points on a map.
The code I am using:
public class Quickstart {
public static void main(String[] args) throws Exception {
// display a data store file chooser dialog for shapefiles
File file = JFileDataStoreChooser.showOpenFile("shp", null);
if (file == null) {
return;
}
FileDataStore store = FileDataStoreFinder.getDataStore(file);
SimpleFeatureSource featureSource = store.getFeatureSource();
GeometryFactory gf = JTSFactoryFinder.getGeometryFactory();
// ask for current and destination positions
double latitude, longitude, latitudeDest, longitudeDest;
Scanner reader = new Scanner(System.in);
reader.useLocale(Locale.US);
System.out.println("Enter reference longitude and latitude:\n");
longitude = reader.nextDouble();
latitude = reader.nextDouble();
System.out.println("Enter destination longitude and latitude:\n");
longitudeDest = reader.nextDouble();
latitudeDest = reader.nextDouble();
reader.close();
final String EPSG4326 = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\"," +
"\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\", " +
"0.01745329251994328,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]]";
CoordinateReferenceSystem crs = CRS.parseWKT(EPSG4326);
Point start = gf.createPoint(new Coordinate(longitude, latitude));
Point end = gf.createPoint(new Coordinate(longitudeDest, latitudeDest));
GeodeticCalculator gc = new GeodeticCalculator(crs);
gc.setStartingPosition(JTS.toDirectPosition(start.getCoordinate(), crs));
gc.setDestinationPosition(JTS.toDirectPosition(end.getCoordinate(), crs));
// Calculate distance between points
double distance = gc.getOrthodromicDistance();
int totalmeters = (int) distance;
int km = totalmeters / 1000;
int meters = totalmeters - (km * 1000);
float remaining_cm = (float) (distance - totalmeters) * 10000;
remaining_cm = Math.round(remaining_cm);
float cm = remaining_cm / 100;
System.out.println("Distance = " + km + "km " + meters + "m " + cm + "cm");
SimpleFeatureTypeBuilder builder = new SimpleFeatureTypeBuilder();
builder.setName("TwoDistancesType");
builder.setCRS(DefaultGeographicCRS.WGS84);
builder.add("location", Point.class);
// build the type
final SimpleFeatureType TYPE = builder.buildFeatureType();
SimpleFeatureBuilder featureBuilder = new SimpleFeatureBuilder(TYPE);
featureBuilder.add(start);
//featureBuilder.add(end);
SimpleFeature feature = featureBuilder.buildFeature(null);
DefaultFeatureCollection featureCollection = new DefaultFeatureCollection("internal", TYPE);
featureCollection.add(feature);
Style style = SLD.createSimpleStyle(TYPE, Color.red);
Layer layer = new FeatureLayer(featureCollection, style);
// Create a map content and add our shapefile to it
MapContent map = new MapContent();
map.setTitle("TEST");
map.addLayer(layer);
// Now display the map
JMapFrame.showMap(map);
}
}
I have 2 problems:
1) I can't add a second feature to featureBuilder. It doesn't allow it and reports: Can handle 1 attributes only, index is 1.
So how can I plot a line, then?
2) With the above code, I am receiving:
org.geotools.renderer.lite.StreamingRenderer fireErrorEvent SEVERE: The scale denominator must be positive
java.lang.IllegalArgumentException: The scale denominator must be positive
------- UPDATE ------------------------
After the solution that @Michael gave for the first question, I no longer receive the error about the denominator, but I get an empty map (white space).
----- UPDATE according to @iant's suggestion ----------------
So I tried this: I created a coordinates array that holds the coordinates of the two points (start and end), then created a LineString from those coordinates and added it to the feature builder.
SimpleFeatureTypeBuilder builder = new SimpleFeatureTypeBuilder();
builder.setName("TwoDistancesType");
builder.setCRS(DefaultGeographicCRS.WGS84);
builder.add("line", LineString.class); //added a linestring class
final SimpleFeatureType TYPE = builder.buildFeatureType();
SimpleFeatureBuilder featureBuilder = new SimpleFeatureBuilder(TYPE);
Coordinate[] coordinates = {start.getCoordinate(), end.getCoordinate()};
LineString line = gf.createLineString(coordinates);
featureBuilder.add(line);
and even though I am loading a map (countries.shp) it shows me an empty white space with a red line.
------ SOLUTION -------------
OK, so the solution is (thanks to @iant's comments):
Style style = SLD.createLineStyle(Color.red, 2.0f);
Layer layer = new FeatureLayer(featureCollection, style);
// Create style for the file
Style shpStyle = SLD.createSimpleStyle(TYPE, Color.blue);
Layer shpLayer = new FeatureLayer(featureSource, shpStyle);
// Create a map content and add our shapefile to it
MapContent map = new MapContent();
map.setTitle("TEST");
map.addLayer(layer);
map.addLayer(shpLayer);
and now you have a red line on a blue map!
You need to create a LineString from your points and then store that in your feature.
You should then get a correct scale but you might want to add some other data such as a coast line to the map first. The quick start tutorial can show you how to do that.
Disclaimer: I've never used geotools.
Looking at the source code of SimpleFeatureBuilder, add calls set which throws that error if:
if(index >= values.length)
throw new ArrayIndexOutOfBoundsException("Can handle "
+ values.length + " attributes only, index is " + index);
values is populated here:
values = new Object[featureType.getAttributeCount()];
So the problem is that your type only has one attribute. Change it so that it has two:
SimpleFeatureTypeBuilder builder = new SimpleFeatureTypeBuilder();
builder.setName("TwoDistancesType");
builder.setCRS(DefaultGeographicCRS.WGS84);
builder.add("start", Point.class);
builder.add("end", Point.class);

Mapreduce java program to search QuadTree index and also run GeometryEngine.contains to confirm point in polygon using wkt file

This post is a map reduce implementation suggested for my previous question: "How to optimize scan of 1 huge file / table in Hive to confirm/check if lat long point is contained in a wkt geometry shape"
I am not well-versed in writing Java programs for MapReduce; I mainly use Hive, Pig or Spark to develop in the Hadoop ecosystem. To give some background on the task at hand: I am trying to associate every latitude/longitude ping with the corresponding ZIP postal code. I have a WKT multi-polygon shape file (500 MB) with all the ZIP information. I have loaded this into Hive and can do a join using ST_Contains(polygon, point). However, it takes very long to complete. To overcome this bottleneck I am trying to leverage the example from ESRI ("https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-mr") by building a quadtree index for finding the polygon that contains a point derived from lat/long.
I have managed to write the code, but it exhausts the Java heap memory of the cluster. Any suggestions for improving the code, or for a different approach, will be greatly appreciated:
Error message:
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
My code:
public class MapperClass extends Mapper<LongWritable, Text, Text, IntWritable> {
// column indices for values in the text file
int longitudeIndex;
int latitudeIndex;
int wktZip;
int wktGeom;
int wktLineCount;
int wktStateID;
// in boundaries.wkt, the label for the polygon is "wkt"
//creating ArrayList to hold details of the file
ArrayList<ZipPolyClass> nodes = new ArrayList<ZipPolyClass>();
String labelAttribute;
EsriFeatureClass featureClass;
SpatialReference spatialReference;
QuadTree quadTree;
QuadTreeIterator quadTreeIter;
BufferedReader csvWkt;
// class to store all the values from wkt file and calculate geometryFromWKT
public class ZipPolyClass {
public String zipCode;
public String wktPoly;
public String stateID;
public int indexJkey;
public Geometry wktGeomObj;
public ZipPolyClass(int ijk, String z, String w, String s ){
zipCode = z;
wktPoly = w;
stateID = s;
indexJkey = ijk;
wktGeomObj = GeometryEngine.geometryFromWkt(wktPoly, 0, Geometry.Type.Unknown);
}
}
//building quadTree Index from WKT multiPolygon and creating an iterator
private void buildQuadTree(){
    quadTree = new QuadTree(new Envelope2D(-180, -90, 180, 90), 8);
    Envelope envelope = new Envelope();
    int j=0;
    while(j<nodes.size()){
        nodes.get(j).wktGeomObj.queryEnvelope(envelope);
        quadTree.insert(j, new Envelope2D(envelope.getXMin(), envelope.getYMin(), envelope.getXMax(), envelope.getYMax()));
        j++; // advance to the next polygon; without this the loop never terminates
    }
    quadTreeIter = quadTree.getIterator();
}
/**
 * Query the quadtree for the feature containing the given point
 *
 * @param pt point as longitude, latitude
 * @return index to feature in featureClass or -1 if not found
 */
private int queryQuadTree(Point pt)
{
// reset iterator to the quadrant envelope that contains the point passed
quadTreeIter.resetIterator(pt, 0);
int elmHandle = quadTreeIter.next();
while (elmHandle >= 0){
int featureIndex = quadTree.getElement(elmHandle);
// we know the point and this feature are in the same quadrant, but we need to make sure the feature
// actually contains the point
if (GeometryEngine.contains(nodes.get(featureIndex).wktGeomObj, pt, spatialReference)){
return featureIndex;
}
elmHandle = quadTreeIter.next();
}
// feature not found
return -1;
}
/**
* Sets up mapper with filter geometry provided as argument[0] to the jar
*/
@Override
public void setup(Context context)
{
Configuration config = context.getConfiguration();
spatialReference = SpatialReference.create(4326);
// first pull values from the configuration
String featuresPath = config.get("sample.features.input");
//get column reference from driver class
wktZip = config.getInt("sample.features.col.zip", 0);
wktGeom = config.getInt("sample.features.col.geometry", 18);
wktStateID = config.getInt("sample.features.col.stateID", 3);
latitudeIndex = config.getInt("samples.csvdata.columns.lat", 5);
longitudeIndex = config.getInt("samples.csvdata.columns.long", 6);
FSDataInputStream iStream = null;
try {
// load the text WKT file provided as argument 0
FileSystem hdfs = FileSystem.get(config);
iStream = hdfs.open(new Path(featuresPath));
BufferedReader br = new BufferedReader(new InputStreamReader(iStream));
String wktLine ;
int i=0;
while((wktLine = br.readLine()) != null){
String [] val = wktLine.split("\\|");
String qtZip = val[wktZip];
String poly = val[wktGeom];
String stID = val[wktStateID];
ZipPolyClass zpc = new ZipPolyClass(i, qtZip, poly, stID);
nodes.add(i,zpc);
i++; // increment in the loop before end
}
}
catch (Exception e)
{
e.printStackTrace();
}
finally
{
if (iStream != null)
{
try {
iStream.close();
} catch (IOException e) { }
}
}
// build a quadtree of our features for fast queries
if (!nodes.isEmpty()) {
buildQuadTree();
}
}
@Override
public void map(LongWritable key, Text val, Context context)
throws IOException, InterruptedException {
/*
* The TextInputFormat we set in the configuration, by default, splits a text file line by line.
* The key is the byte offset to the first character in the line. The value is the text of the line.
*/
String line = val.toString();
String [] values = line.split(",");
// get lat long from file and convert to float
float latitude = Float.parseFloat(values[latitudeIndex]);
float longitude = Float.parseFloat(values[longitudeIndex]);
// Create our Point directly from longitude and latitude
Point point = new Point(longitude, latitude);
int featureIndex = queryQuadTree(point);
// Each map only processes one record at a time, so we start out with our count
// as 1. Since we have a distinct record file we will not run reducer
IntWritable one = new IntWritable(1);
if (featureIndex >= 0){
String zipTxt =nodes.get(featureIndex).zipCode;
String stateIDTxt = nodes.get(featureIndex).stateID;
String latTxt = values[latitudeIndex];
String longTxt = values[longitudeIndex];
String pointTxt = point.toString();
String name;
name = zipTxt+"\t"+stateIDTxt+"\t"+latTxt+"\t"+longTxt+ "\t" +pointTxt;
context.write(new Text(name), one);
} else {
context.write(new Text("*Outside Feature Set"), one);
}
}
}
I was able to resolve the out-of-memory issue by changing the ArrayList<ZipPolyClass> to an ArrayList<Geometry> that holds only the geometries.
Creating a class object for every row of the text file (around 50k of them) consumed all of the Java heap memory. After this change the code ran fine even in a 1-node virtual sandbox, and I was able to process around 40 million rows in about 6 minutes.
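The lookup pattern the mapper relies on, a cheap bounding-box test followed by an exact containment test, can be sketched without the Esri library. In this sketch a linear scan over precomputed envelopes stands in for the QuadTree, and a ray-casting test stands in for GeometryEngine.contains. All names are hypothetical, and polygons are assumed to be simple rings without holes. Only the coordinates and one envelope are kept per feature, in the spirit of storing just the geometry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the QuadTree + GeometryEngine.contains lookup.
class PointLookup {
    private final List<double[][]> rings = new ArrayList<>();    // polygon vertices as {x, y}
    private final List<double[]> envelopes = new ArrayList<>();  // {xMin, yMin, xMax, yMax}

    // Stores a polygon ring together with its bounding box; returns the feature index.
    int add(double[][] ring) {
        double xMin = Double.POSITIVE_INFINITY, yMin = Double.POSITIVE_INFINITY;
        double xMax = Double.NEGATIVE_INFINITY, yMax = Double.NEGATIVE_INFINITY;
        for (double[] p : ring) {
            xMin = Math.min(xMin, p[0]);
            yMin = Math.min(yMin, p[1]);
            xMax = Math.max(xMax, p[0]);
            yMax = Math.max(yMax, p[1]);
        }
        rings.add(ring);
        envelopes.add(new double[] {xMin, yMin, xMax, yMax});
        return rings.size() - 1;
    }

    // Exact test by ray casting: count how many edges a ray going right from (x, y) crosses.
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            boolean crosses = (yi > y) != (yj > y)
                    && x < (xj - xi) * (y - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    // Returns the index of the first feature containing (x, y), or -1 if there is none.
    int find(double x, double y) {
        for (int i = 0; i < rings.size(); i++) {
            double[] e = envelopes.get(i);
            boolean inBox = x >= e[0] && x <= e[2] && y >= e[1] && y <= e[3];
            if (inBox && contains(rings.get(i), x, y)) {
                return i;
            }
        }
        return -1;
    }
}
```

The envelope check rejects most candidates before the more expensive exact test runs, which is the same role the quadtree plays in the mapper; the quadtree additionally avoids scanning every envelope.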

Rcaller not giving back variables

When integrating R with Java using RCaller, I never get back any variable that is created within the script. There seems to be a fundamental misunderstanding on my part of how RCaller works. Shouldn't all the variables in the environment be accessible from Java? How?
@Test
public void test() {
    RCaller caller = new RCaller();
    RCode code = new RCode();
    caller.setRscriptExecutable("/usr/bin/Rscript");
    caller.runAndReturnResult("source('~/git/conjoint_it/src/main/r/a.R')");
    System.out.println(caller.getParser().getNames());
}
a.R:
...
m3 <- mlogit(choice ~ 0 + seat + cargo + eng
+ as.numeric(as.character(price)),
data = cbc.mlogit)
su = summary(m3)
m3 #last line
This returns only [visible].
You can handle all of the variables defined in an environment with RCaller.
Suppose you use the global environment (the special top-level environment that holds variables declared outside of a refclass or a function).
package org.expr.rcaller;
import java.util.ArrayList;
import org.expr.rcaller.Globals;
import org.expr.rcaller.RCaller;
import org.expr.rcaller.RCode;
import org.junit.Test;
import org.junit.Assert;
public class HandlingAllVariablesTest {
    private final static double delta = 1.0 / 1000.0;
    @Test
    public void GetAllVariablesInEnvironmentTest() {
        RCaller caller = new RCaller();
        Globals.detect_current_rscript();
        caller.setRscriptExecutable(Globals.Rscript_current);
        RCode code = new RCode();
        code.addDouble("x", 5.65);
        code.addDouble("y", 8.96);
        code.addRCode("result <- as.list(.GlobalEnv)");
        caller.setRCode(code);
        caller.runAndReturnResult("result");
        ArrayList<String> names = caller.getParser().getNames();
        System.out.println("Names : " + names);
        System.out.println("x is " + caller.getParser().getAsDoubleArray("x")[0]);
        System.out.println("y is " + caller.getParser().getAsDoubleArray("y")[0]);
        Assert.assertEquals(caller.getParser().getAsDoubleArray("x")[0], 5.65, delta);
        Assert.assertEquals(caller.getParser().getAsDoubleArray("y")[0], 8.96, delta);
    }
}
Results like this:
Names : [x, y]
x is 5.65
y is 8.96
Here is the key point:
code.addRCode("result <- as.list(.GlobalEnv)");
Here we define a variable that captures all of the variables defined in the global environment; the as.list() function converts an environment object into a list. The second important point is to transfer this variable to Java with
caller.runAndReturnResult("result");
You can see more examples about capturing specific variables rather than environments by visiting the blog page and the web page.
Imports:
import com.github.rcaller.rStuff.RCaller;
import com.github.rcaller.rStuff.RCode;
Java code:
RCaller caller = new RCaller();
RCode code = new RCode();
caller.setRscriptExecutable("C:\\Program Files\\R\\R-4.0.2\\bin\\Rscript.exe");
caller.setRCode(code);
code.clear();
caller.cleanRCode();
//Methods to pass variables to the R script
code.addInt("mydata1", 5);
code.addDoubleArray("mydata2", new double[]{1, 2, 3, 4, 5});
code.addRCode("mydata3 <- 'Data'");
//Calling the Rscript
code.addRCode("source('./src/test.r')");
//Receiving values from the R script through the result variable
caller.runAndReturnResult("result");
int data = caller.getParser().getAsIntArray("data")[0];
double mean = caller.getParser().getAsDoubleArray("mean")[0];
String text = caller.getParser().getAsStringArray("text")[0];
System.out.println(data);
System.out.println(mean);
System.out.println(text);
test.r:
result1 <- mydata1 * 2
result2 <- mean(mydata2)
result3 <- paste("Result3", mydata3, sep=" ")
result <- list(data=result1, mean=result2, text=result3)
Output:
10
3.0
Result3 Data
