NoDataAvailableException using the Oracle OLAP API (olapi) - Java

I started experimenting with the Oracle OLAP API (olapi), but I'm having some issues running its examples package.
When I run MakingQueriesExamples.java from the source package, I get this error:
oracle.olapi.data.cursor.NoDataAvailableException
at oracle.express.olapi.data.full.DefinitionManager.handleException(Unknown Source)
at oracle.express.olapi.data.full.DefinitionManager.createCursorManagerInterfaces(Unknown Source)
at oracle.express.olapi.data.full.DefinitionManager.createCursorManagers(Unknown Source)
at oracle.olapi.data.source.DataProvider.createCursorManagers(Unknown Source)
at oracle.olapi.data.source.DataProvider.createCursorManagers(Unknown Source)
at oracle.olapi.data.source.DataProvider.createCursorManagers(Unknown Source)
at oracle.olapi.data.source.DataProvider.createCursorManagers(Unknown Source)
at oracle.olapi.data.source.DataProvider.createCursorManager(Unknown Source)
at olap.Context11g._displayResult(Context11g.java:650)
at olap.Context11g.displayResult(Context11g.java:631)
at olap.source.MakingQueriesExamples.controllingMatchingWithAlias(MakingQueriesExamples.java:114)
at olap.source.MakingQueriesExamples.run(MakingQueriesExamples.java:40)
at olap.BaseExample11g.execute(BaseExample11g.java:54)
at olap.BaseExample11g.execute(BaseExample11g.java:74)
at olap.source.MakingQueriesExamples.main(MakingQueriesExamples.java:478)
The part that causes the error is here (the last line) in MakingQueriesExamples.java:
println("\nControlling Input-to-Source Matching With the alias Method");
MdmMeasure mdmUnits = getMdmMeasure("UNITS");
// Get the Source objects for the measure and for the default hierarchies
// of the dimensions.
NumberSource units = (NumberSource) mdmUnits.getSource();
StringSource prodHier = (StringSource)
getMdmPrimaryDimension("PRODUCT").getDefaultHierarchy().getSource();
StringSource custHier = (StringSource)
getMdmPrimaryDimension("CUSTOMER").getDefaultHierarchy().getSource();
StringSource chanHier = (StringSource)
getMdmPrimaryDimension("CHANNEL").getDefaultHierarchy().getSource();
StringSource timeHier = (StringSource)
getMdmPrimaryDimension("TIME").getDefaultHierarchy().getSource();
// Select single values for the hierarchies.
//Source prodSel = prodHier.selectValue("PRODUCT_PRIMARY::ITEM::ENVY ABM");
Source prodSel = prodHier.selectValue("PRIMARY::ITEM::ENVY ABM");
//Source custSel = custHier.selectValue("SHIPMENTS::SHIP_TO::BUSN WRLD SJ");
Source custSel = custHier.selectValue("SHIPMENTS::SHIP_TO::BUSN WRLD SJ");
//Source timeSel = timeHier.selectValue("CALENDAR_YEAR::MONTH::2001.01");
Source timeSel = timeHier.selectValue("CALENDAR::MONTH::2001.01");
// Produce a Source that specifies the units values for the selected
// dimension values.
Source unitsSel = units.join(timeSel).join(custSel).join(prodSel);
// Create aliases for the Channel dimension hierarchy.
Source chanAlias1 = chanHier.alias();
Source chanAlias2 = chanHier.alias();
// Join the aliases to the Source representing the units values specified
// by the selected dimension elements, using the value method to make the
// alias an input.
NumberSource unitsSel1 = (NumberSource) unitsSel.join(chanAlias1.value());
NumberSource unitsSel2 = (NumberSource) unitsSel.join(chanAlias2.value());
// chanAlias2 is the first output of result, so its values are the row
// (slower varying) values; chanAlias1 is the second output of result
// so its values are the column (faster varying) values.
Source result = unitsSel1.gt(unitsSel2)
.join(chanAlias1) // Output 2, column
.join(chanAlias2); // Output 1, row
getContext().commit();
getContext().displayResult(result);
and here (the first line) in Context11g.java:
CursorManager cursorManager =
dp.createCursorManager(source);
Cursor cursor = cursorManager.createCursor();
cpw.printCursor(cursor, displayLocVal);
// Close the CursorManager.
cursorManager.close();
I'm using Oracle Database 11.2.0.1.0 with the OLAP option enabled, and Analytic Workspace Manager 11.2.0.4B.
I started by installing the GLOBAL sample schema as instructed here:
https://www.oracle.com/technetwork/database/options/olap/global-11g-readme-082667.html
I verified everything in AWM (cubes, dimensions and measures) and checked the data in SQL Developer.
I noticed that some of the hierarchy names had changed, so I updated them in the Java code (the old names are the commented-out lines above).
Any help would be appreciated!
Thanks in advance.
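NoDataAvailableException is generally what the OLAP API throws when a query resolves to an empty result, which can happen when a string passed to selectValue() does not exactly match a qualified member name in the analytic workspace. One way to double-check the names is to open a Cursor directly on a hierarchy Source and print its values. A minimal sketch, assuming access to the same DataProvider (the dp used in Context11g's _displayResult) and the getMdmPrimaryDimension() helper from the examples package:
// Sketch only: dp and getMdmPrimaryDimension() are the helpers already used by
// the examples; the dimension name comes from the GLOBAL schema.
StringSource prodHier = (StringSource)
    getMdmPrimaryDimension("PRODUCT").getDefaultHierarchy().getSource();

CursorManager cm = dp.createCursorManager(prodHier);
ValueCursor values = (ValueCursor) cm.createCursor();
do {
    // Prints fully qualified member names such as "PRIMARY::ITEM::...",
    // which are the exact strings selectValue() must be given.
    System.out.println(values.getCurrentString());
} while (values.next());
cm.close();
Comparing this output with the strings used in selectValue (and doing the same for CUSTOMER, CHANNEL and TIME) should show whether the renamed hierarchies left any member name out of sync.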

Related

Loading saved Tensorflow model in Java

I have developed a TensorFlow model with Python on Linux, based on the tutorial at http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/. I trained and saved the model using tf.train.Saver, and I am able to deploy it in the Linux environment and perform prediction successfully. Now I need to be able to load this saved model in Java on Windows. Through extensive research online I have read that this does not work with tf.train.Saver, and that I have to change my code to export a SavedModel for serving in order to load it in Java. Therefore, I followed the tutorial at https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_saved_model.py and changed my code. However, I now have an error with tf.FixedLenFeature, where it asks me to use FixedLenSequenceFeature. Here is the complete error message:
"ValueError: First dimension of shape for feature x unknown. Consider using FixedLenSequenceFeature."
which is happening here:
feature_configs = {'x': tf.FixedLenFeature(shape=[None, img_size,img_size,num_channels], dtype=tf.float32),}
I am not sure this is the right path to take, since I have a batch of images of size [batch_size, 128, 128, 3] and should not be using a sequence feature! It would be great if someone could clear this up for me and answer these questions:
1- Do I have to change my code from "tf.train.Saver" to "serving" to be able to load the saved model and deploy it in JAVA?
2- If the answer to the above question is yes, how can I feed the data correctly and solve the aforementioned ERROR?
3- Is there any example of how to DEPLOY the model that was saved using "serving"?
Here is my training code that throws the error:
import dataset
import tensorflow as tf
import time
from datetime import timedelta
import math
import random
import numpy as np
import os
#Adding Seed so that random initialization is consistent
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)
batch_size = 32
#Prepare input data
classes = ['class1','class2','class3']
num_classes = len(classes)
# 20% of the data will automatically be used for validation
validation_size = 0.2
img_size = 128
num_channels = 3
train_path='/home/user1/Downloads/Expression/Augmented/Data/Train'
# We shall load all the training and validation images and labels into memory using openCV and use that during training
data = dataset.read_train_sets(train_path, img_size, classes, validation_size=validation_size)
print("Complete reading input data. Will Now print a snippet of it")
print("Number of files in Training-set:\t\t{}".format(len(data.train.labels)))
print("Number of files in Validation-set:\t{}".format(len(data.valid.labels)))
session = tf.Session()
serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
feature_configs = {'x': tf.FixedLenFeature(shape=[None, img_size,img_size,num_channels], dtype=tf.float32),}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
x = tf.identity(tf_example['x'], name='x') # use tf.identity() to assign name
# x = tf.placeholder(tf.float32, shape=[None, img_size,img_size,num_channels], name='x')
## labels
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
y_true_cls = tf.argmax(y_true, dimension=1)
##Network graph params
filter_size_conv1 = 3
num_filters_conv1 = 32
filter_size_conv2 = 3
num_filters_conv2 = 32
filter_size_conv3 = 3
num_filters_conv3 = 64
fc_layer_size = 128
def create_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

def create_biases(size):
    return tf.Variable(tf.constant(0.05, shape=[size]))

def create_convolutional_layer(input,
                               num_input_channels,
                               conv_filter_size,
                               num_filters):
    ## We shall define the weights that will be trained using create_weights function.
    weights = create_weights(shape=[conv_filter_size, conv_filter_size, num_input_channels, num_filters])
    ## We create biases using the create_biases function. These are also trained.
    biases = create_biases(num_filters)
    ## Creating the convolutional layer
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')
    layer += biases
    ## We shall be using max-pooling.
    layer = tf.nn.max_pool(value=layer,
                           ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1],
                           padding='SAME')
    ## Output of pooling is fed to Relu which is the activation function for us.
    layer = tf.nn.relu(layer)
    return layer
def create_flatten_layer(layer):
    # We know that the shape of the layer will be [batch_size img_size img_size num_channels]
    # But let's get it from the previous layer.
    layer_shape = layer.get_shape()
    ## Number of features will be img_height * img_width * num_channels. But we shall calculate it in place of hard-coding it.
    num_features = layer_shape[1:4].num_elements()
    ## Now, we Flatten the layer so we shall have to reshape to num_features
    layer = tf.reshape(layer, [-1, num_features])
    return layer

def create_fc_layer(input,
                    num_inputs,
                    num_outputs,
                    use_relu=True):
    # Let's define trainable weights and biases.
    weights = create_weights(shape=[num_inputs, num_outputs])
    biases = create_biases(num_outputs)
    # Fully connected layer takes input x and produces wx+b. Since these are matrices, we use the matmul function in TensorFlow
    layer = tf.matmul(input, weights) + biases
    if use_relu:
        layer = tf.nn.relu(layer)
    return layer
layer_conv1 = create_convolutional_layer(input=x,
num_input_channels=num_channels,
conv_filter_size=filter_size_conv1,
num_filters=num_filters_conv1)
layer_conv2 = create_convolutional_layer(input=layer_conv1,
num_input_channels=num_filters_conv1,
conv_filter_size=filter_size_conv2,
num_filters=num_filters_conv2)
layer_conv3= create_convolutional_layer(input=layer_conv2,
num_input_channels=num_filters_conv2,
conv_filter_size=filter_size_conv3,
num_filters=num_filters_conv3)
layer_flat = create_flatten_layer(layer_conv3)
layer_fc1 = create_fc_layer(input=layer_flat,
num_inputs=layer_flat.get_shape()[1:4].num_elements(),
num_outputs=fc_layer_size,
use_relu=True)
layer_fc2 = create_fc_layer(input=layer_fc1,
num_inputs=fc_layer_size,
num_outputs=num_classes,
use_relu=False)
y_pred = tf.nn.softmax(layer_fc2,name='y_pred')
y_pred_cls = tf.argmax(y_pred, dimension=1)
values, indices = tf.nn.top_k(y_pred, 3)
table = tf.contrib.lookup.index_to_string_table_from_tensor(
tf.constant([str(i) for i in xrange(3)]))
prediction_classes = table.lookup(tf.to_int64(indices))
session.run(tf.global_variables_initializer())
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2,
labels=y_true)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
correct_prediction = tf.equal(y_pred_cls, y_true_cls)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
session.run(tf.global_variables_initializer())
def show_progress(epoch, feed_dict_train, feed_dict_validate, val_loss):
    acc = session.run(accuracy, feed_dict=feed_dict_train)
    val_acc = session.run(accuracy, feed_dict=feed_dict_validate)
    msg = "Training Epoch {0} --- Training Accuracy: {1:>6.1%}, Validation Accuracy: {2:>6.1%}, Validation Loss: {3:.3f}"
    print(msg.format(epoch + 1, acc, val_acc, val_loss))
total_iterations = 0
# saver = tf.train.Saver()
def train(num_iteration):
    global total_iterations
    for i in range(total_iterations,
                   total_iterations + num_iteration):
        x_batch, y_true_batch, _, cls_batch = data.train.next_batch(batch_size)
        x_valid_batch, y_valid_batch, _, valid_cls_batch = data.valid.next_batch(batch_size)
        feed_dict_tr = {x: x_batch,
                        y_true: y_true_batch}
        feed_dict_val = {x: x_valid_batch,
                         y_true: y_valid_batch}
        session.run(optimizer, feed_dict=feed_dict_tr)
        if i % int(data.train.num_examples/batch_size) == 0:
            print(i)
            val_loss = session.run(cost, feed_dict=feed_dict_val)
            epoch = int(i / int(data.train.num_examples/batch_size))
            show_progress(epoch, feed_dict_tr, feed_dict_val, val_loss)
            print("Saving the model Now!")
            # saver.save(session, save_path_full, global_step=i)
    total_iterations += num_iteration
train(num_iteration=10000)#3000
# Export model
# WARNING(break-tutorial-inline-code): The following code snippet is
# in-lined in tutorials, please update tutorial documents accordingly
# whenever code changes.
export_path_base = './SavedModel/'
export_path = os.path.join(
tf.compat.as_bytes(export_path_base),
tf.compat.as_bytes(str(1)))
print 'Exporting trained model to', export_path
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
# Build the signature_def_map.
classification_inputs = tf.saved_model.utils.build_tensor_info(
serialized_tf_example)
classification_outputs_classes = tf.saved_model.utils.build_tensor_info(
prediction_classes)
classification_outputs_scores = tf.saved_model.utils.build_tensor_info(values)
classification_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs={
tf.saved_model.signature_constants.CLASSIFY_INPUTS:
classification_inputs
},
outputs={
tf.saved_model.signature_constants.CLASSIFY_OUTPUT_CLASSES:
classification_outputs_classes,
tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES:
classification_outputs_scores
},
method_name=tf.saved_model.signature_constants.CLASSIFY_METHOD_NAME))
tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y_pred)
prediction_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs={'images': tensor_info_x},
outputs={'scores': tensor_info_y},
method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
builder.add_meta_graph_and_variables(
    session,  # the tf.Session created above is named 'session', not 'sess'
    [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict_images':
            prediction_signature,
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            classification_signature,
    },
    legacy_init_op=legacy_init_op)
builder.save()
print 'Done exporting!'
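On question 1, the SavedModel written by the builder above can be loaded in Java without running a TensorFlow Serving server, via the TensorFlow Java API (the org.tensorflow / libtensorflow artifact). A minimal sketch, assuming the export path ./SavedModel/1 from the script and the tensor names x and y_pred defined there; the dummy image array stands in for real preprocessing:
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class LoadSavedModel {
    public static void main(String[] args) {
        // "serve" is the string value of tf.saved_model.tag_constants.SERVING.
        try (SavedModelBundle bundle = SavedModelBundle.load("./SavedModel/1", "serve")) {
            Session session = bundle.session();

            // One dummy image of shape [1, 128, 128, 3]; replace with real pixel data.
            float[][][][] image = new float[1][128][128][3];

            try (Tensor input = Tensor.create(image)) {
                // Feed the op named by tf.identity(..., name='x') and fetch the
                // softmax output named by tf.nn.softmax(..., name='y_pred').
                try (Tensor output = session.runner()
                        .feed("x", input)
                        .fetch("y_pred")
                        .run()
                        .get(0)) {
                    float[][] scores = new float[1][3]; // 3 = num_classes in the script
                    output.copyTo(scores);
                    System.out.println(java.util.Arrays.toString(scores[0]));
                }
            }
        }
    }
}
TensorFlow Serving itself is only needed if the model should be exposed over a network endpoint; for in-process use from Java, loading the SavedModel like this is usually enough.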

Java LibreOffice Draw - Set text of a shape

I'm using Java and the LibreOffice API, and I'd like to draw rectangles and either set their names or put some text fields on them. Drawing the shapes was relatively easy, but adding text has turned out to be hard; I didn't find any solution in the documentation or on forums.
I am declaring the shape and text like this:
Object drawShape = xDrawFactory.createInstance("com.sun.star.drawing.RectangleShape");
XShape xDrawShape = UnoRuntime.queryInterface(XShape.class, drawShape);
xDrawShape.setSize(new Size(10000, 20000));
xDrawShape.setPosition(new Point(5000, 5000));
xDrawPage.add(xDrawShape);
XText xShapeText = UnoRuntime.queryInterface(XText.class, drawShape);
XPropertySet xShapeProps = UnoRuntime.queryInterface(XPropertySet.class, drawShape);
And then I am trying to set XText:
xShapeText.setString("ABC");
And this is where the problem appears (the exception is not clear to me even after reading the explanation in the documentation):
com.sun.star.lang.DisposedException
at com.sun.star.lib.uno.environments.remote.JobQueue.removeJob(JobQueue.java:210)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:330)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:303)
at com.sun.star.lib.uno.environments.remote.JavaThreadPool.enter(JavaThreadPool.java:87)
at com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge.sendRequest(java_remote_bridge.java:636)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.request(ProxyFactory.java:146)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.invoke(ProxyFactory.java:128)
at com.sun.proxy.$Proxy6.setString(Unknown Source)
at com.ericsson.stpdiagramgenerator.presentation.core.HelloTextTableShape.manipulateText(HelloTextTableShape.java:265)
at com.ericsson.stpdiagramgenerator.presentation.core.HelloTextTableShape.useWriter(HelloTextTableShape.java:65)
at com.ericsson.stpdiagramgenerator.presentation.core.HelloTextTableShape.useDocuments(HelloTextTableShape.java:52)
at com.ericsson.stpdiagramgenerator.presentation.core.HelloTextTableShape.main(HelloTextTableShape.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.IOException: com.sun.star.io.IOException: EOF reached - socket,host=localhost,port=8100,localHost=localhost.localdomain,localPort=34456,peerHost=localhost,peerPort=8100
at com.sun.star.lib.uno.bridges.java_remote.XConnectionInputStream_Adapter.read(XConnectionInputStream_Adapter.java:55)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at com.sun.star.lib.uno.protocols.urp.urp.readBlock(urp.java:355)
at com.sun.star.lib.uno.protocols.urp.urp.readMessage(urp.java:92)
at com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge$MessageDispatcher.run(java_remote_bridge.java:105)
Maybe you have another solution for putting text, a text box, or a text field on a shape with the LibreOffice API.
Your code works fine on my machine. I tested it in LibreOffice 5.1.0.3 on Windows. Here is the code I used:
com.sun.star.frame.XDesktop xDesktop = null;
// getDesktop() is from
// https://wiki.openoffice.org/wiki/API/Samples/Java/Writer/BookmarkInsertion
xDesktop = getDesktop();
com.sun.star.lang.XComponent xComponent = null;
try {
    xComponent = xDesktop.getCurrentComponent();
    XDrawPagesSupplier xDrawPagesSupplier =
        (XDrawPagesSupplier) UnoRuntime.queryInterface(
            XDrawPagesSupplier.class, xComponent);
    Object drawPages = xDrawPagesSupplier.getDrawPages();
    XIndexAccess xIndexedDrawPages =
        (XIndexAccess) UnoRuntime.queryInterface(
            XIndexAccess.class, drawPages);
    Object drawPage = xIndexedDrawPages.getByIndex(0);
    XMultiServiceFactory xDrawFactory =
        (XMultiServiceFactory) UnoRuntime.queryInterface(
            XMultiServiceFactory.class, xComponent);
    Object drawShape = xDrawFactory.createInstance(
        "com.sun.star.drawing.RectangleShape");
    XDrawPage xDrawPage =
        (XDrawPage) UnoRuntime.queryInterface(XDrawPage.class, drawPage);
    XShape xDrawShape = UnoRuntime.queryInterface(XShape.class, drawShape);
    xDrawShape.setSize(new Size(10000, 20000));
    xDrawShape.setPosition(new Point(5000, 5000));
    xDrawPage.add(xDrawShape);
    XText xShapeText = UnoRuntime.queryInterface(XText.class, drawShape);
    XPropertySet xShapeProps = UnoRuntime.queryInterface(
        XPropertySet.class, drawShape);
    xShapeText.setString("DEF");
} catch (Exception e) {
    e.printStackTrace(System.err);
    System.exit(1);
}
To run it, I opened a new LibreOffice Draw file, then pressed "Run Project" in NetBeans; the rectangle appeared on the page with its text set.
It looks like the exception may be caused by a problem with connecting to the document. How exactly are you running the macro?
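The DisposedException is a symptom of the URP bridge to the office process being torn down (the underlying cause in the trace is "EOF reached" on the socket to port 8100), so it is worth confirming how that connection is made. A minimal sketch of the usual socket bootstrap, assuming LibreOffice was started with soffice --accept="socket,host=localhost,port=8100;urp;" and stays running while the Java program executes:
import com.sun.star.beans.XPropertySet;
import com.sun.star.bridge.XUnoUrlResolver;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

public class ConnectToOffice {
    // Returns the remote component context of a running office listening on port 8100.
    public static XComponentContext connect() throws Exception {
        XComponentContext localContext =
                com.sun.star.comp.helper.Bootstrap.createInitialComponentContext(null);
        XMultiComponentFactory localFactory = localContext.getServiceManager();

        XUnoUrlResolver resolver = UnoRuntime.queryInterface(
                XUnoUrlResolver.class,
                localFactory.createInstanceWithContext(
                        "com.sun.star.bridge.UnoUrlResolver", localContext));

        Object initialObject = resolver.resolve(
                "uno:socket,host=localhost,port=8100;urp;StarOffice.ServiceManager");

        XPropertySet propSet = UnoRuntime.queryInterface(XPropertySet.class, initialObject);
        // If the soffice process exits, or was not started with --accept, the bridge
        // drops and later remote calls fail with DisposedException like the one above.
        return UnoRuntime.queryInterface(
                XComponentContext.class, propSet.getPropertyValue("DefaultContext"));
    }
}
If the office process is closed or crashes between creating the shape and calling setString, any further remote call fails with exactly this kind of DisposedException.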
Related: This question is also posted at https://forum.openoffice.org/en/forum/viewtopic.php?f=20&p=395334, which contains a solution in Basic.

How to overcome SVMWithSGD throwing ArrayIndexOutOfBoundsException for indexes bigger than 5000?

In order to detect visitors' demographics based on their behavior, I used the SVM algorithm from Spark MLlib:
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc.sc(), "labels.txt").toJavaRDD();
JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
training.cache();
JavaRDD<LabeledPoint> test = data.subtract(training);
// Run training algorithm to build the model.
int numIterations = 100;
final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);
// Clear the default threshold.
model.clearThreshold();
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(new SVMTestMapper(model));
Unfortunately, final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations); throws an ArrayIndexOutOfBoundsException:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 4857
labels.txt is a text file with lines of the form:
Visitor criterion (is male) | list of siteId:accessCount pairs
1 27349:1 23478:1 35752:1 9704:2 27896:1 30050:2 30018:1
1 36214:1 26378:1 26606:1 26850:1 17968:2
1 21870:1 41294:1 37388:1 38626:1 10711:1 28392:1 20749:1
1 29328:1 34370:1 19727:1 29542:1 37621:1 20588:1 42426:1 30050:6 28666:1 23190:3 7882:1 35387:1 6637:1 32131:1 23453:1
I tried with a lot of data and several algorithms, and it consistently gives this error for site IDs bigger than 5000.
Is there any solution to overcome this, or is there another library for this kind of problem? Or, because the data matrix is so sparse, should I use SVD?
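Two things may be worth checking here (both are assumptions, not verified against this data set). First, the LIBSVM text format that loadLibSVMFile parses expects one-based feature indices in ascending order, and the sample lines above are unsorted. Second, the feature dimension can be passed to loadLibSVMFile explicitly, so that every vector in the training and test splits has the same length; otherwise a model whose weight vector ends up shorter than some test vectors can index past its array. A sketch of the second point, reusing the question's sc and file name:
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;

// numFeatures is a guess here; it must be at least as large as the biggest siteId
// appearing anywhere in labels.txt (the indices on each line must also be ascending).
int numFeatures = 50000;
JavaRDD<LabeledPoint> data =
        MLUtils.loadLibSVMFile(sc.sc(), "labels.txt", numFeatures).toJavaRDD();

JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
training.cache();
JavaRDD<LabeledPoint> test = data.subtract(training);

// Same training call as above, now with every vector padded to numFeatures.
int numIterations = 100;
final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);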

JSON to SSTable tool out-of-memory failure

The json2sstable tool supplied with Cassandra 1.2.15 fails with an out-of-memory error. Back in 2011 a similar issue was reported as a bug and fixed: https://issues.apache.org/jira/browse/CASSANDRA-2189
Either I am missing some steps in the tool's configuration/usage or the bug has re-emerged. Please point out what I am missing.
Repro steps:
1) Cassandra 1.2.15, one table with a varchar key and one varchar column filled with random UUIDs, 6x10^6 records.
2) JSON file generated with the sstable2json tool (~1 GB).
3) Cassandra restarted with a new configuration (new data/cache/commit dirs, new partitioner).
4) Keyspace re-created.
5) json2sstable fails after several minutes of processing:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at org.codehaus.jackson.util.TextBuffer.contentsAsString(TextBuffer.java:350)
at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:278)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:204)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:104)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:18)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2695)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1294)
at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:1368)
at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:344)
at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:328)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:547)
From the json2sstable source code, the tool loads all the records from the JSON file into memory and sorts them by key:
private int importUnsorted(String jsonFile, ColumnFamily columnFamily, String ssTablePath, IPartitioner<?> partitioner) throws IOException
{
    int importedKeys = 0;
    long start = System.currentTimeMillis();
    JsonParser parser = getParser(jsonFile);
    Object[] data = parser.readValueAs(new TypeReference<Object[]>(){});
    keyCountToImport = (keyCountToImport == null) ? data.length : keyCountToImport;
    SSTableWriter writer = new SSTableWriter(ssTablePath, keyCountToImport);
    System.out.printf("Importing %s keys...%n", keyCountToImport);
    // sort by dk representation, but hold onto the hex version
    SortedMap<DecoratedKey,Map<?, ?>> decoratedKeys = new TreeMap<DecoratedKey,Map<?, ?>>();
    for (Object row : data)
    {
        Map<?,?> rowAsMap = (Map<?, ?>)row;
        decoratedKeys.put(partitioner.decorateKey(hexToBytes((String)rowAsMap.get("key"))), rowAsMap);
        ....
According to Jonathan Ellis' comment on the CASSANDRA-2322 issue, this behavior is by design.
Thus json2sstable is not well suited for importing production-size data into Cassandra: the tool is likely to crash on large datasets.
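For production-size imports, one commonly suggested alternative is to generate SSTables directly with SSTableSimpleUnsortedWriter, which flushes to disk whenever a bounded in-memory buffer fills up, and then stream the result in with sstableloader. A rough sketch only; the constructor arguments and types are from memory of the 1.2-era API and should be verified against the javadoc for your exact version, and the keyspace/table names are placeholders:
import java.io.File;

import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.dht.Murmur3Partitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import org.apache.cassandra.utils.ByteBufferUtil;

public class DirectSSTableWriter {
    public static void main(String[] args) throws Exception {
        // Flushes to disk whenever the in-memory buffer reaches 64 MB,
        // instead of holding the whole data set the way importUnsorted does.
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                new File("output_dir"),
                new Murmur3Partitioner(),
                "my_keyspace",
                "my_table",
                UTF8Type.instance,   // column name comparator
                null,                // no sub-comparator (not a super column family)
                64);                 // buffer size in MB

        long timestamp = System.currentTimeMillis() * 1000;
        // One row with a single column; a real import would loop over the source data.
        writer.newRow(ByteBufferUtil.bytes("row-key-1"));
        writer.addColumn(ByteBufferUtil.bytes("value"),
                         ByteBufferUtil.bytes("some-random-uuid"), timestamp);
        writer.close();
        // The generated SSTable files can then be loaded with the sstableloader tool.
    }
}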

How to provide correct arguments to setAsciiStream method?

This is my FULL test code with the main method:
public class TestSetAscii {
    public static void main(String[] args) throws SQLException, FileNotFoundException {
        String dataFile = "FastLoad1.csv";
        String insertTable = "INSERT INTO " + "myTableName" + " VALUES(?,?,?)";
        Connection conStd = DriverManager.getConnection("jdbc:xxxxx", "xxxxxx", "xxxxx");
        InputStream dataStream = new FileInputStream(new File(dataFile));
        PreparedStatement pstmtFld = conStd.prepareStatement(insertTable);
        // Until this line everything is awesome
        pstmtFld.setAsciiStream(1, dataStream, -1); // This line fails
        System.out.println("works");
    }
}
I get the "cbColDef value out of range" error
Exception in thread "main" java.sql.SQLException: [Teradata][ODBC Teradata Driver] Invalid precision: cbColDef value out of range
at sun.jdbc.odbc.JdbcOdbc.createSQLException(Unknown Source)
at sun.jdbc.odbc.JdbcOdbc.standardError(Unknown Source)
at sun.jdbc.odbc.JdbcOdbc.SQLBindInParameterAtExec(Unknown Source)
at sun.jdbc.odbc.JdbcOdbcPreparedStatement.setStream(Unknown Source)
at sun.jdbc.odbc.JdbcOdbcPreparedStatement.setAsciiStream(Unknown Source)
at file.TestSetAscii.main(TestSetAscii.java:21)
Here is the link to my FastLoad1.csv file. I guess that setAsciiStream fails because of the FastLoad1.csv file, but I am not sure.
(In my previous question I was not able to narrow down the problem. Now I have shortened the code.)
It would depend on the table schema, but the third parameter of setAsciiStream is the length.
So
pstmtFld.setAsciiStream(1, dataStream, 4);
would work for a field of length 4 bytes.
But I don't think it would work as you expect in your code: for each bind you should have a separate stream.
The setAsciiStream() method is designed to be used for large data values (BLOBs or long VARCHARs). It is not designed to read a CSV file line by line and split it into separate values.
Basically, it just binds one of the question marks to the InputStream.
After looking into the provided example, it looks like Teradata can handle CSV, but you have to tell it explicitly with:
String urlFld = "jdbc:teradata://whomooz/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTLOADCSV";
I don't have enough reputation to comment, but I feel that this info can be valuable to those navigating FastLoad via JDBC for the first time.
This code will get the full stack trace and is very helpful for diagnosing problems with FastLoad:
catch (SQLException ex) {
    for ( ; ex != null ; ex = ex.getNextException())
        ex.printStackTrace();
}
In the case of the code above, it works if you specify TYPE=FASTLOADCSV in the connection string, but when run multiple times it will fail due to the creation of the error tables _ERR_1 and _ERR_2. Drop these tables and clear out the destination tables to run again.
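Putting these suggestions together, here is a sketch of what the question's test program might look like with the FASTLOADCSV connection string, a real stream length instead of -1, and the exception-chain loop; whether the Teradata driver consumes the whole CSV through a single bound stream this way should be double-checked against the Teradata JDBC FastLoad CSV documentation:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FastLoadCsvSketch {
    public static void main(String[] args) throws Exception {
        File dataFile = new File("FastLoad1.csv");
        // Connection URL from the answer above; host, charset and credentials are placeholders.
        String url = "jdbc:teradata://whomooz/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTLOADCSV";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             InputStream dataStream = new FileInputStream(dataFile)) {
            PreparedStatement pstmt = con.prepareStatement("INSERT INTO myTableName VALUES(?,?,?)");
            // Bind the CSV stream with its actual length rather than -1.
            pstmt.setAsciiStream(1, dataStream, (int) dataFile.length());
            pstmt.executeUpdate();
            pstmt.close();
        } catch (SQLException ex) {
            // Walk the exception chain to get the full FastLoad diagnostics.
            for (; ex != null; ex = ex.getNextException())
                ex.printStackTrace();
        }
    }
}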
