I've set up a Cassandra cluster and work with the spring-data-cassandra framework 1.5.3 (http://docs.spring.io/spring-data/cassandra/docs/1.5.3.RELEASE/reference/html/).
I want to write millions of datasets into my Cassandra cluster. The solution with executeAsync works well, but the "ingest" command from the Spring framework sounds interesting as well.
The ingest method takes advantage of static PreparedStatements that are only prepared once for performance. Each record in your data set is bound to the same PreparedStatement, then executed asynchronously for high performance.
My code:
List<List<?>> session_time_ingest = new ArrayList<List<?>>();
for (Long tokenid : listTokenID) {
    List<Session_Time_Table> tempListSessionTimeTable =
            repo_session_time.listFetchAggregationResultMinMaxTime(tokenid);
    session_time_ingest.add(tempListSessionTimeTable);
}
cassandraTemplate.ingest("INSERT into session_time (sessionid, username, eserviceid, contextroot," +
" application_type, min_processingtime, max_processingtime, min_requesttime, max_requesttime)" +
" VALUES(?,?,?,?,?,?,?,?,?)", session_time_ingest);
Throws exception:
`Exception in thread "main" com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> ...tracking.Tables.Session_Time_Table]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:679)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:540)
at com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:520)
at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:470)
at com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:77)
at com.datastax.driver.core.BoundStatement.bind(BoundStatement.java:201)
at com.datastax.driver.core.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:126)
at org.springframework.cassandra.core.CqlTemplate.ingest(CqlTemplate.java:1057)
at org.springframework.cassandra.core.CqlTemplate.ingest(CqlTemplate.java:1077)
at org.springframework.cassandra.core.CqlTemplate.ingest(CqlTemplate.java:1068)
at ...tracking.SessionAggregationApplication.main(SessionAggregationApplication.java:68)`
I coded exactly as in the spring-data-cassandra documentation, but I have no idea how to map the values of my objects to the values Cassandra expects.
Your Session_Time_Table class is probably a mapped POJO, but ingest methods do not use POJO mapping.
Instead you need to provide a matrix where each row contains as many arguments as there are variables to bind in your prepared statement, something along the lines of:
List<List<?>> rows = new ArrayList<List<?>>();
for (Long tokenid : listTokenID) {
    Session_Time_Table obj = ... // obtain a Session_Time_Table instance
    List<Object> row = new ArrayList<Object>();
    row.add(obj.sessionid);
    row.add(obj.username);
    row.add(obj.eserviceid);
    // etc. for all bound variables
    rows.add(row);
}
cassandraTemplate.ingest(
"INSERT into session_time (sessionid, username, eserviceid, " +
"contextroot, application_type, min_processingtime, " +
"max_processingtime, min_requesttime, max_requesttime) " +
"VALUES(?,?,?,?,?,?,?,?,?)", rows);
So my question might be a bit silly to some of you, but I am querying for some data that is returned as a Response, and I then have to use parts of that data in the front end of my application to graph it with AngularJS and nvD3 charts. To format the data correctly for the graphing tool, I have to translate it into the right JSON format. I could find no direct way to pull the numbers I need out of the returned Response, so I take just the values I need and turn them into a list that can then be parsed into a JSON array. The following is my workaround, and it works, giving me the list I am looking for...
if (tableState.getIdentifier().getProperty().equals("backupSize")) {
    Response test4 = timeSeriesQuery.queryData("backup.data.size,", "", "1y-ago", "25", "desc");
    String test5 = test4.getEntity().toString();
    int test6 = test5.indexOf("value");
    int charIndexStart = test6 + 9;
    int charIndexEnd = test5.indexOf(",", test6);
    String test7 = test5.substring(charIndexStart, charIndexEnd);
    int charIndexStart2 = test5.indexOf(",", charIndexEnd);
    int charIndexEnd2 = test5.indexOf(",", charIndexStart2 + 2);
    String test9 = test5.substring(charIndexStart2 + 1, charIndexEnd2);
    long test8 = Long.parseLong(test7);
    long test10 = Long.parseLong(test9);
    List<Long> graphs = new ArrayList<>();
    graphs.add(test8);
    graphs.add(test10);
    List<List<Long>> graphs2 = new ArrayList<List<Long>>();
    graphs2.add(graphs);
    for (int i = 1, charEnd = charIndexEnd2; i < 24; i++) {
        int nextCharStart = test5.indexOf("}", charEnd) + 2;
        int nextCharEnd = test5.indexOf(",", nextCharStart);
        String test11 = test5.substring(nextCharStart + 1, nextCharEnd);
        int nextCharStart2 = test5.indexOf(",", nextCharEnd) + 1;
        int nextCharEnd2 = test5.indexOf(",", nextCharStart2 + 2);
        String test13 = test5.substring(nextCharStart2, nextCharEnd2);
        long test12 = Long.parseLong(test11);
        long test14 = Long.parseLong(test13);
        List<Long> graphs3 = new ArrayList<>();
        graphs3.add(test12);
        graphs3.add(test14);
        graphs2.add(graphs3);
        charEnd = test5.indexOf("}", nextCharEnd2);
    }
    return graphs2;
Here is the result of test5:
xxx.xx.xxxxxx.entity.timeseries.datapoints.queryresponse.DatapointsResponse#2be02a0c[start=, end=, tags={xxx.xx.xxxxxx.entity.timeseries.datapoints.queryresponse.Tag#1600cd19[name=backup.data.size, results={xxx.xx.xxxxxx.entity.timeseries.datapoints.queryresponse.Results#2b8a61bd[groups={xxx.xx.xxxxxx.entity.timeseries.datapoints.queryresponse.Group#61540dbc[name=type, type=number]}, attributes=xxx.xx.xxxxxx.entity.util.map.Map#4b4eebd0[], values={{1487620485896,973956,3},{1487620454999,973806,3},{1487620424690,956617,3},{1487620397181,938677,3},{1487620368825,934494,3},{1487620339219,926125,3},{1487620309050,917753,3},{1487620279239,909384,3},{1487620251381,872864,3},{1487620222724,846518,3},{1487620196441,832150,3},{1487620168141,819563,3},{1487620142079,787264,3},{1487620115827,787264,3},{1487620091991,787264,3},{1487620067230,787264,3},{1487620042333,787264,3},{1487620018508,787264,3},{1487619994967,787264,3},{1487619973549,778740,3},{1487619950069,770205,3},{1487619926850,749106,3},{1487619902486,740729,3},{1487619877298,728184,3},{1487619851449,719666,3}}]}, stats=xxx.xx.xxxxxx.entity.timeseries.datapoints.queryresponse.Stats#5bb68fa5[rawCount=25]]}]
And here is the returned list:
[[1487620485896, 973956], [1487620454999, 973806], [1487620424690, 956617], [1487620397181, 938677], [1487620368825, 934494], [1487620339219, 926125], [1487620309050, 917753], [1487620279239, 909384], [1487620251381, 872864], [1487620222724, 846518], [1487620196441, 832150], [1487620168141, 819563], [1487620142079, 787264], [1487620115827, 787264], [1487620091991, 787264], [1487620067230, 787264], [1487620042333, 787264], [1487620018508, 787264], [1487619994967, 787264], [1487619973549, 778740], [1487619950069, 770205], [1487619926850, 749106], [1487619902486, 740729], [1487619877298, 728184]]
I can then take this and turn it into JSON (at least I think so; I haven't gotten that far). But this code seems ridiculous, brittle, and not the right way to go about this.
Does anyone have a better way of pulling datapoints out of a response and translating them into a JSON array, or at least a nested list?
Thank you to anyone who read this, and please let me know if I can provide any more information.
When we want just a few values from a query, the best way to retrieve them is to run the query with a ResultSet and use its powerful metadata:
ResultSet rs = stmt.executeQuery("SELECT a, b, c FROM TABLE2");
ResultSetMetaData rsmd = rs.getMetaData();
String name = rsmd.getColumnName(1);
Taken from here
So you take the columns you need by using the metadata properties, and then the best you can do is use a DTO object to store each row (check this to learn a bit more about DTOs).
Basically, the idea is that you build an object from the data you've retrieved (or just the data you need at that moment) from the database, and then you can use the usual getters and setters to access all the fields.
However, when collecting data you're normally going to be using loops, since you need to iterate over the ResultSet values, asking for the name of each column and keeping its value if it matches.
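For illustration, here is a rough sketch of that idea using plain JDBC, with a made-up BackupPoint DTO and made-up column names ("ts" and "size"); adapt it to your actual query:

// Hypothetical DTO holding one row of the result.
public class BackupPoint {
    private long timestamp;
    private long size;

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public long getSize() { return size; }
    public void setSize(long size) { this.size = size; }
}

// Iterate over the ResultSet, using the metadata to pick out the columns we care about.
List<BackupPoint> points = new ArrayList<>();
ResultSet rs = stmt.executeQuery("SELECT ts, size FROM backup_data");
ResultSetMetaData rsmd = rs.getMetaData();
while (rs.next()) {
    BackupPoint p = new BackupPoint();
    for (int i = 1; i <= rsmd.getColumnCount(); i++) {
        String col = rsmd.getColumnName(i);
        if ("ts".equalsIgnoreCase(col)) {
            p.setTimestamp(rs.getLong(i));
        } else if ("size".equalsIgnoreCase(col)) {
            p.setSize(rs.getLong(i));
        }
    }
    points.add(p);
}

From there, the list of DTOs can be turned into the nested-array JSON the charting library expects with whatever JSON library you already use.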
Hope it helps
I already have computed clusters and want to use the ELKI library only to evaluate this clustering.
So I have data in this form:
0.234 0.923 cluster_1 true_cluster1
0.543 0.874 cluster_2 true_cluster3
...
I tried to:
Create two databases, one with the result labels and one with the reference labels:
double [][] data;
String [] reference_labels, result_labels;
DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels);
Database db1 = new StaticArrayDatabase(dbc1, null);
DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels);
Database db2 = new StaticArrayDatabase(dbc2, null);
Perform ByLabel Clustering for each database:
Clustering<Model> clustering1 = new ByLabelClustering().run(db1);
Clustering<Model> clustering2 = new ByLabelClustering().run(db2);
Use ClusterContingencyTable for comparing clusterings and getting measures:
ClusterContingencyTable ct = new ClusterContingencyTable(true, false);
ct.process(clustering1, clustering2);
PairCounting paircount = ct.getPaircount();
The problem is that the measures are not computed.
I looked into the source code of ClusterContingencyTable and PairCounting, and it seems that it won't work if the clusterings come from different databases, and a database can have only one labels relation.
Is there a way to do this in ELKI?
You can modify the ByLabelClustering class easily (or implement your own) to only use the first label, or only use the second label; then you can use only one database.
Or you use the 3-parameter constructor:
DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels, 0);
Database db1 = new StaticArrayDatabase(dbc1, null);
DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels, 0);
Database db2 = new StaticArrayDatabase(dbc2, null);
so that the DBIDs are the same. Then ClusterContingencyTable should work.
By default, ELKI would continue enumerating objects, so the first database would have IDs 1..n, and the second n+1..2n. But in order to compare clusterings, they need to contain the same objects, not disjoint sets.
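Putting the pieces together, a rough sketch of the whole flow (written from memory of the 0.7.x API, so check the method names against your ELKI version; note the initialize() calls, which are needed before running any algorithm):

// Same data array, two label arrays; start both DBID ranges at 0 so that
// the two databases enumerate the same objects.
DatabaseConnection dbc1 = new ArrayAdapterDatabaseConnection(data, result_labels, 0);
Database db1 = new StaticArrayDatabase(dbc1, null);
db1.initialize();
DatabaseConnection dbc2 = new ArrayAdapterDatabaseConnection(data, reference_labels, 0);
Database db2 = new StaticArrayDatabase(dbc2, null);
db2.initialize();

// Trivial clusterings derived from the label columns.
Clustering<Model> result = new ByLabelClustering().run(db1);
Clustering<Model> reference = new ByLabelClustering().run(db2);

// Compare the two clusterings with pair-counting measures.
ClusterContingencyTable ct = new ClusterContingencyTable(true, false);
ct.process(result, reference);
PairCounting paircount = ct.getPaircount();
System.out.println("Pair-counting F1: " + paircount.f1Measure());
System.out.println("Adjusted Rand index: " + paircount.adjustedRandIndex());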
I am doing a task converting a script written in PowerBuilder to Java, and I am stuck converting the DataStore object into Java. I have something like this:
lds_appeal_application = Create DataStore
lds_appeal_application.DataObject = "ds_appeal_application_report"
lds_appeal_application.SetTransObject(SQLCA)
ll_row = lds_appeal_application.retrieve(as_ksdyh, adt_start_date, adt_end_date, as_exam_name, as_subject_code)
for ll_rc = 1 to ll_row
    ldt_update_date = lds_appeal_application.GetItemDatetime(ll_rc, "sqsj")
    ls_caseno = trim(lds_appeal_application.GetItemString(ll_rc, "caseno"))
    ls_candidate_no = trim(lds_appeal_application.GetItemString(ll_rc, "zkzh"))
    ls_subjectcode = trim(lds_appeal_application.GetItemString(ll_rc, "kmcode"))
    ls_papercode = trim(lds_appeal_application.GetItemString(ll_rc, "papercode"))
    ls_name = trim(lds_appeal_application.GetItemString(ll_rc, "mc"))
    ll_ksh = lds_appeal_application.GetItemDecimal(ll_rc, "ks_h")
    ll_kmh = lds_appeal_application.GetItemDecimal(ll_rc, "km_h")
Simply speaking, a DataStore is created and a data table is pointed to it by a SQL query (ds_appeal_application_report). Finally, a for loop is used to retrieve information from the table.
In Java, I use an entity manager to createNativeQuery, and the query can return a list of object arrays. However, I just don't know how to retrieve the information the way the PowerBuilder DataStore object does.
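The closest I can think of is something like the sketch below, though I'm not sure it is the right approach (the column aliases and their order in the Object[] are just my illustration, not the real report query; the concrete Java types depend on the JDBC driver):

// Native query standing in for the "ds_appeal_application_report" SQL (illustrative only).
Query q = entityManager.createNativeQuery(
        "SELECT sqsj, caseno, zkzh, kmcode, papercode, mc, ks_h, km_h " +
        "FROM appeal_application WHERE ...");   // parameters omitted

@SuppressWarnings("unchecked")
List<Object[]> rows = q.getResultList();

for (Object[] row : rows) {
    // Each Object[] is one row; the index plays the role of GetItemXxx(row, column).
    java.util.Date updateDate = (java.util.Date) row[0];        // sqsj
    String caseNo      = ((String) row[1]).trim();              // caseno
    String candidateNo = ((String) row[2]).trim();              // zkzh
    String subjectCode = ((String) row[3]).trim();              // kmcode
    String paperCode   = ((String) row[4]).trim();              // papercode
    String name        = ((String) row[5]).trim();              // mc
    java.math.BigDecimal ksH = (java.math.BigDecimal) row[6];   // ks_h
    java.math.BigDecimal kmH = (java.math.BigDecimal) row[7];   // km_h
    // ... use the values ...
}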
Please give me some advice. Thanks.
I have a question about inserting double/float data into Cassandra with Hector:
new Double("13.45")------->13.468259733915328
new Float("64.13") ------->119.87449
Here is how I insert the data into Cassandra with Hector:
TestDouble ch = new TestDouble("talend_bj", "localhost:9160");
String family = "talend_1";
ch.ensureColumnFamily(family);

List values = new ArrayList();
values.add(HFactory.createColumn("id", 2, StringSerializer.get(),
        IntegerSerializer.get()));
values.add(HFactory.createColumn("name", "zhang",
        StringSerializer.get(), StringSerializer.get()));
values.add(HFactory.createColumn("salary", 13.45,
        StringSerializer.get(), DoubleSerializer.get()));
ch.insertSuper("14", values, "user1", family, StringSerializer.get(),
        StringSerializer.get());

StringSerializer se = StringSerializer.get();
MultigetSuperSliceQuery<String, String, String, String> q =
        me.prettyprint.hector.api.factory.HFactory
                .createMultigetSuperSliceQuery(ch.getKeyspace(), se, se, se, se);
// q.setSuperColumn("user1").setColumnNames("id","name")
q.setKeys("12", "11", "13", "14");
q.setColumnFamily(family);
q.setRange("z", "z", false, 100);
QueryResult<SuperRows<String, String, String, String>> r =
        q.setColumnNames("user1", "user").execute();

Iterator iter = r.get().iterator();
while (iter.hasNext()) {
    SuperRow superRow = (SuperRow) iter.next();
    SuperSlice s = superRow.getSuperSlice();
    List<HSuperColumn> superColumns = s.getSuperColumns();
    for (HSuperColumn superColumn : superColumns) {
        List<HColumn> columns = superColumn.getColumns();
        System.out.println(DoubleSerializer.get().fromBytes(
                ((String) superColumn.getSubColumnByName("salary").getValue()).getBytes()));
    }
}
You can see that I insert 13.45, but the column value I get back is 13.468259733915328.
You should break the problem in two. After writing, IF you defined that part of your schema OR use the ASSUME keyword in the command-line CLI, view the data in Cassandra to see if it is correct. PlayOrm has this EXACT unit test (in PlayOrm on top of Astyanax, not Hector) and it works just fine. Notice the comparison in that test with -200.23...
https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/TestColumnSlice.java
Once done, does your data in Cassandra look correct? If so, the issue is in how you read the value in your code; otherwise, it is in the writes.
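If the stored bytes turn out to be fine, one thing to double-check on the read side (just a guess from the code above, not something I have run against your cluster): the sub-column value is decoded as a String and then re-encoded with getBytes(), which will not round-trip a double. Reading the raw bytes instead would look roughly like this:

// Let DoubleSerializer decode the raw column bytes directly instead of
// going through a String first.
HColumn salary = superColumn.getSubColumnByName("salary");
double value = DoubleSerializer.get().fromByteBuffer(salary.getValueBytes());
System.out.println(value);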