Retrieve multiple column values from Cassandra using the Hector client - Java

I am working with Cassandra and using the Hector client to read and upsert data in the database. I can retrieve data with Hector as long as I only ask for a single column.
Now I want to retrieve the data for rowKey 1011, but with columnNames as a collection of strings. Below is my API that retrieves data from Cassandra using the Hector client:
public Map<String, String> getAttributes(String rowKey, Collection<String> attributeNames, String columnFamily) {
    final Cluster cluster = CassandraHectorConnection.getInstance().getCluster();
    final Keyspace keyspace = CassandraHectorConnection.getInstance().getKeyspace();
    try {
        // So far I can only query a single, hard-coded column ("c1")
        ColumnQuery<String, String, String> columnQuery = HFactory
                .createStringColumnQuery(keyspace)
                .setColumnFamily(columnFamily)
                .setKey(rowKey)
                .setName("c1");
        QueryResult<HColumn<String, String>> result = columnQuery.execute();
        System.out.println("Column name from Cassandra: " + result.get().getName()
                + ", column value from Cassandra: " + result.get().getValue());
    } catch (HectorException e) {
        LOG.error("Exception in CassandraHectorClient::getAttributes " + e
                + ", rowKey = " + rowKey + ", attribute names = " + attributeNames);
    } finally {
        cluster.getConnectionManager().shutdown();
    }
    return null;
}
As you can see, the method above retrieves data for a particular rowKey and for the single column c1. What I need now is to retrieve data for a collection of columns for a particular rowKey.
In other words, I want to retrieve the data for multiple columns, but for the same rowKey. How can I do this with the Hector client? I do not want to fetch all columns and then iterate over them to find the individual columns I am looking for.

Use composite column names made up of a UTF8Type component and a TIMEUUID component, then build a slice query over that composite range:
// Assuming a slice query built with a CompositeSerializer for the column names, e.g.:
// SliceQuery<String, Composite, String> sliceQuery = HFactory.createSliceQuery(
//         keyspace, StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
sliceQuery.setKey("your row key");

Composite startRange = new Composite();
startRange.addComponent(0, "c1", Composite.ComponentEquality.EQUAL);

Composite endRange = new Composite();
endRange.addComponent(0, "c1", Composite.ComponentEquality.GREATER_THAN_EQUAL);

sliceQuery.setRange(startRange, endRange, false, Integer.MAX_VALUE);

QueryResult<ColumnSlice<Composite, String>> result = sliceQuery.execute();
ColumnSlice<Composite, String> cs = result.get();
The code above gives you all columns in that composite range for your row key. After that, iterate over them as follows:
for (HColumn<Composite, String> col : cs.getColumns()) {
    System.out.println("column key's first part  : " + col.getName().get(0, HFactoryHelper.stringSerializer).toString());
    System.out.println("column key's second part : " + col.getName().get(1, HFactoryHelper.uuidSerializer).toString());
    System.out.println("column's value           : " + col.getValue());
}
Somewhere you will have to write the logic that maintains your set of records.
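If the column names are plain strings rather than composites, another option that maps more directly onto the question is Hector's SliceQuery with setColumnNames, which fetches only the requested columns for a single row key. A minimal sketch, assuming String serializers for key, column name, and value (the connection helper and map-building are illustrative, not taken from the original code):
public Map<String, String> getAttributes(String rowKey, Collection<String> attributeNames, String columnFamily) {
    final Keyspace keyspace = CassandraHectorConnection.getInstance().getKeyspace();
    final StringSerializer ss = StringSerializer.get();

    // Ask Hector for exactly the named columns of this row
    SliceQuery<String, String, String> sliceQuery = HFactory
            .createSliceQuery(keyspace, ss, ss, ss)
            .setColumnFamily(columnFamily)
            .setKey(rowKey)
            .setColumnNames(attributeNames.toArray(new String[0]));

    QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();

    Map<String, String> attributes = new HashMap<String, String>();
    for (HColumn<String, String> column : result.get().getColumns()) {
        attributes.put(column.getName(), column.getValue());
    }
    return attributes;
}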

Related

Cassandra Trigger

I am new to Cassandra. I use Cassandra 3.10 and have tables like these:
create table db1.table1 (id text, trip_id text, event_time timestamp, mileage double, primary key(id, event_time));
create table db1.table2 (id text, trip_id text, start_time timestamp, mileage double, primary key(id, start_time));
I need to transfer data from table1 to table2, aggregated by trip_id with a sum over mileage, and update the data in table2.
I have written a trigger function to get the column names and values:
public Collection<Mutation> augment(Partition partition) {
    Map<String, byte[]> map = new HashMap<>();
    CFMetaData cfm = partition.metadata();
    String tableName = cfm.cfName;
    try {
        UnfilteredRowIterator it = partition.unfilteredIterator();
        while (it.hasNext()) {
            Unfiltered un = it.next();
            Clustering clt = (Clustering) un.clustering();
            Iterator<Cell> cells = partition.getRow(clt).cells().iterator();
            while (cells.hasNext()) {
                Cell cell = cells.next();
                map.put(cell.column().name.toString(), cell.value().array());
                ...
            }
        }
    } catch (Exception e) {
    }
    ...
}
But how can I get the primary key columns and their values? If those are not accessible, how can I use a trigger function to do the job?
Yes, it is possible to get the primary key columns and their values.
To get the partition key columns and value, use:
List<ColumnDefinition> partitionKeyColumns = cfm.partitionKeyColumns();
ByteBuffer partitionKeyValues = partition.partitionKey().getKey();
To get the clustering key columns and values:
List<ColumnDefinition> clusteringKeyColumns = cfm.clusteringColumns();
ByteBuffer[] clusteringKeyValues = clt.getRawValues();
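To turn those ByteBuffers into readable values, each ColumnDefinition carries the column's AbstractType, whose getString method can decode the raw bytes. A minimal sketch, assuming a single (non-composite) partition key; a composite partition key would need to be split first (for example with CompositeType):
// Decode the partition key (a single-column key is assumed here)
ColumnDefinition pkColumn = cfm.partitionKeyColumns().get(0);
String pkName = pkColumn.name.toString();
String pkValue = pkColumn.type.getString(partition.partitionKey().getKey());
System.out.println(pkName + " = " + pkValue);

// Decode the clustering key values of the current row
List<ColumnDefinition> clusteringColumns = cfm.clusteringColumns();
ByteBuffer[] rawClusteringValues = clt.getRawValues();
for (int i = 0; i < clusteringColumns.size(); i++) {
    String name = clusteringColumns.get(i).name.toString();
    String value = clusteringColumns.get(i).type.getString(rawClusteringValues[i]);
    System.out.println(name + " = " + value);
}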

DynamoDB: get a single-column list of range keys or use a global secondary index

I have a DynamoDB table that contains video info.
Currently "videoID" is the primary (hash) key and "Category" is the range (sort) key.
I want to get a list of all of the "Categories" (Range keys) so I can allow the user to select from one of the available video categories.
https://www.quora.com/What-are-some-good-ways-to-extract-one-single-column-from-a-DynamoDB-table
I read that if you change the attribute "Category" into a global secondary index, you can return the items for that GSI, but I have not been able to find out how to do that.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSIJavaDocumentAPI.html
So I guess that gives me three questions:
Is there a way to find the items in Category by querying just the range key?
If I change Category to a GSI, can I find the items that way?
or
Is the only way of doing it scanning the whole table?
Thanks in advance for your help
Is the only way of doing it scanning the whole table?
No, you can create a GSI to avoid it.
Is there a way to find the items in Category by querying just the range key?
Yes. If you don't want to scan the entire table, create a GSI with Category as its hash key. The GSI acts as a table in itself, and you can query it by passing category values.
If I change Category to a GSI, can I find the items that way?
Yes, you can query the GSI with category values.
I read that if you change the attribute "Category" into a global secondary index, you can return the items for that GSI, but I have not been able to find out how to do that.
You need to define the GSI when you create the table; an example is given in the link you specified, and once the index exists you can query it. (DynamoDB also lets you add a GSI to an existing table later with an UpdateTable request.)
References: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Here is sample code to create the Videos table with a GSI.
Create the "Videos" table with a GSI:
@Autowired
private AmazonDynamoDBClient dynamoDBClient;

public Boolean createTableWithGlobalSecondaryIndex(String tableName) {
    CreateTableRequest createTableRequest = null;
    DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
    try {
        ArrayList<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
        attributeDefinitions.add(new AttributeDefinition().withAttributeName("videoid").withAttributeType("S"));
        attributeDefinitions.add(new AttributeDefinition().withAttributeName("category").withAttributeType("S"));

        ArrayList<KeySchemaElement> keySchema = new ArrayList<KeySchemaElement>();
        keySchema.add(new KeySchemaElement().withAttributeName("videoid").withKeyType(KeyType.HASH));
        keySchema.add(new KeySchemaElement().withAttributeName("category").withKeyType(KeyType.RANGE));

        // Initial provisioned throughput settings for the index
        ProvisionedThroughput ptIndex = new ProvisionedThroughput().withReadCapacityUnits(150L)
                .withWriteCapacityUnits(150L);

        GlobalSecondaryIndex videoCategoryGsi = new GlobalSecondaryIndex().withIndexName("VideoCategoryGsi")
                .withProvisionedThroughput(ptIndex)
                .withKeySchema(new KeySchemaElement().withAttributeName("category").withKeyType(KeyType.HASH),
                        new KeySchemaElement().withAttributeName("videoid").withKeyType(KeyType.RANGE))
                .withProjection(new Projection().withProjectionType(ProjectionType.ALL));

        createTableRequest = new CreateTableRequest().withTableName(tableName).withKeySchema(keySchema)
                .withAttributeDefinitions(attributeDefinitions)
                .withProvisionedThroughput(
                        new ProvisionedThroughput().withReadCapacityUnits(100L).withWriteCapacityUnits(100L))
                .withGlobalSecondaryIndexes(videoCategoryGsi);

        Table table = dynamoDB.createTable(createTableRequest);
        table.waitForActive();
    } catch (ResourceInUseException re) {
        if (re.getErrorMessage().equalsIgnoreCase("Cannot create preexisting table")) {
            LOGGER.info("Table already exists =============>" + tableName);
        } else if (re.getErrorMessage().contains("Table already exists")) {
            LOGGER.info("Table already exists =============>" + tableName);
            LOGGER.info("Message =============>" + re.getErrorCode() + ";" + re.getErrorMessage());
        } else {
            throw new RuntimeException("DynamoDB table cannot be created ...", re);
        }
    } catch (Exception db) {
        throw new RuntimeException("DynamoDB table cannot be created ...", db);
    }
    return true;
}
Query the GSI by category:
Here the input is just a category, and the query runs against the GSI; in other words, it does not scan the entire table either.
public List<String> findVideosByCategoryUsingGlobalSecondaryIndex(String category) {
    List<String> videoAsJson = new ArrayList<>();
    DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
    Table table = dynamoDB.getTable("Videos");
    Index index = table.getIndex("VideoCategoryGsi");

    QuerySpec querySpec = new QuerySpec();
    querySpec.withKeyConditionExpression("category = :val1")
            .withValueMap(new ValueMap().withString(":val1", category));

    ItemCollection<QueryOutcome> items = index.query(querySpec);
    Iterator<Item> itemIterator = items.iterator();
    while (itemIterator.hasNext()) {
        String videoJson = itemIterator.next().toJSON();
        System.out.println("Video json ==================>" + videoJson);
        videoAsJson.add(videoJson);
    }
    return videoAsJson;
}
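A quick usage sketch of the two helpers above (the category value is just an example):
// Create the table (logs and continues if it already exists), then query the GSI by category
createTableWithGlobalSecondaryIndex("Videos");
List<String> comedyVideos = findVideosByCategoryUsingGlobalSecondaryIndex("comedy");
System.out.println("Found " + comedyVideos.size() + " videos in the comedy category");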

Map a table of a Cassandra database using Spark and RDDs

I have to map a table that stores the utilization history of an app. The table holds tuples like these:
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
<AppId,date,cpuUsage,memoryUsage>
AppId is always different because it refers to many different apps, date is expressed in the format dd/mm/yyyy hh/mm, and cpuUsage and memoryUsage are expressed in %, so for example:
<3ghffh3t482age20304,230720142245,0.2,3,5>
I retrieved the data from Cassandra this way (small snippet):
public static void main(String[] args) {
    Cluster cluster;
    Session session;
    cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    session = cluster.connect();

    session.execute("CREATE KEYSPACE IF NOT EXISTS foo WITH replication "
            + "= {'class':'SimpleStrategy', 'replication_factor':3};");

    String createTableAppUsage = "CREATE TABLE IF NOT EXISTS foo.appusage"
            + "(appid text, date text, cpuusage double, memoryusage double, "
            + "PRIMARY KEY(appid, date)) "
            + "WITH CLUSTERING ORDER BY (date ASC);";
    session.execute(createTableAppUsage);

    // Use select to get the appusage table's rows
    ResultSet resultForAppUsage = session.execute("SELECT appid, cpuusage FROM foo.appusage");
    for (Row row : resultForAppUsage)
        System.out.println("appid: " + row.getString("appid") + " cpuusage: " + row.getDouble("cpuusage"));

    // Clean up the connection by closing it
    cluster.close();
}
So my problem now is to map the data into key/value pairs of the form <AppId, cpuusage>, integrating this code (a snippet that doesn't work):
JavaPairRDD<String, Integer> saveTupleKeyValue = someStructureFromTakeData.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String x) {
        return new Tuple2(x, y);
    }
});
How can I map appId and cpuusage using an RDD and then filter, e.g. keep only cpuusage > 50?
Any help? Thanks in advance.
Assuming you already have a valid SparkContext named sparkContext, have added the spark-cassandra-connector dependency to your project, and have configured your Spark application to talk to your Cassandra cluster (see the docs for that), you can load the data into an RDD like this:
val data = sparkContext.cassandraTable("foo", "appusage").select("appid", "cpuusage")
In Java the idea is the same, but it requires a bit more plumbing, as described here.
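A rough Java sketch of the same idea, assuming the spark-cassandra-connector's japi package, Java 8 lambdas, and an already configured JavaSparkContext named sc (variable names are illustrative):
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// sc is your existing, Cassandra-configured JavaSparkContext
// Load only the two columns we need and pair them up as <appid, cpuusage>
JavaPairRDD<String, Double> appIdToCpu = javaFunctions(sc)
        .cassandraTable("foo", "appusage")
        .select("appid", "cpuusage")
        .mapToPair(row -> new Tuple2<>(row.getString("appid"), row.getDouble("cpuusage")));

// Example: keep only the entries whose cpuusage exceeds a threshold
JavaPairRDD<String, Double> heavyUsage = appIdToCpu.filter(t -> t._2() > 50);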

HBase doesn't store all records

I have 1.2M records in my MongoDB database, and I want to store all of this data in HBase programmatically. Basically, I put each retrieved record into HBase in a loop. After the operation finishes, I get only 39912 records in HBase.
Here's what I've tried:
Configuration config = HBaseConfiguration.create();
String tableName = "storedtweet";
String familyName = "msg";
String qualifierName = "msg";
HTable table = new HTable(config, tableName);

// using Spring Data MongoDB to interact with MongoDB
List<StoredTweet> storedTweetList = mongoDAO.getMongoTemplate().findAll(StoredTweet.class);

for (StoredTweet storedTweet : storedTweetList) {
    Put p = new Put(Bytes.toBytes(storedTweet.getTweetId()));
    p.add(Bytes.toBytes(familyName), Bytes.toBytes(qualifierName), Bytes.toBytes(storedTweet.getMsg()));
    table.put(p);
    table.flushCommits();
}
If a row key already exists and you put it again, the new HBase Put overwrites the former one. I think some records in your data share the same tweet id (which you use as the row key). That's why some records disappear.
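If you would rather detect those collisions than silently overwrite, one option (a sketch, not part of the original answer) is checkAndPut, which only writes when the target cell does not exist yet:
// Inside the loop: only insert if no value exists yet for this row/column
boolean inserted = table.checkAndPut(
        Bytes.toBytes(storedTweet.getTweetId()),   // row key
        Bytes.toBytes(familyName),                 // column family
        Bytes.toBytes(qualifierName),              // qualifier
        null,                                      // null = expect the cell to be absent
        p);
if (!inserted) {
    System.out.println("Duplicate tweet id, skipped: " + storedTweet.getTweetId());
}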

Random column fetch in Cassandra

I am using this code to fetch user_id and user_code:
Keyspace keyspace = HFactory.createKeyspace("test", cluster);
CqlQuery<String, String, ByteBuffer> cqlQuery = new CqlQuery<String, String, ByteBuffer>(
        keyspace, stringSerializer, stringSerializer, new ByteBufferSerializer());
cqlQuery.setQuery("select user_id, user_code from User");
QueryResult<CqlRows<String, String, ByteBuffer>> result = cqlQuery.execute();

Iterator<Row<String, String, ByteBuffer>> iterator = result.get().iterator();
while (iterator.hasNext()) {
    Row<String, String, ByteBuffer> row = iterator.next();
    System.out.println("\nInserted data is as follows:\n"
            + row.getColumnSlice().getColumns().get(0).getValue().getInt());
    System.out.println("\nInserted data is as follows:\n"
            + Charset.forName("UTF-8").decode(row.getColumnSlice().getColumns().get(1).getValueBytes()));
}
The problem here is that I am converting each field according to its specific type.
What if the query is arbitrary and the columns are not known in advance? How do I handle that scenario?
CQL query results come back with metadata about the columns they contain, similar to a JDBC ResultSet.
I don't know if or how Hector exposes this information. For CQL, a better choice would be the newer pure CQL driver here: https://github.com/datastax/java-driver
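A small sketch of that approach with the DataStax Java driver (session setup omitted; column names and types are read from the result metadata instead of being hard-coded):
// The driver exposes column metadata, so the code works for an arbitrary SELECT
ResultSet rs = session.execute("select user_id, user_code from User");
ColumnDefinitions definitions = rs.getColumnDefinitions();
for (Row row : rs) {
    for (ColumnDefinitions.Definition definition : definitions) {
        String name = definition.getName();
        Object value = row.getObject(name); // the driver picks the Java type from the CQL type
        System.out.println(name + " (" + definition.getType() + ") = " + value);
    }
}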
