Is it possible to achieve paging in Cassandra through the Hector API for user-defined queries?
If yes, how?
I have added a basic method; the rest you have to handle yourself. Here, as you can see, we have defined a page size of 100 rows, each having 10 columns. After the first iteration you have to store the last key value somehow, which will be the starting point for the next iteration.
int row_count = 100;
RangeSlicesQuery<UUID, String, Long> rangeSlicesQuery = HFactory
    .createRangeSlicesQuery(keyspace, UUIDSerializer.get(), StringSerializer.get(), LongSerializer.get())
    .setColumnFamily("Column Family")
    .setRange(null, null, false, 10)
    .setRowCount(row_count);
UUID last_key = null;
while (true) {
    // start the next page at the last key seen so far
    rangeSlicesQuery.setKeys(last_key, null);
    System.out.println(" > " + last_key);
    QueryResult<OrderedRows<UUID, String, Long>> result = rangeSlicesQuery.execute();
    OrderedRows<UUID, String, Long> rows = result.get();
    Iterator<Row<UUID, String, Long>> rowsIterator = rows.iterator();
    // on every page after the first, the start key was already processed, so skip it
    if (last_key != null && rowsIterator != null) rowsIterator.next();
    while (rowsIterator.hasNext()) {
        Row<UUID, String, Long> row = rowsIterator.next();
        last_key = row.getKey();
        if (row.getColumnSlice().getColumns().isEmpty()) {
            // range ghost (deleted row), skip it
            continue;
        }
        // process the row here
    }
    // fewer rows than requested means this was the last page
    if (rows.getCount() < row_count) {
        break;
    }
}
I'm trying to delete all items in my table in DynamoDB but it does not work.
try {
ScanRequest scanRequest = new ScanRequest().withTableName(table);
ScanResult scanResult = null;
do {
if (Check.nonNull(scanResult)) {
scanRequest.setExclusiveStartKey(scanResult.getLastEvaluatedKey());
}
scanResult = client.scan(scanRequest);
scanResult.getItems().forEach((Item) -> {
String n1 = Item.get("n1").toString();
String n2 = Item.get("n2").toString();
DeleteItemSpec spec = new DeleteItemSpec().withPrimaryKey("n1", n1, "n2", n2);
dynamodb.getTable(table).deleteItem(spec);
});
} while (Check.nonNull(scanResult.getLastEvaluatedKey()));
} catch (Exception e) {
throw new BadRequestException(e);
}
n1 is my Primary partition key
n2 is my Primary sort key
The best approach to delete all the items from DynamoDB is to drop the table and recreate it.
Otherwise, a lot of read capacity and write capacity units are consumed, which will cost you.
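For illustration, here is a rough sketch of the drop-and-recreate approach with the AWS SDK for Java document API; the key names come from the question, but the attribute types, throughput values, and method name are assumptions you would adapt to your own table definition:
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;
import java.util.Arrays;

// "dynamodb" is an already configured DynamoDB document client (assumption)
static void recreateTable(DynamoDB dynamodb, String tableName) throws InterruptedException {
    // drop the existing table and wait until it is really gone
    Table table = dynamodb.getTable(tableName);
    table.delete();
    table.waitForDelete();

    // recreate it with the same key schema ("n1" partition key, "n2" sort key, as in the question)
    Table recreated = dynamodb.createTable(
            tableName,
            Arrays.asList(
                    new KeySchemaElement("n1", KeyType.HASH),
                    new KeySchemaElement("n2", KeyType.RANGE)),
            Arrays.asList(
                    new AttributeDefinition("n1", ScalarAttributeType.S),
                    new AttributeDefinition("n2", ScalarAttributeType.S)),
            new ProvisionedThroughput(5L, 5L));
    recreated.waitForActive();
}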
PREAMBLE: While a scan operation is expensive, I needed this answer for initialising a table for a test scenario (low volume). The table was being created by another process and I needed the test scenario on that table, so I could not delete and recreate it.
ANSWER:
given:
DynamoDbClient db
static String TABLE_NAME
static String HASH_KEY
static String SORT_KEY
ScanIterable scanIterable = db.scanPaginator(ScanRequest.builder()
        .tableName(TABLE_NAME)
        .build());

for (ScanResponse scanResponse : scanIterable) {
    for (Map<String, AttributeValue> item : scanResponse.items()) {
        // build the key of the item to delete from its hash and sort attributes
        Map<String, AttributeValue> deleteKey = new HashMap<>();
        deleteKey.put(HASH_KEY, item.get(HASH_KEY));
        deleteKey.put(SORT_KEY, item.get(SORT_KEY));

        db.deleteItem(DeleteItemRequest.builder()
                .tableName(TABLE_NAME)
                .key(deleteKey).build());
    }
}
To delete all the items from the table, first perform a scan operation over the table, which gives you a ScanOutcome. Then iterate over the outcome and delete each item by its primary key name and value. This is one approach to deleting all the items from the table. Hope this code works for you. Thanks
Table table = dynamoDB.getTable(your_table);
ItemCollection<ScanOutcome> deleteoutcome = table.scan();
Iterator<Item> iterator = deleteoutcome.iterator();
while (iterator.hasNext()) {
    // delete by the partition key name and the value read from the scanned item
    table.deleteItem("PrimaryKey", iterator.next().get("PrimaryKey"));
}
// Maybe we can make it generic by reading the key schema first, as below
String strPartitionKey = null;
String strSortKey = null;
TableDescription description = table.describe();
List<KeySchemaElement> schema = description.getKeySchema();
for (KeySchemaElement element : schema) {
if (element.getKeyType().equalsIgnoreCase("HASH"))
strPartitionKey = element.getAttributeName();
if (element.getKeyType().equalsIgnoreCase("RANGE"))
strSortKey = element.getAttributeName();
}
ItemCollection<ScanOutcome> deleteoutcome = table.scan();
Iterator<Item> iterator = deleteoutcome.iterator();
while (iterator.hasNext()) {
Item next = iterator.next();
if (strSortKey == null && strPartitionKey != null)
table.deleteItem(strPartitionKey, next.get(strPartitionKey));
else if (strPartitionKey != null && strSortKey != null)
table.deleteItem(strPartitionKey, next.get(strPartitionKey), strSortKey, next.get(strSortKey));
}
I have a big CSV file, thousands of rows, and I want to aggregate some columns using Java code.
The file is in the form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Put your data into a Map-like structure; each time a key (in your case "" + T + year) is found, add 1 to the stored value.
You can use a map like
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1" + "2012", 1);
rowMap.put("T2" + "2015", 1);
rowMap.put("T1" + "2013", 1);
or you can define your own class with T and year fields, overriding the hashCode() and equals() methods (a sketch of such a class follows below). Then you can use
Map<YourClass, Integer> map = new HashMap<>();
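A minimal sketch of that second option; the class and field names (YearType, type, year) are made up for illustration, and the counting loop itself stays the same as with String keys:
import java.util.Objects;

// composite key: the T column plus the year column
public final class YearType {
    private final String type;
    private final String year;

    public YearType(String type, String year) {
        this.type = type;
        this.year = year;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof YearType)) return false;
        YearType other = (YearType) o;
        return type.equals(other.type) && year.equals(other.year);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, year);
    }
}
With that in place, counting one CSV row becomes map.merge(new YearType(fields[2], fields[1]), 1, Integer::sum); (Java 8+).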
String csv =
"1,2012,T1\n"
+ "2,2015,T2\n"
+ "3,2013,T1\n"
+ "4,2012,T1\n";
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
String[] fields = line.split(",");
String key = fields[2] + "," + fields[1];
Integer value = map.get(key);
if (value == null)
value = 0;
map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); //select the columns we are going to read
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); //stores the results here
//Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
List<String> key = Arrays.asList(row); // converts the input array to a List - lists implement hashCode and equals based on their values so they can be used as keys on your map.
Integer count = results.get(key);
if (count == null) {
count = 0;
}
results.put(key, count + 1);
}
});
//creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);
String input = "1,2012,T1"
+ "\n2,2015,T2"
+ "\n3,2013,T1"
+ "\n4,2012,T1";
//the parse() method will parse and submit all rows to your RowProcessor - use a FileReader to read a file instead of the String I'm using as an example.
parser.parse(new StringReader(input));
//Here are the results:
for(Entry<List<String>, Integer> entry : results.entrySet()){
System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I have a TreeMap keyed by the Joda DateTime object and it does not seem to be sorting. Here is the definition:
TreeMap<DateTime, HolderAnswer> dateTimeTreeMap = new TreeMap<DateTime, HolderAnswer>();
I added in the values as follows (I'm just using a generic SQL statement here):
//then get previously selected answers to move to the top of the list
String sql = "Select ActionDT, RecID, TextID, Text, Value from Foo";
Cursor c = DataBaseConnector.query(sql);
if (c != null) {
if (c.moveToFirst()) {
do {
HolderAnswer answer = null;
boolean valueAlreadyIn = false;
DateTime dt = formatter.parseDateTime(c.getString(c.getColumnIndex("ActionDT")));
//we will be adding in the options in the next section, setting to null for now.
answer = new HolderAnswer(c.getInt(c.getColumnIndex("RecID")),c.getInt(c.getColumnIndex("TextID")),null,count,c.getString(c.getColumnIndex("Text")));
//////////////////////////////////////////////////////////////
Iterator<Entry<DateTime, HolderAnswer>> it = dateTimeTreeMap.entrySet().iterator();
while (it.hasNext()) {
Entry<DateTime, HolderAnswer> pairs = it.next();
HolderAnswer tempAnswer = (HolderAnswer) pairs.getValue();
DateTime tempDateTime = (DateTime) pairs.getKey();
//if answers match, transfer over options
if (answer.getTextID() == tempAnswer.getTextID()) {
valueAlreadyIn = true;
}
}
if (!valueAlreadyIn) {
dateTimeTreeMap.put(dt,answer);
}
//////////////////////////////////////////////////////////////////
//count++;
} while(c.moveToNext());
c.close();
c = null;
}
}
When I print out the values, they don't seem to be sorted; they come out in no discernible pattern. Even doing:
dateTimeTreeMap.descendingMap();
Does nothing. Am I missing something?
The descendingMap() method returns a reverse-order view of the mappings contained in this map, so calling it without using the returned view does nothing; it looks like you're forgetting to capture the result (note that it comes back as a NavigableMap, not a TreeMap).
NavigableMap<DateTime, HolderAnswer> descending = dateTimeTreeMap.descendingMap();
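For illustration, a minimal sketch (assuming Joda-Time on the classpath and a placeholder String value type) showing that a TreeMap keyed by DateTime already iterates in ascending chronological order, and that the descending view has to be captured to be useful:
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import org.joda.time.DateTime;

public class TreeMapOrderDemo {
    public static void main(String[] args) {
        TreeMap<DateTime, String> map = new TreeMap<DateTime, String>();
        map.put(new DateTime(2013, 5, 1, 0, 0), "middle");
        map.put(new DateTime(2015, 1, 1, 0, 0), "newest");
        map.put(new DateTime(2012, 3, 1, 0, 0), "oldest");

        // iteration follows DateTime's natural (chronological) ordering: oldest, middle, newest
        for (Map.Entry<DateTime, String> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // descendingMap() only returns a view; capture it to iterate newest-first
        NavigableMap<DateTime, String> newestFirst = map.descendingMap();
        System.out.println(newestFirst.firstKey()); // the 2015 entry
    }
}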
I want to have Solr always retrieve all documents found by a search (I know Solr wasn't built for that, but anyways) and I am currently doing this with this code:
...
QueryResponse response = solr.query(query);
int offset = 0;
int totalResults = (int) response.getResults().getNumFound();
List<Article> ret = new ArrayList<Article>(totalResults);
query.setRows(FETCH_SIZE);
while(offset < totalResults) {
//requires an int? wtf?
query.setStart((int) offset);
int left = totalResults - offset;
if(left < FETCH_SIZE) {
query.setRows(left);
}
response = solr.query(query);
List<Article> current = response.getBeans(Article.class);
offset += current.size();
ret.addAll(current);
}
...
This works, but is pretty slow if a query gets over 1000 hits (I've read about that on here; it is caused by Solr because I am setting the start every time, which for some reason takes a while). What would be a nicer (and faster) way to do this?
To improve on the suggested answer you could use a streamed response. This was added especially for the case where one fetches all results. As you can see in Solr's Jira, that person wants to do the same thing you do. This has been implemented for Solr 4.
This is also described in Solrj's javadoc.
Solr will pack the response and create a whole XML/JSON document before it starts sending the response. Then your client is required to unpack all that and offer it as a list to you. By using streaming and parallel processing, which you can do when using such a queued approach, the performance should improve further.
Yes, you will lose the automatic bean mapping, but as performance is a factor here, I think this is acceptable.
Here is a sample unit test:
public class StreamingTest {
@Test
public void streaming() throws SolrServerException, IOException, InterruptedException {
HttpSolrServer server = new HttpSolrServer("http://your-server");
SolrQuery tmpQuery = new SolrQuery("your query");
tmpQuery.setRows(Integer.MAX_VALUE);
final BlockingQueue<SolrDocument> tmpQueue = new LinkedBlockingQueue<SolrDocument>();
server.queryAndStreamResponse(tmpQuery, new MyCallbackHandler(tmpQueue));
SolrDocument tmpDoc;
do {
tmpDoc = tmpQueue.take();
} while (!(tmpDoc instanceof PoisonDoc));
}
private class PoisonDoc extends SolrDocument {
// marker to finish queuing
}
private class MyCallbackHandler extends StreamingResponseCallback {
private BlockingQueue<SolrDocument> queue;
private long currentPosition;
private long numFound;
public MyCallbackHandler(BlockingQueue<SolrDocument> aQueue) {
queue = aQueue;
}
@Override
public void streamDocListInfo(long aNumFound, long aStart, Float aMaxScore) {
// called before start of streaming
// probably use for some statistics
currentPosition = aStart;
numFound = aNumFound;
if (numFound == 0) {
queue.add(new PoisonDoc());
}
}
@Override
public void streamSolrDocument(SolrDocument aDoc) {
currentPosition++;
System.out.println("adding doc " + currentPosition + " of " + numFound);
queue.add(aDoc);
if (currentPosition == numFound) {
queue.add(new PoisonDoc());
}
}
}
}
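If you do still want bean mapping on top of the streamed documents, SolrJ's DocumentObjectBinder can convert each SolrDocument into your annotated bean. A rough sketch, reusing the queue from the test above and assuming an Article class annotated with @Field and a hypothetical process() method:
DocumentObjectBinder binder = server.getBinder();
SolrDocument doc;
while (!((doc = tmpQueue.take()) instanceof PoisonDoc)) {
    // map the raw document onto the annotated bean, then hand it off
    Article article = binder.getBean(Article.class, doc);
    process(article); // placeholder for whatever you do with each result
}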
You might improve performance by increasing FETCH_SIZE. Since you are getting all the results, pagination doesn't make sense unless you are concerned with memory or some such. If 1000 results are liable to cause a memory overflow, I'd say your current performance seems pretty outstanding though.
So I would try getting everything at once, simplifying this to something like:
//WHOLE_BUNCHES is a constant representing a reasonable max number of docs we want to pull here.
//Integer.MAX_VALUE would probably invite an OutOfMemoryError, but that would be true of the
//implementation in the question anyway, since they were still being stored in the list at the end.
query.setRows(WHOLE_BUNCHES);
QueryResponse response = solr.query(query);
int totalResults = (int) response.getResults().getNumFound(); //If you even still need this figure.
List<Article> ret = response.getBeans(Article.class);
If you need to keep the pagination though:
You are performing this first query:
QueryResponse response = solr.query(query);
and are populating the number of found results from it, but you are not pulling any results with the response. Even if you keep pagination here, you could at least eliminate one extra query here.
This:
int left = totalResults - offset;
if(left < FETCH_SIZE) {
query.setRows(left);
}
Is unnecessary. setRows specifies a maximum number of rows to return, so asking for more than are available won't cause any problems.
Finally, apropos of nothing, but I have to ask: what argument would you expect setStart to take if not an int?
Use the logic below to fetch Solr data in batches, to optimize the performance of the data-fetch query:
public List<Map<String, Object>> getData(int id,Set<String> fields){
final int SOLR_QUERY_MAX_ROWS = 3;
long start = System.currentTimeMillis();
SolrQuery query = new SolrQuery();
String queryStr = "id:" + id;
LOG.info(queryStr);
query.setQuery(queryStr);
query.setRows(SOLR_QUERY_MAX_ROWS);
QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
List<Map<String, Object>> mapList = null;
if (rsp != null) {
long total = rsp.getResults().getNumFound();
System.out.println("Total count found: " + total);
// Solr query batch
mapList = new ArrayList<Map<String, Object>>();
if (total <= SOLR_QUERY_MAX_ROWS) {
addAllData(mapList, rsp,fields);
} else {
int marker = SOLR_QUERY_MAX_ROWS;
do {
if (rsp != null) {
addAllData(mapList, rsp,fields);
}
query.setStart(marker);
rsp = server.query(query, SolrRequest.METHOD.POST);
marker = marker + SOLR_QUERY_MAX_ROWS;
} while (marker <= total);
}
}
long end = System.currentTimeMillis();
LOG.debug("SOLR Performance: getData: " + (end - start));
return mapList;
}
private void addAllData(List<Map<String, Object>> mapList, QueryResponse rsp,Set<String> fields) {
for (SolrDocument sdoc : rsp.getResults()) {
Map<String, Object> map = new HashMap<String, Object>();
for (String field : fields) {
map.put(field, sdoc.getFieldValue(field));
}
mapList.add(map);
}
}
I'm using Tapestry5 and Hibernate. I'm trying to build a criteria query that uses dynamic restrictions generated from the URL. My URL context is designed like a key/value pair.
Example
www.mywebsite.com/make/ford/model/focus/year/2009
I decode the parameters as follows:
private Map<String, String> queryParameters;
private List<Vehicle> vehicles;
void onActivate(EventContext context) {
//Count is 6 - make/ford/model/focus/year/2009
int count = context.getCount();
if (count > 0) {
int i;
for (i = 0; (i + 1) < count; i += 2) {
String name = context.get(String.class, i);
String value = context.get(String.class, i + 1);
example "make"
System.out.println("name " + name);
example "ford"
System.out.println("value " + value);
this.queryParameters.put(name, value);
}
}
this.vehicles = this.session.createCriteria(Vehicle.class)
...add dynamic restrictions.
}
I was hoping someone could help me to figure out how to dynamically add the list of restrictions to my query. I'm sure this has been done, so if anybody knows of a post, that would be helpful too. Thanks
Exactly as the other answer said, but spelled out in more detail here. I think the crux of your question is really 'show me how to add a restriction'. That is my interpretation anyhow.
You need to decode each restriction into its own field.
You need to know the Java entity property name for each field.
Then build a Map of these two things: the key is the known, static Java entity property name and the value is the URL-decoded data (possibly with type conversion).
private Map<String, Object> queryParameters;
private List<Vehicle> vehicles;
void onActivate(EventContext context) {
//Count is 6 - make/ford/model/focus/year/2009
int count = context.getCount();
queryParameters = new HashMap<String,Object>();
if (count > 0) {
int i;
for (i = 0; (i + 1) < count; i += 2) {
String name = context.get(String.class, i);
String value = context.get(String.class, i + 1);
Object sqlValue = value;
if("foobar".equals(name)) {
// sometimes you don't want a String type for SQL composition
// so convert it
sqlValue = UtilityClass.doTypeConversionForFoobar(value);
} else if("search".equals(name) ||
"model".equals(name) ||
"year".equals(name)) {
// no-op this is valid 'name'
} else if("make".equals(name)) {
// this is a suggestion depends on your project conf
name = "vehicleMake.name";
} else {
continue; // ignore values we did not expect
}
// FIXME: You should validate all 'name' values
// to be valid and/or convert to Java property names here
System.out.println("name " + name);
System.out.println("value " + value);
this.queryParameters.put(name, sqlValue);
}
}
Criteria crit = this.session.createCriteria(Vehicle.class);
for(Map.Entry<String,Object> e : this.queryParameters.entrySet()) {
String n = e.getKey();
Object v = e.getValue();
// Sometimes you don't want a direct compare with 'Restrictions.eq()'
if("search".equals(n))
crit.add(Restrictions.like(n, "%" + v + "%"));
else // Most of the time you do
crit.add(Restrictions.eq(n, v));
}
this.vehicles = crit.list(); // run query
}
See also https://docs.jboss.org/hibernate/orm/3.5/reference/en/html/querycriteria.html
With the above there should be no risk of SQL injection, since the "name" and "n" parts are 100% validated against a known-good list. The "value" and "v" are correctly escaped, just like using a SQL positional placeholder '?'.
E&OE
I would assume you would just loop over the parameters Map and add a Restriction for each pair, as sketched below.
Be aware that this will open you up to SQL injection attacks if you are not careful. The easiest way to protect against this would be to check the keys against the known Vehicle properties before adding them to the Criteria.
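A minimal sketch of that idea, assuming queryParameters holds the decoded name/value pairs and hard-coding the whitelist of property names for illustration:
Set<String> allowedProperties = new HashSet<String>(Arrays.asList("make", "model", "year"));

Criteria criteria = session.createCriteria(Vehicle.class);
for (Map.Entry<String, String> entry : queryParameters.entrySet()) {
    // only build restrictions for keys we know are real Vehicle properties
    if (allowedProperties.contains(entry.getKey())) {
        criteria.add(Restrictions.eq(entry.getKey(), entry.getValue()));
    }
}
List<Vehicle> vehicles = criteria.list();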
Another option would be to create an example query by building an object from the name/value pairs:
Vehicle vehicle = new Vehicle();
int count = context.getCount();
int i;
for (i = 0; (i + 1) < count; i += 2) {
String name = context.get(String.class, i);
String value = context.get(String.class, i + 1);
// This will call the setter for the name, passing the value
// So if name is 'make' and value is 'ford', it will call vehicle.setMake('ford')
BeanUtils.setProperty(vehicle, name, value);
}
// This is using a Hibernate example query:
vehicles = session.createCriteria(Vehicle.class).add(Example.create(vehicle)).list();
See BeanUtils.setProperty and Example Queries for more info.
That assumes you are allowing only one value per property and that the query parameters map to the property names correctly. There may also be conversion issues to think about, but I think setProperty handles the common ones.
If they are query parameters you should treat them as query parameters instead of path parameters. Your URL should look something like:
www.mywebsite.com/vehicles?make=ford&model=focus&year=2009
and your code should look something like this:
public class Vehicles {
@ActivationRequestParameter
private String make;
@ActivationRequestParameter
private String model;
@ActivationRequestParameter
private String year;
@Inject
private Session session;
// holds the query results (assumed; not shown in the original snippet)
private List<Vehicle> vehicles;
@OnEvent(EventConstants.ACTIVATE)
void activate() {
Criteria criteria = session.createCriteria(Vehicle.class);
if (make != null) criteria.add(Restrictions.eq("make", make));
if (model != null) criteria.add(Restrictions.eq("model", model));
if (year != null) criteria.add(Restrictions.eq("year", year));
vehicles = criteria.list();
}
}
Assuming you are using the Grid component to display the vehicles, I'd highly recommend using the HibernateGridDataSource instead of making the query in the "activate" event handler.
public class Vehicles {
@ActivationRequestParameter
private String make;
@ActivationRequestParameter
private String model;
@ActivationRequestParameter
private String year;
@Inject
private Session session;
@OnEvent(EventConstants.ACTIVATE)
void activate() {
}
public GridDataSource getVehicles() {
return new HibernateGridDataSource(session, Vehicle.class) {
@Override
protected void applyAdditionalConstraints(Criteria criteria) {
if (make != null) criteria.add(Restrictions.eq("make", make));
if (model != null) criteria.add(Restrictions.eq("model", model));
if (year != null) criteria.add(Restrictions.eq("year", year));
}
};
}
}