DynamoDB pagination query in Java

I am new to DynamoDB and have to implement pagination: I need to show ten records at a time in my HTML page. Can anyone share a sample query for pagination in DynamoDB? I have studied the Amazon DynamoDB tutorial but did not get the idea from it.
Can I implement pagination using the high-level and the low-level API? Can anyone suggest where to start?

As yegor256 suggested, you could use query(QueryRequest) or scan(ScanRequest) with setExclusiveStartKey instead. Here's a code snippet of how to do it:
HashMap<String, Condition> scanFilter = new HashMap<String, Condition>();
Condition condition = new Condition()
        .withComparisonOperator(ComparisonOperator.LT.toString())
        .withAttributeValueList(new AttributeValue().withN("100"));
scanFilter.put("column1", condition);

Boolean lastEval = true;
int count = 0;
ScanRequest scanRequest = new ScanRequest(tableName).withScanFilter(scanFilter);
while (lastEval) {
    ScanResult scanResult = dynamoDB.scan(scanRequest);
    count += scanResult.getCount();
    System.out.println("Page Size: " + scanResult.getCount());
    System.out.println("Total count = " + count);

    // LastEvaluatedKey is null or empty once the last page has been read
    if (scanResult.getLastEvaluatedKey() != null)
        lastEval = !scanResult.getLastEvaluatedKey().isEmpty();
    else
        lastEval = false;

    if (lastEval) {
        // continue the scan from where the previous page stopped
        scanRequest.setExclusiveStartKey(scanResult.getLastEvaluatedKey());
    }
}
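If the goal is literally ten records per HTML page rather than walking the whole table, a Limit plus the returned LastEvaluatedKey gives you page-by-page fetching. Here is a sketch along the same lines as the snippet above (tableName and dynamoDB as before; note that with a scan filter, Limit caps the items evaluated per request, not the items that pass the filter):

// first page of (up to) 10 items
ScanRequest pageRequest = new ScanRequest(tableName).withLimit(10);
ScanResult page = dynamoDB.scan(pageRequest);
for (Map<String, AttributeValue> item : page.getItems()) {
    System.out.println(item);   // render one row of the HTML page
}

// LastEvaluatedKey is the bookmark for the next page; persist it (for example in the
// user's session) and pass it back as ExclusiveStartKey on the next request
Map<String, AttributeValue> nextPageStart = page.getLastEvaluatedKey();
if (nextPageStart != null && !nextPageStart.isEmpty()) {
    ScanResult nextPage = dynamoDB.scan(pageRequest.withExclusiveStartKey(nextPageStart));
}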

You should use query(QueryRequest) or scan(ScanRequest) with addExclusiveStartKeyEntry()
Also, check this library: jcabi-dynamo

Related

java delete all items in dynamodb

I'm trying to delete all the items in my DynamoDB table, but it does not work.
try {
    ScanRequest scanRequest = new ScanRequest().withTableName(table);
    ScanResult scanResult = null;
    do {
        if (Check.nonNull(scanResult)) {
            // continue scanning from where the previous page stopped
            scanRequest.setExclusiveStartKey(scanResult.getLastEvaluatedKey());
        }
        scanResult = client.scan(scanRequest);
        scanResult.getItems().forEach((item) -> {
            String n1 = item.get("n1").toString();
            String n2 = item.get("n2").toString();
            DeleteItemSpec spec = new DeleteItemSpec().withPrimaryKey("n1", n1, "n2", n2);
            dynamodb.getTable(table).deleteItem(spec);
        });
    } while (Check.nonNull(scanResult.getLastEvaluatedKey()));
} catch (Exception e) {
    throw new BadRequestException(e);
}
n1 is my Primary partition key
n2 is my Primary sort key
The best approach to deleting all the items from DynamoDB is to drop the table and recreate it.
Otherwise, deleting item by item consumes a lot of read and write capacity units, which will cost you.
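In case it helps, here is a rough sketch of what drop-and-recreate can look like with the SDK v1 document API (dynamoDB and tableName are assumed; it copies only the key schema, attribute definitions and provisioned throughput, ignores secondary indexes, and the waitFor* calls throw InterruptedException):

Table table = dynamoDB.getTable(tableName);
TableDescription desc = table.describe();   // capture the definition before dropping

table.delete();
table.waitForDelete();

dynamoDB.createTable(new CreateTableRequest()
        .withTableName(tableName)
        .withKeySchema(desc.getKeySchema())
        .withAttributeDefinitions(desc.getAttributeDefinitions())
        .withProvisionedThroughput(new ProvisionedThroughput(
                desc.getProvisionedThroughput().getReadCapacityUnits(),
                desc.getProvisionedThroughput().getWriteCapacityUnits())))
        .waitForActive();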
PREAMBLE: While a scan operation is expensive, I needed this answer for initialising a table for a test scenario (low volume). The table was created by another process and I needed to run the test scenario against that table, so I could not delete and recreate it.
ANSWER:
given:
DynamoDbClient db
static String TABLE_NAME
static String HASH_KEY
static String SORT_KEY
ScanIterable scanIterable = db.scanPaginator(ScanRequest.builder()
        .tableName(TABLE_NAME)
        .build());
for (ScanResponse scanResponse : scanIterable) {
    for (Map<String, AttributeValue> item : scanResponse.items()) {
        Map<String, AttributeValue> deleteKey = new HashMap<>();
        deleteKey.put(HASH_KEY, item.get(HASH_KEY));
        deleteKey.put(SORT_KEY, item.get(SORT_KEY));
        db.deleteItem(DeleteItemRequest.builder()
                .tableName(TABLE_NAME)
                .key(deleteKey).build());
    }
}
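If the table is larger, the same delete can be done with fewer round trips via BatchWriteItem. This is a sketch, not part of the answer above: it reuses db, TABLE_NAME, HASH_KEY and SORT_KEY, uses the SDK v2 model classes, and for brevity does not retry unprocessedItems() as production code should.

List<WriteRequest> batch = new ArrayList<>();
for (ScanResponse scanResponse : db.scanPaginator(ScanRequest.builder().tableName(TABLE_NAME).build())) {
    for (Map<String, AttributeValue> item : scanResponse.items()) {
        Map<String, AttributeValue> key = new HashMap<>();
        key.put(HASH_KEY, item.get(HASH_KEY));
        key.put(SORT_KEY, item.get(SORT_KEY));
        batch.add(WriteRequest.builder()
                .deleteRequest(DeleteRequest.builder().key(key).build())
                .build());
        if (batch.size() == 25) {   // BatchWriteItem accepts at most 25 requests per call
            db.batchWriteItem(BatchWriteItemRequest.builder()
                    .requestItems(Collections.singletonMap(TABLE_NAME, batch))
                    .build());
            batch = new ArrayList<>();
        }
    }
}
if (!batch.isEmpty()) {
    db.batchWriteItem(BatchWriteItemRequest.builder()
            .requestItems(Collections.singletonMap(TABLE_NAME, batch))
            .build());
}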
To delete all the items from the table, first perform a scan operation over the table, which gives you a ScanOutcome. Then iterate over the outcome and delete each item by its primary key and value. This is one approach to deleting all the items from the table. I hope this code works for you. Thanks.
Table table = dynamoDB.getTable(your_table);
ItemCollection<ScanOutcome> deleteoutcome = table.scan();
Iterator<Item> iterator = deleteoutcome.iterator();
while (iterator.hasNext()) {
    // "PrimaryKey" is the key attribute name; pass its value from the scanned item
    table.deleteItem("PrimaryKey", iterator.next().get("PrimaryKey"));
}
// Maybe we can make it generic by reading the key schema first, as below
String strPartitionKey = null;
String strSortKey = null;
TableDescription description = table.describe();
List<KeySchemaElement> schema = description.getKeySchema();
for (KeySchemaElement element : schema) {
    if (element.getKeyType().equalsIgnoreCase("HASH"))
        strPartitionKey = element.getAttributeName();
    if (element.getKeyType().equalsIgnoreCase("RANGE"))
        strSortKey = element.getAttributeName();
}

ItemCollection<ScanOutcome> deleteoutcome = table.scan();
Iterator<Item> iterator = deleteoutcome.iterator();
while (iterator.hasNext()) {
    Item next = iterator.next();
    if (strSortKey == null && strPartitionKey != null)
        table.deleteItem(strPartitionKey, next.get(strPartitionKey));
    else if (strPartitionKey != null && strSortKey != null)
        table.deleteItem(strPartitionKey, next.get(strPartitionKey), strSortKey, next.get(strSortKey));
}

Streaming the result of REST API from Twitter

I'm working with Twitter's REST API using the Twitter4J libraries, particularly the https://api.twitter.com/1.1/search/tweets.json endpoint. I am aware of Twitter's own Streaming API, but I don't want to use that (at least for now). I have a method that queries the /search/tweets endpoint in a do-while loop, but I want the method to return results in a streaming fashion, so that I can print them to the console as they arrive instead of loading everything all at once. Here's the method:
public List<Status> readTweets(String inputQuery) {
    List<Status> tweets = new ArrayList<Status>();
    int counter = 0;
    try {
        RateLimitStatus rateLimit = twitter.getRateLimitStatus().get("/search/tweets");
        int limit = rateLimit.getLimit();
        Query query = new Query(inputQuery);
        QueryResult result;
        do {
            result = twitter.search(query);
            tweets.addAll(result.getTweets());
            counter++;
        } while ((query = result.nextQuery()) != null && counter < (limit - 1));
    } catch (TwitterException e) {
        e.printStackTrace();
        System.out.println("Failed to search tweets: " + e.getMessage());
        tweets = null;
    }
    return tweets;
}
What can you suggest?
P.S. I don't want to put the console printing functionality inside this method.
Thanks.
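One possible direction, sketched here rather than taken from the thread: keep the same paging loop but accept a java.util.function.Consumer<Status>, so every page of results is handed to the caller the moment it arrives while the console printing stays outside the method. The Consumer parameter and the usage line are illustrative, not part of the original code.

public void readTweets(String inputQuery, Consumer<Status> onTweet) {
    try {
        RateLimitStatus rateLimit = twitter.getRateLimitStatus().get("/search/tweets");
        int limit = rateLimit.getLimit();
        int counter = 0;
        Query query = new Query(inputQuery);
        QueryResult result;
        do {
            result = twitter.search(query);
            result.getTweets().forEach(onTweet);   // each page reaches the caller immediately
            counter++;
        } while ((query = result.nextQuery()) != null && counter < (limit - 1));
    } catch (TwitterException e) {
        System.out.println("Failed to search tweets: " + e.getMessage());
    }
}

// usage, printing as results stream in:
// readTweets(inputQuery, status -> System.out.println(status.getText()));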

Solr Performance for many documents query

I want to have Solr always retrieve all documents found by a search (I know Solr wasn't built for that, but anyway), and I am currently doing it with this code:
...
QueryResponse response = solr.query(query);
int offset = 0;
int totalResults = (int) response.getResults().getNumFound();
List<Article> ret = new ArrayList<Article>(totalResults);
query.setRows(FETCH_SIZE);
while (offset < totalResults) {
    //requires an int? wtf?
    query.setStart((int) offset);
    int left = totalResults - offset;
    if (left < FETCH_SIZE) {
        query.setRows(left);
    }
    response = solr.query(query);
    List<Article> current = response.getBeans(Article.class);
    offset += current.size();
    ret.addAll(current);
}
...
This works, but it is pretty slow if a query gets over 1000 hits (I've read about that on here; it is caused by Solr because setting the start each time takes, for some reason, some time). What would be a nicer (and faster) way to do this?
To improve on the suggested answer you could use a streamed response. This was added especially for the case where one fetches all results. As you can see in Solr's JIRA, that issue is about exactly the same use case as yours. It has been implemented for Solr 4.
This is also described in SolrJ's javadoc.
Normally, Solr packs the response into a whole XML/JSON document before it starts sending it, and your client then has to unpack all of that and offer it to you as a list. By using streaming and parallel processing, which such a queued approach allows, performance should improve further.
Yes, you lose the automatic bean mapping, but as performance is a factor here, I think that is acceptable.
Here is a sample unit test:
public class StreamingTest {

    @Test
    public void streaming() throws SolrServerException, IOException, InterruptedException {
        HttpSolrServer server = new HttpSolrServer("http://your-server");
        SolrQuery tmpQuery = new SolrQuery("your query");
        tmpQuery.setRows(Integer.MAX_VALUE);
        final BlockingQueue<SolrDocument> tmpQueue = new LinkedBlockingQueue<SolrDocument>();
        server.queryAndStreamResponse(tmpQuery, new MyCallbackHandler(tmpQueue));
        SolrDocument tmpDoc;
        do {
            tmpDoc = tmpQueue.take();
        } while (!(tmpDoc instanceof PoisonDoc));
    }

    private class PoisonDoc extends SolrDocument {
        // marker to finish queuing
    }

    private class MyCallbackHandler extends StreamingResponseCallback {
        private BlockingQueue<SolrDocument> queue;
        private long currentPosition;
        private long numFound;

        public MyCallbackHandler(BlockingQueue<SolrDocument> aQueue) {
            queue = aQueue;
        }

        @Override
        public void streamDocListInfo(long aNumFound, long aStart, Float aMaxScore) {
            // called before streaming starts
            // probably useful for some statistics
            currentPosition = aStart;
            numFound = aNumFound;
            if (numFound == 0) {
                queue.add(new PoisonDoc());
            }
        }

        @Override
        public void streamSolrDocument(SolrDocument aDoc) {
            currentPosition++;
            System.out.println("adding doc " + currentPosition + " of " + numFound);
            queue.add(aDoc);
            if (currentPosition == numFound) {
                queue.add(new PoisonDoc());
            }
        }
    }
}
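To actually get the parallel processing mentioned above, the query can run on its own thread while the current thread drains the queue as documents arrive. Here is a rough sketch reusing server, tmpQuery, tmpQueue, MyCallbackHandler and PoisonDoc from the test; process(doc) is a placeholder for your per-document work, and the snippet assumes it sits in a method that, like the test, declares InterruptedException.

Thread producer = new Thread(() -> {
    try {
        server.queryAndStreamResponse(tmpQuery, new MyCallbackHandler(tmpQueue));
    } catch (SolrServerException | IOException e) {
        tmpQueue.add(new PoisonDoc());   // make sure the consumer below still terminates
        throw new RuntimeException(e);
    }
});
producer.start();

SolrDocument tmpDoc;
while (!((tmpDoc = tmpQueue.take()) instanceof PoisonDoc)) {
    process(tmpDoc);   // per-document work now overlaps with the download
}
producer.join();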
You might improve performance by increasing FETCH_SIZE. Since you are getting all the results, pagination doesn't make sense unless you are concerned with memory or some such. If 1000 results are liable to cause a memory overflow, I'd say your current performance seems pretty outstanding though.
So I would try getting everything at once, simplifying this to something like:
//WHOLE_BUNCHES is a constant representing a reasonable max number of docs we want to pull here.
//Integer.MAX_VALUE would probably invite an OutOfMemoryError, but that would be true of the
//implementation in the question anyway, since they were still being stored in the list at the end.
query.setRows(WHOLE_BUNCHES);
QueryResponse response = solr.query(query);
int totalResults = (int) response.getResults().getNumFound(); //If you even still need this figure.
List<Article> ret = response.getBeans(Article.class);
If you need to keep the pagination though:
You are performing this first query:
QueryResponse response = solr.query(query);
and are populating the number of found results from it, but you are not pulling any results with the response. Even if you keep pagination here, you could at least eliminate one extra query here.
This:
int left = totalResults - offset;
if (left < FETCH_SIZE) {
    query.setRows(left);
}
is unnecessary: setRows specifies a maximum number of rows to return, so asking for more than are available won't cause any problems.
Finally, apropos of nothing, but I have to ask: what argument would you expect setStart to take if not an int?
Use the logic below to fetch Solr data in batches and optimize the performance of the fetch query:
public List<Map<String, Object>> getData(int id, Set<String> fields) throws SolrServerException {
    final int SOLR_QUERY_MAX_ROWS = 3;
    long start = System.currentTimeMillis();
    SolrQuery query = new SolrQuery();
    String queryStr = "id:" + id;
    LOG.info(queryStr);
    query.setQuery(queryStr);
    query.setRows(SOLR_QUERY_MAX_ROWS);
    QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
    List<Map<String, Object>> mapList = null;
    if (rsp != null) {
        long total = rsp.getResults().getNumFound();
        System.out.println("Total count found: " + total);
        // fetch the results in batches of SOLR_QUERY_MAX_ROWS
        mapList = new ArrayList<Map<String, Object>>();
        addAllData(mapList, rsp, fields);          // the first batch is already in rsp
        int marker = SOLR_QUERY_MAX_ROWS;
        while (marker < total) {
            query.setStart(marker);
            rsp = server.query(query, SolrRequest.METHOD.POST);
            if (rsp != null) {
                addAllData(mapList, rsp, fields);
            }
            marker = marker + SOLR_QUERY_MAX_ROWS;
        }
    }
    long end = System.currentTimeMillis();
    LOG.debug("SOLR Performance: getData: " + (end - start));
    return mapList;
}

private void addAllData(List<Map<String, Object>> mapList, QueryResponse rsp, Set<String> fields) {
    for (SolrDocument sdoc : rsp.getResults()) {
        Map<String, Object> map = new HashMap<String, Object>();
        for (String field : fields) {
            map.put(field, sdoc.getFieldValue(field));
        }
        mapList.add(map);
    }
}
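If the Solr version is 4.7 or newer, cursor-based paging avoids the deep-paging cost of setStart that the earlier answers complain about: each page is as cheap as the first. This is a sketch, not part of the answer above; it assumes the schema's uniqueKey field is called "id" and reuses server, query, mapList, fields, SOLR_QUERY_MAX_ROWS and addAllData from the code above.

query.setRows(SOLR_QUERY_MAX_ROWS);
query.setSort(SolrQuery.SortClause.asc("id"));        // cursorMark requires a sort on the uniqueKey
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    // do not call setStart when using cursorMark; the cursor replaces it
    query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
    addAllData(mapList, rsp, fields);
    String nextCursorMark = rsp.getNextCursorMark();
    done = cursorMark.equals(nextCursorMark);         // no progress means everything has been read
    cursorMark = nextCursorMark;
}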

Paging in cassandra through hector API for user defined queries

Is it possible to achieve paging in Cassandra through the Hector API for user-defined queries?
If yes, how?
I have added a basic method; the rest you have to handle yourself. Here, as you can see, we have defined a page size of 100 rows, each having 10 columns. After the first iteration you have to somehow store the last key value, which will be the starting point for the next iteration.
int row_count = 100;

RangeSlicesQuery<UUID, String, Long> rangeSlicesQuery = HFactory
        .createRangeSlicesQuery(keyspace, UUIDSerializer.get(), StringSerializer.get(), LongSerializer.get())
        .setColumnFamily("Column Family")
        .setRange(null, null, false, 10)
        .setRowCount(row_count);

UUID last_key = null;

while (true) {
    rangeSlicesQuery.setKeys(last_key, null);
    System.out.println(" > " + last_key);

    QueryResult<OrderedRows<UUID, String, Long>> result = rangeSlicesQuery.execute();
    OrderedRows<UUID, String, Long> rows = result.get();
    Iterator<Row<UUID, String, Long>> rowsIterator = rows.iterator();

    // the start key is inclusive, so skip the first row on every page after the first
    if (last_key != null && rowsIterator != null) rowsIterator.next();

    while (rowsIterator.hasNext()) {
        Row<UUID, String, Long> row = rowsIterator.next();
        last_key = row.getKey();

        if (row.getColumnSlice().getColumns().isEmpty()) {
            continue;   // skip range ghosts (deleted rows)
        }
        // process the row here
    }

    if (rows.getCount() < row_count) {
        break;   // fewer rows than the page size means this was the last page
    }
}

Get All Result in Solr with Solrj

I want to get all results with SolrJ. When I add 10 documents to Solr I don't get any exception, but if I add more than 10 documents I get an exception. I looked into it: in http://localhost:8983/solr/browse the first 10 documents are on the first page and the 11th document goes to the second page. How can I get all results?
String qry = "*:*";
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp = server.query(new SolrQuery(qry));
SolrDocumentList docs = rsp.getResults();
for (int i = 0; i < docs.getNumFound(); i++) {
    System.out.println(docs.get(i));
}
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
// query and server are assumed to be set up as in the question
Integer start = 0;
query.setStart(start);
QueryResponse response = server.query(query);
SolrDocumentList rs = response.getResults();
long numFound = rs.getNumFound();
int current = 0;
while (current < numFound) {
    ListIterator<SolrDocument> iter = rs.listIterator();
    while (iter.hasNext()) {
        current++;
        System.out.println("************************************************************** " + current + " " + numFound);
        SolrDocument doc = iter.next();
        Map<String, Collection<Object>> values = doc.getFieldValuesMap();
        Iterator<String> names = doc.getFieldNames().iterator();
        while (names.hasNext()) {
            String name = names.next();
            System.out.print(name);
            System.out.print(" = ");
            Collection<Object> vals = values.get(name);
            Iterator<Object> valsIter = vals.iterator();
            while (valsIter.hasNext()) {
                Object obj = valsIter.next();
                System.out.println(obj.toString());
            }
        }
    }
    query.setStart(current);
    response = server.query(query);
    rs = response.getResults();
    numFound = rs.getNumFound();
}
An easier way:
CloudSolrServer server = new CloudSolrServer(solrZKServerUrl);
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE);
QueryResponse rsp;
rsp = server.query(query, METHOD.POST);
SolrDocumentList docs = rsp.getResults();
for (SolrDocument doc : docs) {
    Collection<String> fieldNames = doc.getFieldNames();
    for (String s : fieldNames) {
        System.out.println(doc.getFieldValue(s));
    }
}
numFound gives you the total number of results that matched the Query.
However, by default Solr will return only the top 10 results, which is controlled by the rows parameter.
You are trying to iterate up to numFound; however, since only 10 results are returned, it fails.
You should use the rows parameter for the iteration.
For getting the next set of results, you need to re-query Solr with a different start parameter. This supports pagination, so that you don't have to pull all the results in one go, which is a very heavy operation.
If you refactor your code like this, it will work:
String qry = "*:*";
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE); // add me to avoid the IndexOutOfBoundsException
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
for (int i = 0; i < docs.getNumFound(); i++) {
    System.out.println(docs.get(i));
}
The answer to the "why" is quite simple.
The response is telling you that there are getNumFound() matching documents, but if you do not specify in your query how many of them the response must carry, this limit is automatically set to 10, so you end up fetching only the top 10 documents out of the getNumFound() documents found.
For this reason the docs list will have just 10 elements, and trying to get the i-th element with i > 9 (e.g. 10) will give you a
java.lang.IndexOutOfBoundsException
just like you are experiencing.
P.S. I suggest you use the for-each loop just like @Chen Sheng-Lun did.
P.P.S. At first this drove me crazy too.
