I'm fairly new to Amazon's AWS and its API for Java, so I'm not exactly sure what the most efficient method for what I'm trying to do would be. Basically, I'm trying to set up a database that will store a project's ID, its status, and the bucket and location when it is uploaded to an S3 bucket by a user. What I'm having trouble with is getting a list of all the project IDs that have a status of "ready" under the status attribute. Any projects with status "ready" need to have their ID numbers loaded into an array or ArrayList for later reference. Any recommendations?
The way to do this is to use the Scan API. However, this means DynamoDB will need to look at every item in your table and check whether its "status" attribute equals "ready". This operation is expensive: you are charged read capacity for every item in your table, not just the matching ones.
The code would look something like this:
// Filter condition: status == "ready"
Condition scanFilterCondition = new Condition()
    .withComparisonOperator(ComparisonOperator.EQ.toString())
    .withAttributeValueList(new AttributeValue().withS("ready"));

Map<String, Condition> conditions = new HashMap<String, Condition>();
conditions.put("status", scanFilterCondition);

// Scan the whole table, returning only items that pass the filter
ScanRequest scanRequest = new ScanRequest()
    .withTableName("MasterProductTable")
    .withScanFilter(conditions);

// Note: results are paginated; use result.getLastEvaluatedKey() to continue scanning
ScanResult result = client.scan(scanRequest);
There is a way to make this better, though it requires denormalizing your data. Try keeping a second table with a hash key of "status", and a range key of "project ID". This is in addition to your existing table. This would allow you to use the Query API (scan's much cheaper cousin), and ask it for all items with a hash key of "ready". This will get you a list of the project IDs you need, and you can then get them from the project ID table you already have.
The code for this would look something like:
// Query the denormalized table for every project whose status is "ready"
QueryRequest queryRequest = new QueryRequest()
    .withTableName("ProductByStatus")
    .withHashKeyValue(new AttributeValue().withS("ready"));
QueryResult result = client.query(queryRequest);
The downside to this approach is that you have to update two tables whenever you update the status field, and you have to keep them in sync. DynamoDB doesn't offer transactions, so you have to be ready for the case where the update to the master project table succeeds but the update to the secondary status table doesn't, or vice versa.
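If it helps, here is a minimal sketch of that dual write using the low-level client; the attribute names "project_id" and "status" and the variable projectId are assumptions for illustration, not from the original post:

// Hypothetical sketch: write the project to the master table and mirror its status into the
// lookup table. There is no transaction, so if the second write fails you must retry/repair.
Map<String, AttributeValue> masterItem = new HashMap<String, AttributeValue>();
masterItem.put("project_id", new AttributeValue().withS(projectId)); // hash key of the master table
masterItem.put("status", new AttributeValue().withS("ready"));
client.putItem(new PutItemRequest()
    .withTableName("MasterProductTable")
    .withItem(masterItem));

Map<String, AttributeValue> statusItem = new HashMap<String, AttributeValue>();
statusItem.put("status", new AttributeValue().withS("ready"));        // hash key of the lookup table
statusItem.put("project_id", new AttributeValue().withS(projectId));  // range key of the lookup table
client.putItem(new PutItemRequest()
    .withTableName("ProductByStatus")
    .withItem(statusItem));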
For further reference: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
I am using Datastore in Firestore mode for my Google App Engine app. I know how to store list/array property values in the Google Cloud Datastore. But how do I update these values (e.g., add new values to the list)? I could not find an example in the documentation.
This is how you would add a list property to the datastore initially:
// Create a new User entity with an (initially empty) list property
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Entity user = new Entity("User");
List<String> items = new ArrayList<String>();
user.setProperty("ItemsList", items);
datastore.put(user);
But what do I do if later on, I want to access an User entity's list of items and add an item to that list?
Thanks for the clarification. Now I understand that you want to be able to just add things to that list instead of overwriting the whole list.
Reading the documentation for Datastore, I can see that you can't update just a single property in place.
To update an existing entity, modify the properties of the entity
previously retrieved and store it using the key
In your case you would do something like: retrieve the entity, append the new element to (or modify) the list, and then write the whole entity back. With the Cloud Datastore client library (where userKey is the entity's key and items is the updated list of values), that looks like:
Entity task = Entity.newBuilder(datastore.get(userKey)).set("ItemsList", items).build();
datastore.update(task);
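Since your snippet uses the App Engine low-level API rather than the Cloud Datastore client, here is a rough equivalent read-modify-write in that API. This is only a sketch: userKey is assumed to be the Key of the entity created by the original put(), and newItem is a hypothetical value to append.

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import java.util.ArrayList;
import java.util.List;

public void addItem(Key userKey, String newItem) throws EntityNotFoundException {
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    Entity user = datastore.get(userKey);            // re-fetch the stored entity by its key

    @SuppressWarnings("unchecked")
    List<String> items = (List<String>) user.getProperty("ItemsList");
    if (items == null) {
        items = new ArrayList<String>();             // an empty list property is stored as null
    }
    items.add(newItem);                              // append the new value

    user.setProperty("ItemsList", items);            // overwrite the whole list property
    datastore.put(user);                             // write the entity back
}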
I have to check for changes in an old embedded DBF database which is populated by an old third-party application. I don't have access to source code of that application and cannot put trigger or whatever on the database. For business constraint I cannot change that...
My objective is to capture new records, deleted records and modified records from a table (~1500 records) of that database with a Java application for further processing. The database is accessible in my Spring application through JPA/Hibernate with the HXTT DBF driver.
I am looking now for a way to efficiently capture changes made by the third-party app in the database.
Do I have to periodically read the whole table and check if each record is still unchanged or to apply any kind of diff within two readings? Is there a kind of "trigger" I can set in my Java app? How to listen properly for those changes?
There is no JPA mechanism for getting callbacks from a database when the data changes.
The only option is to build your own change detection. Typically you would start by detecting which entities were added, which were removed, and which still exist. For the ones that still exist you will need to check whether they have changed, so the entity needs an equals() method.
An entity is identified by its primary key, so you will need the set of all primary keys. Once you have that, you can easily use Guava's Sets methods to produce the three sets of added, removed, and existing (before and now), like this:
import com.google.common.collect.Sets;
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

List<MyEntity> old = new ArrayList<>();     // snapshot loaded from the DB last time
List<MyEntity> current = new ArrayList<>(); // snapshot loaded from the DB now

Map<Long, MyEntity> oldMap = old.stream()
    .collect(Collectors.toMap(MyEntity::getId, Function.<MyEntity>identity()));
Map<Long, MyEntity> currentMap = current.stream()
    .collect(Collectors.toMap(MyEntity::getId, Function.<MyEntity>identity()));

Set<Long> oldKeys = oldMap.keySet();
Set<Long> currentKeys = currentMap.keySet();

Sets.SetView<Long> deletedKeys = Sets.difference(oldKeys, currentKeys);
Sets.SetView<Long> addedKeys = Sets.difference(currentKeys, oldKeys);
Sets.SetView<Long> couldBeChanged = Sets.intersection(oldKeys, currentKeys);

for (Long id : couldBeChanged) {
    if (!oldMap.get(id).equals(currentMap.get(id))) {
        // entity with this id was changed
    }
}
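Since the comparison relies on equals(), the entity needs a value-based equals()/hashCode(). A minimal sketch, assuming a hypothetical MyEntity with id, name and status fields, might look like:

import java.util.Objects;

public class MyEntity {
    private Long id;
    private String name;
    private String status;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyEntity)) return false;
        MyEntity other = (MyEntity) o;
        // compare every column you care about, not just the primary key
        return Objects.equals(id, other.id)
            && Objects.equals(name, other.name)
            && Objects.equals(status, other.status);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name, status);
    }
}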
I'm new to DynamoDB and I'm struggling to work out how to do this (using the Java SDK).
I currently have a table (in Mongo) for notifications. The schema is basically as follows (I've simplified it):
id: string
notifiedUsers: [123, 345, 456, 567]
message: "this is a message"
created: 12345678000 (epoch millis)
I want to migrate to DynamoDB, but I can't work out the best way to select all notifications that went to a particular user after a certain date.
I gather I can't have an index on a list like notifiedUsers, therefore I can't use a query in this case - is that correct?
I'd prefer not to scan and then filter, there could be a lot of records.
Is there a way to do this using a query or another approach?
EDIT
This is what I'm trying now, it's not working and I'm not sure where to take it (if anywhere).
Condition rangeKeyCondition = new Condition()
.withComparisonOperator(ComparisonOperator.CONTAINS.toString())
.withAttributeValueList(new AttributeValue().withS(userId));
if(startTimestamp != null) {
rangeKeyCondition = rangeKeyCondition.withComparisonOperator(ComparisonOperator.GT.toString())
.withAttributeValueList(new AttributeValue().withS(startTimestamp));
}
NotificationFeedDynamoRecord replyKey = new NotificationFeedDynamoRecord();
replyKey.setId(partitionKey);
DynamoDBQueryExpression<NotificationFeedDynamoRecord> queryExpression = new DynamoDBQueryExpression<NotificationFeedDynamoRecord>()
.withHashKeyValues(replyKey)
.withRangeKeyCondition(NOTIFICATIONS, rangeKeyCondition);
In case anyone else comes across this question: in the end we flattened the schema, so that there is now a record per userId. This has led to problems because it's not possible with DynamoDB to atomically batch write records. With the original schema we had one record, and could write it atomically, ensuring that all users got that notification. Now we cannot be certain, and this is causing pain.
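For anyone following the same route, here is a minimal sketch of what the flattened layout can look like with the Document API. The table name "NotificationsByUser" and the attribute names "userId"/"created" are assumptions for illustration, not from the original post.

// Hypothetical sketch of the flattened layout: one item per (userId, created) pair, with
// userId as the hash key and created (epoch millis) as the range key, so the original
// question ("all notifications for a user after a date") becomes a plain Query.
DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient(new ProfileCredentialsProvider()));
Table table = dynamoDB.getTable("NotificationsByUser");   // assumed table name

QuerySpec spec = new QuerySpec()
    .withKeyConditionExpression("userId = :u and created > :since")
    .withValueMap(new ValueMap()
        .withString(":u", "123")
        .withLong(":since", 12345678000L));

for (Item item : table.query(spec)) {
    System.out.println(item.getString("message"));
}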
I am trying to use the new Amazon DynamoDB JSON API to add/overwrite key-value pairs in a JSON attribute called "document". Ideally, I would simply like to structure my write calls to send the KV pairs to add to the attribute, and have DynamoDB create the attribute if it does not already exist for the given primary key. However, if I try this with just a straightforward UpdateItemSpec:
PrimaryKey primaryKey = new PrimaryKey("key_str", "mapKey");
ValueMap valuesMap = new ValueMap().withLong(":a", 1234L).withLong(":b", 1234L);
// SET values at paths inside the "document" map attribute
UpdateItemSpec updateSpec = new UpdateItemSpec()
    .withPrimaryKey(primaryKey)
    .withUpdateExpression("SET document.value1 = :a, document.value2 = :b");
updateSpec.withValueMap(valuesMap);
table.updateItem(updateSpec);
I get com.amazonaws.AmazonServiceException: The document path provided in the update expression is invalid for update, meaning DynamoDB could not find the given attribute named "document" to which to apply the update.
I managed to approximate this functionality with the following series of calls:
try {
// 1. Attempt UpdateItemSpec as if attribute already exists
} catch (AmazonServiceException e) {
// 2. Confirm the exception indicated the attribute was not present, otherwise rethrow it
// 3. Use a put-if-absent request to initialize an empty JSON map at the attribute "document"
// 4. Rerun the UpdateItemSpec call from the above try block
}
This works, but is less than ideal as it will require 3 calls to DynamoDB every time I add a new primary key to the table. I experimented a bit with the attribute_not_exists function that can be used in Update Expressions, but wasn't able to get it to work in the way I want.
Any Dynamo gurus out there have any ideas on how/whether this can be done?
I received an answer from Amazon Support that it is not actually possible to accomplish this with a single call. They did suggest reducing the number of calls when adding the attribute for a new primary key from 3 to 2, by using the desired JSON map in the put-if-absent request rather than an empty map.
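A rough sketch of that two-call pattern is below. It is only an illustration: the attribute and key names follow the snippet above, and the exception handling is deliberately simplified.

PrimaryKey primaryKey = new PrimaryKey("key_str", "mapKey");
ValueMap valuesMap = new ValueMap().withLong(":a", 1234L).withLong(":b", 1234L);
UpdateItemSpec setPaths = new UpdateItemSpec()
    .withPrimaryKey(primaryKey)
    .withUpdateExpression("SET document.value1 = :a, document.value2 = :b")
    .withValueMap(valuesMap);

try {
    // Call 1: assume the "document" map already exists
    table.updateItem(setPaths);
} catch (AmazonServiceException e) {
    // Call 2: the map was missing -- create it with the desired contents in one shot,
    // guarded so a map created by a concurrent writer is not overwritten
    Map<String, Object> desired = new HashMap<>();
    desired.put("value1", 1234L);
    desired.put("value2", 1234L);
    table.updateItem(new UpdateItemSpec()
        .withPrimaryKey(primaryKey)
        .withUpdateExpression("SET document = :doc")
        .withConditionExpression("attribute_not_exists(document)")
        .withValueMap(new ValueMap().withMap(":doc", desired)));
}

If the conditional write itself fails because another writer created the map first, you would fall back to retrying the first update.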
I'm trying to query a DynamoDB table using a global secondary index and I'm getting java.lang.IllegalArgumentException: Illegal query expression: No hash key condition is found in the query. All I'm trying to do is get all items that have a timestamp greater than a value, without considering the key. The timestamp is not part of the hash or range key, so I created a global secondary index for it.
Does anyone have a clue what I might be missing?
Table Definition:
{
    AttributeDefinitions:[
        {
            AttributeName:timestamp,
            AttributeType:N
        },
        {
            AttributeName:url,
            AttributeType:S
        }
    ],
    TableName:SitePageIndexed,
    KeySchema:[
        {
            AttributeName:url,
            KeyType:HASH
        }
    ],
    TableStatus:ACTIVE,
    CreationDateTime: Mon May 12 18:45:57 EDT 2014,
    ProvisionedThroughput:{
        NumberOfDecreasesToday:0,
        ReadCapacityUnits:8,
        WriteCapacityUnits:4
    },
    TableSizeBytes:0,
    ItemCount:0,
    GlobalSecondaryIndexes:[
        {
            IndexName:TimestampIndex,
            KeySchema:[
                {
                    AttributeName:timestamp,
                    KeyType:HASH
                }
            ],
            Projection:{
                ProjectionType:ALL
            },
            IndexStatus:ACTIVE,
            ProvisionedThroughput:{
                NumberOfDecreasesToday:0,
                ReadCapacityUnits:8,
                WriteCapacityUnits:4
            },
            IndexSizeBytes:0,
            ItemCount:0
        }
    ]
}
Code
Condition condition1 = new Condition()
    .withComparisonOperator(ComparisonOperator.GE)
    .withAttributeValueList(new AttributeValue().withN(Long.toString(start)));

DynamoDBQueryExpression<SitePageIndexed> exp = new DynamoDBQueryExpression<SitePageIndexed>()
    .withRangeKeyCondition("timestamp", condition1);
exp.setScanIndexForward(true);
exp.setLimit(100);
exp.setIndexName("TimestampIndex");

PaginatedQueryList<SitePageIndexed> queryList = client.query(SitePageIndexed.class, exp);
All I'm trying to do is to get all items that have a timestamp greater than a value without considering the key.
This is not how Global Secondary Indexes (GSI) on Amazon DynamoDB work. To query a GSI you must specify a value for its hash key and then you may filter/sort by the range key -- just like you'd do with the primary key. This is exactly what the exception is trying to tell you, and also what you will find on the documentation page for the Query API:
A Query operation directly accesses items from a table using the table primary key, or from an index using the index key. You must provide a specific hash key value.
Think of a GSI as just another key that behaves almost exactly like the primary key (the main differences being that it is updated asynchronously, and you can only perform eventually consistent reads on GSIs).
Please refer to the Amazon DynamoDB Global Secondary Index documentation page for guidelines and best practices when creating GSIs: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
One possible way to achieve what you want would be to have a dummy attribute constrained to a finite, small set of possible values, and create a GSI with its hash key on that dummy attribute and its range key on your timestamp. When querying, you would issue one Query API call for each possible value of the dummy hash key attribute and consolidate the results in your application. If you constrain the dummy attribute to a single constant value, one Query API call returns your result set directly, but keep in mind that this creates a hot partition and can cause performance issues. Again, refer to the document linked above for best practices and patterns.
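Here is a rough sketch of the single-constant-value variant using the low-level client. The attribute name "dummyKey", the constant "ALL", and the index name "DummyTimestampIndex" are assumptions for illustration only.

// Hypothetical sketch: the GSI "DummyTimestampIndex" has hash key "dummyKey" (always "ALL")
// and range key "timestamp", so one Query returns every item newer than the given time.
Map<String, AttributeValue> values = new HashMap<String, AttributeValue>();
values.put(":d", new AttributeValue().withS("ALL"));
values.put(":start", new AttributeValue().withN(Long.toString(start)));

Map<String, String> names = new HashMap<String, String>();
names.put("#ts", "timestamp");   // "timestamp" is a DynamoDB reserved word, so alias it

QueryRequest queryRequest = new QueryRequest()
    .withTableName("SitePageIndexed")
    .withIndexName("DummyTimestampIndex")
    .withKeyConditionExpression("dummyKey = :d and #ts > :start")
    .withExpressionAttributeNames(names)
    .withExpressionAttributeValues(values);

QueryResult result = client.query(queryRequest);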
It is possible to query DynamoDB using only the GSI; this can be confirmed in the web console by querying against the index. Programmatically, it is done as follows:
DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient(
        new ProfileCredentialsProvider()));

Table table = dynamoDB.getTable("WeatherData");
Index index = table.getIndex("PrecipIndex");

QuerySpec spec = new QuerySpec()
    .withKeyConditionExpression("#d = :v_date and Precipitation = :v_precip")
    .withNameMap(new NameMap()
        .with("#d", "Date"))
    .withValueMap(new ValueMap()
        .withString(":v_date", "2013-08-10")
        .withNumber(":v_precip", 0));

ItemCollection<QueryOutcome> items = index.query(spec);

Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
    System.out.println(iter.next().toJSONPretty());
}
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSIJavaDocumentAPI.html#GSIJavaDocumentAPI.QueryAnIndex
For doing it with DynamoDBMapper see: How to query a Dynamo DB having a GSI with only hashKeys using DynamoDBMapper
Here is how you can query in Java using only a GSI:
Map<String, AttributeValue> eav = new HashMap<String, AttributeValue>();
eav.put(":val1", new AttributeValue().withS("PROCESSED"));

DynamoDBQueryExpression<Package> queryExpression = new DynamoDBQueryExpression<Package>()
    .withIndexName("<your global secondary index name>")
    .withKeyConditionExpression("your_gsi_column_name = :val1")
    .withExpressionAttributeValues(eav)
    .withConsistentRead(false)   // GSIs only support eventually consistent reads
    .withLimit(2);

QueryResultPage<Package> queryPage = dbMapper.queryPage(Package.class, queryExpression);
While this is not the correct answer per se, could you possibly accomplish this with a scan instead of a query? It's much more expensive, but could be a solution.