How to use begins_with in DynamoDBMapper BatchLoad - java

I am trying to perform a batch get on DynamoDB using DynamoDBMapper.batchLoad() on a table with a composite primary key, where I know the set of hash key values but not the range key values. All I know about the range keys is the character sequence they start with: if the prefix is "test", the range key value will be something like "test1243".
DynamoDB supports a begins_with clause for this, but only on the Query operation. How can I use the same begins_with clause in a BatchGet operation?

You can only use the begins_with operator with queries. When you call GetItem or BatchGetItem you must specify the whole primary key (partition key + sort key if present) of the items you wish to retrieve so the begins_with operator is not useful.
You should just run queries in parallel, one for each of the hash keys you need to get the records for.
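A minimal sketch of that fan-out, using only the JDK. The class name ParallelPrefixQuery and the queryOneKey function are hypothetical: queryOneKey stands in for whatever single-partition Query you run (for example a DynamoDBMapper.query call with a begins_with condition on the range key), and is not part of the AWS SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

/** Fans out one prefix query per hash key and merges the results. */
final class ParallelPrefixQuery {

    /**
     * @param hashKeys    the known hash key values
     * @param prefix      the known start of the range key, e.g. "test"
     * @param queryOneKey hypothetical stand-in for one DynamoDB Query of the form
     *                    "hashKey = :h AND begins_with(rangeKey, :prefix)"
     */
    static <T> List<T> queryAll(List<String> hashKeys, String prefix,
                                BiFunction<String, String, List<T>> queryOneKey) {
        // Start every query before waiting on any of them.
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (String hashKey : hashKeys) {
            futures.add(CompletableFuture.supplyAsync(() -> queryOneKey.apply(hashKey, prefix)));
        }
        // join() blocks until each query is done; results follow the order of hashKeys.
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            merged.addAll(future.join());
        }
        return merged;
    }
}
```

supplyAsync runs on the common fork-join pool here. For many hash keys or slow calls you would probably pass your own executor so you control how many queries are in flight at once.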

Related

Query max attribute value in DynamoDB with Java

Given my DynamoDB has a column of 'BlockNumber', how do I write the Java QuerySpec to find the MAX block number in the DB? (It is configured as a GSI.)
Typically, your GSI would have a partition key and a sort key, just like a regular DynamoDB table. You would issue a query against a known partition key and set ScanIndexForward=false and Limit=1, so it would return one item only, and it would be the item with a matching partition key and the maximum value of the sort key. When ScanIndexForward is false, DynamoDB reads the items in reverse order by sort key value.
If the data is immutable, the best option is to have a separate record that holds aggregate values. Whenever you add an item that may change the max value, you update the aggregate record. The best way to do this is to use DynamoDB Streams to perform the updates to the aggregate record. See "Using Global Secondary Indexes for Materialized Aggregation Queries".

How do I query for partition keys that contain a specific substring in DynamoDB?

I have a partition key that is made up of 2 strings, e.g. userId:UserName (1234:John, 4567:Mark, etc.). I want to query for all the records that match the substring defined by UserName, e.g. find all the records that contain "Mark" in the partition key. How do I do this using the DynamoDB APIs in Java?
Hopefully this is not something that you have to do frequently.
DynamoDB does not support querying by partial hash-key. You would have to use a table scan to iterate over all elements in the table and compare each one for matches.
This is highly inefficient, and if you find yourself depending on this type of behavior then you should revisit your choice of hash key and your overall design.
For the sake of completeness, the code you're looking for is along the following lines if you're using the low-level API (client is an AmazonDynamoDB instance):
import java.util.Map;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
// dynamo returns results in chunks - you'll need this to get the next one
Map<String, AttributeValue> lastKeyEvaluated = null;
do {
    ScanRequest scanRequest = new ScanRequest()
        .withTableName("YourTableNameHere")
        .withExclusiveStartKey(lastKeyEvaluated);
    ScanResult result = client.scan(scanRequest);
    for (Map<String, AttributeValue> item : result.getItems()) {
        // for each item in the result set, examine the partition key;
        // keys look like "userId:UserName", so match on the ":Mark" suffix
        String key = item.get("YourPartitionKeyAttributeNameHere").getS();
        if (key.endsWith(":Mark"))
            System.out.println("Found an item that matches *:Mark:\n" + item);
    }
    lastKeyEvaluated = result.getLastEvaluatedKey();
} while (lastKeyEvaluated != null);
But before you implement something like this in your application consider choosing a different partition key strategy, or creating a secondary index for your table, or both - if you need to make this type of query often!
As a side note, I'm curious, what benefit do you get by including both user id and user name in the partition key? The user id would, presumably, be unique for you so why the user name?
You can't do this as you've described in a cost efficient manner. You'll need to scan the table, which is expensive and time consuming.
Revisit your choice of key so you are always running queries against full key values instead of substrings.
You might want to consider using a range key - when including a range key, queries can be efficiently run against either just the hash key (returning potentially multiple values), or the combination of hash key/range key (which must be unique).
In this example, if you're always querying on either userId:userName or userName (but not userId by itself), then using userName as hash key and userId as range key is a simple and efficient solution.
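If you go that way, existing "userId:userName" keys have to be split into the two new key attributes. A small sketch, assuming the first ':' is the separator and that user ids never contain one. The UserKey class is hypothetical and not part of any SDK.

```java
/** Splits an existing "userId:userName" partition key into the two proposed key parts. */
final class UserKey {
    final String userName; // proposed hash key
    final String userId;   // proposed range key

    private UserKey(String userName, String userId) {
        this.userName = userName;
        this.userId = userId;
    }

    /** Parses e.g. "4567:Mark" into userName "Mark" and userId "4567". */
    static UserKey fromComposite(String composite) {
        int sep = composite.indexOf(':');
        // Reject input with no separator, an empty id, or an empty name.
        if (sep <= 0 || sep == composite.length() - 1) {
            throw new IllegalArgumentException("expected userId:userName, got: " + composite);
        }
        return new UserKey(composite.substring(sep + 1), composite.substring(0, sep));
    }
}
```

With keys stored this way, "all records for Mark" becomes a plain Query on the hash key instead of a Scan.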

Inserting a New UpdatableRecord and Receiving Error on Duplicate Primary Keys

I'm trying to insert a new record using UpdatableRecords in jOOQ 3.4.2. The pattern is extremely concise and pleasant to use, except that the INSERT reads null values as no value and ignores default values or a generated index. How can I use the UpdatableRecord to do an insert that respects default values and generated indexes?
Here's my table:
CREATE TABLE aragorn_sys.org_person (
    org_person_id SERIAL NOT NULL,
    first_name CHARACTER VARYING(128) NOT NULL,
    last_name CHARACTER VARYING(128) NOT NULL,
    created_time TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp NOT NULL,
    created_by_user_id INTEGER,
    last_modified_time TIMESTAMP WITH TIME ZONE,
    last_modified_by_user_id INTEGER,
    org_id INTEGER NOT NULL,
    CONSTRAINT PK_org_person PRIMARY KEY (org_person_id)
);
Note my primary key and default values. Now here's my jOOQ code:
// orgPerson represents a POJO filled with my values to be inserted and null for everything else
// Note that orgPerson.orgPersonId is null
OrgPersonRecord orgPersonRecord = create.newRecord( ORG_PERSON, orgPerson );
Integer orgPersonId = create.executeInsert( orgPersonRecord );
But when I run this, I get the error null value in column "org_person_id" violates not-null constraint.
I noticed the jOOQ docs say that calling newRecord automatically sets all the internal "changed" flags to true on the UpdatableRecord. So then I tried this:
// orgPerson represents a POJO filled with my values to be inserted and null for everything else
// Note that orgPerson.orgPersonId is null
OrgPersonRecord orgPersonRecord = create.newRecord( ORG_PERSON, orgPerson );
orgPersonRecord.changed( ORG_PERSON.ORG_PERSON_ID, false );
orgPersonRecord.changed( ORG_PERSON.CREATED_TIME, false );
orgPersonRecord.insert();
Integer orgPersonId = orgPersonRecord.getOrgPersonId();
But that gives me the error ERROR: duplicate key value violates unique constraint "pk_org_person". And when I do this repeatedly, the values seem to keep increasing by 1. This doesn't really make sense to me, but my greater question is: Is there a good way I can do an INSERT based on my object values, or better yet, simply include only the non-null columns?
I saw JOOQ ignoring database columns with default values, but that doesn't seem to resolve this. Any recommendations on the most concise way to handle this?
By the way, jOOQ has been fantastic to work with so far. Lukas, thank you for this awesome tool!
UPDATE #1:
The "not null issue" is addressed by Lukas's answer below, and that's an easy fix.
For the duplicate primary keys, I am definitely not confusing INSERT with UPDATE. When I run the above code (slight update since original post), jOOQ seems to arbitrarily pick a "starting" primary key value for OrgPersonId. For example, when I first load up my environment, jOOQ might start with "11" for OrgPersonId.
Then, when I do an INSERT, jOOQ will attempt to supply a value of "11" for OrgPersonId, I'll get the ERROR: duplicate key value and the INSERT will fail. If I then repeat the INSERT, jOOQ uses "12", then "13". It succeeds or fails based on whether that ID is available, but it's not "starting" with the right ID.
The manual (http://www.jooq.org/doc/3.4/manual/sql-execution/crud-with-updatablerecords/identity-values/) says: "If you're using jOOQ's code generator, the above table will generate a org.jooq.UpdatableRecord with an IDENTITY column. This information is used by jOOQ internally, to update IDs after calling store()."
UPDATE #2:
Ok, I just tried the generated query directly in Postgres and it fails there, too, with the same issue. So, clearly this is a Postgres issue and not a jOOQ issue. I'll post the final resolution on that when I find it in case anyone else runs into this.
UPDATE #3:
Issue has been resolved. We use FlywayDB (another awesome tool) to automate our database schema migration, and we had a bunch of INSERT statements in our Flyway scripts that manually INSERTED the id number. This was convenient because we wanted to create a bunch of dummy data and wanted to guarantee the right foreign key relationships.
But manually specifying the primary key value does not advance the Postgres sequence! Hence, we had to cycle through the Postgres sequence before (correctly operating) jOOQ would get the right sequence value.
Solution is to remove all our manual inserts of the primary keys in our demo data migration scripts.
violates not-null constraint
The first part that you're describing is a flaw (#3582), which is related to a previous issue (#2700), which enforced storing null values loaded from POJOs into jOOQ Records for database columns that are NOT NULL. The fix will be in jOOQ 3.5.0, 3.4.3, 3.3.4, and 3.2.7.
duplicate key value violates unique constraint "pk_org_person"
The second part is probably caused by the fact that you are really loading an existing record and then calling executeInsert() on it (note the INSERT in the name: it will always execute an INSERT statement). You might want to call executeUpdate() instead.

Querying DynamoDB

I've got a DynamoDB table with an alphanumeric string as a hash key (e.g. "d4ed6962-3ec2-4312-a480-96ecbb48c9da"). I need to query the table based on another field in the table, so I need my query to select all the keys for which my field is between date x and date y.
I know I need a condition on the hash key and another on a range key, however I struggle to compose a hash key condition that does not bind my query to specific IDs.
I thought I could get away with a redundant condition based on the ID being NOT_NULL, but when I use it I get the error:
Query key condition not supported
Below are the conditions I am using. Any idea how to achieve this goal?
Condition hashKeyCondition = new Condition()
    .withComparisonOperator(ComparisonOperator.NOT_NULL.toString());
Condition rangeCondition = new Condition()
    .withComparisonOperator(ComparisonOperator.BETWEEN.toString())
    .withAttributeValueList(new AttributeValue().withS(dateFormatter.print(lastScanTime())),
                            new AttributeValue().withS(dateFormatter.print(currentScanTime)));
Map<String, Condition> keyConditions = new HashMap<String, Condition>();
keyConditions.put("userId", hashKeyCondition);
keyConditions.put("lastAccesTime", rangeCondition);
Thanks in advance to everyone helping.
In DynamoDB you can get items with three APIs:
- Scan (flexible but expensive)
- Query (less flexible: you have to specify a hash key, but less expensive)
- GetItem (by hash key and, if your table has one, by range key)
The only way to achieve what you want is to either:
1. Use Scan, and be slow or expensive.
2. Use another table (B) as an index to the previous one (A), like:
B.HASH = 'VALUES'
B.RANGE = userid
B.lastAccesTime = lastAccesTime (with a secondary index)
Now you have to maintain that index on writes, but you can use it with the Query operation to get your userIds. Query B: hash = 'VALUES', lastAccesTime between x and y, select userid.
Hope this helps.
The NOT_NULL comparison operator is not valid for the hash key condition. The only valid operator for the Hash key condition on a query is EQ. More information can be found here:
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
And what this means is that a query will not work, at least as your table is currently constructed. You can either use a Scan operation or you can create a separate table that stores the data by Date (hash) and User ID (range).
Good luck!
I ended up scanning the table and enforcing a filter.
Thanks to everyone taking time for helping out!
You could add a Global Secondary Index whose hash key is, for example, the year and month of your date, and whose range key is the full date. You could then query any date range within a given month, which avoids an expensive full scan.
E.g.
Global Secondary Index:
Hash key: month_and_year for example '2014 March'
Range key: full_date
Hope it helps!
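One consequence of this layout is that a date range spanning several months needs one Query per month bucket. A rough sketch of computing the bucket values with java.time, assuming the '2014 March' format from the example above. The MonthBuckets class and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Computes month_and_year hash key values in the '2014 March' style. */
final class MonthBuckets {
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("uuuu MMMM", Locale.ENGLISH);

    /** Hash key value to store on an item with the given date. */
    static String bucketFor(LocalDate date) {
        return YearMonth.from(date).format(BUCKET);
    }

    /** All hash key values to query for an inclusive date range, oldest first. */
    static List<String> bucketsBetween(LocalDate from, LocalDate to) {
        List<String> buckets = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            buckets.add(m.format(BUCKET));
        }
        return buckets;
    }
}
```

Each returned bucket would be used as the hash key of one Query against the index, with a BETWEEN condition on full_date to trim the first and last month.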
You need to create a GSI if you want to query on anything other than the partition key. A Scan is very expensive in terms of cost and performance.

dictionary data insert or select (oracle, java)

I have a table (in an Oracle DB) containing two columns: a VARCHAR unique key and a NUMBER unique key generated from a sequence.
I need my Java code to constantly (and in parallel) add records to this table whenever it gets a new VARCHAR key, returning the newly generated NUMBER key, or to return the existing NUMBER key when it gets an existing VARCHAR (it doesn't insert it then; that would throw an exception, of course, due to the unique key violation).
This procedure would be executed from many (Java) clients working in parallel.
Hope my English is understandable :)
What is the best way to do it (maybe using a PL/SQL block instead of Java code)?
I do not think you can do better than:
1. SELECT the_number FROM the_table WHERE the_key = :key
2. If found, return it.
3. If not found, INSERT INTO the_table (the_key, the_number) VALUES (:key, the_seq.NEXTVAL) RETURNING the_number INTO :number and COMMIT.
This could raise ORA-00001 (unique constraint violated) if the timing is unlucky. In that case, SELECT again.
Not sure if JDBC supports RETURNING, so you might need to wrap it into a stored procedure (also saves database roundtrips).
You can use an index-organized table (with the_key as primary key), which makes the lookup faster.
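The select / insert / select-again flow can be sketched in Java like this. The KeyStore interface and DuplicateKeyException are hypothetical stand-ins, not JDBC types: a real implementation would run the SELECT and INSERT through JDBC or a stored procedure and translate ORA-00001 into the exception.

```java
import java.util.Optional;

/** Returns the number for a key, inserting the key first if it is new. */
final class DictionaryLookup {

    /** Hypothetical data-access interface; a real one would wrap the SQL statements. */
    interface KeyStore {
        /** SELECT the_number FROM the_table WHERE the_key = ? */
        Optional<Long> find(String key);

        /** INSERT with the next sequence value; returns the generated number. */
        long insert(String key) throws DuplicateKeyException;
    }

    /** Hypothetical exception standing in for ORA-00001. */
    static final class DuplicateKeyException extends Exception {
    }

    static long getOrCreate(KeyStore store, String key) {
        Optional<Long> existing = store.find(key);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            return store.insert(key);
        } catch (DuplicateKeyException e) {
            // Another client inserted the same key between our SELECT and INSERT,
            // so the row exists now: read it back.
            return store.find(key).orElseThrow(
                    () -> new IllegalStateException("key missing after duplicate insert: " + key));
        }
    }
}
```

The second find only sees the other client's row once that client has committed, so this assumes each insert is committed promptly, as in the steps above.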
