I'm just getting started with MongoDB and I've noticed that I get a lot of duplicate records for entries that I meant to be unique. I would like to know how to use a composite key for my data, and I'm looking for information on how to create one. I am using Java to access MongoDB, with Morphia as my ORM layer, so including those in your answers would be awesome.
Morphia: http://code.google.com/p/morphia/
You can use objects for the _id field as well. The _id field is always unique. That way you kind of get a composite primary key:
{ _id : { a : 1, b: 1} }
Just be careful when creating these ids: the order of the keys (a and b in the example) matters. If you swap them around, it is considered a different object.
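To see why the key order matters, here is a small plain-Java sketch with no MongoDB driver involved. It models an _id sub-document as a LinkedHashMap and compares two ids field by field, in order. The class and method names (CompositeIdOrder, id, sameId) are made up for this illustration, and the comparison is only a stand-in for how embedded documents are matched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

class CompositeIdOrder {

    // Builds a two-field id document whose fields keep their insertion order.
    static Map<String, Object> id(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(k1, v1);
        doc.put(k2, v2);
        return doc;
    }

    // Order-sensitive comparison: same fields, same values, same order.
    static boolean sameId(Map<String, Object> x, Map<String, Object> y) {
        return new ArrayList<>(x.entrySet()).equals(new ArrayList<>(y.entrySet()));
    }

    public static void main(String[] args) {
        Map<String, Object> ab = id("a", 1, "b", 1);
        Map<String, Object> ba = id("b", 1, "a", 1);
        System.out.println(ab.equals(ba));                   // true: Map.equals ignores order
        System.out.println(sameId(ab, ba));                  // false: the field order differs
        System.out.println(sameId(ab, id("a", 1, "b", 1)));  // true
    }
}
```

The practical takeaway is to build composite ids in one place, always appending the fields in the same order.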
The other possibility is to leave _id alone and create a unique compound index.
db.things.ensureIndex({firstname: 1, lastname: 1}, {unique: true});
//Deprecated since version 3.0.0, is now an alias for db.things.createIndex()
https://docs.mongodb.org/v3.0/reference/method/db.collection.ensureIndex/
You can create unique indexes on the fields of the document that you want to test uniqueness on. They can be composite as well (called compound key indexes in MongoDB land), as you can see from the documentation. Morphia has an @Indexed annotation to support indexing at the field level. In addition, with Morphia you can define compound indexes at the class level with the @Indexes annotation.
I just noticed that the question is marked as "java", so you'd want to do something like:
final BasicDBObject id = new BasicDBObject("a", aVal)
.append("b", bVal)
.append("c", cVal);
results = coll.find(new BasicDBObject("_id", id));
I use Morphia too, but have found that, while it works, it generates lots of errors as it tries to marshal the composite key. I use the above when querying to avoid these errors.
My original code (which also works):
final ProbId key = new ProbId(srcText, srcLang, destLang);
final QueryImpl<Probabilities> query = ds.createQuery(Probabilities.class)
.field("id").equal(key);
Probabilities probs = (Probabilities) query.get();
My ProbId class is annotated as @Entity(noClassnameStored = true), and inside the Probabilities class the id field is declared as @Id ProbId id;
I will try to explain with an example:
Create a table Music
Add Artist as a primary key
Now since artist may have many songs we have to figure out a sort key.
The combination of both will be a composite key.
Meaning, the Artist + SongTitle will be unique.
something like this:
{
"Artist" : {"s" : "David Bowie"},
"SongTitle" : {"s" : "changes"},
"AlbumTitle" : {"s" : "Hunky"},
"Genre" : {"s" : "Rock"}
}
Artist key above is: Partition Key
SongTitle key above is: sort key
The combination of both is always unique or should be unique. Rest are attributes which may vary per record.
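The uniqueness rule described above can be sketched in plain Java as an in-memory model. This is not a database client: the MusicTable class and its insert method are hypothetical names, and the second song title is invented sample data. It only shows that the pair (Artist, SongTitle) acts as one key while the remaining attributes are free to vary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MusicTable {
    // Composite key (Artist, SongTitle) -> remaining attributes.
    private final Map<List<String>, Map<String, String>> items = new HashMap<>();

    // Returns true if stored, false if the (artist, songTitle) pair already exists.
    boolean insert(String artist, String songTitle, Map<String, String> attributes) {
        List<String> key = List.of(artist, songTitle);
        return items.putIfAbsent(key, attributes) == null;
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        MusicTable table = new MusicTable();
        // Same artist, different song: accepted.
        System.out.println(table.insert("David Bowie", "changes", Map.of("Genre", "Rock")));
        System.out.println(table.insert("David Bowie", "another song", Map.of("Genre", "Rock")));
        // Same artist and same song: rejected, even though the attributes differ.
        System.out.println(table.insert("David Bowie", "changes", Map.of("Genre", "Pop")));
        System.out.println(table.size());
    }
}
```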
Once you have this data structure in place you can easily append and scan as per your custom queries.
Sample Mongo shell commands for reference (insert and find take a document, not a file path; drop takes no arguments):
db.products.insert(<document>)
db.collection.drop()
db.users.find(<query document>)
Say I want to save/create a new item in the DynamoDB table
if and only if there is no existing item that contains a referenceId equal to the value I set.
In my case I want to create an item with referenceId=123 if there is no other item with referenceId=123 in the table.
The referenceId is not the primary key! (I do not want it to be.)
So the code:
val withReferenceIdValue = "123";
val saveExpression = new DynamoDBSaveExpression();
final Map<String, ExpectedAttributeValue> expectedNoReferenceIdFound = new HashMap<>();
expectedNoReferenceIdFound.put(
"referenceId",
new ExpectedAttributeValue(new AttributeValue().withS(withReferenceIdValue)).withComparisonOperator(ComparisonOperator.NE)
);
saveExpression.setExpected(expectedNoReferenceIdFound);
newItemRecord.setReferenceId(withReferenceIdValue);
this.mapper.save(newItemRecord, saveExpression); // does not fail
That does not seem to work.
If the table already has referenceId=123, the save() does not fail.
I expected this.mapper.save to fail with an exception.
Q: How do I make it fail on the condition?
I also checked this one, where they suggest adding an auxiliary table (a transaction-state table), because it seems the saveExpression works only for the primary/partition key. If so:
I am not sure why that limitation exists. In any case, if it is the primary key,
one cannot create a duplicate item with the same primary key, so why
create conditions in the first place? A third table is too much. Why
is there not just an NE on whatever field I want to use? I could create an index
for this field, rather than being limited to the primary key. That is what
I mean.
UPDATE:
My table mapping code:
@Data // I use Lombok, which generates the getters and setters.
@DynamoDBTable(tableName = "MyTable")
public class MyTable {
@DynamoDBHashKey(attributeName = "myTableID")
@DynamoDBAutoGeneratedKey
private String myTableID;
@DynamoDBAttribute(attributeName = "referenceId")
private String referenceId;
@DynamoDBAttribute(attributeName = "startTime")
private String startTime;
@DynamoDBAttribute(attributeName = "endTime")
private String endTime;
...
}
Correct me if I'm wrong, but from the:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dynamodb-dg.pdf
Conditional Writes By default, the DynamoDB write operations (PutItem,
UpdateItem, DeleteItem) are unconditional: each of these operations
will overwrite an existing item that has the specified primary key
The phrase "the specified primary key" makes me think that the conditional write works ONLY with primary keys.
--
There is also an attempt to do transactional reads and writes against the database, using a library that does not even have a Maven repo: https://github.com/awslabs/dynamodb-transactions
An alternative seems to be a third, transaction table whose primary keys tell you whether it is OK to read or write to the table (ugly), as we replied here: DynamoDBMapper save item only if unique
Another alternative, I guess (by design): design your tables so that the primary key is your business key, so you can use it for conditional writes.
--
Another option: use Aurora :)
--
Another option (investigating): https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/ - I do not like this either, because it could potentially create timeouts for others who want to create new items in this table.
--
Another option: live with it and let duplication happen on item creation (not including the primary key), and take care of it as part of "garbage collection". It depends on the scenario.
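To make the limitation discussed above concrete, here is a plain-Java model of a conditional write. It is not the AWS SDK: ConditionalWriteModel and its put method are hypothetical, and the only behaviour it imitates is that a write condition is checked against the existing item with the same primary key (or an empty item if there is none), never against other items in the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class ConditionalWriteModel {
    // primary key -> item attributes
    private final Map<String, Map<String, String>> table = new HashMap<>();

    // The condition sees only the item already stored under the SAME primary key
    // (an empty map if there is none), never any other item in the table.
    boolean put(String primaryKey, Map<String, String> item,
                Predicate<Map<String, String>> condition) {
        Map<String, String> existing = table.getOrDefault(primaryKey, Map.of());
        if (!condition.test(existing)) {
            return false; // stands in for a failed conditional check
        }
        table.put(primaryKey, item);
        return true;
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> refIsNot123 =
                it -> !"123".equals(it.get("referenceId"));
        Predicate<Map<String, String>> noSuchItem = Map::isEmpty;

        // Generated primary keys: the duplicate referenceId goes unnoticed.
        ConditionalWriteModel generated = new ConditionalWriteModel();
        System.out.println(generated.put("id-1", Map.of("referenceId", "123"), refIsNot123)); // true
        System.out.println(generated.put("id-2", Map.of("referenceId", "123"), refIsNot123)); // true

        // referenceId used as the primary key: the second write is rejected.
        ConditionalWriteModel business = new ConditionalWriteModel();
        System.out.println(business.put("123", Map.of("startTime", "t0"), noSuchItem)); // true
        System.out.println(business.put("123", Map.of("startTime", "t1"), noSuchItem)); // false
    }
}
```

In the first half every save uses a fresh key, so the condition always runs against an empty item and passes. That matches the behaviour reported in the question. The second half corresponds to the "primary key as business key" design.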
I saw various examples online where Cassandra triggers were used to write to an audit table. I was following this one:
https://github.com/apache/cassandra/blob/cassandra-3.0/examples/triggers/src/org/apache/cassandra/triggers/AuditTrigger.java
However in my use case, I have an audit table that has a composite partition key ( PRIMARY KEY ((col1,col2),col3,col4) ) and multiple clustering columns.
I have been able to add the clustering columns by adding audit.clustering(values) but I am not able to figure out how to implement the composite partition key.
RowUpdateBuilder gives me an error if I pass update.partitionKey.partition () as the 3rd parameter of rowUpdateBuilder.
The error is :
java.lang.IllegalArgumentException: Invalid number of components, expecting 2 but got 1.
I get the same error when I pass an array of size 2 as the 3rd parameter to rowUpdateBuilder.
Any help will be appreciated.
Build a composite partition key from all of your partition key columns
To build a composite partition key from one or more partition key columns, use the following method:
public DecoratedKey buildCompositePartitionKey(CFMetaData metadata, Object... partitionKey) {
return metadata.decorateKey(
CFMetaData.serializePartitionKey(
metadata.getKeyValidatorAsClusteringComparator().make(partitionKey)
)
);
}
Example :
CFMetaData metadata = Schema.instance.getCFMetaData("test_ks", "test_cf");
DecoratedKey compositePartitionKey = buildCompositePartitionKey(metadata, "col1 value", "col2 value");
RowUpdateBuilder audit = new RowUpdateBuilder(metadata, FBUtilities.timestampMicros(), compositePartitionKey);
I'm trying to query a DynamoDB table using a global secondary index and I'm getting java.lang.IllegalArgumentException: Illegal query expression: No hash key condition is found in the query. All I'm trying to do is to get all items that have a timestamp greater than a value without considering the key. The timestamp is not part of a key or range key, so I created a global index for it.
Does anyone have a clue what I might be missing?
Table Definition:
{
AttributeDefinitions:[
{
AttributeName:timestamp,
AttributeType:N
},
{
AttributeName:url,
AttributeType:S
}
],
TableName:SitePageIndexed,
KeySchema:[
{
AttributeName:url,
KeyType:HASH
}
],
TableStatus:ACTIVE,
CreationDateTime: Mon May 12 18:45:57 EDT 2014,
ProvisionedThroughput:{
NumberOfDecreasesToday:0,
ReadCapacityUnits:8,
WriteCapacityUnits:4
},
TableSizeBytes:0,
ItemCount:0,
GlobalSecondaryIndexes:[
{
IndexName:TimestampIndex,
KeySchema:[
{
AttributeName:timestamp,
KeyType:HASH
}
],
Projection:{
ProjectionType:ALL,
},
IndexStatus:ACTIVE,
ProvisionedThroughput:{
NumberOfDecreasesToday:0,
ReadCapacityUnits:8,
WriteCapacityUnits:4
},
IndexSizeBytes:0,
ItemCount:0
}
]
}
Code
Condition condition1 = new Condition().withComparisonOperator(ComparisonOperator.GE).withAttributeValueList(new AttributeValue().withN(Long.toString(start)));
DynamoDBQueryExpression<SitePageIndexed> exp = new DynamoDBQueryExpression<SitePageIndexed>().withRangeKeyCondition("timestamp", condition1);
exp.setScanIndexForward(true);
exp.setLimit(100);
exp.setIndexName("TimestampIndex");
PaginatedQueryList<SitePageIndexed> queryList = client.query(SitePageIndexed.class,exp);
All I'm trying to do is to get all items that have a timestamp greater than a value without considering the key.
This is not how Global Secondary Indexes (GSI) on Amazon DynamoDB work. To query a GSI you must specify a value for its hash key and then you may filter/sort by the range key -- just like you'd do with the primary key. This is exactly what the exception is trying to tell you, and also what you will find on the documentation page for the Query API:
A Query operation directly accesses items from a table using the table primary key, or from an index using the index key. You must provide a specific hash key value.
Think of a GSI as just another key that behaves almost exactly like the primary key (the main differences being that it is updated asynchronously, and you can only perform eventually consistent reads on GSIs).
Please refer to the Amazon DynamoDB Global Secondary Index documentation page for guidelines and best practices when creating GSIs: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
One possible way to achieve what you want would be to have a dummy attribute constrained to a finite, small set of possible values, create a GSI with hash key on that dummy attribute and range key on your timestamp. When querying, you would need to issue one Query API call for each possible value on your dummy hash key attribute, and then consolidate the results on your application. By constraining the dummy attribute to a singleton (i.e., a Set with a single element, i.e., a constant value), you can send only one Query API call and you get your result dataset directly -- but keep in mind that this will cause you problems related to hot partitions and you might have performance issues! Again, refer to the document linked above to learn the best practices and some patterns.
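The dummy-attribute pattern described above can be sketched as a plain-Java in-memory model. It is not DynamoDB code: BucketedTimestampIndex, the bucket count and the hash-based bucket assignment are all assumptions chosen for illustration. Each bucket plays the role of one dummy hash key value, a TreeMap plays the role of the timestamp range key, and since() issues one lookup per bucket and merges the results in the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BucketedTimestampIndex {
    private final int buckets;
    // dummy hash key (bucket number) -> timestamp range key -> urls
    private final Map<Integer, TreeMap<Long, List<String>>> index = new HashMap<>();

    BucketedTimestampIndex(int buckets) {
        this.buckets = buckets;
    }

    void put(String url, long timestamp) {
        // The dummy attribute: a value from a small, fixed set.
        int bucket = Math.floorMod(url.hashCode(), buckets);
        index.computeIfAbsent(bucket, b -> new TreeMap<>())
             .computeIfAbsent(timestamp, t -> new ArrayList<>())
             .add(url);
    }

    // One "query" per dummy value (timestamp >= start), merged by timestamp.
    List<String> since(long start) {
        TreeMap<Long, List<String>> merged = new TreeMap<>();
        for (int bucket = 0; bucket < buckets; bucket++) {
            TreeMap<Long, List<String>> partition = index.get(bucket);
            if (partition == null) {
                continue;
            }
            partition.tailMap(start, true).forEach((ts, urls) ->
                    merged.computeIfAbsent(ts, t -> new ArrayList<>()).addAll(urls));
        }
        List<String> result = new ArrayList<>();
        merged.values().forEach(result::addAll);
        return result;
    }

    public static void main(String[] args) {
        BucketedTimestampIndex idx = new BucketedTimestampIndex(4);
        idx.put("a", 10L);
        idx.put("b", 20L);
        idx.put("c", 30L);
        System.out.println(idx.since(20L)); // [b, c]
    }
}
```

With a single bucket this degenerates into the constant-value variant mentioned above: one lookup, but every write lands on the same partition.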
It is possible to query DynamoDB with only the GSI; this can be confirmed in the web interface under Query/Index.
Programmatically, it is done as follows:
DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient(
new ProfileCredentialsProvider()));
Table table = dynamoDB.getTable("WeatherData");
Index index = table.getIndex("PrecipIndex");
QuerySpec spec = new QuerySpec()
.withKeyConditionExpression("#d = :v_date and Precipitation = :v_precip")
.withNameMap(new NameMap()
.with("#d", "Date"))
.withValueMap(new ValueMap()
.withString(":v_date","2013-08-10")
.withNumber(":v_precip",0));
ItemCollection<QueryOutcome> items = index.query(spec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
System.out.println(iter.next().toJSONPretty());
}
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSIJavaDocumentAPI.html#GSIJavaDocumentAPI.QueryAnIndex
For doing it with DynamoDBMapper see: How to query a Dynamo DB having a GSI with only hashKeys using DynamoDBMapper
Here is how you can query in Java with only a GSI:
Map<String, AttributeValue> eav = new HashMap<String, AttributeValue>();
eav.put(":val1", new AttributeValue().withS("PROCESSED"));
DynamoDBQueryExpression<Package> queryExpression = new DynamoDBQueryExpression<Package>()
.withIndexName("<your globalsecondaryindex key name>")
.withKeyConditionExpression("your_gsi_column_name = :val1")
.withExpressionAttributeValues(eav)
.withConsistentRead(false)
.withLimit(2);
QueryResultPage<Package> scanPage = dbMapper.queryPage(Package.class, queryExpression);
While this is not the correct answer per se, could you possibly accomplish this with a scan instead of a query? It's much more expensive, but could be a solution.
I have a List of ids:
List<Object> ids;
I need to use this in a criteria query to get all rows with an id that ids contains.
What I have now and works:
if (ids.size() > 0) {
for (Object id : ids) {
preparedResults.add((T)sessionMngr.getSession().
createCriteria(rowType).add(Restrictions.idEq(id))
.uniqueResult());
}
}
So I fetch them one by one, which isn't optimal. I first tried something like the following to get them in one query, but I don't know what to put at the ???:
preparedResults.addAll(sessionMngr.getSession().
createCriteria(rowType).
add(Restrictions.in(???, ids)).list());
The first argument of Restrictions.in() is of type String. I can't put a hard-coded "id" there as I don't know what the property name is.
So I don't know how to get the id property as a String there.
I saw Projections.id(), but I am not sure if I can use this to get it as a String.
With this solution you can retrieve the name of the id field of your entity. If you use annotations you can have it even shorter as described here. If you don't use composite primary keys you could also use
ClassMetadata classMetadata = getSessionFactory().getClassMetadata(myClass);
String identifierPropertyName = classMetadata.getIdentifierPropertyName();
You can then pass identifierPropertyName as the first argument of Restrictions.in() in place of the ???.
Source of this snippet: here
I am a newbie to MongoDB. May I know how to avoid duplicate entries? In relational tables, we use a primary key to avoid them. May I know how to specify it in MongoDB using Java?
Use an index with the {unique:true} option.
// everyone's email must be unique:
db.users.createIndex({email:1},{unique:true});
You can also do this across multiple fields. See this section in the docs for more details and examples.
A unique index ensures that the indexed fields do not store duplicate values; i.e. enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.
If you want documents without the field to be ignored by the unique index, you also have to make the index sparse (see here) by adding the sparse option:
// everyone's email must be unique,
// but there can be multiple users with no email field at all
// (an explicit null email is still indexed, so only one document may have it):
db.users.createIndex({email:1},{unique:true, sparse:true});
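The difference the sparse option makes can be sketched with a plain-Java in-memory model. This is not the MongoDB driver: UniqueIndexModel and its methods are hypothetical. It follows my reading of the MongoDB documentation, which is that a non-sparse unique index treats a missing field as null, while a sparse one skips documents that lack the field but still indexes an explicit null.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class UniqueIndexModel {
    private final String field;
    private final boolean sparse;
    private final Set<Object> seen = new HashSet<>();

    UniqueIndexModel(String field, boolean sparse) {
        this.field = field;
        this.sparse = sparse;
    }

    // Convenience builder for a one-field document (HashMap allows null values).
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(key, value);
        return d;
    }

    // Returns false where a real server would report a duplicate key error.
    boolean insert(Map<String, Object> document) {
        if (sparse && !document.containsKey(field)) {
            return true; // the field is absent: a sparse index skips this document
        }
        return seen.add(document.get(field)); // a missing field is indexed as null
    }

    public static void main(String[] args) {
        UniqueIndexModel unique = new UniqueIndexModel("email", false);
        System.out.println(unique.insert(doc("name", "a"))); // true
        System.out.println(unique.insert(doc("name", "b"))); // false: second "null" email

        UniqueIndexModel sparseUnique = new UniqueIndexModel("email", true);
        System.out.println(sparseUnique.insert(doc("name", "a")));  // true
        System.out.println(sparseUnique.insert(doc("name", "b")));  // true
        System.out.println(sparseUnique.insert(doc("email", null))); // true
        System.out.println(sparseUnique.insert(doc("email", null))); // false: explicit null is indexed
    }
}
```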
If you want to create the index using the MongoDB Java driver, try:
Document keys = new Document("email", 1);
collection.createIndex(keys, new IndexOptions().unique(true));
This can be done using the "_id" field, although this use is discouraged.
Suppose you want the names to be unique; then you can put the names in the "_id" field, and as you might know, "_id" is unique for each entry.
BasicDBObject bdbo = new BasicDBObject("_id","amit");
Now no other entry in the collection can have the name "amit". This is one way to do what you are asking for.
As of Mongo's v3.0 Java driver, the code to create the index looks like:
public void createUniqueIndex() {
Document index = new Document("fieldName", 1);
MongoCollection<Document> collection = client.getDatabase("dbName").getCollection("CollectionName");
collection.createIndex(index, new IndexOptions().unique(true));
}
// And test to verify it works as expected
@Test
public void testIndex() {
MongoCollection<Document> collection = client.getDatabase("dbName").getCollection("CollectionName");
collection.insertOne(new Document("fieldName", "duplicateValue"));
try {
// Use a fresh Document with the same value: reusing the first one would also
// collide on the _id that insertOne assigned to it, which would hide a missing index.
collection.insertOne(new Document("fieldName", "duplicateValue"));
fail("Should have thrown a mongo write exception due to duplicate key");
} catch (MongoWriteException e) {
assertTrue(e.getMessage().contains("duplicate key"));
}
}
Theon's solution didn't work for me, but this one did:
BasicDBObject query = new BasicDBObject(<fieldname>, 1);
collection.ensureIndex(query, <index_name>, true);
I am not a Java programmer; however, you can probably convert this over.
MongoDB by default does have a primary key, known as _id. You can use upsert() or save() on this key to prevent the document from being written twice, like so:
var doc = {'name': 'sam'};
db.users.insert(doc); // doc will get an _id assigned to it
db.users.insert(doc); // Will fail since it already exists
This will immediately stop duplicates. As for multithread-safe inserts under certain conditions: we would need to know more about your condition in that case.
I should add, however, that the _id index is unique by default.
Using pymongo it looks like:
mycol.create_index("id", unique=True)
where mycol is the collection in the DB
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mycol.create_index("id", unique=True)
mydict = {"name": "xoce", "address": "Highway to hell 666", "id": 1}
x = mycol.insert_one(mydict)
Prevent MongoDB from saving a duplicate email:
UserSchema.path('email').validate(async(email)=>{
const emailcount = await mongoose.models.User.countDocuments({email})
return !emailcount
}, 'Email already exists')
This worked for me; use it in the user model.