Mongodb avoid duplicate entries

I am new to MongoDB. How can I avoid duplicate entries? In relational tables we use a primary key for this. How do I specify the equivalent in MongoDB using Java?

Use an index with the {unique:true} option.
// every user's email must be unique:
db.users.createIndex({email:1},{unique:true});
You can also do this across multiple fields. See this section in the docs for more details and examples.
A unique index ensures that the indexed fields do not store duplicate values; i.e. enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.
If you want documents that lack the field to be exempt from the unique constraint, you also have to make the index sparse (see here) by adding the sparse option:
// every email must be unique,
// but there can be multiple users with no email field at all
// (an explicit null is still indexed, so only one document may store email: null):
db.users.createIndex({email:1},{unique:true, sparse:true});
To create the index with the MongoDB Java driver, try:
Document keys = new Document("email", 1);
collection.createIndex(keys, new IndexOptions().unique(true));
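To make the difference between a unique index and a unique sparse index concrete, here is a small in-memory sketch in plain Java. It is only a toy model of the semantics, not the driver API: the class `UniqueIndexModel` and its methods are hypothetical, and it assumes that a non-sparse index treats a missing field as null while a sparse index skips documents that lack the field altogether.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not the MongoDB API) of a unique index on one field. */
final class UniqueIndexModel {
    private final String field;
    private final boolean sparse;
    // HashSet accepts a single null entry, which models "missing field indexed as null".
    private final Set<Object> indexedValues = new HashSet<>();

    UniqueIndexModel(String field, boolean sparse) {
        this.field = field;
        this.sparse = sparse;
    }

    /** Returns true if the document is accepted, false if it would be a duplicate key. */
    boolean insert(Map<String, Object> doc) {
        if (sparse && !doc.containsKey(field)) {
            return true; // sparse: documents without the field are not indexed at all
        }
        // Non-sparse: a missing field yields null here and is indexed as null.
        return indexedValues.add(doc.get(field));
    }

    /** Convenience helper that builds a one-field document. */
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(key, value);
        return d;
    }

    public static void main(String[] args) {
        UniqueIndexModel unique = new UniqueIndexModel("email", false);
        System.out.println(unique.insert(doc("email", "a@example.com"))); // accepted
        System.out.println(unique.insert(doc("email", "a@example.com"))); // rejected
        System.out.println(unique.insert(doc("name", "no-email-1")));     // accepted
        System.out.println(unique.insert(doc("name", "no-email-2")));     // rejected

        UniqueIndexModel sparseUnique = new UniqueIndexModel("email", true);
        System.out.println(sparseUnique.insert(doc("name", "no-email-1"))); // accepted
        System.out.println(sparseUnique.insert(doc("name", "no-email-2"))); // accepted
    }
}
```

With the real driver, the rejected cases would surface as a duplicate key write error rather than a boolean.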

This can also be done with the "_id" field, although this use is discouraged.
Suppose you want names to be unique: you can store the name in the "_id" field, which, as you may know, is unique for each entry.
BasicDBObject bdbo = new BasicDBObject("_id","amit");
Now no other entry in the collection can have the name "amit". This may be one way to get what you are asking for.

As of Mongo's v3.0 Java driver, the code to create the index looks like:
public void createUniqueIndex() {
    Document index = new Document("fieldName", 1);
    MongoCollection<Document> collection = client.getDatabase("dbName").getCollection("CollectionName");
    collection.createIndex(index, new IndexOptions().unique(true));
}
// And test to verify it works as expected
@Test
public void testIndex() {
    MongoCollection<Document> collection = client.getDatabase("dbName").getCollection("CollectionName");
    collection.insertOne(new Document("fieldName", "duplicateValue"));
    // Inserting a second document with the same value will throw a MongoWriteException.
    // A fresh Document is used because insertOne assigns an _id to the document it is given;
    // re-inserting the same instance would fail on _id even without the unique index.
    try {
        collection.insertOne(new Document("fieldName", "duplicateValue"));
        fail("Should have thrown a mongo write exception due to duplicate key");
    } catch (MongoWriteException e) {
        assertTrue(e.getMessage().contains("duplicate key"));
    }
}

Theon's solution didn't work for me, but this one, which uses the older DBCollection API, did:
BasicDBObject query = new BasicDBObject(<fieldname>, 1);
collection.ensureIndex(query, <index_name>, true);

I am not a Java programmer; however, you can probably convert this over.
MongoDB does have a default primary key, the _id field. You can use save() or an update with the upsert option on this key to prevent the document from being written twice, like so:
var doc = {'name': 'sam'};
db.users.insert(doc); // doc will get an _id assigned to it
db.users.insert(doc); // Will fail since it already exists
This stops duplicates immediately. As for multithread-safe inserts under certain conditions: we would need to know more about your conditions in that case.
I should add, however, that the _id index is unique by default.

Using pymongo, it looks like this:
mycol.create_index("id", unique=True)
where mycol is the collection in the DB. A complete example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mycol.create_index("id", unique=True)
mydict = {"name": "xoce", "address": "Highway to hell 666", "id": 1}
x = mycol.insert_one(mydict)

To prevent MongoDB from saving a duplicate email:
UserSchema.path('email').validate(async (email) => {
    const emailcount = await mongoose.models.User.countDocuments({email})
    return !emailcount
}, 'Email already exists')
Hopefully this answers your question; it worked for me. Use it in the user model.

Related

Get the object id after inserting the mongodb document in java

I am using MongoDB 3.4 and I want to get the id of the last inserted document. I have searched everywhere and found that the code below can be used with a BasicDBObject.
BasicDBObject docs = new BasicDBObject(doc);
collection.insertOne(docs);
ID = (ObjectId)doc.get( "_id" );
But the problem is that I am using the Document type, not BasicDBObject, so I tried to get it like this: doc.getObjectId();. But it asks for a parameter, which is the very thing I want. Does anyone know how to get it?
EDIT
This is how I am inserting it into MongoDB.
Document doc = new Document("jarFileName", jarDataObj.getJarFileName())
.append("directory", jarDataObj.getPathData())
.append("version", jarDataObj.getVersion())
.append("artifactID", jarDataObj.getArtifactId())
.append("groupID", jarDataObj.getGroupId());
If I use doc.toJson() it shows me the whole document. Is there a way to extract only the _id?
This gives me only the value; I want it as the object key, so I can use it as a reference key.
collection.insertOne(doc);
jarID = doc.get( "_id" );
System.out.println(jarID); //59a4db1a6812d7430c3ef2a5

Based on the ObjectId Javadoc, you can simply instantiate an ObjectId from a 24-character hex string, which is what 59a4db1a6812d7430c3ef2a5 is. Why don't you just do new ObjectId("59a4db1a6812d7430c3ef2a5")? Note that the byte[] constructor expects the 12 raw bytes that the hex string encodes, not the 24 bytes of its UTF-8 text, so passing "59a4db1a6812d7430c3ef2a5".getBytes(StandardCharsets.UTF_8) will not work. That said, I'd say that exposing ObjectId outside the layer that integrates with Mongo is a design flaw.
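The relationship between the 24-character hex form and the 12 underlying bytes can be sketched in plain Java. This is an illustration only and does not use the driver: the class `ObjectIdHex` and its methods are hypothetical names, and the timestamp helper assumes the usual ObjectId layout in which the first four bytes are a big-endian count of seconds since the Unix epoch.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java sketch (hypothetical helper, not the driver's ObjectId class). */
final class ObjectIdHex {
    /** Decodes a 24-character hex string into the 12 bytes it represents. */
    static byte[] toBytes(String hex) {
        if (hex == null || hex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        byte[] out = new byte[12];
        for (int i = 0; i < 12; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Reads the leading 4 bytes as an unsigned big-endian number of seconds. */
    static long timestampSeconds(byte[] id) {
        return ((id[0] & 0xFFL) << 24) | ((id[1] & 0xFFL) << 16)
                | ((id[2] & 0xFFL) << 8) | (id[3] & 0xFFL);
    }

    public static void main(String[] args) {
        String hex = "59a4db1a6812d7430c3ef2a5";
        // The decoded id is 12 bytes, while the UTF-8 text of the hex string is 24 bytes.
        System.out.println(toBytes(hex).length);
        System.out.println(hex.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(timestampSeconds(toBytes(hex)));
    }
}
```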

Query Dynamo table with only the secondary global index

I'm trying to query a DynamoDB table using a global secondary index and I'm getting java.lang.IllegalArgumentException: Illegal query expression: No hash key condition is found in the query. All I'm trying to do is get all items that have a timestamp greater than a value, without considering the key. The timestamp is not part of a hash key or range key, so I created a global index for it.
Does anyone have a clue what I might be missing?
Table Definition:
{
  AttributeDefinitions:[
    {
      AttributeName:timestamp,
      AttributeType:N
    },
    {
      AttributeName:url,
      AttributeType:S
    }
  ],
  TableName:SitePageIndexed,
  KeySchema:[
    {
      AttributeName:url,
      KeyType:HASH
    }
  ],
  TableStatus:ACTIVE,
  CreationDateTime: Mon May 12 18:45:57 EDT 2014,
  ProvisionedThroughput:{
    NumberOfDecreasesToday:0,
    ReadCapacityUnits:8,
    WriteCapacityUnits:4
  },
  TableSizeBytes:0,
  ItemCount:0,
  GlobalSecondaryIndexes:[
    {
      IndexName:TimestampIndex,
      KeySchema:[
        {
          AttributeName:timestamp,
          KeyType:HASH
        }
      ],
      Projection:{
        ProjectionType:ALL,
      },
      IndexStatus:ACTIVE,
      ProvisionedThroughput:{
        NumberOfDecreasesToday:0,
        ReadCapacityUnits:8,
        WriteCapacityUnits:4
      },
      IndexSizeBytes:0,
      ItemCount:0
    }
  ]
}
Code
Condition condition1 = new Condition().withComparisonOperator(ComparisonOperator.GE).withAttributeValueList(new AttributeValue().withN(Long.toString(start)));
DynamoDBQueryExpression<SitePageIndexed> exp = new DynamoDBQueryExpression<SitePageIndexed>().withRangeKeyCondition("timestamp", condition1);
exp.setScanIndexForward(true);
exp.setLimit(100);
exp.setIndexName("TimestampIndex");
PaginatedQueryList<SitePageIndexed> queryList = client.query(SitePageIndexed.class,exp);

All I'm trying to do is to get all items that have a timestamp greater than a value without considering the key.
This is not how Global Secondary Indexes (GSI) on Amazon DynamoDB work. To query a GSI you must specify a value for its hash key and then you may filter/sort by the range key -- just like you'd do with the primary key. This is exactly what the exception is trying to tell you, and also what you will find on the documentation page for the Query API:
A Query operation directly accesses items from a table using the table primary key, or from an index using the index key. You must provide a specific hash key value.
Think of a GSI as just another key that behaves almost exactly like the primary key (the main differences being that it is updated asynchronously, and you can only perform eventually consistent reads on GSIs).
Please refer to the Amazon DynamoDB Global Secondary Index documentation page for guidelines and best practices when creating GSIs: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
One possible way to achieve what you want would be to have a dummy attribute constrained to a finite, small set of possible values, create a GSI with hash key on that dummy attribute and range key on your timestamp. When querying, you would need to issue one Query API call for each possible value on your dummy hash key attribute, and then consolidate the results on your application. By constraining the dummy attribute to a singleton (i.e., a Set with a single element, i.e., a constant value), you can send only one Query API call and you get your result dataset directly -- but keep in mind that this will cause you problems related to hot partitions and you might have performance issues! Again, refer to the document linked above to learn the best practices and some patterns.
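The dummy-attribute pattern can be sketched with an in-memory stand-in, written in plain Java so that it runs without the AWS SDK. Everything here is hypothetical: `ShardedTimestampIndex`, `bucketOf` and `queryFrom` are illustrative names, each `TreeMap` plays the role of one hash key value of the GSI with the timestamp as range key, and deriving the bucket from the URL's hash code is just one possible way to assign the dummy value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** In-memory stand-in (hypothetical) for a GSI keyed by (bucket, timestamp). */
final class ShardedTimestampIndex {
    /** One indexed item. */
    static final class Row {
        final String url;
        final long timestamp;

        Row(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }
    }

    // One sorted map per dummy hash key value; the map key is the range key (timestamp).
    private final List<TreeMap<Long, List<Row>>> buckets = new ArrayList<>();

    ShardedTimestampIndex(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new TreeMap<>());
        }
    }

    /** Dummy hash key: a small, finite set of values derived from the item. */
    static int bucketOf(String url, int bucketCount) {
        return Math.floorMod(url.hashCode(), bucketCount);
    }

    void put(String url, long timestamp) {
        buckets.get(bucketOf(url, buckets.size()))
                .computeIfAbsent(timestamp, t -> new ArrayList<>())
                .add(new Row(url, timestamp));
    }

    /** Issues one range "query" per bucket, then consolidates the results in the application. */
    List<Row> queryFrom(long start) {
        List<Row> merged = new ArrayList<>();
        for (TreeMap<Long, List<Row>> bucket : buckets) {
            for (List<Row> rows : bucket.tailMap(start, true).values()) {
                merged.addAll(rows);
            }
        }
        merged.sort(Comparator.comparingLong((Row r) -> r.timestamp));
        return merged;
    }
}
```

With a bucket count of 1 this degenerates into the single-constant variant described above, which needs only one query but concentrates all writes on one partition.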

It is possible to query DynamoDB with only the GSI; this can be confirmed in the web interface under Query/Index.
Programmatically, it is done as follows:
DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient(
new ProfileCredentialsProvider()));
Table table = dynamoDB.getTable("WeatherData");
Index index = table.getIndex("PrecipIndex");
QuerySpec spec = new QuerySpec()
.withKeyConditionExpression("#d = :v_date and Precipitation = :v_precip")
.withNameMap(new NameMap()
.with("#d", "Date"))
.withValueMap(new ValueMap()
.withString(":v_date","2013-08-10")
.withNumber(":v_precip",0));
ItemCollection<QueryOutcome> items = index.query(spec);
Iterator<Item> iter = items.iterator();
while (iter.hasNext()) {
System.out.println(iter.next().toJSONPretty());
}
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSIJavaDocumentAPI.html#GSIJavaDocumentAPI.QueryAnIndex
For doing it with DynamoDBMapper see: How to query a Dynamo DB having a GSI with only hashKeys using DynamoDBMapper

Here is how you can query in Java using only the GSI:
Map<String, AttributeValue> eav = new HashMap<String, AttributeValue>();
eav.put(":val1", new AttributeValue().withS("PROCESSED"));
DynamoDBQueryExpression<Package> queryExpression = new DynamoDBQueryExpression<Package>()
    .withIndexName("<your globalsecondaryindex key name>")
    .withKeyConditionExpression("your_gsi_column_name = :val1")
    .withExpressionAttributeValues(eav)
    .withConsistentRead(false) // GSIs only support eventually consistent reads
    .withLimit(2);
QueryResultPage<Package> scanPage = dbMapper.queryPage(Package.class, queryExpression);

While this is not the correct answer per se, could you possibly accomplish this with a scan instead of a query? It's much more expensive, but it could be a solution.

alias in mongo db

I've started to fiddle with mongo db and came up with a question.
Say, I have an object (POJO) with an id field (say, named 'ID') that I would like to represent in JSON and store/load in/from Mongo DB.
As far as I understood any object always has _id field (with underscore, lowercased).
What I would like to do is: during the query I would like the mongo db to return me my JSON with field ID instead of _id.
In SQL I would use something like
SELECT _id as ID ...
My question is whether it's possible to do this in MongoDB, and if it is, a Java-based example would be really appreciated :)
I understand that it's possible to iterate over the records and substitute _id with ID manually, but I don't want this O(n) loop.
I also don't really want to duplicate the lines and store both "id" and "_id"
So I'm looking for solution at the level of query or maybe Java Driver.
Thanks in advance and have a nice day

MongoDB doesn't use SQL; it's more like an object query language over collections.
What you can try is something similar to the code below, using the Mongo Java driver:
Pojo obj = new PojoInstance();
obj.setId(id);
db.yourdb.find(obj);

I ended up using the following approach in the Java driver:
DBCursor cursor = runSomeQuery();
try {
while(cursor.hasNext()) {
DBObject dbObject = cursor.next();
ObjectId id = (ObjectId) dbObject.get("_id");
dbObject.removeField("_id");
dbObject.put("ID", id.toString());
System.out.println(dbObject);
}
} finally {
cursor.close();
}
I was wondering whether this is the best solution or whether I have better options.
Mark
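The loop above changes each DBObject in place. A variant that leaves the original untouched can be sketched against plain java.util.Map, which is a reasonable stand-in because org.bson.Document implements Map<String, Object>. The class `IdAlias` and the method `withAliasedId` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the rename step on a map-like document (hypothetical helper). */
final class IdAlias {
    /** Returns a copy of the document in which "_id" is exposed as "ID", rendered as a string. */
    static Map<String, Object> withAliasedId(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if ("_id".equals(e.getKey())) {
                out.put("ID", String.valueOf(e.getValue()));
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "59a4db1a6812d7430c3ef2a5");
        doc.put("name", "sam");
        System.out.println(withAliasedId(doc));
    }
}
```

This is still one pass over each returned document, so it does not remove the O(n) work; it only keeps the renaming in one place.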

Here's an example of what I am doing in Javascript. It may be helpful to you. In my case I am removing the _id field and aliasing the two very nested fields to display simpler names.
db.players.aggregate([
{ $match: { accountId: '12345'}},
{ $project: {
"_id": 0,
"id": "$id",
"masterVersion": "$branches.master.configuration.player.template.version",
"previewVersion": "$branches.preview.configuration.player.template.version"
}
}
])
I hope you find this helpful.

Concurrency - Getting the MongoDB generated ID of an object inserted via Java in a thread safe way

What is the best way to get the Mongo-generated ID of a document inserted via Java?
The Java process inserting the documents is multi-threaded, meaning that we need some atomic way to insert and return the ID of the object.
Also, if we set up a unique index, will an ID be returned in the event that the object is a duplicate?
Thanks!

Generate the ObjectId early, use it in the insert, and there will be no need to have the database return it to you.
ObjectId doesn't use a shared sequence number to be unique, so it doesn't matter if you generate one before inserting or retrieve it after.
public ObjectId createThing() {
    // generate the id on the client before inserting
    ObjectId result = new ObjectId();
    BasicDBObject thingToInsert = new BasicDBObject();
    thingToInsert.put("_id", result);
    //set other fields here
    collection.insert(thingToInsert);
    return result;
}
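The same "generate the id first, then insert" pattern can be tried out without a database. In the sketch below, java.util.UUID stands in for ObjectId and a ConcurrentHashMap stands in for the collection; both substitutions are assumptions made so the example runs on its own, and `ClientSideIds` is a hypothetical class name. Each thread generates its own id locally, so no shared counter or lock is needed to learn the id of what was just inserted.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in sketch: UUID replaces ObjectId, a concurrent map replaces the collection. */
final class ClientSideIds {
    private final Map<String, String> collection = new ConcurrentHashMap<>();

    /** Generates the id up front, stores the value under it, and returns the id. */
    String createThing(String value) {
        String id = UUID.randomUUID().toString();
        // putIfAbsent is atomic; a non-null result would mean the id already existed.
        if (collection.putIfAbsent(id, value) != null) {
            throw new IllegalStateException("duplicate id: " + id);
        }
        return id;
    }

    int size() {
        return collection.size();
    }

    /** Inserts from several threads at once and returns the number of stored items. */
    static int insertConcurrently(int threads, int perThread) {
        ClientSideIds store = new ClientSideIds();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.createThing("value");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.size();
    }
}
```

In this model a duplicate surfaces as an exception rather than as a returned id, which matches the question's concern that a rejected insert should not be mistaken for a successful one.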

Native ObjectIds generated by Mongo are globally unique and can safely be used from a multi-threaded application.
The generated ObjectId can be obtained from the DBObject under the _id key.
If the inserted document violates a unique index constraint, the Java driver may throw an exception, depending on the value of WriteConcern:
http://api.mongodb.org/java/current/com/mongodb/WriteConcern.html
If its value is higher than NORMAL, an exception will be thrown.
WriteConcern can be specified for each individual insert (or update) call, or globally by using DBCollection.setWriteConcern.

I retrieve the document by _id, but when I load the data into my Java class (e.g. Mobile), the _id attribute, which is of type ObjectId, gets changed, and that changes the value of the document in MongoDB.

MongoDB Composite Key

I'm just getting started with MongoDB and I've noticed that I get a lot of duplicate records for entries that I meant to be unique. I would like to know how to use a composite key for my data, and I'm looking for information on how to create one. Lastly, I am using Java to access MongoDB and Morphia as my ORM layer, so including those in your answers would be awesome.
Morphia: http://code.google.com/p/morphia/

You can use objects for the _id field as well. The _id field is always unique. That way you kind of get a composite primary key:
{ _id : { a : 1, b: 1} }
Just be careful when creating these ids: the order of the keys (a and b in the example) matters. If you swap them around, it is considered a different object.
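One way to avoid accidental key-order differences is to build every composite key through a single helper that always adds the fields in the same order. The sketch below uses plain Java maps and assumes, as stated above, that field order is significant when such ids are compared; `CompositeKeys`, `key` and `sameOrderedKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of building composite keys with a fixed field order (hypothetical helper). */
final class CompositeKeys {
    /** Always inserts "a" first and "b" second, so every key has the same layout. */
    static Map<String, Object> key(Object a, Object b) {
        Map<String, Object> k = new LinkedHashMap<>();
        k.put("a", a);
        k.put("b", b);
        return k;
    }

    /** Order-sensitive comparison: same fields, same values, same sequence. */
    static boolean sameOrderedKey(Map<String, Object> x, Map<String, Object> y) {
        return new ArrayList<>(x.entrySet()).equals(new ArrayList<>(y.entrySet()));
    }

    public static void main(String[] args) {
        Map<String, Object> swapped = new LinkedHashMap<>();
        swapped.put("b", 1);
        swapped.put("a", 1);
        // Map.equals ignores order, so it hides the difference that matters here.
        System.out.println(key(1, 1).equals(swapped));
        System.out.println(sameOrderedKey(key(1, 1), swapped));
        System.out.println(sameOrderedKey(key(1, 1), key(1, 1)));
    }
}
```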
The other possibility is to leave _id alone and create a unique compound index.
db.things.ensureIndex({firstname: 1, lastname: 1}, {unique: true});
//Deprecated since version 3.0.0, is now an alias for db.things.createIndex()
https://docs.mongodb.org/v3.0/reference/method/db.collection.ensureIndex/

You can create unique indexes on the fields of the document that you want to test uniqueness on. They can be composite as well (called compound key indexes in MongoDB land), as you can see from the documentation. Morphia has an @Indexed annotation to support indexing at the field level. In addition, with Morphia you can define compound keys at the class level with the @Indexes annotation.

I just noticed that the question is marked as "java", so you'd want to do something like:
final BasicDBObject id = new BasicDBObject("a", aVal)
.append("b", bVal)
.append("c", cVal);
results = coll.find(new BasicDBObject("_id", id));
I use Morphia too, but have found that, while it works, it generates lots of errors as it tries to marshal the composite key. I use the above when querying to avoid these errors.
My original code (which also works):
final ProbId key = new ProbId(srcText, srcLang, destLang);
final QueryImpl<Probabilities> query = ds.createQuery(Probabilities.class)
.field("id").equal(key);
Probabilities probs = (Probabilities) query.get();
My ProbId class is annotated as @Entity(noClassnameStored = true) and inside the Probabilities class, the id field is @Id ProbId id;

I will try to explain with an example:
Create a table Music
Add Artist as a primary key
Now since artist may have many songs we have to figure out a sort key.
The combination of both will be a composite key.
Meaning, the Artist + SongTitle will be unique.
something like this:
{
  "Artist" : {"s" : "David Bowie"},
  "SongTitle" : {"s" : "changes"},
  "AlbumTitle" : {"s" : "Hunky"},
  "Genre" : {"s" : "Rock"}
}
Artist key above is: Partition Key
SongTitle key above is: sort key
The combination of both is always unique or should be unique. Rest are attributes which may vary per record.
Once you have this data structure in place you can easily append and scan as per your custom queries.
Sample Mongo shell calls for reference (each takes a document or query, not a JSON file path; drop() takes no document at all):
db.products.insert(<document>)
db.products.drop()
db.users.find(<query>)
