Regex query MongoDB Performance issue - java

I have a MongoDB collection whose documents contain a single field. I receive about 31,000 documents each day, and the collection holds almost 6 months of data.
Here is how my data looks in the database:
{
"_id" : ObjectId("59202aa3f32dfba00d0773c3"),
"Data" : "20-05-2017 18:38:13 SYSTEM_000_00_SAVING ",
"__v" : 0
}
{
"_id" : ObjectId("59202aa3f32dfba00d0773c4"),
"Data" : "20-05-2017 18:38:13 SyTime_000_09_00:00 ",
"__v" : 0
}
Here is my query code:
DBObject query = new BasicDBObject();
Pattern regex = Pattern.compile("20-05-2017");
query.put("Data", regex);
I have created an index, but the query is still slow:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "NOB_SRB.fdevices"
},
{
"v" : 1,
"unique" : true,
"key" : {
"Data" : 1.0
},
"name" : "Data_1",
"ns" : "NOB_SRB.fdevices"
}
]

Add a start of input anchor ^ to the start of the regex:
Pattern regex = Pattern.compile("^20-05-2017");
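Because your regex has no anchor, the date may appear anywhere in the field, so every value has to be searched character by character and the index on Data cannot narrow the search. With a case-sensitive prefix expression such as ^20-05-2017, MongoDB can restrict the scan to the range of index entries that start with that prefix.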
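The difference can be illustrated without a database. The following is only a minimal sketch: a TreeSet stands in for the sorted index on Data, and the class name, method names, and sample values are hypothetical, not part of the driver or of MongoDB itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class PrefixScanSketch {

    // Hypothetical stand-in for the sorted entries of the index on "Data".
    static NavigableSet<String> sampleIndex() {
        return new TreeSet<>(Arrays.asList(
                "19-05-2017 23:59:59 SYSTEM_000_00_SAVING ",
                "20-05-2017 18:38:13 SYSTEM_000_00_SAVING ",
                "20-05-2017 18:38:13 SyTime_000_09_00:00 ",
                "21-05-2017 00:00:01 SYSTEM_000_00_SAVING "));
    }

    // Unanchored search: every entry has to be tested against the pattern.
    static List<String> fullScan(NavigableSet<String> index, String literal) {
        Pattern pattern = Pattern.compile(Pattern.quote(literal));
        List<String> hits = new ArrayList<>();
        for (String value : index) {
            if (pattern.matcher(value).find()) {
                hits.add(value);
            }
        }
        return hits;
    }

    // Anchored prefix search: only the sorted range that starts with the prefix is visited.
    static List<String> prefixScan(NavigableSet<String> index, String prefix) {
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<>(index.subSet(prefix, true, upperBound, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> index = sampleIndex();
        System.out.println(fullScan(index, "20-05-2017"));
        System.out.println(prefixScan(index, "20-05-2017"));
    }
}
```

Both methods return the same two entries here, but fullScan touches all four values while prefixScan only walks the matching range, which is the saving the anchor makes possible.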

Related

How to get documents, where any of the field value matches to any of the listed expressions in MongoDB Java

I am trying to fetch all documents in a collection where any document field matches any of a list of regular expressions.
Consider the following scenario.
Users can create documents with whatever field names they wish in a collection,
such as:
document1 => { "_id":1, "card" : 1234 , "status": 4}
document2 => {"_id": ***, "Housenumber" : 356/78 , "value" : null}
------
documentn =>{ "_id" : ObjectId("4ecd2e33dd68c9021e453d12"), "searchword" : "win" }
------
Field names are not the same for all the documents in a collection.
The regular expressions can be: "/^(^456$|^win$............etc)/"
I tried to get the keys dynamically and run a find query as shown below:
----------
table = db.getCollection(coll);
DBObject dataKeys = table.findOne();
Set<String> keys = dataKeys.keySet();
Iterator<String> iterator = keys.iterator();
while(iterator.hasNext()){
String key = iterator.next();
regexQuery.put(key, new BasicDBObject("$regex", "^((^(([0-9]{4}[-. _]?)$)|"
+ "(^[a-zA-Z0-9._%+-]...........................0-9]$$").append("$options", "i"));
DBCursor cursor = table.find(regexQuery);
while (cursor.hasNext()) {
System.out.println(cursor.next());
I can see that the key values are read correctly, but the query does not fetch the matching documents.
I am new to MongoDB and followed the above approach after googling it.
If you are looking to regex match on the field names (not the values), then use $objectToArray to turn the field names (LHS) into expression-worthy values (RHS):
var r = [
{ _id: 1, name: "buzz", addr: "here"}
,{ _id: 2, searchword: "win", value: 6}
,{ _id: 3, game:0, word: "foo", fruit: "apple", fame: 7}
,{ _id: 4, qval:23}
];
db.foo.insert(r);
var rin = [ /ame/, /^val/ ]; // list of regex
db.foo.aggregate([
{$project: {x: {$objectToArray: "$$CURRENT"}}}
,{$unwind: "$x"}
,{$match: {"x.k": {$in: rin}}}
]);
{ "_id" : 1, "x" : { "k" : "name", "v" : "buzz" } }
{ "_id" : 2, "x" : { "k" : "value", "v" : 6 } }
{ "_id" : 3, "x" : { "k" : "game", "v" : 0 } }
{ "_id" : 3, "x" : { "k" : "fame", "v" : 7 } }
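For readers who want to see the same matching logic in Java, here is a minimal in-memory sketch. It assumes documents are available as Map<String, Object> values (org.bson.Document implements that interface); the class name, the doc helper, and the "id:key" output format are hypothetical choices for illustration, not the driver's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class FieldNameMatchSketch {

    // Builds an insertion-ordered document from alternating key/value arguments.
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Same sample data as the shell example above.
    static List<Map<String, Object>> sampleDocs() {
        return Arrays.asList(
                doc("_id", 1, "name", "buzz", "addr", "here"),
                doc("_id", 2, "searchword", "win", "value", 6),
                doc("_id", 3, "game", 0, "word", "foo", "fruit", "apple", "fame", 7),
                doc("_id", 4, "qval", 23));
    }

    // Returns "<_id>:<fieldName>" for every field name matched by at least one pattern.
    static List<String> matchFieldNames(List<Map<String, Object>> docs, List<Pattern> patterns) {
        List<String> hits = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            for (String key : d.keySet()) {
                for (Pattern p : patterns) {
                    if (p.matcher(key).find()) {
                        hits.add(d.get("_id") + ":" + key);
                        break; // one hit per field is enough
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = Arrays.asList(Pattern.compile("ame"), Pattern.compile("^val"));
        System.out.println(matchFieldNames(sampleDocs(), patterns));
    }
}
```

This version loads the documents into the application first, so it only suits small result sets; the aggregation above does the same work on the server.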

Retrieve all the documents matching the criteria in an array elements which is a subdocument

{
"_id" : ObjectId("577b54816081dd32cd3e2d60"),
"user" : ObjectId("577b54816081dd32cd3e2d5e"),
"journals" : [
{
"title" : "Journal Title2",
"desc" : "desx2",
"feeling" : 3,
"date" : ISODate("2016-07-05T06:32:45.404Z"),
"deleteFl" : true,
"_id" : ObjectId("577b548d6081dd32cd3e2d64")
},
{
"title" : "Journal Title3",
"desc" : "desx3",
"feeling" : 3,
"date" : ISODate("2016-07-05T06:49:00.156Z"),
"deleteFl" : false,
"_id" : ObjectId("577b585c6081dd32cd3e2d6d")
},
{
"title" : "Journal Title4",
"desc" : "desx4",
"feeling" : 3,
"date" : ISODate("2016-07-05T06:49:06.700Z"),
"deleteFl" : false,
"_id" : ObjectId("577b58626081dd32cd3e2d70")
}
]
}
Above is my document structure.
Now I need all the journal subdocuments whose deleteFl is false.
I tried it this way using the Java Mongo driver:
getDatabase().getCollection("journals").find(and(eq("user", user), eq("journals.deleteFl", false)));
But it still gives me back all the subdocuments, including those with "deleteFl": true. Any help here?
Actually, your query returns one document, because all the data is inside one document. What you want is to limit which subdocuments of that document are returned.
Note: You can do that using $elemMatch in the projection to limit the fields returned by the query, but $elemMatch returns only the first matching subdocument. (I posted, and then deleted, a wrong answer using $elemMatch.)
When you want all matching subdocuments from inside an array, and only those, you need to use the aggregation pipeline.
Here is tested code that does what you want (just change the database and collection names):
MongoClient mongoClient = new MongoClient();
MongoDatabase db = mongoClient.getDatabase("test");
MongoCollection<Document> collection = db.getCollection("test");
// $unwind emits one document per element of "journals";
// $match then keeps only the ones whose deleteFl is false.
Iterable<Document> output = collection.aggregate(asList(
    new BasicDBObject("$unwind", "$journals"),
    new BasicDBObject("$match", new BasicDBObject("journals.deleteFl", false))
));
for (Document document : output)
{
    System.out.println(document);
}
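To make the effect of the two stages concrete, the following is a small in-memory sketch of what $unwind followed by $match computes. It is an illustration only, not how the server implements the pipeline; the class, the helper methods, and the trimmed-down sample document are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class UnwindMatchSketch {

    // Hypothetical helper: a journal entry reduced to the two fields that matter here.
    static Map<String, Object> journal(String title, boolean deleteFl) {
        Map<String, Object> j = new LinkedHashMap<>();
        j.put("title", title);
        j.put("deleteFl", deleteFl);
        return j;
    }

    // Trimmed-down version of the document from the question.
    static Map<String, Object> sampleDoc() {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("journals", Arrays.asList(
                journal("Journal Title2", true),
                journal("Journal Title3", false),
                journal("Journal Title4", false)));
        return d;
    }

    // Emits one copy of the document per matching array element,
    // with the array field replaced by that single element.
    static List<Map<String, Object>> unwindAndMatch(
            Map<String, Object> doc, String arrayField, String key, Object expected) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object raw = doc.get(arrayField);
        if (!(raw instanceof List)) {
            return out;
        }
        for (Object element : (List<?>) raw) {
            boolean matches = element instanceof Map
                    && Objects.equals(((Map<?, ?>) element).get(key), expected);
            if (matches) {
                Map<String, Object> copy = new LinkedHashMap<>(doc);
                copy.put(arrayField, element);
                out.add(copy);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map<String, Object> result : unwindAndMatch(sampleDoc(), "journals", "deleteFl", false)) {
            System.out.println(result);
        }
    }
}
```

Each result carries exactly one journal entry, which is why the pipeline returns several documents for a single stored document.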

MongoDB update element of nested array

I have a Mongo collection named firma. One of its documents has the structure below:
{
"_id" : ObjectId("5729af099b3ebf1d0ca7ff05"),
"musteriler" : [
{
"_id" : "de0bf813-b707-4a8d-afc2-9752e05c3aa5",
"yetkiliListesi" : [
{
"_id" : "a5e487fa-2034-4817-94f2-3bd837b76284",
"ad" : "Burak",
"soyad" : "Duman 1",
"cepTel" : "3333333333333",
"mail" : "asdf#asdf.com"
},
{
"_class" : "com.bisoft.entity.MusteriYetkili",
"_id" : "bc4b537d-522a-4c9a-9f67-8ca243e18f46",
"ad" : "Ridvan",
"soyad" : "ENİŞ",
"cepTel" : "222222222222",
"mail" : "asdf#asdf.com"
}
]
}
],
"defaultTimezone" : "Europe/Istanbul"
}
In the above JSON, I need to update the element of the second-level array (yetkiliListesi) whose _id is "a5e487fa-2034-4817-94f2-3bd837b76284".
I access the database from a Java application (using the Mongo Java driver and Spring Boot MongoTemplate) and execute this query:
mongoTemplate.updateFirst(Query.query(Criteria.where("_id").is("5729af099b3ebf1d0ca7ff05").and("musteriler.yetkiliListesi._id").is("a5e487fa-2034-4817-94f2-3bd837b76284")),
new Update().set("musteriler.yetkiliListesi.$", yetkiliDBO), Firma.class);
In the above query, yetkiliDBO is a BasicDBObject with this content:
yetkiliDBO = {
'_class': 'com.bisoft.entity.MusteriYetkili',
'_id': "a5e487fa-2034-4817-94f2-3bd837b76284",
'ad': 'wer',
'soyad': 'xyz',
'cepTel': "222222222222",
mail: "asdf#asdf.com"
}
When I execute my query, I get this error:
com.mongodb.WriteConcernException: { "serverUsed" : "192.168.2.250:27017" , "ok" : 1 , "n" : 0 , "updatedExisting" : false , "err" : "cannot use the part (musteriler of musteriler.yetkiliListesi.0) to traverse the element
What do I need to do?
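You cannot use the '$' placeholder when traversing nested arrays. The MongoDB manual says:
The positional $ operator cannot be used for queries which traverse more than one array, such as queries that traverse arrays nested within other arrays, because the replacement for the $ placeholder is a single value
(Source: MongoDB manual, positional $ operator.)
I would suggest restructuring your data into separate, less-nested collections. On MongoDB 3.6 or later, the filtered positional operator $[<identifier>] combined with arrayFilters can also target elements inside nested arrays.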

Count number of fields in a document mongo Java

I have a collection containing documents like the following:
{
"_id" : ObjectId("56e7a51b4a66e30330151847"),
"host" : "myTestHost.com",
"sessionId" : "daxxxxxx-xxxx-xxxx-xxxx-xxxxxxx1",
"ssoId" : "xxxxxx#gmail.com",
"days" : {
"13" : NumberLong(130),
"11" : NumberLong(457),
"10" : NumberLong(77)
},
"count" : NumberLong(664),
"timeStamp" : NumberLong("1458021713370")
}
I am using the Mongo Java driver 3.2.1.
This document contains an embedded document 'days' that holds a specific count for each day of the month.
I need to find the number of days for which a count is present. For example, for the document above, the number of days for which a count is present is 3 (the 13th, 11th and 10th day of the month).
I know how to get the count in the mongo console:
mongos> var count = 0;
mongos> db.monthData.find({},{days:1}).forEach(function(record){for(f in record.days) { count++;}});
mongos> count;
I need to convert this to Java code.
Maybe you can reshape your schema as follows:
{
"_id" : ObjectId("56e7a51b4a66e30330151847"),
"host" : "myTestHost.com",
"sessionId" : "daxxxxxx-xxxx-xxxx-xxxx-xxxxxxx1",
"ssoId" : "xxxxxx#gmail.com",
"days" : [
{
"13" : NumberLong(130)
},
{
"11" : NumberLong(457)
},
{
"10" : NumberLong(77)
}
],
"count" : NumberLong(664),
"timeStamp" : NumberLong(1458021713370)
}
days becomes an array of objects, so you can easily use the aggregation pipeline to find out how many elements are in the days array:
>db.collection.aggregate(
[
{$match:{days:{"$exists":1}}},
{$project:{
numberOfDays: {$size:"$days"},
_id:1}
}
]
)
The aggregation returns:
{ "_id" : ObjectId("56e7a51b4a66e30330151847"), "numberOfDays" : 3 }
To use the aggregation pipeline with the Java driver, see aggregate, AggregateIterable, and Block, and read Data Aggregation with Java Driver.
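If changing the schema is not an option, the shell loop from the question can also be mirrored on the client. The sketch below is a minimal illustration that works on plain Map values; since org.bson.Document implements Map<String, Object>, documents returned by the driver could be passed in the same way. The class name, method name, and sample data are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DayFieldCountSketch {

    // Sums the number of keys in the embedded "days" document over all documents,
    // like the forEach loop in the shell. Documents without "days" contribute 0.
    static long countDayFields(Iterable<? extends Map<String, Object>> docs) {
        long count = 0;
        for (Map<String, Object> doc : docs) {
            Object days = doc.get("days");
            if (days instanceof Map) {
                count += ((Map<?, ?>) days).size();
            }
        }
        return count;
    }

    // Hypothetical sample: one document shaped like the one in the question, one without "days".
    static List<Map<String, Object>> sampleDocs() {
        Map<String, Object> days = new LinkedHashMap<>();
        days.put("13", 130L);
        days.put("11", 457L);
        days.put("10", 77L);

        Map<String, Object> withDays = new LinkedHashMap<>();
        withDays.put("host", "myTestHost.com");
        withDays.put("days", days);

        Map<String, Object> withoutDays = new LinkedHashMap<>();
        withoutDays.put("host", "otherHost.com");

        return Arrays.asList(withDays, withoutDays);
    }

    public static void main(String[] args) {
        System.out.println(countDayFields(sampleDocs()));
    }
}
```

Like the shell version, this pulls every document to the client, so for large collections the server-side aggregation above is likely the better fit.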

MongoDB Java Driver: Multiple Date query

I am trying to create a query using the MongoDB Java Driver as part of an aggregation command. Currently I allow either a date range or an array of specific dates as an argument, e.g.
<date>
<start>2013-12-10 00:00:00.000</start>
<end>2013-12-12 23:59:59.999</end>
</date>
or
<date>
<specificDates>2013-12-10 00:00:00.000,2013-12-13 00:00:00.000</specificDates>
</date>
The date range query works fine. I parse the XML and convert it into a DBObject that produces the following query in Mongo:
{ "$match" : { "d" : { "$gte" : { "$date" : "2013-10-01T00:00:00.000Z"} , "$lt" : { "$date" : "2013-10-04T00:00:00.000Z"}}}}
For the specificDates, I want to return only results that occur between 00:00:00.000 on the given day and 00:00:00.000 of the next day. From my pretty basic knowledge of Mongo queries, I had hoped to do a $match similar to the date range one, but have it use $in on an array of date ranges, like the following:
{ "$match" : { "d" : { "$in" : [ { "$gte" : { "$date" : "2013-10-01T00:00:00.000Z"} , "$lt" : { "$date" : "2013-10-02T00:00:00.000Z"}} , { "$gte" : { "$date" : "2013-10-03T00:00:00.000Z"} , "$lt" : { "$date" : "2013-10-04T00:00:00.000Z"}}]}}}
The above query fails to return any results. I have noticed that $in is not listed in the MongoDB manual under the Aggregation Framework section, but it's not throwing the kind of error I would have expected for an unsupported operation.
I think the issue may come from this line in the MongoDB manual:
If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. <value1>, <value2>, etc.)
In my collection the date isn't stored in an array. I suppose I could store it in a single-element array? (Actually, I decided to try this quickly before posting: no documents are returned when the date entry in the document is stored in a single-element array.)
Document entry example
{ "_id" : ObjectId("52aea5b0065991de1a56d5b0"), "d" : ISODate("2013-12-15T00:00:11.088Z"), "t" : 1501824, "s" : 0, "e" : 601, "tld" : "uk", "y" : "domain:check", "n" : "removed.co.uk" }
Is anyone able to give me some advice as to how I should do this query? Thank you.
EDIT: I left the Java tag here in case anyone needs my DBObject creation code, though it shouldn't be necessary as the queries posted have been generated by my build.
EDIT2: As Alan Spencer pointed out, I should be using $or rather than $in. A working $or query is below (ignore the different formatting, such as the use of ISODate(); it's just copy-pasted from the mongo shell rather than taken from my program's output).
{ $match : { $or : [ { d : { $gte : ISODate("2013-10-01T00:00:00.000Z"), $lt : ISODate("2013-10-02T00:00:00.000Z") } }, { d : { $gte : ISODate("2013-10-03T00:00:00.000Z"), $lt : ISODate("2013-10-04T00:00:00.000Z") } } ] } }
I think you're inverting the meaning of $in.
$in is used to match exactly against a list of possible values, like
{"color":{"$in": ["red","green","blue"]}}
For your use case, you are trying to match documents that satisfy the first range or the second, and so on. So you can use $or - http://docs.mongodb.org/manual/reference/operator/query/or/
Note that $or is a top-level operator that takes complete conditions, so the field name goes inside each clause:
{ "$match" : { "$or" : [ { "d" : { "$gte" : { "$date" : "2013-10-01T00:00:00.000Z"} , "$lt" : { "$date" : "2013-10-02T00:00:00.000Z"}}} , { "d" : { "$gte" : { "$date" : "2013-10-03T00:00:00.000Z"} , "$lt" : { "$date" : "2013-10-04T00:00:00.000Z"}}}]}}
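Each $or clause needs a start and an exclusive end for its day. A minimal sketch of deriving those bounds from the comma-separated specificDates value is shown here; it assumes the timestamps are meant as UTC and uses the format from the question's XML. The class and method names are hypothetical, and turning each pair into a $gte/$lt clause is left to the existing DBObject-building code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DayRangeSketch {

    // Format of the values inside <specificDates>, as shown in the question.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Returns one {start, endExclusive} pair per listed date:
    // midnight of that day and midnight of the next day (UTC is an assumption).
    static List<Instant[]> dayRanges(String specificDates) {
        List<Instant[]> ranges = new ArrayList<>();
        for (String raw : specificDates.split(",")) {
            LocalDate day = LocalDateTime.parse(raw.trim(), FORMAT).toLocalDate();
            Instant start = day.atStartOfDay(ZoneOffset.UTC).toInstant();
            Instant endExclusive = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
            ranges.add(new Instant[] {start, endExclusive});
        }
        return ranges;
    }

    public static void main(String[] args) {
        String input = "2013-12-10 00:00:00.000,2013-12-13 00:00:00.000";
        for (Instant[] range : dayRanges(input)) {
            // Each pair maps to one clause: { d: { $gte: start, $lt: endExclusive } }
            System.out.println("d >= " + range[0] + " and d < " + range[1]);
        }
    }
}
```

Using a half-open range ($gte start, $lt next midnight) avoids having to spell out 23:59:59.999 as the end of the day.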
