MongoDB (java) takes a long time for a find - java

I have a MongoDB collection with the following stats:
{
"ns" : "pourmoi.featurecount",
"count" : 12152142,
"size" : 1361391264,
"avgObjSize" : 112,
"numExtents" : 19,
"storageSize" : 1580150784,
"lastExtentSize" : 415174656,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 1165210816,
"indexSizes" : {
"_id_" : 690111632,
"feature_1" : 475099184
},
"ok" : 1
}
and a Java program that runs a find and returns around 50 results.
This is the query:
db.featurecount.find(
{ "$or" : [ { "feature" : "hello"}, { "feature" : "how"},
{ "feature" : "are"} , { "feature" : "you"} ]})
.sort({count: -1}).limit(20)
This query takes at least 30 seconds. Is it possible to make it run faster?
PS: The MongoDB server is running on localhost.

Related

Convert Iterable<Entity> to list - mongoDB Java

What is the most efficient way to convert an Iterable to a List?
Below is my code for the slower collection:
Iterable<TestEntity> entityResult = testCollection.find(and(
or(
geoWithin("aLocation", BasicDBObject.parse(area)),
geoWithin("bLocation", BasicDBObject.parse(area))
),
lte(START_DATE, anotherDate),
gte(END_DATE, Date2),
and(exists(STATUS), ne(STATUS, "TEST"))
)).maxTime(mongoDisasterMaxExecutionMaxTime, TimeUnit.MINUTES);
List<TestEntity> testSummary = StreamSupport.stream(entityResult.spliterator(), true).collect(Collectors.toList());
But this takes almost 15 seconds for 80k records. I have another collection, which takes about a second for 30k records.
I also have compound indexes on both collections, so I don't think the issue is MongoDB returning data slowly. When I print logs around the MongoDB find, it's pretty fast, but the StreamSupport call takes 15 seconds.
Collection 1 (slow)
{
"_id" : "1234",
"orderNumber" : "1234",
"lastUpdateDate" : ISODate("2021-09-11T00:00:00.000Z"),
"lastModifiedTime" : NumberLong(1631400026077),
"product" : "xyzzy",
"Feature" : "1",
"brand" : "Brand abc",
"amountUSD" : 100.2,
"Desc" : "test",
"Desc2" : "test 2",
"anotherDate" : ISODate("2021-09-19T00:00:00.000Z"),
"Date2" : ISODate("2021-09-23T00:00:00.000Z"),
"IdNew" : "3456",
"Count" : 1,
"customerName" : "Dave",
"occuranceTime" : ISODate("2021-09-03T00:00:00.000Z"),
"status" : "confirmed",
"aLocation" : [
-100.672735,
32.849354
],
"bLocation" : [
-69.816366,
42.808233
]
}
Collection 2 (not slow)
{
"_id" : "1234",
"orderNumber" : "1234",
"lastUpdateDate" : ISODate("2021-09-11T00:00:00.000Z"),
"lastModifiedTime" : NumberLong(1631400026077),
"product" : "xyzzy",
"Feature" : "1",
"brand" : "Brand abc",
"amountUSD" : 100.2,
"Desc" : "test",
"Desc2" : "test 2",
"anotherDate" : ISODate("2021-09-19T00:00:00.000Z"),
"Date2" : ISODate("2021-09-23T00:00:00.000Z"),
"IdNew" : "3456",
"Count" : 1,
"customerName" : "Dave",
"occuranceTime" : ISODate("2021-09-03T00:00:00.000Z"),
"status" : "confirmed",
"LocationId" : "8888",
"aLocation" : [
3.063211,
50.633679
]
}
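One thing worth checking, offered as a sketch rather than a diagnosis: the Java driver's find() returns a lazy iterable, so the query is sent and the documents are fetched and decoded only once the result is iterated. A fast log line around find() therefore says little about the query time, and the 15 seconds measured on the StreamSupport line probably include the query itself. The self-contained example below uses a hypothetical LazyResults class (a stand-in, not part of the driver) to show that behaviour, together with a plain sequential copy into a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a driver result: nothing is produced until it is iterated.
// LazyResults is a hypothetical class, not part of the MongoDB driver.
class LazyResults implements Iterable<Integer> {
    private final int size;
    final AtomicInteger fetched = new AtomicInteger(); // counts items actually produced

    LazyResults(int size) {
        this.size = size;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < size;
            }

            @Override
            public Integer next() {
                if (index >= size) throw new NoSuchElementException();
                fetched.incrementAndGet(); // the "expensive" fetch happens here
                return index++;
            }
        };
    }

    // Plain sequential copy; keeps encounter order and avoids parallel-stream overhead.
    static <T> List<T> toList(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        source.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        LazyResults results = new LazyResults(5);
        System.out.println("fetched before iteration: " + results.fetched.get());
        List<Integer> list = toList(results);
        System.out.println("fetched after iteration: " + results.fetched.get());
        System.out.println(list);
    }
}
```

With the real driver, the iterable's own into(new ArrayList<>()) method should perform the same sequential copy. To see where the time goes, time the iteration rather than the find() call, and compare the two collections with explain().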

How to get limited results from Multiple Queries (x records from query 1 and y records from query 2) in Elastic Search

I have two distinct queries, and I want to get a combined result from them, i.e. x number of records from query 1 and y number of records from query 2.
How do I do that?
Currently I am using the query below:
{ "bool" : {
"should" : [
{
"multi_match" : {
"query" : "indea",
"fields" : [
"admin^1.0",
"city^1.0",
"country^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"fuzziness" : "2",
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
},
{
"multi_match" : {
"query" : "Maharashtra",
"fields" : [
"admin^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}

MongoDB 3.6 ClientCursor :: staticYield can't unlock b/c of recursive lock ns:

Our production MongoDB server is running out of memory and is logging the following warning excessively:
2018-03-19T20:03:05.627-0500 [conn2] warning: ClientCursor::staticYield can't unlock b/c of recursive lock ns:
top:{
opid:16,
active:true,
secs_running:0,
microsecs_running:254,
op:"query",
ns:"users.person",
query:{
findandmodify:"person",
query:{
clientSN:"405F014DE02B33F1",
status:"New"
},
update:{
$set:{
status:"InProcess"
},
$currentDate:{
lastUpdateTime:true
}
},
new:true
},
client:"10.102.26.26:61299",
desc:"conn2",
connectionId:2,
locks:{
^:"w",
^users:"W"
},
waitingForLock:false,
numYields:0,
lockStats:{
timeLockedMicros:{
},
timeAcquiringMicros:{
r:0,
w:1070513
}
}
}
I have read the question "MongoDB: How to disable logging the warning: ClientCursor::staticYield can't unlock b/c of recursive lock?", which suggests that the cause of the issue is missing indexes.
I tried running the query with explain() as described there. Below is the output of db.getCollection('personSync').find({"clientSN":"405F014DE02B33F1","status":"New"}).explain(), a query on the same fields as the findandmodify operation above:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 71331,
"nscanned" : 71331,
"nscannedObjectsAllPlans" : 71331,
"nscannedAllPlans" : 71331,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 557,
"nChunkSkips" : 0,
"millis" : 59,
"server" : "SQL01:27017",
"filterSet" : false,
"stats" : {
"type" : "COLLSCAN",
"works" : 71333,
"yields" : 557,
"unyields" : 557,
"invalidates" : 0,
"advanced" : 0,
"needTime" : 71332,
"needFetch" : 0,
"isEOF" : 1,
"docsTested" : 71331,
"children" : []
}
}
I was following https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/ to create an index. Would adding an index as shown below fix the issue for the findandmodify operation in my case, or do I need to add further indexes?
db.users.createIndex({ clientSN:"405F014DE02B33F1", status:"New"})
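Adding the index should improve the performance of this operation. It is currently performing a collection scan (COLLSCAN, examining 71331 documents to return 0); using an index will be much more efficient.
The createIndex command is incorrect, though: the key document maps each field name to a sort direction (1 or -1), not to the value being queried. It should be the following: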
db.users.createIndex({ clientSN:1, status:1}, { background : true } )
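Note that indexes are built in the foreground by default in this MongoDB version, which blocks other operations on the database while the build runs, so setting the background flag is very important on a production server. Also make sure the index is created on the collection that is actually queried: the logged namespace users.person refers to the person collection in the users database, whereas db.users.createIndex targets a collection named users.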
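As a rough illustration of why the index helps, here is a toy model in plain Java. It is not MongoDB code: the class PersonIndexSketch, its methods, and the use of a sorted map as a stand-in for the B-tree index are all assumptions made for the sketch. It counts how many documents each strategy has to examine for the same clientSN and status lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a collection scan versus a compound-index lookup.
// PersonIndexSketch and its members are hypothetical; this is not MongoDB code.
class PersonIndexSketch {
    final List<String[]> docs = new ArrayList<>();            // each doc: {clientSN, status}
    final Map<String, List<Integer>> index = new TreeMap<>(); // sorted keys, loosely like an index
    int examined;                                             // documents inspected by the last query

    // Composite key in index field order: clientSN first, then status.
    static String key(String clientSN, String status) {
        return clientSN + "|" + status;
    }

    void insert(String clientSN, String status) {
        docs.add(new String[] {clientSN, status});
        index.computeIfAbsent(key(clientSN, status), k -> new ArrayList<>()).add(docs.size() - 1);
    }

    // COLLSCAN analogue: every document is inspected.
    List<Integer> findByScan(String clientSN, String status) {
        examined = 0;
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            examined++;
            if (docs.get(i)[0].equals(clientSN) && docs.get(i)[1].equals(status)) hits.add(i);
        }
        return hits;
    }

    // Index analogue: go straight to the entries stored under the composite key.
    List<Integer> findByIndex(String clientSN, String status) {
        List<Integer> hits = index.getOrDefault(key(clientSN, status), Collections.emptyList());
        examined = hits.size();
        return hits;
    }

    public static void main(String[] args) {
        PersonIndexSketch people = new PersonIndexSketch();
        for (int i = 0; i < 1000; i++) people.insert("SN" + i, i % 2 == 0 ? "New" : "InProcess");
        people.findByScan("SN42", "New");
        System.out.println("scan examined: " + people.examined);
        people.findByIndex("SN42", "New");
        System.out.println("index examined: " + people.examined);
    }
}
```

After creating the real index, rerunning explain() should show an index cursor instead of BasicCursor and a much lower nscanned.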

Mongo driver updateMany not updating all documents

Using Mongo Java Driver version 3.2.1 against MongoDB 3.0.12.
Calling MongoCollection.updateMany(Bson filter, Bson update) returns a result showing all expected documents were modified, however only a portion of the documents were actually updated.
I've tried with multiple write concerns: JOURNALED, ACKNOWLEDGED, etc
Any ideas?
Here is the profile result:
{ "op" : "update", "ns" : "dev.timeSheet", "query" : { "lineItems.task" : ObjectId("53233e85e4b07f573f1d4466") }, "updateobj" : { "$set" : { "lineItems.$.task" : ObjectId("53233e85e4b07f573f1d446d") } }, "nscanned" : 0, "nscannedObjects" : 6733, "nMatched" : 248, "nModified" : 248, "fastmod" : true, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 52, "locks" : { "Global" : { "acquireCount" : { "r" : NumberLong(53), "w" : NumberLong(53) } }, "MMAPV1Journal" : { "acquireCount" : { "w" : NumberLong(301) } }, "Database" : { "acquireCount" : { "w" : NumberLong(53) } }, "Collection" : { "acquireCount" : { "W" : NumberLong(53) } } }, "millis" : 50, "execStats" : { }, "ts" : ISODate("2016-08-25T18:17:16.025Z"), "client" : "127.0.0.1", "allUsers" : [ ], "user" : "" }
Update: Also occurs in MongoDB 3.2.9
Direct access calls:
db.timeSheet.find({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}).count()
126
db.timeSheet.updateMany({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}, {'$set': {'lineItems.$.task': ObjectId("53233e85e4b07f573f1d446d")}})
{ "acknowledged" : true, "matchedCount" : 126, "modifiedCount" : 126 }
db.timeSheet.find({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}).count()
90
This isn't working the way I expected because it is only updating the first lineItem it finds for each time sheet.
https://docs.mongodb.com/manual/reference/operator/update/positional/#up.S
Remember that the positional $ operator acts as a placeholder for the first match of the update query document.
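No feature existed at the time (MongoDB 3.2) to update all matching items in an embedded array. MongoDB 3.6 later added the all positional operator $[] and the filtered positional operator $[<identifier>] (used with arrayFilters) for this.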
https://jira.mongodb.org/browse/SERVER-1243
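The behaviour described above can be modelled in a few lines of plain Java. This is a toy simulation, not driver code, and the class and method names are made up for illustration. Each pass rewrites only the first matching element per document, so documents with several matching line items still match afterwards, which is why the count dropped from 126 to 90 rather than to 0. One workaround on server versions without array-wide update operators, sketched here as updateUntilDone, is to repeat the update until nothing matches any more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the positional $ operator: each pass rewrites only the FIRST
// matching array element per document. Not MongoDB code; names are hypothetical.
class PositionalUpdateSketch {

    // One updateMany-style pass; returns how many documents were modified.
    static int updateFirstMatch(List<List<String>> sheets, String oldTask, String newTask) {
        int modified = 0;
        for (List<String> lineItems : sheets) {
            int i = lineItems.indexOf(oldTask); // first match only, like lineItems.$
            if (i >= 0) {
                lineItems.set(i, newTask);
                modified++;
            }
        }
        return modified;
    }

    // Number of documents that still contain the given task.
    static long countMatching(List<List<String>> sheets, String task) {
        return sheets.stream().filter(s -> s.contains(task)).count();
    }

    // Workaround: repeat the pass until no document matches any more.
    // Returns the number of passes that modified at least one document.
    static int updateUntilDone(List<List<String>> sheets, String oldTask, String newTask) {
        if (oldTask.equals(newTask)) return 0; // nothing to do, and avoids looping forever
        int passes = 0;
        while (updateFirstMatch(sheets, oldTask, newTask) > 0) passes++;
        return passes;
    }

    public static void main(String[] args) {
        List<List<String>> sheets = new ArrayList<>();
        sheets.add(new ArrayList<>(Arrays.asList("A", "B")));
        sheets.add(new ArrayList<>(Arrays.asList("A", "A", "C")));
        sheets.add(new ArrayList<>(Arrays.asList("A", "A", "A")));
        System.out.println("modified: " + updateFirstMatch(sheets, "A", "X"));
        System.out.println("still matching: " + countMatching(sheets, "A"));
        System.out.println("extra passes: " + updateUntilDone(sheets, "A", "X"));
        System.out.println("still matching: " + countMatching(sheets, "A"));
    }
}
```

In the shell, the equivalent loop would rerun the updateMany call until the find(...).count() on the old ObjectId reaches 0.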

Poor MongoDB read performance

I have a sharded collection containing flight information. The schema looks something like:
{
"_id" : ObjectId("537ef1bb5516dd401b5b109a"),
"departureAirport" : "HAJ",
"arrivalAirport" : "AYT",
"departureDate" : NumberLong("1412553600000"),
"operatingAirlineCode" : "DE",
"operatingFlightNumber" : "1808",
"flightClass" : "P",
"fareType" : "EX",
"availability" : "*"
}
Here are the statistics of my collection:
{
"sharded" : true,
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "flights.flight",
"count" : 2809822,
"numExtents" : 30,
"size" : 674357280,
"storageSize" : 921788416,
"totalIndexSize" : 287746144,
"indexSizes" : {
"_id_" : 103499984,
"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 184246160
},
"avgObjSize" : 240,
"nindexes" : 2,
"nchunks" : 869,
"shards" : {
"shard0000" : {
"ns" : "flights.flight",
"count" : 1396165,
"size" : 335079600,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 144633440,
"indexSizes" : {
"_id_" : 53094944,
"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 91538496
},
"ok" : 1
},
"shard0001" : {
"ns" : "flights.flight",
"count" : 1413657,
"size" : 339277680,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 143112704,
"indexSizes" : {
"_id_" : 50405040,
"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 92707664
},
"ok" : 1
}
},
"ok" : 1
}
I now run queries from Java that look like this:
{
"departureAirport" : "BSL",
"arrivalAirport" : "SMF",
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
"flightClass" : "C",
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
"fareType" : "OW"
}
The departureDate should fall within a range of one week, and availability should be greater than or equal to the requested number, or '*'.
My question is: what can I do to improve performance? When I query the database with 50 connections per host I only get about 1000 ops/s, but I need roughly 3000 to 5000 ops/s.
The cursor looks okay when I run the query in the shell:
"cursor" : "BtreeCursor departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1"
If I have left anything out, please let me know. Thanks in advance.
The fact that a BtreeCursor is used doesn't make the query OK. The output of explain would help to identify the issue.
I guess a key problem is the order of your query params:
// equality, good
"departureAirport" : "BSL",
// equality, good
"arrivalAirport" : "SMF",
// range, bad because index based range queries should be near the end
// of contiguous index-based equality checks
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
// what is this, and how many possible values does it have? Seems to be
// a low selectivity index -> remove from index and move to end
"flightClass" : "C",
// costly $or, one op. is a range query, the other one equality...
// Simply set 'availability' to a magic number instead. That's
// ugly, but optimizations are ugly and it's unlikely we see planes with
// over e.g. 900,000 seats in the next couple of decades...
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
// again, looks like low selectivity to me. Since it's already at the end,
// that's ok. I'd try to remove it from the index, however.
"fareType" : "OW"
You might want to change your index to something like
"departureAirport_1_arrivalAirport_1_departureDate_1_availability_1"
and query in that exact same order. Put the remaining criteria after these, so that only the documents that already matched the indexed criteria have to be scanned.
I'm assuming that flightClass and fareType have low selectivity. If that is not true, this won't be the best solution.
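The "magic number" suggestion from the comments above can be sketched in plain Java as follows. The class name, the method names and the way the sentinel is stored are assumptions made for illustration; only the value 900,000 comes from the answer. The idea is to store availability as a number, with a large sentinel standing in for "*", so that the $or collapses into a single range check. The sketch also shows a side issue with the original query: availability is compared as a string, and string comparison is lexicographic, so "10" sorts before "3".

```java
// Sketch of the "magic number" idea: store availability as a number so that
// "availability >= n OR availability == '*'" becomes one range check.
// AvailabilitySketch and its members are hypothetical names.
class AvailabilitySketch {
    static final int UNLIMITED = 900_000; // stands in for "*"; value from the answer's example

    // Convert the stored string form into a comparable number.
    static int encode(String availability) {
        return "*".equals(availability) ? UNLIMITED : Integer.parseInt(availability);
    }

    // Equivalent of: availability >= requestedSeats OR availability == "*"
    static boolean satisfies(String availability, int requestedSeats) {
        return encode(availability) >= requestedSeats;
    }

    public static void main(String[] args) {
        System.out.println("* satisfies 3 seats: " + satisfies("*", 3));
        System.out.println("2 satisfies 3 seats: " + satisfies("2", 3));
        System.out.println("10 satisfies 3 seats: " + satisfies("10", 3));
        // Lexicographic comparison, as used for strings: "10" sorts before "3".
        System.out.println("\"10\" >= \"3\" as strings: " + ("10".compareTo("3") >= 0));
    }
}
```

With availability stored this way, the query condition would become a single { "availability" : { "$gte" : 3 } } on a numeric field, which an index ending in availability_1 can serve as one range.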
