Poor MongoDB read performance - java

I have a sharded collection containing flight information. The schema looks something like:
{
"_id" : ObjectId("537ef1bb5516dd401b5b109a"),
"departureAirport" : "HAJ",
"arrivalAirport" : "AYT",
"departureDate" : NumberLong("1412553600000"),
"operatingAirlineCode" : "DE",
"operatingFlightNumber" : "1808",
"flightClass" : "P",
"fareType" : "EX",
"availability" : "*"
}
Here are the statistics of my collection:
{
"sharded" : true,
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "flights.flight",
"count" : 2809822,
"numExtents" : 30,
"size" : 674357280,
"storageSize" : 921788416,
"totalIndexSize" : 287746144,
"indexSizes" : {
"_id_" : 103499984,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 184246160
},
"avgObjSize" : 240,
"nindexes" : 2,
"nchunks" : 869,
"shards" : {
"shard0000" : {
"ns" : "flights.flight",
"count" : 1396165,
"size" : 335079600,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 144633440,
"indexSizes" : {
"_id_" : 53094944,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 91538496
},
"ok" : 1
},
"shard0001" : {
"ns" : "flights.flight",
"count" : 1413657,
"size" : 339277680,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 143112704,
"indexSizes" : {
"_id_" : 50405040,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 92707664
},
"ok" : 1
}
},
"ok" : 1
}
I now run queries from Java that look like:
{
"departureAirport" : "BSL",
"arrivalAirport" : "SMF",
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
"flightClass" : "C",
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
"fareType" : "OW"
}
The departureDate should fall within a one-week range, and availability should be greater than or equal to the requested number, or '*'.
My question is: what can I do to improve performance? When I query the database with 50 connections per host I only get about 1000 ops/s, but I need roughly 3000 - 5000 ops/s.
The cursor looks okay when I run the query in the shell:
"cursor" : "BtreeCursor departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1"
If I forgot something, please let me know. Thanks in advance.

The fact that a BtreeCursor is used doesn't make the query OK. The output of explain would help to identify the issue.
I guess a key problem is the order of your query params:
// equality, good
"departureAirport" : "BSL",
// equality, good
"arrivalAirport" : "SMF",
// range, bad because index based range queries should be near the end
// of contiguous index-based equality checks
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
// what is this, and how many possible values does it have? Seems to be
// a low selectivity index -> remove from index and move to end
"flightClass" : "C",
// costly $or, one op. is a range query, the other one equality...
// Simply set 'availability' to a magic number instead. That's
// ugly, but optimizations are ugly and it's unlikely we see planes with
// over e.g. 900,000 seats in the next couple of decades...
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
// again, looks like low selectivity to me. Since it's already at the end,
// that's ok. I'd try to remove it from the index, however.
"fareType" : "OW"
You might want to change your index to something like
"departureAirport_1_arrivalAirport_1_departureDate_1_availability_1"
and query in that exact same order. Put all remaining criteria after those fields, so that documents only have to be scanned once they have matched every criterion the index covers.
I'm assuming that flightClass and fareType have low selectivity. If that is not true, this won't be the best solution.
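The "magic number" idea from the comments above can be sketched in plain Java. This assumes availability would be stored as an integer instead of a string, with '*' mapped to a sentinel; the class name, the method names, and the value 900,000 are illustrative and not part of the original schema. It also shows why a range check on string values is fragile: strings compare lexicographically, so "10" sorts before "3".

```java
import java.util.List;

final class AvailabilityEncoding {
    // Hypothetical sentinel standing in for "*" (unlimited seats).
    static final int UNLIMITED = 900_000;

    // Convert the raw string form ("*", "3", "10", ...) to a sortable integer.
    static int encode(String raw) {
        return "*".equals(raw) ? UNLIMITED : Integer.parseInt(raw);
    }

    // With integers, "at least n seats or unlimited" is a single range check,
    // which corresponds to one {availability: {$gte: n}} predicate.
    static boolean matches(String raw, int requestedSeats) {
        return encode(raw) >= requestedSeats;
    }

    public static void main(String[] args) {
        for (String raw : List.of("*", "3", "10", "2")) {
            System.out.println(raw + " -> " + encode(raw)
                    + ", matches(3) = " + matches(raw, 3));
        }
        // Lexicographic string comparison: "10" sorts before "3".
        System.out.println("\"10\".compareTo(\"3\") < 0: " + ("10".compareTo("3") < 0));
    }
}
```

With such an encoding the `$or` in the query could collapse into one `$gte` on availability, which is the shape the suggested index is meant to serve.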

Related

Convert Iterable<Entity> to list - mongoDB Java

What is the most efficient way to convert an Iterable to a list?
Below is my code for the slower collection:
Iterable<TestEntity> entityResult = testCollection.find(and(
or(
geoWithin("aLocation", BasicDBObject.parse(area)),
geoWithin("bLocation", BasicDBObject.parse(area))
),
lte(START_DATE, anotherDate),
gte(END_DATE, Date2),
and(exists(STATUS), ne(STATUS, "TEST"))
)).maxTime(mongoDisasterMaxExecutionMaxTime, TimeUnit.MINUTES);
List<TestEntity> testSummary = StreamSupport.stream(entityResult.spliterator(), true).collect(Collectors.toList());
But this takes almost 15 seconds for 80k records. I have another collection, which takes about a second for 30k records.
I also have compound indexes on both collections, so I don't think the issue is MongoDB returning data slowly. When I log the MongoDB find call it is pretty fast, but the StreamSupport call takes 15 seconds.
Collection 1 (slow)
{
"_id" : "1234",
"orderNumber" : "1234",
"lastUpdateDate" : ISODate("2021-09-11T00:00:00.000Z"),
"lastModifiedTime" : NumberLong(1631400026077),
"product" : "xyzzy",
"Feature" : "1",
"brand" : "Brand abc",
"amountUSD" : 100.2,
"Desc" : "test",
"Desc2" : "test 2",
"anotherDate" : ISODate("2021-09-19T00:00:00.000Z"),
"Date2" : ISODate("2021-09-23T00:00:00.000Z"),
"IdNew" : "3456",
"Count" : 1,
"customerName" : "Dave",
"occuranceTime" : ISODate("2021-09-03T00:00:00.000Z"),
"status" : "confirmed",
"aLocation" : [
-100.672735,
32.849354
],
"bLocation" : [
-69.816366,
42.808233
]
}
Collection 2 (not slow)
{
"_id" : "1234",
"orderNumber" : "1234",
"lastUpdateDate" : ISODate("2021-09-11T00:00:00.000Z"),
"lastModifiedTime" : NumberLong(1631400026077),
"product" : "xyzzy",
"Feature" : "1",
"brand" : "Brand abc",
"amountUSD" : 100.2,
"Desc" : "test",
"Desc2" : "test 2",
"anotherDate" : ISODate("2021-09-19T00:00:00.000Z"),
"Date2" : ISODate("2021-09-23T00:00:00.000Z"),
"IdNew" : "3456",
"Count" : 1,
"customerName" : "Dave",
"occuranceTime" : ISODate("2021-09-03T00:00:00.000Z"),
"status" : "confirmed",
"LocationId" : "8888",
"aLocation" : [
3.063211,
50.633679
]
}
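A minimal sketch of a plain sequential copy, assuming nothing about the result beyond the fact that it is an `Iterable`; `IterableToList` and `toList` are hypothetical names. Two things may be worth checking before blaming the conversion itself. First, the driver's `find(...)` only builds the query, and the cursor is not opened until iteration starts, so the time measured around the stream call probably includes the query execution and the network transfer. Second, passing `true` to `StreamSupport.stream` requests a parallel stream, which is unlikely to help when the source is a cursor that is read sequentially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IterableToList {
    // Copy any Iterable into a List with a single sequential pass.
    static <T> List<T> toList(Iterable<T> source) {
        List<T> target = new ArrayList<>();
        source.forEach(target::add);
        return target;
    }

    public static void main(String[] args) {
        // Stand-in for a query result; a lambda works here because Iterable
        // has a single abstract method, iterator().
        Iterable<String> result = () -> Arrays.asList("a", "b", "c").iterator();
        System.out.println(toList(result));
    }
}
```

With the MongoDB Java driver, the result of `find(...)` also offers `into(new ArrayList<>())`, which performs the same kind of sequential copy. If the copy is still slow, the geo `$or` in the query on the slow collection is a more likely suspect than the list conversion.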

How can I retrieve the value from a column instead of the link using Spring Data JPA?

I'm using a projection to retrieve a list of matches with the teams inline.
Projection:
@Projection(name = "matchInlineTeams", types = { Match.class })
public interface MatchInlineTeams {
Team getHomeTeam();
Long getHomeTeamGoals();
Long getAwayTeamGoals();
Team getAwayTeam();
}
And my result is a collection of these:
{
"homeTeam" : {
"teamName" : "Banfield",
"teamFoundation" : "1896-01-21T03:00:00.000+0000",
"teamCity" : 73,
"teamCountry" : "ARG",
"handler" : { },
"hibernateLazyInitializer" : { }
},
"homeTeamGoals" : 2,
"awayTeamGoals" : 0,
"awayTeam" : {
"teamName" : "Gimnasia (LP)",
"teamFoundation" : "1887-06-03T03:00:00.000+0000",
"teamCity" : 76,
"teamCountry" : "ARG",
"handler" : { },
"hibernateLazyInitializer" : { }
},
"_links" : {
"self" : {
"href" : "http://localhost:8080/matches/1"
},
"match" : {
"href" : "http://localhost:8080/matches/1{?projection}",
"templated" : true
},
"goals" : {
"href" : "http://localhost:8080/matches/1/goals"
},
"homeTeam" : {
"href" : "http://localhost:8080/matches/1/homeTeam"
},
"competition" : {
"href" : "http://localhost:8080/matches/1/competition"
},
"matchStadium" : {
"href" : "http://localhost:8080/matches/1/matchStadium"
},
"awayTeam" : {
"href" : "http://localhost:8080/matches/1/awayTeam"
}
}
}
I need to do many calculations for a stats app, and the logic lives in the front end. To build a match history between two teams I make this request; it takes about a second to retrieve everything, which is fine.
My problem now is that I want to build a table out of historical matches, so I can't request only the matches between two teams; I have to request all matches in which a team participated.
That makes the current approach unusable: instead of 200 matches I get 3500 in the response, and it takes around 20 seconds to build.
I'm guessing that is because the API returns all links and resolves both teams for each object, which I don't need. Is there a way to create a projection (or any other class) that returns the literal value of my column instead of resolving the object reference?
I want my result to be like this:
{
"homeTeam" : 10,
"homeTeamGoals" : 2,
"awayTeamGoals" : 0,
"awayTeam" : 36,
"_links" : {
"self" : {
"href" : "http://localhost:8080/matches/1"
},
"match" : {
"href" : "http://localhost:8080/matches/1{?projection}",
"templated" : true
}
}
}
Once my table is built, I will call the teams endpoint to resolve the team names.
So what I really need is to make this faster (around 20 times faster). If this is not the right path, I would very much appreciate a suggestion.

Mongo driver updateMany not updating all documents

Using Mongo Java Driver version 3.2.1 against MongoDB 3.0.12.
Calling MongoCollection.updateMany(Bson filter, Bson update) returns a result showing all expected documents were modified, however only a portion of the documents were actually updated.
I've tried with multiple write concerns: JOURNALED, ACKNOWLEDGED, etc
Any ideas?
Here is the profile result:
{ "op" : "update", "ns" : "dev.timeSheet", "query" : { "lineItems.task" : ObjectId("53233e85e4b07f573f1d4466") }, "updateobj" : { "$set" : { "lineItems.$.task" : ObjectId("53233e85e4b07f573f1d446d") } }, "nscanned" : 0, "nscannedObjects" : 6733, "nMatched" : 248, "nModified" : 248, "fastmod" : true, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 52, "locks" : { "Global" : { "acquireCount" : { "r" : NumberLong(53), "w" : NumberLong(53) } }, "MMAPV1Journal" : { "acquireCount" : { "w" : NumberLong(301) } }, "Database" : { "acquireCount" : { "w" : NumberLong(53) } }, "Collection" : { "acquireCount" : { "W" : NumberLong(53) } } }, "millis" : 50, "execStats" : { }, "ts" : ISODate("2016-08-25T18:17:16.025Z"), "client" : "127.0.0.1", "allUsers" : [ ], "user" : "" }
Update: Also occurs in MongoDB 3.2.9
Direct access calls:
db.timeSheet.find({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}).count()
126
db.timeSheet.updateMany({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}, {'$set': {'lineItems.$.task': ObjectId("53233e85e4b07f573f1d446d")}})
{ "acknowledged" : true, "matchedCount" : 126, "modifiedCount" : 126 }
db.timeSheet.find({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}).count()
90
This isn't working the way I expected because it is only updating the first lineItem it finds for each time sheet.
https://docs.mongodb.com/manual/reference/operator/update/positional/#up.S
Remember that the positional $ operator acts as a placeholder for the first match of the update query document.
No feature exists currently to update all items in the embedded array.
https://jira.mongodb.org/browse/SERVER-1243
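The shell output above (126 matches before, 90 after) is consistent with time sheets that contain more than one matching line item. The following plain-Java simulation, which is not MongoDB code, mimics the first-match-only behaviour of the positional $ operator and one possible workaround: repeating the update until nothing matches. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionalUpdateSimulation {
    // Mimics the positional $ operator: replace only the FIRST element equal
    // to oldTask. Returns true if the document matched (and was modified).
    static boolean updateFirstMatch(List<String> tasks, String oldTask, String newTask) {
        int index = tasks.indexOf(oldTask);
        if (index < 0) {
            return false;
        }
        tasks.set(index, newTask);
        return true;
    }

    // One simulated updateMany pass over all documents; returns the matched count.
    static int updateManyOnce(List<List<String>> docs, String oldTask, String newTask) {
        int matched = 0;
        for (List<String> doc : docs) {
            if (updateFirstMatch(doc, oldTask, newTask)) {
                matched++;
            }
        }
        return matched;
    }

    // Workaround: repeat the pass until no document matches any more.
    // Returns the number of passes that modified something.
    static int updateUntilDone(List<List<String>> docs, String oldTask, String newTask) {
        if (oldTask.equals(newTask)) {
            throw new IllegalArgumentException("old and new task must differ");
        }
        int passes = 0;
        while (updateManyOnce(docs, oldTask, newTask) > 0) {
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(new ArrayList<>(List.of("old", "x", "old"))); // two matching line items
        docs.add(new ArrayList<>(List.of("old")));             // one matching line item
        System.out.println("first pass matched: " + updateManyOnce(docs, "old", "new"));
        System.out.println("still matching: "
                + docs.stream().filter(d -> d.contains("old")).count());
        System.out.println("extra passes needed: " + updateUntilDone(docs, "old", "new"));
    }
}
```

Against a real server the same loop would call updateMany until matchedCount reaches 0. Later MongoDB releases (3.6 onward) added the all-positional operator `$[]` and the filtered positional operator `$[<identifier>]`, which, as far as I know, address this case without a loop.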

MongoDB (java) takes a long time for a find

I have a MongoDB database with the following stats:
{
"ns" : "pourmoi.featurecount",
"count" : 12152142,
"size" : 1361391264,
"avgObjSize" : 112,
"numExtents" : 19,
"storageSize" : 1580150784,
"lastExtentSize" : 415174656,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0.
It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 1165210816,
"indexSizes" : {
"_id_" : 690111632,
"feature_1" : 475099184
},
"ok" : 1
}
and a Java program that does a find and returns around 50 results.
This is the query
db.featurecount.find(
{ "$or" : [ { "feature" : "hello"}, { "feature" : "how"},
{ "feature" : "are"} , { "feature" : "you"} ]})
.sort({count: -1}).limit(20)
This query takes at least 30 seconds. Is it possible to make it run faster?
PS: The MongoDB server is running on localhost.

Find the average of array element in MongoDB using java

I am new to MongoDB. I have to find the average of an array element in MongoDB,
e.g.
{
"_id" : ObjectId("51236fbc3004f02f87c62e8e"),
"query" : "iPad",
"rating" : [
{
"end" : "130",
"inq" : "403",
"executionTime" : "2013-02-19T12:27:40Z"
},
{
"end" : "152",
"inq" : "123",
"executionTime" : "2013-02-19T12:35:28Z"
}
]
}
I want the average of "inq" where query is "iPad".
Here the output should be:
inq=263
I searched on Google and found the aggregate method, but I am not able to convert it to Java code.
Thanks in advance
Regards
Let's try to decompose that problem. I would start with:
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project: {_id:0,
q:"$query",i:"$rating.inq"}})
The projection is not required but makes the rest a little bit more readable:
{
"result" : [
{
"q" : "iPad",
"i" : "403"
},
{
"q" : "iPad",
"i" : "123"
}
],
"ok" : 1
}
So how do I group that? Of course, by "$q":
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project: {_id:0,
q:"$query",i:"$rating.inq"}}, {$group:{_id: "$q"}}) :
{ "result" : [ { "_id" : "iPad" } ], "ok" : 1 }
Now let's add some aggregation operators:
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project: {_id:0, q:"$query",i:"$rating.inq"}}, {$group:{_id: "$q", max: {$max: "$i"}, min: {$min: "$i"}}}) :
{
"result" : [
{
"_id" : "iPad",
"max" : "403",
"min" : "123"
}
],
"ok" : 1
}
Now comes the average:
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project:
{_id:0,q:"$query",i:"$rating.inq"}}, {$group:{_id: "$q", av: {$avg:"$i"}}});
Try the Java driver for MongoDB. I found this link on the MongoDB site; please check it: http://docs.mongodb.org/ecosystem/tutorial/use-aggregation-framework-with-java-driver/#java-driver-and-aggregation-framework
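One caveat: in the sample document "inq" is stored as a string, and as far as I know $avg only averages numeric values, so the server-side average probably needs the values to be numeric (or converted) first. As a fallback, here is a minimal client-side sketch in plain Java that averages the string values once they have been fetched, for example from the $unwind/$project stages shown earlier. `InqAverage` and `average` are hypothetical names, and no driver calls are shown.

```java
import java.util.List;

final class InqAverage {
    // Average the string-encoded "inq" values after parsing them as numbers.
    static double average(List<String> inqValues) {
        return inqValues.stream()
                .mapToLong(Long::parseLong)
                .average()
                .orElse(Double.NaN); // empty input: no average is defined
    }

    public static void main(String[] args) {
        // Values taken from the sample document's rating array.
        System.out.println("inq=" + average(List.of("403", "123")));
    }
}
```

For the two sample values the result is (403 + 123) / 2 = 263, which matches the output the question asks for.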
