Convert Iterable<Entity> to List - MongoDB Java

What is the most efficient way to convert an Iterable to a List?
Below is my code for the slower collection:
Iterable<TestEntity> entityResult = testCollection.find(and(
        or(
            geoWithin("aLocation", BasicDBObject.parse(area)),
            geoWithin("bLocation", BasicDBObject.parse(area))
        ),
        lte(START_DATE, anotherDate),
        gte(END_DATE, Date2),
        and(exists(STATUS), ne(STATUS, "TEST"))
)).maxTime(mongoDisasterMaxExecutionMaxTime, TimeUnit.MINUTES);

List<TestEntity> testSummary = StreamSupport.stream(entityResult.spliterator(), true)
        .collect(Collectors.toList());
But this takes almost 15 seconds for 80k records. I have another collection, which takes about a second for 30k records.
I also have compound indexes on both collections, so I don't think the issue is MongoDB returning data slowly. When I log the MongoDB find itself, it is fast, but the StreamSupport call takes 15 seconds.
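For comparison, here is a minimal helper that skips the Spliterator/parallel-stream step and lets the driver materialize the cursor itself. This is only a sketch, assuming the current MongoDB Java driver, where the result of find() implements MongoIterable and exposes into():

import com.mongodb.client.MongoIterable;
import java.util.ArrayList;
import java.util.List;

// Sketch: MongoIterable.into() iterates the cursor once and appends every
// document to the supplied list, with no Spliterator or parallel stream.
static <T> List<T> toList(MongoIterable<T> iterable) {
    return iterable.into(new ArrayList<>());
}

If your find() returns a FindIterable, it already implements MongoIterable, so the helper applies directly.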
Collection 1 (slow)
{
    "_id" : "1234",
    "orderNumber" : "1234",
    "lastUpdateDate" : ISODate("2021-09-11T00:00:00.000Z"),
    "lastModifiedTime" : NumberLong(1631400026077),
    "product" : "xyzzy",
    "Feature" : "1",
    "brand" : "Brand abc",
    "amountUSD" : 100.2,
    "Desc" : "test",
    "Desc2" : "test 2",
    "anotherDate" : ISODate("2021-09-19T00:00:00.000Z"),
    "Date2" : ISODate("2021-09-23T00:00:00.000Z"),
    "IdNew" : "3456",
    "Count" : 1,
    "customerName" : "Dave",
    "occuranceTime" : ISODate("2021-09-03T00:00:00.000Z"),
    "status" : "confirmed",
    "aLocation" : [
        -100.672735,
        32.849354
    ],
    "bLocation" : [
        -69.816366,
        42.808233
    ]
}
Collection 2 (not slow)
{
    "_id" : "1234",
    "orderNumber" : "1234",
    "lastUpdateDate" : ISODate("2021-09-11T00:00:00.000Z"),
    "lastModifiedTime" : NumberLong(1631400026077),
    "product" : "xyzzy",
    "Feature" : "1",
    "brand" : "Brand abc",
    "amountUSD" : 100.2,
    "Desc" : "test",
    "Desc2" : "test 2",
    "anotherDate" : ISODate("2021-09-19T00:00:00.000Z"),
    "Date2" : ISODate("2021-09-23T00:00:00.000Z"),
    "IdNew" : "3456",
    "Count" : 1,
    "customerName" : "Dave",
    "occuranceTime" : ISODate("2021-09-03T00:00:00.000Z"),
    "status" : "confirmed",
    "LocationId" : "8888",
    "aLocation" : [
        3.063211,
        50.633679
    ]
}

Related

MongoDB (java) takes a long time for a find

I have a MongoDB database with the following stats:
{
    "ns" : "pourmoi.featurecount",
    "count" : 12152142,
    "size" : 1361391264,
    "avgObjSize" : 112,
    "numExtents" : 19,
    "storageSize" : 1580150784,
    "lastExtentSize" : 415174656,
    "paddingFactor" : 1,
    "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
    "userFlags" : 1,
    "capped" : false,
    "nindexes" : 2,
    "totalIndexSize" : 1165210816,
    "indexSizes" : {
        "_id_" : 690111632,
        "feature_1" : 475099184
    },
    "ok" : 1
}
and a Java program that does a find and returns around 50 results.
This is the query:
db.featurecount.find(
{ "$or" : [ { "feature" : "hello"}, { "feature" : "how"},
{ "feature" : "are"} , { "feature" : "you"} ]})
.sort({count: -1}).limit(20)
This query takes around 30 seconds (at least). Is it possible to make it run faster?
PS: The MongoDB server is on localhost.
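One common approach, not taken from the original thread, is a compound index that covers both the $or equality field and the sort key, so each branch can be answered by an index scan that already yields documents in count order. A sketch using the Java driver's index builders:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

// Hypothetical index: each { feature: <word> } branch of the $or can use it,
// and the descending "count" component lets the results come back in sort
// order instead of requiring an in-memory sort of every match.
void createFeatureCountIndex(MongoCollection<Document> featurecount) {
    featurecount.createIndex(Indexes.compoundIndex(
            Indexes.ascending("feature"),
            Indexes.descending("count")));
}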

Poor MongoDB read performance

I have a sharded collection containing flight information. The schema looks something like:
{
    "_id" : ObjectId("537ef1bb5516dd401b5b109a"),
    "departureAirport" : "HAJ",
    "arrivalAirport" : "AYT",
    "departureDate" : NumberLong("1412553600000"),
    "operatingAirlineCode" : "DE",
    "operatingFlightNumber" : "1808",
    "flightClass" : "P",
    "fareType" : "EX",
    "availability" : "*"
}
Here are the statistics of my collection:
{
    "sharded" : true,
    "systemFlags" : 1,
    "userFlags" : 1,
    "ns" : "flights.flight",
    "count" : 2809822,
    "numExtents" : 30,
    "size" : 674357280,
    "storageSize" : 921788416,
    "totalIndexSize" : 287746144,
    "indexSizes" : {
        "_id_" : 103499984,
        "departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 184246160
    },
    "avgObjSize" : 240,
    "nindexes" : 2,
    "nchunks" : 869,
    "shards" : {
        "shard0000" : {
            "ns" : "flights.flight",
            "count" : 1396165,
            "size" : 335079600,
            "avgObjSize" : 240,
            "storageSize" : 460894208,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 124993536,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 1,
            "totalIndexSize" : 144633440,
            "indexSizes" : {
                "_id_" : 53094944,
                "departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 91538496
            },
            "ok" : 1
        },
        "shard0001" : {
            "ns" : "flights.flight",
            "count" : 1413657,
            "size" : 339277680,
            "avgObjSize" : 240,
            "storageSize" : 460894208,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 124993536,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 1,
            "totalIndexSize" : 143112704,
            "indexSizes" : {
                "_id_" : 50405040,
                "departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 92707664
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
I now run the queries from Java, and they look like:
{
    "departureAirport" : "BSL",
    "arrivalAirport" : "SMF",
    "departureDate" : {
        "$gte" : 1402617600000,
        "$lte" : 1403136000000
    },
    "flightClass" : "C",
    "$or" : [
        { "availability" : { "$gte" : "3" } },
        { "availability" : "*" }
    ],
    "fareType" : "OW"
}
The departureDate should fall within a one-week range, and availability should be greater than or equal to the requested number, or '*'.
My question is: what can I do to increase performance? When I query the database with 50 connections per host I only get about 1000 ops/s, but I need about 3000-5000 ops/s.
The cursor looks okay when I run the query in the shell:
"cursor" : "BtreeCursor departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1"
If I have left anything out, please let me know. Thanks in advance.
The fact that a BtreeCursor is used doesn't make the query OK. The output of explain would help to identify the issue.
I guess a key problem is the order of your query params:
// equality, good
"departureAirport" : "BSL",
// equality, good
"arrivalAirport" : "SMF",
// range, bad: index-based range conditions should come after the
// contiguous index-based equality checks
"departureDate" : {
    "$gte" : 1402617600000,
    "$lte" : 1403136000000
},
// what is this, and how many possible values does it have? Seems to be
// a low-selectivity field -> remove from the index and move to the end
"flightClass" : "C",
// costly $or: one branch is a range query, the other an equality check.
// Simply set 'availability' to a magic number instead. That's ugly, but
// optimizations are ugly, and it's unlikely we'll see planes with over
// e.g. 900,000 seats in the next couple of decades...
"$or" : [
    { "availability" : { "$gte" : "3" } },
    { "availability" : "*" }
],
// again, looks like low selectivity to me. Since it's already at the end,
// that's ok. I'd try to remove it from the index, however.
"fareType" : "OW"
You might want to change your index to something like
"departureAirport_1_arrivalAirport_1_departureDate_1_availability_1"
and build your query in that exact order. Append everything else behind it, so the remaining conditions only have to scan documents that already matched the indexed criteria.
I'm assuming that flightClass and fareType have low selectivity. If that is not true, this won't be the best solution.
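A rough Java sketch of that suggestion, using the field names from the question (the index shape and filter order follow the answer's reasoning; treat it as an assumption to verify with explain, not a guaranteed optimum):

import static com.mongodb.client.model.Filters.*;

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
import org.bson.conversions.Bson;

// Index as suggested: equality fields first, then the range field,
// then availability.
void createFlightIndex(MongoCollection<Document> flights) {
    flights.createIndex(Indexes.ascending(
            "departureAirport", "arrivalAirport", "departureDate", "availability"));
}

// Filter built in the same order; the presumed low-selectivity fields
// (flightClass, fareType) are appended at the end.
Bson buildFilter(long from, long to) {
    return and(
            eq("departureAirport", "BSL"),
            eq("arrivalAirport", "SMF"),
            gte("departureDate", from),
            lte("departureDate", to),
            or(gte("availability", "3"), eq("availability", "*")),
            eq("flightClass", "C"),
            eq("fareType", "OW"));
}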

Error in MapReduce command with MongoDB Java API

I'm testing the MongoDB Java API and I wanted to do a mapReduce.
I implemented it as follows:
String map = "function() { " +
        "emit(this.ts, this.1_bid);}";
String reduce = "function(key, values) {" +
        "return Array.sum(values);}";
MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce, null,
        MapReduceCommand.OutputType.INLINE, null);
MapReduceOutput out = collection.mapReduce(cmd);
for (DBObject o : out.results()) {
    System.out.println(o.toString());
}
But when I execute it I get the following exception stack trace:
[tick_engine] 16:51:53.600 ERROR [MongoTickDataReader] Failed to read data from mongoDB
com.mongodb.CommandFailureException: { "serverUsed" : "/127.0.0.1:27017" , "errmsg" : "exception: SyntaxError: Unexpected token ILLEGAL" , "code" : 16722 , "ok" : 0.0}
at com.mongodb.CommandResult.getException(CommandResult.java:71) ~[mongo-2.11.1.jar:na]
at com.mongodb.CommandResult.throwOnError(CommandResult.java:110) ~[mongo-2.11.1.jar:na]
at com.mongodb.DBCollection.mapReduce(DBCollection.java:1265) ~[mongo-2.11.1.jar:na]
at com.smarttrade.tickEngine.in.MongoTickDataReader.mapReduce(MongoTickDataReader.java:321) ~[classes/:na]
at com.smarttrade.tickEngine.in.MongoTickDataReader.readData(MongoTickDataReader.java:157) ~[classes/:na]
at com.smarttrade.tick.engine.TickEngine.onMarketDataRequest(TickEngine.java:203) [classes/:na]
at com.smarttrade.tick.sttp.TickMarketDataRequestCommand.execute(TickMarketDataRequestCommand.java:62) [classes/:na]
at com.smarttrade.st.commands.Command.process(Command.java:140) [src/:na]
at com.smarttrade.st.server.STTPInvoker$1.process(STTPInvoker.java:385) [src/:na]
at com.smarttrade.st.server.STTPInvoker$1.process(STTPInvoker.java:1) [src/:na]
at com.smarttrade.util.concurrent.queue.MultiSessionsBlockingQueue$SimpleSession.run(MultiSessionsBlockingQueue.java:122) [src/:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
The problem seems to be with the attribute name that you have defined - 1_bid
I created sample documents to test your map-reduce -
{ "_id" : ObjectId("533ef7d0e1687dd644410d88"), "ts" : "TSKEY", "1_bid" : 200 }
{ "_id" : ObjectId("533ef7d3e1687dd644410d89"), "ts" : "TSKEY", "1_bid" : 300 }
{ "_id" : ObjectId("533ef7d5e1687dd644410d8a"), "ts" : "TSKEY", "1_bid" : 400 }
{ "_id" : ObjectId("533ef7dce1687dd644410d8b"), "ts" : "TSKEY2", "1_bid" : 800 }
{ "_id" : ObjectId("533ef7dfe1687dd644410d8c"), "ts" : "TSKEY2", "1_bid" : 300 }
I ran following map-reduce command -
db.sample4.mapReduce(function() { emit(this.ts, this.1_bid);},function(key, values) {return Array.sum(values);})
The error that I got is SyntaxError: missing ) after argument list (shell):1
I realized that the function the mapper executes is a JavaScript function, and in JavaScript an identifier cannot start with a number, so this.1_bid is a syntax error. I then created a new set of documents -
{ "_id" : ObjectId("533eff29e1687dd644410d8d"), "ts" : "TSKEY", "bid_1" : 200 }
{ "_id" : ObjectId("533eff2de1687dd644410d8e"), "ts" : "TSKEY", "bid_1" : 300 }
{ "_id" : ObjectId("533eff34e1687dd644410d8f"), "ts" : "TSKEY", "bid_1" : 400 }
{ "_id" : ObjectId("533eff7fe1687dd644410d92"), "ts" : "TSKEY2", "bid_1" : 800 }
{ "_id" : ObjectId("533eff85e1687dd644410d93"), "ts" : "TSKEY2", "bid_1" : 300 }
and then modified the mapper to use "bid_1" and ran the following command -
db.sample4.mapReduce(function() { emit(this.ts, this.bid_1);},function(key, values) {return Array.sum(values);},"pivot")
The output was -
{
    "result" : "pivot",
    "timeMillis" : 61,
    "counts" : {
        "input" : 12,
        "emit" : 12,
        "reduce" : 2,
        "output" : 2
    },
    "ok" : 1
}
db.pivot.find()
{ "_id" : "TSKEY", "value" : 900 }
{ "_id" : "TSKEY2", "value" : 1100 }
I tested this in Java using the same program you pasted, changed only the attribute name to "bid_1", and it worked.
To prevent syntax errors on field names, you can also write the map function this way:
function() {
    emit(this["ts"], this["1_bid"]);
}
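Applied to the Java program from the question, the map string would then use bracket notation so the original 1_bid field name can be kept. A sketch against the same legacy MapReduceCommand API shown above:

// Bracket notation sidesteps the JavaScript identifier restriction, so the
// field can keep its original name "1_bid".
String map = "function() { emit(this['ts'], this['1_bid']); }";
String reduce = "function(key, values) { return Array.sum(values); }";

MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce, null,
        MapReduceCommand.OutputType.INLINE, null);
MapReduceOutput out = collection.mapReduce(cmd);
for (DBObject o : out.results()) {
    System.out.println(o);
}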

Find the average of array element in MongoDB using java

I am new to MongoDB. I have to find the average of an array element in MongoDB, e.g.
{
    "_id" : ObjectId("51236fbc3004f02f87c62e8e"),
    "query" : "iPad",
    "rating" : [
        {
            "end" : "130",
            "inq" : "403",
            "executionTime" : "2013-02-19T12:27:40Z"
        },
        {
            "end" : "152",
            "inq" : "123",
            "executionTime" : "2013-02-19T12:35:28Z"
        }
    ]
}
I want the average of "inq" where query is "iPad". The output should be:
inq=263
I searched Google and found the aggregate method, but I am not able to convert it to Java code.
Thanks in advance.
Let's try to decompose that problem. I would start with:
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project: {_id:0,
q:"$query",i:"$rating.inq"}})
The projection is not required but makes the rest a little bit more readable:
{
    "result" : [
        {
            "q" : "iPad",
            "i" : "403"
        },
        {
            "q" : "iPad",
            "i" : "123"
        }
    ],
    "ok" : 1
}
So how do I group that? Of course, by "$q":
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project: {_id:0,
q:"$query",i:"$rating.inq"}}, {$group:{_id: "$q"}}) :
{ "result" : [ { "_id" : "iPad" } ], "ok" : 1 }
Now let's add some aggregation operators:
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project: {_id:0, q:"$query",i:"$rating.inq"}}, {$group:{_id: "$q", max: {$max: "$i"}, min: {$min: "$i"}}}) :
{
    "result" : [
        {
            "_id" : "iPad",
            "max" : "403",
            "min" : "123"
        }
    ],
    "ok" : 1
}
Now comes the average:
db.c.aggregate({$match: {query: "iPad"}}, {$unwind:"$rating"}, {$project:
{_id:0,q:"$query",i:"$rating.inq"}}, {$group:{_id: "$q", av: {$avg:"$i"}}});
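Since the asker specifically wanted this in Java, here is a rough equivalent of that pipeline with the aggregation builders of the current Java driver (my translation, not code from the thread):

import static com.mongodb.client.model.Accumulators.avg;
import static com.mongodb.client.model.Aggregates.*;
import static com.mongodb.client.model.Filters.eq;

import com.mongodb.client.MongoCollection;
import java.util.Arrays;
import org.bson.Document;

// Match, unwind the rating array, then group by the query string and
// average rating.inq (this assumes inq is stored as a number; $avg
// ignores non-numeric values such as strings).
void averageInq(MongoCollection<Document> c) {
    c.aggregate(Arrays.asList(
            match(eq("query", "iPad")),
            unwind("$rating"),
            group("$query", avg("av", "$rating.inq"))
    )).forEach(doc -> System.out.println(doc.toJson()));
}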
Try the Java driver for MongoDB. I found this link on the MongoDB site, please check it: http://docs.mongodb.org/ecosystem/tutorial/use-aggregation-framework-with-java-driver/#java-driver-and-aggregation-framework

How to construct query to update nested array document in mongo?

I have the following document in Mongo:
{
    "_id" : ObjectId("506e9e54a4e8f51423679428"),
    "description" : "ffffffffffffffff",
    "menus" : [
        {
            "_id" : ObjectId("506e9e5aa4e8f51423679429"),
            "description" : "ffffffffffffffffffff",
            "items" : [
                {
                    "name" : "xcvxc",
                    "description" : "vxvxcvxc",
                    "text" : "vxcvxcvx",
                    "menuKey" : "0",
                    "onSelect" : "1",
                    "_id" : ObjectId("506e9f07a4e8f5142367942f")
                },
                {
                    "name" : "abcd",
                    "description" : "qqq",
                    "text" : "qqq",
                    "menuKey" : "0",
                    "onSelect" : "3",
                    "_id" : ObjectId("507e9f07a4e8f5142367942f")
                }
            ]
        },
        {
            "_id" : ObjectId("506e9e5aa4e8f51423679429"),
            "description" : "rrrrr",
            "items" : [
                {
                    "name" : "xcc",
                    "description" : "vx",
                    "text" : "vxc",
                    "menuKey" : "0",
                    "onSelect" : "2",
                    "_id" : ObjectId("506e9f07a4e8f5142367942f")
                }
            ]
        }
    ]
}
Now, I want to update the following item:
{
    "name" : "abcd",
    "description" : "qqq",
    "text" : "qqq",
    "menuKey" : "0",
    "onSelect" : "3",
    "_id" : ObjectId("507e9f07a4e8f5142367942f")
}
I have the main document id "_id" : ObjectId("506e9e54a4e8f51423679428"), the menu id
"_id" : ObjectId("506e9e5aa4e8f51423679429"), as well as the item id "_id" : ObjectId("507e9f07a4e8f5142367942f"), which is the one to be updated.
I have tried using the following query:
db.collection.update(
    { "_id" : { "$oid" : "506e9e54a4e8f51423679428" },
      "menus._id" : { "$oid" : "506e9e5aa4e8f51423679429" } },
    { "$set" : { "menus.$.items" : { "_id" : { "$oid" : "506e9f07a4e8f5142367942f" } },
                 "menus.$.items.$.name" : "xcvxc66666", ... } },
    false, false);
but it's not working...
The positional operator does not work at that depth ( https://jira.mongodb.org/browse/SERVER-831?focusedCommentId=22438&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel ): menus.$.items.$.name contains two positional placeholders, and even if that were supported, MongoDB's query parser would have no way to know which array match the second $ refers to.
You will need to pull the items out of the schema, update them separately, and then update the root document.
One good way of judging when queries should be done separately is to notice that each menu sounds like a separate entity (or table in a relational database); as such, you should probably update those entities separately from the parent entity (table).
So first you would fetch the root document, walk across its menus on the client side, and then $set that particular menu to the entire item you built on the client side.
Edit
The way I imagine this working client side (in pseudocode, since my Java is a little rusty) is to first fetch the document in an active-record fashion:
doc = db.col.find({ "_id" : { "$oid" : "506e9e54a4e8f51423679428"} ,
"menus._id" : { "$oid" : "506e9e5aa4e8f51423679429"}});
Then you would iterate through the document assigning your values:
foreach(doc.menus as menu_key => menu){
    foreach(menu['items'] as key => item){
        if(item._id == { "$oid" : "507e9f07a4e8f5142367942f"}){
            doc.menus[menu_key]['items'][key]['name'] = "xcvxc66666"
        }
    }
}
And then simply save the doc after all changes are committed:
db.col.save(doc);
This is of course just one way of doing it; it uses the active-record paradigm, which I personally like. The idea is to combine the find with everything else you need to modify on the document, build it up client side, and then send it all down as one single query to your DB.
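For reference, here is a rough Java translation of that pseudocode using the legacy driver types seen elsewhere in this thread. It is only a sketch of the read-modify-save approach, with the ids from the question hard-coded for illustration:

import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import org.bson.types.ObjectId;

// Read-modify-save: fetch the root document, walk menus and items on the
// client, change the matching item, then save the whole document back.
void renameItem(DBCollection col) {
    DBObject doc = col.findOne(
            new BasicDBObject("_id", new ObjectId("506e9e54a4e8f51423679428")));

    BasicDBList menus = (BasicDBList) doc.get("menus");
    for (Object m : menus) {
        DBObject menu = (DBObject) m;
        BasicDBList items = (BasicDBList) menu.get("items");
        for (Object i : items) {
            DBObject item = (DBObject) i;
            if (new ObjectId("507e9f07a4e8f5142367942f").equals(item.get("_id"))) {
                item.put("name", "xcvxc66666");
            }
        }
    }
    col.save(doc); // overwrites the stored document with the modified copy
}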
