Mongo driver updateMany not updating all documents - java

Using Mongo Java Driver version 3.2.1 against MongoDB 3.0.12.
Calling MongoCollection.updateMany(Bson filter, Bson update) returns a result showing all expected documents were modified, however only a portion of the documents were actually updated.
I've tried multiple write concerns (JOURNALED, ACKNOWLEDGED, etc.) with the same result.
Any ideas?
Here is the profile result:
{ "op" : "update", "ns" : "dev.timeSheet", "query" : { "lineItems.task" : ObjectId("53233e85e4b07f573f1d4466") }, "updateobj" : { "$set" : { "lineItems.$.task" : ObjectId("53233e85e4b07f573f1d446d") } }, "nscanned" : 0, "nscannedObjects" : 6733, "nMatched" : 248, "nModified" : 248, "fastmod" : true, "keyUpdates" : 0, "writeConflicts" : 0, "numYield" : 52, "locks" : { "Global" : { "acquireCount" : { "r" : NumberLong(53), "w" : NumberLong(53) } }, "MMAPV1Journal" : { "acquireCount" : { "w" : NumberLong(301) } }, "Database" : { "acquireCount" : { "w" : NumberLong(53) } }, "Collection" : { "acquireCount" : { "W" : NumberLong(53) } } }, "millis" : 50, "execStats" : { }, "ts" : ISODate("2016-08-25T18:17:16.025Z"), "client" : "127.0.0.1", "allUsers" : [ ], "user" : "" }
Update: Also occurs in MongoDB 3.2.9
Direct access calls:
db.timeSheet.find({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}).count()
126
db.timeSheet.updateMany({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}, {'$set': {'lineItems.$.task': ObjectId("53233e85e4b07f573f1d446d")}})
{ "acknowledged" : true, "matchedCount" : 126, "modifiedCount" : 126 }
db.timeSheet.find({'lineItems.task': ObjectId("53233e85e4b07f573f1d4466")}).count()
90

This isn't working the way I expected because the update only changes the first matching lineItem in each time sheet.
https://docs.mongodb.com/manual/reference/operator/update/positional/#up.S
The documentation says: "Remember that the positional $ operator acts as a placeholder for the first match of the update query document."
There is currently no feature to update all matching items in an embedded array; see the open ticket:
https://jira.mongodb.org/browse/SERVER-1243
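To make the 126 to 90 drop above easier to see, here is a small in-memory sketch of the "first match only" behaviour. It is a toy model, not driver code: the class and method names are made up for illustration, and each document is reduced to a plain list of task ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy in-memory model of the positional $ operator; NOT MongoDB driver code. */
final class PositionalUpdateModel {

    /**
     * Mimics one updateMany pass with {$set: {"lineItems.$.task": newTask}}:
     * in every document that contains oldTask, only the FIRST matching
     * element is replaced. Returns the number of documents modified.
     */
    static int updateManyOnce(List<List<String>> docs, String oldTask, String newTask) {
        int modified = 0;
        for (List<String> lineItemTasks : docs) {
            int first = lineItemTasks.indexOf(oldTask);
            if (first >= 0) {
                lineItemTasks.set(first, newTask);
                modified++;
            }
        }
        return modified;
    }

    /** Repeats the pass until nothing matches; returns how many passes modified something. */
    static int updateUntilDone(List<List<String>> docs, String oldTask, String newTask) {
        if (oldTask.equals(newTask)) {
            return 0; // nothing to do, and looping would never terminate
        }
        int passes = 0;
        while (updateManyOnce(docs, oldTask, newTask) > 0) {
            passes++;
        }
        return passes;
    }

    /** Counts documents that still contain the given task in any line item. */
    static long countMatching(List<List<String>> docs, String task) {
        return docs.stream().filter(d -> d.contains(task)).count();
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(new ArrayList<>(Arrays.asList("old", "x")));
        docs.add(new ArrayList<>(Arrays.asList("old", "old")));
        docs.add(new ArrayList<>(Arrays.asList("y")));
        System.out.println(updateManyOnce(docs, "old", "new")); // 2 documents modified
        System.out.println(countMatching(docs, "old"));         // 1 document still matches
    }
}
```

Against a real collection, the same workaround would presumably be to call updateMany in a loop until the result's getModifiedCount() is 0. As far as I know, MongoDB 3.6 and later also offer the filtered positional operator $[<identifier>] with arrayFilters, which would avoid the loop, but that is not available in the versions discussed here.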

Related

Elastic termsQuery not giving expected result

I have an index in which every object has a status field that takes one of a set of predefined values. I want to fetch all objects whose status is INITIATED, UPDATED, or DELETED. I built the query in Java using QueryBuilders and NativeSearchQuery, and execute it through ElasticsearchOperations. This is the query as printed to the console:
{
"bool" : {
"must" : [
{
"terms" : {
"status" : [
"INITIATED",
"UPDATED",
"DELETED"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
My index contains documents with status 'INITIATED', but the query returns none of them. How can I fix this query?
If you need any more information, please let me know.
Update: code added
NativeSearchQueryBuilder nativeSearchQueryBuilder=new NativeSearchQueryBuilder();
QueryBuilder singleQb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("status", statusList));
Pageable pageable = PageRequest.of(0, 1, Sort.by(Defs.START_TIME).ascending());
FieldSortBuilder sort = SortBuilders.fieldSort(Defs.START_TIME).order(SortOrder.ASC);
nativeSearchQueryBuilder.withQuery(singleQb);
nativeSearchQueryBuilder.withSort(sort);
nativeSearchQueryBuilder.withPageable(pageable);
nativeSearchQueryBuilder.withIndices(Defs.SCHEDULED_MEETING_INDEX);
nativeSearchQueryBuilder.withTypes(Defs.SCHEDULED_MEETING_INDEX);
NativeSearchQuery searchQuery = nativeSearchQueryBuilder.build();
List<ScheduledMeetingEntity> scheduledList=elasticsearchTemplate.queryForList(searchQuery, ScheduledMeetingEntity.class);
Update 2: sample data:
I got this from kibana query on this index:
"hits" : [
{
"_index" : "index_name",
"_type" : "type_name",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-03-03T13:09:59.198",
"createTimeInMs" : 1614755399198,
"createdBy" : "user1@domain.com",
"editTime" : "2021-03-03T13:09:59.198",
"editTimeInMs" : 1614755399198,
"editedBy" : "user1@domain.com",
"versionId" : 1,
"id" : "1",
"meetingId" : "47",
"userId" : "129",
"username" : "user1@domain.com",
"recipient" : [
"user1@domain.com"
],
"subject" : "subject",
"body" : "hi there",
"startTime" : "2021-03-04T07:26:00.000",
"endTime" : "2021-03-04T07:30:00.000",
"meetingName" : "name123",
"meetingPlace" : "placeName",
"description" : "sfsafsdafsdf",
"projectName" : "",
"status" : "INITIATED",
"failTry" : 0
}
}
]
First, check your mapping:
GET /yourIndexName/_mapping
and see how the status field is mapped.
A terms query matches exact terms, so the field needs a keyword mapping for it to work:
{
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
Elasticsearch can create the mapping for you automatically when you first index a document, but you get finer control if you define the mapping yourself.
Either way, you need a keyword mapping for your status field. With the mapping above, the exact value is indexed under status.keyword, so that is the field the terms query should target.
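Why the keyword mapping matters can be shown with a rough model. This is a sketch under assumptions, not Elasticsearch client code: the class and method names are invented, and the "analyzer" here is only a crude stand-in (split on non-alphanumerics, then lowercase) for what a text field does at index time.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of term matching on "text" vs "keyword" fields; NOT Elasticsearch code. */
final class TermMatchModel {

    /** Crude stand-in for an analyzed text field: split into tokens and lowercase them. */
    static Set<String> indexAsText(String value) {
        Set<String> tokens = new HashSet<>();
        for (String token : value.split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** A keyword field indexes the whole value, unchanged, as a single term. */
    static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    /** A terms query is not analyzed: it matches only if a query term equals an indexed term exactly. */
    static boolean termsMatch(Set<String> indexedTerms, List<String> queryTerms) {
        for (String queryTerm : queryTerms) {
            if (indexedTerms.contains(queryTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> statusList = Arrays.asList("INITIATED", "UPDATED", "DELETED");
        System.out.println(termsMatch(indexAsText("INITIATED"), statusList));    // false
        System.out.println(termsMatch(indexAsKeyword("INITIATED"), statusList)); // true
    }
}
```

In the model, the text field only holds the lowercased token "initiated", so the upper-case query terms never match it, while the keyword field holds "INITIATED" verbatim.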
=====================
Alternative solution (case-insensitive):
If you have a field named status and the values you want to search for are INITIATED, UPDATED, or DELETED, you can use a query string query like this:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must(createStringSearchQuery());
public QueryBuilder createStringSearchQuery(){
QueryStringQueryBuilder queryBuilder = QueryBuilders.queryStringQuery(" INITIATED OR UPDATED OR DELETED ");
queryBuilder.defaultField("status");
return queryBuilder;
}
Printing the QueryBuilder:
{
"query_string" : {
"query" : "INITIATED OR UPDATED OR DELETED",
"default_field" : "status",
"fields" : [ ],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}

How to get limited results from Multiple Queries (x records from query 1 and y records from query 2) in Elastic Search

I have two distinct queries, and I want to get a combined result from them, i.e. x number of records from query 1 and y number of records from query 2.
How do I do that?
Currently I am using the query below:
{ "bool" : {
"should" : [
{
"multi_match" : {
"query" : "indea",
"fields" : [
"admin^1.0",
"city^1.0",
"country^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"fuzziness" : "2",
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
},
{
"multi_match" : {
"query" : "Maharashtra",
"fields" : [
"admin^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
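A single bool/should query returns one list ranked by combined score, so it has no per-clause quota. One possible approach, assuming two separate searches are acceptable, is to run each query on its own with size x and size y (for example in one _msearch request) and merge the hits on the client. The merge step could look like the sketch below; the class name, the method name and the use of plain id strings are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Client-side merge of two ranked hit lists; ids stand in for full search hits. */
final class CombinedHits {

    /**
     * Takes up to x ids from the first list, then up to y ids from the second,
     * skipping ids that were already taken so no document appears twice.
     */
    static List<String> combine(List<String> first, int x, List<String> second, int y) {
        Set<String> merged = new LinkedHashSet<>(); // keeps insertion order
        addUpTo(merged, first, x);
        addUpTo(merged, second, y);
        return new ArrayList<>(merged);
    }

    private static void addUpTo(Set<String> merged, List<String> hits, int limit) {
        int taken = 0;
        for (String id : hits) {
            if (taken == limit) {
                break;
            }
            if (merged.add(id)) { // false when the id is already present
                taken++;
            }
        }
    }

    public static void main(String[] args) {
        List<String> query1Hits = Arrays.asList("a", "b", "c");
        List<String> query2Hits = Arrays.asList("b", "d", "e");
        System.out.println(combine(query1Hits, 2, query2Hits, 2)); // [a, b, d, e]
    }
}
```

Because duplicates are skipped, the second query may need to be requested with a size somewhat larger than y to still fill its quota.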

How can I retrieve the value from a column instead of the link using Spring Data JPA?

I'm using a projection to retrieve a list of matches with the teams inline.
Projection:
@Projection(name = "matchInlineTeams", types = { Match.class })
public interface MatchInlineTeams {
Team getHomeTeam();
Long getHomeTeamGoals();
Long getAwayTeamGoals();
Team getAwayTeam();
}
And my result is a collection of these:
{
"homeTeam" : {
"teamName" : "Banfield",
"teamFoundation" : "1896-01-21T03:00:00.000+0000",
"teamCity" : 73,
"teamCountry" : "ARG",
"handler" : { },
"hibernateLazyInitializer" : { }
},
"homeTeamGoals" : 2,
"awayTeamGoals" : 0,
"awayTeam" : {
"teamName" : "Gimnasia (LP)",
"teamFoundation" : "1887-06-03T03:00:00.000+0000",
"teamCity" : 76,
"teamCountry" : "ARG",
"handler" : { },
"hibernateLazyInitializer" : { }
},
"_links" : {
"self" : {
"href" : "http://localhost:8080/matches/1"
},
"match" : {
"href" : "http://localhost:8080/matches/1{?projection}",
"templated" : true
},
"goals" : {
"href" : "http://localhost:8080/matches/1/goals"
},
"homeTeam" : {
"href" : "http://localhost:8080/matches/1/homeTeam"
},
"competition" : {
"href" : "http://localhost:8080/matches/1/competition"
},
"matchStadium" : {
"href" : "http://localhost:8080/matches/1/matchStadium"
},
"awayTeam" : {
"href" : "http://localhost:8080/matches/1/awayTeam"
}
}
}
I need to do many calculations for a stats app, and the logic lives in the front end. To build the match history between two teams I make this request, and it takes about a second to retrieve everything, which is fine.
My problem now is that I want to build a table from the match history, so I can't request only the matches between two teams; I have to request all matches in which a team participated.
With that request I get 3500 matches instead of 200, and the response takes around 20 seconds to build.
I'm guessing that is because the API returns all links and resolves both teams for each object. That is fine in general, but I don't need it here. Is there a way to create a projection (or any other class) that returns the literal value of my column instead of resolving the object reference?
I want my result to be like this:
{
"homeTeam" : 10,
"homeTeamGoals" : 2,
"awayTeamGoals" : 0,
"awayTeam" : 36,
"_links" : {
"self" : {
"href" : "http://localhost:8080/matches/1"
},
"match" : {
"href" : "http://localhost:8080/matches/1{?projection}",
"templated" : true
}
}
}
Once my table is built, I will call the teams endpoint to resolve the team names.
What I really need is to make this faster (around 20 times faster). If this is not the right path, I would very much appreciate a suggestion.

MongoDB (java) takes a long time for a find

I have a MongoDB database with the following collection stats:
{
"ns" : "pourmoi.featurecount",
"count" : 12152142,
"size" : 1361391264,
"avgObjSize" : 112,
"numExtents" : 19,
"storageSize" : 1580150784,
"lastExtentSize" : 415174656,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 1165210816,
"indexSizes" : {
"_id_" : 690111632,
"feature_1" : 475099184
},
"ok" : 1
}
and a Java program that runs a find and returns around 50 results.
This is the query:
db.featurecount.find(
{ "$or" : [ { "feature" : "hello"}, { "feature" : "how"},
{ "feature" : "are"} , { "feature" : "you"} ]})
.sort({count: -1}).limit(20)
This query takes at least 30 seconds. Is it possible to make it run faster?
PS: The MongoDB server runs on localhost.

Poor MongoDB read performance

I have a sharded collection containing flight information. The schema looks something like:
{
"_id" : ObjectId("537ef1bb5516dd401b5b109a"),
"departureAirport" : "HAJ",
"arrivalAirport" : "AYT",
"departureDate" : NumberLong("1412553600000"),
"operatingAirlineCode" : "DE",
"operatingFlightNumber" : "1808",
"flightClass" : "P",
"fareType" : "EX",
"availability" : "*"
}
Here are the statistics of my collection:
{
"sharded" : true,
"systemFlags" : 1,
"userFlags" : 1,
"ns" : "flights.flight",
"count" : 2809822,
"numExtents" : 30,
"size" : 674357280,
"storageSize" : 921788416,
"totalIndexSize" : 287746144,
"indexSizes" : {
"_id_" : 103499984,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 184246160
},
"avgObjSize" : 240,
"nindexes" : 2,
"nchunks" : 869,
"shards" : {
"shard0000" : {
"ns" : "flights.flight",
"count" : 1396165,
"size" : 335079600,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 144633440,
"indexSizes" : {
"_id_" : 53094944,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 91538496
},
"ok" : 1
},
"shard0001" : {
"ns" : "flights.flight",
"count" : 1413657,
"size" : 339277680,
"avgObjSize" : 240,
"storageSize" : 460894208,
"numExtents" : 15,
"nindexes" : 2,
"lastExtentSize" : 124993536,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 143112704,
"indexSizes" : {
"_id_" : 50405040,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 92707664
},
"ok" : 1
}
},
"ok" : 1
}
I now run queries from Java that look like this:
{
"departureAirport" : "BSL",
"arrivalAirport" : "SMF",
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
"flightClass" : "C",
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
"fareType" : "OW"
}
The departureDate must fall within a one-week range, and availability must be greater than or equal to the requested number, or '*'.
My question is: what can I do to improve performance? When I query the database with 50 connections per host I only get about 1000 ops/s, but I need about 3000 to 5000 ops/s.
The cursor looks okay when I run the query in the shell:
"cursor" : "BtreeCursor departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1"
If I forgot something, please let me know. Thanks in advance.
The fact that a BtreeCursor is used doesn't by itself make the query OK. The full output of explain() would help to identify the issue.
I guess a key problem is the order of your query params:
// equality, good
"departureAirport" : "BSL",
// equality, good
"arrivalAirport" : "SMF",
// range, bad because index based range queries should be near the end
// of contiguous index-based equality checks
"departureDate" : {
"$gte" : 1402617600000,
"$lte" : 1403136000000
},
// what is this, and how many possible values does it have? Seems to be
// a low selectivity index -> remove from index and move to end
"flightClass" : "C",
// costly $or, one op. is a range query, the other one equality...
// Simply set 'availability' to a magic number instead. That's
// ugly, but optimizations are ugly and it's unlikely we see planes with
// over e.g. 900,000 seats in the next couple of decades...
"$or" : [
{ "availability" : { "$gte" : "3"}},
{ "availability" : "*"}
] ,
// again, looks like low selectivity to me. Since it's already at the end,
// that's ok. I'd try to remove it from the index, however.
"fareType" : "OW"
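The "magic number" idea from the comments can be sketched roughly as follows. The class name, the sentinel value 900_000 and the helper names are assumptions for illustration, and the sketch presumes availability would be stored as an integer. As long as it is stored as a string, { "$gte" : "3" } is a lexicographic comparison, so "10" sorts before "3" and "*" sorts before every digit, which is what forces the $or in the first place.

```java
/** Sketch: encode availability as an int so one $gte can replace the $or. */
final class AvailabilityEncoding {

    /** Hypothetical sentinel meaning "unlimited seats" (stored instead of "*"). */
    static final int UNLIMITED = 900_000;

    /** Converts the string form used in the documents to a sortable int. */
    static int encode(String availability) {
        return "*".equals(availability) ? UNLIMITED : Integer.parseInt(availability);
    }

    /** With encoded values, a single numeric >= check covers both former $or branches. */
    static boolean hasSeats(int encodedAvailability, int requestedSeats) {
        return encodedAvailability >= requestedSeats;
    }

    public static void main(String[] args) {
        // String comparison pitfall: lexicographically, "10" is smaller than "3".
        System.out.println("10".compareTo("3") < 0);       // true
        System.out.println(hasSeats(encode("10"), 3));     // true
        System.out.println(hasSeats(encode("*"), 3));      // true
        System.out.println(hasSeats(encode("2"), 3));      // false
    }
}
```

With values stored this way, the query condition would shrink to a single range such as { "availability" : { "$gte" : 3 } }.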
You might want to change your index to something like
"departureAirport_1_arrivalAirport_1_departureDate_1_availability_1"
and query in that same order. Put all other criteria after these, so that only documents that already matched the indexed criteria need to be scanned.
I'm assuming that flightClass and fareType have low selectivity. If that is not true, this won't be the best solution.
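To illustrate the point about ranges and equality checks, here is a deliberately simplified model. It assumes that a compound index scan can only be bounded up to and including the first range field, with later fields checked key by key; the real query planner is more sophisticated, so treat the numbers as an illustration only. All names and the sample data are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: how many index keys a bounded scan touches for two field orders. */
final class IndexOrderModel {

    static final class Flight {
        final String route;
        final long date;
        final String flightClass;

        Flight(String route, long date, String flightClass) {
            this.route = route;
            this.date = date;
            this.flightClass = flightClass;
        }
    }

    /** Index (route, date, flightClass): the scan is bounded by route and date only. */
    static long keysExaminedRangeBeforeEquality(List<Flight> flights, String route, long from, long to) {
        return flights.stream()
                .filter(f -> f.route.equals(route) && f.date >= from && f.date <= to)
                .count();
    }

    /** Index (route, flightClass, date): all three predicates bound the scan. */
    static long keysExaminedEqualityBeforeRange(List<Flight> flights, String route,
                                                String flightClass, long from, long to) {
        return flights.stream()
                .filter(f -> f.route.equals(route) && f.flightClass.equals(flightClass)
                        && f.date >= from && f.date <= to)
                .count();
    }

    /** Two routes, ten dates each, three classes per date: 60 made-up index entries. */
    static List<Flight> sampleFlights() {
        List<Flight> flights = new ArrayList<>();
        for (String route : new String[] {"BSL-SMF", "HAJ-AYT"}) {
            for (long date = 1; date <= 10; date++) {
                for (String flightClass : new String[] {"C", "P", "Y"}) {
                    flights.add(new Flight(route, date, flightClass));
                }
            }
        }
        return flights;
    }

    public static void main(String[] args) {
        List<Flight> flights = sampleFlights();
        System.out.println(keysExaminedRangeBeforeEquality(flights, "BSL-SMF", 3, 5));      // 9
        System.out.println(keysExaminedEqualityBeforeRange(flights, "BSL-SMF", "C", 3, 5)); // 3
    }
}
```

In this model both orders return the same three flights, but the order with the range in the middle touches three times as many keys. The explain output on the real collection is the way to check whether the same effect shows up there.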
