Get matched index value of array in MongoDB Java

I am using MongoDB with Java and my documents look like:
{
  "_id": ObjectId("abcd1234rf54"),
  "createdDate": "12/11/15",
  "type": 1,
  "nameIdentity": [
    { "name": "a" },
    { "name": "b" },
    { "name": "c" }
  ]
}
Where nameIdentity is an array of name documents. I am trying to query on name and find the index of the matched element.
For example, my query is: Document resultDocument = mongoDatabase.getCollection("test").find(new Document("nameIdentity.name", "b")).first();
When this query is executed it gives me the matched document. But I also want the index of the match, i.e. at what position in the array the match occurred. Is this possible with this approach, or is there some other way to do it? Any suggestions are highly appreciated.
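One way this can be done, sketched below, is an aggregation pipeline with the $indexOfArray operator (available since MongoDB 3.4). The collection name "test" and the search value "b" are taken from the question; matchedIndex is a made-up name for the output field:

import java.util.Arrays;
import org.bson.Document;
import com.mongodb.client.MongoCollection;

MongoCollection<Document> collection = mongoDatabase.getCollection("test");

// "$nameIdentity.name" resolves to the array of name values, e.g. ["a", "b", "c"],
// and $indexOfArray returns the zero-based position of "b" in it, or -1 if absent.
Document result = collection.aggregate(Arrays.asList(
    new Document("$match", new Document("nameIdentity.name", "b")),
    new Document("$project", new Document("matchedIndex",
        new Document("$indexOfArray", Arrays.asList("$nameIdentity.name", "b"))))
)).first();

Integer matchedIndex = result == null ? null : result.getInteger("matchedIndex");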

Related

optimize mongo query to get max date in a very short time

I'm using the query below to get the max date (a field named extractionDate) in a collection called kpi, since I'm only interested in the field extractionDate:
@Override
public Mono<DBObject> getLastExtractionDate(MatchOperation matchOperation,
                                            ProjectionOperation projectionOperation) {
    return Mono.from(mongoTemplate.aggregate(
        newAggregation(
            matchOperation,
            projectionOperation,
            group().max(EXTRACTION_DATE).as("result"),
            project().andExclude("_id")
        ),
        "kpi",
        DBObject.class
    ));
}
As you can see above, I first filter the results using the match operation (matchOperation); after that I do a projection to keep only the field extractionDate, take its max in a group stage, and rename it as result.
But this query costs a lot of time (sometimes more than 20 seconds) because I have a huge amount of data. I already added an index on the field extractionDate but did not gain much, so I'm looking for a way to make it as fast as possible.
Update:
Number of documents in the collection kpi: 42.8M.
The query being executed:
Streaming aggregation: [{ "$match" : { "type" : { "$in" : ["INACTIVE_SITE", "DEVICE_NOT_BILLED", "NOT_REPLYING_POLLING", "MISSING_KEY_TECH_INFO", "MISSING_SITE", "ACTIVE_CIRCUITS_INACTIVE_RESOURCES", "INCONSISTENT_STATUS_VALUES"]}}}, { "$project" : { "extractionDate" : 1, "_id" : 0}}, { "$group" : { "_id" : null, "result" : { "$max" : "$extractionDate"}}}, { "$project" : { "_id" : 0}}] in collection kpi
The explain plan, an example document from the collection KPI, and the indexes that already exist on the collection were attached as screenshots (not reproduced here).
Index tuning will depend more on the properties in the $match expression. You should be able to run the query in mongosh and get an explain plan to determine whether your query is scanning the whole collection.
Another thing to consider is the size of the collection versus the working set of the server.
Perhaps update your question with the $match expression, the explain plan, and the current set of index definitions, and we can refine the indexing strategy.
Finally, "huge" is rather subjective: are you querying millions or billions of documents, and what is the average document size?
Update:
Given that you're filtering on only one field and aggregating on one field, you'll find the best result will be an index:
{ "type": 1, "extractionDate": 1 }
That index should cover your query: the $in means a scan will be selected, but a scan over a small index is significantly better than a scan over the whole collection of documents.
NB. The existing index extractionDate_1_customer.irType_1 will not be any help for this query.
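A minimal sketch of creating the suggested index with the MongoDB Java driver (assuming collection refers to the kpi collection):

import com.mongodb.client.model.Indexes;

// Equality/$in field first, then the aggregated field, so the
// query can be answered from the index alone.
collection.createIndex(Indexes.ascending("type", "extractionDate"));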
I was able to optimize the request, thanks to the previous answers, using this approach:
@Override
public Mono<DBObject> getLastExtractionDate(MatchOperation matchOperation,
                                            ProjectionOperation projectionOperation) {
    return Mono.from(mongoTemplate.aggregate(
        newAggregation(
            matchOperation,
            sort(Sort.Direction.DESC, EXTRACTION_DATE),
            limit(1),
            projectionOperation
        ),
        "kpi",
        DBObject.class
    ));
}
Also I had to create a compound index on extractionDate and type (the field I had in matchOperation); the exact index definition was shown as a screenshot (not reproduced here).

Elasticsearch won't find any results for a search containing letters

I'm beginning with Elasticsearch in Java.
I can make a query that matches documents having a property containing a given text. This property is a string.
The results are strange: when I search for numbers in a string I get some results, but as soon as the query contains a letter, no results are returned.
Here is a summary of the current behavior:
I have 2 documents:
{
  "model": "123",
  "serialNumber": "123"
}
and
{
  "model": "123",
  "serialNumber": "TT123"
}
If I search for "123", I get 2 results => OK.
If I search for "TT", I get no results.
I'm using wildcard queries.
Here is a sample of my code:
BoolQueryBuilder bqb = new BoolQueryBuilder();
bqb.should(new WildcardQueryBuilder("serialNumber", "*TT*"));
/*or bqb.should(new WildcardQueryBuilder("serialNumber", "*123*"));*/
return QueryBuilders.filteredQuery(bqb, null);
Does "*tt*" find 2 results? Wildcard queries are not analyzed, but your analyzed index probably is so the index contains "tt123", which will not match "*TT*" but "*tt*" will.
That said, Wildcards are slow, you should should look into other analyzers, such as ngram to create your index.
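As a rough illustration, index settings along these lines could be used (the analyzer and tokenizer names here are made up, and the exact settings/mappings syntax depends on your Elasticsearch version):

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": { "type": "ngram", "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "serialNumber": { "type": "text", "analyzer": "ngram_analyzer" }
    }
  }
}

With serialNumber indexed as lowercased ngrams, a plain (analyzed) match query for "TT" can replace the slow wildcard query.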

Solr: How to index a field in a document as a JSON field?

I know we can index a document as JSON, but I want to index a field inside my document as JSON.
e.g.
{
  "id": "Person1",
  "name": "bob",
  "associatedCompanies": [
    { "companyName": "apple", "companyId": "c1" },
    { "companyName": "google", "companyId": "c2" }
  ]
}
I can make the associatedCompanies field an array by declaring it as multiValued in the schema. But how can I add each company element as JSON?
I don't think the parent-child example applies here, since in this use case the nested JSON element is not exactly the same kind of thing as the document. I just want to add a JSON element to my document.
Does anyone have any idea how this can be indexed? And how to query with such an index? Is it possible to do a query like the ones below?
id:person AND name:bob AND associatedCompanies:[{ companyName:"apple", companyId:"c1" }]
or
id:person AND name:bob AND associatedCompanies:[{ companyName:"apple" }]
For the second query, will the response include the document having the apple company?
Try out Solr Nested Documents and Block Join Queries.
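A hedged sketch of how that could look (field names follow the question; the _childDocuments_ key is Solr's JSON convention for child documents, and the type discriminator field on the parent is an assumption so the block join parser can tell parents from children). Indexing a parent with its companies as child documents:

[
  {
    "id": "Person1",
    "name": "bob",
    "type": "person",
    "_childDocuments_": [
      { "id": "Person1_c1", "companyName": "apple", "companyId": "c1" },
      { "id": "Person1_c2", "companyName": "google", "companyId": "c2" }
    ]
  }
]

Querying parents by matching their children with the block join parent parser:

q={!parent which="type:person"}(companyName:apple AND companyId:c1)

This returns the parent document, and it can be narrowed further with a filter on parent fields, e.g. fq=name:bob. A child query on companyName alone (the second query in the question) would likewise return the parent with the apple company.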

mongoDB: $inc of a nonexistent document in an array

I was not able to write code which would increment a non-existent value in an array.
Let's consider the following structure in a mongo collection. (This is not the actual structure we use, but it reproduces the issue.)
{
  "_id": ObjectId("527400e43ca8e0f79c2ce52c"),
  "content": "Blotted Science",
  "tags_with_ratings": [
    {
      "ratings": {
        "0": 6154,
        "1": 4974
      },
      "tag_name": "math_core"
    },
    {
      "ratings": {
        "0": 154,
        "1": 474
      },
      "tag_name": "progressive_metal"
    }
  ]
}
Example issue: we want to increment a rating for a tag which is not yet present in the tags_with_ratings array, for example the "0" value for the tag_name "dubstep".
So the expected behaviour would be that mongo upserts a document like this into the tags_with_ratings array:
{
  "ratings": {
    "0": 1
  },
  "tag_name": "dubstep"
}
At the moment we need one read operation which checks whether the nested document for the tag is there. If it isn't, we pull the tags_with_ratings array out, create a new one, re-add the values from the previous one, and add the new nested document. Shouldn't we be able to do this with one upsert operation, without the expensive read?
Incrementing the values takes up 90% of the process, and more than half of that is consumed by reading, because we are unable to use $inc's ability to create an attribute when it does not exist in the array.
You cannot achieve what you want with one step using this schema.
You could do it, however, if you used the tag_name value as the key name instead of ratings, but then you may have a different issue when querying.
If the tag_name value were the field name (replacing ratings), you'd have {"dubstep": {"0": 1}} instead of { "ratings": {"0": 1}, "tag_name": "dubstep" }, which you can update dynamically the way you want. Just keep in mind that this schema makes querying harder: you have to know the tag names in advance to be able to query by key name.
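A minimal sketch of the resulting single-step update, assuming tags_with_ratings is restructured as an embedded document keyed by tag name rather than an array (the collection name bands is made up):

// $inc creates the whole missing path "tags_with_ratings.dubstep.0",
// so no read-before-write is needed:
db.bands.update(
  { "_id": ObjectId("527400e43ca8e0f79c2ce52c") },
  { "$inc": { "tags_with_ratings.dubstep.0": 1 } }
)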

MongoDB range search on numbers doesn't work as expected

I am trying to do a range search on some numbers in MongoDB.
I have the following two records. The important fields are the last two, value and type. I want to be able to get back records that have a value within some range.
{ "_id" : ObjectId("4eace8570364cc13b7cfa59b"), "sub_id" : -1630181078, "sub_uri" : "http://datetest.com/datetest", "pro_id" : -1630181078, "pro_uri" : "http://datetest.com/datetest", "type" : "number", "value" : "8969.0" }
{ "_id" : ObjectId("4eacef7303642d3d1adcbdaf"), "sub_id" : -1630181078, "sub_uri" : "http://datetest.com/datetest", "pro_id" : -1630181078, "pro_uri" : "http://datetest.com/datetest", "type" : "number", "value" : "3423.0" }
When I do this query:
> db.triples.find({"value":{$gte: 908}});
I expect to get the two records above, but neither of them is returned.
So I try this query instead:
> db.triples.find({"value":{$gte: "908"}});
And again neither of the two expected records is returned (although in this case some other record containing a date is returned).
I can see that there are quotation marks around the numbers - does this mean they are being stored as Strings, and therefore the numeric search doesn't work? I have very explicitly saved them as a Double, hence the ".0" that's appearing.
Or could there be some other reason that the find(... query ...) command isn't working as expected?
EDIT:
Insertion code looks like this (exception handling removed):
t.getObject().getValue() returns a String - this is working fine. I then use this to instantiate a Double, which is what I was hoping would get saved to MongoDB and allow numeric range searches.
triple.put("value", new Double(t.getObject().getValue()));
DBCollection triples = db.getCollection("triples");
triples.update(triple, triple, true, false);
You're right -- they are saved as strings, and $gte will presumably use lexicographical order. Which MongoDB driver are you using? How exactly do you insert those records?
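To illustrate, a small sketch in the mongo shell (collection name from the question). Numeric range operators only match values stored with a numeric BSON type, so strings like "8969.0" are skipped:

// Matches nothing while value is stored as a string:
db.triples.find({ "value": { "$gte": 908 } })
// Finds documents whose value is stored as a string (BSON type 2):
db.triples.find({ "value": { "$type": 2 } })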
