MongoDb - Update collection atomically if set does not exist

I have the following document in my collection:
{
"_id":NumberLong(106379),
"_class":"x.y.z.SomeObject",
"name":"Some Name",
"information":{
"hotelId":NumberLong(106379),
"names":[
{
"localeStr":"en_US",
"name":"some Other Name"
}
],
"address":{
"address1":"5405 Google Avenue",
"city":"Mountain View",
"cityIdInCitiesCodes":"123456",
"stateId":"CA",
"countryId":"US",
"zipCode":"12345"
},
"descriptions":[
{
"localeStr":"en_US",
"description": "Some Description"
}
],
},
"providers":[
],
"some other set":{
"a":"bla bla bla",
"b":"bla,bla bla",
}
"another Property":"fdfdfdfdfdf"
}
I need to run through all documents in the collection and, if "providers": [] is empty, create a new set based on values from the information section.
I'm far from being a MongoDB expert, so I have a few questions:
Can I do it as an atomic operation?
Can I do this using the MongoDB console? As far as I understand, I can do it using the $addToSet and $each operators?
If not, is there any Java-based driver that can provide such functionality?

Can I do it as atomic operation?
Each document is updated atomically. MongoDB has no atomicity in the RDBMS sense, where all operations succeed or fail together, but you can prevent other writes from interleaving by using the $isolated operator.
Can I do this using MongoDB console?
Sure you can. To find all documents with an empty providers array, you can issue a command like:
db.zz.find({providers : { $size : 0}})
To update all documents where the array has zero length with a fixed string, you can issue a query such as:
db.zz.update({providers : { $size : 0}}, {$addToSet : {providers : "zz"}}, {multi : true})
If you want to add a portion to your document based on the document's own data, you can use the notorious $where query (do mind the warnings appearing in that link) or, as you mentioned, query for empty providers arrays and use cursor.forEach().
If not is there any Java based driver that can provide such functionality?
Sure, there is a Java driver, as there is for every other major programming language. It can do practically everything described here, and basically everything you can do from the shell. I suggest you get started from the Java Language Center.
There are also several frameworks that facilitate working with MongoDB and bridge the object-document world. I will not give a list here as I'm pretty biased, but I'm sure a quick Google search will do.

db.so.find({ providers: { $size: 0} }).forEach(function(doc) {
doc.providers.push( doc.information.hotelId );
db.so.save(doc);
});
This will push the information.hotelId of the corresponding document into an empty providers array. Replace that with whatever field you would rather insert into the providers array.
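The read-modify-write loop above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: documents are modelled as java.util.Map instances so that the example runs without a driver, and the names ProviderBackfill, backfill and backfillAll are hypothetical. With a real driver the loop would iterate over a cursor and write each changed document back.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: documents are modelled as Maps, not driver objects.
class ProviderBackfill {

    // Fills an empty "providers" list from the document's own "information" section.
    // Returns true when the document was changed and would need to be written back.
    @SuppressWarnings("unchecked")
    static boolean backfill(Map<String, Object> doc) {
        List<Object> providers = (List<Object>) doc.get("providers");
        Map<String, Object> information = (Map<String, Object>) doc.get("information");
        // Like { providers: { $size: 0 } }, only an existing, empty array qualifies.
        if (providers == null || !providers.isEmpty() || information == null) {
            return false;
        }
        providers.add(information.get("hotelId"));
        return true;
    }

    // Applies backfill to every document and counts how many were modified.
    static int backfillAll(List<Map<String, Object>> docs) {
        int changed = 0;
        for (Map<String, Object> doc : docs) {
            if (backfill(doc)) {
                changed++; // with a real driver, the document would be saved here
            }
        }
        return changed;
    }
}
```

Running backfillAll a second time changes nothing, because the providers arrays are no longer empty. Note that each document is still read and written in two separate steps, so the loop as a whole is not atomic.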

Related

Looking for Java code for Mongo aggregation query

Can somebody guide me on the aggregation query in Java for the following Mongo query. I am trying to sum up the distance covered every day by the vehicle. There are some duplicate records (which I cannot eliminate) so I have to use group by to filter them out.
db.collection1.aggregate({ $match: { "vehicleId": "ABCDEFGH", $and: [{ "timestamp": { $gt: ISODate("2022-08-24T00:00:00.000+0000") } }, { "timestamp": { $lt: ISODate("2022-08-25T00:00:00.000+0000") } }, { "distanceMiles": { "$gt": 0 } }] } }, { $group: {"_id": {vehicleId: "$vehicleId", "distanceMiles" : "$distanceMiles" } } }, { $group: { _id: null, distance: { $sum: "$_id.distanceMiles" } } })
If possible can you also suggest some references? I am stuck at the last group by involving $_id part.
The Java code that I have except the last group by is:
Criteria criteria = new Criteria();
criteria.andOperator(Criteria.where("timestamp").gte(start).lte(end),
Criteria.where("vehicleId").in(vehicleIdList));
Aggregation aggregation = Aggregation.newAggregation(Aggregation.match(criteria),
Aggregation.sort(Direction.DESC, "timestamp"),
Aggregation.project("distanceMiles", "vehicleId", "timestamp").and("timestamp")
.dateAsFormattedString("%Y-%m-%d").as("yearMonthDay"),
Aggregation.group("vehicleId", "yearMonthDay").first("vehicleId").as("vehicleId").
first("timestamp").as("lastReported").sum("distanceMiles").as("distanceMiles"));
Note: there is a slight difference between the raw mongo query and the Java query in the date parameter.

Generally if you are looking for advice on how to directly convert an aggregation pipeline into Java code (not necessarily using the builders), check out this answer.
I'm not really clear on what component you're currently stuck on though. Is it just the direct translation between the aggregation pipeline and the Java code? Is the aggregation pipeline not giving correct results? You haven't mentioned some information such as driver version that would help us advise further if needed.
A few other general things come to mind that might be worth mentioning:
The sample .aggregate() snippet you provided does not have the square brackets ([ and ]) wrapping the pipeline which would be needed in the shell.
When referencing existing field names, you probably need to prefix them with $ in the Java code similar to how you do in the shell.
You should be able to access the values nested inside of the _id field after the first $group stage using dot notation (eg "$_id.distanceMiles") as you are in the sample aggregation.
Depending on which specific driver you are using, documentation such as this may be helpful with respect to working with the builders.
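To make the intent of the two $group stages concrete, here is a minimal plain-Java sketch of what they compute. It does not use the driver or Spring Data; the class name DistanceDedup and the representation of a reading as a (vehicleId, distanceMiles) pair are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the pipeline's semantics, not driver code.
class DistanceDedup {

    // Mirrors the two $group stages: the first collapses duplicate
    // (vehicleId, distanceMiles) pairs, the second sums what is left.
    static double totalDistance(List<Map.Entry<String, Double>> readings) {
        // First $group: one entry per distinct (vehicleId, distanceMiles) pair.
        Set<Map.Entry<String, Double>> distinct = new HashSet<>(readings);
        double sum = 0.0;
        for (Map.Entry<String, Double> pair : distinct) {
            sum += pair.getValue(); // second $group: $sum of "$_id.distanceMiles"
        }
        return sum;
    }
}
```

One consequence visible in the sketch: two genuinely separate readings that happen to have the same distanceMiles are also collapsed into one, exactly as in the original pipeline.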

Specifying keyword type on String field

I started using hibernate-search-elasticsearch (5.8.2) because it seemed easy to integrate and it keeps Elasticsearch indices up to date without any extra code. It's a cool lib, but I'm starting to think that it implements only a very small subset of the Elasticsearch functionality. I'm executing a query with a Painless script filter which needs to access a String field whose type is 'text' in the index mapping, and this is not possible without enabling fielddata. But I'm not very keen on enabling it, as it consumes a lot of heap memory. Here's what the Elasticsearch team suggests doing in my case:
Fielddata documentation
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"my_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Unfortunately I can't find a way to do it with the hibernate-search annotations. Can someone tell me whether this is possible, or whether I have to migrate to the vanilla Elasticsearch lib and stop using any wrappers?

With the current version of Hibernate Search, you need to create a different field for that (e.g. you can't have different flavors of the same field). Note that that's what Elasticsearch is doing under the hood anyway.
@Field(analyzer = "your-text-analyzer") // your default full text search field with the default name
@Field(name="myPropertyAggregation", index = Index.NO, normalizer = "keyword")
@SortableField(forField = "myPropertyAggregation")
private String myProperty;
It should create an unanalyzed field with doc values. You then need to refer to the myPropertyAggregation field for your aggregations.
Note that we will expose many more Elasticsearch features in the API in the future Search 6. In Search 5, the APIs are designed with Lucene in mind and we couldn't break them.

Query Elastic document field with and without characters

I have the following documents stored at my elasticsearch index (my_index):
{
"name": "111666"
},
{
"name": "111A666"
},
{
"name": "111B666"
}
and I want to be able to query these documents using both the exact value of the name field as well as a character-trimmed version of the value.
Examples
GET /my_index/my_type/_search
{
"query": {
"match": {
"name": {
"query": "111666"
}
}
}
}
should return all of the (3) documents mentioned above.
On the other hand:
GET /my_index/my_type/_search
{
"query": {
"match": {
"name": {
"query": "111a666"
}
}
}
}
should return just one document (the one that matches the provided value of the name field exactly).
I didn't find a way to configure the settings of my_index in order to support such functionality (custom search/index analyzers etc..).
I should mention here that I am using ElasticSearch's Java API (QueryBuilders) in order to implement the above-mentioned queries, so I thought of doing it the Java-way.
Logic
1) Check if the provided query-string contains a letter
2) If yes (e.g 111A666), then search for 111A666 using a standard search analyzer
3) If not (e.g 111666), then use a custom search analyzer that trims the characters of the `name` field
Questions
1) Is it possible to implement this by somehow configuring how the data are stored/indexed at Elastic Search?
2) If not, is it possible to conditionally change the analyzer of a field at Runtime? (using Java)

You can easily use any built-in analyzer or any custom analyzer to map your document in Elasticsearch. More information on analyzers is here
The "term" query searches for an exact match. You can find more information about exact matches here (Finding Exact Values)
But you cannot change an index once it has been created. If you want to change an index, you have to create a new index and migrate all your data to it.

Your question is about different logic for the analyzer at index and query time.
The solution for your Q1 is to generate two tokens at index time (111a666 -> [111a666, 111666]) but only one token at query time (111a666 -> 111a666 and 111666 -> 111666).
IMHO you have to create a new analyzer, based on something like
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern_replace-tokenfilter.html, that supports "preserve_original" the way https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-capture-tokenfilter.html does.
Or you could use two fields (one with original and one without letters) and search over both.
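The two-token idea can be sketched in plain Java to show why it gives the requested behaviour. This is not Elasticsearch analyzer code: the class NameTokens and its methods are hypothetical, and the sketch only strips ASCII letters.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the index-time and query-time analysis, not an Elasticsearch API.
class NameTokens {

    // Index time: keep the lowercased original and add a variant with ASCII letters removed.
    static Set<String> indexTokens(String name) {
        String original = name.toLowerCase(Locale.ROOT);
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(original);
        tokens.add(original.replaceAll("[a-z]", ""));
        return tokens;
    }

    // Query time: a single lowercased token, nothing stripped.
    static String queryToken(String query) {
        return query.toLowerCase(Locale.ROOT);
    }

    // A document matches when the query token is among its indexed tokens.
    static boolean matches(String storedName, String query) {
        return indexTokens(storedName).contains(queryToken(query));
    }
}
```

With the three sample documents, the query 111666 matches all of them because each one indexes the letter-free token, while 111a666 matches only 111A666. No conditional analyzer switching at runtime is needed.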

Elastic search handling missing indices

I would like to know if there is a way to specify to elastic search that I don't mind missing or erroneous indices on my search query. In other words I have a query which tries to query 7 different indices but one of them might be missing depending on the circumstances. What I want to know is that if there is a way to say, forget the broken one and get me the results of the other 6 indices?
SearchRequestBuilder builder = elasticsearchClient.getClient().prepareSearch(indices)
.setQuery(Query.buildQueryFrom(term1, term2))
.addAggregation(AggregationBuilders.terms("term")
.field("field")
.shardSize(shardSize)
.size(size)
.minDocCount(minCount));
The above is an example query.

Take a look at the ignore_unavailable option, which is part of the multi index syntax. This has been available since at least version 1.3 and allows you to ignore missing or closed indexes when performing searches (among other multi index operations).
It is exposed in the Java API by IndicesOptions. Browsing through the source code, I found there is a setIndicesOptions() method on the SearchRequestBuilder used in the example. You need to pass it an instance of IndicesOptions.
There are various static factory methods on the IndicesOptions class for building an instance with your specific desired options. You would probably benefit from using the more convenient lenientExpandOpen() factory method (or the deprecated version, lenient(), depending on your version), which sets ignore_unavailable=true, allow_no_indices=true, and expand_wildcards=open.
Here is a modified version of the example query which should provide the behavior you are looking for:
SearchRequestBuilder builder = elasticsearchClient.getClient().prepareSearch(indices)
.setQuery(Query.buildQueryFrom(term1, term2))
.addAggregation(AggregationBuilders.terms("term")
.field("field")
.shardSize(shardSize)
.size(size)
.minDocCount(minCount))
.setIndicesOptions(IndicesOptions.lenientExpandOpen());

Have you tried using Index Aliases?
Rather than referring to individual indexes, you can specify a single alias. Behind it there can be several indexes.
Here I'm adding two indexes to the alias and removing the missing / broken one:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "remove" : { "index" : "bad-index", "alias" : "alias-index" } },
{ "add" : { "index" : "good-index1", "alias" : "alias-index" } },
{ "add" : { "index" : "good-index2", "alias" : "alias-index" } }
]
}'

How to write query with index intersection with Mongo java driver

I googled and read the official MongoDB documentation (http://docs.mongodb.org/manual/core/index-intersection/), but didn't find any tutorial or indication of the syntax for a query that uses index intersection.
Does MongoDB automatically apply index intersection when the query involves two fields that each have their own single-field index? I don't think so.
Here is what cursor.explain() shows when I run a query between two dates and for a given "name" ("name" is a field; both date and name are indexed).
{
"cursor": "BtreeCursor Name_1",
"isMultiKey": false,
"n": 99330,
"nscannedObjects": 337500,
"nscanned": 337500,
"nscannedObjectsAllPlans": 337601,
"nscannedAllPlans": 337705,
"scanAndOrder": false,
"indexOnly": false,
"nYields": 18451,
"nChunkSkips":
"millis": 15430,
"indexBounds": {
"Name": [
[
"blabla",
"blabla"
]
]
},
"allPlans": [
{
"cursor": "BtreeCursor Name_1",
"isMultiKey": false,
"n": 99330,
"nscannedObjects": 337500,
"nscanned": 337500,
"scanAndOrder": false,
"indexOnly": false,
"nChunkSkips": 0,
"indexBounds": {
"Name": [
[
"blabla",
"blabla"
]
]
}
},
{
"cursor": "BtreeCursor Date_1",
"isMultiKey": false,
"n": 0,
"nscannedObjects": 101,
"nscanned": 102,
"scanAndOrder": false,
"indexOnly": false,
"nChunkSkips": 0,
"indexBounds": {
"Date": [
[
"2014-08-23 10:28:50.221",
"2014-08-23 13:28:50.221"
]
]
}
},
{
"cursor": "Complex Plan",
"n": 0,
"nscannedObjects": 0,
"nscanned": 103,
"nChunkSkips": 0
}
The complex plan shows nothing, and the elapsed time is 16s. If I query only by name without the date, it takes only 0.9s.
I want to learn how to write a query using index intersection with the Mongo Java driver, something like hint() in the mongo shell. Any example or tutorial link is welcome.
I know how to write basic queries with the MongoDB Java driver, so you can just post the essential code example if it saves you time.
Thanks in advance.

After reading these links: http://docs.mongodb.org/manual/core/query-plans/#index-filters
https://jira.mongodb.org/browse/SERVER-3071
I have come to the conclusion that there is currently no way to force a query to use index intersection.
In fact, when several candidate indexes are possible for a query, MongoDB runs them in parallel and waits for one index to "win the match". The winning index is the one that completes the whole query first or returns a threshold number of matching results first. MongoDB then uses this index for the query.
If your queries vary widely and you cannot build many compound indexes, you are out of luck: you can only trust MongoDB's test.
Sometimes one index is more selective than another, but that doesn't mean it returns the result more quickly. In my case, the "name" index is more selective and may fetch fewer documents, but it requires a date comparison to determine whether each fetched document matches the whole query. The "date" index, on the other hand, fetches more documents from disk but only needs a simple equality test on the "name" field to determine whether a document matches the query. That is possibly why it can win the test.
As for index intersection, it was never used in any of my query tests. I doubt that it is useful and hope MongoDB improves its performance in a future version.
If my conclusion is wrong, please point it out. Still learning about MongoDB :)

Does mongodb apply automatically index intersection when the query
involves 2 fields which are separately indexed by a single index?
has been answered here: MongoDB index intersection
You can't force MongoDB to apply index intersection; rather, you can modify your queries so that the MongoDB query optimizer is able to apply the index intersection strategy to them.
To learn how your query parameters affect the indexing process, see this link, though it is for compound indexes.
http://java.dzone.com/articles/optimizing-mongodb-compound
And Java API provides two methods to use hint() with the find() operation:
MongoDB Java API
public DBCursor hint(String indexName)
public DBCursor hint(DBObject indexKeys)
Informs the database of indexed fields of the collection in order to
improve performance.
which can be used as below,
DBCursor cursor = collection.find( query ).hint(indexName);
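For readers unsure what index intersection means conceptually, the following plain-Java sketch models it: each single-field index is scanned on its own, and only document ids found by both scans are kept. It is a simplified model under stated assumptions, not MongoDB's implementation; the class IndexIntersection, the map-based indexes and the integer document ids are all hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

// Hypothetical model: each index maps a field value to the ids of documents holding it.
class IndexIntersection {

    // Returns ids of documents whose name equals `name` and whose date lies in [from, to].
    // Assumes from <= to; subMap throws IllegalArgumentException otherwise.
    static Set<Integer> intersect(Map<String, Set<Integer>> nameIndex,
                                  NavigableMap<Long, Set<Integer>> dateIndex,
                                  String name, long from, long to) {
        // Scan 1: equality lookup on the name index.
        Set<Integer> result = new HashSet<>(
                nameIndex.getOrDefault(name, Collections.<Integer>emptySet()));
        // Scan 2: range lookup on the date index (both bounds inclusive).
        Set<Integer> inRange = new HashSet<>();
        for (Set<Integer> ids : dateIndex.subMap(from, true, to, true).values()) {
            inRange.addAll(ids);
        }
        // Intersection: only ids found by both scans survive.
        result.retainAll(inRange);
        return result;
    }
}
```

The sketch also hints at the cost: both scans run in full before any document is fetched, which is one reason a planner may prefer a single index (or a compound index) when one of the two scans returns many entries.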
