Flatten nested arrays using mongodb aggregation in spring - java

I am trying to flatten nested arrays using the aggregation framework, but I cannot get the result I want.
My collection is:
[
{
"id" : "xxx",
"countryName" : "xxx",
"cities" : [
{
"id" : "xxx",
"cityName" : "xxx"
},
{
"id" : "xxx",
"cityName" : "xxx"
}
]
}
]
I want to get the cities from all countries. The result I am looking for is:
[
{
"id" : "xxx",
"cityName" : "xxx"
},
{
"id" : "xxx",
"cityName" : "xxx"
}
]
I tried this request:
val aggregation = Aggregation.newAggregation(
Aggregation.group("cities")
)
return mongoDb.aggregate(aggregation, Country::class.java, Any::class.java).mappedResults
But I got this result:
[
{
"_id": [
{
"id": "xxx",
"cityName": "xxx"
},
{
"id": "xxx",
"cityName": "xxx"
}
]
}
]
Can someone help me please?

This aggregation produces the result you are looking for; you will need to adapt it to the Java driver:
db.countries.aggregate([
{
"$unwind": "$cities"
},
{
"$project": {
"_id": 0,
"cities": 1
}
},
{
"$replaceRoot": {
"newRoot": "$cities"
}
}
])
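To adapt the pipeline to Java, it helps to see what the two essential stages compute. The sketch below is plain Java with no MongoDB dependency: it models each country document as a Map (an assumption made purely for illustration) and mimics $unwind followed by $replaceRoot with flatMap. In Spring Data MongoDB the matching stage builders should be Aggregation.unwind("cities") and Aggregation.replaceRoot("cities"); please check them against the version you use.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of the pipeline; the class and method names are hypothetical.
class FlattenCities {

    // Mimics {$unwind: "$cities"} followed by {$replaceRoot: {newRoot: "$cities"}}:
    // every element of every "cities" array becomes a top-level result.
    // Documents without a "cities" field contribute nothing, as with $unwind.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> flatten(List<Map<String, Object>> countries) {
        return countries.stream()
                .flatMap(c -> ((List<Map<String, Object>>) c.getOrDefault("cities", List.of())).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> countries = List.of(
                Map.of("id", "c1", "countryName", "A", "cities", List.of(
                        Map.of("id", "1", "cityName", "X"),
                        Map.of("id", "2", "cityName", "Y"))),
                Map.of("id", "c2", "countryName", "B", "cities", List.of(
                        Map.of("id", "3", "cityName", "Z"))));
        System.out.println(flatten(countries).size()); // 3 city documents
    }
}
```

The $project stage in the shell pipeline is not modelled here because $replaceRoot already discards every field except the city itself.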

How to write mongo cli query in mongo-template for $in aggregation

This is how my data looks like
{
"_id" : "2011250546437843117",
"name" : "Book",
"textbook" : [
"Maths",
"Science"
],
"language" : [
"English"
],
"isRead" : true,
"isAvailable" : true
}
I have to filter documents based on textbook, and the isRead field should be true or false depending on that.
My mongo query is:
db.user.aggregate([
{
$match: {
"isAvailable": true
}
},
{
$project: {
"textbook": 1,
"name": 1,
"isread": {
$in: [
"Maths",
"$textbook"
]
}
}
}
]);
I have tried to write this using mongo-template:
Aggregation aggregation = newAggregation(match(Criteria.where("isAvailable").is(true)),
project("textbook","name"));
I don't understand how to write the $in operator in the project stage.
Thank you in advance.

MongoDB - spring data - how to pick objects from different arrays into array / list

Object sample:
[
{
"name": "aaa",
"list": [
{
"key": "val1"
},
{
"key": "val2"
},
{
"key": "val3"
},
{
"key": "val4"
}
]
},
{
"name": "bbb",
"list": [
{
"key": "val2"
},
{
"key": "val4"
},
{
"key": "val6"
},
{
"key": "val8"
}
]
}
]
Query: list.key = val1 or val6
Actual results:
[
{"key":"val1"},
{"key":"val2"},
{"key":"val3"},
{"key":"val4"},
{"key":"val2"},
{"key":"val4"},
{"key":"val6"},
{"key":"val8"}
]
Expected results:
[
{"key":"val1"},
{"key":"val6"}
]
I need to pick all objects in list that match the criteria.
@Query(value="{$or :{ 'listKey' : ?0},{ 'listKey' : ?1} }", fields="{ 'listKey' : 1}")
public List<Object> findByListKey(String value,String value2); // val1 or val6
Instead, it retrieves all objects of the list whenever the list contains one of these values.
Any suggestions?
You need to project the matching array elements using the positional $ operator, and for that you need to use $elemMatch in the query.
Use this query:
@Query(value="{ list: {$elemMatch: {$or: [{ 'key': ?0 }, { 'key': ?1 }]}}}", fields="{ 'list.$':1}")
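One caveat worth knowing: the positional $ projection returns only the first array element that satisfied the query condition in each document. The plain-Java sketch below models that behaviour on in-memory Maps; the class and method names are hypothetical, and it returns the bare elements rather than documents wrapping a one-element list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only; not part of any MongoDB or Spring API.
class PositionalProjection {

    // For each document, keep only the FIRST list element whose "key" is wanted,
    // which is how {list: {$elemMatch: ...}} combined with {'list.$': 1} behaves.
    @SuppressWarnings("unchecked")
    static List<Map<String, String>> firstMatches(List<Map<String, Object>> docs, Set<String> wanted) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object raw = doc.get("list");
            if (!(raw instanceof List)) {
                continue; // no array, so the document cannot match $elemMatch
            }
            ((List<Map<String, String>>) raw).stream()
                    .filter(e -> wanted.contains(e.get("key")))
                    .findFirst() // documents without a match contribute nothing
                    .ifPresent(result::add);
        }
        return result;
    }
}
```

With the sample data this yields val1 from "aaa" and val6 from "bbb". If a single document held both values, only the first one in array order would come back, so an aggregation would be needed to return every matching element.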

MongoDB Query to match both single entry and array elements

I have a problem with MongoDB QueryBuilder.
Assume I have a number of documents, that can contain one or more users:
{
"_id": "document1",
"data": {
"user": {
"credentials": {
"name": "John",
"lastname": "Watson",
"middle": "Hemish"
}
}
}
}
{
"_id": "document2",
"data": {
"user": [
{
"credentials": {
"name": "John",
"lastname": "Nicholson",
"middle": "Joseph"
}
},
{
"credentials": {
"name": "Mary",
"lastname": "Watson",
"middle": ""
}
}
]
}
}
{
"_id": "document3",
"data": {
"user": [
{
"credentials": {
"name": "John",
"lastname": "Watson",
"middle": "Hemish"
}
},
{
"credentials": {
"name": "John",
"lastname": "Nicholson",
"middle": "Joseph"
}
},
{
"credentials": {
"name": "Mary",
"lastname": "Watson",
"middle": ""
}
}
]
}
}
What I am trying to write is a query that returns only those documents containing John Watson as a user.
Here is what I have so far:
1.
QueryBuilder qb = QueryBuilder.start("credentials.lastname").is("Watson").and("credentials.name").is("John");
DBObject query = QueryBuilder.start("data.user").elemMatch(qb.get()).get();
This query returns only document3: there is no array in document1 and no match in document2 (but I would like it to return document1 and document3).
2.
DBObject query = QueryBuilder.start("data.user.credentials.lastname").is("Watson").and("data.user.credentials.name").is("John").get();
This one returns all three documents: document1 and document3 are the desired matches, but the query also matches document2, because it has Watson and John in the queried fields of the array, even though they belong to separate entries.
Is there a way to write a correct query that returns document1 and document3 for John Watson?
I am trying to do it in Java, but any other example would be fine.
Right now I use a workaround that combines the results of both queries: first I get limit(100) results from the query with elemMatch(), then, if there are fewer than 100 results, I run the second query and filter out all wrong matches. But I hope there is a better and more efficient way to get those results.
The best I can offer is the following, where each user ends up in an array under the key data as a result of the unwind. A little more work should get you the exact format you want.
I am sharing it because I think it serves the purpose, or at least should help you.
The aggregation query:
db.tuttut.aggregate([
{$unwind:"$data.user"},
{ $project: {
_id:1,
data:1,
temp: {name:"$data.user.credentials.name",
lastname:"$data.user.credentials.lastname"}
} } ,
{ $group:{
_id:"$_id" ,
data: {$addToSet: "$data"} ,
temp:{ $addToSet: "$temp" } } },
{ $match:{ temp:{name:"John",lastname:"Watson"} } } ,
{$project:{_id:1, data:1}}
]).pretty()
Returned Result:
{
"_id" : "document1",
"data" : [
{
"user" : {
"credentials" : {
"name" : "John",
"lastname" : "Watson",
"middle" : "Hemish"
}
}
}
]
}
{
"_id" : "document3",
"data" : [
{
"user" : {
"credentials" : {
"name" : "John",
"lastname" : "Watson",
"middle" : "Hemish"
}
}
},
{
"user" : {
"credentials" : {
"name" : "Mary",
"lastname" : "Watson",
"middle" : ""
}
}
},
{
"user" : {
"credentials" : {
"name" : "John",
"lastname" : "Nicholson",
"middle" : "Joseph"
}
}
}
]
}
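The underlying requirement can also be stated directly: treat data.user as a list even when it is a single embedded document, then require that one and the same user carries both the wanted name and lastname. The following plain-Java sketch models that check on in-memory Maps. It only illustrates the matching logic (names such as UserMatcher are hypothetical) and is not a replacement for running the query on the server.

```java
import java.util.List;
import java.util.Map;

// Illustrative model of the matching rule; not a MongoDB API.
class UserMatcher {

    // Normalises data.user: a single embedded document becomes a one-element list,
    // which mirrors how $unwind treats a non-array field in MongoDB 3.2 and later.
    static List<?> usersOf(Map<String, Object> doc) {
        Object data = doc.get("data");
        if (!(data instanceof Map)) {
            return List.of();
        }
        Object user = ((Map<?, ?>) data).get("user");
        if (user == null) {
            return List.of();
        }
        return user instanceof List ? (List<?>) user : List.of(user);
    }

    // True only if a single user entry carries both the name and the lastname.
    static boolean hasUser(Map<String, Object> doc, String name, String lastname) {
        for (Object u : usersOf(doc)) {
            if (!(u instanceof Map)) {
                continue;
            }
            Object cred = ((Map<?, ?>) u).get("credentials");
            if (cred instanceof Map
                    && name.equals(((Map<?, ?>) cred).get("name"))
                    && lastname.equals(((Map<?, ?>) cred).get("lastname"))) {
                return true;
            }
        }
        return false;
    }
}
```

Calling hasUser(doc, "John", "Watson") on the three sample documents accepts document1 and document3 and rejects document2, because there John and Watson belong to different entries.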

Format date in elasticsearch query (during retrieval)

I have an Elasticsearch index with a field "aDate" (and a lot of other fields) with the following mapping:
"aDate" : {
"type" : "date",
"format" : "date_optional_time"
}
When I query for a document I get a result like:
"aDate" : 1421179734000,
I know this is the epoch, the internal Java/Elasticsearch date format, but I want to have a result like:
"aDate" : "2015-01-13T20:08:54",
I played around with scripting:
{
"query":{
"match_all":{
}
},
"script_fields":{
"aDate":{
"script":"if (!_source.aDate?.equals('null')) new java.text.SimpleDateFormat('yyyy-MM-dd\\'T\\'HH:mm:ss').format(new java.util.Date(_source.aDate));"
}
}
}
but it gives strange results (the script basically works, but aDate is the only field returned and _source is missing). The output looks like:
"hits": [{
"_index": "idx1",
"_type": "type2",
"_id": "8770",
"_score": 1.0,
"fields": {
"aDate": ["2015-01-12T17:15:47"]
}
},
I would prefer a solution without scripting if possible.
When you run a query in Elasticsearch you can request it to return the raw data, for example specifying fields:
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
"fields" : "aDate",
"query":{
"match_all":{
}
}
}'
This will give you the date in the format in which you originally stored it:
{
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlWNTAk1DYhbTcL2xO",
"_score" : 1.0,
"fields" : {
"aDate" : [ "2015-01-13T20:08:56" ]
}
}, {
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlQnFgk1DYhbTcL2xM",
"_score" : 1.0,
"fields" : {
"aDate" : [ 1421179734000 ]
}
It's not possible to change the date format unless you use a script.
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
"query":{
"match_all":{ }
},
"script_fields":{
"aDate":{
"script":"use( groovy.time.TimeCategory ) { new Date( doc[\"aDate\"].value ) }"
}
}
}'
Will return:
{
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlWNTAk1DYhbTcL2xO",
"_score" : 1.0,
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
}, {
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlQnFgk1DYhbTcL2xM",
"_score" : 1.0,
"fields" : {
"aDate" : [ "2015-01-13T20:08:54.000Z" ]
}
}
To apply a format, append it as follows:
"script":"use( groovy.time.TimeCategory ){ new Date( doc[\"aDate\"].value ).format(\"yyyy-MM-dd\") }"
will return "aDate" : [ "2015-01-13" ]
To display the T, you'll need to use quotes but replace them with the Unicode equivalent:
"script":"use( groovy.time.TimeCategory ){ new Date( doc[\"aDate\"].value ).format(\"yyyy-MM-dd\u0027T\u0027HH:mm:ss\") }"
returns "aDate" : [ "2015-01-13T20:08:54" ]
To return script_fields and source
Use _source in your query to specify the fields you want to return:
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{ "_source" : "name",
"query":{
"match_all":{ }
},
"script_fields":{
"aDate":{
"script":"use( groovy.time.TimeCategory ) { new Date( doc[\"aDate\"].value ) }"
}
}
}'
Will return my name field:
"_source":{"name":"Terry"},
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
Using asterisk will return all fields, e.g.: "_source" : "*",
"_source":{"name":"Terry","aDate":1421179736000},
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
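If scripting on the server is not an option, the same formatting can be done on the client after retrieval. This is a minimal sketch in plain Java, assuming the value arrives as epoch milliseconds and should be rendered in UTC; the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Client-side formatting sketch; not part of the Elasticsearch API.
class EpochFormatter {

    // Same pattern as in the format() script above; the literal T is wrapped in single quotes.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneOffset.UTC);

    // Formats an epoch-millisecond timestamp as a UTC date-time string.
    static String format(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(1421179734000L)); // 2015-01-13T20:08:54
    }
}
```

This keeps the stored value untouched, so date range queries and aggregations on aDate continue to work as before.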
Since 5.0.0, Elasticsearch uses Painless as its scripting language.
Try this (works in 6.3.2):
"script":"doc['aDate'].value.toString('yyyy-MM-dd HH:mm:ss')"
As LabOctoCat mentioned, Olly Cruickshank's answer no longer works in Elasticsearch 2.2. I changed the script to:
"script":"new Date(doc['time'].value)"
You can format the date according to this.
Scripting only computes the value when the row is extracted. This is expensive, and keeps you from using any date-related search functions in Elasticsearch.
You should create an Elasticsearch "date" field before inserting the data. It looks like a Java Date() object will do.
Thanks @Archon for your suggestion. I used your answer as a guide to remove the time element from a datetime field in Elasticsearch:
{
"aggs": {
"grp_by_date": {
"terms": {
"size": 200,
"script": "doc['TransactionReconciliationsCreated'].value.toString('yyyy-MM-dd')"
}
}
}
}
If you use Elasticsearch 7 and want to display a datetime in a specified timezone, you can request it like this:
"query": {
"bool": {
"filter": [
{
"term": {
"client": {
"value": "iOS",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"script_fields": {
"time": {
"script": "ZonedDateTime input = doc['time'].value; input = input.withZoneSameInstant(ZoneId.of('Asia/Shanghai')); String output = input.format(DateTimeFormatter.ISO_ZONED_DATE_TIME); return output"
}
},
"_source": true,
This returns:
{
...
"_source" : {
...
"time" : 1632903354213
...
},
"fields" : {
"time" : [
"2021-09-29T16:15:54.213+08:00[Asia/Shanghai]"
]
}
},
...
}
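The Painless script above uses the same java.time classes that are available in regular Java, so its logic can be checked outside Elasticsearch. A small sketch, assuming the stored value is epoch milliseconds (the class and method names are hypothetical):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Stand-alone check of the conversion logic; not an Elasticsearch API.
class ZoneConverter {

    // Converts epoch milliseconds to the given zone and formats it like the script does.
    static String inZone(long epochMillis, String zoneId) {
        ZonedDateTime input = Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of("UTC"));
        ZonedDateTime shifted = input.withZoneSameInstant(ZoneId.of(zoneId));
        return shifted.format(DateTimeFormatter.ISO_ZONED_DATE_TIME);
    }

    public static void main(String[] args) {
        // prints 2021-09-29T16:15:54.213+08:00[Asia/Shanghai]
        System.out.println(inZone(1632903354213L, "Asia/Shanghai"));
    }
}
```

withZoneSameInstant keeps the instant and changes only the displayed zone, which is why _source still shows the original epoch value.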

Elastic search tokenizer &filter for Split the given data

I am struggling to split the data into my expected output and have not been able to get it. I tried all the filters and tokenizers.
I have updated the settings in Elasticsearch as given below.
{
"settings": {
"analysis": {
"filter": {
"filter_word_delimiter": {
"preserve_original": "true",
"type": "word_delimiter"
}
},
"analyzer": {
"en_us": {
"tokenizer": "keyword",
"filter": [ "filter_word_delimiter","lowercase" ]
}
}
}
}
}
Executed query:
curl -XGET "XX.XX.XX.XX:9200/keyword/_analyze?pretty=1&analyzer=en_us" -d 'DataGridControl'
Result:
{
"tokens" : [ {
"token" : "datagridcontrol",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 1
}, {
"token" : "data",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 1
}, {
"token" : "grid",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 2
}, {
"token" : "control",
"start_offset" : 9,
"end_offset" : 16,
"type" : "word",
"position" : 3
} ]
}
Expected result:
DataGridControl
DataGrid
DataControl
Data
grid
control
What type of tokenizer and filter should I add to the index settings?
Any help?
Try this:
{
"settings": {
"analysis": {
"filter": {
"filter_word_delimiter": {
"type": "word_delimiter"
},
"custom_shingle": {
"type": "shingle",
"token_separator":"",
"max_shingle_size":3
}
},
"analyzer": {
"en_us": {
"tokenizer": "keyword",
"filter": [
"filter_word_delimiter",
"custom_shingle",
"lowercase"
]
}
}
}
}
}
and let me know if it gets you any closer.
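To see what this analyzer is likely to emit, here is a rough plain-Java model of the chain: split on case changes (one of the things word_delimiter does), build shingles of up to three adjacent tokens joined with an empty separator, then lowercase. It is a simplified sketch, not the Lucene implementation, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model of word_delimiter + shingle + lowercase; not Lucene code.
class ShingleSketch {

    // Splits at lower-to-upper case transitions, e.g. "DataGridControl" -> Data, Grid, Control.
    static List<String> splitOnCaseChange(String text) {
        return List.of(text.split("(?<=[a-z])(?=[A-Z])"));
    }

    // Emits each token plus shingles of 2..maxSize adjacent tokens, joined without a separator.
    static List<String> analyze(String text, int maxSize) {
        List<String> tokens = splitOnCaseChange(text);
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder shingle = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                shingle.append(tokens.get(start + len - 1));
                out.add(shingle.toString().toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }
}
```

For "DataGridControl" with a maximum size of 3, the model yields data, datagrid, datagridcontrol, grid, gridcontrol and control. Shingles only combine adjacent tokens, so a non-adjacent combination such as DataControl would not be produced this way.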
