How can I find the document that contains a given JSON object?
Example:
Suppose that the collection test contains a document like this:
{
"identification": {
"componentId": "3a4f6199-6141-4179-ac5f-f1bbcf627bb2",
"componentType": "PivotTable",
"dataDate": "2016-06-15T15:29:51.139+0200",
"dataType": "PTF",
"properties": {
"contextId": "0329fe70-92f0-4b60-b3c2-79377adb8f95",
"tags": ["tag1", "tag2"]
}
},
"viewData": {
"lineGroups": []
}
}
Now I am given only part of the identification segment, with just some of its keys set:
{
"componentType": "PivotTable",
"properties": {
"tags": ["tag1"]
}
}
Since the identification part of the document above matches the given identification, that document should be returned.
If I run db.test.find({identification: {/*the given identification segment*/}}), MongoDB compares the identification subdocument as a whole, requiring an exact match of every entry. In this case, that document is not returned.
Is there a relatively straightforward way to do this in the MongoDB query language, or do I have to walk the entries of the identification object recursively in order to construct a query?
Mongo will try to match the WHOLE subdocument (identification, and likewise properties inside it),
so in this case we would have to supply an exact 1:1 document.
Instead, you could flatten every element into its own dot-notation key in the query filter section:
{
"identification.componentType": "PivotTable",
"identification.properties.tags": {$in:["tag1"]}
}
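If you don't want to write such a filter by hand, it can be generated by walking the given object recursively. Below is a minimal Python sketch; to_dot_filter is a hypothetical helper, not part of MongoDB or any driver. As an assumption, it maps arrays to $all, so a document must contain every listed tag (extra tags are allowed); $in, as used above, matches documents containing any one of them, and with a single tag the two behave the same. Arrays of subdocuments and empty arrays are not handled specially.

```python
from typing import Any, Dict


def to_dot_filter(partial: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a partial subdocument into a MongoDB dot-notation filter."""
    flt: Dict[str, Any] = {}
    for key, value in partial.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so nested fields are matched one by one, not as a whole subdocument.
            flt.update(to_dot_filter(value, path))
        elif isinstance(value, list):
            # Require every listed element to be present; extra elements are allowed.
            flt[path] = {"$all": value}
        else:
            # Scalars are compared by plain equality.
            flt[path] = value
    return flt


given = {"componentType": "PivotTable", "properties": {"tags": ["tag1"]}}
query = to_dot_filter(given, prefix="identification")
print(query)
# {'identification.componentType': 'PivotTable', 'identification.properties.tags': {'$all': ['tag1']}}
```

With pymongo, the resulting dict could be passed straight to db.test.find(query).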
Related
I am using a wildcard text index to search for a pattern in every field of my class. I am also using a projection to remove a certain field:
@Query(value = "{$text: { $search: ?0 }}", fields = "{'notWantedField':0}")
However, I would like to prevent matches in the unwanted field.
In other words, I would like to project first (removing fields) and then search on the remaining fields.
Is there a way to combine projection and search while keeping the wildcard search?
Thanks a lot.
I am using spring-data-mongodb 1.10.8
A possible solution could be a $and operator combined with a $regex.
For example, following the MongoDB documentation https://docs.mongodb.com/manual/reference/operator/query/text, suppose you create a text index combining subject and author (db.articles.createIndex({"author": "text", "subject": "text"})). You can then exclude the author field with this query:
db.articles.find( {$and: [{ $text: { $search: "coffee" } }, {"author": {'$regex' : '^((?!coffee).)*$', '$options' : 'i'}}]}, {"author": 0})
In your case, since your index is a wildcard text index, you must add such a regex condition for every field that you also exclude in the projection. The projection only removes fields from the returned documents; it does not affect which documents match.
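The same filter can be generated for any list of excluded fields. The Python sketch below is only an illustration (text_search_excluding is a hypothetical helper name): it builds the $and/$regex filter from the answer above and checks the negative-lookahead pattern locally with Python's re module, which interprets this particular pattern the same way. Two caveats: the regex treats the whole search string as one substring, while $text matches individual (stemmed) words, and documents that lack an excluded field will not match its $regex clause at all.

```python
import re
from typing import Any, Dict, List


def text_search_excluding(term: str, excluded_fields: List[str]) -> Dict[str, Any]:
    """Build a $text filter that also rejects documents whose excluded fields contain the term."""
    # Negative lookahead: matches only strings in which `term` never occurs.
    not_containing = "^((?!" + re.escape(term) + ").)*$"
    clauses: List[Dict[str, Any]] = [{"$text": {"$search": term}}]
    for field in excluded_fields:
        clauses.append({field: {"$regex": not_containing, "$options": "i"}})
    return {"$and": clauses}


flt = text_search_excluding("coffee", ["author"])
pattern = flt["$and"][1]["author"]["$regex"]

# Check the pattern locally (IGNORECASE plays the role of the "i" option).
print(bool(re.match(pattern, "Coffee Lover", re.IGNORECASE)))  # False
print(bool(re.match(pattern, "xyz", re.IGNORECASE)))           # True
```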
I have the following documents stored in my Elasticsearch index (my_index):
{
"name": "111666"
},
{
"name": "111A666"
},
{
"name": "111B666"
}
and I want to be able to query these documents using both the exact value of the name field as well as a character-trimmed version of the value.
Examples
GET /my_index/my_type/_search
{
"query": {
"match": {
"name": {
"query": "111666"
}
}
}
}
should return all of the (3) documents mentioned above.
On the other hand:
GET /my_index/my_type/_search
{
"query": {
"match": {
"name": {
"query": "111a666"
}
}
}
}
should return just one document (the one that exactly matches the provided value of the name field).
I didn't find a way to configure the settings of my_index to support such functionality (custom search/index analyzers etc.).
I should mention that I am using Elasticsearch's Java API (QueryBuilders) to implement the queries above, so I thought of doing it the Java way.
Logic
1) Check if the provided query string contains a letter
2) If yes (e.g. 111A666), then search for 111A666 using a standard search analyzer
3) If not (e.g. 111666), then use a custom search analyzer that trims the characters of the `name` field
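The branching in steps 1-3 is easy to do on the client side. Here is a minimal Python sketch that builds the match query body; with the Java API, the same branch would simply pass a different field name to QueryBuilders.matchQuery. One assumption matters: a search-time analyzer cannot remove letters from values that are already indexed, so step 3 only works against a field whose letters were stripped at index time. name.digits_only below is a hypothetical sub-field of that kind.

```python
import re
from typing import Any, Dict

# Hypothetical sub-field, assumed to be indexed with all letters stripped ("111A666" -> "111666").
STRIPPED_FIELD = "name.digits_only"


def build_name_query(user_input: str) -> Dict[str, Any]:
    # Step 1: does the query string contain an (ASCII) letter?
    if re.search(r"[A-Za-z]", user_input):
        # Step 2: letters present, so match against the original analyzed field only.
        field = "name"
    else:
        # Step 3: digits only, so match against the letter-stripped variant.
        field = STRIPPED_FIELD
    return {"query": {"match": {field: {"query": user_input}}}}


print(build_name_query("111A666"))  # {'query': {'match': {'name': {'query': '111A666'}}}}
print(build_name_query("111666"))   # {'query': {'match': {'name.digits_only': {'query': '111666'}}}}
```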
Questions
1) Is it possible to implement this by somehow configuring how the data is stored/indexed in Elasticsearch?
2) If not, is it possible to conditionally change the analyzer of a field at runtime (using Java)?
You can easily use any built-in or custom analyzer when mapping your documents in Elasticsearch; see the Elasticsearch documentation on analyzers for more information.
The "term" query searches for exact matches. You can find more information about exact matching in "Finding Exact Values".
However, you cannot change the analyzer of an existing field once the index has been created. To change it, you have to create a new index and migrate all your data into it.
Your question is about different logic for the analyzer at index and query time.
The solution for your Q1 is to generate two tokens at index time (111a666 -> [111a666, 111666]) but only one token at query time (111a666 -> 111a666 and 111666 -> 111666).
IMHO you would have to create a new analyzer based on
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern_replace-tokenfilter.html that supports "preserve_original", the way https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-capture-tokenfilter.html does.
Or you could use two fields (one with the original value and one without letters) and search over both.
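The index-time/query-time asymmetry described in this answer can be checked with a plain Python simulation. This is not Elasticsearch code: index_tokens stands in for a hypothetical index-time analyzer that keeps the lowercased original and adds a copy with the letters removed (preserve_original behaviour), and query_token stands in for a search analyzer that only lowercases.

```python
import re
from typing import List, Set


def index_tokens(value: str) -> Set[str]:
    """Index time: keep the lowercased original and add a letters-stripped copy."""
    original = value.lower()
    stripped = re.sub(r"[a-z]", "", original)
    return {original, stripped}


def query_token(text: str) -> str:
    """Query time: a single lowercased token, no stripping."""
    return text.lower()


def search(docs: List[str], text: str) -> List[str]:
    token = query_token(text)
    return [d for d in docs if token in index_tokens(d)]


docs = ["111666", "111A666", "111B666"]
print(search(docs, "111666"))   # ['111666', '111A666', '111B666']
print(search(docs, "111a666"))  # ['111A666']
```

In a mapping, this asymmetry corresponds to giving the field different analyzer and search_analyzer settings.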
I know we can index a document as JSON, but I want to index a field inside my document as JSON.
e.g.
{
id:"Person1",
name:"bob",
associatedCompanies:[
{
companyName:"apple",
companyId:"c1"
},
{
companyName:"google",
companyId:"c2"
}
]
}
I can have the associatedCompanies field as an array by declaring it as multiValued in the schema. But how can I add each company element as JSON?
I don't think the parent-child example applies here, since in this use case the nested JSON element does not have the same structure as the document. I just want to add some JSON elements to my document.
Does anyone know how this can be indexed, and how to query such an index? Is it possible to run a query like the one below?
id:person AND name:bob AND associatedCompanies:[{
companyName:"apple",
companyId:"c1"
}]
or
id:person AND name:bob AND associatedCompanies:[{
companyName:"apple"
}]
In the second query, will the response include the document that has the apple company?
Try out Solr Nested Documents and Block Join Queries.
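To make the nested-documents suggestion concrete, here is a minimal Python sketch that reshapes the person JSON from the question into a parent document with one child document per company, using the _childDocuments_ key of Solr's JSON update format, and builds a Block Join parent query. The doc_type field and the child ids are assumptions: the which filter must match all parent documents and no child documents, and every child needs its own unique id.

```python
import json
from typing import Any, Dict

# Hypothetical field that tells parent (person) documents from child (company) documents.
TYPE_FIELD = "doc_type"


def to_solr_nested(person: Dict[str, Any]) -> Dict[str, Any]:
    """Turn the person JSON into a parent document with company child documents."""
    children = [
        # Each child gets a unique id, derived here from the parent and company ids.
        {"id": f"{person['id']}-{c['companyId']}", TYPE_FIELD: "company", **c}
        for c in person["associatedCompanies"]
    ]
    return {
        "id": person["id"],
        "name": person["name"],
        TYPE_FIELD: "person",
        "_childDocuments_": children,
    }


def parents_with_child(child_query: str) -> str:
    """Block join: select parent documents that have a child matching child_query."""
    return '{!parent which="' + TYPE_FIELD + ':person"}' + child_query


person = {
    "id": "Person1",
    "name": "bob",
    "associatedCompanies": [
        {"companyName": "apple", "companyId": "c1"},
        {"companyName": "google", "companyId": "c2"},
    ],
}
print(json.dumps([to_solr_nested(person)], indent=2))  # body for the JSON update handler
print(parents_with_child("companyName:apple"))
# {!parent which="doc_type:person"}companyName:apple
```

Conditions on the parent, such as name:bob, would go into a separate filter query (for example fq=name:bob). A child clause like companyName:apple alone should return Person1, which is what the second query in the question asks for; to require both company fields on the same child, combine them inside the child clause, e.g. (+companyName:apple +companyId:c1).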
I have a Java application that writes to a log file in json format.
The fields that come in the logs are variable.
The logstash reads this logfile and sends it to Kibana.
I've configured the logstash with the following file:
input {
file {
path => ["[log_path]"]
codec => "json"
}
}
filter{
json {
source => "message"
}
date {
match => [ "data", "dd-MM-yyyy HH:mm:ss.SSS" ]
timezone => "America/Sao_Paulo"
}
}
output {
elasticsearch_http {
flush_size => 1
host => "[host]"
index => "application-%{+YYYY.MM.dd}"
}
}
I've managed to display everything correctly in Kibana without any mapping.
But when I try to create a terms panel that counts the messages sent by each server, I run into a problem.
I have a field called server in my JSON that contains the server's name (like a1-name-server1), but the terms panel splits the server name at each "-".
I would also like to count how many times each error message appears, but the same problem occurs: the terms panel splits the error message at the spaces.
I'm using Kibana 3 and Logstash 1.4.
I've searched a lot on the web and couldn't find any solution.
I also tried using the .raw from logstash, but it didn't work.
How can I manage this?
Thanks for the help.
Your problem here is that your data is being tokenized. Tokenizing is helpful for searching: by default, ES splits the content of a field such as message into separate tokens so that you can search for parts of it. For example, you may want to search for the word ERROR in your logs, and you would probably like to see results such as "There was an error in your cluster" or "Error processing whatever". If the field were not analyzed into tokens, you wouldn't be able to search like this.
This analyzed behaviour is helpful when you want to search, but it doesn't allow you to group different messages that have the same content, which is your use case. The solution is to update your mapping and set not_analyzed on the specific fields that you don't want split into tokens. This will probably work for your server field, but on the message field it will probably break the kind of search described above.
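To see why the panel shows fragments, the Python sketch below mimics the effect with a crude tokenizer (lowercase, split on anything that is not a letter or digit). It is only a rough stand-in for Elasticsearch's standard analyzer, and the server names are made-up examples in the format from the question. Counting tokens produces entries such as a1, name and server1, while counting whole values keeps each server name intact.

```python
import re
from collections import Counter
from typing import List

servers = ["a1-name-server1", "a1-name-server2", "a1-name-server1"]


def rough_tokens(value: str) -> List[str]:
    """Very rough analyzer stand-in: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


# What a terms panel effectively counts on an analyzed field: individual tokens.
analyzed = Counter(t for s in servers for t in rough_tokens(s))
# What it counts on a not_analyzed field: whole values.
untouched = Counter(servers)

print(analyzed)
print(untouched)
```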
What I usually do in these kinds of situations is use index templates and multifields. An index template lets me set a mapping for every index whose name matches a pattern, and multifields let me have both analyzed and not_analyzed behaviour on the same field.
The following request should do the job for your problem:
curl -XPUT https://example.org/_template/name_of_index_template -d '
{
"template": "indexname*",
"mappings": {
"type": {
"properties": {
"field_name": {
"type": "multi_field",
"fields": {
"field_name": {
"type": "string",
"index": "analyzed"
},
"untouched": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Then, in your terms panel, use field_name.untouched so that the entire content of the field is considered when counting the different values.
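Since the question needs this for more than one field (server and the error message), the template body can also be generated. The following Python sketch produces the same structure as the template above for a list of fields. The index pattern application-* matches the index name from the Logstash config; error_message is a hypothetical field name, and using _default_ as the type name (so the mapping applies to every type) is an assumption you should check against your Elasticsearch version.

```python
import json
from typing import Any, Dict, List


def multi_field_template(index_pattern: str, doc_type: str, fields: List[str]) -> Dict[str, Any]:
    """Build the multi_field index-template body for several fields at once."""
    properties = {
        name: {
            "type": "multi_field",
            "fields": {
                # Same name as the field: analyzed version, used for full-text search.
                name: {"type": "string", "index": "analyzed"},
                # name.untouched: the whole value kept as one term, for the terms panel.
                "untouched": {"type": "string", "index": "not_analyzed"},
            },
        }
        for name in fields
    }
    return {"template": index_pattern, "mappings": {doc_type: {"properties": properties}}}


# "error_message" is a hypothetical field name; use the key your application writes.
body = multi_field_template("application-*", "_default_", ["server", "error_message"])
print(json.dumps(body, indent=2))  # payload for the -d argument of the curl PUT above
```

The terms panel would then point at server.untouched or error_message.untouched.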
If you don't want to use index templates (maybe your data is in a single index), setting the mapping with the Put Mapping API would do the job too. And if you use multifields, there is no need to reindex the data, because from the moment that you set the new mapping for the index, the new data will be duplicated in these two subfields (field_name and field_name.untouched). If you just change the mapping from analyzed to not_analyzed you won't be able to see any change until you reindex all your data.
Since you didn't define a mapping in Elasticsearch, the default settings apply to every field of your type in your index. The default for string fields (like your server field) is to analyze the field, meaning that Elasticsearch tokenizes the field contents. That is why it's splitting your server names into parts.
You can overcome this issue by defining a mapping. You don't have to define all your fields, only the ones that you don't want Elasticsearch to analyze. In your particular case, sending the following PUT request will do the trick:
http://[host]:9200/[index_name]/_mapping/[type]
{
"type" : {
"properties" : {
"server" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
You can't do this on an already existing index because switching from analyzed to not_analyzed is a major change in the mapping.
I was not able to write code that increments a non-existent value in an array.
Let's consider the following structure in a mongo collection. (This is not the actual structure we use, but it reproduces the issue.)
{
"_id" : ObjectId("527400e43ca8e0f79c2ce52c"),
"content" : "Blotted Science",
"tags_with_ratings" : [
{
"ratings" : {
"0" : 6154,
"1" : 4974
},
"tag_name" : "math_core"
},
{
"ratings" : {
"0" : 154,
"1" : 474
},
"tag_name" : "progressive_metal"
}
]
}
Example issue: in the tags_with_ratings attribute of this document, we want to increment a rating for a tag that is not yet in the array. For example, we want to increment the "0" value for the tag_name "dubstep".
So the expected behaviour would be, that mongo would upsert a document like this into the "tags_with_ratings" attribute:
{
"ratings" : {
"0" : 1
},
"tag_name" : "dubstep"
}
At the moment, we need one read operation to check whether the nested document for the tag exists. If it doesn't, we pull out the tags_with_ratings array, create a new one, re-add the values from the previous one and add the new nested document to it. Shouldn't we be able to do this with a single upsert operation, without the expensive read?
Incrementing the values takes up 90% of the process, and more than half of that is spent on reading, because we cannot use $inc's ability to create a field when it does not exist inside the array.
You cannot achieve what you want with one step using this schema.
You could do it however if you used tag_name as the key name instead of using ratings there, but then you may have a different issue when querying.
If the tag_name value was the field name (replacing ratings) you'd have {"dubstep":{"0":1}} instead of { "ratings" : {"0" : 1},"tag_name" : "dubstep"} which you can update dynamically the way you want to. Just keep in mind that this schema will make it more difficult to query - you have to know what the ratings are in advance to be able to query by keyname.
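With tag names as keys, a single $inc on a dot-notation path both creates a missing tag and increments an existing one, so no read is needed. The Python sketch below builds such an update (inc_rating_update is a hypothetical helper) and, so that it runs without a server, applies it with a small local stand-in for $inc; apply_inc only mimics the create-if-missing behaviour and is not how MongoDB implements it. It assumes tag_name values never contain "." or start with "$", since they become part of a field path.

```python
from typing import Any, Dict


def inc_rating_update(tag_name: str, rating: str, amount: int = 1) -> Dict[str, Any]:
    """Build an $inc update for the key-per-tag schema."""
    # Tag names become part of a field path, so "." and a leading "$" are not allowed.
    if "." in tag_name or tag_name.startswith("$"):
        raise ValueError(f"tag name not usable as a field name: {tag_name!r}")
    return {"$inc": {f"tags_with_ratings.{tag_name}.{rating}": amount}}


def apply_inc(doc: Dict[str, Any], update: Dict[str, Any]) -> None:
    """Local stand-in for $inc: create missing parents, start missing counters at 0."""
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = doc
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = node.get(leaf, 0) + amount


doc = {
    "content": "Blotted Science",
    "tags_with_ratings": {
        "math_core": {"0": 6154, "1": 4974},
        "progressive_metal": {"0": 154, "1": 474},
    },
}
update = inc_rating_update("dubstep", "0")
apply_inc(doc, update)
print(update)                               # {'$inc': {'tags_with_ratings.dubstep.0': 1}}
print(doc["tags_with_ratings"]["dubstep"])  # {'0': 1}
```

Against a real collection, the update dict would be sent with something like collection.update_one({"_id": doc_id}, update) in pymongo, adding upsert=True if the whole document may not exist yet.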