I have a Java application that writes to a log file in JSON format.
The fields that appear in the logs vary.
Logstash reads this log file and sends it to Elasticsearch, which Kibana then queries.
I've configured Logstash with the following file:
input {
    file {
        path => ["[log_path]"]
        codec => "json"
    }
}
filter {
    json {
        source => "message"
    }
    date {
        match => [ "data", "dd-MM-yyyy HH:mm:ss.SSS" ]
        timezone => "America/Sao_Paulo"
    }
}
output {
    elasticsearch_http {
        flush_size => 1
        host => "[host]"
        index => "application-%{+YYYY.MM.dd}"
    }
}
I've managed to display everything correctly in Kibana without defining any mapping.
But when I try to create a terms panel to show a count of the servers that sent those messages, I run into a problem.
I have a field called server in my JSON that holds the server's name (like a1-name-server1), but the terms panel splits the server name on the "-".
I would also like to count the number of times an error message appears, but the same problem occurs: the terms panel splits the error message on the spaces.
I'm using Kibana 3 and Logstash 1.4.
I've searched a lot on the web and couldn't find any solution.
I also tried using the .raw field from Logstash, but it didn't work.
How can I manage this?
Thanks for the help.
Your problem here is that your data is being tokenized, which is what makes it searchable. By default, ES will split a field like message into separate parts (tokens) so that each can be searched. For example, you may want to search for the word ERROR in your logs, and expect results like "There was an error in your cluster" or "Error processing whatever". If the data in that field weren't analyzed with tokenizers, you couldn't search like this.
This analyzed behaviour is helpful when you want to search, but it doesn't allow you to group different messages that have the same content, which is your use case. The solution is to update your mapping, setting not_analyzed for the specific fields that you don't want split into tokens. That will probably work for your server field, but it will break searching within the field.
What I usually do in these kinds of situations is use index templates and multifields. An index template lets me set a mapping for every index that matches a pattern, and multifields let me have the analyzed and not_analyzed behaviour on the same field.
The following request would do the job for your problem:
curl -XPUT https://example.org/_template/name_of_index_template -d '
{
    "template": "indexname*",
    "mappings": {
        "type": {
            "properties": {
                "field_name": {
                    "type": "multi_field",
                    "fields": {
                        "field_name": {
                            "type": "string",
                            "index": "analyzed"
                        },
                        "untouched": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}'
And then in your terms panel you can use field_name.untouched, so that the entire content of the field is considered when counting the distinct values.
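For reference, a Kibana 3 terms panel boils down to a terms facet; here is a sketch of the equivalent request, assuming the multifield was applied to your server field ([host] and the index pattern are placeholders):
curl -XPOST 'http://[host]:9200/application-*/_search' -d '
{
    "size": 0,
    "facets": {
        "servers": {
            "terms": {
                "field": "server.untouched",
                "size": 10
            }
        }
    }
}'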
If you don't want to use index templates (maybe your data is in a single index), setting the mapping with the Put Mapping API would do the job too. And if you use multifields, there is no need to reindex the data: from the moment you set the new mapping for the index, new data will be duplicated into the two subfields (field_name and field_name.untouched). If you instead just change the mapping from analyzed to not_analyzed, you won't see any change until you reindex all your data.
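A sketch of that Put Mapping call, applying the same multifield idea to your server field (index and type names are placeholders):
curl -XPUT 'http://[host]:9200/[index_name]/_mapping/[type]' -d '
{
    "type": {
        "properties": {
            "server": {
                "type": "multi_field",
                "fields": {
                    "server": { "type": "string", "index": "analyzed" },
                    "untouched": { "type": "string", "index": "not_analyzed" }
                }
            }
        }
    }
}'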
Since you didn't define a mapping in Elasticsearch, the default settings take effect for every field in your type in your index. The default for string fields (like your server field) is to analyze the field, meaning that Elasticsearch tokenizes its contents. That is why it's splitting your server names into parts.
You can overcome this issue by defining a mapping. You don't have to define all your fields, only the ones you don't want Elasticsearch to analyze. In your particular case, sending the following PUT command will do the trick:
curl -XPUT 'http://[host]:9200/[index_name]/_mapping/[type]' -d '
{
    "type" : {
        "properties" : {
            "server" : { "type" : "string", "index" : "not_analyzed" }
        }
    }
}'
You can't do this on an already existing index, because switching from analyzed to not_analyzed is a breaking change to the mapping; you have to create a new index with the desired mapping and reindex your data into it.
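A sketch of creating such a new index with the mapping in place, using the same placeholders:
curl -XPUT 'http://[host]:9200/[new_index_name]' -d '
{
    "mappings": {
        "type": {
            "properties": {
                "server": { "type": "string", "index": "not_analyzed" }
            }
        }
    }
}'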
I started using hibernate-search-elasticsearch (5.8.2) because it seemed easy to integrate: it keeps the Elasticsearch indices up to date without my writing any code. It's a cool lib, but I'm starting to think it implements only a very small subset of the Elasticsearch functionality. I'm executing a query with a painless script filter which needs to access a String field whose type is 'text' in the index mapping, and this is not possible without enabling fielddata. But I'm not very keen on enabling it, as it consumes a lot of heap memory. Here's what the Elasticsearch team suggests for my case:
Fielddata documentation
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
PUT my_index
{
    "mappings": {
        "_doc": {
            "properties": {
                "my_field": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }
}
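For context, an aggregation would then target the keyword subfield rather than the analyzed text, along these lines (index and field names as in the snippet above):
GET my_index/_search
{
    "size": 0,
    "aggs": {
        "my_field_values": {
            "terms": { "field": "my_field.keyword" }
        }
    }
}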
Unfortunately I can't find a way to do this with the hibernate-search annotations. Can someone tell me whether this is possible, or do I have to migrate to the vanilla Elasticsearch client and drop the wrapper?
With the current version of Hibernate Search, you need to create a separate field for that (i.e. you can't have different flavors of the same field). Note that this is what Elasticsearch does under the hood anyway.
@Field(analyzer = "your-text-analyzer") // your default full text search field with the default name
@Field(name = "myPropertyAggregation", index = Index.NO, normalizer = "keyword")
@SortableField(forField = "myPropertyAggregation")
private String myProperty;
It should create an unanalyzed field with doc values. You then need to refer to the myPropertyAggregation field in your aggregations.
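For instance, a terms aggregation over that generated field might look like this (the index name here is assumed):
GET your_index/_search
{
    "size": 0,
    "aggs": {
        "byMyProperty": {
            "terms": { "field": "myPropertyAggregation" }
        }
    }
}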
Note that we will expose many more Elasticsearch features in the API in the upcoming Search 6. In Search 5, the APIs were designed with Lucene in mind and we couldn't break them.
I have the following documents stored in my Elasticsearch index (my_index):
{
    "name": "111666"
},
{
    "name": "111A666"
},
{
    "name": "111B666"
}
and I want to be able to query these documents using both the exact value of the name field and a letter-trimmed version of the value.
Examples
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query": "111666"
            }
        }
    }
}
should return all of the (3) documents mentioned above.
On the other hand:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query": "111a666"
            }
        }
    }
}
should return just one document (the one whose name field matches the provided value exactly).
I haven't found a way to configure the settings of my_index to support such functionality (custom search/index analyzers, etc.).
I should mention that I am using Elasticsearch's Java API (QueryBuilders) to implement the above queries, so I thought of doing it the Java way.
Logic
1) Check if the provided query string contains a letter
2) If yes (e.g. 111A666), search for 111A666 using a standard search analyzer
3) If not (e.g. 111666), use a custom search analyzer that trims the characters of the `name` field (see the sketch after this list)
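Something like the following is what I have in mind; a rough sketch against the Java API, where the name.trimmed subfield is hypothetical:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// Rough sketch: pick the field to query depending on the input.
public SearchResponse searchByName(Client client, String input) {
    boolean hasLetter = input.chars().anyMatch(Character::isLetter);
    // "name.trimmed" is a hypothetical subfield whose analyzer strips letters.
    QueryBuilder query = hasLetter
            ? QueryBuilders.matchQuery("name", input)          // exact match on the raw value
            : QueryBuilders.matchQuery("name.trimmed", input); // compare against letter-trimmed values
    return client.prepareSearch("my_index")
            .setTypes("my_type")
            .setQuery(query)
            .get();
}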
Questions
1) Is it possible to implement this by somehow configuring how the data is stored/indexed in Elasticsearch?
2) If not, is it possible to conditionally change the analyzer of a field at Runtime? (using Java)
You can easily use any built-in analyzer or any custom analyzer to map your document in Elasticsearch. More information on analyzers is here.
The "term" query searches for an exact match. You can find more information about exact matching here (Finding Exact Values).
But you cannot change an index's analysis once it has been created. If you want to change it, you have to create a new index with the new settings and migrate all your data to it.
Your question is about applying different logic in the analyzer at index time and at query time.
The solution for your Q1 is to generate two tokens at index time (111a666 -> [111a666, 111666]) but only one token at query time (111a666 -> 111a666 and 111666 -> 111666).
IMHO you would need a new analyzer built on a token filter like pattern_replace
(https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern_replace-tokenfilter.html), one that supports "preserve_original" the way the pattern-capture filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-capture-tokenfilter.html) does; pattern_replace itself does not.
Or you could use two fields (one with the original value and one with the letters stripped) and search over both.
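A rough sketch of the two-field variant (the names strip_letters, digits_only and my_type are made up): keep the original value lowercased in name, keep a letter-stripped copy in name.digits, and have the client query name when the input contains a letter and name.digits when it is all digits:
PUT my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "strip_letters": {
                    "type": "pattern_replace",
                    "pattern": "[A-Za-z]",
                    "replacement": ""
                }
            },
            "analyzer": {
                "exact_lowercase": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"]
                },
                "digits_only": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase", "strip_letters"]
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "exact_lowercase",
                    "fields": {
                        "digits": { "type": "string", "analyzer": "digits_only" }
                    }
                }
            }
        }
    }
}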
My mapping has 4 string fields:
"name"
"info"
"language"
"genre"
and 4 custom analyzers:
"english_custom_analyzer"
"french_custom_analyzer"
"spanish_custom_analyzer"
"arabic_custom_analyzer"
I want to be able to specify the analyzer to use at document insert time, based on the language field.
So if the language is English I want to use the English analyzer for the document's fields, and if the language is French, the French analyzer.
I tried creating an extra field called "language_name_analyzer", populating it with the analyzer name at insert time, and pointing the analyzer setting at "language_name_analyzer". But I get this error:
Cause: org.elasticsearch.index.mapper.MapperParsingException: Analyzer [language_name_analyzer] not found for field [datacontent_onair_title]
Thank you
First of all, I would recommend reconsidering the use of this feature, since it has been removed in the next major release, Elasticsearch 2.0.
If you still want to use it, you need to specify the path to the language_name_analyzer field in the mapping:
{
    "type1": {
        "_analyzer": {
            "path": "language_name_analyzer"
        },
        "properties": {
            // ... your other fields
        }
    }
}
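Each document then carries the analyzer name in that field; for example (the field values here are assumed from your analyzer list):
{
    "language": "fr",
    "language_name_analyzer": "french_custom_analyzer",
    "name": "Le nom du film",
    "info": "Une description en français"
}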
I recently started playing around with filtered aliases in Elasticsearch (documentation here), but there is a use case that I am not sure how to approach.
Use Case
Every document that I index in Elasticsearch is guaranteed to have a field called "tenantId" (and also some other fields such as "type", "id", etc.). All the documents reside in the same index, so I want to create a filtered alias per tenant, and I want to create it as soon as I have created the tenant itself and have the "tenantId" handy.
Problem
When I try to create the alias programmatically using their Java client, I get the following exception:
Caused by: org.elasticsearch.index.query.QueryParsingException:
[mdm-master] Strict field resolution and no field mapping
can be found for the field with name [tenantId]
Researching more, I found out that I could probably use dynamic templates to achieve this. So I created a template, saved it under config/templates, recreated my index and tried the same thing again. I got the same exception. On reading the documentation further here (bottom 3 lines on the page), I found out that even if I were to set the property index.query.parse.allow_unmapped_fields to true (which I haven't tried yet), it is forced back to false for filtered aliases.
Now the question is, how do I approach my use case? I do not know the mappings of the corresponding types, but what I do know for a fact is that every document I index, regardless of its type, will ALWAYS have a field called tenantId, and that's what I want to create my filtered alias on.
EDIT
A couple of helpful links that I found. I'm not sure which version this is fixed in.
filtered aliases in templates do not inherit mappings from aliased index #8473
index.query.parse.allow_unmapped_fields setting does not seem to allow unmapped fields in alias filters #8431
SECOND EDIT
Found an open bug for ElasticSearch with the exact same problem. Waiting on ES developers for a response. Failure to create Filtered Alias on empty index with template mappings #10038
All help is extremely appreciated! I have been trying to figure this out for a couple of days now with no luck :(.
Following is the code I used to add the filtered alias, and the default-mapping JSON template.
Template
{
    "template-1": {
        "template": "*",
        "mappings": {
            "_default_": {
                "properties": {
                    "type": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "id": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "tenantId": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}
JAVA CLIENT
(You can ignore the "Observable" related stuff for now)
public Observable<Boolean> createAlias(String tenantId) {
    FilterBuilder filter = FilterBuilders.termFilter("tenantId", tenantId);
    ListenableActionFuture<IndicesAliasesResponse> response = client.admin().indices()
            .prepareAliases()
            .addAlias("mdm-master", tenantId, filter)
            .execute();
    return Observable.from(response)
            .map((IndicesAliasesResponse apiResponse) -> {
                return apiResponse.isAcknowledged();
            });
}
I'm the guy who posted that latest issue on the ES GitHub (Failure to create Filtered Alias on empty index with template mappings #10038). The quickest workaround I've found for now (other than downgrading to 1.3, where this issue does not exist) is to index a document containing the field before creating the alias.
If you have one index with many tenants, you should only need to index a document with the required field once, when you create the index, and then you should be able to create the alias.
If you try the reproduction case I posted in the GitHub issue, run the following before creating the alias:
curl -XPOST 'http://localhost:9200/repro/dummytype/1' -d '{
    "testfield": "dummyvalue"
}'
Then you should be able to add a filtered alias on the field testfield.
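For completeness, adding the filtered alias itself would look something like this (the alias name and filter value here are made up, not taken from the GitHub repro):
curl -XPOST 'http://localhost:9200/_aliases' -d '{
    "actions": [
        {
            "add": {
                "index": "repro",
                "alias": "repro-filtered",
                "filter": { "term": { "testfield": "dummyvalue" } }
            }
        }
    ]
}'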
Edit - Answer to first comment:
I think it's an oversight in how mappings in templates are applied. A template is applied to a matching index when that index is created, but the issue here seems to be that the generic (_default_) mapping part of the template isn't actually applied until a document is indexed.
This behaviour can be observed if you change my template in the issue to the following:
curl -XPUT 'http://localhost:9200/_template/repro' -d '{
    "template": "repro",
    "settings": {
        "index.number_of_shards": 1,
        "index.number_of_replicas": 0
    },
    "mappings": {
        "dummytype": {
            "properties": {
                "testfield": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}'
Then you're able to create the index and add the filtered alias without indexing any documents.
As I said, I think this is a bug in the template application in ES.
I have the following document in my collection:
{
    "_id": NumberLong(106379),
    "_class": "x.y.z.SomeObject",
    "name": "Some Name",
    "information": {
        "hotelId": NumberLong(106379),
        "names": [
            {
                "localeStr": "en_US",
                "name": "some Other Name"
            }
        ],
        "address": {
            "address1": "5405 Google Avenue",
            "city": "Mountain View",
            "cityIdInCitiesCodes": "123456",
            "stateId": "CA",
            "countryId": "US",
            "zipCode": "12345"
        },
        "descriptions": [
            {
                "localeStr": "en_US",
                "description": "Some Description"
            }
        ]
    },
    "providers": [],
    "some other set": {
        "a": "bla bla bla",
        "b": "bla,bla bla"
    },
    "another Property": "fdfdfdfdfdf"
}
I need to run through all the documents in the collection, and wherever the "providers" array is empty, create a new set based on values from the information section.
I'm far from being a MongoDB expert, so I have a few questions:
Can I do it as an atomic operation?
Can I do this using the MongoDB console? As far as I understood, I can do it using the $addToSet and $each operators?
If not, is there any Java-based driver that provides such functionality?
Can I do it as an atomic operation?
Every document will be updated in an atomic fashion. There is no "atomic" in MongoDB in the RDBMS sense, where a group of operations either all succeed or all fail, but you can prevent other writes from interleaving by using the $isolated operator.
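For example, an isolated multi-document update might look like this (the value added here is just a placeholder):
db.so.update(
    { providers: { $size: 0 }, $isolated: 1 },
    { $addToSet: { providers: "some-provider" } },
    { multi: true }
)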
Can I do this using the MongoDB console?
Sure you can. To find all documents with an empty providers array, you can issue a command like:
db.zz.find({ providers: { $size: 0 } })
To update all documents where the array has zero length, adding a fixed string to the set, you can issue a query such as (note the multi option, so that every matching document is updated):
db.zz.update({ providers: { $size: 0 } }, { $addToSet: { providers: "zz" } }, { multi: true })
If you want to add a portion to your document based on the document's own data, you can use the notorious $where query (do mind the warnings appearing at that link), or, as you mentioned, query for the empty providers array and use cursor.forEach().
If not, is there any Java-based driver that provides such functionality?
Sure, there is a Java driver, as for every other major programming language. It can do practically everything described here, basically everything you can do from the shell. I suggest you get started with the Java Language Center.
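For illustration, here is a minimal sketch with the legacy Java driver that mirrors the shell loop shown below; the database and collection names are assumed:
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class FillEmptyProviders {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("mydb");               // database name assumed
        DBCollection coll = db.getCollection("so"); // collection name assumed

        // Find every document whose providers array is empty.
        DBObject emptyProviders = new BasicDBObject("providers", new BasicDBObject("$size", 0));
        DBCursor cursor = coll.find(emptyProviders);
        try {
            while (cursor.hasNext()) {
                DBObject doc = cursor.next();
                DBObject information = (DBObject) doc.get("information");
                // Add a value derived from the information section, e.g. hotelId.
                coll.update(
                        new BasicDBObject("_id", doc.get("_id")),
                        new BasicDBObject("$addToSet",
                                new BasicDBObject("providers", information.get("hotelId"))));
            }
        } finally {
            cursor.close();
        }
        client.close();
    }
}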
There are also several frameworks that facilitate working with MongoDB and bridge the object and document worlds. I will not give a list here as I'm pretty biased, but I'm sure a quick Google search will do.
db.so.find({ providers: { $size: 0 } }).forEach(function(doc) {
    doc.providers.push(doc.information.hotelId);
    db.so.save(doc);
});
This will push the information.hotelId of the corresponding document into an empty providers array. Replace that with whatever field you would rather insert into the providers array.