Elastic search term query not working on a specific field - java

I'm new to Elasticsearch.
So this is how the index looks:
{
"scresults-000001" : {
"aliases" : {
"scresults" : { }
},
"mappings" : {
"properties" : {
"callType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"esdtValues" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gasLimit" : {
"type" : "long"
},
AND MORE OTHER Fields.......
I'm trying to create a search query in Java that looks like this:
{
"bool" : {
"filter" : [
{
"term" : {
"sender" : {
"value" : "sendervalue",
"boost" : 1.0
}
}
},
{
"term" : {
"data" : {
"value" : "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU=",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
If I run this query I get 0 hits. If I replace the field "data" with another field, it works. I don't understand what's different.
This is how I actually create the query in Java + Spring Boot:
QueryBuilder boolQuery = QueryBuilders.boolQuery()
.filter(QueryBuilders.termQuery("sender", "sendervalue"))
.filter(QueryBuilders.termQuery("data",
"YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="));
Query searchQuery = new NativeSearchQueryBuilder()
.withFilter(boolQuery)
.build();
SearchHits<ScResults> articles = elasticsearchTemplate.search(searchQuery, ScResults.class);

Since you're doing an exact match on a string with a term query, you need to run it against the data.keyword field, which is not analyzed. In your Java code that means passing "data.keyword" instead of "data" to termQuery. The data field is a text field, so it is analyzed by the standard analyzer: all letters are lowercased and the = sign at the end is stripped off, so the term query cannot match. (A match query on the data field would find the document, but that would no longer be exact matching.) You can see what the analyzer produces with:
POST _analyze
{
"analyzer": "standard",
"text": "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="
}
Results:
{
"tokens" : [
{
"token" : "ywrktglxdwlkaxr5uhjvehlamdawmdawmdawmdawmdawmda1mdblymqzmdrjmmyznge2yjnmnme1n2mxmznhyjdiogm2zjgxzgm0mde1ntq4m0a3zje1yjewoddmmjuwnzq4qdbjmdu0yjcwndhlmmy5nte1zwe3ywu",
"start_offset" : 0,
"end_offset" : 163,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
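To make the mismatch concrete, here is a minimal plain-Java sketch. It does not call Elasticsearch; `approximateTextToken` is a hypothetical helper that only imitates what the `_analyze` output above shows for Base64-like input (lowercasing and dropping the trailing `=`), and the sample value is made up for illustration.

```java
import java.util.Locale;

public class TermVsKeywordSketch {

    // Rough stand-in for what the standard analyzer did to this kind of value:
    // lowercase it and drop trailing non-alphanumeric characters such as '='.
    // This is NOT the real analyzer, only an approximation for Base64-like input.
    static String approximateTextToken(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+$", "");
    }

    // A term query does no analysis: it compares the search value with the
    // indexed term exactly as given.
    static boolean termMatches(String indexedTerm, String searchValue) {
        return indexedTerm.equals(searchValue);
    }

    public static void main(String[] args) {
        String value = "SGVsbG8gV29ybGQ="; // hypothetical sample value

        // Term stored for the analyzed "data" text field (approximation).
        String textTerm = approximateTextToken(value);
        // Term stored for "data.keyword": the value verbatim, as long as it
        // is not longer than ignore_above (256 in the mapping).
        String keywordTerm = value;

        System.out.println("term on data:         " + termMatches(textTerm, value));
        System.out.println("term on data.keyword: " + termMatches(keywordTerm, value));
    }
}
```

The first comparison fails because the indexed term no longer equals the original string, which is why the term query only works against the keyword sub-field.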

Related

Elasticsearch is not working with Alphanumeric

I have alphanumeric codes like AA111, 111AA, AA-111, AAAA, 1111. Below is the Elasticsearch mapping:
"name" : {
"type" : "text",
"analyzer" : "standard",
"fields" : {
"lower_case_sort" : {
"type" : "keyword",
"normalizer" : "lowercase"
}
},
"copy_to" : "default"
}
The mapping of the default field is as follows:
"default" : {
"type" : "text",
"analyzer" : "index_ngram",
"search_analyzer" : "search_ngram"
},
When we search with AAA or AA, it returns results. But when we search for 111, it does not return any results.
Below is the query
"bool" : {
"filter" : [
{
"match" : {
"default" : {
"query" : "111",
"operator" : "AND",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
This happens because the analyzer on your default field removes the numbers from your text (the simple analyzer is one that does this). You need a tokenizer that keeps digits, such as edge_ngram, and search on its output, or use the standard analyzer, which also works with numbers.
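As an illustration only (the question does not show how index_ngram and search_ngram are defined), the following plain-Java sketch contrasts a letter-only tokenizer with character n-grams. Both `letterTokens` and `ngrams` are hypothetical helpers that imitate the behaviour, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;

public class NgramSketch {

    // Keeps only runs of letters, roughly what a letter-only tokenizer does.
    // Digits and punctuation act as separators and are thrown away.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Produces every character n-gram with a length between min and max.
    static List<String> ngrams(String text, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // With a letter-only tokenizer the digits never reach the index.
        System.out.println(letterTokens("AA-111"));
        System.out.println(letterTokens("1111"));
        // N-grams over the raw characters keep "111" searchable.
        System.out.println(ngrams("AA111", 3, 3));
    }
}
```

If the analysis chain behind the default field behaves like the first helper, a search for 111 has nothing to match, which is consistent with the symptom described.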

Elastic termsQuery not giving expected result

I have an index where each object has a status field that can take some predefined values. I want to fetch all objects whose status is INITIATED, UPDATED, or DELETED (any of these), so I built this query in Java using QueryBuilder and NativeSearchQuery, executed through ElasticsearchOperations. This is what it prints on the console:
{
"bool" : {
"must" : [
{
"terms" : {
"status" : [
"INITIATED",
"UPDATED",
"DELETED"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
I have data in my index with status 'INITIATED', but the query returns none of it. How can I fix this query, please?
If you need anything, please let me know.
Update: code added
NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
QueryBuilder singleQb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("status", statusList));
Pageable pageable = PageRequest.of(0, 1, Sort.by(Defs.START_TIME).ascending());
FieldSortBuilder sort = SortBuilders.fieldSort(Defs.START_TIME).order(SortOrder.ASC);
nativeSearchQueryBuilder.withQuery(singleQb);
nativeSearchQueryBuilder.withSort(sort);
nativeSearchQueryBuilder.withPageable(pageable);
nativeSearchQueryBuilder.withIndices(Defs.SCHEDULED_MEETING_INDEX);
nativeSearchQueryBuilder.withTypes(Defs.SCHEDULED_MEETING_INDEX);
NativeSearchQuery searchQuery = nativeSearchQueryBuilder.build();
List<ScheduledMeetingEntity> scheduledList=elasticsearchTemplate.queryForList(searchQuery, ScheduledMeetingEntity.class);
Update 2: sample data:
I got this from a Kibana query on this index:
"hits" : [
{
"_index" : "index_name",
"_type" : "type_name",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-03-03T13:09:59.198",
"createTimeInMs" : 1614755399198,
"createdBy" : "user1#domain.com",
"editTime" : "2021-03-03T13:09:59.198",
"editTimeInMs" : 1614755399198,
"editedBy" : "user1#domain.com",
"versionId" : 1,
"id" : "1",
"meetingId" : "47",
"userId" : "129",
"username" : "user1#domain.com",
"recipient" : [
"user1#domain.com"
],
"subject" : "subject",
"body" : "hi there",
"startTime" : "2021-03-04T07:26:00.000",
"endTime" : "2021-03-04T07:30:00.000",
"meetingName" : "name123",
"meetingPlace" : "placeName",
"description" : "sfsafsdafsdf",
"projectName" : "",
"status" : "INITIATED",
"failTry" : 0
}
}
]
Confirm your mapping:
GET /yourIndexName/_mapping
and check that it is valid.
For the terms query to match exact values, the mapping needs a keyword field for status, for example:
{
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
ES can create the mapping for you automatically when you first index a document. However, you get finer control if you define the mapping yourself.
Either way, you need a keyword field defined for status. With the mapping above, the terms query has to target status.keyword rather than status.
=====================
Alternative Solution: (Case Insensitive)
If you have a field named status and the values you want to search for are INITIATED, UPDATED, or DELETED, you can do it like this:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must(createStringSearchQuery());
public QueryBuilder createStringSearchQuery(){
QueryStringQueryBuilder queryBuilder = QueryBuilders.queryStringQuery(" INITIATED OR UPDATED OR DELETED ");
queryBuilder.defaultField("status");
return queryBuilder;
}
Printing the QueryBuilder:
{
"query_string" : {
"query" : "INITIATED OR UPDATED OR DELETED",
"default_field" : "status",
"fields" : [ ],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}

What is meant by processedWithError in the report task manager?

I ingested the file into Druid and, thankfully, the ingestion is reported as successful. However, when I check the ingestion report, all rows are processed with error, yet the datasource is displayed in the "Datasource" tab.
I have tried reducing the input from 20M rows to only 20 rows. Here is my configuration file:
"type" : "index",
"spec" : {
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "/home/data/Salutica",
"filter" : "outDashboard2RawV3.csv"
}
},
"dataSchema" : {
"dataSource": "DaTRUE2_Dashboard_V3",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "WEEK",
"queryGranularity" : "none",
"intervals" : ["2017-05-08/2019-05-17"],
"rollup" : false
},
"parser" : {
"type" : "string",
"parseSpec": {
"format" : "csv",
"timestampSpec" : {
"column" : "Date_Time",
"format" : "auto"
},
"columns" : [
"Main_ID","Parameter_ID","Date_Time","Serial_Number","Status","Station_ID",
"Station_Type","Parameter_Name","Failed_Date_Time","Failed_Measurement",
"Database_Name","Date_Time_Year","Date_Time_Month",
"Date_Time_Day","Date_Time_Hour","Date_Time_Weekday","Status_New"
],
"dimensionsSpec" : {
"dimensions" : [
"Date_Time","Serial_Number","Status","Station_ID",
"Station_Type","Parameter_Name","Failed_Date_Time",
"Failed_Measurement","Database_Name","Status_New",
{
"name" : "Main_ID",
"type" : "long"
},
{
"name" : "Parameter_ID",
"type" : "long"
},
{
"name" : "Date_Time_Year",
"type" : "long"
},
{
"name" : "Date_Time_Month",
"type" : "long"
},
{
"name" : "Date_Time_Day",
"type" : "long"
},
{
"name" : "Date_Time_Hour",
"type" : "long"
},
{
"name" : "Date_Time_Weekday",
"type" : "long"
}
]
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
}
]
},
"tuningConfig" : {
"type" : "index",
"partitionsSpec" : {
"type" : "hashed",
"targetPartitionSize" : 5000000
},
"jobProperties" : {}
}
}
}
Report:
{"ingestionStatsAndErrors":{"taskId":"index_DaTRUE2_Dashboard_V3_2019-09-10T01:16:47.113Z","payload":{"ingestionState":"COMPLETED","unparseableEvents":{},"rowStats":{"determinePartitions":{"processed":0,"processedWithError":0,"thrownAway":0,"unparseable":0},"buildSegments":{"processed":0,"processedWithError":20606701,"thrownAway":0,"unparseable":1}},"errorMsg":null},"type":"ingestionStatsAndErrors"}}
I'm expecting this:
{"processed":20606701,"processedWithError":0,"thrownAway":0,"unparseable":1}},"errorMsg":null},"type":"ingestionStatsAndErrors"}}
instead of this:
{"processed":0,"processedWithError":20606701,"thrownAway":0,"unparseable":1}},"errorMsg":null},"type":"ingestionStatsAndErrors"}}
Below is my input data from the CSV:
"Main_ID","Parameter_ID","Date_Time","Serial_Number","Status","Station_ID","Station_Type","Parameter_Name","Failed_Date_Time","Failed_Measurement","Database_Name","Date_Time_Year","Date_Time_Month","Date_Time_Day","Date_Time_Hour","Date_Time_Weekday","Status_New"
1,3,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","1.8V","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"
1,4,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","1.35V","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"
1,5,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","Isc_VChrg","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"
1,6,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","Isc_VBAT","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"

How to update nested field values in Elasticsearch using the Java API

I have the following document in Elasticsearch:
{
"uuid":"123",
"Email":"mail#example.com",
"FirstName":"personFirstNmae",
"LastName":"personLastName",
"Inbox":{
"uuid":"1234",
"messageList":[
{
"uuid":"321",
"Subject":"subject1",
"Body":"bodyText1",
"ArtworkUuid":"101",
"DateAndTime":"2015-10-15T10:59:12.096+05:00",
"ReadStatusInt":0,
"Delete":{
"deleteStatus":0,
"deleteReason":0
}
},
{
"uuid":"123",
"Subject":"subject",
"Body":"bodyText",
"ArtworkUuid":"100",
"DateAndTime":"2015-10-15T10:59:11.982+05:00",
"ReadStatusInt":1,
"Delete":{
"deleteStatus":0,
"deleteReason":0
}
}
]
}
}
and here is the mapping of the document:
{
"testdb" : {
"mappings" : {
"directUser" : {
"properties" : {
"Email" : {
"type" : "string",
"store" : true
},
"FirstName" : {
"type" : "string",
"store" : true
},
"Inbox" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"messageList" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"ArtworkUuid" : {
"type" : "string",
"store" : true
},
"Body" : {
"type" : "string",
"store" : true
},
"DateAndTime" : {
"type" : "date",
"store" : true,
"format" : "dateOptionalTime"
},
"Delete" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"deleteReason" : {
"type" : "integer",
"store" : true
},
"deleteStatus" : {
"type" : "integer",
"store" : true
}
}
},
"ReadStatusInt" : {
"type" : "integer",
"store" : true
},
"Subject" : {
"type" : "string",
"store" : true
},
"uuid" : {
"type" : "string",
"store" : true
}
}
},
"uuid" : {
"type" : "string",
"store" : true
}
}
},
"LastName" : {
"type" : "string",
"store" : true
},
"uuid" : {
"type" : "string",
"store" : true
}
}
}
}
}
}
Now I want to update the values of Inbox.messageList.Delete.deleteStatus and Inbox.messageList.Delete.deleteReason from 0 to 1 for the message with uuid 321 (Inbox.messageList.uuid).
I want to achieve something like this:
{
"uuid":"123",
"Email":"mail#example.com",
"FirstName":"personFirstNmae",
"LastName":"personLastName",
"Inbox":{
"uuid":"1234",
"messageList":[
{
"uuid":"321",
"Subject":"subject1",
"Body":"bodyText1",
"ArtworkUuid":"101",
"DateAndTime":"2015-10-15T10:59:12.096+05:00",
"ReadStatusInt":0,
"Delete":{
"deleteStatus":1,
"deleteReason":1
}
},
{
"uuid":"123",
"Subject":"subject",
"Body":"bodyText",
"ArtworkUuid":"100",
"DateAndTime":"2015-10-15T10:59:11.982+05:00",
"ReadStatusInt":1,
"Delete":{
"deleteStatus":0,
"deleteReason":0
}
}
]
}
}
I am trying the following code to get my desired updated document:
var xb:XContentBuilder=XContentFactory.jsonBuilder().startObject()
.startObject("Inbox")
xb.startArray("messageList")
xb.startObject();
xb.startObject("Delete")
xb.field("deleteStatus",1)
xb.field("deleteReason",1)
xb.endObject()
xb.endObject();
xb.endArray()
.endObject()
xb.endObject()
val responseUpdate=client.prepareUpdate("testdb", "directUser", directUserObj.getUuid.toString())
.setDoc(xb).execute().actionGet()
but with this code my document becomes:
{"uuid":"123",
"Email":"mail#example.com",
"FirstName":"personFirstNmae",
"LastName":"personLastName",
"Inbox":{
"uuid":"1234",
"messageList":[
{
"Delete":{
"deleteStatus":1,
"deleteReason":1
}
}
]
}
}
I do not want this. Please help me achieve my desired document. I am using Elasticsearch version 1.6.
The best way I've found to update a single nested field is to use the Elasticsearch update API with a (parameterised) script, as in this answer. Last time I checked, this kind of thing was only supported in Groovy scripts, not Lucene expression scripts (unfortunately). Your update produces the result it does because you are replacing the whole nested object, not a specific nested item. A Groovy script update lets you select and update the nested object with the specified ID.
You can also have a look at the nested object that I have currently updated and how I used the UpdateRequest class in Java here.
Specifically for the Java API, it is also possible to update a nested document as described in this answer by PeteyPabPro.
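The logic such a script has to perform can be sketched in plain Java on an in-memory copy of the document source. This is only a model of the idea (find the list entry by uuid, then change its Delete fields in place); it is not an Elasticsearch API call, and all class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestedUpdateSketch {

    // Returns the entry of Inbox.messageList whose uuid matches, or null.
    @SuppressWarnings("unchecked")
    static Map<String, Object> findMessage(Map<String, Object> source, String messageUuid) {
        Map<String, Object> inbox = (Map<String, Object>) source.get("Inbox");
        if (inbox == null) return null;
        List<Map<String, Object>> messages =
                (List<Map<String, Object>>) inbox.get("messageList");
        if (messages == null) return null;
        for (Map<String, Object> message : messages) {
            if (messageUuid.equals(message.get("uuid"))) return message;
        }
        return null;
    }

    // Changes Delete.* on the matching message only; every other field and
    // every other message is left untouched.
    @SuppressWarnings("unchecked")
    static boolean markDeleted(Map<String, Object> source, String messageUuid,
                               int status, int reason) {
        Map<String, Object> message = findMessage(source, messageUuid);
        if (message == null) return false;
        Map<String, Object> delete = (Map<String, Object>) message.get("Delete");
        if (delete == null) {
            delete = new HashMap<>();
            message.put("Delete", delete);
        }
        delete.put("deleteStatus", status);
        delete.put("deleteReason", reason);
        return true;
    }

    // Builds a shortened message shaped like the ones in the question.
    static Map<String, Object> message(String uuid, String subject) {
        Map<String, Object> delete = new HashMap<>();
        delete.put("deleteStatus", 0);
        delete.put("deleteReason", 0);
        Map<String, Object> m = new HashMap<>();
        m.put("uuid", uuid);
        m.put("Subject", subject);
        m.put("Delete", delete);
        return m;
    }

    // Builds a shortened version of the document from the question.
    static Map<String, Object> sampleDocument() {
        List<Map<String, Object>> messages = new ArrayList<>();
        messages.add(message("321", "subject1"));
        messages.add(message("123", "subject"));
        Map<String, Object> inbox = new HashMap<>();
        inbox.put("uuid", "1234");
        inbox.put("messageList", messages);
        Map<String, Object> doc = new HashMap<>();
        doc.put("uuid", "123");
        doc.put("Inbox", inbox);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = sampleDocument();
        markDeleted(doc, "321", 1, 1);
        System.out.println(doc);
    }
}
```

Unlike the partial-document update in the question, which sends a new messageList array and so overwrites the old one, this approach touches only the one entry that was looked up.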

How to retrieve a document by its own sub document or array?

I have such structure of document:
{
"_id" : "4e76fd1e927e1c9127d1d2e8",
"name" : "***",
"embedPhoneList" : [
{
"type" : "家庭",
"number" : "00000000000"
},
{
"type" : "手机",
"number" : "00000000000"
}
],
"embedAddrList" : [
{
"type" : "家庭",
"addr" : "山东省诸城市***"
},
{
"type" : "工作",
"addr" : "深圳市南山区***"
}
],
"embedEmailList" : [
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
}
]
}
What I want to do is find the document by its sub-document, such as an email in the embedEmailList field.
Or, if I have a structure like this:
{
"_id" : "4e76fd1e927e1c9127d1d2e8",
"name" : "***",
"embedEmailList" : [
"123#gmail.com",
"********#gmail.com"
]
}
Here embedEmailList is an array; how do I find out whether it contains 123#gmail.com?
Thanks.
To search for a specific value in an array, MongoDB supports this syntax:
db.your_collection.find({embedEmailList : "foo#bar.com"});
See here for more information.
To search for a value in an embedded object, it supports this syntax:
db.your_collection.find({"embedEmailList.email" : "foo#bar.com"});
See here for more information.
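The matching rules behind those two queries can be modelled in a few lines of plain Java. This is only a sketch of the semantics on in-memory maps and lists, limited to the array case; it is not MongoDB driver code, and the helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ArrayMatchSketch {

    // Models {field: value}: matches when the field equals the value or is
    // an array that contains it.
    static boolean matchesArrayValue(Map<String, ?> doc, String field, Object value) {
        Object stored = doc.get(field);
        if (stored instanceof List) {
            return ((List<?>) stored).contains(value);
        }
        return value.equals(stored);
    }

    // Models {"field.sub": value} for an array of embedded documents: matches
    // when at least one element has sub equal to the value.
    static boolean matchesEmbeddedValue(Map<String, ?> doc, String field,
                                        String sub, Object value) {
        Object stored = doc.get(field);
        if (!(stored instanceof List)) {
            return false;
        }
        for (Object item : (List<?>) stored) {
            if (item instanceof Map && value.equals(((Map<?, ?>) item).get(sub))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Second structure from the question: a plain array of strings.
        Map<String, Object> flat = Collections.singletonMap(
                "embedEmailList", Arrays.asList("123#gmail.com", "456#gmail.com"));
        // First structure from the question: an array of embedded documents.
        Map<String, Object> embedded = Collections.singletonMap(
                "embedEmailList", Arrays.asList(
                        Collections.singletonMap("email", "123#gmail.com"),
                        Collections.singletonMap("email", "456#gmail.com")));

        System.out.println(matchesArrayValue(flat, "embedEmailList", "123#gmail.com"));
        System.out.println(matchesEmbeddedValue(embedded, "embedEmailList", "email", "123#gmail.com"));
    }
}
```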
