Elasticsearch is not working with Alphanumeric - java

I have alphanumeric codes like AA111, 111AA, AA-111, AAAA and 1111. Below is the Elasticsearch mapping:
"name" : {
"type" : "text",
"analyzer" : "standard",
"fields" : {
"lower_case_sort" : {
"type" : "keyword",
"normalizer" : "lowercase"
}
},
"copy_to" : "default"
}
The mapping of the default field looks like this:
"default" : {
"type" : "text",
"analyzer" : "index_ngram",
"search_analyzer" : "search_ngram"
},
When we search for AAA or AA, it returns results. But when we search for 111, it does not return any results.
Below is the query:
"bool" : {
"filter" : [
{
"match" : {
"default" : {
"query" : "111",
"operator" : "AND",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
]
}

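This is most likely happening because the analyzer on your default field (index_ngram / search_ngram) removes the digits from your text. The simple analyzer, for example, does this, because it splits on every character that is not a letter and discards the rest. When the query 111 is analyzed into no tokens at all, the match query returns nothing (your query has zero_terms_query set to NONE). You need a tokenizer that keeps digits, such as an edge_ngram tokenizer whose token_chars include digit, and search on that, or use the standard analyzer, which also keeps numbers. You can check what your analyzer actually produces with the _analyze API, e.g. POST yourIndex/_analyze with "analyzer": "index_ngram" and "text": "AA111".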
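To make the effect concrete, here is a small plain-Java simulation of the two analyzers; it is not Elasticsearch code. The question does not show the index_ngram and search_ngram definitions, so this sketch assumes index_ngram is an edge n-gram analyzer with hypothetical min_gram 2 and max_gram 10, and that search_ngram only tokenizes and lowercases. With a letter-only tokenizer, the query 111 yields no tokens, so nothing matches, while AAA and AA still work, which mirrors the behaviour described in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory simulation of the analyzers (assumed settings), not Elasticsearch code.
class NgramDigitsDemo {
    // Tokenizer: keeps letters (and digits if keepDigits), splits on everything else, lowercases.
    static List<String> tokenize(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean keep = Character.isLetter(c) || (keepDigits && Character.isDigit(c));
            if (keep) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Index side: edge n-grams (prefixes) of each token, minGram..maxGram characters long.
    static Set<String> edgeNgrams(List<String> tokens, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (String token : tokens) {
            for (int n = minGram; n <= Math.min(maxGram, token.length()); n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }

    // match query with operator AND; no query tokens means no hits (zero_terms_query NONE).
    static boolean matches(String indexedValue, String query, boolean keepDigits) {
        Set<String> indexed = edgeNgrams(tokenize(indexedValue, keepDigits), 2, 10);
        List<String> queryTokens = tokenize(query, keepDigits);
        return !queryTokens.isEmpty() && indexed.containsAll(queryTokens);
    }

    static List<String> search(String[] codes, String query, boolean keepDigits) {
        List<String> hits = new ArrayList<>();
        for (String code : codes) {
            if (matches(code, query, keepDigits)) {
                hits.add(code);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] codes = {"AA111", "111AA", "AA-111", "AAAA", "1111"};
        System.out.println("letters only, query 111:   " + search(codes, "111", false));
        System.out.println("letters only, query AAA:   " + search(codes, "AAA", false));
        System.out.println("letters+digits, query 111: " + search(codes, "111", true));
    }
}
```

Even with digits kept, AA111 is not found by 111 in this sketch, because edge n-grams only index the beginning of each token. Matching digits in the middle of a code would need an ngram tokenizer rather than edge_ngram.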

Related

Elastic search term query not working on a specific field

I'm new to Elasticsearch.
This is how the index looks:
{
"scresults-000001" : {
"aliases" : {
"scresults" : { }
},
"mappings" : {
"properties" : {
"callType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"esdtValues" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gasLimit" : {
"type" : "long"
},
... (more fields)
I create a search query in Java that looks like this:
{
"bool" : {
"filter" : [
{
"term" : {
"sender" : {
"value" : "sendervalue",
"boost" : 1.0
}
}
},
{
"term" : {
"data" : {
"value" : "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU=",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
If I run this query, I get 0 hits. If I replace the "data" field with another field, it works. I don't understand what's different.
This is how I actually create the query in Java with Spring Boot:
QueryBuilder boolQuery = QueryBuilders.boolQuery()
.filter(QueryBuilders.termQuery("sender", "sendervalue"))
.filter(QueryBuilders.termQuery("data",
"YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="));
Query searchQuery = new NativeSearchQueryBuilder()
.withFilter(boolQuery)
.build();
SearchHits<ScResults> articles = elasticsearchTemplate.search(searchQuery, ScResults.class);
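Since you're trying to do an exact match on a string with a term query, you need to run it against the data.keyword field, which is not analyzed. The data field is a text field, so it is analyzed by the standard analyzer: not only are all letters lowercased, but the = sign at the end is also stripped off. The raw value can therefore never match the indexed term (unless you use a match query on the data field, but then you're no longer doing exact matching). In Java that means QueryBuilders.termQuery("data.keyword", value). Note that data.keyword has ignore_above: 256, so values longer than 256 characters are not indexed in the keyword sub-field at all. You can see what the standard analyzer produces with the _analyze API: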
POST _analyze
{
"analyzer": "standard",
"text": "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="
}
Results:
{
"tokens" : [
{
"token" : "ywrktglxdwlkaxr5uhjvehlamdawmdawmdawmdawmdawmda1mdblymqzmdrjmmyznge2yjnmnme1n2mxmznhyjdiogm2zjgxzgm0mde1ntq4m0a3zje1yjewoddmmjuwnzq4qdbjmdu0yjcwndhlmmy5nte1zwe3ywu",
"start_offset" : 0,
"end_offset" : 163,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
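The following plain-Java sketch simulates (it is not Elasticsearch code) what happens to this value at index time versus what the term query sends. The tokenizer is a rough stand-in for the standard analyzer that only splits on non-alphanumeric characters and lowercases; that is enough for this base64 value, but it is not the real Unicode segmentation algorithm. A term query is not analyzed, so it compares the raw value with the indexed terms as-is.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Simulation only: a rough approximation of the standard analyzer, not the real implementation.
class TermVsKeywordDemo {
    static final String VALUE =
        "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU=";

    // Split on anything that is not a letter or digit, then lowercase each token.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term query is not analyzed: it matches only if the value equals an indexed term exactly.
    static boolean termMatches(Collection<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    public static void main(String[] args) {
        List<String> dataTerms = standardLike(VALUE); // what the text field "data" indexes
        List<String> keywordTerms = List.of(VALUE);   // what "data.keyword" indexes (unchanged)
        System.out.println("term on data:         " + termMatches(dataTerms, VALUE));
        System.out.println("term on data.keyword: " + termMatches(keywordTerms, VALUE));
    }
}
```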

Elastic termsQuery not giving expected result

I have an index where each object has a status field that can take one of a few predefined values. I want to fetch all objects whose status is INITIATED, UPDATED or DELETED (a match on any of them), so I created this query in Java using QueryBuilders and NativeSearchQuery, executed via ElasticsearchOperations. This is what it prints on the console:
{
"bool" : {
"must" : [
{
"terms" : {
"status" : [
"INITIATED",
"UPDATED",
"DELETED"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
I have documents in my index with status 'INITIATED', but the query doesn't return any document with the statuses listed in it. How can I fix this query?
If you need anything, please let me know.
Update: code added
NativeSearchQueryBuilder nativeSearchQueryBuilder=new NativeSearchQueryBuilder();
QueryBuilder singleQb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("status", statusList));
Pageable pageable = PageRequest.of(0, 1, Sort.by(Defs.START_TIME).ascending());
FieldSortBuilder sort = SortBuilders.fieldSort(Defs.START_TIME).order(SortOrder.ASC);
nativeSearchQueryBuilder.withQuery(singleQb);
nativeSearchQueryBuilder.withSort(sort);
nativeSearchQueryBuilder.withPageable(pageable);
nativeSearchQueryBuilder.withIndices(Defs.SCHEDULED_MEETING_INDEX);
nativeSearchQueryBuilder.withTypes(Defs.SCHEDULED_MEETING_INDEX);
NativeSearchQuery searchQuery = nativeSearchQueryBuilder.build();
List<ScheduledMeetingEntity> scheduledList=elasticsearchTemplate.queryForList(searchQuery, ScheduledMeetingEntity.class);
Update 2: sample data:
I got this from a Kibana query on this index:
"hits" : [
{
"_index" : "index_name",
"_type" : "type_name",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-03-03T13:09:59.198",
"createTimeInMs" : 1614755399198,
"createdBy" : "user1#domain.com",
"editTime" : "2021-03-03T13:09:59.198",
"editTimeInMs" : 1614755399198,
"editedBy" : "user1#domain.com",
"versionId" : 1,
"id" : "1",
"meetingId" : "47",
"userId" : "129",
"username" : "user1#domain.com",
"recipient" : [
"user1#domain.com"
],
"subject" : "subject",
"body" : "hi there",
"startTime" : "2021-03-04T07:26:00.000",
"endTime" : "2021-03-04T07:30:00.000",
"meetingName" : "name123",
"meetingPlace" : "placeName",
"description" : "sfsafsdafsdf",
"projectName" : "",
"status" : "INITIATED",
"failTry" : 0
}
}
]
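Confirm your mapping:
GET /yourIndexName/_mapping
and check that it is what you expect.
A terms query is not analyzed, so it must run against a keyword field to match the stored values exactly. If status is a text field, the standard analyzer indexes INITIATED as the lowercase term initiated, and an exact terms query for INITIATED finds nothing. A typical mapping with a keyword sub-field looks like this: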
{
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
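ES can create the mapping for you automatically when you first index a document (in recent versions, dynamic mapping of a string produces exactly the text field with a keyword sub-field shown above). However, you have finer control if you define the mapping yourself.
Either way, you need a keyword field for status, and the terms query has to target it, e.g. QueryBuilders.termsQuery("status.keyword", statusList).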
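A small plain-Java simulation (not Elasticsearch code, and simplified to the single-word status values from the question) of why the terms query misses on status but hits on status.keyword:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simulation only: how a single-word status value ends up in each field's index.
class TermsOnStatusDemo {
    // "status" (text): the standard analyzer turns INITIATED into the term "initiated".
    static Set<String> textFieldTerms(String stored) {
        return Set.of(stored.toLowerCase(Locale.ROOT));
    }

    // "status.keyword": the value is indexed unchanged.
    static Set<String> keywordFieldTerms(String stored) {
        return Set.of(stored);
    }

    // A terms query matches when any of its (unanalyzed) values equals an indexed term.
    static boolean termsMatch(Set<String> indexedTerms, List<String> values) {
        return values.stream().anyMatch(indexedTerms::contains);
    }

    public static void main(String[] args) {
        List<String> statusList = List.of("INITIATED", "UPDATED", "DELETED");
        String stored = "INITIATED"; // from the sample document
        System.out.println("terms on status:         " + termsMatch(textFieldTerms(stored), statusList));
        System.out.println("terms on status.keyword: " + termsMatch(keywordFieldTerms(stored), statusList));
    }
}
```

Lowercasing the values before querying the text field would also match in this model, which is essentially why the case-insensitive alternative works.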
=====================
Alternative Solution: (Case Insensitive)
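If you have a field named status and the values you want to search for are INITIATED, UPDATED or DELETED, you can use a query_string query instead. It analyzes the query text with the field's analyzer, so it matches regardless of case.
You can do it like this: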
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must(createStringSearchQuery());
public QueryBuilder createStringSearchQuery(){
QueryStringQueryBuilder queryBuilder = QueryBuilders.queryStringQuery(" INITIATED OR UPDATED OR DELETED ");
queryBuilder.defaultField("status");
return queryBuilder;
}
Printing the QueryBuilder:
{
"query_string" : {
"query" : "INITIATED OR UPDATED OR DELETED",
"default_field" : "status",
"fields" : [ ],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}

MongoDB Java driver - problems with aggregation

In Java, with the MongoDB driver, I want to get (business_name, count) pairs, i.e. the review count per business. Currently I have this aggregation pipeline:
Bson group = group("$business_id", Accumulators.sum("count", 1));
Bson lookupOperation = lookup(
"business",
"_id",
"business_id",
"business_data"
);
Bson unwind = unwind("$business_data");
Bson project = project(fields(include("business_data.name", "count"), excludeId()));
return db
.getCollection("tip")
.aggregate(Arrays.asList(group, lookupOperation, unwind, project));
It works, but returns:
Document{{count=1, business_data=Document{{name=Firestone Complete Auto Care}}}}
1) How can I unwind business_data.name to have {count, name}?
2) How can I make name distinct? I want only one count per name, but printing the results shows many identical copies, e.g.:
Document{{count=3, business_data=Document{{name=Malmaison}}}}
Document{{count=3, business_data=Document{{name=Malmaison}}}}
The result is quite a big collection, so I return an AggregateIterable, but I want the results sorted by name. Can I do that with the iterable, without loading all the data into an array and sorting the array?
EDIT
Business document example:
{
"_id" : ObjectId("5ddbc3c1a94f7aac8d179b7c"),
"business_id" : "vcNAWiLM4dR7D2nwwJ7nCA",
"full_address" : "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018",
"hours" : {
"Tuesday" : {
"close" : "17:00",
"open" : "08:00"
},
"Friday" : {
"close" : "17:00",
"open" : "08:00"
},
"Monday" : {
"close" : "17:00",
"open" : "08:00"
},
"Wednesday" : {
"close" : "17:00",
"open" : "08:00"
},
"Thursday" : {
"close" : "17:00",
"open" : "08:00"
}
},
"open" : true,
"categories" : [
"Doctors",
"Health & Medical"
],
"city" : "Phoenix",
"review_count" : 7,
"name" : "Eric Goldberg, MD",
"neighborhoods" : [],
"longitude" : -111.983758,
"state" : "AZ",
"stars" : 3.5,
"latitude" : 33.499313,
"attributes" : {
"By Appointment Only" : true
},
"type" : "business"
}
Review document example:
{
"user_id": "IORZRljfUkedhh1SGMthTA",
"text": "The desserts are enormous..dear god.",
"business_id": "JwUE5GmEO-sH1FuwJgKBlQ",
"likes": 0,
"date": "2011-09-29",
"type": "tip"
}
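No answer to this question survives in this excerpt, so the following is only one possible approach, under the assumption that the identical rows come from several business documents sharing the same business_id (the $lookup then returns several matches and $unwind emits one row per match). Instead of unwinding, the driver's Projections.computed("name", new Document("$arrayElemAt", Arrays.asList("$business_data.name", 0))) combined with include("count") and excludeId() would yield flat {count, name} documents, and a final Aggregates.sort(Sorts.ascending("name")) stage makes the server return them in name order, so the AggregateIterable can be consumed lazily without sorting on the client (allowDiskUse(true) may be needed for large results). If the duplicates are instead distinct businesses that happen to share a name, grouping again on "$business_data.name" with Accumulators.sum("count", "$count") would give one row per name. The block below does not use the driver; it simulates the stages on in-memory lists with hypothetical data to show where the duplicates come from.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory simulation of the aggregation stages; not MongoDB driver code.
class TipCountPipelineDemo {
    record Tip(String businessId) {}
    record Business(String businessId, String name) {}
    record Row(String name, long count) {}

    // $group by business_id -> $lookup into business -> either $unwind (one row per matching
    // business document) or keep only the first match ($arrayElemAt index 0) -> $sort by name.
    static List<Row> pipeline(List<Tip> tips, List<Business> businesses, boolean unwind) {
        Map<String, Long> countsPerBusiness = tips.stream()
            .collect(Collectors.groupingBy(Tip::businessId, TreeMap::new, Collectors.counting()));
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Long> entry : countsPerBusiness.entrySet()) {
            List<Business> matches = businesses.stream()
                .filter(b -> b.businessId().equals(entry.getKey()))
                .collect(Collectors.toList());
            // Simplification: groups without a matching business are dropped in both modes.
            List<Business> used = unwind ? matches : matches.subList(0, Math.min(1, matches.size()));
            for (Business b : used) {
                rows.add(new Row(b.name(), entry.getValue()));
            }
        }
        rows.sort(Comparator.comparing(Row::name));
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical data: business b1 is stored twice in the business collection.
        List<Business> businesses = List.of(
            new Business("b1", "Malmaison"),
            new Business("b1", "Malmaison"),
            new Business("b2", "Firestone Complete Auto Care"));
        List<Tip> tips = List.of(new Tip("b1"), new Tip("b1"), new Tip("b1"), new Tip("b2"));
        System.out.println("with $unwind:      " + pipeline(tips, businesses, true));
        System.out.println("with $arrayElemAt: " + pipeline(tips, businesses, false));
    }
}
```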

How to update nested field values in Elasticsearch using the Java API

I have the following document in Elasticsearch:
{
"uuid":"123",
"Email":"mail#example.com",
"FirstName":"personFirstNmae",
"LastName":"personLastName",
"Inbox":{
"uuid":"1234",
"messageList":[
{
"uuid":"321",
"Subject":"subject1",
"Body":"bodyText1",
"ArtworkUuid":"101",
"DateAndTime":"2015-10-15T10:59:12.096+05:00",
"ReadStatusInt":0,
"Delete":{
"deleteStatus":0,
"deleteReason":0
}
},
{
"uuid":"123",
"Subject":"subject",
"Body":"bodyText",
"ArtworkUuid":"100",
"DateAndTime":"2015-10-15T10:59:11.982+05:00",
"ReadStatusInt":1,
"Delete":{
"deleteStatus":0,
"deleteReason":0
}
}
]
}
}
And here is the mapping of the document:
{
"testdb" : {
"mappings" : {
"directUser" : {
"properties" : {
"Email" : {
"type" : "string",
"store" : true
},
"FirstName" : {
"type" : "string",
"store" : true
},
"Inbox" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"messageList" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"ArtworkUuid" : {
"type" : "string",
"store" : true
},
"Body" : {
"type" : "string",
"store" : true
},
"DateAndTime" : {
"type" : "date",
"store" : true,
"format" : "dateOptionalTime"
},
"Delete" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"deleteReason" : {
"type" : "integer",
"store" : true
},
"deleteStatus" : {
"type" : "integer",
"store" : true
}
}
},
"ReadStatusInt" : {
"type" : "integer",
"store" : true
},
"Subject" : {
"type" : "string",
"store" : true
},
"uuid" : {
"type" : "string",
"store" : true
}
}
},
"uuid" : {
"type" : "string",
"store" : true
}
}
},
"LastName" : {
"type" : "string",
"store" : true
},
"uuid" : {
"type" : "string",
"store" : true
}
}
}
}
}
}
Now I want to update the values of Inbox.messageList.Delete.deleteStatus and Inbox.messageList.Delete.deleteReason from 0 to 1 for the message with uuid 321 (Inbox.messageList.uuid).
I want to end up with this:
{
"uuid":"123",
"Email":"mail#example.com",
"FirstName":"personFirstNmae",
"LastName":"personLastName",
"Inbox":{
"uuid":"1234",
"messageList":[
{
"uuid":"321",
"Subject":"subject1",
"Body":"bodyText1",
"ArtworkUuid":"101",
"DateAndTime":"2015-10-15T10:59:12.096+05:00",
"ReadStatusInt":0,
"Delete":{
"deleteStatus":1,
"deleteReason":1
}
},
{
"uuid":"123",
"Subject":"subject",
"Body":"bodyText",
"ArtworkUuid":"100",
"DateAndTime":"2015-10-15T10:59:11.982+05:00",
"ReadStatusInt":1,
"Delete":{
"deleteStatus":0,
"deleteReason":0
}
}
]
}
}
I am trying the following (Scala) code to get the updated document:
var xb:XContentBuilder=XContentFactory.jsonBuilder().startObject()
.startObject("Inbox")
xb.startArray("messageList")
xb.startObject();
xb.startObject("Delete")
xb.field("deleteStatus",1)
xb.field("deleteReason",1)
xb.endObject()
xb.endObject();
xb.endArray()
.endObject()
xb.endObject()
val responseUpdate=client.prepareUpdate("testdb", "directUser", directUserObj.getUuid.toString())
.setDoc(xb).execute().actionGet()
but with this code my document becomes:
{"uuid":"123",
"Email":"mail#example.com",
"FirstName":"personFirstNmae",
"LastName":"personLastName",
"Inbox":{
"uuid":"1234",
"messageList":[
{
"Delete":{
"deleteStatus":1,
"deleteReason":1
}
}
]
}
}
That is not what I want. How can I get the desired document? I am using Elasticsearch version 1.6.
The best way I've found to update a single nested field is to use the Elasticsearch update API with a (parameterised) script, à la this answer. Last time I checked, this kind of thing was only supported in Groovy scripts, not Lucene expression scripts (unfortunately). Your update produces that result because a partial-document update replaces the whole messageList array instead of updating a specific nested item. A Groovy script update lets you select and update only the nested object with the specified ID.
You can also look at how I recently updated a nested object using the UpdateRequest class in Java here.
Specifically for the Java API, it is also possible to update a nested document as shown in this answer by PeteyPabPro.
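The core of the scripted approach is: walk the messageList array inside the stored _source, find the message whose uuid matches a parameter, and change only its Delete fields. For ES 1.x, a Groovy script along the lines of ctx._source.Inbox.messageList.findAll { it.uuid == msgUuid }.each { it.Delete.deleteStatus = 1; it.Delete.deleteReason = 1 } with msgUuid passed as a script parameter is one way to express this; treat that as an untested sketch, and note that dynamic Groovy scripting may need to be enabled on the cluster. The plain-Java block below only simulates the same logic on a trimmed-down copy of the document held in ordinary maps; it does not call Elasticsearch, and markDeleted is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation only: the logic a scripted update would apply to ctx._source, on plain Java maps.
class NestedMessageUpdate {
    // Sets the Delete fields of every message whose uuid matches; everything else stays as-is.
    @SuppressWarnings("unchecked")
    static int markDeleted(Map<String, Object> source, String messageUuid, int status, int reason) {
        Map<String, Object> inbox = (Map<String, Object>) source.get("Inbox");
        List<Map<String, Object>> messages = (List<Map<String, Object>>) inbox.get("messageList");
        int updated = 0;
        for (Map<String, Object> message : messages) {
            if (messageUuid.equals(message.get("uuid"))) {
                Map<String, Object> delete = (Map<String, Object>) message.get("Delete");
                delete.put("deleteStatus", status);
                delete.put("deleteReason", reason);
                updated++;
            }
        }
        return updated;
    }

    // Builds a trimmed-down message (only the fields needed here) from mutable maps.
    static Map<String, Object> message(String uuid, String subject) {
        Map<String, Object> delete = new HashMap<>();
        delete.put("deleteStatus", 0);
        delete.put("deleteReason", 0);
        Map<String, Object> message = new HashMap<>();
        message.put("uuid", uuid);
        message.put("Subject", subject);
        message.put("Delete", delete);
        return message;
    }

    static Map<String, Object> sampleDocument() {
        List<Map<String, Object>> messages = new ArrayList<>();
        messages.add(message("321", "subject1"));
        messages.add(message("123", "subject"));
        Map<String, Object> inbox = new HashMap<>();
        inbox.put("uuid", "1234");
        inbox.put("messageList", messages);
        Map<String, Object> source = new HashMap<>();
        source.put("uuid", "123");
        source.put("Inbox", inbox);
        return source;
    }

    public static void main(String[] args) {
        Map<String, Object> source = sampleDocument();
        int updated = markDeleted(source, "321", 1, 1);
        System.out.println("updated " + updated + " message(s): " + source.get("Inbox"));
    }
}
```

Unlike the partial-document update in the question, nothing here rebuilds the array, so the other message and the untouched fields of message 321 survive.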

How to retrieve a document by its own sub document or array?

I have documents with this structure:
{
"_id" : "4e76fd1e927e1c9127d1d2e8",
"name" : "***",
"embedPhoneList" : [
{
"type" : "home",
"number" : "00000000000"
},
{
"type" : "mobile",
"number" : "00000000000"
}
],
"embedAddrList" : [
{
"type" : "home",
"addr" : "Zhucheng, Shandong Province ***"
},
{
"type" : "work",
"addr" : "Nanshan District, Shenzhen ***"
}
],
"embedEmailList" : [
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
}
]
}
What I want to do is find the document by its subdocument, such as an email in the embedEmailList field.
Or if I have structure like this
{
"_id" : "4e76fd1e927e1c9127d1d2e8",
"name" : "***",
"embedEmailList" : [
"123#gmail.com" ,
"********#gmail.com"
]
}
embedEmailList is an array; how do I find out whether it contains 123#gmail.com?
Thanks.
To search for a specific value in an array, mongodb supports this syntax:
db.your_collection.find({embedEmailList : "foo#bar.com"});
See here for more information.
To search for a value in an embedded object, it supports this syntax:
db.your_collection.find({"embedEmailList.email" : "foo#bar.com"});
See here for more information.
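Both queries rely on the same rule: when a path reaches an array, MongoDB tests the condition against each element, and a dotted path such as embedEmailList.email descends into every embedded document of the array. With the Java driver the equivalents would be Filters.eq("embedEmailList", "123#gmail.com") and Filters.eq("embedEmailList.email", "123#gmail.com"). The block below is a simplified plain-Java model of that matching rule, not driver code; the extra addresses are hypothetical sample values, and it ignores cases such as comparing a whole array for equality.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified model of an equality filter on a (possibly dotted) field path; not the MongoDB driver.
class ArrayPathMatchDemo {
    // Arrays are searched element by element; a document matches if any value reached by the path equals target.
    static boolean matches(Object node, String[] path, int depth, Object target) {
        if (node instanceof List<?> list) {
            for (Object element : list) {
                if (matches(element, path, depth, target)) {
                    return true;
                }
            }
            return false;
        }
        if (depth == path.length) {
            return Objects.equals(node, target);
        }
        if (node instanceof Map<?, ?> map) {
            return matches(map.get(path[depth]), path, depth + 1, target);
        }
        return false;
    }

    static boolean eq(Map<String, Object> doc, String dottedPath, Object target) {
        return matches(doc, dottedPath.split("\\."), 0, target);
    }

    public static void main(String[] args) {
        Map<String, Object> plainArray = Map.of(
            "name", "***",
            "embedEmailList", List.of("123#gmail.com", "456#gmail.com"));
        Map<String, Object> embeddedDocs = Map.of(
            "name", "***",
            "embedEmailList", List.of(Map.of("email", "123#gmail.com"), Map.of("email", "456#gmail.com")));
        System.out.println(eq(plainArray, "embedEmailList", "123#gmail.com"));
        System.out.println(eq(embeddedDocs, "embedEmailList.email", "456#gmail.com"));
        System.out.println(eq(embeddedDocs, "embedEmailList.email", "789#gmail.com"));
    }
}
```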
