How to identify a result is coming from which index? - java

I know that we can search multiple indexes in elastic search but would I know if a particular search result is belonging to which index?
As per my requirement , I want to provide a global search on different types/indexes but a user should know that the search is coming from which index/context as that will help them to correctly associate the result to the context

Elasticsearch adds some fields to the search response. Some od them are _index and _type. You can use them for your purpose.
So the sample Elasticsearch response looks like below:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 19,
"max_score": 1.1,
"hits": Array[10][
{
"_index": "first_index_name",
"_type": "first_type_of_first_index",
"_id": "doc-id-125125422",
"_score": 1.1,
"_source": { /*here is your indexed document*/ }
},
{
"_index": "second_index_name",
"_type": "first_type_of_second_index",
"_id": "doc-id-212452314",
"_score": 0.9,
"_source": {...}
},
...
]
}
}

Related

How can I parse a GeoPoint value out of an ElasticSearch response in Java?

I am searching an elastic search index from Java using Elastic's high level REST client for JAVA.
My response looks like this...
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 2,
"hits": [
{
"_index": "contacts_1_rvmmtqnnlh",
"_type": "_doc",
"_id": "1",
"_score": 2,
"_source": {
"location": {
"lon": -71.34,
"lat": 41.12
}
}
},
{
"_index": "contacts_1_rvmmtqnnlh",
"_type": "_doc",
"_id": "5291485",
"_score": 2,
"_source": {
"home_address1": "208 E Main ST Ste 230",
"firstname": "Geri",
"home_city": "Belleville",
"location": "39.919499456869055,-89.08605153191894",
"lastname": "Boyer"
}
},
...
{
"_index": "contacts_1_rvmmtqnnlh",
"_type": "_doc",
"_id": "5291492",
"_score": 2,
"_source": {
"home_address1": "620 W High ST",
"firstname": "Edna",
"home_city": "Nashville",
"location": "40.55917440131824,-89.24254785283054",
"lastname": "Willis"
}
}
]
}
}
How can I parse out the latitude and longitude of each document hit? The latitude and longitude are stored in a field named "location" that is of type GeoPoint
Here is what I have tried...
SearchHit[] hits = searchResponse.getHits().getHits();
for (SearchHit hit : hits) {
Map<String, Object> contactMap = hit.getSourceAsMap();
LinkedHashMap<String, Object> contactLHM = new LinkedHashMap<>(contactMap);
Object coordinate = contactLHM.get("location");
location.latitude = ??????
location.longitude = ?????
}
How can I parse out the latitude and longitude given that the value of the coordinate variable is
{lon=-71.34, lat=41.12}
By the way, this is the location class definition:
public static class Location{
public Double latitude;
public Double longitude;
}
The source here indicates that you have saved documents with different _source.
You can do that with the geo_point type and of course, query them by using the same query. Basically elasticsearch understands both formats and analyzes them to the same structure (lat, lon), but that doesn't mean that it will change your source (which is exactly the data you saved).
First of all, if that's an option, you need to save the data with only one way, so the _source comes always the same. If that's not an option then you need to handle both formats (location as string, location as object of lat and lon). Moreover, you can update your _source by script.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html

Implementing priority search in elastic search

I'm trying to implement custom search in elastic search.
Problem statement is consider 3 documents inserted into elastic search with "names" field as array:
{
id:1,
names:["John Wick","Iron man"]
}
{
id:2,
names:["Wick Stone","Nick John"]
}
{
id:3,
names:["Manny Nick","Stone cold"]
}
when I search for "Nick" I want to boost or give priority to document starting with Nick so in this case document with id 2 should come first and then document with id 3 and also if I search for whole name "Manny Nick"
doc with id 3 should be given priority.
In such case, you may want to modify/boost the score of search matched result for required criteria. For example, match the documents with names "Nick" and at the same time modify and boost the score of documents which contains names that start with Nick so that documents that match Nick and also starts with Nick will have higher score.
One of the way to achieve this is using Function Score Query.
In the below query, search is made for keyword "Nick" and matched documents' score is modified and boosted for criteria "names that start with Nick" using Match Phrase Prefix Query with additional weight 20.
{
"query": {
"function_score": {
"query": {
"match": {
"names": "Nick"
}
},
"boost": "1",
"functions": [
{
"filter": {
"match_phrase_prefix": {
"names": "Nick"
}
},
"weight": 20
}
],
"boost_mode": "sum"
}
}
}
Testing:
Inserted data:
{
id:1,
names:["John Wick","Iron man"]
}
{
id:2,
names:["Wick Stone","Nick John"]
}
{
id:3,
names:["Manny Nick","Stone cold"]
}
Output:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 20.693148,
"hits": [
{
"_index": "stack_1",
"_type": "1",
"_id": "T9kn5WsBrk7qsVCmKBGH",
"_score": 20.693148,
"_source": {
"id": 2,
"names": [
"Wick Stone",
"Nick John"
]
}
},
{
"_index": "stack_1",
"_type": "1",
"_id": "Ttkm5WsBrk7qsVCm2RF_",
"_score": 20.287682,
"_source": {
"id": 3,
"names": [
"Manny Nick",
"Stone cold"
]
}
}
]
}
}

ElasticSearch Java Api: Update Existing Document

I am currently in the process of attempting to update an ElasticSearch document via the Java API. I have a groovy script with the following code:
static updateRequestById(String agencyIndex, String type, String id, def policy) {
UpdateRequest updateRequest = new UpdateRequest()
updateRequest.docAsUpsert(true);
updateRequest.parent("agentNumber");
updateRequest.index(agencyIndex)
updateRequest.type(type)
updateRequest.id(id)
updateRequest.doc("policies", policy)
elasticsearchClient.update(updateRequest).get()
}
The problem with I am having is that I want to update an array within the following document:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "int-b-agency",
"_type": "jacket",
"_id": "99808.1.27.09_4644",
"_score": 1,
"_source": {
"agentNumber": "99808.1.27.09",
"fileNumber": "4644",
"policies": [
{
"agentNumber": "99808.1.27.09",
"fileNumber": "4644",
"policyNumber": "2730609-91029084",
"checkNumber": "0",
"checkAmount": 0,
"createdOn": null,
"createdBy": "traxuser621",
"propertyTypeCode": "",
"propertyTypeDesc": "1-4 FAMILY RESIDENTIAL",
"ppaddress": "110 Allan Ct ",
"ppcity": "Jacksonville",
"ppstate": "FL",
"ppzip": "32226",
"ppcounty": "Duval",
"policytype": "",
"status": "Active",
"effectiveDate": "2015-04-01T00:00:00-05:00",
"formType": "BASIC OWNERS - ALTA Owners Policy 06_306_FL - FL Original Rate",
"rateCode": "FLOR",
"rateCodeDesc": "FL Original Rate",
"policyTypeCode": "1",
"policyTypeCodeDesc": "BASIC OWNERS",
"amount": 200000,
"hoiAgentNumber": "",
"proForma": false,
"pdfLocation": "\\\\10.212.61.206\\FNFCenter\\legacy_jacket_pdfs\\2015_4_FL6465\\Policy_2730609-91029084.pdf",
"legacyPolicy": "true",
"associatedPolNbr": null
}
]
}
}
]
}
}
In the document above I have a document that has an array called "policies" with a single object. I want to be able to update the "policies" array with additional objects. The end result should look something like the following:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "int-b-agency",
"_type": "jacket",
"_id": "41341.1.81.38_41340103",
"_score": 1,
"_source": {
"agentNumber": "41341.1.81.38",
"fileNumber": "41340103",
"policies": [
{
"agentNumber": "41341.1.81.38",
"fileNumber": "41340103",
"policyNumber": "8122638-91036874",
"checkNumber": "0",
"checkAmount": 0,
"createdOn": null,
"createdBy": "traxuser621",
"propertyTypeCode": "",
"propertyTypeDesc": "1-4 FAMILY RESIDENTIAL",
"ppaddress": "1800 Smith St ",
"ppcity": "sicklerville",
"ppstate": "PA",
"ppzip": "08105",
"ppcounty": "Dauphin",
"policytype": "",
"status": "Active",
"effectiveDate": "2016-02-01T00:00:00-06:00",
"formType": "TestData",
"rateCode": "PASALERATE",
"rateCodeDesc": "Sale Rate - Agent",
"policyTypeCode": "26",
"policyTypeCodeDesc": "SALE OWNERS",
"amount": 180000,
"hoiAgentNumber": "",
"proForma": false,
"pdfLocation": "SomeLocation1",
"legacyPolicy": "true",
"associatedPolNbr": null
},
{
"agentNumber": "41341.1.81.38",
"fileNumber": "41340103",
"policyNumber": "8122638-91036875",
"checkNumber": "0",
"checkAmount": 0,
"createdOn": null,
"createdBy": "traxuser621",
"propertyTypeCode": "",
"propertyTypeDesc": "1-4 FAMILY RESIDENTIAL",
"ppaddress": "1800 Smith St ",
"ppcity": "sicklerville",
"ppstate": "PA",
"ppzip": "08105",
"ppcounty": "Dauphin",
"policytype": "",
"status": "Active",
"effectiveDate": "2016-02-01T00:00:00-06:00",
"formType": "Test Data",
"rateCode": "PASALERATE",
"rateCodeDesc": "Sale Rate - Agent",
"policyTypeCode": "26",
"policyTypeCodeDesc": "SALE OWNERS",
"amount": 180000,
"hoiAgentNumber": "",
"proForma": false,
"pdfLocation": "SomeLocation2",
"legacyPolicy": "true",
"associatedPolNbr": null
}
]
}
}
]
}
}
What am I doing wrong?
You can use a scripted update:
Put your new policy in a parameter, for example policy
Use a script like the following :
if (!ctxt._source.policies) { ctxt._source.policies = [] }
ctxt._source.policies += policy
See this documentation : https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html
Updates in inverted indexes are deletes and replacements of documents. There is no in-place update like you find in a db. ES uses Lucene under the hood which in-turn implements a kick-ass inverted index.

Elasticsearch grouping by nested fields

Is there a way to group by nested fields and perform aggregation on a non-nested fields??
I have data like this in ES
{
"_index": "bighalf",
"_type": "excel",
"_id": "AVE0rgXqe0-x669Gsae3",
"_score": 1,
"_source": {
"Name": "Marsh",
"date": "2015-11-07T10:47:14",
"grade": 9,
"year": 2016,
"marks": 70,
"subject": "Mathematics",
"Gender": "male",
"dob": "22/11/2000",
"sprint": [
{
"sprintdate": "2015-11-06T22:30:00",
"sprintname": "changed",
"sprintpoints": 52
}
]
}
},
{
"_index": "bighalf",
"_type": "excel",
"_id": "AVE0rvTHe0-x669Gsae5",
"_score": 1,
"_source": {
"Name": "Taylor",
"date": "2015-11-07T10:47:14",
"grade": 9,
"year": 2016,
"marks": 54,
"subject": "Mathematics",
"Gender": "male",
"dob": "22/11/2000",
"sprint": [
{
"sprintdate": "2015-11-07T22:30:00",
"sprintname": "jira",
"sprintpoints": 52
}
]
}
}
I wanted to group by sprintname and find sum of marks
I tried like this:
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
NestedBuilder nested = AggregationBuilders.nested("nested").path("sprint")
.subAggregation(AggregationBuilders.terms("by_sprint").field("sprint.sprintname").subAggregation(sumGrades));
String names[] = { "changed", "jira" };
QueryBuilder query = QueryBuilders.boolQuery().must(
QueryBuilders.nestedQuery("sprint",QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("sprint.sprintname", names))));
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(query).subAggregation(nested);
the sum_grade did not work for me. But I changed field(grade) with nested field (sprintpoints) and it worked But my requirement is to find sum("grade") and group by sprint.sprintname.
Since your sprint field is of nested type, in your aggregation you need to use the reverse_nested aggregation in order to "jump back" at the root document from within your nested ones. It goes like this:
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
ReverseNestedBuilder backToGrades = AggregationBuilders.reverseNested("spring_to_grade")
.subAggregation(sumGrades);
TermsBuilder bySprint = AggregationBuilders.terms("by_sprint")
.field("sprint.sprintname").subAggregation(backToGrades)
NestedBuilder nested = AggregationBuilders.nested("nested").path("sprint")
.subAggregation(bySprint);
String names[] = { "changed", "jira" };
QueryBuilder query = QueryBuilders.boolQuery().must(
QueryBuilders.nestedQuery("sprint",QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("sprint.sprintname", names))));
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg").filter(query).subAggregation(nested);

Add _timestamp to Elasticsearch results using Java API

I have an Elasticsearch index which has _timestamp populated on every record. Using Marvel or curl I can get the _timestamp in the "fields" part of the result for example:
GET index/type/_search?fields=_timestamp,_source
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 116888,
"max_score": 1,
"hits": [
{
"_index": "index",
"_type": "type",
"_id": "mXJdWqSLSfykbMtChiCRjA",
"_score": 1,
"_source": {
"results": "example",
},
"fields": {
"_timestamp": 1443618319514
}
},...
However when doing a search using the Java API I cant get it to return the _timestamp.
SearchRequestBuilder builder= client.prepareSearch(index)
.addFacet(facet)
.setFrom(start)
.setSize(limit);
SearchResponse response = builder.execute().actionGet();
Can anyone tell me how to ask for _timestamp too?
You simply need to use the setFields() method like this:
SearchRequestBuilder builder= client.prepareSearch(index)
.setType(type)
.addFacet(facet)
.setFields("_timestamp") <--- add this line
.setFrom(start)
.setSize(limit);
SearchResponse response = builder.execute().actionGet();

Categories