How to remove certain fields from a JSON HTTP response in Java?

I am interacting with Elasticsearch using the Java REST client from a Spring Boot app, and my response is a JSON object.
See the sample below.
What would be the most effective way to remove certain unwanted fields?
For example: remove "took", "successful", and "_source.accountname".
Since this is a nested JSON object and there is a high possibility of getting a larger response, I am trying to find the most effective way. I am worried that looping through each nested object might chew up time.
{
"took" : 63,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : null,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie#euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke#pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
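One workable approach, if the fields must be removed client-side: parse the response into plain Java maps and lists (e.g. with Jackson's `ObjectMapper.readValue(json, Map.class)`, which Spring Boot already ships) and strip the unwanted keys in a single recursive pass. Despite the worry about looping, this visits each node exactly once, so it is linear in the size of the response. A minimal sketch; the class and method names are made up for illustration:

```java
import java.util.*;

public class JsonFieldStripper {
    // Recursively removes the given keys from a nested Map/List structure,
    // such as the tree Jackson produces when binding JSON to Map.class.
    static void removeKeys(Object node, Set<String> unwanted) {
        if (node instanceof Map<?, ?> map) {
            map.keySet().removeAll(unwanted);
            for (Object value : map.values()) {
                removeKeys(value, unwanted);
            }
        } else if (node instanceof List<?> list) {
            for (Object item : list) {
                removeKeys(item, unwanted);
            }
        }
    }

    public static void main(String[] args) {
        // Miniature version of the response above
        Map<String, Object> source = new HashMap<>(Map.of("accountname", "x", "balance", 16623));
        Map<String, Object> hit = new HashMap<>(Map.of("_id", "0", "_source", source));
        Map<String, Object> response = new HashMap<>(Map.of("took", 63, "hits", new ArrayList<>(List.of(hit))));
        removeKeys(response, Set.of("took", "successful", "accountname"));
        System.out.println(response.containsKey("took"));      // prints false
        System.out.println(source.containsKey("accountname")); // prints false
        System.out.println(source.containsKey("balance"));     // prints true
    }
}
```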

Related

Groovy json parsing using JsonSlurper

I am trying to parse the following JSON, which I get from the Elasticsearch API, in Groovy using JsonSlurper. I need to create a list of _id values out of this JSON. I have tried multiple variations of code but with no success.
Please suggest; any help is appreciated.
{
"took" : 2535,
"timed_out" : false,
"_shards" : {
"total" : 384,
"successful" : 384,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [
{
"_index" : "X",
"_type" : "_doc",
"_id" : "310165903526204",
"_score" : null,
"sort" : [
"310165903526204"
]
},
{
"_index" : "X",
"_type" : "_doc",
"_id" : "310165903698515",
"_score" : null,
"sort" : [
"310165903698515"
]
},
{
"_index" : "X",
"_type" : "_doc",
**"_id" : "310165903819494"**,
"_score" : null,
"sort" : [
"310165903819494"
]
}
]
}
}
PS: I tried using multiple clients provided by Elasticsearch to search ES and parse the data, but I am facing another issue with that, so I had to switch to an HTTP client and parse manually. This is the link for the client issue: RestHighLevelClient with Elasticsearch client error
Update:
{
"took" : 19,
"timed_out" : false,
"_shards" : {
"total" : 370,
"successful" : 370,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index",
"_type" : "_doc",
"_id" : "3961655114649",
"_score" : 1.0,
"_source" : {
"location" : {
"lat" : 14.94046,
"lon" : -23.48016
},
"place" : {
"country" : "USA",
"pois" : [
{
"externalIdentifier" : "3961655114649",
"gdfFeatureCode" : "7376",
"officialNames" : [
{
"name" : "ENG",
"language" : "ENG"
}
],
"alternateNames" : [ ],
"brandNames" : [ ],
"streetsAndCities" : [
{
"city" : "California",
"cityLanguage" : "UND"
}
],
"postalCode" : "",
"postalCodeMain" : ""
}
],
"providers" : [
{
"UniqueId" : """{"mostSigBits": 6332787932357083, "leastSigBits": -6052983698683356}""",
"code" : "ABC",
"deliveryId" : "3959",
"rawId" : """{"mostSigBits": 8772023489060096, "leastSigBits": -6327158443391381}""",
"totalAttributes" : "1",
"visibleAttributes" : "1"
},
{
"UniqueId" : """{"mostSigBits": 6332787932357083, "leastSigBits": -6052983698683356}_1""",
"rawId" : """{"mostSigBits": 8772023489060096, "leastSigBits": -6327158443391381}""",
"totalAttributes" : "1",
"visibleAttributes" : "1"
}
],
"attributes" : [ ],
"isAddObservation" : false
},
"transactionCommitDate" : 0
}
}
]
}
}
With this updated JSON, I want to pull the mostSigBits and leastSigBits values in typeId under providers. But the catch is that I want to pull only the typeId inside providers[] that does not have a _1 or _2 or any other suffix on it.
I have tried to get that data by doing this, but I am looking for a better approach:
json.hits.hits[0]._source.Place.provider[0].typeId
def _id = json.hits.hits.collect{ it._id }
_id.each{println it}
Groovy's basic object navigation is here to help:
import groovy.json.*
String response = '''\
{
"took" : 2535,
"timed_out" : false,
"_shards" : {
"total" : 384,
"successful" : 384,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [
{
"_index" : "X",
"_type" : "_doc",
"_id" : "310165903526204",
"_score" : null,
"sort" : [
"310165903526204"
]
},
{
"_index" : "X",
"_type" : "_doc",
"_id" : "310165903698515",
"_score" : null,
"sort" : [
"310165903698515"
]
},
{
"_index" : "X",
"_type" : "_doc",
"_id" : "310165903819494",
"_score" : null,
"sort" : [
"310165903819494"
]
}
]
}
}'''
def json = new JsonSlurper().parseText response
List ids = json.hits.hits*._id
assert ids.toString() == '[310165903526204, 310165903698515, 310165903819494]'
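For the updated JSON, the remaining piece is keeping only the provider entries whose id has no `_1`/`_2`-style suffix. The thread uses Groovy, but the core check translates directly; here is a minimal Java sketch operating on parsed maps. Note the sample data exposes `UniqueId` rather than `typeId`, so the field name below is an assumption taken from the sample:

```java
import java.util.*;
import java.util.regex.Pattern;

public class ProviderFilter {
    // Matches ids ending in an underscore plus digits, e.g. "..._1" or "..._2".
    private static final Pattern SUFFIX = Pattern.compile("_\\d+$");

    // Keeps only the provider entries whose "UniqueId" carries no _N suffix.
    static List<Map<String, Object>> withoutSuffix(List<Map<String, Object>> providers) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> p : providers) {
            String id = String.valueOf(p.get("UniqueId"));
            if (!SUFFIX.matcher(id).find()) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> providers = List.of(
                Map.of("UniqueId", "abc", "code", "ABC"),
                Map.of("UniqueId", "abc_1"));
        System.out.println(withoutSuffix(providers).size()); // prints 1
    }
}
```

In Groovy the same idea is a one-liner over the parsed list, e.g. a `findAll` with the regex check, but the suffix test is identical.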

Auto Delete index log ElasticSearch by period

How can I configure Elasticsearch to delete logs older than one month?
Or, if there is no such configuration, how can I call the delete API for this purpose from Java?
Thank you
Have you tried using the Delete by query API?
This post also discusses how to go about doing so for 10 days. You could try it with 30 days instead.
Reading over these, you should get an idea of what to do. I do not know the exact answer but I hope this at least helps.
This is the sample index with 2 documents.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "sample",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"created" : "06/18/2021"
}
},
{
"_index" : "sample",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"created" : "06/17/2021"
}
}
]
}
}
Now you can use delete by query
POST /sample/_delete_by_query
{
"query": {
"bool": {
"must": [
{
"range": {
"created": {
"lte": "now-30d/d",
"format": "MM/dd/yyyy"
}
}
}
]
}
}
}
Output:
{
"took" : 8,
"timed_out" : false,
"total" : 2,
"deleted" : 2,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
You will see Total: 2 and Deleted: 2
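To drive the same cleanup from Java, the main thing you need client-side is the cutoff date in the index's date format; the query itself can then be sent through whichever client you use. A minimal sketch with plain java.time (no Elasticsearch dependencies) mirroring the now-30d/d date math above:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RetentionCutoff {
    // Computes the MM/dd/yyyy cutoff string for "delete everything older
    // than N days", the client-side equivalent of now-30d/d above.
    static String cutoff(LocalDate today, int days) {
        return today.minusDays(days).format(DateTimeFormatter.ofPattern("MM/dd/yyyy"));
    }

    public static void main(String[] args) {
        // For a run on 06/18/2021 with a 30-day retention window:
        System.out.println(cutoff(LocalDate.of(2021, 6, 18), 30)); // prints 05/19/2021
    }
}
```

Documents with `created` on or before the printed date would fall inside the delete-by-query range.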
Hope this helps @java_dev

Elastic termsQuery not giving expected result

I have an index where each of my objects has a status field which can have some predefined values. I want to fetch all of them whose status matches any of INITIATED, UPDATED, or DELETED, and hence created this query in Java (which I got by printing it to the console), using QueryBuilders and NativeSearchQuery, executed by ElasticsearchOperations:
{
"bool" : {
"must" : [
{
"terms" : {
"status" : [
"INITIATED",
"UPDATED",
"DELETED"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
I have data in my index with 'INITIATED' status, but I am not getting any results for the statuses mentioned in the query. How can I fix this query, please?
If you need anything else, please let me know.
Update: code added
NativeSearchQueryBuilder nativeSearchQueryBuilder=new NativeSearchQueryBuilder();
QueryBuilder singleQb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("status", statusList));
Pageable pageable = PageRequest.of(0, 1, Sort.by(Defs.START_TIME).ascending());
FieldSortBuilder sort = SortBuilders.fieldSort(Defs.START_TIME).order(SortOrder.ASC);
nativeSearchQueryBuilder.withQuery(singleQb);
nativeSearchQueryBuilder.withSort(sort);
nativeSearchQueryBuilder.withPageable(pageable);
nativeSearchQueryBuilder.withIndices(Defs.SCHEDULED_MEETING_INDEX);
nativeSearchQueryBuilder.withTypes(Defs.SCHEDULED_MEETING_INDEX);
NativeSearchQuery searchQuery = nativeSearchQueryBuilder.build();
List<ScheduledMeetingEntity> scheduledList=elasticsearchTemplate.queryForList(searchQuery, ScheduledMeetingEntity.class);
Update 2: sample data:
I got this from kibana query on this index:
"hits" : [
{
"_index" : "index_name",
"_type" : "type_name",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"createTime" : "2021-03-03T13:09:59.198",
"createTimeInMs" : 1614755399198,
"createdBy" : "user1#domain.com",
"editTime" : "2021-03-03T13:09:59.198",
"editTimeInMs" : 1614755399198,
"editedBy" : "user1#domain.com",
"versionId" : 1,
"id" : "1",
"meetingId" : "47",
"userId" : "129",
"username" : "user1#domain.com",
"recipient" : [
"user1#domain.com"
],
"subject" : "subject",
"body" : "hi there",
"startTime" : "2021-03-04T07:26:00.000",
"endTime" : "2021-03-04T07:30:00.000",
"meetingName" : "name123",
"meetingPlace" : "placeName",
"description" : "sfsafsdafsdf",
"projectName" : "",
"status" : "INITIATED",
"failTry" : 0
}
}
]
Confirm your mapping:
GET /yourIndexName/_mapping
and see if it is valid. Your mapping needs a keyword sub-field for TermsQuery to work:
{
"status": {
"type" "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
ES can automatically do the mapping for you (without you having to do it yourself) when you first push a document. However you probably have finer control if you do the mapping yourself.
Either way, you need to have keyword defined for your status field.
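To see why the keyword sub-field matters: a text field is run through an analyzer at index time (the default analyzer lowercases), while a terms query matches the stored tokens verbatim, so uppercase values like INITIATED can never match the lowercased tokens. A toy sketch of that mismatch; the analyze method here is a stand-in for the real analyzer, not an Elasticsearch API:

```java
import java.util.*;

public class AnalyzerMismatch {
    // Mimics the default analyzer: lowercase and split on whitespace.
    static List<String> analyze(String value) {
        return Arrays.asList(value.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> indexedTokens = analyze("INITIATED");
        // A terms query compares the literal term against the stored tokens:
        System.out.println(indexedTokens.contains("INITIATED")); // prints false: the terms query misses
        System.out.println(indexedTokens.contains("initiated")); // prints true
    }
}
```

Querying `status.keyword` (or mapping status as keyword outright) avoids the mismatch, because keyword values are stored untouched.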
=====================
Alternative Solution (Case Insensitive):
If you have a field named status, and the values you want to search for are INITIATED, UPDATED, or DELETED, then you can do it like this:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must(createStringSearchQuery());
public QueryBuilder createStringSearchQuery(){
QueryStringQueryBuilder queryBuilder = QueryBuilders.queryStringQuery(" INITIATED OR UPDATED OR DELETED ");
queryBuilder.defaultField("status");
return queryBuilder;
}
Printing the QueryBuilder:
{
"query_string" : {
"query" : "INITIATED OR UPDATED OR DELETED",
"default_field" : "status",
"fields" : [ ],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}

elasticsearch - Issue with aggregations along with filters

I am using the Transport client to retrieve data from Elasticsearch.
Example code snippet:
String[] names = {"Stokes","Roshan"};
BoolQueryBuilder builder = QueryBuilders.boolQuery();
AggregationBuilder<?> aggregation = AggregationBuilders.filters("agg")
.filter(builder.filter(QueryBuilders.termsQuery("Name", "Taylor"))
.filter(QueryBuilders.rangeQuery("grade").lt(9.0)))
.subAggregation(AggregationBuilders.terms("by_year").field("year")
.subAggregation(AggregationBuilders.sum("sum_marks").field("marks"))
.subAggregation(AggregationBuilders.sum("sum_grade").field("grade")));
SearchResponse response = client.prepareSearch(index)
.setTypes(datasquareID)
.addAggregation(aggregation)
.execute().actionGet();
System.out.println(response.toString());
I wanted to calculate the sum of marks and the sum of grades for the names "Stokes" or "Roshan" whose grade is less than 9, and group them by "year". Please let me know whether my approach is correct, and any suggestions you have.
Documents in ES:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1,
"hits" : [{
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rgXqe0-x669Gsae3",
"_score" : 1,
"_source" : {
"Name" : "Taylor",
"grade" : 9,
"year" : 2016,
"marks" : 54,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "13/09/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rvTHe0-x669Gsae5",
"_score" : 1,
"_source" : {
"Name" : "Marsh",
"grade" : 9,
"year" : 2015,
"marks" : 70,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "22/11/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0sBbZe0-x669Gsae7",
"_score" : 1,
"_source" : {
"Name" : "Taylor",
"grade" : 3,
"year" : 2015,
"marks" : 87,
"subject" : "physics",
"Gender" : "male",
"dob" : "13/09/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rWz4e0-x669Gsae2",
"_score" : 1,
"_source" : {
"Name" : "Stokes",
"grade" : 9,
"year" : 2015,
"marks" : 91,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "21/12/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0roT4e0-x669Gsae4",
"_score" : 1,
"_source" : {
"Name" : "Roshan",
"grade" : 9,
"year" : 2015,
"marks" : 85,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "12/12/2000"
}
}
]
}
}
Response :
"aggregations" : {
"agg" : {
"buckets" : [{
"doc_count" : 0,
"by_year" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : []
}
}
]
}
}
Please let me know the solution for my requirement.
I think the issue is in your filters aggregation. To sum it up, you want to filter your aggregation to documents "... with names "Stokes" or "Roshan" whose grade is less than 9". In order to do this:
// create the sum aggregations
SumBuilder sumMarks = AggregationBuilders.sum("sum_marks").field("marks");
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
// create the year aggregation + add the sum sub-aggregations
TermsBuilder yearAgg = AggregationBuilders.terms("by_year").field("year")
.subAggregation(sumMarks)
.subAggregation(sumGrades);
// create the bool filter for the condition above
String[] names = {"stokes","roshan"};
BoolQueryBuilder aggFilter = QueryBuilders.boolQuery()
.must(QueryBuilders.termsQuery("Name", names))
.must(QueryBuilders.rangeQuery("grade").lte(9.0));
// create the filter aggregation and add the year sub-aggregation
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg")
.filter(aggFilter)
.subAggregation(yearAgg);
// create the request and execute it
SearchResponse response = client.prepareSearch(index)
.setTypes(datasquareID)
.addAggregation(aggregation)
.execute().actionGet();
System.out.println(response.toString());
In the end, it will look like this:
{
"query": {
"match_all": {}
},
"aggs": {
"agg": {
"filter": {
"bool": {
"must": [
{
"terms": {
"Name": [
"stokes",
"roshan"
]
}
},
{
"range": {
"grade": {
"lte": 9
}
}
}
]
}
},
"aggs": {
"by_year": {
"terms": {
"field": "year"
},
"aggs": {
"sum_marks": {
"sum": {
"field": "marks"
}
},
"sum_grade": {
"sum": {
"field": "grade"
}
}
}
}
}
}
}
}
For your documents above, the result will look like this:
"aggregations": {
"agg": {
"doc_count": 2,
"by_year": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2015,
"doc_count": 2,
"sum_grade": {
"value": 18
},
"sum_marks": {
"value": 176
}
}
]
}
}
}

elasticsearch disable term frequency scoring

I want to change the scoring system in Elasticsearch so that multiple appearances of a term are not counted. For example, I want:
"texas texas texas"
and
"texas"
to come out with the same score. I found this mapping, which the Elasticsearch docs said would disable term-frequency counting, but my searches do not come out with the same score:
"mappings":{
"business": {
"properties" : {
"name" : {
"type" : "string",
"index_options" : "docs",
"norms" : { "enabled": false}}
}
}
}
}
Any help will be appreciated, I have not been able to find a lot of information on this.
I am adding my search code and what gets returned when I use explain.
My search code:
Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "escluster").build();
Client client = new TransportClient(settings)
.addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));
SearchRequest request = Requests.searchRequest("businesses")
.source(SearchSourceBuilder.searchSource().query(QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("name", "Texas")
.minimumShouldMatch("1")))).searchType(SearchType.DFS_QUERY_THEN_FETCH);
ExplainRequest request2 = client.prepareIndex("businesses", "business")
and when I search with explain I get:
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9U5KBks4zEorv9YI4n",
"_score" : 1.0,
"_source":{
"name" : "texas"
}
,
"_explanation" : {
"value" : 1.0,
"description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.0,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 1.0,
"description" : "idf(docFreq=2, maxDocs=3)"
}, {
"value" : 1.0,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
}, {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9U5K6Ks4zEorv9YI4o",
"_score" : 0.8660254,
"_source":{
"name" : "texas texas texas"
}
,
"_explanation" : {
"value" : 0.8660254,
"description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.8660254,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.7320508,
"description" : "tf(freq=3.0), with freq of:",
"details" : [ {
"value" : 3.0,
"description" : "termFreq=3.0"
} ]
}, {
"value" : 1.0,
"description" : "idf(docFreq=2, maxDocs=3)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
} ]
}
It looks like it is still considering term frequency and document frequency. Any ideas? Sorry for the bad formatting; I don't know why it is appearing so grotesque.
My code from the browser search http://localhost:9200/businesses/business/_search?pretty=true&qname=texas is:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YcCKjKvtg8NgyozGK",
"_score" : 1.0,
"_source":{"business" : {
"name" : "texas texas texas texas" }
}
}, {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YateBKvtg8Ngyoy-p",
"_score" : 1.0,
"_source":{
"name" : "texas" }
}, {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YavVnKvtg8Ngyoy-4",
"_score" : 1.0,
"_source":{
"name" : "texas texas texas" }
}, {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9Yb7NgKvtg8NgyozFf",
"_score" : 1.0,
"_source":{"business" : {
"name" : "texas texas texas" }
}
} ]
}
}
It finds all 4 objects I have in there and has them all the same score.
When I run my java API search with explain I get:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.287682,
"hits" : [ {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YateBKvtg8Ngyoy-p",
"_score" : 1.287682,
"_source":{
"name" : "texas" }
,
"_explanation" : {
"value" : 1.287682,
"description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.287682,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 1.287682,
"description" : "idf(docFreq=2, maxDocs=4)"
}, {
"value" : 1.0,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
}, {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YavVnKvtg8Ngyoy-4",
"_score" : 1.1151654,
"_source":{
"name" : "texas texas texas" }
,
"_explanation" : {
"value" : 1.1151654,
"description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.1151654,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.7320508,
"description" : "tf(freq=3.0), with freq of:",
"details" : [ {
"value" : 3.0,
"description" : "termFreq=3.0"
} ]
}, {
"value" : 1.287682,
"description" : "idf(docFreq=2, maxDocs=4)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
} ]
}
}
It looks like one cannot override the index options for a field after the field has been initially set in the mapping.
Example:
put test
put test/business/_mapping
{
"properties": {
"name": {
"type": "string",
"index_options": "freqs",
"norms": {
"enabled": false
}
}
}
}
put test/business/_mapping
{
"properties": {
"name": {
"type": "string",
"index_options": "docs",
"norms": {
"enabled": false
}
}
}
}
get test/business/_mapping
{
"test": {
"mappings": {
"business": {
"properties": {
"name": {
"type": "string",
"norms": {
"enabled": false
},
"index_options": "freqs"
}
}
}
}
}
}
You would have to recreate the index to pick up the new mapping
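For intuition, the explain output above follows classic Lucene TF/IDF, where the tf component is the square root of the term frequency (hence the 1.7320508 for freq=3). With "index_options": "docs", frequencies are simply not stored, so every match behaves as if freq = 1. A toy sketch of that arithmetic, not Elasticsearch code:

```java
public class DocsOnlyScoring {
    // Classic Lucene tf component: sqrt(term frequency). With
    // "index_options": "docs", frequencies are not indexed, so the
    // effective frequency is 1 for every matching document.
    static double tf(long freq, boolean docsOnly) {
        return docsOnly ? 1.0 : Math.sqrt((double) freq);
    }

    public static void main(String[] args) {
        // "texas texas texas" vs "texas": same tf once frequencies are dropped
        System.out.println(tf(3, true) == tf(1, true)); // prints true
        System.out.println(tf(3, false));               // prints 1.7320508..., the value in the explain output
    }
}
```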
Your field type must be text, and you must re-index (create a new index):
"mappings": {
"properties": {
"text": {
"type": "text",
"index_options": "docs"
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-options.html
