I have gone through various threads but couldn't find an answer for Python.
I have a JSON file:
{
"StoreID" : "123",
"Status" : 3,
"data" : {
"Response" : {
"section" : "25",
"elapsed" : 277.141,
"products" : {
"prd_1": {
"price" : 11.99,
"qty" : 10,
"upc" : "0787493"
},
"prd_2": {
"price" : 9.99,
"qty" : 2,
"upc" : "0763776"
},
"prd_3": {
"price" : 29.99,
"qty" : 8,
"upc" : "9948755"
}
},
"type" : "Tagged"
}
}
}
I need to convert this JSON file into the format below, changing the JSON object 'products' into an array.
{
"StoreID" : "123",
"Status" : 3,
"data" : {
"Response" : {
"section" : "25",
"elapsed" : 277.141,
"products" : [
{
"price" : 11.99,
"qty" : 10,
"upc" : "0787493"
},
{
"price" : 9.99,
"qty" : 2,
"upc" : "0763776"
},
{
"price" : 29.99,
"qty" : 8,
"upc" : "9948755"
}
],
"type" : "Tagged"
}
}
}
Is there a good way to do this in Python? Most of the examples I found use Java rather than Python. Could you please show me how to do it in Python?
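Just take the values() of the products dictionary; that gives you the values as a collection. The code below works for me, assuming your JSON is in file1.txt. Note that in Python 3, values() returns a view object that json.dumps cannot serialize, so wrap it in list().
import json
# Load the JSON document from disk
with open('file1.txt') as jdata:
    data = json.load(jdata)
products = data["data"]["Response"]["products"]
# Replace the products object with a list of its values
data["data"]["Response"]["products"] = list(products.values())
print(json.dumps(data))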
output:
{"StoreID": "123", "Status": 3, "data": {"Response": {"section": "25", "elapsed": 277.141, "products": [{"price": 11.99, "qty": 10, "upc": "0787493"}, {"price": 9.99, "qty": 2, "upc": "0763776"}, {"price": 29.99, "qty": 8, "upc": "9948755"}], "type": "Tagged"}}}
Would something like this work for you?
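import json
import copy
# Load the source document; the context manager closes the file afterwards
with open("your_data.json", "r") as f:
    a = json.load(f)
b = copy.deepcopy(a)  # work on a copy so the original stays untouched
t = a['data']['Response']['products']
b['data']['Response']['products'] = list(t.values())  # equivalent to [t[i] for i in t]
You can convert the result back to JSON with json.dumps(b).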
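For trying this out without a file on disk, here is a minimal self-contained sketch of the same idea. The helper name products_to_list and the inline sample string are only illustrative; it assumes Python 3.7 or later, where dictionaries keep insertion order, so the products come out in the order they appear in the input.

```python
import json

def products_to_list(doc):
    """Return a copy of doc with data.Response.products turned into a list."""
    # Round-tripping through JSON is a simple deep copy for JSON-compatible data.
    result = json.loads(json.dumps(doc))
    response = result["data"]["Response"]
    response["products"] = list(response["products"].values())
    return result

raw = """
{
  "StoreID": "123",
  "Status": 3,
  "data": {
    "Response": {
      "section": "25",
      "elapsed": 277.141,
      "products": {
        "prd_1": {"price": 11.99, "qty": 10, "upc": "0787493"},
        "prd_2": {"price": 9.99, "qty": 2, "upc": "0763776"},
        "prd_3": {"price": 29.99, "qty": 8, "upc": "9948755"}
      },
      "type": "Tagged"
    }
  }
}
"""

original = json.loads(raw)
converted = products_to_list(original)
print(json.dumps(converted, indent=2))
```

The input document is left unchanged, so both shapes remain available if needed.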
I want to extract the content of the "hits" array, from "target" through "text", from the JSON string below.
I get this JSON data from an Elasticsearch server:
String holdedEntity =
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "try1",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"target" : {
"br_id" : 0,
"wo_id" : 2,
"process" : [
"element 1",
"element 2"
]
},
"explanation" : {
"an_id" : 1311,
"pa_name" : "micha"
},
"text" : "hello world"
}
}
]
}
}
The result should look like this:
String result =
{
"target" : {
"br_id" : 0,
"wo_id" : 2,
"process" : [
"element 1",
"element 2"
]
},
"explanation" : {
"an_id" : 1311,
"pa_name" : "micha"
},
"text" : "hello world"
}
I tried the following, but it does not give the result shown above. Any suggestions would be appreciated.
JSONObject jsonObj = new JSONObject(holdedEntity);//convert the holdedEntity into JSONObject.
JSONObject jsonObjectContent = jsonObj.getJSONObject("target");//trying to get the content starting from "target" until the "text" .
String result = jsonObjectContent.toString(); //converting the jsonObjectContent toString.
However, it does not recognise the field "target" and throws this error:
JSONObject["target"] not found
Any advice would be appreciated.
Thanks.
There is no target field at the top level; you have to get the hits field, then the hits field of that, then the appropriate element of that (since it is an array); finally, the _source element of that will get you your desired result.
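The navigation described above can be sketched in Python as follows (the question uses Java's JSONObject, where the equivalent chain would go through getJSONObject("hits"), getJSONArray("hits"), getJSONObject(0) and getJSONObject("_source")). The sample string is a shortened version of the response from the question, and taking the first hit is an assumption that fits a single-result response.

```python
import json

# Shortened copy of the Elasticsearch response from the question
holded_entity = """
{
  "took": 0,
  "hits": {
    "total": 1,
    "hits": [
      {
        "_index": "try1",
        "_id": "1",
        "_source": {
          "target": {"br_id": 0, "wo_id": 2, "process": ["element 1", "element 2"]},
          "explanation": {"an_id": 1311, "pa_name": "micha"},
          "text": "hello world"
        }
      }
    ]
  }
}
"""

response = json.loads(holded_entity)
hits = response["hits"]["hits"]                 # outer "hits" object, then inner "hits" array
source = hits[0]["_source"] if hits else None   # first hit, guarded against an empty result
result = json.dumps(source, indent=2)
print(result)
```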
I am trying to find a way to convert a JSON array to a JSON string.
http://jsonpath.com/
JSON
{
"firstName": "John",
"lastName" : "doe",
"age" : 26,
"address" : {
"streetAddress": "naist street",
"city" : "Nara",
"postalCode" : "630-0192"
},
"phoneNumbers": [
{
"type" : ["iPhone"],
"number": "0123-4567-8888"
},
{
"type" : ["home"],
"number": "0123-4567-8910"
}
]
}
Output
iphone
Expressions I tried:
$.phoneNumbers[:1].type[,]
$.phoneNumbers[:1].type
Thanks in advance
I am using the Transport client to retrieve data from Elasticsearch.
Example code snippet:
String[] names = {"Stokes","Roshan"};
BoolQueryBuilder builder = QueryBuilders.boolQuery();
AggregationBuilder<?> aggregation = AggregationBuilders.filters("agg")
.filter(builder.filter(QueryBuilders.termsQuery("Name", "Taylor"))
.filter(QueryBuilders.rangeQuery("grade").lt(9.0)))
.subAggregation(AggregationBuilders.terms("by_year").field("year")
.subAggregation(AggregationBuilders.sum("sum_marks").field("marks"))
.subAggregation(AggregationBuilders.sum("sum_grade").field("grade")));
SearchResponse response = client.prepareSearch(index)
.setTypes(datasquareID)
.addAggregation(aggregation)
.execute().actionGet();
System.out.println(response.toString());
I want to calculate the sum of marks and the sum of grades for documents with names "Stokes" or "Roshan" whose grade is less than 9, and group them by "year". Please let me know whether my approach is correct, and share any suggestions you have.
Documents in ES:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1,
"hits" : [{
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rgXqe0-x669Gsae3",
"_score" : 1,
"_source" : {
"Name" : "Taylor",
"grade" : 9,
"year" : 2016,
"marks" : 54,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "13/09/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rvTHe0-x669Gsae5",
"_score" : 1,
"_source" : {
"Name" : "Marsh",
"grade" : 9,
"year" : 2015,
"marks" : 70,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "22/11/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0sBbZe0-x669Gsae7",
"_score" : 1,
"_source" : {
"Name" : "Taylor",
"grade" : 3,
"year" : 2015,
"marks" : 87,
"subject" : "physics",
"Gender" : "male",
"dob" : "13/09/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0rWz4e0-x669Gsae2",
"_score" : 1,
"_source" : {
"Name" : "Stokes",
"grade" : 9,
"year" : 2015,
"marks" : 91,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "21/12/2000"
}
}, {
"_index" : "bighalf",
"_type" : "excel",
"_id" : "AVE0roT4e0-x669Gsae4",
"_score" : 1,
"_source" : {
"Name" : "Roshan",
"grade" : 9,
"year" : 2015,
"marks" : 85,
"subject" : "Mathematics",
"Gender" : "male",
"dob" : "12/12/2000"
}
}
]
}
}
Response :
"aggregations" : {
"agg" : {
"buckets" : [{
"doc_count" : 0,
"by_year" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : []
}
}
]
}
}
Please let me know the solution for my requirement.
I think the issue is in your filters aggregation. To sum it up, you want to restrict your aggregation to documents "... with names "Stokes" or "Roshan" whose grade is less than 9". To do this, combine both conditions in a single bool filter and attach the year aggregation to it:
// create the sum aggregations
SumBuilder sumMarks = AggregationBuilders.sum("sum_marks").field("marks");
SumBuilder sumGrades = AggregationBuilders.sum("sum_grade").field("grade");
// create the year aggregation + add the sum sub-aggregations
TermsBuilder yearAgg = AggregationBuilders.terms("by_year").field("year")
.subAggregation(sumMarks)
.subAggregation(sumGrades);
// create the bool filter for the condition above
String[] names = {"stokes","roshan"};
BoolQueryBuilder aggFilter = QueryBuilders.boolQuery()
.must(QueryBuilders.termsQuery("Name", names))
.must(QueryBuilders.rangeQuery("grade").lte(9.0));
// create the filter aggregation and add the year sub-aggregation
FilterAggregationBuilder aggregation = AggregationBuilders.filter("agg")
.filter(aggFilter)
.subAggregation(yearAgg);
// create the request and execute it
SearchResponse response = client.prepareSearch(index)
.setTypes(datasquareID)
.addAggregation(aggregation)
.execute().actionGet();
System.out.println(response.toString());
In the end, it will look like this:
{
"query": {
"match_all": {}
},
"aggs": {
"agg": {
"filter": {
"bool": {
"must": [
{
"terms": {
"Name": [
"stokes",
"roshan"
]
}
},
{
"range": {
"grade": {
"lte": 9
}
}
}
]
}
},
"aggs": {
"by_year": {
"terms": {
"field": "year"
},
"aggs": {
"sum_marks": {
"sum": {
"field": "marks"
}
},
"sum_grade": {
"sum": {
"field": "grade"
}
}
}
}
}
}
}
}
For your documents above, the result will look like this:
"aggregations": {
"agg": {
"doc_count": 2,
"by_year": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2015,
"doc_count": 2,
"sum_grade": {
"value": 18
},
"sum_marks": {
"value": 176
}
}
]
}
}
}
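To double-check the numbers in that result, the same filter, grouping and sums can be reproduced locally. This is only a plain-Python sketch over the five sample documents, not Elasticsearch code; the function name sums_by_year is made up, and the case-insensitive name comparison is an assumption that mirrors the lowercase terms used in the query.

```python
from collections import defaultdict

# The relevant fields of the five sample documents from the question
docs = [
    {"Name": "Taylor", "grade": 9, "year": 2016, "marks": 54},
    {"Name": "Marsh", "grade": 9, "year": 2015, "marks": 70},
    {"Name": "Taylor", "grade": 3, "year": 2015, "marks": 87},
    {"Name": "Stokes", "grade": 9, "year": 2015, "marks": 91},
    {"Name": "Roshan", "grade": 9, "year": 2015, "marks": 85},
]
names = {"stokes", "roshan"}

def sums_by_year(documents, wanted_names, max_grade):
    """Filter by name and grade <= max_grade, then sum marks and grade per year."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sum_marks": 0, "sum_grade": 0})
    for doc in documents:
        if doc["Name"].lower() in wanted_names and doc["grade"] <= max_grade:
            bucket = buckets[doc["year"]]
            bucket["doc_count"] += 1
            bucket["sum_marks"] += doc["marks"]
            bucket["sum_grade"] += doc["grade"]
    return dict(buckets)

result = sums_by_year(docs, names, 9)
print(result)  # {2015: {'doc_count': 2, 'sum_marks': 176, 'sum_grade': 18}}
```

With a stricter bound such as sums_by_year(docs, names, 8), no sample document matches, because both "Stokes" and "Roshan" have grade 9; that is the situation a strict "less than 9" filter produces on this data.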
I am trying to retrieve data from a SearchResponse with the code below:
SearchHits searchHits = searchResponse.getHits();
for (SearchHit searchHit : searchHits) {
SearchHitField title = searchHit.field("title");
System.out.println(title.getValue().toString());
}
But I get a NullPointerException from title.getValue(). The "title" field is definitely there; I can verify that by printing the search response, which gives the following output:
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myIndex",
"_type" : "myTye",
"_id" : "5c849b0f-d72d-4cc9-9b8c-e1201f888f94",
"_score" : 2.4181843,
"_source":{"esId":"100200153", "title":"Book 1"}
} ]
}
}
I know that I can retrieve the data with searchHit.getSource() but I am wondering why the above solution isn't working as well.
I think you have to specify .fields(fields) in the request to be able to access the fields part.
For example, if you have a query like this:
{
"query": {
"match_all": {}
}
}
the hits section of the result contains the default entries (_id, _type, ..., _source).
But, if you have something like this:
{
"query": {
"match_all": {}
},
"fields": ["my_field"]
}
you get back a different result:
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_malformed",
"_type": "test",
"_id": "1",
"_score": 1,
"fields": {
"my_field": [
"whatever"
]
}
},
...
Notice that each hit now has a fields entry containing the field specified in the search request.
It looks like you are almost there. On each hit, instead of getting the title, get the _source object, then the title field from that source object.
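The difference between the two response shapes can be illustrated on the parsed JSON. This is a minimal Python sketch, not the Java client API; hit_value is a hypothetical helper that looks in "fields" first and falls back to "_source", and the sample hits are cut down from the responses shown above.

```python
def hit_value(hit, name):
    """Return a field from a search hit, preferring "fields" over "_source"."""
    fields = hit.get("fields") or {}
    if name in fields:
        values = fields[name]  # requested fields come back wrapped in a list
        return values[0] if values else None
    source = hit.get("_source") or {}
    return source.get(name)

# A hit from a plain query: only "_source" is present
hit_with_source = {
    "_id": "5c849b0f-d72d-4cc9-9b8c-e1201f888f94",
    "_source": {"esId": "100200153", "title": "Book 1"},
}
# A hit from a query that requested specific fields
hit_with_fields = {"_id": "1", "fields": {"my_field": ["whatever"]}}

print(hit_value(hit_with_source, "title"))
print(hit_value(hit_with_fields, "my_field"))
```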
I have an Elasticsearch index with a field "aDate" (and a lot of other fields) with the following mapping:
"aDate" : {
"type" : "date",
"format" : "date_optional_time"
}
When I query for a document, I get a result like:
"aDate" : 1421179734000,
I know this is the epoch value, the internal Java/Elasticsearch date format, but I want a result like:
"aDate" : "2015-01-13T20:08:54",
I played around with scripting:
{
"query":{
"match_all":{
}
},
"script_fields":{
"aDate":{
"script":"if (!_source.aDate?.equals('null')) new java.text.SimpleDateFormat('yyyy-MM-dd\\'T\\'HH:mm:ss').format(new java.util.Date(_source.aDate));"
}
}
}
but it gives strange results (the script basically works, but aDate is the only field returned and _source is missing). The result looks like this:
"hits": [{
"_index": "idx1",
"_type": "type2",
"_id": "8770",
"_score": 1.0,
"fields": {
"aDate": ["2015-01-12T17:15:47"]
}
},
I would prefer a solution without scripting if possible.
When you run a query in Elasticsearch you can request it to return the raw data, for example specifying fields:
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
"fields" : "aDate",
"query":{
"match_all":{
}
}
}'
This will give you the date in the format in which you originally stored it:
{
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlWNTAk1DYhbTcL2xO",
"_score" : 1.0,
"fields" : {
"aDate" : [ "2015-01-13T20:08:56" ]
}
}, {
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlQnFgk1DYhbTcL2xM",
"_score" : 1.0,
"fields" : {
"aDate" : [ 1421179734000 ]
}
It's not possible to change the date format unless you use a script.
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{
"query":{
"match_all":{ }
},
"script_fields":{
"aDate":{
"script":"use( groovy.time.TimeCategory ) { new Date( doc[\"aDate\"].value ) }"
}
}
}'
Will return:
{
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlWNTAk1DYhbTcL2xO",
"_score" : 1.0,
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
}, {
"_index" : "myindex",
"_type" : "date-test",
"_id" : "AUrlQnFgk1DYhbTcL2xM",
"_score" : 1.0,
"fields" : {
"aDate" : [ "2015-01-13T20:08:54.000Z" ]
}
}
To apply a format, append it as follows:
"script":"use( groovy.time.TimeCategory ){ new Date( doc[\"aDate\"].value ).format(\"yyyy-MM-dd\") }"
will return "aDate" : [ "2015-01-13" ]
To display the T, you'll need to use quotes but replace them with the Unicode equivalent:
"script":"use( groovy.time.TimeCategory ){ new Date( doc[\"aDate\"].value ).format(\"yyyy-MM-dd\u0027T\u0027HH:mm:ss\") }"
returns "aDate" : [ "2015-01-13T20:08:54" ]
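If formatting on the client side is acceptable, the epoch-millisecond value can also be converted after the response arrives, without any script in the query. The following is a minimal Python sketch of that conversion; it assumes the value is milliseconds since the Unix epoch in UTC, and the function name is only illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis_to_iso(millis):
    """Format epoch milliseconds as yyyy-MM-dd'T'HH:mm:ss in UTC."""
    # Adding a timedelta keeps the arithmetic exact (no float rounding).
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")

print(epoch_millis_to_iso(1421179734000))  # 2015-01-13T20:08:54
```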
To return script_fields and source
Use _source in your query to specify the fields you want to return:
curl -XGET http://localhost:9200/myindex/date-test/_search?pretty -d '
{ "_source" : "name",
"query":{
"match_all":{ }
},
"script_fields":{
"aDate":{
"script":"use( groovy.time.TimeCategory ) { new Date( doc[\"aDate\"].value ) }"
}
}
}'
Will return my name field:
"_source":{"name":"Terry"},
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
Using an asterisk will return all fields, e.g. "_source" : "*":
"_source":{"name":"Terry","aDate":1421179736000},
"fields" : {
"aDate" : [ "2015-01-13T20:08:56.000Z" ]
}
Since 5.0.0, Elasticsearch uses Painless as its scripting language.
Try this (works in 6.3.2):
"script":"doc['aDate'].value.toString('yyyy-MM-dd HH:mm:ss')"
As LabOctoCat mentioned, Olly Cruickshank's answer no longer works in Elasticsearch 2.2. I changed the script to:
"script":"new Date(doc['time'].value)"
You can format the date according to this.
Scripting only computes the value when the document is retrieved. This is expensive, and it keeps you from using any date-related search functions in Elasticsearch.
You should map the field as an Elasticsearch "date" field before inserting the data. It looks like a Java Date object will do.
Thanks @Archon for your suggestion. I used your answer as a guide to remove the time element from a datetime field in Elasticsearch:
{
"aggs": {
"grp_by_date": {
"terms": {
"size": 200,
"script": "doc['TransactionReconciliationsCreated'].value.toString('yyyy-MM-dd')"
}
}
}
}
If you use Elasticsearch 7 and want to display a datetime in a specific time zone, you can request it like this:
"query": {
"bool": {
"filter": [
{
"term": {
"client": {
"value": "iOS",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"script_fields": {
"time": {
"script": "ZonedDateTime input = doc['time'].value; input = input.withZoneSameInstant(ZoneId.of('Asia/Shanghai')); String output = input.format(DateTimeFormatter.ISO_ZONED_DATE_TIME); return output"
}
},
"_source": true,
This returns:
{
...
"_source" : {
...
"time" : 1632903354213
...
},
"fields" : {
"time" : [
"2021-09-29T16:15:54.213+08:00[Asia/Shanghai]"
]
}
},
...
}
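The time-zone shift in that result can be checked on the client as well. This Python sketch converts the "_source" value 1632903354213 to the same local time; it uses a fixed UTC+8 offset as a stand-in for Asia/Shanghai (an assumption that avoids needing a time-zone database and holds because that zone currently has no daylight saving time), so the zone name in brackets is not reproduced.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SHANGHAI = timezone(timedelta(hours=8))  # fixed offset standing in for Asia/Shanghai

def epoch_millis_in_zone(millis, zone):
    """Format epoch milliseconds as an ISO-8601 timestamp in the given zone."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.astimezone(zone).isoformat(timespec="milliseconds")

print(epoch_millis_in_zone(1632903354213, SHANGHAI))  # 2021-09-29T16:15:54.213+08:00
```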