Hi, I am trying to aggregate a field in my index using the MAX function. The issue is that when I aggregate with GROUP BY, the query fails whenever that value is absent from a time frame.
POST /_opendistro/_sql
{
"query": "SELECT date(timeStamp) time_unit, MAX(testField) test_field_alias FROM my_index where orgId = 'xyz' and timeStamp <= '2020-07-04T23:59:59' and timeStamp >= '2020-07-01T00:00' group by date(timeStamp) order by time_unit desc"
}
My index mapping is as given below
"mappings" : {
"properties" : {
"orgId" : {
"type" : "text"
},
"testField" : {
"type" : "long",
"null_value" : 0
},
"timeStamp": {
"type":"date"
}
}
}
When I try the above query I get
{
"error": {
"root_cause": [
{
"type": "j_s_o_n_exception",
"reason": "JSON does not allow non-finite numbers."
}
],
"type": "j_s_o_n_exception",
"reason": "JSON does not allow non-finite numbers."
},
"status": 500
}
I have understood that this happens for time frames where the documents in my index don't contain the above-mentioned field, so I added "null_value": 0 to my mapping, but to no avail.
The thing is, I want to aggregate with the MAX function grouped by time scales. If there is another approach that works, that's enough for me; it doesn't have to be in SQL form.
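For illustration, a native query DSL sketch of the same aggregation (index, field names and time range taken from the question; the syntax assumes Elasticsearch 7.x, and `match` is used for `orgId` since it is mapped as `text`). A `date_histogram` with a `max` sub-aggregation reports `null` for empty buckets instead of failing:

```json
POST /my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "match": { "orgId": "xyz" } },
        { "range": { "timeStamp": { "gte": "2020-07-01T00:00", "lte": "2020-07-04T23:59:59" } } }
      ]
    }
  },
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "timeStamp", "calendar_interval": "day", "order": { "_key": "desc" } },
      "aggs": {
        "test_field_alias": { "max": { "field": "testField" } }
      }
    }
  }
}
```

On older Elasticsearch versions, "calendar_interval" may need to be "interval".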
I am trying out DynamoDB locally and have the following table:
"Table": {
"AttributeDefinitions": [
{
"AttributeName": "hashKey",
"AttributeType": "S"
},
{
"AttributeName": "sortKey",
"AttributeType": "S"
},
{
"AttributeName": "full_json",
"AttributeType": "S"
}
],
"TableName": "local",
"KeySchema": [
{
"AttributeName": "hashKey",
"KeyType": "HASH"
},
{
"AttributeName": "sortKey",
"KeyType": "RANGE"
}
],
"TableStatus": "ACTIVE",
"CreationDateTime": "2021-10-01T15:18:04.413000+02:00",
"ProvisionedThroughput": {
"LastIncreaseDateTime": "1970-01-01T01:00:00+01:00",
"LastDecreaseDateTime": "1970-01-01T01:00:00+01:00",
"NumberOfDecreasesToday": 0,
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 1
},
"TableSizeBytes": 1066813,
"ItemCount": 23,
"TableArn": "arn:aws:dynamodb:ddblocal:000000000000:table/local",
"GlobalSecondaryIndexes": [
{
"IndexName": "sortKeyIndex",
"KeySchema": [
{
"AttributeName": "sortKey",
"KeyType": "HASH"
}
],
"Projection": {
"ProjectionType": "ALL"
},
"IndexStatus": "ACTIVE",
"ProvisionedThroughput": {
"ReadCapacityUnits": 10,
"WriteCapacityUnits": 1
},
"IndexSizeBytes": 1066813,
"ItemCount": 23,
"IndexArn": "arn:aws:dynamodb:ddblocal:000000000000:table/local/index/sortKeyIndex"
}
]
}
I want to query it with Java like this:
Index index = table.getIndex("sortKeyIndex");
ItemCollection<QueryOutcome> items2 = null;
QuerySpec querySpec = new QuerySpec();
querySpec.withKeyConditionExpression("sortKey > :end_date")
.withValueMap(new ValueMap().withString(":end_date","2021-06-30T07:49:22.000Z"));
items2 = index.query(querySpec);
But it throws an exception with "Query Key Condition not supported". I don't understand this, because in the docs the ">" operator is described as a regular operation. Can anybody help me?
DDB Query() requires a key condition that includes an equality check on the hash/partition key.
You must provide the name of the partition key attribute and a single
value for that attribute. Query returns all items with that partition
key value. Optionally, you can provide a sort key attribute and use a
comparison operator to refine the search results.
In other words, the only time you can really use Query() is when you have a composite primary key (hash + sort).
Without a sort key specified as part of the key for the table/GSI, Query() acts just like GetItem() returning a single record with the given hash key.
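To make that concrete, here is a hedged Java sketch (AWS SDK v1 document model, using the question's attribute names; the table handle and key value are assumptions). On the base table the partition key must be pinned with equality before the sort key can take a range condition; on the sortKeyIndex GSI above, sortKey is the index's HASH key, so only equality on it is allowed there.

```java
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

public class QuerySketch {
    // Query on the base table: mandatory equality on hashKey, range on sortKey.
    static ItemCollection<QueryOutcome> itemsAfter(Table table, String hash, String endDate) {
        QuerySpec spec = new QuerySpec()
                .withKeyConditionExpression("hashKey = :h and sortKey > :end_date")
                .withValueMap(new ValueMap()
                        .withString(":h", hash)
                        .withString(":end_date", endDate));
        return table.query(spec);
    }
}
```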
I am sending a query to Elasticsearch to find all segments that have a field matching the query.
We are implementing a "free search": the user can write any text he wants, and we build a query that searches for this text through all the segment fields.
Each segment for which one (or more) of its fields contains this text should be returned.
For example:
I would like to get all the segments with the name "tony lopez".
Each segment has a field of "first_name" and a field of "last_name".
The query our service builds:
"multi_match" : {
"query": "tony lopez",
"type": "best_fields"
"fields": [],
"operator": "OR"
}
The result from Elasticsearch using this query includes a segment whose "first_name" field is "tony" and "last_name" field is "lopez", but also a segment whose "first_name" is "joe" and "last_name" is "tony".
For this type of query, I would like to receive only the segments whose name is "tony (first_name) lopez (last_name)".
How can I fix that issue?
Hope I'm not jumping to conclusions too soon, but if you want to match only tony and lopez as first name and last name, use this:
GET my_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"first": "tony"
}
},
{
"match": {
"last": "lopez"
}
}
]
}
}
}
But if one of your indexed documents contains, for example, tony s as the first name, the query above will return it too.
Why? Because firstname is a text datatype:
A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed.
More Details
If you run this query via Kibana:
POST my_index/_analyze
{
"field": "first",
"text": ["tony s"]
}
You will see that tony s is analyzed as two tokens, tony and s: the string is passed through an analyzer and converted into a list of individual terms (tony as a term and s as a term).
That is why the above query returns tony s in the results: it matches tony.
If you want to match only tony and lopez exactly, then you should use this query:
GET my_index/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"first.keyword": {
"value": "tony"
}
}
},
{
"term": {
"last.keyword": {
"value": "lopez"
}
}
}
]
}
}
}
Read about keyword datatype
UPDATE
Try this query. It is not perfect (it has the same issue as my tony s example), and if you have a document with first name lopez and last name tony, it will find that too.
GET my_index/_search
{
"query": {
"multi_match": {
"query": "tony lopez",
"fields": [],
"type": "cross_fields",
"operator":"AND",
"analyzer": "standard"
}
}
}
The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for “Will Smith”, the best match is likely to have “Will” in one field and “Smith” in the other
cross fields
Hope it helps
I wrote a query in MongoDB as follows:
db.getCollection('student').aggregate(
[
{
$match: { "student_age" : { "$ne" : 15 } }
},
{
$group:
{
_id: "$student_name",
count: {$sum: 1},
sum1: {$sum: "$student_age"}
}
}
])
In other words, I want to fetch the count of students who aren't 15 years old and the sum of their ages. The query works fine and I get two data items.
In my application, I want to do the query by Spring Data.
I wrote the following code:
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group().sum("student_age").as("totalAge"),
count().as("countOfStudentNot15YearsOld"));
When this code is run, the output query will be:
"aggregate" : "MyDocument", "pipeline" :
[ { "$match" { "AGE" : { "$ne" : 15 } } },
{ "$group" : { "_id" : null, "totalAge" : { "$sum" : "$student_age" } } },
{ "$count" : "countOfStudentNot15YearsOld" }],
"cursor" : { "batchSize" : 2147483647 }
Unfortunately, the result contains only the countOfStudentNot15YearsOld item.
I want to fetch the same result as my native query.
If you're asking to return the grouping for both "15" and "not 15" as a result, then you're looking for the $cond operator, which allows "branching" based on conditional evaluation.
From the shell you would use it like this:
db.getCollection('student').aggregate([
{ "$group": {
"_id": null,
"countFiteen": {
"$sum": {
"$cond": [{ "$eq": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"countNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"sumNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, "$student_age", 0 ]
}
}
}}
])
So you use $cond to perform a logical test, in this case whether the "student_age" in the document currently being considered is 15 or not. You can then return a numerical value in response: 1 here for "counting", or the actual field value when that is what you want to send to the accumulator instead. In short, it's a "ternary" operator, an if/then/else condition (which can in fact be written in a more expressive form with keys) that you use to test a condition and decide what to return.
For the spring mongodb implementation you use ConditionalOperators.Cond to construct the same BSON expressions:
import org.springframework.data.mongodb.core.aggregation.*;
ConditionalOperators.Cond isFifteen = ConditionalOperators.when(new Criteria("student_age").is(15))
.then(1).otherwise(0);
ConditionalOperators.Cond notFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.then(1).otherwise(0);
ConditionalOperators.Cond sumNotFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.thenValueOf("student_age").otherwise(0);
GroupOperation groupStage = Aggregation.group()
.sum(isFifteen).as("countFifteen")
.sum(notFifteen).as("countNotFifteen")
.sum(sumNotFifteen).as("sumNotFifteen");
Aggregation aggregation = Aggregation.newAggregation(groupStage);
So basically you just extend that logic, using .then() for a "constant" value such as 1 for the "counts", and .thenValueOf() where you actually need the "value" of a field from the document, equivalent to the "$student_age" shown in the shell notation.
Since ConditionalOperators.Cond shares the AggregationExpression interface, it can be used with the form of .sum() that accepts an AggregationExpression rather than a string. This is an improvement over past releases of Spring Data MongoDB, which required a $project stage so that actual document properties existed for the evaluated expression before performing the $group.
If all you want is to replicate the original query for spring mongodb, then your mistake was using the $count aggregation stage rather than appending to the group():
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group()
.sum("student_age").as("totalAge")
.count().as("countOfStudentNot15YearsOld")
);
I am testing a mapping for URLs in Elasticsearch.
I want to be able to find an entry both by domain name with TLD (e.g. example.com)
and without TLD (e.g. example), and for the full domain document to be returned
(like http://example.com, www.example.com and similar).
I PUT this mapping to ES - in Sense:
PUT /en_docs
{
"mappings": {
"url": {
"properties": {
"content": {
"type": "string",
"analyzer" : "urlzer"
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"urlzer": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [ "stopwords_filter" ]
}
},
"filter" : {
"stopwords_filter" : {
"type" : "stop",
"stopwords" : ["http", "https", "ftp", "www"]
}
}
}
}
}
Now, when I index a URL document, e.g.
POST /en_docs/url
{
"content": "http://example.com"
}
I can find it by searching for example.com, but just example doesn't return anything.
The lowercase tokenizer used in my analyzer, as the docs say and as direct testing of the analyzer shows, produces the tokens example and com, but when I search the indexed document, example returns nothing:
GET /en_docs/url/_search?q=example
gets no results, but if the query is example.com, result is returned.
What am I missing?
I am working with Elasticsearch v1.1.1.
I faced a problem with search queries, and I want to know how to solve the obstacle below.
Here is my mapping
{
"token" : {
"type" : "string"
}
}
The data in the indexed record is:
{
"token": "4r5etgg-kogignjj-jdjuty687-ofijfjfhf-kdjudyhd"
}
My search is
4r5etgg-kogignjj-jdjuty687-ofijfjfhf-kdjudyhd
I want an exact match of the record. Which query do I need to use to get an exact match, and can it be done with
QueryBuilders.queryString()?
I checked queryString() and concluded it is not useful for exact matching.
Please suggest an approach.
You can put quotes around the string to do an exact match:
QueryBuilders.queryString("\"4r5etgg-kogignjj-jdjuty687-ofijfjfhf-kdjudyhd\"");
If you don't want partial matches on the above string, index an untokenized version of the value and search on that. In your mapping add:
"token": {
"type": "multi_field",
"fields": {
"untouched": {
"type": "string",
"index": "not_analyzed"
}
}
}
Then search:
{
"query": {
"match": {
"token.untouched": "4r5etgg-kogignjj-jdjuty687-ofijfjfhf-kdjudyhd"
}
}
}
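If the search needs to stay in Java (as in the question), a hedged sketch with the ES 1.x transport client might look like the following; the Client instance and index name are assumptions, and the field name comes from the multi_field mapping above:

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class ExactTokenSearch {
    // Term query against the untokenized sub-field defined in the mapping above.
    static SearchResponse findToken(Client client, String token) {
        return client.prepareSearch("my_index")
                .setQuery(QueryBuilders.termQuery("token.untouched", token))
                .execute().actionGet();
    }
}
```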
Change the mapping so Elasticsearch doesn't touch your data while indexing, like so:
{
"token" : {
"type" : "string",
"index": "not_analyzed"
}
}
And then run a TermQuery from Java like this:
QueryBuilders.termQuery("token", "4r5etgg-kogignjj-jdjuty687-ofijfjfhf-kdjudyhd");
That should give you your exact match.