Elasticsearch java client concatenating terms - java

I am building an Elasticsearch TermsFilterBuilder using the Java client so that it has an array of 3 elements with the values [3, 4, 5]. I can see in the debugger that the "values" property is this array of 3 elements.
However, when the query is sent off to elasticsearch it concatenates all of the values like this:
{
  "terms" : {
    "offer.accommodation.rating" : [ 345 ]
  }
}
This is causing my query to fail. Why is it doing this?
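For reference (an addition, not part of the original question), this is roughly how such a filter is typically constructed with the pre-2.x Java API; FilterBuilders and TermsFilterBuilder are assumed to be available in the client version in use, and the field name is taken from the question:

import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.TermsFilterBuilder;

public class RatingFilterSketch {
    public static void main(String[] args) {
        // Passing the ratings as separate int values; the serialized filter
        // should then contain [3, 4, 5] rather than a single concatenated 345.
        TermsFilterBuilder ratingFilter =
                FilterBuilders.termsFilter("offer.accommodation.rating", 3, 4, 5);

        // Print the builder for a quick look (the output format depends on the
        // client version); comparing this with what is actually sent over the
        // wire is a reasonable first check.
        System.out.println(ratingFilter);
    }
}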

Related

List as key for a key value store

I want to store key-value pairs in a database where the key is a list of Integers or a set of Integers.
My use case has the following steps:
I will get a list of integers.
I will need to check if that list of integers (as a key) is already present in the DB.
If it is present, I will pick up the value from the DB.
There are certain computations that I need to do if the list of integers (or set of integers) is not already in the DB; if it is there, I just want to return the value and avoid the computations.
I am thinking of keeping the data in a key-value store, but I want the key to be specifically a list or set of integers.
I have thought about the options below.
Option A
Generate a unique hash for the list of integers and store that as the key in the key/value store.
Problem:
I may have hash collisions, which would break my use case. I believe there is no way to generate a hash that is unique 100% of the time.
This will not work.
If there is a way to generate a hash that is unique 100% of the time, then that is the best approach.
Option B
Create an immutable class wrapping a List of Integers or a Set of Integers and store that as the key for my key-value store.
Please share any feasible ways to achieve this.
You don’t need to do anything special:
Map<List<Integer>, String> keyValueStore = new HashMap<>();
List<Integer> key = Arrays.asList(1, 2, 3);
keyValueStore.put(key, "foo");
All JDK collections implement sensible equals() and hashCode() methods that are based solely on the contents of the collection.
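One caveat worth adding (not from the original answer): because the map key's hashCode() is computed from the list's contents, mutating the key list after insertion makes the entry unreachable. A minimal sketch that guards against this with an unmodifiable copy (List.copyOf requires Java 10+):

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListKeyExample {
    public static void main(String[] args) {
        Map<List<Integer>, String> keyValueStore = new HashMap<>();

        // Unmodifiable copy: the key cannot be mutated after it is stored,
        // so its hashCode() stays stable.
        List<Integer> key = List.copyOf(Arrays.asList(1, 2, 3));
        keyValueStore.put(key, "foo");

        // Lookup succeeds for any list with equal elements in the same order.
        System.out.println(keyValueStore.get(Arrays.asList(1, 2, 3))); // foo
        System.out.println(keyValueStore.get(Arrays.asList(3, 2, 1))); // null (order matters)
    }
}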
Thank you. I would like to share some more findings.
I have now tried the following, further to what I mentioned in my earlier post.
I added the documents below to MongoDB:
db.products.insertMany([
  {
    mapping: [1, 2, 3],
    hashKey: 'ABC123',
    date: Date()
  },
  {
    mapping: [4, 5],
    hashKey: 'ABC45',
    date: Date()
  },
  {
    mapping: [6, 7, 8],
    hashKey: 'ABC678',
    date: Date()
  },
  {
    mapping: [9, 10, 11],
    hashKey: 'ABC91011',
    date: Date()
  },
  {
    mapping: [1, 9, 10],
    hashKey: 'ABC1910',
    date: Date()
  },
  {
    mapping: [1, 3, 4],
    hashKey: 'ABC134',
    date: Date()
  },
  {
    mapping: [4, 5, 6],
    hashKey: 'ABC456',
    date: Date()
  }
]);
When I now try to find the mapping, I get the expected results:
> db.products.find({ mapping: [4, 5] }).pretty();
{
    "_id" : ObjectId("5d4640281be52eaf11b25dfc"),
    "mapping" : [
        4,
        5
    ],
    "hashKey" : "ABC45",
    "date" : "Sat Aug 03 2019 19:17:12 GMT-0700 (PDT)"
}
The above gives the right result, as the mapping [4,5] (insertion order retained) is present in the DB.
> db.products.find({ mapping: [5,4]}).pretty();
The above gives no result, as expected, since the mapping [5,4] is not present in the DB (insertion order is retained).
So it seems the "mapping" as List is working as expected.
I used Spring Data to read from a MongoDB instance running locally.
The format of the document is:
{
    "_id" : 1,
    "hashKey" : "ABC123",
    "mapping" : [
        1,
        2,
        3
    ],
    "_class" : "com.spring.mongodb.document.Mappings"
}
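For context, a minimal Spring Data mapping for a document in this format might look roughly like the sketch below; the class name follows the "_class" value above, while the repository interface and its derived query method are assumptions for illustration:

import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;

// Maps to the "mappings" collection; field names mirror the document shown above.
@Document(collection = "mappings")
public class Mappings {

    @Id
    private Integer id;

    private String hashKey;

    private List<Integer> mapping;

    // getters and setters omitted for brevity
}

// A derived-query repository method, intended to issue the equivalent of
// db.mappings.find({ mapping: [1, 2, 3] }) shown below.
interface MappingsRepository extends MongoRepository<Mappings, Integer> {
    Mappings findByMapping(List<Integer> mapping);
}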
I inserted 1.7 million records into the DB using org.springframework.boot.CommandLineRunner.
Then a query similar to my last example:
db.mappings.find({ mapping: [1, 2, 3] })
takes 1.05 seconds on average to find the mapping among the 1.7M records.
Please share any suggestions for making it faster, and how fast I can expect it to run.
I am not sure about create, update and delete performance yet.

how to extract json path and find array length?

How do I extract a JSON path and find the array length using Java, for my response data below? I need to assert in JMeter that the array length is equal to 7.
[
  ["Week", "Event Count"],
  ["3/13/17", "1"],
  ["3/20/17", "1"],
  ["3/27/17", "1"],
  ["4/3/17", "1"],
  ["4/10/17", "1"],
  ["4/17/17", "1"]
]
Add JSON Extractor as a child of the request which produces the above JSON response and configure it as follows:
Variable names: anything meaningful, e.g. week
JSON Path Expressions: $[*]
Match No: -1
This will produce the following JMeter Variables (you can validate them using Debug Sampler):
week_1=["Week","Event Count"]
week_2=["3\/13\/17","1"]
week_3=["3\/20\/17","1"]
week_4=["3\/27\/17","1"]
week_5=["4\/3\/17","1"]
week_6=["4\/10\/17","1"]
week_7=["4\/17\/17","1"]
week_matchNr=7
You are particularly interested in the last one, week_matchNr.
Add Response Assertion as a child of the same request and configure it as follows:
Apply to: JMeter Variable -> week_matchNr
Pattern Matching Rules: Equals
Patterns to Test: 7
This way your sampler will pass if the number of matches is equal to 7, and fail otherwise. See the How to Use JMeter Assertions in Three Easy Steps article to learn more about using assertions in JMeter tests.
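If you would rather do the same check in Java code (for example from a JSR223 element or a standalone test), a minimal sketch using Jackson could look like this, assuming Jackson is available on the classpath; class and variable names are illustrative:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ArrayLengthCheck {
    public static void main(String[] args) throws Exception {
        // The response body from the sampler (inlined here for illustration).
        String json = "[[\"Week\",\"Event Count\"],"
                + "[\"3/13/17\",\"1\"],[\"3/20/17\",\"1\"],[\"3/27/17\",\"1\"],"
                + "[\"4/3/17\",\"1\"],[\"4/10/17\",\"1\"],[\"4/17/17\",\"1\"]]";

        // Parse the top-level JSON array and count its elements.
        JsonNode root = new ObjectMapper().readTree(json);
        int length = root.size();

        // Fail the check if the length is not the expected 7.
        if (length != 7) {
            throw new AssertionError("Expected 7 elements but found " + length);
        }
        System.out.println("Array length is " + length);
    }
}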

Not able to Query alphanumeric fields from ELASTIC SEARCH using TERMS QUERY

I am trying to query alphanumeric values from the index using a terms query, but it is not giving me any output.
Query:
{
  "size" : 10000,
  "query" : {
    "bool" : {
      "must" : {
        "terms" : {
          "caid" : [ "A100945", "A100896" ]
        }
      }
    }
  },
  "fields" : [ "acco", "bOS", "aid", "TTl", "caid" ]
}
I want to get all the entries that have caid A100945 or A100896.
The same query works fine for numeric fields.
I am not planning to use QueryString/MatchQuery, as I am trying to build a general query builder that can build queries for all requests. Hence I am looking to get the entries using a terms query only.
Note: I am using the Java API org.elasticsearch.index.query.QueryBuilders for building the query,
e.g. QueryBuilders.termsQuery("caid", "A10xxx", "A101xxx")
Please help.
Regards,
Mik
If you have not customized the mappings/analysis for the caid-field, then your values are indexed as e.g. a100945, a100896 (note the lowercasing.)
The terms-query does not do query-time text-analysis, so you'll be searching for A100945 which does not match a100945.
This is quite a common problem, and is explained a bit more in this article on Troubleshooting Elasticsearch searches, for Beginners.
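If changing the mapping is not an option, a workaround consistent with that explanation is to lowercase the terms on the client side before building the query. A minimal sketch, assuming the default analyzer lowercased the indexed values:

import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermsQueryBuilder;

public class CaidTermsQuerySketch {
    public static void main(String[] args) {
        // terms queries are not analyzed, so the values must match the indexed
        // tokens exactly; with the default analyzer that means lowercasing them.
        TermsQueryBuilder caidQuery =
                QueryBuilders.termsQuery("caid", "a100945", "a100896");
        System.out.println(caidQuery);
    }
}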
Alternatively, you can use a match query. Match queries are analyzed (the default analyzer is applied to the query text), for example:
QueryBuilders.matchQuery("caid", "A10xxx A101xxx");

Mongo and Java: Create indexes for aggregation framework

Situation: I have a collection with a huge number of documents produced by a map-reduce (aggregation). Documents in the collection look like this:
/* 0 */
{
    "_id" : {
        "appId" : ObjectId("1"),
        "timestamp" : ISODate("2014-04-12T00:00:00.000Z"),
        "name" : "GameApp",
        "user" : "test#mail.com",
        "type" : "game"
    },
    "value" : {
        "count" : 2
    }
}
/* 1 */
{
    "_id" : {
        "appId" : ObjectId("2"),
        "timestamp" : ISODate("2014-04-29T00:00:00.000Z"),
        "name" : "ScannerApp",
        "user" : "newUser#company.com",
        "type" : "game"
    },
    "value" : {
        "count" : 5
    }
}
...
And I search inside this collection with the aggregation framework:
db.myCollection.aggregate([match, project, group, sort, skip, limit]); // aggregation can return result on Daily or Monthly time base depends of user search criteria, with pagination etc...
Possible search criteria:
1. {appId, timestamp, name, user, type}
2. {appId, timestamp}
3. {name, user}
I'm getting the correct result, exactly what I need. But from an optimisation point of view I have doubts about indexing.
Questions:
Is it possible to create indexes for such a collection?
How can I create indexes for such an object with a complex _id field?
How can I do the analog of db.collection.find().explain() to verify which index is used?
And is it a good idea to index such a collection, or is it just my performance paranoia?
Answer summarisation:
MongoDB creates an index on the _id field automatically, but that is useless in the case of a complex _id field like in the example. For a field like _id: {name: "", timestamp: ""} you must use an index like *.ensureIndex({"_id.name": 1, "_id.timestamp": 1}); only after that will your collection be properly indexed by the _id field.
To track how your indexes work with Mongo aggregation you cannot use db.myCollection.aggregate().explain(); the proper way of doing that is:
db.runCommand({
    aggregate: "collection_name",
    pipeline: [match, proj, group, sort, skip, limit],
    explain: true
})
My testing on a local computer shows that such indexing seems to be a good idea, but this requires more testing with big collections.
First, indexes 1 and 3 are probably worth investigating. As for explain, you can pass explain as an option to your pipeline. You can find the docs here and an example here.
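For reference, a sketch of how the indexes discussed above could be created from the MongoDB Java driver; the connection string, database name and collection name are placeholders, and whether these exact indexes help should be verified with explain:

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class CreateAggregationIndexes {
    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("test")
                .getCollection("myCollection");

        // The compound index from the answer summary above (subfields of the complex _id).
        collection.createIndex(new Document("_id.name", 1).append("_id.timestamp", 1));

        // An index matching search criteria 3 (name + user).
        collection.createIndex(new Document("_id.name", 1).append("_id.user", 1));
    }
}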

MongoDB data model to support unique visitors, per event, per date range

I've got multiple websites, where each website has visitors that "trigger" multiple events I want to track. I have a log of those events from all websites; each event record contains the website-id, the event-name and the user-id that triggered the event (for the sake of simplicity, let's say that's it).
The requirements:
Be able to get, per website-id and event-name, how many unique visitors got it.
This should also support a date range (distinct unique visitors over the range).
I was thinking of creating a collection per website-id with the following data model (as an example):
collection ev_{websiteId}:
[
  {
    _id: "error",
    dailyStats: [
      {
        _id: 20121005,   <-- (yyyyMMdd int, should be indexed!)
        hits: 5,
        users: [
          { _id: 1, hits: 1 },   <-- (users._id should be indexed!)
          { _id: 2, hits: 3 },
          { _id: 3, hits: 1 }
        ]
      },
      {
        _id: 20121004,
        hits: 8,
        users: [
          { _id: 1, hits: 2 },
          { _id: 2, hits: 3 },
          { _id: 3, hits: 3 }
        ]
      }
    ]
  },
  {
    _id: "pageViews",
    dailyStats: [
      {
        _id: 20121005,
        hits: 500,
        users: [
          { _id: 1, hits: 100 },
          { _id: 2, hits: 300 },
          { _id: 3, hits: 100 }
        ]
      },
      {
        _id: 20121004,
        hits: 800,
        users: [
          { _id: 1, hits: 200 },
          { _id: 2, hits: 300 },
          { _id: 3, hits: 300 }
        ]
      }
    ]
  }
]
I'm using the _id to hold the event-id.
I'm using dailyStats._id to hold when it happened (an integer in yyyyMMdd format).
I'm using dailyStats.users._id to represent a user's unique-id hash.
In order to get the unique users, I should basically be able to run a (map-reduce?) distinct count of the number of items in the array(s) for the given date range (I will convert the date range to yyyyMMdd).
My questions:
Does this data model make sense to you? I'm concerned about the scalability of this model over time (if some client has a lot of daily unique visitors, it may cause a huge document).
I was thinking of deleting dailyStats documents by _id < [date as yyyyMMdd]. This way I can keep my document sizes at a sane number, but still, there are limits here.
Is there an easy way to run an "upsert" that will create the dailyStats entry if it does not already exist, add the user if not already present, and increment the "hits" property for both?
What about map-reduce? How would you approach it (I need to run a distinct on users._id for all subdocuments in the given date range)? Is there an easier way with the new aggregation framework?
BTW, another option for counting unique visitors is using Redis bitmaps, but I am not sure it's worth maintaining multiple data stores (maintenance-wise).
A few comments on the architecture above.
I'm slightly worried, as you've pointed out, about the scalability and how much pre-aggregation you're really doing.
Most of the Mongo instances I've worked with for metrics have cases similar to what you describe, but you seem to be relying heavily on updating a single document and upserting various parts of it, which is going to slow things down and potentially cause a bit of locking.
I might suggest a different path, one that Mongo even suggests when talking with them about doing metrics. Seeing as you already have a structure in mind, I'd create something along the lines of:
{
    "_id": "20121005_siteKey_page",
    "hits": 512,
    "users": [
        {
            "uid": 5,
            "hits": 512
        }
    ]
}
This way you are limiting your document sizes to something that is going to be reasonable to do quick upserts on. From here you can do mapreduce jobs in batches to further extend out what you're looking to see.
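For the upsert part, a minimal sketch with the MongoDB Java driver against a document shaped like the example above; the database and collection names are placeholders, and per-user counters are deliberately left out:

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class UpsertDailyHits {
    public static void main(String[] args) {
        MongoCollection<Document> events = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("metrics")
                .getCollection("dailyEvents");

        // Upsert the per-day/per-site/per-event document and bump its hit counter.
        // If the document does not exist yet, it is created with hits = 1.
        events.updateOne(
                Filters.eq("_id", "20121005_siteKey_page"),
                Updates.inc("hits", 1),
                new UpdateOptions().upsert(true));

        // Per-user counters inside the "users" array would need a positional
        // update (and a separate insert when the user is not present yet),
        // which is intentionally left out of this sketch.
    }
}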
It also depends on your end goal: are you looking to provide realtime metrics? What sort of granularity are you attempting to get? Redis bitmaps may be something you want to at least look at: great article here.
Regardless it is a fun problem to solve :)
Hope this has helped!
