I am using Spring Boot with MongoDB in our project.
I have two collections. For simplicity, let's call them Foo and Bar.
Foo has Bar as a DBRef inside its documents.
I have a query on Foo where I am querying it on bar.$id:
{ "bar.$id" : ObjectId("61405393108f544d3b2082ed")}
I have indexed the Foo collection on bar.$id.
There are 14K documents in the Foo collection right now and 5 Bar documents.
However, when I call this query from a Mongo repository, it takes more than 1 minute to execute.
When I analyzed the issue using explain(), I saw that on the MongoDB side the query executes in under 100 ms. Result of explain():
{ queryPlanner:
{ plannerVersion: 1,
namespace: 'development.foo',
indexFilterSet: false,
parsedQuery: { 'bar.$id': { '$eq': ObjectId("61405393108f544d3b2082ed") } },
winningPlan:
{ stage: 'FETCH',
inputStage:
{ stage: 'IXSCAN',
keyPattern: { 'bar.$id': 1 },
indexName: 'index1',
isMultiKey: false,
multiKeyPaths: { 'bar.$id': [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { 'bar.$id': [ '[ObjectId(\'61405393108f544d3b2082ed\'), ObjectId(\'61405393108f544d3b2082ed\')]' ] } } },
rejectedPlans: [] },
executionStats:
{ executionSuccess: true,
nReturned: 7011,
executionTimeMillis: 6,
totalKeysExamined: 7011,
totalDocsExamined: 7011,
executionStages:
{ stage: 'FETCH',
nReturned: 7011,
executionTimeMillisEstimate: 1,
works: 7012,
advanced: 7011,
needTime: 0,
needYield: 0,
saveState: 7,
restoreState: 7,
isEOF: 1,
docsExamined: 7011,
alreadyHasObj: 0,
inputStage:
{ stage: 'IXSCAN',
nReturned: 7011,
executionTimeMillisEstimate: 0,
works: 7012,
advanced: 7011,
needTime: 0,
needYield: 0,
saveState: 7,
restoreState: 7,
isEOF: 1,
keyPattern: { 'bar.$id': 1 },
indexName: 'index1',
isMultiKey: false,
multiKeyPaths: { 'bar.$id': [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { 'bar.$id': [ '[ObjectId(\'61405393108f544d3b2082ed\'), ObjectId(\'61405393108f544d3b2082ed\')]' ] },
keysExamined: 7011,
seeks: 1,
dupsTested: 0,
dupsDropped: 0 } } },
serverInfo:
{ host: 'XXXX.0.mongodb.net',
port: 27017,
version: '4.4.9',
gitVersion: 'XXXX' },
ok: 1,
'$clusterTime':
{ clusterTime: Timestamp({ t: 1632909970, i: 19 }),
signature:
{ hash: Binary(Buffer.from("XXXX", "hex"), 0),
keyId: XXXX } },
operationTime: Timestamp({ t: 1632909970, i: 19 }) }
Spring Boot code:
@Query(value = "{ 'bar.$id' : ObjectId(?0) }")
List<Foo> findByBarId(String barId);
Questions:
Why is Spring Boot taking so long to process the query results?
Is there any solution to optimize this?
For all those who are facing a similar issue:
I resolved the issue by turning on lazy loading in the DBRef.
@DBRef(lazy = true)
private Bar bar;
I was retrieving the list of 14K documents, each with one DBRef object, and without lazy = true all those referenced objects were being resolved eagerly, which was causing the delay.
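For context, a minimal sketch of what the mapped entity might look like with the lazy reference (the class and field names simply mirror the simplified Foo/Bar naming above; they are assumptions, not the actual project code):

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.DBRef;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "foo")
public class Foo {

    @Id
    private String id;

    // Without lazy = true, Spring Data resolves the referenced Bar eagerly for
    // every Foo returned, which means one extra lookup per result document.
    @DBRef(lazy = true)
    private Bar bar;

    // other fields, getters and setters omitted
}

With lazy = true the Bar reference is only resolved when the getter is actually called, so fetching the ~7K matching Foo documents no longer triggers thousands of additional lookups.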
I have a list of parameters to query, and I want to get the results sorted in the order of this parameter list.
In MySQL, it looks like this:
select * from tableA
where id in (3, 1, 2)
order by field(id, 3, 1, 2)
How can I achieve the same effect in Elasticsearch?
"query":{
"bool":{
"must":{
{"terms":{ "xId" : #[givenIdList] }}
}
}
}
"sort":{how to sort by #[givenIdList]?}
Thanks for any suggestions.
The idea is that, given the sorted list [3, 1, 2], you should return the smallest score for 3, a bigger one for 1, and the biggest for 2. The easiest function you can consider is the one that maps each array element to its index: for 3 you return 0, for 1 you return 1, and for 2 you return 2.
Concretely, you need a function that may look like this:
def myList = [3, 1, 2];
// Declares a map literal
Map m= [:];
// Generate from [3, 1, 2] the mapping { 3 = 0.0, 1 = 1.0, 2 = 2.0 }
def i = 0;
for (x in myList) {
m[x] = (double)i++;
}
// Extract the xId from the document
def xId = (int)doc['xId'].value;
// Return the mapped value, e.g., for 3 return 0
return m[xId];
Obviously, you can improve performance by passing the map directly as a parameter to the script, as reported here.
In this case the script reduces to:
def xId = doc['xId'].value.toString();
return params.m[xId];
FULL EXAMPLE
Index the data
POST _bulk
{"index": { "_index": "test", "_id": 1}}
{"xId": 1}
{"index": { "_index": "test", "_id": 2}}
{"xId": 2}
{"index": { "_index": "test", "_id": 3}}
{"xId": 3}
{"index": { "_index": "test", "_id": 4}}
{"another_field": "hello"}
Complete example with the list approach:
GET test/_search
{
"query": {
"terms": {
"xId": [3, 1, 2]
}
},
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"params": {
"list": [3, 1, 2]
},
"source": """
Map m= [:];
def i = 0;
for (x in params.list) {
m[x] = (double)i++;
}
def xId = (int)doc['xId'].value;
m[xId];
"""
},
"order": "asc"
}
}
}
Complete example with the map approach:
GET test/_search
{
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"params": {
"list": [3, 1, 2],
"map": {
"3": 0.0,
"1": 1.0,
"2": 2.0
}
},
"source": """
def xId = doc['xId'].value.toString();
params.map[xId];
"""
},
"order": "asc"
}
},
"query": {
"terms": {
"xId": [3, 1, 2]
}
}
}
FINAL NOTES
The script is simplified by the fact that there is a terms query which guarantees that only documents whose ids are present in the map are considered. If that is not the case, you should handle a missing xId field and keys missing from the map.
You should be careful with types. When you retrieve a field from a stored document, it comes back with its indexed type, e.g., xId is stored as a long and is retrieved as a long. In the second example the map is keyed by strings, so xId is converted to a string before being used as a key in the map.
I have a document already in the database that can be represented as follows:
{
_id: 1,
field1: "hello",
field2: 2,
listField: [
{
subField1: ...,
subField2: ...
},
{
subField1: ...,
subField2: ...
}
]
}
I have constructed another document that will be used to update the document with an id of 1. It can be represented as follows:
Document newDocument = {
field1: "goodbye",
field3: 3,
listField: [
{
subField3: ...
},
{
subField3: ...
}
]
}
The expected result is as follows:
{
_id: 1,
field1: "goodbye",
field2: 2,
field3: 3,
listField: [
{
subField1: ...,
subField2: ...,
subField3: ...
},
{
subField1: ...,
subField2: ...,
subField3: ...
}
]
}
How can this be accomplished using the Java driver? I had previously attempted to accomplish this with:
collection.updateOne(Filters.eq("_id", 1), new Document("$set", newDocument));
However, this method produces the following result instead:
{
_id: 1,
field1: "goodbye",
field2: 2,
field3: 3,
listField: [
{
subField3: ...
},
{
subField3: ...
}
]
}
The issue is that the listField array is being replaced wholesale by the new array. I want each document within listField to be updated with the new sub-fields instead. What should I be doing differently to accomplish this?
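One approach that might achieve the merge described above is to $set the top-level fields individually and use the all-positional operator $[] (MongoDB 3.6+) to add the new sub-field to every element of listField. This is only a sketch under those assumptions; the connection string, database, collection name, and the "someValue" payload are placeholders, not a verified solution:

import static com.mongodb.client.model.Filters.eq;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Updates;
import org.bson.Document;

try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
    MongoCollection<Document> collection =
            client.getDatabase("test").getCollection("myCollection");

    collection.updateOne(
            eq("_id", 1),
            Updates.combine(
                    // top-level fields keep the plain $set behaviour
                    Updates.set("field1", "goodbye"),
                    Updates.set("field3", 3),
                    // $[] applies the $set to every element of listField,
                    // merging subField3 into each existing sub-document
                    Updates.set("listField.$[].subField3", "someValue")));
}

Note that $[] writes the same value into every element; if each element of listField needs a different subField3, you would have to address the elements individually (e.g. listField.0.subField3, listField.1.subField3) or read, modify, and replace the array.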
I am trying to create a time series of the count of events that happened over a given time period.
The events are encoded as
PCollection<KV<String, Long>> events;
where the String is the id of the event source and the Long is the timestamp of the event.
What I want out is a PCollection<Timeseries> of time series that have the form:
class Timeseries {
String id;
List<TimeseriesWindow> windows;
}
class TimeseriesWindow {
long timestamp;
long count;
}
Toy example with a fixed window size (is this the correct term?) of 10 seconds and a total timeseries duration of 60 seconds:
Input:
[("one", 1), ("one", 13), ("one", 2), ("one", 43), ("two", 3)]
Output:
[
{
id: "one"
windows: [
{
timestamp: 0,
count: 2
},
{
timestamp: 10,
count: 1
},
{
timestamp: 20,
count: 0
},
{
timestamp: 30,
count: 0
},
{
timestamp: 40,
count: 1
},
{
timestamp: 50,
count: 0
}
]
},
{
id: "two"
windows: [
{
timestamp: 0,
count: 1
},
{
timestamp: 10,
count: 0
},
{
timestamp: 20,
count: 0
},
{
timestamp: 30,
count: 0
},
{
timestamp: 40,
count: 0
},
{
timestamp: 50,
count: 0
}
]
}
]
I hope this makes sense :)
You can do a GroupByKey to transform your input into
[
("one", [1, 13, 2, 43]),
("two", [3]),
]
at which point you can apply a DoFn to convert the list of timestamps into a Timeseries object (e.g., by creating the list of TimeseriesWindow objects at the appropriate times and then iterating over the values, incrementing the counts), as sketched below.
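A minimal sketch of that approach, assuming the Timeseries and TimeseriesWindow classes from the question have accessible fields and a registered coder, and with the 10-second window size and 60-second total duration of the toy example hard-coded:

import java.util.ArrayList;

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// events is the PCollection<KV<String, Long>> from the question
PCollection<Timeseries> series = events
    .apply(GroupByKey.<String, Long>create())
    .apply(ParDo.of(new DoFn<KV<String, Iterable<Long>>, Timeseries>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        final long windowSize = 10;  // seconds per bucket
        final long duration = 60;    // total length of the series

        Timeseries ts = new Timeseries();
        ts.id = c.element().getKey();
        ts.windows = new ArrayList<>();

        // Pre-create one zeroed bucket per window start (0, 10, ..., 50).
        for (long start = 0; start < duration; start += windowSize) {
          TimeseriesWindow w = new TimeseriesWindow();
          w.timestamp = start;
          w.count = 0;
          ts.windows.add(w);
        }

        // Increment the bucket that each event timestamp falls into.
        for (Long t : c.element().getValue()) {
          int bucket = (int) (t / windowSize);
          if (bucket >= 0 && bucket < ts.windows.size()) {
            ts.windows.get(bucket).count++;
          }
        }
        c.output(ts);
      }
    }));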
You may also look into the built-in windowing capabilities to see if they meet your needs.
I'm using Elasticsearch for the first time. I'm trying to use the completion suggester on a multi-field; although I don't see any error, I don't get any suggestions back.
Mapping creation:
PUT /products5/
{
"mappings":{
"products" : {
"properties" : {
"name" : {
"type":"text",
"fields":{
"text":{
"type":"keyword"
},
"suggest":{
"type" : "completion"
}
}
}
}
}
}
}
Indexing:
PUT /products5/product/1
{
"name": "Apple iphone 5"
}
PUT /products5/product/2
{
"name": "iphone 4 16GB"
}
PUT /products5/product/3
{
"name": "iphone 3 SS 16GB black"
}
PUT /products5/product/4
{
"name": "Apple iphone 4 S 16 GB white"
}
PUT /products5/product/5
{
"name": "Apple iphone case"
}
Query:
POST /products5/product/_search
{
"suggest":{
"my-suggestion":{
"prefix":"i",
"completion":{
"field":"name.suggest"
}
}
}
}
Output:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"suggest": {
"my-suggestion": [
{
"text": "i",
"offset": 0,
"length": 1,
"options": []
}
]
}
}
Please guide me on what the mistake is; I have tried every option I can think of.
At first glance this looks correct. Probably the reason you don't get the expected response is that you added documents to the index before you created the mapping, so the documents were not indexed according to the mapping you specified.
I have found an issue with your mapping name. There is an inconsistency between the name of the mapping type and the value you specify in the URL when creating new documents: you create the mapping in the index with the type name products, but when you add new documents you use product, without the trailing s, as the type name. You have a typo.
The response:
array (
0 =>
array (
'time_start' => 1252652400,
'time_stop' => 1252911600,
'stats' =>
array (
6002306163363 =>
array (
'id' => 6002306163363,
'impressions' => '6713',
'clicks' => '7',
'spent' => '593',
'actions' => '1',
),
),
),
)
This data comes from the Facebook API's rest/ads.getAdGroupStats call.
I am not able to convert the stats part to a Java class, where 6002306163363 is a variable key and similarly there could be many more mappings. Below is the full result for three ads: 123456, 23456, and 34567.
[
{
"time_start": 0,
"time_stop": 1285224928,
"stats": {
"123456": {
"id": 123456,
"impressions": 40,
"clicks": 0,
"spent": 0,
"social_impressions": 0,
"social_clicks": 0,
"social_spent": 0
},
"23456": {
"id": 23456,
"impressions": 3,
"clicks": 0,
"spent": 0,
"social_impressions": 0,
"social_clicks": 0,
"social_spent": 0
},
"34567": {
"id": 34567,
"impressions": 211457,
"clicks": 84,
"spent": 6898,
"social_impressions": 124,
"social_clicks": 0,
"social_spent": 0
}
}
}
]
I have to make a Java class that maps to the above JSON and I am not able to do so. Can anyone please help me here?
Update: I am getting this data from Facebook, and the API we are using requires a class so that the returned JSON can be mapped. I only have control over creating that class, which the API then maps to internally. I need the format of the required Java class.
You need a HashMap or something similar to deal with those numerical keys.
public class GroupStats {
    long time_start;
    long time_stop;
    // keyed by the ad id ("123456", "23456", ...) since the JSON keys vary
    HashMap<String, GroupAccount> stats;
}
public class GroupAccount {
    long id;
    int impressions;
    int clicks;
    int spent;
    int social_impressions;
    int social_clicks;
    int social_spent;
}
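For illustration only, if you were doing the mapping yourself with Jackson (RestFB has its own mapping layer, so treat this purely as a sketch; the json variable and the field-visibility setting are assumptions), the numeric keys under stats simply become the map keys:

import java.util.List;

import com.fasterxml.jackson.annotation.JsonAutoDetect;
import com.fasterxml.jackson.annotation.PropertyAccessor;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
// let Jackson read the package-private fields declared above
mapper.setVisibility(PropertyAccessor.FIELD, JsonAutoDetect.Visibility.ANY);

// json holds the array shown in the question; each numeric key under "stats"
// becomes a map key and its object is bound to a GroupAccount
// (exception handling omitted)
List<GroupStats> result =
        mapper.readValue(json, new TypeReference<List<GroupStats>>() {});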
I'm Mark Allen (RestFB maintainer). Version 1.6 should address this with the new option to map to the built-in JsonObject type; check out http://restfb.com for an example.