Group by two arrays and get concatenation values in Mongodb aggregations

Group by two arrays and get concatenation values in Mongodb aggregations - java

Having a document with this format :
"_id" : ObjectId("59ce3bb32708c95ee2168e2f"),
"document1" : [
{
"value" : "doc1A"
},
{
"value" : "doc1B"
},
{
"value" : "doc1C"
},
{
"value" : "doc1D"
},
{
"value" : "doc1E"
},
{
"value" : "doc1F"
}
],
"document2" : [
{
"value" : "doc2A"
},
{
"value" : "doc2B"
},
{
"value" : "doc2C"
},
{
"value" : "doc2D"
},
"metric1" :0.0,
"metric2" : 0.0
]
}
I need to group by the concatenation of the document1 and document 2 values and perform some calculs on it in Aggregation framework at Java.
I can do group(document1,document2) but I'll get as an _id an array so I want to get it as a concatenation and as :
doc1A (doc2A) / doc1A (doc2B) / doc1A (doc2C) ...
Do you have any idea ?

Related

Elastic search term query not working on a specific field

I'm new to elastic search.
So this is how the index looks:
{
"scresults-000001" : {
"aliases" : {
"scresults" : { }
},
"mappings" : {
"properties" : {
"callType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"esdtValues" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gasLimit" : {
"type" : "long"
},
AND MORE OTHER Fields.......
If I'm trying to create a search query in Java that looks like this:
{
"bool" : {
"filter" : [
{
"term" : {
"sender" : {
"value" : "sendervalue",
"boost" : 1.0
}
}
},
{
"term" : {
"data" : {
"value" : "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU=",
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
If I run this query I get 0 hits. If I change the field "data" with other field it works. I don't understand what's different.
How I actually create the query in Java+SpringBoot:
QueryBuilder boolQuery = QueryBuilders.boolQuery()
.filter(QueryBuilders.termQuery("sender", "sendervalue"))
.filter(QueryBuilders.termQuery("data",
"YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="));
Query searchQuery = new NativeSearchQueryBuilder()
.withFilter(boolQuery)
.build();
SearchHits<ScResults> articles = elasticsearchTemplate.search(searchQuery, ScResults.class);

Since you're trying to do an exact match on a string with a term query, you need to do it on the data.keyword field which is not analyzed. Since the data field is a text field, hence analyzed by the standard analyzer, not only are all letters lowercased but the = sign at the end also gets stripped off, so there's no way this can match (unless you use a match query on the data field but then you'd not do exact matching anymore).
POST _analyze
{
"analyzer": "standard",
"text": "YWRkTGlxdWlkaXR5UHJveHlAMDAwMDAwMDAwMDAwMDAwMDA1MDBlYmQzMDRjMmYzNGE2YjNmNmE1N2MxMzNhYjdiOGM2ZjgxZGM0MDE1NTQ4M0A3ZjE1YjEwODdmMjUwNzQ4QDBjMDU0YjcwNDhlMmY5NTE1ZWE3YWU="
}
Results:
{
"tokens" : [
{
"token" : "ywrktglxdwlkaxr5uhjvehlamdawmdawmdawmdawmdawmda1mdblymqzmdrjmmyznge2yjnmnme1n2mxmznhyjdiogm2zjgxzgm0mde1ntq4m0a3zje1yjewoddmmjuwnzq4qdbjmdu0yjcwndhlmmy5nte1zwe3ywu",
"start_offset" : 0,
"end_offset" : 163,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}

Highchart tooltip value should be format based on locale

For example, I want to display the below format:
Italy Format : 1.325.000
Us Format : 1,325,000
etc.
But highchart display another format using the below response.
Based on locale it shows the value in tooltip dynamically.
I am trying with below response.
Highcharts.chart('container', {
"tooltip" : {
"shared" : true
},
"legend" : {
"enabled" : true,
"reversed" : false
},
"credits" : {
"enabled" : false
},
"exporting" : {
"enabled" : false
},
"chart" : {
"zoomType" : "xy"
},
"title" : {
"text" : "Financial analytic"
},
"xAxis" : [ {
"categories" : [ "Amar", "Kiran", "Venkatesh" ],
"crosshair" : true
} ],
"yAxis" : [ {
"title" : {
"text" : "Financial"
},
"labels" : {
"format" : "${value:.2f}USD"
}
} ],
"series" : [ {
"type" : "column",
"name" : "Financial",
"data" : [ 1325000.0, 1740000.0, 1560000.0 ],
"tooltip" : {
"valueDecimals" : 2,
"valuePrefix" : "$",
"valueSuffix" : "USD"
},
"yAxis" : 0
}]
});

You need to set thousandsSep in the lang options:
Highcharts.setOptions({
lang: {
thousandsSep: ','
}
});
Live demo: http://jsfiddle.net/BlackLabel/4m79t50g/1/
API Reference: https://api.highcharts.com/gantt/lang.thousandsSep

What is meant by processedWithError in the report task manager?

I already ingested the file into the druid, greatfully it shows the ingestion is success. However when I checked in the reports of the ingestion, there are all rows are processed with error yet the Datasource is display in the "Datasource" tab.
I have tried to minimise the rows from 20M to 20 rows only. Here is my configuration file:
"type" : "index",
"spec" : {
"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "local",
"baseDir" : "/home/data/Salutica",
"filter" : "outDashboard2RawV3.csv"
}
},
"dataSchema" : {
"dataSource": "DaTRUE2_Dashboard_V3",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "WEEK",
"queryGranularity" : "none",
"intervals" : ["2017-05-08/2019-05-17"],
"rollup" : false
},
"parser" : {
"type" : "string",
"parseSpec": {
"format" : "csv",
"timestampSpec" : {
"column" : "Date_Time",
"format" : "auto"
},
"columns" : [
"Main_ID","Parameter_ID","Date_Time","Serial_Number","Status","Station_ID",
"Station_Type","Parameter_Name","Failed_Date_Time","Failed_Measurement",
"Database_Name","Date_Time_Year","Date_Time_Month",
"Date_Time_Day","Date_Time_Hour","Date_Time_Weekday","Status_New"
],
"dimensionsSpec" : {
"dimensions" : [
"Date_Time","Serial_Number","Status","Station_ID",
"Station_Type","Parameter_Name","Failed_Date_Time",
"Failed_Measurement","Database_Name","Status_New",
{
"name" : "Main_ID",
"type" : "long"
},
{
"name" : "Parameter_ID",
"type" : "long"
},
{
"name" : "Date_Time_Year",
"type" : "long"
},
{
"name" : "Date_Time_Month",
"type" : "long"
},
{
"name" : "Date_Time_Day",
"type" : "long"
},
{
"name" : "Date_Time_Hour",
"type" : "long"
},
{
"name" : "Date_Time_Weekday",
"type" : "long"
}
]
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
}
]
},
"tuningConfig" : {
"type" : "index",
"partitionsSpec" : {
"type" : "hashed",
"targetPartitionSize" : 5000000
},
"jobProperties" : {}
}
}
}
Report:
{"ingestionStatsAndErrors":{"taskId":"index_DaTRUE2_Dashboard_V3_2019-09-10T01:16:47.113Z","payload":{"ingestionState":"COMPLETED","unparseableEvents":{},"rowStats":{"determinePartitions":{"processed":0,"processedWithError":0,"thrownAway":0,"unparseable":0},"buildSegments":{"processed":0,"processedWithError":20606701,"thrownAway":0,"unparseable":1}},"errorMsg":null},"type":"ingestionStatsAndErrors"}}
I'm expecting this:
{"processed":20606701,"processedWithError":0,"thrownAway":0,"unparseable":1}},"errorMsg":null},"type":"ingestionStatsAndErrors"}}
instead of this:
{"processed":0,"processedWithError":20606701,"thrownAway":0,"unparseable":1}},"errorMsg":null},"type":"ingestionStatsAndErrors"}}

Below is my input data from csv;
"Main_ID","Parameter_ID","Date_Time","Serial_Number","Status","Station_ID","Station_Type","Parameter_Name","Failed_Date_Time","Failed_Measurement","Database_Name","Date_Time_Year","Date_Time_Month","Date_Time_Day","Date_Time_Hour","Date_Time_Weekday","Status_New"
1,3,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","1.8V","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"
1,4,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","1.35V","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"
1,5,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","Isc_VChrg","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"
1,6,"2018-10-05 15:00:55","1840SDF00038","Passed","ST1","BLTBoard","Isc_VBAT","","","DaTRUE2Left",2018,10,5,15,"Friday","Passed"

Cassandra: Java equivalent method to `sstabledump`?

I'm writing test for a java program which outputs cassandra sstable files (e.g. mc-1-big-Data.db). Now what I have is an correct output file (correct.db). To check if the program is correct or not, I need to dump the two files, and compare each field inside excep "liveness_info" (which means I cannot directly compare mc-1-big-Data.db with correct.db).
I know I can use sstabledump to do this. But this command runs in shell and I may not call it directly in Java since I'm supposed to do a unit test. Therefore, I'd like to have an equivalent method in Java. But after searching for a long time I still didn't find any. Could anyone give some suggestions? Thanks!
[update]
I've tried the method #JimWartnick mentions. When I used SSTableExport.main(...), I got this error java.lang.ExceptionInInitializerError caused by org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files.
It seems this requires me to setup cassandra server, but I suspect for a unit test I cannot do this. Any suggestions? Thanks!
[example dump]
Just in case it's helpful, I attach the example dump below.
[
{
"partition" : {
"key" : [ "1" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 33,
"liveness_info" : { "tstamp" : "2019-08-15T23:52:11.715Z" },
"cells" : [
{ "name" : "name", "value" : "Amy" }
]
}
]
},
{
"partition" : {
"key" : [ "2" ],
"position" : 34
},
"rows" : [
{
"type" : "row",
"position" : 67,
"liveness_info" : { "tstamp" : "2019-08-15T23:52:11.738Z" },
"cells" : [
{ "name" : "name", "value" : "Bob" }
]
}
]
},
{
"partition" : {
"key" : [ "4" ],
"position" : 68
},
"rows" : [
{
"type" : "row",
"position" : 103,
"liveness_info" : { "tstamp" : "2019-08-15T23:52:11.738Z" },
"cells" : [
{ "name" : "name", "value" : "Caleb" }
]
}
]
},
{
"partition" : {
"key" : [ "3" ],
"position" : 104
},
"rows" : [
{
"type" : "row",
"position" : 133,
"liveness_info" : { "tstamp" : "2019-08-15T23:52:11.738Z" },
"cells" : [
{ "name" : "name", "value" : "" }
]
}
]
}

How to retrieve a document by its own sub document or array?

I have such structure of document:
{
"_id" : "4e76fd1e927e1c9127d1d2e8",
"name" : "***",
"embedPhoneList" : [
{
"type" : "家庭",
"number" : "00000000000"
},
{
"type" : "手机",
"number" : "00000000000"
}
],
"embedAddrList" : [
{
"type" : "家庭",
"addr" : "山东省诸城市***"
},
{
"type" : "工作",
"addr" : "深圳市南山区***"
}
],
"embedEmailList" : [
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
},
{
"email" : "********#gmail.com"
}
]
}
What I wan't to do is find the document by it's sub document,such as email in embedEmailList field.
Or if I have structure like this
{
"_id" : "4e76fd1e927e1c9127d1d2e8",
"name" : "***",
"embedEmailList" : [
"123#gmail.com" ,
"********#gmail.com" ,
]
}
the embedEmailList is array,how to find if there is 123#gmail.com?
Thanks.

To search for a specific value in an array, mongodb supports this syntax:
db.your_collection.find({embedEmailList : "foo#bar.com"});
See here for more information.
To search for a value in an embedded object, it supports this syntax:
db.your_collection.find({"embedEmailList.email" : "foo#bar.com"});
See here for more information.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Group by two arrays and get concatenation values in Mongodb aggregations - java

Related

Elastic search term query not working on a specific field

Highchart tooltip value should be format based on locale

What is meant by processedWithError in the report task manager?

Cassandra: Java equivalent method to `sstabledump`?

How to retrieve a document by its own sub document or array?

Categories

Resources