I have a collection in MongoDB with the following data:
collection name: runState
runId: 1
startTime:2020-09-16T20:56:06.598+00:00
endTime:2020-09-16T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:David
runId: 2
startTime:2021-01-11T20:56:06.598+00:00
endTime:2021-01-11T20:56:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 2
startTime:2021-01-27T20:56:06.598+00:00
endTime:2021-01-27T20:56:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 3
startTime:2021-01-11T20:56:06.598+00:00
endTime:2021-01-11T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 4
startTime:2021-02-09T20:56:06.598+00:00
endTime:2021-02-09T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 5
startTime:2021-02-09T20:56:06.598+00:00
endTime:2021-02-09T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 6
startTime:2021-09-09T20:56:06.598+00:00
endTime:2021-09-09T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 7
startTime:2022-01-09T20:56:06.598+00:00
endTime:2022-01-09T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:David
runId: 8
startTime:2022-01-10T20:56:06.598+00:00
endTime:2022-01-10T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: failed
ranBy:David
I want the output as a count for the last 12 months (Jan 2021 to Jan 2022) for each product where the task is completed (the product can be derived from product_action).
The output should be in the format below:
{
"_id" : "postgres",
completed: [
{
"month" : "FEB-2021",
"count" : 1
},
{
"month" : "SEP-2021",
"count" : 1
},
{
"month" : "JAN-2021",
"count" : 1
}
]
},
{
"_id" : "oracle",
"completed" : [
{
"month" : "FEB-2021",
"count" : 1
},
{
"month" : "JAN-2021",
"count" : 2
}
]
}
I have started with the filter below, but I am not sure how to get the month-wise counts shown above.
{"product_action":{$regex:"postgres|oracle"},"Task":"completed"}
As this is new to me, can someone help me with the MongoDB query to get this result, and also with the code to achieve it in Java Spring Boot?
Here is the Java aggregation code I tried, but it is not yielding the result I want:
Aggregation agg = Aggregation.newAggregation(
    Aggregation.project("endTime", "Task", "product_action")
        .and(DateOperators.Month.monthOf("endTime")).as("month"),
    Aggregation.match(Criteria.where("product_action").regex("postgres|oracle")
        .and("Task").is("completed")
        .and("endTime").gte(parseDate("2021-02-01"))),
    Aggregation.group("month", "Task").count().as("count")
);
Try this on for size:
db.foo.aggregate([
// Get the easy stuff out of the way. Filter for the desired date range and only
// those items that are complete:
{$match: {$and: [
{"endTime":{$gte:new ISODate("2021-01-01")}},
{"endTime":{$lt:new ISODate("2022-01-01")}},
{"Task":"completed"}
]} }
// Now group by product and date expressed as month-year. The product
// is embedded in the field value so there are a few approaches to digging
// it out. Here, we split on underscore and take the [2] item.
,{$group: {_id: {
p: {$arrayElemAt:[{$split:["$product_action","_"]},2]},
d: {$dateToString: {date: "$endTime", format: "%m-%Y"}}
},
n: {$sum: 1}
}}
// The OP seeks to make the date component nested inside the product
// instead of having it as a two-part grouping. We will "regroup" and
// create an array. This is slightly different than the format indicated
// by the OP but values as keys (e.g. "Jan-2021: 2") is in general a
// poor idea so instead we construct an array of proper name:value pairs.
,{$group: {_id: '$_id.p',
completed: {$push: {d: '$_id.d', n: '$n'}}
}}
]);
which yields
{
"_id" : "postgres",
"completed" : [
{
"d" : "02-2021",
"n" : 1
},
{
"d" : "09-2021",
"n" : 1
},
{
"d" : "01-2021",
"n" : 1
}
]
}
{
"_id" : "oracle",
"completed" : [
{
"d" : "02-2021",
"n" : 1
},
{
"d" : "01-2021",
"n" : 2
}
]
}
UPDATED
It has come up before that the $dateToString function does not have a format specifier that produces the three-letter month abbreviation, e.g. JAN (nor the long form, e.g. January, for that matter). Sorting still works with 01-2021, 02-2021, 04-2021 just as it does with JAN-2021, FEB-2021, APR-2021, but if such output is really desired directly from the DB instead of post-processing in the client-side code, then the second $group is replaced by a $sort and $group as follows:
// Ensure the NN-YYYY dates are going in increasing order. The product
// component _id.p does not matter here -- only the dates have to be
// increasing. NOTE: This is OPTIONAL with respect to changing
// NN-YYYY into MON-YYYY but almost always the follow on question is
// how to get the completed list in date order...
,{$sort: {'_id.d':1}}
// Regroup as before but index the NN part of NN-YYYY into an
// array of 3 letter abbrevs, then reconstruct the string with the
// dash and the year component. Remember: the order of the _id
// in the doc stream coming out of $group is not deterministic
// but the array created by $push will preserve the order in
// which it was pushed -- which is the date-ascending sorted order
// from the prior stage.
,{$group: {_id: '$_id.p',
completed: {$push: {
d: {$concat: [
{$arrayElemAt:[ ['JAN','FEB','MAR',
'APR','MAY','JUN',
'JUL','AUG','SEP',
'OCT','NOV','DEC'],
// minus 1 to adjust for zero-based array:
{$subtract:[{$toInt: {$substr:['$_id.d',0,2]}},1]}
]},
"-",
{$substr:['$_id.d',3,4]}
]},
n: '$n'}}
}}
which yields:
{
"_id" : "postgres",
"completed" : [
{
"d" : "JAN-2021",
"n" : 1
},
{
"d" : "FEB-2021",
"n" : 1
},
{
"d" : "SEP-2021",
"n" : 1
}
]
}
{
"_id" : "oracle",
"completed" : [
{
"d" : "JAN-2021",
"n" : 2
},
{
"d" : "FEB-2021",
"n" : 1
}
]
}
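If you would rather keep the simpler %m-%Y keys coming out of the pipeline and do the MON-YYYY mapping in the client instead, a small helper in the Java layer is enough. This is a minimal sketch; the class and method names are mine, not part of the answer above:
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
class MonthLabel {
    private static final DateTimeFormatter IN  = DateTimeFormatter.ofPattern("MM-uuuu");
    private static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("MMM-uuuu", Locale.ENGLISH);
    // Converts the pipeline's "MM-yyyy" keys into the requested form, e.g. "02-2021" -> "FEB-2021"
    static String toMonthAbbrev(String mmYyyy) {
        YearMonth ym = YearMonth.parse(mmYyyy, IN);
        return ym.format(OUT).toUpperCase(Locale.ENGLISH);
    }
}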
As for converting this to Java, there are several approaches, but unless a great deal of programmatic control is required, capturing the query as "relaxed JSON" (quotes not required around keys) in a Java string and calling Document.parse() is the easiest way. A full example including helper functions and the appropriate Java driver calls can be found here: https://moschetti.org/rants/mongoaggcvt.html, but the gist of it is:
private static class StageHelper {
private StringBuilder txt;
public StageHelper() {
this.txt = new StringBuilder();
}
public void add(String expr, Object ... subs) {
expr = expr.replace("'", "\""); // This is the helpful part.
if(subs.length > 0) {
expr = String.format(expr, subs); // this too
}
txt.append(expr);
}
public Document fetch() {
Document b = Document.parse(txt.toString());
return b;
}
}
private List<Document> makePipeline() {
List<Document> pipeline = new ArrayList<Document>();
StageHelper s = new StageHelper();
s.add("{$match: {$and: [ ");
// Note use of EJSON here plus string substitution of dates:
s.add(" {endTime:{$gte: {$date: '%s'}} }", "2021-01-01");
s.add(" {endTime:{$lt: {$date: '%s'}} }", "2022-01-01");
s.add(" {Task:'completed'} ");
s.add("]} } ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$group: {_id: { ");
s.add(" p: {$arrayElemAt:[{$split:['$product_action','_']},2]}, ");
s.add(" d: {$dateToString: {date: '$endTime', 'format': '%m-%Y'}} ");
s.add(" }, ");
s.add(" n: {$sum: 1} ");
s.add("}} ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$sort: {'_id.d':1}} ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$group: {_id: '$_id.p', ");
s.add(" completed: {$push: { ");
s.add(" d: {$concat: [ ");
s.add(" {$arrayElemAt:[ ['JAN','FEB','MAR', ");
s.add(" 'APR','MAY','JUN', ");
s.add(" 'JUL','AUG','SEP', ");
s.add(" 'OCT','NOV','DEC'], ");
s.add(" {$subtract:[{$toInt: {$substr:['$_id.d',0,2]}},1]} ");
s.add(" ]}, ");
s.add(" '-', ");
s.add(" {$substr:['$_id.d',3,4]} ");
s.add(" ]}, ");
s.add(" n: '$n'}} ");
s.add(" }} ");
pipeline.add(s.fetch());
return pipeline;
}
...
import com.mongodb.client.MongoCursor;
import com.mongodb.client.AggregateIterable;
AggregateIterable<Document> output = coll.aggregate(pipeline);
MongoCursor<Document> iterator = output.iterator();
while (iterator.hasNext()) {
    Document doc = iterator.next();
    // process each {_id, completed: [...]} document here
}
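Since the question mentions Spring Boot: the same parsed pipeline can be handed to the driver collection that Spring Data exposes. A minimal sketch, assuming Spring Data MongoDB 2.x or later, an injected MongoTemplate, and a collection named runState (the class and method names here are placeholders):
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.stereotype.Service;
@Service
public class RunStateReportService {
    private final MongoTemplate mongoTemplate;
    public RunStateReportService(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }
    // Returns one document per product, e.g. {_id: "oracle", completed: [...]}
    public List<Document> completedCountsByMonth() {
        List<Document> pipeline = makePipeline(); // the helper shown above, moved into this class
        List<Document> results = new ArrayList<>();
        mongoTemplate.getCollection("runState").aggregate(pipeline).into(results);
        return results;
    }
}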
I'm trying to DRY my code, and for that I used, for the first time, traits to enhance my enums.
What I want to do is: for a given array of strings, find all the enum values matching at least one keyword (case-insensitive).
The code below seems to work fine, but I think it generates memory leaks when the method getSymbolFromIndustries is called thousands of times.
Here is a capture from VisualVM after about 10 minutes of running: the Live Objects column keeps increasing after each snapshot, and the number of items is huge compared to the second line...
My heap size keeps increasing too...
The trait:
trait BasedOnCategories {
String[] categories
static getSymbolFromIndustries(Collection<String> candidates) {
values().findAll {
value -> !value.categories.findAll {
categorie -> candidates.any {
candidate -> categorie.equalsIgnoreCase(candidate)
}
}
.unique()
.isEmpty()
}
}
}
One of the several enums I have implementing the trait:
enum KTC implements BasedOnCategories, BasedOnValues {
KTC_01([
'industries': ['Artificial Intelligence','Machine Learning','Intelligent Systems','Natural Language Processing','Predictive Analytics','Google Glass','Image Recognition', 'Apps' ],
'keywords': ['AI','Voice recognition']
]),
// ... more values
KTC_43 ([
'industries': ['Fuel','Oil and Gas','Fossil Fuels'],
'keywords': ['Petroleum','Oil','Petrochemicals','Hydrocarbon','Refining']
]),
// ... more values
KTC_60([
'industries': ['App Discovery','Apps','Consumer Applications','Enterprise Applications','Mobile Apps','Reading Apps','Web Apps','App Marketing','Application Performance Management', 'Apps' ],
'keywords': ['App','Application']
])
KTC(value) {
this.categories = value.industries
this.keywords = value.keywords
}
My data-driven tests
def "GetKTCsFromIndustries"(Collection<String> actual, Enum[] expected) {
expect:
assert expected == KTC.getSymbolFromIndustries(actual)
where:
actual | expected
[ 'Oil and Gas' ] | [KTC.KTC_43]
[ 'oil and gas' ] | [KTC.KTC_43]
[ 'oil and gas', 'Fossil Fuels' ] | [KTC.KTC_43]
[ 'oil and gas', 'Natural Language Processing' ] | [KTC.KTC_01, KTC.KTC_43]
[ 'apps' ] | [KTC.KTC_01, KTC.KTC_60]
[ 'xyo' ] | []
}
My questions:
Does someone have some clues to help me fix those leaks?
Is there a more elegant way to write the getSymbolFromIndustries method?
Thanks.
Not sure about the performance issues, but I would redesign your trait like this:
https://groovyconsole.appspot.com/script/5205045624700928
trait BasedOnCategories {
Set<String> categories
void setCategories( Collection<String> cats ) {
categories = new HashSet( cats*.toLowerCase() ).asImmutable()
}
@groovy.transform.Memoized
static getSymbolFromIndustries(Collection<String> candidates) {
def lowers = candidates*.toLowerCase()
values().findAll{ value -> !lowers.disjoint( value.categories ) }
}
}
Now the rest of the context
trait BasedOnValues {
Set<String> keywords
}
enum KTC implements BasedOnCategories, BasedOnValues {
KTC_01([
'industries': ['Artificial Intelligence','Machine Learning','Intelligent Systems','Natural Language Processing','Predictive Analytics','Google Glass','Image Recognition'],
'keywords': ['AI','Voice recognition']
]),
// ... more values
KTC_43 ([
'industries': ['Fuel','Oil and Gas','Fossil Fuels'],
'keywords': ['Petroleum','Oil','Petrochemicals','Hydrocarbon','Refining']
]),
// ... more values
KTC_60([
'industries': ['App Discovery','Apps','Consumer Applications','Enterprise Applications','Mobile Apps','Reading Apps','Web Apps','App Marketing','Application Performance Management'],
'keywords': ['App','Application']
])
KTC(value) {
this.categories = value.industries
this.keywords = value.keywords
}
}
// some tests
[
[ [ 'Oil and Gas' ], [KTC.KTC_43] ],
[ [ 'oil and gas' ], [KTC.KTC_43] ],
[ [ 'oil and gas', 'Fossil Fuels' ], [KTC.KTC_43] ],
[ [ 'oil and gas', 'Natural Language Processing' ], [KTC.KTC_01, KTC.KTC_43] ],
[ [ 'xyo' ], [] ],
].each{
assert KTC.getSymbolFromIndustries( it[ 0 ] ) == it[ 1 ]
}
and then measure the performance
Our prod MongoDB server is running out of memory due to excessive logging of the statement below:
2018-03-19T20:03:05.627-0500 [conn2] warning: ClientCursor::staticYield can't unlock b/c of recursive lock ns:
top:{
opid:16,
active:true,
secs_running:0,
microsecs_running:254,
op:"query",
ns:"users.person",
query:{
findandmodify:"person",
query:{
clientSN:"405F014DE02B33F1",
status:"New"
},
update:{
$set:{
status:"InProcess"
},
$currentDate:{
lastUpdateTime:true
}
},
new:true
},
client:"10.102.26.26:61299",
desc:"conn2",
connectionId:2,
locks:{
^:"w",
^users:"W"
},
waitingForLock:false,
numYields:0,
lockStats:{
timeLockedMicros:{
},
timeAcquiringMicros:{
r:0,
w:1070513
}
}
}
I have checked MongoDB: How to disable logging the warning: ClientCursor::staticYield can't unlock b/c of recursive lock?, which suggests the cause of the issue is missing indexes.
I tried running the query with explain() as per the above article. Below is the output of db.getCollection('personSync').find({"clientSN":"405F014DE02B33F1","status":"New"}).explain(), the query which has the same fields as the findAndModify operation above:
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 71331,
"nscanned" : 71331,
"nscannedObjectsAllPlans" : 71331,
"nscannedAllPlans" : 71331,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 557,
"nChunkSkips" : 0,
"millis" : 59,
"server" : "SQL01:27017",
"filterSet" : false,
"stats" : {
"type" : "COLLSCAN",
"works" : 71333,
"yields" : 557,
"unyields" : 557,
"invalidates" : 0,
"advanced" : 0,
"needTime" : 71332,
"needFetch" : 0,
"isEOF" : 1,
"docsTested" : 71331,
"children" : []
}
}
So I was referring to the article at https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/ to create an index. Would adding an index in the way below fix my issue for the findAndModify operation in my case? Or do I need to add any more indexes too?
db.users.createIndex({ clientSN:"405F014DE02B33F1", status:"New"})
Adding the index should improve the performance of this operation. Currently it's performing a Collection Scan; using an index will be much more efficient.
The createIndex command is incorrect. It should be the following:
db.users.createIndex({ clientSN:1, status:1}, { background : true } )
Note that indexes are built in the foreground by default, so setting the background flag is very important.
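If the index needs to be created from application code rather than the shell, the equivalent call with the MongoDB Java driver looks roughly like this; a sketch only, with the connection string and namespace as assumptions:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
public class CreatePersonIndex {
    public static void main(String[] args) {
        // Assumed connection string and namespace; adjust to your environment.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> person = client.getDatabase("users").getCollection("person");
            // Compound index on the two fields the findAndModify filter uses;
            // background(true) keeps the build from blocking other operations.
            person.createIndex(Indexes.ascending("clientSN", "status"), new IndexOptions().background(true));
        }
    }
}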
I created a GUI in QtJambi which runs MATLAB code when a button is pressed. Everything runs fine and output is received from the MATLAB code, but when I close the window of the GUI, I get a segmentation violation, shown below:
------------------------------------------------------------------------
Segmentation violation detected at Sun Apr 26 19:21:03 2015
------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled
Current Visual : 0x23 (class 4, depth 24)
Default Encoding : UTF-8
GNU C Library : 2.19 stable
MATLAB Architecture: glnxa64
MATLAB Root : /usr/local/MATLAB/MATLAB_Compiler_Runtime/v83
MATLAB Version : 8.3.0.532 (R2014a)
Operating System : Linux 3.13.0-49-generic #83-Ubuntu SMP Fri Apr 10 20:11:33 UTC 2015 x86_64
Processor ID : x86 Family 31 Model 4 Stepping 3, AuthenticAMD
Virtual Machine : Java 1.7.0_79-b14 with Oracle Corporation OpenJDK 64-Bit Server VM mixed mode
Window System : The X.Org Foundation (11501000), display :0.0
Fault Count: 1
Abnormal termination:
Segmentation violation
Register State (from fault):
RAX = 00007fa7adc2f410 RBX = 0000000000000000
RCX = 00007fa7a8365ae0 RDX = 0000000000000000
RSP = 00007fa7ae80f300 RBP = 00007fa7a83458d0
RSI = 0000000000000000 RDI = 00620069006c68a0
R8 = 00007fa7a833f500 R9 = 00007fa7a8364330
R10 = 00007fa7ae80f130 R11 = 0000000000000000
R12 = 0000000000000080 R13 = 0000000000000008
R14 = 00007fa79bffbfb8 R15 = 0000000000000001
RIP = 00007fa7adc2f414 EFL = 0000000000010206
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x00007fa7adc2f414 /lib/x86_64-linux-gnu/libpthread.so.0+00042004 pthread_mutex_lock+00000004
[ 1] 0x00007fa799a9e2c7 /usr/lib/x86_64-linux-gnu/libX11.so.6+00279239 XrmDestroyDatabase+00000039
[ 2] 0x00007fa799a867b3 /usr/lib/x86_64-linux-gnu/libX11.so.6+00182195 _XFreeDisplayStructure+00001123
[ 3] 0x00007fa799a744ef /usr/lib/x86_64-linux-gnu/libX11.so.6+00107759 XCloseDisplay+00000223
[ 4] 0x00007fa79b580d6e /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.6+02309486
[ 5] 0x00007fa79b517d66 /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.6+01879398 _ZN12QApplicationD1Ev+00001158
[ 6] 0x00007fa7938bdb57 /usr/lib/jni/libcom_trolltech_qt_gui.so+05557079 _ZN25QtJambiShell_QApplicationD0Ev+00000023
[ 7] 0x00007fa7a0eaac58 /usr/lib/x86_64-linux-gnu/libQtCore.so.4.8.6+01662040 _ZN7QObject5eventEP6QEvent+00000648
[ 8] 0x00007fa79b51bed3 /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.6+01896147 _ZN12QApplication5eventEP6QEvent+00000067
[ 9] 0x00007fa79b516e2c /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.6+01875500 _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent+00000140
[ 10] 0x00007fa79b51d4a0 /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.6+01901728 _ZN12QApplication6notifyEP7QObjectP6QEvent+00000624
[ 11] 0x00007fa7a0e924dd /usr/lib/x86_64-linux-gnu/libQtCore.so.4.8.6+01561821 _ZN16QCoreApplication14notifyInternalEP7QObjectP6QEvent+00000109
[ 12] 0x00007fa7a0e95b3d /usr/lib/x86_64-linux-gnu/libQtCore.so.4.8.6+01575741 _ZN23QCoreApplicationPrivate16sendPostedEventsEP7QObjectiP11QThreadData+00000493
[ 13] 0x00007fa7a0e96bb0 /usr/lib/x86_64-linux-gnu/libQtCore.so.4.8.6+01579952 _ZN16QCoreApplication4execEv+00000192
[ 14] 0x00007fa7a418d7f8 <unknown-module>+00000000
[ 15] 0x00007fa7a41811d4 <unknown-module>+00000000
[ 16] 0x00007fa7a417b4e7 <unknown-module>+00000000
[ 17] 0x00007fa7ad1f1099 /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so+06193305
[ 18] 0x00007fa7ad1f0b38 /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so+06191928
[ 19] 0x00007fa7ad1ffc6b /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so+06253675
[ 20] 0x00007fa7ad210ed8 /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so+06323928
[ 21] 0x00007fa7ae40f1f9 /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/../lib/amd64/jli/libjli.so+00012793
[ 22] 0x00007fa7adc2d182 /lib/x86_64-linux-gnu/libpthread.so.0+00033154
[ 23] 0x00007fa7ae14147d /lib/x86_64-linux-gnu/libc.so.6+01025149 clone+00000109
If this problem is reproducible, please submit a Service Request via:
http://www.mathworks.com/support/contact_us/
A technical support engineer might contact you with further information.
Thank you for your help.
Following http://uk.mathworks.com/matlabcentral/answers/100053-why-does-jboss-7-1-throw-a-segmentation-violation-when-trying-to-call-a-matlab-builder-ja-2-2-4-r20, I added MWApplication.initialize(MWMCROption.NOJVM); to the main function of my program, before the QtJambi initialisation procedure.
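For illustration, the resulting initialisation order looks roughly like this. This is a minimal sketch assuming a QtJambi 4.x style main class; the class name and the GUI construction are placeholders:
import com.mathworks.toolbox.javabuilder.MWApplication;
import com.mathworks.toolbox.javabuilder.MWMCROption;
import com.trolltech.qt.gui.QApplication;
public class Main {
    public static void main(String[] args) throws Exception {
        // Initialise the MATLAB Compiler Runtime without its own JVM,
        // before the QtJambi initialisation procedure.
        MWApplication.initialize(MWMCROption.NOJVM);
        QApplication.initialize(args);
        // ... build and show the GUI here ...
        QApplication.exec();
    }
}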
I'm parsing an SIU S14 message with the following segment order:
MSH
SCH
PID
PV1
RGS
AIL
AIS
and although it parses without error, I can't retrieve data from the AIS segment. But when I move the AIS segment before AIL, everything seems to work fine. So does segment order matter in HL7?
The order of segments in an HL7 message is predetermined by the message type. In Schedule Information Unsolicited messages, the AIS segment has to come before AIL.
SIU^S12-S24,S26,S27^SIU_S12: Schedule Information Unsolicited
MSH Message Header
SCH Schedule Activity Information
[ { TQ1 } ] Timing/Quantity
[ { NTE } ] Notes and Comments for the SCH
[ { --- PATIENT begin
PID Patient Identification
[ PD1 ] Additional Demographics
[ PV1 ] Patient Visit
[ PV2 ] Patient Visit - Additional Info
[ { OBX } ] Observation/Result
[ { DG1 } ] Diagnosis
} ] --- PATIENT end
{ --- RESOURCES begin
RGS Resource Group Segment
[ { --- SERVICE begin
AIS Appointment Information - Service
[ { NTE } ] Notes and Comments for the AIS
} ] --- SERVICE end
[ { --- GENERAL_RESOURCE begin
AIG Appointment Information - General Resource
[ { NTE } ] Notes and Comments for the AIG
} ] --- GENERAL_RESOURCE end
[ { --- LOCATION_RESOURCE begin
AIL Appointment Information - Location Resource
[ { NTE } ] Notes and Comments for the AIL
} ] --- LOCATION_RESOURCE end
[ { --- PERSONNEL_RESOURCE begin
AIP Appointment Information - Personnel Resource
[ { NTE } ] Notes and Comments for the AIP
} ] --- PERSONNEL_RESOURCE end
} --- RESOURCES end
But both segments, or rather their segment groups, are optional, so a message with just an AIL and no AIS segment is syntactically OK. And as HL7 messages are open, additional or locally defined segments are allowed after a complete message. In order to retrieve such additional data you need an adapted template.
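If you happen to be parsing with the HAPI v2 library, the required nesting is reflected in its generated message model; a hedged sketch against the v2.5 SIU_S12 structure (which also covers the S14 trigger) might look like the following. The accessor names follow HAPI's conventions and may differ for your HL7 version:
import ca.uhn.hl7v2.DefaultHapiContext;
import ca.uhn.hl7v2.HapiContext;
import ca.uhn.hl7v2.model.v25.message.SIU_S12;
import ca.uhn.hl7v2.model.v25.segment.AIS;
import ca.uhn.hl7v2.parser.Parser;
public class SiuAisDemo {
    public static void main(String[] args) throws Exception {
        String er7 = "MSH|^~\\&|...";   // placeholder: your SIU^S14 message here
        HapiContext context = new DefaultHapiContext();
        Parser parser = context.getPipeParser();
        // S14 shares the SIU_S12 structure in HAPI's model.
        SIU_S12 msg = (SIU_S12) parser.parse(er7);
        // AIS lives in the RESOURCES -> SERVICE group, after RGS and before AIL.
        AIS ais = msg.getRESOURCES().getSERVICE().getAIS();
        System.out.println(ais.getUniversalServiceIdentifier().getText().getValue());
    }
}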