I'm trying to DRY my code, and for the first time I used traits to enhance my enums.
What I want to do is: for a given array of strings, find all the enum values matching at least one keyword (case-insensitive).
The code below seems to work fine, but I think it creates a memory leak when the method getSymbolFromIndustries is called thousands of times.
Here is a capture from VisualVM after about 10 minutes of running: the Live Objects column keeps increasing after each snapshot, and the number of items is huge compared to the second line.
My heap size keeps increasing too.
The trait:
trait BasedOnCategories {
    String[] categories

    static getSymbolFromIndustries(Collection<String> candidates) {
        values().findAll { value ->
            !value.categories.findAll { categorie ->
                candidates.any { candidate ->
                    categorie.equalsIgnoreCase(candidate)
                }
            }
            .unique()
            .isEmpty()
        }
    }
}
One of the several enums implementing the trait:
enum KTC implements BasedOnCategories, BasedOnValues {
    KTC_01([
        'industries': ['Artificial Intelligence', 'Machine Learning', 'Intelligent Systems', 'Natural Language Processing', 'Predictive Analytics', 'Google Glass', 'Image Recognition', 'Apps'],
        'keywords': ['AI', 'Voice recognition']
    ]),
    // ... more values
    KTC_43([
        'industries': ['Fuel', 'Oil and Gas', 'Fossil Fuels'],
        'keywords': ['Petroleum', 'Oil', 'Petrochemicals', 'Hydrocarbon', 'Refining']
    ]),
    // ... more values
    KTC_60([
        'industries': ['App Discovery', 'Apps', 'Consumer Applications', 'Enterprise Applications', 'Mobile Apps', 'Reading Apps', 'Web Apps', 'App Marketing', 'Application Performance Management', 'Apps'],
        'keywords': ['App', 'Application']
    ])

    KTC(value) {
        this.categories = value.industries
        this.keywords = value.keywords
    }
}
My data-driven tests
def "GetKTCsFromIndustries"(Collection<String> actual, Enum[] expected) {
    expect:
    assert expected == KTC.getSymbolFromIndustries(actual)

    where:
    actual                                         | expected
    ['Oil and Gas']                                | [KTC.KTC_43]
    ['oil and gas']                                | [KTC.KTC_43]
    ['oil and gas', 'Fossil Fuels']                | [KTC.KTC_43]
    ['oil and gas', 'Natural Language Processing'] | [KTC.KTC_01, KTC.KTC_43]
    ['apps']                                       | [KTC.KTC_01, KTC.KTC_60]
    ['xyo']                                        | []
}
My questions:
Does someone have some clues to help me fix these leaks?
Is there a more elegant way to write the getSymbolFromIndustries method?
Thanks.
Not sure about the performance issues, but I would redesign your trait like this:
https://groovyconsole.appspot.com/script/5205045624700928
trait BasedOnCategories {
    Set<String> categories

    void setCategories(Collection<String> cats) {
        categories = new HashSet<>(cats*.toLowerCase()).asImmutable()
    }

    @groovy.transform.Memoized
    static getSymbolFromIndustries(Collection<String> candidates) {
        def lowers = candidates*.toLowerCase()
        values().findAll { value -> !lowers.disjoint(value.categories) }
    }
}
And now the rest of the context:
trait BasedOnValues {
    Set<String> keywords
}

enum KTC implements BasedOnCategories, BasedOnValues {
    KTC_01([
        'industries': ['Artificial Intelligence', 'Machine Learning', 'Intelligent Systems', 'Natural Language Processing', 'Predictive Analytics', 'Google Glass', 'Image Recognition'],
        'keywords': ['AI', 'Voice recognition']
    ]),
    // ... more values
    KTC_43([
        'industries': ['Fuel', 'Oil and Gas', 'Fossil Fuels'],
        'keywords': ['Petroleum', 'Oil', 'Petrochemicals', 'Hydrocarbon', 'Refining']
    ]),
    // ... more values
    KTC_60([
        'industries': ['App Discovery', 'Apps', 'Consumer Applications', 'Enterprise Applications', 'Mobile Apps', 'Reading Apps', 'Web Apps', 'App Marketing', 'Application Performance Management'],
        'keywords': ['App', 'Application']
    ])

    KTC(value) {
        this.categories = value.industries
        this.keywords = value.keywords
    }
}
// some tests
[
    [['Oil and Gas'], [KTC.KTC_43]],
    [['oil and gas'], [KTC.KTC_43]],
    [['oil and gas', 'Fossil Fuels'], [KTC.KTC_43]],
    [['oil and gas', 'Natural Language Processing'], [KTC.KTC_01, KTC.KTC_43]],
    [['xyo'], []],
].each {
    assert KTC.getSymbolFromIndustries(it[0]) == it[1]
}
and then measure the performance
Related
I wrote a query in MongoDB as follows:
db.getCollection('student').aggregate([
    {
        $match: { "student_age": { "$ne": 15 } }
    },
    {
        $group: {
            _id: "$student_name",
            count: { $sum: 1 },
            sum1: { $sum: "$student_age" }
        }
    }
])
In other words, I want to fetch the count of students that aren't 15 years old together with the sum of their ages. The query works fine and I get the two data items.
In my application, I want to run the query with Spring Data.
I wrote the following code:
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group().sum("student_age").as("totalAge"),
count().as("countOfStudentNot15YearsOld"));
When this code is run, the output query will be:
"aggregate" : "MyDocument", "pipeline" :
[ { "$match" : { "AGE" : { "$ne" : 15 } } },
  { "$group" : { "_id" : null, "totalAge" : { "$sum" : "$student_age" } } },
  { "$count" : "countOfStudentNot15YearsOld" } ],
"cursor" : { "batchSize" : 2147483647 }
Unfortunately, the result contains only the countOfStudentNot15YearsOld item.
I want the result to look like that of my native query.
If you're asking to return the grouping for both "15" and "not 15" as a result, then you're looking for the $cond operator, which allows "branching" based on a conditional evaluation.
From the "shell" content you would use it like this:
db.getCollection('student').aggregate([
{ "$group": {
"_id": null,
"countFiteen": {
"$sum": {
"$cond": [{ "$eq": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"countNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, 1, 0 ]
}
},
"sumNotFifteen": {
"$sum": {
"$cond": [{ "$ne": [ "$student_age", 15 ] }, "$student_age", 0 ]
}
}
}}
])
So you use $cond to perform a logical test, in this case whether the "student_age" in the current document is 15 or not, and then return a value in response: 1 here for "counting", or the actual field value when that is what you want to send to the accumulator instead. In short, it's a "ternary" operator, an if/then/else condition (which can in fact be written in the more expressive form with keys) that you use to test a condition and decide what to return.
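For reference, that keyed form looks like this; it is interchangeable with the positional array form used in the pipeline above:

```javascript
// Same accumulator as "countNotFifteen" above, but using the keyed
// (if / then / else) form of $cond instead of the positional array form.
const countNotFifteen = {
  "$sum": {
    "$cond": {
      "if":   { "$ne": [ "$student_age", 15 ] },
      "then": 1,
      "else": 0
    }
  }
};
console.log(JSON.stringify(countNotFifteen));
```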
For the spring mongodb implementation you use ConditionalOperators.Cond to construct the same BSON expressions:
import org.springframework.data.mongodb.core.aggregation.*;
ConditionalOperators.Cond isFifteen = ConditionalOperators.when(new Criteria("student_age").is(15))
.then(1).otherwise(0);
ConditionalOperators.Cond notFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.then(1).otherwise(0);
ConditionalOperators.Cond sumNotFifteen = ConditionalOperators.when(new Criteria("student_age").ne(15))
.thenValueOf("student_age").otherwise(0);
GroupOperation groupStage = Aggregation.group()
.sum(isFifteen).as("countFifteen")
.sum(notFifteen).as("countNotFifteen")
.sum(sumNotFifteen).as("sumNotFifteen");
Aggregation aggregation = Aggregation.newAggregation(groupStage);
So basically you just extend that logic, using .then() for a "constant" value such as 1 for the "counts", and .thenValueOf() where you actually need the "value" of a field from the document, i.e. the equivalent of "$student_age" in the shell notation.
Since ConditionalOperators.Cond implements the AggregationExpression interface, it can be used with the form of .sum() that accepts an AggregationExpression rather than a string. This is an improvement over past releases of spring mongo, which required a $project stage so that there were actual document properties for the evaluated expression prior to performing the $group.
If all you want is to replicate the original query in spring mongodb, then your mistake was using the $count aggregation stage rather than appending count() to the group():
Criteria where = Criteria.where("AGE").ne(15);
Aggregation aggregation = Aggregation.newAggregation(
Aggregation.match(where),
Aggregation.group()
.sum("student_age").as("totalAge")
.count().as("countOfStudentNot15YearsOld")
);
I have a projection field computed from some conditions on the current document. The native mongo query works fine, but I can't write it with Java driver 3.4. Only Java driver 3.4 syntax is relevant.
The projection code for the field computed by $switch is:
"SITUACAO": {
"$switch" : {
"branches": [
{ case: {"$eq": ["$ID_STATUSMATRICULA", 0]},
then: {
"$switch" : {
"branches": [
{ case: {"$and": [{"$eq": ["$NR_ANDAMENTO", 0 ] },
{"$eq": ["$ID_STATUSMATRICULA", 0]} ] }, then: "NAOINICIADO" },
{ case: {"$and": [{"$gt": ["$NR_ANDAMENTO", 0]},
{"$lte": ["$NR_ANDAMENTO", 100]},
{"$eq": ["$ID_STATUSMATRICULA", 0]} ] }, then: "EMANDAMENTO" }
],
"default": "--matriculado--"
}
}
},
{ case: {"$eq": ["$ID_STATUSMATRICULA", 1]},
then: {
"$switch" : {
"branches": [
{ case: {"$and": [ {"$eq": ["$ID_STATUSMATRICULA", 1]},
{"$in": ["$ID_STATUSAPROVEITAMENTO", [1] ]} ] }, then: "APROVADO" },
{ case: {"$and": [ {"$eq": ["$ID_STATUSMATRICULA", 1]},
{"$in": ["$ID_STATUSAPROVEITAMENTO", [2] ]} ] }, then: "REPROVADO" },
{ case: {"$and": [{"$eq": ["$ID_STATUSMATRICULA", 1]},
{"$in": ["$ID_STATUSAPROVEITAMENTO", [0] ]} ] }, then: "PENDENTE" },
{ case: {"$and": [ {"$eq": ["$ID_STATUSMATRICULA", 1]},
{"$in": ["$ID_STATUSAPROVEITAMENTO", [1,2] ]} ] }, then: "CONCLUIDO" }
],
"default": "--concluido--"
}
}
}
],
"default": "--indefinida--"
}
}
The $and part inside the case statements I can build like this:
List<Document> docs = new ArrayList<>();
docs.add( new Document("$eq", asList("$NR_ANDAMENTO", 0)) );
docs.add( new Document("$eq", asList("$ID_STATUSMATRICULA", 1)) );
Document doc = new Document("$and", docs);
but I can't find a way to write the $switch / branches[] / case ... structure.
Does anybody have an example like this, or some idea of how to write it?
Thanks
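Since org.bson.Document implements Map<String, Object>, the $switch tree is just nested documents and lists. Here is a sketch of the shape for the first "NAOINICIADO" branch, using plain JDK maps via a small doc() helper standing in for new Document(...).append(...); this is an illustration of the nesting, not code tested against a live server:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class SwitchSketch {

    // Stand-in for new Document(k1, v1).append(k2, v2)...;
    // org.bson.Document implements Map<String, Object>, so the
    // nesting with the real driver class is identical.
    static Map<String, Object> doc(Object... kv) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            d.put((String) kv[i], kv[i + 1]);
        }
        return d;
    }

    // Builds the outer $switch with a single branch, mirroring the
    // "NAOINICIADO" case from the question; the remaining branches
    // follow the same pattern and go into the same "branches" list.
    static Map<String, Object> build() {
        Map<String, Object> naoIniciado = doc(
            "case", doc("$and", Arrays.asList(
                doc("$eq", Arrays.asList("$NR_ANDAMENTO", 0)),
                doc("$eq", Arrays.asList("$ID_STATUSMATRICULA", 0)))),
            "then", "NAOINICIADO");

        return doc("$switch", doc(
            "branches", Arrays.asList(naoIniciado),
            "default", "--indefinida--"));
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```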
I would like to write a JSON reader for JSON such as:
{
"headers": [
{
"id": "time:monthly",
"type": "a"
},
{
"id": "Value1",
"type": "b"
},
{
"id": "Value2",
"type": "b"
}
],
"rows": [
[
"2013-01",
4,
5
],
[
"2013-02",
3,
6
]
]
}
I know (thanks to the headers) that in each element of rows the first element is of type a, and the second and third are of type b. My goal is to create a row object (List[a], List[b]); the number of elements of type a and b varies, which is why I use List.
My question is: how can I parse rows, i.e. how can I read a JSON array containing different types of objects without an id?
I'd be tempted to set up the model with case classes and mix the Play framework macro-based JSON readers with a custom one for your rows, like this:
import play.api.libs.json._
case class Header(id: String, `type`: String)
case class Row(a: String, b: Int, c: Int)
case class Data(headers: Seq[Header], rows: Seq[Row])
object RowReads extends Reads[Row] {
def reads(js: JsValue) = js match {
case JsArray(Seq(a,b,c)) =>
(a,b,c) match {
case (a: JsString, b: JsNumber, c: JsNumber) =>
JsSuccess(Row(a.value,b.value.toInt,c.value.toInt))
case _ => JsError("nope")
}
case _ => JsError("nope")
}
}
object Formats {
implicit val headerReads = Json.reads[Header]
implicit val rowReads = RowReads
implicit val dataReads = Json.reads[Data]
def readIt(js: JsValue) = {
Json.fromJson[Data](js: JsValue)
}
}
For more details:
https://playframework.com/documentation/2.4.x/ScalaJson
For testing purposes I need to override the equals method:
def any = [equals: { true }] as String
any == 'should be true'
// false
More details about the problem:
class EmployeeEndpointSpec extends RestSpecification {
void "test employee" () {
when:
get "/v1/employee", parameters
then:
expectedStatus.equals(response.statusCode)
expectedJson.equals(response.json)
where:
parameters << [
[:],
[id: 824633720833, style: "small"]
]
expectedStatus << [
HttpStatus.BAD_REQUEST,
HttpStatus.OK
]
expectedJson << [
[errorCode: "badRequest"],
[
id: 824633720833,
name: "Jimmy",
email: "jimmy@fakemail.com",
dateCreated:"2015-01-01T01:01:00.000", // this value should be ignored
lastUpdated: "2015-01-01T01:01:00.000" // and this
]
]
}
}
lastUpdated and dateCreated may change over time, and I need to ignore them somehow.
If there's no need to compare the mentioned fields, remove them:
class EmployeeEndpointSpec extends RestSpecification {
void "test employee" () {
when:
get "/v1/employee", parameters
then:
expectedStatus.equals(response.statusCode)
def json = response.json
json.remove('dateCreated')
json.remove('lastUpdated')
expectedJson.equals(json)
where:
parameters << [
[:],
[id: 824633720833, style: "small"]
]
expectedStatus << [
HttpStatus.BAD_REQUEST,
HttpStatus.OK
]
expectedJson << [
[errorCode: "badRequest"],
[
id: 824633720833,
name: "Jimmy",
email: "jimmy@fakemail.com"
]
]
}
}
I'd also separate the negative and positive test scenarios.
You can also test keySet() separately from testing the key values, instead of comparing the whole map. This is the way I'd do it:
then:
def json = response.json
json.id == 824633720833
json.name == "Jimmy"
json.email == "jimmy@fakemail.com"
json.dateCreated.matches('<PATTERN>')
json.lastUpdated.matches('<PATTERN>')
In case you don't like the last two lines, they can be replaced with:
json.keySet().containsAll(['lastUpdated', 'dateCreated'])
The answer is:
String.metaClass.equals = { Object _ -> true }
I couldn't convert the mongodb aggregate operation below to a Spring Data AggregationOperation. I am using Spring Data MongoDB version 1.3.2.
db.ads.aggregate({ $group: {
    _id: "$adId",
    req: { $sum: 1 },
    imp: { $sum: { $cond: [ { $eq: [ "$imped", true ] }, 1, 0 ] } },
    click: { $sum: { $cond: [ { $eq: [ "$clked", true ] }, 1, 0 ] } },
    bid: { $sum: { $cond: [ { $eq: [ "$clked", true ] }, "$bid", 0 ] } }
} });
I stopped here:
AggregationOperation group = Aggregation.group("adId").count().as("req").sum("imped").as("imp");
I would appreciate any help, thanks.
Currently there is no support for using $cmp/$eq/$ne in group or project aggregations. It would be a good feature to have. It would also be helpful to improve some of the documentation/examples for the Criteria features.
Please vote for it here: https://jira.springsource.org/browse/DATAMONGO-784