Prometheus query by label with range vectors - java

I'm defining a lot of counters in my app (using Java Micrometer). In order to trigger alerts, I tag the counters I want to monitor with "error":"alert", so a query like {error="alert"} will return multiple series:
error_counter_component1{error="alert", label2="random"}
error_counter_component2{error="alert", label2="random2"}
error_counter_component3{error="none", label2="random3"}
I don't control the names of the counters; I can only add labels to the counters I want to use in my alert. The alert that I want to have is: if the counters labeled with error="alert" increase by more than 3 in one hour. So I could use this kind of query: increase({error="alert"}[1h]) > 3, but I get the following error in Prometheus: Error executing query: vector cannot contain metrics with the same labelset
Is there a way to merge these vectors, or should I include some kind of tag in the name of the counter? Or should I have a single counter for errors, with labels specifying the source, something like this:
errors_counter{source="component1", use_in_alerts="yes"}
errors_counter{source="component2", use_in_alerts="yes"}
errors_counter{source="component3", use_in_alerts="no"}

The version with the source="componentX" label is a much better fit for the Prometheus data model. This assumes errors_counter really is a single metric and that, apart from the source label value, its series carry the same labels (for example because they are emitted by the same library or framework).
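As a minimal sketch of that model with Java Micrometer (SimpleMeterRegistry stands in for whatever registry your app actually injects):
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class ErrorMetrics {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry(); // in an app, the injected registry
        // One metric name; the component is a label value, not part of the name.
        Counter componentErrors = Counter.builder("errors_counter")
                .tag("source", "component1")
                .register(registry);
        componentErrors.increment();
    }
}
Because every series now shares the same metric name, a query like increase(errors_counter[1h]) > 3 no longer mixes different metrics in one result vector.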
Adding something like a use_in_alerts label is not a great solution: such a label does not identify a time series, it only annotates it.
I'd keep the list of components to alert on wherever your alerting queries are constructed, and dynamically generate separate alerting rules (without adding such a label to the raw data).
Another solution is to have a separate pseudo-metric that is only used to provide metadata about components, like:
component_alert_on{source="component2"} 1
and combine it in the alerting rule so that you only alert on the components you need. It can be generated in any possible way, but one possibility is to add it in a static recording rule. The downside is that it complicates the alerting query somewhat.
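For illustration only, the combined alerting expression might then look something like this (assuming the pseudo-metric shares the source label with the counters):
increase(errors_counter[1h]) > 3 and on(source) component_alert_on == 1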
But of course the use_in_alerts label will also probably work (at least while you are only alerting on this metric).

Related

Grafana dashboard separating "boundedElastic" vs "parallel" for executor pool

A small question regarding how to build Grafana dashboards that separate "boundedElastic" from "parallel", please.
Currently, with a Spring WebFlux app, I get very useful Reactor Core metrics out of the box:
executor_pool_size_threads
executor_pool_core_threads
executor_pool_max_threads
etc
The Reactor team even provides default dashboards so we can have visuals on the state:
https://github.com/reactor/reactor-monitoring-demo
Unfortunately, the current dashboards mix "boundedElastic" and "parallel". I am trying to build the same dashboards, but with "boundedElastic" and "parallel" separated.
I tried:
sum(executor_pool_size_threads{_ws_="my_workspace"}) by (reactor_scheduler_id, boundedElastic)
But no luck so far.
May I ask what the correct way to do this is, please?
Thank you
In the demo project, the metrics are stored in Prometheus and are queried using PromQL. Each metric can have several labels, and each label can have several values. Metrics can be selected by labels and values, e.g. my_metric{first_label="first_value", second_label="another_value"} selects my_metric where both labels match the corresponding values.
So in your example the metric executor_pool_size_threads has the label reactor_scheduler_id. However, the values contain more information than just the scheduler name. On my machine (because of the default pool sizes) the values are: parallel(8,"parallel") and boundedElastic("boundedElastic",maxThreads=80,maxTaskQueuedPerThread=100000,ttl=60s). So a regex match with the =~ operator is useful here.
PromQL query only for parallel:
sum (executor_pool_size_threads{reactor_scheduler_id=~"parallel.*"}) by (reactor_scheduler_id)
PromQL query only for boundedElastic:
sum (executor_pool_size_threads{reactor_scheduler_id=~"boundedElastic.*"}) by (reactor_scheduler_id)

Spring Boot: Can a Drools rule engine file (.drl) be updated through the frontend based on user inputs?

I have followed a tutorial on Drools and have implemented the same. I am trying to understand the means by which I can alter the values in the .drl file through the front end. Below is the Drools file I have used, named order.drl.
package KieRule;
import com.example.demo.Order;
rule "HDFC"
when
orderObject : Order(cardType=="HDFC" && price>10000);
then
orderObject.setDiscount(10);
end;
rule "ICICI"
when
orderObject : Order(cardType=="ICICI" && price>15000);
then
orderObject.setDiscount(8);
end;
rule "DBS"
when
orderObject : Order(cardType=="DBS" && price>15000);
then
orderObject.setDiscount(15);
end;
Basically the logic is to establish rules on the percentage of discount to be calculated on each card type for a commodity purchased.
So a POST request like the one below to http://localhost:8080/orders
{
"name":"Mobile",
"cardType":"HDFC",
"price" : 11000
}
gives the output below, where the discount has been determined:
{
"name": "Mobile",
"cardType": "HDFC",
"discount": 10,
"price": 11000
}
I created the Spring Starter Project and below is my folder structure
Will it be feasible for me to have the values presently hardcoded in the .drl file (e.g. "HDFC" and price>10000) captured from the JSP or ReactJS front end and updated? I would prefer the admin users of the application to be able to alter the rules as and when they require. I have seen examples where $ notations are used in .drl files but was not able to grasp them fully. Can this be achieved in Java?
Thanks
SM
Assuming that what you're trying to do is to keep the basic structure the same and simply vary the constraints, then the two simplest solutions would be to use rule templates or to pass the constraints themselves to the rules alongside your inputs.
Passing in constraints
This is the simplest solution. Basically the idea is that the values for your constraints are first-class citizens, alongside your rule inputs.
Looking at your rules that you presented, the two pieces of data that make up your constraints are the card type and the minimum price. These each have a single consequence. We can model that simply:
class OrderConstraints {
    private String cardType;
    private Integer minimumPrice;
    private Integer discount;
    // Getters (needed so the rule can read these properties)
    public String getCardType() { return cardType; }
    public Integer getMinimumPrice() { return minimumPrice; }
    public Integer getDiscount() { return discount; }
}
You would pass these into the rules in objects that look like this:
{
"cardType": "HDFC",
"minimumPrice": 10000,
"discount": 10
},
{
"cardType": "ICICI",
"minimumPrice": 15000,
"discount": 8
},
{
"cardType": "DBS",
"minimumPrice": 15000,
"discount": 15
}
Now all of your use cases can be handled in a single rule:
rule "Apply order discount"
when
OrderConstraints( $cardType: cardType, $minimumPrice: minimumPrice, $discount: discount)
$order: Order( cardType == $cardType, price > $minimumPrice )
then
$order.setDiscount($discount);
end
(Side note: I cleaned up your syntax. You had a lot of unnecessary semi-colons and oddly located white space in your original rules.)
The workflow would basically be as follows:
User creates constraints in UI / front end, specifying required information (card type, minimum price, discount).
User's constraints are sent to the server and saved to your persistence layer (database, etc.)
When a new query is made, constraints are read out of the persistence layer and passed into the rules along with the rule inputs.
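A sketch of that last step, assuming a KieContainer is configured the way the tutorial sets it up (how the constraints list is loaded from your database is up to you):
import java.util.List;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class DiscountService {
    private final KieContainer kieContainer; // configured as in the tutorial

    public DiscountService(KieContainer kieContainer) {
        this.kieContainer = kieContainer;
    }

    public Order applyDiscount(Order order, List<OrderConstraints> constraints) {
        KieSession kieSession = kieContainer.newKieSession();
        try {
            // Insert the constraints read from the persistence layer...
            constraints.forEach(kieSession::insert);
            // ...alongside the incoming order, then let the single rule above fire.
            kieSession.insert(order);
            kieSession.fireAllRules();
            return order;
        } finally {
            kieSession.dispose();
        }
    }
}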
Rule templates
The second solution is to use rule templates (the link is to the Drools documentation). The basic idea is that you provide a table of data and a DRL template, and the Drools framework will apply the data to the template and generate the DRLs for you. This is useful when you have very repetitive rules such as yours, where you're basically applying the same rule with various different constraints.
Similar to the other scenario, your workflow would be like this:
User creates constraints in UI / front end, specifying required information (card type, minimum price, discount.)
User's constraints are sent to the server.
Server reformats the request into tabular form (instead of JSON or whatever the original format was).
Server uses the data table (step 3) with the template to generate rules.
Your template might look something like this, assuming columns labelled "cardType", "minPrice", and "discount":
template header
cardType
minPrice
discount

package com.example.template;
import com.example.Order;
template "orderdiscounts"
rule "Apply order discount for #{cardType}"
when
$order: Order( cardType == "#{cardType}",
price > #{minPrice} )
then
$order.setDiscount(#{discount});
end
end template
The format is pretty straightforward. First comes the header, where we define the columns in order; the first blank line indicates the end of the header. The package declaration and imports come next, because those are static for the file. Then comes the template itself. The column values are interpolated using the #{column name} pattern; note that you need to wrap this in quote marks for a string.
The Drools documentation is very good so I'm not going to go overly into detail, but you should be able to get the gist of this.
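For step 4, a minimal sketch using ObjectDataCompiler from the drools-templates module; note that the template's header names must match the constraint bean's property names (so with the template above, the bean would need a minPrice property, or the column renamed to minimumPrice), and the resource path here is a placeholder:
import java.io.InputStream;
import java.util.List;
import org.drools.template.ObjectDataCompiler;

public class OrderRuleGenerator {
    // Expands the template with one row per constraint object and returns plain DRL,
    // which can then be compiled and loaded like any other rule source.
    public String generateDrl(List<OrderConstraints> rows) {
        InputStream template = getClass().getResourceAsStream("/templates/orderDiscount.drt");
        return new ObjectDataCompiler().compile(rows, template);
    }
}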
Design considerations
Since you're talking about a React front-end, I'm going to assume you're building a modern web application. When you implement your solution, keep in mind the problems of persistence and data integrity.
If you scale your backend application into multiple instances fronted by a load balancer, you'll want to make sure that the constraints the user applies are propagated to all instances. Further, if you apply changes, they need to become visible and propagate in real time -- you can't have one node or cluster using stale constraints or values.
While on its face the rule templates seem like they're the perfect built-in solution for this problem, it's not necessarily the case. Rule templates are designed around data tables stored in flat files -- by definition not a very modern or distributed approach. If your user updates the constraints on Node A, and the data table on Node A is updated, you need to make sure that the same data table file propagates to all of the other Nodes. Further if you spin up a new Node, you'll need to design a mechanism by which it is able to get the "current" data table. And then what if something catastrophic happens and you lose all your nodes?
The "pass constraints into memory" solution is old school, but it has the benefit of being backed by a traditional persistence layer. If you use a distributed data source of some kind, this can be your source of truth. As long as all of your application instances query your constraints out of the database whenever they file the rules (or with some logical caching layer), you won't need to worry about your instances getting out of sync. And if you lose all your nodes, your constraints will be in your data source for when your new application instances are spun up and need to use it. The downside is, of course, that you do need to do these database queries or cache reads, which potentially adds latency. Then you need to pass this data into the rules, which is going to increase your memory usage and potentially CPU (exactly how much depends on how much data you're passing in).
I've linked you to the Drools documentation and I strongly suggest reading it. It's long, but quite worth it. It's probably my most-accessed bookmark, even though I don't work with Drools on a daily basis anymore. There are other solutions available, as documented there -- for example, your user request could generate rules, package them in a kjar, and publish them to a Maven repository which all of your instances could then download and use -- but the right solution for your use case is something you'll need to decide for yourself based on your requirements.

Apache OpenNLP maxent: proper method to handle 'probability distribution' labels?

Apache OpenNLP maxent: is it possible to set a 'probability distribution' label?
I have read football.dat, gameLocation.dat, and realTeam.data and tried CreateModel.java and Predict.java in the 'sports' package. The prediction results are a probability distribution over classes, like lose[0.3686] win[0.4416] tie[0.1899], while the labels of the training examples at the end of each line are all single classes, like win.
Is it possible to set probability distribution labels like lose[0.3686] win[0.4416] tie[0.1899] in the training data? If not, beyond just taking the max-probability class as the label, what are the proper ways to handle 'probability distribution' labels? For example, is duplicating examples with class labels in proportion to their probabilities a principled approach, or are there other systematic methods?

JasperReport & Filling Components

JasperReports newbie here. I have read the tutorial and the quick reference and read up on a number of articles regarding JR, and have now been playing around with the iReport report designer for a day or so.
I think the last major set of concepts I am choking on has to do with the relationship between chart components and their data. Although it is easy to find definitions for each of these, there seems to be very little practical documentation showing how they relate to one another in a meaningful application:
Report Fields
Report Parameters
Report Variables
Datasets
By playing around with iReport it seems that Fields, Parameters and Variables can exist at the report-level, as well as being placed inside of Datasets. But my understanding of when something is a Field vs. Parameter vs. Variable is very fuzzy, and my understanding of how they relate to Datasets is also very shaky.
Using Datasets as a slight segue, I'm having a tough time seeing the "forest through the trees" with how chart components (such as pie charts, tables, etc.) get "fed" or "injected with" their data.
So... I thought of an example that, if answered, would tie everything together for me (I believe!). Let's say I had two components, a text field and a pie chart, and I want the pie chart to appear below the text field like so:
The author of this report is: <value supplied by the data source>
<pie chart here>
So, at "fill time" (I think I'm using that correctly...), the report will be "filled" with the name of the report's author (a String), as well as a pie chart comprised of 2 pie slices: 1 slice with a value of 75 with a label/key of "Eloi" and a 2nd slice with a value of 25 and a label/key of "Morlocks". If I am not using the correct JR terminology here, what I am trying to achieve is a fill-time pie chart with two slices: an "Eloi" slice consuming 75% of the chart, and a "Morlocks" slice consuming 25% of the chart.
If someone can explain or give code (Java/JRXML) snippets for how to set this kind of chart up, I think it will help connect all the dots and help me understand how components get filled with data. Thanks in advance for any and all help!
Think of parameters as things that the end user supplies to the report at runtime. For example, you supply a StartDate and an EndDate that will get used in a query. The start date that you are interested in is something you know, it's not provided by the data source. (There are variations on this idea: maybe your application knows things about you based on your login, and it supplies these as parameters. But again, these are things known before the report is executed.)
Think of the fields as the data that comes back from your data source. This is the stuff that you want to learn. For example, you run a query like this:
select political_group, gullibility from mytable where the_date > $P{StartDate}
Presumably you would input a value of '802701' for the StartDate and then get results like this:
$F{political_group} $F{gullibility}
Eloi 75
Morlock 25
Think of variables as a way to manipulate this raw data. They can calculate totals and subtotals, as well as perform line-by-line calculations like string manipulation or more complex things like running totals.
Take a look at this pie chart report I posted a couple of years ago: http://mdahlman.wordpress.com/2009/05/02/limiting-pie-pieces/
It has the main ideas you want. I put the title directly into the chart rather than as a separate field. That would be a very simple change. Likewise, you could change the title to "The author of this report is: $P{TheAuthor}" and then pass that param to the report at runtime.
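For illustration, passing such a parameter at fill time might look like this (the report path, parameter name, and empty data source are placeholders for your real setup):
import java.util.HashMap;
import java.util.Map;
import net.sf.jasperreports.engine.JREmptyDataSource;
import net.sf.jasperreports.engine.JRException;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;

public class ReportFiller {
    public JasperPrint fill() throws JRException {
        Map<String, Object> params = new HashMap<>();
        params.put("TheAuthor", "H. G. Wells"); // read in the title as $P{TheAuthor}
        // A real report would use a JDBC connection or a custom data source
        // so the pie chart's fields ($F{...}) have rows to consume.
        return JasperFillManager.fillReport("report.jasper", params, new JREmptyDataSource());
    }
}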
Using a field in the report title rather than a parameter is possible also. But typically it doesn't make sense. The fields will have many values in the data set. Which one belongs in the title? In the case above "Eloi" and "Morlock" are fields, and they really don't make sense in the report title. (You can imagine special cases, of course. You could concatenate all of the political_group values into a single string and put that in the report title. But in an overwhelming majority of cases this won't be reasonable.)
Good luck.

User matching with current data

I have a database full of two different types of users (Mentors and Mentees), whereby I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both go in and change items in their profile at any point in time.
Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By itself, this doesn't take that long, but when Mahout processes the data it seems to take a very long time (14 minutes for 3000 Mentors and 3000 Mentees). After processing, matching takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), although looking at the code suggests the message should only be output every 10,000 users.
I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.
Is there a better way to be doing this? The product owner needs the data to be current for every search.
(I'm the author.)
You shouldn't need to ask it to reload the data every time; why are you doing that?
14 minutes sounds way, way too long to load such a small amount of data; something's wrong. You might follow up with more info at user@mahout.apache.org.
You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.
I would advise you against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? I might suggest LogLikelihoodSimilarity if not.
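A minimal sketch of that suggestion, assuming the DataModel is built once at startup and reused across searches (the neighborhood size of 10 is arbitrary):
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MentorMatcher {
    private final UserBasedRecommender recommender;

    public MentorMatcher(DataModel model) throws TasteException {
        // No PreferenceInferrer: LogLikelihoodSimilarity ignores preference values,
        // which suits profile-match data better than Pearson correlation on ratings.
        UserSimilarity similarity = new LogLikelihoodSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        this.recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
    }

    public long[] closestMentors(long menteeId, int howMany) throws TasteException {
        return recommender.mostSimilarUserIDs(menteeId, howMany);
    }
}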
