Grafana dashboard separating "boundedElastic" vs "parallel" for executor pool - java

Small question regarding how to build Grafana dashboards that separate "boundedElastic" vs "parallel", please.
Currently, with a Spring WebFlux app, I get very useful Reactor Core metrics out of the box:
executor_pool_size_threads
executor_pool_core_threads
executor_pool_max_threads
etc
The Reactor team even provides default dashboards so we can visualize the state:
https://github.com/reactor/reactor-monitoring-demo
Unfortunately, the current dashboards mix "boundedElastic" and "parallel". I am trying to build the same dashboards, but with "boundedElastic" and "parallel" separated.
I tried:
sum(executor_pool_size_threads{_ws_="my_workspace"}) by (reactor_scheduler_id, boundedElastic)
But no luck so far.
May I ask what is the correct way to do it please?
Thank you

In the demo project, the metrics are stored in Prometheus and are queried using PromQL. Each metric can have several labels, and each label can have several values. Metrics can be selected by labels and values, e.g. my_metric{first_label="first_value", second_label="another_value"} selects my_metric where both labels match the corresponding values.
So in your example, the metric executor_pool_size_threads has the label reactor_scheduler_id. However, the values contain more information than just the scheduler name. On my machine (because of the default pool size) the values are: parallel(8,"parallel") and boundedElastic("boundedElastic",maxThreads=80,maxTaskQueuedPerThread=100000,ttl=60s). So a regex match with the =~ operator is useful here for matching the values.
PromQL query only for parallel:
sum (executor_pool_size_threads{reactor_scheduler_id=~"parallel.*"}) by (reactor_scheduler_id)
PromQL query only for boundedElastic:
sum (executor_pool_size_threads{reactor_scheduler_id=~"boundedElastic.*"}) by (reactor_scheduler_id)
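If you would rather keep both schedulers on one panel as separate series with a clean label, you could also strip the extra detail from reactor_scheduler_id with label_replace (the new scheduler label name here is just an example):
sum by (scheduler) (label_replace(executor_pool_size_threads, "scheduler", "$1", "reactor_scheduler_id", "(boundedElastic|parallel).*"))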

Related

Prometheus Java Client : Export String based Metrics

I'm currently trying to write an exporter for Minecraft to display some metrics in our Grafana dashboard. While most metrics work fine with the metric types Counter and Gauge, I couldn't find any documentation on how to export strings as metrics. I need those to export location data, so that we can get an overview of where our players are from and focus localization on those regions. I wasn't able to find anything about that in the official documentation, nor was I able to find anything in the GitHub repository that could help me.
Can anyone help me with that?
With kind regards
thelooter
Metrics are always numeric, but you can use labels to export string values. This is typically used to export build or version information, e.g.
version_info{version="1.23", builtOn="Windows", built_by="myUserName", gitTag="version_1.0"} 1
so you can show in Grafana which version is currently running.
But (!!!) Prometheus is not designed to handle a large number of label combinations: it creates a new file for every unique combination of label values. This would mean you create a file per player if you had one metric per player (and you would still need to calculate the number of players per region).
What you could do is define regions in your software and export a gauge for every region, representing the number of players logged in from that region:
player_count{region="Europe"} 234
player_count{region="North America"} 567
...
If you don't want to hardcode the regions in your software, you should export the locations of the players into a database and do the statistics later based on the raw data.
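A minimal sketch of that approach, using only the JDK (a real exporter would typically register a Gauge with a region label via the Prometheus Java client; the region names and counts here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegionMetrics {
    // Aggregate raw player locations into per-region counts, then render
    // them in the Prometheus text exposition format: one gauge, one series
    // per region, instead of one series per player.
    static String render(Map<String, Integer> playersByRegion) {
        StringBuilder sb = new StringBuilder();
        sb.append("# TYPE player_count gauge\n");
        playersByRegion.forEach((region, count) ->
                sb.append("player_count{region=\"").append(region)
                  .append("\"} ").append(count).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> players = new LinkedHashMap<>();
        players.put("Europe", 234);
        players.put("North America", 567);
        System.out.print(render(players));
    }
}
```

This keeps the label cardinality bounded by the number of regions you define, not the number of players.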

Prometheus query by label with range vectors

I'm defining a lot of counters in my app (using Java Micrometer), and in order to trigger alerts I tag the counters I want to monitor with "error":"alert", so a query like {error="alert"} returns multiple range vectors:
error_counter_component1{error="alert", label2="random"}
error_counter_component2{error="alert", label2="random2"}
error_counter_component3{error="none", label2="random3"}
I don't control the names of the counters; I can only add the label to the counters I want to use in my alert. The alert I want is to fire if the counters labeled with error="alert" increase by more than 3 in one hour, so I could use a query like increase({error="alert"}[1h]) > 3, but I get the following error in Prometheus: Error executing query: vector cannot contain metrics with the same labelset
Is there a way to merge two range vectors, or should I include some kind of tag in the name of the counter? Or should I have a single counter for errors, with tags specifying the source, something like this:
errors_counter{source="component1", use_in_alerts="yes"}
errors_counter{source="component2", use_in_alerts="yes"}
errors_counter{source="component3", use_in_alerts="no"}
The version with the source="componentX" label fits the Prometheus data model much better. This assumes errors_counter is really one metric and, apart from the source label value, has the same labels etc. (for example, it is emitted by the same library or framework).
Adding a label like use_in_alerts is not a great solution; such a label does not identify a time series.
I'd say put the list of components to alert on wherever your alerting queries are constructed, and dynamically create separate alerting rules (without adding such a label to the raw data).
Another solution is to have a separate pseudo metric that will only be used to provide metadata about components, like:
component_alert_on{source="component2"} 1
and combine it in the alerting rule to alert only on the components you need. It can be generated in any way; one possibility is to add it in a static recording rule. The downside is that this somewhat complicates the alerting query.
But of course the use_in_alerts label will probably also work (at least while you are alerting only on this metric).
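With the pseudo-metric approach, the alerting expression could be joined on the source label, for example (the exact query shape is a sketch; adjust metric and label names to your setup):
increase(errors_counter[1h]) > 3 and on(source) component_alert_on == 1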

JSAT: Data wrangling / manipulating

After building a prototype in R (using dplyr), I need to build a model that is deployable to our Java-based server infrastructure. Right now, I'm using the JSAT machine-learning library.
What is the best way to wrangle data?
None of the collection-like types from the JSAT package (ClassificationDataSet, RegressionDataSet, DataSet) seems to support even basic tasks like:
Filtering out datapoints based on conditions
Splitting the dataset into two (different sized) datasets, e.g. training and testing dataset
Mutating or adding new rows based on the values of other rows
1) This isn't currently supported in JSAT; JSAT is a library of machine-learning algorithms, and dataframe-like operations are not a goal of the project in any way. I'm not sure why you would want to be filtering out data in a production system; there is no reason you couldn't do that in a better-suited tool and then export the data to have JSAT build the model.
2) All DataSet objects inherit a randomSplit method that can do what you have asked for. An example of that is here.
3) See 1; I'm not sure what the use case is for adding "new rows based on the values of other rows". All the DataSet classes support adding new data points; you just have to create them yourself.
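For illustration, the shuffle-and-cut idea behind a train/test split can be sketched on a plain List (this is the general technique, not JSAT's implementation; class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSketch {
    // Shuffle a copy of the data with a seeded Random, then cut it at the
    // requested fraction: element 0 of the result is the training part,
    // element 1 the testing part.
    static <T> List<List<T>> randomSplit(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) points.add(i);
        List<List<Integer>> split = randomSplit(points, 0.8, 42L);
        // prints "8 train / 2 test"
        System.out.println(split.get(0).size() + " train / " + split.get(1).size() + " test");
    }
}
```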
source: I'm the author of JSAT

Non associative aggregations in apache spark streaming

I am trying to build a utility layer in Java over Apache Spark Streaming where users can aggregate data over a period of time (using window functions in Spark), but it seems all the available options require associative functions (taking two arguments). However, some fairly common use cases, like averaging temperature sensor values over an hour, don't seem possible with the Spark API.
Is there any alternative for achieving this kind of functionality? I am thinking of implementing repeated interactive queries to achieve it, but that would be too slow.
Statistical aggregates (average, variance) can actually be computed associatively and online if you carry the right partial state. See here for a good numerical way of doing this.
As for the number of arguments, remember that the type of the arguments is your choice: you can pack several values into one argument using a Tuple.
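The count-and-sum trick can be sketched in plain Java (class and method names are illustrative; in Spark Streaming the merge function would be the two-argument reducer passed to something like reduceByWindow):

```java
import java.util.Arrays;
import java.util.List;

public class RunningAverage {
    // A partial aggregate: how many values were seen and their sum.
    // Averaging itself is not associative, but (count, sum) pairs are.
    static final class CountSum {
        final long count;
        final double sum;
        CountSum(long count, double sum) { this.count = count; this.sum = sum; }
        double average() { return sum / count; }
    }

    // Associative, two-argument merge: combining partials in any grouping
    // yields the same result, which is what windowed reduction requires.
    static CountSum merge(CountSum a, CountSum b) {
        return new CountSum(a.count + b.count, a.sum + b.sum);
    }

    public static void main(String[] args) {
        List<Double> readings = Arrays.asList(20.0, 22.0, 21.0, 23.0);
        CountSum total = readings.stream()
                .map(v -> new CountSum(1, v))
                .reduce(new CountSum(0, 0.0), RunningAverage::merge);
        System.out.println(total.average()); // prints 21.5
    }
}
```

The final average is only computed once, from the merged (count, sum) state, after the window's reduction is done.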
Finally, you can also use stateful information with something like updateStateByKey.

What is evaluation of a cluster in WEKA?

What do we mean when we say that we are evaluating the clusters in the WEKA framework? Clustering is an unsupervised approach to grouping objects, so what do we mean when we say we want to evaluate the result? Also, when we say that we are evaluating the clusters on the training data itself, what does that mean?
Thanks
Abhishek S
As written on this page:
Evaluation
The way Weka evaluates the clusterings depends on the cluster mode you select. Four different cluster modes are available (as buttons in the Cluster mode panel):
Use training set (default). After generating the clustering Weka classifies the training instances into clusters according to the cluster representation and computes the percentage of instances falling in each cluster. For example, the above clustering produced by k-means shows 43% (6 instances) in cluster 0 and 57% (8 instances) in cluster 1.
In Supplied test set or Percentage split Weka can evaluate clusterings on separate test data if the cluster representation is probabilistic (e.g. for EM).
Classes to clusters evaluation. In this mode Weka first ignores the class attribute and generates the clustering. Then during the test phase it assigns classes to the clusters, based on the majority value of the class attribute within each cluster. Then it computes the classification error, based on this assignment and also shows the corresponding confusion matrix. An example of this for k-means is shown below.
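The classes-to-clusters procedure described above can be sketched as follows (a hypothetical helper for illustration, not Weka's implementation): take the majority class within each cluster, then count how many instances disagree with their cluster's majority class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ClassesToClusters {
    // cluster[i] is the cluster assigned to instance i, label[i] its true class.
    static double errorRate(int[] cluster, String[] label) {
        // Count class frequencies inside each cluster.
        Map<Integer, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < cluster.length; i++) {
            counts.computeIfAbsent(cluster[i], k -> new HashMap<>())
                  .merge(label[i], 1, Integer::sum);
        }
        // Assign each cluster its majority class.
        Map<Integer, String> majority = new HashMap<>();
        counts.forEach((c, m) -> majority.put(c,
                Collections.max(m.entrySet(), Map.Entry.comparingByValue()).getKey()));
        // The classification error is the fraction of mismatches.
        int wrong = 0;
        for (int i = 0; i < cluster.length; i++) {
            if (!majority.get(cluster[i]).equals(label[i])) wrong++;
        }
        return (double) wrong / cluster.length;
    }

    public static void main(String[] args) {
        int[] cluster = {0, 0, 0, 1, 1, 1};
        String[] label = {"a", "a", "b", "b", "b", "a"};
        // 2 of 6 instances disagree with their cluster's majority class.
        System.out.println(errorRate(cluster, label));
    }
}
```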
