Kubernetes, Java and Grafana - How to display only the running containers? - java

I'm working on a setup where we run our Java services in docker containers hosted on a kubernetes platform.
I want to create a dashboard in Grafana where I can monitor the heap usage of all instances of a service. Writing metrics to statsd with the pattern:
<servicename>.<containerid>.<processid>.heapspace works well; I can see all heap usages in my chart.
After a redeployment the container names change, so new values are added to the existing graph. My problem is that the old lines continue to be drawn at the position of the last value received, even though those containers are already dead.
Is there any simple solution for this in Grafana? Can I just say: if no data has been received for a metric for more than X seconds, stop drawing its line?
Update:
Upgrading to the newest Grafana version and setting "Null value" to "null" under "Stacking & Null value" didn't work.
Maybe it's a problem with statsd?
I'm sending data to statsd in form of:
felix.javaclient.machine<number>-<pid>.heap:<heapvalue>|g
Is anything wrong with this?
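For context, I send the gauge as a plain UDP line; here is a minimal sketch of the sender (the host, port and the machine/pid parts of the metric name are placeholders):
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class HeapGaugeSender {
    public static void main(String[] args) throws Exception {
        long heapUsed = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        // felix.javaclient.machine<number>-<pid>.heap:<heapvalue>|g
        String line = "felix.javaclient.machine1-12345.heap:" + heapUsed + "|g";
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName("statsd.example.org"), 8125));
        }
    }
}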

This can happen for two reasons: because Grafana is using the "connected" setting for null values, and/or (as is the case here) because statsd keeps sending the previously seen value for the gauge when there are no updates in the current flush period.
Grafana Config
You'll want to make 2 adjustments to your graph config:
First, go to the "Display" tab and under "Stacking & Null value" change "Null value" to "null"; that will cause Grafana to stop showing the lines when there is no data for a series.
Second, if you're using a legend, go to the "Legend" tab and under "Hide series" check the "With only nulls" checkbox; that will cause items to be shown in the legend only if they have a non-null value during the graph period.
statsd Config
The statsd documentation for gauge metrics tells us:
If the gauge is not updated at the next flush, it will send the previous value. You can opt to send no metric at all for this gauge, by setting config.deleteGauges
So, the grafana changes alone aren't enough in this case, because the values in graphite aren't actually null (since statsd keeps sending the last reading). If you change the statsd config to have deleteGauges: true then statsd won't send anything and graphite will contain the null values we expect.
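For example, a minimal sketch of the relevant entry in the statsd config file (all other settings omitted):
{
  deleteGauges: true
}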
Graphite Note
As a side note, a setup like this will cause your data folder to grow continuously as you create new series each time a container is launched. You'll definitely want to look into removing old series after some period of inactivity to avoid filling up the disk. If you're using graphite with whisper that can be as simple as a cron task running find /var/lib/graphite/whisper/ -name '*.wsp' -mtime +30 -delete to remove whisper files that haven't been modified in the last 30 days.

To do this, I would use
maximumAbove(transformNull(felix.javaclient.*.heap, 0), 0)
The transformNull will take any datapoint that is currently null, or unreported for that instant in time, and turn it into a 0 value.
The maximumAbove will only display the series that have a maximum value above 0 for the selected time period.
Using maximumAbove, you can see all historical containers; if you wish to see only the currently running containers, use currentAbove instead.
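For example, the currentAbove variant of the same target (same metric path as above):
currentAbove(transformNull(felix.javaclient.*.heap, 0), 0)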

Related

Prometheus Java Client : Export String based Metrics

I'm currently trying to write an exporter for Minecraft to display some metrics in our Grafana dashboard. While most metrics are working fine with the metric types Counter and Gauge, I couldn't find any documentation on how to export strings as metrics. I need those to export location data, so that we can get an overview of where our players are from and focus localization on those regions. I wasn't able to find anything about that in the official documentation, nor was I able to find anything in the GitHub repository that could help me.
Can anyone help me with that?
With kind regards
thelooter
Metrics are always numeric, but you can use labels to export string values. This is typically used to export build or version information, e.g.:
version_info{version="1.23", builtOn="Windows", built_by="myUserName", gitTag="version_1.0"} 1
so you can show in Grafana which version is currently running.
But (!!!) Prometheus is not designed to handle a lot of label combinations: it creates a new time series, with a corresponding file on disk, for every unique label-value combination. This would mean creating a series per player if you had one metric per player (and you would still need to calculate the number of players per region).
What you could do is define regions in your software and export a gauge for every region representing the number of players logged in from that region:
player_count{region="Europe"} 234
player_count{region="North America"} 567
...
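A minimal sketch with the Prometheus Java simpleclient (the help text and the way you obtain the per-region counts are placeholders):
import io.prometheus.client.Gauge;

public class PlayerMetrics {

    // One gauge with a "region" label; every distinct label value becomes its own series.
    private static final Gauge PLAYER_COUNT = Gauge.build()
            .name("player_count")
            .help("Players currently online, by region.")
            .labelNames("region")
            .register();

    public static void updateCounts() {
        PLAYER_COUNT.labels("Europe").set(234);
        PLAYER_COUNT.labels("North America").set(567);
    }
    // Exposing the default registry over HTTP (e.g. via simpleclient_httpserver) is a separate step.
}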
If you don't want to hardcode the regions in your software, you should export the locations of the players into a database and do the statistics later based on the raw data.

Getting values from previous windows

I'm computing statistics (min, avg, etc.) on fixed windows of data. The data is streamed in as single points and are continuous (like temperature).
My current pipeline (simplified for this question) looks like:
read -> window -> compute stats (CombineFn) -> write
The issue with this is that each window's stats are incorrect since they don't have a baseline. By that, I mean that I want each window's statistics to include a single data point (the latest one) from the previous window's data.
One way to think about this is that each window's input PCollection should include the ones that would normally be in the window due to their timestamp, but also one extra point from the previous window's PCollection.
I'm unsure of how I should go about doing this. Here are some things I've thought of doing:
Duplicating the latest data point in every window with a modified timestamp such that it lands in the next window's timeframe
Similarly, create a PCollectionView singleton per window that includes a modified version of its latest data point, which will be consumed as a side input to be merged into the next window's input PCollection
One constraint is that if a window doesn't have any new data points, except for the one that was forwarded to it, it should re-forward that value to the next window.
It sounds like you may need to copy a value from one window into arbitrarily many future windows. The only way I know how to do this is via state and timers.
You could write a stateful DoFn that operates on globally windowed data, stores in its state the latest (by timestamp) element per window, and fires a timer at each window boundary to push this element into the subsequent window. (You could possibly leverage the Latest combine operation to get the latest element per window rather than doing it manually.) Flattening this with your original data and then windowing should give you the values you desire, as in the sketch below.
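A rough sketch of that idea with Beam's state and timer API (the window size, key and value types are assumptions; apply it to globally windowed, keyed data, then flatten its output with the original stream and window as before):
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;
import org.joda.time.Instant;

class CarryLatestIntoNextWindowFn extends DoFn<KV<String, Double>, KV<String, Double>> {

    private static final Duration WINDOW_SIZE = Duration.standardMinutes(1); // assumed fixed-window size

    @StateId("latest")
    private final StateSpec<ValueState<KV<String, Double>>> latestSpec = StateSpecs.value();

    @TimerId("carry")
    private final TimerSpec carrySpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

    @ProcessElement
    public void process(ProcessContext c,
                        @StateId("latest") ValueState<KV<String, Double>> latest,
                        @TimerId("carry") Timer carry) {
        latest.write(c.element());
        // Arm the timer for the end of the fixed window this element falls into
        // (windows assumed aligned to the epoch).
        long size = WINDOW_SIZE.getMillis();
        long windowEnd = (c.timestamp().getMillis() / size + 1) * size;
        carry.set(new Instant(windowEnd));
    }

    @OnTimer("carry")
    public void onCarry(OnTimerContext c,
                        @StateId("latest") ValueState<KV<String, Double>> latest,
                        @TimerId("carry") Timer carry) {
        KV<String, Double> last = latest.read();
        if (last != null) {
            // The output timestamp is the timer's firing time (the window boundary),
            // so the carried element lands in the next window. Re-arm the timer so
            // windows with no fresh data still receive the carried value.
            c.output(last);
            carry.set(c.timestamp().plus(WINDOW_SIZE));
        }
    }
}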

What's exactly is the use of 'withIngestionTimestamps()' in Hazelcast Jet Pipeline?

I'm running a pipeline that sources from a Kafka topic and sinks to an IMap. Every time I write one, I come across the methods withIngestionTimestamps() and withoutTimestamps() and wonder how they are useful. I understand it's all about the source adding time to the event. The question is: how do I get to use it? I don't see any method to fetch the timestamp from the event.
My IMap has a chance of getting filled with duplicate values. Could I make use of the withIngestionTimestamps() method to determine the latest record and discard the old one?
Jet uses the event timestamps to correctly apply windowing. It must decide which event belongs to which window and when the time has come to close a window and emit its aggregated result. The timestamps are present on the events as metadata and aren't exposed to the user.
However, if you want to apply your logic that refers to the wall-clock time, you can always call System.currentTimeMillis() to check it against the timestamp explicitly stored in the IMap value. That would be equivalent to using the processing time, which is quite similar to the ingestion time that Jet applies. Ingestion time is simply the processing time valid at the source vertex of the pipeline, so applying processing time at the sink vertex is just slightly different from that, and has the same practical properties.
Jet manages the event timestamp behind the scenes, it's visible only to processors. For example, the window aggregation will use the timestamp.
If you want to see the timestamp in the code, you have to include it in your item type. You have to go without timestamps from the source, add the ingestion timestamp using a map operator and let Jet know about it:
Pipeline p = Pipeline.create();
p.drawFrom(KafkaSources.kafka(...))
.withoutTimestamps()
.map(t -> tuple2(System.currentTimeMillis(), t))
.addTimestamps(Tuple2::f0, 2000)
.drainTo(Sinks.logger());
I used an allowedLag of 2000 ms. The reason for this is that the timestamps will be added in a vertex downstream of the vertex that assigned them. Stream merging can take place there, and the internal skew needs to be accounted for; for example, it should account for the longest expected GC pause or network delay. See the note on the addTimestamps method.

jawampa maximum websockt frame size?

Is there a maximum size for the arguments when publishing an event?
I use this code (java): wampClient.publish(token, response.toString());
response.toString() is a long JSON string in my case; it has about 70,000 characters. I have the suspicion that the event does not get published, because when I replace response.toString() with a short string, the event gets published as expected.
I don't know much about the internals of WAMP, and an initial debugging session into the code did not provide me with much insight. As I said above, I think that the long string is causing some problems.
Minimal running example: To get a minimum running example, please download the example java project from here: http://we.tl/a3kj3dzJ7N and import it into your IDE.
In the demo folder there are two .java-files: Client.java and Server.java
Run/Start both of them and a GUI should appear for each. Then do the following procedure (C = Client, S = Server):
C: hit start
S: hit start
C: hit publish
Depending on the size of the message, you will see different output in the console of your IDE. The size of the message can be changed in line 137 of Client.java via the size integer variable. As already explained above: if size is lower than 70000 (e.g. 60000), everything works as expected. The console output of Client.java is then as follows:
Open Client
Session1 status changed to Connecting
Session1 status changed to Connected
Publishing
Received event test.event with value 10000
However, if the integer variable size is changed to 70000 (or higher) the output is as follows:
Open Client
Session1 status changed to Connecting
Session1 status changed to Connected
Publishing
Completed event test.event
Session1 status changed to Disconnected
Session1 status changed to Connecting
Session1 status changed to Connected
As you can see, the Received event ... line is missing; hence, the event is not received. There is a Completed event test.event, but the data is obviously missing.
To sum up, running the example above shows that the event is not received properly when the size of the transmitted string is greater than 70000. This problem may be related to Netty, since it is used under the hood of jawampa. Any help is appreciated. Maybe it's just some small configuration that can fix this problem.
EDIT 1: I updated the question with a minimal running example which can be downloaded.
EDIT 2: I think I now know the root of the problem (totally not sure though, see EDIT 3). It is related to the allowed size of a string literal in Java. See: Size of Initialisation string in java
In the above example I can reproduce that: if the size variable is lower than 65535 characters it works, otherwise it doesn't. Is there a workaround for this?
EDIT 3 aka SOLUTION: As suggested by the developer (see here), the variable DEFAULT_MAX_FRAME_PAYLOAD_LENGTH in NettyWampConnectionConfig.java:8 should be changed to a higher value. Then everything works like a charm.
As suggested by the developer (see here), the variable DEFAULT_MAX_FRAME_PAYLOAD_LENGTH can be overwritten through the NettyWampConnectionConfig class, which you can provide to the NettyWampClientConnectorProvider class. The variable value should, obviously, be increased.
There is a bug in jawampa, because DEFAULT_MAX_FRAME_PAYLOAD_LENGTH is 1 byte lower than the default split frame size in Crossbar. So DEFAULT_MAX_FRAME_PAYLOAD_LENGTH should be increased by just 1 byte, or the Crossbar split frame size should be lowered by 1.
Also, if you change DEFAULT_MAX_FRAME_PAYLOAD_LENGTH, it should be changed via the builder: .withConnectionConfiguration((new NettyWampConnectionConfig.Builder()).withMaxFramePayloadLength(65536).build())
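A hedged sketch of wiring that up when building the client (the URI, realm and the 1 MiB limit are placeholders; the withConnectionConfiguration and withMaxFramePayloadLength calls are from the answer above, the other builder methods are jawampa's WampClientBuilder API as I recall it):
NettyWampConnectionConfig connectionConfig = new NettyWampConnectionConfig.Builder()
        .withMaxFramePayloadLength(1024 * 1024)  // raise the frame payload limit, e.g. to 1 MiB
        .build();

WampClient client = new WampClientBuilder()
        .withConnectorProvider(new NettyWampClientConnectorProvider())
        .withConnectionConfiguration(connectionConfig)
        .withUri("ws://localhost:8080/wamp")  // placeholder
        .withRealm("realm1")                  // placeholder
        .build();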

User matching with current data

I have a database full of two different types of users (Mentors and Mentees), whereby I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both go in and change items in their profile at any point in time.
Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By itself, this doesn't take that long, but when Mahout processes the data it seems to take a very long time (14 minutes for 3000 Mentors and 3000 Mentees). After processing, matching takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), while looking at the code shows that the message should only be outputted every 10000 users.
I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.
Is there a better way to be doing this? The product owner needs the data to be current for every search.
(I'm the author.)
You shouldn't need to ask it to reload the data every time; why are you doing that?
14 minutes sounds way, way too long to load such a small amount of data; something's wrong. You might follow up with more info at user@mahout.apache.org.
You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.
I would advise you against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? If not, I might suggest LogLikelihoodSimilarity, along the lines of the sketch below.
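A rough sketch of that setup with the Taste API (the neighborhood size of 25, the result count and the IDs are placeholders; userData is the FastByIDMap you already build from the database):
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MentorMatcher {

    public static long[] findSimilarMentors(FastByIDMap<PreferenceArray> userData, long menteeId)
            throws TasteException {
        DataModel model = new GenericDataModel(userData);
        // LogLikelihoodSimilarity works on boolean/implicit data and needs no PreferenceInferrer.
        UserSimilarity similarity = new LogLikelihoodSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(25, similarity, model);
        UserBasedRecommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);
        return recommender.mostSimilarUserIDs(menteeId, 10);
    }
}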
