I am trying to measure application and JVM level metrics on my application using the DropWizard Metrics library.
Below is my metrics class, which I use across my code to increment/decrement the metrics. I call the increment and decrement methods of this class to increment and decrement metrics.
import com.codahale.metrics.MetricRegistry;
import static com.codahale.metrics.MetricRegistry.name;

public class TestMetrics {
private final MetricRegistry metricRegistry = new MetricRegistry();
private static class Holder {
private static final TestMetrics INSTANCE = new TestMetrics();
}
public static TestMetrics getInstance() {
return Holder.INSTANCE;
}
private TestMetrics() {}
public void increment(final Names... metricsName) {
for (Names metricName : metricsName)
metricRegistry.counter(name(TestMetrics.class, metricName.value())).inc();
}
public void decrement(final Names... metricsName) {
for (Names metricName : metricsName)
metricRegistry.counter(name(TestMetrics.class, metricName.value())).dec();
}
public MetricRegistry getMetricRegistry() {
return metricRegistry;
}
public enum Names {
// some more fields here
INVALID_ID("invalid-id"), MESSAGE_DROPPED("drop-message");
private final String value;
private Names(String value) {
this.value = value;
}
public String value() {
return value;
}
};
}
And here is how I am using the above TestMetrics class to increment the metrics, based on the case where I need to. The method below is called by multiple threads.
public void process(GenericRecord record) {
// ... some other code here
try {
String clientId = String.valueOf(record.get("clientId"));
String procId = String.valueOf(record.get("procId"));
if (Strings.isNullOrEmpty(clientId) && Strings.isNullOrEmpty(procId)
&& !NumberUtils.isNumber(clientId)) {
TestMetrics.getInstance().increment(Names.INVALID_ID,
Names.MESSAGE_DROPPED);
return;
}
// .. other code here
} catch (Exception ex) {
TestMetrics.getInstance().increment(Names.MESSAGE_DROPPED);
}
}
Now I have another class which runs every 30 seconds (I am using the Quartz framework for that), from which I want to print out all the metrics and their counts. Eventually I will send these metrics every 30 seconds to some other system, but for now I am printing them here. Below is how I am doing it.
public class SendMetrics implements Job {
@Override
public void execute(final JobExecutionContext ctx) throws JobExecutionException {
MetricRegistry metricsRegistry = TestMetrics.getInstance().getMetricRegistry();
Map<String, Counter> counters = metricsRegistry.getCounters();
for (Entry<String, Counter> counter : counters.entrySet()) {
System.out.println(counter.getKey());
System.out.println(counter.getValue().getCount());
}
}
}
Now my question is: I want to reset all my metrics counts every 30 seconds. That is, when my execute method prints the metrics, it should print the counts for that 30-second window only (for all the metrics), instead of the totals for the whole time the program has been running.
Is there any way to have all my metrics hold the count for the last 30 seconds only, i.e. the count of whatever has happened in the last 30 seconds?
Posting as an answer because it is too long for a comment:
You want to reset the counters. There is no API for this; the reasons are discussed in the linked GitHub issue. The article describes a possible workaround: you keep your counters and use them as usual, incrementing and decrementing, but since you cannot reset them, you add a new Gauge whose value follows the counter you want to reset. The gauge's getValue() method is called when you report the counter value; after reading the current value, it decreases the counter by that amount, which effectively resets the counter to 0. So you get your report and the counter is reset. This is described in Step 1.
Step 2 adds a filter that prevents the actual counter from being reported, because you are now reporting through the gauge.
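A minimal sketch of that workaround against the DropWizard Metrics API (where the gauge is registered, and the metric names, are my own illustration rather than something taken from the article):

import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricFilter;

// e.g. inside TestMetrics: a "resetting" gauge in front of an existing counter
final Counter dropped = metricRegistry.counter(name(TestMetrics.class, "drop-message"));

metricRegistry.register(name(TestMetrics.class, "drop-message", "last-interval"), new Gauge<Long>() {
    @Override
    public Long getValue() {
        long current = dropped.getCount();
        dropped.dec(current); // reset the counter right after reading it
        return current;
    }
});

// Step 2: when reporting (or in the SendMetrics job), skip the raw counters so only
// the per-interval gauges are published:
MetricFilter gaugesOnly = (name, metric) -> !(metric instanceof Counter);

This only behaves as a per-interval count if a single reporter reads the gauge once per interval, which matches the 30-second SendMetrics job above.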
Related
Sorry for the long question; I need to present the environment, otherwise you may misunderstand my issue.
Current state
I have a cache manager<K, V> that, for a given object of class K, returns a holder parameterized by the type V, representing the value associated with the corresponding K on a web service.
Holder
The Holder classes manage the fetch, synchronization, and scheduling of the next fetch, because the cache is designed for multiple parallel calls. The data fetched from the web service has an expiry date (provided in the header), after which the holder can fetch it again and reschedules itself for the next expiry. I have 3 classes (for list, map, and other), but they are all used the same way. The Holder<V> class has 5 methods, 2 for direct access and 3 for IoC access:
void waitData() waits until the data has been fetched at least once. Internally it uses a CountDownLatch.
V copy() waits for the data to be fetched at least once, then returns a copy of the cached V. Simple items are returned as they are, while more complex ones (e.g. a Map of the prices in a given shop, referenced by furniture id) are copied in a synchronized loop (to prevent another fetch() from corrupting the data).
void follow(JavaFX.Listener<V>) registers a new listener of V to be notified of modifications to the holder's data. If the holder has already received data, the listener is notified of this data as if it were new.
void unfollow(JavaFX.Listener<V>) unregisters a previously registered listener.
Observable asObservable() returns an Observable, which allows the holder to be used e.g. in a JavaFX GUI.
Typically this allows me to do things like streaming multiple data in parallel with adequate timing, e.g.
Stream.of(1L, 2L, 3L).parallel().map(cache::getPrice).mapToInt(p -> p.copy().price).min();
or to make much more complex Bindings in JavaFX, e.g. when the price depends on the number of items you want to purchase.
Self Scheduling
The Holder class contains a SelfScheduling<V> object that is responsible for actually fetching the data, putting it in the holder, and rescheduling itself after the data expires.
The SelfScheduling uses a ScheduledExecutorService in the cache to schedule its own fetch() method. It starts by scheduling itself after 0 ms, rescheduling itself after 10 s on error, or after the expiry if new data was fetched. It can be paused and resumed, is started on creation, and can be stopped.
This is the behavior I want to modify: I want the self scheduler to remove the Holder from the cache on expiry, if the holder is not used anywhere in the code.
Cache manager
Just for information, my cache manager consists of a Map<K, Holder<V>> cachedPrices to hold the cache data, and a method getPrice(K) that synchronizes on the cache if the holder is missing, creates the holder if required (with a double check to avoid unnecessary synchronization), and returns the holder.
Global Code
Here is an example of what my code looks like:
public class CacheExample {
public static class Holder<T>{
SimpleObjectProperty<T> data = new SimpleObjectProperty<>();
// real code removed
T copy() {
return null;
}
Observable asObservable() {
return null;
}
void follow(ChangeListener<? super T> listener) {
}
}
public static class SelfScheduled implements Runnable {
// should use enum
private Object state = "start";
public void schedule(long ms) {
// check state, sync, etc.
}
@Override
public void run() {
long next = fetch();
schedule(next);
}
public long fetch() {
// set the value in the holder
// return the next expiry
return 0;
}
}
public Map<Long, Holder<Object>> cachePrices = new HashMap<>();
public Holder<Object> getPrice(long param) {
Holder<Object> ret = cachePrices.get(param);
if (ret == null) {
// sync, re check, etc.
synchronized (cachePrices) {
ret = cachePrices.get(param);
if (ret == null) {
ret = new Holder<>();
// should be the fetch() call instead of null
makeSchedule(ret.data, null);
}
}
}
return ret;
}
public void makeSchedule(SimpleObjectProperty<Object> data, Runnable run) {
// code removed.
// creates a selfscheduler with fetch method and the data to store the
// result.
}
}
Expected modifications
As I wrote above, I want to modify the way the cache holds the data in memory.
In particular, I see no reason to maintain a huge number of self-scheduling entities fetching data when that data is no longer used. If the expiry is 5 s (some web services ARE that short) and I cache 1000 entries (which is a very low value), that means I will make 200 fetch() calls per second for no reason.
What I expect is that, when the Holder is no longer used, the self scheduler stops itself and, instead of fetching data, actually removes the holder from the cache. Example:
Holder<Price> p = cache.getPrice(1);
// here if fetch() is called it should fetch the data
p.copy().price;
// now the price is no longer used, so on the next fetch() it should remove p from the cache.
// If that happens and I later re-enter this code, the holder and the self scheduler will be re-created.
Holder<Price> p2 = cache.getPrice(22);
mylist.add(p2);
// now there is a strong reference to this price, so the fetch() method will keep scheduling the self scheduler
// until mylist is no longer strongly referenced.
Incorrect
However, my knowledge of the relevant technologies in this area is limited. From what I understand, I should use a weak reference in the cache manager and the self scheduler to know when the holder is no longer strongly referenced (typically, start fetch() by checking whether the reference has become null, and in that case just stop). However, this would lead to the holder being GC'd BEFORE the next expiry, which I don't want: some data have a very long expiry and are only used in a simple method; e.g. cache.getShopLocation() should not be GC'd just after the value returned by copy() is used.
Thus, this code is incorrect :
public class CacheExampleIncorrect {
public static class Holder<T>{
SimpleObjectProperty<T> data = new SimpleObjectProperty<>();
// real code removed
T copy() {
return null;
}
Observable asObservable() {
return null;
}
void follow(ChangeListener<? super T> listener) {
}
}
public static class SelfScheduled<T> implements Runnable {
WeakReference<Holder<T>> holder;
Runnable onDelete;
public void schedule(long ms) {
// check state, sync, etc.
}
@Override
public void run() {
Holder<T> h = holder.get();
if (h == null) {
onDelete.run();
return;
}
long next = fetch(h);
schedule(next);
}
public long fetch(Holder<T> h) {
// set the value in the holder
// return the next expiry
return 0;
}
}
public Map<Long, WeakReference<Holder<Object>>> cachePrices = new HashMap<>();
public Holder<Object> getPrice(long param) {
WeakReference<Holder<Object>> h = cachePrices.get(param);
Holder<Object> ret = h == null ? null : h.get();
if (h == null) {
synchronized (cachePrices) {
h = cachePrices.get(param);
ret = h == null ? null : h.get();
if (ret == null) {
ret = new Holder<>();
h = new WeakReference<>(ret);
// should be the fetch() call instead of null
SelfScheduled<Object> sched = makeSchedule(h, null);
cachePrices.put(param, h);
// should be synced on cachedprice
sched.onDelete = () -> cachePrices.remove(param);
}
}
}
return ret;
}
public <T> SelfScheduled<T> makeSchedule(WeakReference<Holder<Object>> h, Runnable run) {
// creates a selfscheduler with fetch method and the data to store the
// result.
return null;
}
}
I am trying to monitor logged-in users. I am getting the logged-in user info by calling an API. This is the code I have used:
public class MonitorService {
private InfoCollectionService infoService;
public MonitorService(InfoCollectionService infoService) {
this.infoService = infoService;
}
@Scheduled(fixedDelay = 5000)
public void currentLoggedInUserMonitor() {
infoService.getLoggedInUser("channel").forEach(channel -> {
Metrics.gauge("LoggedInUsers.Inchannel_" + channel.getchannelName(), channel.getgetLoggedInUser());
});
}
}
And I see the values in Prometheus. The problem is that after a few seconds the value becomes NaN. I have read that Micrometer gauges wrap their object input with a WeakReference (hence it gets garbage collected). I don't know how to fix it; if anybody knows how, that would be great.
This is a shortcoming in Micrometer that I would like to fix eventually.
You need to keep the value in a map in the meantime so that it avoids garbage collection. Notice how we then point the gauge at the map and use a lambda to pull out the value, which avoids the garbage collection.
public class MonitorService {
private Map<String, Integer> gaugeCache = new HashMap<>();
private InfoCollectionService infoService;
public MonitorService(InfoCollectionService infoService) {
this.infoService = infoService;
}
@Scheduled(fixedDelay = 5000)
public void currentLoggedInUserMonitor() {
infoService.getLoggedInUser("channel").forEach(channel -> {
gaugeCache.put(channel.getchannelName(), channel.getgetLoggedInUser());
Metrics.gauge("LoggedInUsers.Inchannel_" + channel.getchannelName(), gaugeCache, g -> g.get(channel.getchannelName()));
});
}
}
I would also recommend using tags for the various channels:
Metrics.gauge("loggedInUsers.inChannel", Tag.of("channel",channel.getchannelName()), gaugeCache, g -> g.get(channel.getchannelName()));
I am working on stress tests for a REST server.
My aim is to create a mock controller method which throws a 404 error every 100 requests (the other responses are 200 OK), and to check the total number of sent requests and the number of failed ones.
The problem is that, even though I use ConcurrentHashMap and AtomicInteger for counting these figures, the number of failed requests varies by +-20. Synchronizing RequestCounter.addFailed() didn't help. The only way I found is to synchronize the controller's method, but that's not an option.
I run 220,000 stress-test requests with 20 threads via JMeter.
Here is my controller:
@RequestMapping(value = "/items/add", method = RequestMethod.POST)
public ResponseEntity addGDT(@RequestBody String data, Principal principal) {
RequestCounter.add();
if ((RequestCounter.getCounts().get("ADD").longValue() % 100) == 0) {
RequestCounter.addFailed();
return ResponseEntity.notFound().build();
} else {
return ResponseEntity.ok().build();
}
}
The number of requests is counted here:
public class RequestCounter {
static Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();
static {
counts.put("ADD", new AtomicInteger(0));
counts.put("ADD_FAILED", new AtomicInteger(0));
}
public static void add(){
counts.get("ADD_GDT").incrementAndGet();
}
public static void addFailed(){
counts.get("ADD_FAILED").incrementAndGet();
}
}
UPDATE
I followed the advice of javaguy and refactored the code by removing the map and working with the AtomicInteger variables directly. But the result is still unpredictable: failedRequestCount still varies by +-3.
public class RequestCounter {
static AtomicInteger failedRequestsCounter = new AtomicInteger(0);
...
public static void addGDTFailed(){
failedRequestsCounter.incrementAndGet();
}
UPDATE2
The situation was not resolved either by accessing the thread-safe variable directly, or by separating and synchronizing a method that computes the modulus.
The problem is that the RequestCounter class is not thread-safe because of these two lines:
counts.get("ADD_GDT").incrementAndGet();
counts.get("ADD_FAILED").incrementAndGet();
These are NOT atomic operations; the computation actually involves two steps (read the value from the Map, then write). Though ConcurrentHashMap and AtomicInteger are individually thread-safe, when you use them together you need synchronization or locking.
But you can achieve what you want for your testing with much simpler code, without using a ConcurrentHashMap at all.
To make the RequestCounter class thread-safe, just remove the Map and access the counter references directly, as below:
public class RequestCounter {
private static final AtomicLong addInt = new AtomicLong();
private static final AtomicLong addFailed = new AtomicLong();
public static long get() {
return addInt.get();
}
public static long add() {
return addInt.incrementAndGet();
}
public static long addFailed(){
return addFailed.incrementAndGet();
}
}
UPDATE1: Problem with 3% variation of requests:
You need to ensure that RequestCounter.add() is called only once per request; look at my controller code below:
@RequestMapping(value = "/items/add", method = RequestMethod.POST)
public ResponseEntity addGDT(@RequestBody String data, Principal principal) {
if ((RequestCounter.get() % 100) == 0) {
RequestCounter.addFailed();
return ResponseEntity.notFound().build();
} else {
RequestCounter.add();
return ResponseEntity.ok().build();
}
}
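If the goal is strictly "every 100th request fails", a variation of the controller above (my own sketch, not part of the original answer) is to make the count-and-decide step a single atomic operation by reusing the value returned by incrementAndGet(), which add() already returns:

@RequestMapping(value = "/items/add", method = RequestMethod.POST)
public ResponseEntity addGDT(@RequestBody String data, Principal principal) {
    // add() returns the new total atomically, so each request gets a unique sequence number
    long requestNumber = RequestCounter.add();
    if (requestNumber % 100 == 0) {
        RequestCounter.addFailed();
        return ResponseEntity.notFound().build();
    }
    return ResponseEntity.ok().build();
}

Because the decision is derived from the return value of the same atomic increment, exactly one request in every hundred takes the failure branch, regardless of how the threads interleave.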
I'm experimenting with fault tolerance in Apache Ignite.
What I can't figure out is how to retry a failed job on any node. I have a use case where my jobs will call a third-party tool as a system process via ProcessBuilder to do some calculations. In some cases the tool may fail, but in most cases it's OK to retry the job on any node, including the one where it previously failed.
At the moment, Ignite seems to reroute the job to a node which has not had this job before. So, after a while, all nodes have been tried and the task fails.
What I'm looking for is how to retry a job on any node.
Here's a test to demonstrate my problem.
Here's my randomly failing job:
public static class RandomlyFailingComputeJob implements ComputeJob {
private static final long serialVersionUID = -8351095134107406874L;
private final String data;
public RandomlyFailingComputeJob(String data) {
Validate.notNull(data);
this.data = data;
}
public void cancel() {
}
public Object execute() throws IgniteException {
final double random = Math.random();
if (random > 0.5) {
throw new IgniteException();
} else {
return StringUtils.reverse(data);
}
}
}
And below is the task:
public static class RandomlyFailingComputeTask extends
ComputeTaskSplitAdapter<String, String> {
private static final long serialVersionUID = 6756691331287458885L;
@Override
public ComputeJobResultPolicy result(ComputeJobResult res,
List<ComputeJobResult> rcvd) throws IgniteException {
if (res.getException() != null) {
return ComputeJobResultPolicy.FAILOVER;
}
return ComputeJobResultPolicy.WAIT;
}
public String reduce(List<ComputeJobResult> results)
throws IgniteException {
final Collection<String> reducedResults = new ArrayList<String>(
results.size());
for (ComputeJobResult result : results) {
reducedResults.add(result.<String> getData());
}
return StringUtils.join(reducedResults, ' ');
}
@Override
protected Collection<? extends ComputeJob> split(int gridSize,
String arg) throws IgniteException {
final String[] args = StringUtils.split(arg, ' ');
final Collection<ComputeJob> computeJobs = new ArrayList<ComputeJob>(
args.length);
for (String data : args) {
computeJobs.add(new RandomlyFailingComputeJob(data));
}
return computeJobs;
}
}
Test code:
final Ignite ignite = Ignition.start();
final String original = "The quick brown fox jumps over the lazy dog";
final String reversed = StringUtils.join(
ignite.compute().execute(new RandomlyFailingComputeTask(),
original), ' ');
As you can see, a failed job should always be failed over. Since the probability of failure is not 1, I expect the task to terminate successfully at some point.
With a probability threshold of 0.5 and a total of 3 nodes this hardly ever happens. I'm getting an exception like class org.apache.ignite.cluster.ClusterTopologyException: Failed to failover a job to another node (failover SPI returned null). After some debugging I've found out that this is because I eventually run out of nodes: all of them are gone.
I understand that I can write my own FailoverSpi to handle this.
But this just doesn't feel right.
First, it seems to be overkill to do this.
More importantly, the SPI is a kind of global thing, whereas I'd like to decide per job whether it should be retried or failed over. This may, for instance, depend on the exit code of the third-party tool I'm invoking. So configuring failover via the global SPI isn't right.
The current implementation of AlwaysFailoverSpi (which is the default one) doesn't fail over if it has already tried all nodes for a particular job. I believe this could become a configuration option, but for now you will have to implement your own failover SPI (it should be pretty simple: just pick a random node from the topology each time a job tries to fail over).
As for the global nature of the SPI, you're right, but its failover() method takes a FailoverContext, which has information about the failed job (task name, attributes, exception, etc.), so you can make the decision based on this information.
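A rough sketch of such a failover SPI, assuming it is acceptable to extend AlwaysFailoverSpi and simply pick a random node from the passed topology (the class name and the per-job check are my own illustration):

import java.util.List;
import java.util.Random;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.spi.failover.FailoverContext;
import org.apache.ignite.spi.failover.always.AlwaysFailoverSpi;

public class RetryAnywhereFailoverSpi extends AlwaysFailoverSpi {

    private final Random rnd = new Random();

    @Override
    public ClusterNode failover(FailoverContext ctx, List<ClusterNode> top) {
        // Per-job decision: the failed job's exception is available via
        // ctx.getJobResult().getException() and could be inspected here to decide not to retry.
        if (top == null || top.isEmpty()) {
            return null; // returning null means the job is not failed over
        }
        // Pick any node, including the one that just failed the job.
        return top.get(rnd.nextInt(top.size()));
    }
}

It would be registered via IgniteConfiguration.setFailoverSpi(new RetryAnywhereFailoverSpi()). Note that, unlike AlwaysFailoverSpi, this sketch puts no upper bound on the number of attempts, so in practice you would still want to stop after some maximum (for example by counting attempts in the task session).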
In my Java EE 6 app (running on GlassFish 3.0.1) I have an EmailEJB which has to send lots of mails. The mails are sent asynchronously, so the method is annotated with the new EJB 3.1 @Asynchronous, letting it run in a separate thread. Now I want the user to be informed about the current status of the method: how many mails have already been sent?
Sending the mails asynchronously works fine, but I can't figure out how to make the progress accessible from outside. It seems like my approach is quite wrong, but somehow it has to be possible (maybe with another approach). This is how my EmailEJB currently looks (it's kind of pseudo-code, but it explains what I want):
@Stateful
public class EmailEJB {
@Asynchronous
public Future<Integer> sendMails() {
for (int i = 0; i < mails.size; i++) {
sendMail(mails[i]);
// i want to return the progress without returning ;)
return new AsyncResult<Integer>(i);
}
}
}
// Just for completeness... from outside, I'm accessing the progress like this:
Future<Integer> progress = emailEJB.sendEmails();
Integer currentvalue = progress.get();
How can I return the current progress inside my asynchronous function, without ending it with a return? How can I show the user the progress of a loop inside a function? Do I need another asynchronous method? Any hints?
Nobody? OK, so this is my solution. I'm not sure if this is a big fat workaround or just the way to get this done.
Since an @Asynchronous method cannot access the session context, and therefore no session beans either (at least I don't know how; I always got ConcurrentModificationExceptions or similar), I created a singleton ProgressEJB, which contains a HashMap:
@Singleton @LocalBean @Startup
public class ProgressEJB {
private HashMap<String, Integer> progressMap = new HashMap<String, Integer>();
// getters and setters
}
This HashMap maps the session id (a String) to an Integer value (the progress, 0 to 100), so a user session is associated with a progress value.
In my EmailEJB, I'm injecting this ProgressEJB, and in my @Asynchronous method I'm increasing the value every time an email has been sent:
@Stateful @LocalBean
public class EmailEJB {
@Inject
private ProgressEJB progress;
// Mail-Settings
...
@Asynchronous
public void sendEmails(User user, Message message, String sessionId) {
progress.getProgressMap().put(sessionId, 0);
for (int i = 0; i < mails.size; i++) {
sendMail(mails[i]);
// multiply before dividing so the integer division yields a percentage
progress.getProgressMap().put(sessionId, (i * 100) / mails.size);
}
progress.getProgressMap().remove(sessionId);
}
}
The sessionId comes from my managed (Weld) bean when calling the function:
@SessionScoped
@Named
public class EmailManager {
@Inject
private ProgressEJB progress;
@Inject
private FacesContext facesContext;
private String sessionId;
@PostConstruct
private void setSessionId() {
this.sessionId = ((HttpSession)facesContext.getExternalContext().getSession(false)).getId();
}
public Integer getProgress() {
if (progress.getProgressMap().get(sessionId) == null)
return 100;
else
return progress.getProgressMap().get(sessionId);
}
}
Now I can access the progress from EmailManager in my JSF view with Ajax polling, telling the user how many mails have already been sent. I just tested it with 2 users; it seems to work.
I also see only a @Singleton solution here.
But this implies the need for housekeeping in ProgressEJB: some effort is needed to prune old sessions from the HashMap, e.g. as sketched below.
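A minimal sketch of such housekeeping, assuming the bean also records a last-updated timestamp per session (the accessor methods, the ConcurrentHashMap, the ten-minute schedule, and the one-hour cutoff are my own illustration, not part of the original ProgressEJB):

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.ejb.LocalBean;
import javax.ejb.Schedule;
import javax.ejb.Singleton;
import javax.ejb.Startup;

@Singleton @LocalBean @Startup
public class ProgressEJB {

    private final Map<String, Integer> progressMap = new ConcurrentHashMap<String, Integer>();
    private final Map<String, Long> lastUpdated = new ConcurrentHashMap<String, Long>();

    public void setProgress(String sessionId, int percent) {
        progressMap.put(sessionId, percent);
        lastUpdated.put(sessionId, System.currentTimeMillis());
    }

    public Integer getProgress(String sessionId) {
        return progressMap.get(sessionId);
    }

    // Runs every 10 minutes and drops entries that have not been updated for an hour,
    // e.g. sessions that ended while a send was still in flight.
    @Schedule(hour = "*", minute = "*/10", persistent = false)
    public void pruneStaleSessions() {
        long cutoff = System.currentTimeMillis() - 60L * 60L * 1000L;
        for (Iterator<Map.Entry<String, Long>> it = lastUpdated.entrySet().iterator(); it.hasNext();) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() < cutoff) {
                progressMap.remove(entry.getKey());
                it.remove();
            }
        }
    }
}

With accessor methods instead of an exposed HashMap, the EmailEJB and EmailManager above would call progress.setProgress(sessionId, ...) and progress.getProgress(sessionId) rather than going through getProgressMap().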
Another solution is described in
Is there any way to know the progress of a EJB Asynchronous process?
This solution does not need a Stateful Bean.
@Stateless
public class EmailEJB {
// Mail-Settings
...
@Asynchronous
public void sendEmails(User user, Message message, WorkContext context) {
context.setProgress(0);
for (int i = 0; i < mails.size; i++) {
sendMail(mails[i]);
// multiply before dividing so the integer division yields a percentage
context.setProgress((i * 100) / mails.size);
}
context.setRunning(false);
}
}
The context object, which holds the progress:
public class WorkContext {
//volatile is important!
private volatile Integer progress = 0;
private volatile boolean running = false;
// getters & setters
}
The usage is very easy.
@SessionScoped
@Named
public class EmailManager {
@Inject
private EmailEJB emailEJB;
private WorkContext workContext;
public void doStuff() {
workContext = new WorkContext();
emailEJB.sendEmails(user, message, workContext);
}
public Integer getProgress() {
return workContext.getProgress();
}
....
}