Updating log level on multiple EC2 instances


My app runs on multiple EC2 instances to ensure high availability. The default log level for the app is INFO, but sometimes, for debugging purposes, I want to change it to DEBUG. The request to update the log level passes through the Elastic Load Balancer, which delegates it to any one of the EC2 instances. The log level of the app running on that instance is updated, but the apps on the other instances still log at INFO. I want all the apps to log at DEBUG level.
I am using Spring, SLF4J and Logback.
Even if I centralize the log level information and have the request update the level in that central location, something still has to notify the apps on all instances about the change, because the apps never request the log level themselves.

If you want an AWS solution, you can use SNS.
Once your app is instantiated, register its endpoint (using its private IP) with an SNS topic for HTTP notifications.
Then, instead of changing the log level through the load balancer, you can publish an SNS message, and the message will be sent to every registered endpoint.
Remember to deregister the HTTP endpoint from SNS once the app is terminated.
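The receiving side of this could look roughly like the sketch below. It relies on two things SNS does for HTTP endpoints: each POST carries an x-amz-sns-message-type header (SubscriptionConfirmation, Notification or UnsubscribeConfirmation), and the JSON body of a Notification has a "Message" field with the published text. Everything else is an assumption for illustration: the class name SnsLogLevelMessage is hypothetical, the published message is assumed to be just the level name, and the regex is a shortcut where a real handler would use a JSON parser.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the requested log level from an SNS HTTP delivery. */
final class SnsLogLevelMessage {
    // Simplification: a real handler should parse the body with a JSON library.
    private static final Pattern MESSAGE_FIELD =
            Pattern.compile("\"Message\"\\s*:\\s*\"([^\"]*)\"");

    private static final Set<String> KNOWN_LEVELS =
            Set.of("TRACE", "DEBUG", "INFO", "WARN", "ERROR");

    private SnsLogLevelMessage() {}

    /**
     * @param messageType value of the x-amz-sns-message-type request header
     * @param jsonBody    raw request body sent by SNS
     * @return the requested level for a valid Notification, otherwise empty
     */
    static Optional<String> requestedLevel(String messageType, String jsonBody) {
        if (!"Notification".equals(messageType)) {
            // Subscription confirmations must be handled separately.
            return Optional.empty();
        }
        Matcher matcher = MESSAGE_FIELD.matcher(jsonBody);
        if (!matcher.find()) {
            return Optional.empty();
        }
        String level = matcher.group(1).trim().toUpperCase(Locale.ROOT);
        return KNOWN_LEVELS.contains(level) ? Optional.of(level) : Optional.empty();
    }
}
```

With Logback, the returned level would then typically be applied to the root logger via ch.qos.logback.classic.Logger.setLevel(Level.toLevel(level)). Note that an HTTP subscription only starts receiving notifications after the endpoint has confirmed it by visiting the SubscribeURL from the SubscriptionConfirmation message.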

You might want to take a look at ZooKeeper:
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
It's quite easy to set up and start small. The app running on your EC2 nodes just needs to implement a "listener/watcher" interface. ZooKeeper will then notify your app when some configuration changes (e.g. you decided to set the global log level to DEBUG).
Based on this configuration change, all of your nodes will update their local log level without you having to come up with all kinds of ELB-bypassing manual REST calls to tell each node to update. This is exactly the problem ZooKeeper solves:
Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
Once this works for you, you can add more configuration to ZooKeeper if needed, which limits the amount of configuration you need to package in the deployed apps or copy alongside them.

Amazon's Remote Management (Run Command) allows you to run commands on your instances. You just need a simple script to change the log level.
But it is not easy to set up, and you have to grant all the needed IAM rights.

Instances can have tags. Some tags exist by default, and you can create your own. So, if we add a tag that identifies all the instances on which the app is currently running, we can very easily fetch the IP addresses of all those instances.
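import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.Filter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Select only the instances that carry all three tags.
DescribeInstancesRequest request = new DescribeInstancesRequest();
Filter filter1 = new Filter("tag:Environment", Collections.singletonList("Sandbox"));
Filter filter2 = new Filter("tag:Application", Collections.singletonList("xxxxx"));
Filter filter3 = new Filter("tag:Platform", Collections.singletonList("xxxx"));
// Use the credentials of the IAM role attached to this instance.
InstanceProfileCredentialsProvider credentialsProvider = new InstanceProfileCredentialsProvider();
AWSCredentials credentials = credentialsProvider.getCredentials();
AmazonEC2 ec2Client = new AmazonEC2Client(credentials);
// Collect the private IP address of every matching instance.
List<String> privateIps = new ArrayList<>();
ec2Client.describeInstances(request.withFilters(filter1, filter2, filter3))
        .getReservations()
        .forEach(reservation -> reservation.getInstances()
                .forEach(instance -> privateIps.add(instance.getPrivateIpAddress())));
// Send the log-level request to each instance directly, bypassing the load balancer.
for (String privateIp : privateIps) {
    hitTheInstance(privateIp);
}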
Here, I have used 3 tags to filter out the instances.
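The snippet above leaves hitTheInstance undefined. A minimal sketch of what it might do, using the JDK 11+ HTTP client, is shown below. The port 8080, the path /admin/loglevel and the level query parameter are hypothetical and stand for whatever endpoint your app already exposes for changing the log level. The extra level argument is also an addition for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper that calls each instance directly by its private IP. */
final class LogLevelFanOut {
    // Assumed values: replace with the admin endpoint your app really exposes.
    private static final int ADMIN_PORT = 8080;
    private static final String ADMIN_PATH = "/admin/loglevel";

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    private LogLevelFanOut() {}

    /** Builds the per-instance URL for the log-level request. */
    static URI endpointFor(String privateIp, String level) {
        return URI.create("http://" + privateIp + ":" + ADMIN_PORT + ADMIN_PATH + "?level=" + level);
    }

    /** Returns true if the instance answered with a 2xx status, false on any failure. */
    static boolean hitTheInstance(String privateIp, String level) {
        HttpRequest request = HttpRequest.newBuilder(endpointFor(privateIp, level))
                .timeout(Duration.ofSeconds(5))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        try {
            int status = CLIENT.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            return status / 100 == 2;
        } catch (IOException e) {
            return false; // instance unreachable or connection dropped
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Returning a boolean instead of throwing lets the caller keep going through the remaining IPs and report which instances did not switch.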

Related

How to enhance log entries for standard environment?

I have a Java application running on Google App Engine standard environment.
I am able to log from it (using JUL). In the standard environment, all the application log lines from a single web request are grouped into a single entry in the request_log. Everything runs great.
However, I now have a requirement to add custom labels to the log entry for a request, for example the user ID associated with it.
The Stackdriver documentation (https://cloud.google.com/logging/docs/setup/java) gives an example of how to "enhance" log entries with custom labels. However, it appears that the page does not apply to the standard environment.
Is it possible to add labels (or any information associated with the log entry other than app log lines) to a log entry in request_log, and if so, how? If not, what are the alternatives?

The log enhancer would allow you to add custom labels, although they would effectively be hard-coded, because this function (enhanceLogEntry(LogEntry.Builder logEntry)) is called at the end of the request, when the log is being populated. Supplying a value from the request so that it appears in the request_log entry made by the application would not be possible.
I do not see why it would be limited to the flexible environment; as far as I can tell, you should be able to do it from the standard environment.
Alternatively, I believe the best path would be to write your own log entries using the Stackdriver Logging Client Libraries within your request code.

Elastic Beanstalk host specific application configuration

I have a Java web application that I'm trying to refactor to work with the Elastic Beanstalk way of doing things. The application will be load balanced and have (for the moment) 2 hosts, without taking any advantage of auto-scaling. The issue is that there are slight configuration differences between the nodes. In particular, authenticating to certain web services is done with different credentials to effectively double throughput, as there are per-account throttling restrictions.
Currently my application treats configuration separately from the archive, so it's relatively simple on fixed hosts, where the configuration remains in a relatively static file path and deploying the war files is all that is required.
Going down the Elastic Beanstalk path, I think I'll have to include all the configuration options inside the deployable artifact and somehow get the application to load the relevant host-specific configuration. The problem I have is deciding which configuration to load inside the application. I could use a physical aspect of the host, i.e. an IP address or instance ID, to load the relevant config:
/config-<InstanceID-1>.properties
/config-<InstanceID-2>.properties
This approach is totally flawed: if I create an entirely new environment in Beanstalk, I would have to update all the configuration files in the project to reflect the newly created instance IDs.
Has anyone come up with a good way of doing this in Beanstalk?

If you have to have two different types of nodes, then you should consider an SOA approach for your application.
Create two environments, environment-a and environment-b. Either set all properties for the environments through the AWS web console, or reuse your existing configuration files and just set the specific configuration file name for each environment.
#environment-a
PARAM1 = config-environment-a.properties
#environment-b
PARAM1 = config-environment-b.properties
You share the same code base and push to either environment with the -e modifier.
#push to environment-a
$ git aws.push -e environment-a
#push to environment-b
$ git aws.push -e environment-b
You can also create a git alias to push to both environments at the same time :-)
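Inside the application, picking up the file named by PARAM1 could look roughly like this sketch. The class name EnvironmentConfig and the fallback file config-default.properties are hypothetical. Depending on the Elastic Beanstalk platform, environment properties reach the JVM as environment variables or as system properties, so the lookup takes a plain map and the caller decides which source to pass in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.Properties;

/** Hypothetical loader for the per-environment configuration file. */
final class EnvironmentConfig {
    // Assumed fallback used when PARAM1 is not set, e.g. on a developer machine.
    static final String DEFAULT_CONFIG = "config-default.properties";

    private EnvironmentConfig() {}

    /** Picks the configuration file name from the environment settings. */
    static String resolveConfigName(Map<String, String> settings) {
        String name = settings.get("PARAM1");
        return (name == null || name.isBlank()) ? DEFAULT_CONFIG : name.trim();
    }

    /** Loads the named properties file from the classpath of the deployed artifact. */
    static Properties load(String resourceName) throws IOException {
        Properties props = new Properties();
        try (InputStream in = EnvironmentConfig.class.getClassLoader()
                .getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Config not found on classpath: " + resourceName);
            }
            props.load(in);
        }
        return props;
    }
}
```

A typical call at startup would be EnvironmentConfig.load(EnvironmentConfig.resolveConfigName(System.getenv())). Both configuration files ship in the same war, and only the environment setting differs between environment-a and environment-b.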
Now, the major benefit of the SOA approach is that you can scale and manage those environments separately. It is simple and elegant.
If you want something more complex and less elegant, use a simple token distribution service. On every environment initialization, send two messages to Amazon SQS, each containing a configuration name. Then have each instance pull one message from the queue, so that each instance gets exactly one. Whichever configuration name the message contains, configure the node with that configuration. :-)
Hope it helps.
Update after @vcetinick's comment:
All still seems rather complex for what should be pretty simple.
That's why I suggested separate environments. You can make your own registration service: when a node comes up, it registers with the service and in return gets its configuration params. You keep the available configurations in a persistent DB. If a node dies and the service gets another registration request, the registration service can quickly check all registered nodes (because they all left their info during registration), and if any of them is not responding, its configuration data is reassigned to the new node. And now you have a single point of failure on your hands :-)
Again, there might be other ways to approach that problem.

Two instances despite using concurrent requests and low traffic

My Apache Wicket web application uses JDO for its data persistence in GAE/J.
On application start-up, the home page enqueues a task before it is shown (with zero delay to its default ETA). This task causes the construction of a new Wicket web page, in order to construct the JVM's singleton Persistence Manager Factory (PMF) instance for use by the application during its lifetime.
I have set the application to use concurrent requests by adding
<threadsafe>true</threadsafe>
to the application's appengine-web.xml file.
Despite this, after a single request to visit the application's home page, I get two application instances: one created by the home page visit request, and the other created by the execution of the enqueued task (about 6 to 7 seconds later).
I could try to solve this problem by delaying the execution of the enqueued task (by around 10 seconds, perhaps?), but why should I need to try this when I have enabled concurrent requests? Should the first GAE/J application instance not be able to handle two requests close together without causing a second instance to be brought forth? I presume that I am doing something wrong, but what is it?
I have searched Stack Overflow's set of tags ([google-app-engine] [java]), and the deprecated group "Google App Engine for Java" too, but have found nothing relevant to my question.
I would appreciate any pointers.

If you want the task to use an existing instance, you can set the X-AppEngine-FailFast header, which according to the GAE docs:
This header instructs the Scheduler to immediately fail the request if an existing instance is not available. The Task Queue will retry and back-off until an existing instance becomes available to service the request
It's worth checking out the Managing Your App's Resource Usage document for performance and tuning techniques.

Propagating configuration within the WAS cluster by means of MOM

I am developing an application that is embedded in a cluster environment in WebSphere AS. I am using several nodes, and sometimes I would like to change configuration settings on the fly and propagate them to all nodes within the cluster. I don't want to hold the config in the db, or at least I would like to cache it at the node level and trigger a config refresh action that forces each node to refresh the config from some common ground (i.e. db or net drive), to avoid constant round-trips to the config storage.
Moreover, some configuration can't be stored in the db, i.e. the log level needs to be applied to the logger object in each node separately.
I was thinking about using JMS Topics and a publish/subscribe approach to achieve that goal.
The idea is that each node could subscribe to each Topic, and no matter which node initiates the config change, the modification would be propagated to all nodes within the cluster.
Has anyone tried to do this in WAS, and are there any obstacles with this approach? If there are, or if you have any other suggestion on how to solve this problem, I would be very grateful for your help.
Thanks in advance,
Marcin

Here are a few options to consider as alternatives to JMS:
Use Java EE environment entries. These are scoped to the application, and WAS will automatically propagate any changes to all servers against which the application is deployed. This is a good approach since it is the standard Java EE approach to application configuration, if it is robust enough to meet your use case.
Use a WebSphere Shared Library. This allows you to link your applications to static files external to your application (i.e. on the filesystem), such that they are available on your classpath. Although these files are located on the node file systems, there is a way that you can place these files in WebSphere's centralized configuration repository such that they are automatically propagated to all WAS nodes. For more details on this, see this answer.
Both of these options are optimized for static configuration; in other words, configuration settings that are intended to be set at assembly time or deployment time, or to be changed by system administrators. They are not typically used for values that change frequently, nor are they generally changed programmatically at runtime. WAS does, however, allow your applications to pick up these configuration settings in a rolling fashion, so that no application downtime is required.

For now we have solved the problem with an approach that is not the prettiest but is the simplest. Since we are using only 2 nodes, we can open the web interface of a specific node and modify the settings for each node there. The config is stored in the DB, and we are planning to trigger a config reload in each node and change the log level per node as well.

How do you differentiate log4j sessions in a log file from copies of the same web-app?

There is only one log file, and all running copies of the web app write to it simultaneously.
How do you separate the log messages of a single session from the other log lines?

Using a servlet filter with either NDC or MDC information is the best way I've seen. A quick comparison of the two is available at http://wiki.apache.org/logging-log4j/NDCvsMDC.
I've found MDC has worked better for me in the past. Remember that you'll need to update your log4j properties file to include whichever version you prefer (pattern definitions at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html).
A full example of configuring MDC with a servlet filter is available at http://veerasundar.com/blog/2009/11/log4j-mdc-mapped-diagnostic-context-example-code/.
A slightly easier to configure, but significantly inferior option: you could just print the thread ID (via the properties file) for each request and make sure that the first thing you log for each request is a session identifier. It isn't as proper (or as useful), but it can work for low-volume applications.
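To make the MDC idea concrete, here is a small stand-in written with the JDK only. It is not the log4j API; MiniMdc and its methods are hypothetical names that mimic what org.apache.log4j.MDC does conceptually: a per-thread map that a servlet filter fills at the start of a request and clears at the end, and that the layout reads when it formats each line.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a mapped diagnostic context (not the log4j class). */
final class MiniMdc {
    // One context map per thread, so concurrent requests do not see each other's values.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private MiniMdc() {}

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    static void clear() {
        CONTEXT.remove();
    }

    /** Roughly what a layout that prints the session ID before the message would render. */
    static String format(String message) {
        String sessionId = get("sessionId");
        return "[" + (sessionId == null ? "" : sessionId) + "] " + message;
    }

    /** Plays the role of the servlet filter: set the context, run the request, always clear. */
    static <T> T withSession(String sessionId, Supplier<T> request) {
        put("sessionId", sessionId);
        try {
            return request.get();
        } finally {
            clear(); // worker threads are pooled, so stale values must not leak
        }
    }
}
```

With real log4j 1.2 the filter would call MDC.put and MDC.remove instead, and the pattern would reference the key with %X{sessionId}. Filtering the file for one session then comes down to searching for its identifier.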

You could set a context message including the identifier of the specific app instance using org.apache.log4j.NDC, like this:
String appInstanceId = "My App Instance 1";
org.apache.log4j.NDC.push(appInstanceId);
// handle request
org.apache.log4j.NDC.clear();
You can set up the context during the initialization of your web app instance, or inside the doPost() method of your servlets. As its name implies, you can also nest contexts within contexts with multiple push calls at different levels.
See the section "Nested Diagnostic Contexts" in the Log4J manual.

Here is a page that sets up an MDC filter for a web app: http://rtner.de/software/MDCUserServletFilter.html
Being a servlet filter it will free you from managing MDC/NDC in each of your servlets.
Of course, you should modify it to save information more pertinent to your web-app.

If you want to differentiate sessions within the same application, then MDC is the way to go. But if you want to differentiate the web applications writing to the same file, then MDC won't help, because it works on a per-thread basis. In that case I used to write my own appender that knows which application instance it serves. This can be done through appender configuration properties. Such an appender would put the application name into each logging event as a property before writing it to the medium, and you can then use a layout to show this property value in the text file it writes to. Using MDC in this case won't work, because every thread would have to call MDC.put(applicationName), and that is quite ugly. MDC is only good for a single process, not for several processes. If someone knows another way, I'd like to hear it.
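The same idea can be sketched with java.util.logging from the JDK instead of a log4j appender. This is a swapped-in technique for illustration only: AppNameFormatter is a hypothetical class, and the application name is assumed to be passed in from each web app copy's own configuration.

```java
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

/** Hypothetical JUL formatter that stamps every line with the application instance name. */
final class AppNameFormatter extends Formatter {
    private final String applicationName;

    AppNameFormatter(String applicationName) {
        this.applicationName = applicationName;
    }

    @Override
    public String format(LogRecord record) {
        // The name is added once here, so no thread has to set it per request.
        return "[" + applicationName + "] "
                + record.getLevel() + " "
                + formatMessage(record)
                + System.lineSeparator();
    }
}
```

Each copy of the web app would install its own instance on its handler, for example handler.setFormatter(new AppNameFormatter("shop-copy-1")), where the name shown is made up. Lines from different copies can then be told apart in the shared file by their prefix.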
