I am designing a server for logging. The business logic for this application is written in multiple languages (C++ & Java for now, but other languages might be added to the mix at a later stage).
I am considering making this a separate server with a well-defined interface so that I need not port it to other languages at a later date. For scalability, the main application can run multiple instances on multiple machines behind load balancers.
One of the important considerations for the design (other than the usual ones like logging levels) is performance and support for multiple logging targets (flat file, console, DB(?), etc.).
How do I ensure that the logger is not impacting the performance of the application? Would communicating using a socket make sense? Is there a better way to do this?
Is there a need to have all your logs shared? I would use whatever logging mechanism is best for each part of the logic (log4j or Java's built-in logging in Java; I don't know C++'s logging libraries well enough to suggest one).
For the most part, logs should only be used for debugging and outside-the-app parsing. I would not recommend integrating logging into your business logic. If you really need data from the logs, you'll be much better off communicating it directly rather than spitting out a log to be slurped in by another application.
If you absolutely need it, you can have an external (very low priority) application that feeds off the logs and sends them back to a centralized logging server.
There is data you need to see in near real time and data which needs to be recorded for offline processing. They have different requirements.
Real-time data needs to be in a machine-readable format and is usually directed to the places where it is used. The central logger can be on this path provided it doesn't delay the real-time information unacceptably. For this I would use a socket (or JMS) rather than a buffered file.
Offline processing logs can be in a machine-readable format (for overnight reports) or human-readable (for debugging). For this I would use a file, a database, or both. Files can be simpler to manage, especially if they are large. A database makes building reports easier.
In either case I would hand the information which needs to be sent via socket or written to a file over to another thread, so that any random delays in the system do not impact the code producing the log. In fact, I would consider delaying sending any logs until the critical process is complete, i.e. you process everything which needs to be done first, then log everything of interest later.
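A minimal sketch of that hand-off in Java (the queue capacity and the consumer's work are assumptions): the business-logic thread only enqueues, and a low-priority background thread does the slow I/O.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncLogger {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        public AsyncLogger() {
            Thread writer = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take();  // blocks until a message arrives
                        // slow work happens here: socket write, file append, DB insert...
                        System.out.println(line);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.setPriority(Thread.MIN_PRIORITY); // keep out of the app's way
            writer.setDaemon(true);
            writer.start();
        }

        // called from the business-logic thread: cheap, never blocks on I/O
        public void log(String message) {
            queue.offer(message);  // silently drops the message if the queue is full
        }
    }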
Check this:
http://logging.apache.org/log4j/1.2/manual.html
Take a look at the performance section. It will address your concerns as far as the logging overhead in your application is concerned.
As far as supporting multiple logging targets goes, this is easily achievable with log4j, but you need to delve into some details (refer to the URL I posted).
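As a rough illustration (the file name and pattern layout are placeholders), attaching several appenders to the root logger is all it takes to fan one log statement out to multiple targets:

    import java.io.IOException;

    import org.apache.log4j.ConsoleAppender;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class MultiTargetLogging {
        public static void main(String[] args) throws IOException {
            Logger root = Logger.getRootLogger();
            root.setLevel(Level.INFO);
            PatternLayout layout = new PatternLayout("%d %-5p [%c] %m%n");
            root.addAppender(new ConsoleAppender(layout));         // console target
            root.addAppender(new FileAppender(layout, "app.log")); // flat-file target
            Logger.getLogger(MultiTargetLogging.class).info("goes to both targets");
        }
    }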
In general, from my experience log4j is excellent. I have generated thousands of static & dynamic logs of "considerable size" (for my application; the term may be interpreted differently for yours) without any problem despite the heavy processing I perform (for background, I am evaluating/simulating a distributed P2P algorithm on a local PC and all is going well despite creating hundreds of logger instances for the simulation).
I'm interested in using JMX to monitor/configure a simple Java Client/Server application. For example, we would like to capture any network exceptions that occur in a Java program.
Can MBeans be extended in this way? Or are they limited to more concrete get & set functions?
So far, I've looked into Notifications and Monitor MBeans.
Thanks
Well, I would say it's definitely doable. I used JMX in an Apache Wicket application earlier with custom MBeans. Anyway, an MBean is just a wrapper around some logic in your server application, so you can take the data directly from your application.
If you want an example of how this is done in a working application, you might want to check out this:
https://github.com/apache/wicket/blob/master/wicket-jmx/src/main/java/org/apache/wicket/jmx/wrapper/MarkupSettings.java
The class basically holds a reference to the application and asks for data directly from the server app.
When the server starts up, then it registers all the MBeans through an initializer class:
https://github.com/apache/wicket/blob/master/wicket-jmx/src/main/java/org/apache/wicket/jmx/Initializer.java
Then every time you take a look at your MBean server you will see the latest up-to-date information coming directly from the app.
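The registration pattern boils down to a few lines of standard JMX. Here is a minimal sketch (the ErrorStats class, its counter, and its object name are invented for illustration):

    import java.lang.management.ManagementFactory;

    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // JMX convention: the interface must be named <ClassName>MBean
    interface ErrorStatsMBean {
        int getNetworkErrorCount();
    }

    public class ErrorStats implements ErrorStatsMBean {
        private volatile int networkErrorCount;

        public int getNetworkErrorCount() { return networkErrorCount; }

        public void recordError() { networkErrorCount++; } // called from the app's error path

        public static void main(String[] args) throws Exception {
            ErrorStats stats = new ErrorStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, new ObjectName("myapp:type=ErrorStats"));
            // JConsole / VisualVM can now read the counter while the app runs
            Thread.sleep(Long.MAX_VALUE);
        }
    }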
There are some caveats though. One caveat is that Java in general doesn't provide any good abstraction for capturing all Exceptions of a given type coming from any source in the application. You can register a catch-all exception handler, but as far as I can remember it doesn't work perfectly.
When I had to do something like this, I used AspectJ to register a catch-all place to handle exceptions. I used compile-time weaving to reduce the performance implications, but I am not sure how much it affects overall performance (if it affects it at all).
¯\_(ツ)_/¯
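For reference, the built-in catch-all mentioned above is a one-liner, but it only sees exceptions that escape a thread entirely, never ones your code already catches, which is why it falls short for this kind of monitoring:

    public class CatchAll {
        public static void main(String[] args) {
            Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
                // e.g. bump a counter exposed through an MBean
                System.err.println("uncaught in " + thread.getName() + ": " + throwable);
            });
            throw new RuntimeException("escapes main, reaches the handler");
        }
    }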
The other caveat is that JMX connections are usually difficult to set up in an enterprise environment. If you have to log in through two hops just to reach the production servers because there are firewalls everywhere, then your monitoring connection will definitely fail, and you'll need to keep buying beer for your sysadmin and convincing your manager that this is not imposing any security risk. :)
There is one thing though. You say
to monitor/configure a simple Java Client/Server application
You want to configure / monitor the clients as well? I've never done that. I am not sure that's even possible.
I made a web-based application in Java, and I would like to monitor its performance periodically (e.g. response time). I also want to display this information on the homepage of my application. Is that possible? Any ideas on how this can be done?
Thanks.
You can take a look at stagemonitor. It is an open source Java web application performance monitor. It captures response time metrics, JVM metrics, request details (including a call stack captured by the request profiler) and more. The overhead is very low.
Optionally, you can use the great time series database Graphite with it to store a long history of data points that you can look at with fancy dashboards.
Take a look at the github page to see screenshots, feature descriptions and documentation.
Note: I am the developer of stagemonitor
Depending on your environment, I would use a cron job or scheduled task that measures your app's response time by making a request with something like HttpClient, then drops that information into a database table accessible by your app.
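A bare-bones sketch of such a probe using only the JDK (the URL is a placeholder; persisting the result to your database table is left as a comment):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ResponseTimeProbe {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8080/myapp/"); // hypothetical endpoint
            long start = System.nanoTime();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            int status = conn.getResponseCode();               // forces the request
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            conn.disconnect();
            System.out.printf("status=%d elapsed=%dms%n", status, elapsedMs);
            // a cron-driven run would INSERT (timestamp, status, elapsedMs) into the table here
        }
    }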
The answer here is the simplest way you can measure the time: How do I time a method's execution in Java?
Why not check out Munin monitoring? The website says:
Munin the monitoring tool surveys all your computers and remembers what it saw. It presents all the information in graphs through a web interface. Its emphasis is on plug and play capabilities. After completing an installation a high number of monitoring plugins will be playing with no more effort.
SLAC at Stanford University also keeps a large, quite well sorted list of various solutions for network monitoring among other things: SLAC's list of Network Monitoring Tools; check for instance "Public domain or free network monitoring tools".
You can also consider creating your own custom web application monitor. To do so, use the Proxy pattern and create a concrete monitor. By using the Spring framework you can easily switch the monitor on and off at runtime without redeployment or restart of the web application. Furthermore, you can create a lot of different specific monitors yourself and control exactly what is being monitored. This gives you maximum flexibility, but requires a bit of work.
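A minimal sketch of that idea with the JDK's dynamic proxies (the OrderService interface and the timing output are invented for illustration; in Spring you would typically achieve the same with AOP):

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    interface OrderService {
        void placeOrder(String item);
    }

    public class MonitoringProxy implements InvocationHandler {
        private final Object target;

        private MonitoringProxy(Object target) { this.target = target; }

        @SuppressWarnings("unchecked")
        static <T> T monitor(T target, Class<T> iface) {
            return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                    new Class<?>[] { iface }, new MonitoringProxy(target));
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args); // delegate to the real service
            } finally {
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println(method.getName() + " took " + ms + "ms"); // record for the homepage
            }
        }

        public static void main(String[] args) {
            OrderService real = item -> { /* real business logic */ };
            OrderService monitored = monitor(real, OrderService.class);
            monitored.placeOrder("book");
        }
    }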
It is possible.
The clearest way to go about it, and the one providing true numbers, is to simulate a client that performs some sort of activity mimicking real usage, then have that client use the website periodically.
This presupposes that your website has a means to accept inputs that do not impact the real back end business. Crafting such interfaces requires some thought, but is not beyond the ability of a person who could put together the web site in the first place. The key points are to attempt to emulate as much using the real website as possible, but guard against real business impact. Basically it is designing for a special user (the tester).
So you might have a special user that when logged in, all purchases are bound to a special account that actually is filtered out to appropriately not demand payment and not ship goods. Provided the systems you integrate with all share an understanding of this live testing account, you can simultaneously test alongside of real production post-deployment.
Such a structure provides a huge benefit. You get performance of the real, live running system. Performance tends to change over time, and is subject to the environment. By fetching your performance numbers on the live system, in the same environment, you get a much better view of what real users might be encountering. Also, you can differentiate and track performance for different activities.
Yes, it is a lot more to design and set up; however, if you are in it for the long run, the benefits are huge.
I guess JavaMelody is the most appropriate solution for you. It can be built into a Java application, and thanks to this it monitors the functionality inside the app. Using this platform, it's possible to get much more specific parameters for your Java app than via external monitoring. In addition, it allows you to display statistics on your app's homepage. Moreover, you can embed JavaMelody's graphs in the app, which essentially facilitates monitoring it.
Take a look at the detailed overview of JavaMelody: http://cases.azoft.com/enterprise-system-monitoring-solutions-business-apps/
I am used to adding logging to standalone Java applications and writing the logs to files using log4j and slf4j. I am moving some applications to a Java Web Start format and I am not clear on the best way to perform logging to monitor the application's behaviour. I have thought of two options:
1. Write the log to the local machine and provide an option to send the information to the central server under some condition (time, error etc.)
2. Send the output of the log to the server directly
What is best practice?
I've seen 1. implemented by many programs.
But 2. seems bandwidth intensive, intrusive, and overkill.
Agreed, 2 doesn't seem like such a good option: an error with web services wouldn't be logged in that case. I was wondering if there was any other option, but I can't think of any.
I was thinking of entirely local sources of problems connecting to the server, but good point.
What is best practice?
Stick with the majority and use method 1. Unless you have a marvelous inspiration about how the entire logging/reporting system can be improved, I'd go with "tried and tested". It is likely to be easiest, best supported by existing frameworks, and should your code falter, has the greatest number of people who have 'been there, done that' to potentially help.
I'm looking for a way to centralise the logging concerns of distributed software (written in Java), which would be quite easy since the system in question has only one server. But keeping in mind that more instances of the particular server will very likely run in the future (and more applications will need this), there would have to be something like a logging server which takes care of incoming logs and makes them accessible to the support team.
The situation right now is that several Java applications use log4j, which writes its data to local files. So if a client experiences problems, the support team has to ask for the logs, which isn't always easy and takes a lot of time. In the case of a server fault the diagnosis problem is not as big, since there is remote access anyway, but even so, monitoring everything through a logging server would still make a lot of sense.
While going through the questions regarding "centralised logging" I found another question (actually the only one with a, in this case, usable answer). The problem is that all applications run in a closed environment (within one network) and security guidelines do not permit anything concerning internal software to leave the environment's network.
I also found a wonderful article about how one would implement such a logging server. Since the article was written in 2001, I would have thought that someone might have already solved this particular problem. But my search results came up with nothing.
My question: Is there a logging framework which handles logging over networks, with a centralised server which can be accessed by the support team?
Specification:
Availability
Server has to be run by us.
Java 1.5 compatibility
Compatibility with a heterogeneous network.
Best-Case: Protocol uses HTTP to send logs (to avoid firewall-issues)
Best-Case: Uses log4j or LogBack or basically anything that implements slf4j
Not necessary, but nice to have
Authentication and security are of course an issue, but could be set aside for at least a while (if it is open-source software, we would extend it to our needs; OT: we always give back to the projects).
Data mining and analysis is something which is very helpful to make software better, but that could as well be an external application.
My worst-case scenario is that there is no software like that. In that case, we would probably implement this ourselves. But if there is such a client-server application, I would very much appreciate not having to do this particularly problematic bit of work.
Thanks in advance
Update: The solution has to run on several java-enabled platforms. (Mostly Windows, Linux, some HP Unix)
Update: After a lot more research we actually found a solution we were able to acquire. clusterlog.net (offline since at least mid-2015) provides logging services for distributed software and is compatible with log4j and logback (which is compatible with slf4j). It lets us analyze every single user's way through the application, thus making it very easy to reproduce reported bugs (and even unreported ones). It also notifies us of important events by email and has a report system where logs of the same origin are summarized into an easily accessible format. They deployed it here just a couple of days ago (which went flawlessly) and it is running great.
Update (2016): this question still gets a lot of traffic, but the site I referred to does not exist anymore.
You can use log4j with the SocketAppender; you then have to write the server part to process the incoming LogEvents.
see http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/net/SocketAppender.html
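As a rough sketch (host and port are placeholders), the client side is just another appender, and log4j even ships a bare-bones receiver you can run until you write your own server part:

    import org.apache.log4j.Logger;
    import org.apache.log4j.net.SocketAppender;

    public class CentralLogging {
        public static void main(String[] args) {
            // client side: ship serialized LogEvents to the central host
            Logger.getRootLogger().addAppender(
                    new SocketAppender("loghost.example.com", 4712));
            Logger.getLogger(CentralLogging.class).info("sent to the central server");

            // server side (run on loghost): log4j's stock receiver, e.g.
            //   java org.apache.log4j.net.SimpleSocketServer 4712 server-log4j.properties
        }
    }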
NXLOG or LogStash or Graylog2
or
LogStash + ElasticSearch (+optionally Kibana)
Example:
1) http://logstash.net/docs/1.3.3/tutorials/getting-started-simple
2) http://logstash.net/docs/1.3.3/tutorials/getting-started-centralized
Have a look at logFaces; it looks like your specifications are met.
http://www.moonlit-software.com/
Availability (check)
Server has to be run by us. (check)
Java 1.5 compatibility (check)
Compatibility with a heterogeneous network. (check)
Best-Case: Protocol uses HTTP to send logs (to avoid firewall-issues) (almost TCP/UDP)
Best-Case: Uses log4j or LogBack or basically anything that implements slf4j (check)
Authentication (check)
Data mining and analysis (possible through extension api)
There's a ready-to-use solution from Facebook - Scribe - that uses Apache Hadoop under the hood. However, most companies I'm aware of still tend to develop in-house systems for that. I worked at one such company and dealt with logs there about two years ago. We also used Hadoop. In our case we had the following setup:
We had a small dedicated cluster of machines for log aggregation.
Workers mined logs from the production service and then parsed individual lines.
Then reducers would aggregate the necessary data and prepare reports.
We had a small and fixed number of reports that we were interested in. In rare cases when we wanted to perform a different kind of analysis we would simply add a specialized reducer code for that and optionally run it against old logs.
If you can't decide in advance what kind of analyses you are interested in, it'll be better to store the structured data prepared by the workers in HBase or some other NoSQL database (here, for example, people use MongoDB). That way you won't need to re-aggregate data from the raw logs and will be able to query the datastore instead.
There are a number of good articles about such logging aggregation solutions, for example, using Pig to query the aggregated data. Pig lets you query large Hadoop-based datasets with SQL-like queries.
I've been asked to port a legacy data processing application over to Java.
The current version of the system is composed of a number of (badly written) Excel sheets. The sheets implement a big loop: a number of data sources are polled. These sources are a mixture of CSV and XML-based web services.
The process is conceptually simple:
It's stateless, meaning the calculations that run are purely dependent on the inputs. The results from the calculations are published (currently by writing a number of CSV files to some standard locations on the network).
Having published the results the polling cycle begins again.
The process will not need an admin GUI; however, it would be neat if I could implement some kind of web-based control panel. It would be nothing pretty and purely for internal use. The control panel would do little more than display stats about the source feeds and possibly force a refresh of the input feeds in the event of a problem. This component is purely optional in the first delivery round.
A critical feature of this system will be fault tolerance. Some of the input feeds are notoriously buggy. I'd like my system to be able to recover in the event that some of the inputs are broken. In this case it would not be possible to update the output; I'd like it to keep polling until the problem is resolved, possibly generating some XMPP messages to indicate the status of the system. Overall the system should work without intervention for long periods of time.
Users currently have a custom client which polls the CSV files and which (hopefully) will not need to be rewritten. If I can do this job properly, they will not notice that the engine that runs this system has been re-implemented.
I'm not a Java developer (I mainly do Python), but the JVM is a requirement in this case. The manager has given me generous time to learn.
What I want to know is how to begin architecting this kind of project. I'd like to make use of frameworks & good patterns where possible. Are there any big building blocks that might help me get a good quality system running faster?
UPDATE0: Nobody mentioned Spring yet - Does this framework have a role to play in this kind of application?
You can use lots of big complex frameworks to "help" you do this. Learning these can be CV++.
In your case I would suggest you try making the system as simple as possible. It will perform better and be easier to maintain (it's also more likely to work).
So I would take each of the requirements and ask yourself: how simple can I make this? This is not about being lazy (you have to think harder); it's good practice IMHO.
1) Write the code that processes the files; keep it simple, one class per task. You might find Apache CSV and Apache Commons useful.
2) Then look at Java thread pools to create a separate process runner for those classes as separate tasks; if they error, it can restart them (see the sketch after this list).
3) The best approach to start-up depends on the platform, but I'll assume your mention of Excel indicates it's a Windows PC. The simplest solution would therefore be to run the process runner from a Windows -> Startup menu item. A slightly better solution would be to use a Windows service wrapper. Alternatively you could run this under something like Apache ACD.
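A minimal sketch of point 2 using a ScheduledExecutorService (the feed names and the 60-second interval are assumptions). Note the try/catch inside the task: a scheduled task that throws is silently cancelled, so catching keeps the polling alive after a bad feed:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PollingRunner {
        public static void main(String[] args) {
            ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);
            for (String feed : new String[] { "prices.csv", "rates-webservice" }) {
                pool.scheduleWithFixedDelay(() -> {
                    try {
                        pollAndPublish(feed);  // fetch, calculate, write the output CSVs
                    } catch (Exception e) {
                        // swallow so the schedule keeps firing; a real system would alert (e.g. XMPP)
                        System.err.println("feed " + feed + " failed: " + e.getMessage());
                    }
                }, 0, 60, TimeUnit.SECONDS);
            }
        }

        static void pollAndPublish(String feed) { /* one class per task goes here */ }
    }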
There is a tool in the Java ecosystem which solves (almost) all integration problems.
It is called Apache Camel (http://camel.apache.org/). It relies on the concepts of Consumers and Producers with Enterprise Integration Patterns in between. It provides fault tolerance and configurable concurrent processing. There is support for periodic polling. It has components for XML, CSV and XMPP. It is easy to define time-triggered background jobs and to integrate with any messaging system you like for job queuing.
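A hedged sketch of what such a route could look like (the endpoint URIs and the camel-csv data format are assumptions; your real feeds would get their own endpoints):

    import org.apache.camel.builder.RouteBuilder;

    public class FeedRoute extends RouteBuilder {
        @Override
        public void configure() {
            // fault tolerance: a broken feed is logged and skipped, polling continues
            onException(Exception.class).handled(true).to("log:feed-errors");

            from("file:data/inbox?delay=60000") // poll the directory every 60s
                .unmarshal().csv()              // parse CSV (camel-csv component)
                .to("file:data/outbox");        // publish results for the existing clients
        }
    }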
If you were to write such a system from scratch it would take weeks and weeks, and you would still probably miss some of the error conditions.
Have a look at the Pentaho ETL tool or Talend Open Studio.
These tools provide access to files, databases and so on. You can write your own plugin or adapter if you need to. Talend generates Java code which you can compile and run.