Appropriate design pattern for an event log parser? - java

Working on a project that parses a log of events, and then updates a model based on properties of those events. I've been pretty lazy about "getting it done" and more concerned about upfront optimization, lean code, and proper design patterns. Mostly a self-teaching experiment. I am interested in what patterns more experienced designers think are relevant, or what type of pseudocoded object architecture would be the best, easiest to maintain and so on.
There can be 500,000 events in a single log, and there are about 60 types of events, all of which share about 7 base properties and then have 0 to 15 additional properties depending on the event type. The type of event is the 2nd property in the log file in each line.
So far I've tried a really ugly imperative parser that walks through the log and processes events line by line. Then I tried a lexer-style approach with a "nextEvent" method, which is called in a loop and each event processed in turn. Then I tried a plain old "parse" method that never returns and just fires events to registered listener callbacks. I've tried both a single callback regardless of event type, and a callback method specific to each event type.
I've tried a base "event" class with a union of all possible properties. I've tried to avoid the "new Event" call (since there can be a huge number of events and the event objects are generally short-lived) by instead using per-type callback methods with primitive property arguments. I've also tried a subclass for each of the 60 event types, with an abstract Event parent holding the 7 common base properties.
I recently tried taking that further and using a Command pattern to put the event-handling code in each event-type subclass. I'm not sure I like it; it's really similar to the per-type callback approach, except that the code lives in an execute method on the type subclasses rather than in per-type callback methods.
The problem is that a lot of the model-updating logic is shared, a lot of it is specific to the subclass, and I'm starting to get confused about the whole thing. I'm hoping someone can at least point me in a direction to consider!

Well... for one thing, rather than a single event class with a union of all the properties, or 61 event classes (1 base, 60 subs), in a scenario with that much variation I'd be tempted to have a single event class that uses a property bag (dictionary, hashtable, whatever floats your boat) to store event information. The type of the event is just one more property value that gets put into the bag. The main reason I'd lean that way is that I'd be loath to maintain 60 derived classes of anything.
The big question is... what do you have to do with the events as you process them. Do you format them into a report, organize them into a database table, wake people up if certain events occur... what?
Is this meant to be an after-the-fact parser, or a real-time event handler? I mean, are you monitoring the log as events come in, or just parsing log files the next day?

Consider a Flyweight factory of Strategy objects, one per 'class' of event.
For each line of event data, look up the appropriate parsing strategy from the flyweight factory, then pass the event data to that strategy for parsing. Each of the 60 strategy objects could be of the same class, just configured with a different combination of field-parsing objects. It's a bit difficult to be more specific without more details.
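To make that concrete, here's a rough sketch of what such a factory could look like; all the names (EventParsingStrategy, ParsedEvent, strategyFor) are mine for illustration, not anything from your code:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one shared parsing strategy per event type, cached flyweight-style.
interface EventParsingStrategy {
    ParsedEvent parse(String[] fields);
}

class ParsedEvent {
    // holds whatever the strategy extracts; details omitted
}

class ParsingStrategyFactory {
    private final Map<String, EventParsingStrategy> cache = new HashMap<>();

    // Returns the shared strategy for an event type, building it on first use.
    EventParsingStrategy strategyFor(String eventType) {
        return cache.computeIfAbsent(eventType, this::buildStrategy);
    }

    private EventParsingStrategy buildStrategy(String eventType) {
        // A real implementation would assemble the field-parsing objects for this type;
        // here we just return a trivial strategy.
        return fields -> new ParsedEvent();
    }
}

The parsing loop would then read the event type from the second field of each line, call strategyFor(type), and hand the line's fields to the returned strategy.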

Possibly Hashed Adapter Objects (if you can find a good explanation of it on the web - they seem to be lacking.)

Just off the top:
I like the suggestion in the accepted answer about having only one class with a map of properties. I also think the behavior can be assembled the same way:
class Event
{
    // maps property name to property value
    private Map<String, String> properties;
    // maps property name to model updater
    private Map<String, ModelUpdater> updaters;

    public void update(Model modelToUpdate)
    {
        for (String key : this.properties.keySet())
        {
            ModelUpdater updater = this.updaters.get(key);
            String propertyValue = this.properties.get(key);
            updater.updateModelUsingValue(modelToUpdate, propertyValue);
        }
    }
}
The ModelUpdater class is not pictured. It updates your model based on a property. I made up the loop; this may or may not be what your algorithm actually is. I'd probably make ModelUpdater more of an interface. Each implementer would be per property and would update the model.
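In case it helps, here's a minimal guess at what that interface could look like; the Model method used in the example implementation is made up:

// Minimal sketch of the ModelUpdater interface mentioned above; one implementation per property.
interface ModelUpdater {
    void updateModelUsingValue(Model model, String propertyValue);
}

// Hypothetical example: an updater responsible for a numeric "count" property.
class CountUpdater implements ModelUpdater {
    @Override
    public void updateModelUsingValue(Model model, String propertyValue) {
        model.incrementCount(Integer.parseInt(propertyValue));   // incrementCount is made up
    }
}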
Then my "main loop" would be:
Model someModel = new Model();
for (String line : logLines)   // logLines: the lines read from the log file
{
    Event e = EventFactory.createFrom(line);
    e.update(someModel);
}
EventFactory constructs the events from the file. It populates the two maps based on the properties of the event. This implies that there is some kind of way to match a property with its associated model updater.
I don't have any fancy pattern names for you. If you have some complex rules, like "if an Event has properties A, B, and C, then ignore the model updater for B", then this approach has to be extended somehow. You'd most likely need to inject some rules into the EventFactory, perhaps using the Rule Object pattern. There you go, there's a pattern name for you!
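For what it's worth, here's a rough sketch of what createFrom could do. The tab-separated layout and the event type being the second field come from the original question; the UPDATERS registry, the propertyNameFor helper, and the Event constructor taking both maps are assumptions for illustration:

import java.util.HashMap;
import java.util.Map;

class EventFactory {
    // Registry: property name -> the ModelUpdater responsible for it.
    private static final Map<String, ModelUpdater> UPDATERS = new HashMap<>();

    static Event createFrom(String line) {
        String[] fields = line.split("\t");
        String eventType = fields[1];              // the type is the second field in each line

        Map<String, String> properties = new HashMap<>();
        Map<String, ModelUpdater> updaters = new HashMap<>();

        // Map each field to a named property and pick up its registered updater.
        for (int i = 0; i < fields.length; i++) {
            String name = propertyNameFor(eventType, i);
            properties.put(name, fields[i]);
            // Fall back to a no-op updater for properties nobody cares about.
            updaters.put(name, UPDATERS.getOrDefault(name, (model, value) -> { }));
        }
        return new Event(properties, updaters);    // assumes Event gains such a constructor
    }

    private static String propertyNameFor(String eventType, int index) {
        // Placeholder: a real implementation would consult per-type metadata.
        return eventType + "#" + index;
    }
}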

I'm not sure I understand the problem correctly. I assume there is some complex 'model updating logic'. Don't distribute it across 60 classes; keep it in one place and move it out of the event classes (a Mediator, sort of).
Your Mediator would work with the event classes (I don't see how you could use the Flyweight here), and the events can parse themselves.
If the update rules are very complicated, you can't really tackle the problem with a general-purpose programming language alone. Consider using a rule-based engine or something of the sort.


Why is EventSourcingHandler (in aggregate object) needed?

Fair warning: I have no idea what I'm doing, so even the asking of this question may go awry.
I'm wanting to update state on a simple object (the Aggregate) and then provide the UI with a projection of the changed object. Here's my aggregate object (command handler exists, but not shown here).
@Aggregate
public class Widget {

    @AggregateIdentifier
    private String id;
    private String color;
    ...

    @EventSourcingHandler
    public void on(ChangeColorEvt evt) {
        color = evt.getColor();
    }
    ...
}
...and here is my projection:
public class WidgetProjection {

    private final EntityManager entityManager;
    private final QueryUpdateEmitter queryUpdateEmitter;
    ...

    @EventHandler
    public void on(ChangeColorEvt evt) {
        ProjectedWidget projection = entityManager.find(ProjectedWidget.class, evt.getId());
        projection.setColor(evt.getColor());
        queryUpdateEmitter.emit(FetchWidget.class,
                query -> evt.getId().startsWith(query.getFilter().getIdStartsWith()),
                projection);
    }
    ...
}
The QueryHandler does what you would expect, finding the instance and returning it.
So here are my questions:
1. Why do I need the EventSourcingHandler in the Aggregate? That is, it appears to do work, but the result isn't stored or sent anywhere.
2. So that becomes my next question: after executing the EventSourcingHandler, is the resulting instance of Widget (not the projection) stored, seen, or sent anywhere?
3. If EventSourcingHandler is indeed needed, is there any way to avoid having two independent copies of my business logic (one in the EventSourcingHandler, and one in the EventHandler)? I really hate the idea of having to update the same business logic in two places.
I appreciate any clarity the group can provide. Thanks for your help!
I hope I can help you out with this question!
The short answer would be:
The necessity of using the @EventSourcingHandler really depends on how you want to model your application.
Now, that doesn't say much, so let me elaborate a little why I am stating this.
In answering your questions I am assuming that your desire is to create an application where you follow the ideas of DDD, CQRS and Event Sourcing - the concepts which Axon tries to simplify for you whilst building your application.
Let me go over your numbered questions from here:
The reason you need @EventSourcingHandler annotated functions in your Aggregate is to (1) update the Aggregate state based on (2) the events it has published. On point 1: why would you update the state of your Aggregate? As your Aggregate is the Command Model of your application, it is tasked with making decisions based on the commands it receives. To be able to make decisions, sometimes an Aggregate needs state, and sometimes it does not. So in the Widget example you've shared, I'd assume the color field is not used at all to drive business logic later on, hence you could perfectly well omit this state from the Aggregate without any problems. With point 2 I am trying to point out that an Aggregate will only ever handle the events which originate from itself. This is because the events are the source of your Aggregate: they constitute all the deltas which have occurred on that given model.
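To make point 1 a bit more tangible: if the color did drive a decision, the command handler would rely on the state that the @EventSourcingHandler maintains. A hypothetical sketch (ChangeColorCmd and the "ignore no-op changes" rule are made up; imports are omitted, as in the snippets above):

@Aggregate
public class Widget {

    @AggregateIdentifier
    private String id;
    private String color;

    @CommandHandler
    public void handle(ChangeColorCmd cmd) {
        // Decision based on aggregate state: ignore a command that changes nothing.
        if (cmd.getColor().equals(color)) {
            return;
        }
        AggregateLifecycle.apply(new ChangeColorEvt(id, cmd.getColor()));
    }

    @EventSourcingHandler
    public void on(ChangeColorEvt evt) {
        color = evt.getColor();   // keeps the state the decision above relies on
    }
}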
Your next question fits nicely with continuing the explanation I started in point 1. The answer is quite simple: your Widget Aggregate is not stored anywhere, not as is. Every time you retrieve your Aggregate from the Repository (this is done automatically for you in Axon), which defaults to an EventSourcingRepository, all the events for that given Aggregate are retrieved from the Event Store. Then an empty Aggregate instance is created and the framework replays all the events it has found onto that exact Aggregate instance. You're thus effectively event sourcing your Aggregate every time a new command comes in. This might sound like overkill, as the number of events for a given Aggregate might grow to quite a large set. That can be solved by doing things like making snapshots of the Aggregate.
If this form of splitting your application into a dedicated part which deals with your business logic (the Command Model) and a part which simply returns answers (the Query Model) is not what you're after, then you could decide to have a State-Stored Aggregate instead. So note, you are not required, at all, to do Event Sourcing when using Axon; it's just the default modus operandi for the framework. I do understand your concern about duplicating logic, though. You can, however, strictly separate things so that the part which makes all the decisions is held in your Aggregate.
The Query Model (in your example the ProjectedWidget) can be stored in whatever format and in whatever database/tool you'd like, ideally without any business logic.
If you do find yourself adding business logic on the query side of your application, that might suggest you should promote that bit of logic so it is driven by an event originating from your Aggregate.
I hope this brings you some insight into why you'd go for Event Sourcing to begin with. This article describes CQRS in a little more detail than I can here, and this link does the same for Event Sourcing; I hope they serve as a more thorough explanation than I've just given you.

Modelling event type objects

We have an application that is composed of a number of independent components and sub-systems. We are looking at implementing a simple event logging mechanism where these components & sub-systems can log some events of interest. Events could be something like
New account created
Flight arrived
Weekly report dispatched to management etc.
As you can see, the event types are heterogeneous in nature and the attributes that need to be logged differ based on the event type. The "new account created" event, for example, will also log the account id, the name of the user who created the account, etc., whereas the "flight arrived" event will log the flight number, arrived at, arrived from, etc.
I'm wondering what is the good way of modelling the event types and the attributes.
One option is to do it object oriented way - to have an AbstractEvent that will have some common attributes (timestamp, message etc) and then create a full hierarchy of classes underneath. The flight events, for example, can look like
abstract class AbstractEvent;
abstract class FlightEvent extends AbstractEvent;
class FlightArrivedEvent extends FlightEvent;
class FlightCancelledEvent extends FlightEvent;
The problem I see with this approach is that we have hundreds of events, which will result in class explosion. Also, whenever we add a new event (very likely), we have to create a class and distribute the new package to all the components and sub-systems.
The second option I can think of is on the other end of the spectrum. Have a simple Event class that contains the basic attributes and wrap a map inside it so that the clients can populate any data they want. The code in that case will look something like this.
class Event {
    private long timestamp;
    private String eventType;
    private Map<String, String> attributes;

    public Event(String eventType) {
        timestamp = System.nanoTime();
        this.eventType = eventType;
        attributes = new HashMap<>();
    }

    public Event add(String key, String value) {
        attributes.put(key, value);
        return this;
    }
}
//Client code.
Event e = new Event("FlightEvent:FlightArrived")
.add("FLIGHT_NUMBER", "ABC123")
.add("ARRIVED_AT", "12:34");
While this is flexible, it suffers from inconsistency. Two components can log the flight number under two different keys (FLIGHT_NUMBER vs. FLGT_NO) and I can't think of a good way to enforce a convention.
Any one have some suggestions that can provide a nice compromise between these two extreme options?
There is a Java event framework (see java.util.EventObject and the Beans framework) but the fundamental question you are asking is not connected with events. It is a design question, and it is this: do I use Java classes in my application to represent classes in my business domain?
It is clear that the different types of event are different "classes" of thing, but for maintainability reasons you are considering representing your business data in a map so that you don't have to write and distribute an actual class. If you take this to its logical extreme, you could design your whole application with no classes and just use maps and name-value pairs for everything - not just events. It would be a mess and you would be debugging it forever because you would have no type-safety whatsoever. The only way of finding out what was in a map would be to look up in some documentation somewhere what someone might have added to it and what type that object might be.
So, here is the thing - you would not actually have gotten rid of your class definition.
You will have moved it into a Word document somewhere that people will have to refer to in order to understand what is in your map. The Word document will need to be maintained, verified and distributed but unlike the Java class, it won't be checked by the compiler and there is no guarantee that the programmers will interpret it correctly.
So I would say, if there is a class, put it in your code and then focus on solving the problems of distributing and versioning the Java classes instead of distributing and versioning Word documents.
I will mention versioning again, as it becomes an issue if you serialise the objects and restore them later, so you need to think about that.
Some caveats:
If you are writing a piece of middleware software that routes events from one system to another system, it may be that you don't need to know or care what the data is, and it might make sense to use a generic holder in this case. If you don't need to look at the data, you don't need a class for it.
You might get complaints from high-level designers and architects about the number of classes and the work they have to do in defining them, compared with a map and name/value stuff. This is because putting classes (i.e., the real design) in Java is harder than putting them in a Word document. It's easier, if you are a high-level hand-waving type of guy, to write something wishy-washy in Word that doesn't need to run or even compile, and then hand the real design work to the programmers to get working.
Can [someone] provide a nice compromise between these two extreme options?
No. There is no generic one-size-fits-all answer to this problem. You will have to find a balance which fits the general design of your product. If you nail everything down, you will need thousands of classes. If you give a lot of leeway, you can get away with a few, but you pay for that freedom with a loss of precision. See my blog post "Designing a Garbage Bin".
Do you have shared attributes? As in: Do you expect to define attributes of events like you define classes right now with very tight-fitting semantics?
That would mean you have a simple event and typed attributes (i.e. a plain String value simply isn't sufficient). You need formatting and validation for attributes or ... the attributes themselves need to be classes.
If this is the case, you can use my type-safe map pattern: http://blog.pdark.de/2010/05/28/type-safe-object-map/
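To sketch the idea (the class and key names below are mine, not taken from the blog post): a typed key object carries both the canonical name and the value type, so components share key definitions instead of inventing their own strings.

import java.util.HashMap;
import java.util.Map;

final class AttributeKey<T> {
    final String name;
    final Class<T> type;

    AttributeKey(String name, Class<T> type) {
        this.name = name;
        this.type = type;
    }
}

final class TypedAttributes {
    private final Map<AttributeKey<?>, Object> values = new HashMap<>();

    <T> void put(AttributeKey<T> key, T value) {
        values.put(key, key.type.cast(value));    // cast enforces the declared type
    }

    <T> T get(AttributeKey<T> key) {
        return key.type.cast(values.get(key));
    }
}

// Shared key definitions prevent FLIGHT_NUMBER vs. FLGT_NO style drift between components.
final class FlightKeys {
    static final AttributeKey<String> FLIGHT_NUMBER = new AttributeKey<>("FLIGHT_NUMBER", String.class);
    static final AttributeKey<String> ARRIVED_AT = new AttributeKey<>("ARRIVED_AT", String.class);
}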
Event type "explosion" is not a problem. In fact it is a desirable approach as it allows the components to be independent of one another. I wouldn't necessarily make all events inherit from a single superclass unless it gives you a lot of reusable code because it can cause dependencies to start proliferating.
I would put the event types in a separate project that will be a dependency of both the publisher and consumer.
What is your communication mechanism for these events between components? JMS? If so you could also consider making your messages XML and using JAXB.
I would definitely discount the map approach as it destroys any hope of polymorphism or any other OO niceties.
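To illustrate the JAXB idea above, here's a minimal sketch; the event class and its fields are made up for the example:

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.StringWriter;

// A typed event class marshalled to XML before being put on the queue.
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class FlightArrivedEvent {

    private String flightNumber;
    private String arrivedAt;

    public FlightArrivedEvent() { }                 // JAXB requires a no-arg constructor

    public FlightArrivedEvent(String flightNumber, String arrivedAt) {
        this.flightNumber = flightNumber;
        this.arrivedAt = arrivedAt;
    }

    public static void main(String[] args) throws Exception {
        Marshaller marshaller = JAXBContext.newInstance(FlightArrivedEvent.class).createMarshaller();
        StringWriter xml = new StringWriter();
        marshaller.marshal(new FlightArrivedEvent("ABC123", "12:34"), xml);
        System.out.println(xml);                    // the XML payload to publish
    }
}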

Design pattern request for event queue entries

I really need your help regarding a design pattern which I can use, because right now I can't think of the best solution.
I need something which can accomplish the following thing.
Currently I have 3 objects :
NotificationOne.java
NotificationTwo.java
NotificationThree.java
Each one, represents basically the same thing, but they have nothing in common when it comes to fields/attributes.
These are actually some JSON's which I will map into objects, when they arrive via a JMS Queue.
Now, what I really need to do is transform these 3 objects into a common object, by interpreting their fields in a particular way for each one. Easy enough up to this point.
The real question is, what would be the best design pattern to apply, considering that in time, there will be more and more types of Notifications which will have to be transformed from something to a common object.
The flow of things will be something like this :
-JSON gets in the queue
-I will map the JSON to a POJO
-Pass the POJO to a possible factory, which will have to deal with each type of Notification class, so it can transform it into something we'll call CommonNotification.
-CommonNotification has to be stored into DB
-A particular field of CommonNotification has to be used as a notification payload.
Based on this flow, what's the best pattern I can use?
Thanks in advance.
You're saying the conversion depends on the type of notification? Like zapl said, in that case, I would create a common interface and let the three notification classes do the conversion. The design is very simple. Everybody who creates a new notification, knows that it should implement the interface and hence, knows that a conversion should be done.
Strictly theoretically speaking you're right in that a POJO should be a POJO, but I wouldn't make my design any more complicated than needed in this case. An interface and polymorphism is the way to go here.
Next to that, creating a common notification doesn't have any side effects in the system. So it's not that your POJO is modifying any system state. It just has some logic which is very POJO specific. It belongs in the POJO.
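A minimal sketch of that interface approach, assuming a conversion method name and a trivial CommonNotification (both made up here):

// Common interface: each notification type knows how to convert itself.
interface ConvertibleNotification {
    CommonNotification toCommonNotification();
}

class NotificationOne implements ConvertibleNotification {
    private String someField;                      // whatever this JSON type carries

    @Override
    public CommonNotification toCommonNotification() {
        // Interpret this type's fields in its own particular way.
        return new CommonNotification(someField);
    }
}

class CommonNotification {
    private final String payload;                  // the field later used as the notification payload

    CommonNotification(String payload) {
        this.payload = payload;
    }
}

The queue consumer then maps the JSON to the matching POJO and simply calls toCommonNotification(), with no type switch, before storing the result.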
Use the Decorator pattern.
The ConcreteDecorator just adds responsibilities to the original Component.
You can have different concrete decorators for different outputs, but the responsibilities from your original component will be there in every output, if that's what you want.
See http://www.oodesign.com/decorator-pattern.html
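If you want to see what that could look like for this case, here's a minimal sketch; all the names are made up:

// Component interface and a concrete decorator that adds one responsibility.
interface Notification {
    String payload();
}

class BasicNotification implements Notification {
    @Override
    public String payload() {
        return "base payload";
    }
}

class TimestampedNotification implements Notification {
    private final Notification wrapped;

    TimestampedNotification(Notification wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public String payload() {
        // The decorator keeps the original component's responsibility and adds its own.
        return System.currentTimeMillis() + " " + wrapped.payload();
    }
}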

Observer Pattern: could event processor affect subject execution?

There are several incarnations of Observer pattern: Listeners, Events etc. (am I right?)
But today we found we have no agreement among team members: is it possible for a Listener (Observer) to affect the execution of the method which invokes it?
The example was proposed as following: when the deal is saved via DealDao, it could fire an event which would be caught by some Listener which will:
update calculated fields in the deal (some fees etc.);
create some dependent entities.
I myself greatly dislike this approach - for example, because if the update of a dependent entity in the listener throws an exception, the deal update should be rolled back too.
But it seems unnatural for the Listener to force the subject to roll back.
In general, if you are changing the behavior of the object then you may be implementing the Strategy pattern rather than the Observer pattern.
Modifying the model itself in response to observed events on it can lead to very complicated behavior, so I wouldn't really recommend it, but it is valid.
Generally try to think about the "flow" of data and operations through your system. If you can make the flow always go in one direction (or at least have as few loops as possible) then the whole thing becomes a lot more simple to manage and understand.
Adding an observer is "downstream" from the original data. If that observer then goes back and modifies the original data, that data is flowing upstream, which makes the behavior of your whole program a lot more complicated. For example, what happens if that change then triggers another event, which then comes back into the same observer again, and so on? Someone tracing through the execution is going to find unexpected data changes until they find that observer, and you run into the potential for recursive loops that overflow the stack, and all sorts of similar fun and games.
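To make the risk concrete, here's a tiny, made-up sketch of a listener writing back into its subject; each write fires another event, and only the rounding check keeps it from looping forever:

import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

class Deal {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private double fee;

    void setFee(double fee) {
        double old = this.fee;
        this.fee = fee;
        changes.firePropertyChange("fee", old, fee);      // every change flows downstream
    }

    void addListener(PropertyChangeListener l) {
        changes.addPropertyChangeListener(l);
    }
}

class FeeRounder {
    void register(Deal deal) {
        deal.addListener(evt -> {
            double value = (Double) evt.getNewValue();
            double rounded = Math.round(value * 100) / 100.0;
            if (rounded != value) {
                deal.setFee(rounded);                      // upstream write: re-triggers this listener
            }
        });
    }
}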
If there are no listeners on DealDao, will the deal be saved at all?
If yes, then you actually have an implicit listener which does the saving. Then, when another listener is added which updates fields in the deal, you have two listeners operating on the same object. This is clearly a violation of the encapsulation principle, which may cause problems. But the Observer pattern is not useless here: you could get the same effect another way. As user Tim B pointed out, first design the flow of data with a minimum of loops, that is, as a graph with nodes and edges, and let each node be a well-defined object (in the OOP sense). Only after that, think about how to implement it; the Observer pattern is a valid option.

Java Swing: keeping the event handling maintainable

In my current project we are using for our Swing client the following patterns:
Business Object (POJO) <--> (mapping) <--> Presentation Model (POJO with property change support) <--> (binding) <--> view components
Everything is fine and works the way we expect.
BUT, we encounter the following problems when views begin to grow:
Many events are fired, resulting in cascading events. One update of a field may result in dozens of subsequent property updates.
As dialog complexity grows, the number of listeners grows as well, and the code begins to get messy and hard to understand.
Edit after first answer:
We don't fire an event if there is no change in value.
We don't add listeners if we have no need for them.
We have screens with very complex rules and a need for notifications from other related panels. So we have lots of useful listeners, and a single user change can trigger many underlying events.
The idea of binding the presentation model directly to the business model is not so good for us: we run some code during the mapping process.
I'm looking for guides, advice, best practices, etc. about building maintainable Swing applications, especially on the event-management side.
There are many ways of reducing the number of events being sent.
Don't propagate an event when there is no change. A typical example of this is the idiomatic way of writing a setter that triggers a PropertyChangeEvent (see below), but it applies to all kinds of events you fire by hand.
public void setMyField(Object newValue) {
    Object oldValue = myField;
    // fire only when the value actually changes
    if ((oldValue == null && newValue != null) || (oldValue != null && !oldValue.equals(newValue))) {
        myField = newValue;
        propertyChangeSupport.firePropertyChange("myField", oldValue, newValue);
    }
}
Only register as an event listener when you become interested in the events, and unregister as soon as you stop being interested. Being a listener, even one that takes no action, forces the JVM to call the various methods used for event propagation. Not being a listener avoids all those calls and makes the application far simpler.
Consider replacing your "POJO to enhanced POJO" mapping with direct instantiation of the enhanced POJO. Or, to put it more simply: make your POJOs authentic Java beans with PropertyChangeEvent handling abilities. To allow them to be easily persisted, an easy solution is, once they are "rehydrated" (loaded from the persistence layer), to add a persistence-updater mechanism as a PropertyChangeListener. This way, when a POJO is updated, the persistence layer gets notified and updates the object in the DB transparently (see the sketch below).
All of these are rather simple pieces of advice, requiring only a good dose of discipline to ensure events are fired only at the right time, and for the right listeners.
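A sketch of that last piece of advice; the repository and its column-update helper are made up for illustration:

import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// A bean with change support; the persistence updater is attached as a listener after loading.
class Customer {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String name;

    public void setName(String newValue) {
        String oldValue = name;
        name = newValue;
        changes.firePropertyChange("name", oldValue, newValue);   // skipped when old and new are equal
    }

    public void addPropertyChangeListener(PropertyChangeListener l) {
        changes.addPropertyChangeListener(l);
    }
}

class CustomerRepository {
    Customer load(String id) {
        Customer customer = fetchFromDatabase(id);
        // Once "rehydrated", attach the persistence updater so later edits reach the DB transparently.
        customer.addPropertyChangeListener(evt ->
                updateColumn(id, evt.getPropertyName(), evt.getNewValue()));
        return customer;
    }

    private Customer fetchFromDatabase(String id) { return new Customer(); }            // placeholder
    private void updateColumn(String id, String column, Object value) { /* placeholder */ }
}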
I suggest a single change event per model. Don't try breaking it down to fields with the hopeless PropertyChangeListener. Use ChangeListener or your own equivalent. (Honestly, the event argument is not helpful.) Perhaps change the 'property' type to be a listenable object, rather than listening to the composite object.
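For example, a coarse-grained model could fire one change notification per update instead of one per field; the model and field names below are illustrative only:

import javax.swing.event.ChangeEvent;
import javax.swing.event.ChangeListener;
import javax.swing.event.EventListenerList;

class DealModel {
    private final EventListenerList listeners = new EventListenerList();
    private final ChangeEvent changeEvent = new ChangeEvent(this);
    private double amount;
    private double fee;

    public void update(double amount, double fee) {
        this.amount = amount;
        this.fee = fee;
        fireChanged();                               // one notification for the whole update
    }

    public void addChangeListener(ChangeListener l) {
        listeners.add(ChangeListener.class, l);
    }

    private void fireChanged() {
        for (ChangeListener l : listeners.getListeners(ChangeListener.class)) {
            l.stateChanged(changeEvent);
        }
    }
}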
The EventListenerList scheme used by most Swing components is fairly lightweight. Be sure to profile the code before deciding on a new architecture. In addition to the usual choices, another interesting approach to monitoring event traffic is suggested by this EventQueue subclass example.
