Migrating a large-scale application from JavaEE to Akka


Suppose I have a very large-scale server-side web application written in JavaEE (and the technologies classically combined with it), and I have decided to migrate it completely to Akka (and the technologies usually combined with it, including moving the code to Scala). The reasons for the migration decision are not important: suppose I have to do it, and that's all there is to it.
My question is: What would be the strategy to follow here, aiming to optimize the migration time and the scalability of the resulting application?
If the question lacks detail, I can provide some, although I would like to hear strategies without being very specific.

This is an open-ended question, but let me try to give you some ideas. Having worked with both J2EE and Play2/Akka/Spray.io (Scala) based systems, I can provide you with the following high-level, general guidance for migration.
Partition your system: Partition your current system based on functionality and rank the partitions according to their criticality to the business, your stakeholders and clients. Partitioning can be done along different dimensions (architectural components at runtime, business features, development teams/modules, etc.). You also need to find the dependencies between these partitions.
Identify candidate partition: Once you have ranked partitions, it’s useful to pick the smallest possible partition that overlaps in as many dimensions as possible and has the least amount of coupling. Usually this is the case if your initial architecture is modular.
Implement a prototype: Take the candidate partition and create a prototype that provides the same functional capability. Now evaluate and compare the new capability against the old in terms of various quality attributes (performance, modifiability, extensibility etc). The prototype will also give you an estimate of technical risk, challenges, and effort.
Create a new architecture: At this point you should have enough input to create the first version of your new architecture. Also identify how the capabilities of other partitions will be implemented in this new architecture. Selecting the most complex partition and trying to map it to this new architecture is a really good exercise and can massively reduce your technical risk in the future.
Field the prototype: Try to field the prototype to a small subset of users/stakeholders and get feedback. Decoupling the prototype using REST/pub-sub interfaces is a good idea.
Plan for migration: Create a plan and schedule for the rest of your system.
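To make the "field the prototype" step more concrete, here is a minimal sketch of a routing facade that sends a small, explicit set of pilot users to the prototype while everyone else stays on the legacy code path. All names (`QuoteService`, `LegacyQuoteService`, `PrototypeQuoteService`, `MigrationRouter`) are hypothetical, and the two implementations are in-process stand-ins; in a real migration the prototype would more likely sit behind a REST or pub-sub interface as suggested above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical capability offered by both the old and the new implementation.
interface QuoteService {
    String quote(String userId);
}

// Stand-in for the existing JavaEE-backed implementation.
class LegacyQuoteService implements QuoteService {
    public String quote(String userId) {
        return "legacy:" + userId;
    }
}

// Stand-in for the prototype (for example, a call to the new service over REST).
class PrototypeQuoteService implements QuoteService {
    public String quote(String userId) {
        return "prototype:" + userId;
    }
}

// Routes pilot users to the prototype and everyone else to the legacy code.
class MigrationRouter implements QuoteService {
    private final QuoteService legacy;
    private final QuoteService prototype;
    private final Set<String> pilotUsers;

    MigrationRouter(QuoteService legacy, QuoteService prototype, Set<String> pilotUsers) {
        this.legacy = legacy;
        this.prototype = prototype;
        this.pilotUsers = pilotUsers;
    }

    public String quote(String userId) {
        return pilotUsers.contains(userId) ? prototype.quote(userId) : legacy.quote(userId);
    }
}

class MigrationRouterDemo {
    public static void main(String[] args) {
        Set<String> pilots = new HashSet<>(Arrays.asList("alice"));
        QuoteService router =
            new MigrationRouter(new LegacyQuoteService(), new PrototypeQuoteService(), pilots);
        System.out.println(router.quote("alice")); // prototype:alice
        System.out.println(router.quote("bob"));   // legacy:bob
    }
}
```

Because callers only see `QuoteService`, widening the pilot group or retiring the legacy implementation does not require changes on the calling side.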
I can be more specific if you provide more targeted questions.

Related

Adding dependencies to microservices

I have built a few microservices that consume a number of external services. A few of these external services are consumed by more than one of the microservices I have built. I have built the connectors to these services as a library project and have included it as a dependency in all my microservice projects. However, I read that all logic for microservices should be self-contained and that duplication of logic is OK. If that's the case, is it recommended that I define these connectors within each microservice instead of having a shared library?

...all logic for microservices should be self-contained and
duplication of logic is ok
I think this is the core of the issue you are struggling with. Is this statement actually true?
A quick google search later:
http://www.simplicityitself.io/our%20team/2015/01/12/sharing-code-between-microservices.html
This article talks about this exact question, which we can now frame as What is the appropriate level of re-use in microservice architecture?
The author provides a list of reasons why developers feel the need to share code, ordered from lowest to highest in terms of coupling and loss of isolation:
Leverage existing technical functionality
Sharing data schemas, using a class, for example, as an enforcement of a shared schema.
Sharing data sources, use of the same database by multiple services.
Though this list covers most of the reasons, I would add another important reason to share code, which is to do with a common framework for rapid standing up of microservices, commonly called the Microservice Chassis pattern.
The author goes on to say:
It is of utmost importance to pin down your motivation for wanting to
share code, as unfortunately there is no right answer to this
question. Like everything else, it’s contextual.
So, all that said, should you centralize your connectors or not?
Well, where do these dependencies fit in on our list? And what degree of coupling can you endure before you're no longer doing microservices but building a monolith instead?
These are not easy questions to answer, but hopefully this will help guide you to the correct conclusion.
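One possible middle ground, sketched below under stated assumptions, is to share only the lowest-coupling item on the list (generic technical functionality) and let each microservice own the connector that gives that functionality business meaning. The names `SharedTransport`, `ExchangeRates` and `BillingExchangeRates`, the `/rates/` path and the plain-number response format are all hypothetical and not taken from the article.

```java
import java.util.function.Function;

// Candidate for a shared library: generic transport with no business meaning.
// The fetch function is injected so that the sketch runs without a network.
class SharedTransport {
    private final Function<String, String> fetch;

    SharedTransport(Function<String, String> fetch) {
        this.fetch = fetch;
    }

    String get(String path) {
        return fetch.apply(path);
    }
}

// Owned by a hypothetical "billing" microservice: exposes only what billing needs.
interface ExchangeRates {
    double rate(String currency);
}

// Service-local connector: interprets the external response for this service only.
class BillingExchangeRates implements ExchangeRates {
    private final SharedTransport transport;

    BillingExchangeRates(SharedTransport transport) {
        this.transport = transport;
    }

    public double rate(String currency) {
        // Assumption: the external service answers with a plain decimal number.
        return Double.parseDouble(transport.get("/rates/" + currency));
    }
}

class ConnectorDemo {
    public static void main(String[] args) {
        SharedTransport stub = new SharedTransport(path -> "1.25");
        ExchangeRates rates = new BillingExchangeRates(stub);
        System.out.println(rates.rate("EUR"));
    }
}
```

With this split, a change in how one service interprets the external API does not force a coordinated library upgrade across all services, while the transport code is still written once.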

java service vs Rules engine implementation

I am confused about choosing between a Java service and IBM Rules Designer. I am aware that a rules engine should be used to reduce development effort and whenever the business requirements are subject to frequent change. But I have requirements that can be developed using either Java or a rules engine. Considering performance, maintenance cost, reusability and other long-term factors, which is the better option to implement? What are the ideal cases for using each of them?

I believe the question is a bit subjective.
For me, if certain "changeable logic" is related to routine work (e.g. settings that are required when introducing a new user to the system, or a new product to be sold), I will consider using a rule engine (or another "soft coding" technique, as mentioned in the comment on the OP), as we should not have to deploy the application again just to do such a routine job.
However, if some logic is related to requirements and the change is not triggered by a routine job, I am inclined to write it in code.
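As a rough illustration of the "soft coding" end of that trade-off, the sketch below keeps routine, frequently changing values in a rule table that can be reloaded at runtime, so adding a new product type does not need a redeployment. This is plain Java with `java.util.Properties`, not IBM Rules Designer or any real rule engine; the class name `DiscountRules` and the rule format `productType=percent` are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical rule table: productType -> discount percentage.
class DiscountRules {
    private Properties rules = new Properties();

    // Replaces the current rules; in practice the text could come from a file or a database.
    synchronized void reload(String text) {
        Properties fresh = new Properties();
        try {
            fresh.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        rules = fresh;
    }

    // Unknown product types get no discount.
    synchronized int discountPercent(String productType) {
        return Integer.parseInt(rules.getProperty(productType, "0"));
    }
}

class DiscountRulesDemo {
    public static void main(String[] args) {
        DiscountRules rules = new DiscountRules();
        rules.reload("book=10\ntoy=5\n");
        System.out.println(rules.discountPercent("book")); // 10
        rules.reload("book=15\n");                          // routine change, no redeploy
        System.out.println(rules.discountPercent("book")); // 15
    }
}
```

Logic that changes only when the requirements themselves change (how a discount is applied, for instance) would stay in ordinary code, in line with the answer above.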

Solution to provide shared entities between multiple Java processes

I am trying to reconstruct a flow of information from multiple parts handled by different Java processes. Please note that I don't generate the flows, I just read some information about them.
I've tried using MySQL (MyISAM/InnoDB tables) with INSERT ON DUPLICATE KEY UPDATE using an id for each flow. I've also tried storing all the pieces of information and running a query at the end to get the full information. Neither of these approaches yielded the performance needed.
I'm looking for a solution that will allow me to have a set of shared objects between multiple Java processes. The objects should be persistent between runs and fast to lookup/update concurrently (>100k lookups/updates per second).
I've thought of a few solutions including:
NoSQL: something like MongoDB, HBase etc.
a caching solution like EhCache, Memcached etc.
The problem is I don't have any experience with any of these solutions. So, what would you recommend that fits the following criteria:
very fast on a single system. Most of the applications I mentioned were built for distributed systems, but that's not the case here.
easy to learn/use (I want to be able to prototype it in a day)
mature technology
free to use even for commercial purposes
preferably open-source

You could try a separate Java process that coordinates between the others. This process would hold the information to pass over to the main processes. You could wire them up with RMI.
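A minimal sketch of the state such a coordinating process might hold is shown below, assuming each flow is identified by a string id and each part can simply be appended. `ConcurrentHashMap.merge` performs the "insert the first part, otherwise update the existing entry" step atomically, which is the in-memory analogue of INSERT ... ON DUPLICATE KEY UPDATE. The class name `FlowStore` and the `|` separator are hypothetical; exposing the store over RMI and persisting it between runs are not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory store owned by the coordinating process.
class FlowStore {
    private final ConcurrentMap<String, String> flows = new ConcurrentHashMap<>();

    // Atomically stores the first part of a flow or appends to the parts already seen.
    void addPart(String flowId, String part) {
        flows.merge(flowId, part, (existing, added) -> existing + "|" + added);
    }

    // Returns the parts collected so far, or null if the flow is unknown.
    String get(String flowId) {
        return flows.get(flowId);
    }
}

class FlowStoreDemo {
    public static void main(String[] args) {
        FlowStore store = new FlowStore();
        store.addPart("flow-1", "header");
        store.addPart("flow-1", "payload");
        System.out.println(store.get("flow-1")); // header|payload
    }
}
```

Whether this reaches the required throughput once remote calls are added is something only a measurement on the target machine can tell.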

If you only want to exchange objects between Java applications, you could also look into tuple spaces. There are implementations of spaces specifically for Java, called JavaSpaces, which should be able to do what you need. I'm not sure whether they can keep up with the performance requirements, though. I'm also not sure how widely this technology is still being used, since it only supports Java and isn't as flexible as NoSQL stores are these days.
Wikipedia has a more detailed description and list of different implementations, many of which are open source.
The other option is to go with Redis: you have notifications there, and it can surely scale to the requirements you are looking for.

The old (legacy?) solution is JavaSpaces. However, from a software architect's point of view, I would say distributed caches are the replacement for that nowadays. In particular, take a look at Hazelcast and Infinispan.
From the performance viewpoint I am not happy with the performance of the "big" distributed caching solutions, when only a single in-memory cache is needed, see my writeup on the cache2k benchmarks page (hazelcast needs to be added here).
Anyway, please clarify your problem statement first, because your question falls into the XY problem category. You are not describing the actual problem, and your question just boils down to a "fast, reliable distributed objects" solution. What kind of data comes in? What is the rate? How is it accessed? What consistency guarantees need to be met, considering that writing and reading happen in parallel?
By the term "flow of information" it sounds more like a complex event processing problem to me.

Real life experience with the Axon Framework [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
As part of researching CQRS for use with a project, I ran across the Axon Framework, and I was wondering if anyone has any real life experience with it. Just to be clear, I'm asking about the framework, not CQRS as an architectural pattern.
My project already uses Spring and Spring Integration, which fits nicely with Axon's own requirements, but before I dedicate a lot of time to it, I would like to know if anyone has some first-hand experience. In particular, I'm interested in possible pitfalls that are not immediately apparent from the documentation.

> The framework relies heavily on eventsourcing, which means that all state changes are written to the data store as events.
This is completely untrue; it does not rely heavily on event sourcing. One of the implementations for storing the aggregate in this framework uses event sourcing, but you can just as easily use the provided classes to work with a standard relational model.
It is just better with event sourcing.
> So you have a historical reference of all your data. This is nice but makes changing your domain after you've gone in production a very daunting proposition especially if you sold the customer on the system's "strong auditability"
I don't think it is a lot easier with a standard relational model that only stores the current state.
> The framework encourages denormalizing your data, to the point that some have suggested having a table per view in the application. This makes your application extremely difficult to maintain, especially when the original developers are gone
This is related not to the framework but to the architectural pattern in use (CQRS).
And, sorry to say, having one denormalizer per view is a good idea, as each one stays a simple object.
Maintenance is easy because the SQL queries and inserts are also easy.
So this argument is not very strong.
How about a view that uses a 1000-table model with inner joins everywhere and complex SQL queries?
Again, CQRS helps because, basically, the view data is just a SELECT * from the table that corresponds to the view.
> if somehow you made a mistake in one of the eventhandlers, your only option is to "replay" the eventlog, which depending on the size of your data can take a very long time. The tooling for this however is non-existent.
I agree that there is currently a lack of tooling to replay events and that this can take a long time. However, it is theoretically possible to replay only a portion of the events rather than the entire content of the event store.
> Replaying can have side effects, so developers become scared of doing it
That replaying events has side effects is untrue. For me, a side effect means modifying the state of the system. In an event-sourced CQRS application, the state is stored in the event store, and replaying the events does not modify the event store.
You can have side effects on the query side of the model, yes. But it does not matter if you have made a mistake there, because you are still able to correct it and replay the events once again.
> it's extremely easy to have developers mess up using this framework. if they don't store changes to domain objects in events, next time you replay your events you are in for a surprise.
Well, if you misuse and misunderstand the architecture, the concepts, etc., then OK, I agree with you. But perhaps the problem is not the framework here.
> Should you store delta's ? absolute values ? if you don't keep tabs on your developers you are bound to end up with both and you will be f***ed
I could say that about every system; it is not directly related to the framework itself. It's like saying, "Java is crap because you can mess everything up if someone codes a bad implementation of the hashCode and equals methods."
As for the last part of your comment, I have already seen samples like helloWorld with the Spring framework.
Of course it is completely useless in a simple example.
Please be careful in your comment to distinguish between the concepts (CQRS + event sourcing) and the framework.

Since you have stated that you want to use CQRS for your project (and I assume that the JVM is your target platform) I think Axon Framework is an excellent choice.
I have built a fairly complex trading platform on it (no, the trading sample is not complex) and I have not seen any obvious flaws of the framework.
Since I use EventSourcing, the test fixtures made it very easy to write BDD style "given, when, then" tests. This lets you treat an aggregate as a black box and concentrate on checking that the correct set of events come out when you put in a certain command.
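The "given, when, then" style can be sketched without any framework, as below. This is deliberately not Axon's test fixture API; it is a plain Java illustration in which events are simple strings, and `AccountAggregate`, the event names and the command method are all hypothetical. The aggregate is rebuilt from past events (given), handles one command (when), and the test inspects only the events that come out (then).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical event-sourced aggregate; events have the form "<Name>:<amount>".
class AccountAggregate {
    private long balance;

    // "Given": rebuild the aggregate state from its past events.
    static AccountAggregate fromHistory(List<String> pastEvents) {
        AccountAggregate aggregate = new AccountAggregate();
        for (String event : pastEvents) {
            aggregate.apply(event);
        }
        return aggregate;
    }

    // "When": handle a command and return the resulting events.
    List<String> withdraw(long amount) {
        if (amount > balance) {
            return Collections.singletonList("WithdrawalRejected:" + amount);
        }
        String event = "Withdrawn:" + amount;
        apply(event);
        return Collections.singletonList(event);
    }

    // State changes happen only here, driven by events.
    private void apply(String event) {
        String[] parts = event.split(":");
        long amount = Long.parseLong(parts[1]);
        if (parts[0].equals("Deposited")) {
            balance += amount;
        } else if (parts[0].equals("Withdrawn")) {
            balance -= amount;
        }
    }
}

class GivenWhenThenDemo {
    public static void main(String[] args) {
        AccountAggregate account = AccountAggregate.fromHistory(Arrays.asList("Deposited:100"));
        // "Then": only the emitted events are checked, the aggregate stays a black box.
        System.out.println(account.withdraw(30));  // [Withdrawn:30]
        System.out.println(account.withdraw(500)); // [WithdrawalRejected:500]
    }
}
```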
About pitfalls: before jumping in, make sure
That you have the concepts of CQRS figured out.
Make a list (paper, whiteboard, whatever) of all your aggregates, command handlers, event handlers, sagas, commands and events. This is the hard part of building your system, figuring out what it should do and how. After this, the reference manual should show you how to wire it all together with Axon.
Some non Axon specific points:
Being able to rebuild the view store from events is a concept of EventSourcing, and not something that is exclusive to Axon, but I found it pretty easy to create a service that will send me all events from an aggregate type, aggregate id or a certain event type.
Being able to build a new reporting component one year after the project is launched and instantly get reports on data from the time of the project launch and onwards is awesome.
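Rebuilding a view from stored events can be sketched as follows. Again this is plain Java rather than Axon code: the event log is a read-only list of strings in a hypothetical `<accountId>:<signedAmount>` format, and `BalanceView` is an invented projection. A new reporting component added long after launch would be built the same way, by feeding it the full log from the beginning.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query-side projection: current balance per account.
class BalanceView {
    private final Map<String, Long> balances = new HashMap<>();

    // Event handler: folds one event into the view.
    void on(String event) {
        String[] parts = event.split(":");
        balances.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
    }

    // Rebuild: start from an empty view and replay the log in order.
    // The log itself is only read, never modified.
    static BalanceView rebuild(List<String> eventLog) {
        BalanceView view = new BalanceView();
        for (String event : eventLog) {
            view.on(event);
        }
        return view;
    }

    long balanceOf(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }
}

class ReplayDemo {
    public static void main(String[] args) {
        List<String> eventLog = Arrays.asList("a:100", "b:50", "a:-30");
        BalanceView view = BalanceView.rebuild(eventLog);
        System.out.println(view.balanceOf("a")); // 70
        System.out.println(view.balanceOf("b")); // 50
    }
}
```

If a handler bug is found later, fixing `on` and calling `rebuild` again yields a corrected view from the same unchanged log.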

I've been using AxonFramework for more than one year on a complex project developed for a big bank.
The requirements were demanding, customer's expectations were high, and release times narrow.
I chose AxonFramework because, at project kick-off, it was the most complete and best documented implementation of CQRS available in Java: well designed and easy to integrate, test and extend.
After more than one year, I think these considerations are still valid and current.
Another consideration guided my choice: I wanted the commitment to such a difficult project to become a training opportunity for me and other members of the team.
We started to develop with AxonFramework version 1.0 and moved to version 1.4 as newer versions were released.
Our team experience with CQRS and the implementation provided by the AxonFramework was absolutely positive.
It provided us with a consistent and uniform way to develop each feature, which guided us and made us feel at ease.
Without it, some features of the application would have been much more complicated to develop.
I am referring mainly to the various long-running processes that need to be handled and to the related compensation logic, but also to the many pieces of business logic that were needed here and there, which fitted nicely and stayed uncoupled in the event-driven architecture promoted by CQRS.
Our choice was to be conservative in the write model, so we preferred a JPA based persistence instead of the event sourced one.
The query model is made up of views. We have tried to make sure that each view contains all the required data from a single page using intermediate views when necessary.
Anyhow, we developed the write model as if we were applying event sourcing, so we took care to modify the state of aggregates exclusively through events. When the customer asked for a cloning function for a very complex aggregate, it was just a matter of replaying the source events (with UUIDs translated) to a brand new instance. The downside in this case was event upcasting (but this functionality is greatly improved in the imminent 2.0 version).
As in every project, we found a lot of bugs during development, mainly in our own code, but also in components supposed to be mature and stable, like the application server, the IoC container, the cache, the workflow engine and some of the other libraries that are easily found in any large J2EE application.
Like any other human product, AxonFramework was not immune to bugs either, but, surprisingly for a young and niche project like this, they were few, not critical, and quickly resolved by new releases.
The kind and immediate support provided by the author on the mailing list is another invaluable feature and helped me a lot when I was in trouble.
The application was released in production a year ago and is currently maintained and under active development of new features.
The customer is satisfied and asks for more.
When to use AxonFramework is more a matter of when to use CQRS. For an answer, it's worth going back to the official documentation: http://www.axonframework.org/docs/1.4/introduction.html#d4e51
In our case it was definitely worth it.

The OP specifically asks about the pitfalls relating to the Axon Framework rather than CQRS. This makes the question difficult to answer, as Axon started as a fairly faithful implementation of the famous book by Eric Evans.
The main advantage is that it does exactly what it says on the tin: it handles the hard parts of a CQRS based design for you: aggregates, sagas, event sourcing, command handlers, event handlers, BASE consistency etc. When you follow the best practices, you end up with a highly responsive and horizontally scalable application. If you use it with event sourcing, your data is completely auditable, and at least in theory, you can determine the state your application had at any given point in time. Tooling to do this is not provided; you will have to roll your own.
The main developer of the framework is very approachable and extremely knowledgeable on the subject of high-performance and scalable computing in Java. He tends to answer every question on the mailing list within a few hours. This is both an advantage and the major pitfall: at this time (early 2014), the Axon Framework depends heavily on one person (as of 2018 the framework is supported by the company AxonIQ). The rest of the pitfalls I would like to mention are probably more the result of event sourcing than of CQRS or Axon.
Design your data model very carefully up front. Though it is easy to add to, making fundamental changes to your data model can be very difficult. If you make a fundamental mistake in the data model, your application may not perform well, or may even fail to work at all. For example, if you choose a tree-shaped data model with one long-lived aggregate root at the top, this aggregate may grow very large as it accrues more and more events over time, and it may take a long time to load and store. I don't know what will happen if this goes on until an instance of the aggregate no longer fits in RAM, but I imagine it could be bad. Don't do it that way.
Another pitfall (event sourcing related) is that, after a number of revisions, it can become increasingly difficult to reason about the state of an aggregate, as you sometimes have to keep in mind not only what the code does today, but also what it did in the past. This definitely makes replaying (a portion of) the event store to rebuild a view table a non trivial task.
Fixing data errors can be more difficult than with a 'traditional' design. Rather than a simple SQL statement, you will often need to make a command to change the state of your application. If the error in your data was caused by a faulty event handler, you can usually just fix the bug, clear the snapshots and let the events for the aggregate be replayed. If your bug caused spurious events to be applied, it can be much more trouble to fix. The faulty events will stay in the event store, and you may have to apply some new ones to restore your data to the correct state, or change the code to ignore or fix their behaviour.

While the framework itself is written decently enough, using it in a real-world project has been nothing short of a nightmare, and the choice of this framework was, in my opinion, a major contributing factor to this project's failure.
The framework relies heavily on eventsourcing, which means that all state changes are written to the data store as events. So you have a historical reference of all your data. This is nice but makes changing your domain after you've gone in production a very daunting proposition especially if you sold the customer on the system's "strong auditability"
You cannot have ops guys make ad-hoc changes to the database
The framework encourages denormalizing your data, to the point that some have suggested having a table per view in the application. This makes your application extremely difficult to maintain, especially when the original developers are gone
if somehow you made a mistake in one of the eventhandlers, your only option is to "replay" the eventlog, which depending on the size of your data can take a very long time. The tooling for this however is non-existent. Replaying can have side effects, so developers become scared of doing it
it's extremely easy to have developers mess up using this framework. if they don't store changes to domain objects in events, next time you replay your events you are in for a surprise. Should you store delta's ? absolute values ? if you don't keep tabs on your developers you are bound to end up with both and you will be f***ed
There is practically no adoption of this framework, so googling for answers will not do you any good
Even though the framework does not yet support distribution, it's written with it in mind, and the APIs are a pain to work with because of it. Firing off an event is async by default, and if you want to check whether an exception was raised while executing the command, say a duplicate username exception, you need to pass a listener, which is a future, to your command handler. Then you wait for the future's result to come in, handle any checked exceptions, InterruptedException etc., and only then can you grab the exception that was thrown from the future. Of course, which exceptions a command can raise is not apparent from the API, defeating the purpose of checked exceptions.
Check out some of the example apps. I somehow need a unit of work listener to create an addressbook application? My goodness...
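The asynchronous exception handling complained about above looks roughly like the sketch below. It uses only the JDK's `CompletableFuture`, not Axon's command bus or callback types, so it shows the general shape of the problem rather than the framework's actual API. `UserCommands`, `DuplicateUsernameException` and `awaitFailure` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;

// Hypothetical business exception raised by the command handler.
class DuplicateUsernameException extends RuntimeException {
    DuplicateUsernameException(String username) {
        super("Username already taken: " + username);
    }
}

// Hypothetical asynchronous command handler.
class UserCommands {
    private final Set<String> usernames = ConcurrentHashMap.newKeySet();

    CompletableFuture<Void> register(String username) {
        return CompletableFuture.runAsync(() -> {
            if (!usernames.add(username)) {
                throw new DuplicateUsernameException(username);
            }
        });
    }
}

class AsyncCommandDemo {
    // Waits for the command result and returns the failure cause, or null on success.
    static Throwable awaitFailure(CompletableFuture<?> result) {
        try {
            result.get();
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return e;
        } catch (ExecutionException e) {
            return e.getCause(); // the exception thrown inside the handler
        }
    }

    public static void main(String[] args) {
        UserCommands commands = new UserCommands();
        System.out.println(awaitFailure(commands.register("bob"))); // null
        System.out.println(awaitFailure(commands.register("bob")) instanceof DuplicateUsernameException); // true
    }
}
```

Note that nothing in the signature of `register` tells the caller that `DuplicateUsernameException` is possible, which is the point the answer makes about checked exceptions.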

I am currently with a team working on an online casino platform, launching our brand Casumo this summer. The domain and platform are built using the Axon Framework, and so far it has served us solidly.
A lot of time has been saved by not having to build all the infrastructure needed for command handling, event routing, event sourcing, snapshotting etc., and the APIs are really nice to work with. The one bug we found in the framework so far was fixed in .. release 12 hours later, and Allard is always quick to take suggestions on new features and to discuss ways to leverage the framework to fulfill your needs.

Anyone using JavaSpaces technology? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Are there real practical uses of JavaSpaces technology out there and how exactly is it implemented?

We are currently using javaspaces (the Sun outrigger implementation), to coordinate loosely coupled processes. The idea behind it is compelling, and the API is very simple. The actual implementation has been a problem. It's built on Jini, so 5 or 6 processes are required to bring up a space. And, at least in Sun's implementation, there is no way to have it communicate over specific ports, which makes firewalls a bit of a pain.
The other issue that we have run into is that there is no implied ordering in the space. So if you put 5 objects in, and your template on the read/take matches all 5, it is unspecified which one you will get. Depending on the application, this may or may not be an issue.
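To illustrate template matching and why ordering should not be relied on, here is a toy in-memory space. It is not the JavaSpaces API (no Jini, leases, transactions or timeouts); `ToySpace`, `TaskEntry` and the method names are hypothetical. As in the description above, a `null` field in a template acts as a wildcard. This toy happens to return the oldest matching entry, but a caller written against the real specification must treat the choice as arbitrary.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical entry type; null fields in a template match anything.
class TaskEntry {
    final String type;
    final Integer id;

    TaskEntry(String type, Integer id) {
        this.type = type;
        this.id = id;
    }

    boolean matches(TaskEntry template) {
        return (template.type == null || template.type.equals(type))
            && (template.id == null || template.id.equals(id));
    }
}

// Toy space: write adds an entry, takeIfExists removes some matching entry.
class ToySpace {
    private final List<TaskEntry> entries = new ArrayList<>();

    synchronized void write(TaskEntry entry) {
        entries.add(entry);
    }

    // Returns and removes a matching entry, or null if none matches.
    // Which matching entry is returned is an implementation detail.
    synchronized TaskEntry takeIfExists(TaskEntry template) {
        for (Iterator<TaskEntry> it = entries.iterator(); it.hasNext();) {
            TaskEntry candidate = it.next();
            if (candidate.matches(template)) {
                it.remove();
                return candidate;
            }
        }
        return null;
    }

    synchronized int size() {
        return entries.size();
    }
}

class ToySpaceDemo {
    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        for (int i = 0; i < 5; i++) {
            space.write(new TaskEntry("job", i));
        }
        // The template matches all five entries; do not depend on which one comes back.
        TaskEntry any = space.takeIfExists(new TaskEntry("job", null));
        System.out.println(any.type + " taken, " + space.size() + " left");
        // If a specific entry is needed, say so in the template.
        TaskEntry third = space.takeIfExists(new TaskEntry(null, 3));
        System.out.println("explicitly took id " + third.id);
    }
}
```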

GigaSpaces is a mature version of JavaSpaces. It is widely used in financial applications, which are kept quiet.
As for the implementation, it is basically a transactional object database on top of Jini. The queries are similar to db4o.

I've seen it used in a financial application, mostly for managing compute workers (grid style) where entries were written into the space from front-tier applications and pulled out by workers by matching on a field showing work was needed. Results could be written back into the space, triggering a notify registered by the front-tier app which then reads back the finished piece of work.
For compute workers it's OK, but lack of ordering may be an issue for you (if only because of unpredictability) - some implementations have features to enforce FIFO ordering. It was also used for long term data storage as it was persistent, but I don't think that was a good idea. The admin tooling wasn't good enough to make it manageable and performance suffered due to the volume of data.
Dan Creswell's Blitz JavaSpaces implementation was used - it's got a good range of features (can run in transient or persistent modes), is designed to be robust (with transaction logging) and retain high performance, and it's very tunable. As with the other Jini services, you can configure the "exporter" to have it listen on specific ports to make firewalling easier - SSL transports and full PKI were used too and are made possible by Jini's abstraction of communication.
I think Gigaspaces is the only implementation that has continued to innovate by extending the specification in numerous ways, which is nice to see. They've made it fit a wide variety of use-cases and added implementation features such as clustering and high availability. Using it would worry me though, as I'd be much happier seeing two or more implementations of these features in the community, given Gigaspaces is fairly proprietary.

I believe Orbitz, which is a reservation system for hotels, runs on Jini.
Based on Java Posse episodes #82, #84 and #86, which are an interview with Vin Simmons, this technology is sometimes used in military or financial applications, which are unfortunately kept quiet.

I used it a few years back but it probably has not changed much.
@Keith: It is (or at least used to be) possible to start all the services in a single process/JVM, and I think there is documentation out there on how to do this.
I believe Jini/Javaspaces is used in a few large applications (ticketing, cell phones etc) in Europe. Also used by GE Aircraft for research and analysis.
SORCER lab at Texas Tech has a large SOA architecture built on top of Jini/Javaspaces and you may be able to find some help there.

I'm not aware of any new usage of JavaSpaces at this point in time. For distributed computing, most large-scale systems are being built with In-Memory Data Grid technology or partitioned NoSQL-like solutions. (I see a lot of Oracle Coherence being used, but that's probably because I work with it.)
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.
