We have an issue where a database table has to be updated with the status of a particular entity. Presently, it's all Java code with a lot of if conditions and an update to the status. I was thinking along the lines of using a workflow engine, since there can be multiple flows in the future. Is it overkill to use a workflow engine here... where do you draw the line?
It depends on the complexity of your use case.
In a simple use case, we have a database column updated by multiple consumers for each stage in an Order lifecycle. This is done by a web service calling into the database.
The simple lifecycle goes ACKNOWLEDGED > ACCEPTED/REJECTED > FULFILLED > CLOSED, all in the same column of the same table, and is executed in Java classes with no workflow engine.
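For a lifecycle this small, a plain enum-based state machine keeps the if conditions in one place. A minimal sketch of that no-workflow approach (class, table and status-column names are illustrative, not taken from the actual system):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import javax.sql.DataSource;

// Minimal sketch of the simple, no-workflow approach: the allowed
// transitions live in one enum instead of scattered if conditions.
public enum OrderStatus {
    ACKNOWLEDGED, ACCEPTED, REJECTED, FULFILLED, CLOSED;

    /** Returns true if this status may move on to the given next status. */
    public boolean canTransitionTo(OrderStatus next) {
        switch (this) {
            case ACKNOWLEDGED: return next == ACCEPTED || next == REJECTED;
            case ACCEPTED:     return next == FULFILLED;
            case FULFILLED:    return next == CLOSED;
            default:           return false; // REJECTED and CLOSED are terminal
        }
    }
}

// Caller side: validate the transition, then issue the single-column UPDATE
// (table and column names are placeholders).
class OrderStatusUpdater {
    private final DataSource dataSource;

    OrderStatusUpdater(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    void updateStatus(long orderId, OrderStatus current, OrderStatus next) throws SQLException {
        if (!current.canTransitionTo(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not allowed");
        }
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("UPDATE orders SET status = ? WHERE id = ?")) {
            ps.setString(1, next.name());
            ps.setLong(2, orderId);
            ps.executeUpdate();
        }
    }
}
```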
A workflow engine is suited to a more complex use case involving actions across multiple data providers (e.g. a database, content management, document management, or a search engine), multiple parallel processes, forking based on the success/failure of a previous step, sending an email at a certain step, or offline error alerting.
You can look at Apache ODE to implement this.
We have an issue where a database table has to be updated with the status of a particular entity. Presently, it's all Java code with a lot of if conditions and an update to the status.
That sounds like a localized, one-off update; there is no need for orchestrating actions among workflow participants.
Maybe a rule engine is better suited for this. Drools could be a good candidate: when X, then Y.
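At its core a rule engine just externalizes that "when X then Y" pairing. A tiny plain-Java sketch of the idea (Drools would express the same thing declaratively in its own rule language; everything below is made up for illustration):

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// A rule is just a condition ("when") paired with an action ("then").
// A rule engine such as Drools lets you declare rules like these outside
// the application code and decides which ones fire for a given fact.
public class SimpleRuleEngine<T> {

    public static final class Rule<T> {
        final Predicate<T> when;
        final Consumer<T> then;

        public Rule(Predicate<T> when, Consumer<T> then) {
            this.when = when;
            this.then = then;
        }
    }

    private final List<Rule<T>> rules;

    public SimpleRuleEngine(List<Rule<T>> rules) {
        this.rules = rules;
    }

    /** Apply every rule whose condition matches the given fact. */
    public void fireAll(T fact) {
        for (Rule<T> rule : rules) {
            if (rule.when.test(fact)) {
                rule.then.accept(fact);
            }
        }
    }
}
```

A rule would then look something like new Rule<>(order -> order.isRejected(), order -> notifyCustomer(order)) (illustrative names), kept outside the core application code.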
If you're using Spring, this is a good article on how to implement your requirement
http://www.javaworld.com/javaworld/jw-04-2005/jw-0411-spring.html
I think you should consider a workflow engine. Workflow should be separated from application logic.
Reasons:
Maintainability: easier to modify and add new flows, and even easier to replace with another workflow engine.
Business process management: workflows are mostly software representations of business processes, and they are usually designed by process designers (non-technical people), so it is not a good idea to hard-code them inside the application. Instead, BPM products such as ALBPM or jBPM, which support graphical workflow design, should be used.
Monitoring business flows: they are often monitored by top-level managers and used to make strategic decisions.
Easier for Data mining/Reports/Statistics.
ALBPM (now Oracle BPM) is a commercial tool from Oracle, suitable for large-scope projects.
My recommendation is jBPM, an open-source tool from JBoss. Unlike ALBPM, which requires a separate DB and application server, it can be packaged with your application and run as just another module inside it. I think it is suitable for your project.
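A rough sketch of what an embedded jBPM setup can look like through the KIE API (the process id and variables are placeholders; the actual lifecycle would be modelled graphically in BPMN):

```java
import java.util.HashMap;
import java.util.Map;

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;
import org.kie.api.runtime.process.ProcessInstance;

public class OrderProcessStarter {

    public static void main(String[] args) {
        // Load the processes packaged on the classpath (via kmodule.xml).
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer();
        KieSession session = container.newKieSession();

        // Hand the order data to the engine; the status flow is driven by the
        // BPMN model instead of hard-coded if conditions.
        Map<String, Object> vars = new HashMap<>();
        vars.put("orderId", 42L);           // illustrative variable names
        vars.put("status", "ACKNOWLEDGED");

        ProcessInstance instance = session.startProcess("com.example.orderLifecycle", vars);
        System.out.println("Started process instance " + instance.getId());

        session.dispose();
    }
}
```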
I am trying to provide users with a way to generate their own workflows as part of the system.
These workflows will be custom paths an order will take depending on customer requirements.
For example: If a customer requires us to sign a set of terms and conditions, the order should not be able to be approved unless a T&C document has been uploaded.
I have been looking at using bpmn-js for the frontend and executing the resulting BPMN 2.0 file every time something workflow-related changes (i.e. hooks on document uploads, in this case), but it doesn't look like users will be able to select actual system functionality with that library out of the box. Should I try to extend that library, or is there something else I could use instead?
I have been looking into using Camunda as well, but it would be nice to not expect the users to use a second application.
Design your application in a way that the business process is driven by the process engine. When a process is initiated, it starts a process instance. From there the process engine (embedded in your existing Java application or running as a standalone back-end service) determines whether business rules (DMN) need to be evaluated (are all required data and approvals present?), whether services need to be called (direct invocation of your Java code, or the external task pattern completed by technical workers in any language), or whether user tasks need to be completed by human process participants (based on assignments determined by the engine).
If humans are involved, the UI/client queries the process engine for pending user tasks and updates those tasks when the humans have performed them.
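To illustrate that interaction, here is a minimal sketch against the Camunda Java API (the process key, candidate group and variable names are placeholders):

```java
import java.util.List;

import org.camunda.bpm.engine.ProcessEngine;
import org.camunda.bpm.engine.ProcessEngines;
import org.camunda.bpm.engine.TaskService;
import org.camunda.bpm.engine.task.Task;
import org.camunda.bpm.engine.variable.Variables;

public class OrderApprovalClient {

    public static void main(String[] args) {
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();

        // Start a new instance of the deployed BPMN model (placeholder key).
        engine.getRuntimeService().startProcessInstanceByKey("orderApproval",
                Variables.createVariables().putValue("orderId", 42L));

        // The UI/client queries the engine for pending user tasks...
        TaskService taskService = engine.getTaskService();
        List<Task> tasks = taskService.createTaskQuery()
                .taskCandidateGroup("sales")   // placeholder group
                .list();

        // ...and completes them once the human has done the work, passing back
        // whatever data the process needs in order to continue.
        for (Task task : tasks) {
            taskService.complete(task.getId(),
                    Variables.createVariables().putValue("tcDocumentUploaded", true));
        }
    }
}
```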
The process engine determines the next steps based on its interpretation (no code is generated) of the BPMN 2.0-standard process model. Versioning for these models is provided out of the box: newly started processes automatically run on the latest version, while running process instances continue their life cycle on the version they were started on (unless they are migrated).
It is usually not an issue if users need to access a dedicated modelling environment at design time; they just should not have to work with two applications at runtime. In any case, it is also easily possible to integrate the modelling part into the same application via the bpmn-js library you mentioned.
The "selection of system functionality" is done by selecting the implementation type and setting the corresponding attributes. The BPMN-JS library is generic in this regard. have a look at how this is done in the Camunda Modeler (https://camunda.com/products/camunda-bpm/modeler/):
Java: https://docs.camunda.org/get-started/java-process-app/service-task/
Spring: https://docs.camunda.org/get-started/spring/service-task/
External: https://docs.camunda.org/manual/latest/user-guide/process-engine/external-tasks/
I am researching how to build a general application or microservice for building workflow-centric applications. I have done some research on frameworks (see below), and the most promising candidates share a hard reliance on RDBMSes to store workflow and process state, combined with JPA-annotated entities. In my opinion, this hurts the possibility of designing a general, data-driven workflow microservice. It seems that a truly general workflow system could be built on NoSQL solutions like MongoDB or Cassandra by storing data objects and rules as JSON or XML. These would allow executing code to enforce types or schemas while using one or two simple Java objects to retrieve and save entities. As I see it, this could let a single application be deployed as a Controller for different domains' Model-View pairs without modification (admittedly given a very clever interface).
I have tried to find a workflow engine/BPM framework that supports NoSQL backends. The closest I have found is Activiti-Neo4J, which appears to be an abandoned project providing a connector between Activiti and Neo4J.
Is there a Java workflow engine/BPM framework that supports NoSQL backends and generalizes data objects without requiring specific POJO entities?
If I were to give up on my ideal, magically general solution, I would probably choose a framework like jBPM or Activiti, since they have great feature sets and are mature. In trying to find other candidates, I have found a veritable graveyard of abandoned projects like this one on Java-Source.net.
Yes, Temporal Workflow has pluggable persistence and runs on Cassandra as well as on SQL databases. It has been tested with up to 100 Cassandra nodes and can support tens of thousands of events per second and hundreds of millions of open workflows.
It allows you to model your workflow logic as plain old Java classes and ensures that the code is fully fault tolerant and durable across all sorts of failures, including local variables and threads.
See this presentation that goes into more details about the programming model.
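As a rough illustration of that programming model (using the Temporal Java SDK annotations; the workflow and activity names are invented for the example):

```java
import java.time.Duration;

import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

// The workflow is a plain Java class; the engine persists its progress so that
// local variables and blocking calls survive process crashes and restarts.
@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    void processOrder(String orderId);
}

@ActivityInterface
interface OrderActivities {
    void reserveInventory(String orderId);
    void chargeCustomer(String orderId);
}

class OrderWorkflowImpl implements OrderWorkflow {

    private final OrderActivities activities =
            Workflow.newActivityStub(OrderActivities.class,
                    ActivityOptions.newBuilder()
                            .setStartToCloseTimeout(Duration.ofMinutes(5))
                            .build());

    @Override
    public void processOrder(String orderId) {
        // Each step is retried and its completion recorded durably by the engine.
        activities.reserveInventory(orderId);
        activities.chargeCustomer(orderId);
    }
}
```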
I think the reason why workflow engines are often based on an RDBMS is not the database schema itself but rather the combination with a transaction-safe data store.
Transactional robustness is an important factor for workflow engines, especially for long-running or nested transactions which are typical for complex workflows.
So maybe this is one reason why most engines (like Activiti) did not focus on a data-driven approach. (I am not talking about data replication here, which is covered by NoSQL databases in most cases.)
If you take a look at the Imixs-Workflow project, you will find a different approach based on Java Enterprise. This engine uses a generic data object which can hold any kind of serializable data values. The problem of data retrieval is solved with Lucene search technology: each object is translated into a virtual document with name/value pairs for each item. This makes it easy to search through the processed business data as well as to query structured workflow data such as the status information or the process owners. So this is one possible solution.
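Roughly, the work item is just a schema-less bag of name/value items. A small sketch assuming Imixs' ItemCollection (the item names and values are illustrative, not a prescribed schema):

```java
import java.util.List;

import org.imixs.workflow.ItemCollection;

public class InvoiceWorkitemExample {

    public static ItemCollection buildWorkitem() {
        // The generic data object: arbitrary serializable name/value items
        // instead of a fixed JPA entity per business object.
        ItemCollection workitem = new ItemCollection();
        workitem.replaceItemValue("customer.name", "Acme Corp");
        workitem.replaceItemValue("invoice.total", 1500.0);
        workitem.replaceItemValue("invoice.items", List.of("A-100", "B-200"));

        // Each item becomes a field of a virtual Lucene document, so both the
        // business data and the workflow metadata stay searchable.
        // The work item would then be handed to the workflow engine for processing.
        return workitem;
    }
}
```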
Apart from that, you always have the option to store your business data in a NoSQL database. This is independent of the workflow data of a running process instance, as long as you link both objects together.
Going back to the aspect of transactional robustness, it is a good idea to store the reference to your NoSQL data storage in the process instance, which is transaction aware. Also take a look here.
So the only problem you can run into is the fact that it is very hard to synchronize a transaction context from EJB/JPA to an "external" NoSQL database. For example: what will you do when your data was successfully saved into your NoSQL data store (e.g. Cassandra), but the transaction of the workflow engine fails and a rollback is triggered?
The designers of the Activiti project have also been aware of the problem you describe, but knew it would take quite a rewrite to implement such flexibility, which, arguably, should have been designed into the project from the beginning. As you'll see in the link below, the problem has been a lack of interfaces against which implementations other than a relational database could be coded. With version 6 they ripped off the bandaid and refactored the framework around a set of interfaces for which different implementations (think Neo4J, MongoDB, or whatever other persistence technology you fancy) can be written and plugged in.
In the linked article below, they provide some code examples for a simple in-memory implementation of those interfaces. It looks pretty cool and sounds like it might be precisely what you're looking for.
https://www.javacodegeeks.com/2015/09/pluggable-persistence-in-activiti-6.html
My main goal is to provide a search application written in jQuery that is based on Solr. (For those unfamiliar with Solr, just assume it's a REST API that can return search results.)
For this goal I wrote many small applications and servlets, each of which does an ad-hoc task.
For example:
SearchApp - a jQuery app in which an end user can perform searches.
SolrProxy - a Java servlet that acts as a proxy between the SearchApp and Solr. One of the things it does is log the user requests for later analysis.
StatsApp - a servlet that analyses the user activity and returns the data as JSON.
Indexer - a Java application that indexes data into Solr according to my requirements. In this process it also reads from a SQL Server DB and then performs some update commands against that DB.
IndexerServlet - an asynchronous servlet that uses the Indexer to make it possible to trigger indexing via an HTTP request.
Nutch - an open-source project that indexes data into Solr for other requirements not covered by the Indexer.
(MAYBE) - some service that will run Nutch on a schedule.
And more components might be added.
It seems a bit wrong to have multiple Java projects, each doing a single task, instead of one project that handles most of the components.
Any ideas and insights on this?
Should I combine all the Java apps into a single project? Should I use some kind of framework for this? Or should I leave it as it is now?
I don't think it's a bad idea that you have all these separate applications. They all seem to be doing one thing, and doing it well. What you can do is expose them via a unified interface. So essentially you have a facade that sits in front of all these disparate services and presents an abstract, uniform interface. The consumers of this service will have no idea what sits behind the facade. This is just as well, because now you can discretely update and replace individual components without affecting the others. If you had combined them all into one, you would have to push a new release every time you modified one of the components.
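A minimal sketch of such a facade in Java (the interface and method names are hypothetical; each method would simply delegate to the existing component behind it):

```java
// Hypothetical unified interface in front of the existing components.
// Consumers only ever see this facade; SolrProxy, StatsApp and the Indexer
// can be updated or replaced independently behind it.
public interface SearchPlatformFacade {

    /** Delegates to the SolrProxy (and logs the request for later analysis). */
    String search(String query);

    /** Delegates to the StatsApp and returns the usage statistics as JSON. */
    String usageStats();

    /** Delegates to the IndexerServlet to kick off an asynchronous index run. */
    void triggerIndexing();
}
```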
I've been asked to build an ETL-style application that transfers information from one data source to another. At the moment I've decided on a three-layer architecture, but I would like to find out more about best practices, as well as about the life cycle described on this Wikipedia page:
http://en.wikipedia.org/wiki/Extract,_transform,_load
Four-layered approach for ETL architecture design
Functional layer: Core functional ETL processing (extract, transform, and load).
Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
Utility layer: Common components supporting all other layers.
Real-life ETL cycle
The typical real-life ETL cycle consists of the following execution steps:
Cycle initiation
Build reference data
Extract (from sources)
Validate
Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
Stage (load into staging tables, if used)
Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
Publish (to target tables)
Archive
Clean up
I don't know what your situation is or what your requirements are, but you're likely overthinking the problem.
The name alone is "the" architecture:
Extract
Transform
Load
Exporting a DB table to a CSV can be considered the "ET", while loading the CSV is the "L". Most ETL problems are simply not complicated.
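For example, the "ET" half of that trivial case is little more than a query plus a formatting loop. A minimal sketch (connection URL, table, columns and file name are placeholders):

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TableToCsv {

    public static void main(String[] args) throws Exception {
        // Extract: read the source table (connection details are placeholders).
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/source", "user", "pass");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name, amount FROM orders");
             PrintWriter out = new PrintWriter("orders.csv")) {

            out.println("id,name,amount");
            while (rs.next()) {
                // Transform: trivial formatting/cleanup on the way out.
                out.printf("%d,%s,%.2f%n",
                        rs.getLong("id"),
                        rs.getString("name").trim(),
                        rs.getDouble("amount"));
            }
        }
        // Load ("L"): bulk-load orders.csv into the target, e.g. with the
        // target database's COPY/LOAD facility or another small loop.
    }
}
```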
Beyond that, you should grab any of the 1 or 2 million ETL and ESB packages already available in Java, free and commercial, from simple libraries to full-boat processing systems, and simply adopt the one you like best.
Get a whiteboard, string some bubbles together with lines, and turn that into code.
To answer the question "What's the best practice?": it depends on what you are trying to accomplish.
To simplify let's assume you are doing one of the following:
You are building a data warehouse that will restructure the data in some way
You are moving data from point A to point B, but you are not restructuring the data
When I use the word "restructuring", I mean changing the grain or lowest level of detail of a table.
For 1, the ten steps outlined in your question are generally followed. General best practices:
As much transformation logic as possible is pushed down into the database rather than done in the ETL software (ETL software is generally slower); see the sketch after this list.
Validate, Transform, and Audit steps are used to employ whatever Master Data Management (MDM) standards your organization uses
For 2, things are much more straightforward, so either method outlined in your question can be used.
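As a sketch of the first best practice above (pushing transformation down into the database), a single set-based statement fired from the ETL job lets the database do the heavy lifting in one pass; the connection details, table and column names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PushdownTransform {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/dwh", "etl", "secret");
             Statement stmt = con.createStatement()) {

            // One set-based statement: the database aggregates the staging rows
            // into the target grain instead of the ETL tool doing it row by row.
            stmt.executeUpdate(
                "INSERT INTO sales_fact (customer_id, sale_date, total_amount) " +
                "SELECT customer_id, sale_date, SUM(amount) " +
                "FROM   staging_sales " +
                "WHERE  amount IS NOT NULL " +
                "GROUP  BY customer_id, sale_date");
        }
    }
}
```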
I'm asking for a suitable architecture for the following Java web application:
The goal is to build several web applications which all operate on the same data. Consider a banking system in which account data can be accessed by different web applications: by customers (online banking), by service personnel (mostly read-only), and by the account administration department (an admin tool). These applications run as separate web applications on different machines, but they use the same data and a set of common data-manipulation and search queries.
A possible approach is to build a core application which covers the common needs of the clients, namely data storage, manipulation, and search facilities. The clients can then call this core application to fulfil their requests. The requirement is that the applications are built on top of a Wicket/Spring/Hibernate stack as WARs.
To get a picture, here are some of the possible approaches we thought of:
A The monolithic approach. Build one huge web application that fits all needs (this is not really an option)
B The API approach. Build a core database access API (JAR) for data access/manipulation. Each web application is build as a separate WAR which uses the API to access a database. There is no separate core application.
C RMI approach. The core application runs as a standalone application (possibly a WAR) and offers services via RMI (or HttpInvoker).
D WS approach. Just like C but replace RMI with Web Services
E OSGi approach. Build all the components as OSGi modules and which run in an OSGi container. Possibly use SpringSource dm Server or ModuleFusion. This approach was not an option for us for some reasons ...
Hope I could make the problem clear. We are currently going with option B, but I'm not very confident about it. What are your opinions? Any other solutions? What are the drawbacks of each solution?
I think that you have to go in the opposite direction - from the bottom up. Of course, you have to go back and forth to verify that everything fits together, but here is the general direction:
Think about your data - the DB schema, how important transactions are (for example, in banking systems everything is about transactions), etc.
Then define a common access method - anything from a set of stored procedures to a distributed transaction engine...
The next step is the business logic/presentation - what can be generalized and what is subject to customization.
And the final stage is the interfaces, visualisation, and reports.
B, C, and D are all just different ways to accomplish the same thing.
My first thought would be to simply have all consumer code connecting to a common database. This is certainly doable, and would eliminate the code you don't want to place in the middle. The drawback, of course, is that if the schema changes, all consumers need to be updated.
Another solution you may want to consider is giving each consumer its own database, using some sort of replication to keep them in sync.
It looks like A and E are out of the picture as you have stated in your question for various reasons. Option A would be one huge application which would make maintenance difficult in the future.
B, C and D are essentially the same architecturally, since they all involve remote access to common libraries from the various web applications; the only difference is the transport mechanism. I would recommend implementing this with EJB 3 or Spring if possible, rather than with your own RMI libraries, since either of these provides a good framework over RMI/web services.
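For instance, with EJB 3 the common business service can be exposed remotely with little more than two annotations; a sketch (interface and method names are illustrative):

```java
import java.math.BigDecimal;

import javax.ejb.Remote;
import javax.ejb.Stateless;

// Remote business interface, shared with the web applications as a small API JAR.
@Remote
public interface AccountService {
    BigDecimal balanceOf(String accountNumber);
    void credit(String accountNumber, BigDecimal amount);
}

// Deployed once in the core application server; the container takes care of
// remoting, pooling and transactions.
@Stateless
class AccountServiceBean implements AccountService {

    @Override
    public BigDecimal balanceOf(String accountNumber) {
        // delegate to the shared DAO/Hibernate layer (placeholder)
        return BigDecimal.ZERO;
    }

    @Override
    public void credit(String accountNumber, BigDecimal amount) {
        // delegate to the DAO layer within a container-managed transaction
    }
}
```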
So I think this problem basically boils down to the following two options:
1) Include the business and DAO layer classes as a common JAR in the deployment of each web application.
Advantages:
Deployment is easier.
Applications will perform better initially since there is no remote access to other servers.
Disadvantages:
You cannot add more hardware to the middle tier specifically (service and DAO layers) since it is included in each web application.
Other business teams in the organisation will not have access to your business services since there is no remote interface.
2) Deploy the business service and DAO layer classes in a separate application server and expose business methods remotely.
Advantages:
You can scale up the business service and DAO layer as needed depending on load from the various web applications calling it.
Other applications in the organisation can make use of your interfaces if needed.
More scalable
You get all the advantages of Java EE.
Disadvantages:
More complex deployment.
Another server to maintain and monitor.
Could be slower since calls will be made over the network although this shouldn't be too much of a problem.
In both cases if the interfaces change the client code will need to change so this isn't a factor in the decision. Transactions should be handled on the business service method level so this shouldn't be a factor either.
I think it depends on the size of the applications as well and how scalable the solution needs to be to warrant the extra complexity of option 2 above.
I think you need a separate application that all the client applications use as their data layer. The reason for this is that you want to ensure they are all accessing the database in the same way. There are also race conditions you can get into that database transactions may not be able to prevent. The other reason is that using the database as a form of RPC is a known antipattern. If all your apps access the database directly, you will almost inevitably end up with some "event" table that the various applications poll periodically... don't do that.
Apart from the responses already provided, if you are considering having multiple applications work with the database at the same time, consider a distributed cache as part of your solution as well. The beauty of a distributed cache is that it can be accessed by multiple applications at the same time, on top of being distributed. I am not sure whether this holds true for all of the Java variants, such as Ehcache, etc., as I do not come from a Java background.
What we are currently doing is abstracting the data one level further than before. We now have a DAL that can be accessed directly, but we have put a "Model Factory" in front of it. The purpose of the Model Factory is to broker both the cache and the data layer, acting as a pass-through. So the caller always calls the Model Factory, never the DAL or the caching code directly. This abstraction layer basically retrieves data from the DAL on a cache miss, without adding complexity to the API.
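A minimal Java sketch of that pass-through pattern (the class names are hypothetical, and the in-memory map stands in for whatever distributed cache you pick):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The "Model Factory": callers never touch the DAL or the cache directly.
public class ModelFactory<K, V> {

    public interface DataAccessLayer<K, V> {
        V load(K key); // hits the database
    }

    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for a distributed cache
    private final DataAccessLayer<K, V> dal;

    public ModelFactory(DataAccessLayer<K, V> dal) {
        this.dal = dal;
    }

    /** Returns the entity from the cache, falling back to the DAL on a miss. */
    public V get(K key) {
        return cache.computeIfAbsent(key, dal::load);
    }
}
```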