I have a general design problem regarding Cucumber.
I'm trying to build some Cucumber scenarios around a specific external process that takes some time. Currently, the tests look like this:
Given some setup
When I perform X action
And do the external process
Then validate some stuff
I have a number of these tests, and it would be massively more performant if I could do the external process just once for all these scenarios.
The problem I'm running into is that there doesn't seem to be any way to communicate between scenarios in Cucumber.
My first idea was to have each test running concurrently and have them hit a wait and poll the external process to see if it's running before proceeding, but I have no way of triggering the process once all the tests are waiting since they can't communicate.
My second idea was to persist data between tests. Each test would stop at the point where the process needs to run, then somehow hand its CucumberContext to a follow-up scenario that validates things after the process. However, I'd have to save this data to the file system and pick it up again, which is a very ugly way to handle it.
Does anyone have advice on either synchronizing steps in cucumber, or creating "continuation" scenarios? Or is there another approach I can take?
You can't communicate data between scenarios, nor should you try to. Each scenario (by design) is its own separate thing, which sets and resets everything.
Instead, change how you execute the external process: rather than running it every time, run it once, record the result, and reuse that result in later executions of the scenario.
You could change your scenarios to reflect this e.g.
Given I have done x
And the external process has been run for x
Then y should have happened
You should also consider the user experience of waiting for the external process. For new behaviours you could do something like
When I do x
Then I should see I am waiting for the external process
and then later do another scenario
Given I have done x
And the external process has completed
Then I should see y
You can use something like VCR to record the results of executing your external process. (https://rubygems.org/gems/vcr/versions/6.0.0)
Note: VCR is ruby specific, but I am sure you can find a java equivalent.
Now that your external process executes pretty much instantly (a few milliseconds), you no longer have any need to share things between scenarios.
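For the Java side, here is a minimal sketch of the "run it once, replay the result" idea. It is not VCR and not a Cucumber API: RecordedProcess, resultFor and realRuns are names I made up for illustration, and I'm assuming the external process can be modelled as a function from an input string to a result string.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: runs a slow process at most once per input and replays the recorded result. */
final class RecordedProcess {
    private final ConcurrentMap<String, String> recordings = new ConcurrentHashMap<>();
    private final Function<String, String> slowProcess; // stand-in for the real external process
    private final AtomicInteger realRuns = new AtomicInteger();

    RecordedProcess(Function<String, String> slowProcess) {
        this.slowProcess = slowProcess;
    }

    /** Returns the recorded result for this input; the real process only runs on the first call. */
    String resultFor(String input) {
        return recordings.computeIfAbsent(input, key -> {
            realRuns.incrementAndGet();
            return slowProcess.apply(key);
        });
    }

    /** How many times the real process was actually executed. */
    int realRuns() {
        return realRuns.get();
    }
}
```

A step definition for "And the external process has been run for x" could then call resultFor("x") on one shared instance. Each scenario still sets up its own state; only the recorded output of the slow step is reused.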
I have a Java application named 'X'. In Windows environment, at a given point of time there might be more than one instance of the application.
I want a common piece of code to be executed sequentially in application 'X' no matter how many instances of the application are running. Is that possible? Any suggestions will help.
Example: I have a class named Executor where a method execute() will be invoked. Assuming there might be two or more instances of the application at any given point of time, how can I have the method execute() run sequentially across the different instances?
Is there something like a lock which can be accessed from two instances, so each can see whether the lock is currently held?
I think what you are looking for is a distributed lock (i.e. a lock which is visible and controllable from many processes). There are quite a few 3rd party libraries that have been developed with this in mind and some of them are discussed on this page.
Distributed Lock Service
There are also some other suggestions in this post which use a file on the underlying system as a synchronization mechanism.
Cross process synchronization in Java
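As a rough illustration of the file-based approach, here is a sketch using java.nio's FileChannel.lock(). CrossProcessLock and runExclusively are hypothetical names, and the lock file path is whatever all instances agree on. Note that these locks are held on behalf of the whole JVM, so they serialize separate processes but are not a substitute for synchronized between threads inside one process, and whether other programs are forced to respect them is platform dependent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: serializes a task across JVMs with an OS-level file lock. */
final class CrossProcessLock {
    private CrossProcessLock() {}

    /** Blocks until the exclusive lock on lockFile is available, runs the task, then releases the lock. */
    static void runExclusively(Path lockFile, Runnable task) {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // released automatically when the block exits
            task.run();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each instance would wrap its call, e.g. CrossProcessLock.runExclusively(sharedPath, () -> executor.execute()), where sharedPath and executor are placeholders for your own objects.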
To my knowledge, you cannot do this that easily. You could implement TCP calls between processes... but I wouldn't advise it.
A better option is to create an external process in charge of executing the task, and have every instance request execution by sending a message to a JMS queue that the executor process consumes.
...Or maybe you don't really need several processes running at the same time. What you might need is a single application with several threads doing things at the same time and one thread dedicated to the Executor. That way, synchronizing the execute() method (or the whole Executor) would be enough and would save you some time.
You cannot achieve this with Executors or anything like that because Java virtual machines will be separate.
If you really need to synchronize between multiple independent instances, one approach is to dedicate an internal port and implement a simple internal server within the application. Look into ServerSocket, or RMI as a full-blown solution if you need extensive communication. The first instance binds to the dedicated application port and becomes the master node. All later instances find the application port taken, but can then use it to make an HTTP (or plain TCP/IP) call to the master node to report the activities they need done.
As you only need to execute some action sequentially, any slave node may ask the master to do it rather than executing it itself.
A potential problem with this approach is that if the user shuts down the master node, it may be complex to arrange for another running node to take its place. If only one node is active at any time (receiving input from the user), it may take over the role of master node after discovering that the master is not responding and the port is no longer occupied.
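The "first instance to bind the port is the master" step could look roughly like this. MasterElection and tryBecomeMaster are hypothetical names and the port number is up to you; the sketch only covers the election, not the request protocol between slaves and master.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Optional;

/** Hypothetical helper: the instance that manages to bind the dedicated port acts as master. */
final class MasterElection {
    private MasterElection() {}

    /**
     * Tries to claim the dedicated port on the loopback interface.
     * Returns the listening socket if this instance became the master, or empty
     * if the port could not be bound (typically because another instance holds it).
     */
    static Optional<ServerSocket> tryBecomeMaster(int port) {
        try {
            return Optional.of(new ServerSocket(port, 50, InetAddress.getLoopbackAddress()));
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

An instance that gets an empty result would open a plain Socket to the same port and send its request instead. Binding to the loopback address keeps the port from being reachable from other machines.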
A distributed queue could be used for this type of load-balancing. You put one or more 'request messages' into a queue, and the next available consumer application picks one up and processes it. Each such request message could describe the task to process.
This type of queue could be implemented as JMS queue (e.g. using ActiveMQ http://activemq.apache.org/), or on Windows there is also MSMQ: https://msdn.microsoft.com/en-us/library/ms711472(v=vs.85).aspx.
If performance is an issue and you have C/C++ developers available, the 'shared memory queue' could also be interesting: shmemq API
I have 2 Java processes. Process1 is responsible for importing some external data into the database; Process2 runs the rest of the application using the same database, i.e. it hosts the web module and everything else. Process1 would normally import data once a day.
What I require is that when Process1 has finished its work, it notifies Process2 so that Process2 can perform some subsequent tasks. That is the limit of their interaction with each other. No other data has to be shared later.
Now, I know I can do this in one of the following ways:
Have Process1 write an entry in the database when it has finished its execution and have a daemon thread in Process2 looking for that entry. Once this entry is read, complete the task in Process2. Even though this might be the easiest to implement in the existing ecosystem, I think having a thread poll the database just for one notification looks kind of ugly. However, it could be optimised by starting the thread only when the import job starts and killing it after the notification is received.
Use a socket. I have never worked with sockets before, so this might be an interesting learning curve. But after my initial readings I am afraid it might be an overkill.
Use RMI
I would like to hear from people who have worked on similar problems: which approach did you choose, and why? I would also like to know what would be an appropriate solution for my problem.
Edit.
I went through this but found that, for a beginner in interprocess communication, it lacks basic examples. That is what I am looking for in this post.
I would say take a look at Chronicle-Queue
It uses a memory-mapped file and saves data off-heap (so no problem with GC). It also provides TCP replication for failover scenarios.
It scales pretty well and supports distributed processing when more than one machine is available.
I am stress-testing a Domino Java agent which modifies potentially many documents in potentially many databases. I am load-testing the agent with huge databases, and my agents are being shut down by the agent manager because they run longer than the limit set in the server document field 'Max LotusScript/Java execution time:'.
I am aware that I can write a program document to let the agent run without any timing but don't want to do this since you lose the handle to the agent.
I am aware that I need to program the agent so that I can save the 'task' document (which contains all the instructions for the agent) in an 'unfinished' state so that I can start from where I stopped.
When writing LotusScript agents, it was possible to put cleanup code in the 'Terminate' event of the agent, and I am missing this option for my Java agent.
At the moment my best idea is to have a 'timeout' field in my configuration, which would be filled by a value smaller than the server cut-off time. This would imply, however, that I would be asking at very regular intervals the question 'Do I still have time to start the next action?' which I assume is going to kill the performance.
What's your experience with best practice for this case?
Apart from DOTS and a Java Application approach, here are two other alternatives.
Option 1: This is where you want to use a program document and still have some visibility to interact with your agent.
Add checks in your code that look for either a file on disk or a document field. If the file is there, or the field is set, tell your application to start cleaning up.
There would be more overhead in checking a document than in checking a file on disk.
Option 2: You can use a java.util.Timer object.
Have this set to execute for ServerMaximumTimeout - X minute/s. In the timer code throw a TimeoutException. Have your main code catch this Exception and do the clean up.
Then in your finally block clean up the timer object if it hasn't died yet.
More details on this in another question.
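A sketch of Option 2 follows, with one deliberate change: an exception thrown inside a TimerTask is raised on the timer thread, not on the agent's main thread, so here the timer sets a flag that the main loop checks between work items instead of throwing. Reading an AtomicBoolean once per item is cheap, which also speaks to the performance worry in the question. DeadlineRunner, runUntilDeadline and the idea of modelling each action as a Runnable are my assumptions, not Domino API.

```java
import java.util.Iterator;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: stops starting new work items once a time budget has elapsed. */
final class DeadlineRunner {
    private DeadlineRunner() {}

    /**
     * Runs work items until they are exhausted or budgetMillis has elapsed.
     * Returns the number of items completed, so the caller can record where to resume.
     */
    static int runUntilDeadline(Iterator<Runnable> workItems, long budgetMillis) {
        AtomicBoolean timeUp = new AtomicBoolean(false);
        Timer timer = new Timer("agent-deadline", true); // daemon thread, will not block JVM exit
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                timeUp.set(true);
            }
        }, budgetMillis);

        int completed = 0;
        try {
            // The flag is only checked between items; a running item is never interrupted.
            while (!timeUp.get() && workItems.hasNext()) {
                workItems.next().run();
                completed++;
            }
        } finally {
            timer.cancel(); // clean up the timer whether or not it fired
        }
        return completed;
    }
}
```

The budget would be the server cut-off minus a safety margin large enough for the slowest single item plus saving the 'task' document in its unfinished state.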
I am looking for ideas on how to deal with a search-related task which takes longer than usual (in human terms, more than 3 seconds).
I have to query multiple sources, sift through information for the first time and then cache it in the DB for later quick return.
The context of the project is J2EE, Spring and Hibernate (on top of SpringROO)
The possible solutions I could think of:
-On the webpage, let the user know that the task is running in the background and, if possible, give them a queue number or waiting time. Refresh the page via a controller which checks whether the task is done; when it's done (i.e. the search result is prepared and stored in the DB), forward to a new controller and fetch the result from the DB
-The background tasks could be done with Spring Task executor. I am not sure if it is easy to give a measure of how long it would take. It would probably be a bad idea to let all the search terms run concurrently, so some sort of pooling will be a good idea.
-Another option to use background tasks is to use JMS. This is perhaps a solution with more control (retries etc)
-Spring batch also comes to mind
Please suggest how you would do it. I would greatly appreciate a semi-detailed+ description. The sources of info can be many and can be sequential in nature, so it can take up to 4-5 minutes for the results to form. It is also possible that such tasks run automatically in the background without user intervention (i.e. to update from the sources)
From a user perspective, I use AJAX. The default web page contains some kind of "Busy" indicator. When the AJAX request completes, the busy indicator is replaced with the result.
In the background, request handlers are already multi-threaded. So you can simply format the default result, close&flush the output, and do the processing in the current thread. You should put something in the session or DB to make sure that no one can start the same heavy process a second time.
Running task pools in a web container is possible but there are some caveats, especially how to synchronize startup/shutdown: Do you want your web server to "hang" during shutdown while some thread is busy collecting your results? The additional load should also be considered. It might be better to use JMS and offload the strain to a second server dedicated to building the search results.
Such a system will scale much better if your searches start to become a burden. It also makes it trivial to automate the process by writing a small program which posts searches in the JMS queue.
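If you do stay inside one web container, the "make sure no one can start the same heavy process a second time" guard plus a bounded pool could be sketched with java.util.concurrent like this. SearchJobs and startOnce are hypothetical names, and the in-memory map stands in for the session/DB marker mentioned above, so it does not survive a restart.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: runs each distinct search at most once on a bounded thread pool. */
final class SearchJobs {
    private final ExecutorService pool;
    private final ConcurrentMap<String, Future<String>> jobs = new ConcurrentHashMap<>();

    SearchJobs(int maxConcurrentSearches) {
        this.pool = Executors.newFixedThreadPool(maxConcurrentSearches);
    }

    /** Starts the search for this term unless one was already started; either way returns its Future. */
    Future<String> startOnce(String term, Callable<String> search) {
        return jobs.computeIfAbsent(term, key -> pool.submit(search));
    }

    /** Lets running searches finish but accepts no new ones; call this on web app shutdown. */
    void shutdown() {
        pool.shutdown();
    }
}
```

The controller behind the AJAX poll would call isDone() on the returned Future and only fetch the result once it reports true.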
I've solved this problem in the past doing something like this:
When the user initiates a long running task, I open a popup window that displays the task status. The task status includes a name and estimated time to complete
This task is also stored in my "app" (this can be stored in the DB, session, or application context), so the user can continue doing other things on my web app while having an easy way to navigate back to the running task.
I stored my tasks in a DB, so I could manage what happens on startup and shutdown of the web app. This requires storing the progress of the task in the DB.
The tricky part is displaying results to the user. If you use the method I've described, you'll need to store results in either the DB, session, or application context.
This system I've described is pretty heavyweight, and may be overkill for your application.
In response to the comment "so what do you use to do the background computing. I have asked this before":
I use java.util.concurrent. A lot of this depends on the nature of your application. Is the task (or steps in the task) idempotent? How critical is it that it run to completion? If you have a non-idempotent task that must run to completion, I would say you generally must record every piece of work you do, and you must do that piece of work within a transaction. For example, if one of your tasks is to email a list of people (this is definitely not idempotent) you would do the emailing in a "transaction" (I'm using the term lightly here) and store your progress after each transaction is complete.
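To make the "store your progress after each transaction" idea concrete, here is a minimal sketch. ResumableTask and resume are hypothetical names, and send and saveProgress stand in for your real emailing and persistence code.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntConsumer;

/** Hypothetical helper: processes a list one item at a time and checkpoints after each item. */
final class ResumableTask {
    private ResumableTask() {}

    /**
     * Sends to every recipient from index alreadyDone onwards, saving the count of
     * completed recipients after each one. Returns the final count.
     */
    static int resume(List<String> recipients, int alreadyDone,
                      Consumer<String> send, IntConsumer saveProgress) {
        int done = alreadyDone;
        for (String recipient : recipients.subList(alreadyDone, recipients.size())) {
            send.accept(recipient);      // the non-idempotent piece of work
            saveProgress.accept(++done); // checkpoint immediately afterwards
        }
        return done;
    }
}
```

On restart you would load the last saved count and pass it as alreadyDone. A crash between sending and saving can still repeat that one item, which is why the two steps belong in the same "transaction" wherever the underlying systems allow it.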
In podcast #15, Jeff mentioned he twittered about how to run a regular event in the background as if it was a normal function - unfortunately I can't seem to find that through Twitter. Now I need to do a similar thing and am going to throw the question to the masses.
My current plan is that when the first user (probably me) enters the site, it starts a background thread that waits until the allotted time (hourly on the hour) and then kicks off the event, blocking the others (I am a Windows programmer by trade so I think in terms of events and WaitOnMultipleObjects) until it completes.
How did Jeff do it in Asp.Net and is his method applicable to the Java web-app world?
I think developing a custom solution for running background tasks isn't always worth it, so I recommend using the Quartz Scheduler in Java.
In your situation (need to run background tasks in a web application) you could use the ServletContextListener included in the distribution to initialize the engine at the startup of your web container.
After that you have a number of possibilities to start (trigger) your background tasks (jobs), e.g. you can use Calendars or cron-like expressions. In your situation you should most probably settle on SimpleTrigger, which lets you run jobs at fixed, regular intervals.
The jobs themselves can be described easily too in Quartz, however you haven't provided any details about what you need to run, so I can't provide a suggestion in that area.
As mentioned, Quartz is one standard solution. If you don't care about clustering or persistence of background tasks across restarts, you can use the built in ThreadPool support (in Java 5,6). If you use a ScheduledExecutorService you can put Runnables into the background thread pool that wait a specific amount of time before executing.
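For the non-clustered case, a ScheduledExecutorService sketch for an "hourly on the hour" job could look like this. BackgroundScheduler, every and millisUntilNextHour are hypothetical names. One thing to be aware of: with scheduleAtFixedRate, if the job throws an exception, later runs are suppressed, so catch inside the job.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a job repeatedly on a single background thread. */
final class BackgroundScheduler {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** Runs the job once after the initial delay, then again every period until cancelled or shut down. */
    ScheduledFuture<?> every(long initialDelay, long period, TimeUnit unit, Runnable job) {
        return scheduler.scheduleAtFixedRate(job, initialDelay, period, unit);
    }

    /** Milliseconds from now until the next full hour, usable as the initial delay. */
    static long millisUntilNextHour(LocalDateTime now) {
        LocalDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour).toMillis();
    }

    /** Stops the background thread; call this when the web app is undeployed. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Usage would be along the lines of every(millisUntilNextHour(LocalDateTime.now()), TimeUnit.HOURS.toMillis(1), TimeUnit.MILLISECONDS, job), started from a ServletContextListener so it begins with the web app rather than with the first visitor.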
If you do care about clustering and/or persistence, you can use JMS queues for asynchronous execution, though you will still need some way of delaying background tasks (you can use Quartz or the ScheduledExecutorService to do this).
Jeff's mechanism was to create some sort of cached object which ASP.Net would automatically recreate at some sort of interval - It seemed to be an ASP.Net specific solution, so probably won't help you (or me) much in Java world.
See https://stackoverflow.fogbugz.com/default.asp?W13117
Atwood: Well, I originally asked on Twitter, because I just wanted something light weight. I really didn't want to like write a windows service. I felt like that was out of band code. Plus the code that actually does the work is a web page in fact, because to me that is a logical unit of work on a website is a web page. So, it really is like we are calling back into the web site, it's just like another request in the website, so I viewed it as something that should stay inline, and the little approach that we came up that was recommended to me on Twitter was to essentially to add something to the application cache with a fixed expiration, then you have a call back so when that expires it calls a certain function which does the work then you add it back in to the cache with the same expiration. So, it's a little bit, maybe "ghetto" is the right word.
My approach has always been to have the OS (i.e. Cron or the Windows task scheduler) load a specific URL at some interval, and then set up a page at that URL to check its queue and perform whatever tasks were required, but I'd be interested to hear if there's a better way.
From the transcript, it looks like FogBugz uses the windows service loading a URL approach also.
Spolsky: So we have this special page called heartbeat.asp. And that page, whenever you hit it, and anybody can hit it at anytime: doesn't hurt. But when that page runs it checks a queue of waiting tasks to see if there's anything that needs to be done. And if there's anything that needs to be done, it does one thing and then looks in that queue again and if there's anything else to be done it returns a plus, and the entire web page that it returns is just a single character with a plus in it. And if there's nothing else to be done, the queue is now empty, it returns a minus. So, anybody can call this and hit it as many times, you can load up heartbeat.asp in your web browser you hit Ctrl-R Ctrl-R Ctrl-R Ctrl-R until you start getting minuses instead of pluses. And when you've done that FogBugz will have completed all of its maintenance work that it needs to do. So that's the first part, and the second part is a very, very simple Windows service which runs, and its whole job is to call heartbeat.asp and if it gets a plus, call it again soon, and if it gets a minus call it again, but not for a while. So basically there's this Windows service that's always running, that has a very, very, very simple task of just hitting a URL, and looking to see if it gets a plus or a minus and, and then scheduling when it runs again based on whether it got a plus or a minus. And obviously you can do any kind of variation you want on this theme, like for example, uh you could actually, instead of returning just a plus or minus you could say "Okay call me back in 60 seconds" or "Call me back right away I have more work to be done." And that's how it works... so that maintenance service it just runs, you know, it's like, you know, a half page of code that runs that maintenance service, and it never has to change, and it doesn't have any of the logic in there, it just contains the tickling that causes these web pages to get called with a certain guaranteed frequency. 
And inside that web page at heartbeat.asp there's code that maintains a queue of tasks that need to be done and looks at how much time has elapsed and does, you know, late-night maintenance and every seven days delete all the older messages that have been marked as spam and all kinds of just maintenance background tasks. And uh, that's how that does that.
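The heartbeat protocol described in the transcript is simple enough to sketch in Java. This is my reading of it, not FogBugz code: Heartbeat, enqueue and beat are made-up names, and in a web app beat() would sit behind a servlet or controller that writes the returned character as the whole response body.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical helper: each beat does at most one queued task and reports whether more remain. */
final class Heartbeat {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    /** Adds a maintenance task to the queue. */
    synchronized void enqueue(Runnable task) {
        pending.add(task);
    }

    /** Runs one queued task if there is one, then returns "+" if work remains or "-" if the queue is empty. */
    String beat() {
        Runnable next;
        synchronized (this) {
            next = pending.poll();
        }
        if (next != null) {
            next.run(); // run outside the lock so other callers are not blocked by a slow task
        }
        synchronized (this) {
            return pending.isEmpty() ? "-" : "+";
        }
    }
}
```

The caller (cron, a Windows service, or a ScheduledExecutorService in another process) then only needs the rule from the transcript: on "+" call again soon, on "-" call again after a longer pause.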
We use jtcron for our scheduled background tasks.
It works well, and if you understand cron it should make sense to you.
Here is how they do it on StackOverflow.com:
https://blog.stackoverflow.com/2008/07/easy-background-tasks-in-aspnet/