I've been using Pentaho Kettle for quite a while, and previously the transformations and jobs I've made (using Spoon) have been quite simple: load from a DB, rename fields, output to another DB. But now I've been writing transformations that do more complex calculations, and I would like to test them somehow.
So what I would like to do is:
Set up some test data
Run the transformation
Verify the result data
One option would probably be to make a Kettle test job that tests the transformation. But as my transformations relate to a Java project, I would prefer to run the tests from JUnit. So I've considered making a JUnit test that would:
Set up test data (using DbUnit)
Run the transformation (using kitchen.sh from the command line; sketched below)
Verify result data (using DbUnit)
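Something like this is what I have in mind for step 2; a minimal sketch, assuming kitchen.sh is on the PATH and using a made-up job file path:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class TransformationIT {
    @Test
    public void kettleJobExitsCleanly() throws Exception {
        // -file and -level are standard Kitchen options; the job path is made up
        Process kitchen = new ProcessBuilder(
                "kitchen.sh", "-file=/path/to/test-job.kjb", "-level=Basic")
                .inheritIO() // forward Kettle's log output to the test console
                .start();
        assertEquals("Kettle job should exit with code 0", 0, kitchen.waitFor());
    }
}
```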
This approach, however, would require test databases, which are not always available (Oracle etc.: expensive/legacy DBs). What I would prefer is to be able to mock or pass some stub test data to my input steps somehow.
Any other ideas on how to test Pentaho Kettle transformations?
There is a JIRA issue somewhere on jira.pentaho.com (I don't have it to hand) that requests exactly this, but alas it is not yet implemented.
So you do have the right solution in mind. I'd also add Jenkins and an Ant script to tie it all together. I've done a similar thing with report testing: I actually had a Pentaho job load the data, then execute the report, then compare the output with known output and report pass/failure.
If you separate your Kettle jobs into two phases:
load data to stream
process and update data
You can use a "Copy rows to result" step at the end of your load-data-to-stream transformation, and a "Get rows from result" step at the start of your processing transformation.
If you do this, then you can use any means to load the data (a Kettle transform, DbUnit called from an Ant script) and can mock up any database tables you want.
I use this for testing some ETL scripts I've written and it works just fine.
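As a sketch of how the processing transformation can then be exercised straight from JUnit, the PDI Java API can also run a .ktr directly instead of shelling out to kitchen.sh (the file path below is made up, and the transformation is assumed to be runnable on its own):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class ProcessStepTest {
    @Test
    public void processTransformationRunsWithoutErrors() throws Exception {
        KettleEnvironment.init();
        // hypothetical transformation file for the "process and update data" phase
        TransMeta meta = new TransMeta("src/test/resources/process-step.ktr");
        Trans trans = new Trans(meta);
        trans.execute(null); // no command-line arguments
        trans.waitUntilFinished();
        assertEquals("transformation should finish cleanly", 0, trans.getErrors());
    }
}
```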
You can use the Data Validator step. Of course it is not a full unit test suite, but I think it can be useful for checking data integrity in a quick way.
You can run several validations at once.
For a more "serious" test I would recommend @codek's answer: execute your Kettle jobs under Jenkins.
In the process of testing, we all create test (dummy) data at different environment levels (dev, QA, cluster, and sometimes staging). When you have a small number of tests everything is fine, but when you have a large number of tests that are executed in parallel, as in my case, the tests interfere with each other because they are using the same data.
So my goal here is to isolate every test and make them independent of the other tests. I'm planning to create unique data for each test, and every test will work with and manipulate only its own data.
The question is how to clean up all of the data that was created during test execution, so that the tests can create the same data each time they are executed. Has anyone been through such a case and found a good solution for this?
My testing framework is built on Java, using Cucumber with Serenity BDD and RestAssured (for testing Web UI and API).
P.S. There is a solution I currently have in mind: keep track (something like session variables) of every object I create, whether through the Web UI or the API, and in the final step of every test (an '@After' hook) use the API to fetch the IDs of every object I've created and issue DELETE requests for them, which removes them from the database too.
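A minimal sketch of that tracking idea; the endpoint, the expected 204 status, and the class itself are all made up, and in a real Cucumber/Serenity suite the tracker would live in a glue class shared per scenario:

```java
import io.cucumber.java.After;
import io.restassured.RestAssured;

import java.util.ArrayDeque;
import java.util.Deque;

public class CreatedObjectTracker {
    // IDs of objects this scenario created, newest first,
    // so deletes happen in reverse creation order
    private final Deque<String> createdIds = new ArrayDeque<>();

    public void register(String id) {
        createdIds.push(id);
    }

    @After
    public void deleteCreatedObjects() {
        while (!createdIds.isEmpty()) {
            RestAssured.given()
                    .delete("/api/items/{id}", createdIds.pop()) // hypothetical endpoint
                    .then()
                    .statusCode(204);
        }
    }
}
```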
I'm developing an application in Java with Spring and Hibernate, and I am looking for a way to quickly reload data between tests. The tests require a lot of data, which is generated and persisted through services. As the DB I use an in-memory HSQLDB. The data generation process takes about 30 seconds, which is too long to simply run before each test.
So I was wondering whether it is a good idea, and whether it is possible with HSQLDB, to run the data loader once at the beginning of the test case or suite, then create a dump and restore it before each test? I can't find how to create a dump in HSQLDB, especially for an in-memory DB.
I appreciate all your help.
EDIT: I have to use a database. Let's consider these to be integration tests.
You can use DbUnit to load and clean the database before and after each test, but I don't think that will improve your performance.
Instead, I would ask why you need so much data for a unit test? 30 seconds isn't too bad for an integration test that actually hits a database, but I think you should strive to have unit tests that don't hit the database at all and instead use mock objects to simulate interacting with your services. Then you can have a few integration tests that actually use a database but those tests won't have to cover all scenarios since your faster unit tests should do that already.
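As a sketch of what that looks like with Mockito; every type here is hypothetical and defined inline so the example is self-contained:

```java
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Arrays;
import java.util.List;

import org.junit.Test;

// hypothetical service-layer types
interface InvoiceRepository {
    List<Double> findTotalsForCustomer(String customer);
}

class InvoiceService {
    private final InvoiceRepository repo;
    InvoiceService(InvoiceRepository repo) { this.repo = repo; }
    double totalFor(String customer) {
        double sum = 0;
        for (double total : repo.findTotalsForCustomer(customer)) {
            sum += total;
        }
        return sum;
    }
}

public class InvoiceServiceTest {
    @Test
    public void totalsInvoicesWithoutTouchingTheDatabase() {
        // the mock stands in for the database-backed repository
        InvoiceRepository repo = mock(InvoiceRepository.class);
        when(repo.findTotalsForCustomer("alice")).thenReturn(Arrays.asList(10.0, 5.0));

        assertEquals(15.0, new InvoiceService(repo).totalFor("alice"), 0.001);
    }
}
```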
You can use an HSQLDB file: database with the default MEMORY tables.
After generating the dataset, add the property files_readonly=true to the database.properties file, then run the tests against this database. This ensures your tests run and modify the data the same way as with a mem: database, but the changes made by the tests are not persisted when the test process ends. The original data is loaded in a few seconds in the fastest possible way.
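A minimal sketch of the two phases (the database path is made up; if I remember correctly, the property can also be appended to the JDBC URL instead of editing database.properties):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SeedDatabase {
    public static void main(String[] args) throws Exception {
        // phase 1, run once: generate the data against a file: database
        try (Connection seed = DriverManager.getConnection(
                "jdbc:hsqldb:file:target/testdb", "SA", "");
             Statement stmt = seed.createStatement()) {
            // ... run the 30-second data loader here ...
            stmt.execute("SHUTDOWN"); // flush the MEMORY tables to the .script file
        }

        // phase 2, each test run: the data loads fast, tests may modify it
        // in memory, and nothing is written back when the process ends
        try (Connection test = DriverManager.getConnection(
                "jdbc:hsqldb:file:target/testdb;files_readonly=true", "SA", "")) {
            // ... run tests against this connection ...
        }
    }
}
```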
Try using this annotation in your test class:
@TransactionConfiguration(transactionManager="nameOfYourTransactionManager", defaultRollback=true)
I found it here
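For context, a minimal sketch of how that annotation sits on a Spring test class (the context file and bean names are made up):

```java
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.test.context.transaction.TransactionConfiguration;
import org.springframework.transaction.annotation.Transactional;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:test-context.xml") // hypothetical context file
@TransactionConfiguration(transactionManager = "transactionManager", defaultRollback = true)
@Transactional
public class PersistenceIT {
    @Test
    public void dataWrittenHereIsRolledBack() {
        // inserts and updates performed in this test are rolled back
        // automatically when the test method finishes
    }
}
```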
We have a Java/Tomcat project using Spring and JPA, with a Maven build, JUnit for unit tests, and TestNG for integration tests.
Some integration tests require a database, so a new DB is created each time mvn verify is run. The problem now is to populate it with test data.
Should I look into DbUnit, persist the objects myself using JPA, or use another approach?
How do I load test data into the DB each time the integration tests are run, so as to have a stable testing environment?
I'm using DbUnit with an in-memory database. It's helpful for loading specific test datasets, running the tests, verifying the database contents after each test, and cleaning up the database after the test is run.
The "pros" of DbUnit: it allows you to control the state of the database before and after each test. The "cons": you work with test datasets in a custom XML format, not SQL. You can export from SQL to this custom XML format, but you will still occasionally need to edit the XML file by hand.
I take a copy of the live database and make the tests transactional, so they are rolled back each time.
We use DbUnit.
We load test data within JUnit in a @BeforeClass method.
And we delete/clean the data in @BeforeClass and @AfterClass methods.
"The problem now is to populate it with test data"
As each integration test might need to have different test data, I think that should be done as part of the set-up phase of each of the integration tests.
There are two patterns to consider: Fresh Fixture and Shared Fixture. The first provides better test isolation, as it recreates the test data for each test case, assuring a clean state. The latter introduces a risk of coupling between tests but is faster, as it reuses the same instances of test data across many tests. Both are described in detail in Meszaros, xUnit Test Patterns.
Regardless of the choice, it may be worth considering the random-data-driven approach built on top of test-arranger: How to organize tests with Test Arranger. To my knowledge, it's the cheapest approach with regard to maintenance costs and the required amount of code.
It is often said that when unit testing you should not test the database, as that is an integration test (see point 4).
However, SQL/JPQL/HQL encapsulates data-store-specific logic, often as free-form strings describing how to access data. These free-form data access commands can easily go wrong and hence need to be tested.
How do I efficiently test this sort of logic?
The closest you can get to running a unit test against an SQL (or similar framework) query is to set up a SQLite database in memory and run against it.
While that still is technically an integration test, it runs almost as fast as a unit test should.
If you do so, just take care to note the slight differences between SQLite and your real database, and try to make your queries compatible with both.
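A minimal sketch of the idea, assuming the xerial sqlite-jdbc driver is on the classpath (the table and query are made up):

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.Test;

public class OrderQueryTest {
    @Test
    public void sumsOrderTotalsPerCustomer() throws Exception {
        try (Connection db = DriverManager.getConnection("jdbc:sqlite::memory:")) {
            try (Statement ddl = db.createStatement()) {
                ddl.execute("CREATE TABLE orders (customer TEXT, total REAL)");
                ddl.execute("INSERT INTO orders VALUES ('alice', 10.0), ('alice', 5.0)");
            }

            // the free-form SQL under test, kept compatible with the real database
            String sql = "SELECT SUM(total) FROM orders WHERE customer = ?";
            try (PreparedStatement query = db.prepareStatement(sql)) {
                query.setString(1, "alice");
                try (ResultSet rs = query.executeQuery()) {
                    assertTrue(rs.next());
                    assertEquals(15.0, rs.getDouble(1), 0.001);
                }
            }
        }
    }
}
```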
Hope this helps,
Assaf.
It is not a unit test, but there is nothing wrong with using a unit testing framework like NUnit to test your SQL. But it IS important that you keep it separated from the real unit tests. Real unit tests are fast and do not communicate with the outside world, nor do they attempt to alter it with updates, deletes, and inserts.
What do you use for writing data-driven tests in jUnit?
(My definition of) a data-driven test is a test that reads data from some external source (file, database, ...), executes one test per line/file/whatever, and displays the results in a test runner as if you had separate tests - the result of each run is displayed separately, not in one huge aggregate.
In JUnit 4 you can use the Parameterized test runner to do data-driven tests.
It's not terribly well documented, but the basic idea is to create a static method (annotated with @Parameters) that returns a Collection of Object arrays. Each of these arrays is used as the arguments for the test class constructor, and then the usual test methods can be run using fields set in the constructor.
You can write code in the @Parameters method to read and parse an external text file (or get data from another external source), and then you can add new tests by editing this file without recompiling the tests.
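A minimal sketch, with the data inlined where the file parsing would go:

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class AdditionTest {
    private final int a;
    private final int b;
    private final int expected;

    public AdditionTest(int a, int b, int expected) {
        this.a = a;
        this.b = b;
        this.expected = expected;
    }

    @Parameters
    public static Collection<Object[]> data() {
        // in a data-driven setup, this is where you would read and parse the file
        return Arrays.asList(new Object[][] {
                { 1, 1, 2 },
                { 2, 3, 5 },
        });
    }

    @Test
    public void addsCorrectly() {
        assertEquals(expected, a + b);
    }
}
```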
This is where TestNG, with its @DataProvider, shines. That's one reason why I prefer it to JUnit; the others are test dependencies and parallel threaded tests.
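For comparison, a minimal TestNG sketch of the same idea:

```java
import static org.testng.Assert.assertEquals;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class AdditionNGTest {
    @DataProvider(name = "sums")
    public Object[][] sums() {
        // no constructor needed; rows are passed straight to the test method
        return new Object[][] { { 1, 1, 2 }, { 2, 3, 5 } };
    }

    @Test(dataProvider = "sums")
    public void addsCorrectly(int a, int b, int expected) {
        assertEquals(a + b, expected);
    }
}
```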
I use an in-memory database such as HSQLDB so that I can either pre-populate it with a "production-style" set of data or start with an empty database and populate it with the rows I need for my testing. On top of that I write my tests using JUnit and Mockito.
I use a combination of DbUnit, jMock, and JUnit 4. You can then either run them as a suite or separately.
You are better off extending TestCase with a DataDrivenTestCase that suits your needs.
Here is a working example:
http://mrlalonde.blogspot.ca/2012/08/data-driven-tests-with-junit.html
Unlike parameterized tests, it allows for nicely named test cases.
I'm with @DroidIn.net; that is exactly what I am doing. However, to answer your question literally ("and displays the results in a test runner as if you had separate tests"), you have to look at the JUnit 4 Parameterized runner. DbUnit doesn't do that. If you have to do a lot of this, honestly, TestNG is more flexible, but you can absolutely get it done in JUnit.
You can also look at the JUnit Theories runner, but my recollection is that it isn't great for data driven datasets, which kind of makes sense because JUnit isn't about working with large amounts of external data.
Even though this is quite an old topic, I still thought I'd contribute my share.
I feel JUnit's support for data-driven testing is too limited and too unfriendly. For example, in order to use Parameterized we need to write our own constructor. With the Theories runner we do not have control over the set of test data that is passed to the test method.
There are more drawbacks, as identified in this blog post series: http://www.kumaranuj.com/2012/08/junits-parameterized-runner-and-data.html
There is now a comprehensive solution coming along pretty nicely in the form of EasyTest, a framework extended out of JUnit that is meant to give a lot of functionality to its users. Its primary focus is data-driven testing with JUnit, although you are no longer required to depend on JUnit. Here is the GitHub project for reference: https://github.com/anujgandharv/easytest
If anyone is interested in contributing their thoughts/code/suggestions, now is the time. You can simply go to the GitHub repository and create issues.
Typically, data-driven tests use a small testable component to handle the data (a file-reading object, or mock objects). For databases and resources outside of the application, mocks are used to simulate the other systems (web services, databases, etc.). Typically there are external data files that hold the input data and the expected output; that way the data files can be added to the VCS.
We currently have a props file with our ID numbers in it. This is horribly brittle, but it is easy to get something going. Our plan is to initially make these ID numbers overridable with -D properties in our Ant builds.
Our environment uses a legacy DB with horribly intertwined data that is not loadable before a run (e.g., by DbUnit). Eventually we would like to get to the point where a unit test queries the DB to find an ID with the property under test, then uses that ID in the test. It would be slow, and it is more properly called integration testing rather than "unit testing", but we would be testing against real data, avoiding the situation where our app runs perfectly against test data but fails with real data.
Some tests will lend themselves to being interface-driven.
If the database/file reads are retrieved through an interface call, then simply have your unit test implement the interface, and the test class can return whatever data you want.
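A minimal sketch of that, with hypothetical types defined inline:

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.junit.Test;

// hypothetical data-access interface used by the code under test
interface CustomerSource {
    List<String> loadCustomerNames();
}

// code under test: depends only on the interface, not on a database or file
class GreetingBuilder {
    private final CustomerSource source;
    GreetingBuilder(CustomerSource source) { this.source = source; }
    List<String> greetings() {
        List<String> out = new ArrayList<>();
        for (String name : source.loadCustomerNames()) {
            out.add("Hello, " + name);
        }
        return out;
    }
}

public class GreetingBuilderTest {
    @Test
    public void buildsGreetingsFromStubbedData() {
        // the test supplies whatever data it wants via a lambda implementation
        CustomerSource stub = () -> Arrays.asList("alice", "bob");
        assertEquals(Arrays.asList("Hello, alice", "Hello, bob"),
                new GreetingBuilder(stub).greetings());
    }
}
```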