I am using Neo4j for storing nodes and need to access the Neo4j database across classes which should all be able to connect concurrently to the database.
I currently use
public void setUp()
{
    //deleteFileOrDirectory(new File(FILESYSTEM_DB));
    graphDb = new GraphDatabaseFactory().newEmbeddedDatabase(FILESYSTEM_DB);
    indexManager = graphDb.index();
    index = indexManager.forNodes("indexNodes");
    registerShutdownHook();
}
to create the database and connect to it. However, the next time another class runs a similar method (or another instance of the same class calls the same setUp() method), I get a quite reasonable
"Error Obtaining Lock" (org.neo4j.kernel.StoreLockException).
How can I check whether the database is already running and, if it is not, call newEmbeddedDatabase(FILESYSTEM_DB), otherwise connect to the running instance?
Make sure the variables graphDb and the others are not local variables but fields of an instance of some class, say Neo4jConnection. Then create a single instance of that class (a singleton), run setUp() once, and use that connection whenever you need to access the database. How you manage that singleton depends on your environment (do you use Spring?). The simplest way is to have a static variable referring to the singleton. Read https://stackoverflow.com/questions/2832297 and other discussions tagged java+singleton.
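A minimal sketch of that singleton (class and method names are illustrative, not from the original post; imports assume the usual Neo4j embedded API):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;

public final class Neo4jConnection {

    private static Neo4jConnection instance;

    private final GraphDatabaseService graphDb;
    private final Index<Node> index;

    private Neo4jConnection(String dbPath) {
        // Runs exactly once per JVM: opens the store and takes the lock.
        graphDb = new GraphDatabaseFactory().newEmbeddedDatabase(dbPath);
        index = graphDb.index().forNodes("indexNodes");
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                graphDb.shutdown();
            }
        });
    }

    public static synchronized Neo4jConnection getInstance(String dbPath) {
        if (instance == null) {
            instance = new Neo4jConnection(dbPath);
        }
        return instance;
    }

    public GraphDatabaseService getGraphDb() { return graphDb; }
    public Index<Node> getIndex() { return index; }
}

Every class then calls Neo4jConnection.getInstance(FILESYSTEM_DB) instead of newEmbeddedDatabase(...), so the store is only ever opened once and the StoreLockException cannot occur.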
I have an application where I am using Apache Spark 1.4.1 (standalone cluster). The code of this application has evolved and it's quite complicated (more than the few lines of code we see in most Apache Spark examples), with lots of method calls from one class to another.
I am trying to add code that notifies an external application when it encounters a problem with the data (while processing it on the cluster nodes). We have the connection details for the external application in a config file. I want to pass the connection details to the cluster nodes somehow, but passing them to each method that runs on the nodes (as parameters or as a broadcast variable) is not acceptable for my application: it would mean every method in a long chain of calls (method A calls B, B calls C, ..., Y calls Z) has to pass them along, which is quite different from most Apache Spark examples, where we see only one or two method calls.
I am trying to work around this problem: is there a way to pass data to the nodes besides method parameters and broadcast variables? For example, I was looking at setting an environment property pointing to the config file (using System.setProperty) on all nodes, so that I could read the connection details on the fly and keep the code isolated in a single block, but I've had no luck so far.
Actually, after some hours of investigation I found a way that really suits my needs. There are two Spark properties (one for the driver, one for the executors) that can be used for passing parameters that can then be read using System.getProperty():
spark.executor.extraJavaOptions
spark.driver.extraJavaOptions
Using them is simpler than the approach suggested in the post above, and you can easily make your application switch configuration from one environment to another (e.g. QA/DEV vs PROD) when you've got all the environments set up in your project.
They can be set in the SparkConf object when you're initializing the SparkContext.
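For example (the config.file property name and the path are illustrative, not from the original post):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConfigBootstrap {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("MyApp")
                // -D options become system properties of the respective JVMs.
                // (For the driver this only takes effect when the option is
                // applied before the driver JVM starts, e.g. cluster mode or
                // spark-defaults.)
                .set("spark.driver.extraJavaOptions", "-Dconfig.file=/path/to/app.conf")
                .set("spark.executor.extraJavaOptions", "-Dconfig.file=/path/to/app.conf");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Later, in code running on the driver or on any executor:
        String configPath = System.getProperty("config.file");
    }
}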
The post that helped me a lot in figuring out the solution is: http://progexc.blogspot.co.uk/2014/12/spark-configuration-mess-solved.html
The properties you provide as part of --properties-file are loaded at runtime and are available only on the driver, not on any of the executors. But you can always make them available to the executors.
Simple hack:
private static String getPropertyString(String key, boolean mandatory) {
    // Try the value loaded from the properties file first.
    String value = sparkConf.get(key, null);
    if (mandatory && value == null) {
        // Not in the SparkConf: we are probably on an executor, so fall
        // back to the environment.
        value = sparkConf.getenv(key);
        if (value == null)
            shutDown(key); // Or whatever action you would like to take
    }
    // On the driver: push the value into the executors' environment.
    if (value != null && sparkConf.getenv(key) == null)
        sparkConf.setExecutorEnv(key, value);
    return value;
}
The first time your driver starts, it finds all the properties provided in the properties file in the SparkConf. As soon as it finds them, check whether each key is already present in the environment; if not, set those values for the executors using setExecutorEnv() in your program.
It's tough to tell whether your program is running on the driver or on an executor, so check whether the property exists in the SparkConf; if it doesn't, check against the environment using getenv(key).
I suggest the following solution:
Put the configuration in a database.
Put the database connection details in a JOCL (Java Object Configuration Language) file and have this file available on the classpath of each executor.
Make a singleton class that reads the DB connection details from the JOCL file, connects to the database, extracts the configuration info, and exposes it through getter methods (see the sketch after this list).
Import the class into the context where you have your Spark calls and use it to access the configuration from within them.
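A hedged sketch of steps 2-4 (all names are illustrative, and for brevity the JOCL file is replaced here by a plain properties resource; the shape is the same):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public final class AppConfig {

    private static AppConfig instance;

    private final Properties settings = new Properties();

    private AppConfig() {
        try {
            // Read the DB connection details from the classpath...
            Properties db = new Properties();
            db.load(AppConfig.class.getResourceAsStream("/db.properties"));
            // ...then connect and pull the configuration rows into memory.
            try (Connection c = DriverManager.getConnection(db.getProperty("url"),
                         db.getProperty("user"), db.getProperty("password"));
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery("SELECT name, value FROM app_config")) {
                while (rs.next()) {
                    settings.setProperty(rs.getString(1), rs.getString(2));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not load configuration", e);
        }
    }

    // Each executor JVM lazily builds its own copy on first use.
    public static synchronized AppConfig getInstance() {
        if (instance == null) {
            instance = new AppConfig();
        }
        return instance;
    }

    public String getProperty(String key) {
        return settings.getProperty(key);
    }
}

Code inside a Spark task can then call AppConfig.getInstance().getProperty("notify.host") without the value ever travelling through method parameters or broadcast variables.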
In my desktop application new databases get opened quite often. I use Hibernate/JPA as an ORM.
The problem is, creating the EntityManagerFactory is quite slow, taking about 5-6 Seconds on a fast machine. I know that the EntityManagerFactory is supposed to be heavyweight but this is just too slow for a desktop application where the user expects the new database to be opened quickly.
Can I turn off some EntityManagerFactory features to get an instance faster? Or is it possible to create some of the EntityManagerFactory lazily to speed up creation?
Can I somehow create the EntityManagerFactory object before knowing the database url? I would be happy to turn off all validation for this to be possible.
By doing so, can I pool EntityManagerFactorys for later use?
Any other idea how to create the EntityManagerFactory faster?
Update with more information and JProfiler profiling
The desktop application can open saved files. Our application's document file format consists of one SQLite database plus some binary data in a ZIP file. When opening a document, the ZIP gets extracted and the DB is opened with Hibernate. The databases all have the same schema, but obviously different data.
It seems that the first time I open a file it takes significantly longer than the following times.
I profiled the first and second run with JProfiler and compared the results.
1st Run:
create EMF: 4385ms
build EMF: 3090ms
EJB3Configuration configure: 900ms
EJB3Configuration <clinit>: 380ms
2nd Run:
create EMF: 1275ms
build EMF: 970ms
EJB3Configuration configure: 305ms
EJB3Configuration <clinit>: not visible, probably 0ms
In the call tree comparison you can see that some methods are significantly faster (taking DatabaseManager. as the starting point):
create EMF: -3120ms
Hibernate create EMF: -3110ms
EJB3Configuration configure: -595ms
EJB3Configuration <clinit>: -380ms
build EMF: -2120ms
buildSessionFactory: -1945ms
secondPassCompile: -425ms
buildSettings: -346ms
SessionFactoryImpl.<init>: -1040ms
The Hot spot comparison now has the interesting results:
ClassLoader.loadClass: -1686ms
XMLSchemaFactory.newSchema: -184ms
ClassFile.<init>: -109ms
I am not sure if it is the loading of Hibernate classes or my Entity classes.
A first improvement would be to create an EMF as soon as the application starts, just to initialize all the necessary classes (I already ship an empty DB file as a prototype with my application). @sharakan, thank you for your answer; maybe a DeferredConnectionProvider would already be a solution for this problem.
I will try the DeferredConnectionProvider next! But we might be able to speed it up even further. Do you have any more suggestions?
You should be able to do this by implementing your own ConnectionProvider as a decorator around a real ConnectionProvider.
The key observation here is that the ConnectionProvider isn't used until an EntityManager is created (see the comment in supportsAggressiveRelease() for a caveat to that). So you can create a DeferredConnectionProvider class and use it to construct the EntityManagerFactory, but then wait for user input and do the deferred initialization before actually creating any EntityManager instances. I've written this as a wrapper around ConnectionProviderImpl, but you should be able to use any other implementation of ConnectionProvider as the base.
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

import org.hibernate.HibernateException;
import org.hibernate.cfg.Environment;
// ConnectionProvider is org.hibernate.connection.ConnectionProvider in Hibernate 3;
// ConnectionProviderImpl is whatever concrete provider you choose to wrap, and
// its package depends on your Hibernate version.

public class DeferredConnectionProvider implements ConnectionProvider {

    private Properties configuredProps;
    private ConnectionProviderImpl realConnectionProvider;

    @Override
    public void configure(Properties props) throws HibernateException {
        // Only remember the properties; the real configuration is deferred.
        configuredProps = props;
    }

    public void finalConfiguration(String jdbcUrl, String userName, String password) {
        configuredProps.setProperty(Environment.URL, jdbcUrl);
        configuredProps.setProperty(Environment.USER, userName);
        configuredProps.setProperty(Environment.PASS, password);
        realConnectionProvider = new ConnectionProviderImpl();
        realConnectionProvider.configure(configuredProps);
    }

    private void assertConfigured() {
        if (realConnectionProvider == null) {
            throw new IllegalStateException("Not configured yet!");
        }
    }

    @Override
    public Connection getConnection() throws SQLException {
        assertConfigured();
        return realConnectionProvider.getConnection();
    }

    @Override
    public void closeConnection(Connection conn) throws SQLException {
        assertConfigured();
        realConnectionProvider.closeConnection(conn);
    }

    @Override
    public void close() throws HibernateException {
        assertConfigured();
        realConnectionProvider.close();
    }

    @Override
    public boolean supportsAggressiveRelease() {
        // This gets called during EntityManagerFactory construction, but it's
        // just a flag so you should be able to either do this, or return
        // true/false depending on the actual provider.
        return new ConnectionProviderImpl().supportsAggressiveRelease();
    }
}
A rough example of how to use it:
// Get an EntityManagerFactory with the following property set:
// properties.put(Environment.CONNECTION_PROVIDER, DeferredConnectionProvider.class.getName());
HibernateEntityManagerFactory factory = (HibernateEntityManagerFactory) entityManagerFactory;

// ...do user input of connection info...

SessionFactoryImpl sessionFactory = (SessionFactoryImpl) factory.getSessionFactory();
DeferredConnectionProvider connectionProvider =
        (DeferredConnectionProvider) sessionFactory.getSettings().getConnectionProvider();
connectionProvider.finalConfiguration(jdbcUrl, userName, password);
You could put the initial set up of the EntityManagerFactory on a separate thread or something, so that the user never has to wait for it. Then the only thing they'll wait for, after specifying the connection info, is the setting up of the connection pool, which should be fairly quick compared to parsing the object model.
Can I turn off some EntityManagerFactory features to get an instance faster?
I don't believe so. EMFs don't really have many features to turn off, other than initializing a JDBC connection/pool.
Or is it possible to create some of the EntityManagerFactory lazily to speed up creation?
Rather than creating the EMF lazily, when the user will notice the performance hit, I suggest you should head in the opposite direction - create the EMF proactively before the user actually needs it. Create it once, up-front, possibly in a separate thread during application initialisation (or at least as soon as you know about your database). Reuse it throughout the existence of your application/database.
Can I somehow create the EntityManagerFactory object before knowing the database url?
No - it creates a JDBC connection.
I think a better question is: why does your application dynamically discover database connection URLs? Are you saying your databases are created or made available on the fly, with no way to anticipate the connection parameters in advance? That really is to be avoided.
By doing so, can I pool EntityManagerFactorys for later use?
No, you can't pool EMFs. It's the connections that you can pool.
Any other idea how to create the EntityManagerFactory faster?
I agree - 6 seconds is too slow for initialisation of EMFs.
I suspect it's more to do with your selected database technology than JPA/JDBC/JVM. My guess is that maybe your database is initialising itself as you connect. Are you using Access? What DB are you using?
Are you connecting to a database remotely located? Over a WAN? Is network speed/latency good?
Are the client PCs limited in performance?
EDIT: Added after comments
Implementing your own ConnectionProvider as a decorator around a real ConnectionProvider will not speed up the user's experience at all. The database instance still needs to be initialised, the EMF & EM created and the JDBC connection still needs to be subsequently established.
Options:
1. Share a common preloaded DB instance: this seems not possible for your business scenario (although JSE technology supports this and also supports client-server designs).
2. Change to a DB with a faster startup: Derby (a.k.a. Java DB) is included in modern JVMs and has a startup time of about 1.5 seconds (cold) and 0.7 seconds (warm, data pre-loaded).
3. In many (most?) scenarios, the fastest solution would be to load the data directly into in-memory Java objects using JAXB with StAX, and subsequently work with the in-memory cached data (particularly using smart structures like maps, hashing and ArrayLists). Just as JPA can map POJO classes to database tables and columns, so JAXB can map POJO classes to an XML schema and work with XML document instances. This would be less desirable if you have very complex queries using SQL set-based logic, with multiple joins and strong use of DB indexes.

Option (2) would probably give the best improvement for limited effort.
Additionally:
- Try to unzip the data files during deployment rather than during app usage.
- Initialize the EMF in a startup thread that runs in parallel to the UI startup: try to start the DB initialization (that means connecting to the actual instance using JDBC) as one of the very first steps of the app, as sketched below.
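A minimal sketch of that startup thread (the class and the persistence unit name "myUnit" are illustrative):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public final class EmfHolder {

    private static Future<EntityManagerFactory> emfFuture;

    // Call this as one of the very first steps of application startup.
    public static void startAsyncInit() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        emfFuture = pool.submit(new Callable<EntityManagerFactory>() {
            @Override
            public EntityManagerFactory call() {
                return Persistence.createEntityManagerFactory("myUnit");
            }
        });
        pool.shutdown(); // the thread dies once the EMF has been built
    }

    // By the time the UI actually needs the EMF, this usually returns at once.
    public static EntityManagerFactory getEmf() throws Exception {
        return emfFuture.get();
    }
}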
I am developing a MongoDB app with Java but I think this question related to datastore connections for web apps in general.
I like to structure all web apps with four top-level packages which are called (which I think will be self explanatory):
Controller
Model
Dao
Util
Ideally I would like to have a class in the Dao package that handles all the connections details.
So far I have created a class that looks like this:
public class Dao {

    public static Mongo mongo;
    public static DB database;

    public static DB getDB() throws UnknownHostException, MongoException {
        mongo = new Mongo("localhost");
        database = mongo.getDB("mydb");
        return database;
    }

    public static void closeMongo() {
        mongo.close();
    }
}
I use it in my code with something like this
public static void someMethod(String someData) {
    try {
        DB db = Dao.getDB();
        DBCollection rColl = db.getCollection("mycollection");
        // perform some database operations
        Dao.closeMongo();
    } catch (UnknownHostException e) {
        e.printStackTrace();
    } catch (MongoException e) {
        e.printStackTrace();
    }
}
This seems to work fine, but I'd be curious to know what people think is the "best" way to handle this issue, if there is such a thing.
The rule of thumb when connecting to a relational database server is to have a pool. For example, connecting to an Oracle database through a pool gives you some performance benefits, both in connection setup time and in SQL parsing time (if you are using bind variables). Other relational databases may vary, but my opinion is that a pool is a good pattern for other reasons too (e.g. you may want to limit the maximum number of connections for your DB user).

You are using MongoDB, so the first thing to check is how MongoDB handles connections, how expensive creating a connection is, etc. I suggest using or building a class that implements pool logic, because it gives you the flexibility you may need in the future. Looking at your code, it seems that your API
DB db = Dao.getDB();
should be paired with:
Dao.closeDB(DB db);
so that you have a chance to really close the connection, or to reuse it, without affecting the Dao code. With these two methods you can switch the way you manage connections without recoding the Dao objects, as sketched below.
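A sketch of what that pairing could look like (illustrative: it keeps the current open-per-call behaviour, but a pool could later be hidden behind the same two methods without touching any calling code):

import java.net.UnknownHostException;

import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.MongoException;

public class Dao {

    public static DB getDB() throws UnknownHostException, MongoException {
        // For now: open a fresh connection. A pool checkout would go here.
        Mongo mongo = new Mongo("localhost");
        return mongo.getDB("mydb");
    }

    public static void closeDB(DB db) {
        // For now: really close. A pool return would go here instead.
        db.getMongo().close();
    }
}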
I would suggest writing a Java class that establishes the connection with the database.
The arguments to its constructor should be the database name, password, host, port, and other necessary credentials.
You can then call the parameterized constructor wherever you need to establish database connectivity. This can be a model.
I got a 'nice' solution from this article: http://www.lennartkoopmann.net/post/722935345
Edit: since that link is dead, here's a copy from the Wayback Machine:
http://web.archive.org/web/20120810083748/http://www.lennartkoopmann.net/post/722935345
Main Idea
What I found interesting was the use of a static synchronised method that returns an instance of the static class and its variables. Most professional devs probably find this obvious; I found it a useful pattern for managing DB connections.
Pooling
Mongo does automatic connection pooling, so the key is to use just one connection to the datastore and let it handle its own pooling.
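A minimal sketch of that pattern applied to the Dao above (assuming the same old com.mongodb driver as in the question):

import java.net.UnknownHostException;

import com.mongodb.DB;
import com.mongodb.Mongo;

public class Dao {

    private static Mongo mongo;

    // One Mongo instance per JVM; the driver pools connections behind it.
    public static synchronized DB getDB() throws UnknownHostException {
        if (mongo == null) {
            mongo = new Mongo("localhost");
        }
        return mongo.getDB("mydb");
    }
}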
I think it is better to call a method inside the DAO to get data from the database as well. If you do it this way and the database changes, you only have to edit the DAO class, rather than the many classes that would otherwise be calling DB queries directly. So keeping the DB-calling methods inside the DAO class itself, and calling them to get data, is the better approach.
final RuntimeMXBean remoteRuntime =
        ManagementFactory.newPlatformMXBeanProxy(
                serverConnection,
                ManagementFactory.RUNTIME_MXBEAN_NAME,
                RuntimeMXBean.class);
Where serverConnection is basically just a connection to a JMX server.
This piece of code works fine the first time; let me explain what is going on:
The first call of this piece of code goes to server A; I then scrape some data from it and store it in an XML file. Using this information, I start up a new server B.
Then, wanting to verify B, I scrape B to compare the metadata. But when I run it I get the exception
java.lang.IllegalArgumentException: java.lang:type=Runtime is not an instance of interface java.lang.management.RuntimeMXBean
    at java.lang.management.ManagementFactory.newPlatformMXBeanProxy(ManagementFactory.java:617)
But I'm not sure what changes here, since the parameters that are giving me problems are managed by the ManagementFactory class, which I don't have control over.
The problem was with my own MBeanServer implementation.
I had it returning false from the isInstanceOf() method if the passed-in ObjectName resolved to a null object. It turned out that this happened for all the Runtime classes, so after reading the Class Loader section of http://tim.oreilly.com/pub/a/onjava/2005/01/26/classloading.html, I concluded that my ClassLoader implementation was incorrect and was loading these classes incorrectly.
The workaround was just to return true from isInstanceOf() for these Runtime classes.
I have to test some Thrift services using Junit. When I run my tests as a Thrift client, the services modify the server database. I am unable to find a good solution which can clean up the database after each test is run.
Cleanup is important especially because the IDs need to be unique, and they are currently read from an XML file. Right now I have to manually change the IDs after running the tests so that the next set of tests can run without throwing primary key violations in the database. If I can clean up the database after each test run, the problem is completely resolved; otherwise I will have to think about other solutions, like generating random IDs and using them wherever IDs are required.
Edit: I would like to emphasize that I am testing a service which writes to the database; I don't have direct access to the database itself. But since the service is ours, I can modify it to provide a cleanup method if required.
If you are using Spring, everything you need is the @DirtiesContext annotation on your test class.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("/test-context.xml")
@DirtiesContext(classMode = ClassMode.AFTER_EACH_TEST_METHOD)
public class MyServiceTest {
    ....
}
Unless you are testing specific database actions (verifying that you can query or update the database, for example), your JUnit tests shouldn't be writing to a real database. Instead you should mock the database classes. This way you don't actually have to connect to and modify the database, and therefore no cleanup is needed.
You can mock your classes a couple of different ways. You can use a library such as JMock which will do all the execution and validation work for you. My personal favorite way to do this is with Dependency Injection. This way I can create mock classes that implement my repository interfaces (you are using interfaces for your data access layer right? ;-)) and I implement only the needed methods with known actions/return values.
//Example repository interface.
public interface StudentRepository
{
    public List<Student> getAllStudents();
}

//Example mock database class.
public class MockStudentRepository implements StudentRepository
{
    //This method creates fake but known data.
    public List<Student> getAllStudents()
    {
        List<Student> studentList = new ArrayList<Student>();
        studentList.add(new Student(...));
        studentList.add(new Student(...));
        studentList.add(new Student(...));
        return studentList;
    }
}

//Example method to test.
public int computeAverageAge(StudentRepository aRepository)
{
    List<Student> students = aRepository.getAllStudents();
    int totalAge = 0;
    for(Student student : students)
    {
        totalAge += student.getAge();
    }
    return totalAge/students.size();
}

//Example test method.
public void testComputeAverageAge()
{
    int expectedAverage = 25; //What the expected answer of your result set is
    int actualAverage = computeAverageAge(new MockStudentRepository());
    assertEquals(expectedAverage, actualAverage);
}
How about using something like DBUnit?
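A hedged sketch of what that could look like (dataset path, driver and connection details are illustrative); DBUnit's default CLEAN_INSERT operation wipes and reseeds the touched tables before every test:

import java.io.File;

import org.dbunit.IDatabaseTester;
import org.dbunit.JdbcDatabaseTester;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;

import junit.framework.TestCase;

public class DbUnitExampleTest extends TestCase {

    private IDatabaseTester tester;

    @Override
    protected void setUp() throws Exception {
        tester = new JdbcDatabaseTester("org.hsqldb.jdbcDriver",
                "jdbc:hsqldb:mem:testdb", "SA", "");
        IDataSet dataSet = new FlatXmlDataSetBuilder().build(new File("dataset.xml"));
        tester.setDataSet(dataSet);
        tester.onSetup(); // CLEAN_INSERT: a known state before each test
    }

    @Override
    protected void tearDown() throws Exception {
        tester.onTearDown();
    }
}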
Spring's unit testing framework has extensive capabilities for dealing with JDBC. The general approach is that the unit test runs in a transaction, and (outside of your test) the transaction is rolled back once the test is complete.
This has the advantage of being able to use your database and its schema, but without making any direct changes to the data. Of course, if you actually perform a commit inside your test, then all bets are off!
For more reading, look at Spring's documentation on integration testing with JDBC.
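A minimal sketch of that approach (the DAO bean and its methods are illustrative, not from the original posts):

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.transaction.annotation.Transactional;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("/test-context.xml")
@Transactional
public class StudentDaoIntegrationTest {

    @Autowired
    private StudentDao studentDao; // hypothetical DAO bean

    @Test
    public void insertedRowIsRolledBackAfterTheTest() {
        studentDao.insert(new Student("Ada", 36));
        assertEquals(1, studentDao.countStudents());
        // No cleanup needed: Spring rolls the transaction back here.
    }
}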
When writing JUnit tests, you can override two specific methods: setUp() and tearDown(). In setUp() you can set up everything necessary to test your code, so you don't have to repeat the setup in each specific test case. tearDown() is called after each test case runs.
If possible, you could set it up so you open your database in the setUp() method, then have tearDown() clear everything created by the tests and close the database. This is how we have done all of our testing that involves a database.
Here's an example:
@Override
protected void setUp() throws Exception {
    super.setUp();
    db = new WolfToursDbAdapter(mContext);
    db.open();
    //Set up other required state and data
}

@Override
protected void tearDown() throws Exception {
    super.tearDown();
    db.dropTables();
    db.close();
    db = null;
}

//Methods to run all the tests
Assuming you have access to the database: Another option is to create a backup of the database just before the tests and restore from that backup after the tests. This can be automated.
If you are using Spring + JUnit 4.x, then you don't need to clean up anything you insert in the DB: each test method runs in a transaction that is rolled back afterwards.
Look at the
AbstractTransactionalJUnit4SpringContextTests class.
Also check out the Spring documentation for JUnit support.
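A sketch of the convenience base class in use (table name and context location are illustrative; the inherited simpleJdbcTemplate field is the Spring 3.0-era API, newer versions expose jdbcTemplate instead):

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.AbstractTransactionalJUnit4SpringContextTests;

@ContextConfiguration("/test-context.xml")
public class ThriftServiceDbTest extends AbstractTransactionalJUnit4SpringContextTests {

    @Test
    public void serviceWriteIsRolledBack() {
        simpleJdbcTemplate.update("INSERT INTO orders (id, amount) VALUES (?, ?)", 1, 42);
        assertEquals(1, countRowsInTable("orders"));
        // Rolled back automatically when the method returns.
    }
}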
It's a bit draconian, but I usually aim to wipe out the database (or just the tables I'm interested in) before every test method execution. This doesn't tend to work as I move into more integration-type tests of course.
In cases where I have no control over the database, say I want to verify the correct number of rows were created after a given call, the test counts the number of rows before and after the tested call and makes sure the difference is correct. In other words, take the existing data into account, then see how the tested code changed things, without assuming anything about the existing data. It can be a bit of work to set up, but it lets me test against a more "live" system.
In your case, are the specific IDs important? Could you generate the IDs on the fly, perhaps randomly, verify they're not already in use, then proceed?
I agree with Brainimus if you're trying to test against data you have pulled from a database. If you're looking to test modifications made to the database, another solution would be to mock the database itself. There are multiple in-memory database implementations that you can use to create a temporary database (for instance during JUnit's setUp()) and then remove the entire database from memory (during tearDown()). As long as you're not using vendor-specific SQL, this is a good way to test modifying a database without touching your real production one.
Some good Java databases that offer in-memory support are Apache Derby, Java DB (which is really Oracle's flavor of Apache Derby again), HyperSQL (better known as HSQLDB) and H2 Database Engine. I have personally used HSQLDB to create in-memory mock databases for testing and it worked great, but I'm sure the others would offer similar results.
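For instance, a minimal JUnit 3-style sketch with HSQLDB (the table is illustrative); the database lives only in memory, so shutting it down in tearDown() leaves nothing behind:

import java.sql.Connection;
import java.sql.DriverManager;

import junit.framework.TestCase;

public class InMemoryDbTest extends TestCase {

    private Connection conn;

    @Override
    protected void setUp() throws Exception {
        // "mem:" means the database exists only inside this JVM.
        conn = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "SA", "");
        conn.createStatement().execute(
                "CREATE TABLE student (id INT PRIMARY KEY, age INT)");
    }

    @Override
    protected void tearDown() throws Exception {
        conn.createStatement().execute("SHUTDOWN"); // discards the whole DB
        conn.close();
    }
}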