Accessing HSQLDB "res" database bundled as resource in jar is extremely slow - java

I'm attempting to build an open source project that provides easy access to machine learning datasets by bundling the data in an easily accessible way. I have code that converts the raw data into an HSQLDB file DB, producing *.data, *.properties, and *.script files. I then take those 3 files, put them in src/main/resources of my Maven project and build a jar. Applications depending on this jar can then access the HSQLDB database as a res database.
Technically, I have no problems getting all the pieces in place to accomplish this. However, accessing the data is extremely slow. The strange thing is that if I have the datasets project and a project depending on datasets both open in Eclipse and run it from there, it's as fast as one would expect. This suggests that the problem has to do with the HSQLDB files being jarred up. Another clue is that the larger the DB, the (seemingly) exponentially longer it takes to access the data.
I've tried bumping up the memory and perm space given as JVM args. I've also tried setting various HSQLDB flags in the *.properties file.
Any ideas?
Edit: I also have jar compression turned off using the <compress>false</compress> element in the maven-jar-plugin definition.

I tried many things, including setting the cache size and cache rows as suggested at the HSQLDB forum. I ended up solving this problem with a workaround as suggested above by Boris the Spider, which was to:
1. Create a temporary dir under the location given by the java.io.tmpdir system property.
2. Copy the DB files out of the jar and into the temporary dir.
3. Open these files as a file HSQLDB database.
4. Clean up afterwards by deleting the temporary dir.
Worked like a charm. A bit of a hack, but at least it works.
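The workaround steps above could be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the class name ResDbExtractor and its method names are hypothetical, and it assumes the three HSQLDB files sit together under one classpath directory. Files.createTempDirectory creates the directory under the location named by the java.io.tmpdir system property.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: copies a bundled HSQLDB database out of the jar so it
// can be opened as a "file" database instead of a "res" database.
final class ResDbExtractor {
    private static final String[] SUFFIXES = { ".data", ".properties", ".script" };

    /** Copies resourceDir/dbName.{data,properties,script} into a fresh temp dir. */
    static Path extract(ClassLoader loader, String resourceDir, String dbName) throws IOException {
        Path tempDir = Files.createTempDirectory("hsqldb-");
        for (String suffix : SUFFIXES) {
            String fileName = dbName + suffix;
            String resource = resourceDir + "/" + fileName;
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Missing classpath resource: " + resource);
                }
                Files.copy(in, tempDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return tempDir;
    }

    /** Builds a file-database JDBC URL pointing at the extracted copy. */
    static String fileUrl(Path tempDir, String dbName) {
        return "jdbc:hsqldb:file:" + tempDir.resolve(dbName).toAbsolutePath();
    }

    /** Deletes the temp dir and everything in it (children before parents). */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
```

A caller would pass the result of fileUrl to DriverManager.getConnection, and would call deleteRecursively only after the database has been shut down, since HSQLDB may create additional files next to the copied ones while it is open.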

Related

Resource Handling approach in Maven

I handle files in one of the two ways below.
1st Approach:
I keep my files in D:\Projects\JavaProjects\LearnCucumber\src\test\resources\
With the help of the ClassLoader, I get the path and work on the files:
ResourceUtils.class.getClassLoader().getResource(".").getPath();
2nd Approach:
I keep the files in D:\Projects\JavaProjects\LearnCucumber\BrowserDrivers\
and locate them using System.getProperty("user.dir").
Which approach is more efficient for handling files when we run our code on different platforms (Windows, Linux)? Does it really make a difference?
Try to avoid working with files at fixed locations on the filesystem, as those are usually less portable from one operating system to the next. Also, if you put files at certain locations, other users of your software need to have those files available at the same locations, whereas with your approach #1 you can ship the files directly with your application (packaged as a jar) and access them from the classpath easily.

How to convert a jar file to an executable with MySQL in it?

I have a Java application that uses MySQL as its database. In order to deliver the project, I need to convert it to an executable with MySQL included. I have tried exe4j, but it doesn't allow including the database. Please advise. The project was built in NetBeans.
Although it's no longer being actively developed due to low demand, you can have a look at the MySQL-Connector/MXJ package, which is meant for "embedding" a MySQL database into your application. I reckon it should still be able to do the trick.
But to be honest, the most future-proof solution would be to replace your database with another option, preferably an embedded database such as H2 or SQLite.

Patching Java software

I'm trying to create a process to patch our current java application so users only need to download the diffs rather than the entire application. I don't think I need to go as low level as a binary diff since most of the jar files are small, so replacing an entire jar file wouldn't be that big of a deal (maybe 5MB at most).
Are there standard tools for determining which files changed and generating a patch for them? I've seen tools like xdelta and vpatch, but I think they work at a binary level.
I basically want to figure out - which files need to be added, replaced or removed. When I run the patch, it will check the current version of the software (from a registry setting) and ensure the patch is for the correct version. If it is, it will then make the necessary changes. It doesn't sound like this would be too difficult to implement on my own, but I was wondering if other people had already done this. I'm using NSIS as my installer if that makes any difference.
Thanks,
Jeff
Be careful when doing this--I recommend not doing it at all.
The biggest problem is public static final constants. Primitive and String constants are copied into every class that uses them at compile time rather than being referenced at run time. This means that even if a Java file doesn't change, its class must be recompiled whenever a constant it uses changes, or it will still use the old value.
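The constant problem described above can be observed without shipping two builds. The sketch below is only an illustration with hypothetical class names: it relies on the language rule that reading a compile-time constant does not initialize the class that declares it, because the compiler has already copied the value into the caller. A field whose initializer is not a constant expression is read from the declaring class at run time instead.

```java
import java.util.ArrayList;
import java.util.List;

// Records which classes have run their static initializers.
final class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

final class InlinedLimits {
    static { InitLog.EVENTS.add("InlinedLimits initialized"); }
    // Compile-time constant: the compiler copies the value 10 into every
    // class that reads it, so a stale caller keeps its old copy.
    static final int MAX_RETRIES = 10;
}

final class RuntimeLimits {
    static { InitLog.EVENTS.add("RuntimeLimits initialized"); }
    // Not a constant expression: callers fetch the value from this class
    // at run time, so they see whatever the deployed class contains.
    static final int MAX_RETRIES = Integer.parseInt("10");
}

final class Caller {
    static int readInlined() { return InlinedLimits.MAX_RETRIES; }
    static int readRuntime() { return RuntimeLimits.MAX_RETRIES; }
}
```

Calling Caller.readInlined() returns 10 without ever touching InlinedLimits, which is why replacing only the class that declares such a constant does not change what previously compiled callers see.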
You also want to be very careful of changing method signatures--you will get some very subtle bugs if you change a method signature and do not recompile all files that call that method--even if the calling java files don't actually need to change (for instance, change a parameter from an int to a long).
If you decide to go down this path, be ready for some really hard-to-debug errors (generally no traces or significant indications, just strange behavior like the number received not matching the one sent) at customer sites that you cannot reproduce, and a lot of pissed off customers.
Edit (too long for comment):
A binary diff of the class files might work, but I'd assume that some kind of version number or date gets compiled in, so the files would change a little on every compile for no reason. That could be easily tested.
You could adopt some strict development practices, such as not using public final statics (make them private) and never changing method signatures (deprecate instead), but I'm not convinced that I know all the possible problems. I just know the ones we encountered.
Also, binary diffs of the jar files would be useless. You'd have to diff the classes and re-integrate them into the jars (which doesn't sound easy to track).
Can you package your resources separately then minimize your code a bit? Pull out strings (Good for i18n)--I guess I'm just wondering if you could trim the class files enough to always do a full build/ship.
On the other hand, Sun seems to do an okay job of making class files that are completely compatible with the previous JRE release, so they must have guidelines somewhere.
You may want to see if Java WebStart can help you as it is designed to do exactly those things you want to do.
I know that the documentation describes how to create and do incremental updates, but we deploy the whole application as it changes very rarely. It is then an issue of updating the JNLP when ready.
How is it deployed?
On a local network I just leave everything as .class files in a folder. The startup script uses robocopy or rsync to copy from network share to local. If any .class file is different it is synced down. If not, it doesn't sync.
For a non-local network I created my own updater. It downloads a text file of md5sums and compares them to the local files. If a file differs, it pulls that file down over HTTP.
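The comparison step of such an updater might look like the sketch below. It is not the poster's code: the class name Md5Updater is hypothetical, and the manifest is assumed to use the common md5sum layout of one "hash  filename" pair per line. The download itself is left out; the method only reports which files are missing or differ.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: decides which files an updater would need to fetch.
final class Md5Updater {
    /** Returns the MD5 of a file as 32 lowercase hex digits. */
    static String md5Hex(Path file) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(Files.readAllBytes(file));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    /** Lists manifest entries whose local file is missing or has a different hash. */
    static List<String> filesToDownload(Path installDir, List<String> manifestLines) throws IOException {
        List<String> stale = new ArrayList<>();
        for (String line : manifestLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines in the manifest
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed manifest line: " + line);
            }
            String expected = parts[0].toLowerCase(Locale.ROOT);
            // md5sum marks binary-mode entries with a leading '*'
            String name = parts[1].startsWith("*") ? parts[1].substring(1) : parts[1];
            Path local = installDir.resolve(name);
            if (!Files.isRegularFile(local) || !md5Hex(local).equals(expected)) {
                stale.add(name);
            }
        }
        return stale;
    }
}
```

Reading each file fully into memory keeps the sketch short; for large files a streaming digest would be the more careful choice.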
A long time ago, the way we solved this was to use the classpath and jar files. Our application was built as a jar file, and it had a launcher jar file. The launcher classpath listed a patch.jar before the main application.jar, so patch.jar was searched first. This meant that we could update the patch.jar to supersede any classes in the main application.
However, this was a long time ago. You may be better off using something like the Java Web Start approach, which offers more seamless application updating.
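The "first entry on the classpath wins" behaviour that this trick depends on can be illustrated with a URLClassLoader. The sketch below is a simplification under stated assumptions: it uses two directories and a plain resource file instead of patch.jar, application.jar, and class files, and the class name PatchFirstLookup is hypothetical. URLClassLoader searches its URLs in the order given, for classes and resources alike.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical demo: the patch location is searched before the application
// location, so anything present in the patch shadows the original.
final class PatchFirstLookup {
    /** Returns the content of `name` as found by searching patchDir first, then appDir. */
    static String resolve(Path patchDir, Path appDir, String name) throws IOException {
        URL[] searchOrder = { patchDir.toUri().toURL(), appDir.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(searchOrder, null);
             InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Not found in patch or application: " + name);
            }
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Entries that exist only in the application location are still found, because the loader falls through to the second URL when the first has no match.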

Alternative to ZIP as a project file format. SQLite or Other?

My Java application is currently using ZIP as a project file format. The project files contain a few XML files and many image and sound files.
The project files are getting pretty big, and since I can't find a way with the java.util.zip classes to write to a ZIP file without recreating it, my file saves are becoming very slow. So for example, if I just want to update one XML file, I need to rewrite the entire ZIP.
Is there some other Java ZIP library that will allow me to do random writes to a ZIP file?
I know switching to something like SQLite solves the random write issue. Would using SQLite just to write XML, Sound and Images as blobs be an appropriate use?
I suppose I could come up with my own file format and use RandomAccessFile but then there would be a lot of bookkeeping I'd have to write.
Update...
My file format is very much like Office Open XML. It is a ZIP file containing XML and other resources.
Someone must have solved the problem of how to do random writes to update a ZIP file. Does anyone know how?
There exist so-called single-file virtual file systems that let you create file-based containers and provide a file-system-like structure and APIs. One example is SolFS (it has a core written in C with a JNI wrapper), and there are some other solutions written in C and Delphi (I don't remember their names at the moment). I guess there exist similar native Java solutions as well.
First of all, I would separate your app's resources into those that are static (such as images) and those that can be changed (the XML files you mentioned).
Since the static files won't be re-written, you can continue to store them in a zip file, which IMHO is a good approach to deploy any resources.
Now you have 2 options:
1. Since the non-static files are probably not too big (the XML files are likely to be smaller than images+sounds), you can stick with your current solution (zip file) and simply maintain 2 zip files, of which only one (the smaller one with the changeable files) can/will be re-written.
2. You could use an in-memory database (such as HSQLDB) to store the changeable files and only persist them (transferring from the database to a file on the drive) when your application shuts down or when that operation is explicitly needed.
SQLite is not always fast (at least in my experience). I would suggest compressing the XML files individually (you'll still get decent compression) and just using the file system to save them. You could experiment with btrfs, or just go with ext4. If you're not on Linux, this should still work okay, but it might not be as fast until things are cached in memory.
The idea is that if there is no redundancy between the XML files, you don't save much by compressing them together in one "solid" archive.
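Compressing each XML file on its own could be done with the JDK's gzip streams, roughly as sketched below. The class name XmlGzipStore and the one-file-per-document layout are assumptions for illustration; the point is that saving one document rewrites only that document's file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: stores each XML document as its own gzip file.
final class XmlGzipStore {
    /** Writes (or overwrites) a single gzip-compressed XML file. */
    static void save(Path file, String xml) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads a gzip-compressed XML file back into a string. */
    static String load(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Images and sound files are usually compressed already, so they could simply be stored as plain files alongside the gzip-compressed XML.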
Before offering another answer along the lines of using properly structured JARs, I have to ask -- why does the project need to be encapsulated in one file? How do you distribute the program to users to run?
If you must keep a project contained within a single file and be able to replace resources efficiently, yes I would say SQLite is a good choice.
If you do choose to use SQLite, also consider converting some of the XML schemas to one or more SQL tables rather than storing large XML documents as BLOBs.

Shipping Java code with data baked into the .jar

I need to ship some Java code that has an associated set of data. It's a simulator for a device, and I want to be able to include all of the data used for the simulated records in the one .JAR file. In this case, each simulated record contains four fields (calling party, called party, start of call, call duration).
What's the best way to do that? I've gone down the path of generating the data as Java statements, but IntelliJ doesn't seem particularly happy dealing with a 100,000 line Java source file!
Is there a smarter way to do this?
In the C#/.NET world I'd create the data as a separate file, embed it in the assembly as a resource, and then use reflection to pull that out at runtime and access it. I'm unsure of what the appropriate analogy is in the Java world.
FWIW, Java 1.6, shipping for Solaris.
It is perfectly OK to include static resource files in the JAR. This is commonly done with properties files. You can access the resource with the following:
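SomeClass.class.getResourceAsStream("/some/pkg/resource.properties");
Here getResourceAsStream is called on a Class object (SomeClass is a placeholder for any class in your jar), and the leading / makes the path relative to the root of the classpath.
The article "Smartly load your properties" deals with this subject.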
Sure, just include them in your jar and do
InputStream is = this.getClass().getClassLoader().getResourceAsStream("file.name");
If you put them under some folders, like "data" then just do
InputStream is = this.getClass().getClassLoader().getResourceAsStream("data/file.name");
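For the simulator data in the question, the bundled file could be parsed into records along the lines of the sketch below. The file format is an assumption made for illustration (one comma-separated line per record: calling party, called party, start of call, duration in seconds), and the class names are hypothetical. The code uses try-with-resources, so it assumes Java 7 or later; on Java 1.6 the same logic would need try/finally instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// One simulated call record with the four fields from the question.
final class CallRecord {
    final String callingParty;
    final String calledParty;
    final String startOfCall;
    final long durationSeconds;

    CallRecord(String callingParty, String calledParty, String startOfCall, long durationSeconds) {
        this.callingParty = callingParty;
        this.calledParty = calledParty;
        this.startOfCall = startOfCall;
        this.durationSeconds = durationSeconds;
    }
}

// Hypothetical loader for a data file shipped inside the jar.
final class CallRecordLoader {
    /** Parses comma-separated records, skipping blank lines. */
    static List<CallRecord> parse(Reader source) throws IOException {
        List<CallRecord> records = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",");
            if (fields.length != 4) {
                throw new IOException("Expected 4 fields but got " + fields.length + ": " + line);
            }
            records.add(new CallRecord(fields[0].trim(), fields[1].trim(),
                    fields[2].trim(), Long.parseLong(fields[3].trim())));
        }
        return records;
    }

    /** Opens a resource from the classpath (for example inside the jar) and parses it. */
    static List<CallRecord> loadFromClasspath(String resourceName) throws IOException {
        try (InputStream in = CallRecordLoader.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            return parse(new InputStreamReader(in, StandardCharsets.UTF_8));
        }
    }
}
```

Keeping the parsing separate from the resource lookup means the same parse method can be fed from a file or a string during development.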
