Merging two folders with priority on timestamp - java

One of our teams recently migrated a good deal of legacy data to a folder with all of the current data. Some of the team members weren't aware of the changes, so they continued to make modifications in the legacy folder.
I'd like to consolidate the data by doing timestamp checks. I can write a script for this, but I essentially want to do a Windows Explorer-style merge of two folders, where the option selected for every conflict is the "newer" file. That is, if the source has the newer file, copy the source into the dest. If the dest is newer, don't copy the source into the dest. If the source exists but the dest doesn't, copy the source over.
I'm writing a quick script in Java, but I'm running into some issues, and I wanted to know if there's a simpler way to always select the "newer" option.

To copy a file: Standard concise way to copy a file in Java?
To get the timestamp: http://docs.oracle.com/javase/6/docs/api/java/io/File.html#lastModified%28%29
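Here's roughly what I have in mind -- a rough sketch using the Java 7 NIO file API (sourceDir and destDir are just placeholder paths, and error handling is omitted):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class NewerWinsMerge {

    // Walk the source tree; copy each file to the destination only if the
    // destination copy is missing or older (the "newer wins" rule).
    static void merge(Path sourceDir, Path destDir) throws IOException {
        try (Stream<Path> files = Files.walk(sourceDir)) {
            files.filter(Files::isRegularFile).forEach(source -> {
                Path dest = destDir.resolve(sourceDir.relativize(source).toString());
                try {
                    if (Files.notExists(dest)
                            || Files.getLastModifiedTime(source)
                                    .compareTo(Files.getLastModifiedTime(dest)) > 0) {
                        Files.createDirectories(dest.getParent());
                        Files.copy(source, dest,
                                StandardCopyOption.REPLACE_EXISTING,
                                StandardCopyOption.COPY_ATTRIBUTES);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths for the legacy and current folders
        merge(Paths.get("C:/data/legacy"), Paths.get("C:/data/current"));
    }
}

Copying with COPY_ATTRIBUTES should preserve the source's last-modified time on the copied file, so files that are already in sync shouldn't be copied again on a re-run.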

Related

Merging two local files in subversion

I destroyed my subversion tree. (I attempted to ignore a few files, broke something, and now the svn says it can't find my root directory, even though it correctly notes the differences between files in said directory.) So, now I have about twenty files from my current project that I'd like to commit but can't.
I ended up checking out a new tree entirely, but now I don't know how to intelligently merge my files from the broken tree into the new tree I just checked out. I don't want to simply copy the files over, as this will wipe out changes others have made since I last updated. (The broken tree doesn't let me update.) 'svn merge' isn't meant to be used on two local copies, right? What tools can I use?
Use kdiff3 and manually merge your changes into the repository. Then commit.
I don't think you can find a better merging tool than WinMerge.
BTW - I didn't like kdiff3 :)

Extract a particular folder from a jar and copy it to a desired destination on my system

I need to extract the resource folder from inside a jar to a desired location in my system. I want to do it by calling a function in a class, which is in the same jar.
I don't want to copy one file at a time. Can you please suggest a way in which I can copy the entire folder?
I initially thought of compressing the files into a zip, copying it elsewhere, and extracting it there.
How will this work? Is there a more efficient way to do this?
Thanks in advance.
If you are going to do this using the Java API, I know of only one way: use JarInputStream or ZipInputStream, iterate over the zip entries, detect which entries belong to the folder, and extract them, i.e. read from the zip and write to disk. There is no other "magical" way.
But if you want, you can probably use some kind of higher-level API. Check VFS from Jakarta: http://commons.apache.org/vfs/
It provides API that probably does what you need.
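If it helps, here is a rough sketch of that JarInputStream approach (the class and folder names are placeholders; it assumes the folder lives in the same jar as the calling class):

import java.io.IOException;
import java.nio.file.*;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

public class ResourceExtractor {

    // Copies every entry under folderInJar (e.g. "resources/") from the jar
    // containing this class into targetDir. Names are placeholders for illustration.
    public static void extractFolder(String folderInJar, Path targetDir) throws IOException {
        // Locate the jar we are running from (may need URL-decoding in some setups)
        Path jarPath = Paths.get(ResourceExtractor.class.getProtectionDomain()
                .getCodeSource().getLocation().getPath());

        try (JarInputStream jar = new JarInputStream(Files.newInputStream(jarPath))) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().startsWith(folderInJar)) {
                    continue;
                }
                Path out = targetDir.resolve(entry.getName());
                Files.createDirectories(out.getParent());
                // The stream yields only the current entry's bytes,
                // so this writes exactly one file per entry
                Files.copy(jar, out, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}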
You could use the Runtime.exec API to execute something similar to the following:
jar xf <your_jar_file_name> <path_to_directory_to_be_extracted>
In this way, you do not have to create a specialized class to handle jar files and you can focus on solving the actual problem at hand.
Note: this relies on the jar tool from the JDK, so it may not work on a machine with only a JRE installed.
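For illustration, something along these lines (the paths are placeholders, and it assumes the jar executable is on the PATH):

import java.io.File;

public class JarExtract {
    public static void main(String[] args) throws Exception {
        // The jar tool extracts into the process working directory,
        // so set that to the desired destination (placeholder paths)
        Process p = Runtime.getRuntime().exec(
                new String[] {"jar", "xf", "/path/to/app.jar", "resources"},
                null,
                new File("/desired/destination"));
        int exit = p.waitFor();
        System.out.println("jar exited with code " + exit);
    }
}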

Patching Java software

I'm trying to create a process to patch our current java application so users only need to download the diffs rather than the entire application. I don't think I need to go as low level as a binary diff since most of the jar files are small, so replacing an entire jar file wouldn't be that big of a deal (maybe 5MB at most).
Are there standard tools for determining which files changed and generating a patch for them? I've seen tools like xdelta and vpatch, but I think they work at a binary level.
I basically want to figure out - which files need to be added, replaced or removed. When I run the patch, it will check the current version of the software (from a registry setting) and ensure the patch is for the correct version. If it is, it will then make the necessary changes. It doesn't sound like this would be too difficult to implement on my own, but I was wondering if other people had already done this. I'm using NSIS as my installer if that makes any difference.
Thanks,
Jeff
Be careful when doing this--I recommend not doing it at all.
The biggest problem is public static final constants. Their values are actually compiled into the classes that use them, not referenced at runtime. This means that even if a Java file doesn't change, its class must be recompiled, or it will still refer to the old value.
You also want to be very careful about changing method signatures--you will get some very subtle bugs if you change a method signature and do not recompile all the files that call that method, even if the calling Java files don't actually need to change textually (for instance, changing a parameter from an int to a long).
If you decide to go down this path, be ready for some really hard-to-debug errors (generally no stack traces or other significant indications, just strange behavior like the number received not matching the one sent) at customer sites that you cannot duplicate, and a lot of pissed-off customers.
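A small, hypothetical illustration of the constant-inlining problem:

// Config.java -- ships in config.jar
public class Config {
    public static final int MAX_CONNECTIONS = 10;  // compile-time constant
}

// Client.java -- ships in client.jar
public class Client {
    void connect() {
        // javac inlines the literal 10 here; Client.class holds no runtime
        // reference to Config.MAX_CONNECTIONS. Patching config.jar with a
        // new value changes nothing unless Client is recompiled as well.
        System.out.println("Limit: " + Config.MAX_CONNECTIONS);
    }
}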
Edit (too long for comment):
A binary diff of the class files might work, but I'd assume that some kind of version number or date gets compiled in and that they'd change a little with every compile for no real reason; that could be easily tested, though.
You could adopt some strict development practices, such as not using public static finals (make them private) and never changing method signatures (deprecate instead), but I'm not convinced I know all the possible problems; I just know the ones we encountered.
Also, binary diffs of the jar files would be useless; you'd have to diff the classes and re-integrate them into the jars (which doesn't sound easy to track).
Can you package your resources separately then minimize your code a bit? Pull out strings (Good for i18n)--I guess I'm just wondering if you could trim the class files enough to always do a full build/ship.
On the other hand, Sun seems to do an okay job of making class files that are completely compatible with the previous JRE release, so they must have guidelines somewhere.
You may want to see if Java WebStart can help you as it is designed to do exactly those things you want to do.
I know that the documentation describes how to create and do incremental updates, but we deploy the whole application as it changes very rarely. It is then an issue of updating the JNLP when ready.
How is it deployed?
On a local network I just leave everything as .class files in a folder. The startup script uses robocopy or rsync to copy from network share to local. If any .class file is different it is synced down. If not, it doesn't sync.
For a non-local network I created my own updater. It downloads a text file of md5sums and compares them to the local files. If a file differs, it pulls it down over HTTP.
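Roughly what that boils down to, as a simplified sketch -- the checksum file format, URLs, and class names here are made up for illustration:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.file.*;
import java.security.MessageDigest;
import java.util.List;
import java.util.stream.Collectors;

public class SimpleUpdater {

    // Hypothetical layout: each line of md5sums.txt is "<md5hex> <relative/path>",
    // and every file is downloadable from baseUrl + "/" + relativePath.
    public static void update(String baseUrl, Path installDir) throws Exception {
        for (String line : fetchLines(new URL(baseUrl + "/md5sums.txt"))) {
            String[] parts = line.trim().split("\\s+", 2);
            String expectedMd5 = parts[0];
            Path local = installDir.resolve(parts[1]);

            if (!Files.exists(local) || !md5(local).equalsIgnoreCase(expectedMd5)) {
                // Missing or out of date: pull the file down over HTTP
                try (InputStream in = new URL(baseUrl + "/" + parts[1]).openStream()) {
                    Files.createDirectories(local.getParent());
                    Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    private static List<String> fetchLines(URL url) throws Exception {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(url.openStream()))) {
            return r.lines().collect(Collectors.toList());
        }
    }

    private static String md5(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(Files.readAllBytes(file));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}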
A long time ago, the way we solved this was to use the classpath and jar files. Our application was built as a jar file, and it had a launcher jar. The launcher's classpath listed a patch.jar that was read before the main application.jar. This meant that we could update patch.jar to supersede any classes in the main application.
However, this was a long time ago. You may be better using something like the Java Web Start type of approach, which offers more seamless application updating.

Best practice to store .jar files in VCS (SVN, Git, ...)

I know that in the age of Maven it is not recommended to store libraries in VCS, but sometimes it makes sense nonetheless.
My question is how best to store them - compressed or uncompressed? Uncompressed they are larger, but if they are replaced a couple of times with newer ones, then the stored difference between two uncompressed .jar files might be much smaller than the difference between compressed ones. Has anyone run tests on this?
Best practice to store .jar files in VCS (SVN, Git, …): don't.
It could make sense in a CVCS (Centralized VCS) like SVN, which can handle millions of files regardless of their size.
It doesn't in a DVCS, especially one like Git (and its limits):
Binary files don't fit well with VCS.
By default, cloning a DVCS repo will get you all of its history, with all the jar versions.
That will be slow and take a lot of disk space, no matter how well those jars are compressed.
You could try to play with shallow cloning, but that's highly impractical.
Use a second repository, like Nexus, for storing those jars, and only reference a txt file (or a pom.xml file for a Maven project) in order to fetch the right jar versions.
An artifact repo is better suited for distribution and release management purposes.
All that being said, if you must store jars in a Git repo, I would initially have recommended storing them in their compressed format (which is the default format for a jar: see Creating a JAR File).
Both the compressed and the uncompressed format would be treated as binary by Git, but at least in the compressed format, clone and checkout would take less time.
However, many threads mention the possibility of storing jars in uncompressed format:
I'm using some repos that get regular 50MB tarballs checked into them.
I convinced them to not compress the tarballs, and git does a fairly decent job of doing delta compression between them (although it needs quite a bit of RAM to do so).
There is more on deltified objects in Git here:
It does not make a difference if you are dealing with binary or text;
The delta is not necessarily against the same path in the previous revision, so even a new file added to the history can be stored in a deltified form;
When an object stored in the deltified representation is used, it would incur more cost than using the same object in the compressed base representation. The deltification mechanism makes a trade-off taking this cost into account, as well as the space efficiency.
So, if clones and checkouts are not operations you have to perform every five minutes, storing jars in an uncompressed format in Git would make more sense because:
Git will compress and compute deltas for those files
You would end up with uncompressed jars in your working directory, which could then potentially be loaded more quickly.
Recommendation: uncompressed.
You can use a solution similar to the answers to the "Uncompress OpenOffice files for better storage in version control" question here on SO, namely a clean/smudge gitattributes filter that uses rezip to store *.jar files uncompressed.
.jar files are (or at least can be) compressed already; compressing them a second time probably will not yield the size improvement you expect.

Alternative to ZIP as a project file format. SQLite or Other?

My Java application is currently using ZIP as a project file format. The project files contain a few XML files and many image and sound files.
The project files are getting pretty big, and since I can't find a way with the java.util.zip classes to write to a ZIP file without recreating it, my file saves are becoming very slow. So for example, if I just want to update one XML file, I need to rewrite the entire ZIP.
Is there some other Java ZIP library that will allow me to do random writes to a ZIP file?
I know switching to something like SQLite solves the random write issue. Would using SQLite just to write XML, Sound and Images as blobs be an appropriate use?
I suppose I could come up with my own file format and use RandomAccessFile but then there would be a lot of bookkeeping I'd have to write.
Update...
My file format is very much like Office Open XML. It is a ZIP file containing XML and other resources.
Someone must have solved the problem of how to do random writes to update a ZIP file. Does anyone know how?
There are so-called single-file virtual file systems that let you create file-based containers and provide a file-system-like structure and API. One example is SolFS (it has a C core with a JNI wrapper), and there are some other C- and Delphi-based solutions (I don't remember their names at the moment). I guess similar native Java solutions exist as well.
First of all, I would separate your app's resources into those that are static (such as images) and those that can change (the XML files you mentioned).
Since the static files won't be re-written, you can continue to store them in a zip file, which IMHO is a good approach for deploying any resources.
Now you have 2 options:
Since the non-static files are probably not too big (the xml files are likely to be smaller than images+sounds), you can stick with your current solution (zip file) and simply maintain 2 zip files, of which only one (the smaller one with the changeable files) can/will be re-written.
You could use an in-memory database (such as HSQLDB) to store the changeable files and only persist them (transferring them from the database to a file on disk) when your application shuts down or when that operation is explicitly needed.
SQLite is not always fast (at least in my experience). I would suggest individually compressing the XML files -- you'll still get decent compression, and you can just use the file system to save them. You could experiment with btrfs, or just go with ext4. If you're not on Linux, this should still work okay, but it might not be as fast until things are cached in memory.
The idea is that if you do not have redundancy between the XML files, you don't gain much by compressing them into one "solid" archive.
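For example, compressing one XML file on its own is just a few lines with GZIPOutputStream (a minimal sketch with placeholder paths):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

public class CompressOne {
    public static void main(String[] args) throws IOException {
        // Placeholder paths: each XML file gets its own .gz on the file system,
        // so a single resource can be rewritten without repacking an archive
        Path xml = Paths.get("project.xml");
        Path gz = Paths.get("project.xml.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            Files.copy(xml, out);
        }
    }
}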
Before offering another answer along the lines of using properly structured JARs, I have to ask -- why does the project need to be encapsulated in one file? How do you distribute the program to users to run?
If you must keep a project contained within a single file and be able to replace resources efficiently, yes I would say SQLite is a good choice.
If you do choose to use SQLite, also consider converting some of the XML schemas to one or more SQL tables rather than storing large XML documents as BLOBs.
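A minimal sketch of the blob approach, assuming the sqlite-jdbc driver is on the classpath (the database, table, and file names are made up for illustration):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class ProjectStore {
    public static void main(String[] args) throws Exception {
        // project.db and the table/column names are placeholders
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:project.db")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS resources ("
                        + "name TEXT PRIMARY KEY, data BLOB)");
            }
            // Insert or replace a single resource without touching the others --
            // the "random write" that rewriting a whole ZIP can't give you cheaply
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT OR REPLACE INTO resources (name, data) VALUES (?, ?)")) {
                ps.setString(1, "project.xml");
                ps.setBytes(2, Files.readAllBytes(Paths.get("project.xml")));
                ps.executeUpdate();
            }
        }
    }
}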
