I'm trying to decide what type of persistence manager to use for my project. I read this wiki entry about persistenceManagers.
First of all, due to JCR-2802 (all non-bundle PM deprecated), there are only
BundleFsPersistenceManager
BundleDbPersistenceManager
MySql, H2, PostgreSQL, Oracle, Derby, MSSQL - PersistenceManagers
and all those InMem, Object, and Xml PersistenceManagers are deprecated. (Is MemoryFileSystem still OK while InMemPM is deprecated?)
As I see it, BundleFsPersistenceManager uses LocalFileSystem to persist files on the filesystem (is there a wiki entry that explains how content is stored in those files? e.g. different types of node properties, such as nt:file), and BundleDbPersistenceManager uses DbFileSystem to store the exact same files in a DBMS? Otherwise Lucene indexing and full-text searching wouldn't be possible, right?
So the reasons are clustering, the distributed nature of some systems, and atomicity... otherwise the database implementation would be redundant, right? This way people have more choices.
MemoryFileSystem still OK while InMemPM is deprecated?
Yes... It's a bit sad that the in-memory persistence manager is deprecated, because it allows you to run fast unit tests. However, you could also use a database persistence manager together with an in-memory database (such as an H2 database).
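For the unit-test scenario, here is a minimal sketch of such an in-memory H2 database (the JDBC URL, user, and table are made up for the example; Jackrabbit's BundleDbPersistenceManager would be pointed at a URL like this through repository.xml):

```java
// Minimal sketch: an in-memory H2 database, suitable for fast unit tests.
// "mem:" means nothing is written to disk; DB_CLOSE_DELAY=-1 keeps the
// database alive until the JVM exits, not just until the last connection closes.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class InMemoryH2Demo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:h2:mem:test;DB_CLOSE_DELAY=-1", "sa", "")) {
            con.createStatement().execute(
                "CREATE TABLE notes(id INT PRIMARY KEY, txt VARCHAR(255))");
            con.createStatement().execute("INSERT INTO notes VALUES(1, 'hello')");
            try (ResultSet rs = con.createStatement().executeQuery("SELECT txt FROM notes")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // prints "hello"
                }
            }
        }
    }
}
```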
is there a wiki entry that explains the means of how content is stored into files?
No, because this is an implementation detail and subject to change. You shouldn't ever need to parse or write such files yourself; use Jackrabbit instead.
like different types of node properties such as nt:file
File content is stored in the DataStore. Node and property data, along with the links into the data store, are handled by the persistence manager.
Otherwise Lucene indexing and full-text searching wouldn't be possible, right?
Lucene indexing is independent of the persistence manager and of the data format the persistence manager uses. The Lucene indexing internally doesn't access the persistence manager's data directly.
otherwise the database implementation would be redundant, right?
It's just that some people prefer storing all data in a database (for example because they already have a database and know very well how to operate / back up / maintain it). The majority seem to be OK with storing the data in the file system directly; however, there is no built-in transactional file-based persistence manager in Jackrabbit. For this, you would need to use a Jackrabbit extension such as the (commercial) CRX from Adobe (disclaimer: I work for Adobe).
Related
I am working on a Spring-MVC application in which we are seeing the database grow big. The space is consumed mostly by chat message history, plus other stuff like old notifications, which are not that useful.
Because of this, we thought of moving that data to some text/XML files to give the DB some room to breathe and thereby increase the performance of queries. Indexes are not that useful because of the high rate of insertions.
I wanted to know if PostgreSQL or Hibernate has support for such a task, where data is picked out of the DB and saved in plain files, which can still be accessed and result in at least decent performance gains.
I have only started looking up some stuff, so I don't have much in hand to show. Kindly let me know if you guys have any questions.
Thanks a lot.
I would use the PostgreSQL JSON storage and have two databases:
the current operations DB, the one you are moving data out of to slim it down
the archive database where old data is aggregated to save storage
This way you can move data from the current database into the archive database without compromising ACID properties, and you can aggregate the old data to simplify retrieval, grouping various related entities under some common root entity that you'll then use to access the old data.
The current operations database therefore remains small, while the archive database can grow and be scaled separately. This makes it easier to tune the operations database for high performance and the archive one for scalability.
Anyway, Hibernate doesn't support this out of the box, but you can implement it using custom Hibernate types and JTA transactions.
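As a rough illustration of the data movement itself, here is a hedged JDBC sketch; the table and column names are invented for the example, and a real setup would enlist both DataSources in a single JTA transaction instead of the two separate commits shown here:

```java
// Hedged sketch: move chat messages older than a cutoff from the live DB
// to the archive DB. Committing the archive before purging the live DB means
// a crash between the two commits leaves duplicates, which the
// ON CONFLICT clause makes harmless on retry (PostgreSQL 9.5+).
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import javax.sql.DataSource;

public class ChatArchiver {
    public void archive(DataSource live, DataSource archiveDb, Timestamp cutoff) throws SQLException {
        try (Connection src = live.getConnection();
             Connection dst = archiveDb.getConnection()) {
            src.setAutoCommit(false);
            dst.setAutoCommit(false);
            try (PreparedStatement read = src.prepareStatement(
                     "SELECT id, author, body, created_at FROM chat_message WHERE created_at < ?");
                 PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO chat_message_archive(id, author, body, created_at) "
                     + "VALUES (?, ?, ?, ?) ON CONFLICT (id) DO NOTHING");
                 PreparedStatement purge = src.prepareStatement(
                     "DELETE FROM chat_message WHERE created_at < ?")) {
                read.setTimestamp(1, cutoff);
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong(1));
                        write.setString(2, rs.getString(2));
                        write.setString(3, rs.getString(3));
                        write.setTimestamp(4, rs.getTimestamp(4));
                        write.addBatch();
                    }
                }
                write.executeBatch();
                dst.commit();                 // archive copy is durable first
                purge.setTimestamp(1, cutoff);
                purge.executeUpdate();
                src.commit();                 // only then purge the live DB
            }
        }
    }
}
```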
I'd like to save persistent objects to the file system using Hibernate without the need for a SQL database.
Is this possible?
Hibernate works on top of JDBC, so all you need is a JDBC driver and a matching Hibernate dialect.
However, JDBC is basically an abstraction of SQL, so whatever you use is going to look, walk and quack like an SQL database - you might as well use one and spare yourself a lot of headaches. Besides, any such solution is going to be comparable in size and complexity to lightweight Java DBs like Derby.
Of course if you don't insist absolutely on using Hibernate, there are many other options.
It appears that it might technically be possible if you use a JDBC plain-text driver; however, I haven't seen any open-source ones that provide write access; the one I found on SourceForge is read-only.
You already have an entity model, and I suppose you do not want to lose it or the relationships contained within it. An entity model is meant to be translated to a relational database.
Hibernate, like any other JPA provider (e.g. EclipseLink), translates this entity model to SQL and uses a JDBC driver to provide a connection to an SQL database. This you need to keep as well.
The correct question to ask is: does anybody know of an embedded Java SQL database, one that you can start from within Java? There are plenty of those:
HyperSQL: stores the result in an SQL clear-text file, readily imported into any other database
H2: uses binary files, low JAR file size
Derby: uses binary files
Ashpool: stores data in an XML-structured file
I have used HyperSQL on one project for small data, and Apache Derby for a project with huge databases (2 GB and more). Apache Derby performs better on these huge databases.
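To illustrate how little is needed to start embedded Derby from within Java (the database name "myDb" is arbitrary; ";create=true" creates the database directory on first use):

```java
// Minimal sketch: Apache Derby running embedded, no server process.
import java.sql.Connection;
import java.sql.DriverManager;

public class EmbeddedDerbyDemo {
    public static void main(String[] args) throws Exception {
        // The driver is auto-loaded from derby.jar (JDBC 4); the database
        // lives in the ./myDb directory of the working directory.
        try (Connection con = DriverManager.getConnection("jdbc:derby:myDb;create=true")) {
            // first run only; on a second run this table already exists
            con.createStatement().execute("CREATE TABLE t(id INT PRIMARY KEY)");
        }
    }
}
```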
I don't know exactly what you need, but maybe it's one of the below:
1 - If your need is just to get away from SQL, you can use a NoSQL database.
Hibernate supports these through Hibernate OGM (http://www.hibernate.org/subprojects/ogm).
There are some DBs like Cassandra, MongoDB, CouchDB, Hadoop... You have some suggestions here.
2 - Now, if you don't want a database server (a service process that is always running), you can use Apache Derby. It's a DB just like any other SQL database, but it doesn't need a server; it keeps its data in local files. You can easily transport the whole database with your program.
Take a look: http://db.apache.org/derby/
3 - If you really want a plain text file, you can do as Michael Borgwardt said. But I don't know if Hibernate would be a good idea in this case.
Both H2 and HyperSQL support embedded mode (running inside your JVM instead of in a separate server) and saving to local file(s); these are still SQL databases, but with Hibernate there aren't many other options.
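For completeness, a hedged sketch of wiring Hibernate to an embedded, file-based H2 database; the Note entity and the ./data/notes path are assumptions made for the example:

```java
// Hedged sketch: Hibernate persisting to local files via embedded H2.
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class FileBackedHibernate {
    public static SessionFactory buildSessionFactory() {
        return new Configuration()
            .setProperty("hibernate.connection.driver_class", "org.h2.Driver")
            // file-based URL: H2 writes to ./data/notes.* on disk, no server needed
            .setProperty("hibernate.connection.url", "jdbc:h2:./data/notes")
            .setProperty("hibernate.connection.username", "sa")
            .setProperty("hibernate.connection.password", "")
            .setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect")
            .setProperty("hibernate.hbm2ddl.auto", "update")
            .addAnnotatedClass(Note.class) // hypothetical @Entity class
            .buildSessionFactory();
    }
}
```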
Well, since the question is still open and the OP said he's open to new approaches/suggestions, here's mine (a little late, but OK).
Do you know Prevayler? It's a Java prevalence implementation which keeps all of your business objects in RAM and maintains snapshots/changelogs in the file system. This makes it extremely fast and reliable, since if there's any crash it will restore its last state and reapply every change to it.
Also, it's really easy to setup and run in your app.
Of course this is possible. You can simply use the file I/O features of Java; the following steps are required:
1. Create a File object.
2. Create a FileOutputStream for writing (or a FileInputStream for reading; there are other classes that work too).
3. Wrap this object in a BufferedOutputStream and an ObjectOutputStream (for reading, the Input counterparts, or simply a java.util.Scanner for text).
4. Use the write functions of the object created in the previous step.
Note that your object must implement the Serializable interface.
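A minimal sketch of those steps (Settings is a made-up Serializable class and "settings.bin" an arbitrary file name):

```java
// Write a Serializable object to a file and read it back.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {
    static class Settings implements Serializable {
        String theme = "dark";
        int fontSize = 12;
    }

    public static void main(String[] args) throws Exception {
        File file = new File("settings.bin");
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(new Settings()); // steps 1-4: File, stream, wrapper, write
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            Settings s = (Settings) in.readObject();
            System.out.println(s.theme + " / " + s.fontSize);
        }
    }
}
```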
Does anybody know how much overhead Jackrabbit has, in comparison with pure FS persistence?
I'm using it for a CMS project, but I also have to persist temporary files (which unfortunately have properties/metadata)... I don't know if I should also employ Jackrabbit for that.
I think the overhead is significant enough to avoid this... at least the I/O on the filesystem.
These files are the same as the rest of the files in the repo, but it is certain that they will be deleted within a minute.
Should I create a layer to persist files with properties via the Java I/O API, should I use Jackrabbit, or should I use a database? If so, can it be tuned for performance somehow?
By default, Jackrabbit stores the binaries in the FileDataStore, which uses a FileOutputStream, so the overhead is relatively low. However, binaries in the data store remain until garbage collected, which might be a problem for you if you create a huge number of temporary files.
Metadata: it depends how much metadata you have. The metadata is stored in the persistence manager and possibly in the search index (Lucene). The main performance problem there is usually full-text search, so disable it if possible.
should I use Jackrabbit or should I use a database
That really depends on your use case. Jackrabbit does not claim to be "faster than a database", but the data model (hierarchical, key-value pairs) may be better or easier to use.
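For reference, a hedged sketch of what storing such a temporary file with metadata looks like through the JCR API (node names and the mime-type parameter are assumptions); the binary goes to the DataStore while the properties go through the persistence manager:

```java
// Hedged sketch: store a file plus metadata as an nt:file node via JCR.
import java.io.InputStream;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;

public class TempFileStore {
    public void store(Session session, String name, InputStream data, String mimeType)
            throws Exception {
        Node file = session.getRootNode().addNode(name, "nt:file");
        Node content = file.addNode("jcr:content", "nt:resource");
        Binary binary = session.getValueFactory().createBinary(data); // lands in the DataStore
        content.setProperty("jcr:data", binary);
        content.setProperty("jcr:mimeType", mimeType); // metadata goes to the persistence manager
        session.save();
    }
}
```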
I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) into the filesystem, in some sensible hierarchy. There will be hundreds, even thousands, of files per day. I also need to store metadata for each file.
I am considering putting the metadata (just a couple of fields) into a database table, but the XML file content itself into files in the filesystem, in order not to bloat the database with content data (that is seldom read).
Is there some simple library that helps me in saving, loading, deleting etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).
Or do I even need that, should I just go with raw/custom Java?
Is there some simple library that helps me in saving, loading, deleting etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).
Java API
Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.
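For instance, a quick sketch with FileUtils (the paths are made up):

```java
// commons-io FileUtils: write, read and delete a file in a few lines.
import java.io.File;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

public class FileUtilsDemo {
    public static void main(String[] args) throws Exception {
        File f = new File("messages/2024/msg-001.xml");
        // creates the parent directories if they don't exist yet
        FileUtils.writeStringToFile(f, "<message/>", StandardCharsets.UTF_8);
        String xml = FileUtils.readFileToString(f, StandardCharsets.UTF_8);
        System.out.println(xml);
        FileUtils.deleteQuietly(f); // no exception if the file is already gone
    }
}
```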
Java is independent of the OS. You just need to make sure you use File.separator, or use the constructor File(File parent, String child), so that you don't need to mention the separator explicitly.
The Java file API is relatively high-level, to abstract over the differences between the many OSes. Most of the time it's sufficient. It has some shortcomings only if you need a relatively OS-specific feature which is not in the API, e.g. checking the physical size of a file on disk (not the logical size), security rights on *nix, free space/quota of the hard drive, etc.
Most OSes have an internal buffer for file writing/reading. Using FileOutputStream.write and FileOutputStream.flush ensures the data has been handed to the OS, but not necessarily written to the disk. The Java API also supports this low-level integration (for example FileDescriptor.sync) to manage such buffering issues for systems such as databases.
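A small sketch of that distinction (the file name is arbitrary):

```java
// flush() hands the bytes to the OS; getFD().sync() asks the OS to push
// them to the physical device, which is what databases rely on.
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class SyncDemo {
    public static void main(String[] args) throws Exception {
        try (FileOutputStream out = new FileOutputStream("journal.log", true)) {
            out.write("record\n".getBytes(StandardCharsets.UTF_8));
            out.flush();        // data is now in OS buffers
            out.getFD().sync(); // data is now (as far as the OS can tell) on disk
        }
    }
}
```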
Also, both files and directories are abstracted by File, and you need to check with isDirectory. This can be confusing, for instance if you have one file x and one directory /x (I don't remember exactly how to handle this issue, but there is a way).
Web service
The web service can use either xs:base64Binary to pass the data, or MTOM (Message Transmission Optimization Mechanism) if the files are large.
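A hedged sketch of the MTOM variant (the service name and method are invented for the example):

```java
// JAX-WS endpoint receiving binary content as an MTOM attachment
// instead of inline base64 text.
import javax.activation.DataHandler;
import javax.jws.WebService;
import javax.xml.bind.annotation.XmlMimeType;
import javax.xml.ws.soap.MTOM;

@MTOM
@WebService
public class MessageArchiveService {
    public String upload(@XmlMimeType("application/octet-stream") DataHandler content) {
        // content.getInputStream() would be streamed to the file system here
        return "stored";
    }
}
```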
Transactions
Note that the database is transactional and the file system is not, so you might have to add a few checks in case operations fail and are retried.
You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:
Update. If the user wants to overwrite a file, you actually create a new one. The level of indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it is written, which keeps rollback consistent.
Create. Same story when the user wants to create a file.
Delete. If the user wants to delete a file, you do it only in the database first. A periodic job polls the file system to identify files which are not listed in the database, and removes them. This two-phase delete ensures that the delete operation can be rolled back.
This is not as robust as writing BLOBs to a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel the project is dead (last activity 2007).
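A hedged sketch of the periodic sweep behind that two-phase delete; the stored_file table, physical_name column, and grace period are assumptions for the example:

```java
// Remove physical files that no longer have a row in the database.
// A grace period protects files whose DB row has not been committed yet.
import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class OrphanSweeper {
    private static final long GRACE_MS = 60 * 60 * 1000; // 1 hour

    public void sweep(DataSource ds, File storageDir) throws Exception {
        try (Connection con = ds.getConnection()) {
            for (File f : storageDir.listFiles()) {
                if (System.currentTimeMillis() - f.lastModified() < GRACE_MS) {
                    continue; // too young: its DB row may not be committed yet
                }
                if (!isReferenced(con, f.getName())) {
                    f.delete(); // no DB row: the logical delete already committed
                }
            }
        }
    }

    private boolean isReferenced(Connection con, String physicalName) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT 1 FROM stored_file WHERE physical_name = ?")) {
            ps.setString(1, physicalName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```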
There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.
Support for XML and JSON seems to be experimental.
I'm doing a Java software project at my university that is mainly about storing data sets (management of software tests).
The first thing I thought of was a simple SQL DB; however, the necessary DB schema is not available for now (let's say the project is stupid, but there's no choice).
Is a persistence framework like Hibernate able to store data internally (for example in XML) and to convert this XML into decent SQL later?
My intention is to use the additional abstraction layer of a framework like Hibernate to save work, because it might have conversion functions. I know that Hibernate can generate class files from SQL, but I'm not too sure whether it needs a DB at every point during development. Using an XML schema for now and converting it into SQL later may be an idea :)
You can persist XML into a relational DB with Hibernate, but you cannot use XML directly as a storage engine. Why not simply store your data in a relational DB from the start? You'll create some schema yourself and adapt it to the actual one when you receive it.
I would recommend using a lightweight DB such as HSQLDB instead.
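A minimal sketch of embedded, file-based HSQLDB (the data/testdb path is arbitrary):

```java
// HSQLDB in embedded mode: the database is a set of files under ./data,
// no server process required.
import java.sql.Connection;
import java.sql.DriverManager;

public class HsqldbDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:file:data/testdb", "SA", "")) {
            con.createStatement().execute("CREATE TABLE t(id INT)"); // first run only
            con.createStatement().execute("SHUTDOWN"); // flush changes to the files
        }
    }
}
```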