I'm still looking for Java collections that are persistent yet offer comparable access times. The real data should stay on disk, but for faster access I need a cache in RAM, so the content can be streamed from the file into main memory.
I read that H2 has such a cache function. Is there an option to cache the whole file on start-up?
And can somebody say something about the performance?
Currently, I have more than 100,000 items in a Java HashMap (the value is a custom class that contains a byte array).
Thank you!
Partially. The H2 MVStore can be used as a persistent java.util.Map, but not as a list, stack, or the like. The H2 Database is a relational database, with SQL and JDBC APIs, and the latest version uses the MVStore as the default storage engine.
Other projects such as MapDB support features similar to the MVStore's.
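For illustration, a minimal MVStore sketch used as a persistent map backed by a single file; the file and map names are arbitrary, and cacheSize (in MB) is, if I remember the builder API correctly, the option that controls the RAM read cache:

```java
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class MvStoreDemo {
    public static void main(String[] args) {
        // Open (or create) a file-backed store with a 64 MB read cache
        MVStore store = new MVStore.Builder()
                .fileName("data.mv")
                .cacheSize(64)
                .open();

        // A persistent Map<String, byte[]>; reads go through the RAM cache
        MVMap<String, byte[]> map = store.openMap("cache");
        map.put("item-1", new byte[]{1, 2, 3});
        System.out.println(map.get("item-1").length);

        store.commit();
        store.close();
    }
}
```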
I'm reading file streams from a certain group of files and storing them in a database as the bytea type. But when I try to read the streams back from the database and write them to a file, it takes a very long time and I eventually get an out-of-memory error. Is there another approach that would do this more efficiently, with or without a database involved?
Databases were designed with a key problem in mind:
When we have a bunch of data and don't know the kinds of reports that will be generated, how can we store the data in a manner that preserves its inner relationships and permits any reporting format we can think of?
Files lack a few key characteristics of databases. A file has a single structure: characters in order. Files also lack any integrated report building; reporting is often confined to simple searches, whose results have little context unless the surrounding parts of the file are shown too.
In short, if you aren't using the database's features, please don't use the database.
Many people do store files in databases because they have one handy, and rather than writing support for filesystem storage, they cut and paste the database storage code. Let's explore the consequences:
- Backups and restores become problematic, because the database grows in size very quickly and the bandwidth needed to back up and restore is a function of the database's size.
- Replication rebuilds in fail-safe databases take longer (I've seen some run so long that the replica couldn't catch up with the rate of change in the primary database).
- Queries that (accidentally) reference the files in bulk spike the CPU, possibly starving access to the rest of the system (depends on the database).
- The bandwidth needed to return the results of those queries steals system resources, preventing other queries from communicating their results (better on some databases, worse on others).
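Whichever way you go, the out-of-memory error usually comes from materializing the whole payload at once. Here's a hedged JDBC sketch that copies in chunks; the table, column, and connection details are assumptions, and note that with PostgreSQL's bytea the driver may still buffer the full row, in which case the Large Object API is the safer route:

```java
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamOut {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost/mydb"; // assumed connection URL
        try (Connection con = DriverManager.getConnection(url, "user", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT content FROM files WHERE id = ?")) {
            ps.setLong(1, 42L);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // Copy in small chunks instead of loading the full byte[] into memory
                    try (InputStream in = rs.getBinaryStream("content");
                         OutputStream out = new FileOutputStream("restored.bin")) {
                        byte[] buf = new byte[8192];
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                    }
                }
            }
        }
    }
}
```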
I am working on a Spring-MVC application in which we are seeing the database grow big. The space is consumed mostly by chat message history, plus other things like old notifications, which are not that useful.
Because of this, we thought of moving that data out to text/XML files to give the DB some room to breathe and thereby improve query performance. Indexes are not that useful because there are too many insertions.
I wanted to know whether PostgreSQL or Hibernate has support for such a task, where data is taken out of the DB and saved in plain files that can still be accessed, resulting in at least decent performance gains.
I have only started looking up some stuff, so I don't have much in hand to show. Kindly let me know if there are any questions you guys have.
Thanks a lot.
I would use the PostgreSQL JSON storage and have two databases:
- the current operations DB, the one you are moving data away from to slim it down
- the archive database, where old data is aggregated to save storage
That way you can move data from the current database into the archive database without compromising ACID properties, and you can aggregate the old data to simplify retrieval, grouping the various related entities under some common root entity, which you'll then use to access your old data.
The current operations database remains small enough, while the archive database can be shared. This makes it easier to configure the current operations database for high performance and the archive one for scalability.
Hibernate doesn't support this out of the box, but you can implement it using custom Hibernate types and JTA transactions.
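As a rough illustration of the archive side, a JDBC sketch writing an aggregated chat thread as a single jsonb row; the archived_chats table, its columns, and the connection details are all assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ArchiveChat {
    public static void main(String[] args) throws Exception {
        String archiveUrl = "jdbc:postgresql://archive-host/archive"; // assumed URL
        // One row per chat thread; all messages aggregated into a single JSON document
        String json = "[{\"from\":\"alice\",\"text\":\"hi\"},{\"from\":\"bob\",\"text\":\"hello\"}]";
        try (Connection con = DriverManager.getConnection(archiveUrl, "user", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO archived_chats (thread_id, messages) VALUES (?, ?::jsonb)")) {
            ps.setLong(1, 1001L);
            ps.setString(2, json); // e.g. produced by Jackson's ObjectMapper
            ps.executeUpdate();
        }
    }
}
```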
I will be building an app that pulls down JSON objects from a web service, in the low hundreds, each relatively small, say 20 KB each.
The app won't be doing much more than displaying these POJOs, downloading new and updated ones when available, and deleting out-of-date ones. What would be the preferred method for persistent storage of these objects? I guess the two main contenders are storing them in an SQLite DB, maybe using ORMLite to cut down on the overhead, or just serializing the objects to disk, probably in one large file, and using a very fast JSON parser.
Any ideas what would be the preferred method?
You could consider using CouchDB as a cache between the mobile client and your web service.
CouchDB would have to run as a service on the Internet, caching the objects from the web service. On the client you can use TouchDB-Android: https://github.com/couchbaselabs/TouchDB-iOS/wiki/Why-TouchDB%3F . TouchDB-Android can synchronize automatically with a CouchDB instance running on the Internet. The application itself would then access TouchDB exclusively. TouchDB automatically detects whether or not there's an Internet connection, so your application keeps running even without one.
Advantages:
- Caching of JSON calls
- Client keeps working when the Internet connection is down, and synchronizes automatically when the connection is up again.
- Takes load off your web service, and you can scale.
We have used this setup before to let Android software work seamlessly even when the Internet connection dropped frequently and the service we accessed data from was quite slow and had limited capacity.
A DBMS such as SQLite comes with querying, indexing, and sorting capabilities (and other standard SQL DBMS features); consider whether you need any of these. How many objects are you planning to have in the production environment? If, say, a million, the disk-serialization approach might not scale.
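If the flat-file route wins, here's a hedged sketch of serializing the POJOs to disk with Jackson; the Item class and the file location are assumptions:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.List;

public class JsonCache {
    // Hypothetical POJO mirroring the web service's objects
    public static class Item {
        public long id;
        public String title;
    }

    private final ObjectMapper mapper = new ObjectMapper();

    // On Android, dir would typically be context.getFilesDir()
    public void save(File dir, List<Item> items) throws Exception {
        mapper.writeValue(new File(dir, "items.json"), items);
    }

    public List<Item> load(File dir) throws Exception {
        return mapper.readValue(new File(dir, "items.json"),
                mapper.getTypeFactory().constructCollectionType(List.class, Item.class));
    }
}
```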
I'd like to save persistent objects to the file system using Hibernate without the need for a SQL database.
Is this possible?
Hibernate works on top of JDBC, so all you need is a JDBC driver and a matching Hibernate dialect.
However, JDBC is basically an abstraction of SQL, so whatever you use is going to look, walk, and quack like an SQL database; you might as well use one and spare yourself a lot of headaches. Besides, any such solution is going to be comparable in size and complexity to lightweight Java DBs like Derby.
Of course, if you don't absolutely insist on using Hibernate, there are many other options.
It appears that it might technically be possible if you use a JDBC plain-text driver; however, I haven't seen any open-source ones that provide write access; the one I found on SourceForge is read-only.
You already have an entity model, and I suppose you don't want to lose it or the relationships it contains. An entity model is meant to be translated into a relational database.
Hibernate and any other JPA provider (e.g. EclipseLink) translate this entity model to SQL. They use a JDBC driver to connect to an SQL database, which you'll need to keep as well.
The correct question to ask is: does anybody know an embedded Java SQL database, one that you can start from within Java? There are plenty of those, mentioned in this topic:
- HyperSQL: stores the result in an SQL clear-text file, readily imported into any other database
- H2: uses binary files, low JAR file size
- Derby: uses binary files
- Ashpool: stores data in an XML-structured file
I have used HyperSQL on one project for small data, and Apache Derby for a project with huge databases (2 GB and more). Apache Derby performs better on these huge databases.
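For example, a minimal embedded Derby sketch (the database path and table are assumptions); the database is just a directory your program creates and can carry along:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class EmbeddedDerbyDemo {
    public static void main(String[] args) throws Exception {
        // ";create=true" creates the database directory on first use; no server process
        String url = "jdbc:derby:./data/appdb;create=true";
        try (Connection con = DriverManager.getConnection(url)) {
            // Fails on a second run once the table exists; fine for a one-off sketch
            con.createStatement().execute(
                    "CREATE TABLE item (id INT PRIMARY KEY, payload BLOB)");
        }
    }
}
```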
I don't know exactly what you need, but maybe it's one of the below:
1 - If your need is just to get away from SQL, you can use a NoSQL database.
Hibernate supports these through Hibernate OGM ( http://www.hibernate.org/subprojects/ogm ).
There are some DBs like Cassandra, MongoDB, CouchDB, Hadoop... You have some suggestions here.
2 - Now, if you don't want to run a database server (a service process that is always running), you can use Apache Derby. It's a DB just like any other SQL database, but with no need for a server: it keeps its data in local files, so you can easily ship the whole database with your program.
Take a look: http://db.apache.org/derby/
3 - If you really want a plain-text file, you can do as Michael Borgwardt said. But I don't know whether Hibernate would be a good idea in this case.
Both H2 and HyperSQL support embedded mode (running inside your JVM instead of in a separate server) and saving to local files; these are still SQL databases, but with Hibernate there aren't many other options.
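For what it's worth, a hedged sketch of pointing Hibernate at an embedded, file-backed H2 database; the paths, properties, and the tiny entity are all assumptions:

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class EmbeddedH2Hibernate {

    // Minimal hypothetical entity (use jakarta.persistence on Hibernate 6+)
    @Entity
    public static class Note {
        @Id
        public Long id;
        public String text;
    }

    public static void main(String[] args) {
        SessionFactory sf = new Configuration()
                .setProperty("hibernate.connection.driver_class", "org.h2.Driver")
                // File-backed embedded DB stored under ./data; no server process
                .setProperty("hibernate.connection.url", "jdbc:h2:./data/app")
                .setProperty("hibernate.connection.username", "sa")
                .setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect")
                .setProperty("hibernate.hbm2ddl.auto", "update")
                .addAnnotatedClass(Note.class)
                .buildSessionFactory();
        sf.close();
    }
}
```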
Well, since the question is still open and the OP said he's open to new approaches/suggestions, here's mine (a little late, but OK).
Do you know Prevayler? It's a Java implementation of object prevalence that keeps all of your business objects in RAM and maintains snapshots/changelogs on the file system. That makes it extremely fast and reliable, since after any crash it restores the last snapshot and reapplies every change to it.
Also, it's really easy to set up and run in your app.
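From memory, a hedged sketch of the Prevayler 2.x API (the directory name and classes are arbitrary; double-check the signatures against the current release):

```java
import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.prevayler.Prevayler;
import org.prevayler.PrevaylerFactory;
import org.prevayler.Transaction;

public class PrevaylerDemo {

    // The prevalent system: all business state lives here, in RAM
    public static class Store implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, String> data = new HashMap<>();
    }

    // Every mutation is a serializable Transaction, journaled to disk before it runs
    public static class Put implements Transaction<Store> {
        private static final long serialVersionUID = 1L;
        final String key, value;
        Put(String key, String value) { this.key = key; this.value = value; }
        @Override
        public void executeOn(Store store, Date when) { store.data.put(key, value); }
    }

    public static void main(String[] args) throws Exception {
        // "journal" is the directory holding snapshots and change logs
        Prevayler<Store> prevayler = PrevaylerFactory.createPrevayler(new Store(), "journal");
        prevayler.execute(new Put("greeting", "hello"));
        System.out.println(prevayler.prevalentSystem().data.get("greeting"));
        prevayler.close();
    }
}
```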
Of course this is possible; you can simply use the file I/O features of Java. The following steps are required:
1. Create a File object.
2. Create a FileOutputStream for it (though there are ways that use other classes).
3. Wrap this stream in an ObjectOutputStream (buffering it with a BufferedOutputStream is a good idea).
4. Use the write methods of the object created in the previous step, such as writeObject.
Note that your object must implement the Serializable interface.
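A quick sketch of those steps, writing one object and reading it back (the class and file names are just examples):

```java
import java.io.*;

public class SerializeDemo {

    // Hypothetical value class; must implement Serializable
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        byte[] payload;
        Item(byte[] payload) { this.payload = payload; }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File file = new File("items.bin");

        // Write: FileOutputStream wrapped in a buffered ObjectOutputStream
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(new Item(new byte[]{1, 2, 3}));
        }

        // Read the object back
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            Item item = (Item) in.readObject();
            System.out.println(item.payload.length);
        }
    }
}
```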
Does anybody know how much overhead Jackrabbit has compared with pure FS persistence?
I'm using it for a CMS project, but I also have to persist temporary files (which unfortunately have properties/metadata)... I don't know whether I should also employ Jackrabbit for that.
I suspect the overhead is significant enough to avoid this, at least as far as filesystem I/O is concerned.
These files are the same as the rest of the files in the repo, except that it's certain they will be deleted within a minute.
Should I create a layer to persist files with properties via the Java I/O API, should I use Jackrabbit, or should I use a database? If so, can it be tuned for performance somehow?
By default, Jackrabbit stores the binaries in the FileDataStore, which uses a FileOutputStream, so the overhead is relatively low. However, binaries in the data store remain until garbage collection runs, which might be a problem for you if you create a huge number of temporary files.
Metadata: it depends how much metadata you have. Metadata is stored in the persistence manager and possibly in the search index (Lucene). The main performance problem there is usually full-text search, so disable it if possible.
Should I use Jackrabbit or should I use a database?
That really depends on your use case. Jackrabbit does not claim to be "faster than a database", but its data model (hierarchical, key-value pairs) may be better or easier to use.
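If you do stay with Jackrabbit, here's a hedged sketch of storing a binary with its standard metadata through plain JCR 2.0 (the node name and MIME type are placeholders; custom properties beyond what nt:resource allows would need a mixin node type):

```java
import java.io.ByteArrayInputStream;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class JcrTempFile {
    // Store a temporary binary as a standard nt:file / nt:resource pair
    public static void store(Session session, String name, byte[] data)
            throws RepositoryException {
        Node file = session.getRootNode().addNode(name, "nt:file");
        Node content = file.addNode("jcr:content", "nt:resource");
        Binary binary = session.getValueFactory()
                .createBinary(new ByteArrayInputStream(data));
        content.setProperty("jcr:data", binary); // the bytes land in the FileDataStore
        content.setProperty("jcr:mimeType", "application/octet-stream");
        session.save();
    }
}
```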