I have a domain object that stores some metadata and some raw bytes. This is used for storing binary objects such as PDF documents and images.
I would like to persist the metadata in a database so it can be easily queried but I want to store the raw bytes in the file system for performance reasons. What is a good design for achieving this?
Should I have a domain object representing the raw bytes with its own DAO to perform CRUD and a separate JPA DAO to do the same for the metadata?
If that is the case would the domain object for the metadata contain a reference to the raw byte object that is marked as transient so JPA won't attempt to persist it?
Am I following an overly complex design for little benefit over storing raw bytes in the database? I'm using PostgreSQL 8.x if that makes a difference.
Many thanks.
I really wouldn't do this. Have you measured the supposed performance hit? How are you going to maintain transactionality between your data in the database and your data on the filesystem? For example, are you going to write to the filesystem, write to the db, and if that fails then roll back your filesystem change (which isn't as easy as simply deleting the file: do you have a previous version of the binary data)? How do you manage database backups etc. and keep everything in sync? I would strongly recommend keeping all the data in one place.
Since you're talking about storing PDFs and the like, perhaps you need a document management system?
This might look simple, but I haven't found a satisfying answer anywhere.
Why do we need serialization?
The answer I find everywhere is something like:
To convert an object into a byte stream and store it in a DB.
But my question is: can't we do it without using serialization?
If not, how are we storing the data in the DB?
Please explain this clearly and, if possible, provide an example.
Serialization is not to store objects in a database.
It's to convert an object into a stream of bytes. That stream of bytes can indeed be stored in a database, but it can also be saved to a file or sent through a socket (here is an example).
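As a minimal sketch of that conversion, the following uses Java's built-in serialization to turn an object into a byte array and back. The `Person` class and the helper names `toBytes` and `fromBytes` are invented for illustration; the same bytes could just as well be written to a file or a socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {

    // Hypothetical value class, used only for this illustration.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Turn an object into a byte array using Java's built-in serialization.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from the bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new Person("Ada", 36));
        Person copy = (Person) fromBytes(bytes);
        System.out.println(copy.name + " " + copy.age);
    }
}
```

The byte array is the only thing that leaves the JVM; where it ends up (database column, file, network) is a separate decision.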
Can't we do it without using serialization?
Absolutely. It's actually very rare to use serialization to store data in a database. Most of the time (I would say 99%) it is done using JDBC, mainly through an ORM tool such as Hibernate.
Because data is transferred across the network as a byte stream, you cannot put your object on the wire directly.
In the case of JDBC, the driver performs a sort of serialization of its own, in the appropriate format.
In general, this is not about programming but about general network structure.
The data traverses the following path:
Application layer
Transport layer
Internet layer
Link layer
At the last point it is converted to a byte stream and physically crosses the network.
I'm fairly new to Java web applications and I am undertaking the task of learning JPA. However, it is not explicitly clear what it means for an entity object to persist. I think I have an idea, but I would rather not assume its meaning.
I am referencing the Oracle JPA Doc, but it keeps using words like "persist" or "persistence" when describing persistent fields/properties. Can someone shed some light on this idea of persistence? And maybe define what it means for an instance of an entity to be persistent?
And if you could not use the word "persistent" (or any form of the word) in your definition that would be much appreciated. A simple answer would be great, but more in-depth explanations are definitely welcome! Thanks so much!
Persistence simply means storing permanently.
In Java we work with objects and try to store their values in a database (mostly an RDBMS).
JPA provides object-relational mapping (ORM), so that we can store an object directly in the database as a new tuple.
In JPA, objects are declared as entities so they can be mapped to tables in the database.
So persisting an entity means permanently storing the object (entity) in the database.
Hope this helps!
"Persist" means "lives on after the application is shut down". The object is not just in volatile memory; it's in more permanent storage on disk. If the application is shut down, or the user ends their session and begins a new one, the old data is still available from permanent storage on disk.
Databases store information on disks, unless they are in-memory versions that give you the advantage of using SQL but little else. If you use a relational SQL database, you get a query language that makes it easy to Create/Read/Update/Delete information without having to worry about how it's stored on the disk.
SQL databases store relations on disk using different data structures (e.g. B-Tree). Relations are defined in terms of tables and columns. Each record in a table is a tuple of column values. Applications have to map tables and columns to objects and attributes using object-relational mapping. JPA generalizes this idea and builds it into Java EE, following the example of implementations like TopLink and Hibernate.
NoSQL databases, like MongoDB, also store information on disk, but as documents rather than relations.
Object databases serialize an object and all its children using formats like Java serialization, XML, JSON, or custom formats (e.g. Google protocol buffers).
Graph databases, like Neo4J, can be thought of as more general cases of object databases.
In my Java application I have some serializable entity classes with inheritance. When saving instances of these classes, I convert them to a byte array and save it to a longblob column in my database table. Is there any advantage to using Hibernate for this program? As far as I understand, Hibernate is used to map entities to database tables in a proper way. But here I don't have a relational model to map the entities' attributes to; I am saving them as whole objects. Am I missing something? Please clarify. Thanks in advance.
If you don't have a relational data model to save those objects and you can't change your schema, then you can use your current approach.
If you use PostgreSQL you might be interested in JSON storage as well. That way you can store your hierarchies as JSON objects and you can even run native SQL queries against them (these are not inheritance-aware, but you can cope with that by using some _class column to distinguish between object types).
The cleanest approach is to keep the relational model in sync with your business domain model. That way you can benefit from:
optimistic locking (preventing the lost-update phenomenon)
caching (2nd level cache and query cache)
query-able hierarchies
an external DBA could run an update on your hierarchies using plain SQL
auditing
I have some Java objects which are not serializable. They come from an external library and I cannot flag them as serializable. Here are a couple of questions.
1) Can they still be written to a MySQL BLOB column?
2) Is there any other way of persisting them outside of my JVM?
Any help will be useful.
Thanks
-a.
1) Have you tried it?
2) Sure, for example in XML files. I personally use XStream.
1) Can they still be written to a MySQL BLOB column?
Yes, but you'll need to implement a serialisation algorithm to generate the bytes. Also, you will need to be sure you can access all the required internal state.
2) Is there any other way of persisting them outside of my JVM?
Take a look at XStream
Well, they don't have to be serializable in terms of "Java's default binary serialization scheme" but they have to be serializable somehow, by definition: you've got to have some way of extracting the data into a byte array, and then reconstituting it later from that array.
How you do that is up to you - there are lots of serialization frameworks/protocols/etc around.
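One way to picture "extracting the data into a byte array and reconstituting it later" is a hand-written codec built on `DataOutputStream` and `DataInputStream`. This is only a sketch: `Point3D` is a made-up stand-in for the third-party class, and it assumes all the state you need is reachable through public getters and a public constructor, which may not hold for a real library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ManualCodecDemo {

    // Hypothetical stand-in for an external class that is NOT Serializable.
    static final class Point3D {
        private final String label;
        private final double x;
        private final double y;
        private final double z;

        Point3D(String label, double x, double y, double z) {
            this.label = label;
            this.x = x;
            this.y = y;
            this.z = z;
        }

        String getLabel() { return label; }
        double getX() { return x; }
        double getY() { return y; }
        double getZ() { return z; }
    }

    // Write each field in a fixed order; this order is the binary format.
    static byte[] encode(Point3D p) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(p.getLabel());
            out.writeDouble(p.getX());
            out.writeDouble(p.getY());
            out.writeDouble(p.getZ());
        }
        return buffer.toByteArray();
    }

    // Read the fields back in exactly the order encode wrote them.
    static Point3D decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String label = in.readUTF();
            double x = in.readDouble();
            double y = in.readDouble();
            double z = in.readDouble();
            return new Point3D(label, x, y, z);
        }
    }

    public static void main(String[] args) throws IOException {
        Point3D restored = decode(encode(new Point3D("origin", 1.5, -2.0, 3.25)));
        System.out.println(restored.getLabel() + " " + restored.getX());
    }
}
```

The resulting `byte[]` could then be bound to a BLOB column, for example with `PreparedStatement.setBytes`, or written to a file. Libraries such as XStream automate this kind of field extraction.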
They can, but not automatically. You'll have to come up with your own code to construct a binary representation of your object, persist the binary data to your database, and then reconstruct the object when you pull the data out of the database.
Yes, but again it will not be automatic. You'll have to come up with your own binary representation of the object, decide how you want to store it, and then reconstruct the object when you want to read it.
Serializable doesn't do anything in itself; it's just a marker that hints that a class can be serialized. Some tools require the presence of the interface while others do not.
I haven't looked into storing Java objects in a MySQL BLOB, but if serializable Java objects can be stored that way, I see no reason why it wouldn't be possible.
2) Is there any other way of persisting them outside of my JVM?
There are many ways to persist objects outside the JVM. Store them to disk, FTP, network storage, etc., and there are just as many tools for storing them in various formats (such as XML).
I'm creating an application that will use a lot of data which is, for all intents and purposes, static. I had assumed it'd make most sense to use a SQLite database to handle that data. I'm wondering if it makes sense to just use one or more XML files and then access them as raw resources. Bear in mind that there's likely going to be a LOT of data, on the order of hundreds of separate pieces.
Am I right to assume SQLite is best, both in terms of memory management and overall design considerations or does SQLite not make sense if the data is basically static?
SQLite can seem pointless if the data is static. However, if you are going to manipulate a lot of data, you should use it:
It will be easier to:
Retrieve data
Filter data
Sort data
Using XML files will cause some performance problems because of the way in which SAX or DOM parses XML.
It will be easier for you to update that set of data in the future (imagine that you want to add more data in the next release)
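To make the retrieve/filter/sort point concrete, here is a rough sketch of what filtering and sorting look like when the data lives in XML and is read with the standard DOM parser. The element and attribute names (`item`, `name`, `price`) and the sample data are invented for illustration. Note that the whole document is parsed into memory before any filtering can happen.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlFilterDemo {

    // Hypothetical sample data standing in for the static XML resource.
    static final String SAMPLE =
            "<items>"
            + "<item name=\"pear\" price=\"30\"/>"
            + "<item name=\"apple\" price=\"50\"/>"
            + "<item name=\"fig\" price=\"10\"/>"
            + "</items>";

    // Parse the full document, then filter and sort by hand.
    static List<String> namesWithPriceAtLeast(String xml, int minPrice) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (Integer.parseInt(item.getAttribute("price")) >= minPrice) {
                names.add(item.getAttribute("name"));
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namesWithPriceAtLeast(SAMPLE, 30)); // prints [apple, pear]
    }
}
```

With a database, the same result would come from a single query along the lines of `SELECT name FROM item WHERE price >= ? ORDER BY name` (table and column names again hypothetical), with the filtering and sorting done by the engine.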
Cristian is right. A database gives you better access times and lets you modify data in a very convenient way. XML might be a better idea for tree-like data structures.
In my opinion there are 2 questions here:
What kind of data are you storing?
Do you allow the user to modify this data (for example in the application or using Notepad)?
There is also one big disadvantage of XML: it is ultimately plain text, so anyone can read it. To prevent that, you would have to encrypt the data (which means additional effort). With XML, using marshalling techniques (JiBX, Castor, JAXB) might be convenient and might also lower memory consumption.
Please describe what kind of data you are storing in the DB, so we can come up with a better answer.
Have you thought about your data being stolen (from the SQLite database)?
With a SQLite database, anybody with root access can just pull the db file and use it.