I'm creating an application that will use a lot of data which is, for all intents and purposes, static. I had assumed it'd make most sense to use a SQLite database to handle that data. I'm wondering if it makes sense to just use an XML file(s) and then access it as a raw resource. Bear in mind that there's likely going to be a LOT of data, to the order of hundreds of separate pieces.
Am I right to assume SQLite is best, both in terms of memory management and overall design considerations or does SQLite not make sense if the data is basically static?
At first glance, SQLite may seem like overkill if the data is static. However, if you are going to work with a lot of data, you should use it:
It will be easier to:
Retrieve data
Filter data
Sort data
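Using XML files will cause some performance problems because of the way SAX and DOM parse XML: a DOM parser loads the whole document into memory, and a SAX parser has to scan it sequentially to find anything.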
It will be easier for you to update that set of data in the future (imagine that you want to add more data in the next release)
Cristian is right. A database gives you better access times and lets you modify data very conveniently. XML might be a better idea for tree-like data structures.
In my opinion there are two questions here:
What kind of data are you storing?
Do you allow the user to modify this data (for example in the application or using Notepad)?
There is also one big disadvantage of XML: it is ultimately plain text, so anyone can read it. To prevent that, you would have to encrypt the data (which means additional effort). With XML, using marshaling techniques (JiBX, Castor, JAXB) might be convenient and might also lower memory consumption.
Please describe what kind of data you are storing in the DB, so we can come up with a better answer.
Have you considered that your data could be stolen (from the SQLite database)?
With a SQLite database, anybody with root access can simply pull the db file and use it.
I have watched the videos and read the documentation for Cloud Firestore, from Google's Firebase service, but coming from the Realtime Database I can't figure this out.
I have a web app in mind in which I want to store my providers for different categories of products. I want to perform a search query across all my products to find which providers I have for a given product, and eventually access that provider's info.
I am planning to use this structure for this purpose:
Providers (Collection)
    Provider 1 (Document)
        Name
        City
        Categories
    Provider 2
        Name
        City
Products (Collection)
    Product 1 (Document)
        Name
        Description
        Category
        Provider ID
    Product 2
        Name
        Description
        Category
        Provider ID
So my question is, is this approach the right way to access the provider info once I get the product I want?
I know this is possible in the Realtime Database: using the provider ID I could search for that provider in the providers section. With Firestore, I am not sure whether it's possible or whether this is the right approach.
What is the correct way to structure this kind of data in Firestore?
You need to know that there is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. The best and correct solution is the solution that fits your needs and makes your job easier. Bear also in mind that there is also no single "correct data structure" in the world of NoSQL databases. All data is modeled to allow the use-cases that your app requires. This means that what works for one app, may be insufficient for another app. So there is not a correct solution for everyone. An effective structure for a NoSQL type database is entirely dependent on how you intend to query it.
The way you are structuring your data looks good to me. In general, there are two ways to achieve the same thing. The first is to keep a reference to the provider in the product document (as you already do); the second is to copy the entire provider object into the product document. This second technique is called denormalization and is quite common practice when it comes to Firebase. We often duplicate data in NoSQL databases to suit queries that may not be possible otherwise. For a better understanding, I recommend watching the video "Denormalization is normal with the Firebase Database". It's about the Firebase Realtime Database, but the same principles apply to Cloud Firestore.
Also, when you duplicate data, there is one thing to keep in mind: just as you add the data, you need to maintain it. In other words, if you want to update or delete a provider object, you need to do it in every place where it exists.
You might wonder now, which technique is best. In a very general sense, the best way in which you can store references or duplicate data in a NoSQL database is completely dependent on your project's requirements.
So you should ask yourself some questions about the data you want to either duplicate or keep as references:
Is the data static, or will it change over time?
If it does, do you need to update every duplicated instance of the data so they all stay in sync? This is what I have also mentioned earlier.
When it comes to Firestore, are you optimizing for performance or cost?
If your duplicated data needs to change and stay in sync, you might have a hard time keeping all those duplicates up to date. It might also mean spending a lot of money keeping all those documents fresh, as every change requires a read and a write for each document. In this case, holding only references is the winning variant.
In this kind of approach, you write very little duplicated data (pretty much just the Provider ID). So that means that your code for writing this data is going to be quite simple and quite fast. But when reading the data, you will need to load the data from both collections, which means an extra database call. This typically isn't a big performance issue for reasonable numbers of documents, but definitely does require more code and more API calls.
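As a rough illustration of the reference approach, here is a minimal sketch that models the two collections with plain in-memory maps. It does not use the Firestore SDK; the class name `ReferenceLookup`, the field key `providerId` and the sample values are assumptions made up for the example. The point is only that reaching the provider from a product costs two reads.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for two top-level collections. This is not the Firestore API. */
final class ReferenceLookup {
    final Map<String, Map<String, String>> providers = new HashMap<>();
    final Map<String, Map<String, String>> products = new HashMap<>();
    int reads = 0; // number of simulated document reads

    /** Simulates fetching one document by ID and counts the read. */
    private Map<String, String> read(Map<String, Map<String, String>> collection, String id) {
        reads++;
        return collection.get(id);
    }

    /** Loads the product, then follows its providerId to the provider document. */
    Map<String, String> providerOfProduct(String productId) {
        Map<String, String> product = read(products, productId);
        if (product == null) {
            return null; // unknown product, nothing to follow
        }
        return read(providers, product.get("providerId"));
    }

    public static void main(String[] args) {
        ReferenceLookup db = new ReferenceLookup();
        db.providers.put("provider1", Map.of("name", "Acme", "city", "Lima"));
        db.products.put("product1",
                Map.of("name", "Bolt", "category", "Hardware", "providerId", "provider1"));

        Map<String, String> provider = db.providerOfProduct("product1");
        System.out.println(provider.get("name") + " reads=" + db.reads);
    }
}
```

Writing a product only stores the provider ID, so the write path stays trivial; the extra cost shows up as the second read.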
If you need your queries to be very fast, you may prefer to duplicate more data so that the client only has to read one document per item queried, rather than multiple documents. You may also be able to rely on local client caches to make this cheaper, depending on the data the client has to read.
In this approach, you duplicate all data for a provider in each product document. This means that the code to write this data is more complex, and you're definitely storing more data: one more provider object for each product document. You'll also need to figure out if and how to keep it up to date in each document. On the other hand, reading a product document now gives you all the information about the provider in one read.
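The duplication approach can be sketched the same way, again with a plain in-memory map rather than the Firestore SDK. The class `DenormalizedProducts` and the field keys (`providerName`, `providerCity`) are hypothetical. Reading a product needs a single lookup, but changing a provider means rewriting every product document that carries a copy.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a products collection with embedded provider data. */
final class DenormalizedProducts {
    final Map<String, Map<String, String>> products = new HashMap<>();

    /** Stores a product together with a copy of the provider fields it needs. */
    void addProduct(String productId, String name, String providerId,
                    String providerName, String providerCity) {
        Map<String, String> doc = new HashMap<>();
        doc.put("name", name);
        doc.put("providerId", providerId);
        doc.put("providerName", providerName);
        doc.put("providerCity", providerCity);
        products.put(productId, doc);
    }

    /** Updates every copy of the provider's city; returns how many documents were rewritten. */
    int moveProvider(String providerId, String newCity) {
        int rewritten = 0;
        for (Map<String, String> doc : products.values()) {
            if (providerId.equals(doc.get("providerId"))) {
                doc.put("providerCity", newCity);
                rewritten++;
            }
        }
        return rewritten;
    }
}
```

The return value of `moveProvider` is the number of writes a single provider change fans out into, which is the maintenance cost described above.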
This is a common consideration in NoSQL databases: you'll often have to consider write performance and disk storage vs. reading performance and scalability.
For your choice of whether or not to duplicate some data, it is highly dependent on your data and its characteristics. You will have to think that through on a case-by-case basis.
So in the end, remember that both are valid approaches, and neither is inherently better than the other. It all depends on your use cases and on how comfortable you are with this technique of duplicating data. Data duplication is the key to faster reads, not just in Cloud Firestore or the Firebase Realtime Database but in general. Any time you add the same data to a different location, you're duplicating data in favor of faster read performance. In return, unfortunately, you get more complex updates and higher storage/memory usage. Note also that extra calls are not expensive in the Firebase Realtime Database, but they are in Firestore. How much duplicated data versus extra database calls is optimal for you depends on your needs and your willingness to let go of the "Single Point of Definition" mindset, which is a very subjective matter.
After finishing a few Firebase projects, I find that my reading code gets drastically simpler if I duplicate data. But of course, the writing code gets more complex at the same time. It's a trade-off between these two and your needs that determines the optimal solution for your app. Furthermore, to be even more precise you can also measure what is happening in your app using the existing tools and decide accordingly. I know that is not a concrete recommendation but that's software development. Everything is about measuring things.
Remember also that some database structures are easier to protect with security rules than others. So try to find a schema that can be easily secured using Cloud Firestore Security Rules.
Please also take a look at my answer from this post where I have explained more about collections, maps and arrays in Firestore.
In order to add data persistence to a piece of billing-oriented software, I wonder what the best way to save and retrieve data is in my situation.
I work with JavaFX's TableView populated by custom objects (with many strings, ints, booleans, ...), each one representing one bill. The user must be able to add, read and edit data on the fly. Everything is stored locally; there is no need for a cloud or anything like that.
I usually use serialization to write my objects, but is it a safe and fast way to store around 10,000 custom objects?
Should I use XML, serialization, a local database (with JavaDB?) or something else?
By fast, I mean that the user can write and edit data. I have no problem with a small loading time when the app is launched.
By safe, I don't mean encrypted; I mean safe in the "data won't get lost or corrupted" sense.
Finally, if there are multiple solutions, why choose one over another?
Any persistence mechanism (flat file, relational DB, NoSQL) can be safe if used as designed, or unsafe if abused or misunderstood. Your question is very open ended and the answer can get very involved or stay very light; it all depends.
Typically the choices come down to:
Flat file (say binary serialisation, CSV, JSON or XML). A very simple mechanism, but it takes effort to scale to large files, and care must be taken when changing the code base, as changes could prevent older files from being readable. You also have to bear in mind when the data is written relative to changes coming in from the user, and the possibility of a machine crash: there are no transactions, so data can be corrupted, which is not a simple topic in its own right. As for which format is best, many a religious war has been fought over that, but typically a textual format (JSON, XML or CSV) has the advantage of being human readable, which helps with debugging and maintenance. XML and JSON support nested structures, which is an advantage over CSV. As for performance, text manipulation typically makes a parser about 10x slower than a binary one. However, there are fast implementations and slow ones, and for 10k objects you are unlikely to notice the difference.
Relational database. Very useful for apps that benefit from relational queries (SQL), and a lot of effort has gone into making them transactional and robust to machine crashes. They are generally the persistence mechanism of choice for large businesses and require some knowledge to set up and maintain the DB itself. H2 is a very simple, low-cost entry-level option and Oracle is at the other end of the spectrum. Relational databases suffer from a domain mismatch: object design and SQL design do not map onto each other without some effort from the developer. They also typically suffer from scaling problems as they are not usually clusterable, which is not a problem for 10k rows.
NoSQL databases (e.g. Redis, Cassandra, Mongo, Couch, Neo4j). Generally not transactional, but they are often faster than relational DBs and offer clustering from the get-go, making them very robust. They also support different data modelling paradigms such as graph, list and document, making the NoSQL landscape much richer than the relational SQL one.
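For the flat-file option, one common way to reduce the risk of a half-written file after a crash is to write to a temporary file and then move it over the real one. The sketch below assumes that pattern; the class name `SafeFileSave` and the file prefix are made up for illustration. It is not a substitute for real transactions: it does not force data to disk, and whether the move is truly atomic depends on the file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

final class SafeFileSave {
    private SafeFileSave() {}

    /**
     * Writes all lines to a temporary file in the target's directory, then moves it
     * over the target, so readers see either the old file or the complete new one.
     */
    static void save(Path target, List<String> lines) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "bills", ".tmp");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file system
            // cannot do it; replacing an existing target is implementation specific.
            Files.move(tmp, target,
                    StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

A call such as `SafeFileSave.save(Path.of("bills.csv"), lines)` would replace the whole file on every save, which is usually acceptable for around 10k small records.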
I assume that you are not working on a professional project and lack a mentor, so I will wrap up by suggesting that you focus on flat files first and then pick a DB product of some kind to experiment with (H2 is very good for learning relational products, and Mongo or Redis are easy ways to learn one of the NoSQL products).
You can use the H2 database (http://www.h2database.com/). It's a really convenient way to store data, and you can use it as an embedded database, which looks like this:
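Class.forName("org.h2.Driver"); // optional with JDBC 4+ drivers, which register themselves
// try-with-resources closes the connection even if the application code throws
try (Connection conn = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")) {
    // add application code here
}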
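H2 creates the database file for "test" in the user's home directory. The exact file name depends on the H2 version (test.mv.db in recent versions, test.h2.db in older ones).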
Object serialization is safe. It's not particularly fast and you have to be very careful about how you make changes to the class to ensure that you can consistently deserialize. This is the biggest disadvantage in my opinion about object serialization.
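To make the point about class changes concrete, here is a small sketch of a serializable class with an explicit `serialVersionUID`. The `Bill` class and its fields are hypothetical. Declaring the ID yourself means compatible changes (such as adding a field) do not alter it; without it, the JVM computes one from the class shape, and data written by an older version of the class can fail to load with an `InvalidClassException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical bill class used only for illustration. */
final class Bill implements Serializable {
    // Fixed explicitly so that compatible class changes keep old data readable.
    private static final long serialVersionUID = 1L;

    final String customer;
    final int amountCents;
    final boolean paid;

    Bill(String customer, int amountCents, boolean paid) {
        this.customer = customer;
        this.amountCents = amountCents;
        this.paid = paid;
    }
}

final class BillSerialization {
    private BillSerialization() {}

    /** Serializes one bill to a byte array using standard Java serialization. */
    static byte[] toBytes(Bill bill) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(bill);
        }
        return buffer.toByteArray();
    }

    /** Reads back a bill previously written by toBytes. */
    static Bill fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Bill) in.readObject();
        }
    }
}
```

Incompatible changes, such as changing a field's type, still break deserialization even with a fixed ID, so the care mentioned above remains necessary.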
XML (or JSON) aren't bad either. There are binding technologies like JAXB, Jackson or Gson which allow you to seamlessly map objects to XML or JSON. Permissive binding makes these formats easier to use than object serialization, having the additional benefit of being human readable and editable, but with the cost of being more verbose (consider file compression). If your storage format is a giant XML file, you can also search for records using XPath.
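As a sketch of the XPath idea, the example below queries an XML document with the `javax.xml.xpath` API that ships with the JDK. The document layout (`bills`, `bill`, `customer`, the `paid` attribute) and the class name `BillXPath` are assumptions invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BillXPath {
    private BillXPath() {}

    /** Made-up sample data standing in for a larger XML storage file. */
    static final String SAMPLE =
            "<bills>"
            + "<bill id=\"1\" paid=\"true\"><customer>Alice</customer></bill>"
            + "<bill id=\"2\" paid=\"false\"><customer>Bob</customer></bill>"
            + "<bill id=\"3\" paid=\"false\"><customer>Carol</customer></bill>"
            + "</bills>";

    /** Returns the customer names of all bills whose paid attribute is "false". */
    static List<String> unpaidCustomers(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "/bills/bill[@paid='false']/customer",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }
}
```

Each query parses the whole document again, so for frequent lookups it would make sense to parse once and keep the DOM or the bound objects in memory.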
JavaDB (or H2 or SQLite) is good in that it implements a relational data store, so you can perform SQL queries on the data. Managing lots of records is much more straightforward with a proper database. You could probably save on disk space, too. I would recommend this approach.
Will there be multiple clients reading these files? In that case, for safety, you would have to implement some kind of file locking scheme to prevent data corruption. You can get around this with some kind of out-of-process data management, using a lightweight server such as H2 or one of the NoSQL datastores like Mongo or Cassandra.
We're going to write a new web interface for a big system based on Oracle database. All business rules are already coded in PL/SQL stored procedures and we'd like to reuse as much code as possible. We'll write some new stored procedures that will combine the existing business rules and return the final result dataset.
We want to do this on the database level to avoid java-db round trips. The interface layer will be written in Java (we'd like to use GWT), so we need a way of passing data from Oracle stored procedures to Java service side. The data can be e.g. a set of properties of a specific item or a list of items fulfilling certain criteria.
Would anyone recommend a preferable way of doing this?
We're considering one of the 2 following scenarios:
passing objects and lists of objects (DB object types defined on the schema level)
passing a sys_refcursor
We verified that both approaches are "doable", the question is more about design decision, best practice, possible maintenance problems, flexibility, etc.
I'd appreciate any hints.
I would recommend sticking with a refcursor with well-defined keys (agreed on by both the Java and the PL/SQL developers). This is much easier to extend in the future: you can easily convert the refcursor to a HashMap and then the HashMap to a POJO using Apache Commons BeanUtils if needed. I'm working on a big telecom project with many approaches to this issue, and the refcursor seems to be the best at the end of the day.
In the past I have achieved exactly the same thing with a classic JDBC CallableStatement, without any performance or maintenance issues. With ORM solutions like Hibernate making persistence much more flexible, you can wrap your solution around Hibernate, as is done in this post. Also see this example if you are not already familiar with how stored procedures and CallableStatement work.
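The refcursor-to-map step could look roughly like the sketch below. It assumes the cursor has already been obtained on the Java side as a `java.sql.ResultSet` (for example from a `CallableStatement` out parameter), which is not shown here; the class name `CursorRows` is made up. Only standard JDBC interfaces are used.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CursorRows {
    private CursorRows() {}

    /** Copies every remaining row of the cursor into a column-label -> value map. */
    static List<Map<String, Object>> toMaps(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            // LinkedHashMap keeps the column order of the cursor
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Because the maps are keyed by column label, adding a column to the cursor later does not break existing callers, which is the extensibility argument made above.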
It's been a while since I've done something like that, but the way I remember is that you need to define a view that calls your stored procedure, and you can then easily read the result sets from within java, with the OR-mapper of your choice.
So, this seems close to your scenario 1, which never caused any problems in my experience.
The one thing you need to be careful about is transaction handling: if your stored procedures write data, and you call several of them within a Java EE transaction, you might end up with inconsistent data.
In a Java web application, I would like to know whether it is a proper (or "standard"?) approach to load all the essential data, such as config data, message data, code maintenance data, dropdown option data, etc. (assuming none of it is updated frequently), from the database into "static" variables when the server starts up. Or is it preferable to retrieve the data by querying the DB per request?
Thanks for all your advice here.
It is perfectly valid to pull all the data that is not going to be modified during the application life-cycle out of the database and keep it in memory, as a singleton or something similar.
This is a good idea because it saves DB hits and retrieval is faster. A lot of environment specific settings and other data can also be pulled once and kept in an immutable hashmap for any future request.
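A minimal sketch of that idea: load the values once at startup and keep them in an immutable map. The class name `StartupConfig` is made up, and the `Supplier` stands in for whatever code actually queries the database, which is not shown. `Map.copyOf` requires Java 10 or later.

```java
import java.util.Map;
import java.util.function.Supplier;

final class StartupConfig {
    private final Map<String, String> values;

    /** Calls the loader exactly once and keeps an immutable snapshot of its result. */
    StartupConfig(Supplier<Map<String, String>> loader) {
        this.values = Map.copyOf(loader.get());
    }

    /** Looks up a value without touching the database again. */
    String get(String key, String fallback) {
        return values.getOrDefault(key, fallback);
    }

    /** Exposes the immutable snapshot; attempts to modify it throw. */
    Map<String, String> asMap() {
        return values;
    }
}
```

One instance would typically be created during application startup and shared, so every later request reads from memory.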
In a typical web app you generally do not have so many config data/option objects that they eat up a lot of memory and cause an OOM. But if you have a table with hundreds of thousands of config entries, it is better to fetch objects as and when they are requested. And if you do want to keep them in memory, think about putting them in a key-value store like Memcached.
We used DB to store config values and ehcache to avoid a lot of DB hits. This way you don't need to worry about memory consumption (it will use whatever memory you have).
Ehcache is one of many available DB cache solutions and can be configured on top of JPA etc.
You can configure Ehcache (or many other cache providers) to treat the tables as read-only, in which case it will only go to the DB when explicitly told to invalidate the cache. This performs pretty well. The overhead does become visible when reads occur very frequently (like 100/sec), but storing the config value in a local variable, avoiding reads inside loops, and passing the value down through the method calls usually mitigates this well enough.
Storing values in a singleton as Java objects performs best, but if you want to modify them without restarting the app, it becomes a little bit involved.
Here is a simple way to achieve dynamic configuration with Java objects:
private volatile ImmutableMap<String,Object> param_value;
Basically you'll have to start thinking about multi-threaded access, and memory issues (while it's quite unlikely that you'll run out of memory because of configuration values, unless you have binary data as config values etc.).
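The volatile-field idea can be fleshed out roughly as follows. This sketch swaps Guava's `ImmutableMap` for the JDK's `Map.copyOf` (Java 10+) so it has no dependencies; the class name `DynamicConfig` and its methods are assumptions. Readers always see either the old or the new complete map, never a partially updated one.

```java
import java.util.Map;

final class DynamicConfig {
    // volatile makes a newly assigned map visible to all reader threads
    private volatile Map<String, Object> paramValue = Map.of();

    /** Reads from the current snapshot; no locking is needed. */
    Object get(String key) {
        return paramValue.get(key);
    }

    /** Replaces the whole snapshot with an immutable copy of the fresh values. */
    void reload(Map<String, Object> fresh) {
        paramValue = Map.copyOf(fresh);
    }
}
```

A background job or an admin action could call `reload` with values re-read from the database. A method that needs several related values should grab them in one go from the same snapshot rather than across separate `get` calls, since a reload may happen in between.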
In essence, I'd recommend using the DB and some cache provider unless that part of code really needs high-performance.
I have a domain object that stores some metadata and some raw bytes. This is used for storing binary objects such as PDF documents and images.
I would like to persist the metadata in a database so it can be easily queried but I want to store the raw bytes in the file system for performance reasons. What is a good design for achieving this?
Should I have a domain object representing the raw bytes with its own DAO to perform CRUD and a separate JPA DAO to do the same for the metadata?
If that is the case would the domain object for the metadata contain a reference to the raw byte object that is marked as transient so JPA won't attempt to persist it?
Am I following an overly complex design for little benefit over storing raw bytes in the database? I'm using PostgreSQL 8.x if that makes a difference.
Many thanks.
I really wouldn't do this. Have you measured the supposed performance hit? How are you going to maintain transactionality between your data in the database and your data on the filesystem? For example, are you going to write to the filesystem, write to the DB, and if that fails then roll back your filesystem change (which isn't as easy as simply deleting the file: do you have a previous version of the binary data)? How do you manage database backups etc. and keep everything in sync? I would strongly recommend keeping all the data in one place.
Since you're talking about storing PDFs and the like, perhaps you need a document management system?