Application configuration files [closed] - java

OK, so I don't want to start a holy-war here, but we're in the process of trying to consolidate the way we handle our application configuration files and we're struggling to make a decision on the best approach to take. At the moment, every application we distribute is using its own ad-hoc configuration files, whether it's property files (ini style), XML or JSON (internal use only at the moment!).
Most of our code is Java at the moment, so we've been looking at Apache Commons Config, but we've found it to be quite verbose. We've also looked at XMLBeans, but it seems like a lot of faffing around. I also feel as though I'm being pushed towards XML as a format, but my clients and colleagues are apprehensive about trying something else. I can understand it from the client's perspective, everybody's heard of XML, but at the end of the day, shouldn't we be using the right tool for the job?
What formats and libraries are people using in production systems these days? Is anyone else trying to avoid the angle bracket tax?
Edit: really needs to be a cross platform solution: Linux, Windows, Solaris etc. and the choice of library used to interface with configuration files is just as important as the choice of format.

YAML, for the simple reason that it makes for very readable configuration files compared to XML.
XML:
<user id="babooey" on="cpu1">
    <firstname>Bob</firstname>
    <lastname>Abooey</lastname>
    <department>adv</department>
    <cell>555-1212</cell>
    <address password="xxxx">ahunter@example1.com</address>
    <address password="xxxx">babooey@example2.com</address>
</user>
YAML:
babooey:
    computer : cpu1
    firstname: Bob
    lastname: Abooey
    cell: 555-1212
    addresses:
        - address: babooey@example1.com
          password: xxxx
        - address: babooey@example2.com
          password: xxxx
The examples were taken from this page: http://www.kuro5hin.org/story/2004/10/29/14225/062

First: This is a really big debate issue, not a quick Q+A.
My favourite right now is to simply include Lua, because
I can permit things like width=height*(1+1/3)
I can make custom functions available
I can forbid anything else. (impossible in, for instance, Python (including pickles.))
I'll probably want a scripting language somewhere else in the project anyway.
Another option, if there's a lot of data is to use sqlite3, because they're right to claim
Small.
Fast.
Reliable.
Choose any three.
To which I would like to add:
backups are a snap. (just copy the db file.)
easier to switch to another db, ODBC, whatever. (than it is from fugly-file)
But again, this is a bigger issue. A "big" answer to this probably involves some kind of feature matrix or list of situations like:
Amount of data, or short runtime
For large amounts of data, you might want efficient storage, like a db.
For short runs (often), you might want something that you don't need to do a lot of parsing for, consider something that can be mmap:ed in directly.
What does the configuration relate to?
Host:
I like YAML in /etc. Is that reimplemented in windows?
User:
Do you permit users to edit config with text editor?
Should it be centrally manageable? Registry / gconf / remote db?
May the user have several different profiles?
Project:
File(s) in project directory? (Version control usually follows this model...)
Complexity
Are there only a few flat values? Consider YAML.
Is the data nested, or dependent in some way? (This is where it gets interesting.)
Might it be a desirable feature to permit some form of scripting?
Templates can be viewed as a kind of configuration file.

XML XML XML XML. We're talking config files here. There is no "angle bracket tax" if you're not serializing objects in a performance-intense situation.
Config files must be human readable and human understandable, in addition to machine readable. XML is a good compromise between the two.
If your shop has people that are afraid of that new-fangled XML technology, I feel bad for you.

Without starting a new holy war, the sentiment of the 'angle bracket tax' post is one area where I majorly disagree with Jeff. There's nothing wrong with XML; it's reasonably human readable (as much as YAML or JSON or INI files are), but remember its intent is to be read by machines. Most language/framework combos come with an XML parser of some sort for free, which makes XML a pretty good choice.
Also, if you're using a good IDE like Visual Studio, and if the XML comes with a schema, you can give the schema to VS and magically you get intellisense (you can get one for NHibernate for example).
Ultimately you need to think about how often you're going to be touching these files once in production: probably not that often.
This still says it all for me about XML and why it's still a valid choice for config files (from Tim Bray):
"If you want to provide general-purpose data that the receiver might want to do unforeseen weird and crazy things with, or if you want to be really paranoid and picky about i18n, or if what you’re sending is more like a document than a struct, or if the order of the data matters, or if the data is potentially long-lived (as in, more than seconds) XML is the way to go.
It also seems to me that the combination of XML and XPath hits a sweet spot for data formats that need to be extensible; that is to say, it’s pretty easy to write XML-processing code that won’t fail in the presence of changes to the message format that don’t touch the piece you care about."

@Guy
But application config isn't always just key/value pairs. Look at something like the tomcat configuration for what ports it listens on. Here's an example:
<Connector port="80" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true" />
<Connector port="8009"
enableLookups="false" redirectPort="8443" protocol="AJP/1.3" />
You can have any number of connectors. Define more in the file and more connectors exist. Don't define any more and no more exist. There's no good way (imho) to do that with plain old key/value pairs.
If your app's config is simple, then something simple like an INI file that's read into a dictionary is probably fine. But for something more complex like server configuration, an INI file would be a huge pain to maintain, and something more structural like XML or YAML would be better. It all depends on the problem set.
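To make the "any number of connectors" point concrete, here is a minimal sketch that reads repeated Connector elements with the JDK's built-in DOM parser. The class name ConnectorConfig and the wrapping Service root element are assumptions made for this illustration; this is not how Tomcat itself loads its configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
final class ConnectorConfig {

    /** Returns the port of every Connector element, in document order. */
    static List<Integer> connectorPorts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList connectors = doc.getElementsByTagName("Connector");
            List<Integer> ports = new ArrayList<>();
            for (int i = 0; i < connectors.getLength(); i++) {
                Element connector = (Element) connectors.item(i);
                ports.add(Integer.parseInt(connector.getAttribute("port")));
            }
            return ports;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse connector configuration", e);
        }
    }

    public static void main(String[] args) {
        // The <Service> root is assumed here; XML needs a single root element.
        String xml = "<Service>"
                + "<Connector port=\"80\" redirectPort=\"8443\" />"
                + "<Connector port=\"8009\" redirectPort=\"8443\" protocol=\"AJP/1.3\" />"
                + "</Service>";
        System.out.println(connectorPorts(xml)); // prints [80, 8009]
    }
}
```

Adding a third Connector element to the input yields a third entry in the list with no code change, which is awkward to express with flat key/value pairs.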

We are using ini style config files. We use the Nini library to manage them, and it makes them very easy to work with. Nini was originally written for .NET but it has been ported to other platforms using Mono.

XML, JSON, INI.
They all have their strengths and weaknesses.
In an application context, I feel that the abstraction layer is the important thing.
If you can choose a way to structure the data that is a good middle ground between human readability and how you want to access/abstract the data in code, you're golden.
We mostly use XML where I work, and I can't really believe that a configuration file loaded into a cache as objects when first read or after it has been written to, and then abstracted away from the rest of the program, really is that much of a hit on either CPU or disk space.
And it is pretty readable too, as long as you structure the file right.
And all languages on all platforms support XML through some pretty common libraries.

@Herms
What I really meant was to stick to the recommended way software should store configuration values for any given platform.
What you often get then is also the recommended way these should/can be modified, like a configuration menu in a program or a configuration panel in a "system prefs" application (for system service software, for example), rather than letting end users modify them directly via RegEdit or Notepad...
Why?
The end users (=customers) are used to their platforms
System for backups can better save "safe setups" etc
@ninesided
About "choice of library": try to statically link any selected library to lower the risk of getting into a version-conflict war on end users' machines.

If your configuration file is write-once, read-only-at-bootup, and your data is a bunch of name value pairs, your best choice is the one your developer can get working first.
If your data is a bit more complicated, with nesting etc, you are probably better off with YAML, XML, or SQLite.
If you need nested data and/or the ability to query the configuration data after bootup, use XML or SQLite. Both have pretty good query languages (XPATH and SQL) for structured/nested data.
If your configuration data is highly normalized (e.g. 5th normal form) you are better off with SQLite because SQL is better for dealing with highly normalized data.
If you are planning to write to the configuration data set during program operation, then you are better off going with SQLite. For example, if you are downloading configuration data from another computer, or if you are basing future program execution decisions on data collected in previous program execution. SQLite implements a very robust data storage engine that is extremely difficult to corrupt when you have power outages or programs that are hung in an inconsistent state due to errors. Corruptible data leads to high field support costs, and SQLite will do much better than any home-grown solution or even popular libraries around XML or YAML.
Check out my page for more information on SQLite.
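As a rough illustration of the XPath querying mentioned above, the sketch below uses the javax.xml.xpath API that ships with the JDK. The config layout, the element names, and the class name XPathConfigQuery are all invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class XPathConfigQuery {

    /** Evaluates an XPath expression against an XML string and returns the result as text. */
    static String query(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XPath query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up nested configuration with two database entries.
        String xml = "<config>"
                + "<database name=\"orders\"><host>db1.example.com</host><port>5432</port></database>"
                + "<database name=\"audit\"><host>db2.example.com</host><port>5433</port></database>"
                + "</config>";
        // Select the host of the entry whose name attribute is "audit".
        System.out.println(query(xml, "/config/database[@name='audit']/host")); // prints db2.example.com
    }
}
```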

As far as I know, the Windows registry is no longer the preferred way of storing configuration if you are using .NET - most applications now make use of System.Configuration [1, 2]. Since this is also XML based it seems to be that everything is moving in the direction of using XML for configuration.
If you want to stay cross-platform I would say that using some sort of a text file would be the best route to go. As for the formatting of said file, you might want to take into account if a human is going to be manipulating it or not. XML seems to be a bit more friendly to manual manipulation than INI files due to the visible structure of the file.
As for the angle bracket tax - I don't worry about it too often as the XML libraries take care of abstracting it. The only time it might be a consideration is if you have very little storage space to work with and every byte counts.
[1] System.Configuration Namespace - http://msdn.microsoft.com/en-us/library/system.configuration.aspx
[2] Using Application Configuration Files in .NET - http://www.developer.com/net/net/article.php/3396111

We are using properties files, simply because Java supports them natively. A couple of months ago I saw that SpringSource Application Platform uses JSON to configure their server and it looks very interesting. I compared various configuration notations and came to the conclusion that XML seems to be the best fit at the moment. It has nice tools support and is rather platform independent.
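For reference, the native support mentioned here is java.util.Properties. A minimal sketch follows; the keys server.host and server.port and the class name PropertiesExample are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class PropertiesExample {

    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("# example settings\nserver.host=localhost\nserver.port=8080\n");
        // Values are always strings; the second argument is a default for missing keys.
        int port = Integer.parseInt(props.getProperty("server.port", "80"));
        System.out.println(props.getProperty("server.host") + ":" + port); // prints localhost:8080
    }
}
```

In a real application the text would normally come from a file or a classpath resource rather than a string literal.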

Re: epatel's comment
I think the original question was asking about application configuration that an admin would be doing, not just storing user preferences. The suggestions you gave seem more for user prefs than application config, and aren't usually something that the user would ever deal with directly (the app should provide the configuration options in the UI, and then update the files). I really hope you'd never make the user have to view/edit the Registry. :)
As for the actual question, I'd say XML is probably OK, as plenty of people will be used to using that for configuration. As long as you organize the configuration values in an easy to use manner then the "angle bracket tax" shouldn't be too bad.

Maybe a bit of a tangent here but my opinion is that the config file should be read into a key value dictionary/hash table when the app first starts up and always accessed via this object from then on for speed. Typically the key/value table starts off as string to string but helper functions in the object do things such as DateTime GetConfigDate(string key) etc...
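Since the question is about Java, here is one way that idea might look there: a string-to-string map loaded once, wrapped in typed accessors. The class name AppConfig, the method names, and the assumption that dates are stored in ISO-8601 form are all choices made for this sketch.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper for illustration only.
final class AppConfig {
    private final Map<String, String> values;

    AppConfig(Map<String, String> values) {
        // Defensive copy: the configuration is read once at startup and then treated as fixed.
        this.values = new HashMap<>(values);
    }

    String getString(String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        return value;
    }

    int getInt(String key) {
        return Integer.parseInt(getString(key).trim());
    }

    /** Assumes ISO-8601 dates such as 2008-12-01. */
    LocalDate getDate(String key) {
        return LocalDate.parse(getString(key).trim());
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("maxThreads", "150");
        raw.put("licenseExpiry", "2008-12-01");
        AppConfig config = new AppConfig(raw);
        System.out.println(config.getInt("maxThreads") + " " + config.getDate("licenseExpiry"));
        // prints 150 2008-12-01
    }
}
```

The map could be filled from any of the formats discussed in this thread; the rest of the program only sees the typed accessors.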

I think the only important thing is to choose a format that you prefer and can navigate quickly. XML and JSON are both fine formats for configs and are widely supported--technical implementation isn't at the crux of the issue, methinks. It's 100% about what makes the task of config files easier for you.
I have started using JSON, because I work quite a bit with it as a data transport format, and the serializers make it easy to load into any development framework. I find JSON easier to read than XML, which makes handling multiple services, each using a config file that is modified quite frequently, that much easier for me!

What platform are you working on? I'd recommend trying to use the preferred/common method for it.
MacOSX - plists
Win32 - Registry (or is there a newer way now? It's been a long time since I developed on it)
Linux/Unix - ~/.apprc (name-value perhaps)

Related

Niche Templating Engine for Batch Jobs

First of all, let me start off by saying this is more of an exploratory question than a technical problem. I feel it doesn't belong on Code Review because there's nothing to review. I'm just trying to figure out the best approach to take.
My requirement is to build a batch process that can process user-defined files. These files usually come from external sources, so the filenames are not standard. One requirement that's causing me some headaches is supporting arbitrary dates in the filenames. And since these are batch job definitions that run on particular intervals, the definition has to be flexible enough to support it.
For example, one definition might be
File1_Type1_{CurrentDate in YYMMDD}
File1_Type2_{CurrentDate in YYYYMMDD}
File1_Type3_Static_Text
So basically, I feel like I need a full-fledged template engine in order to support these cases. However, that sounds like huge overkill, so I'm interested to hear people's thoughts on this.
Since I'm focusing on Java/Scala, I've found this library
https://scalate.github.io/scalate/documentation/ssp-reference.html
If we let users create ssp files like so:
#import(java.util.Date)
File1_Type1_${new Date}
then it gives the user full control over the entire formatting. But feels overkill to me? Or not? Welcome any feedback.
There are a huge number of template engines in the Java space. I've used Apache Velocity and FreeMarker (https://freemarker.apache.org/).
I'm not aware of anything in the Scala space, but it would be an intriguing idea.
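If the date placeholders in the question are the only dynamic part, a full template engine may not be needed. The following is a minimal sketch that expands placeholders of the form {CurrentDate in YYMMDD} with java.time. The class name FilenameTemplate and the rule that Y, M and D map onto the DateTimeFormatter letters y, M and d are assumptions of this sketch, not part of any library.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class FilenameTemplate {
    // Matches placeholders such as {CurrentDate in YYMMDD}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{CurrentDate in ([YMD]+)\\}");

    /** Replaces every date placeholder in the template with the given date. */
    static String expand(String template, LocalDate date) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Translate the user-facing pattern (YYMMDD) into DateTimeFormatter letters (yyMMdd).
            String javaPattern = matcher.group(1).replace('Y', 'y').replace('D', 'd');
            String formatted = date.format(DateTimeFormatter.ofPattern(javaPattern));
            matcher.appendReplacement(out, Matcher.quoteReplacement(formatted));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 3, 7); // a batch job would pass LocalDate.now()
        System.out.println(expand("File1_Type1_{CurrentDate in YYMMDD}", date));   // prints File1_Type1_240307
        System.out.println(expand("File1_Type2_{CurrentDate in YYYYMMDD}", date)); // prints File1_Type2_20240307
        System.out.println(expand("File1_Type3_Static_Text", date));               // prints File1_Type3_Static_Text
    }
}
```

Unlike an SSP or Velocity template, this cannot run arbitrary user code, which may be a benefit when the definitions come from external users.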

Good practice for layered application with internationalization

I'm designing a new application in JSE which I want to internationalize.
I've never built such an application, so I'm looking for best practices for internationalization. The application will be writing the translated data to files or a DB. I've searched for best practices but I haven't found anything about my main question (the first one).
Should I put all the internationalization data in some layer or next to the object they are about ?
Could I directly use the properties files as a kind of enum to do a switch case ?
Or can I reverse engineer the captured data to work out the default internationalized value and work with that?
I did encounter several strategies. I would start with a properties file.
One factor is that the data must be professionally maintained:
keep it in version control.
keep a version number for us humans, "1.0.23"
keep the texts ordered and nice, to help translation.
keep a second properties file with a glossary for consistent translation.
Furthermore, I have seen properties files or Java ListResourceBundles generated from DocBook XML, Excel, and translation memories. And yes, from a database.
Maintenance of the data must be done carefully, as several different parties will use the text at different times.
Programming tools, consistency checks, preparing data, and communicating are tasks not to neglect.
Properties files are not entirely ideal, but IDEs have generally some support for them.
Set up everything for UTF-8, but take note that properties files use ISO-8859-1; you can use \uXXXX escaping or do an encoding conversion in your build process. ListResourceBundle Java sources, generated in that case, would be an alternative.
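One further option, sketched here as an assumption about how you might set things up rather than as a recommendation from the answer above, is to read the file through a Reader with an explicit charset. Properties.load(Reader) has existed since Java 6 and decodes with whatever charset the Reader uses, while \uXXXX escapes keep working. The class name Utf8Properties is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class Utf8Properties {

    /** Loads a properties stream, decoding it as UTF-8 instead of ISO-8859-1. */
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // "greeting" holds raw UTF-8 text; "farewell" uses a properties-level \u00fc escape.
        byte[] utf8 = "greeting=Gr\u00fc\u00dfe\nfarewell=Tsch\\u00fcss\n".getBytes(StandardCharsets.UTF_8);
        Properties props = load(new ByteArrayInputStream(utf8));
        System.out.println(props.getProperty("greeting").equals("Gr\u00fc\u00dfe")); // prints true
        System.out.println(props.getProperty("farewell").equals("Tsch\u00fcss"));    // prints true
    }
}
```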

What's the purpose of Properties.loadFromXML() and Properties.storeToXML() methods?

I am designing a simple library that deals with properties files.
I noticed that since JRE 1.5 the class Properties defines methods like:
public synchronized void loadFromXML(InputStream in)
public void storeToXML(OutputStream os, String comment)
I question whether this is a real enhancement to the API of this class. Properties files were text-based files before JRE 1.5, and the newly introduced XML format adds nothing to the functionality other than the possibility of using a different format, which is
more verbose
more complex (to understand, to change, to parse)
more inefficient (it uses DOM internally to parse into a hashtable: it consumes more memory, requires helper classes in the implementation, and is most likely also slower)
more fragile (xml requires escaping of characters <>&"' while properties only need to escape backslashes, since it also supports Java backslash escaping)
it breaks backward compatibility of the programs using it, since users running JDK 1.4 won't be able to read xml properties. (ok, who cares...)
So I fail to understand the reason behind why engineers in Sun added this feature.
The question is:
Does anybody find some advantage in using an XML-based properties file over a traditional text-based one?
I need to evaluate this problem, since I don't want to add a useless feature to my simple library that I cited before.
Have you ever used an XML-based properties file instead of a plain Java properties file? And why?
Note: same question can be made for Log4J xml file format, but at least Log4J xml format adds nesting ability and some sort of syntax which has some meaning, and I do understand that. But with this xml format for properties, I don't.
If staying within the Java environment, using a Java properties file works great. Even if you expect other programming languages to interact with your library, you'll probably be ok with a 'regular' properties file. However, for hierarchical data, XML is the standard. The reason you may want to support this change, and possibly the reason why Sun included it, is that other programming languages have extensive libraries for parsing XML files for hierarchical data.
The reason I'm answering is because I have actually used this feature before! But not for a great reason. In one program I'm working on now, I've found it easiest to keep a set of data in a properties object and I output the object to XML so that it can later be read by Python. At the moment, the data is further manipulated in a Python script and more children are added to the XML file. Without being able to output easily to XML, this would be a little more painful.
If I had the time, I wouldn't bother outputting to XML though. The main reason I'm using the Python code that takes in the XML is because somebody else wrote it and I'm temporarily using it until I have the time to reevaluate that section of my program and re-code it.
So there's a reason for using the XML! It isn't a good one, but it's a reason.
I imagine there are other cases like this where having the properties outputted as an XML aids in compatibility with other languages, since most languages have a robust XML parsing library and it makes it easier to manipulate hierarchical data. And in scientific programming, it seems you rarely get the luxury of sticking to one language.
Some points:
You can use standard, cross-platform tools to create it
You don't need to worry about peculiarities of escaping and character encoding, as you can use standard tools, which actually makes it more robust. The old properties file format is poorly specified.
Standard, cross-platform tools can use the data.
For most applications Java is used in, a bit of start up time isn't going to make much difference (particularly given the start up time of the rest of the system).
Java SE 1.6 is about to complete its end-of-life. Pre-1.5 isn't particularly relevant for Java SE (or EE).
But no, I've never seen it actually used.
Afaik the XML format is encouraged because of the encoding: (by specs) strictly ASCII for plain files (may I suggest you http://mojo.codehaus.org/native2ascii-maven-plugin/), UTF-8 (default) for XML property files as stated in http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Properties.html
edit: I beg your pardon: ISO-8859-1 for property plain files
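For anyone who has not seen the two methods in use, here is a minimal round trip. The class name XmlPropertiesRoundTrip and the property key are made up; storeToXML(OutputStream, String) and loadFromXML(InputStream) are the actual java.util.Properties methods, and the two-argument storeToXML writes UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class XmlPropertiesRoundTrip {

    /** Serializes the properties to the XML format; the encoding is UTF-8 by default. */
    static byte[] toXml(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, "example settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads properties back from the XML format. */
    static Properties fromXml(byte[] xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties original = new Properties();
        original.setProperty("greeting", "Gr\u00fc\u00dfe"); // non-ASCII value, no manual escaping needed
        byte[] xml = toXml(original);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(fromXml(xml).equals(original)); // prints true
    }
}
```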

Java; Runtime Interpretation; Strategies To Add Plugins

I'm beginning to start on my first large project. It will be a program very similar to Rosetta Stone. It will be a program, used for learning a foreign language, written in Java using Swing. In my program I plan on the user being able to select downloaded courses to learn from. I will be able to create an English course since I am a native English speaker. However, I want people who speak other languages to be able to write courses for users to use as well (this is an essential part for my program to work).
Since I want the users to be able to download courses of languages they want, having it hard-coded into the program is out of the question. The courses needed to be interpreted during the runtime. Also since I want others to collaborate with my work (ie make courses), I need to make it easy for them to do so.
What would be the best way to go about doing this?
The idea I have come up with is having a strict empty course outline (hard-coded) with a simple xml file which details the text and sounds to be used. The drawback to this is that it extremely limits the author. Different languages may need to start out with learning different parts.
Any advice on the problem at hand as well as the project as a whole will be greatly appreciated. Any links to any relevant resources or information would also be greatly appreciated.
Thank you for your time and effort,
Joseph Pond
Simply, you should base your program on a system such as Eclipse RCP, or the Netbeans Platform. Both of these systems already deal with exactly this problem, and both are perfectly adequate for this task. They're not just for IDEs.
It's a larger first step as you will need to learn one of these platforms beyond simply just Swing.
But, they solve the problem, and their overall organization and technique will serve your program well anyway.
Don't reinvent this wheel, just learn one of these instead.
If you are set on doing this from scratch (Will's idea isn't bad), What I would do is first lay down the file format that would be easiest to create your language course in. It could be XML, plaintext or some other format you come up with yourself.
You will probably need some flexibility in the language format because you will want to actually be able to specify things like questions and answers. XML is a pain because of all the extra terminators, but it gives a good amount of meta-data. If you like XML for that, you may consider defining your language file in YAML; it gives you the data of XML but uses whitespace delimiters instead of angle brackets.
You probably also want to define your file in the language it's created for, so you might or might not want to require English words as keys. If you don't want any English, you may have to skip both XML and YAML and come up with your own file format, possibly one where the layout and/or special symbols define the flow and "functionality".
Once you have defined the file format, you won't have to worry about hard-coding anything... you won't be able to because it will already be in the file.
Plug-in functionality would be nice as well... This is where your definition file also contains information that tells you what class to instantiate (reflectively) and use to parse/display the data. In that way you could add new types of questions just by delivering a new jar file.
If this is confusing, sorry, this is difficult in a one-way forum because I can't look at your face and see if you're following me or if I'm even going in the right direction. If you think I'm on the right track and want more details (I've done a bit of this stuff before) feel free to leave a follow-up question (or an email address) in a comment and I'd be glad to discuss it with you further.
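The reflective plug-in idea described above could look roughly like this. Everything named here (QuestionRenderer, MultipleChoiceRenderer, PluginLoader) is hypothetical; in a real program the class name would be read from the course definition file and the class would arrive in a separate jar on the classpath.

```java
// Hypothetical loader for illustration only.
final class PluginLoader {

    /** Instantiates the named class reflectively and checks that it implements the plug-in interface. */
    static QuestionRenderer load(String className) {
        try {
            Class<?> type = Class.forName(className);
            return type.asSubclass(QuestionRenderer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Not a usable renderer: " + className, e);
        }
    }

    public static void main(String[] args) {
        // A course file would supply this name as a string; it is taken from the class here for brevity.
        String nameFromCourseFile = MultipleChoiceRenderer.class.getName();
        QuestionRenderer renderer = load(nameFromCourseFile);
        System.out.println(renderer.render("Which word means 'house'?"));
        // prints [multiple choice] Which word means 'house'?
    }
}

// The contract every question-type plug-in implements (assumed for this sketch).
interface QuestionRenderer {
    String render(String prompt);
}

// One plug-in implementation; new question types would be delivered as additional classes.
final class MultipleChoiceRenderer implements QuestionRenderer {
    @Override
    public String render(String prompt) {
        return "[multiple choice] " + prompt;
    }
}
```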
If I was doing this, I'd seriously consider using Eclipse EMF to model the "language" for defining courses. EMF is rather daunting to start with, but it gives you:
A high-level model that can be entered/edited in a variety of ways.
An automatic mechanism for serializing "instances" (i.e. courses) to XML. (And you can tinker with the serialization if you choose.)
Automatically generated Java classes for in-memory representations of your instances. These provide APIs that are tuned to your model, and generic ones that are the EMF equivalent of Java reflection ... but based on EMF model classes rather than Java classes.
An automatically generated tree editor for your "instances".
Hooks for implementing your own constraints / validation rules to say what is a valid "course".
Related Eclipse plugins offer:
Mappings to text-based languages with generation of parsers/unparsers
Mappings to graphical languages; e.g. notations using boxes / arrows / etc
Various more advanced persistence mechanisms
Comparisons/differencing, model-to-model transformations, constraints in OCL, etc
I've used EMF in a couple of largish projects, and the main point that keeps me coming back for more is ease of model evolution ... compared with building everything at a lower level of abstraction. If my model (language) needs to be extended / changed, I can make the necessary changes using the EMF Model editor, regenerate the code, extend my custom code to do the right stuff with the extensions, and I'm pretty much done (modulo conversion of stored instances).

Are flat file databases any good? [closed]

Informed opinions needed about the merits of flat file databases. I'm considering using a flat file database scheme to manage data for a custom blog. It would be deployed on a Linux variant and written in Java.
What are the possible negatives or positives regarding performance for reading and writing of both articles and comments?
Would article retrieval crap out because of it being a flat file rather than an RDBMS if it were to get slashdotted? (Wishful thinking)
I'm not against using a RDBMS, just asking the community their opinion on the viability of such a software architecture scheme.
Follow Up:
In the case of this question I would treat "flat file" as meaning "file system-based". For example, each blog entry and its accompanying metadata would be in a single file, making for many files organized by the date structure of the file folders (blogs\testblog2\2008\12\01 == 12/01/2008).
Flat file databases have their place and are quite workable for the right domain.
Mail servers and NNTP servers of the past really pushed the limits of how far you can really take these things (which is actually quite far -- file systems can have millions of files and directories).
Flat file DBs' two biggest weaknesses are indexing and atomic updates, but if the domain is suitable these may not be an issue.
But you can, for example, with proper locking, do an "atomic" index update using basic file system commands, at least on Unix.
A simple case is having the indexing process run through the data to create the new index file under a temporary name. Then, when you are done, you simply rename (either the system call rename(2) or the shell mv command) the new file over the old file. rename(2) and mv are atomic operations on a Unix system (i.e. the operation either works or it doesn't, and there's never a missing "in between" state).
Same with creating new entries. Basically write the file fully to a temp file, then rename or mv it in to its final place. Then you never have an "intermediate" file in the "DB". Otherwise, you might have a race condition (such as a process reading a file that is still being written, and may get to the end before the writing process is complete -- ugly race condition).
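Since the question targets Java on Linux, the write-to-temp-then-rename idiom might be sketched like this with java.nio.file. The class name AtomicFileWriter is made up. Files.move with ATOMIC_MOVE maps onto rename(2) on Unix-like systems; the Javadoc leaves it implementation specific whether an existing target is replaced, so treat the overwrite behaviour as an assumption that holds on typical Unix file systems.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only.
final class AtomicFileWriter {

    /** Writes content to a temp file next to the target, then renames it into place. */
    static void write(Path target, String content) {
        try {
            Path dir = target.toAbsolutePath().getParent();
            Files.createDirectories(dir);
            // Same directory as the target, so the rename never crosses file systems.
            Path temp = Files.createTempFile(dir, "entry-", ".tmp");
            Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
            // Readers see either the old file or the complete new one, never a partial write.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file as UTF-8 text. */
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path post = Paths.get(System.getProperty("java.io.tmpdir"),
                "blog-" + System.nanoTime(), "2008", "12", "01", "my_java_post.txt");
        write(post, "first draft");
        write(post, "final text"); // replaces the previous version in one step
        System.out.println(read(post));
    }
}
```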
If your primary indexing works well with directory names, then that works just fine. You can use a hashing scheme, for example, to create directories and subdirectories to locate new files.
Finding a file using the file name and directory structure is very fast as most filesystems today index their directories.
If you're putting a million files in a directory, there may well be tuning issues you'll want to look into, but out of the box most file systems will handle tens of thousands easily. Just remember that if you need to SCAN the directory, there's going to be a lot of files to scan. Partitioning via directories helps prevent that.
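One possible hashing scheme for partitioning, sketched under the assumption that two levels of 256 directories each are enough: take two bytes of the key's hash and use them as directory names. The class name HashedPath and the two-level layout are choices made for this illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class HashedPath {

    /** Maps a key to root/xx/yy/key, where xx and yy are hex bytes taken from the key's hash. */
    static Path locate(Path root, String key) {
        int hash = key.hashCode(); // String.hashCode is specified, so the layout is stable across JVMs
        String level1 = String.format("%02x", hash & 0xff);
        String level2 = String.format("%02x", (hash >>> 8) & 0xff);
        return root.resolve(level1).resolve(level2).resolve(key);
    }

    public static void main(String[] args) {
        System.out.println(locate(Paths.get("articles"), "my_java_post.txt"));
    }
}
```

With 65,536 leaf directories, a million files average out to roughly 15 per directory, which keeps any directory scan short.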
But that all depends on your indexing and searching techniques.
Effectively, a stock off the shelf web server serving up static content is a large, flat file database, and the model works pretty well.
Finally, of course, you have the plethora of free Unix file system level tools at your disposal, but all of them have issues with zillions of files (forking grep 1000000 times to find something in a file will have performance tradeoffs -- the overhead simply adds up).
If all of your files are on the same file system, then hard links also give you options (since they, too, are atomic) in terms of putting the same file in different places (basically for indexing).
For example, you could have a "today" directory, a "yesterday" directory, a "java" directory, and the actual message directory.
So, a post could be linked in the "today" directory, the "java" directory (because the post is tagged with "java", say), and in its final place (say /articles/2008/12/01/my_java_post.txt). Then, at midnight, you run two processes. The first one takes all files in the "today" directory, checks their create date to make sure they're not "today" (since the process can take several seconds and a new file might sneak in), and renames those files to "yesterday". Next, you do the same thing for the "yesterday" directory, only here you simply delete them if they're out of date.
Meanwhile, the file is still in the "java" and the ".../12/01" directory. Since you're using a Unix file system, and hard links, the "file" only exists once, these are all just pointers to the file. None of them are "the" file, they're all the same.
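In Java, the hard-link indexing described above can be sketched with Files.createLink. The class name HardLinkIndex and the directory names are made up, and the sketch assumes a file system that supports hard links with all directories on the same volume.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class HardLinkIndex {

    /** Writes the article, then hard-links it into each index directory under the same file name. */
    static void publish(Path article, String content, Path... indexDirs) {
        try {
            Files.createDirectories(article.toAbsolutePath().getParent());
            Files.write(article, content.getBytes(StandardCharsets.UTF_8));
            for (Path dir : indexDirs) {
                Files.createDirectories(dir);
                // createLink(link, existing): both names now refer to the same underlying file.
                Files.createLink(dir.resolve(article.getFileName()), article);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("java.io.tmpdir"), "blog-" + System.nanoTime());
        Path article = root.resolve("articles").resolve("2008").resolve("12").resolve("01")
                .resolve("my_java_post.txt");
        publish(article, "Hello", root.resolve("today"), root.resolve("java"));
        System.out.println(Files.exists(root.resolve("java").resolve("my_java_post.txt"))); // prints true
    }
}
```

Deleting the entry in the "today" directory later removes only that name; the content stays reachable through the other links.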
You can see that while each individual file move is atomic, the bulk is not. For example, while the "today" script is running, the "yesterday" directory can well contain files from both "yesterday" and "the day before" because the "yesterday" script had not yet run.
In a transactional DB, you would do that all at once.
But, simply, it is a tried and true method. Unix, in particular, works VERY well with that idiom, and the modern file systems can support it quite well as well.
(answer copied and modified from here)
I would advise against using a flat file for anything besides read-only access, because then you'd have to deal with concurrency issues like making sure only one process is writing to the file at once. Instead, I recommend SQLite, a fully functional SQL database that's stored in a file. SQLite already has built-in concurrency, so you don't have to worry about things like file locking, and it's really fast for reads.
If, however, you are doing lots of database changes, it's best to do them all at once inside a transaction. This will only write the changes to the file once, as opposed to every time a change query is issued. This dramatically increases the speed of doing multiple changes.
When a change query is issued, whether it's inside a transaction or not, the whole database is locked until that query finishes. This means that extremely large transactions could adversely affect the performance of other processes, because they must wait for the transaction to finish before they can access the database. In practice, I haven't found this to be that noticeable, but it's always good practice to try to minimize the number of database-modifying queries you issue, and it's certainly faster than trying to use a flat file.
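A minimal sketch of grouping many changes into one transaction with plain JDBC. The table `comments` and column `body` are hypothetical, and the code only assumes it is handed an open `java.sql.Connection`:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    // Inserts all rows inside a single transaction instead of one commit per row.
    // Table and column names are made up for illustration.
    static void insertAll(Connection conn, List<String> bodies) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false); // start grouping statements into one transaction
        try (PreparedStatement stmt =
                 conn.prepareStatement("INSERT INTO comments (body) VALUES (?)")) {
            for (String body : bodies) {
                stmt.setString(1, body);
                stmt.addBatch();
            }
            stmt.executeBatch();
            conn.commit(); // the changes become durable here, once
        } catch (SQLException e) {
            conn.rollback(); // undo the partial batch on failure
            throw e;
        } finally {
            conn.setAutoCommit(previous);
        }
    }
}
```

With a third-party SQLite JDBC driver on the classpath (an assumption, it is not part of the JDK), the connection would typically come from something like `DriverManager.getConnection("jdbc:sqlite:blog.db")`, where `blog.db` is a made-up file name.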
This has been done in ASP.NET with DasBlog. It uses file-based storage.
A few details are listed on this older link.
http://www.hanselman.com/blog/UpcomingDasBlog19.aspx
You can also get more details on http://dasblog.info/Features.aspx
I've heard some mixed opinions on the performance. I'd suggest you research that a bit more to see if that type of system would work well for you. This is the closest thing I have heard about yet.
Writing your own engine in native code can outperform a general purpose database.
However, the quality of the engine and its feature level will never approach those of a general-purpose database. All the things that databases give you as core features (indexing, transactions, referential integrity) you would have to implement yourself.
There's nothing wrong with reinventing the wheel (after all, Linux was just that), but keep in mind your expectations and time commitment.
I'm answering this not to answer why flat file databases are good or bad, others have done an ample job at that.
However, some have been pointing at SQLite, which does its job just fine. Since you are using Java, your best option would be to use HSQLDB, which does precisely the same as SQLite, but is implemented in Java and embeds into your application.
Most of the time a flat file database is enough now. But you will thank your younger self if you start your project with a database. This could be SQLite, if you don't want to set up a whole database system like PostgreSQL.
Check out http://jsondb.io, an open-source Java-based database that has most of what you are looking for.
It saves data as flat .json files and offers multithreading support, encryption support, ORM support, atomicity support, and XPath-based advanced query support.
Disclaimer: I created this database.
Horrible idea. Appending would involve seeking to the end of the file every time you want to add something. Updating would require rewriting the entire file each time. Reading involves a table scan (or maintaining a separate index, which would have the same problems with writing/updating). Just use a database unless, of course, you re-implement all the stuff that an RDBMS already provides to make your solution even moderately scalable.
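To make the update cost concrete, here is a hedged sketch of changing a single line in a flat text file. The helper is hypothetical, and whether `ATOMIC_MOVE` replaces an existing target is implementation-specific, although it generally does on Unix file systems:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class RewriteSketch {
    // Replaces one line of a flat file. Note that the whole file is read
    // and the whole file is written again, however small the change is.
    static void replaceLine(Path file, int index, String newLine) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        lines.set(index, newLine);
        // Write next to the original so the final move stays on one file system
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "rewrite", ".tmp");
        Files.write(temp, lines, StandardCharsets.UTF_8);
        // Swap the new version in; readers see either the old file or the new one
        Files.move(temp, file, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The work grows with the size of the file rather than the size of the change, which is the scalability problem described above.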
They seem to work quite well for high-write, low-read, no-update databases, where new data is appended.
Web servers and their cousins rely on them heavily for log files.
DBMS software uses them for logs as well.
If your design falls within these limits, you're in good company, it seems. You might want to keep metadata and pointers in a database, and set up some kind of fast asynchronous queue-writer to buffer the comments, but the filesystem is already pretty good at that level of buffering and write-locking.
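For the append-only case, a minimal sketch might look like the following. It assumes one record per line and uses a file lock to keep other processes from interleaving writes; the names are illustrative only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class AppendLogSketch {
    // Appends one record as a line at the end of the log file.
    // `synchronized` serializes threads in this JVM; the FileLock guards
    // against other processes writing at the same time.
    static synchronized void append(Path log, String record) throws IOException {
        byte[] line = (record + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(log,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) {
            ByteBuffer buffer = ByteBuffer.wrap(line);
            while (buffer.hasRemaining()) {
                channel.write(buffer); // APPEND mode positions each write at end of file
            }
        }
    }
}
```

Nothing is ever rewritten, so the cost of adding a comment does not depend on how large the file already is.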
Flat file databases are possible but consider the following.
Databases need to attain all the ACID elements (atomicity, consistency, isolation, durability) and, if you're going to ensure that's all done in a flat file (especially with concurrent access), you've basically written a full-blown DBMS.
So why not use a full-blown DBMS in the first place?
You'll save yourself the time and money involved with writing it (and re-writing it many times, I'll guarantee) if you just go with one of the free options (SQLite, MySQL, PostgreSQL, and so on).
You can use a flat file database if it is small enough and does not need lots of random access. A big file with a lot of random access will be very slow. Complex queries are out too: no joins, no sum, no group by, etc. You also cannot expect to fetch hierarchical data from a flat file; an XML format is much better for complex structures.
