Basically, I am looking for a simple way to list and access a set of strings in stream form in an abstract manner. The only issue is that Java's file-accessing API can only be used for listing and reading files, and any sort of non-filesystem storage of the data uses a different API. My question is whether there is some common API I could use (whether included in Java or as an external API) so that I could access both in an abstract manner, but also somewhat efficiently.
Essentially I want a set of lazily streamed text files. Something like Set might be reasonable, except on a filesystem, you would have to open the text streams even if you don't end up wanting to access that file.
Some sort of API like
String[] TextStorage.list()
InputStream TextStorage.open(String elementname);
which could abstractly be used to access either filesystems or databases, or some other storage mechanism I invent in the future (maybe fetching something across the internet).
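The API described above could be sketched as a small interface with one implementation per storage mechanism. This is only a sketch under assumptions: the interface name TextStorage comes from the question, while DirectoryTextStorage and MapTextStorage are hypothetical names, and the map-backed class merely stands in for a database or remote store. Listing returns names only, so no stream is opened until open() is called.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical interface, named after the pseudocode in the question.
interface TextStorage {
    // Returns element names only; nothing is opened here.
    List<String> list() throws IOException;

    // Opens a single element on demand.
    InputStream open(String elementName) throws IOException;
}

// Backed by the regular files of one directory.
class DirectoryTextStorage implements TextStorage {
    private final Path dir;

    DirectoryTextStorage(Path dir) {
        this.dir = dir;
    }

    @Override
    public List<String> list() throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.getFileName().toString()));
        }
        return names;
    }

    @Override
    public InputStream open(String elementName) throws IOException {
        return Files.newInputStream(dir.resolve(elementName));
    }
}

// Backed by a map; a stand-in for a database or some other store.
class MapTextStorage implements TextStorage {
    private final Map<String, String> data;

    MapTextStorage(Map<String, String> data) {
        this.data = data;
    }

    @Override
    public List<String> list() {
        return new ArrayList<>(data.keySet());
    }

    @Override
    public InputStream open(String elementName) throws IOException {
        String text = data.get(elementName);
        if (text == null) {
            throw new FileNotFoundException(elementName);
        }
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling code would depend only on TextStorage, so swapping the filesystem for another backend would not change it.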
Is there a library which already does this? Can I do this with the already existing Java API? Do I need to write this myself? I'd be surprised if no-one has encountered this problem before, but my google-fu and stackoverflow searches don't seem to find anything.
You might use HSQLDB:
http://hsqldb.org/
Unfortunately I couldn't find anything specific to this topic / to my problem. Here we go:
I'm building a JavaFX business application for a friend of mine. Unfortunately I do not have any possibility to connect to a database. I want the application to load a saved state from a file. The application contains a list of clients, and the clients have some specific properties. I do not want to hardcode this to a .prop or .txt file, because I'm sure that there's a different way of doing this, isn't there?
Thanks in advance, appreciate it!
Lots of choices for persisting data to local storage. The exact choice depends on your needs. You do not describe enough details to make a specific recommendation.
Here is a list of possibilities, roughly in increasing order of complexity of your data.
Text file
If you have small amounts of simple data, save to a text file. You can store each piece in a separate file, or combine into a single file. Recent versions of Java have new classes to make this easier than ever. See Oracle Tutorial.
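As a minimal sketch of the text-file option, assuming one client name per line and UTF-8 encoding, the java.nio.file.Files class can write and read a list of lines in one call each. The class name ClientTextFile and the sample names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for a one-value-per-line text file.
class ClientTextFile {
    // Writes one entry per line, replacing any existing file content.
    static void save(Path file, List<String> clients) throws IOException {
        Files.write(file, clients, StandardCharsets.UTF_8);
    }

    // Reads the file back, one list element per line.
    static List<String> load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("clients", ".txt");
        save(file, List.of("Alice", "Bob"));
        System.out.println(load(file));
        Files.delete(file);
    }
}
```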
Comma-separated & tab-delimited
For sets of structured data, write to text files in comma-separated values (CSV) or tab-delimited values. For example a list of people with rows for each person, and columns for name, phone number, and email address.
While reading/writing such files is easy enough to program yourself, I suggest using an established library to eliminate the drudgery, avoid bugs, and save yourself some time. There are a few such libraries written in Java.
My favorite is the Apache Commons CSV project. This library makes easy work of the chore of reading/writing such files. Despite the name, this library supports tab-delimited as well as comma-separated formats. I've written a few Answers here on Stack Overflow showing how to use this library.
By the way, plain old ASCII defines a few character positions explicitly for delimiting in data files, with four levels of grouping (document, group, record/row, and field). Unicode, of course, inherits these from ASCII as code points. I am puzzled why these have remained so obscure and so infrequently used. Seems much more logical to me than using commas and tabs which may well exist inside the data payload.
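To illustrate the ASCII delimiters, here is a rough sketch that uses two of the four: the unit separator (code point 31, between fields) and the record separator (code point 30, after each row). The class AsciiDelimited and its methods are hypothetical, and the sketch assumes every row has at least one field and that the payload itself never contains these two control characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoder/decoder using the ASCII unit and record separators.
class AsciiDelimited {
    static final char UNIT_SEPARATOR = '\u001F';   // between fields
    static final char RECORD_SEPARATOR = '\u001E'; // terminates each record (row)

    static String encode(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join(String.valueOf(UNIT_SEPARATOR), row));
            sb.append(RECORD_SEPARATOR);
        }
        return sb.toString();
    }

    static List<List<String>> decode(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == UNIT_SEPARATOR) {
                // End of a field within the current record.
                row.add(field.toString());
                field.setLength(0);
            } else if (c == RECORD_SEPARATOR) {
                // End of the last field and of the record itself.
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                field.append(c);
            }
        }
        return rows;
    }
}
```

Commas, tabs, and even line breaks inside a field survive the round trip without any quoting or escaping.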
Serialization
You can write out the data values stored within an object. This is called serialization. Java has a serialization facility built-in, but be sure to study up on the details.
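A minimal sketch of the built-in facility, assuming a small class that implements java.io.Serializable. The names SavedClient and SavedClientStore are invented for this example, and the details worth studying (versioning through serialVersionUID, and the risks of deserializing untrusted data) are not handled here.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class; Serializable marks it as writable by ObjectOutputStream.
class SavedClient implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String email;

    SavedClient(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

class SavedClientStore {
    // Writes the whole list as one serialized object.
    static void save(Path file, List<SavedClient> clients) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(clients));
        }
    }

    // Reads the list back; only use this on files the application wrote itself.
    @SuppressWarnings("unchecked")
    static List<SavedClient> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<SavedClient>) in.readObject();
        }
    }
}
```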
To more simply write out an object’s values and later read them back in to reconstitute an object, I have enjoyed using the Simple XML Serialization project. This works well for relatively simple needs, and is aimed at the situation where you want the structure of a class to drive the process of determining what to write.
Java has other XML binding facilities, both built-in and third-party. These are much more powerful in their flexibility. They are especially good when you want to define and verify the XML structure in a rigid fashion, such as defining an XML DTD or XML Schema against which to validate the data, and perhaps even to generate the Java class in which to represent the data.
Embedded database
For more complicated data, use an embedded relational database.
The SQLite database is bundled with many platforms. This is a C-based library, not pure Java. As the name indicates, SQLite is indeed quite “lite”, lacking rigid data types and many other common database features. SQLite is meant more as an alternative to writing text files than as a competitor to more serious databases. It is a great product if your needs fit the sweet spot of its capabilities.
My first choice for an embedded database would be H2 Database Engine. Built in pure Java. Can be run inside your app, or separately as a server (your choice). Has sophisticated relational database features. Has been around for years, often updated, and is well-worn. The principal author has much experience in the field.
For my Java project I am looking for a convenient way to store my data. I have the following requirements:
It should be easy to synchronize with subversion (which I use for my Java code and other stuff). So I guess file-based is appropriate.
I want to be able to get certain elements without having to read all data into memory. Like in a database ("give me all objects with/without property x", "give me all information about object with certain ID").
I want to be able to read and write in this way.
I guess a database is overkill for my purpose, difficult to sync and I have to be admin/root on all machines to install it. (right?)
So I was thinking of using XML, but I heard that XML parsing in Java does not work very well. Or can anyone point me to a good library?
Then I was thinking of CSV. But all examples I saw (here and elsewhere) read the data into memory before processing it, which is not what I want.
I hope you can help me with this problem, because I am not so experienced with Java.
Edit:
Thank you for downvoting this question without any comment. This is not helpful at all because now I have no new information on my problem and I also have no idea what I did wrong with respect to this community's rules.
You can use DataNucleus (ORM) with an XML datastore:
http://www.datanucleus.org/products/datanucleus/datastores/xml.html
I quote from Apache Commons Page for Commons FileUpload
This page describes the traditional API of the commons fileupload
library. The traditional API is a convenient approach. However, for
ultimate performance, you might prefer the faster Streaming API.
My Question
What specific differences make Streaming API faster than traditional API?
The key difference is in the way you're handling the file, as you noticed by yourself with the factory class.
The streaming API does not save to disk while reading the input stream. In the end, you'll be able to handle the file faster (at a cost in temporary memory)... but the idea is to avoid saving the binary to disk unless you really want or need to.
After that, you are able to save the data to disk, of course, using a BufferedInputStream, a byte array or similar.
EDIT: The handle you get when you open the stream ( fileItemStreamElement.openStream() ) is an ordinary InputStream instance. So the answer to your "what if it's a big file" is covered by the question "Memory issues with InputStream in Java".
EDIT: The streaming API should not save to disk OR save in memory. It simply provides a stream you can read from to copy the file to wherever you want. This is a way to avoid having a temp directory and also avoid allocating enough memory to hold the file. This should be faster, if only because the data is not copied twice: once from the browser to disk/memory, and then again from disk/memory to wherever you save it.
The traditional API, which is described in the User Guide, assumes, that file items must be stored somewhere, before they are actually accessable by the user. This approach is convenient, because it allows easy access to an items contents. On the other hand, it is memory and time consuming.
http://commons.apache.org/fileupload/streaming.html
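The "copied once" idea can be sketched with plain java.io, without the FileUpload library itself. In the streaming API the InputStream would be the one returned by openStream(); here any InputStream works. The class name StreamCopy is hypothetical and the 8 KB buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: moves bytes straight from source to destination.
class StreamCopy {
    // Copies everything from in to out through one small fixed buffer,
    // so memory use does not grow with the size of the upload.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total; // number of bytes copied
    }
}
```

The destination could be a file, a database BLOB stream, or a socket; nothing is staged in a temp directory or held whole in memory along the way.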
Streaming generally refers to an API (like Apache FileUpload or StAX) in which data is transmitted and parsed serially at application run time, often in real time, and often from dynamic sources whose contents are not precisely known beforehand.
Traditional models refer to APIs (such as traditional file-handling APIs or the DOM API) that provide much more detailed information about the data.
For a file-handling API, for example, the traditional approach assumes that file items must be stored somewhere before they are actually accessible by the user. This approach is convenient, because it allows easy access to an item's contents. On the other hand, it is memory and time consuming.
A streaming API will have a smaller memory footprint and smaller processor requirements, and can have higher performance in certain situations.
It works on the principle of a "cardboard tube" view of the document you are working with: at any moment you see only the small part of the data currently in view.
I have most of my app's "dynamic" data stored in the datastore.
However, I also have a large collection of static data that would only change with new builds of the app. A series of flat files seems like it might be simpler than managing it in the datastore.
Are there standard solutions to this? How about libraries to make loading/parsing this content quick and easy? Does it make more sense to push this data to the datastore? Which would perform better?
Anyone else have this problem and have war stories they can share?
Everything depends on how you need to use the information.
I for instance have an application that needs to have a starting state provided from static data. Since I wanted this static data to be easily prepared outside the application, I put the data as spreadsheets on Google Docs and then I have an administrative function in my web app to load the starting state through Google Docs Spreadsheet API to objects in the datastore. It works fairly well, although there are some reliability issues that I haven't quite worked out yet (I sometimes need to restart the process).
In other cases, you might get away with just including the data as static property/xml files and load them through the standard Java resource APIs (getResourceAsStream and such). I haven't tried this approach though since it wasn't meaningful in my particular situation.
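A rough sketch of the resource-based approach, assuming the static data is a .properties file packaged on the classpath. The class name StaticData is hypothetical, and the caller supplies the resource name.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical loader for static data bundled with the application.
class StaticData {
    // Parses key=value pairs from any stream.
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }

    // Looks the file up on the classpath; getResourceAsStream returns null
    // when the resource does not exist, so that case is checked explicitly.
    static Properties loadFromClasspath(String resourceName) throws IOException {
        try (InputStream in = StaticData.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new FileNotFoundException("No classpath resource: " + resourceName);
            }
            return load(in);
        }
    }
}
```

Because the files ship inside the build, they change only with a new deployment, which matches the "static data" case in the question.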
I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) to the filesystem, in some sensible hierarchy. There will be hundreds, even thousands of files per day. I also need to store metadata for each file.
I am considering putting the metadata (just a couple of fields) into a database table, but the XML file content itself into files in the filesystem, in order not to bloat the database with content data (that is seldom read).
Is there some simple library that helps me in saving, loading, deleting etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).
Or do I even need that, should I just go with raw/custom Java?
Java API
Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.
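Java is independent of the OS. You just need to make sure you use File.separator rather than a hard-coded "/" or "\" (note that File.pathSeparator is something else: it separates the entries of a path list such as the classpath), or use the constructor File(File parent, String child) so that you don't need to mention the separator explicitly.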
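For instance, a path can be built from its parts with the File(File parent, String child) constructor, so no separator character appears in the code. This is a small sketch; the class name PortablePaths and the directory layout (one folder per day) are assumptions made for the example.

```java
import java.io.File;

// Hypothetical helper that builds message paths without naming a separator.
class PortablePaths {
    static File messageFile(File baseDir, String day, String fileName) {
        File dayDir = new File(baseDir, day); // baseDir/day
        return new File(dayDir, fileName);    // baseDir/day/fileName
    }

    public static void main(String[] args) {
        File f = messageFile(new File("messages"), "2010-05-01", "request-1.xml");
        // Prints the path using the separator of the platform it runs on.
        System.out.println(f.getPath());
    }
}
```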
The Java file API is relatively high-level, to abstract away the differences between operating systems. Most of the time it's sufficient. It falls short only if you need some relatively OS-specific feature that is not in the API, e.g. checking the physical size of a file on disk (not the logical size), security rights on *nix, free space/quota of the hard drive, etc.
Most operating systems keep an internal buffer for file reads and writes. Calling FileOutputStream.write and FileOutputStream.flush ensures the data has been handed to the OS, but not necessarily written to disk. The Java API also supports the lower-level control needed to manage this buffering, for instance FileDescriptor.sync() or FileChannel.force(), which matters for systems such as databases.
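One way to ask for the stronger guarantee is FileDescriptor.sync(), reached through FileOutputStream.getFD(). The following is a sketch only; the class name DurableWrite is made up, and how much durability the call really gives depends on the operating system and the storage hardware.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper that writes a file and then forces it to the device.
class DurableWrite {
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(data);
            out.flush();        // hands the bytes to the operating system
            out.getFD().sync(); // asks the OS to push its buffers to the device
        }
    }
}
```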
Also both file and directory are abstracted with File and you need to check with isDirectory. This can be confusing, for instance if you have one file x, and one directory /x (I don't remember exactly how to handle this issue, but there is a way).
Web service
The web service can use either xs:base64Binary to pass the data, or use MTOM (Message Transmission Optimization Mechanism) if files are large.
Transactions
Note that the database is transactional and the file system is not. So you might have to add a few checks in case operations fail and are retried.
You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:
Update. If the user wants to overwrite a file, you actually create a new one. The level of indirection between the logical file name and the physical file is stored in database. This way you never overwrite a physical file once written, to ensure rollback is consistent.
Create. Same story when the user wants to create a file.
Delete. If the user wants to delete a file, you do it only in the database first. A periodic job polls the file system to identify files which are not listed in the database, and removes them. This two-phase delete ensures that the delete operation can be rolled back.
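The three operations above can be sketched as follows. This is an illustration under assumptions, not a complete design: an in-memory map stands in for the database table that maps logical names to physical files (so there is no real transaction or rollback here), physical names are random UUIDs, and the class name WriteOnceFileStore is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical store in which a physical file is never overwritten once written.
class WriteOnceFileStore {
    private final Path dir;
    // Stand-in for the database table: logical name -> physical file name.
    private final Map<String, String> index = new HashMap<>();

    WriteOnceFileStore(Path dir) {
        this.dir = dir;
    }

    // Create and update are the same operation: write a brand-new physical
    // file, then point the logical name at it.
    void put(String logicalName, byte[] content) throws IOException {
        String physicalName = UUID.randomUUID().toString();
        Files.write(dir.resolve(physicalName), content,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        index.put(logicalName, physicalName);
    }

    byte[] get(String logicalName) throws IOException {
        String physicalName = index.get(logicalName);
        if (physicalName == null) {
            throw new NoSuchFileException(logicalName);
        }
        return Files.readAllBytes(dir.resolve(physicalName));
    }

    // Delete touches only the index; the physical file stays until the sweep.
    void delete(String logicalName) {
        index.remove(logicalName);
    }

    // Periodic job: removes physical files the index no longer references
    // and returns how many were removed.
    int sweep() throws IOException {
        Set<String> live = new HashSet<>(index.values());
        int removed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (!live.contains(p.getFileName().toString())) {
                    Files.delete(p);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

With a real database, put and delete would run inside a transaction, and the sweep would need to skip files written by transactions that have not committed yet.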
This is not as robust as writing a BLOB to a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel like the project is dead (2007).
There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.
Support for XML and JSON seems to be experimental.