I am uploading files using a multipart form with Apache FileUpload, and it works fine.
But I want to know the best (or common) practices for saving the files on the server, regarding the following:
Naming the files on the server (What name is better: a generated UUID, or the row ID generated by the database table when I insert the file's associated data?)
The best location for the files on the server (On a Linux server, which folder or partition should I use? Do I have to encrypt the uploaded files?)
When I put a link to access the files from the browser: is direct access better, or going through a servlet?
If you do it this way (files in the filesystem, metadata in the DB) then using the row ID as the filename is not a bad idea (at least it ensures uniqueness). Unfortunately you will have to take care that the filesystem and the database stay in sync, so it will require careful coding.
If you care about performance, the files can be stored on a separate HDD (or NAS). Note that if the number of files is going to be big (thousands), you should not put all of them in one folder, but instead group them into subfolders, each containing at most a few hundred files, as sketched below. This keeps access times low as the number of files grows. Whether to use encryption should depend on your business needs (do the files contain confidential data?).
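For illustration, a minimal sketch of such a fan-out, assuming the DB row ID is used as the file name (the base directory and the fan-out factor of 256 are arbitrary choices):

```java
import java.io.File;

public class FileStore {
    // Sketch: map a numeric row ID to a nested folder so no single
    // directory holds more than a few hundred files.
    public static File locate(File baseDir, long rowId) {
        long bucket = rowId / 256;                  // ~256 files per leaf folder
        File level1 = new File(baseDir, String.format("%03d", bucket / 256));
        File level2 = new File(level1, String.format("%03d", bucket % 256));
        level2.mkdirs();                            // create the hierarchy on demand
        return new File(level2, String.valueOf(rowId));
    }
}
```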
A servlet is the better way, as it hides the real storage details from the client and is more resilient to future changes in the application. It has some other benefits as well (e.g. you can implement your own access control, you get caching in browsers/proxies, etc.). And it's a must if you use encryption.
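A minimal sketch of such a download servlet; the storage root, the id parameter, and the fixed content type are placeholder assumptions:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FileDownloadServlet extends HttpServlet {
    // Illustrative storage root; in practice read it from configuration
    private static final File STORAGE_DIR = new File("/var/app/uploads");

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String id = req.getParameter("id");       // e.g. the DB row ID
        if (id == null || !id.matches("\\d+")) {  // also blocks path traversal
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST);
            return;
        }
        // This is where your own access-control check would go
        File file = new File(STORAGE_DIR, id);
        if (!file.isFile()) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        // The real content type / original file name would come from the DB metadata
        resp.setContentType("application/octet-stream");
        resp.setContentLength((int) file.length());
        Files.copy(file.toPath(), resp.getOutputStream());
    }
}
```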
After recurring trouble with server file system operations (missing permissions, different behaviour on different platforms) I would recommend just storing the file data as BLOBs in your database. That way, you do not need to devise a unique file naming scheme, and all sensitive data lies in one place.
In this case, you will need a servlet for downloading, which IMHO is the better way even for accessing data stored in files.
I'm looking to make a web app that uses two datasets, given in CSV format, each 10 MB in size. I've chosen a Java dynamic web app with JSP that users can use to search and sort through the data provided in the CSVs.
From what I understand, the user/client sends a request to the server, and the server calls the Java classes in the backend, which hold the different sorting methods and the CSV data to be manipulated.
This data sitting in the backend is where I'm running into confusion. I know it's possible to load the data into a database sitting on the server that I could call upon.
If I use a class that reads the CSV and loads the data into arrays, would this reading work be done every time someone accesses the website (causing latency), or would the arrays already be loaded on the server?
Depending on the scope you use, it would be loaded into the application context and therefore only once (say, in a singleton class loaded at application startup), as sketched below.
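For example, with a ServletContextListener the CSV is parsed exactly once at startup and stashed in application scope; the file location and the attribute name are assumptions:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class CsvStartupListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        List<String[]> rows = new ArrayList<>();
        // "/WEB-INF/data.csv" is an assumed location inside the web app
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                sce.getServletContext().getResourceAsStream("/WEB-INF/data.csv")))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split(","));   // naive split; no quoted fields
            }
        } catch (Exception e) {
            throw new RuntimeException("Could not load CSV", e);
        }
        // Loaded exactly once; every request can now read it from application scope
        sce.getServletContext().setAttribute("csvRows", rows);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) { }
}
```

Register it in web.xml with a listener element (or annotate it with @WebListener on Servlet 3.0+).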
But I wouldn't recommend this approach; I would recommend a properly designed database to put your CSV data into. That way you have the database engine helping you organize your data, which gives you scalability and maintainability (although a proper design of your classes, say with a DAO pattern, would give you the same).
Organized data in a database also gives you more flexibility to search through your data using ready-made SQL functions.
In order to make my case, here are some advantages of a database system over a file system:
No redundant data – redundancy is removed by data normalization
Data consistency and integrity – data normalization takes care of this too
Security – each user can be given a different set of access rights
Privacy – limited access
Easy access to data
Easy recovery
Flexibility
Concurrency – the database engine lets you read the data concurrently, and even write to it concurrently
I'm not listing the disadvantages since I'm making my case :)
You can read the CSV file to build your arrays and then add the arrays to session scope. The CSV file will only be read by the servlet that processes it; subsequent requests will retrieve the arrays from the session, as sketched below.
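A minimal sketch of that session-scoped caching; the attribute name is illustrative and CsvLoader is a hypothetical helper that parses the CSV into rows. Note that with session scope every user gets their own copy of the data, which is why the application-scope approach above usually fits a shared, read-only dataset better:

```java
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class CsvSessionCache {
    // Sketch: cache parsed CSV rows in the user's session so the file
    // is parsed at most once per session, not once per request.
    public static List<String[]> rows(HttpServletRequest request) {
        HttpSession session = request.getSession();
        @SuppressWarnings("unchecked")
        List<String[]> rows = (List<String[]>) session.getAttribute("csvRows");
        if (rows == null) {
            rows = CsvLoader.load();                  // hypothetical CSV parser
            session.setAttribute("csvRows", rows);    // later requests skip the parse
        }
        return rows;
    }
}
```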
I am facing a problem for which I don't have a clean solution. I am writing a Java application that stores certain data in a limited set of files. We are not using any database, just plain files. Due to some user-triggered action, certain files need to be changed. I need this to be an all-or-nothing operation: either all files are updated, or none of them. It would be disastrous if, for example, 2 of the 5 files were changed while the other 3 were not due to some IOException.
What is the best strategy to accomplish this?
Is embedding an in-memory database, such as hsqldb, a good reason to get this kind of atomicity/transactional behavior?
Thanks a lot!
A safe approach IMO is:
Backup
Maintain a list of processed files
On exception, restore the files that have already been processed from their backups (see the sketch after this list).
How practical this is depends on how heavy the operation is going to be, and on your time limits and such.
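A minimal sketch of this strategy; the .bak suffix and the updater callback are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class AllOrNothingUpdate {
    // Back each file up, update them one by one, and on any failure
    // copy the backups over whatever was already modified.
    public static void update(List<Path> files, FileUpdater updater) throws IOException {
        List<Path> processed = new ArrayList<>();
        for (Path f : files) {
            Files.copy(f, backupOf(f), StandardCopyOption.REPLACE_EXISTING);
        }
        try {
            for (Path f : files) {
                updater.update(f);             // your actual modification
                processed.add(f);
            }
        } catch (IOException e) {
            for (Path f : processed) {         // roll back the ones already changed
                Files.copy(backupOf(f), f, StandardCopyOption.REPLACE_EXISTING);
            }
            throw e;
        }
    }

    private static Path backupOf(Path f) {
        return f.resolveSibling(f.getFileName() + ".bak");
    }

    // Hypothetical callback describing the user-triggered change
    public interface FileUpdater {
        void update(Path file) throws IOException;
    }
}
```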
What is the best strategy to accomplish this? Is embedding an in-memory database, such as hsqldb, a good reason to get this kind of atomicity/transactional behavior?
Yes. If you want transactional behavior, use a well-tested system that was designed with that in mind instead of trying to roll your own on top of an unreliable substrate.
File systems do not, in general, support transactions involving multiple files.
Non-Windows file-systems and NTFS tend to have the property that you can do atomic file replacement, so if you can't use a database and
all of the files are under one reasonably small directory
which your application owns and
which is stored on one physical drive:
then you could do the following:
Copy the directory contents using hard-links as appropriate.
Modify the 5 files.
Atomically swap the modified copy of the directory with the original (one way to do this is sketched below).
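One way to realize the atomic swap on a POSIX filesystem is to have the application always read its data through a symlink and atomically rename a new symlink over it; a minimal sketch, with illustrative names, assuming everything lives on the same physical filesystem:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicDirSwap {
    // The app always reads from base/current, a symlink to the live version.
    // rename() of the new link over the old one is atomic on POSIX systems.
    public static void publish(Path base, Path newVersionDir) throws Exception {
        Path current = base.resolve("current");
        Path tmpLink = base.resolve("current.tmp");

        Files.deleteIfExists(tmpLink);
        Files.createSymbolicLink(tmpLink, newVersionDir.toAbsolutePath());
        Files.move(tmpLink, current, StandardCopyOption.ATOMIC_MOVE);
    }
}
```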
I've used the Apache Commons Transaction library for atomic file operations with success. It allows you to modify files transactionally and roll back on failures.
Here's a link: http://commons.apache.org/transaction/
My approach would be to use a lock in your Java code, so that only one process can write to a given file at a time. I'm assuming your application is the only one that writes the files.
If some write problem occurs even so, then to "roll back" your files you need to have saved a copy of them beforehand, as suggested above.
Can't you lock all the files and only write to them once all files have been locked?
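For illustration, a sketch of that idea with java.nio FileLock; note that these locks are advisory on most platforms and coordinate between processes, not between threads of the same JVM:

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.util.ArrayList;
import java.util.List;

public class MultiFileLock {
    // Acquire exclusive locks on every file before writing to any of them,
    // then release everything. Keep the file order fixed across processes
    // to avoid two of them deadlocking on each other.
    public static void withAllLocked(List<String> paths, Runnable writes) throws Exception {
        List<RandomAccessFile> handles = new ArrayList<>();
        List<FileLock> locks = new ArrayList<>();
        try {
            for (String p : paths) {
                RandomAccessFile raf = new RandomAccessFile(p, "rw");
                handles.add(raf);
                locks.add(raf.getChannel().lock());  // blocks until available
            }
            writes.run();                            // all files locked: safe to write
        } finally {
            for (FileLock l : locks) l.release();
            for (RandomAccessFile h : handles) h.close();
        }
    }
}
```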
In my web application I use PrimeFaces + Spring Web Flow. I need to attach a picture, scanned or uploaded, to every registered customer, and I can't think of a good solution for storing the files. The criteria are:
I want only the application to have access to the files; it shouldn't be possible to access the images directly.
I tried storing the files in the database, but that's not a good idea, so I store only the path in the database.
I would like to have relative paths to the files, and an efficient way to access them.
Just store the files in the database. If you try to put them somewhere else, it will be difficult to migrate your application from one server to another, because its data will be spread across multiple sources. Upgrades will be more difficult.
If performance / database size becomes an issue, revisit this decision, but make sure you always measure rather than guess the performance.
Why is it not a good idea to store the files in the DB?
What will happen when you deploy your application in a clustered server environment? In a clustered environment your solution of keeping images on the file system will not work.
You should give more thought to keeping the images in the DB. Investigate what performance hit you actually get when storing/retrieving images in the DB, and try different kinds of storage frameworks, such as NoSQL DBs. Essentially, I don't think you can get away without storing the data in one central location.
My Java application is currently using ZIP as a project file format. The project files contain a few XML files and many image and sound files.
The project files are getting pretty big, and since I can't find a way with the java.util.zip classes to write to a ZIP file without recreating it, my file saves are becoming very slow. So for example, if I just want to update one XML file, I need to rewrite the entire ZIP.
Is there some other Java ZIP library that will allow me to do random writes to a ZIP file?
I know switching to something like SQLite solves the random-write issue. Would using SQLite just to store XML, sound, and images as blobs be an appropriate use?
I suppose I could come up with my own file format and use RandomAccessFile but then there would be a lot of bookkeeping I'd have to write.
Update...
My file format is very much like Office Open XML. It is a ZIP file containing XML and other resources.
Someone must have solved the problem of how to do random writes to update a ZIP file. Does anyone know how?
There exist so-called single-file virtual file systems, which let you create file-based containers and provide a file-system-like structure and API. One example is SolFS (it has a core written in C with a JNI wrapper); there are also some other C- and Delphi-written solutions (I don't remember their names at the moment). I guess similar native Java solutions exist as well.
First of all, I would separate your app's resources into those that are static (such as images) and those that can change (the XML files you mentioned).
Since the static files won't be rewritten, you can continue to store them in a ZIP file, which IMHO is a good approach for deploying any resources.
Now you have 2 options:
Since the non-static files are probably not too big (the XML files are likely to be smaller than the images+sounds), you can stick with your current solution (a ZIP file) and simply maintain two ZIP files, of which only one (the smaller one, with the changeable files) will be rewritten.
You could use an in-memory database (such as HSQLDB) to store the changeable files, and only persist them (transferring from the database to a file on the drive) when your application shuts down or when that operation is explicitly needed.
SQLite is not always fast (at least in my experience). I would suggest individually compressing the XML files (you'll still get decent compression) and just using the file system to save them. You could experiment with Btrfs, or just go with ext4. If you're not on Linux, this should still work okay, but it might not be as fast until things are cached in memory.
The idea is that if you do not have redundancy between the XML files, you don't gain much by compressing them into one "solid" archive; a sketch of per-file compression follows.
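For illustration, a minimal sketch of per-file compression with the JDK's GZIPOutputStream (the .gz naming is an assumption):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class XmlCompressor {
    // Compress one XML file to file.xml.gz next to it, so each resource
    // can later be rewritten without touching the others.
    public static Path compress(Path xml) throws IOException {
        Path gz = xml.resolveSibling(xml.getFileName() + ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            Files.copy(xml, out);
        }
        return gz;
    }
}
```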
Before offering another answer along the lines of using properly structured JARs, I have to ask: why does the project need to be encapsulated in one file? How do you distribute the program to users to run?
If you must keep a project contained within a single file and be able to replace resources efficiently, yes I would say SQLite is a good choice.
If you do choose to use SQLite, also consider converting some of the XML schemas to one or more SQL tables rather than storing large XML documents as BLOBs.
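If you go the SQLite route, here is a minimal sketch of a project file holding resources as blobs; it assumes a SQLite JDBC driver (e.g. xerial's sqlite-jdbc) on the classpath, and the table layout is illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class ProjectFile {
    public static void main(String[] args) throws Exception {
        // The .project file *is* the SQLite database: one file, random writes
        try (Connection db = DriverManager.getConnection("jdbc:sqlite:demo.project")) {
            try (Statement st = db.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS resource ("
                         + " name TEXT PRIMARY KEY, content BLOB)");
            }
            // Updating one resource rewrites only its pages, not the whole file
            try (PreparedStatement ps = db.prepareStatement(
                    "INSERT OR REPLACE INTO resource (name, content) VALUES (?, ?)")) {
                ps.setString(1, "images/logo.png");
                ps.setBytes(2, Files.readAllBytes(Paths.get("logo.png")));
                ps.executeUpdate();
            }
        }
    }
}
```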
I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) into filesystem, in some sensible hierarchy. There will be hundreds, even thousands of files per day. I also need to store metadata for each file.
I am considering putting the metadata (just a couple of fields) into a database table, but the XML file content itself into files in the filesystem, in order not to bloat the database with content data (which is seldom read).
Is there some simple library that helps me in saving, loading, deleting etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).
Or do I even need that, should I just go with raw/custom Java?
Is there some simple library that helps me in saving, loading, deleting etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions? Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).
Java API
Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.
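A minimal sketch combining both; the directory layout and file names are illustrative:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

public class MessageStore {
    public static void main(String[] args) throws IOException {
        File dir = new File("messages" + File.separator + "2024-05-01");
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("Could not create " + dir);
        }
        File msg = new File(dir, "request-0001.xml");

        // Write and read with commons-io, which handles the streams for us
        FileUtils.writeStringToFile(msg, "<request/>", StandardCharsets.UTF_8);
        String content = FileUtils.readFileToString(msg, StandardCharsets.UTF_8);
        System.out.println(content);

        // Plain java.io.File operations: existence check and deletion
        if (msg.exists()) {
            System.out.println("size=" + msg.length());
            FileUtils.deleteQuietly(msg);   // delete when no longer needed
        }
    }
}
```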
Java is independent of the OS. You just need to make sure you use File.separator (the platform-specific name separator), or use the constructor File(File parent, String child), so that you don't need to mention the separator explicitly.
The Java file API is relatively high-level, to abstract over the differences between the many OSes. Most of the time it's sufficient. It has shortcomings only if you need some relatively OS-specific feature which is not in the API, e.g. checking the physical size of a file on disk (not the logical size), security rights on *nix, free space/quota of the hard drive, etc.
Most OSes have an internal buffer for file writing/reading. Using FileOutputStream.write and FileOutputStream.flush ensures the data has been handed to the OS, but not necessarily written to the disk. The Java API also supports this kind of low-level integration, to manage these buffering issues for systems such as databases; a sketch follows.
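A minimal sketch of forcing data onto the disk; flush() only pushes data to the OS, while FileDescriptor.sync() asks the OS to persist it to the device:

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class DurableWrite {
    public static void writeDurably(String path, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(data);
            out.flush();          // JVM buffers -> OS
            out.getFD().sync();   // OS buffers -> physical disk
        }
    }
}
```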
Also, both files and directories are abstracted by File, and you need to check with isDirectory(). This can be confusing, for instance if you have one file x and one directory /x (I don't remember exactly how to handle this issue, but there is a way).
Web service
The web service can either use xs:base64Binary to pass the data, or use MTOM (Message Transmission Optimization Mechanism) if the files are large. A sketch of the MTOM variant follows.
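For illustration, a sketch of an MTOM-enabled JAX-WS endpoint; the class, method, and parameter names are assumptions:

```java
import javax.activation.DataHandler;
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.bind.annotation.XmlMimeType;
import javax.xml.ws.soap.MTOM;

// With @MTOM, large binary payloads travel as raw attachments
// instead of being inlined as base64 text in the SOAP body.
@MTOM
@WebService
public class MessageArchive {

    @WebMethod
    public void store(String messageId,
                      @XmlMimeType("application/octet-stream") DataHandler content) {
        // Persist content.getInputStream() to the filesystem, and the
        // metadata (messageId, timestamp, ...) to the database.
    }
}
```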
Transactions
Note that the database is transactional and the file system is not, so you might have to add a few checks in case operations fail and are retried.
You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:
Update. If the user wants to overwrite a file, you actually create a new one. The level of indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it has been written, which keeps rollback consistent.
Create. Same story when the user wants to create a file.
Delete. If the user wants to delete a file, you do it only in the database first. A periodic job polls the file system to identify files which are no longer listed in the database, and removes them. This two-phase delete ensures that the delete operation can be rolled back.
This is not as robust as writing BLOBs to a real transactional database, but it provides some robustness (the periodic sweep is sketched below). You could otherwise have a look at commons-transaction, but I feel the project is dead (2007).
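A minimal sketch of the periodic cleanup job from the delete step above; the table and column names are assumptions:

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Set;

public class OrphanSweeper {
    // Periodic job: remove physical files no longer referenced in the DB.
    public static void sweep(Connection db, Path storageDir) throws Exception {
        Set<String> known = new HashSet<>();
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT physical_name FROM stored_file")) {
            while (rs.next()) {
                known.add(rs.getString(1));
            }
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(storageDir)) {
            for (Path f : files) {
                if (!known.contains(f.getFileName().toString())) {
                    Files.deleteIfExists(f);   // orphan: was deleted in the DB earlier
                }
            }
        }
    }
}
```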
There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.
Support for XML and JSON seems to be experimental.