I need to store attachments on the server side. I can store them either in a BLOB column in the database or in a directory on the file system.
My question is which one is more reliable, scalable and maintainable?
EDIT:
If we go for the file system, we have to handle synchronization ourselves, don't we? For example, if two users try to create or update a file in the same directory, how do we handle concurrency with the file system?
Storing the files in a directory is the more reliable option, because it keeps indexing, fetching, and other database operations light. Store only the path of the file in the DB and put the file itself in a directory.
When a lot of storage requests arrive at the server, handling all of them in the database becomes hard and complex.
So it is better to store the data in a directory, which makes access faster. This matters more and more as the daily growth of the DB increases, so before you start building any system, study it well and then decide which technique is best.
The more data there is in the DB, the more important clustering and indexing become.
For small amounts of data a BLOB is a good option, but for large data I would not recommend it. I built an online data store web application and faced exactly this situation; in the end I stored the data in a directory and only the path in the DB.
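On the concurrency question from the edit: one common approach is to never let two users write to the same file name. The sketch below is only an illustration under that assumption; `AttachmentStore` and `save` are made-up names, not part of any library. Each upload gets a random UUID as its name, is written to a temporary file in the same directory, and is then renamed into place, so a reader never sees a half-written file. The returned string is what you would store in the DB column.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical helper: writes an upload to disk and returns the name to save in the DB.
final class AttachmentStore {
    private final Path root;

    AttachmentStore(Path root) {
        this.root = root;
    }

    // Returns the stored file's name relative to root; persist this string in the database row.
    String save(InputStream upload, String extension) throws IOException {
        Files.createDirectories(root);
        // A random UUID gives every upload its own name, so concurrent writers never collide.
        String name = UUID.randomUUID() + "." + extension;
        // Write to a temp file in the same directory first, so readers never see a partial file.
        Path tmp = Files.createTempFile(root, "upload-", ".tmp");
        try {
            Files.copy(upload, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Rename within one directory; throws if the file system cannot do it atomically.
            Files.move(tmp, root.resolve(name), StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up the temp file if something failed.
            Files.deleteIfExists(tmp);
        }
        return name;
    }
}
```

Updating an existing attachment can follow the same pattern: save the new version under a new name, update the path in the DB row, and delete the old file afterwards.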
Related
I am working on a project where I have extracted images from a sensor and saved them to a directory on the operating system. I have a Java API for uploading images to the server.
I need to upload these images, together with some other data (typically of float type), to the main server.
I need to decide on an intermediary, such as a database, where I store those images and connect through Java to upload them, or whether to use HDFS instead.
Can somebody please advise me which option is best for storing images: a database or HDFS?
Note: There are up to 150 thousand images, and there may be more in the future.
I think the best way to do that is to keep the floats you need and the metadata of the images in the database, for easier searching and querying and easier interaction with Java. The actual images are best stored on a file system, to avoid converting them to and from the database. I believe a simple file system would be good enough for that number of images. You probably won't use any of the fancy HDFS features like MapReduce, but that's up to you.
So in this case, if a standard file system isn't good enough for you and you want something bigger, then HDFS is the way to go. The proper way would be a mixture of the two.
It totally depends on the use case. You can choose:
HDFS: when you want to read the images as a whole, transfer them, or process them and then store or act on the processed results. In short, when you want to run MapReduce operations. Reading images in HDFS is sequential, so fetching a particular image based on some selection criteria is costly and hurts performance.
Database: better for query-based operations, where you want to query or run DML operations on images based on certain criteria. In short, WHERE conditions. But it is time-consuming when you want to process the images in bulk, and performance will obviously be slow when you store 150 thousand images.
So my suggestion, based on your requirement to store the images as an intermediate step, is to store them in HDFS itself.
150,000 images is not considered a huge amount today. If an average of 10 MB is assumed for each image (uncompressed), the amount of data is 1.5 TB, which should be possible to store in an off-the-shelf database like PostgreSQL (on off-the-shelf hardware, i.e. a Linux box with some RAID disks). I'm no expert in HDFS, but I have tried products in the same family and found them easy to use. You could try Hadoop for processing the images as well, if you are looking for a way to parallelize the processing. Even though this product family is nice, I would still use a standard database like PostgreSQL if parallelization is not really needed by nature (as you get in HDFS).
I am wondering if there is a way to cache arbitrary data from web requests onto the disk with Android. The flow I am thinking of is as follows:
The data is stored as a key-value pair, where the key is some identifier and the value is the raw data. Before actually making my web request, I check whether the key is in the cache; if so, I skip the web request. If the key does not exist in the cache, I make the web request and store the data on the disk. I would like the cached data to be accessible across multiple runs of the app, so that I don't have to make the web request again every time I start the app.
I was considering using SharedPreferences for this. Would SharedPreferences be the best way to go about this? Is it okay to store 1 megabyte of data in a single key in SharedPreferences?
The best solution for storing cache files is to store them in a cache directory. Luckily, the Android API provides a solution to this problem: Context#getCacheDir. You can create files in the directory it returns, and you can use a map to store an identifier for each file in order to retrieve them.
However, this solution has a few limitations:
The system will automatically delete files in this directory as disk space is needed elsewhere on the device.
Cache data should only be used for temporary storage of information.
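A minimal sketch of the key-to-file idea, written as plain Java so it can be tried outside Android. `DiskCache` is a hypothetical class, not an Android API. On Android you would pass `context.getCacheDir().toPath()` as the directory; note that `java.nio.file` is only available on newer API levels, so older devices would need the `java.io.File` equivalents. Instead of a separate map, this version hashes the key into a file name, which is one possible design and not the only one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

// Hypothetical disk cache: one file per key inside the given directory.
final class DiskCache {
    private final Path dir;

    DiskCache(Path dir) {
        this.dir = dir;
    }

    // Returns the cached bytes, or empty if the key was never stored or the file has been deleted.
    Optional<byte[]> get(String key) throws IOException {
        try {
            return Optional.of(Files.readAllBytes(fileFor(key)));
        } catch (NoSuchFileException e) {
            return Optional.empty();
        }
    }

    // Stores the bytes under the key, replacing any previous value.
    void put(String key, byte[] value) throws IOException {
        Files.createDirectories(dir);
        Files.write(fileFor(key), value);
    }

    // Hash the key so that any string (a URL, for example) maps to a safe file name.
    private Path fileFor(String key) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return dir.resolve(hex.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}
```

Because the system may delete cache files at any time, callers should treat an empty result as "make the web request again" and never as an error.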
I may be coming late, but a couple years ago I made a library just for this:
https://github.com/fcopardo/EasyRest
The idea is to allow the app to operate with an unstable connection, or none at all, without having to implement a secondary data layer for persisting data. Instead, it keeps the responses for as long as you want and refreshes them without forcing the user to wait. Take a look, you may get some ideas.
In my Java application I am maintaining a users database, and I have to store each user's account picture together with other details. I can store the photo in my database, or I can store the image in my file system and store the relevant path in the database. Which one is more suitable in terms of memory and running time?
Reference taken from this article:
You should decide based on your system's complexity and scalability needs. I suggest you go with the file system; database storage is allowable, but I would not advise it. Management is always easier with database storage, but it will create performance issues in some circumstances. Management is more complex with the file system, but you will get better performance.
You should store the file on the file system. You can read about this on Software Engineering Stack Exchange (formerly Programmers):
https://softwareengineering.stackexchange.com/a/150787
The file system is obviously faster if you have the option.
You can use a varbinary column if you have to insert the image into your DB.
I am working on a project where we need to store a large number of images, say some 10 million. What is the best way to store them in terms of speed and efficiency?
It is a web based project so the image retrieval should be fast.
Database
Store the images as base64 in the database.
We are working with a NoSQL database.
File System
Generate a unique ID for each image and store it under a folder.
1) Database
requires more code, since images have to be processed as streams
puts a heavier load on the database server
database storage is usually more expensive than file system storage
databases win out where transactional integrity between the image and metadata are important.
it is more complex to manage integrity between db metadata and file system data
it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
2) File system
Storing each image on the hard disk under a unique ID is the better option.
things like web servers, etc, need no special coding or processing to access images in the file system
Refer to http://perspectives.mvdirona.com/2008/06/30/FacebookNeedleInAHaystackEfficientStorageOfBillionsOfPhotos.aspx
Also see: Storing Images in DB - Yea or Nay?
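For the "unique ID under a folder" option, here is a rough sketch of how the path could be derived. The two-level sharding is my own assumption and is not something the answers above prescribe; `ImagePaths` and `pathFor` are hypothetical names. The first four hex characters of the ID pick two nested directories, which gives 65,536 leaf folders, so 10 million images work out to roughly 150 files per folder instead of one enormous directory.

```java
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper: maps an image ID to a sharded location under a root folder.
final class ImagePaths {
    private ImagePaths() {
    }

    static Path pathFor(Path root, UUID id, String extension) {
        // Drop the dashes so the ID is a plain 32-character hex string.
        String hex = id.toString().replace("-", "");
        // Use the first two pairs of hex characters as nested directory names.
        return root.resolve(hex.substring(0, 2))
                .resolve(hex.substring(2, 4))
                .resolve(hex + "." + extension);
    }
}
```

With a root of `images` and the ID `123e4567-e89b-12d3-a456-426614174000`, this yields `images/12/3e/123e4567e89b12d3a456426614174000.jpg`. New IDs would typically come from `UUID.randomUUID()`, and only the ID (or the relative path) needs to be kept in the database.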
There is a trade-off; it will depend on your exact situation and needs. The benefits of each include:
Filesystem
Performance, especially caching and I/O
Storing file paths in the database is usually best.
There are a couple of issues:
database storage is usually more expensive than file system storage
you can super-accelerate file system access with standard off the shelf products
for example, many web servers use the operating system's sendfile() system call to send a file directly from the file system to the network interface, without copying it through the web server's own buffers. Images stored in a database don't benefit from this optimization.
things like web servers, etc, need no special coding or processing to access images in the file system
databases win out where transactional integrity between the image and metadata are important.
it is more complex to manage integrity between db metadata and file system data
it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
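To illustrate the sendfile() point from Java: `FileChannel.transferTo` is the standard call that lets the JVM hand the copy to the operating system where it can. Whether a particular platform ends up using sendfile() underneath is an assumption on my part and not something this sketch checks. `FileSender` and `send` are hypothetical names, and the target is assumed to be a blocking channel such as a `SocketChannel`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: streams a file from disk to a channel without reading it into a byte array.
final class FileSender {
    private FileSender() {
    }

    // Returns the number of bytes sent. Assumes a blocking target channel.
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until the whole file is sent.
            while (sent < size) {
                sent += source.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }
}
```

An image stored as a BLOB has to be read through the database driver into application memory first, so it cannot take this path.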
Database
Easier to scale out to multiple web servers
Easier to administer (backup, security etc)
If you have a SQL Server 2008 DB, have a look at FILESTREAM in this SO article - this gives the best of both worlds.
See Storing Images in DB - Yea or Nay?
Edit
For NoSQL, see:
Is it a good idea to store hundreds of millions small images to a key/value store or other nosql database?
Storing images in NoSQL stores
I want to store images related to a particular row in my table.
My table is called spot,
and each spot can have multiple images.
Should I just store the images in a folder on the server and then store the location of that folder in a column of that row called imagesLocation?
Or should other information be incorporated?
Any ideas?
You are on the right track - store the images on the file system (preferably where they can be seen by the web server), and store just a path to them in the database. This can greatly reduce I/O to your database server. Often you will just create an <img> tag with the path, so you can leave the loading and caching of these files to your web server, which is really good at it.
Yes, you should store the file in the file system and the location of the file in the database. In my experience the database connectors perform very poorly on large pieces of binary data in the database.
You should store all the meta-information you need in the database, so you don't need to rely on the OS for anything other than storing the raw bytes.