Passing images to Java - java

I am working on a project where I have extracted images from sensor and saved them to the operating system directory. I have a Java API for uploading images to the server.
I need to upload these images and some other data typically float data type to the main server.
I need to decide an inter-mediator such as a database where I store those images and make connection through java to upload them or use HDFS.
Can some body please advise me, which option will be best for storing images? Database or HDFS?
Note: Images are up to 150 thousand can be more in future.

I think the best way to do that is to keep the floats you need and metadata of the images in the database. For easier searching and querying and easier interaction with the Java. The actual images are best stored on a file system to decrease the transformation from and to the database. I believe a simple file system would be good enough for that size of images. You probably won't use any of the fancy HDFS functions like map reduce and stuff like that. But that's up to you.
So in this case if a standard file system isn't good enough for you and you want something bigger then HDFS is the way to go. So the proper way would be a mixture of the two.

It totally depends on the usecase , you can choose
HDFS : when you wanna read them as a whole or transfer or process them to do any manipulation upon the images data and store or do someother action based on the processed results. In simple, if you wanna do Map-Reduce operation. And reading images in HDFS is sequentially , if you wanna perform to fetch particular image based on certain selection criteria, then it costly and performance impacted operations.
Database : It is better for query based operation where you wanna query or do DML operations upon images on certain criteria basis, In simple, WHERE conditions. But this is totally time consuming process, when you wanna process as a chunk. And the performance will be obviously very slow as you wanna store 150thousand of images
So My suggestion based on the requirement, you wanna store images as intermediate, it will be better to store in HDFS itself.

150.000 images is not considered a huge amount today. If an average of 10 MB is assumed for each image (uncompressed) the amount of data is 1.5 TB, which should be possible to store in an off-the-shelf database (with off-the-shelf hardware, i.e. a Linux box with some RAID disks) like postgreSQL. I'm no expert in HDFS even though I tried products in the same family as HDFS I find them easy to use, I guess you could try Hadoop then for processing of the images as well if you are looking for a way to parallelize the processing. Even though this product family is nice I would still use a standard database like postgreSQL if parallelisation is not really needed by nature (like you get in HDFS).

Related

Attachments under File system vs databases?

I need to sore attachments at server side. I can store them either under blob column of database or under file directory.
My question is which one is more reliable, scalable and maintainable?
EDIT:-
if we go for file system, we have to handle synchroniztion yourself. Is n't it ? For example if two users are trying to create/update the File under same directory how will we handle concurrency with filesystem?
Storing data in directory is more reliable due to indexing and data fetch and other operation. Just store the path of the file into DB and store that file into directory.
When there's lot's of data store request came on server it's very hard and complex to handle so much request.
So it's better to store data on directory so accessing of data becomes more faster and when the daily scale of DB storage increase then these become so important so when you start any system first of all study it well and then decide that what to do or which technique will be the best ?
When more data are there in DB then clustering and indexing become more important.
If you want to use it for small data storage then blob it good option but for large data I ll not recommend you because I have made online data store web application and faced this situation so at end I have used to store data in directory and just path in DB.

Store data of an application?

I am looking for a good possibility to store data online. I am programming with Java and Android. So there should be an interface to get access to these data. The most files are images. There is an increasing number of images. One image has a file size of nearly 200kb.
What is a common way to store these data? I need a good performance. So there should be a fast response and unlimited traffic. Maybe you can show me some options for secure data storage.
I have looked for webservers to store data. But many of these do not allow to store application data like images.
As I understand, you don't want to use DB for storing images. Ok, so the solution is to use file storage. You may want to take a look on the Amazon S3 (to my mind, great solution for storing static content) or Google Cloud Storage

How to get pixel rgb value in hadoop?

I have millions of images stored in hdfs of hadoop. I want to build a index of these images. How to get pixel rgb values of these images? I am new in hadoop, the image format in hadoop is different from the original image binary format. Another problem is should I use the sequencefile in hadoop to pack the enormous images to a big file for efficiency? Many thanks.
I could answer the problem partially.
Another problem is should I use the sequencefile in hadoop to pack the enormous images to a big file for efficiency?
Depends on the size of the individual files. If the individual files are really big, then consolidating them might not really help and the other way also.
Check this query on SO for more details.
If you have the additional storage and efficiency is important to you I would definitely go with a SequenceFile. Hadoop will handle splitting the file up for you. We ran into a case where we were extracting data from imagery file similar to what you are doing. In our case we were extracting metadata for ingestion in a discovery system so that our imagery files could be searched outside of the cluster. In this case because efficiency was not a big deal for us we just process the files individually making sure to make them not splittable. This way the other system can reach back over http to grab the source files.

Ways to handle huge XML/JSON files

I'm looking to create an Android (altho for iOS the problem will be the same) application which will function pretty much as a webshop.
It will contain a lot of products - which can be acces through any way we want since that still has to be build.
The problem is, we created a plain text file to test the size, and it turns out that even a selection of the products, with no structure (XML, JSON..) is already 300mb.
Once we add a structure, this will logically only cause more overhead and increase this size.
Like I said, pretty much anything is possible in matters of receiving the data.
They can build an API to be able to fetch products once at a time when needed, or 1 big file to parse in a background process...
However, one of the wishes is being (as much as possible) offline. This would normally mean saving all the data into a database on the phone, but if this will result in 300mb on your SD card, this is no good.
To sum it up what I exactly want to know;
Are there any other ways to handle big data like this, without having to keep a connection to internet constantly, or having to download 300mb on someone's phone.
Some kind of compression, special way to save it in the database... any ideas are welcome.

Storing lots of small files: archive vs. filesystem

I am creating an application that requires a lot of image thumbnails (~3000, 5-25KB). Because speed is essential I plan on loading these images into memory when the application starts. At runtime, new thumbnails will be downloaded and added to the collective.
I could store them all in a folder, but reading thousands of files into memory when a program starts hardly seems efficient.
My second option would be to save them in some kind of (compressed) archive. This would make storage itself and loading more efficient (I think). However, new files will be added regularly, and that will probably not go as smoothly as just saving them in a folder.
Is storing a cache of small files in a (compressed) archive a bad idea or not? Are ZIP files the way to go? Would I be better off using uncompressed archives (and if so, what kind)?
All image files will be JPEG's.
Thanks in advance!
EDIT: I am considering to drop the "load everything into memory on application start" thing. This would simplify my question a little. My initial idea to put everything in one big file now seems less beneficial, since the problem of many files in one directory can be solved by hashing into subdirectories.
Small files don't compress especially well, so you may not gain much compression.
While loading the files will be fast because they are smaller, decompression adds time. You'd have to experiment to see which is faster.
I would think the real issues would relate to the efficiency of the file system when it comes to iterating over all the little files, especially if they are all in one folder. Windows is notorious for being pretty inefficient when folders contain lots of files.
I would consider doing something like writing them out into one file, uncompressed, that could be streamed into memory -- maybe not necessarily contiguous memory, as that might be a problem. But the idea would be to put them all in one file. Then write some kind of index that ties a file name or other identifier to an offset from which the location of the image in memory could be determined.
New images could be added at the end, and the index updated appropriately.
It isn't fancy but that's what you're trying to avoid. An archive or even a file system gives you lots of power and flexibility but at the cost of efficiency. When you know what you want to do, sometimes simple is better.
I would consider implementing a solution that reads files from a folder, another that divides the files into subfolders and subsubfolders so there are no more than 100 or so files in any given folder, then time those solutions so you have something to compare to. I would think a simple indexed file would be fast enough that you wouldn't even need to pre-load the images like you're suggesting -- just retrieve them as you need them and keep them around once they're in memory.
All disk based storage, and most database, allocate space in chunks. The chunks on large capacity disks can be large. If you have 5kb files and a 32kb disk chunk you end up with 85% wasted space on your storage.
Using an archive won't compress jpeg much because the jpeg encoding algorithm already does that. It will however save you the wasted space on the storage media. It does make things more complicated and perhaps a little slower.
In my opinion I think that the zip file way it´s a bad idea, because you will slowdown everything with the process to load the zip file and unzip it to extract each image.
I think that the purpose of a thumbnail image is that by nature is small so your app plus hardware can load it as fast as possible. So I believe that it is a better idea to load each image as you need it.
Well, if you have small, "geometric" pictures, you may implement them as objects of type javax.swing.Icon rather than images to load from the filesystem.
http://download.oracle.com/javase/6/docs/api/javax/swing/Icon.html
http://download.oracle.com/javase/tutorial/uiswing/components/icon.html
So you will implement one or more objects which draw themselves onto a Graphics surface using the Graphics drawing primitives, instead of copying pixels.
If this is a web-application then the best performance boost you can get is setting good HTTP caching headers. Having a unique URL for every image (also different URLs for different versions of the same image) makes it possible to set VERY far future expire headers, because changing the image changes the URL leading into refetch.
I won't compress, because JPEG cannot be good compressed and it only costs CPU time.
I would recommend to simply store the images into filesystem and consider the use of libraries like jawr or implement your own caching strategy.
I know this question has already answered but I think you need more options other than zipping.
While zip is good, It's not really affect much for JPEG since JPEG has already compressed.
Other thing you may want to consider is :
Put the image in Content Delivery Network (CDN)
Compress components with gzip ( mean the server will automatically zip every response ) and you dont need to write any code to unzip it later - it's handled by the browser automatically.
Since you mention JPEG, you may want to use JPEGTran.Run jpegtran on all your JPEGs.
This tool does lossless JPEG operations such as rotation and can also be used to optimize and remove comments and other useless information (such as EXIF information) from your images.
jpegtran -copy none -optimize -perfect src.jpg dest.jpg
Use Image Sprites. Instead of asking browser to download many image at same time, ask the browser to only download one.
For the details read : http://developer.yahoo.com/performance/rules.html#opt_images
For the basic examination how to improve your website performance you can try install YSlow ( plugin to detect uneffecient code ) in Firefox.
Hope that helps.

Categories