I need to fetch a 200 MB file from a remote server and store it in a DB2 database (the column is of BLOB type and the corresponding Java type is byte[]).
Can anyone tell me what would be a good approach for this and how to go about it in Java?
Thanks all.
I am surprised that you want to put such a big file in a database; it can dramatically reduce your database performance.
That said, to do this you use a binary / BLOB column type to store such a big file.
Use these links for more details:
http://www.ibm.com/developerworks/data/library/techarticle/0303stolze/0303stolze.html
http://www.dbforums.com/db2/1663961-insert-data-fields-clob-blob.html
Try it first in a sandbox / test environment.
Beware of data damage when transferring a big file in one transaction over a slow connection.
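For reference, here is a minimal JDBC sketch of that approach. It assumes the DB2 driver is on the classpath; the table documents(name, content BLOB), the remote URL, the JDBC URL and the credentials are placeholders. The point is to stream the remote file into the BLOB column instead of holding a 200 MB byte[] in the heap.

```java
import java.io.InputStream;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BlobUpload {
    public static void main(String[] args) throws Exception {
        // Hypothetical remote file and DB2 connection details -- replace with your own.
        URL remoteFile = new URL("http://example.com/files/big-file.bin");
        String jdbcUrl = "jdbc:db2://dbhost:50000/MYDB";

        try (Connection con = DriverManager.getConnection(jdbcUrl, "user", "password");
             InputStream in = remoteFile.openStream();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO documents (name, content) VALUES (?, ?)")) {
            ps.setString(1, "big-file.bin");
            // Stream the remote file directly into the BLOB column instead of
            // materialising the whole 200 MB file as a byte[] in the JVM heap.
            ps.setBinaryStream(2, in);
            ps.executeUpdate();
        }
    }
}
```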
We have an application that runs with any of IBM Informix, MySQL and Oracle, and we are using Java with Hibernate to connect to the database. We will store XML, CSV and other text-based files inside the database (clob column). The entities in Java are byte[] objects.
One feature request to the application is now to "grep" content inside the data. So I need to find all files with a specific content.
On regular char/varchar fields I can use like '%xyz%', but this is not working on byte[] / blobs.
The first approach was to load each entity, convert the byte[] into a String and use the contains method in Java (roughly as sketched below). If the user enters any filter parameters on other (non-clob) columns, I apply those filters first in order to reduce the number of clobs I have to scan.
That worked quite well for 100 files (clobs), and as long as the application and database are on the same server. But I think it will get really slow once I have 1,000,000 files inside the database and the database is not always in the same network. So I think that is not a good idea.
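Roughly, that first approach looks like this. This is just a sketch: the Document entity, its getContent() accessor and the category filter are illustrative names, not the application's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.hibernate.Session;

public class ClobGrep {
    // Document is a hypothetical Hibernate-mapped entity with a byte[] content field.
    public List<Document> grep(Session session, String categoryFilter, String searchTerm) {
        // Apply the cheap, non-CLOB filters in the database first ...
        List<Document> candidates = session
                .createQuery("from Document d where d.category = :cat", Document.class)
                .setParameter("cat", categoryFilter)
                .list();

        // ... then pull each clob into memory and scan it in Java.
        List<Document> matches = new ArrayList<>();
        for (Document doc : candidates) {
            String text = new String(doc.getContent(), StandardCharsets.UTF_8);
            if (text.contains(searchTerm)) {
                matches.add(doc);
            }
        }
        return matches;
    }
}
```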
My next thought was to create a stored procedure, but I am not sure whether that is possible on all of Informix, MySQL and Oracle.
The last, and least favored, option is to not store the content inside a clob at all. Maybe I can use a different datatype for that?
Does anyone have a good idea how to realize this? I need a solution for all three DBMS. The application knows which kind of DBMS it is connected to, so it would be okay if I had three different solutions (one for each DBMS).
I am completely open to changing what kind of datatype I use (BLOB, CLOB ...) — I can modify that as I want.
Note: the clobs will range from about 5 KiB to about 500 KiB, with a maximum of 1 MiB.
Look into Apache Lucene or another text indexing library.
https://en.wikipedia.org/wiki/Lucene
http://en.wikipedia.org/wiki/Full_text_search
If you go with a DB specific solution like Oracle Text Search you will have to implement a custom solution for each database. I know from experience that Oracle Text search takes significant time to learn and involves a lot of tweaking to get just right.
Also, if you use a DB solution you would get different results in each DB even if the data sets were the same (each DB has its own methods of indexing and retrieving the data).
By going with a third-party solution like Lucene, you only have to learn one solution and the results will be consistent regardless of the DB.
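For illustration, here is a minimal Lucene sketch; the field names and index path are arbitrary, and the calls follow recent Lucene versions, so check them against the version you actually use. The idea is to store the row's primary key alongside the indexed clob text, search the index, and then load the matching rows from whichever DBMS you are on.

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;

public class FileIndex {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(Paths.get("lucene-index"));
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index: store the DB primary key, index (but do not store) the clob text.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new StringField("id", "42", Field.Store.YES));
            doc.add(new TextField("content", "text pulled from the clob ...", Field.Store.NO));
            writer.addDocument(doc);
        }

        // Search: get back the ids, then load the matching rows from the database.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("content", analyzer);
            for (ScoreDoc hit : searcher.search(parser.parse("xyz"), 10).scoreDocs) {
                System.out.println("match id = " + searcher.doc(hit.doc).get("id"));
            }
        }
    }
}
```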
I am trying to store images in a SQL database on the server. Later I want to make these images available for download. Which approach is better:
1. Uploading the image to the server and saving its path in the database, or
2. Converting the image to base64 and storing it in a BLOB-type column?
I'd say it's a tradeoff, and it also depends a lot on your network connection. I'd go for keeping the images in the cloud and just keeping a key in the database.
It would shrink the database's disk usage a lot, possibly even allowing the entire database to be cached in RAM, which balances out the extra time the process spends retrieving the image over the network.
If security is the main concern, then store the image in the DB as a BLOB; if it is not that sensitive, then store it in a specific directory and store its path in the DB.
DB storage gives it extra security: if we store images in a directory, someone who finds a way into that directory can access the files.
If we store images in a directory, proper file handling is also needed, because if someone uploads a file and another person later uploads a file with the same name, the first file can be overwritten.
So you can do something like this: keep two fields in the table, one with the original name and one with a new name made of the original name plus the upload timestamp, and store the file in the directory under that new name (see the sketch below). If you store the image in the DB, this extra work is not needed.
And yes, organizing images in directories is also tough: after a year, so many files will have accumulated that indexing and accessing them becomes cumbersome, so apply some mechanism that separates the images into subdirectories at regular intervals.
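A small sketch of that naming scheme; the upload directory is just an example, and the table would keep both the original and the stored name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageNaming {
    // Build the name the file is stored under: original name plus upload timestamp,
    // so two uploads of "photo.jpg" never overwrite each other.
    static String storedName(String originalName) {
        long uploadedAt = System.currentTimeMillis();
        int dot = originalName.lastIndexOf('.');
        String base = (dot > 0) ? originalName.substring(0, dot) : originalName;
        String ext  = (dot > 0) ? originalName.substring(dot) : "";
        return base + "_" + uploadedAt + ext;
    }

    public static void main(String[] args) {
        // Hypothetical upload directory; in the table you would save
        // original_name = "photo.jpg" and stored_name = storedName("photo.jpg").
        Path target = Paths.get("/var/uploads/images", storedName("photo.jpg"));
        System.out.println(target);
    }
}
```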
I am new to databases and I love how easy it is to get data from a relational database (such as a Derby database). What I don't like is how much space one takes up: I have made a database with two tables and written a total of 130 records to these tables (each table has 6 columns), and the whole relational database gets saved in the system directory as a folder that takes up a total of approximately 1914014 bytes! (Assuming that I did the arithmetic right....) What the heck is going on to cause such a huge request of memory?!
I also notice that there is a log1.dat file in the log folder that takes up exactly 1 MB. I looked into this file via Notepad++ and saw that it was mostly NULL characters. What is that all about?
Derby needs to keep track of your database data, the redo logs and the transactions, so that your database stays in a consistent state and can recover even from PC crashes.
It also creates most files with a fixed size (like 1 MB) so it does not need to grow them later on (which would cost performance and fragment the files too much).
Over its runtime, or when it is stopped, Derby will clean up some of these files or regroup them and free space.
So overall, the space and the files are the trade-off you get for using a database.
Maybe you can change some of this behaviour via Derby configuration properties (I did not find a suitable one in the docs, though).
When last checked in 2011, an empty Derby database takes about 770 K of disk space: http://apache-database.10148.n7.nabble.com/Database-size-larger-than-expected-td104630.html
The log1.dat file is your transaction log, and records database changes so that the database can be recovered if there is a crash (or if your program calls ROLLBACK).
Note that log1.dat is disk space, not memory.
If you'd like to learn about the basics of Derby's transaction log, start here: http://db.apache.org/derby/papers/recovery.html
Extending this thread - I would just like to know why it's faster to retrieve files from a file system, rather than a MySQL database. If one were to benchmark the two to see which would retrieve the most data (multiple types of data) over 10 minutes - which one would win?
If a file system is truly faster, then why not just store everything in a file system and replace a database with csv or xml?
EDIT 1:
I found a good resource for alternate storage options for Java
EDIT 2:
I'm looking for a Java API/Jar that has the functionality of a SQL Database Server Engine (or at least some of it) that uses XML for data storage (preferably). If you know of something, please leave a comment below.
At the end of the day the database does just store the data in the file system. It's all the useful stuff on top of just the raw data that makes you decide to use a database.
If you can replicate the functionality, scalability, robustness, integrity, etc, etc of a database system using CSV and still make it perform faster than a relational database then yes I'd suggest doing it your way.
It'd take you a few years to get there though.
Of course, relational systems are not the only way to store data. There are object-oriented database systems (db4o, InterSystems Cache) and document-based systems (RavenDB).
Performance is also relative to the style and volume of data you are working with and what you intend to do with it - I'm not going to even try and discuss that, it's too open ended.
I will also not start the follow on discussion: if memory is truly faster than the file system, why not just store everything in memory? :-)
This also seems similar to another question I answered a long while ago:
Is C# really slower than say C++?
Basically stuff isn't always done just for performance.
MySQL uses the file system the same as everything else on a computer. To retrieve a single piece of data, or a table of data, there is no faster way than reading it directly from the file system; MySQL just adds a small bit of overhead on top of that file system read.
If you need to do some intelligent selecting, match some rows, or filter that data, MySQL is going to do that faster than most other options. The database server provides you calculation and data manipulation power that a filesystem can't.
When you have mixed/structured data, a DBMS is the only practical solution. For example, try to get the name, surname and country of all the customers stored in your DB, but only those born in 1981 and living in Rome. If this data lives in files on the filesystem, how do you easily get only the required rows without scanning all your files, and how do you join the returned data?
A DBMS gives you much more than that.
Many DBMSs store their data in files anyway.
That abstraction layer lets you retrieve data in a very easy, standard and structured way.
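For example, the "customers born in 1981 living in Rome" case above is a single query through JDBC; the schema (customers with name, surname, country, birth_year, city columns) and the connection URL here are made up.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CustomerQuery {
    public static void main(String[] args) throws Exception {
        // The database does the filtering and returns only the matching rows.
        String sql = "SELECT name, surname, country FROM customers "
                   + "WHERE birth_year = ? AND city = ?";
        try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost/shop", "user", "pw");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, 1981);
            ps.setString(2, "Rome");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " " + rs.getString("surname")
                            + " (" + rs.getString("country") + ")");
                }
            }
        }
    }
}
```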
The difference is in how the desired data is located.
In a file system, locating the desired data means searching through all existing data until you find it.
Databases provide indexing which results in locating the desired data almost immediately (within ~12 comparisons) regardless of the amount of data.
What we want is an indexed file system; luckily for us, we have them. They are called databases.
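To make the difference concrete, here is a tiny self-contained illustration (not a real database, just the underlying idea): a linear scan has to touch entries until it finds the key, while an index-style lookup over sorted keys needs only on the order of log2(n) comparisons, which is why the number of comparisons stays small even for very large data sets.

```java
import java.util.Arrays;

public class LookupCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i * 2L;          // sorted keys, like an indexed column
        }
        long target = 999_998L;        // a key near the end of the data

        // "Plain file" style: scan everything until the key is found.
        int scanned = 0;
        for (long k : keys) {
            scanned++;
            if (k == target) break;
        }

        // "Index" style: binary search over the sorted keys.
        int position = Arrays.binarySearch(keys, target);

        System.out.println("linear scan touched " + scanned + " entries");
        System.out.println("binary search found position " + position
                + " in roughly " + (int) (Math.log(n) / Math.log(2)) + " comparisons");
    }
}
```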
I need some input and suggestions from you. I have a very huge database which has around 2000 records holding some information.
Is it good to have another database with key-value pairs pointing to that huge database, or is an XML file enough?
Having 2000 records is not huge. And it's better to use SQLite for data operations rather than an XML file, because an XML file with 2000 pairs will make processing slow and waste resources. Better to use SQLite for such requirements.
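For example, a key-value table in SQLite takes only a few lines with the sqlite-jdbc driver (assumed to be on the classpath; the file, table and column names are arbitrary). On Android you would use the platform's built-in SQLite API instead.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class KeyValueStore {
    public static void main(String[] args) throws Exception {
        // Opens (or creates) a local single-file SQLite database.
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:records.db")) {
            try (Statement st = con.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS records (rec_key TEXT PRIMARY KEY, rec_value TEXT)");
            }
            // Insert or update one of the ~2000 key-value pairs.
            try (PreparedStatement ins = con.prepareStatement(
                    "INSERT OR REPLACE INTO records (rec_key, rec_value) VALUES (?, ?)")) {
                ins.setString(1, "record-1");
                ins.setString(2, "some payload");
                ins.executeUpdate();
            }
            // Look a record up by key.
            try (PreparedStatement sel = con.prepareStatement(
                    "SELECT rec_value FROM records WHERE rec_key = ?")) {
                sel.setString(1, "record-1");
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString("rec_value"));
                    }
                }
            }
        }
    }
}
```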