I need some input and suggestions. I have a huge database with around 2000 records of information.
Is it a good idea to have another database of key-value pairs pointing into that huge database, or is an XML file enough?
2000 records is not huge. It's better to use SQLite for data operations than an XML file, because processing an XML file with 2000 pairs is slower and wastes resources. Use SQLite for requirements like this.
Related
What is the best approach for saving statistical data to a file using the Spring Framework? Is there a library that offers reading and updating data in a file, or should I build my own I/O code?
I already have a relational database, but I don't like the approach of creating an additional table to save values calculated from multiple tables with joins. I also don't want to add complexity to the project by bringing in an additional database such as MongoDB for just one task.
To understand the complexity of this report, imagine drawing a chart of the total number of daily transactions for a full year, over billions of records, with a lot of extra information (such as totals and averages in different currencies at different rates).
So my approach was to generate that data into a file on a regular basis, so that I don't need to regenerate it when it is requested later; I only append any new dates to the file.
Is this approach fine? And what is the best library to do that efficiently?
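To make the "only accumulate the new dates" idea concrete, here is a minimal sketch using plain java.nio. The class name DailyStatsFile, the "date;total" line layout, and both method names are assumptions made for illustration; they do not come from any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper: keeps one "date;total" line per day in a flat file.
final class DailyStatsFile {

    // Returns the last date already stored, or null if the file is missing or empty.
    static LocalDate lastStoredDate(Path file) throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        List<String> lines = Files.readAllLines(file);
        if (lines.isEmpty()) {
            return null;
        }
        String last = lines.get(lines.size() - 1);
        return LocalDate.parse(last.substring(0, last.indexOf(';')));
    }

    // Appends only the days that are newer than what the file already holds.
    // Returns the number of lines appended.
    static int appendNewDays(Path file, SortedMap<LocalDate, Long> totalsByDay) throws IOException {
        LocalDate last = lastStoredDate(file);
        List<String> newLines = new ArrayList<>();
        for (Map.Entry<LocalDate, Long> e : totalsByDay.entrySet()) {
            if (last == null || e.getKey().isAfter(last)) {
                newLines.add(e.getKey() + ";" + e.getValue());
            }
        }
        Files.write(file, newLines, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return newLines.size();
    }
}
```

The sketch assumes the file is only ever appended to in date order, so the last line always carries the newest date.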
Update
I found this answer useful on why people sometimes prefer flat files to a relational or non-relational database:
Is it faster to access data from files or a database server?
I would prefer to use MongoDB for such purposes, but if you need a simple approach, you can write your data to a CSV/Excel file.
Just use plain I/O:
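import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
// One string per CSV line, fields separated by ';'
List<String> data = new ArrayList<>();
data.add("head1;head2;head3");
data.add("a;b;c");
data.add("e;f;g");
data.add("9;h;i");
Files.write(Paths.get("my.csv"), data); // throws IOException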
That is all)
How to convert your own object to a string like 'field1;field2' I think you know.
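In case it helps, a minimal sketch of that conversion follows. CsvRow and its three fields are made-up names for illustration, and the code assumes the values never contain ';' or line breaks (a real CSV library handles quoting for you).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value class used only for illustration.
final class CsvRow {
    final String field1;
    final String field2;
    final int field3;

    CsvRow(String field1, String field2, int field3) {
        this.field1 = field1;
        this.field2 = field2;
        this.field3 = field3;
    }

    // Joins the fields with ';'. Assumes no field contains ';' or a line break.
    String toCsvLine() {
        return String.join(";", field1, field2, String.valueOf(field3));
    }

    // Converts a list of rows into the list of strings that Files.write expects.
    static List<String> toCsvLines(List<CsvRow> rows) {
        return rows.stream().map(CsvRow::toCsvLine).collect(Collectors.toList());
    }
}
```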
You can also use a library such as Apache POI (for Excel files) or Apache Commons CSV, but I think the plain approach above is much faster.
If you want to append data to an existing file, pass StandardOpenOption.APPEND (from java.nio.file); StandardOpenOption has several other options too:
Files.write(Paths.get("my.csv"), data, StandardOpenOption.APPEND);
For reading, use Files.readAllLines(Paths.get("my.csv")); it returns a list of strings.
You can also read a range of lines.
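Reading a range of lines can be sketched with Files.lines plus skip and limit. LineRange and its read method are names invented for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for reading a slice of a text file.
final class LineRange {

    // Returns up to 'count' lines starting at zero-based line index 'from'.
    static List<String> read(Path file, long from, long count) throws IOException {
        // try-with-resources closes the underlying file handle
        try (Stream<String> lines = Files.lines(file)) {
            return lines.skip(from).limit(count).collect(Collectors.toList());
        }
    }
}
```

Note that skip still reads the skipped lines from disk; it only avoids keeping them in memory.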
But if you need to retrieve a single column, update two columns conditionally, and so on, you should read about MongoDB or other non-relational databases. It is difficult to cover MongoDB here; you should read the documentation.
Enjoy)
I found a library that makes it easy to read and write CSV files and can map them to objects as well: Jackson data formats.
Find an example with spring
I am designing an API in Java with the Spring Framework that will read a flat file containing 100K records and compare them with values fetched from a database. If a DB value is present among the file values, it will be updated in the database.
The concern in the entire process is performance.
I have a maximum of 7 minutes to perform the entire processing of 100K records.
I am looking to use a caching mechanism to fetch all the data from the database in a cache bean. The cache will be refreshed every 30 mins or 1 hour.
Second, we will read the file and compare the values with the values in the cache and the matched values will be stored in another cache.
Third, we will update the values from the second cache to the database using a threading mechanism.
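The second and third steps above can be sketched roughly as follows. This is only an outline under assumptions: records are simplified to String key/value pairs, FileDbReconciler is a made-up name, and batchUpdater stands in for whatever real DB2 update you would run (for example a JDBC batch).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical outline of the compare-and-update steps.
final class FileDbReconciler {

    // Step 2: keep only file records whose key is also present in the cached DB snapshot.
    static Map<String, String> match(Map<String, String> dbCache, Map<String, String> fileRecords) {
        Map<String, String> matched = new HashMap<>();
        for (Map.Entry<String, String> e : fileRecords.entrySet()) {
            if (dbCache.containsKey(e.getKey())) {
                matched.put(e.getKey(), e.getValue());
            }
        }
        return matched;
    }

    // Step 3: split the matches into batches and hand each batch to a worker thread.
    // 'batchUpdater' is a placeholder for the real database update.
    static void updateInBatches(Map<String, String> matched, int batchSize, int threads,
                                Consumer<List<Map.Entry<String, String>>> batchUpdater) throws Exception {
        List<Map.Entry<String, String>> all = new ArrayList<>(matched.entrySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < all.size(); i += batchSize) {
                List<Map.Entry<String, String>> batch =
                        all.subList(i, Math.min(i + batchSize, all.size()));
                pending.add(pool.submit(() -> batchUpdater.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the batch and propagate any worker failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With hash lookups, matching 100K file records against the cache is cheap; the database round trips in step 3 are more likely to dominate the 7 minutes, which is why batching matters more than the thread count.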
I need some opinions on this design. Does it look good?
Any advice to improve the design is welcome.
P.S.: The database in use is DB2, hosted on mainframe systems.
Thanks
Nirmalya
Extending this thread - I would just like to know why it's faster to retrieve files from a file system, rather than a MySQL database. If one were to benchmark the two to see which would retrieve the most data (multiple types of data) over 10 minutes - which one would win?
If a file system is truly faster, then why not just store everything in a file system and replace a database with csv or xml?
EDIT 1:
I found a good resource on alternative storage options for Java
EDIT 2:
I'm looking for a Java API/Jar that has the functionality of a SQL Database Server Engine (or at least some of it) that uses XML for data storage (preferably). If you know of something, please leave a comment below.
At the end of the day the database does just store the data in the file system. It's all the useful stuff on top of just the raw data that makes you decide to use a database.
If you can replicate the functionality, scalability, robustness, integrity, etc, etc of a database system using CSV and still make it perform faster than a relational database then yes I'd suggest doing it your way.
It'd take you a few years to get there though.
Of course, relational systems are not the only way to store data. There are object-oriented database systems (db4o, InterSystems Cache) and document-based systems (RavenDB).
Performance is also relative to the style and volume of data you are working with and what you intend to do with it - I'm not going to even try and discuss that, it's too open ended.
I will also not start the follow on discussion: if memory is truly faster than the file system, why not just store everything in memory? :-)
This also seems similar to another question I answered a long while ago:
Is C# really slower than say C++?
Basically stuff isn't always done just for performance.
MySQL uses the file system the same as everything else on a computer. To retrieve a single piece of data, or a table of data, there is no faster way than directly from the file system. MySQL would just add a small bit of overhead to that file system pull.
If you need to do some intelligent selecting, match some rows, or filter that data, MySQL is going to do that faster than most other options. The database server provides you calculation and data manipulation power that a filesystem can't.
When you have mixed/structured data, a DBMS is the only practical solution. For example, try to get the name, surname and country of all the customers stored in your DB, but only those born in 1981 and living in Rome. If you have this data in files on the filesystem, how do you easily get only the required data without scanning all your files, and how do you join the returned data?
A DBMS gives you much more than that.
Many DBMSs store their data in files.
This abstraction layer lets you retrieve data in an easy, standard and structured way.
The difference is in how the desired data is located.
In a file system, locating the desired data means searching through all existing data until you find it.
Databases provide indexing, which locates the desired data almost immediately: the lookup cost grows only logarithmically with the amount of data (on the order of 12 comparisons for a binary search over a few thousand sorted keys).
What we want is an indexed file system - lucky for us, we have them. They are called databases.
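To see the difference in numbers, here is a small sketch that counts comparisons for a linear scan versus a binary search over a sorted list. The binary search is only a stand-in for an index (real databases typically use B-trees or similar structures), and LookupCost is a name made up for this example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical comparison counter; not a model of any specific database engine.
final class LookupCost {

    // Counts comparisons made by a binary search over a sorted list (stand-in for an index).
    static int binarySearchComparisons(List<Integer> sorted, int target) {
        int[] count = {0};
        Collections.binarySearch(sorted, target, (a, b) -> {
            count[0]++;
            return Integer.compare(a, b);
        });
        return count[0];
    }

    // Counts comparisons made by a linear scan (stand-in for searching an unindexed file).
    static int linearScanComparisons(List<Integer> data, int target) {
        int count = 0;
        for (int v : data) {
            count++;
            if (v == target) {
                break;
            }
        }
        return count;
    }
}
```

For a sorted list of 4096 keys, the linear scan needs up to 4096 comparisons while the binary search needs at most 13.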
I'm quite new to Java programming and am writing my first desktop app. The app takes a unique ISBN and first checks whether it is already held in the local DB. If it is, the app just reads from the local DB; if not, it requests the data from isbndb.com and enters it into the local DB, which is in XML format. What I'm wondering is which of the following two methods would create the least overhead when checking whether an entry already exists.
Method 1.) File Exists.
On creating a DB entry, the app would create a separate file for every ISBN, named after the ISBN (e.g. 3846504937540.xml), and when checking would use the file-exists method with the user-provided ISBN to see whether an entry already exists.
Method 2.) SAX XML Parser.
All entries would be entered into a single large XML file and when checking for existing entries the SAX XML Parser would be used to parse the file and then the user provided isbn would be checked against those in the XML DB for a match.
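For reference, Method 2 might look roughly like the sketch below. The XML layout (books/book/isbn) is an assumption, since the actual file format is not shown, and IsbnSaxLookup is a made-up name. Parsing stops early by throwing a private exception once the ISBN is found.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical lookup over a single large XML file.
final class IsbnSaxLookup {

    // Thrown to stop parsing as soon as the ISBN is found.
    private static final class Found extends SAXException {
        Found() {
            super("found");
        }
    }

    // Assumed layout: <books><book><isbn>...</isbn>...</book>...</books>
    static boolean contains(InputStream xml, String isbn) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inIsbn;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                inIsbn = "isbn".equals(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inIsbn) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inIsbn && isbn.equals(text.toString().trim())) {
                    throw new Found();
                }
                inIsbn = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
            return false; // reached the end of the file without a match
        } catch (Found e) {
            return true;
        }
    }
}
```

Every miss reads the whole file, and a hit reads on average half of it, so the cost of each check grows with the number of entries.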
Note :
The resulting entries could number in the thousands over time.
Any information would be greatly appreciated.
I don't think either of your methods is all that great. I strongly suggest using a DBMS to store the data. If you don't have a DBMS on the system, or if you want an app that can run on systems without an installed DBMS, take a look at using SQLite. You can use it from Java with SQLiteJDBC by David Crawshaw.
As far as your two methods are concerned, the first will generate a huge amount of file clutter, not to mention maintenance and consistency headaches. The second method will be slow once you have a sizable number of entries, because you basically have to read (on average) half the database for every query. With a DBMS, you can avoid this by defining indexes for the info you need to look up quickly. The DBMS will automatically maintain the indexes.
I don't much like the idea of relying on the file system for that task. I don't know how critical your application is, but many things can happen to these XML files :) Plus, if the folder gets very big, you would need to think about splitting the files into some hierarchical folder structure to get decent performance.
On the other hand, I don't see why you would use an XML file as a database if you need to update it frequently.
I would use a relational database, and add a new record in a table for each entry, with an index on the isbn_number column.
If you have records in the thousands, you may very well go with SQLite, and you can replace it with a more powerful non-embedded DB if you ever need to, with no (or little :) ) code modification.
I think you'd better use DBMS instead of your 2 methods.
If you want the least overhead just for checking existence, then option 1 is probably what you want, since it's a direct lookup. Parsing XML for each check requires you to pass through the whole XML file in the worst case. You could add caching to option 2, but that gets more complicated than option 1.
With option 1, though, beware that file systems limit (or slow down with) the number of files in one directory, so you will probably have to store the XML files in multiple layers (for example /xmldb/38/46/3846504937540.xml).
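A minimal sketch of that layered layout follows. IsbnFiles and its method names are invented for illustration; the two directory levels are taken from the first four characters of the ISBN, as in the example path.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for the layered directory layout.
final class IsbnFiles {

    // Builds e.g. <root>/38/46/3846504937540.xml from the first four characters of the ISBN.
    static Path pathFor(Path root, String isbn) {
        if (isbn.length() < 4) {
            throw new IllegalArgumentException("ISBN too short: " + isbn);
        }
        return root.resolve(isbn.substring(0, 2))
                   .resolve(isbn.substring(2, 4))
                   .resolve(isbn + ".xml");
    }

    // The existence check is a single file system lookup, no parsing needed.
    static boolean exists(Path root, String isbn) {
        return Files.exists(pathFor(root, isbn));
    }
}
```

One caveat: real ISBN-13 values mostly begin with 978 or 979, so sharding on the leading digits spreads files unevenly; the trailing digits may be a better choice.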
That said, neither of your options is a good way to store data in the long run; you will find they become quite restrictive and hard to manage as the data grows.
People have already recommended using a DBMS and I agree. On top of that, I would suggest looking into a document-based database like MongoDB.
Extend your db table to not only include the XML string but also the ISBN number.
Then you select the XML column based on the ISBN column.
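Query, as a Java string: "select XMLString from cacheTable where isbn = ?", with the ISBN bound through a PreparedStatement rather than concatenated into the string, which avoids SQL injection.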
A different approach could be to use an ORM like Hibernate.
With an ORM, instead of saving the whole XML document in one column, you use a separate column for each element and attribute, and you could even split your document over several tables for a simpler long-term design.
With Java, how do I store around a billion key-value pairs in a file, with the possibility of dynamically updating and querying the values whenever necessary?
If for some reason a database is out of the question, then you need to answer the following question about your problem:
What is the mix of the following operations?
Insert
Read
Modify
Delete
Search
Once you have a good guess at the ratio of these operations, try selecting the appropriate data structure for use in your file. I'd recommend starting with this book as a good catalog of options:
http://www.amazon.com/Introduction-Algorithms-Second-Thomas-Cormen/dp/0262032937
You'll want to select a data structure with the best average and worst case runtimes for your most common operations.
Good Luck
Old question, but this is a case for log files. You do not want to be copying a billion records over every time you do a delete. This can be solved by logging all "transactions" or updates to a new and separate file. These files should be broken up into reasonable sizes.
To read a tuple, you start at the newest log file until you find your key, then stop. To update or insert you just add a new record into the most recent log file. A delete is still a log entry.
A batch coalesce process needs to run periodically; it scans each log file and writes out a new master. As the logs are read, each NEW key gets written to the new master and duplicate (old) keys are skipped until you make it all the way through. If you encounter a delete record, mark the key in a separate delete list, skip the record, and ignore subsequent records with that key.
That made it sound simple, but remember you may want to block/chunk your file as you will likely scan said log files in reverse, or you will at least seek() to the max size and write in reverse instead of read.
I have done this exact thing with billions of lines of data. You're just re-inventing sequential access databases.
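The scheme described above can be sketched in memory as follows, with each inner list playing the role of one log file. LogStore, its methods, and the use of a null value as the delete marker are all assumptions for illustration; a real implementation would work on files and handle chunking and reverse reads.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in for a set of append-only log files.
final class LogStore {

    private static final class Entry {
        final String key;
        final String value; // null marks a delete record

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry>> segments = new ArrayList<>();
    private final int maxSegmentSize;

    LogStore(int maxSegmentSize) {
        this.maxSegmentSize = maxSegmentSize;
        segments.add(new ArrayList<>());
    }

    // Inserts and updates are both plain appends to the newest segment.
    void put(String key, String value) {
        append(new Entry(key, value));
    }

    // A delete is just another log entry.
    void delete(String key) {
        append(new Entry(key, null));
    }

    private void append(Entry e) {
        List<Entry> newest = segments.get(segments.size() - 1);
        if (newest.size() >= maxSegmentSize) {
            newest = new ArrayList<>(); // start a new "log file"
            segments.add(newest);
        }
        newest.add(e);
    }

    // Scans from the newest record backwards and stops at the first hit.
    String get(String key) {
        for (int s = segments.size() - 1; s >= 0; s--) {
            List<Entry> seg = segments.get(s);
            for (int i = seg.size() - 1; i >= 0; i--) {
                if (seg.get(i).key.equals(key)) {
                    return seg.get(i).value;
                }
            }
        }
        return null;
    }

    // Coalesce: keep only the newest record per key and drop deleted keys.
    void coalesce() {
        Set<String> seen = new HashSet<>();
        List<Entry> master = new ArrayList<>();
        for (int s = segments.size() - 1; s >= 0; s--) {
            List<Entry> seg = segments.get(s);
            for (int i = seg.size() - 1; i >= 0; i--) {
                Entry e = seg.get(i);
                if (seen.add(e.key) && e.value != null) {
                    master.add(e);
                }
            }
        }
        segments.clear();
        segments.add(master);
    }

    // Total number of records across all segments.
    int recordCount() {
        int n = 0;
        for (List<Entry> seg : segments) {
            n += seg.size();
        }
        return n;
    }
}
```

Here the 'seen' set does the job of both the duplicate check and the delete list: once a key has been seen, whether as a value or as a delete record, every older record for that key is skipped.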
You leave out a lot of details, but...
Are the keys static? What about the values? Are they fixed size? Why not use a database?
If you don't want to use a database then use a memory mapped file.
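As a rough idea of what a memory-mapped file looks like in Java, here is a sketch for the simplest case: fixed-size values (one long each) addressed by slot number. MappedLongTable is a made-up name, and how keys map to slots is left open, since the question does not say whether keys are static.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical fixed-size table backed by a memory-mapped file.
final class MappedLongTable implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    // Maps 'slots' 8-byte values; the file is grown to that size if it is smaller.
    MappedLongTable(Path file, int slots) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) slots * Long.BYTES);
    }

    void put(int slot, long value) {
        buffer.putLong(slot * Long.BYTES, value);
    }

    long get(int slot) {
        return buffer.getLong(slot * Long.BYTES);
    }

    @Override
    public void close() throws IOException {
        buffer.force(); // flush changes to disk before closing
        channel.close();
    }
}
```

A single mapping is limited to Integer.MAX_VALUE bytes, so a billion entries would need several mappings over different regions of the file.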
Can you use a database? Managing such a large file would be a pain.
Edit: if the file requirement is mostly to avoid machine communication failures, downtime and similar situations, maybe you could use an embedded database. This way you would be freed from the large-file manipulation problems and still get all the advantages a database can give you. I have used Apache Derby as an embedded database with wonderful results. Java DB is Oracle supported and based on Derby.