I'm working on a dictionary app based on an SQLite database with well over 300,000 rows.
The problem is that the database file in its final form consists of full-text indexed tables and weighs well over 150 MB.
I've managed to bring the .db file size to a minimum by creating contentless FTS4 tables; the database cannot be any smaller. I also managed to ship the pre-populated database in the app, and it works fine.
The problem is that I can't just keep the final .db file in /assets and copy it to sdcard on first run because it's too big. I also don't want to download it on first run.
Bulk INSERTing the data, even inside transactions, with SQLite tuned for it and no indexes at the start, takes forever, so that's not an option either.
The good thing is that the raw data used to build the database, in CSV format and compressed, is 30 MB, and sqlite's command-line .import option is very fast (100,000 rows in ~1 s), but... the command-line tool can't be accessed from the app without root permissions.
I would love to bundle the app with compressed CSV files in /assets, decompress them, create the database on the SD card and then import the CSV, but that seems impossible to me. And yet there are many dictionary apps that appear to be doing exactly this: the downloaded app is a dozen megabytes, builds the database on first run, and ends up taking hundreds of megabytes on the SD card.
How do I accomplish this?
I've been working on this for the past two weeks and have simply run out of ideas. I'm new to Android development, so any help would be much appreciated.
Plan A: ship a compressed SQLite database file and decompress it after installation; you should be able to omit the indices and rebuild them later (see the sketch below).
Plan B: copy the relevant parts of the CSV importer into your application, ship a compressed CSV file and load it into an empty database like the command line tool would. Official documentation page.
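For Plan A, a minimal sketch of what the first-run step could look like on Android, assuming the APK ships a gzipped database as assets/dictionary.db.gz and that an index on entries(word) was omitted at build time (the file, table and column names here are assumptions, not your actual schema):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPInputStream;
    import android.content.Context;
    import android.database.sqlite.SQLiteDatabase;

    public class DbInstaller {
        // Decompress the shipped database on first run, then rebuild the omitted index.
        public static void installIfMissing(Context context) throws IOException {
            File target = context.getDatabasePath("dictionary.db");   // assumed name
            if (target.exists()) return;                              // already installed
            target.getParentFile().mkdirs();
            InputStream in = new GZIPInputStream(context.getAssets().open("dictionary.db.gz"));
            OutputStream out = new FileOutputStream(target);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                out.close();
                in.close();
            }
            // Rebuild the index that was left out to keep the shipped file small.
            SQLiteDatabase db = SQLiteDatabase.openDatabase(
                    target.getPath(), null, SQLiteDatabase.OPEN_READWRITE);
            db.execSQL("CREATE INDEX IF NOT EXISTS idx_word ON entries(word)"); // assumed schema
            db.close();
        }
    }

The stream copy keeps memory flat, and rebuilding the index afterwards is usually much faster than creating it while inserting rows one by one.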
Export the entire content into a CSV file (possibly multiple CSV files) and compress them into a ZIP. I have checked: a 32 MB CSV file became 482 KB after compression.
For Example
data1.csv
data2.csv
data3.csv
data4.csv
data5.csv
Compress the files into data.zip and put it in the assets folder. When importing, extract the files from the ZIP one by one and insert the rows into the SQLite DB.
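Here is a rough sketch of that import loop on Android, assuming a data.zip in assets holding the CSV files and a words(word, definition) table (the table name, column layout and the naive comma split are assumptions; real data may need proper CSV quoting):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;
    import android.content.Context;
    import android.database.sqlite.SQLiteDatabase;
    import android.database.sqlite.SQLiteStatement;

    public class ZipCsvImporter {
        // Stream data.zip from assets and bulk-insert each CSV file inside one transaction.
        public static void importAll(Context context, SQLiteDatabase db) throws IOException {
            ZipInputStream zip = new ZipInputStream(context.getAssets().open("data.zip"));
            try {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {          // data1.csv, data2.csv, ...
                    if (!entry.getName().endsWith(".csv")) continue;
                    BufferedReader reader = new BufferedReader(new InputStreamReader(zip, "UTF-8"));
                    SQLiteStatement insert =
                            db.compileStatement("INSERT INTO words (word, definition) VALUES (?, ?)");
                    db.beginTransaction();
                    try {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            String[] cols = line.split(",", 2);         // naive CSV split
                            insert.bindString(1, cols[0]);
                            insert.bindString(2, cols.length > 1 ? cols[1] : "");
                            insert.executeInsert();
                            insert.clearBindings();
                        }
                        db.setTransactionSuccessful();
                    } finally {
                        db.endTransaction();
                        insert.close();
                    }
                }
            } finally {
                zip.close();
            }
        }
    }

Compiling the INSERT once per file and wrapping each file in a single transaction is what keeps this fast; without the transaction every row would be its own commit.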
Related
Can someone please let me know if there is a memory-efficient way to append to .xls files? (The client is very insistent on an .xls file for the report, and I did all possible research but in vain.) All I could find is that to append to an existing .xls, we first have to load the entire file into memory, append the data and then write it back. Is that the only way? I can afford to give up time to optimize memory consumption.
I am afraid that is not possible using Apache POI. And I doubt it will be possible with other libraries either. Even Microsoft's own applications always need to open the whole file to be able to work with it.
All of the Microsoft Office file formats have a complex internal structure similar to a file system, and the parts of that internal system may have relations to each other. So one cannot simply stream data into those files and append to them as one can with plain text files, CSV files or single XML files, for example. One always needs to consider the validity of the complete file system and its relations, so the complete file system always needs to be known. And where should it be known if not in memory?
The modern Microsoft Office file formats are Office Open XML. These are ZIP archives containing an internal file system with a directory structure of XML files and other files. So one can reduce the memory footprint by reading data parts directly from that ZIP file system instead of unzipping it and reading all the data into memory. This is what Apache POI does with XSSF and SAX (the event API). But this is for reading only.
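For illustration, a rough sketch of that XSSF-plus-SAX reading path (class names can differ slightly between POI versions; big.xlsx and the println handler are just placeholders):

    import java.io.InputStream;
    import java.util.Iterator;
    import javax.xml.parsers.SAXParserFactory;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.openxml4j.opc.PackageAccess;
    import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;
    import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
    import org.apache.poi.xssf.model.StylesTable;
    import org.apache.poi.xssf.usermodel.XSSFComment;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;

    public class StreamingXlsxReader {
        public static void main(String[] args) throws Exception {
            OPCPackage pkg = OPCPackage.open("big.xlsx", PackageAccess.READ);
            try {
                XSSFReader reader = new XSSFReader(pkg);
                ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
                StylesTable styles = reader.getStylesTable();

                // Receives SAX events per row/cell instead of building the whole workbook in memory.
                XSSFSheetXMLHandler.SheetContentsHandler handler =
                        new XSSFSheetXMLHandler.SheetContentsHandler() {
                    public void startRow(int rowNum) { }
                    public void endRow(int rowNum) { }
                    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
                        System.out.println(cellReference + " = " + formattedValue);
                    }
                    public void headerFooter(String text, boolean isHeader, String tagName) { }
                };

                XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
                parser.setContentHandler(new XSSFSheetXMLHandler(styles, strings, handler, false));

                Iterator<InputStream> sheets = reader.getSheetsData();
                while (sheets.hasNext()) {
                    InputStream sheet = sheets.next();
                    try {
                        parser.parse(new InputSource(sheet));
                    } finally {
                        sheet.close();
                    }
                }
            } finally {
                pkg.close();
            }
        }
    }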
For writing, one can have parts of the data (single XML files) written to temporary files to keep them out of memory, then assemble the complete ZIP file system from those temporary files when all writing is complete. This is what SXSSF (the Streaming Usermodel API) does. But this is for writing only.
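And a small sketch of the SXSSF write side, which keeps only a sliding window of rows in memory and spills the rest to temporary files (the file name, row count and cell contents are placeholders):

    import java.io.FileOutputStream;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public class StreamingXlsxWriter {
        public static void main(String[] args) throws Exception {
            // Keep at most 100 rows in memory; older rows are flushed to temp files.
            SXSSFWorkbook workbook = new SXSSFWorkbook(100);
            try {
                Sheet sheet = workbook.createSheet("data");
                for (int r = 0; r < 1_000_000; r++) {
                    Row row = sheet.createRow(r);
                    for (int c = 0; c < 10; c++) {
                        Cell cell = row.createCell(c);
                        cell.setCellValue("r" + r + "c" + c);
                    }
                }
                FileOutputStream out = new FileOutputStream("big.xlsx");
                try {
                    workbook.write(out);
                } finally {
                    out.close();
                }
            } finally {
                workbook.dispose();   // delete the temporary files backing the streamed rows
                workbook.close();
            }
        }
    }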
When it comes to appending data to an existing Microsoft Office file, none of the above is usable. Because, as said already, one always needs to consider the validity of the complete file system and its relations, the complete file system always needs to be known and accessible in order to append data parts to it and update the relationships. One could think about keeping all data parts (single XML files) and relationship parts in temporary files to keep them out of memory. But I don't know of any library (maybe closed-source ones like Aspose) that does this. And I doubt it would be possible in a performant way, so you would pay time for a lower memory footprint.
The older Microsoft Office file formats are binary file systems but also consist of a complex internal structure. The individual parts are streams of binary records which may also have relations to each other. So the main problem is the same as with Office Open XML.
There is the Event API (HSSF only), which reads single record streams similar to the event API for Office Open XML. But, of course, this is for reading only.
There is no streaming approach for writing HSSF up to now. The reason is that the old binary Excel worksheets only provide 65,536 rows and 256 columns, so the amount of data in one sheet cannot be that big and a GB-sized *.xls file should not occur at all. You should not use Excel as a data-exchange format for database data; that is not what a spreadsheet application is made for.
But even if someone were to program a streaming approach for writing HSSF, it would not solve your problem, because there is still nothing for appending data to an existing *.xls file. The problems there are the same as with the Office Open XML file formats.
For my Android application I am using an AutoCompleteTextView. I want the user to type in a few letters of a city and see a popup showing the cities that match those letters. But in order to do this I need the list of all the city names. I have a .csv file and a .sql file containing the data for all the cities in the world. However, I do not know how to read such a large file efficiently; the file is approximately 50 MB. Can someone please help me read or store this file so I can use it in my AutoCompleteTextView? Thank you in advance!
You can't create an Android app larger than 50 MB unless you use APK expansion files.
With a raw CSV "cities" file of 50 MB or more, the database will be too big anyway. For your needs a better solution would be an online database: store the data online and read it from your app. I am sure you can use some Google API for it.
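Whichever way you store the cities, do not load the whole 50 MB into the adapter; let the adapter run a small prefix query per keystroke. A sketch against a local SQLite table cities(_id, name) (the table and column names are assumptions; the same wiring works if runQuery calls a remote backend and returns a Cursor):

    import android.database.Cursor;
    import android.database.sqlite.SQLiteDatabase;
    import android.widget.AutoCompleteTextView;
    import android.widget.FilterQueryProvider;
    import android.widget.SimpleCursorAdapter;

    public class CityAutoComplete {
        // Wire an AutoCompleteTextView to prefix queries so only matching rows are ever loaded.
        public static void attach(final SQLiteDatabase db, AutoCompleteTextView view) {
            final SimpleCursorAdapter adapter = new SimpleCursorAdapter(
                    view.getContext(),
                    android.R.layout.simple_dropdown_item_1line,
                    null,                               // no cursor yet; the filter supplies it
                    new String[] { "name" },
                    new int[] { android.R.id.text1 },
                    0);

            // What to put back into the text box when the user picks a suggestion.
            adapter.setCursorToStringConverter(new SimpleCursorAdapter.CursorToStringConverter() {
                public CharSequence convertToString(Cursor cursor) {
                    return cursor.getString(cursor.getColumnIndexOrThrow("name"));
                }
            });

            // Run a small LIMITed prefix query per keystroke instead of loading the whole list.
            adapter.setFilterQueryProvider(new FilterQueryProvider() {
                public Cursor runQuery(CharSequence constraint) {
                    if (constraint == null) return null;
                    return db.rawQuery(
                            "SELECT _id, name FROM cities WHERE name LIKE ? ORDER BY name LIMIT 25",
                            new String[] { constraint + "%" });
                }
            });

            view.setAdapter(adapter);
        }
    }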
I am in the process of refreshing/re-writing a data analysis application I wrote a couple of years back. What I am trying to achieve is as follows:
I need to have a light-weight database to do queries to, I decided on using HSQLDB.
I will have 2 applications: one for creating the DB and one that will do the analysis (which will be used by others). It is my intention that the analysis software (which is multithreaded) will use the DB in a read-only fashion.
The DB will likely be distributed via FTP, and preferably with minimum hassle for the user of the analysis software (most of which are not very technically skilled).
I am not very well-read on SQL but I managed to get the info into tables and tried out simple queries. So in order to finish the "database creator" application I just need to figure out how to "package" the DB.
I have experimented both with mem and file "catalogs" as described in the HSQLDB user guide when I generate the DB. The way I see it, with mem catalogs I cannot write them to disk (to distribute later on) and with file catalogs I have several files that need to be taken care of:
A file: catalog consists of 2 to 6 files, all named the same but with different extensions, located in the same directory. For example, the database named "test" consists of the following files:
• test.properties
• test.script
• test.log
• test.data
• test.backup
• test.lobs
The properties file contains a few settings about the database. The script file contains the definition of tables and other database objects, plus the data for non-cached tables. The log file contains recent changes to the database. The data file contains the data for cached tables and the backup file is a compressed backup of the last known consistent state of the data file. All these files are essential and should never be deleted. For some catalogs, the test.data and test.backup files will not be present. In addition to those files, a HyperSQL database may link to any formatted text files, such as CSV lists, anywhere on the disk.
Question(s):
I think the *.script and *.properties files are the most important ones, but the guide specifically says that all the files are essential and should not be deleted. Since there is no *.data file in my case, and all the data to generate my database is stored in the *.script file (in clear text), it makes me think that when I "open" that file, the JVM recreates the entire DB all over again. Is this correct? Isn't this a very inefficient representation of data?
If my understanding in (1) is correct, why are the other files essential? Do I have to distribute them all?
If (1) and (2) are not off-track, then what options do I have to achieve my goal? Is it for instance feasible to gzip all the files and transport them that way? Then my analysis software would need to unpack them in a reasonable spot, and do a clean-up from time to time when it gets "crowded" in there...
If there is no *.data or *.lobs file then the .script file contains all the data as well as table definitions. In this case, the JVM recreates the database by reading the .script file.
The other files are essential if they exist. If you do not use LOBs, there will be no .lobs file. If you do not use disk tables (CACHED tables), there will be no .data file.
You can distribute the .script file only. Each time this file is opened, a .properties file will be created if it does not exist.
You can use the res: option if your database does not change. For databases that change and need reloading, use the file: option.
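For reference, a small sketch of what the two options look like from JDBC (the paths, catalog alias, table name and the readonly connection property are illustrative; check them against the HSQLDB guide for your version):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class OpenCatalog {
        public static void main(String[] args) throws Exception {
            // file: catalog on disk, opened read-only so the analysis tool never modifies it.
            Connection fileDb = DriverManager.getConnection(
                    "jdbc:hsqldb:file:/data/analysis/test;readonly=true", "SA", "");

            // res: catalog packaged inside the application jar (always read-only).
            Connection resDb = DriverManager.getConnection(
                    "jdbc:hsqldb:res:/dbfiles/test", "SA", "");

            Statement st = fileDb.createStatement();
            ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM measurements"); // table name is an assumption
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
            rs.close();
            st.close();
            resDb.close();
            fileDb.close();
        }
    }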
Recently I have been developing applications for Android and BlackBerry.
What I need to do is store a 32 MB SQLite file in the project as the initial database for the application. On Android it is simple: I can easily store this large file in the assets folder. But I am having a hard time making this work on BlackBerry, because as per RIM's announcement the app size should not be larger than 15 MB.
http://forums.crackberry.com/blackberry-os-apps-f35/rim-explains-app-memory-limit-637544/
Can anybody help me if there is any other trick to make this happen?
Downloading a 32 MB file during application startup is not a good idea; it will take a long time before the user gets into the application.
Many thanks in advance.
Any help would be highly appreciated.
I was thinking about this yesterday after work, and I think you could also try including the same data the DB contains, but in a compressed format, so that the first time the app is run you create and populate the larger local SQLite DB from it, then delete the compressed data.
For instance, you could create a CSV dump of each table and save it to a .csv file (i.e. table.csv), then gzip each one to a file (i.e. table.gz). You then compile every compressed file into a second resource module for your app (i.e. res_module.cod). When you install the app, this module is also installed and it copies its files to the SD card. Then the main module reads the files from the SD card, ungzipping them and populating each table from them. After that, you can programmatically uninstall the assets module and delete the temporary SD card files. The requirement for this to work is that the compressed files should be smaller than 14 MB to fit in one .cod file.
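In standard Java the ungzip-and-load step looks roughly like the sketch below; BlackBerry has its own GZIP and database classes, so treat this as a sketch of the structure rather than drop-in BlackBerry code (the file path, comma split and RowSink callback are assumptions):

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.zip.GZIPInputStream;

    public class GzCsvLoader {
        /** Callback that the platform-specific code implements to insert one parsed row. */
        public interface RowSink {
            void insert(String[] columns);
        }

        // Stream a gzipped CSV dump (e.g. table.gz on the SD card) and hand each row to the sink.
        public static void load(String path, RowSink sink) throws IOException {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(new FileInputStream(path)), "UTF-8"));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.insert(line.split(","));   // naive split; real data may need CSV quoting rules
                }
            } finally {
                reader.close();
            }
        }
    }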
As for the DB schema, maybe a format for exporting it already exists, otherwise you could make your own format, or if the DB is unlikely to change, then hardcode it (bad practice).
I'm building a dictionary application and I have a problem right now. My application is 16 MB, and when I install it on a phone the database files are copied to the data folder, so in the Manage Apps section I see that my application's size is 32 MB (my app + data folder).
I don't want to cheat the user: I say my app is 16 MB, but when the user installs it, it becomes 32 MB. Why? This is a negative point and I want to solve it. I want my app to use only 16 MB on the user's phone, just that.
How can I fix this? Do I have to read and write in the assets folder directly, or is there another solution? This is a problem on phones with little storage.
I am not sure how your database is set up: is it a pre-loaded database where you just include your .db file with all the data, OR do you ship all your DB content with the app and then, at installation time, actually insert all that data into the DB?
In the latter case you double the size of your app, because you already have the data content (in files) which you want to use to populate your database (say 16 MB in this case), and then you use these files to actually create your DB file (which is 16 MB again). This doubles the size of the app.
So what you could do is pre-populate your DB content into a .db file and then just use this file directly as the DB file in your app (this will keep it to 16 MB). Follow this tutorial:
http://www.reigndesign.com/blog/using-your-own-sqlite-database-in-android-applications/
Hope this helps.
Not sure I fully understand your situation.
Do you have a roughly 16 MB dictionary that is packaged inside your app as string constants in your code or some resource file (making the app 16 MB), and then, when your app installs or first launches, you also write this dictionary into your app's database?
If so, you now have two copies of your dictionary around, making it 32 MB.
To solve this, either keep only one copy in your app, or download the dictionary from somewhere to get it into your database rather than storing it as a constant in your app.