Ways to handle huge XML/JSON files - java

I'm looking to create an Android application (although the problem will be the same for iOS) which will function pretty much as a webshop.
It will contain a lot of products, which can be accessed in any way we want, since that part still has to be built.
The problem is, we created a plain text file to test the size, and it turns out that even a selection of the products, with no structure (XML, JSON...), is already 300 MB.
Once we add a structure, this will logically only add more overhead and increase the size.
Like I said, pretty much anything is possible when it comes to receiving the data.
They can build an API to fetch products one at a time when needed, or one big file to parse in a background process...
However, one of the wishes is to work offline as much as possible. That would normally mean saving all the data into a database on the phone, but if this results in 300 MB on your SD card, that's no good.
To sum up what I exactly want to know:
Are there any other ways to handle big data like this, without having to keep a constant internet connection or download 300 MB onto someone's phone?
Some kind of compression, a special way to save it in the database... any ideas are welcome.

Related

Passing images to Java

I am working on a project where I have extracted images from a sensor and saved them to an operating system directory. I have a Java API for uploading images to the server.
I need to upload these images and some other data, typically of float type, to the main server.
I need to decide on an intermediary, such as a database where I store those images and connect through Java to upload them, or HDFS.
Can somebody please advise me which option will be best for storing images: a database or HDFS?
Note: there are up to 150 thousand images, and it can be more in the future.
I think the best way to do that is to keep the floats you need and the metadata of the images in the database, for easier searching and querying and easier interaction with Java. The actual images are best stored on a file system, to avoid transforming them to and from the database. I believe a simple file system would be good enough for that number of images. You probably won't use any of the fancy HDFS features like MapReduce and the like, but that's up to you.
If a standard file system isn't good enough for you and you want something bigger, then HDFS is the way to go. So the proper way would be a mixture of the two; a sketch of that hybrid follows.
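A minimal sketch of the hybrid approach, assuming plain JDBC against a PostgreSQL database and an images table with path and sensor_value columns (the connection URL, table, and column names are placeholders, not anything from your setup):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hybrid storage sketch: the image file itself stays on disk, and only
// its path plus the associated sensor float goes into the database.
public class ImageMetadataStore {

    public static void save(String imagePath, float sensorValue) throws Exception {
        // Placeholder JDBC URL; point this at your actual database.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/sensordb");
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO images (path, sensor_value) VALUES (?, ?)")) {
            stmt.setString(1, imagePath);
            stmt.setFloat(2, sensorValue);
            stmt.executeUpdate();
        }
    }
}
```

Queries like "which images match these sensor values" then run against the small metadata table, and only the matching files are ever read from disk.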
It totally depends on the use case; you can choose:
HDFS: when you want to read the images as a whole, transfer them, or process the image data and store or act on the processed results; in short, when you want to do MapReduce-style operations. Reading from HDFS is sequential, so if you want to fetch a particular image based on some selection criterion, that is a costly operation with a real performance impact.
Database: better for query-based operations, where you want to query or run DML operations on images against certain criteria; in short, WHERE conditions. But it is a time-consuming approach when you want to process the data as one big chunk, and performance will obviously be very slow when you store 150 thousand images.
So my suggestion, given that your requirement is to store the images as an intermediate step, is that it is better to store them in HDFS itself.
150,000 images is not considered a huge amount today. If an average of 10 MB is assumed for each image (uncompressed), the total amount of data is 1.5 TB, which should be possible to store in an off-the-shelf database (on off-the-shelf hardware, i.e. a Linux box with some RAID disks) like PostgreSQL. I'm no expert in HDFS, but having tried products in the same family I find them easy to use, so you could try Hadoop for processing the images as well if you are looking for a way to parallelize the work. Even though this product family is nice, I would still use a standard database like PostgreSQL if parallelisation is not really needed by nature (the way it is in HDFS).

Reduce data usage in an application that supports offline mode

I am working on an application which has an offline mode. To support it, we store the information in a local SQLite database, using a Content Provider which provides a wrapper around SQLite, and sync it every once in a while with the data from the web service.
We also keep the images taken by the user on the SD card and send them to the server during the sync service.
The problem is bandwidth and data usage. Android 4.0+ has a section in the device settings named Data usage. It shows too much data usage for our app, and it annoys the users.
My first question is: do you think using ProGuard, which is a tool to shrink the code, can have any impact on reducing data usage?
I would appreciate it if you shared any experience and suggestions with me for reducing data usage in such an app.
Addenda:
1 - The user logs in to the system, and during the first sync an SQLite file is generated and transferred from the REST service (initialization).
2 - We have a sync-status flag for entries in the database. If a record (a JSON string of the data) or a picture is not synced, it is transferred to the REST service during sync and the status flag gets updated.
3 - An updated database file is received from the REST service and merged with the current database on the phone in the sync service (if initialization is already done).
ProGuard has nothing to do with the amount of data you send to or receive from a server. ProGuard can shrink and obfuscate code (thus making your APK smaller).
You need to analyze the data you send and receive. There is no silver bullet here that will magically solve any bandwidth issues you may come across in an app. You need to ask yourself several questions and take action depending on your answers:
What kind of numbers are we talking about?
In 2011 the average bandwidth use of an app was around 10 MB per hour. There are probably more recent surveys if you search a bit. Are you far above that average? If not, then I don't think you have to worry too much.
How often do you send and receive data?
If it's a real-time app that absolutely requires live data, then there's little you can do. If it's not, maybe you can reduce the frequency of sends and receives, or wait and collect more data before sending it to reduce overhead? If you're sending many small chunks of data, you'll pay a lot of overhead in HTTP headers and so on. Hold on to the small chunks a while longer and send them in one go to improve the data-to-overhead ratio; a sketch of this follows.
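A minimal, hypothetical sketch of that batching idea in Java; the BATCH_SIZE value and the send() method are placeholders for your own threshold and upload call:

```java
import java.util.ArrayList;
import java.util.List;

// Collect small JSON items and upload them as one combined array, so the
// HTTP header overhead is paid once per batch rather than once per item.
public class UploadBatcher {

    private static final int BATCH_SIZE = 20; // placeholder threshold
    private final List<String> pending = new ArrayList<>();

    public synchronized void enqueue(String jsonItem) {
        pending.add(jsonItem);
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Combine the queued items into a single JSON array payload.
        String payload = "[" + String.join(",", pending) + "]";
        send(payload);
        pending.clear();
    }

    private void send(String payload) {
        // Placeholder: POST the payload with your existing HTTP client.
    }
}
```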
Can you change the protocol?
Maybe you can send data over a socket instead of HTTP to reduce overhead? By your description it doesn't sound like this would work in your case.
Can you compress data before sending it?
Make sure that your server gzips data before sending it to the client; there is a lot to gain by doing this. A client-side sketch follows.
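A minimal sketch of requesting and decompressing a gzipped response with plain HttpURLConnection (the URL is whatever endpoint you already call; nothing here is specific to your API):

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipFetch {

    public static String fetch(String urlString) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        // Advertise gzip support; a correctly configured server will compress.
        conn.setRequestProperty("Accept-Encoding", "gzip");

        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in); // transparently decompress
        }

        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }
}
```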
Can you use another data format (binary, JSON, XML, custom)?
You mention that you use JSON. JSON usually performs better than XML, so you're already in good shape there, but maybe you can send the data in another format that is even more compact?

Android GTFS app

I'm trying to work on an app which uses GTFS. This may seem like a stupid question, but I couldn't find any answer to it.
The GTFS feed for Israel, a rather small country with a modest bus infrastructure, is a zipped file of around 120 MB.
Right now the only way I can think of to get it working is to download the file, but downloading 120 MB on a phone could take quite a long time. Sure, you can do this only once and save it in a database on the phone, but it still requires downloading 120 MB.
Since it is zipped, I can't unzip it on the server and then just get the txt files...
So basically I'm asking: how can I get the information onto the phone without downloading the zipped file?
I've seen and used apps which use that same GTFS file, and they load up really fast, even on the first launch...
I hope you understand my issue; I'm not sure how to explain it better.
Thanks!
P.S. I would make an iPhone app too, and it's the same issue, hence the iPhone tag.
One approach might be to preprocess the GTFS data during your app development. You could load it into a SQLite database and ship that database with the app, then use Core Data (on iOS) to get the data you need out of it at runtime. This also gives you an opportunity to include only the data that you actually need for your app; it doesn't make sense to ask users to download extra data that they won't need.
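On the Android side, a minimal sketch of shipping such a preprocessed SQLite database in the APK's assets and copying it into place on first launch; the asset name gtfs.db is a placeholder:

```java
import android.content.Context;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Copy a prebuilt SQLite database out of assets/ so SQLiteDatabase can
// open it from the app's normal database directory.
public class DatabaseInstaller {

    public static void installIfMissing(Context context) throws Exception {
        File target = context.getDatabasePath("gtfs.db"); // placeholder name
        if (target.exists()) {
            return; // already installed on a previous launch
        }
        target.getParentFile().mkdirs();
        try (InputStream in = context.getAssets().open("gtfs.db");
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

Because the database ships inside the APK, the user never downloads the feed separately, which would explain why the apps you have seen load fast even on first launch.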
Use the protocol buffer binary format (pbf), formerly Google-internal and now open source. It is compact and very fast to search, so there is no need to decompress it on the device and load it into a database there, because the pbf file itself acts as the database. Just include a pbf library in your code to query it. Of course, you have to compress the data once before distributing it online.

Recommended file / filetype for importing online data to SQLite database on Android device

I use PHP to access my database and generate an XML file online. My Android app then fetches that XML file, parses it, and inserts the data into a SQLite database.
This works just fine but is insanely slow. We have an iOS app and an Android app both doing the same thing; the Android app takes 7-10 seconds every time the user wants refreshed data, while the iOS app takes 2-3 seconds at most.
There aren't a lot of records, 30-50 on average, but there is a lot of content: some large articles, each with 2-10 photos (I'm not downloading the photos, just importing their URL, size, etc.).
I followed an example of how to use SAX to import my XML (supposedly the fastest way).
TL;DR:
Is there a better way I can format my data to make the import much quicker than it is now? CSV? Using PHP to generate SQLite INSERT statements? What is the "norm" and/or "best" format for this?
Edit:
The more I read, the more it sounds like the difference between JSON and XML is minuscule, and XML can even be faster for large data (like articles). Not sure this is correct, just details from further reading.
You should try using JSON instead of XML; I think it might be a lot faster to work with. It is supported on Android, and as far as I know iOS can handle it as well.
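A minimal sketch of parsing such a payload with Android's built-in org.json classes; the field names title, photos, and url are placeholders for whatever your PHP endpoint actually emits:

```java
import org.json.JSONArray;
import org.json.JSONObject;

// Walk a JSON array of articles and pull out the fields to insert into
// SQLite. Field names are hypothetical; match them to your PHP output.
public class ArticleParser {

    public static void parse(String json) throws Exception {
        JSONArray articles = new JSONArray(json);
        for (int i = 0; i < articles.length(); i++) {
            JSONObject article = articles.getJSONObject(i);
            String title = article.getString("title");
            JSONArray photos = article.getJSONArray("photos");
            for (int j = 0; j < photos.length(); j++) {
                String photoUrl = photos.getJSONObject(j).getString("url");
                // ... insert title and photoUrl into SQLite here
            }
        }
    }
}
```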
I used to create a SQLite db file and gzip it, then unzip it on the device and use it directly (not a good way, for sure).
For later data updates I used JSON to transfer the data. JSON can surely handle large articles, but if you prefer, you can just put URLs to the articles in the JSON and fetch them in subsequent transfers.
Instead of using XML or JSON, look into Google's Protobuf:
https://developers.google.com/protocol-buffers/docs/overview
http://code.google.com/p/protobuf/
Since you are on PHP, you will need to find an implementation that works for you; here is a list:
http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns
Going forward, this will be a very nice way to transfer and marshal data. Please let us know if this works for you.

Adding Many Saved Images in Android App

The application I'm trying to build will display a lot of images (in ImageViews), and I'm not fetching them from a server/online service because the app needs to work offline. I know I can just dump them in the res/drawable directories, but I was wondering if there's any way to optimize this. Is there a way to somehow compress these images (besides making them smaller; they're already as small as I need them) or use some other Android tool to better store them locally on the device?
I could just be overlooking a well-used feature, and if so, it'd be great if someone could point me to it.
Edit: if I were to compress the images somehow, I would need to decompress them at runtime or something, and that would take another thread/loading time. I'm not sure how to do that either, so I'm just brainstorming various ways, and I thought someone here would've come across this at some point.
If you haven't already, this is a good read - http://developer.android.com/guide/practices/ui_guidelines/icon_design.html#design-tips
When saving image assets, remove unnecessary metadata:
Although the Android SDK tools will automatically compress PNGs when packaging application resources into the application binary, a good practice is to remove unnecessary headers and metadata from your PNG assets. Tools such as OptiPNG or Pngcrush can ensure that this metadata is removed and that your image asset file sizes are optimized.
Outside of all other compression logic, the above is the place to start. Also, when you say "optimize", do you mean optimizing the way images/drawables are loaded in your app, or just the amount of space (on disk) the app will consume?
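Regarding the decompress-at-runtime idea from your edit: if you store compressed images (for example as JPEG or WebP files in assets/ rather than res/drawable), decoding one on demand is straightforward. A minimal sketch, where the asset name photo.webp and the sample size are placeholders:

```java
import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.widget.ImageView;
import java.io.InputStream;

// Decode a compressed image from assets/ at runtime, downsampling to
// reduce the decoded bitmap's memory footprint.
public class AssetImageLoader {

    public static void load(Context context, ImageView view) throws Exception {
        BitmapFactory.Options options = new BitmapFactory.Options();
        options.inSampleSize = 2; // placeholder: decode at half width/height
        try (InputStream in = context.getAssets().open("photo.webp")) {
            Bitmap bitmap = BitmapFactory.decodeStream(in, null, options);
            view.setImageBitmap(bitmap);
        }
    }
}
```

As your edit notes, decoding does cost time, so in a real app you would do this off the main thread and cache the resulting bitmaps.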
