I have a simple data file I want to store. I don't need any indexes or queries performed on it, so I can put it in Cloud Storage. BUT, the latency of fetching the file is very important. What is the latency I can expect when fetching a file from Cloud Storage vs. the latency in fetching an entity from the Datastore?
I could not find a good reference for this issue...
You shouldn't expect a specific latency, as it will vary depending on a large number of things. If the file is that latency-sensitive, just package it with the application when you distribute it, if that's possible.
If the file fits within the Datastore entity size limit (1 MB), then storing it there makes sense.
I have seen lower latency on Datastore retrieval than on GCS (again, this depends heavily on the size of the object).
Another advantage of using Datastore is the NDB Python interface, which will transparently cache the entity in memcache.
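For completeness, here's a minimal Java sketch of the same idea using the App Engine low-level Datastore and Memcache APIs (the answer above refers to Python's NDB, which layers this caching in transparently; the entity kind and property names below are just placeholders):

```java
import com.google.appengine.api.datastore.Blob;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class FileEntityStore {
    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    private final MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

    /** Store a small file (under the ~1 MB entity limit) as a Blob property on a Datastore entity. */
    public void put(String name, byte[] contents) {
        Entity entity = new Entity("StoredFile", name);
        entity.setUnindexedProperty("contents", new Blob(contents));
        datastore.put(entity);
        memcache.put(name, contents); // keep a hot copy in memcache
    }

    /** Fetch the file, checking memcache first to cut read latency. */
    public byte[] get(String name) throws EntityNotFoundException {
        byte[] cached = (byte[]) memcache.get(name);
        if (cached != null) {
            return cached;
        }
        Key key = KeyFactory.createKey("StoredFile", name);
        Entity entity = datastore.get(key);
        byte[] contents = ((Blob) entity.getProperty("contents")).getBytes();
        memcache.put(name, contents);
        return contents;
    }
}
```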
I'm reading the file streams of a certain group of files and storing them in a database as the bytea type. But when I try to read the streams back from the database and write them into a file, it takes a very long time and eventually I get an out-of-memory exception. Is there any other alternative where this can be done more efficiently, with or without a database involved?
Databases were designed with a key problem in mind: when we have a bunch of data and don't know what kinds of reports will be generated from it, how can we store the data in a manner that preserves its inner relationships and permits any reporting format we can think of?
Files lack a few key characteristics of databases. A file has only a single structure: characters in order. Files also lack any means of integrated report building, and reporting against them is often confined to simple searches, whose results have little context unless shown within the rest of the file.
In short, if you aren't using the database's features, please don't use the database.
Many people do store files in databases because they have one handy, and instead of writing support for filesystem storage they cut and paste the database storage code. Let's explore the consequences:
Backups and restores become problematic because the database grows in size very quickly, and the bandwidth to do the backup and restore is a function of the size of the database.
Replication rebuilds in fail-safe databases take longer (I've seen some go so long that redundancy couldn't catch up to the rate of change in the primary database).
Queries that (accidentally) reference the files in bulk spike the CPU, possibly starving access to the rest of the system (depends on database).
The bandwidth used to return the results of those queries steals system resources, preventing other queries from returning their results (better on some databases, worse on others).
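If the files do move out of the database, the usual alternative is to stream them straight to the filesystem and keep only a path (plus whatever metadata you need) in a table row. A minimal sketch, assuming a base directory the application controls (the FileStore class and its method names are made up for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FileStore {
    private final Path baseDir;

    public FileStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Stream the upload straight to disk; only the relative path goes into the database row. */
    public String save(String name, InputStream in) throws IOException {
        Path target = baseDir.resolve(name);
        Files.createDirectories(target.getParent());
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return baseDir.relativize(target).toString();
    }

    /** Stream it back out without buffering the whole file in memory. */
    public InputStream open(String relativePath) throws IOException {
        return Files.newInputStream(baseDir.resolve(relativePath));
    }
}
```

Because nothing is ever buffered whole, this avoids the out-of-memory problem described in the question.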
I am working on a Spring-MVC application in which we are seeing the database grow big. The space is mostly consumed by chat message history, plus other things like old notifications, which are not that useful.
Because of this we thought of moving that data out to text/XML files to give the DB some room to breathe and thereby improve query performance. Indexes are not that useful because there are too many insertions.
I wanted to know whether PostgreSQL or Hibernate has support for such a task, where data is pulled out of the DB and saved in plain files that can still be accessed, resulting in at least some performance gains.
I have only started looking up some stuff, so I don't have much in hand to show. Kindly let me know if there are any questions you guys have.
Thanks a lot.
I would use PostgreSQL's JSON storage and have two databases:
the current operations DB, the one you are moving data out of to slim it down
the archive database, where old data is aggregated to save storage
This way you can move data from the current database into the archive database without compromising the ACID properties, and you can aggregate the old data to simplify retrieval, grouping related entities under some common root entity, which you'll then use to access your old data.
The current operations database stays small, while the archive database can be sharded. This makes it easier to tune the current operations database for high performance and the archive one for scalability.
Anyway, Hibernate doesn't support this out of the box, but you can implement it using custom Hibernate types and JTA transactions.
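As a rough illustration of the archiving step, here's a sketch that aggregates old chat messages per conversation into a single JSON document (using PostgreSQL's row_to_json/json_agg) and copies them into the archive database. The table and column names are invented for the example, and a production version would wrap the copy and the delete in a single distributed (JTA) transaction as suggested above:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class ChatArchiver {

    /**
     * Aggregates messages older than the cutoff into one JSON document per
     * conversation, copies them into the archive database, then deletes them
     * from the operational database.
     */
    public void archiveOlderThan(Connection current, Connection archive, int days) throws SQLException {
        Timestamp cutoff = Timestamp.from(Instant.now().minus(days, ChronoUnit.DAYS));

        String select =
            "SELECT conversation_id, json_agg(row_to_json(m)) AS messages " +
            "FROM chat_message m WHERE m.sent_at < ? GROUP BY conversation_id";
        String insert =
            "INSERT INTO archived_conversation (conversation_id, messages) VALUES (?, CAST(? AS json))";
        String delete = "DELETE FROM chat_message WHERE sent_at < ?";

        try (PreparedStatement sel = current.prepareStatement(select)) {
            sel.setTimestamp(1, cutoff);
            try (ResultSet rs = sel.executeQuery();
                 PreparedStatement ins = archive.prepareStatement(insert)) {
                while (rs.next()) {
                    ins.setLong(1, rs.getLong("conversation_id"));
                    ins.setString(2, rs.getString("messages"));
                    ins.addBatch();
                }
                ins.executeBatch();
            }
        }
        try (PreparedStatement del = current.prepareStatement(delete)) {
            del.setTimestamp(1, cutoff);
            del.executeUpdate();
        }
    }
}
```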
We are working on a solution that crunches log files generated by systems and performs various analysis operations on these logs to come up with different views that can help triage issues, for example building a sequence of error messages that repeat across the logs.
Currently we load the log data into Java collections and do all operations by iterating/searching through these collections, which hurts performance. We are thinking of instead loading the data into a database and running queries against it to get optimized search results. For this we are considering an in-memory DB, which should give better performance than a persistent store since disk reads/writes are minimized.
The amount of data to be analyzed at a time may go up to a few GB (2-4 GB) and hence may exceed the RAM size of the machine.
Question:
What options can be considered for such an in-memory DB? Is GridGain a good option for this?
Most of our solutions will be deployed on a single node, so distributed capabilities are not a priority. What other in-memory DBs can be recommended for this purpose?
You could try a column-store in-memory database. They can usually achieve a better compression ratio than row-store databases and are designed for analytical tasks. Examples are MonetDB (open source), Vertica, InfiniDB, and so on.
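As a sketch of what the analysis side could look like, here's a plain JDBC query against MonetDB. The driver class and connection URL come from the MonetDB JDBC driver (double-check them against your driver version), and the log_entry table with its columns is made up for the example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LogQueryExample {
    public static void main(String[] args) throws Exception {
        // MonetDB speaks standard JDBC; credentials here are placeholders.
        Class.forName("nl.cwi.monetdb.jdbc.MonetDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:monetdb://localhost:50000/logs", "monetdb", "monetdb")) {

            // A typical triage query: which error messages repeat most often?
            String sql =
                "SELECT message, COUNT(*) AS occurrences " +
                "FROM log_entry WHERE level = 'ERROR' " +
                "GROUP BY message ORDER BY occurrences DESC LIMIT 20";

            try (PreparedStatement ps = conn.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%6d  %s%n",
                            rs.getLong("occurrences"), rs.getString("message"));
                }
            }
        }
    }
}
```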
I will be building an app that pulls down JSON objects from a web service, in the low hundreds, each relatively small, say 20 kB.
The app won't be doing much more than displaying these POJOs, downloading new and updated ones when available, and deleting out-of-date ones. What would be the preferred method for persistent storage of these objects? I guess the two main contenders are storing them in a SQLite DB, maybe using ORMLite to cut down on the overhead, or just serializing the objects to disk, probably in one large file, and using a very fast JSON parser.
Any ideas what would be the preferred method?
You could consider using CouchDB as a cache between the mobile client and your web service.
CouchDB would have to run as a service on the internet, caching the objects from the web service. On the client you can use TouchDB-Android: https://github.com/couchbaselabs/TouchDB-iOS/wiki/Why-TouchDB%3F . TouchDB-Android can synchronize automatically with the CouchDB instance running on the internet. The application itself would then access TouchDB exclusively. TouchDB automatically detects whether or not there's an internet connection, so your application keeps running even without internet.
Advantages:
- Caching of JSON calls
- Client keeps working when the internet connection is down and synchronizes automatically when the connection comes back up.
- Takes load off your web service, and you can scale.
We used this setup before to allow Android software to work seamlessly, even when the internet connection would drop frequently and the service we accessed data from was quite slow and had limited capacity.
A DBMS such as SQLite comes with querying, indexing, and sorting capabilities (and other standard SQL DBMS features); you should consider whether you need any of these. How many objects are you planning to have in the production environment? If, say, a million, the disk-serialization approach might not scale.
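If the SQLite route wins, here's a rough sketch of what an ORMLite-mapped POJO and an indexed query could look like (the class, table, and field names are hypothetical):

```java
import com.j256.ormlite.dao.Dao;
import com.j256.ormlite.field.DatabaseField;
import com.j256.ormlite.table.DatabaseTable;
import java.sql.SQLException;
import java.util.List;

// Hypothetical POJO mirroring one of the downloaded JSON objects.
@DatabaseTable(tableName = "articles")
public class Article {
    @DatabaseField(id = true)
    String id;                       // server-side identifier from the web service

    @DatabaseField(index = true)
    String category;                 // indexed so category filters don't scan the table

    @DatabaseField
    long updatedAt;                  // used to find and delete out-of-date copies

    @DatabaseField
    String rawJson;                  // the ~20 kB payload itself

    Article() { }                    // ORMLite needs a no-arg constructor

    /** Example of the kind of query that plain serialize-to-disk storage can't do cheaply. */
    static List<Article> latestIn(Dao<Article, String> dao, String category) throws SQLException {
        return dao.queryBuilder()
                  .orderBy("updatedAt", false)   // newest first
                  .where().eq("category", category)
                  .query();
    }
}
```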
I'm looking for a cleaner way to horizontally scale my Java app with minimal impact on sources and infrastructure. Here is my problem: I'm currently saving resources on the local file system, so I need to consistently share these files among all my new processing nodes.
I know about Ehcache and the Terracotta Server Array, but localRestartable (guaranteed persistence) is only available in Ehcache Enterprise, and I want to stay as far away from commercial licenses as possible.
Other alternatives could be memcached, Redis, MongoDB (with persistence in mind), even NFS, but I want the opinion of those who have experience using these services as storage services. I also need to clarify: requirements prevent us from using any online cloud storage service, although I'm open to any alternative that can be installed in my data center, of course!
With MongoDB you can take advantage of:
replica sets to distribute data to multiple servers
sharding to scale writes (if appropriate for the volume of data and writes you need to manage)
You have a few options for storing your binary files in MongoDB:
1) You could save the files as binary data within a field in a MongoDB document. The current document size limit (as of MongoDB 2.2) is 16 MB, which seems more than adequate for your ~1 MB files.
2) You can use the GridFS API to conveniently work with larger documents or fetch binary files in smaller chunks (see also: Java docs for the GridFS class).
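For option 2, here's a short sketch using the GridFS class from the legacy MongoDB Java driver (the database name, bucket name, and file paths are placeholders):

```java
import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class GridFSExample {
    public static void main(String[] args) throws IOException {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("filestore");           // database name is a placeholder
        GridFS gridFs = new GridFS(db, "resources"); // "resources" is the GridFS bucket name

        // Store a local file; GridFS splits it into chunks behind the scenes.
        GridFSInputFile stored = gridFs.createFile(new File("report.pdf"));
        stored.setContentType("application/pdf");
        stored.save();

        // Read it back, streaming it to disk without loading it fully into memory.
        GridFSDBFile found = gridFs.findOne("report.pdf");
        try (OutputStream out = new FileOutputStream("report-copy.pdf")) {
            found.writeTo(out);
        }

        client.close();
    }
}
```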