I am developing a photo album system and decided to use Redis. I keep the user's photo data (who has which photos) in Redis. For example, photos:1000:pid [1,24,525,12,42,62,56] means the user with id 1000 has the photos with those ids. The point that confuses me is: once I have [1,24,525,12,42,62,56], how can I get the photo details? I thought about using Redis to get the photo details as well. However, when a user has 150 photos, getting them one by one (from Java using Jedis in a loop) costs 100-150 msec, which is not suitable for my case. I have to handle high traffic; the response shouldn't take over 100 msec.
I decided to use the DB with a stored procedure, "one shot, get everything", given the photo ids (they are indexed). Is "get ids from Redis, get details from DB" a proper approach? What would you do in this situation?
I would not recommend using two different stores. Keep it simple, and think about the consistency of your data. If you are more familiar with a relational database, there is nothing wrong with using it for all your data.
Now, if you want to store everything in Redis, it is also possible, provided you can anticipate all access paths to your data.
With Redis, running several commands to get some data is quite efficient if you bundle these commands in the same roundtrip. The Redis server (and most clients) fully supports pipelining. Assuming you use Jedis, you can find some examples here.
Actually, there are multiple ways to solve your problem.
Let's suppose you have the following model:
photos:<userid> -> set of photo IDs for a given user ID
photo:<photoid> -> hash of photo properties for a given photo ID
If you are interested in retrieving specific photo properties (say name and size) for a given user (i.e. like a select name, size from ...), it can be done with a single SORT command.
SORT photos:<userid> by nosort get # get photo:*->name photo:*->size
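In Jedis, the same command can be expressed with SortingParams. A minimal sketch, assuming the key names from the model above:

    import java.util.List;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.SortingParams;

    public class PhotoSort {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost")) {
                // Equivalent of: SORT photos:1000 BY nosort GET # GET photo:*->name GET photo:*->size
                List<String> flat = jedis.sort("photos:1000",
                        new SortingParams().nosort().get("#", "photo:*->name", "photo:*->size"));
                // flat comes back as: id1, name1, size1, id2, name2, size2, ...
            }
        }
    }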
If you are interested in retrieving all the photo properties for a given user (i.e. like a select * from ...), it is a bit more complex.
One solution is to use pipelining and perform two roundtrips:
first roundtrip to get the set of photo IDs (using SMEMBERS)
second roundtrip to pipeline all the HGETALL commands (one per photo)
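A minimal Jedis sketch of this two-roundtrip approach, assuming the model above (error handling omitted):

    import java.util.*;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Pipeline;
    import redis.clients.jedis.Response;

    public class PhotoFetcher {
        public static List<Map<String, String>> fetchPhotos(Jedis jedis, long userId) {
            // Roundtrip 1: fetch the set of photo IDs for the user
            Set<String> ids = jedis.smembers("photos:" + userId);

            // Roundtrip 2: pipeline one HGETALL per photo
            Pipeline p = jedis.pipelined();
            List<Response<Map<String, String>>> replies = new ArrayList<>();
            for (String id : ids) {
                replies.add(p.hgetAll("photo:" + id));
            }
            p.sync(); // flushes all queued commands in a single roundtrip

            List<Map<String, String>> photos = new ArrayList<>();
            for (Response<Map<String, String>> r : replies) {
                photos.add(r.get());
            }
            return photos;
        }
    }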
An alternative solution would be to use server-side Lua scripting to perform all the aggregation on server side. Complexity is higher, but the cost would be a single roundtrip.
Related
I am a Java developer and my application targets iOS and Android. I have created a web service for it using the Restlet framework, with JDBC for DB connectivity.
My problem: I have three types of data, called intersections (current + past + future), and each intersection contains a list of users as its data. There is a single web service that returns all the users in a given user's intersections to the device. I have implemented pagination, but the server has to process all of the user's intersections and only then return the (start-end) slice to the device. I did this because a past user may also appear in current. That is the whole logic.
But as the intersections in a profile grow, the server has to process every user, so it becomes slow, which is obvious. Also, the device calls this web service every 5 minutes.
Please suggest a better way to handle this scenario.
It's a little hard to follow your logic, but it sounds like you can probably benefit from caching your results on the server.
If it makes sense, every time you process the user's data on the server, save the results (to a file, to a database table, whatever). Then, 5 minutes later, if there are no changes, simply return the same results. If there were changes, retrieve the previous results from the cache (optionally invalidating it in the process), append the changes to what was cached, re-save the results to the cache, and return them.
If this is applicable to your workflow, your server-side processing time will be significantly less.
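A minimal sketch of that idea, assuming a hypothetical computeIntersections() that stands in for your existing processing logic:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class IntersectionCache {
        private final Map<Long, List<String>> cache = new ConcurrentHashMap<>();

        // 'changed' would come from your own change detection; if the underlying
        // data moved, drop the cached entry and recompute.
        public List<String> getIntersections(long userId, boolean changed) {
            if (changed) {
                cache.remove(userId);
            }
            return cache.computeIfAbsent(userId, this::computeIntersections);
        }

        private List<String> computeIntersections(long userId) {
            // stand-in for the current + past + future processing
            return Collections.emptyList();
        }
    }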
Which is the most efficient way to retrieve data for a search operation?
The requirement is as follows: the application needs a search-like feature over known variables (search keywords).
NB: the application already has the search keywords stored as keys in a data cache, held as objects maintained at the application level and used for purposes other than search.
There are two possibilities available to enable searching:
(1) perform pattern matching with java.util.regex.Pattern and then fetch the identified result rows from the cache, or
(2) ask the database to perform the match and retrieve the matching rows?
I need to know which is more efficient.
Any input, or data from simulations of a similar operation, would be appreciated.
Option 1 is preferable because it does not involve network I/O.
Pattern matching and looking up in local cache will most likely take nanoseconds or a few milliseconds while sending a request to the database over the wire and waiting for the response will take a few dozen (or a few hundred) milliseconds. It's irrelevant that the database possibly implements the actual data look-up a bit faster than your own code.
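For illustration, a sketch of option 1 using java.util.regex.Pattern over a hypothetical in-memory map of keyword -> row object:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Pattern;

    public class CacheSearcher {
        // Returns the cached rows whose keyword matches the search term.
        public static List<Object> search(Map<String, Object> cache, String term) {
            Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
            List<Object> hits = new ArrayList<>();
            for (Map.Entry<String, Object> e : cache.entrySet()) {
                if (p.matcher(e.getKey()).find()) {
                    hits.add(e.getValue());
                }
            }
            return hits;
        }
    }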
This became too big to put into a comment:
To simply answer your question: option 1 is preferable with what you describe, e.g. a local cache and a database accessible over the network.
I'd like to emphasize "local" cache. If we're talking about a distributed cache, you incur the network penalty, and then the answer would be "we need more information". Factors to consider are the average size of a row, median network latency, read and write probability, and so on. Answering this is a real pain.
When I face such a decision, I usually go through the following steps to decide what to use. The main metric here is simplicity, i.e. I'm looking for the simplest possible solution to save my time while still having a responsive site.
When starting, I try with no cache.
If that doesn't suffice and I still have one app server, I implement a local cache.
When I need to scale out by adding more app servers (behind a load balancer), I try with no caches again (relying on the DB cache).
Only if that hits a performance limit do I implement a distributed cache, attaching Redis or memcached instances as needed (probably keeping a small cache on the individual app servers).
I have 3000 records in an employee table, fetched from my database with a single query. I can show 20 records per page, so there will be 150 pages, each showing 20 records. I have two questions about pagination and the sortable-column approach:
1) If I implement simple pagination without sortable columns, should I send all 3000 records to the client and do the pagination client-side using JavaScript or jQuery? Then if the user clicks the second page, the call will not go to the server, which is faster. Though I am not sure what the impact of sending 3000 or more records to the browser/client would be. So which is the better approach: send all the records to the client in a single go and do the sorting there, or on each page click send a call to the server and return just that page's results?
2) In this scenario, I also need to provide pagination with sortable columns (6 columns). The user can click any column, like employee name or department name, and the names should be arranged in ascending or descending order. Again, which approach is best in terms of response time/memory?
Sending data to your client is almost certainly going to be your bottleneck (especially for mobile clients), so you should always strive to send as little data as possible. With that said, it is almost definitely better to do your pagination on the server side. It is a much more scalable solution: the amount of data is likely to grow, so doing the pagination on the server is the safer bet for the future.
Also, remember that it is fairly unlikely that any user will actually bother looking through hundreds of result pages, so transferring all the data is likely wasteful as well. This may be a relevant read for you.
I assume you have a bean class representing records in this table, with instances loaded from whatever ORM you have in place.
If you haven't already, you should implement caching of these beans in your application. This can be done locally, perhaps using Guava's CacheBuilder, or remotely using calls to Memcached for example (the latter would be necessary for multiple app servers/load balancing). The cache for these beans should be keyed on a unique id, most likely the mapping to the primary key column of the corresponding table.
Getting to the pagination: simply write your queries to return only IDs of the selected records. Include LIMIT and OFFSET or your DB language's equivalent to paginate. The caller of the query can also filter/sort at will using WHERE, ORDER BY etc.
Back in the Java layer, iterate through the resulting IDs of these queries and build your List of beans by calling the cache. If the cache misses, it will call your ORM to individually query and load that bean. Once the List is built, it can be processed/serialized and sent to the UI.
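A sketch of that flow using Guava's CacheBuilder; EmployeeDao, Employee, and their methods are hypothetical stand-ins for your ORM layer:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.ArrayList;
    import java.util.List;

    public class EmployeePageLoader {
        private final EmployeeDao dao; // hypothetical ORM wrapper

        private final LoadingCache<Long, Employee> cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .build(new CacheLoader<Long, Employee>() {
                    @Override
                    public Employee load(Long id) {
                        return dao.findById(id); // ORM query on cache miss
                    }
                });

        public EmployeePageLoader(EmployeeDao dao) {
            this.dao = dao;
        }

        public List<Employee> loadPage(int page, int size, String sortColumn, boolean asc) {
            // The query returns only IDs, paginated and sorted in the database
            List<Long> ids = dao.findIds(page * size, size, sortColumn, asc);
            List<Employee> beans = new ArrayList<>();
            for (Long id : ids) {
                beans.add(cache.getUnchecked(id)); // hits the cache, falls back to the ORM
            }
            return beans;
        }
    }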
I know this doesn't directly answer client- vs. server-side pagination, but I would recommend using DataTables.net to both display and paginate your data. It provides a very nice display, allows sorting and pagination, has a built-in search function, and a lot more. The first time I used it was for the first web project I worked on, and as a complete newbie I was able to get it to work. The forums also provide very good information/help, and the creator will answer your questions.
DataTables can be used both client-side and server-side, and can support thousands of rows.
As for speed, I only had a few hundred rows, but used the client-side processing and never noticed a delay.
USE SERVER PAGINATION!
Sure, you could probably get away with sending down a JSON array of 3000 elements and using JavaScript to page/sort on the client. But a good web programmer should know how to page and sort records on the server (they should really know a couple of ways). So think of it as good practice :)
If you want a slick user interface, consider using a JavaScript grid component that uses AJAX to fetch data. Typically, these components pass back the following parameters (or some variant of them):
Start Record Index
Number of Records to Return
Sort Column
Sort Direction
Columns to Fetch (sometimes)
It is up to the developer to implement a handler or interface that returns a result set based on these input parameters.
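A hedged sketch of such a handler in Java with JDBC; the table and column names are illustrative, and LIMIT/OFFSET syntax varies by database:

    import java.sql.*;
    import java.util.*;

    public class EmployeePageHandler {
        // Whitelist sortable columns so user input never reaches the SQL string directly.
        private static final Set<String> SORTABLE =
                new HashSet<>(Arrays.asList("name", "department", "hire_date"));

        public List<Map<String, Object>> fetchPage(Connection conn, int start, int count,
                                                   String sortColumn, boolean asc) throws SQLException {
            if (!SORTABLE.contains(sortColumn)) {
                sortColumn = "name"; // safe default
            }
            String sql = "SELECT * FROM employee ORDER BY " + sortColumn
                    + (asc ? " ASC" : " DESC") + " LIMIT ? OFFSET ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, count);
                ps.setInt(2, start);
                try (ResultSet rs = ps.executeQuery()) {
                    List<Map<String, Object>> rows = new ArrayList<>();
                    ResultSetMetaData md = rs.getMetaData();
                    while (rs.next()) {
                        Map<String, Object> row = new LinkedHashMap<>();
                        for (int i = 1; i <= md.getColumnCount(); i++) {
                            row.put(md.getColumnLabel(i), rs.getObject(i));
                        }
                        rows.add(row);
                    }
                    return rows;
                }
            }
        }
    }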
I have a database with a lot of web pages stored.
I need to process all the data I have, so I have two options: pull the data into the program, or process it directly in the database with some functions I will create.
What I want to know is:
Is doing some of the processing in the database, rather than in the application, a good idea?
When is this recommended, and when not?
What are the pros and cons?
Is it possible to extend the language with new features (external APIs/libraries)?
I tried retrieving the content into the application (it worked), but it was too slow and dirty. My concern was that I can't do in the database what I can do in Java, but I don't know if this is true.
Just one example: I have a table called Token. At the moment it has 180,000 rows, but this will grow to over 10 million. I need to do some processing to determine whether a word between two tokens classified as 'Proper Name' is part of a name or not.
I will need to process all the data. In this case, is doing it directly in the database better than retrieving it into the application?
My concern was that I can't do in the database what I can do in Java, but I don't know if this is true.
No, that is not a correct assumption. There are valid circumstances for using the database to process data. For example, if it involves a lot of disparate SQLs that can be combined into a stored procedure, then you should do the processing in the stored procedure and call the stored proc from your Java application. This way you avoid making several network trips to the database server.
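For example, a hedged sketch of calling such a stored procedure once from Java via JDBC (process_tokens and the connection details are hypothetical):

    import java.sql.*;

    public class StoredProcCall {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://localhost/mydb"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "user", "pass");
                 CallableStatement cs = conn.prepareCall("{call process_tokens(?)}")) {
                cs.setInt(1, 1000); // e.g. a batch-size parameter
                cs.execute();       // one roundtrip instead of many individual SQLs
            }
        }
    }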
I do not know what you are processing, though. Are you parsing XML data stored in your database? Then perhaps you should use XQuery; many modern databases support it.
Just one example: I have a table called Token. At the moment it has 180,000 rows, but this will grow to over 10 million. I need to do some processing to determine whether a word between two tokens classified as 'Proper Name' is part of a name or not.
Is there some indicator in the data that tells you it's a proper name? Fetching 10 million rows (highly susceptible to an OutOfMemoryError) and then going through them is not a good idea. If there are parameters of the data that can be put in a WHERE clause to limit how many rows are fetched, that is the way to go, in my opinion. You will certainly need to run EXPLAIN on your SQL and check that the correct indices are in place, along with the index cluster ratio and the type of index; all of that makes a difference. If you can't fully eliminate all "improper names" in SQL, get rid of as many as you can there and process the rest in your application. I am assuming this is a batch application, right? If it is a web application, then you definitely want a batch application to stage the data before the web application queries it.
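A sketch of that idea: filter in SQL and stream the remainder in batches rather than materializing everything (the column names are hypothetical):

    import java.sql.*;

    public class TokenScanner {
        public static void scan(Connection conn) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, word FROM token WHERE class = ?")) {
                ps.setString(1, "PROPER_NAME"); // narrow the candidate set in the WHERE clause
                ps.setFetchSize(1000);          // stream in batches to avoid OutOfMemoryError
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process each remaining candidate in the application
                    }
                }
            }
        }
    }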
I hope my explanation makes sense. Please let me know if you have questions.
Directly interacting with the DB for every single operation is tedious and hurts performance. There are several ways around this: you can use indexing, caching, or tools such as Hibernate, which keeps data in memory so that you don't need to query the DB for every operation. There are also tools such as the Lucene indexer, which are very popular and could solve your problem of hitting the DB every time.
Is it advisable to store some information (metadata) about a piece of content in the ID (or key) of that content?
In other words, I am using time-based UUIDs as the IDs (or keys) for content stored in the database. My application first fetches the list of all such IDs (from the database) and then accesses the corresponding content (from the database). My idea is to store some extra information about the content in the IDs themselves, so that my software can access this meta-content without fetching the entire content from the database again.
My application context is a website using Java technology and Cassandra database.
So my questions are:
Should I do this? I am concerned that a lot of processing may be required (at the time of presenting data to the user) to extract the metadata from the IDs of the content. It may therefore be better to retrieve it from the database than to derive it by processing the ID.
If it is advisable, how should I implement it efficiently? I was thinking of the following:
Id of a content = 'Timebased UUID' + 'UserId'
where 'TimeUUID' is the ID generated from the timestamp when the content was added, and 'UserId' is the ID of the user who added it.
So my example ID would look something like this: e4c0b9c0-a633-15a0-ac78-001b38952a49 (TimeUUID) -- ff7405dacd2b (UserId)
How should I extract this UserId from the content ID in the most efficient manner?
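Presumably the extraction itself could be a simple substring split, assuming a fixed separator; a hypothetical sketch:

    // hypothetical composite key with a fixed "--" separator
    String contentId = "e4c0b9c0-a633-15a0-ac78-001b38952a49--ff7405dacd2b";
    int sep = contentId.lastIndexOf("--");
    String timeUuid = contentId.substring(0, sep); // the time-based UUID part
    String userId = contentId.substring(sep + 2);  // the user ID suffix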
Is there a better approach to storing meta-information in the IDs?
I hate to say it since you seem to have put a lot of thought into this, but I would say it is not advisable. Storing data like this sounds like a good idea at first but ends up causing problems, because you will run into many unexpected issues when reading and saving the data. It's best to keep separate data as separate variables and columns.
If you are really interested in accessing meta-content without the main content, I would make two column families: one holds the meta-content, the other the larger main content, and both share the same ID key. I don't know much about Cassandra, but this seems to be the recommended way to do this sort of thing.
I should note that I don't think all this will be necessary. Unless users are storing very large amounts of information, the sizes should be trivial and your retrievals should remain quick.
I agree with AmaDaden. Mixing IDs and data is the first step on a path that leads to a world of suffering. In particular, you will eventually hit a situation where the business logic requires the data part to change while the database logic requires the ID not to change. Off the cuff, in your example, there might suddenly be a requirement to merge two accounts into a single user ID. If the user ID is just data, this is a trivial update. If it's part of the ID, you need to find and update every reference to that ID.