I have a MySQL table that is not changed very often (about once a month) and contains information such as the active user accounts for a web service. How safe is it to do something like:
public class AccountDao
{
    private List<Account> accounts;
    /* fields */

    public AccountDao()
    {
        refreshAccounts();
    }

    public void refreshAccounts()
    {
        this.accounts = /* call to database to get list of accounts */;
    }

    public boolean isActiveAccount(String accountId)
    {
        // logic involving the in-memory list above
    }
}
I would do this because I have to check that a user has an active account on every request in order to allow access to the web service. This would let me avoid one SQL call to the database layer (which is stressed at the moment) on every request. My question is: how safe is it to store data like this in production?
By the way, I would refresh the account list whenever a new user account is added via an API call. As stated above, this happens about once or twice a month.
Access to the shared state in the DAO (and potentially its callers) will need to be synchronized somehow to achieve thread safety.
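One low-ceremony way to get that safety is to publish an immutable snapshot of the list through a volatile field, so a reader never sees a half-updated list. A minimal sketch (the Account accessors and the loader method here are assumptions, not taken from the question):

import java.util.Collections;
import java.util.List;

public class AccountDao
{
    // Readers see either the old snapshot or the new one, never a mix.
    private volatile List<Account> accounts = Collections.emptyList();

    public void refreshAccounts()
    {
        List<Account> fresh = loadAccountsFromDatabase(); // stand-in for the real query
        // The volatile write publishes the fully built snapshot atomically.
        this.accounts = Collections.unmodifiableList(fresh);
    }

    public boolean isActiveAccount(String accountId)
    {
        // A single volatile read; no further locking is needed because the
        // snapshot is never mutated after publication.
        for (Account a : accounts) {
            if (a.getId().equals(accountId) && a.isActive()) { // assumed accessors
                return true;
            }
        }
        return false;
    }

    private List<Account> loadAccountsFromDatabase()
    {
        throw new UnsupportedOperationException("stand-in for the real DB call");
    }
}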
Stale data can cause a wrong access decision. Since that is likely security relevant, your code will need to be bulletproof; in particular, it needs to work reliably in the face of failures. This makes any notification-based scheme shaky: what if the notification is lost?
The lifetime of credentials in memory is prolonged. Confidentiality can still be achieved by hashing the credentials (and frankly, if somebody can read your application's memory, you have many other problems). Manipulating a password in memory requires the attacker to have access to heap memory, and if he can do that, you've lost anyway, because he could just as easily change the database connection used for reading accounts.
That said, for a high-traffic web service, caching credentials sounds like a sensible idea, but it's not totally trivial.
Edit: No, web containers don't synchronize threads. Concurrent requests will be served by concurrent threads, and if these happen to read and write the same data, that can cause a data race. For instance, one thread could read the list of accounts while it is being updated with new information, and thus see an incomplete list.
If you have some sort of trigger to update the in-memory object when the corresponding database tables change, then it should be safe.
Without a trigger it becomes a matter of correctness and, potentially, policy. What happens if different parts of your infrastructure have different in-memory versions of the database?
How long an interval is acceptable between, say, adding or removing a user and that change being reflected by your service?
Have a look at caching. Your libraries probably already support it; if not, Memcached is a good option.
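If pulling in a library is an option, here is a minimal sketch with Guava's LoadingCache (Guava is my choice for illustration, not one of the options named above); it reloads the table at most once per expiry interval:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class AccountCache
{
    private final LoadingCache<String, List<Account>> cache =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(1, TimeUnit.DAYS) // the daily interval is an assumption
                    .build(new CacheLoader<String, List<Account>>() {
                        @Override
                        public List<Account> load(String key)
                        {
                            return loadAccountsFromDatabase(); // stand-in for the real query
                        }
                    });

    public List<Account> accounts()
    {
        // All requests share one entry; Guava serializes concurrent loads,
        // so the DB is hit once per expiry, not once per request.
        return cache.getUnchecked("accounts");
    }

    private List<Account> loadAccountsFromDatabase()
    {
        throw new UnsupportedOperationException("stand-in for the real DB call");
    }
}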
Related
Without relying on the database, is there a way to ensure a field (let's say a user's emailAddress) is unique?
Some common attempts that fail:
Check first whether the emailAddress exists (by querying the DB) and, if not, create the user. Obviously, in the window between check and act, some other thread can create a user with the same email. Hence this solution is no good.
Apply a language-level lock on the method responsible for creating the user. This fails because we need redundant service instances for performance reasons, and such a lock only works within a single JVM.
Use an event store (like an Akka actor's mailbox), the event being an AddUser message. But since actor behavior is asynchronous, the requestor (sender) can't be notified that user creation with a unique email succeeded. Moreover, how would two requests with the same email know they conflict? This can get complicated.
The database, being the single source of data that every thread and every service instance writes to, is the natural place to implement the unique constraint. But that holds for relational databases.
Then what about NoSQL databases? Some do allow a unique constraint, but it doesn't seem to be their native behavior.
So, leaving the database out of it, what could the options be for enforcing uniqueness of a field?
I think your question is more generic: "how do I ensure a database write succeeded, and how do I handle the cases where it didn't?". Uniqueness is just one failure mode; you may be attempting to insert a value that's too big, of the wrong data type, or one that violates a foreign key constraint.
Relational databases solve this by being ACID-compliant and throwing errors for the client to deal with when a transaction fails.
You want (some of) the benefits of ACID without the relational database. That's a fairly big topic of conversation. The obvious way to solve it is to introduce the concept of a "transaction" in your application layer. For instance, in your case, you might send a "createAccount(emailAddress, name, ...)" message and have the application listen for either an "accountCreated" or an "accountCreationFailed" response. The recipient of that message is responsible for writing to the database, and you have a couple of options. One is to lock that thread, so only one process can write to the database at any time; that's not very scalable. The other mechanism I've used is status flags: you write the account data to the database with a "draft" flag, then check your constraints (including uniqueness), and set the flag to "validated" if the constraints are met (i.e. there is no other record with the same email address) or to "failed" if they are not.
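A rough sketch of that status-flag idea with plain JDBC (the table and column names are invented for illustration; note that two racing drafts can still each see the other, so a deterministic tie-break, e.g. lowest id wins, is needed in that corner case):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AccountWriter
{
    /** Returns true if the draft row was promoted to VALIDATED, false if a duplicate exists. */
    public boolean createAccount(Connection conn, long id, String email) throws SQLException
    {
        // 1. Write the row in DRAFT state.
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO accounts (id, email, status) VALUES (?, ?, 'DRAFT')")) {
            insert.setLong(1, id);
            insert.setString(2, email);
            insert.executeUpdate();
        }
        // 2. Check the constraint: does any *other* live row use this email?
        boolean unique;
        try (PreparedStatement check = conn.prepareStatement(
                "SELECT COUNT(*) FROM accounts WHERE email = ? AND id <> ? AND status <> 'FAILED'")) {
            check.setString(1, email);
            check.setLong(2, id);
            try (ResultSet rs = check.executeQuery()) {
                rs.next();
                unique = rs.getInt(1) == 0;
            }
        }
        // 3. Promote or fail the draft accordingly.
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE accounts SET status = ? WHERE id = ?")) {
            update.setString(1, unique ? "VALIDATED" : "FAILED");
            update.setLong(2, id);
            update.executeUpdate();
        }
        return unique;
    }
}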
To check for uniqueness you need to store the "state" of the program. For safety, you need to be able to apply changes to that state transactionally.
You can use database transactions. A few NoSQL databases support transactions too, for example Redis and MongoDB; you have to check each vendor separately to see how they support them. In this setup, each client connects to the database, which handles all the details for you. Depending on your use case, you should also be careful about the isolation level configuration.
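For the relational case, the moving parts look roughly like this with JDBC (Serializable is chosen for illustration; a unique index on the email column is assumed):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UserInserter
{
    /** Returns true if the user was inserted, false if the transaction failed (e.g. duplicate email). */
    public boolean insertUser(String jdbcUrl, String email) throws SQLException
    {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO users (email) VALUES (?)")) {
                ps.setString(1, email);
                ps.executeUpdate();
                conn.commit();
                return true;
            } catch (SQLException e) {
                conn.rollback();
                // A unique-constraint violation lands here; SQLState class 23
                // is the standard integrity-constraint-violation class.
                return false;
            }
        }
    }
}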
If durability is not a concern, you can use in-memory databases that support transactions.
Whichever state store you choose, it should support transactions. There are several ways to implement transactions and achieve consistency; many relational databases, like PostgreSQL, do it by implementing the MVCC algorithm. In a distributed environment you have to look at distributed transaction protocols such as 2PC, Paxos, etc.
Normally everybody relies on available datastore solutions, unless the project has a weird or very specific requirement.
A final note: the communication pattern is not related to the underlying problem here. For example, in the actor case you mentioned, at the end of the day each actor has to query the state to find out whether an email exists or not. If your state store supports serializability, then there is no problem and conflicts will not happen (communicating the error to the client is another issue). Suppose you are using PostgreSQL: when an insert/update query is issued, it is wrapped in a transaction, and the underlying MVCC algorithm takes care of everything. In an advanced, distributed environment you can use data stores that support distributed transactions, like CockroachDB.
If you want to dive deeper, you can research these keywords: ACID, isolation levels, atomicity, serializability, CAP theorem, 2PC, MVCC, distributed transactions, distributed locks, ...
NoSQL databases provide different, weaker, guarantees than relational databases. Generally, the tradeoff is you give up ACID guarantees in exchange for increased scalability in the dimensions that matter for your application.
It's possible to provide some kind of uniqueness guarantee, but subject to certain tradeoffs. With NoSQL, there are always tradeoffs.
If your NoSQL store supports optimistic concurrency control, an approach like this may work:
Store a separate document that contains the set of all emailAddress values across all documents in your NoSQL table. There is exactly one instance of this document at any given time.
Each time you want to save a document containing an emailAddress, first confirm the email address's uniqueness:
Perform the following actions, protected by optimistic locking (see the sketch after these steps); you can retry on the backend if the write fails due to a concurrent update:
Read this "all emails" document.
Confirm the email isn't present.
If not present, add the email address to the "all emails" document.
Save it.
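Here is what that loop can look like against a hypothetical document store with compare-and-swap on a version number (the DocumentStore API below is invented for illustration):

import java.util.HashSet;
import java.util.Set;

public class EmailRegistry
{
    /** Invented API: read returns the document plus its version; casWrite succeeds
        only if the stored version still matches the version we read. */
    interface DocumentStore
    {
        Versioned<Set<String>> readAllEmails();
        boolean casWrite(Set<String> emails, long expectedVersion);
    }

    record Versioned<T>(T value, long version) {}

    private final DocumentStore store;

    public EmailRegistry(DocumentStore store)
    {
        this.store = store;
    }

    /** Returns true once the email is registered, false if it already existed. */
    public boolean tryRegister(String email)
    {
        while (true) {
            Versioned<Set<String>> doc = store.readAllEmails();  // 1. read the "all emails" document
            if (doc.value().contains(email)) {
                return false;                                    // 2. already taken
            }
            Set<String> updated = new HashSet<>(doc.value());
            updated.add(email);                                  // 3. add the new address
            if (store.casWrite(updated, doc.version())) {
                return true;                                     // 4. saved
            }
            // CAS failed: a concurrent update won; re-read and try again.
        }
    }
}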
You've now traded one problem (the lack of unique constraints) for another (the inability to synchronise updates across your original document and this new "all emails" document). That may or may not be acceptable; it depends on the guarantees your application needs to provide.
E.g. maybe you can accept that an email is added to "all emails", that saving the related document to your other "table" subsequently fails, and that that email address then can no longer be used. You could clean this up with a batch job somehow. Not sure.
The index of emails could also be stored in some other service (e.g. a persistent cache). The same problem exists: you need to keep the index and your document store in sync somehow.
There's no easy solution. For a detailed overview of the relevant concepts, I'd recommend Designing Data-Intensive Applications by Martin Kleppmann.
Let's presume that we have a "mail client" application and a front-end for it.
While a user is typing a message or editing the subject or whatever, a REST call is made to update whatever the user was changing (e.g. the recipients), to keep the message in DRAFT. So a lot of PUTs are happening to save the message. When the window is closed, an update of every editable field happens at the same time. Hibernate can't handle this concurrency: each of those calls retrieves the message, edits its own fields, and tries to save the message again while another call has already changed it.
I know I could add a REST call that saves all fields at the same time, but I was wondering if there is a cleaner solution, or a decent strategy for handling such cases (for example, only updating one field, or some merge strategy if the object has already changed).
Thanks in advance!
The easiest solutions here would be to tweak the UI to either:
Submit a single REST call during email submission that does all the necessary tasks, or
Serialize the REST calls so they're chained rather than firing concurrently.
The concern I have here is that this will snowball at some point and become a bigger concurrency problem as more users interact with the application. Consider for a moment the number of concurrent REST calls your web infrastructure will have to support alone when you're faced with 100, 500, 1000, or even 10,000 or more concurrent users.
Does it really make sense to beef up the volume of servers to handle that load when the load itself is a product of a design flaw in the first place?
Hibernate is designed to handle locking through two mechanisms: optimistic and pessimistic.
Optimistic Way
1. Read the entity from the data store.
2. Cache a copy of the fields you're going to modify in temporary variables.
3. Modify the field or fields based on your PUT operation.
4. Attempt to merge the changes.
5. If the save succeeds, you're done.
6. Should an OptimisticLockException occur, refresh the entity state from the data store.
7. Compare the cached values to the fields you must change.
8. If the values differ, you can assert or throw an exception.
9. If they don't differ, go back to 4.
The beautiful part of the optimistic approach is that you avoid any form of deadlock, particularly if you're allowing multiple tables to be read and locked separately.
While you can use pessimistic lock options, optimistic locking is generally the accepted way to handle concurrent operations, as it has the least concurrency contention and performance impact.
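A condensed sketch of that loop with JPA (the Message entity and its subject field are stand-ins invented here; the entity is assumed to carry a @Version column, and steps 7-8 are folded into a simple retry for brevity):

import javax.persistence.EntityManager;
import javax.persistence.OptimisticLockException;

public class MessageUpdater
{
    /** Applies one field change, retrying if a concurrent PUT committed first. */
    public void updateSubject(EntityManager em, long messageId, String newSubject)
    {
        while (true) {
            try {
                em.getTransaction().begin();
                Message msg = em.find(Message.class, messageId); // steps 1-2: read fresh state
                msg.setSubject(newSubject);                      // step 3: modify only our field
                em.flush();                                      // step 4: version checked here
                em.getTransaction().commit();
                return;                                          // step 5: success
            } catch (OptimisticLockException e) {
                if (em.getTransaction().isActive()) {
                    em.getTransaction().rollback();              // step 6: a concurrent update won
                }
                em.clear();                                      // drop stale entities, then retry
            }
        }
    }
}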
I was hoping to understand how one can ensure data integrity in the case of concurrent requests from the same user to the same Spring controller method.
For example, suppose that in an online shopping scenario a user makes concurrent requests to a controller method (e.g. '/debitWallet?amount=100' to deduct money from his wallet). The Wallet could be a Hibernate entity which is obtained in this method through a standard WalletService instance --> WalletDao instance. Now how can we ensure data integrity of the wallet for concurrent requests?
On what objects do I synchronize here?
What would be the scope of the different beans (service, DAO, etc.)? Although I don't see how any of that would help, since the Wallet is going to be fetched from the data store.
Should I even fetch the Wallet from the DB every time the controller method is invoked? Would that be the right approach? Or should I instead use @SessionAttribute on the Wallet entity and reuse it for every request to this method?
I could really use some help understanding how to tackle the data-integrity issue in this use case.
First of all, answer this question: how frequently will your data change?
If it is not that frequent (or your database interactions are fast), you can use the pattern "the user always operates on a recent wallet instance, which is constantly synchronized with the database". To make this work, the user always sends an optimistic lock value (the @Version field on the entity), and in case changes happened in the background, the user receives an optimistic locking exception.
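In entity terms (a sketch with invented field names) that looks like:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Wallet
{
    @Id
    private Long id;

    private long balanceInCents;

    @Version // JPA increments this on every update and rejects writes based on a stale version
    private long version;

    public void debit(long amountInCents)
    {
        if (amountInCents > balanceInCents) {
            throw new IllegalStateException("insufficient funds");
        }
        balanceInCents -= amountInCents;
    }
}

Two concurrent '/debitWallet?amount=100' requests then both load version n; the second commit fails with an optimistic locking exception, which the controller can turn into a retry or an error response.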
If it is frequent, you should analyze your implementation deeply and then look for places to synchronize, or even rework your API.
I am trying to find the usual design/approach for "static/global" data access/storage in a web app; I'm using Struts 2. Background: I have a number of tables I want to display in my web app.
Problem 1.
The tables will only change and be updated once a day on the server; I don't want to access a database or load a file on every request to view a table.
I would prefer to load the tables into some global memory/cache once (a day), and have each request get the table from there rather than access a database.
I imagine this is a common scenario and there is an established approach, but I can't find it at the moment.
For Struts 2, is the ActionContext the right place for this data?
If so, any link to a tutorial would be really appreciated.
Problem 2.
The tables were stored in an XML file that I unmarshalled with JAXB to get the table objects, and from those the lists for the tables.
For a small application this was OK, but for the web app it feels hacky to store the XML as resources and read the file in via the servlet context and parse it. Or is it?
I realise I may be told to store the tables in a database, access them with a DAO, and use Hibernate to get the objects.
I am just curious what the usual approach is for data already stored in an XML file, given that I will have new XML files daily.
Apologies if the questions are basic; I have a large amount of books/reference material, but it's taking me time to get to the higher-level design answers.
Not having really looked at the caching options, I would fetch the data from the DB myself, but only after an interval has passed.
Usually you work within the Action scope; the next level up is the Session, and the most global is the Application. A simple way to test this is to create an Action class which implements ApplicationAware. Then you can get the values put there from any JSP/action... anywhere you can get to the ActionContext (which is almost anyplace). See: http://struts.apache.org/2.0.14/docs/what-is-the-actioncontext.html
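A minimal sketch of that (the TableData type and the loader are invented; the exact generics of setApplication vary between Struts versions):

import com.opensymphony.xwork2.ActionSupport;
import org.apache.struts2.interceptor.ApplicationAware;

import java.util.List;
import java.util.Map;

public class LoadTablesAction extends ActionSupport implements ApplicationAware
{
    private Map<String, Object> application;

    @Override
    public void setApplication(Map<String, Object> application)
    {
        // Struts injects the application-scoped map before execute() runs.
        this.application = application;
    }

    @Override
    public String execute()
    {
        @SuppressWarnings("unchecked")
        List<TableData> tables = (List<TableData>) application.get("tables");
        if (tables == null) {
            tables = loadTablesFromXml();      // stand-in for the daily JAXB load
            application.put("tables", tables); // now visible to every request
        }
        return SUCCESS;
    }

    private List<TableData> loadTablesFromXml()
    {
        throw new UnsupportedOperationException("stand-in for the JAXB unmarshalling");
    }
}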
Anyway, I would implement a basic interceptor which checks whether new data should be available and has not already been looked up, and then loads the new data (the user triggering this interceptor may not need the new data, so doing this in a new thread would be a good idea).
This method increases the complexity, as you are responsible for managing some data structures and making them cooperate with the ORM.
I've done this to load data from tables which will never need to be loaded again, where that data stands on its own (I don't need to find relationships between it and other tables). This is quick and dirty; Steven's solution is far more robust and would probably pay you back at a later date when further performance is a requirement.
This isn't really specific to Struts 2 at all. You definitely do not want to try storing this information in the ActionContext; that's a per-request object.
You should look into a caching framework like EHCache or something similar. If you use Hibernate for your persistence, Hibernate has options for caching data so that it does not need to hit the database on every request. (Hibernate can also use EHCache for its second-level cache.)
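With EHCache 2.x, for instance, the daily-table case can look roughly like this (a sketch; it assumes a cache named "tables" is declared in ehcache.xml with timeToLiveSeconds="86400" so entries expire once a day):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class TableCache
{
    // CacheManager.create() picks up ehcache.xml from the classpath.
    private final Cache cache = CacheManager.create().getCache("tables");

    public Object getTables()
    {
        Element hit = cache.get("tables");
        if (hit != null) {
            return hit.getObjectValue(); // served from memory
        }
        Object tables = loadTables();    // stand-in for the real loader
        cache.put(new Element("tables", tables));
        return tables;
    }

    private Object loadTables()
    {
        throw new UnsupportedOperationException("stand-in for the daily load");
    }
}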
As mentioned earlier, the best approach would be to use EHCache or some other trusted cache manager.
Another approach is to use a factory to access the information. For instance, something to the effect of:
public class MyCache {
    private static MyCache cache = new MyCache();

    public static MyCache getCache() {
        return cache;
    }

    /* data members */

    private MyCache() {
        /* initialise data members */
    }

    public synchronized SomeType getXXX() {
        ...
    }

    public synchronized void setXXX(SomeType data) {
        ...
    }
}
You need to synchronize all your reads and writes to make sure you don't have race conditions while updating the cache.
synchronized (MyCache.getCache()) {
    MyCache.getCache().getXXX();
    MyCache.getCache().getTwo();
    ...
}
etc.
Again, it's better to use EHCache or something else turnkey, since this is likely to be fickle without a good understanding of the mechanisms. This sort of cache also has performance issues, since it only allows ONE thread to read or write the cache at a time. (Possible ways to speed it up are thread locals and read/write locks, but that sort of thing is already built into many of the established cache managers.)
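For completeness, a sketch of the read/write-lock variant just mentioned, which lets any number of readers proceed in parallel while writers get exclusive access:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwCache {
    private final Map<String, Object> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public Object get(String key) {
        lock.readLock().lock(); // many readers may hold this at once
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, Object value) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}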
I've always tried to avoid using sessions. I've used Spring Security or other ways of keeping a user logged in to the application, which is, I suppose, the major use case for sessions.
But what are the other use cases? Could you please list the most important ones? How come I've been able to develop even complicated applications without using sessions?
Is it because I'm using Spring MVC, so sessions are practically not needed except for the login stuff?
EDIT: Guys, this question was asking for use cases... Most of the answers explain what sessions are for. If we summarize some use cases, we can say for sure when to use a database versus sessions for maintaining conversational state...
Don't you remember any concrete scenarios you've needed sessions for over the past years? :)
For instance, some conversational state may become persistent after some point/event. In such cases I use the database from the beginning.
I think you can do anything you want without storing anything in a session.
I usually use sessions to avoid having to pass state between the client and server (a user id, for example) and when I don't want to send sensitive information to the client (even in encrypted form), as it might be a security problem.
Other ways of avoiding the session are:
Store some state in a database, e.g. shopping carts, instead of in the session, even if the cart is discarded after a certain amount of time.
Store state in cookies, e.g. for user customization.
One use case where the session is really useful is conversations, although frameworks usually manage that behind the scenes and store the conversation in the session.
Edit:
Conversations (in my understanding) are something like wizards, in which you complete several forms on different pages and perform the action at the end. E.g. in a checkout process, the user enters his name, shipping address, and credit card details on different pages, but you want to submit the order only at the end, without storing any intermediate state in your DB.
By sensitive information I mean: imagine that, in the previous example, once the user has sent his credit card details, you shouldn't return that information in any form (even encrypted) to the user. I know it's a bit paranoid, but that's security :).
In the e-commerce system I'm working on, there is an external back-end system which stores users' saved shipping and billing addresses. Our web app talks to it by making web service calls to retrieve those addresses. When we get the addresses, we store them in the session. That way, we only have to call the service once, when the user first looks at their addresses, and not every time we serve a page which needs address information. We have a time-to-live on the addresses, so if they change (e.g. if the user telephones the customer service desk to change an address), we will eventually pick up the fresh ones.
It would be possible to store the addresses in our database, rather than in the session. But why would we? It's transient information which is already stored permanently somewhere else. The session is the ideal place for it.
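A sketch of that pattern with the servlet API (the names and the one-hour TTL are invented for illustration):

import javax.servlet.http.HttpSession;
import java.util.List;

public class AddressLookup
{
    private static final long TTL_MILLIS = 60 * 60 * 1000L; // 1 hour, arbitrary

    /** Session-scoped holder, so we remember when the addresses were fetched. */
    public static class CachedAddresses {
        final List<String> addresses;
        final long fetchedAt;

        CachedAddresses(List<String> addresses, long fetchedAt) {
            this.addresses = addresses;
            this.fetchedAt = fetchedAt;
        }
    }

    public List<String> addressesFor(HttpSession session, String userId)
    {
        CachedAddresses cached = (CachedAddresses) session.getAttribute("addresses");
        if (cached == null || System.currentTimeMillis() - cached.fetchedAt > TTL_MILLIS) {
            List<String> fresh = callAddressService(userId); // stand-in for the web service call
            cached = new CachedAddresses(fresh, System.currentTimeMillis());
            session.setAttribute("addresses", cached);       // cached for later requests
        }
        return cached.addresses;
    }

    private List<String> callAddressService(String userId)
    {
        throw new UnsupportedOperationException("stand-in for the back-end call");
    }
}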
Well, in one sense your question is deep (what's SPECIAL about a session is worth knowing), and in another sense it's shallow ("what can't I do if I don't use them?" turns out to be a somewhat odd question).
In the end, a session is merely (or could be) a ConcurrentHashMap (in fact it usually isn't that thread-safe), keyed by a unique session id passed as a cookie. You know why it's useful, but to answer you, the use cases are:
clustering (this is how state gets distributed across nodes)
caching the general state of the user and their objects (as opposed to reloading from the DB each time)
built-in methods for session listeners to watch when someone times out, or when attributes change
localization: sessions are used by a lot of localization utilities
Can you do all this with a database or your own hashmap implementation/filter? Of course; there's nothing magical about sessions. They are merely a convenient standard for having some objects follow a logged-in user and be tied to the lifetime of that user's use of the application.
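To make that concrete, here is a bare-bones sketch of what "a map keyed by a session-id cookie" amounts to (illustrative only; a real container adds timeouts, listeners, and clustering on top):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class HandRolledSessions
{
    // session id (sent to the browser as a cookie) -> that user's attributes
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    /** Called when a request arrives without a session cookie. */
    public String newSession()
    {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<>());
        return id; // set this as the cookie value in the response
    }

    /** Called on every request that carries the cookie. */
    public Map<String, Object> attributes(String sessionId)
    {
        return sessions.get(sessionId); // null means an unknown (or expired) session
    }
}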
Why do you use servlets? You could also implement your own socket-level standard. The answer to that is that using standard APIs/implementations provides convenience, and other libraries build upon them.
The cons are
You are reinventing the wheel, rewriting code that has been time-tested.
You won't be able to use a lot of the built-in facilities for monitoring/managing/clustering/localizing, etc.
Sessions are one way of maintaining conversational state across multiple requests (e.g. multiple stateless HTTP requests).
There are other ways of implementing conversational state: for example, storing an authentication token or some suitable conversation id in a cookie and maintaining a server-side store mapping conversation id to session state. (In essence, duplicating what the app server is doing when it provides sessions.)
That you haven't needed to use sessions means that your application either doesn't need conversational state or implements it in a different way. For example, perhaps your application uses an authentication token (say, a cookie) and persists all state changes to the database. With that kind of arrangement, there is no need for conversational state.
You can take the example of a shopping cart: since HTTP is a stateless protocol, it does not maintain the status of the user who sends a request.
For example:
If one user sends a request to buy a camera from, say, eBay, and after some minutes another user sends a request to buy a laptop...
...then, since HTTP is stateless, the server cannot tell the users' requests apart, and it may happen that the bill for the laptop is given to the first user.
So through a session we can maintain a per-user entity on the server side.
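In servlet terms, that per-user entity is just an attribute on the request's session (the Cart class is invented for illustration):

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class CartAccess
{
    /** Each browser gets its own session via the JSESSIONID cookie,
        so each user sees only their own cart. */
    public Cart cartFor(HttpServletRequest request)
    {
        HttpSession session = request.getSession(true); // create if absent
        Cart cart = (Cart) session.getAttribute("cart");
        if (cart == null) {
            cart = new Cart();                  // invented cart class
            session.setAttribute("cart", cart); // bound to this user's session
        }
        return cart;
    }
}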