The company I work for has a web system, and some of its modules will be used on construction sites, which do not always have an internet connection.
What is the best way to deploy those web modules at a construction site?
Without a connection, the system can't read from or write to the database.
You can have a local database that runs on your webserver. Every 15 min you check if there is a connection and transfer the changes from your local database onto your "real" database.
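A minimal sketch of that idea, assuming each change can be represented as a string and that the connectivity check and the remote write are supplied by the caller. The names `OfflineOutbox`, `record` and `syncOnce` are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical outbox: changes made offline are queued and pushed once a connection exists. */
class OfflineOutbox {
    private final Deque<String> pending = new ArrayDeque<>();

    /** Record a change locally; this works whether or not the network is up. */
    synchronized void record(String change) {
        pending.addLast(change);
    }

    /**
     * Push queued changes, oldest first, for as long as the connection check passes.
     * A change leaves the queue only after the remote call returns normally.
     * Returns the number of changes sent.
     */
    synchronized int syncOnce(BooleanSupplier online, Consumer<String> remote) {
        int sent = 0;
        while (!pending.isEmpty() && online.getAsBoolean()) {
            remote.accept(pending.peekFirst()); // if this throws, the change stays queued
            pending.removeFirst();
            sent++;
        }
        return sent;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

To run it every 15 minutes you could hand `syncOnce` to `Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(...)` with a delay of 15 and `TimeUnit.MINUTES`. Note that this sketch does nothing about conflicting writes on the server side.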
There is no perfect solution to this problem due to the CAP theorem. If there was, distributed systems would be a whole lot easier, though :)
The way I see it, you have a couple of options. All options require a copy of the data client-side so that the client can at least read the data while it's disconnected.
Don't allow inconsistent writes, just inconsistent reads.
Either the client or the server would take write-ownership of the data whenever the client is off the network. Whichever side owns the data is allowed to write, and the other side must read a potentially-stale local copy of the data (or it could generate errors if that's preferable, or at least tell the user that the data is stale). This is more difficult if you have multiple clients running at the same time.
A similar approach is to run all server-side write operations when you know no clients will be at work (i.e. run all your jobs at midnight). It's simple, but it works for many applications.
Allow inconsistencies and then deal with them later. When you have to be able to write to both sides of the disconnected network concurrently, this is the only way to go. There are a couple of ways to mitigate the inconsistency issues, however, depending on your design:
If you can make all transformations commutative (you can change the order of the transformations and the outcome will be the same), you could store the transformations performed on the client and the server and then apply any cached ones from the client when it reconnects to the server. This makes dealing with inconsistencies very simple. This is what ATMs do when they are disconnected from the banking network - their commutative transformations are similar to, "Deduct $50 from acct #12345." Note that this can still cause invalid state - the user could deduct more than they had in their account by visiting multiple ATMs which are disconnected from the banking network. However, the bank makes up for this by charging overdraft fees when the ATMs reconnect to the network, and so they don't typically lose any money as a result.
If conflicts are rare, you could just tell a user of the client, "Hey, you wrote this while you were offline, but the value changed on the server - which copy should I keep?" This has the issue that you might have to ask it multiple times, since the user has to perform an atomic CAS operation on the value being changed and it's possible that multiple clients might be reconnecting at the same time. This approach can also suffer from the ABA problem if you're not careful (but that depends what you're doing).
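To make the commutative option concrete, here is a rough sketch in which every transformation is a signed delta on an account balance. Addition commutes, so replaying the client's cached deltas before or after the server's gives the same result. The class and field names are hypothetical, and amounts are in cents:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommutativeReplay {
    /** One transformation: add a signed amount (in cents) to an account. */
    static final class Delta {
        final String account;
        final long cents;

        Delta(String account, long cents) {
            this.account = account;
            this.cents = cents;
        }
    }

    /** Apply two logs of deltas, one after the other, to a copy of the starting balances. */
    static Map<String, Long> replay(Map<String, Long> start, List<Delta> first, List<Delta> second) {
        Map<String, Long> balances = new HashMap<>(start);
        for (Delta d : first) {
            balances.merge(d.account, d.cents, Long::sum);
        }
        for (Delta d : second) {
            balances.merge(d.account, d.cents, Long::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        Map<String, Long> start = new HashMap<>();
        start.put("12345", 3000L);

        List<Delta> server = Arrays.asList(new Delta("12345", 2000L));
        // Two withdrawals made at disconnected ATMs.
        List<Delta> client = Arrays.asList(new Delta("12345", -5000L), new Delta("12345", -5000L));

        // Both orders agree, but the balance ends up negative: the merge is
        // consistent, yet it does not enforce "no overdraft".
        System.out.println(replay(start, server, client));
        System.out.println(replay(start, client, server));
    }
}
```

The negative balance in the example is the invalid state mentioned above: commutativity removes merge conflicts, not business-rule violations.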
In a Spring Boot application, I keep a TreeMap in memory. I'm doing around 10,000 operations per second, and it may increase. To improve performance, I kept data in memory. I want my app to be able to start from the same state when application is restarted.
There are some methods I could find for this.
Keeping data on Hazelcast.
In this case I don't risk losing the data unless Hazelcast dies, but if Hazelcast dies, I can't restore the data. Additionally, I don't think it makes sense to sync that number of operations to Hazelcast.
Synchronizing events to database.
Here, my risk of data loss is very low. However, I need to execute a query after each operation. This may affect performance. Also, I need to handle exceptions on database update.
Synchronizing data in batches
There is only one ready-made solution that I could find here, MapDB. I'm planning to try it but I haven't yet. If there is a more reliable, optimized sync solution that uses a database instead of a file, I would prefer to use it.
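For reference, this is roughly what I mean by batching, as a hand-rolled sketch rather than anything MapDB provides. The class name `WriteBehindTreeMap` and the `sink` callback are hypothetical. The map remembers only the latest pending change per key, so many updates to one key collapse into a single write:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Consumer;

class WriteBehindTreeMap {
    private final TreeMap<String, Long> data = new TreeMap<>();
    // Latest unflushed change per key; Optional.empty() marks a removal.
    private Map<String, Optional<Long>> dirty = new HashMap<>();

    synchronized void put(String key, long value) {
        data.put(key, value);
        dirty.put(key, Optional.of(value));
    }

    synchronized void remove(String key) {
        data.remove(key);
        dirty.put(key, Optional.empty());
    }

    synchronized Long get(String key) {
        return data.get(key);
    }

    /**
     * Hand all changes accumulated since the last flush to the sink as one batch
     * (for example, one JDBC batch in one transaction). Returns the batch size.
     * Caveat: this sketch does not re-queue the batch if the sink throws.
     */
    int flush(Consumer<Map<String, Optional<Long>>> sink) {
        Map<String, Optional<Long>> batch;
        synchronized (this) {
            if (dirty.isEmpty()) {
                return 0;
            }
            batch = dirty;
            dirty = new HashMap<>(); // writers continue against a fresh map
        }
        sink.accept(batch); // slow I/O happens outside the lock
        return batch.size();
    }
}
```

With this approach, anything written since the last flush is lost if the process crashes, so the flush interval is the amount of data I would be willing to lose.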
Any recommendation to solve this question?
Do you need a Map or a TreeMap?
Is the collating sequence relevant for storage, for access, or neither?
For Hazelcast, the chance for data loss is configurable. You set up a cluster with the level of resilience you want. This is the same as with disk, if you have one disk and it fails, you lose data. If you have two and one goes offline, you don't lose data. You allocate hardware for the level of resilience you need. Three is the recommended minimum.
(10,000 per second isn't worrying either, 1,000,000,000 has been done. Sync to an external store can be immediate or in batches)
Disclaimer, I work for Hazelcast, but I think your question is more fundamental -- how do you keep your store available.
Simply, don't restart.
Clustered solutions are the answer here. If you have multiple nodes, the service as a whole stays running even if a few nodes go offline.
Do rolling bounces.
If you must restart everything at once, what matters is how quickly can your service bring all data back and what does it do when the restore is 50% done (is 50% data visible?). Immediate replication to elsewhere is only really necessary if you have a clustered solution that hasn't been configured for resilience. Saving intermittently is fine if you have solved resilience.
So, configure your storage so that it doesn't go offline; that makes the options for backup/restore all the easier.
tl;dr: Setting the polling-interval to 0 has given my performance a huge boost, but I am worried about possible problems down the line.
In my application, I am doing a fair amount of publishing from our java server to our flex client, publishing on a variety of topics and sub-topics.
Recently, we have been on a round of performance improvements system-wide, and the messaging layer was proving to be a big bottleneck.
A few minutes ago, I discovered that setting the <polling-interval-millis> property in our services-config.xml to 0 caused published messages, even when there are lots of them, to be recognized by the client almost instantly, instead of with the 3 second delay that is the default value for polling-interval-millis, which has obviously had a tremendous impact.
So, I'm pretty happy with the current performance, only thing is, I'm a bit nervous about unintended side-effects caused by this change. In particular, I am worried about our Flash client slowing way down, and of way too much unwanted traffic.
My preliminary testing has not borne out this fear, but before I commit the change to our repository, I was hoping that somebody with experience with this stuff would chime in.
Unfortunately your question is too general...there is no way to receive a specific answer. I'll write below some ideas, maybe they are helpful.
Decreasing the value from 3 to 0 means that you are receiving new data way faster. If your Flex client uses this data in order to make complex computations it is possible to slow your client or to show obsolete data (it is a known pattern, see http://help.adobe.com/en_US/LiveCycleDataServicesES/3.1/Developing/WS3a1a89e415cd1e5d1a8a18fb122bdc0aad5-8000Update.html ). You need to understand how the data is processed and probably to do some client benchmarking.
Also, the server will have to handle more requests, so it would be good to identify the maximum number of requests per second it can handle. For that, use a tool like JMeter to find the maximum capacity of your system. After that, you can estimate how many requests per second you will have once the interval is reduced from 3 to 0, taking into account, for example, that the number of clients is growing by 10% per month.
The main idea is that you should do some performance testing for some API and save the scripts, so that you can see whether future modifications slow the system down too much. Without this, it is quite hard to guess whether it is OK to change configuration parameters.
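As a back-of-the-envelope sketch of that estimate: assume each client issues its next poll one polling interval after the previous response arrives, so a client generates about 1 / (interval + round trip) requests per second. That model, the 50 ms round trip and the client count used below are assumptions for illustration, not measurements:

```java
class PollingLoadEstimate {
    /** Approximate total poll requests per second for a given number of clients. */
    static double requestsPerSecond(double clients, double intervalSeconds, double roundTripSeconds) {
        return clients / (intervalSeconds + roundTripSeconds);
    }

    /** Number of clients after compound monthly growth (0.10 means 10% per month). */
    static double clientsAfter(double clients, double monthlyGrowth, int months) {
        return clients * Math.pow(1.0 + monthlyGrowth, months);
    }

    public static void main(String[] args) {
        double roundTrip = 0.05; // assumed 50 ms request/response time
        double clientsNow = 100; // assumed current client count

        System.out.printf("interval 3s now:   %.1f req/s%n", requestsPerSecond(clientsNow, 3.0, roundTrip));
        System.out.printf("interval 0s now:   %.1f req/s%n", requestsPerSecond(clientsNow, 0.0, roundTrip));

        double clientsInAYear = clientsAfter(clientsNow, 0.10, 12);
        System.out.printf("interval 0s in 12 months: %.1f req/s%n",
                requestsPerSecond(clientsInAYear, 0.0, roundTrip));
    }
}
```

Under these assumptions, 100 clients go from about 33 requests per second to about 2000 when the interval drops from 3 seconds to 0. Compare that number with the maximum you measured in your load test.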
You might want to try out long-polling. For our Weblogic servers, we don't get any problems unless we let the poll request go to 5 minutes, so we keep it to 4, then give it a 1 second rest before starting again. We have a couple of hundred total users, with 60-70 on it hard core all day. The thing to keep in mind is that you're basically turning intermittent user requests into what amounts to almost always connected telnet sessions. Depending on the browser your users are using, it can have implications from that as well, but overall we've been very pleased.
I want to make a Java program that can be run only a specific number of times, e.g. 2, 5, 10. After that it must throw an Exception.
Using any file or database for this is not allowed. Someone gave me a hint about the registry, but I don't know how to use it for this.
Please help me in this regard.
You can use java preferences (registry on windows) :
http://download.oracle.com/javase/6/docs/api/java/util/prefs/Preferences.html
You can find some sample usage here:
http://www.particle.kth.se/~lindsey/JavaCourse/Book/Part1/Java/Chapter10/Preferences.html
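A minimal sketch of a run counter built on `java.util.prefs.Preferences`. The node path `com/example/runlimiter`, the key `runCount` and the class name are made up for illustration; pick your own:

```java
import java.util.prefs.Preferences;

class RunLimiter {
    private static final String KEY = "runCount"; // hypothetical key name

    /**
     * Increment the stored run count and return it.
     * Throws once the program has already been run maxRuns times.
     */
    static int recordRun(Preferences node, int maxRuns) {
        int runs = node.getInt(KEY, 0); // 0 if the key has never been written
        if (runs >= maxRuns) {
            throw new IllegalStateException("Run limit of " + maxRuns + " reached");
        }
        node.putInt(KEY, runs + 1);
        return runs + 1;
    }

    public static void main(String[] args) throws Exception {
        // Per-user preference node; the path is an assumption for this example.
        Preferences node = Preferences.userRoot().node("com/example/runlimiter");
        int run = recordRun(node, 5);
        node.flush(); // force the change out to the backing store
        System.out.println("Run " + run + " of 5");
    }
}
```

Keep in mind that the user can reset the counter by deleting the preference node, so this only stops casual reuse.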
Whether this problem is solvable depends on what is meant by "any FILE or Database".
Depending on your point of view, the Windows Registry is a kind of file / database. Certainly, the only reason that values stay in the registry over a reboot is because registry changes are written to disc.
You can move state (such as a count of the number of times an application has been run) to some other service on the local machine. But once again, unless the service saves that state to disc (or some other stable storage medium) it will be lost if the local machine reboots.
You can move state to a service on a remote machine, but once again it may be lost if not saved to disc, etc. Moreover, you may not be able contact that remote service at the time you need the state; e.g. when starting the application.
You can copy the state to lots of remote services without discs, but a network failure (or the user unplugging from the network) will stop you accessing the state.
So to summarize, if you cannot write to disc (or nvram, tape, etc) locally, you cannot guarantee that the counter won't get reset, and that it will be available when needed. Therefore you cannot guarantee that the application won't be run more times than is allowed.
I imagine that you are trying to come up with some limited use scheme that users cannot subvert; e.g. by deleting stuff from the file / database / whatever holds that counter. Unfortunately, unless you physically control BOTH the hardware AND the operating system on which the application runs, you cannot prevent someone from subverting any counter stored on the machine itself. Anyone with "root" or full administrator rights, or with physical access, can ultimately change any data on the machine itself.
The best you can do is establish a secure connection to a remote server and use that to hold the usage counter. But even that is futile, because a motivated person can reverse engineer the critical part of your application and disable the code that checks the counter.
If the app. has a GUI, it can be launched using Java Web Start and use the PersistenceService. Here is a small demo. of the PersistenceService. The code is available for download.
Edit:
And the PersistenceService should work on any machine that has a JRE, as opposed to just Windows.
Even though this sounds like an attempt at copy protection, you may want to consider self-modifying code. There is an interesting discussion on this subject in Java here: Self modifying code in Java
I'm using the synchronous implementation of JRedis, but I'm planning to switch to the asynchronous way to communicate with the redis server.
But before that I would like to ask the community whether the JRedisFuture implementation of alphazero's jredis is stable enough for production use or not?
Is there anybody out there who is using it or have experience with it?
Thanks!
When JRedis gets support for transaction semantics (Redis 1.3.n, JRedis master branch) then certainly, it should be "stable" enough.
Redis protocol for non-transactional commands, themselves atomic, allows a window of unrecoverable failure when a destructive command has been sent, and on the read phase the connection faults. The client has NO WAY of knowing if Redis in fact processed the last request but the response got dropped due to network failure (for example). Even the basic request/reply client is susceptible to this (and I think this is not limited to Java, per se.)
Since Redis's protocol does not require any metadata (at all) with the DML and DDL type commands (e.g. no command sequence number), this window of failure is opened.
With pipelining, there is no longer a sequential association between the command that is being written and the response that is being read. (The pipe is sending a command that is N commands behind the one that caused Redis to issue the response being read at the same time. If anything goes kaput, there are a LOT of dishes in air :)
That said, every single future object in the pipe will be flagged as faulted and you will know precisely at which response the fault occurred.
Does that qualify as "unstable"? In my opinion, no. That is an issue with pipelining.
Again, Redis 1.3.n with transaction semantics completely addresses this issue.
Outside of that issue, with asynchronous (pipelines), there is a great deal of responsibility on your part for making sure you do not excessively overload the input to the connector. To a huge extent JRedis pipelines protect you from this (since the caller's thread is used to make the network write thus naturally damping the input load on the pending response queue).
But you still need to run tests -- you did say "Production", right? )) -- and size your boxes and put a cap on the number of loading threads on the front end.
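One generic way to put such a cap in place is to bound the number of unanswered requests with a semaphore. This is a sketch using only `java.util.concurrent`, not part of the JRedis API; `BoundedSubmitter` and `maxInFlight` are hypothetical names, and the asynchronous call is whatever your client exposes, adapted to a `CompletableFuture`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class BoundedSubmitter {
    private final Semaphore permits;

    BoundedSubmitter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Start an asynchronous call, blocking the caller while maxInFlight calls
     * are still waiting for their responses. The permit is returned when the
     * call completes, whether normally or with an error.
     */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncCall) {
        permits.acquireUninterruptibly();
        try {
            return asyncCall.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the call never started, so give the permit back
            throw e;
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Blocking the submitting thread is the point here: it pushes back on the producers instead of letting the pending response queue grow without limit. The right value for `maxInFlight` is something to find by load testing.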
I would also recommend against running more than one JRedis pipeline on multi-core machines. In the existing implementation (which does not chunk the write buffer) there is room for efficiencies (in the context of full bandwidth utilization and maximizing throughput) to be gained by running multiple pipelines to the same server: while one pipeline is busy creating buffers to write, the other is writing, etc. But these two pipelines will interfere with one another due to their inevitable synchronization (remember, they are queues, and some form of synchronization must occur) and periodic cache invalidation (on each dequeue/enqueue in the worst case, but in Doug Lea we trust). So if pipeline A takes an average latency hit d1 (in isolation), then so does pipeline B. Regrettably, running two of them on the same cores results in a system-wide cache invalidation period that is HALF that of the original system, so TWICE as many cache invalidations occur (on average). So it is self-defeating. But test under your own load conditions, and on your projected production deployment platform.
I have a webapp in development. I need to plan for what happens if the host goes down.
I will lose some very recent session status (which I can live with) and everything else should be persistently stored in the database.
If I am starting up again after an outage, can I expect a good host to reconstruct the database to within minutes or seconds of where I was up to, or should I build in a background process to continually mirror the database elsewhere?
What is normal/sensible?
Obviously a good host will have RAID and other redundancy, so the likelihood of total loss should be low, and if they have periodic backups, I should lose only very recent stuff but this is presumably likely to be designed with almost static web content in mind, and my site is transactional with new data being filed continuously (with a customer expectation that I don't ever lose it).
Any suggestions/advice?
Are there off-the-shelf frameworks for doing this? (I'm primarily working in Java.)
Should I just plan to save the data or should I plan to have an alternative usable host implementation ready to launch in case the host doesn't come back up in a suitable timeframe?
You need a replication strategy which of course depends on your database engine.
It's usually done by configuration.
http://en.wikipedia.org/wiki/Replication_%28computer_science%29
I have experience with Informix. There you can set up data replication to have a standby system available, or take a full backup of the data and replay the logical logs (which basically contain all SQL statements); the latter needs more time to recover from a crash.
Having redundant storage is also a good idea in case a disc crashes. This topic is probably better discussed on serverfault.com