I am using the Java EWS API to connect my application to MS Exchange and read user email requests. These requests are then processed through the system workflow. The number of emails per day is limited to 50, so the overall volume is low. However, I am looking for an efficient and reliable mechanism to read from the Exchange server using the EWS API. Also note that once an email is processed we move it to a subfolder, so the Inbox only contains unprocessed requests.
As I currently understand it, the following schemes can be used to connect to the Exchange server and perform various operations on the mailbox.
1. Polling - Connect to Exchange using the standard Exchange Service interface, find all new emails and process them in sequence. The client has better control over failures and over synchronization between reading emails and moving them to the processed folders. On the downside, the experience isn't real time and connections are made to Exchange even if there isn't any activity.
2. Pull Notifications - This method is almost identical to the previous one: subscribe to pull notifications using an interval and read emails from the Inbox whenever the timer event occurs. Pros and cons are similar to approach 1.
3. Push Notifications - Here the clients subscribe to the Exchange server for push notifications by registering for particular events and defining a callback mechanism (a client web service) to receive notifications. On the upside, the notifications are near real time and connections are made only when there are events. On the downside, I see that subscriptions and watermarks need to be managed on the client side so that events aren't lost. I am not sure this is a reliable approach: what happens to messages that are already in the Inbox before a subscription is established? Will those events be replayed when the server starts? It's not clear.
4. Streaming Subscription - Clients establish a streaming connection and keep it open with the server for a maximum of 30 minutes; during this time Exchange will notify the client of any registered events. Once the connection dies it can be restored so that the subscription stays alive. It seemed like the best approach until I started hearing that an additional step, syncing folder items and maintaining sync state, is required at regular intervals so that events are not missed between disconnect and reconnect.
Looking at my needs (reading emails from the Exchange server reliably) and my analysis of the various options, I feel that approach 1 is simpler and more reliable, as it gives better control over the entire process. At the same time, I wanted to check with others who are familiar with the API, so they can correct me if my understanding of the pros and cons is wrong.
I am open to any suggestions from the group to make this better, as the intent is to not miss any email.
I'd go for the code simplicity of option 1. If you connect once a minute the load is very low (just a FindItem call returning nothing) and the users experience it as almost instantaneous.
You are only handling 50 a day at most, so the wish to be 'instantaneous' is a bit contradictory (if the user only makes that many updates, he can surely wait a minute).
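A minimal sketch of what one polling pass for option 1 could look like. The `Mailbox` interface and the `workflow` handler are hypothetical stand-ins I made up for illustration; they are not EWS Java API calls, and you would implement them on top of whatever find/move operations you already use. The point is only the ordering: an item is moved out of the Inbox after it has been processed, so a failure leaves it there for the next pass.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical view of the mailbox; these methods are NOT EWS API calls.
interface Mailbox {
    List<String> findUnprocessed();       // ids of the items currently in the Inbox
    void moveToProcessed(String itemId);  // move one item to a processed subfolder
}

final class InboxPoller {
    private final Mailbox mailbox;
    private final Consumer<String> workflow; // hypothetical handler for one email

    InboxPoller(Mailbox mailbox, Consumer<String> workflow) {
        this.mailbox = mailbox;
        this.workflow = workflow;
    }

    /** Runs one polling pass and returns how many items were processed and moved. */
    int pollOnce() {
        int done = 0;
        for (String itemId : mailbox.findUnprocessed()) {
            try {
                workflow.accept(itemId);         // process first...
                mailbox.moveToProcessed(itemId); // ...and move only after success
                done++;
            } catch (RuntimeException e) {
                // Leave the item in the Inbox so the next pass retries it.
            }
        }
        return done;
    }
}
```

Scheduling `pollOnce()` once a minute (for example with a `ScheduledExecutorService`) would give the behaviour described above. Note that if the move fails after the workflow succeeded, the item is processed again on the next pass, so the workflow should tolerate seeing the same email twice.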
I have a web service on my server that pushes XML data to the clients that communicate with it over the internet.
In these cases we have the challenge of receiving an acknowledgement from the client.
A specific case: the client has received the data, but the communication channel goes down before it sends the acknowledgement.
Example:
In the case of software updates on clients over the internet, how does the server make sure everything is processed fine?
If you want to go down the "push" path, and you absolutely must know if the update was successful, then you have to build your service and clients in such a way that you do know.
Basically what you need to do is build a small protocol so that information is transmitted no matter the failures of the communication channel. This means two things:
Your service does re-transmissions;
Your clients can deal with duplicate messages;
For example:
1. service pushes a message, client acknowledges => all good;
2. service pushes a message, the connection goes down, the message is lost. The client does not acknowledge since it never got the message => service pushes that same message once again at some later time. Now hopefully you get to case 1.
3. service pushes a message, client acknowledges but the connection fails and the service does not receive the acknowledgement => similar to 2, so the service pushes that same message once again at some later time and now the client receives the same message twice. It must ignore the second message but still needs to send an acknowledgement so the service does not send it a third, fourth, ... nth time;
And so on and so forth...
This is a high level description of what TCP does, for example. TCP is a reliable protocol over an unreliable network. It handles dropped packets, duplicated packets, etc.
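The retransmit-plus-deduplicate idea above can be sketched in a few lines. Everything here is illustrative: `Channel`, `DedupClient` and `RetryingSender` are names I invented, the channel is an in-process stand-in for the network, and a real sender would wait (with some backoff) between attempts instead of retrying immediately.

```java
import java.util.HashSet;
import java.util.Set;

// Client side: remembers the ids it has already applied, so duplicates are harmless.
final class DedupClient {
    private final Set<String> seen = new HashSet<>();
    private int applied = 0;

    /** Returns the acknowledgement; duplicates are acknowledged but not re-applied. */
    boolean receive(String messageId, String payload) {
        if (seen.add(messageId)) {
            applied++; // first time we see this id: apply the payload
        }
        return true;   // always acknowledge, even for a duplicate
    }

    int appliedCount() {
        return applied;
    }
}

// Hypothetical unreliable link: returns true only if the acknowledgement came back.
interface Channel {
    boolean deliver(String messageId, String payload);
}

final class RetryingSender {
    /** Re-sends the same message id until acknowledged; returns attempts used, or -1. */
    static int sendUntilAcked(Channel channel, String messageId, String payload, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (channel.deliver(messageId, payload)) {
                return attempt;
            }
            // A real service would sleep or reschedule here before retrying.
        }
        return -1;
    }
}
```

The important design choice is that the message id stays the same across retries; that is what lets the client recognise the second copy in case 3.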
Now, that would be pushing. A simpler alternative would be to use "pull" instead. The clients periodically pull the updates from the server. This is simpler to implement (the download is successful if it worked, otherwise you try again later) but it's not without its gotchas, like for example:
controlling when clients start to pull data from the service. You can't just have them all update at the same time or you might overload the server. Clients should first ask the server if it's OK to update now or come back later when the service is not so busy;
are you downloading upgrades in the background, from user devices? Data charges might apply so maybe it's better to ask the user if they want the update now or later instead of doing it behind the scenes;
updating in the background, even if there is no problem with data charges, might still consume bandwidth when the client needs that bandwidth for something else;
And so on and so forth...
The thing is, this is a large topic, with general solutions that might not apply in particular situations. But it is not a new topic. Others have had these issues before. Consider for example Windows updates, how each PC's OS updates itself. Something similar happened a while ago when thick clients needed updates. The world moved to thin clients but now thick clients are making a comeback. Have a look at how these issues are solved; you will find useful information online.
I do not think there is a way to do that. I believe you are asking for one of the following reasons:
1) If you are asking because you are sending a lot of data and your client denies receiving it, perhaps you can paginate it. That way you will know when the last page was accessed. You can even go one step further and put very little data on your last page; that way you can be sure that the last page was requested.
2) If you are genuinely concerned about ensuring that they receive the entire data, how about suggesting that they call a second web service which returns the checksum for the data, and that they compare it with the checksum of what they received?
Assuming that your web service is RESTful, your server should be stateless. The client should make sure it receives the data properly.
You could define a service to get the hash value of the data, followed by the request to receive the data itself. The client can check after the download whether the hash value of the downloaded data corresponds to the value received by the first call.
Amongst others, you could use MD5, SHA-1 and SHA-256 in standard Java, as described in the Oracle documentation. This will calculate the hash value of the data on the server side.
Assuming you use JavaScript on the client side, there are many possibilities to calculate the hash code using the same algorithms (jsSHA, for example).
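For the Java side, a small sketch of the hash calculation using `java.security.MessageDigest`. The class and method names (`Checksums`, `sha256Hex`, `matches`) are my own; how the expected hash reaches the client (the separate "get hash" service) is assumed and not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Checksums {
    /** Returns the SHA-256 digest of the data as a lowercase hex string. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** True if the downloaded bytes hash to the value announced by the server. */
    static boolean matches(byte[] downloaded, String expectedHex) {
        return sha256Hex(downloaded).equalsIgnoreCase(expectedHex);
    }

    public static void main(String[] args) {
        byte[] data = "<update version=\"2\"/>".getBytes(StandardCharsets.UTF_8);
        String announced = sha256Hex(data);            // what the hash service would return
        System.out.println(matches(data, announced));  // client-side check after download
    }
}
```

If the check fails, the client simply downloads again; the server stays stateless.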
I hope it helps.
I am implementing sending of browser push notifications via Google Cloud Messaging and Firefox Push Notification System. For this, we have to make HTTP Post requests to GCM and FPNS.
To make an HTTP request to GCM/FPNS we need the user registration IDs. Using JavaScript, we collect the registration IDs and store them in Cassandra. Each record contains user registration information (registration ID and browser type).
When we make an HTTP request to GCM/FPNS we should send registration IDs along with the request to GCM/FPNS based on browser type (if user registration ID belongs to Chrome we will make GCM request otherwise FPNS request). For example, if we have 10,000 records we should make around 10,000 requests to FPNS/GCM.
Once GCM/FPNS receives the user registration IDs, it will send a push notification to the browser. In browser, we have JavaScript code (Service Worker) to handle the notification event.
For the above requirement, a synchronous servlet architecture is not good enough, because processing 10,000 records may take, say, 10 to 15 minutes even with multithreading. It may cause Tomcat memory leaks and an out-of-memory exception.
When I searched online, people suggested an asynchronous servlet architecture. Once we take the request from the client to send the notification, we have to respond immediately (something like 200 OK, added to queue), and the request should also be added to a message queue (JMS). From JMS we use multithreading to make asynchronous HTTP requests.
I am not finding the correct way of doing this. Can you suggest a way of implementing this functionality (Architecture Design and control flow)?
Short of changing to something like PubNub, I would create a worker queue. This could be done with JMS or just a shared Queue (search for producer/consumer). JMS would be, in my opinion, the easiest though it gets harder to distribute in a cluster.
Basically you could continue to have a synchronous servlet - it would take the message, put it on the queue, and return the 200. Placing a message on the queue would involve very minimal blocking - a couple of milliseconds at most.
As you indicated, on the queue consumer side you would then have to handle many requests. Depending on the latency requirements of your system you may need to thread or off load that. It really depends on how fast you need to send the messages.
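To make the "shared Queue" variant concrete, here is a rough producer/consumer sketch using only `java.util.concurrent`. `NotificationDispatcher` and the `sender` callback are hypothetical names; the callback is where the HTTP POST to GCM/FPNS would go, which I have left out. The queue is in memory only, so unlike JMS or SQS it loses pending work if the JVM dies.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class NotificationDispatcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;

    /** Starts workerCount consumer threads that hand each queued id to sender. */
    NotificationDispatcher(int workerCount, Consumer<String> sender) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> drain(sender));
        }
    }

    /** Producer side: called from the servlet thread; returns immediately. */
    void enqueue(String registrationId) {
        queue.add(registrationId);
    }

    /** Interrupts the workers; ids still in the queue are dropped. */
    void shutdown() {
        workers.shutdownNow();
    }

    private void drain(Consumer<String> sender) {
        try {
            while (true) {
                String registrationId = queue.take(); // blocks until work arrives
                try {
                    sender.accept(registrationId);
                } catch (RuntimeException e) {
                    // A real system would log this and retry or record the failure.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown requested
        }
    }
}
```

The servlet would call `enqueue(...)` for each registration ID and return the 200 right away; the number of workers controls how many outgoing requests are in flight at once.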
For a totally different architecture, you could consider a "queue in the cloud". I've used Amazon SQS for things like this. You wouldn't even have a servlet - the message would go straight to SQS and then something else would pull it off and process it.
For reference I don't work for Amazon or PubNub.
I've been through different questions about this topic, however, none of them have cleared my doubts on the best approach notifying the client side of a server-client IM app.
The Problem:
The whole problem is how to notify the client application of updates. I've already seen the following approaches:
1. Client keeps checking for updates: From time to time, the client app checks the server to see whether there are updates for that specific user.
Problem: it does not perform well at all. Suppose you have one million users and each of them checks for new updates every second. The server would have to deal with one million requests per second. That won't work.
2. Client app opens a socket: The client app opens a socket and sends its address to the server. The server, in turn, persists this information and connects to the socket whenever it needs to notify the client of an update.
Problem: Often the client will be behind a NAT, so the IP address it has access to is in a non-visible (private) range. In order to send messages to this client, port forwarding would have to be configured on the NAT, which can't be done.
Regardless of the technology, I think this approach will always be used; however, I have no idea how the problem described above can be solved.
3. Google Cloud Messaging (GCM): use the GCM service to notify the client of any update. Problem: It doesn't seem right to use a third-party server to handle the IM, and it raises concerns about the scalability of the system. When the number of messages and users increases exponentially, it seems that the service will go down. Besides that, passing the information through two servers before delivering it to the targets seems to just add bottlenecks to the process.
4. A combination of 2 and 3: use GCM to reach the client when the last persisted address is no longer available.
Problem: same as described in 2.
5. XMPP: I've seen many answers recommending XMPP for IM applications; however, XMPP is a protocol, as far as I've found on the web. I don't see how it can solve the problem described in 2, for instance.
Given the options above, can someone tell me which line I should try to follow? Which one of these approaches has the best chance of success?
Thank y'all in advance.
Use Google Cloud Messaging. Contrary to what you stated, this service is built to scale to billions of users; it will generally not introduce performance bottlenecks.
What you basically want to do is use the messaging service to wake up devices. If you insist, you can then still use your client-server approach, and thus your own protocol, to have the client look up new messages from the backend.
When building a server, one sometimes performs asynchronous tasks from client to server (where the server responds to the client at some later time), or the server needs to send the client a message.
Now if the client is listening at all times (meaning polling), it takes a lot of resources, which is problematic.
Here is where I assume the operating system steps in and assumes the role of polling the appropriate port, and lets the application know using the appropriate event (the application subscribes using the OS API).
Am I right in my assumptions?
How do I subscribe to a port using the OS's API? (Let's say Android for the sake of argument.)
How does a message from server to client work exactly?
And how does the server know the client's IP at all times?
I have seen many questions in the subject, but wasn't able to figure out the big picture
Edit:
I am using GCM on Android, but I have seen other apps that do not use it and still manage to do it right. Also, it's a more general question as to what the right approach is in Java, regardless of the operating system it runs on (Ubuntu, Windows, Android, etc.)
Totally right - polling is typically a waste of resources. Until recently, many apps would either keep a socket open and poll every few minutes to keep it alive, or make periodic HTTP calls to a server.
Nowadays, Google Cloud Messaging is used by most apps to push data instead of constantly polling. As you correctly guessed, this is implemented by maintaining a persistent connection with Google's servers. The advantage of this is that it's very efficient for battery life, and that all apps can use this one resource to send push notifications, instead of each app having to poll a different server or create its own persistent connection.
The idea is that you send requests to GCM from your server (this can be in response to user activity, etc), which sends it to all of the client's devices. You can either send a message with a small payload (up to 4kb) or a "send-to-sync" message, which tells an app to contact the server (e.g. to sync new data from the server after user changes).
here is where I assume the operating system steps in and assumes the role of polling for the appropriate port, and letting the application know using the appropriate event (the application subscribes using the OS API)
GCM pushes messages to clients, so there isn't active waiting like you'd see in a simple polling system.
how does a message from server to client work exactly? and how does the server know the client's IP at all times?
There's no need for servers to know the client IP, as any online Android device will typically maintain a connection with GCM. Targeting specific users is done via User Notifications.
(Oh, and I realize that your question is more general than just Android, which I have more experience in, but iOS has a similar system in place. Some developers I've met like to use Parse for managing push notifications).
I have an app which will generate 5 - 10 new database records in one host each second.
The records don't need any checks. They just have to be recorded in a remote database.
I'm using Java for the client app.
The database is behind a server.
Sending the data can't make the app wait. So sending each single record to the remote server, at least synchronously, is probably not a good idea.
Sending data must not fail. My app doesn't need an answer from the server, but it has to be 100% secure that it arrives at the server correctly (which should be guaranteed using for example http url connection (TCP) ...?).
I thought about a few approaches for this:
Run the send data code in separate thread.
Store the data only in memory and send to database after certain count.
Store the data in a local database and send / pulled by the server by request.
All of this makes sense, but I'm a noob at this, and maybe there's some standard approach that I'm missing and that makes things easier. I'm not sure which way to go.
Your requirements aren't very clear. My best answer is to go through your question, and try to point you in the right direction on a point-by-point basis.
"The records don't need any checks," and "My app doesn't need an answer, but it has to be 100% secure that it arrives at the server correctly."
How exactly are you planning on the client knowing that the data was received without the server sending a response? You should always plan to write exception handling into your app, and deal with a situation where the client's connection, or the data it sends, is dropped for some reason. These two statements you've made seem to be in conflict with one another; you don't need a response, but you need to know that the data arrives? Is your app going to use a crystal ball to divine confirmation of the data being received? (If so, please send me such a crystal ball - I'd like to use it to short the stock market.)
"Run the send data code in a separate thread," and "store the data in memory and send later," and "store the data locally and have it pulled by the server", and "sending data can't make my app wait".
Ok, so it sounds like you want non-blocking I/O. But the reality is, even with non-blocking I/O it still takes some amount of time to actually send the data. My question is, why are you asking for non-blocking and/or fast I/O? If data transfers were simply extremely fast, would it really matter if it wasn't also non-blocking? This is a design decision on your part, but it's not clear from your question why you need this, so I'm just throwing it out there.
As far as putting the data in memory and sending it later, that's not really non-blocking, or multi-tasking; that's just putting off the work until some future time. I consider that software procrastination. This method doesn't reduce the amount of time or work your app needs to do in order to process that data, it just puts it off to some future date. This doesn't gain you anything unless there's some benefit to "batching" data sending into large chunks.
The in-memory idea also sounds like a temporary buffer. Many of the I/O stream implementations are going to have a buffer built in, as well as the buffer on your network card, as well as the buffer on your router, etc., etc. Adding another buffer in your code doesn't seem to make any sense on the surface, unless you can justify why you think this will help. That is, what actual, experienced problem are you trying to solve by introducing a buffer? Also, depending on how you're sending this data (i.e. which network I/O classes you choose) you might get non-blocking I/O included as part of the class implementation.
Next, as for sending the data on a separate thread, that's fine if you need non-blocking I/O, but (1) you need to justify why that's a good idea in terms of the design of your software before you go down that route, because it adds complication to your app. Unless it solves a specific, real problem (e.g. you have a UI in your app that shouldn't get frozen/unresponsive due to pending I/O operations), it's just added complication and you won't get any added performance out of it. (2) There's a common temptation to use threads to, again, basically procrastinate work. Putting the work off onto another thread doesn't reduce the total amount of work needing to be done, or the total amount of I/O your app will consume in order to accomplish its function - it just puts it off on another thread. There are times when this is highly beneficial, and maybe it's the right decision for your app, but from your description I see a lot of requested features, but not the justification (or explanation of the problem you're trying to solve) that backs up these feature/design choices, which is what should ultimately drive the direction you choose to go.
Finally, as far as having the server "pull" it instead of it being pushed to the server, well, all you're doing here is flipping the roles, and making the server act as a client, and the client the server. Realize that "client" and "server" are relative terms, and the server is the thing that's providing the service. Simply flipping the roles around doesn't really change anything - it just flips the client/server roles from one part of the software to the other. The labels themselves are just that - labels - a convenient way to know which piece is providing the service, and which piece is consuming the service (the client).
"I have an app which will generate 5 - 10 new database records in one host each second."
This shouldn't be a problem. Any decent DB server will treat this sort of work as extremely low load. The bigger concern in terms of speed/responsiveness from the server will be things like network latency (assuming you're transferring this data over a network) and other factors regarding your I/O choices that will affect whether or not you can write 5-10 records per second - that is, your overall throughput.
The canonical, if unfortunately enterprisey, answer to this is to use a durable message queue. Your app would send messages to the queue, and a backend app would receive them and store them in a database. Once the queue has accepted a message, it guarantees that it will be made available to the receiver, even if the sender, receiver, or the queue broker itself crashes.
On my machine, using HornetQ, it takes ~1 ms to construct and send a short text message to a durable queue. That's quick enough that you can do it as part of handling a web request without adding any noticeable additional delay. Any good message queue will support your 10 messages per second throughput. HornetQ has been benchmarked as handling 8.2 million messages per second.
I should add that message queues are not that hard to set up and use. I downloaded HornetQ, and had it up and running in a few minutes. The code needed to create a queue (using the native HornetQ API) and send and receive messages (using the JMS API) is less than a hundred lines.
If you queue the data and send it in a thread, it should be fine if your rate is 5-10 per second and there's only one client. If you have multiple clients, to the point where your database inserts begin to get slow, you could have a problem given your requirement that "sending data must not fail", which is a much more difficult requirement, especially in the face of machine or network failure.
Consider the following scenario. You have more clients than your database can handle efficiently, and one of your users is a fast typist. Inserts begin to back up in memory in their app. They finish their work and shut it down before the last ones are actually uploaded to the database. Or the machine crashes before the data is sent - or while it's sending; or worse yet, the database crashes while it's sending, and due to network issues the client can't really tell that its transaction has not completed.
The easy way to avoid these problems (most of them anyway) is to make the user wait until the data is committed somewhere before allowing them to continue. If you can make the database inserts fast enough then you can stick with a simpler scheme. If not, then you have to be more creative.
For example, you could locally write the data to disk when the user hits submit, and then upload it from another thread. This scenario needs to be smart enough to mark something that is persisted as sent (deleting it would work); and have the ability to re-scan at startup and look for unsent work to send. It also needs the ability to keep trying in the case of network or centralized server failure.
There also needs to be a way for the server side to detect duplicates, because the client machine could send the data and crash before it can mark it as sent, and then send it again upon restart. The same situation could occur if there is a bad network connection: the client could send the data, never receive confirmation from the server, time out, and then end up retrying.
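A rough sketch of that "write locally, upload later, detect duplicates" scheme, under several assumptions of mine: each record is one small file in a dedicated directory, the file name doubles as a unique record id, and `send` stands in for the real network call and returns true only when the server confirmed the record. `DiskOutbox` and `DedupServer` are invented names, and a production version would write to a temporary file and rename it so that a crash cannot leave a half-written record.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

final class DiskOutbox {
    private static final String SUFFIX = ".rec";
    private final Path dir;

    DiskOutbox(Path dir) {
        this.dir = dir;
    }

    /** Persists the record before anything is sent; returns its unique id. */
    String submit(String record) {
        String id = UUID.randomUUID().toString();
        try {
            Files.write(dir.resolve(id + SUFFIX), record.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return id;
    }

    /** Tries to send every record still on disk; run at startup and periodically. */
    int flush(BiPredicate<String, String> send) {
        try {
            List<Path> pending = new ArrayList<>();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + SUFFIX)) {
                for (Path file : files) {
                    pending.add(file);
                }
            }
            int sent = 0;
            for (Path file : pending) {
                String name = file.getFileName().toString();
                String id = name.substring(0, name.length() - SUFFIX.length());
                String record = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                if (send.test(id, record)) {
                    Files.delete(file); // deleting the file is the "mark as sent" step
                    sent++;
                }
            }
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Server side: storing by record id makes a repeated upload a harmless no-op.
final class DedupServer {
    private final Map<String, String> stored = new ConcurrentHashMap<>();

    boolean accept(String id, String record) {
        stored.putIfAbsent(id, record);
        return true; // confirm receipt, even for a duplicate
    }

    int size() {
        return stored.size();
    }
}
```

Because the id is generated once at submit time and reused on every retry, the crash-after-send and lost-confirmation cases both collapse into "server sees the same id twice", which it can ignore.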
If you don't want the client app to block, then yes, you need to send the data from a different thread.
Once you've done that, then the only thing that matters is whether you're able to send records to the database at least as fast as you're generating them. I'd start off by getting it working sending them one-by-one, then if that isn't sufficient, put them into an in-memory queue and update in batches. It's hard to say more, since you don't give us any idea what is determining the rate at which records are generated.
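If one-by-one turns out not to be enough, the in-memory queue with batched updates could look roughly like this. `BatchingSender` and `batchWriter` are hypothetical names; `batchWriter` is where a single multi-row insert (or one request carrying several records) would go. Records waiting in the queue are lost if the process dies, so this addresses throughput, not the "must not fail" requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BatchingSender {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;
    private final Consumer<List<String>> batchWriter;

    BatchingSender(int maxBatch, Consumer<List<String>> batchWriter) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be at least 1");
        }
        this.maxBatch = maxBatch;
        this.batchWriter = batchWriter;
    }

    /** Called by the application thread; never blocks on the network. */
    void add(String record) {
        queue.add(record);
    }

    /**
     * Waits up to the timeout for one record, then takes whatever else is already
     * queued (up to maxBatch in total) and writes it in one call.
     * Returns the batch size, or 0 if nothing arrived in time.
     */
    int sendNextBatch(long timeout, TimeUnit unit) throws InterruptedException {
        String first = queue.poll(timeout, unit);
        if (first == null) {
            return 0;
        }
        List<String> batch = new ArrayList<>(maxBatch);
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1);
        batchWriter.accept(batch);
        return batch.size();
    }
}
```

A background thread would call `sendNextBatch(...)` in a loop. At 5-10 records per second most batches will be small, which is a hint that the one-by-one version may already be sufficient.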
You don't say how you're writing to the database... JDBC? ORM like Hibernate? But the principles are the same.