One way of ensuring an IMAP client is in sync with its server is to leverage the SEEN flag (e.g., Library for IMAP IDLE).
I have not yet used this myself, but I was wondering if setting the SEEN flag basically sets the message to "read" on the server.
If so, this is obviously a problem when there are multiple readers involved, or when the user logs into the server directly (e.g., logs into their Gmail account) and reads the message there (so that it is marked read and thus flagged as SEEN).
Or, I could be misunderstanding this completely and SEEN is something that is unique between a particular client and the server. However, it is not clear how state would be maintained in that case.
"Leveraging the SEEN flag" sounds like a bad way to synchronize with the server. As you surmise, setting the SEEN flag basically sets the message to "read" on the server. All the other IMAP clients will see that the message has been read. The flag is not "private" between the server and each client. Your client should not mark the message SEEN unless the user has seen it.
To synchronize, you need to keep track of the UIDs of the messages your client has already seen, and compare the list with the ones available on the server whenever you poll the folder. You then locally discard ones that aren't on the server anymore (they're messages that have been deleted from other clients) and download the ones you didn't have in your local list (they're new messages).
It gets more complicated if you want to be robust and handle the case where the server has forgotten the UIDs of all messages and rebuilt the folder with new UIDs (this can happen if the index is corrupted and rebuilt on the server, the server software is changed, the mailbox is migrated to a different hosting provider, etc.; IMAP signals it through the folder's UIDVALIDITY value), but that's the basic idea.
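As a rough illustration of that bookkeeping, here is a sketch using JavaMail's UIDFolder interface. How the known-UID set is persisted between runs is left out, and discarding everything when UIDVALIDITY changes is the simplest possible recovery strategy, not the only one.

```java
import java.util.HashSet;
import java.util.Set;
import javax.mail.*;

public class UidSync {
    private long knownUidValidity = -1;
    private final Set<Long> knownUids = new HashSet<>(); // persist this between runs

    public void sync(Folder folder) throws MessagingException {
        UIDFolder uidFolder = (UIDFolder) folder; // IMAP folders implement UIDFolder

        // If UIDVALIDITY changed, every stored UID is meaningless: start over.
        if (uidFolder.getUIDValidity() != knownUidValidity) {
            knownUids.clear();
            knownUidValidity = uidFolder.getUIDValidity();
        }

        Set<Long> serverUids = new HashSet<>();
        for (Message msg : folder.getMessages()) {
            serverUids.add(uidFolder.getUID(msg));
        }

        // On the server but not known locally: new messages to download.
        Set<Long> toDownload = new HashSet<>(serverUids);
        toDownload.removeAll(knownUids);

        // Known locally but gone from the server: deleted elsewhere, so discard.
        knownUids.retainAll(serverUids);
        knownUids.addAll(toDownload);

        // ... fetch the messages in toDownload and update local storage ...
    }
}
```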
I have a web service on my server that pushes XML data to clients communicating with it over the Internet.
In these cases, the challenge is receiving an acknowledgement from the client. A specific case: the client has received the data, but the communication channel goes down before it can send the acknowledgement.
Example:
In the case of software updates pushed to clients over the Internet, how does the server make sure everything was processed correctly?
If you want to go down the "push" path, and you absolutely must know whether the update was successful, then you have to build your service and clients in such a way that you do know.
Basically what you need to do is build a small protocol so that information is transmitted no matter the failures of the communication channel. This means two things:
Your service does re-transmissions;
Your clients can deal with duplicate messages;
For example:
service pushes a message, client acknowledges => all good;
service pushes a message, the connection goes down, the message is lost. The client does not acknowledge since it never got the message => service pushes that same message once again at some later time. Now hopefully you get to case 1.
service pushes a message, client acknowledges, but the connection fails and the service does not receive the acknowledgement => similar to 2, so the service pushes that same message once again at some later time, and now the client receives the same message twice. It must ignore the second message but still needs to send an acknowledgement so the service does not send it a third, fourth, ..., nth time;
And so on and so forth...
This is a high level description of what TCP does, for example. TCP is a reliable protocol over an unreliable network. It handles dropped packets, duplicated packets, etc.
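To make this concrete, here is a bare-bones sketch of the client side of such a protocol. The message ID scheme, the ACK format, and the in-memory set of processed IDs (which would have to be persisted in real code) are all assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Client-side handler for an at-least-once delivery protocol.
// The service retries until it sees an acknowledgement, so the client
// must (a) always acknowledge and (b) tolerate duplicate messages.
public class AtLeastOnceClient {
    private final Set<String> processedIds = new HashSet<>(); // persist in real code

    // Returns the acknowledgement to send back for this message.
    public String onMessage(String messageId, String payload) {
        if (processedIds.add(messageId)) {
            process(payload); // first time we've seen this ID
        }
        // Duplicate or not, acknowledge; otherwise the service keeps resending.
        return "ACK " + messageId;
    }

    private void process(String payload) { /* apply the update */ }
}
```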
Now, that would be pushing. A simpler alternative would be to use "pull" instead: the clients periodically pull the updates from the server. This is simpler to implement (the download is successful if it worked; otherwise you try again later) but it's not without its gotchas, for example:
controlling when clients start to pull data from the service. You can't just have them all update at the same time or you might overload the server. Clients should first ask the server if it's OK to update now, or come back later when the service is not so busy;
are you downloading upgrades in the background, on user devices? Data charges might apply, so maybe it's better to ask users whether they want the update now or later instead of doing it behind the scenes;
updating in the background, even if there is no problem with data charges, might still consume bandwidth that the client needs for something else;
And so on and so forth...
The thing is, this is a large topic, with general solutions that might not apply to particular situations. But it is not a new topic; others have had these issues before. Consider for example Windows updates, and how each PC's OS updates itself. Something similar happened a while ago when thick clients needed updates; the world moved to thin clients, but now thick clients are making a comeback. Have a look at how these issues were solved; you will find useful information online.
I do not think there is a way to do that. I believe you are asking for one of the following reasons:
1) If you are asking because you are sending a lot of data and your clients deny receiving it, perhaps you can paginate it. That way you will know when the last page was accessed. You can even go one step further and put very little data on the last page, so that you can be sure the last page is requested.
2) If you are genuinely concerned about ensuring that clients receive the entire data, suggest they access a second web service which returns the checksum for the data, and have them compare it against what they downloaded.
Assuming that your web service is RESTful, your server should be stateless. The client should make sure it receives the data properly.
You could define a service to get the hash value of the data, followed by the request to receive the data itself. The client can check after the download whether the hash value of the downloaded data corresponds to the value returned by the first call.
Amongst others, you could use MD5, SHA-1 and SHA-256 in standard Java, as described in the Oracle documentation. This calculates the hash value of the data on the server side.
Assuming you use JavaScript on the client side, there are many libraries that can calculate the hash using the same algorithms (jsSHA, for example).
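On the Java side, the server's hash calculation is only a few lines with java.security.MessageDigest; the choice of SHA-256 and of a hex encoding here is just one reasonable convention, not a requirement.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DataHash {
    // Returns the SHA-256 hash of the payload as lowercase hex,
    // for the client to compare against after downloading the data.
    public static String sha256Hex(byte[] payload) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(payload)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```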
I hope it helps.
I have an app which will generate 5-10 new database records per second on a single host.
The records don't need any checks. They just have to be recorded in a remote database.
I'm using Java for the client app.
The database is behind a server.
Sending the data can't make the app wait, so sending each record to the remote server synchronously is probably not an option.
Sending data must not fail. My app doesn't need an answer from the server, but it has to be 100% secure that it arrives at the server correctly (which should be guaranteed by using, for example, an HTTP URL connection over TCP...?).
I thought about a few approaches for this:
Run the send-data code in a separate thread.
Store the data only in memory and send it to the database once a certain count accumulates.
Store the data in a local database and have it sent to, or pulled by, the server on request.
All of this makes sense, but I'm a noob at this, and maybe there's some standard approach I'm missing that makes things easier. I'm not sure which way to go.
Your requirements aren't very clear. My best answer is to go through your question, and try to point you in the right direction on a point-by-point basis.
"The records don't need any checks," and "My app doesn't need an answer, but it has to be 100% secure that it arrives at the server correctly."
How exactly are you planning on the client knowing that the data was received without sending a response? You should always plan to write exception handling into your app, and deal with situations where the client's connection, or the data it sends, is dropped for some reason. These two statements seem to be in conflict with one another: you don't need a response, but you need to know that the data arrives? Is your app going to use a crystal ball to divine confirmation of the data being received (if so, please send me such a crystal ball - I'd like to use it to short the stock market)?
"Run the send data code in a separate thread," and "store the data in memory and send later," and "store the data locally and have it pulled by the server", and "sending data can't make my app wait".
Ok, so it sounds like you want non-blocking I/O. But the reality is, even with non-blocking I/O it still takes some amount of time to actually send the data. My question is, why are you asking for non-blocking and/or fast I/O? If data transfers were simply extremely fast, would it really matter if it wasn't also non-blocking? This is a design decision on your part, but it's not clear from your question why you need this, so I'm just throwing it out there.
As far as putting the data in memory and sending it later, that's not really non-blocking, or multi-tasking; that's just putting off the work until some future time. I consider that software procrastination. This method doesn't reduce the amount of time or work your app needs to do in order to process that data, it just puts it off to some future date. This doesn't gain you anything unless there's some benefit to "batching" data sending into large chunks.
The in-memory idea also sounds like a temporary buffer. Many of the I/O stream implementations are going to have a buffer built in, as well as the buffer on your network card, as well as the buffer on your router, etc., etc. Adding another buffer in your code doesn't seem to make any sense on the surface, unless you can justify why you think this will help. That is, what actual, experienced problem are you trying to solve by introducing a buffer? Also, depending on how you're sending this data (i.e. which network I/O classes you choose) you might get non-blocking I/O included as part of the class implementation.
Next, as for sending the data on a separate thread: that's fine if you need non-blocking I/O, but (1) you need to justify why that's a good idea in terms of the design of your software before you go down that route. It adds complication to your app, so unless it solves a specific, real problem (i.e. you have a UI in your app that shouldn't become frozen/unresponsive due to pending I/O operations), it's just added complication, and you won't get any added performance out of it. (2) There's a common temptation to use threads to, again, basically procrastinate work. Putting the work off onto another thread doesn't reduce the total amount of work needing to be done, or the total amount of I/O your app will consume in order to accomplish its function - it just puts it off on another thread. There are times when this is highly beneficial, and maybe it's the right decision for your app, but from your description I see a lot of requested features, and not the justification (or explanation of the problem you're trying to solve) to back up these feature/design choices, which is what should ultimately drive the direction you choose.
Finally, as far as having the server "pull" it instead of it being pushed to the server, well, all you're doing here is flipping the roles, and making the server act as a client, and the client the server. Realize that "client" and "server" are relative terms, and the server is the thing that's providing the service. Simply flipping the roles around doesn't really change anything - it just flips the client/server roles from one part of the software to the other. The labels themselves are just that - labels - a convenient way to know which piece is providing the service, and which piece is consuming the service (the client).
"I have an app which will generate 5 - 10 new database records in one host each second."
This shouldn't be a problem. Any decent DB server will treat this sort of work as extremely low load. The bigger concern in terms of speed/responsiveness from the server will be things like network latency (assuming you're transferring this data over a network) and other factors regarding your I/O choices that will affect whether or not you can write 5-10 records per second - that is, your overall throughput.
The canonical, if unfortunately enterprisey, answer to this is to use a durable message queue. Your app would send messages to the queue, and a backend app would receive them and store them in a database. Once the queue has accepted a message, it guarantees that it will be made available to the receiver, even if the sender, receiver, or the queue broker itself crashes.
On my machine, using HornetQ, it takes ~1 ms to construct and send a short text message to a durable queue. That's quick enough that you can do it as part of handling a web request without adding any noticeable additional delay. Any good message queue will support your 10 messages per second throughput. HornetQ has been benchmarked as handling 8.2 million messages per second.
I should add that message queues are not that hard to set up and use. I downloaded HornetQ, and had it up and running in a few minutes. The code needed to create a queue (using the native HornetQ API) and send and receive messages (using the JMS API) is less than a hundred lines.
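For illustration, here is a minimal sketch of the sending side using the JMS API. The JNDI names ("ConnectionFactory", "queue/records") are assumptions that depend entirely on how your broker is configured.

```java
import javax.jms.*;
import javax.naming.InitialContext;

public class RecordSender {
    public static void main(String[] args) throws Exception {
        // Look up the broker's connection factory and queue via JNDI;
        // these names are placeholders and vary with the broker setup.
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("queue/records");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            // PERSISTENT delivery asks the broker to write the message to
            // durable storage before the send completes.
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);

            producer.send(session.createTextMessage("record-payload"));
        } finally {
            connection.close();
        }
    }
}
```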
If you queue the data and send it in a thread, it should be fine if your rate is 5-10 per second and there's only one client. If you have multiple clients, to the point where your database inserts begin to get slow, you could have a problem, given your requirement that "sending data must not fail" - which is a much more difficult requirement, especially in the face of machine or network failure.
Consider the following scenario. You have more clients than your database can handle efficiently, and one of your users is a fast typist. Inserts begin to back up in memory in their app. They finish their work and shut the app down before the last inserts are actually uploaded to the database. Or the machine crashes before the data is sent - or while it's sending. Or, worse yet, the database crashes while it's receiving, and due to network issues the client can't really tell that its transaction has not completed.
The easy way to avoid these problems (most of them, anyway) is to make the user wait until the data is committed somewhere before allowing them to continue. If you can make the database inserts fast enough, then you can stick with a simpler scheme. If not, then you have to be more creative.
For example, you could write the data to local disk when the user hits submit, and then upload it from another thread. This scheme needs to be smart enough to mark persisted items as sent (deleting them would work), and it needs the ability to re-scan at startup and look for unsent work to send. It also needs the ability to keep trying in the case of network or centralized server failure.
There also needs to be a way for the server side to detect duplicates. Because the client machine could send the data and crash before it can mark it as sent; and then upon restart it would send it again. The same situation could occur if there is a bad network connection. The client could send it and never receive confirmation from the server; time out and then end up retrying it.
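A minimal sketch of that write-then-upload idea follows: each record is persisted to a local spool directory before the app moves on, and a background loop uploads files and deletes them once sent, rescanning at startup. The directory layout, the use of a UUID file name as the server-side duplicate-detection key, and the upload() stub are all assumptions.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.UUID;

public class LocalSpool {
    private final Path spoolDir;

    public LocalSpool(Path dir) throws IOException {
        this.spoolDir = Files.createDirectories(dir);
    }

    // Called on submit: persist first, so a crash cannot lose the record.
    // The UUID in the file name doubles as a dedup key for the server.
    public void enqueue(String record) throws IOException {
        Path tmp = spoolDir.resolve(UUID.randomUUID() + ".tmp");
        Files.writeString(tmp, record);
        // Atomic rename so the uploader never sees a half-written file.
        Path done = spoolDir.resolve(tmp.getFileName().toString().replace(".tmp", ".rec"));
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    // Background thread: rescans on every pass, so unsent work left over
    // from a crash is picked up automatically at startup.
    public void uploadLoop() throws InterruptedException {
        while (true) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir, "*.rec")) {
                for (Path file : files) {
                    String dedupId = file.getFileName().toString();
                    if (upload(dedupId, Files.readString(file))) {
                        Files.delete(file); // deleting the file marks it as sent
                    }
                }
            } catch (IOException e) {
                // Network or disk trouble: leave the files in place, retry later.
            }
            Thread.sleep(5_000);
        }
    }

    // Hypothetical transport: send the record plus its dedup id to the server,
    // which must ignore ids it has already stored.
    private boolean upload(String dedupId, String payload) {
        return false; // replace with a real HTTP/JMS/... call
    }
}
```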
If you don't want the client app to block, then yes, you need to send the data from a different thread.
Once you've done that, then the only thing that matters is whether you're able to send records to the database at least as fast as you're generating them. I'd start off by getting it working sending them one-by-one, then if that isn't sufficient, put them into an in-memory queue and update in batches. It's hard to say more, since you don't give us any idea what is determining the rate at which records are generated.
You don't say how you're writing to the database... JDBC? ORM like Hibernate? But the principles are the same.
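As a sketch of the queue-plus-batching approach using plain JDBC: the app thread drops records into a BlockingQueue and returns immediately, while a single sender thread drains the queue and inserts in batches. The table name, column, batch size, and error handling are assumptions.

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchSender implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final String jdbcUrl;

    public BatchSender(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }

    // The app thread calls this and returns immediately.
    public void submit(String record) { queue.add(record); }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>();
        while (true) {
            try {
                // Block for the first record, then drain whatever else is waiting.
                batch.add(queue.take());
                queue.drainTo(batch, 99); // batches of at most 100
                insertBatch(batch);
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (SQLException e) {
                // The failed batch stays in 'batch' and is retried on the next
                // pass together with newer records; a real app would also
                // consider spilling it to local storage.
            }
        }
    }

    private void insertBatch(List<String> batch) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO records (payload) VALUES (?)")) {
            for (String record : batch) {
                ps.setString(1, record);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```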
I have a constantly-running Java program that needs to send an email whenever it encounters a problem. However it is possible that the mail server it uses could be down at the time it tries to send the email.
What is the best way to ensure that the email will be delivered when the mail server comes back up?
Queue up the requests. Have a separate thread which merely waits for something to enter the queue, then tries to email it. If it fails, it waits a few hours and tries again. Once it sends a message, it goes back to the queue to get the next message.
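A minimal sketch of that queue-and-retry thread, assuming JavaMail (javax.mail); the one-hour retry interval is a placeholder, and the messages are assumed to carry their session/transport configuration already.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.mail.MessagingException;
import javax.mail.Transport;
import javax.mail.internet.MimeMessage;

public class MailerThread extends Thread {
    private final BlockingQueue<MimeMessage> queue = new LinkedBlockingQueue<>();

    public void enqueue(MimeMessage message) { queue.add(message); }

    @Override
    public void run() {
        try {
            while (true) {
                MimeMessage message = queue.take(); // blocks until a message arrives
                while (true) {
                    try {
                        Transport.send(message);
                        break; // sent: go back to the queue for the next one
                    } catch (MessagingException e) {
                        // Mail server probably down: wait, then retry this message.
                        Thread.sleep(60 * 60 * 1000L); // one hour; tune to taste
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutting down
        }
    }
}
```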
Put the email object into a stack or list when it fails to send; when the email server comes back up, pop each email out until the list is empty.
You may want to save the emails in a file, perhaps an XML file, so that should the application crash you won't lose this information.
This file is loaded when the application starts, and everything is kept in memory. While there are pending emails, the app keeps checking every 5 minutes or so; as each email is sent, the XML file is re-saved, so that a crash after sending 3 emails out of 10 won't cause those three to be resent at startup.
But, how you handle that is really going to depend on the specification for how to handle error conditions.
If you go from "forward everything to this SMTP server which is always there" to a situation where you need to handle all the conditions normally handled by a full SMTP server (retry later, retransmit if the connection closed, use MX hosts in their stated order, and similar), you may want to consider simply having an SMTP server inside your client - one that does not accept incoming connections - since this moves all the dirty logic away from your applications.
I believe that the James email server - http://james.apache.org/ - is easily embeddable, but I have not actually tried.
The suggestion of using James is a good one but I've had some issues in the past of James being a bit flaky and needing to be restarted.
You could use something like Quartz to have a scheduler check for messages that need to be sent. If a message can't be sent (e.g. the SMTP server isn't available), then that message is rescheduled to be sent at a later time. You could either have a task per message, or have a persistent task that checks for messages and an available mail server, then sends the messages. The persistent task would give you email batching.
If you are in a Unix/Linux world, then consider the alternative of sending your alerts using syslog and dealing with the generation of emails on that side. For example, rsyslogd has a module called ommail for generating emails natively.
IIRC, there are adapters for log4j and the like that can bridge between the Java and syslog worlds with a minimum of (zero ?) coding.
Apache James - http://james.apache.org/ - will let you run your own mail server as a proxy. Not only that, it is written in 100% Java, so you can figure out what it's doing.
And as an extra bonus, James uses databases to queue the mail, so you can even inject mail directly into the queues by inserting into a database, then leave the whole business of sending the mail up to James.
Architecture:
A bunch of clients send out messages to a server which is behind a VIP. Obviously this server poses an availability risk.
Each client monitors a resource, and the server is responsible for taking action based on what status the majority of the clients report to it - hence the need for only one server/leader.
I am thinking of adding another server as a backup on the VIP, which gets turned on only when the first server fails. However, when the backup comes up it would have no information to process, and would lose time waiting for clients to report, waiting for the required thresholds, etc.
Problem:
What is the best and easiest way to have two servers share client state information with only one receiving client traffic?
Solution1:
I thought of having the server forward client state information to the backup server, so that in the event of a failure, when the backup server comes up, it can take over from there.
Is there any other way to do this? I thought of having a common/shared place to store state information where both servers can read client state information from. But this doesn't work well as the shared space is a single point of failure too.
One option is to use a write-ahead log. Essentially, any modification you make to your state gets sent over to the backup server, which replays the change on its own copy of the state. As long as it can keep up with the streaming log, the backup is always up-to-date.
This is the approach generally used by most databases; if you use one as your backend, you may be able to get support for this with little work.
Be careful to have a plan to recover from communication failure - either save the log to disk and resend the missing portion, or send a snapshot of the state, plus all log entries since the snapshot on reconnect.
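A bare-bones sketch of shipping such a log over a socket follows; the wire format (one length-prefixed UTF string per change) and the apply() stub are assumptions, and the snapshot/resend recovery mentioned above is left out.

```java
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

// Primary side: every state change is shipped to the backup as it is
// applied locally, so the backup can replay it on its own copy.
class LogShipper implements Closeable {
    private final DataOutputStream out;

    LogShipper(String backupHost, int port) throws IOException {
        out = new DataOutputStream(new Socket(backupHost, port).getOutputStream());
    }

    void ship(String changeRecord) throws IOException {
        out.writeUTF(changeRecord); // one log entry per frame
        out.flush();
    }

    @Override public void close() throws IOException { out.close(); }
}

// Backup side: replay every entry as it arrives, staying up-to-date.
class LogReplayer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9090);
             Socket primary = server.accept();
             DataInputStream in = new DataInputStream(primary.getInputStream())) {
            while (true) {
                apply(in.readUTF()); // EOFException signals the primary is gone
            }
        } catch (EOFException e) {
            // Primary disconnected: take over, or wait for a reconnect plus
            // the snapshot/resend recovery described above.
        }
    }

    // Hypothetical: apply one change record to the backup's state copy.
    static void apply(String changeRecord) { /* ... */ }
}
```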
There are various distributed caching products which do the kind of thing you're talking about here. Some are supplied with App Servers, such as WebSphere's dynacache and Object Grid. In fact ObjectGrid can be used in JSE, no need for an App Server.
Those distributed cache products use various push and pull models with pub-sub messaging to achieve consistency across the instances. Working for IBM, I'm a fan of ObjectGrid, but more important, I'm a fan of not reinventing wheels. My take is that this stuff can get quite complex, and hence finding something off-the-shelf might save a load of work - there are links to various open source solutions here.
This is very much dependent on how available your solution needs to be (how many 9's). There is a spectrum of solutions.
A lightweight one could be crafted around Memcache, an extremely fast distributed state facility. As an example, it is used extensively on Google App Engine.
I have written a nice program in Java that connects to a Gmail account and downloads attachments sent to it. Once an attachment has been downloaded, it is marked as read and is never downloaded again. This program will have to run in multiple instances, with each instance downloading unique attachments so that a single attachment is never downloaded twice. The problem is that at the moment, if an attachment is of a decent size, one instance is still downloading it when another instance connects and also starts to download it, before it has been marked as read.
I have tried checking and setting various flags and checking whether the folder is open; nothing seems to work. Any solutions?
Update: Thank you for the quick answers, sadly IMAP is not an option due to other reasons.
Consider using IMAP instead - it is designed for client-server interaction.
From RFC1939 (Post Office Protocol - Version 3):
POP3 is not intended to provide extensive manipulation operations of mail on the server; normally, mail is downloaded and then deleted. A more advanced (and complex) protocol, IMAP4, is discussed in RFC1730.
I don't think POP3 is made for simultaneous access by multiple clients.
Ask yourself this: do I really need multiple processes accessing the same mailbox?
If you do, you'll have to find a way to have these processes communicate to each other.
Use a common database or server process to coordinate actions.
IMAP does have more options, but I'm not sure if you can "lock" a single mail to mark it as being processed.
As the others have mentioned, POP3 isn't really intended for this kind of scenario.
If you absolutely have to use POP3, I'd suggest downloading all the e-mail to an intermediate server which sorts the messages and makes them available for each of the other clients.
It sounds like you're just trying to distribute the processing of the e-mails. If that's the case, you can just have each client connect to your intermediate server to retrieve the next available message.
I'm not sure what your constraints are, but you may even want to consider receiving the attachments some other way besides e-mail. If people are uploading files, you could set up a web form that automatically sends each file to the next available instance of your application for processing.
If you need to stay with a POP3 connection, you could keep a local database of previously downloaded message ids. Then new instances could check against that before downloading again. The best solution is just to use IMAP, though, as IMAP is able to set the read/unread flags before downloading.
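A rough sketch of that local-database idea, using JavaMail's POP3 provider to read each message's UIDL: the flat file of seen UIDs stands in for a real shared database (a plain file won't coordinate concurrent instances by itself), and the host, credentials, and process() stub are placeholders.

```java
import java.nio.file.*;
import java.util.HashSet;
import java.util.Set;
import javax.mail.*;
import com.sun.mail.pop3.POP3Folder;

public class Pop3Dedup {
    public static void main(String[] args) throws Exception {
        Path seenFile = Paths.get("seen-uids.txt");
        Set<String> seen = new HashSet<>();
        if (Files.exists(seenFile)) {
            seen.addAll(Files.readAllLines(seenFile));
        }

        Session session = Session.getInstance(System.getProperties());
        Store store = session.getStore("pop3s");
        store.connect("pop.gmail.com", "user@gmail.com", "password"); // placeholders

        POP3Folder inbox = (POP3Folder) store.getFolder("INBOX");
        inbox.open(Folder.READ_ONLY);
        for (Message msg : inbox.getMessages()) {
            String uid = inbox.getUID(msg);  // the stable POP3 UIDL for this message
            if (seen.add(uid)) {             // skip anything downloaded before
                process(msg);
                Files.write(seenFile, (uid + System.lineSeparator()).getBytes(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        }
        inbox.close(false);
        store.close();
    }

    // Hypothetical: download the attachments, etc.
    static void process(Message msg) { }
}
```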
You could mark the mail as read first, and only then start downloading it.