I would like to inform all logged-in users that the server will shut down. This would be especially useful in an Ajax-heavy application (RIA).
What are the possible solutions? What are the best practice solutions?
There are two possible end scenarios:
Send a text $x to the server, which is then shown to all users. ("The server will not be available for some minutes.")
Send a key $y to the server, which will be used to generate a (custom) text for all users. ("SERVER_SHUTDOWN")
Environment: Tomcat (6/7), Spring 3+
Messaging to users: with polling or pseudo-pushing via an async servlet.
Ideas
1. Context destroy: implementing a custom ServletContextListener's contextDestroyed()
I don't think blocking inside contextDestroyed() is a good solution, because we would have to wait about 5-10 seconds to make sure that all logged-in users receive the message (see the sketch after this list).
2. JMX Beans
This would mean that any server service operation (start, stop) has to invoke a special program which sends the message.
3. Another message queue such as AMQP or ActiveMQ
Same drawback as 2.
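To make idea 1 concrete, here is a minimal sketch, assuming clients poll a notification endpoint; the attribute name and the 10-second wait are only illustrative. contextDestroyed() publishes the shutdown key into the ServletContext and then blocks for one poll cycle, which is exactly the blocking drawback mentioned above (and by that point the container may already have stopped serving requests to the webapp).

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import java.util.concurrent.TimeUnit;

// Sketch of idea 1: publish "SERVER_SHUTDOWN" for the polling endpoint to read,
// then block contextDestroyed() for roughly one poll interval.
public class ShutdownNoticeListener implements ServletContextListener {

    public static final String NOTICE_KEY = "shutdownNotice"; // illustrative attribute name

    public void contextInitialized(ServletContextEvent sce) {
        // nothing to do on startup
    }

    public void contextDestroyed(ServletContextEvent sce) {
        sce.getServletContext().setAttribute(NOTICE_KEY, "SERVER_SHUTDOWN");
        try {
            TimeUnit.SECONDS.sleep(10); // give polling clients one cycle to pick up the notice
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}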
Unless the server shuts down regularly and the shutdown has a significant impact on users (e.g. they will lose any unsubmitted work, think of being halfway through editing a big post on a page), notifying them of the server shutdown won't really be of much benefit.
There are a couple of things you could do.
First, if the server is going to be shut down for planned maintenance, you could include a message on web pages like:
Server will be unavailable Monday 22nd Aug 9pm - 6am for planned
maintenance. Contact knalli#example.com for more information.
Second, before shutting down the server, redirect requests to a static holding page (just change your web server config). This holding page should have information on why the server is down and when it will be available again.
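If changing the web server configuration isn't convenient, roughly the same effect can be approximated inside the application with a servlet Filter. This is only a sketch of that alternative; the "maintenance.mode" system property and the /maintenance.html page are illustrative names, not anything Tomcat or Spring provides.

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

// While the (illustrative) system property "maintenance.mode" is set, send every
// request to a static holding page instead of the application.
public class MaintenanceFilter implements Filter {

    public void init(FilterConfig filterConfig) { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        boolean maintenance = Boolean.getBoolean("maintenance.mode");
        if (maintenance && !request.getRequestURI().endsWith("/maintenance.html")) {
            response.sendRedirect(request.getContextPath() + "/maintenance.html");
        } else {
            chain.doFilter(req, res);
        }
    }

    public void destroy() { }
}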
With both options, it's also important to plan server downtime. It's normal to have maintenance windows outside of normal working hours. Alternatively, if you have more than one server you can cluster them. This allows you to take individual servers out of the cluster to perform maintenance without having any server downtime at all.
Related
We have a Spring Java app using EWS to connect to our on-prem Exchange 2016 server and pull emails via streaming subscriptions. Every 30 minutes a new 30-minute subscription is made (via a new thread). We assume the old connection just expires.
When one instance is running in our environment, it works perfectly fine, but when two instances run, after some time one instance will eventually start throwing an error about
You have exceeded the available concurrent connections for your account. Try again once your other requests have completed.
It seems like an issue where we then run into throttling. I found that the Exchange server's config is:
EWSMaxConcurrency=27, MaxStreamingConcurrency=10,
HangingConnectionLimit=10
Our code previously didn't explicitly close connections or unsubscribe (it was running fine that way with one instance). We tried adding both, but the issue still persists, and we noticed the close method of StreamingSubscriptionConnection throws an error. The team that handles the Exchange server can find errors referencing the exceeded-connection-count error above, but nothing relating to the close-connection error:
...[m.e.w.d.n.StreamingSubscriptionConnection.close(349)]: java.lang.Exception: microsoft.exchange.webservices.data.notification.StreamingSubscriptionConnection
Currently we don't have much ability to make changes on the Exchange server side. I'm not familiar with SOAP messages, but I was planning to look into how to monitor them to see what inbound and outbound messages there are, for some insight.
For the service I set service.setTraceEnabled(true) and service.setTraceFlags(EnumSet.allOf(TraceFlags.class)).
However, I only see trace messages in the console when an email arrives. I don't see any messages during startup when a subscription/connection is created.
Can anyone provide any advice on how I can monitor these subscription-related messages?
I tried using SoapUI but I'm having difficulty applying our server's WSDL. I considered using the Tunnelij plugin for IntelliJ but I'm not too familiar with how to set that up either.
My suspicion is that there is some intermittent latency issue on the Exchange server side, perhaps response messages are not coming back in a timely manner, and this may be screwing things up. I presume that if I monitor these SOAP messages I should see more than 10 requests to subscribe before that error appears.
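For the monitoring part, one starting point is to attach a trace listener to the ExchangeService so that all SOAP requests and responses (including the subscribe and GetStreamingEvents calls) go to your own logger rather than the console, and to make sure tracing is enabled on the same service instance before the subscription is created. This is only a sketch assuming ews-java-api mirrors the EWS Managed API's trace listener hook; the package paths are from ews-java-api 2.0 and may differ in your version.

import microsoft.exchange.webservices.data.core.ExchangeService;             // package paths may vary by version
import microsoft.exchange.webservices.data.core.enumeration.misc.TraceFlags;

import java.util.EnumSet;

// Sketch only: route EWS trace output through a listener instead of stdout.
public class EwsTracing {

    public static void enableTracing(ExchangeService service) {
        service.setTraceEnabled(true);
        service.setTraceFlags(EnumSet.allOf(TraceFlags.class));
        service.setTraceListener((traceType, traceMessage) ->
                // replace with your logging framework; traceType distinguishes
                // requests, responses, HTTP headers, and so on
                System.out.println("[EWS " + traceType + "] " + traceMessage));
    }
}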
The EWS logs on the CAS (Client Access Server) should have details about the throttling issue. Are you using Impersonation in your application? If you are not using Impersonation, the concurrent connections are charged against the account you are connecting with; with Impersonation they are charged against the account you are impersonating. The difference here is that a single user can have no more than 10 streaming subscriptions (unless you modify the web.config), whereas if you use Impersonation you can scale your application to thousands of users; see https://github.com/MicrosoftDocs/office-developer-exchange-docs/blob/main/docs/exchange-web-services/how-to-maintain-affinity-between-group-of-subscriptions-and-mailbox-server.md
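To illustrate the impersonation point in code, this is roughly what the call looks like in ews-java-api; the class and package names here are from memory and should be verified against your version. With this set, the streaming subscription is charged against the impersonated mailbox rather than the single service account.

import microsoft.exchange.webservices.data.core.ExchangeService;                   // paths may vary by version
import microsoft.exchange.webservices.data.core.enumeration.misc.ConnectingIdType;
import microsoft.exchange.webservices.data.misc.ImpersonatedUserId;

// Sketch: impersonate the target mailbox so throttling budgets apply per mailbox,
// not to the one service account shared by both instances.
public class EwsImpersonation {

    public static void impersonate(ExchangeService service, String mailboxSmtpAddress) {
        service.setImpersonatedUserId(
                new ImpersonatedUserId(ConnectingIdType.SmtpAddress, mailboxSmtpAddress));
    }
}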
I am using Elasticsearch 1.5.1 and Tomcat 7. Web application creates a TCP client instance as Singleton during server startup through Spring Framework.
Just noticed that I failed to close the client during server shutdown.
Through analysis with various tools like VisualVM, JConsole, and MAT in Eclipse, it is evident that threads created by the Elasticsearch client are still live even after server (Tomcat) shutdown.
Note: after introducing client.close() via the context listener's destroy method, the threads are killed gracefully.
But my questions here are:
How can I check the memory occupied by these live threads?
What is the memory-leak impact of these threads?
We have got a few OutOfMemoryError: PermGen space errors in PROD. This might be one reason, but I would still like to measure it and provide stats.
Any suggestions/help please.
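One way to get numbers rather than guesses, sketched below for a HotSpot/OpenJDK JVM: enumerate the surviving threads and, where supported, read their cumulative allocation through com.sun.management.ThreadMXBean. Note that this reports bytes allocated over each thread's lifetime, not currently retained memory; for retained sizes (and for PermGen, which is usually a classloader leak pinned by exactly these surviving threads) a heap dump analysed in MAT is still the better tool.

import java.lang.management.ManagementFactory;
import java.util.Map;

// Rough accounting of live threads on a HotSpot JVM. getThreadAllocatedBytes() is
// cumulative allocation, not the thread's current footprint.
public class ThreadFootprint {

    public static void main(String[] args) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            long allocated = bean.isThreadAllocatedMemorySupported()
                    ? bean.getThreadAllocatedBytes(t.getId())
                    : -1;
            System.out.printf("%-45s daemon=%b allocatedBytes=%d%n",
                    t.getName(), t.isDaemon(), allocated);
        }
    }
}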
Typically clients run in a different process than the services they communicate with. For example, I can open a web page in a web browser, and then shut down the webserver, and the client will remain open.
This has to do with the underlying design choices of TCP/IP. Glossing over the details, in most cases a client only detects that its server is gone during the next request to the server. (Again generally speaking) it does not continually poll the server to see if it is alive, nor does the server generally send a "please disconnect" message on shutting down.
The reason that clients don't generally poll servers is because it allows the server to handle more clients. With a polling approach, the server is limited by the number of clients running, but without a polling approach, it is limited by the number of clients actively communicating. This allows it to support more clients because many of the running clients aren't actively communicating.
The reason that servers typically don't send an "I'm shutting down" message is that many times the server goes down uncontrollably (power outage, operating system crash, fire, short circuit, etc.). This means that a protocol which requires such a message will leave the clients in a corrupt state if the server goes down in an uncontrolled manner.
So losing a connection is really a function of a failed request to the server. The client will still typically be running until it makes the next attempt to do something.
Likewise, opening a connection to a server often does nothing most of the time too. To validate that you really have a working connection to a server, you must ask it for some data and get a reply. Most protocols do this automatically to simplify the logic; but if you ever write your own service, remember that if you don't ask for data from the server, even if the API says you have a good "connection", you might not. The API can report back a good "connection" when you have all the stuff configured successfully on your machine. To really know if it works 100% with the other machine, you need to ask for data (and get it).
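A minimal sketch of that kind of validation, assuming a plain TCP socket and a made-up "PING"/any-reply convention (your protocol will have its own equivalent): send a small request and treat any reply within a bounded timeout as proof that the connection really works.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// A connection is only proven alive by a round trip: write a small request and
// wait (bounded by SO_TIMEOUT) for any reply. "PING" is a placeholder, not part
// of any real protocol here.
public final class ConnectionProbe {

    public static boolean isAlive(Socket socket) {
        try {
            socket.setSoTimeout(5000);                      // don't wait forever for the reply
            OutputStream out = socket.getOutputStream();
            out.write("PING\n".getBytes("US-ASCII"));
            out.flush();
            InputStream in = socket.getInputStream();
            return in.read() != -1;                         // any byte back counts as "alive"
        } catch (IOException e) {
            return false;                                   // timeout, reset, closed, ...
        }
    }
}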
Finally servers sometimes lose their clients, but because they don't waste bandwidth chattering with clients just to see if they are there, often the servers will put a "timeout" on the client connection. Basically if the server doesn't hear from the client in 10 minutes (or the configured value) then it closes the cached connection information for the client (recreating the connection information as necessary if the client comes back).
From your description it is not clear which of the scenarios you might be seeing, but hopefully this general knowledge will help you understand why after closing one side of a connection, the other side of a connection might still think it is open for a while.
There are ways to configure the network connection to report closures more immediately, but I would avoid using them, unless you are willing to lose a lot of your network bandwidth to keep-alive messages and don't want your servers to respond as quickly as they could.
I have a Java web app running on Jetty which connects to the server using CometD to receive data and returns after 25s if the server has no data, then reconnects, i.e., long-polling.
I monitor the performance of the server using NewRelic but those long-polling connections skew the performance diagrams.
Is there a way to tell New Relic to actually ignore the time the server is waiting and only show the actual time that the server has been busy? I understand that it is probably impossible to do this on the New Relic side, but I thought there may be some best practices on how to deal with long-polling connections in New Relic.
Any help is appreciated!
You won't be able to just exclude or ignore the time the server is waiting and only show the actual time that the server has been busy, but what you can do is ignore the transaction completely if you do not need to see those metrics. This documentation talks about using New Relic's API for ignoring transactions: https://docs.newrelic.com/docs/java/java-agent-api
CometD sends long polls to a URL that is the base CometD Servlet URL with "/connect" appended, see parameter appendMessageTypeToURL in the documentation.
For example, if you have mapped the CometD Servlet to /cometd/*, then long polls are sent to /cometd/connect.
I don't know NewRelic, but perhaps you can filter out the requests that end in */connect and gather your statistics on the other requests, that now won't be skewed by the long poll timeout.
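Putting the two answers together, here is a sketch of a servlet Filter that calls the New Relic agent API's NewRelic.ignoreTransaction() for requests ending in /connect. The filter name is illustrative, and the URL check has to match however you actually mapped the CometD Servlet.

import com.newrelic.api.agent.NewRelic;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import java.io.IOException;

// Sketch: drop CometD long-poll transactions from New Relic so the 25s waits
// don't skew the response-time charts. Map this filter in front of the CometD servlet.
public class IgnoreLongPollFilter implements Filter {

    public void init(FilterConfig filterConfig) { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String uri = ((HttpServletRequest) req).getRequestURI();
        if (uri.endsWith("/connect")) {
            NewRelic.ignoreTransaction(); // agent API call to exclude this transaction entirely
        }
        chain.doFilter(req, res);
    }

    public void destroy() { }
}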
So I'm working through a bit of a problem, and some advice would be nice. First a little background, please excuse the length.
I am working on a management system that queries network devices via the TL1 protocol. For those unfamiliar with the protocol, the short answer is that it is a "human readable" language that communicates via a text-based IO stream.
I am using Spring and Jsch to open a port to the remote NE (network element), log in, run the command, then close the connection. There are two ways to get into the remote NEs: either directly (via the SSH gateway) if the element has a TCP/IP address (many are OSI only), or through an ems (management system) of some type using what is called a "northbound interface".
Either way, the procedure is the same.
Use Jsch to open a port to the NE or ems.
Send login command for the NE ex. "act-user<tid>:<username>:UniqueId::<password>;"
Send command ex. "rtrv-alm-all:<tid>:ALL:uniqueid::,,,,;"
Retrieve and process results. The results of the above for example might look something like this...
RTRV-ALM-ALL:foo:ALL:uniqueid;
CMPSW205 02-01-11 18:33:05
M uniqueid COMPLD
"01-01-06:MJ,BOARDOUT-ALM,SA,01-10,12-53-58,,:\"OPA_C__LRX:BOARD EXTRACTED\","
;
The ; is important because it signals the end of the response.
Lastly logout, and close the port.
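As a rough sketch of that procedure with Jsch (host, credentials, TIDs and correlation tags are placeholders, and reading until the first ';' is deliberately naive):

import com.jcraft.jsch.Channel;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

import java.io.InputStream;
import java.io.OutputStream;

// Sketch: open an SSH shell to the gateway/ems, log in to the NE, run one command,
// read until the terminating ';', then tear everything down.
public class Tl1Client {

    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("gatewayUser", "ems.example.com", 22); // placeholder host/user
        session.setPassword("secret");                                           // placeholder password
        session.setConfig("StrictHostKeyChecking", "no");                        // for the sketch only
        session.connect(15000);

        Channel channel = session.openChannel("shell");
        InputStream in = channel.getInputStream();
        OutputStream out = channel.getOutputStream();
        channel.connect(15000);

        send(out, "act-user:NE01:admin:1001::secret;");       // login to the NE
        System.out.println(readResponse(in));
        send(out, "rtrv-alm-all:NE01:ALL:1002::,,,,;");        // retrieve all alarms
        System.out.println(readResponse(in));
        send(out, "canc-user:NE01:admin:1003;");               // logout (placeholder syntax)

        channel.disconnect();
        session.disconnect();
    }

    private static void send(OutputStream out, String cmd) throws Exception {
        out.write((cmd + "\r\n").getBytes("US-ASCII"));
        out.flush();
    }

    // Naive: treats the first ';' as the end of the response block.
    private static String readResponse(InputStream in) throws Exception {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
            if (c == ';') {
                break;
            }
        }
        return sb.toString();
    }
}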
With Spring I have been using the ThreadPoolTaskExecutor quite effectively to do this.
Until this issue came up ...
With one particular ems platform (Hitachi) I ran across a roadblock with my approach. This ems handles as many as 80 nodes through it. You connect to the port, then issue a command to login to the ems, then run commands pointing to the various NE's. Same procedure as before, but here is the problem...
After you log in to the ems, the next command, no matter what it is, will take up to 10 minutes to complete. Until that happens, all other commands are blocked. After this initial wait all other commands work quickly. There appears to be no way to defeat this behaviour (my suspicion is that there is some NE auto-discovery happening during this period).
Now the thrust of my question...
So my next approach for this platform would be to connect to the ems, login to it, and keep the connection open, and just pass commands to the various NE's. That would mean a 10 minute delay after the application (web based) first loads, but would be fine after this point.
The problem I have is how best to do this. Having a single text-based iostream for passing this stuff through looks like a large bottleneck, and multiple users will be using the application, so how do I handle multiple commands and responses over this single iostream? I can open a few iostreams (maybe up to 6) on this ems, but that also complicates sorting out what goes where.
Any advice on direction would be appreciated.
Look at using one process per ems so that communication to each ems is separated. This will at least ensure that communications with other ems's are unaffected by the problems with this one.
You're going to have to build some sort of a command queuing system so that commands sent to the Hitachi ems don't block the user interface until they are completed. Either that, or you're going to have to put a 10 minute delay into the client software before they can begin using it, or a 10 minute delay into the part of the interface that would handle the Hitachi.
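One way to realize that queuing idea, sketched here: a single-threaded executor per ems serializes every command over the one open iostream, and callers get a Future back instead of blocking the UI. EmsCommandQueue and the EmsConnection interface are illustrative names, not existing classes.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: all commands for one ems go through one worker thread, so the single
// text-based iostream is never used concurrently. Multiple ems's means multiple queues.
public class EmsCommandQueue {

    /** Illustrative abstraction over the Jsch shell channel kept open to the ems. */
    public interface EmsConnection {
        String sendAndReadUntilSemicolon(String tl1Command) throws Exception;
    }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final EmsConnection connection;

    public EmsCommandQueue(EmsConnection connection) {
        this.connection = connection;
    }

    /** Enqueue a TL1 command; the caller awaits the Future instead of blocking the UI thread. */
    public Future<String> submit(final String tl1Command) {
        return worker.submit(new Callable<String>() {
            public String call() throws Exception {
                return connection.sendAndReadUntilSemicolon(tl1Command);
            }
        });
    }

    public void shutdown() {
        worker.shutdown();
    }
}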
Perhaps it would be a good policy to bring up the connection and immediately send some sort of ping or station keeping idle command - something benign that you don't care about the response, or gives no response, but will trigger the 10 minute delay to get it over with. Your users can become familiar with this 10 minute delay and at least start the application up before getting their coffee or something.
If you can somehow isolate the Hitachi from the other ems's in the application's design, this would really ensure that the 10 minute delay only exists while interfacing with the Hitachi. You can connect and issue a dummy command, and put the Hitachi in some sort of "connecting" state where commands cannot be used until the result comes in, and then you change the status to ready so the user can interact with it.
One other approach would be to develop some sort of middleware component - I don't know if you've already done this. If the clients are all web-based, you could run a communications piece on the webserver which takes connections from the clients and pipes them through one piece on the webserver which communicates with all of the ems's. When this piece starts up on the webserver, it can connect to each ems and send some initial ping command which starts the 10 minute timer. Once this is complete, the piece on the webserver could send keepalive messages every so often, again some sort of dummy command, to keep the socket alive so it wouldn't have to reset and go through the 10-minute wait time again. When the user brings up the website, they can communicate with this middleware server piece which would forward the requests to the appropriate ems and forward the response back to the client -- all through the already open connection.
Context: Master server (Java, TCP) monitoring a list of hosted games (a different machine for the master server and for each hosted game server). Any user can host a game on his PC. Hosted games can last weeks or months.
Need: Knowing when hosted game servers are closed or no longer reachable.
Restriction 1: Can't rely on hosted servers' "gone offline update message", since those messages may never arrive (power down, Internet link cut, etc.)
Restriction 2: I'm not sure about TCP's built-in keep-alive, since it would mean a 24/7 open socket with each hosted server (correct me if I'm wrong)
Any thoughts?
Consider using some kind of heartbeat messages. Those messages ("I'm alive!") are sent regularly, and if the master server doesn't get a heartbeat message from a hosted server for a certain time, it knows that this hosted server is unavailable.
You can even add some status parameters to this message if you need more detailed information from the hosted servers (like 'fully operational', 'going down for maintenance in 5 minutes', etc.)
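A minimal sketch of the master-server side of such a heartbeat scheme (the interval, the grace period and the class names are arbitrary choices): hosted servers report in periodically, and a periodic sweep marks anything that has been silent for a few intervals as offline.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Master-side bookkeeping: hosted servers call heartbeat() (e.g. via a small TCP
// handler, not shown); a sweep marks anything silent for 3 intervals as offline.
public class HeartbeatRegistry {

    private static final long INTERVAL_MILLIS = 30000;  // assumed heartbeat period
    private static final long GRACE_INTERVALS = 3;

    private final Map<String, Long> lastSeen = new ConcurrentHashMap<String, Long>();
    private final ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        sweeper.scheduleAtFixedRate(new Runnable() {
            public void run() {
                sweep();
            }
        }, INTERVAL_MILLIS, INTERVAL_MILLIS, TimeUnit.MILLISECONDS);
    }

    /** Called whenever a heartbeat ("I'm alive!") arrives from a hosted game server. */
    public void heartbeat(String gameServerId) {
        lastSeen.put(gameServerId, System.currentTimeMillis());
    }

    private void sweep() {
        long cutoff = System.currentTimeMillis() - INTERVAL_MILLIS * GRACE_INTERVALS;
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (entry.getValue() < cutoff) {
                lastSeen.remove(entry.getKey());
                System.out.println("Game server " + entry.getKey() + " is considered offline");
            }
        }
    }
}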
It sounds like you're running into what's known as the "Two-Army Problem", the "Coordinated Attack Problem", or, as Andreas D noted, the Two Generals' Problem, which you've identified as your Restriction 1. The idea behind this problem is that two armies want to coordinate an attack on the enemy. They need to both attack at the same time, since each army knows that it will be wiped out if it attacks on its own.
The problem is this: How does an army know their messenger, who is carrying the time intended for the coordinated attack, has reached the other army successfully? Furthermore, how can the second army be sure that the first army knows that they received the message and plans to attack? It's possible that any message between the armies doesn't arrive successfully, and thus the armies can never be sure that they're coordinated properly.
Because of this, the simplest answer may be to simply do a ping on each running game at a scheduled interval, and also whenever a user requests a refresh. You can use this information to populate your master list.
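For the scheduled ping, something as simple as a bounded TCP connect attempt per hosted game is often enough; run it from a scheduled executor (much like the sweep in the heartbeat sketch above) and also whenever a user requests a refresh. The 3-second timeout is an arbitrary choice.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal reachability check: try to open a TCP connection to the hosted game's
// listen port with a short timeout; a failure means "drop it from the master list".
public final class GamePinger {

    public static boolean isReachable(String host, int port) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), 3000); // 3s timeout, arbitrary
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do here
            }
        }
    }
}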