On October 7, 2020 and January 21, 2021, Google introduced unidirectional server streaming and bidirectional WebSocket support, respectively, for Cloud Run. Here are the blog posts:
https://cloud.google.com/blog/products/serverless/cloud-run-now-supports-http-grpc-server-streaming
https://cloud.google.com/blog/products/serverless/cloud-run-gets-websockets-http-2-and-grpc-bidirectional-streams
From the second link:
This means you can now build a chat app on top of Cloud Run using a
protocol like WebSockets, or design streaming APIs using gRPC.
This raises some questions:
How does it work with auto-scaling?
Say we build a chat app and we have WebSocket connections distributed across multiple instances and need to push a message to all of them. How would we do that?
Is it okay for the instances to keep state now (the WebSocket connection)? What are the consequences of this?
What I am trying to ask is: how do we build a scalable chat application with Cloud Run and the other managed tools available in Google Cloud, with features like private messages and public chat rooms?
How does it work with auto-scaling?
Each WebSocket connection consumes one of the 250 available concurrent connections per container instance. (250 is subject to change in the future; the limit was 80 and was recently increased to 250.) This limit is documented in the Google Cloud Run limits doc. When all 250 connections of a container instance are occupied, another container instance will start automatically.
Say we build a chat app and we have WebSocket connections distributed across multiple instances and need to push a message to all of them. How would we do that?
You would have to use some form of central datastore or pub/sub to solve that problem. For example, Google provides Cloud Pub/Sub, or you can set up a Redis instance and use Redis's Pub/Sub feature. There are many ways to tackle this problem.
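For example, here is a rough sketch of that fan-out pattern with Redis Pub/Sub and the Jedis client (the channel name, Redis host and the jakarta.websocket session registry are assumptions for illustration, not part of any Cloud Run API):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

import jakarta.websocket.Session;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ChatFanOut {
    // WebSocket sessions connected to *this* instance only.
    private static final Set<Session> LOCAL_SESSIONS = ConcurrentHashMap.newKeySet();

    // Each instance subscribes to the shared channel and relays every
    // published message to its own local connections.
    public static void startSubscriber(String redisHost) {
        new Thread(() -> {
            try (Jedis jedis = new Jedis(redisHost, 6379)) {
                jedis.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        for (Session s : LOCAL_SESSIONS) {
                            try {
                                s.getBasicRemote().sendText(message);
                            } catch (IOException e) {
                                LOCAL_SESSIONS.remove(s); // drop dead sessions
                            }
                        }
                    }
                }, "chat-room"); // subscribe() blocks, hence the dedicated thread
            }
        }, "redis-subscriber").start();
    }

    // Any instance can broadcast by publishing; every instance (including this
    // one) receives it via the subscriber above. A real app would reuse a pool
    // instead of opening a new connection per publish.
    public static void broadcast(String redisHost, String message) {
        try (Jedis jedis = new Jedis(redisHost, 6379)) {
            jedis.publish("chat-room", message);
        }
    }
}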
Is it okay for the instances to keep state now (the WebSocket connection)? What are the consequences of this?
It is always safe to keep state in a container, but you need to make sure the container can be terminated at any time when there isn't an active connection. Also,
according to the docs, Google Cloud Run will terminate all HTTP requests (including WebSockets) at the request timeout, which has a default value of 5 minutes and can be increased to 15 minutes. Therefore, your WebSocket connections will likely be dropped after 15 minutes at the latest, and you should have logic in place to handle automatic reconnection. The Google Cloud Run docs explicitly mention this limit.
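As an illustration, a minimal client-side reconnect loop with the JDK 11 java.net.http.WebSocket client might look like the sketch below (the endpoint URL and backoff are placeholders; real code would add jitter and better error handling):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;

public class ReconnectingClient implements WebSocket.Listener {
    // Placeholder endpoint, not a real service.
    private static final URI ENDPOINT = URI.create("wss://my-service-abc123.a.run.app/ws");

    public static void main(String[] args) throws InterruptedException {
        connect();
        new CountDownLatch(1).await(); // keep the demo process alive
    }

    static void connect() {
        HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(ENDPOINT, new ReconnectingClient())
                .join();
    }

    @Override
    public void onOpen(WebSocket webSocket) {
        webSocket.request(1); // ask for the first message
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        System.out.println("received: " + data);
        webSocket.request(1);
        return null;
    }

    // Cloud Run will eventually close the connection at the request timeout,
    // so both close and error trigger a simple delayed reconnect.
    @Override
    public CompletionStage<?> onClose(WebSocket webSocket, int statusCode, String reason) {
        scheduleReconnect();
        return null;
    }

    @Override
    public void onError(WebSocket webSocket, Throwable error) {
        scheduleReconnect();
    }

    private void scheduleReconnect() {
        new Thread(() -> {
            try {
                Thread.sleep(2_000); // fixed backoff for the sketch
            } catch (InterruptedException ignored) { }
            connect();
        }).start();
    }
}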
Related
I'm trying to create an app with a notification service that fires whenever a call is made to an API.
Is it possible for me to create a logger on port 8080 so that, when the app is run on the server, it listens to an API running on another server?
Both applications are run on a local machine for testing purposes using Docker.
So far I've been reading https://www.baeldung.com/spring-boot-logging in order to implement it, but I'm having problems understanding the path mapping.
Any ideas?
First let's name the two applications:
API - the API service that you want to monitor
Monitor - the service that wants to see what calls are made to the API
There are several ways to achieve this.
a) Open up a socket on Monitor for inbound traffic. Communicate the IP address and socket port manually to the API server, have it open the connection to the Monitor and send some packet of data down this "pipe". This is the lowest-level approach: simple, but very fragile, as you have to coordinate the starting of the services and decide on a "protocol" for how the applications exchange data.
b) REST: Create a RESTful controller on the Monitor app that accepts a POST. Communicate the IP address and port manually to the API server. Initiate a POST request to the Monitor app when needed. This is more robust, but still suffers from needing careful starting of the servers (a minimal sketch is shown after this list).
c) Message queue: Install a message queue system like RabbitMQ or ActiveMQ (available in Docker containers). The API server publishes a message to a queue; Monitor subscribes to the queue. Much more robust; it still requires each application to be told the address of the MQ server, but now you can stop/start the two applications in any order.
d) Logging: The article you linked is a good starter on Java logging. Most use cases log to a local file on the local server. There are some backend logging implementations that ship logs to remote places (I don't think that article covers them), and there are ways of adding your own custom receiver of this log traffic. With this option, the API side would use ordinary logging code with no knowledge of the downstream consumption of the logging, but your Monitor app would need to integrate tightly with a particular logging system.
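For option (b), a minimal Spring Boot sketch of the Monitor side could look like this (the /notifications path, port and payload shape are made up for illustration):

// Monitor application: exposes an endpoint the API server can POST to.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class MonitorApplication {

    public static void main(String[] args) {
        SpringApplication.run(MonitorApplication.class, args);
    }

    // The API server POSTs a short description of each call it handled.
    @PostMapping("/notifications")
    public void onApiCall(@RequestBody String description) {
        System.out.println("API call observed: " + description);
    }
}

On the API side the call can then be as simple as new RestTemplate().postForLocation("http://monitor:8080/notifications", "GET /orders/42"), where the "monitor" hostname assumes a Docker network alias.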
We have a Spring Java app using EWS to connect to our on-prem Exchange 2016 server and pull emails via a 'streaming' subscription. Every 30 minutes a new 30-minute subscription is made (via a new thread). We assume the old connection just expires.
When one instance is running in our environment, it works perfectly fine, but when two instances run, after some time one instance will eventually start throwing an error about:
You have exceeded the available concurrent connections for your account. Try again once your other requests have completed.
It seems like an issue that then runs into throttling. I found that the Exchange server's config is:
EWSMaxConcurrency=27, MaxStreamingConcurrency=10,
HangingConnectionLimit=10
Our code previously didn't explicitly close connections or unsubscribe (it ran fine without doing so when only one instance was running). We tried adding both, but the issue still persists, and we noticed the close method of StreamingSubscriptionConnection throws an error. The team that handles the Exchange server can find errors referencing the exceeded-connection-count error above, but nothing relating to the close-connection error:
...[m.e.w.d.n.StreamingSubscriptionConnection.close(349)]: java.lang.Exception: microsoft.exchange.webservices.data.notification.StreamingSubscriptionConnection
Currently we don't have much ability to make changes on the Exchange server side. I'm not familiar with SOAP messages, but I was planning to look into how to monitor them to see what inbound and outbound messages there are, for some insight.
For the service I set service.setTraceEnabled(true) and service.setTraceFlags(EnumSet.allOf(TraceFlags.class)).
However, I only see trace messages in the console when an email arrives. I don't see any messages during startup when a subscription/connection is created.
Can anyone provide advice on how I can monitor these subscription-related messages?
I tried using SOAPUI but I'm having difficulty applying our server's WSDL. I considered using the Tunnelij plugin for IntelliJ, but I'm not too familiar with how to set that up either.
My suspicion is that there is some intermittent latency issue on the Exchange server side, perhaps response messages are not coming back in a timely manner, and this is causing the problem. I presume that if I monitor these SOAP messages I should see more than 10 subscribe requests before that error appears.
The EWS logs on the CAS (Client Access Server) should have details about the throttling issue. Are you using Impersonation in your application? If you are not using Impersonation, the concurrent connections are charged against the account you are authenticating with; with Impersonation, they get charged against the account you are impersonating. The difference here is that a single user can have no more than 10 streaming subscriptions (unless you modify the web.config), whereas if you are using Impersonation you can scale your application to thousands of users; see https://github.com/MicrosoftDocs/office-developer-exchange-docs/blob/main/docs/exchange-web-services/how-to-maintain-affinity-between-group-of-subscriptions-and-mailbox-server.md
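If you do move to impersonation, the ews-java-api side is roughly the following sketch (URL, credentials and mailbox are placeholders, and exact package names can differ slightly between library versions):

import microsoft.exchange.webservices.data.core.ExchangeService;
import microsoft.exchange.webservices.data.core.enumeration.misc.ConnectingIdType;
import microsoft.exchange.webservices.data.core.enumeration.misc.ExchangeVersion;
import microsoft.exchange.webservices.data.credential.WebCredentials;
import microsoft.exchange.webservices.data.misc.ImpersonatedUserId;

import java.net.URI;

public class EwsImpersonationExample {
    public static void main(String[] args) throws Exception {
        ExchangeService service = new ExchangeService(ExchangeVersion.Exchange2010_SP2);
        service.setCredentials(new WebCredentials("svc-account", "password", "DOMAIN")); // placeholder
        service.setUrl(new URI("https://exchange.example.com/EWS/Exchange.asmx"));       // placeholder

        // Subscriptions created after this call are charged against the
        // impersonated mailbox rather than the service account.
        service.setImpersonatedUserId(
                new ImpersonatedUserId(ConnectingIdType.SmtpAddress, "user@example.com"));

        // ... create the streaming subscription for that mailbox here ...
    }
}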
We are running a setup in production where gRPC clients talk to servers via a proxy in between (image attached).
The client is written in Java and the server is written in Go. We are using the round_robin load-balancing policy in the client. Despite this, we have observed some bizarre behaviour. When our proxy servers scale in, i.e. reduce from, say, 4 to 3, the resolver kicks in and the request load from our clients gets distributed equally across all of our proxies. But when the proxy servers scale out, i.e. increase from 4 to 8, the new proxy servers don't get any requests from the clients, which leads to a skewed distribution of request load on our proxy servers. Is there any configuration we can apply to avoid this?
We tried setting the networkaddress.cache.ttl property to 60 seconds in the JVM args, but even this didn't help.
You need to cycle the sticky gRPC connections using the keepalive and keepalive timeout configuration in the gRPC client.
Please have a look at this - gRPC connection cycling
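For reference, a sketch of those client-side keepalive settings in grpc-java (the target address and intervals are placeholders, not recommendations):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import java.util.concurrent.TimeUnit;

public class ProxyChannelFactory {
    public static ManagedChannel create() {
        return ManagedChannelBuilder
                .forTarget("dns:///grpc-proxy.internal:443")   // placeholder target
                .defaultLoadBalancingPolicy("round_robin")
                // Send HTTP/2 pings on idle connections and give up quickly if a
                // proxy stops answering, so dead or stale connections get replaced.
                .keepAliveTime(30, TimeUnit.SECONDS)
                .keepAliveTimeout(10, TimeUnit.SECONDS)
                .keepAliveWithoutCalls(true)
                .build();
    }
}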
Both round_robin and pick_first perform name resolution only once. They are intended for thin, user-facing clients (Android, desktop) that have a relatively short lifetime, so sticking to a particular (set of) backend connection(s) is not a problem for them.
If your client is a server app, then you should rather be using grpclb or the newer xDS: they automatically re-resolve the available backends when needed. To enable them you need to add a runtime dependency to your client on grpc-grpclb or grpc-xds respectively.
grpclb does not need any additional configuration or setup, but has limited functionality. Each client process will have its own load-balancer + resolver instance. Backends are obtained via repeated DNS resolution by default.
xDS requires an external Envoy instance/service from which it obtains the available backends.
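As a rough illustration (service names are placeholders), switching policies on the client then looks like this, assuming the corresponding artifact is on the runtime classpath:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class LbChannels {
    // Requires io.grpc:grpc-grpclb on the runtime classpath.
    static ManagedChannel grpclbChannel() {
        return ManagedChannelBuilder
                .forTarget("dns:///my-proxy.internal:443")  // placeholder
                .defaultLoadBalancingPolicy("grpclb")
                .build();
    }

    // Requires io.grpc:grpc-xds on the runtime classpath and an xDS
    // control plane (e.g. Envoy / Traffic Director) to resolve against.
    static ManagedChannel xdsChannel() {
        return ManagedChannelBuilder
                .forTarget("xds:///my-proxy-service")       // placeholder
                .build();
    }
}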
I am designing a real-time backend chat application for mobile devices, and to do this I am building everything on top of Java (to handle incoming HTTP requests) and Redis (Pub/Sub). Now I am looking for a worker, and have already taken a look at tools like Resque, Python-RQ and even Celery (which also offers Redis integration), but maybe things will grow and become difficult to manage. I want to keep things as simple as possible. Has anyone tried using Jedis (the Redis Java client) to listen for messages on a Redis channel and start a new thread for each message received? Was the performance bad? What if I had hundreds of requests per second? It seems like a poor solution (a plain thread as a worker).
The flow is (for the Android example):
The Android client sends a message to the chat.
My REST web service (Tomcat) receives the message and publishes (via Jedis) a message to a Redis channel [quite simple].
The worker (?) processes the message and delivers it to all subscribers over Google Cloud Messaging (a simple HTTP request).
So, any suggestions or experiences about Redis worker implementations or the Jedis library? What do you recommend? Thanks.
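For reference, this is roughly what I mean by a Jedis subscriber handing each message to worker threads, as a sketch (the channel name and delivery logic are stubbed out):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChatWorker {
    public static void main(String[] args) {
        // A bounded pool instead of one raw Thread per message, so hundreds of
        // messages per second do not translate into hundreds of new threads.
        ExecutorService pool = Executors.newFixedThreadPool(16);

        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    pool.submit(() -> deliver(message));
                }
            }, "chat-messages"); // subscribe() blocks; run it on a dedicated thread in a real app
        }
    }

    private static void deliver(String message) {
        // Stub: push to subscribers via Google Cloud Messaging (a plain HTTP request).
        System.out.println("delivering: " + message);
    }
}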
For those who want a suggestion:
I opted for Python-RQ because of its simplicity. It is very simple, well documented, and solved my problem.
Regards.
I am reading a lot about HTML5 and I like WebSockets in particular because they facilitate bi-directional communication between the web server and the web browser.
But we keep reading about Chrome, Opera, Firefox and Safari getting ready for HTML5. Which web server is ready to use the WebSockets feature? I mean, are web servers capable of initiating subsequent communication as of today? How about Google's own App Engine?
How can I write a sample web application that takes advantage of this feature in Java?
Bi-directional communication between web servers and browsers is nothing new. Stack Overflow does it today if a new answer is posted to a question you're reading. There are a few different strategies for implementing socket-style behavior using existing technologies:
AJAX short polling: Connect to the server and ask if there are any new messages. If not, disconnect immediately and ask again after a short interval. This is useful when you don't want to leave a lot of long-running, idle connections open to the server, but it means that you will only receive new messages as fast as your polling interval, and you incur the overhead of establishing a new HTTP connection every time you poll.
AJAX long polling: Connect to the server and leave the connection open until a new message is available. This gives you fast delivery of new messages and less frequent HTTP connections, but it results in more long-running idle processes on the server.
Iframe long polling: Same as above, only with a hidden iframe instead of an XHR object. Useful for getting around the same-origin policy when you want to do cross-site long polling.
Plugins: Flash's XMLSocket, Java applets, etc. can be used to establish something closer to a real low-level persistent socket to a browser.
HTML5 sockets don't really change the underlying strategies available. Mostly they just formalize the strategies already in use, and allow persistent connections to be explicitly identified and thus handled more intelligently. Let's say you want to do web-based push messaging to a mobile browser. With normal long-polling, the mobile device needs to stay awake to persist the connection. With WebSockets, when the mobile device wants to go to sleep, it can hand off the connection to a proxy, and when the proxy receives new data it can wake up the device and pass back the message.
The server-side is wide open. To implement the server-side of a short polling application, you just need some kind of a chronological message queue. When clients connect they can shift new messages off the queue, or they can pass an offset and read any messages that are newer than their offset.
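To make that concrete, here is a minimal in-memory sketch of such an offset-based store (class and method names are invented for illustration; a real app would persist this and cap its size):

import java.util.ArrayList;
import java.util.List;

// Chronological message store: pollers pass the last offset they have
// already seen and get back only the messages appended since then.
public class MessageLog {
    private final List<String> messages = new ArrayList<>();

    public synchronized int append(String message) {
        messages.add(message);
        return messages.size();                  // the new "latest" offset
    }

    public synchronized List<String> readSince(int offset) {
        if (offset >= messages.size()) {
            return List.of();                    // nothing new; poll again later
        }
        return new ArrayList<>(messages.subList(offset, messages.size()));
    }
}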
Implementing server-side long polling is where your choices start to narrow. Most HTTP servers are designed for short-lived requests: connect, request a resource, and then disconnect. If 300 people visit your site in 10 minutes, and each takes 2 seconds to connect and download HTTP resources, your server will have an average of 1 HTTP connection open at any given time. With a long polling app, you're suddenly maintaining 300 times as many connections.
If you're running your own dedicated server you may be able to handle this, but on shared hosting platforms you're likely to bump up against resource limits, and App Engine is no exception. App Engine is designed to handle a high volume of low latency requests, e.g. short polling. You can implement long polling on App Engine, but it's ill-advised; requests that run for longer than 30 seconds will get terminated, and the long running processes will eat up your CPU quota.
App Engine's solution for this is the upcoming Channel API. The channel API implements long polling using Google's existing robust XMPP infrastructure.
Brett Bavar and Moishe Lettvin's Google I/O talk lays out the usage pattern as follows:
App Engine apps create a channel on a remote server, and are returned a channel ID which they pass off to the web browser.
from google.appengine.api import channel
from google.appengine.ext import webapp


class MainPage(webapp.RequestHandler):
    def get(self):
        # 'key' identifies this client; create_channel returns the channel ID
        # (token) that is handed to the browser.
        id = channel.create_channel(key)
        self.response.out.write(
            {'channel_id': id})
The web browser passes the channel ID to the same remote server to establish a connection via iframe long polling:
<script src='/_ah/channel/jsapi'></script>
<script>
  var channelId = '{{ channel_id }}';
  var channel =
      new goog.appengine.Channel(channelId);
  var socket = channel.open();
  socket.onmessage = function(evt) {
    alert(evt.data);
  };
</script>
When something interesting happens, the App Engine app can push a message to the user's channel, and the browser's long poll request will immediately receive it:
class OtherPage(webapp.RequestHandler):
    def get(self):
        # something happened
        channel.send_message(key, 'bar')
Jetty, for example, has supported this feature since version 7: Jetty WebSocket Server
Google App Engine has plans for this as well. They even had a working demo of it at Google I/O 2010, but it's not in production yet. See ticket #377