Persistent push with comet long-polling on Jetty?

Persistent push with comet long-polling on Jetty? - java

I am trying to create a Jetty servlet that allows clients (web browsers, Java clients, ...) to get broadcast notifications from the web server.
The notifications should be sent in a JSON format.
My first idea was to make the client send a long-polling request, and the server respond when a notification is available using Jetty's Continuation API, then repeat.
The problem with this approach is that I am missing all the notifications that happen between 2 requests.
The only solution I found for this, is to buffer the Events on the server and use a timestamp mechanism to retransmit missed notifications, which works, but seems pretty heavy for what it does...
Any idea on how I could solve this problem more elegantly?
Thanks!

HTTP Streaming is most definitely a better solution than HTTP long-polling. WebSockets are an even better solution.
WebSockets offer the first standardised bi-directional full-duplex solution for realtime communication on the Web between any client (it doesn't have to be a web browser) and server. IMHO WebSockets are the way to go since they are a technology that will continue to be developed, supported and in demand and will only grow in usage and popularity. They're also super-cool :)
There appear to be a few WebSocket clients for Java and Jetty also supports WebSockets.

Sorry for bumping this up, yet I believe numerous people will come across this thread and the accepted answer, IMHO, is at least outdated, not to say misleading.
In order of priority I'd put it as following:
1) WebSockets is the solution nowadays. I've personally had the experience of introducing WebSockets in enterprise oriented applications. All of the major browsers (Chrome, Firefox, IE - in alphbetical order :)) support WebSockets natively. All major servers/servlets (IIS, Tomcat, Jetty) are the same and there are quite a number of frameworks in Java implementing JSR 356 APIs. There is a problem with proxies, especially in cloud deployment. Yet there is a high awareness of the WebSockets requirements, so NginX supported them already 1.5 year ago. Anyway, secured 'wss' protocol solves proxy problem in 99.9% (not 100% just to be on the safe side, never experienced that myself).
2) Long Polling is probably the second best solution, and the 'probably' part is due to 'short polling' alternative. When speaking of long polling I mean the repeated request from client to server, which responses as soon as any data available. Thus, one poll can finish in a few millis, another one - till the max wait time.
Be sure to limit the poll time to something lesser than 2mins since usually otherwise you'll need to manage timeout error in you client side. I'd propose to limit the poll time to something like tens of seconds.
To be sure, once poll finished (timely or before that) it is immediately repeated (yet better to establish some simple protocol and give to your server a chance to say to the client - 'suspend').
Cons of the long polling, which IMHO justifies the continuation of the list, is that it holds one of just a few (4, 8? still not that many) allowed connections, that browser allows each page to establish to a server. So that is can eat up ~12% to ~25% of your website's client traffic resource.
3) Short polling not that much loved by many, but sometimes i prefer it. The main Cons of this one, is, of course, the high load on the browser and the server while establishing new connections. Yet, i believe that if connections pools are used properly, that overhead much lesser than it look like on the first glance.
4) HTTP streaming, either it be page streaming via IFrame or XHR streaming, is, IMHO, highly BAD solution since it's like an accumulation of Cons of all the rest and more:
you'll hold the connections opened (resources of browser and server);
you'll still be eating up from total available client traffic limit;
most evil: you'll need to design/implement (or reuse design/implementation) the actual content delivery in order to be able to differentiate the new content from the old one (be it in pushing scripts, oh my! or tracking the length of the accumulated content). Please, don't do this.
Update (20/02/2019)
If WebSockets are not an option - Server Sent Events is the second best option IMHO - effectively browsers implemented HTTP streaming for you here at the lower level.

I have done this before using Http Streaming via Atmosphere framework and it worked fine.
Visit Comet, Streaming
if you see the atmosphere tutorial they have given multiple examples

You may want to check how they implemented this in CometD: http://cometd.org .
Or you may even consider to use that tool, without having to reinvent the wheel.

Related

Android: connection to postgressql (direct or webservice) [duplicate]

Can someone answer on my dilemma which method to use for connecting Android device to mySQL or Postgresql?
I can do it in both ways without any errors and problems, with no noticeable difference but everyone recommend web service instead of using jdbc driver and direct connection,
Can someone explain why with some facts?
EDIT: I did'n mention that is more simple and needs less time to do it over jdbc. So, why web service, or why not?

You think it's simpler and faster to do it with JDBC because you aren't considering the real world operating environment of phones and portable devices. They often have flakey connectivity through buggy traffic rewriting proxies and insane firewalls. They're typically using a network transport layer that has high and variable packet loss rates and latencies that vary over many orders of magnitude in short spans of time. TCP really isn't great in this environment and particularly struggles with long lived connections.
The key benefit of a web service is that it:
Has short-lived connections with minimal state, so it's easy to get back to where you were when the device switches WiFi networks, to/from cellular, loses connectivity briefly, etc; and
Can pass through all but the most awful and draconian web proxies
You will routinely encounter problems with a direct JDBC connection. One challenge is reliably timing out dead connections, re-establishing sessions and releasing locks held by the old session (as the server may not decide it's dead at the same time the client does). Another is packet loss causing very slow operations, long-running database transactions, and consequent problems with lock durations and transactional cleanup tasks. You'll also meet every variety of insane and broken proxy and firewall under the sun - proxies that support CONNECT but then turn out to assume all traffic is HTTPs and mangle it if it isn't; firewalls with buggy stateful connection tracking that cause connections to fail or go to a half-open zombie state; every NAT problem you can imagine; carriers "helpfully" generating TCP ACKs to reduce latency, never mind the problems that causes with packet loss discovery and window sizing; wacky port blocking; etc.
Because everyone uses HTTP, you can expect that to work - at least, vastly more often than anything else does. This is particularly true now that common websites use REST+JSON communication style even in mobile web apps.
You can also write your web service calls to be idempotent using unique request tokens. That lets your app re-send modification requests without fear that it'll perform an action against the database twice. See idempotence and definining idempotence.
Seriously, JDBC from a mobile device might look like a good idea now - but the only way I'd even consider it would be if the mobile devices were all on a single high-reliably WiFi network under my direct control. Even then I'd avoid it for reasons of database performance management if I possibly could. You can use something like PgBouncer to pool connections among many devices at the server side so connection pooling isn't a big problem, but cleanup of lost and abandoned connections is, as is the tcp keepalive traffic required to make it work and the long stalled transactions from abandoned connections.

I can think of a few reasons
JDBC android driver support for your database.
Connection pooling across various Android devices make it difficult to monitor and cap them.
Result sets sent from the DB to android will consume a lot of bandwidth and battery power.
Proxies usuall allow HTTP access to your device.
Exposing your database directly to the client has security implications.
Web services can provide additional features on top of the JDBC connection like authentication / quality of service / authorization / conditional GET requests / error handling etc. JDBC cannot do any of these.

Besides all things Craig Ringer said, which I completely agree, JDBC has another problem: it will force to expose your database to the world. If you want android devices to access it, you will need to provide your app with database credentials, and the database will have to have public access.
Using a WebService or RESTful API is clearly the way to go to make your application secure.

Another option would be to use a database sync tool like SymmetricDS.
This would let you have say a Postgres database on your server, and a SQLite database on your tablet.
SymmetricDS would synchronize the databases over HTTP, when a connection is available. You don't have to sync the whole db of course, just the relevant parts.
(I am not affiliated with SymmetricDS)

TL;DR: It depends!
(Sorry to all the "never ever ever ever do it, direct conns are always evil"-devs)
When creating a public domain / general app for the playstore kind of thing, I am mainly with my fellow responders. Opening your DB to "everyone" (especially when permissions are badly or not at all configured) is typically not a great idea!!
However(!), the story might be totally different, when you e.g. create something for internal use within the network boundaries of your company, like Android handheld devices for logistics, inventory, etc. In these cases I would even most of the time definately recommend going with JDBC or a similar direct connection. Reaons being:
One less point of failure
One less development (sub-)project
One less thing to maintain and keep up to date with your data-structure
One less thing to keep up and running, CI/CD, test, etc. (you get the draft)
Which - im my humble opinion - is worse than the (implement it once) effort of connection pooling, reestablishment, etc. (if it really becomes necesseary, be careful with premature optimization there).
But for public projects ... well, if they only ever require read access, I could possibly imagine it as well, or if there are only certain tables were you allow adding, but not delete or modifications. There are some tricks you could apply to make it still secure (allowing adds but not reads with id-secrets for a certain table, triggers, and general reads for other tables, etc.), but there is a lot to think and a lot to miss about these. So generally, I would say it is bad practice to allow your public domain client to get hold of your SQL connection. But still, don't let that hinder you to ask yourself (and understand) "why" and look at the specific situation. There might even be good cause/use for that. Especially since it is "less", which is also often better. It definately depends.
Just be careful and aware that (even if permissions are set correctly) a lot can be misused (and only little hindered), with a direct connection at your client. (Plus possible connection issues to be taken care of.)
As a sidenote: A lot of these considerations are relevant again with the use of technologies like GraphQL, which shares some similarities (however without connection issues and with a little bit more secure control).

Sending HTTP requests from a pool of IP addresses on rotation basis to avoid throttling

We want to hit a server multiple times without getting throttled. This is a basic problem now a days people face and must have found a work-around.
Long back I did a small experiment in which I assigned multiple IP Addresses to my desktop and sent HTTP requests using socket(...,...,...,..). I wrote a small servlet to test it and it was working fine.
I wanted to know if there is any better approach to perform the task.

Per your comments, you are querying a WEB page on the internet multiple times, at high rate probably (higher than a human normal browsing rate).
As browsers, you should limit yourself to the number of connections opened simultaneously. originally, HTTP 1.1 defined 4 such connections is allowed, but latest browser versions are utilizing around 8, multiple windows of the browser add even more.
But the base rule is still there, and some server farms at big sites try to protect themselves from attacks using what you call throttling.
You can embrace the same techniques that the browsers are taking to balance request rates:
Limit the number of connections you open in parallel and utilize an open connection (without closing it) to send a new request when the previous one has ended.
A more complicated approach is to implement HTTP Pipelining, and add it to point 1 above. Pipelining allows you to send several requests on the same connection without waiting for the response, but not all sites support it (and not all browsers use it).
Your proposal to use several source address is applicable, but it depends what is the technology behind the farms that serve the site, they may assume similar IP address to originate from the same user network.
Another options is to distribute the clients over the network with different clients, which combine/sync the work through a central server.

How to optimize number of database connections?

We have a Java (Spring) web application with Tomcat servlet container.
We have a something like blog.
But the blog must load its posts dynamically with Ajax.
The client's ajax script checks for new posts every second.
I.e. Ajax must ask the server for new posts every second and it will be very heavy for database.
But what if we have hundreds of thousands connects simultaneously?
I think that we must retrieve all posts with cron every second and after that save it somewhere. But where? The main idea is to unload the database.
Any ideas about architecture?
Thanks in advance!

There is other architecture for polling that could be more optimal, depending on the case:
Long polling
Long polling is a variation of the
traditional polling technique and
allows emulation of an information
push from a server to a client. With
long polling, the client requests
information from the server in a
similar way to a normal poll. However,
if the server does not have any
information available for the client,
instead of sending an empty response,
the server holds the request and waits
for some information to be available.
Once the information becomes available
(or after a suitable timeout), a
complete response is sent to the
client. The client will normally then
immediately re-request information
from the server, so that the server
will almost always have an available
waiting request that it can use to
deliver data in response to an event.
In a web/AJAX context, long polling is
also known as Comet programming.
Long Polling
Example of Implementations of this technology:
Push Server
You could also use the observer pattern to register the requests, and notify them when an update is done.

Hundreds of thousands of concurrent users all polling our site every second makes for a huge amount of traffic. If you truly expect this load you are going to have to design your platform accordingly, probably by clustering multiple web, application and DB servers.
Remember that with a database connection pool you don't need a DB connection for every user.

I'm not as familiar with Tomcat, but in WebSphere we can set up connection pools to prepare a certain number of connections.
Also, are you mainly worried about reads or the same number of writes?
Plus, you may also want to have the database "split" depending on region etc. This way there is no single heavy load across the entire database, but it can then be split and even load balanced.
There is also the "NoSQL" databases to look into as well. Maybe something to consider. Just ideas to help out.

server push or client push is better?

I am developing a chat website using jsp/servlet.I will be hosting my website on gooogle appengine .Now i have some doubts regarding whether to use server push or client pull technology
1)If i use server push and if i dont close the response of servlet will it cause the server to go slow?How many simultanious connection can a tyicall tomcat server can handle if i keep the socket open for the entire chat session between 2 clinets??
2)Will server push or clinet push be better??

If you are using a servlet (prior to 3.0), then I guess you'll have to go with pull because of the programming model of servlet. However, there ARE advantages in using a push model. Primarily, wasted load on server and the limitation in latency. That's why there are technologies such as comet. Servlet 3.0 also supports push model. These are commonly used in ajax based apps.
In fact I believe a push model is more suited for a chatting app. because of the fast response time (=better user experience) it can provide.
If you use a nio based implementation for push-model, you can support thousands or even more than 10k concurrent connections (obviously, your millage varies).
If you use a conventional IO based implementation, it will be likely in the range of hundreds of concurrent connections (don't take this estimation too seriously though. I'm just giving these numbers to give a very, very rough feeling).
As for tomcat, last time I checked, people were saying that it won't have a good push-model support until version 7.0. But I'm not following the current status so I'm not sure (Sorry, perhaps somebody else can help you on this). If that is the case, you might want to check out comet support of jetty.
grizzly and netty are also good NIO based network frameworks, but if you want to use JSP, and find that tomcat is not sufficient, I guess jetty would be the best bet.
edit: (some additional info)
In this "push models", it's not like the server opens a connection to the client. The connection will be kept alive, and the server will push messages as it sees fit.
Also, it's not like there are only "push" and "pull" models. You can have a hybrid, like long polling.

I don't know how are you thinking of achieving server push here. As far as I can see, server needs a request to respond over HTTP. So, when there is a request, server will respond to that.

If i use server push and if i dont close the response of servlet will it cause the server to go slow?
App Engine will not let you do that. You have to finish your response within thirty seconds, or it will be killed. The thirty seconds is also an edge case, most calculations they do (for quota and such) are based on a 75 millisecond response time.
How many simultanious connection can a tyicall tomcat server
Tomcat? I thought you are planning to use App Engine?

Pull. Always pull.
I know it's a manufacturing-oriented book but the advice from Lean Thinking (Womack & Jones) is invaluable in any context (roughly, from memory):
Start by defining value,
line up the activities that create value in the value-stream,
create flow across the value-stream,
let customers pull value from the value-stream,
compete against perfection rather than other organizations
If I misquoted them, I apologize. Anyway, all of those principles can easily be applied in the development of any software product just as they could in the production of any physical product but the one that matters for you is pull.
Letting consumers of a service pull rather than pushing to them not only makes your programming model easier, it aligns activity with demand. You can still use queuing to load-level over time, if you have to, just the way you could with push but, this way, you have complete visibility into what, exactly happens in any given transaction.
I don't quite get your first question but the answer is still pull.

The answer to your query depends on what underlying protocol you wish to use.
Since you have mentioned JSP/servlets, your app will be implemented over the HTTP protocol.
HTTP is a protocol over TCP. TCP is connection oriented and remains alive, until the connection is ended. However, HTTP connections are persistent, only for the duration of a single request-response cycle. The TCP connection is broken after every request-response cycle. So that should answer your doubt with regards to how many socket connections a typical TOMCAT server will be able to handle. The connections will not be persistent, at all. They will only last the duration of a HTTP request-response cycle.
Given this basic idea, I would suggest , you use a client pull strategy, to implement your app.
Even with server push, over HTTP, even though the name says "server push", it is always the web client that polls the server at regular intervals, which just gives an illusion of "server-push". HTTP specification mandates that the client makes a request to which the server responds.
I have considerable experience in developing chat applications (both mobile and web).
Let me know , if you need any assistance. I will be more that willing to help.

Fat Java client need two-way communication channel to web server over http/https

I have a situation where I want a Java client to have a two-way data channel with a servlet (I have control over both), so that either can begin data transferring without having to wait for the other to do something first, but to get through the firewalls this needs to be tunnelled in http or https.
I have looked around, but I do not believe I know the right terms for asking Google.
I was originally looking at http-tunneling modules, but realizing that I have a web container in the other end, I believe that the appropriate way is to think of a fat client needing to communicate home. I was thinking that the persistant connection in http 1.1 might be very useful here. I can easily do heartbeat transfers to keep the connection from ideling.
At this point in time I just need to do a proof of concept so I primarily need something that works now, which can then be optimized or even replaced later.
So, I'd appreciate pointers to projects that allow me to have a connection where either side can at will push information (like a serialized object or a descriptive stream of bytes) to the other side. I'd prefer pure Java, if at all possible.
EDIT: Thanks for the pointers. It appears that what I need, will be available in the servlet 3.0 specification, which I might end up using in the long term depending on when it will be supported in the various web containers.
For now I am investigating the Cometd package, which appears to be able to do exactly what I need for my prototype.

Search terms: comet, long-polling
These are mostly used in an AJAX context, but I see no reason why you could not use them in a Java project.

Please take a look at Eclipse Net4J,
http://wiki.eclipse.org/Net4j
It supports all the features you mentioned. A special nice feature is that it supports HTTP connection pooling so you can have lots of channels between client and server but use only a few HTTP connections.
The only problem is that it doesn't have documentation at all. You just have to read the source code. Once you figure it out, it's very easy to use.
There are a few more diagrams on old Net4J site,
http://net4j.berlios.de/

How fast does it need to be? You could always just do polling on the client. Just check for new messages every so often.

You can use the Hessian protocol over HTTP. It's a fast binary protocol for serializing data. Typically used for a web-services style RPC communication, but there's no reason it couldn't be 2-way - see Hessian mux. It's pure Java, too :-)

Generally this is done by having the server not respond to an http request immediately. It waits around for some update (or a timeout) before sending a response. Obviously some care needs to be made ensuring that the server will handle this under load.
See, for instance, Comet.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.