Robust WebSocket Streaming to Kafka in Java

I need to record data from an unreliable WebSocket connection and stream it into Kafka.
Our Kafka cluster is pretty reliable and we can make it highly available.
What is the best approach to making the WebSocket connection as reliable as possible? I would like to minimize data loss.
One solution would be to have multiple processes or WebSocket clients listening and streaming to multiple Kafka topics, then filter out the duplicates with Kafka Streams. This only works if each message I receive has a unique ID, which is not always the case.
Another solution would be to monitor the WebSocket connection and restart or reset it when it fails, but then I might lose data. Or should I rely on WebSocket heartbeats?
Or code my own error handlers?
Which Java frameworks/libraries in this space do the best job? Currently I use the WebSocket client from org.java-websocket.
I think this is a pretty standard use case in WebSocket and Kafka producer development, so I am sure I do not need to reinvent the wheel.
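For concreteness, here is a minimal sketch of the "monitor and reconnect" approach, using the org.java-websocket client together with a Kafka producer configured for durability. The endpoint URI, topic name, heartbeat interval, and backoff are placeholders, and buffering or deduplication across reconnects is deliberately left out:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;

import java.net.URI;
import java.util.Properties;

public class ResilientFeed {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");        // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.ACKS_CONFIG, "all");                            // wait for all in-sync replicas
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");             // avoid producer-side duplicates
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        URI uri = new URI("wss://example.com/feed");                             // placeholder endpoint
        while (true) {
            WebSocketClient client = new WebSocketClient(uri) {
                @Override public void onOpen(ServerHandshake handshake) { }
                @Override public void onMessage(String message) {
                    // Hand every frame to Kafka; the producer retries internally.
                    producer.send(new ProducerRecord<>("feed-topic", message));  // placeholder topic
                }
                @Override public void onClose(int code, String reason, boolean remote) { }
                @Override public void onError(Exception ex) { }
            };
            client.setConnectionLostTimeout(30);      // built-in ping/pong heartbeat check, in seconds
            client.connectBlocking();                 // returns once the handshake succeeds or fails
            while (client.isOpen()) {
                Thread.sleep(1_000);                  // crude liveness loop; fall through when the link drops
            }
            Thread.sleep(5_000);                      // simple fixed backoff before reconnecting
        }
    }
}
```

Anything received while the client is reconnecting is lost in this sketch; whether that matters, and whether you need a local buffer or a second redundant client, depends on how much data loss you can tolerate.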

Related

Java + Redis Pub/Sub - help to choose a good Worker implementation

I am designing a real-time backend chat application for mobile devices, and I am building everything on Java (to deal with incoming HTTP requests) and Redis (Pub/Sub). Now I am looking for a worker implementation and have already looked at tools like Resque, Python-RQ, and even Celery (which also offers Redis integration), but maybe things will grow and become difficult to manage; I want to keep things as simple as possible. Has anyone tried using Jedis (the Redis Java client) to listen for messages on a Redis channel and start a new thread for each message received? Was the performance bad? What if I had hundreds of requests per second? It seems like a poor solution (a plain thread as a worker).
The flow is (for the Android example):
The Android client sends a message to the chat.
My REST web service (Tomcat) receives the message and publishes (via Jedis) a message to a Redis channel [quite simple].
The worker (?) processes the message and delivers it to all subscribers over Google Cloud Messaging (a simple HTTP request).
So, any suggestions or experiences with Redis worker implementations or the Jedis library? What do you recommend? Thanks.
For those who want a suggestion:
I opted for Python-RQ because of its simplicity. It is very simple, well documented, and solved my problem.
Regards.
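On the Java side of the original question, a minimal sketch of a Jedis subscriber that hands each message to a bounded thread pool (rather than spawning one thread per message); the channel name and the delivery step are placeholders:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChatWorker {

    public static void main(String[] args) {
        // A bounded pool instead of one thread per message, so hundreds of
        // messages per second do not turn into hundreds of threads.
        ExecutorService pool = Executors.newFixedThreadPool(16);

        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    // Hand off immediately so the subscriber thread stays free.
                    pool.submit(() -> deliver(message));
                }
            }, "chat");                                 // placeholder channel name
        }
    }

    private static void deliver(String message) {
        // Placeholder: push the message to subscribers, e.g. via an HTTP call
        // to Google Cloud Messaging.
        System.out.println("delivering: " + message);
    }
}
```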

Separate WebSocket for each connected client

I am planning to develop a JavaScript client application that will connect to a Java server using WebSocket. The server should handle many connected clients.
After some reading I found out that WebSocket handling is single-threaded. This is not good if I want to run database queries that can block everything for a while.
What I am thinking about is opening a separate WebSocket for each JavaScript client. One socket listens for new connections and, when a connection is established, creates a unique ID. It then opens a new WebSocket and sends the ID to the client over the listener socket. When the client receives the ID, it closes the first socket and connects to the new one.
What do you think, is this a good solution? Maybe I am missing something?
Spring 4 gives you the chance to use a thread pool. The documentation is here:
http://docs.spring.io/spring/docs/current/spring-framework-reference/html/websocket.html
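As a hedged illustration of that thread-pool option, with Spring's STOMP-over-WebSocket support the inbound channel can be backed by a pool roughly like this (endpoint path, destination prefixes, and pool sizes are illustrative; on Spring 4 you would extend the abstract configurer instead of implementing the interface):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.ChannelRegistration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws");                       // placeholder endpoint path
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.enableSimpleBroker("/topic");
        registry.setApplicationDestinationPrefixes("/app");
    }

    @Override
    public void configureClientInboundChannel(ChannelRegistration registration) {
        // Incoming frames are dispatched on this pool, so a slow database call
        // ties up one worker thread instead of the socket's I/O thread.
        registration.taskExecutor().corePoolSize(8).maxPoolSize(32);
    }
}
```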
You could use Akka to manage all the concurrency and thread management for you. Or you could use the Play Framework that already builds on Akka and that supports WebSocket quite nicely. With Play you can choose between Java and Scala on the server side.
You should use NodeJS on the server to handle the socket I/O. You can connect to it from your JavaScript client apps and then make calls to your Java-based API. NodeJS is non-blocking (async), and you should be able to leverage your existing JavaScript skills to quickly build a Node app. You could even use a full MEAN stack to build the client/server app. http://meanjs.org/ or http://mean.io/#!/ are two popular places to start.

TCP socket and WebSocket on a Play Framework server?

My service already uses WebSockets to communicate with an HTML5 in-browser client. The client is served by the same server from a normal HTTP request.
Now I would like to offer the same service/app outside the browser, and I would like to offer it over TCP sockets.
The RPC/action objects I am using are going to be the same, the serialization is going to be the same, and the logic is the same. I just want to use TCP sockets instead of WebSockets.
I would like to keep the code together under the same "project folder", starting everything at once when I deploy the Play Framework server (basically on start I want to begin listening for WebSockets, TCP sockets, and HTTP requests), and have everything in the same package on deploy.
I know that:
It is not necessary, since WebSocket can be used in non-browser apps, but consider this an exercise or a curiosity question.
Play Framework is built on top of Netty, and I have used Netty before for some TCP services (nothing big and nothing production-ready though ... so not an expert). So they should work together, right?
What I was thinking of doing (a rough sketch of the TCP side follows below):
Have an Akka actor listen for new socket connections.
Wrap the connections (WebSockets or TCP sockets) in a ClientConnectionManager instance.
Pass it to the actors that take care of the connection/RPC logic.
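Purely as an illustration of that first step, a minimal sketch of a TCP-listening actor using Akka's classic I/O layer (akka.io.Tcp). The port is a placeholder, and the ConnectionHandler stands in for whatever would wrap the connection in a ClientConnectionManager; this is not Play's own wiring:

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.Props;
import akka.io.Tcp;
import akka.io.TcpMessage;

import java.net.InetSocketAddress;

public class TcpListener extends AbstractActor {

    @Override
    public void preStart() {
        // Ask the TCP manager to bind; Bound/Connected events come back as messages.
        ActorRef tcp = Tcp.get(getContext().getSystem()).manager();
        tcp.tell(TcpMessage.bind(getSelf(), new InetSocketAddress("localhost", 9100), 100), getSelf());
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(Tcp.Bound.class, bound -> {
                    // Listening; nothing to do until clients connect.
                })
                .match(Tcp.Connected.class, conn -> {
                    // One handler actor per connection; in the real app it would wrap the
                    // connection and forward decoded RPCs to the shared logic actors.
                    ActorRef handler = getContext().actorOf(Props.create(ConnectionHandler.class));
                    getSender().tell(TcpMessage.register(handler), getSelf());
                })
                .build();
    }

    // Minimal per-connection handler.
    public static class ConnectionHandler extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(Tcp.Received.class, msg -> {
                        // msg.data() holds the bytes; decode them with the same
                        // serialization used on the WebSocket path.
                    })
                    .match(Tcp.ConnectionClosed.class, msg -> getContext().stop(getSelf()))
                    .build();
        }
    }
}
```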
Other leads I considered: reimplementing the Play Framework Controller class.
Or is there an already implemented solution for this?

Connections between controller and remote worker machines

I need to develop a platform in Java to download tweets from Twitter (that was obvious). The idea is to have various computers downloading from the streaming API and a main controller sending tasks (keywords to download and other data) to each fetcher. My problem is the connection between these programs. What is the best way to do this? Currently I'm using RMI to send commands like "stop", "start", and "setTask" from the controller (client) to each fetcher (server), and an SSLSocket for a quick validation, but I'm not sure this is a good idea. I could use TCP sockets, but maybe it's not a good idea to keep permanent connections. What do you think? Is it a good idea to keep using RMI, or should I take another approach?
Thank you ;)
I propose you use a queue (with any queueing protocol):
ActiveMQ, RabbitMQ, QPID, or one of many other tools.
I use ActiveMQ in production and am happy with it, but for very high load RabbitMQ may be better.
You get easy scaling to any number of workers and the easiest way to share/split tasks between them.
Also, please look at ActiveMQ, RabbitMQ, ZeroMQ, or similar.
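To make the queue approach concrete, a minimal sketch using ActiveMQ over plain JMS; the broker URL and queue name are placeholders, and the text payload stands in for whatever task format (keywords, etc.) the controller sends:

```java
import org.apache.activemq.ActiveMQConnectionFactory;

import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.Destination;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

public class TaskQueueDemo {

    private static final String BROKER_URL = "tcp://localhost:61616";   // placeholder broker
    private static final String QUEUE_NAME = "fetcher.tasks";           // placeholder queue name

    // Controller side: publish a task; whichever fetcher is free picks it up.
    public static void sendTask(String keywords) throws Exception {
        Connection connection = new ActiveMQConnectionFactory(BROKER_URL).createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Destination queue = session.createQueue(QUEUE_NAME);
        MessageProducer producer = session.createProducer(queue);
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);               // survive a broker restart
        producer.send(session.createTextMessage(keywords));
        connection.close();
    }

    // Fetcher side: block until the next task; each message goes to exactly one worker.
    public static String receiveTask() throws Exception {
        Connection connection = new ActiveMQConnectionFactory(BROKER_URL).createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue(QUEUE_NAME));
        TextMessage task = (TextMessage) consumer.receive();
        connection.close();
        return task.getText();
    }

    public static void main(String[] args) throws Exception {
        sendTask("java,kafka");
        System.out.println("fetcher got: " + receiveTask());
    }
}
```

Because the queue decouples the controller from the fetchers, adding more workers is just a matter of starting more consumers on the same queue, which is where the easy scaling comes from.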

Managing multiple socket connections in a java server application

In our new project we need to implement a server application. This server gets connection requests from 50,000(+) clients. The problem is that these connections have to remain open and have to be managed somewhere. The application should work like a telephone exchange: it receives requests from connected clients and connects them to other (possibly several) clients, but only if those are also connected. A proprietary protocol is used. My questions are:
How (and where) do I manage the open sockets? Should I put them in a HashMap or something? That sounds odd to me, but I don't have experience with so many open connections.
Are there any frameworks available that support these connection requirements?
Thank you for your help!
How (and where) to manage the open sockets? Should I put them in a HashMap or something?
Typically each socket will be managed by a thread that will be responsible for reading and writing to the socket. You would also have a master thread that is responsible for receiving all connection requests at a predefined network interface & port (using the ServerSocket API class), which may then hand off the actual processing work to the worker/slave threads. In this case, you ought to be looking at a thread pool for the worker threads, because creating 50k threads will most likely overwhelm your OS and the hardware.
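To illustrate that blocking-I/O design, a minimal sketch of an accept loop handing connections to a worker pool while keeping a registry of open sockets; the port, pool size, and relay logic are placeholders, and for 50k connections the NIO route mentioned next is the more realistic one:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class ExchangeServer {

    // Registry of open connections so one client can be routed to another.
    private static final Map<Long, Socket> openSockets = new ConcurrentHashMap<>();
    private static final AtomicLong nextId = new AtomicLong();

    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(200);   // bounded pool, not 50k threads

        try (ServerSocket serverSocket = new ServerSocket(9000)) {     // placeholder port
            while (true) {
                Socket socket = serverSocket.accept();                 // master thread: accept only
                long id = nextId.incrementAndGet();
                openSockets.put(id, socket);
                workers.submit(() -> handle(id, socket));              // worker thread: protocol handling
            }
        }
    }

    private static void handle(long id, Socket socket) {
        try (Socket s = socket) {
            // Placeholder: read the proprietary protocol from s.getInputStream(),
            // look up peers in openSockets, and relay messages between them.
        } catch (IOException e) {
            // connection error: nothing to do beyond cleanup
        } finally {
            openSockets.remove(id);
        }
    }
}
```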
Also, if you are indeed managing 50k concurrent sockets, using the NIO API (java.nio.*) over Java's plain IO API is highly recommended, although I haven't seen many projects requiring more than 2-5k concurrent connections. There are at least two well-known NIO-based frameworks in the Java world - Apache MINA and JBoss Netty. I would, however, recommend reading the well-written NIO tutorial before going on to use the NIO API or the NIO frameworks.
