For testing purposes I want to create a socket server which will hold 10+ million concurrent socket connections spread over X number of EC2 instances on AWS (still deciding between Node.js with JXCore, Java, and Erlang). Each of these sockets will send a message to one randomly chosen other socket every 10 seconds. I am just having trouble understanding how I can store and look up these sockets effectively.
The two options I can see are to store the socket objects in something like a map in the application itself, or to store them in a fast database such as Redis. The problem with keeping the sockets in a data structure inside the application is whether it will scale, whether it will be robust, and what the read performance will be when millions of sockets need to find one another. And if I store them in a database such as Redis, there must be a network call every time, because socket A needs to know where socket B is located before sending the message. I fear this will bring down performance considerably.
I was wondering what the best practices are for scalable socket servers, as I can't find anything on the internet which answers this question. Every socket server I find online simply broadcasts to every other socket instead of addressing specific sockets, and only handles something like 10 sockets.
If you want this application to be distributed across a number of nodes, there needs to be a way to at least determine the destination node. If that can be a pure function of the source and the current packet, no central storage is required, which is the best possible solution.
In all other cases central storage is inevitable, but some optimizations can be applied to reduce access to it. Local sockets can easily be kept in a local map (ets or mnesia in Erlang, a shared singleton map in other languages) and checked first. The source may be told to cache the destination address so that the packet contains all the necessary information; or that destination cache could be kept on the source socket's node, so as not to depend on client behavior. This cache can be used for routing, and only if the route operation fails is the central storage consulted.
There may be other optimizations depending on what's available in your case.
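A minimal sketch of that local-first lookup in Java; CentralRegistry is a stand-in for whatever central store (Redis, mnesia, etc.) you choose, and all names are illustrative:

```java
import java.net.Socket;
import java.util.concurrent.ConcurrentHashMap;

public class SocketRouter {
    // Sockets terminated on this node, keyed by client id.
    private final ConcurrentHashMap<String, Socket> localSockets = new ConcurrentHashMap<>();
    // Destination id -> owning node, cached from previous lookups.
    private final ConcurrentHashMap<String, String> routeCache = new ConcurrentHashMap<>();
    private final CentralRegistry registry; // hypothetical client for the central store

    public SocketRouter(CentralRegistry registry) { this.registry = registry; }

    public void route(String destinationId, byte[] payload) throws Exception {
        Socket local = localSockets.get(destinationId);
        if (local != null) {               // 1. destination lives on this node: no network call
            local.getOutputStream().write(payload);
            return;
        }
        String node = routeCache.get(destinationId);
        if (node == null) {                // 2. cache miss: exactly one central-store lookup
            node = registry.lookupNode(destinationId);
            routeCache.put(destinationId, node);
        }
        forwardToNode(node, payload);      // 3. relay to the node owning the destination
    }

    private void forwardToNode(String node, byte[] payload) { /* inter-node transport */ }

    public interface CentralRegistry { String lookupNode(String destinationId); }
}
```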
I've been looking into making a simple Sockets-based game in Java, and read in multiple places that client sockets are destroyed after a single exchange. Is this good practice for continued connections? The server needs to maintain a connection with a client (i.e. not using socket.accept() every time it wants to tell a client about something), but can't wait every time for the client's response. I already have the server/client running in separate threads, but won't destroying the socket after every exchange mean re-acquiring (or failing to re-acquire) a connection to that client? I've seen so many conflicting websites about sockets in Java and how they should be implemented.
There are no hard and fast rules, but it does depend slightly on what data rates you want to achieve.
For example, YouTube is a streaming video service, but the video data is delivered by the client using HTTPS to fetch batches of video data. Inefficient, yes, but very easy to program for. There are lots of reasons to use HTTPS for an application like YouTube (firewalls, etc.), but ultimate power saving and network performance were not among them. The "proper" way would be to use a protocol like RTP, which uses UDP to deliver small packets of data that can then be rearranged into order; you also have to deal with missing frames at the codec level, etc. Much less network traffic and friendly to bandwidth-constrained network links, but significantly more difficult to handle when traversing firewalls, in client software, etc.
So if your game is sending modest amounts of data, the only thing wrong with setting up and tearing down a whole socket connection for every message is the nagging feeling you yourself will have that it is somehow not the most efficient solution.
Though it sounds like you have a conflict between the need to communicate between client and server and the need to process something else whilst waiting for the communication to complete. Here you're getting into asynchronous I/O territory. To make that easy I strongly suggest you take a look at ZeroMQ; it will make everything a whole lot simpler.
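For a flavour of what that looks like, a minimal request/reply sketch using the JeroMQ binding of ZeroMQ; the endpoint address is made up, and a real design would pick the socket pattern (REQ/REP, DEALER/ROUTER, etc.) to match the game's traffic:

```java
// Requires the JeroMQ library (org.zeromq:jeromq).
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class ZmqEchoClient {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket socket = ctx.createSocket(SocketType.REQ);
            socket.connect("tcp://localhost:5555"); // illustrative endpoint
            socket.send("hello");                   // ZeroMQ queues and (re)connects for you
            String reply = socket.recvStr();        // blocks until the reply arrives
            System.out.println(reply);
        }
    }
}
```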
and read in multiple places that client sockets are destroyed after a single exchange.
Only in the places where that actually happens. There are numerous contexts where it doesn't, the outstanding example being HTTP, where every effort is made to reuse connections.
Is this good practice for continued connections?
The question is a contradiction in terms. A continued connection is a connection that isn't closed. A closed connection can't be continued.
The server needs to maintain a connection with a client (i.e. not using socket.accept() every time it wants to tell a client about something), but can't wait every time for the client's response.
The word you are groping for here is 'session'.
I already have the server/client running in separate threads, but won't destroying the socket after every exchange mean re-acquiring (or failing to re-acquire) a connection to that client?
Yes.
I've seen so many conflicting websites about sockets in Java and how they should be implemented.
You should use a connection pool at the client; a request loop at the server that looks for multiple requests per connection; a client-side facility that closes idle connections after some idle timeout; and a read timeout at the server that closes connections on which no request has been read within the timeout.
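A minimal sketch of the server half of that advice; the port, timeout, and line-based protocol are arbitrary choices for illustration:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class RequestLoopServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000)) {
            while (true) {
                Socket client = server.accept();          // one session per client
                new Thread(() -> handle(client)).start();
            }
        }
    }

    private static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            c.setSoTimeout(30_000);                       // read timeout: 30 s with no request
            String request;
            while ((request = in.readLine()) != null) {   // many requests per connection
                out.println("echo: " + request);
            }
        } catch (SocketTimeoutException idle) {
            // no request within the timeout: drop the connection
        } catch (Exception e) {
            // connection reset, stream error, etc.
        }
    }
}
```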
I am facing a problem designing my app around datagram sockets. My app needs to communicate with different servers over UDP, and I am not sure which of the following approaches is better. Does either have an advantage (in performance or by other measures), or is there a better option?
Option 1
Create a single DatagramSocket and a single thread to receive data from it. When sending to different servers, set the destination address on each DatagramPacket, and in the receiving thread check the packet's source address and process the data accordingly (see the sketch after this question).
Option 2
Create a different DatagramSocket for each server, use socket.connect() to connect to the relevant server, and create a thread for every socket to receive data.
N.B. I am actually working on an Android app. If you have any query you can ask in a comment.
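To make Option 1 concrete, a minimal sketch; the server addresses are made up:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;

public class SingleSocketClient {
    public static void main(String[] args) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        InetSocketAddress serverA = new InetSocketAddress("10.0.0.1", 4000); // illustrative
        InetSocketAddress serverB = new InetSocketAddress("10.0.0.2", 4000);

        // Receiving thread: demultiplex by the packet's source address.
        new Thread(() -> {
            byte[] buf = new byte[1500];
            while (true) {
                try {
                    DatagramPacket packet = new DatagramPacket(buf, buf.length);
                    socket.receive(packet);
                    InetAddress from = packet.getAddress();
                    if (from.equals(serverA.getAddress())) {
                        // handle data from server A
                    } else if (from.equals(serverB.getAddress())) {
                        // handle data from server B
                    }
                } catch (Exception e) { break; }
            }
        }).start();

        // Sending: set the destination on each packet.
        byte[] data = "ping".getBytes();
        socket.send(new DatagramPacket(data, data.length, serverA));
    }
}
```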
Unless we are talking about 100,000s of connections, I would create a single socket per thread. It speeds up the application and guarantees the thread safety of the sockets, so that received data won't get mixed up.
Most importantly, however, if one channel fails or its latency gets high, it will have no influence on the other channels (sockets).
The drawback is that you are consuming more resources.
It all depends on the purpose of the app.
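A minimal sketch of that per-server-socket layout (Option 2), where connect() scopes each socket to one peer; names are illustrative:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;

public class PerServerChannel implements Runnable {
    private final DatagramSocket socket;

    public PerServerChannel(String host, int port) throws Exception {
        socket = new DatagramSocket();
        socket.connect(new InetSocketAddress(host, port)); // only this peer can send/receive
        new Thread(this).start();                          // dedicated receive thread (sketch only)
    }

    public void send(byte[] data) throws Exception {
        // No address on the packet: a connected socket sends to its peer.
        socket.send(new DatagramPacket(data, data.length));
    }

    @Override
    public void run() {
        byte[] buf = new byte[1500];
        while (!socket.isClosed()) {
            try {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet); // only packets from the connected peer arrive here
                // process this server's data
            } catch (Exception e) { break; } // one channel failing doesn't affect the others
        }
    }
}
```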
My opinion is that you can create a single socket, because creating more sockets will bring down your app.
I have a Java server-client application using a CORBA connection. The application runs well over a wired connection, but when connected via WiFi the client app runs very slowly. Does anybody have an idea why CORBA is so slow over WiFi?
Thanks in advance.
You haven't quantified at all what is slow and fast. There are a few things to look at. First, the design of your IDL interfaces: normally each invocation of an IDL operation results in a remote call which goes over the network. For example, when you want to retrieve 1M values, don't perform 1M operations; retrieve them in bigger chunks. Second, what is the payload of the invocation, i.e. the size of the data to transmit? If it is large and your WiFi link is slow, then it simply takes time to transmit the data. ZIOP (CORBA compression) adds the ability for CORBA to compress your application data, which is something to look at. Last, is your network set up correctly? Do all the host names and IP addresses you use resolve correctly? If, for example, the DNS settings in your WiFi setup are not OK, then reverse lookups can kill performance.
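To illustrate the chunking point, a hypothetical sketch; DataService stands in for your own IDL-generated stub, so the operation name and signature are made up:

```java
public class ChunkedFetch {
    private static final int CHUNK = 10_000;

    // One remote call per chunk instead of one remote call per value.
    public static double[] fetchAll(DataService service, int total) {
        double[] result = new double[total];
        for (int offset = 0; offset < total; offset += CHUNK) {
            int count = Math.min(CHUNK, total - offset);
            double[] chunk = service.values(offset, count);
            System.arraycopy(chunk, 0, result, offset, count);
        }
        return result;
    }

    public interface DataService {               // stand-in for the IDL stub
        double[] values(int offset, int count);
    }
}
```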
Check your CORBA implementation's documentation to enable logging; see what is happening, how much data is transmitted, whether you see errors, etc.
CORBA can be a very network-intensive protocol if developers design CORBA objects like ordinary C++/Java objects, as this causes many small interactions over the network. That makes it very susceptible to network latency, i.e. not the overall speed of the network but the time it takes to open a stream and send a single packet. Wireless networks can be very fast at sending large packets once a connection is established, but I suspect your wireless network is quite slow to route packets.
I have a Scala application which maintains (or tries to maintain) TCP connections to various servers for hours (possibly > 24) at a time. Each server sends a short, ~30 character message about twice a second. These messages are fed into an iteratee, where they are parsed and eventually end up making state changes to a database.
If any of these connections fail for any reason, my app needs to continually try to reconnect until I specify otherwise. Any messages getting lost is Bad. I have no control over the servers I connect to, or the protocols used.
It is conceivable there would be as many as 300 of these connections at once. Not exactly a high-load scenario, so I don't think NIO is needed, though it might be nice to have? Other parts of the app are high-load.
I'm looking for some sort of socket controller / manager which can keep these connections up as reliably as possible. I am running my own blocking controller now, but as I'm inexperienced with socket coding (and all the various settings, options, timeouts, etc.) I doubt it will achieve the best possible uptime. Plus I may need SSL support at some point down the line.
Would NIO offer any real advantages?
Would Netty be the best choice here? I've seen the Uptime example here, and was thinking of simply duplicating it, but being new to lower-level networking I wasn't sure if there were better options.
However I'm uncertain of the best strategies for ensuring as few packets are lost as possible, and assumed this would be a "solved" problem in one library or another.
Yup. JMS is an example.
I suppose a lot of it would come down to a timeout guessing strategy? Close and re-open a socket too early and you've lost whatever packets were en route.
That is correct. That approach is not going to be reliable, especially if connections go up and down regularly.
A real solution involves having the other end keep track of what it has received, and letting the sender know when the connection is re-established. If that can't be done, you have no real way of controlling how much gets lost. (This is what reliable messaging services do...)
I have no control over the servers I connect to, so unless there's another way to adapt JMS to a generic TCP stream, I don't think it will work.
Yup. And the same applies if you try to implement this by hand. The other end has to cooperate.
I guess you could construct something where you run (say) a JMS endpoint on each of the remote servers, and have the endpoint use UNIX domain sockets or loopback (i.e. 127.0.0.1) to talk to the server. But you still have the potential for message loss.
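For the reconnect side of the original question, a minimal sketch of a self-reconnecting client using Netty 4's API; the commented-out handler and the fixed 5-second backoff are placeholders:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;
import java.util.concurrent.TimeUnit;

public class ReconnectingClient {
    private final EventLoopGroup group = new NioEventLoopGroup();
    private final String host;
    private final int port;

    public ReconnectingClient(String host, int port) { this.host = host; this.port = port; }

    public void connect() {
        Bootstrap bootstrap = new Bootstrap()
            .group(group)
            .channel(NioSocketChannel.class)
            .handler(new ChannelInitializer<Channel>() {
                @Override
                protected void initChannel(Channel ch) {
                    // ch.pipeline().addLast(new MessageHandler()); // your protocol handler
                }
            });
        bootstrap.connect(host, port).addListener((ChannelFutureListener) future -> {
            if (future.isSuccess()) {
                // Reconnect whenever the established connection drops.
                future.channel().closeFuture()
                      .addListener((ChannelFutureListener) f -> scheduleReconnect());
            } else {
                scheduleReconnect(); // connect attempt failed: retry later
            }
        });
    }

    private void scheduleReconnect() {
        group.schedule(this::connect, 5, TimeUnit.SECONDS); // fixed 5 s backoff
    }
}
```

Note this only keeps the connection up; as discussed above, avoiding message loss across reconnects still requires cooperation from the other end.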
Background
In my Java application, I have a reasonably large amount of data sitting in a ConcurrentHashMap.
Now, I need to give this data to a consumer client in XML format when the client connects to my application via a TCP port.
So in a nutshell: I have a TCP server that a client connects to. As soon as the client connects, I have to read all the data in the map and send it out in a custom XML format on the TCP port. The data in the map keeps getting updated automatically from somewhere else using worker threads etc., so I have to keep sending fresh data to the client on this TCP port over and over.
I want to implement a solution that is memory- and CPU-efficient; mainly, I would prefer not to generate too many immutable objects on the heap.
NOTE: In the future I might have to support multiple output formats (like comma-separated, JSON, or HL7). To keep it simple, let's say there's a different TCP port the client can connect to for each specific format.
Question
With that said, I've been looking around for the best solution for my TCP server implementation and the data conversion from ConcurrentHashMap to XML.
For the TCP server, people talk about:
Netty
KryoNet
Apache MINA
My client will be some third party, so I think KryoNet is out, since the client won't do the "register" business needed by KryoNet. So out of MINA and Netty, which one is more scalable and easier to understand? Any other suggestion?
For the data conversion from ConcurrentHashMap to XML, I was thinking of using XStream.
Any other suggestions?
Thanks
If you have 100s or 1000s of connections, you should start to consider scalability. However, if you have a small number of connections, plain Sockets may be all you need.
If only a portion of the data is changing, you are better off sending only the data which has changed, or at least only regenerating the XML which has changed.
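A minimal sketch of the plain-Socket route with XStream; note it resends a full snapshot each time, which is exactly the cost the previous paragraph warns about:

```java
// Requires the XStream library (com.thoughtworks.xstream:xstream).
import com.thoughtworks.xstream.XStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

public class XmlSnapshotServer {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
    private final XStream xstream = new XStream();

    public void serve(int port) throws Exception {
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> push(client)).start();
            }
        }
    }

    private void push(Socket client) {
        try (Writer out = new OutputStreamWriter(client.getOutputStream())) {
            while (!client.isClosed()) {
                // Copy first so we don't serialize a map that changes mid-write.
                xstream.toXML(new HashMap<>(data), out);
                out.flush();
                Thread.sleep(1000); // illustrative refresh interval
            }
        } catch (Exception e) {
            // client disconnected
        }
    }
}
```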
How fast does it need to be? It seems like you should be able to create something that returns in less than 10 ms (plus RTT) just using Tomcat and a standard framework like Spring MVC. Use JAXB to convert objects to XML. If you want to support additional output formats like JSON, it's trivial (use the Jackson library for that; its API is similar to JAXB's).
I had a co-worker who tried the socket-server approach, and in the end we used Tomcat because it was almost as fast and the QPS was more stable/predictable.
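For comparison, a hypothetical sketch of that Tomcat/Spring MVC route; the mapping path is made up, and Spring serializes the return value with Jackson (JSON) or JAXB (XML) based on configuration and the request's Accept header:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SnapshotController {
    // In the real app this would be the shared map updated by the worker threads.
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    @GetMapping("/snapshot")
    public Map<String, String> snapshot() {
        return data; // converted by the configured message converter
    }
}
```

The trade-off versus the raw TCP push model is that clients poll for snapshots instead of receiving a continuous stream, which is often acceptable and much simpler to operate.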