I need to implement a real-time analytics system with a server and terminals.
I use the ZeroMQ library (pub/sub mode) to send messages (~40 bytes) to clients.
If I connect a single client, messages arrive with a delay (sometimes more than 250 ms).
If I connect 100 clients, many of them lose uniformity of delivery (more than 750 ms without a single message, followed by a huge burst of data). This is a critical issue for me.
I have to publish to more than 6000 terminals...
I publish every 30 ms, which is about 1700 bytes to each client in the worst case (TCP).
Should I use another technology to deliver messages in real time?
As I said in the comment, multicast is the way. The primary overriding concern is whether your terminals can join the group you are publishing on, irrespective of how far away they are.
You haven't indicated how the terminals connect to your network (for example, VPN over the internet, a private line, whatever). You asked for a better technology: it's multicast.
Now there are some options if you are going to go down the TCP route:
Build a load-balancing infrastructure which sits in front of your service, meaning that your terminals don't connect to your service but to a set of load balancers, which then connect to your service. If you have 10 of these, for example, each only has to deal with 600 clients; your problem is much smaller and you can scale this way. Don't forget to use asynchronous I/O (see the sketch after these options).
Buy better hardware. For example, Solace and Tervela make hardware message brokers which can scale to very large numbers of concurrent TCP connections, but this is not cheap.
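If you stay with ZeroMQ for the fan-out tier, one well-known way to build such an intermediary is the XSUB/XPUB proxy pattern. Below is a minimal sketch, assuming JeroMQ (the pure-Java ZeroMQ implementation); the host name and ports are placeholders:

    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;

    // One of N intermediaries: subscribes upstream, re-publishes downstream.
    public class PubSubProxy {
        public static void main(String[] args) {
            try (ZContext ctx = new ZContext()) {
                ZMQ.Socket frontend = ctx.createSocket(SocketType.XSUB); // faces your publisher
                frontend.connect("tcp://publisher-host:5556");           // placeholder address
                ZMQ.Socket backend = ctx.createSocket(SocketType.XPUB);  // faces the terminals
                backend.bind("tcp://*:5557");
                ZMQ.proxy(frontend, backend, null); // blocks, forwarding data and subscriptions
            }
        }
    }

Run ten of these on separate boxes and point 600 terminals at each; the publisher then only ever sees ten connections.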
We are working on an Android project with the following requirements.
The application should be able to send data to all devices on the Wi-Fi LAN that are running our application.
Some payloads are expected to be >= 5 MB in size.
Data must not be lost, and if it is lost, the sender should know about the failure.
All devices should be able to communicate with each other. No message is targeted at a specific device; instead, every message should reach all devices in the network.
There is no internet access, hence no remote server.
Study we have done:
UDP broadcasting - UDP doesn't guarantee message delivery, but delivery is a prime requirement in our case. Hence not an option.
TCP - TCP guarantees message delivery but requires the receiver's IP address to be known beforehand, whereas we need to send messages to all devices on the LAN. Hence not a straightforward option.
Solutions we are looking into:
A hybrid approach - Nominate one of the devices in the network as the server. Post all messages to this local server. The server keeps an open socket to every device (which has our application), and when a message arrives from a device, it routes the message to all devices. The disadvantages of this approach are:
The server has to keep multiple sockets open, one per device. But in our case we expect at most 5 devices on the LAN.
Server discovery requires continuous UDP broadcasting.
We want all the data on all the devices, so if we introduce a new device into the LAN, that device needs to get all the existing data from the server.
So my question: have you ever worked with this kind of hybrid approach? Or can you suggest any other approaches?
Your hybrid approach is the way to go.
Cleanly split your problem into parts and solve them independently:
Discovery: Devices need to be able to discover the server, if there is any.
Select server: Decide which of your devices assumes the server role.
Server implementation: The server distributes all data to all devices and sends notifications as necessary. Push or pull with notifications does not matter.
Client implementation: Clients only talk to the server. The device which contains the server should also contain a normal client, potentially passing data to the server directly, but using the same abstract protocol.
You could use mDNS (aka Bonjour or zeroconf) for the discovery, but I would not recommend it. It often creates more problems than it solves, and it does not solve your 'I need one server' problem. I would suggest you handcraft a simple UDP broadcast protocol for the discovery, which already tells you who the server is, if there is one.
Select server: One approach is to use network metadata which you have anyway, for example 'use the device with the highest IP address'. This often works better than fancy arbitration algorithms. Once you have established a server, new devices should use it rather than taking over the server role.
Use UDP broadcast for the discovery, with manual heuristic repeats. No fancy logic: just make your protocol resilient against repeated packets and repeat your packets. (Your WLAN router may repeat your packets without your knowledge anyway.)
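To make that concrete, here is a minimal discovery sketch in plain Java; the port number and the WHO_IS_SERVER / SERVER <ip> message strings are invented for illustration:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    public class Discovery {
        static final int PORT = 47474; // hypothetical discovery port

        // Broadcast a query a few times; returns e.g. "SERVER 192.168.1.23", or null.
        public static String findServer() throws Exception {
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setBroadcast(true);
                socket.setSoTimeout(500); // wait at most 500 ms per attempt
                byte[] query = "WHO_IS_SERVER".getBytes(StandardCharsets.UTF_8);
                DatagramPacket request = new DatagramPacket(query, query.length,
                        InetAddress.getByName("255.255.255.255"), PORT);
                DatagramPacket reply = new DatagramPacket(new byte[256], 256);
                for (int attempt = 0; attempt < 3; attempt++) { // repeat, since UDP may drop
                    socket.send(request);
                    try {
                        socket.receive(reply);
                        return new String(reply.getData(), 0, reply.getLength(),
                                StandardCharsets.UTF_8);
                    } catch (java.net.SocketTimeoutException e) {
                        // no answer yet, retry
                    }
                }
                return null; // no server found; the caller may assume the role itself
            }
        }
    }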
You may want to use two TCP connections per client, potentially to two different server ports, but that does not matter much: One control connection (always very responsive, no big amounts of data, just a few hundred bytes per message) and one data connection (for the bulk of the data, your > 5 MB chunks). This is so that everything stays responsive.
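A sketch of that two-connection layout from the client side (host name and ports are placeholders):

    import java.net.Socket;

    public class TwoChannelClient {
        public static void main(String[] args) throws Exception {
            try (Socket control = new Socket("server.local", 6000); // small, frequent messages
                 Socket data = new Socket("server.local", 6001)) {  // bulk >5 MB transfers
                control.setTcpNoDelay(true); // flush tiny control messages immediately
                // the read/write loop for each channel would run in its own thread
            }
        }
    }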
The system I am developing potentially has a very large number of clients (let's say one million) that need to periodically update a central server with some information. The clients are written in Java.
The specific use-case is that the server backend needs to have an up to date mapping of IP address to clients. But the client IPs are dynamic and subject to (effectively random) change.
The solution I have in mind requires the clients to ping the server to update their IP. The period ideally should be once every minute, but even 1 ping/10 mins is acceptable.
My questions, in sequence:
1M pings per minute is over 10k/sec. So first off, I want to know which approaches can scale to handle such a load. This is to know the options available.
Assuming you have more than one solution in mind, which of these would be the most economical? Cost-effectiveness is critically important. I don't have my own data center or a static, fat endpoint on the net, so the server application will need to run on some sort of provider, or ultimately in the cloud.
Notes:
I considered running the server from home using my own ISP-provided connection, but I am neither sure of the performance issues nor of what my ISP would think about a constant stream of pings.
I can't see how the server could auto-discover these IP changes on its own.
Erik, your problem is much simpler than it has been made to sound.
This problem has been around for a decade, maybe two. There is no need to reinvent the wheel here.
Why Polling/Pinging is a Bad Idea
The dynamic IPs provided by ISPs can have a variable lease time, but will often last at least 24-72 hours. Pinging your server every 1-10 minutes would be a horrible waste of resources, making up to 4,320 useless HTTP requests PER CLIENT over a 72-hour period. At, say, around 300 bytes per request, those 4,320 wasted HTTP requests come to about 1.3 MB of wasted bandwidth per client; multiplied by your target of 1 million clients, you are talking about ~1.3 TB of wasted bandwidth every 72 hours, or roughly 13 TB per month! And that's just the wasted bandwidth, not the other bandwidth you might need to run your app and provide useful information.
The clients need to be smarter than just pinging frequently. Instead, they should check on startup whether their IP address matches the DNS record, and then send a notification to the server only when the IP changes. This will cut your bandwidth and server-processing requirements by a factor of thousands.
What you are describing is Dynamic DNS
What you are talking about is "Dynamic DNS" (both a descriptive name for the technology and also the name of one company that provides a SaaS solution).
Dynamic DNS is quite simply a DNS server that allows you to very rapidly change the mapping between a name and an IP address. Normally this is useful for devices using an ISP which only provides dynamic IPs. Whenever the IP changes for the router/server on a dynamic IP it will inform the Dynamic DNS server of the change.
The de facto standard protocol for dynamic DNS is well documented. Start here: DNS Update API. I think the specifics you are looking for are here: DynDNS Perform Update. Most commercial implementations stay very close to this protocol, because router hardware usually has a built-in DynDNS client which everyone wants to use.
Most routers (even cheap ones) already have Dynamic DNS clients built into them. (You can write your own soft client, but the router is likely the most efficient location for this, as your clients are probably being NAT'd with a private IP; you can still do it client-side, but at the cost of more bandwidth for public-IP discovery.)
A quick google search for "dynamic DNS java client" brings up full source projects like this one: Java DynDNS client (untested, just illustrating the power of search)
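For a feel of the protocol, an update in the DynDNS scheme boils down to one authenticated HTTP GET. Here is a rough sketch; the hostname, credentials, IP, and User-Agent string are placeholders you would replace per your provider's documentation:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;

    public class DynDnsUpdate {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://members.dyndns.org/nic/update"
                    + "?hostname=myhost.example.org&myip=203.0.113.7"); // placeholders
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            String auth = Base64.getEncoder()
                    .encodeToString("user:password".getBytes("UTF-8"));
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setRequestProperty("User-Agent", "example-client/1.0"); // the API expects one
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                System.out.println(in.readLine()); // e.g. "good 203.0.113.7" or "nochg ..."
            }
        }
    }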
Other Considerations for your System Architecture
Let's say the IP-to-client mapping problem gets resolved. You figure it all out, it works perfectly, and you always know the IP for each client. Would you then have a nice, reliable system for transferring files to clients from mobile devices? I would say no.
Both mobiles and home computers can have multiple connection types: Wi-Fi, cellular data, maybe wired data. Each of these networks may have different security systems in place. So a connection from a mobile on cellular data to a Wi-Fi laptop behind a home router is going to look very different from a Wi-Fi mobile device connecting to a laptop on the same Wi-Fi network.
You may have physical router firewalls to contend with. Home computers may also have Windows Firewall enabled, or maybe Norton Internet Security, Symantec, AVG, ZoneAlarm, etc... Do you know the firewall considerations for all these potential clients?
Maybe you could use SIP as the protocol for that purpose?
The Java SIP libraries have probably already solved your problem.
Nice app, by the way.
I would suggest you tweak your Java program to detect the IP change itself and only then hit the web service.
You can do it like this:
On program initialization, extract the machine's IP and store it in a global variable or, better, a properties file.
Run a batch process/scheduler which checks your IP every 30 seconds or 1 minute for changes. The Java Quartz Scheduler will come in very handy for you.
Invoke the web service only when the IP has changed.
This way you reduce the server's role and thus traffic and connections.
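A minimal sketch of that loop; it uses the JDK's ScheduledExecutorService instead of Quartz to stay dependency-free, and notifyServer() is a placeholder for your actual web-service call:

    import java.net.InetAddress;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    public class IpChangeWatcher {
        private final AtomicReference<String> lastIp = new AtomicReference<>("");

        public void start() {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    String current = InetAddress.getLocalHost().getHostAddress();
                    if (!current.equals(lastIp.getAndSet(current))) {
                        notifyServer(current); // only hit the web service on change
                    }
                } catch (Exception e) {
                    e.printStackTrace(); // swallow so the scheduler keeps running
                }
            }, 0, 30, TimeUnit.SECONDS);
        }

        private void notifyServer(String ip) {
            // placeholder: POST the new IP to your update endpoint here
            System.out.println("IP changed, notifying server: " + ip);
        }
    }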
You could create your own protocol on top of UDP, for example an XML-based one. Define 3 messages:
request - client requests a challenge from server
challenge - server replies with challenge (basically a random number)
response - client sends username and hashed password + challenge back to the server
It's lightweight and not too traffic-heavy. You can load-balance it across multiple servers at any layer, or by using a load balancer.
Any average PC could handle a million such hits per minute, provided you do the server side in C/C++ (I don't know about Java's network performance).
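A sketch of the client side of that exchange; plain strings are used instead of XML to keep it short, and the server address, port, message formats, and SHA-256 hashing are all assumptions for illustration:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class ChallengeClient {
        public static void main(String[] args) throws Exception {
            InetAddress server = InetAddress.getByName("server.example.com"); // placeholder
            int port = 9999;                                                  // placeholder
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setSoTimeout(1000); // give up if the server stays silent
                send(socket, server, port, "REQUEST");
                String challenge = receive(socket); // e.g. "CHALLENGE 1234567890"
                String nonce = challenge.split(" ")[1];
                // hash password + challenge so the password never travels in the clear
                byte[] hash = MessageDigest.getInstance("SHA-256")
                        .digest(("secret" + nonce).getBytes(StandardCharsets.UTF_8));
                send(socket, server, port, "RESPONSE alice " + toHex(hash));
            }
        }

        static void send(DatagramSocket s, InetAddress a, int p, String m) throws Exception {
            byte[] d = m.getBytes(StandardCharsets.UTF_8);
            s.send(new DatagramPacket(d, d.length, a, p));
        }

        static String receive(DatagramSocket s) throws Exception {
            DatagramPacket p = new DatagramPacket(new byte[512], 512);
            s.receive(p);
            return new String(p.getData(), 0, p.getLength(), StandardCharsets.UTF_8);
        }

        static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02x", b));
            return sb.toString();
        }
    }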
Please have a look at how No-IP works. Your requirement is exactly the same as what it does.
Do I have the use case right? A community of users all want to receive pictures from each other? You don't want to host the images on the server but broadcast them directly to all the users?
There are two questions here. The first question is "how to know if my own WAN IP address has changed."
If you are not NATed then:
InetAddress.getLocalHost()
will tell you your IP address.
If you are NATed, then using dynamic DNS and resolving your own host name will work.
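A sketch combining both cases; "myhost.example.org" stands in for your own dynamic-DNS hostname:

    import java.net.InetAddress;

    public class WanIpCheck {
        public static void main(String[] args) throws Exception {
            String local = InetAddress.getLocalHost().getHostAddress();
            String viaDns = InetAddress.getByName("myhost.example.org").getHostAddress();
            System.out.println("local interface: " + local);
            System.out.println("per dynamic DNS: " + viaDns);
            // if the two differ, you are probably NATed and the DNS value
            // is the address the outside world sees
        }
    }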
The second question is something like "How to share pictures between hosts which come and go on the internet".
The possible solution space includes:
IP Multicast, probably with Forward Error Correction and Carouseling, e.g. FLUTE.
File swarming - e.g. BitTorrent.
A Publish/Subscribe message bus solution using Jabber, AMQP, JMS, STOMP or similar. Suitable implementations include RabbitMQ, ActiveMQ, etc. JMS Topics are a key concept here.
The solution should avoid the massive overheads of doing things at the IP level.
I am creating a web application with a login page, where a number of users may try to log in at the same time, so I need to handle a number of requests at once.
I know this is already implemented for a number of popular sites like Google Talk.
So I have some questions in my mind:
How many requests can a port handle at a time?
How many sockets can I (the server) create? Are there any limitations?
For example, as we know, when we implement client-server communication using socket programming (TCP), we pass a port number (an unreserved port number) to the server for creating a socket.
So what I mean to say is: if 100,000 requests arrive at a single moment, how does the port approach all these requests?
Does it maintain some queue for all these requests, or does it just accept as many requests as its limit allows? If so, what is the request-handling limit of a port?
Summary:
I want to know how a server serves multiple requests simultaneously; I don't know anything about it. I know we connect to a server via its IP address and port number, and that's it.
So I thought there is only one port, and many requests come to that one port from different clients, so how does the server manage all the requests?
That is all I want to know. If you explain this concept in detail it would be very helpful. Thanks anyway.
A port doesn't handle requests, it receives packets. Depending on the implementation of the server, these packets may be handled by one or more processes/threads, so in theory this is unlimited. In practice you'll always be limited by bandwidth and processing performance.
If lots of packets arrive at one port and cannot be handled in a timely manner, they will be buffered (by the server, the operating system, or hardware). If those buffers are full, the congestion may be handled by network components (routers, switches) and by the protocols the network traffic is based on. TCP, for example, has methods to avoid or control congestion: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Congestion_control
This is typically configured in the application/web server you are using. You limit the number of concurrent requests by limiting the number of parallel worker threads you allow the server to spawn to serve requests. If more requests come in than there are threads available to handle them, they start to queue up. This queue is the second thing you typically configure: the socket backlog size. When the backlog is full, the server starts responding with "connection refused" to new incoming requests.
Then you'll probably be restricted by the number of file descriptors your OS supports (in the case of *nix) or the number of simultaneous connections your web server supports. The OS maximum on my machine seems to be 75,000.
100,000 concurrent connections should be easily possible in Java if you use something like Netty.
You need to be able to:
Accept incoming connections fast enough. The NIO framework helps enormously here; it is what Netty uses internally. There is a smallish queue for incoming requests, so you need to be able to handle them faster than the queue can fill up.
Create a connection for each client (this implies some memory overhead for things like connection info, buffers, etc.); you may need to tweak your VM settings to have enough free memory for all the connections.
See this article from 2009 where they discuss achieving 100,000 concurrent connections with about 20% CPU usage on a quad-core server.
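For orientation, a minimal Netty 4 server skeleton along those lines; the port, backlog size, and thread counts are illustrative, not tuned values:

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.ChannelOption;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class ManyConnectionsServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
            EventLoopGroup workers = new NioEventLoopGroup(); // handles connection I/O
            try {
                ServerBootstrap b = new ServerBootstrap();
                b.group(boss, workers)
                 .channel(NioServerSocketChannel.class)
                 .option(ChannelOption.SO_BACKLOG, 4096) // queue of not-yet-accepted connections
                 .childHandler(new ChannelInitializer<SocketChannel>() {
                     @Override
                     protected void initChannel(SocketChannel ch) {
                         // add your protocol handlers to ch.pipeline() here
                     }
                 });
                b.bind(8080).sync().channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }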
The point of my question is to ask whether it is accepted practice to use both TCP and UDP to communicate between client and server.
I am making a real-time client-server game where part of the communication needs to be guaranteed (logging in, etc.), but for other parts it is OK to lose packets (state updates, etc.). So I would like to use UDP for most of the data communication, but I do not want to have to implement my own framework to ensure that my control communication (logging in) is guaranteed.
So, would it be reasonable to initially use TCP to manage a connection, and then send data communication back and forth on a separate port?
You should absolutely do it that way (use TCP and UDP to accomplish different communication tasks). And you don't even have to use two different port numbers; one will suffice. Since TCP and UDP ports are independent, you can listen for the two different protocols on the same port.
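A tiny sketch demonstrating that both protocols can bind the same port number (5000 here is arbitrary):

    import java.net.DatagramSocket;
    import java.net.ServerSocket;

    public class DualProtocol {
        public static void main(String[] args) throws Exception {
            try (ServerSocket tcp = new ServerSocket(5000);       // reliable control channel
                 DatagramSocket udp = new DatagramSocket(5000)) { // lossy state updates
                System.out.println("TCP bound on " + tcp.getLocalPort());
                System.out.println("UDP bound on " + udp.getLocalPort());
                // accept() on tcp and receive() on udp would each run in their own thread
            }
        }
    }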
It is quite reasonable and already in mainstream use. Even when browsing the web, DNS lookups are UDP-based while HTTP connections are TCP-based.
Keep in mind that you should either consider the two connection types to be completely independent or employ additional measures to properly handle any inter-dependencies. TCP connections can have timing issues at the OS and network levels and UDP connections have packet loss issues. You should take specific measures to avoid deadlocks and performance problems when the TCP part of your application stalls or a UDP packet is lost.
It is not only accepted but widely used. As a good example, BATS Exchange uses this approach in its market data distribution system to implement recovery mechanisms.
I have a serial hardware device that I'd like to share with multiple applications, which may reside on different machines within or spanning multiple networks. A key requirement is that the system must support bi-directional communication, such that the clients and the serial device can sit behind firewalls and/or on different networks and still talk to each other (send and receive) through a central server. Another requirement is that the clients must be able to determine whether the gateway/serial device is offline or online.
This serial device is capable of receiving and sending packets to a wireless network. The software that communicates with the serial device is written in Java, and I'd like to keep it a 100% Java solution, if possible.
I am currently looking at XMPP, using Jive Software's Openfire server and the Smack API. With this solution, packets coming off the serial device are delivered to clients via XMPP. Similarly, any client application may send packets to the serial device via the Smack API. Packets are just byte arrays, limited in size to around 100 bytes, so they can be converted to hex strings and sent as text in the body of a message. The system should be tolerant of the clients/serial device going offline, meaning they will automatically reconnect when they are available again, but packets will be discarded while a client is offline. The packets must be sent and received in near real-time, so offline delivery is not desired. Security should be provided by the messaging system and the provided client API.
I'd like to hear of other possible solutions. I thought of using JMS but it seems a bit too heavyweight and I'm not sure it will support the requirement of knowing if clients and/or the gateway/serial device is offline.
Jini might fit the job. It works really well in distributed environments where multicast is available, but it also works over unicast, and it is quite fast. Not only does it provide remote services, but also remote events and, if you need them, distributed transactions. A downside is that it only works with Java.
Where I work, Jini is used in an infrastructure with more than 1000 machines, with each machine providing remote services used to access the devices connected to the machine's serial ports.
You really need to provide a bit more detail... do the clients need guaranteed delivery? What about offline delivery? Is this part of a larger system? Do you need encryption? Security?
If you want the smallest footprint possible, then you should transmit data using ServerSocket, Socket, and serialization. But then you lose all the advantages of the 3rd-party solutions you mentioned, which typically include reliability, delivery guarantees, security, management, etc.
I would personally use JMS, but that's because I'm familiar with it. There are a number of stand-alone servers that can be deployed out-of-the-box with virtually no configuration. They all provide for guaranteed delivery, some security, encryption, and a number of other easy-to-use features. Coding a JMS publisher or subscriber is pretty easy.
Update:
If you want the most ease in coding, then I would look at the third-party solutions. Looking at Smack/XMPP, the API seems to be a little easier than a JMS for the functionality you asked for. You still have to setup/configure a server, etc.
The Smack API also has a lot of extra baggage that you don't need, but its concepts are a little more intuitive since they are all chat/IM concepts.
I would still look at OpenJMS or ActiveMQ. I think knowing JMS will be more valuable in the future than knowing XMPP. Take a look at their Getting Started documentation or the Sun tutorial to see how much coding is involved. In JMS parlance, you will want an administered "Topic" and a "Queue": the serial-port app publishes inbound packets to the Topic and consumes outbound messages from the Queue. All of your clients open a subscription to the Topic and send their outbound messages to the Queue. When you send messages, their delivery mode should be non-persistent.
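To give a feel for how little code the JMS side needs, here is a sketch of the Topic-publishing half using ActiveMQ's client; the broker URL, topic name, and payload are placeholders:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class SerialPublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker
            Connection connection = factory.createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("serial.inbound"); // placeholder topic name
            MessageProducer producer = session.createProducer(topic);
            producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT); // near real-time, no offline storage
            TextMessage msg = session.createTextMessage("0A1B2C"); // packet bytes as a hex string
            producer.send(msg);
            connection.close();
        }
    }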
I ended up using XMPP via the Smack API. What led me to this decision was its native support for presence (is the client online or offline?) and its robust connection handling (it automatically reconnects if the underlying connection breaks). Another benefit of XMPP is that it's compatible with Google Talk, so I don't need to set up a server. Thanks for the suggestions. In case anyone is interested, I have released the code on Google Code: http://code.google.com/p/xbee-xmpp/