I have an issue that is driving me crazy! Both design-wise and tech-wise.
I have a need to listen to a LOT of multicast addresses. They are divided into 3 groups per item that I am monitoring/collecting. I have gone down the road of having one process spin up 100 threads. Each thread uses 2 ports and three addresses/groups (2 of the groups are on the same port). I am using a MulticastChannel for each port and a selector (SELECT) to monitor for data. (I have used datagram but found NIO MulticastChannel much better.)
Anyway, I am seeing issues where I can get about a thousand of these threads subscribed and data hums along nicely. The problem is, after a while some of them stop receiving data. I have confirmed with the system (CentOS) that I am still subscribed to these addresses, but the data just stops. I have monitors in my threads that watch for data drops and out-of-order packets via the RTP headers. When I detect that a thread has stopped getting data, I do a DROP/JOIN, and data then resumes.
I am thinking that a router in my path is dropping my subscription.
I am at my wits' end writing code to stabilize this process.
Has anyone ever sent IGMP joins out on the network to keep the data flowing? Is this possible, or even reasonable?
BTW: The computer is a HP DL380 Gen-9 with a 10G fiber connection to a 6509 switch.
Any pointers on where to look would really help.
Please do not ask for any code examples.
The joinGroup() operation already sends out IGMP requests on the network. It shouldn't be necessary to send them out yourself, and it isn't possible in pure Java anyway.
You could economize on sockets and threads. A socket can join up to about 20 groups on most operating systems, and if you're using NIO and selectors there's no need for more than one thread anyway.
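As a rough sketch of that approach, assuming NIO (the interface name, port and group addresses below are placeholders):

```java
import java.io.IOException;
import java.net.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.*;

public class MulticastReceiverSketch {
    public static void main(String[] args) throws IOException {
        NetworkInterface nif = NetworkInterface.getByName("eth0");     // placeholder interface
        DatagramChannel ch = DatagramChannel.open(StandardProtocolFamily.INET)
                .setOption(StandardSocketOptions.SO_REUSEADDR, true)
                .bind(new InetSocketAddress(5000))                     // placeholder port
                .setOption(StandardSocketOptions.IP_MULTICAST_IF, nif);
        ch.configureBlocking(false);

        // One channel can join several groups (OS limits apply, typically around 20 per socket).
        // Keep the MembershipKeys if you ever need to drop() and re-join a group.
        List<MembershipKey> keys = new ArrayList<>();
        for (String group : List.of("239.1.1.1", "239.1.1.2")) {       // placeholder groups
            keys.add(ch.join(InetAddress.getByName(group), nif));
        }

        Selector sel = Selector.open();
        ch.register(sel, SelectionKey.OP_READ);
        ByteBuffer buf = ByteBuffer.allocate(1500);

        // A single thread services every joined group via the selector.
        while (true) {
            sel.select();
            for (Iterator<SelectionKey> it = sel.selectedKeys().iterator(); it.hasNext(); ) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isReadable()) {
                    buf.clear();
                    SocketAddress src = ((DatagramChannel) key.channel()).receive(buf);
                    buf.flip();
                    // handle the datagram from src here
                }
            }
        }
    }
}
```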
I have used datagram but found NIO MulticastChannel much better.
I don't know what this means. If you're referring to DatagramSocket, you can't use it for receiving multicasts, so the sentence is pointless. If you aren't, the sentence is meaningless.
OK, so let's clarify the questions...
I'm studying Sockets in Java. From my understanding so far, the points related to this subject are:
To allow multiple clients to connect to a single address (port) on the server, it is necessary to assign each client connection to its own thread.
Based on that, I got confused about some things and could not find any acceptable answer here or on Google so far.
If Socket is synchronous, what happens if 2 clients try to connect AT THE SAME TIME, and how does the server decide who will connect first?
How does the server process multiple messages from one client? I mean, does it process them in order? Does it return them in order?
Same question above BUT with multiple messages from multiple clients?
If the messages are not ordered, how do I achieve that? (in Java)
Sorry about all those questions but for me all of them are related...
Edit:
As the comment said, I misunderstood the concept of synchronization, so I changed that part.
Guys, we ask here to LEARN, not to get judged by others on SO; think about that before giving a -1 vote, OK?
what happens if 2 clients try to connect AT THE SAME TIME
It is impossible for 2 clients to connect at exactly the same time: the networking infrastructure guarantees it. Two requests happening at the exact same time is called a collision (see Wikipedia), and the network handles it one way or another: either through collision detection or through collision avoidance.
How does the server process multiple messages from one client? I mean, does it process them in order?
Yes. The Socket class API uses the TCP/IP protocol, which includes sequence numbers in every segment, and re-orders segments so that they are processed in the order they are sent, which may be different from the order they are received.
If you used DatagramSocket instead, that would use UDP, which does not guarantee ordering.
Same question above BUT with multiple messages from multiple clients?
There are no guarantees of the relative ordering of segments sent from multiple sources.
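As a small illustration of the per-connection ordering (the port is arbitrary, just a sketch): whatever a client writes on one connection comes out of the server's input stream in exactly that order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class InOrderServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9000);          // placeholder port
             Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
            String line;
            // Lines come out of the stream in exactly the order the client wrote them;
            // TCP re-orders segments below this level before the data reaches us.
            while ((line = in.readLine()) != null) {
                System.out.println("received: " + line);
            }
        }
    }
}
```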
I have been working on socket programming in Java recently and a few things are confusing me. I have three questions about it.
The first one is:
There is a ServerSocket class in Java. Its constructor can take up to 3 parameters: port, backlog and IP address. The backlog is the number of clients that can be queued waiting to connect to the server. Now let's think about this situation.
What happens if 10 clients try to connect to this server at the same time?
Does the server drop the last 5 clients which tried to connect? Let's increase the number of clients to 1 million per hour. How can I handle all of them?
The second question is:
Can a client send messages concurrently without waiting for the server's response? What happens if a client sends 5 messages to a server that has a backlog size of 5?
The last one is not a question, actually. I have a plan to manage load balancing in mind. Let's assume we have 3 servers running on a machine.
Let the servers' names be A, B and C, and all of them are running smoothly. According to my plan, if I give each of them a priority based on incoming messages, then the smallest priority means the most available server. For example:
Initial priorities -> A(0), B(0), C(0), and the response time is 5 time units (so a server's count drops back down 5 time units after it receives a message).
1. Message -> A(1), B(0), C(0)
2. Message -> A(1), B(1), C(0)
3. Message -> A(1), B(1), C(1)
4. Message -> A(2), B(1), C(1)
5. Message -> A(2), B(2), C(1)
6. Message -> A(1), B(2), C(2)
.
.
.
Is this logic good? I bet there is far better logic. What should I do to handle a few million requests a day?
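Roughly, the counter logic I have in mind looks something like this (just a sketch; the class and names are made up):

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

class LeastLoadedBalancer {
    private final Map<String, AtomicInteger> inFlight = new HashMap<>();

    LeastLoadedBalancer(List<String> servers) {
        servers.forEach(s -> inFlight.put(s, new AtomicInteger()));
    }

    // Pick the server with the fewest outstanding messages and count the new one.
    synchronized String acquire() {
        String best = Collections.min(inFlight.keySet(),
                Comparator.comparingInt((String s) -> inFlight.get(s).get()));
        inFlight.get(best).incrementAndGet();
        return best;
    }

    // Called when the server responds, e.g. at the end of the 5-time-unit "response time".
    void release(String server) {
        inFlight.get(server).decrementAndGet();
    }
}
```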
PS: All this logic is going to be implemented in a Java Spring Boot project.
Thanks
What happens if 10 clients try to connect to this server at the same time?
The javadoc explains it:
The backlog argument is the requested maximum number of pending connections on the socket. Its exact semantics are implementation specific. In particular, an implementation may impose a maximum length or may choose to ignore the parameter altogether.
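In code, the backlog is just the optional second constructor argument, for example (the port and backlog values are arbitrary):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogSketch {
    public static void main(String[] args) throws IOException {
        // Ask the OS to queue at most 10 pending (not yet accepted) connections.
        // The OS may cap or ignore this hint.
        try (ServerSocket server = new ServerSocket(8080, 10)) {
            while (true) {
                // Connections beyond those still waiting here may be refused or time out.
                Socket client = server.accept();
                client.close();
            }
        }
    }
}
```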
Let's increase the number of clients to 1 million per hour. How can I handle all of them?
By accepting them fast enough to handle them all in one hour. Either the conversations are so quick that you can just handle them one after another. Or, more realistically, you will handle the various messages in several threads, or use non-blocking IO.
Can a client send messages concurrently without waiting for the server's response?
Yes.
What happens if a client sends 5 messages to a server that has a backlog size of 5?
Sending messages has nothing to do with the backlog size. The backlog is for pending connections. Messages can only be sent once you're connected.
All this logic is going to be implemented in a Java Spring Boot project.
Spring Boot is, most of the time, not used for low-level socket communication, but to expose web services. You should probably do that, and let standard solutions (a reverse proxy, software or hardware) do the load-balancing for you. Especially given that you don't seem to understand how sockets, non-blocking IO, threads, etc. work yet.
So, for your first question: the backlog queue is where clients are held waiting while you are busy handling other things (e.g. IO with an already-connected client). If the queue grows beyond the backlog, those new clients will get a connection refused. You should be OK with 10 clients connecting at the same time. It's a long discussion, but keep a thread pool: as soon as you get a connected socket from accept, hand it to your thread pool and go back to waiting in accept. You can't support millions of clients "practically" on one single server, period! You'll need to load balance.
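A sketch of that accept-and-hand-off pattern (the port, backlog and pool size are arbitrary):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledServer {
    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(100);   // arbitrary size
        try (ServerSocket server = new ServerSocket(8080, 50)) {    // arbitrary port/backlog
            while (true) {
                Socket client = server.accept();       // block until the next queued client
                pool.submit(() -> handle(client));     // hand off, go straight back to accept()
            }
        }
    }

    private static void handle(Socket client) {
        try (Socket c = client) {
            // read the request and write a response here
        } catch (IOException e) {
            // log and drop the connection
        }
    }
}
```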
Your second question is not clear. Clients can't send messages while they are still on the queue; they are taken off the queue once you accept them, and after that it's not relevant how long the queue is.
And lastly, about your load-balancing question: if you are going to have to serve millions of clients, I'd suggest investing in a good dedicated load balancer :) that can do round robin as well as the scheme you mentioned.
With all that said, don't reinvent the wheel :). There are some open-source Java servers; my favorite is https://netty.io/
I'm new to the whole UDP thing ('cause everyone loves TCP) and need to ask a few questions about Java's implementation.
I need somebody to tell me whether:
The DatagramPackets sent by Java are fragmented automatically due to network configurations and data size.
The DatagramPackets are rearranged to be in the correct frag sequential order by Java after being fragmented automatically due to network configurations and data size... before the receive() call returns the result.
If fragmented DatagramPackets that're incomplete are dropped or generate Exceptions when dropped. (Some fragments received, others lost)
I'm concerned that Java drops it silently, or data is not arranged correctly... which would mean that I have to implement a pseudo TCP kind of thing to have both the benefits of UDP, as well as the checking of TCP.
UDP is largely implemented in the OS and Java has very little say in the matter.
packets over 576 bytes long can be fragmented;
packets can be lost;
packets can arrive out of order.
There is no way for Java, or for you, to tell whether these have happened.
What you can do is implement a protocol to detect this, e.g. adding a sequence number, length and checksum to the start of each packet.
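For example, one possible (entirely home-grown) framing with a sequence number, length and CRC32 checksum might look like this; the field sizes and layout are arbitrary choices, not a standard format:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class Framing {
    // Header: 8-byte sequence number, 4-byte payload length, 8-byte CRC32 of the payload.
    static byte[] frame(long seq, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + 8 + payload.length);
        buf.putLong(seq).putInt(payload.length).putLong(crc.getValue()).put(payload);
        return buf.array();
    }

    // Returns the payload, or null if the datagram is damaged or truncated.
    static byte[] unframe(byte[] datagram, int datagramLength) {
        ByteBuffer buf = ByteBuffer.wrap(datagram, 0, datagramLength);
        if (buf.remaining() < 20) return null;
        long seq = buf.getLong();          // compare with the last seen value to detect loss/reordering
        int len = buf.getInt();
        long checksum = buf.getLong();
        if (len < 0 || len > buf.remaining()) return null;
        byte[] payload = new byte[len];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue() == checksum ? payload : null;
    }
}
```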
which would mean that I have to implement a pseudo TCP kind of thing to have both the benefits of UDP, as well as the checking of TCP.
And now you are starting to understand why "everyone loves TCP" or most people do. UDP has its uses but for most applications TCP is simplest.
The DatagramPackets sent by Java are fragmented automatically due to network configurations and data size.
Yes, but Java has nothing to do with it.
The DatagramPackets are rearranged to be in the correct frag sequential order by Java after being fragmented automatically due to network configurations and data size... before the receive() call returns the result.
Yes, but not by Java, and only if all the fragments arrive. This happens at the IP layer.
If fragmented DatagramPackets that're incomplete are dropped or generate Exceptions when dropped. (Some fragments received, others lost)
They are dropped silently. No exception. Again Java has nothing to do with it. It all happens at the IP layer. If and only if all the fragments arrive, the datagram is reassembled and passed up to the UDP layer. Java is not involved in any way.
I'm concerned that Java drops it silently
Java does nothing. IP drops it silently.
or data is not arranged correctly
A datagram is received either intact and entire or not at all. Again Java has nothing to do with it.
which would mean that I have to implement a pseudo TCP kind of thing to have both the benefits of UDP, as well as the checking of TCP.
Correct. You do.
Sorry for such a long post. I have done lots and lots of research on this topic and keep second-guessing myself on which path I should take. Thank you ahead of time for your thoughts and input.
Scenario:
Create a program in Java that sends packets of data every .2-.5 seconds containing simple "at that moment" or live data to approximately 5-7 clients. (Approximately only 10 bytes of data max per packet.) The server can be connected via Wi-Fi or Ethernet. The clients, however, are restricted to using Wi-Fi only. The client(s) will not be sending any packets to the server, as they will only display the data retrieved from the server.
My Thoughts:
I originally started out creating the server program using TCP as the transport layer. This would use the ServerSocket class within Java. I also made it multithreaded to accept multiple clients.
TCP sounded like a great idea for various reasons.
With TCP, the message will always get sent unless the connection fails.
TCP rearranges the order of packets sent.
TCP has flow control and requires more time to set up when started.
So, perfect! The flow control is crucial for this scenario (since the client wants the most current information), the setup time isn't that big of a deal, and I am pretty much guaranteed that the message will be received. So, it's official, I have made my decision to go with TCP....
Except, what happens when TCP gets behind due to packet loss? "The biggest problem with TCP in this scenario is its congestion control algorithm, which treats packet loss as a sign of bandwidth limitations and automatically throttles the sending of packets. On 3G or Wi-Fi networks, this can cause a significant latency."
After seeing the prominent phrase "significant latency", I realized that it would not be good for this scenario since the client needs to see the live data at that moment, not continue receiving data from .8 seconds ago. So, I went back to the drawing board.
After doing more research, I discovered another procedure that involved UDP with Multicasting. This uses a DatagramSocket on the server side and a MulticastSocket on the client side.
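Roughly, the code I have in mind looks like this (the group address and port are placeholders, and this is only a sketch):

```java
import java.net.*;

public class LiveDataSketch {
    // Server side: plain DatagramSocket sending to a multicast group every ~0.3 s.
    static void send(InetAddress group, int port) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            while (true) {
                byte[] data = "11:59:03".getBytes();   // ~10 bytes of "at that moment" data
                socket.send(new DatagramPacket(data, data.length, group, port));
                Thread.sleep(300);                     // .2-.5 second interval
            }
        }
    }

    // Client side: MulticastSocket joins the group and just displays what arrives.
    static void receive(InetAddress group, int port) throws Exception {
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(group);
            byte[] buf = new byte[64];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                System.out.println(new String(packet.getData(), 0, packet.getLength()));
            }
        }
    }
}
```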
UDP with Multicasting also has its advantages:
UDP is connectionless, meaning that the packets are broadcasted to anyone who is listening.
UDP is fast and is great for audio/video (although mine is simple data like a string).
UDP doesn't guarantee delivery - this can be good or bad. It will not get behind and create latency, but there is no guarantee that all client(s) might receive the data at that moment.
Sends one packet that gets passed along.
Since the rate of sending the packets will be rather quick (.2-.5 sec), I am not too concerned about losing packets (unless it is consistently dropping packets and looks to be unresponsive). The main concern I have with UDP above anything else is that it doesn't preserve the order in which the packets were sent. So, for example, let's say I wanted to send live data of the current time. The server sends packets which contain "11:59:03", "11:59:06", "11:59:08", etc. I would not want the data to be presented to the client as "11:59:08", "11:59:03", "11:59:06", etc.
After being presented with all of the information above, here are my questions:
Does TCP "catch up" with itself when having latency issues or does it always stay behind once the latency occurs when receiving packets? If so, is it rather quick to retrieving "live" data again?
How often do the packets of data get out of order with UDP?
And above all:
In your opinion, which one do you think would work best with this scenario?
Thanks for all of the help!
Does TCP "catch up" with itself when having latency issues or does it always stay behind once the latency occurs when receiving packets?
TCP backs its transmission speed off when it encounters congestion, and attempts to recover by increasing it.
If so, is it rather quick to retrieving "live" data again?
I don't know what that sentence means, but normally the full speed is eventually restored.
How often do the packets of data get out of order with UDP?
It depends entirely on the intervening network. There is no single answer to that.
NB That's a pretty ordinary link you're citing.
UDP stands for 'User Datagram Protocol', not 'Universal Datagram Protocol'.
UDP packets do have an inherent send order, but it isn't guaranteed on receive.
'UDP is faster because there is no error-checking for packets' is meaningless. 'Error-checking for packets' implies retransmission, but that only comes into play when there has been packet loss. Comparing lossy UDP speed to lossless TCP speed is meaningless.
'UDP does error checking' is inconsistent with 'UDP is faster because there is no error-checking for packets'.
'TCP requires three packets to set up a socket connection, before any user data can be sent' and 'Once the connection is established data transfer can begin' are incorrect. The client can transmit data along with the third handshake message.
The list of TCP header fields is incomplete and the order is incorrect.
For TCP, 'message is transmitted to segment boundaries' is meaningless, as there are no messages.
'The connection is terminated by closing of all established virtual circuits' is incorrect. There are no virtual circuits. It's a packet-switching network.
Under 'handshake', and several other places, he fails to mention the TCP four-way close protocol.
The sentences 'Unlike TCP, UDP is compatible with packet broadcasts (sending to all on local network) and multicasting (send to all subscribers)' and 'UDP is compatible with packet broadcast' are meaningless (and incidentally lifted from an earlier version of a Wikipedia article). The correct term here is not 'compatible' but 'supports'.
I'm not fond of this practice of citing arbitrary Internet rubbish here and then asking questions based on it.
UDP packets may get out of order if there are a lot of hops between the server and the client, but more likely than getting out of order is that some will get dropped (again, the more hops, the more chance of that). If your main concern with UDP is the order, and if you have control over the client, then you can simply have the client discard any messages with an earlier timestamp than the last one received.
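A sketch of that filtering idea, assuming (and this is only an assumption) that each packet starts with an 8-byte send timestamp:

```java
import java.nio.ByteBuffer;

public class LatestOnlyFilter {
    private long newestSeen = Long.MIN_VALUE;

    /** Returns true if this packet is newer than anything seen so far. */
    boolean acceptAndRecord(byte[] packetData, int length) {
        if (length < 8) return false;                       // malformed, ignore
        long sentAt = ByteBuffer.wrap(packetData, 0, length).getLong();
        if (sentAt <= newestSeen) return false;             // late or duplicate: drop it
        newestSeen = sentAt;
        return true;
    }
}
```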
I'm currently developing a simple P2P network as an exercise. Each node in the network sends heartbeats to a subset of the other nodes to be able to detect nodes that have left the network. Beside the heartbeat packets I send packets when new nodes join/leave the network, when they want to locate a resource (small text files), etc. All packets are UDP packets.
Whenever I receive a packet I start a new thread that handles that specific packet. I am, however, concerned about the number of threads I start during one application's lifetime, which adds up to quite a lot (especially because of the heartbeats). (There is also the risk of deadlocks and the like, which I would like to avoid.)
I thought about having a queue or something where I put all incoming packets and have a single thread handling all packets one at a time from that queue (something like the producer-consumer pattern). I would like the packets to be handled rapidly so the sender doesn't think the packet is lost.
What is the best way to handle a lot of different incoming packets without having to start a new thread for each of them? Should I go with what I have, the producer-consumer approach, or something different?
How long does it take your application to process one packet?
For the ping ones, it is probably faster to just process them as they are received. You can put the others in a shared data structure such as a blocking queue, so that when the queue is empty the worker threads wait for new jobs, and when a new job is added, a thread is woken up and does the job.
Starting one thread per packet probably makes you spend more time starting and stopping threads than actually doing the job.
If the work done in response to a packet isn't very time-consuming for any of the packet types, it might be that the extra time spent on the queue's locks and on scheduling threads makes your program slower rather than faster.
In any case, use a thread pool and start the workers at the beginning. If you want, you could increase or reduce the number of worker threads dynamically depending on the load over the past few minutes.
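A sketch of that arrangement (the queue capacity, pool size and port are arbitrary):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.concurrent.*;

public class PacketDispatcher {
    public static void main(String[] args) throws Exception {
        BlockingQueue<DatagramPacket> queue = new LinkedBlockingQueue<>(1000);
        ExecutorService workers = Executors.newFixedThreadPool(4);

        // Workers block on the queue and wake up only when a packet is available.
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        DatagramPacket packet = queue.take();
                        handle(packet);                     // your per-packet logic
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // Single receive loop: put the datagram on the queue and go straight back to receive().
        try (DatagramSocket socket = new DatagramSocket(9876)) {    // placeholder port
            while (true) {
                byte[] buf = new byte[1500];                // fresh buffer per datagram
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                queue.offer(packet);                        // drop the packet if the queue is full
            }
        }
    }

    private static void handle(DatagramPacket packet) {
        // heartbeat / join / leave / lookup handling goes here
    }
}
```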
I would use an event-driven architecture. Creating a new thread for every packet is not scalable: it will work up to a certain workload, but there is a point where it won't work anymore. You could compare that to, e.g., a chat program like Facebook chat, where the messages are the packets.
An event-driven architecture would be scalable and, IMHO, exactly what you're looking for. Just do some googling; there are libraries for many programming languages, so pick the right one for you (I like to do that in Erlang, Scala, C or Python).
Edit: OK, I didn't see the Java tag. But the language doesn't matter.
Take a look at this link for example:
http://www.nightmare.com/medusa/async_sockets.html
I find it a quite good one to get the idea of event driven programming.
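In Java, a rough sketch of the event-driven idea with plain NIO could look like this (the port and the dispatch-by-first-byte scheme are made up for illustration):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class EventLoop {
    public static void main(String[] args) throws Exception {
        DatagramChannel channel = DatagramChannel.open()
                .bind(new InetSocketAddress(9876));          // placeholder port
        channel.configureBlocking(false);

        Selector selector = Selector.open();
        channel.register(selector, SelectionKey.OP_READ);
        ByteBuffer buf = ByteBuffer.allocate(1500);

        // One thread, no thread-per-packet: the selector tells us when there is work to do.
        while (true) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isReadable()) {
                    buf.clear();
                    channel.receive(buf);
                    buf.flip();
                    dispatch(buf);
                }
            }
        }
    }

    private static void dispatch(ByteBuffer packet) {
        if (!packet.hasRemaining()) {
            return;
        }
        byte type = packet.get();          // first byte as a packet-type tag (illustrative only)
        if (type == 0) {
            // heartbeat handling
        } else if (type == 1) {
            // join/leave handling
        } else {
            // resource lookup, etc.
        }
    }
}
```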