How can I optimize simple socket communication? - java

Client sends a message. Server reads the message and writes a reply. Client reads the reply. Repeat. Each message is shorter than 500 bytes. Sockets are not closed.
I get around 800 request+response pairs per second between two desktop PCs on a LAN. Network activity on the hosts is barely noticeable.
If I don't call readReply (or do it in a separate thread), throughput explodes to 30,000 msg/sec or more! This also maxes out the network activity on the hosts.
My questions:
Is 800 msg/sec a reasonable number for a request/response protocol on a single socket?
How is it that removing the readReply call can increase performance that much?
What can be done to improve this, apart from using UDP? Is there another protocol that might be used?
Server:
while (true) {
    message = readMessage();
    writeReply("Thanks");
}
Client:
while (true) {
    writeMessage("A message");
    reply = readReply();
}
Notes:
I implemented this in both Java and PHP and got about the same results.
Ping latency is <1 ms

The basic problem is latency: the time it takes for a network frame/packet to reach its destination.
For instance, 1 ms of latency limits the rate to at most 1000 frames/second. A latency of 2 ms can handle 500 fps, 10 ms gives 100 fps, etc.
In this case, managing 1600 fps (800 requests plus 800 replies) is what you would expect with a one-way latency of about 0.5 ms, since each request/response round trip costs two one-way trips.
As for the jump when you skip readReply: I think this is because the client no longer waits for each reply, so it manages to pack more data into every frame. It will fill up the TCP send buffer on the client after a while, though.
Batch (pipeline) the messages if possible. Send 10 messages from the client in one batch and only then wait for the server's replies. The server should send all 10 replies in a single chunk as well. In theory this should make the throughput roughly 10x higher; a sketch of the idea follows below.
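As a rough illustration of the batching idea, here is a minimal client-side sketch, assuming a simple length-prefixed framing and a placeholder host/port (none of these names come from the original post):

import java.io.*;
import java.net.Socket;

public class PipelinedClient {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("server-host", 9000)) {   // host/port are placeholders
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(socket.getOutputStream()));
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(socket.getInputStream()));

            final int batchSize = 10;
            while (true) {
                // Write 10 length-prefixed messages without waiting for replies.
                for (int i = 0; i < batchSize; i++) {
                    byte[] msg = "A message".getBytes("UTF-8");
                    out.writeInt(msg.length);
                    out.write(msg);
                }
                out.flush();                      // one flush per batch, not per message

                // Now read the 10 replies in one go.
                for (int i = 0; i < batchSize; i++) {
                    int len = in.readInt();
                    byte[] reply = new byte[len];
                    in.readFully(reply);
                    // Do something with the reply...
                }
            }
        }
    }
}

The server side would mirror this: read a whole batch, then write all replies before flushing once.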

Related

CXF Webservice randomly waits 3 seconds for the Apache Tomcat response

We have a Java 8 application served by Apache Tomcat 8 behind an Apache server, which requests multiple webservices in parallel using CXF. From time to time, one of them takes exactly 3 seconds longer than the rest (which should take only about 500 ms).
I've activated CXF debug logging, and I now have the place inside CXF where the 3 seconds are lost:
14/03/2018 09:20:49.061 [pool-838-thread-1] DEBUG o.a.cxf.transport.http.HTTPConduit - No Trust Decider for Conduit '{http://ws.webapp.com/}QueryWSImplPort.http-conduit'. An affirmative Trust Decision is assumed.
14/03/2018 09:20:52.077 [pool-838-thread-1] DEBUG o.a.cxf.transport.http.HTTPConduit - Sending POST Message with Headers to http://172.16.56.10:5050/services/quertServices Conduit :{http://ws.webapp.com/}QueryWSImplPort.http-conduit
As you can see, there are three seconds between these two lines. When the request is OK, there are usually 0 ms between them.
I've been looking into the CXF code, but I have no clue about the reason for these 3 seconds...
The server application (which we also maintain) is served from another Apache Tomcat 6.0.49, which sits behind another Apache server. The thing is that the server's Apache seems to receive the request after the 3 seconds.
Could anyone help me?
EDIT:
We've monitored the packets sent and received on both servers, and it seems that the client's server sends its negotiation packet when it should, while the server replies 3 seconds later.
These are the packets we've found:
481153 11:31:32 14/03/2018 2429.8542795 tomcat6.exe SOLTESTV010 SOLTESTV002 TCP TCP:Flags=CE....S., SrcPort=65160, DstPort=5050, PayloadLen=0, Seq=2858646321, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192 {TCP:5513, IPv4:62}
481686 11:31:35 14/03/2018 2432.8608381 tomcat6.exe SOLTESTV002 SOLTESTV010 TCP TCP:Flags=...A..S., SrcPort=5050, DstPort=65160, PayloadLen=0, Seq=436586023, Ack=2858646322, Win=8192 ( Negotiated scale factor 0x8 ) = 2097152 {TCP:5513, IPv4:62}
481687 11:31:35 14/03/2018 2432.8613607 tomcat6.exe SOLTESTV010 SOLTESTV002 TCP TCP:Flags=...A...., SrcPort=65160, DstPort=5050, PayloadLen=0, Seq=2858646322, Ack=436586024, Win=256 (scale factor 0x8) = 65536 {TCP:5513, IPv4:62}
481688 11:31:35 14/03/2018 2432.8628380 tomcat6.exe SOLTESTV010 SOLTESTV002 HTTP HTTP:Request, POST /services/consultaServices {HTTP:5524, TCP:5513, IPv4:62}
So, it seems the server's Tomcat is the one that is blocked on something. Any clue?
EDIT 2:
Although that happened yesterday (the first server waiting 3 s for the ack from the second), it is not the most common scenario. What usually happens is what I described at the beginning: 3 seconds between the two CXF log lines, and the server receiving the request from the first one 3 seconds after it was sent.
There have been times when the server (the one which receives the request) hangs for 3 seconds. For instance:
Server 1 sends 5 requests at (supposedly) the same time to server 2.
Server 2 receives 4 of them in that same second and starts to process them.
Server 2 finishes processing 2 of those 4 requests in 30 ms and replies to server 1.
At more or less that same second, nothing is registered in the application logs.
After three seconds, log entries appear again and the server finishes processing the remaining 2 requests. So, although the processing itself takes only a few milliseconds, response_time - request_time is 3 seconds and a few ms.
At the same time, the remaining request (the last of the 5 that were sent) shows up in the network monitor and is processed by the application in just a few milliseconds. However, its overall processing time is still over 3 s, because it reached the server 3 seconds after being sent.
So there is something like a hang in the middle of the process. Two requests were processed and answered before the hang in a fraction of a second. Two other requests took a little longer, were caught by the hang, and ended up with a processing time of 3 seconds. The last one reached the server just when the hang happened, so it didn't get into the application until after the hang.
It sounds like a GC stop-the-world pause... but we have analyzed the GC logs and there's nothing wrong there... could there be any other reason?
Thanks!
EDIT 3:
Looking at TCP flags like the ones in the trace I pasted last week, we've noticed that there are lots of packets with the CE flag, which is a TCP congestion notification. We're not network experts, but we have read that this could lead to a 3-second delay before the packet is retransmitted...
Could anyone give us some help with that?
Thanks. Kind regards.
Finally, it was all caused by the network congestion we discovered by looking at the TCP flags. Our network admins have been looking at the problem, trying to reduce the congestion and lowering the retransmission timeout.
The thing is that it seems that the server's Apache receives the request after the 3 seconds.
How do you figure this out? If you're looking at Apache logs, you can be misled by wrong timestamps.
I first thought that your Tomcat 6 takes 3 seconds to answer instead of 0 to 500ms, but from the question and the comments, it is not the case.
Hypothesis 1 : Garbage Collector
The GC is known for introducing latency.
Highlight the GC activity in your logs by enabling the GC verbosity parameters. If it is too difficult to correlate, you can use the jstat command with the -gcutil option and easily compare it with Tomcat's log.
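If shell tools are not an option, a hedged Java alternative (my addition, not part of the original suggestion) is to poll the JVM's GarbageCollectorMXBeans and print cumulative GC counts and times, which can then be lined up against Tomcat's timestamps:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcLogger {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // Print the running totals for each collector once per second.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%tT %s: count=%d, time=%dms%n",
                        System.currentTimeMillis(), gc.getName(),
                        gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(1000);
        }
    }
}

A sudden jump of roughly 3000 ms in the collection time would point at the GC hypothesis.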
Hypothesis 2 : network timeout
Although 3 s is a very short time (compared with the 21 s default TCP timeout on Windows, for example), it could be a timeout.
To track timeouts, you can use the netstat command. With netstat -an, look for connections stuck in SYN_SENT, and with netstat -s look at the error counters. Also check whether this guilty webservice call depends on any network resource that must be resolved or accessed.

Sockets, message rate limiter Java

Imagine I have a server that can produce messages at a rate of 10,000 messages per second. But my client can only receive up to a maximum of 1000 messages per second.
System 1
If my system sends 1000 messages in the 1st millisecond and then does nothing for the remaining 999 ms.
System 2
My system sends 1 message per millisecond, so in 1000 ms (1 second) it will send 1000 messages.
Q1) Which system is better given that the client can handle a maximum of 500 messages per second?
Q2) What will be the impact of system 1 on the client? Will it overwhelm the client?
Thanks
Will it overwhelm the client: it depends on the size of your messages and the socket buffer size. The messages the sender writes are buffered. If the client cannot consume them because the buffer is full, the OutputStream the sender is using will block. When the client has consumed some messages, the sender can continue writing as its OutputStream becomes unblocked.
A typical socket buffer size on a Windows system used to be 8192 bytes, but the size can differ by OS and by settings in the OS.
So System 1 will not overwhelm the client; the sender will simply block at a certain moment.
Which approach is best depends entirely on the design of your application.
For example: I had a similar issue while writing to an Arduino via USB (not a socket client, but otherwise the same problem). In my case, buffered messages were a problem because they were positions from a face-tracking camera. Buffered positions were no longer relevant by the time the Arduino read them, but it still had to process them, because such a buffer is a queue and you can only get to the most recent value by reading out the old ones first. The Arduino could never keep up with the messages being produced, because by the time a new position reached the Arduino code it was already outdated. So that was an "overwhelm".
I resolved this with bi-directional communication. The Arduino would send a message to the producer saying READY (to receive a message). The producer would then send one (up-to-date) face-tracking position. The Arduino would reposition the camera and request a new message. This gives a kind of flow control that prevents the producer from flooding the Arduino; a sketch of the idea over sockets follows below.
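A minimal sketch of that READY-based flow control translated to a socket, assuming a hypothetical one-line text protocol (class and message names are purely illustrative, not from the original answer):

import java.io.*;
import java.net.Socket;

// Consumer side: ask for one message at a time, so the producer can never
// flood us with stale data.
public class ReadyConsumer {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("producer-host", 9000)) {   // placeholders
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "UTF-8"));
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(socket.getOutputStream(), "UTF-8"), true);

            while (true) {
                out.println("READY");            // tell the producer we can take one message
                String position = in.readLine(); // producer answers with the latest value
                if (position == null) break;     // producer closed the connection
                handle(position);                // e.g. reposition the camera
            }
        }
    }

    private static void handle(String position) {
        System.out.println("Got position: " + position);
    }
}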
Neither is better. TCP will alter the actual flow whatever you do yourself.
Neither will overwhelm the client. If the client isn't keeping up, its socket receive buffer will fill up, and so will your socket send buffer, and eventually you will block in send, or get EAGAIN/EWOULDBLOCK if you're in non-blocking mode.

Java SocketChannel write blocked while reading

I am trying to use SocketChannel.write and SocketChannel.read at the same time in two different threads (Android API Level 25).
I configured the SocketChannel as blocking mode.
For reading, I created an endless loop to read everything from server:
// Make socketChannel.read() return after 500 ms at most
socketChannel.socket().setSoTimeout(500);

while (!SHUTDOWN) {
    int read = socketChannel.read(buffer);
    if (read > 0) {
        // Do something nice
        break;
    }
}
And for writing, I write data every 10 seconds.
The problem is, I found that the writing operations are sometimes blocked while reading.
But if I make the reading thread sleep for a short period in each loop iteration, e.g. 100 ms, the problem no longer appears.
It looks like the reading thread is blocking the writing thread.
AFAIK, TCP connections are full duplex and can handle operations in both directions at the same time. Can anyone help explain this?
As explained in TCP Wikipedia - Flow Control:
TCP uses an end-to-end flow control protocol to avoid having the sender send data too fast for the TCP receiver to receive and process it reliably. Having a mechanism for flow control is essential in an environment where machines of diverse network speeds communicate. For example, if a PC sends data to a smartphone that is slowly processing received data, the smartphone must regulate the data flow so as not to be overwhelmed.
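To make the connection to the question concrete, here is a small self-contained sketch (my own illustration, not from the quoted text): one end keeps writing while the other end never reads, so once the receive and send buffers fill up, the blocking write() stops returning:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class FlowControlDemo {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        SocketChannel writer = SocketChannel.open(server.getLocalAddress());
        SocketChannel reader = server.accept();   // accepted end never reads anything

        ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
        long total = 0;
        while (true) {
            chunk.clear();
            total += writer.write(chunk);          // blocks once both socket buffers are full
            System.out.println("written so far: " + total + " bytes");
        }
    }
}

The output stops after a few hundred KB, which is the flow control described above: the writer is throttled by how fast the peer drains its receive buffer.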

Data exchange between records very slow

We have a huge record set on an AIX box that we send over the network to a Linux box for processing.
Each record is about 277 bytes in size.
The complete flow is:
i) Program A sends records to Java process B (both on the AIX box).
ii) Java process B on AIX sends the records to Java Program C on Linux. They communicate through Java sockets, where B is the client and C is the server.
iii) Program C processes each record and sends an ACK back to Program B.
iv) Program B sends an ACK back to Program A, which then sends the next record.
I think all these ACKs eat up the network and the overall process becomes very slow. For example, in the latest run it processed 330,000 records in 4 hours and then we got a socket reset and the client failed.
I was trying to find out what would be a better protocol in this case to generate less network traffic and finish faster. 330,000 records in 4 hours is really slow, since processing each record on Program C takes less than 5-10 seconds, but the overall flow is such that we are facing this slowness.
Thanks in advance,
-JJ
Waiting for the ACK to go all the way back to A before sending the next record will definitely slow you down, because C is essentially idle while this is happening. Why don't you move to a queuing architecture? Create a persistent queue on C that can receive the records from A (via B), and then have one (or many) processors for that queue sitting on C.
This way you decouple how fast A can send from how fast C can process. A's ACK becomes the fact that the message was delivered to the queue successfully. I would use HornetQ for this purpose.
EDIT
The HornetQ getting-started guide is here.
If you can't use this, for the simplest non-persistent in-memory queue, simply use a ThreadPoolExecutor from Java's concurrency libraries. You create a ThreadPoolExecutor like this:
new ThreadPoolExecutor(
    threadPoolSize, threadPoolSize, KEEP_ALIVE, MILLISECONDS,
    new LinkedBlockingQueue<Runnable>(queueSize),
    new ThreadPoolExecutor.DiscardOldestPolicy());
Where queueSize can be Integer.MAX_VALUE. You call execute() with a Runnable on the ThreadPoolExecutor to get tasks carried out. So your receiving code in C can simply drop these Runnables, created and parameterized with the record, onto the thread pool and then return the ACK to A (via B) immediately, as in the sketch below.
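A hedged sketch of that receive-and-ack pattern on C, assuming a hypothetical record callback and ACK helper (all names here are illustrative, not from the original answer):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RecordReceiver {
    private static final int THREADS = 8;              // illustrative sizing
    private static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
            THREADS, THREADS, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(100000),
            new ThreadPoolExecutor.DiscardOldestPolicy());

    // Called for every record read off the socket from B.
    public static void onRecord(final byte[] record) {
        POOL.execute(new Runnable() {
            @Override
            public void run() {
                process(record);                        // the slow per-record work
            }
        });
        sendAck();                                      // ACK immediately, before processing
    }

    private static void process(byte[] record) { /* parse and handle the record */ }
    private static void sendAck() { /* write the ACK back to B over the socket */ }
}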
If each record took 5 seconds and there are 330,000 records, this would take 1,650,000 seconds, which is about 19 days. Since you are processing 330,000 records in 4 hours, they are actually taking about 43 ms each.
One reason they might take 43 ms per request is if you are creating and closing a connection for each request. The time could be spent mostly on creating/closing connections rather than on real work. A simple way around this is to create the connection once and only reconnect if there is an error; a sketch follows below.
If you use a persistent connection, your overhead could drop below 100 microseconds per request.
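A hedged sketch of the connect-once, reconnect-on-error idea on the client side (host, port, and the length-prefixed framing are assumptions for illustration):

import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

public class PersistentSender {
    private final String host;
    private final int port;
    private Socket socket;
    private DataOutputStream out;

    public PersistentSender(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Reuse one connection for all records; reconnect only after a failure.
    public synchronized void sendRecord(byte[] record) throws IOException {
        try {
            ensureConnected();
            out.writeInt(record.length);   // simple length-prefixed framing (assumed)
            out.write(record);
            out.flush();
        } catch (IOException e) {
            closeQuietly();                // drop the broken connection...
            throw e;                       // ...and let the caller retry, which reconnects
        }
    }

    private void ensureConnected() throws IOException {
        if (socket == null || socket.isClosed()) {
            socket = new Socket(host, port);
            out = new DataOutputStream(socket.getOutputStream());
        }
    }

    private void closeQuietly() {
        try { if (socket != null) socket.close(); } catch (IOException ignored) { }
        socket = null;
        out = null;
    }
}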
Is there any reason you cannot send a batch of, say, 1000 records to process, which would return a single ACK and cut the overhead by a factor of 1000?

"Lost" UDP packets (JBoss + DatagramSocket)

I develop part of a JBoss+EJB based enterprise application. My module needs to process a huge number of incoming UDP packets. I've done some load testing, and it looks like when packets are sent at an 11 ms interval everything is fine, but at a 10 ms interval some packets are lost. That's rather strange in my opinion, but I've run the 10 ms / 11 ms comparison several times and the result is always the same (10 ms: some "lost" packets, 11 ms: everything fine).
If there were something wrong with synchronization, I'd expect it to show up in the 11 ms tests as well (at least one packet lost, or at least one wrong counter value).
So if it is not synchronization, then maybe the DatagramSocket through which I receive packets doesn't work as expected.
I found that the receive buffer size (SO_RCVBUF) defaults to 57344 bytes (probably dependent on the underlying OS network buffers). I suspect that when this buffer fills up, new incoming UDP datagrams are dropped. I tried setting the value higher, but I noticed that if I exaggerate, the buffer returns to its default size. If it depends on the underlying layer, how can I find out the maximum buffer size for a certain OS/network card from the JBoss level?
Is it possible that this is caused by the receive buffer size, or is 57344 big enough to handle most cases? Do you have any experience with such issues?
There is no timeout set on my DatagramSocket. My UDP datagrams contain about 70 bytes of data (not counting the datagram header).
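As an aside, a minimal sketch (my addition, not from the post) of how to request a larger SO_RCVBUF and read back the value the OS actually granted, which is how the silent fallback to the default described above can be detected:

import java.net.DatagramSocket;
import java.net.SocketException;

public class RcvBufCheck {
    public static void main(String[] args) throws SocketException {
        DatagramSocket socket = new DatagramSocket(9995);       // port is illustrative
        System.out.println("default SO_RCVBUF: " + socket.getReceiveBufferSize());

        socket.setReceiveBufferSize(4 * 1024 * 1024);           // ask for 4 MB
        // The OS may clamp the request (e.g. net.core.rmem_max on Linux),
        // so read back what was actually granted.
        System.out.println("granted SO_RCVBUF: " + socket.getReceiveBufferSize());
        socket.close();
    }
}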
[Edited]
I have to use UDP because I receive Cisco NetFlow data - a protocol used by network devices to send traffic statistics. Also, I have no influence on the format of the bytes sent (e.g. I cannot add packet counters and so on). It is not expected that all packets will be processed (some datagrams may be lost), but I'd expect to process most of them. During the 10 ms interval tests, about 30% of the packets were lost.
It is not very likely that slow processing causes this issue. Currently a singleton component holds a reference to the DatagramSocket and calls its receive method in a loop. When a packet is received, it is passed to a queue and processed by a stateless component picked from a pool. The "facade" singleton is only responsible for receiving packets and passing them on for processing (it does not wait for processing to complete).
Thanks in advance,
Piotr
UDP does not guarantee delivery, so you can tweak parameters, but you can't guarantee that the message will get delivered, especially in the case of very large data transfers.
If you need to guarantee delivery, you should use TCP instead.
If you need (or want) to use UDP, you can encode each packet with a sequence number and also send the total number of packets expected. For example, if you sent 10 large packets, you could include the information: packet 1/10, packet 2/10, etc. This way you can at least tell whether you have received all of the packets; if you have not, you can request that the missing packets be resent. A sketch of such a header is shown below.
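A hedged sketch of how such a sequence header might be prepended on the sending side (the 8-byte header layout is just an illustration; note the asker says above that the NetFlow format itself cannot be changed):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

public class SequencedSender {
    // Prepend "index / total" to each payload so the receiver can detect gaps.
    public static void sendAll(byte[][] payloads, InetAddress addr, int port) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            int total = payloads.length;
            for (int i = 0; i < total; i++) {
                ByteBuffer buf = ByteBuffer.allocate(8 + payloads[i].length);
                buf.putInt(i + 1);          // packet number, 1-based
                buf.putInt(total);          // total packets in this batch
                buf.put(payloads[i]);
                byte[] data = buf.array();
                socket.send(new DatagramPacket(data, data.length, addr, port));
            }
        }
    }
}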
UDP is inherently unreliable.
Datagrams can be thrown away at any point between sender and receiver, even within the receiver at a level below your code. Setting the receive buffer to a larger size is likely to help the networking code within your machine buffer more datagrams, but you should expect that some datagrams will be lost anyway.
If your receive logic takes too long (i.e. longer than it takes for a new datagram to arrive), then you'll always be behind and you'll eventually miss datagrams. All you can do is make sure that your receive code runs as fast as possible, perhaps by moving each inbound datagram to a queue and processing it "later" or on another thread, but then that just moves the problem to one where you have a queue that keeps growing; a minimal receive-and-queue loop is sketched below.
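A minimal sketch of that receive-and-queue idea (my illustration; the port and the processing hook are placeholders):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FastUdpReceiver {
    public static void main(String[] args) throws Exception {
        final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<byte[]>();
        DatagramSocket socket = new DatagramSocket(2055);    // port is illustrative
        byte[] buf = new byte[1500];

        // Consumer thread: does the slow work off the receive path.
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    try {
                        process(queue.take());
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }).start();

        // Receive loop: copy the datagram and hand it off as quickly as possible.
        while (true) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);
            queue.offer(Arrays.copyOf(packet.getData(), packet.getLength()));
        }
    }

    private static void process(byte[] datagram) { /* parse the record here */ }
}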
[Re your edit...] And what is processing your queue, and how does the locking work between the producer and the consumers? Change your code so that the receive logic simply increments a counter, discards the data and loops back around, and see if you're losing fewer datagrams. Either way, UDP is unreliable: you WILL have datagrams that are discarded, and you should just expect that and deal with it. Worrying about it means you're focusing on the wrong problem; make use of the data you DO get, assume that you won't get much of it, and then your program will work even if the network gets congested and MOST of your datagrams get discarded.
In summary, that's just how it is with UDP.
It appears from your tests that only up to two packets can be in the buffer, so if each packet is less than 28 KB this should be fine.
As you know, UDP is lossy, but you should be able to send more than one packet per 10 ms. I suggest you write a simple receiver which just listens for packets, to determine whether it's your application or something at the network/OS level. (I suspect the latter.)
I don't know Java, but... does the API allow you to invoke an asynchronous listen/receive for a datagram:
Use the O/S API to do a receive (passing your application-level buffer as a parameter)
(Wait while there's nothing to receive...)
(The O/S receives something from the network...)
The O/S puts the received packet into the buffer and completes/returns your API call
If that's true, then I suggest you issue several concurrent instances of the API call, so that there are several concurrent application-level buffers into which multiple packets can be received.
