How to parse broken messages received through TCP [closed]


I'm sending a string message from my client application to my server application using sockets. I'm using DataOutputStream to send from client and DataInputStream to receive the message in my server. I'm sending one string from the client but I noticed that when it gets to the server, it is sometimes broken into several messages. How do I handle this or what's the best way to handle this?
I can probably read each broken message received and check each character for a delimiter to know that it is the end of one message. But is there a better way to handle this?

There is no such thing in TCP as "message". It is a stream-oriented protocol. Of course, at lower levels it is transmitted in separate packets, but you have no way to control it and what you are seeing can be different from those packets. You just read as much as available in the receiving buffer at any particular moment. You may perceive your messages as broken down, but you may as well encounter a situation where several messages arrive as combined into one piece.
So when reading a message you should either use some sort of delimiter to figure out where your message ends, or use a header with the message length. If you are sending simple strings, encoding them as UTF-8 and terminating them with null bytes should work fine. For more complicated things you'll need a more complicated approach, obviously.
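The null-terminated variant mentioned above could be sketched roughly as follows. The class and method names are made up for illustration, and the sketch assumes the text itself never contains the NUL character (UTF-8 only produces a zero byte for U+0000).

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames each string as UTF-8 bytes followed by one 0 byte.
final class NullTerminatedStrings {

    // Assumes `text` contains no U+0000, otherwise the receiver would split it.
    static void writeMessage(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.write(0);   // terminator
        out.flush();
    }

    // Collects bytes until the terminator; returns null on a clean end of stream.
    static String readMessage(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        if (buf.size() > 0) {
            throw new EOFException("stream ended in the middle of a message");
        }
        return null;
    }
}
```

Because readMessage pulls one byte at a time, wrapping the socket's InputStream in a BufferedInputStream is probably worthwhile in practice.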

Have you thought about prefixing your messages with a header? It's very common to send a header with the length and checksum of the transmission. This way you're able to allocate appropriately sized buffers and verify data integrity.
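One possible shape for such a header, as a sketch only: a 4-byte length followed by an 8-byte CRC-32 value, then the payload. The layout, the choice of CRC-32, the size cap and all names are assumptions made for this example, not something the answer prescribes.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.util.zip.CRC32;

// Hypothetical frame layout: int length, long CRC-32 of the payload, payload bytes.
final class ChecksummedFrames {
    static final int MAX_LENGTH = 1 << 20; // arbitrary cap chosen for this sketch

    static void writeFrame(DataOutput out, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());
        out.write(payload);
    }

    static byte[] readFrame(DataInput in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_LENGTH) {
            throw new StreamCorruptedException("Invalid message length " + length);
        }
        long expected = in.readLong();
        byte[] payload = new byte[length];
        in.readFully(payload);              // blocks until the whole payload has arrived
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expected) {
            throw new StreamCorruptedException("Checksum mismatch");
        }
        return payload;
    }
}
```

TCP already checksums its segments, so an application-level checksum mainly guards against framing bugs and corruption outside the transport.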

A simple example of sending the length first. Note: I check the length when reading, as an invalid message length can result in an OutOfMemoryError, which can be confusing/alarming.
public static void writeBytes(DataOutput out, byte[] bytes) throws IOException {
    out.writeInt(bytes.length);
    out.write(bytes);
}
public static byte[] readBytes(DataInput in) throws IOException {
    int len = in.readInt();
    if (len < 0 || len > 1 << 24) throw new StreamCorruptedException("Invalid message length " + len);
    byte[] bytes = new byte[len];
    in.readFully(bytes);
    return bytes;
}

Related

Last few chars in a string sent over socket sometimes missing in Java network program

Right now, I'm trying to write a GUI based Java tic-tac-toe game that functions over a network connection. It essentially works at this point, however I have an intermittent error in which several chars sent over the network connection are lost during gameplay. One case looked like this, when println statements were added to message sends/reads:
Player 1:
Just sent ROW 14 COLUMN 11 GAMEOVER true
Player 2:
Just received ROW 14 COLUMN 11 GAMEOV
I'm pretty sure the error is happening when I read over the network. The read takes place in its own thread, with a BufferedReader wrapped around the socket's InputStream, and looks like this:
try {
    int input;
    while ((input = dataIn.read()) != -1) {
        char msgChar = (char) input;
        String message = msgChar + "";
        while (dataIn.ready()) {
            msgChar = (char) dataIn.read();
            message += msgChar;
        }
        System.out.println("Just received " + message);
        this.processMessage(message);
    }
    this.sock.close();
}
My sendMessage method is pretty simple (just a write over a DataOutputStream wrapped around the socket's OutputStream), so I don't think the problem is happening there:
try {
    dataOut.writeBytes(message);
    System.out.println("Just sent " + message);
}
Any thoughts would be highly appreciated. Thanks!

As it turns out, the ready() method guarantees only that the next read WON'T block when it returns true. Consequently, !ready() does not guarantee that the next read WILL block, only that it could.
I believe that the problem here had to do with how TCP delivers data. Being stream-oriented, TCP guarantees the order of the bytes but makes no guarantees as to how they are grouped. I suspect that the TCP stack was breaking up the sent string in a way that made sense to it, so that at some point every byte received so far had been consumed while the rest was still in transit; ready() then returned false in spite of the fact that more of the message was on its way.
I refactored the code to add a newline character to every message send, then simply performed a readLine() instead. This allowed my network protocol to be dependent on the newline character as a message delimiter, rather than the ready() method. I'm happy to say this fixed the problem.
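A minimal sketch of that newline-delimited approach is shown below. The class and method names are invented for the example, the charset is fixed to UTF-8 as an assumption, and messages are assumed never to contain a newline themselves.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a "one message per line" protocol.
final class LineProtocol {

    static BufferedWriter writerFor(OutputStream out) {
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Appends the delimiter and flushes so the message does not sit in a buffer.
    static void send(BufferedWriter writer, String message) throws IOException {
        writer.write(message);
        writer.write('\n');
        writer.flush();
    }

    // readLine() blocks until a full line is available, however TCP split the bytes.
    static List<String> receiveAll(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> messages = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {   // null means the peer closed the stream
            messages.add(line);
        }
        return messages;
    }
}
```

In a real game loop each line would be handed to processMessage as soon as readLine() returns it, rather than being collected into a list.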
Thanks for all your input!

Try flushing the OutputStream on the sender side. The last bytes might remain in some internal buffers.

It really matters which stream classes you use to handle the data. It seems to me that this trouble comes from the fact that you use DataOutputStream for sending but something else for receiving. Try to send and receive with DataOutputStream and DataInputStream respectively.
As a matter of fact, if you send something by calling dataOut.writeBoolean(b) but try to receive it by calling dataIn.readUTF(), you will not get your value back. DataInputStream and DataOutputStream are type-sensitive. Try to refactor your code with that in mind.
Moreover, some input streams return a single byte from read(). Here you convert that single byte into a char, while a Java char is two bytes wide:
msgChar = (char)dataIn.read();
Check whether this is a cause of the data loss.
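To illustrate the pairing of write and read calls, here is a small sketch in which a move is sent as two ints and a boolean and read back with the matching methods in the same order. The MoveCodec class and its field layout are hypothetical and are not the asker's actual protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical codec: every writeX on the sender has a matching readX on the receiver.
final class MoveCodec {

    static byte[] encode(int row, int column, boolean gameOver) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(row);          // 4 bytes
        out.writeInt(column);       // 4 bytes
        out.writeBoolean(gameOver); // 1 byte
        out.flush();
        return bytes.toByteArray();
    }

    static String decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int row = in.readInt();             // same types, same order as encode()
        int column = in.readInt();
        boolean gameOver = in.readBoolean();
        return "ROW " + row + " COLUMN " + column + " GAMEOVER " + gameOver;
    }
}
```

With a fixed layout like this the receiver always knows it needs exactly nine bytes, so the message boundary problem does not arise for this message type.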

BufferedReader, other Object to get a String

Is there another class in Java, like BufferedReader, for reading data received from a server?
The server sends a string without a newline, and the client doesn't print anything until the server closes the connection on timeout (the timeout message has a newline!). Only then does the client print all the received messages together with the timeout message sent by the server.
Please help, thanks!

Just don't read by newlines using readLine() method, but read char-by-char using read() method.
for (int c = 0; (c = reader.read()) > -1;) {
    System.out.print((char) c);
}

You asked for another class to use, so in that case give Scanner a try for this. It's usually used for delimiting input based on patterns or by the types inferred from the input (e.g. reading on a byte-by-byte basis or an int-by-int basis, or some combination thereof). However, you can use it as just a general purpose "reader" here as well, to cover your use case.
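As a rough sketch of the Scanner idea, the helper below splits incoming text on a caller-supplied delimiter pattern instead of on newlines. The class name and the choice of delimiter are assumptions; note that Scanner still has to wait for either the delimiter or the end of the stream before it can return a token.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: reads delimiter-separated tokens from a stream.
final class ScannerMessages {

    // delimiterRegex is a regular expression, e.g. ";" if the server ends messages with ';'.
    static List<String> readAll(InputStream in, String delimiterRegex) {
        List<String> messages = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, "UTF-8")) {   // closing the Scanner closes the stream
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {                      // blocks until a token or end of stream
                messages.add(scanner.next());
            }
        }
        return messages;
    }
}
```

If the server sends no delimiter at all, this has the same limitation as readLine(): the last token only becomes available once the connection is closed.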

When you read anything from a server, you have to strictly follow the communication protocol. For example the server might be an HTTP server or an SMTP server, it may encrypt the data before sending it, some of the data may be encoded differently, and so on.
So you should basically ask: What kind of server do I want to access? How does it send the bytes to me? And has someone else already done the work of interpreting the bytes so that I can quickly get to the data that I really want?
If it is an HTTP server, you can use the code new URL("http://example.org/").openStream(). Then you will get a stream of bytes. How you convert these bytes into characters, strings and other stuff is another task.

You could try
InputStream is = ... // from input
String text = IOUtils.toString(is);
This turns the input into text whether or not it contains newlines (the original newlines are preserved as well).

C sockets - randomly receiving more than one string at a time

I have encountered an interesting C sockets problem.
I am receiving incoming strings and noticed that I will, randomly, receive 3 strings at the same time for the first 2 ~ 4 strings.
For example, I am receiving the following incoming strings.
1~message~i love you\r\n
2~message~do you love me?\r\n
3~message~when are we going to meet again?\r\n
4~message~How about now?\r\n
5~message~Oh! I'm pregnant!\r\n
I added a counter to track the number of messages received and noticed that the counter sometimes does not count the first 3 strings. For example
1~message~i love you\r\n
->Line 1 received
2~message~do you love me?\r\n
3~message~when are we going to meet again?\r\n
4~message~How about now?\r\n
->Line 2 received
5~message~Oh! I'm pregnant!\r\n
->Line 3 received
The following is my code for printing the line number
int lineNo = 1;
while ((recvBytes = recv(clntSockfd, buffer, sizeof(buffer), 0)) > 0) {
    printf("%s", buffer);
    memset(&buffer, 0, sizeof(buffer));
    printf("Line %d received\n", lineNo++);
}
I'm not sure why this is happening, since this problem did not appear when I coded in Java NIO.
Any ideas, folks?

Assuming you are using TCP, relating recv() calls to "messages" (or "lines") in your case is flawed. TCP, conceptually, is a stream of bytes. The sending operating system is free to group multiple send() calls into a single IP packet, as is the receiving operating system free to report multiple incoming packets as a single recv() call (assuming the buffer is large enough). It may even choose to split an incoming packet across recv calls.
So you really need to put a message structure in the data itself, e.g. by scanning for line breaks in the data received.
That this didn't occur in Java was pure luck.
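The same idea can be sketched in Java (used here only to keep this page's examples in one language; the logic carries over to C directly): keep the bytes that do not yet form a complete line, and emit a line each time a line break is seen, regardless of how the bytes were split across reads. LineAssembler is a made-up name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns arbitrary chunks of a byte stream into complete lines.
final class LineAssembler {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    // Call once per read/recv with the bytes actually received; returns the lines completed by them.
    List<String> feed(byte[] chunk, int length) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            byte b = chunk[i];
            if (b == '\n') {
                byte[] raw = pending.toByteArray();
                pending.reset();
                int end = raw.length;
                if (end > 0 && raw[end - 1] == '\r') {
                    end--;                      // drop the CR of a CRLF pair
                }
                lines.add(new String(raw, 0, end, StandardCharsets.UTF_8));
            } else {
                pending.write(b);               // incomplete line, keep for the next call
            }
        }
        return lines;
    }
}
```

Counting the returned lines, rather than the number of reads, gives the line counter the question was after.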

You are not reading till end of line. The buffer can contain more than one line.

What connection type are you using?
UDP gives no guarantee that datagrams arrive, or that they arrive in order.
TCP is much better than UDP in terms of reliability: it delivers the bytes in order or reports an error.

Trying to packetize TCP with non-blocking IO is hard! Am I doing something wrong?

Oh how I wish TCP was packet-based like UDP is! [see comments] But alas, that's not the case, so I'm trying to implement my own packet layer. Here's the chain of events so far (ignoring writing packets)
Oh, and my Packets are very simply structured: two unsigned bytes for length, and then byte[length] data. (I can't imagine if they were any more complex, I'd be up to my ears in if statements!)
Server is in an infinite loop, accepting connections and adding them to a list of Connections.
PacketGatherer (another thread) uses a Selector to figure out which Connection.SocketChannels are ready for reading.
It loops over the results and tells each Connection to read().
Each Connection has a partial IncomingPacket and a list of Packets which have been fully read and are waiting to be processed.
On read():
Tell the partial IncomingPacket to read more data. (IncomingPacket.readData below)
If it's done reading (IncomingPacket.complete()), make a Packet from it and stick the Packet into the list waiting to be processed and then replace it with a new IncomingPacket.
There are a couple problems with this. First, only one packet is being read at a time. If the IncomingPacket needs only one more byte, then only one byte is read this pass. This can of course be fixed with a loop but it starts to get sorta complicated and I wonder if there is a better overall way.
Second, the logic in IncomingPacket is a little bit crazy, to be able to read the two bytes for the length and then read the actual data. Here is the code, boiled down for quick & easy reading:
int readBytes; // number of total bytes read so far
byte length1, length2; // each byte in an unsigned short int (see getLength())
public int getLength() { // will be inaccurate if readBytes < 2
    // mask each byte: Java bytes are signed, so values >= 0x80 would otherwise sign-extend
    return ((length1 & 0xFF) << 8) | (length2 & 0xFF);
}
public void readData(SocketChannel c) {
    if (readBytes < 2) { // we don't yet know the length of the actual data
        ByteBuffer lengthBuffer = ByteBuffer.allocate(2 - readBytes);
        numBytesRead = c.read(lengthBuffer);
        if (readBytes == 0) {
            if (numBytesRead >= 1)
                length1 = lengthBuffer.get();
            if (numBytesRead == 2)
                length2 = lengthBuffer.get();
        } else if (readBytes == 1) {
            if (numBytesRead == 1)
                length2 = lengthBuffer.get();
        }
        readBytes += numBytesRead;
    }
    if (readBytes >= 2) { // then we know we have the entire length variable
        // lazily-instantiate data buffers based on getLength()
        // read into data buffers, increment readBytes
        // (does not read more than the amount of this packet, so it does not
        // need to handle overflow into the next packet's data)
    }
}
public boolean complete() {
    return (readBytes > 2 && readBytes == getLength() + 2);
}
Basically I need feedback on my code and overall process. Please suggest any improvements. Even overhauling my entire system would be okay, if you have suggestions for how better to implement the whole thing. Book recommendations are welcome too; I love books. I just get the feeling that something isn't quite right.
Here's the general solution I came up with thanks to Juliano's answer: (feel free to comment if you have any questions)
public void fillWriteBuffer() {
    while (!writePackets.isEmpty() && writeBuf.remaining() >= writePackets.peek().size()) {
        Packet p = writePackets.poll();
        assert p != null;
        p.writeTo(writeBuf);
    }
}
public void fillReadPackets() {
    do {
        if (readBuf.position() < 1+2) {
            // haven't yet received the length
            break;
        }
        short packetLength = readBuf.getShort(1);
        // compare against position(), the number of bytes received so far;
        // limit() equals the capacity while the buffer is being filled
        if (readBuf.position() >= 1+2 + packetLength) {
            // we have a complete packet!
            readBuf.flip();
            byte packetType = readBuf.get();
            packetLength = readBuf.getShort();
            byte[] packetData = new byte[packetLength];
            readBuf.get(packetData);
            Packet p = new Packet(packetType, packetData);
            readPackets.add(p);
            readBuf.compact();
        } else {
            // not a complete packet
            break;
        }
    } while (true);
}

Probably this is not the answer you are looking for, but someone would have to say it: You are probably overengineering the solution for a very simple problem.
You do not have packets before they arrive completely, not even IncomingPackets. You have just a stream of bytes without defined meaning. The usual, simple solution is to keep the incoming data in a buffer (it can be a simple byte[] array, but a proper elastic and circular buffer is recommended if performance is an issue). After each read, you check the contents of the buffer to see if you can extract an entire packet from there. If you can, you construct your Packet, discard the correct number of bytes from the beginning of the buffer and repeat. If or when you cannot extract an entire packet, you keep those incoming bytes there until the next time you read something from the socket successfully.
While you are at it, if you are doing datagram-based communication over a stream channel, I would recommend you to include a magic number at the beginning of each "packet" so that you can test that both ends of the connection are still synchronized. They may get out of sync if for some reason (a bug) one of them reads or writes the wrong number of bytes to/from the stream.
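A compact sketch of this buffer-and-extract loop, including a magic number check, might look like the following. The frame layout (2-byte magic, 2-byte unsigned length, payload), the magic value and the class name are all assumptions for illustration; a fixed-size ByteBuffer stands in for the elastic buffer mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-connection buffer that yields complete payloads as they become available.
final class FrameBuffer {
    static final short MAGIC = (short) 0xCAFE; // made-up marker value
    private static final int HEADER = 4;       // 2 bytes magic + 2 bytes length

    private final ByteBuffer buf;              // stays in "fill" mode between calls

    FrameBuffer(int capacity) {
        buf = ByteBuffer.allocate(capacity);   // should hold at least one maximum-size frame
    }

    // Appends the bytes of one socket read and returns every payload now complete.
    List<byte[]> append(byte[] data, int length) {
        buf.put(data, 0, length);              // throws BufferOverflowException if capacity is exceeded
        buf.flip();                            // switch to reading what has accumulated
        List<byte[]> packets = new ArrayList<>();
        while (buf.remaining() >= HEADER) {
            buf.mark();
            if (buf.getShort() != MAGIC) {
                throw new IllegalStateException("stream out of sync");
            }
            int len = buf.getShort() & 0xFFFF; // treat the length as unsigned
            if (buf.remaining() < len) {
                buf.reset();                   // incomplete packet: rewind to its start
                break;
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            packets.add(payload);
        }
        buf.compact();                         // keep the unconsumed tail for the next read
        return packets;
    }
}
```

Whether to throw on a bad magic number or to resynchronise by scanning ahead is a design choice; closing the connection is usually the simpler option when both ends are under your control.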

Can't you just read whatever number of bytes that are ready to be read, and feed all incoming bytes into a packet parsing state machine? That would mean treating the incoming (TCP) data stream like any other incoming data stream (via serial line, or USB, a pipe, or whatever...)
So you would have some Selector determining from which connection(s) there are incoming bytes to be read, and how many. Then you would (for each connection) read the available bytes, and then feed those bytes into a (connection specific) state machine instance (the reading and feeding could be done from the same class, though). This packet parsing state machine class would then spit out finished packets from time to time, and hand those over to whoever will handle those complete and parsed packets.
For an example packet format like
2 magic header bytes to mark the start
2 bytes of payload size (n)
n bytes of payload data
2 bytes of checksum
the state machine would have states like (try an enum, Java has those now, I gather)
wait_for_magic_byte_0,
wait_for_magic_byte_1,
wait_for_length_byte_0,
wait_for_length_byte_1,
wait_for_payload_byte (with a payload_offset variable counting),
wait_for_chksum_byte_0,
wait_for_chksum_byte_1
and on each incoming byte you can switch the state accordingly. If the incoming byte does not properly advance the state machine, discard the byte by resetting the state machine to wait_for_magic_byte_0.
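A byte-at-a-time parser along these lines could be sketched as below. The magic byte values and the checksum algorithm (a 16-bit sum of the payload bytes) are assumptions chosen for the example, since the answer does not specify them, and the state names are shortened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for: 2 magic bytes, 2 length bytes, n payload bytes, 2 checksum bytes.
final class PacketStateMachine {
    enum State { MAGIC_0, MAGIC_1, LENGTH_0, LENGTH_1, PAYLOAD, CHECKSUM_0, CHECKSUM_1 }

    static final int MAGIC_BYTE_0 = 0xAB; // made-up values
    static final int MAGIC_BYTE_1 = 0xCD;

    private State state = State.MAGIC_0;
    private int length, offset, checksum;
    private byte[] payload;
    private final List<byte[]> completed = new ArrayList<>();

    // Assumed checksum: sum of the unsigned payload bytes, truncated to 16 bits.
    static int checksumOf(byte[] data) {
        int sum = 0;
        for (byte b : data) sum = (sum + (b & 0xFF)) & 0xFFFF;
        return sum;
    }

    void feed(byte raw) {
        int b = raw & 0xFF;
        switch (state) {
            case MAGIC_0:
                if (b == MAGIC_BYTE_0) state = State.MAGIC_1;
                break;
            case MAGIC_1:
                if (b == MAGIC_BYTE_1) state = State.LENGTH_0;
                else if (b != MAGIC_BYTE_0) state = State.MAGIC_0; // a repeated first byte may still start a header
                break;
            case LENGTH_0:
                length = b << 8;
                state = State.LENGTH_1;
                break;
            case LENGTH_1:
                length |= b;
                payload = new byte[length];
                offset = 0;
                state = (length == 0) ? State.CHECKSUM_0 : State.PAYLOAD;
                break;
            case PAYLOAD:
                payload[offset++] = raw;
                if (offset == length) state = State.CHECKSUM_0;
                break;
            case CHECKSUM_0:
                checksum = b << 8;
                state = State.CHECKSUM_1;
                break;
            case CHECKSUM_1:
                checksum |= b;
                if (checksum == checksumOf(payload)) completed.add(payload); // otherwise drop the packet
                state = State.MAGIC_0;
                break;
        }
    }

    // Hands over the packets finished so far and clears the internal list.
    List<byte[]> drainCompleted() {
        List<byte[]> out = new ArrayList<>(completed);
        completed.clear();
        return out;
    }
}
```

Each connection would own one instance and call feed for every byte it reads, then pass whatever drainCompleted returns to the packet handler.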

Ignoring client disconnects and server shutdown for now, here's a more or less traditional structure of a socket server:
Selector, handles sockets:
    polls open sockets
    if it's the server socket, create new Connection object
    for each active client socket find the Connection, call it with event (read or write)
Connection (one per socket), handles I/O on one socket:
    communicates with Protocol via two queues, input and output
    keeps two buffers, one for reading, one for writing, and respective offsets
    on read event: read all available input bytes, look for message boundaries, put whole messages onto Protocol input queue, call Protocol
    on write event: write the buffer, or if it's empty, take message from output queue into buffer, start writing it
Protocol (one per connection), handles application protocol exchange on one connection:
    take message from input queue, parse application portion of the message
    do the server work (here's where the state machine is - some messages are appropriate in one state, while not in the other), generate response message, put it onto output queue
That's it. Everything could be in a single thread. The key here is separation of responsibilities.
Hope this helps.

I think you're approaching the issue from a slightly wrong direction. Instead of thinking of packets, think of a data structure. That's what you're sending. Effectively, yes, it's an application layer packet, but just think of it as a data object. Then, at the lowest level, write a routine which will read off the wire, and output data objects. That will give you the abstraction layer I think you're looking for.

Java to/from C++ socket communication, DataInputStream and eof, binary, encryption

I'm trying to have Java server and C++ clients communicate over TCP under the following conditions: text mode, and binary/encrypted mode. My problem is over the eof indicator for end of stream that DataInputStream's read(byte []) uses to return with -1. If I send binary data, what's to prevent a random byte sequence happening to represent an eof and falsely indicating to read() that the stream is ending? It seems I'm limited to text mode. I can live with that until I need to scale, but then I have the problem that I am going to encrypt the text and add message authentication. Even if I were sending from another Java program rather than C++, encrypting a string with AES+MAC would produce binary output not a normal string. What's to prevent some encrypted sequence containing a part identical to an eof?
So, what are the solutions here?

If I send binary data, what's to prevent a random byte sequence happening to represent an eof and falsely indicating to read() that the stream is ending?
In most cases (including TCP/IP and similar network protocols) there is no specific data representation for an EOF. Rather, EOF is a logical abstraction that means that you have reached the end of the data stream. For example, with a Socket it means that the input side of the socket has been closed and you have read all outstanding bytes. (And for a file, it means that you have read the last bytes of the file.)
Since there is no data representation for the (logical) EOF, you don't need to worry about getting false EOFs. In short, there is no problem to be solved here.
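A tiny sketch can make this concrete: read() returns each data byte as an int in the range 0 to 255, so a 0xFF byte in the data comes back as 255 and never as -1. The EofDemo class is hypothetical, and the same holds for a socket's InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: records every value read() returns, including the final -1.
final class EofDemo {

    static List<Integer> readValues(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int v;
        do {
            v = in.read();      // 0..255 for data, -1 only once the stream has ended
            values.add(v);
        } while (v != -1);
        return values;
    }
}
```

Feeding it a ByteArrayInputStream over the bytes 0xFF, 0x00, 0xFF yields 255, 0, 255 and only then -1.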

"end of stream" in TCP is normally signaled by closing the socket -- that is what makes the stream actually end. If you don't really want the stream to end, but just to signal the end of a "packet" (to be followed, quite possibly, by other packets on the same connection), you can start each packet with an unencrypted length indicator (say, 2 or 4 bytes depending on your need). DataInputStream, according to its docs, is suitable only to receive streams sent by a DataOutputStream, which appears to have nothing to do with your use case as you describe it.

Usually when using TCP streams you have a data header format which at a minimum has a field holding the length of the data, so that the receiver knows exactly how many bytes to expect. A simple example is the TLV (type-length-value) format.
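A TLV record could be sketched as follows, assuming a 1-byte type and a 4-byte big-endian length; real TLV formats differ in field widths, and the Tlv class here is only an illustration. Both fields are easy to produce from C++ as long as the byte order matches.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StreamCorruptedException;

// Hypothetical TLV record: 1-byte type, 4-byte big-endian length, then the value bytes.
final class Tlv {
    final int type;
    final byte[] value;

    Tlv(int type, byte[] value) {
        this.type = type;
        this.value = value;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(type);          // only the low 8 bits are written
        out.writeInt(value.length);
        out.write(value);
    }

    // maxLength guards against allocating a huge array for a corrupt length field.
    static Tlv readFrom(DataInputStream in, int maxLength) throws IOException {
        int type = in.readUnsignedByte();
        int length = in.readInt();
        if (length < 0 || length > maxLength) {
            throw new StreamCorruptedException("Invalid value length " + length);
        }
        byte[] value = new byte[length];
        in.readFully(value);          // the value may be arbitrary binary or encrypted data
        return new Tlv(type, value);
    }
}
```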

As Thomas Pornin replied to Alex Martelli, DataInputStream is used even on data not sent by DataOutputStream or Java. My question is about the consequences of, as the documentation says, DataInputStream's read() returning when the stream ends. That is, is there some sequence of bytes that read() interprets as the end of the stream, so that I cannot use it if there's any possibility of that sequence occurring in the data I'm sending, as there would be if I send generic binary data?

My problem is over the eof indicator for end of stream that DataInputStream's read(byte []) uses to return with -1.
No it isn't. This problem is imaginary. -1 is the return code of InputStream.read() that indicates that the peer has closed the connection. It has nothing whatsoever to do with the data being sent over the connection.
