InputStream in.read() behaving differently than expected - java

I am trying to transfer a text file to another server using TCP and it is behaving differently than expected. The code sending the data is:
System.out.println("sending file name...");
String outputFileNameWithDelimiter = outputFileName + "\r\n"; //These 4 lines send the fileName with the delimiter
byte[] fileNameData = outputFileNameWithDelimiter.getBytes("US-ASCII");
outToCompression.write(fileNameData, 0, fileNameData.length);
outToCompression.flush();
System.out.println("sending content...");
System.out.println(new String(buffer, dataBegin, dataEnd-dataBegin));
outToCompression.write(buffer, dataBegin, dataEnd-dataBegin); //send the content
outToCompression.flush();
System.out.println("sending magic String...");
byte[] magicStringData = "--------MagicStringCSE283Miami".getBytes("US-ASCII"); //sends the magic string to tell Compression server the data being sent is done
outToCompression.write(magicStringData, 0, magicStringData.length);
outToCompression.flush();
Because this is TCP and you can't send discrete packets like in UDP, I expected all of the data to be in the input stream and I could just use delimiters to separate the file name, content, and ending string and then each in.read() would just give me the next subsequent amount of data.
Instead this is the data I am getting on each read:
On the first in.read() byteBuffer appears to only have "fileName\r\n".
On the second in.read() byteBuffer still has the same information.
On the third in.read() byteBuffer now holds the content I sent.
On the fourth in.read() byteBuffer holds the content I sent minus a few letters.
On the fifth in.read() I get the magicString + part of the message.
I am flushing on every send from the Webserver, but input streams don't seem to implement flushable.
Can anyone explain why this is happening?
EDIT:
This is how I am reading things in. Basically this runs in a loop, and the result is then written to a file.
in.read(byteBuffer, 0, BUFSIZE);

If your expectation is that read() will fill the buffer, or receive exactly what was sent by a single write() by the peer, it is your expectation that is at fault here, not read(). It isn't specified to transfer more than one byte at a time, and there is no guarantee about preserving write boundaries.
It is quite impossible to write correct code without storing the result of read() into a variable.

When you read from an InputStream, you're giving it a byte array to write into (and optionally an offset and a maximum amount to read). InputStream makes no guarantees that the array will be filled with fresh data. The return value is the number of bytes that was actually read into it.
What's happening in your example is this:
The first TCP packet comes in with "fileName\r\n", gets written into your buffer, everything fine so far.
You call read() again, but the next packet hasn't arrived yet. read() will have returned 0, because it didn't want to block until data arrived. So the buffer still contains "fileName\r\n". Edit: as pointed out, read() always blocks until it has read at least one byte (unless the requested length is zero), so it cannot have returned 0 here. I don't really know why the buffer didn't change then.
On the third read, the content has arrived.
The first bit of the content gets overwritten with the second part of the message, the last bit still contains part of the old message (I think that's what you meant).
etc., you get the idea
You need to check the return value, wait for the data to arrive, and only use as much of the buffer as was written by the last read().
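The advice above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the class name, the trickle() helper (a stand-in for a socket that delivers only a few bytes per read) and the sample file name are all hypothetical. The point is that only the first n bytes of the buffer are valid after each read().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLoopSketch {
    static final int BUFSIZE = 1024;

    // Collects everything the peer sends until end of stream, using only
    // the bytes that each read() call actually delivered.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] byteBuffer = new byte[BUFSIZE];
        int n;
        while ((n = in.read(byteBuffer, 0, BUFSIZE)) != -1) {
            received.write(byteBuffer, 0, n); // only the first n bytes are fresh
        }
        return received.toByteArray();
    }

    // Hypothetical stand-in for a socket stream: at most 3 bytes per read().
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        String magic = "--------MagicStringCSE283Miami";
        byte[] sent = ("out.txt\r\nhello world" + magic).getBytes(StandardCharsets.US_ASCII);
        String all = new String(readAll(trickle(sent)), StandardCharsets.US_ASCII);

        // Split on the delimiters only after the data has been accumulated.
        int nameEnd = all.indexOf("\r\n");
        int contentEnd = all.lastIndexOf(magic);
        System.out.println("name:    " + all.substring(0, nameEnd));
        System.out.println("content: " + all.substring(nameEnd + 2, contentEnd));
    }
}
```

On a connection that stays open, the loop would presumably stop once the magic string has been seen rather than waiting for end of stream.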

Related

Last few chars in a string sent over socket sometimes missing in Java network program

Right now, I'm trying to write a GUI based Java tic-tac-toe game that functions over a network connection. It essentially works at this point, however I have an intermittent error in which several chars sent over the network connection are lost during gameplay. One case looked like this, when println statements were added to message sends/reads:
Player 1:
Just sent ROW 14 COLUMN 11 GAMEOVER true
Player 2:
Just received ROW 14 COLUMN 11 GAMEOV
I'm pretty sure the error is happening when I read over the network. The read takes place in its own thread, with a BufferedReader wrapped around the socket's InputStream, and looks like this:
try {
    int input;
    while((input = dataIn.read()) != -1 ){
        char msgChar = (char)input;
        String message = msgChar + "";
        while(dataIn.ready()){
            msgChar = (char)dataIn.read();
            message+= msgChar;
        }
        System.out.println("Just received " + message);
        this.processMessage(message);
    }
    this.sock.close();
}
My sendMessage method is pretty simple, (just a write over a DataOutputStream wrapped around the socket's outputstream) so I don't think the problem is happening there:
try {
dataOut.writeBytes(message);
System.out.println("Just sent " + message);
}
Any thoughts would be highly appreciated. Thanks!
As it turns out, the ready() method guarantees only that the next read WON'T block when it returns true. Consequently, !ready() does not guarantee that the next read WILL block, just that it could.
I believe the problem had to do with the TCP stack itself. TCP is stream-oriented: it preserves the order of the bytes written to the socket, but makes no guarantees about how they are grouped in transit. I suspect the TCP stack broke up the sent string in a way that made sense to it, so that ready() returned false after the first part had been read, even though the rest of the message was still on its way.
I refactored the code to add a newline character to every message send, then simply performed a readLine() instead. This allowed my network protocol to be dependent on the newline character as a message delimiter, rather than the ready() method. I'm happy to say this fixed the problem.
Thanks for all your input!
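A minimal sketch of the newline-delimited approach described in this answer. The class and method names are hypothetical, and a StringWriter/StringReader pair stands in for the socket streams so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class LineProtocolSketch {
    // Sender side: one message per line, flushed so it leaves the buffer.
    static void sendMessage(BufferedWriter out, String message) throws IOException {
        out.write(message);
        out.write('\n');
        out.flush();
    }

    // Receiver side: readLine() waits for a complete line (or end of stream),
    // so a pause in delivery cannot cut a message short.
    static List<String> receiveAll(BufferedReader in) throws IOException {
        List<String> messages = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            messages.add(line);
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        StringWriter wire = new StringWriter(); // stand-in for the socket
        BufferedWriter out = new BufferedWriter(wire);
        sendMessage(out, "ROW 14 COLUMN 11 GAMEOVER true");
        sendMessage(out, "ROW 2 COLUMN 3 GAMEOVER false");

        BufferedReader in = new BufferedReader(new StringReader(wire.toString()));
        for (String m : receiveAll(in)) {
            System.out.println("Just received " + m);
        }
    }
}
```

This only works if a message can never contain a newline itself; otherwise a length prefix would be the safer delimiter.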
Try flushing the OutputStream on the sender side. The last bytes might remain in some internal buffers.
The types of stream objects you use to handle the data really matter. It seems to me that this trouble is caused by the fact that you use DataOutputStream for sending data, but something else for receiving it. Try to send and receive data with DataOutputStream and DataInputStream respectively.
As a matter of fact, if you send something by calling dataOut.writeBoolean(b) but try to receive it by calling dataIn.readUTF(), you will not get back what you sent. DataInputStream and DataOutputStream are type-sensitive. Try to refactor your code with that in mind.
Moreover, some input streams return a single byte on each invocation of read(). Here you try to convert this single byte into a char, while in Java a char consists of two bytes.
msgChar = (char)dataIn.read();
Check whether this is the reason for the data loss.

Java: Is copying a ByteBuffer to a byte[] array a performance no-no?

Basically, my situation is this:
Server streams data from the client connection to a ByteBuffer object called inQueue. This contains whatever the most recent stream of data is
Server must process the data in each of these streams and expect a packet of data in a specific format
The payload of data is to be read into a byte[] object then processed separately
Now my question boils down to this: is copying the remaining buffer data (the payload) to a byte[] array bad for performance?
Here's what it would look like:
// pretend we're reading the packet ID and length
// int len = LENGTH OF PACKET PAYLOAD
/*
* Mark the starting position of the packet's payload.
*/
int pos = inQueue.position();
byte[] payload = new byte[len];
inQueue.get(payload);
// Process the packet's payload here
/*
* Set the inQueue buffer to the length added to the previous position
* so as to move onto the next packet to process.
*/
inQueue.position(pos + len);
As you can see, I'm essentially doing this:
Mark the position of the complete buffer as it were just before the payload
Copy the contents of inQueue as far as the payload goes to a separate byte[] object
Set the complete buffer's position to after the payload we just read so we can read more packets
My concern is that, in doing this, I'm wasting memory by copying the buffer. Keep in mind the packets used will never exceed 500 bytes and are often under 100 bytes.
Is my concern valid, or am I being performance-paranoid? :p
You should avoid it. That's the whole reason for the ByteBuffer design: to avoid data copies.
What exactly do you mean by 'process payload here'?
With a little rearrangement of whatever happens in there, you should be able to do that directly in the ByteBuffer, calling flip() first, one or more get()s to get the data you require, and compact() afterwards (clear() if you're sure it's empty), without an intermediate copy step into yet another byte[] array.
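As a rough sketch of that flip()/get()/compact() pattern: the packet layout below (1-byte ID, 2-byte payload length, payload made of 4-byte ints) is an assumption made for illustration, since the question does not give the real format, and the class and method names are hypothetical. The payload is consumed directly from the buffer with no intermediate byte[].

```java
import java.nio.ByteBuffer;

public class InPlaceParseSketch {
    // Assumed header: 1-byte packet ID followed by a 2-byte payload length.
    static final int HEADER = 3;

    // inQueue is in "fill" mode on entry and on exit. Returns the sum of the
    // ints found in the payloads of all complete packets currently buffered.
    static long drain(ByteBuffer inQueue) {
        long total = 0;
        inQueue.flip(); // switch from filling to reading
        while (inQueue.remaining() >= HEADER) {
            inQueue.mark();
            inQueue.get(); // packet ID (unused in this sketch)
            int len = inQueue.getShort() & 0xFFFF;
            if (inQueue.remaining() < len) {
                inQueue.reset(); // incomplete packet: leave it for the next call
                break;
            }
            int end = inQueue.position() + len;
            while (inQueue.position() + 4 <= end) {
                total += inQueue.getInt(); // read straight from the buffer
            }
            inQueue.position(end); // skip any trailing bytes of this payload
        }
        inQueue.compact(); // keep the unread remainder, back to fill mode
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer inQueue = ByteBuffer.allocate(64);
        // One complete packet (two ints) followed by half of a second packet.
        inQueue.put((byte) 1).putShort((short) 8).putInt(10).putInt(20);
        inQueue.put((byte) 2).putShort((short) 4).put((byte) 0).put((byte) 0);
        System.out.println("first pass:  " + drain(inQueue));

        // The rest of the second packet arrives later.
        inQueue.put((byte) 0).put((byte) 5);
        System.out.println("second pass: " + drain(inQueue));
    }
}
```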
Not only is this unnecessary but, to answer your question, no you won't notice a performance change even when scaling up.

Why does Java read random amounts from a socket but not the whole message?

I am working on a project and have a question about Java sockets. The source file can be found here.
After successfully transmitting the file size in plain text I need to transfer binary data. (DVD .Vob files)
I have a loop such as
// Read this files size
long fileSize = Integer.parseInt(in.readLine());
// Read the block size they are going to use
int blockSize = Integer.parseInt(in.readLine());
byte[] buffer = new byte[blockSize];
// Bytes "red"
long bytesRead = 0;
int read = 0;
while(bytesRead < fileSize){
System.out.println("received " + bytesRead + " bytes" + " of " + fileSize + " bytes in file " + fileName);
read = socket.getInputStream().read(buffer);
if(read < 0){
// Should never get here since we know how many bytes there are
System.out.println("DANGER WILL ROBINSON");
break;
}
binWriter.write(buffer,0,read);
bytesRead += read;
}
I read a random number of bytes, close to 99% of the file. I am using Socket, which is TCP based,
so I shouldn't have to worry about lower layer transmission errors.
The received number changes but is always very near the end
received 7258144 bytes of 7266304 bytes in file GLADIATOR/VIDEO_TS/VTS_07_1.VOB
The app then hangs there in a blocking read. I am confounded. The server is sending the correct
file size and has a successful implementation in Ruby but I can't get the Java version to work.
Why would I read fewer bytes than are sent over a TCP socket?
The above is because of a bug many of you pointed out below.
BufferedReader ate 8 KB of my socket's input. The correct implementation can be found here.
If your in is a BufferedReader then you've run into the common problem with buffering more than needed. The default buffer size of BufferedReader is 8192 characters which is approximately the difference between what you expected and what you got. So the data you are missing is inside BufferedReader's internal buffer, converted to characters (I wonder why it didn't break with some kind of conversion error).
The only workaround is to read the first lines byte-by-byte without using any buffered reader classes. Java doesn't provide an unbuffered InputStreamReader with readLine() capability as far as I know (with the exception of the deprecated DataInputStream.readLine(), as indicated in the comments below), so you have to do it yourself. I would do it by reading single bytes, putting them into a ByteArrayOutputStream until I encounter an EOL, then converting the resulting byte array into a String using the String constructor with the appropriate encoding.
Note that while you can't use a BufferedReader, nothing stops you from using a BufferedInputStream from the very beginning, which will make byte-by-byte reads more efficient.
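A minimal sketch of that do-it-yourself readLine(), assuming the header lines are plain ASCII. The class name and the sample data are hypothetical, and a ByteArrayInputStream stands in for the socket. The same BufferedInputStream is used for both the text header and the binary body, so nothing is lost in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteLineReaderSketch {
    // Reads bytes up to '\n' and drops an optional trailing '\r'.
    // Returns null if the stream ends before any byte was read.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Simulated wire data: file size, block size, then 8 binary bytes.
        byte[] header = "8\n4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] wire = new byte[header.length + 8];
        System.arraycopy(header, 0, wire, 0, header.length);

        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));
        long fileSize = Long.parseLong(readLine(in));
        int blockSize = Integer.parseInt(readLine(in));

        byte[] buffer = new byte[blockSize];
        long bytesRead = 0;
        while (bytesRead < fileSize) {
            int want = (int) Math.min(buffer.length, fileSize - bytesRead);
            int read = in.read(buffer, 0, want); // same stream as the header
            if (read < 0) {
                throw new EOFException("stream ended after " + bytesRead + " bytes");
            }
            bytesRead += read;
        }
        System.out.println("received " + bytesRead + " of " + fileSize + " bytes");
    }
}
```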
Update
In fact, I am doing something like this right now, only a bit more complicated. It is an application protocol that involves exchanging some data structures that are nicely represented in XML, but they sometimes have binary data attached to them. We implemented this by having two attributes in the root XML: fragmentLength and isLastFragment. The first one indicates how many bytes of binary data follow the XML part, and isLastFragment is a boolean attribute indicating the last fragment so the reading side knows that there will be no more binary data. XML is null-terminated so we don't have to deal with readLine(). The code for reading looks like this:
InputStream ins = new BufferedInputStream(socket.getInputStream());
while (!finished) {
    // read the null-terminated XML part one byte at a time
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int b;
    while ((b = ins.read()) > 0) {
        buf.write(b);
    }
    if (b == -1)
        throw new EOFException("EOF while reading from socket");
    // b == 0
    Document xml = readXML(new ByteArrayInputStream(buf.toByteArray()));
    processAnswers(xml);
    Element root = xml.getDocumentElement();
    if (root.hasAttribute("fragmentLength")) {
        int length = DatatypeConverter.parseInt(
                root.getAttribute("fragmentLength"));
        boolean last = DatatypeConverter.parseBoolean(
                root.getAttribute("isLastFragment"));
        int read = 0;
        while (read < length) {
            // split incoming fragment into 4Kb blocks so we don't run
            // out of memory if the client sent a really large fragment
            int l = Math.min(length - read, 4096);
            byte[] fragment = new byte[l];
            int pos = 0;
            while (pos < l) {
                int c = ins.read(fragment, pos, l - pos);
                if (c == -1)
                    throw new EOFException(
                            "Premature EOF while reading fragment");
                pos += c;
                read += c;
            }
            // process fragment
        }
    }
}
Using null-terminated XML for this turned out to be a really great thing as we can add additional attributes and elements without changing the transport protocol. At the transport level we also don't have to worry about handling UTF-8 because XML parser will do it for us. In your case you're probably fine with those two lines, but if you need to add more metadata later you may wish to consider null-terminated XML too.
Here is your problem. In the first few lines of the program you're using in.readLine(), where in is probably some sort of BufferedReader. BufferedReaders will read data off the socket in 8K chunks. So when you did the first readLine() it read the first 8K into the buffer. The first 8K contains your two numbers followed by newlines, then some portion of the head of the VOB file (that's the missing chunk). Now when you switch to using the getInputStream() off the socket, you are 8K into the transmission while assuming you're starting at zero.
socket.getInputStream().read(buffer); // you can't do this without losing data.
While the BufferedReader is nice for reading character data, switching between binary and character data in a stream is not possible with it. You'll have to switch to using InputStream instead of Reader and convert the first few portions by hand to character data. If you read the file using a buffered byte array you can read the first chunk, look for your newlines and convert everything to the left of that to character data. Then write everything to the right to your file, then start reading the rest of the file.
This used to be easier with DataInputStream, but it doesn't do a good job handling character conversion for you (readLine is deprecated with BufferedReader being the only replacement - doh). Probably should write a DataInputStream replacement that under the covers uses Charset to properly handle string conversion. Then switching between characters and binary would be easier.
Your basic problem is that BufferedReader will read as much data as is available and place it in its buffer. It will give you the data as you ask for it. This is the whole point of buffering, i.e. to reduce the number of calls to the OS. The only safe way to use a buffered input is to use the same buffer over the life of the connection.
In your case, you only use the buffer to read two lines, however it is highly likely that 8192 bytes has been read into the buffer. (The default size of the buffer) Say the first two lines consist of 32 bytes, this leaves 8160 waiting for you to read, however you by-pass the buffer to perform the read() on the socket directly leading to 8160 bytes left in the buffer you end up discarding. (the amount you are missing)
BTW: You should be able to see this in a debugger if you inspect the contents of your buffered reader.
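The read-ahead can also be made visible without a debugger. The following is a small demonstration under stated assumptions: a ByteArrayInputStream stands in for the socket, the class and method names are hypothetical, and the 100 trailing bytes play the role of the binary file data.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadAheadDemo {
    // Wraps the raw stream in a BufferedReader, reads only the first line,
    // and reports how many bytes are still available on the raw stream.
    static int rawBytesLeftAfterReadLine(byte[] wire) throws IOException {
        InputStream raw = new ByteArrayInputStream(wire);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        in.readLine(); // asks for one line, but the reader fills its whole buffer
        return raw.available();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = "32\n".getBytes(StandardCharsets.US_ASCII);
        byte[] wire = new byte[header.length + 100]; // header + 100 "binary" bytes
        System.arraycopy(header, 0, wire, 0, header.length);

        // The 100 bytes after the header were pulled into the reader's buffer,
        // so a direct read on the raw stream can no longer see them.
        System.out.println("bytes left on the raw stream: "
                + rawBytesLeftAfterReadLine(wire));
    }
}
```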
Sergei may have been right about data being lost inside the buffer, but I'm not sure about his explanation. (BufferedReaders don't usually hold onto data inside their buffers. He may be thinking of a problem with BufferedWriters, which can lose data if the underlying stream is shut down prematurely.) [Never mind; I had misread Sergei's answer. The rest of this is valid AFAIK.]
I think you have a problem that's specific to your application. In your client code, you start reading as follows:
public static void recv(Socket socket){
try {
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
//...
int numFiles = Integer.parseInt(in.readLine());
... and you proceed to use in for the start of the exchange. But then you switch to using the raw socket stream:
while(bytesRead < fileSize){
read = socket.getInputStream().read(buffer);
Because in is a BufferedReader, it's already going to have filled its buffer with up to 8192 bytes from the socket input stream. Any bytes that are in that buffer, and which you don't read from in, will be lost. Your app is hanging because it believes that the server is holding onto some bytes, but the server doesn't have them.
The solution is not to do byte-by-byte reads from the socket (ouch! your poor CPU!), but to use the BufferedReader consistently. Or, to use buffering with binary data, change the BufferedReader to a BufferedInputStream that wraps the socket's InputStream.
By the way, TCP is not as reliable as many people assume it to be. For example, when the server socket closes, it's possible for it to have written data into the socket which then gets lost as the socket connection is shutdown. Calling Socket.setSoLinger can help to prevent this problem.
EDIT: Also BTW, you're playing with fire by treating byte and character data as if they're interchangeable, as you do below. If the data really is binary, then the conversion to String risks corrupting the data. Perhaps you want to be writing into a BufferedOutputStream?
// Java is retarded and reading and writing operate with
// fundamentally different types. So we write a String of
// binary data.
fileWriter.write(new String(buffer));
bytesRead += read;
EDIT 2: Clarified (or attempted to clarify :-} the handling of binary vs. String data.

Trying to packetize TCP with non-blocking IO is hard! Am I doing something wrong?

Oh how I wish TCP was packet-based like UDP is! [see comments] But alas, that's not the case, so I'm trying to implement my own packet layer. Here's the chain of events so far (ignoring writing packets)
Oh, and my Packets are very simply structured: two unsigned bytes for length, and then byte[length] data. (I can't imagine if they were any more complex, I'd be up to my ears in if statements!)
Server is in an infinite loop, accepting connections and adding them to a list of Connections.
PacketGatherer (another thread) uses a Selector to figure out which Connection.SocketChannels are ready for reading.
It loops over the results and tells each Connection to read().
Each Connection has a partial IncomingPacket and a list of Packets which have been fully read and are waiting to be processed.
On read():
Tell the partial IncomingPacket to read more data. (IncomingPacket.readData below)
If it's done reading (IncomingPacket.complete()), make a Packet from it and stick the Packet into the list waiting to be processed and then replace it with a new IncomingPacket.
There are a couple problems with this. First, only one packet is being read at a time. If the IncomingPacket needs only one more byte, then only one byte is read this pass. This can of course be fixed with a loop but it starts to get sorta complicated and I wonder if there is a better overall way.
Second, the logic in IncomingPacket is a little bit crazy, to be able to read the two bytes for the length and then read the actual data. Here is the code, boiled down for quick & easy reading:
int readBytes; // number of total bytes read so far
byte length1, length2; // each byte in an unsigned short int (see getLength())

public int getLength() { // will be inaccurate if readBytes < 2
    return (int)(length1 << 8 | length2);
}

public void readData(SocketChannel c) {
    if (readBytes < 2) { // we don't yet know the length of the actual data
        ByteBuffer lengthBuffer = ByteBuffer.allocate(2 - readBytes);
        numBytesRead = c.read(lengthBuffer);
        if(readBytes == 0) {
            if(numBytesRead >= 1)
                length1 = lengthBuffer.get();
            if(numBytesRead == 2)
                length2 = lengthBuffer.get();
        } else if(readBytes == 1) {
            if(numBytesRead == 1)
                length2 = lengthBuffer.get();
        }
        readBytes += numBytesRead;
    }
    if(readBytes >= 2) { // then we know we have the entire length variable
        // lazily-instantiate data buffers based on getLength()
        // read into data buffers, increment readBytes
        // (does not read more than the amount of this packet, so it does not
        // need to handle overflow into the next packet's data)
    }
}

public boolean complete() {
    return (readBytes > 2 && readBytes == getLength()+2);
}
Basically I need feedback on my code and overall process. Please suggest any improvements. Even overhauling my entire system would be okay, if you have suggestions for how better to implement the whole thing. Book recommendations are welcome too; I love books. I just get the feeling that something isn't quite right.
Here's the general solution I came up with thanks to Juliano's answer: (feel free to comment if you have any questions)
public void fillWriteBuffer() {
    while(!writePackets.isEmpty() && writeBuf.remaining() >= writePackets.peek().size()) {
        Packet p = writePackets.poll();
        assert p != null;
        p.writeTo(writeBuf);
    }
}

public void fillReadPackets() {
    do {
        if(readBuf.position() < 1+2) {
            // haven't yet received the length
            break;
        }
        short packetLength = readBuf.getShort(1);
        // readBuf is still in fill mode here, so position() (not limit())
        // is the number of bytes received so far
        if(readBuf.position() >= 1+2 + packetLength) {
            // we have a complete packet!
            readBuf.flip();
            byte packetType = readBuf.get();
            packetLength = readBuf.getShort();
            byte[] packetData = new byte[packetLength];
            readBuf.get(packetData);
            Packet p = new Packet(packetType, packetData);
            readPackets.add(p);
            readBuf.compact();
        } else {
            // not a complete packet
            break;
        }
    } while(true);
}
Probably this is not the answer you are looking for, but someone would have to say it: You are probably overengineering the solution for a very simple problem.
You do not have packets before they arrive completely, not even IncomingPackets. You have just a stream of bytes without defined meaning. The usual, the simple solution is to keep the incoming data in a buffer (it can be a simple byte[] array, but a proper elastic and circular buffer is recommended if performance is an issue). After each read, you check the contents of the buffer to see if you can extract an entire packet from there. If you can, you construct your Packet, discard the correct number of bytes from the beginning of the buffer and repeat. If or when you cannot extract an entire packet, you keep those incoming bytes there until the next time you read something from the socket successfully.
While you are at it, if you are doing datagram-based communication over a stream channel, I would recommend you to include a magic number at the beginning of each "packet" so that you can test that both ends of the connection are still synchronized. They may get out of sync if for some reason (a bug) one of them reads or writes the wrong number of bytes to/from the stream.
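The buffer-and-extract approach from this answer might look roughly like the sketch below. The frame layout (one magic byte, a 2-byte length, then the payload), the magic value, the buffer size and all names are assumptions chosen for illustration. The sketch also assumes each incoming chunk fits into the space left in the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FrameExtractorSketch {
    static final byte MAGIC = 0x7E; // hypothetical sync marker
    static final int HEADER = 3;    // magic byte + 2-byte length

    // Room for one maximum-size frame plus leftovers from the next one.
    private final ByteBuffer buf = ByteBuffer.allocate(2 * (HEADER + 0xFFFF));

    // Appends the first n bytes of chunk and returns every payload that is
    // now complete. Incomplete data stays buffered until the next call.
    List<byte[]> feed(byte[] chunk, int n) {
        buf.put(chunk, 0, n);
        List<byte[]> packets = new ArrayList<>();
        buf.flip();
        while (buf.remaining() >= HEADER) {
            int start = buf.position();
            if (buf.get(start) != MAGIC) {
                throw new IllegalStateException("stream out of sync");
            }
            int len = buf.getShort(start + 1) & 0xFFFF;
            if (buf.remaining() < HEADER + len) {
                break; // wait for more bytes
            }
            buf.position(start + HEADER);
            byte[] payload = new byte[len];
            buf.get(payload);
            packets.add(payload);
        }
        buf.compact(); // discard what was consumed, keep the rest
        return packets;
    }

    public static void main(String[] args) {
        ByteBuffer wire = ByteBuffer.allocate(64);
        for (String s : new String[] {"hi", "there"}) {
            byte[] p = s.getBytes(StandardCharsets.US_ASCII);
            wire.put(MAGIC).putShort((short) p.length).put(p);
        }
        byte[] stream = Arrays.copyOf(wire.array(), wire.position());

        // Deliver the stream in 4-byte chunks, ignoring frame boundaries.
        FrameExtractorSketch extractor = new FrameExtractorSketch();
        for (int i = 0; i < stream.length; i += 4) {
            int n = Math.min(4, stream.length - i);
            byte[] chunk = Arrays.copyOfRange(stream, i, i + n);
            for (byte[] packet : extractor.feed(chunk, n)) {
                System.out.println("packet: " + new String(packet, StandardCharsets.US_ASCII));
            }
        }
    }
}
```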
Can't you just read whatever number of bytes that are ready to be read, and feed all incoming bytes into a packet parsing state machine? That would mean treating the incoming (TCP) data stream like any other incoming data stream (via serial line, or USB, a pipe, or whatever...)
So you would have some Selector determining from which connection(s) there are incoming bytes to be read, and how many. Then you would (for each connection) read the available bytes, and then feed those bytes into a (connection specific) state machine instance (the reading and feeding could be done from the same class, though). This packet parsing state machine class would then spit out finished packets from time to time, and hand those over to whoever will handle those complete and parsed packets.
For an example packet format like
2 magic header bytes to mark the start
2 bytes of payload size (n)
n bytes of payload data
2 bytes of checksum
the state machine would have states like (try an enum, Java has those now, I gather)
wait_for_magic_byte_0,
wait_for_magic_byte_1,
wait_for_length_byte_0,
wait_for_length_byte_1,
wait_for_payload_byte (with a payload_offset variable counting),
wait_for_chksum_byte_0,
wait_for_chksum_byte_1
and on each incoming byte you can switch the state accordingly. If the incoming byte does not properly advance the state machine, discard the byte by resetting the state machine to wait_for_magic_byte_0.
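A compact sketch of such a state machine follows. The magic byte values and the checksum (sum of the unsigned payload bytes, truncated to 16 bits) are assumptions, since the answer does not specify them, and the class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PacketStateMachineSketch {
    enum State { MAGIC_0, MAGIC_1, LENGTH_0, LENGTH_1, PAYLOAD, CHKSUM_0, CHKSUM_1 }

    // Hypothetical values; the answer does not specify the magic bytes.
    static final int MAGIC_BYTE_0 = 0xCA;
    static final int MAGIC_BYTE_1 = 0xFE;

    private State state = State.MAGIC_0;
    private int length;
    private int checksum;
    private final ByteArrayOutputStream payload = new ByteArrayOutputStream();
    private final List<byte[]> packets = new ArrayList<>();

    // Assumed checksum: sum of the unsigned payload bytes, truncated to 16 bits.
    static int sum16(byte[] data) {
        int sum = 0;
        for (byte d : data) sum += d & 0xFF;
        return sum & 0xFFFF;
    }

    // Feeds one incoming byte; any unexpected byte resets to MAGIC_0.
    void accept(int b) {
        b &= 0xFF;
        switch (state) {
            case MAGIC_0: state = (b == MAGIC_BYTE_0) ? State.MAGIC_1 : State.MAGIC_0; break;
            case MAGIC_1: state = (b == MAGIC_BYTE_1) ? State.LENGTH_0 : State.MAGIC_0; break;
            case LENGTH_0: length = b << 8; state = State.LENGTH_1; break;
            case LENGTH_1:
                length |= b;
                payload.reset();
                state = (length == 0) ? State.CHKSUM_0 : State.PAYLOAD;
                break;
            case PAYLOAD:
                payload.write(b);
                if (payload.size() == length) state = State.CHKSUM_0;
                break;
            case CHKSUM_0: checksum = b << 8; state = State.CHKSUM_1; break;
            case CHKSUM_1: {
                checksum |= b;
                byte[] data = payload.toByteArray();
                if (checksum == sum16(data)) packets.add(data); // drop bad frames
                state = State.MAGIC_0;
                break;
            }
        }
    }

    List<byte[]> packets() { return packets; }

    // Builds a frame in the assumed format, for the demo below.
    static byte[] encode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC_BYTE_0);
        out.write(MAGIC_BYTE_1);
        out.write(data.length >> 8);
        out.write(data.length);
        out.write(data, 0, data.length);
        int sum = sum16(data);
        out.write(sum >> 8);
        out.write(sum);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        PacketStateMachineSketch parser = new PacketStateMachineSketch();
        parser.accept(0x00); // line noise before the first frame is skipped
        for (byte b : encode("hello".getBytes(StandardCharsets.US_ASCII))) {
            parser.accept(b);
        }
        for (byte[] p : parser.packets()) {
            System.out.println("packet: " + new String(p, StandardCharsets.US_ASCII));
        }
    }
}
```

One parser instance per connection keeps the partial state separate for each peer.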
Ignoring client disconnects and server shutdown for now, here's the more or less traditional structure of a socket server:
Selector, handles sockets:
polls open sockets
if it's the server socket, create new Connection object
for each active client socket find the Connection, call it with event (read or write)
Connection (one per socket), handles I/O on one socket:
Communicates to Protocol via two queues, input and output
keeps two buffers, one for reading, one for writing, and respective offsets
on read event: read all available input bytes, look for message boundaries, put whole messages onto Protocol input queue, call Protocol
on write event: write the buffer, or if it's empty, take a message from the output queue into the buffer and start writing it
Protocol (one per connection), handles application protocol exchange on one connection:
take message from input queue, parse application portion of the message
do the server work (here's where the state machine is - some messages are appropriate in one state, while not in the other), generate response message, put it onto output queue
That's it. Everything could be in a single thread. The key here is separation of responsibilities.
Hope this helps.
I think you're approaching the issue from a slightly wrong direction. Instead of thinking of packets, think of a data structure. That's what you're sending. Effectively, yes, it's an application layer packet, but just think of it as a data object. Then, at the lowest level, write a routine which will read off the wire, and output data objects. That will give you the abstraction layer I think you're looking for.

Java socket listener load problem

I have made a socket listener in Java that listens on two ports for data and does operations on the listened data. Now the scenario is such that when both the listener and the device that transmits data are up and running, the listener receives data, one at a time ( each data starts with a "#S" and ends with a ".") and when the listener is not up or is not listening, the device stores the data in its local memory and as soon as the listener is up it sends all the data in the appended form like:
"#S ...DATA...[.]#S...DATA...[.]..."
Now I have implemented this in such a way that whatever data the listener gets on either port, it converts into hex form, and then carries out operations on the hex form of the input data. The hex form of "#S" is "2353" and the hex form of "." is "2e". The code for handling the hex-converted form of the input data is as follows.
hexconverted1 is a string that contains the hex-converted form of the whole input data, that comes on any port.
String store[];
store=hexconverted1.split("2353");
for(int m=0;m<store.length;m++)
store[m]="2353"+store[m];
PrintWriter out2 = new PrintWriter(new BufferedWriter(new FileWriter("C:/Listener/array.bin", true)));
for(int iter=0;iter<store.length; iter++)
out2.println(store[iter]);
out2.close();
What I am trying to accomplish with the above code is this: whenever a bunch of data arrives, I scan through it and store every single data item from the bunch in a string array, so that the operations I wish to carry out on the hex-converted form of the data can be done more easily. But when I write the contents of the array to a BIN file, the output varies for the same input. When I send bunched data of 280 data packets, appended one after the other, the array sometimes contains 180 entries, at other times 270. For smaller bunch sizes I get the desired results and the size of the 'store' array is also as expected.
I'm pretty clueless about what's going on and any pointers would be of great help.
To make matters clearer, the data I get on the ports is mostly unreadable, and often the only readable parts are the starting bits "#S" and the end bit ".". So I'm using a combination of BufferedInputStream and InputStream to read the incoming data and convert it into hex format, and I'm quite sure that the conversion to hex is working correctly.
I'm using a combination of BufferedInputStream and InputStream to read the incoming data
Clutching at straws here. If you read from a Stream using both InputStream and BufferedInputStream methods, you'll get into difficulty:
InputStream is = ...
BufferedInputStream bis = new BufferedInputStream(is);
// This is OK
int b = bis.read();
...
// Reading the InputStream directly at this point is liable to
// give unpredictable results. It is likely that some bytes still
// remain in "bis"'s buffer, and a read on "is" will not return them.
int b2 = is.read();
