I am creating a simple TCP server in java which accepts clients. Each client that connects to the server sends a message, and sends an answer based on the message.
In order to accept clients I am using ServerSocket class. In order to read the client's message and write to him I am allowed to use only in the Socket, DataOutputStream and DataInputStream classes and in StandardCharsets.UTF_8. In addition, every message that the server gets and sends must be in UTF-8 encoding.
However, I am not sure how to read and write messages in UTF-8 encoding using those classes. The size of the messages is unbounded. I tried to read about the read function of DataInputStream class, but I couldn't understand how to use it (if this indeed the function I need to use).
DataOutputStream has a writeUTF() method, and DataInputStream has a readUTF() method. Just note that these methods deal in modified UTF-8 rather than standard UTF-8, and they limit the UTF-8 data to 65535 bytes max.
To provide interoperability with non-Java clients, and/or handle longer strings, would be better to just handle the UTF-8 yourself manually.
The sender can use String.getBytes() to encode a String to a byte[] array using StandardCharsets.UTF_8, then send the array's length using DataOutputStream.writeInt(), and then finally send the actual bytes using DataOutputStream.write(byte[]).
The receiver can then read the array length using DataInputStream.readInt(), then allocate a byte[] array and read it using DataInputStream.readFully(), and then finally use the String constructor to decode the bytes using StandardCharsets.UTF_8.
Related
I have a ruby program that writes data to a socket with sock.write, and I'm reading the data with ObjectInputStream in a java file. I'm getting an invalid header error that translate to the first few characters of my stream.
I've read that if you use ObjectInputStream you must write with ObjectOutputStream, but since the writing file is in ruby im not sure how to accomplish this.
As you say, ObjectInputStream assumes that the bytes it's receiving have been formatted by an ObjectOutputStream. That is, it is expecting the incoming bytes to be a specific representation of a Java primitive or object.
Your Ruby code is unlikely to format bytes in such a way.
You need to define exactly the byte format of the message passing from the Ruby to the Java process. You could tell us more about that message format, but it's likely you will need to use Java's ByteArrayInputStream (https://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayInputStream.html). The data will come into the Java program as a raw array of bytes, and you will need to parse/unpack/process these bytes into whatever objects are appropriate.
Unless performance is critical, you'd probably be best off using JSON or YAML as the intermediate format. They would make it simple to send simple objects such as strings, arrays, and hashes (maps).
I have a server-side function that draws an image with the Python Imaging Library. The Java client requests an image, which is returned via socket and converted to a BufferedImage.
I prefix the data with the size of the image to be sent, followed by a CR. I then read this number of bytes from the socket input stream and attempt to use ImageIO to convert to a BufferedImage.
In abbreviated code for the client:
public String writeAndReadSocket(String request) {
// Write text to the socket
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
bufferedWriter.write(request);
bufferedWriter.flush();
// Read text from the socket
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
// Read the prefixed size
int size = Integer.parseInt(bufferedReader.readLine());
// Get that many bytes from the stream
char[] buf = new char[size];
bufferedReader.read(buf, 0, size);
return new String(buf);
}
public BufferedImage stringToBufferedImage(String imageBytes) {
return ImageIO.read(new ByteArrayInputStream(s.getBytes()));
}
and the server:
# Twisted server code here
# The analog of the following method is called with the proper client
# request and the result is written to the socket.
def worker_thread():
img = draw_function()
buf = StringIO.StringIO()
img.save(buf, format="PNG")
img_string = buf.getvalue()
return "%i\r%s" % (sys.getsizeof(img_string), img_string)
This works for sending and receiving Strings, but image conversion (usually) fails. I'm trying to understand why the images are not being read properly. My best guess is that the client is not reading the proper number of bytes, but I honestly don't know why that would be the case.
Side notes:
I realize that the char[]-to-String-to-bytes-to-BufferedImage Java logic is roundabout, but reading the bytestream directly produces the same errors.
I have a version of this working where the client socket isn't persistent, ie. the request is processed and the connection is dropped. That version works fine, as I don't need to care about the image size, but I want to learn why the proposed approach doesn't work.
BufferedReader.read() isn't guaranteed to fill the buffer, and converting the image to String and back is not only pointless but wrong.
String is not a container for binary data, and the round-trip isn't guaranteed to work.
It would be better to redesign the protocol so that you can get rid of the readLine(), and send the length in binary and can read the entire stream with a DataInputStream.
In general when dealing with binary protocols, the answer is always DataInputStream and DataOutputStream, unless the byte order isn't the canonical network byte order, which is a protocol design mistake, and in which case you need to look into byte-ordered ByteBuffers.
In the server code, your use of sys.getsizeof is wrong. That returns the size of the bytestring object, whereas what you want is the number of bytes in the bytestring, i.e. its length len(img_string).
Also, in the client code the .readLine method reads characters until it sees either '\r' possibly followed '\n' or '\n', so using '\r' as the terminator will cause a problem if the first byte of the image data happens to be 0x0A, i.e. '\n'.
I expect that the problem is that you are trying to use a Reader and getBytes() to read binary data (the image).
The Reader stack will be taking the bytes from the underlying socket stream, converting them to characters (using the platform's default character encoding), and returning them as a String. Then you convert the String contents back into bytes using the default encoding again. The initial conversion of bytes to characters is likely to be "lossy" for binary data.
The fix is not to use a Reader / BufferedReader. Use an InputStream and a BufferedInputStream. You are not making it easy for yourself by sending the image size encoded as text, but you can deal with that by reading bytes one at a time until you get the newline, and converting them "by hand" into an integer.
(If the size was sent as a fixed-sized binary integer in "network order" you could use DataInputStream instead ... )
In java there is another Object like BufferedReader to read data recived by server??
Because the server send a string without newline and the client don't print any string untile the Server close the connection form Timeout (the timeout message have a newline!) , after the Client print all message recived and the timeout message send by server!
help me thanks!!
Just don't read by newlines using readLine() method, but read char-by-char using read() method.
for (int c = 0; (c = reader.read()) > -1;) {
System.out.print((char) c);
}
You asked for another class to use, so in that case give Scanner a try for this. It's usually used for delimiting input based on patterns or by the types inferred from the input (e.g. reading on a byte-by-byte bases or an int-by-int basis, or some combination thereof). However, you can use it as just a general purpose "reader" here as well, to cover your use case.
When you read anything from a server, you have to strictly follow the communication protocol. For example the server might be an HTTP server or an SMTP server, it may encrypt the data before sending it, some of the data may be encoded differently, and so on.
So you should basically ask: What kind of server do I want to access? How does it send the bytes to me? And has someone else already done the work of interpreting the bytes so that I can quickly get to the data that I really want?
If it is an HTTP server, you can use the code new URL("http://example.org/").openStream(). Then you will get a stream of bytes. How you convert these bytes into characters, strings and other stuff is another task.
You could try
InputStream is = ... // from input
String text = IOUtils.toString(is);
turns the input into text, without without newlines (it preserves original newlines as well)
Java newbie here. Are there any helper functions to serialize data in and out of byte arrays? I am writing a Java package that implements a network protocol. So I have to write some typical variables like a version (1byte), sequence Number (long) and binary data (bytes) in a loop. How do I do this in Java? Coming from C I am thinking of creating a byte array of the required size and then since there is no memcpy() I am converting the long into a temporary byte array and then copying it into the actual byte array. It seems so inefficient and also really error prone. Is there a class I could use to marshall and unmarshall parameters to a byte array?
Also why does all the Socket classes only deals with char[] and not byte[]? A socket by definition has to deal with binary data also. How is this done in Java?
I am sure what I am missing is the Java mindset. Appreciate it if some one can point it to me.
EDIT: I did look at DataOutputStream and DataInputStream but I cannot convert the bytes to a String not to a byte[] which means the information might be lost in the conversion to write to a socket.
Pav
Have a look at DataInputStream, DataOutputStream, ObjectInputStream and ObjectOutputStream. Check first if the layout of the data is acceptable to you. Also, Serialization.
Sockets neither deal with char[] nor with byte[] but with InputStream and OutputStream which are used to read and write bytes.
If you are sending the data over a socket, then you don't need a temporary byte array at all; you can wrap the socket's OutputStream with DataOutputStream or ObjectOutputStream and just write what you want to write.
There might be an aspect I've missed that means you do actually need temporary byte arrays. If so, look at ByteArrayOutputStream. Also, there's no memcpy(), sure, but there is System.arraycopy.
As above, DataInputStream and DataOutputStream are exactly what you are looking for. Re your comment about String, if you're planning to use Java Strings over the wire, you're not designing a network protocol, youre designing a Java protocol. There are readUTF() and writeUTF() if you're sure the other end is Java or if you can code the other end to understand these formats. Or you can send as bytes along with the appropriate charset, or predefine the charset for the entire protocol if that makes sense.
I'm trying to have Java server and C++ clients communicate over TCP under the following conditions: text mode, and binary/encrypted mode. My problem is over the eof indicator for end of stream that DataInputStream's read(byte []) uses to return with -1. If I send binary data, what's to prevent a random byte sequence happening to represent an eof and falsely indicating to read() that the stream is ending? It seems I'm limited to text mode. I can live with that until I need to scale, but then I have the problem that I am going to encrypt the text and add message authentication. Even if I were sending from another Java program rather than C++, encrypting a string with AES+MAC would produce binary output not a normal string. What's to prevent some encrypted sequence containing a part identical to an eof?
So, what are the solutions here?
If I send binary data, what's to prevent a random byte sequence happening to represent an eof and falsely indicating to read() that the stream is ending?
In most cases (including TCP/IP and similar network protocols) there is no specific data representation for an EOF. Rather, EOF is a logical abstraction that means that you have reached the end of the data stream. For example, with a Socket it means that the input side of the socket has been closed and you have read all outstanding bytes. (And for a file, it means that you have read the last bytes of the file.)
Since there is no data representation for the (logical) EOF, you don't need to worry about getting false EOFs. In short, there is no problem to be solved here.
"end of stream" in TCP is normally signaled by closing the socket -- that is what makes the stream actually end. If you don't really want the stream to end, but just to signal the end of a "packet" (to be followed, quite possibly, by other packets on the same connection), you can start each packet with an unencrypted length indicator (say, 2 or 4 bytes depending on your need). DataInputStream, according to its docs, is suitable only to receive streams sent by a DataOutputStream, which appears to have nothing to do with your use case as you describe it.
Usually when using tcp streams you have a data header format which at a minimum has a field which holds the length of data to be expected so that the receiver knows exactly how many bytes to expect. Simple example is the TLV format.
As Thomas Pornin replied to Aelx Martelli, DataInputStream is used even on data not sent by DataOutputStream or Java. My question is the consequences of, as the documentation says, DataInputStream's read() returning when the stream ends--that is, is there some sequence of bytes that read() interprets as a stream end, and that I cannot use it thus if there's any possibility of it occurring in the data I'm sending, as can be if I send generic binary data?
My problem is over the eof indicator for end of stream that DataInputStream's read(byte []) uses to return with -1.
No it isn't. This problem is imaginary. -1 is the return code of InputStream.read() that indicates that the peer has closed the connection. It has nothing whatsoever to do with the data being sent over the connection.