How does SocketChannel know when reading a file is completed? - java

I am using a SocketChannel and NIO to read data from a client.
How does the SocketChannel know when reading a file is completed?
ByteBuffer byteBuffer = ByteBuffer.allocate(BUFSIZE);
int nbytes = socketChannel.getChannel().read(byteBuffer);
If the data is small I read it all at once, but if it is large I read it in fragments and end up with the same data. How does the channel know when it has reached the end of the data?
Is there any way for me to know when the file reading is completed?

There are three basic options:
The protocol could specify that the length of the file should come before the data.
The protocol could specify some "end of file" marker (which would have to be invalid for data within the file, of course).
The sender could close the socket when it has finished: your read call will then return -1 to let you know that all the data has been read.
Because the data arrives as a stream, you can't rely on all of it coming down in a particular number of reads.
What protocol are you using, and can you modify it appropriately? A length prefix is usually the easiest solution.
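As a minimal sketch of the length-prefix option, the reader below first reads a fixed-size header and then exactly that many payload bytes. The wire format (a 4-byte big-endian length followed by the payload) and the class and method names are assumptions for illustration, not part of any existing protocol, and the loop assumes a channel in blocking mode.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper: reads one length-prefixed message from a blocking channel.
final class LengthPrefixedReader {

    // A single read() may return fewer bytes than requested, so keep reading
    // until the buffer is full or the peer closes the connection (-1).
    static void readFully(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) == -1) {
                throw new EOFException("connection closed before the message was complete");
            }
        }
    }

    // Assumed wire format: 4-byte big-endian length, then that many payload bytes.
    static byte[] readMessage(ReadableByteChannel channel) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        readFully(channel, header);
        header.flip();
        int length = header.getInt();
        if (length < 0) {
            throw new IOException("invalid length prefix: " + length);
        }
        ByteBuffer body = ByteBuffer.allocate(length);
        readFully(channel, body);
        return body.array();
    }
}
```

A SocketChannel implements ReadableByteChannel, so it can be passed to readMessage directly. In non-blocking mode read() can return 0, so the loop would need to be driven by a Selector instead.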

Related

Serialize multiple protobuf messages in Java and deserialize them in Python

I want to store a bunch of protobuf messages in a file, and read them later.
In Java, I can just use 'writeDelimitedTo' and 'parseDelimitedFrom' to write to and read from a file. However, I want to read the file in Python, which only seems to have a 'ParseFromString' method.
Some SO questions are very similar, such as "Parsing Protocol Buffers, written in Java and read in Python", but that one covers only a single message, not multiple.
The Protocol Buffers guide says that you have to keep track of the message sizes yourself:
Streaming Multiple Messages
If you want to write multiple messages to a single file or stream, it
is up to you to keep track of where one message ends and the next
begins. The Protocol Buffer wire format is not self-delimiting, so
protocol buffer parsers cannot determine where a message ends on their
own. The easiest way to solve this problem is to write the size of
each message before you write the message itself. When you read the
messages back in, you read the size, then read the bytes into a
separate buffer, then parse from that buffer. (If you want to avoid
copying bytes to a separate buffer, check out the CodedInputStream
class (in both C++ and Java) which can be told to limit reads to a
certain number of bytes.)
https://developers.google.com/protocol-buffers/docs/techniques
A simple solution could be to serialize each message as Base64, one message per line in your file.
That way it would be pretty easy to parse and use them in Python.
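The one-message-per-line Base64 idea could look roughly like this on the Java side. This is a sketch that works on raw byte arrays: it assumes each array comes from a message's toByteArray() call, and the class and method names are made up for illustration. On the Python side, each line would be passed through base64.b64decode before calling ParseFromString.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: stores one Base64-encoded message per line.
final class Base64LineFile {

    static void writeAll(List<byte[]> messages, Writer out) throws IOException {
        // The basic encoder never inserts line breaks, so '\n' is a safe delimiter.
        Base64.Encoder encoder = Base64.getEncoder();
        for (byte[] message : messages) {
            out.write(encoder.encodeToString(message));
            out.write('\n');
        }
        out.flush();
    }

    static List<byte[]> readAll(Reader in) throws IOException {
        List<byte[]> messages = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            messages.add(Base64.getDecoder().decode(line));
        }
        return messages;
    }
}
```

The trade-off is size: Base64 inflates each message by about a third compared with a binary length prefix.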

Read from TcpClient.GetStream() without knowing the length

I'm working on a TCP-based communication protocol. As far as I know, there are several ways to determine when to stop reading:
Closing the connection at the end of the message
Putting the length of the message before the data itself
Using a separator; some value which will never occur in the normal data (or would always be escaped somehow)
I'm trying to send a file over a Wi-Fi network (which may be unstable and slow).
Because of the RSA and AES handshake, I don't want to close the connection each time (can't use 1).
It's a large file whose length I can't predict, so I can't use the second method (can't use 2).
Checking for a special value when reading and escaping it when writing needs a lot of processing (can't use 3).
The method should be compatible with both C# and Java.
What do you suggest?
More general problems:
How to identify end of InputStream in java
C# - TcpClient - Detecting end of stream?
More information
I'm coding TCP client-server communication.
First the server generates an RSA public key and sends it to the client.
Then the client generates an AES key and IV and sends them back using RSA encryption.
Up to here everything is fine.
But I want to send a file over this connection. Here is my current packet: EncryptUsingAES(new AES.IV(16 byte) + file.content(any size))
On the server I can't capture all the data sent by the client, so I need to know how much data to read with TcpClient.GetStream().Read(buffer, 0, buffersize).
Current code:
List<byte> message = new List<byte>();
int bytes = -1;
do
{
    byte[] buffer = new byte[bufferrSize];
    bytes = stream.Read(buffer, 0, bufferrSize);
    if (bytes > 0)
    {
        byte[] tmp = new byte[bytes];
        Array.Copy(buffer, tmp, bytes);
        message.AddRange(tmp);
    }
} while (bytes == bufferrSize);
Your second method is the best one. Prefixing each packet with its length creates a reliable message framing protocol which, if done correctly, ensures that every message is received whole and with the same boundaries it was sent with (that is, no partial data and no messages lumped together).
Recommended packet structure:
[Data length (4 bytes)][Header (1 byte)][Data (?? bytes)]
- The header in question is a single byte you can use to indicate what kind of packet this is, so that the endpoint will know what to do with it.
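A rough Java sketch of the recommended packet structure follows. It assumes the 4-byte length counts only the data bytes (not the header byte) and is written big-endian, which is what DataOutputStream produces; the Packet class itself is hypothetical. A C# peer would have to use the same byte order, since BinaryWriter writes integers little-endian.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical packet type: [data length (4 bytes)][header (1 byte)][data].
final class Packet {
    final byte type;   // the single header byte describing the packet kind
    final byte[] data;

    Packet(byte type, byte[] data) {
        this.type = type;
        this.data = data;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(data.length); // assumption: the length covers the data only
        out.writeByte(type);
        out.write(data);
        out.flush();
    }

    static Packet readFrom(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("invalid packet length: " + length);
        }
        byte type = in.readByte();
        byte[] data = new byte[length];
        in.readFully(data); // blocks until exactly 'length' bytes have arrived
        return new Packet(type, data);
    }
}
```

Because readFully keeps reading until the announced number of bytes has arrived, two packets sent back to back are never merged and a packet split across several TCP segments is never truncated.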
Sending files
In most cases the sender of a file knows how much data it is about to send (after all, it usually has the file stored locally), so there is no problem knowing how much of the file has been sent.
The method I use and recommend is to start by sending an "info packet", which tells the endpoint that it is about to receive a file and how many bytes that file consists of. After that you start sending the actual data, preferably in chunks, since it's inefficient to process the entire file at once (at least if it's a large file).
Always keep track of how many bytes of the file you've received so far. By doing so the receiver can automatically tell when it has received the whole file.
Send a file a few kilobytes at a time (I use 8192 bytes = 8 kB as a file buffer). That way you don't have to read the entire file into memory nor encrypt it all at the same time.
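The file-sending approach above could be sketched in Java as follows. The "info packet" is reduced here to a single 8-byte file size, which is an assumption made to keep the example short; the class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: announces the file size, then streams the file in 8192-byte chunks.
final class ChunkedFileTransfer {
    static final int CHUNK = 8192;

    static void send(InputStream file, long fileSize, DataOutputStream out) throws IOException {
        out.writeLong(fileSize); // simplified "info packet": just the size
        byte[] buffer = new byte[CHUNK];
        long sent = 0;
        while (sent < fileSize) {
            int n = file.read(buffer, 0, (int) Math.min(CHUNK, fileSize - sent));
            if (n == -1) {
                throw new EOFException("file is shorter than the announced size");
            }
            out.write(buffer, 0, n);
            sent += n;
        }
        out.flush();
    }

    // Reads exactly the announced number of bytes and leaves the rest of the stream untouched.
    static long receive(DataInputStream in, OutputStream target) throws IOException {
        long expected = in.readLong();
        byte[] buffer = new byte[CHUNK];
        long received = 0;
        while (received < expected) {
            int n = in.read(buffer, 0, (int) Math.min(CHUNK, expected - received));
            if (n == -1) {
                throw new EOFException("connection closed after " + received + " of " + expected + " bytes");
            }
            target.write(buffer, 0, n);
            received += n;
        }
        return received;
    }
}
```

Only one chunk is held in memory at a time, and the connection stays usable for further packets once the expected byte count has been reached.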
Encrypting the data
Dealing with encryption will not be a problem. If you use length-prefixing, just encrypt the data itself and leave the data length header unencrypted. The data length header must then hold the size of the encrypted data, like so:
Encrypt the data.
Get the length of the encrypted data.
Produce the following packet:
[Encrypted data length][Encrypted data]
(Insert a header byte in there if you need to)
Receiving an encrypted file
Receiving an encrypted file and knowing when everything has been received is in fact not very hard. Assuming you're using the method described above for sending the file, you would just have to:
Receive the encrypted packet → decrypt it.
Get the length of the decrypted data.
Increment a variable keeping track of the amount of file-bytes received.
If the received amount is equal to the expected amount: close the file.
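The encrypted variant of these steps might look like the sketch below. It assumes AES in CBC mode with PKCS5 padding and a key and IV already shared by both sides; IV handling is deliberately simplified (the same IV is passed for every chunk, which a real protocol should avoid), and all names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: each chunk travels as [encrypted length (4 bytes)][encrypted data].
final class EncryptedChunks {

    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] input) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }

    static void sendChunk(DataOutputStream out, byte[] key, byte[] iv, byte[] plain)
            throws IOException, GeneralSecurityException {
        byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, key, iv, plain);
        out.writeInt(encrypted.length); // length of the ENCRYPTED data, sent in the clear
        out.write(encrypted);
        out.flush();
    }

    static byte[] receiveChunk(DataInputStream in, byte[] key, byte[] iv)
            throws IOException, GeneralSecurityException {
        byte[] encrypted = new byte[in.readInt()];
        in.readFully(encrypted);
        return crypt(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    // Counts DECRYPTED bytes and stops once the announced file size is reached.
    static long receiveFile(DataInputStream in, byte[] key, byte[] iv, long expectedBytes, OutputStream target)
            throws IOException, GeneralSecurityException {
        long received = 0;
        while (received < expectedBytes) {
            byte[] plain = receiveChunk(in, key, iv);
            target.write(plain);
            received += plain.length;
        }
        return received;
    }
}
```

Padding makes each encrypted chunk slightly larger than its plaintext, which is why the length header has to describe the encrypted size while the progress counter uses the decrypted size.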
Additional resources/references
You can refer to two of my previous answers about TCP length-prefixed message framing:
C# Deserializing a struct after receiving it through TCP
TCP Client to Server communication
The easiest way would be to use your #2. If you cannot predict the message length, buffer up to a certain number of bytes (1 KiB or so) and insert a length header for each of those chunks instead of prefixing the whole message once.
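A sketch of per-chunk length headers for a stream of unknown total size is shown below. The zero-length chunk used as an end marker is an assumption added here (the answer does not say how the last chunk is recognized), and the names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: frames a stream of unknown length as [length][bytes] chunks.
final class ChunkedStream {

    static void send(InputStream source, DataOutputStream out, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int n;
        while ((n = source.read(buffer)) != -1) {
            if (n == 0) {
                continue; // never emit an empty chunk by accident; 0 is the end marker
            }
            out.writeInt(n);
            out.write(buffer, 0, n);
        }
        out.writeInt(0); // assumption: a zero-length chunk marks the end of the message
        out.flush();
    }

    static void receive(DataInputStream in, OutputStream target) throws IOException {
        int n;
        while ((n = in.readInt()) != 0) {
            if (n < 0) {
                throw new IOException("invalid chunk length: " + n);
            }
            byte[] chunk = new byte[n];
            in.readFully(chunk);
            target.write(chunk);
        }
    }
}
```

The sender never needs to know the total size, and the connection can stay open for the next message after the end marker.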

pdf file transfer

I have implemented a program that transfers any txt file over a UDP socket in Java. I am using PrintWriter to write and read, but with that I am not able to transfer any file other than txt (say I want to transfer a PDF). What should be done in this case? I am using the code below to write the file.
Output_File_Write = new PrintWriter("dummy.txt");
Output_File_Write.print(new String(p.getData()));
Writers / PrintWriters are for writing text files. They take (Unicode-based) character data and encode it using the default character encoding (or a specified one), and write that to the file.
A PDF document (as you get it from the network) is in a binary format, so you need to use a FileOutputStream to write the file.
It is also a little bit concerning that you are attempting to transfer documents using UDP. UDP provides no guarantees that the datagrams sent will all arrive, or that they will arrive in the same order as they were sent. Unless you can always fit the entire document into a single datagram, you will have to do a significant amount of work to detect that datagrams have been dropped or have arrived in the wrong order ... and take remedial action.
Using TCP would be far simpler.
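A minimal sketch of the binary write, assuming p is the received DatagramPacket as in the question; the helper name is made up. It writes only the bytes that were actually received (getLength()), not the whole backing buffer, and does no character decoding.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;

// Hypothetical helper: appends a datagram's payload to a file as raw bytes.
final class BinaryPacketWriter {

    static void appendPayload(DatagramPacket p, String path) throws IOException {
        // 'true' opens the file in append mode so that successive packets accumulate.
        try (FileOutputStream out = new FileOutputStream(path, true)) {
            out.write(p.getData(), p.getOffset(), p.getLength());
        }
    }
}
```

Packets that are lost or reordered are not detected here; that still has to be handled separately when using UDP.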
AFAIK PrintWriter is meant to be used with text. Quoting the documentation:
Prints formatted representations of objects to a text-output stream. This class implements all of the print methods found in PrintStream. It does not contain methods for writing raw bytes, for which a program should use unencoded byte streams.
To be able to send binary data you would need to use an appropriate API for it, for example PrintStream.

Need to send multiple objects through an http output stream

I am trying to send some very large files (>200MB) through an Http output stream from a Java client to a servlet running in Tomcat.
My protocol currently packages the file contents in a byte[], which is placed in a Map<String, Object> along with some metadata (filename, etc.), each part under a "standard" key ("FILENAME" -> "Foo", "CONTENTS" -> byte[], "USERID" -> 1234, etc.). The Map is written to the URL connection output stream (urlConnection.getOutputStream()). This works well when the file contents are small (<25MB), but I am running into Tomcat memory issues (OutOfMemoryError) when the file size is very large.
I thought of sending the metadata Map first, followed by the file contents, and finally by a checksum on the file data. The receiver servlet can then read the metadata from its input stream, then read bytes until the entire file is finished, finally followed by reading the checksum.
Would it be better to send the metadata in connection headers? If so, how? If I send the metadata down the socket first, followed by the file contents, is there some kind of standard protocol for doing this?
You will almost certainly want to use a multipart POST to send the data to the server. Then on the server you can use something like commons-fileupload to process the upload.
The good thing about commons-fileupload is that it understands that the server may not have enough memory to buffer large files and will automatically stream the uploaded data to disk once it exceeds a certain size, which is quite helpful in avoiding OutOfMemoryError type problems.
Otherwise you are going to have to implement something comparable yourself. It doesn't really make much difference how you package and send your data, so long as the server can 1) parse the upload and 2) redirect data to a file so that it doesn't ever have to buffer the entire request in memory at once. As mentioned both of these come free if you use commons-fileupload, so that's definitely what I'd recommend.
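If you do implement it yourself, the second requirement (never buffering the whole request in memory) comes down to copying the request body to disk through a small fixed-size buffer. The sketch below uses a plain InputStream; in a servlet it would presumably be the stream returned by request.getInputStream(). The class name and buffer size are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: spools an upload to disk using a constant amount of memory.
final class StreamToDisk {

    static long spool(InputStream body, Path target) throws IOException {
        byte[] buffer = new byte[8192]; // the only buffer, regardless of upload size
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = body.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Memory use stays at one buffer whether the upload is 25 MB or 200 MB; the sender needs the matching change of writing from a file stream instead of building a byte[] first.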
I don't have a direct answer for you but you might consider using FTP instead. Apache Mina provides FTPLets, essentially servlets that respond to FTP events (see http://mina.apache.org/ftpserver/ftplet.html for details).
This would allow you to push your data in any format without requiring the receiving end to accommodate the entire data in memory.
Regards.

Processing received data from socket

I am developing a socket application that needs to receive XML files over a socket. The size of the XML files varies from 1k to 100k. I am thinking of storing the received data in a temporary file first and then passing it to the XML parser. I am not sure whether this is a proper way to do it.
Another question: if I do it as described above, should I pass a file object or a file path to the XML parser?
Thanks in advance,
Regards
Just send it straight to the parser. That's what browsers do. Adding a temp file costs you time and space with no actual benefit.
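Passing the stream straight to the parser can be as short as the sketch below, which uses the JDK's built-in DOM parser. The helper name is made up, and the InputStream is assumed to be the socket's input stream (socket.getInputStream()).

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML directly from a stream, without a temporary file.
final class StreamXmlParser {

    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(in);
    }
}
```

One caveat: the parser typically keeps reading until the end of the stream, so this works as-is only when the sender closes (or half-closes) the connection after each document. If several documents share one connection, each would need to be framed first, for example with a length prefix.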
Do you think it would work to put a BufferedReader around whatever input stream you have? It wouldn't put it into a temporary file, but it would let you hang onto that data. You can set whatever size BufferedReader you need.
Did you write your XML parser? If you didn't, what will it accept as a parameter? If you did write it, are you asking about efficiency, that is, which object (the path or the file) your parser should ask for to be most efficient?
You do not have to store the data from the socket in a file. Just read the whole DataInputStream into a byte array and you can then do whatever you need, e.g. create a String with the XML input to feed the parser if needed. (I am assuming TCP sockets.)
If there is any data preceding the XML, skip it so that only the actual XML data is fed to the parser.
