Java transfer files via UDP. Compressed files are damaged?

I'm trying to transfer files with a DatagramSocket in Java. I'm reading the files in 4096-byte pieces. We are using ACKs, so all pieces arrive in the right order. We tried pdf, exe, jpg and a lot of other formats successfully, but iso, zip and 7z are not working. They have exactly the same size afterwards. Do you have any idea?
Reading the Parts:
byte[] b = new byte[FileTransferClient.PACKAGE_SIZE - 32];
FileInputStream read = new FileInputStream(file);
read.skip((part - 1) * (FileTransferClient.PACKAGE_SIZE - 32));
read.read(b);
content = b;
Writing the Parts:
stream = new FileOutputStream(new File(this.filePath));
stream.write(output);
...
stream.write(output);
stream.close();
(Sorry for the grammar, I'm German.)

Your write() method calls are assuming that the entire buffer was filled by receive(). You must use the length provided with the DatagramPacket:
datagramSocket.receive(packet);
stream.write(packet.getData(), packet.getOffset(), packet.getLength());
If there is overhead in the packet, e.g. a sequence number, which there should be, you will need to adjust the offset and length accordingly.
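For illustration, a minimal receive-side sketch, assuming a 4-byte sequence number at the start of each datagram (the header size is an assumption; adjust it to your own packet layout):
byte[] buf = new byte[4096];
DatagramPacket packet = new DatagramPacket(buf, buf.length);
datagramSocket.receive(packet);
int seqHeaderSize = 4; // assumed layout: 4-byte sequence number, then payload
int payloadLength = packet.getLength() - seqHeaderSize;
// write only the bytes that were actually received, skipping the header
stream.write(packet.getData(), packet.getOffset() + seqHeaderSize, payloadLength);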
NB TCP will ensure 'everything gets transferred and is not damaged'.

Related

Receiving files over socket

I am implementing a Direct Connect client using the NMDC protocol. I can connect to a hub and to other connected clients. I am trying to retrieve the file list from each client; I understand that in order to do that one must download the file files.xml.bz2 from the other client. The protocol for downloading a file is as follows:
-> $ADCGET file <filename> <params>|
<- $ADCSND file <fileName> <params>|
<- (*** binary data is now transfered from client B to client A ***)
I am trying to create a file named files.xml.bz2 using the binary data received. Here's my code:
//filesize is provided through the $ADCSND response from other client
byte[] data = new byte[filesize];
/*
Reading binary data from socket inputstream
*/
int read = 0;
for (int i = 0; read < filesize;) {
    int available = in2.available();
    int leftspace = filesize - read;
    if (available > 0) {
        in2.read(data, read, available > leftspace ? leftspace : available);
        ++i;
    }
    read += (available > leftspace ? leftspace : available) + 1;
}
/*
writing the bytes to an actual file
*/
ByteArrayInputStream f = new ByteArrayInputStream(data);
FileOutputStream file = new FileOutputStream("files.xml.bz2");
file.write(data);
file.close();
The file is created, however, the contents (files.xml) are not readable. Opening it in firefox gives:
XML Parsing Error: not well-formed
Viewing the contents in the terminal shows only binary data. What am I doing wrong?
EDIT
I also tried decompressing the file using the BZip2 classes from Apache Commons Compress.
ByteArrayInputStream f = new ByteArrayInputStream(data);
BZip2CompressorInputStream bzstream = new BZip2CompressorInputStream(f);
FileOutputStream xmlFile = new FileOutputStream("files.xml");
byte[] bytes = new byte[1024];
while ((bzstream.read(bytes)) != -1) {
    xmlFile.write(bytes);
}
xmlFile.close();
bzstream.close();
I get an error, here's the stacktrace:
java.io.IOException: Stream is not in the BZip2 format
at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.init(BZip2CompressorInputStream.java:240)
at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:132)
at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:109)
at control.Controller$1.run(Controller.java:196)
Usual, typical misuse of available(). All you need to copy a stream in Java is as follows:
while ((count = in.read(buffer)) >= 0)
{
    out.write(buffer, 0, count);
}
Use this with any buffer size greater than zero, but preferably several kilobytes. You don't need a new buffer per iteration, and you don't need to know how much data is available to read without blocking, as you have to block anyway, otherwise you're just smoking the CPU. But you do need to know how much data was actually read per iteration, and this is the first place where your code falls down.
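Applied to your case, a minimal sketch that reads exactly filesize bytes from the socket stream in2 before writing the file, blocking until the data arrives and never touching available():
byte[] data = new byte[filesize];
int read = 0;
while (read < filesize) {
    int count = in2.read(data, read, filesize - read);
    if (count < 0) {
        throw new EOFException("Stream ended after " + read + " of " + filesize + " bytes");
    }
    read += count;
}
FileOutputStream out = new FileOutputStream("files.xml.bz2");
out.write(data);
out.close();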
The error java.io.IOException: Stream is not in the BZip2 format is generated by the constructor of the class BZip2CompressorInputStream. I decided to scan the bytes, looking for the magic number to make sure that the file was in bz2 format, and it turns out that Java was right -- it wasn't in bz2 format.
Upon examining the source code of Jucy, I saw that the reason for this was a slight error in the command I sent to the other client; in essence, the error was caused by a mistake in my protocol implementation. The solution was:
Replace:
$ADCGET file files.xml.bz2 0 -1 ZL1|
With:
$ADCGET file files.xml.bz2 0 -1|
ZL1 specifies compression of the files being sent (Not necessary).

Why does my untar not contain the last bytes

I've written a rest resource that serves a .tar.gz file. It's working OK. I've tried requesting it, saving the data, unpacking it (with tar xzvf [filename]) and I get the correct data.
However, I'm trying to use java.util.zip.GZIPInputStream and org.apache.tools.tar.TarInputStream to unzip and untar a .tar.gz that I'm serving in a JUnit test, to verify that it's working automatically. This is the code in my unit test with some details removed:
HttpResponse response = <make request code here>
byte[] receivedBytes = FileHelper.copyInputStreamToByteArray(response.getEntity().getContent(), true);
GZIPInputStream gzipInputStream = new GZIPInputStream(new ByteArrayInputStream(receivedBytes));
TarInputStream tarInputStream = new TarInputStream(gzipInputStream);
TarEntry tarEntry = tarInputStream.getNextEntry();
ByteArrayOutputStream byteArrayOutputStream = null;
System.out.println("Record size: " + tarInputStream.getRecordSize());
while (tarEntry != null) // It only goes in here once
{
    byteArrayOutputStream = new ByteArrayOutputStream();
    tarInputStream.copyEntryContents(byteArrayOutputStream);
    tarEntry = tarInputStream.getNextEntry();
}
byteArrayOutputStream.flush();
byteArrayOutputStream.close();
byte[] archivedBytes = byteArrayOutputStream.toByteArray();
byte[] actualBytes = <get actual bytes>
Assert.assertArrayEquals(actualBytes, archivedBytes);
The final assert fails with a difference at byte X = (n * 512) + 1, where n is the greatest natural number such that n * 512 <= l and l is the length of the data. That is, I get the biggest possible multiple of 512 bytes of data correctly, but debugging the test I can see that all the remaining bytes are zero. So, if the total amount of data is 1000 bytes, the first 512 bytes in archivedBytes are correct, but the last 488 are all zero / unset, and if the total data is 262272 bytes I get the first 262144 (512*512) bytes correctly, but the remaining bytes are all zero again.
Also, the tarInputStream.getRecordSize() System out above prints Record size: 512, so I presume that this is somehow related. However, since the archive works if I download it, I guess the data must be there, and there's just something I'm missing.
Stepping into the tarInputStream.copyEntryContents(byteArrayOutputStream) with the 1000 byte data, in
int numRead = read(buf, 0, buf.length);
the numRead is 100, but looking at the buffer, only the first 512 bytes are non-zero. Maybe I shouldn't be using that method to get the data out of the TarInputStream?
If anyone knows how it's supposed to work, I'd be very grateful for any advice or help.
You can specify the output block size to be used when you create a tar archive. Thus the size of the archive will be a multiple of the block size. As the archive size doesn't normally fit in a whole number of blocks, zeros are added to the last block of data to make it of the right size.
It turned out that I was wrong in my original question, and the error was in the resource code: I wasn't closing the entry on the TarOutputStream when writing to it. I guess this was not causing any problems when requesting it manually from the server, maybe because the entry was closed with the connection or something, but it behaved differently when being requested from a unit test... though I must admit that doesn't make a whole lot of sense to me :P
Looking at the fragment of my writing code below, I was missing line 3.
1: tarOutputStream.putNextEntry(tarEntry);
2: tarOutputStream.write(fileRawBytes);
3: tarOutputStream.closeEntry();
4: tarOutputStream.close();
I didn't even know there was such a thing as a "closeEntry" on the TarOutputStream... I do now! :P
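For reference, a minimal sketch of the corrected writing side, assuming the same org.apache.tools.tar classes as in the question (the file names and variables here are made up for the example):
TarEntry tarEntry = new TarEntry("payload.bin");
tarEntry.setSize(fileRawBytes.length); // the entry header must carry the real size
TarOutputStream tarOutputStream = new TarOutputStream(
        new GZIPOutputStream(new FileOutputStream("archive.tar.gz")));
tarOutputStream.putNextEntry(tarEntry);
tarOutputStream.write(fileRawBytes);
tarOutputStream.closeEntry(); // pads the entry out to a full 512-byte record
tarOutputStream.close();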

Get offset in file read by BufferedReader?

I'm reading a file line by line. The file is encrypted with a CipherOutputStream and then compressed with a DeflaterOutputStream. The file can contain UTF-8 characters, like Russian letters, etc.
I want to obtain the offset into the actual file being read, i.e. the number of bytes consumed by the br.readLine() call. The problem is that the file is both encrypted and deflated, so the length of the String read is larger than the number of bytes read from the file.
InputStream fis=tempURL.openStream(); //in tempURL I've got an URL to download
CipherInputStream cis=new CipherInputStream(fis,pbeCipher); //CipherStream
InflaterInputStream iis=new InflaterInputStream(cis); //InflaterInputStream
BufferedReader br = new BufferedReader(
        new InputStreamReader(iis, "UTF8")); //BufferedReader
br.readLine();
int fSize=tempURL.openConnection().getContentLength(); //Catch FileSize
Use a CountingInputStream from the Apache Commons IO project:
InputStream fis=tempURL.openStream();
CountingInputStream countStream = new CountingInputStream(fis);
CipherInputStream cis=new CipherInputStream(countStream,pbeCipher);
...
Later you can obtain the file position with countStream.getByteCount().
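A slightly fuller sketch of that chain, based on the code in the question (note that because BufferedReader reads ahead, the count reflects how far the underlying stream has been consumed, which can be a little past the line you just read):
InputStream fis = tempURL.openStream();
CountingInputStream countStream = new CountingInputStream(fis); // org.apache.commons.io.input
CipherInputStream cis = new CipherInputStream(countStream, pbeCipher);
InflaterInputStream iis = new InflaterInputStream(cis);
BufferedReader br = new BufferedReader(new InputStreamReader(iis, "UTF8"));
br.readLine();
long bytesConsumed = countStream.getByteCount(); // position in the compressed/encrypted file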
For compressed files, you can find that a String doesn't correspond to a whole number of bytes in the underlying file, so the question cannot be answered exactly; e.g. a character can take less than a byte when compressed (otherwise there would be no point in compressing it).
BTW: It is usually best to compress the data before encrypting it, as the result will usually be much more compact. Compressing the data after it has been encrypted will only help if the encrypted output is base64 or something similar. Compression works best when the contents are predictable (e.g. repeating sequences, common characters), whereas the purpose of encryption is to make the data appear unpredictable.

Can I write multiple byte arrays to an HttpClient without client-side buffering?

The Problem
I would like to upload very large files (up to 5 or 6 GB) to a web server using the HttpClient class (4.1.2) from Apache. Before sending these files, I break them into smaller chunks (100 MB, for example). Unfortunately, all of the examples I see for doing a multi-part POST using HttpClient appear to buffer the file contents before sending them (typically, a small file size is assumed). Here is such an example:
HttpClient httpclient = new DefaultHttpClient();
HttpPost post = new HttpPost("http://www.example.com/upload.php");
MultipartEntity mpe = new MultipartEntity();
// Here are some plain-text fields as a part of our multi-part upload
mpe.addPart("chunkIndex", new StringBody(Integer.toString(chunkIndex)));
mpe.addPart("fileName", new StringBody(somefile.getName()));
// Now for a file to include; looks like we're including the whole thing!
FileBody bin = new FileBody(new File("/path/to/myfile.bin"));
mpe.addPart("myFile", bin);
post.setEntity(mpe);
HttpResponse response = httpclient.execute(post);
In this example, it looks like we create a new FileBody object and add it to the MultipartEntity. In my case, where the file could be 100 MB in size, I'd rather not buffer all of that data at once. I'd like to be able to write out that data in smaller chunks (4 MB at a time, for example), eventually writing all 100 MB. I'm able to do this using the HTTPURLConnection class from Java (by writing directly to the output stream), but that class has its own set of problems, which is why I'm trying to use the Apache offerings.
My Question
Is it possible to write 100 MB of data to an HttpClient, but in smaller, iterative chunks? I don't want the client to have to buffer up to 100 MB of data before actually doing the POST. None of the examples I see seem to allow you to write directly to the output stream; they all appear to pre-package things before the execute() call.
Any tips would be appreciated!
--- Update ---
For clarification, here's what I did previously with the HTTPURLConnection class. I'm trying to figure out how to do something similar in HttpClient:
// Get the connection's output stream
out = new DataOutputStream(conn.getOutputStream());
// Write some plain-text multi-part data
out.writeBytes(fieldBuffer.toString());
// Figure out how many loops we'll need to write the 100 MB chunk
int bufferLoops = (dataLength + (bufferSize - 1)) / bufferSize;
// Open the local file (~5 GB in size) to read the data chunk (100 MB)
raf = new RandomAccessFile(file, "r");
raf.seek(startingOffset); // Position the pointer to the beginning of the chunk
// Keep track of how many bytes we have left to read for this chunk
int bytesLeftToRead = dataLength;
// Write the file data block to the output stream
for (int i = 0; i < bufferLoops; i++)
{
    // Create an appropriately sized mini-buffer (max 4 MB) for the pieces
    // of this chunk we have yet to read
    byte[] buffer = (bytesLeftToRead < bufferSize) ?
            new byte[bytesLeftToRead] : new byte[bufferSize];
    int bytes_read = raf.read(buffer); // Read ~4 MB from the local file
    out.write(buffer, 0, bytes_read);  // Write that bit to the stream
    bytesLeftToRead -= bytes_read;
}
// Write the final boundary
out.writeBytes(finalBoundary);
out.flush();
If I'm understanding your question correctly, your concern is loading the whole file into memory (right?). If that is the case, you should employ streams (such as a FileInputStream). That way, the whole file doesn't get pulled into memory at once.
If that doesn't help, and you still want to divide the file up into chunks, you could code the server to deal with multiple POSTS, concatenating the data as it gets them, and then manually split up the bytes of the file.
Personally, I prefer my first answer, but either way (or neither way if these don't help), Good luck!
Streams are definitely the way to go, I remember doing something similar a while back with some bigger files and it worked perfectly.
All you need is to wrap your custom content generation logic into an HttpEntity implementation. This will give you complete control over the process of content generation and content streaming.
And for the record: MultipartEntity shipped with HttpClient does not buffer file parts in memory prior to writing them out to the connection socket.
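For illustration, a rough sketch of such an entity (the class name and the 4 MB buffer are invented for the example); it extends org.apache.http.entity.AbstractHttpEntity and streams one region of the large file in small buffers, so only one buffer is ever held in memory:
public class FileChunkEntity extends AbstractHttpEntity {
    private final File file;
    private final long offset;
    private final long length;

    public FileChunkEntity(File file, long offset, long length) {
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    public boolean isRepeatable() { return true; }
    public boolean isStreaming() { return false; }
    public long getContentLength() { return length; }

    public InputStream getContent() throws IOException {
        // not needed when the entity is only used for outgoing requests
        throw new UnsupportedOperationException("use writeTo()");
    }

    public void writeTo(OutputStream out) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(offset);
            byte[] buffer = new byte[4 * 1024 * 1024]; // 4 MB at a time
            long remaining = length;
            while (remaining > 0) {
                int read = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) break;
                out.write(buffer, 0, read);
                remaining -= read;
            }
        } finally {
            raf.close();
        }
    }
}
It could then be set on the request with post.setEntity(new FileChunkEntity(file, startingOffset, dataLength)) before calling execute(), with one POST per chunk as in your HTTPURLConnection approach.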

getResourceAsStream returns HttpInputStream not of the entire file

I have a web application with an applet which copies a file packaged with the applet to the client machine.
When I deploy it to a web server and use: InputStream in = getClass().getResourceAsStream("filename");
in.available() always returns 8192 bytes for every file I tried, which means the file is corrupted when it is copied to the client computer.
The InputStream is of type HttpInputStream (sun.net.protocol.http.HttpUrlConnection$httpInputStream). But when I test the applet in the applet viewer, the files are copied fine, and the InputStream returned is a BufferedInputStream whose available() reports the file's size in bytes. I guess that when getResourceAsStream reads from the file system a BufferedInputStream is used, and over HTTP an HttpInputStream is used.
How can I copy the file completely? Is there a size limit for HttpInputStream?
Thanks a lot.
in.available() tells you how many bytes you can read without blocking, not the total number of bytes you can read from a stream.
Here's an example of copying an InputStream to an OutputStream from org.apache.commons.io.IOUtils:
public static long copyLarge(InputStream input, OutputStream output)
        throws IOException {
    byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
    long count = 0;
    int n = 0;
    while (-1 != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
    }
    return count;
}
The in.available() always return a size of 8192 bytes for every file I tried, which means the file is corrupted when it is copied to the client computer.
It does not mean that at all!
The in.available() method returns the number of bytes that can be read without blocking. It is not the length of the stream. In general, there is no way to determine the length of an InputStream apart from reading (or skipping) all the bytes in the stream.
(You may have observed that new FileInputStream("someFile").available() usually gives you the file size. But that behaviour is not guaranteed by the spec, and is certainly untrue for some kinds of file, and possibly for some kinds of file system as well. A better way to get the size of a file is new File("someFile").length(), but even that doesn't work in some cases.)
See #tdavies answer for example code for copying an entire stream's contents. There are also third party libraries that can do this kind of thing; e.g. org.apache.commons.net.io.Util.
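Applied to the applet, a minimal sketch (the destination path is made up) that copies the whole resource without ever consulting available():
InputStream in = getClass().getResourceAsStream("filename");
OutputStream out = new FileOutputStream("/path/on/client/filename");
byte[] buffer = new byte[8192];
int n;
while ((n = in.read(buffer)) != -1) {
    out.write(buffer, 0, n); // write only the bytes actually read this iteration
}
out.close();
in.close();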
