I've written a REST resource that serves a .tar.gz file. It's working OK: I've tried requesting it, saving the data and unpacking it (with tar xzvf [filename]), and I get the correct data.
However, in a JUnit test I'm trying to use java.util.zip.GZIPInputStream and org.apache.tools.tar.TarInputStream to unzip and untar the .tar.gz that I'm serving, so I can verify automatically that it's working. This is the code in my unit test, with some details removed:
HttpResponse response = <make request code here>
byte[] receivedBytes = FileHelper.copyInputStreamToByteArray(response.getEntity().getContent(), true);
GZIPInputStream gzipInputStream = new GZIPInputStream(new ByteArrayInputStream(receivedBytes));
TarInputStream tarInputStream = new TarInputStream(gzipInputStream);
TarEntry tarEntry = tarInputStream.getNextEntry();
ByteArrayOutputStream byteArrayOutputStream = null;
System.out.println("Record size: " + tarInputStream.getRecordSize());
while (tarEntry != null) // It only goes in here once
{
    byteArrayOutputStream = new ByteArrayOutputStream();
    tarInputStream.copyEntryContents(byteArrayOutputStream);
    tarEntry = tarInputStream.getNextEntry();
}
byteArrayOutputStream.flush();
byteArrayOutputStream.close();
byte[] archivedBytes = byteArrayOutputStream.toByteArray();
byte[] actualBytes = <get actual bytes>
Assert.assertArrayEquals(actualBytes, archivedBytes);
The final assert fails with a difference at byte X = (n * 512) + 1, where n is the greatest natural number such that n * 512 <= l and l is the length of the data. That is, I get the biggest possible multiple of 512 bytes of data correctly, but debugging the test I can see that all the remaining bytes are zero. So, if the total amount of data is 1000 bytes, the first 512 bytes in archivedBytes are correct, but the last 488 are all zero / unset; and if the total data is 262272 bytes, I get the first 262144 (512 * 512) bytes correctly, but the remaining bytes are all zero again.
Also, the tarInputStream.getRecordSize() call above prints Record size: 512, so I presume this is somehow related. However, since the archive works if I download it, I guess the data must be there and there's just something I'm missing.
Stepping into tarInputStream.copyEntryContents(byteArrayOutputStream) with the 1000-byte data, at
int numRead = read(buf, 0, buf.length);
numRead is 1000, but looking at the buffer, only the first 512 bytes are non-zero. Maybe I shouldn't be using that method to get the data out of the TarInputStream?
If anyone knows how it's supposed to work, I'd be very grateful for any advice or help.
You can specify the output block size to be used when you create a tar archive. Thus the size of the archive will be a multiple of the block size. As the archive size doesn't normally fit in a whole number of blocks, zeros are added to the last block of data to make it of the right size.
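If you are creating the archive with the Ant tar classes, the block size is an optional constructor argument; here is a minimal sketch of what I mean (the 10240-byte block size is just the common default, so treat the numbers and file name as assumptions):
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;
import org.apache.tools.tar.TarOutputStream;

// Sketch: the archive is padded with zeros up to a multiple of the
// block size given here; the records inside each block are 512 bytes.
TarOutputStream tarOut = new TarOutputStream(
        new GZIPOutputStream(new FileOutputStream("archive.tar.gz")),
        10240);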
It turned out that I was wrong in my original question, and the error was in the resource code: I wasn't closing the entry on the TarOutputStream when writing to it. I guess this wasn't causing any problems when requesting the archive manually from the server, maybe because the entry was closed with the connection or something, but it behaved differently when requested from a unit test... though I must admit that doesn't make a whole lot of sense to me :P
Looking at the fragment of my writing code below, I was missing line 3.
1: tarOutputStream.putNextEntry(tarEntry);
2: tarOutputStream.write(fileRawBytes);
3: tarOutputStream.closeEntry();
4: tarOutputStream.close();
I didn't even know there was such a thing as a "closeEntry" on the TarOutputStream... I do now! :P
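For completeness, here's a minimal sketch of the corrected write path with closeEntry in place (the entry name, placeholder payload and use of GZIPOutputStream are illustrative assumptions, not my exact resource code):
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;
import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarOutputStream;

byte[] fileRawBytes = new byte[1000]; // placeholder payload
TarOutputStream tarOutputStream = new TarOutputStream(
        new GZIPOutputStream(new FileOutputStream("archive.tar.gz")));
TarEntry tarEntry = new TarEntry("file.bin");  // placeholder entry name
tarEntry.setSize(fileRawBytes.length);         // size must be set before putNextEntry
tarOutputStream.putNextEntry(tarEntry);
tarOutputStream.write(fileRawBytes);
tarOutputStream.closeEntry();                  // pads the final 512-byte record
tarOutputStream.close();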
Related
I know similar questions exist, but I haven't found any of them suitable for my problem. I have an Android device (API 15 - Android 4.0.4) and a Linux machine running Arch Linux. The idea is to create a connection between the server (a C program) and the client (an Android app) so I can exchange files. The server supports parallel connections and requires authentication. The Android app has to create 3 connections to the server (using 3 different ports, which means 3 AsyncTasks running in parallel). Two of them are for parallel background processes, and one is for the actual file transfer. I have code which worked well on the emulator (Android KitKat), but it doesn't work when testing on my own phone. I will post my code, and I would like some advice from you, if that is possible. Thanks.
This is the code running on the Android device. BUFFSIZE is 1024 and it is a global variable.
I've tried setting it to many values and none of them worked for me. filesize is set earlier in the code and always has the correct value, so don't worry about it :)
InputStream is = socket.getInputStream();
FileOutputStream fs = new FileOutputStream(target);
int u;
byte[] jj = new byte[BUFFSIZE];
long overall = 0, percent = 0;
try {
    while (overall < filesize && mRun) {
        u = is.read(jj, 0, BUFFSIZE);
        if (u == -1) break;
        fs.write(jj, 0, u);
        overall += u;
        percent = overall * 100 / filesize;
    }
    fs.flush();
    fs.close();
    is.close();
    socket.close();
} catch (IOException ex) {
    // There were no exceptions while testing
    // There is some code here that deals with the UI
    // which is not important
}
And this is the C code...
for (;;)
{
/* First read the file in chunks of BUFFER bytes */
unsigned char buffer[BUFFER] = {0};
int nread = fread(buffer, 1, BUFFER, input);
printf("Bytes read %d \n", nread);
/* If read was success, send data. */
if(nread > 0)
{
printf("Sending \n");
write(sockfd, buffer, nread);
}
/*
* There is something tricky going on with read ..
* Either there was error, or we reached end of file.
*/
if (nread < BUFFER)
{
if (feof(input))
printf("End of file\n");
if (ferror(input))
printf("Error reading\n");
break;
}
}
I have tested this code many times, even using telnet and it worked quite well. But I am not sure about the Java code.
So, why doesn't it work? Well, what I know so far is that some files are damaged. Say I transfer an mp3 file with a size of 4 MB: 3.99 MB is received and the remaining 0.01 MB is lost somewhere in the middle of the file, for no apparent reason. When you play the damaged mp3, you can hear that some parts (roughly every 10 seconds) go "off-beat", as if there were a small glitch that is then skipped. The resulting file is around 10,000 bytes shorter than the original (but that depends on the actual file size; you always lose some small percentage of the file), which means that the while loop never finishes - the download never completes, because the sockets are blocking and the client ends up waiting for bytes that are never received.
What I believe happens is that sometimes only around 1000 bytes of the 1024-byte buffer are used instead of the full buffer size, which leads to the loss of 24 bytes. I am not saying those are the actual numbers; that is just my guess, and I am likely wrong about it. I couldn't share the whole code with you because it's really long, so I decided to post only the functions that deal with the download process.
It is totally fine for the read method not to fill the whole buffer. That method returns the number of bytes read. You even assign that value to a variable:
u = is.read(jj, 0, BUFFSIZE);
But then you only check if u is -1 in order to find out when to stop reading. From the documentation:
The number of bytes actually read is returned as an integer.
Which means that your byte array has a maximum length of 1024 bytes, but not all of those bytes will be filled on each read. And of course they won't always be; otherwise this would only work if your input stream contained an exact multiple of 1024 bytes.
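A minimal copy loop that trusts only the returned count looks like this (the stream names are placeholders):
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Write exactly as many bytes as each read() call actually returned.
static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[1024];
    int count;
    while ((count = in.read(buffer)) != -1) {
        out.write(buffer, 0, count);
    }
    out.flush();
}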
Also: a thing called debugging exists. It might be hard to inspect binary data such as mp3 files, so try debugging while transmitting a text file.
So I have created my own personal HTTP Server in Java from scratch.
So far it is working fine but with one major flaw.
When I try to pass big files to the browser I get a Java Heap Space error. I know how to work around this error by changing the JVM settings, but I am looking for a long-term solution.
//declare an integer for the byte length of the file
int length = (int) f.length();
//open the file input stream.
FileInputStream fis = new FileInputStream(f);
//byte array with the length of the file
byte[] bytes = new byte[length];
//write the file out until the stream is exhausted.
while ((length = fis.read(bytes)) != -1 ){
write(bytes, 0, length);
}
flush();
//close the file input stream
fis.close();
This way sends the file to the browser successfully and streams it perfectly, but because I am creating a byte array with the length of the whole file, I get the Heap Space error when the file is very big.
I have eliminated this issue by using a buffer as shown below, and I don't get Heap Space errors anymore. BUT the way shown below does not stream the files in the browser correctly. It's as if the file's bytes are being shuffled and sent to the browser all at once.
final int bufferSize = 4096;
byte buffer[] = new byte[bufferSize];
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
while ( true )
{
int length = bis.read( buffer, 0, bufferSize );
if ( length < 0 ) break;
write( buffer, 0, length );
}
flush();
bis.close();
fis.close();
Note 1:
All the correct Response Headers are being sent perfectly to the browser.
Note 2:
Both ways work perfectly in a desktop browser, but only the first way works in a smartphone's browser (and that one sometimes gives me the Heap Space error).
If someone knows how to correctly send files to a browser and stream them correctly I would be a very very happy man.
Thank you in advance! :)
When reading from a BufferedInputStream you can let its buffer handle the buffering; there is no reason to read everything into a byte[] (and certainly not a byte[] the size of the entire File). Read one byte at a time, and rely on the internal buffer of the stream. Something like,
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
int abyte;
// read() returns the next byte (or -1 at end of stream); the
// BufferedInputStream refills its internal buffer from the file as needed
while ((abyte = bis.read()) != -1) {
    write(abyte);
}
Emm... As far as I can see, you are already trying to use chunks in your code anyway.
As far as I remember, even the Apache HttpClient + FileUpload solution has a file size limit of about 2.1 GB or so (correct me if I am wrong), so this is a bit tricky...
I haven't tried the solution yet, but as a test you can use java.io.RandomAccessFile in combination with File(Input/Output)Stream on the client and server, so that instead of reading and writing the whole file at once you handle a sequence of, let's say, <= 30 MB blocks, to avoid the annoying out-of-memory errors; a rough sketch follows below. An example of using RandomAccessFile can be found here: https://examples.javacodegeeks.com/core-java/io/randomaccessfile/java-randomaccessfile-example/
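A rough sketch of what I mean (the 30 MB block size and the method name are just assumptions):
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: process a file in fixed-size blocks instead of loading it all at once.
static void processInBlocks(String path) throws IOException {
    final int BLOCK_SIZE = 30 * 1024 * 1024; // ~30 MB per block (assumption)
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
        byte[] block = new byte[BLOCK_SIZE];
        long position = 0;
        long length = raf.length();
        while (position < length) {
            raf.seek(position);
            int read = raf.read(block, 0, (int) Math.min(BLOCK_SIZE, length - position));
            if (read <= 0) break;
            // ... send or process 'read' bytes of 'block' here ...
            position += read;
        }
    }
}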
But still, you give too few details :( I mean, is your client supposed to be a plain Java application or not?
If you have some additional information please let me know
Good luck :)
I'm trying to transfer files with a DatagramSocket in Java. I'm reading the files in 4096-byte pieces. We are using ACKs, so all pieces arrive in the right order; we tried pdf, exe, jpg and a lot more successfully, but iso, zip and 7z are not working, even though they have exactly the same size afterwards. Do you have any idea?
Reading the Parts:
byte[] b = new byte[FileTransferClient.PACKAGE_SIZE - 32];
FileInputStream read = new FileInputStream(file);
read.skip((part - 1) * (FileTransferClient.PACKAGE_SIZE - 32));
read.read(b);
content = b;
Writing the Parts:
stream = new FileOutputStream(new File(this.filePath));
stream.write(output);
...
stream.write(output);
stream.close();
(Sorry for the grammar, I'm German)
Your write() method calls are assuming that the entire buffer was filled by receive(). You must use the length provided with the DatagramPacket:
datagramSocket.receive(packet);
stream.write(packet.getData(), packet.getOffset(), packet.getLength());
If there is overhead in the packet, e.g. a sequence number, which there should be, you will need to adjust the offset and length accordingly.
NB TCP will ensure 'everything gets transferred and is not damaged'.
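Put together, the receive side could look roughly like this (the buffer size, the termination condition and the absence of a sequence-number header are assumptions):
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Sketch: write only the bytes actually received in each datagram.
static void receiveFile(DatagramSocket socket, String target, int expectedPackets) throws IOException {
    byte[] buf = new byte[4096];
    DatagramPacket packet = new DatagramPacket(buf, buf.length);
    try (FileOutputStream stream = new FileOutputStream(target)) {
        for (int i = 0; i < expectedPackets; i++) {   // your own termination condition goes here
            packet.setLength(buf.length);             // reset, since receive() shrinks the length field
            socket.receive(packet);
            // If the packet carries e.g. a 4-byte sequence number, skip it instead:
            // stream.write(packet.getData(), packet.getOffset() + 4, packet.getLength() - 4);
            stream.write(packet.getData(), packet.getOffset(), packet.getLength());
        }
    }
}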
I have used the code in this stackoverflow discussion in order to calculate the checksum of a file in java.
I am a little confused about how this works. I am applying it to my problem as follows:
I have a file with some data. I have calculated the size of the text in the file using
System.out.println(file1content.toString().getBytes().length); and the output is 4096 bytes.
When I try to execute the checksum code I see that the number of bytes being read is 4096 + 12 bytes. Are these 12 bytes the filename?
I have another file, file2, with the same content as file1 (I know this for sure because I extract the text to a String and compare it with String.equals), but the checksum generated is different. I am wondering why this is happening.
Am I missing something here?
Edit 1:
I am reading data from the file using the following loop :
InputStream fis = new FileInputStream(filename);
byte[] buffer = new byte[1024];
do {
    numRead = fis.read(buffer);
    System.out.println(" " + numRead);
    if (numRead > 0) {
        complete.update(buffer, 0, numRead);
    }
} while (numRead != -1);
fis.close();
The output of numRead is:
1024
1024
1024
1024
12
-1
Regards,
Bhavya
Well, I found out what the bug was; I am not sure whether I introduced it or it was already there.
I realised that the data being read from the file was not correct: some portions of the file were read multiple times. So I modified the code so that I obtain data from the file by specifying the start and end positions; a rough sketch of that idea is below.
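Something like this is what I mean by reading an explicit range; note this is only an illustration with assumed names (RandomAccessFile and an MD5 MessageDigest), not my exact code:
import java.io.RandomAccessFile;
import java.security.MessageDigest;

// Illustrative only: digest the bytes between 'start' (inclusive) and 'end' (exclusive).
static byte[] checksumOfRange(String filename, long start, long end) throws Exception {
    MessageDigest complete = MessageDigest.getInstance("MD5"); // algorithm is an assumption
    try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
        raf.seek(start);
        byte[] buffer = new byte[1024];
        long remaining = end - start;
        while (remaining > 0) {
            int numRead = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (numRead == -1) break;
            complete.update(buffer, 0, numRead);
            remaining -= numRead;
        }
    }
    return complete.digest();
}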
In case anyone is facing this issue, please let me know and I can post my full solution.
Regards,
I'm trying to figure out why this particular snippet of code isn't working for me. I've got an applet which is supposed to read a .pdf and display it with a pdf-renderer library, but for some reason, when I read in the .pdf files that sit on my server, they end up being corrupt. I've tested this by writing the files back out again.
I've tried viewing the applet in both IE and Firefox, and the corrupt files occur in both. Funny thing is, when I try viewing the applet in Safari (for Windows), the file is actually fine! I understand the JVM might be different, but I am still lost. I've compiled with Java 1.5; the JVMs are 1.6. The snippet which reads the file is below.
public static ByteBuffer getAsByteArray(URL url) throws IOException {
ByteArrayOutputStream tmpOut = new ByteArrayOutputStream();
URLConnection connection = url.openConnection();
int contentLength = connection.getContentLength();
InputStream in = url.openStream();
byte[] buf = new byte[512];
int len;
while (true) {
len = in.read(buf);
if (len == -1) {
break;
}
tmpOut.write(buf, 0, len);
}
tmpOut.close();
ByteBuffer bb = ByteBuffer.wrap(tmpOut.toByteArray(), 0,
tmpOut.size());
//Lines below used to test if file is corrupt
//FileOutputStream fos = new FileOutputStream("C:\\abc.pdf");
//fos.write(tmpOut.toByteArray());
return bb;
}
I must be missing something, and I've been banging my head trying to figure it out. Any help is greatly appreciated. Thanks.
Edit:
To further clarify my situation: the difference between the files before I read them with the snippet and after is that the ones I output after reading are significantly smaller than they originally are. When opening them, they are not recognized as .pdf files. There are no exceptions being thrown that I ignore, and I have tried flushing to no avail.
This snippet works in Safari, meaning the files are read in their entirety, with no difference in size, and can be opened with any .pdf reader. In IE and Firefox, the files always end up corrupted, consistently at the same smaller size.
I monitored the len variable (when reading a 59 KB file), hoping to see how many bytes get read in each loop iteration. In IE and Firefox, at 18 KB, in.read(buf) returns -1 as if the file has ended. Safari does not do this.
I'll keep at it, and I appreciate all the suggestions so far.
Just in case these small changes make a difference, try this:
public static ByteBuffer getAsByteArray(URL url) throws IOException {
URLConnection connection = url.openConnection();
// Since you get a URLConnection, use it to get the InputStream
InputStream in = connection.getInputStream();
// Now that the InputStream is open, get the content length
int contentLength = connection.getContentLength();
// To avoid having to resize the array over and over and over as
// bytes are written to the array, provide an accurate estimate of
// the ultimate size of the byte array
ByteArrayOutputStream tmpOut;
if (contentLength != -1) {
tmpOut = new ByteArrayOutputStream(contentLength);
} else {
tmpOut = new ByteArrayOutputStream(16384); // Pick some appropriate size
}
byte[] buf = new byte[512];
while (true) {
int len = in.read(buf);
if (len == -1) {
break;
}
tmpOut.write(buf, 0, len);
}
in.close();
tmpOut.close(); // No effect, but good to do anyway to keep the metaphor alive
byte[] array = tmpOut.toByteArray();
//Lines below used to test if file is corrupt
//FileOutputStream fos = new FileOutputStream("C:\\abc.pdf");
//fos.write(array);
//fos.close();
return ByteBuffer.wrap(array);
}
You forgot to close fos, which may result in that file being shorter if your application is still running or is abruptly terminated. Also, I added creating the ByteArrayOutputStream with an appropriate initial size. (Otherwise Java will have to repeatedly allocate a new array and copy, allocate a new array and copy, which is expensive.) Replace the value 16384 with a more appropriate value; 16 KB is probably small for a PDF, but I don't know what "average" size you expect to download.
Since you use toByteArray() twice (even though one use is in diagnostic code), I assigned that to a variable. Finally, although it shouldn't make any difference, when you are wrapping the entire array in a ByteBuffer you only need to supply the byte array itself; supplying the offset 0 and the length is redundant.
Note that if you are downloading large PDF files this way, then ensure that your JVM is running with a large enough heap that you have enough room for several times the largest file size you expect to read. The method you're using keeps the whole file in memory, which is OK as long as you can afford that memory. :)
I thought I had the same problem as you, but it turned out my problem was that I assumed the read always fills the full buffer until there is nothing left. But you do not make that assumption.
The examples on the net (e.g. java2s/tutorial) use a BufferedInputStream. But that does not make any difference for me.
You could check whether you actually get the full file in your loop. Then the problem would be in the ByteArrayOutputStream.
Have you tried a flush() before you close the tmpOut stream to ensure all bytes written out?