I'm trying to write an integration test to see if a file is downloaded correctly from a url.
I'm not sure how to test this, because I expect to get the file as a byte[] but I'm not really sure what to compare it to.
I thought about downloading the file manually, converting it to bytes, pasting the result into the code as the expected value, and then comparing it to the result I get.
If you have a better idea I would be glad to hear it.
Thanks:)
Comparing the images' hash value will be helpful.
Compute the hash value before and after downloading the file.
Compare the hash values. If they are equal, your file's integrity is good.
You can use hash algorithms like MD5 or SHA-1. If the files are small, MD5 is fine. When comparing a large number of files, SHA-1 is more useful, since collisions are less likely.
Since you are using and
expect to get the file in byte[]
There is an input stream decorator, java.security.DigestInputStream, that computes the digest while you read from the input stream. You can also feed the bytes to java.security.MessageDigest yourself, as in this example:
import java.io.*;
import java.security.MessageDigest;

public class MD5Checksum {
    public static byte[] createChecksum(String filename) throws Exception {
        MessageDigest complete = MessageDigest.getInstance("MD5");
        // try-with-resources closes the stream even if reading fails
        try (InputStream fis = new FileInputStream(filename)) {
            byte[] buffer = new byte[1024];
            int numRead;
            while ((numRead = fis.read(buffer)) != -1) {
                complete.update(buffer, 0, numRead);
            }
        }
        return complete.digest();
    }

    public static String getMD5Checksum(String filename) throws Exception {
        byte[] b = createChecksum(filename);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            // Format each byte as exactly two hex digits
            result.append(Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(getMD5Checksum("apache-tomcat-5.5.17.exe"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here you can find other good code snippets as well.
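For the byte[] case in the question, the comparison could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical, and it uses SHA-256 instead of MD5 or SHA-1 (either can be substituted by changing the algorithm name). The expected hex digest would be computed once from a known-good copy of the file and stored in the test.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the names are illustrative and not from the original answer.
final class DownloadDigest {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // True when the downloaded bytes hash to the expected, precomputed digest.
    static boolean matches(byte[] downloaded, String expectedHex) {
        return sha256Hex(downloaded).equalsIgnoreCase(expectedHex);
    }
}
```

In an integration test, the assertion would then be something like assertTrue(DownloadDigest.matches(downloadedBytes, EXPECTED_HEX)), where both names are placeholders for your own values.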
I am new to Java IO and I tried to simply copy a photo. I tried two ways to do this; the first works nicely but the second doesn't.
This code works fine.
try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    int n = 0;
    byte[] buf = new byte[4092];
    while ((n = input.read(buf)) != -1) {
        output.write(buf, 0, n);
        output.flush();
    }
} catch (IOException e) {
    System.out.println("Error: " + e.getMessage());
    e.printStackTrace();
}
But the second doesn't work. After the program finishes, the copy has exactly the same size as the original, but when I try to open it I get a "format not supported" error.
try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    int byteRead = input.read();
    while (byteRead != -1) {
        byteRead = input.read();
        output.write(byteRead);
        output.flush();
    }
} catch (IOException e) {
    System.out.println("Error: " + e.getMessage());
    e.printStackTrace();
}
I don't understand where the problem is; it seems that the two samples are doing the same thing.
Is reading into and writing from a byte array different from reading and writing a single byte at a time?
Doesn't writing an int to a stream with the write(int b) method write only the lowest 8 bits (and vice versa for read()), as the documentation says?
write
public abstract void write(int b)
throws IOException
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
hope someone will help.
You're not writing out the first byte - you call input.read(), check that it's not -1, but then call input.read() again:
// Broken code
int byteRead = input.read();
while (byteRead != -1) {
    byteRead = input.read();
    output.write(byteRead);
    output.flush();
}
If you just move the next input.read() call to the end of the loop, it will work:
// Working code with duplication
int byteRead = input.read();
while (byteRead != -1) {
    output.write(byteRead);
    output.flush();
    byteRead = input.read();
}
Or you could combine the "read and test" to avoid duplication:
// Working code without duplication
int byteRead;
while ((byteRead = input.read()) != -1) {
    output.write(byteRead);
    output.flush();
}
However, this is still a very inefficient way of copying a stream. Copying a chunk at a time, as per your first code, is much more efficient (or using the built-in transferTo method if you're using Java 9 or higher, as rostamn79 notes).
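As a side note on why the broken copy ends up with exactly the same size as the original: the loop drops the first byte, but at end of stream it also writes the -1 returned by read(), and write(int) keeps only the low 8 bits, so a trailing 0xFF is appended. A minimal in-memory sketch of this (the class and method names are hypothetical, and byte arrays stand in for the photo files):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; it reproduces the broken loop from the question.
final class BrokenCopyDemo {

    static byte[] brokenCopy(byte[] source) {
        try (InputStream input = new ByteArrayInputStream(source);
             ByteArrayOutputStream output = new ByteArrayOutputStream()) {
            int byteRead = input.read();   // the first byte is read here...
            while (byteRead != -1) {
                byteRead = input.read();   // ...and overwritten before it is written
                output.write(byteRead);    // at end of stream this writes -1, stored as 0xFF
            }
            return output.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For an input of {1, 2, 3} this produces {2, 3, (byte) 0xFF}: same length, shifted content, which is why the image viewer rejects the file.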
Baeldung.com describes the InputStream.transferTo() method (Java 9+), which copies a stream without you having to manage a buffer in your own code.
https://www.baeldung.com/java-inputstream-to-outputstream
Example code
@Test
public void givenUsingJavaNine_whenCopyingInputStreamToOutputStream_thenCorrect() throws IOException {
    String initialString = "Hello World!";
    try (InputStream inputStream = new ByteArrayInputStream(initialString.getBytes());
         ByteArrayOutputStream targetStream = new ByteArrayOutputStream()) {
        inputStream.transferTo(targetStream);
        assertEquals(initialString, new String(targetStream.toByteArray()));
    }
}
Note that transferTo is called on the input stream, with the output stream as its only argument.
I wanted to use Base64.java to encode and decode files. Encode.wrap(InputStream) and decode.wrap(InputStream) worked but ran slowly, so I used the following code.
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException {
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE]; //final int BUFF_SIZE = 1024;
byte[] outBuff = null;
while (in.read(inBuff) > 0) {
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
out.flush();
out.close();
in.close();
}
However, it always throws
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit
at java.util.Base64$Decoder.decode0(Base64.java:704)
at java.util.Base64$Decoder.decode(Base64.java:526)
at Base64Coder.JavaBase64FileCoder.decodeFile(JavaBase64FileCoder.java:69)
...
After I changed final int BUFF_SIZE = 1024; into final int BUFF_SIZE = 3*1024;, the code worked. Since BUFF_SIZE is also used to encode the file, I believe there was something wrong with the encoded file (1024 % 3 = 1, which means padding is added in the middle of the file).
Also, as @Jon Skeet and @Tagir Valeev mentioned, I should not ignore the return value from InputStream.read(). So I modified the code as below.
(However, I have to mention that the code does run much faster than using wrap(). I noticed the speed difference because I had coded and intensively used Base64.encodeFile()/decodeFile() long before JDK 8 was released. Now my buffered JDK 8 code runs as fast as my original code. So I do not know what is going on with wrap()...)
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException
{
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE];
byte[] outBuff = null;
int bytesRead = 0;
while (true)
{
bytesRead = in.read(inBuff);
if (bytesRead == BUFF_SIZE)
{
outBuff = decoder.decode(inBuff);
}
else if (bytesRead > 0)
{
byte[] tempBuff = new byte[bytesRead];
System.arraycopy(inBuff, 0, tempBuff, 0, bytesRead);
outBuff = decoder.decode(tempBuff);
}
else
{
out.flush();
out.close();
in.close();
return;
}
out.write(outBuff);
}
}
Special thanks to @Jon Skeet and @Tagir Valeev.
I strongly suspect that the problem is that you're ignoring the return value from InputStream.read, other than to check for the end of the stream. So this:
while (in.read(inBuff) > 0) {
    // This always decodes the *complete* buffer
    outBuff = decoder.decode(inBuff);
    out.write(outBuff);
}
should be
int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
    // Base64.Decoder has no (array, offset, length) overload,
    // so copy only the bytes actually read (java.util.Arrays)
    outBuff = decoder.decode(Arrays.copyOf(inBuff, bytesRead));
    out.write(outBuff);
}
I wouldn't expect this to be any faster than using wrap though.
Try to use decoder.wrap(new BufferedInputStream(new FileInputStream(inputFileName))). With buffering it should be at least as fast as your manually crafted version.
As for why your code doesn't work: the last chunk is likely to be shorter than 1024 bytes, but you try to decode the whole byte[] array. See @JonSkeet's answer for details.
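A minimal sketch of the buffered wrap() approach, assuming Java 8+ and java.nio.file paths instead of file name strings (the class name is hypothetical). The wrapped stream does the decoding, so the copy loop only has to honour the return value of read():

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper; not the asker's original class.
final class Base64FileDecoder {

    static void decodeFile(Path encoded, Path decoded) throws IOException {
        try (InputStream in = Base64.getDecoder().wrap(
                     new BufferedInputStream(Files.newInputStream(encoded)));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(decoded))) {
            byte[] buffer = new byte[8192];
            int count;
            // Write only as many bytes as were actually read
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        }
    }
}
```

Because the decoder sees one continuous stream, the buffer size here has no alignment requirement.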
Well, I changed
"final int BUFF_SIZE = 1024;"
into
"final int BUFF_SIZE = 1024 * 3;"
It worked!
So I guess there is probably something wrong with the padding. I mean, when encoding the file (since 1024 % 3 = 1), padding must be added to every chunk, and that might cause problems when decoding.
You should record the number of bytes you have read. Besides this, you should make sure that your buffer size is divisible by 3 when encoding, because in Base64 every 3 input bytes produce 4 output characters (64 is 2^6, and 3*8 equals 4*6). That way no chunk except the last needs padding, so "=" never appears in the middle of the output. Correspondingly, when decoding, each chunk should be a multiple of 4 characters.
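The alignment rule can be checked in memory with a small sketch. The class and method names below are hypothetical; the point is only that encoding in chunks of a multiple of 3 bytes gives the same text as encoding everything at once, and that the result can then be decoded in chunks of a multiple of 4 characters:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo of chunk alignment; not part of the original answers.
final class ChunkedBase64 {

    // Encodes data chunk by chunk and concatenates the encoded pieces.
    static String encodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(data.length, off + chunkSize);
            sb.append(Base64.getEncoder().encodeToString(Arrays.copyOfRange(data, off, end)));
        }
        return sb.toString();
    }

    // Decodes the text chunk by chunk; chunkChars should be a multiple of 4.
    static byte[] decodeInChunks(String encoded, int chunkChars) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < encoded.length(); off += chunkChars) {
            int end = Math.min(encoded.length(), off + chunkChars);
            byte[] part = Base64.getDecoder().decode(encoded.substring(off, end));
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}
```

With a chunk size of 4 bytes instead of 3, every chunk is padded, so "=" characters end up in the middle of the concatenated text, which is the situation described in the question.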
I have an application where I am generating a "target file" based on a Java "source" class. I want to regenerate the target when the source changes. I have decided the best way to do this would be to get a byte[] of the class contents and calculate a checksum on the byte[].
I am looking for the best way to get the byte[] for a class. This byte[] would be equivalent to the contents of the compiled .class file. Using ObjectOutputStream does not work. The code below generates a byte[] that is much smaller than the byte contents of the class file.
// Incorrect function to calculate the byte[] contents of a Java class
public static final byte[] getClassContents(Class<?> myClass) throws IOException {
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
try( ObjectOutputStream stream = new ObjectOutputStream(buffer) ) {
stream.writeObject(myClass);
}
// This byte array is much smaller than the contents of the *.class file!!!
byte[] contents = buffer.toByteArray();
return contents;
}
Is there a way to get the byte[] with the identical contents of the *.class file? Calculating the checksum is the easy part, the hard part is obtaining the byte[] contents used to calculate an MD5 or CRC32 checksum.
This is the solution that I ended up using. I don't know if it's the most efficient implementation, but the following code uses the class loader to get the location of the *.class file and reads its contents. For simplicity, I skipped buffering of the read.
// Function to obtain the byte[] contents of a Java class
public static final byte[] getClassContents(Class<?> myClass) throws IOException {
String path = myClass.getName().replace('.', '/');
String fileName = new StringBuffer(path).append(".class").toString();
URL url = myClass.getClassLoader().getResource(fileName);
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
try (InputStream stream = url.openConnection().getInputStream()) {
int datum = stream.read();
while( datum != -1) {
buffer.write(datum);
datum = stream.read();
}
}
return buffer.toByteArray();
}
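A shorter variant of the same idea, sketched under a few assumptions: Java 9+ (for InputStream.readAllBytes), Class.getResourceAsStream instead of going through the class loader (which also avoids a null class loader for JDK classes), and CRC32 from java.util.zip for the checksum. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper; a sketch rather than the poster's implementation.
final class ClassChecksum {

    // Reads the compiled .class file of the given class as a resource.
    static byte[] classBytes(Class<?> type) throws IOException {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("No class file found for " + type.getName());
            }
            return in.readAllBytes(); // Java 9+
        }
    }

    // CRC32 checksum of the given bytes.
    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```

The target file would then be regenerated whenever crc32(classBytes(SourceClass.class)) differs from the value stored at the previous generation, with SourceClass standing in for your own class.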
I don't quite get what you mean, but I think you are looking for MD5.
To check the MD5 of a file, you can use this code:
public String getMd5(File file)
{
    MessageDigest md5;
    try (DigestInputStream stream = new DigestInputStream(new FileInputStream(file), MessageDigest.getInstance("MD5")))
    {
        byte[] buffer = new byte[65536];
        // Reading through the stream updates the digest as a side effect
        while (stream.read(buffer) != -1) {
            // nothing to do with the data itself
        }
        md5 = stream.getMessageDigest();
    }
    catch (Exception ignored)
    {
        return null;
    }
    return String.format("%032x", new BigInteger(1, md5.digest()));
}
Then you can store the MD5 of each file in any way you like, for example in XML. An example of an MD5 value is 49e6d7e2967d1a471341335c49f46c6c. Once the file's contents change, its MD5 will change. So store the MD5 of each file in the XML file, and the next time you run your code, compute the MD5 of each file again and compare it with the stored value.
If you really want the contents of the .class file, you should read the contents of .class file, not the byte[] representation that is in memory. So something like
import java.io.*;

public class ReadSelf {
    public static void main(String[] args) throws Exception {
        Class<?> classInstance = ReadSelf.class;
        byte[] bytes = readClass(classInstance);
    }

    public static byte[] readClass(Class<?> classInstance) throws Exception {
        String name = classInstance.getName();
        // The path is relative to the working directory, so run this from the classes root
        name = name.replaceAll("[.]", "/") + ".class";
        System.out.println("Reading this: " + name);
        File file = new File(name);
        System.out.println("exists: " + file.exists());
        return read(file);
    }

    public static byte[] read(File file) throws Exception {
        byte[] data = new byte[(int) file.length()]; // can only read a file of up to Integer.MAX_VALUE bytes
        try (DataInputStream inputStream =
                new DataInputStream(
                    new BufferedInputStream(
                        new FileInputStream(file)))) {
            // readFully keeps reading until the array is full; a plain read(data)
            // in a loop would restart at index 0 after every partial read
            inputStream.readFully(data);
        }
        System.out.println("Read " + data.length
                + " bytes, which should match file length of "
                + file.length() + " bytes");
        return data;
    }
}
I have a binary file that I need to read and save as characters or a string of 0's and 1's in the same order that they are in the binary file. I am currently able to read in the binary file, but am unable to obtain the 0's and 1's. Here is the code I am currently using:
public void read()
{
try
{
byte[] buffer = new byte[(int)infile.length()];
FileInputStream inputStream = new FileInputStream(infile);
int total = 0;
int nRead = 0;
while((nRead = inputStream.read(buffer)) != -1)
{
System.out.println(new String(buffer));
total += nRead;
}
inputStream.close();
System.out.println(total);
}
catch(FileNotFoundException ex)
{
System.out.println("File not found.");
}
catch(IOException ex)
{
System.out.println(ex);
}
}
and the output from running this with the binary file:
�, �¨Ã �¨ÊÃ
�!Cˇ¯åaÃ!Dˇ¸åÇÃ�"( ≠EÃ!J�H���û�������
����������������������������������������������������������������������������������������
156
Thanks for any help you can give.
Check out String to binary output in Java. Basically you need to take your String, convert it to a byte array, and print out each byte as a binary string.
Instead of converting the bytes directly into characters and then printing them, convert each byte into a binary string and print them out. In other words, replace
System.out.println(new String(buffer));
with
for (int i = 0; i<nRead; i++) {
String bin=Integer.toBinaryString(0xFF & buffer[i] | 0x100).substring(1);
System.out.println(bin);
}
Notice, though, that the bits of each byte are printed most significant bit first. A file stores bytes, not individual bits, so there is no way to know in what order the bits are physically stored on disk.
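If you want the whole file as one string of 0's and 1's rather than one line per byte, the same idea can be wrapped in a small helper. This is a sketch with hypothetical names; it takes the bytes already read from the file and emits each byte most significant bit first:

```java
// Hypothetical helper; converts bytes to a string of '0' and '1' characters.
final class BitString {

    static String toBits(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 8);
        for (byte b : data) {
            // Walk from bit 7 (most significant) down to bit 0
            for (int bit = 7; bit >= 0; bit--) {
                sb.append((b >> bit) & 1);
            }
        }
        return sb.toString();
    }
}
```

For example, toBits(new byte[] {0x05, (byte) 0xFF}) gives "0000010111111111".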
With JBBP such an operation is very easy:
public static final void main(final String ... args) throws Exception {
    try (InputStream inStream = ClassLoader.getSystemClassLoader().getResourceAsStream("somefile.txt")) {
        class Bits { @Bin(type = BinType.BIT_ARRAY) byte [] bits; }
        for (final byte b : JBBPParser.prepare("bit [_] bits;", JBBPBitOrder.MSB0).parse(inStream).mapTo(Bits.class).bits)
            System.out.print(b != 0 ? "1" : "0");
    }
}
But it will not work with huge files, because the parsed data is cached in memory during the operation.
Even though this response is in C++, you can use JNI to call it natively from a Java program.
Since the data is in a binary format, you will not be able to read it as text. I would do it like this.
fstream fs;
int value; // Since you are reading bytes, change the type accordingly.
fs.open(fileName, ios::in | ios::binary);
fs.read((char *) &value, sizeof(value));
while (!fs.eof())
{
    // Print or do something with value
    fs.read((char *) &value, sizeof(value));
}
I'm dealing with the following code that is used to split a large file into a set of smaller files:
FileInputStream input = new FileInputStream(this.fileToSplit);
BufferedInputStream iBuff = new BufferedInputStream(input);
int i = 0;
FileOutputStream output = new FileOutputStream(fileArr[i]);
BufferedOutputStream oBuff = new BufferedOutputStream(output);
int buffSize = 8192;
byte[] buffer = new byte[buffSize];
while (true) {
if (iBuff.available() < buffSize) {
byte[] newBuff = new byte[iBuff.available()];
iBuff.read(newBuff);
oBuff.write(newBuff);
oBuff.flush();
oBuff.close();
break;
}
int r = iBuff.read(buffer);
if (fileArr[i].length() >= this.partSize) {
oBuff.flush();
oBuff.close();
++i;
output = new FileOutputStream(fileArr[i]);
oBuff = new BufferedOutputStream(output);
}
oBuff.write(buffer);
}
} catch (Exception e) {
e.printStackTrace();
}
This is the weird behavior I'm seeing: when I run this code using a 3GB file, the initial iBuff.available() call returns a value of approximately 2,100,000,000 and the code works fine. When I run this code on a 12GB file, the initial iBuff.available() call only returns a value of 200,000,000 (which is smaller than the split file size of 500,000,000 and causes the processing to go awry).
I'm thinking this discrepancy in behavior has something to do with the fact that this is on 32-bit Windows. I'm going to run a couple more tests on a 4.5 GB file and a 3.5 GB file. If the 3.5 GB file works and the 4.5 GB one doesn't, that will further confirm the theory that it's a 32-bit vs 64-bit issue, since 4GB would then be the threshold.
Well if you read the javadoc it quite clearly states:
Returns the number of bytes that can
be read from this input stream
without blocking (emphasis added by me)
So it's quite clear that what you want is not what this method offers. So depending on the underlying InputStream you may get problems much earlier (eg a stream over the network with a server that doesn't return the filesize - you'd have to read the complete file and buffer it just to return the "correct" available() count, which would take a lot of time - what if you only want to read a header?)
So the correct way to handle this is to change your parsing method to be able to handle the file in pieces. Personally I don't see much reason to use available() here at all - just calling read() and stopping as soon as read() returns -1 should work fine. It can be made more complicated if you want to ensure that every file really contains blockSize bytes - just add an inner loop if that scenario is important.
int blockSize = XXX;
byte[] buffer = new byte[blockSize];
int i = 0;
int read = in.read(buffer);
while(read != -1) {
out[i++].write(buffer, 0, read);
read = in.read(buffer);
}
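The inner loop mentioned above could look like the following sketch (the class and method names are hypothetical). It keeps calling read() until the block is full or the stream ends, so every output file except possibly the last receives exactly blockSize bytes:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; fills a whole block even when read() returns short counts.
final class BlockReader {

    // Returns the number of bytes placed in buffer; 0 means end of stream.
    static int readBlock(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream: return what we have so far
            }
            total += n;
        }
        return total;
    }
}
```

On Java 9 and later, InputStream.readNBytes(buffer, 0, buffer.length) does the same job, so the helper is only needed on older versions.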
There are few correct uses of available(), and this isn't one of them. You don't need all that junk. Memorize this:
int count;
byte[] buffer = new byte[8192]; // or more
while ((count = in.read(buffer)) > 0)
out.write(buffer, 0, count);
That's the canonical way to copy a stream in Java.
You should not use the InputStream.available() function at all. It is only needed in very special circumstances.
You should also not create byte arrays that are larger than 1 MB. It's a waste of memory. The commonly accepted way is to read a small block (4 kB up to 1 MB) from the source file and then store only as many bytes as you have read in the destination file. Do that until you have reached the end of the source file.
available() isn't a measure of how much is still to be read, but rather of how much is guaranteed to be readable before the stream might hit EOF or block waiting for input.
Also, put the close calls in finally blocks:
BufferedInputStream iBuff = new BufferedInputStream(input);
int i = 0;
try {
    int buffSize = 8192; // each output file receives one full buffer
    int offset = 0;
    byte[] buffer = new byte[buffSize];
    while (true) {
        int len = iBuff.read(buffer, offset, buffSize - offset);
        if (len == -1) { // EOF: write out the last, partial chunk
            if (offset > 0) {
                FileOutputStream output = new FileOutputStream(fileArr[i]);
                try {
                    output.write(buffer, 0, offset);
                } finally {
                    output.close();
                }
            }
            break;
        }
        offset += len;
        if (offset == buffSize) { // buffer full: write it out to the next file
            FileOutputStream output = new FileOutputStream(fileArr[i]);
            try {
                output.write(buffer);
            } finally {
                output.close();
            }
            ++i;
            offset = 0;
        }
    } // while
} finally {
    iBuff.close();
}
Here is some code that splits a file. If performance is critical to you, you can experiment with the buffer size.
package so6164853;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Formatter;
public class FileSplitter {
private static String printf(String fmt, Object... args) {
Formatter formatter = new Formatter();
formatter.format(fmt, args);
return formatter.out().toString();
}
/**
* @param outputPattern see {@link Formatter}
*/
public static void splitFile(String inputFilename, long fragmentSize, String outputPattern) throws IOException {
InputStream input = new FileInputStream(inputFilename);
try {
byte[] buffer = new byte[65536];
int outputFileNo = 0;
OutputStream output = null;
long writtenToOutput = 0;
try {
while (true) {
int bytesToRead = buffer.length;
if (bytesToRead > fragmentSize - writtenToOutput) {
bytesToRead = (int) (fragmentSize - writtenToOutput);
}
int bytesRead = input.read(buffer, 0, bytesToRead);
if (bytesRead != -1) {
if (output == null) {
String outputName = printf(outputPattern, outputFileNo);
outputFileNo++;
output = new FileOutputStream(outputName);
writtenToOutput = 0;
}
output.write(buffer, 0, bytesRead);
writtenToOutput += bytesRead;
}
if (output != null && (bytesRead == -1 || writtenToOutput == fragmentSize)) {
output.close();
output = null;
}
if (bytesRead == -1) {
break;
}
}
} finally {
if (output != null) {
output.close();
}
}
} finally {
input.close();
}
}
public static void main(String[] args) throws IOException {
splitFile("d:/backup.zip", 1440 << 10, "d:/backup.zip.part%04d");
}
}
Some remarks:
Only those bytes that have actually been read from the input file are written to one of the output files.
I left out the BufferedInputStream and BufferedOutputStream since their buffer's size is only 8192 bytes, which is less than the buffer I use in the code.
As soon as I open a file, I make sure that it will be closed at the end, no matter what happens. (The finally blocks.)
The code contains only one call to input.read and only one call to output.write. This makes it easier to check for correctness.
The code for splitting a file does not catch the IOException, since it doesn't know what to do in such a case. It is just passed to the caller; maybe the caller knows how to handle it.
Both @ratchet and @Voo are correct.
As for what is happening:
The maximum int value is 2,147,483,647 (http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
14 gigabytes is 15,032,385,536 bytes, which clearly doesn't fit in an int.
According to the API Javadoc (http://download.oracle.com/javase/6/docs/api/java/io/BufferedInputStream.html#available%28%29), and as stated by @Voo, this doesn't break the method contract at all (it just isn't what you are looking for).