I'm quite new to Java I/O (I've never used it before) and have written this to download a .mp4 file from www.kissanime.com.
The download is very, very slow at the moment (approximately 70-100 KB/s) and I was wondering how I could speed it up. I don't really understand the byte buffering, so any help with that would be appreciated; that may be my problem, I'm not sure.
Here's my code:
protected static boolean downloadFile(URL source, File dest) {
    try {
        URLConnection urlConn = source.openConnection();
        urlConn.setConnectTimeout(1000);
        urlConn.setReadTimeout(5000);
        InputStream in = urlConn.getInputStream();
        FileOutputStream out = new FileOutputStream(dest);
        BufferedOutputStream bout = new BufferedOutputStream(out);
        int fileSize = urlConn.getContentLength();
        byte[] b = new byte[65536];
        int bytesDownloaded = 0, len;
        while ((len = in.read(b)) != -1 && bytesDownloaded < fileSize) {
            bout.write(b, 0, len);
            bytesDownloaded += len;
            // System.out.println((double) bytesDownloaded / 1000000.0 + "mb/" + (double) fileSize / 1000000.0 + "mb");
        }
        bout.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return true;
}
Thanks. Any further information will be provided upon request.
I can't find any questions on here related to downloading media files, and I'm sorry if this is deemed to be a duplicate.
Try using IOUtils.toByteArray. It takes an InputStream and returns an array with all of its bytes. In my opinion, it's generally a good idea to check the common utility packages, like Apache Commons and Guava, to see whether what you're trying to do hasn't already been done.
If you want to save the file from an InputStream, then use the following method from Apache Commons:
FileUtils.copyInputStreamToFile()
public static void copyInputStreamToFile(InputStream source,
File destination)
throws IOException
Copies bytes from an InputStream source to a file destination. The directories up to destination will be created if they don't already exist. destination will be overwritten if it already exists. The source stream is closed.
Always handle file and I/O related work with a library when one is available. There are also some other utility methods you can explore:
IOUtils
FileUtils
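For example, here is a minimal sketch of the Commons IO approach applied to this question (the method name downloadWithCommonsIO is mine, and it assumes commons-io is on the classpath):
import java.io.File;
import java.io.IOException;
import java.net.URLConnection;
import org.apache.commons.io.FileUtils;

// Sketch: stream the connection body straight into the destination file.
// copyInputStreamToFile creates missing parent directories and closes the stream for you.
static void downloadWithCommonsIO(URLConnection urlConn, File dest) throws IOException {
    FileUtils.copyInputStreamToFile(urlConn.getInputStream(), dest);
}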
Turns out that it was the vast number of redirects from the link that caused the download speed to be throttled. Thanks everyone who answered.
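For anyone hitting the same problem: one option (a hypothetical sketch, not something from this thread) is to resolve the redirect chain once up front and then download from the final URL directly. The helper name resolveRedirects and the hop limit of 10 are my own choices.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Follow redirects manually so the actual download hits the final URL directly.
static URL resolveRedirects(URL url) throws IOException {
    URL current = url;
    for (int i = 0; i < 10; i++) {                      // cap the number of hops
        HttpURLConnection conn = (HttpURLConnection) current.openConnection();
        conn.setInstanceFollowRedirects(false);         // inspect each hop ourselves
        conn.setRequestMethod("HEAD");
        int code = conn.getResponseCode();
        String location = conn.getHeaderField("Location");
        conn.disconnect();
        if (code < 300 || code >= 400 || location == null) {
            break;                                      // reached the final resource
        }
        current = new URL(current, location);           // resolves relative redirects too
    }
    return current;
}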
I have a piece of code which uses the deflate algorithm to compress a file:
public static File compressOld(File rawFile) throws IOException
{
    File compressed = new File(rawFile.getCanonicalPath().split("\\.")[0]
            + "_compressed." + rawFile.getName().split("\\.")[1]);

    InputStream inputStream = new FileInputStream(rawFile);
    OutputStream compressedWriter = new DeflaterOutputStream(new FileOutputStream(compressed));

    byte[] buffer = new byte[1000];
    int length;
    while ((length = inputStream.read(buffer)) > 0)
    {
        compressedWriter.write(buffer, 0, length);
    }

    inputStream.close();
    compressedWriter.close();
    return compressed;
}
However, I'm not happy with the OutputStream copying loop since it's the "outdated" way of writing to streams. Instead, I want to use a Java 7 API method such as Files.copy:
public static File compressNew(File rawFile) throws IOException
{
    File compressed = new File(rawFile.getCanonicalPath().split("\\.")[0]
            + "_compressed." + rawFile.getName().split("\\.")[1]);

    OutputStream compressedWriter = new DeflaterOutputStream(new FileOutputStream(compressed));
    Files.copy(compressed.toPath(), compressedWriter);

    compressedWriter.close();
    return compressed;
}
The latter method, however, does not work correctly: the compressed file is messed up and only a few bytes are copied. How come?
I see mainly two problems.
You copy from the target instead of the source. I think the copying has to be changed to Files.copy(rawFile.toPath(), compressedWriter);.
The Javadoc of copy says: "Note that if the given output stream is Flushable then its flush method may need to invoked after this method completes so as to flush any buffered output." So you have to call the flush method of the OutputStream after copy.
Additionally there is one more point. The Javadoc of copy says:
It is strongly recommended that the output stream be promptly closed if an I/O error occurs.
You can close the OutputStream in a finally block to make sure this happens in case of an error. Another possibility is to use try-with-resources, which was introduced in Java 7.
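Putting both points together, a sketch of how the corrected method could look (keeping the question's naming; the try-with-resources closes the stream even if the copy fails, and DeflaterOutputStream.close() finishes and flushes the deflate stream):
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.zip.DeflaterOutputStream;

public static File compressNew(File rawFile) throws IOException
{
    File compressed = new File(rawFile.getCanonicalPath().split("\\.")[0]
            + "_compressed." + rawFile.getName().split("\\.")[1]);

    // Copy from the *source* file into the deflating stream, not from the target.
    try (OutputStream compressedWriter =
            new DeflaterOutputStream(new FileOutputStream(compressed))) {
        Files.copy(rawFile.toPath(), compressedWriter);
    }
    return compressed;
}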
I have a series of objects stored within a file concatenated as below:
sizeOfFile1 || file1 || sizeOfFile2 || file2 ...
The sizes of the files are serialized long objects, and the files are just the raw bytes of the files.
I am trying to extract the files from the input file. Below is my code:
FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
    long size = (long) objectInputStream.readObject();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    int chunkSize = 256;
    final byte[] temp = new byte[chunkSize];
    int finalChunkSize = (int) (size % chunkSize);
    final byte[] finalTemp = new byte[finalChunkSize];
    while (fileInputStream.available() > 0 && size > 0)
    {
        if (fileInputStream.available() > finalChunkSize)
        {
            int i = fileInputStream.read(temp);
            bufferedOutputStream.write(temp, 0, i);
            size = size - i;
        }
        else
        {
            int i = fileInputStream.read(finalTemp);
            bufferedOutputStream.write(finalTemp, 0, i);
            size = 0;
        }
    }
    bufferedOutputStream.close();
}
fileInputStream.close();
My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.
Can anyone see the issue here?
Regards.
Wrap it in a DataInputStream and use readFully(byte[]).
But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.
NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
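As a sketch of that suggestion: if the lengths are written with DataOutputStream.writeLong (rather than as serialized Long objects, as in the question) and each segment fits in memory, the extraction loop could look like this. The method name splitSegments and the output file naming are mine.
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

static void splitSegments(String inputPath, String outputDir) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(inputPath))) {
        int index = 0;
        while (true) {
            long size;
            try {
                size = in.readLong();          // length prefix
            } catch (EOFException e) {
                break;                         // clean end of input
            }
            byte[] data = new byte[(int) size];
            in.readFully(data);                // blocks until the whole segment is read
            try (FileOutputStream out = new FileOutputStream(outputDir + "/segment" + index++)) {
                out.write(data);
            }
        }
    }
}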
You could try NIO instead...
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
This reads only SIZE bytes from the file.
This uses DataInput to read the longs. In this particular case I am not using readFully(), as a segment might be too long to keep in memory:
DataInputStream in = new DataInputStream(new FileInputStream(...));
byte[] buf = new byte[64 * 1024];
while (true) {
    OutputStream out = ...;
    long size;
    try {
        size = in.readLong();
    } catch (EOFException e) {
        break;
    }
    while (size > 0) {
        int len = (int) Math.min(size, buf.length);
        len = in.read(buf, 0, len);
        out.write(buf, 0, len);
        size -= len;
    }
    out.close();
}
Save yourself a lot of trouble by doing one of these things:
Switch to using Avro; trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
or use Thrift
or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn and a lot of configuration)
If you really refuse to switch to Avro, I recommend reading up on Apache's IOUtils class. It has a method to copy from one input stream to another, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated: you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.
There are also GZIPOutputStream and ZipOutputStream in java.util.zip, which ship with the JDK, so they don't need any extra jars on your classpath.
I'm not going to write an example because I honestly think you should just learn avro or thrift and use that.
This question already has answers here:
Easy way to write contents of a Java InputStream to an OutputStream
FileInputStream in = new FileInputStream(myFile);
ByteArrayOutputStream out = new ByteArrayOutputStream();
Question: How can I read everything from in into out in a way which is not a hand-crafted loop with my own byte buffer?
Java 9 (and later) answer (docs):
in.transferTo(out);
Seems they finally realized that this functionality is so commonly needed that it’d better be built in. The method returns the number of bytes copied in case you need to know.
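For instance, a minimal sketch of reading a whole file into a byte[] this way (the method name is mine; requires Java 9 or later):
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

static byte[] readFileToBytes(File myFile) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (FileInputStream in = new FileInputStream(myFile)) {
        in.transferTo(out);    // returns the number of bytes copied, if you need it
    }
    return out.toByteArray();
}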
Write one method to do this, and call it from everywhere that needs the functionality. Guava already has code for this in ByteStreams.copy. I'm sure just about any other library with "general" IO functionality has it too, but Guava's my first go-to library where possible. It rocks :)
In Apache Commons / IO, you can do it using IOUtils.copy(in, out):
InputStream in = new FileInputStream(myFile);
OutputStream out = new ByteArrayOutputStream();
IOUtils.copy(in, out);
But I agree with Jon Skeet, I'd rather use Guava's ByteStreams.copy(in, out)
So what Guava's ByteStreams.copy(in, out) does:
private static final int BUF_SIZE = 0x1000; // 4K

public static long copy(InputStream from, OutputStream to)
        throws IOException {
    checkNotNull(from);
    checkNotNull(to);
    byte[] buf = new byte[BUF_SIZE];
    long total = 0;
    while (true) {
        int r = from.read(buf);
        if (r == -1) {
            break;
        }
        to.write(buf, 0, r);
        total += r;
    }
    return total;
}
In my project I used this method:
private static void copyData(InputStream in, OutputStream out) throws Exception {
    byte[] buffer = new byte[8 * 1024];
    int len;
    while ((len = in.read(buffer)) > 0) {
        out.write(buffer, 0, len);
    }
}
As an alternative to Guava, one could use Apache Commons IO and its IOUtils class (as advised in the comments).
I'd use the loop instead of importing new classes or adding libraries to my project. The library function is probably also implemented with a loop, but that's just my personal taste.
However, my question to you: what are you trying to do? Think of the "big picture": if you want to put the entire contents of a file into a byte array, why not just do that? The size of the array is file.length(), and you don't need it to grow dynamically, hidden behind a ByteArrayOutputStream (unless your file is shared and its contents can change while you read).
Another alternative: could you use a FileChannel and a ByteBuffer (java.nio)?
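If you do go the FileChannel/ByteBuffer route, a sketch could look like this (readWholeFile is a name I made up; it assumes the file fits in memory and does not change while being read):
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

static byte[] readWholeFile(File file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r");
         FileChannel channel = raf.getChannel()) {
        // Allocate one buffer for the whole file, sized from the channel.
        ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
        while (buffer.hasRemaining() && channel.read(buffer) != -1) {
            // keep reading until the buffer is full or EOF is reached
        }
        return buffer.array();
    }
}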
I have some working code in Python that I need to convert to Java.
I have read quite a few threads on this forum but could not find an answer. I am reading in a JPG image and converting it into a byte array. I then write this buffer to a different file. When I compare the files written by the Java and Python code, the bytes at the end do not match. Please let me know if you have a suggestion. I need to use the byte array to pack the image into a message that will be sent to a remote server.
Java code (Running on Android)
Reading the file:
File queryImg = new File(ImagePath);
int imageLen = (int)queryImg.length();
byte [] imgData = new byte[imageLen];
FileInputStream fis = new FileInputStream(queryImg);
fis.read(imgData);
Writing the file:
FileOutputStream f = new FileOutputStream(new File("/sdcard/output.raw"));
f.write(imgData);
f.flush();
f.close();
Thanks!
InputStream.read is not guaranteed to read any particular number of bytes and may read less than you asked it to. It returns the actual number read so you can have a loop that keeps track of progress:
public void pump(InputStream in, OutputStream out, int size) throws IOException {
    byte[] buffer = new byte[4096]; // Or whatever constant you feel like using
    int done = 0;
    while (done < size) {
        int read = in.read(buffer);
        if (read == -1) {
            throw new IOException("Something went horribly wrong");
        }
        out.write(buffer, 0, read);
        done += read;
    }
    // Maybe put cleanup code in here if you like, e.g. in.close, out.flush, out.close
}
I believe Apache Commons IO has classes for doing this kind of stuff so you don't need to write it yourself.
Your file length might be more than an int can hold, in which case you end up with the wrong array length and therefore don't read the entire file into the buffer.
What is the best way to find out if a java.io.InputStream contains zipped data?
Introduction
Since all the answers are 5 years old, I feel a duty to write down what's going on today. I seriously doubt one should read the magic bytes of the stream! That's low-level code and should be avoided in general.
Simple answer
miku writes:
If the Stream can be read via ZipInputStream, it should be zipped.
Yes, but in the case of ZipInputStream, "can be read" means that the first call to .getNextEntry() returns a non-null value. No exception catching et cetera. So instead of magic-byte parsing you can just do:
boolean isZipped = new ZipInputStream(yourInputStream).getNextEntry() != null;
And that's it!
General unzipping thoughts
In general, it has turned out to be much more convenient to work with files than with streams while [un]zipping. There are several useful libraries, plus ZipFile has more functionality than ZipInputStream. Handling of zip files is discussed here: What is a good Java library to zip/unzip files? So if you can work with files, you'd better do so!
Code sample
In my application I needed to work with streams only, so this is the method I wrote for unzipping:
import org.apache.commons.io.IOUtils;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public boolean unzip(InputStream inputStream, File outputFolder) throws IOException {
    ZipInputStream zis = new ZipInputStream(inputStream);
    ZipEntry entry;
    boolean isEmpty = true;
    while ((entry = zis.getNextEntry()) != null) {
        isEmpty = false;
        File newFile = new File(outputFolder, entry.getName());
        if (!entry.isDirectory()) {
            newFile.getParentFile().mkdirs(); // ensure the parent directories exist
            FileOutputStream fos = new FileOutputStream(newFile);
            IOUtils.copy(zis, fos);
            IOUtils.closeQuietly(fos);
        }
    }
    IOUtils.closeQuietly(zis);
    return !isEmpty;
}
The magic bytes for the ZIP format are 50 4B. You could test the stream (using mark and reset - you may need to buffer) but I wouldn't expect this to be a 100% reliable approach. There would be no way to distinguish it from a US-ASCII encoded text file that began with the letters PK.
The best way would be to provide metadata on the content format prior to opening the stream and then treat it appropriately.
You could check that the first four bytes of the stream are the local file header signature that starts the local file header preceding every file in a ZIP file, shown in the spec here to be 50 4B 03 04.
A little test code shows this to work:
byte[] buffer = new byte[4];

try {
    ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("so.zip"));
    ZipEntry ze = new ZipEntry("HelloWorld.txt");
    zos.putNextEntry(ze);
    zos.write("Hello world".getBytes());
    zos.close();

    FileInputStream is = new FileInputStream("so.zip");
    is.read(buffer);
    is.close();
} catch (IOException e) {
    e.printStackTrace();
}

for (byte b : buffer) {
    System.out.printf("%H ", b);
}
Gave me this output:
50 4B 3 4
Not very elegant, but reliable:
If the Stream can be read via ZipInputStream, it should be zipped.
Checking the magic number may not be the right option.
Docx files also have a similar magic number, 50 4B 3 4. Since both .zip and .xlsx have the same magic number, I couldn't identify a valid zip file from the signature alone (for example, when a file had been renamed).
So I used Apache Tika to find the exact document type. Even if the file is renamed to .zip, it finds the exact type.
Reference: https://www.baeldung.com/apache-tika
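A rough sketch of the Tika approach (the method name is mine, and it assumes the appropriate Tika modules are on the classpath; how precisely OOXML formats are distinguished from plain zip depends on which detectors are available):
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.Tika;

static boolean looksLikeZip(InputStream inputStream) throws IOException {
    // Wrap the stream so the detector can peek at the leading bytes and reset.
    InputStream markable = inputStream.markSupported()
            ? inputStream : new BufferedInputStream(inputStream);
    String mimeType = new Tika().detect(markable);   // e.g. "application/zip"
    return "application/zip".equals(mimeType);
}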
I combined the answers from @McDowell and @Innokenty into a small lib function that you can paste into your project:
public static boolean isZipStream(InputStream inputStream) {
    if (inputStream == null || !inputStream.markSupported()) {
        throw new IllegalArgumentException("InputStream must support mark-reset. Use BufferedInputStream.");
    }
    boolean isZipped = false;
    try {
        inputStream.mark(2048);
        isZipped = new ZipInputStream(inputStream).getNextEntry() != null;
        inputStream.reset();
    } catch (IOException ex) {
        // cannot be opened as zip
    }
    return isZipped;
}
You can use the lib like this:
public static void main(String[] args) {
    InputStream inputStream = new BufferedInputStream(...);
    if (isZipStream(inputStream)) {
        // do zip processing using inputStream
    } else {
        // do non-zip processing using inputStream
    }
}