Situation: I have an ArrayList<String> containing a bunch of links to images (http://www.foo.com/bar/image1.jpg, http://www.foo.com/bar/image2.png, ... etc.)
I have found a working piece of code in order to download them one by one:
public void run() {
    try {
        int counter = 1;
        for (String image : imagesList) {
            controller.setDownloadStatusTextArea("Downloading image " + counter + " of " + imagesList.size());
            URL u = new URL(image);
            URLConnection uc = u.openConnection();
            String contentType = uc.getContentType();
            int contentLength = uc.getContentLength();
            InputStream raw = uc.getInputStream();
            InputStream in = new BufferedInputStream(raw);
            byte[] data = new byte[contentLength];
            int bytesRead;
            int offset = 0;
            while (offset < contentLength) {
                bytesRead = in.read(data, offset, data.length - offset);
                if (bytesRead == -1)
                    break;
                offset += bytesRead;
            }
            in.close();
            if (offset != contentLength) {
                throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes");
            }
            String[] tmp = image.split("/");
            String filename = tmp[tmp.length - 1];
            FileOutputStream out = new FileOutputStream(filename);
            out.write(data);
            out.flush();
            out.close();
            counter++;
        }
        controller.setDownloadStatusTextArea("Download complete");
    } catch (Exception ex) {
        controller.setDownloadStatusTextArea("Download failed");
    }
}
This is the first time I'm doing something like this in Java, and I have a feeling this code could be much more efficient if I moved a bunch of variables outside of the for loop. But I'm not sure which ones can safely be moved without affecting the functionality and/or performance (negatively or positively). Any insight into this situation would be greatly appreciated.
Also: can I specify where the files are downloaded to? Right now they just appear in the project folder; I want the user to be able to choose a download folder.
Thanks in advance.
This code can't be made much more time-efficient.
Think of it this way: even if you polished every last dispensable opcode out of it, the time the JVM takes to execute this portion of code is insignificant. The real delay is in waiting for the data to arrive over the network.
It could be more space-efficient, but I don't think it's necessary.
Edit: what you can do is download multiple images at the same time, using threads. If the code above looks complicated though, I would advise against it: take some more time to learn your way around the language.
You don't need to allocate a byte array for the whole image; you only need a small buffer, e.g. 8 kB.
Then, in a loop, read 8 kB from the connection and write it into the FileOutputStream.
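A minimal sketch of that loop, assuming the u and filename variables from the question's code (try-with-resources needs Java 7+):
// Stream the image in 8 kB chunks instead of buffering the whole file.
byte[] buffer = new byte[8 * 1024];
try (InputStream in = new BufferedInputStream(u.openStream());
     OutputStream out = new FileOutputStream(filename)) {
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
    }
}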
To make the whole code simpler (and kick out the manual loops), you can use e.g. Commons IO.
In a Swing application, to let the user select a directory, instantiate a JFileChooser and call setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY).
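For instance (a rough sketch; parent and filename are placeholders for whatever component and file name your app uses):
JFileChooser chooser = new JFileChooser();
chooser.setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY);
if (chooser.showOpenDialog(parent) == JFileChooser.APPROVE_OPTION) {
    File downloadDir = chooser.getSelectedFile();
    // save each image into the chosen directory instead of the working directory
    FileOutputStream out = new FileOutputStream(new File(downloadDir, filename));
}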
You could move all the variable declarations outside of the loop, as long as you ensure they are properly initialized in each iteration. You won't save much time, though, relative to the time it takes to download and save each file.
Related
How can I effectively read from a large file and write bulk data into a file using the Java NIO framework?
I'm working with ByteBuffer and FileChannel and have tried something like the code below:
public static void main(String[] args)
{
    String inFileStr = "screen.png";
    String outFileStr = "screen-out.png";
    long startTime, elapsedTime;
    int bufferSizeKB = 4;
    int bufferSize = bufferSizeKB * 1024;

    // Check file length
    File fileIn = new File(inFileStr);
    System.out.println("File size is " + fileIn.length() + " bytes");
    System.out.println("Buffer size is " + bufferSizeKB + " KB");
    System.out.println("Using FileChannel with an indirect ByteBuffer of " + bufferSizeKB + " KB");

    try (FileChannel in = new FileInputStream(inFileStr).getChannel();
         FileChannel out = new FileOutputStream(outFileStr).getChannel())
    {
        // Allocate an indirect ByteBuffer
        ByteBuffer bytebuf = ByteBuffer.allocate(bufferSize);
        startTime = System.nanoTime();
        int bytesCount = 0;
        // Read data from file into ByteBuffer
        while ((bytesCount = in.read(bytebuf)) > 0) {
            // flip the buffer: the limit is set to the current position, the position to 0
            bytebuf.flip();
            out.write(bytebuf); // Write data from ByteBuffer to file
            bytebuf.clear();    // For the next read
        }
        elapsedTime = System.nanoTime() - startTime;
        System.out.println("Elapsed Time is " + (elapsedTime / 1000000.0) + " msec");
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
}
Can anybody tell me whether I should follow the same procedure if my file size is more than 2 GB?
And what should I do if I want to perform similar bulk operations while writing?
Note that you can simply use Files.copy(Paths.get(inFileStr), Paths.get(outFileStr), StandardCopyOption.REPLACE_EXISTING) to copy the file as your example code does, just likely faster and with only one line of code.
Otherwise, if you already have opened the two file channels, you can just use
in.transferTo(0, in.size(), out) to transfer the entire contents of the in channel to the out channel. Note that this method allows you to specify a range within the source file that will be transferred to the target channel’s current position (which is initially zero), and that there’s also a method for the opposite direction, i.e. out.transferFrom(in, 0, in.size()), to transfer data from the source channel’s current position to an absolute range within the target file.
Together, they allow almost every imaginable nontrivial bulk transfer to be done efficiently, without copying the data into a Java-side buffer. If that doesn't solve your needs, you have to be more specific in your question.
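One caveat worth hedging: a single transferTo call is not guaranteed to move the entire requested range, so for very large files it is safer to loop until everything has been transferred. A sketch, using the channels from above:
long position = 0;
long size = in.size();
while (position < size) {
    // transferTo returns how many bytes were actually moved
    position += in.transferTo(position, size - position, out);
}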
By the way, since Java 7 you can open a FileChannel directly, without the FileInputStream/FileOutputStream detour.
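A minimal sketch of that Java 7+ variant, reusing the path strings from the question (a sketch, not a drop-in replacement):
try (FileChannel in = FileChannel.open(Paths.get(inFileStr), StandardOpenOption.READ);
     FileChannel out = FileChannel.open(Paths.get(outFileStr),
             StandardOpenOption.CREATE, StandardOpenOption.WRITE,
             StandardOpenOption.TRUNCATE_EXISTING)) {
    in.transferTo(0, in.size(), out); // bulk transfer, no Java-side buffer
} catch (IOException ex) {
    ex.printStackTrace();
}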
while ((bytesCount = in.read(bytebuf)) > 0) {
    // flip the buffer: the limit is set to the current position, the position to 0
    bytebuf.flip();
    out.write(bytebuf); // Write data from ByteBuffer to file
    bytebuf.clear();    // For the next read
}
Your copy loop is not correct. It should be:
while ((bytesCount = in.read(bytebuf)) > 0 || bytebuf.position() > 0) {
    // flip the buffer: the limit is set to the current position, the position to 0
    bytebuf.flip();
    out.write(bytebuf);  // Write data from ByteBuffer to file
    bytebuf.compact();   // For the next read
}
Can anybody tell me whether I should follow the same procedure if my file size is more than 2 GB?
Yes. The file size doesn't make any difference.
I am trying to send chunks of files from a server to more than one client. When I try to send a file of 700 MB, I get an "OutOfMemoryError: Java heap space". I am using NetBeans 7.1.2.
I also tried the VM options in the project properties, but the same error still happens. I think there is some problem with reading the entire file at once. The code below works for files up to about 300 MB. Please give me some suggestions.
Thanks in advance
public class SplitFile {
    static int fileid = 0;

    public static DataUnit[] getUpdatableDataCode(File fileName) throws FileNotFoundException, IOException {
        int i = 0;
        DataUnit[] chunks = new DataUnit[UAProtocolServer.singletonServer.cloudhosts.length];
        FileInputStream fis;
        long Chunk_Size = (fileName.length()) / chunks.length;
        int cursor = 0;
        long fileSize = (long) fileName.length();
        int nChunks = 0, read = 0;
        long readLength = Chunk_Size;
        byte[] byteChunk;
        try {
            fis = new FileInputStream(fileName);
            //StupidTest.size = (int) fileName.length();
            while (fileSize > 0) {
                System.out.println("loop" + i);
                if (fileSize <= Chunk_Size) {
                    readLength = (int) fileSize;
                }
                byteChunk = new byte[(int) readLength];
                read = fis.read(byteChunk, 0, (int) readLength);
                fileSize -= read;
                // cursor += read;
                assert (read == byteChunk.length);
                long aid = fileid;
                aid = aid << 32 | nChunks;
                chunks[i] = new DataUnit(byteChunk, aid);
                // Lister.add(chunks[i]);
                nChunks++;
                ++i;
            }
            fis.close();
            fis = null;
        } catch (Exception e) {
            System.out.println("File splitting exception");
            e.printStackTrace();
        }
        return chunks;
    }
}
Reading in the whole file will definitely trigger an OutOfMemoryError as the file size grows. Tuning -Xmx1024M may be good as a temporary fix, but it's definitely not the right/scalable solution. Also, no matter how you move your variables around (like creating the buffer outside the loop instead of inside it), you will get an OutOfMemoryError sooner or later. The only way to avoid it is to not read the complete file into memory.
If you have to work purely in memory, then one approach is to send the chunks off to the client as you read them, so you don't have to keep them all in memory:
instead of:
chunks[i] = new DataUnit(byteChunk,aid);
do:
sendChunkToClient(new DataUnit(byteChunk, aid));
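A minimal sketch of that streaming idea (this simplifies the question's chunking scheme; sendChunkToClient is the hypothetical method above, and DataUnit and fileid come from the question's code):
FileInputStream fis = new FileInputStream(fileName);
byte[] buffer = new byte[64 * 1024];
int read;
int nChunks = 0;
while ((read = fis.read(buffer)) != -1) {
    // copy only the bytes actually read, then send the chunk immediately
    byte[] chunk = Arrays.copyOf(buffer, read);
    sendChunkToClient(new DataUnit(chunk, ((long) fileid) << 32 | nChunks++));
}
fis.close();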
But the above solution has the drawback that if an error happens between chunk sends, you may have a hard time trying to resume/recover from the error point.
Saving the chunks to temporary files like Ross Drew suggested is probably better and more reliable.
How about creating the
byteChunk = new byte[(int)readLength];
outside of the loop and just reusing it, instead of creating a new byte array over and over, if it's always the same size?
Alternatively
You could write incoming data to a temporary file as it comes in instead of maintaining that huge array then process it once it's all arrived.
Also
If you are using it multiple times as an int, you should probably just cast readLength to an int once, outside the loop, as well:
int len = (int)readLength;
And Chunk_Size is a variable, right? By convention it should begin with a lower-case letter.
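Putting those suggestions together, a rough sketch might look like this (the temp-file spooling is an assumption about how the data gets processed afterwards; fis is the question's stream):
int len = (int) chunkSize;                 // cast once, outside the loop
byte[] byteChunk = new byte[len];          // allocate once, reuse every iteration
File spool = File.createTempFile("chunks", ".tmp");
FileOutputStream tmp = new FileOutputStream(spool);
int read;
while ((read = fis.read(byteChunk, 0, len)) != -1) {
    tmp.write(byteChunk, 0, read);         // spool to disk instead of keeping it in memory
}
tmp.close();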
I'm quite new to Java I/O, as I haven't used it before, and have written this to download a .mp4 file from www.kissanime.com.
The download is very, very slow at the moment (approximately 70-100 kB/s) and I was wondering how I could speed it up. I don't really understand byte buffering, so any help with that would be appreciated; that may be my problem, I'm not sure.
Here's my code:
protected static boolean downloadFile(URL source, File dest) {
    try {
        URLConnection urlConn = source.openConnection();
        urlConn.setConnectTimeout(1000);
        urlConn.setReadTimeout(5000);
        InputStream in = urlConn.getInputStream();
        FileOutputStream out = new FileOutputStream(dest);
        BufferedOutputStream bout = new BufferedOutputStream(out);
        int fileSize = urlConn.getContentLength();
        byte[] b = new byte[65536];
        int bytesDownloaded = 0, len;
        while ((len = in.read(b)) != -1 && bytesDownloaded < fileSize) {
            bout.write(b, 0, len);
            bytesDownloaded += len;
            // System.out.println((double) bytesDownloaded / 1000000.0 + "mb/" + (double) fileSize / 1000000.0 + "mb");
        }
        bout.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return true;
}
Thanks. Any further information will be provided upon request.
I can't find any questions on here related to downloading media files, and I'm sorry if this is deemed to be a duplicate.
Try using IOUtils.toByteArray. It takes an InputStream and returns an array with all of its bytes. In my opinion, it's generally a good idea to check the common utility packages like apache-commons and guava to see whether what you're trying to do hasn't already been done.
If you want to save the file from an InputStream, then use the method below from apache-commons:
FileUtils.copyInputStreamToFile ()
public static void copyInputStreamToFile(InputStream source, File destination) throws IOException
Copies bytes from an InputStream source to a file destination. The directories up to destination will be created if they don't already exist. destination will be overwritten if it already exists. The source stream is closed.
Always use a library for file and IO related work when one is available. There are also some other utility methods you can explore:
IOUtils
FileUtils
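For example, downloading straight to a file becomes a two-liner (the URL and file name here are placeholders):
URL url = new URL("http://example.com/video.mp4");   // placeholder URL
FileUtils.copyInputStreamToFile(url.openStream(), new File("video.mp4"));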
Turns out that it was the vast number of redirects from the link that caused the download speed to be throttled. Thanks to everyone who answered.
I have a series of objects stored within a file concatenated as below:
sizeOfFile1 || file1 || sizeOfFile2 || file2 ...
The file sizes are serialized Long objects, and the files are just the raw bytes of the files.
I am trying to extract the files from the input file. Below is my code:
FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
    long size = (long) objectInputStream.readObject();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    int chunkSize = 256;
    final byte[] temp = new byte[chunkSize];
    int finalChunkSize = (int) (size % chunkSize);
    final byte[] finalTemp = new byte[finalChunkSize];
    while (fileInputStream.available() > 0 && size > 0)
    {
        if (fileInputStream.available() > finalChunkSize)
        {
            int i = fileInputStream.read(temp);
            bufferedOutputStream.write(temp, 0, i);
            size = size - i;
        }
        else
        {
            int i = fileInputStream.read(finalTemp);
            bufferedOutputStream.write(finalTemp, 0, i);
            size = 0;
        }
    }
    bufferedOutputStream.close();
}
fileInputStream.close();
My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.
Can anyone see the issue here?
Regards.
Wrap it in a DataInputStream and use readFully(byte[]).
But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.
NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
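A minimal sketch of that suggestion, assuming the sizes were written with DataOutputStream.writeLong (if they really are serialized Long objects, keep readObject for the size and use readFully only for the payload):
DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream("C:\\Test.tst")));
while (true)
{
    long size;
    try { size = in.readLong(); }         // size prefix
    catch (EOFException e) { break; }     // clean end of file
    byte[] data = new byte[(int) size];   // assumes each segment fits in memory
    in.readFully(data);                   // blocks until the whole segment is read
    // write data to its own output file here
}
in.close();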
You could try NIO instead...
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
This reads only SIZE bytes from the file.
This uses DataInput to read the longs. In this particular case I am not using readFully(), as a segment might be too long to keep in memory:
DataInputStream in = new DataInputStream(new FileInputStream(file)); // "file" is the input File
byte[] buf = new byte[64 * 1024];
while (true) {
    OutputStream out = ...;
    long size;
    try { size = in.readLong(); } catch (EOFException e) { break; }
    while (size > 0) {
        int len = (int) Math.min(size, buf.length);
        len = in.read(buf, 0, len);
        out.write(buf, 0, len);
        size -= len;
    }
    out.close();
}
Save yourself a lot of trouble by doing one of these things:
Switch to using Avro; trust me, you would be crazy not to. It's easy to learn, and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
or use Thrift
or use Hibernate (but this is probably not a great option, hibernate takes a lot of time to learn, and takes a lot of configuration)
If you really refuse to switch to Avro, I recommend reading up on Apache's IOUtils class. It has a method to copy from one input stream to another, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated: you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.
There are also GZIPOutputStream and ZipOutputStream; both live in java.util.zip in the JDK, so they don't require extra jars on your classpath.
I'm not going to write an example because I honestly think you should just learn avro or thrift and use that.
I have a wiki.txt file and its size is 50 MB.
I need to do several things with the file, so I thought the best way in terms of performance would be to load the file into memory. Is that correct?
This is the code that I have written:
File file = new File("wiki.txt");
FileInputStream fileInputStream = new FileInputStream(file);
FileChannel fileChannel = fileInputStream.getChannel();
MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
System.out.println((char)mapByteBuffer.get());
I get an error on this code at mapByteBuffer.get().
I tried the get() function in a few variants, but I get an error each time, and e.getMessage() doesn't even tell me anything; I just get null.
Another important thing to note: my text file contains English words, and the operation I need to perform is searching for whether an expression exists in the text file.
Thank you.
I would suggest using a memory-mapped file, to read the file directly from the disk instead of loading it into memory.
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
And then you can read the buffer as usual.
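For example, a crude scan of the mapped region (assuming single-byte ASCII text; "someWord" is a placeholder):
StringBuilder sb = new StringBuilder(buf.remaining());
while (buf.hasRemaining()) {
    sb.append((char) buf.get());            // one byte per character, ASCII assumed
}
boolean found = sb.indexOf("someWord") >= 0;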
My answers for point (1):
It depends on what you want to do with the file. If your processing doesn't involve rewinding (looking back at what was read before), it's best to just read it as a stream and process it in one go, instead of loading it all into memory.
Even if you need random access across the file, you may also be interested in doing block file operations, because your solution may not scale well when the file size grows.
Use RandomAccessFile if you are on Java 1.4 or above.
For random access, the operating system usually handles the file buffer caching quite well, so you don't have to handle it yourself.
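A small sketch of block-wise random access with RandomAccessFile (the offset and block size here are arbitrary):
RandomAccessFile raf = new RandomAccessFile("wiki.txt", "r");
raf.seek(1024);                  // jump to an arbitrary offset
byte[] block = new byte[4096];   // read one block at a time
int read = raf.read(block);      // returns the number of bytes actually read
raf.close();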
It is important to read the whole error, not just the message. Often the real information is in the exception's name not the text associated with it.
You will get an error if the file is empty as there is no first byte.
Note: the approach you are using assumes ASCII 7-bit characters. If you want to assume ISO-8859-1 characters you can use (char) (byteBuffer.get() & 0xFF)
However, if you have plain text, you may find that using strings is simpler and not much slower; e.g. you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if that were far too long.
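A sketch of the plain-strings approach (Java 7+; the charset and search term are assumptions):
String text = new String(Files.readAllBytes(Paths.get("wiki.txt")),
        StandardCharsets.ISO_8859_1);
boolean found = text.contains("someExpression");   // placeholder search term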
I would suggest using BufferedReader. It is much faster and requires relatively fewer resources.
First, read the number of lines:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
byte[] chars = new byte[1024];
int numberOfChars = 0;
int count = 0;
while ((numberOfChars = is.read(chars)) != -1)
{
    for (int i = 0; i < numberOfChars; ++i)
    {
        if (chars[i] == '\n' && numberOfChars - i != 1)
        {
            ++count;
        }
    }
}
count++;
is.close();
return count; // number of lines
Then read the lines:
BufferedReader in = new BufferedReader(new FileReader(fileName));
for (int i = 0; i < endLine; i++)
{
    String oneLine = in.readLine();
}
In these strings you can then search for what you need.
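For example, a line-by-line search might look like this ("someWord" is a placeholder search term):
BufferedReader reader = new BufferedReader(new FileReader(fileName));
String line;
boolean found = false;
while ((line = reader.readLine()) != null)
{
    if (line.contains("someWord")) {
        found = true;
        break;
    }
}
reader.close();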