I have to read a 53 MB file character by character. When I do it in C++ using ifstream, it is completed in milliseconds but using Java InputStream it takes several minutes. Is it normal for Java to be this slow or am I missing something?
Also, I need to complete the program in Java (it uses servlets, from which I have to call the functions that process these characters). I was thinking of writing the file-processing part in C or C++ and then using the Java Native Interface to hook those functions into my Java program... Is this a good idea?
Can anyone give me any other tips? I seriously need to read the file faster. I tried using buffered input, but it still doesn't come anywhere close to C++'s performance.
Edit: my code spans several files and is very messy, so here is a synopsis:
import java.io.*;

public class tmp {
    public static void main(String args[]) {
        try {
            InputStream file = new BufferedInputStream(new FileInputStream("1.2.fasta"));
            char ch;
            while (file.available() != 0) {
                ch = (char) file.read();
                /* Do processing */
            }
            System.out.println("DONE");
            file.close();
        } catch (Exception e) {}
    }
}
I ran the following code with a 183 MB file. It printed "Elapsed 250 ms":
final InputStream in = new BufferedInputStream(new FileInputStream("file.txt"));
final long start = System.currentTimeMillis();
int cnt = 0;
final byte[] buf = new byte[1000];
while (in.read(buf) != -1) cnt++;
in.close();
System.out.println("Elapsed " + (System.currentTimeMillis() - start) + " ms");
I would try this
// create the file so we have something to read.
final String fileName = "1.2.fasta";
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(new byte[54 * 1024 * 1024]);
fos.close();

// read the file in one hit via a memory-mapped buffer.
long start = System.nanoTime();
FileChannel fc = new FileInputStream(fileName).getChannel();
ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
while (bb.remaining() > 0)
    bb.getLong(); // pull 8 bytes at a time; the file size here is a multiple of 8
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to read %.1f MB%n", time / 1e9, fc.size() / 1e6);
fc.close();
// unmap eagerly; DirectBuffer is the JDK-internal sun.nio.ch.DirectBuffer API
((DirectBuffer) bb).cleaner().clean();
prints
Took 0.016 seconds to read 56.6 MB
Use a BufferedInputStream:
InputStream buffy = new BufferedInputStream(inputStream);
As noted above, use a BufferedInputStream. You could also use the NIO package. Note that for most files, a BufferedInputStream will read just as fast as NIO. However, for extremely large files NIO may do better, because you can use memory-mapped file operations. Furthermore, the NIO package does interruptible IO, whereas the java.io package does not. That means that if you want to be able to cancel the operation from another thread, you need NIO to make it reliable. For example:
ByteBuffer buf = ByteBuffer.allocate(BUF_SIZE);
FileChannel fileChannel = fileInputStream.getChannel();
int readCount = 0;
while ((readCount = fileChannel.read(buf)) > 0) {
    buf.flip();
    while (buf.hasRemaining()) {
        byte b = buf.get();
    }
    buf.clear();
}
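To see the interruptible-IO point concretely, here is a sketch (the file name is hypothetical): interrupting the reading thread closes the FileChannel, and the blocked read() throws ClosedByInterruptException, which plain java.io streams cannot do.
Thread reader = new Thread(() -> {
    try (FileChannel ch = new FileInputStream("big.dat").getChannel()) {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (ch.read(buf) != -1) {
            buf.clear(); // data discarded; we only care about cancellation here
        }
    } catch (ClosedByInterruptException e) {
        System.out.println("read cancelled cleanly");
    } catch (IOException e) {
        e.printStackTrace();
    }
});
reader.start();
Thread.sleep(100);  // let the read get going
reader.interrupt(); // cancel from another thread
reader.join();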
I was trying to read a file into an array by using FileInputStream, and an ~800KB file took about 3 seconds to read into memory. I then tried the same code, except with the FileInputStream wrapped in a BufferedInputStream, and it took about 76 milliseconds. Why is reading a file byte by byte done so much faster with a BufferedInputStream, even though I'm still reading it byte by byte? Here's the code (the rest of the code is entirely irrelevant). Note that this is the "fast" code. You can just remove the BufferedInputStream if you want the "slow" code:
InputStream is = null;
try {
    is = new BufferedInputStream(new FileInputStream(file));
    int[] fileArr = new int[(int) file.length()];
    for (int i = 0, temp = 0; (temp = is.read()) != -1; i++) {
        fileArr[i] = temp;
    }
That makes the BufferedInputStream version over 30 times faster (nearly 40, in fact). So why is this, and is it possible to make this code more efficient (without using any external libraries)?
In FileInputStream, the method read() reads a single byte. From the source code:
/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException;
This is a native call to the OS which uses the disk to read the single byte. This is a heavy operation.
With a BufferedInputStream, the method delegates to an overloaded read() method that reads up to 8192 bytes at once and buffers them until they are needed. It still returns only the single byte (but keeps the others in reserve). This way the BufferedInputStream makes fewer native calls to the OS to read from the file.
For example, say your file is 32768 bytes long. To get all the bytes into memory with a plain FileInputStream, you would need 32768 native calls to the OS. With a BufferedInputStream, you only need 4, regardless of the number of read() calls you make (still 32768).
As to how to make it faster, you might want to consider Java 7's NIO FileChannel class, but I have no evidence to support this.
Note: if you used FileInputStream's read(byte[], int, int) method directly instead, with a byte[>8192] you wouldn't need a BufferedInputStream wrapping it.
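A sketch of that direct bulk read (hypothetical file name); with a buffer this large, wrapping it in a BufferedInputStream would only add an extra copy:
FileInputStream in = new FileInputStream("file.bin");
byte[] buf = new byte[16 * 1024]; // bigger than the 8192-byte default buffer
int n;
while ((n = in.read(buf, 0, buf.length)) != -1) {
    // process buf[0..n) here
}
in.close();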
A BufferedInputStream wrapped around a FileInputStream requests data from the FileInputStream in big chunks (8192 bytes by default). Thus if you read 1000 characters one at a time, the FileInputStream only has to go to the disk once. This will be much faster!
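If the default chunk size doesn't suit your access pattern, it is tunable through the two-argument constructor; a small sketch (hypothetical file name):
// Ask BufferedInputStream for 64 KB chunks instead of the 8 KB default.
InputStream in = new BufferedInputStream(new FileInputStream("data.bin"), 64 * 1024);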
It is because of the cost of disk access. Let's assume you have a file whose size is 8 KB. Reading this file byte by byte without a BufferedInputStream would need 8*1024 disk accesses.
At this point, BufferedInputStream comes onto the scene and acts as a middleman between your code and the file to be read.
In one shot it fetches a chunk of bytes (8 KB by default) into memory, and subsequent read() calls are then served from this middleman.
This decreases the time of the operation. Here are both variants for comparison:
private void exercise1WithBufferedStream() {
    long start = System.currentTimeMillis();
    try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
        BufferedInputStream bufferedInputStream = new BufferedInputStream(myFile);
        boolean eof = false;
        while (!eof) {
            int inByteValue = bufferedInputStream.read();
            if (inByteValue == -1) eof = true;
        }
    } catch (IOException e) {
        System.out.println("Could not read the stream...");
        e.printStackTrace();
    }
    System.out.println("time passed with buffered: " + (System.currentTimeMillis() - start));
}

private void exercise1() {
    long start = System.currentTimeMillis();
    try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
        boolean eof = false;
        while (!eof) {
            int inByteValue = myFile.read();
            if (inByteValue == -1) eof = true;
        }
    } catch (IOException e) {
        System.out.println("Could not read the stream...");
        e.printStackTrace();
    }
    System.out.println("time passed without buffered: " + (System.currentTimeMillis() - start));
}
I'm developing a client-server architecture for exchanging files, for my own use. Everything works great except the memory usage. After sending some files I realized my application's memory management isn't very effective: when I tried to send some videos (about 900 MB), my client's and server's memory usage was about 1.5 GB.
I used the NetBeans profiler, and it said that the problem is a byte array.
// Client side
FileInputStream f = new FileInputStream(file);
FileChannel ch = f.getChannel();
ByteBuffer bb = ByteBuffer.allocate(8192 * 32);
int nRead = 0;
while ((nRead = ch.read(bb)) != -1) {
    if (nRead == 0) {
        continue;
    }
    bb.position(0);
    bb.limit(nRead);
    send.writeObject(Arrays.copyOfRange(bb.array(), 0, nRead));
    send.flush();
    bb.clear();
}
f.close();
ch.close();
bb.clear();
send.writeObject(0xBB);
send.flush();
// Server side
FileOutputStream fos = new FileOutputStream(file);
FileChannel fco = fos.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(8192 * 32);
Object received; // declared outside the loop so it stays in scope
do {
    received = download.readObject();
    if (received instanceof byte[]) {
        byte[] bytes = (byte[]) received;
        buffer.put(bytes);
        buffer.flip();
        buffer.position(0);
        buffer.limit(bytes.length);
        fco.write(buffer);
        buffer.clear();
    } else if (received instanceof Integer) {
        Integer tempx = (Integer) received;
        state = (byte) (tempx & 0xFF);
    }
} while (state != (byte) 0xBB); // compare the unboxed marker; "received != (byte) 0xBB" compares object references and never terminates
fco.close();
fos.close();
Is there any way to fix it? I mean, is it possible to clean the used memory? Limiting the ByteBuffer didn't work properly, so I limited the byte array taken from the buffer instead. I haven't attached the whole code, because the file handling is the problem.
Profiler screenshot of the client's memory usage: http://i.stack.imgur.com/ouTDk.png
Your buffers are 8192 times 32. If you have memory problems, make them smaller. You don't need them that big for network purposes. It's also a strange way to write 256k.
Don't create pointless copies of byte arrays. ObjectOutputStream.writeUnshared() will do what you need there.
I would strongly suggest getting rid of the serialization and just copying the bytes. The code becomes much simpler and you have less copies of the data, especially at the receiving end.
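A sketch of what that could look like: a length-prefixed chunk protocol over plain DataOutputStream/DataInputStream, with no object serialization and no array copies (socket, fileIn and fileOut stand in for the question's streams):
// Sender: write each chunk's length, then only the bytes actually read.
DataOutputStream out = new DataOutputStream(socket.getOutputStream());
byte[] buf = new byte[8192];
int n;
while ((n = fileIn.read(buf)) != -1) {
    out.writeInt(n);
    out.write(buf, 0, n);
}
out.writeInt(-1); // end-of-stream marker
out.flush();

// Receiver: the mirror image; nothing is retained between chunks.
DataInputStream in = new DataInputStream(socket.getInputStream());
byte[] rbuf = new byte[8192];
for (int len; (len = in.readInt()) != -1; ) {
    in.readFully(rbuf, 0, len); // block until exactly len bytes arrive
    fileOut.write(rbuf, 0, len);
}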
It's not a direct solution, but you should use try-with-resources blocks on all your streams. That will prevent any resource leaks that may be making your situation worse.
try (FileOutputStream fos = new FileOutputStream(file)) {
    // Do stuff here; fos is automatically closed when you leave the block
}
I'm writing an application that needs to send a file over the network. I've only been taught how to use the standard java.net and java.io classes so far (in my first year of college), so I have no experience with java.nio and Netty and all those nice things. I've got a working server/client setup using the Socket and ServerSocket classes, along with buffered input/output streams and buffered file streams, as follows:
The server:
public class FiletestServer {

    static ServerSocket server;
    static BufferedInputStream in;
    static BufferedOutputStream out;

    public static void main(String[] args) throws Exception {
        server = new ServerSocket(12354);
        System.out.println("Waiting for client...");
        Socket s = server.accept();
        in = new BufferedInputStream(s.getInputStream(), 8192);
        out = new BufferedOutputStream(s.getOutputStream(), 8192);
        File f = new File("test.avi");
        BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f), 8192);
        System.out.println("Sending to client...");
        byte[] b = new byte[8192];
        while (fin.read(b) != -1) {
            out.write(b);
        }
        fin.close();
        out.close();
        in.close();
        s.close();
        server.close();
        System.out.println("done!");
    }
}
And the client:
public class FiletestClient {

    public static void main(String[] args) throws Exception {
        System.out.println("Connecting to server...");
        Socket s;
        if (args.length < 1) {
            s = new Socket("", 12354);
        } else {
            s = new Socket(args[0], 12354);
        }
        System.out.println("Connected.");
        BufferedInputStream in = new BufferedInputStream(s.getInputStream(), 8192);
        BufferedOutputStream out = new BufferedOutputStream(s.getOutputStream(), 8192);
        File f = new File("test.avi");
        System.out.println("Receiving...");
        FileOutputStream fout = new FileOutputStream(f);
        byte[] b = new byte[8192];
        while (in.read(b) != -1) {
            fout.write(b);
        }
        fout.close();
        in.close();
        out.close();
        s.close();
        System.out.println("Done!");
    }
}
At first I was using no buffering, writing each int from in.read() directly. That got me about 200 KB/s transfer according to my network monitor gadget on Windows 7. I then changed it to the code above but used 4096-byte buffers and got the same speed, except the file received was usually a couple of kilobytes bigger than the source file, and that is my problem. I changed the buffer size to 8192 and now get about 3.7-4.5 MB/s over wireless to my laptop, which is plenty fast enough for now, but I still have the problem of the file getting slightly bigger (which makes it fail an md5/sha hash test) when it is received.
So my question is: what is the proper way of buffering to get decent speeds and end up with exactly the same file on the other side? Getting it to go a bit faster would be nice too, but I'm happy with the speed for now. I'm assuming a bigger buffer is better up to a point; I just need to find what that point is.
You are ignoring the size of data actually read.
while (in.read(b) != -1) {
fout.write(b);
}
will always write 8192 bytes even if only one byte is read. Instead I suggest using
for (int len; (len = in.read(b)) > 0; )
    fout.write(b, 0, len);
Your buffers are the same size as your byte[] so they are not really doing anything at the moment.
The MTU for most networks is around 1500 bytes, and on slower networks (up to 1 Gb) you get a performance improvement for buffers up to about 2 KB. 8 KB is fine as well; larger than that is unlikely to help.
If you actually want to make it "so perfect", you should take a look at the try-with-resources statement and the java.nio package (or any NIO-derived libraries).
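For instance, on Java 7+ the receiving side's whole loop can collapse into a single call (using the question's socket s and file name; needs java.nio.file.Files, Paths and StandardCopyOption):
// Copies the socket stream into the file until EOF, then closes the stream.
try (InputStream in = new BufferedInputStream(s.getInputStream())) {
    Files.copy(in, Paths.get("test.avi"), StandardCopyOption.REPLACE_EXISTING);
}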
I am trying to accomplish a large file upload on a BlackBerry. I am successfully able to upload a file, but only if I read the file and upload it 1 byte at a time. For large files I think this is hurting performance. I want to be able to read and write something more like 128 KB at a time. If I try to initialise my buffer to anything other than 1, I never get a response back from the server after writing everything.
Any ideas why I can upload using only 1 byte at a time?
z.write(boundaryMessage.toString().getBytes());
DataInputStream fileIn = fc.openDataInputStream();
boolean isCancel = false;
byte[] b = new byte[1];
int num = 0;
int left = buffer;
while (fileIn.read(b) > -1) {
    num += b.length;
    left = buffer - num * 1;
    Log.info(num + "WRITTEN");
    if (isCancel == true) {
        break;
    }
    z.write(b);
}
z.write(endBoundary.toString().getBytes());
It's a bug in BlackBerry OS that appeared in OS 5.0, and persists in OS 6.0. If you try using a multi-byte read before OS 5, it will work fine. OS5 and later produce the behavior you have described.
You can also get around the problem by creating a secure connection, as the bug doesn't manifest itself for secure sockets, only plain sockets.
Most input streams aren't guaranteed to fill a buffer on every read. (DataInputStream has a special method for this, readFully(), which will throw an EOFException if there aren't enough bytes left in the stream to fill the buffer.) And unless the file is a multiple of the buffer length, no stream will fill the buffer on the final read. So, you need to store the number of bytes read and use it during the write:
while (!isCancel) {
    int n = fileIn.read(b);
    if (n < 0)
        break;
    num += n;
    Log.info(num + "WRITTEN");
    z.write(b, 0, n);
}
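If you genuinely need completely filled buffers (say, fixed-size records), the readFully() mentioned above does exactly that. Since the question's fileIn is already a DataInputStream, a small sketch with a hypothetical record size:
byte[] record = new byte[128]; // hypothetical fixed record size
fileIn.readFully(record);      // blocks until all 128 bytes are read,
                               // or throws EOFException if the stream ends first
z.write(record);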
Your loop isn't correct. You should take care of the return value from read(). It returns how many bytes were actually read, and that isn't always the same as the buffer size.
Edit:
This is how you usually write a loop that does what you want:
OutputStream z = null; // shouldn't be null
InputStream in = null; // shouldn't be null
byte[] buffer = new byte[1024 * 32];
int len = 0;
while ((len = in.read(buffer)) > -1) {
    z.write(buffer, 0, len);
}
Note that you might want to use buffered streams instead of unbuffered streams.
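A sketch of that wrapping, with a hypothetical socket; the copy loop above stays exactly the same:
// Buffered wrappers batch small reads and writes into far fewer system calls.
InputStream in = new BufferedInputStream(socket.getInputStream(), 8192);
OutputStream z = new BufferedOutputStream(socket.getOutputStream(), 8192);
// ... copy loop from above ...
z.flush(); // push out any bytes still sitting in the buffer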
I am writing a program to saturate a link for performance testing in my networking lab. I tried different things, from changing the send and receive buffers, to creating a file and reading it, to creating a long array and sending it through the socket all at once with OutputStream.write(byte[]).
The array is 1,000,000 bytes long. When I sniff the network traffic, the sniffer shows packets with "Data (1460 bytes)", which makes me suppose that I'm not sending byte by byte.
The bandwidth used is about 8% of the 100 Mbps link.
I'll post the relevant code, as there is some interaction between client and server which I don't think is relevant:
Client:
int car = 0;
do {
    car = is.read();
    //System.out.println(car);
    contador++;
} while (car != 104);
Server:
byte dades[] = new byte[1000000];
FileInputStream fis = null;
try {
    FileOutputStream fos = new FileOutputStream("1MB.txt");
    fos.write(dades);
    fos.close(); // was "fos = null;", which discarded the stream without closing it
    File f = new File("1MB.txt");
    fis = new FileInputStream(f);
    step = 0;
    correcte = true;
    sck = srvSock.accept();
    sck.setSendBufferSize(65535);
    sck.setReceiveBufferSize(65535);
    os = sck.getOutputStream();
    is = sck.getInputStream();
}
...
BufferedInputStream bis = new BufferedInputStream(fis);
bis.read(dades);
for (int i = 0; i < 100; i++) {
    os.write(dades);
}
In this case I pasted my latest idea: create a file containing a million-byte array, then read that file and write it to the socket; before this idea I was sending the byte array directly.
Another thing that makes me believe this is not byte-by-byte sending: on a quad-core computer the client uses 25% CPU and around 8% of the bandwidth, while on an old single-core computer (an AMD Athlon) it uses 100% of the CPU and just 4% of the bandwidth. The server is not as CPU intensive.
Any ideas??? I feel a little lost right now...
Thanks!!!
Perhaps it's related to the fact that the client reads data byte by byte, which can force the flow-control algorithm to limit the transmission bandwidth:
int car = 0;
do {
    car = is.read();
    //System.out.println(car);
    contador++;
} while (car != 104);
Try to read the data into an array instead, or use a BufferedInputStream:
byte[] buf = new byte[65536];
int size = 0;
boolean stop = false;
while (!stop && (size = is.read(buf)) != -1) {
    for (int i = 0; i < size; i++) {
        if (buf[i] == 104) {
            stop = true;
            break;
        }
    }
}
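And the BufferedInputStream variant mentioned above, which keeps the question's byte-by-byte loop shape but serves each read() from memory (is and contador are the question's variables):
// Each read() now hits an in-memory buffer instead of the socket directly.
InputStream bin = new BufferedInputStream(is, 65536);
int car;
do {
    car = bin.read();
    contador++;
} while (car != 104 && car != -1); // also stop on end of stream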