Java creating new objects performance

I have the following class:
class MyObject implements Serializable {
private String key;
private String val;
private int num;
MyObject(String a, String b, int c) {
this.key = a;
this.val = b;
this.num = c;
}
}
I need to create a list of objects; the following method is called repeatedly (say 10K times or more):
public void addToIndex(String a, String b, int c) {
MyObject ob = new MyObject(a,b,c);
list.add(ob); // List<MyObject>
}
I used a profiler to look at the memory footprint, and it grows substantially because a new object is created on every call. Is there a better way to do this? Once the list is fully populated, I write it to disk.
EDIT: This is how I write the list once it is fully populated. Is there a way to append to the file once memory use goes beyond some threshold (i.e. once the list reaches a certain size)?
ObjectOutputStream oos = new ObjectOutputStream(
new DeflaterOutputStream(new FileOutputStream(file)));
oos.writeObject(list);
oos.close();

I used a profiler to look at the memory footprint, and it grows substantially because a new object is created on every call. Is there a better way to do this?
Java serialization doesn't use that much memory in your situation. What it does do is create a lot of garbage, far more than you might imagine. It also has a very verbose output, which can be improved using compression, as you do.
A simple way to improve this situation is to use Externalizable instead of Serializable. This can dramatically reduce the garbage produced and make the output more compact. It can also be much faster, with lower overhead.
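For reference, a minimal sketch of what that looks like for MyObject (not from the original post): Externalizable requires a public no-arg constructor, and the read order must mirror the write order.

import java.io.*;

class MyObject implements Externalizable {
    private String key;
    private String val;
    private int num;

    public MyObject() { } // required by Externalizable

    MyObject(String a, String b, int c) {
        this.key = a;
        this.val = b;
        this.num = c;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(key);  // fields written explicitly instead of by reflection
        out.writeUTF(val);
        out.writeInt(num);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        key = in.readUTF(); // must read in exactly the order written
        val = in.readUTF();
        num = in.readInt();
    }
}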
BTW You can get even better performance if you use custom serialization for the list itself.
public class Main {
public static void main(String[] args) throws IOException, ClassNotFoundException {
List<MyObject> list = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
list.add(new MyObject("key-" + i, "value-" + i, i));
}
for (int i = 0; i < 10; i++) {
timeJavaSerialization(list);
timeCustomSerialization(list);
timeCustomSerialization2(list);
}
}
private static void timeJavaSerialization(List<MyObject> list) throws IOException, ClassNotFoundException {
File file = File.createTempFile("java-serialization", "dz");
long start = System.nanoTime();
ObjectOutputStream oos = new ObjectOutputStream(
new DeflaterOutputStream(new FileOutputStream(file)));
oos.writeObject(list);
oos.close();
ObjectInputStream ois = new ObjectInputStream(
new InflaterInputStream(new FileInputStream(file)));
Object o = ois.readObject();
ois.close();
long time = System.nanoTime() - start;
long size = file.length();
System.out.printf("Java serialization uses %,d bytes and took %.3f seconds.%n",
size, time / 1e9);
}
private static void timeCustomSerialization(List<MyObject> list) throws IOException {
File file = File.createTempFile("custom-serialization", "dz");
long start = System.nanoTime();
MyObject.writeList(file, list);
Object o = MyObject.readList(file);
long time = System.nanoTime() - start;
long size = file.length();
System.out.printf("Faster Custom serialization uses %,d bytes and took %.3f seconds.%n",
size, time / 1e9);
}
private static void timeCustomSerialization2(List<MyObject> list) throws IOException {
File file = File.createTempFile("custom2-serialization", "dz");
long start = System.nanoTime();
{
DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
new DeflaterOutputStream(new FileOutputStream(file))));
dos.writeInt(list.size());
for (MyObject mo : list) {
dos.writeUTF(mo.key);
}
for (MyObject mo : list) {
dos.writeUTF(mo.val);
}
for (MyObject mo : list) {
dos.writeInt(mo.num);
}
dos.close();
}
{
DataInputStream dis = new DataInputStream(new BufferedInputStream(
new InflaterInputStream(new FileInputStream(file))));
int len = dis.readInt();
String[] keys = new String[len];
String[] vals = new String[len];
List<MyObject> list2 = new ArrayList<>(len);
for (int i = 0; i < len; i++) {
keys[i] = dis.readUTF();
}
for (int i = 0; i < len; i++) {
vals[i] = dis.readUTF();
}
for (int i = 0; i < len; i++) {
list2.add(new MyObject(keys[i], vals[i], dis.readInt()));
}
dis.close();
}
long time = System.nanoTime() - start;
long size = file.length();
System.out.printf("Compact Custom serialization uses %,d bytes and took %.3f seconds.%n",
size, time / 1e9);
}
static class MyObject implements Serializable {
private String key;
private String val;
private int num;
MyObject(String a, String b, int c) {
this.key = a;
this.val = b;
this.num = c;
}
MyObject(DataInput in) throws IOException {
key = in.readUTF();
val = in.readUTF();
num = in.readInt();
}
public void writeTo(DataOutput out) throws IOException {
out.writeUTF(key);
out.writeUTF(val);
out.writeInt(num);
}
public static void writeList(File file, List<MyObject> list) throws IOException {
DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
new DeflaterOutputStream(new FileOutputStream(file))));
dos.writeInt(list.size());
for (MyObject mo : list) {
mo.writeTo(dos);
}
dos.close();
}
public static List<MyObject> readList(File file) throws IOException {
DataInputStream dis = new DataInputStream(new BufferedInputStream(
new InflaterInputStream(new FileInputStream(file))));
int len = dis.readInt();
List<MyObject> list = new ArrayList<>(len);
for (int i = 0; i < len; i++) {
list.add(new MyObject(dis));
}
dis.close();
return list;
}
}
}
prints finally
Java serialization uses 61,168 bytes and took 0.061 seconds.
Faster Custom serialization uses 62,519 bytes and took 0.024 seconds.
Compact Custom serialization uses 68,225 bytes and took 0.020 seconds.
As you can see, my attempt to make the file more compact actually made it larger, though faster, which is a good example of why you should test performance improvements.
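Regarding the edit's question about appending once memory grows beyond a threshold: one option is to flush and clear the list every time it reaches a chosen size, so each chunk goes to disk and the heap stays bounded. A minimal sketch, where the threshold and the one-file-per-chunk naming are illustrative assumptions:

private static final int THRESHOLD = 100_000; // assumed flush point; tune to taste
private final List<MyObject> list = new ArrayList<>();
private int chunk = 0;

public void addToIndex(String a, String b, int c) throws IOException {
    list.add(new MyObject(a, b, c));
    if (list.size() >= THRESHOLD) {
        flushChunk();
    }
}

private void flushChunk() throws IOException {
    // hypothetical naming scheme: one compressed file per chunk
    try (ObjectOutputStream oos = new ObjectOutputStream(
            new DeflaterOutputStream(new FileOutputStream("index-" + chunk++ + ".dz")))) {
        oos.writeObject(list);
    }
    list.clear(); // lets the flushed objects be garbage collected
}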

Consider using fast-serialization. It is source-level compatible with JDK serialization and creates less bloat.
Additionally, it beats most hand-crafted Externalizable serialization, as it is not only the JDK serialization implementation itself but also the inefficient In/OutputStream implementations of the stock JDK that hurt performance.
http://code.google.com/p/fast-serialization/
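For illustration, a sketch of the basic usage pattern from the fast-serialization documentation (verify the class and method names against the version you pick up; one shared FSTConfiguration per JVM is the documented pattern):

// Assumes the fast-serialization (FST) jar is on the classpath.
FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();

byte[] bytes = conf.asByteArray(list);                       // serialize the whole list
List<MyObject> copy = (List<MyObject>) conf.asObject(bytes); // deserialize it back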

Related

Java 8 - Most effective way to merge List<byte[]> to byte[]

I have a library that returns some binary data as list of binary arrays. Those byte[] need to be merged into an InputStream.
This is my current implementation:
public static InputStream foo(List<byte[]> binary) {
byte[] streamArray = new byte[0];
for (byte[] bin : binary) {
streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
}
return new ByteArrayInputStream(streamArray);
}
but this is quite cpu intense. Is there a better way?
Thanks for all the answers. I did a performance test; these are my results:
Function: 'NicolasFilotto' => 68,04 ms average on 100 calls
Function: 'NicolasFilottoEstSize' => 65,24 ms average on 100 calls
Function: 'NicolasFilottoSequenceInputStream' => 63,09 ms average on 100 calls
Function: 'Saka1029_1' => 63,06 ms average on 100 calls
Function: 'Saka1029_2' => 0,79 ms average on 100 calls
Function: 'Coco' => 541,60 ms average on 10 calls
I'm not sure if 'Saka1029_2' is measured correctly...
This is the execute function:
private static double execute(Callable<InputStream> funct, int times) throws Exception {
List<Long> executions = new ArrayList<>(times);
for (int idx = 0; idx < times; idx++) {
BufferedReader br = null;
long startTime = System.currentTimeMillis();
InputStream is = funct.call();
br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null) {}
executions.add(System.currentTimeMillis() - startTime);
}
return calculateAverage(executions);
}
Note that I read every input stream fully.
These are the implementations used:
public static class NicolasFilotto implements Callable<InputStream> {
private final List<byte[]> binary;
public NicolasFilotto(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (byte[] bytes : binary) {
baos.write(bytes, 0, bytes.length);
}
return new ByteArrayInputStream(baos.toByteArray());
}
}
public static class NicolasFilottoSequenceInputStream implements Callable<InputStream> {
private final List<byte[]> binary;
public NicolasFilottoSequenceInputStream(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
return new SequenceInputStream(
Collections.enumeration(
binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())));
}
}
public static class NicolasFilottoEstSize implements Callable<InputStream> {
private final List<byte[]> binary;
private final int lineSize;
public NicolasFilottoEstSize(List<byte[]> binary, int lineSize) {
this.binary = binary;
this.lineSize = lineSize;
}
@Override
public InputStream call() throws Exception {
ByteArrayOutputStream baos = new ByteArrayOutputStream(binary.size() * lineSize);
for (byte[] bytes : binary) {
baos.write(bytes, 0, bytes.length);
}
return new ByteArrayInputStream(baos.toByteArray());
}
}
public static class Saka1029_1 implements Callable<InputStream> {
private final List<byte[]> binary;
public Saka1029_1(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
int pos = 0;
for (byte[] bin : binary) {
int length = bin.length;
System.arraycopy(bin, 0, all, pos, length);
pos += length;
}
return new ByteArrayInputStream(all);
}
}
public static class Saka1029_2 implements Callable<InputStream> {
private final List<byte[]> binary;
public Saka1029_2(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
int size = binary.size();
return new InputStream() {
int i = 0, j = 0;
@Override
public int read() throws IOException {
if (i >= size) return -1;
if (j >= binary.get(i).length) {
++i;
j = 0;
}
if (i >= size) return -1;
return binary.get(i)[j++];
}
};
}
}
public static class Coco implements Callable<InputStream> {
private final List<byte[]> binary;
public Coco(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
byte[] streamArray = new byte[0];
for (byte[] bin : binary) {
streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
}
return new ByteArrayInputStream(streamArray);
}
}
You could use a ByteArrayOutputStream to store the content of each byte array in your list. To make it efficient, the ByteArrayOutputStream should be created with an initial size that matches the target size as closely as possible, so if you know the size, or at least the average size, of the byte arrays, you should use it. The code would be:
public static InputStream foo(List<byte[]> binary) {
ByteArrayOutputStream baos = new ByteArrayOutputStream(ARRAY_SIZE * binary.size());
for (byte[] bytes : binary) {
baos.write(bytes, 0, bytes.length);
}
return new ByteArrayInputStream(baos.toByteArray());
}
Another approach is to use SequenceInputStream to logically concatenate ByteArrayInputStream instances, each wrapping one element of your list, as follows:
public static InputStream foo(List<byte[]> binary) {
return new SequenceInputStream(
Collections.enumeration(
binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())
)
);
}
The interesting aspect of this approach is that you have no need to copy anything; you only create instances of ByteArrayInputStream that use each byte array as is.
To avoid collecting the result as a List, which has a cost especially if your initial List is big, you can directly call iterator() as proposed by @Holger. We then simply need to convert the iterator into an Enumeration, which can be done with IteratorUtils.asEnumeration(iterator) from Apache Commons Collections. The final code would then be:
public static InputStream foo(List<byte[]> binary) {
return new SequenceInputStream(
IteratorUtils.asEnumeration(
binary.stream().map(ByteArrayInputStream::new).iterator()
)
);
}
Try this.
public static InputStream foo(List<byte[]> binary) {
byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
int pos = 0;
for (byte[] bin : binary) {
int length = bin.length;
System.arraycopy(bin, 0, all, pos, length);
pos += length;
}
return new ByteArrayInputStream(all);
}
Or
public static InputStream foo(List<byte[]> binary) {
int size = binary.size();
return new InputStream() {
int i = 0, j = 0;
@Override
public int read() throws IOException {
if (i >= size) return -1;
if (j >= binary.get(i).length) {
++i;
j = 0;
}
if (i >= size) return -1;
return binary.get(i)[j++];
}
};
}

Read the newly appended file content to an InputStream in Java

I have a writer program that writes a huge serialized Java object (on the order of 1 GB) into a binary file on the local disk at a certain speed. The writer program (implemented in C) is actually a network receiver that receives the bytes of the serialized object from a remote server. The implementation of the writer is fixed.
Now, I want to implement a Java reader program that reads the file and deserializes it into a Java object. Since the file can be very large, it would help to reduce the latency of deserialization. In particular, I want the Java reader to start reading and deserializing as soon as the first byte of the object has been written to the file, so the reader can work even before the entire serialized object has been written. The reader knows the size of the file ahead of time (before the first byte is written).
I think what I need is something like a blocking file InputStream that blocks when it reaches end-of-file before it has read the expected number of bytes (the known file size), and resumes whenever new bytes are appended to the file. However, FileInputStream in Java does not support this.
I probably also need a file listener monitoring the changes made to the file to achieve this.
I am wondering whether any existing solution/library/package achieves this. The question may be similar to ones about monitoring log files.
The flow of the bytes is like this:
FileInputStream -> SequenceInputStream -> BufferedInputStream -> JavaSerializer
You need two threads: Thread1 to download from the server and write to a File, and Thread2 to read the File as it becomes available.
Both threads should share a single RandomAccessFile, so access to the OS file can be synchronized correctly. You could use a wrapper class like this:
public class ReadWriteFile {
ReadWriteFile(File f, long size) throws IOException {
_raf = new RandomAccessFile(f, "rw");
_size = size;
_writer = new OutputStream() {
@Override
public void write(int b) throws IOException {
write(new byte[] {
(byte)b
});
}
@Override
public void write(byte[] b, int off, int len) throws IOException {
if (len < 0)
throw new IllegalArgumentException();
synchronized (_raf) {
_raf.seek(_nw);
_raf.write(b, off, len);
_nw += len;
_raf.notify();
}
}
};
}
void close() throws IOException {
_raf.close();
}
InputStream reader() {
return new InputStream() {
@Override
public int read() throws IOException {
if (_pos >= _size)
return -1;
byte[] b = new byte[1];
if (read(b, 0, 1) != 1)
throw new IOException();
return b[0] & 255;
}
@Override
public int read(byte[] buff, int off, int len) throws IOException {
synchronized (_raf) {
while (true) {
if (_pos >= _size)
return -1;
if (_pos >= _nw) {
try {
_raf.wait();
continue;
} catch (InterruptedException ex) {
throw new IOException(ex);
}
}
_raf.seek(_pos);
len = (int)Math.min(len, _nw - _pos);
int nr = _raf.read(buff, off, len);
_pos += Math.max(0, nr);
return nr;
}
}
}
private long _pos;
};
}
OutputStream writer() {
return _writer;
}
private final RandomAccessFile _raf;
private final long _size;
private final OutputStream _writer;
private long _nw;
}
The following code shows how to use ReadWriteFile from two threads:
public static void main(String[] args) throws Exception {
File f = new File("test.bin");
final long size = 1024;
final ReadWriteFile rwf = new ReadWriteFile(f, size);
Thread t1 = new Thread("Writer") {
public void run() {
try {
OutputStream w = new BufferedOutputStream(rwf.writer(), 16);
for (int i = 0; i < size; i++) {
w.write(i);
sleep(1);
}
System.out.println("Write done");
w.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
};
Thread t2 = new Thread("Reader") {
public void run() {
try {
InputStream r = new BufferedInputStream(rwf.reader(), 13);
for (int i = 0; i < size; i++) {
int b = r.read();
assert (b == (i & 255));
}
int eof = r.read();
assert (eof == -1);
r.close();
System.out.println("Read done");
} catch (IOException ex) {
ex.printStackTrace();
}
}
};
t1.start();
t2.start();
t1.join();
t2.join();
rwf.close();
}

How to read large text files efficiently in Java

Here, I am reading an 18 MB file and storing it in a two-dimensional array. But this program takes almost 15 minutes to run. Is there any way to optimize its running time? The file contains only binary values. Thanks in advance…
public class test
{
public static void main(String[] args) throws FileNotFoundException, IOException
{
BufferedReader br;
FileReader fr=null;
int m = 2160;
int n = 4320;
int[][] lof = new int[n][m];
String filename = "D:/New Folder/ETOPOCHAR";
try {
Scanner input = new Scanner(new File("D:/New Folder/ETOPOCHAR"));
double range_km=1.0;
double alonn=-57.07; //180 to 180
double alat=38.53;
while (input.hasNextLine()) {
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
try
{
lof[j][i] = input.nextInt();
System.out.println("value[" + j + "][" + i + "] = "+ lof[j][i]);
}
catch (java.util.NoSuchElementException e) {
// e.printStackTrace();
}
}
} //print the input matrix
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
I have also tried a byte array, but I cannot save it into a two-dimensional array...
public class FileToArrayOfBytes
{
public static void main( String[] args )
{
FileInputStream fileInputStream=null;
File file = new File("name of file");
byte[] bFile = new byte[(int) file.length()];
try {
//convert file into array of bytes
fileInputStream = new FileInputStream(file);
fileInputStream.read(bFile);
fileInputStream.close();
for (int i = 0; i < bFile.length; i++) {
System.out.print((char)bFile[i]);
}
System.out.println("Done");
}catch(Exception e){
e.printStackTrace();
}
}
}
You can read the file into a byte array first, then deserialize these bytes. Start with a 2048-byte input buffer, then experiment by increasing/decreasing its size, but the experimental buffer sizes should be powers of two (512, 1024, 2048, etc.).
As far as I remember, there is a good chance that the best performance is achieved with a buffer of 2048 bytes, but this is OS-dependent and should be verified.
Code sample (here you can try different values of the BUFFER_SIZE variable; in my case I read a 7.5M test file in less than one second):
public static void main(String... args) throws IOException {
File f = new File(args[0]);
final int BUFFER_SIZE = 2048; // starting point; experiment with powers of two
byte[] buffer = new byte[BUFFER_SIZE];
ByteBuffer result = ByteBuffer.allocateDirect((int) f.length());
try (FileInputStream fis = new FileInputStream(f)) {
int bytesRead;
int totalBytesRead = 0;
while ((bytesRead = fis.read(buffer, 0, BUFFER_SIZE)) != -1) {
result.put(buffer, 0, bytesRead);
totalBytesRead += bytesRead;
}
// debug info
System.out.printf("Read %d bytes\n", totalBytesRead);
// Here you can do whatever you want with the result, including creation of a 2D array...
int pos = result.position();
result.rewind();
for (int i = 0; i < pos / 4; i++) {
System.out.println(result.getInt());
}
}
}
Take your time and read docs for java.io, java.nio packages as well as Scanner class, just to improve understanding.
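If you then want the two-dimensional array from the question, something like the following works on the filled ByteBuffer; I am assuming one byte per value here, which is an assumption to check against the actual file format (swap get() for getShort()/getInt() as needed):

int m = 2160, n = 4320;           // dimensions from the question
int[][] lof = new int[n][m];
result.rewind();                  // back to the start after the read loop above
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        lof[j][i] = result.get(); // one byte per value (assumption); widens to int
    }
}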

What's a more memory-efficient way of taking part of a byte[]?

I've got a ByteArrayOutputStream of stereo audio data. Currently I'm doing this, which I know is bad:
WaveFileWriter wfw = new WaveFileWriter();
AudioFormat format = new AudioFormat(Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
byte[] audioData = dataout.toByteArray(); //bad bad bad
int length = audioData.length;
byte[] monoData = new byte[length/2]; //bad bad bad
for(int i = 0; i < length; i+=4){
monoData[i/2] = audioData[i];
monoData[1+i/2] = audioData[i+1];
}
ByteArrayInputStream bais = new ByteArrayInputStream(monoData);
AudioInputStream outStream = new AudioInputStream(bais,format,length);
wfw.write(outStream, Type.WAVE,output);
What's a better way of doing this? Can I convert the ByteArrayOutputStream into a ByteArrayInputStream so that I can read from it?
Edit
Ok so I've dug into the class that's giving me the ByteArrayOutputStream I'm working with. It's being populated with a call to:
dataout.write(convbuffer, 0, 2 * vi.channels * bout);
I can swap this out for something else if it'll help, but what should I use?
I tried replacing it with:
for(int j = 0;j < bout; j += 2){
dataout.write(convbuffer,2*j,2);
}
but that didn't work, not sure why.
Can't you read the audio data one sample at a time, and write the samples to the file as you read them?
Also, it seems that your current code overwrites monoData pointlessly (thanks for the correction, @fredley).
State what you're doing in plain English first; this will help you understand it, and then turn it into code.
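A sketch of that streaming idea, assuming 16-bit stereo PCM (4 bytes per frame) and that keeping only the left channel is what you want, as in your loop; stereoIn and monoOut are hypothetical names for your existing source and sink:

static void stereoToMono(InputStream stereoIn, OutputStream monoOut) throws IOException {
    byte[] frame = new byte[4];     // one stereo frame: left sample + right sample
    int n, off = 0;
    while ((n = stereoIn.read(frame, off, 4 - off)) != -1) {
        off += n;
        if (off == 4) {                 // a full frame has been assembled
            monoOut.write(frame, 0, 2); // keep the left sample, drop the right
            off = 0;
        }
    }
}

As for the loop in your edit: with 4 bytes per stereo frame, the left sample of frame j starts at byte offset 4*j, so writing at offset 2*j while stepping j by 2 only covers half the frames. Looping j from 0 to bout with dataout.write(convbuffer, 4*j, 2) should match the intent.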
This is what I use instead of the vanilla ByteArrayOutputStream. You get a handy toByteArrayInputStream() + toByteBuffer() (I tend to use ByteBuffers quite a lot).
Hopefully many will find the code below useful; some methods have been removed from the original class.
Cheers!
public class ByteBufStream extends ByteArrayOutputStream implements Serializable{
private static final long serialVersionUID = 1L;
public ByteBufStream(int initSize){
super(initSize);
}
//+few more c-tors, skipped here
public ByteArrayInputStream toByteArrayInputStream(){
return new ByteArrayInputStream(getBuf(),0, count);
}
public ByteBuffer toByteBuffer(){
return ByteBuffer.wrap(getBuf(), 0 , count);
}
public int capacity(){
return buf.length;
}
public byte[] getBuf(){
return buf;
}
public final int size() {
return count;
}
private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException{
out.defaultWriteObject();
out.writeInt(capacity());
out.writeInt(size());
writeTo(out);
}
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException{
in.defaultReadObject();
int capacity = in.readInt();
int size = in.readInt();
byte[] b = new byte[capacity];
for (int n=0;n<size;){
int read = in.read(b, n, size-n);
if (read<0) throw new StreamCorruptedException("can't read buf w/ size:"+size);
n+=read;
}
this.buf = b;
this.count = size;
}
}
While I generally refrain from teaching hacks, this one is probably harmless, have fun!
If you want to steal the buf[] off a vanilla ByteArrayOutputStream, look at the following method...
public synchronized void writeTo(OutputStream out) throws IOException {
out.write(buf, 0, count);
}
I guess you know what you need to do now:
class ByteArrayOutputStreamHack extends OutputStream{
public ByteArrayInputStream in;
public void write(byte b[], int off, int len) {
in = new ByteArrayInputStream(b, off, len);
}
public void write(int b){
throw new AssertionError();
}
}
ByteArrayOutputStreamHack hack = new ByteArrayOutputStreamHack();
byteArrayOutputStream.writeTo(hack);
ByteArrayInputStream in = hack.in; //we done, we cool :)
Like new ByteArrayInputStream(dataout.toByteArray())?

CharBuffer vs. char[]

Is there any reason to prefer a CharBuffer to a char[] in the following:
CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
while( in.read(buf) >= 0 ) {
out.append( buf.flip() );
buf.clear();
}
vs.
char[] buf = new char[DEFAULT_BUFFER_SIZE];
int n;
while( (n = in.read(buf)) >= 0 ) {
out.write( buf, 0, n );
}
(where in is a Reader and out is a Writer)?
No, there's really no reason to prefer a CharBuffer in this case.
In general, though, CharBuffer (and ByteBuffer) can really simplify APIs and encourage correct processing. If you were designing a public API, it's definitely worth considering a buffer-oriented API.
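To make that concrete: with a buffer-oriented method, the position and limit travel with the buffer, so the caller has one less pair of parameters to get right. Compare the two read signatures the JDK itself declares on Reader and Readable:

// Array-oriented (java.io.Reader): the caller tracks offset and length separately.
public int read(char[] cbuf, int off, int len) throws IOException;

// Buffer-oriented (java.lang.Readable): the CharBuffer carries its own position/limit.
public int read(CharBuffer cb) throws IOException;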
I wanted to mini-benchmark this comparison.
Below is the class I have written.
The thing is, I can't believe that the CharBuffer performed so badly. What have I got wrong?
EDIT: Since the 11th comment below, I have edited the code and the output times; performance is better all round, but there is still a significant difference. I also tried the out2.append((CharBuffer)buff.flip()) option mentioned in the comments, but it was much slower than the write option used in the code below.
Results: (time in ms)
char[] : 3411
CharBuffer: 5653
public class CharBufferScratchBox
{
public static void main(String[] args) throws Exception
{
// Some Setup Stuff
String smallString =
"1111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000";
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
stringBuilder.append(smallString);
}
String string = stringBuilder.toString();
int DEFAULT_BUFFER_SIZE = 1000;
int ITERATIONS = 10000;
// char[]
StringReader in1 = null;
StringWriter out1 = null;
Date start = new Date();
for (int i = 0; i < ITERATIONS; i++)
{
in1 = new StringReader(string);
out1 = new StringWriter(string.length());
char[] buf = new char[DEFAULT_BUFFER_SIZE];
int n;
while ((n = in1.read(buf)) >= 0)
{
out1.write(
buf,
0,
n);
}
}
Date done = new Date();
System.out.println("char[] : " + (done.getTime() - start.getTime()));
// CharBuffer
StringReader in2 = null;
StringWriter out2 = null;
start = new Date();
CharBuffer buff = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
for (int i = 0; i < ITERATIONS; i++)
{
in2 = new StringReader(string);
out2 = new StringWriter(string.length());
int n;
while ((n = in2.read(buff)) >= 0)
{
out2.write(
buff.array(),
0,
n);
buff.clear();
}
}
done = new Date();
System.out.println("CharBuffer: " + (done.getTime() - start.getTime()));
}
}
If this is the only thing you're doing with the buffer, then the array is probably the better choice in this instance.
CharBuffer has lots of extra chrome on it, but none of it is relevant in this case - and will only slow things down a fraction.
You can always refactor later if you need to make things more complicated.
The difference, in practice, is actually <10%, not 30% as others are reporting.
Reading and writing a 5MB file 24 times, my numbers, taken using a profiler, were on average:
char[] = 4139 ms
CharBuffer = 4466 ms
ByteBuffer = 938 (direct) ms
Individual tests a couple times favored CharBuffer.
I also tried replacing the File-based IO with In-Memory IO and the performance was similar. If you are trying to transfer from one native stream to another, then you are better off using a "direct" ByteBuffer.
With less than a 10% performance difference in practice, I would favor the CharBuffer. Its syntax is clearer, there are fewer extraneous variables, and you can do more direct manipulation on it (i.e. anything that asks for a CharSequence).
The benchmark is below... it is slightly flawed, as the BufferedReader is allocated inside the test method rather than outside; however, the example below lets you isolate the IO time and eliminate factors such as a string or byte stream resizing its internal memory buffer, etc.
public static void main(String[] args) throws Exception {
File f = getBytes(5000000);
System.out.println(f.getAbsolutePath());
try {
System.gc();
List<Main> impls = new java.util.ArrayList<Main>();
impls.add(new CharArrayImpl());
//impls.add(new CharArrayNoBuffImpl());
impls.add(new CharBufferImpl());
//impls.add(new CharBufferNoBuffImpl());
impls.add(new ByteBufferDirectImpl());
//impls.add(new CharBufferDirectImpl());
for (int i = 0; i < 25; i++) {
for (Main impl : impls) {
test(f, impl);
}
System.out.println("-----");
if(i==0)
continue; //reset profiler
}
System.gc();
System.out.println("Finished");
return;
} finally {
f.delete();
}
}
static int BUFFER_SIZE = 1000;
static File getBytes(int size) throws IOException {
File f = File.createTempFile("input", ".txt");
FileWriter writer = new FileWriter(f);
Random r = new Random();
for (int i = 0; i < size; i++) {
writer.write(Integer.toString(5));
}
writer.close();
return f;
}
static void test(File f, Main impl) throws IOException {
InputStream in = new FileInputStream(f);
File fout = File.createTempFile("output", ".txt");
try {
OutputStream out = new FileOutputStream(fout, false);
try {
long start = System.currentTimeMillis();
impl.runTest(in, out);
long end = System.currentTimeMillis();
System.out.println(impl.getClass().getName() + " = " + (end - start) + "ms");
} finally {
out.close();
}
} finally {
fout.delete();
in.close();
}
}
public abstract void runTest(InputStream ins, OutputStream outs) throws IOException;
public static class CharArrayImpl extends Main {
char[] buff = new char[BUFFER_SIZE];
public void runTest(InputStream ins, OutputStream outs) throws IOException {
Reader in = new BufferedReader(new InputStreamReader(ins));
Writer out = new BufferedWriter(new OutputStreamWriter(outs));
int n;
while ((n = in.read(buff)) >= 0) {
out.write(buff, 0, n);
}
}
}
public static class CharBufferImpl extends Main {
CharBuffer buff = CharBuffer.allocate(BUFFER_SIZE);
public void runTest(InputStream ins, OutputStream outs) throws IOException {
Reader in = new BufferedReader(new InputStreamReader(ins));
Writer out = new BufferedWriter(new OutputStreamWriter(outs));
int n;
while ((n = in.read(buff)) >= 0) {
buff.flip();
out.append(buff);
buff.clear();
}
}
}
public static class ByteBufferDirectImpl extends Main {
ByteBuffer buff = ByteBuffer.allocateDirect(BUFFER_SIZE * 2);
public void runTest(InputStream ins, OutputStream outs) throws IOException {
ReadableByteChannel in = Channels.newChannel(ins);
WritableByteChannel out = Channels.newChannel(outs);
int n;
while ((n = in.read(buff)) >= 0) {
buff.flip();
out.write(buff);
buff.clear();
}
}
}
I think that CharBuffer and ByteBuffer (as well as the other xBuffer classes) were meant for reusability, so you can buf.clear() them instead of going through reallocation every time.
If you don't reuse them, you're not using their full potential, and it will add extra overhead. However, if you're planning on scaling this function, it might be a good idea to keep them there.
You should avoid CharBuffer in recent Java versions; there is a bug in subSequence(). You cannot get a subsequence from the second half of the buffer, since the implementation confuses capacity and remaining. I observed the bug in Java 6.0_11 and 6.0_12.
The CharBuffer version is slightly less complicated (one fewer variable), encapsulates buffer-size handling, and makes use of a standard API. Generally I would prefer this.
However, there is still one good reason to prefer the array version, in some cases at least: CharBuffer was only introduced in Java 1.4, so if you are deploying to an earlier version you can't use CharBuffer (unless you roll your own or use a backport).
P.S. If you use a backport, remember to remove it once you catch up to the version containing the "real" version of the backported code.
