I have a requirement to read and write compressed (gzip/brotli) streams without intermediate storage. The data arrives from the underlying layer as Flux<ByteBuffer>. The data is large enough that buffering it is out of the question. How do I compress a Flux<ByteBuffer> on the fly without having to store the full data either in memory or on disk?
You want to avoid buffering the full data, but you can compress each ByteBuffer chunk on its own or, if your chunks are sufficiently small, consolidate chunks into groups and then compress each group.
This does not require much memory, yet it still compresses your data.
The actual compression ratio depends on the content of your source data and on the number of chunks consolidated before compressing. You can tune that number to get the best ratio.
An example of possible code is below:
public class Test_GzipFlux {
    /**
     * Returns Flux of gzip-ed buffers after (optional) buffer consolidation
     * @param inFlux input stream of buffers
     * @param consolidatedBufCount number of buffers to consolidate before gzip-ing
     */
    public static Flux<ByteBuffer> gzipFlux(Flux<ByteBuffer> inFlux,
            int consolidatedBufCount, int outChunkMaxLength) {
        return inFlux.buffer(consolidatedBufCount)
                .map(inList -> zipBuffers(inList, outChunkMaxLength));
    }

    /**
     * Consolidates buffers from the input list, applies gzip, returns the result as a single buffer
     * @param inList portion of chunks to be consolidated
     * @param outChunkMaxLength estimated length of the output chunk.
     *        !!! to avoid a pipe deadlock, this length must be sufficient
     *        !!! for the consolidated data after gzip
     */
    private static ByteBuffer zipBuffers(List<ByteBuffer> inList, int outChunkMaxLength) {
        try {
            PipedInputStream pis = new PipedInputStream(outChunkMaxLength);
            GZIPOutputStream gos = new GZIPOutputStream(new PipedOutputStream(pis));
            for (var buf : inList) {
                gos.write(buf.array());
            }
            gos.close();
            byte[] outBytes = new byte[pis.available()];
            pis.read(outBytes);
            pis.close();
            return ByteBuffer.wrap(outBytes);
        } catch (IOException e) {
            throw new RuntimeException(e.getMessage(), e);
        }
    }

    private static void test() {
        int inLength = ... // actual full length of source data
        Flux<ByteBuffer> source = ... // your source Flux
        // these are parameters for your adjustment
        int consolidationCount = 5;
        int outChunkMaxLength = 30 * 1024;
        Flux<ByteBuffer> result = gzipFlux(source, consolidationCount, outChunkMaxLength);
        int outLen = result.reduce(0, (res, bb) -> res + bb.array().length).block();
        System.out.println("ratio=" + (double) inLength / outLen);
    }
}
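As a side note, the piped-stream trick above only works when outChunkMaxLength is large enough to hold the whole gzipped group; otherwise the write blocks and the pipe deadlocks, as the comment warns. Below is a sketch of a variant that avoids that sizing problem by gzipping into a growable ByteArrayOutputStream. This is my own suggestion rather than the original answer's code; it assumes the incoming buffers are readable via get(), so it works for both heap and direct buffers.

// Requires java.io.ByteArrayOutputStream, java.io.UncheckedIOException and java.util.zip.GZIPOutputStream.
private static ByteBuffer zipBuffersSimple(List<ByteBuffer> inList) {
    try {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            for (ByteBuffer buf : inList) {
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);          // copies the readable region; works for direct and heap buffers
                gos.write(bytes);
            }
        }                                // closing the GZIPOutputStream writes the gzip trailer
        return ByteBuffer.wrap(bos.toByteArray());
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}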
I have a callback that receives incoming audio data as a FloatBuffer containing 1024 floats and that gets called several times per second. But I need an AudioInputStream, since my system only works with them.
Converting the floats into 16-bit signed PCM audio data is not a problem, but I cannot create an InputStream out of it. The AudioInputStream constructor only accepts data with a known length, but I have a continuous stream. AudioSystem.getAudioInputStream throws a "java.io.IOException: mark/reset not supported" if I feed it a PipedInputStream containing the audio data.
Any ideas?
Here's my current code:
Jack jack = Jack.getInstance();
JackClient client = jack.openClient("Test", EnumSet.noneOf(JackOptions.class), EnumSet.noneOf(JackStatus.class));
JackPort in = client.registerPort("in", JackPortType.AUDIO, EnumSet.of(JackPortFlags.JackPortIsInput));
PipedInputStream pin = new PipedInputStream(1024 * 1024 * 1024);
PipedOutputStream pout = new PipedOutputStream(pin);
client.setProcessCallback(new JackProcessCallback() {
public boolean process(JackClient client, int nframes) {
FloatBuffer inData = in.getFloatBuffer();
byte[] buffer = new byte[inData.capacity() * 2];
for (int i = 0; i < inData.capacity(); i++) {
int sample = Math.round(inData.get(i) * 32767);
buffer[i * 2] = (byte) sample;
buffer[i * 2 + 1] = (byte) (sample >> 8);
}
try {
pout.write(buffer, 0, buffer.length);
} catch (IOException e) {
e.printStackTrace();
}
return true;
}
});
client.activate();
client.transportStart();
Thread.sleep(10000);
client.transportStop();
client.close();
AudioInputStream audio = AudioSystem.getAudioInputStream(new BufferedInputStream(pin, 1024 * 1024 * 1024));
AudioSystem.write(audio, Type.WAVE, new File("test.wav"));
It uses the JnaJack library, but it doesn't really matter where the data comes from. The conversion to bytes is fine, by the way: writing that data directly to a SourceDataLine works correctly. But I need the data as an AudioInputStream.
AudioSystem.getAudioInputStream expects a stream which conforms to a supported AudioFileFormat, which means it must conform to a known type. From the documentation:
The stream must point to valid audio file data.
And also from that documentation:
The implementation of this method may require multiple parsers to examine the stream to determine whether they support it. These parsers must be able to mark the stream, read enough data to determine whether they support the stream, and reset the stream's read pointer to its original position. If the input stream does not support these operation, this method may fail with an IOException.
You can create your own AudioInputStream using the three-argument constructor. If the length is not known, it can be specified as AudioSystem.NOT_SPECIFIED. Frustratingly, neither the constructor documentation nor the class documentation mentions this, but the other constructor's documentation does.
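For example, a minimal sketch. The format values are assumptions: match the sample rate to what JACK actually reports; the byte layout corresponds to the little-endian 16-bit conversion in the callback above.

AudioFormat format = new AudioFormat(44100f, 16, 1, true, false); // sample rate, bits, channels, signed, big-endian = false
AudioInputStream audio = new AudioInputStream(pin, format, AudioSystem.NOT_SPECIFIED);
// 'audio' can then be handed to whatever consumes AudioInputStreams, e.g. AudioSystem.write(...).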
I am having trouble using the NIO MappedByteBuffer function to read very large seismic files. The format my program reads is called SEGY and consists of seismic data samples as well as meta data regarding, among other items, the numeric ID and XY coordinates of the seismic data.
The structure of the format is fairly fixed with a 240 byte header followed by a fixed number of data samples making up each seismic trace. The number of samples per trace can vary from file to file but usually is around 1000 to 2000.
Samples can be written as single bytes, 16 or 32 bit integers, or either IBM or IEEE float. The data in each trace header can likewise be in any of the above formats. To further confuse the issue SEGY files can be in big or little endian byte order.
The files can range in size from 3600 bytes up to several terabytes.
My application is a SEGY editor and viewer. For many of the functions it performs I must read only one or two variables, say long ints from each trace header.
At present I am reading from a RandomAccessFile into a byte buffer, then extracting the needed variables from a view buffer. This works but is painfully slow for very large files.
I have written a new file handler using a mapped byte buffer that breaks the file into 5000 trace MappedByteBuffers. This works well and is very fast until my system runs low on memory and then it slows to a crawl and I am forced to reboot just to make my Mac useable again.
For some reason the memory from the buffers is never released, even after my program is finished. I need to either do a purge or reboot.
This is my code. Any suggestions would be most appreciated.
package MyFileHandler;
import java.io.*;
import java.nio.*;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
public class MyFileHandler
{
/*
A buffered file IO class that keeps NTRACES traces in memory for reading and writing.
the buffers start and end at trace boundaries and the buffers are sequential
i.e 1-20000,20001-40000, etc
The last, or perhaps only buffer will contain less than NTRACES up to the last trace
The arrays BufferOffsets and BufferLengths contain the start and length for all the
buffers required to read and write to the file
*/
private static int NTRACES = 5000;
private boolean HighByte;
private long FileSize;
private int BytesPerTrace;
private FileChannel FileChnl;
private MappedByteBuffer Buffer;
private long BufferOffset;
private int BufferLength;
private long[] BufferOffsets;
private int[] BufferLengths;
private RandomAccessFile Raf;
private int BufferIndex;
private ArrayList Maps;
public MyFileHandler(RandomAccessFile raf, int bpt)
{
try
{
HighByte = true;
// allocate a filechannel to the file
FileChnl = raf.getChannel();
FileSize = FileChnl.size();
BytesPerTrace = bpt;
SetUpBuffers();
BufferIndex = 0;
GetNewBuffer(0);
} catch (IOException ioe)
{
ioe.printStackTrace();
}
}
private void SetUpBuffers()
{
// get number of traces in entire file
int ntr = (int) ((FileSize - 3600) / BytesPerTrace);
int nbuffs = ntr / NTRACES;
// add one to nbuffs unless filesize is multiple of NTRACES
if (Math.IEEEremainder(ntr, NTRACES) != 0)
{
nbuffs++;
}
BufferOffsets = new long[nbuffs];
BufferLengths = new int[nbuffs];
// BuffOffset are in bytes, not trace numbers
//get the offsets and lengths of each buffer
for (int i = 0; i < nbuffs; i++)
{
if (i == 0)
{
// first buffer contains EBCDIC header 3200 bytes and binary header 400 bytes
BufferOffsets[i] = 0;
BufferLengths[i] = 3600 + (Math.min(ntr, NTRACES) * BytesPerTrace);
} else
{
BufferOffsets[i] = BufferOffsets[i - 1] + BufferLengths[i - 1];
BufferLengths[i] = (int) (Math.min(FileSize - BufferOffsets[i], NTRACES * BytesPerTrace));
}
}
GetMaps();
}
private void GetMaps()
{
// map the file to list of MappedByteBuffer
Maps = new ArrayList(BufferOffsets.length);
try
{
for(int i=0;i<BufferOffsets.length;i++)
{
MappedByteBuffer map = FileChnl.map(FileChannel.MapMode.READ_WRITE, BufferOffsets[i], BufferLengths[i]);
SetByteOrder(map);
Maps.add(map);
}
} catch (IOException ioe)
{
ioe.printStackTrace();
}
}
private void GetNewBuffer(long offset)
{
if (Buffer == null || offset < BufferOffset || offset >= BufferOffset + BufferLength)
{
BufferIndex = GetBufferIndex(offset);
BufferOffset = BufferOffsets[BufferIndex];
BufferLength = BufferLengths[BufferIndex];
Buffer = (MappedByteBuffer)Maps.get(BufferIndex);
}
}
private int GetBufferIndex(long offset)
{
int indx = 0;
for (int i = 0; i < BufferOffsets.length; i++)
{
if (offset >= BufferOffsets[i] && offset < BufferOffsets[i]+BufferLengths[i])
{
indx = i;
break;
}
}
return indx;
}
private void SetByteOrder(MappedByteBuffer ByteBuff)
{
if (HighByte)
{
ByteBuff.order(ByteOrder.BIG_ENDIAN);
} else
{
ByteBuff.order(ByteOrder.LITTLE_ENDIAN);
}
}
// public methods to read, (get) or write (put) an array of types, byte, short, int, or float.
// for sake of brevity only showing get and put for ints
public void Get(int[] buff, long offset)
{
GetNewBuffer(offset);
Buffer.position((int) (offset - BufferOffset));
Buffer.asIntBuffer().get(buff);
}
public void Put(int[] buff, long offset)
{
GetNewBuffer(offset);
Buffer.position((int) (offset - BufferOffset));
Buffer.asIntBuffer().put(buff);
}
public void HighByteOrder(boolean hb)
{
// all byte swapping is done by the buffer class
// set all allocated buffers to same byte order
HighByte = hb;
}
public int GetBuffSize()
{
return BufferLength;
}
public void Close()
{
try
{
FileChnl.close();
} catch (Exception e)
{
e.printStackTrace();
}
}
}
You are mapping the entire file into memory, via a possibly large number of MappedByteBuffers, and since you keep them all in a list they are never released. That is pointless: you may as well map the entire file with a single MappedByteBuffer, or with the minimum number you need to get around the per-buffer address limitation. There is no benefit in using more of them than you need.
But I would only map the segment of the file that is currently being viewed/edited, and release it when the user moves to another segment.
I'm surprised that MappedByteBuffer is found to be so much faster. Last time I tested, reads via mapped byte buffers were only 20% faster than RandomAccessFile, and writes not at all. I'd like to see the RandomAccessFile code, as it seems there is probably something wrong with it that could easily be fixed.
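A minimal sketch of the map-only-what-you-view idea (the names here are illustrative, not taken from the poster's class):

// 'channel' is an open FileChannel for the SEGY file; offset/length describe the segment in view.
MappedByteBuffer mapSegment(FileChannel channel, long offset, long length) throws IOException {
    long safeLength = Math.min(length, channel.size() - offset);   // never map past end of file
    return channel.map(FileChannel.MapMode.READ_WRITE, offset, safeLength);
}
// When the user moves to another segment, simply drop the reference to the old buffer;
// the mapping is released once the buffer is garbage-collected (there is no explicit unmap
// in the public API), which keeps the number of live mappings small.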
I'm working on a project that downloads a file over an HTTP connection. I display a horizontal progress bar showing the download status.
My function looks like this:
.......
try {
InputStream myInput = urlconnect.getInputStream();
BufferedInputStream buffinput = new BufferedInputStream(myInput);
ByteArrayBuffer baf = new ByteArrayBuffer(capacity);
int current = 0;
while((current = buffinput.read()) != -1) {
baf.append((byte) current);
}
File outputfile = new File(createRepertory(app, 0), Filename);
FileOutputStream myOutPut = new FileOutputStream(outputfile);
myOutPut.write(baf.toByteArray());
...
}
I know the size of my file in advance, so I need to track how much has been downloaded so far (in my while block). That way I can determine the status of the progress bar.
progressBarStatus = ((int) downloadFileHttp(url, app) * 100)/sizefile;
long downloadFileHttp(.., ..) is the signature of my function.
I already tried retrieving it with outputfile.length(), but its value is "1"; maybe that's the number of files I'm trying to download.
Is there any way to figure it out?
UPDATE 1
I haven't found any thread that allows me to figure this out. Currently I have a horizontal progress bar which displays only 0 and 100%, without intermediate values. I'm thinking about another approach: if I know the rate of my Wi-Fi connection and the size of the file, I can determine the download time. I know that I can retrieve the information about my Wi-Fi connection and the size of the file to download.
Has anybody already worked on this, or is there a thread about it?
I'll assume that you're using HttpURLConnection, in which case you need to call the getContentLength() method on urlconnect.
However, the server is not required to send a valid content length, so you should be prepared for it to be -1.
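A minimal sketch of how that fits the loop from the question (assuming urlconnect, buffinput and baf are as above, and guarding against a missing Content-Length header):

int sizefile = urlconnect.getContentLength();   // -1 if the server did not send Content-Length
int downloaded = 0;
int current;
while ((current = buffinput.read()) != -1) {
    baf.append((byte) current);
    downloaded++;
    if (sizefile > 0) {
        progressBarStatus = (int) ((downloaded * 100L) / sizefile);   // long math avoids overflow
    }
}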
AsyncTask might be the perfect solution for you:
private class DownloadFileTask extends AsyncTask<URL, Integer, Long> {
protected Long doInBackground(URL... urls) {
URL url = urls[0];
//connect to url here
.......
try {
InputStream myInput = urlconnect.getInputStream();
BufferedInputStream buffinput = new BufferedInputStream(myInput);
ByteArrayBuffer baf = new ByteArrayBuffer(capacity);
int current = 0;
while((current = buffinput.read()) != -1) {
baf.append((byte) current);
//here you can send data to onProgressUpdate
publishProgress((int) (((float)baf.length()/ (float)sizefile) * 100));
}
File outputfile = new File(createRepertory(app, 0), Filename);
FileOutputStream myOutPut = new FileOutputStream(outputfile);
myOutPut.write(baf.toByteArray());
...
}
protected void onProgressUpdate(Integer... progress) {
//here you can set progress bar in UI thread
progressBarStatus = progress[0];
}
}
To start the AsyncTask, call this within your method:
new DownloadFileTask().execute(url);
Simple. Code below:
try {
URL url = new URL(yourLinkofFile);
URLConnection conn = url.openConnection();
conn.connect();
totalFileSize = conn.getContentLength();
} catch (Exception e) {
Log.e(TAG, "ERROR: " + e.toString());
}
Check the Content-Length header on the response. It should be set. All major HTTP servers use this header.
Per the HTTP 1.1 spec, chunked response data is pulled back across multiple rounds. The content length is -1 for a chunked response, so we can't use the available() method on the InputStream. By the way, InputStream.available() is only reliable for getting the content length on a ByteArrayInputStream.
If you just want to get the total length, you need to calculate it yourself on each read round. See the IOUtils class in the Apache commons-io project, as below:
//-----------------------------------------------------------------------
/**
* Copy bytes from an <code>InputStream</code> to an
* <code>OutputStream</code>.
* <p>
* This method buffers the input internally, so there is no need to use a
* <code>BufferedInputStream</code>.
* <p>
* Large streams (over 2GB) will return a bytes copied value of
* <code>-1</code> after the copy has completed since the correct
* number of bytes cannot be returned as an int. For large streams
* use the <code>copyLarge(InputStream, OutputStream)</code> method.
*
* @param input the <code>InputStream</code> to read from
* @param output the <code>OutputStream</code> to write to
* @return the number of bytes copied
* @throws NullPointerException if the input or output is null
* @throws IOException if an I/O error occurs
* @throws ArithmeticException if the byte count is too large
* @since Commons IO 1.1
*/
public static int copy(InputStream input, OutputStream output) throws IOException {
long count = copyLarge(input, output);
if (count > Integer.MAX_VALUE) {
return -1;
}
return (int) count;
}
/**
* Copy bytes from a large (over 2GB) <code>InputStream</code> to an
* <code>OutputStream</code>.
* <p>
* This method buffers the input internally, so there is no need to use a
* <code>BufferedInputStream</code>.
*
* @param input the <code>InputStream</code> to read from
* @param output the <code>OutputStream</code> to write to
* @return the number of bytes copied
* @throws NullPointerException if the input or output is null
* @throws IOException if an I/O error occurs
* @since Commons IO 1.3
*/
public static long copyLarge(InputStream input, OutputStream output)
throws IOException {
byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
long count = 0;
int n = 0;
while (-1 != (n = input.read(buffer))) {
output.write(buffer, 0, n);
count += n;
}
return count;
}
If you want to track download progress, you need a callback on each read round from the InputStream to the OutputStream while copying to disk. In the callback you can add the size of the piece just copied to a counter that your progress bar reads. It is a little bit more involved.
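A rough sketch of that callback idea (the listener interface and names are made up for illustration):

interface ProgressListener {
    void onProgress(long bytesCopied);              // called after each chunk is written
}

static long copyWithProgress(InputStream in, OutputStream out, ProgressListener listener)
        throws IOException {
    byte[] buffer = new byte[8192];
    long count = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
        count += n;
        listener.onProgress(count);                 // e.g. forward to publishProgress(...) in an AsyncTask
    }
    return count;
}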
It sounds like your main problem is not getting the length of the file or figuring out the actual value, but rather how to access the current value from another thread so that you can update the status bar appropriately.
You have a few approaches to solve this:
1.) In your progress bar item have a callback that will let you set the value and call that method each time you update your count in your download thread.
2.) Put the value in some field that is accessible to both threads (potentially not-thread-safe).
If it were me, in my progress bar item I would have a method that would allow updating the progress with some value. Then I would call that method from my thread that is downloading the file.
So basically User --> Clicks some download button --> Handler calls the method to start the download, passing a callback to the update progress bar method --> Downloading thread calls the method on each iterative cycle with the updated percentage complete.
I think you're making your life too complicated :)
First: since progressBarStatus = ((int) downloadFileHttp(url, app) * 100)/sizefile; is always either 0 or 100, you're probably not computing the value correctly. You didn't post the whole method, but don't forget you're dealing with int arithmetic: sizefile is an int, and integer divisions with values up to sizefile will only ever give 0 or 1. I suspect that is the direction you need to look in...
Also, I don't see where in your code you update the progress bar after an intermediate read.
Second: I think it would be more efficient to read in chunks. The read is more efficient and you don't need to notify the UI thread for every downloaded byte. The answer from Adamski in this thread might help you out. Just use a smaller byte array; I usually use 256 bytes (3G) or 512 bytes (Wi-Fi), but maybe you don't need to go into that much detail. Once you've read a new array, add it to the total number of bytes read, notify the UI, and continue reading until the end of the stream.
Third: set progressBar.setMax() to sizeFile before downloading, compute the number of downloaded bytes properly per the comment under "First", and then call setProgress() with that number. Just don't forget to update the progress bar on the UI thread. AsyncTask has a great mechanism to help you with that.
Good luck!
Well this should help you out
URLConnection connection = servletURL.openConnection();
BufferedInputStream buff = new BufferedInputStream(connection.getInputStream());
ObjectInputStream input = new ObjectInputStream(buff);
int avail = buff.available();
System.out.println("Response content size = " + avail);
I have a module that is responsible for reading, processing, and writing bytes to disk. The bytes come in over UDP and, after the individual datagrams are assembled, the final byte array that gets processed and written to disk is typically between 200 bytes and 500,000 bytes. Occasionally there will be byte arrays that, after assembly, are over 500,000 bytes, but these are relatively rare.
I'm currently using FileOutputStream's write(byte[]) method. I'm also experimenting with wrapping the FileOutputStream in a BufferedOutputStream, including using the constructor that accepts a buffer size as a parameter.
It appears that using the BufferedOutputStream is tending toward slightly better performance, but I've only just begun to experiment with different buffer sizes. I only have a limited set of sample data to work with (two data sets from sample runs that I can pipe through my application). Is there a general rule-of-thumb that I might be able to apply to try to calculate the optimal buffer sizes to reduce disk writes and maximize the performance of the disk writing given the information that I know about the data I'm writing?
BufferedOutputStream helps when the writes are smaller than the buffer size, e.g. 8 KB. For larger writes it doesn't help, nor does it make things much worse. If ALL your writes are larger than the buffer size, or you always flush() after every write, I would not use a buffer. However, if a good portion of your writes are less than the buffer size and you don't flush() every time, it's worth having.
You may find increasing the buffer size to 32 KB or larger gives you a marginal improvement, or make it worse. YMMV
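For instance, given the write sizes described in the question (mostly a few hundred bytes, occasionally very large), a buffered wrapper is a reasonable default. A minimal sketch, with assumed names and sizes:

// 'assembledDatagrams' is a stand-in for however the reassembled byte arrays arrive.
try (OutputStream out = new BufferedOutputStream(
        new FileOutputStream("datagrams.bin"), 32 * 1024)) {
    for (byte[] assembled : assembledDatagrams) {
        out.write(assembled);   // small arrays get coalesced; arrays >= the buffer size bypass the buffer
    }
}                               // try-with-resources flushes and closes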
You might find the code for BufferedOutputStream.write useful
/**
* Writes <code>len</code> bytes from the specified byte array
* starting at offset <code>off</code> to this buffered output stream.
*
* <p> Ordinarily this method stores bytes from the given array into this
* stream's buffer, flushing the buffer to the underlying output stream as
* needed. If the requested length is at least as large as this stream's
* buffer, however, then this method will flush the buffer and write the
* bytes directly to the underlying output stream. Thus redundant
* <code>BufferedOutputStream</code>s will not copy data unnecessarily.
*
* @param b the data.
* @param off the start offset in the data.
* @param len the number of bytes to write.
* @exception IOException if an I/O error occurs.
*/
public synchronized void write(byte b[], int off, int len) throws IOException {
if (len >= buf.length) {
/* If the request length exceeds the size of the output buffer,
flush the output buffer and then write the data directly.
In this way buffered streams will cascade harmlessly. */
flushBuffer();
out.write(b, off, len);
return;
}
if (len > buf.length - count) {
flushBuffer();
}
System.arraycopy(b, off, buf, count, len);
count += len;
}
I have lately been trying to explore IO performance. From what I have observed, directly writing to a FileOutputStream has led to better results, which I attribute to FileOutputStream's native call for write(byte[], int, int). Moreover, I have also observed that when BufferedOutputStream's latency begins to converge towards that of the direct FileOutputStream, it fluctuates a lot more, i.e. it can abruptly even double (I haven't yet been able to find out why).
P.S. I am using Java 8 and cannot comment right now on whether my observations hold for previous Java versions.
Here's the code I tested, where my input was a ~10KB file
public class WriteCombinationsOutputStreamComparison {
private static final Logger LOG = LogManager.getLogger(WriteCombinationsOutputStreamComparison.class);
public static void main(String[] args) throws IOException {
final BufferedInputStream input = new BufferedInputStream(new FileInputStream("src/main/resources/inputStream1.txt"), 4*1024);
final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int data = input.read();
while (data != -1) {
byteArrayOutputStream.write(data); // everything comes in memory
data = input.read();
}
final byte[] bytesRead = byteArrayOutputStream.toByteArray();
input.close();
/*
* 1. WRITE USING A STREAM DIRECTLY with entire byte array --> FileOutputStream directly uses a native call and writes
*/
try (OutputStream outputStream = new FileOutputStream("src/main/resources/outputStream1.txt")) {
final long begin = System.nanoTime();
outputStream.write(bytesRead);
outputStream.flush();
final long end = System.nanoTime();
LOG.info("Total time taken for file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
if (LOG.isDebugEnabled()) {
LOG.debug("File reading result was: \n" + new String(bytesRead, Charset.forName("UTF-8")));
}
}
/*
* 2. WRITE USING A BUFFERED STREAM, write entire array
*/
// changed the buffer size to different combinations --> write latency fluctuates a lot for same buffer size over multiple runs
try (BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream("src/main/resources/outputStream1.txt"), 16*1024)) {
final long begin = System.nanoTime();
outputStream.write(bytesRead);
outputStream.flush();
final long end = System.nanoTime();
LOG.info("Total time taken for buffered file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
if (LOG.isDebugEnabled()) {
LOG.debug("File reading result was: \n" + new String(bytesRead, Charset.forName("UTF-8")));
}
}
}
}
OUTPUT:
2017-01-30 23:38:59.064 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for file write, writing entire array [nanos=100990], [bytesWritten=11059]
2017-01-30 23:38:59.086 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for buffered file write, writing entire array [nanos=142454], [bytesWritten=11059]
Here's the code:
public static void mergeAllFilesJavolution()throws FileNotFoundException, IOException {
String fileDir = "C:\\TestData\\w12";
File dirSrc = new File(fileDir);
File[] list = dirSrc.listFiles();
long start = System.currentTimeMillis();
for(int j=0; j<list.length; j++){
int chr;
String srcFile = list[j].getPath();
String outFile = fileDir + "\\..\\merged.txt";
UTF8StreamReader inFile=new UTF8StreamReader().setInput(new FileInputStream(srcFile));
UTF8StreamWriter outPut=new UTF8StreamWriter().setOutput(new FileOutputStream(outFile, true));
while((chr=inFile.read()) != -1) {
outPut.write(chr);
}
outPut.close();
inFile.close();
}
System.out.println(System.currentTimeMillis()-start);
}
The UTF-8 test file is 200 MB, but there is a high possibility of files of 800 MB and up.
Here's the UTF8StreamReader.read() source code.
/**
* Holds the bytes buffer.
*/
private final byte[] _bytes;
/**
* Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).
*/
public UTF8StreamReader() {
_bytes = new byte[2048];
}
/**
* Reads a single character. This method will block until a character is
* available, an I/O error occurs or the end of the stream is reached.
*
* @return the 31-bits Unicode of the character read, or -1 if the end of
* the stream has been reached.
* @throws IOException if an I/O error occurs.
*/
public int read() throws IOException {
byte b = _bytes[_start];
return ((b >= 0) && (_start++ < _end)) ? b : read2();
}
The error occurs at _bytes[_start] because the _bytes = new byte[2048].
Here's another UTF8StreamReader constructor:
/**
* Creates a UTF-8 reader having a byte buffer of specified capacity.
*
* @param capacity the capacity of the byte buffer.
*/
public UTF8StreamReader(int capacity) {
_bytes = new byte[capacity];
}
Problem: How can I specify the correct capacity of _bytes when creating the UTF8StreamReader?
I tried File.length(), but it returns a long (which I think is right, because I am expecting huge file sizes), while the constructor only accepts an int.
Any guidance in the right direction is appreciated.
It seems nobody has experienced the same situation yet.
Anyway, I tried another solution: instead of the class above (UTF8StreamReader) I used a ByteBuffer-based reader (UTF8ByteBufferReader). It is incredibly faster than the stream reader.
Faster Merging Files by using ByteBuffer
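For reference, a byte-level merge that skips character decoding entirely (plain NIO channel transfers, not the Javolution classes mentioned above) looks roughly like this. Since the files are only concatenated, the UTF-8 content passes through untouched; error handling (e.g. listFiles() returning null) is omitted for brevity.

public static void mergeAllFilesNio(File dirSrc, File outFile) throws IOException {
    try (FileChannel out = new FileOutputStream(outFile, true).getChannel()) {
        for (File src : dirSrc.listFiles()) {
            try (FileChannel in = new FileInputStream(src).getChannel()) {
                long pos = 0, size = in.size();
                while (pos < size) {
                    pos += in.transferTo(pos, size - pos, out);   // may transfer less than requested, so loop
                }
            }
        }
    }
}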