I have a requirement to read and write compressed (gzip/brotli) streams without intermediate storage. The data is received from the underlying layer as a Flux<ByteBuffer>. The data is large enough that buffering it is not an option. How do I compress a Flux<ByteBuffer> on the fly without storing the full data either in memory or on disk?
You want to avoid buffering the full data, but you can gzip each ByteBuffer chunk individually or, if your chunks are small, consolidate them into groups and gzip each group.
This does not require much memory, but still compresses your data.
The actual compression ratio depends on the content of your source data and on the number of chunks consolidated before compressing. I think you can tune it manually to get the best ratio.
A possible implementation is below:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.zip.GZIPOutputStream;
import reactor.core.publisher.Flux;
public class Test_GzipFlux {
    /**
     * Returns Flux of gzip-ed buffers after (optional) buffer consolidation
     * @param inFlux input stream of buffers
     * @param consolidatedBufCount number of buffers to consolidate before gzip-ing
     * @param outChunkMaxLength estimated length of an output chunk (initial capacity only)
     */
    public static Flux<ByteBuffer> gzipFlux(Flux<ByteBuffer> inFlux,
            int consolidatedBufCount, int outChunkMaxLength) {
        return inFlux.buffer(consolidatedBufCount)
                .map(inList -> zipBuffers(inList, outChunkMaxLength));
    }
    /**
     * Consolidates buffers from input list, applies gzip, returns result as single buffer.
     * Each result is a complete gzip member; concatenated, they form a valid
     * multi-member gzip stream (RFC 1952).
     * @param inList portion of chunks to be consolidated
     * @param outChunkMaxLength initial capacity of the output buffer; it grows if needed,
     *        so unlike a same-thread PipedInputStream there is no risk of deadlock
     */
    private static ByteBuffer zipBuffers(List<ByteBuffer> inList, int outChunkMaxLength) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(outChunkMaxLength);
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            for (ByteBuffer buf : inList) {
                // copy only the readable bytes; works for heap and direct buffers
                byte[] bytes = new byte[buf.remaining()];
                buf.duplicate().get(bytes);
                gos.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ByteBuffer.wrap(bos.toByteArray());
    }
    private static void test() {
        int inLength = ... // actual full length of source data
        Flux<ByteBuffer> source = ... // your source Flux
        // these are parameters for your adjustment
        int consolidationCount = 5;
        int outChunkMaxLength = 30 * 1024;
        Flux<ByteBuffer> result = gzipFlux(source, consolidationCount, outChunkMaxLength);
        int outLen = result.reduce(0, (res, bb) -> res + bb.remaining()).block();
        System.out.println("ratio=" + (double) inLength / outLen);
    }
}
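Because each group is compressed separately, every ByteBuffer emitted this way is a complete gzip member. RFC 1952 allows a gzip file to consist of several members, and Java's GZIPInputStream reads concatenated members, so the reading side can decompress the concatenated output as one stream. The following standalone sketch (no Reactor dependency; the class name GzipMembers is made up for illustration) checks that round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipMembers {
    // Compresses one group of bytes into a self-contained gzip member.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        }
        return bos.toByteArray();
    }

    // Decompresses a (possibly multi-member) gzip stream.
    static byte[] gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return gis.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] part1 = "first group of chunks, ".getBytes(StandardCharsets.UTF_8);
        byte[] part2 = "second group of chunks".getBytes(StandardCharsets.UTF_8);
        // Concatenate two independently compressed members, as gzipFlux would emit them.
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        joined.write(gzip(part1));
        joined.write(gzip(part2));
        // prints "first group of chunks, second group of chunks"
        System.out.println(new String(gunzip(joined.toByteArray()), StandardCharsets.UTF_8));
    }
}
```

The trade-off is that each member restarts the compression dictionary, so larger groups usually compress better than small ones.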
I just finished coding a Huffman compression/decompression program. The compression part of it seems to work fine but I am having a little bit of a problem with the decompression. I am quite new to programming and this is my first time doing any sort of byte manipulation/file handling so I am aware that my solution is probably awful :D.
For the most part my decompression method works as intended but sometimes it drops data after decompression (aka my decompressed file is smaller than my original file).
Also, whenever I try to decompress a file that isn't a plain text file (for example a .jpg), the decompression returns a completely empty file (0 bytes); the compression compresses these other types of files just fine, though.
Decompression method:
public static void decompress(File file){
    try {
        BitFileReader bfr = new BitFileReader(file);
        int[] charFreqs = new int[256];
        TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
        File nF = new File(file.getName() + "_decomp");
        nF.createNewFile();
        BitFileWriter bfw = new BitFileWriter(nF);
        DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
        int uniqueBytes;
        int counter = 0;
        int byteCount = 0;
        uniqueBytes = data.readUnsignedByte();
        // Read frequency table
        while (counter < uniqueBytes){
            int index = data.readUnsignedByte();
            int freq = data.readInt();
            charFreqs[index] = freq;
            counter++;
        }
        // build tree
        Tree tree = buildTree(charFreqs);
        // build TreeMap
        fillDecodeMap(tree, new StringBuffer(), decodeMap);
        // Skip BitFileReader position to actual compressed code
        bfr.skip(uniqueBytes*5);
        // Get total number of compressed bytes
        for(int i=0; i<charFreqs.length; i++){
            if(charFreqs[i] > 0){
                byteCount += charFreqs[i];
            }
        }
        // Decompress data and write
        counter = 0;
        StringBuffer code = new StringBuffer();
        while(bfr.hasNextBit() && counter < byteCount){
            code.append(""+bfr.nextBit());
            if(decodeMap.containsKey(code.toString())){
                bfw.writeByte(decodeMap.get(code.toString()));
                code.setLength(0);
                counter++;
            }
        }
        bfw.close();
        bfr.close();
        data.close();
        System.out.println("Decompression successful!");
    }
    catch (IOException e) {
        e.printStackTrace();
    }
}
public static void main(String[] args) {
    File f = new File("test");
    compress(f);
    f = new File("test_comp");
    decompress(f);
}
}
When I compress the file I save the "character" (byte) values and the frequencies of each unique "character" + the compressed bytes in the same file (all in binary form). I then use this saved info to fill my charFreqs array in my decompress() method and then use that array to build my tree. The formatting of the saved structure looks like this:
<n><value 1><frequency>...<value n><frequency>[the compressed bytes]
(without the <> of course) where n is the number of unique bytes/characters I have in my original text (AKA my leaf values).
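The header layout described above can be sketched as a writer/reader pair. This is an illustrative sketch, not the poster's code: the class FreqHeader and its methods are hypothetical, and it stores n as a 2-byte short instead of a single byte. That detail may matter here: a single unsigned byte holds only 0 to 255, so a file that uses all 256 byte values (typical for binary files such as a .jpg) would store n as 0, which would leave the frequency table empty and produce no output. That is one plausible cause worth checking, not a confirmed diagnosis.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class FreqHeader {
    // Writes <n><value 1><frequency>...<value n><frequency>, with n as a 2-byte short
    // because a single byte cannot represent n = 256.
    static void writeHeader(DataOutputStream out, int[] charFreqs) throws IOException {
        int n = 0;
        for (int f : charFreqs) {
            if (f > 0) n++;
        }
        out.writeShort(n);
        for (int value = 0; value < charFreqs.length; value++) {
            if (charFreqs[value] > 0) {
                out.writeByte(value);           // the byte value, 0..255
                out.writeInt(charFreqs[value]); // its frequency
            }
        }
    }

    // Reads the header back into a 256-entry frequency table.
    static int[] readHeader(DataInputStream in) throws IOException {
        int[] charFreqs = new int[256];
        int n = in.readUnsignedShort();
        for (int i = 0; i < n; i++) {
            int value = in.readUnsignedByte();
            charFreqs[value] = in.readInt();
        }
        return charFreqs;
    }
}
```

With this layout the header occupies 2 + 5n bytes, so that is the amount to skip before the compressed bits begin.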
I have tested my code a bit and the bytes seem to get dropped somewhere in the while() loop at the bottom of my decompress method (charFreqs[] and the tree seem to retain all the original byte values).
EDIT: Upon request I have now shortened my post a bit in an attempt to make it less cluttered and more "straight to the point".
EDIT 2: I fixed it (but not fully)! The fault was in my BitFileWriter and not in my decompress method. My decompression still does not function properly, though. Whenever I try to decompress something that isn't a plain text file (for example a .jpg), it returns an empty "decompressed" file (0 bytes in size). I have no idea what is causing this...
I am trying to send chunks of a file from a server to multiple clients. When I try to send a 700 MB file, I get an "OutOfMemoryError: Java heap space" error. I am using NetBeans 7.1.2.
I also tried setting the VM options in the project properties, but the same error still happens. I think the problem is that I read the entire file. The code below works for files up to 300 MB. Please give me some suggestions.
Thanks in advance
public class SplitFile {
    static int fileid = 0 ;
    public static DataUnit[] getUpdatableDataCode(File fileName) throws FileNotFoundException, IOException{
        int i = 0;
        DataUnit[] chunks = new DataUnit[UAProtocolServer.singletonServer.cloudhosts.length];
        FileInputStream fis;
        long Chunk_Size = (fileName.length())/chunks.length;
        int cursor = 0;
        long fileSize = (long) fileName.length();
        int nChunks = 0, read = 0;long readLength = Chunk_Size;
        byte[] byteChunk;
        try {
            fis = new FileInputStream(fileName);
            //StupidTest.size = (int)fileName.length();
            while (fileSize > 0) {
                System.out.println("loop"+ i);
                if (fileSize <= Chunk_Size) {
                    readLength = (int) fileSize;
                }
                byteChunk = new byte[(int)readLength];
                read = fis.read(byteChunk, 0, (int)readLength);
                fileSize -= read;
                // cursor += read;
                assert(read==byteChunk.length);
                long aid = fileid;
                aid = aid<<32 | nChunks;
                chunks[i] = new DataUnit(byteChunk,aid);
                // Lister.add(chunks[i]);
                nChunks++;
                ++i;
            }
            fis.close();
            fis = null;
        }catch(Exception e){
            System.out.println("File splitting exception");
            e.printStackTrace();
        }
        return chunks;
    }
Reading in the whole file will definitely trigger an OutOfMemoryError as the file size grows. Raising the heap with -Xmx1024M may work as a temporary fix, but it is not the right or scalable solution. Also, no matter how you move your variables around (like creating the buffer outside of the loop instead of inside it), you will get an OutOfMemoryError sooner or later. The only way to avoid it is to not read the complete file into memory.
If you must work purely in memory, one approach is to send each chunk to its client as soon as it is read, so you don't have to keep all the chunks in memory at once:
instead of:
chunks[i] = new DataUnit(byteChunk,aid);
do:
sendChunkToClient(new DataUnit(byteChunk, aid));
But the above solution has a drawback: if an error happens partway through sending the chunks, you may have a hard time resuming or recovering from the point of failure.
Saving the chunks to temporary files like Ross Drew suggested is probably better and more reliable.
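The send-as-you-read idea can be sketched as follows, assuming the transport is abstracted behind a callback. ChunkStreamer and its ChunkSink interface are hypothetical names (sendChunkToClient above is not shown in the thread either); the point is that only one chunk-sized buffer ever exists, allocated once and reused, so memory use no longer depends on the file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkStreamer {
    /** Hypothetical callback, e.g. one that writes the bytes to a client socket. */
    interface ChunkSink {
        void accept(int chunkIndex, byte[] buffer, int length) throws IOException;
    }

    /** Reads the file chunk by chunk, handing each one off before reading the next. Returns the chunk count. */
    static int streamChunks(Path file, int chunkSize, ChunkSink sink) throws IOException {
        byte[] buffer = new byte[chunkSize]; // allocated once, reused for every chunk
        int chunkIndex = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            // readNBytes keeps reading until the buffer is full or EOF is reached
            while ((read = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                // the sink must consume (e.g. send) the bytes before returning,
                // because the next iteration overwrites the buffer
                sink.accept(chunkIndex++, buffer, read);
            }
        }
        return chunkIndex;
    }
}
```

A call might look like `streamChunks(path, 1 << 20, (i, buf, len) -> socketOut.write(buf, 0, len))`, where socketOut is an assumed OutputStream for one client.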
How about creating the
byteChunk = new byte[(int)readLength];
outside of the loop and just reusing it, instead of creating a new byte array on every iteration, if the size is always the same?
Alternatively, you could write the incoming data to a temporary file as it arrives, instead of keeping that huge array in memory, and then process it once it has all arrived.
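A minimal sketch of spooling to a temporary file, assuming the incoming data is available as an InputStream (the class Spooler and its method name are illustrative). Files.copy streams through a small internal buffer, so the heap never holds the whole payload.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class Spooler {
    // Copies an incoming stream to a new temp file and returns its path.
    static Path spoolToTempFile(InputStream incoming) throws IOException {
        Path tmp = Files.createTempFile("incoming-", ".part");
        // REPLACE_EXISTING is needed because createTempFile already created the (empty) file
        Files.copy(incoming, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The spooled file can then be split and sent chunk by chunk, and a failed transfer can be retried from the file instead of from memory.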
Also, if you use it as an int multiple times, you should probably just cast readLength to an int outside the loop as well:
int len = (int)readLength;
And Chunk_Size is a variable, right? By Java naming conventions it should begin with a lowercase letter (e.g. chunkSize).
I have a database dump program that writes out flat files of a table in a very specific format. I now need to test this against our old program and confirm the produced files are identical. Doing this manually is painful, so I need to write some unit tests.
I need to compare the contents of two files byte by byte and find the first difference. The issue is that they have all manner of crazy bytes with CR/LF/nulls etc. littered throughout.
Here is a screenshot of the two files from SciTE to give you an idea:
http://imageshack.us/photo/my-images/840/screenshot1xvt.png/
What is the best strategy for confirming each byte corresponds?
Apache Commons IO has a FileUtils.contentEquals(File file1, File file2) method that seems to do what you want. Pros:
Looks efficient -- reads the file contents using a buffered stream, doesn't even open the files if the lengths are different.
Convenient.
Con:
Won't give you details about where the differences are. It sounds from your comment like you want this.
I would say your best bet is to download the source code, see what they're doing, and then enhance it to print out the line numbers. The hard part will be figuring out which line you're on. Since you are reading at the byte level, you will have to explicitly check for \r, \n, or \r\n and increment your own "line number" counter. I also don't know what kind of i18n issues (if any) you'll run into.
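A streaming sketch of that idea, written from scratch rather than taken from Commons IO (FirstDiff and Difference are hypothetical names). It reads both files through buffered streams, so file size is not limited by memory, and it counts \r\n, \n, and a lone \r each as one line break; whether a lone \r should count depends on your format, which is an assumption here.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class FirstDiff {
    /** 0-based byte offset and 1-based line number of the first differing byte. */
    static final class Difference {
        final long offset;
        final long line;
        Difference(long offset, long line) { this.offset = offset; this.line = line; }
    }

    /** Returns the first difference, or null if the files are byte-for-byte identical. */
    static Difference firstDifference(Path a, Path b) throws IOException {
        try (InputStream in1 = new BufferedInputStream(Files.newInputStream(a));
             InputStream in2 = new BufferedInputStream(Files.newInputStream(b))) {
            long offset = 0;
            long line = 1;
            boolean prevCR = false;
            while (true) {
                int c1 = in1.read();
                int c2 = in2.read();
                if (c1 == -1 && c2 == -1) return null;               // same length, no difference
                if (c1 != c2) return new Difference(offset, line);   // also covers one file ending early
                if (c1 == '\r') {
                    line++;
                    prevCR = true;
                } else {
                    if (c1 == '\n' && !prevCR) line++;               // \n after \r was already counted
                    prevCR = false;
                }
                offset++;
            }
        }
    }
}
```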
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
class DominicFile {
    static boolean equalfiles(File f1, File f2) throws IOException {
        byte[] b1 = getBytesFromFile(f1);
        byte[] b2 = getBytesFromFile(f2);
        if (b1.length != b2.length) return false;
        for (int i = 0; i < b1.length; i++) {
            if (b1[i] != b2[i]) return false;
        }
        return true;
    }
    // returns the index (0 indexed) of the first difference, or -1 if identical
    // fails for files 2G or more due to limitations of "int"... use long if needed
    static int firstDiffBetween(File f1, File f2) throws IOException {
        byte[] b1 = getBytesFromFile(f1);
        byte[] b2 = getBytesFromFile(f2);
        int shortest = Math.min(b1.length, b2.length);
        for (int i = 0; i < shortest; i++) {
            if (b1[i] != b2[i]) return i;
        }
        // if one file is a prefix of the other, they first differ where the shorter one ends
        return b1.length == b2.length ? -1 : shortest;
    }
    // Returns the contents of the file in a byte array.
    // shamelessly stolen from http://www.exampledepot.com/egs/java.io/file2bytearray.html
    public static byte[] getBytesFromFile(File file) throws IOException {
        // Get the size of the file
        long length = file.length();
        // You cannot create an array using a long type; it needs to be an int.
        // Before converting to an int, check that the file is not larger than Integer.MAX_VALUE.
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File is too large: " + file.getName());
        }
        // Create the byte array to hold the data
        byte[] bytes = new byte[(int) length];
        // try-with-resources closes the stream even if reading fails
        try (InputStream is = new FileInputStream(file)) {
            // Read in the bytes
            int offset = 0;
            int numRead = 0;
            while (offset < bytes.length
                    && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
                offset += numRead;
            }
            // Ensure all the bytes have been read in
            if (offset < bytes.length) {
                throw new IOException("Could not completely read file " + file.getName());
            }
        }
        return bytes;
    }
}
Why not compute an MD5 checksum of each file, like the one described here?
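A sketch of the checksum approach using the standard MessageDigest and DigestInputStream classes (the class Md5Compare is illustrative). Note that a checksum only tells you whether the files differ, not where, so it complements rather than replaces the byte-by-byte search; for comparing your own outputs, MD5's cryptographic weaknesses do not matter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Compare {
    // Streams the input through an MD5 digest and returns it as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            while (dis.read(buf) != -1) {
                // the digest is updated as a side effect of reading
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return md5Hex(Files.newInputStream(a)).equals(md5Hex(Files.newInputStream(b)));
    }
}
```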
In a Java program, what is the best way to read an audio file (WAV file) to an array of numbers (float[], short[], ...), and to write a WAV file from an array of numbers?
I read WAV files via an AudioInputStream. The following snippet from the Java Sound Tutorials works well.
int totalFramesRead = 0;
File fileIn = new File(somePathName);
// somePathName is a pre-existing string whose value was
// based on a user selection.
try {
    AudioInputStream audioInputStream =
        AudioSystem.getAudioInputStream(fileIn);
    int bytesPerFrame =
        audioInputStream.getFormat().getFrameSize();
    if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
        // some audio formats may have unspecified frame size
        // in that case we may read any amount of bytes
        bytesPerFrame = 1;
    }
    // Set an arbitrary buffer size of 1024 frames.
    int numBytes = 1024 * bytesPerFrame;
    byte[] audioBytes = new byte[numBytes];
    try {
        int numBytesRead = 0;
        int numFramesRead = 0;
        // Try to read numBytes bytes from the file.
        while ((numBytesRead =
            audioInputStream.read(audioBytes)) != -1) {
            // Calculate the number of frames actually read.
            numFramesRead = numBytesRead / bytesPerFrame;
            totalFramesRead += numFramesRead;
            // Here, do something useful with the audio data that's
            // now in the audioBytes array...
        }
    } catch (Exception ex) {
        // Handle the error...
    }
} catch (Exception e) {
    // Handle the error...
}
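To get from that loop to the float[] the question asks for, the bytes in audioBytes have to be decoded according to the stream's format. A minimal sketch, assuming signed 16-bit PCM (check audioInputStream.getFormat().getEncoding() and getSampleSizeInBits() first); PcmDecode is a hypothetical helper name:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDecode {
    /** Converts the first numBytesRead bytes of signed 16-bit PCM into floats in [-1, 1). */
    static float[] toFloats(byte[] audioBytes, int numBytesRead, boolean bigEndian) {
        ByteBuffer bb = ByteBuffer.wrap(audioBytes, 0, numBytesRead)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[numBytesRead / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getShort() / 32768f; // scale the signed 16-bit value to [-1, 1)
        }
        return out;
    }
}
```

Inside the read loop this could be called as `PcmDecode.toFloats(audioBytes, numBytesRead, audioInputStream.getFormat().isBigEndian())`; for stereo data the resulting samples are interleaved left/right.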
Writing a WAV file I found quite tricky. On the surface it seems like a circular problem: the method that writes the file takes an AudioInputStream as a parameter.
But how do you write bytes to an AudioInputStream? Shouldn't there be an AudioOutputStream?
What I found is that one can write a class that implements TargetDataLine and has access to the raw audio byte data.
This requires a lot of methods to be implemented, but most can stay in dummy form as they are not required for writing data to a file. The key method to implement is read(byte[] buffer, int bufferoffset, int numberofbytestoread).
As this method will probably be called multiple times, there should also be an instance variable that indicates how far through the data one has progressed, updated as part of the read method above.
When you have implemented this method, your object can be used to create a new AudioInputStream, which in turn can be used with:
AudioSystem.write(yourAudioInputStream, AudioFileFormat.Type.WAVE, yourFileDestination)
As a reminder, an AudioInputStream can be created with a TargetDataLine as a source.
As for manipulating the data directly, I have had good success acting on the data in the buffer in the innermost loop of the snippet example above, audioBytes.
While you are in that inner loop, you can convert the bytes to integers or floats, multiply by a volume value (ranging from 0.0 to 1.0), and then convert them back to little-endian bytes.
Since you have access to a series of samples in that buffer, I believe you can also apply various forms of DSP filtering algorithms at that stage. In my experience it is better to do volume changes directly on the data in this buffer, because then you can make the smallest possible increment: one delta per sample, minimizing the chance of clicks due to volume-induced discontinuities.
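A sketch of that per-sample approach, assuming mono, signed 16-bit little-endian PCM (for stereo you would apply the same gain to both samples of a frame). PcmGain is an illustrative name; the gain moves linearly from startGain to endGain across the buffer, so passing the previous buffer's endGain as the next startGain avoids jumps at buffer boundaries.

```java
final class PcmGain {
    /** Scales 16-bit little-endian samples in place, ramping the gain one step per sample. */
    static void applyGainRamp(byte[] pcm, int length, double startGain, double endGain) {
        int samples = length / 2;
        for (int i = 0; i < samples; i++) {
            double gain = samples > 1
                    ? startGain + (endGain - startGain) * i / (samples - 1)
                    : endGain;
            int lo = pcm[2 * i] & 0xFF;   // low byte, unsigned
            int hi = pcm[2 * i + 1];      // high byte, sign-extended
            int sample = (hi << 8) | lo;  // little-endian bytes -> signed 16-bit value
            int scaled = (int) Math.round(sample * gain);
            // clamp to the 16-bit range in case gain > 1.0
            scaled = Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
            pcm[2 * i] = (byte) scaled;            // back to little-endian
            pcm[2 * i + 1] = (byte) (scaled >> 8);
        }
    }
}
```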
I find the "control lines" for volume provided by Java tend to produce situations where the jumps in volume cause clicks, and I believe this is because the deltas are only applied at the granularity of a single buffer read (often one change per 1024 samples) rather than dividing the change into smaller pieces and applying them one per sample. But I'm not privy to how the volume controls were implemented, so please take that conjecture with a grain of salt.
All in all, Java Sound has been a real headache to figure out. I fault the tutorial for not including an explicit example of writing a file directly from bytes, and for burying the best example of code for playing a file in the "How to Convert..." section. However, there's a LOT of valuable FREE info in that tutorial.
EDIT: 12/13/17
I've since used the following code to write audio from a PCM file in my own projects. Instead of implementing TargetDataLine one can extend InputStream and use that as a parameter to the AudioSystem.write method.
import java.io.IOException;
import java.io.InputStream;
public class StereoPcmInputStream extends InputStream
{
    private float[] dataFrames;
    private int framesCounter;
    private int cursor;
    private int[] pcmOut = new int[2];
    private int[] frameBytes = new int[4];
    private int idx;
    private int framesToRead;
    public void setDataFrames(float[] dataFrames)
    {
        this.dataFrames = dataFrames;
        framesToRead = dataFrames.length / 2;
    }
    @Override
    public int read() throws IOException
    {
        if (available() > 0)
        {
            idx &= 3;
            if (idx == 0) // set up next frame's worth of data
            {
                framesCounter++; // count elapsing frames
                // scale to 16 bits
                pcmOut[0] = (int)(dataFrames[cursor++] * Short.MAX_VALUE);
                pcmOut[1] = (int)(dataFrames[cursor++] * Short.MAX_VALUE);
                // output as unsigned bytes, in range [0..255], low byte first
                frameBytes[0] = pcmOut[0] & 0xFF;
                frameBytes[1] = (pcmOut[0] >> 8) & 0xFF;
                frameBytes[2] = pcmOut[1] & 0xFF;
                frameBytes[3] = (pcmOut[1] >> 8) & 0xFF;
            }
            return frameBytes[idx++];
        }
        return -1;
    }
    @Override
    public int available()
    {
        // NOTE: not concurrency safe.
        // 1st half of sum: 4 bytes for every frame not yet started
        // 2nd half of sum: the # of bytes of the current frame that remain to be read
        return 4 * (framesToRead - framesCounter)
            + ((4 - (idx & 3)) & 3);
    }
    @Override
    public void reset()
    {
        cursor = 0;
        framesCounter = 0;
        idx = 0;
    }
    @Override
    public void close()
    {
        System.out.println(
            "StereoPcmInputStream stopped after reading frames:"
            + framesCounter);
    }
}
The source data to be exported here is in the form of stereo floats ranging from -1 to 1. The format of the resulting stream is 16-bit, stereo, little-endian.
I omitted skip and markSupported methods for my particular application. But it shouldn't be difficult to add them if they are needed.
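If the whole float array fits in memory anyway, a simpler route (a different technique from the custom InputStream above, not the author's code) is to convert everything to bytes up front and wrap them in a ByteArrayInputStream, using the AudioInputStream(InputStream, AudioFormat, long) constructor. This sketch uses the same data layout as above (interleaved stereo floats in [-1, 1], 16-bit signed little-endian output); StereoWavWriter is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

final class StereoWavWriter {
    /** Converts interleaved stereo floats in [-1, 1] to the bytes of a 16-bit WAV file. */
    static byte[] toWav(float[] interleaved, float sampleRate) throws IOException {
        byte[] pcm = new byte[interleaved.length * 2];
        for (int i = 0; i < interleaved.length; i++) {
            int s = (int) (interleaved[i] * Short.MAX_VALUE);
            pcm[2 * i] = (byte) s;            // low byte first (little-endian)
            pcm[2 * i + 1] = (byte) (s >> 8); // then high byte
        }
        // sample rate, 16 bits, 2 channels, signed, little-endian
        AudioFormat format = new AudioFormat(sampleRate, 16, 2, true, false);
        long frames = interleaved.length / 2; // one frame = left + right sample
        try (AudioInputStream ais = new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        }
    }
}
```

Passing a File instead of the ByteArrayOutputStream to AudioSystem.write writes the WAV straight to disk.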
This is source code that writes directly to a WAV file.
You just need to know the mathematics and sound engineering to produce the sound you want.
In this example the equation calculates a binaural beat.
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
public class Program {
    public static void main(String[] args) throws IOException {
        final double sampleRate = 44100.0;
        final double frequency = 440;
        final double frequency2 = 90;
        final double amplitude = 1.0;
        final double seconds = 2.0;
        final double twoPiF = 2 * Math.PI * frequency;
        final double piF = Math.PI * frequency2;
        float[] buffer = new float[(int)(seconds * sampleRate)];
        for (int sample = 0; sample < buffer.length; sample++) {
            double time = sample / sampleRate;
            buffer[sample] = (float)(amplitude * Math.cos(piF * time) * Math.sin(twoPiF * time));
        }
        final byte[] byteBuffer = new byte[buffer.length * 2];
        int bufferIndex = 0;
        for (int i = 0; i < byteBuffer.length; i++) {
            final int x = (int)(buffer[bufferIndex++] * 32767.0);
            byteBuffer[i++] = (byte)x;
            byteBuffer[i] = (byte)(x >>> 8);
        }
        File out = new File("out10.wav");
        final boolean bigEndian = false;
        final boolean signed = true;
        final int bits = 16;
        final int channels = 1;
        AudioFormat format = new AudioFormat((float)sampleRate, bits, channels, signed, bigEndian);
        ByteArrayInputStream bais = new ByteArrayInputStream(byteBuffer);
        AudioInputStream audioInputStream = new AudioInputStream(bais, format, buffer.length);
        AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, out);
        audioInputStream.close();
    }
}
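Some more detail on what you'd like to achieve would be helpful. If raw WAV data is okay for you, simply read it with a FileInputStream and convert the bytes into numbers, for example with a DataInputStream or a ByteBuffer. But let me try to give you some meaningful sample code to get you started:
There is a class called com.sun.media.sound.WaveFileWriter for this purpose. Note that it is an internal JDK class (not exported from the java.desktop module since Java 9); the public equivalent is AudioSystem.write(audioIn, AudioFileFormat.Type.WAVE, out).
InputStream in = ...;
OutputStream out = ...;
// getAudioInputStream needs mark/reset support, hence the BufferedInputStream
AudioInputStream audioIn = AudioSystem.getAudioInputStream(new BufferedInputStream(in));
WaveFileWriter writer = new WaveFileWriter();
writer.write(audioIn, AudioFileFormat.Type.WAVE, out);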
You could implement your own AudioInputStream that does whatever voodoo to turn your number arrays into audio data.
writer.write(new VoodooAudioInputStream(numbers), AudioFileFormat.Type.WAVE, out);
As @stacker mentioned, you should of course familiarize yourself with the API.
The javax.sound.sampled package is not suitable for processing WAV files if you need access to the actual sample values. The package lets you change volume, sample rate, etc., but if you want other effects (say, adding an echo), you are on your own. (The Java tutorial hints that it should be possible to process the sample values directly, but the tech writer overpromised.)
This site has a simple class for processing WAV files: http://www.labbookpages.co.uk/audio/javaWavFiles.html
WAV File Specification
https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
There is an API for your purpose
http://code.google.com/p/musicg/
First of all, you may need to know the header layout and the data positions of the WAVE structure; you can find the spec here.
Be aware that the data are little-endian.
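As a sketch of what the spec implies, the following walks the RIFF chunks of an in-memory WAV file with a little-endian ByteBuffer and records the "fmt " fields and where the "data" chunk starts. WavHeader is an illustrative name; it assumes the whole file is in a byte array. Walking the chunks, rather than assuming a fixed 44-byte header, also handles files that carry extra chunks (for example LIST) before the sample data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WavHeader {
    int channels;
    int sampleRate;
    int bitsPerSample;
    int dataOffset; // index of the first sample byte
    int dataLength; // number of sample bytes

    /** Walks the RIFF chunks until the "data" chunk is found. */
    static WavHeader parse(byte[] wav) {
        ByteBuffer bb = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN); // WAV fields are little-endian
        if (!tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        WavHeader h = new WavHeader();
        int pos = 12; // first chunk follows "RIFF" <size> "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = bb.getInt(pos + 4);
            int body = pos + 8;
            if (id.equals("fmt ")) {
                h.channels = bb.getShort(body + 2);
                h.sampleRate = bb.getInt(body + 4);
                h.bitsPerSample = bb.getShort(body + 14);
            } else if (id.equals("data")) {
                h.dataOffset = body;
                h.dataLength = size;
                return h;
            }
            pos = body + size + (size & 1); // chunks are padded to an even size
        }
        throw new IllegalArgumentException("no data chunk found");
    }

    private static String tag(byte[] b, int off) {
        return new String(b, off, 4, StandardCharsets.US_ASCII);
    }
}
```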
There's an API which may help you achieve your goal.
WAV files are supported by the javax.sound.sampled package.
Since it isn't a trivial API, you should read an article or tutorial that introduces it, like
Java Sound, An Introduction
If anyone still needs this, there is an audio framework I'm working on that aims to solve this and similar problems, though it's written in Kotlin. You can find it on GitHub: https://github.com/WaveBeans/wavebeans
It would look like this:
wave("file:///path/to/file.wav")
.map { it.asInt() } // here `it` is of Sample type; convert it to the desired type
.asSequence(44100.0f) // framework processes everything as sequence/stream
.toList() // read fully
.toTypedArray() // convert to array
And it's not dependent on Java Audio.
I use FileInputStream with some magic (this assumes 16-bit PCM data after the canonical 44-byte header):
byte[] byteInput = new byte[(int) file.length() - 44];
short[] input = new short[byteInput.length / 2];
try (DataInputStream dis = new DataInputStream(new FileInputStream(file))) {
    dis.readFully(new byte[44]); // skip the 44-byte header
    dis.readFully(byteInput);    // read all sample bytes
    ByteBuffer.wrap(byteInput).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(input);
} catch (IOException e) {
    e.printStackTrace();
}
Your sample values are in short[] input!