I had a question previously: Reading wav file in Java
First, I want to read a WAV file with Java and get its bytes into an array for processing.
Second, I will remove the silences from it (that is the topic of another question).
After that, I will divide the WAV file into one-second pieces (I will need to handle the header problem for those small WAV files; this is one of the major problems).
I tried to read the WAV file with the API from the answer to my previous question. However, with that API, do I need to do anything about the header, or does the API handle it itself? And how can I divide the WAV file into one-second pieces with Java? Another API, or anything else, is fine with me too.
The API you reference will return a double[] array containing the sample data from the original WAV file. All you have to do then is create a bunch of smaller arrays (each one second long) and copy from the appropriate position in the original array into each of the smaller arrays (so that all the original data is copied into the smaller ones). Then just use the writing methods of your API to create actual WAV files from each smaller array.
Using that API means you shouldn't have to worry about the headers yourself (which is a good thing, because while they're easy to write, they can be quite complicated to read).
The size of each smaller array depends on the format of the original. If the original WAV file is mono with a 44.1 kHz sample rate, then a one-second array would contain 44,100 elements.
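The chunking step described above can be sketched as follows. This is a minimal sketch that assumes the WAV-reading API has already produced a mono `double[]` of samples; the class and method names are illustrative, not from any particular library:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WavSplitter {
    // Split a mono sample array into chunks of one second each.
    static List<double[]> splitIntoSeconds(double[] samples, int sampleRate) {
        List<double[]> chunks = new ArrayList<>();
        for (int start = 0; start < samples.length; start += sampleRate) {
            int end = Math.min(start + sampleRate, samples.length);
            chunks.add(Arrays.copyOfRange(samples, start, end)); // last chunk may be shorter
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[] samples = new double[44100 * 2 + 100]; // ~2.002 s of mono 44.1 kHz audio
        List<double[]> chunks = splitIntoSeconds(samples, 44100);
        System.out.println(chunks.size()); // 3: two full seconds plus a 100-sample tail
    }
}
```

Each chunk can then be passed to the API's writing method to produce a standalone WAV file.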
Related
I have two arrays of shorts containing musical data, one for each channel, and I need the simplest Java method/library possible to write them to a WAV file without fussing with headers. Does anyone know of such a method/library?
Java Sound can write WAV files. You may find what you are looking for here:
https://docs.oracle.com/javase/8/docs/technotes/guides/sound/programmer_guide/chapter7.html#a114602
Not sure if it takes shorts, as I have only used it to play back files, not save them.
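Java Sound's `AudioSystem.write` takes bytes rather than shorts, so the two channels have to be interleaved into 16-bit little-endian PCM first. A sketch, with the sample rate and output file name as assumptions:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

public class ShortsToWav {
    // Interleave two equal-length channels into 16-bit little-endian stereo PCM.
    static byte[] interleave(short[] left, short[] right) {
        byte[] out = new byte[left.length * 4]; // 2 channels * 2 bytes per sample
        for (int i = 0; i < left.length; i++) {
            out[4 * i]     = (byte) (left[i] & 0xff);         // left, low byte
            out[4 * i + 1] = (byte) ((left[i] >> 8) & 0xff);  // left, high byte
            out[4 * i + 2] = (byte) (right[i] & 0xff);        // right, low byte
            out[4 * i + 3] = (byte) ((right[i] >> 8) & 0xff); // right, high byte
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        short[] left = new short[44100], right = new short[44100]; // 1 s of silence
        byte[] pcm = interleave(left, right);
        AudioFormat fmt = new AudioFormat(44100f, 16, 2, true, false); // signed, little-endian
        try (AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(pcm), fmt, left.length)) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, new File("out.wav"));
        }
    }
}
```

`AudioSystem.write` produces the WAV header itself, so no manual header handling is needed.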
Well, I know that there are two similar questions but mine is different:
How do I get the subset of a primitive array without copying it?
Why?
I have to read many large "files" (~20M each) via stdin. Each of these files has JPG files embedded inside. These JPGs are used for some calculations. I need to keep the original file, and I use the embedded JPGs to decide whether to keep the original file at all.
For that I would like to get the subset (JPG) of the byte array (original file). I want to use this subset array with a 3rd party library (OpenCV).
The closest solution is: Arrays.asList(array).subList(x, y).
But this solution doesn't work for primitive arrays. I am trying to improve performance by not copying the array and not using wrapper classes.
Is there really no way of grabbing a subset of a primitive array?
By the way, ByteBuffer.wrap(array, position, length).array() returns the original full byte array.
Edit: Sorry, I forgot to mention that I am getting these files from a device, not from disk. Therefore, I need to capture the whole thing into memory first, then decide whether to keep it on disk or not.
When you pass an array reference in Java (speaking of the HotSpot JVM), you pass a reference to a memory region that must have a particular layout: object header, array length, then array contents. You would somehow have to put a header and length at the start of your smaller JPG region, thus "corrupting" your bigger array.
All in all, I think this could be done using sun.misc.Unsafe or JNI, plus knowledge of the details of your JVM (32- or 64-bit, with compressed oops or not), plus luck so that no GC happens in the middle of your trick while you save and then restore the "corrupted" bytes. But it's extremely fragile and error-prone.
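The practical zero-copy alternative is a view rather than a subarray: `ByteBuffer.wrap(array, offset, length)` covers just the region without copying (note that calling `.array()` on it still returns the whole backing array, which is the behavior the question observed). A minimal sketch, with names illustrative:

```java
import java.nio.ByteBuffer;

public class SubsetView {
    // Return a zero-copy view over data[offset .. offset+length).
    static ByteBuffer view(byte[] data, int offset, int length) {
        return ByteBuffer.wrap(data, offset, length).slice(); // position 0 = start of region
    }

    public static void main(String[] args) {
        byte[] file = {10, 20, 30, 40, 50};
        ByteBuffer jpg = view(file, 1, 3);   // covers bytes 20, 30, 40
        System.out.println(jpg.remaining()); // 3
        System.out.println(jpg.get(0));      // 20
    }
}
```

This only helps if the consumer accepts a `ByteBuffer`; an API that insists on a plain `byte[]` (as many OpenCV bindings do) forces a copy in the end.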
I am trying to read a .doc or .docx file into a byte array in Java. I do not want to use any third-party APIs like Apache POI, Tika, docx4j, etc.
The code, at its simplest, is:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path filePath = Paths.get("D:\\", "myname3.doc");
byte[] byteArray = Files.readAllBytes(filePath);
for (byte b : byteArray) {
    System.out.print(b);
}
The code works fine and I receive the byte array. However, when I open the same file again in MS Word, make no changes, but save it again to the same drive with the same contents, the byte array I receive is totally different. I understand that there are metadata differences such as the modified time, but the byte array is totally different, as if the whole content of the file had changed (attaching the text files containing the byte arrays from both iterations).
Difference in Byte Array
Is there a solution to this without using third party APIs?
Note: I have gone through the Word encoding format given on the MS website, and I also looked at endianness issues just in case. I also tried reading the file through an InputStreamReader.
Note: This program works fine for text files in ASCII format.
Edit 1: Just to make the question clearer: when I read a .docx file that has been saved twice at the same location under different names, the byte arrays are completely different when read with the program above. I would like to know the reason behind this.
Edit 2: I tried reading the files in the OffVis tool; there, too, the raw bytes are different.
This can definitely be explained for .docx, which is simply a zipped, XML-based file format. Since it is a compressed archive, a slight change in one of the underlying files can drastically change the bytes of the archive globally.
I am not sure why it happens with .doc.
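The .docx-as-ZIP point can be demonstrated with nothing but `java.util.zip`: build two archives whose single entry differs by one character and compare their raw bytes. The entry name mirrors the real .docx layout; everything else is illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipBytesDemo {
    // Build an in-memory ZIP with one entry holding the given content.
    static byte[] zipWith(String content) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("word/document.xml")); // same layout as .docx
            zos.write(content.getBytes("UTF-8"));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] a = zipWith("<w:t>hello</w:t>");
        byte[] b = zipWith("<w:t>hellp</w:t>"); // one character changed
        System.out.println(Arrays.equals(a, b)); // false: CRC and compressed data differ
    }
}
```

Word additionally rewrites metadata parts and entry timestamps on every save, so two saves of "the same" document legitimately produce different archive bytes.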
A .doc file yields the same byte array every time.
A .docx file, as stated in the answer, is a zipped OOXML file; hence, when I looked at the binary through OffVis, some extra bytes were added and the whole byte array was either shifted or changed.
Another observation: there is a security application on my system that was encrypting .doc files; when I tried reading the file on my smartphone (which doesn't have the security application), everything worked fine.
Thanks for the help.
I have a program that reads from plain text files. There can be more than 5 million of these files!
When I read them, I find them by name: the names are saved as the x and y of a matrix, for example 440x300.txt.
Now I want to put all of them into one big file and index them.
I mean, I want to know exactly, for example, at which byte 440x300.txt starts in the big file and at which byte it ends.
My first idea was to create a separate file and save this info in it, with each line looking like: 440 300 150883 173553.
But looking up this info would also take a lot of time!
I want to know if there is a better way to find out where they start and end; in other words, to somehow index the files.
Please help.
By the way I'm programming in Java.
Thanks in advance for your time.
If you only need to read these files, I would archive them in batches, e.g. using the ZIP or JAR format. These formats support naming and indexing of files, and you can build, update, and check them using standard tools.
It is possible to place 5 million files in one archive, but using a small number of archives may be more manageable.
BTW: as the files are text, compressing them will also make them smaller. You can try this yourself by creating a ZIP or JAR with, say, 1000 of them.
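A batching sketch with the JDK's own `java.util.zip`: the entry names double as the index, so a later lookup by name needs no scan of the archive. The archive name, batch size, and file contents here are illustrative:

```java
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MatrixArchiver {
    public static void main(String[] args) throws Exception {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("batch0.zip"))) {
            for (int x = 0; x < 10; x++) {       // a small sample batch
                for (int y = 0; y < 10; y++) {
                    zos.putNextEntry(new ZipEntry(x + "x" + y + ".txt"));
                    zos.write(("data for " + x + "x" + y).getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
            }
        }
        // Reading back: new ZipFile("batch0.zip").getEntry("4x3.txt") finds the
        // entry by name via the ZIP central directory, without scanning the data.
    }
}
```

`ZipFile.getInputStream(entry)` then streams just that one file's contents.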
If you want to be able to do direct addressing within your file, then you have two options:
Have an index at the beginning of your file so you can look up the start/end address based on (x, y)
Make all records exactly the same size (in bytes) so you can easily compute the location of a record in your files.
Choosing the right option should be done based on the following criteria:
Do you have records for each cell in your matrix?
Do the matrix values change?
Does the matrix dimension change?
Can the values in the matrix have a fixed byte length (i.e. are they numbers or strings)?
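If the fixed-size-record option fits, the offset of a cell is pure arithmetic and no index file is needed at all. A sketch where the record size and matrix width are assumptions:

```java
public class FixedRecordOffset {
    static final int RECORD_SIZE = 256; // bytes per record, padded to a fixed length
    static final int WIDTH = 500;       // number of columns in the matrix

    // Byte offset of the record for cell (x, y), row-major layout.
    static long offsetOf(int x, int y) {
        return (long) (y * WIDTH + x) * RECORD_SIZE;
    }

    public static void main(String[] args) {
        // 440x300.txt lives at this offset and spans RECORD_SIZE bytes:
        System.out.println(offsetOf(440, 300)); // (300*500 + 440) * 256 = 38512640
    }
}
```

Reading it back is then a single `RandomAccessFile.seek(offsetOf(x, y))` followed by a read of `RECORD_SIZE` bytes.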
I am looking for a java library that allows me to specify max size or max number of lines in output files, and then splits a large xml/text file into smaller files.
I saw that there is a two-year-old question on SO about the same thing; however, the answers there were for specific cloud platforms. I just want a library for use in Java desktop apps.
You could use Guava's CountingOutputStream to keep track of how much data has been written to a file. Write one line at a time, check the number of bytes written, and once you exceed the threshold, close the file and open a new one.
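The rollover loop can be sketched with a plain-JDK counting wrapper standing in for Guava's CountingOutputStream (which provides the same byte count via getCount()); the threshold, file-name prefix, and helper names are all illustrative:

```java
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class LineSplitter {
    // Minimal stand-in for Guava's CountingOutputStream.
    static class CountingStream extends FilterOutputStream {
        long count;
        CountingStream(OutputStream out) { super(out); }
        @Override public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    // Copy lines into part files, rolling over once maxBytes is reached.
    static int split(BufferedReader in, long maxBytes, String prefix) throws IOException {
        int part = 0;
        CountingStream cs = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (cs == null || cs.count >= maxBytes) { // threshold passed: open next part
                if (cs != null) cs.close();
                cs = new CountingStream(new FileOutputStream(prefix + part++ + ".txt"));
            }
            cs.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
        if (cs != null) cs.close();
        return part; // number of part files written
    }

    public static void main(String[] args) throws IOException {
        // Three 5-byte lines with a 5-byte cap: one line per part file.
        int parts = split(new BufferedReader(new StringReader("aaaa\nbbbb\ncccc\n")), 5, "part");
        System.out.println(parts); // 3
    }
}
```

Note that lines are written whole, so each part can overshoot the threshold by up to one line; split on a byte boundary instead if the limit is hard.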