Receive input from microphone from 2 processes at once - Java

I've been working on Java speech recognition using Sphinx4 and I currently have an issue.
I have an app that recognizes microphone input using the LiveSpeechRecognizer class of Sphinx4, which works fine. The issue appeared after I added a class that also listens to the microphone and transforms and visualizes the output.
Separately, both classes work fine. But when they are combined in a single app I get the error:
LineUnavailableException: line with format PCM_SIGNED 44100.0 Hz, 8 bit, mono, 1 bytes/frame, not supported.
I have checked the issue and it seems to be caused by simultaneous access to the microphone. I had the idea to use StreamSpeechRecognizer instead of the Live one, but I failed to retrieve the stream from the microphone input; I tried AudioInputStream for that purpose.
Could you please suggest how I can adjust my code so that both the speech recognition and the oscilloscope can use the microphone simultaneously?
Thanks in advance.
UPD:
That's my attempt to split the microphone input so it can be used by both components.
....
byte[] data = new byte[dataCaptureSize];
line.read(data, 0, data.length);                       // read one buffer from the microphone line
ByteArrayOutputStream out = new ByteArrayOutputStream();
out.write(data);
byte[] audioData = out.toByteArray();
InputStream byteArrayInputStream = new ByteArrayInputStream(audioData);
AudioInputStream audioInputStream = new AudioInputStream(byteArrayInputStream,
        inputFormat,
        audioData.length / inputFormat.getFrameSize()); // frame count for this chunk
....
That's how I convert it to the input stream, which is then passed to the StreamSpeechRecognizer, while the array of bytes is transformed with a Fast Fourier Transform and passed to the graph. That doesn't work: it just freezes the graph all the time, so the data displayed is not current.
I tried to run the recognition in a separate thread, but it didn't improve performance at all.
My thread-splitting code is below:
Thread recognitionThread = new Thread(new RecognitionThread(configuration,data));
recognitionThread.join();
recognitionThread.run();
UPD 2:
The input is from the microphone.
The above AudioInputStream is passed to the StreamSpeechRecognizer:
StreamSpeechRecognizer nRecognizer = new StreamSpeechRecognizer(configuration);
nRecognizer.startRecognition(audioStream);
And the byte array is transformed by FFT and passed to the graph:
double[] arr = FastFourierTransform.TransformRealPart(data);
for (int i = 0; i < arr.length; i++) {
    series1.getData().add(new XYChart.Data<>(i * 22050 / arr.length, arr[i]));
}

Here is a plausible approach to consider.
First, write your own microphone reader. (There are tutorials on how to do this.) Then repackage that data as two parallel Lines that the other applications can read.
Another approach would be to check if either application has some sort of "pass through" capability enabled.
EDIT: added to clarify
This Java sound record utility code example opens a TargetDataLine to the microphone, and stores data from it into an array (lines 69, 70). Instead of storing the data in an array, I'm suggesting that you create two SourceDataLine objects and write the data out to each.
recordBytes = new ByteArrayOutputStream();
secondStreamBytes = new ByteArrayOutputStream();
isRunning = true;
while (isRunning) {
    bytesRead = audioLine.read(buffer, 0, buffer.length);
    recordBytes.write(buffer, 0, bytesRead);        // copy for the first consumer
    secondStreamBytes.write(buffer, 0, bytesRead);  // copy for the second consumer
}
Hopefully it won't be too difficult to figure out how to configure your two programs to read from the created lines rather than from the microphone's line. I'm unable to provide guidance on how to do that.
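For what it's worth, here's a rough sketch of what such a splitter could look like, using a pair of piped streams instead of SourceDataLine objects (since both of your consumers ultimately want an InputStream). Everything here (the class name, the audio format, the buffer and pipe sizes) is made up for illustration and will need adapting to your code:

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class MicSplitter {
    public static void main(String[] args) throws Exception {
        // Open the microphone exactly once; the format is only an example.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        TargetDataLine mic = AudioSystem.getTargetDataLine(format);
        mic.open(format);
        mic.start();

        // Two independent pipes: one per consumer.
        PipedOutputStream toRecognizer = new PipedOutputStream();
        PipedOutputStream toScope = new PipedOutputStream();
        PipedInputStream recognizerIn = new PipedInputStream(toRecognizer, 1 << 16);
        PipedInputStream scopeIn = new PipedInputStream(toScope, 1 << 16);

        // Reader thread: the only code that ever touches the microphone line.
        Thread reader = new Thread(() -> {
            byte[] buffer = new byte[4096];
            try {
                while (true) {
                    int n = mic.read(buffer, 0, buffer.length);
                    if (n <= 0) break;
                    toRecognizer.write(buffer, 0, n); // copy for the recognizer
                    toScope.write(buffer, 0, n);      // copy for the oscilloscope
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        reader.setDaemon(true);
        reader.start();

        // recognizerIn and scopeIn would then be handed to the two components,
        // each running on its own thread.
    }
}

Note that a piped write blocks once its pipe buffer fills, so if one consumer stalls it will eventually stall the reader and the other consumer too; that is exactly the pacing problem discussed in EDIT 2 below.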
EDIT 2:
I wish some other people would join in. I'm a little over my head with doing anything fancy with streams. And the code you are giving is so minimal I still don't understand what is happening or how things are connecting.
FWIW: (1) Is the data you are adding into "series1" the streaming data? If so, can you add a line in that for loop and write the same data to a stream consumed by the other class? (This would be a way of using the microphone data "in series" as opposed to "in parallel.")
(2) Data streams often involve code that blocks, or that runs at varying paces due to the unpredictable way in which the CPU switches between tasks. So if you do write a "splitter" (as I tried to illustrate by modifying the microphone-reading code I linked earlier), a situation could arise where the code only runs as fast as the slower of the two "splits" at any given moment. You may need to incorporate some sort of buffering and use separate threads for the two recipients of the mike data.
I wrote my first buffering code recently, for a situation where a microphone-reading line sends a stream to an audio-mixing function on another thread. I only wrote this a few weeks ago and it's the first time I've dealt with running a stream across a thread barrier, so I don't know if the idea I came up with is the best way to do this sort of thing. But it does manage to keep the feed from the mike to the mixer steady, with no drop-outs and no losses.
The mike reader reads a buffer of data, then adds this byte[] buffer into a ConcurrentLinkedQueue<Byte[]>.
From the other thread, the audio-mixing code polls the ConcurrentLinkedQueue for data.
I experimented a bit and currently have the size of the byte[] buffer at 512 bytes and the ConcurrentLinkedQueue is set to hold up to 12 "buffers" before it starts throwing away the oldest buffers (the structure is FIFO). This seems to be enough of these small buffers to accommodate when the microphone processing code temporarily gets ahead of the mixer.
The ConcurrentLinkedQueue has built-in provisions to allow adding and polling to occur from two threads at the same time without throwing an exception. Whether this is something you have to write to help with the hand-off, and what the best buffer size might be, I can't say. Maybe a much larger buffer with fewer buffers held in the queue would be better.
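To make that hand-off concrete, here is a stripped-down sketch of the idea; the class and method names are invented here, and the sizes simply follow the numbers mentioned above:

import java.util.concurrent.ConcurrentLinkedQueue;

public class MicBufferQueue {
    private static final int MAX_BUFFERS = 12;   // cap before the oldest buffers are dropped

    private final ConcurrentLinkedQueue<byte[]> queue = new ConcurrentLinkedQueue<>();

    // Called from the microphone-reading thread with each freshly read buffer.
    public void offer(byte[] data, int length) {
        byte[] copy = new byte[length];          // copy, since the reader reuses its buffer
        System.arraycopy(data, 0, copy, 0, length);
        queue.add(copy);
        // ConcurrentLinkedQueue itself is unbounded, so the cap is enforced by hand:
        // once more than MAX_BUFFERS are waiting, throw away the oldest (FIFO).
        while (queue.size() > MAX_BUFFERS) {
            queue.poll();
        }
    }

    // Called from the mixer thread; returns null if no buffer is waiting yet.
    public byte[] poll() {
        return queue.poll();
    }
}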
Maybe someone else will weigh in, or maybe the suggestion will be worth experimenting with and trying out.
Anyway, that's about the best I can do, given my limited experience with this. I hope you are able to work something out.

Related

In Java how to read the latest string of constantly generated stream fast?

In Java I have a process constantly generating output. Of course it's placed into a buffer of the output stream (FIFO) until it's processed. But what I sometimes need is to read the latest, most recent string of the stream, as if it were LIFO. The problem is that when I need it, I have to read all the previous output generated between my reads, because streams don't have random access, which is very slow.
I use BufferedReader(InputStreamReader(process.getInputStream())).
The buffer of the BufferedReader also poses a small problem.
How can I discard all the output I don't need, fast?
If possible I wouldn't like to create a separate reader-discarder thread.
I tried:
stdInput = new BufferedReader(new InputStreamReader(process.getInputStream()), 1000);
then when I need to read the output:
stdInput.skip(iS.available() + 1000); // get the length of the sequence generated up till now and discard it
stdInput.readLine();                  // to 'flush' the BufferedReader buffer
s = stdInput.readLine();              // to read the latest string
This approach is very slow and takes an indeterminate amount of time.
Since you haven't posted the full code, it may be useful for you to try a few things and see what performs best. Overall, I'm not sure how much improvement you will see.
Remove the BufferedReader wrapper and benchmark. It seems that your InputStream is memory-backed, hence reads may be cheap. If so, the BufferedReader may slow you down. If reads are expensive, then you should keep it.
Try Channels.newChannel(InputStream in) and use ReadableByteChannel.read(ByteBuffer dst), then benchmark. Again, it depends on your InputStream, but a Channel may show performance benefits.
Overall I'd recommend going the multithreaded approach with an ExecutorService and Callable class doing the reading and processing into a memory buffer.
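As a rough sketch of that multithreaded route (the name LatestLineReader and the keep-only-the-newest-line policy are just one way to interpret "read the latest string"):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class LatestLineReader {
    private final AtomicReference<String> latest = new AtomicReference<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public LatestLineReader(InputStream processOutput) {
        // Background task drains the output as fast as it arrives and keeps
        // only the most recent line; everything older is simply overwritten.
        executor.submit(() -> {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(processOutput))) {
                String line;
                while ((line = in.readLine()) != null) {
                    latest.set(line);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
    }

    // Latest line seen so far, or null if nothing has arrived yet.
    public String latest() {
        return latest.get();
    }

    public void shutdown() {
        executor.shutdownNow();
    }
}

You still pay for one extra thread, but the "read the latest string" call on the main thread becomes a constant-time lookup.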
My suggestion (I don't know your implementation): https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#mark(int)
BufferedReader has a mark method:
public void mark(int readAheadLimit)
throws IOException
Marks the present position in the stream. Subsequent calls to reset()
will attempt to reposition the stream to this point.
If you know the size of your data, you can use that size to set the position to the part of the stream you care about.
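A minimal illustration of mark()/reset() (the 8192-character read-ahead limit is arbitrary):

import java.io.BufferedReader;
import java.io.IOException;

// Peek ahead one line, then rewind so the caller can still read the same data.
// The readAheadLimit passed to mark() must cover everything read before reset().
static String peekLine(BufferedReader in) throws IOException {
    in.mark(8192);               // remember the current position
    String line = in.readLine(); // read ahead
    in.reset();                  // rewind; the next readLine() returns the same line
    return line;
}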

Can't understand piped inputstream

I am trying to understand piped streams.
Instead of piped streams, why can't we use other streams to pipe to each other, like below?
final ByteArrayOutputStream pos = new ByteArrayOutputStream();
final ByteArrayInputStream pis = new ByteArrayInputStream(pos.toByteArray());
And when will we have a deadlock in a piped stream? I tried to read and write using a single main thread, but it executed smoothly.
The difficulty here is that the process must be implemented in several threads because writing to one end of the pipe must be matched with a read at the other end.
It is certainly not difficult to create a thread to monitor arrivals at one end of a pipe and push them back through another pipe, but it cannot be done with a single thread.
Have you looked at this question?
Piped streams allow for efficient byte-by-byte processing with minimal effort.
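A tiny illustration of the two-thread usage (all names arbitrary): the writer fills one end of the pipe on its own thread while the main thread drains the other end.

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out); // connected pair

        // Writer runs on its own thread, so the reader never ends up waiting on itself.
        Thread writer = new Thread(() -> {
            try {
                out.write("hello through the pipe".getBytes(StandardCharsets.UTF_8));
                out.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // Main thread is the reader.
        byte[] buffer = new byte[64];
        int n = in.read(buffer); // blocks until the writer has produced data
        System.out.println(new String(buffer, 0, n, StandardCharsets.UTF_8));
        in.close();
    }
}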
I could very well be wrong, but I believe toByteArray() might not do what you think it does. It just copies the current contents, not any future contents.
So the only real issue here is management of this, which would be a bit more difficult. You'd have to constantly poll the output stream. Not to mention the memory allocation of an array for each call to toByteArray (which "Creates a newly allocated byte array" for each call).
How I suspect deadlocks may happen in a single thread:
If you try to do a (blocking) read from an input stream that doesn't have data yet, it will never get data, because data can only come from the output stream, which must be written from the same thread; that can't happen while you're sitting waiting for data.
So, in a single thread, it will happen if you're not very careful. It should be possible to successfully use them in the same thread without deadlocks, but why would you want to? I think another data structure may be better suited, like a linked list or a simple circular array.

Why does input stream read data in chunks?

I am trying to read some data from a network socket using the following code -
Socket s = new Socket(address, 502);
response = new byte[1024];
InputStream is = s.getInputStream();
int count = is.read(response, 0, 100);
The amount of data isn't large. It is 16 bytes in total. However the read() statement does not read all the data in one go. It reads only 8 bytes of data into my buffer.
I have to make multiple calls to read() like this in order to read the data -
Socket s = new Socket(address, 502);
response = new byte[1024];
InputStream is = s.getInputStream();
int count = is.read(response, 0, 100);
count += is.read(response, count, 100-count);
Why is this happening? Why does read() not read the entire stream in one go?
Please note that the data is not arriving gradually. If I wait for 2 seconds before reading the data by making a call to Thread.sleep(2000) the behavior remains the same.
Why does read() not read the entire stream in one go?
Because it isn't specified to do so. See the Javadoc. It blocks until at least one byte is available, then returns some number between 1 and the supplied length, inclusive.
That in turn is because the data doesn't necessarily arrive all in one go. You have no control over how TCP sends and receives data. You are obliged to just treat it as a byte stream.
I understand that it blocks until data arrives. "That in turn is because the data doesn't necessarily arrive all in one go." Why not is my question.
The data doesn't necessarily all arrive in one go because the network typically breaks it up into packets. IP is a packet switching protocol.
Does TCP transmit it in blocks of 8 bytes?
Possibly, but probably not. The packet size depends on the network / networks that the data has traversed, but a typical internet packet size is around 1500 bytes.
If you are getting 8 bytes at a time, either your data is coming through a network with an unusually small packet size, or (more likely) the sender is sending the data 8 bytes at a time. The second explanation more or less jibes with what your other comments say.
And since I explicitly specify 100, a number much larger than the data in the buffer, shouldn't it attempt to read at least 100 bytes?
Well no. It is not specified to work that way, and it doesn't work that way. You need to write your code according to what the spec says.
It is possible that this has something to do with the way the device is being "polled". But without looking at the specs for the device (or even knowing what it is exactly) this is only a guess.
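A common way to cope with this, when you know how many bytes a complete message contains, is to loop until that many bytes have arrived; DataInputStream.readFully does essentially the same thing. A sketch:

import java.io.IOException;
import java.io.InputStream;

// Keep calling read() until exactly 'expected' bytes are in the buffer,
// or fail if the stream ends early. Each read() may return as little as one byte.
static void readExactly(InputStream in, byte[] buffer, int expected) throws IOException {
    int total = 0;
    while (total < expected) {
        int n = in.read(buffer, total, expected - total);
        if (n < 0) {
            throw new IOException("Stream ended after " + total + " of " + expected + " bytes");
        }
        total += n;
    }
}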
Maybe the data is arriving gradually not because of your reading but because of the sender.
The sender should use a BufferedOutputStream (in the middle) to make big chunks before sending (and use flush only when it's needed).
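A sketch of that sender-side change (the method and the message layout are invented for illustration):

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.net.Socket;

// Wrap the socket's stream once per connection; the buffer coalesces many
// small write() calls, and flush() is called once per complete message.
static void sendMessages(Socket socket, byte[][] messages) throws IOException {
    BufferedOutputStream out = new BufferedOutputStream(socket.getOutputStream(), 8192);
    for (byte[] message : messages) {
        out.write(message);
        out.flush(); // flush only when a whole message has been written
    }
}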

Multithreading approach to find text pattern in files

Consider a simple Java application which should traverse a file tree on disk to find a specific pattern in the body of each file.
I'm wondering whether it is possible to achieve better performance using multi-threading, for example by submitting a new Runnable to a fixed thread pool whenever we find a new folder; the Runnable task would then traverse that folder to find new folders, etc. In my opinion this operation should be I/O bound, not CPU bound, so spawning new threads would not improve performance.
Does it depend on the hard drive type (HDD, etc.)?
Does it depend on the OS type?
IMHO the only thing that can be parallelized is spawning a new thread to parse the file content and look for the pattern in the file body.
What is the common pattern to solve this problem? Should it be multi-threaded or single-threaded?
I did some research in this area while working on a test project; you can look at the project on GitHub at http://github.com/4ndrew/filesearcher. Of course the main problem is disk I/O speed, but if you use an optimal number of threads to perform reading/searching in parallel you can get better results overall.
UPD: Also look at this article http://drdobbs.com/parallel/220300055
I did some experiments on just this question some time ago. In the end I concluded that I could achieve a far better improvement by changing the way I accessed the file.
Here's the file walker I eventually ended up using:
// 4k buffer size ... near-optimal for Windows.
static final int SIZE = 4 * 1024;
// Reusable read buffer of that size.
static final byte[] buffer = new byte[SIZE];
// Fastest because a FileInputStream has an associated channel.
private static void ScanDataFile(Hunter h, FileInputStream f) throws FileNotFoundException, IOException {
// Use a mapped and buffered stream for best speed.
// See: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
FileChannel ch = f.getChannel();
// How much I've read.
long red = 0L;
do {
// How much to read this time around.
long read = Math.min(Integer.MAX_VALUE, ch.size() - red);
// Map a byte buffer to the file.
MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, red, read);
// How much to get.
int nGet;
// Walk the buffer to the end or until the hunter has finished.
while (mb.hasRemaining() && h.ok()) {
// Get a max of 4k.
nGet = Math.min(mb.remaining(), SIZE);
// Get that much.
mb.get(buffer, 0, nGet);
// Offer each byte to the hunter.
for (int i = 0; i < nGet && h.ok(); i++) {
h.check(buffer[i]);
}
}
// Keep track of how far we've got.
red += read;
// Stop at the end of the file.
} while (red < ch.size() && h.ok());
// Finish off.
h.close();
ch.close();
f.close();
}
You stated it right that you need to determine whether your task is CPU or I/O bound and then decide whether it could benefit from multithreading. Generally disk operations are pretty slow, so unless the amount of data to parse and the parsing complexity are significant, you might not benefit much from multithreading. I would just write a simple test: read the files without parsing in a single thread, measure it, then add the parsing and see if it's much slower, and then decide.
Perhaps a good design would be to use two threads: one reader thread that reads the files and places the data in a (bounded) queue, and another thread (or, better, an ExecutorService) that parses the data. That would give you a nice separation of concerns, and you could always tweak the number of threads doing the parsing. I'm not sure it makes much sense to read the disk with multiple threads (unless you need to read from multiple physical disks, etc.); a sketch of this split is shown below.
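A compact sketch of that split (the class name, pool size, and queue capacity are placeholders):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PatternSearch {
    private static final List<String> POISON = new ArrayList<>(); // end-of-input marker

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args[0]);
        String pattern = args[1];
        int parsers = 2; // tune to taste
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(64); // bounded

        // Parser pool: takes file contents off the queue and scans them.
        ExecutorService pool = Executors.newFixedThreadPool(parsers);
        for (int i = 0; i < parsers; i++) {
            pool.submit(() -> {
                while (true) {
                    List<String> lines = queue.take(); // blocks until work arrives
                    if (lines == POISON) return null;  // clean shutdown
                    lines.stream()
                         .filter(line -> line.contains(pattern))
                         .forEach(System.out::println);
                }
            });
        }

        // A single reader (here simply the main thread) does all the disk I/O.
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path file : files) {
                try {
                    queue.put(Files.readAllLines(file)); // blocks if the parsers fall behind
                } catch (IOException e) {
                    // skip unreadable or binary files in this sketch
                }
            }
        }
        for (int i = 0; i < parsers; i++) {
            queue.put(POISON);
        }
        pool.shutdown();
    }
}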
What you could do is this: implement a single-producer multi-consumer pattern, where one thread searches the disk, retrieves files and then the consumer threads process them.
You are right that in this case using multiple threads to scan the disk would not be beneficial; in fact, it would probably degrade performance, since the disk needs to seek to the next reading position every time, so you end up bouncing the disk head between the threads.

Measuring actual bytes written through Java sockets

I have written a small program which sends/receives files from one client to another. I've set up progress bars for both the receiver and the sender, but the problem is that the sender's progress bar finishes much quicker than the actual transfer. The problem lies with how it calculates how many bytes have been written. I'm assuming it's counting how many bytes I've read into the buffer, rather than the bytes that were sent through the network, so how can I find a solution to this problem? The receiver is calculating its received bytes at the correct rate, but the sender is not doing its part correctly.
Setting a lower buffer size offsets the difference a bit, but it's still not correct. I've tried wrapping the output stream with a CountingOutputStream, but it returns the same result as the code snippet below. The transfer eventually completes correctly, but I need the proper "sent" values to update my progress bar, i.e. what was actually received and written to disk on the receiver side. I've included a very stripped-down code snippet which represents my way of calculating transferred bytes. Any examples of a solution would be very helpful.
try
{
    int sent = 0;
    int bytesRead;
    Socket sk = new Socket(ip, port);
    OutputStream output = sk.getOutputStream();
    FileInputStream file = new FileInputStream(filepath);
    byte[] buffer = new byte[8092];
    while ((bytesRead = file.read(buffer)) > 0)
    {
        output.write(buffer, 0, bytesRead);
        sent += bytesRead;
        System.out.println(sent); // Shows incorrect values for the actual speed.
    }
}
In short, I don't think you can get the sort of accurate visibility you're looking for solely from the "sender" side, given the number of buffers between you and the "wire" itself. But also, I don't think that matters. Here's why:
Bytes count as "sent" when they are handed to the network stack. When you are sending a small number of bytes (such as your 8K example) those bytes are going to be buffered & the write() calls will return quickly.
Once you've reached network saturation, your write() calls will start to block as the various network buffers become full, and then you'll get a real sense of the timings.
If you really must have some sort of "how many bytes have you received?" you'll have to have the receiving end send that data back periodically via an out-of-band mechanism (such as suggested by glowcoder)
Get the input stream from the socket, and on the other side, when you've written a selection of bytes to disk, write the result to the output stream. Spawn a second thread to handle the reading of this information, and link it to your counter.
Your variable is sent - it is accurate. What you need is a received or processed variable, and for that you will need two-way communication.
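Something along those lines, with the protocol and names invented for illustration: the receiver periodically writes back how many bytes it has saved to disk, and a small thread on the sending side reads those acknowledgements and feeds them to the progress bar.

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Sender side: runs on its own thread and listens for the receiver's periodic
// "bytes written to disk so far" acknowledgements on the same socket.
public class AckListener implements Runnable {
    private final Socket socket;
    private final AtomicLong confirmed = new AtomicLong();
    private final LongConsumer progressCallback; // e.g. update the progress bar

    public AckListener(Socket socket, LongConsumer progressCallback) {
        this.socket = socket;
        this.progressCallback = progressCallback;
    }

    @Override
    public void run() {
        try {
            DataInputStream acks = new DataInputStream(socket.getInputStream());
            while (!socket.isClosed()) {
                long bytesOnDisk = acks.readLong(); // one long per chunk the receiver saves
                confirmed.set(bytesOnDisk);
                progressCallback.accept(bytesOnDisk);
            }
        } catch (IOException e) {
            // socket closed or transfer finished
        }
    }

    public long confirmedBytes() {
        return confirmed.get();
    }
}

On the receiving side the counterpart would be a DataOutputStream.writeLong(totalWrittenSoFar) after each chunk is written to disk.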
