I'm working on CMUSphinx speech-to-text and need to add words to my dictionary. I used lmtool: I uploaded a corpus file, took the generated .dict and .lm files, passed them as parameters to pocketsphinx, and it worked. Now I'm wondering how to add these files to the default ones, i.e. I want to add the new words from my .dict and .lm files to /edu/cmu/sphinx/models/en-us/cmudict-en-us.dict and /edu/cmu/sphinx/models/en-us/en-us.lm.bin.
I'm not sure whether this is feasible, and I'm wondering how to combine dictionaries into a single one. I found this link but am not sure how to achieve the same.
When I use TranscriberDemo.java, my WAV file contains certain words but the printed output is different. How can I improve the accuracy?
Dictionary and language model extension is covered in the following part of the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced
Related
I'm still fairly new to Java and Android programming. I have designed a simple Text Lingo Android app. Everything works well, but I was wondering if there is an easier way to create my own "dictionary" of words. Currently my code involves about 80 lines of
HashMap<String, String> words = new HashMap<>(); // new HashMap
words.put("Lol", "laugh out loud");
words.put(someKey, someValue); // repeat for 80 different words and counting...
I don't know much about databases, and I'm not sure one would really make this easier. Just wondering. Thanks.
The method you have shown looks pretty easy but time consuming. The only other idea I have: if you have the words in a .csv file, you could read the file, split each line on the delimiter (usually a comma), and then iterate over the resulting string array.
That's only faster if you already have a file with the words in it.
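As a rough illustration of the CSV idea, here is a minimal sketch that fills a map from lines of the form key,value. The class name LingoLoader and the sample data are made up for this example; on Android the Reader would presumably wrap a stream from the app's assets or raw resources, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: loads "abbreviation,meaning" lines into a map.
class LingoLoader {
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> words = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on the first comma only, so the meaning may contain commas.
                String[] parts = line.split(",", 2);
                if (parts.length < 2) {
                    continue; // skip blank or malformed lines
                }
                words.put(parts[0].trim(), parts[1].trim());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample standing in for a real .csv file.
        String sample = "Lol,laugh out loud\nBrb,be right back\n";
        Map<String, String> words = load(new StringReader(sample));
        System.out.println(words.get("Lol"));
    }
}
```

With this approach, adding a word means adding a line to the file rather than another words.put(...) call.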
There is indeed no shorter way. It is possible, though, to store the data somewhere online in an XML file and load and parse it only when you need it; you'll have to make a network connection. See this link.
I'm pretty sure the answer I'm going to get is: "why don't you just have the text files all be the same or follow some set format". Unfortunately I do not have this option, but I was wondering if there is a way to take any text file and translate it into another text or XML file that will always look the same.
The text files pretty much have the same data, just arranged differently.
The closest I can come up with is to have an XSLT sheet for each text file, but then I have to turn around and read the file that was just created, delete it, and repeat for each text file.
So, is there a way to grab the data from text files that essentially have the same data, just stored differently, and store this data in an object that I could then reuse later on in some process?
If it were up to me, I would push for every text file to follow some predefined format, since they all pretty much contain the same data, but it's not up to me.
Odd question... You say they are text files, yet you mention XSLT as a possible solution. XSLT will only work if the source is XML; if that is so, please redefine the question. If you say text files, I assume delimiter-separated (e.g. CSV), fixed length,...
There are some parsers (like smooks) out there that allow you to parse multiple formats, but it will still require you to perform the "mapping" yourself of course.
This is a typical problem in the integration world so any integration tool should offer you a solution (e.g. wso2, fuse,...).
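To make the "mapping" step concrete, here is a minimal hand-rolled sketch with no library involved. Everything in it is an assumption made for illustration: the Entry type, its two fields, and the two line formats ("id,amount" and "amount|id") are hypothetical, since the actual file layouts are not given in the question. The idea is one small parser per source format, all producing the same object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical example: two differently arranged line formats mapped to one type.
class Normalizer {
    // Common in-memory representation shared by all formats.
    static final class Entry {
        final String id;
        final double amount;

        Entry(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    // Assumed format A: "id,amount"
    static final Function<String, Entry> COMMA_FORMAT = line -> {
        String[] p = line.split(",");
        return new Entry(p[0].trim(), Double.parseDouble(p[1].trim()));
    };

    // Assumed format B: "amount|id"
    static final Function<String, Entry> PIPE_FORMAT = line -> {
        String[] p = line.split("\\|");
        return new Entry(p[1].trim(), Double.parseDouble(p[0].trim()));
    };

    // Applies the parser for one format to every line of a file.
    static List<Entry> parseAll(List<String> lines, Function<String, Entry> format) {
        List<Entry> result = new ArrayList<>();
        for (String line : lines) {
            result.add(format.apply(line));
        }
        return result;
    }
}
```

The rest of the program then works only with Entry objects, so no intermediate file has to be written, read back, and deleted.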
When I google "how to make a dictionary", I get plenty of explanations of the word "make", which is very helpful. But I need something else, so I am asking here.
I want to make a small project: a dictionary in Java or on Android. But I don't know how I should organize the words. I have considered a JSON file, an XML file, or simply writing all the words as objects into a file. Could anyone please give me some advice?
Assuming that you want to be able to read values from your dictionary quickly, and maybe update values or create new ones, I suggest that you store your dictionary in a database. For a simple Java database, I suggest an embedded Derby database.
See http://db.apache.org/derby/
I'm writing a tool to analyze stock market data. For this I download data and save everything belonging to one stock as a double[][] array of size 20*100000 in a data.bin file on my hard disk. I know I should put it in some database, but performance-wise this is simply the best method.
Now here is my problem: I need to do updates and search on the data:
Updates: I have to append new data to the end of the array as time progresses.
Search: I want to iterate over different data files to find a minimum or calculate moving averages etc.
I could do both by reading the whole file in, then either updating it and writing it back or searching in a specific area, but this is overkill since I don't need all of the data.
So my question is: is there a library (in Java) or something similar to open, read, and change parts of a binary file without having to load the whole file? Or to search through the file starting at a specific point?
RandomAccessFile allows seeking to a particular position in a file and updating parts of the file or adding new data to the end without rewriting everything. See the tutorial here: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
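A minimal sketch of that idea, assuming the file is simply a flat sequence of 8-byte doubles as written by writeDouble (the class name DoubleFile and this layout are assumptions for illustration, not the asker's actual data.bin format):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for a file that holds nothing but raw doubles.
class DoubleFile {
    static final int BYTES_PER_DOUBLE = Double.BYTES; // 8 bytes per value

    // Appends values to the end of the file; existing data is not rewritten.
    static void append(String path, double[] values) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.seek(file.length());
            for (double v : values) {
                file.writeDouble(v);
            }
        }
    }

    // Reads `count` doubles starting at element index `start`,
    // without loading anything before or after that range.
    static double[] read(String path, long start, int count) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start * BYTES_PER_DOUBLE);
            double[] out = new double[count];
            for (int i = 0; i < count; i++) {
                out[i] = file.readDouble();
            }
            return out;
        }
    }
}
```

Appending is only cheap if new data belongs at the end of the file. For a 20-column array that suggests storing each time step's 20 values contiguously, so that value `field` of time step `t` sits at element index `t * 20 + field`. For large ranges, reading through a buffer rather than one readDouble call per value would likely be faster.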
You could try looking at Random Access Files:
Tutorial: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
API: http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html
... but you will still need to figure out the exact positions you want to read in a binary file.
You might want to consider moving to a database, maybe a small embedded one like H2 (http://www.h2database.com)
I am supposed to generate a graph from the results of running my algorithm. I have heard something about using a CSV file in Excel to generate the graph. I have no idea what a CSV file is or how to create one. I googled "CSV file", but the answers I got were all in connection with databases.
Can someone show me, or point me to a tutorial, where this kind of thing has been done before? For instance, I have to generate a graph for a Quicksort algorithm and also generate another graph comparing many algorithms at the same time.
Need help please
Thanks
CSV == "comma separated values". It's a file that has one row per line, where each value is separated by a comma.
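For the use case in the question, a CSV file could hold one row per measurement, for example input size and running time, which a spreadsheet can then plot. Below is a minimal sketch; the class name SortTimingCsv and the column names are made up, and Arrays.sort is only a stand-in for the asker's own Quicksort implementation.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Random;

// Hypothetical example: writes "n,nanoseconds" rows for a series of input sizes.
class SortTimingCsv {
    static void writeTimings(Writer out, int[] sizes) {
        PrintWriter csv = new PrintWriter(out);
        csv.println("n,nanoseconds"); // header row
        Random random = new Random(42); // fixed seed for repeatable input data
        for (int n : sizes) {
            int[] data = random.ints(n).toArray();
            long start = System.nanoTime();
            Arrays.sort(data); // stand-in: call your own algorithm here
            long elapsed = System.nanoTime() - start;
            csv.println(n + "," + elapsed); // one row per line, comma-separated
        }
        csv.flush();
    }

    public static void main(String[] args) {
        StringWriter text = new StringWriter();
        writeTimings(text, new int[] {1000, 10000, 100000});
        System.out.print(text); // redirect to a .csv file and open it in Excel
    }
}
```

To compare several algorithms in one graph, one option is to add a column per algorithm to each row. Single timings like these are noisy, so averaging several runs per size would give a smoother curve.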
I'm not sure how this is relevant to your algorithm or generating graphs.
Since you're using Java, you can easily generate a nice looking graph using GraphViz from AT&T. I think it's a terrific tool.