Neural network for letter recognition - java

I'm trying to add to the code for a single layer neural network which takes a bitmap as input and has 26 outputs for the likelihood of each letter in the alphabet.
The first question I have is regarding the single hidden layer that is being added. Am I correct in thinking that the hidden layer will have it's own set of output values and weights only? It doesn't need to have it's own bias'?
Can I also confirm that I'm thinking about the feedforward aspect correctly? Here's some pseudocode:
// input => hidden
for j in hiddenOutput.length:
sum=inputs*hiddenWeights
hiddenOutput[j] = activationFunction(sum)
// hidden => output
for j in output.length:
sum=hiddenOutputs*weights
output[j] = activationFunction(sum)
Assuming that is correct, would the training be something like this?
def train(input[], desired[]):
iterate through output and determine errors[]
update weights & bias accordingly
iterate through hiddenOutput and determine hiddenErrors[]
update hiddenWeights & (same bias?) accordingly
Thanks in advance for any help, I've read so many examples and tutorials and I'm still having trouble determining how to do everything correctly.

Dylan, this is probably long after your homework assignment was due, but I do have a few thoughts about what you've posted.
Make the hidden layer much bigger than the size of the input bitmaps.
You should have different weights and biases from input -> hidden than from hidden -> output.
Spend a lot of time on your error function (discriminator).
Understand that neural nets have a tendency to get quickly locked in to a set of weights (usually incorrect). You'll need to start over and train in a different order.
The thing I learned about neural nets is that you never know why they're working (or not working). That alone is reason to keep it out of the realms of medicine and finance.

you might want to read http://www.ai-junkie.com/ann/evolved/nnt1.html . there it mentions exactly something about what you are doing. It also provided code along with a (mostly) simple explanation of how it learns. Although the learning aspect is completely different from feed forward this should hopefully give you some ideas about the nature of NN.
It is my belief that even the hidden and output layers should have a bias.
Also NN can be tricky, try first identifying only 1 letter. Getting a consistent high/low signal from only a single output. Then try to keep that signal with different variations of the same letter. Then you can progress and add more. You might do that by teaching 26 different networks that give an output only on a match. Or maybe you make it as one large NN with 26 outputs. Two different approaches.

As far as the use of bias terms is concerned I found the section Why use a bias/threshold? in the comp.ai.neural-nets FAQ very useful. I highly recommend reading that FAQ.

Related

Programmatically finding periodicity of a given function

I am working on a project in Android for my Signal Processing course. My aim is to find signal properties, such as periodicity, even/odd, causality etc, given a user-input function. Right now, I am stuck at trying to figure out how to programmatically calculate the periodicity of a given function. I know the basics behind periodicity: f(t+T) = f(t)
The only idea I have right now is to extensively calculate values for the function, and then check for repetition of values. I know the method is quite stupid, given the fact I don't know how many such values I need to calculate to determine if it is periodic or not.
I know this can be done easily in Matlab, but again very difficult to port Matlab to Java. Is there something I am missing? I have been searching a lot, but haven't found anything useful.
Thanks for any help, in advance!
If the function f is given as a symbolic expression, then the equation you stated can in some cases be solved symbolically. The amount of work required for this will depend on how your functions are described, what kinds of functions you allow, what libraries you use and so on.
If your only interaction with the function is evaluating it, i.e. if you treat the function description as a black box or obtain its values from some sensor, then your best bet would be a Fourier transformation of the data, to convert it from the time domain into frequency domain. In particularly, you probably want to choose your number of of samples to analyze as a power of two, and then use FFT to quickly obtain intensities for various frequencies.

Artificial Neural network PSO training

I am working on a FF Neural network (used for classification problems) which I am training using a PSO. I only have one hidden layer and I can vary the amount of neurons in that layer.
My problem is that the NN can learn linearly separable problems quite easily but can not learn problems that are not linearly separable(like XOR) like it should be able to do.
I believe my PSO is working correctly because I cans see that it tries to minimises the error function of each particle (using mean squared error over the training set).
I have tried using a sigmoid and linear activation function with similar(bad) results. I also have a bias unit(which also doesn't help much).
What I want to know is if there is something specific that I might be doing wrong that might cause this type of problem, or maybe just some things I should look at where the error might be.
I am a bit lost at the moment
Thanks
PSO can train a neural network to non-solve linearly separable problems, like XOR. I've done this before, my algorithm takes about 50 or so iterations at most. Sigmoid is a good activation function for XOR. If it does converge for non-separable problems then my guess is somehow your hidden layer is not having an effect, or is bypassed. As the hidden layer is what typically allows non-separable.
When I debug AI I find it often useful to determine first if my training code or evaluation code (the neural network in this case) is at fault. You might want to create a 2nd trainer for your network. Then you can make sure your network code is calculating the output correctly. You could even do a simple "hill climber". Pick a random weight and change by a random small amount (up or down). Did your error get better? Keep the weight change and repeat. Did your error get worse, drop the change and try again.

Neural network to solve a card-prob

The problem - I have 10 number of cards value 1 to 10. Now I have to arrange the cards in away that adding 5 cards gives me 36 and product of remaining 5 cards give me 360.
I had successfully made a GA to solve cards Problem in java. Now I am thinking to solve same problem with Neural Network. Is it possible to solve this by NN? What approach should I take?
This problem is hard to solve directly with a Neural Network. Neural Networks are not going to have a concept of sum or product, so they won't be able to tell the difference between a valid and invalid solution directly.
If you created enough examples and labelled then then the neural network might be able to learn to tell the "good" and "bad" arrangements apart just by memorising them all. But it would be a very inefficient and inaccurate way of doing this, and it would be somewhat pointless - you'd have to have a separate program that knew how to solve the problem in order to create the data to train the neural network.
P.S. I think you are a bit lucky that you managed to get the GA to work as well - I suspect it only worked because the problem is small enough for the GA to try most of the possible solutions in the vicinity of the answer(s) and hence it stumbles upon a correct answer by chance before too long.
To follow up on #mikera's comments on why Neural Networks (NNs) might not be best for this task, it is useful to consider how NNs are usually used.
A NN is usually used in a supervised learning task. That is, the implementer provides many examples of input and the correct output that goes with that input. The NN then finds a general function which captures the provided input/output pairs and hopefully captures many other previously unseen input/output pairs as well.
In your problem you are solving a particular optimization, so there isn't much training to be done. There is just one (or more) right answers. So, NNs aren't really designed for such problems.
Note that the concept of not having a sum/product doesn't necessarily hurt a NN. You just have to create your own input layer which has sum and product features so that the NN can learn directly from these features. But, in this problem it won't help very much.
Note also that your problem is so small that even a naive enumeration of all combinations (10! = 3,628,800) of numbers should be achievable in a few seconds at most.

Handwritten character (English letters, kanji,etc.) analysis and correction

I would like to know how practical it would be to create a program which takes handwritten characters in some form, analyzes them, and offers corrections to the user. The inspiration for this idea is to have elementary school students in other countries or University students in America learn how to write in languages such as Japanese or Chinese where there are a lot of characters and even the slightest mistake can make a big difference.
I am unsure how the program will analyze the character. My current idea is to get a single pixel width line to represent the stroke, compare how far each pixel is from the corresponding pixel in the example character loaded from a database, and output which area needs the most work. Endpoints will also be useful to know. I would also like to tell the user if their character could be interpreted as another character similar to the one they wanted to write.
I imagine I will need a library of some sort to complete this project in any sort of timely manner but I have been unable to locate one which meets the standards I will need for the program. I looked into OpenCV but it appears to be meant for vision than image processing. I would also appreciate the library/module to be in python or Java but I can learn a new language if absolutely necessary.
Thank you for any help in this project.
Character Recognition is usually implemented using Artificial Neural Networks (ANNs). It is not a straightforward task to implement seeing that there are usually lots of ways in which different people write the same character.
The good thing about neural networks is that they can be trained. So, to change from one language to another all you need to change are the weights between the neurons, and leave your network intact. Neural networks are also able to generalize to a certain extent, so they are usually able to cope with minor variances of the same letter.
Tesseract is an open source OCR which was developed in the mid 90's. You might want to read about it to gain some pointers.
You can follow company links from this Wikipedia article:
http://en.wikipedia.org/wiki/Intelligent_character_recognition
I would not recommend that you attempt to implement a solution yourself, especially if you want to complete the task in less than a year or two of full-time work. It would be unfortunate if an incomplete solution provided poor guidance for students.
A word of caution: some companies that offer commercial ICR libraries may not wish to support you and/or may not provide a quote. That's their right. However, if you do not feel comfortable working with a particular vendor, either ask for a different sales contact and/or try a different vendor first.
My current idea is to get a single pixel width line to represent the stroke, compare how far each pixel is from the corresponding pixel in the example character loaded from a database, and output which area needs the most work.
The initial step of getting a stroke representation only a single pixel wide is much more difficult than you might guess. Although there are simple algorithms (e.g. Stentiford and Zhang-Suen) to perform thinning, stroke crossings and rough edges present serious problems. This is a classic (and unsolved) problem. Thinning works much of the time, but when it fails, it can fail miserably.
You could work with an open source library, and although that will help you learn algorithms and their uses, to develop a good solution you will almost certainly need to dig into the algorithms themselves and understand how they work. That requires quite a bit of study.
Here are some books that are useful as introduct textbooks:
Digital Image Processing by Gonzalez and Woods
Character Recognition Systems by Cheriet, Kharma, Siu, and Suen
Reading in the Brain by Stanislas Dehaene
Gonzalez and Woods is a standard textbook in image processing. Without some background knowledge of image processing it will be difficult for you to make progress.
The book by Cheriet, et al., touches on the state of the art in optical character recognition (OCR) and also covers handwriting recognition. The sooner you read this book, the sooner you can learn about techniques that have already been attempted.
The Dehaene book is a readable presentation of the mental processes involved in human reading, and could inspire development of interesting new algorithms.
Have you seen http://www.skritter.com? They do this in combination with spaced recognition scheduling.
I guess you want to classify features such as curves in your strokes (http://en.wikipedia.org/wiki/CJK_strokes), then as a next layer identify componenents, then estimate the most likely character. All the while statistically weighting the most likely character. Where there are two likely matches you will want to show them as likely to be confused. You will also need to create a database of probably 3000 to 5000 characters, or up to 10000 for the ambitious.
See also http://www.tegaki.org/ for an open source program to do this.

String analysis and classification

I am developing a financial manager in my freetime with Java and Swing GUI. When the user adds a new entry, he is prompted to fill in: Moneyamount, Date, Comment and Section (e.g. Car, Salary, Computer, Food,...)
The sections are created "on the fly". When the user enters a new section, it will be added to the section-jcombobox for further selection. The other point is, that the comments could be in different languages. So the list of hard coded words and synonyms would be enormous.
So, my question is, is it possible to analyse the comment (e.g. "Fuel", "Car service", "Lunch at **") and preselect a fitting Section.
My first thought was, do it with a neural network and learn from the input, if the user selects another section.
But my problem is, I donĀ“t know how to start at all. I tried "encog" with Eclipse and did some tutorials (XOR,...). But all of them are only using doubles as in/output.
Anyone could give me a hint how to start or any other possible solution for this?
Here is a runable JAR (current development state, requires Java7) and the Sourceforge Page
Forget about neural networks. This is a highly technical and specialized field of artificial intelligence, which is probably not suitable for your problem, and requires a solid expertise. Besides, there is a lot of simpler and better solutions for your problem.
First obvious solution, build a list of words and synonyms for all your sections and parse for these synonyms. You can then collect comments online for synonyms analysis, or use parse comments/sections provided by your users to statistically detect relations between words, etc...
There is an infinite number of possible solutions, ranging from the simplest to the most overkill. Now you need to define if this feature of your system is critical (prefilling? probably not, then)... and what any development effort will bring you. One hour of work could bring you a 80% satisfying feature, while aiming for 90% would cost one week of work. Is it really worth it?
Go for the simplest solution and tackle the real challenge of any dev project: delivering. Once your app is delivered, then you can always go back and improve as needed.
String myString = new String(paramInput);
if(myString.contains("FUEL")){
//do the fuel functionality
}
In a simple app, if you will be having only some specific sections in your application then you can get string from comments and check it if it contains some keywords and then according to it change the value of Section.
If you have a lot of categories, I would use something like Apache Lucene where you could index all the categories with their name's and potential keywords/phrases that might appear in a users description. Then you could simply run the description through Lucene and use the top matched category as a "best guess".
P.S. Neural Network inputs and outputs will always be doubles or floats with a value between 0 and 1. As for how to implement String matching I wouldn't even know where to start.
It seems to me that following will do:
hard word statistics
maybe a stemming class (English/Spanish) which reduce a word like "lunches" to "lunch".
a list of most frequent non-words (the, at, a, for, ...)
The best fit is a linear problem, so theoretical fit for a neural net, but why not take immediately the numerical best fit.
A machine learning algorithm such as an Artificial Neural Network doesn't seem like the best solution here. ANNs can be used for multi-class classification (i.e. 'to which of the provided pre-trained classes does the input represent?' not just 'does the input represent an X?') which fits your use case. The problem is that they are supervised learning methods and as such you need to provide a list of pairs of keywords and classes (Sections) that spans every possible input that your users will provide. This is impossible and in practice ANNs are re-trained when more data is available to produce better results and create a more accurate decision boundary / representation of the function that maps the inputs to outputs. This also assumes that you know all possible classes before you start and each of those classes has training input values that you provide.
The issue is that the input to your ANN (a list of characters or a numerical hash of the string) provides no context by which to classify. There's no higher level information provided that describes the word's meaning. This means that a different word that hashes to a numerically close value can be misclassified if there was insufficient training data.
(As maclema said, the output from an ANN will always be floats with each value representing proximity to a class - or a class with a level of uncertainty.)
A better solution would be to employ some kind of word-relation or synonym graph. A Bag of words model might be useful here.
Edit: In light of your comment that you don't know the Sections before hand,
an easy solution to program would be to provide a list of keywords in a file that gets updated as people use the program. Simply storing a mapping of provided comments -> Sections, which you will already have in your database, would allow you to filter out non-keywords (and, or, the, ...). One option is to then find a list of each Section that the typed keywords belong to and suggest multiple Sections and let the user pick one. The feedback that you get from user selections would enable improvements of suggestions in the future. Another would be to calculate a Bayesian probability - the probability that this word belongs to Section X given the previous stored mappings - for all keywords and Sections and either take the modal Section or normalise over each unique keyword and take the mean. Calculations of probabilities will need to be updated as you gather more information ofcourse, perhaps this could be done with every new addition in a background thread.

Categories