I need to make an Encog program in Java, modeled on the XOR function, whose inputs must be string words with definitions, but BasicMLDataSet can only receive doubles. Here is the sample code that I am using:
/**
* The input necessary for XOR.
*/
public static double XOR_INPUT[][] = { { 0.0, 0.0 }, { 1.0, 0.0 },
{ 0.0, 1.0 }, { 1.0, 1.0 } };
/**
* The ideal data necessary for XOR.
*/
public static double XOR_IDEAL[][] = { { 0.0 }, { 1.0 }, { 1.0 }, { 0.0 } };
And here is the class that receives XOR_INPUT and XOR_IDEAL:
MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);
The code is from the Encog XOR example.
Is there any way that I can accomplish training with strings, or parse them somehow and then turn them back into strings before writing them to the console?
I have found a workaround for this. Since I can only provide double values between 0 and 1 as inputs, and since I haven't found any function in Encog that natively normalizes strings to double values, I have written my own function. I take the ASCII value of every letter in the word and then simply compute 90/asciiValue to get a value between 0 and 1. Keep in mind that this only works for lowercase letters; the function can easily be extended to support uppercase letters as well. Here is the function:
// Converts every letter in the string to its ASCII value and
// normalizes it (90 / asciiValue). najveci ("largest" in Croatian)
// is the target array length, i.e. the longest word's length.
public static double[] toAscii(String s, int najveci) {
    double[] ascii = new double[najveci];
    try {
        byte[] bytes = s.getBytes("US-ASCII");
        for (int i = 0; i < bytes.length; i++) {
            ascii[i] = 90.0 / bytes[i];
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return ascii;
}
For the ideal word output I'm using a similar solution. I also normalize each letter in the word, but then I take the average of those values. Later, I denormalize those values to get the strings back and check how well the model trained.
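In sketch form, the averaging and per-letter denormalization I describe could look like this (method names are mine, not Encog API; the average itself is lossy, so the inverse is only approximate):
// Sketch only: same per-letter scheme as toAscii above (90.0 / asciiValue),
// averaged over the word.
public static double wordToAverage(String word) {
    double sum = 0.0;
    for (char c : word.toCharArray()) {
        sum += 90.0 / (int) c;
    }
    return sum / word.length();
}

// Rough inverse of the per-letter normalization: 90.0 / n gives the ASCII code back.
public static char denormalizeLetter(double n) {
    return (char) Math.round(90.0 / n);
}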
You can view full code here.
You can use Encog's EncogAnalyst and AnalystWizard to normalize your data. This post by @JeffHeaton (the author of Encog) shows an example using .csv files.
These classes can normalize both numeric and "nominal" data (e.g. the strings you want to use). You will likely want to use "Equilateral" normalization for these strings, as this avoids some training issues with neural networks.
You might also want to check out this tutorial on Encog on PluralSight which has an entire section on Normalization.
Here is an example from the Encog documentation that shows how to normalize a field using code (without a .csv file):
var fuelStats = new NormalizedField(NormalizationAction.Normalize, "fuel", 200, 0, -0.9, 0.9);
For the above example the range is normalized to -0.9 to 0.9. This is very similar to normalizing between -1 and 1, but less extreme; this can produce better results at times. It is also known that the acceptable range for fuel is between 0 and 200. Now that the field object has been created, it is easy to normalize the values. Here the value 100 is normalized into the variable n:
double n = fuelStats.Normalize(100);
To denormalize n back to the original fuel value, use the following code:
double f = fuelStats.Denormalize(n);
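The snippet above uses the C# edition's syntax. A rough Java equivalent might look as follows, assuming Encog 3.x's org.encog.util.arrayutil.NormalizedField with normalize/deNormalize methods (verify the names against your Encog version):
import org.encog.util.arrayutil.NormalizationAction;
import org.encog.util.arrayutil.NormalizedField;

NormalizedField fuelStats = new NormalizedField(
        NormalizationAction.Normalize, "fuel", 200, 0, -0.9, 0.9);

double n = fuelStats.normalize(100);   // map a fuel reading into the target range
double f = fuelStats.deNormalize(n);   // and back to the original scale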
I am using the .NET implementation of liblinear in my C# code via the following NuGet package:
https://www.nuget.org/packages/Liblinear/
But in the readme file of liblinear, the format for x is:
struct problem describes the problem:
struct problem
{
    int l, n;
    int *y;
    struct feature_node **x;
    double bias;
};
where `l` is the number of training data. If bias >= 0, we assume
that one additional feature is added to the end of each data
instance. `n` is the number of features (including the bias feature
if bias >= 0). `y` is an array containing the target values (integers
in classification, real numbers in regression). And `x` is an array
of pointers, each of which points to a sparse representation (array
of feature_node) of one training vector.
For example, if we have the following training data:
LABEL ATTR1 ATTR2 ATTR3 ATTR4 ATTR5
----- ----- ----- ----- ----- -----
1 0 0.1 0.2 0 0
2 0 0.1 0.3 -1.2 0
1 0.4 0 0 0 0
2 0 0.1 0 1.4 0.5
3 -0.1 -0.2 0.1 1.1 0.1
and bias = 1, then the components of problem are:
l = 5
n = 6
y -> 1 2 1 2 3
x -> [ ] -> (2,0.1) (3,0.2) (6,1) (-1,?)
[ ] -> (2,0.1) (3,0.3) (4,-1.2) (6,1) (-1,?)
[ ] -> (1,0.4) (6,1) (-1,?)
[ ] -> (2,0.1) (4,1.4) (5,0.5) (6,1) (-1,?)
[ ] -> (1,-0.1) (2,-0.2) (3,0.1) (4,1.1) (5,0.1) (6,1) (-1,?)
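For comparison, this is how the example above would be built with the common Java port, liblinear-java by de.bwaldvogel (a sketch; field types vary slightly between versions, and the (-1,?) terminator from the C readme is not needed because Java arrays know their own length):
import de.bwaldvogel.liblinear.FeatureNode;
import de.bwaldvogel.liblinear.Problem;

Problem problem = new Problem();
problem.l = 5;     // number of training examples
problem.n = 6;     // number of features, including the bias feature
problem.bias = 1;
problem.y = new double[] {1, 2, 1, 2, 3};
// One row per example; only nonzero attributes get a node, plus bias at index 6.
problem.x = new FeatureNode[][] {
    { new FeatureNode(2, 0.1), new FeatureNode(3, 0.2), new FeatureNode(6, 1) },
    { new FeatureNode(2, 0.1), new FeatureNode(3, 0.3), new FeatureNode(4, -1.2), new FeatureNode(6, 1) },
    { new FeatureNode(1, 0.4), new FeatureNode(6, 1) },
    { new FeatureNode(2, 0.1), new FeatureNode(4, 1.4), new FeatureNode(5, 0.5), new FeatureNode(6, 1) },
    { new FeatureNode(1, -0.1), new FeatureNode(2, -0.2), new FeatureNode(3, 0.1),
      new FeatureNode(4, 1.1), new FeatureNode(5, 0.1), new FeatureNode(6, 1) },
};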
But in this example implementation (note that the code shown is actually F# syntax):
https://gist.github.com/hodzanassredin/6682771
problem.x <- [|
[|new FeatureNode(1,0.); new FeatureNode(2,1.)|]
[|new FeatureNode(1,2.); new FeatureNode(2,0.)|]
|]// feature nodes
problem.y <- [|1.;2.|] // target values
which means his data set is:
1 0 1
2 2 0
So he is not storing the nodes in liblinear's sparse format. Does anyone know the correct format for x for the liblinear implementation?
Though it doesn't address exactly the library you mentioned, I can offer you an alternative. The Accord.NET Framework has recently incorporated all of LIBLINEAR's algorithms in its machine learning namespaces. It is also available through NuGet.
In this library, the direct syntax to create a linear support vector machine from in-memory data is
// Create a simple binary AND
// classification problem:
double[][] problem =
{
//  a  b   a AND b
new double[] { 0, 0, 0 },
new double[] { 0, 1, 0 },
new double[] { 1, 0, 0 },
new double[] { 1, 1, 1 },
};
// Get the first two columns as the problem
// inputs and the last column as the output
// input columns
double[][] inputs = problem.GetColumns(0, 1);
// output column
int[] outputs = problem.GetColumn(2).ToInt32();
// However, SVMs expect the output value to be
// either -1 or +1. As such, we have to convert
// it so the vector contains { -1, -1, -1, +1 }:
//
outputs = outputs.Apply(x => x == 0 ? -1 : 1);
After the problem is created, one can learn a linear SVM using
// Create a new linear-SVM for two inputs (a and b)
SupportVectorMachine svm = new SupportVectorMachine(inputs: 2);
// Create a L2-regularized L2-loss support vector classification
var teacher = new LinearDualCoordinateDescent(svm, inputs, outputs)
{
Loss = Loss.L2,
Complexity = 1000,
Tolerance = 1e-5
};
// Learn the machine
double error = teacher.Run(computeError: true);
// Compute the machine's answers for the learned inputs
int[] answers = inputs.Apply(x => Math.Sign(svm.Compute(x)));
This assumes, however, that your data is already in memory. If you wish to load your data from disk, from a file in libsvm sparse format, you can use the framework's SparseReader class.
An example on how to use it can be found below:
// Suppose we are going to read a sparse sample file containing
// samples which have an actual dimension of 4. Since the samples
// are in a sparse format, each entry in the file will probably
// have a much smaller number of elements.
//
int sampleSize = 4;
// Create a new Sparse Sample Reader to read any given file,
// passing the correct dense sample size in the constructor
//
SparseReader reader = new SparseReader(file, Encoding.Default, sampleSize);
// Declare a vector to obtain the label
// of each of the samples in the file
//
int[] labels = null;
// Declare a vector to obtain the description (or comments)
// about each of the samples in the file, if present.
//
string[] descriptions = null;
// Read the sparse samples and store them in a dense vector array
double[][] samples = reader.ReadToEnd(out labels, out descriptions);
Afterwards, one can use the samples and labels vectors as the inputs and outputs of the problem, respectively.
I hope it helps.
Disclaimer: I am the author of this library. I am answering this question in the sincere hope it can be useful for the OP, since not long ago I also faced the same problems. If a moderator thinks this looks like spam, feel free to delete. However, I am only posting this because I think it might help others. I even came across this question by mistake while searching for existing C# implementations of LIBSVM, not LIBLINEAR.
I have searched the internet but have not found a solution to my question.
I would like to replicate the FLOOR function found in Excel in Java. In particular, I would like to provide a value (double, or preferably BigDecimal) and round down to the nearest multiple of a significance I provide.
Example 1:
Value = 24,519.30235
Significance = 0.01
Returned Value = 24,519.30
Example 2:
Value = 76.81485697
Significance = 1
Returned Value = 76
Example 3:
Value = 12,457,854
Significance = 100
Returned Value = 12,457,800
I am pretty new to Java and was wondering whether an API already includes this function, or whether someone would be kind enough to give me a solution to the above. I am aware of BigDecimal, but I might have missed the correct method.
Many thanks
Yes you can.
Let's say the given numbers are
76.21445
and
0.01
What you can do is multiply 76.21445 by 100 (or divide by 0.01),
round the result to the nearest or lower integer (depending on which one you want),
and then multiply it by the significance (0.01) again.
Note that it may not print exactly what you want if you don't stick to numbers with a finite decimal expansion (the usual problem of numbers whose binary representation is not finite). Math also has a round function that does much of what you want:
http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html
Note that Math.round takes a single argument and rounds to the nearest long, so to round to two decimals you combine it with the scaling trick above:
Math.round(200.3456 * 100) / 100.0;  // 200.35
One example could be:
public static void main(String[] args) {
    BigDecimal value = new BigDecimal("2.0");
    BigDecimal significance = new BigDecimal("0.5");
    for (int i = 1; i <= 10; i++) {
        System.out.println(value + " --> " + floor(value, significance));
        value = value.add(new BigDecimal("0.1"));
    }
}

private static double floor(BigDecimal value, BigDecimal significance) {
    double result = 0;
    if (value != null) {
        // Caution: divide without a rounding mode throws ArithmeticException
        // when the quotient has a non-terminating decimal expansion.
        result = value.divide(significance).doubleValue();
        result = Math.floor(result) * significance.doubleValue();
    }
    return result;
}
To round a BigDecimal, you can use setScale(). In your case, you want RoundingMode.FLOOR.
Now you need to determine the number of digits from the "significance". Use Math.log10(significance) for that. You'll probably have to round the result up.
If the result is negative, then you have a significance < 1. In this case, use setScale(-result, RoundingMode.FLOOR) to round to N digits.
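For instance, Example 1 from the question, as a sketch of this significance-below-1 branch:
import java.math.BigDecimal;
import java.math.RoundingMode;

BigDecimal value = new BigDecimal("24519.30235");
int result = (int) Math.round(Math.log10(0.01));                  // -2, so significance < 1
BigDecimal floored = value.setScale(-result, RoundingMode.FLOOR); // 24519.30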
If the significance is > 1, then use this code:
value
.divide(significance)
.setScale(0, RoundingMode.FLOOR)
.multiply(significance);
e.g. 1024 with significance 100 gives 10.24 -> 10 -> 1000.
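Putting both branches together into a single method, a minimal sketch (my own combination of the steps above, using the same java.math imports as the snippet earlier):
public static BigDecimal excelFloor(BigDecimal value, BigDecimal significance) {
    return value
            .divide(significance, 0, RoundingMode.FLOOR) // whole number of multiples
            .multiply(significance);                     // scale back up
}

// excelFloor(new BigDecimal("24519.30235"), new BigDecimal("0.01")) -> 24519.30
// excelFloor(new BigDecimal("76.81485697"), BigDecimal.ONE)         -> 76
// excelFloor(new BigDecimal("12457854"),    new BigDecimal("100"))  -> 12457800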
Is there a function in Java, or in a library such as Apache Commons Math which is equivalent to the MATLAB function randsample?
More specifically, I want to find a function randSample which returns a vector of Independent and Identically Distributed random variables according to the probability distribution which I specify.
For example:
int[] a = randSample(new int[]{0, 1, 2}, 5, new double[]{0.2, 0.3, 0.5})
//        { 0 w.p. 0.2
// a[i] = { 1 w.p. 0.3
//        { 2 w.p. 0.5
The output is the same as the MATLAB code randsample([0 1 2], 5, true, [0.2 0.3 0.5]) where the true means sampling with replacement.
If such a function does not exist, how do I write one?
Note: I know that a similar question has been asked on Stack Overflow but unfortunately it has not been answered.
I'm pretty sure one doesn't exist, but it's pretty easy to make a function that produces samples like that. First off, Java does come with a random number generator, java.util.Random, whose instance method nextDouble() produces random doubles between 0.0 and 1.0.
import java.util.Random;

Random rng = new Random();
double someRandomDouble = rng.nextDouble();
// This will be a uniformly distributed
// random variable between 0.0 and 1.0.
For sampling with replacement, if you convert the pdf you have as an input into a cdf, you can use the random doubles Java provides to create a random data set by seeing in which part of the cdf each one falls. So first you need to convert the pdf into a cdf.
int[] randsample(int[] values, int numsamples,
                 boolean withReplacement, double[] pdf) {
    if (withReplacement) {
        Random rng = new Random();
        // Accumulate the pdf into a cdf.
        double[] cdf = new double[pdf.length];
        cdf[0] = pdf[0];
        for (int i = 1; i < pdf.length; i++) {
            cdf[i] = cdf[i-1] + pdf[i];
        }
Then you make the properly-sized array of ints to store the result and start finding the random results:
        int[] results = new int[numsamples];
        for (int i = 0; i < numsamples; i++) {
            double randomValue = rng.nextDouble();  // one uniform draw per sample
            int currentPosition = 0;
            // Check the bound first so we never read past the array.
            while (currentPosition < cdf.length && randomValue > cdf[currentPosition]) {
                currentPosition++;  // Check the next one.
            }
            if (currentPosition < cdf.length) {  // It worked!
                results[i] = values[currentPosition];
            } else {  // It didn't work (roundoff)... let's fail gracefully I guess,
                results[i] = values[cdf.length - 1];
                // and assign it the last value.
            }
        }
        // Now we're done and can return the results!
        return results;
    } else {  // Without replacement.
        throw new UnsupportedOperationException("This is unimplemented!");
    }
}
There's some error checking (make sure value array and pdf array are the same size) and some other features you can implement by overloading this to provide the other functions, but hopefully this is enough for you to start. Cheers!
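Hypothetical usage, mirroring the MATLAB call from the question (assuming randsample is visible from your calling code):
// randsample([0 1 2], 5, true, [0.2 0.3 0.5]) in MATLAB becomes:
int[] a = randsample(new int[]{0, 1, 2}, 5, true, new double[]{0.2, 0.3, 0.5});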
I'm creating a pseudo-random text generator using a Markov model. Basically, I use a hash table to store lists of substrings of order k (the order of the Markov model), then for each substring I have a TreeMap of the suffixes with their frequencies throughout the substring.
I'm struggling with generating the random suffix. For each substring, I have a TreeMap containing all of the possible suffixes and their frequencies. I'm having trouble with using this to create a probability for each suffix, and then generating a pseudo-random suffix based on the probabilities.
Any help on the concept of this and how to go about doing this is appreciated. If you have any questions or need clarification, please let me know.
I'm not sure that a TreeMap is really the best data structure for this, but...
You can use the Math.random() method to obtain a random value between 0.0 (inclusive) and 1.0 (exclusive). Then, iterate over the elements of your map, accumulating their frequencies, until you surpass that value. The suffix that first surpasses this value is your result. Assuming that your map-elements' frequencies all add up to 1.0, this will choose all suffixes in proportion to their frequencies.
For example:
public class Demo
{
private final Map<String, Double> suffixFrequencies =
new TreeMap<String, Double>();
private String getRandomSuffix()
{
final double value = Math.random();
double accum = 0.0;
for(final Map.Entry<String, Double> e : suffixFrequencies.entrySet())
{
accum += e.getValue();
if(accum > value)
return e.getKey();
}
throw new AssertionError(); // or something
}
public static void main(final String... args)
{
final Demo demo = new Demo();
demo.suffixFrequencies.put("abc", 0.3); // value in [0.0, 0.3)
demo.suffixFrequencies.put("def", 0.2); // value in [0.3, 0.5)
demo.suffixFrequencies.put("ghi", 0.5); // value in [0.5, 1.0)
// Print "abc" approximately three times, "def" approximately twice,
// and "ghi" approximately five times:
for(int i = 0; i < 10; ++i)
System.out.println(demo.getRandomSuffix());
}
}
Notes:
Due to roundoff error, the throw new AssertionError() probably will happen every so often, albeit very rarely. So I recommend replacing that line with something that just always chooses the first or last element.
If the frequencies don't all add up to 1.0, then you should add a pass at the beginning of getRandomSuffix() that determines the sum of all frequencies. You can then scale value accordingly.
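For example, that pre-pass might look like this (a sketch against the Demo class above):
// Sum the frequencies first, then scale the uniform draw by the total
// instead of assuming the frequencies add up to 1.0:
double total = 0.0;
for (final double f : suffixFrequencies.values())
    total += f;
final double value = Math.random() * total;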
I feel like there should be an available library to do two things more simply: A) find the mode of an array of doubles, and B) gracefully degrade the precision until a particular frequency is reached.
So imagine an array like this:
double[] a = {1.12, 1.15, 1.13, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4};
If I was looking for a frequency of 3 then it would go from 2 decimal positions to 1 decimal, and finally return 1.1 as my mode. If I had a frequency requirement of 4 it would return 4 as my mode.
I do have a set of code that is working the way I want and returning what I am expecting, but I feel like there should be a more efficient way to accomplish this, or an existing library that would help me do the same. Attached is my code; I'd be interested in thoughts/comments on different approaches I should have taken. I have the iterations limited to cap how far the precision can degrade.
public static double findMode(double[] r, int frequencyReq)
{
double mode = 0d;
int frequency = 0;
int iterations = 4;
HashMap<Double, BigDecimal> counter = new HashMap<Double, BigDecimal>();
while(frequency < frequencyReq && iterations > 0){
String roundFormatString = "#.";
for(int j=0; j<iterations; j++){
roundFormatString += "#";
}
DecimalFormat roundFormat = new DecimalFormat(roundFormatString);
for(int i=0; i<r.length; i++){
double element = Double.valueOf(roundFormat.format(r[i]));
if(!counter.containsKey(element))
counter.put(element, new BigDecimal(0));
counter.put(element,counter.get(element).add(new BigDecimal(1)));
}
for(Double key : counter.keySet()){
if(counter.get(key).compareTo(new BigDecimal(frequency))>0){
mode = key;
frequency = counter.get(key).intValue();
log.debug("key: " + key + " Count: " + counter.get(key));
}
}
iterations--;
}
return mode;
}
Edit
Another way to rephrase the question, per Paulo's comment: the goal is to locate a number whose neighborhood contains at least frequency array elements, with the radius of the neighborhood being as small as possible.
Here is a solution to the reformulated question:
"The goal is to locate a number whose neighborhood contains at least frequency array elements, with the radius of the neighborhood being as small as possible."
(I took the liberty of switching the order of 1.15 and 1.13 in the input array.)
The basic idea is: we have the input already sorted (i.e. neighboring elements are consecutive), and we know how many elements we want in our neighborhood. So we loop once over this array, measuring the distance between the left element and the element frequency-1 positions further right; the window between them (inclusive) contains frequency elements, so it forms a neighbourhood. Then we simply take the minimum such distance. (My method has a complicated way of returning the results; you may want to do it better.)
This is not completely equivalent to your original question (it does not work by fixed steps of digits), but maybe this is more what you really want :-)
You'll have to find a better way of formatting the results, though.
package de.fencing_game.paul.examples;
import java.util.Arrays;
/**
* searching of dense points in a distribution.
*
* Inspired by http://stackoverflow.com/questions/5329628/finding-a-mode-with-decreasing-precision.
*/
public class InpreciseMode {
/** our input data, should be sorted ascending. */
private double[] data;
public InpreciseMode(double ... data) {
this.data = data;
}
/**
 * Searches for the smallest neighbourhood (by diameter) which
 * contains at least minSize elements.
 *
 * @return an array of two arrays:
 *   { { the middle point of the neighborhood,
 *       the diameter of the neighborhood },
 *     all the elements of the neighborhood }
 *
 * TODO: better return an object of a class encapsulating these.
 */
public double[][] findSmallNeighbourhood(int minSize) {
int currentLeft = -1;
int currentRight = -1;
double currentMinDiameter = Double.POSITIVE_INFINITY;
for(int i = 0; i + minSize-1 < data.length; i++) {
double diameter = data[i+minSize-1] - data[i];
if(diameter < currentMinDiameter) {
currentMinDiameter = diameter;
currentLeft = i;
currentRight = i + minSize-1;
}
}
return
new double[][] {
{
(data[currentRight] + data[currentLeft])/2.0,
currentMinDiameter
},
Arrays.copyOfRange(data, currentLeft, currentRight+1)
};
}
public void printSmallNeighbourhoods() {
for(int frequency = 2; frequency <= data.length; frequency++) {
double[][] found = findSmallNeighbourhood(frequency);
System.out.printf("There are %d elements in %f radius "+
"around %f:%n %s.%n",
frequency, found[0][1]/2, found[0][0],
Arrays.toString(found[1]));
}
}
public static void main(String[] params) {
InpreciseMode m =
new InpreciseMode(1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1,
4.2, 4.3, 4.4);
m.printSmallNeighbourhoods();
}
}
The output is
There are 2 elements in 0,005000 radius around 1,125000:
[1.12, 1.13].
There are 3 elements in 0,015000 radius around 1,135000:
[1.12, 1.13, 1.15].
There are 4 elements in 0,150000 radius around 4,250000:
[4.1, 4.2, 4.3, 4.4].
There are 5 elements in 0,450000 radius around 3,850000:
[3.4, 3.44, 4.1, 4.2, 4.3].
There are 6 elements in 0,500000 radius around 3,900000:
[3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 7 elements in 1,200000 radius around 3,200000:
[2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 8 elements in 1,540000 radius around 2,660000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2].
There are 9 elements in 1,590000 radius around 2,710000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3].
There are 10 elements in 1,640000 radius around 2,760000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
I think there's nothing wrong with your code, and I doubt you will find a library that does something so specific. But if you still want an idea for approaching this problem in a more OOP style that reuses the Java collections, here is another approach:
Create a class to represent numbers with a variable number of decimals. It would have something like VariableDecimal(double d, int ndecimals) as its constructor.
In that class, override equals and hashCode. Your implementation of equals will test whether two instances of VariableDecimal are equal, taking into account the value d and the number of decimals. hashCode can simply return d*10^ndecimals cast to an int.
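A minimal sketch of such a class (hypothetical; it simply follows the description above):
import java.math.BigDecimal;
import java.math.RoundingMode;

// Equality and hashing depend on the value rounded to ndecimals places.
public class VariableDecimal {
    private final BigDecimal rounded;

    public VariableDecimal(double d, int ndecimals) {
        this.rounded = BigDecimal.valueOf(d)
                .setScale(ndecimals, RoundingMode.HALF_UP);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof VariableDecimal
                && rounded.equals(((VariableDecimal) o).rounded);
    }

    @Override
    public int hashCode() {
        return rounded.hashCode();  // consistent with equals above
    }
}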
In your logic, use a HashMap so that it relies on your objects' equals/hashCode:
HashMap<VariableDecimal, AtomicInteger> counters = new HashMap<VariableDecimal, AtomicInteger>();
for (double d : a) {
    VariableDecimal vd = new VariableDecimal(d, ndecimals);
    if (counters.get(vd) == null)
        counters.put(vd, new AtomicInteger(0));
    counters.get(vd).incrementAndGet();
}
/* at the end of this loop counters should hold a map with frequencies of
each double for the selected precision so that you can simply traverse and
get the max */
This piece of code doesn't show the iteration that decrements the number of decimals, which is trivial.
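For completeness, the omitted outer iteration might look roughly like this (countAtPrecision is a hypothetical helper wrapping the counting loop above; imports: java.util.HashMap, java.util.Map, java.util.concurrent.atomic.AtomicInteger):
// Degrade precision one decimal at a time until some value is frequent enough.
static VariableDecimal findMode(double[] a, int frequencyReq) {
    for (int ndecimals = 4; ndecimals >= 0; ndecimals--) {
        HashMap<VariableDecimal, AtomicInteger> counters = countAtPrecision(a, ndecimals);
        for (Map.Entry<VariableDecimal, AtomicInteger> e : counters.entrySet()) {
            if (e.getValue().get() >= frequencyReq)
                return e.getKey();  // first value dense enough at this precision
        }
    }
    return null;  // no value reached the required frequency
}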