How to implement a bag-of-words in Java

Basically, I need a bidirectional Map that can retrieve a value from its key and a key from its value (I have checked this link). It should ALSO be sorted by value AND accept the same value for multiple keys, since I cannot guarantee that different keys will never share the exact same frequency.
So is there any structure that meets these criteria?
Below is the specific problem that created this need (I may have missed something in my implementation, but if you know an answer to the question above you can probably skip it):
I want to implement a bag-of-words method for some features. The idea is to keep only the top k bins with the greatest frequency of occurrence.
To make it more specific, let's say I have a codebook
double[10000][d] codebook and a set of features double[][] features. For each row in features, which represents one feature, I compute the distance to every row in codebook and assign the feature to the bin whose centroid is the nearest row.
Then I increment the count of that bin by 1, and repeat until all features have been used.
Then, I would like to keep only the top k bins as results.
The part I am a bit stuck on is keeping only the top-k bins. I use a BoundedPriorityQueue<Feature> collection to achieve this, but I am not sure if there is a simpler approach.
public static BoundedPriorityQueue<Feature> boWquantizerLargerK(double[][] codebook, double[][] features, int featureLength, int maxNumWords) {
HashMap<Integer, Integer> boWMap = new HashMap<Integer, Integer>();
BoundedPriorityQueue<Feature> nn = new BoundedPriorityQueue<Feature>(new Feature(), maxNumWords);
for(int i = 0; i < features.length; i++) {
double[] distCodebook = new double[codebook.length];
for(int j = 0; j < codebook.length; j++) {
double[] dist = new double[featureLength];
for(int k = 0; k < featureLength; k++)
dist[k] = (codebook[j][k] - features[i][k])*(codebook[j][k] - features[i][k]);
distCodebook[j] = MathUtils.sum(dist);
}
Integer index = MathUtils.indexOfMin(distCodebook) + 1;
Integer freq;
if((freq = boWMap.get(index)) == null) {
boWMap.put(index, 1);
nn.offer(new Feature(1, index));
}
else {
boWMap.put(index, ++freq);
nn.offer(new Feature(freq, index));
}
}
return nn;
}
The Feature class is a simple implementation of Comparator:
public class Feature implements Comparator<Feature> {
private Integer freq;
private Integer word;
public Feature() {}
public Feature(Integer freq, Integer word) {
this.freq = freq;
this.word = word;}
public int compare(Feature o1, Feature o2) {
if ((o1).getFrequency() > (o2).getFrequency())
return -1;
else if ((o1).getFrequency() < (o2).getFrequency())
return 1;
else
return 0;}
public double getFrequency() {
return freq;}
}
To summarize the problem: I have a collection of value pairs, the first value being the bin and the second its frequency. The collection is updated until all features have been processed, at which point I just want to keep the bins with the greatest frequencies.
I am using a HashMap<Integer, Integer> structure for the collection and a BoundedPriorityQueue<Feature> for the top k bins.
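One possibly simpler approach, sketched here under the assumption that the codebook is non-empty and its size is known up front: count into a plain int[] histogram first, and select the k largest bins only after every feature has been assigned, using a size-bounded min-heap from java.util.PriorityQueue. The names TopKBins, nearestCentroid, histogram and topK are hypothetical and do not come from the post. Separating counting from selection means the queue never receives outdated (frequency, bin) pairs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; all names are illustrative, not from the original post.
final class TopKBins {

    // Index of the codebook row with the smallest squared Euclidean distance to the feature.
    static int nearestCentroid(double[][] codebook, double[] feature) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < codebook.length; j++) {
            double dist = 0.0;
            for (int k = 0; k < feature.length; k++) {
                double diff = codebook[j][k] - feature[k];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best;
    }

    // Counts how many features fall into each bin (one bin per codebook row).
    static int[] histogram(double[][] codebook, double[][] features) {
        int[] counts = new int[codebook.length];
        for (double[] feature : features) {
            counts[nearestCentroid(codebook, feature)]++;
        }
        return counts;
    }

    // Returns up to k non-empty bins as {bin, frequency} pairs, most frequent first.
    static List<int[]> topK(int[] counts, int k) {
        // Min-heap on frequency: the head is always the weakest of the current candidates.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        for (int bin = 0; bin < counts.length; bin++) {
            if (counts[bin] == 0) continue;
            heap.offer(new int[] {bin, counts[bin]});
            if (heap.size() > k) heap.poll(); // evict the least frequent bin
        }
        List<int[]> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b[1], a[1]));
        return result;
    }

    public static void main(String[] args) {
        int[] counts = {5, 0, 9, 5, 1};
        for (int[] e : topK(counts, 3)) {
            System.out.println("bin " + e[0] + " -> " + e[1]);
        }
    }
}
```

Selection costs O(n log k) for n bins, and bins with equal frequency are all kept as separate entries, so ties need no special handling.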

Related

How to find the object with the highest value of an attribute? [duplicate]

Is there an easy way to get the max value from one field of an object in an arraylist of objects? For example, out of the following object, I was hoping to get the highest value for the Value field.
Example arraylist I want to get the max value for ValuePairs.mValue from.
ArrayList<ValuePairs> ourValues = new ArrayList<>();
ourValues.add(new ValuePairs("descr1", 20.00));
ourValues.add(new ValuePairs("descr2", 40.00));
ourValues.add(new ValuePairs("descr3", 50.00));
Class to create objects stored in arraylist:
public class ValuePairs {
public String mDescr;
public double mValue;
public ValuePairs(String strDescr, double dValue) {
this.mDescr = strDescr;
this.mValue = dValue;
}
}
I'm trying to get the max value for mValue by doing something like (which I know is incorrect):
double dMax = Collections.max(ourValues.dValue);
dMax should be 50.00.

With Java 8 you can use stream() together with its predefined max() function and Comparator.comparing() with a lambda expression (this assumes values is your list and ValuePairs exposes a getMValue() getter):
ValuePairs maxValue = values.stream().max(Comparator.comparing(v -> v.getMValue())).get();
Instead of using a lambda expression, you can also use the method reference directly:
ValuePairs maxValue = values.stream().max(Comparator.comparing(ValuePairs::getMValue)).get();

Use a Comparator with Collections.max() to let it know which is greater in comparison.
Also See
How to use custom Comparator
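A minimal sketch of the Collections.max() suggestion, reusing the ValuePairs class from the question with its public fields. The wrapper class MaxByField and the method maxByValue are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical wrapper class for the example.
final class MaxByField {

    // Same shape as the ValuePairs class in the question.
    static final class ValuePairs {
        public String mDescr;
        public double mValue;

        ValuePairs(String strDescr, double dValue) {
            this.mDescr = strDescr;
            this.mValue = dValue;
        }
    }

    // Returns the element with the largest mValue.
    // Collections.max throws NoSuchElementException if the list is empty.
    static ValuePairs maxByValue(List<ValuePairs> values) {
        return Collections.max(values, Comparator.comparingDouble((ValuePairs v) -> v.mValue));
    }

    public static void main(String[] args) {
        List<ValuePairs> ourValues = new ArrayList<>();
        ourValues.add(new ValuePairs("descr1", 20.00));
        ourValues.add(new ValuePairs("descr2", 40.00));
        ourValues.add(new ValuePairs("descr3", 50.00));
        System.out.println(maxByValue(ourValues).mValue); // prints 50.0
    }
}
```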

This has been answered multiple times already, but since it's the first result on Google I will give a Java 8 answer with an example.
Take a look at the stream feature. Then you can get the max from a List of objects like this:
List<ValuePairs> ourValues = new ArrayList<>();
ourValues.stream().max(Comparator.comparing(ValuePairs::getMValue)).get();
By the way, in your example the attributes should be private. You can then access them with a getter.

You should iterate over the list, comparing as you go to find the max value, which is O(N). If you need to do this often, replace the list with a PriorityQueue ordered so that the largest element sits at the head; reading the max is then O(1).
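A short sketch of the PriorityQueue idea. Note that java.util.PriorityQueue is a min-heap by default, so the example passes Comparator.reverseOrder() to keep the maximum at the head. The class name MaxHeapDemo and the method newMaxHeap are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical demo class for the example.
final class MaxHeapDemo {

    // Reverse the natural order so that peek() returns the largest element.
    static PriorityQueue<Double> newMaxHeap() {
        return new PriorityQueue<>(Comparator.reverseOrder());
    }

    public static void main(String[] args) {
        PriorityQueue<Double> heap = newMaxHeap();
        heap.offer(20.0);
        heap.offer(50.0);
        heap.offer(40.0);
        System.out.println(heap.peek()); // prints 50.0
    }
}
```

peek() runs in constant time, while offer() and poll() cost O(log n), so this pays off only when the maximum is read far more often than the collection changes.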

Here first and last bound the range of indexes in the ArrayList to search; to cover the complete list, drop them and loop from i = 0 up to the size of the float list.
// for min value
public String getMinValue(ArrayList list, int first, int last) {
List<Float> floatList = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
Float prova2 = ((Double) list.get(i)).floatValue();
floatList.add(prova2);
}
float min = Float.MAX_VALUE;
String minValue = "";
for (int i = first; i < last; i++) {
if (floatList.get(i) < min) {
min = floatList.get(i);
}
}
minValue = String.format("%.1f", min);
return minValue;
}
// for max value
public String getMaxValue(List<Object> list, int first, int last) {
List<Float> floatList = new ArrayList<>();
for (int i = 0; i < list.size(); i++) {
Float prova2 = ((Double) list.get(i)).floatValue();
floatList.add(prova2);
}
float max = -Float.MAX_VALUE; // Float.MIN_VALUE is the smallest positive float, not the most negative one
String maxValue = "";
for (int i = first; i < last; i++) {
if (floatList.get(i) > max) {
max = floatList.get(i);
}
}
maxValue = String.format("%.1f", max);
return maxValue;
}

Couple Sum return statement explanation and alternative method

I was doing practice questions on Firecode and the question was:
Given an array of integers, find two numbers such that they sum up to a specific target.
The method coupleSum should return the indices of the two numbers in the array, where index1 must be less than index2.
Please note that the indices are not zero based, and you can assume that each input has exactly one solution. Target linear runtime and space complexity.
The solution given was:
public static int[] coupleSum(int[] numbers, int target) {
HashMap<Integer, Integer> map = new HashMap<>();
for(int i=0; i < numbers.length; i++){
int n = numbers[i];
if(map.containsKey(n)){
return new int[]{map.get(n), i+1};
} else {
map.put(target-n, i+1);
}
}
return null;
}
Can you explain what
if(map.containsKey(n)){
return new int[]{map.get(n), i+1};
is doing and perhaps provide an alternate way of writing it?

To write more readable and understandable code, you may look at the snippet below. The logic is the same as in your question, but we maintain an array to store the desired result.
HashMap<Integer,String> map = new HashMap<Integer,String>();
int[] res = new int[2];
//B=target
for(int i=0; i<A.length; i++) {
int key = A[i];
int diff = B-key;
if(!map.containsKey(key) && !map.containsKey(diff)) {
map.put(diff, key+"_"+i);
} else if (map.containsKey(key)) {
String val = map.get(key);
res[0] = Integer.parseInt(val.split("_")[1])+1;
res[1] = i+1;
return res;
}
}
return new int[0];
In the code above, the key of the HashMap is the difference (target minus element), and the value combines the array element with its index.
Let's say A[0] is 12; its value will be 12_0.
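In the Firecode solution, the map's keys are the complements target - n of the numbers already visited, so map.containsKey(n) being true means an earlier element needs exactly n to reach the target, and map.get(n) is that earlier element's 1-based index. Another way to write the same idea, as a sketch, is to store each visited number under its own value and look up the complement instead. The class name CoupleSum and the variable names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class for the example.
final class CoupleSum {

    // Returns the 1-based indices {index1, index2} with index1 < index2,
    // or null if no pair sums to target.
    static int[] coupleSum(int[] numbers, int target) {
        // value seen so far -> its 1-based index
        Map<Integer, Integer> indexByValue = new HashMap<>();
        for (int i = 0; i < numbers.length; i++) {
            Integer partnerIndex = indexByValue.get(target - numbers[i]);
            if (partnerIndex != null) {
                // The partner was seen earlier, so its index is the smaller one.
                return new int[] {partnerIndex, i + 1};
            }
            indexByValue.put(numbers[i], i + 1);
        }
        return null;
    }

    public static void main(String[] args) {
        int[] result = coupleSum(new int[] {2, 7, 11, 15}, 9);
        System.out.println(result[0] + ", " + result[1]); // prints 1, 2
    }
}
```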

Search fast an element in array of java.util.LinkedList

Hey, I am looking for a faster way to find a String element in an array of LinkedLists.
public static void main(String[] args) {
int m = 1000;
LinkedList<String>[] arrayOfList = new LinkedList[m];
for (int i = 0; i < m; i++) {
arrayOfList[i] = new LinkedList<>();
}
}
This is my search method:
public int search(String word) {
for (int i = 0; i < m; i++) {
for (int j = 0; j < arrayOfList[i].size(); j++) {
if (arrayOfList[i].get(j).equals(word)) {
return i;
}
}
}
return -1;
}
This is what my LinkedLists look like:
Example output: arrayOfList[0] = [house,car,tree.....]
arrayOfList[1] = [computer,book,pen....]
......
until arrayOfList[999] = [...]
My search method should find the index of my word. Example: search("computer") = 1; search("house") = 0

Ah, a classic!
LinkedList is notoriously bad for random access, i.e. the list.get(j) method.
It's much better at iterating through the list, so it can jump from each item to the next item.
You could use list.iterator(), but the foreach loop does the same thing:
public int search(String word) {
for (int i = 0; i < m; i++) {
for (String listValue: arrayOfList[i]) {
if (listValue.equals(word)) {
return i;
}
}
}
return -1;
}

The other answer notes that you can get much better performance by iterating over each LinkedList rather than using List.get. That's because List.get has to search from the start of the list each time. For example, if the LinkedList has 100 elements, then on average each call to List.get(j) will have to iterate over 50 elements, and you're doing that 100 times. The foreach loop just iterates over the LinkedList elements once.
The foreach strategy runs in O(n) time, that is, the time required to perform the lookup increases proportional to n, the total number of words, because you have to search them all for each word.
If you're going to be doing this a lot, and you can use data structures other than LinkedList, then you should iterate through your array of LinkedList once and build a HashMap where the key is the word and the value is the number of the array in which that word appears. Setting up this HashMap will require O(n) time, but subsequent lookups will only require O(1) time, meaning a constant time regardless of the number of words involved. So if you're going to do more than a single lookup, creating the HashMap will have better performance in big-O terms, although for a very small number of lookups (2 or 3) it may still be faster to scan the arrays.
You can build a HashMap like this:
Map<String, Integer> index = new HashMap<>();
for (int i = 0; i < m; i++) {
for (String word: arrayOfList[i]) {
index.putIfAbsent(word, i); // keep the first list that contains the word, as the linear search does
}
}
Now search just becomes:
public int search(String word) {
return index.getOrDefault(word, -1);
}

Depending on how the strings are constructed in your program and how often you call the search method, comparing the strings' hash codes first can improve performance. For example:
public int search(String word) {
int wordHashCode = word.hashCode();
for (int i = 0; i < m; i++) {
for (String listValue: arrayOfList[i]) {
if (listValue.hashCode() == wordHashCode && listValue.equals(word)) {
return i;
}
}
}
return -1;
}

Improving the speed of the HashMap

I just coded my own implementation of a HashMap with open addressing; the key type is int and the value type is long. But it runs slower than the existing Java implementation, even when I just add new values. How can I make it faster?
public class MyHashMap {
private int maxPutedId =0;
private int size;
private int[] keys;
private long[] values;
private boolean[] change;
public MyHashMap(int size){
this.size = size;
keys = new int[size];
values = new long[size];
change = new boolean[size];
}
public MyHashMap(){
this.size = 100000;
keys = new int[size];
values = new long[size];
change = new boolean[size];
}
public boolean put(int key, long value){
int k = 0;
boolean search = true;
for(int i = 0;i<maxPutedId+2;i++){
if(search&& !change[i] && keys[i] == 0 && values [i] == 0 ){
k=i;
search = false;
}
if(change[i] && keys[i] == key ){
return false;
}
}
keys[k] = key;
values[k] = value;
change[k] = true;
maxPutedId = k;
return true;
}
public Long get(int key) {
for (int i = 0; i < size; i++) {
if (change[i] && keys[i] == key) {
return values[i];
}
}
return null;
}
public int size(){
int s = 0;
for(boolean x: change){
if(x) s++;
}
return s;
}}

You have not implemented a hash table; there is no hashing going on. For example, your get() method is doing a linear traversal through the key array. A hash table implementation is supposed to be able to compute the array entry where the key is most likely to be found (and will in fact be found if it exists and there were no hash collisions).
A simple hash table would look like this: we first compute a hash from the key. Then we look at that slot in the table. Ideally, that's where the key will be found. However, if the key is not there, it could be due to collisions, so then we scan (assuming open addressing) looking for the key in subsequent slots - until we've looked through the whole table or found an unoccupied slot.
I wrote 'get' since it seemed simpler :-)
This is 'off the top of my head' code so you will need to check it carefully.
Long get(int key) {
int h = hash(key);
// look in principal location for this key
if (change[h] && keys[h] == key)
return values[h];
// nope, scan table (wrapping around at the end)
// and stop when we have found the key, scanned
// the whole table, or met an empty slot
int h0 = h; // save original position
while ((h = (h+1) % size) != h0 && change[h])
if ( keys[h] == key)
return values[h];
return null;
}
I probably should have written 'put' first to be more instructive.
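The hash function, for int keys, could be computed as key % size (use Math.floorMod(key, size) if keys can be negative, because % can return a negative result in Java). Whether that's a good hash depends on the distribution of your keys; you want a hash that avoids collisions.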
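For completeness, here is a sketch of put next to get using linear probing, under some loud assumptions: the capacity is positive and fixed (no resizing), there is no remove operation, and the class name IntLongMap plus the field name used (standing in for the question's change array) are hypothetical. It is meant to illustrate the probing logic, not to be a tuned implementation.

```java
// Hypothetical open-addressing map from int keys to long values.
final class IntLongMap {
    private final int[] keys;
    private final long[] values;
    private final boolean[] used; // plays the role of the "change" array in the question
    private int count;

    IntLongMap(int capacity) { // assumes capacity > 0
        keys = new int[capacity];
        values = new long[capacity];
        used = new boolean[capacity];
    }

    // Math.floorMod keeps the slot non-negative even for negative keys.
    private int hash(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Inserts the pair; returns false if the key is already present or the table is full.
    boolean put(int key, long value) {
        int h = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (!used[h]) {
                keys[h] = key;
                values[h] = value;
                used[h] = true;
                count++;
                return true;
            }
            if (keys[h] == key) {
                return false;
            }
            h = (h + 1) % keys.length; // linear probing, wrapping around at the end
        }
        return false;
    }

    // Returns the stored value, or null if the key is absent.
    Long get(int key) {
        int h = hash(key);
        // Stop at the first empty slot or after scanning the whole table.
        for (int probes = 0; probes < keys.length && used[h]; probes++) {
            if (keys[h] == key) {
                return values[h];
            }
            h = (h + 1) % keys.length;
        }
        return null;
    }

    // Tracked incrementally instead of scanning the whole table.
    int size() {
        return count;
    }

    public static void main(String[] args) {
        IntLongMap map = new IntLongMap(8);
        map.put(1, 10L);
        map.put(9, 90L);  // collides with key 1 and lands in the next slot
        map.put(-7, 70L); // floorMod(-7, 8) == 1, so it probes further
        System.out.println(map.get(9) + " " + map.get(-7) + " " + map.get(17)); // prints 90 70 null
    }
}
```

Both operations touch only the slots between the hashed position and the first empty slot, which is why keeping the table sparsely filled matters for speed.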

Genetic Algorithm - Java Crossover

With my GA's crossover method, I keep getting an out-of-bounds exception while concatenating the mother's second half to the father's first half. The ArrayLists are all the same size. Why does my mother keep trying to access the 10th element in my list of objects? MyPair is an object with a random direction and random number of steps.
We are currently learning this topic in my A.I. class, so I'm not an expert in GAs yet. Any additional commentary on my crossover algorithm is welcome.
public static class Chromosome{
public ArrayList<MyPair> pairs;
private double x, y;
public double cost;
public Chromosome(){
this.pairs = new ArrayList<MyPair>();
this.x = 100.0; this.y = 100.0;
// not sure if I should do this or not
for(int numPairs = 0; numPairs < 10; numPairs++)
this.addToChromosome();
}
public void addToChromosome(){
MyPair myPair = new MyPair();
this.pairs.add(myPair);
}
public ArrayList<MyPair> getPairsList(){
return this.pairs;
}
public Chromosome crossOver(Chromosome father, Chromosome mother){
Chromosome replacement = new Chromosome();
int pos1 = r.nextInt(father.getPairsList().size());
while(pos1 >= 10)
pos1 = r.nextInt(father.getPairsList().size());
for(int i = 0; i < pos1; i++){
MyPair tempPair = father.getPairsList().get(i);
replacement.getPairsList().set(i, tempPair);
}
for(int i = pos1; i < mother.getPairsList().size() - 1; i++){
MyPair tempPair = mother.getPairsList().get(i);
// ArrayList keeps trying to set out of bounds here
replacement.getPairsList().set(i, tempPair);
}
return replacement;
}

The problem appears to be that you are constructing chromosomes including replacement to have 10 pairs, and then you are setting the element in position i, when i may be 10 or greater.
This has multiple effects you might not have intended. If you splice together a mother and father so that the mother has fewer than 10 pairs, you end up with 10 pairs anyway, with the last ones being just new pairs. If the mother has more than 10 pairs, you are trying to set elements of the arraylist that don't exist, hence you are getting an exception. Another thing you might not have encountered yet is that you haven't copied the information in the pair, you copied the reference to the pair. This means if you give the mother a mutation later by changing the information in the pair rather than replacing a pair, it will affect the child and the child's descendants, which probably is not what you intended.
Instead, start the chromosome with an empty list of pairs, and then add copies of the pairs from the father, and then add copies of pairs from the mother.
Untested code:
public Chromosome crossOver(Chromosome father, Chromosome mother){
Chromosome replacement = new Chromosome();
replacement.getPairsList().clear(); // get rid of the original 10 pairs
int pos1 = r.nextInt(father.getPairsList().size());
while(pos1 >= 10)
pos1 = r.nextInt(father.getPairsList().size());
for(int i = 0; i < pos1; i++){
MyPair tempPair = father.getPairsList().get(i);
replacement.getPairsList().add(tempPair.makeCopy()); // appended copy instead of setting ith
}
for(int i = pos1; i < mother.getPairsList().size(); i++){ // size(), not size() - 1, so the mother's last pair is kept
MyPair tempPair = mother.getPairsList().get(i);
replacement.getPairsList().add(tempPair.makeCopy()); // append copy instead of setting ith
}
return replacement;
}
You have to add a makeCopy method to your MyPair class that returns a MyPair with the same information. There are other ways to do this.
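As a rough sketch of what makeCopy and a copy-based single-point crossover could look like: the post only says that MyPair holds a direction and a number of steps, so the two int fields below, the class name CrossoverSketch, and passing the cut point as a parameter are all assumptions made for illustration. The cut must not exceed the father's size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not the asker's actual classes.
final class CrossoverSketch {

    // Stand-in for MyPair: the field names and types are assumptions.
    static final class MyPair {
        final int direction;
        final int steps;

        MyPair(int direction, int steps) {
            this.direction = direction;
            this.steps = steps;
        }

        // Returns a new object with the same information, so parent and child never share a pair.
        MyPair makeCopy() {
            return new MyPair(direction, steps);
        }
    }

    // Single-point crossover: father's pairs before cut, mother's pairs from cut onward.
    static List<MyPair> crossOver(List<MyPair> father, List<MyPair> mother, int cut) {
        List<MyPair> child = new ArrayList<>(mother.size());
        for (int i = 0; i < cut; i++) {
            child.add(father.get(i).makeCopy());
        }
        for (int i = cut; i < mother.size(); i++) {
            child.add(mother.get(i).makeCopy());
        }
        return child;
    }

    public static void main(String[] args) {
        Random r = new Random();
        List<MyPair> father = new ArrayList<>();
        List<MyPair> mother = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            father.add(new MyPair(r.nextInt(4), r.nextInt(10)));
            mother.add(new MyPair(r.nextInt(4), r.nextInt(10)));
        }
        int cut = r.nextInt(Math.min(father.size(), mother.size()) + 1);
        System.out.println(crossOver(father, mother, cut).size()); // prints 10
    }
}
```

Because the child list is built by appending, its length always equals the mother's length when the cut is within range, and no index is ever set before it exists.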
