How to find N grams of a word in Java? - java

For example if the input is "name" and the minGram is 1 and maxGramSize is 2 output will consist of n,a,m,e,na,am,me. If the minGram=2, maxGram=4 inputWord=name, output = na,am,me,nam,ame,name.
Function signature can be something like this:
public List<String> generateNGrams(String input, int minGramSize, int maxGramSize)
Initially I tried doing it with for loops, but I was finding it hard to follow the indices. Then I tried solving it using recursion with pen and paper but I'm still struggling with it. Can someone help me with this?

One solution:
private static void addNgrams(final int size, final String input,
final List<String> list)
{
final int maxStartIndex = input.length() - size;
for (int i = 0; i < maxStartIndex; i++)
list.add(input.stubString(i, i + size));
}
public List<String> generateNGrams(final String input, final int minSize,
final int maxSize)
{
final List<String> ret = new ArrayList<>();
for (int size = minSize; size <= maxSize; size++)
addNgrams(size, input, ret);
return ret;
}
Note: lacks basic error checkings (for instance, maxSize greater than the size of input; minSize greater than maxSize; others); left as an exercise.

Here is a program that recursively generates nGrams: This code also handles the tail grams.
import java.util.ArrayList;
public class NGrams {
ArrayList<String> nGrams = new ArrayList<String>();
public void generateNGrams(String str, int n) {
if (str.length() == n ) {
int counter = 0;
while (counter < n) {
nGrams.add(str.substring(counter));
counter++;
}
return;
}
int counter = 0;
String gram = "";
while (counter < n) {
gram += str.charAt(counter);
counter++;
}
nGrams.add(gram);
generateNGrams(str.substring(1), n);
}
public void printNGrams() {
for (String str : nGrams) {
System.out.println(str);
}
}
public static void main(String[] args) {
NGrams ng = new NGrams();
ng.generateNGrams("hello world", 3);
ng.printNGrams();
}
}
Output:
hel
ell
llo
lo
o w
wo
wor
orl
rld
ld
d

Related

Constraining all combinations of an array-list

I know similar questions have been asked before but I have found the answers confusing. I am trying to make a program that will find every combination of an array-list with no repetitions and only of the maximum size. If the list has 4 items it should print out only the combinations with all 4 items present. This is what I have so far:
public main(){
UI.initialise();
UI.addButton("Test", this::testCreate);
UI.addButton("Quit", UI::quit);
}
public void createCombinations(ArrayList<String> list, String s, int depth) {
if (depth == 0) {
return;
}
depth --;
for (int i = 0; i < list.size(); i++) {
if (this.constrain(s + "_" + list.get(i), list.size())) {
UI.println(s + "_" + list.get(i));
}
createCombinations(list, s + "_" + list.get(i), depth);
}
}
public void testCreate() {
ArrayList<String> n = new ArrayList<String>();
n.add("A"); n.add("B"); n.add("C"); n.add("D");
this.createCombinations(n , "", n.size());
}
public boolean constrain(String s, int size) {
// Constrain to only the maximum length
if ((s.length() != size*2)) {
return false;
}
// Constrain to only combinations without repeats
Scanner scan = new Scanner(s).useDelimiter("_");
ArrayList<String> usedTokens = new ArrayList<String>();
String token;
while (scan.hasNext()) {
token = scan.next();
if (usedTokens.contains(token)) {
return false;
} else {
usedTokens.add(token);
}
}
// If we fully iterate over the loop then there are no repitions
return true;
}
public static void main(String[] args){
main obj = new main();
}
This prints out the following which is correct:
_A_B_C_D
_A_B_D_C
_A_C_B_D
_A_C_D_B
_A_D_B_C
_A_D_C_B
_B_A_C_D
_B_A_D_C
_B_C_A_D
_B_C_D_A
_B_D_A_C
_B_D_C_A
_C_A_B_D
_C_A_D_B
_C_B_A_D
_C_B_D_A
_C_D_A_B
_C_D_B_A
_D_A_B_C
_D_A_C_B
_D_B_A_C
_D_B_C_A
_D_C_A_B
_D_C_B_A
This works for small lists but is very inefficient for larger ones. I am aware that what I have done is completely wrong but I want to learn the correct way. Any help is really appreciated. Thanks in advance.
P.S. This is not homework, just for interest although I am a new CS student (if it wasn't obvious).
Implementing Heap's algorithm in Java:
import java.util.Arrays;
public class Main {
public static void swap(final Object[] array, final int index1, final int index2) {
final Object tmp = array[index1];
array[index1] = array[index2];
array[index2] = tmp;
}
public static void printPermutations_HeapsAlgorithm(final int n, final Object[] array) {
final int[] c = new int[n];
for (int i = 0; i < c.length; ++i)
c[i] = 0;
System.out.println(Arrays.toString(array)); //Consume first permutation.
int i=0;
while (i < n) {
if (c[i] < i) {
if ((i & 1) == 0)
swap(array, 0, i);
else
swap(array, c[i], i);
System.out.println(Arrays.toString(array)); //Consume permutation.
++c[i];
i=0;
}
else
c[i++] = 0;
}
}
public static void main(final String[] args) {
printPermutations_HeapsAlgorithm(4, new Character[]{'A', 'B', 'C', 'D'});
}
}
Possible duplicate of this.

Java Sorting Algorithms

I have a class called ThreeSorts.java
The aim is to generate a random arraylist of size n - this works.
Then display the array - this works.
Then I have to prove that these sorting algorithms work, however when I pass the random generated array into one of the sorts like SortA(a);
and then display the array it does not get sorted the output is the same:
Generated ArrayList : (153),(209),(167),(117),(243),(67),(0),(148),(39),(188),
SortA ArrayList : (153),(209),(167),(117),(243),(67),(0),(148),(39),(188),
ThreeSorts.java:
import java.util.*;
public class ThreeSorts
{
private static ArrayList<Integer> CopyArray(ArrayList<Integer> a)
{
ArrayList<Integer> resa = new ArrayList<Integer>(a.size());
for(int i=0;i<a.size();++i) resa.add(a.get(i));
return(resa);
}
public static ArrayList<Integer> SortA(ArrayList<Integer> a)
{
ArrayList<Integer> array = CopyArray(a);
int n = a.size(),i;
boolean noswaps = false;
while (noswaps == false)
{
noswaps = true;
for(i=0;i<n-1;++i)
{
if (array.get(i) < array.get(i+1))
{
Integer temp = array.get(i);
array.set(i,array.get(i+1));
array.set(i+1,temp);
noswaps = false;
}
}
}
return(array);
}
public static ArrayList<Integer> SortB(ArrayList<Integer> a)
{
ArrayList<Integer> array = CopyArray(a);
Integer[] zero = new Integer[a.size()];
Integer[] one = new Integer[a.size()];
int i,b;
Integer x,p;
//Change from 8 to 32 for whole integers - will run 4 times slower
for(b=0;b<8;++b)
{
int zc = 0;
int oc = 0;
for(i=0;i<array.size();++i)
{
x = array.get(i);
p = 1 << b;
if ((x & p) == 0)
{
zero[zc++] = array.get(i);
}
else
{
one[oc++] = array.get(i);
}
}
for(i=0;i<oc;++i) array.set(i,one[i]);
for(i=0;i<zc;++i) array.set(i+oc,zero[i]);
}
return(array);
}
public static ArrayList<Integer> SortC(ArrayList<Integer> a)
{
ArrayList<Integer> array = CopyArray(a);
SortC(array,0,array.size()-1);
return(array);
}
public static void SortC(ArrayList<Integer> array,int first,int last)
{
if (first < last)
{
int pivot = PivotList(array,first,last);
SortC(array,first,pivot-1);
SortC(array,pivot+1,last);
}
}
private static void Swap(ArrayList<Integer> array,int a,int b)
{
Integer temp = array.get(a);
array.set(a,array.get(b));
array.set(b,temp);
}
private static int PivotList(ArrayList<Integer> array,int first,int last)
{
Integer PivotValue = array.get(first);
int PivotPoint = first;
for(int index=first+1;index<=last;++index)
{
if (array.get(index) > PivotValue)
{
PivotPoint = PivotPoint+1;
Swap(array,PivotPoint,index);
}
}
Swap(array,first,PivotPoint);
return(PivotPoint);
}
/////////////My Code////////////////
public static ArrayList<Integer> randomArrayList(int n)
{
ArrayList<Integer> list = new ArrayList<>();
Random random = new Random();
for (int i = 0; i < n; i++)
{
list.add(random.nextInt(255));
}
return list;
}
private static void showArray(ArrayList<Integer> a) {
for (Iterator<Integer> iter = a.iterator(); iter.hasNext();)
{
Integer x = (Integer)iter.next();
System.out.print(("("+x + ")"));
System.out.print(",");
//System.out.print(GetAge());
//System.out.print(") ");
}
System.out.println();
}
static void test(int n) {
//int n = 13;
ArrayList<Integer> a = randomArrayList(n);
System.out.println("Generated ArrayList : ");
showArray(a);
System.out.println("SortA ArrayList : ");
SortA(a);
showArray(a);
}
}
Test is called in main like this ThreeSorts.test(10);
Why is it not getting sorted even tho the random array is passed and there are no errors?
In your sample code you are only testing SortA which reads:
ArrayList<Integer> array = CopyArray(a);
...
return(array);
so actually it is taking a copy of your array, sorting it, and then returning you the sorted array.
So when you test it, instead of using:
SortA(a);
you need to use
a = SortA(a);
The type of data your function SortA returns is ArrayList<Integer>, which means it returns an array list of integers. You need to change the line SortA(a); to a = SortA(a);: this way a variable will receive the results of this function's work.
You have to set the returned value, otherwise a will not be sorted in test method. Change the way you call SortA(a) to:
static void test(int n) {
//int n = 13;
ArrayList<Integer> a = randomArrayList(n);
System.out.println("Generated ArrayList : ");
showArray(a);
System.out.println("SortA ArrayList : ");
a = SortA(a);
showArray(a);
}
instead of the method SortA(a); just use Collections.sort(a); inside static block. Like
ArrayList<Integer> a = randomArrayList(n);
System.out.println("Generated ArrayList : ");
showArray(a);
System.out.println("SortA ArrayList : ");
Collections.sort(a);
showArray(a);
Thats it.

How to retrieve element from ArrayList containing long array

How to retrieve element from ArrayList<long[]>?
I wrote like this:
ArrayList<long []> dp=new ArrayList<>();
//m is no of rows in Arraylist
for(int i=0;i<m;i++){
dp.add(new long[n]); //n is length of each long array
//so I created array of m row n column
}
Now how to get each element?
every element in that list is an array... so you need to carefully add those by:
using anonymous arrays new long[] { 1L, 2L, 3L }
or especifying the size using the new keyword new long[5]
public static void main(String[] args) throws Exception {
ArrayList<long[]> dp = new ArrayList<>();
// add 3 arrays
for (int i = 0; i < 3; i++) {
dp.add(new long[] { 1L, 2L, 3L });
}
// add a new array of size 5
dp.add(new long[5]); //all are by defaul 0
// get the info from array
for (long[] ls : dp) {
for (long l : ls) {
System.out.println("long:" + l);
}
System.out.println("next element in the list");
}
}
You get the arrays the same way you get anything from an ArrayList. For example, to get the tenth long[] stored in the ArrayList, you'd use the get method:
long[] tenthArray = dp.get(9);
You could also have an ArrayList of objetcs that contain an array of longs inside. But the problem so far with your code is that you are not putting any values in each long array.
public class NewClass {
private static class MyObject {
private long []v;
public MyObject(int n) {
v = new long[n];
}
#Override
public String toString() {
String x = "";
for (int i = 0; i < v.length; i++) {
x += v[i] + " ";
}
return x;
}
}
public static void main(String[] args) {
ArrayList<MyObject> dp = new ArrayList();
int m = 3;
int n = 5;
for (int i = 0; i < m; i++) {
dp.add(new MyObject(n));
}
for (MyObject ls : dp) {
System.out.println(ls);
}
}
}

Altering the value of k in kNN algorithm - Java

I have applied the KNN algorithm for classifying handwritten digits. the digits are in vector format initially 8*8, and stretched to form a vector 1*64..
As it stands my code applies the kNN algorithm but only using k = 1. I'm not entirely sure how to alter the value k after attempting a couple of things I kept getting thrown errors. If anyone could help push me in the right direction it would be really appreciated. The training dataset can be found here and the validation set here.
ImageMatrix.java
import java.util.*;
public class ImageMatrix {
private int[] data;
private int classCode;
private int curData;
public ImageMatrix(int[] data, int classCode) {
assert data.length == 64; //maximum array length of 64
this.data = data;
this.classCode = classCode;
}
public String toString() {
return "Class Code: " + classCode + " Data :" + Arrays.toString(data) + "\n"; //outputs readable
}
public int[] getData() {
return data;
}
public int getClassCode() {
return classCode;
}
public int getCurData() {
return curData;
}
}
ImageMatrixDB.java
import java.util.*;
import java.io.*;
import java.util.ArrayList;
public class ImageMatrixDB implements Iterable<ImageMatrix> {
private List<ImageMatrix> list = new ArrayList<ImageMatrix>();
public ImageMatrixDB load(String f) throws IOException {
try (
FileReader fr = new FileReader(f);
BufferedReader br = new BufferedReader(fr)) {
String line = null;
while((line = br.readLine()) != null) {
int lastComma = line.lastIndexOf(',');
int classCode = Integer.parseInt(line.substring(1 + lastComma));
int[] data = Arrays.stream(line.substring(0, lastComma).split(","))
.mapToInt(Integer::parseInt)
.toArray();
ImageMatrix matrix = new ImageMatrix(data, classCode); // Classcode->100% when 0 -> 0% when 1 - 9..
list.add(matrix);
}
}
return this;
}
public void printResults(){ //output results
for(ImageMatrix matrix: list){
System.out.println(matrix);
}
}
public Iterator<ImageMatrix> iterator() {
return this.list.iterator();
}
/// kNN implementation ///
public static int distance(int[] a, int[] b) {
int sum = 0;
for(int i = 0; i < a.length; i++) {
sum += (a[i] - b[i]) * (a[i] - b[i]);
}
return (int)Math.sqrt(sum);
}
public static int classify(ImageMatrixDB trainingSet, int[] curData) {
int label = 0, bestDistance = Integer.MAX_VALUE;
for(ImageMatrix matrix: trainingSet) {
int dist = distance(matrix.getData(), curData);
if(dist < bestDistance) {
bestDistance = dist;
label = matrix.getClassCode();
}
}
return label;
}
public int size() {
return list.size(); //returns size of the list
}
public static void main(String[] argv) throws IOException {
ImageMatrixDB trainingSet = new ImageMatrixDB();
ImageMatrixDB validationSet = new ImageMatrixDB();
trainingSet.load("cw2DataSet1.csv");
validationSet.load("cw2DataSet2.csv");
int numCorrect = 0;
for(ImageMatrix matrix:validationSet) {
if(classify(trainingSet, matrix.getData()) == matrix.getClassCode()) numCorrect++;
} //285 correct
System.out.println("Accuracy: " + (double)numCorrect / validationSet.size() * 100 + "%");
System.out.println();
}
In the for loop of classify you are trying to find the training example that is closest to a test point. You need to switch that with a code that finds K of the training points that is the closest to the test data. Then you should call getClassCode for each of those K points and find the majority(i.e. the most frequent) of the class codes among them. classify will then return the major class code you found.
You may break the ties (i.e. having 2+ most frequent class codes assigned to equal number of training data) in any way that suits your need.
I am really inexperienced in Java, but just by looking around the language reference, I came up with the implementation below.
public static int classify(ImageMatrixDB trainingSet, int[] curData, int k) {
int label = 0, bestDistance = Integer.MAX_VALUE;
int[][] distances = new int[trainingSet.size()][2];
int i=0;
// Place distances in an array to be sorted
for(ImageMatrix matrix: trainingSet) {
distances[i][0] = distance(matrix.getData(), curData);
distances[i][1] = matrix.getClassCode();
i++;
}
Arrays.sort(distances, (int[] lhs, int[] rhs) -> lhs[0]-rhs[0]);
// Find frequencies of each class code
i = 0;
Map<Integer,Integer> majorityMap;
majorityMap = new HashMap<Integer,Integer>();
while(i < k) {
if( majorityMap.containsKey( distances[i][1] ) ) {
int currentValue = majorityMap.get(distances[i][1]);
majorityMap.put(distances[i][1], currentValue + 1);
}
else {
majorityMap.put(distances[i][1], 1);
}
++i;
}
// Find the class code with the highest frequency
int maxVal = -1;
for (Entry<Integer, Integer> entry: majorityMap.entrySet()) {
int entryVal = entry.getValue();
if(entryVal > maxVal) {
maxVal = entryVal;
label = entry.getKey();
}
}
return label;
}
All you need to do is adding K as a parameter. Keep in mind, however, that the code above does not handle ties in a particular way.

Word-Puzzle (ArrayList problems)

So I have been creating a Word-Puzzle which I recently got stuck on a index out of bounds problem. This has been resolved however the program is not doing what I would like it to do. The Idea i that the test class will print 3 words in an array e.g. [FEN, GNU, NOB] (and yes they are apparently real english words). Then check to see if the first letter of each word combined is a word and so forth e.g. FGN if so add it to the next ArrayList else start again. Ideal output would be [FEN, GNU, NOB] [FGN, ENO, NUB] for example. However the current output is [FEN, GNU, NOB] [SOY, SOY, SOY] or [FEN, GNU, NOB] [].
The Test Class
public class Test_WordPuzzleGenerator {
public static void main(String[] args) throws FileNotFoundException {
System.out.println("Test 1: size 3");
int size = 3;
Puzzle.WordPuzzleGenerator.generatePuzzle(size);
}
}
WordGenerator:
public class WordPuzzleGenerator {
static ArrayList<String> wordList = new ArrayList<String>();
public static void generatePuzzle(int size) throws FileNotFoundException {
ArrayList<String> puzzleListY = new ArrayList<String>();
ArrayList<String> puzzleListX = new ArrayList<String>();
String randomXWord;
String letterSize = "" + size;
makeLetterWordList(letterSize);
boolean finished = false;
while ( !finished ) {
finished = true;
puzzleListX.clear();
puzzleListY.clear();
for (int i = 0; i < size; i++) {
int randomYWord = randomInteger(wordList.size());
String item = wordList.get(randomYWord);
puzzleListY.add(item);
}
for (int i = 0; i < puzzleListY.size(); i++) {
StringBuilder sb = new StringBuilder();
for (int j = 0; j < puzzleListY.size(); j++) {
sb.append(puzzleListY.get(j).charAt(i));
}
randomXWord = sb.toString();
if (!wordList.contains(randomXWord)) {
break;
}
puzzleListX.add(randomXWord);
if (puzzleListX.size() == size){
finished = false;
}
}
}
System.out.print(puzzleListY);
System.out.print(puzzleListX);
}
public static int randomInteger(int size) {
Random rand = new Random();
int randomNum = rand.nextInt(size);
return randomNum;
}
public static void makeLetterWordList(String letterSize) throws FileNotFoundException {
Scanner letterScanner = new Scanner( new File (letterSize + "LetterWords.txt"));
wordList.clear();
while (letterScanner.hasNext()){
wordList.add(letterScanner.next());
}
letterScanner.close();
}
}
I think you are confusing yourself with that finished variable. Replace the while condition with puzzleListX.size() != size and your code should work.

Categories