Why is this variable in the trie data structure insertion method?

I'm using a trie data structure in my spell checker program. I found a method online, insertWord, that inserts words from a text file into the trie, but I'm confused as to why the variable offset is used. Why does it subtract an integer from letters[i], an element of the char array? The program runs as it should. I'm just looking to understand the code a little more. Any help would be appreciated!
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class spellChecker {
static TrieNode createTree()
{
return(new TrieNode('\0', false));
}
static void insertWord(TrieNode root, String word)
{
int offset = 97;
int l = word.length();
char[] letters = word.toCharArray();
TrieNode curNode = root;
for (int i = 0; i < l; i++)
{
if (curNode.links[letters[i]-offset] == null)
curNode.links[letters[i]-offset] = new TrieNode(letters[i], i == l-1 ? true : false);
curNode = curNode.links[letters[i]-offset];
}
}
static boolean find(TrieNode root, String word)
{
char[] letters = word.toCharArray();
int l = letters.length;
int offset = 97;
TrieNode curNode = root;
int i;
for (i = 0; i < l; i++)
{
if (curNode == null)
return false;
curNode = curNode.links[letters[i]-offset];
}
if (i == l && curNode == null)
return false;
if (curNode != null && !curNode.fullWord)
return false;
return true;
}
private static String[] dictionaryArray;
public String[] dictionaryRead() throws Exception
{
// Find and read the file into array
String token = "";
// Use scanner for input file
Scanner dictionaryScan = new Scanner(new File("dictionary2.txt")).useDelimiter("\\s+");
List<String> dictionary = new ArrayList<String>();
//Check for next line in text file
while (dictionaryScan.hasNext())
{
token = dictionaryScan.next();
dictionary.add(token);
}
dictionaryScan.close();
dictionaryArray = dictionary.toArray(new String[0]);
return dictionaryArray;
}
public static void main(String[] args) throws Exception
{
spellChecker spellcheck = new spellChecker();
spellcheck.dictionaryRead();
TrieNode tree = createTree();
for (int i = 0; i < dictionaryArray.length; i++)
insertWord(tree, dictionaryArray[i]);
Scanner inputFileScan = new Scanner(new File("test.txt")).useDelimiter("\\s+");
//Check for next line in text file,
//then write arraylist to trie data structure
boolean mispelled = false;
while (inputFileScan.hasNext())
{
String word = inputFileScan.next();
if (!find(tree, word))
{
System.out.println("Mispelled word: " + word);
mispelled = true;
}
}
inputFileScan.close();
if(mispelled == false)
{
System.out.println("There are no errors.");
}
}
}
class TrieNode
{
char letter;
TrieNode[] links;
boolean fullWord;
TrieNode(char letter, boolean fullWord)
{
this.letter = letter;
links = new TrieNode[100];
this.fullWord = fullWord;
}
}

97 is the ASCII code of 'a'. The variable offset is subtracted so that 'a' is treated as the first letter, at index 0 (just as it is first in the alphabet).

97 is the numeric value of the character 'a'. When you wish to get the index of the links array that corresponds with the character letters[i], you have to subtract 97 from that character, so that 'a' is mapped to the index 0, 'b' is mapped to 1, ..., 'z' is mapped to 25.
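The mapping described above can be sketched as follows. This is a minimal illustration, not code from the question: the class name OffsetDemo and the helper indexOf are hypothetical, and the range check that returns -1 is an added assumption, since the original insertWord performs no such check.

```java
class OffsetDemo {
    // Maps a lowercase ASCII letter to its slot in a 26-element child array.
    // Returns -1 for any character outside 'a'..'z'
    // (hypothetical guard, not present in the original code).
    static int indexOf(char c) {
        if (c < 'a' || c > 'z') {
            return -1;
        }
        return c - 'a'; // identical to c - 97, because 'a' has the numeric value 97
    }

    public static void main(String[] args) {
        System.out.println((int) 'a');    // 97
        System.out.println(indexOf('a')); // 0
        System.out.println(indexOf('b')); // 1
        System.out.println(indexOf('z')); // 25
        System.out.println(indexOf('A')); // -1, uppercase is outside the range
    }
}
```

Writing the offset as 'a' rather than 97 gives the same result and makes the intent easier to read.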

Related

When using a trie I get an unusual error that shouldn't be possible

I am working on a simple spell checker which loads its dictionary from a text file and then uses a trie to check whether any given word is spelled correctly.
Code
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
public class Spellchecker {
static final int ALPHABET_SIZE = 26;
static class node {
node[] children = new node[ALPHABET_SIZE];
boolean isEndOfWord;
node() {
isEndOfWord = false;
for (int i = 0; i < ALPHABET_SIZE; i++)
children[i] = null;
}
}
static node root;
static void insert(String key) {
int length = key.length();
node pCrawl = root;
for (int level = 0; level < length; level++) {
int index = key.charAt(level) - 'a';
if (pCrawl.children[index] == null)
pCrawl.children[index] = new node();
pCrawl = pCrawl.children[index];
}
// mark last node as leaf
pCrawl.isEndOfWord = true;
}
static int wordCount (node root){
int result = 0;
if (root.isEndOfWord){
result++;
}
for (int i = 0; i < ALPHABET_SIZE; i ++){
if (root.children[i]!=null){
result += wordCount(root.children[i]);
}
}
return result;
}
// Returns true if key presents in trie, else false
static boolean search(String key) {
int length = key.length();
node pCrawl = root;
for (int level = 0; level < length; level++) {
int index = key.charAt(level) - 'a';
if (pCrawl.children[index] == null)
return false;
pCrawl = pCrawl.children[index];
}
return (pCrawl != null && pCrawl.isEndOfWord);
}
public static void main(String args[]) throws IOException {
ArrayList<String> dictionary = new ArrayList<>();
ArrayList<String> text = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader("dictionary.txt"))) {
while (br.ready()) {
dictionary.add(br.readLine());
}
} catch (Exception e) {
e.printStackTrace();
}
root = new node();
int i;
for (i = 0; i < dictionary.size(); i++)
insert(dictionary.get(i));
for (int j = 0; j < text.size(); j++) {
if (!search(text.get(j))) {
System.out.println("not in dictionary: " + text.get(j));
}
}
}
}
The error I get is very confusing:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index -65 out of bounds for length 26
at ASS1.Spellchecker.search(Spellchecker.java:67)
at ASS1.Spellchecker.main(Spellchecker.java:126)
I have absolutely no idea how it could possibly be -65. Any help would be greatly appreciated.

Based on your code and the error, your search key most likely contains a space character. The ASCII code of a space is 32, so during the search:
int index = key.charAt(level) - 'a';
//key.charAt(level) = space = 32
//index = 32 - 'a' = 32 - 97 = -65 //invalid index
The best fix, if your input is not restricted to lowercase English letters, is to hold the children in a HashMap instead of an array.
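A minimal sketch of that suggestion, assuming each node keeps its children in a HashMap keyed by character. The class name MapTrie and its nested Node are hypothetical; insert, search and isEndOfWord follow the names used in the question. With a map there is no index arithmetic, so a space or an uppercase letter cannot produce a negative index.

```java
import java.util.HashMap;
import java.util.Map;

class MapTrie {
    private static final class Node {
        // Children are looked up by character, so any char is a valid key.
        final Map<Character, Node> children = new HashMap<>();
        boolean isEndOfWord;
    }

    private final Node root = new Node();

    void insert(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            // Create the child on first use, then step into it.
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        node.isEndOfWord = true;
    }

    boolean search(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.get(key.charAt(i));
            if (node == null) {
                return false; // no path for this character
            }
        }
        return node.isEndOfWord;
    }

    public static void main(String[] args) {
        MapTrie trie = new MapTrie();
        trie.insert("hello world");                    // contains a space
        System.out.println(trie.search("hello world")); // true
        System.out.println(trie.search("hello"));       // false, only a prefix
        System.out.println(trie.search("Hello"));       // false, no exception
    }
}
```

If the dictionary really should contain only lowercase letters, an alternative is to keep the array and trim or reject keys that contain other characters before calling search.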

How to refactor code to use only loops and simple arrays?

I wrote this code and it works, but I need to refactor it. I may only use simple constructs to solve the problem, for example "for" loops and simple arrays.
public class Anagram {
public static void main(String[] args) throws IOException {
Anagram anagrama = new Anagram();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));) {
System.out.println("Enter word or phrase: ");
String userText = reader.readLine();
String resultAnagrama = anagrama.makeAnagram(userText);
System.out.println("Result of Anagrama : " + resultAnagrama);
}
}
This method takes the user's text and makes an anagram, but all non-letters should stay in the same places
/**
* @param text
* @return reversed text in which all non-letter symbols stay in the same places
*/
public String makeAnagram(String text) {
HashMap<Integer, Character> mapNonLetters;
String[] textFragments = text.split(" ");
StringBuilder stringBuilder = new StringBuilder();
//Check each elements of array for availability symbols and make reverse of elements
for (int i = 0; i < textFragments.length; i++) {
char[] arrayCharacters = textFragments[i].toCharArray();
mapNonLetters = saerchNonLetters(arrayCharacters); // search symbols
StringBuilder builderAnagramString = new StringBuilder(textFragments[i]);
//Delete all non-letters from element of array
int reindexing = 0;
for (HashMap.Entry<Integer, Character> entry : mapNonLetters.entrySet()) {
int key = entry.getKey();
builderAnagramString.deleteCharAt(key - reindexing);
reindexing ++;
}
builderAnagramString.reverse();
//Insert all non-letters in the same places where ones stood
for (HashMap.Entry<Integer, Character> entry : mapNonLetters.entrySet()) {
int key = entry.getKey();
char value = entry.getValue();
builderAnagramString.insert(key, value);
}
textFragments[i] = builderAnagramString.toString();
stringBuilder.append(textFragments[i]);
if (i != (textFragments.length - 1)) {
stringBuilder.append(" ");
}
mapNonLetters.clear();
}
return stringBuilder.toString();
}
This method searches for all non-letters in each word of the user's text
/**
* Method searches for symbols
* @param arrayCharacters
* @return HashMap with symbols found in the elements of the array
*/
public HashMap<Integer, Character> saerchNonLetters(char[] arrayCharacters) {
HashMap<Integer, Character> mapFoundNonLetters = new HashMap<Integer, Character>();
for (int j = 0; j < arrayCharacters.length; j++) {
//Letters lay in scope 65-90 (A-Z) and 97-122 (a-z) therefore other value is non-letter
if (arrayCharacters[j] < 65 || (arrayCharacters[j] > 90 && arrayCharacters[j] < 97) ||
arrayCharacters[j] > 122) {
mapFoundNonLetters.put(j, arrayCharacters[j]);
}
}
return mapFoundNonLetters;
}
}

public class Anagram {
public static void main(String[] args) {
String text = "!Hello123 ";
char[] chars = text.toCharArray();
int left = 0;
int right = text.length() - 1;
while (left < right) {
boolean isLeftLetter = Character.isLetter(chars[left]);
boolean isRightLetter = Character.isLetter(chars[right]);
if (isLeftLetter && isRightLetter) {
swap(chars, left, right);
left++;
right--;
} else {
if (!isLeftLetter) {
left++;
}
if (!isRightLetter) {
right--;
}
}
}
String anagram = new String(chars);
System.out.println(anagram);
}
private static void swap(char[] chars, int index1, int index2) {
char c = chars[index1];
chars[index1] = chars[index2];
chars[index2] = c;
}
}

If I understand correctly and you need only 1 anagram, this should work:
String originalString = "This is 1 sentence with 2 numbers!";
System.out.println("original: "+originalString);
// make a mask to keep track of where the non letters are
char[] mask = originalString.toCharArray();
for(int i=0; i<mask.length; i++)
mask[i] = Character.isLetter(mask[i]) ? '.' : mask[i];
System.out.println("mask: "+ new String(mask));
// remove non letters from the string
StringBuilder sb = new StringBuilder();
for(int i=0; i< originalString.length(); i++) {
if(mask[i] == '.')
sb.append(originalString.charAt(i));
}
// find an anagram
String lettersOnlyAnagram = sb.reverse().toString();
// reinsert the non letters at their place
int letterIndex = 0;
for(int i=0; i<mask.length; i++) {
if(mask[i] == '.') {
mask[i] = lettersOnlyAnagram.charAt(letterIndex);
letterIndex++;
}
}
String anagram = new String(mask);
System.out.println("anagram: "+ anagram);
It prints out:
original: This is 1 sentence with 2 numbers!
mask: .... .. 1 ........ .... 2 .......!
anagram: sreb mu 1 nhtiwecn etne 2 ssisihT!

Tries - Contacts - Hackerrank

I am currently trying to solve this challenge on hackerrank Tries - Contacts
My algorithm fails for only one test case, test case #1. Can anyone share any insight into what I need to change in order to pass this test case? I am using a TrieNode class that contains a hashmap of its children nodes. I also store the size of each node to denote how many words it contains.
Test case #1 is as follows:
add s
add ss
add sss
add ssss
add sssss
find s
find ss
find sss
find ssss
find sssss
find ssssss
The code is as follows:
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class Solution {
TrieNode root;
class TrieNode{
Map<Character, TrieNode> children = new HashMap<Character, TrieNode>();
int size=0;
}
public Solution(){
root = new TrieNode();
}
public void addWord(String word){
TrieNode current = root;
for(int i=0;i<word.length();i++){
char c = word.charAt(i);
if(!current.children.containsKey(c)){
//create a new node
TrieNode temp = new TrieNode();
//add the word to the current node's children
current.children.put(c, temp);
current.size++;
current = temp;
}
else{
current.size++;
current = current.children.get(c);
}
}
}
public void prefixSearch(String letters){
TrieNode current = root;
boolean sequenceExists = true;
for(int i=0; i<letters.length();i++){
char c = letters.charAt(i);
if(current.children.containsKey(c)){
if(i == letters.length()-1){
System.out.println(current.size);
break;
}
else{
current = current.children.get(c);
}
}
else{
System.out.println(0);
break;
}
}
}
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
int n = in.nextInt();
Solution sol = new Solution();
for(int a0 = 0; a0 < n; a0++){
String op = in.next();
String contact = in.next();
if(op.equals("add")){
if(contact.length() >=1 && contact.length() <=21)
sol.addWord(contact);
}
else if(op.equals("find")){
if(contact.length() >=1 && contact.length() <=21)
sol.prefixSearch(contact);
}
else{
//do nothing
}
}
}
}

When you add a word to your trie, you increment the count for every node on its path except the last one. This is a common and hard-to-notice kind of error called an off-by-one error: https://en.wikipedia.org/wiki/Off-by-one_error
Add this line once more at the end of the addWord method (after the loop):
current.size++;
Your code passed test case 0 because this particular bug doesn't show up when you look up a prefix like hac-kerrank, but it does show up when you look up a complete word including the last character, like hackerrank or sssss.

I have this solution; all test cases except 0, 1 and 5 time out. Here is my implementation in Java 8. Where should I improve my code to pass all the test cases?
public class Contacts {
static Map<String, String> contactMap = new HashMap<>();
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
int n = in.nextInt();
for(int a0 = 0; a0 < n; a0++){
String op = in.next();
String contact = in.next();
if(op.equalsIgnoreCase("add")) {
addOrFind(contact, op);
} else {
addOrFind(contact, op);
}
}
}
public static void addOrFind(String name, String type) {
if(type.equalsIgnoreCase("add")) {
contactMap.put(name, name);
} else {
long count = contactMap.entrySet().stream()
.filter(p->p.getKey().contains(name)).count();
System.out.println(count);
}
}
}

If you check out the solution linked in the code comment below
And also use their test case of:
4
add hack
add hackerrank
find hac
find hak
It will compile.
// from his website at https://github.com/RodneyShag/HackerRank_solutions/blob/master/Data%20Structures/Trie/Contacts/Solution.java
import java.util.Scanner;
import java.util.HashMap;
public class Solution {
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
Trie trie = new Trie();
for (int i = 0; i < n; i++) {
String operation = scan.next();
String contact = scan.next();
if (operation.equals("add")) {
trie.add(contact);
} else if (operation.equals("find")) {
System.out.println(trie.find(contact));
}
}
scan.close();
}
}
/* Based loosely on tutorial video in this problem */
class TrieNode {
private HashMap<Character, TrieNode> children = new HashMap<>();
public int size = 0; // this was the main trick to decrease runtime to pass tests.
public void putChildIfAbsent(char ch) {
children.putIfAbsent(ch, new TrieNode());
}
public TrieNode getChild(char ch) {
return children.get(ch);
}
}
class Trie {
TrieNode root = new TrieNode();
Trie(){} // default constructor
Trie(String[] words) {
for (String word : words) {
add(word);
}
}
public void add(String str) {
TrieNode curr = root;
for (int i = 0; i < str.length(); i++) {
Character ch = str.charAt(i);
curr.putChildIfAbsent(ch);
curr = curr.getChild(ch);
curr.size++;
}
}
public int find(String prefix) {
TrieNode curr = root;
/* Traverse down tree to end of our prefix */
for (int i = 0; i < prefix.length(); i++) {
Character ch = prefix.charAt(i);
curr = curr.getChild(ch);
if (curr == null) {
return 0;
}
}
return curr.size;
}
}

Recursively creating strings from 2 dimensional character array

I need to create all possible strings from a 2d-array so that the first character comes from charArray[0], the second character comes from charArray[1]...and the final character comes from the charArray[keyLength-1].
Example:
input:
char[][] charArray =
{{'m','M','L','S','X'},
{'e','E','o','N','Z'},
{'o','G','F','r','Y'},
{'D','H','I','J','w'}};
output:
{meoD, meoH, meoI,..., XZYJ, XZYw} //in an Array or ArrayList
I had a working solution that built a tree with each character in charArray[0] as a root and did a depth-first string construction, but the JVM ran out of memory for charArray lengths less than 12. I would normally take an iterative approach, but the charArray length (i.e. key string length) is decided at runtime, and I would like to find a more complete solution than writing a switch statement on the key string length and manually writing out loops for a finite number of key string lengths.
I've been stuck on this small section of my program for longer than I'd like to admit, so any help would be greatly appreciated!

Here is how it can be solved:
import java.util.ArrayList;
import java.util.List;
public class Arrays2D {
public static void main(String[] args) {
//input keys
String[][] charArray =
{{"m","M","L","S","X"},
{"e","E","o","N","Z"},
{"o","G","F","r","Y"},
{"D","H","I","J","w"}};
//print output
System.out.println(findCombinations(charArray));
}
private static List<String> findCombinations(String[][] charArray) {
List<String> prev = null;
for (int i = 0; i < charArray.length; i++) {
List<String> curr = new ArrayList<String>();
for (int j = 0; j < charArray[i].length; j++) {
if (i + 1 < charArray.length) {
for (int l = 0; l < charArray[i+1].length; l++) {
String s = charArray[i][j] + charArray[i + 1][l];
curr.add(s);
}
}
}
if (prev != null && !curr.isEmpty()) {
prev = join(prev, curr);
}
if (prev == null)
prev = curr;
}
return prev;
}
public static List<String> join(List<String> p, List<String> q) {
List<String> join = new ArrayList<String>();
for (String st1 : p) {
for (String st2 : q) {
if (st1.substring(st1.length() - 1).equals(st2.substring(0, 1))) {
String s = st1 + st2;
s = s.replaceFirst(st1.substring(st1.length() - 1), "");
join.add(s);
}
}
}
return join;
}
}
I have checked that it generates the combinations correctly. You can run it and see the output.
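For comparison, here is a sketch of a plain recursive approach, which is a different technique from the pairwise join above. It assumes the input is a char[][] as in the question; the class name Combinations and the helper build are hypothetical. One reusable char buffer holds the string under construction, so the recursion itself needs only memory proportional to the key length.

```java
import java.util.ArrayList;
import java.util.List;

class Combinations {
    // Returns every string whose i-th character is taken from rows[i].
    static List<String> combinations(char[][] rows) {
        List<String> out = new ArrayList<>();
        if (rows.length > 0) {
            build(rows, 0, new char[rows.length], out);
        }
        return out;
    }

    // Fills position 'depth' with each candidate, then recurses to the next position.
    private static void build(char[][] rows, int depth, char[] current, List<String> out) {
        if (depth == rows.length) {
            out.add(new String(current)); // all positions filled: record one result
            return;
        }
        for (char c : rows[depth]) {
            current[depth] = c;
            build(rows, depth + 1, current, out);
        }
    }

    public static void main(String[] args) {
        char[][] charArray = {
            {'m', 'M', 'L', 'S', 'X'},
            {'e', 'E', 'o', 'N', 'Z'},
            {'o', 'G', 'F', 'r', 'Y'},
            {'D', 'H', 'I', 'J', 'w'}
        };
        List<String> result = combinations(charArray);
        System.out.println(result.size());                 // 625
        System.out.println(result.get(0));                 // meoD
        System.out.println(result.get(result.size() - 1)); // XZYw
    }
}
```

Collecting the results still costs memory proportional to their number, and with 5 choices per position a key of length 12 yields 244,140,625 strings. For long keys it is probably better to process each string at the point where it is added to the list instead of storing them all.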

counting unique words in a string without using an array

So my task is to write a program that counts the number of words and unique words in a given string that we get from the user without using arrays.
I can do the first task and was wondering how I could go about doing the second part.
For counting the number of words in the string I have
boolean increment = false;
for (int i = 0; i < inputPhrase.length(); i++){
if(validChar(inputPhrase.charAt(i))) { //validChar(char c) is a simple method that returns true for a valid word character
increment = true;
}
else if(increment){
phraseWordCount ++;
increment = false;
}
}
if(increment) phraseWordCount++; //in the case the last word is a valid character
(Originally I left this out and was off by one word.)
Can I somehow modify this to count unique words?

Here is a suggestion for how to do it without arrays:
1) Read every char until a blank is found and append each char to a second String.
2) If a blank is found, append it (or another token to separate words) to the second String.
2a) Read every word from the second String, comparing it to the current word from the input String.
public static void main(String[] args) {
final String input = "This is a sentence that is containing three times the word is";
final char token = '#';
String processedInput = "";
String currentWord = "";
int wordCount = 0;
int uniqueWordCount = 0;
for (char c : input.toCharArray()) {
if (c != ' ') {
processedInput += c;
currentWord += c;
} else {
processedInput += token;
wordCount++;
String existingWord = "";
int occurences = 0;
for (char c1 : processedInput.toCharArray()) {
if (c1 != token) {
existingWord += c1;
} else {
if (existingWord.equals(currentWord)) {
occurences++;
}
existingWord = "";
}
}
if (occurences <= 1) {
System.out.printf("New word: %s\n", currentWord);
uniqueWordCount++;
}
currentWord = "";
}
}
wordCount++;
System.out.printf("%d words total, %d unique\n", wordCount, uniqueWordCount);
}
Output
New word: This
New word: is
New word: a
New word: sentence
New word: that
New word: containing
New word: three
New word: times
New word: the
New word: word
12 words total, 10 unique

Using the Collections API you can count words with the following method:
private int countWords(final String text) {
Scanner scanner = new Scanner(text);
Set<String> uniqueWords = new HashSet<String>();
while (scanner.hasNext()) {
uniqueWords.add(scanner.next());
}
scanner.close();
return uniqueWords.size();
}
If it is possible that you get normal sentences with punctuation marks you can change the second line to:
Scanner scanner = new Scanner(text.replaceAll("[^0-9a-zA-Z\\s]", "").toLowerCase());
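Put together, the two snippets above might look like this. It is a sketch only: the class name UniqueWordCount is hypothetical, and countWords is made static so it can be called from main. It strips everything except ASCII letters, digits and whitespace, then lower-cases the text, so "If", "if," and "IF!" count as the same word.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class UniqueWordCount {
    // Counts distinct words after removing punctuation and ignoring case.
    static int countWords(final String text) {
        String normalized = text.replaceAll("[^0-9a-zA-Z\\s]", "").toLowerCase();
        Scanner scanner = new Scanner(normalized);
        Set<String> uniqueWords = new HashSet<String>();
        while (scanner.hasNext()) {
            uniqueWords.add(scanner.next()); // a Set keeps one copy of each word
        }
        scanner.close();
        return uniqueWords.size();
    }

    public static void main(String[] args) {
        System.out.println(countWords("If if, IF!"));             // 1
        System.out.println(countWords("one two ones two three")); // 4
    }
}
```

Note that the character class also removes non-ASCII letters, so accented words would be altered by this normalization.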

Every time a word ends findUpTo checks if the word is contained in the input before the start of that word. So "if if if" would count as one unique and three total words.
/**
* Created for http://stackoverflow.com/q/22981210/1266906
*/
public class UniqueWords {
public static void main(String[] args) {
String inputPhrase = "one two ones two three one";
countWords(inputPhrase);
}
private static void countWords(String inputPhrase) {
boolean increment = false;
int wordStart = -1;
int phraseWordCount = 0;
int uniqueWordCount = 0;
for (int i = 0; i < inputPhrase.length(); i++){
if(validChar(inputPhrase.charAt(i))) { //validChar(char c) is a simple method that returns a valid character{
increment = true;
if(wordStart == -1) {
wordStart = i;
}
} else if(increment) {
phraseWordCount++;
final String lastWord = inputPhrase.substring(wordStart, i);
boolean unique = findUpTo(lastWord, inputPhrase, wordStart);
if(unique) {
uniqueWordCount++;
}
increment = false;
wordStart = -1;
}
}
if(increment) {
phraseWordCount++; //in the case the last word is a valid character
final String lastWord = inputPhrase.substring(wordStart, inputPhrase.length());
boolean unique = findUpTo(lastWord, inputPhrase, wordStart);
if(unique) {
uniqueWordCount++;
}
}
System.out.println("Words: "+phraseWordCount);
System.out.println("Unique: "+uniqueWordCount);
}
private static boolean findUpTo(String needle, String haystack, int lastPos) {
boolean previousValid = false;
boolean unique = true;
for(int j = 0; unique && j < lastPos - needle.length(); j++) {
final boolean nextValid = validChar(haystack.charAt(j));
if(!previousValid && nextValid) {
// Word start
previousValid = true;
for (int k = 0; k < lastPos - j; k++) {
if(k == needle.length()) {
// We matched all characters. Only if the word isn't finished it is unique
unique = validChar(haystack.charAt(j+k));
break;
}
if (needle.charAt(k) != haystack.charAt(j+k)) {
break;
}
}
} else {
previousValid = nextValid;
}
}
return unique;
}
private static boolean validChar(char c) {
return Character.isAlphabetic(c);
}
}
