Replacing pronouns throughout a String - java

I'm working on a project where I want to be able to parse some text and find nouns, and a lot of the text I want to parse has pronouns in it, for example: "Emma the parrot was a bird. She lived in a tall tree".
I don't want to work with "She" etc., as they aren't seen as nouns in the dictionary I'm working with, so I've been working on a method to replace "She" etc. with the previous occurrence of a name. The above example would then become: "Emma the parrot was a bird. Emma lived in a tall tree".
The method works fine on a small sample, but when there are 3-4 different people in one text it doesn't work.
public static String replacePronouns(String text, ArrayList<String> dictionary) {
    String[] strArray = text.replaceAll("\\.", " .").replaceAll("\\,", "").split("\\s+");
    String previousName = "";
    for(int i = 0; i < strArray.length; i++ ) {
        //we'll have to set this to be more dynamic -> change to pronouns in dictionary
        if(strArray[i].equals("His") || strArray[i].equals("She") || strArray[i].equals("she") || strArray[i].equals("him") || strArray[i].equals("he") || strArray[i].equals("her")) {
            for(int j = (i-1); j>=0; j--) {
                int count = dictionary.size()-1;
                boolean flag = false;
                while(count>=0 && flag==false) {
                    if(strArray[j].equals(dictionary.get(count).split(": ")[1]) && dictionary.get(count).split(": ")[0].equals("Name")) {
                        previousName = strArray[j];
                        flag = true; }
                    count--;
                } }
            strArray[i] = previousName; } }
    return Arrays.toString(strArray).replaceAll("\\[", "").replaceAll("\\,", "").replaceAll("\\]", "");
}
It takes in my text:
String text = "Karla was a bird and she had beautifully colorful feathers. She lived in a tall tree.";
And a "dictionary":
ArrayList<String> dictionary = new ArrayList<>();
dictionary.add("Name: hunter");
dictionary.add("Name: Karla");
dictionary.add("Noun: hawk");
dictionary.add("Noun: feathers");
dictionary.add("Noun: tree");
dictionary.add("Noun: arrows");
dictionary.add("Verb: was a");
dictionary.add("Verb: had");
dictionary.add("Verb: missed");
dictionary.add("Verb: knew");
dictionary.add("Verb: offered");
dictionary.add("Verb: pledged");
dictionary.add("Verb: shoot");
But it always outputs Karla in this example, even if the same string also contained "The hunter shot his gun".
Any help on why this isn't working would be appreciated.

This isn't working because you keep looping over j even after you've found a match in the dictionary. That is, you keep looking back towards the beginning of the string and eventually find "Karla", even though you've already matched "hunter".
There are many ways you could fix this. One very simple one is to move boolean flag = false; up to before the for loop over j, and change the loop condition from j >= 0 to j >= 0 && !flag, so that you stop looping as soon as flag is true. Like so:
public static String replacePronouns(String text, ArrayList<String> dictionary) {
    String[] strArray = text.replaceAll("\\.", " .").replaceAll("\\,", "").split("\\s+");
    String previousName = "";
    for (int i = 0; i < strArray.length; i++) {
        boolean flag = false;
        // we'll have to set this to be more dynamic -> change to pronouns in dictionary
        if (strArray[i].equals("His") || strArray[i].equals("She") || strArray[i].equals("she") || strArray[i].equals("him") || strArray[i].equals("he") || strArray[i].equals("her")) {
            // stop scanning backwards as soon as a name has been found
            for (int j = (i - 1); j >= 0 && !flag; j--) {
                int count = dictionary.size() - 1;
                while (count >= 0) {
                    if (strArray[j].equals(dictionary.get(count).split(": ")[1]) && dictionary.get(count).split(": ")[0].equals("Name")) {
                        previousName = strArray[j];
                        flag = true;
                    }
                    count--;
                }
            }
            strArray[i] = previousName;
        }
    }
    return Arrays.toString(strArray).replaceAll("\\[", "").replaceAll("\\,", "").replaceAll("\\]", "");
}
If you placed your } characters in a more standard way, this kind of error would be easier to see.
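As an alternative to scanning backwards from every pronoun, a single forward pass can simply remember the most recent name it has seen. This is a minimal sketch, not the asker's method: the class name PronounReplacer is hypothetical, names are read from "Name: ..." dictionary entries as in the question, and the pronoun set is an illustrative assumption (the question's list plus a few extra forms such as lowercase "his").

```java
import java.util.*;

public class PronounReplacer {
    // Illustrative pronoun set (an assumption): the question's list plus a few more forms.
    private static final Set<String> PRONOUNS = new HashSet<>(Arrays.asList(
            "He", "he", "She", "she", "His", "his", "Him", "him", "Her", "her"));

    public static String replacePronouns(String text, List<String> dictionary) {
        // Collect every "Name: X" entry from the dictionary into a set of names.
        Set<String> names = new HashSet<>();
        for (String entry : dictionary) {
            String[] parts = entry.split(": ", 2);
            if (parts.length == 2 && parts[0].equals("Name")) {
                names.add(parts[1]);
            }
        }
        // Same tokenizing idea as the question: detach periods, drop commas.
        String[] tokens = text.replace(".", " .").replace(",", "").split("\\s+");
        String lastName = null; // most recent name seen so far, if any
        for (int i = 0; i < tokens.length; i++) {
            if (names.contains(tokens[i])) {
                lastName = tokens[i];
            } else if (lastName != null && PRONOUNS.contains(tokens[i])) {
                tokens[i] = lastName; // replace the pronoun with the latest name
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("Name: hunter", "Name: Karla", "Noun: tree");
        String text = "Karla was a bird. She lived in a tall tree. The hunter shot his gun.";
        // prints: Karla was a bird . Karla lived in a tall tree . The hunter shot hunter gun .
        System.out.println(replacePronouns(text, dictionary));
    }
}
```

Like the original method, this replaces a pronoun with the bare name, so "his gun" becomes "hunter gun"; possessives would need separate handling. A pronoun that appears before any name is left unchanged.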

Related

finding matching characters between two strings

public class findMatching {
    public static void main(String[] args) {
        String matchOne = "caTch";
        String matchTwo = "cat";
        findMatching(matchOne, matchTwo);
    }
    public static void findMatching(String matchOne, String matchTwo) {
        int lengthOne = matchOne.length();
        int lengthTwo = matchTwo.length();
        char charOne;
        char charTwo;
        while(!matchOne.equals(matchTwo)) {
            for(int i = 0; i < lengthOne && i < lengthTwo; i++) {
                charOne = matchOne.charAt(i);
                charTwo = matchTwo.charAt(i);
                if(charOne == charTwo && lengthOne >= lengthTwo) {
                    System.out.print(charOne);
                } else if (charOne == charTwo && lengthTwo >= lengthOne){
                    System.out.print(charTwo);
                } else {
                    System.out.print(".");
                }
            }
        }
    }
}
I have created a static method called findMatching that takes two String parameters and compares them for matching characters. If matching characters are detected, it prints them, while characters that do not match are represented with "." instead.
For example, for caTch and cat the expected output is ca..., where the non-matching characters are represented with "." up to the length of the longer string.
Right now, however, my program only prints ca., that is, it only prints the non-matching characters up to the length of the shorter string. I believe the problem may lie in the logic of my if statements for lengthOne and lengthTwo.
Your for loop terminates as soon as i reaches the length of the shorter string, because the condition is i < lengthOne && i < lengthTwo. You need to keep the loop going until the end of the longer string, but stop comparing once the shorter string runs out of characters.
Something like this should do the job:
public static void findMatching(String matchOne, String matchTwo) {
    int lengthOne = matchOne.length();
    int lengthTwo = matchTwo.length();
    char charOne;
    char charTwo;
    for(int i = 0; i < lengthOne || i < lengthTwo; i++) {
        if(i < lengthOne && i < lengthTwo) {
            charOne = matchOne.charAt(i);
            charTwo = matchTwo.charAt(i);
            if (charOne == charTwo) {
                System.out.print(charTwo);
            } else {
                System.out.print(".");
            }
        } else {
            System.out.print(".");
        }
    }
}
I'm not sure what the point of the while loop is: nothing inside it changes matchOne or matchTwo, so whenever the strings differ it runs forever. Perhaps you meant it to be an if?
Another approach: use a first for loop to print the common characters (and "." for mismatches) up to the length of the shorter string, then a second for loop that prints one "." for each extra character of the longer string, using the absolute value (Math.abs) of the length difference.
Code:
for(int i = 0; i < lengthTwo && i < lengthOne; i++){
    if(matchOne.charAt(i) == matchTwo.charAt(i)){
        System.out.print(matchOne.charAt(i));
    }
    else{
        System.out.print(".");
    }
}
for(int j = 0; j < Math.abs(lengthOne - lengthTwo);j++){
    System.out.print(".");
}
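For comparison, here is a minimal sketch that follows the same idea as the answers above (loop to the end of the longer string, compare only while both strings still have characters) but builds and returns the result as a String instead of printing it, which makes it easy to test. The class name FindMatching is hypothetical, and the comparison is case-sensitive, so T and t count as a mismatch.

```java
public class FindMatching {
    // Returns one character per position of the longer string: the character itself
    // when both strings have the same character there, otherwise '.'.
    public static String findMatching(String matchOne, String matchTwo) {
        int longer = Math.max(matchOne.length(), matchTwo.length());
        StringBuilder sb = new StringBuilder(longer);
        for (int i = 0; i < longer; i++) {
            boolean inBoth = i < matchOne.length() && i < matchTwo.length();
            if (inBoth && matchOne.charAt(i) == matchTwo.charAt(i)) {
                sb.append(matchOne.charAt(i)); // same character at the same position
            } else {
                sb.append('.'); // mismatch, or past the end of the shorter string
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(findMatching("caTch", "cat")); // prints ca...
    }
}
```

Returning the String also leaves the caller free to decide when, and how often, to print it, which sidesteps the infinite while loop entirely.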

Abbreviation expander for a given lexicon

I am trying to write a program that allows users to make short blog entries by typing abbreviations for common words. When the input is complete, the program expands the abbreviations according to the defined lexicon.
Conditions
A substituted word must be the shortest word that can be formed by adding zero or more letters (or punctuation symbols) to the abbreviation.
If two or more unique words can be formed by adding the same number of letters, then the abbreviation should be printed as is.
Input
The input is divided into two sections.
The first section is the lexicon itself, and the second section is a user's blog entry that needs to be expanded. The sections are divided by a single | character.
For example:
cream chocolate every ever does do ice is fried friend friends lick like floor favor flavor flower best but probably poorly say says that what white our you your strawberry storyboard the | wht flvr ic crm ds yr bst fnd lke? ur frds lk stbry, bt choc s prly th bs flr vr!
Output
what flavor ice cream does your best friend like? our friends lk strawberry, but chocolate is poorly the best floor ever!
I have written the program and tested it locally against many different test cases successfully, but it fails when submitted to the test server.
An automated test suite validates the program's output when it is submitted to the test server. When a test fails, the details of the failing test case(s) are not visible.
Below is the program:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;
public class BlogEntry {
    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        String[][] info = readInput();
        String[] output = inputExpander(info[0],info[1]);
        //System.out.println();
        for(int i = 0; i < output.length; ++i) {
            if(i!=0)
                System.out.print(" ");
            System.out.print(output[i]);
        }
    }
    public static String[][] readInput() {
        BufferedReader bufferReader = new BufferedReader(new InputStreamReader(
                System.in));
        String input = null;
        String[][] info = new String[2][];
        String[] text;
        String[] abbr;
        try {
            input = bufferReader.readLine();
            StringTokenizer st1 = new StringTokenizer(input, "|");
            String first = "", second = "";
            int count = 0;
            while (st1.hasMoreTokens()) {
                ++count;
                if(count == 1)
                    first = st1.nextToken();
                if(count == 2)
                    second = st1.nextToken();
            }
            st1 = new StringTokenizer(first, " ");
            count = st1.countTokens();
            text = new String[count];
            count = 0;
            while (st1.hasMoreTokens()) {
                text[count] = st1.nextToken();
                count++;
            }
            st1 = new StringTokenizer(second, " ");
            count = st1.countTokens();
            abbr = new String[count];
            count = 0;
            while (st1.hasMoreTokens()) {
                abbr[count] = st1.nextToken();
                count++;
            }
            info[0] = text;
            info[1] = abbr;
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return info;
    }
    public static String[] inputExpander(String[] text, String[] abbr) {
        String[] output = new String[abbr.length];
        boolean result;
        for (int i = 0; i < abbr.length; ++i) {
            String abbrToken = abbr[i];
            char[] char_abbr_token = abbrToken.toCharArray();
            for (int j = 0; j < text.length; ++j) {
                String textToken = text[j];
                boolean flag2 = false;
                if ((char_abbr_token[char_abbr_token.length - 1] == '!')
                        || (char_abbr_token[char_abbr_token.length - 1] == '?')
                        || (char_abbr_token[char_abbr_token.length - 1] == ',')
                        || (char_abbr_token[char_abbr_token.length - 1] == ';')) {
                    flag2 = true;
                }
                char[] char_text_token = textToken.toCharArray();
                result = ifcontains(char_text_token, char_abbr_token);
                if (result) {
                    int currentCount = textToken.length();
                    int alreadyStoredCount = 0;
                    if (flag2)
                        textToken = textToken
                                + char_abbr_token[char_abbr_token.length - 1];
                    if (output[i] == null)
                        output[i] = textToken;
                    else {
                        alreadyStoredCount = output[i].length();
                        char[] char_stored_token = output[i].toCharArray();
                        if ((char_stored_token[char_stored_token.length - 1] == '!')
                                || (char_stored_token[char_stored_token.length - 1] == '?')
                                || (char_stored_token[char_stored_token.length - 1] == ',')
                                || (char_stored_token[char_stored_token.length - 1] == ';')) {
                            alreadyStoredCount -= 1;
                        }
                        if (alreadyStoredCount > currentCount) {
                            output[i] = textToken;
                        } else if (alreadyStoredCount == currentCount) {
                            output[i] = abbrToken;
                        }
                    }
                }
            }
            if(output[i] == null)
                output[i] = abbrToken;
        }
        return output;
    }
    public static boolean ifcontains(char[] char_text_token,
            char[] char_abbr_token) {
        int j = 0;
        boolean flag = false;
        for (int i = 0; i < char_abbr_token.length; ++i) {
            flag = false;
            for (; j < char_text_token.length; ++j) {
                if ((char_abbr_token[i] == '!') || (char_abbr_token[i] == '?')
                        || (char_abbr_token[i] == ',')
                        || (char_abbr_token[i] == ';')) {
                    flag = true;
                    break;
                }
                if (char_abbr_token[i] == char_text_token[j]) {
                    flag = true;
                    break;
                }
            }
            if (!flag)
                return flag;
        }
        //System.out.println("match found" + flag);
        return flag;
    }
}
Can someone point me to the use case I may have missed in my implementation? Thanks in advance.
I ran your program with a duplicate word in the input lexicon. When a word is repeated in the lexicon, its abbreviation is not expanded, because the check (alreadyStoredCount == currentCount) compares only the length of the stored word, not its content.
I think you need to check that:
if the same word appears more than once, the abbreviation is still expanded;
if two or more unique words of the same length match, the abbreviation is kept short.
How I would approach solving this:
Parse the input, then tokenize the lexicon and the text.
For each (possibly abbreviated) token, such as choc, build a regular expression like .*c.*h.*o.*c.*, i.e. the token's letters in order with anything allowed in between.
Search for the shortest lexicon words matching this regular expression. Replace the text token if exactly one such word is found; otherwise leave it alone.
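The steps above can be sketched in Java as follows. This is a minimal sketch, not the poster's code: the class and method names (AbbreviationExpander, expand, expandAll) are hypothetical, and it assumes that trailing ! ? , ; . characters on a blog token are split off before matching and re-attached afterwards (so lke? becomes like?). Keeping the lexicon in a Set means a repeated lexicon word counts as a single word, which covers the duplicate-word case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;
import java.util.regex.Pattern;

public class AbbreviationExpander {
    // Builds e.g. ".*c.*h.*o.*c.*" for "choc"; Pattern.quote guards against regex metacharacters.
    static Pattern toPattern(String abbr) {
        StringBuilder regex = new StringBuilder(".*");
        for (char c : abbr.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c))).append(".*");
        }
        return Pattern.compile(regex.toString());
    }

    static String expand(String token, Set<String> lexicon) {
        // Assumption: trailing punctuation is split off and re-attached after expansion.
        int end = token.length();
        while (end > 0 && "!?,;.".indexOf(token.charAt(end - 1)) >= 0) end--;
        String core = token.substring(0, end);
        String tail = token.substring(end);
        if (core.isEmpty()) return token;
        Pattern p = toPattern(core);
        int bestLength = Integer.MAX_VALUE;
        Set<String> best = new HashSet<>(); // distinct shortest matches
        for (String word : lexicon) {
            if (!p.matcher(word).matches()) continue;
            if (word.length() < bestLength) {
                bestLength = word.length();
                best.clear();
            }
            if (word.length() == bestLength) best.add(word);
        }
        // Expand only when exactly one distinct shortest word matches.
        return best.size() == 1 ? best.iterator().next() + tail : token;
    }

    public static String expandAll(String input) {
        String[] sections = input.split("\\|", 2);
        Set<String> lexicon = new LinkedHashSet<>(Arrays.asList(sections[0].trim().split("\\s+")));
        List<String> out = new ArrayList<>();
        for (String token : sections[1].trim().split("\\s+")) {
            out.add(expand(token, lexicon));
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line = reader.readLine();
        if (line != null && line.contains("|")) {
            System.out.println(expandAll(line));
        }
    }
}
```

On the example input from the question this produces the expected output line, including leaving lk alone because lick and like tie.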
It is quite hard to say what's wrong with your code without careful debugging, because it is not self-evident what each part of the code does.

Swapping string position in Arraylist java

I have a sentence: Humpty Dumpty sat on a wall.
I want the strings to swap positions such that : Dumpty Humpty on sat wall a.
The code I wrote is the following:
import java.util.*;
public class Swap{
    public static void main(String []args) {
        ArrayList<String> sentence = new ArrayList<String>();
        sentence.add("Humpty");
        sentence.add("Dumpty");
        sentence.add("sat");
        sentence.add("on");
        sentence.add("a");
        sentence.add("wall");
        int size = sentence.size() ; // for finding size of array list
        int numb ;
        if(size%2 == 0) {
            numb = 1;
        }
        else {
            numb = 0;
        }
        ArrayList<String> newSentence = new ArrayList<String>();
        if(numb == 1) {
            for(int i = 0; i <= size ; i = i+2) {
                String item = sentence.get(i);
                newSentence.add(i+1, item);
            }
            for(int i = 1; i<=size ; i = i+2) {
                String item2 = sentence.get(i);
                newSentence.add(i-1, item2);
            }
            System.out.println(newSentence);
        }
        else {
            System.out.println(sentence);
        }
    }
}
The code compiles, but when I run it, it throws an error.
What I understand is that I am adding strings to the ArrayList while leaving positions in between empty, like adding at position 3 without filling position 2 first. How do I overcome this problem?
You're correct about your problem - you're trying to insert an element into index 1 before inserting an element at all (at index 0), and you get an IndexOutOfBoundsException.
If you want to keep your existing code for this task, simply use a single loop:
if(numb == 1) {
    for(int i = 0; i < size-1 ; i = i+2) {
        String item = sentence.get(i+1);
        newSentence.add(i, item);
        item = sentence.get(i);
        newSentence.add(i+1, item);
    }
}
If you want to be a bit more sophisticated and use Java's built-in methods, you can use Collections.swap:
for(int i = 0; i < size-1 ; i = i+2) {
    Collections.swap(sentence, i, i+1);
}
System.out.println(sentence);
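You can initialize newSentence using:
ArrayList<String> newSentence = new ArrayList<String>(Collections.nCopies(size, ""));
This creates a list whose indexes 0 to size - 1 already exist, so you can fill positions in any order. Note, though, that the rest of the code then needs two changes: use newSentence.set(index, item) instead of newSentence.add(index, item), because add inserts and shifts the later elements instead of overwriting, and loop with i < size instead of i <= size, because sentence.get(size) is out of bounds.
Just remember that every index is initially populated with an empty string.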
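A minimal sketch of the pre-filled list approach, written as a hypothetical swapPairs helper in a hypothetical PrefilledSwap class. With a pre-filled list the writes have to use set(index, item): add(index, item) would insert and push every later element one position to the right. Unlike the question's code, which prints odd-length sentences unchanged, this sketch assumes that for an odd number of words the pairs are still swapped and the last word keeps its position.

```java
import java.util.*;

public class PrefilledSwap {
    // Swaps each adjacent pair of words using a list pre-filled with placeholders.
    public static List<String> swapPairs(List<String> sentence) {
        int size = sentence.size();
        List<String> newSentence = new ArrayList<>(Collections.nCopies(size, ""));
        for (int i = 0; i + 1 < size; i += 2) {
            // set() overwrites the placeholder; the list size never changes.
            newSentence.set(i + 1, sentence.get(i));
            newSentence.set(i, sentence.get(i + 1));
        }
        // Assumption: with an odd word count, the last word keeps its position.
        if (size % 2 == 1) {
            newSentence.set(size - 1, sentence.get(size - 1));
        }
        return newSentence;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("Humpty", "Dumpty", "sat", "on", "a", "wall");
        System.out.println(swapPairs(sentence)); // prints [Dumpty, Humpty, on, sat, wall, a]
    }
}
```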
This happens because of these loops:
for(int i = 0; i <= size ; i = i+2) {
    String item = sentence.get(i);
    newSentence.add(i+1, item);//Here you will face java.lang.IndexOutOfBoundsException
}
for(int i = 1; i<=size ; i = i+2) {
    String item2 = sentence.get(i);
    newSentence.add(i-1, item2);//Here you will face java.lang.IndexOutOfBoundsException
}
Instead, try the following code:
if(numb == 1) {
    for(int i = 0; i < size-1 ; i +=2) {
        Collections.swap(sentence, i, i+1);
    }
}

Can anybody help me to correct the following code?

Please help me identify the mistakes in this code. I am new to Java, so please excuse any mistakes. This is one of the CodingBat Java problems. I get a "Timed Out" error message for some inputs, such as "xxxyakyyyakzzz", while for other inputs, such as "yakpak" and "pakyak", the code works fine.
Question:
Suppose the string "yak" is unlucky. Given a string, return a version where all the "yak" are removed, but the "a" can be any char. The "yak" strings will not overlap.
public String stringYak(String str) {
    String result = "";
    int yakIndex = str.indexOf("yak");
    if (yakIndex == -1)
        return str; //there is no yak
    //there is at least one yak
    //if there are yaks store their indexes in the arraylist
    ArrayList<Integer> yakArray = new ArrayList<Integer>();
    int length = str.length();
    yakIndex = 0;
    while (yakIndex < length - 3) {
        yakIndex = str.indexOf("yak", yakIndex);
        yakArray.add(yakIndex);
        yakIndex += 3;
    }//all the yak indexes are stored in the arraylist
    //iterate through the arraylist. skip the yaks and get non-yak substrings
    for(int i = 0; i < length; i++) {
        if (yakArray.contains(i))
            i = i + 2;
        else
            result = result + str.charAt(i);
    }
    return result;
}
Shouldn't you be looking for any three-character sequence that starts with 'y' and ends with 'k'? Like so:
public static String stringYak(String str) {
    char[] chars = (str != null) ? str.toCharArray()
            : new char[] {};
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < chars.length; i++) {
        // if we have 'y' and two away is 'k' (checking first that i + 2 is inside the array)
        if (i + 2 < chars.length && chars[i] == 'y' && chars[i + 2] == 'k') {
            // then it's unlucky...
            i += 2;
            continue; //skip the statement sb.append
        } //do not append any pattern like y1k or yak etc
        sb.append(chars[i]);
    }
    return sb.toString();
}
public static void main(String[] args) {
    System.out.println(stringYak("1yik2yak3yuk4")); // Remove the "unlucky" strings
    // The result will be 1234.
}
This looks like a programming assignment. You need to use regular expressions.
Look at http://www.vogella.com/articles/JavaRegularExpressions/article.html#regex for more information.
Remember that you cannot use contains. Your code may be something like:
result = str.replaceAll("y.k", ""); // "." matches any single character
You can try this:
public static String stringYak(String str) {
    for (int i = 0; i < str.length(); i++) {
        if(str.charAt(i)=='y'){
            // note: replace() removes only the literal "yak", not e.g. "ybk"
            str=str.replace("yak", "");
        }
    }
    return str;
}
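The time-outs in the question appear to come from its while loop: when indexOf finds no further "yak" it returns -1, and yakIndex += 3 then sets yakIndex to 2, so whenever more than three characters follow the last "yak" (for example "yak123ya") the loop searches again from index 2 and never ends. A minimal sketch that keeps the question's indexOf idea but stops at -1 is below; StringYak is a hypothetical class name, and, like the question's code, it removes only the literal "yak", not "y?k" with another middle character, which the character-by-character answer above handles.

```java
public class StringYak {
    // Removes each literal "yak", walking the string with indexOf as the question does,
    // but stopping as soon as indexOf returns -1 instead of restarting near the start.
    public static String stringYak(String str) {
        StringBuilder result = new StringBuilder();
        int from = 0;
        while (true) {
            int yakIndex = str.indexOf("yak", from);
            if (yakIndex == -1) {
                result.append(str.substring(from)); // no more "yak": copy the rest
                break;
            }
            result.append(str, from, yakIndex); // keep the text before this "yak"
            from = yakIndex + 3;                // skip over the "yak" itself
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringYak("yak123ya")); // prints 123ya
    }
}
```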

List collections interface in java

Below is a function from my code:
private static List<String> formCrfLinesWithMentionClass(int begin, int end, String id,
        List<String> mList, int mListPos, List<String> crf) {
    List<String> crfLines = crf;
    int yes = 0;
    mListPosChanged = mListPos;
    //--------------------------------------------------------------------------
    for (int crfLinesMainIter = begin; crfLinesMainIter < end; ) {
        System.out.println(crfLines.get(crfLinesMainIter));
        //---------------------------------------------------------------------------
        //the total number of attributes without orthographic features
        //in a crfLine excluding the class attribute is 98
        if (!crfLines.get(crfLinesMainIter).equals("") && crfLines.get(crfLinesMainIter).split("\\s").length == 98) {
            //in mList parenthesis are represented by the symbol
            //in crfLines parenthesis are represented by -LRB- or -RRB-
            //we make a check to ensure the equality is preserved
            if(val.equals(crfLines.get(crfLinesMainIter).split("\\s")[0])) {
                yes = checkForConsecutivePresence(crfLinesMainIter, mList, mListPos, id, crfLines);
                if (yes > 0) {
                    mListPosChanged += yes;
                    System.out.println("formCrfLinesWithMentionClass: "+mListPosChanged);
                    for (int crfLinesMentionIter = crfLinesMainIter;
                            crfLinesMentionIter < crfLinesMainIter + yes;
                            crfLinesMentionIter++) {
                        String valString = "";
                        if (crfLinesMentionIter == crfLinesMainIter) {
                            valString += crfLines.get(crfLinesMentionIter);
                            valString += " B";
                            crfLines.add(crfLinesMentionIter, valString);
                        }
                        else {
                            valString += crfLines.get(crfLinesMentionIter);
                            valString += " I";
                            crfLines.add(crfLinesMentionIter, valString);
                        }
                    }
                    crfLinesMainIter += yes;
                }
                else {
                    ++crfLinesMainIter;
                }
            }
            else {
                ++crfLinesMainIter;
            }
        }
        else {
            ++crfLinesMainIter;
        }
    }
    return crfLines;
}
The problem I face is as follows:
crfLines is a List (the collections interface).
When the for loop (between the //----- lines) starts out, crfLines.get(crfLinesMainIter) works fine. But once it enters the if and further processing is carried out, crfLines.get(crfLinesMainIter) seems to return a previous value even though crfLinesMainIter has changed; it does not retrieve the actual value at that index. Has anyone faced such a scenario? Can anyone tell me why this occurs?
My actual question is: when can it happen that, even though the index is different, list.get() still returns a value that was previously at another index?
For example:
List crfLines = new LinkedList<>();
If crfLinesMainIter = 2, crfLines.get(crfLinesMainIter) returns a value, say 20, and this value satisfies the if condition, so further processing happens. When the for loop continues, crfLinesMainIter changes to, say, 5. crfLines.get(5) should then return a different value, but it still returns the previous value, 20.
(Not an answer.)
Reworked (more or less) for some modicum of readability:
private static List<String> formCrfLinesWithMentionClass(int begin, int end, String id, List<String> mList, int mListPos, List<String> crf) {
    List<String> crfLines = crf;
    mListPosChanged = mListPos;
    int i = begin;
    while (i < end) {
        if (crfLines.get(i).equals("") || (crfLines.get(i).split("\\s").length != 98)) {
            ++i;
            continue;
        }
        if (!val.equals(crfLines.get(i).split("\\s")[0])) {
            ++i;
            continue;
        }
        int yes = checkForConsecutivePresence(i, mList, mListPos, id, crfLines);
        if (yes <= 0) {
            ++i;
            continue;
        }
        mListPosChanged += yes;
        for (int j = i; j < i + yes; j++) {
            String valString = crfLines.get(j);
            valString += (j == i) ? " B" : " I";
            crfLines.add(j, valString);
        }
        i += yes;
    }
    return crfLines;
}
What is mListPosChanged? I find it confusing that it's being set to the value of a parameter named mListPos; it makes me think the m prefix is meaningless.
What is val in the line containing the split?
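A likely explanation, offered as a hypothesis to check since checkForConsecutivePresence and the data aren't shown: the tagging loop calls crfLines.add(crfLinesMentionIter, valString). List.add(int, E) inserts a new element and shifts the existing ones one position to the right; it does not replace anything. So on the next iteration get() reads the original line again, now one slot further along, and after crfLinesMainIter += yes the outer loop lands on that same original line once more, which matches the "previous value" symptom. List.set(int, E) replaces the element in place. The small demo below uses a hypothetical AddVsSet class and hypothetical data to show the difference.

```java
import java.util.*;

public class AddVsSet {
    // Mirrors the tagging loop: appends " B" to the first line and " I" to the following ones.
    // useSet == false uses add(index, value), which INSERTS and shifts later elements right;
    // useSet == true uses set(index, value), which REPLACES the element in place.
    public static List<String> tag(List<String> input, int start, int count, boolean useSet) {
        List<String> lines = new ArrayList<>(input); // work on a copy
        for (int j = start; j < start + count; j++) {
            String value = lines.get(j) + (j == start ? " B" : " I");
            if (useSet) {
                lines.set(j, value);
            } else {
                lines.add(j, value); // the element that was at j moves to j + 1
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("w0", "w1", "w2"); // hypothetical data
        System.out.println(tag(lines, 0, 2, false)); // prints [w0 B, w0 I, w0, w1, w2]
        System.out.println(tag(lines, 0, 2, true));  // prints [w0 B, w1 I, w2]
    }
}
```

If the goal is to append the B/I tag to the existing lines, set is probably what the loop needs; if new lines really should be inserted, the indexes and the end bound have to account for the list growing.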
