So I have this wrapper class that lets me return two values from a method.
Wrapper class:
public class Words
{
private String leftWords;
private String rightWords;
public Words(String leftWords, String rightWords) {
this.leftWords = leftWords;
this.rightWords = rightWords;
}
public String getLeftWords() {
return leftWords;
}
public String getRightWords() {
return rightWords;
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result
+ ((leftWords == null) ? 0 : leftWords.hashCode());
result = prime * result
+ ((rightWords == null) ? 0 : rightWords.hashCode());
return result;
}
@Override
public boolean equals(Object obj)
{
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Words other = (Words) obj;
if (leftWords == null)
{
if (other.leftWords != null)
return false;
}
else if (!leftWords.equals(other.leftWords))
return false;
if (rightWords == null)
{
if (other.rightWords != null)
return false;
}
else if (!rightWords.equals(other.rightWords))
return false;
return true;
}
}
The method I want to use it with is:
private static Map <Set<String>,Set<Words>> getLeftRightWords(LinkedHashMap<Set<String>,Set<Integer>> nnpIndexTokens, NLChunk chunk) throws FileNotFoundException
{
// Map <Set<String>,Set<Integer>> nnpMap = new LinkedHashMap<Set<String>, Set<Integer>>();
Map <Set<String>,Set<Words>> contextMap = new LinkedHashMap<Set<String>, Set<Words>>();
Set<Words> leftRightWords = new HashSet<Words>();
//for(NLChunk chunk : sentence.getChunks()){
if(chunk.getStrPostags().contains("NNP")){
String leftWords = "";
String rightWords = "";
int chunkStartIndex = chunk.getStartIndex();
int chunkEndIndex = chunk.getEndIndex();
//nnpMap = getNNPs(chunk);
String previous = null;
int previousNnpEndIndex = 0;
int previousNnpStartIndex = 0;
for (Map.Entry<Set<String>, Set<Integer>> entry : nnpIndexTokens.entrySet()){
for (Iterator<String> i = entry.getKey().iterator(); i.hasNext();){
Set<Integer> entryIndex = null;
int nnpStartIndex = 0;
int nnpEndIndex = 0;
String currentElement = i.next();
//Deriving values for beginning and ending of chunk
//and beginning and ending of NNP
if (!(entry.getValue().isEmpty())){
if (currentElement.trim().split(" ").length > 1){
entryIndex = entry.getValue();
nnpStartIndex = entryIndex.iterator().next();
nnpEndIndex = getLastElement(entryIndex);
}
else {
entryIndex = entry.getValue();
nnpStartIndex = entryIndex.iterator().next();
nnpEndIndex = nnpStartIndex;
}
}
if(!(chunkStartIndex<=nnpStartIndex && chunkEndIndex>=nnpEndIndex)){
continue;
}
//Extracting LEFT WORDS of the NNP
//1)If another NNP is present in left words, left words of current NNP start from end index of previous NNP
if (previous != null && chunk.toString().substring(chunkStartIndex, nnpStartIndex).contains(previous)){
int leftWordsEndIndex = nnpStartIndex;
int leftWordsStartIndex = previousNnpEndIndex;
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=leftWordsStartIndex
&& nlword.getIndex()<leftWordsEndIndex )
leftWords+=nlword.getToken() +" ";
}
System.out.println("LEFT WORDS:" + leftWords+ "OF:"+ currentElement);
}
//2) If no left words are present
if (chunkStartIndex == nnpStartIndex){
System.out.println("NO LEFT WORDS");
}
//3) Normal case where left words consist of all the words left of the NNP starting from the beginning of the chunk
else {
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=chunkStartIndex
&& nlword.getIndex()<nnpStartIndex )
leftWords+=nlword.getToken() +" ";
}
System.out.println("LEFT WORDS:" + leftWords+ "OF:"+ currentElement);
}
//Extracting RIGHT WORDS of NNP
if (entry.getKey().iterator().hasNext()){// entry.getKey().iterator().hasNext()){
String nextElement = entry.getKey().iterator().next();
//1)If another NNP is present in right words, right words of current NNP start from end index of current NNP to beginning of next NNP
if (nextElement != null && !nextElement.equals(currentElement) && chunk.toString().substring(entry.getValue().iterator().next(), chunkEndIndex).contains(nextElement)){
int rightWordsStartIndex = entryIndex.iterator().next();
int rightWordsEndIndex = entry.getValue().iterator().next();
//String rightWord="";
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=rightWordsStartIndex
&& nlword.getIndex()<rightWordsEndIndex )
rightWords+=nlword.getToken() +" ";
}
System.out.println("RIGHT WORDS:" + rightWords+ "OF:"+ currentElement);
}
}
//2) If no right words exist
if(nnpEndIndex == chunkEndIndex){
System.out.println("NO RIGHT WORDS");
//continue;
}
//3) Normal case where right words consist of all the words right of the NNP starting from the end of the NNP till the end of the chunk
else {
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=nnpEndIndex+1
&& nlword.getIndex()<=chunkEndIndex )
rightWords+=nlword.getToken() +" ";
}
System.out.println("RIGHT WORDS:" + rightWords+ "OF:"+ currentElement);
}
if (previous == null){
previous = currentElement;
previousNnpStartIndex = nnpStartIndex;
previousNnpEndIndex = nnpEndIndex;
}
Words contextWords = new Words(leftWords.toString(), rightWords.toString());
leftRightWords.add(contextWords);
}
contextMap.put(entry.getKey(), leftRightWords);
}//nnps set
}
System.out.println(contextMap);
return contextMap;
}
As you can see, what I am trying to do in this method is take a proper noun and extract the words to its left and right. For example, for the chunk "fellow Rhode Island solution provider" my output is:
LEFT WORDS:fellow OF:Rhode Island
RIGHT WORDS:solution provider OF:Rhode Island
Now I want to put these in a map where "Rhode Island" is the key and its values are "solution provider" and "fellow".
When I try to print this map, the output I get is:
{[Rhode Island ]=[com.gyan.siapp.nlp.test.Words@681330f0]}
How do I get the correct output?
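I don't know if it is the only issue, but your class Words does not override the toString() method.
I'm not sure about your Java skill level, so sorry if I'm posting something you are already familiar with.
System.out.println(...) calls the object's toString() method to get the text to print, and a Map's toString() in turn calls toString() on each key and value. Since Words doesn't override it, it inherits Object.toString(), which returns the class name, an @ sign and the hash code in hexadecimal.
You can override the default with your own implementation: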
@Override
public String toString(){
return "leftWords: "+leftWords+", rightWords: "+rightWords;
}
This replaces com.gyan.siapp.nlp.test.Words@681330f0 in the printed map with your own output.
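Putting the pieces together, here is a minimal sketch. It assumes a trimmed-down Words (just the two fields plus the toString() override above) and a plain String key instead of the question's Set<String> keys; the class name WordsDemo is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordsDemo {
    // Trimmed-down copy of the question's Words class, with the toString() override added
    static final class Words {
        private final String leftWords;
        private final String rightWords;

        Words(String leftWords, String rightWords) {
            this.leftWords = leftWords;
            this.rightWords = rightWords;
        }

        @Override
        public String toString() {
            return "leftWords: " + leftWords + ", rightWords: " + rightWords;
        }
    }

    public static void main(String[] args) {
        Map<String, Words> contextMap = new LinkedHashMap<>();
        contextMap.put("Rhode Island", new Words("fellow", "solution provider"));
        // Map.toString() calls toString() on each key and each value
        System.out.println(contextMap);
        // prints {Rhode Island=leftWords: fellow, rightWords: solution provider}
    }
}
```

Note that the question's method declares leftWords, rightWords and the Set<Words> once, outside the loops, so their contents accumulate across proper nouns; even with toString() in place, the printed values may not be grouped per noun as expected.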
Related
I have created a game of Wordle where you guess an unknown word, using a linked list. If a character exists in the word but not in the correct position, the letter is surrounded with +. If the character is not in the word, it is surrounded with -. If it is in the correct position, it is surrounded with !. My labelWord(Word mystery) method, where mystery is the unknown word, labels each letter of the guess and checks whether the two words are equal; a toString() method then outputs the guess word with the tags. For example, if the mystery word is "CHINA" and the guess is "CHARM", toString() outputs: "Word: !C! !H! +A+ -R- -M- ". I have run into difficulty when the guess word is longer than the mystery word.
Word word5 = new Word(Letter.fromString("OBJECT"));
Word word6 = new Word(Letter.fromString("CODE"));
word5.labelWord(word6);
System.out.println(word5.toString());
In the above code, I am guessing the word "OBJECT" for the mystery word "CODE". However, this outputs "Word: +O+ -B- -J- !E! C T ". The method does not label any of the letters beyond the length of the mystery word. How can I adjust my method to label characters in guesses that are longer than the mystery word?
public Word(Letter[] letters) {
LinearNode<Letter> prevLetter = null;
LinearNode<Letter> currentLetter;
for (int i = 0; i< letters.length; i++) {
currentLetter = new LinearNode<>(letters[i]);
if (i == 0) {
this.firstLetter = currentLetter;
prevLetter = currentLetter;
continue;
}
prevLetter.setNext(currentLetter);
prevLetter = currentLetter;
}
}
public boolean labelWord(Word mystery) {
LinearNode<Letter> otherNode = mystery.firstLetter;
LinearNode<Letter> thisNode = this.firstLetter;
boolean isEqual = true;
while(true){
if(thisNode == null || otherNode == null){
if(thisNode == null && otherNode == null){
break;
}
isEqual = false;
break;
}
if(thisNode.getElement().equals(otherNode.getElement())){
thisNode.getElement().setCorrect();
}
else{
if(mystery.contains(thisNode)){
thisNode.getElement().setUsed();
}
else {
thisNode.getElement().setUnused();
}
isEqual = false;
}
thisNode = thisNode.getNext();
otherNode = otherNode.getNext();
}
return isEqual;
}
private boolean contains(LinearNode<Letter> letter) {
LinearNode<Letter> currentNode = firstLetter;
while (currentNode != null) {
if (currentNode.getElement().equals(letter.getElement())) {
return true;
}
currentNode = currentNode.getNext();
}
return false;
}
public String toString() {
String str = "Word: ";
LinearNode<Letter> currentNode = firstLetter;
while (currentNode != null) {
if (currentNode.getElement() == null) {
str += currentNode.getElement().toString();
break;
}
str += currentNode.getElement().toString() + " ";
currentNode = currentNode.getNext();
}
return str;
}
In labelWord, you break as soon as otherNode is null, so the remaining letters of thisNode are never evaluated. Either get rid of these tests, or change
otherNode = otherNode.getNext();
to
if (otherNode.getNext() != null) {
otherNode = otherNode.getNext();
}
True indeed. You could try this:
while(true){
if(thisNode == null){ //the evaluation continues even with a shorter mystery
isEqual = false;
break;
}
[...]
thisNode = thisNode.getNext();
if (otherNode != null) { //the mystery may be shorter than the guess
otherNode = otherNode.getNext();
}
if(thisNode == null && otherNode == null){ //keeps equality
break;
}
}
return isEqual;
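A self-contained sketch of the idea from these answers: keep walking the guess, and once the mystery runs out, label the remaining letters by membership only. The Letter, LinearNode and Word classes below are hypothetical minimal stand-ins, since the question's full classes aren't shown; in particular, Word is built from a String instead of Letter.fromString, and contains takes a Letter rather than a node:

```java
// Hypothetical minimal stand-ins for the question's Letter, LinearNode and Word classes
public class WordleDemo {
    static final class Letter {
        private final char ch;
        private char tag = ' '; // ' ' unlabeled, '!' correct, '+' used, '-' unused
        Letter(char ch) { this.ch = ch; }
        void setCorrect() { tag = '!'; }
        void setUsed() { tag = '+'; }
        void setUnused() { tag = '-'; }
        boolean sameChar(Letter other) { return ch == other.ch; }
        @Override public String toString() {
            return tag == ' ' ? String.valueOf(ch) : "" + tag + ch + tag;
        }
    }

    static final class LinearNode<T> {
        private final T element;
        private LinearNode<T> next;
        LinearNode(T element) { this.element = element; }
        T getElement() { return element; }
        LinearNode<T> getNext() { return next; }
        void setNext(LinearNode<T> next) { this.next = next; }
    }

    static final class Word {
        private LinearNode<Letter> firstLetter;

        Word(String s) {
            LinearNode<Letter> prev = null;
            for (char c : s.toCharArray()) {
                LinearNode<Letter> node = new LinearNode<>(new Letter(c));
                if (prev == null) firstLetter = node; else prev.setNext(node);
                prev = node;
            }
        }

        private boolean contains(Letter letter) {
            for (LinearNode<Letter> n = firstLetter; n != null; n = n.getNext()) {
                if (n.getElement().sameChar(letter)) return true;
            }
            return false;
        }

        // Walks every letter of the guess. Once the mystery runs out, otherNode stays
        // null and the remaining guess letters are labeled by membership only.
        boolean labelWord(Word mystery) {
            LinearNode<Letter> thisNode = firstLetter;
            LinearNode<Letter> otherNode = mystery.firstLetter;
            boolean isEqual = true;
            while (thisNode != null) {
                Letter guess = thisNode.getElement();
                if (otherNode != null && guess.sameChar(otherNode.getElement())) {
                    guess.setCorrect();
                } else {
                    if (mystery.contains(guess)) guess.setUsed(); else guess.setUnused();
                    isEqual = false;
                }
                thisNode = thisNode.getNext();
                if (otherNode != null) otherNode = otherNode.getNext();
            }
            // Leftover mystery letters mean the guess was shorter than the mystery
            return isEqual && otherNode == null;
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder("Word: ");
            for (LinearNode<Letter> n = firstLetter; n != null; n = n.getNext()) {
                sb.append(n.getElement()).append(' ');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Word guess = new Word("OBJECT");
        System.out.println(guess.labelWord(new Word("CODE"))); // false
        System.out.println(guess); // Word: +O+ -B- -J- !E! +C+ -T- (with a trailing space)
    }
}
```

Keeping otherNode at null (instead of stopping the loop) is what lets the extra letters C and T be labeled.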
I am trying to implement the insert method of a Patricia trie. I have handled many cases, but I am currently stuck distinguishing between these two cases:
case 1: Inserting the following 3 strings:
abaxyxlmn, abaxyz, aba
I could implement this case with the code below.
case 2: Inserting the following 3 strings:
abafg, abara, a
In the second case, I do not know how to distinguish it from the first one: I need some clue telling me when I should append the differing substring ab to the children's edges to get abfa, abra, and finally add ab as a child of the node a. Please see the image below.
Code:
package patriciaTrie;
import java.util.ArrayList;
import java.util.Scanner;
public class Patricia {
private TrieNode nodeRoot;
private TrieNode nodeFirst;
// create a new node
public Patricia() {
nodeRoot = null;
}
// inserts a string into the trie
public void insert(String s) {
if (nodeRoot == null) {
nodeRoot = new TrieNode();
nodeFirst = new TrieNode(s);
nodeFirst.isWord = true;
nodeRoot.next.add(nodeFirst);
} else {
// nodeRoot.isWrod = false;
insert(nodeRoot, s);
}
}
private String checkEdgeString(ArrayList<TrieNode> history, String s) {
StringBuilder sb = new StringBuilder();
for (TrieNode nextNodeEdge : history) {
int len1 = nextNodeEdge.edge.length();
int len2 = s.length();
int len = Math.min(len1, len2);
for (int index = 0; index < len; index++) {
if (s.charAt(index) != nextNodeEdge.edge.charAt(index)) {
break;
} else {
char c = s.charAt(index);
sb.append(c);
}
}
}
return sb.toString();
}
private void insert(TrieNode node, String s) {
ArrayList<TrieNode> history = new ArrayList<TrieNode>();
for (TrieNode nextNodeEdge : node.getNext()) {
history.add(nextNodeEdge);
}
String communsubString = checkEdgeString(history, s);
System.out.println("common substring: " + communsubString);
if (!communsubString.isEmpty()) {
for (TrieNode nextNode : node.getNext()) {
if (nextNode.edge.startsWith(communsubString)) {
String substringSplit1 = nextNode.edge
.substring(communsubString.length());
String substringSplit2 = s.substring(communsubString
.length());
if (substringSplit1.isEmpty() && !substringSplit2.isEmpty()) {
// 1. case: aba, abaxyz
} else if (substringSplit2.isEmpty()
&& !substringSplit1.isEmpty()) {
// 2. case: abaxyz, aba
ArrayList<TrieNode> cacheNextNode = new ArrayList<TrieNode>();
System.out.println("node edge string is longer.");
if (nextNode.getNext() != null && !nextNode.getNext().isEmpty()) {
for (TrieNode subword : nextNode.getNext()) {
subword.edge = substringSplit1.concat(subword.edge); //This line
cacheNextNode.add(subword);
}
nextNode.getNext().clear();
nextNode.edge = communsubString;
nextNode.isWord = true;
TrieNode child = new TrieNode(substringSplit1);
child.isWord = true;
nextNode.next.add(child);
for(TrieNode node1 : cacheNextNode){
child.next.add(node1);
System.out.println("Test one");
}
cacheNextNode.clear();
}else{
nextNode.edge = communsubString;
TrieNode child = new TrieNode(substringSplit1);
child.isWord = true;
nextNode.next.add(child);
System.out.println("TEST");
}
} else if(substringSplit1.isEmpty() && substringSplit2.isEmpty()){
//3. case: aba and aba.
nextNode.isWord = true;
}else {
// 4. Case: abauwt and abaxyz
//if(nextNode.getNext().isEmpty())
}
break;
}
}
} else {
// There is no common substring.
System.out.println("There is no common substring");
TrieNode child = new TrieNode(s);
child.isWord = true;
node.next.add(child);
}
}
public static void main(String[] args) {
Patricia p = new Patricia();
Scanner s = new Scanner(System.in);
while (s.hasNext()) {
String op = s.next();
if (op.equals("INSERT")) {
p.insert(s.next());
}
}
}
class TrieNode {
ArrayList<TrieNode> next = new ArrayList<TrieNode>();
String edge;
boolean isWord;
// To create normal node.
TrieNode(String edge) {
this.edge = edge;
}
// To create the root node.
TrieNode() {
this.edge = "";
}
public ArrayList<TrieNode> getNext() {
return next;
}
public String getEdge() {
return edge;
}
}
}
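One common way to avoid enumerating special cases is the standard radix-tree insert: compare the key with one child edge at a time, let k be the length of their common prefix, and decide by comparing k with the edge length and the key length. If the key ends or diverges inside an edge, split that edge at k and move the old children under the split-off tail node with their edges unchanged, so nothing has to be prepended to them. A minimal sketch under these assumptions; RadixTrie, Node and the '*' dump format are hypothetical and not the question's code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal radix (Patricia-style) trie; class, method and field names are hypothetical
public class RadixTrie {
    static final class Node {
        String edge;
        boolean isWord;
        final List<Node> next = new ArrayList<>();
        Node(String edge) { this.edge = edge; }
    }

    private final Node root = new Node("");

    // Length of the longest common prefix of a and b
    private static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public void insert(String s) {
        Node node = root;
        while (!s.isEmpty()) {
            Node match = null;
            int k = 0;
            // Sibling edges start with distinct characters, so at most one child shares a prefix
            for (Node child : node.next) {
                k = commonPrefix(child.edge, s);
                if (k > 0) { match = child; break; }
            }
            if (match == null) {
                // No common prefix with any child: add the rest of the key as a new leaf
                Node leaf = new Node(s);
                leaf.isWord = true;
                node.next.add(leaf);
                return;
            }
            if (k < match.edge.length()) {
                // The key ends or diverges inside this edge: split it at k.
                // The old children move under the tail node with their edges unchanged.
                Node tail = new Node(match.edge.substring(k));
                tail.isWord = match.isWord;
                tail.next.addAll(match.next);
                match.next.clear();
                match.next.add(tail);
                match.edge = match.edge.substring(0, k);
                match.isWord = false;
            }
            node = match;
            s = s.substring(k);
        }
        node.isWord = true; // the key ends exactly at this node
    }

    public boolean contains(String s) {
        Node node = root;
        while (!s.isEmpty()) {
            Node match = null;
            for (Node child : node.next) {
                if (s.startsWith(child.edge)) { match = child; break; }
            }
            if (match == null) return false;
            s = s.substring(match.edge.length());
            node = match;
        }
        return node.isWord;
    }

    // One edge per line, indented by depth; '*' marks the end of a stored word
    private static void dump(Node node, String indent, StringBuilder out) {
        for (Node child : node.next) {
            out.append(indent).append(child.edge).append(child.isWord ? "*" : "").append('\n');
            dump(child, indent + "  ", out);
        }
    }

    @Override
    public String toString() {
        StringBuilder out = new StringBuilder();
        dump(root, "", out);
        return out.toString();
    }

    public static void main(String[] args) {
        RadixTrie t = new RadixTrie();
        for (String w : new String[] {"abafg", "abara", "a"}) t.insert(w);
        System.out.print(t);
    }
}
```

For abafg, abara, a this prints a* at the top, ba beneath it, and fg* and ra* beneath ba. The same code handles abaxyxlmn, abaxyz, aba without a separate case, because both orders end up in the "split at k" branch.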
I am using Selenium WebDriver with the Java bindings, and I'm testing a sorting feature where the values are arranged in an ArrayList, e.g. {"4\"", "4.5\"", "5.5\""}. So the strings contain decimal points as well as double quotation marks. I have the code below. The problem is that I keep getting false, because when it compares current to previous, the comparison comes out false. Thanks for your help.
public Boolean checkAscendingOrderScreensize(List<String> list){
if(list == null || list.isEmpty())
return false;
if(list.size() == 1)
return true;
for(int i=1; i<list.size();i++)
{
String current = list.get(i).toString();
String previous = list.get(i-1).toString();
current = current.replace(",",".");
current = current.replace("\"", "");
previous = previous.replace(",",".");
previous = previous.replace("\"", "");
if(current.compareTo(previous)>0)
return false;
}
return true;
}
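You need to convert the strings to double before comparing them. String.compareTo is lexicographic, so for example "10" comes before "9". Your check is also inverted: compareTo(...) > 0 means current is greater than previous, which is exactly what an ascending list should have, yet your method returns false in that case. So the workaround would be: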
import java.util.ArrayList;
import java.util.List;

public class Main {
public static void main(String[] args) {
List<String> texts = new ArrayList<String>();
texts.add("4\"");
texts.add("4.5\"");
texts.add("5.5\"");
System.out.println(checkAscendingOrderScreensize(texts));
// prints true
}
public static boolean checkAscendingOrderScreensize(List<String> list) {
if (list == null || list.isEmpty())
return false;
if (list.size() == 1)
return true;
for (int i = 1; i < list.size(); i++) {
String current = list.get(i).toString();
String previous = list.get(i - 1).toString();
current = current.replace(",", ".");
current = current.replace("\"", "");
previous = previous.replace(",", ".");
previous = previous.replace("\"", "");
if (Double.parseDouble(current) < Double.parseDouble(previous))
return false;
}
return true;
}
}
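To make the lexicographic pitfall concrete, here is a small variant that parses each value through a helper. The class and method names (ScreenSizeOrder, toInches, isAscending) are hypothetical, and List.of needs Java 9 or later:

```java
import java.util.List;

public class ScreenSizeOrder {
    // Normalizes a decimal comma and strips the inch mark, e.g. "5,5\"" -> 5.5
    static double toInches(String s) {
        return Double.parseDouble(s.replace(",", ".").replace("\"", "").trim());
    }

    // True when the sizes never decrease; same null/empty handling as the answer above
    static boolean isAscending(List<String> list) {
        if (list == null || list.isEmpty()) return false;
        for (int i = 1; i < list.size(); i++) {
            if (toInches(list.get(i)) < toInches(list.get(i - 1))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sizes = List.of("9\"", "10\"", "12.5\"");
        // String.compareTo puts "10\"" before "9\"", numeric comparison does not
        System.out.println("10\"".compareTo("9\"") < 0); // true
        System.out.println(isAscending(sizes));          // true
    }
}
```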
I have the following function.
private boolean codeContains(String name, String code) {
if (name == null || code == null) {
return false;
}
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
Matcher matcher = pattern.matcher(code);
return matcher.find();
}
It is called many thousands of times in my code, and it is the function in which my program spends the most time. Is there any way to make this function faster, or is it already as fast as it can be?
If you don't need to check word boundaries, you could do this:
private boolean codeContains(String name, String code) {
return name != null && code != null && code.indexOf(name)>=0;
}
If you need to check word boundaries but, as I suppose is your case, you have a big code string that you search often, you could "compile" the code once by:
- splitting the code string using the split method;
- putting the tokens in a HashSet (checking whether a token is in a HashSet is fast: constant time on average).
Of course, if you have more than one code string, it's easy to store them in a structure adapted to your program, for example in a map keyed by file name.
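A minimal sketch of that idea, assuming name is always a single word token (so it can never straddle a split point) and that splitting on \W+ is an acceptable approximation of the \b boundaries in the original regex; the class name TokenIndex is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TokenIndex {
    private final Set<String> tokens;

    // "Compile" the code once: split on runs of non-word characters ([^a-zA-Z_0-9]),
    // roughly what the \b boundaries in the original regex separate on
    public TokenIndex(String code) {
        tokens = new HashSet<>(Arrays.asList(code.split("\\W+")));
    }

    // Average constant-time lookup; only meaningful when name is a single word token
    public boolean contains(String name) {
        return name != null && !name.isEmpty() && tokens.contains(name);
    }

    public static void main(String[] args) {
        TokenIndex index = new TokenIndex("FOO some MU code BAR");
        System.out.println(index.contains("MU")); // true
        System.out.println(index.contains("FO")); // false
    }
}
```

The split cost is paid once per code string, so this only pays off when the same code is searched for many different names.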
"Plain" string operations will (almost) always be faster than regex, especially when you can't pre-compile the pattern.
Something like this would be considerably faster (with large enough name and code strings), assuming Character.isLetterOrDigit(...) suits your needs:
private boolean codeContains(String name, String code) {
if (name == null || code == null || code.length() < name.length()) {
return false;
}
if (code.equals(name)) {
return true;
}
int index = code.indexOf(name);
int nameLength = name.length();
if (index < 0) {
return false;
}
if (index == 0) {
// found at the start
char after = code.charAt(index + nameLength);
return !Character.isLetterOrDigit(after);
}
else if (index + nameLength == code.length()) {
// found at the end
char before = code.charAt(index - 1);
return !Character.isLetterOrDigit(before);
}
else {
// somewhere inside
char before = code.charAt(index - 1);
char after = code.charAt(index + nameLength);
return !Character.isLetterOrDigit(after) && !Character.isLetterOrDigit(before);
}
}
And a small test succeeds:
@Test
public void testCodeContainsFaster() {
final String code = "FOO some MU code BAR";
org.junit.Assert.assertTrue(codeContains("FOO", code));
org.junit.Assert.assertTrue(codeContains("MU", code));
org.junit.Assert.assertTrue(codeContains("BAR", code));
org.junit.Assert.assertTrue(codeContains(code, code));
org.junit.Assert.assertFalse(codeContains("FO", code));
org.junit.Assert.assertFalse(codeContains("BA", code));
org.junit.Assert.assertFalse(codeContains(code + "!", code));
}
This code seemed to do it:
private boolean codeContains(String name, String code) {
if (name == null || code == null || name.length() == 0 || code.length() == 0) {
return false;
}
int nameLength = name.length();
int lastIndex = code.length() - nameLength;
if (lastIndex < 0) {
return false;
}
for (int curr = 0; curr <= lastIndex; ) {
int index = code.indexOf(name, curr);
if (index < 0 || lastIndex < index) {
break;
}
int indexEnd = index + nameLength;
// Left boundary: start of code, or a non-alphabetic character before the match
boolean leftOk = index == 0 || !Character.isAlphabetic(code.charAt(index - 1));
// Right boundary: end of code, or a non-alphabetic character after the match
boolean rightOk = index == lastIndex || !Character.isAlphabetic(code.charAt(indexEnd));
if (leftOk && rightOk) {
return true;
}
// Keep searching just after this occurrence
curr = index + 1;
}
return false;
}
The accepted answer goes to dystroy, as he was the first to point me in the right direction. Excellent answer by Bart Kiers though, +1!
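If the regex version is kept instead, part of its cost is calling Pattern.compile on every invocation. A sketch that caches one compiled Pattern per name, assuming the same names are looked up repeatedly and the method is called from a single thread (the HashMap is not thread-safe); the class name PatternCache is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternCache {
    // One compiled Pattern per distinct name, built on first use
    private final Map<String, Pattern> cache = new HashMap<>();

    public boolean codeContains(String name, String code) {
        if (name == null || code == null) {
            return false;
        }
        Pattern pattern = cache.computeIfAbsent(name,
                n -> Pattern.compile("\\b" + Pattern.quote(n) + "\\b"));
        return pattern.matcher(code).find();
    }

    public static void main(String[] args) {
        PatternCache pc = new PatternCache();
        System.out.println(pc.codeContains("MU", "FOO some MU code BAR")); // true
        System.out.println(pc.codeContains("FO", "FOO some MU code BAR")); // false
    }
}
```

This keeps the exact regex semantics of the original function; the matching itself still runs on every call.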
This is the question:
Problem I.
We define the Pestaina strings as follows:
ab is a Pestaina string.
cbac is a Pestaina string.
If S is a Pestaina string, so is SaS.
If U and V are Pestaina strings, so is UbV.
Here a, b, c are constants and S, U, V are variables. In these rules, the same letter represents the same string. So, if S = ab, rule 3 tells us that abaab is a Pestaina string. In rule 4, U and V represent Pestaina strings, but they may be different.
Write the method
public static boolean isPestaina(String in)
That returns true if in is a Pestaina string and false otherwise.
And this is what I have so far. It only works for the first rule, and there are some cases in which it doesn't work, for example "abaaab":
public class Main {
private static boolean bool = true;
public static void main(String[] args){
String pestaina = "abaaab";
System.out.println(pestaina+" "+pestainaString(pestaina));
}
public static boolean pestainaString(String p){
if(p == null || p.length() == 0 || p.length() == 3) {
return false;
}
if(p.equals("ab")) {
return true;
}
if(p.startsWith("ab")){
bool = pestainaString(p, 1);
}else{
bool = false;
}
return bool;
}
public static boolean pestainaString(String p, int sign){
String letter;
char concat;
if("".equals(p)){
return false;
}
if(p.length() < 3){
letter = p;
concat = ' ';
p = "";
pestainaString(p);
}else if(p.length() == 3 && (!"ab".equals(p.substring(0, 2)) || p.charAt(2) != 'a')){
letter = p.substring(0, 2);
concat = p.charAt(2);
p = "";
pestainaString(p);
}else{
letter = p.substring(0, 2);
concat = p.charAt(2);
pestainaString(p.substring(3));
}
if(letter.length() == 2 && concat == ' '){
if(!"ab".equals(letter.trim())){
bool = false;
//concat = 'a';
}
}else if((!"ab".equals(letter)) || (concat != 'a')){
bool = false;
}
System.out.println(letter +" " + concat);
return bool;
}
}
Please tell me what I have done wrong.
I found the problem: I was calling the wrong method.
You are describing a context-free language, which can be described by a context-free grammar and parsed with it. Parsing such languages is widely researched, and there are a lot of resources for it out there.
The Wikipedia page also discusses some algorithms to parse these; specifically, I think you are interested in Earley parsers.
I also believe this "language" can be parsed using a pushdown automaton (though I'm not 100% sure about it).
public static void main(String[] args) {
String text = "cbacacbac";
System.out.println("Is \""+ text +"\" a Pestaina string? " + isPestaina(text));
}
public static boolean isPestaina(String in) {
if (in.equals("ab")) {
return true;
}
if (in.equals("cbac")) {
return true;
}
if (in.length() > 3) {
if ((in.startsWith("ab") || in.startsWith("cbac"))
&& (in.endsWith("ab") || in.endsWith("cbac"))) {
return true;
}
}
return false;
}
That was fun.
public boolean isPestaina(String p) {
Set<String> existingPestainas = new HashSet<String>(Arrays.asList(new String[]{"ab", "cbac"}));
boolean isP = false;
int lengthParsed = 0;
do {
if (lengthParsed > 0) {
//just realized there's a touch more to do here for the a/b
//connecting rules...I'll leave it as an exercise for the readers.
if (p.substring(lengthParsed).startsWith("a") ||
p.substring(lengthParsed).startsWith("b")) {
//good connector.
lengthParsed++;
} else {
//bad connector;
return false;
}
}
for (String existingP : existingPestainas) {
if (p.substring(lengthParsed).startsWith(existingP)) {
isP = true;
lengthParsed += existingP.length();
}
}
if (isP) {
System.err.println("Adding pestaina: " + p.substring(0, lengthParsed));
existingPestainas.add(p.substring(0, lengthParsed));
}
} while (isP && p.length() >= lengthParsed + 1);
return isP;
}
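For comparison, here is a sketch that applies the four rules literally, with memoization: rule 3 can only split the string at its exact middle, and rule 4 tries every b that leaves non-empty strings on both sides. It is not taken from the posted answers; the class name Pestaina and the static memo map are illustrative additions (the map is unbounded, so it suits one-off checks rather than long-running use):

```java
import java.util.HashMap;
import java.util.Map;

public class Pestaina {
    // Memo table: the same substring is usually tested many times through rule 4
    private static final Map<String, Boolean> memo = new HashMap<>();

    public static boolean isPestaina(String in) {
        if (in == null || in.isEmpty()) {
            return false;
        }
        Boolean cached = memo.get(in);
        if (cached != null) {
            return cached;
        }
        boolean result = check(in);
        memo.put(in, result);
        return result;
    }

    private static boolean check(String s) {
        // Rules 1 and 2: the base strings
        if (s.equals("ab") || s.equals("cbac")) {
            return true;
        }
        int n = s.length();
        // Rule 3: s == S + "a" + S, so S must be exactly the first (n - 1) / 2 characters
        if (n % 2 == 1) {
            int half = n / 2;
            String left = s.substring(0, half);
            if (s.charAt(half) == 'a' && left.equals(s.substring(half + 1)) && isPestaina(left)) {
                return true;
            }
        }
        // Rule 4: s == U + "b" + V, with U and V both non-empty Pestaina strings
        for (int i = 1; i < n - 1; i++) {
            if (s.charAt(i) == 'b'
                    && isPestaina(s.substring(0, i))
                    && isPestaina(s.substring(i + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "abaab", "cbacacbac", "abbab", "abaaab"}) {
            System.out.println(s + " -> " + isPestaina(s));
        }
    }
}
```

With these five sample strings, only abaaab is rejected: it has even length, so rule 3 cannot apply, and neither of its b characters splits it into two Pestaina strings.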