Simple words finder

We have to find all simple words from a bunch of simple and compound words. For example:
Input: chat, ever, snapchat, snap, salesperson, per, person, sales, son, whatsoever, what, so.
Output should be: chat, ever, snap, per, sales, son, what, so
My sample code:
private static String[] find(String[] words) {
// TODO Auto-generated method stub
//System.out.println();
ArrayList<String> alist = new ArrayList<String>();
Set<String> r1 = new HashSet<String>();
for(String s: words){
alist.add(s);
}
Collections.sort(alist,new Comparator<String>() {
public int compare(String o1, String o2) {
return o1.length()-o2.length();
}
});
//System.out.println(alist.toString());
int count= 0;
for(int i=0;i<alist.size();i++){
String check = alist.get(i);
r1.add(check);
for(int j=i+1;j<alist.size();j++){
String temp = alist.get(j);
//System.out.println(check+" "+temp);
if(temp.contains(check) ){
alist.remove(temp);
}
}
}
System.out.println(r1.toString());
String res[] = new String[r1.size()];
for(String i:words){
if(r1.contains(i)){
res[count++] = i;
}
}
return res;
}
I am unable to get a solution with the above code. Any suggestions or ideas?
A compound word is a concatenation of two or more other words; all remaining words are considered simple words.
We have to remove all the compound words.

Algorithm
Read the input into a set of Strings, i.e. Set<String> input
Create an empty set for simple words, i.e. Set<String> simpleWords
Create an empty set for compound words, i.e. Set<String> compoundWords
Iterate over input. For each element:
Let the length of element be elemLength
Create a set Set<String> inputs of all Strings from input (excluding element) for which both of the following are true:
Length less than elemLength
Not present in compoundWords
Create the set of all concatenations of Strings from inputs with max length = elemLength, i.e. Set<String> currentPermutations
Check whether any of currentPermutations equals element
If yes, add element to compoundWords
If no, continue with the iteration
After the iteration is done, place all Strings from input that are not present in compoundWords into simpleWords
That is your answer.
Before you start writing code decide the logic that you are going to use. Use descriptive variable names and you are basically done.
Your logic fails because of the temp.contains(check) test: it checks whether check is a substring of temp, not whether temp is a compound word as per your definition.
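The compound-word definition can also be checked without generating every concatenation. The following is a minimal sketch, not the answer's exact algorithm: it swaps the permutation step for a dynamic-programming split check, and it assumes a compound word is any word that can be split entirely into two or more other words from the input. The names SimpleWordFinder, isCompound and findSimpleWords are illustrative and do not come from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SimpleWordFinder {

    // Returns true when word can be written as a concatenation of
    // two or more words from dict (the word itself is never used as a piece).
    static boolean isCompound(String word, Set<String> dict) {
        if (word.isEmpty()) {
            return false; // assumption: an empty string is not treated as compound
        }
        // canBuild[i] is true when the prefix word[0..i) splits into dictionary words
        boolean[] canBuild = new boolean[word.length() + 1];
        canBuild[0] = true;
        for (int end = 1; end <= word.length(); end++) {
            for (int start = 0; start < end; start++) {
                String piece = word.substring(start, end);
                if (canBuild[start] && !piece.equals(word) && dict.contains(piece)) {
                    canBuild[end] = true;
                    break;
                }
            }
        }
        return canBuild[word.length()];
    }

    // Keeps the input order and drops every compound word.
    static List<String> findSimpleWords(String[] words) {
        Set<String> dict = new HashSet<>(Arrays.asList(words));
        List<String> simple = new ArrayList<>();
        for (String word : words) {
            if (!isCompound(word, dict)) {
                simple.add(word);
            }
        }
        return simple;
    }
}
```

Called with the input from the question (with "what" and "so" as separate words), findSimpleWords returns [chat, ever, snap, per, sales, son, what, so]. Because a piece is never allowed to equal the whole word, any successful split necessarily uses at least two pieces.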

Related

Sorting an arraylist with integers

I have a collection of strings in an array like this:
ArrayList<String> collection = new ArrayList<>();
That stores:
collection: ["(,0,D=1", "(,1,D=2", "),2,D=2", "),3,D=1", "(,4,D=1", "(,5,D=2", "),6,D=2", "),7,D=1"]
I have a lot of d=1 and d=2, as you can see. How do I organize this from 1 first to 2? I tried to use a for loop but the list can contain an infinite number of d=x's. Can you help me organize?
Also, please help me so I don't change the ORDER of any numbers. Example:
collection: ["(,0,D=1", "),3,D=1", "(,4,D=1", "),7,D=1", "(,1,D=2", "),2,D=2", "(,5,D=2", "),6,D=2"]
So like, every parentheses will be aligned.
I should note that collection[0] = "(,0,D=1"

You should use a class for the items, not a String, e.g. class Item { char c; int i; int depth; } and an ArrayList<Item>. Then you can easily sort the list with a custom Comparator.
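A rough sketch of that suggestion, assuming every entry has exactly the form "(,0,D=1" shown in the question. The parse helper, the toString format and the ItemSorter class are illustrative additions, not part of the original answer. List.sort is stable, so items with the same depth keep their original relative order, which is what the question asks for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Item {
    final char c;
    final int i;
    final int depth;

    Item(char c, int i, int depth) {
        this.c = c;
        this.i = i;
        this.depth = depth;
    }

    // Parses an entry such as "(,0,D=1"; the format is assumed from the question.
    static Item parse(String s) {
        String[] parts = s.split(",");
        return new Item(parts[0].charAt(0),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2].substring(2))); // skip the "D=" prefix
    }

    @Override
    public String toString() {
        return c + "," + i + ",D=" + depth;
    }
}

class ItemSorter {
    // Sorts by depth only; the stable sort preserves the order within each depth.
    static List<Item> sortByDepth(List<String> raw) {
        List<Item> items = new ArrayList<>();
        for (String s : raw) {
            items.add(Item.parse(s));
        }
        items.sort(Comparator.comparingInt((Item item) -> item.depth));
        return items;
    }
}
```

For the eight entries in the question, sortByDepth yields the D=1 items at indices 0, 3, 4, 7 followed by the D=2 items at indices 1, 2, 5, 6.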

You can implement your own Comparator to do the sorting. A Comparator defines the ordering rule that a sort should use for your application. By passing a Comparator to Collections.sort(), you tell Java how you want the list ordered, and it sorts the list for you.
This implementation is based on the following assumptions:
Only the first D=x pattern in an element is compared; any subsequent ones are ignored.
Elements are sorted in ascending order based on x.
Elements that do not contain D=x are placed at the back.
class DeeEqualComparator implements Comparator<String> {
// capture every digit of x; ([0-9])+ would keep only the last digit
private static final String REGEX = "D=([0-9]+)";
@Override
public int compare(String s1, String s2) {
// find a D=x pattern from the element
Matcher s1Matcher = Pattern.compile(REGEX).matcher(s1);
Matcher s2Matcher = Pattern.compile(REGEX).matcher(s2);
boolean s1Match = s1Matcher.find();
boolean s2Match = s2Matcher.find();
if (s1Match && s2Match) {
// if match is found on s1 and s2, return their integer comparison result
Integer i1 = Integer.parseInt(s1Matcher.group(1));
Integer i2 = Integer.parseInt(s2Matcher.group(1));
return i1.compareTo(i2);
} else if (s1Match) {
// if only s1 found a match
return -1;
} else if (s2Match) {
// if only s2 found a match
return 1;
} else {
// if no match is found on both, return their string comparison result
return s1.compareTo(s2);
}
}
} // end of DeeEqualComparator
Test run
public static void main(String[] args) {
String[] array = {
// provided example
"(,0,D=1", "(,1,D=2", "),2,D=2", "),3,D=1", "(,4,D=1", "(,5,D=2", "),6,D=2", "),7,D=1"
// extra test case
, "exception-5", "exception-0", "D=68" };
List<String> list = Arrays.asList(array);
Collections.sort(list, new DeeEqualComparator());
System.out.print(list);
}
output
[(,0,D=1, ),3,D=1, (,4,D=1, ),7,D=1, (,1,D=2, ),2,D=2, (,5,D=2, ),6,D=2, D=68, exception-0, exception-5]

How would I get all the words from a list that begins with the specified letter

I am trying to show the list of words which start with the letter specified by the user input.
So for example if I add three words to my list, cat, corn and dog, and the user inputs the letter c, the output on the Java applet should be cat, corn.
However, I have no idea on how to go about this.
public void actionPerformed(ActionEvent e){
if (e.getSource() == b1 ){
x = textf.getText();
wordList.add(x);
textf.setText(null);
}
if (e.getSource() == b2 ){
}
}
b1 is adding all the user input into a secretly stored list, and I now want to make another button when pressed to show the words that start with the specified letter by the user.
textf = my text field
wordList = my list I created
x = string I previously defined

You could loop through all the possible indices, check if the element at that index starts with the letter, and print it if it does.
ALTERNATIVE (and probably better) code. I was going to put this second, but since it's better it deserves to be first. Taken from @larsmans's answer.
//given wordList as the word list
//given startChar as the character to search for in the form of a *String* not char
for (String element : wordList){
if (element.startsWith(startChar)){
System.out.println(element);
}
}
DISCLAIMER: This code is untested, I don't have much experience with ArrayList, and Java is more of a quaternary programming language for me. Hope it works :)
//given same variables as before
for (int i = 0; i < wordList.size(); i++){
String element = wordList.get(i);
//you could remove the temporary variable and replace element with
// wordList.get(i)
if (element.startsWith(startChar)){
System.out.println(element);
}
}

You can try something like this -
public static void main(String[] args) {
String prefix = "a";
List<String> l = new ArrayList<String>();
List<String> result = new ArrayList<String>();
l.add("aah");
l.add("abh");
l.add("bah");
for(String s: l) {
if(s.startsWith(prefix)) {
result.add(s);
}
}
System.out.println(result);
}
Result is -
[aah, abh]

If you can use Java 8, you can use built-in features to filter your list:
public static void main(String[] args) throws Exception {
final List<String> list = new ArrayList<>();
list.add("cat");
list.add("corn");
list.add("dog");
System.out.println(filter(list, "c"));
}
private static List<String> filter(final Collection<String> source, final String prefix) {
return source.stream().filter(item -> item.startsWith(prefix)).collect(Collectors.toList());
}
This uses the stream filter method to keep only the list items that start with the prefix argument.
The output is:
[cat, corn]

separating unique values in an algorithm

I am decomposing a series of 90,000+ strings into a discrete list of the individual, non-duplicated pairs of words that are included in the strings with the rxcui id values associated with each string. I have developed a method which tries to accomplish this, but it is producing a lot of redundancy. Analysis of the data shows there are about 12,000 unique words in the 90,000+ source strings, after I clean and format the contents of the strings.
How can I change the code below so that it avoids creating the redundant rows in the destination 2D ArrayList (shown below the code)?
public static ArrayList<ArrayList<String>> getAllWords(String[] tempsArray){//int count = tempsArray.length;
int fieldslenlessthan2 = 0;//ArrayList<String> outputarr = new ArrayList<String>();
ArrayList<ArrayList<String>> twoDimArrayList= new ArrayList<ArrayList<String>>();
int idx = 0;
for (String s : tempsArray) {
String[] fields = s.split("\t");//System.out.println(" --- fields.length is: "+fields.length);
if(fields.length>1){
ArrayList<String> row = new ArrayList<String>();
System.out.println("fields[0] is: "+fields[0]);
String cleanedTerms = cleanTerms(fields[1]);
String[] words = cleanedTerms.split(" ");
for(int j=0;j<words.length;j++){
String word=words[j].trim();
word = word.toLowerCase();
if(isValidWord(word)){//outputarr.add(word);
System.out.println("words["+j+"] is: "+word);
row.add(word_id);//WORD_ID NEEDS TO BE CREATED BY SOME METHOD.
row.add(fields[0]);
row.add(word);
twoDimArrayList.add(row);
idx += 1;
}
}
}else{fieldslenlessthan2 += 1;}
}
System.out.println("........... fieldslenlessthan2 is: "+fieldslenlessthan2);
return twoDimArrayList;
}
The output of the above method currently looks like the following, with many rxcui values for some name values, and with many name values for some rxcui:
How do I change the code above so that the output is a list of unique pairs of name/rxcui values, summarizing all relevant data from the current output while removing only the redundancies?

If you just need a Collection of all words, use a HashSet. Sets are primarily used for contains logic. If you need to associate a value with your String, use a HashMap.
public HashSet<String> getUniqueWords(String[] stringArray) {
HashSet<String> uniqueWords = new HashSet<String>();
for (String str : stringArray) {
uniqueWords.add(str);
}
return uniqueWords;
}
This will give you a collection of all the unique Strings in your array. If you need an ID, use a HashMap:
String[] strList; // your String array
int idCounter = 0;
HashMap<String, Integer> stringIDMap = new HashMap<String, Integer>();
for (String str : strList) {
// HashMap has containsKey, not contains
if (!stringIDMap.containsKey(str)) {
stringIDMap.put(str, idCounter); // autoboxed to Integer
idCounter++;
}
}
This will provide you a HashMap with unique String keys and unique Integer values. To get an id for a String you do this:
stringIDMap.get("myString"); // returns the Integer ID associated with the String "myString"
UPDATE
Based on the question update from the OP, I recommend creating an object that holds the String value and the rxcui. You can then place these in a Set or HashMap using a similar implementation to the one provided above.
public MyObject(String str, int rxcui); // The constructor for your new object
MyObject mo1 = new MyObject("hello", 5);
Either
mySet.add(mo1);
will work or
myMap.put(mo1.getStr(), mo1.getRxcui());

What is the purpose of the unique word ID? Is the word itself not unique enough since you are not keeping duplicates?
A very basic way would be to keep a counter going as you are checking new words. For each word that doesn't already exist you could increase the counter and use the new value as the unique id.
Lastly, might I suggest you use a HashMap instead. It would allow you to both insert and retrieve words in O(1) time. I am not entirely sure what you are going for, but I think the HashMap might give you more range.
Edit2:
It would be something a little more along these lines. This should help you out.
public static Set<DataPair> getAllWords(String[] tempsArray) {
Set<DataPair> set = new HashSet<>();
for (String row : tempsArray) {
// PARSE YOUR STRING DATA
// the way you were doing it seemed fine but something like this
String[] rowArray = row.split(" ");
String word = rowArray[1];
int id = Integer.parseInt(rowArray[0]);
DataPair pair = new DataPair(word, id);
set.add(pair);
}
return set;
}
class DataPair {
private String word;
private int id;
public DataPair(String word, int id) {
this.word = word;
this.id = id;
}
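@Override
public boolean equals(Object o) {
if (o instanceof DataPair) {
return ((DataPair) o).word.equals(word) && ((DataPair) o).id == id;
}
return false;
}
// HashSet needs hashCode to agree with equals, otherwise equal pairs are not deduplicated
@Override
public int hashCode() {
return 31 * word.hashCode() + id;
}
}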

ArrayList content check over entire array

I'm trying to get this to return a certain number of array entries based on their containing a certain input string.
/**
* This method returns a list of all words from
* the dictionary that include the given substring.
*/
public ArrayList<String> wordsContaining(String text)
{
ArrayList<String> contentCheck = new ArrayList<String>();
for(int index = 0; index < words.size(); index++)
{
if(words.contains(text))
{
contentCheck.add(words.get(index));
}
}
return contentCheck;
}
I don't understand why this keeps returning freaking every value in the array instead of only the entries containing the string bit.
Thanks!

Your condition:
if(words.contains(text))
checks whether the text is in the list or not. That would be true for all or none of the elements.
What you want is:
if(words.get(index).contains(text))
Apart from that, it would be better to use an enhanced for statement:
for (String word: words) {
if(word.contains(text)) {
contentCheck.add(word);
}
}

You have 2 issues in your code.
The first one is that the check in your condition
if(words.contains(text)) - this checks whether text is an element of the list
is not what you want; you probably want to check whether a given item of the list contains text:
public List<String> wordsContaining(String text)
{
List<String> contentCheck = new ArrayList<String>();
for(String word : words) //For each word in words
{
if(word.contains(text)) // Check that word contains text
{
contentCheck.add(word);
}
}
return contentCheck;
}

How to Count Unique Values in an ArrayList?

I have to count the number of unique words from a text document using Java. First I had to get rid of the punctuation in all of the words. I used the Scanner class to scan each word in the document and put in an String ArrayList.
So, the next step is where I'm having the problem! How do I create a method that can count the number of unique Strings in the array?
For example, if the array contains apple, bob, apple, jim, bob; the number of unique values in this array is 3.
public countWords() {
try {
Scanner scan = new Scanner(in);
while (scan.hasNext()) {
String words = scan.next();
if (words.contains(".")) {
words.replace(".", "");
}
if (words.contains("!")) {
words.replace("!", "");
}
if (words.contains(":")) {
words.replace(":", "");
}
if (words.contains(",")) {
words.replace(",", "");
}
if (words.contains("'")) {
words.replace("?", "");
}
if (words.contains("-")) {
words.replace("-", "");
}
if (words.contains("‘")) {
words.replace("‘", "");
}
wordStore.add(words.toLowerCase());
}
} catch (FileNotFoundException e) {
System.out.println("File Not Found");
}
System.out.println("The total number of words is: " + wordStore.size());
}

Are you allowed to use Set? If so, a HashSet may solve your problem. HashSet doesn't accept duplicates.
HashSet<String> noDupSet = new HashSet<>();
noDupSet.add(yourString);
noDupSet.size();
The size() method returns the number of unique words.
If you really have to use only ArrayList, then one way to achieve this is:
1) Create a temp ArrayList
2) Iterate original list and retrieve element
3) If tempArrayList doesn't contain element, add element to tempArrayList

Starting from Java 8 you can use Stream:
After you add the elements in your ArrayList:
long n = wordStore.stream().distinct().count();
It converts your ArrayList to a stream and then it counts only the distinct elements.

I would advise using a HashSet. It automatically filters out duplicates when you call the add method.

Although I believe a set is the easiest solution, you can still use your original solution and just add an if statement to check if value already exists in the list before you do your add.
if (!wordStore.contains(words.toLowerCase()))
wordStore.add(words.toLowerCase());
Then the number of words in your list is the total number of unique words (ie: wordStore.size() )

This general-purpose solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is especially useful because it returns a boolean flag indicating whether the element was actually added. A HashMap is used to track the number of occurrences of each original element. This algorithm can be adapted for variations of this type of problem. The solution runs in O(n) time.
public static void main(String args[])
{
String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
System.out.printf("RAW: %s ; PROCESSED: %s \n",Arrays.toString(strArray), duplicates(strArray).toString());
}
public static HashMap<String, Integer> duplicates(String arr[])
{
HashSet<String> distinctKeySet = new HashSet<String>();
HashMap<String, Integer> keyCountMap = new HashMap<String, Integer>();
for(int i = 0; i < arr.length; i++)
{
if(distinctKeySet.add(arr[i]))
keyCountMap.put(arr[i], 1); // unique value or first occurrence
else
keyCountMap.put(arr[i], (Integer)(keyCountMap.get(arr[i])) + 1);
}
return keyCountMap;
}
RESULTS:
RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}

You can create a Hashtable or HashMap as well. The keys would be your input strings and the values would be the number of times each string occurs in your input array. This takes O(N) time and space.
Solution 2:
Sort the input list.
Similar strings would be next to each other.
Compare list(i) to list(i+1) and count the number of duplicates.
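Solution 2 can be sketched roughly as follows. This is an illustration under stated assumptions rather than the answerer's own code: the class and method names are made up here, and the sketch counts unique values (what the question asks for) by counting the positions where a sorted element differs from its predecessor. It sorts a copy so the caller's list is left untouched, and the sort makes it O(N log N).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedUniqueCounter {

    // Counts distinct strings by sorting a copy and comparing neighbours.
    static int countUnique(List<String> words) {
        if (words.isEmpty()) {
            return 0;
        }
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted); // equal strings end up next to each other
        int unique = 1;           // the first element always starts a new group
        for (int i = 1; i < sorted.size(); i++) {
            if (!sorted.get(i).equals(sorted.get(i - 1))) {
                unique++;
            }
        }
        return unique;
    }
}
```

For the example in the question (apple, bob, apple, jim, bob), countUnique returns 3.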

In a shorthand way, you can do it as follows:
ArrayList<String> duplicateList = new ArrayList<String>();
duplicateList.add("one");
duplicateList.add("two");
duplicateList.add("one");
duplicateList.add("three");
System.out.println(duplicateList); // prints [one, two, one, three]
HashSet<String> uniqueSet = new HashSet<String>();
uniqueSet.addAll(duplicateList);
System.out.println(uniqueSet); // prints [two, one, three]
duplicateList.clear();
System.out.println(duplicateList);// prints []
duplicateList.addAll(uniqueSet);
System.out.println(duplicateList);// prints [two, one, three]

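import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
public class UniqueinArrayList {
public static void main(String[] args) {
StringBuilder sb = new StringBuilder();
List<String> al = new ArrayList<>();
al.add("Stack");
al.add("Stack");
al.add("over");
al.add("over");
al.add("flow");
al.add("flow");
System.out.println(al);
// LinkedHashSet drops duplicates and keeps insertion order
Set<String> s = new LinkedHashSet<>(al);
System.out.println(s);
Iterator<String> itr = s.iterator();
while (itr.hasNext()) {
sb.append(itr.next()).append(" ");
}
System.out.println(sb.toString().trim());
}
}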

3 distinct possible solutions:
Use HashSet as suggested above.
Create a temporary ArrayList and store only unique element like below:
public static int getUniqueElement(List<String> data) {
List<String> newList = new ArrayList<>();
for (String eachWord : data)
if (!newList.contains(eachWord))
newList.add(eachWord);
return newList.size();
}
Java 8 solution
long count = data.stream().distinct().count();
