Find a char optimization

So this part of the homework wants us to take a Set of Strings and return a List of Strings. The Set contains email addresses, e.g. myname#uark.edu. We are to pull out the first part of each address (the name) and put it in the List. In the example above, myname would be put into the List.
The code I currently have uses an iterator to pull each String from the Set. I then use String.contains("#") as an error check to make sure the String has a # symbol in it. I then start at the end of the string and use String.charAt(i) to compare each char against '#'. Once it's found, I take the substring before it and add it to the List.
My problem is that I wanted to use something recursive and cut down on operations. I was thinking of something that would split the string at string.length()/2 and then use String.contains("#") on the second half first. If that half contains the # symbol, the function would call itself recursively on that half. If the back half does not contain the # symbol, then the front half must have it, and the function would call itself recursively on the front half instead.
So my problem is that when I call the function recursively and send it the substring, once I find the # symbol I will only have its index within the substring and not within the original string. Any ideas on how to keep track of it, or maybe a method I should be looking at? Below is my original code. Any advice welcome.
public static List<String> parseEmail(Set<String> emails)
{
    List<String> _names = new LinkedList<String>();
    Iterator<String> eMailIt=emails.iterator();
    while(eMailIt.hasNext())
    {
        String address=new String(eMailIt.next());
        boolean check=true;
        if(address.contains("#"))//if else will catch addresses that do not contain '#' .
        {
            String _address="";
            for(int i=address.length(); i>0 && check; i--)
            {
                if('#'==address.charAt(i-1))
                {
                    _address=new String(address.substring(0,i-1));
                    check=false;
                }
            }
            _names.add(_address);
            //System.out.println(_address);//fill in with correct sub string
        }
        else
        {
            //System.out.println("Invalid address");
            _names.add("Invalid address");//This is what's shown when you have an address that does not have an # in it.
        } // could have it insert some other char i.e. *%# s.t. if you use the returned list it can skip over invalid emails
    }
    return _names;
}
**It was suggested that I use String.indexOf("#"), BUT according to the API this method only gives back the first occurrence of the symbol, and I have to work on the assumption that there could be multiple "#" in the address and that I have to use the last one. Thank you for the suggestion though. I am looking at the other suggestion and will report back.
***So there is a String.lastIndexOf() and that was what I needed.
public static List<String> parseEmail(Set<String> emails)
{
    List<String> _names = new LinkedList<String>();
    Iterator<String> eMailIt=emails.iterator();
    while(eMailIt.hasNext())
    {
        String address=new String(eMailIt.next());
        if(address.contains("#"))//if else will catch addresses that do not contain '#' .
        {
            int endex=address.lastIndexOf('#');
            // substring's end index is exclusive, so endex itself (not endex-1) keeps the whole name
            _names.add(address.substring(0,endex));
            // System.out.println(address.substring(0,endex));
        }
        else
        {
            // System.out.println("Invalid address");
            _names.add("Invalid address");//This is what's shown when you have an address that does not have an # in it.
        } // could have it insert some other char i.e. *%# s.t. if you use the returned list it can skip over invalid emails
    }
    return _names;
}

Don't reinvent the wheel (unless you were asked to, of course). Java already has a built-in method for what you are attempting: String.lastIndexOf(String str). Use it.
final String email = "someone#example.com";
final int atIndex = email.lastIndexOf("#");
if(atIndex != -1) {
    final String name = email.substring(0, atIndex);
}

I agree with the previous two answers: if you are allowed to use the built-in methods split or indexOf, then you should. However, if it is part of your homework to find the substrings yourself, you should just go through the string's characters and stop when you find the #, i.e. a linear search.
You should under no circumstances try to do this recursively. The idea of divide and conquer should not be abused in a situation where there is nothing to gain: recursion means function-call overhead, and doing this recursively would only have a chance of being faster than a simple linear search if the substrings were searched in parallel. Even then, the synchronization overhead would kill the speedup for all but the most gigantic strings.
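A minimal sketch of the linear search described above, assuming '#' is the separator (as in the question) and that the last occurrence is the one that counts. The class and method names are made up for illustration.

```java
/** Hypothetical helper illustrating a plain backwards linear search. */
final class EmailNames {
    /**
     * Returns everything before the last '#' in address,
     * or null when address contains no '#' at all.
     */
    static String nameBeforeLastHash(String address) {
        // Walk from the end so the first hit is the last '#' in the string
        for (int i = address.length() - 1; i >= 0; i--) {
            if (address.charAt(i) == '#') {
                // i is an index into the original string, so no offset bookkeeping is needed
                return address.substring(0, i);
            }
        }
        return null;
    }
}
```

Because the loop works on indices of the original string rather than on substrings, the "which index is this really?" problem from the question never comes up.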

Unless recursion is specified in the homework, you would be best served by looking into String.split. It will split the String into a String array (if you specify it to be around '#'), and you can access both halves of the e-mail address.
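A hedged sketch of the split approach, assuming (as the asker states) that an address may contain several '#' and only the last one separates name from domain. A plain split("#") would cut at every '#', so this sketch uses a lookahead to split at the last one only. The class and method names are hypothetical.

```java
/** Hypothetical helper illustrating String.split for this task. */
final class SplitExample {
    /**
     * Returns the part before the last '#', or null when there is no '#'.
     */
    static String nameViaSplit(String address) {
        if (!address.contains("#")) {
            return null;
        }
        // "#(?=[^#]*$)" matches a '#' that is followed only by non-'#' characters,
        // i.e. the last '#'. The limit of 2 keeps a trailing empty domain in the array.
        String[] parts = address.split("#(?=[^#]*$)", 2);
        return parts[0]; // parts[1] would be the domain
    }
}
```

For the simple one-separator case this is more machinery than lastIndexOf plus substring, which is worth keeping in mind when choosing between the two.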

Related

Java - Search keywords list in another string list

I have a list of keywords in a List and I have data coming from some source which will be a list too.
I would like to find if any of keywords exists in the data list, if yes add those keywords to another target list.
E.g.
Keywords list = FIRSTNAME, LASTNAME, CURRENCY & FUND
Data list = HUSBANDFIRSTNAME, HUSBANDLASTNAME, WIFEFIRSTNAME, SOURCECURRENCY & CURRENCYRATE.
From the above example, I would like to build a target list with the keywords FIRSTNAME, LASTNAME & CURRENCY; FUND should not be included, as it doesn't exist in the data list.
I have a solution below that works by using two for loops (one inside another) and checking with the String contains method, but I would like to avoid two loops, especially one inside another.
for (int i=0; i<dataList.size();i++) {
    for (int j=0; j<keywordsList.size();j++) {
        if (dataList.get(i).contains(keywordsList.get(j))) {
            targetSet.add(keywordsList.get(j));
            break;
        }
    }
}
Is there any other alternate solution for my problem?

Here's a one loop approach using regex. You construct a pattern using your keywords, and then iterate through your dataList and see if you can find a match.
// requires: import java.util.*; import java.util.regex.*;
public static void main(String[] args) throws Exception {
    List<String> keywords = new ArrayList<>(Arrays.asList("FIRSTNAME", "LASTNAME", "CURRENCY", "FUND"));
    List<String> dataList = new ArrayList<>(Arrays.asList("HUSBANDFIRSTNAME", "HUSBANDLASTNAME", "WIFEFIRSTNAME", "SOURCECURRENCY", "CURRENCYRATE"));
    Set<String> targetSet = new HashSet<>();
    // Build one alternation pattern from all keywords and compile it once, outside the loop
    Pattern pattern = Pattern.compile(String.join("|", keywords));
    for (String data : dataList) {
        Matcher matcher = pattern.matcher(data);
        // find() reports only the first keyword that occurs in this entry
        if (matcher.find()) {
            targetSet.add(matcher.group());
        }
    }
    System.out.println(targetSet);
}
Results:
[CURRENCY, LASTNAME, FIRSTNAME]

Try the Aho–Corasick algorithm. It can count the occurrences of every keyword in the data (you only need to know whether each keyword appears at all).
The complexity is O(Sum(Length(Keyword)) + Length(Data) + Count(number of matches)).
Here is the wiki-page:
In computer science, the Aho–Corasick algorithm is a string searching
algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is
a kind of dictionary-matching algorithm that locates elements of a
finite set of strings (the "dictionary") within an input text. It
matches all patterns simultaneously. The complexity of the algorithm
is linear in the length of the patterns plus the length of the
searched text plus the number of output matches.
I implemented it (about 200 lines) years ago for a similar case, and it works well.
If you only care whether a keyword appears or not, you can modify the algorithm for your case and get a better complexity:
O(Sum(Length(Keyword)) + Length(Data)).
You can find implementations of the algorithm all over the internet, but I think it's good for you to understand it and implement it yourself.
EDIT:
I think you want to eliminate the two nested loops, so we need to find all keywords in one loop. This is called the Set Match Problem: matching a set of patterns (keywords) against a text (data). To solve it, you should choose the Aho–Corasick algorithm, which is designed for exactly that case. That way, we get a one-loop solution:
for (int i=0; i < dataList.size(); i++) {
    targetSet.addAll(Ac.run(keywordsList));
}
You can find an implementation here.
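For orientation, here is a compact sketch of what such an automaton might look like. It is not the implementation the answer links to; the class name KeywordAutomaton and the method findIn are invented for this illustration, and the code favours brevity over speed (hash maps for transitions, output lists copied along failure links).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: a small Aho–Corasick automaton over a fixed keyword list. */
final class KeywordAutomaton {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>(); // keywords that end at this state
        Node fail;                                      // longest proper suffix that is also a trie path
    }

    private final Node root = new Node();

    KeywordAutomaton(List<String> keywords) {
        // Phase 1: insert every keyword into a trie
        for (String keyword : keywords) {
            Node node = root;
            for (char c : keyword.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(keyword);
        }
        // Phase 2: compute failure links breadth-first, so shallower states are finished first
        Deque<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> edge : node.next.entrySet()) {
                Node child = edge.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(edge.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(edge.getKey());
                child.fail = (target != null) ? target : root;
                // Inherit keywords that end at the suffix state
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Returns every keyword occurring somewhere in text, reading text once left to right. */
    Set<String> findIn(String text) {
        Set<String> found = new LinkedHashSet<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            Node target = node.next.get(c);
            node = (target != null) ? target : root;
            found.addAll(node.outputs);
        }
        return found;
    }
}
```

With this sketch, the loop from the answer would build the automaton once from keywordsList and then call targetSet.addAll(automaton.findIn(dataList.get(i))) for each entry. Unlike the nested-loop version with its break, this collects every keyword found in an entry, not just the first.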

How do you pull data from a .FIC file in java?

So I am writing a Scrabble word suggestion program, which I decided to do because I wanted to learn sets (don't worry, I at least got that part) and how to reference data not created within the program. I'm pretty new to Java (and programming in general), and I was wondering how to pull words from a word list .FIC file in order to check them against words generated from the inputted letters.
To clarify, I have written a program which takes a series of letters and returns a set of every possible word created from those letters. for example:
input:
abc
would give a set containing the "words":
a, ab, ac, abc, acb, b, ba, bc, bac, bca, c, ca, cb, cab, cba
What I am asking, really, is how to check those to find the ones contained in the .FIC file.
The file is the "official crosswords" file from the Moby project word list, and I am still (very) shaky on parsing and other file-handling methods. I am continuing to research, so I don't have any prototype code for that yet.
Sorry if the question isn't entirely clear.
edit: here is the method that makes the "words" to make it easier to understand the idea. The part I don't understand is specifically how to pull a word(as a string) from the .FIC file.
private static Set<String> Words(String s)
{
    Set<String> tempwords = new TreeSet<String>();
    if (s.length() == 1)
    { // base case, last letter
        tempwords.add(s);
        // System.out.println(s); uncomment when debugging
    }
    else
    {
        //set up to add each letter in s
        for (int i = 0; i < s.length(); i++)
        { //cut the i letter out of the string
            String remaining = s.substring(0, i) + s.substring(i+1);
            //recursion to add all combinations of letters onto the current letter/"word"
            for (String permutation : Words(remaining))
            {
                // System.out.println(s.substring(i, i+1) + permutation); uncomment when debugging
                //add the full length words
                tempwords.add(s.substring(i, i+1) + permutation);
                // System.out.println(permutation); uncomment when debugging
                //add the not-full-length words
                tempwords.add(permutation);
            }
        }
    }
    // System.out.println(tempwords); uncomment when debugging
    return tempwords;
}

I don't know if it is the best solution, but I figured it out (hobbs, the line thing helped a lot, thank you). I found that this works:
public static void main(String[] args) throws FileNotFoundException
{
    while(true)
    {
        words.clear();
        finalwordset.clear(); // start each round with an empty result list
        String letters = enterLetters();
        words.addAll(Words(letters));
        // Scanner.reset() only resets the delimiter, locale and radix; it does not rewind the file,
        // so a fresh Scanner is opened for every pass over the word list
        try (Scanner s = new Scanner(new FileReader("C:/Users/Sean/workspace/Imbored/bin/113809of.fic"))) {
            while(s.hasNextLine()) {
                String line = s.nextLine();
                String finalword = checkWords(line, words);
                if (finalword != null) finalwordset.add(finalword);
            }
        }
        System.out.println(finalwordset);
        System.out.println();
        System.out.println("_________________________________________________________________________");
    }
}
A few things:
The checkWords method checks whether the current word from the file is in the generated list of "words".
The enterLetters method takes user-inputted letters and returns them as a string.
The Words method returns a set of strings of all of the possible combinations of the characters in the given string, with each character used up to as many times as it appears in the string and no repeated "words" in the returned set.
finalwordset and words are ArrayLists of Strings defined as class-level (static) fields (I would put them in the main method but I'm lazy and it doesn't matter for this case).
I am very sure there is a better/more efficient way to do this, but this at least works.
Finally: I decided to answer rather than delete because I didn't see this answered anywhere else. If it is answered elsewhere, feel free to delete the question or link to the other answer; at this point it is here to help other people.
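One possible alternative, sketched under the assumption that the .FIC file is plain text with one word per line (which is how the answer above reads it): load the file once into a HashSet and intersect it with the generated "words". The class and method names are hypothetical, and lower-casing is an assumption about how the two sides should be compared.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: keeps only the generated "words" that appear in a word list. */
final class WordListFilter {
    /** Reads one word per line, trimmed and lower-cased; blank lines are skipped. */
    static Set<String> loadDictionary(Scanner in) {
        Set<String> dictionary = new HashSet<>();
        while (in.hasNextLine()) {
            String word = in.nextLine().trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                dictionary.add(word);
            }
        }
        return dictionary;
    }

    /** Returns the candidates that are dictionary words, sorted alphabetically. */
    static Set<String> realWords(Set<String> candidates, Set<String> dictionary) {
        Set<String> result = new TreeSet<>(candidates);
        result.retainAll(dictionary); // set intersection; each lookup in the HashSet is O(1) on average
        return result;
    }
}
```

In the program above, the dictionary would be loaded once before the while(true) loop, for example from new Scanner(new FileReader(...)) as in the answer, and realWords(Words(letters), dictionary) would replace the per-line checkWords pass.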

Java: Contains() method

Trying to implement contains() method without using built-in method contains().
Here is my code:
public static boolean containsCS(String str, CharSequence cs) {
    //char[] chs = str.toCharArray();
    boolean result = false;
    int i=0;
    while(i<str.length()) {
        int j=0;
        while(j<cs.length()) {
            if(NEED TO CHECK IF THERE IS AN INDEX OUT OF BOUNDS EXCEPTION) {
                result = false;
                break;
            }
            if(str.charAt(i+j)==cs.charAt(j)) {
                result|=true; //result = false or true ->>>>> which is true.
                j++;
            } else {
                result = false;
                break;
            }
        }
        i++;
    }
    return false;
}
Let's say:
String str = "llpll";
CharSequence cs = "llo";
I want to make sure this method works properly in the above case, where the CharSequence still has one or more chars to check but the String has run out of length. How should I write the first if statement?

if (i+cs.length() > str.length()){
    // OUT OF BOUNDS: cs no longer fits into the rest of str
}

Well, if it were me, the first thing I'd check is that the length of my char sequence is <= the length of my string; if it isn't, there can be no match, so that logic path is chopped out right away.
If the lengths are equal, you can just compare the two directly with str.contentEquals(cs).
Then it would occur to me that if you chopped str up
into cs-length parts, you could do a straight comparison there as well.
E.g. a str of TonyJ and a search for a three-character sequence would pile through
Ton
ony
nyJ
One loop, one if statement and a heck of a lot clearer.
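The sliding-window idea can be sketched as follows. This is only one way to write it, not the asker's finished method; the class name ManualContains is hypothetical, and the method keeps the containsCS name from the question.

```java
/** Hypothetical helper: a hand-written contains() based on a sliding window. */
final class ManualContains {
    static boolean containsCS(String str, CharSequence cs) {
        // Only start positions where cs still fits entirely inside str are tried,
        // so str.charAt(i + j) can never run past the end of str.
        for (int i = 0; i + cs.length() <= str.length(); i++) {
            int j = 0;
            while (j < cs.length() && str.charAt(i + j) == cs.charAt(j)) {
                j++;
            }
            if (j == cs.length()) {
                return true; // every char of cs matched at offset i
            }
        }
        return false;
    }
}
```

With str = "llpll" and cs = "llo", the loop tries offsets 0, 1 and 2 only, so the out-of-bounds case from the question cannot occur and the result is false.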

I would suggest using this (Apache Commons Lang) and the StringUtils.contains(...) method therein.
Edit - For the reading impaired:
The linked method is not from java.lang.String or java.lang.Object
If you'd bother to actually look at the links, you would see that it is the Apache Commons-Lang API and the StringUtils.contains(...) method that I reference, which very clearly answers the question.

If this is for your homework, which I suspect it is, then I suggest you take a look at the API for the String class to see what other methods are available to help find the location of one String within another.
Also consider looking at the source code for String to see how it implements it.

I'm sure you already know this, but it is in fact possible to see the actual source code of the built-in classes and methods. So I'd take a look there for a start. The String class is especially interesting, IMO.

String Permutations

I was recently trying to write a program in Java that prints out all the permutations of a word. For some reason it only prints out one. I just can't figure it out!
import java.util.*;
public class AllPermutations {
    ArrayList<String> letters = new ArrayList<String>();
    public void main(){
        letters.add("H");
        letters.add("a");
        letters.add("s");
        permutate("",letters);
    }
    public void permutate(String word, ArrayList<String> lettersLeft){
        if(lettersLeft.size()==0){
            System.out.println(word);
        }else{
            for(int i=0;i<lettersLeft.size();i++){
                String newWord = new String();
                newWord = word+lettersLeft.get(i);
                lettersLeft.remove(i);
                permutate(newWord, lettersLeft);
            }
        }
    }
}

You need to add the letter you have removed back to the lettersLeft list
public void permutate(String word, ArrayList<String> lettersLeft){
    if(lettersLeft.size()==0){
        System.out.println(word);
    }else{
        for(int i=0;i<lettersLeft.size();i++){
            String temp = lettersLeft.remove(i);
            String newWord = word+temp;
            permutate(newWord, lettersLeft);
            lettersLeft.add(i, temp);
        }
    }
}
I haven't tested it, but I think it should work.
The problem is that Java passes a reference to the ArrayList, not a copy of it. Therefore, once you reach the bottom of your recursion tree, lettersLeft will contain 0 elements, and once you go back up, it will still have 0 elements.
As a side note, StringBuilder/StringBuffer is better suited to string permutation tasks: String is immutable, so every concatenation creates a new String, and there are at least n! of them. The difference between StringBuilder and StringBuffer is up to you to discover.

The reason is that every recursive call receives a reference to the same lettersLeft list. Once you remove a letter from lettersLeft, it is removed for every caller. So in the first pass "Has" is printed. Once that finishes, the recursion backs up a level to make the second iteration, but lettersLeft is now empty, so the loop ends without ever reaching the if statement again and no further permutation is produced. To resolve this, create a local copy of the list, just as you did with newWord. Hope that helps.
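A small sketch of the local-copy approach, assuming it is acceptable to collect the permutations in a list instead of printing them directly. The class and method names are chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: permutations via a fresh copy of the remaining letters per branch. */
final class Permutations {
    static List<String> permutations(String word, List<String> lettersLeft) {
        List<String> results = new ArrayList<>();
        if (lettersLeft.isEmpty()) {
            results.add(word);
            return results;
        }
        for (int i = 0; i < lettersLeft.size(); i++) {
            // Copy the list so removing a letter here does not affect the caller's list
            List<String> remaining = new ArrayList<>(lettersLeft);
            String letter = remaining.remove(i);
            results.addAll(permutations(word + letter, remaining));
        }
        return results;
    }
}
```

Copying costs extra allocations compared with the remove-then-add-back version, but the caller's list is never modified, which makes the recursion easier to reason about.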

In this case you are removing the letters from the ArrayList, and it is empty by the time the first word is complete. After that, the list size is always zero. Add the removed letter back to the list.
I would recommend the link below for good examples of string permutations, including more memory-efficient solutions:
http://www.codingeek.com/java/strings/find-all-possible-permutations-of-string-using-recursive-method/

Java Searching Through a String

So I want to search through a string to see if it contains the substring that I'm looking for. This is the algorithm I wrote up:
//Declares the String to be searched
String temp = "Hello World?";
//A String array is created to store the individual
//substrings in temp
String[] array = temp.split(" ");
//Iterates through String array and determines if the
//substring is present
for(String a : array)
{
    if(a.equalsIgnoreCase("hello"))
    {
        System.out.println("Found");
        break;
    }
    System.out.println("Not Found");
}
This algorithm works for "hello" but I don't know how to get it to work for "world" since it has a question mark attached to it.
Thanks for any help!

Take a look:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#contains(java.lang.CharSequence)
String.contains();
To get a containsIgnoreCase(), you'll have to make your searchword and your String toLowerCase().
Take a look at this answer:
How to check if a String contains another String in a case insensitive manner in Java?
return s1.toLowerCase().contains(s2.toLowerCase());
This will also be true for
"war of the worlds", because it will find "world" inside "worlds". If you don't want this behavior, you'll have to change your method like @Bart Kiers said.

Split on the following instead:
"[\\s?.!,]"
which matches any space char, question mark, dot, exclamation or a comma (add more chars if you like).
Or do a temp = temp.toLowerCase() and then temp.contains("world").
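The split-based suggestion could look roughly like this. The class and method names are hypothetical, and the character class is the one from the answer with a + added so that runs of delimiters (such as "? ") do not produce empty tokens between words.

```java
/** Hypothetical helper: whole-word, case-insensitive search via split. */
final class WordSearch {
    static boolean containsWord(String text, String word) {
        // Split on whitespace and common punctuation, so "World?" becomes "World"
        for (String token : text.split("[\\s?.!,]+")) {
            if (token.equalsIgnoreCase(word)) {
                return true;
            }
        }
        return false;
    }
}
```

Unlike the toLowerCase().contains(...) variant, this matches whole tokens only, so searching "war of the worlds" for "world" finds nothing.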

You don't have to do this yourself; it's already implemented:
IndexOf and others

You may want to use:
String string = "Hello World?";
boolean b = string.indexOf("Hello") >= 0; // true (indexOf returns 0 here, so the test must be >= 0)
To ignore case, a regular expression can be used:
b = string.matches("(?i).*Hello.*");
One more variation to ignore case would be:
// To ignore case
b = string.toLowerCase().indexOf("Hello".toLowerCase()) >= 0; // true
