Java string matching

All I am doing in my project is taking two values (which I read from two different Excel files) and checking how similar they are. I tried using the Pattern and Matcher classes, which work perfectly fine when both words are the same (as in organisation and organisation/s). But my data also has pairs like employee and employment, where I just need "employ" as the common string between the two, and in that case Pattern and Matcher fail. I have been stuck on this for a week. I have about 700 rows in the first Excel file and about 9000 in the other. I store each cell value that I read into the program in one of two separate variables. Next, I tried using four for loops to match word by word and character by character, to find only the characters that match between the two. I have pasted the code for the for-loop implementation below. The four for loops are driving me nuts! Any help in completing this would be greatly appreciated.
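String str1 = "Cover for employees of the company";
String str2 = "Employment Agencies ";
// Split on whitespace (trim first so the trailing space does not produce an empty word)
String[] count1 = str1.trim().split("\\s+");
String[] count2 = str2.trim().split("\\s+");
for (int i = 0; i < count1.length; i++)
{
    // Lower-case both words so "employees" and "Employment" can match
    String word1 = count1[i].toLowerCase();
    for (int j = 0; j < count2.length; j++)
    {
        String word2 = count2[j].toLowerCase();
        for (int m = 0; m < word1.length(); m++)
        {
            for (int n = 0; n < word2.length(); n++)
            {
                // Compare characters of the current pair of words, not of the whole strings
                if (word1.charAt(m) == word2.charAt(n))
                {
                    // please look at the logic that I am looking for to implement
                }
            }
        }
    }
}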
Expected output: employ
One more idea that I am trying to implement (to make my program more efficient):
cover (compared with) employment: the first character does not match, so go to the next word in the second string. Once all words in the second string have been checked, go to the next word in the first string and compare it with all the words in the second string.
Okay, so this is what I am looking for right now. Any help will be greatly appreciated.
Thanks!
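Since "employ" is the longest common prefix of "employees" and "Employment", one possible approach is to replace the inner two loops with a single prefix comparison that walks both words from the start and stops at the first mismatch. That stopping rule is exactly the "first character does not match, go to the next word" idea above. This is a sketch under assumptions not in the question: matching is case-insensitive, and the hypothetical `minLength` parameter (3 here) filters out trivial one- or two-letter overlaps. The class and method names are illustrative only.

```java
import java.util.Locale;

public class CommonPrefixMatcher {

    // Returns the longest common prefix of two words, ignoring case.
    static String commonPrefix(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int limit = Math.min(x.length(), y.length());
        int k = 0;
        // Stop at the first mismatching character (immediately if the first letters differ)
        while (k < limit && x.charAt(k) == y.charAt(k)) {
            k++;
        }
        return x.substring(0, k);
    }

    // Compares every word of s1 with every word of s2 and keeps the longest shared
    // prefix that is at least minLength characters long ("" if there is none).
    static String bestCommonPrefix(String s1, String s2, int minLength) {
        String best = "";
        for (String w1 : s1.trim().split("\\s+")) {
            for (String w2 : s2.trim().split("\\s+")) {
                String p = commonPrefix(w1, w2);
                if (p.length() >= minLength && p.length() > best.length()) {
                    best = p;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String str1 = "Cover for employees of the company";
        String str2 = "Employment Agencies ";
        System.out.println(bestCommonPrefix(str1, str2, 3)); // prints employ
    }
}
```

With about 700 × 9000 cell pairs, each compared word by word, this does a lot of short comparisons. However, every word pair now costs at most as many steps as the shorter word, instead of the product of both word lengths.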

Related

How to divide a string into equal groups of n characters padded with blank spaces in Java?

How do I create a method that takes an input String and an integer n and outputs the String divided into parts of n characters separated by a blank space? For example, input String: "THISISFUN", integer: 3, result: "THI SIS FUN".
When you answer, can you please try to explain what each part of the code does? I really want to understand it.
I tried using StringBuilder and the split() method, but the problem is that I don't understand how they work. So I ended up thoughtlessly pasting parts of code from different online articles, which doesn't work well if you actually want to learn something, especially when you cannot find any posts about your specific issue. I could only find things like "how to divide a String into n parts" and "how to add a space after a specific char", which are similar issues but not the same.
Here is one way to do it:
public static void splitString(String str, int groupSize){
    char[] arr = str.toCharArray(); // Split the string into a character array ①
    // Iterate over the array and print the characters
    for(int i=0; i<arr.length; i++){
        // If 'i' is a multiple of 'groupSize' ②
        if(i > 0 && i % groupSize == 0){
            System.out.print(" "); // ③
        }
        System.out.print(arr[i]);
    }
}
① Split the string into a character array (so that you can access the characters individually). You can also do it using the charAt() method without splitting the string into an array. Read the Javadoc for more details.
② Check if the loop counter i is a multiple of groupSize
③ Note the use of System.out.print() as we do not want to print a newline. Here you can use a StringBuilder too and print the contents at the end instead of printing the characters inside the loop.
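Following note ③, here is a sketch of a variant that builds the result in a StringBuilder and returns it instead of printing. It also uses charAt(), as note ① mentions. The method name is kept from the answer above, and the check for a non-positive groupSize is an addition. It does not pad the last group with spaces, because the example in the question ("THISISFUN" with 3) divides evenly.

```java
public class GroupSplitter {

    // Returns str divided into groups of groupSize characters separated by single spaces.
    static String splitString(String str, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            // A non-zero multiple of groupSize starts a new group, so put a space before it
            if (i > 0 && i % groupSize == 0) {
                sb.append(' ');
            }
            sb.append(str.charAt(i)); // charAt() reads one character without building an array
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(splitString("THISISFUN", 3)); // THI SIS FUN
    }
}
```

Returning a String rather than printing lets the caller decide what to do with the result, for example print it, store it, or test it.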

How to find words in a string using arrays in Java?

I need to find all 4-, 5- and 6-letter words in a string of letters. When we find each four-letter word, we are then asked to check if the word is in the English dictionary.
The problem I am facing is that I am not sure how to make Java find all 4-, 5- and 6-letter words.
The example they gave us is like this:
String letter = "fourgooddogsswam";
int wordSize = 4;
Then the words are:
four
ourg
urgo
rgoo
good
oodd
oddo
ddog
dogs
ogss
gssw
sswa
swam
In reality, the only words are four, good, dogs, swam.
Again, I am wondering how to make a loop or something of that nature in order to find all of the four letter words. Any help or tips are highly appreciated.
Thank you.
This is a pretty simple algorithm. If you are taking a basic programming class, it is better to take the time to think it through and learn than to use forums to answer your homework.
for(int i = 0; i <= letter.length()-4; i++){ // <= so the last window ("swam") is included
    String fourLetter = letter.substring(i, i+4);
    //... do whatever you want
}
First you need to split the string into substrings of length 4. The code below gives you a list of all the length-4 strings:
List<String> elementList = new ArrayList<String>();
for(int i = 0; i <= letter.length()-4; i++){
    String fourLetter = letter.substring(i, i+4);
    elementList.add(fourLetter);
}
Then read the dictionary words from a file and insert them into a Set<String>.
Then iterate over the list and check each element with set.contains(element).
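Putting these steps together for all three word sizes, a sketch might look like the following. The dictionary here is a hypothetical four-word set so the example runs on its own. In practice you would fill the set from your dictionary file, for example with Files.readAllLines. The loop condition `i + wordSize <= letters.length()` makes sure the last window ("swam") is included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordWindowFinder {

    // Returns every substring of length wordSize, in order (a sliding window).
    static List<String> windows(String letters, int wordSize) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + wordSize <= letters.length(); i++) {
            result.add(letters.substring(i, i + wordSize));
        }
        return result;
    }

    // Keeps only the windows of length 4, 5 and 6 that appear in the dictionary set.
    static List<String> findWords(String letters, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        for (int size = 4; size <= 6; size++) {
            for (String candidate : windows(letters, size)) {
                if (dictionary.contains(candidate)) { // O(1) lookup in a HashSet
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical tiny dictionary standing in for a real word file
        Set<String> dictionary = new HashSet<>(Arrays.asList("four", "good", "dogs", "swam"));
        System.out.println(findWords("fourgooddogsswam", dictionary)); // [four, good, dogs, swam]
    }
}
```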

How do you pull data from a .FIC file in java?

So I am writing a Scrabble word-suggestion program, which I decided to do because I wanted to learn sets (don't worry, I at least got that part) and how to reference data not created within the program. I'm pretty new to Java (and programming in general), but I was wondering how to pull words from a word-list .FIC file in order to check them against words generated from the input letters.
To clarify, I have written a program that takes a series of letters and returns a set of every possible word created from those letters. For example:
input:
abc
would give a set containing the "words":
a, ab, ac, abc, acb, b, ba, bc, bac, bca, c, ca, cb, cab, cba
What I am asking, really, is how to check those to find the ones contained in the .FIC file.
The file is the "official crosswords" file from the Moby Project word list, and I am still (very) shaky on parsing and other file-handling methods. I am continuing to research, so I don't have any prototype code for that.
Sorry if the question isn't entirely clear.
Edit: here is the method that generates the "words", to make the idea easier to understand. The part I don't understand is specifically how to pull a word (as a string) from the .FIC file.
private static Set<String> Words(String s)
{
Set<String> tempwords = new TreeSet<String>();
if (s.length() == 1)
{ // base case, last letter
tempwords.add(s);
// System.out.println(s); uncomment when debugging
}
else
{
//set up to add each letter in s
for (int i = 0; i < s.length(); i++)
{ //cut the i letter out of the string
String remaining = s.substring(0, i) + s.substring(i+1);
//recursion to add all combinations of letters onto the current letter/"word"
for (String permutation : Words(remaining))
{
// System.out.println(s.substring(i, i+1) + permutation); uncomment when debugging
//add the full length words
tempwords.add(s.substring(i, i+1) + permutation);
// System.out.println(permutation); uncomment when debugging
//add the not-full-length words
tempwords.add(permutation);
}
}
}
// System.out.println(tempwords); uncomment when debugging
return tempwords;
}
I don't know if it is the best solution, but I figured it out (hobbs, the line thing helped a lot, thank you). I found that this works:
public static void main(String[] args) throws FileNotFoundException
{
    while(true)
    {
        words.clear();
        finalwordset.clear();
        String letters = enterLetters();
        words.addAll(Words(letters));
        // Open the file fresh each round: Scanner.reset() only restores the delimiter,
        // locale and radix settings; it does not rewind to the start of the file
        try (Scanner s = new Scanner(new FileReader("C:/Users/Sean/workspace/Imbored/bin/113809of.fic")))
        {
            while(s.hasNextLine()) {
                String line = s.nextLine();
                String finalword = checkWords(line, words);
                if (finalword != null) finalwordset.add(finalword);
            }
        }
        System.out.println(finalwordset);
        System.out.println();
        System.out.println("_________________________________________________________________________");
    }
}
A few things:
The checkWords method checks if the current word from the file is in the generated list of "words"
The enterLetters method takes the letters the user types in and returns them as a string.
The Words method returns a set of strings of all of the possible combinations of the characters in the given string, with each character used up to as many times as it appears in the string and no repeated "words" in the returned set.
finalwordset and words are ArrayLists of strings declared as static fields, since main is static (I would put them in the main method, but I'm lazy and it doesn't matter in this case).
I am very sure there is a better/more efficient way to do this, but this at least works.
Finally, I decided to answer rather than delete the question because I didn't see this answered anywhere else. If it has been, feel free to delete the question or link to the other answer; at this point it is here to help other people.
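The loop above scans the whole file once per round and calls checkWords for every line. A possibly simpler and faster alternative is to load the file into a HashSet once at startup and intersect it with the generated set using retainAll. The sketch below assumes that the .FIC file is plain text with one word per line, as reading it line by line suggests. The names loadWordList and validWords are hypothetical. The temporary file in main only stands in for the real .FIC file, and ISO-8859-1 is chosen because it decodes any byte sequence without throwing, not because the Moby file is known to use it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class WordListLookup {

    // Reads a word list (assumed: one word per line) into a HashSet, lower-cased.
    static Set<String> loadWordList(Path file) throws IOException {
        Set<String> dictionary = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.ISO_8859_1)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                dictionary.add(word);
            }
        }
        return dictionary;
    }

    // Keeps only the generated candidates that appear in the dictionary (set intersection).
    static Set<String> validWords(Set<String> candidates, Set<String> dictionary) {
        Set<String> result = new TreeSet<>(candidates); // TreeSet keeps the output sorted
        result.retainAll(dictionary);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real .FIC file: a temporary file with a few words
        Path file = Files.createTempFile("words", ".fic");
        Files.write(file, Arrays.asList("ab", "ba", "cab", "dog"), StandardCharsets.ISO_8859_1);
        Set<String> dictionary = loadWordList(file); // load once, reuse for every round
        Set<String> candidates = new TreeSet<>(Arrays.asList("a", "ab", "ac", "abc", "ba", "cab", "cba"));
        System.out.println(validWords(candidates, dictionary)); // [ab, ba, cab]
        Files.delete(file);
    }
}
```

Because the dictionary is loaded only once, each new set of letters costs one retainAll call instead of another full pass over the file.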

How can I extract specific terms from string lines in Java?

I have a serious problem extracting terms from each line. To be more specific, I have a file that is supposed to be CSV-formatted but actually is not (all the terms end up in line[0] only).
Here are a few example lines from among thousands:
(split() doesn't work!)
test.csv
"31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
"9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are in the 5th position.
As you can see, there are large gaps between the terms; that's why I say 5th position.
In this case, how can I extract the term in the 5th position of each line?
One more thing: the amount of whitespace between each of the six terms is not always the same. It could be one, two, three, four, five or more spaces.
Because the amount of whitespace varies, I cannot simply use .split().
For example, for the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution to your problem using String.split() and indexOf():
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044   15939353   C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
Option 1: Use String.split() on runs of two or more whitespace characters, as in the code below:
String[] s = str.split("\\s\\s+");
for (String string : s) {
    System.out.println(string);
}
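Option 2: Implement your own split logic by iterating over all the characters. Sample code below (this code is just to give an idea; I did not test it).
public static List<String> getData(String str) {
    List<String> list = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int spaces = 0; // length of the current run of spaces
    for (char c : str.toCharArray()) {
        if (c == ' ') {
            spaces++;
        } else {
            if (spaces >= 2 && current.length() > 0) {
                // Two or more spaces: the previous term has ended
                list.add(current.toString());
                current.setLength(0);
            } else if (spaces == 1 && current.length() > 0) {
                // A single space stays inside the term (e.g. "beta-lipoic acid")
                current.append(' ');
            }
            spaces = 0;
            current.append(c);
        }
    }
    if (current.length() > 0) {
        list.add(current.toString()); // the last term has no separator after it
    }
    return list;
}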
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Pattern.split() splits the line around each run of whitespace
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add a check for the 6th spot in the array produced by the above code and see if it consists only of English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can concatenate the two. This falls apart pretty quickly, though, if you have terms with three or more words. So, something like:
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z\\-]+"); // Can add other characters as needed
String term = terms[4];
// Only join when another field (the formula string) still follows terms[5]
if (terms.length > 6 && possibleEnglishWord.matcher(terms[5]).matches()) {
    term = terms[4] + " " + terms[5]; // e.g. "beta-lipoic" + " " + "acid"
}
Another thing you can try is to replace every group of spaces with a single space, and then remove every token that isn't made up of just English letters and dashes:
line = whitespace.matcher(line).replaceAll(" ");
// Matches any space-free token containing a character that is not a letter or a dash
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\-\\s]\\S*");
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.
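Since only the name field seems to contain internal spaces, another option is a single regular expression that skips the first four fields and the last field and captures whatever lies between them. This sketch assumes that every line has exactly four space-free leading fields and one space-free trailing field (the formula string). A line that breaks that assumption returns null rather than a wrong term. The class and method names are hypothetical. Following the advice above, the Pattern is compiled once, outside any loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameExtractor {

    // Four space-free fields, then the name (lazily captured, may contain spaces),
    // then one space-free trailing field. Compiled once and reused for every line.
    private static final Pattern ROW = Pattern.compile(
            "^\\s*\\S+\\s+\\S+\\s+\\S+\\s+\\S+\\s+(.+?)\\s+\\S+\\s*$");

    // Returns the name field, or null if the line does not have the expected shape.
    static String extractName(String line) {
        Matcher m = ROW.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "31451 CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O ",
                "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O ",
                "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O ");
        for (String line : lines) {
            System.out.println(extractName(line)); // beta-lipoic acid, saponin, Berberine
        }
    }
}
```

Unlike splitting on two or more spaces, this does not care how many spaces separate the fields. It does break if a leading field or the trailing formula field ever contains a space.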

How to print out all permutations of a string in Java

Given a string, I need to print out all permutations of the string. How should I do that? I have tried
for(int i = 0; i<word.length();i++)
{
for(int j='a';j<='z';j++){
word = word.charAt(i)+""+(char)j;
System.out.println(word);
}
}
Is there a good way to do this?
I'm not 100% sure that I understand what you are trying to do. I'm going to go by the original wording of your question and your comment on @ErstwhileIII's answer. Both make me think that it's not really "permutations" (i.e. rearrangements of the letters in the word) that you are looking for, but rather possible single-letter modifications (I'm not sure what a better word for this would be either), like this:
Take a word like "hello" and print a list of all "versions" you can get by adding one "typo" to it:
hello -> aello, bello, cello, ..., zello, hallo, hbllo, hcllo, ..., hzllo, healo, heblo, ...
If that's indeed what you're looking for, the following code will do that for you pretty efficiently:
public void process(String word) {
// Convert word to array of letters
char[] letters = word.toCharArray();
// Run through all positions in the word
for (int pos=0; pos<letters.length; pos++) {
// Run through all letters for the current position
for (char letter='a'; letter<='z'; letter++) {
// Replace the letter
letters[pos] = letter;
// Re-create a string and print it out
System.out.println(new String(letters));
}
// Set the current letter back to what it was
letters[pos] = word.charAt(pos);
}
}
Oh, to print out all permutations of a string, consider your algorithm first. What is the definition of "all permutations"? For example:
String "a" would have the answer: a
String "ab" would have the answer: ab, ba
String "abc" would have the answer: abc, acb, bca, bac, cba, cab
Reflect on the algorithm you would use (write it down in English), then translate it to Java code.
While not the most efficient, a recursive solution might be easiest to use (i.e. for a string of length n, go through each of the characters and follow that with the permutations of the string with that character removed).
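The recursive idea in the previous paragraph can be sketched as follows. The class and method names are illustrative. It returns a List rather than printing directly, so the result is easy to inspect. For input with repeated letters it produces duplicates, which collecting into a Set would remove.

```java
import java.util.ArrayList;
import java.util.List;

public class Permutations {

    // For each position i, use s.charAt(i) as the first character and append
    // every permutation of the remaining characters.
    static List<String> permutations(String s) {
        List<String> result = new ArrayList<>();
        if (s.length() <= 1) {
            result.add(s); // base case: "" or a single character
            return result;
        }
        for (int i = 0; i < s.length(); i++) {
            char first = s.charAt(i);
            String rest = s.substring(0, i) + s.substring(i + 1); // s with character i removed
            for (String tail : permutations(rest)) {
                result.add(first + tail);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(permutations("abc")); // [abc, acb, bac, bca, cab, cba]
    }
}
```

A string of length n has n! permutations, so the output grows very quickly; even 10 distinct letters already give 3,628,800 strings.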
EDIT: OK... you changed your request. Permutations are a whole other story. I think this will help: Generating all permutations of a given string
Not sure what you are trying to do... Example 1 builds the whole alphabet as one string, one letter after another. Example 2 prints the kind of list you gave as an example.
//Example 1
String word = ""; //empty string
for(int i = 65; i <= 90; i++){ //65-90 are the ASCII codes for the capital letters A-Z
    word += (char)i; //cast int to char
}
System.out.println(word);
//Example 2
String list = ""; //a different name, so both examples compile in the same method
for (int i = 65; i <= 90; i++){
    list += (char)i + "rse";
    if(i != 90){ //you don't want this at the end of your sentence, I suppose :)
        list += ", ";
    }
}
System.out.println(list);
