Comparing Sentences in Java

I would like to compare two sentences (A and B) so that the program outputs the changes made in sentence B relative to sentence A. For example:
sentence A: It's a lovely day today.
sentence B: It's a very lovely day today, isnt it?
Output: It's a [I:very] lovely day today [C:./,] [I:isnt it?]
where:
I = INSERTED,
C = CHANGED
PS: I haven't started coding yet, since I want to gather some of your ideas on how best to implement this.

I have come up with the code below for this problem.
Conditions not considered:
Items removed from either sentence
A difference in the first character
Duplicated diff items
Please check it and let me know if you have any questions.
public static void main(String[] args) {
String str1 = "It's a lovely day today.";
String str2 = "It's a very lovely day today, isnt it?";
StringBuilder builder = new StringBuilder();
StringBuilder added = new StringBuilder();
StringBuilder changed = new StringBuilder();
for (int i = 0; i < str1.length(); i++)
for (int j = 0; j < str2.length(); j++) {
if (str1.charAt(i) == str2.charAt(j)) {
if (added.length() > 0) {
builder.append("[I:" + added.toString() + "]");
added = new StringBuilder();
}
if (changed.length() > 0) {
builder.append("[C:" + changed.toString() + "]");
changed = new StringBuilder();
}
// skip as there is no difference.
builder.append(str1.charAt(i));
i++;
// check if index -1 chars are equal then there is
// difference start
} else if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
// check if end of line
if ((i + 1 == str1.length())
|| (str1.charAt(i + 1) == str2.charAt(j + 1))) {
changed.append(str1.charAt(i));
changed.append("/");
changed.append(str2.charAt(j));
j++;
// everything is added
if (i + 1 == str1.length()) {
while (j < str2.length() - 1)
added.append(str2.charAt(j++));
}
continue;
}
// Go until next equal found
while (!(str1.charAt(i) == str2.charAt(j))
&& j < str2.length() - 1) {
added.append(str2.charAt(j++));
}
j--;
}
}
if (changed.length() > 0) {
builder.append("[C:" + changed.toString() + "]");
}
if (added.length() > 0) {
builder.append("[I:" + added.toString() + "]");
}
System.out.println(builder.toString());
}
Output
It's a [I:very ]lovely day today[C:./,][I: isnt it]
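Since the question asks for implementation ideas, one common alternative to scanning character by character is a token-level diff based on the longest common subsequence (LCS). This is a swapped-in technique, not the approach used in the answer above. The sketch below assumes: tokens are words (letters, digits, `_` and apostrophes) or single punctuation marks; removed and inserted tokens that fall in the same gap are paired into `[C:old/new]`; and leftovers are marked `[I:...]` or `[D:...]` (the `D` marker for deletions is my addition, the question only defines `I` and `C`). Because tokens are re-joined with single spaces, the original spacing is not preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceDiff {

    // A token is a word (letters, digits, '_' and apostrophes) or a single punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("[\\w']+|[^\\w\\s]");

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    static String diff(String sentenceA, String sentenceB) {
        List<String> a = tokenize(sentenceA);
        List<String> b = tokenize(sentenceB);
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? 1 + lcs[i + 1][j + 1]
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        List<String> removed = new ArrayList<>();
        List<String> inserted = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a.get(i).equals(b.get(j))) {
                // unchanged token: first emit any pending changes, then the token itself
                flush(out, removed, inserted);
                out.add(a.get(i));
                i++;
                j++;
            } else if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
                removed.add(a.get(i++)); // token only in sentence A
            } else {
                inserted.add(b.get(j++)); // token only in sentence B
            }
        }
        flush(out, removed, inserted);
        return String.join(" ", out);
    }

    // Pairs removed/inserted tokens as [C:old/new]; leftovers become [D:...] or [I:...].
    private static void flush(List<String> out, List<String> removed, List<String> inserted) {
        int pairs = Math.min(removed.size(), inserted.size());
        for (int k = 0; k < pairs; k++) {
            out.add("[C:" + removed.get(k) + "/" + inserted.get(k) + "]");
        }
        if (removed.size() > pairs) {
            out.add("[D:" + String.join(" ", removed.subList(pairs, removed.size())) + "]");
        }
        if (inserted.size() > pairs) {
            out.add("[I:" + String.join(" ", inserted.subList(pairs, inserted.size())) + "]");
        }
        removed.clear();
        inserted.clear();
    }
}
```

For the sentences in the question, `SentenceDiff.diff("It's a lovely day today.", "It's a very lovely day today, isnt it?")` returns `It's a [I:very] lovely day today [C:./,] [I:isnt it ?]`. Removals and a changed first word are handled without special cases, which covers two of the conditions listed as not considered.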

Related

Implementing substring checking method using only charAt

I am trying to implement a substring method using only the charAt method of the String class.
The problem occurs when the search term includes the last character of the line, as with 'hat.'; otherwise everything works perfectly.
Also, when searching for 'hat', for example, the charAt(j) trace prints only 'h' with index 0 for every comparison, yet the method still reports an occurrence.
Here is the complete code:
public class SubString {
public static void main(String[] args) {
String line = "The cat in the hat.";
String item = "hat.";
System.out.println("'" + item + "' is substring of '" + line + "' : " + isSubString(item, line));
}
private static boolean isSubString(String item, String line) {
int i = 0;
int j = 0;
int count = 0;
for (i = 0; i < line.length() - item.length(); i++) {
for (j = 0; j < item.length(); j++) {
if (item.charAt(j) != line.charAt(i + j)) {
break;
}
if (item.charAt(j) == line.charAt(i + j)) {
System.out.println(item.charAt(j) + ":" + j + " - " + line.charAt(i + j) + ":" + (i + j));
count++;
}
if (count == item.length())
return true;
}
}
return false;
}
}
Again, the problem occurs when searching for 'hat.' (the last word, including the dot).
For 'hat', the method returns true, but the trace shows the wrong characters (only h's are compared) and the item index j is always 0.
The outer loop never tries the last possible start position, because its condition is i < line.length() - item.length().
Replace it with the following loop condition:
for (i = 0; i < line.length() - item.length() + 1; i++) {
You could also simply use:
line.contains(item)
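The loop-bound fix above lets 'hat.' be found. A second problem, which explains the odd trace for 'hat', is that count is never reset between start positions: the 'h' matches at different positions of line add up until count happens to reach item.length(). A minimal corrected sketch, still using only charAt (the class name CharAtSubstring is just for illustration):

```java
class CharAtSubstring {

    // Returns true if item occurs in line, comparing characters only via charAt.
    static boolean isSubString(String item, String line) {
        // the last valid start index is line.length() - item.length(), so use <=
        for (int i = 0; i <= line.length() - item.length(); i++) {
            int count = 0; // reset for every start position
            while (count < item.length() && item.charAt(count) == line.charAt(i + count)) {
                count++;
            }
            if (count == item.length()) {
                return true;
            }
        }
        return false;
    }
}
```

With this version, "hat." and "hat" are both found in "The cat in the hat.", while a term such as "hhh" is no longer reported as a match just because the line contains three h's.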

Finding substring in a string in Java

I am writing a program to find substring in the string in Java without using any Java Library.
I have written a function subString(String str1, String str2), shown below.
It works for the following inputs:
str1="rahul"
str2="My name is rahul"
str1="rahul"
str2="rahul sah"
str1="rahul"
str2="sah rahul"
The problem occurs when I give input as:
str1="rahul"
str2="rararahul"
str1="rahul"
str2="My name is sunil"
It goes into an infinite loop. Can anyone look at my code snippet and help me out?
public static boolean subString(String str1, String str2) {
boolean found = false;
int len1 = str1.length();
int len2 = str2.length();
int status = 0;
char[] arr1 = new char[len1];
char[] arr2 = new char[len2];
for (int ii = 0; ii < len1; ii++) {
arr1[ii] = str1.charAt(ii);
}
for (int jj = 0; jj < len2; jj++) {
arr2[jj] = str2.charAt(jj);
}
for (int ii = 0; ii < len1; ii++) {
for (int jj = 0; jj < len2; jj++) {
if (arr1[ii] == arr2[jj]) {
if (ii < len1 - 1) {
System.out.println("Found1::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
found = true;
ii++;
} else if (arr1[ii] == arr2[jj] && ii == len1 - 1) {
System.out.println("Found2::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
found = true;
break;
}
} else if (found == false && arr1[ii] != arr2[jj]) {
System.out.println("Found3::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
found = false;
} else if (found == true && arr1[ii] != arr2[jj]) {
System.out.println("Found4::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
found = false;
ii = 0;
}
}
}
return found;
}
}
Others have suggested using String.contains(), which is java.lang code rather than a separate Java library. However, you obviously want to explore how you could do this yourself. One way to do that is to look at the OpenJDK 7 source code for String.contains(), which under the covers uses String.indexOf(). You can see the (fairly basic) algorithm it uses there.
Problem with your code
Interestingly, your code works for "rahul" and "rararahul" when I paste it into my dev environment. The infinite loop on non-matching input does exist, though. It will occur for any str2 that contains any of the characters of str1. This is because once you find a match of a character of str1 within str2 and the next character fails to match, you reset ii to 0 and start again. Your debug output is actually enough to see this, if you follow the sequence it goes through for each string.
Possible fix
If you want to pursue your own approach and learn from it, consider stopping and doing a little design on paper first. You're looking for an occurrence of str1 in str2, so you probably want to swap your loops around. Then you can be more efficient: go through the longer string (str2) character by character in the outer loop, and only enter the inner loop if the first character of the shorter string (str1) matches the character you're dealing with in str2.
e.g. for the loop bit of your code
boolean retFound = false;
for (int jj = 0; jj + len1 <= len2; jj++) { // stop once str1 can no longer fit, so arr2[jj + ii] stays in bounds
if (arr1[0] == arr2[jj]) {
boolean tempFound = true;
int foundIndex = jj;
for (int ii = 0; ii < len1; ii++) {
if (arr1[ii] != arr2[jj+ii]) {
tempFound = false;
break;
}
}
if (tempFound) {
System.out.println("Found substring " + str1 + " in " + str2 + " at index " + foundIndex);
System.out.println("Carrying on to look for further matches...");
tempFound = false;
retFound = true;
}
}
}
return retFound;
Note that this won't be fast, but it should work; I've tested it on all the sample strings you provided. You also get a bonus: it finds multiple matches. If you don't want that (you just want true or false), return or break out at the point where it prints "Carrying on to look for...".
As others have said, if you want to continue with your original code, certainly don't try to change loop variables (i.e. ii) within the inner loop. That's bad practice, hard to read and prone to lots of bugs.
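If the "multiple matches" bonus is wanted as data rather than printed lines, the same outer-jj/inner-ii scan can collect the start indices instead. This is a minimal sketch; the names SubstringMatches and findAll are hypothetical, and overlapping matches are included.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringMatches {

    // Returns every index at which needle starts inside haystack (overlapping matches included).
    static List<Integer> findAll(String needle, String haystack) {
        List<Integer> starts = new ArrayList<>();
        // stop once the remaining part of haystack is shorter than needle
        for (int jj = 0; jj + needle.length() <= haystack.length(); jj++) {
            int ii = 0;
            // advance ii only while the characters keep matching
            while (ii < needle.length() && needle.charAt(ii) == haystack.charAt(jj + ii)) {
                ii++;
            }
            if (ii == needle.length()) {
                starts.add(jj);
            }
        }
        return starts;
    }
}
```

For example, findAll("rahul", "rararahul") gives [4] and findAll("ra", "rararahul") gives [0, 2, 4]; a true/false check is simply !findAll(...).isEmpty().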
In the block starting with
} else if (found == true && arr1[ii] != arr2[jj]) {
you set ii back to zero. That's why ii never becomes greater than or equal to len1, and the loop never ends.
You need to make jj the outer loop and track ii inside it:
int ii = 0;
for (int jj = 0; jj < len2; jj++) {
if (arr1[ii] == arr2[jj]) {
if (ii < len1 - 1) {
System.out.println("Found1::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
// partial match so far; found is set only once the whole of str1 has matched
ii++;
} else {
System.out.println("Found2::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
found = true;
break;
}
} else if (ii == 0) {
System.out.println("Found3::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
} else {
System.out.println("Found4::" + "arr1::" + arr1[ii]
+ "and arr2::" + arr2[jj]);
// a partial match failed: after jj++, scanning resumes one position after where it began
jj = jj - ii;
ii = 0;
}
}
EDIT:
You were also restarting the inner loop over the whole larger string for every character of the smaller one. You don't need two loops at all; I have changed the code accordingly. This should work.
You can use a single loop with a matching condition: the search begins when the first character of the small string is found in the full string, and the following characters are then matched one by one. Here is an example to explain.
public static boolean subString2(String smallString, String fullString)
{
int k = 0;
for (int i = 0; i < fullString.length(); i++)
{
System.out.println("fullStringCharArray[i]: " + fullString.charAt(i));
if (smallString.charAt(k) == fullString.charAt(i))
{
System.out.println("Found: " + smallString.charAt(k));
k++;
if (k == smallString.length())
return true;
}
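else
{
// restart one position after where this partial match began
// (the loop's i++ moves i from i - k to i - k + 1)
i = i - k;
k = 0;
}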
}
return false;
}
Here is what happens: we search through fullString. The first character of smallString ("rahul") is 'r', and until an 'r' is found, the rest of the string ("ahul") is not matched. Once 'r' matches, the loop looks for 'a', then 'h', and so on. When a character does not match, k is reset to 0 and matching starts over from the first character of smallString. If the number of consecutive matches (k) equals the length of smallString, the substring exists. I hope I have explained this properly.
Use this code; it is short and clear:
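public static boolean subString(String str1, String str2) {
int str1Len = str1 == null ? 0 : str1.length();
int str2Len = str2 == null ? 0 : str2.length();
if (str1Len == 0) {
return false; // nothing to search for (null or empty str1)
}
// only try start positions where str1 still fits inside str2
for (int i = 0; i + str1Len <= str2Len; i++) {
if (str1.charAt(0) == str2.charAt(i)) {
int count = 0;
for (int j = 0; j < str1Len; j++) {
// compare against str2 at i + j instead of advancing i itself
if (str1.charAt(j) != str2.charAt(i + j)) {
break;
}
count++;
}
if (count == str1Len) {
return true;
}
}
}
return false;
}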
import java.util.Scanner;
public class Main {
public static void main(String[] args) {
Scanner sc=new Scanner(System.in);
System.out.println("Enter the string");
String str=sc.nextLine();
// characters from index start (inclusive) to end (exclusive)
String Str1="";
System.out.println("Enter the numbers");
int start=sc.nextInt();
int end=sc.nextInt();
for (int i = start; i < end; i++)
Str1 += String.valueOf(str.charAt(i));
System.out.println(Str1);
}
}

Check if char is in () block

I would like to check if a semicolon (;) is in the brackets of an AND or OR block within a string.
For example:
IF(AND(ROUND($GX18-SUM(0)/$M$12;2)<=0;$AK$7=1);0;OR(1;A2)+O2)
If it's not within an AND or OR then I replace it with #:
IF(AND(ROUND($GX18-SUM(0)/$M$12;2)<=0;$AK$7=1)#0#OR(1;A2)+O2)
I know how to do the substitution, but how do I detect whether the ; is inside such a block?
UPDATE
Using a regex seems quite complex. To break the problem down:
How can I detect whether a certain char (;) is within an AND(...) or OR(...)? This would help me a lot!
I hope the following Java code helps to resolve your problem:
String str = "IF(AND(ROUND($GX18-SUM(0)/$M$12;2)<=0;$AK$7=1);0;OR(1;A2)+O2)";
char[] ch = str.toCharArray();
int count = 0;
String temp = "";
for (int i = 0; i < ch.length; i++) {
temp = temp + ch[i];
if ("AND(".equals(temp) || "OR(".equals(temp)) {
count++;
}
if ("(".equals(temp) && count > 0) {
count++;
}
if (")".equals(temp) && count > 0) {
count--;
}
if (";".equals(temp) && count == 0) {
ch[i] = '#';
}
if ((!"AND(".startsWith(temp) && !"OR(".startsWith(temp)) || temp.length() > 4) {
temp = "";
}
}
System.out.println("Expected Data >> " + String.valueOf(ch));
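The temp-buffer approach above can miss a ')' or ';' that directly follows an 'A' or 'O', because that character is appended to the partial keyword buffer instead of being tested on its own; for example, in OR(1;A);2 the closing parenthesis is never counted, so the final ';' is not replaced. A sketch of a different technique, a stack of open parentheses, avoids that. It assumes function names are upper-case letters only, that ';' inside AND/OR is kept at any nesting depth (as in the expected output), and that the formula contains no string literals with ';' or parentheses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SemicolonMarker {

    // Replaces every ';' that is not nested (at any depth) inside AND(...) or OR(...) with '#'.
    static String markTopLevelSemicolons(String formula) {
        char[] out = formula.toCharArray();
        Deque<Boolean> parens = new ArrayDeque<>(); // one entry per open '(': true if it opened AND/OR
        int andOrDepth = 0; // number of AND/OR calls enclosing the current position
        for (int i = 0; i < out.length; i++) {
            char c = out[i];
            if (c == '(') {
                // read the function name directly before '(' (letters only)
                int start = i;
                while (start > 0 && Character.isLetter(formula.charAt(start - 1))) {
                    start--;
                }
                String name = formula.substring(start, i);
                boolean isAndOr = name.equals("AND") || name.equals("OR");
                parens.push(isAndOr);
                if (isAndOr) {
                    andOrDepth++;
                }
            } else if (c == ')') {
                if (!parens.isEmpty() && parens.pop()) {
                    andOrDepth--;
                }
            } else if (c == ';' && andOrDepth == 0) {
                out[i] = '#';
            }
        }
        return new String(out);
    }
}
```

Reading the whole name before '(' also means that a function such as XOR( is not mistaken for OR(.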

Compression algorithm in java

My goal is to write a program that compresses a string, for example:
input: hellooopppppp!
output:he2l3o6p!
Here is the code I have so far, but there are errors.
When I have the input: hellooo
my code outputs: hel2l3o
instead of: he2l3o
the 2 is being printed in the wrong spot, but I cannot figure out how to fix this.
Also, with an input of: hello
my code outputs: hel2l
instead of: he2lo
It skips the last letter altogether in this case, and the 2 is also in the wrong place, the same error as in my first example.
Any help is much appreciated. Thanks so much!
public class compressionTime
{
public static void main(String [] args)
{
System.out.println ("Enter a string");
//read in user input
String userString = IO.readString();
//store length of string
int length = userString.length();
System.out.println(length);
int count;
String result = "";
for (int i=1; i<=length; i++)
{
char a = userString.charAt(i-1);
count = 1;
if (i-2 >= 0)
{
while (i<=length && userString.charAt(i-1) == userString.charAt(i-2))
{
count++;
i++;
}
System.out.print(count);
}
if (count==1)
result = result.concat(Character.toString(a));
else
result = result.concat(Integer.toString(count).concat(Character.toString(a)));
}
IO.outputStringAnswer(result);
}
}
I would
count from 0 as that is how indexes work in Java. Your code will be simpler.
would compare the current char to the next one. This will avoid printing the first character.
wouldn't compress ll as 2l as it is no smaller. Only sequences of at least 3 will help.
try to detect if a number 3 to 9 has been used and at least print an error.
use the debugger to step through the code to understand what it is doing and why it doesn't do what you think it should.
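The suggestions above (index from 0, compare the current character with the next one, only compress runs of at least three) can be combined into a sketch like this one. Treating "a number 3 to 9 has been used" as "the input contains digits" is my reading of the last suggestion, and throwing IllegalArgumentException is an assumed error policy; the class name is hypothetical.

```java
class RunLengthSketch {

    // Compresses runs of three or more identical characters as <count><char>.
    // Shorter runs are copied unchanged, since "2l" is no shorter than "ll".
    static String compress(String input) {
        for (int k = 0; k < input.length(); k++) {
            if (Character.isDigit(input.charAt(k))) {
                // digits in the input would make the counts ambiguous, so refuse them
                throw new IllegalArgumentException("input must not contain digits: " + input);
            }
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            // look ahead at the next characters rather than back at the previous one
            while (i + run < input.length() && input.charAt(i + run) == c) {
                run++;
            }
            if (run >= 3) {
                out.append(run).append(c);
            } else {
                for (int r = 0; r < run; r++) {
                    out.append(c);
                }
            }
            i += run;
        }
        return out.toString();
    }
}
```

Note that with the three-or-more rule, hellooopppppp! becomes hell3o6p! rather than the he2l3o6p! the question asks for.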
I am doing it this way. Very simple:
public static void compressString (String string) {
StringBuffer stringBuffer = new StringBuffer();
for (int i = 0; i < string.length(); i++) {
int count = 1;
while (i + 1 < string.length()
&& string.charAt(i) == string.charAt(i + 1)) {
count++;
i++;
}
if (count > 1) {
stringBuffer.append(count);
}
stringBuffer.append(string.charAt(i));
}
System.out.println("Compressed string: " + stringBuffer);
}
You can accomplish this using nested for loops, doing something similar to:
count = 0;
String results = "";
for(int i=0;i<userString.length();){
char begin = userString.charAt(i);
//System.out.println("begin is: "+begin);
for(int j=i+1; j<userString.length();j++){
char next = userString.charAt(j);
//System.out.println("next is: "+next);
if(begin == next){
count++;
}
else{
System.out.println("Breaking");
break;
}
}
i+= count+1;
if(count>0){
String add = begin + "";
int tempcount = count +1;
results+= tempcount + add;
}
else{
results+= begin;
}
count=0;
}
System.out.println(results);
I tested this output with Hello and the result was He2lo
I also tested it with hellooopppppp; the result was he2l3o6p.
If you don't understand how this works, you should learn regular expressions.
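import java.util.regex.Matcher;
import java.util.regex.Pattern;
public String rleEncodeString(String in) {
StringBuilder out = new StringBuilder();
// (.) captures any character (\w would silently drop punctuation such as '!'); \2* matches its repeats
Pattern p = Pattern.compile("((.)\\2*)");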
Matcher m = p.matcher(in);
while(m.find()) {
if(m.group(1).length() > 1) {
out.append(m.group(1).length());
}
out.append(m.group(2));
}
return out.toString();
}
Try something like this:
import java.util.Scanner;
public static void main(String[] args) {
System.out.println("Enter a string:");
Scanner IO = new Scanner(System.in);
// read in user input
String userString = IO.nextLine();
int length = userString.length();
int count = 0;
String result = "";
char new_char;
for (int i = 0; i < length; i++) {
new_char = userString.charAt(i);
count++;
// a run ends at the last character or when the next character differs
if (i + 1 == length || new_char != userString.charAt(i + 1)) {
if (count != 1) {
result = result.concat(Integer.toString(count));
}
result = result.concat(Character.toString(new_char));
count = 0;
}
}
System.out.println(result);
}
The problem is that your code checks if the previous letter, not the next, is the same as the current.
Your for loop basically goes through each letter in the string, and if it is the same as the previous letter, it figures out how many of that letter there are and puts that number into the result string. However, for a word like "hello", it will check 'e' and 'l' (and notice that they are preceded by 'h' and 'e', respectively) and conclude that there is no repeat. It will then get to the second 'l' and see that it is the same as the previous letter. It will put '2' in the result, but too late, resulting in "hel2l" instead of "he2lo".
To clean up and fix your code, I recommend the following to replace your for loop:
int count = 1;
String result = "";
for(int i=0;i<length;i++) {
if(i < userString.length()-1 && userString.charAt(i) == userString.charAt(i+1))
count++;
else {
if(count == 1)
result += userString.charAt(i);
else {
result = result + count + userString.charAt(i);
count = 1;
}
}
}
Comment if you need me to explain some of the changes. Some are necessary, others optional.
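Here is a solution using a LinkedHashSet and a HashMap, aimed at better time complexity. Note that it counts the total number of occurrences of each character rather than consecutive runs, so it only matches run-length output when each character's occurrences are adjacent (for example, "abab" becomes "2a2b"):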
public static void compressString (String string) {
LinkedHashSet<String> charMap = new LinkedHashSet<String>();
HashMap<String, Integer> countMap = new HashMap<String, Integer>();
int count;
String key;
for (int i = 0; i < string.length(); i++) {
key = new String(string.charAt(i) + "");
charMap.add(key);
if(countMap.containsKey(key)) {
count = countMap.get(key);
countMap.put(key, count + 1);
}
else {
countMap.put(key, 1);
}
}
Iterator<String> iterator = charMap.iterator();
String resultStr = "";
while (iterator.hasNext()) {
key = iterator.next();
count = countMap.get(key);
if(count > 1) {
resultStr = resultStr + count + key;
}
else{
resultStr = resultStr + key;
}
}
System.out.println(resultStr);
}

Longest common substring with an exclude list of strings

I am using this algorithm to find the longest common substring of two strings. Please help me do the same, but with an array of common substrings of these strings that my function should ignore.
My code in Java:
public static String longestSubstring(String str1, String str2) {
StringBuilder sb = new StringBuilder();
if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
return "";
}
// java initializes them already with 0
int[][] num = new int[str1.length()][str2.length()];
int maxlen = 0;
int lastSubsBegin = 0;
for (int i = 0; i < str1.length(); i++) {
for (int j = 0; j < str2.length(); j++) {
if (str1.charAt(i) == str2.charAt(j)) {
if ((i == 0) || (j == 0)) {
num[i][j] = 1;
} else {
num[i][j] = 1 + num[i - 1][j - 1];
}
if (num[i][j] > maxlen) {
maxlen = num[i][j];
// generate substring from str1 => i
int thisSubsBegin = i - num[i][j] + 1;
if (lastSubsBegin == thisSubsBegin) {
//if the current LCS is the same as the last time this block ran
sb.append(str1.charAt(i));
} else {
//this block resets the string builder if a different LCS is found
lastSubsBegin = thisSubsBegin;
sb = new StringBuilder();
sb.append(str1.substring(lastSubsBegin, i + 1));
}
}
}
}
}
return sb.toString();
}
So my function should look like this:
public static String longestSubstring(String str1, String str2, String[] ignore)
Create a suffix tree of one of your strings and run through the second to see which substring can be found in the suffix tree.
Info on suffix trees: http://en.wikipedia.org/wiki/Suffixtree
As far as I understand, you have to ignore those substrings that contain at least one string from ignore.
if (str1.charAt(i) == str2.charAt(j)) {
if ((i == 0) || (j == 0)) {
num[i][j] = 1;
} else {
num[i][j] = 1 + num[i - 1][j - 1];
}
// rebuild `sb` on every step so that it can be compared with `ignore`
// (sb is shared by all (i, j) cells, so appending to it would mix up different cells)
int thisSubsBegin = i - num[i][j] + 1;
sb = new StringBuilder(str1.substring(thisSubsBegin, i + 1));
// check whether current substring contains any string from `ignore`,
// and if it does, find the longest one
int biggestIndex = -1;
for (String s : ignore) {
int startIndex = sb.lastIndexOf(s);
if (startIndex > biggestIndex) {
biggestIndex = startIndex;
}
}
// then sb.substring(biggestIndex + 1) contains none of the strings to be ignored
sb = new StringBuilder(sb.substring(biggestIndex + 1));
num[i][j] -= (biggestIndex + 1);
if (num[i][j] > maxlen) {
maxlen = num[i][j];
// remember sb.toString() here as the best substring found so far
}
}
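A self-contained sketch of the same trimming idea, assuming "ignore" means that any candidate containing an ignored string is rejected: each DP cell stores the length of the longest valid common suffix, cut just past the start of the last ignored occurrence. Skipping empty strings in ignore is my assumption (lastIndexOf("") would otherwise cut everything), and the class name is hypothetical.

```java
class LongestSubstringExcluding {

    // Longest common substring of str1 and str2 that contains none of the strings in ignore.
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str2 == null || str1.isEmpty() || str2.isEmpty()) {
            return "";
        }
        // num[i][j] = length of the longest valid common suffix of str1[0..i] and str2[0..j]
        int[][] num = new int[str1.length()][str2.length()];
        int maxLen = 0;
        int bestEnd = -1; // index in str1 where the best substring ends
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue; // num[i][j] stays 0
                }
                int len = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                String candidate = str1.substring(i - len + 1, i + 1);
                // drop everything up to and including the first char of the last ignored occurrence
                int cut = -1;
                for (String s : ignore) {
                    if (!s.isEmpty()) {
                        cut = Math.max(cut, candidate.lastIndexOf(s));
                    }
                }
                len -= cut + 1;
                num[i][j] = len;
                if (len > maxLen) {
                    maxLen = len;
                    bestEnd = i;
                }
            }
        }
        return maxLen == 0 ? "" : str1.substring(bestEnd - maxLen + 1, bestEnd + 1);
    }
}
```

For example, with str1 = "abcdef" and str2 = "zabcdez", an empty ignore list gives "abcde", ignoring "cd" gives "abc", and ignoring "abc" gives "bcde".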
If you only have to ignore substrings that are exactly equal to a string in ignore,
then, whenever a candidate for the longest common substring is found, iterate over ignore and check whether the current substring is in it.
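Checking only the longest candidate at each cell can miss a valid shorter one: with str1 = str2 = "abc" and ignore = {"ab", "abc"}, the valid answer "bc" is never the longest common suffix at any cell that gets checked. Since every common substring ending at (i, j) is a suffix of the longest one there, a sketch of the exact-match variant can try those suffixes from longest to shortest. The HashSet lookup and the class name are my choices, not part of the answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LongestSubstringExact {

    // Longest common substring of str1 and str2 that is not exactly equal to any string in ignore.
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str2 == null || str1.isEmpty() || str2.isEmpty()) {
            return "";
        }
        Set<String> ignored = new HashSet<>(Arrays.asList(ignore));
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue;
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];
                // try suffixes ending here from longest to shortest; keep the first non-ignored one
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!ignored.contains(candidate)) {
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }
}
```

The inner loop stops as soon as len is no longer than the best result so far, so only suffixes that could improve the answer are checked.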
