Java match string with x allowed mismatches. - java

What is the fastest / clearest way to check whether a string matches another string of the same length with X allowed mismatches? Is there a library that can do this? It's not in Apache StringUtils (the only similar method there also counts insertions / deletions).
So let's say I have 2 strings of length four and I want to know if they match with 1 mismatch allowed. Insertions and deletions are not allowed.
So:
ABCD <-> ABCD = Match
ABCC <-> ABCD = Match with 1 mismatch
ACCC <-> ABCD = No match, 2 mismatches is too many.

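String str1, str2; // the two strings to compare
Assuming the lengths of the strings are equal:
int counter = 0; // number of mismatching positions
int i = 0;
for (char c : str1.toCharArray())
{
    if (c != str2.charAt(i++))
        counter++;
}
boolean match = counter <= 1; // at most 1 mismatch allowed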

Compare the strings one character at a time. Keep a counter to count the mismatches. When the counter exceeds the limit, return false. If you reach the end of the string, return true.
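The steps above can be sketched as follows. This is a minimal sketch, not a library method: the class name MismatchMatcher and the method name matches are made up for illustration, and strings of different lengths are assumed to never match.

```java
public class MismatchMatcher {

    /**
     * Returns true if a and b have the same length and differ
     * in at most maxMismatches positions (substitutions only).
     */
    public static boolean matches(String a, String b, int maxMismatches) {
        if (a.length() != b.length()) {
            return false; // assumption: unequal lengths never match
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            // Stop as soon as the limit is exceeded
            if (a.charAt(i) != b.charAt(i) && ++mismatches > maxMismatches) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("ABCD", "ABCD", 1)); // true
        System.out.println(matches("ABCC", "ABCD", 1)); // true
        System.out.println(matches("ACCC", "ABCD", 1)); // false
    }
}
```

Returning early means the loop never scans further than needed once the limit is exceeded.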

Try storing the strings in char arrays (char[] charArray = string.toCharArray()).
char[] stringA = firstString.toCharArray();
char[] stringB = secondString.toCharArray();
int ctr = 0; // counts the positions where both strings agree
if (stringA.length == stringB.length) {
    for (int i = 0; i < stringA.length; i++) {
        if (stringA[i] == stringB[i]) {
            ctr++;
        }
    }
}
// do the if-else here using the ctr (the number of mismatches is stringA.length - ctr)

If you want the FASTEST way, you should code it from an existing algorithm like "Approximate Boyer-Moore String Matching" or a suffix tree method...
Look here: https://codereview.stackexchange.com/questions/13383/approximate-string-matching-interview-question
They did the math, you do the code...
Other interesting SO posts are:
Getting the closest string match
Can java.util.regex.Pattern do partial matches?
Generating all permutations of a given string
Similarity Score - Levenshtein

Related

Reducing run time in java

The Java code below produces the correct output, but it takes too long to execute. The code works fine in Eclipse, but it does not pass in an online judge like HackerRank or HackerEarth because it exceeds the time limit. Can someone help me solve this time complexity problem?
I have tried to find a solution, but I wasn't able to improve the performance.
Scanner scan = new Scanner(System.in);
String s = "aab";
String s1 = "";
String s2 = "";
int n1 = 0;
int length = 0;
long n = 882787;
long count = 0;
while (s1.length() < n) {
s1 = s1 + s;
}
if (s1.length() > n) {
count = s1.length() - n;
n1 = (int) count;
}
for (int i = 0; i < s1.length() - n1; i++) {
if (s1.charAt(i) == 'a') {
length += 1;
}
}
System.out.println(length);
Explanation of the above program:
I have a string s of lowercase English letters. I repeat the string until the new string is n characters long (as the code above does) and store the result in a new string.
I have to find the number of occurrences of 'a' in the new string.
How do I actually reduce the time complexity of the above program?
Thanks in advance.
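I would use a regular expression to create a String based on the initial input consisting of only the letter 'a'. Take the length of that String and multiply it by n. This assumes n is the number of times s is repeated; if n is the length of the final string, as in the question's code, multiply by n / s.length() instead and count the remaining characters separately. That is one line that looks like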
System.out.println(s.replaceAll("[^a]+", "").length() * n);
You are going to add s to the string n / s.length() times (ignoring a final partial copy); call this N:
long N = n / s.length(); // n is a long, so N must be a long as well
Each time you add s to the string you are going to append the number of 'a's in s:
int a = 0;
for (int i = 0; i < s.length(); ++i) {
    a += s.charAt(i) == 'a' ? 1 : 0;
}
// Or int a = s.replaceAll("[^a]", "").length();
So multiply these together:
long length = a * N;
String is immutable. Modifying a string in fact creates a new String object, so each concatenation in the loop copies everything built so far.
If you don't want to change your algorithm, I'd suggest using StringBuilder to improve the speed of execution. Note that StringBuilder is not thread-safe.
String s="aab";
int n1 = 0;
StringBuilder sb1 = new StringBuilder();
int length=0;
long n=882787;
long count=0;
while(sb1.length() < n) {
sb1.append(s);
}
if(sb1.length()>n) {
count =sb1.length()-n;
n1=(int)count;
}
for(int i=0;i<sb1.length()- n1;i++) {
if(sb1.charAt(i)=='a') {
length+=1;
}
}
System.out.println(length);
From here
When to use which one :
If a string is going to remain constant throughout the program, then
use String class object because a String object is immutable. If a
string can change (example: lots of logic and operations in the
construction of the string) and will only be accessed from a single
thread, using a StringBuilder is good enough. If a string can change,
and will be accessed from multiple threads, use a StringBuffer because
StringBuffer is synchronized so you have thread-safety.
I see multiple possible optimizations:
a) One pattern that is not that good is creating lots of Strings through repeated string concatenation. Each "s1 = s1 + s;" creates a new instance of String which will be obsolete the next time the statement runs (it increases the load, because the String instances are additional work for the garbage collector).
b) Generally: if you find that your algorithm takes too long, then you should think about a completely new way to solve the issue. So a different solution could be:
- You know the length you want to have (n) and the length of the small string (s1) that you use to create the big string. So you can calculate: How often will the small string be inside the target string? How many characters are left?
==> You can simply check the small string for the character you are looking for. That multiplied by the number how often the small string will be inside the big string is the first result that you get.
==> Now you need to check the substring of the small string that are missing.
Example: n=10, s1="aab", Looking for "a":
So first we check how often the s1 will fit into a new string of n Characters n/length(s1) => 3
So we check how often the "a" is inside "aab" -> 2
First result is 3*2 = 6
But we checked only 3*3 = 9 characters so far, and we want 10 characters. So we need to check the first n % length(s1) = 1 character of s1, and in this substring ("a") we have 1 a, so we have to add 1.
So the result is 7 which we got without building a big string (which is not required at all!)
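The calculation above can be sketched as follows. This is a minimal sketch assuming n is the length of the final string, as in the question's code; the class name RepeatedCharCount and the method name countInPrefix are made up for illustration.

```java
public class RepeatedCharCount {

    /**
     * Counts how often target occurs in the first n characters of
     * s repeated indefinitely, without building the big string.
     */
    public static long countInPrefix(String s, char target, long n) {
        // Occurrences in one copy of s
        long perCopy = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                perCopy++;
            }
        }
        long fullCopies = n / s.length();        // complete copies of s
        int remainder = (int) (n % s.length());  // leftover characters
        long count = fullCopies * perCopy;
        // Occurrences in the partial copy at the end
        for (int i = 0; i < remainder; i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInPrefix("aab", 'a', 10));     // 7
        System.out.println(countInPrefix("aab", 'a', 882787)); // 588525
    }
}
```

The running time depends only on the length of s, not on n.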
Just check how many times the char occurs in the original and multiply it by n (taking n as the number of repetitions). Here's a simple way to do so without even using regex:
// take these as function input or w/e
String s = "aab";
String find = "a";
long n = 882787;
// String.replace treats its argument literally, unlike replaceAll, which takes a regex
int count = s.length() - s.replace(find, "").length();
System.out.println(count * n);

How to tokenize Chinese into individual characters in Java? [duplicate]

I need to split a String into an array of single character Strings.
Eg, splitting "cat" would give the array "c", "a", "t"
"cat".split("(?!^)")
This will produce
array ["c", "a", "t"]
"cat".toCharArray()
But if you need strings
"cat".split("")
Edit: on Java 7 and earlier this returns an empty first element.
String str = "cat";
char[] cArray = str.toCharArray();
If characters beyond Basic Multilingual Plane are expected on input (some CJK characters, new emoji...), approaches such as "a💫b".split("(?!^)") cannot be used, because they break such characters (results into array ["a", "?", "?", "b"]) and something safer has to be used:
"a💫b".codePoints()
.mapToObj(cp -> new String(Character.toChars(cp)))
.toArray(size -> new String[size]);
split("(?!^)") does not work correctly if the string contains surrogate pairs. You should use split("(?<=.)").
String[] splitted = "花ab🌹🌺🌷".split("(?<=.)");
System.out.println(Arrays.toString(splitted));
output:
[花, a, b, 🌹, 🌺, 🌷]
To sum up the other answers...
This works on all Java versions:
"cat".split("(?!^)")
This only works on Java 8 and up:
"cat".split("")
An efficient way of turning a String into an array of one-character Strings would be to do this:
String[] res = new String[str.length()];
for (int i = 0; i < str.length(); i++) {
res[i] = Character.toString(str.charAt(i));
}
However, this does not take account of the fact that a char in a String could actually represent half of a Unicode code-point. (If the code-point is not in the BMP.) To deal with that you need to iterate through the code points ... which is more complicated.
This approach will be faster than using String.split(/* clever regex*/), and it will probably be faster than using Java 8+ streams. It is probably faster than this:
String[] res = new String[str.length()];
int i = 0;
for (char ch : str.toCharArray()) {
    res[i++] = Character.toString(ch);
}
because toCharArray has to copy the characters to a new array.
for(int i=0;i<str.length();i++)
{
System.out.println(str.charAt(i));
}
Maybe you can use a for loop that goes through the String content and extracts the characters one by one using the charAt method.
Combined with an ArrayList<String>, for example, you can get your array of individual characters.
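The charAt plus ArrayList<String> idea could look like the sketch below. The class name CharSplitter and the method name toSingleCharStrings are made up for illustration, and the sketch splits per char, so a character outside the Basic Multilingual Plane would still be broken into two surrogate halves.

```java
import java.util.ArrayList;
import java.util.List;

public class CharSplitter {

    /** Splits str into one-character strings (one entry per UTF-16 char). */
    public static String[] toSingleCharStrings(String str) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < str.length(); i++) {
            parts.add(String.valueOf(str.charAt(i)));
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String part : toSingleCharStrings("cat")) {
            System.out.println(part);
        }
    }
}
```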
If the original string contains supplementary Unicode characters, then split() would not work, as it splits these characters into surrogate pairs. To correctly handle these special characters, a code like this works:
String[] chars = new String[stringToSplit.codePointCount(0, stringToSplit.length())];
for (int i = 0, j = 0; i < stringToSplit.length(); j++) {
int cp = stringToSplit.codePointAt(i);
char c[] = Character.toChars(cp);
chars[j] = new String(c);
i += Character.charCount(cp);
}
In my previous answer I mixed up with JavaScript. Here goes an analysis of performance in Java.
I agree with the need for attention on the Unicode Surrogate Pairs in Java String. This breaks the meaning of methods like String.length() or even the functional meaning of Character because it's ultimately a technical object which may not represent one character in human language.
I implemented 4 methods that split a string into list of character-representing strings (Strings corresponding to human meaning of characters). And here's the result of comparison:
A line is a String consisting of 1000 arbitrarily chosen emojis and 1000 ASCII characters (1000 times <emoji><ascii>, total 2000 "characters" in human meaning).
(discarding 256 and 512 measures)
Implementations:
codePoints (java 11 and above)
public static List<String> toCharacterStringListWithCodePoints(String str) {
if (str == null) {
return Collections.emptyList();
}
return str.codePoints()
.mapToObj(Character::toString)
.collect(Collectors.toList());
}
classic
public static List<String> toCharacterStringListWithIfBlock(String str) {
if (str == null) {
return Collections.emptyList();
}
List<String> strings = new ArrayList<>();
char[] charArray = str.toCharArray();
int delta = 1;
for (int i = 0; i < charArray.length; i += delta) {
delta = 1;
if (i < charArray.length - 1 && Character.isSurrogatePair(charArray[i], charArray[i + 1])) {
delta = 2;
strings.add(String.valueOf(new char[]{ charArray[i], charArray[i + 1] }));
} else {
strings.add(Character.toString(charArray[i]));
}
}
return strings;
}
regex
static final Pattern p = Pattern.compile("(?<=.)");
public static List<String> toCharacterStringListWithRegex(String str) {
if (str == null) {
return Collections.emptyList();
}
return Arrays.asList(p.split(str));
}
Annex (RAW DATA):
codePoints;classic;regex;lines
45;44;84;256
14;20;98;512
29;42;91;1024
52;56;99;2048
87;121;174;4096
175;221;375;8192
345;411;839;16384
667;826;1285;32768
1277;1536;2440;65536
2426;2938;4238;131072
We can do this simply by
const string = 'hello';
console.log([...string]); // -> ['h','e','l','l','o']
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax says
Spread syntax (...) allows an iterable such as an array expression or string to be expanded...
So, strings can be quite simply spread into arrays of characters.

Replacing a character in a string from another string with the same char index

I'm trying to search and reveal unknown characters in a string. Both strings are of length 12.
Example:
String s1 = "1x11222xx333";
String s2 = "111122223333";
The program should check for all unknowns in s1 represented by x|X and get the relevant chars in s2 and replace the x|X by the relevant char.
So far my code has replaced only the first x|X with the relevant char from s2 but printed duplicates for the rest of the unknowns with the char for the first x|X.
Here is my code:
String VoucherNumber = "1111x22xx333";
String VoucherRecord = "111122223333";
String testVoucher = null;
char x = 'x'|'X';
System.out.println(VoucherNumber); // including unknowns
//find x|X in the string VoucherNumber
for(int i = 0; i < VoucherNumber.length(); i++){
if (VoucherNumber.charAt(i) == x){
testVoucher = VoucherNumber.replace(VoucherNumber.charAt(i), VoucherRecord.charAt(i));
}
}
System.out.println(testVoucher); //after replacing unknowns
}
}
I am always a fan of using StringBuilders, so here's a solution using that:
private static String replaceUnknownChars(String strWithUnknownChars, String fullStr) {
    StringBuilder sb = new StringBuilder(strWithUnknownChars);
    int index; // must be declared outside the loop condition
    // Math.max is -1 only when neither "x" nor "X" is left in the builder
    while ((index = Math.max(sb.indexOf("x"), sb.indexOf("X"))) != -1) {
        sb.setCharAt(index, fullStr.charAt(index));
    }
    return sb.toString();
}
It's quite straightforward. You create a new string builder. While an x or X can still be found in the string builder (the index is not -1), get the index and call setCharAt.
You are using String.replace(char, char) the wrong way. The doc says:
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
So if the character occurs more than once, every occurrence is replaced with the same value.
You need to "change" only the character at a specific position. The easiest way is to use the char array that you get from String.toCharArray; with it you can use the same logic.
Of course, you can also use String.indexOf to find the index of a specific character.
Note: char c = 'x'|'X'; will not give you the expected result. It performs a bitwise OR, giving a value that is not the one you want.
The OR returns 1 for a bit if either operand has a 1 in that position.
0111 1000 (x)
0101 1000 (X)
OR
0111 1000 (x)
So the result is just 'x', and an uppercase X is never matched. Also, the result of the operation is an int (every numeric operation returns at least an int; you can find more information about that).
You have two solutions here: either use two variables (or an array), or, if you can, use String.toLowerCase and compare only against char c = 'x'.
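The char array approach described above could look like the sketch below. The class name VoucherFiller and the method name fillUnknowns are made up for illustration; both strings are assumed to have the same length.

```java
public class VoucherFiller {

    /**
     * Replaces every 'x' or 'X' in masked with the character
     * at the same index in full.
     */
    public static String fillUnknowns(String masked, String full) {
        if (masked.length() != full.length()) {
            throw new IllegalArgumentException("Strings must have the same length");
        }
        char[] chars = masked.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // Compare against both cases explicitly instead of 'x'|'X'
            if (chars[i] == 'x' || chars[i] == 'X') {
                chars[i] = full.charAt(i);
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(fillUnknowns("1x11222xx333", "111122223333")); // 111122223333
    }
}
```

Because each position is changed individually, an unknown never affects any other position, unlike String.replace(char, char).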

Complex numbers string/array in java?

I want to map binary values to complex numbers, which I am doing using a readily available Complex class. But when I append the complex numbers to a string buffer and convert it to a string, one complex number takes up multiple entries. How can I store 1 number per entry and read it back afterwards? The resulting string looks like this: "2.0+2.0i2.0-2.0i2.0+2.0i2.0-2.0i-2.0+2.0i2.0+2.0i....". Now the character at 0 is '2', the char at 1 is '.' and so on. I need char at 0 to be 2.0+2.0i. Afterwards I should be able to separate the real and imaginary parts of each entry.
StringBuilder symbs = new StringBuilder();
Complex s1 = new Complex(-2,-2);
Complex s2 = new Complex(+2,-2);
Complex s3 = new Complex(+2,+2);
Complex s4 = new Complex(-2,+2);
/////////////////////Symbols to vector ////////////////////
for(int i=0; i< plo.length()-1; i+=2)
{
if(plo.charAt(i)=='1' && plo.charAt(i+1)=='0')
{
symbs.append(s1);
}
else if(plo.charAt(i)=='0' && plo.charAt(i+1)=='1')
{
symbs.append(s2);
}
else if(plo.charAt(i)=='0' || plo.charAt(i+1)=='0')
{
symbs.append(s3);
}
else if(plo.charAt(i)=='1' && plo.charAt(i+1)=='1')
{
symbs.append(s4);
}
}
printComplex(symbs.toString());
"I need char at 0 to be 2.0+2.0i." thats not possible, as a char is one character.
You can append a semicolon after every complex number and then split on it.
Use regex pattern matching, with a pattern similar to this: -?[0-9]+\.[0-9]+[+-][0-9]+\.[0-9]+i
Compile the pattern, pass the string to a Matcher, and check whether the pattern is found in the given string. For each match, extract the value with programmatic logic to get the real and imaginary parts.
Explanation of the regex string
Real part
-?[0-9]+\.[0-9]+ --> an optional minus sign, one or more digits from 0 to 9, followed by . and one or more digits
Imaginary part
[+-][0-9]+\.[0-9]+i --> a sign, one or more digits from 0 to 9, followed by . and one or more digits, and then by i
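The Matcher approach could be sketched as below. This assumes every entry is written as real part, sign, imaginary part, then i, as in the string shown in the question; the class name ComplexStringParser and the method name parse are made up for illustration, and the actual Complex class may format its numbers differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComplexStringParser {

    // Assumed entry format: <real><sign><imaginary>i, e.g. "-2.0+2.0i"
    private static final Pattern ENTRY =
            Pattern.compile("(-?\\d+\\.\\d+)([+-]\\d+\\.\\d+)i");

    /** Returns one {real, imaginary} pair per complex number found in text. */
    public static List<double[]> parse(String text) {
        List<double[]> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            double real = Double.parseDouble(m.group(1));
            // parseDouble accepts a leading '+' or '-'
            double imaginary = Double.parseDouble(m.group(2));
            result.add(new double[] { real, imaginary });
        }
        return result;
    }

    public static void main(String[] args) {
        for (double[] c : parse("2.0+2.0i2.0-2.0i-2.0+2.0i")) {
            System.out.println("real=" + c[0] + " imaginary=" + c[1]);
        }
    }
}
```

If the numbers only need to be read back inside the same program, keeping them in a list of Complex objects instead of a string would avoid the parsing step entirely.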

Run-length decompression

CS student here. I want to write a program that will decompress a string that has been encoded according to a modified form of run-length encoding (which I've already written code for). For instance, if a string contains 'bba10' it would decompress to 'bbaaaaaaaaaa'. How do I get the program to recognize that part of the string ('10') is an integer?
Thanks for reading!
A simple regex will do.
final Matcher m = Pattern.compile("(\\D)(\\d+)").matcher(input);
final StringBuffer b = new StringBuffer();
while (m.find())
m.appendReplacement(b, replicate(m.group(1), Integer.parseInt(m.group(2))));
m.appendTail(b);
where replicate is
String replicate(String s, int count) {
final StringBuilder b = new StringBuilder(count);
for (int i = 0; i < count; i++) b.append(s);
return b.toString();
}
Not sure whether this is one efficient way, but just for reference
for (int i = 0; i < your_string.length(); i++)
    if (your_string.charAt(i) <= '9' && your_string.charAt(i) >= '0') {
        integer_begin_location = i;
        break; // stop at the first digit
    }
I think you can divide the chars into numeric and non-numeric symbols.
When you find a numeric one (>= '0' and <= '9') you look at the next one and choose either to extend your number (current * 10 + new) or to expand your string.
Assuming that the uncompressed data never contains digits: iterate over the string, character by character, until you get a digit. Then continue until you reach a non-digit (or the end of the string). The digits in between can be parsed to an integer, as others already stated:
int count = Integer.parseInt(str.substring(start, end));
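Put together, the scan described above could look like the sketch below. It assumes the uncompressed data never contains digits and that a character without a following number stands for itself once; the class name RunLengthDecoder and the method name decompress are made up for illustration.

```java
public class RunLengthDecoder {

    /** Expands strings such as "bba10" into "bbaaaaaaaaaa". */
    public static String decompress(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char symbol = input.charAt(i++);
            // Collect the run of digits that follows the symbol, if any
            int start = i;
            while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                i++;
            }
            // Assumption: no digits after the symbol means a count of 1
            int count = (i > start) ? Integer.parseInt(input.substring(start, i)) : 1;
            for (int k = 0; k < count; k++) {
                out.append(symbol);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decompress("bba10")); // bbaaaaaaaaaa
    }
}
```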
Here is a working implementation in Python. This also works fine for numbers with 2, 3 or more digits, but it expects every character to be followed by a count.
inputString = "a1b3s22d4a2b22"
inputString = inputString + "\0"  # just appending a null char to flush the last run
charcount = ""
previouschar = ""
outputString = ""
for char in inputString:
    if char.isnumeric():
        charcount = charcount + char
    else:
        if previouschar:
            outputString = outputString + (previouschar * int(charcount))
        charcount = ""
        previouschar = char
print(outputString)  # outputString= abbbssssssssssssssssssssssddddaabbbbbbbbbbbbbbbbbbbbbb
Presuming that you're not asking about the parsing, you can convert a string like "10" into an integer like this:
int i = Integer.parseInt("10");
