Strings s1 and s2 will always be of length 1 or higher.
How can I speed this up?
int l1 = s1.length();
if (l1 > 3) { l1 = 3; }
if (s2.startsWith(s1.substring(0,l1)))
{
// do something..
}
Regex maybe?
Rewrite to avoid object creation
Your instincts were correct. The creation of new objects (via substring()) is not very fast, and each object created incurs garbage-collection overhead as well.
This might be a lot faster:
static boolean fastCmp(String s1, String s2) {
return s1.regionMatches(0, s2, 0, 3);
}
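Note that regionMatches(0, s2, 0, 3) simply returns false whenever either string is shorter than 3 characters. If you need the same "first min(3, length) characters" behaviour as the original snippet, a bounds-aware variant might look like this (a sketch, not part of the original answer):
static boolean fastCmp(String s1, String s2) {
    // Cap the length at 3, but never beyond the end of s1; regionMatches itself
    // returns false if s2 is shorter than len, which mirrors startsWith.
    int len = Math.min(3, s1.length());
    return s1.regionMatches(0, s2, 0, len);
}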
This seems pretty reasonable. Is this really too slow for you? You sure it's not premature optimization?
if (s2.startsWith(s1.substring(0, Math.min(3, s1.length())))) { .. }
Btw, there is nothing slow about it; startsWith has complexity O(n).
Another option is to compare the char values, which might be more efficient:
boolean match = true;
for (int i = 0; i < Math.min(Math.min(s1.length(), 3), s2.length()); i++) {
    if (s1.charAt(i) != s2.charAt(i)) {
        match = false;
        break;
    }
}
My Java isn't that good, so I'll give you an answer in C#:
int len = Math.Min(s1.Length, Math.Min(s2.Length, 3));
for (int i = 0; i < len; ++i)
{
    if (s1[i] != s2[i])
        return false;
}
return true;
Note that unlike yours and Bozho's, this does not create a new string, which would be the slowest part of your algorithm.
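Since the question is about Java, a direct translation of the same idea might look like this (a sketch; the method name is mine):
static boolean prefixMatch(String s1, String s2) {
    // Compare at most the first 3 characters without creating any new objects.
    int len = Math.min(s1.length(), Math.min(s2.length(), 3));
    for (int i = 0; i < len; i++) {
        if (s1.charAt(i) != s2.charAt(i)) {
            return false;
        }
    }
    return true;
}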
Perhaps you could do this
if (s1.length() >= 3 && s2.length() >= 3 && s1.indexOf(s2.substring(0, 3)) == 0)
{
    // do something..
}
There is context missing here:
What are you trying to scan for? What type of application? How often is it expected to run?
These things are important because different scenarios call for different solutions:
If this is a one-time scan then this is probably unneeded optimization. Even for a 20MB text file, it wouldn't take more than a couple of minutes in the worst case.
If you have a set of inputs and for each of them you're scanning all the words in a 20MB file, it might be better to sort/index the 20MB file to make it easy to look up matches and skip the 99% of unnecessary comparisons. Also, if inputs tend to repeat themselves it might make sense to employ caching.
Other solutions might also be relevant, depending on the actual problem.
But if you boil it down only to comparing the first 3 characters of two strings, I believe the code snippets given here are as good as you're going to get - they're all O(1)*, so there's no drastic optimization you can do.
*The only place where this might not hold true is if getting the length of the string is O(n) rather than O(1) (which is the case for the strlen function in C++), which is not the case for Java and C# string objects.
I developed this code to check whether a string made up of parentheses is balanced. For instance:
Not balanced: ["(", ")", "((", "))", "()(", ")(", ...].
Balanced: ["()", "()()", "(())", "(())()", ...]
I want to use a stream instead of a for loop. There is a conditional statement inside the loop that checks when the counter variable is less than zero. I am unclear how to include this variable in a stream.
I would welcome feedback on this. Please see my code below:
public String findBalance(String input) {
    String[] arr = input.split("");
    Integer counter = 0;
    for (String s : arr) {
        counter += s.equals("(") ? 1 : -1;
        if (counter < 0) break;
    }
    return counter == 0 ? "Balanced" : "Not Balanced";
}
Thanks in advance.
Streaming isn't a good fit. Ask yourself what would happen if you used parallelStream(). How would you handle a simple edge case?
)(
You want to detect when the count dips below 0, even if it later goes back up. It's quite difficult to do that with a parallel stream. The string is best processed sequentially.
Stream operations work best when they are independent and stateless. Stateful actions are better suited for a regular for loop like you already have.
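To illustrate the awkwardness, here is one way the check could be forced into a sequential stream (a sketch only; the mutable int array smuggles state into the lambda, which is exactly what streams discourage):
public String findBalance(String input) {
    // state[0] = running counter, state[1] = 1 once the counter has dipped below zero
    int[] state = {0, 0};
    input.chars().forEachOrdered(c -> {
        if (state[1] == 1) {
            return; // already failed; ignore the rest
        }
        state[0] += (c == '(') ? 1 : -1;
        if (state[0] < 0) {
            state[1] = 1;
        }
    });
    return (state[1] == 0 && state[0] == 0) ? "Balanced" : "Not Balanced";
}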
Let's see whether a Stream could fit.
Processing sequentially, character by character, one would need a counter to track the nesting depth; a reduction would be feasible. However, the code is not much better: it is more verbose and slower. Exploiting parallelism would require handling partial fragments such as "...(" and ")..." when combining results. Possible, but not nice.
Let's look at a more parallelism-friendly approach: repeatedly removing the innermost reducible expressions (the redex "()"):
public String findBalance(String input) {
    String s = input.replaceAll("[^()]", ""); // keep only parentheses
    for (;;) {
        String t = s.replace("()", ""); // remove every adjacent "()" pair
        if (t.length() == s.length()) {
            break; // nothing left to reduce
        }
        s = t;
    }
    return s.isEmpty() ? "Balanced" : "Not Balanced";
}
Though divide-and-conquer parallelism is feasible here, the problem is the repeated passes needed for nested input like "(())".
For a Stream one might think of flatMap or the like. However, Stream processing steps do not repeat (in the current Java version), so this is not easily expressible. So a Stream solution is not a good fit.
One could combine the reduction above with a Stream that splits the passed string, in order to use a parallel Stream, but even with 16 cores and strings of length 1000 that will not be considerably faster.
I want to read each line of input, store the numbers in an int[] array, perform some calculations, then move on to the next line of input as fast as possible.
Input (stdin)
2 4 8
15 10 5
12 14 3999 -284 -71
0 -213 18 4 2
0
This is a pure optimization problem and not entirely good practice in the real world, as I'm assuming perfect input. I'm interested in how to improve my current method for taking input from stdin and representing it as an integer array. I have seen methods using Scanner and its nextInt method; however, I've read in multiple places that Scanner is a lot slower than BufferedReader.
Can this taking in of input step be improved?
Current Method
BufferedReader bufferedInput = new BufferedReader(new InputStreamReader(System.in));
String line;
String[] lineArray;
try {
    // a line with just "0" indicates end of std input
    while ((line = bufferedInput.readLine()) != null && !line.equals("0")) {
        lineArray = line.split("\\s+"); // is "\\s+" the optimal regex here?
        int arrlength = lineArray.length;
        int[] lineInt = new int[arrlength];
        for (int i = 0; i < arrlength; i++) {
            lineInt[i] = Integer.parseInt(lineArray[i]);
        }
        // Perform some operations on lineInt, then regenerate a new
        // lineInt with inputs from the next line of stdin
    }
} catch (IOException e) {
    // ignored for this example
}
Judging from other questions (e.g. "Difference between parseInt and valueOf in Java?"), parseInt seems to be the most efficient method for converting strings to integers. Any enlightenment would be of great help.
Thank you :)
Edit 1: removed GCD information and 'algorithm' tag
Edit 2: (hopefully) made question more concise, grammatical fix ups
First of all, I just want to point out that optimizing is totally pointless in your particular example.
For your example, most people would agree that the best solution is not the most optimized one. Rather, the most readable solution will be the best.
Having said that, if you want the most optimal solution, then don't use Scanner, don't use BufferedReader.readLine(), don't use String.split and don't use Integer.parseInt(...).
Instead read characters one at a time using BufferedReader.read() and parse and convert them to int by hand. You also need to implement your own "extendable array of int" type that behaves like an ArrayList<Integer>.
This is a lot of (unnecessary) work, and many more lines of code to maintain. BAD IDEA ...
I second what Stephen said: the speed of parsing is likely to massively outperform the speed of the actual I/O, therefore improving the parsing won't give you much.
Seriously, don't do this unless you've built the whole system, profiled it and found that inefficient parsing is what keeps it from hitting its performance targets.
But strictly just as an exercise, and because the general principle may be useful elsewhere, here's an example of how to parse it straight from a string.
The assumptions are:
You will use a sensible encoding, where the characters 0..9 are consecutive.
The only characters in the stream will be 0..9, minus sign and space.
All the numbers are well-formed.
Another important caveat is that, for the sake of simplicity, I used ArrayList, which is a bad idea for storing primitives: the overhead of boxing/unboxing probably wipes out any improvement in parsing speed. In the real world I'd use a list variant custom-made for primitives (see the sketch after the code below).
public static List<Integer> parse(String s) {
    List<Integer> ret = new ArrayList<Integer>();
    int sign = 1;
    int current = 0;
    boolean inNumber = false;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c >= '0' && c <= '9') { // we assume a sensible encoding
            current = current * 10 + sign * (c - '0');
            inNumber = true;
        }
        else if (c == ' ' && inNumber) {
            ret.add(current);
            current = 0;
            inNumber = false;
            sign = 1;
        }
        else if (c == '-') {
            sign = -1;
        }
    }
    if (inNumber) {
        ret.add(current);
    }
    return ret;
}
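As for the primitive-friendly list mentioned in the caveat above, a minimal growable int array might look like this (a sketch; the class and method names are mine):
class IntList {
    private int[] data = new int[16];
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            // Double the capacity when full, copying the old contents.
            data = java.util.Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        return data[index];
    }

    int size() {
        return size;
    }
}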
I have code that checks a string for spaces, commas, etc. I will deal with a scenario where my app is going to check, let's say, thousands of strings with a maximum length of 15 and a minimum length of 14. I am worried about whether it will affect performance, since it is on Android. Check the code I used:
private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};

public static boolean checkMessage(String message) {
    if (message == null)
        return false;
    char[] _message = message.toCharArray();
    for (char c : _message) {
        for (int i = 0;i > undefinedChars.length;i++)
            if (c == undefinedChars[i])
                return true;
    }
    return false;
}
Is this correct? Or is there a way to improve it?
There is a change that you could make that might make a little difference:
Change
char[] _message = message.toCharArray();
for (char c : _message) {
to
for (int i = 0; i < message.length(); i++) {
char c = message.charAt(i);
However, I doubt that it will be significant.
Replacing the inner loop with a switch is more likely to be fruitful, though it depends on what the JIT compiler does with the code. (And a switch will only work if the set of undefined characters can be hard-wired into the switch statement as compile-time constants.)
I am worried if it will affect the performance since it is in android.
Don't "worry". Approach the problem scientifically.
Implement the code and then benchmark it.
If the measured performance is a concern, then:
profile the code
look at hotspots, and identify possible improvements
implement and test possible improvement
rerun the benchmark to see if the improvement actually made any difference
repeat ... until performance is good enough or you run out of options.
The other thing to note is that the same code could well perform differently across different Android platforms. The quality of JIT compilers has (apparently) improved markedly in more recent releases.
I would argue that it is a bad idea to "bend" your code just to get it to run well on old phones. The chances are that the user will upgrade their hardware soon anyway ... and it is conceivable that your optimization for the old platform actually makes your code slower on a new platform ... 'cos your hand-optimizations have made the code too tricky for the JIT compiler's optimizer to deal with.
This is also an argument for NOT trying to make your code go "as fast as possible" ...
First of all, I see a bug there.
for (int i = 0;i > undefinedChars.length;i++)
that I think you meant
for (int i = 0;i < undefinedChars.length;i++)
instead?
Anyway, it seems that your algorithm runs in O(m*n), where m is the length of the message and n is the number of undefined chars (a small fixed size in this case). Therefore it should be efficient from a run-time analysis perspective.
I would profile the scenario first and then decide how to improve it. For instance, if the message had been sorted upfront somewhere, you would only need to check the first or the last char of the string, but as I said, only if it has been sorted elsewhere.
Or maybe think of parallelizing the routine; it should be straightforward.
Without using memory, you're about as fast as you can get. You can trade memory for performance. For example, you can put the characters you want to check into a HashMap. Then you can loop over the string you're checking and test whether each character is in that map or not. If the number of characters you want to check for is small, this will be less efficient; if the number is big, it will be more efficient (technically this algorithm is O(n) instead of O(n*m), but if m is small then the constants you're usually taught to ignore will matter).
Another way is to use an array of booleans, with each possible character in the string mapping to an index in that array. Set only the characters you care about to true (and save that array). Then you can avoid the hash calculation above, but at the cost of a lot of memory.
Really, your original algorithm is likely good enough. But these (especially the hash map) are things you can consider if needed.
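A minimal sketch of the boolean-array idea described above, assuming the undefined characters all fall in the ASCII range:
private static final boolean[] UNDEFINED = new boolean[128];
static {
    for (char c : new char[] {' ', '/', '.', '<', '>', '*', '!'}) {
        UNDEFINED[c] = true;
    }
}

public static boolean checkMessage(String message) {
    if (message == null)
        return false;
    for (int i = 0; i < message.length(); i++) {
        char c = message.charAt(i);
        // Characters outside the table cannot be undefined characters.
        if (c < UNDEFINED.length && UNDEFINED[c])
            return true;
    }
    return false;
}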
Try using a regular expression. I find it very clean and it should not hurt your performance.
public static boolean checkMessage(String message)
{
    if (message == null)
        return false;
    String regex = " |\\.|/|<|>|\\*|!";
    Matcher matcher = Pattern.compile(regex).matcher(message);
    if (matcher.find())
        return true;
    else
        return false;
}
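Since the question is about performance, it may also be worth compiling the pattern once and reusing it (Pattern instances are thread-safe). A sketch, using a character class for the same set of characters:
private static final Pattern UNDEFINED = Pattern.compile("[ /.<>*!]");

public static boolean checkMessage(String message) {
    // find() returns true as soon as any undefined character is present.
    return message != null && UNDEFINED.matcher(message).find();
}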
For symmetry and possibly some compiler optimization, why not use a for-each style loop for both loops? As an additional benefit, you wouldn't risk a typo like the one pointed out by glaze. Your code would then become:
private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};

public static boolean checkMessage(String message) {
    if (message == null)
        return false;
    char[] _message = message.toCharArray();
    for (char c : _message) {
        for (char u : undefinedChars)
            if (c == u)
                return true;
    }
    return false;
}
An additional optimization would be to order the characters in undefinedChars in the order most likely to occur. That way you'll bail-out as quick as possible.
Use a Set to hold your undefinedChars
Set<Character> undefinedChars = new HashSet<Character>(Arrays.asList(new Character(' ') ,new Character('/'),new Character('.')));
public boolean hasUndefinedChar(String str) {
    for (int i = 0; i < str.length(); i++) {
        char iChar = str.charAt(i);
        Character charWrapper = new Character(iChar);
        if (undefinedChars.contains(charWrapper)) {
            return true;
        }
    }
    return false;
}
This method runs in O(n) time and does not significantly affect space complexity. The contains() calls on the Set are O(1) operations, and you make n of these calls in the worst case.
I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length(); i++) {
    newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length(); i++) {
    newStringArray[i] = 'X';
}
myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient, since converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you know that String "modification" is slowing you down. To me, this is the most readable:
String s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If synchronization is needed, then you should use StringBuffer.
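For comparison, the same concatenation written with an explicit StringBuilder might look like this (a sketch):
StringBuilder sb = new StringBuilder("foo");
sb.append("bar");
sb.append("baz");
String s = sb.toString();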
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way? I take it that StringBuilder is going to be more efficient, since converting to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
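For instance, a transformation that drops characters (so the result is shorter than the input) is awkward with a fixed-size char[] but trivial with a StringBuilder. A sketch, reusing the 'X' example from the question:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < oldString.length(); i++) {
    char c = oldString.charAt(i);
    if (c != 'X') { // drop every 'X'; the result may be shorter than the input
        sb.append(c);
    }
}
String result = sb.toString();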
On a related note, if you are doing simple string concatenation (e.g., s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
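In that case, the explicit version might look like this (a sketch):
StringBuilder sb = new StringBuilder();
for (final Object item : items) {
    sb.append(item).append('\n');
}
String s = sb.toString();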
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
                RUNTIME (NS)
array                     88
builder                  126
builderTillEnd            76
concat                  3435
Benchmarked methods:
public static String array(String input)
{
    char[] result = input.toCharArray(); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result[i] = 'X';
    }
    return String.valueOf(result); // COPYING
}

public static String builder(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result.toString(); // COPYING
}

public static StringBuilder builderTillEnd(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result;
}

public static String concat(String input)
{
    String result = "";
    for (int i = 0; i < input.length(); i++)
    {
        result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
                       // result = new StringBuilder(result).append('X').toString();
    }
    return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    value[index] = ch;
}
AbstractStringBuilder internally uses a plain char array: char value[]. So result[i] = 'X' is very similar to result.setCharAt(i, 'X'); however, the second will call a polymorphic method (which probably gets inlined by the JVM) and check bounds in an if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want a String in the end and this is the bottleneck of your program, then you might consider using a char array. In the benchmark, the char array was ~25% faster than StringBuilder. Be sure to properly measure the execution time of your program before and after the optimization, because there is no guarantee about this 25%.
Never concatenate Strings in a loop with + or +=, unless you really know what you are doing. Usually it's better to use an explicit StringBuilder and append().
I'd prefer to use the StringBuilder class where the original string is modified.
For String manipulation, I like the StringUtils class. You'll need to add the Apache Commons Lang dependency to use it.
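For example, assuming Commons Lang is on the classpath, a couple of its helpers look like this (a sketch; see the StringUtils Javadoc for the full API):
import org.apache.commons.lang3.StringUtils;

String replaced = StringUtils.replaceChars("abcba", 'b', 'y'); // "aycya"
boolean blank = StringUtils.isBlank("   ");                    // true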
I must write a function that takes two words (strings) as arguments, and determines if the first word can be transformed into the second word using only one first-order transformation.
First-order transformations alter only one letter in a word
The allowed transformations are: insert, remove and replace
insert = insert a letter at any position in the word
remove = delete a letter from any position in the word
replace = replace a letter with another one
Any suggestions? Any Java examples would be great!
Think: If you're only allowed a single transformation, then the difference in length between the "before" and "after" words should give you a very strong hint as to which of those three transformations has any chance of being successful. By the same token, you can tell at a glance which transformations will be simply impossible.
Once you've decided on which transformation, the rest of the problem becomes a job for Brute Force Man and his sidekick, Looping Lady.
This does look like homework so I'm not going to give you the answer, but any time you approach a problem like this the best thing to do is start sketching out some ideas. Break the problem down into smaller chunks, and then it becomes easier to solve.
For example, let's look at the insert operation. To insert a letter, what is that going to do to the length of the word in which we are inserting the letter? Increase it or decrease it? If we increase the length of the word, and the length of this new word is not equal to the length of the word we are trying to match, then what does that tell you? So one condition here is that if you are going to perform an insert operation on the first word to make it match the second word, then there is a known length that the first word must be.
You can apply similar ideas to the other 2 operations.
So once you establish these conditions, it becomes easier to develop an algorithm to solve the problem.
The important thing in any type of assignment like this is to think through it. Don't just ask somebody, "give me the code", you learn nothing like that. When you get stuck, it's ok to ask for help (but show us what you've done so far), but the purpose of homework is to learn.
If you need to check if there is one and exactly one edit from s1 to s2, then this is very easy to check with a simple linear scan.
If both have the same length, then there must be exactly one index where the two differ
They must agree up to a common longest prefix, then skipping exactly one character from both, they must then agree on a common suffix
If one is shorter than the other, then the difference in length must be exactly one
They must agree up to a common longest prefix, then skipping exactly one character from the longer one, they must then agree on a common suffix
If you also allow zero edit from s1 to s2, then simply check if they're equal.
Here's a Java implementation:
static int firstDifference(String s1, String s2, int L) {
    for (int i = 0; i < L; i++) {
        if (s1.charAt(i) != s2.charAt(i)) {
            return i;
        }
    }
    return L;
}

static boolean oneEdit(String s1, String s2) {
    if (s1.length() > s2.length()) {
        return oneEdit(s2, s1);
    }
    final int L = s1.length();
    final int index = firstDifference(s1, s2, L);
    if (s1.length() == s2.length() && index != L) {
        return s1.substring(index+1).equals(s2.substring(index+1));
    } else if (s2.length() == L + 1) {
        return s1.substring(index).equals(s2.substring(index+1));
    } else {
        return false;
    }
}
Then we can test it as follows:
String[][] tests = {
    { "1", "" },
    { "123", "" },
    { "this", "that" },
    { "tit", "tat" },
    { "word", "sword" },
    { "desert", "dessert" },
    { "lulz", "lul" },
    { "same", "same" },
};

for (String[] test : tests) {
    System.out.printf("[%s|%s] = %s%n",
        test[0], test[1], oneEdit(test[0], test[1])
    );
}
This prints (as seen on ideone.com):
[1|] = true
[123|] = false
[this|that] = false
[tit|tat] = true
[word|sword] = true
[desert|dessert] = true
[lulz|lul] = true
[same|same] = false
You can use the Levenshtein distance and only allow a distance of 1 (which means exactly one char must be altered). There are several implementations; just google "Levenshtein Java" or so.
The other "not so smart" but working thing would be the good old brute force. Just try out every situation with every char and you get what you want. :-)