How to replace string with irregular size in java - java

I have this sample string "AHHHAAAAAARTFUHLAAAAAHV" and I want to reduce it to "AHHHAARTFUHLAHV". Notice that the first run of successive A's has 6 characters and the next has 5. So I can't use:
System.out.println("AHHHAAAAAARTFUHLAAAAAHV".replaceAll("A{4}(?!A)", ""));
because the runs of A's differ in length. Is there any way to get "AHHHAARTFUHLAHV"?

You could manipulate the string by hand: scan for the char A and, whenever it appears more than x times consecutively, drop the extra ones. I hope this helps somehow.
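A minimal sketch of that manual scan. The exact rule is an assumption inferred from the example in the question (a run of 6 becomes 2, a run of 5 becomes 1): drop four A's from every run of at least four. The class name `RunShrinker` and the method `shrinkRuns` are made up for illustration.

```java
class RunShrinker {
    // Removes `drop` copies of `target` from every run of `target`
    // that is at least `drop` characters long; other runs are copied as-is.
    static String shrinkRuns(String input, char target, int drop) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Find the end of the current run of identical characters.
            int runEnd = i;
            while (runEnd < input.length() && input.charAt(runEnd) == c) {
                runEnd++;
            }
            int runLength = runEnd - i;
            int keep = (c == target && runLength >= drop) ? runLength - drop : runLength;
            for (int k = 0; k < keep; k++) {
                out.append(c);
            }
            i = runEnd;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints AHHHAARTFUHLAHV
        System.out.println(shrinkRuns("AHHHAAAAAARTFUHLAAAAAHV", 'A', 4));
    }
}
```

If the intended rule is different (for example, capping every run at a fixed length), only the line computing `keep` needs to change.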

Related

Java Code optimization - Fastest list to search and remove?

So I've been working on a task. I have two giant strings that consist of the same characters, just scrambled. The task is to find the lowest possible number of changes needed to turn the first string into the other one, where one change means swapping two neighbouring chars in the string. I found a solution that works fine, but there is a problem: it runs in under 5 seconds only for strings of about 100,000 chars, and I need it to work for up to 1,000,000 chars. I tried ArrayList, LinkedList, regular arrays, substrings, and different variations of the algorithm. This one is the best I have come up with so far, but I'm out of ideas. Any help? Is there a faster collection I can use? Maybe the algorithm is wrong here?
"jas" ArrayList is the first string converted into a list
"mal" is the other one. "steps" is the output
int steps = 0;
int index = 0;
while (jas.size() > 1) {
    // equals() compares values; != on boxed Characters compares references
    if (!jas.get(0).equals(mal.get(index))) {
        int distance = jas.indexOf(mal.get(index));
        jas.remove(distance);
        steps += distance;
    } else {
        jas.remove(0);
    }
    index++;
}
System.out.println(steps);
Thanks in advance for any ideas!
I know it's a bit off topic here (you should go to Code Review), but one idea is to use something already implemented in Java, like:
listToSort.sort(Comparator.comparing(listWithOrder::indexOf));
Look at the source of those functions: you can either copy them and add a counter to your own version, or just take inspiration from them.
I believe that whatever is implemented there is probably very fast.
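As an alternative to changing collections, here is a sketch of a different technique (not the asker's code and not what the answer above proposes): keep the same greedy idea, but replace the O(n) `indexOf` and `remove` calls with a per-character queue of positions and a Fenwick (binary indexed) tree that counts already-removed positions. That brings the work down to O(n log n). The class and method names are hypothetical, and the result is a `long` because the count can exceed the `int` range for 1,000,000 chars.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

class AdjacentSwapCounter {
    // Counts the adjacent swaps needed to turn `jas` into `mal`,
    // assuming both contain the same characters in a different order.
    static long minSwaps(String jas, String mal) {
        int n = jas.length();
        if (mal.length() != n) {
            throw new IllegalArgumentException("strings must have equal length");
        }
        // For each character, the positions where it occurs in jas, in increasing order.
        Map<Character, ArrayDeque<Integer>> positions = new HashMap<>();
        for (int i = 0; i < n; i++) {
            positions.computeIfAbsent(jas.charAt(i), k -> new ArrayDeque<>()).addLast(i);
        }
        int[] removed = new int[n + 1]; // Fenwick tree (1-based) over removed positions
        long steps = 0;
        for (int i = 0; i < n; i++) {
            ArrayDeque<Integer> queue = positions.get(mal.charAt(i));
            if (queue == null || queue.isEmpty()) {
                throw new IllegalArgumentException("strings are not anagrams");
            }
            int p = queue.pollFirst(); // earliest unused occurrence in jas
            // Count removed positions in [0, p).
            int removedBefore = 0;
            for (int j = p; j > 0; j -= j & -j) {
                removedBefore += removed[j];
            }
            // Elements still in front of p is the distance the original loop adds.
            steps += p - removedBefore;
            // Mark position p as removed.
            for (int j = p + 1; j <= n; j += j & -j) {
                removed[j]++;
            }
        }
        return steps;
    }
}
```

For example, `AdjacentSwapCounter.minSwaps("abc", "cba")` returns 3, the same value the list-based loop computes.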

Multiple arithmetic operator store

I have been thinking about how to solve an expression like 1+2*4-5, where the user enters it and the program evaluates it. I've read some questions on this site about storing arithmetic operators, and the solutions say to check the operator with a switch, which can't be applied here. I would be thankful if anybody could suggest how to do it.
I had a similar exercise not long ago, but in that question it was stated that the separator is a space. So the user input would be 1 + 2 * 4 - 5, and I solved it that way. I will give you some tips but not paste the whole code.
-you read the input as a String
-you can use the String.split() method to divide the String into the pieces you need, and they will be put in an array (in this case: strArray[0]="1", strArray[1]="+", etc.)
-you will need a for-loop to go through every String in the array:
-the numbers will need to be converted to integers with the Integer.parseInt() method.
-the + - * / will need to be handled in a switch statement.
(be careful how you construct your loop; think about how many times you want to go through it and what you need in each iteration)
I hope these tips helped.
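One way the tips above could fit together, as a sketch only. It assumes space-separated integer tokens as in the answer, and it goes one step beyond the tips by keeping a running `term` so that `*` and `/` bind tighter than `+` and `-`. The class name `ExpressionEvaluator` is made up for illustration.

```java
class ExpressionEvaluator {
    // Evaluates input such as "1 + 2 * 4 - 5" (tokens separated by spaces).
    static int evaluate(String input) {
        String[] tokens = input.trim().split("\\s+");
        if (tokens.length % 2 == 0) {
            throw new IllegalArgumentException("expected: number (operator number)*");
        }
        int result = 0;                          // sum of the terms finished so far
        int term = Integer.parseInt(tokens[0]);  // the product currently being built
        // Operators sit at odd indices, their right-hand operands at even indices.
        for (int i = 1; i < tokens.length; i += 2) {
            int value = Integer.parseInt(tokens[i + 1]);
            switch (tokens[i]) {
                case "*": term *= value; break;
                case "/": term /= value; break;  // integer division
                case "+": result += term; term = value; break;
                case "-": result += term; term = -value; break;
                default:
                    throw new IllegalArgumentException("unknown operator: " + tokens[i]);
            }
        }
        return result + term;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 4 - 5")); // prints 4
    }
}
```

Parentheses and input without spaces are out of scope here; they would need a real tokenizer and parser.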

Best way to find character at specific index in a compressed String

I know that we can do this in the following ways:
StringBuilder
Use substring
But I am looking for a way that works on a compressed String, say a5b4c2, which means a occurs 5 times, b occurs 4 times, and so on, so the String is actually aaaaabbbbcc.
So the char at index 2 should be a and the char at index 6 should be b.
What would be the best approach for this?
My question is more about what is the best approach to decompress the String.
My question is more about handling this compressed string than about finding the character at a specific index.
Decompress the string until you get the index you want to know. Or you could decompress the whole string and cache it.
What can be the best approach for this?
Without any more specific requirements, I believe the best approach is the simplest approach you can think of.
I would parse each letter/number pair in turn and subtract the number from the index; as soon as the remaining index is < 0, you have the letter you want.
Check what index you are searching for, and start adding up the numbers of characters. Every time you add, check if the index falls within the previous interval and the current one. If it does, you've found what your character is, otherwise add again.
For example, the workflow given your string a5b4c2, if you want the character at index 7, could be like this:
current position: 0
index we are looking for: 7
add first character's count: 0+5 = 5
does 7 fall within 0 and 5? no, add again
current position: 5
add second character's count: 5+4 = 9
does 7 fall within 5 and 9? yes, so our character must be 'b'.
I'm not sure if this is more efficient or faster than decompressing the string and just using charAt() or something, it's just a different way of approaching it.
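The walk-through above can be sketched as follows. This assumes the format is always one letter followed by a decimal count (multi-digit counts are allowed here as an extension); `CompressedLookup` is a hypothetical name.

```java
class CompressedLookup {
    // Returns the character at `index` of the decompressed string
    // without actually building the decompressed string.
    static char charAt(String compressed, int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("negative index: " + index);
        }
        int position = 0; // start of the current run in the decompressed string
        int i = 0;
        while (i < compressed.length()) {
            char letter = compressed.charAt(i++);
            // Read the count that follows the letter.
            int count = 0;
            while (i < compressed.length() && Character.isDigit(compressed.charAt(i))) {
                count = count * 10 + (compressed.charAt(i++) - '0');
            }
            // The run covers decompressed indices [position, position + count).
            if (index < position + count) {
                return letter;
            }
            position += count;
        }
        throw new IndexOutOfBoundsException("index " + index + " is past the end");
    }
}
```

With the string from the question, `charAt("a5b4c2", 7)` returns 'b', matching the steps traced above.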
EDIT: Since the question is more about how to decompress the string, you could use a StringBuilder and use a for loop to append the correct number of the character to your string... sounds like the simplest way to me.
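A minimal sketch of that StringBuilder loop, under the same assumption that each letter is followed by a decimal count. The name `Decompressor` is made up for illustration.

```java
class Decompressor {
    // Expands e.g. "a5b4c2" into "aaaaabbbbcc".
    static String decompress(String compressed) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < compressed.length()) {
            char letter = compressed.charAt(i++);
            // Read the (possibly multi-digit) count after the letter.
            int count = 0;
            while (i < compressed.length() && Character.isDigit(compressed.charAt(i))) {
                count = count * 10 + (compressed.charAt(i++) - '0');
            }
            for (int k = 0; k < count; k++) {
                out.append(letter);
            }
        }
        return out.toString();
    }
}
```

Once the string is decompressed (and cached, if it is queried often), a plain `charAt()` call on the result does the lookup.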

String.substring vs String[].split

I have a comma-delimited string for which String.split(",") returns an array of about 60 elements. In a specific use case I only need the second value of that array. So for example, given "Q,BAC,233,sdf,sdf," all I want is the part of the string after the first ',' and before the second ','. My question is about performance: am I better off parsing it myself using substring, or using the split method and then taking the second value from the array? Any input would be appreciated. This method will be called hundreds of times a second, so it's important that I understand the best approach regarding performance and memory allocation.
-Duncan
Since String.split returns a String[], a 60-way split would result in about sixty needless allocations per line. split goes through your entire string and creates sixty new objects plus the array object itself. Of these sixty-one objects you keep exactly one and let the garbage collector deal with the remaining sixty.
If you are calling this in a tight loop, a substring would definitely be more efficient: it goes through the portion of your string up to the second comma, and then creates one new object that you keep.
String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);
After this, the variable brown holds the string "brown".
When you run this multiple times, the substring wins on time hands down: 1,000,000 iterations of split take 3.36s, while 1,000,000 iterations of substring take only 0.05s. And that's with only eight components in the string! The difference for sixty components would be even more drastic.
Of course: why iterate through the whole string? Just use substring() and indexOf().
You are certainly better off doing it by hand for two reasons:
.split() takes a string as an argument, but this string is interpreted as a Pattern, and for your use case Pattern is costly;
as you say, you only need the second element: the algorithm to grab that second element is simple enough to do by hand.
I would use something like:
final int first = searchString.indexOf(",");
final int second = searchString.indexOf(",", first+1);
String result= searchString.substring(first+1, second);
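The snippets above assume the line always contains at least two commas. A hedged variant that also copes with fewer commas might look like this; the name `secondField` and the choice to return null when there is no comma at all are assumptions for illustration, not part of the answers.

```java
class SecondField {
    // Returns the text between the first and second comma,
    // the rest of the line if there is only one comma,
    // or null if the line contains no comma.
    static String secondField(String line) {
        int first = line.indexOf(',');
        if (first < 0) {
            return null;
        }
        int second = line.indexOf(',', first + 1);
        return second < 0
                ? line.substring(first + 1)
                : line.substring(first + 1, second);
    }
}
```

For the line from the question, `secondField("Q,BAC,233,sdf,sdf,")` returns "BAC" and allocates only that one String.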
My first inclination would be to find the index of the first and second commas and take the substring.
The only real way to tell for sure, though, is to test each in your particular scenario. Break out the appropriate stopwatch and measure the two.

Spell checker solution in java

I need to implement a spell checker in Java. As an example, for the string "sch aproblm iseasili solved" my output should be "such a problem is easily solved". The maximum length of the string to correct is 64. As you can see, the string can have spaces inserted in the wrong places or missing entirely, and it can contain misspelled words. I need a little help finding an efficient algorithm for producing the corrected string.
I am currently deleting all spaces from the string and inserting spaces in every possible position. For the word "hot" (the same applies to a sentence) I generate the following candidate strings, which are afterwards corrected word by word using Levenshtein distance: h o t; h ot; ho t; hot. As you can see, I generate 2^(string.length() - 1) candidates. For a string of length 64 that is 2^63 candidates, which is far too many. Afterwards I need to process them one by one and select the best one by these criteria, in order:
-total edit distance (the smallest one wins)
-if several strings have the same edit distance, I have to choose the one with the fewest words
-if several strings have the same number of words, I need to choose the one whose words have the highest total frequency (I have a dictionary of the 8000 most frequent words along with their frequencies)
-and finally, if several strings have the same total frequency, I have to take the lexicographically smallest one.
So basically I generate all possible strings (inserting spaces in all possible positions of the original string), then one by one I calculate their total edit distance, number of words, etc., choose the best one, and output the corrected string. I want to know if there is a more efficient way of doing this, one that does not require generating all possible combinations of strings.
EDIT: I thought I should take another approach. Here is what I have in mind: I take the first letter of my string and extract from the dictionary all the words that begin with that letter. Then I process all of them and extract from my string all possible first words. Staying with my previous example, for the word "hot" generating all combinations gave 4 results, but with my new algorithm I obtain only 2, "hot" and "ho", so it's already an improvement. I still need a little help creating a recursive or dynamic programming algorithm for this. I need a way to store all possible strings for the first word, then for each of those all possible strings for the second word, and so on, and finally to concatenate all possibilities and add them to an array or something similar. There will still be a lot of combinations for large strings, but not as many as when generating ALL of them. Can someone help me with pseudocode or something, as this is not my strong suit?
EDIT2: Here is the code where I generate all the possible first words from my string: http://pastebin.com/d5AtZcth . I need to somehow extend this to do the same for the remaining words, combine each first word with each second word and so on, and store all of these concatenations in an array or something similar.
A few tips for you:
try correcting just small parts of the string, not everything at once.
90% of errors (IIRC) are within edit distance 1 of the source.
you can use a phonetic index to match words against words that sound alike.
you can assume most typos are QWERTY errors (j=>k, h=>g), and try to check those first.
A few more ideas can be found in this nice article:
http://norvig.com/spell-correct.html
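The dynamic programming idea the question asks about could be sketched as below. This is an illustration under simplifying assumptions, not a full solution: spaces are stripped first (as in the question), only the first two criteria are applied (smallest total edit distance, then fewest words), and the dictionary is a plain list passed in by the caller. The class and method names are hypothetical. For each prefix length i, `dist[i]` holds the best total edit distance found for that prefix, built from a shorter prefix j plus the cost of correcting the chunk between j and i to some dictionary word.

```java
import java.util.Arrays;
import java.util.List;

class SegmentCorrector {
    // Classic two-row Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(substitute, Math.min(prev[j], cur[j - 1]) + 1);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Returns the best correction, or null if the dictionary is empty.
    static String correct(String input, List<String> dictionary) {
        String s = input.replace(" ", "");
        int n = s.length();
        int[] dist = new int[n + 1];       // best total edit distance for prefix of length i
        int[] words = new int[n + 1];      // number of words used by that best solution
        String[] text = new String[n + 1]; // the corrected text for that prefix
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[0] = 0;
        text[0] = "";
        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                if (dist[j] == Integer.MAX_VALUE) continue; // prefix j not reachable
                String chunk = s.substring(j, i);
                for (String word : dictionary) {
                    int d = dist[j] + levenshtein(chunk, word);
                    int w = words[j] + 1;
                    // Criterion 1: smaller distance. Criterion 2: fewer words.
                    if (d < dist[i] || (d == dist[i] && w < words[i])) {
                        dist[i] = d;
                        words[i] = w;
                        text[i] = text[j].isEmpty() ? word : text[j] + " " + word;
                    }
                }
            }
        }
        return text[n];
    }
}
```

With the toy dictionary `such, a, problem, is, easily, solved`, calling `correct("sch aproblm iseasili solved", ...)` yields "such a problem is easily solved". With 8000 words it would be worth limiting the chunk length to slightly more than the longest dictionary word and adding the frequency and lexicographic tie-breaks from the question.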
