Best way to find character at specific index in a compressed String - java

I know we can do this in the following ways:
StringBuilder
Use substring
But I am looking for a way to do this on a compressed String, say a5b4c2, which means a occurs 5 times, b occurs 4 times and so on, so the String is actually aaaaabbbbcc.
So the char at index 2 should return a and the char at index 6 should return b.
What would be the best approach for this?
My question is more about the best approach to decompress the String.

My question is more about handling this compressed string rather than finding the character at specific index.
Decompress the string until you get the index you want to know. Or you could decompress the whole string and cache it.
What can be the best approach for this?
Without any more specific requirements, I believe the best approach is the simplest approach you can think of.
I would parse each letter-number pair in turn and reduce the index by that number; if the remaining index is < 0, you have the letter you want.
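A minimal sketch of that idea, assuming zero-based indexes, that every letter is a non-digit followed by its (possibly multi-digit) count, and using a made-up class name CompressedIndex:

```java
final class CompressedIndex {
    /** Returns the character at index in the expanded string without expanding it. */
    static char charAt(String compressed, int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int remaining = index;
        int i = 0;
        while (i < compressed.length()) {
            char letter = compressed.charAt(i++);
            // Read the repeat count that follows the letter (may span several digits).
            int count = 0;
            while (i < compressed.length() && Character.isDigit(compressed.charAt(i))) {
                count = count * 10 + (compressed.charAt(i++) - '0');
            }
            // Reduce the index by the count; once it goes negative we are inside this run.
            remaining -= count;
            if (remaining < 0) {
                return letter;
            }
        }
        throw new IndexOutOfBoundsException("index " + index);
    }

    public static void main(String[] args) {
        System.out.println(charAt("a5b4c2", 2)); // prints a
        System.out.println(charAt("a5b4c2", 6)); // prints b
    }
}
```

This runs in time proportional to the length of the compressed form, so it never allocates the expanded string.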

Take the index you are searching for and start adding up the character counts. Every time you add, check whether the index falls between the previous total and the current one. If it does, you've found your character; otherwise add again.
For example, the workflow given your string a5b4c2, if you want the character at index 7, could be like this:
current position: 0
index we are looking for: 7
add first character's count: 0+5 = 5
does 7 fall within 0 and 5? no, add again
current position: 5
add second character's count: 5+4 = 9
does 7 fall within 5 and 9? yes, so our character must be 'b'.
I'm not sure if this is more efficient or faster than decompressing the string and just using charAt() or something, it's just a different way of approaching it.
EDIT: Since the question is more about how to decompress the string, you could use a StringBuilder and use a for loop to append the correct number of the character to your string... sounds like the simplest way to me.
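For the decompression itself, something like the following might do. It is only a sketch: the class and method names are made up, and it assumes each letter is followed by a decimal count.

```java
final class Decompress {
    /** Expands a string such as "a5b4c2" into "aaaaabbbbcc". */
    static String expand(String compressed) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < compressed.length()) {
            char letter = compressed.charAt(i++);
            // Parse the count that follows the letter (one or more digits).
            int count = 0;
            while (i < compressed.length() && Character.isDigit(compressed.charAt(i))) {
                count = count * 10 + (compressed.charAt(i++) - '0');
            }
            // Append the letter count times.
            for (int k = 0; k < count; k++) {
                out.append(letter);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("a5b4c2")); // prints aaaaabbbbcc
    }
}
```

After that, expand("a5b4c2").charAt(6) gives b, at the cost of building the whole string first.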

Related

boolean character compare - trying to compare three characters at once in Java

I am trying to write a piece of code to simulate text messaging on an old keypad. 2 = a,b,c 3 = def etc.
I can read the string and pull out the character but I am trying to develop an elegant way in Java of mapping the character to the number.
I could use the Character.compare. But I am going to have to compare my character with the full alphabet.
compareOneTwo = Character.compare(ch[r], 'a'); etc
I would rather use a Boolean function that compares three characters at once using an "or"
if(ch[r] = 'a'||'b'||'c') {
But I am struggling - with getting this to work.
I appreciate that this is basic and probably a silly mistake but we all have to start somewhere...
Any help will be appreciated.
You said it yourself, you are trying to find a way to map the characters to the numbers, so use a map!
Map<Character, Integer> characters = new HashMap<>();
characters.put('a', 2);
characters.put('b', 2);
characters.put('c', 2);
characters.put('d', 3);
...
characters.put('z', 9);
Integer number = characters.get('a');
System.out.println(number); // Will print '2'
The initial setup is a bit more code since you have to specify the whole alphabet, but store it in a static variable and it'll be done once for your whole application.
Lookups in a HashMap take constant time on average, so this is fast, and the memory usage is negligible: only 26 characters and as many integers :)
Another advantage is that it is easy to update, if you need to handle a new character like *, just add one row to the map and it's done!
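If typing 26 put calls feels tedious, the map can be filled in a loop. This is a sketch that assumes the usual phone layout (abc on 2 up to wxyz on 9); Keypad, KEYS and digitFor are names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class Keypad {
    // KEYS[i] holds the letters printed on key (i + 2); the layout is an assumption.
    private static final String[] KEYS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};
    private static final Map<Character, Integer> DIGIT_FOR = new HashMap<>();

    static {
        // Built once when the class is loaded, then shared by every lookup.
        for (int i = 0; i < KEYS.length; i++) {
            for (char c : KEYS[i].toCharArray()) {
                DIGIT_FOR.put(c, i + 2);
            }
        }
    }

    /** Returns the keypad digit for a letter, ignoring case. */
    static int digitFor(char c) {
        Integer digit = DIGIT_FOR.get(Character.toLowerCase(c));
        if (digit == null) {
            throw new IllegalArgumentException("no key for '" + c + "'");
        }
        return digit;
    }

    public static void main(String[] args) {
        System.out.println(digitFor('a')); // prints 2
        System.out.println(digitFor('s')); // prints 7
    }
}
```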
You can't use the OR operator as you wish.
You do have other alternatives if you don't want to have many conditions connected by OR (||) operators.
You can create a Set and use contains:
if (Set.of('a','b','c').contains(ch[r])) {}
Or you can use a range of characters if you need to check for a consecutive range:
if (ch[r] >= 'a' && ch[r] <= 'c') {}

Java Code optimization - Fastest list to search and remove?

I've been working on a task. I have two giant strings that consist of the same characters, just scrambled. The task is to find the lowest possible number of changes needed to turn the first string into the other one, where 1 change = swapping two neighbouring chars in the string. I found a solution that works fine, but there is a problem: it finishes in under 5 seconds only for Strings of about 100 000 chars, and I need it to work for up to 1 000 000 chars. I tried ArrayList, LinkedList, regular arrays, substrings and different variations of the algorithm; this one is the best I have come up with so far, but I'm out of ideas. Any help? Is there a faster collection I can use? Maybe the algorithm itself is wrong?
"jas" ArrayList is the first string converted into a list
"mal" is the other one. "steps" is the output
int steps=0;
int index=0;
while(jas.size()>1) {
    if(jas.get(0)!=mal.get(index)) {
        int distance = jas.indexOf(mal.get(index));
        jas.remove(distance);
        steps+=distance;
    } else {
        jas.remove(0);
    }
    index++;
}
System.out.println(steps);
Thanks in advance for any ideas!
I know it's a bit off topic here (you should go to codereview), but the idea I have is to use something already implemented in Java, like:
listToSort.sort(Comparator.comparing(listWithOrder::indexOf));
Go to the source of these functions and look at them; you can either copy them into your own code and add a counter there, or just take inspiration from them.
I believe that whatever is implemented there is probably very fast.
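A different idea, not taken from this thread: the loop in the question is slow because indexOf and remove on a list are each linear, which makes the whole thing quadratic. The same count can be computed by remembering the original positions of every character and using a Fenwick tree (binary indexed tree) to ask how many earlier positions have already been removed. This is a sketch under the assumption that both strings contain exactly the same characters; AdjacentSwaps and countSteps are made-up names, and the result is a long because for 1 000 000 chars it may not fit in an int.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

final class AdjacentSwaps {
    /** Same result as the list-based loop in the question, in O(n log n). */
    static long countSteps(String jas, String mal) {
        int n = jas.length();
        // For each character, the queue of its positions in jas, in increasing order.
        Map<Character, ArrayDeque<Integer>> positions = new HashMap<>();
        for (int i = 0; i < n; i++) {
            positions.computeIfAbsent(jas.charAt(i), k -> new ArrayDeque<>()).add(i);
        }

        int[] tree = new int[n + 1]; // Fenwick tree marking removed positions (1-based)
        long steps = 0;
        for (int i = 0; i < mal.length(); i++) {
            // First remaining occurrence of the wanted character (what indexOf finds).
            int p = positions.get(mal.charAt(i)).poll();
            // Count already removed elements whose original position is below p.
            int removedBefore = 0;
            for (int j = p; j > 0; j -= j & -j) {
                removedBefore += tree[j];
            }
            // Index of that occurrence in the shrunken list equals the swaps needed.
            steps += p - removedBefore;
            // Mark position p as removed.
            for (int j = p + 1; j <= n; j += j & -j) {
                tree[j]++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(countSteps("abc", "cba")); // prints 3
    }
}
```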

How to replace string with irregular size in java

I have this sample string "AHHHAAAAAARTFUHLAAAAAHV" and I want to reduce it to "AHHHAARTFUHLAHV". Notice that the first run of successive A's has length 6 and the next has length 5. So I can't use:
System.out.println("AHHHAAAAAARTFUHLAAAAAHV".replaceAll("A{4}(?!A)", ""));
because of the irregular size of the runs of A's. Is there any way around this to get "AHHHAARTFUHLAHV"?
You could try to manipulate the string yourself: look for the char A, and if it appears more than x times consecutively, eliminate the extra ones. I hope this helps somehow.
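A rough sketch of that manual scan. The exact rule is a guess: it assumes that any run of the target char longer than drop loses exactly drop characters, which happens to reproduce the sample with drop = 4. RunTrimmer and trimRuns are invented names.

```java
final class RunTrimmer {
    /** Shortens every run of target that is longer than drop by drop characters. */
    static String trimRuns(String s, char target, int drop) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // Find the end of the run of equal characters starting at i.
            int j = i;
            while (j < s.length() && s.charAt(j) == c) {
                j++;
            }
            int run = j - i;
            // Only long runs of the target character are shortened (assumed rule).
            int keep = (c == target && run > drop) ? run - drop : run;
            for (int k = 0; k < keep; k++) {
                out.append(c);
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimRuns("AHHHAAAAAARTFUHLAAAAAHV", 'A', 4)); // prints AHHHAARTFUHLAHV
    }
}
```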

Spell checker solution in java

I need to implement a spell checker in Java. As an example, for the string "sch aproblm iseasili solved" my output should be "such a problem is easily solved". The maximum length of the string to correct is 64. As you can see, the string can have spaces inserted in the wrong places or missing altogether, as well as misspelled words. I need a little help finding an efficient algorithm for coming up with the corrected string. I am currently deleting all spaces in my string and inserting spaces in every possible position. For the word "hot" (the same applies to a sentence), I generate the following candidate strings, which are afterwards corrected word by word using Levenshtein distance: h o t; h ot; ho t; hot. As you can see, that is 2^(string.length() - 1) candidate strings, so a string of length 64 generates 2^63 of them, which is far too many. Afterwards I need to process them one by one and select the best one by the following criteria:
- total edit distance (must take the smallest one)
- if several strings have the same edit distance, I have to choose the one with the fewest words
- if several strings have the same number of words, I need to choose the one whose words have the highest total frequency (I have a dictionary of the 8000 most frequent words along with their frequencies)
- and finally, if several strings have the same total frequency, I have to take the lexicographically smallest one.
So basically I generate all possible strings (inserting spaces in all possible positions of the original string), then one by one I calculate their total edit distance, number of words, etc., then choose the best one and output the corrected string. I want to know if there is a more efficient way of doing this, one that does not have to generate all possible combinations of strings.
EDIT: So I thought I should take another approach. Here is what I have in mind: I take the first letter of my string and extract from the dictionary all the words that begin with that letter. Then I process all of them and extract from my string all possible first words. To stay with my previous example, for the word "hot" generating all possible combinations gave 4 results, but with my new algorithm I obtain only 2, "hot" and "ho", so it's already an improvement. I still need a little help in creating a recursive or DP algorithm for this. I need a way to store all possible strings for the first word, then for each of those all possible strings for the second word and so on, and finally to concatenate all possibilities and add them to an array or something. There will still be a lot of combinations for large strings, but not as many as having to do ALL of them. Can someone help me with pseudocode or something, as this is not my strong suit?
EDIT2: Here is the code where I generate all the possible first words from my string: http://pastebin.com/d5AtZcth . I need to somehow extend this to do the same for the rest, combine each first word with each second word and so on, and store all these concatenations in an array or something.
A few tips for you:
try correcting just small parts of the string, not everything at once.
about 90% of errors (IIRC) are within edit distance 1 of the intended word.
you can use a phonetic index to match words against words that sound alike.
you can assume most typos are QWERTY errors (j=>k, h=>g), and try to check them first.
A few more ideas can be found in this nice article:
http://norvig.com/spell-correct.html
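As for avoiding the 2^63 candidates: one possible direction is dynamic programming over prefixes of the space-free string, where the best result for the first i characters is built from the best result for a shorter prefix plus one dictionary word. The sketch below only minimises total edit distance and ignores the tie-breaking rules (word count, frequency, lexicographic order); the class name, method names and the tiny dictionaries are assumptions made up for the example.

```java
import java.util.List;

final class SegmentSketch {
    /** Plain Levenshtein distance (insert, delete, substitute), two-row version. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns a segmentation into dictionary words with minimal total edit distance. */
    static String correct(String input, List<String> dictionary) {
        String s = input.replace(" ", "");
        int n = s.length();
        int[] cost = new int[n + 1];       // cost[i]: best total distance for s[0, i)
        String[] best = new String[n + 1]; // best[i]: corrected text achieving cost[i]
        best[0] = "";
        for (int end = 1; end <= n; end++) {
            cost[end] = Integer.MAX_VALUE;
            for (int start = 0; start < end; start++) {
                if (best[start] == null) continue; // prefix not reachable
                String chunk = s.substring(start, end);
                for (String word : dictionary) {
                    int c = cost[start] + levenshtein(chunk, word);
                    if (c < cost[end]) {
                        cost[end] = c;
                        best[end] = best[start].isEmpty() ? word : best[start] + " " + word;
                    }
                }
            }
        }
        return best[n];
    }

    public static void main(String[] args) {
        System.out.println(correct("th ecat", List.of("the", "cat"))); // prints the cat
    }
}
```

With at most 64 characters there are only about 2000 (start, end) pairs, each compared against the dictionary, so the work is polynomial rather than exponential. The remaining criteria could be added by storing a tuple (distance, word count, frequency, text) per prefix and comparing tuples instead of a single int.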

Help me understand question related to HashMap in Java

I've been given a task that I find a little confusing. Here is the question statement:
The following program should read a file and store all its tokens in a member variable.
Your task is to write a single method that returns the number of items in tokenMap, the average length (as double value) of the elements in tokenMap, and the number of tokens starting with character "a".
Here the tokenMap is an object of type HashMap<String, Integer>;
I do have some idea about HashMap, but what I want to know is whether the key I should store in tokenMap is a single character or the whole word.
Also, how can I compute the average length?
Looks like you have to use the entire word as the key.
The average length of tokens can be computed by summing the lengths of each token and dividing by the number of tokens.
In Java, you can find the number of tokens in the HashMap by tokenMap.size().
You can write loops that visit each member of the map like this:
for(String t: tokenMap.keySet()){
    // t is a token (the keys are the Strings; values() would give the Integers)
}
and if you look up String in the Java API docs you will see that it is easy to find the length of a String.
To compute the average length of the items in a hash map, you'll have to iterate over them all and count the length and calculate the average.
As for your other question about what to use for a key, how are we supposed to know? A hashmap can use practically any* value for a key.
*The value must be hashable, which is defined differently for different languages.
Reading the question closely, it seems that you have to read a file, extract each word and use it as the key value, and store the length of each key as the integer:
an example line
leads to a HashMap like this
an : 2
example : 7
line : 4
After you've built your map (made of keys mapping to entries, or seemingly elements in the question), you'll need to run some statistics over it to find
the number of keys (look at HashMap)
the average length of all keys (again, simple enough)
the number beginning with "a" (just look at the String)
Then make a value object containing these values and return it from the method that does the statistics.
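A sketch of what such a method and value object might look like. TokenStats and its field names are invented here, and it assumes "elements" in the task means the keys of tokenMap.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenStats {
    final int count;            // number of items in tokenMap
    final double averageLength; // average length of the keys
    final int startingWithA;    // number of keys starting with "a"

    TokenStats(int count, double averageLength, int startingWithA) {
        this.count = count;
        this.averageLength = averageLength;
        this.startingWithA = startingWithA;
    }

    /** Computes all three statistics in a single pass over the keys. */
    static TokenStats of(Map<String, Integer> tokenMap) {
        int totalLength = 0;
        int startingWithA = 0;
        for (String token : tokenMap.keySet()) {
            totalLength += token.length();
            if (token.startsWith("a")) {
                startingWithA++;
            }
        }
        // Guard against dividing by zero for an empty map.
        double average = tokenMap.isEmpty() ? 0.0 : (double) totalLength / tokenMap.size();
        return new TokenStats(tokenMap.size(), average, startingWithA);
    }

    public static void main(String[] args) {
        Map<String, Integer> tokenMap = new HashMap<>();
        tokenMap.put("an", 2);
        tokenMap.put("example", 7);
        tokenMap.put("line", 4);
        TokenStats stats = TokenStats.of(tokenMap);
        System.out.println(stats.count);         // prints 3
        System.out.println(stats.startingWithA); // prints 1
    }
}
```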
I know I've given more information than you require, but someone else may benefit from a little extra help.
Guys, there is some confusion. I'm not asking for a solution. I'm just confused about one thing.
For the time being, I'm going to use String as the key type.
My only confusion is whether, once I read the file line by line, I should split it into words or into individual characters, i.e. whether the key should be a single-character String or a String holding a whole word.
If you go through the question statement, what do you suggest? That's all I'm asking.
should i split it based upon words or based upon each character
The requirement is to make tokens, so you should split them based on words. Each word becomes a unique String key. It would make sense for the value to be the count of each token.
If the file you are reading has these three lines:
int alpha;
int beta;
float delta;
Then you should have something like
<"int", 2>
<";", 3>
<"alpha", 1>
<"beta", 1>
<"float", 1>
<"delta", 1>
(The semicolon may or may not be considered a token.)
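Counting every occurrence, your average length would be (3x2 + 1x3 + 5 + 4 + 5 + 5) / 9; counting each distinct token once, it would be (3 + 1 + 5 + 4 + 5 + 5) / 6.
The only token starting with "a" is alpha (length 5), so the number of tokens starting with "a" would be 1.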
Look elsewhere on this forum for keySet and you should be good to go.
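The counts in the int/alpha/beta example above could be produced with something like the following. It is only a sketch: it assumes tokens are separated by whitespace and that each semicolon counts as its own token, and TokenCounter is a made-up name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TokenCounter {
    /** Splits each line into tokens and counts how often every token occurs. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> tokenMap = new HashMap<>();
        for (String line : lines) {
            // Surround semicolons with spaces so they become separate tokens (assumption).
            String spaced = line.replace(";", " ; ").trim();
            if (spaced.isEmpty()) {
                continue;
            }
            for (String token : spaced.split("\\s+")) {
                // merge inserts 1 for a new token, otherwise adds 1 to the old count.
                tokenMap.merge(token, 1, Integer::sum);
            }
        }
        return tokenMap;
    }

    public static void main(String[] args) {
        Map<String, Integer> tokenMap = count(List.of("int alpha;", "int beta;", "float delta;"));
        System.out.println(tokenMap.get("int")); // prints 2
        System.out.println(tokenMap.get(";"));   // prints 3
    }
}
```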
