Problem
I want to decode a message encrypted with the classic Vigenere cipher. I know that the key has a length of exactly 6 characters.
The message is:
BYOIZRLAUMYXXPFLPWBZLMLQPBJMSCQOWVOIJPYPALXCWZLKXYVMKXEHLIILLYJMUGBVXBOIRUAVAEZAKBHXBDZQJLELZIKMKOWZPXBKOQALQOWKYIBKGNTCPAAKPWJHKIAPBHKBVTBULWJSOYWKAMLUOPLRQOWZLWRSLEHWABWBVXOLSKOIOFSZLQLYKMZXOBUSPRQVZQTXELOWYHPVXQGDEBWBARBCWZXYFAWAAMISWLPREPKULQLYQKHQBISKRXLOAUOIEHVIZOBKAHGMCZZMSSSLVPPQXUVAOIEHVZLTIPWLPRQOWIMJFYEIAMSLVQKWELDWCIEPEUVVBAZIUXBZKLPHKVKPLLXKJMWPFLVBLWPDGCSHIHQLVAKOWZSMCLXWYLFTSVKWELZMYWBSXKVYIKVWUSJVJMOIQOGCNLQVXBLWPHKAOIEHVIWTBHJMKSKAZMKEVVXBOITLVLPRDOGEOIOLQMZLXKDQUKBYWLBTLUZQTLLDKPLLXKZCUKRWGVOMPDGZKWXZANALBFOMYIXNGLZEKKVCYMKNLPLXBYJQIPBLNMUMKNGDLVQOWPLEOAZEOIKOWZZMJWDMZSRSMVJSSLJMKMQZWTMXLOAAOSTWABPJRSZMYJXJWPHHIVGSLHYFLPLVXFKWMXELXQYIFUZMYMKHTQSMQFLWYIXSAHLXEHLPPWIVNMHRAWJWAIZAAWUGLBDLWSPZAJSCYLOQALAYSEUXEBKNYSJIWQUKELJKYMQPUPLKOLOBVFBOWZHHSVUIAIZFFQJEIAZQUKPOWPHHRALMYIAAGPPQPLDNHFLBLPLVYBLVVQXUUIUFBHDEHCPHUGUM
Question
I tried a brute-force approach but unfortunately this yields an extreme amount of combinations, too many to compute.
Do you have any idea how to go from here or how to approach this problem in general?
Attempt
Here is what I have so far:
public class Main {
// instance variables - replace the example below with your own
private String message;
private String answer;
private String first = ""; // start empty, otherwise the appended result begins with "null"
/**
* Constructor for objects of class Main
*/
public Main()
{
// initialise instance variables
message ="BYOIZRLAUMYXXPFLPWBZLMLQPBJMSCQOWVOIJPYPALXCWZLKXYVMKXEHLIILLYJMUGBVXBOIRUAVAEZAKBHXBDZQJLELZIKMKOWZPXBKOQALQOWKYIBKGNTCPAAKPWJHKIAPBHKBVTBULWJSOYWKAMLUOPLRQOWZLWRSLEHWABWBVXOLSKOIOFSZLQLYKMZXOBUSPRQVZQTXELOWYHPVXQGDEBWBARBCWZXYFAWAAMISWLPREPKULQLYQKHQBISKRXLOAUOIEHVIZOBKAHGMCZZMSSSLVPPQXUVAOIEHVZLTIPWLPRQOWIMJFYEIAMSLVQKWELDWCIEPEUVVBAZIUXBZKLPHKVKPLLXKJMWPFLVBLWPDGCSHIHQLVAKOWZSMCLXWYLFTSVKWELZMYWBSXKVYIKVWUSJVJMOIQOGCNLQVXBLWPHKAOIEHVIWTBHJMKSKAZMKEVVXBOITLVLPRDOGEOIOLQMZLXKDQUKBYWLBTLUZQTLLDKPLLXKZCUKRWGVOMPDGZKWXZANALBFOMYIXNGLZEKKVCYMKNLPLXBYJQIPBLNMUMKNGDLVQOWPLEOAZEOIKOWZZMJWDMZSRSMVJSSLJMKMQZWTMXLOAAOSTWABPJRSZMYJXJWPHHIVGSLHYFLPLVXFKWMXELXQYIFUZMYMKHTQSMQFLWYIXSAHLXEHLPPWIVNMHRAWJWAIZAAWUGLBDLWSPZAJSCYLOQALAYSEUXEBKNYSJIWQUKELJKYMQPUPLKOLOBVFBOWZHHSVUIAIZFFQJEIAZQUKPOWPHHRALMYIAAGPPQPLDNHFLBLPLVYBLVVQXUUIUFBHDEHCPHUGUM";
for (int x = 0; x < message.length() / 6; x++) {
int index = x * 6;
first = new StringBuilder()
.append(first)
.append(message.charAt(index))
.toString();
}
System.out.println(first);
}
}
Non-text message
In case the raw message is not actual text (like English text that makes sense) or you have no information about its content, you will be out of luck.
This is especially true if the text is actually hashed or double-encrypted, i.e. looks like random data.
Breaking an encryption scheme requires knowledge about the algorithm and the messages. In your situation in particular, you will need to know the general structure of your messages in order to break it.
Prerequisites
For the rest of this answer, let me assume your message is actually plain English text. Note that you can easily adapt my answer to other languages, or even adapt the techniques to other message formats.
Let me also assume that you are talking about classic Vigenere (see Wikipedia) and not about one of its many variants. That means that your input consists only of the letters A to Z: no case, no punctuation, no spaces. Example:
MYNAMEISJOHN // Instead of: My name is John.
The same also applies to your key, it only contains A to Z.
Classic Vigenere then shifts each message letter by the alphabet offset of the matching key letter, modulo the alphabet size (which is 26).
Example:
(G + L) % 26 = R
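The shift rule above can be sketched in a few lines of Java. This is only an illustration under the stated assumptions (upper-case A to Z only, key repeated cyclically); the class name `VigenereSketch` and its method names are made up for this example.

```java
final class VigenereSketch {
    private static final int ALPHABET_SIZE = 26;

    // Shifts every letter forward by the offset of the matching key letter.
    static String encrypt(String plain, String key) {
        return shift(plain, key, +1);
    }

    // Shifts every letter backward by the offset of the matching key letter.
    static String decrypt(String cipher, String key) {
        return shift(cipher, key, -1);
    }

    private static String shift(String text, String key, int sign) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            int letter = text.charAt(i) - 'A';
            int offset = key.charAt(i % key.length()) - 'A';
            // floorMod keeps the result in 0..25 even for negative values
            int shifted = Math.floorMod(letter + sign * offset, ALPHABET_SIZE);
            out.append((char) ('A' + shifted));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("G", "L")); // the example above: G shifted by L
        System.out.println(decrypt(encrypt("MYNAMEISJOHN", "BAC"), "BAC"));
    }
}
```

Using `Math.floorMod` instead of `%` avoids a separate "add 26 if negative" step when decrypting.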
Dictionary
Before we talk about attacks, we need a way to find out whether a generated key is actually correct or not.
Since we know that the message consists of English text, we can just take a dictionary (a huge list of all valid English words) and compare our decrypted message against the dictionary. If the key was wrong, the resulting message will not contain valid words (or only a few).
This can be a bit tricky since we lack punctuation (in particular, there are no spaces).
N-grams
Good thing that there is actually a very accurate way of measuring how valid a text is, which also solves the issue with the missing punctuation.
The technique is called N-grams (see Wikipedia). You choose a value for N, for example 3 (then called tri-grams), and split your text into overlapping groups of 3 characters. Example:
MYNAMEISJOHN // results in the trigrams:
$$M, $MY, MYN, YNA, NAM, AME, MEI, EIS, ISJ, SJO, JOH, OHN, HN$, N$$
What you need now is a frequency analysis of the most common tri-grams in English text. There exist various sources online (or you can run it yourself on a big text corpus).
Then you simply compare your tri-gram frequency to the frequency for real text. Using that, you compute a score of how well your frequency matches the real frequency. If your message contains a lot of very uncommon tri-grams, it is highly likely to be garbage data and not real text.
A small note: mono-grams (1-grams) result in a single-character frequency (see Wikipedia#Letter frequency). Bi-grams (2-grams) are commonly used for cracking Vigenere and yield good results.
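As a rough sketch of the idea, the following Java splits a text into padded N-grams and sums a per-N-gram log-probability. The class name `NGramSketch`, the `$` padding, the `logProb` table and the `floor` value used for unseen N-grams are all assumptions for illustration; a real table would have to be built from a text corpus as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NGramSketch {
    // Splits the text into overlapping n-grams, padded with '$' at both ends.
    static List<String> ngrams(String text, int n) {
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < n - 1; i++) padded.append('$');
        padded.append(text);
        for (int i = 0; i < n - 1; i++) padded.append('$');

        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            result.add(padded.substring(i, i + n));
        }
        return result;
    }

    // Sums log-probabilities; n-grams missing from the table get the floor value.
    // A higher (less negative) score means the text looks more like the corpus.
    static double score(String text, int n, Map<String, Double> logProb, double floor) {
        double sum = 0.0;
        for (String gram : ngrams(text, n)) {
            sum += logProb.getOrDefault(gram, floor);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("MYNAMEISJOHN", 3));
    }
}
```

Comparing the scores of two candidate decryptions is usually enough; the absolute value of a score has no meaning on its own.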
Attacks
Brute-Force
The first and most straightforward attack is always brute-force. As long as the key and the alphabet are not that big, the number of combinations is relatively low.
Your key has length 6 and the alphabet has size 26. So the number of different keys is 26^6, which is
308_915_776
So about 3 * 10^8. This number might appear huge, but do not forget that CPUs already operate in the gigahertz range (on the order of 10^9 simple operations per second, per core). Merely enumerating all keys is a matter of seconds on a single core. The expensive part is decrypting the message and checking the result for every single key, which multiplies the work by the cost of one decryption plus one validity check.
So a naive full brute-force, which decrypts and validates the whole message for every key, can still take a while.
But do not worry. There are simple tricks to speed this up. You do not have to compute the full key. How about limiting yourself to the first 3 characters instead of the full 6 characters? You will only be able to decrypt a subset of the text then, but it is enough to analyze whether the outcome is valid text or not (using dictionaries and N-grams, as mentioned before).
This small change already drastically cuts down computation time since you then only have 26^3 = 17_576 combinations. Even with a validity check per key, that is quick on a single core.
But you can do even more. Some characters are extremely rare in English text, for example Z. You can simply start by not considering keys that would translate to those values in the text. Let us say you remove the 6 least common characters by that, then your combinations are only 20^3 = 8_000. That is fast enough for your average laptop.
Frequency Attack
Enough brute-force, let us do something clever. A letter frequency attack is a very common attack against those encryption schemes. It is simple, extremely fast and very successful. In fact, it is so simple that there are quite some online tools that offer this for free, for example guballa.de/vigenere-solver (it is able to crack your specific example, I just tried it out).
While Vigenere changes the message to unreadable garbage, it does not change the distribution of letters, at least not per digit of the key. So if you look at, let's say, the second digit of your key, then starting from the second letter, every sixth letter (length of the key) in the message is shifted by the exact same offset.
Let us take a look at a simple example. The key is BAC and the message is
CCC CCC CCC CCC CCC // raw
DCE DCE DCE DCE DCE // encrypted
As you notice, the letters repeat. Looking at the third letter, it is always E. So that means that the sixth and ninth letters, which are also E, must all be the exact same original letter, since they were all shifted by the C from the key.
That is a very important observation. It means that letter frequency is preserved among all positions that share the same digit of the key, i.e. the positions i + k * key_length for k = 0, 1, 2, ...
Let us now take a look at the letter distribution in English text (from Wikipedia): E is the most common letter at roughly 12.7%, followed by T at roughly 9.1% and A at roughly 8.2%.
All you have to do now is to split your message into its blocks (modulo key-length) and do a frequency analysis per digit of the blocks.
So for your specific input, this yields the blocks
BYOIZR
LAUMYX
XPFLPW
BZLMLQ
PBJMSC
...
Now you analyze the frequency for digit 1 of each block, then digit 2, and so on, until digit 6. For the first digit, these are the letters
B, L, X, B, P, ...
The result for your specific input is:
[B=0.150, E=0.107, X=0.093, L=0.079, Q=0.079, P=0.071, K=0.064, I=0.050, O=0.050, R=0.043, F=0.036, J=0.036, A=0.029, S=0.029, Y=0.021, Z=0.021, C=0.014, T=0.014, D=0.007, V=0.007]
[L=0.129, O=0.100, H=0.093, A=0.079, V=0.071, Y=0.071, B=0.057, K=0.057, U=0.050, F=0.043, P=0.043, S=0.043, Z=0.043, D=0.029, W=0.029, N=0.021, C=0.014, I=0.014, J=0.007, T=0.007]
[W=0.157, Z=0.093, K=0.079, L=0.079, V=0.079, A=0.071, G=0.071, J=0.064, O=0.050, X=0.050, D=0.043, U=0.043, S=0.036, Q=0.021, E=0.014, F=0.014, N=0.014, M=0.007, T=0.007, Y=0.007]
[M=0.150, P=0.100, Q=0.100, I=0.079, B=0.071, Z=0.071, L=0.064, W=0.064, K=0.057, V=0.043, E=0.036, A=0.029, C=0.029, N=0.029, U=0.021, H=0.014, S=0.014, D=0.007, G=0.007, J=0.007, T=0.007]
[L=0.136, Y=0.100, A=0.086, O=0.086, P=0.086, U=0.086, H=0.064, K=0.057, V=0.050, Z=0.050, S=0.043, J=0.029, M=0.021, T=0.021, W=0.021, G=0.014, I=0.014, B=0.007, C=0.007, N=0.007, R=0.007, X=0.007]
[I=0.129, M=0.107, X=0.100, L=0.086, W=0.079, S=0.064, R=0.057, H=0.050, Q=0.050, K=0.043, E=0.036, C=0.029, T=0.029, V=0.029, F=0.021, J=0.021, P=0.021, G=0.014, Y=0.014, A=0.007, D=0.007, O=0.007]
Look at it. You see that for the first digit the letter B is very common, at 15%, followed by the letter E at about 11%, and so on. There is a high chance that the letter B, for the first digit of the key, is an alias for E in the real text (since E is the most common letter in English text) and that the E stands for the second most common letter, namely T.
Using that, you can easily reverse-compute the letter of the key used for encryption. It is obtained by
(B - E) mod 26 = X
Note that your message distribution might not necessarily align with the real distribution over all English text, especially if the message is not that long (the longer the message, the more accurate the computed distribution) or mainly consists of weird and unusual words.
You can counter that by trying out a few combinations among the most frequent letters of your distribution. So for the first digit you could try out whether
B -> E
E -> E
X -> E
L -> E
Or instead of mapping to E only, also try out the second most common character T:
B -> T
E -> T
X -> T
L -> T
The amount of combinations you get with that is very low. Use dictionaries and N-grams (as mentioned before) to validate whether the key is correct or not.
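To make the candidate step concrete, here is a small hedged sketch that turns the most frequent cipher letters of one key position into candidate key letters. The class `KeyCandidates` and its method names are hypothetical; the inputs are the four top letters for the first digit and the assumed plain letters E and T from the text above.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyCandidates {
    // Key letter that maps the assumed plain letter onto the observed cipher letter.
    static char keyLetter(char cipherLetter, char assumedPlainLetter) {
        return (char) ('A' + Math.floorMod(cipherLetter - assumedPlainLetter, 26));
    }

    // One candidate per (plain letter, cipher letter) pair, plain letters first.
    static List<Character> candidates(char[] topCipherLetters, char[] likelyPlainLetters) {
        List<Character> result = new ArrayList<>();
        for (char plain : likelyPlainLetters) {
            for (char cipher : topCipherLetters) {
                result.add(keyLetter(cipher, plain));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        char[] top = {'B', 'E', 'X', 'L'};   // most frequent letters for the first digit
        char[] plain = {'E', 'T'};           // most common letters in English text
        System.out.println(candidates(top, plain)); // prints [X, A, T, H, I, L, E, S]
    }
}
```

With 8 candidates per position and 6 positions, there are at most 8^6 = 262,144 keys to validate, far fewer than the full key space.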
Java Implementation
Your message is actually very interesting. It aligns very well with the real letter frequency over English text. So for your particular case you actually do not need to try out any combinations, nor do you need to do any dictionary/n-gram checks. You can just translate the most common letter in your encrypted message (per digit) to the most common character in English text, E, and get the actual key.
Since that is so simple and trivial, here is a full implementation in Java for what I explained before step by step, with some debug outputs (it is a quick prototype, not really nicely structured):
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public final class CrackViginere {
private static final int ALPHABET_SIZE = 26;
private static final char FIRST_CHAR_IN_ALPHABET = 'A';
public static void main(final String[] args) {
String encrypted =
"BYOIZRLAUMYXXPFLPWBZLMLQPBJMSCQOWVOIJPYPALXCWZLKXYVMKXEHLIILLYJMUGBVXBOIRUAVAEZAKBHXBDZQJLELZIKMKOWZPXBKOQALQOWKYIBKGNTCPAAKPWJHKIAPBHKBVTBULWJSOYWKAMLUOPLRQOWZLWRSLEHWABWBVXOLSKOIOFSZLQLYKMZXOBUSPRQVZQTXELOWYHPVXQGDEBWBARBCWZXYFAWAAMISWLPREPKULQLYQKHQBISKRXLOAUOIEHVIZOBKAHGMCZZMSSSLVPPQXUVAOIEHVZLTIPWLPRQOWIMJFYEIAMSLVQKWELDWCIEPEUVVBAZIUXBZKLPHKVKPLLXKJMWPFLVBLWPDGCSHIHQLVAKOWZSMCLXWYLFTSVKWELZMYWBSXKVYIKVWUSJVJMOIQOGCNLQVXBLWPHKAOIEHVIWTBHJMKSKAZMKEVVXBOITLVLPRDOGEOIOLQMZLXKDQUKBYWLBTLUZQTLLDKPLLXKZCUKRWGVOMPDGZKWXZANALBFOMYIXNGLZEKKVCYMKNLPLXBYJQIPBLNMUMKNGDLVQOWPLEOAZEOIKOWZZMJWDMZSRSMVJSSLJMKMQZWTMXLOAAOSTWABPJRSZMYJXJWPHHIVGSLHYFLPLVXFKWMXELXQYIFUZMYMKHTQSMQFLWYIXSAHLXEHLPPWIVNMHRAWJWAIZAAWUGLBDLWSPZAJSCYLOQALAYSEUXEBKNYSJIWQUKELJKYMQPUPLKOLOBVFBOWZHHSVUIAIZFFQJEIAZQUKPOWPHHRALMYIAAGPPQPLDNHFLBLPLVYBLVVQXUUIUFBHDEHCPHUGUM";
int keyLength = 6;
char mostCommonCharOverall = 'E';
// Blocks
List<String> blocks = new ArrayList<>();
for (int startIndex = 0; startIndex < encrypted.length(); startIndex += keyLength) {
int endIndex = Math.min(startIndex + keyLength, encrypted.length());
String block = encrypted.substring(startIndex, endIndex);
blocks.add(block);
}
System.out.println("Individual blocks are:");
blocks.forEach(System.out::println);
// Frequency
List<Map<Character, Integer>> digitToCounts = Stream.generate(HashMap<Character, Integer>::new)
.limit(keyLength)
.collect(Collectors.toList());
for (String block : blocks) {
for (int i = 0; i < block.length(); i++) {
char c = block.charAt(i);
Map<Character, Integer> counts = digitToCounts.get(i);
counts.compute(c, (character, count) -> count == null ? 1 : count + 1);
}
}
List<List<CharacterFrequency>> digitToFrequencies = new ArrayList<>();
for (Map<Character, Integer> counts : digitToCounts) {
int totalCharacterCount = counts.values()
.stream()
.mapToInt(Integer::intValue)
.sum();
List<CharacterFrequency> frequencies = new ArrayList<>();
for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
double frequency = entry.getValue() / (double) totalCharacterCount;
frequencies.add(new CharacterFrequency(entry.getKey(), frequency));
}
Collections.sort(frequencies);
digitToFrequencies.add(frequencies);
}
System.out.println("Frequency distribution for each digit is:");
digitToFrequencies.forEach(System.out::println);
// Guessing
StringBuilder keyBuilder = new StringBuilder();
for (List<CharacterFrequency> frequencies : digitToFrequencies) {
char mostFrequentChar = frequencies.get(0)
.getCharacter();
int keyInt = mostFrequentChar - mostCommonCharOverall;
keyInt = keyInt >= 0 ? keyInt : keyInt + ALPHABET_SIZE;
char key = (char) (FIRST_CHAR_IN_ALPHABET + keyInt);
keyBuilder.append(key);
}
String key = keyBuilder.toString();
System.out.println("The guessed key is: " + key);
System.out.println("Decrypted message:");
System.out.println(decrypt(encrypted, key));
}
private static String decrypt(String encryptedMessage, String key) {
StringBuilder decryptBuilder = new StringBuilder(encryptedMessage.length());
int digit = 0;
for (char encryptedChar : encryptedMessage.toCharArray())
{
char keyForDigit = key.charAt(digit);
int decryptedCharInt = encryptedChar - keyForDigit;
decryptedCharInt = decryptedCharInt >= 0 ? decryptedCharInt : decryptedCharInt + ALPHABET_SIZE;
char decryptedChar = (char) (decryptedCharInt + FIRST_CHAR_IN_ALPHABET);
decryptBuilder.append(decryptedChar);
digit = (digit + 1) % key.length();
}
return decryptBuilder.toString();
}
private static class CharacterFrequency implements Comparable<CharacterFrequency> {
private final char character;
private final double frequency;
private CharacterFrequency(final char character, final double frequency) {
this.character = character;
this.frequency = frequency;
}
@Override
public int compareTo(final CharacterFrequency o) {
return -1 * Double.compare(frequency, o.frequency);
}
private char getCharacter() {
return character;
}
private double getFrequency() {
return frequency;
}
@Override
public String toString() {
return character + "=" + String.format("%.3f", frequency);
}
}
}
Decrypted
Using above code, the key is:
XHSIHE
And the full decrypted message is:
ERWASNOTCERTAINDISESTEEMSURELYTHENHEMIGHTHAVEREGARDEDTHATABHORRENCEOFTHEUNINTACTSTATEWHICHHEHADINHERITEDWITHTHECREEDOFMYSTICISMASATLEASTOPENTOCORRECTIONWHENTHERESULTWASDUETOTREACHERYAREMORSESTRUCKINTOHIMTHEWORDSOFIZZHUETTNEVERQUITESTILLEDINHISMEMORYCAMEBACKTOHIMHEHADASKEDIZZIFSHELOVEDHIMANDSHEHADREPLIEDINTHEAFFIRMATIVEDIDSHELOVEHIMMORETHANTESSDIDNOSHEHADREPLIEDTESSWOULDLAYDOWNHERLIFEFORHIMANDSHEHERSELFCOULDDONOMOREHETHOUGHTOFTESSASSHEHADAPPEAREDONTHEDAYOFTHEWEDDINGHOWHEREYESHADLINGEREDUPONHIMHOWSHEHADHUNGUPONHISWORDSASIFTHEYWEREAGODSANDDURINGTHETERRIBLEEVENINGOVERTHEHEARTHWHENHERSIMPLESOULUNCOVEREDITSELFTOHISHOWPITIFULHERFACEHADLOOKEDBYTHERAYSOFTHEFIREINHERINABILITYTOREALIZETHATHISLOVEANDPROTECTIONCOULDPOSSIBLYBEWITHDRAWNTHUSFROMBEINGHERCRITICHEGREWTOBEHERADVOCATECYNICALTHINGSHEHADUTTEREDTOHIMSELFABOUTHERBUTNOMANCANBEALWAYSACYNI
Which is more or less valid English text:
er was not certain disesteem surely then he might have regarded that
abhorrence of the unintact state which he had inherited with the creed
of mysticism as at least open to correction when the result was due to
treachery a remorse struck into him the words of izz huett never quite
stilled in his memory came back to him he had asked izz if she loved
him and she had replied in the affirmative did she love him more than
tess did no she had replied tess would lay down her life for him and she
herself could do no more he thought of tess as she had appeared on the day
of the wedding how her eyes had lingered upon him how she had hung upon
his words as if they were a gods and during the terrible evening over
the hearth when her simple soul uncovered itself to his how pitiful her
face had looked by the rays of the fire in her inability to realize that
his love and protection could possibly be withdrawn thus from being her
critic he grew to be her advocate cynical things he had uttered to
himself about her but no man can be always a cyni
Which, by the way, is a quote from the British novel Tess of the d'Urbervilles: A Pure Woman Faithfully Presented. Phase the Sixth: The Convert, Chapter XLIX.
Standard Vigenere interleaves Caesar shift ciphers, specified by the key. If the Vigenere key is six characters long, then letters 1, 7, 13, ... of the ciphertext share one Caesar shift: every sixth character uses the first character of the key. Letters 2, 8, 14, ... of the ciphertext use a (in general) different Caesar shift, and so on.
That gives you six different Caesar shift ciphers to solve. Each of these sub-texts will not read as English, since it consists of every sixth letter only, so you will need to solve it by letter frequency. That will give you a few good options for each position of the key. Try them in order of probability to see which gives the correct decryption.
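A short sketch of that split, assuming upper-case A to Z input; the class `ColumnSketch` and its helper names are hypothetical. `column` collects the letters that share one key position, and `mostFrequent` gives the first guess for the letter that stands for E in that column.

```java
final class ColumnSketch {
    // Collects every keyLength-th letter, starting at the given key position (0-based).
    static String column(String ciphertext, int position, int keyLength) {
        StringBuilder letters = new StringBuilder();
        for (int i = position; i < ciphertext.length(); i += keyLength) {
            letters.append(ciphertext.charAt(i));
        }
        return letters.toString();
    }

    // Most frequent letter of a column; ties go to the letter earlier in the alphabet.
    static char mostFrequent(String column) {
        int[] counts = new int[26];
        for (char c : column.toCharArray()) {
            counts[c - 'A']++;
        }
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return (char) ('A' + best);
    }

    public static void main(String[] args) {
        String start = "BYOIZRLAUMYXXPFLPWBZLMLQPBJMSC"; // first 30 letters of the message
        System.out.println(column(start, 0, 6)); // prints BLXBP
    }
}
```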
I'm trying to create a program, that will "create" a series of characters over and over, and compare them to a keyword (unknown to the user or computer). This is very similar to a "brute force" attack if you will, except this will logically build out every single letter it can.
The other thing, is that I've temporarily built this code to handle JUST 5 letter words, and have it broken out into a "value" 2D string array. I have this as a very temporary solution, to help logically discover what it is that my code is doing, before I throw it into super-dynamic and complex for-loops.
public class Sample{
static String key, keyword = "hello";
static String[] list = {"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","1","2","3","3","4","5","6","7","8","9"};
int keylen = 5; // Eventually, this will be thrown into a for-loop, to get dynamic "keyword" sizes. (Will test to every word, more/less than 5 characters eventually)
public static void main(String[] args) {
String[] values = {"a", "a", "a", "a", "a"}; // More temporary hardcodes. If I can figure out the for loop, the rest can be set to dynamic values.
int changeout_pos = 0;
int counter = 0;
while(true){
if (counter == list.length){ counter = 0; changeout_pos++; } // Swap out each letter we have in list, in every position once.
// Try to swap them. (Try/catch is temporary lazy way of forcing the computer to say "we've gone through all possible combinations")
try { values[changeout_pos] = list[counter]; } catch (Exception e) { break; }
// Add up all the values in their respectful positions. Again, will be dynamic (and in a for-loop) once figured out.
key = values[0] + values[1] + values[2] + values[3] + values[4];
System.out.println(key); // Temporarily print it.
if (key.equalsIgnoreCase(keyword)){ break; } // If it matches our lovely keyword, then we're done. We've done it!
counter ++; // Try another letter.
}
System.out.println("Done! \nThe keyword was: " + key); // Should return what "Keyword" is.
}
}
My goal is to have the output look like this: (For five letter example)
aaaaa
aaaab
aaaac
...
aaaba
aaabb
aaabc
aaabd
...
aabaa
aabab
aabac
...
So on and so forth. By running this code now however, it is not what I was hoping for. Now, it will go:
aaaaa
baaaa
caaaa
daaaa
... (through until 9)
9aaaa
9baaa
9caaa
9daaa
...
99aaa
99baa
99caa
99daa
... (Until it hits 99999 without finding the "keyword")
Any help appreciated. I'm really struggling to solve this puzzle.
First of all, your alphabet is missing 0 (zero) and z. It also has 3 twice.
Second, the number of five-letter words using 36 possible characters is 60,466,176. The equation is (size of alphabet)^(length of word). In this case, that is 36^5. I ran your code, and it's only generating 176 permutations.
On my machine, with a basic implementation of five nested for loops, each iterating over the alphabet, it took 144 seconds to generate and print all the permutations. So, if you're getting quick results, you should check what's being generated.
Of course, manually nesting for loops isn't a valid solution for when you want the length of the word to be variable, so you still have some work to do. However, my advice would be to pay attention to the details and validate your assumptions!
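One way to avoid hand-written nested loops is to treat each candidate as a number written in base (size of alphabet). The sketch below is not the asker's or the answerer's code; the class `KeyGenerator` and its methods are hypothetical, and the alphabet is the corrected one with z and 0 included.

```java
final class KeyGenerator {
    // (size of alphabet)^(length of word)
    static long count(int alphabetSize, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total *= alphabetSize;
        }
        return total;
    }

    // Writes index in base alphabet.length, rightmost position changing fastest.
    static String nth(long index, char[] alphabet, int length) {
        char[] out = new char[length];
        for (int pos = length - 1; pos >= 0; pos--) {
            out[pos] = alphabet[(int) (index % alphabet.length)];
            index /= alphabet.length;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        char[] alphabet = "abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
        String keyword = "hello";
        long total = count(alphabet.length, keyword.length());
        for (long i = 0; i < total; i++) {
            String candidate = nth(i, alphabet, keyword.length());
            if (candidate.equals(keyword)) {
                System.out.println("Done! The keyword was: " + candidate);
                break;
            }
        }
    }
}
```

Because the word length is just a parameter, the same loop works for keywords of any length, and the candidates come out in the order aaaaa, aaaab, aaaac, ...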
Good luck.
I've been dealing with the following recursion question for a while now and haven't been able to figure it out. Basically, you have some sort of a sentence made out of certain words, where all the words are just jammed together, not spaced out. The idea is to find the number of all possible combinations of words that can be used to create the sentence.
For example,
Words: ook, ookook
Sentence: ookookook
Solution: {ook, ook, ook}, {ookook, ook}, {ook, ookook}.
Another example:
Words: ooga, oogam, oogum, mook, ook
Sentence: oogamookoogumook
Solution: {ooga, mook, oogum, ook}, {oogam, ook, oogum, ook}
I've tried a lot of things, finally giving up and trying to do it manually...
public static int WAYS(String word) {
int ways = 1;
for (int i = 0; i < word.length(); i++) {
try{
if(word.substring(i, i - 2).equals("ug")){
if(word.substring(i - 4, i - 2).equals("ug")){
ways++;
}
}
else if(word.substring(i, i - 3).contains("ook")){
System.out.println(word.substring(i-6, i-3));
if(word.substring(i - 6, i - 3).equals("ook")){
ways++;
}
if(word.charAt(i - 4) == 'm'){
if(word.substring(i - 8, i - 4).equals("ooga") || word.substring(i - 8, i - 4).equals("oogu")){
ways++;
}
}
}
else if(word.substring(i, i - 4).contains("mook")){
if(word.substring(i - 8, i - 4).contains("mook")){
ways++;
}
}
if(word.substring(i, i - 2).equals("oog")){
if(word.charAt(i + 2) == 'm'){
if(word.charAt(i + 1) == 'a' || word.charAt(i + 1) == 'u'){
ways++;
}
}
}
} catch(Exception e){
continue;
}
}
return ways;
}
But it hasn't worked. Could somebody please give me an idea or a sample on approaching this problem using recursion?
1) Name your methods properly: "WAYS" is a constant name, not a method name.
2) Provide runnable code, especially in cases where it is so short.
3) Never use exceptions for control flow.
4) You are using magic values like "ug" and "ook" in your code. Does this look simple and obvious? Does this look maintainable? What is this supposed to look like if you get a lexicon with a million different words?
Edit: giving the complete listing is somehow boring, so I left a few gaps. Try to fill those, hope that helps.
public class JammedWords {
public static int ways(String sentence, String[] words) {
if (sentence.isEmpty()) {
// The trivial case: the sentence is empty. Return a single number.
} else {
int c = 0;
for (String w: words) {
if (sentence.startsWith(w)) {
// call method recursively, update counter `c`.
}
}
return c;
}
}
public static void main(String[] args) {
System.out.println(ways("ookookook", new String[]{"ook", "ookook"}));
System.out.println(ways("oogamookoogumook", new String[]{"ooga","oogam","oogum","mook","ook"}));
}
}
Hints:
A) Understand the difference between empty set, set containing the empty set, set containing a set containing an empty set etc. Sets that contain empty sets are of course not empty, and their size is not 0.
B) There is a handy method String.substring(n) that drops everything before the 'n'-th character. And there is String.length() to get size of words.
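For readers who want to check their own attempt, here is one possible way to fill the gaps, following the two hints. It is a sketch, not necessarily what the answerer had in mind; the class name `JammedWordsSketch` is made up, and it assumes that the word list contains no empty string (otherwise the recursion would never terminate).

```java
final class JammedWordsSketch {
    static int ways(String sentence, String[] words) {
        if (sentence.isEmpty()) {
            // Exactly one way to build the empty sentence: use no words at all.
            return 1;
        }
        int c = 0;
        for (String w : words) {
            if (sentence.startsWith(w)) {
                // Drop the matched word and count the ways to build the rest.
                c += ways(sentence.substring(w.length()), words);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(ways("ookookook", new String[]{"ook", "ookook"})); // prints 3
        System.out.println(ways("oogamookoogumook",
                new String[]{"ooga", "oogam", "oogum", "mook", "ook"})); // prints 2
    }
}
```

For long sentences the same suffix is recomputed many times; caching the result per remaining suffix length would avoid that.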
I hope you won't mind VB.NET code; it is just to convey the idea.
Private Sub Go()
Dim words As New List(Of String)
words.Add("ooga")
words.Add("oogam")
words.Add("oogum")
words.Add("mook")
words.Add("ook")
Search("oogamookoogumook", words, "", New List(Of String))
End Sub
Private Sub Search(ByVal sentence As String, _
ByVal wordList As List(Of String), _
ByVal actualSentenceBuildingState As String, _
ByVal currentPath As List(Of String))
For Each word As String In wordList
Dim actualSentenceAttemp As String
Dim thisPath As New List(Of String)(currentPath)
thisPath.Add(word)
actualSentenceAttemp = actualSentenceBuildingState + word
If actualSentenceAttemp = sentence Then
Debug.Print("Found: " + String.Join("->", thisPath.ToArray()))
End If
If actualSentenceAttemp.Length < sentence.Length Then 'if we are not too far, we can continue
Search(sentence, wordList, actualSentenceAttemp, thisPath)
End If
Next
End Sub
Printouts:
Sentence: oogamookoogumook
Found: ooga->mook->oogum->ook
Found: oogam->ook->oogum->ook
Sentence: ookookook
Found: ook->ook->ook
Found: ook->ookook
Found: ookook->ook
Think about it as walking in a graph (it is nothing else than that, in fact). You start with nothing (empty string). Now you iteratively add words from the word list to your current attempt for the sentence. After adding a word to the current attempt, you can end up in only three possible states: (1) you got the final sentence, (2) the current attempt is shorter than the target sentence and thus still suitable for adding further words (recursive call), or (3) your current attempt is longer than the target sentence (or the same length but not equal), so there is no point in continuing the search with it.
What you have to remember is the path, the "how did I get here?" list (backtracking).
I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced worked providing that the word that the answer begins with has not occurred previously. In the 2nd example here, my code will fail because dog has already appeared before where the answer begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.
You could concatenate the strings together and search with the new string.
String a = "dog";
String b = "cat";
String c = a + b; // c is "dogcat"
Like this you would overcome the problem of dog appearing somewhere else.
But this wouldn't work if catdog is a valid value too.
Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.
Perhaps not the best optimized version, but how about following theory to give you some ideas:
Count length of all words in row.
Take random word from list and find the starting index of its first
occurence.
Take a substring with length counted above before and after that
index (e.g. if index is 15 and 3 words of 4 letters long, take
substring from 15-8 to 15+11).
Make a copy of the word list with earlier random word removed.
Check the appending/prepending [word_length] letters to see if they
match a new word on the list.
If word matches copy of list, remove it from copy of list and move further
If all words found, break loop.
If not all words found, find starting index of next occurence of
earlier random word and go back to 3.
Why it would help:
Which word you pick to begin with wouldn't matter, since every word
needs to be in the succcessful match anyway.
You don't have to manually loop through a lot of the characters,
unless there are lots of near complete false matches.
As a supposed match keeps growing, you have less words on the list copy left to compare to.
You can also keep track of the furthest index you've gone to, so you can
sometimes limit the backwards length of the picked substring (as it
cannot overlap with where you've already been, if the occurrences are
close to each other).
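One way to make the idea concrete, as a sketch only: the version below simplifies the word-by-word growing step. For each occurrence of the anchor word, it tests every word-aligned window that contains that occurrence. It assumes all words have the same non-zero length; the names `AnchorWindowSearch`, `find` and `isPermutation` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class AnchorWindowSearch {
    /**
     * Returns a start index where text contains all words back to back
     * (each used exactly once, in any order), or -1 if there is none.
     */
    static int find(String text, List<String> words) {
        int n = words.size();
        int len = words.get(0).length();
        String anchor = words.get(0); // any word works; it must appear in a match
        for (int at = text.indexOf(anchor); at >= 0; at = text.indexOf(anchor, at + 1)) {
            // The anchor may sit in any of the n word slots of a matching window.
            for (int k = 0; k < n; k++) {
                int start = at - k * len;
                if (start < 0) {
                    break; // window would begin before the text
                }
                if (start + n * len > text.length()) {
                    continue; // window would run past the end of the text
                }
                if (isPermutation(text, start, words, len)) {
                    return start;
                }
            }
        }
        return -1;
    }

    /** Checks whether the n consecutive chunks starting at start use each word exactly once. */
    static boolean isPermutation(String text, int start, List<String> words, int len) {
        List<String> remaining = new ArrayList<>(words); // shrinking copy of the list
        for (int i = 0; i < words.size(); i++) {
            int from = start + i * len;
            if (!remaining.remove(text.substring(from, from + len))) {
                return false; // chunk is not a word that is still unused
            }
        }
        return true;
    }
}
```

For example, `find("zzefabcdzz", List.of("ab", "cd", "ef"))` finds the anchor "ab" at index 4, rejects the window starting there, and accepts the window starting at index 2.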
I am developing a high-level Petri net editor / simulator. First, here is a little vocabulary:
circle = place
rectangle = transition
integers in place = tokens
condition in transition = guard
I am stuck at passing the guard of the transition. A guard is a condition that needs to be true if you want to execute the transition. I know that I should use backtracking somehow, but I don't know the number of places entering the transition before the program starts, so I can't use for loops, since I don't know how many of them I will need.
Here is the picture that illustrates the problem
So, I want to take the first token from the first place and the first token from the second place, then try to pass the guard. If it passes, save the tokens and break the loop; if not, continue with the second token of the second place, and so on.
I finally pass the guard with the last token (4) of the first place and the last token (2) of the second place.
I would know how to code this if I had a constant number of places entering the transition. It would look like this:
for token in place 1
    for token in place 2
        try pass guard
        if (passed)
            save tokens
            break;
But as I said before, I don't have a constant number of places entering the transition, so I can't use this approach.
So, basically, I need to try combinations of tokens and try to pass the guard, until I either pass the guard or have tried all combinations.
Do you have any ideas? Pseudocode would be enough.
By the way, I use these data structures:
list of places - normal java list, List places = new ArrayList();
and each place has its own list of tokens, List tokens = new ArrayList();
///EDIT:
the guard has following format:
op1 rel op2,
where op1 is variable, and op2 is constant or variable, rel is relation (<,>,=,...)
there can be several conditions in guard connected with the logical operator AND - example:
op1 rel op2 && op3 rel op4 ...
----EDIT:
So I tried to implement Rushil's algorithm, but it is quite buggy, so I'm posting an SSCCE so you can try it and maybe help a little.
First, create the Place class:
import java.util.ArrayList;
import java.util.List;

public class Place {
    public List<Integer> tokens;

    // constructor
    public Place() {
        this.tokens = new ArrayList<Integer>();
    }
}
And then the testing class:
import java.util.ArrayList;
import java.util.List;

public class TestyParmutace {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<Place> places = new ArrayList<Place>();
        Place place1 = new Place();
        place1.tokens.add(1);
        place1.tokens.add(2);
        place1.tokens.add(3);
        places.add(place1); // add place to the list
        Place place2 = new Place();
        place2.tokens.add(3);
        place2.tokens.add(4);
        place2.tokens.add(5);
        places.add(place2); // add place to the list
        Place place3 = new Place();
        place3.tokens.add(6);
        place3.tokens.add(7);
        place3.tokens.add(8);
        places.add(place3); // add place to the list
        // so we have
        // P1 = {1,2,3}
        // P2 = {3,4,5}
        // P3 = {6,7,8}
        List<Integer> tokens = new ArrayList<Integer>();
        Func(places, 0, tokens);
    }

    /**
     * @param places list of places
     * @param index index of current place
     * @param tokens list of tokens
     * @return true if we passed guard, false if we did not
     */
    public static boolean Func(List<Place> places, int index, List<Integer> tokens)
    {
        if (index >= places.size())
        {
            // if control reaches here, it means that we've recursed through a particular combination
            // ( consisting of exactly 1 token from each place ), and there are no more "places" left
            String outputTokens = "";
            for (int i = 0; i < tokens.size(); i++) {
                outputTokens += tokens.get(i) + ",";
            }
            System.out.println("Tokens: " + outputTokens);
            if (tokens.get(0) == 4 && tokens.get(1) == 5 && tokens.get(2) == 10) {
                System.out.println("we passed the guard with " + outputTokens);
                return true;
            }
            else {
                // remove by index: remove(Object) would delete the first element with an equal value
                tokens.remove(tokens.size() - 1);
                return false;
            }
        }
        Place p = places.get(index);
        for (int i = 0; i < p.tokens.size(); i++)
        {
            tokens.add(p.tokens.get(i));
            //System.out.println("Pridali sme token:" + p.tokens.get(i));
            if (Func(places, index + 1, tokens)) return true;
        }
        if (tokens.size() > 0)
            tokens.remove(tokens.size() - 1); // again by index, not by value
        return false;
    }
}
and here is the output of this code:
Tokens: 1,3,6,
Tokens: 1,3,7,
Tokens: 1,3,8,
Tokens: 3,4,6,
Tokens: 3,4,7,
Tokens: 3,4,8,
Tokens: 4,5,6,
Tokens: 4,5,7,
Tokens: 4,5,8,
Tokens: 2,3,6,
Tokens: 2,3,7,
Tokens: 2,3,8,
Tokens: 3,4,6,
Tokens: 3,4,7,
Tokens: 3,4,8,
Tokens: 4,5,6,
Tokens: 4,5,7,
Tokens: 4,5,8,
Tokens: 3,3,6,
Tokens: 3,3,7,
Tokens: 3,3,8,
Tokens: 3,4,6,
Tokens: 3,4,7,
Tokens: 3,4,8,
Tokens: 4,5,6,
Tokens: 4,5,7,
Tokens: 4,5,8,
So, you see, some combinations are correct, like 1,3,6 and 1,3,7, but 4,5,8 is absolute nonsense, since 4 is not even in the first place, and there are also combinations that are missing completely, like 2,4,6. Does anybody see why it behaves like this?
EDIT: Now it's working fine.
A recursive approach would make it easy:
boolean Func( ListOfPlaces places, int index ) // index points to the current "place"
{
    if index >= NumberOfPlaces (i.e. index points to a place which doesn't exist)
    {
        // if control reaches here, it means that we've recursed through a particular combination ( consisting of exactly 1 token from each place ), and there are no more "places" left. You have all the tokens you need, in your stack.
        try pass guard; if passed, save tokens and return true
        else, remove token last added to the stack and return false
    }
    place p = places[index]
    foreach token in p
    {
        add token to your stack
        if ( Func( places, index+1 ) ) return true
    }
    remove last token added to the stack (if any)
    return false
}
Call the function initially with index = 0.
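For reference, here is a sketch of the same recursion in Java. It is not a drop-in for the question's classes: places are modelled as plain lists of integers, the guard is passed in as a `java.util.function.Predicate`, and the names `GuardSearch`, `search` and `chosen` are invented for the example. It also differs from the pseudocode in one detail: each level removes the token it added itself, so the cleanup sits in one place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class GuardSearch {
    /**
     * Tries one token from each place until the guard accepts a combination.
     * On success, chosen.get(i) is the token taken from places.get(i).
     */
    static boolean search(List<List<Integer>> places, int index,
                          List<Integer> chosen, Predicate<List<Integer>> guard) {
        if (index == places.size()) {
            return guard.test(chosen); // one token per place has been picked
        }
        for (Integer token : places.get(index)) {
            chosen.add(token);
            if (search(places, index + 1, chosen, guard)) {
                return true; // keep the accepted tokens in chosen
            }
            chosen.remove(chosen.size() - 1); // undo by index, not by value
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<Integer>> places = List.of(
                List.of(1, 2, 3), List.of(3, 4, 5), List.of(6, 7, 8));
        List<Integer> chosen = new ArrayList<>();
        boolean passed = search(places, 0, chosen,
                t -> t.get(0) == 3 && t.get(1) == 5 && t.get(2) == 8);
        System.out.println(passed + " " + chosen); // prints true [3, 5, 8]
    }
}
```

Note the `remove(chosen.size() - 1)` call: on a `List<Integer>`, passing an `Integer` instead of an `int` selects `remove(Object)`, which deletes the first element with an equal value rather than the last element.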
Hope this helps. :-)
You could do the loop administration yourself. What I mean is: you would need a class to depict the iteration status for each place. Let's call it state_of_place. It would consist of two values: a current_index and a max_index.
Next you would have a class I would name iteration_admin, which consists of an array of state_of_place and a boolean called something like iteration_in_progress. Upon creation, the boolean is set to TRUE. You would create as many state_of_place objects as there are places. Current_index would be 0, max_index would be the index of the last token on that place (the number of tokens minus one).
The iteration_admin class needs a method to represent the increment of loop variables. Let's call it increment(). This method would increment the current_index of the state_of_place element with the highest index, if the current_index is still below the max_index.
If the current_index is equal to the max_index, the current index is set to 0 and the current index of the state_of_place with the next lower index needs to be incremented.
If that one has reached its max_index, it will be set to 0 and the next lower one will be incremented, and so on.
The only exception, of course, is state_of_place[0]. If that element's current_index would exceed its max_index, the boolean iteration_in_progress will be set to FALSE. This would mean that all combinations of tokens have been used.
Now, your code for trying out the guard would
initialize an object of type iteration_admin
while iteration_admin.iteration_in_progress is TRUE
    build the argument list for the pass() method by using the current_index values in the state_of_place elements
    call pass()
    if passed, leave the loop; if not passed, call the iteration_admin.increment() method
end while
EDIT:
Trying to express the idea in pseudo code. I fear it looks more like a mix of Java and PL/SQL than abstract pseudo code. Still, it should be somewhat clearer than my text description.
// iteration state for one place
class state_of_a_place
{
    integer current_index;
    integer max_index;
}
// iteration administration for one transition
class iteration_admin
{
    boolean iteration_in_progress
    state_of_a_place[] state_of_places
    procedure increment
    {
        // move index through tokens
        FOR i IN state_of_places.count-1 .. 0 LOOP
            IF state_of_places[i].current_index < state_of_places[i].max_index THEN
                state_of_places[i].current_index += 1
                return
            ELSE
                state_of_places[i].current_index = 0
                IF i = 0 THEN
                    iteration_in_progress = FALSE
                END IF
            END IF
        END FOR
    }
}
handle_transition (list_of_places)
{
    // initialize an object of type iteration_admin
    iteration_admin ia
    ia.iteration_in_progress = TRUE
    FOR i IN 0..list_of_places.count-1 LOOP
        ia.state_of_places[i].current_index = 0
        // max_index is the index of the last token, so current_index never runs past the end
        ia.state_of_places[i].max_index = list_of_places[i].number_of_tokens - 1
    END FOR
    WHILE ia.iteration_in_progress LOOP
        // build the argument list for the pass() method
        token[] arguments
        FOR i IN 0..list_of_places.count-1 LOOP
            arguments[i] = list_of_places[i].token[ia.state_of_places[i].current_index]
        END FOR
        // try to pass the guard
        call pass(arguments)
        IF (passed)
            // do whatever you need to do here, then stop iterating
            EXIT
        ELSE
            ia.increment()
        END IF
    END WHILE
}
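The same bookkeeping can be written compactly in Java with an `int[]` of current indices that is advanced like an odometer. This is only a sketch under assumptions: places are plain lists of integers, the guard is a `Predicate`, and `OdometerSearch` / `firstPassing` are hypothetical names rather than the classes described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class OdometerSearch {
    /** Returns the first combination the guard accepts, or null if none passes. */
    static List<Integer> firstPassing(List<List<Integer>> places,
                                      Predicate<List<Integer>> guard) {
        int n = places.size();
        for (List<Integer> place : places) {
            if (place.isEmpty()) {
                return null; // an empty place means no combination exists
            }
        }
        int[] current = new int[n]; // current token index per place, all 0
        while (true) {
            // Build the argument list from the current indices.
            List<Integer> arguments = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                arguments.add(places.get(i).get(current[i]));
            }
            if (guard.test(arguments)) {
                return arguments;
            }
            // Increment: reset exhausted places from the right, then bump the next one.
            int i = n - 1;
            while (i >= 0 && current[i] == places.get(i).size() - 1) {
                current[i] = 0;
                i--;
            }
            if (i < 0) {
                return null; // place 0 overflowed: every combination was tried
            }
            current[i]++;
        }
    }
}
```

Compared with the recursive versions, this keeps the whole iteration state in one array, which can be convenient if the search has to be paused and resumed.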
What about something like this:
method getPassed(places, tokens):
    if places are empty:
        try pass guard
        if (passed)
            save tokens
            return true
        else return false
    else:
        for token in places[0]:
            if getPassed(places[1:].clone(), tokens.clone().append(token)):
                return true
        return false
Start it with call getPassed(places, []), where places is a list of places and [] is empty list. Note that you need to copy the lists always, so that you don't end up messing them up.
In the end, no need for pairs. If you keep the original places list you pass into the algorithm at the beginning, you know that token[i] was selected for originalPlaces[i].
But if you want to, you can keep tokenPlaces pairs instead of tokens, so something like this:
method getPassed(places, tokenPlacePairs):
    if places are empty:
        try pass guard
        if (passed)
            save tokens
            return true
        else return false
    else:
        for token in places[0]:
            if getPassed(places[1:].clone(), tokenPlacePairs.clone().append((token, places[0]))):
                return true
        return false
EDIT: Still some confusion, hopefully this will make it clear. I am trying to generate the for loops recursively. So if places has only 2 elements, you get as you suggested:
for token in place 1
    for token in place 2
        try pass guard
        if (passed)
            save tokens
            break;
So what it does is take the first place from the list and create the "for token in place 1" loop. Then it cuts off that place from the places list and adds the current token to the tokens list. This recursive call now does the "for token in place 2" loop. And so on. With every recursive call we decrease the number of places by 1 and create 1 for loop. Hence, once the places list is empty we have n nested loops, where n is the number of places, and as far as I understand, this is what you were looking for.
You can initiate the method in the following way:
originalPlaces = places.clone()
getPassed(places, [])
This way you can keep originalPlaces unchanged and you can assign tokens[i] to originalPlaces[i] when you get to the base case of the recursion, i.e. when you check whether the guard passes. Hence you do not really need the pairs.
Assume Transition has an isEnabled() method as well as input/outputArcs:
public boolean isEnabled() {
    // check for some special input/output conditions (no arcs, etc.)
    // return false if invalid
    // check to see if all input arcs are enabled
    for (Arc inputArc : inputArcs)
        if (!inputArc.isEnabled())
            return false;
    // should check if there's a guard first...
    return guard.evaluate(); // do the selection of tokens from inputs here and evaluate
}