Problem
I want to decode a message encrypted with classic Vigenere. I know that the key has a length of exactly 6 characters.
The message is:
BYOIZRLAUMYXXPFLPWBZLMLQPBJMSCQOWVOIJPYPALXCWZLKXYVMKXEHLIILLYJMUGBVXBOIRUAVAEZAKBHXBDZQJLELZIKMKOWZPXBKOQALQOWKYIBKGNTCPAAKPWJHKIAPBHKBVTBULWJSOYWKAMLUOPLRQOWZLWRSLEHWABWBVXOLSKOIOFSZLQLYKMZXOBUSPRQVZQTXELOWYHPVXQGDEBWBARBCWZXYFAWAAMISWLPREPKULQLYQKHQBISKRXLOAUOIEHVIZOBKAHGMCZZMSSSLVPPQXUVAOIEHVZLTIPWLPRQOWIMJFYEIAMSLVQKWELDWCIEPEUVVBAZIUXBZKLPHKVKPLLXKJMWPFLVBLWPDGCSHIHQLVAKOWZSMCLXWYLFTSVKWELZMYWBSXKVYIKVWUSJVJMOIQOGCNLQVXBLWPHKAOIEHVIWTBHJMKSKAZMKEVVXBOITLVLPRDOGEOIOLQMZLXKDQUKBYWLBTLUZQTLLDKPLLXKZCUKRWGVOMPDGZKWXZANALBFOMYIXNGLZEKKVCYMKNLPLXBYJQIPBLNMUMKNGDLVQOWPLEOAZEOIKOWZZMJWDMZSRSMVJSSLJMKMQZWTMXLOAAOSTWABPJRSZMYJXJWPHHIVGSLHYFLPLVXFKWMXELXQYIFUZMYMKHTQSMQFLWYIXSAHLXEHLPPWIVNMHRAWJWAIZAAWUGLBDLWSPZAJSCYLOQALAYSEUXEBKNYSJIWQUKELJKYMQPUPLKOLOBVFBOWZHHSVUIAIZFFQJEIAZQUKPOWPHHRALMYIAAGPPQPLDNHFLBLPLVYBLVVQXUUIUFBHDEHCPHUGUM
Question
I tried a brute-force approach, but unfortunately it yields an extreme number of combinations, too many to compute.
Do you have any idea how to go from here or how to approach this problem in general?
Attempt
Here is what I have so far:
public class Main {
// instance variables - replace the example below with your own
private String message;
private String answer;
private String first;
/**
* Constructor for objects of class Main
*/
public Main()
{
// initialise instance variables
message ="BYOIZRLAUMYXXPFLPWBZLMLQPBJMSCQOWVOIJPYPALXCWZLKXYVMKXEHLIILLYJMUGBVXBOIRUAVAEZAKBHXBDZQJLELZIKMKOWZPXBKOQALQOWKYIBKGNTCPAAKPWJHKIAPBHKBVTBULWJSOYWKAMLUOPLRQOWZLWRSLEHWABWBVXOLSKOIOFSZLQLYKMZXOBUSPRQVZQTXELOWYHPVXQGDEBWBARBCWZXYFAWAAMISWLPREPKULQLYQKHQBISKRXLOAUOIEHVIZOBKAHGMCZZMSSSLVPPQXUVAOIEHVZLTIPWLPRQOWIMJFYEIAMSLVQKWELDWCIEPEUVVBAZIUXBZKLPHKVKPLLXKJMWPFLVBLWPDGCSHIHQLVAKOWZSMCLXWYLFTSVKWELZMYWBSXKVYIKVWUSJVJMOIQOGCNLQVXBLWPHKAOIEHVIWTBHJMKSKAZMKEVVXBOITLVLPRDOGEOIOLQMZLXKDQUKBYWLBTLUZQTLLDKPLLXKZCUKRWGVOMPDGZKWXZANALBFOMYIXNGLZEKKVCYMKNLPLXBYJQIPBLNMUMKNGDLVQOWPLEOAZEOIKOWZZMJWDMZSRSMVJSSLJMKMQZWTMXLOAAOSTWABPJRSZMYJXJWPHHIVGSLHYFLPLVXFKWMXELXQYIFUZMYMKHTQSMQFLWYIXSAHLXEHLPPWIVNMHRAWJWAIZAAWUGLBDLWSPZAJSCYLOQALAYSEUXEBKNYSJIWQUKELJKYMQPUPLKOLOBVFBOWZHHSVUIAIZFFQJEIAZQUKPOWPHHRALMYIAAGPPQPLDNHFLBLPLVYBLVVQXUUIUFBHDEHCPHUGUM";
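StringBuilder firstBuilder = new StringBuilder(); // starts empty, so no "null" gets prepended
for (int index = 0; index < message.length(); index += 6) {
firstBuilder.append(message.charAt(index)); // letters 0, 6, 12, ... share the first key letter
}
first = firstBuilder.toString();
System.out.println(first);
}
public static void main(String[] args) {
new Main();
}
}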
Non-text message
In case the raw message is not actual text (like English text that makes sense) or you have no information about its content, you will be out of luck.
This is especially true if the text is actually hashed or double-encrypted, i.e. random-looking data.
Breaking an encryption scheme requires knowledge about the algorithm and the messages. Especially in your situation, you will need to know the general structure of your messages in order to break it.
Prerequisites
For the rest of this answer, let me assume your message is actually plain English text. Note that you can easily adapt my answer to other languages, or even adapt the techniques to other message formats.
Let me also assume that you are talking about classic Vigenere (see Wikipedia) and not about one of its many variants. That means that your input consists only of the letters A to Z: no lowercase, no punctuation, no spaces. Example:
MYNAMEISJOHN // Instead of: My name is John.
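The same applies to your key: it only contains A to Z.
Classic Vigenere then shifts each letter of the message by the alphabet offset of the corresponding key letter (A = 0, B = 1, ..., Z = 25), modulo the alphabet size (which is 26), repeating the key as often as needed.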
Example:
(G + L) % 26 = R
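As a minimal sketch, assuming the message and key contain only uppercase A to Z as described above, the shift rule could look like this in Java. The class and method names are made up for illustration; Math.floorMod keeps negative differences in the range 0 to 25 when decrypting.

```java
public final class VigenereSketch {
    private static final int ALPHABET_SIZE = 26;

    // Encrypts by adding the key letter's offset (A = 0, ..., Z = 25) to each message letter.
    static String encrypt(String plain, String key) {
        return shift(plain, key, +1);
    }

    // Decrypts by subtracting the key letter's offset instead.
    static String decrypt(String cipher, String key) {
        return shift(cipher, key, -1);
    }

    private static String shift(String text, String key, int direction) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            int letter = text.charAt(i) - 'A';
            int offset = key.charAt(i % key.length()) - 'A'; // the key repeats every key.length() letters
            // floorMod keeps the result in 0..25 even when the sum is negative
            int shifted = Math.floorMod(letter + direction * offset, ALPHABET_SIZE);
            out.append((char) ('A' + shifted));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("G", "L"));        // R, since (6 + 11) % 26 = 17
        System.out.println(encrypt("CCCCCC", "BAC")); // DCEDCE
        System.out.println(decrypt("DCEDCE", "BAC")); // CCCCCC
    }
}
```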
Dictionary
Before we talk about attacks, we need a way to decide, for a generated key, whether it is actually correct or not.
Since we know that the message consists of English text, we can just take a dictionary (a huge list of all valid English words) and compare our decrypted message against it. If the key was wrong, the resulting message will not contain valid words (or only a few).
This can be a bit tricky since we lack punctuation (in particular, there are no spaces).
N-grams
Fortunately, there is a very accurate way of measuring how much a text looks like real language, which also solves the issue with the missing punctuation.
The technique is called N-grams (see Wikipedia). You choose a value for N, for example 3 (then called trigrams), and split your text into overlapping groups of 3 characters, padding both ends with a marker such as $. Example:
MYNAMEISJOHN // results in the trigrams:
$$M, $MY, MYN, YNA, NAM, AME, MEI, EIS, ISJ, SJO, JOH, OHN, HN$, N$$
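Extracting the padded N-grams takes only a few lines of Java. This is a sketch: the `$` padding and the class name `NGrams` are illustrative choices, not a fixed convention.

```java
import java.util.ArrayList;
import java.util.List;

public final class NGrams {
    // Pads the text with n - 1 '$' markers on both ends and slides a window of width n over it.
    static List<String> ngrams(String text, int n) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            pad.append('$');
        }
        String padded = pad + text + pad;
        List<String> result = new ArrayList<>();
        for (int start = 0; start + n <= padded.length(); start++) {
            result.add(padded.substring(start, start + n));
        }
        return result;
    }

    public static void main(String[] args) {
        // [$$M, $MY, MYN, YNA, NAM, AME, MEI, EIS, ISJ, SJO, JOH, OHN, HN$, N$$]
        System.out.println(ngrams("MYNAMEISJOHN", 3));
    }
}
```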
What you need now is a frequency analysis of the most common trigrams in English text. Various sources exist online (or you can compute it yourself on a big text corpus).
Then you simply compare your trigram frequencies to the frequencies in real text and compute a score of how well they match. If your message contains a lot of very uncommon trigrams, it is highly likely to be garbage data and not real text.
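One common way to turn trigram statistics into a score is to add up the log-probabilities of all trigrams in the text, giving unseen trigrams a small floor value instead of zero. The sketch below assumes you already have a map of trigram counts from some English corpus; the toy counts in `main` are hypothetical and only demonstrate the mechanics.

```java
import java.util.HashMap;
import java.util.Map;

public final class TrigramScorer {
    private final Map<String, Double> logProbabilities = new HashMap<>();
    private final double unseenLogProbability;

    // counts: trigram -> how often it occurred in an English reference corpus (supplied by you).
    TrigramScorer(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            logProbabilities.put(e.getKey(), Math.log10((double) e.getValue() / total));
        }
        // Unseen trigrams get a small floor probability instead of log(0) = -infinity.
        unseenLogProbability = Math.log10(0.01 / total);
    }

    // Higher (less negative) scores mean the text looks more like the reference corpus.
    double score(String text) {
        double score = 0;
        for (int i = 0; i + 3 <= text.length(); i++) {
            score += logProbabilities.getOrDefault(text.substring(i, i + 3), unseenLogProbability);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Long> toyCounts = new HashMap<>(); // hypothetical numbers, not real statistics
        toyCounts.put("THE", 100L);
        toyCounts.put("HET", 10L);
        toyCounts.put("ETH", 10L);
        TrigramScorer scorer = new TrigramScorer(toyCounts);
        System.out.println(scorer.score("THETHE") > scorer.score("QXZQXZ")); // true
    }
}
```

Because the score is a sum, only compare texts of the same length, for example different decryptions of the same ciphertext.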
A small note: monograms (1-grams) simply give you single-letter frequencies (see Wikipedia, Letter frequency). Bigrams (2-grams) are commonly used for cracking Vigenere and yield good results.
Attacks
Brute-Force
The first and most straightforward attack is always brute-force. And, as long as the key and the alphabet are not that big, the number of combinations is relatively low.
Your key has length 6 and the alphabet has size 26, so the number of different keys is 26^6, which is
308_915_776
So about 3 * 10^8. This number might appear huge, but do not forget that CPUs already operate in the gigahertz range (on the order of 10^9 simple operations per second, per core). Even if decrypting and scoring one candidate key takes a few hundred operations (for example, if you only decrypt a short piece of the text), a single core gets through all keys within minutes, and a multi-core machine or a GPU is much faster still.
So a full brute-force is feasible even without a cluster, but it is not instant.
There are simple tricks to speed this up. You do not have to compute the full key. How about limiting yourself to the first 3 characters instead of the full 6 characters? You will then only be able to decrypt a subset of the text (the first 3 letters of every block of 6), but that is enough to analyze whether the outcome is valid text or not (using dictionaries and N-grams, as mentioned before).
This small change already drastically cuts down computation time, since you then only have 26^3 = 17,576 combinations, which a single core checks in well under a second.
But you can do even more. Some characters are extremely rare in English text, for example Z. For each key position, you can skip key letters that would decrypt the most common ciphertext letter of that position into such a rare letter. Let us say you remove the 6 least common characters that way; then only 20^3 = 8,000 combinations remain. That is fast enough for your average laptop.
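The partial-key brute force described above could be sketched like this. The block (key) length and the prefix length are parameters (6 and 3 in your case), and the scoring function is passed in, so you can plug in a dictionary or N-gram scorer. The '$' separators between decrypted fragments are an illustrative choice so that N-grams do not span letters that were not adjacent after decryption; the rare-letter pruning is not included.

```java
import java.util.function.ToDoubleFunction;

public final class PartialKeyBruteForce {
    // Interprets index (0 .. 26^length - 1) as a base-26 number and turns it into a key (A = 0).
    static String keyFromIndex(long index, int length) {
        char[] key = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            key[i] = (char) ('A' + (int) (index % 26));
            index /= 26;
        }
        return new String(key);
    }

    // Decrypts only the first prefix.length() letters of every block of blockLength letters
    // and joins the decrypted fragments with '$' as a boundary marker.
    static String decryptPrefix(String cipher, int blockLength, String prefix) {
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < cipher.length(); start += blockLength) {
            if (start > 0) {
                out.append('$');
            }
            for (int j = 0; j < prefix.length() && start + j < cipher.length(); j++) {
                int plain = Math.floorMod(cipher.charAt(start + j) - prefix.charAt(j), 26);
                out.append((char) ('A' + plain));
            }
        }
        return out.toString();
    }

    // Tries all 26^prefixLength key prefixes and keeps the one whose partial decryption scores highest.
    static String bestPrefix(String cipher, int blockLength, int prefixLength,
                             ToDoubleFunction<String> scorer) {
        long candidates = 1;
        for (int i = 0; i < prefixLength; i++) {
            candidates *= 26;
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (long index = 0; index < candidates; index++) {
            String prefix = keyFromIndex(index, prefixLength);
            double score = scorer.applyAsDouble(decryptPrefix(cipher, blockLength, prefix));
            if (score > bestScore) {
                bestScore = score;
                best = prefix;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy scorer for illustration only: counts the letter E in the partial decryption.
        ToDoubleFunction<String> countE = text -> text.chars().filter(c -> c == 'E').count();
        System.out.println(bestPrefix("BQQQQQBQQQQQ", 6, 1, countE)); // X
    }
}
```

For your message you would call something like bestPrefix(message, 6, 3, scorer) with a real N-gram scorer. In practice you would probably keep the top few prefixes rather than only the single best one.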
Frequency Attack
Enough brute-force, let us do something clever. A letter frequency attack is a very common attack against these encryption schemes. It is simple, extremely fast and very successful. In fact, it is so simple that quite a few online tools offer it for free, for example guballa.de/vigenere-solver (it is able to crack your specific example, I just tried it out).
While Vigenere changes the message to unreadable garbage, it does not change the shape of the letter distribution, at least not per digit (position) of the key. So if you look at, let's say, the second digit of your key: starting from the second letter of the message, every sixth letter (6 being the key length) is shifted by exactly the same offset.
Let us take a look at a simple example. The key is BAC and the message is
CCC CCC CCC CCC CCC // raw
DCE DCE DCE DCE DCE // encrypted
As you notice, the letters repeat. Looking at the third letter, it is always E. That means that the sixth, ninth, twelfth and fifteenth letters, which are also E, must all stem from the exact same original letter, since they were all shifted by the C from the key.
That is a very important observation. It means that letter frequency is preserved (only shifted) within each set of positions that share a digit of the key, i.e. positions i + k * key_length for k = 0, 1, 2, ...
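Let us now take a look at the letter distribution in English text (from Wikipedia): E is by far the most common letter at about 12.7%, followed by T (about 9.1%), A (about 8.2%) and O (about 7.5%), while J, Q, X and Z each account for well under 1%.
All you have to do now is to split your message into blocks of key length and do a frequency analysis per digit of the blocks.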
So for your specific input, this yields the blocks
BYOIZR
LAUMYX
XPFLPW
BZLMLQ
PBJMSC
...
Now you analyze the frequency for digit 1 of each block, then digit 2, and so on, until digit 6. For the first digit, these are the letters
B, L, X, B, P, ...
The result for your specific input is:
[B=0.150, E=0.107, X=0.093, L=0.079, Q=0.079, P=0.071, K=0.064, I=0.050, O=0.050, R=0.043, F=0.036, J=0.036, A=0.029, S=0.029, Y=0.021, Z=0.021, C=0.014, T=0.014, D=0.007, V=0.007]
[L=0.129, O=0.100, H=0.093, A=0.079, V=0.071, Y=0.071, B=0.057, K=0.057, U=0.050, F=0.043, P=0.043, S=0.043, Z=0.043, D=0.029, W=0.029, N=0.021, C=0.014, I=0.014, J=0.007, T=0.007]
[W=0.157, Z=0.093, K=0.079, L=0.079, V=0.079, A=0.071, G=0.071, J=0.064, O=0.050, X=0.050, D=0.043, U=0.043, S=0.036, Q=0.021, E=0.014, F=0.014, N=0.014, M=0.007, T=0.007, Y=0.007]
[M=0.150, P=0.100, Q=0.100, I=0.079, B=0.071, Z=0.071, L=0.064, W=0.064, K=0.057, V=0.043, E=0.036, A=0.029, C=0.029, N=0.029, U=0.021, H=0.014, S=0.014, D=0.007, G=0.007, J=0.007, T=0.007]
[L=0.136, Y=0.100, A=0.086, O=0.086, P=0.086, U=0.086, H=0.064, K=0.057, V=0.050, Z=0.050, S=0.043, J=0.029, M=0.021, T=0.021, W=0.021, G=0.014, I=0.014, B=0.007, C=0.007, N=0.007, R=0.007, X=0.007]
[I=0.129, M=0.107, X=0.100, L=0.086, W=0.079, S=0.064, R=0.057, H=0.050, Q=0.050, K=0.043, E=0.036, C=0.029, T=0.029, V=0.029, F=0.021, J=0.021, P=0.021, G=0.014, Y=0.014, A=0.007, D=0.007, O=0.007]
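Look at it. You see that for the first digit the letter B is very common, 15%, followed by E with about 10.7%, and so on. There is a high chance that B, for the first digit of the key, stands for E in the real text (since E is the most common letter in English text). Because every letter at that digit is shifted by the same amount, this single guess also fixes all the others: the ciphertext letters X, L and Q would then stand for A, O and T, which are all common in English as well, so the guess looks plausible.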
Using that, you can easily reverse-compute the letter of the key used for encryption. With A = 0, B = 1, and so on, it is obtained by
(B - E) mod 26 = (1 - 4) mod 26 = 23 = X
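In Java, this computation needs Math.floorMod rather than %, because % keeps the sign of a negative difference. A minimal sketch (the method name is illustrative):

```java
public final class KeyLetter {
    // Key letter that turns plainChar into cipherChar: (cipher - plain) mod 26, with A = 0.
    // Math.floorMod keeps the result in 0..25; Java's % would return -3 for 'B' - 'E'.
    static char keyLetter(char cipherChar, char plainChar) {
        return (char) ('A' + Math.floorMod(cipherChar - plainChar, 26));
    }

    public static void main(String[] args) {
        System.out.println(keyLetter('B', 'E')); // X
    }
}
```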
Note that the distribution of your message might not necessarily align with the real distribution over all English text, especially if the message is not that long (the longer it is, the more accurate the distribution) or mainly consists of weird and unusual words.
You can counter that by trying out a few combinations among the most frequent letters of your distribution. So for the first digit you could try out whether
B -> E
E -> E
X -> E
L -> E
Or instead of mapping to E only, also try out the second most common character T:
B -> T
E -> T
X -> T
L -> T
The number of combinations you get this way is very low. Use dictionaries and N-grams (as mentioned before) to validate whether the key is correct or not.
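A common refinement of this idea is to try all 26 shifts for a column and rank them by a chi-squared distance to English letter frequencies, instead of only mapping the top letters to E or T. The sketch below assumes the commonly quoted approximate English frequencies; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public final class ChiSquaredShifts {
    // Approximate English letter frequencies in percent for A..Z (the commonly quoted table).
    private static final double[] ENGLISH = {
        8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
        6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074
    };

    // Chi-squared distance between the column decrypted with `shift` and English; lower is better.
    static double chiSquared(String column, int shift) {
        int[] counts = new int[26];
        for (char c : column.toCharArray()) {
            counts[Math.floorMod(c - 'A' - shift, 26)]++;
        }
        double chi = 0;
        for (int letter = 0; letter < 26; letter++) {
            double expected = column.length() * ENGLISH[letter] / 100.0;
            double diff = counts[letter] - expected;
            chi += diff * diff / expected;
        }
        return chi;
    }

    // All 26 candidate key letters for one column, most English-like decryption first.
    static List<Character> rankKeyLetters(String column) {
        double[] scores = new double[26];
        List<Integer> shifts = new ArrayList<>();
        for (int shift = 0; shift < 26; shift++) {
            scores[shift] = chiSquared(column, shift);
            shifts.add(shift);
        }
        shifts.sort((a, b) -> Double.compare(scores[a], scores[b]));
        List<Character> ranked = new ArrayList<>();
        for (int shift : shifts) {
            ranked.add((char) ('A' + shift));
        }
        return ranked;
    }

    public static void main(String[] args) {
        // First entry is X for this toy column (E, T, A, O shifted by X).
        System.out.println(rankKeyLetters("BBBBBBBBBBQQQQQQQXXXXXLLLLL").subList(0, 3));
    }
}
```

Taking the first two or three letters of each column's ranking gives you a small set of candidate keys to check with dictionaries or N-grams.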
Java Implementation
Your message is actually very interesting. It aligns very well with the usual letter frequency of English text. So for your particular case you actually do not need to try out any combinations, nor do you need to do any dictionary/n-gram checks. You can just map the most common letter in your encrypted message (per digit) to the most common letter in English text, E, and get the actual key.
Since that is so simple, here is a full implementation in Java of what I explained before, step by step, with some debug outputs (it is a quick prototype, not really nicely structured):
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public final class CrackViginere {
private static final int ALPHABET_SIZE = 26;
private static final char FIRST_CHAR_IN_ALPHABET = 'A';
public static void main(final String[] args) {
String encrypted =
"BYOIZRLAUMYXXPFLPWBZLMLQPBJMSCQOWVOIJPYPALXCWZLKXYVMKXEHLIILLYJMUGBVXBOIRUAVAEZAKBHXBDZQJLELZIKMKOWZPXBKOQALQOWKYIBKGNTCPAAKPWJHKIAPBHKBVTBULWJSOYWKAMLUOPLRQOWZLWRSLEHWABWBVXOLSKOIOFSZLQLYKMZXOBUSPRQVZQTXELOWYHPVXQGDEBWBARBCWZXYFAWAAMISWLPREPKULQLYQKHQBISKRXLOAUOIEHVIZOBKAHGMCZZMSSSLVPPQXUVAOIEHVZLTIPWLPRQOWIMJFYEIAMSLVQKWELDWCIEPEUVVBAZIUXBZKLPHKVKPLLXKJMWPFLVBLWPDGCSHIHQLVAKOWZSMCLXWYLFTSVKWELZMYWBSXKVYIKVWUSJVJMOIQOGCNLQVXBLWPHKAOIEHVIWTBHJMKSKAZMKEVVXBOITLVLPRDOGEOIOLQMZLXKDQUKBYWLBTLUZQTLLDKPLLXKZCUKRWGVOMPDGZKWXZANALBFOMYIXNGLZEKKVCYMKNLPLXBYJQIPBLNMUMKNGDLVQOWPLEOAZEOIKOWZZMJWDMZSRSMVJSSLJMKMQZWTMXLOAAOSTWABPJRSZMYJXJWPHHIVGSLHYFLPLVXFKWMXELXQYIFUZMYMKHTQSMQFLWYIXSAHLXEHLPPWIVNMHRAWJWAIZAAWUGLBDLWSPZAJSCYLOQALAYSEUXEBKNYSJIWQUKELJKYMQPUPLKOLOBVFBOWZHHSVUIAIZFFQJEIAZQUKPOWPHHRALMYIAAGPPQPLDNHFLBLPLVYBLVVQXUUIUFBHDEHCPHUGUM";
int keyLength = 6;
char mostCommonCharOverall = 'E';
// Blocks
List<String> blocks = new ArrayList<>();
for (int startIndex = 0; startIndex < encrypted.length(); startIndex += keyLength) {
int endIndex = Math.min(startIndex + keyLength, encrypted.length());
String block = encrypted.substring(startIndex, endIndex);
blocks.add(block);
}
System.out.println("Individual blocks are:");
blocks.forEach(System.out::println);
// Frequency
List<Map<Character, Integer>> digitToCounts = Stream.generate(HashMap<Character, Integer>::new)
.limit(keyLength)
.collect(Collectors.toList());
for (String block : blocks) {
for (int i = 0; i < block.length(); i++) {
char c = block.charAt(i);
Map<Character, Integer> counts = digitToCounts.get(i);
counts.compute(c, (character, count) -> count == null ? 1 : count + 1);
}
}
List<List<CharacterFrequency>> digitToFrequencies = new ArrayList<>();
for (Map<Character, Integer> counts : digitToCounts) {
int totalCharacterCount = counts.values()
.stream()
.mapToInt(Integer::intValue)
.sum();
List<CharacterFrequency> frequencies = new ArrayList<>();
for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
double frequency = entry.getValue() / (double) totalCharacterCount;
frequencies.add(new CharacterFrequency(entry.getKey(), frequency));
}
Collections.sort(frequencies);
digitToFrequencies.add(frequencies);
}
System.out.println("Frequency distribution for each digit is:");
digitToFrequencies.forEach(System.out::println);
// Guessing
StringBuilder keyBuilder = new StringBuilder();
for (List<CharacterFrequency> frequencies : digitToFrequencies) {
char mostFrequentChar = frequencies.get(0)
.getCharacter();
int keyInt = mostFrequentChar - mostCommonCharOverall;
keyInt = keyInt >= 0 ? keyInt : keyInt + ALPHABET_SIZE;
char key = (char) (FIRST_CHAR_IN_ALPHABET + keyInt);
keyBuilder.append(key);
}
String key = keyBuilder.toString();
System.out.println("The guessed key is: " + key);
System.out.println("Decrypted message:");
System.out.println(decrypt(encrypted, key));
}
private static String decrypt(String encryptedMessage, String key) {
StringBuilder decryptBuilder = new StringBuilder(encryptedMessage.length());
int digit = 0;
for (char encryptedChar : encryptedMessage.toCharArray())
{
char keyForDigit = key.charAt(digit);
int decryptedCharInt = encryptedChar - keyForDigit;
decryptedCharInt = decryptedCharInt >= 0 ? decryptedCharInt : decryptedCharInt + ALPHABET_SIZE;
char decryptedChar = (char) (decryptedCharInt + FIRST_CHAR_IN_ALPHABET);
decryptBuilder.append(decryptedChar);
digit = (digit + 1) % key.length();
}
return decryptBuilder.toString();
}
private static class CharacterFrequency implements Comparable<CharacterFrequency> {
private final char character;
private final double frequency;
private CharacterFrequency(final char character, final double frequency) {
this.character = character;
this.frequency = frequency;
}
@Override
public int compareTo(final CharacterFrequency o) {
return -1 * Double.compare(frequency, o.frequency);
}
private char getCharacter() {
return character;
}
private double getFrequency() {
return frequency;
}
@Override
public String toString() {
return character + "=" + String.format("%.3f", frequency);
}
}
}
Decrypted
Using the code above, the key is:
XHSIHE
And the full decrypted message is:
ERWASNOTCERTAINDISESTEEMSURELYTHENHEMIGHTHAVEREGARDEDTHATABHORRENCEOFTHEUNINTACTSTATEWHICHHEHADINHERITEDWITHTHECREEDOFMYSTICISMASATLEASTOPENTOCORRECTIONWHENTHERESULTWASDUETOTREACHERYAREMORSESTRUCKINTOHIMTHEWORDSOFIZZHUETTNEVERQUITESTILLEDINHISMEMORYCAMEBACKTOHIMHEHADASKEDIZZIFSHELOVEDHIMANDSHEHADREPLIEDINTHEAFFIRMATIVEDIDSHELOVEHIMMORETHANTESSDIDNOSHEHADREPLIEDTESSWOULDLAYDOWNHERLIFEFORHIMANDSHEHERSELFCOULDDONOMOREHETHOUGHTOFTESSASSHEHADAPPEAREDONTHEDAYOFTHEWEDDINGHOWHEREYESHADLINGEREDUPONHIMHOWSHEHADHUNGUPONHISWORDSASIFTHEYWEREAGODSANDDURINGTHETERRIBLEEVENINGOVERTHEHEARTHWHENHERSIMPLESOULUNCOVEREDITSELFTOHISHOWPITIFULHERFACEHADLOOKEDBYTHERAYSOFTHEFIREINHERINABILITYTOREALIZETHATHISLOVEANDPROTECTIONCOULDPOSSIBLYBEWITHDRAWNTHUSFROMBEINGHERCRITICHEGREWTOBEHERADVOCATECYNICALTHINGSHEHADUTTEREDTOHIMSELFABOUTHERBUTNOMANCANBEALWAYSACYNI
This is more or less valid English text:
er was not certain disesteem surely then he might have regarded that
abhorrence of the unintact state which he had inherited with the creed
of mysticism as at least open to correction when the result was due to
treachery a remorse struck into him the words of izz huett never quite
stilled in his memory came back to him he had asked izz if she loved
him and she had replied in the affirmative did she love him more than
tess did no she had replied tess would lay down her life for him and she
herself could do no more he thought of tess as she had appeared on the day
of the wedding how her eyes had lingered upon him how she had hung upon
his words as if they were a god's and during the terrible evening over
the hearth when her simple soul uncovered itself to his how pitiful her
face had looked by the rays of the fire in her inability to realize that
his love and protection could possibly be withdrawn thus from being her
critic he grew to be her advocate cynical things he had uttered to
himself about her but no man can be always a cyni
This, by the way, is a quote from the British novel Tess of the d'Urbervilles: A Pure Woman Faithfully Presented. Phase the Sixth: The Convert, Chapter XLIX.
Standard Vigenere interleaves Caesar shift ciphers, specified by the key. If the Vigenere key is six characters long, then letters 1, 7, 13, ... of the ciphertext all use the same Caesar shift: every sixth character uses the first character of the key. Letters 2, 8, 14, ... of the ciphertext use a different (in general) Caesar shift, and so on.
That gives you six different Caesar shift ciphers to solve. None of them will read as English, because each contains only every sixth letter, so you will need to solve them by letter frequency. That will give you a few good options for each position of the key. Try them in order of probability to see which gives the correct decryption.
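Splitting the ciphertext into those six Caesar columns is straightforward. A sketch in Java (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public final class Columns {
    // Column i holds ciphertext letters i, i + keyLength, i + 2 * keyLength, ... (0-based),
    // i.e. exactly the letters encrypted with key letter i, which form one Caesar cipher.
    static List<String> split(String cipher, int keyLength) {
        List<StringBuilder> builders = new ArrayList<>();
        for (int i = 0; i < keyLength; i++) {
            builders.add(new StringBuilder());
        }
        for (int pos = 0; pos < cipher.length(); pos++) {
            builders.get(pos % keyLength).append(cipher.charAt(pos));
        }
        List<String> columns = new ArrayList<>();
        for (StringBuilder b : builders) {
            columns.add(b.toString());
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(split("ABCDEFGHI", 3)); // [ADG, BEH, CFI]
    }
}
```

Each column can then be solved on its own by letter frequency, as described above.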
This question already has answers here: Memory efficient power set algorithm (5 answers). Closed 8 years ago.
I'm trying to find every possible anagram of a string in Java. By this I mean that if I have a 4-character word, I want all the possible 3-character words derived from it, all the 2-character ones and all the 1-character ones. The most straightforward way I thought of is to use two nested for loops and iterate over the string. This is my code as of now:
private ArrayList<String> subsets(String word){
ArrayList<String> s = new ArrayList<String>();
int length = word.length();
for (int c=0; c<length; c++){
for (int i=0; i<length-c; i++){
String sub = word.substring(c, c+i+1);
System.out.println(sub);
//if (!s.contains(sub) && sub!=null)
s.add(sub);
}
}
//java.util.Collections.sort(s, new MyComparator());
//System.out.println(s.toString());
return s;
}
My problem is that it works for 3-letter words: fun yields this result (don't mind the ordering; the word is processed so that I have a string with the letters in alphabetical order):
f
fn
fnu
n
nu
u
But when I try 4-letter words, it leaves something out; for example, catq gives me:
a
ac
acq
acqt
c
cq
cqt
q
qt
t
i.e., I don't see the 3-character word act, which is the one I'm looking for when testing this method. I can't understand what the problem is, and it's most likely a logical error I'm making when creating the substrings. If anyone can help me out, please don't give me the code for it but rather the reasoning behind your solution. This is a piece of coursework and I need to come up with the code on my own.
EDIT: to clarify, for me acq, qca, caq, aqc, cqa, qac, etc. are the same thing. To make it even clearer: the string gets sorted in alphabetical order, so all those permutations should come up as one unique result, acq. So I don't need all the permutations of a string, but rather, given a 4-character string, all the 3-character ones that I can derive from it. That means taking out one character at a time and returning the resulting string, doing that for every character in the original string.
I hope I have made my problem a bit clearer.
It's working fine, you just misspelled "caqt" as "acqt" in your tests/input.
(The issue is probably that you're sorting your input. If you want substrings, you have to leave the input unsorted.)
After your edits: see Generating all permutations of a given string. Then just sort the individual letters, and put them in a set.
Ok, as you've already devised your own solution, I'll give you my take on it. Firstly, consider how big your result list is going to be. You're essentially taking each letter in turn, and either including it or not. 2 possibilities for each letter, gives you 2^n total results, where n is the number of letters. This of course includes the case where you don't use any letter, and end up with an empty string.
Next, if you enumerate every possibility with a 1 for 'include this letter' and a 0 for 'don't include it', taking your 'fnu' example, you end up with:
000 - ''
001 - 'u'
010 - 'n'
011 - 'nu'
100 - 'f'
101 - 'fu' (no offense intended)
110 - 'fn'
111 - 'fnu'
Clearly, these are just binary numbers, and you can derive a function that given any number from 0-7 and the three letter input, will calculate the corresponding subset.
It's fairly easy to do in Java. I don't have a Java compiler to hand, but this should be approximately correct:
public String getSubSet(String input, int index) {
// Should check that index >= 0 and < 2^input.length() here.
// Should also check that input.length() <= 31.
StringBuilder returnValue = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
// 1 << i is the equivalent of 2^i, so this tests bit i of index
if ((index & (1 << i)) != 0)
returnValue.append(input.charAt(i));
}
return returnValue.toString();
}
Then, if you need to, you can just do a loop that calls this function, like this:
for (int i = 1; i < (1 << input.length()); i++)
getSubSet(input, i); // this doesn't do anything, but you can add it to a list, or output it as desired.
Note that I started from 1 instead of 0, because the result at index 0 would be the empty string. Incidentally, this actually handles the least significant bit first, so your output list would be 'f', 'n', 'fn', 'u', 'fu', 'nu', 'fnu', but the order didn't seem important.
This is the method I came up with, and it seems to work:
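Combining this bitmask idea with the requirement that letter order does not matter, a self-contained sketch could sort the letters first and collect the results in a TreeSet, which also drops duplicates when a letter repeats. The class name and the 30-letter limit (so that 1 << length still fits in a positive int) are my assumptions.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public final class SortedSubsets {
    // Every non-empty subset of the letters, each with its letters in alphabetical order.
    // Bit i of the mask decides whether letters[i] is included; the TreeSet removes duplicates.
    static Set<String> subsets(String input) {
        if (input.length() > 30) {
            throw new IllegalArgumentException("too many letters for an int mask");
        }
        char[] letters = input.toCharArray();
        Arrays.sort(letters);
        Set<String> result = new TreeSet<>();
        for (int mask = 1; mask < (1 << letters.length); mask++) {
            StringBuilder subset = new StringBuilder();
            for (int i = 0; i < letters.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.append(letters[i]);
                }
            }
            result.add(subset.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(subsets("catq").contains("act")); // true
    }
}
```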
private void subsets(String word, ArrayList<String> subset){
if(word.length() == 1){
subset.add(word);
return;
}
else {
String firstChar = word.substring(0,1);
word = word.substring(1);
subsets(word, subset);
int size = subset.size();
for (int i = 0; i < size; i++){
String temp = firstChar + subset.get(i);
subset.add(temp);
}
subset.add(firstChar);
return;
}
}
What I do is check whether the word is longer than one character. If it is not, I add that single character to the ArrayList, and the recursion starts unwinding. If it is longer, I save the first character and make a recursive call with the rest of the String. This way the whole string gets sliced into characters saved on the call stack, until the word has become of length 1, with only one character remaining.
When that happens, as I said at the start, the character gets added to the List and the recursion starts unwinding. Each caller looks at the current size of the list (1 in the first iteration) and, with a for loop, adds the character it saved, concatenated with every element already in the ArrayList. Then it adds the saved character on its own and returns to its own caller.
For example, with the word fun this happens:
f saved
List empty
recursive call(un)
-
u saved
List empty
recursive call(n)
-
n.length == 1
List = [n]
return
-
list.size=1
temp = u + list[0]
List = [n, un]
add the character saved in the stack on its own
List = [n, un, u]
return
-
list.size=3
temp = f + list[0]
List = [n, un, u, fn]
temp = f + list[1]
List = [n, un, u, fn, fun]
temp = f + list[2]
List = [n, un, u, fn, fun, fu]
add the character saved in the stack on its own
List = [n, un, u, fn, fun, fu, f]
return
I have tried to be as clear as possible; I hope this clarifies what my initial problem was and how to solve it.
This is working code:
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
public class Subsets {
public static void main(String[] args) {
String input = "abcde";
Set<String> returnList = permutations(input);
System.out.println(returnList);
}
// Despite its name, this collects every non-empty subsequence of input (not permutations),
// kept in sorted order by the TreeSet.
private static Set<String> permutations(String input) {
if (input.length() == 1) {
Set<String> a = new TreeSet<>();
a.add(input);
return a;
}
Set<String> returnSet = new TreeSet<>();
for (int i = 0; i < input.length(); i++) {
String prefix = input.substring(i, i + 1);
// all subsequences of the letters after position i
Set<String> permutations = permutations(input.substring(i + 1));
returnSet.add(prefix);
returnSet.addAll(permutations);
Iterator<String> it = permutations.iterator();
while (it.hasNext()) {
returnSet.add(prefix + it.next());
}
}
return returnSet;
}
}
I know how to work out the index of a certain character or number in a string, but is there any predefined method I can use to give me the character at the nth position? So in the string "foo", if I asked for the character with index 0 it would return "f".
Note - in the above question, by "character" I don't mean the char data type, but a letter or number in a string. The important thing here is that I don't receive a char when the method is invoked, but a string (of length 1). And I know about the substring() method, but I was wondering if there was a neater way.
The method you're looking for is charAt. Here's an example:
String text = "foo";
char charAtZero = text.charAt(0);
System.out.println(charAtZero); // Prints f
For more information, see the Java documentation on String.charAt.
If you don't want the result as a char data type, but rather as a string, you would use the Character.toString method:
String text = "foo";
String letter = Character.toString(text.charAt(0));
System.out.println(letter); // Prints f
If you want more information on the Character class and the toString method, I pulled my info from the documentation on Character.toString.
You want .charAt()
"mystring".charAt(2)
returns 's' (as a char).
If you're hellbent on having a string there are a couple of ways to convert a char to a string:
String mychar = Character.toString("mystring".charAt(2));
Or
String mychar = ""+"mystring".charAt(2);
Or even
String mychar = String.valueOf("mystring".charAt(2));
For example.
None of the proposed answers works for surrogate pairs, which are used to encode characters outside of the Unicode Basic Multilingual Plane (BMP).
Here is an example using three different techniques to iterate over the "characters" of a string (incl. using Java 8 stream API). Please notice this example includes characters of the Unicode Supplementary Multilingual Plane (SMP). You need a proper font to display this example and the result correctly.
// String containing characters of the Unicode
// Supplementary Multilingual Plane (SMP)
// In that particular case, hieroglyphs.
String str = "The quick brown π₯ jumps over the lazy ππΏππ‘";
Iterate over chars
The first solution is a simple loop over all chars of the string:
/* 1 */
System.out.println(
"\n\nUsing char iterator (do not work for surrogate pairs !)");
for (int pos = 0; pos < str.length(); ++pos) {
char c = str.charAt(pos);
System.out.printf("%s ", Character.toString(c));
// ^^^^^^^^^^^^^^^^^^^^^
// Convert to String as per OP request
}
Iterate over code points
The second solution uses an explicit loop too, but accesses individual code points with codePointAt and increments the loop index according to charCount:
/* 2 */
System.out.println(
"\n\nUsing Java 1.5 codePointAt(works as expected)");
for (int pos = 0; pos < str.length();) {
int cp = str.codePointAt(pos);
char chars[] = Character.toChars(cp);
// ^^^^^^^^^^^^^^^^^^^^^
// Convert to a `char[]`
// as code points outside the Unicode BMP
// will map to more than one Java `char`
System.out.printf("%s ", new String(chars));
// ^^^^^^^^^^^^^^^^^
// Convert to String as per OP request
pos += Character.charCount(cp);
// ^^^^^^^^^^^^^^^^^^^^^^^
// Increment pos by 1 or 2, depending on
// the number of Java `char`s required to
// encode that particular code point.
}
Iterate over code points using the Stream API
The third solution is basically the same as the second, but using the Java 8 Stream API:
/* 3 */
System.out.println(
"\n\nUsing Java 8 stream (works as expected)");
str.codePoints().forEach(
cp -> {
char chars[] = Character.toChars(cp);
// ^^^^^^^^^^^^^^^^^^^^^
// Convert to a `char[]`
// as code points outside the Unicode BMP
// will map to more than one Java `char`
System.out.printf("%s ", new String(chars));
// ^^^^^^^^^^^^^^^^^
// Convert to String as per OP request
});
Results
When you run that test program, you obtain:
Using char iterator (do not work for surrogate pairs !)
T h e q u i c k b r o w n ? ? j u m p s o v e r t h e l a z y ? ? ? ? ? ? ? ?
Using Java 1.5 codePointAt(works as expected)
T h e q u i c k b r o w n π₯ j u m p s o v e r t h e l a z y π πΏ π π‘
Using Java 8 stream (works as expected)
T h e q u i c k b r o w n π₯ j u m p s o v e r t h e l a z y π πΏ π π‘
As you can see (if you're able to display hieroglyphs properly), the first solution does not properly handle characters outside of the Unicode BMP. The other two solutions, on the other hand, deal well with surrogate pairs.
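If you need the n-th character as a String and it might lie outside the BMP, a small helper built on offsetByCodePoints and codePointAt avoids splitting surrogate pairs. This is a sketch; the helper name is made up, and the emoji U+1F600 is written as a surrogate-pair escape so the source does not depend on the file encoding.

```java
public final class CodePoints {
    // Returns the n-th code point (0-based) of s as a String; unlike charAt, this never splits
    // a surrogate pair, so characters outside the BMP come back whole.
    static String codePointAsString(String s, int n) {
        int charIndex = s.offsetByCodePoints(0, n);
        return new String(Character.toChars(s.codePointAt(charIndex)));
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face) lies outside the BMP and is stored as two Java chars.
        String text = "a\uD83D\uDE00b";
        System.out.println(text.length());                         // 4 (Java chars)
        System.out.println(text.codePointCount(0, text.length())); // 3 (code points)
        System.out.println(codePointAsString(text, 2));            // b
    }
}
```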
You're pretty stuck with substring(), given your requirements. The standard way would be charAt(), but you said you won't accept a char data type.
You could use the String.charAt(int index) method result as the parameter for String.valueOf(char c).
String.valueOf(myString.charAt(3)) // This returns a String containing the character at index 3, i.e. the 4th character.
A hybrid approach combining charAt with your requirement of not getting char could be
newstring = String.valueOf("foo".charAt(0));
But that's not really "neater" than substring() to be honest.
It is as simple as:
String charIs = string.charAt(index) + "";
Here's code for the zyBooks exercise that reports the position of every space in passCode:
for (int i = 0; i<passCode.length(); i++)
{
char letter = passCode.charAt(i);
if (letter == ' ' )
{
System.out.println("Space at " + i);
}
}
If someone is struggling with Kotlin, the code is:
var oldStr: String = "kotlin"
var firstChar: String = oldStr.elementAt(0).toString()
Log.d("firstChar", firstChar.toString())
This returns the character at index 0 as a String, in this case k.
Remember, the index starts at 0, so in this example:
kotlin would be k=position 0, o=position 1, t=position 2, l=position 3, i=position 4 and n=position 5
codePointAt is safer to use than charAt: charAt may break when there are emojis in the string.
If the charAt function is not working for you, this code works fine:
Edittext.setText(YourString.toCharArray(),0,1);
I came across this question yesterday and I am aware it has an accepted answer; I just want to add one more solution for JavaScript, which can help people like me who are looking for it:
let name = 'Test'
console.log(name[2])
// Here at index 2 we have 's' value and this will simply give the expected output
Like this:
String a ="hh1hhhhhhhh";
char s = a.charAt(3);
This program uses the keyboard keys to play notes. I get a "String index out of range" error for every key I press, with a different index for each key, ranging from 49 for the 1 to 109 for the m. I am new to Java, and any help would be appreciated, since I've checked a bunch of forums and haven't found the answer to quite this kind of problem.
The exception is thrown at this line:
nextnote = keyboard.charAt(key);
This is my code:
public class GuitarHero {
public static void main(String[] args) {
//make array for strings
double[] notes = new double[37];
GuitarString[] strings = new GuitarString[37];
int nextnote;
int firstnote=0;
double NOTE = 440.0;
String keyboard ="1234567890qwertyuiopasdfghjklzxcvbnm";
//for loop to set notes
for(int i=0;i<37;i++){
double concert = 440.0* Math.pow(2, (i-24)/12.0);
notes[i] = concert;
for(int j=0;j<37;j++){
strings[j] = new GuitarString(concert);
}
}
while (true) {
// check if the user has typed a key; if so, process it
if (StdDraw.hasNextKeyTyped()) {
char key = StdDraw.nextKeyTyped();
//charAt gets index of character in string
nextnote = keyboard.charAt(key);
//make sure value is within string
if(nextnote>=0 && nextnote<37){
// pluck string and compute the superposition of samples
strings[nextnote].pluck();
double sample = strings[firstnote].sample()
+strings[nextnote].sample();
StdAudio.play(sample);
// advance the simulation of each guitar string by one step
strings[nextnote].tic();
firstnote=nextnote;
}
}
}
}
}
You want to call String#indexOf(int), which will give you the index of the character. String#charAt(int) returns the character at the given index.
You need the indexOf method, which "returns the index within this string of the first occurrence of the specified character",
and not charAt, which "returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing."
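A small sketch of the difference, using the keyboard string from the question; the class and helper names are made up for illustration. It also shows where the 49 to 109 range in the error messages comes from: a char passed to charAt is widened to its character code.

```java
public class IndexOfVsCharAt {
    static final String KEYBOARD = "1234567890qwertyuiopasdfghjklzxcvbnm";

    // character -> position in the keyboard string, or -1 if the key is not in it
    static int keyIndex(char key) {
        return KEYBOARD.indexOf(key);
    }

    // position -> character
    static char keyAt(int index) {
        return KEYBOARD.charAt(index);
    }

    public static void main(String[] args) {
        System.out.println(keyIndex('q')); // 10
        System.out.println(keyAt(10));     // q
        System.out.println(keyIndex('!')); // -1
        // keyAt('q') compiles, because 'q' widens to the int 113,
        // but it throws StringIndexOutOfBoundsException (the string has 36 chars)
    }
}
```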
The problem is here:
StdDraw.nextKeyTyped(); documentation says:
What is the next key that was typed by the user? This method returns a
Unicode character corresponding to the key typed (such as 'a' or 'A').
It cannot identify action keys (such as F1 and arrow keys) or modifier
keys (such as control).
At this line, key is a character, not an index. Do the following instead:
int charIndexInKeyboard = keyboard.indexOf(key);
if (charIndexInKeyboard != -1) { // -1 means the key is not in the keyboard string
nextnote = charIndexInKeyboard;
}
nextnote should now contain the index of the key you pressed, which is what you need to select the string.
EDIT: Here is how your while loop should look now:
while (true) {
// check if the user has typed a key; if so, process it
if (StdDraw.hasNextKeyTyped()) {
char key = StdDraw.nextKeyTyped();
int charIndexInKeyboard = keyboard.indexOf(key);
if(charIndexInKeyboard == -1){
// Not recognized, just continue to next
continue;
}
nextnote = charIndexInKeyboard; // the key's index in the keyboard string selects the string
// pluck string and compute the superposition of samples
strings[nextnote].pluck();
double sample = strings[firstnote].sample()
+strings[nextnote].sample();
StdAudio.play(sample);
// advance the simulation of each guitar string by one step
strings[nextnote].tic();
firstnote=nextnote;
}
}
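A minimal sketch that folds the lookup into a hypothetical noteFrequency(char) helper, reusing the question's keyboard string and its formula 440 * 2^((i - 24) / 12); returning -1.0 for keys that are not mapped is an arbitrary convention chosen here. Separately, note that the question's setup loop contains an inner loop that reassigns every strings[j] on each pass, so as written every string ends up with the last frequency; strings[i] = new GuitarString(concert) inside the outer loop is probably what was intended.

```java
public class KeyToFrequency {
    static final String KEYBOARD = "1234567890qwertyuiopasdfghjklzxcvbnm";

    // Frequency for a typed key, using the question's formula with i = index of the key.
    // Returns -1.0 for keys that are not in KEYBOARD (hypothetical convention).
    static double noteFrequency(char key) {
        int i = KEYBOARD.indexOf(key);
        if (i == -1) {
            return -1.0;
        }
        return 440.0 * Math.pow(2, (i - 24) / 12.0);
    }

    public static void main(String[] args) {
        System.out.println(noteFrequency('g')); // 440.0 (index 24)
        System.out.println(noteFrequency('1')); // 110.0 (index 0, two octaves lower)
        System.out.println(noteFrequency('!')); // -1.0
    }
}
```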
I am trying to learn Java by doing some assignments from a Stanford class and am having trouble answering this question.
boolean stringIntersect(String a, String b, int len): Given 2 strings,
consider all the substrings within them of length len. Returns true if
there are any such substrings which appear in both strings. Compute
this in O(n) time using a HashSet.
I can't figure out how to do it using a HashSet, because you cannot store repeating characters. For example, stringIntersect("hoopla", "loopla", 5) should return true.
Thanks!
Edit: Thanks so much for all your prompt responses. It was helpful to see explanations as well as code. I guess I couldn't see why storing substrings in a HashSet would make the algorithm more efficient. I originally had a solution like this:
public static boolean stringIntersect(String a, String b, int len) {
assert (len>=1);
if (len>a.length() || len>b.length()) return false;
String s1, s2; // s1 will hold the shorter string, s2 the longer one
if (a.length()<b.length()){
s1=a;
s2=b;
}
else {
s1=b;
s2=a;
}
int index = 0;
while (index<=s1.length()-len){
if (s2.contains(s1.substring(index,index+len)))return true;
index++;
}
return false;
}
I'm not sure I understand what you mean by "you cannot store repeating characters". A HashSet is a Set, so it can do two things: you can add values to it, and you can check whether a value is already in it. In this case, the problem wants you to answer the question by storing strings, not chars, in the HashSet. To do this in Java:
Set<String> stringSet = new HashSet<String>();
Try breaking this problem into two parts:
1. Generate all the substrings of length len of a string
2. Use this to solve the problem.
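Part 1 on its own might look like this minimal sketch (the class and method names are made up); a string of length n has n - len + 1 substrings of length len, and none if len is larger than n.

```java
import java.util.ArrayList;
import java.util.List;

public class Substrings {
    // All substrings of s that have length len, in order of their start index
    static List<String> substringsOfLength(String s, int len) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + len <= s.length(); i++) {
            result.add(s.substring(i, i + len));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(substringsOfLength("hoopla", 5)); // [hoopl, oopla]
    }
}
```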
The hint for part two is:
Step 1: For the first string, enter all its substrings of length len into a HashSet.
Step 2: For the second string, check whether any of its substrings of length len is already in the HashSet.
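The two steps can be sketched as follows. This is one possible solution, not the course's reference answer; returning false for len < 1 is an assumption (the asker's version asserts len >= 1 instead).

```java
import java.util.HashSet;
import java.util.Set;

public class StringIntersect {
    // True if a and b share any substring of length len
    static boolean stringIntersect(String a, String b, int len) {
        if (len < 1 || len > a.length() || len > b.length()) {
            return false;
        }
        // Step 1: put every length-len substring of a into the set
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + len <= a.length(); i++) {
            seen.add(a.substring(i, i + len));
        }
        // Step 2: look up every length-len substring of b
        for (int i = 0; i + len <= b.length(); i++) {
            if (seen.contains(b.substring(i, i + len))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(stringIntersect("hoopla", "loopla", 5)); // true ("oopla")
        System.out.println(stringIntersect("hoopla", "loopla", 6)); // false
    }
}
```

Each lookup hashes a string of length len, which is why the note below questions the O(n) claim.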
Note (advanced): this problem is poorly specified. Entering a string into a hash table, or looking it up, takes time proportional to the length of the string. A string a of length n has O(n-k) substrings of length k (where k = len). So for a string a of length n and a string b of length m you have O((n-k)*k + (m-k)*k), which is not really O(n): for k = n/2 the running time is O((n/2)*(n/2)) = O(n^2).
Edit: So what if you actually want to do this in O(n) (or perhaps O(n+m+k))? My belief is that the original homework was asking for something like the algorithm I described above. But we can do better, and still make a HashSet the crucial tool for our algorithm. The idea is to perform our search using a "rolling hash." Wikipedia describes a couple: http://en.wikipedia.org/wiki/Rolling_hash, but we will implement our own.
A simple solution would be to XOR the values of the character hashes together. This would let us add a new char to the hash in O(1) and remove one in O(1), making computing the next hash trivial. But this simple algorithm won't work, for two reasons:
1. The character hashes may not provide sufficient entropy. Okay, we don't know whether we will have this problem, but let's solve it anyway, just for fun.
2. We will hash permutations to the same value: XOR is commutative, so "abc" gets the same hash as "cba", and it should not.
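A quick Java illustration of the second problem, assuming the naive scheme is simply the XOR of one random value per character (the class and method names are made up; the lazy random values anticipate the LazyCharHash object below):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class PlainXorHash {
    private static final Map<Character, Integer> VALUES = new HashMap<>();
    private static final Random RANDOM = new Random();

    // Lazily assign each character a random int the first time it is seen
    static int charValue(char c) {
        return VALUES.computeIfAbsent(c, k -> RANDOM.nextInt());
    }

    // Naive hash: XOR of the character values, with no shifting
    static int xorHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h ^= charValue(s.charAt(i));
        }
        return h;
    }

    public static void main(String[] args) {
        // XOR is commutative, so every permutation gets the same hash
        System.out.println(xorHash("abc") == xorHash("cba")); // true
        // and any character XORed with itself cancels out
        System.out.println(xorHash("aa"));                    // 0
    }
}
```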
To solve the first problem we can borrow an idea from AI, namely Zobrist hashing. The idea is to assign every possible character a random value of a greater length. If we were using ASCII, we could easily create an array with all the ASCII characters, but that runs into problems with Unicode characters. The alternative is to assign values lazily.
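import scala.collection.mutable.HashMap
import scala.util.Random
// Lazily gives each character a random Int the first time it is seen (Zobrist-style)
object LazyCharHash{
private val map = HashMap.empty[Char,Int]
private val r = new Random
def lHash(c: Char): Int =
map.getOrElseUpdate(c, r.nextInt()) // draw and remember a random value on first use
}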
This is Scala code. Scala tends to be less verbose than Java but still lets me use Java collections, so I will write imperative-style Scala throughout. It wouldn't be hard to translate.
The second problem can be solved as well. Instead of using a pure XOR, we combine the XOR with a shift, so the hash function is now:
def fullHash(s: String) = {
var h = 0
for(i <- 0 until s.length){
h = h >>> 1
h = h ^ LazyCharHash.lHash(s.charAt(i))
}
h
}
Of course, using fullHash directly won't give a performance advantage; it is just a specification of the hash we want to compute.
We need a way of using our hash function to store values in the HashSet (I promised we would use it). We can just create a wrapper class:
class HString(hash: Int, string: String){
def getHash = hash
def getString = string
override def equals(otherHString: Any): Boolean = {
otherHString match {
case other: HString => (hash == other.getHash) && (string == other.getString)
case _ => false
}
}
override def hashCode = hash
}
Okay, to make the hashing function rolling, we just have to XOR out the value associated with the character that leaves the window. That value has been shifted right once for every character added after it, so this just takes shifting it by the appropriate amount (len) before the XOR.
def stringIntersect(a: String, b: String, len: Int): Boolean = {
val stringSet = new HashSet[HString]()
var h = 0
for(i <- 0 until len){
h = h >>> 1
h = h ^ LazyCharHash.lHash(a.charAt(i))
}
stringSet.add(new HString(h,a.substring(0,len)))
for(i <- len until a.length){
h = h >>> 1
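if (len < 32) h = h ^ (LazyCharHash.lHash(a.charAt(i - len)) >>> len) // Int shifts use only 5 bits of the distance; for len >= 32 the old char is already shifted out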
h = h ^ LazyCharHash.lHash(a.charAt(i))
stringSet.add(new HString(h,a.substring(i - len + 1,i + 1)))
}
...
You can figure out how to finish this code on your own.
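If you want to check your work, here is one possible completion, translated to Java so it runs on its own. It is a sketch under stated assumptions, not the author's code: the class and helper names (RollingIntersect, windowHashes) are made up, each window hash follows fullHash's (h >>> 1) ^ lHash(c) recurrence, and the XOR-out step is skipped when len >= 32 because Java, like Scala, masks int shift distances to 5 bits and by then the outgoing character has been shifted out completely. Because it still calls substring for each window, it is not strictly linear.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class RollingIntersect {
    private static final Map<Character, Integer> VALUES = new HashMap<>();
    private static final Random RANDOM = new Random();

    // Zobrist-style: each character gets a random int the first time it is seen
    static int lHash(char c) {
        return VALUES.computeIfAbsent(c, k -> RANDOM.nextInt());
    }

    // Hash of every length-len window of s, using h = (h >>> 1) ^ lHash(c);
    // requires 1 <= len <= s.length()
    static int[] windowHashes(String s, int len) {
        int[] hashes = new int[s.length() - len + 1];
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h >>>= 1;
            if (i >= len && len < 32) {
                // XOR out the char leaving the window; it has been shifted right len times
                h ^= lHash(s.charAt(i - len)) >>> len;
            }
            h ^= lHash(s.charAt(i));
            if (i >= len - 1) {
                hashes[i - len + 1] = h;
            }
        }
        return hashes;
    }

    // Java counterpart of HString: precomputed hash, but equality still checks the text
    static final class HString {
        final int hash;
        final String string;

        HString(int hash, String string) {
            this.hash = hash;
            this.string = string;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof HString)) {
                return false;
            }
            HString other = (HString) o;
            return hash == other.hash && string.equals(other.string);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    static boolean stringIntersect(String a, String b, int len) {
        if (len < 1 || len > a.length() || len > b.length()) {
            return false;
        }
        Set<HString> set = new HashSet<>();
        int[] ha = windowHashes(a, len);
        for (int i = 0; i < ha.length; i++) {
            set.add(new HString(ha[i], a.substring(i, i + len)));
        }
        int[] hb = windowHashes(b, len);
        for (int i = 0; i < hb.length; i++) {
            if (set.contains(new HString(hb[i], b.substring(i, i + len)))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(stringIntersect("hoopla", "loopla", 5)); // true
        System.out.println(stringIntersect("hoopla", "loopla", 6)); // false
    }
}
```

As with the Scala HString, the precomputed hash only decides the bucket while equals still compares the text, so a hash collision can cost time but never produces a wrong answer.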
Is this O(n)? Well, it depends on what you mean. Big O, big Omega and big Theta are all bounds. They could describe the worst case of the algorithm, the best case, or something else. In this case these modifications give expected O(n) performance, but only if we avoid hash collisions, because it still takes O(len) to tell whether two Strings are equal (and, as written, each substring call also copies len characters). This random approach works pretty well, and you can scale up the size of the random bit arrays to make it work better, but it does not have guaranteed performance.
You should store substrings in the HashSet, not characters.
When considering the string "hoopla": if you store the substrings "hoopl" and "oopla" in the HashSet (a linear operation), then it's linear again to find whether one of the substrings of "loopla" matches.
I don't know how they intend you to use the HashSet, but I ended up with a solution like this:
import java.util.HashSet;
import java.util.Set;
public class StringComparator {
public static boolean compare( String a, String b, int len ) {
Set<String> pieces = new HashSet<String>();
for ( int x = 0; (x + len) <= a.length(); x++ ) { // substrings of a, so bound by a.length()
pieces.add( a.substring( x, x + len ) );
}
for ( String piece : pieces ) {
if ( b.contains(piece) ) {
return true;
}
}
return false;
}
}