Finding a string inside a string twice as big effectively

I want to check whether the differences between adjacent numbers in one array match those of another array, or a rotation of them. For example:
A = {1,2,4}, so the differences are {1,1,2}
B = {4,6,7}, the differences are {1,2,1}
If all elements in {1,2,1} were shifted clockwise by one element, the result is {1,1,2}, which matches.
So far I convert the differences to strings, and then check whether the differences of the second array are found in those of the first concatenated with itself:
valid if "1 2 1" is in "1 1 2 1 1 2"
My code so far looks like this (count is the length of the arrays; both have the same length):
int c = count - 1;
StringBuilder b1 = new StringBuilder();
StringBuilder b2 = new StringBuilder();
for (int i = 0; i < c; i++) {
    b1.append(array1[i + 1] - array1[i]);
    b1.append(" ");
    b2.append(array2[i + 1] - array2[i]);
    b2.append(" ");
}
b1.append((array1[0] - array1[c]) + d);
b1.append(" ");
b2.append((array2[0] - array2[c]) + d);
String a2 = b2.toString();
String a3 = b1.toString() + b1.toString();
System.out.println(a3.contains(a2) ? "valid" : "not valid"); //bottleneck here
My problem is that with big arrays (up to about 250,000 elements) I get a massive bottleneck at the last line, at the .contains() call. Is there a faster way to check for containment than the method I'm using, can I check while building up the string, or is there a completely different way of doing this?

You need a more efficient algorithm than the one used by the contains method (it actually depends on the concrete implementation, but it looks like it is not efficient in the version of Java you are using).
You can use Knuth-Morris-Pratt algorithm: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm. It has linear time and space complexity in the worst case so it works fast even for very big arrays. Note that there is no need to convert an array to a string, because this algorithm works for arrays, too.
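As a rough sketch of that idea, assuming the two difference arrays have already been computed (wrap-around term included): the class and method names below (RotationCheck, failureTable, isRotation) are made up for illustration. Instead of building a doubled array, the search reads the first array through a modulo index.

```java
class RotationCheck {
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(int[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // True if b occurs in a concatenated with itself, i.e. b is a rotation of a.
    static boolean isRotation(int[] a, int[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = a.length;
        if (n == 0) {
            return true;
        }
        int[] fail = failureTable(b);
        int k = 0; // number of elements of b matched so far
        for (int i = 0; i < 2 * n; i++) {
            int x = a[i % n]; // element i of the virtual array a + a
            while (k > 0 && x != b[k]) {
                k = fail[k - 1];
            }
            if (x == b[k]) {
                k++;
            }
            if (k == n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] d1 = {1, 1, 2};
        int[] d2 = {1, 2, 1};
        System.out.println(isRotation(d1, d2) ? "valid" : "not valid"); // valid
    }
}
```

Both the table construction and the scan touch each element a bounded number of times, so the whole check stays linear in the array length.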

Related

Hashmap in for loop not reading all the input

This is for AOC day 2. The input is something along the lines of
"6-7 z: dqzzzjbzz
13-16 j: jjjvjmjjkjjjjjjj
5-6 m: mmbmmlvmbmmgmmf
2-4 k: pkkl
16-17 k: kkkkkkkkkkkkkkkqf
10-16 s: mqpscpsszscsssrs
..."
It's formatted like 'min-max letter: password' and separated by line. I'm supposed to find how many passwords meet the minimum and maximum requirements. I put the whole input into a string variable and used Pattern.quote("\n") to separate the lines into a string array. This worked fine. Then I replaced all the characters except for the numbers and '-' using the pattern Pattern.compile("[^0-9]|-"), ran that for every index in the array, and used .trim() to cut off the whitespace at the start and end of each string. This is all working fine; I'm getting the desired output like 6 7 and 13 16.
However, now I want to try and split this string into two. This is my code:
HashMap<Integer,Integer> numbers = new HashMap<Integer,Integer>();
for(int i = 0; i < inputArray.length; i++){
    String [] xArray = x[i].split(Pattern.quote(" "));
    int z = Integer.valueOf(xArray[0]);
    int y = Integer.valueOf(xArray[1]);
    System.out.println(z);
    System.out.println(y);
    numbers.put(z, y);
}
System.out.println(numbers);
So, I first make a HashMap which will store <min, max> values. Then the for loop (which runs 1000 times) splits every entry, such as 6 7 or 13 16, into two at the " ". The System.out.println(z); and System.out.println(y); calls work as intended:
6
7
13
16
...
This output goes on to give me 2000 integers, one per line. That's exactly what I want. However, System.out.println(numbers); outputs:
{1=3, 2=10, 3=4, 4=7, 5=6, 6=9, 7=12, 8=11, 9=10, 10=18, 11=16, 12=13, 13=18, 14=16, 15=18, 16=18, 17=18, 18=19, 19=20}
I have no idea where to even start with debugging this. I made a test file with an array that is formatted like "even, odd" integers all the way up to 100. Using this exact same code (I did change the variable names), I'm getting a better output. It's not exactly desired since it starts at 350=351 and then goes to like 11=15 and continues in a non-chronological order but at least it contains all the 100 keys and values.
Also, completely unrelated question but is my formatting of the for loop fine? The extra space at the beginning and the end of the code?
Edit: I want my expected output to be something like {6=7, 13=16, 5=6, 2=4, 16=17...}. Basically, the hashmap would have the minimum and maximum as the key and value and it'd be in chronological order.

The problem with your code is that you're trying to put in a nail with a saw. A HashMap is not the right tool to achieve what you want, since:
- Keys are unique. If you put the same key multiple times, the earlier value is overwritten.
- The order of items in a HashMap is undefined.
- A HashMap expresses a key-value relationship, which does not exist in this context.
A better data structure for your min/max values would probably just be an ArrayList<IntegerPair>, where you would have to define IntegerPair yourself, since the Java standard library does not ship a general-purpose pair type.
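A minimal sketch of that suggestion, under the assumption that each entry is already reduced to a string like "6 7". The IntegerPair name comes from the answer; its fields min and max, the parse helper and the PairListDemo class are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PairListDemo {
    // Hypothetical pair type; the field names are assumptions.
    static final class IntegerPair {
        final int min;
        final int max;

        IntegerPair(int min, int max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return min + "=" + max;
        }
    }

    // Turns entries such as "6 7" into pairs, keeping input order and duplicates.
    static List<IntegerPair> parse(String[] entries) {
        List<IntegerPair> pairs = new ArrayList<>();
        for (String entry : entries) {
            String[] parts = entry.trim().split(" ");
            pairs.add(new IntegerPair(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] x = {"6 7", "13 16", "5 6", "6 9"};
        System.out.println(parse(x)); // [6=7, 13=16, 5=6, 6=9]
    }
}
```

Unlike the HashMap, the list keeps both entries whose minimum is 6 and preserves the order in which they were read.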

I think you are complicating the task unnecessarily. I would proceed as follows:
1. split the input using the line separator
2. for each line, remove the : and split on spaces to get an array of length 3
3. from the array in step two, build:
3.1. the min/max char count from array[0]
3.2. character classes for the letter and its negation
3.3. remove from the password all letters that do not correspond to the given one and check if the remaining length is in range.
Something like:
public static void main(String[] args){
    String input = "6-7 z: dqzzzjbzz\n" +
            "13-16 j: jjjvjmjjkjjjjjjj\n" +
            "5-6 m: mmbmmlvmbmmgmmf\n" +
            "2-4 k: pkkl\n" +
            "16-17 k: kkkkkkkkkkkkkkkqf\n" +
            "10-16 s: mqpscpsszscsssrs\n";
    int count = 0;
    for(String line : input.split("\n")){
        String[] temp = line.replace(":", "").split(" "); //[6-7, z, dqzzzjbzz]
        String minMax = "{" + (temp[0].replace('-', ',')) + "}"; //{6,7}
        String letter = "[" + temp[1] + "]"; //[z]
        String letterNegate = "[^" + temp[1] + "]"; //[^z]
        // Strip every other letter, then check the remaining run against [z]{6,7}
        if(temp[2].replaceAll(letterNegate, "").matches(letter + minMax)){
            count++;
        }
    }
    System.out.println(count + " passwords are valid");
}

is there a clean way in Java to output part of an int array as a string?

Given an array of int and an index k, the task is to output the array left-rotated by k.
For example:
a = [1,2,3,4,5], k = 4
Output:
5 1 2 3 4
In Javascript the code is:
var result = a.slice(k).join(' ') + ' ' + a.slice(0, k).join(' ')
I'm really struggling to find a concise equivalent in Java.

As you say you only need to output, this will work :
int[] a = new int[]{1, 2, 3, 4, 5};
int k = 4;
for (int i = 0; i < a.length; i++) {
    System.out.print(a[(i + k) % a.length] + " ");
}

This depends on your definition of concise, I suppose. Here's a bit of Java 8 Stream capability that does what you want, in a similar way to your JS code, and is arguably somewhat concise.
int[] arr = {1, 2, 3, 4, 5};
int k = 4;
String result = Arrays.stream(arr).skip(k).mapToObj(Integer::toString).collect(Collectors.joining(" ")) + " " + Arrays.stream(arr).limit(k).mapToObj(Integer::toString).collect(Collectors.joining(" "));
To break down what each bit does:
Arrays.stream(arr) just creates a Stream of ints (Specifically, an IntStream, since int, as a built-in type, must have its own wrapper stream class rather than the generic Stream<T>).
skip(k) essentially discards as many elements as the provided input, so we can start with the part from the index forward.
mapToObj(Integer::toString) is how we convert the Stream from an IntStream to a Stream<String>, so that we can join it into a single String. It takes each element in the Stream and applies the Integer.toString method to it, i.e. if each element of the Stream were in turn referred to as int, it would be like calling Integer.toString(int). We are now left with a Stream of Strings representing the numbers.
collect(Collectors.joining(" ")) takes each value, and gives it to a Collector. Collectors take Streams and reduce them to single objects. In this case, the joining Collector takes each object and joins them all to a single string, and the argument we have provided is the delimiter it uses.
Looking at the second half of the procedure, the only really different part is limit(k), which simply takes the first k elements of the Stream and ignores the rest.
It's a bit more long-winded than your JS example (partly because of Java's strong typing and partly because Java arrays have no methods such as slice or join of their own), but fairly easy to understand and very powerful.
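For comparison, here is a sketch that combines the modulo indexing from the loop answer with a single stream pipeline. The RotateJoin class and rotateLeft method are illustrative names, and the code assumes 0 <= k and a non-empty array.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RotateJoin {
    // Builds the left-rotated output by walking the indices once
    // and reading the array at (i + k) modulo its length.
    static String rotateLeft(int[] a, int k) {
        return IntStream.range(0, a.length)
                .mapToObj(i -> String.valueOf(a[(i + k) % a.length]))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(rotateLeft(new int[]{1, 2, 3, 4, 5}, 4)); // 5 1 2 3 4
    }
}
```

Because the index wraps around, this version also accepts a k larger than the array length, and it produces no trailing space.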

Find every possible subset given a string [duplicate]

I'm trying to find every possible anagram of a string in Java. By this I mean that if I have a 4-character word, I want all the possible 3-character words derived from it, all the 2-character ones and all the 1-character ones. The most straightforward way I thought of is to use two nested for loops and iterate over the string. This is my code as of now:
private ArrayList<String> subsets(String word){
    ArrayList<String> s = new ArrayList<String>();
    int length = word.length();
    for (int c=0; c<length; c++){
        for (int i=0; i<length-c; i++){
            String sub = word.substring(c, c+i+1);
            System.out.println(sub);
            //if (!s.contains(sub) && sub!=null)
            s.add(sub);
        }
    }
    //java.util.Collections.sort(s, new MyComparator());
    //System.out.println(s.toString());
    return s;
}
My problem is that it works for 3-letter words: fun yields this result (don't mind the ordering; the word is preprocessed so that its letters are in alphabetical order):
f
fn
fnu
n
nu
u
But when I try 4 letter words, it leaves something out, as in catq gives me:
a
ac
acq
acqt
c
cq
cqt
q
qt
t
i.e., I don't see the 3 character long word act - which is the one I'm looking for when testing this method. I can't understand what the problem is, and it's most likely a logical error I'm making when creating the substrings. If anyone can help me out, please don't give me the code for it but rather the reasoning behind your solution. This is a piece of coursework and I need to come up with the code on my own.
EDIT: To clarify, for me acq, qca, caq, aqc, cqa, qac, etc. are the same thing. The string gets sorted in alphabetical order, so all those permutations should come up as one unique result, acq. So I don't need all the permutations of a string but rather, given a 4-character string, all the 3-character ones that I can derive from it: that means taking out one character at a time and returning that string as a result, doing that for every character in the original string.
I hope I have made my problem a bit clearer.

It's working fine, you just misspelled "caqt" as "acqt" in your tests/input.
(The issue is probably that you're sorting your input. If you want substrings, you have to leave the input unsorted.)
After your edits: see "Generating all permutations of a given string". Then just sort the individual letters and put them in a set.

Ok, as you've already devised your own solution, I'll give you my take on it. Firstly, consider how big your result list is going to be. You're essentially taking each letter in turn, and either including it or not. 2 possibilities for each letter, gives you 2^n total results, where n is the number of letters. This of course includes the case where you don't use any letter, and end up with an empty string.
Next, enumerate every possibility with a 1 for 'include this letter' and a 0 for 'don't include it'. Taking your 'fnu' example, you end up with:
000 - ''
001 - 'u'
010 - 'n'
011 - 'nu'
100 - 'f'
101 - 'fu' (no offense intended)
110 - 'fn'
111 - 'fnu'.
Clearly, these are just binary numbers, and you can derive a function that given any number from 0-7 and the three letter input, will calculate the corresponding subset.
It's fairly easy to do in java.. don't have a java compiler to hand, but this should be approximately correct:
public String getSubSet(String input, int index) {
    // Should check that index >= 0 and < 2^input.length() here.
    // Should also check that input.length() <= 31.
    String returnValue = "";
    for (int i = 0; i < input.length(); i++) {
        // Bit i of index decides whether character i is included.
        if ((index & (1 << i)) != 0) // 1 << i is the equivalent of 2^i
            returnValue += input.charAt(i);
    }
    return returnValue;
}
Then, if you need to you can just do a loop that calls this function, like this:
for (int i = 1; i < (1 << input.length()); i++)
    getSubSet(input, i); // the result is discarded here; add it to a list or print it as needed.
Note I started from 1 instead of 0- this is because the result at index 0 will be the empty string. Incidentally, this actually does the least significant bit first, so your output list would be 'f', 'n', 'fn', 'u', 'fu', 'nu', 'fnu', but the order didn't seem important.

This is the method I came up with, seems like it's working
private void subsets(String word, ArrayList<String> subset){
    if(word.length() == 1){
        subset.add(word);
        return;
    }
    else {
        String firstChar = word.substring(0,1);
        word = word.substring(1);
        subsets(word, subset);
        int size = subset.size();
        for (int i = 0; i < size; i++){
            String temp = firstChar + subset.get(i);
            subset.add(temp);
        }
        subset.add(firstChar);
        return;
    }
}
I first check whether the word is a single character; if so, I add that character to the ArrayList and return. If it is longer, I save the first character and make a recursive call with the rest of the String. The string is thus sliced into characters saved on the recursive call stack until only one character remains.
When that happens, the character is added to the List and the recursion starts to unwind. Each caller looks at the size of the list (1 the first time), and a for loop adds the character saved for that call concatenated with every element already in the ArrayList. Then it adds that character on its own and returns to the previous call.
For example, with the word fun this happens:
f saved
List empty
recursive call(un)
-
u saved
List empty
recursive call(n)
-
n.length == 1
List = [n]
return
-
list.size=1
temp = u + list[0]
List = [n, un]
add the character saved in the stack on its own
List = [n, un, u]
return
-
list.size=3
temp = f + list[0]
List = [n, un, u, fn]
temp = f + list[1]
List = [n, un, u, fn, fun]
temp = f + list[2]
List = [n, un, u, fn, fun, fu]
add the character saved in the stack on its own
List = [n, un, u, fn, fun, fu, f]
return
I have been as clear as possible, I hope this clarifies what was my initial problem and how to solve it.

This is working code:
public static void main(String[] args) {
    String input = "abcde";
    Set<String> returnList = permutations(input);
    System.out.println(returnList);
}
private static Set<String> permutations(String input) {
    if (input.length() == 1) {
        Set<String> a = new TreeSet<>();
        a.add(input);
        return a;
    }
    Set<String> returnSet = new TreeSet<>();
    for (int i = 0; i < input.length(); i++) {
        String prefix = input.substring(i, i + 1);
        Set<String> permutations = permutations(input.substring(i + 1));
        returnSet.add(prefix);
        returnSet.addAll(permutations);
        Iterator<String> it = permutations.iterator();
        while (it.hasNext()) {
            returnSet.add(prefix + it.next());
        }
    }
    return returnSet;
}

String manipulation - GA

I'm having trouble (all sorts of different ones) with the mutation function for a genetic algorithm. I manipulate strings as DNA, which come from Integer.toString(Float.floatToIntBits(value)). Everything crosses over nicely and repopulates, so now it's time for some nasty mutations. And now I have a problem. This is my mutation function:
public void muttate() {
    Random rand = new Random();
    int mutationPoint = rand.nextInt(valueString.length()-1);
    //int mutationPoint=valueString.length()-1;
    //System.out.println(mutationPoint);
    if(mutationPoint==0)
        valueString = rand.nextInt(10)
                + valueString.substring(0);
    else if (mutationPoint == 1)
        valueString = valueString.charAt(0)
                + Integer.toString(rand.nextInt(10))
                + valueString.substring(mutationPoint);
    else if (mutationPoint != valueString.length()-1)
        valueString = valueString.substring(0, mutationPoint-1)
                + Integer.toString(rand.nextInt(10))
                + valueString.substring(mutationPoint);
    else
        valueString = valueString.substring(0, mutationPoint - 1)
                + Integer.toString(rand.nextInt(10));
    changeStringtovalue();
    calculateFitnes();
}
When I run it, I see it eats up my DNA (the length is 9 at first, then after some time 8, and so on). This comes from the mutation part, not the cross-overs (tested). I think it's some kind of stupid mistake, but I just can't find a clue.
Also, is that kind of mutation even valid for this situation? Maybe I should manipulate bits, after applying masks, to get to a certain part of that float.

Mutate is spelled without the double tt. Your code will never mutate the last position because of the way you call random.
The problem is substring(start, end) returns a string that does not include the end index character. So you're losing one character. The whole if block is unneeded as well.
If you're trying to mutate, you could write a function like this:
public void mutate() {
    Random rand = new Random();
    int mutPos = rand.nextInt(valueString.length());
    valueString = valueString.substring(0, mutPos)
            + rand.nextInt(10) + valueString.substring(mutPos+1);
}
Here are some tips:
Given the string "ABC":
substring(1,2) returns "B", index 1 inclusive to index 2 exclusive.
substring(0, string.length()) returns the whole string.
substring(0) returns the whole string.
substring(i, i) returns "".
substring also returns "" if it starts (and ends) at the index one past the last character (at index string.length()).
This allows you to easily handle corner cases without if statements, as demonstrated in the code I wrote above.
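The substring rules can be checked with a small sketch. It restates the mutation as a static helper that takes the string and the Random as parameters, which is an assumption made here so the example runs on its own; the MutateDemo class and the sample digit string are likewise illustrative.

```java
import java.util.Random;

class MutateDemo {
    // Replaces the character at one random position with a random digit.
    // The two substring calls never overlap, so the length is preserved.
    static String mutate(String dna, Random rand) {
        int pos = rand.nextInt(dna.length());
        return dna.substring(0, pos) + rand.nextInt(10) + dna.substring(pos + 1);
    }

    public static void main(String[] args) {
        String s = "ABC";
        System.out.println(s.substring(1, 2));               // B
        System.out.println(s.substring(0, s.length()));      // ABC
        System.out.println(s.substring(s.length()).isEmpty()); // true

        String dna = "1065353216"; // sample digit string
        String mutated = mutate(dna, new Random());
        System.out.println(mutated.length() == dna.length()); // true
    }
}
```

When pos is the last index, dna.substring(pos + 1) is the empty string, and when pos is 0, dna.substring(0, pos) is empty, so neither end needs a special case.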

Exporting specific pattern of string using split method in a most efficient way

I want to extract the pattern of a bit stream held in a String variable. Assume the bit stream is something like bitStream="111000001010000100001111". I am looking for Java code that saves this bit stream in an array (say bitArray) such that each continuous run of "0"s or "1"s is stored in one array element. In this example the output would be something like this:
bitArray[0]="111"
bitArray[1]="00000"
bitArray[2]="1"
bitArray[3]="0"
bitArray[4]="1"
bitArray[5]="0000"
bitArray[6]="1"
bitArray[7]="0000"
bitArray[8]="1111"
I want to use bitArray to calculate the number of bits stored in each continuous run. In this case the final output would be "3,5,1,1,1,4,1,4,4". I figured out that the split method would probably solve this for me, but I don't know which splitting pattern to use: bitStream.split("1+") splits on runs of "1", and bitStream.split("0+") splits on runs of "0", but how can it be based on both?
Mathew suggested this solution and it works:
var wholeString = "111000001010000100001111";
wholeString = wholeString.replace('10', '1,0');
wholeString = wholeString.replace('01', '0,1');
stringSplit = wholeString.split(',');
My question is "Is this solution the most efficient one?"

Try replacing any occurrence of "01" and "10" with "0,1" and "1,0" respectively. Then once you've injected the commas, split the string using the comma as the delimiting character.
String wholeString = "111000001010000100001111";
wholeString = wholeString.replace("10", "1,0");
wholeString = wholeString.replace("01", "0,1");
String stringSplit[] = wholeString.split(",");

You can do this with a simple regular expression. It matches 1s and 0s and will return each in the order they occur in the stream. How you store or manipulate the results is up to you. Here is some example code.
String testString = "111000001010000100001111";
Pattern pattern = Pattern.compile("1+|0+");
Matcher matcher = pattern.matcher(testString);
while (matcher.find())
{
    System.out.print(matcher.group().length());
    System.out.print(" ");
}
This will result in the following output:
3 5 1 1 1 4 1 4 4
One option for storing the results is to put them in an ArrayList<Integer>
Since the OP wanted the most efficient solution, I ran some tests to see how long each answer takes to iterate over a large stream 10000 times and came up with the following results. The times differed from run to run, but the order from fastest to slowest remained the same. I know timing tests like this have their issues, such as not accounting for system load, but I just wanted a quick test.
My answer completed in 1145 ms
Alessio's answer completed in 1202 ms
Matthew Lee Keith's answer completed in 2002 ms
Evgeniy Dorofeev's answer completed in 2556 ms
Hope this helps

I won't give you a code, but I'll guide you to a possible solution:
Construct an ArrayList<Integer> and iterate over the bits. As long as you see 1's, increment a counter; as soon as you see a 0, add the counter to the ArrayList and start counting again (and likewise when the run of 0's ends). After this procedure you'll have an ArrayList of numbers, e.g. [1,2,2,3,4], representing the lengths of the runs of 1's and 0's.
Then you construct an array of the size of the ArrayList, and fill it accordingly.
The time complexity is O(n) because you need to iterate on the array only once.

This code works for any String and pattern, not only 1s and 0s. Iterate char by char: if the current char is equal to the previous one, append it to the last element of the List; otherwise create a new element in the list.
public List<String> getArray(String input){
    List<String> output = new ArrayList<String>();
    if(input==null || input.isEmpty()) return output;
    int count = 0; // index of the last (current) run in output
    char [] inputA = input.toCharArray();
    output.add(inputA[0]+"");
    for(int i = 1; i <inputA.length;i++){
        if(inputA[i]==inputA[i-1]){
            // Same char as before: extend the current run in place
            output.set(count, output.get(count)+inputA[i]);
        }
        else{
            // Different char: start a new run
            output.add(inputA[i]+"");
            count++;
        }
    }
    return output;
}

try this
String[] a = s.replaceAll("(.)(?!\\1)", "$1,").split(",");
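To see what that one-liner does, here is a small sketch (the RunSplitDemo class and runs method are illustrative names). The pattern (.)(?!\1) matches any character that is not followed by another copy of itself, i.e. the last character of each run, and the replacement appends a comma after it. It assumes the input contains no commas of its own.

```java
import java.util.Arrays;

class RunSplitDemo {
    // Marks the end of every run with a comma, then splits on the commas.
    // split() drops the trailing empty string left by the final comma.
    static String[] runs(String s) {
        return s.replaceAll("(.)(?!\\1)", "$1,").split(",");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runs("111000001010000100001111")));
        // [111, 00000, 1, 0, 1, 0000, 1, 0000, 1111]
    }
}
```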

I tried to implement @Maroun Maroun's solution.
public static void main(String args[]){
    long start = System.currentTimeMillis();
    String bitStream ="0111000001010000100001111";
    int length = bitStream.length();
    char base = bitStream.charAt(0);
    ArrayList<Integer> counts = new ArrayList<Integer>();
    int count = -1;
    char currChar = ' ';
    for (int i=0;i<length;i++){
        currChar = bitStream.charAt(i);
        if (currChar == base){
            count++;
        }else {
            base = currChar;
            counts.add(count+1);
            count = 0;
        }
    }
    counts.add(count+1);
    System.out.println("Time taken :" + (System.currentTimeMillis()-start ) +"ms");
    System.out.println(counts.toString());
}
I believe this is a more efficient way: as he said, it is O(n), since you iterate only once. Since the goal is only to get the counts, not to store the runs in an array, I would recommend this. Even a regular expression has to iterate internally anyway.
The resulting output is:
Time taken :0ms
[1, 3, 5, 1, 1, 1, 4, 1, 4, 4]

Try this one:
String[] parts = input.split("(?<=1)(?=0)|(?<=0)(?=1)");
See in action here: http://rubular.com/r/qyyfHNAo0T
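A short sketch of how that lookaround split can feed the run-length output the question asks for (the LookaroundSplitDemo class and runLengths method are illustrative names). The pattern matches the empty position between a 1 and a following 0, or between a 0 and a following 1, so no characters are consumed by the split.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    // Splits at every 1/0 or 0/1 boundary and maps each run to its length.
    static int[] runLengths(String bits) {
        return Arrays.stream(bits.split("(?<=1)(?=0)|(?<=0)(?=1)"))
                .mapToInt(String::length)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runLengths("111000001010000100001111")));
        // [3, 5, 1, 1, 1, 4, 1, 4, 4]
    }
}
```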
