Check if a string contains another string twice

Check if a string contains another string twice - java

I have a for loop that iterates through a list of maps. For every entry, I want to check whether it contains a certain String more than once and then delete all occurrences except the first one, but I have no clue how to do it.
for (Map<String, String> entry : mapList) {
String line = "";
for (String key : entry.keySet()) {
if (StringUtils.containsAny(key, "5799")) {
line += entry.get(key) + "|";
}
list1.add(line);
}
}
I am thankful for every idea.

From your comments I assume your requirements are as follows:
You have a string that contains multiple parts delimited by the pipe character |, e.g. "a|e|b|c|a|c|a|d".
You want to remove all repeated strings while preserving the order of elements, e.g. you want "a|e|b|c|d".
To achieve that, you could split your string at the pipe, collect the elements into a LinkedHashSet and rejoin the elements using the pipe.
Example using Java 8:
//The pipe needs to be escaped because split() interprets the input as a regex
Set<String> elements = new LinkedHashSet<>( Arrays.asList( input.split( "\\|" ) ) );
//rejoin using the pipe
String output = elements.stream().collect( Collectors.joining( "|" ) );

To see if key contains a string s at least twice, and to remove the second occurrence, use indexOf twice, with the second call starting the search after the first occurrence:
static String removeSecond(String key, String s) {
int idxFirst = key.indexOf(s);
if (idxFirst != -1) {
int idxSecond = key.indexOf(s, idxFirst + s.length());
if (idxSecond != -1) {
return key.substring(0, idxSecond) +
key.substring(idxSecond + s.length());
}
}
return key; // Nothing to remove
}
Test
System.out.println(removeSecond("mississippi", "ss")); // prints: missiippi
System.out.println(removeSecond("mississippi", "i")); // prints: missssippi
System.out.println(removeSecond("mississippi", "pp")); // prints: mississippi
UPDATE
If you want to remove all duplicates, i.e. leave only the first occurrence, keep searching. For best performance of building the new string, use StringBuilder.
static String removeDuplicates(String key, String s) {
int idx = key.indexOf(s);
if (idx == -1)
return key; // Nothing to remove
StringBuilder buf = new StringBuilder();
int prev = 0;
for (int start = idx + s.length(); (idx = key.indexOf(s, start)) != -1; prev = start = idx + s.length())
buf.append(key.substring(prev, idx));
return (prev == 0 ? key : buf.append(key.substring(prev)).toString());
}
Test
System.out.println(removeDuplicates("mississippi", "ss")); // prints: missiippi
System.out.println(removeDuplicates("mississippi", "i")); // prints: misssspp
System.out.println(removeDuplicates("mississippi", "s")); // prints: misiippi
System.out.println(removeDuplicates("mississippi", "ab")); // prints: mississippi

If you want to remove all occurrences except the first one:
public static String removeExceptFirst(String master, String child) throws Exception {
int firstIndex = master.indexOf(child);
int lastIndexOf = master.lastIndexOf(child);
if (firstIndex == lastIndexOf) {
if (firstIndex == -1) {
throw new Exception("No occurrence!");
} else {
throw new Exception("Only one occurrence!");
}
}
while (true) {
firstIndex = master.indexOf(child);
lastIndexOf = master.lastIndexOf(child);
if (firstIndex == lastIndexOf) {
return master;
}
master = master.substring(0, lastIndexOf) + master.substring(child.length() + lastIndexOf);
}
}

Related

How to find first character after second dot java

Do you have any idea how I could get the first character after the second dot of a string?
String str1 = "test.1231.asdasd.cccc.2.a.2";
String str2 = "aaa.1.22224.sadsada";
In the first case I should get a and in the second 2.
I thought about splitting the string at the dots and extracting the first character of the third element. But it seems too complicated and I think there is a better way.

How about a regex for this?
Pattern p = Pattern.compile(".+?\\..+?\\.(\\w)");
Matcher m = p.matcher(str1);
if (m.find()) {
System.out.println(m.group(1));
}
The regex says: find anything one or more times in a non-greedy fashion (.+?), which must be followed by a dot (\\.), then again anything one or more times in a non-greedy fashion (.+?) followed by a dot (\\.). After this has matched, capture the next word character in the first group ((\\w)).

Usually a regex will do an excellent job here. Still, if you are looking for something more customizable, consider the following implementation:
private static int positionOf(String source, String target, int match) {
if (match < 1) {
return -1;
}
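// Start at -target.length() so that the first search begins at index 0
int result = -target.length();
do {
result = source.indexOf(target, result + target.length());
} while (--match > 0 && result >= 0); // >= 0: a match at index 0 is still a match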
return result;
}
and then the test is done with:
String str1 = "test..1231.asdasd.cccc..2.a.2.";
System.out.println(positionOf(str1, ".", 3)); // prints 10
System.out.println(positionOf(str1, "c", 4)); // prints 21
System.out.println(positionOf(str1, "c", 5)); // prints -1
System.out.println(positionOf(str1, "..", 2)); // prints 22
Keep in mind that the first character after the match is at position 22 + target.length(), and that the string may have no character at that index.

Without using Pattern, you can use the substring and charAt methods of the String class to achieve this:
// You can return String instead of char
public static char returnSecondChar(String strParam) {
String tmpSubString = "";
// First check if . exists in the string.
if (strParam.indexOf('.') != -1) {
// If yes, then extract substring starting from .+1
tmpSubString = strParam.substring(strParam.indexOf('.') + 1);
System.out.println(tmpSubString);
// Check if second '.' exists
if (tmpSubString.indexOf('.') != -1) {
// If it exists, get the char at index of . + 1
return tmpSubString.charAt(tmpSubString.indexOf('.') + 1);
}
}
// If the string doesn't contain two '.' characters, return '-'. You can return anything here
return '-';
}

You could do it by splitting the String like this:
public static void main(String[] args) {
String str1 = "test.1231.asdasd.cccc.2.a.2";
String str2 = "aaa.1.22224.sadsada";
System.out.println(getCharAfterSecondDot(str1));
System.out.println(getCharAfterSecondDot(str2));
}
public static char getCharAfterSecondDot(String s) {
String[] split = s.split("\\.");
// TODO check if there are values in the array!
return split[2].charAt(0);
}
I don't think it is too complicated, but using a directly matching regex is a very good (maybe better) solution anyway.
Please note that a String input with fewer than two dots is possible and would have to be handled (see the TODO comment in the code).
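One way the TODO above could be handled is sketched below. The class name `SecondDot`, the method name `charAfterSecondDot` and the choice of `Optional<Character>` as the return type are assumptions made for illustration; they are not part of the answer.

```java
import java.util.Optional;

class SecondDot {
    // Hypothetical helper: returns the character right after the second dot, if there is one.
    static Optional<Character> charAfterSecondDot(String s) {
        // A limit of 3 stops splitting after the second dot, so the third
        // element holds everything that follows it (possibly an empty string).
        String[] parts = s.split("\\.", 3);
        if (parts.length < 3 || parts[2].isEmpty()) {
            return Optional.empty(); // fewer than two dots, or nothing after the second dot
        }
        return Optional.of(parts[2].charAt(0));
    }
}
```

A caller can then decide what to do in the empty case, for example `SecondDot.charAfterSecondDot(str1).ifPresent(System.out::println);`.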

You can use Java Stream API since Java 8:
String string = "test.1231.asdasd.cccc.2.a.2";
Arrays.stream(string.split("\\.")) // Split by dot
.skip(2).limit(1) // Skip 2 initial parts and limit to one
.map(i -> i.substring(0, 1)) // Map to the first character
.findFirst().ifPresent(System.out::println); // Get first and print if exists
However, I recommend sticking with a regex, which is the safer approach.
Here is the regex you need (demo available at Regex101):
.*?\..*?\.(.).*
In a Java string literal, don't forget to write each backslash of the regex as a double backslash (\\).
String[] array = new String[3];
array[0] = "test.1231.asdasd.cccc.2.a.2";
array[1] = "aaa.1.22224.sadsada";
array[2] = "test";
Pattern p = Pattern.compile(".*?\\..*?\\.(.).*");
for (int i=0; i<array.length; i++) {
Matcher m = p.matcher(array[i]);
if (m.find()) {
System.out.println(m.group(1));
}
}
This code prints a and 2, each on its own line, and prints nothing for the 3rd String because it does not match.

A plain solution using String.indexOf:
public static Character getCharAfterSecondDot(String s) {
int indexOfFirstDot = s.indexOf('.');
if (!isValidIndex(indexOfFirstDot, s)) {
return null;
}
int indexOfSecondDot = s.indexOf('.', indexOfFirstDot + 1);
return isValidIndex(indexOfSecondDot, s) ?
s.charAt(indexOfSecondDot + 1) :
null;
}
protected static boolean isValidIndex(int index, String s) {
return index != -1 && index < s.length() - 1;
}
Using indexOf(int ch) and indexOf(int ch, int fromIndex) examines each character at most once, even in the worst case.
And a second version implementing the same logic using indexOf with Optional:
public static Character getCharAfterSecondDot(String s) {
return Optional.of(s.indexOf('.'))
.filter(i -> isValidIndex(i, s))
.map(i -> s.indexOf('.', i + 1))
.filter(i -> isValidIndex(i, s))
.map(i -> s.charAt(i + 1))
.orElse(null);
}

Just another approach, not a one-liner code but simple.
public class Test{
public static void main (String[] args){
for(String str:new String[]{"test.1231.asdasd.cccc.2.a.2","aaa.1.22224.sadsada"}){
int n = 0;
for(char c : str.toCharArray()){
if(2 == n){
System.out.printf("found char: %c%n",c);
break;
}
if('.' == c){
n ++;
}
}
}
}
}
found char: a
found char: 2

Extract word from a line between a specific position and the next delimiter Java

I have a text file that contains many lines; every line contains many words separated by a delimiter, like "hello,world,I,am,here".
I want to extract the word between a given position and the next delimiter. For example:
if the position is 7 the string is "world", and if the position is 1 the string is "hello".

I would recommend using the split() method. With commas delimiting the words you would do this:
String[] words = "hello,world,I,am,here".split(",");
Then you can get the words by position by indexing into the array:
words[3] // would yield "am"
Note that the parameter to split() is a regular expression, so if you aren't familiar with them, see the java.util.regex.Pattern documentation (or search for a tutorial).

Just use the following code, which takes advantage of the split() method available on all String objects:
String line = "hello,world,I,am,here";
String[] words = line.split(",");

public static String wordAtPosition(String line, int position) {
String[] words = line.split(",");
int index = 0;
for (String word : words) {
index += word.length() + 1; // + 1 for the comma that split() removed
if (position < index) {
return word;
}
}
return null;
}
Example
String line = "hello,world,I,am,here";
String word = wordAtPosition(line, 7);
System.out.println(word); // prints "world"

First get the substring, then split it and take the first element of the array.
public class Test {
public static void main(String[] args) {
Test test = new Test();
String t = test.getStringFromLocation("hello,world,I,am,here", 1, ",");
System.out.println(t);
t = test.getStringFromLocation("hello,world,I,am,here", 7, ",");
System.out.println(t);
t = test.getStringFromLocation("hello,world,I,am,here", 6, ",");
System.out.println(t);
}
public String getStringFromLocation(final String input, int position,
String delimiter) {
if (position == 0) {
return null;
}
// Positions are 1-based; convert to a 0-based index
int absolutePosition = position - 1;
String[] value = input.substring(absolutePosition).split(delimiter);
return value.length > 0 ? value[0] : null;
}
}

Not the most readable solution, but it covers corner cases. The split solutions are nice but do not reflect the position in the original string, since they leave the ',' characters out of the count.
String line = "hello,world,I,am,here";
int position = new Random().nextInt(line.length());
int startOfWord = -1;
int currentComa = line.indexOf(",", 0);
while (currentComa >= 0 && currentComa < position) {
startOfWord = currentComa;
currentComa = line.indexOf(",", currentComa + 1);
}
int endOfWord = line.indexOf(",", position);
if(endOfWord < 0) {
endOfWord = line.length();
}
String word = line.substring(startOfWord + 1, endOfWord);
System.out.println("position " + position + ", word " + word);

Pattern matching interview Q

I was recently in an interview and they asked me the following question:
Write a function to return true if a string matches a pattern, false
otherwise
Pattern: 1 character per item, (a-z), input: space delimited string
This was my solution for the first problem:
static boolean isMatch(String pattern, String input) {
char[] letters = pattern.toCharArray();
String[] split = input.split("\\s+");
if (letters.length != split.length) {
// early return - not possible to match if lengths aren't equal
return false;
}
Map<String, Character> map = new HashMap<>();
// aaaa test test test1 test1
boolean used[] = new boolean[26];
for (int i = 0; i < letters.length; i++) {
Character existing = map.get(split[i]);
if (existing == null) {
// put into map if not found yet
if (used[(int)(letters[i] - 'a')]) {
return false;
}
used[(int)(letters[i] - 'a')] = true;
map.put(split[i], letters[i]);
} else {
// doesn't match - return false
if (existing != letters[i]) {
return false;
}
}
}
return true;
}
public static void main(String[] argv) {
System.out.println(isMatch("aba", "blue green blue"));
System.out.println(isMatch("aba", "blue green green"));
}
The next part of the problem stumped me:
With no delimiters in the input, write the same function.
eg:
isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true
I wrote a bruteforce solution (generating all possible splits of the input string of size letters.length and checking in turn against isMatch) but the interviewer said it wasn't optimal.
I have no idea how to solve this part of the problem, is this even possible or am I missing something?
They were looking for something with a time complexity of O(M x N ^ C), where M is the length of the pattern and N is the length of the input, C is some constant.
Clarifications
I'm not looking for a regex solution, even if it works.
I'm not looking for the naive solution that generates all possible splits and checks them, even with optimization since that'll always be exponential time.

It is possible to optimize a backtracking solution. Instead of generating all splits first and then checking whether each one is valid, we can check validity on the fly. Assume that we have already split a prefix of length p of the input string and have matched the first i characters of the pattern. Now look at character i + 1 of the pattern.
If some string in the prefix already corresponds to letter i + 1, we only need to check that the substring starting at position p + 1 is equal to it. If it is, we proceed to i + 1 and p + the length of this string. Otherwise, we can kill this branch.
If there is no such string, we try all substrings that start at position p + 1 and end somewhere after it.
We can also use the following idea to reduce the number of branches: we can estimate the length of the part of the pattern that has not been processed yet (we know the length for letters that already stand for some string, and a trivial lower bound of 1 for every other letter). This lets us kill a branch if the remaining part of the input string is too short to match the rest of the pattern.
This solution still has exponential time complexity, but it can run much faster than generating all splits, because invalid solutions are thrown away much earlier and the number of reachable states can shrink significantly.
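The steps described above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the class and method names are made up, and it assumes (as the examples in the question suggest) that two different pattern letters may not map to the same string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BacktrackingMatcher {
    static boolean isMatch(String pattern, String input) {
        return match(pattern, 0, input, 0, new HashMap<>(), new HashSet<>());
    }

    // i = number of pattern letters already matched, p = length of the input prefix already consumed
    private static boolean match(String pattern, int i, String input, int p,
                                 Map<Character, String> assigned, Set<String> used) {
        if (i == pattern.length()) {
            return p == input.length(); // both must be exhausted together
        }
        // Pruning: known letters need their full length, unknown letters at least 1 character
        int needed = 0;
        for (int k = i; k < pattern.length(); k++) {
            String s = assigned.get(pattern.charAt(k));
            needed += (s == null) ? 1 : s.length();
        }
        if (needed > input.length() - p) {
            return false;
        }
        char letter = pattern.charAt(i);
        String known = assigned.get(letter);
        if (known != null) {
            // The letter already stands for a string: it must occur right here
            return input.startsWith(known, p)
                    && match(pattern, i + 1, input, p + known.length(), assigned, used);
        }
        // New letter: try every substring that starts at p
        for (int end = p + 1; end <= input.length(); end++) {
            String candidate = input.substring(p, end);
            if (used.contains(candidate)) {
                continue; // assumption: distinct letters map to distinct strings
            }
            assigned.put(letter, candidate);
            used.add(candidate);
            if (match(pattern, i + 1, input, end, assigned, used)) {
                return true;
            }
            assigned.remove(letter); // undo and try a longer candidate
            used.remove(candidate);
        }
        return false;
    }
}
```

The worst case remains exponential, as noted above; the sketch only shows where the early checks cut branches.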

I feel like this is cheating, and I'm not convinced the capture group and reluctant quantifier will do the right thing. Or maybe they're looking to see if you can recognize that, because of how quantifiers work, matching is ambiguous.
boolean matches(String s, String pattern) {
StringBuilder patternBuilder = new StringBuilder();
Map<Character, Integer> backreferences = new HashMap<>();
int nextBackreference = 1;
for (int i = 0; i < pattern.length(); i++) {
char c = pattern.charAt(i);
if (!backreferences.containsKey(c)) {
backreferences.put(c, nextBackreference++);
patternBuilder.append("(.*?)");
} else {
patternBuilder.append('\\').append(backreferences.get(c));
}
}
return s.matches(patternBuilder.toString());
}

You could improve on brute force by first assuming token lengths and checking that the sum of the token lengths equals the length of the test string. That would be quicker than pattern matching each time. It is still very slow as the number of unique tokens increases, however.
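A minimal sketch of this length-first idea is shown below. The names are hypothetical, and the sketch assumes that every token has at least one character and that distinct pattern letters must map to distinct strings. It enumerates one length per distinct letter and cuts the input only when the lengths add up exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LengthFirstMatcher {
    static boolean isMatch(String pattern, String input) {
        // Count how often each distinct letter occurs, in order of first appearance
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : pattern.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        List<Character> letters = new ArrayList<>(counts.keySet());
        if (letters.isEmpty()) {
            return input.isEmpty();
        }
        return assign(pattern, input, letters, counts, new int[letters.size()], 0, input.length());
    }

    // Choose a length >= 1 for letters.get(idx); remaining = input characters not yet accounted for
    private static boolean assign(String pattern, String input, List<Character> letters,
                                  Map<Character, Integer> counts, int[] lengths,
                                  int idx, int remaining) {
        if (idx == letters.size()) {
            // Only cut the string when the lengths sum to exactly the input length
            return remaining == 0 && verify(pattern, input, letters, lengths);
        }
        int count = counts.get(letters.get(idx));
        for (int len = 1; len * count <= remaining; len++) {
            lengths[idx] = len;
            if (assign(pattern, input, letters, counts, lengths, idx + 1, remaining - len * count)) {
                return true;
            }
        }
        return false;
    }

    // Cut the input using the chosen lengths; check consistency and distinctness of the tokens
    private static boolean verify(String pattern, String input, List<Character> letters, int[] lengths) {
        String[] tokens = new String[letters.size()];
        int pos = 0;
        for (char c : pattern.toCharArray()) {
            int idx = letters.indexOf(c);
            String piece = input.substring(pos, pos + lengths[idx]);
            if (tokens[idx] == null) {
                tokens[idx] = piece;
            } else if (!tokens[idx].equals(piece)) {
                return false;
            }
            pos += lengths[idx];
        }
        return new HashSet<>(Arrays.asList(tokens)).size() == tokens.length;
    }
}
```

The number of length combinations still grows quickly with the number of distinct letters, which matches the caveat above.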

UPDATE:
Here is my solution. Based it off of the explanation I made before.
import com.google.common.collect.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.math3.util.Combinations;
import java.util.*;
/**
* Created by carlos on 2/14/15.
*/
public class PatternMatcher {
public static boolean isMatch(char[] pattern, String searchString){
return isMatch(pattern, searchString, new TreeMap<Integer, Pair<Integer, Integer>>(), Sets.newHashSet());
}
private static boolean isMatch(char[] pattern, String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution, Set<String> mappedStrings) {
List<Integer> occurrencesOfCharacterInPattern = getNextUnmappedPatternOccurrences(candidateSolution, pattern);
if(occurrencesOfCharacterInPattern.size() == 0)
return isValidSolution(candidateSolution, searchString, pattern, mappedStrings);
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = sectionsOfUnmappedStrings(searchString, candidateSolution);
if(sectionsOfUnmappedStrings.size() == 0)
return false;
String firstUnmappedString = substring(searchString, sectionsOfUnmappedStrings.get(0));
for (int substringSize = 1; substringSize <= firstUnmappedString.length(); substringSize++) {
String candidateSubstring = firstUnmappedString.substring(0, substringSize);
if(mappedStrings.contains(candidateSubstring))
continue;
List<Pair<Integer, Integer>> listOfAllOccurrencesOfSubstringInString = Lists.newArrayList();
for (int currentIndex = 0; currentIndex < sectionsOfUnmappedStrings.size(); currentIndex++) {
Pair<Integer,Integer> currentUnmappedSection = sectionsOfUnmappedStrings.get(currentIndex);
List<Pair<Integer, Integer>> occurrencesOfSubstringInString =
findAllInstancesOfSubstringInString(searchString, candidateSubstring,
currentUnmappedSection);
for(Pair<Integer,Integer> possibleAddition:occurrencesOfSubstringInString) {
listOfAllOccurrencesOfSubstringInString.add(possibleAddition);
}
}
if(listOfAllOccurrencesOfSubstringInString.size() < occurrencesOfCharacterInPattern.size())
return false;
Iterator<int []> possibleSolutionIterator =
new Combinations(listOfAllOccurrencesOfSubstringInString.size(),
occurrencesOfCharacterInPattern.size()).iterator();
iteratorLoop:
while(possibleSolutionIterator.hasNext()) {
Set<String> newMappedSets = Sets.newHashSet(mappedStrings);
newMappedSets.add(candidateSubstring);
TreeMap<Integer,Pair<Integer,Integer>> newCandidateSolution = Maps.newTreeMap();
// why doesn't Maps.newTreeMap(candidateSolution) work?
newCandidateSolution.putAll(candidateSolution);
int [] possibleSolutionIndexSet = possibleSolutionIterator.next();
for(int i = 0; i < possibleSolutionIndexSet.length; i++) {
Pair<Integer, Integer> candidatePair = listOfAllOccurrencesOfSubstringInString.get(possibleSolutionIndexSet[i]);
//if(candidateSolution.containsValue(Pair.of(0,1)) && candidateSolution.containsValue(Pair.of(9,10)) && candidateSolution.containsValue(Pair.of(18,19)) && listOfAllOccurrencesOfSubstringInString.size() == 3 && candidateSolution.size() == 3 && possibleSolutionIndexSet[0]==0 && possibleSolutionIndexSet[1] == 2){
if (makesSenseToInsert(newCandidateSolution, occurrencesOfCharacterInPattern.get(i), candidatePair))
newCandidateSolution.put(occurrencesOfCharacterInPattern.get(i), candidatePair);
else
break iteratorLoop;
}
if (isMatch(pattern, searchString, newCandidateSolution,newMappedSets))
return true;
}
}
return false;
}
private static boolean makesSenseToInsert(TreeMap<Integer, Pair<Integer, Integer>> newCandidateSolution, Integer startIndex, Pair<Integer, Integer> candidatePair) {
if(newCandidateSolution.size() == 0)
return true;
if(newCandidateSolution.floorEntry(startIndex).getValue().getRight() > candidatePair.getLeft())
return false;
Map.Entry<Integer, Pair<Integer, Integer>> ceilingEntry = newCandidateSolution.ceilingEntry(startIndex);
if(ceilingEntry !=null)
if(ceilingEntry.getValue().getLeft() < candidatePair.getRight())
return false;
return true;
}
private static boolean isValidSolution( Map<Integer, Pair<Integer, Integer>> candidateSolution,String searchString, char [] pattern, Set<String> mappedStrings){
List<Pair<Integer,Integer>> values = Lists.newArrayList(candidateSolution.values());
return areIntegersConsecutive(Lists.newArrayList(candidateSolution.keySet())) &&
arePairsConsecutive(values) &&
values.get(values.size() - 1).getRight() == searchString.length() &&
patternsAreUnique(pattern,mappedStrings);
}
private static boolean patternsAreUnique(char[] pattern, Set<String> mappedStrings) {
Set<Character> uniquePatterns = Sets.newHashSet();
for(Character character:pattern)
uniquePatterns.add(character);
return uniquePatterns.size() == mappedStrings.size();
}
private static List<Integer> getNextUnmappedPatternOccurrences(Map<Integer, Pair<Integer, Integer>> candidateSolution, char[] searchArray){
List<Integer> allMappedIndexes = Lists.newLinkedList(candidateSolution.keySet());
if(allMappedIndexes.size() == 0){
return occurrencesOfCharacterInArray(searchArray,searchArray[0]);
}
if(allMappedIndexes.size() == searchArray.length){
return Lists.newArrayList();
}
for(int i = 0; i < allMappedIndexes.size()-1; i++){
if(!areIntegersConsecutive(allMappedIndexes.get(i),allMappedIndexes.get(i+1))){
return occurrencesOfCharacterInArray(searchArray,searchArray[i+1]);
}
}
List<Integer> listOfNextUnmappedPattern = Lists.newArrayList();
listOfNextUnmappedPattern.add(allMappedIndexes.size());
return listOfNextUnmappedPattern;
}
private static String substring(String string, Pair<Integer,Integer> bounds){
try{
string.substring(bounds.getLeft(),bounds.getRight());
}catch (StringIndexOutOfBoundsException e){
System.out.println();
}
return string.substring(bounds.getLeft(),bounds.getRight());
}
private static List<Pair<Integer, Integer>> sectionsOfUnmappedStrings(String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution) {
if(candidateSolution.size() == 0) {
return Lists.newArrayList(Pair.of(0, searchString.length()));
}
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = Lists.newArrayList();
List<Pair<Integer,Integer>> allMappedPairs = Lists.newLinkedList(candidateSolution.values());
// Dont have to worry about the first index being mapped because of the way the first candidate solution is made
for(int i = 0; i < allMappedPairs.size() - 1; i++){
if(!arePairsConsecutive(allMappedPairs.get(i), allMappedPairs.get(i + 1))){
Pair<Integer,Integer> candidatePair = Pair.of(allMappedPairs.get(i).getRight(), allMappedPairs.get(i + 1).getLeft());
sectionsOfUnmappedStrings.add(candidatePair);
}
}
Pair<Integer,Integer> lastMappedPair = allMappedPairs.get(allMappedPairs.size() - 1);
if(lastMappedPair.getRight() != searchString.length()){
sectionsOfUnmappedStrings.add(Pair.of(lastMappedPair.getRight(),searchString.length()));
}
return sectionsOfUnmappedStrings;
}
public static boolean areIntegersConsecutive(List<Integer> integers){
for(int i = 0; i < integers.size() - 1; i++)
if(!areIntegersConsecutive(integers.get(i),integers.get(i+1)))
return false;
return true;
}
public static boolean areIntegersConsecutive(int left, int right){
return left == (right - 1);
}
public static boolean arePairsConsecutive(List<Pair<Integer,Integer>> pairs){
for(int i = 0; i < pairs.size() - 1; i++)
if(!arePairsConsecutive(pairs.get(i), pairs.get(i + 1)))
return false;
return true;
}
public static boolean arePairsConsecutive(Pair<Integer, Integer> left, Pair<Integer, Integer> right){
return left.getRight() == right.getLeft();
}
public static List<Integer> occurrencesOfCharacterInArray(char[] searchArray, char searchCharacter){
assert(searchArray.length>0);
List<Integer> occurrences = Lists.newLinkedList();
for(int i = 0;i<searchArray.length;i++){
if(searchArray[i] == searchCharacter)
occurrences.add(i);
}
return occurrences;
}
public static List<Pair<Integer,Integer>> findAllInstancesOfSubstringInString(String searchString, String substring, Pair<Integer,Integer> bounds){
String string = substring(searchString,bounds);
assert(StringUtils.isNoneBlank(substring,string));
int lastIndex = 0;
List<Pair<Integer,Integer>> listOfOccurrences = Lists.newLinkedList();
while(lastIndex != -1){
lastIndex = string.indexOf(substring,lastIndex);
if(lastIndex != -1){
int newIndex = lastIndex + substring.length();
listOfOccurrences.add(Pair.of(lastIndex + bounds.getLeft(), newIndex + bounds.getLeft()));
lastIndex = newIndex;
}
}
return listOfOccurrences;
}
}
It works with the cases provided, but is not thoroughly tested. Let me know if there are any mistakes.
ORIGINAL RESPONSE:
Assuming the string you are searching can have tokens of arbitrary length (which some of your examples do), then:
You want to start trying to break your string into parts that match the pattern, looking for contradictions along the way to cut down your search tree.
When you start processing, you select the first N characters of the string. Now check whether you can find that substring in the rest of the string. If you can't, then it can't possibly be a solution. If you can, then your string looks something like this:
(N characters)<...>[(N characters)<...>] where each <...> contains 0+ characters and the <...> parts aren't necessarily the same substring. What's inside [] can repeat a number of times equal to the number of times (N characters) appears in the string.
Now you have the first letter of your pattern matched. You're not sure whether the rest of the pattern matches, but you can basically reuse this algorithm (with modifications) to interrogate the <...> parts of the string.
You would do this for N = 1,2,3,4...
Make sense?
I'll work through an example (which doesn't cover all cases, but hopefully illustrates the idea). Note: when I'm referring to substrings of the pattern I'll use single quotes, and when I'm referring to substrings of the string I'll use double quotes.
isMatch("ababac", "bluegreenbluegreenbluewhite")
OK, 'a' is my first pattern.
For N = 1 I get the string "b".
Where is "b" in the search string?
bluegreenbluegreenbluewhite.
OK, so at this point the string MIGHT match, with "b" standing for the pattern 'a'. Let's see if we can do the same with the pattern 'b'. Logically, 'b' MUST be the entire string "luegreen" (because it's squeezed between two consecutive 'a' patterns). Then I check between the 2nd and 3rd 'a'. Yup, it's "luegreen".
OK, so far I've matched all but the 'c' of my pattern. Easy case: it's the rest of the string. It matches.
This is basically writing a Perl regex parser: ababc = (.+)(.+)(\1)(\2)(.+). So you just have to convert the pattern to a Perl regex.

Here's a sample snippet of my code:
public static final boolean isMatch(String patternStr, String input) {
// Initial Check (If all the characters in the pattern string are unique, degenerate case -> immediately return true)
char[] patt = patternStr.toCharArray();
Arrays.sort(patt);
boolean uniqueCase = true;
for (int i = 1; i < patt.length; i++) {
if (patt[i] == patt[i - 1]) {
uniqueCase = false;
break;
}
}
if (uniqueCase) {
return true;
}
String t1 = patternStr;
String t2 = input;
if (patternStr.length() == 0 && input.length() == 0) {
return true;
} else if (patternStr.length() != 0 && input.length() == 0) {
return false;
} else if (patternStr.length() == 0 && input.length() != 0) {
return false;
}
int count = 0;
StringBuffer sb = new StringBuffer();
char[] chars = input.toCharArray();
String match = "";
// first read for the first character pattern
for (int i = 0; i < chars.length; i++) {
sb.append(chars[i]);
count++;
if (!input.substring(count, input.length()).contains(sb.toString())) {
match = sb.delete(sb.length() - 1, sb.length()).toString();
break;
}
}
if (match.length() == 0) {
match = t2;
}
// based on that character, update patternStr and input string
t1 = t1.replace(String.valueOf(t1.charAt(0)), "");
t2 = t2.replace(match, "");
return isMatch(t1, t2);
}
I basically decided to first parse the pattern string and determine whether it contains any repeated characters. For example, in "aab" the letter "a" is used twice, so "a" cannot map to something else the second time. Otherwise, if there are no repeated characters, as in "abc", it doesn't matter what the input string is, since every pattern character is unique and it doesn't matter what each one maps to (degenerate case).
If there are repeated characters in the pattern string, I begin to check what each string matches. Unfortunately, without a delimiter I don't know how long each string is. Instead, I parse one character at a time, adding characters to the buffer letter by letter and checking whether the rest of the input still contains the buffer's content, until the buffer string can no longer be found. Once the string is determined (it's now in the buffer), I simply delete all the matched strings from the input string and the corresponding character from the pattern string, then recurse.
Apologies if my explanation wasn't very clear; I hope the code is clear, though.

Concatenate two strings without intersection

I need to concatenate two strings into another one without their intersection (in terms of last/first words).
For example:
"Some little d" + "little dogs are so pretty" = "Some little dogs are so pretty"
"I love you" + "love" = "I love youlove"
What is the most efficient way to do this in Java?

Here we go - if the first doesn't even contain the first letter of the second string, just return the concatenation. Otherwise, go from longest to shortest on the second string, seeing if the first ends with it. If so, return the non-overlapping parts, otherwise try one letter shorter.
public static String docat(String f, String s) {
if (!f.contains(s.substring(0,1)))
return f + s;
int idx = s.length();
try {
while (!f.endsWith(s.substring(0, idx--))) ;
} catch (Exception e) { }
return f + s.substring(idx + 1);
}
docat("Some little d", "little dogs are so pretty");
-> "Some little dogs are so pretty"
docat("Hello World", "World")
-> "Hello World"
docat("Hello", "World")
-> "HelloWorld"
EDIT: In response to the comment, here is a method using arrays. I don't know how to stress test these properly, but none of them took over 1ms in my testing.
public static String docat2(String first, String second) {
char[] f = first.toCharArray();
char[] s = second.toCharArray();
if (s.length == 0 || !first.contains("" + s[0]))
return first + second;
// Find the first index in f whose suffix is a prefix of s;
// idx == f.length means there is no overlap.
int idx = 0;
while (idx < f.length && !matches(f, s, idx)) idx++;
return first.substring(0, idx) + second;
}
private static boolean matches(char[] f, char[] s, int idx) {
// The suffix of f starting at idx must fit inside s
if (f.length - idx > s.length)
return false;
for (int i = idx; i < f.length; i++) {
if (f[i] != s[i - idx])
return false;
}
return true;
}

Easiest: iterate over the first string taking suffixes ("Some little d", "ome little d", "me little d"...) and test the second string with .startsWith. When you find a match, concatenate the prefix of the first string with the second string.
Here's the code:
String overlappingConcat(String a, String b) {
int i;
int l = a.length();
for (i = 0; i < l; i++) {
if (b.startsWith(a.substring(i))) {
return a.substring(0, i) + b;
}
}
return a + b;
}
The biggest efficiency problem here is the creation of new strings at substring. Implementing a custom stringMatchFrom(a, b, aOffset) should improve it, and is trivial.
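The custom helper mentioned above might look like the sketch below. The name `stringMatchFrom(a, b, aOffset)` comes from the answer, but its body and the wrapper class `OverlapNoSubstring` are assumptions made for illustration.

```java
class OverlapNoSubstring {
    // True if the suffix of a starting at aOffset is a prefix of b.
    // Compares characters in place instead of allocating a.substring(aOffset).
    static boolean stringMatchFrom(String a, String b, int aOffset) {
        int len = a.length() - aOffset;
        if (len > b.length()) {
            return false; // the suffix is longer than b, so it cannot be a prefix of b
        }
        for (int i = 0; i < len; i++) {
            if (a.charAt(aOffset + i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    static String overlappingConcat(String a, String b) {
        for (int i = 0; i < a.length(); i++) {
            if (stringMatchFrom(a, b, i)) {
                return a.substring(0, i) + b; // only one substring, for the final result
            }
        }
        return a + b;
    }
}
```

The loop still runs in quadratic time in the worst case; the helper only avoids the temporary strings.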

You can avoid creating unnecessary substrings with the regionMatches() method.
public static String intersecting_concatenate(String a, String b) {
    // Concatenate two strings, but if there is overlap at the intersection,
    // include the intersection/overlap only once.
    // find length of maximum possible match
    int len_a = a.length();
    int len_b = b.length();
    int max_match = (len_a > len_b) ? len_b : len_a;
    // search down from maximum match size, to get longest possible intersection
    for (int size=max_match; size>0; size--) {
        if (a.regionMatches(len_a - size, b, 0, size)) {
            return a + b.substring(size, len_b);
        }
    }
    // Didn't find any intersection. Fall back to straight concatenation.
    return a + b;
}

isBlank(CharSequence), join(T...) and left(String, int) are static methods of StringUtils in Apache Commons Lang.
public static String joinOverlap(String s1, String s2) {
    if(isBlank(s1) || isBlank(s2)) { //null, empty or whitespace-only input -> normal join
        return join(s1, s2);
    }
    int start = Math.max(0, s1.length() - s2.length());
    for(int i = start; i < s1.length(); i++) { //this loop is for start point
        for(int j = i; s1.charAt(j) == s2.charAt(j-i); j++) { //iterate until mismatch
            if(j == s1.length() - 1) { //was it s1's last char?
                return join(left(s1, i), s2);
            }
        }
    }
    return join(s1, s2); //no overlapping; do normal join
}

Create a suffix tree of the first String, then traverse the tree from the root taking characters from the beginning of the second String and keeping track of the longest suffix found.
This should be the longest suffix of the first String that is a prefix of the second String. Remove the suffix, then append the second String.
This should all be possible in linear time instead of the quadratic time required to loop through and compare all suffixes.
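A full suffix tree is too long to show here. As a different and simpler technique that also runs in linear time, the sketch below uses the Knuth-Morris-Pratt failure function instead of a suffix tree: it scans the first String once while tracking the longest prefix of the second String that matches so far. The class and method names are made up for illustration.

```java
final class LinearOverlapConcat {
    // Returns a + b with the longest suffix of a that is also a prefix of b included only once.
    // Uses the KMP failure function of b (not a suffix tree); runs in O(a.length() + b.length()).
    static String concat(String a, String b) {
        if (b.isEmpty()) {
            return a;
        }
        int m = b.length();
        // fail[i] = length of the longest proper prefix of b[0..i] that is also a suffix of it
        int[] fail = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && b.charAt(i) != b.charAt(k)) {
                k = fail[k - 1];
            }
            if (b.charAt(i) == b.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        // Scan a; k is the length of the longest prefix of b that ends at the current position of a
        int k = 0;
        for (int i = 0; i < a.length(); i++) {
            if (k == m) {
                k = fail[k - 1]; // all of b matched earlier; fall back before extending
            }
            while (k > 0 && a.charAt(i) != b.charAt(k)) {
                k = fail[k - 1];
            }
            if (a.charAt(i) == b.charAt(k)) {
                k++;
            }
        }
        // After the scan, k is the overlap length, so skip that many characters of b
        return a + b.substring(k);
    }
}
```

For example, LinearOverlapConcat.concat("Some little d", "little dogs are so pretty") gives "Some little dogs are so pretty".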

The following code works for the first example. I did not test it extensively, but you get the point. It searches for all occurrences of the first char of secondString in firstString, since these are the only places where an overlap can start. For each one it checks whether the rest of firstString is the start of secondString. If no overlap is found, it falls back to plain concatenation. It is meant as an illustration of my answer rather than finished code.
String firstString = "Some little d";
String secondString = "little dogs are so pretty";
String startChar = secondString.substring( 0, 1 );
// An overlap cannot be longer than secondString, so start searching that far from the end
int index = Math.max( 0, firstString.length() - secondString.length() );
int searchedIndex = -1;
while ( searchedIndex == -1 && ( index = firstString.indexOf( startChar, index ) ) != -1 ){
    if ( secondString.startsWith( firstString.substring( index ) ) ){
        searchedIndex = index;
    }
    index++; // move past this candidate, otherwise a mismatch would loop forever
}
// No overlap found: plain concatenation
String result = ( searchedIndex == -1 )
        ? firstString + secondString
        : firstString.substring( 0, searchedIndex ) + secondString;

Java String parsing - {k1=v1,k2=v2,...}

I have the following string which will probably contain ~100 entries:
String foo = "{k1=v1,k2=v2,...}";
and am looking to write the following function:
String getValue(String key){
// return the value associated with this key
}
I would like to do this without using any parsing library. Any ideas for something speedy?

If you know your string will always look like this, try something like:
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

Map<String, String> map = new HashMap<>();
public void parse(String foo) {
    String foo2 = foo.substring(1, foo.length() - 1); // hack off braces
    StringTokenizer st = new StringTokenizer(foo2, ",");
    while (st.hasMoreTokens()) {
        String thisToken = st.nextToken();
        StringTokenizer st2 = new StringTokenizer(thisToken, "=");
        map.put(st2.nextToken(), st2.nextToken());
    }
}
String getValue(String key) {
    return map.get(key); // null if the key is absent
}
Warning: I didn't actually try this; there might be minor syntax errors but the logic should be sound. Note that I also did exactly zero error checking, so you might want to make what I did more robust.

The speediest, but ugliest answer I can think of is parsing it character by character using a state machine. It's very fast, but very specific and quite complex. The way I see it, you could have several states:
Parsing Key
Parsing Value
Ready
Example:
int length = foo.length();
int state = READY;
for (int i=0; i<length; ++i) {
    switch (state) {
        case READY:
            //Skip commas and brackets
            //Transition to the KEY state if you find a letter
            break;
        case KEY:
            //Read until you hit a = then transition to the value state
            //append each letter to a StringBuilder and track the name
            //Store the name when you transition to the value state
            break;
        case VALUE:
            //Read until you hit a , then transition to the ready state
            //Remember to save the built-key and built-value somewhere
            break;
    }
}
Alternatively, you can get this written much more quickly with StringTokenizer (which is fast) or regular expressions (which are slower). But overall, parsing character by character is most likely the fastest approach at run time.
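One possible way to fill in the skeleton above is sketched here. It assumes that keys and values contain no '=', ',' or braces, that the input ends with '}', and that whitespace is kept verbatim rather than trimmed. The class name StateMachineParser, the enum and the parse method are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StateMachineParser {
    private enum State { READY, KEY, VALUE }

    // Parses "{k1=v1,k2=v2}" in a single pass over the characters.
    // Assumption: no '=', ',' or braces inside keys or values; whitespace is not trimmed.
    static Map<String, String> parse(String foo) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        State state = State.READY;
        for (int i = 0; i < foo.length(); i++) {
            char c = foo.charAt(i);
            switch (state) {
                case READY:
                    // Skip commas and braces; any other character starts a key
                    if (c != ',' && c != '{' && c != '}') {
                        key.append(c);
                        state = State.KEY;
                    }
                    break;
                case KEY:
                    // Collect the key until '=' is reached
                    if (c == '=') {
                        state = State.VALUE;
                    } else {
                        key.append(c);
                    }
                    break;
                case VALUE:
                    // Collect the value until ',' or '}' ends the entry, then store it
                    if (c == ',' || c == '}') {
                        result.put(key.toString(), value.toString());
                        key.setLength(0);
                        value.setLength(0);
                        state = State.READY;
                    } else {
                        value.append(c);
                    }
                    break;
            }
        }
        return result;
    }
}
```

With this in place, getValue(key) reduces to a lookup in the returned map, for example StateMachineParser.parse(foo).get("k1").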

If the string has many entries you might be better off parsing manually without a StringTokenizer to save some memory (in case you have to parse thousands of these strings, it's worth the extra code):
public static Map parse(String s) {
    HashMap map = new HashMap();
    s = s.substring(1, s.length() - 1).trim(); //get rid of the brackets
    int kpos = 0; //the starting position of the key
    int eqpos = s.indexOf('='); //the position of the key/value separator
    boolean more = eqpos > 0;
    while (more) {
        int cmpos = s.indexOf(',', eqpos + 1); //position of the entry separator
        String key = s.substring(kpos, eqpos).trim();
        if (cmpos > 0) {
            map.put(key, s.substring(eqpos + 1, cmpos).trim());
            eqpos = s.indexOf('=', cmpos + 1);
            more = eqpos > 0;
            if (more) {
                kpos = cmpos + 1;
            }
        } else {
            map.put(key, s.substring(eqpos + 1).trim());
            more = false;
        }
    }
    return map;
}
I tested this code with these strings and it works fine:
{k1=v1}
{k1=v1, k2 = v2, k3= v3,k4 =v4}
{k1= v1,}

Written without testing:
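String result = null;
int i = foo.indexOf(key+"=");
// Skip matches that are only the tail of a longer key (e.g. "k1=" inside "xk1=")
while (i > 0 && foo.charAt(i-1) != '{' && foo.charAt(i-1) != ',') {
    i = foo.indexOf(key+"=", i+1);
}
if (i > 0) {
    int j = foo.indexOf(',', i);
    if (j == -1) j = foo.length() - 1; // last entry: stop at the closing brace
    result = foo.substring(i+key.length()+1, j);
}
return result;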
Yes, it's ugly :-)

Well, assuming values contain neither '=' nor ',', the simplest (if shabby) method is:
int i = foo.indexOf(key+'=');
if (i == -1) return null; // key not present
int start = i + key.length() + 1;
int end = foo.indexOf(',', start);
if (end == -1) end = foo.indexOf('}', start); // last entry ends at the closing brace
return (start<end)?foo.substring(start,end):null;
Yeah, not recommended :)

Adding code to check for the existence of key in foo is left as an exercise for the reader :-)
String foo = "{k1=v1,k2=v2,...}";
String getValue(String key){
    int offset = foo.indexOf(key+'=') + key.length() + 1; // first character of the value
    int end = foo.indexOf(',', offset);
    if (end == -1) end = foo.indexOf('}', offset); // last entry ends at the closing brace
    return foo.substring(offset, end);
}

Please find my solution below. Strings.isNullOrEmpty comes from Guava, and both tokens are passed to String.split, so they are interpreted as regular expressions.
import java.util.HashMap;
import java.util.Map;
import com.google.common.base.Strings;

public class KeyValueParser {
    private final String line;
    private final String divToken;
    private final String eqToken;
    private Map<String, String> map = new HashMap<String, String>();

    // Example input: user_uid=224620; pass=e10adc3949ba59abbe56e057f20f883e;
    public KeyValueParser(String line, String divToken, String eqToken) {
        this.line = line;
        this.divToken = divToken;
        this.eqToken = eqToken;
        proccess();
    }

    public void proccess() {
        if (Strings.isNullOrEmpty(line) || Strings.isNullOrEmpty(divToken) || Strings.isNullOrEmpty(eqToken)) {
            return;
        }
        for (String div : line.split(divToken)) {
            if (Strings.isNullOrEmpty(div)) {
                continue;
            }
            String[] split = div.split(eqToken);
            if (split.length != 2) {
                continue; // skip entries that are not exactly key<eqToken>value
            }
            String key = split[0];
            String value = split[1];
            if (Strings.isNullOrEmpty(key)) {
                continue;
            }
            map.put(key.trim(), value.trim());
        }
    }

    public String getValue(String key) {
        return map.get(key);
    }
}
Usage
KeyValueParser line = new KeyValueParser("user_uid=224620; pass=e10adc3949ba59abbe56e057f20f883e;", ";", "=");
String userUID = line.getValue("user_uid");
