Find match with regex in arraylist - java

I'm trying to develop a function that reads an ArrayList of strings and determines whether there are at least two tuples that have the same values at a given set of indices but differ at a supplementary index. I've developed a version of this function using a RegEx comparison, as follows:
public boolean checkMatching(){
ArrayList<String> rows = new ArrayList<String>();
rows.add("7,2,2,1,1");
rows.add("7,3,2,1,1");
rows.add("7,8,1,1,1");
rows.add("8,2,1,3,1");
rows.add("8,2,1,4,1");
rows.add("8,4,5,1,1");
int[] indices = new int[] {2,3};
int supplementaryIndex = 1;
String regex = "";
for(String r : rows){
String[] rt = r.split(",");
regex = "[a-zA-Z0-9,-.]*[,][a-zA-Z0-9,-.]*[,][" + rt[indices[0]] + "][,][" + rt[indices[1]] + "][,][a-zA-Z0-9,-.]*";
for(String r2 : rows){
if(r.equals(r2) == false){
if(Pattern.matches(regex, r2)){
String[] rt2 = r2.split(",");
if(rt[supplementaryIndex].equals(rt2[supplementaryIndex]) == false){
return true;
}
}
}
}
}
return false;
}
However, it is very expensive, especially if there are many rows. I've thought of creating a more complex RegEx that considers multiple alternatives (with the '|' operator), as follows:
public boolean checkMatching(){
ArrayList<String> rows = new ArrayList<String>();
rows.add("7,2,2,1,1");
rows.add("7,3,2,1,1");
rows.add("7,8,1,1,1");
rows.add("8,2,1,3,1");
rows.add("8,2,1,4,1");
rows.add("8,4,5,1,1");
int[] indices = new int[] {2,3};
int supplementaryIndex = 1;
String regex = "";
for(String r : rows){
String[] rt = r.split(",");
regex += "[a-zA-Z0-9,-.]*[,][a-zA-Z0-9,-.]*[,][" + rt[indices[0]] + "][,][" + rt[indices[1]] + "][,][a-zA-Z0-9,-.]*";
regex += "|"; //or
}
for(String r2 : rows){
if(Pattern.matches(regex, r2)){
//String rt2 = r.split(",");
//if(rt[supplementaryIndex].equals(rt2[supplementaryIndex]) == false){
return true;
//}
}
}
return false;
}
But the problem is that this way I can't compare the supplementary index values. Do you have any suggestions on how to define a regex that can directly satisfy this condition? Or, is it possible to leverage java streams to do this efficiently?

The main problem with your first approach is that you have two nested loops over the same list, which gives you quadratic time complexity. As a reminder, this means that the inner loop's body gets executed 10,000 times for a list of 100 elements, 1,000,000 times for a list of 1,000 elements, and so on.
Calling Pattern.matches(regex, r2) in the inner loop's body doesn't help. That method exists only to support (as delegation target) the String operation r2.matches(regex), a convenience method that does Pattern.compile(regex).matcher(input).matches() in one go. If you have to apply the same regex multiple times, you should keep and reuse the result of Pattern.compile(regex).
But here, there is no point in using a regex at all. You have already decomposed the string using split and can access each component via a plain array access. Using this starting point to compose a regex to be applied on the string again, is complicated and expensive at the same time.
Just use something like
// return true when at least one string has the same values for indices
// but different value for supplementaryIndex
Map<List<String>,String> map = new HashMap<>();
for(String r : rows) {
String[] rt = r.split(",");
List<String> key = List.of(rt[indices[0]], rt[indices[1]]);
String old = map.putIfAbsent(key, rt[supplementaryIndex]);
if(old != null && !old.equals(rt[supplementaryIndex])) return true;
}
return false;
This loops over the list a single time, extracts the key elements from the array and composes a key for a HashMap. There are various ways to do this. But while it’s tempting to just concatenate these elements like rt[indices[0]] + "," + rt[indices[1]], which would work, using a List is preferable, as it avoids expensive string concatenation.
The code puts the value to check into the map which will return a previous value if this key has been encountered before. If so, the old and new values can be compared and the method can return immediately if they don’t match.
If you are using Java 8, you have to use Arrays.asList(rt[indices[0]], rt[indices[1]]) instead of List.of(rt[indices[0]], rt[indices[1]]).
This can be easily expanded to support variable lengths for indices, by changing
List<String> key = List.of(rt[indices[0]], rt[indices[1]]);
to
List<String> key = Arrays.stream(indices).mapToObj(i -> rt[i]).toList();
or, if you are using a Java version older than 16:
List<String> key
= Arrays.stream(indices).mapToObj(i -> rt[i]).collect(Collectors.toList());
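Putting the pieces together, a minimal self-contained sketch could look like the following. The class name RowConflictCheck and the choice to pass rows, indices and supplementaryIndex as parameters are assumptions made for illustration; Collectors.toList() is used so that it also compiles on Java 8.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RowConflictCheck {
    // Returns true if two rows agree on every column listed in `indices`
    // but differ in column `supplementaryIndex`.
    static boolean checkMatching(List<String> rows, int[] indices, int supplementaryIndex) {
        Map<List<String>, String> seen = new HashMap<>();
        for (String row : rows) {
            String[] rt = row.split(",");
            // Build the map key from the selected columns, in order.
            List<String> key = Arrays.stream(indices)
                    .mapToObj(i -> rt[i])
                    .collect(Collectors.toList());
            // putIfAbsent returns the value stored earlier for this key, if any.
            String old = seen.putIfAbsent(key, rt[supplementaryIndex]);
            if (old != null && !old.equals(rt[supplementaryIndex])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "7,2,2,1,1", "7,3,2,1,1", "7,8,1,1,1",
                "8,2,1,3,1", "8,2,1,4,1", "8,4,5,1,1");
        System.out.println(checkMatching(rows, new int[] {2, 3}, 1));
    }
}
```

With the sample rows from the question, the first two rows share the values at indices 2 and 3 but differ at index 1, so the method returns after the second row.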

Related

Find a complex element in a set of elements

I have a function that allows me to find a match between an incomplete element and at least one element in a set. An example of an incomplete element is 22.2.X.13, in which there is an item (defined with X) that could assume any value.
The goal of this function is to find at least one element in a set of elements that has 22 in the first position, 2 on the second, and 13 on the fourth.
For example, if we consider the set:
{
20.8.31.13,
32.3.29.13,
24.2.12.13,
19.2.37.13,
22.2.22.13,
27.17.22.13,
26.22.32.13,
22.3.22.13,
20.19.12.13,
17.4.37.13,
31.8.34.13
}
The function returns true, since the element 22.2.22.13 corresponds to 22.2.X.13.
My function compares each pair of elements like strings and each item of the elements as an integer:
public boolean containsElement(String element) {
StringTokenizer strow = null, st = null;
boolean check = true;
String nextrow = "", next = "";
for(String row : setOfElements) {
strow = new StringTokenizer(row, ".");
st = new StringTokenizer(element, ".");
check = true;
while(st.hasMoreTokens()) {
next = st.nextToken();
if(!strow.hasMoreTokens()) {
break;
}
nextrow = strow.nextToken();
if(next.compareTo("X") != 0) {
int x = Integer.parseInt(next);
int y = Integer.parseInt(nextrow);
if(x != y) {
check = false;
break;
}
}
}
if(check) return true;
}
return false;
}
However, it is an expensive operation, particularly if the size of the string increases. Can you suggest to me another strategy or data structure to quickly perform this operation?
My solution is closely related to strings. However, we can consider other types for elements (e.g. array, list, tree node, etc)
Thanks to all for your answers. I have tried almost all the functions; the benchmark results are:
myFunction: 0ms
hasMatch: 2ms
Stream API: 5ms
isIPMatch: 2ms
I think that the main problem of the regular expression is the time to create the pattern and match the strings.
You want to use a regex, which is made exactly for tasks like this. Check out the demo.
22\.2\.\d+\.13
Java 8 and higher
You can use Stream API as of Java 8 to find at least one matching the Regex using Pattern and Matcher classes:
Set<String> set = ... // the set of Strings (can be any collection)
Pattern pattern = Pattern.compile("22\\.2\\.\\d+\\.13"); // compiled Pattern
boolean matches = set.stream() // Stream<String>
.map(pattern::matcher) // Stream<Matcher>
.anyMatch(Matcher::matches); // true if at least one matches
Java 7 and lower
The approach is the same as with the Stream API: a short-circuiting for-each loop with a break statement once a match is found.
boolean matches = false;
Pattern pattern = Pattern.compile("22\\.2\\.\\d+\\.13");
for (String str: set) {
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
matches = true;
break;
}
}
You can solve this by approaching the problem in a regex-based manner, as suggested by Nikolas Charalambidis (+1), or you can do it differently. To avoid being redundant with another answer, I will focus on an alternative approach here, using the split method.
public boolean isIPMatch(String pattern[], String input[]) {
if ((pattern == null) || (input == null) || (pattern.length != input.length)) return false; //edge cases
for (int index = 0; index < pattern.length; index++) {
if ((!pattern[index].equals("X")) && (!pattern[index].equals(input[index]))) return false; //difference
}
return true; //everything matched
}
And you can call the method above in your loop, after converting the items to compare to String arrays via split.
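As a sketch of that calling loop: the helper name containsElement is borrowed from the question, while the class name IpMatchDemo and the Iterable parameter are assumptions. Note that split takes a regex, so the dot has to be escaped.

```java
import java.util.Arrays;
import java.util.List;

class IpMatchDemo {
    static boolean isIPMatch(String[] pattern, String[] input) {
        if (pattern == null || input == null || pattern.length != input.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            // "X" is the wildcard; every other item must be equal.
            if (!pattern[i].equals("X") && !pattern[i].equals(input[i])) return false;
        }
        return true;
    }

    // Splits the incomplete element once, then compares it against each row.
    static boolean containsElement(Iterable<String> elements, String element) {
        String[] pattern = element.split("\\.");
        for (String row : elements) {
            if (isIPMatch(pattern, row.split("\\."))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("20.8.31.13", "22.2.22.13", "22.3.22.13");
        System.out.println(containsElement(elements, "22.2.X.13"));
    }
}
```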
For strings, regular expressions solve the task a lot better:
private boolean hasMatch(String[] haystack, String partial) {
String patternString = partial.replace("X", "[0-9]+").replace(".", "\\.");
// "22.2.X.13" becomes "22\\.2\\.[0-9]+\\.13"
Pattern p = Pattern.compile(patternString);
for (String s : haystack) {
if (p.matcher(s).matches()) return true;
}
return false;
}
For other types of objects, it depends on their structure.
If there is some kind of order, you could consider making your elements implement Comparable - and then you can place them into a TreeSet (or as keys in a TreeMap), which will always be kept sorted. This way, you can compare only against the elements that can match: mySortedSet.subSet(fromElement, toElement) returns only the elements between those two.
If there is no order, you will simply have to compare all elements against your "pattern".
Note that strings are comparable, but their default sorting order ignores the special semantics of your .-separators. So, with some care you can implement a treeset-based approach to make the search better-than-linear.
Other answers have already discussed using a regular expression by converting e.g. 22.2.X.13 to 22\.2\.\d+\.13 (don't forget to also escape the . or they mean "anything"). But while this will definitely be simpler and probably also a good bit faster, it does not lower the overall complexity. You still have to check each element in the set.
Instead, you might try to convert your set of IPs to a nested Map in this form:
{20: {8: {31: {13: null}}, 19: {12: {13: null}}}, 22: {2: {...}, 3: {...}}, ...}
(Of course, you should create this structure just once, and not for each search query.)
You can then write a recursive function match that works roughly as follows (pseudocode):
boolean match(ip: String, map: Map<String, Map<...>>) {
if (ip.empty) return true // done
first, rest = ip.splitfirst
if (first == "X") {
return map.values().any(submap -> match(rest, submap))
} else {
return first in map && match(rest, map[first])
}
}
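With no X in the pattern, a lookup only follows a single path through the nested maps, so its cost depends on the number of segments rather than on n (O(log n) per level if sorted maps are used). Each X forces the search to branch into every submap at that level, so the cost grows with the amount of branching, up to O(n) for something like X.X.X.123 (X.X.X.X is trivial again). For small sets, a regular expression might still be faster, as it has less overhead, but for larger sets, this should be faster.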
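The pseudocode above could be turned into Java roughly as follows. This is only a sketch: the class name SegmentTrie and its methods are made up for illustration, it uses one HashMap per node instead of a literal Map of Maps, and it assumes that all elements have the same number of segments.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class SegmentTrie {
    // Maps one segment to the node that holds the remaining segments.
    private final Map<String, SegmentTrie> children = new HashMap<>();

    // Build the structure once, before running any queries.
    void add(String element) {
        SegmentTrie node = this;
        for (String part : element.split("\\.")) {
            node = node.children.computeIfAbsent(part, k -> new SegmentTrie());
        }
    }

    boolean matches(String pattern) {
        return matches(pattern.split("\\."), 0);
    }

    private boolean matches(String[] parts, int pos) {
        if (pos == parts.length) return true; // all segments consumed
        if (parts[pos].equals("X")) {
            // Wildcard: any child may continue the match.
            for (SegmentTrie child : children.values()) {
                if (child.matches(parts, pos + 1)) return true;
            }
            return false;
        }
        SegmentTrie child = children.get(parts[pos]);
        return child != null && child.matches(parts, pos + 1);
    }

    public static void main(String[] args) {
        SegmentTrie trie = new SegmentTrie();
        for (String s : Arrays.asList("20.8.31.13", "19.2.37.13", "22.2.22.13", "22.3.22.13")) {
            trie.add(s);
        }
        System.out.println(trie.matches("22.2.X.13"));
    }
}
```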

With 2 sets of strings, find a string that can be constructed from either set

Given the following sets of strings:
are yo
you u
how nhoware
alan arala
dear de
I need to find a sequence that can be constructed by concatenating the strings in either column, and it must use the same number of elements in both cases.
For example, "dearalanhowareyou" can be constructed from both sets of strings, using 5 elements each time.
An invalid choice would be "dearalanhoware", since it would use 4 elements from the left column but only 3 from the right.
The problem is taken from here:
https://open.kattis.com/problems/correspondence
I'm using this site to improve for future job interviews and I just can't seem to figure this one out at all.
My only working implementation is a brute force approach taking every possible combination of each set which is not a very good solution due to time complexity.
My code right now:
list1 = getPermutations("",send1);
list2 = getPermutations("",send2);
ArrayList<String> duplicateValues = new ArrayList<String>();
for (int i = 0; i < list1.size(); i++) {
if (list2.contains(list1.get(i))) {
duplicateValues.add(list1.get(i));
}
}
private static ArrayList<String> getPermutations(String currentResult, ArrayList<String> possibleChars) {
ArrayList<String> result = new ArrayList<>(possibleChars.size());
for (String append : possibleChars) {
String permutation = currentResult + append;
result.add(permutation);
if (possibleChars.size() > 0) {
ArrayList<String> possibleCharsUpdated = new ArrayList<>(possibleChars); // typed copy instead of a raw clone()
possibleCharsUpdated.remove(append);
result.addAll(getPermutations(permutation, possibleCharsUpdated));
}
}
return result;
}
You can significantly narrow down the number of permutations that you need to check by finding which words from each set could possibly begin the constructed String. In this case, the only two choices are dear and de, because de is a prefix of dear. Once you get the String started, you can take a substring of the longer String; in this case "dear".substring("de".length()) returns ar, which tells you that the next element from the right side needs to start with ar. So basically you have two cases:
String stringLeft = "", stringRight = "";
if(stringLeft.length() == stringRight.length())
{
//find two matching Strings here (one is substring of another)
String[] matching = getMatching(); //returns 1d array of size 2(if only two strings match)
stringLeft += matching[0];
stringRight += matching[1];
}
else
{
if(stringLeft.length() > stringRight.length())
{
String start = stringLeft.substring(stringRight.length());
//find string on right that starts with start
stringRight += getStringStartingWith(start);
}
else
{
String start = stringRight.substring(stringLeft.length());
//find string on left that starts with start
stringLeft += getStringStartingWith(start);
}
}
The only thing you need to look out for is if there are multiple matching Strings, or Strings starting with the substring you're looking for.

modifying algorithm to generate unique permutations in a string that contains duplicates

I'm aware of handling the issue with duplicates if I were to use a swap and permute method for generating permutations as shown here.
However, I'm using a different approach where I place current character between any two characters, at the beginning and at the end, of all of the permutations generated without the current character.
How can I modify my code below to give me only unique permutations of a string that contains duplicates?
import java.util.ArrayList;
public class Permutations {
public static void main(String[] args) {
String str = "baab";
System.out.println(fun(str, 0));
System.out.println("number of Permutations =="+fun(str, 0).size());
}
static ArrayList<String> fun(String str, int index)
{
if(index == str.length())
{
ArrayList<String> al = new ArrayList<String>();
al.add("");
return al;
}
/* get return from lower frame */
ArrayList<String> rec = fun(str, index+1);
/* get character here */
char c = str.charAt(index);
/* to each of the returned Strings in ArrayList, add str.charAt(j) */
ArrayList<String> ret = new ArrayList<String>();
for(int i = 0;i<rec.size();i++)
{
String here = rec.get(i);
ret.add(c + here);
for(int j = 0;j<here.length();j++)
ret.add(here.substring(0,j+1) + c + here.substring(j+1,here.length()));
}
return ret;
}
}
At the moment, a string such as "bab" generates the following output, which contains bab, abb and bba twice each.
[bab, abb, abb, bba, bba, bab]
number of Permutations ==6
PS : I do not want to use a hashmap/Set to keep track of my duplicates and see whether they were encountered previously.
When you're iterating through the string and adding the character at each position, if you find a character in the string that is the same as the one you are inserting, break after inserting the new character immediately before it. This means that strings with the same character more than once can only be formed one way (by inserting in reverse order) so duplicates can't happen.
for(int j = 0;j<here.length();j++)
{
if(here.charAt(j) == c)
break;
ret.add(here.substring(0,j+1) + c + here.substring(j+1,here.length()));
}
A general approach to solving these problems involving generating sets without duplicates is to think of a property that only one of each set of duplicates will have, and then enforce that as a constraint. For example in this case the constraint is "all duplicated characters are added in reverse order" (forward order would work just as well, but you'd have to flip the loop direction). For a combination problem where the order isn't important, the constraint could be "items in each list are in ascending order". And so on.
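For reference, here is a compact sketch of the question's method with the break applied. The class name UniquePermutations is made up; the method name fun and its parameters are taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

class UniquePermutations {
    static List<String> fun(String str, int index) {
        List<String> ret = new ArrayList<>();
        if (index == str.length()) {
            ret.add("");
            return ret;
        }
        char c = str.charAt(index);
        // Insert c into every permutation of the remaining characters.
        for (String here : fun(str, index + 1)) {
            ret.add(c + here);
            for (int j = 0; j < here.length(); j++) {
                // Never insert c after an equal character: that string is
                // already produced by the insertion directly before it.
                if (here.charAt(j) == c) break;
                ret.add(here.substring(0, j + 1) + c + here.substring(j + 1));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(fun("bab", 0));          // prints [bab, abb, bba]
        System.out.println(fun("baab", 0).size());  // prints 6
    }
}
```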

Search array for value containing all characters(in any order) and return value

I've searched high and low and finally have to ask.
I have an array containing, for example, ["123456","132457", "468591", ... ].
I have a string with a value of "46891".
How do I search through the array and find the object that contains all the characters from my string value? For example the object with "468591" contains all the digits from my string value even though it's not an exact match because there's an added "5" between the "8" and "9".
My initial thought was to split the string into its own array of numbers (i.e. ["4","6","8","9","1"] ), then to search through the array for objects containing the number, to create a new array from it, and to keep whittling it down until I have just one remaining.
Since this is likely a learning assignment, I'll give you an idea instead of an implementation.
Start by defining a function that takes two strings, and returns true if the first one contains all characters of the second in any order, and false otherwise. It should look like this:
boolean containsAllCharsInAnyOrder(String str, String chars) {
...
}
Inside the function set up a loop that picks characters ch from the chars string one by one, and then uses str.indexOf(ch) to see if the character is present in the string. If the index is non-negative, continue; otherwise, return false.
If the loop finishes without returning, you know that all characters from chars are present in str, so you can return true.
With this function in hand, set up another loop in your main function to go through elements of the array, and call containsAllCharsInAnyOrder on each one in turn.
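For readers who want to compare against their own attempt, one possible way to fill in the steps described above is sketched here. The class name CharSearch is an assumption, and repeated characters in chars are not counted separately.

```java
class CharSearch {
    // True if every character of `chars` occurs somewhere in `str`.
    static boolean containsAllCharsInAnyOrder(String str, String chars) {
        for (int i = 0; i < chars.length(); i++) {
            if (str.indexOf(chars.charAt(i)) < 0) {
                return false; // this character is missing
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] array = {"123456", "132457", "468591"};
        for (String element : array) {
            if (containsAllCharsInAnyOrder(element, "46891")) {
                System.out.println(element);
            }
        }
    }
}
```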
I think you can use sets for this.
List<String> result = new ArrayList<>();
Set<String> chars = new HashSet<>(Arrays.asList(str.split("")));
for(String string : stringList) {
Set<String> stringListChars = new HashSet<>(Arrays.asList(string.split("")));
// the candidate must contain every character of the search string
if(stringListChars.containsAll(chars)) {
result.add(string);
}
}
There is a caveat here: repeated characters are not taken into account, and you haven't specified how you want to handle them (for example, 1154 compared against 154 will be considered a positive match). Using a List instead of a Set does not change that, because containsAll only tests membership and ignores how often an element occurs; to require repeated characters you would have to compare the number of occurrences of each character. The List variant merely skips building the sets:
List<String> result = new ArrayList<>();
List<String> chars = Arrays.asList(str.split(""));
for(String string : stringList) {
List<String> stringListChars = Arrays.asList(string.split(""));
if(stringListChars.containsAll(chars)) {
result.add(string);
}
}
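If repeated characters should matter, one option is to count occurrences per character and compare the counts. The following is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CharCountSearch {
    // Counts how often each character occurs in s.
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char ch : s.toCharArray()) {
            counts.merge(ch, 1, Integer::sum);
        }
        return counts;
    }

    // True if candidate contains every character of search at least
    // as many times as it occurs in search.
    static boolean containsAllWithCounts(String candidate, String search) {
        Map<Character, Integer> available = countChars(candidate);
        for (Map.Entry<Character, Integer> e : countChars(search).entrySet()) {
            if (available.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsAllWithCounts("154", "1154"));    // prints false
        System.out.println(containsAllWithCounts("468591", "46891")); // prints true
    }
}
```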
Your initial idea was a good start. What you can do is create a set instead of an array, then use Guava's Sets#powerSet method to create all possible subsets, keep only those that have "46891".length() members, convert each set into a String, and look those strings up in the original array :)
You could do this with the List containsAll method. Arrays.asList cannot be applied to a char[] directly (it would produce a List<char[]>), so the characters are boxed via a stream:
List<Character> lookingForChars = lookingForString.chars().mapToObj(c -> (char) c).collect(Collectors.toList());
for (String toSearchString : array) {
List<Character> toSearchChars = toSearchString.chars().mapToObj(c -> (char) c).collect(Collectors.toList());
if (toSearchChars.containsAll(lookingForChars)) {
System.out.println("Match Found!");
}
}
You can use String#charAt() in a nested for loop to compare your string with each of the array's elements.
This method would help you check whether a character is contained in both strings.
This is trickier than a straightforward solution.
There are better algorithms, but here is one that is easy to implement and understand.
Ways of solving:
A. Go through every char of the given string and check whether it occurs in the given array.
B. Collect a list of every string from the selected array that contains the given char.
C. Check whether there is another char left to check.
If there is, perform A again, but on the collected list (the result list).
Else, return all possible matches.
Try this:
public static void main(String args[]) {
String[] array = {"123456", "132457", "468591"};
String search = "46891";
for (String element : array) {
boolean isPresent = true;
for (int index = 0; index < search.length(); index++) {
if(element.indexOf(search.charAt(index)) == -1){
isPresent = false;
break;
}
}
if(isPresent)
System.out.println("Element "+ element + " contains search string");
else
System.out.println("Element "+ element + " does not contain search string");
}
}
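This sorts the char[]s of the search string and of the string to search in. Sorting costs O(n log n), and the early break in the inner loop skips most of the comparisons that an unsorted O(n^2) scan would make, although the nested loops are still quadratic in the worst case.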
private static boolean contains(String searchMe, String searchOn){
char[] sm = searchMe.toCharArray();
Arrays.sort(sm);
char[] so = searchOn.toCharArray();
Arrays.sort(so);
boolean found = false;
for(int i = 0; i<so.length; i++){
found = false; // necessary to reset 'found' on subsequent searches
for(int j=0; j<sm.length; j++){
if(sm[j] == so[i]){
// Match! Break to the next char of the search string.
found = true;
break;
}else if(sm[j] > so[i]){ // No need to continue because they are sorted.
break;
}
}
if(!found){
// We can quit here because the arrays are sorted.
// I know if I did not find a match of the current character
// for so in sm, then no other characters will match because they are
// sorted.
break;
}
}
return found;
}
public static void main(String[] args0){
String value = "12345";
String[] testValues = { "34523452346", "1112", "1122009988776655443322",
"54321","7172839405","9495929193"};
System.out.println("\n Search where order does not matter.");
for(String s : testValues){
System.out.println(" Does " + s + " contain " + value + "? " + contains(s , value));
}
}
And the results
Search where order does not matter.
Does 34523452346 contain 12345? false
Does 1112 contain 12345? false
Does 1122009988776655443322 contain 12345? true
Does 54321 contain 12345? true
Does 7172839405 contain 12345? true
Does 9495929193 contain 12345? true

ArrayIndexOutOfBounds

I'm working on a fraction calculator using String.split() to split the input into terms. The inputs are separated by spaces (1/2 / 1/2).
String[] toReturn = new String[6];
result = isInputValid(expression);
toReturn = splitExpression(expression, placeToSplit[0]);
int indexOfUnderscore = toReturn[0].indexOf("_");
result = isInputValid(toReturn[0]);
if(toReturn[5] != null){
getOperator2(toReturn);
}
The error is in the if statement. toReturn[5] is out of bounds because, when two terms or fewer are entered, splitExpression (which uses String.split() to split the input at the spaces) returns an array that has no index 5, even though I initialized toReturn with six elements. A way to tell whether a field in an array exists, or how many terms were entered, would solve it. My program works for 1/2 + 1/2 * 1/2, but I haven't figured out how to tell whether toReturn[5] exists.
Correctly:
result = isInputValid(expression);
String[] toReturn = splitExpression(expression, placeToSplit[0]);
int indexOfUnderscore = toReturn[0].indexOf("_");
result = isInputValid(toReturn[0]);
if(toReturn.length>5 && !"".equals(toReturn[5]) ){
getOperator2(toReturn);
}
The toReturn.length>5 part verifies that the array itself is at least 6 items long. Only then can you check whether that element is empty or not.
This is what it should look like.
Remove the first line, String[] toReturn = new String[6];
Update your third line to
String[] toReturn = splitExpression(expression, placeToSplit[0]);
And check this condition:
if(toReturn.length>5 ){ // use !toReturn[5].isEmpty() to check the empty string
getOperator2(toReturn);
}
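The underlying point is that String.split() returns a new array sized to the number of tokens, so a previously allocated new String[6] is simply replaced. A small sketch with hypothetical names (SplitLengthDemo, elementOrNull) shows the length check in isolation; it does not reproduce the asker's splitExpression.

```java
class SplitLengthDemo {
    // Returns the element at index, or null if the array is too short.
    static String elementOrNull(String[] parts, int index) {
        return index < parts.length ? parts[index] : null;
    }

    public static void main(String[] args) {
        String[] two = "1/2 + 1/2".split(" ");
        String[] three = "1/2 + 1/2 * 1/2".split(" ");
        System.out.println(two.length);   // prints 3
        System.out.println(three.length); // prints 5
        // Guarded access instead of an ArrayIndexOutOfBoundsException:
        System.out.println(elementOrNull(two, 3));   // prints null
        System.out.println(elementOrNull(three, 3)); // prints *
    }
}
```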
