Better way to detect if a string contains multiple words - java

I am trying to create a program that detects if multiple words are in a string as fast as possible, and if so, executes a behavior. Preferably, I would like it to detect the order of these words too but only if this can be done fast. So far, this is what I have done:
if (input.contains("adsf") && input.contains("qwer")) {
execute();
}
As you can see, doing this for multiple words would become tiresome. Is this the only way or is there a better way of detecting multiple substrings? And is there any way of detecting order?

I'd create a regular expression from the words:
Pattern pattern = Pattern.compile("(?=.*adsf)(?=.*qwer)");
if (pattern.matcher(input).find()) {
execute();
}
For more details, see this answer: https://stackoverflow.com/a/470602/660143
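Writing such a pattern by hand gets tedious for many words, and the question also asks about order. Below is a possible sketch that builds the pattern from a word list. It assumes the words are plain text, so Pattern.quote escapes any regex metacharacters. anyOrder emits one lookahead per word, anchored at the start, so find() succeeds only if every word appears somewhere. inOrder joins the quoted words with .* so they must appear in the given order. The class and method names are made up for illustration, and DOTALL is my addition so that .* can cross line breaks.

```java
import java.util.List;
import java.util.regex.Pattern;

class WordPatterns {
    // Any order: "^(?=.*w1)(?=.*w2)..." succeeds only if every word occurs somewhere.
    // The ^ anchor means the lookaheads are evaluated only at the start of the input.
    static Pattern anyOrder(List<String> words) {
        StringBuilder regex = new StringBuilder("^");
        for (String word : words) {
            regex.append("(?=.*").append(Pattern.quote(word)).append(')');
        }
        // DOTALL lets .* match across line breaks in multi-line input.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Given order: "w1.*w2.*..." requires each word to start after the previous one ends.
    static Pattern inOrder(List<String> words) {
        StringBuilder regex = new StringBuilder();
        for (String word : words) {
            if (regex.length() > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(word));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}
```

Compile the pattern once and reuse it, for example with WordPatterns.anyOrder(Arrays.asList("adsf", "qwer")).matcher(input).find().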

Editor's note: Despite being heavily upvoted and accepted, the answer below does not behave like the code in the question: execute() is called on the first match, like a logical OR.
You could use an array:
String[] matches = new String[] {"adsf", "qwer"};
boolean found = false;
for (String s : matches)
{
    if (input.contains(s))
    {
        found = true; // stops at the first word that is present (logical OR)
        break;
    }
}
if (found)
{
    execute();
}
This is as efficient as the code you posted, but more maintainable. Looking for a more efficient solution sounds like a micro-optimization that should be ignored until it is proven to be a bottleneck in your code. In any case, with a huge set of strings the solution could be a trie.

In Java 8 you could do
public static boolean containsWords(String input, String[] words) {
return Arrays.stream(words).allMatch(input::contains);
}
Sample usage:
String input = "hello, world!";
String[] words = {"hello", "world"};
if (containsWords(input, words)) System.out.println("Match");
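The allMatch version ignores order. If order matters too, one possible sketch searches for each word with String.indexOf(String, int), starting just past where the previous word ended. Taking the earliest occurrence each time never rules out a valid ordering. The method name containsInOrder is hypothetical.

```java
class OrderedWords {
    // Returns true if every word occurs in input, each one starting after the previous one ends.
    static boolean containsInOrder(String input, String... words) {
        int from = 0;
        for (String word : words) {
            int index = input.indexOf(word, from);
            if (index < 0) {
                return false; // missing, or only present before the previous word
            }
            from = index + word.length(); // the next word must start after this one
        }
        return true;
    }
}
```

Sample usage: containsInOrder("hello, world!", "hello", "world") is true, while swapping the two words makes it false.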

This is a classical interview and CS problem.
The Rabin-Karp algorithm is usually the first thing people talk about in interviews. The basic idea is that as you go through the string, you keep a rolling hash of the current window: add the incoming character and remove the one that drops out. If the hash matches the hash of one of your match strings, you know that you might have a match, which you then confirm by comparing the characters. This avoids having to scan back and forth into your match strings.
https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
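A minimal single-pattern Rabin-Karp sketch in Java makes the rolling-hash idea concrete. The base 256 and modulus 1,000,000,007 are my own arbitrary choices. Every hash hit is verified with regionMatches, because different strings can share a hash. For several match strings you could run it once per word, or keep one set of hashes per distinct word length; that extension is not shown.

```java
class RabinKarp {
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int m = pattern.length(), n = text.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        long highPow = 1; // BASE^(m-1) mod MOD, used to remove the outgoing character
        for (int i = 1; i < m; i++) {
            highPow = highPow * BASE % MOD;
        }

        long patternHash = 0, windowHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int start = 0; ; start++) {
            // Equal hashes only mean "maybe": confirm with a direct comparison.
            if (windowHash == patternHash && text.regionMatches(start, pattern, 0, m)) {
                return start;
            }
            if (start + m >= n) {
                return -1; // the window cannot move further right
            }
            // Roll the window: drop text[start], shift, append text[start + m].
            windowHash = (windowHash - text.charAt(start) * highPow % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(start + m)) % MOD;
        }
    }
}
```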
Another typical topic for that interview question is using a trie to speed up the lookup. With a large set of match strings, you would otherwise have to check every one of them at each position of the input; a trie lets you do that check far more efficiently.
https://en.wikipedia.org/wiki/Trie
Additional algorithms are:
- Aho–Corasick https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
- Commentz-Walter https://en.wikipedia.org/wiki/Commentz-Walter_algorithm
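Aho-Corasick fits the question most directly: it builds a trie of all the words plus "failure links", then reports every word found in a single pass over the input. Below is a compact sketch under simplifying assumptions (char-based, HashMap children, no attempt at memory efficiency). It is not a tuned implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>(); // words that end at this node
        Node fail; // longest proper suffix of this node's path that is also a trie path
    }

    private final Node root = new Node();

    AhoCorasick(List<String> words) {
        // 1. Insert every word into a trie.
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, key -> new Node());
            }
            node.outputs.add(word);
        }
        // 2. Set failure links breadth-first, so shallower nodes are finished first.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> edge : node.next.entrySet()) {
                char c = edge.getKey();
                Node child = edge.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = f.next.getOrDefault(c, root);
                // Inherit words that end at the suffix node (e.g. "he" inside "she").
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scans text once and returns the set of words that occur in it.
    Set<String> findAll(String text) {
        Set<String> found = new HashSet<>();
        Node node = root;
        for (char c : text.toCharArray()) {
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            found.addAll(node.outputs);
        }
        return found;
    }
}
```

To replicate the question's check, build the automaton once from the word list and test whether findAll(input) contains all of the words, for example with containsAll.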

If you have a lot of substrings to look up, a regular expression probably isn't going to be much help. You're better off putting the substrings in a list, iterating over them, and calling input.indexOf(substring) on each one. This returns the int index where the substring was found. Put each result (except -1, which means the substring wasn't found) into a TreeMap, with the index as the key and the substring as the value. Because a TreeMap keeps its keys sorted, iterating over its values() (or keySet()) then returns the substrings in order of appearance. Note that if two substrings start at the same index, the later one overwrites the earlier one.
Map<Integer, String> substringIndices = new TreeMap<Integer, String>();
List<String> substrings = new ArrayList<String>();
substrings.add("adsf");
// etc.
for (String substring : substrings) {
    int index = input.indexOf(substring);
    if (index != -1) {
        substringIndices.put(index, substring);
    }
}
// A TreeMap iterates in ascending key order, i.e. by position in input
for (String substring : substringIndices.values()) {
    System.out.println(substring);
}

Use a tree structure to hold the substrings per codepoint. This eliminates the need to
Note that this is efficient only if the needle set is almost constant. Individual additions or removals of needles are still cheap, but rebuilding the whole tree from a large set of strings before every search would definitely slow it down.
StringSearcher.java:
import java.util.List;
class StringSearcher {
    // Root node; it does not hold a code point itself
    private final NeedleTree needles = new NeedleTree(-1);
    private final boolean caseSensitive;

    public StringSearcher(List<String> inputs, boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
        for (String input : inputs) {
            // Lower-case needles the same way as the haystack so both sides match
            String needle = caseSensitive ? input : input.toLowerCase();
            NeedleTree tree = needles;
            // Walk the needle code point by code point, creating nodes as needed
            for (int i = 0; i < needle.length(); ) {
                int codePoint = needle.codePointAt(i);
                tree = tree.child(codePoint);
                i += Character.charCount(codePoint);
            }
            tree.markSelfSet(); // a needle ends at this node
        }
    }

    public boolean matches(String haystack) {
        if (!caseSensitive) {
            haystack = haystack.toLowerCase();
        }
        // Try every start position and follow the tree directly on the haystack,
        // so no substring has to be copied (and the end of the haystack cannot be overrun)
        for (int i = 0; i < haystack.length(); i++) {
            NeedleTree tree = needles;
            for (int j = i; j < haystack.length(); ) {
                int codePoint = haystack.codePointAt(j);
                tree = tree.childOrNull(codePoint);
                if (tree == null) {
                    break; // no needle continues with this code point
                }
                if (tree.isSelfSet()) {
                    return true; // a complete needle ends here
                }
                j += Character.charCount(codePoint);
            }
        }
        return false;
    }
}
NeedleTree.java:
import java.util.HashMap;
import java.util.Map;
class NeedleTree{
private int codePoint;
private boolean selfSet;
private Map<Integer, NeedleTree> children = new HashMap<>();
public NeedleTree(int codePoint){
this.codePoint = codePoint;
}
public NeedleTree childOrNull(int codePoint){
return children.get(codePoint);
}
public NeedleTree child(int codePoint){
NeedleTree child = children.get(codePoint);
if(child == null){
child = new NeedleTree(codePoint);
children.put(codePoint, child); // put() returns the previous value (null here), so keep our own reference
}
return child;
}
public boolean isSelfSet(){
return selfSet;
}
public void markSelfSet(){
selfSet = true;
}
}

I think a better approach is to use indexOf: it returns the position of each value in the string, or -1 if the value is absent, so you can both check that every value is present and compare the positions to validate their order.
String s = "123";
System.out.println(s.indexOf("1")); // 0
System.out.println(s.indexOf("2")); // 1
System.out.println(s.indexOf("5")); // -1

Related

String data manipulation with Maps for very large data input

I have solved the Two Strings problem on HackerRank.
Here is the problem:
Given two strings, determine if they share a common substring. A
substring may be as small as one character.
For example, the words "a", "and", "art" share the common substring "a".
The words "be" and "cat" do not share a substring.
Function Description
Complete the function twoStrings in the editor below. It should return
a string, either YES or NO based on whether the strings share a common
substring.
twoStrings has the following parameter(s):
s1, s2: two strings to analyze.
Output Format
For each pair of strings, return YES or NO.
However, when given extra-long strings, my code does not run within the time limit. Any suggestions to improve its efficiency? I think I could speed up the substring search with the Stream API, but I'm not sure how to use it in this context. Could someone please help me understand this better?
public static void main(String[] args) {
String s1 = "hi";
String s2 = "world";
checkSubStrings(s1, s2);
}
static void checkSubStrings(String s1, String s2) {
Map<String, Long> s1Map = new HashMap<>();
Map<String, Long> s2Map = new HashMap<>();
findAllSubStrings(s1, s1Map);
findAllSubStrings(s2, s2Map);
boolean isContain = s2Map.entrySet().stream().anyMatch(i -> s1Map.containsKey(i.getKey()) );
if (isContain) {
System.out.println("YES");
} else {
System.out.println("NO");
}
}
static void findAllSubStrings(String s, Map<String, Long> map) {
for (int i = 0; i < s.length(); i++) {
String subString = s.substring(i);
for (int j = subString.length(); j > 0; j--) {
String subSubString = subString.substring(0, j);
if (map.containsKey(subSubString)) {
map.put(subSubString, map.get(subSubString) + 1);
} else {
if (!subSubString.equals(""))
map.put(subSubString, 1L);
}
}
}
}
Update
I solved the question using HashSets. Now it runs with very large strings.
static String twoStrings(String s1, String s2) {
String result = null;
Set<Character> s1Set = new HashSet<>();
Set<Character> s2Set = new HashSet<>();
for(char a : s1.toCharArray()){
s1Set.add(a);
}
for(char a : s2.toCharArray()){
s2Set.add(a);
}
boolean isContain = s2Set.stream().anyMatch(s1Set::contains);
if(isContain){
result = "YES";
} else {
result = "NO";
}
return result;
}
If 2 strings share an N (>=2) character substring, they also share an N-1 character substring (because you can chop a character off the end of the common substring, and this will still be found in both strings). Extending this argument, they also share a 1-character substring.
As such, all you need to check are single-character substrings.
Fill your maps with single-character substrings instead, and you will avoid creating (and checking) unnecessary substrings. (And just use a Set instead of a Map, since you never use the counts.)
// Yields a `Set<Integer>`, which can be used directly to check.
return s.codePoints().boxed().collect(Collectors.toSet());
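Here is a sketch of the whole function built on that argument. It collects the code points of s1 into a Set and checks whether any code point of s2 is in it, which is linear in the total length of the two strings, assuming constant-time hashing. The class name is my own; twoStrings keeps the signature from the question.

```java
import java.util.Set;
import java.util.stream.Collectors;

class TwoStrings {
    // Two strings share a common substring iff they share at least one character,
    // so comparing their sets of code points is enough.
    static String twoStrings(String s1, String s2) {
        Set<Integer> s1CodePoints = s1.codePoints().boxed().collect(Collectors.toSet());
        boolean shared = s2.codePoints().anyMatch(cp -> s1CodePoints.contains(cp));
        return shared ? "YES" : "NO";
    }
}
```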

How can I retrieve the values in a HashMap stored in an ArrayList of HashMaps?

I am a beginner in Java. Basically, I load each text document and store each individual word of the document in a HashMap. After that, I store all the HashMaps in an ArrayList. Now I am stuck on how to retrieve all the words from the HashMaps in the ArrayList!
private static long numOfWords = 0;
private String userInputString;
private static long wordCount(String data) {
long words = 0;
int index = 0;
boolean prevWhiteSpace = true;
while (index < data.length()) {
//Intialise character variable that will be checked.
char c = data.charAt(index++);
//Determine whether it is a space.
boolean currWhiteSpace = Character.isWhitespace(c);
//If previous is a space and character checked is not a space,
if (prevWhiteSpace && !currWhiteSpace) {
words++;
}
//Assign current character's determination of whether it is a spacing as previous.
prevWhiteSpace = currWhiteSpace;
}
return words;
} //
public static ArrayList StoreLoadedFiles()throws Exception{
final File f1 = new File ("C:/Users/Admin/Desktop/dataFiles/"); //specify the directory to load files
String data=""; //reset the words stored
ArrayList<HashMap> hmArr = new ArrayList<HashMap>(); //array of hashmap
for (final File fileEntry : f1.listFiles()) {
Scanner input = new Scanner(fileEntry); //load files
while (input.hasNext()) { //while there are still words in the document, continue to load all the words in a file
data += input.next();
input.useDelimiter("\t"); //similar to split function
} //while loop
String textWords = data.replaceAll("\\s+", " "); //remove all found whitespaces
HashMap<String, Integer> hm = new HashMap<String, Integer>(); //Creates a Hashmap that would be renewed when next document is loaded.
String[] words = textWords.split(" "); //store individual words into a String array
for (int j = 0; j < numOfWords; j++) {
int wordAppearCount = 0;
if (hm.containsKey(words[j].toLowerCase().replaceAll("\\W", ""))) { //replace non-word characters
wordAppearCount = hm.get(words[j].toLowerCase().replaceAll("\\W", "")); //remove non-word character and retrieve the index of the word
}
if (!words[j].toLowerCase().replaceAll("\\W", "").equals("")) {
//Words stored in hashmap are in lower case and have special characters removed.
hm.put(words[j].toLowerCase().replaceAll("\\W", ""), ++wordAppearCount);//index of word and string word stored in hashmap
}
}
hmArr.add(hm);//stores every single hashmap inside an ArrayList of hashmap
} //end of for loop
return hmArr; //return hashmap ArrayList
}
public static void LoadAllHashmapWords(ArrayList m){
for(int i=0;i<m.size();i++){
m.get(i); //stuck here!
}
Firstly, your logic won't work correctly. In the StoreLoadedFiles() method you iterate through the words with for (int j = 0; j < numOfWords; j++) {. The numOfWords field is initialized to zero, so this loop never executes at all. You should loop up to the length of the words array (words.length) instead.
Having said that, to retrieve the values from a list of HashMaps you should first iterate through the list, and for each HashMap take its entry set. A Map.Entry is basically one key/value pair that you stored in the map, so map.entrySet() returns a java.util.Set<Map.Entry<Key, Value>>. A Set is returned because the keys are unique.
So a complete program looks like this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Scanner;
public class FileWordCounter {
public static List<HashMap<String, Integer>> storeLoadedFiles() {
final File directory = new File("C:/Users/Admin/Desktop/dataFiles/");
List<HashMap<String, Integer>> listOfWordCountMap = new ArrayList<HashMap<String, Integer>>();
Scanner input = null;
StringBuilder data;
try {
for (final File fileEntry : directory.listFiles()) {
input = new Scanner(fileEntry);
input.useDelimiter("\t");
data = new StringBuilder();
while (input.hasNext()) {
data.append(input.next());
}
input.close();
String wordsInFile = data.toString().replaceAll("\\s+", " ");
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
for(String word : wordsInFile.split(" ")){
String strippedWord = word.toLowerCase().replaceAll("\\W", "");
int wordAppearCount = 0;
if(strippedWord.length() > 0){
if(wordCountMap.containsKey(strippedWord)){
wordAppearCount = wordCountMap.get(strippedWord);
}
wordCountMap.put(strippedWord, ++wordAppearCount);
}
}
listOfWordCountMap.add(wordCountMap);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
if(input != null) {
input.close();
}
}
return listOfWordCountMap;
}
public static void loadAllHashmapWords(List<HashMap<String, Integer>> listOfWordCountMap) {
for(HashMap<String, Integer> wordCountMap : listOfWordCountMap){
for(Entry<String, Integer> wordCountEntry : wordCountMap.entrySet()){
System.out.println(wordCountEntry.getKey() + " - " + wordCountEntry.getValue());
}
}
}
public static void main(String[] args) {
List<HashMap<String, Integer>> listOfWordCountMap = storeLoadedFiles();
loadAllHashmapWords(listOfWordCountMap);
}
}
Since you are beginner in Java programming I would like to point out a few best practices that you could start using from the beginning.
Closing resources: In your while loop that reads the files you open a Scanner with Scanner input = new Scanner(fileEntry);, but you never close it. This leaks resources such as open file handles. You should always close resources, for example in the finally block of a try-catch-finally.
Avoid unnecessary redundant calls: If an operation gives the same result on every iteration of a loop, move it outside the loop. In your case, for example, setting the scanner delimiter with input.useDelimiter("\t"); is a one-time operation after the scanner is created, so you can move it outside the while loop.
Use StringBuilder instead of String: Repeated string manipulation such as concatenation should be done with a StringBuilder (or StringBuffer when you need synchronization) instead of += or +. String is immutable, meaning its value cannot be changed, so each concatenation creates a new String object and leaves a lot of unused instances in memory. StringBuilder, on the other hand, is mutable, so its contents can be changed in place.
Naming convention: The usual Java convention for method names is to start with a lower-case letter and capitalize the first letter of each subsequent word. So it's standard practice to name a method storeLoadedFiles rather than StoreLoadedFiles. (This could be opinion based ;))
Give descriptive names: It's good practice to give descriptive names, as it helps with later maintenance. For example, wordCountMap is a better name than hm, so anyone who reads your code in the future will understand it better and faster. Again, opinion based.
Use generics as much as possible: This avoids explicit casts and lets the compiler catch type errors.
Avoid repetition: Similar to point 2, if an operation produces the same output and is needed multiple times, store the result in a variable and reuse it. You were calling words[j].toLowerCase().replaceAll("\\W", "") several times; the result is always the same, but each call creates unnecessary instances. So compute it once into a String and use that String everywhere else.
Use the for-each loop wherever possible: This frees you from managing indexes.
These are just suggestions. I tried to apply most of them in my code, but I won't say it's perfect. Since you are a beginner, if you start following these practices now they will become ingrained. Happy coding! :)
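On the closing-resources point: since Java 7, a try-with-resources statement closes the Scanner automatically, even when an exception is thrown, without an explicit finally block. Here is a sketch of the per-file counting step written that way, which also uses Map.merge (Java 8) for the count update. The method name countWords is hypothetical, and unlike the program above it uses the Scanner's default whitespace delimiter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordCounts {
    // Counts lower-cased words with non-word characters stripped.
    // The Scanner (and the Readable it wraps) is closed when the try block exits.
    static Map<String, Integer> countWords(Readable source) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase().replaceAll("\\W", "");
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
                }
            }
        }
        return counts;
    }
}
```

For a file you could pass a java.io.FileReader, whose constructor throws FileNotFoundException if the file is missing.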
for (HashMap<String, Integer> map : m) {
for(Entry<String,Integer> e:map.entrySet()){
//your code here
}
}
Or, if you are using Java 8, you can use lambdas:
m.stream().forEach((map) -> {
map.entrySet().stream().forEach((e) -> {
//your code here
});
});
But first you have to change the method signature to public static void LoadAllHashmapWords(List<HashMap<String,Integer>> m); otherwise you would have to use a cast.
P.S. Are you sure your extraction method works? I tested it briefly and got a list of empty HashMaps every time.

Finding subsets of size k for a given set of size n

I am trying to work out the solution to the above problem and came up with this:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;
public class Subset_K {
public static void main(String[]args)
{
Set<String> x;
int n=4;
int k=2;
int arr[]={1,2,3,4};
StringBuilder sb=new StringBuilder();
for(int i=1;i<=(n-k);i++)
sb.append("0");
for(int i=1;i<=k;i++)
sb.append("1");
String bin=sb.toString();
x=generatePerm(bin);
Set<ArrayList <Integer>> outer=new HashSet<ArrayList <Integer>>();
for(String s:x){
int dec=Integer.parseInt(s,2);
ArrayList<Integer> inner=new ArrayList<Integer>();
for(int j=0;j<n;j++){
if((dec&(1<<j))>0)
inner.add(arr[j]);
}
outer.add(inner);
}
for(ArrayList<Integer> z:outer){
System.out.println(z);
}
}
public static Set<String> generatePerm(String input)
{
Set<String> set = new HashSet<String>();
if (input == "")
return set;
Character a = input.charAt(0);
if (input.length() > 1)
{
input = input.substring(1);
Set<String> permSet = generatePerm(input);
for (String x : permSet)
{
for (int i = 0; i <= x.length(); i++)
{
set.add(x.substring(0, i) + a + x.substring(i));
}
}
}
else
{
set.add(a + "");
}
return set;
}
}
I am testing with a 4-element set and k=2. My idea is to first generate a binary string where k bits are set and n-k bits are not set, then find all possible permutations of this string, and finally use each permutation to output the corresponding elements of the set. Now I can't figure out the complexity of this code because I took the generatePerm method from someone else. Can someone help me with the time complexity of the generatePerm method and the overall time complexity of my solution? I found another recursive implementation of this problem here: Find all subsets of length k in an array. However, I can't figure out its complexity either, so I need some help there too.
I was also trying to refactor my code so that it works not just for integers but for all types of data. I don't have much experience with generics, so when I try to change ArrayList<Integer> to ArrayList<?> in line 21, Eclipse says:
Cannot instantiate the type ArrayList<?>
How do I correct that?
You can use ArrayList<Object> throughout. That will accept any kind of object. If you want a specific type that is determined by the calling code, you will need to introduce a generic type parameter.
Note that in your generatePerm method, you should not use the test
if (input == "")
Instead, you should use:
if ("".equals(input))
Your current code will only succeed if input is the interned string "". It will not work, for instance, if input is computed as a substring() with zero length. In general you should always compare strings with .equals() rather than with == (except under very specific conditions when you are looking for object identity rather than object equality).
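As an illustration of the generic type parameter mentioned above, here is a sketch of subsets of size k for any element type T. Note that it swaps the question's binary-string permutation approach for plain backtracking recursion, so it is not a refactoring of the original code. It produces each of the C(n, k) subsets exactly once and copies each one, so its running time is roughly proportional to k * C(n, k). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Subsets {
    // Returns every subset of exactly k elements of items; works for any element type T.
    static <T> List<List<T>> subsetsOfSize(List<T> items, int k) {
        List<List<T>> result = new ArrayList<>();
        collect(items, k, 0, new ArrayList<>(), result);
        return result;
    }

    private static <T> void collect(List<T> items, int k, int start,
                                    List<T> current, List<List<T>> result) {
        if (current.size() == k) {
            result.add(new ArrayList<>(current)); // copy, because current is reused
            return;
        }
        // Stop early when too few items remain to reach size k.
        for (int i = start; i <= items.size() - (k - current.size()); i++) {
            current.add(items.get(i));
            collect(items, k, i + 1, current, result);
            current.remove(current.size() - 1); // undo the choice before trying the next item
        }
    }
}
```

For the question's test case, subsetsOfSize(Arrays.asList(1, 2, 3, 4), 2) yields the six pairs, starting with [1, 2] and ending with [3, 4].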

Can I create an array of sets?

Here is what I am trying to do.
I am reading in a list of words with each having a level of complexity. Each line has a word followed by a comma and the level of the word. "watch, 2" for example. I wish to put all of the words of a given level into a set to ensure their uniqueness in that level. There are 5 levels of complexity, so ideally I'd like an array with 5 elements, each of which is a set.
I can then add words to each of the sets as I read them in. Later on, I wish to pull out a random word of a specified level.
I'm happy with everything except how to create an array of sets. I've read several other posts here that seem to agree that this can't be done exactly as I would hope, but I can't find a good work around. (No, I'm not willing to have 5 sets in a switch statement. Goes against the grain.)
Thanks.
You can use a map: use the level as the key and a set of words as the value. When a random word is requested for a level, look up that level's set by its key and pick a random element from it. This also scales if you increase the number of levels.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class WordLevels {
    private static final Random RANDOM = new Random();

    public static void main(String[] args) {
        Map<Integer, Set<String>> levelSet = new HashMap<>();
        // Your code goes here to get the level and word
        String word = "";
        int level = 0;
        addStringToLevel(levelSet, word, level);
    }

    private static void addStringToLevel(Map<Integer, Set<String>> levelSet,
            String word, int level) {
        if (levelSet.get(level) == null) {
            // this means this is the first string added for this level
            // so create a container to hold the object
            levelSet.put(level, new HashSet<>());
        }
        Set<String> wordContainer = levelSet.get(level);
        wordContainer.add(word);
    }

    private static String getStringFromLevel(Map<Integer, Set<String>> levelSet,
            int level) {
        Set<String> wordContainer = levelSet.get(level);
        if (wordContainer == null || wordContainer.isEmpty()) {
            return null;
        }
        // A Set has no index, so copy it into a List and pick a random position
        List<String> words = new ArrayList<>(wordContainer);
        return words.get(RANDOM.nextInt(words.size()));
    }
}
If you are willing to use Guava, try SetMultimap. It will take care of everything for you.
SetMultimap<Integer, String> map = HashMultimap.create();
map.put(5, "value");
The collection will take care of creating the inner Set instances for you unlike the array or List solutions which require either pre-creating the Sets or checking that they exist.
Consider using a List instead of an array.
Doing so might make your life easier.
List<Set<String>> wordSetLevels = new ArrayList<>();
// ...
for (int i = 0; i < 5; i++) {
    wordSetLevels.add(new HashSet<String>());
}
wordSetLevels = Collections.unmodifiableList(wordSetLevels);
// ...
wordSetLevels.get(2).add("watch");
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Main {
private Set<String>[] process(List<String> words) {
@SuppressWarnings("unchecked")
Set<String>[] arrayOfSets = new Set[5];
for(int i=0; i<arrayOfSets.length; i++) {
arrayOfSets[i] = new HashSet<String>();
}
for(String word: words) {
int index = getIndex(word);
String val = getValue(word);
arrayOfSets[index].add(val);
}
return arrayOfSets;
}
private int getIndex(String str) {
//TODO Implement
return 0;
}
private String getValue(String str) {
//TODO Implement
return "";
}
}
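The getIndex and getValue stubs depend on the input format. One possible implementation, shown as standalone static helpers, assumes each line looks like "watch, 2" and that levels are numbered 1 to 5, so level 1 goes into arrayOfSets[0]. The class name is hypothetical.

```java
class LevelLineParser {
    // Assumed format: "<word>, <level>", e.g. "watch, 2", with levels 1..5.
    // Returns the array index for the line's level (level 1 -> index 0).
    static int getIndex(String line) {
        String level = line.substring(line.lastIndexOf(',') + 1).trim();
        return Integer.parseInt(level) - 1;
    }

    // Returns the word part of the line, without surrounding spaces.
    static String getValue(String line) {
        return line.substring(0, line.lastIndexOf(',')).trim();
    }
}
```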

How can I get a specific permutation of a string?

I want to get a specific permutation of a string such as the alphabet. To explain what I mean, here is the code I am using:
import java.util.ArrayList;
import java.util.List;

public class PermutationExample {
public static List<String> getPermutation(String input) {
List<String> collection = null;
if (input.length() == 1) {
collection = new ArrayList<String>();
collection.add(input);
return collection;
} else {
collection = getPermutation(input.substring(1));
Character first = input.charAt(0);
List<String> result = new ArrayList<String>();
for (String str : collection) {
for (int i = 0; i < str.length(); i++) {
String item = str.substring(0, i) + first
+ str.substring(i);
result.add(item);
}
String item = str.concat(first.toString());
result.add(item);
}
return result;
}
}
public static void main(String[] args) {
System.out.println(PermutationExample.getPermutation("ABCD"));
}
}
This code works well and I can get every permutation: if I need the 5th element, I can take it from the list. But if the string is the whole alphabet, it doesn't work because the list is far too big. What do I have to do to get a specific element, like the 1221st of all 26! permutations?
I solved a similar problem a while ago, only in python.
If what you need is simply the n-th permutation, you can do a lot better than generating every permutation and returning the n-th one, if you think about generating only the permutation you need.
You can do this "simply" by figuring out what should be the element in front for the number of permutations you want, and then what should be the remaining of the elements recursively.
Assume a sorted collection col = [0, ..., X], i.e. col[n] < col[n+1] for every n, and number its permutations in lexicographic order starting from 0.
For N elements there are N! possible permutations; the last one is the collection in perfectly reversed order.
The head of the collection changes only after every (N-1)! permutations, so if n < (N-1)!, the head stays col[0]; in general, the head is col[n / (N-1)!] (integer division). That leaves n mod (N-1)! as the index among the permutations of the remaining N-1 elements, and you can apply the same logic recursively.
Does this help? I know it's fairly high level and you'll have to think a bit about it, but maybe it'll get you on the right track.
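Here is a sketch of this approach in Java; the answer's own Python solution is not shown. It assumes the characters are distinct and already sorted, numbers the permutations from 0 in lexicographic order, and uses BigInteger because 25! does not fit in a long. Note that this lexicographic numbering differs from the order of the list that getPermutation builds, so the n-th element of that list is generally a different permutation.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class NthPermutation {
    // Returns permutation number n (0-based, lexicographic order) of the characters in sorted.
    // Assumes the characters are distinct, already in ascending order, and 0 <= n < length!.
    static String nthPermutation(String sorted, BigInteger n) {
        List<Character> remaining = new ArrayList<>();
        for (char c : sorted.toCharArray()) {
            remaining.add(c);
        }
        // blockSize = (remaining.size() - 1)!: how many permutations share the same head
        BigInteger blockSize = BigInteger.ONE;
        for (int i = 2; i < remaining.size(); i++) {
            blockSize = blockSize.multiply(BigInteger.valueOf(i));
        }
        StringBuilder result = new StringBuilder();
        while (!remaining.isEmpty()) {
            BigInteger[] quotientAndRemainder = n.divideAndRemainder(blockSize);
            // The quotient picks the head; the remainder indexes the rest recursively.
            result.append(remaining.remove(quotientAndRemainder[0].intValueExact()));
            n = quotientAndRemainder[1];
            if (!remaining.isEmpty()) {
                blockSize = blockSize.divide(BigInteger.valueOf(remaining.size()));
            }
        }
        return result.toString();
    }
}
```

For example, counting from 1, the 1221st permutation of the alphabet in lexicographic order would be nthPermutation("ABCDEFGHIJKLMNOPQRSTUVWXYZ", BigInteger.valueOf(1220)).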
