Matching multiple keywords from a line in java - java

I have a line from which multiple keywords are to be matched. The whole keywords should be matched.
Example,
String str = "This is an example text for matching countries like Australia India England";
if(str.contains("Australia") ||
str.contains("India") ||
str.contains("England")){
System.out.println("Matches");
}else{
System.out.println("Does not match");
}
This code works fine. But if there are too many keywords to be matched, the line grows. Is there any elegant way of writing the same code?
Thanks

You can write a regular expression like this:
Country0|Country1|Country2
Use it like this:
String str = "This is an example text like Australia India England";
if (Pattern.compile("Australia|India|England").matcher(str).find())
System.out.println("Matches");
If you would like to know which countries have matched:
public static void main(String[] args) {
String str = "This is an example text like Australia India England";
Matcher m = Pattern.compile("Australia|India|England").matcher(str);
while (m.find())
System.out.println("Matches: " + m.group());
}
Outputs:
Matches: Australia
Matches: India
Matches: England
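If the keywords live in a list, the alternation can be built instead of typed by hand. The following is only a sketch: the class and method names (KeywordRegex, compileKeywords, containsAnyKeyword) are made up for illustration, and it assumes a non-empty list of keywords that start and end with letters or digits, because it wraps the alternation in \b word boundaries so that only whole words match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class KeywordRegex {

    // Builds \b(?:kw1|kw2|...)\b so that only whole words match.
    // Assumes the keyword list is non-empty.
    static Pattern compileKeywords(List<String> keywords) {
        String alternation = keywords.stream()
                .map(Pattern::quote) // escape regex metacharacters inside keywords
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    static boolean containsAnyKeyword(String text, List<String> keywords) {
        return compileKeywords(keywords).matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("Australia", "India", "England");
        System.out.println(containsAnyKeyword("countries like Australia India England", countries));
        System.out.println(containsAnyKeyword("NAustraliaA", countries));
    }
}
```

If the same keyword list is checked against many lines, compile the Pattern once and reuse it rather than rebuilding it per call.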

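Put the countries into an array and use a small helper method. Using a Set makes it even nicer, but building the set of countries is a bit more tedious. Something like the following, but with better naming and null handling if desired. Note that contains() also matches inside longer words, which is why "NAustraliaA" matches here: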
String[] countries = {"Australia", "India", "England"};
String str = "NAustraliaA";
if (containsAny(str, countries)) {
System.out.println("Matches");
}
else {
System.out.println("Does not match");
}
public static boolean containsAny(String toCheck, String[] values) {
for (String s: values) {
if (toCheck.contains(s)) {
return true;
}
}
return false;
}
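For the Set variant mentioned above, here is one possible sketch. It assumes the text can be split on whitespace and that keywords are single words without attached punctuation; CountrySetMatcher and containsAnyWord are illustrative names only. Unlike contains(), it matches whole words, so "NAustraliaA" does not match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper illustrating the Set-based variant.
class CountrySetMatcher {

    static final Set<String> COUNTRIES =
            new HashSet<>(Arrays.asList("Australia", "India", "England"));

    // True if any whitespace-separated word of the text is in the set.
    static boolean containsAnyWord(String text, Set<String> words) {
        for (String token : text.trim().split("\\s+")) {
            if (words.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsAnyWord("countries like Australia India England", COUNTRIES));
        System.out.println(containsAnyWord("NAustraliaA", COUNTRIES));
    }
}
```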

From a readability point of view, an ArrayList of the strings to be matched is elegant. A loop checks whether each keyword is present and sets a flag accordingly. The flag must be initialized before the loop: to true when all keywords must match, to false when any one is enough.
Something like this, if all are to be matched:
for (String checkStr : myList) {
if(!str.contains(checkStr)) {
flag=false;
break;
}
}
If any one should match:
for (String checkStr : myList) {
if(str.contains(checkStr)) {
flag=true;
break;
}
}
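The two loops can also be wrapped in small methods so the flag is not needed. This is just a sketch using streams (Java 8 or later); KeywordChecks, containsAll and containsAny are names chosen for illustration. Be aware that allMatch returns true and anyMatch returns false for an empty keyword list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper wrapping the "all" and "any" checks.
class KeywordChecks {

    // True only if every keyword occurs in the text.
    static boolean containsAll(String text, List<String> keywords) {
        return keywords.stream().allMatch(text::contains);
    }

    // True if at least one keyword occurs in the text.
    static boolean containsAny(String text, List<String> keywords) {
        return keywords.stream().anyMatch(text::contains);
    }

    public static void main(String[] args) {
        String str = "This is an example text for matching countries like Australia India England";
        List<String> myList = Arrays.asList("Australia", "UK");
        System.out.println("all: " + containsAll(str, myList));
        System.out.println("any: " + containsAny(str, myList));
    }
}
```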

package com.test;
public class Program {
private String str;
public Program() {
str = "This is an example text for matching countries like Australia India England";
// TODO Auto-generated constructor stub
}
public static void main(String[] args) {
Program program = new Program();
program.doWork();
}
private void doWork() {
String[] tomatch = { "Australia", "India" ,"UK"};
for(int i=0;i<tomatch.length;i++){
if (match(tomatch[i])) {
System.out.println(tomatch[i]+" Matches");
} else {
System.out.println(tomatch[i]+" Does not match");
}
}
}
private boolean match(String string) {
if (str.contains(string)) {
return true;
}
return false;
}
}
//-----------------
output
Australia Matches
India Matches
UK Does not match

Related

How to check if the String object passed to the method contains at least one of the words from the list- JAVA [duplicate]

I need help with creating a method that takes an object of the String type in the input arguments and a list of objects of the String type. The list contains forbidden words. How can I check if the String object passed to the method contains at least one of the words from the list?
public class Filter {
public static void main(String[] args) {
wordsFilter("This sentence contains a forbidden word");
}
private static void wordsFilter(String sentence) {
List<String> forbiddenWords = new ArrayList<>();
forbiddenWords.add("forbiddenWord");
forbiddenWords.add("forbidden word");
for (String word : forbiddenWords) {
if (sentence.contains(word)) {
System.out.println("The content cannot be displayed");
} else {
System.out.println(sentence);
}
}
}
}
Looks like you are missing a condition to exit the loop when a forbidden word was found:
private static void wordsFilter(String sentence) {
List<String> forbiddenWords = new ArrayList<>();
forbiddenWords.add("forbiddenWord");
forbiddenWords.add("forbidden word");
boolean doesContainAnyForbiddenWords = false;
for (String word : forbiddenWords) {
if (sentence.contains(word)) {
doesContainAnyForbiddenWords = true;
break; // leave the loop
}
}
if (doesContainAnyForbiddenWords) {
System.out.println("The content cannot be displayed");
} else {
System.out.println(sentence);
}
}
You can do this easily using the Streams API
Optional<String> potential_forbidden_word =
forbiddenWords.stream().filter(word -> sentence.contains(word)).findFirst();
if(potential_forbidden_word.isPresent())
System.out.println("don't use: "+potential_forbidden_word.get());
else
System.out.println("the sentence is clean");
you can even shorten the stream:
Optional<String> potential_forbidden_word =
forbiddenWords.stream().filter(sentence::contains).findFirst();
As @Adriaan Koster mentioned, you can simply use the terminal operation anyMatch(Predicate):
boolean contains_forbidden_word =
forbiddenWords.stream().anyMatch(sentence::contains);
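You might also want a case-insensitive check, because "foo", "Foo", "FoO" and so on might also be forbidden. Note that equalsIgnoreCase() compares whole strings, so for a substring test you would lower-case both the sentence and the forbidden words before calling contains().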
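A case-insensitive version of the anyMatch check could look roughly like this. It is a sketch only: ForbiddenWordFilter and containsForbidden are made-up names, and lower-casing with Locale.ROOT is an assumption that works for plain ASCII text but may not be right for every language.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for a case-insensitive forbidden-word check.
class ForbiddenWordFilter {

    // Lower-cases both sides, then does a plain substring test.
    static boolean containsForbidden(String sentence, List<String> forbiddenWords) {
        String lowered = sentence.toLowerCase(Locale.ROOT);
        return forbiddenWords.stream()
                .map(word -> word.toLowerCase(Locale.ROOT))
                .anyMatch(lowered::contains);
    }

    public static void main(String[] args) {
        List<String> forbiddenWords = Arrays.asList("forbiddenWord", "forbidden word");
        System.out.println(containsForbidden("This sentence contains a FORBIDDEN Word", forbiddenWords));
        System.out.println(containsForbidden("This sentence is clean", forbiddenWords));
    }
}
```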

Strings matching/containing using regex android

I want to match 2 strings
e.g. I have pre-defined words like wheat, egg, flour etc...
I got the text from OCR like wh3at, agg, f1Our etc...
So wh3at should match wheat OR f1Our should match flour etc..
I have worked on OCR projects where we "normalized" extracted text. You can build regular expressions that match reasonably expected/observed output.
import java.util.regex.Pattern;
public class Regex {
public static void main(String[] args) {
String[] strings = {"wh3at", "f1Our", "f10ur", "agg"};
for (String s : strings)
System.out.println(String.format("%s -> %s", s, normalizeWord(s)));
}
public static String normalizeWord(String unnormalized) {
if (Pattern.compile("(?i)wh(e|3)at").matcher(unnormalized).matches()) {
return "wheat";
} else if (Pattern.compile("(?i)f(1|L)(O|0)ur").matcher(unnormalized).matches()) {
return "flour";
} else if (Pattern.compile("(?i)(a|e)gg").matcher(unnormalized).matches()) {
return "egg";
}
return unnormalized;
}
}
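As the word list grows, the if/else chain can be replaced by a table of precompiled patterns. The sketch below is one way to do that under the same assumptions as the code above (each known word has a hand-written pattern covering the OCR confusions you have actually observed); OcrNormalizer and RULES are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical table-driven variant of normalizeWord.
class OcrNormalizer {

    // Insertion-ordered so rules are tried in the order they were added.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("wh[e3]at", Pattern.CASE_INSENSITIVE), "wheat");
        RULES.put(Pattern.compile("f[1l][o0]ur", Pattern.CASE_INSENSITIVE), "flour");
        RULES.put(Pattern.compile("[ae]gg", Pattern.CASE_INSENSITIVE), "egg");
    }

    // Returns the canonical word for the first rule that matches the whole
    // input, or the input unchanged if no rule applies.
    static String normalizeWord(String raw) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(raw).matches()) {
                return rule.getValue();
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"wh3at", "f1Our", "f10ur", "agg", "rice"}) {
            System.out.println(s + " -> " + normalizeWord(s));
        }
    }
}
```

Compiling the patterns once avoids recompiling them on every call, which the original normalizeWord does.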

Regex filename with exactly 2 underscores

I need to match if filenames have exactly 2 underscores and extension 'txt'.
For example:
asdf_assss_eee.txt -> true
asdf_assss_eee_txt -> false
asdf_assss_.txt -> false
private static final String FILENAME_PATTERN = "/^[A-Za-z0-9]+_[A-Za-z0-9]+_[A- Za-z0-9]\\.txt";
This does not work.
You just need to add + after the third char class and you must remove the first forward slash.
private static final String FILENAME_PATTERN = "^[A-Za-z0-9]+_[A-Za-z0-9]+_[A-Za-z0-9]+\\.txt$";
You can use a regex like this with the case-insensitive flag:
[a-z\d]+_[a-z\d]+_[a-z\d]+\.txt
Or with the inline case-insensitive flag:
(?i)[a-z\d]+_[a-z\d]+_[a-z\d]+\.txt
In case you want to shorten it a little, you could do:
([a-z\d]+_){2}[a-z\d]+\.txt
Update
So let's assume you want at least one character after the second underscore, before the file extension.
Regex is still not "needed" for this. You could split the String by the underscore and you should have 3 elements from the split. If the 3rd element is just ".txt" then it's not valid.
Example:
public static void main(String[] args) throws Exception {
String[] data = new String[] {
"asdf_assss_eee.txt",
"asdf_assss_eee_txt",
"asdf_assss_.txt"
};
for (String d : data) {
System.out.println(validate(d));
}
}
public static boolean validate(String str) {
if (!str.endsWith(".txt")) {
return false;
}
String[] pieces = str.split("_");
return pieces.length == 3 && !pieces[2].equalsIgnoreCase(".txt");
}
Results:
true
false
false
Old Answer
Not sure I understand why your third example is false, but this is something that can easily be done without regex.
Start with checking to see if the String ends with ".txt", then check if it contains only two underscores.
Example:
public static void main(String[] args) throws Exception {
String[] data = new String[] {
"asdf_assss_eee.txt",
"asdf_assss_eee_txt",
"asdf_assss_.txt"
};
for (String d : data) {
System.out.println(validate(d));
}
}
public static boolean validate(String str) {
if (!str.endsWith(".txt")) {
return false;
}
return str.chars().filter(c -> c == '_').count() == 2;
}
Results:
true
false
true
Use this Pattern:
Pattern p = Pattern.compile("_[^_]+_[^_]+\\.txt");
and use .find() instead of .matches() in the Matcher:
Matcher m = p.matcher(filename);
if (m.find()) {
// found
}
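One caveat with find(): it accepts a match anywhere in the name, so a file such as a_b_c_d.txt (three underscores) would also pass. If exactly two underscores are required, a sketch along these lines may be closer to what was asked. FilenameCheck and isValid are made-up names, and using [^_] for the segments is an assumption that allows any character other than an underscore.

```java
import java.util.regex.Pattern;

// Hypothetical validator: three non-empty segments separated by exactly
// two underscores, followed by ".txt".
class FilenameCheck {

    private static final Pattern TWO_UNDERSCORES =
            Pattern.compile("[^_]+_[^_]+_[^_]+\\.txt");

    static boolean isValid(String filename) {
        // matches() requires the whole filename to fit the pattern.
        return TWO_UNDERSCORES.matcher(filename).matches();
    }

    public static void main(String[] args) {
        String[] data = {"asdf_assss_eee.txt", "asdf_assss_eee_txt", "asdf_assss_.txt", "a_b_c_d.txt"};
        for (String d : data) {
            System.out.println(d + " -> " + isValid(d));
        }
    }
}
```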

HashSet contains

I have a set of keywords and a string that contains keyword instances separated by '/'. E.g. 'Food' or 'Car' are keywords, and '/food/oatmeal/fruits' and '/tyre/car/wheel' are strings. The total number of keywords is 5500. I need to flag a string as 'eligible' if it contains at least one of the 5500 keywords. One way is to load all 5500 keywords into a HashSet, split the string into tokens, and check whether the HashSet contains each token. If I find a match, I flag that string as 'eligible'.
Performance-wise, can there be a better solution?
A simplified solution for token matching could be
import java.util.HashSet;
import java.util.StringTokenizer;
public class REPL {
private static final HashSet<String> keyWords = new HashSet<>();
public static void main(String[] args) {
keyWords.add("food");
keyWords.add("car");
String[] strings = {
"/food/oatmeal/fruits",
"/tyre/car/wheel",
"/steel/nuts/bolts",
"/cart/handle/grill"
};
for (String s : strings) {
System.out.printf("string: %-20s ", s);
if (isEligible(s)) {
System.out.println("eligible: true");
} else {
System.out.println("eligible: false");
}
}
}
private static boolean isEligible(String s) {
StringTokenizer st = new StringTokenizer(s, "/");
while (st.hasMoreTokens()) {
if (keyWords.contains(st.nextToken())) {
return true;
}
}
return false;
}
}
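The question writes the keywords as 'Food' and 'Car' while the paths use lower case, so a lookup that ignores case may be needed. The following sketch assumes that is the intent; EligibilityCheck is a hypothetical name, and Locale.ROOT lower-casing is an assumption suited to plain ASCII keywords. Each token lookup in the HashSet is constant time on average, so the cost per string grows with the number of tokens, not with the 5500 keywords.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical case-insensitive variant of the token lookup.
class EligibilityCheck {

    private final Set<String> keywords = new HashSet<>();

    // Stores every keyword in lower case once, up front.
    EligibilityCheck(Iterable<String> rawKeywords) {
        for (String keyword : rawKeywords) {
            keywords.add(keyword.toLowerCase(Locale.ROOT));
        }
    }

    // Splits on '/' and returns on the first token found in the set.
    boolean isEligible(String path) {
        for (String token : path.split("/")) {
            if (keywords.contains(token.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        EligibilityCheck check = new EligibilityCheck(Arrays.asList("Food", "Car"));
        System.out.println(check.isEligible("/food/oatmeal/fruits"));
        System.out.println(check.isEligible("/cart/handle/grill"));
    }
}
```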

Find word in random string

Say I have a string that may look like:
"RAHDTWUOPO"
I know the word I'm looking for, for example:
"WORD"
what would be the best method for finding if I can make up "WORD" with a string like "RAHDTWUOPO"
EDIT:
Because this question was unclear, I thought I'd add more detail. What I wanted to achieve was to find out whether a word I knew beforehand could be made up from a random string of letters. I wasn't sure how to go about this, whether with a loop or some other method.
I had come up with something quickly in my head, but I knew it was too much effort. I'll put it here to make clearer what I wanted to achieve.
public class MyLetterObject {
private String letter;
private Boolean used;
public String getText() {
return letter;
}
public void setLetter(String letter) {
this.letter = letter;
}
public Boolean getUsed() {
return used;
}
public void setUsed(Boolean used) {
this.used = used;
}
}
boolean ContainsWord(String Word, String RandomLetterString) {
List<MyLetterObject> MyLetterList = new ArrayList<MyLetterObject>();
for (char ch : RandomLetterString.toCharArray()) {
MyLetterObject mlo = new MyLetterObject();
mlo.setLetter(String.valueOf(ch));
mlo.setUsed(false);
MyLetterList.add(mlo);
}
String sMatch = "";
for (char Wordch : Word.toCharArray()) {
for (MyLetterObject o : MyLetterList) {
if (o.getUsed() == false
&& String.valueOf(Wordch).equals(o.getText())) {
o.setUsed(true);
sMatch = sMatch + String.valueOf(Wordch);
break;
}
}
}
if (sMatch.equals(Word)) {
return true;
} else {
return false;
}
}
As you can see, that is too much effort. Evgeniy Dorofeev's answer is much better for the purpose of just finding out whether a word can be made from a string of letters in random order.
try
boolean containsWord(String s, String w) {
List<Character> list = new LinkedList<Character>();
for (char c : s.toCharArray()) {
list.add(c);
}
for (Character c : w.toCharArray()) {
if (!list.remove(c)) {
return false;
}
}
return true;
}
You search for every letter of the word, one by one, in the first String.
String randomString = "RAHDTWUOPO";
String word = "WORD";
boolean allFound = true;
for (int i = 0; i < word.length(); i++) {
if (randomString.indexOf(word.charAt(i)) < 0) {
allFound = false; // this letter is missing
break;
}
}
If allFound is still true after the loop, every letter was found; otherwise the word is not included in randomString. Note that this check does not account for letters that occur more than once in the word.
You need to check that every letter of your word "WORD" exists in the input string at least once.
A simple loop will do it, but its performance will not be the best.
You can use the Guava library's Multiset:
http://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
Multiset<String> wordsMultiset = HashMultiset.create();
wordsMultiset.addAll(words);
// now we can use wordsMultiset.count(String) to find the count of a word
This example is about words; adapt it to the chars of your input string.
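Adapted to characters, the counting idea might look like the sketch below. It uses a plain HashMap from the standard library as a stand-in for Guava's Multiset, so no extra dependency is needed; LetterCounter and canBuild are illustrative names. Unlike a simple "is each letter present" check, it also handles words that need the same letter more than once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts letters with a HashMap, used here in place
// of a Guava Multiset.
class LetterCounter {

    // True if every letter of the word can be taken from the available
    // letters, using each available letter at most once.
    static boolean canBuild(String word, String letters) {
        Map<Character, Integer> available = new HashMap<>();
        for (char c : letters.toCharArray()) {
            available.merge(c, 1, Integer::sum); // count occurrences
        }
        for (char c : word.toCharArray()) {
            int remaining = available.getOrDefault(c, 0);
            if (remaining == 0) {
                return false; // letter missing or already used up
            }
            available.put(c, remaining - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canBuild("WORD", "RAHDTWUOPO"));
        System.out.println(canBuild("WORDD", "RAHDTWUOPO"));
    }
}
```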
