Java check one string in other string - java

I am receiving meta information in a radio player via ICY.
Here is a short example of how this can look:
die neue welle - Der beste Musikmix aus 4 Jahrzehnten! - WELSHLY ARMS - SANCTUARY - Der Mehr Musik-Arbeitstag mit Benni Rettich
Another example for the meta information stream would be:
SWR1 Baden Württemberg
or
Welshly Arms - Sanctuary
Now I need to extract the title from there, the problem is that this 'meta-information' string can have any format.
What I know:
-I know the complete meta information string, as shown in the first example above
-I know the station name, which is delivered by another ICY property
The first approach was to check if the string contains the station name (I thought if not, it has to be the title):
private boolean icyInfoContainsTitleInfo() {
String title = id3Values.get("StreamTitle"); //this is the title string
String icy = id3Values.get("icy-name"); //this is the station name
String[] titleSplit = title.split("\\s");
String[] icySplit = icy.split("\\s");
for (String a : titleSplit) {
for (String b : icySplit) {
// a word of the station name contains a word of the title string
if (b.toLowerCase().contains(a.toLowerCase())) {
return false;
}
}
}
return true;
}
But that does not help me if title and station are both present in the title string.
Is there a pattern that matches a string followed by a slash, backslash or a hyphen followed by another string?
Has anyone encountered a similar problem?

Since you don't have a specification and each station can send a different format, I would not try to find a "perfect" pattern. Instead, simply create a map that stores one regex per station, describing that station's format, and use it to recover the title.
First, create a map
Map<String, String> stationPatterns = new HashMap<>();
Then, insert the patterns you know
stationPatterns.put("station1", "(.*)");
stationPatterns.put("station2", "station2 - (.*)");
...
Then, you just need to look up the pattern for a station (every pattern must ALWAYS contain exactly one capture group).
public String getPattern(String station){
return stationPatterns.getOrDefault(station, "(.*)"); // Fall back to a pattern that captures everything
}
With this, you can look up a pattern and use it to extract the title from a String.
Pattern pattern = Pattern.compile(getPattern(stationSelected));
Matcher matcher = pattern.matcher(title);
if (matcher.find()) {
System.out.println("Title : " + matcher.group(1));
} else {
System.err.println("The title doesn't match the format");
}
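The pieces above can be put together as a minimal sketch. It assumes a hypothetical station called "station2" that sends "<station> <separator> <title>", where the separator is a hyphen, slash or backslash; the class name StationTitles and the method extractTitle are made up for this illustration. The patterns are precompiled once instead of on every lookup, and the station name is passed through Pattern.quote so that special characters in it are treated literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StationTitles {
    // Fallback for stations without a known format: capture the whole string.
    private static final Pattern DEFAULT = Pattern.compile("(.*)");

    // Station name -> precompiled pattern with exactly one capture group.
    private static final Map<String, Pattern> PATTERNS = new HashMap<>();
    static {
        // Hypothetical format: station name, then '-', '/' or '\', then the title.
        PATTERNS.put("station2",
                Pattern.compile("^" + Pattern.quote("station2") + "\\s*[-/\\\\]\\s*(.*)$"));
    }

    /** Returns the captured title, or null if the string does not fit the station's format. */
    static String extractTitle(String station, String streamTitle) {
        Matcher m = PATTERNS.getOrDefault(station, DEFAULT).matcher(streamTitle);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Known station: the station prefix is stripped.
        System.out.println(extractTitle("station2", "station2 - Welshly Arms - Sanctuary"));
        // Unknown station: the default pattern returns the string unchanged.
        System.out.println(extractTitle("unknown", "Welshly Arms - Sanctuary"));
    }
}
```

Both calls in main print "Welshly Arms - Sanctuary". Returning null for a mismatch is only one possible choice; falling back to the full string would work as well.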

Related

Two separate patterns and matchers (java)

I'm working on a simple bot for Discord. The first pattern (reading) works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}
You have to escape the [ character as \\[ (once for the Java String and once for the regex). You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end.
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
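To see what the corrected pattern captures, here is a small standalone sketch. The JSON string is a made-up sample in the shape the two patterns expect, not a real API response, and the class and method names are only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JishoRegexDemo {
    static final Pattern READING = Pattern.compile("\"reading\":\"(.*?)\"");
    // "\\[" in the Java source becomes \[ in the regex, i.e. a literal bracket.
    static final Pattern DEFINITIONS =
            Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    /** Returns group 1 of the first match, or null if the pattern is not found. */
    static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample data (assumption), shaped like the text the patterns target.
        String json = "{\"reading\":\"neko\",\"english_definitions\":[\"cat\",\"feline\"]}";
        System.out.println("Reading: " + firstGroup(READING, json));
        // Group 1 is everything between the first and last quote inside the brackets,
        // so the surrounding quotes are added back for display.
        System.out.println("Definition: \"" + firstGroup(DEFINITIONS, json) + "\"");
    }
}
```

With this sample, the definitions group contains cat","feline, which is why the quotes are re-added around it. For anything beyond a quick experiment, a JSON parser is usually more robust than regex matching on raw JSON text.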

Java Comparing two strings with placeholder values

I am working on a command based feature for a project in Java and am having trouble when introducing arguments to these commands.
For example, all the commands are stored like this:
"Hey tell [USER] to [ACTION]"
Now when the user submits their command it will look like this:
"Hey tell Player to come see me"
Now I need to know how I can compare the user's input to the stored command containing placeholder values. I need to be able to compare the two strings, recognise that they are the same command, and then extract the data for [USER] and [ACTION] and return them as an array:
array[0] = "Player"
array[1] = "come see me"
Really hope somebody can help me out, thanks
You can use Pattern Matching as below:
String command = "Hey tell [USER] to [ACTION]";
String input = "Hey tell Player to come see me";
String[] userInputArray = new String[2];
String patternTemplate = command.replace("[USER]", "(.*)");
patternTemplate = patternTemplate.replace("[ACTION]", "(.*)");
Pattern pattern = Pattern.compile(patternTemplate);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
userInputArray[0] = matcher.group(1);
userInputArray[1] = matcher.group(2);
}
If you do not need a stored String like "Hey tell [USER] to [ACTION]", you can use the Java (java.util.regex) Pattern and Matcher classes directly.
This is an example:
Pattern p = Pattern.compile("Hey tell ([a-zA-Z]+) to (.+)");
List<Pattern> listOfCommandPattern = new ArrayList<>();
listOfCommandPattern.add(p);
Example, parse the command:
String user;
String command;
Matcher m;
// for every command
for(Pattern p : listOfCommandPattern){
m = p.matcher(inputCommand);
if (m.matches()) {
user = m.group(1);
command = m.group(2);
break; // found user and command
}
}
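A variation on the list-of-patterns idea, as a sketch: named groups make it possible to read "user" and "action" by name, so each command pattern can place them wherever it likes. The second command and the class name CommandParser are hypothetical and only there to show the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommandParser {
    // Every command pattern names its groups "user" and "action".
    private static final List<Pattern> COMMANDS = new ArrayList<>();
    static {
        COMMANDS.add(Pattern.compile("Hey tell (?<user>[a-zA-Z]+) to (?<action>.+)"));
        // Hypothetical second command (assumption, not from the question).
        COMMANDS.add(Pattern.compile("Ask (?<user>[a-zA-Z]+) whether (?<action>.+)"));
    }

    /** Returns {user, action} for the first command that matches, or null if none does. */
    static String[] parse(String input) {
        for (Pattern p : COMMANDS) {
            Matcher m = p.matcher(input);
            if (m.matches()) {
                return new String[] { m.group("user"), m.group("action") };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] parsed = parse("Hey tell Player to come see me");
        System.out.println(parsed[0]); // Player
        System.out.println(parsed[1]); // come see me
    }
}
```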
Here is a slightly more general version:
String pattern = "Hey tell [USER] to [ACTION]";
String line = "Hey tell Player to come see me";
/* a regular expression matching bracket expressions */
java.util.regex.Pattern bracket_regexp = Pattern.compile("\\[[^]]*\\]");
/* how many bracket expressions are in "pattern"? */
int count = bracket_regexp.split(" " + pattern + " ").length - 1;
/* allocate a result array big enough */
String[] result = new String[count];
/* convert "pattern" into a regular expression */
String regex_pattern = bracket_regexp.matcher(pattern).replaceAll("(.*)");
java.util.regex.Pattern line_regex = Pattern.compile(regex_pattern);
/* match "line" (keep the Matcher so the groups can be read afterwards) */
java.util.regex.Matcher line_matcher = line_regex.matcher(line);
if (line_matcher.matches()) {
/* extract the matched strings */
for (int i=0; i<count; ++i) {
result[i] = line_matcher.group(i+1);
System.out.println(result[i]);
}
} else {
System.out.println("Doesn't match.");
}
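One caveat with turning a stored command into a regex by plain replacement: any regex metacharacter in the literal text (a "?" or "." for example) changes the meaning of the pattern. The following sketch, with the hypothetical names CommandTemplate and extract, quotes the literal parts with Pattern.quote and inserts a lazy group for each [PLACEHOLDER]. Using a lazy (.+?) rather than (.*) is a design choice of this sketch, not something the answers above prescribe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommandTemplate {
    // Matches one bracketed placeholder such as [USER].
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[[^\\]]*\\]");

    /** Returns the placeholder values in order, or null if the input does not fit the template. */
    static List<String> extract(String template, String input) {
        StringBuilder regex = new StringBuilder();
        Matcher ph = PLACEHOLDER.matcher(template);
        int last = 0;
        int groups = 0;
        while (ph.find()) {
            // Literal text between placeholders is quoted so it is matched verbatim.
            regex.append(Pattern.quote(template.substring(last, ph.start())));
            regex.append("(.+?)");
            last = ph.end();
            groups++;
        }
        regex.append(Pattern.quote(template.substring(last)));

        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        if (!m.matches()) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (int i = 1; i <= groups; i++) {
            values.add(m.group(i));
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [Player, come see me]
        System.out.println(extract("Hey tell [USER] to [ACTION]", "Hey tell Player to come see me"));
    }
}
```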

Using regex and android for categorizing different fields

I am currently trying to build a business name card scanner app. The idea is to take a picture of a name card, extract the text, and sort the text into different EditText fields.
I have already completed the OCR part, which extracts all the text from a name card image.
What I am missing now is a regex method that takes the entire text extracted by OCR and sorts the name, email address and phone number into their respective EditText fields.
Through some googling I have already found the regex patterns below:
private static final String EMAIL_PATTERN =
"[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" +
"\\@" +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" +
"(" +
"\\." +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" +
")+";
private static final String PHONE_PATTERN =
"^[89]\\d{7}$";
private static final String NAME_PATTERN =
"/^[a-z ,.'-]+$/i";
Currently I am only able to extract the email address, using the method below:
public String EmailValidator(String email) {
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
return email.substring(matcher.start(), matcher.end());
} else {
// TODO handle condition when input doesn't have an email address
}
return email;
}
I am unsure how to edit the method above so that it uses all 3 regex patterns at once and displays the results in different EditText fields (name, email address, phone number).
--------------------------------------------EDIT-------------------------------------------------
After using @Styx's answer,
I have a problem with the parameter, namely with how I used to pass the text "textToUse" to the method, as shown below:
I have also tried passing the text into all three parameters. But since the method is void, it cannot be done. Or if I change the method to a String instead of void, it would require a return value.
Try this code. The function takes the recognized text and splits it on line breaks. It then loops over the lines and determines the type of content by running a pattern check. Whenever a pattern matches, the loop moves on to the next iteration using the continue keyword. This code can also handle the situation where more than one email address or phone number appears on a single business card. Hope it helps. Cheers!
public void validator(String recognizeText) {
Pattern emailPattern = Pattern.compile(EMAIL_PATTERN);
Pattern phonePattern = Pattern.compile(PHONE_PATTERN);
Pattern namePattern = Pattern.compile(NAME_PATTERN);
String possibleEmail, possiblePhone, possibleName;
possibleEmail = possiblePhone = possibleName = "";
Matcher matcher;
String[] words = recognizeText.split("\\r?\\n");
for (String word : words) {
//try to determine whether the line is an email address by running a pattern check.
matcher = emailPattern.matcher(word);
if (matcher.find()) {
possibleEmail = possibleEmail + word + " ";
continue;
}
//try to determine whether the line is a phone number by running a pattern check.
matcher = phonePattern.matcher(word);
if (matcher.find()) {
possiblePhone = possiblePhone + word + " ";
continue;
}
//try to determine whether the line is a name by running a pattern check.
matcher = namePattern.matcher(word);
if (matcher.find()) {
possibleName = possibleName + word + " ";
continue;
}
}
//after the loop then only set possibleEmail, possiblePhone, and possibleName into
//their respective EditText here.
}
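A self-contained variant of the same loop, as a sketch under a few assumptions: email addresses use the usual @ separator, phone numbers have the 8-digit form from the question, and the results are returned in a map instead of being written to EditText fields. Note that Java has no /.../i regex literal, so the name pattern is compiled with Pattern.CASE_INSENSITIVE instead. The class name CardFields and the simplified email pattern are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CardFields {
    // Simplified email pattern (assumption), not a full address validator.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9+._%-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)+");
    private static final Pattern PHONE = Pattern.compile("^[89]\\d{7}$");
    private static final Pattern NAME =
            Pattern.compile("^[a-z ,.'-]+$", Pattern.CASE_INSENSITIVE);

    /** Sorts each line of the OCR text into "email", "phone" or "name". */
    static Map<String, List<String>> classify(String text) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("email", new ArrayList<>());
        fields.put("phone", new ArrayList<>());
        fields.put("name", new ArrayList<>());
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            Matcher email = EMAIL.matcher(line);
            if (email.find()) {
                fields.get("email").add(email.group());
            } else if (PHONE.matcher(line).matches()) {
                fields.get("phone").add(line);
            } else if (NAME.matcher(line).matches()) {
                fields.get("name").add(line);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(classify("Ben Conan\nGeneral Manager\n90010021\nbenconan@gmail.com"));
    }
}
```

The name pattern accepts any line made of letters, spaces and a few punctuation marks, so a job title such as "General Manager" also ends up under "name". Telling the two apart needs an extra heuristic, for example taking only the first such line.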

How to get multi sub strings from String, Android/Java

I know there are similar questions about this. However, I tried many solutions and none of them worked for me.
I need help to extract multiple substrings from a string:
String content = "Ben Conan General Manager 90010021 benconan@gmail.com";
Note: The content of the String may not always be in this format; it may be all jumbled up.
I want to extract the phone number and email like below:
1. 90010021
2. benconan@gmail.com
In my project, I am trying to get this result and then display it in 2 different EditText fields.
I have tried using the Pattern and Matcher classes but it did not work.
I can provide my code here if requested, please help me ~
--------------------EDIT---------------------
Below is my current method, which only extracts the email address:
private static final String EMAIL_PATTERN =
"[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" +
"\\@" +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" +
"(" +
"\\." +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" +
")+";
public String EmailValidator(String email) {
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
return email.substring(matcher.start(), matcher.end());
} else {
// TODO handle condition when input doesn't have an email address
}
return email;
}
You can split your string into a list like this:
String str = "Ben Conan, General Manager, 90010021, benconan@gmail.com";
List<String> parts = Arrays.asList(str.split(", "));
Maybe you should do this instead:
String[] stringNames = new String[2];
stringNames[0] = "your phonenumber";
stringNames[1] = "your email";
System.out.println(Arrays.toString(stringNames));
Or:
String[] stringNames = {"your phonenumber", "your email"};
System.out.println(stringNames[1]);
String.split(...) is the Java method for that.
EXAMPLE:
String content = "Ben Conan, General Manager, 90010021, benconan@gmail.com";
// split on the comma and any whitespace after it, so the parts have no leading spaces
String[] selection = content.split(",\\s*");
System.out.println(selection[0]);
System.out.println(selection[3]);
BUT if you want to use a regex, then take a look at this:
https://stackoverflow.com/a/16053961/982161
Try this regex for the phone number:
\d{8} ---> 8 is the number of digits in the phone number (written as "\\d{8}" in a Java string literal)
You can use
\d{8,} ---> if you want to allow 8 or more digits
Use the appropriate Java classes (Pattern and Matcher) for matching. You can try the results here
http://regexr.com/
For email, it depends whether the format is simple or complicated. There is a good explanation here
http://www.regular-expressions.info/index.html
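Since the content may be jumbled, searching the whole string with Matcher.find() avoids depending on the position of each field. A minimal sketch, assuming an 8-digit phone number and a simplified email pattern with the usual @ separator; the class name ContactExtractor and both patterns are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContactExtractor {
    // Exactly 8 digits, not embedded in a longer run of word characters.
    static final Pattern PHONE = Pattern.compile("\\b\\d{8}\\b");
    // Simplified email pattern (assumption), not a full address validator.
    static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9+._%-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)+");

    /** Returns the first substring matching the pattern, or null if there is none. */
    static String first(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String content = "Ben Conan General Manager 90010021 benconan@gmail.com";
        System.out.println(first(PHONE, content)); // 90010021
        System.out.println(first(EMAIL, content)); // benconan@gmail.com
    }
}
```

Each result can then be set on its own EditText, with a null check for the case where nothing was found.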

Regex: capture group in list like string

I've searched Stack Overflow and the net and found similar questions, but none that gave me a concrete answer. I have a string that acts as a list with the following formatting:
Key(Value)/Key(value)/Key(value,value). I would like to match entries by key name IF the key exists, and I don't want the parentheses included anywhere, just the key and the value. I coded something up, but it's a real mess...
so my conditions are:
1) extract key/value pairs without the parentheses
2) extract them only IF they are available
3) if the value portion contains two values delimited by a ",", extract them individually
textToParse = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)/SN(0000:0000:0000:0000:0000:0000:0000:0000/IP(000.1.000.1)/Blue(2x4,2x4)"
String patternText = "^TdkRoot\(( [A-Za-z0-9]) Tdk\(( \\w}+) VAL\(( \\w) SN\(( \\w) IP\ (( \\w) Blue\(( \\w)"
Pattern pattern = Pattern.compile( patternText );
Matcher matcher = pattern.matcher(textToParse);
//Extract the groups from the regex (e.g. elements in braces)
String messageId = matcher.group( 1 );
String submitDate = matcher.group(4);
String statusText = matcher.group( 6 );
I think a cleaner/easier approach would be to extract the elements using patterns for each individual key/value. If so, what pattern could I use to tell regex: for "key" grab "value" but leave out the parentheses... and if the value is delimited by a comma, return an array, possibly?
Thanks Community!! Hope to hear from you!
PS I know (?<=\()(.*?)(?=\)) will capture anything in the parentheses, e.g. "(This) value was captured", but how can I modify that to specify a key before the parentheses? "I want to capture what's in THIS(parentheses)" ... key THIS
possibly delimited by a comma
public static void main(String[] args) {
String textToParse = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
Pattern p = Pattern.compile("(\\w+)\\((.*?)\\)");
Matcher m = p.matcher(textToParse);
while (m.find()) {
System.out.println("key :" + m.group(1));
if (m.group(2).contains(",")) {
String[] s = m.group(2).split(",");
System.out.println("values : " + Arrays.toString(s));
} else {
System.out.println("value :" + m.group(2));
}
}
}
Output:
key :TdkRoot
value :0x0
key :Tdk
values : [0x2, 0x0]
key :Tdk
values : [0x0, 0x1]
key :VAL
values : [40A8F0B32240, 2x4]
key :SN
value :0000:0000:0000:0000:0000:0000:0000:0000
key :IP
value :000.1.000.1
key :Blue
values : [2x4, 2x4]
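For the part of the question about looking up one specific key, a sketch along the same lines: the key is quoted and placed directly in front of the opening parenthesis, so only the contents of that entry are captured. The class name KeyLookup and the method valuesFor are hypothetical; only the first entry with the given key is returned.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyLookup {
    /** Returns the comma-separated values of the first "key(...)" entry, or an empty list. */
    static List<String> valuesFor(String text, String key) {
        // \b stops "Tdk" from matching inside a longer word; the required "(" right
        // after the key stops it from matching the start of "TdkRoot(".
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\(([^()]*)\\)");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1).split(","));
    }

    public static void main(String[] args) {
        String text = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)"
                + "/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
        System.out.println(valuesFor(text, "VAL"));     // [40A8F0B32240, 2x4]
        System.out.println(valuesFor(text, "IP"));      // [000.1.000.1]
        System.out.println(valuesFor(text, "Missing")); // []
    }
}
```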
Not sure if this is what you are looking for (your sample code does not compile) but the following code parses the input text into a map :
String inputText = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
Pattern outerPattern = Pattern.compile("([^/()]+)\\(([^()]+)\\)");
Pattern innerPattern = Pattern.compile("([^,]+)");
Map<String, Collection<String>> parsedData = new HashMap<String, Collection<String>>();
Matcher outerMatcher = outerPattern.matcher(inputText);
while (outerMatcher.find()) {
String key = outerMatcher.group(1);
String val = outerMatcher.group(2);
Collection<String> valueCollection = new ArrayList<String>();
Matcher innerMatcher = innerPattern.matcher(val);
while (innerMatcher.find()) {
valueCollection.add(innerMatcher.group(1));
}
parsedData.put(key, valueCollection);
}
System.out.println(parsedData);
The resulting map (printed on last line) is
{Blue=[2x4, 2x4], VAL=[40A8F0B32240, 2x4], IP=[000.1.000.1], TdkRoot=[0x0], SN=[0000:0000:0000:0000:0000:0000:0000:0000], Tdk=[0x0, 0x1]}
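Because the input contains the key Tdk twice and HashMap.put replaces the previous value for a key, only the last Tdk entry survives in the map above. If every occurrence matters, one option is to keep a list of value lists per key; a LinkedHashMap additionally preserves the order in which the keys first appear. This is a sketch of that variation with the hypothetical class name KeyValueParser, not a correction of the answer's intent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueParser {
    // key(value[,value...]) where neither part contains parentheses.
    private static final Pattern ENTRY = Pattern.compile("([^/()]+)\\(([^()]+)\\)");

    /** Maps each key to the value lists of all its occurrences, in input order. */
    static Map<String, List<List<String>>> parse(String text) {
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            List<String> values = Arrays.asList(m.group(2).split(","));
            // Create the outer list on first sight of a key, then append this occurrence.
            result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(values);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "TdkRoot(0x0)/Tdk(0x2,0x0)/Tdk(0x0,0x1)/VAL(40A8F0B32240,2x4)"
                + "/SN(0000:0000:0000:0000:0000:0000:0000:0000)/IP(000.1.000.1)/Blue(2x4,2x4)";
        Map<String, List<List<String>>> parsed = parse(text);
        System.out.println(parsed.get("Tdk")); // [[0x2, 0x0], [0x0, 0x1]]
        System.out.println(parsed.keySet());   // [TdkRoot, Tdk, VAL, SN, IP, Blue]
    }
}
```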
