regex to select after and upto specific character at same time - java

Is there a way to select the text that follows a specific marker, stop as soon as that text has been matched, and ignore everything after it?
Here is an example:
ABCDEF
JHJHJNJN<098978686
<jjg>
HGHJFGV XXXX
10-10-2018
JHKGHKGHG
JKHJHHJM
10-10-2019 JGHHGHGVH
HBVJHBHBB
I just want to select the date 10-10-2018, which always comes after XXXX followed by some whitespace. I can't use a regex with the literal value (10-10-2018) because the date can change, and the same date pattern may also appear elsewhere in the content, as it does near the end of the example.
Please share your thoughts..!
Thanks

Assuming the example is correct, the following regex extracts just the date when used with find(), provided that DOTALL is set.
"XXXX.*?[\\s]+([\\d]{1,2}-[\\d]{1,2}-[\\d]{4})"
Basically, it searches for XXXX followed by spaces or a newline and then matches the date. The date is captured in group 1 and can be extracted from there.
You can see the operation at this location, though be sure to select "DOTALL".
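// Requires java.util.regex.Pattern and java.util.regex.Matcher.
public String getDate(String input)
{
    String date = "";
    // DOTALL lets '.' match line breaks, so .*? can span the newline after XXXX.
    Pattern dte = Pattern.compile("XXXX.*?[\\s]+([\\d]{1,2}-[\\d]{1,2}-[\\d]{4})", Pattern.DOTALL);
    Matcher m = dte.matcher(input);
    if (m.find() && m.groupCount() > 0) {
        date = m.group(1);
    }
    return date;
}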
Test case
@Test
public void testData() throws Exception
{
    RegEx_52879334 re = new RegEx_52879334();
    String input = re.getInputData();
    String date = re.getDate(input);
    assertEquals("10-10-2018", date);
    System.out.println("Found: " + date);
}
Output:
Found: 10-10-2018
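As a variation on the answer above, here is a minimal self-contained sketch. It assumes the date is separated from XXXX by whitespace only (spaces or line breaks); under that assumption `\s+` alone bridges the gap and DOTALL is not needed. It is stricter than the original pattern, which also tolerates other text between XXXX and the date. The class name `DateAfterMarker` and the helper `extractDate` are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateAfterMarker {
    // XXXX, then one or more whitespace characters (\s also matches line breaks),
    // then a d-m-yyyy style date captured in group 1.
    private static final Pattern DATE_AFTER_MARKER =
            Pattern.compile("XXXX\\s+(\\d{1,2}-\\d{1,2}-\\d{4})");

    // Hypothetical helper: returns the first date that directly follows XXXX.
    static Optional<String> extractDate(String input) {
        Matcher m = DATE_AFTER_MARKER.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "ABCDEF", "JHJHJNJN<098978686", "<jjg>", "HGHJFGV XXXX",
                "10-10-2018", "JHKGHKGHG", "JKHJHHJM",
                "10-10-2019 JGHHGHGVH", "HBVJHBHBB");
        System.out.println(extractDate(input).orElse("no date found"));
    }
}
```

The later date 10-10-2019 is ignored because it is not preceded by XXXX and whitespace.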

Related

Java check one string in other string

I am receiving meta information in a radio player via ICY.
Here is a short example of how this can look:
die neue welle - Der beste Musikmix aus 4 Jahrzehnten! - WELSHLY ARMS - SANCTUARY - Der Mehr Musik-Arbeitstag mit Benni Rettich
Another example for the meta information stream would be:
SWR1 Baden Württemberg
or
Welshly Arms - Sanctuary
Now I need to extract the title from there, the problem is that this 'meta-information' string can have any format.
What I know:
-I know the complete meta information string, as shown in the first code section
-I know the station name, which is delivered by another ICY property
The first approach was to check if the string contains the station name (I thought if not, it has to be the title):
private boolean icyInfoContainsTitleInfo() {
String title = id3Values.get("StreamTitle"); //this is the title string
String icy = id3Values.get("icy-name"); //this is the station name
String[] titleSplit = title.split("\\s");
String[] icySplit = icy.split("\\s");
for (String a : titleSplit) {
StringBuilder abuilder = new StringBuilder();
abuilder.append(a);
for (String b : icySplit) {
StringBuilder builder = new StringBuilder();
builder.append(b);
if (builder.toString().toLowerCase().contains(abuilder.toString().toLowerCase())) {
return false;
}
}
}
return true;
}
But that does not help me if title and station are both present in the title string.
Is there a pattern that matches a string followed by a slash, backslash or a hyphen followed by another string?
Has anyone encountered a similar problem?
Since you don't have a specification and each station can send a different format, I would not try to find a "perfect" pattern. Instead, simply create a mapping that stores, for each station, the regex used to recover the title.
First, create a map:
Map<String, String> stationPatterns = new HashMap<>();
Then, insert the patterns you know:
stationPatterns.put("station1", "(.*)");
stationPatterns.put("station2", "station2 - (.*)");
...
Then, you just need to look up the pattern for a station (every pattern must ALWAYS contain exactly one capture group).
public String getPattern(String station){
    return stationPatterns.getOrDefault(station, "(.*)"); // Fall back to a pattern that captures everything
}
With this, you just need to compile the pattern and use it to extract the title from a String.
Pattern pattern = Pattern.compile(getPattern(stationSelected));
Matcher matcher = pattern.matcher(title);
if (matcher.find()) {
    System.out.println("Title : " + matcher.group(1));
} else {
    System.err.println("The title doesn't match the format");
}
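Put together, the per-station mapping might look like the sketch below. The class `StationTitleExtractor`, its `register` and `extractTitle` methods, and the regex registered for "die neue welle" are all assumptions for illustration; the regex is only a guess at that station's layout based on the single sample string in the question. Unlike the snippets above, this version stores precompiled `Pattern` objects so that each regex is compiled once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StationTitleExtractor {
    // Fallback: treat the whole stream title as the song title.
    private static final Pattern DEFAULT_PATTERN = Pattern.compile("(.*)");

    // Station name -> pattern with exactly one capture group for the title.
    private final Map<String, Pattern> stationPatterns = new HashMap<>();

    void register(String station, String regex) {
        stationPatterns.put(station, Pattern.compile(regex));
    }

    Optional<String> extractTitle(String station, String streamTitle) {
        Pattern pattern = stationPatterns.getOrDefault(station, DEFAULT_PATTERN);
        Matcher matcher = pattern.matcher(streamTitle);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        StationTitleExtractor extractor = new StationTitleExtractor();
        // Assumed layout: "<station> - <slogan> - <artist> - <song> - <show>".
        extractor.register("die neue welle", "^die neue welle - .*? - (.*) - .*$");

        String streamTitle = "die neue welle - Der beste Musikmix aus 4 Jahrzehnten! - "
                + "WELSHLY ARMS - SANCTUARY - Der Mehr Musik-Arbeitstag mit Benni Rettich";
        System.out.println(extractor.extractTitle("die neue welle", streamTitle).orElse("?"));

        // A station without a registered pattern falls back to the whole string.
        System.out.println(extractor.extractTitle("unknown station", "Welshly Arms - Sanctuary").orElse("?"));
    }
}
```

The separator is matched as " - " with surrounding spaces, so the hyphen inside "Musik-Arbeitstag" does not split the show name.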

Two separate patterns and matchers (java)

I'm working on a simple bot for discord and the first pattern reading works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}
You have to escape the [ character as \\[ (the regex needs \[, and the backslash itself must be doubled in a Java string literal). You also forgot the closing \" before the ].
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end.
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
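To see what the corrected pattern captures, here is a small standalone sketch. The sample string is a hypothetical, heavily shortened fragment and not real output from the API; the class `DefinitionPatternDemo` and the helper `firstDefinitions` are likewise made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefinitionPatternDemo {
    // "\\[" in Java source becomes \[ in the regex, i.e. a literal '['.
    private static final Pattern DEFINITIONS =
            Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    static Optional<String> firstDefinitions(String json) {
        Matcher m = DEFINITIONS.matcher(json);
        // The outer quotes sit outside the capture group, so they are re-added here.
        return m.find() ? Optional.of("\"" + m.group(1) + "\"") : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not actual API output.
        String sample = "{\"reading\":\"ie\",\"english_definitions\":[\"house\",\"home\"]}";
        System.out.println(firstDefinitions(sample).orElse("Word not found"));
    }
}
```

For this sample the group contains house","home, because the lazy `(.*?)` keeps expanding until it reaches the first `"]`. That is why the quotes need to be re-added only at the very start and end.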

Using regex and android for categorizing different fields

I am currently trying to build a business name card scanner app. The idea is to take a picture of a name card, extract the text, and sort that text into different EditText fields.
I have already completed the OCR part which extract out all the text from a name card image.
What I am missing now is to make a regex method which can take this entire text extracted from OCR and categorize the name, email address, phone number into their respective fields in EditText.
Through some googling I have already found the regex formulas below:
private static final String EMAIL_PATTERN =
"[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" +
"\\@" +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" +
"(" +
"\\." +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" +
")+";
private static final String PHONE_PATTERN =
"^[89]\\d{7}$";
private static final String NAME_PATTERN =
"(?i)^[a-z ,.'-]+$";
Currently I am just able to extract out the email address using the below method:
public String EmailValidator(String email) {
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
return email.substring(matcher.start(), matcher.end());
} else {
// TODO handle condition when input doesn't have an email address
}
return email;
}
I am unsure of how to edit the ^above method^ to include using all the 3 regex patterns at once and display them to different EditText fields like (name, email address, phone number).
--------------------------------------------EDIT-------------------------------------------------
After using @Styx's answer,
it has a problem with the parameter whereby how I used to pass the text "textToUse" to the method as shown below:
I have also tried passing the text into all three parameters. But since the method is void, it cannot be done. Or if I change the method to a String instead of void, it would require a return value.
Try this code. The function takes the recognized text and splits it on line breaks. It then loops over the lines and determines the type of each one by running pattern checks. As soon as a pattern matches, the loop moves on to the next iteration using the continue keyword. This code can also handle a card on which more than one email address or phone number appears. Hope it helps. Cheers!
public void validator(String recognizeText) {
    Pattern emailPattern = Pattern.compile(EMAIL_PATTERN);
    Pattern phonePattern = Pattern.compile(PHONE_PATTERN);
    Pattern namePattern = Pattern.compile(NAME_PATTERN);
    String possibleEmail, possiblePhone, possibleName;
    possibleEmail = possiblePhone = possibleName = "";
    Matcher matcher;
    // Each element of words is one line of the recognized text.
    String[] words = recognizeText.split("\\r?\\n");
    for (String word : words) {
        // Check whether the line contains an email address.
        matcher = emailPattern.matcher(word);
        if (matcher.find()) {
            possibleEmail = possibleEmail + word + " ";
            continue;
        }
        // Check whether the line is a phone number.
        matcher = phonePattern.matcher(word);
        if (matcher.find()) {
            possiblePhone = possiblePhone + word + " ";
            continue;
        }
        // Check whether the line looks like a name.
        matcher = namePattern.matcher(word);
        if (matcher.find()) {
            possibleName = possibleName + word + " ";
            continue;
        }
    }
    // After the loop, set possibleEmail, possiblePhone, and possibleName into
    // their respective EditText fields here.
}
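A variant of the same line-by-line idea that returns the results instead of being void is sketched below, so the caller can decide which EditText receives which value. It is plain Java with no Android dependencies. The class `CardFieldSorter`, the method `categorize`, and the map keys are hypothetical names, and the name pattern uses Java's inline `(?i)` flag because Java has no `/.../i` literal syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class CardFieldSorter {
    private static final Pattern EMAIL = Pattern.compile(
            "[a-zA-Z0-9+._%\\-]{1,256}@[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}"
            + "(\\.[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25})+");
    // Eight digits starting with 8 or 9, as in the question's PHONE_PATTERN.
    private static final Pattern PHONE = Pattern.compile("^[89]\\d{7}$");
    // (?i) makes the match case-insensitive.
    private static final Pattern NAME = Pattern.compile("(?i)^[a-z ,.'\\-]+$");

    // Sorts each non-empty line into the first category whose pattern matches.
    static Map<String, List<String>> categorize(String recognizedText) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("email", new ArrayList<>());
        fields.put("phone", new ArrayList<>());
        fields.put("name", new ArrayList<>());

        for (String rawLine : recognizedText.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (EMAIL.matcher(line).find()) {
                fields.get("email").add(line);
            } else if (PHONE.matcher(line).find()) {
                fields.get("phone").add(line);
            } else if (NAME.matcher(line).find()) {
                fields.get("name").add(line);
            }
            // Lines matching none of the patterns are ignored.
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical OCR output for illustration.
        String text = "John Tan\njohn.tan@example.com\n91234567\n";
        System.out.println(categorize(text));
    }
}
```

On Android, the caller could then join each list and pass it to the matching field's `setText`. Real OCR output is usually noisier than this sample, so the patterns will likely need tuning.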

Compare Multiple Differences in Java Strings

I have a string template that looks something like this:
This position is reserved for <XXXXXXXXXXXXXXXXXXXXXXXXXXX>. Start date is <XXXXXXXX>
Filled out, this might look like this (fixed width is preserved):
This position is reserved for <JOHN SMITH >. Start date is <20150228>
How can I extract multiple differences in a single String? I don't want to use an entire templating engine for one task if I can avoid it.
You can try a regex like this:
public static void main(String[] args) {
    String s = "This position is reserved for <JOHN SMITH >. Start date is <20150228>";
    // Group 1 is the content of the first <...>, group 2 the content of the last <...>.
    Pattern p = Pattern.compile(".*?<(.*?)>.*<(.*?)>");
    Matcher m = p.matcher(s);
    while(m.find()){
        System.out.println("Name : " +m.group(1).trim());
        System.out.println("Date : " +m.group(2).trim());
    }
}
Output:
Name : JOHN SMITH
Date : 20150228
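If the number of placeholders can vary, a more general sketch is to match one `<...>` field at a time and collect the values in order of appearance. This assumes field values never contain `<` or `>` themselves; the class `TemplateFields` and the method `extractFields` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFields {
    // '<', then any run of characters other than '>' (captured), then '>'.
    private static final Pattern FIELD = Pattern.compile("<([^>]*)>");

    // Returns the trimmed content of every <...> field, in order of appearance.
    static List<String> extractFields(String filled) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELD.matcher(filled);
        while (m.find()) {
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String s = "This position is reserved for <JOHN SMITH        >. Start date is <20150228>";
        System.out.println(extractFields(s)); // prints [JOHN SMITH, 20150228]
    }
}
```

Trimming removes the padding that keeps each field at a fixed width.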
If the template might be modified you could use a format pattern.
String expected = "This position is reserved for <JOHN SMITH >. Start date is <20150228>";
System.out.println(expected);
// define the output format
String template = "This position is reserved for <%-27s>. Start date is <%s>";
String name = "JOHN SMITH";
String startDate = "20150228";
// output the values using the defined format
System.out.println(String.format(template, name, startDate));

regex matcher check in if logic not working

Hi, you can see my code below. I have the strings country, Rank and Grank in my code. Initially they are null, but if the regex matches, their values should change. Even when the regex matches, however, the values do not change and are always null. If I remove all the if statements and set the strings directly it works fine, but then an exception is thrown when no match is found. Please let me know how I can check this with if logic.
System.err.println(content);
Pattern c = Pattern.compile("NAME=\"(.*)\" RANK");
Pattern r = Pattern.compile("\" RANK=\"(.*)\"");
Pattern gr = Pattern.compile("\" TEXT=\"(.*)\" SOURCE");
Matcher co = c.matcher(content);
Matcher ra = r.matcher(content);
Matcher gra = gr.matcher(content);
co.find();
ra.find();
gra.find();
String country = null;
String Rank = null;
String Grank = null;
if (co.matches()) {
country = co.group(1);
}
if (ra.matches()) {
Rank = ra.group(1);
}
if (gra.matches()) {
Grank = gra.group(1);
}
In a Java string literal a single \ has to be escaped: use a double \\ and it should work.
Tried this?
while (co.find()) {
System.out.print("Start index: " + co.start());
System.out.print(" End index: " + co.end() + " ");
System.out.println(co.group());
}
Personally, I can't make your program work with or without the if statements, so it is not a logic problem; the pattern simply doesn't match the string for me.
So I changed it to get something working; maybe you can use it :)
String content = "NAME=\"salut\" RANK=\"pouet\" TEXT=\"text\" SOURCE";
System.out.println(content);
System.out.println(content.replaceAll(("NAME=\"(.*)\"\\sRANK=\"(.*)\"\\sTEXT=\"(.*)\" SOURCE"), "$1---$2---$3"));
Output
NAME="salut" RANK="pouet" TEXT="text" SOURCE
salut---pouet---text
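One likely reason the if checks in the question never succeed is the difference between `find()` and `matches()`. `Matcher.matches()` succeeds only when the pattern matches the entire input, whereas `find()` looks for a match anywhere in it. Calling `group(1)` before any successful match throws an `IllegalStateException`, which would explain the exception seen without the if statements. The sketch below guards each lookup with `find()` instead. The class `FindVersusMatches` and the helper `firstGroupOrNull` are hypothetical, the sample content is borrowed from the answer above, and the lazy `(.*?)` quantifier is a change from the question's greedy `(.*)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // Returns group 1 of the first match anywhere in content, or null if none.
    static String firstGroupOrNull(Pattern pattern, String content) {
        Matcher m = pattern.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "NAME=\"salut\" RANK=\"pouet\" TEXT=\"text\" SOURCE";

        // Lazy (.*?) stops at the first closing quote instead of the last one.
        Pattern c = Pattern.compile("NAME=\"(.*?)\" RANK");
        Pattern r = Pattern.compile("\" RANK=\"(.*?)\"");
        Pattern gr = Pattern.compile("\" TEXT=\"(.*?)\" SOURCE");

        String country = firstGroupOrNull(c, content);
        String rank = firstGroupOrNull(r, content);
        String grank = firstGroupOrNull(gr, content);
        System.out.println(country + " / " + rank + " / " + grank);

        // matches() requires the whole input to match, so this is false
        // even though find() succeeds for the same pattern.
        System.out.println(c.matcher(content).matches());
    }
}
```

Each variable stays null when its pattern is not found, which is the behaviour the question asks for.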
