Find words in string surrounded by "[" and "]": - java

I need help with a simple task in Java. I have the following sentence:
Where Are You [Employee Name]?
your have a [Shift] shift..
I need to extract the strings that are surrounded by [ and ] signs.
I was thinking of using the split method with " " as the parameter and then finding the single words, but that breaks when the phrase I'm looking for itself contains " ". Using indexOf might be an option as well, only I don't know how to tell that I have reached the end of the String.
What is the best way to perform this task?
Any help would be appreciated.

Try the regex \[(.*?)\] to match the words.
\[ : escaped [ for a literal match, as it is a metacharacter.
(.*?) : matches everything in a non-greedy way and captures it as group 1.
Sample code:
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift.");
while (m.find()) {
    // group(1) is the text between the brackets; group() would include the brackets themselves
    System.out.println(m.group(1));
}

Here is a Java regular expression that extracts the text between two brackets, including white space:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="[ Employee Name ]";
String re1=".*?";
String re2="( )";
String re3="((?:[a-z][a-z]+))"; // Word 1
String re4="( )";
String re5="((?:[a-z][a-z]+))"; // Word 2
String re6="( )";
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String ws1=m.group(1);
String word1=m.group(2);
String ws2=m.group(3);
String word2=m.group(4);
String ws3=m.group(5);
System.out.print("("+ws1.toString()+")"+"("+word1.toString()+")"+"("+ws2.toString()+")"+"("+word2.toString()+")"+"("+ws3.toString()+")"+"\n");
}
}
}
If you want to ignore the white space, remove the "( )" groups.

This is a Scanner-based solution:
Scanner sc = new Scanner("Where Are You [Employee Name]? your have a [Shift] shift..");
for (String s; (s = sc.findWithinHorizon("(?<=\\[).*?(?=\\])", 0)) != null;) {
System.out.println(s);
}
Output:
Employee Name
Shift

Use a StringBuilder (I assume you don't need synchronization).
As you suggested, indexOf() with your square-bracket delimiters gives you a starting index and an ending index. Use substring(startIndex + 1, endIndex) to get exactly the string you want; the end index of substring is exclusive, so it must not be reduced by one.
I'm not sure what you meant by the end of the String, but indexOf("[") is the start and indexOf("]") is the end. indexOf returns -1 when the delimiter is not found, which is the signal that there is nothing left to extract.
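The indexOf approach can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name BracketScanner and the method extract are made up for the example, and it assumes brackets are not nested. It walks the string from left to right and stops as soon as indexOf returns -1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; assumes brackets are not nested.
class BracketScanner {

    // Collects the text between each '[' and the next ']' that follows it.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        int open = text.indexOf('[');
        while (open >= 0) {
            int close = text.indexOf(']', open + 1);
            if (close < 0) {
                break; // unmatched '[': nothing more to extract
            }
            // substring's end index is exclusive, so ']' itself is left out
            found.add(text.substring(open + 1, close));
            open = text.indexOf('[', close + 1);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("Where Are You [Employee Name]? your have a [Shift] shift.."));
    }
}
```

Passing a start position to indexOf is what lets the loop move on to the next pair of brackets instead of finding the first pair again.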

That's pretty much the use case for a regular expression.
Try "(\\[[\\w ]*\\])" as your expression.
Pattern p = Pattern.compile("(\\[[\\w ]*\\])");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift..");
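// Loop with while rather than if so that every bracketed phrase is found, not just the first
while (m.find()) {
    String found = m.group(); // includes the surrounding brackets, e.g. [Employee Name]
    System.out.println(found);
}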
What does this expression do?
First it defines a group (...)
Then it defines the starting point for that group: \[ matches a literal [. Since [ is a metacharacter in regular expressions, it has to be escaped with \, and because \ is itself the escape character in Java string literals, it has to be escaped by another \.
Then it defines the body of the group, [\w ]*. Here the regex brackets [] define a character class containing \w (any letter, digit, or underscore) and a blank. The * means zero or more characters from that class.
Then it defines the endpoint of the group \]
and closes the group )

Related

Split String using multiple delimiters in one step

My question is about splitting a string first by one criterion and then splitting the remaining part of the string by another criterion. I want to split the email address below into 3 parts in Java:
String email = "blah.blah_blah#mail.com";
// After splitting i want 3 separate strings (can be array or accessed via an Iterable)
string1.equals("blah.blah_blah");
string2.equals("mail");
string3.equals("com");
I know I can first split it into two based on # and then later split the second string based on ., but is there any way of doing this in one step? I don't mind either the String#split method or the regex method using Pattern and Matcher.
Use this regex in your split:
#|[.](?!.*[#.])
It will split at an # or at the very last . after the # (the one before "com").
Use it like this:
String[] emailParts = email.split("#|[.](?!.*[#.])");
Then emailParts will be an array of the 3 strings that you want, in order.
As a bonus, if you want it to split at every dot after the # (including the ones between subdomains), then remove the . from the character class at the end of the regex. It will become #|[.](?!.*#)
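To see the difference between the two variants, here is a small sketch. The class and method names are invented for the example, and the input with an extra subdomain is borrowed from another answer in this thread.

```java
import java.util.Arrays;

// Illustrative only: compares the two split expressions from the answer above.
class EmailSplitDemo {

    // Splits at '#' and at the last dot only.
    static String[] splitAtLastDot(String email) {
        return email.split("#|[.](?!.*[#.])");
    }

    // Splits at '#' and at every dot that comes after it.
    static String[] splitAtEveryDot(String email) {
        return email.split("#|[.](?!.*#)");
    }

    public static void main(String[] args) {
        String email = "blah.blah_blah#mail.mail2.com";
        System.out.println(Arrays.toString(splitAtLastDot(email)));  // [blah.blah_blah, mail.mail2, com]
        System.out.println(Arrays.toString(splitAtEveryDot(email))); // [blah.blah_blah, mail, mail2, com]
    }
}
```

In both cases the dots before the # are left alone, because the negative look-ahead still sees a # further to the right.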
You can use this regex:
([^#]*)#([^#]*)\.([^#\.]*)
Here is the example Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaRegex
{
public static void main(String args[])
{
// String to be scanned to find the pattern.
String line = "blah.blah_blah#mail.mail2.com";
String pattern = "([^#]*)#([^#]*)\\.([^#\\.]*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find())
{
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
System.out.println("Found value: " + m.group(3));
} else
{
System.out.println("NO MATCH");
}
}
}
Thanks to Pshemo for pointing out that look-aheads were unnecessary.
You seem to want to split on
- #
or
- any dot that is after # (in other words has # somewhere before it).
If that is the case you can use email.split("#|(?<=#.{0,1000})[.]"); which will return String[] array containing separated tokens.
I used .{0,1000} instead of .* because look-behind needs to have obvious max length in Java which excludes * quantifier. But assuming that # and . will not be separated by more than 1000 characters we can use {0,1000} instead.
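A quick way to try the bounded look-behind is sketched below. The class name is made up, and the 1000-character limit is the same assumption the answer states.

```java
import java.util.Arrays;

// Illustrative only: split on '#' or on any dot that has a '#' at most 1000 characters before it.
class LookBehindSplitDemo {

    static String[] split(String email) {
        // Java needs a bounded look-behind, hence {0,1000} rather than *
        return email.split("#|(?<=#.{0,1000})[.]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("blah.blah_blah#mail.com"))); // [blah.blah_blah, mail, com]
    }
}
```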
String str = "blah.blah_blah#mail.com";
// First split on the # delimiter
String[] tempMailSplitted = str.split("#");
System.out.println(tempMailSplitted[1]); // mail.com
// split() takes a regex, so the dot has to be escaped to match a literal "."
String[] tempHostSplitted = tempMailSplitted[1].split("\\.");
You can also do it with a single regex; ask me if you want that. :)

Java Find Substring Inbetween Characters

I am very stuck. I use this format to read a player's name in a string, like so:
"[PLAYER_yourname]"
I have tried for a few hours and can't figure out how to read only the part after the '_' and before the ']' to get their name.
Could I have some help? I played around with sub strings, splitting, some regex and no luck. Thanks! :)
BTW: This question is different, if I split by _ I don't know how to stop at the second bracket, as I have other string lines past the second bracket. Thanks!
You can do:
String s = "[PLAYER_yourname]";
String name = s.substring(s.indexOf("_") + 1, s.lastIndexOf("]"));
You can use a substring. int x = str.indexOf('_') gives you the index where the '_' is found and int y = str.lastIndexOf(']') gives you the index where the last ']' is found. Then str.substring(x + 1, y) gives you the string from just after the underscore up to, but not including, the closing bracket. If more brackets can follow on the same line, use str.indexOf(']', x) instead of lastIndexOf so that the name stops at the first closing bracket.
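Since the question mentions that more text follows the closing bracket, a variant that stops at the first ']' after the underscore may be safer. The following is a minimal sketch under that assumption; PlayerNameParser and parse are hypothetical names, and returning null for malformed input is just one possible choice.

```java
// Hypothetical helper for illustration.
class PlayerNameParser {

    // Returns the text between the first '_' and the first ']' after it,
    // or null if either delimiter is missing.
    static String parse(String line) {
        int underscore = line.indexOf('_');
        if (underscore < 0) {
            return null;
        }
        int close = line.indexOf(']', underscore + 1);
        if (close < 0) {
            return null;
        }
        return line.substring(underscore + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(parse("[PLAYER_yourname] some other [text]")); // yourname
    }
}
```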
Using the regex matcher functions you could do:
String s = "[PLAYER_yourname]";
String p = "\\[[A-Z]+_(.+)\\]";
Pattern r = Pattern.compile(p);
Matcher m = r.matcher(s);
if (m.find( ))
System.out.println(m.group(1));
Result:
yourname
Explanation:
\[ matches the character [ literally
[A-Z]+ matches one or more uppercase letters (case sensitive)
_ matches the character _ literally
(.+) is the first capturing group: it matches one or more of any character (except newline), as many as possible
\] matches the character ] literally
This solution uses Java regex
String player = "[PLAYER_yourname]";
Pattern PLAYER_PATTERN = Pattern.compile("^\\[PLAYER_(.*?)]$");
Matcher matcher = PLAYER_PATTERN.matcher(player);
if (matcher.matches()) {
System.out.println( matcher.group(1) );
}
// prints yourname
You can do it like this:
public static void main(String[] args) throws InterruptedException {
String s = "[PLAYER_yourname]";
System.out.println(s.split("[_\\]]")[1]);
}
output: yourname
Try:
Pattern pattern = Pattern.compile(".*?_([^\\]]+)");
Matcher m = pattern.matcher("[PLAYER_yourname]");
if (m.find()) { // find() rather than matches(): the pattern does not consume the closing ], so matches() would fail
String name = m.group(1);
// name = "yourname"
}

Regular expression to remove everything but words. java

This code doesn't seem to do the right job. It removes the spaces between the words!
input = scan.nextLine().replaceAll("[^A-Za-z0-9]", "");
I want to remove all extra spaces, numbers, and abbreviations from a string, keeping only words and this character: '.
For Example:
input: 34 4fF$##D one 233 r # o'clock 329riewio23
returns: one o'clock
public static String filter(String input) {
return input.replaceAll("[^A-Za-z0-9' ]", "").replaceAll(" +", " ");
}
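The first replace removes every character that is not a letter, a digit, the single quote, or a space. The second replace collapses each run of one or more spaces into a single space. Note that digits survive because 0-9 is in the character class; drop it from the class if numbers should be removed as well.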
Your solution doesn't work because you don't replace numbers and you also replace the ' character.
Check out this solution:
Pattern pattern = Pattern.compile("[^| ][A-Za-z']{2,} ");
String input = scan.nextLine();
Matcher matcher = pattern.matcher(input);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append(matcher.group());
}
System.out.println(result.toString());
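Note that [^| ] is a negated character class: it matches one character that is neither | nor a space, which here acts as the first character of the word (it is not "beginning of the string or a space", which would be written (^| )). It then takes the following characters ([A-Za-z']), but only if there are 2 or more of them ({2,}), and there has to be a trailing space. As a result, a word at the very end of the input is skipped, and the result ends with a space.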
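Another way to get the output the question asks for is to work token by token instead of matching across spaces. The sketch below is only one reading of the requirement: it assumes that a "word" is a whitespace-separated token of at least two characters made up only of letters and apostrophes. WordFilter and filter are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; the definition of a "word" is an assumption (see the text above).
class WordFilter {

    static String filter(String input) {
        List<String> kept = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            // matches() tests the whole token, so "4fF$##D" and "329riewio23" are rejected
            if (token.matches("[A-Za-z']{2,}")) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(filter("34 4fF$##D one 233 r # o'clock 329riewio23")); // one o'clock
    }
}
```

Joining the kept tokens with a single space also takes care of the extra spaces, and no trailing space is needed for the last word to be picked up.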
If you want to just extract that time information use this regex group match:
input = scan.nextLine();
Pattern p = Pattern.compile("([a-zA-Z]{3,})\\s.*?(o'clock)");
Matcher m = p.matcher(input);
if (m.find()) {
input = m.group(1) + " " + m.group(2);
}
The regex is quite naive though, and will only work if the input is always of a similar format.

Find a substring in a string using a regular expression - JAVA

Suppose I have the string " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' "
I want to replace only the occurrences of the substring a.b.c that are outside the single quotes, but it is not working.
Here is my code:
String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
Pattern p = Pattern.compile("a\\.b\\.c");
Matcher m = p.matcher(str);
boolean found = m.find(); // find() returns a boolean, not an int
Use this pattern: a\.b\.c(?=(([^']*'){2})*[^']*$)
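The look-ahead only succeeds when an even number of single quotes follows the match, which is how it tells "outside quotes" from "inside quotes". A minimal sketch of using it with replaceAll, assuming the quotes in the input are balanced (the class and method names are made up for the example):

```java
import java.util.regex.Matcher;

// Illustrative only; assumes the single quotes in the input are balanced.
class OutsideQuotesRegex {

    // Replaces every a.b.c that is followed by an even number of single quotes.
    static String replace(String source, String replacement) {
        return source.replaceAll(
                "a\\.b\\.c(?=(([^']*'){2})*[^']*$)",
                Matcher.quoteReplacement(replacement)); // treat $ and \ in the replacement literally
    }

    public static void main(String[] args) {
        System.out.println(replace("x a.b.c y 'a.b.c' z a.b.c", "X")); // x X y 'a.b.c' z X
    }
}
```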
To search for a substring outside quotes, you can do something like this:
Pattern pat = Pattern.compile("^(?:[^']|'[^']*')*?a\\.b\\.c");
The first part will skip over:
every character that isn't a quote mark ([^']), or
every sequence of non-quote-mark characters enclosed in quotes ('[^']*').
Once those are skipped, then if it sees the pattern you want, it will know that it isn't inside quote marks.
This will handle a simple case. If things start getting more complicated, e.g. you want to allow \' to quote a quote mark in your input string the way C or Java does in a string literal, the regex starts getting more complicated, and you can quickly reach a point where either your regex is unreadable or regexes aren't a suitable solution.
EDIT: fixed to put "reluctant" qualifier after second *, so that the first a.b.c will be found.
EDIT 2: If you want to replace the substring you find, it gets trickier. The above pattern matches the entire beginning of the string up through a.b.c, and I couldn't get a look-behind to work so that the match would be only the a.b.c part. I think you'll need to put the beginning of the string in a group, and then use $1 in the replacement string to copy the beginning:
Pattern pat = Pattern.compile("^((?:[^']|'[^']*')*?)a\\.b\\.c");
Matcher m = pat.matcher(source);
if (m.find()) {
result = m.replaceFirst("$1replacement");
}
I'm not sure replaceAll works with this, so if you want to replace all of them, you may need to loop.
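If a loop is needed anyway, one alternative is to skip the regex and scan the string once while tracking whether the current position is inside quotes. This is a swapped-in technique, not the pattern from the answer above; the names are hypothetical and it assumes there are no escaped quote marks.

```java
// Illustrative only: a single left-to-right scan, no regex; escaped quotes are not handled.
class QuoteAwareReplacer {

    static String replaceOutsideQuotes(String source, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '\'') {
                inQuotes = !inQuotes; // every quote mark toggles the state
                out.append(c);
                i++;
            } else if (!inQuotes && source.startsWith(target, i)) {
                out.append(replacement);
                i += target.length();
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceOutsideQuotes("x a.b.c y 'a.b.c' z a.b.c", "a.b.c", "X")); // x X y 'a.b.c' z X
    }
}
```

This replaces every occurrence outside quotes in one pass, and the target is matched literally, so the dots need no escaping.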
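I wouldn't mess with REGEX.
public static void main(String[] args) {
    String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
    // Assumes exactly one quoted section, so split yields three parts
    String[] s = str.split("'");
    // Note: "[abc]" removes every a, b and c character; use replace("a.b.c", "") to remove only the literal substring
    str = s[0].replaceAll("[abc]", "") + "'" + s[1] + "'"
            + s[2].replaceAll("[abc]", "");
    System.out.println(str);
}
Output:
kk ..jkmk jjko ... jjj 'a.b.ckkkkkkkkkkkkkkkk '
Inefficient, but it works.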

Punctuation Regex in Java

First, I have read the documentation at the following link:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
I want to find any punctuation character EXCEPT #',& but I don't quite understand how.
Here is my code:
public static void main( String[] args )
{
// String to be scanned to find the pattern.
String value = "#`~!#$%^";
String pattern = "\\p{Punct}[^#',&]";
// Create a Pattern object
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
// Now create matcher object.
Matcher m = r.matcher(value);
if (m.find()) {
System.out.println("Found value: " + m.groupCount());
} else {
System.out.println("NO MATCH");
}
}
Result is NO MATCH.
Is there any mismatch ?
Thanks
MRizq
You're matching two characters, not one. Using a (negative) lookahead should solve the task:
(?![#',&])\\p{Punct}
You may use character subtraction here:
String pat = "[\\p{Punct}&&[^#',&]]";
The whole pattern represents a character class, [...], that contains a \p{Punct} POSIX character class, the && intersection operator and [^...] negated character class.
A Unicode modifier might be necessary if you plan to also match all Unicode punctuation:
String pat = "(?U)[\\p{Punct}&&[^#',&]]";
The (?U) prefix is the Unicode modifier. The pattern matches any punctuation (with \p{Punct}) except #, ', the comma, and &.
If you need to exclude more characters, add them to the negated character class. Just remember to always escape -, \, ^, [ and ] inside a Java regex character class/set. E.g. adding a backslash and - might look like "[\\p{Punct}&&[^#',&\\\\-]]" or "[\\p{Punct}&&[^#',&\\-\\\\]]".
Java demo:
String value = "#`~!#$%^,";
String pattern = "(?U)[\\p{Punct}&&[^#',&]]";
Pattern r = Pattern.compile(pattern); // Create a Pattern object
Matcher m = r.matcher(value); // Now create matcher object.
while (m.find()) {
System.out.println("Found value: " + m.group());
}
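Output (# and the comma are excluded by the pattern, and with (?U) \p{Punct} covers only the Unicode punctuation category, so the symbols `, ~, $ and ^ are not reported):
Found value: !
Found value: %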
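For comparison, here is a small sketch that runs the look-ahead version and the intersection version without the (?U) flag, so that \p{Punct} keeps its ASCII (POSIX) meaning. The class name and the findAll helper are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: without (?U), \p{Punct} is the ASCII punctuation set.
class PunctuationDemo {

    // Returns every match of the regex in the value, in order.
    static List<String> findAll(String regex, String value) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String value = "#`~!#$%^,";
        // Negative look-ahead: reject the excluded characters, then match one punctuation character
        System.out.println(findAll("(?![#',&])\\p{Punct}", value));  // [`, ~, !, $, %, ^]
        // Intersection: punctuation AND not one of the excluded characters
        System.out.println(findAll("[\\p{Punct}&&[^#',&]]", value)); // [`, ~, !, $, %, ^]
    }
}
```

Both forms match a single character at a time, which is what the original two-character pattern in the question was missing.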
