JAVA RegEx on _ delimited string

JAVA RegEx on _ delimited string - java

OK, I need a RegEx that traps the first word up to underscore character but then capture the next words that may have a underscore character. So, here is a group and the expected result:
gear_Armor_Blessed_Robes = "gear", "Armor" and "Blessed_Robes"
gear_Armor_Chain_Coif = "gear", "Armor" and "Chain_Coif"
gear_Armor_Chain_Hauberk = "gear", "Armor" and "Chain_Hauberk"
gear_Armor_Chain_Shirt = "gear", "Armor" and "Chain_Shirt"
gear_Armor_Chain_Leggings = "gear", "Armor" and "Chain_Leggings"

There's no need to use a regex for this, just use indexOf and substring:
String s = "Armor_Blessed_Robes";
int idx = s.indexOf("_");
System.out.println(s.substring(0, idx)); // Armor
System.out.println(s.substring(idx + 1)); // Blessed_Robes
With regex, you'd have to use the following, which is a tad more complicated and harder to read:
Pattern p = Pattern.compile("([^_]+)_(.+)");
Matcher m = p.matcher(s);
if (m.find()) {
String first = m.group(1); // Armor
String second = m.group(2); // Blessed_Robes
}

You can split along _, limiting the number of splits to 3:
assert Arrays.equals("gear_Armor_Blessed_Robes".split("_", 3),
new String[] { "gear", "Armor", "Blessed_Robes" });
It should give you a String[] that contains the 3 Strings as specified in your question.

Related

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurences?

CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
boolean isMatch = matcher.find();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < content.length(); i++) {
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code content is input string,
I am trying to find repetitive occurrences from string and want to replace it with max no of occurrences
For Example
input -("abaaadccc",2)
output - "abaadcc"
here aaaand cccis replced by aa and cc as max allowed repitation is 2
In the above code, I found such occurrences and tried replacing them with -, it's working, But can someone help me How can I get current char and replace with allowed occurrences
i.e If aaa is found it is replaced by aa
or is there any alternative method w/o using regex?

You can declare the second group in a regex and use it as a replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( first group - a character repeated two times
([a-zA-Z]) second group - a character
\2 a character repeated once
)
\2+ a character repeated at least once more
Thus, the first group captures a replacement string.
It isn't hard to extrapolate this solution for a different maximum value of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa

Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa

How to replace all {!XXX} from string?

I have string with multiple {!XXX} phrases. For example:
Kumar gaurav {!str1} is just {!str2}, adasdas {!str3}
I need to replace all {!str} values with corresponding str, how to replace all {!str} from my string?

You can use a Pattern and Matcher, which provides you the means to query the string for a unknown number of elements, in combination with a regular expression of \{!str\d\} which will allow you to break the text down based on the tags
For example...
String text = "All that {!str1} is {!str2}";
Map<String, String> values = new HashMap<>(25);
values.put("{!str1}", "glitters");
values.put("{!str2}", "gold");
Pattern p = Pattern.compile("\\{!str\\d\\}");
Matcher matcher = p.matcher(text);
while (matcher.find()) {
String match = matcher.group();
text = text.replaceAll("\\" + match, values.get(match));
}
System.out.println(text);
Which outputs
All that glitters is gold
You could also use something like...
int previousStart = 0;
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String match = matcher.group();
int start = matcher.start();
int end = matcher.end();
sb.append(text.substring(previousStart, start));
sb.append(values.get(match));
previousStart = end;
}
if (previousStart < text.length()) {
sb.append(text.substring(previousStart));
}
Which does away with the String creation in a loop and relies more on the position of the match to cut the original text around the tokens, which makes me happier ;)

use this regex, simple
String string="hello world {!hello}";
string=string.replaceAll("\\{!(.*?)\\}", "replace");
System.out.println(string); //this will print (hello world replace)

Extract every complete word that contains a certain substring

I'm trying to write a function that extracts each word from a sentence that contains a certain substring e.g. Looking for 'Po' in 'Porky Pork Chop' will return Porky Pork.
I've tested my regex on regexpal but the Java code doesn't seem to work. What am I doing wrong?
private static String foo()
{
String searchTerm = "Pizza";
String text = "Cheese Pizza";
String sPattern = "(?i)\b("+searchTerm+"(.+?)?)\b";
Pattern pattern = Pattern.compile ( sPattern );
Matcher matcher = pattern.matcher ( text );
if(matcher.find ())
{
String result = "-";
for(int i=0;i < matcher.groupCount ();i++)
{
result+= matcher.group ( i ) + " ";
}
return result.trim ();
}else
{
System.out.println("No Luck");
}
}

In Java to pass \b word boundaries to regex engine you need to write it as \\b. \b represents backspace in String object.
Judging by your example you want to return all words that contains your substring. To do this don't use for(int i=0;i < matcher.groupCount ();i++) but while(matcher.find()) since group count will iterate over all groups in single match, not over all matches.
In case your string can contain some special characters you probably should use Pattern.quote(searchTerm)
In your code you are trying to find "Pizza" in "Cheese Pizza" so I assume that you also want to find strings that same as searched substring. Although your regex will work fine for it, you can change your last part (.+?)?) to \\w* and also add \\w* at start if substring should also be matched in the middle of word (not only at start).
So your code can look like
private static String foo() {
String searchTerm = "Pizza";
String text = "Cheese Pizza, Other Pizzas";
String sPattern = "(?i)\\b\\w*" + Pattern.quote(searchTerm) + "\\w*\\b";
StringBuilder result = new StringBuilder("-").append(searchTerm).append(": ");
Pattern pattern = Pattern.compile(sPattern);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
result.append(matcher.group()).append(' ');
}
return result.toString().trim();
}

While the regex approach is certainly a valid method, I find it easier to think through when you split the words up by whitespace. This can be done with String's split method.
public List<String> doIt(final String inputString, final String term) {
final List<String> output = new ArrayList<String>();
final String[] parts = input.split("\\s+");
for(final String part : parts) {
if(part.indexOf(term) > 0) {
output.add(part);
}
}
return output;
}
Of course it is worth nothing that doing this will effectively be doing two passes through your input String. The first pass to find the characters that are whitespace to split on, and the second pass looking through each split word for your substring.
If one pass is necessary though, the regex path is better.

I find nicholas.hauschild's answer to be the best.
However if you really wanted to use regex, you could do it as such:
String searchTerm = "Pizza";
String text = "Cheese Pizza";
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchTerm)
+ "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
Pizza

The pattern should have been
String sPattern = "(?i)\\b("+searchTerm+"(?:.+?)?)\\b";
You want to capture the whole (pizza)string.?: ensures you don't capture a part of the string twice.

Try this pattern:
String searchTerm = "Po";
String text = "Porky Pork Chop oPod zzz llPo";
Pattern p = Pattern.compile("\\p{Alpha}+" + substring + "|\\p{Alpha}+" + substring + "\\p{Alpha}+|" + substring + "\\p{Alpha}+");
Matcher m = p.matcher(myString);
while(m.find()) {
System.out.println(">> " + m.group());
}

Ok, I give you a pattern in raw style (not java style, you must double escape yourself):
(?i)\b[a-z]*po[a-z]*\b
And that's all.

Using Regex to ignore a pattern in java

I have a sentence: "we:PR show:V".
I want to match only those characters after ":" and before "\\s" using regex pattern matcher.
I used following pattern:
Pattern pattern=Pattern.compile("^(?!.*[\\w\\d\\:]).*$");
But it did not work.
What is the best pattern to get the output?

For a situation such as this, if you are using java, it may be easier to do something with substrings:
String input = "we:PR show:V";
String colon = ":";
String space = " ";
List<String> results = new ArrayList<String>();
int spaceLocation = -1;
int colonLocation = input.indexOf(colon);
while (colonLocation != -1) {
spaceLocation = input.indexOf(space);
spaceLocation = (spaceLocation == -1 ? input.size() : spaceLocation);
results.add(input.substring(colonLocation+1,spaceLocation);
if(spaceLocation != input.size()) {
input = input.substring(spaceLocation+1, input.size());
} else {
input = new String(); //reached the end of the string
}
}
return results;
This will be faster than trying to match on regex.

The following regex assumes that any non-whitespace characters following a colon (in turn preceded by non-colon characters) are a valid match:
[^:]+:(\S+)(?:\s+|$)
Use like:
String input = "we:PR show:V";
Pattern pattern = Pattern.compile("[^:]+:(\\S+)(?:\\s+|$)");
Matcher matcher = pattern.matcher(input);
int start = 0;
while (matcher.find(start)) {
String match = matcher.group(1); // = "PR" then "V"
// Do stuff with match
start = matcher.end( );
}
The pattern matches, in order:
At least one character that isn't a colon.
A colon.
At least non-whitespace character (our match).
At least one whitespace character, or the end of input.
The loop continues as long as the regex matches an item in the string, beginning at the index start, which is always adjusted to point to after the end of the current match.

Why isn't this lookahead assertion working in Java?

I come from a Perl background and am used to doing something like the following to match leading digits in a string and perform an in-place increment by one:
my $string = '0_Beginning';
$string =~ s|^(\d+)(?=_.*)|$1+1|e;
print $string; # '1_Beginning'
With my limited knowledge of Java, things aren't so succinct:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
string.replaceFirst( p.toString(), oneMore.toString() ); //
The regex doesn't match here... but it did in Perl.
What am I doing wrong here?

Actually it matches. You can find out by printing
System.out.println(p.matcher(string).find());
The issue is with line
String digit = string.replaceFirst( p.toString(), "$1" );
which is actually a do-nothing, because it replaces the first group (which is all you match, the lookahead is not part of the match) with the content of the first group.
You can get the desired result (namely the digit) via the following code
Matcher m = p.matcher(string);
String digit = m.find() ? m.group(1) : "";
Note: you should check m.find() anyways if nothing matches. In this case you may not call parseInt and you'll get an error. Thus the full code looks something like
Pattern p = Pattern.compile("^(\\d+)(?=_.*)");
String string = "0_Beginning";
Matcher m = p.matcher(string);
if (m.find()) {
String digit = m.group(1);
Integer oneMore = Integer.parseInt(digit) + 1;
string = m.replaceAll(oneMore.toString());
System.out.println(string);
} else {
System.out.println("No match");
}

Let's see what you are doing here.
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
You declare and initialize String and pattern objects.
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
(You are converting the pattern back into a string, and replaceFirst creates a new Pattern from this. Is this intentional?)
As Howard says, this replaces the first match of the pattern in the string with the contents of the first group, and the match of the pattern is just 0 here, as the first group. Thus digit is equal to string, ...
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
... and your parsing fails here.
string.replaceFirst( p.toString(), oneMore.toString() ); //
This would work (but convert the pattern again to string and back to pattern).
Here how I would do this:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher matcher = p.matcher(string);
StringBuffer result = new StringBuffer();
while(matcher.find()) {
int number = Integer.parseInt(matcher.group());
m.appendReplacement(result, String.valueOf(number + 1));
}
m.appendTail(result);
return result.toString(); // 1_Beginning
(Of course, for your regex the loop will only execute once, since the regex is anchored.)
Edit: To clarify my statement about string.replaceFirst:
This method does not return a pattern, but uses one internally. From the documentation:
Replaces the first substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceFirst(repl)
Here we see that a new pattern is compiled from the first argument.
This also shows us another way to do what you did want to do:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher m = p.matcher(string);
if(m.find()) {
digit = m.group();
int oneMore = Integer.parseInt( digit ) + 1
return m.replaceFirst(string, String.valueOf(oneMore));
}
This only compiles the pattern once, instead of thrice like in your original program - but still does the matching twice (once for find, once for replaceFirst), instead of once like in my program.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

JAVA RegEx on _ delimited string - java

You can split along _, limiting the number of splits to 3: assert Arrays.equals("gear_Armor_Blessed_Robes".split("_", 3), new String[] { "gear", "Armor", "Blessed_Robes" }); It should give you a String[] that contains the 3 Strings as specified in your question.

Related

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurences?

How to replace all {!XXX} from string?

Extract every complete word that contains a certain substring

Using Regex to ignore a pattern in java

Why isn't this lookahead assertion working in Java?

Categories

Resources