Regex: Ignoring numbers - java

I am trying to write a regex that matches a specific string but ignores any digits in the target string. So my regex could be 'MyDog', but it should match MyDog as well as My11Dog, MyDog1, etc. I could write something like
M\d*y\d*D\d*o\d*g\d*
But that is pretty painful. Any ideas out there? I am using Java, and cannot change what is in the string, because I need to retrieve it as is.

Regular expressions can do this, but why not let Java do the work instead? Strip the digits first and compare the result (caveat: I don't really know Java myself!)
String s1 = "0My1D2og3";
String s2 = s1.replaceAll("\\d", ""); // strip every digit; s1 itself is unchanged
if (s2.equals("MyDog")) {
    // Do something
}
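If you do want a single regex without typing it out by hand, one option is to generate it: put `\d*` before, between, and after the characters of the target word. This is a minimal sketch; the names `DigitInsensitiveMatch` and `digitTolerant` are hypothetical, and the loop assumes the word has no characters outside the Basic Multilingual Plane (no surrogate pairs).

```java
import java.util.regex.Pattern;

class DigitInsensitiveMatch {
    // Hypothetical helper: builds a pattern that matches `word` with any run of
    // digits allowed before, between, and after its characters.
    static Pattern digitTolerant(String word) {
        StringBuilder regex = new StringBuilder("\\d*");
        for (char c : word.toCharArray()) {
            // Pattern.quote makes each character literal, even a metacharacter like '.'.
            regex.append(Pattern.quote(String.valueOf(c))).append("\\d*");
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = digitTolerant("MyDog");
        for (String s : new String[] {"MyDog", "My11Dog", "MyDog1", "0My1D2og3", "MyCat"}) {
            // matches() requires the whole string to fit the pattern.
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

The first four inputs print `true` and `MyCat` prints `false`. The original string is never modified, which fits the requirement to retrieve it as is.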

Related

Using regular expressions in Java, how do I match four letters, a space, and then four digits?

What I want is a class code like ACCT 4838.
I tried
String REGEX = "[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][\\s][\\d][\\d][\\d][\\d]";
String REGEX = "[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\\s\\d\\d\\d\\d";
I apologize if this gets flagged; I have been looking around for a while and can't quite pin down what I'm doing wrong. It should be a quick one for someone.
You can use a regex like this:
(?i)^[a-z]{4} \d{4}$ // with the inline case-insensitive flag
^[A-Za-z]{4} \d{4}$ // without the inline flag
Remember to escape backslashes in Java string literals, e.g. "^[A-Za-z]{4} \\d{4}$".
The code below works. In Java, a single \ in these escapes gives a compile error, so each one has to be doubled. I was stupidly feeding in the wrong string, in addition to not having the proper code.
String REGEX = "^[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\\s\\d\\d\\d\\d";
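For comparison, here is a small sketch of the case-insensitive form from the first answer, compiled once and checked with `matches()`. The class and method names (`ClassCodeCheck`, `isClassCode`) are hypothetical, and the sample inputs are made up.

```java
import java.util.regex.Pattern;

class ClassCodeCheck {
    // Four letters (case-insensitive via the inline (?i) flag), one space, four digits.
    // In a Java string literal the regex \d must be written as "\\d".
    static final Pattern CLASS_CODE = Pattern.compile("(?i)^[a-z]{4} \\d{4}$");

    static boolean isClassCode(String s) {
        // matches() tests the entire input, so ^ and $ are redundant here but harmless.
        return CLASS_CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isClassCode("ACCT 4838")); // true
        System.out.println(isClassCode("acct 4838")); // true
        System.out.println(isClassCode("ACCT4838"));  // false: no space
        System.out.println(isClassCode("ACC 48381")); // false: only three letters
    }
}
```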

Groovy Regular Expressions

I wanted to match the inner part of the string between the strong tags, where the text inside the strong tags is guaranteed to start with Price Range:. This text should not appear anywhere else in the string, but the <p> and <strong> tags certainly do. How can I match this with Groovy?
<p><strong>Price Range: $61,000-$99,500</strong></p>
I tried:
def string = "<p><strong>Price Range: \$61,000-\$181,500</strong></p>strong";
string = string.replace(/Price.*strong/, "Replaced");
Just to see if I could get something to work, but I can't seem to get anything working that is more than a single word, which of course isn't particularly useful since I don't need regex for that.
Found the problems.
def string = "<p><strong>Price Range: \$61,000-\$181,500</strong></p>strong";
string = string.replaceFirst(~/<strong>Price Range.*<\/strong>/, "Replaced");
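This includes the strong tags, but it is good enough for my purpose. I needed replaceFirst instead of replace (replace treats its first argument as literal text, while replaceFirst treats it as a regex), and I put ~ in front of the slashy string, which turns it into a Pattern.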
Is this what you're trying to do?
http://regexr.com?2t9jp
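Groovy's String is `java.lang.String`, so the same `replaceFirst` call is available from plain Java. As a sketch (the names `PriceRangeReplace` and `replaceInner` are hypothetical), capturing the tags in groups and restoring them with `$1`/`$2` replaces only the text between them, and the reluctant `.*?` stops at the first `</strong>`:

```java
import java.util.regex.Matcher;

class PriceRangeReplace {
    // Groups 1 and 2 capture the surrounding tags so the replacement can put them back.
    static String replaceInner(String html, String replacement) {
        // quoteReplacement stops a "$" or "\" in the new text being read as a group reference.
        return html.replaceFirst("(<strong>)Price Range:.*?(</strong>)",
                "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    public static void main(String[] args) {
        // Unlike a Groovy double-quoted string, a Java literal needs no escaping for "$".
        String html = "<p><strong>Price Range: $61,000-$181,500</strong></p>strong";
        System.out.println(replaceInner(html, "Replaced"));
        // prints <p><strong>Replaced</strong></p>strong
    }
}
```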

Best way to create SEO friendly URI string

The method should allow only "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-" chars in URI strings.
What is the best way to make nice SEO URI string?
This is what the general consensus is:
Lowercase the string.
string = string.toLowerCase();
Normalize all characters and get rid of all diacritical marks (so that e.g. é, ö, à become e, o, a).
string = Normalizer.normalize(string, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
Replace all remaining non-alphanumeric characters with - and collapse each run of them into a single hyphen.
string = string.replaceAll("[^\\p{Alnum}]+", "-");
So, summarized:
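import java.text.Normalizer;
import java.text.Normalizer.Form;

public static String toPrettyURL(String string) {
    return Normalizer.normalize(string.toLowerCase(), Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // strip diacritics
        .replaceAll("[^\\p{Alnum}]+", "-");                   // each run of other chars becomes one hyphen
}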
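Applied to real input, the summarized method can leave a hyphen at either end (for example "Hello, World!" becomes "hello-world-"). The sketch below repeats the same three steps and adds a final trim of edge hyphens. That trim, the `Locale.ROOT` argument (which keeps lowercasing independent of the default locale, e.g. the Turkish dotless i), and the names `Slugs`/`slugify` are additions of this sketch, not part of the answer above.

```java
import java.text.Normalizer;
import java.util.Locale;

class Slugs {
    static String slugify(String input) {
        String s = Normalizer.normalize(input.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "") // drop accents left by NFD
                .replaceAll("[^\\p{Alnum}]+", "-");                  // \p{Alnum} is ASCII-only by default
        return s.replaceAll("^-+|-+$", "");                          // added step: trim edge hyphens
    }

    public static void main(String[] args) {
        // Unicode escapes spell "Crème Brûlée, à la carte!" without relying on source encoding.
        System.out.println(slugify("Cr\u00e8me Br\u00fbl\u00e9e, \u00e0 la carte!")); // creme-brulee-a-la-carte
        System.out.println(slugify("  Hello,   World!  "));                           // hello-world
    }
}
```

Characters that do not decompose into an ASCII letter plus a combining mark (ß, for instance) are still turned into hyphens, so a transliteration table may be needed for some languages.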
The following chain of regex replacements does the same thing as your algorithm. I'm not aware of libraries for doing this type of thing.
String s = input
    .replaceAll(" ?- ?", "-")         // remove spaces around hyphens
    .replaceAll("[ ']", "-")          // turn spaces and apostrophes into hyphens
    .replaceAll("[^0-9a-zA-Z-]", ""); // remove everything not in the allowed character set
These are commonly called "slugs" if you want to search for more information.
You may want to check out other answers such as How can I create a SEO friendly dash-delimited url from a string? and How to make Django slugify work properly with Unicode strings?
They cover C# and Python more than JavaScript, but have some language-agnostic discussion about slug conventions and issues you may face when making them (such as uniqueness, Unicode normalization problems, etc.).

Pattern match numbers/operators

Hey, I've been trying to figure out why this regular expression isn't matching correctly.
List<String> l_operators = Arrays.asList(Pattern.compile("(\\d+)").split(rtString.trim()));
The input string is "12+22+3"
The output I get is -- [,+,+]
There's an extra empty element at the beginning of the list that shouldn't be there. I really can't see why, and I could use some insight. Thanks.
Well, technically, there is an empty string in front of the first delimiter (first sequence of digits). If you had, say a line of CSV, such as abc,def,ghi and another one ,jkl,mno you would clearly want to know that the first value in the second string was the empty string. Thus the behaviour is desirable in most cases.
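This behavior can be checked directly with `String.split`: it keeps a leading empty string when the input starts with a non-empty delimiter match, but drops trailing empty strings when no limit is given. The class name `SplitDemo` is arbitrary.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        // A delimiter at the very start produces a leading empty string...
        System.out.println(Arrays.toString(",jkl,mno".split(",")));    // [, jkl, mno]
        // ...which is exactly what happens with the digits at the start of "12+22+3".
        System.out.println(Arrays.toString("12+22+3".split("\\d+"))); // [, +, +]
        // Trailing empty strings, by contrast, are discarded (the limit defaults to 0).
        System.out.println(Arrays.toString("abc,def,".split(",")));    // [abc, def]
    }
}
```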
For your particular case, you need to deal with it manually, or refine your regular expression somehow. Like this for instance:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(rtString);
if (m.find()) {
    // Split only what follows the first number, so no leading empty string appears.
    List<String> l_operators = Arrays.asList(p.split(rtString.substring(m.end()).trim()));
    // ...
}
Ideally, however, you should be using a parser for this type of string. You can't, for instance, deal with parentheses in expressions using just regular expressions.
That's the behavior of split in Java. You just have to accept it (and deal with it) or use another library to split the string. I personally try to avoid split in Java.
An example of one alternative is to look at Splitter from Google Guava.
Try Guava's Splitter.
Splitter.onPattern("\\d+").omitEmptyStrings().split(rtString)
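Another option that needs only the JDK is to match the operators themselves instead of splitting on the numbers, so there is no leading empty element to remove. This is a sketch under assumptions: the operator set `+ - * /` and the names `Operators`/`operators` are hypothetical, and like the other answers it does not handle parentheses or negative numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Operators {
    // Assumed operator set; extend the character class as needed.
    private static final Pattern OPERATOR = Pattern.compile("[-+*/]");

    // Collects each operator in order by repeatedly calling find().
    static List<String> operators(String expr) {
        List<String> ops = new ArrayList<>();
        Matcher m = OPERATOR.matcher(expr);
        while (m.find()) {
            ops.add(m.group());
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(operators("12+22+3")); // [+, +]
    }
}
```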

replacing regex in java string

I have this java string:
String bla = "<my:string>invalid_content</my:string>";
How can I replace the "invalid_content" piece?
I know I should use something like this:
bla.replaceAll(regex,"new_content");
in order to have:
"<my:string>new_content</my:string>";
but I can't figure out how to create the correct regex
help please :)
You could do something like
String ResultString = subjectString.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");
Mark's answer will work, but can be improved with two simple changes:
The central parentheses are redundant if you're not using that group.
Making it non-greedy will help if you have multiple my:string tags to match.
Giving:
String ResultString = subjectString.replaceAll
    ( "(<my:string>).*?(</my:string>)" , "$1whatever$2" );
But that's still not how I'd write it - the replacement can be simplified using lookbehind and lookahead, and you can avoid repeating the tag name, like this:
String ResultString = subjectString.replaceAll
    ( "(?<=<(my:string)>).*?(?=</\\1>)" , "whatever" );
Of course, this latter one may not be as friendly to those who don't yet know regex - it is however more maintainable/flexible, so worth using if you might need to match more than just my:string tags.
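To see the greedy/non-greedy difference in practice, the three variants can be run side by side on a hypothetical input containing two `my:string` elements. The class and method names are made up for this sketch. Note that the backreference must be written `"\\1"` in Java source; `"\1"` would be an octal escape for the character U+0001.

```java
class ReplaceTagContent {
    // Mark's version: greedy .* runs to the LAST </my:string>, swallowing any tags in between.
    static String greedy(String s) {
        return s.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");
    }

    // Reluctant .*? stops at the first closing tag, so each element is handled separately.
    static String reluctant(String s) {
        return s.replaceAll("(<my:string>).*?(</my:string>)", "$1whatever$2");
    }

    // Lookbehind/lookahead keep the tags out of the match; \1 reuses the captured tag name.
    static String lookaround(String s) {
        return s.replaceAll("(?<=<(my:string)>).*?(?=</\\1>)", "whatever");
    }

    public static void main(String[] args) {
        String twoTags = "<my:string>a</my:string><my:string>b</my:string>";
        System.out.println(greedy(twoTags));     // <my:string>whatever</my:string>
        System.out.println(reluctant(twoTags));  // <my:string>whatever</my:string><my:string>whatever</my:string>
        System.out.println(lookaround(twoTags)); // same as reluctant
    }
}
```

On a single element, such as the string in the question, all three give the same result; they only diverge once a second element appears.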
See the Java regex tutorial, in particular the sections on character classes and capturing groups.
The PCRE would be:
/invalid_content/
for a simple substitution. What more do you want?
Is invalid_content a fixed value? If so, you could simply replace it with your new content using:
bla = bla.replaceAll("invalid_content","new_content");
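Since `invalid_content` is fixed text here, `String.replace(CharSequence, CharSequence)` is enough: it replaces every occurrence literally, with no regex involved. If you prefer to stay with `replaceAll`, a sketch like the one below (the helper `replaceLiteral` is hypothetical) quotes both arguments so that it should behave like `replace`. Quoting matters as soon as the text contains metacharacters: an unquoted `.` would also match the `x` in `axb`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Hypothetical helper: replaceAll with the target and replacement both quoted.
    static String replaceLiteral(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String bla = "<my:string>invalid_content</my:string>";
        // For fixed text, String.replace is the simplest choice.
        System.out.println(bla.replace("invalid_content", "new_content"));
        // "." is matched literally and "$1" is inserted as plain text, not a group reference.
        System.out.println(replaceLiteral("a.b axb", "a.b", "$1")); // $1 axb
    }
}
```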
