Java string.replaceFirst problems - java

I'm trying to replace a part of a string. The part contains some special characters:
#L(inches)=24#
I know replaceFirst is regex-driven, but I can't seem to create a regular expression that matches this part of a string. Any ideas?

#.*?#
This should match the entire String above.
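A minimal sketch of how that pattern behaves with replaceFirst. Only the #L(inches)=24# token and the #.*?# pattern come from the question; the class name, the helper name, the surrounding text and the replacement value are made up for illustration.

```java
import java.util.regex.Matcher;

final class ReplaceFirstDemo {
    // Replaces the first #...# token in the input with the given literal text.
    static String replaceFirstToken(String input, String replacement) {
        // #.*?# is reluctant, so it stops at the nearest closing '#'.
        // quoteReplacement keeps any '$' or '\' in the replacement literal.
        return input.replaceFirst("#.*?#", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "width #L(inches)=24# height #H(inches)=12#";
        System.out.println(replaceFirstToken(input, "24in"));
        // prints: width 24in height #H(inches)=12#
    }
}
```

Because the quantifier is reluctant, only the first token is consumed; a greedy #.*# would swallow everything up to the last # in the string.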

Related

Add Dash to Java Regex

I am trying to modify an existing Regex expression being pulled in from a properties file from a Java program that someone else built.
The current Regex expression used to match an email address is -
RR.emailRegex=^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz#example.com, but now some email addresses have dashes in them such as abc-def.xyz#example.com and those are failing the Regex pattern match.
What would my new Regex expression be to add the dash to that regular expression match or is there a better way to represent that?
Based on the regex you are using, you can add the dash to each character class. Change
RR.emailRegex=^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
to
RR.emailRegex=^[a-zA-Z0-9_\\.-]+#[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
Btw, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+#[\\w-]+\\.[\\w-]+$
Anyway, I would use the EmailValidator from Apache Commons Validator instead, like this:
if (EmailValidator.getInstance().isValid(email)) ....
The meaning of - inside a character class differs from its meaning elsewhere. Inside a character class, - denotes a range, e.g. 0-9. To include a literal -, put it at the beginning or end of the character class, like [-0-9] or [0-9-].
You also don't need to escape . inside a character class, because there it is treated as a literal dot.
Your regex can be simplified further. \w denotes [A-Za-z0-9_]. So you can use
^[-\w.]+#[\w]+\.[\w]+$
In Java, this can be written as
^[-\\w.]+#[\\w]+\\.[\\w]+$
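A small sketch that compares the original pattern with the simplified one, written as Java string literals and using the sample addresses exactly as they appear in the question. The class and method names are only for illustration.

```java
import java.util.regex.Pattern;

final class DashInClassDemo {
    // Original pattern from the question: no dash allowed anywhere.
    static final Pattern OLD =
        Pattern.compile("^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$");

    // Dash placed first in the class, so it is a literal and not a range operator.
    static final Pattern NEW = Pattern.compile("^[-\\w.]+#[\\w]+\\.[\\w]+$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(OLD, "abc.xyz#example.com"));     // true
        System.out.println(matches(OLD, "abc-def.xyz#example.com")); // false
        System.out.println(matches(NEW, "abc-def.xyz#example.com")); // true
    }
}
```

Note that this variant only accepts a dash in the first character class. If dashes can also appear in the later parts, those classes need a dash as well, e.g. [\\w-].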
^[a-zA-Z0-9_\\.\\-]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
This should solve your problem. In a regex you need to escape characters that have a special meaning to the regex engine in the place where they appear (e.g. - inside a character class, ? and * outside one).
The correct Regex fix is below.
OLD Regex Expression
^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
NEW Regex Expression
^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$
Actually, I read a post that covers all the special cases; the pattern that works correctly with Java is:
String pattern ="(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")#(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

How to check if one pattern is included into another

For example, I have the following regular expression:
/lol.*
All strings that match this expression also match another expression:
/l.*
How can I check that the first regex is included in the second one (using Java libraries)?
You need to use a lookahead in the second regex in order to check whether the first regex is present in the second.
\/l(?=ol).*
In Java you don't need to escape the forward slash, so the regex below is enough.
/l(?=ol).*
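A minimal sketch of what the lookahead version accepts, keeping the leading slash from the question. The class and method names are hypothetical. Note that this only tests individual strings against the combined pattern; it does not by itself prove that every string matching one regex also matches the other.

```java
import java.util.regex.Pattern;

final class LookaheadDemo {
    // "/l" must be followed by "ol" (checked by the lookahead, not consumed),
    // then .* consumes the rest of the string.
    static final Pattern COMBINED = Pattern.compile("/l(?=ol).*");

    static boolean accepts(String s) {
        return COMBINED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("/lolcat")); // true
        System.out.println(accepts("/label"));  // false
    }
}
```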

Why does the following regex not work for Java?

I want to delete a word and all its trailing whitespace.
Here is my regex:
item.getName().replace(word + "(\\s*)?", "");
I tested this statement by running:
item.getName().replace(word, "");
This executes successfully, albeit with extra whitespaces. So the error must be due to "(\\s*)?" part. Is it because I did not escape the slash correctly? Or does Java not recognize something in that regex?
replace treats its first argument as a literal String, not as a regex. Use replaceAll instead.
The String.replace method does not take regular expressions. You have to use replaceAll in order to use a regular expression. Also, a regular expression describes a general pattern of Strings rather than one particular String, so concatenating a plain word into a pattern is only safe if the word contains no regex metacharacters; otherwise quote it first with Pattern.quote.
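A short sketch of the difference, assuming the goal is simply to drop the word plus any whitespace after it. The class name, the helper name and the sample strings are invented for illustration; the optional group (\\s*)? from the question is reduced to \\s*, which already matches zero or more whitespace characters.

```java
import java.util.regex.Pattern;

final class RemoveWordDemo {
    // Removes every occurrence of the word together with its trailing whitespace.
    static String removeWord(String name, String word) {
        // Pattern.quote makes the word literal even if it contains characters like + or (.
        return name.replaceAll(Pattern.quote(word) + "\\s*", "");
    }

    public static void main(String[] args) {
        String name = "red   apple";

        // replace looks for the literal text "red(\s*)?", which is not present.
        System.out.println(name.replace("red" + "(\\s*)?", "")); // prints: red   apple

        // replaceAll interprets the pattern as a regex.
        System.out.println(removeWord(name, "red")); // prints: apple
    }
}
```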

String replaceAll not working for $

I am trying to strip out certain "<" and ">" from HTML code that is being generated by a 3rd party (of morons)
I am doing a replaceAll for some certain left over conditions that are not being picked up by our ETL people.
I have this string: "<$200" and I need it to be XML compliant like "&lt;$200"
string.replaceAll("<$200","&lt;$200");
does not work. I assume it is some regEx funkyness. What is the correct way to do this?
String#replaceAll accepts a regex as its first argument, not a plain String. $ is a special character in a regex and won't be treated as a literal character. Solutions:
Use String#replace instead. It accepts a plain String and not a regex:
string.replace("<$200","&lt;$200");
Use Pattern#quote. It returns a literal pattern String for the given text. Because $ is also special in the replacement, wrap that with Matcher#quoteReplacement:
string.replaceAll(Pattern.quote("<$200"),Matcher.quoteReplacement("&lt;$200"));
Escape the special characters by adding \\ before each of them.
Use this
String demo ="<$200";
demo = demo.replaceAll("<","&lt;");
System.out.println(demo);
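The options above can be sketched side by side, assuming the intended output is the XML entity form &lt;$200. The class and method names are hypothetical. Keep in mind that $ is special both in the pattern (end of input) and in the replacement (group reference), so both sides need attention when using replaceAll.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DollarDemo {
    // Unescaped: "$" is an end anchor, so the pattern never matches and nothing changes.
    static String naive(String s) {
        return s.replaceAll("<$200", "&lt;$200");
    }

    // Literal replacement, no regex involved.
    static String withReplace(String s) {
        return s.replace("<$200", "&lt;$200");
    }

    // Quote both the pattern and the replacement so "$" stays literal in each.
    static String withQuote(String s) {
        return s.replaceAll(Pattern.quote("<$200"), Matcher.quoteReplacement("&lt;$200"));
    }

    // Manual escaping of "$" on both sides.
    static String withEscape(String s) {
        return s.replaceAll("<\\$200", "&lt;\\$200");
    }

    public static void main(String[] args) {
        String s = "price <$200";
        System.out.println(naive(s));       // prints: price <$200
        System.out.println(withReplace(s)); // prints: price &lt;$200
        System.out.println(withQuote(s));   // prints: price &lt;$200
        System.out.println(withEscape(s));  // prints: price &lt;$200
    }
}
```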

Match "_<digit>string" with a regular expression

I have a list of strings like
xxx_2pathway
xxx_6pathway
xxx_pathway
So I have a string followed by an underscore and "pathway". There may be a digit between the underscore and "pathway". How can I match and replace everything except xxx with a regular expression in Java?
This does not work:
pathnameRaw = pathnameRaw.replace("_\\dpathway","");
Your regex is almost fine. Since the digit is optional, add a ? after \\d.
Also, the replace method does not use regex. Use replaceAll instead.
"_[0-9]?pathway"
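A quick sketch that runs this pattern over the three sample strings from the question. The class and method names are made up for the example.

```java
final class PathwayDemo {
    // Strips "_pathway" or "_<digit>pathway"; the ? makes the digit optional.
    static String stripSuffix(String pathnameRaw) {
        return pathnameRaw.replaceAll("_[0-9]?pathway", "");
    }

    public static void main(String[] args) {
        String[] samples = {"xxx_2pathway", "xxx_6pathway", "xxx_pathway"};
        for (String sample : samples) {
            System.out.println(stripSuffix(sample)); // prints: xxx (for each sample)
        }
    }
}
```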
