Regular expression for file extensions in Java


I am trying to write a simple regular expression to identify all filenames in a list which end with ".req.copied" extension. The code I am using is given below
public class Regextest {
public static void main(String[] args) {
// TODO Auto-generated method stub
String test1=new String("abcd.req.copied");
if(test1.matches("(req.copied)?")) {
System.out.println("Matches");
}
else
System.out.println("Does not Match");
}
}
The regex tests ok in online regex testers but does not function in the program. I have tried multiple combinations (like splitting req and copied into two regexes, or literal matching of the dot character) but nothing works (even the simplest regex of (reg)? returned a "Does not Match" output). Please let me know how to tackle this.

The main problem is that matches requires the regex to match the entire string, whereas your regex describes only part of it.
If you really want to use matches here, your code could look more like
test1.matches(".*\\.req\\.copied")
. represents any character (except line separators such as \r), so to match only a literal dot you need to escape it as \. (in a Java string literal this is written "\\." because \ is also special there, introducing escapes such as \r, \n and \t, so it needs its own escaping with an additional \).
.* lets the regex accept any characters before .req.copied
But in your case you should simply use the endsWith method
test1.endsWith(".req.copied")
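The difference between the three approaches can be sketched as follows. This is only an illustration: the class name ExtensionCheck, the helper copiedRequests and the sample filenames are assumptions, not part of the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ExtensionCheck {
    // Hypothetical helper: keeps only the names that end with ".req.copied".
    static List<String> copiedRequests(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith(".req.copied")) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "abcd.req.copied";
        // matches() must consume the whole string, so a partial pattern fails.
        System.out.println(name.matches("(req.copied)?"));     // false
        System.out.println(name.matches(".*\\.req\\.copied")); // true
        // find() searches anywhere in the string; $ pins the match to the end.
        System.out.println(Pattern.compile("\\.req\\.copied$").matcher(name).find()); // true
        System.out.println(copiedRequests(Arrays.asList("abcd.req.copied", "abcd.req", "xreq.copied")));
        // [abcd.req.copied]
    }
}
```

For a plain suffix test, endsWith avoids both the escaping and the whole-string rule of matches.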

As resueman said in the comments, you don't need a regex for that. You can simply check if each filename endsWith(".req.copied").
if(test1.endsWith(".req.copied")){
System.out.println("Matches");
}else{
System.out.println("Does not match");
}
By the way, the above if-else can be replaced with System.out.println(test1.endsWith(".req.copied") ? "Matches" : "Does not match");.

test1.matches(".*\\.req\\.copied") should do it but in your case you should consider using endsWith() instead of matches.

You should come up with a Regex that would match the whole string format, not a snippet:
String test1= "abcd.req.copied";
if(test1.matches("^.*\\.req\\.copied$")) {
System.out.println("Matches");
} else {
System.out.println("Does not Match");
}
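Also, your pattern (req.copied)? is optional as a whole, so on its own it can match the empty string; with matches that means only an empty string or a string like req.copied is accepted. Note too that the . symbol matches any character, so escape it to match a literal dot.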

Related

RegExp pattern for a String which contain 0 and 4-9(4 ,5,6,7,8,9)

I am dealing with a string. Use-Case is I don't want a String which has number any digit of 4 to 9 and 0.
Example:-
ABC0123-> Not Valid.
XYZ002456789->Not Valid.
ABC123->Valid
ABC1->Valid
I have tried below pattern but not got success in it.
String pattern = "^[0,4-9]+$";
if(str.matches(pattern)){
//do something.
}

First, remove the comma from the character class. You're not looking for commas.
Since you're disallowing, don't anchor the expression, allow the match anywhere in the string. In fact, matches anchors the expression for you, so we have to intentionally allow characters before and after the disallowed character class:
String pattern = ".*[04-9].*";
if(str.matches(pattern)){
// disallow
}
Alternatively, you can avoid those .* by using Pattern.compile and calling find on a Matcher created from the resulting Pattern, since find won't automatically anchor the pattern like matches does.

It is much easier to match strings that contain 0 or 4-9 than to match those that don't. So you should just write a regex like this:
[4-90]
And call find, then invert the result:
if (!Pattern.compile("[4-90]").matcher(someString).find()) {
// ...
}

Another option could be to use a negated character class and add what you don't want to match. In this case you could add 0 and a range from 4-9 and if you don't want to match a carriage return or a newline you could add those as well.
^[^04-9\\r\\n]+$
Note that a comma inside the character class would match a literal comma.
String pattern = "^[^04-9\\r\\n]+$";
if(str.matches(pattern)){
//do something.
}
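As a rough check of the find-based approach against the examples from the question, here is a small sketch. The class name DigitCheck and the helper isValid are made up for illustration.

```java
import java.util.regex.Pattern;

public class DigitCheck {
    // Compiled once; matches a single disallowed digit (0 or 4-9).
    private static final Pattern DISALLOWED = Pattern.compile("[04-9]");

    // Hypothetical helper: a string is valid when no disallowed digit occurs anywhere in it.
    static boolean isValid(String s) {
        return !DISALLOWED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ABC0123", "XYZ002456789", "ABC123", "ABC1"}) {
            System.out.println(s + " -> " + (isValid(s) ? "Valid" : "Not Valid"));
        }
    }
}
```

One difference worth noting: this version treats an empty string as valid, whereas the negated-class pattern with + requires at least one character.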

Validating a mathematical expression in java

I am trying to validate if the string "expression" as in the code below is a formula.
String expression = request.getParameter(FORMULA);
if(!Pattern.matches("[a-zA-Z0-9+-*/()]", expression)){return new AjaxMessage(AjaxMessage.ResponseStatusEnum.FAILURE, getJsonString(, "Manager.invalid.formula" , null));
}
Examples of values for expression are {a+b/2, (a+b)*2, (john-Max), etc.}, just for context. The variable names in the formula might vary, and the arithmetic expression contains only the special characters +-/()*. As you can see, I tried to validate using a regex (I'm new to regex), but I think it's not possible as I don't know the length of the variable names.
Is there a way to achieve a validation using regex or any other library in java?
Thanks in advance.

The reason is that you are using characters with special meaning in regex. You need to escape those characters. I have just modified your regex to make it work.
Code:
List<String> expressions = new ArrayList<String>();
expressions.add("a+b/2");
expressions.add("(a+b)*2");
expressions.add("john-Max");
expressions.add("etc[");
for (String expression : expressions) {
if (!Pattern.matches("[a-zA-Z0-9\\+\\-\\*/\\(\\)]*", expression)) {
System.out.println("NOT match");
} else {
System.out.println("MATCH");
}
}
OUTPUT:
MATCH
MATCH
MATCH
NOT match

You're using special characters in your regex; you need to escape them using \.
It should look like [a-zA-Z0-9+\\-*/()]. This only tests one character; you need to add a * at the end to test multiple characters.
Edit (thanks Toto): because [] tests a single character, it's called a character class (unrelated to a Java class), and only the - is considered special inside it here. For a regex without the brackets, you would need to escape the other special characters.
Special characters have special meaning in a regex and won't be interpreted as the literal character (for example, parentheses are used to make groups, * means 0 or more of the previous element, etc.).
About character class: https://docs.oracle.com/javase/tutorial/essential/regex/char_classes.html
More info:
http://www.regular-expressions.info/characters.html and
http://www.regular-expressions.info/refcharacters.html
I use this site to test my regexes (note that regex engines may vary!):
https://regex101.com/
As said in the comments, a mathematical expression is more than just a set of allowed characters, so if you want to validate one, you'll have to do more manual checking.
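One example of such manual checking is parenthesis balance, which a character class cannot express. The sketch below combines the character check with a depth counter. The class name FormulaCheck and the method looksLikeFormula are hypothetical, and it still does not verify operator placement, so it accepts inputs such as a++b.

```java
import java.util.regex.Pattern;

public class FormulaCheck {
    // Inside a character class only '-' needs care; placing it last keeps it literal.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9+*/()-]+");

    // Hypothetical helper: allowed characters only, and parentheses must balance.
    static boolean looksLikeFormula(String expression) {
        if (!ALLOWED.matcher(expression).matches()) {
            return false;
        }
        int depth = 0;
        for (char c : expression.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false; // a ')' appeared before its matching '('
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        for (String e : new String[] {"a+b/2", "(a+b)*2", "(john-Max)", "etc[", "(a+b"}) {
            System.out.println(e + " -> " + looksLikeFormula(e));
        }
    }
}
```

Full validation of arithmetic syntax would need a small parser rather than a regex.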

Regex to replace All turkish symbols to regular latin symbols

I have a class that replaces all Turkish symbols with similar Latin symbols and passes the result to a searcher.
These are the methods for symbol replacement:
@Override
String replaceTurkish(String words) {
if (checkWithRegExp(words)) {
return words.toLowerCase().replaceAll("ç", "c").replaceAll("ğ", "g").replaceAll("ı", "i").
replaceAll("ö", "o").replaceAll("ş", "s").replaceAll("ü", "u");
} else return words;
}
public static boolean checkWithRegExp(String word){
Pattern p = Pattern.compile("[öçğışü]");
Matcher m = p.matcher(word);
return m.matches();
}
But this always returns the words unmodified.
What am I doing wrong?
Thanks in advance!

Per the Java 7 API, Matcher.matches()
Attempts to match the entire region against the pattern.
Your pattern is "[öçğışü]", which regex101.com (an awesome resource) says will match
a single character in the list öçğışü literally
Perhaps you may see the problem already. Your regex is not going to match anything except a single Turkish character, since you are attempting to match the entire region against a regex which will only ever accept one character.
I recommend either using find(), per suggestion by Andreas in the comments, or using a regex like this:
".*[öçğışü].*"
which will match words that contain any Turkish-specific characters.
Additionally, I'll point out that regex is case-sensitive, so if there are upper-case variants of these letters, you should include those as well and modify your replace statements.
Finally (edit): you can make your Pattern case-insensitive, but your replaceAll's will still need to change to be case-insensitive. I am unsure of how this will work with non-Latin characters, so you should test that flag before relying on it.
Pattern p = Pattern.compile(".*[öçğışü].*", Pattern.CASE_INSENSITIVE);
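A minimal sketch of the find() variant follows. The class name TurkishCheck and the method containsTurkish are hypothetical. It lists the upper-case letters explicitly instead of relying on CASE_INSENSITIVE, which by default only applies to US-ASCII characters unless UNICODE_CASE is also set. The letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

public class TurkishCheck {
    // Lower case: o-umlaut, c-cedilla, g-breve, dotless i, s-cedilla, u-umlaut,
    // followed by their upper-case forms (including dotted capital I).
    private static final Pattern TURKISH = Pattern.compile(
            "[\u00f6\u00e7\u011f\u0131\u015f\u00fc\u00d6\u00c7\u011e\u0130\u015e\u00dc]");

    // find() succeeds if the character class matches anywhere in the word.
    static boolean containsTurkish(String word) {
        return TURKISH.matcher(word).find();
    }

    public static void main(String[] args) {
        String word = "\u00e7ay"; // "cay" written with c-cedilla
        System.out.println(TURKISH.matcher(word).matches()); // false: the word is longer than one character
        System.out.println(containsTurkish(word));           // true
        System.out.println(containsTurkish("kitap"));        // false
    }
}
```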

Why split method does not support $,* etc delimiter to split string

import java.util.StringTokenizer;
class MySplit
{
public static void main(String S[])
{
String settings = "12312$12121";
StringTokenizer splitedArray = new StringTokenizer(settings,"$");
String splitedArray1[] = settings.split("$");
System.out.println(splitedArray1[0]);
while(splitedArray.hasMoreElements())
System.out.println(splitedArray.nextToken().toString());
}
}
In the above example, splitting the string on $ does not work, while splitting on other symbols works fine.
Why is that? If split only supports regular expressions, why does it work for symbols such as :, , and ;?

$ has a special meaning in regex, and since String#split takes a regex as an argument, the $ is not interpreted as the string "$", but as the special meta character $. One sexy solution is:
settings.split(Pattern.quote("$"))
Pattern#quote:
Returns a literal pattern String for the specified String.
... The other solution would be escaping $, by adding \\:
settings.split("\\$")
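Important note: check that the resulting array actually contains elements before indexing into it.
When you do splitedArray1[0], you could get an ArrayIndexOutOfBoundsException if the array is empty, which happens when the input consists only of delimiters (for example "$"), because trailing empty strings are removed. I would add: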
if (splitedArray1.length == 0) {
// return or do whatever you want
// except accessing the array
}
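The behaviour of each variant can be compared with a short sketch. The class name SplitDemo and the helper splitOnDollar are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical helper: splits on a literal dollar sign.
    static String[] splitOnDollar(String s) {
        return s.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String settings = "12312$12121";
        // As a regex, "$" is the end-of-input anchor, so nothing is split off.
        System.out.println(Arrays.toString(settings.split("$")));    // [12312$12121]
        // Escaped or quoted, the dollar sign is treated as a literal character.
        System.out.println(Arrays.toString(settings.split("\\$")));  // [12312, 12121]
        System.out.println(Arrays.toString(splitOnDollar(settings))); // [12312, 12121]
        // Input made only of delimiters: trailing empty strings are dropped.
        System.out.println(splitOnDollar("$").length);               // 0
    }
}
```

Delimiters such as : , and ; work unescaped simply because they have no special meaning in a regex.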

If you take a look at the Java docs, you will see that the split method takes a regex as its parameter, so you have to write a regular expression, not a plain character.
In regex $ has a specific meaning, so you have to escape it this way:
settings.split("\\$");

The problem is that the split(String str) method expects str to be a valid regular expression. The characters you have mentioned are special characters in regular expression syntax and thus perform a special operation.
To make the regular expression engine take them literally, you would need to escape them like so:
.split("\\$")
Thus given this:
String str = "This is 1st string.$This is the second string";
for(String string : str.split("\\$"))
System.out.println(string);
You end up with this:
This is 1st string.
This is the second string

Dollar symbol $ is a special character in Java regex. You have to escape it so as to get it working like this:
settings.split("\\$");
From the String.split docs:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
On a side note:
Have a look at the Pattern class, which will give you an idea of which characters you need to escape.

Because $ is a special character in regular expressions: it anchors the match at the end of the input (or line).
You should escape it using the escape sequence \$, which in a Java string literal is written \\$
Hope that helps.
Cheers

Validate string has no illegal characters

I'm trying to validate a string that only allows letters, numbers and these characters:
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
I tried doing this, but it's not working and lets me enter characters that are not in the regex. I'm still pretty new to Java; something similar was working in JavaScript, but I can't figure out what's going on here. I think it behaves as though it returns four only if it can't find any of the characters mentioned.
Pattern allowedCharacters = Pattern.compile("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?@[\\]^_`{|}~-]+$]");
if (!allowedCharacters.matcher(pw).find()){
return 4;
}
Any help is appreciated. Thanks
EDIT:
I also tried:
if (pw.matches("^[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?@[\\]^_`{|}~-]+$]")){
return 4;
}
and
if (!pw.matches("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?@[\\]^_`{|}~-]+$]")){
return 4;
}

matcher.find() checks whether the string contains a substring that matches the regex, so with
!matcher.find() you are checking that there is no match of the regex anywhere in the tested string.
Consider using matcher.matches() to check whether the entire string is matched by the regex. In that case you will have to add a quantifier like *, + or {n,m} to the character class to decide on the password's length. Otherwise it will only accept single-character passwords.
Here is a demo of how your code could look:
// here you place quantifier
// ↓
if (pw.matches("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?@[\\]^_`{|}~-]+$]+")){
System.out.println("password contains only valid characters");
} else {
System.out.println("invalid characters in password");
}
Update:
In your regex you are not escaping [, which makes [\]^_`{|}~-] a separate character class that is merged into the outer character class. This nested class does not include \ or [. If you really want to accept only alphanumeric characters and !"#$%&'()*+,-./:;<=>?@[]^_`{|}~ then consider using
"[\\w\\Q!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\\E]+"
as regex.
\\w represents [a-zA-Z0-9_]
and \Q and \E delimit a quoted section, which is a mechanism for escaping metacharacters, even inside a character class.

It's because you're using find() and not matches(). That said, I'd try the opposite: do a find on [^<legal chars>] (note the caret) to match any illegal character. It's faster because it stops as soon as it hits something illegal. Also, start with the simple legal characters, then build up from there. Regular expressions can get hard to read, and adding one character with special meaning at a time is easier than adding them all at once.
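A sketch of that negated-class approach is below. It assumes the intended set is ASCII letters, digits and ASCII punctuation, and uses the predefined POSIX classes \p{Alnum} and \p{Punct}; the Pattern documentation defines \p{Punct} as one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. The class name PasswordChars and the method hasIllegalCharacter are hypothetical.

```java
import java.util.regex.Pattern;

public class PasswordChars {
    // Matches any single character that is neither an ASCII letter or digit
    // nor ASCII punctuation.
    private static final Pattern ILLEGAL = Pattern.compile("[^\\p{Alnum}\\p{Punct}]");

    // Hypothetical helper: find() stops at the first illegal character, if any.
    static boolean hasIllegalCharacter(String pw) {
        return ILLEGAL.matcher(pw).find();
    }

    public static void main(String[] args) {
        System.out.println(hasIllegalCharacter("abc123!@#")); // false
        System.out.println(hasIllegalCharacter("abc def"));   // true (space is not allowed)
    }
}
```

Using the predefined classes sidesteps the escaping problems of spelling out every punctuation character by hand.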

Using other answers from this question, I found this to work for me. Nothing needs to be escaped between the \Q and \E. They do that for you.
Pattern whitelist = Pattern.compile("^[\\w\\s\\Q!\"#$%&'()*+,-.\\/:;<=>?@[]^_`{|}~\\E]+$");
if (!whitelist.matcher(pw).matches()) {
// error
}
