Replacing illegal character in fileName - java

In Java, I have a file name string. I want to replace all illegal characters in it with '_', keeping only a-z, 0-9, -, . and _.
I tried the following code, but it did not work:
myString = myString.replaceAll("[\\W][^\\.][^-][^_]", "_");

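Your pattern matches a sequence of four characters (one non-word character followed by three more), not a single illegal character. You need to replace everything but [a-zA-Z0-9.-].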
The ^ within the brackets stands for "NOT".
myString = myString.replaceAll("[^a-zA-Z0-9\\.\\-]", "_");
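As a rough sketch of the negated-class approach, here it is wrapped in a small helper. The class name `FileNameSanitizer` and the method `sanitize` are made up for illustration. The underscore is listed in the class only for readability, since replacing `_` with `_` changes nothing anyway.

```java
import java.util.regex.Pattern;

class FileNameSanitizer {
    // Anything that is NOT a letter, digit, dot, underscore or hyphen.
    // The hyphen sits last in the class so it is read as a literal, not a range.
    private static final Pattern ILLEGAL = Pattern.compile("[^a-zA-Z0-9._-]");

    static String sanitize(String fileName) {
        // replaceAll returns a new String; the argument itself is unchanged.
        return ILLEGAL.matcher(fileName).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("my report (final).txt")); // my_report__final_.txt
    }
}
```

Compiling the pattern once is a design choice that only matters if the helper is called often; a plain `replaceAll` on the string behaves the same.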

If you are targeting the Windows platform, you can try the solution below, which keeps every character that is valid in a file name and replaces only \ / : * ? " < > |.
fileName = fileName.replaceAll("[\\\\/:*?\"<>|]", "_");

Keep it simple.
myString = myString.replaceAll("[^a-zA-Z0-9.-]", "_");
http://ideone.com/TINsr4

Even simpler
myString = myString.replaceAll("[^\\w.-]", "_");
Predefined Character Classes:
\w A word character: [a-zA-Z_0-9]

I know there have been some answers here already, but I would like to point out that I had to alter the given suggestions slightly.
filename.matches("^.*[^a-zA-Z0-9._-].*$")
This is what I had to use with .matches in Java to get the desired result (.matches has to match the whole string, which is why the character class is wrapped in .* on both sides). I am not sure it is 100% correct, but it worked for me: it returns true if the string contains any character other than a-z, A-Z, 0-9, (.), (_) and (-).
I would like to know if there are any flaws with my logic here.
In previous answers I've seen some discussion of what should or should not be escaped. In this example I've gotten away without escaping anything, but you should escape the (-) minus character to be safe, as it will "break" your expression unless it is at the start or the end of the list.
The (.) dot character doesn't seem to need escaping within the ([]) square brackets, but it will not hurt if you do escape it.
Please see Java Patterns for more details.
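To illustrate why the position of the hyphen matters, here is a small sketch. The class `HyphenPosition` and the helper `inClass` are hypothetical names used only for this demonstration.

```java
class HyphenPosition {
    // Hypothetical helper: true if the single character c matches the class.
    static boolean inClass(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        // ".-_" is a range from '.' (0x2E) to '_' (0x5F): it covers digits and
        // uppercase letters, but not the hyphen itself (0x2D).
        System.out.println(inClass("[.-_]", 'A'));   // true
        System.out.println(inClass("[.-_]", '-'));   // false

        // With the hyphen last, or escaped, it is a literal character.
        System.out.println(inClass("[._-]", 'A'));   // false
        System.out.println(inClass("[._-]", '-'));   // true
        System.out.println(inClass("[.\\-_]", '-')); // true
    }
}
```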

If you want to allow more than just [A-Za-z0-9], check the MS Naming Conventions, and don't forget to filter out "...Characters whose integer representations are in the range from 1 through 31,...".
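A possible way to combine the Windows reserved characters with that control-character range is sketched below. `WindowsFileNames` and `sanitize` are illustrative names, and the sketch covers only the characters mentioned in this thread, not every Windows naming rule.

```java
class WindowsFileNames {
    // Replaces \ / : * ? " < > | and the control characters 1 through 31
    // (written as the hex range \x01-\x1F) with an underscore.
    static String sanitize(String fileName) {
        return fileName.replaceAll("[\\\\/:*?\"<>|\\x01-\\x1F]", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a:b?c\td.txt")); // a_b_c_d.txt
        System.out.println(sanitize("report<1>|x"));  // report_1__x
    }
}
```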

Related

Java - Regex Replace All will not replace matched text

I am trying to remove a lot of Unicode characters from a string but am having issues with regex in Java.
Example text:
\u2605 StatTrak\u2122 Shadow Daggers
Example Desired Result:
StatTrak Shadow Daggers
The current regex code I have that will not work:
list.replaceAll("\\\\u[0-9]+","");
The code will execute but the text will not be replaced. From looking at other solutions people seem to use only two "\\" but anything less than 4 throws me the typical error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 2
\u[0-9]+
I've tried the current regex solution in online test environments like RegexPlanet and FreeFormatter and both give the correct result.
Any help would be appreciated.
I assume you would like to replace these "special" characters with an empty string. As far as I can see, \u2605 and \u2122 fall outside the POSIX character class \p{Print} (printable ASCII). That's why we can replace every non-printable character with "", which gives the result you expect (apart from the space left over at the start).
A sample would be:
list = list.replaceAll("\\P{Print}", "");
Hope this helps.
In Java, something like your \u2605 is not a literal sequence of six characters; it represents a single Unicode character. Your pattern "\\\\u[0-9]{4}" will therefore not match it.
Your pattern describes a literal character \ followed by the character u followed by exactly four numeric characters 0 through 9, but what is in your string is the single character at Unicode code point 2605, the "Black Star" character.
This works just like other escape sequences: in the string "some\tmore" there is no character \ and no character t. There is only the single character 0x09, a tab. Because \t is an escape sequence known to Java (and other languages), it is replaced by the character it represents, and the literal \ and t are no longer characters in the string.
Kenny Tai Huynh's answer (replacing non-printables) may be the easiest way to go, depending on what sorts of things you want removed. Alternatively, you could list the characters you want to keep (if that is a very limited set) and remove the complement of those, such as mystring.replaceAll("[^A-Za-z0-9]", "");
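The difference between an escape resolved by the compiler and the same six characters stored as plain text can be sketched as follows. The class and method names are hypothetical, and the second helper assumes the input really contains a backslash followed by 'u' and four hex digits (for example, text read from a file that was never unescaped).

```java
class UnicodeEscapes {
    // Removes every character outside printable ASCII (POSIX class Print).
    static String stripNonPrintable(String s) {
        return s.replaceAll("\\P{Print}", "");
    }

    // Removes literal six-character sequences: a backslash, the letter u
    // and four hex digits, stored in the string as ordinary text.
    static String stripLiteralEscapes(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}", "");
    }

    public static void main(String[] args) {
        String compiled = "\u2605 StatTrak\u2122 Shadow Daggers";  // escapes resolved by the compiler
        String literal = "\\u2605 StatTrak\\u2122 Shadow Daggers"; // escapes kept as plain text

        System.out.println("\u2605".length());  // 1
        System.out.println("\\u2605".length()); // 6

        // Both calls leave a leading space behind, hence the trim().
        System.out.println(stripNonPrintable(compiled).trim());  // StatTrak Shadow Daggers
        System.out.println(stripLiteralEscapes(literal).trim()); // StatTrak Shadow Daggers
    }
}
```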
I'm an idiot. I was calling the replaceAll on the string but not assigning it as I thought it altered the string anyway.
What I had previously:
list.replaceAll("\\\\u[0-9]+","");
What I needed:
list = list.replaceAll("\\\\u[0-9]+","");
Result works fine now, thanks for the help.

Validate string has no illegal characters

I'm trying to validate a string that only allows letters, numbers and these characters:
!"#$%&'()*+,-./:;<=>?#[\]^_`{|}~
I tried doing this, but it's not working: it lets me enter characters that are not in the regex. I'm still pretty new to Java; something similar was working in JavaScript, but I can't figure out what's going on here. I think it behaves as "if it can't find any of the characters mentioned, then return 4".
Pattern allowedCharacters = Pattern.compile("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]");
if (!allowedCharacters.matcher(pw).find()){
    return 4;
}
Any help is appreciated. Thanks
EDIT:
I also tried:
if (pw.matches("^[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]")){
    return 4;
}
and
if (!pw.matches("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]")){
    return 4;
}
matcher.find() checks whether the string contains a substring that matches the regex, so with !matcher.find() you are checking that there is no match of the regex anywhere in the tested string.
Consider using matcher.matches() to check whether the entire string is matched by the regex. In that case you will have to add a quantifier like *, + or {n,m} to the character class to decide on the password length. Otherwise it will only accept single-character passwords.
Here is a demo of what your code could look like:
// the quantifier is the final + after the last ]
if (pw.matches("[A-Za-z0-9!\"#$%&'()*+,.\\/:;<=>?#[\\]^_`{|}~-]+$]+")){
    System.out.println("password contains only valid characters");
} else {
    System.out.println("invalid characters in password");
}
Update:
In your regex you are not escaping [, which makes [\]^_`{|}~-] a separate character class that is merged into the outer character class. This nested class does not include \ or [. If you are really interested in accepting only alphanumeric characters and !"#$%&'()*+,-./:;<=>?#[]^_`{|}~ then consider using
"[\\w\\Q!\"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~\\E]+"
as regex.
\\w represents [a-zA-Z0-9_],
and \Q and \E delimit a quoted section, which is a mechanism to escape metacharacters, even inside a character class.
It's because you're using find() and not matches(). That said, I'd try the opposite: doing find on [^<legal chars>] (note the caret) to match any illegal character. It's faster because it'll fail as soon as it hits something illegal. Also, start with the simple legal characters, then move up from there. Regular expressions can get hard to read, and adding characters with special meaning one at a time is easier than adding them all at once.
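The "search for an illegal character" idea could look roughly like this. The whitelist is deliberately reduced to a few symbols to keep the pattern readable, and `IllegalCharFinder` and `firstIllegalIndex` are names invented for the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IllegalCharFinder {
    // Small whitelist (letters, digits and a few symbols); the caret negates it,
    // so the pattern matches one character that is NOT allowed.
    private static final Pattern ILLEGAL = Pattern.compile("[^A-Za-z0-9!#$%*+-]");

    // Returns the index of the first illegal character, or -1 if there is none.
    static int firstIllegalIndex(String pw) {
        Matcher m = ILLEGAL.matcher(pw);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("abc123!")); // -1
        System.out.println(firstIllegalIndex("abc 123")); // 3
    }
}
```

Returning the index instead of a boolean is just one option; it makes it easy to tell the user which character was rejected.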
Using other answers from this question, I found this to work for me. Nothing needs to be escaped between the \Q and \E. They do that for you.
Pattern whitelist = Pattern.compile("^[\\w\\s\\Q!\"#$%&'()*+,-.\\/:;<=>?#[]^_`{|}~\\E]+$");
if (!whitelist.matcher(pw).matches()) {
    // error
}
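A variation on the same idea, as a sketch only: `Pattern.quote` produces the \Q...\E wrapper for you, so the punctuation list can be kept as one plain string. The list below is copied from the question as written; `QuotedWhitelist` and `isValid` are hypothetical names.

```java
import java.util.regex.Pattern;

class QuotedWhitelist {
    // The punctuation to allow, written once as plain text (only the usual
    // Java string escapes for the double quote and the backslash are needed).
    private static final String PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~";

    // Pattern.quote wraps the text in a quoted section, so none of its
    // characters act as regex metacharacters inside the class.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\w" + Pattern.quote(PUNCTUATION) + "]+");

    static boolean isValid(String pw) {
        // matches() requires the whole string to consist of allowed characters.
        return ALLOWED.matcher(pw).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abc_123!?")); // true
        System.out.println(isValid("a b"));       // false, space is not allowed
    }
}
```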

How to search for a special character in java string?

I am having a problem searching for the special character "(".
I get a java.util.regex.PatternSyntaxException.
It might have something to do with "(" being treated as a special character.
I am not very good with pattern expressions. Can someone help me search for this character properly?
// I need to split the string at the "("
String myString = "Room Temperature (C)";
String splitList[] = myString.split ("("); // i got an exception
// I tried this but got compile error
String splitList[] = myString.split ("\(");
Try one of these:
string.split("\\(");
string.split(Pattern.quote("("));
Since a string split takes a regular expression as an argument, you need to escape characters properly. See Jon Skeet's answer on this here:
The reason you got an exception the first time is because split() takes a regular expression as argument, and ( has a special meaning there, as you suggest. To avoid this, you need to escape it using a \, like you tried.
What you missed, is that you also need to escape your backslashes with an extra \ in Java, meaning you need a total of two:
String splitList[] = myString.split ("\\(");
You need to escape the character via backslashes: string.split("\\(");
( is one of the regex special characters. To escape it you can use, e.g.,
split(Pattern.quote("(")),
split("\\Q(\\E"),
split("\\("),
split("[(]").
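For completeness, a short sketch showing that the escaping variants split the example string the same way. `SplitOnParen` and `splitAtParen` are illustrative names only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnParen {
    static String[] splitAtParen(String s) {
        // Pattern.quote turns "(" into a literal, so no manual escaping is needed.
        return s.split(Pattern.quote("("));
    }

    public static void main(String[] args) {
        String myString = "Room Temperature (C)";
        System.out.println(Arrays.toString(splitAtParen(myString)));  // [Room Temperature , C)]
        System.out.println(Arrays.toString(myString.split("\\("))); // [Room Temperature , C)]
        System.out.println(Arrays.toString(myString.split("[(]")));  // [Room Temperature , C)]
    }
}
```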

Need a regular expression for field which should allow special characters, alphanumeric characters, and spaces

I am using the following regex:
[a-zA-Z0-9-#.()/%&\\s]{0,19}.
The requirement is that the field should allow anything and that its size should be 19.
Let me know if any corrections are needed. Any help is appreciated.
You simply need to escape the special characters. Try:
[a-zA-Z0-9\-#\.\(\)\/%&\s]{0,19}
You can test your regular expressions on http://rubular.com/
Your regex is incorrect in at least one way: if you're considering a hyphen to be a "special character", then you should put it at the beginning or end of the character class. So: [a-zA-Z0-9#.()/%&\s-]{0,19}.
Characters that are "special" within the context of the regex itself usually lose that meaning inside a character class. So you're fine with ., ( and ). But check your parser to make sure that it understands what \s means. It might be simpler just to put a space.
Also, if your regex parser tends to delimit the regex with slashes, then you may have to escape the slash in the middle of the character class: [a-zA-Z0-9#.()\/%&\s-]{0,19}.
Just escape the dash, or put it at the beginning or at the end of the character class:
[a-zA-Z0-9\\-#.()/%&\\s]{0,19}
or
[-a-zA-Z0-9#.()/%&\\s]{0,19}
or
[a-zA-Z0-9#.()/%&\\s-]{0,19}
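As a minimal sketch, assuming the regex is used from Java, the last variant could be applied like this. `FieldValidator` and `isValid` are made-up names.

```java
class FieldValidator {
    // Hyphen placed last so it is a literal; {0,19} caps the length at 19.
    // String.matches requires the whole field to match.
    static boolean isValid(String field) {
        return field.matches("[a-zA-Z0-9#.()/%&\\s-]{0,19}");
    }

    public static void main(String[] args) {
        System.out.println(isValid("Room-12 (A/B) #4"));     // true
        System.out.println(isValid("abc!"));                 // false, '!' is not in the class
        System.out.println(isValid("12345678901234567890")); // false, 20 characters
    }
}
```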

Unescaped "." still matches when used in a negation group

I made, what I believed to be, an error in a regular expression in Java recently but when I test my code I don't get the error I expect.
The expression I created was meant to replace a password in a string that I received from another source. The pattern I used went along the lines of: "password: [^\\s.]*", the idea being that it would match the word "password" the colon, a space, then any characters except for a space or a full-stop (period). I would then replace the instance with "password: XXXXXX" and therefore mask it.
The obvious error should be that I have forgotten to escape the full-stop. In other words, the proper expression should have been "password: [^\\s\\.]*". Thing is, if I don't escape the full-stop the code still works!
Here's some sample code:
import java.util.regex.*;

public class SimpleRegexTest {
    public static void main(String[] args) {
        Pattern simplePattern = Pattern.compile("password: [^\\s.]*");
        Matcher simpleMatcher = simplePattern.matcher("password: newpass. Enjoy.");
        String maskedString = simpleMatcher.replaceAll("password: XXXXXX");
        System.out.println(maskedString);
    }
}
When I run the above code I get the following output:
password: XXXXXX. Enjoy.
Is this a special case, or have I completely missed something?
(edit: changed to "escape the full-stop")
Michael Borgwardt: I couldn't think of another term to describe what I was doing apart from "negation group", sorry for the ambiguity.
Aviator: In this case, no, a space won't be in the password. I didn't make the rules ;-).
(edit: doubled up the slashes in the non-code text so it displays properly, added the ^ which was in the code, but not the text :-/)
Sundar: Fixed the double slashes, SO seems to have its own escape characters.
A period ('.' character) does not need to be escaped inside a character class [] in a regular expression.
From the API:
Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.
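A quick way to see this for yourself is sketched below. `DotInClass` and `mask` are names invented for the example.

```java
class DotInClass {
    static String mask(String text) {
        // Inside [...] the dot is a literal full stop, so the class stops at
        // the first whitespace character or full stop.
        return text.replaceAll("password: [^\\s.]*", "password: XXXXXX");
    }

    public static void main(String[] args) {
        System.out.println("a".matches("."));     // true: outside a class, dot matches any character
        System.out.println("a".matches("[.]"));   // false: inside a class, dot is literal
        System.out.println("a".matches("[\\.]")); // false: the escaped form behaves the same
        System.out.println(mask("password: newpass. Enjoy.")); // password: XXXXXX. Enjoy.
    }
}
```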
It looks like you got the negation operator mixed up for regex ranges.
In particular, my understanding is that you used the snippet [\s.]* to mean "any characters except for a space or a full-stop (period)." This would in fact be expressed as [^ .]*, using the caret to negate the characters in the set.
I don't know if this was just a typo in your post or what was actually in your code, but the regex as it stands in your question will match the word "password", a colon, a space, then any sequence of backslash characters, "s" characters or periods.
