Regex that removes all non-alphanumeric, single periods, and single slashes - java

I'm trying to whitelist characters for filenames and prevent path manipulation. We take a filename returned from the frontend (I know.) and parse it to determine if it's in a specified folder. As such, we need to make sure the user isn't passing in a file that could escape out of the specified folder. This means our case for a valid filename is:
Alphanumeric
Can include single slashes of either direction
Can include single dots but not pairs.
So "APP-TEST-file.20161115.1" is valid but "/../../test//\" needs to have some characters removed prior to checking the filesystem.
Here's the regex I've got now, unfortunately it's removing too much.
public static String validateFilePath(String fileName) {
return fileName.replaceAll("[^A-Za-z0-9]+[(\\.\\/)\\+2]", "");
}
Such that "APP-TEST-file.20161115.1" is becoming "APP-TEST-file0161115.1"
Any help would be appreciated.

Do you want something like this? (I am not clear about what you want!)
String filename = "APP-TEST-file.20161115.1";
// collapse any run of consecutive dots into a single dot
filename = filename.replaceAll("\\.+", ".");
// collapse any run of consecutive forward slashes into a single forward slash
filename = filename.replaceAll("/+", "/");
// collapse any run of consecutive backslashes into a single backslash
filename = filename.replaceAll("\\\\+", "\\\\");
// allow alphanumeric characters, dots and both type of slashes
filename = filename.replaceAll("[^A-Za-z0-9./\\\\]+", "");
System.out.println(filename);
It prints:
APPTESTfile.20161115.1
If filename = "/../../test//\\", then it prints: /././test/\
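Note that the cleaned result above still contains `/./` segments, so stripping characters alone does not show whether the file stays inside the target folder. A different technique (not the regex approach from the answer) is to resolve the name against the base folder and check the normalized result. The following is a minimal sketch under stated assumptions: the class name `SafePathCheck`, the helper `isInsideBase`, and the folder name `uploads` are hypothetical and do not come from the question.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SafePathCheck {
    // Hypothetical helper: true if fileName, resolved against baseDir, stays inside baseDir.
    static boolean isInsideBase(Path baseDir, String fileName) {
        Path base = baseDir.toAbsolutePath().normalize();
        // Treat backslashes as separators too, so "..\\x" is handled on every platform.
        String unified = fileName.replace('\\', '/');
        // Drop leading slashes so the name is always resolved relative to the base.
        String relative = unified.replaceAll("^/+", "");
        try {
            Path resolved = base.resolve(relative).normalize();
            return resolved.startsWith(base);
        } catch (InvalidPathException e) {
            // Names the platform cannot represent as a path are rejected outright.
            return false;
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get("uploads"); // assumed folder name
        System.out.println(isInsideBase(base, "APP-TEST-file.20161115.1"));
        System.out.println(isInsideBase(base, "/../../test//\\"));
    }
}
```

This only checks the textual path; it does not follow symbolic links, so a stricter check would be needed if links inside the folder are a concern.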

Related

Escape the regex character from a Searching String [duplicate]

Does Java have a built-in way to escape arbitrary text so that it can be included in a regular expression? For example, if my users enter "$5", I'd like to match that exactly rather than a "5" after the end of input.
Since Java 1.5, yes:
Pattern.quote("$5");
The difference between Pattern.quote and Matcher.quoteReplacement was not clear to me before I saw the following example:
s.replaceFirst(Pattern.quote("text to replace"),
Matcher.quoteReplacement("replacement text"));
It may be too late to respond, but you can also use Pattern.LITERAL, which would ignore all special characters while formatting:
Pattern.compile(textToFormat, Pattern.LITERAL);
I think what you're after is \Q$5\E. Also see Pattern.quote(s) introduced in Java5.
See Pattern javadoc for details.
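The options mentioned so far can be compared side by side. This is a small sketch, assuming hypothetical helper names `replaceLiteral` and `containsLiteral`; only `Pattern.quote`, `Pattern.LITERAL`, and `Matcher.quoteReplacement` are real JDK API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteDemo {
    // Replaces every literal occurrence of 'literal' in 'input' with the literal text 'with'.
    static String replaceLiteral(String input, String literal, String with) {
        return input.replaceAll(Pattern.quote(literal), Matcher.quoteReplacement(with));
    }

    // Uses the LITERAL flag instead of Pattern.quote to search for plain text.
    static boolean containsLiteral(String input, String literal) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "price: $5";
        // Unquoted, "$5" means "end of input followed by 5", which can never match.
        System.out.println(Pattern.compile("$5").matcher(input).find());
        System.out.println(Pattern.quote("$5"));
        System.out.println(containsLiteral(input, "$5"));
        System.out.println(replaceLiteral(input, "$5", "five dollars"));
    }
}
```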
First off, if you use replaceAll(), you DON'T use Matcher.quoteReplacement(), and the text to be substituted in includes a $1, it won't put a 1 at the end. It will look at the search regex for the first matching group and sub THAT in. That's what $1, $2 or $3 means in the replacement text: matching groups from the search pattern.
I frequently plug long strings of text into .properties files, then generate email subjects and bodies from those. Indeed, this appears to be the default way to do i18n in Spring Framework. I put XML tags, as placeholders, into the strings and I use replaceAll() to replace the XML tags with the values at runtime.
I ran into an issue where a user input a dollars-and-cents figure, with a dollar sign. replaceAll() choked on it, with the following showing up in a stack trace:
java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:374)
at java.util.regex.Matcher.appendReplacement(Matcher.java:748)
at java.util.regex.Matcher.replaceAll(Matcher.java:823)
at java.lang.String.replaceAll(String.java:2201)
In this case, the user had entered "$3" somewhere in their input and replaceAll() went looking in the search regex for the third matching group, didn't find one, and puked.
Given:
// "msg" is a string from a .properties file, containing "<userInput />" among other tags
// "userInput" is a String containing the user's input
replacing
msg = msg.replaceAll("<userInput \\/>", userInput);
with
msg = msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));
solved the problem. The user could put in any kind of characters, including dollar signs, without issue. It behaved exactly the way you would expect.
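The failure and the fix can be reproduced in a few lines. This is a sketch only: the class name `ReplacementDemo`, the helper `fill`, and the sample template text are assumptions made for illustration.

```java
import java.util.regex.Matcher;

final class ReplacementDemo {
    // Substitutes user input into the template, treating the input as literal text.
    static String fill(String template, String userInput) {
        return template.replaceAll("<userInput />", Matcher.quoteReplacement(userInput));
    }

    public static void main(String[] args) {
        String msg = "You entered: <userInput />"; // assumed template text
        try {
            // "$3" is read as a reference to capture group 3, which the pattern lacks.
            msg.replaceAll("<userInput />", "$3.50");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unquoted replacement failed: " + e.getMessage());
        }
        System.out.println(fill(msg, "$3.50"));
    }
}
```

When neither side needs regex features, `String.replace(target, replacement)` treats both arguments literally and avoids the quoting altogether.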
To build a protected pattern, you can prefix every character except digits and letters with a backslash. After that, you can insert your own special symbols into the protected pattern, so that it works as a real pattern of your own rather than as plain quoted text, and without any special symbols supplied by the user.
public class Test {
public static void main(String[] args) {
String str = "y z (111)";
String p1 = "x x (111)";
String p2 = ".* .* \\(111\\)";
p1 = escapeRE(p1);
p1 = p1.replace("x", ".*");
System.out.println( p1 + "-->" + str.matches(p1) );
//.*\ .*\ \(111\)-->true
System.out.println( p2 + "-->" + str.matches(p2) );
//.* .* \(111\)-->true
}
public static String escapeRE(String str) {
//Pattern escaper = Pattern.compile("([^a-zA-Z0-9])");
//return escaper.matcher(str).replaceAll("\\\\$1");
return str.replaceAll("([^a-zA-Z0-9])", "\\\\$1");
}
}
Pattern.quote("blabla") works nicely.
Pattern.quote() works nicely. It encloses the string in "\Q" and "\E", and it also handles any "\E" that already occurs in the string.
However, if you need real regular expression escaping (or custom escaping), you can use this code:
String someText = "Some/s/wText*/,**";
System.out.println(someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));
This prints: Some/s/wText\*/\,\*\*
Code for example and tests:
String someText = "Some\\E/s/wText*/,**";
System.out.println("Pattern.quote: "+ Pattern.quote(someText));
System.out.println("Full escape: "+someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));
^(Negation) symbol is used to match something that is not in the character group.

Java how to handle a \ on input

I am currently trying to split a String folder. I get the value from a file system and it usually looks something like EAM\Testing.
String folder = "EAM\Testing"
String[] parts = folder.split("\\");
I know \ has special rules to it in java.
String folder = "EAM\\Testing"
String[] parts = folder.split("\\\\");
(I know the code above would work if I could control what the input looked like)
My problem is that I can not control what string folder is as input from a location of a file.
Is there a way to get this to work where folder only has one \ in it?
This is for a recycle bin component I am writing for Documentum, an enterprise management system. When a document is deleted and the folder doesn't exist anymore, I want to recreate it, and in order to recreate it the folder names must be separate, as I have to create them one at a time.
Here is how I get the name of the folder.
File f = new File(relationRecord.getRepeatingString(
"dp_original_folder_paths",
i));
(This gives an input such as \EAM\testing)
String folder1 = f.toString();
I then get rid of the first \ by
String folder = folder1.substring(1);
Which gives me EAM\testing
Well if this is literally a file path, you should consider using the Path class, it'll make your life easier.
Path path = Paths.get("C:\\home\\joe\\foo");
System.out.format("toString: %s%n", path.toString());
System.out.format("getFileName: %s%n", path.getFileName());
System.out.format("getName(0): %s%n", path.getName(0));
System.out.format("getNameCount: %d%n", path.getNameCount());
System.out.format("subpath(0,2): %s%n", path.subpath(0,2));
System.out.format("getParent: %s%n", path.getParent());
System.out.format("getRoot: %s%n", path.getRoot());
Your second option
String[] parts = folder.split("\\\\");
Should work fine for your input string. When you write a string literal like "EAM\\Testing", the resulting string has only one backslash, because \\ is the escape sequence for a single backslash in a Java string literal.
The reason you need four slashes in split is because \ is an escape character both for string literals and regular expressions (String#split accepts regular expression as its argument)
You should be doing something like this -
String s = "EAM\\testing";
String a[] = s.split("\\\\");
Here you duplicate the backslash once for the String (since \ is an escape character for String) and again for the regex for the same reason.
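To avoid counting backslashes by hand, the regex can be built with Pattern.quote. This is a minimal sketch; the class name `BackslashSplit` and the helper `splitFolders` are hypothetical, and dropping one leading backslash mirrors what the question does with substring(1).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class BackslashSplit {
    // Splits a folder path such as \EAM\testing into its folder names.
    static String[] splitFolders(String folder) {
        // Drop one leading backslash so the result has no empty first element.
        if (folder.startsWith("\\")) {
            folder = folder.substring(1);
        }
        // Pattern.quote("\\") yields a regex that matches exactly one literal backslash.
        return folder.split(Pattern.quote("\\"));
    }

    public static void main(String[] args) {
        // At runtime this string contains single backslashes: \EAM\testing
        String fromFileSystem = "\\EAM\\testing";
        System.out.println(Arrays.toString(splitFolders(fromFileSystem)));
    }
}
```

The doubled backslashes appear only in source-code literals; a string read from a file system or repository at runtime needs no extra escaping before it is split.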
Your question seems to be "how can I remove a leading \ from a string?":
folder = folder.replaceAll("^\\\\", "");
This searches for a backslash at the start of the string and, if found, replaces it with nothing (i.e. deletes it).
Regarding backslash vs forward slash characters in paths, java handles both.

File path issue with: / -> \

// Dividend Limit check or increase the Dividend
if (dival == 10) {
writer.println("Divident has reached it Limit !");
i++;
// update the file name
String upath = "channel_" + i;
System.out.println(path);
// find channel_1 and replace with the updated path
if (path.contains("channel_1")) {
path = "D:/File Compression/Data/low_freq/low_freq/house_1/"
+ upath + ".dat";
} else {
JOptionPane.showMessageDialog(null, "Invalid File Choosen");
System.exit(0);
}
dival = 10;
} else {
dival = dival + 10;
writer.println("Dividen:" + dival);
}
These lines are in a recursive method. The first time, it gives the right path:
D:/File Compression/Data/low_freq/low_freq/house_1/channel_2.dat
But on the second call it flips the forward slash to back slash:
D:\File Compression\Data\low_freq\low_freq\house_1\channel_1.dat
It works fine if I do not use the condition:
if(path.contains("channel_"))
That is because File.separator on Windows is \. Every time your path String goes through a java.io.File, the slashes are replaced. To fix this, either don't use File as an auxiliary tool, or replace the backslashes with forward slashes.
So what happens is that your path String uses backslashes: you retrieve that String from a java.io.File, which automatically uses backslashes on Windows. If the path contains "channel_1", you then overwrite the whole string with a hardcoded string that uses forward slashes.
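The second suggestion (replacing the backslashes) could look roughly like this. It is a sketch under assumptions: the class `SeparatorNormalizer` and both helper names are hypothetical, and the `channel_<number>` naming scheme is inferred from the paths shown in the question.

```java
final class SeparatorNormalizer {
    // Converts every backslash to a forward slash so string comparisons are stable.
    static String toForwardSlashes(String path) {
        return path.replace('\\', '/');
    }

    // Swaps the channel number in the path without rebuilding the whole string.
    static String withChannel(String path, int channel) {
        return path.replaceAll("channel_\\d+", "channel_" + channel);
    }

    public static void main(String[] args) {
        String fromFile = "D:\\File Compression\\Data\\low_freq\\low_freq\\house_1\\channel_1.dat";
        String normalized = toForwardSlashes(fromFile);
        System.out.println(normalized);
        System.out.println(withChannel(normalized, 2));
    }
}
```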
\ is the escape character in Java string literals and is used for various purposes.
In your case use File.separator
String path = "D:"+File.separator+"File Compression"+File.separator+"Data"+File.separator+"low_freq"+File.separator+"low_freq"+File.separator+"house_1"+File.separator;
Use a double backslash \\. The backslash starts an escape sequence, like \n or \r, so a literal backslash has to be written as \\.
Escape sequence normally used in text files in Windows, specially in notepad.
The primary Java escape sequences are listed below. They are used to represent non-graphical characters and also characters such as double quotes, single quotes, and backslashes. If you'd like to represent a double quote within a String literal, you can do so with \". If you'd like to represent a single quote within a character literal, you can do so with \'.
In addition to the previous answers: you should not hard-code / or \ in your application, because this harms its portability. Rather use
File.separator
File.separator gives you the separator for the system you are running on.
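As an alternative to concatenating File.separator by hand, java.nio.file.Paths.get can join the segments. A short sketch follows; the class `PortablePath`, the helper `channelFile`, and the relative folder names are illustrative assumptions loosely based on the question.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PortablePath {
    // Builds a relative path from segments; the platform separator is inserted for us.
    static Path channelFile(int channel) {
        return Paths.get("Data", "low_freq", "house_1", "channel_" + channel + ".dat");
    }

    public static void main(String[] args) {
        Path p = channelFile(2);
        // The printed separator depends on the platform the code runs on.
        System.out.println(p);
        System.out.println(p.getFileName());
        // The same join done manually with File.separator.
        System.out.println(String.join(File.separator, "Data", "low_freq", "house_1"));
    }
}
```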

Is there any method of JFileChooser which can provide me a file's path with Double slash "\\"?

In NetBeans I am using a JFileChooser to get a file's path. All is OK and it's giving me the path with a single slash \. But I need the path with a double slash \\. So my question is: is there already a method which can provide that type of path? I also don't know the name for a path that has double slashes \\. Example: H:\\New folder\\odesk\\odeskViolin4.wav
What can I do now?
You can simply replace the \ symbols with \\, by using the String.replaceAll() method.
String input = "C:\\Users\\myName"; //special characters have to be escaped.
String doubleSlashed = input.replaceAll("\\\\", "\\\\\\\\");
System.out.println(doubleSlashed);
This will print:
C:\\Users\\myName
Note that String.replaceAll(String pattern, String replacement) takes two arguments and in my example they are four-slashed and eight-slashed strings. This is because the \ symbol is a special character and has to be escaped.
Assuming you truly want to replace single backslashes with double backslashes, you could simply do this:
path = path.replace("\\", "\\\\");
However, you may not actually want double backslashes, depending on your purpose. You should at least be aware of this:
String oneBackSlash = "\\"; //This String will consist of one backslash
String twoBackSlashes = "\\\\"; //This String will consist of two backslashes
//The String below has no double backslashes, only single ones
String path = "H:\\New folder\\odesk\\odeskViolin4.wav";
System.out.println(oneBackSlash);
System.out.println(twoBackSlashes);
System.out.println(path);
Output:
\
\\
H:\New folder\odesk\odeskViolin4.wav
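The two approaches from the answers can be checked against each other. This is a small sketch with hypothetical names (`DoubleBackslash`, `viaReplace`, `viaReplaceAll`); it assumes the goal really is to double every backslash in the runtime string.

```java
final class DoubleBackslash {
    // Literal replacement: one backslash becomes two, no regex involved.
    static String viaReplace(String path) {
        return path.replace("\\", "\\\\");
    }

    // Regex replacement: both the pattern and the replacement need extra escaping.
    static String viaReplaceAll(String path) {
        return path.replaceAll("\\\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        // At runtime this string holds single backslashes: H:\New folder\odesk
        String path = "H:\\New folder\\odesk";
        System.out.println(path);
        System.out.println(viaReplace(path));
        System.out.println(viaReplace(path).equals(viaReplaceAll(path)));
    }
}
```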

How replaceAll "/" in java String to format URL?

String url = "d://test////hello\\\hello";
String separator = File.separator;
url = url.replaceAll("\\*", separator);
url = url.replaceAll("/+", separator);
I want to format this URL, but an error occurs when I attempt to use replaceAll("/+", separator). I also tried escaping "/" as "\\/", but it still doesn't work.
This is the Exception from console:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:686)
at java.util.regex.Matcher.appendReplacement(Matcher.java:703)
at java.util.regex.Matcher.replaceAll(Matcher.java:813)
at java.lang.String.replaceAll(String.java:2189)
Now it works
String separator = null;
if(File.separator.equals("/")) {
separator = "/";
url = url.replaceAll("/+", separator);
url = url.replaceAll("\\\\+", separator);
} else {
separator = Matcher.quoteReplacement(File.separator);
url = url.replaceAll("/+", separator);
url = url.replaceAll("\\\\+", separator);
}
:) it works in javascript
var i = "d:\\ad////df";
alert(i.replace(/\/+/g, '\\'));
Your platform is Windows right? So File.separator will be a backslash right?
The explanation is that the 2nd argument of String.replaceAll is not a simple String. Rather it is a replacement pattern ...
The javadoc says:
"Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired. "
So your replacement String that consists of a single backslash is an invalid literal replacement string. You need to quote the separator String ... like the javadoc says.
(It is a little surprising that you get that particular exception. I can imagine how it could happen, but I'd have thought that they'd deal with bad escapes more elegantly. Mind you, if this was reported as a "bug", Oracle would probably not fix it. A fix would break backwards compatibility.)
Try:
url = url.replaceAll("\\\\+", separator);
You need 4 backward slashes. Escape once for Java string and once for regex meta-character. That is for regex you need two backward slashes \\, and in string you need to escape both of them with another two.
Also, the quantifier * means zero or more, you need to use +.
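Putting the points from these answers together, both kinds of slash can be collapsed in one pass, with the separator quoted for use as a replacement. This is a minimal sketch; the class `SeparatorCollapse` and the helper `normalize` are hypothetical, and the separator is passed as a parameter only so the behaviour can be tried for both platforms.

```java
import java.io.File;
import java.util.regex.Matcher;

final class SeparatorCollapse {
    // Collapses every run of '/' or '\' into a single separator.
    static String normalize(String url, String separator) {
        // quoteReplacement keeps a backslash separator from being read as an escape.
        return url.replaceAll("[/\\\\]+", Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        // At runtime this string is d://test////hello\\hello
        String url = "d://test////hello\\\\hello";
        System.out.println(normalize(url, File.separator));
        System.out.println(normalize(url, "/"));
        System.out.println(normalize(url, "\\"));
    }
}
```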
