Java use regex to extract file name

I need to get a file name from file's absolute path (I am aware of the file.getName() method, but I cannot use it here).
EDIT: I cannot use file.getName() because I don't need only the file name; I also need part of the file's path (but, again, not the entire absolute path). I need the part of the file's path AFTER a certain given path.
Let's say the file is located in the folder:
C:\Users\someUser
On a Windows machine, if I build the pattern string as follows:
String patternStr = "C:\\Users\\someUser\\(.*+)";
I get an exception: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence for backslash.
If I use Pattern.quote(File.pathSeparator):
String patternStr = "C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) + "someUser" + Pattern.quote(File.separator) + "(.*+)";
the resulting pattern string is: C:\Q;\EUsers\Q;\EsomeUser\Q;\E(.*+) which of course has no match with the actual fileName "C:\Users\someUser\myFile.txt".
What am I missing here? What is the proper way to parse file name?

What is the proper way to parse file name?
The proper way to parse a file name is to use File(String). Using a regex for this is going to hard-wire platform dependencies into your code. That's a bad idea.
I know you said you can't use File.getName() ... but that is the proper solution. If you would care to say why you can't use File.getName() perhaps I could suggest an alternative solution.

If you do want to use a regular expression, you should use
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
^^ ^^ ^^
instead.
Why? Your string literal
"C:\\Users\\someUser\\(.*+)"
is compiled to
C:\Users\someUser\(.*+)
Since \ is used for escaping in regular expressions too, you'll have to escape them "twice".
Regarding your edit:
You probably want to have a look at URI.relativize(). Example:
File base = new File("C:/Users/someUser");
File file = new File("C:/Users/someUser/someDir/someFile.txt");
String relativePath = base.toURI().relativize(file.toURI()).getPath();
System.out.println(relativePath); // prints "someDir/someFile.txt"
(Note that / works as file-separator on Windows machines too.)
Btw, I don't know what you have as File.separator on your system, but if it's set to \, then
"C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) +
"someUser" + Pattern.quote(File.separator) + "(.*+)";
should yield
C:\Q\\EUsers\Q\\EsomeUser\Q\\E(.*+)
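As a minimal sketch of the escaping issue discussed above: instead of doubling every backslash by hand, the whole base directory can be passed through Pattern.quote, so that none of its characters is read as regex syntax. The class and method names (PrefixRegexSketch, pathAfter) are illustrative assumptions and do not come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixRegexSketch {

    // Returns the part of fullPath that follows baseDir, or null when
    // fullPath does not start with baseDir. Both arguments are plain
    // strings, so the method behaves the same on every platform.
    static String pathAfter(String baseDir, String fullPath) {
        // Pattern.quote makes the regex engine match baseDir literally,
        // including its backslashes and dots.
        Pattern p = Pattern.compile(Pattern.quote(baseDir) + "(.*)");
        Matcher m = p.matcher(fullPath);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String base = "C:\\Users\\someUser\\";
        System.out.println(pathAfter(base, "C:\\Users\\someUser\\someDir\\myFile.txt"));
    }
}
```

This only compares strings; it does not normalize "..", case, or mixed separators, which is one reason the File-based answers in this thread are usually preferable.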

String patternStr = "C:\\Users\\someUser\\(.*+)";
Backslashes (\) are escape characters in the Java Language. Your string contains the following after compilation:
C:\Users\someUser\(.*+)
This string is then parsed as a regex, which uses backslashes as an escape character as well. The regex parser tries to understand the escaped \U, \s and \(. One of them is incorrect regarding the regex syntax (hence your exception), and none of them are what you are trying to achieve.
Try
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

If you want to solve it by pattern you need to escape your Pattern properly
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

Try putting double-double-backslashes in your pattern. You need a second backslash to escape each one in the pattern, and then you need to double each of those to escape them in the string literal. Hence you'll end up with something like:
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";

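Scan backwards from the end of the string to the first file path separator, or to the beginning of the string if there is none.
The file path separator can be / or \ (the code below also stops at the volume separator :).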
public static final char ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR = '/';
public static final char DIRECTORY_SEPARATOR_CHAR = '\\';
public static final char VOLUME_SEPARATOR_CHAR = ':';
public static String getFileName(String path) {
if(path == null || path.isEmpty()) {
return path;
}
int length = path.length();
int index = length;
while(--index >= 0) {
char c = path.charAt(index);
if(c == ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR || c == DIRECTORY_SEPARATOR_CHAR || c == VOLUME_SEPARATOR_CHAR) {
return path.substring(index + 1, length);
}
}
return path;
}
Try to keep it simple ;-).
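The same backward scan can also be sketched with lastIndexOf, which avoids the explicit loop. This is a compact variant under the same assumptions as the answer above (separators are /, \ and :); the names FileNameSketch and fileName are made up for illustration.

```java
public class FileNameSketch {

    // Returns everything after the last '/', '\' or ':' in path,
    // or the whole string when none of these characters occurs.
    static String fileName(String path) {
        if (path == null || path.isEmpty()) {
            return path;
        }
        int lastSeparator = Math.max(
                Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')),
                path.lastIndexOf(':'));
        // lastSeparator is -1 when nothing was found, so substring(0)
        // returns the input unchanged.
        return path.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileName("C:\\Users\\someUser\\myFile.txt"));
        System.out.println(fileName("C:/Users/someUser/myFile.txt"));
    }
}
```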

Try this :
String ResultString = null;
try {
Pattern regex = Pattern.compile("([^\\\\/:*?\"<>|\r\n]+$)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group(1);
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
Output :
myFile.txt
Also for input : C:/Users/someUser/myFile.txt
Output : myFile.txt

What am I missing here? What is the proper way to parse file name?
The proper way to parse a file name is to use the APIs that are already provided for the purpose. You've stated that you can't use File.getName(), without explanation. You are almost certainly mistaken about that.

I cannot use file.getName() because I don't need the file name only; I need the part of the file's path as well (but again, not the entire absolute path).
OK. So what you want is something like this.
// Canonicalize paths to deal with ".", "..", symlinks,
// relative files and case sensitivity issues.
// Note: getCanonicalPath() can throw IOException.
String directory = new File(someDirectory).getCanonicalPath();
String test = new File(somePathname).getCanonicalPath();
if (!directory.endsWith(File.separator)) {
    directory += File.separator;
}
if (test.startsWith(directory)) {
    String pathInDirectory = test.substring(directory.length());
    ...
}
Advantages:
No regexes needed.
Doesn't break if the path separator is something other than \.
Doesn't break if there are symbolic links on the path.
Doesn't break due to case sensitivity issues.
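A similar result can be sketched with java.nio.file.Path, which is a different technique from the File-based string comparison above. Note the assumption: normalize() only collapses "." and ".." lexically and, unlike getCanonicalPath(), does not resolve symbolic links or case differences. The names RelativePathSketch and pathInside are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class RelativePathSketch {

    // Returns the part of 'file' that lies below 'directory', or an empty
    // Optional when 'file' is not located inside 'directory'.
    static Optional<Path> pathInside(Path directory, Path file) {
        Path dir = directory.toAbsolutePath().normalize();
        Path target = file.toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements, so "base2" is not
        // treated as being inside "base".
        if (!target.startsWith(dir)) {
            return Optional.empty();
        }
        return Optional.of(dir.relativize(target));
    }

    public static void main(String[] args) {
        System.out.println(pathInside(Paths.get("base"), Paths.get("base/sub/file.txt")));
        System.out.println(pathInside(Paths.get("base"), Paths.get("base/../other/file.txt")));
    }
}
```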

Suppose the file name contains special characters, especially when supporting Mac, where special characters are allowed in file names. On the server side, Path.GetFileName(fileName) then fails and throws an error because of illegal characters in the path. The following regex-based code comes to the rescue. (Note that this answer is C#/.NET rather than Java.)
The regex takes care of two things:
In IE, when a file is uploaded, the file path contains the folders as well (e.g. c:\samplefolder\subfolder\sample.xls). The expression below replaces all folders with an empty string and retains the file name.
On a Mac, only the file name is supplied, since the browser is Safari, which allows special characters in the file name.
var regExpDir = @"(^[\w]:\\)([\w].+\w\\)";
fileName = Regex.Replace(fileName, regExpDir, string.Empty);

Related

URLEncoder - what character set to use for empty space instead of %20 or +

I am trying to open new email from my Java app:
String str=String.valueOf(email);
String body="This is body";
String subject="Hello worlds";
String newStr="mailto:"+str.trim()+"?subject="+URLEncoder.encode(subject,"UTF-8")+"&body="+URLEncoder.encode(body, "UTF-8")+"";
Desktop.getDesktop().mail(new URI(newStr));
Here is my URL encoding. Since I cannot put the body or subject string into the URL without encoding them, my output contains "+" instead of whitespace. That is normal, I understand that. Is there a way to display the subject and body normally in my message? I tried .replace("+"," "), but that does not work; it gives an error.
I think there might be different character set but I am not sure.

That's the way URLEncoder works.
One possible approach would be to replace all + with %20 after URLEncoder.encode(...)
Or you could rely on URI constructor to encode your parameters correctly:
String scheme = "mailto";
String recipient = "recipient@snakeoil.com";
String subject = "The Meaning of Life";
String content = "..., the universe and all the rest is 42.\n Rly? Just kidding. Special characters: äöü";
String path = "";
String query = "subject=" + subject + "&body=" + content;
Desktop.getDesktop().mail(new URI(scheme, recipient, path, query, null));
Both solutions have caveats:
The first approach is only safe because URLEncoder has already turned literal + signs into %2B, so the replacement touches encoded spaces only; with the second, you'll have issues with the & character.
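The first approach can be sketched as follows. It relies on documented URLEncoder behavior: a space becomes "+", while a literal "+" becomes "%2B", so replacing "+" afterwards only affects encoded spaces. The helper names (MailtoEncodingSketch, encodeComponent, mailtoUri) and the example address are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MailtoEncodingSketch {

    // Form-encodes the value, then rewrites the "+" that URLEncoder uses
    // for spaces into "%20", which mail clients display as a space.
    static String encodeComponent(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    static String mailtoUri(String recipient, String subject, String body) {
        return "mailto:" + recipient.trim()
                + "?subject=" + encodeComponent(subject)
                + "&body=" + encodeComponent(body);
    }

    public static void main(String[] args) {
        System.out.println(mailtoUri("someone@example.com", "Hello worlds", "This is body"));
    }
}
```

The resulting string can then be passed to new URI(...) and Desktop.getDesktop().mail(...) as in the question.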

Java regex for google maps url?

I want to parse all google map links inside a String. The format is as follows :
1st example
https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298
https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z
https://www.google.com/maps/place//#38.8976763,-77.0387185,17z
https://maps.google.com/maps/place//#38.8976763,-77.0387185,17z
https://www.google.com/maps/place/#38.8976763,-77.0387185,17z
https://google.com/maps/place/#38.8976763,-77.0387185,17z
http://google.com/maps/place/#38.8976763,-77.0387185,17z
https://www.google.com.tw/maps/place/#38.8976763,-77.0387185,17z
These are all valid google map URLs (linking to White House)
Here is what I tried
String gmapLinkRegex = "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/.*)?#(.*z)[^ ]*";
Pattern patternGmapLink = Pattern.compile(gmapLinkRegex , Pattern.CASE_INSENSITIVE);
Matcher m = patternGmapLink.matcher(s);
while (m.find()) {
logger.info("group0 = {}" , m.group(0));
String place = m.group(4);
place = StringUtils.stripEnd(place , "/"); // remove tailing '/'
place = StringUtils.stripStart(place , "place/"); // remove header 'place/'
logger.info("place = '{}'" , place);
String latLngZ = m.group(5);
logger.info("latLngZ = '{}'" , latLngZ);
}
It works in simple situations, but it is still buggy.
For example:
It needs post-processing to grab the optional place information.
And it cannot handle one line containing two URLs, such as:
s = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z " +
" and http://google.com/maps/place/#38.8976763,-77.0387185,17z";
It should be two urls , but the regex matches the whole line ...
The points:
The whole URL should be matched in group(0) (including the trailing data part in the 1st example).
In the 1st example, if the zoom level (17z) is removed, it is still a valid gmap URL, but my regex cannot match it.
Extracting the optional place info should be easier.
Lat/lng extraction is a must; the zoom level is optional.
It should be able to parse multiple URLs in one line.
It should be able to process maps.google.com(.xx)/maps; I tried (www|maps\.)? but it still seems buggy.
Any suggestions to improve this regex? Thanks a lot!

The dot-asterisk
.*
will always allow anything to the end of the last url.
You need "tighter" regexes, which match a single URL but not several with anything in between.
The "[^ ]*" might include the next URL if it is separated by something other than " ", which includes line break, tab, shift-space...
I propose (sorry, not tested on java), to use "anything but #" and "digit, minus, comma or dot" and "optional special string followed by tailored charset, many times".
"(http|https)://(www\.)?google\.com(\.\w*)?/maps/(place/[^#]*)?#([0123456789\.,-]*z)(\/data=[\!:\.\-0123456789abcdefmsx]+)?"
I tested the one above on a perl-regex compatible engine (np++).
Please adapt yourself, if I guessed anything wrong. The explicit list of digits can probably be replaced by "\d", I tried to minimise assumptions on regex flavor.
In order to match "URL" or "URL and URL", please use a variable storing the regex, then do "(URL and )*URL", replacing "URL" with the regex variable. (Assuming this is possible in Java.) If the question is how to then retrieve the multiple matches: that is Java, I cannot help. Let me know and I delete this answer, not to provoke deserved downvotes ;-)
(Edited to catch the data part in, previously not seen, first example, first line; and the multi URLs in one line.)
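On the Java side, the "tighter regex" idea can be sketched with a Matcher.find() loop, which retrieves several URLs from one line without wrapping the pattern in "(URL and )*URL". This is an untested-against-real-data sketch, not a complete validator. Assumptions: the coordinates are introduced by either "#" (as the links appear in this thread) or "@" (as real Google Maps links use), and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MapLinkSketch {

    // Groups: 1 = place (optional), 2 = latitude, 3 = longitude,
    // 4 = zoom (optional). No ".*" is used, so one match cannot run
    // across whitespace into the next URL.
    private static final Pattern MAP_LINK = Pattern.compile(
            "https?://(?:www\\.|maps\\.)?google\\.com(?:\\.\\w+)?/maps/"
            + "(?:place/([^@#\\s/]*)/?)?"
            + "[@#](-?\\d+\\.\\d+),(-?\\d+\\.\\d+)"
            + "(?:,(\\d+)z)?"
            + "(?:/data=\\S+)?",
            Pattern.CASE_INSENSITIVE);

    // Each entry holds {whole URL, place, latitude, longitude, zoom};
    // optional parts that are absent are null.
    static List<String[]> findLinks(String text) {
        List<String[]> result = new ArrayList<>();
        Matcher m = MAP_LINK.matcher(text);
        while (m.find()) {
            result.add(new String[] {
                m.group(0), m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z "
                + " and http://google.com/maps/place/#38.8976763,-77.0387185,17z";
        for (String[] link : findLinks(s)) {
            System.out.println(link[0] + " lat=" + link[2] + " lng=" + link[3]);
        }
    }
}
```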

I wrote this regex to validate google maps links:
"(http:|https:)?\\/\\/(www\\.)?(maps.)?google\\.[a-z.]+\\/maps/?([\\?]|place/*[^#]*)?/*#?(ll=)?(q=)?(([\\?=]?[a-zA-Z]*[+]?)*/?#{0,1})?([0-9]{1,3}\\.[0-9]+(,|&[a-zA-Z]+=)-?[0-9]{1,3}\\.[0-9]+(,?[0-9]+(z|m))?)?(\\/?data=[\\!:\\.\\-0123456789abcdefmsx]+)?"
I tested with the following list of google maps links:
String location1 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location2 = "https://www.google.com.tw/maps/place/#38.8976763,-77.0387185,17z";
String location3 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location4 = "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298";
String location5 = "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z";
String location6 = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z";
String location7 = "https://maps.google.com/maps/place//#38.8976763,-77.0387185,17z";
String location8 = "https://www.google.com/maps/place/#38.8976763,-77.0387185,17z";
String location9 = "https://google.com/maps/place/#38.8976763,-77.0387185,17z";
String location10 = "http://google.com/maps/place/#38.8976763,-77.0387185,17z";
String location11 = "https://www.google.com/maps/place/#/data=!4m2!3m1!1s0x3135abf74b040853:0x6ff9dfeb960ec979";
String location12 = "https://maps.google.com/maps?q=New+York,+NY,+USA&hl=no&sll=19.808054,-63.720703&sspn=54.337928,93.076172&oq=n&hnear=New+York&t=m&z=10";
String location13 = "https://www.google.com/maps";
String location14 = "https://www.google.fr/maps";
String location15 = "https://google.fr/maps";
String location16 = "http://google.fr/maps";
String location17 = "https://www.google.de/maps";
String location18 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location19 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location20 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location21 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location22 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location23 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location24 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location25 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location26 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location27 = "http://google.com/maps/bylatlng?lat=21.01196022&lng=105.86298748";
String location28 = "https://www.google.com/maps/place/C%C3%B4ng+vi%C3%AAn+Th%E1%BB%91ng+Nh%E1%BA%A5t,+354A+%C4%90%C6%B0%E1%BB%9Dng+L%C3%AA+Du%E1%BA%A9n,+L%C3%AA+%C4%90%E1%BA%A1i+H%C3%A0nh,+%C4%90%E1%BB%91ng+%C4%90a,+H%C3%A0+N%E1%BB%99i+100000,+Vi%E1%BB%87t+Nam/#21.0121535,105.8443773,13z/data=!4m2!3m1!1s0x3135ab8ee6df247f:0xe6183d662696d2e9";

How to dynamically update absolute path

Given the below incoming path, e.g.
C:\cresttest\parent_3\child_3_1\child_3_1_.txt
How can one update and add new dir in between above path to construct below result
C:\cresttest\NEW_PATH\parent_3\child_3_1\child_3_1_.txt
Currently I am using multiple subString to identify the incoming path, but incoming path are random and dynamic. Using substring and placing my new path requires more line of code or unnecessary processing, is there any API or way to easily update and add my new dir in between the absolute path?

By using java.nio.file.Path, you could do the following:
Path incomingPath = Paths.get("C:\\cresttest\\parent_3\\child_3_1\\child_3_1_.txt");
// subpath() returns a relative path without the root, so re-attach C:\ explicitly:
// C:\ + cresttest, then add NEW_PATH to it
Path subPathWithAddition = incomingPath.getRoot().resolve(incomingPath.subpath(0, 1)).resolve("NEW_PATH");
// Concatenating C:\cresttest\NEW_PATH with parent_3\child_3_1\child_3_1_.txt
Path finalPath = subPathWithAddition.resolve(incomingPath.subpath(1, incomingPath.getNameCount()));
You could then get the path URI by calling finalPath.toUri()
Note: this doesn't depend on any names in your path, it depends on the directory depth though, which you could edit in the subpath calls.
Note 2: you could probably reduce the amount of Path instances you make to one, I made three to improve readability.
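The depth-based idea above can be generalized into a small helper that keeps the root (if any) and inserts the new directory before the name element at a given index. This is a sketch; InsertDirectorySketch and insertDirectory are hypothetical names, and the index is assumed to lie between 0 and the number of name elements minus one.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class InsertDirectorySketch {

    // Appends 'name' to 'base', treating a null base as "nothing yet".
    private static Path append(Path base, Path name) {
        return base == null ? name : base.resolve(name);
    }

    // Rebuilds 'path' element by element and inserts 'newDir' in front of
    // the element at position 'index'. The root (e.g. C:\ or /) is kept.
    static Path insertDirectory(Path path, int index, String newDir) {
        Path result = path.getRoot(); // null for relative paths
        for (int i = 0; i < path.getNameCount(); i++) {
            if (i == index) {
                result = append(result, Paths.get(newDir));
            }
            result = append(result, path.getName(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Path incoming = Paths.get("cresttest/parent_3/child_3_1/child_3_1_.txt");
        System.out.println(insertDirectory(incoming, 1, "NEW_PATH"));
    }
}
```

A relative path with forward slashes is used in main so that the example behaves the same on Windows and Linux; on Windows an absolute input such as C:\cresttest\... keeps its C:\ root.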

You may simply insert a path at the second backslash like this:
String path="C:\\cresttest\\parent_3\\child_3_1\\child_3_1_.txt";
final String slash="\\\\";
path=path.replaceFirst(slash+"[^"+slash+"]+"+slash, "$0NEW_PATH"+slash);
System.out.println(path);
This replaces the first occurrence of \arbitrarydirname\ with itself (referred to via $0) followed by NEW_PATH\.
The separator’s source code representation looks a bit odd ("\\\\") as a backslash has to be escaped twice when writing regular expression in a Java String literal.
If you want your operation to be platform independent, you may replace that line with
final String slash = Pattern.quote(FileSystems.getDefault().getSeparator());
Of course, then, the input path must be in the right format for the platform as well.

You can use this simple regex replace:
path = path.replaceAll(":.\\w+", "$0\\\\NEW_PATH");
Your code would be simpler if you used / instead of \ for your path delimiters. eg, compare:
String path = "C:\\cresttest\\parent_3\\child_3_1\\child_3_1_.txt";
path = path.replaceAll(":.\\w+", "$0\\\\NEW_PATH");
with
String path = "C:/cresttest/parent_3/child_3_1/child_3_1_.txt";
path = path.replaceAll(":.\\w+", "$0/NEW_PATH");
Java can handle either delimiter on windows, but on linux only / works, so to make your code portable and more readable, prefer using /.

Just for fun, not sure whether this is what you wanted
public static String addFolderToPath(String originalPath, String newFolderName, int position){
String returnString = "";
String[] pathArray = originalPath.split("\\\\");
for(int i = 0; i<pathArray.length; i++){
returnString = returnString.concat(i==position ? "\\" + newFolderName : "");
returnString = returnString.concat(i!=0 ? "\\" + pathArray[i] : "" + pathArray[i]);
}
return returnString;
}
Call:
System.out.println(addFolderToPath("c:\\abc\\def\\ghi\\jkl", "test", 1));
System.out.println(addFolderToPath("c:\\abc\\def\\ghi\\jkl", "test", 2));
System.out.println(addFolderToPath("c:\\abc\\def\\ghi\\jkl", "test", 3));
System.out.println(addFolderToPath("c:\\abc\\def\\ghi\\jkl", "test", 4));
Run:
c:\test\abc\def\ghi\jkl
c:\abc\test\def\ghi\jkl
c:\abc\def\test\ghi\jkl
c:\abc\def\ghi\test\jkl

Java : replacing text URL with clickable HTML link

I am trying to do some stuff with replacing String containing some URL to a browser compatible linked URL.
My initial String looks like this :
"hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"
What I want to get is a String looking like :
"hello, i'm some text with an url like <a href="http://www.the-url.com/">http://www.the-url.com/</a> and I need to have an hypertext link !"
I can catch URL with this code line :
String withUrlString = myString.replaceAll(".*://[^<>[:space:]]+[[:alnum:]/]", "HereWasAnURL");
Maybe the regexp expression needs some correction, but it's working fine, need to test in further time.
So the question is how to keep the expression caught by the regexp and just add what's needed to create the link: <a href="caught string">caught string</a>
Thanks in advance for your interest and responses !

Try to use:
myString.replaceAll("(.*://[^<>[:space:]]+[[:alnum:]/])", "<a href=\"$1\">HereWasAnURL</a>");
I didn't check your regex.
By using () you can create groups. The $1 indicates the group index.
$1 will be replaced by the URL.
I asked a similar question: my question
Some examples: Capturing Text in a Group in a regular expression

public static String textToHtmlConvertingURLsToLinks(String text) {
    if (text == null) {
        return text;
    }
    String escapedText = HtmlUtils.htmlEscape(text);
    // $2 is the URL; $1 and $4 put back the surrounding whitespace.
    return escapedText.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
        "$1<a href=\"$2\">$2</a>$4");
}
There may be better REGEXs out there, but this does the trick as long as there is white space after the end of the URL or the URL is at the end of the text. This particular implementation also uses org.springframework.web.util.HtmlUtils to escape any other HTML that may have been entered.
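For readers without Spring on the classpath, the same idea can be sketched with the JDK alone. The escapeHtml helper below is a deliberately minimal stand-in for HtmlUtils.htmlEscape, not an equivalent of it, and LinkifySketch and linkify are hypothetical names. Like the answer above, it treats everything up to the next whitespace as part of the URL, so trailing punctuation ends up inside the link.

```java
import java.util.regex.Pattern;

public class LinkifySketch {

    // Group 1 captures a URL: a known scheme followed by non-whitespace.
    private static final Pattern URL = Pattern.compile("((?:https?|ftp)://\\S+)");

    // Minimal HTML escaping for the sketch; '&' must be replaced first.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Escapes the text, then wraps every URL in an anchor. In the
    // replacement string, $1 refers to the captured URL.
    static String linkify(String text) {
        if (text == null) {
            return null;
        }
        return URL.matcher(escapeHtml(text)).replaceAll("<a href=\"$1\">$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify("some text with an url like http://www.the-url.com/ and more"));
    }
}
```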

For anybody who is searching a more robust solution I can suggest the Twitter Text Libraries.
Replacing the URLs with this library works like this:
new Autolink().autolink(plainText)

The code below replaces links starting with "http" or "https", then links starting with just "www.", and finally also email addresses.
Pattern httpLinkPattern = Pattern.compile("(http[s]?)://(www\\.)?([\\S&&[^.#]]+)(\\.[\\S&&[^#]]+)");
Pattern wwwLinkPattern = Pattern.compile("(?<!http[s]?://)(www\\.+)([\\S&&[^.#]]+)(\\.[\\S&&[^#]]+)");
Pattern mailAddressPattern = Pattern.compile("[\\S&&[^#]]+#([\\S&&[^.#]]+)(\\.[\\S&&[^#]]+)");
String textWithHttpLinksEnabled =
"ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze#asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda";
if (Objects.nonNull(textWithHttpLinksEnabled)) {
Matcher httpLinksMatcher = httpLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = httpLinksMatcher.replaceAll("$0");
final Matcher wwwLinksMatcher = wwwLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = wwwLinksMatcher.replaceAll("$0");
final Matcher mailLinksMatcher = mailAddressPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = mailLinksMatcher.replaceAll("$0");
System.out.println(textWithHttpLinksEnabled);
}
Prints:
ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze#asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda

Assuming your regex works to capture the correct info, you can use group references in your substitution. See the Java regexp tutorial.
In that case, you'd do
myString.replaceAll(....., "$1")
(In a Java replacement string a captured group is written as $1, not \1.)

In case of multiline text you can use this:
text.replaceAll("(\\s|\\^|\\A)((http|https|ftp|mailto):\\S+)(\\s|\\$|\\z)",
"$1<a href='$2'>$2</a>$4");
And here is full example of my code where I need to show user's posts with urls in it:
private static final Pattern urlPattern = Pattern.compile(
"(\\s|\\^|\\A)((http|https|ftp|mailto):\\S+)(\\s|\\$|\\z)");
String userText = ""; // user content from db
String replacedValue = HtmlUtils.htmlEscape(userText);
replacedValue = urlPattern.matcher(replacedValue).replaceAll("$1<a href='$2'>$2</a>$4");
replacedValue = StringUtils.replace(replacedValue, "\n", "<br>");
System.out.println(replacedValue);

Does groovy have an easy way to get a filename without the extension?

Say I have something like this:
new File("test").eachFile() { file->
println file.getName()
}
This prints the full filename of every file in the test directory. Is there a Groovy way to get the filename without any extension? (Or am I back in regex land?)

I believe the grooviest way would be:
file.name.lastIndexOf('.').with {it != -1 ? file.name[0..<it] : file.name}
or with a simple regexp:
file.name.replaceFirst(~/\.[^\.]+$/, '')
also there's an apache commons-io java lib for that kinda purposes, which you could easily depend on if you use maven:
org.apache.commons.io.FilenameUtils.getBaseName(file.name)

The cleanest way.
String fileWithoutExt = file.name.take(file.name.lastIndexOf('.'))

Simplest way is:
'file.name.with.dots.tgz' - ~/\.\w+$/
Result is:
file.name.with.dots

new File("test").eachFile() { file->
println file.getName().split("\\.")[0]
}
This works well for file names like:
foo, foo.bar
But if you have a file foo.bar.jar, then the above code prints out: foo
If you want it to print out foo.bar instead, then the following code achieves that.
new File("test").eachFile() { file->
def names = file.name.split("\\.")
def name = names.size() > 1 ? names[0..-2].join('.') : names[0]
println name
}

The FilenameUtils class, which is part of the apache commons io package, has a robust solution. Example usage:
import org.apache.commons.io.FilenameUtils
String filename = '/tmp/hello-world.txt'
def fileWithoutExt = FilenameUtils.removeExtension(filename)
This isn't the groovy way, but might be helpful if you need to support lots of edge cases.

Maybe not as easy as you expected but working:
new File("test").eachFile {
println it.name.lastIndexOf('.') >= 0 ?
it.name[0 .. it.name.lastIndexOf('.')-1] :
it.name
}

As mentioned in comments, where a filename ends & an extension begins depends on the situation. In my situation, I needed to get the basename (file without path, and without extension) of the following types of files: { foo.zip, bar/foo.tgz, foo.tar.gz } => all need to produce "foo" as the filename sans extension. (Most solutions, given foo.tar.gz would produce foo.tar.)
Here's one (obvious) solution that will give you everything up to the first "."; optionally, you can get the entire extension either in pieces or (in this case) as a single remainder (splitting the filename into 2 parts). (Note: although unrelated to the task at hand, I'm also removing the path as well, by calling file.name.)
file=new File("temp/foo.tar.gz")
file.name.split("\\.", 2)[0] // => return "foo" at [0], and "tar.gz" at [1]
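The difference between cutting at the last dot and cutting at the first dot can be sketched in plain Java (which is also callable from Groovy). The method names are illustrative assumptions. Both methods expect a bare file name, so a path should go through File.getName() first, and both treat a leading dot (as in ".bashrc") as part of the name rather than as an extension separator.

```java
public class BaseNameSketch {

    // "foo.tar.gz" -> "foo.tar": removes only the last extension.
    static String stripLastExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // "foo.tar.gz" -> "foo": removes everything from the first dot that
    // is not the leading character.
    static String stripAllExtensions(String fileName) {
        int dot = fileName.indexOf('.', 1);
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(stripLastExtension("foo.tar.gz"));
        System.out.println(stripAllExtensions("foo.tar.gz"));
    }
}
```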

You can use regular expressions better.
A function like the following would do the trick:
def getExtensionFromFilename(filename) {
def returned_value = ""
m = (filename =~ /(\.[^\.]*)$/)
if (m.size()>0) returned_value = ((m[0][0].size()>0) ? m[0][0].substring(1).trim().toLowerCase() : "");
return returned_value
}

Note
import java.io.File;
def fileNames = [ "/a/b.c/first.txt",
"/b/c/second",
"c:\\a\\b.c\\third...",
"c:\\a\\b\\c\\.text"
]
def fileSeparator = "";
fileNames.each {
// You can keep the below code outside of this loop. Since my example
// contains both windows and unix file structure, I am doing this inside the loop.
fileSeparator= "\\" + File.separator;
if (!it.contains(File.separator)) {
fileSeparator = "\\/"
}
println "File extension is : ${it.find(/((?<=\.)[^\.${fileSeparator}]+)$/)}"
it = it.replaceAll(/(\.([^\.${fileSeparator}]+)?)$/,"")
println "Filename is ${it}"
}
Some of the solutions here (except the one using the Apache library) don't work for this example: c:/test.me/firstfile
If I try to find an extension for the above entry, I get ".me/firstfile" :(
A better approach is to find the last occurrence of File.separator, if present, and then look for the filename or extension.
Note:
There is a little trick below. On Windows, the file separator is \. But this is a special character in regular expressions, so when a variable containing File.separator is used in the regular expression, it has to be escaped. That is why I do this:
def fileSeparator= "\\" + File.separator;
Hope it makes sense :)
Try this out:
import java.io.File;
String strFilename = "C:\\first.1\\second.txt";
// Few other flavors
// strFilename = "/dd/dddd/2.dd/dio/dkljlds.dd"
def fileSeparator= "\\" + File.separator;
if (!strFilename.contains(File.separator)) {
fileSeparator = "\\/"
}
def fileExtension = "";
(strFilename =~ /((?<=\.)[^\.${fileSeparator}]+)$/).each { match, extension -> fileExtension = extension }
println "Extension is:$fileExtension"

// Create an instance of a file (note the path is several levels deep)
File file = new File('/tmp/whatever/certificate.crt')
// To get the single fileName without the path (but with EXTENSION! so not answering the question of the author. Sorry for that...)
String fileName = file.parentFile.toURI().relativize(file.toURI()).getPath()
