How do I scan a folder in Java?

I need to scan a particular folder in Java, and be able to return the integer number of files of a particular type (based on not only extension but also naming convention.) For example, I want to know how many JPG files there are in the \src folder that have a simple integer filename (say, 1.JPG through 30.JPG). Can anyone point me in the right direction? Thx

java.io.File.list(FilenameFilter) is the method you're looking for.
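As a minimal sketch of how that method could be applied to the question's example (counting files such as 1.JPG through 30.JPG), the filter below accepts names that consist only of digits followed by a .jpg extension. The class name, method name, and the exact pattern are assumptions made for illustration; adjust the regex to your own naming convention.

```java
import java.io.File;
import java.util.regex.Pattern;

final class NumberedJpgCounter {
    // Digits only, then ".jpg" in any letter case (1.JPG, 30.jpg, ...).
    private static final Pattern NUMBERED_JPG =
            Pattern.compile("\\d+\\.jpg", Pattern.CASE_INSENSITIVE);

    // Counts regular files in dir whose names match the pattern.
    static int countNumberedJpgs(File dir) {
        String[] names = dir.list((parent, name) ->
                NUMBERED_JPG.matcher(name).matches() && new File(parent, name).isFile());
        // list() returns null when dir is not a readable directory.
        return names == null ? 0 : names.length;
    }
}
```

A call such as NumberedJpgCounter.countNumberedJpgs(new File("src")) would then give the integer count for the src folder.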

I have a method that uses a regex pattern for a rather complicated file structure. Something like that could be used, although I'm sure it could be written more concisely than my example (edited for security).
/**
* Get all non-directory filenames from a given foo/flat directory
*
* @param network
* @param typeRegex
* @param locationRegex
* @return
*/
public List<String> getFilteredFilenames(String network, String typeRegex, String locationRegex) {
String regex = null;
List<String> filenames = new ArrayList<String>();
String directory;
// Look at the something network
if (network.equalsIgnoreCase("foo")) {
// Get the foo files first
directory = this.pathname + "/" + "foo/filtered/flat";
File[] foofiles = getFilenames(directory);
// run the regex if need be.
if (locationRegex != null && typeRegex != null ) {
regex = typeRegex + "." + locationRegex;
//System.out.println(regex);
}
for (int i = 0; i < foofiles.length; i++) {
if (foofiles[i].isFile()) {
String file = foofiles[i].getName();
if (regex == null) {
filenames.add(file);
}
else {
if (file.matches(regex)) {
filenames.add(file);
}
}
}
}
}
return filenames;
}

Related

Remove filename extension of a file list in Java [duplicate]

What's the most efficient way to trim the suffix in Java, like this:
title part1.txt
title part2.html
=>
title part1
title part2
This is the sort of code that we shouldn't be doing ourselves. Use libraries for the mundane stuff, save your brain for the hard stuff.
In this case, I recommend using FilenameUtils.removeExtension() from Apache Commons IO
str.substring(0, str.lastIndexOf('.'))
While using String.substring and String.lastIndexOf in a one-liner is good, there are some issues with how it copes with certain file paths.
Take for example the following path:
a.b/c
Using the one-liner will result in:
a
That's incorrect.
The result should have been c. Because the file lacked an extension but the path had a directory with a . in its name, the one-liner was tricked into returning part of the path as the filename, which is not correct.
Need for checks
Inspired by skaffman's answer, I took a look at the FilenameUtils.removeExtension method of the Apache Commons IO.
In order to recreate its behavior, I wrote a few tests the new method should fulfill, which are the following:
Path            Filename
--------------  --------
a/b/c           c
a/b/c.jpg       c
a/b/c.jpg.jpg   c.jpg
a.b/c           c
a.b/c.jpg       c
a.b/c.jpg.jpg   c.jpg
c               c
c.jpg           c
c.jpg.jpg       c.jpg
(And that's all I've checked for -- there probably are other checks that should be in place that I've overlooked.)
The implementation
The following is my implementation for the removeExtension method:
public static String removeExtension(String s) {
String separator = System.getProperty("file.separator");
String filename;
// Remove the path up to the filename.
int lastSeparatorIndex = s.lastIndexOf(separator);
if (lastSeparatorIndex == -1) {
filename = s;
} else {
filename = s.substring(lastSeparatorIndex + 1);
}
// Remove the extension.
int extensionIndex = filename.lastIndexOf(".");
if (extensionIndex == -1)
return filename;
return filename.substring(0, extensionIndex);
}
Running this removeExtension method with the above tests yields the results listed above.
The method was tested with the following code. As this was run on Windows, the path separator is a \ which must be escaped with a \ when used as part of a String literal.
System.out.println(removeExtension("a\\b\\c"));
System.out.println(removeExtension("a\\b\\c.jpg"));
System.out.println(removeExtension("a\\b\\c.jpg.jpg"));
System.out.println(removeExtension("a.b\\c"));
System.out.println(removeExtension("a.b\\c.jpg"));
System.out.println(removeExtension("a.b\\c.jpg.jpg"));
System.out.println(removeExtension("c"));
System.out.println(removeExtension("c.jpg"));
System.out.println(removeExtension("c.jpg.jpg"));
The results were:
c
c
c.jpg
c
c
c.jpg
c
c
c.jpg
The results are the desired results outlined in the tests the method should fulfill.
String foo = "title part1.txt";
foo = foo.substring(0, foo.lastIndexOf('.'));
BTW, in my case, when I wanted a quick solution to remove a specific extension, this is approximately what I did:
if (filename.endsWith(ext))
return filename.substring(0,filename.length() - ext.length());
else
return filename;
Use a method in com.google.common.io.Files class if your project is already dependent on Google core library. The method you need is getNameWithoutExtension.
You can try this function; it's very basic:
public String getWithoutExtension(String fileFullPath){
return fileFullPath.substring(0, fileFullPath.lastIndexOf('.'));
}
String fileName="foo.bar";
int dotIndex=fileName.lastIndexOf('.');
if(dotIndex>=0) { // to prevent exception if there is no dot
fileName=fileName.substring(0,dotIndex);
}
Is this a trick question? :p
I can't think of a faster way atm.
I found coolbird's answer particularly useful.
But I changed the last result statements to:
if (extensionIndex == -1)
return s;
return s.substring(0, lastSeparatorIndex+1)
+ filename.substring(0, extensionIndex);
as I wanted the full path name to be returned.
So "C:\Users\mroh004.COM\Documents\Test\Test.xml" becomes
"C:\Users\mroh004.COM\Documents\Test\Test" and not
"Test"
filename.substring(filename.lastIndexOf('.'), filename.length()).toLowerCase();
Use a regex. This one replaces the last dot, and everything after it.
String baseName = fileName.replaceAll("\\.[^.]*$", "");
You can also create a Pattern object if you want to precompile the regex.
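A minimal sketch of the precompiled variant, assuming the same regex as above. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class ExtensionStripper {
    // Compiled once and reused: the last dot and everything after it.
    private static final Pattern LAST_EXTENSION = Pattern.compile("\\.[^.]*$");

    static String stripExtension(String fileName) {
        return LAST_EXTENSION.matcher(fileName).replaceAll("");
    }
}
```

Note that the regex only looks at dots, not at path separators, so a path like a.b/c is cut down to a, the same pitfall as the substring one-liner.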
If you use Spring you could use
org.springframework.util.StringUtils.stripFilenameExtension(String path)
Strip the filename extension from the given Java resource path, e.g.
"mypath/myfile.txt" -> "mypath/myfile".
Params: path – the file path
Returns: the path with stripped filename extension
private String trimFileExtension(String fileName)
{
String[] splits = fileName.split( "\\." );
return StringUtils.remove( fileName, "." + splits[splits.length - 1] );
}
String[] splitted = fileName.split("\\."); // split() takes a regex, so the dot must be escaped
String fileNameWithoutExtension = fileName.replace("." + splitted[splitted.length - 1], "");
Create a new File from the image path string:
String imagePath;
File test = new File(imagePath);
test.getName();
test.getPath();
getExtension(test.getName());
public static String getExtension(String uri) {
if (uri == null) {
return null;
}
int dot = uri.lastIndexOf(".");
if (dot >= 0) {
return uri.substring(dot);
} else {
// No extension.
return "";
}
}
org.apache.commons.io.FilenameUtils version 2.4 gives the following answer
public static String removeExtension(String filename) {
if (filename == null) {
return null;
}
int index = indexOfExtension(filename);
if (index == -1) {
return filename;
} else {
return filename.substring(0, index);
}
}
public static int indexOfExtension(String filename) {
if (filename == null) {
return -1;
}
int extensionPos = filename.lastIndexOf(EXTENSION_SEPARATOR);
int lastSeparator = indexOfLastSeparator(filename);
return lastSeparator > extensionPos ? -1 : extensionPos;
}
public static int indexOfLastSeparator(String filename) {
if (filename == null) {
return -1;
}
int lastUnixPos = filename.lastIndexOf(UNIX_SEPARATOR);
int lastWindowsPos = filename.lastIndexOf(WINDOWS_SEPARATOR);
return Math.max(lastUnixPos, lastWindowsPos);
}
public static final char EXTENSION_SEPARATOR = '.';
private static final char UNIX_SEPARATOR = '/';
private static final char WINDOWS_SEPARATOR = '\\';
The best I can write while trying to stick to the Path class:
Path removeExtension(Path path) {
return path.resolveSibling(path.getFileName().toString().replaceFirst("\\.[^.]*$", ""));
}
Don't stress over it, guys. I've done this many times already. Just copy and paste this public static method into your staticUtils library for future use ;-)
static String removeExtension(String path){
String filename;
String foldrpath;
String filenameWithoutExtension;
if(path.equals("")){return "";}
if(path.contains("\\")){ // direct substring method give wrong result for "a.b.c.d\e.f.g\supersu"
filename = path.substring(path.lastIndexOf("\\"));
foldrpath = path.substring(0, path.lastIndexOf('\\'));
if(filename.contains(".")){
filenameWithoutExtension = filename.substring(0, filename.lastIndexOf('.'));
}else{
filenameWithoutExtension = filename;
}
return foldrpath + filenameWithoutExtension;
}else{
int dot = path.lastIndexOf('.');
return dot < 0 ? path : path.substring(0, dot); // no dot: nothing to strip
}
}
I would do like this:
String title_part = "title part1.txt";
int i;
for(i=title_part.length()-1 ; i>=0 && title_part.charAt(i)!='.' ; i--);
title_part = title_part.substring(0,i);
Scan from the end back to the '.', then call substring.
Edit:
Might not be a golf but it's effective :)
Keeping in mind the scenarios where there is no file extension or there is more than one file extension
example Filename : file | file.txt | file.tar.bz2
/**
*
* @param fileName
* @return file extension
* example file.fastq.gz => fastq.gz
*/
private String extractFileExtension(String fileName) {
String type = "undefined";
if (FilenameUtils.indexOfExtension(fileName) != -1) {
String fileBaseName = FilenameUtils.getBaseName(fileName);
int indexOfExtension = -1;
while (fileBaseName.contains(".")) {
indexOfExtension = FilenameUtils.indexOfExtension(fileBaseName);
fileBaseName = FilenameUtils.getBaseName(fileBaseName);
}
type = fileName.substring(indexOfExtension + 1, fileName.length());
}
return type;
}
String img = "example.jpg";
// String imgLink = "http://www.example.com/example.jpg";
URI uri = null;
try {
uri = new URI(img);
String[] segments = uri.getPath().split("/");
System.out.println(segments[segments.length-1].split("\\.")[0]);
} catch (Exception e) {
e.printStackTrace();
}
This will output example for both img and imgLink
private String trimFileName(String fileName)
{
String[] ext;
ext = fileName.split("\\.");
return fileName.replace(ext[ext.length - 1], "");
}
This code splits the file name into parts wherever it contains a ".". For example, if the file name is file-name.hello.txt, it is split into the string array { "file-name", "hello", "txt" }. The last element of this array is the file extension, and we can find it with arrayname.length - 1. Once we know the last element, we replace it with an empty string in the file name. This returns file-name.hello. (with the trailing period). If you also want to remove the last period, prepend a period to the last element of the array in the return line, which should look like:
return fileName.replace("." + ext[ext.length - 1], "");
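One caveat worth illustrating: String.replace removes every occurrence of the target, not just the trailing one. The sketch below puts the split-and-replace approach next to a lastIndexOf-based variant that only cuts the final extension. Both method names are hypothetical, and the second method is a swapped-in alternative rather than part of the answer above.

```java
final class TrimLastExtension {
    // The approach described above: split on dots, then replace ".<last part>".
    static String trimWithReplace(String fileName) {
        String[] ext = fileName.split("\\.");
        return fileName.replace("." + ext[ext.length - 1], "");
    }

    // Alternative: cut only at the last dot, leaving earlier occurrences alone.
    static String trimSuffixOnly(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }
}
```

For notes.txt.old.txt the first method removes both ".txt" occurrences and gives notes.old, while the second gives notes.txt.old.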
public static String removeExtension(String file) {
if(file != null && file.length() > 0) {
while(file.contains(".")) {
file = file.substring(0, file.lastIndexOf('.'));
}
}
return file;
}

Reading text from swf with StuartMacKay's transform-swf library

I need to extract all the texts from some swf files. I'm using Java since I have a lot of modules developed with this language.
Thus, I did a search through the Web for all the free Java library devoted to handle SWF files.
Finally, I found the library developed by StuartMacKay. The library, named transform-swf, may be found on GitHub by clicking here.
The question is: once I extract the GlyphIndexes from a TextSpan, how can I convert the glyphs into characters?
Please, provide a complete working and tested example. No theoretical answer will be accepted nor answers like "it cannot be done", "it ain't possible", etc.
What I know and what I did
I know that the GlyphIndexes are built using a TextTable, which is constructed from an integer that represents the font size and a font description provided by a DefineFont2 object, but when I decode all the DefineFont2 objects, they all have zero-length advances.
Here follows what I did.
//Creating a Movie object from an swf file.
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
//Saving all the decoded DefineFont2 objects.
Map<Integer,DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
if (object instanceof DefineFont2) {
DefineFont2 df2 = (DefineFont2) object;
fonts.put(df2.getIdentifier(), df2);
}
}
//Now I retrieve all the texts
for (MovieTag object : list) {
if (object instanceof DefineText2) {
DefineText2 dt2 = (DefineText2) object;
for (TextSpan ts : dt2.getSpans()) {
Integer fontIdentifier = ts.getIdentifier();
if (fontIdentifier != null) {
int fontSize = ts.getHeight();
// Here I try to create an object that should
// reverse the process done by a TextTable
ReverseTextTable rtt =
new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
System.out.println(rtt.charactersForText(ts.getCharacters()));
}
}
}
}
The class ReverseTextTable follows here:
public final class ReverseTextTable {
private final transient Map<Character, GlyphIndex> characters;
private final transient Map<GlyphIndex, Character> glyphs;
public ReverseTextTable(final DefineFont2 font, final int fontSize) {
characters = new LinkedHashMap<>();
glyphs = new LinkedHashMap<>();
final List<Integer> codes = font.getCodes();
final List<Integer> advances = font.getAdvances();
final float scale = fontSize / EMSQUARE;
final int count = codes.size();
for (int i = 0; i < count; i++) {
characters.put((char) codes.get(i).intValue(), new GlyphIndex(i,
(int) (advances.get(i) * scale)));
glyphs.put(new GlyphIndex(i,
(int) (advances.get(i) * scale)), (char) codes.get(i).intValue());
}
}
//This method should reverse from a list of GlyphIndexes to a String
public String charactersForText(final List<GlyphIndex> list) {
String text="";
for(GlyphIndex gi: list){
text+=glyphs.get(gi);
}
return text;
}
}
Unfortunately, the list of advances from DefineFont2 is empty, so the constructor of ReverseTextTable throws an ArrayIndexOutOfBoundsException.
Honestly, I don't know how to do that in Java. I'm not claiming that it is impossible; I also believe there is a way to do it. However, you said that there are a lot of libraries that do that, and you also suggested one, i.e. swftools. So I suggest resorting to that library to extract the text from a flash file. To do that, you can use Runtime.exec() to run the library from the command line.
Personally, I prefer Apache Commons Exec to the standard library shipped with the JDK. Let me show you how to do it. The executable you should use is "swfstrings.exe". Suppose it is placed in "C:\", and that the same folder contains a flash file, e.g. page.swf. Then I tried the following code (it works fine):
// "C:" + File.separator avoids the unterminated literal "C:\" (the backslash would escape the quote)
Path pathToSwfFile = Paths.get("C:" + File.separator + "page.swf");
CommandLine commandLine = CommandLine.parse("C:" + File.separator + "swfstrings.exe");
commandLine.addArgument("\"" + pathToSwfFile.toString() + "\"");
DefaultExecutor executor = new DefaultExecutor();
executor.setExitValues(new int[]{0, 1}); //Notice that swfstrings.exe returns 1 for success,
//0 for file not found, -1 for error
ByteArrayOutputStream stdout = new ByteArrayOutputStream();
PumpStreamHandler psh = new PumpStreamHandler(stdout);
executor.setStreamHandler(psh);
int exitValue = -1; // stays -1 (treated as a failure) if execute() throws
try{
exitValue = executor.execute(commandLine);
}catch(org.apache.commons.exec.ExecuteException ex){
psh.stop();
}
if(!executor.isFailure(exitValue)){
String out = stdout.toString("UTF-8"); // here you have the extracted text
}
I know, this is not exactly the answer that you requested, but works fine.
I happened to be working on decompiling an SWF in Java now and I came across this question while figuring out how to reverse engineer the original text back.
After looking at the source code, I realised it's really straightforward. Each font has an assigned sequence of characters that can be retrieved by calling DefineFont2.getCodes(), and the glyphIndex is the index to the matching character in DefineFont2.getCodes().
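The index lookup itself can be sketched without the library. The helper below is hypothetical and works on plain lists: codes stands in for the result of DefineFont2.getCodes(), and glyphIndexes stands in for the values of GlyphIndex.getGlyphIndex() collected from a TextSpan. Substituting U+FFFD for out-of-range indexes is an assumption of this sketch, not something the library does.

```java
import java.util.List;

final class GlyphDecoder {
    // Maps each glyph index to the character code stored at that position in codes.
    static String decode(List<Integer> codes, List<Integer> glyphIndexes) {
        StringBuilder sb = new StringBuilder();
        for (int index : glyphIndexes) {
            if (index >= 0 && index < codes.size()) {
                sb.appendCodePoint(codes.get(index));
            } else {
                sb.append('\uFFFD'); // index does not belong to this font
            }
        }
        return sb.toString();
    }
}
```

With codes [72, 105, 33] and glyph indexes [0, 1, 2], the decoded text is "Hi!".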
However, in cases where multiple fonts are in use in a single SWF file, it is difficult to match each DefineText to the corresponding DefineFont2 because there are no attributes that identify the DefineFont2 used for each DefineText.
To work around this issue, I came up with a self-learning algorithm which will attempt to guess the right DefineFont2 for each DefineText and hence derive the original text correctly.
To reverse engineer the original text back, I created a class called FontLearner:
public class FontLearner {
private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
private final HashMap<Integer, HashMap<Character, Integer>> advancesMap = new HashMap<Integer, HashMap<Character, Integer>>();
/**
* The same characters from the same font will have similar advance values.
* This constant defines the allowed difference between two advance values
* before they are treated as the same character
*/
private static final int ADVANCE_THRESHOLD = 10;
/**
* Some characters have outlier advance values despite being compared
* to the same character
* This constant defines the minimum accuracy level for each String
* before it is associated with the given font
*/
private static final double ACCURACY_THRESHOLD = 0.9;
/**
* This method adds a DefineFont2 to the learner, and a DefineText
* associated with the font to teach the learner about the given font.
*
* @param font The font to add to the learner
* @param text The text associated with the font
*/
public void addFont(DefineFont2 font, DefineText text) {
fonts.add(font);
HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
advancesMap.put(font.getIdentifier(), advances);
List<Integer> codes = font.getCodes();
List<TextSpan> spans = text.getSpans();
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
int advance = character.getAdvance();
advances.put(c, advance);
}
}
}
/**
*
* @param text The DefineText to retrieve the original String from
* @return The String retrieved from the given DefineText
*/
public String getString(DefineText text) {
StringBuilder sb = new StringBuilder();
List<TextSpan> spans = text.getSpans();
DefineFont2 font = null;
for (DefineFont2 getFont : fonts) {
List<Integer> codes = getFont.getCodes();
HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
if (advances == null) {
advances = new HashMap<Character, Integer>();
advancesMap.put(getFont.getIdentifier(), advances);
}
boolean notFound = true;
int totalMisses = 0;
int totalCount = 0;
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
totalCount += characters.size();
int misses = 0;
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
if (codes.size() > glyphIndex) {
char c = (char) (int) codes.get(glyphIndex);
Integer getAdvance = advances.get(c);
if (getAdvance != null) {
notFound = false;
if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
misses += 1;
}
}
} else {
notFound = false;
misses = characters.size();
break;
}
}
totalMisses += misses;
}
double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;
if (accuracy > ACCURACY_THRESHOLD && !notFound) {
font = getFont;
// teach this DefineText to the FontLearner if there are
// any new characters
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
int advance = character.getAdvance();
if (advances.get(c) == null) {
advances.put(c, advance);
}
}
}
break;
}
}
if (font != null) {
List<Integer> codes = font.getCodes();
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
sb.append(c);
}
sb = new StringBuilder(sb.toString().trim());
sb.append(" ");
}
}
return sb.toString().trim();
}
}
Usage:
Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());
FontLearner learner = new FontLearner();
DefineFont2 font = null;
List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
if (object instanceof DefineFont2) {
font = (DefineFont2) object;
} else if (object instanceof DefineText) {
DefineText text = (DefineText) object;
if (font != null) {
learner.addFont(font, text);
font = null;
}
String line = learner.getString(text); // reverse engineers the line
}
}
I am happy to say that this method has given me a 100% accuracy in reverse engineering the original String using StuartMacKay's transform-swf library.
What you're trying to achieve seems difficult. You're trying to decompile the file, but I'm sorry to say that it's not possible. What I would suggest is to convert it into a bitmap (if possible), or use some other method, and try to read the characters using OCR.
There is some software that does that; you can also check some forums about it. Decompiling a compiled SWF is very difficult (and not possible, as far as I know). You can check this decompiler if you want, or try using some other languages like the project here.
I had a similar problem with long strings using transform-swf library.
Got the source code and debugged it.
I believe there was a small bug in class com.flagstone.transform.coder.SWFDecoder.
Line 540 (applicable to version 3.0.2), change
dest += length;
with
dest += count;
That should do it for you (it's about extracting strings).
I notified Stuart as well. The problem appears only if your strings are very large.
I know this isn't what you asked but I needed to pull text from SWF recently using Java and found the ffdec library much better than transform-swf
Comment if anyone needs sample code

JUnit testing for this particular method

I'm trying to write some JUnit tests for a Java method that takes a base URL and target URL and returns the target URL relative to the given base URL.
I'm using category-based partitioning to build my test set. Currently I'm testing the following:
check that the two input URLs have the same protocol and host;
check for when the paths aren't the same and that the relative URL adjusts correctly;
check for when the base URL is longer than the target URL;
check for when the target URL is longer than the base URL;
check for when the base URL and target URL are identical;
I was wondering how other people would test this method using JUnit? Am I missing any criteria?
/**
* This method converts an absolute url to an url relative to a given base-url.
* The algorithm is somewhat chaotic, but it works (Maybe rewrite it).
* Be careful, the method is ".mm"-specific. Something like this should be included
* in the libraries, but I couldn't find it. You can create a new absolute url with
* "new URL(URL context, URL relative)".
*/
public static String toRelativeURL(URL base, URL target) {
// Precondition: If URL is a path to folder, then it must end with '/' character.
if( (base.getProtocol().equals(target.getProtocol())) &&
(base.getHost().equals(target.getHost()))) {
String baseString = base.getFile();
String targetString = target.getFile();
String result = "";
//remove filename from URL
baseString = baseString.substring(0, baseString.lastIndexOf("/")+1);
//remove filename from URL
targetString = targetString.substring(0, targetString.lastIndexOf("/")+1);
StringTokenizer baseTokens = new StringTokenizer(baseString,"/");//Maybe this causes problems under windows
StringTokenizer targetTokens = new StringTokenizer(targetString,"/");//Maybe this causes problems under windows
String nextBaseToken = "", nextTargetToken = "";
//Algorithm
while(baseTokens.hasMoreTokens() && targetTokens.hasMoreTokens()) {
nextBaseToken = baseTokens.nextToken();
nextTargetToken = targetTokens.nextToken();
System.out.println("while1");
if (!(nextBaseToken.equals(nextTargetToken))) {
System.out.println("if1");
while(true) {
result = result.concat("../");
System.out.println(result);
if (!baseTokens.hasMoreTokens()) {
System.out.println("break1");
break;
}
System.out.println("break2");
nextBaseToken = baseTokens.nextToken();
}
while(true) {
result = result.concat(nextTargetToken+"/");
System.out.println(result);
if (!targetTokens.hasMoreTokens()) {
System.out.println("break3");
break;
}
System.out.println("break4");
nextTargetToken = targetTokens.nextToken();
}
String temp = target.getFile();
result = result.concat(temp.substring(temp.lastIndexOf("/")+1,temp.length()));
System.out.println("1");
return result;
}
}
while(baseTokens.hasMoreTokens()) {
result = result.concat("../");
baseTokens.nextToken();
}
while(targetTokens.hasMoreTokens()) {
nextTargetToken = targetTokens.nextToken();
result = result.concat(nextTargetToken + "/");
}
String temp = target.getFile();
result = result.concat(temp.substring(temp.lastIndexOf("/")+1,temp.length()));
System.out.println("2");
return result;
}
System.out.println("3");
return target.toString();
}
}
Just a few thoughts...
You might want to test if either (or both) your URL input is null. :)
If the target URL has parameters (ex: http://host/app/bla?param1=value&param2=value), does the generated relative URL contain the parameters?
If the target URL is just http://host, will it cause an IndexOutOfBoundsException on targetString.lastIndexOf("/")? The same applies to the base URL.
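One more criterion that might be worth adding is a round-trip check: resolving the relative URL against the base should give back the target. The sketch below uses java.net.URI.resolve for the resolution step. The helper name is hypothetical, and the relative strings you pass in would come from toRelativeURL in a real test; nothing here asserts what that method actually returns.

```java
import java.net.URI;

final class RoundTripCheck {
    // True when resolving relative against base yields exactly target.
    static boolean resolvesTo(String base, String relative, String target) {
        URI resolved = URI.create(base).resolve(relative);
        return resolved.equals(URI.create(target));
    }
}
```

In a JUnit test this could be wrapped as assertTrue(RoundTripCheck.resolvesTo(base.toString(), toRelativeURL(base, target), target.toString())) for each pair of URLs in the test set.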

How do I get the file extension of a file in Java?

Just to be clear, I'm not looking for the MIME type.
Let's say I have the following input: /path/to/file/foo.txt
I'd like a way to break this input up, specifically into .txt for the extension. Is there any built in way to do this in Java? I would like to avoid writing my own parser.
In this case, use FilenameUtils.getExtension from Apache Commons IO
Here is an example of how to use it (you may specify either full path or just file name):
import org.apache.commons.io.FilenameUtils;
// ...
String ext1 = FilenameUtils.getExtension("/path/to/file/foo.txt"); // returns "txt"
String ext2 = FilenameUtils.getExtension("bar.exe"); // returns "exe"
Maven dependency:
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
Gradle Groovy DSL
implementation 'commons-io:commons-io:2.6'
Gradle Kotlin DSL
implementation("commons-io:commons-io:2.6")
Others https://search.maven.org/artifact/commons-io/commons-io/2.6/jar
Do you really need a "parser" for this?
String extension = "";
int i = fileName.lastIndexOf('.');
if (i > 0) {
extension = fileName.substring(i+1);
}
Assuming that you're dealing with simple Windows-like file names, not something like archive.tar.gz.
Btw, for the case that a directory may have a '.', but the filename itself doesn't (like /path/to.a/file), you can do
String extension = "";
int i = fileName.lastIndexOf('.');
int p = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
if (i > p) {
extension = fileName.substring(i+1);
}
private String getFileExtension(File file) {
String name = file.getName();
int lastIndexOf = name.lastIndexOf(".");
if (lastIndexOf == -1) {
return ""; // empty extension
}
return name.substring(lastIndexOf);
}
If you use Guava library, you can resort to Files utility class. It has a specific method, getFileExtension(). For instance:
String path = "c:/path/to/file/foo.txt";
String ext = Files.getFileExtension(path);
System.out.println(ext); //prints txt
In addition you may also obtain the filename with a similar function, getNameWithoutExtension():
String filename = Files.getNameWithoutExtension(path);
System.out.println(filename); //prints foo
If on Android, you can use this:
String ext = android.webkit.MimeTypeMap.getFileExtensionFromUrl(file.getName());
This is a tested method
public static String getExtension(String fileName) {
char ch;
int len;
if(fileName==null ||
(len = fileName.length())==0 ||
(ch = fileName.charAt(len-1))=='/' || ch=='\\' || //in the case of a directory
ch=='.' ) //in the case of . or ..
return "";
int dotInd = fileName.lastIndexOf('.'),
sepInd = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
if( dotInd<=sepInd )
return "";
else
return fileName.substring(dotInd+1).toLowerCase();
}
And test case:
@Test
public void testGetExtension() {
assertEquals("", getExtension("C"));
assertEquals("ext", getExtension("C.ext"));
assertEquals("ext", getExtension("A/B/C.ext"));
assertEquals("", getExtension("A/B/C.ext/"));
assertEquals("", getExtension("A/B/C.ext/.."));
assertEquals("bin", getExtension("A/B/C.bin"));
assertEquals("hidden", getExtension(".hidden"));
assertEquals("dsstore", getExtension("/user/home/.dsstore"));
assertEquals("", getExtension(".strange."));
assertEquals("3", getExtension("1.2.3"));
assertEquals("exe", getExtension("C:\\Program Files (x86)\\java\\bin\\javaw.exe"));
}
If you use Spring framework in your project, then you can use StringUtils
import org.springframework.util.StringUtils;
StringUtils.getFilenameExtension("YourFileName")
String path = "/Users/test/test.txt";
String extension = "";
if (path.contains("."))
extension = path.substring(path.lastIndexOf("."));
This gives ".txt".
If you want only "txt", use path.lastIndexOf(".") + 1.
In order to take into account file names without characters before the dot, you have to use that slight variation of the accepted answer:
String extension = "";
int i = fileName.lastIndexOf('.');
if (i >= 0) {
extension = fileName.substring(i+1);
}
"file.doc" => "doc"
"file.doc.gz" => "gz"
".doc" => "doc"
My dirty and maybe tiniest solution, using String.replaceAll:
.replaceAll("^.*\\.(.*)$", "$1")
Note that first * is greedy so it will grab most possible characters as far as it can and then just last dot and file extension will be left.
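To make the behaviour concrete, here is the same regex wrapped in a small method; the class and method names are made up for this sketch. One thing it shows is that a name without any dot does not match the pattern at all, so it comes back unchanged rather than as an empty extension.

```java
final class RegexExtension {
    // Greedy ^.* consumes up to the last dot; the captured group is what follows it.
    static String extensionOf(String fileName) {
        return fileName.replaceAll("^.*\\.(.*)$", "$1");
    }
}
```

So foo.txt gives txt and archive.tar.gz gives gz, but README gives README.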
As is obvious from all the other answers, there's no adequate "built-in" function. This is a safe and simple method.
String getFileExtension(File file) {
if (file == null) {
return "";
}
String name = file.getName();
int i = name.lastIndexOf('.');
String ext = i > 0 ? name.substring(i + 1) : "";
return ext;
}
Here is another one-liner for Java 8.
String ext = Arrays.stream(fileName.split("\\.")).reduce((a,b) -> b).orElse(null);
It works as follows:
Split the string into an array of strings using "."
Convert the array into a stream
Use reduce to get the last element of the stream, i.e. the file extension
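The steps above can be sketched as a small method so the edge cases are easy to try. The class and method names are assumptions for illustration. Because split always returns the whole name when there is no dot, a file without an extension yields its full name, and null only appears when split produces an empty array (for example for the name ".").

```java
import java.util.Arrays;

final class LastSegment {
    // Last element of the dot-separated parts, or null if there are no parts.
    static String lastSegment(String fileName) {
        return Arrays.stream(fileName.split("\\."))
                .reduce((first, second) -> second)
                .orElse(null);
    }
}
```

If you need "no extension" to be reported differently from "extension equals the name", check for the presence of a dot before calling it.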
How about (using Java 1.5 RegEx):
String[] split = fullFileName.split("\\.");
String ext = split[split.length - 1];
If you plan to use Apache commons-io and just want to check the file's extension and then do some operation, you can use this. Here is a snippet:
if(FilenameUtils.isExtension(file.getName(),"java")) {
someoperation();
}
How about JFileChooser? It is not straightforward as you will need to parse its final output...
JFileChooser filechooser = new JFileChooser();
File file = new File("your.txt");
System.out.println("the extension type:"+filechooser.getTypeDescription(file));
which is a MIME type...
OK...I forget that you don't want to know its MIME type.
Interesting code in the following link:
http://download.oracle.com/javase/tutorial/uiswing/components/filechooser.html
/*
* Get the extension of a file.
*/
public static String getExtension(File f) {
String ext = null;
String s = f.getName();
int i = s.lastIndexOf('.');
if (i > 0 && i < s.length() - 1) {
ext = s.substring(i+1).toLowerCase();
}
return ext;
}
Related question:
How do I trim a file extension from a String in Java?
Here's a method that handles .tar.gz properly, even in a path with dots in directory names:
private static final String getExtension(final String filename) {
if (filename == null) return null;
final String afterLastSlash = filename.substring(filename.lastIndexOf('/') + 1);
final int afterLastBackslash = afterLastSlash.lastIndexOf('\\') + 1;
final int dotIndex = afterLastSlash.indexOf('.', afterLastBackslash);
return (dotIndex == -1) ? "" : afterLastSlash.substring(dotIndex + 1);
}
afterLastSlash is created to make finding afterLastBackslash quicker since it won't have to search the whole string if there are some slashes in it.
On older JDKs (before Java 7u6) substring reuses the char[] inside the original String, adding no garbage there; newer JDKs copy the characters instead. Either way, the JVM will probably notice that afterLastSlash is immediately garbage and may keep it off the heap.
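To make the difference between a "compound" extension and a last-dot extension concrete, here is a small sketch that contrasts the two. The class CompoundExtension and its method names are assumptions for illustration; the first-dot logic follows the same idea as the method above, just written more compactly.

```java
public class CompoundExtension {
    // Index where the bare file name starts (just after the last '/' or '\\').
    static int nameStart(String path) {
        return Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\')) + 1;
    }

    // Everything after the FIRST dot of the file name, e.g. "tar.gz".
    static String fullExtension(String path) {
        int dot = path.indexOf('.', nameStart(path));
        return dot == -1 ? "" : path.substring(dot + 1);
    }

    // Everything after the LAST dot of the file name, e.g. "gz".
    static String lastExtension(String path) {
        int dot = path.lastIndexOf('.');
        return dot < nameStart(path) ? "" : path.substring(dot + 1);
    }

    public static void main(String[] args) {
        String path = "/opt/app-1.2/backup.tar.gz";
        System.out.println(fullExtension(path)); // tar.gz
        System.out.println(lastExtension(path)); // gz
    }
}
```

The trade-off is that the first-dot rule also treats a name like report.v2.pdf as having the extension "v2.pdf", so which rule fits depends on how your files are named.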
This particular question gave me a lot of trouble. Then I found a very simple solution, which I'm posting here.
file.getName().toLowerCase().endsWith(".txt");
That's it.
Java 20 EA
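As of Java 20 EA (early-access), there is finally a new method, Path#getExtension, that returns the extension as a String. Since this is an early-access API, check that your target JDK actually ships it before relying on it: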
Paths.get("/Users/admin/notes.txt").getExtension(); // "txt"
Paths.get("/Users/admin/.gitconfig").getExtension(); // "gitconfig"
Paths.get("/Users/admin/configuration.xml.zip").getExtension(); // "zip"
Paths.get("/Users/admin/file").getExtension(); // null
// Modified from EboMike's answer
String extension = "/path/to/file/foo.txt".substring("/path/to/file/foo.txt".lastIndexOf('.'));
extension should have ".txt" in it when run.
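The one-liner above assumes the name contains a dot; without one, lastIndexOf returns -1 and substring throws a StringIndexOutOfBoundsException. A guarded sketch of the same idea follows (the class and method names are hypothetical, chosen for illustration):

```java
public class DottedExtension {
    // Returns the extension including the leading dot (".txt"),
    // or "" when the file name part contains no dot.
    static String dottedExtension(String path) {
        int dot = path.lastIndexOf('.');
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // A dot that sits before the last separator belongs to a directory name.
        return dot > sep ? path.substring(dot) : "";
    }

    public static void main(String[] args) {
        System.out.println(dottedExtension("/path/to/file/foo.txt")); // .txt
        System.out.println(dottedExtension("/path.d/foo").isEmpty()); // true
    }
}
```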
Here's the version with Optional as a return value (cause you can't be sure the file has an extension)... also sanity checks...
import java.io.File;
import java.util.Optional;
public class GetFileExtensionTool {
public static Optional<String> getFileExtension(File file) {
if (file == null) {
throw new NullPointerException("file argument was null");
}
if (!file.isFile()) {
throw new IllegalArgumentException("getFileExtension(File file)"
+ " called on File object that wasn't an actual file"
+ " (perhaps a directory or device?). file had path: "
+ file.getAbsolutePath());
}
String fileName = file.getName();
int i = fileName.lastIndexOf('.');
if (i > 0) {
return Optional.of(fileName.substring(i + 1));
} else {
return Optional.empty();
}
}
}
How about REGEX version:
static final Pattern PATTERN = Pattern.compile("(.*)\\.(.*)");
Matcher m = PATTERN.matcher(path);
if (m.find()) {
System.out.println("File path/name: " + m.group(1));
System.out.println("Extension: " + m.group(2));
}
or with null extension supported:
static final Pattern PATTERN =
Pattern.compile("((.*\\" + File.separator + ")?(.*)(\\.(.*)))|(.*\\" + File.separator + ")?(.*)");
class Separated {
String path, name, ext;
}
Separated parsePath(String path) {
Separated res = new Separated();
Matcher m = PATTERN.matcher(path);
if (m.find()) {
if (m.group(1) != null) {
res.path = m.group(2);
res.name = m.group(3);
res.ext = m.group(5);
} else {
res.path = m.group(6);
res.name = m.group(7);
}
}
return res;
}
Separated sp = parsePath("/root/docs/readme.txt");
System.out.println("path: " + sp.path);
System.out.println("name: " + sp.name);
System.out.println("Extension: " + sp.ext);
Result on *nix:
path: /root/docs/
name: readme
Extension: txt
On Windows, parsePath("c:\\windows\\readme.txt") gives:
path: c:\windows\
name: readme
Extension: txt
String extension = com.google.common.io.Files.getFileExtension("fileName.jpg");
Here is a small method I wrote. It is not very robust and does little error checking, but for a simple Java program of your own it is enough to find the file type. It does not handle complex file types, but those are normally not used as much. Note that a path with no dot after the last slash comes back whole (upper-cased) rather than as an empty string.
public static String getFileType(String path){
String fileType = null;
fileType = path.substring(path.indexOf('.',path.lastIndexOf('/'))+1).toUpperCase();
return fileType;
}
Getting File Extension from File Name
/**
* The extension separator character.
*/
private static final char EXTENSION_SEPARATOR = '.';
/**
* The Unix separator character.
*/
private static final char UNIX_SEPARATOR = '/';
/**
* The Windows separator character.
*/
private static final char WINDOWS_SEPARATOR = '\\';
/**
* The system separator character.
*/
private static final char SYSTEM_SEPARATOR = File.separatorChar;
/**
* Gets the extension of a filename.
* <p>
* This method returns the textual part of the filename after the last dot.
* There must be no directory separator after the dot.
* <pre>
* foo.txt --> "txt"
* a/b/c.jpg --> "jpg"
* a/b.txt/c --> ""
* a/b/c --> ""
* </pre>
* <p>
* The output will be the same irrespective of the machine that the code is running on.
*
* @param filename the filename to retrieve the extension of.
* @return the extension of the file or an empty string if none exists.
*/
public static String getExtension(String filename) {
if (filename == null) {
return null;
}
int index = indexOfExtension(filename);
if (index == -1) {
return "";
} else {
return filename.substring(index + 1);
}
}
/**
* Returns the index of the last extension separator character, which is a dot.
* <p>
* This method also checks that there is no directory separator after the last dot.
* To do this it uses {@link #indexOfLastSeparator(String)} which will
* handle a file in either Unix or Windows format.
* <p>
* The output will be the same irrespective of the machine that the code is running on.
*
* @param filename the filename to find the last path separator in, null returns -1
* @return the index of the last separator character, or -1 if there
* is no such character
*/
public static int indexOfExtension(String filename) {
if (filename == null) {
return -1;
}
int extensionPos = filename.lastIndexOf(EXTENSION_SEPARATOR);
int lastSeparator = indexOfLastSeparator(filename);
return (lastSeparator > extensionPos ? -1 : extensionPos);
}
/**
* Returns the index of the last directory separator character.
* <p>
* This method will handle a file in either Unix or Windows format.
* The position of the last forward or backslash is returned.
* <p>
* The output will be the same irrespective of the machine that the code is running on.
*
* @param filename the filename to find the last path separator in, null returns -1
* @return the index of the last separator character, or -1 if there
* is no such character
*/
public static int indexOfLastSeparator(String filename) {
if (filename == null) {
return -1;
}
int lastUnixPos = filename.lastIndexOf(UNIX_SEPARATOR);
int lastWindowsPos = filename.lastIndexOf(WINDOWS_SEPARATOR);
return Math.max(lastUnixPos, lastWindowsPos);
}
Credits
Copied from the Apache Commons IO FilenameUtils class - http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/1.3.2/org/apache/commons/io/FilenameUtils.java#FilenameUtils.getExtension%28java.lang.String%29
Without using any library, you can use the String method split as follows:
String[] splits = fileNames.get(i).split("\\.");
String extension = "";
if(splits.length >= 2)
{
extension = splits[splits.length-1];
}
private String getExtension(File file)
{
String fileName = file.getName();
String[] ext = fileName.split("\\.");
return ext[ext.length -1];
}
Just a regular-expression based alternative. Not that fast, not that good.
Pattern pattern = Pattern.compile("\\.([^.]*)$");
Matcher matcher = pattern.matcher(fileName);
if (matcher.find()) {
String ext = matcher.group(1);
}
I like the simplicity of spectre's answer. One of its comments links to an answer by EboMike, on another question, that handles dots in file paths.
Without implementing some sort of third party API, I suggest:
private String getFileExtension(File file) {
String name = file.getName().substring(Math.max(file.getName().lastIndexOf('/'),
file.getName().lastIndexOf('\\')) < 0 ? 0 : Math.max(file.getName().lastIndexOf('/'),
file.getName().lastIndexOf('\\')));
int lastIndexOf = name.lastIndexOf(".");
if (lastIndexOf == -1) {
return ""; // empty extension
}
return name.substring(lastIndexOf + 1); // doesn't return "." with extension
}
Something like this may be useful in, say, any of ImageIO's write methods, where the file format has to be passed in.
Why use a whole third party API when you can DIY?
The fluent way:
public static String fileExtension(String fileName) {
return Optional.of(fileName.lastIndexOf(".")).filter(i-> i >= 0)
.filter(i-> i > fileName.lastIndexOf(File.separator))
.map(fileName::substring).orElse("");
}
Try this:
String[] extension = "adadad.adad.adnandad.jpg".split("\\.(?=[^\\.]+$)"); // ["adadad.adad.adnandad", "jpg"]
String ext = extension[1]; // "jpg"

Generating a canonical path

Does anyone know of any Java libraries I could use to generate canonical paths (basically, remove back-references)?
I need something that will do the following:
Raw Path -> Canonical Path
/../foo/ -> /foo
/foo/ -> /foo
/../../../ -> /
/./foo/./ -> /foo
//foo//bar -> /foo/bar
//foo/../bar -> /bar
etc...
At the moment I lazily rely on using:
new File("/", path).getCanonicalPath();
But this resolves the path against the actual file system, and is synchronised.
java.lang.Thread.State: BLOCKED (on object monitor)
at java.io.ExpiringCache.get(ExpiringCache.java:55)
- waiting to lock <0x93a0d180> (a java.io.ExpiringCache)
at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:137)
at java.io.File.getCanonicalPath(File.java:559)
The paths that I am canonicalising do not exist on my file system, so just the logic of the method will do me fine, thus not requiring any synchronisation. I'm hoping for a well tested library rather than having to write my own.
I think you can use the URI class to do this; e.g. if the path contains no characters that need escaping in a URI path component, you can do this.
String normalized = new URI(path).normalize().getPath();
If the path contains (or might contain) characters that need escaping, the multi-argument constructors will escape the path argument, and you can provide null for the other arguments.
Notes:
The above normalizes a file path by treating it as a relative URI. If you want to normalize an entire URI ... including the (optional) scheme, authority, and other components, don't call getPath()!
URI normalization does not involve looking at the file system as File canonicalization does. But the flip side is that normalization behaves differently to canonicalization when there are symbolic links in the path.
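As a rough sketch of the multi-argument variant mentioned above: the four-argument URI constructor (scheme, host, path, fragment) quotes characters such as spaces, and getPath() returns the decoded result. The class name UriNormalize and the sample paths are assumptions for illustration. The sketch sticks to paths with a single leading slash, since a leading double slash may be parsed as the start of an authority component.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriNormalize {
    // Normalizes a path purely syntactically, without touching the file system.
    static String normalize(String path) {
        try {
            // Null scheme, host and fragment: only the path component is set.
            return new URI(null, null, path, null).normalize().getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("/foo/./bar/../baz")); // /foo/baz
        System.out.println(normalize("/my docs/./a/../b")); // /my docs/b
    }
}
```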
Using Apache Commons IO (a well-known and well-tested library)
public static String normalize(String filename)
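will do most of what you're looking for. Note that, per its documentation, it returns null for paths that step above the root (such as /../foo) and keeps a trailing separator; normalizeNoEndSeparator drops the trailing one.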
Example:
String result = FilenameUtils.normalize(myFile.getAbsolutePath());
If you don't need path canonicalization but only normalization, in Java 7 you can use the java.nio.file.Path.normalize method.
According to http://docs.oracle.com/javase/7/docs/api/java/nio/file/Path.html:
This method does not access the file system; the path may not locate a file that exists.
If you work with a File object you can use something like this:
file.toPath().normalize().toFile()
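A short sketch of the same call starting from a String, to show that the path does not have to exist. The class name PathNormalize and the sample path are assumptions for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathNormalize {
    // Purely lexical clean-up: drops "." elements and resolves "name/.." pairs
    // without accessing the file system.
    static Path clean(String raw) {
        return Paths.get(raw).normalize();
    }

    public static void main(String[] args) {
        // This directory does not need to exist for normalize() to work.
        Path cleaned = clean("/no/such/dir/./a/../b");
        System.out.println(cleaned.equals(Paths.get("/no/such/dir/b"))); // true
    }
}
```

Because nothing is resolved against the disk, symbolic links are not followed, which is the main difference from File.getCanonicalPath().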
You could try an algorithm like this:
String collapsePath(String path) {
/* Split into directory parts */
String[] directories = path.split("/");
String[] newDirectories = new String[directories.length];
int i, j = 0;
for (i=0; i<directories.length; i++) {
/* Skip empty parts (leading or repeated slashes) and single dots */
if (directories[i].isEmpty() || directories[i].equals("."))
continue;
/* A double dot drops the previous directory; at the root it is ignored */
if (directories[i].equals("..")) {
if (j > 0)
j--;
}
else
newDirectories[j++] = directories[i];
}
/* Ah, what I would give for String.join() */
StringBuilder newPath = new StringBuilder();
for (i=0; i < j; i++)
newPath.append("/").append(newDirectories[i]);
/* Nothing left means the path collapsed to the root */
return j == 0 ? "/" : newPath.toString();
}
It isn't perfect; it's linear over the number of directories but does make a copy in memory.
What qualifies as a canonical path is OS dependent.
That's why Java needs to check it against the file system.
So there's no simple logic to test the path without knowing the OS.
So, while normalizing can do the trick, here is a procedure that exposes a little more of the Java API than simply calling Path.normalize() would.
Say I want to find a file that is not in my current directory on the file system.
My working code file is
myproject/src/JavaCode.java
Located in myproject/src/. My file is in
../../data/myfile.txt
I'm testing my program running my code from JavaCode.java
public static void main(String[] args) {
findFile("../../data","myfile.txt");
System.out.println("Found it.");
}
public static File findFile(String inputPath, String inputFile) {
File dataDir = new File("").getAbsoluteFile(); // points dataDir to working directory
String delimiters = "" + '\\' + '/'; // dealing with different system separators
StringTokenizer st = new StringTokenizer(inputPath, delimiters);
while(st.hasMoreTokens()) {
String s = st.nextToken();
if(s.trim().isEmpty() || s.equals("."))
continue;
else if(s.equals(".."))
dataDir = dataDir.getParentFile();
else {
dataDir = new File(dataDir, s);
if(!dataDir.exists())
throw new RuntimeException("Data folder does not exist.");
}
}
return new File(dataDir, inputFile);
}
Having placed a file at the specified location, this should print "Found it."
I'm assuming you have strings and you want strings, and you have Java 7 available now, and your default file system uses '/' as a path separator, so try:
String output = FileSystems.getDefault().getPath(input).normalize().toString();
You can try this out with:
/**
* Input Output
* /../foo/ -> /foo
* /foo/ -> /foo
* /../../../ -> /
* /./foo/./ -> /foo
* //foo//bar -> /foo/bar
* //foo/../bar -> /bar
*/
@Test
public void testNormalizedPath() throws URISyntaxException, IOException {
String[] in = new String[]{"/../foo/", "/foo/", "/../../../", "/./foo/./",
"//foo/bar", "//foo/../bar", "/", "/foo"};
String[] ex = new String[]{"/foo", "/foo", "/", "/foo", "/foo/bar", "/bar", "/", "/foo"};
FileSystem fs = FileSystems.getDefault();
for (int i = 0; i < in.length; i++) {
assertEquals(ex[i], fs.getPath(in[i]).normalize().toString());
}
}
