Just to be clear, I'm not looking for the MIME type.
Let's say I have the following input: /path/to/file/foo.txt
I'd like a way to break this input up, specifically into .txt for the extension. Is there any built in way to do this in Java? I would like to avoid writing my own parser.
In this case, use FilenameUtils.getExtension from Apache Commons IO.
Here is an example of how to use it (you may specify either full path or just file name):
import org.apache.commons.io.FilenameUtils;
// ...
String ext1 = FilenameUtils.getExtension("/path/to/file/foo.txt"); // returns "txt"
String ext2 = FilenameUtils.getExtension("bar.exe"); // returns "exe"
Maven dependency:
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
Gradle Groovy DSL
implementation 'commons-io:commons-io:2.6'
Gradle Kotlin DSL
implementation("commons-io:commons-io:2.6")
Others https://search.maven.org/artifact/commons-io/commons-io/2.6/jar
Do you really need a "parser" for this?
String extension = "";
int i = fileName.lastIndexOf('.');
if (i > 0) {
extension = fileName.substring(i+1);
}
Assuming that you're dealing with simple Windows-like file names, not something like archive.tar.gz.
Btw, for the case that a directory may have a '.', but the filename itself doesn't (like /path/to.a/file), you can do
String extension = "";
int i = fileName.lastIndexOf('.');
int p = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
if (i > p) {
extension = fileName.substring(i+1);
}
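For reference, the separator-aware check above can be wrapped in a small helper. This is only a sketch: the class name ExtensionSketch and the method name extensionOf are made up for illustration, and it assumes a non-null input.

```java
final class ExtensionSketch {
    // Returns the text after the last dot of the final path segment,
    // or "" when that segment contains no dot. Assumes fileName is non-null.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Position of the last directory separator, Unix or Windows style.
        int sep = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        // A dot only counts if it comes after the last separator.
        return dot > sep ? fileName.substring(dot + 1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("/path/to/file/foo.txt")); // txt
        System.out.println(extensionOf("/path/to.a/file").isEmpty()); // true
    }
}
```

Note that a name such as .hidden still yields "hidden" here, because the dot comes after the last separator.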
private String getFileExtension(File file) {
String name = file.getName();
int lastIndexOf = name.lastIndexOf(".");
if (lastIndexOf == -1) {
return ""; // empty extension
}
return name.substring(lastIndexOf);
}
If you use Guava library, you can resort to Files utility class. It has a specific method, getFileExtension(). For instance:
String path = "c:/path/to/file/foo.txt";
String ext = Files.getFileExtension(path);
System.out.println(ext); //prints txt
In addition you may also obtain the filename with a similar function, getNameWithoutExtension():
String filename = Files.getNameWithoutExtension(path);
System.out.println(filename); //prints foo
If on Android, you can use this:
String ext = android.webkit.MimeTypeMap.getFileExtensionFromUrl(file.getName());
This is a tested method
public static String getExtension(String fileName) {
char ch;
int len;
if(fileName==null ||
(len = fileName.length())==0 ||
(ch = fileName.charAt(len-1))=='/' || ch=='\\' || //in the case of a directory
ch=='.' ) //in the case of . or ..
return "";
int dotInd = fileName.lastIndexOf('.'),
sepInd = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
if( dotInd<=sepInd )
return "";
else
return fileName.substring(dotInd+1).toLowerCase();
}
And test case:
@Test
public void testGetExtension() {
assertEquals("", getExtension("C"));
assertEquals("ext", getExtension("C.ext"));
assertEquals("ext", getExtension("A/B/C.ext"));
assertEquals("", getExtension("A/B/C.ext/"));
assertEquals("", getExtension("A/B/C.ext/.."));
assertEquals("bin", getExtension("A/B/C.bin"));
assertEquals("hidden", getExtension(".hidden"));
assertEquals("dsstore", getExtension("/user/home/.dsstore"));
assertEquals("", getExtension(".strange."));
assertEquals("3", getExtension("1.2.3"));
assertEquals("exe", getExtension("C:\\Program Files (x86)\\java\\bin\\javaw.exe"));
}
If you use Spring framework in your project, then you can use StringUtils
import org.springframework.util.StringUtils;
StringUtils.getFilenameExtension("YourFileName")
String path = "/Users/test/test.txt";
String extension = "";
if (path.contains("."))
extension = path.substring(path.lastIndexOf("."));
This returns ".txt".
If you want only "txt", use path.lastIndexOf(".") + 1.
In order to take into account file names without characters before the dot, you have to use that slight variation of the accepted answer:
String extension = "";
int i = fileName.lastIndexOf('.');
if (i >= 0) {
extension = fileName.substring(i+1);
}
"file.doc" => "doc"
"file.doc.gz" => "gz"
".doc" => "doc"
My quick-and-dirty, and perhaps tiniest, solution uses String.replaceAll:
.replaceAll("^.*\\.(.*)$", "$1")
Note that first * is greedy so it will grab most possible characters as far as it can and then just last dot and file extension will be left.
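A short sketch of how that expression behaves may help. The class and method names here are invented for illustration. One thing worth knowing: when the input contains no dot, the pattern does not match and replaceAll returns the input unchanged, so a name without an extension comes back whole.

```java
final class RegexExtensionSketch {
    // Applies the greedy regex from the answer above.
    // The first ".*" consumes as much as possible, so only the text
    // after the LAST dot ends up in group 1.
    static String lastSuffix(String fileName) {
        return fileName.replaceAll("^.*\\.(.*)$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(lastSuffix("archive.tar.gz")); // gz
        System.out.println(lastSuffix("README"));         // README (no dot, no match, unchanged)
    }
}
```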
As is obvious from all the other answers, there's no adequate "built-in" function. This is a safe and simple method.
String getFileExtension(File file) {
if (file == null) {
return "";
}
String name = file.getName();
int i = name.lastIndexOf('.');
String ext = i > 0 ? name.substring(i + 1) : "";
return ext;
}
Here is another one-liner for Java 8.
String ext = Arrays.stream(fileName.split("\\.")).reduce((a,b) -> b).orElse(null);
It works as follows:
Split the string into an array of strings using "."
Convert the array into a stream
Use reduce to get the last element of the stream, i.e. the file extension
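The three steps can be tried out with the sketch below (class and method names are made up). It also shows two edge cases that are easy to miss: a name without any dot comes back whole, and orElse(null) is only reached when split produces an empty array, as it does for the input ".".

```java
import java.util.Arrays;

final class StreamExtensionSketch {
    // Split on literal dots, stream the pieces, keep only the last one.
    static String lastSegment(String fileName) {
        return Arrays.stream(fileName.split("\\."))
                .reduce((first, second) -> second) // always keeps the later element
                .orElse(null);                     // only reached for an empty array
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("foo.txt")); // txt
        System.out.println(lastSegment("README"));  // README (no dot: the whole name)
        System.out.println(lastSegment("."));       // null (split yields an empty array)
    }
}
```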
How about (using Java 1.5 RegEx):
String[] split = fullFileName.split("\\.");
String ext = split[split.length - 1];
If you plan to use Apache commons-io and just want to check the file's extension and then do some operation, you can use this snippet:
if(FilenameUtils.isExtension(file.getName(),"java")) {
someoperation();
}
How about JFileChooser? It is not straightforward as you will need to parse its final output...
JFileChooser filechooser = new JFileChooser();
File file = new File("your.txt");
System.out.println("the extension type:"+filechooser.getTypeDescription(file));
which is a MIME type...
OK...I forget that you don't want to know its MIME type.
Interesting code in the following link:
http://download.oracle.com/javase/tutorial/uiswing/components/filechooser.html
/*
* Get the extension of a file.
*/
public static String getExtension(File f) {
String ext = null;
String s = f.getName();
int i = s.lastIndexOf('.');
if (i > 0 && i < s.length() - 1) {
ext = s.substring(i+1).toLowerCase();
}
return ext;
}
Related question:
How do I trim a file extension from a String in Java?
Here's a method that handles .tar.gz properly, even in a path with dots in directory names:
private static final String getExtension(final String filename) {
if (filename == null) return null;
final String afterLastSlash = filename.substring(filename.lastIndexOf('/') + 1);
final int afterLastBackslash = afterLastSlash.lastIndexOf('\\') + 1;
final int dotIndex = afterLastSlash.indexOf('.', afterLastBackslash);
return (dotIndex == -1) ? "" : afterLastSlash.substring(dotIndex + 1);
}
afterLastSlash is created to make finding afterLastBackslash quicker since it won't have to search the whole string if there are some slashes in it.
The char[] inside the original String is reused, adding no garbage there, and the JVM will probably notice that afterLastSlash is immediately garbage in order to put it on the stack instead of the heap.
This particular question gave me a lot of trouble, then I found a very simple solution to the problem, which I'm posting here.
file.getName().toLowerCase().endsWith(".txt");
That's it.
Java 20 EA
As of Java 20 EA (early-access), there is finally a new method Path#getExtension that returns the extension as a String:
Paths.get("/Users/admin/notes.txt").getExtension(); // "txt"
Paths.get("/Users/admin/.gitconfig").getExtension(); // "gitconfig"
Paths.get("/Users/admin/configuration.xml.zip").getExtension(); // "zip"
Paths.get("/Users/admin/file").getExtension(); // null
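If that method is not available on the JDK you are running, a comparable helper can be sketched with the long-standing Path.getFileName(). This is only an approximation that reproduces the four results listed above; the class name, the method name, and the handling of a trailing dot (returning null) are my own assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathExtensionSketch {
    // Returns the text after the last dot of the file name, or null if there is none.
    static String extensionOf(Path path) {
        Path name = path.getFileName();
        if (name == null) {
            return null; // e.g. a root path has no file name
        }
        String s = name.toString();
        int dot = s.lastIndexOf('.');
        if (dot < 0 || dot == s.length() - 1) {
            return null; // no dot, or nothing after the dot (assumption)
        }
        return s.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf(Paths.get("/Users/admin/notes.txt")));             // txt
        System.out.println(extensionOf(Paths.get("/Users/admin/.gitconfig")));            // gitconfig
        System.out.println(extensionOf(Paths.get("/Users/admin/configuration.xml.zip"))); // zip
        System.out.println(extensionOf(Paths.get("/Users/admin/file")));                  // null
    }
}
```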
// Modified from EboMike's answer
String extension = "/path/to/file/foo.txt".substring("/path/to/file/foo.txt".lastIndexOf('.'));
extension should have ".txt" in it when run.
Here's the version with Optional as a return value (cause you can't be sure the file has an extension)... also sanity checks...
import java.io.File;
import java.util.Optional;
public class GetFileExtensionTool {
public static Optional<String> getFileExtension(File file) {
if (file == null) {
throw new NullPointerException("file argument was null");
}
if (!file.isFile()) {
throw new IllegalArgumentException("getFileExtension(File file)"
+ " called on File object that wasn't an actual file"
+ " (perhaps a directory or device?). file had path: "
+ file.getAbsolutePath());
}
String fileName = file.getName();
int i = fileName.lastIndexOf('.');
if (i > 0) {
return Optional.of(fileName.substring(i + 1));
} else {
return Optional.empty();
}
}
}
How about REGEX version:
static final Pattern PATTERN = Pattern.compile("(.*)\\.(.*)");
Matcher m = PATTERN.matcher(path);
if (m.find()) {
System.out.println("File path/name: " + m.group(1));
System.out.println("Extension: " + m.group(2));
}
or with null extension supported:
static final Pattern PATTERN =
Pattern.compile("((.*\\" + File.separator + ")?(.*)(\\.(.*)))|(.*\\" + File.separator + ")?(.*)");
class Separated {
String path, name, ext;
}
Separated parsePath(String path) {
Separated res = new Separated();
Matcher m = PATTERN.matcher(path);
if (m.find()) {
if (m.group(1) != null) {
res.path = m.group(2);
res.name = m.group(3);
res.ext = m.group(5);
} else {
res.path = m.group(6);
res.name = m.group(7);
}
}
return res;
}
Separated sp = parsePath("/root/docs/readme.txt");
System.out.println("path: " + sp.path);
System.out.println("name: " + sp.name);
System.out.println("Extension: " + sp.ext);
result for *nix:
path: /root/docs/
name: readme
Extension: txt
for windows, parsePath("c:\windows\readme.txt"):
path: c:\windows\
name: readme
Extension: txt
String extension = com.google.common.io.Files.getFileExtension("fileName.jpg");
Here is a small method I wrote. It is not very robust and does not check for many errors, but if you are the only one writing a general Java program, it is more than enough to find the file type. It does not work for complex file types, but those are normally not used as much.
public static String getFileType(String path){
String fileType = null;
fileType = path.substring(path.indexOf('.',path.lastIndexOf('/'))+1).toUpperCase();
return fileType;
}
Getting File Extension from File Name
/**
* The extension separator character.
*/
private static final char EXTENSION_SEPARATOR = '.';
/**
* The Unix separator character.
*/
private static final char UNIX_SEPARATOR = '/';
/**
* The Windows separator character.
*/
private static final char WINDOWS_SEPARATOR = '\\';
/**
* The system separator character.
*/
private static final char SYSTEM_SEPARATOR = File.separatorChar;
/**
* Gets the extension of a filename.
* <p>
* This method returns the textual part of the filename after the last dot.
* There must be no directory separator after the dot.
* <pre>
* foo.txt --> "txt"
* a/b/c.jpg --> "jpg"
* a/b.txt/c --> ""
* a/b/c --> ""
* </pre>
* <p>
* The output will be the same irrespective of the machine that the code is running on.
*
* @param filename the filename to retrieve the extension of.
* @return the extension of the file or an empty string if none exists.
*/
public static String getExtension(String filename) {
if (filename == null) {
return null;
}
int index = indexOfExtension(filename);
if (index == -1) {
return "";
} else {
return filename.substring(index + 1);
}
}
/**
* Returns the index of the last extension separator character, which is a dot.
* <p>
* This method also checks that there is no directory separator after the last dot.
* To do this it uses {@link #indexOfLastSeparator(String)} which will
* handle a file in either Unix or Windows format.
* <p>
* The output will be the same irrespective of the machine that the code is running on.
*
* @param filename the filename to find the last path separator in, null returns -1
* @return the index of the last separator character, or -1 if there
* is no such character
*/
public static int indexOfExtension(String filename) {
if (filename == null) {
return -1;
}
int extensionPos = filename.lastIndexOf(EXTENSION_SEPARATOR);
int lastSeparator = indexOfLastSeparator(filename);
return (lastSeparator > extensionPos ? -1 : extensionPos);
}
/**
* Returns the index of the last directory separator character.
* <p>
* This method will handle a file in either Unix or Windows format.
* The position of the last forward or backslash is returned.
* <p>
* The output will be the same irrespective of the machine that the code is running on.
*
* @param filename the filename to find the last path separator in, null returns -1
* @return the index of the last separator character, or -1 if there
* is no such character
*/
public static int indexOfLastSeparator(String filename) {
if (filename == null) {
return -1;
}
int lastUnixPos = filename.lastIndexOf(UNIX_SEPARATOR);
int lastWindowsPos = filename.lastIndexOf(WINDOWS_SEPARATOR);
return Math.max(lastUnixPos, lastWindowsPos);
}
Credits
Copied from the Apache FilenameUtils class - http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/1.3.2/org/apache/commons/io/FilenameUtils.java#FilenameUtils.getExtension%28java.lang.String%29
Without use of any library, you can use the String method split as follows :
String[] splits = fileNames.get(i).split("\\.");
String extension = "";
if(splits.length >= 2)
{
extension = splits[splits.length-1];
}
private String getExtension(File file)
{
String fileName = file.getName();
String[] ext = fileName.split("\\.");
return ext[ext.length -1];
}
Just a regular-expression based alternative. Not that fast, not that good.
Pattern pattern = Pattern.compile("\\.([^.]*)$");
Matcher matcher = pattern.matcher(fileName);
if (matcher.find()) {
String ext = matcher.group(1);
}
I like the simplicity of spectre's answer, and linked in one of his comments is a link to another answer that fixes dots in file paths, on another question, made by EboMike.
Without implementing some sort of third party API, I suggest:
private String getFileExtension(File file) {
String name = file.getName().substring(Math.max(file.getName().lastIndexOf('/'),
file.getName().lastIndexOf('\\')) < 0 ? 0 : Math.max(file.getName().lastIndexOf('/'),
file.getName().lastIndexOf('\\')));
int lastIndexOf = name.lastIndexOf(".");
if (lastIndexOf == -1) {
return ""; // empty extension
}
return name.substring(lastIndexOf + 1); // doesn't return "." with extension
}
Something like this may be useful in, say, any of ImageIO's write methods, where the file format has to be passed in.
Why use a whole third party API when you can DIY?
The fluent way:
public static String fileExtension(String fileName) {
return Optional.of(fileName.lastIndexOf(".")).filter(i-> i >= 0)
.filter(i-> i > fileName.lastIndexOf(File.separator))
.map(fileName::substring).orElse("");
}
try this.
String[] extension = "adadad.adad.adnandad.jpg".split("\\.(?=[^\\.]+$)"); // ['adadad.adad.adnandad','jpg']
extension[1] // jpg
What's the most efficient way to trim the suffix in Java, like this:
title part1.txt
title part2.html
=>
title part1
title part2
This is the sort of code that we shouldn't be doing ourselves. Use libraries for the mundane stuff, save your brain for the hard stuff.
In this case, I recommend using FilenameUtils.removeExtension() from Apache Commons IO
str.substring(0, str.lastIndexOf('.'))
Although using String.substring and String.lastIndexOf in a one-liner is convenient, it has trouble coping with certain file paths.
Take for example the following path:
a.b/c
Using the one-liner will result in:
a
That's incorrect.
The result should have been c. Because the file has no extension but the path contains a directory with a . in its name, the one-liner is tricked into returning part of the path as the filename.
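The failure is easy to reproduce. The sketch below puts the one-liner next to a guarded variant that only strips a dot found after the last directory separator. The names are made up for illustration, and unlike the desired result described above, the guarded variant keeps the directory part of the path rather than returning only the file name.

```java
final class RemoveExtensionSketch {
    // The one-liner: cuts at the last dot anywhere in the string.
    static String naive(String s) {
        return s.substring(0, s.lastIndexOf('.'));
    }

    // Guarded variant: ignores dots that belong to a directory name.
    static String guarded(String s) {
        int dot = s.lastIndexOf('.');
        int sep = Math.max(s.lastIndexOf('/'), s.lastIndexOf('\\'));
        return dot > sep ? s.substring(0, dot) : s;
    }

    public static void main(String[] args) {
        System.out.println(naive("a.b/c"));       // a      (wrong: cut inside the directory name)
        System.out.println(guarded("a.b/c"));     // a.b/c  (unchanged, the file has no extension)
        System.out.println(guarded("a.b/c.jpg")); // a.b/c
    }
}
```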
Need for checks
Inspired by skaffman's answer, I took a look at the FilenameUtils.removeExtension method of the Apache Commons IO.
In order to recreate its behavior, I wrote a few tests the new method should fulfill, which are the following:
Path           Filename
-------------- --------
a/b/c          c
a/b/c.jpg      c
a/b/c.jpg.jpg  c.jpg
a.b/c          c
a.b/c.jpg      c
a.b/c.jpg.jpg  c.jpg
c              c
c.jpg          c
c.jpg.jpg      c.jpg
(And that's all I've checked for -- there probably are other checks that should be in place that I've overlooked.)
The implementation
The following is my implementation for the removeExtension method:
public static String removeExtension(String s) {
String separator = System.getProperty("file.separator");
String filename;
// Remove the path up to the filename.
int lastSeparatorIndex = s.lastIndexOf(separator);
if (lastSeparatorIndex == -1) {
filename = s;
} else {
filename = s.substring(lastSeparatorIndex + 1);
}
// Remove the extension.
int extensionIndex = filename.lastIndexOf(".");
if (extensionIndex == -1)
return filename;
return filename.substring(0, extensionIndex);
}
Running this removeExtension method with the above tests yields the results listed above.
The method was tested with the following code. As this was run on Windows, the path separator is a \ which must be escaped with a \ when used as part of a String literal.
System.out.println(removeExtension("a\\b\\c"));
System.out.println(removeExtension("a\\b\\c.jpg"));
System.out.println(removeExtension("a\\b\\c.jpg.jpg"));
System.out.println(removeExtension("a.b\\c"));
System.out.println(removeExtension("a.b\\c.jpg"));
System.out.println(removeExtension("a.b\\c.jpg.jpg"));
System.out.println(removeExtension("c"));
System.out.println(removeExtension("c.jpg"));
System.out.println(removeExtension("c.jpg.jpg"));
The results were:
c
c
c.jpg
c
c
c.jpg
c
c
c.jpg
The results are the desired results outlined in the test the method should fulfill.
String foo = "title part1.txt";
foo = foo.substring(0, foo.lastIndexOf('.'));
BTW, in my case, when I wanted a quick solution to remove a specific extension, this is approximately what I did:
if (filename.endsWith(ext))
return filename.substring(0,filename.length() - ext.length());
else
return filename;
Use a method in com.google.common.io.Files class if your project is already dependent on Google core library. The method you need is getNameWithoutExtension.
you can try this function , very basic
public String getWithoutExtension(String fileFullPath){
return fileFullPath.substring(0, fileFullPath.lastIndexOf('.'));
}
String fileName="foo.bar";
int dotIndex=fileName.lastIndexOf('.');
if(dotIndex>=0) { // to prevent exception if there is no dot
fileName=fileName.substring(0,dotIndex);
}
Is this a trick question? :p
I can't think of a faster way atm.
I found coolbird's answer particularly useful.
But I changed the last result statements to:
if (extensionIndex == -1)
return s;
return s.substring(0, lastSeparatorIndex+1)
+ filename.substring(0, extensionIndex);
as I wanted the full path name to be returned.
So "C:\Users\mroh004.COM\Documents\Test\Test.xml" becomes
"C:\Users\mroh004.COM\Documents\Test\Test" and not
"Test"
filename.substring(filename.lastIndexOf('.'), filename.length()).toLowerCase();
Use a regex. This one replaces the last dot, and everything after it.
String baseName = fileName.replaceAll("\\.[^.]*$", "");
You can also create a Pattern object if you want to precompile the regex.
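A precompiled version might look like the sketch below (the class, constant, and method names are illustrative). It uses the same expression as above, so it shares the same limitation: a dot in a directory name is treated as the start of an extension when the file name itself has none.

```java
import java.util.regex.Pattern;

final class PrecompiledStripSketch {
    // Compiled once and reused; matches the last dot and everything after it.
    private static final Pattern LAST_EXTENSION = Pattern.compile("\\.[^.]*$");

    static String baseName(String fileName) {
        return LAST_EXTENSION.matcher(fileName).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(baseName("title part1.txt")); // title part1
        System.out.println(baseName("archive.tar.gz"));  // archive.tar
        System.out.println(baseName("noext"));           // noext
    }
}
```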
If you use Spring you could use
org.springframework.util.StringUtils.stripFilenameExtension(String path)
Strip the filename extension from the given Java resource path, e.g.
"mypath/myfile.txt" -> "mypath/myfile".
Params: path – the file path
Returns: the path with stripped filename extension
private String trimFileExtension(String fileName)
{
String[] splits = fileName.split( "\\." );
return StringUtils.remove( fileName, "." + splits[splits.length - 1] );
}
String[] splitted = fileName.split("\\."); // escape the dot: split takes a regex, and an unescaped "." matches every character
String fileNameWithoutExtension = fileName.replace("." + splitted[splitted.length - 1], "");
Create a new File from the image path string:
String imagePath;
File test = new File(imagePath);
test.getName();
test.getPath();
getExtension(test.getName());
public static String getExtension(String uri) {
if (uri == null) {
return null;
}
int dot = uri.lastIndexOf(".");
if (dot >= 0) {
return uri.substring(dot);
} else {
// No extension.
return "";
}
}
org.apache.commons.io.FilenameUtils version 2.4 gives the following answer
public static String removeExtension(String filename) {
if (filename == null) {
return null;
}
int index = indexOfExtension(filename);
if (index == -1) {
return filename;
} else {
return filename.substring(0, index);
}
}
public static int indexOfExtension(String filename) {
if (filename == null) {
return -1;
}
int extensionPos = filename.lastIndexOf(EXTENSION_SEPARATOR);
int lastSeparator = indexOfLastSeparator(filename);
return lastSeparator > extensionPos ? -1 : extensionPos;
}
public static int indexOfLastSeparator(String filename) {
if (filename == null) {
return -1;
}
int lastUnixPos = filename.lastIndexOf(UNIX_SEPARATOR);
int lastWindowsPos = filename.lastIndexOf(WINDOWS_SEPARATOR);
return Math.max(lastUnixPos, lastWindowsPos);
}
public static final char EXTENSION_SEPARATOR = '.';
private static final char UNIX_SEPARATOR = '/';
private static final char WINDOWS_SEPARATOR = '\\';
The best what I can write trying to stick to the Path class:
Path removeExtension(Path path) {
return path.resolveSibling(path.getFileName().toString().replaceFirst("\\.[^.]*$", ""));
}
Don't stress over this; I have done it many times. Just copy and paste this public static method into your staticUtils library for future use ;-)
static String removeExtension(String path){
String filename;
String foldrpath;
String filenameWithoutExtension;
if(path.equals("")){return "";}
if(path.contains("\\")){ // direct substring method give wrong result for "a.b.c.d\e.f.g\supersu"
filename = path.substring(path.lastIndexOf("\\"));
foldrpath = path.substring(0, path.lastIndexOf('\\'));
if(filename.contains(".")){
filenameWithoutExtension = filename.substring(0, filename.lastIndexOf('.'));
}else{
filenameWithoutExtension = filename;
}
return foldrpath + filenameWithoutExtension;
}else{
return path.substring(0, path.lastIndexOf('.'));
}
}
I would do like this:
String title_part = "title part1.txt";
int i;
for(i=title_part.length()-1 ; i>=0 && title_part.charAt(i)!='.' ; i--);
if(i>=0) title_part = title_part.substring(0,i); // i is -1 when there is no '.', so leave the name as-is
Scan from the end back to the last '.', then call substring.
Edit:
Might not be a golf but it's effective :)
Keeping in mind the scenarios where there is no file extension or there is more than one file extension
example Filename : file | file.txt | file.tar.bz2
/**
*
* @param fileName
* @return file extension
* example file.fastq.gz => fastq.gz
*/
private String extractFileExtension(String fileName) {
String type = "undefined";
if (FilenameUtils.indexOfExtension(fileName) != -1) {
String fileBaseName = FilenameUtils.getBaseName(fileName);
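int indexOfExtension = FilenameUtils.indexOfExtension(fileName); // start at the last dot so a single extension (file.txt) gives "txt", not the whole name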
while (fileBaseName.contains(".")) {
indexOfExtension = FilenameUtils.indexOfExtension(fileBaseName);
fileBaseName = FilenameUtils.getBaseName(fileBaseName);
}
type = fileName.substring(indexOfExtension + 1, fileName.length());
}
return type;
}
String img = "example.jpg";
// String imgLink = "http://www.example.com/example.jpg";
URI uri = null;
try {
uri = new URI(img);
String[] segments = uri.getPath().split("/");
System.out.println(segments[segments.length-1].split("\\.")[0]);
} catch (Exception e) {
e.printStackTrace();
}
This will output example for both img and imgLink
private String trimFileName(String fileName)
{
String[] ext;
ext = fileName.split("\\.");
return fileName.replace(ext[ext.length - 1], "");
}
This code splits the file name wherever it contains a ".". For example, file-name.hello.txt is split into the string array { "file-name", "hello", "txt" }. The last element of that array, at index ext.length - 1, is the file extension, so replacing it with an empty string in the file name removes it. This returns file-name.hello. (with the trailing period). To remove the last period as well, prepend a period to the last element in the return line, which should look like:
return fileName.replace("." + ext[ext.length - 1], "");
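One caveat with this approach: String.replace substitutes every occurrence of the target, not just the last one, so a name that repeats its extension loses more than intended. The sketch below (names are illustrative) contrasts it with a lastIndexOf-based cut that only touches the final extension.

```java
final class TrimLastExtensionSketch {
    // The split-and-replace approach from the answer above.
    static String viaReplace(String fileName) {
        String[] ext = fileName.split("\\.");
        return fileName.replace("." + ext[ext.length - 1], "");
    }

    // Alternative: cut at the last dot only.
    static String viaLastIndexOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("file-name.hello.txt")); // file-name.hello
        System.out.println(viaReplace("report.txt.txt"));      // report (both ".txt" removed)
        System.out.println(viaLastIndexOf("report.txt.txt"));  // report.txt
    }
}
```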
public static String removeExtension(String file) {
if(file != null && file.length() > 0) {
while(file.contains(".")) {
file = file.substring(0, file.lastIndexOf('.'));
}
}
return file;
}
Consider the following template:
<#include "../header.txt"/>
<#list items as item>
Item name is: ${item.name}<br/>
</#list>
Where the header.txt contains:
<html>
<head>
</head>
<body>
I would like to "pre-process" this template so that the resulting output is:
<html>
<head>
</head>
<body>
<#list items as item>
Item name is: ${item.name}<br/>
</#list>
I would like to be able to expand the includes but not resolve the variables. How can I do this with Freemarker?
FreeMarker doesn't support doing this (only resolving some parts of a template). What you could do is pre-process the template with your own parser. That's supported by using your own TemplateLoader implementation that delegates to another TemplateLoader (the original one) and filters the content. So you can apply your transformation on-the-fly when the template is needed for the first time, and the result will be cached (in the standard template cache of FreeMarker). I would recommend using your own syntax (like <%include '...'>), so that everyone will see that something special is going on there.
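The filtering step itself could be sketched as below. This is library-free and only illustrates the text transformation: the <%include '...'> syntax is the custom directive suggested above, while the class name, the expand method, and the resolver function (which maps an include path to that file's content) are assumptions made for the example. Wiring it into a delegating TemplateLoader is not shown, and the sketch does no cycle detection or relative-path resolution.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeExpanderSketch {
    // Hypothetical custom directive: <%include 'some/path'>
    private static final Pattern INCLUDE = Pattern.compile("<%include '([^']+)'>");

    // Replaces every custom include with the resolved content, recursively,
    // and leaves all FreeMarker syntax (${...}, <#list>, ...) untouched.
    static String expand(String template, Function<String, String> resolver) {
        Matcher m = INCLUDE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String included = expand(resolver.apply(m.group(1)), resolver);
            // quoteReplacement keeps "$" and "\" in the included text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(included));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<%include '../header.txt'>\n"
                + "<#list items as item>\n"
                + "Item name is: ${item.name}<br/>\n"
                + "</#list>";
        // Stand-in resolver: a real one would read the file's content.
        System.out.println(expand(template, path -> "<html>\n<head>\n</head>\n<body>"));
    }
}
```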
I ended up having to create my own expander. I hope it will be useful to someone:
/**
* Utility class to perform Freemarker template expansion.
*
* @author Chris Mepham
*/
public class FreemarkerTemplateExpander implements ApplicationAware {
static final String INCLUDE_REGEX = "<#include \\\"\\S+\\\"\\/>";
static final String PATH_REGEX = "\\\"\\S+\\\"";
private ModuleStateHolder moduleStateHolder;
/**
* Takes the Freemarker template String input and
* recursively expands all of the includes.
*
* @param input The String value of a Freemarker template.
*
* @return The expanded version of the Freemarker template.
*/
public final String expand(String module, String path, String input) {
Assert.notNull(module, "module cannot be null");
Assert.notNull(path, "path cannot be null");
Assert.notNull(input, "input cannot be null");
if (!hasText(input)) return input;
// See if there is an include
int indexOfNextInclude = getIndexOfNextInclude(Pattern.compile(INCLUDE_REGEX), input);
// If there is no include just return the input text
if (indexOfNextInclude == -1) return input;
StringBuffer buffer = new StringBuffer();
// Otherwise, get all the text up to the next include and add it to buffer
String prefix = input.substring(0, indexOfNextInclude);
if (hasText(prefix)) buffer.append(prefix);
// Then get the contents of the include as a String
String includeContents = getIncludeContents(module, path, input);
if (hasText(includeContents)) buffer.append(includeContents);
// Then get all the text after the next include
int includeLastCharacterIndex = indexOfNextInclude + matchRegexPattern(input, INCLUDE_REGEX).length();
String suffix = input.substring(includeLastCharacterIndex + 1);
buffer.append(suffix);
input = buffer.toString();
return expand(module, path, input);
}
final String getIncludeContents(String module, String path1, String input) {
// Get next include file relative path
String nextIncludePath = getNextIncludePath(input);
String resourcePath = getResourcePath(nextIncludePath);
// Get file name
String filename = getFilename(nextIncludePath);
// Get file contents here
String path = "templates." + resourcePath;
InputStream resource = ClasspathResourceUtils.getClassPathResource(path, filename, getClassLoader(module));
StringWriter writer = new StringWriter();
try {
IOUtils.copy(resource, writer, "UTF-8");
} catch (IOException e) {
e.printStackTrace();
}
return writer.toString();
}
static final int getIndexOfNextInclude(Pattern pattern, String input) {
Matcher matcher = pattern.matcher(input);
return matcher.find() ? matcher.start() : -1;
}
private static final String getNextIncludePath(String input) {
String include = matchRegexPattern(input, INCLUDE_REGEX);
if (include == null) return null;
String path = matchRegexPattern(include, PATH_REGEX);
path = path.replace("\"", "");
return path;
}
private static final String matchRegexPattern(String input, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
return matcher.group(0);
}
return null;
}
private String getResourcePath(String path) {
if (!path.contains("/")) return path;
String resourcePath = path.substring(path.indexOf("/") + 1, path.lastIndexOf("/"));
return resourcePath;
}
private String getFilename(String path) {
if (!path.contains("/")) return path;
return path.substring(path.lastIndexOf("/") + 1);
}
}
I need to extract all the texts from some swf files. I'm using Java since I have a lot of modules developed with this language.
Thus, I did a search through the Web for all the free Java library devoted to handle SWF files.
Finally, I found the library developed by StuartMacKay. The library, named transform-swf, may be found on GitHub by clicking here.
The question is: once I extract the GlyphIndexes from a TextSpan, how can I convert the glyphs into characters?
Please, provide a complete working and tested example. No theoretical answer will be accepted nor answers like "it cannot be done", "it ain't possible", etc.
What I know and what I did
I know that the GlyphIndexes are built by using a TextTable, which is constructed from an integer that represents the font size and a font description provided by a DefineFont2 object, but when I decode all the DefineFont2 objects, all have a zero length advance.
Here follows what I did.
//Creating a Movie object from an swf file.
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
//Saving all the decoded DefineFont2 objects.
Map<Integer,DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
if (object instanceof DefineFont2) {
DefineFont2 df2 = (DefineFont2) object;
fonts.put(df2.getIdentifier(), df2);
}
}
//Now I retrieve all the texts
for (MovieTag object : list) {
if (object instanceof DefineText2) {
DefineText2 dt2 = (DefineText2) object;
for (TextSpan ts : dt2.getSpans()) {
Integer fontIdentifier = ts.getIdentifier();
if (fontIdentifier != null) {
int fontSize = ts.getHeight();
// Here I try to create an object that should
// reverse the process done by a TextTable
ReverseTextTable rtt =
new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
System.out.println(rtt.charactersForText(ts.getCharacters()));
}
}
}
}
The class ReverseTextTable follows here:
public final class ReverseTextTable {
private final transient Map<Character, GlyphIndex> characters;
private final transient Map<GlyphIndex, Character> glyphs;
public ReverseTextTable(final DefineFont2 font, final int fontSize) {
characters = new LinkedHashMap<>();
glyphs = new LinkedHashMap<>();
final List<Integer> codes = font.getCodes();
final List<Integer> advances = font.getAdvances();
final float scale = fontSize / EMSQUARE;
final int count = codes.size();
for (int i = 0; i < count; i++) {
characters.put((char) codes.get(i).intValue(), new GlyphIndex(i,
(int) (advances.get(i) * scale)));
glyphs.put(new GlyphIndex(i,
(int) (advances.get(i) * scale)), (char) codes.get(i).intValue());
}
}
//This method should reverse from a list of GlyphIndexes to a String
public String charactersForText(final List<GlyphIndex> list) {
String text="";
for(GlyphIndex gi: list){
text+=glyphs.get(gi);
}
return text;
}
}
Unfortunately, the list of advances from DefineFont2 is empty, so the constructor of ReverseTextTable throws an ArrayIndexOutOfBoundsException.
Honestly, I don't know how to do that in Java. I'm not claiming that it is impossible; I believe there is a way to do it. However, you said that there are a lot of libraries that do that. You also suggested a library, i.e. swftools. So I suggest resorting to that library to extract the text from a Flash file. To do that, you can use Runtime.exec() to run that library from the command line.
Personally, I prefer Apache Commons exec rather than the standard library released with JDK. Well, just let me show you how you should do. The executable file that you should use is "swfstrings.exe". Suppose that it is put in "C:\". Suppose that in the same folder you can find a flash file, e.g. page.swf. Then, I tried the following code (it works fine):
// Backslashes must be escaped inside Java string literals
Path pathToSwfFile = Paths.get("C:\\page.swf");
CommandLine commandLine = CommandLine.parse("C:\\swfstrings.exe");
commandLine.addArgument("\"" + pathToSwfFile.toString() + "\"");
DefaultExecutor executor = new DefaultExecutor();
executor.setExitValues(new int[]{0, 1}); //Notice that swfstrings.exe returns 1 for success,
//0 for file not found, -1 for error
ByteArrayOutputStream stdout = new ByteArrayOutputStream();
PumpStreamHandler psh = new PumpStreamHandler(stdout);
executor.setStreamHandler(psh);
int exitValue = -1; // stays -1 (treated as a failure) if execute() throws
try{
exitValue = executor.execute(commandLine);
}catch(org.apache.commons.exec.ExecuteException ex){
psh.stop();
}
if(!executor.isFailure(exitValue)){
String out = stdout.toString("UTF-8"); // here you have the extracted text
}
I know this is not exactly the answer you asked for, but it works fine.
I happened to be working on decompiling an SWF in Java and came across this question while figuring out how to reverse engineer the original text.
After looking at the source code, I realised it's really straightforward. Each font has an assigned sequence of characters that can be retrieved by calling DefineFont2.getCodes(), and the glyphIndex is the index of the matching character in DefineFont2.getCodes().
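To make the index lookup concrete, here is a minimal sketch that does not depend on transform-swf. Plain Integer lists stand in for what DefineFont2.getCodes() and GlyphIndex.getGlyphIndex() would return; the class and method names (GlyphTextSketch, decode) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: plain Integer lists stand in for the values that
// DefineFont2.getCodes() and GlyphIndex.getGlyphIndex() would supply.
final class GlyphTextSketch {

    // Maps each glyph index to the character code stored at that position.
    static String decode(List<Integer> codes, List<Integer> glyphIndexes) {
        StringBuilder sb = new StringBuilder();
        for (int glyphIndex : glyphIndexes) {
            if (glyphIndex < 0 || glyphIndex >= codes.size()) {
                sb.append('?'); // index not covered by this font's code table
            } else {
                sb.append((char) codes.get(glyphIndex).intValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> codes = Arrays.asList((int) 'H', (int) 'e', (int) 'l', (int) 'o');
        List<Integer> glyphs = Arrays.asList(0, 1, 2, 2, 3);
        System.out.println(decode(codes, glyphs)); // prints Hello
    }
}
```

The '?' placeholder for an out-of-range index is my own choice here; an index beyond the code table is a hint that the glyphs belong to a different font.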
However, when multiple fonts are used in a single SWF file, it is difficult to match each DefineText to the corresponding DefineFont2 because there are no attributes that identify the DefineFont2 used for each DefineText.
To work around this issue, I came up with a self-learning algorithm which will attempt to guess the right DefineFont2 for each DefineText and hence derive the original text correctly.
To reverse engineer the original text back, I created a class called FontLearner:
public class FontLearner {
private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
private final HashMap<Integer, HashMap<Character, Integer>> advancesMap = new HashMap<Integer, HashMap<Character, Integer>>();
/**
* The same characters from the same font will have similar advance values.
* This constant defines the allowed difference between two advance values
* before they are treated as the same character
*/
private static final int ADVANCE_THRESHOLD = 10;
/**
* Some characters have outlier advance values despite being compared
* to the same character
* This constant defines the minimum accuracy level for each String
* before it is associated with the given font
*/
private static final double ACCURACY_THRESHOLD = 0.9;
/**
* This method adds a DefineFont2 to the learner, and a DefineText
* associated with the font to teach the learner about the given font.
*
* @param font The font to add to the learner
* @param text The text associated with the font
*/
public void addFont(DefineFont2 font, DefineText text) {
fonts.add(font);
HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
advancesMap.put(font.getIdentifier(), advances);
List<Integer> codes = font.getCodes();
List<TextSpan> spans = text.getSpans();
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
int advance = character.getAdvance();
advances.put(c, advance);
}
}
}
/**
*
* @param text The DefineText to retrieve the original String from
* @return The String retrieved from the given DefineText
*/
public String getString(DefineText text) {
StringBuilder sb = new StringBuilder();
List<TextSpan> spans = text.getSpans();
DefineFont2 font = null;
for (DefineFont2 getFont : fonts) {
List<Integer> codes = getFont.getCodes();
HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
if (advances == null) {
advances = new HashMap<Character, Integer>();
advancesMap.put(getFont.getIdentifier(), advances);
}
boolean notFound = true;
int totalMisses = 0;
int totalCount = 0;
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
totalCount += characters.size();
int misses = 0;
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
if (codes.size() > glyphIndex) {
char c = (char) (int) codes.get(glyphIndex);
Integer getAdvance = advances.get(c);
if (getAdvance != null) {
notFound = false;
if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
misses += 1;
}
}
} else {
notFound = false;
misses = characters.size();
break;
}
}
totalMisses += misses;
}
double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;
if (accuracy > ACCURACY_THRESHOLD && !notFound) {
font = getFont;
// teach this DefineText to the FontLearner if there are
// any new characters
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
int advance = character.getAdvance();
if (advances.get(c) == null) {
advances.put(c, advance);
}
}
}
break;
}
}
if (font != null) {
List<Integer> codes = font.getCodes();
for (TextSpan span : spans) {
List<GlyphIndex> characters = span.getCharacters();
for (GlyphIndex character : characters) {
int glyphIndex = character.getGlyphIndex();
char c = (char) (int) codes.get(glyphIndex);
sb.append(c);
}
sb = new StringBuilder(sb.toString().trim());
sb.append(" ");
}
}
return sb.toString().trim();
}
}
Usage:
Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());
FontLearner learner = new FontLearner();
DefineFont2 font = null;
List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
if (object instanceof DefineFont2) {
font = (DefineFont2) object;
} else if (object instanceof DefineText) {
DefineText text = (DefineText) object;
if (font != null) {
learner.addFont(font, text);
font = null;
}
String line = learner.getString(text); // reverse engineers the line
}
}
I am happy to say that this method has given me a 100% accuracy in reverse engineering the original String using StuartMacKay's transform-swf library.
It seems difficult to do what you're trying to achieve. You're trying to decompile the file, but I'm sorry to say that it's not possible. What I would suggest is converting it into a bitmap (if possible), or using some other method, and then reading the characters using OCR.
There is some software that does that; you can also check forums on the topic. Recovering the text from a compiled SWF is very difficult (and not possible, as far as I know). You can check this decompiler if you want, or try using some other language, like the project here.
I had a similar problem with long strings using transform-swf library.
Got the source code and debugged it.
I believe there was a small bug in class com.flagstone.transform.coder.SWFDecoder.
On line 540 (in version 3.0.2), change
dest += length;
to
dest += count;
That should do it for you (it's about extracting strings).
I notified Stuart as well. The problem appears only if your strings are very large.
I know this isn't what you asked, but I recently needed to pull text from an SWF using Java and found the ffdec library much better than transform-swf.
Comment if anyone needs sample code.
A String variable contains a file name, C:\Hello\AnotherFolder\The File Name.PDF. How do I only get the file name The File Name.PDF as a String?
I planned to split the string, but that is not the optimal solution.
Just use File.getName():
File f = new File("C:\\Hello\\AnotherFolder\\The File Name.PDF");
System.out.println(f.getName());
Using String methods:
File f = new File("C:\\Hello\\AnotherFolder\\The File Name.PDF");
System.out.println(f.getAbsolutePath().substring(f.getAbsolutePath().lastIndexOf("\\")+1));
Alternative using Path (Java 7+):
Path p = Paths.get("C:\\Hello\\AnotherFolder\\The File Name.PDF");
String file = p.getFileName().toString();
Note that splitting the string on \\ is platform dependent, as the file separator might vary. Path#getFileName takes care of that issue for you.
Using FilenameUtils in Apache Commons IO :
String name1 = FilenameUtils.getName("/ab/cd/xyz.txt");
String name2 = FilenameUtils.getName("c:\\ab\\cd\\xyz.txt");
Considering the String you're asking about is
C:\Hello\AnotherFolder\The File Name.PDF
we need to extract everything after the last separator, i.e. \. That is what we are interested in.
You can do
String fullPath = "C:\\Hello\\AnotherFolder\\The File Name.PDF";
int index = fullPath.lastIndexOf("\\");
String fileName = fullPath.substring(index + 1);
This will retrieve the index of the last \ in your String and extract everything that comes after it into fileName.
If you have a String with a different separator, adjust the lastIndexOf to use that separator. (There's even an overload that accepts an entire String as a separator.)
I've omitted it in the example above, but if you're unsure where the String comes from or what it might contain, you'll want to validate that the lastIndexOf returns a non-negative value because the Javadoc states it'll return
-1 if there is no such occurrence
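A minimal sketch of that validation might look like the following. The class and method names (FileNameSketch, fileNameOf) are hypothetical, and returning the whole input when no separator is found is just one reasonable choice. It skips separator.length() characters instead of 1 so that it also works with the String overload and a multi-character separator.

```java
final class FileNameSketch {

    // Returns the text after the last occurrence of separator,
    // or the whole input when the separator does not occur.
    static String fileNameOf(String fullPath, String separator) {
        int index = fullPath.lastIndexOf(separator);
        if (index < 0) {
            return fullPath; // no separator: treat the input as a bare file name
        }
        return fullPath.substring(index + separator.length());
    }

    public static void main(String[] args) {
        System.out.println(fileNameOf("C:\\Hello\\AnotherFolder\\The File Name.PDF", "\\"));
        // prints The File Name.PDF
        System.out.println(fileNameOf("The File Name.PDF", "\\"));
        // prints The File Name.PDF
    }
}
```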
Since 1.7
Path p = Paths.get("c:\\temp\\1.txt");
String fileName = p.getFileName().toString();
String directory = p.getParent().toString();
You can use substring, with path = C:\Hello\AnotherFolder\TheFileName.PDF:
String strPath = path.substring(path.lastIndexOf("\\")+1, path.length());
The other answers didn't quite work for my specific scenario, where I am reading paths that originated on an OS different from my current one. To elaborate, I am saving, on a Linux server, email attachments that originated on a Windows platform. The filename returned from the JavaMail API is something like 'C:\temp\hello.xls'
The solution I ended up with:
String filenameWithPath = "C:\\temp\\hello.xls";
// Split on backslash or forward slash ('|' inside a character class would match a literal pipe)
String[] tokens = filenameWithPath.split("[\\\\/]");
String filename = tokens[tokens.length - 1];
Consider that Java is multiplatform:
int lastPath = fileName.lastIndexOf(File.separator);
if (lastPath!=-1){
fileName = fileName.substring(lastPath+1);
}
The getFileName() method of java.nio.file.Path is used to return the name of the file or directory pointed to by this path object.
Path getFileName()
For reference:
https://www.geeksforgeeks.org/path-getfilename-method-in-java-with-examples/
A method without any dependencies that takes care of .., ., and duplicate separators:
public static String getFileName(String filePath) {
if( filePath==null || filePath.length()==0 )
return "";
filePath = filePath.replaceAll("[/\\\\]+", "/");
int len = filePath.length(),
upCount = 0;
while( len>0 ) {
//remove trailing separator
if( filePath.charAt(len-1)=='/' ) {
len--;
if( len==0 )
return "";
}
int lastInd = filePath.lastIndexOf('/', len-1);
String fileName = filePath.substring(lastInd+1, len);
if( fileName.equals(".") ) {
len--;
}
else if( fileName.equals("..") ) {
len -= 2;
upCount++;
}
else {
if( upCount==0 )
return fileName;
upCount--;
len -= fileName.length();
}
}
return "";
}
Test case:
@Test
public void testGetFileName() {
assertEquals("", getFileName("/"));
assertEquals("", getFileName("////"));
assertEquals("", getFileName("//C//.//../"));
assertEquals("", getFileName("C//.//../"));
assertEquals("C", getFileName("C"));
assertEquals("C", getFileName("/C"));
assertEquals("C", getFileName("/C/"));
assertEquals("C", getFileName("//C//"));
assertEquals("C", getFileName("/A/B/C/"));
assertEquals("C", getFileName("/A/B/C"));
assertEquals("C", getFileName("/C/./B/../"));
assertEquals("C", getFileName("//C//./B//..///"));
assertEquals("user", getFileName("/user/java/.."));
assertEquals("C:", getFileName("C:"));
assertEquals("C:", getFileName("/C:"));
assertEquals("java", getFileName("C:\\Program Files (x86)\\java\\bin\\.."));
assertEquals("C.ext", getFileName("/A/B/C.ext"));
assertEquals("C.ext", getFileName("C.ext"));
}
Maybe getFileName is a bit confusing, because it returns directory names also. It returns the name of file or last directory in a path.
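For comparison, here is a rough sketch of how the JDK's own Path.normalize() could cover the . and .. cases. It is not a drop-in replacement for the method above: it uses the separators of the default file system, so on Linux a backslash is treated as an ordinary character. The class and method names (NormalizedNameSketch, lastName) are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class NormalizedNameSketch {

    // Resolves "." and ".." lexically, then returns the last remaining name
    // element, or "" when nothing is left (for example the root).
    static String lastName(String filePath) {
        Path name = Paths.get(filePath).normalize().getFileName();
        return name == null ? "" : name.toString();
    }

    public static void main(String[] args) {
        System.out.println(lastName("/user/java/..")); // prints user
        System.out.println(lastName("/A/B/C.ext"));    // prints C.ext
    }
}
```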
Extract the file name using a Java regex:
public String extractFileName(String fullPathFile){
try {
Pattern regex = Pattern.compile("([^\\\\/:*?\"<>|\r\n]+$)");
Matcher regexMatcher = regex.matcher(fullPathFile);
if (regexMatcher.find()){
return regexMatcher.group(1);
}
} catch (PatternSyntaxException ex) {
LOG.info("extractFileName::pattern problem <"+fullPathFile+">",ex);
}
return fullPathFile;
}
You can use a FileInfo object to get all the information about your file.
FileInfo f = new FileInfo(@"C:\Hello\AnotherFolder\The File Name.PDF");
MessageBox.Show(f.Name);
MessageBox.Show(f.FullName);
MessageBox.Show(f.Extension );
MessageBox.Show(f.DirectoryName);
This answer works for me in C#:
using System.IO;
string fileName = Path.GetFileName(@"C:\Hello\AnotherFolder\The File Name.PDF");
I need to scan a particular folder in Java, and be able to return the integer number of files of a particular type (based on not only extension but also naming convention.) For example, I want to know how many JPG files there are in the \src folder that have a simple integer filename (say, 1.JPG through 30.JPG). Can anyone point me in the right direction? Thx
java.io.File.list(FilenameFilter) is the method you're looking for.
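As a rough sketch of how that could look for the integer-named JPG case: the class name, method name, and the exact pattern below are my assumptions about the naming convention (digits only, then .jpg in any letter case), so adjust them to your files.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

final class JpgCounterSketch {

    // Digits only, then ".jpg" in any letter case (e.g. 1.JPG, 30.jpg).
    private static final Pattern INTEGER_JPG =
            Pattern.compile("\\d+\\.jpg", Pattern.CASE_INSENSITIVE);

    // Counts matching regular files directly inside dir (not recursive).
    static int countIntegerJpgs(File dir) {
        FilenameFilter filter = new FilenameFilter() {
            @Override
            public boolean accept(File parent, String name) {
                return INTEGER_JPG.matcher(name).matches()
                        && new File(parent, name).isFile();
            }
        };
        String[] names = dir.list(filter);
        // list() returns null when dir is not a directory or cannot be read
        return names == null ? 0 : names.length;
    }

    public static void main(String[] args) {
        System.out.println(countIntegerJpgs(new File("src")));
    }
}
```

The null check matters because File.list returns null rather than an empty array when the folder does not exist.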
I have a method that uses a regex pattern for a rather complicated file structure. Something like that could be used, although I'm sure it could be written more concisely than my example (edited for security).
/**
* Get all non-directory filenames from a given foo/flat directory
*
* @param network
* @param typeRegex
* @param locationRegex
* @return
*/
public List<String> getFilteredFilenames(String network, String typeRegex, String locationRegex) {
String regex = null;
List<String> filenames = new ArrayList<String>();
String directory;
// Look at the something network
if (network.equalsIgnoreCase("foo")) {
// Get the foo files first
directory = this.pathname + "/" + "foo/filtered/flat";
File[] foofiles = getFilenames(directory);
// run the regex if need be.
if (locationRegex != null && typeRegex != null ) {
regex = typeRegex + "." + locationRegex;
//System.out.println(regex);
}
for (int i = 0; i < foofiles.length; i++) {
if (foofiles[i].isFile()) {
String file = foofiles[i].getName();
if (regex == null) {
filenames.add(file);
}
else {
if (file.matches(regex)) {
filenames.add(file);
}
}
}
}
}
return filenames;
}