Java language detection with langdetect - how to load profiles?

I'm trying to use a Java library called langdetect hosted here. It couldn't be easier to use:
Detector detector;
String langDetected = "";
try {
String path = "C:/Users/myUser/Desktop/jars/langdetect/profiles";
DetectorFactory.loadProfile(path);
detector = DetectorFactory.create();
detector.append(text);
langDetected = detector.detect();
}
catch (LangDetectException e) {
throw e;
}
return langDetected;
Except with respect to the DetectorFactory.loadProfile method. This library works great when I pass it an absolute file path, but ultimately I think I need to package my code and langdetect's companion profiles directory inside the same JAR file:
myapp.jar/
META-INF/
langdetect/
profiles/
af
bn
en
...etc.
com/
me/
myorg/
LangDetectAdaptor --> is what actually uses the code above
I will make sure that the LangDetectAdaptor, which is located inside myapp.jar, is supplied with both the langdetect.jar and jsonic.jar dependencies it needs for langdetect to work at runtime. However, I'm confused as to what I need to pass to DetectorFactory.loadProfile for this to work.
The langdetect JAR ships with the profiles directory, but you need to initialize it from inside your JAR. So do I copy the profiles directory and put it inside my JAR (as I describe above), or is there a way to keep it inside langdetect.jar but access it from inside my code?
Thanks in advance for any help here!
Edit: I think the problem here is that langdetect ships with this profiles directory, but then wants you to initialize it from inside your JAR. The API would probably benefit from being changed a little to treat profiles as its own configuration, and to then provide methods like DetectorFactory.loadProfiles().except("fr") in case you don't want it to initialize French, etc. But this still doesn't solve my problem!

I had the same problem. You can load the profiles from the LangDetect jar using JarURLConnection and JarEntry. Note that this example uses Java 7 try-with-resources.
String dirname = "profiles/";
Enumeration<URL> en = Detector.class.getClassLoader().getResources(
dirname);
List<String> profiles = new ArrayList<>();
if (en.hasMoreElements()) {
URL url = en.nextElement();
JarURLConnection urlcon = (JarURLConnection) url.openConnection();
try (JarFile jar = urlcon.getJarFile();) {
Enumeration<JarEntry> entries = jar.entries();
while (entries.hasMoreElements()) {
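JarEntry entry = entries.nextElement();
// Skip the directory entry itself; only the profile files can be read.
if (!entry.isDirectory() && entry.getName().startsWith(dirname)) {
try (InputStream in = Detector.class.getClassLoader()
.getResourceAsStream(entry.getName());) {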
profiles.add(IOUtils.toString(in));
}
}
}
}
}
DetectorFactory.loadProfile(profiles);
Detector detector = DetectorFactory.create();
detector.append(text);
String langDetected = detector.detect();
System.out.println(langDetected);

Since no Maven support was available, and the mechanism to load profiles was not perfect (since you need to define files instead of resources), I created a fork which solves that problem:
https://github.com/galan/language-detector
I mailed the original author, so he can fork/maintain the changes, but no luck - seems the project is abandoned.
Here is an example of how to use it now (own profiles can be written where necessary):
DetectorFactory.loadProfile(new DefaultProfile()); // SmProfile is also available
Detector detector = DetectorFactory.create();
detector.append(input);
String result = detector.detect();
// maybe work with detector.getProbabilities()
I don't like the static approach the DetectorFactory uses, but I won't rewrite the full project, you have to create your own fork/pull request :)

It looks like the library only accepts files. You can either change the code and try submitting the changes upstream, or write your resource to a temp file and have the library load that.
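The temp-file route could look roughly like the sketch below. It assumes the profile files are reachable as classpath resources under a directory such as profiles/ and that their names are known in advance, because a class loader cannot list a directory portably. ProfileExtractor and its method names are hypothetical and not part of langdetect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies named classpath resources into a temp directory.
final class ProfileExtractor {

    // Copies one stream into dir under the given file name and returns the new file.
    static Path extractTo(Path dir, String name, InputStream in) throws IOException {
        Path target = dir.resolve(name);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Copies each named resource below resourceDir (for example "profiles/")
    // into a fresh temp directory and returns that directory.
    static Path extractProfiles(ClassLoader loader, String resourceDir, String... names)
            throws IOException {
        Path dir = Files.createTempDirectory("langdetect-profiles");
        for (String name : names) {
            try (InputStream in = loader.getResourceAsStream(resourceDir + name)) {
                if (in == null) {
                    throw new IOException("Missing resource: " + resourceDir + name);
                }
                extractTo(dir, name, in);
            }
        }
        return dir;
    }
}
```

The returned directory could then be handed to DetectorFactory.loadProfile(dir.toString()), the same path-based call used in the question.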

The solution provided by Mark Butler is still valid and solved my problem, but the dirname needs to be updated as the jar content has changed.
The problem was reported by Deepak, but I have insufficient reputation to reply in the comments. Here are the two declarations you need.
In order to load short profiles:
String dirname = "profiles/shorttext/";
In order to load long profiles:
String dirname = "profiles/longtext/";

Setting the working directory fixed the problem for me.
String workingDir = System.getProperty("user.dir");
DetectorFactory.loadProfile(workingDir+"/profiles/");

Related

checkmarx - How to resolve Stored Absolute Path Traversal issue?

Checkmarx - v 9.3.0 HF11
I am passing an env value as the data directory path in the Dockerfile, which is used on the dev/UAT servers:
ENV DATA /app/data/
Locally, I use the following environment variable:
DATA=C:\projects\app\data\
getDataDirectory("MyDirectoryName"); // MyDirectoryName is present in data folder
public String getDataDirectory(String dirName)
{
String path = System.getenv("DATA");
if (path != null) {
path = sanitizePathValue(path);
path = encodePath(path);
dirName = sanitizePathValue(dirName);
if (!path.endsWith(File.separator)) {
path = path + File.separator;
} else if (!path.contains("data")) {
throw new MyRuntimeException("Data Directory path is incorrect");
}
} else {
return null;
}
File file = new File(dirName); // NOSONAR
if (!file.isAbsolute()) {
File tmp = new File(SecurityUtil.decodePath(path)); // NOSONAR
if (!tmp.getAbsolutePath().endsWith(Character.toString(File.separatorChar))) {
dirName = tmp.getAbsolutePath() + File.separatorChar + dirName;
} else {
dirName = tmp.getAbsolutePath() + dirName;
}
}
return dirName;
}
public static String encodePath(String path) {
try {
return URLEncoder.encode(path, "UTF-8");
} catch (UnsupportedEncodingException e) {
logger.error("Exception while encoding path", e);
}
return "";
}
public static String validateAndNormalizePath(String path) {
path = path.replaceAll("/../", "/");
path = path.replaceAll("/%46%46/", "/");
path = SecurityUtil.cleanIt(path);
path = FilenameUtils.normalize(path); // normalize path
return path;
}
public static String sanitizePathValue(String filename){
filename = validateAndNormalizePath(filename);
String regEx = "..|\\|/";
// compile the regex to create pattern
// using compile() method
Pattern pattern = Pattern.compile(regEx);
// get a matcher object from pattern
Matcher matcher = pattern.matcher(filename);
// check whether Regex string is
// found in actualString or not
boolean matches = matcher.matches();
if(matches){
throw new MyAppRuntimeException("filename:'"+filename+"' is bad.");
}
return filename;
}
[Attempt] - Updated code that I tried, with the help of a few members, to prevent the path traversal issue.
I tried to sanitize and normalize the string, but had no luck and still get the same issue.
How do I resolve the Stored Absolute Path Traversal issue?

Your first attempt is not going to work because escaping alone isn't going to prevent a path traversal. Replacing single quotes with double quotes won't do it either given you need to make sure someone setting a property/env variable with ../../etc/resolv.conf doesn't succeed in tricking your code into overwriting/reading a sensitive file. I believe Checkmarx won't look for StringUtils as part of recognizing it as sanitized, so the simple working example below is similar without using StringUtils.
Your second attempt won't work because it is a validator that uses control flow to prevent a bad input when it throws an exception. Checkmarx analyzes data flows. When filename is passed as a parameter to sanitizePathValue and returned as-is at the end, the data flow analysis sees this as not making a change to the original value.
There also appear to be some customizations in your system that recognize System.getProperty and System.getenv as untrusted inputs. By default, these are not recognized in this way, so anyone trying to scan your code probably would not have gotten any results for Absolute Path Traversal. It is possible that the risk profile of your application requires that you treat properties and environment variables as untrusted inputs, so you can't really just remove these customizations and revert to the OOTB settings.
As Roman had mentioned, the logic in the query does look for values that are prepended to this untrusted input to remove those data flows as results. The below code shows how this could be done using Roman's method to trick the scanner. (I highly suggest you do not choose the route to trick the scanner.....very bad idea.) There could be other string literal values that would work using this method, but it would require some actions that control how the runtime is executed (like using chroot) to make sure it actually fixed the issue.
If you scan the code below, you should see only one vulnerable data path. The last example is likely something along the lines of what you could use to remediate the issues. It really depends on what you're trying to do with the file being created.
(I tested this on 9.2; it should work for prior versions. If it doesn't work, post your version and I can look into that version's query.)
// Vulnerable
String fn1 = System.getProperty ("test");
File f1 = new File(fn1);
// Path prepend - still vulnerable, tricks the scanner, DO NOT USE
String fn2 = System.getProperty ("test");
File f2 = new File(Paths.get ("", fn2).toString () );
// Path prepend - still vulnerable, tricks the scanner, DO NOT USE
String fn3 = System.getProperty ("test");
File f3 = new File("" + fn3);
// Path prepend - still vulnerable, tricks the scanner, DO NOT USE
String fn4 = System.getProperty ("test");
File f4 = new File("", fn4);
// Sanitized by stripping path separator as defined in the JDK
// This would be the safest method
String fn5 = System.getProperty ("test");
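// Pattern.quote keeps the Windows separator (a backslash) from being parsed as a regex escape
File f5 = new File(fn5.replaceAll (Pattern.quote(File.separator), ""));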
So, in summary (TL;DR), replace the file separator in the untrusted input value:
String fn5 = System.getProperty ("test");
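// Pattern.quote keeps the Windows separator (a backslash) from being parsed as a regex escape
File f5 = new File(fn5.replaceAll (Pattern.quote(File.separator), ""));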
Edit
Updating for other Checkmarx users that may come across this in search of an answer.
After my answer, OP updated the question to reveal that the issue being found was due to a mechanism written for the code to run in different environments. Pre-docker, this would have been the method to use. The vulnerability would have still been detected but most courses of action would have been to say "our deployment environment has security measures around it to prevent a bad actor from injecting an undesired path into the environment variable where we store our base path."
But now, with Docker, this is a thing of the past. Generally the point of Docker is to create applications that run the same way everywhere they are deployed. Using a base path in an environment variable likely means OP is executing the code outside of a container for development (based on the update showing a Windows path) and inside the container for deployment. Why not just run the code in the container for development as well as deployment, as Docker intends?
Most of the answers tend to explain that OP should use a static path. This is because they are realizing that there is no way to avoid this issue because taking an untrusted input (from the environment) and prefixing it to a path is the exact problem of Absolute Path Traversal.
OP could follow the good advice of many posters here and put a static base path in the code then use Docker volumes or Docker bind mounts.
Is it difficult? Nope. If I were OP, I'd fix the base path prefix in code to a static value of /app/data and do a simple volume binding during development. (When you think about it, if there is storage of data in the container during a deployment then the deployment environment must be doing this exact thing for /app/data unless the data is not kept after the lifetime of the container.)
With the base path fixed at /app/data, one option for OP to run their development build is:
docker run -it -v"C:\\projects\\app\\data":/app/data {image name goes here}
All data written by the application would appear in C:\projects\app\data the same way it does when using the environment variables. The main difference is that there are no environment-variable-prefixed paths and thus no Absolute Path Traversal results from the static analysis scanner.

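It depends on how Checkmarx comes to this point. Most likely the value that is handed to File is still tainted. So make sure both /../ and /%46%46/ are replaced by /. (Note that the URL-encoded form of .. is actually %2E%2E; %46 decodes to the letter F.)
// replace() matches literally; replaceAll("/../", "/") would treat each . as a regex wildcard
checkedInput = userInput.replace("/../", "/");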
Secondly, give File a parent directory to start with and later compare the path of the file you want to process. Some common example code is below. If the file doesn't start with the full parent directory, then it means you have a path traversal.
File file = new File(BASE_DIRECTORY, userInput);
if (file.getCanonicalPath().startsWith(BASE_DIRECTORY)) {
// process file
}
Checkmarx can only check if variables contain a tainted value and in some cases if the logic is correct. Please also think about the running process and file system permissions. A lot of applications have the capability of overwriting their own executables.
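A variant of the parent-directory check can be sketched with java.nio.file.Path. This is a minimal sketch under stated assumptions: PathGuard and resolveInside are hypothetical names, the check is purely lexical (it does not resolve symbolic links, for which toRealPath would be needed on existing files), and nothing here says whether a particular scanner recognizes it as a sanitizer. Path.startsWith compares whole name elements, so a sibling such as /app/data-evil does not count as being inside /app/data, which a plain String.startsWith comparison would allow.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves user input below a fixed base directory
// and rejects anything that would end up outside of it.
final class PathGuard {

    static Path resolveInside(Path baseDir, String userInput) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the comparison
        Path candidate = base.resolve(userInput).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userInput);
        }
        return candidate;
    }
}
```

For example, resolveInside(Paths.get("/app/data"), "reports/a.txt") returns a path below /app/data, while an input such as "a/../../secret.txt" is rejected with an IllegalArgumentException.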

If there is one thing to remember it is this
use allow lists not deny lists
(traditionally known as whitelists and blacklists).
For instance, consider replacing /../ with /, as suggested in another answer. My response is to supply input containing the sequence /../../. You could pursue this iteratively, and I might run out of adversarial examples, but that doesn't mean there aren't any.
Another problem is knowing all the special characters. \0 used to truncate the file name. What happens to non-ASCII characters? I can't remember. Other code might be changed in the future so that the path ends up on a command line with other special characters or, worse, ones that depend on the OS or command line.
Canonicalisation has its problems too. It can be used to some extent to probe the file system (and perhaps beyond the machine).
So, choose what you allow. Say
if (filename.matches("[a-zA-Z0-9_]+")) {
return filename;
} else {
throw new MyException(...);
}
(No need to go through the whole Pattern/Matcher palaver in this situation.)
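The difference between the two approaches can be seen in a small sketch. DenyListDemo and its method names are hypothetical; stripOnce mirrors the single literal replacement discussed above, and isAllowed is the allow-list pattern from this answer.

```java
// Hypothetical demo class contrasting a deny-list replacement with an allow list.
final class DenyListDemo {

    // Deny-list style: remove one known-bad sequence with a single literal replace.
    static String stripOnce(String input) {
        return input.replace("/../", "/");
    }

    // Allow-list style: accept only names made of letters, digits and underscores.
    static boolean isAllowed(String filename) {
        return filename.matches("[a-zA-Z0-9_]+");
    }
}
```

With the input "a/../../b", stripOnce returns "a/../b", which still contains a traversal segment, whereas isAllowed rejects that input outright and accepts a name such as "report_2024".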

For this issue I would suggest you hard-code the absolute path of the directory that you allow your program to work in, like this:
String separator = FileSystems.getDefault().getSeparator();
// should resolve to /app/workdir in linux
String WORKING_DIR = separator + "app"+separator +"workdir"+separator ;
then when you accept the parameter treat it as a relative path like this:
String filename = System.getProperty("test");
sanitize(filename);
filename = WORKING_DIR+filename;
File dictionaryFile = new File(filename);
To sanitize your user's input, make sure it does not include .. and does not include \ or / either:
private static void sanitize(String filename){
// Reject "..", a backslash, or a forward slash anywhere in the name.
if(Pattern.compile("\\.\\.|\\\\|/").matcher(filename).find()){
throw new RuntimeException("filename:'"+filename+"' is bad.");
}
}
Edit
If you are running the process on Linux, you can also change the root of the process using chroot; you will need to look up how to set that up.

How about using Java's Path to make the check ("../test1.txt" is the input from the user):
File base=new File("/your/base");
Path basePath=base.toPath();
// normalize() collapses ".." segments so inputs such as "a/../../x" are caught too
Path resolve = basePath.resolve("../test1.txt").normalize();
Path relativize = basePath.relativize(resolve);
if(relativize.startsWith("..")){
throw new Exception("invalid path");
}

Based on reading the Checkmarx query for the absolute path traversal vulnerability (and I believe this is one of the general mitigation approaches), the fix is to prepend a hard-coded path so that attackers cannot traverse the file system.
File has a two-argument constructor whose first parameter is a parent path, which lets you do this prepending:
String filename = System.getenv("test");
File dictionaryFile = new File("/home/", filename);
UPDATE:
The validateAndNormalizePath method would technically have sufficed, but I believe Checkmarx is unable to recognize it as a sanitizer (being a custom-written function). I would advise working with your App Security team so they can use CxAudit to override the base Stored Path Traversal Checkmarx query to recognize validateAndNormalizePath as a valid sanitizer.

Get current path of executed file

I am trying to write to and read from a file called Books.txt in my Java project.
The problem is that I can access the file only if partialPath contains the full path to the file.
Here is the code:
public <T> List<T> readFromFile(String fileName) {
private String partialPath = "\\HW3\\src\\java\\repos\\";
try {
String path = partialPath + fileName;
FileInputStream fi = new FileInputStream(path);
ObjectInputStream oi = new ObjectInputStream(fi);
// Read objects
List<T> items = (List<T>) oi.readObject();
oi.close();
fi.close();
return items;
} catch (IOException | ClassNotFoundException e) {
}
}
If I set a relative path as above, I get a file-not-found exception.
My question is: how can I determine the full path to the current directory programmatically?

Here is a code snippet of the Drombler Commons - Client Startup code I wrote, to determine the location of the executable jar. Replace DromblerClientStarter with your main class.
This should work at least when you're running your application as an executable JAR file.
/**
* The jar URI prefix "jar:"
*/
private static final String FULL_JAR_URI_PREFIX = "jar:";
/**
* Length of the jar URI prefix "jar:"
*/
private static final int FULL_JAR_URI_PREFIX_LENGTH = 4;
private Path determineMainJarPath() throws URISyntaxException {
Class<DromblerClientStarter> type = DromblerClientStarter.class;
String jarResourceURIString = type.getResource("/" + type.getName().replace(".", "/") + ".class").toURI().
toString();
int endOfJarPathIndex = jarResourceURIString.indexOf("!/");
String mainJarURIString = endOfJarPathIndex >= 0 ? jarResourceURIString.substring(0, endOfJarPathIndex)
: jarResourceURIString;
if (mainJarURIString.startsWith(FULL_JAR_URI_PREFIX)) {
mainJarURIString = mainJarURIString.substring(FULL_JAR_URI_PREFIX_LENGTH);
}
Path mainJarPath = Paths.get(URI.create(mainJarURIString));
return mainJarPath;
}
Depending on where you bundle Books.txt in your application distribution package, you can use this mainJarPath to determine the path of Books.txt.

I also feel that files created (and possibly later modified or deleted) by your running Java application are usually better placed in a location of the file system away from the application's install directory. An example might be 'C:\ProgramData\ApplicationNameFiles\' on Windows, or something similar on other platforms. In my opinion this leaves less chance of corrupting essential application files through a poorly maintained drive, or through accidental deletion by a user who opens a file explorer and decides to clean the system of so-called unnecessary files, among other less obvious reasons.
Because Java can run on almost any platform and such data file locations are platform specific, the user should be allowed to select the location where these files are created and manipulated. This location can then be saved as a property. It is slightly more work, but IMHO it may well be worth it.
It is obviously much easier to create a directory (folder) within the install directory of your JAR file when it is first started and then store and manipulate your application's data files there. That is definitely easier to find, but it is a matter of opinion and it wouldn't be my choice. Nevertheless, if you're bent on doing it this way, your Java application's install utility should know where that install path is, so it is just a matter of storing that location somewhere.
No install utility? Then your Java application will need a means of knowing where your JAR file is running from, and the following code is one way to do that:
public String applicationPath(Class<?> mainStartupClassName) {
try {
String path = mainStartupClassName.getProtectionDomain().getCodeSource().getLocation().getPath();
String pathDecoded = URLDecoder.decode(path, "UTF-8");
pathDecoded = pathDecoded.trim().replace("/", File.separator);
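// On Windows the URL path looks like "/C:/..."; drop the leading separator there only,
// so that absolute paths on Unix-like systems keep their leading "/".
if (File.separatorChar == '\\' && pathDecoded.startsWith(File.separator)) {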
pathDecoded = pathDecoded.substring(1);
}
return pathDecoded;
}
catch (UnsupportedEncodingException ex) {
Logger.getLogger("applicationPath() Method").log(Level.SEVERE, null, ex);
}
return null;
}
And here is how you would use this method:
String appPath = applicationPath(MyMainStartupClassName.class);
Do keep in mind that if this method is run from within your IDE it will most likely not return the path to your JAR file but instead point to a folder where your classes are stored for the application build.
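The same lookup can also be sketched with java.nio.file.Path, which avoids the manual decoding and separator handling. This is only a sketch: AppLocation and its method names are hypothetical, and getCodeSource() may return null for classes loaded by the bootstrap class loader, so that case is reported explicitly.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

// Hypothetical helper for locating files next to the running JAR or class directory.
final class AppLocation {

    // Returns the JAR file (or the classes directory, e.g. inside an IDE)
    // from which the given class was loaded.
    static Path codeSourcePath(Class<?> type) throws URISyntaxException {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        if (location == null) {
            throw new IllegalStateException("No code source for " + type.getName());
        }
        return Paths.get(location.toURI());
    }

    // Resolves fileName in the directory that contains the JAR,
    // or in the code source itself when it already is a directory.
    static Path besideCodeSource(Path codeSource, String fileName) {
        Path dir = Files.isDirectory(codeSource)
                ? codeSource
                : codeSource.toAbsolutePath().getParent();
        return dir.resolve(fileName);
    }
}
```

A call such as besideCodeSource(codeSourcePath(MyMainStartupClassName.class), "Books.txt") would then point at a Books.txt stored next to the JAR.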

This is not an issue unique to Java; it's a problem faced by any developer, in any language, who wants to write data locally to disk. There are many parts to this problem.
If you want to be able to write to the file (and, presumably, read the changes), then you need to devise a solution which allows you to find the file in a platform-independent way.
Some of the issues
The installation location of the program
While most OSs do have some conventions governing this, that doesn't mean they are always used, for whatever reason.
Also, on some OSs, you are actively restricted from writing to the "installation" location. Windows 8+ doesn't allow you to write to the "Program Files" directory, and in Java, this usually (or at least when I was dealing with it) fails silently.
On macOS, if you're using an "app bundle", the working directory is automatically set to the user's home directory, making it even more difficult to manage.
The execution context (or working directory) may be different from the installation location of the program
A program can be installed in one location, but executed from a different location, this will change the working directory location. Many command line tools suffer from this issue and use different conventions to work around it (ever wonder what the JAVA_HOME environment variable is for 🤔)
Restricted disk access
Many OS's are now actively locking down the locations to which programs can write, even with admin privileges.
A reusable solution...
Most OS's have come up with conventions for solving this issue, not just for Java, but for all developers wishing to work on the platform.
Important: Like all guidelines, these are not hard and fast rules but recommendations made by the platform authors, which are intended to make your life simpler and make the operation of the platform safer.
The most common solution is to simply place the file in a "well known location" on the disk, which can be accessed through an absolute path independently of the installation or execution location of the program.
On Windows, this means placing the file in either ~\AppData\Local\{application name} or ~\AppData\Roaming\{application name}
On macOS, this means placing the file in ~/Library/Application Support/{application name}
On *nix, this typically means placing the file in ~/.{application name}
It could be argued that you could use ~/.{application name} on all three platforms, but as a user who "shows hidden files", I'd prefer you didn't pollute my home directory.
A possible, reusable, solution...
When Windows 8 came out, I hit the "you can't write to Program Files" issue, which took some time to diagnose, as it didn't generate an exception; it just failed.
I was also working a lot more on macOS, so I needed a simple, cross-platform solution that let my code adapt automatically without the need for multiple branches per platform.
To this end, I came up with a simple utility class...
public enum SystemUtilities {
INSTANCE;
public boolean isMacOS() {
return getOSName().startsWith("Mac");
}
public boolean isMacOSX() {
return getOSName().startsWith("Mac OS X");
}
public boolean isWindowsOS() {
return getOSName().startsWith("Windows");
}
public boolean isLinux() {
return getOSName().startsWith("Linux");
}
public String getOSName() {
return System.getProperty("os.name");
}
public File getRoamingApplicationSupportPath() {
// For *nix, use '~/.{AppName}'
String path = System.getProperty("user.home");
if (isWindowsOS()) {
path += "\\AppData\\Roaming";
} else if (isMacOS()) {
path += "/Library/Application Support";
}
return new File(path);
}
public File getLocalApplicationSupportPath() {
// For *nix, use '~/.{AppName}'
String path = System.getProperty("user.home");
if (isWindowsOS()) {
path += "\\AppData\\Local";
} else if (isMacOS()) {
path += "/Library/Application Support";
}
return new File(path);
}
}
This provides a baseline from which "independent" code can be built, for example, you could use something like...
File appDataDir = new File(SystemUtilities.INSTANCE.getLocalApplicationSupportPath(), "MyAwesomeApp");
if (appDataDir.exists() || appDataDir.mkdirs()) {
File fileToWrite = new File(appDataDir, "Books.txt");
//...
}
to read/write to the file. Personally, though, I might have a manager/factory do this work and return the reference to the final File, but that's me.
What about "pre-packaged" files?
Three possible solutions...
Create the file(s) if they don't exist, populating them with default values as required
Copy "template" file(s) out of the Jar file, if they don't exist
Use an installer to install the files - this is the solution we used when we were faced with changing the location of all our "external" configuration files.
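The second option, copying a "template" out of the Jar when the file does not exist yet, could be sketched as follows. TemplateFiles and ensureFile are hypothetical names, and the caller is assumed to supply the template as a stream, for example from getResourceAsStream with a resource path of its own choosing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: seeds a writable file from a bundled template on first use.
final class TemplateFiles {

    // Copies the template to target only when target does not exist yet,
    // creating missing parent directories first. An existing file is left untouched.
    static Path ensureFile(Path target, InputStream template) throws IOException {
        if (Files.notExists(target)) {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.copy(template, target);
        }
        return target;
    }
}
```

Because an existing file is never overwritten, later changes made by the user survive application restarts.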
Read only files...
For read-only files, the simplest solution is to embed them within the Jar as "embedded resources"; this makes them easier to locate and manage...
URL url = getClass().getResource("/path/to/readOnlyResource.txt");
How you do this, will depend on your build system

Load resource from another JAR that hasn't been loaded

I'm trying to load a resource (plain text file) from a second JAR that has not been loaded yet. This resource will contain a string naming a class in this second JAR which I plan to use.
I'm having trouble finding the correct way to load this resource, and previous similar questions haven't gotten me much further. Here is what I'm working with:
public void readResource() {
ClassLoader loader = Thread.currentThread().getContextClassLoader();
}
I can see that this ClassLoader (which ends up being a WebappClassLoader) has the list of jars in the directory:
jarNames: [com.mysql.jdbc.jar, productivity-common.jar]
jarPath: /WEB-INF/lib
When I try to load up the file using the ClassLoader, I'm getting a NullPointerException:
String path = loader.getResource("com/productivity/common/META-INF/providers/hello.txt").getPath();
If this would work, my next step would be reading the value in this file using an InputStream, and trying to create a new instance of a class matching that value from the same second jar. From what I'm reading, I would use the path to that class and use Class.forName("value").newInstance(), but I'm not confident that's right either.
Any assistance would be greatly appreciated. I'm trying to learn how ClassLoaders work and writing this (what should be simple) project to help.

Let me assume you have two resource files with the same name, "spring/label.properties", stored in two different jar files.
You can use the following code to find all matching files on the class path and then filter them by path.
Enumeration<URL> en = this.getClass().getClassLoader().getResources("spring/label.properties");
while(en.hasMoreElements()){
URL url = en.nextElement();
//Print all path to visualize the path
System.out.println(url.getPath());
if(url.getPath().contains("my-jar")){ // This can be jar name
// Read from the URL already fetched above; calling en.nextElement() again would skip to the next resource
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
String str = null;
while((str = reader.readLine())!=null){
// Now you can do anything with the content.
System.out.println(str);
}
}
}
Does that help?
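The follow-up step from the question, reading a class name from the resource and instantiating it, might be sketched like this. ProviderLoader and its method names are hypothetical. It also assumes the resource is addressed relative to the root of the jar (for example "META-INF/providers/hello.txt"), which is a guess about the jar layout; getResource and getResourceAsStream return null when nothing is found at the given path, so the result should be checked before use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a class name from a text resource and instantiates it.
final class ProviderLoader {

    // Returns the first line of the stream, trimmed, as the class name.
    static String readClassName(InputStream in) throws IOException {
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            if (line == null || line.trim().isEmpty()) {
                throw new IOException("Resource does not name a class");
            }
            return line.trim();
        }
    }

    // Loads the class through the given loader and calls its no-argument constructor.
    static Object instantiate(String className, ClassLoader loader)
            throws ReflectiveOperationException {
        return Class.forName(className, true, loader).getDeclaredConstructor().newInstance();
    }
}
```

Passing the same class loader that located the resource to instantiate keeps the lookup of the file and of the class in one place.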

Create File object of file from parent directory in java

I have this issue of accessing a file in one of the parent directories.
To explain, consider the following dir structure:-
C:/Workspace/Appl/src/org/abc/bm/TestFile.xml
C:/Workspace/Appl/src/org/abc/bm/tests/CheckTest.java
In the CheckTest.java I want to create a File instance for the TestFile.xml
public class Check {
public void checkMethod() {
File f = new File({filePath value I want to determine}, "TestFile.xml");
}
}
I tried a few things with getAbsolutePath() and the getParent() etc but was getting a bit complicated and frankly I think I messed it up.
The reason I don't want to use "C:/Workspace/Appl/src/org/abc/bm" while creating the File instance is because the C:/Workspace/Appl is not fixed and in all circumstances will be different at runtime and basically I don't want to hard-code.
What could be the easiest and cleaner way to achieve this ?
Thank you.

You should load it from Classpath in this case.
In your CheckTest.java, try
InputStream fileIs = CheckTest.class.getClassLoader().getResourceAsStream("org/abc/bm/TestFile.xml");

Use System.getProperty to get the base directory; you can set base.dir when launching the application:
java -Dbase.dir=c:\User\pkg
System.getProperty("base.dir");
and use
System.getProperty("file.separator");
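Put together, the property-based approach might look like the sketch below. BaseDir is a hypothetical name, and falling back to the user.dir property when base.dir is not set is an assumption of this sketch. Resolving through Paths handles the platform separator, so file.separator does not need to be concatenated by hand.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves file names against the directory given by -Dbase.dir=...
final class BaseDir {

    static Path resolve(String fileName) {
        // Assumption: use the working directory when base.dir was not supplied.
        String base = System.getProperty("base.dir", System.getProperty("user.dir"));
        return Paths.get(base).resolve(fileName);
    }
}
```

With the JVM started as in the command above, BaseDir.resolve("TestFile.xml") yields the path of TestFile.xml inside the configured base directory.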

What could be the easiest and cleaner way to achieve this ?
For accessing static resources use:
URL urlToResource = this.getClass().getResource("path/to/the.resource");
If the resource is expected to change, write it to a sub-directory of user.home, where it is easy to locate later.

First of all, you can't get a reference to the source file path at runtime.
But you can access the resources included in your classpath (where your compiled .class files will be).
Normally, your build will copy the XML file from your source directory into the build directory, so you could end up with something like this:
C:/Workspace/Appl/classes/org/abc/bm/TestFile.xml
C:/Workspace/Appl/classes/org/abc/bm/tests/CheckTest.class
Then, with your classpath pointing to the compiled classes root dir, you get the resources from this directory, using the ClassLoader.getResource method (or the equivalent Class.getResource() method).
public class Check {
public void checkMethod() throws java.net.URISyntaxException {
// TestFile.xml sits in org/abc/bm, one package above this class
java.net.URL fileURL=this.getClass().getResource("/org/abc/bm/TestFile.xml");
File f=new File( fileURL.toURI());
}
}

One could do this:
String pathOfTheCurrentClass = this.getClass().getResource(".").getPath();
File file = new File(pathOfTheCurrentClass + "/..", "TestFile.xml");
or
String pathOfTheCurrentClass = this.getClass().getResource(".").getPath();
File filePath = new File(pathOfTheCurrentClass);
File file = new File(filePath.getParent(), "TestFile.xml");
But as Tomas Naros points out this gives you the file located in the build path.

Did you try
URL some=Test.class.getClassLoader().getResource("org/abc/bm/TestFile.xml");
File file = new File(some.getFile());

How do I read the manifest file for a webapp running in apache tomcat?

I have a webapp which contains a manifest file, in which I write the current version of my application during an ant build task. The manifest file is created correctly, but when I try to read it in during runtime, I get some strange side-effects. My code for reading in the manifest is something like this:
InputStream manifestStream = Thread.currentThread()
.getContextClassLoader()
.getResourceAsStream("META-INFFFF/MANIFEST.MF");
try {
Manifest manifest = new Manifest(manifestStream);
Attributes attributes = manifest.getMainAttributes();
String impVersion = attributes.getValue("Implementation-Version");
mVersionString = impVersion;
}
catch(IOException ex) {
logger.warn("Error while reading version: " + ex.getMessage());
}
When I attach Eclipse to Tomcat, I see that the above code works, but it seems to get a different manifest file than the one I expected, which I can tell because the Ant version and build timestamp are both different. Then I changed the path to the nonexistent "META-INFFFF" (as shown above), and the code still works! This means that I'm reading some other manifest, not mine. I also tried
this.getClass().getClassLoader().getResourceAsStream(...)
But the result was the same. What's the proper way to read the manifest file from inside of a webapp running in tomcat?
Edit: Thanks for the suggestions so far. Also, I should note that I am running tomcat standalone; I launch it from the command line, and then attach to the running instance in Eclipse's debugger. That shouldn't make a difference, should it?

Maybe your side-effects come from the fact that almost all jars include a MANIFEST.MF and you're not getting the right one. To read the MANIFEST.MF from the webapp, I would say:
ServletContext application = getServletConfig().getServletContext();
InputStream inputStream = application.getResourceAsStream("/META-INF/MANIFEST.MF");
Manifest manifest = new Manifest(inputStream);
Please note that running Tomcat from Eclipse is not the same as running Tomcat alone as Eclipse plays with the classloader.
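To see which manifests a class loader can actually reach, and to pick yours by content rather than by luck, one option is to enumerate all of them. A minimal sketch using only the JDK; the class ManifestFinder and the idea of matching on a known attribute value are assumptions for illustration, not something Tomcat provides:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.jar.Manifest;

// Hypothetical helper: scans every META-INF/MANIFEST.MF visible to a class loader.
final class ManifestFinder {

    /**
     * Returns the first visible manifest whose main attribute 'name'
     * equals 'expected', or null if none matches.
     */
    static Manifest findByAttribute(ClassLoader loader, String name, String expected)
            throws IOException {
        Enumeration<URL> urls = loader.getResources("META-INF/MANIFEST.MF");
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (InputStream in = url.openStream()) {
                Manifest mf = new Manifest(in);
                if (expected.equals(mf.getMainAttributes().getValue(name))) {
                    return mf;
                }
            }
        }
        return null;
    }
}
```

This only helps when your manifest is reachable through the class loader at all and carries an attribute you can recognize, for example an Implementation-Title written during the build.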

A bit late, but this works for me (web app in GlassFish):
Properties prop = new Properties();
prop.load(getServletContext().getResourceAsStream("/META-INF/MANIFEST.MF"));
System.out.println("All attributes:" + prop.stringPropertyNames());
System.out.println(prop.getProperty("{whatever attribute you want}"));

Try jcabi-manifests, which does all this loading work for you. For example:
String version = Manifests.read("My-Version");
loads the My-Version attribute from one of the available MANIFEST.MF files.
It is important to mention that in most web containers the current thread's class loader is not the same as the servlet context class loader. That's why you should append your servlet context to the registry at runtime:
Manifests.append(servletContext);
Also, check this out: http://www.yegor256.com/2014/07/03/how-to-read-manifest-mf.html

The default way class loaders work is to defer to the parent before attempting to lookup their own resources. So if a parent class loader has any manifest available, that's what you'll get. In fact, app servers don't necessarily do this, to allow applications to override versions of libraries. Further, class loaders can have multiple jars and hence multiple manifests.
It may be possible to get the resource URL of one of your uniquely named resources. Open a connection. Cast to JarURLConnection. Get the JarFile. Load the manifest from that. That may not work, particularly if Tomcat explodes the war.
[Update] Of course, the war file itself isn't on the classpath. The classpath will have something like WEB-INF/lib/(.jar|.zip) and WEB-INF/classes/. Getting a resource from the ServletContext should work.
Best solution: Do something different. :)
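The JarURLConnection route described above can be sketched as follows. This is a minimal illustration using only JDK classes; the class name OwnJarManifest is hypothetical, and the null return for non-JAR URLs reflects the caveat about exploded wars:

```java
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.jar.Manifest;

// Hypothetical helper: finds the manifest of the JAR that holds a given resource.
final class OwnJarManifest {

    /**
     * Returns the manifest of the JAR containing 'resourceUrl', or null when
     * the resource is not inside a JAR (for example in an exploded war).
     */
    static Manifest manifestOf(URL resourceUrl) throws IOException {
        URLConnection conn = resourceUrl.openConnection();
        if (!(conn instanceof JarURLConnection)) {
            return null; // e.g. a file: URL, so there is no enclosing JAR
        }
        // getManifest() connects if needed and may itself return null
        return ((JarURLConnection) conn).getManifest();
    }
}
```

A call might look like OwnJarManifest.manifestOf(MyServlet.class.getResource("MyServlet.class")), where MyServlet stands for any class of your own that is packaged in the JAR you care about.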

The right manifest exists in the application root on the server.
Find the application root, for instance from the code source location of your class:
String rootPath = getClass().getProtectionDomain().getCodeSource().getLocation().getPath();
Then build the manifest path from the path you found. GlassFish example:
/applications/<webProject>/META-INF/MANIFEST.MF
It works for me.

I don't know of an "official" way to read it, but if MANIFEST.MF can't be loaded properly as a resource, how about deriving its path from ServletContext.getRealPath() on some web path defined in your app?
Having Ant also write the app version somewhere else during the build (a properties file in WEB-INF/classes) is another solution that comes to mind.
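The properties-file idea could look roughly like this. It is only a sketch: the file name version.properties, the key app.version and the class VersionInfo are assumptions made up for illustration, and the build would have to write that file into WEB-INF/classes:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper: reads a build-generated version from a properties resource.
final class VersionInfo {

    /**
     * Loads 'resource' through the given class loader and returns the value
     * of the (assumed) key "app.version", or 'fallback' if the file or key is missing.
     */
    static String readVersion(ClassLoader loader, String resource, String fallback)
            throws IOException {
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                return fallback; // resource not on the classpath
            }
            Properties props = new Properties();
            props.load(in);
            return props.getProperty("app.version", fallback);
        }
    }
}
```

Inside a webapp this might be called as VersionInfo.readVersion(Thread.currentThread().getContextClassLoader(), "version.properties", "unknown"). Because the file name is unique to your application, there is no risk of picking up another JAR's file, unlike with MANIFEST.MF.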

This is what I do to print various versions to a log file. I have hardcoded an expanded path, but apps may use servletContext.getRealPath("/") to get the full path to the webapp folder. It can print just the given libraries or everything from the lib folder.
// print library versions (jersey-common.jar, jackson-core-2.6.1.jar)
try {
List<String> jars = Arrays.asList( "jersey-common", "jackson-core", "openjpa", "mylib" );
StringBuilder verbuf = new StringBuilder();
for(File file : new File("/opt/tomcat/webapps/myapp/WEB-INF/lib/").listFiles() ) {
String name = file.getName();
if (file.isDirectory() || !file.isFile() || !name.endsWith(".jar") ) continue;
name = name.substring(0, name.length()-4);
boolean found = jars.contains(name);
if (!found) {
int idx = name.lastIndexOf('-');
if (idx>0)
found = jars.contains( name.substring(0, idx) );
}
if (!found) continue;
JarFile jarFile = new JarFile(file, false);
try {
String ver;
Manifest mf = jarFile.getManifest();
if (mf!=null) {
ver = mf.getMainAttributes().getValue("Bundle-Version");
if (ver==null || ver.isEmpty())
ver = mf.getMainAttributes().getValue("Implementation-Version");
} else ver=null;
if (verbuf.length()>0) verbuf.append(", ");
verbuf.append(name + "=" + (ver!=null?ver:"") );
} finally {
jarFile.close();
}
}
System.out.println( verbuf.toString() );
} catch(Exception ex) {
ex.printStackTrace();
}
