Java - find matching pairs from list

background:
I need to load test a process on a server I am working with. I create a batch of files on the client side and upload them to the server. The server monitors an input directory for new files (file names are unique). When a new file arrives, the server processes it and then writes a response file with the same name but a different extension to the output directory. If processing fails, it moves the incoming file to the error directory. I am using inotifywait to monitor the changes on the server, which outputs:
10:48:47 /path/to/in/ CREATE ABCD.infile1
10:48:55 /path/to/out/ CREATE ABCD.outfile1
or
10:49:11 /path/to/in/ CREATE ASDF.infile1
10:49:19 /path/to/err/ CREATE ASDF.infile1
problem:
I need to parse the list of all results (I plan to implement this in Java): take each infile, match it with the file of the same name (found in either ERR or OUT), calculate the time taken, and indicate whether it succeeded. My idea is to create three lists (in, out, err) and parse them, something like this (in pseudo-code):
inList
outList
errList
for item : inList
if outlist.contains(item) parse;
else if errList.contains(item) parse;
else error;
question:
Is this efficient? Or is there a better way to approach this situation? You might think that this code only runs once, so why bother, but I would really like to know how to handle this properly.
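For comparison with the three-list pseudo-code, here is a minimal sketch that keeps a single map from base file name to start time, so each result is matched with one lookup instead of a list scan. The class name EventMatcher and its method are hypothetical. The sketch assumes the four-column inotifywait format shown above, file names without spaces, and a run that does not cross midnight.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: pairs each IN event with its OUT or ERR event by base file name. */
class EventMatcher {

    /** Returns one summary line per file, e.g. "ABCD SUCCESS 8s". */
    static List<String> match(List<String> logLines, String inDir, String outDir, String errDir) {
        Map<String, LocalTime> started = new HashMap<>(); // base name -> time the input appeared
        List<String> results = new ArrayList<>();
        for (String line : logLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 4) {
                continue; // skip lines that are not "time dir event file"
            }
            LocalTime time = LocalTime.parse(parts[0]);
            String dir = parts[1];
            String baseName = baseName(parts[3]);
            if (dir.equals(inDir)) {
                started.put(baseName, time);
            } else if (dir.equals(outDir) || dir.equals(errDir)) {
                LocalTime start = started.remove(baseName);
                if (start == null) {
                    continue; // result without a matching input
                }
                String status = dir.equals(outDir) ? "SUCCESS" : "ERROR";
                long seconds = Duration.between(start, time).getSeconds();
                results.add(baseName + " " + status + " " + seconds + "s");
            }
        }
        // Anything still in the map never produced a result file.
        for (String pending : started.keySet()) {
            results.add(pending + " PENDING");
        }
        return results;
    }

    /** Strips the extension so that ABCD.infile1 and ABCD.outfile1 share one key. */
    private static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }
}
```

With a hash map the whole pass is linear in the number of log lines, whereas contains() on a list is a linear scan per item.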

The solution with lists is problematic, because you have to keep them synchronized with the state of the drive and reload them every time. What is more, at some point you will hit the limit on the number of files that can be stored in a single location.
The alternatives are to use the I/O API to check whether a path exists, or to introduce an intermediate database that stores, for each key, the physical path the file really has.
If I were you, I would start with the I/O API and design a simple interface that can be replaced later if the solution turns out to be inefficient.

You can use the "UserDefinedFileAttributeView" concept.
Create your own file attribute, say "Result", and set its value for the files in the IN dir. If the file is moved to the OUT dir, "Result"="Success"; if the file is moved to the ERR dir, "Result"="Error".
I tried the code below; I hope it helps.
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.util.List;

public static void main(String[] args) {
    try {
        Path file = Paths.get("C:\\Users\\rohit\\Desktop\\imp docs\\Steps.txt");
        // The view is null if the file system does not support user-defined attributes
        UserDefinedFileAttributeView userView = Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        String attribName = "RESULT";
        String attribValue = "SUCCESS";
        // Store the attribute on the file
        userView.write(attribName, Charset.defaultCharset().encode(attribValue));
        // Read back every user-defined attribute and print the successful ones
        List<String> attribList = userView.list();
        for (String s : attribList) {
            ByteBuffer buf = ByteBuffer.allocate(userView.size(s));
            userView.read(s, buf);
            buf.flip();
            String value = Charset.defaultCharset().decode(buf).toString();
            if ("SUCCESS".equals(value)) {
                System.out.print(String.format("User defined attribute: %s", s));
                System.out.println(String.format("; value: %s", value));
            }
        }
    } catch (Exception e) {
        e.printStackTrace(); // do not swallow I/O errors silently
    }
}
You can do this for every file placed in the IN dir.

Related

checkmarx - How to resolve Stored Absolute Path Traversal issue?

Checkmarx - v 9.3.0 HF11
I am passing the data directory path as an environment variable in the Dockerfile, which is used on the dev/UAT servers:
ENV DATA /app/data/
Locally, I use the following environment variable:
DATA=C:\projects\app\data\
getDataDirectory("MyDirectoryName"); // MyDirectoryName is present in data folder
public String getDataDirectory(String dirName)
{
String path = System.getenv("DATA");
if (path != null) {
path = sanitizePathValue(path);
path = encodePath(path);
dirName = sanitizePathValue(dirName);
if (!path.endsWith(File.separator)) {
path = path + File.separator;
} else if (!path.contains("data")) {
throw new MyRuntimeException("Data Directory path is incorrect");
}
} else {
return null;
}
File file = new File(dirName); // NOSONAR
if (!file.isAbsolute()) {
File tmp = new File(SecurityUtil.decodePath(path)); // NOSONAR
if (!tmp.getAbsolutePath().endsWith(Character.toString(File.separatorChar))) {
dirName = tmp.getAbsolutePath() + File.separatorChar + dirName;
} else {
dirName = tmp.getAbsolutePath() + dirName;
}
}
return dirName;
}
public static String encodePath(String path) {
try {
return URLEncoder.encode(path, "UTF-8");
} catch (UnsupportedEncodingException e) {
logger.error("Exception while encoding path", e);
}
return "";
}
public static String validateAndNormalizePath(String path) {
path = path.replaceAll("/../", "/");
path = path.replaceAll("/%46%46/", "/");
path = SecurityUtil.cleanIt(path);
path = FilenameUtils.normalize(path); // normalize path
return path;
}
public static String sanitizePathValue(String filename){
filename = validateAndNormalizePath(filename);
String regEx = "..|\\|/";
// compile the regex to create pattern
// using compile() method
Pattern pattern = Pattern.compile(regEx);
// get a matcher object from pattern
Matcher matcher = pattern.matcher(filename);
// check whether Regex string is
// found in actualString or not
boolean matches = matcher.matches();
if(matches){
throw new MyAppRuntimeException("filename:'"+filename+"' is bad.");
}
return filename;
}
[Attempt] - Updated code that I tried, with the help of a few members, to prevent the path traversal issue.
I tried to sanitize and normalize the string, but had no luck and still get the same issue.
How can I resolve the Stored Absolute Path Traversal issue?

Your first attempt is not going to work because escaping alone isn't going to prevent a path traversal. Replacing single quotes with double quotes won't do it either given you need to make sure someone setting a property/env variable with ../../etc/resolv.conf doesn't succeed in tricking your code into overwriting/reading a sensitive file. I believe Checkmarx won't look for StringUtils as part of recognizing it as sanitized, so the simple working example below is similar without using StringUtils.
Your second attempt won't work because it is a validator that uses control flow to prevent a bad input when it throws an exception. Checkmarx analyzes data flows. When filename is passed as a parameter to sanitizePathValue and returned as-is at the end, the data flow analysis sees this as not making a change to the original value.
There also appear to be some customizations in your system that recognize System.getProperty and System.getenv as untrusted inputs. By default, these are not recognized in this way, so anyone else scanning your code probably would not have gotten any results for Absolute Path Traversal. It is possible that the risk profile of your application requires that you treat properties and environment variables as untrusted inputs, so you can't really just remove these customizations and revert to the OOTB settings.
As Roman had mentioned, the logic in the query does look for values that are prepended to this untrusted input to remove those data flows as results. The below code shows how this could be done using Roman's method to trick the scanner. (I highly suggest you do not choose the route to trick the scanner.....very bad idea.) There could be other string literal values that would work using this method, but it would require some actions that control how the runtime is executed (like using chroot) to make sure it actually fixed the issue.
If you scan the code below, you should see only one vulnerable data path. The last example is likely something along the lines of what you could use to remediate the issues. It really depends on what you're trying to do with the file being created.
(I tested this on 9.2; it should work for prior versions. If it doesn't work, post your version and I can look into that version's query.)
// Vulnerable
String fn1 = System.getProperty ("test");
File f1 = new File(fn1);
// Path prepend - still vulnerable, tricks the scanner, DO NOT USE
String fn2 = System.getProperty ("test");
File f2 = new File(Paths.get ("", fn2).toString () );
// Path prepend - still vulnerable, tricks the scanner, DO NOT USE
String fn3 = System.getProperty ("test");
File f3 = new File("" + fn3);
// Path prepend - still vulnerable, tricks the scanner, DO NOT USE
String fn4 = System.getProperty ("test");
File f4 = new File("", fn4);
// Sanitized by stripping path separator as defined in the JDK
// This would be the safest method
// (Pattern.quote keeps "\", the Windows separator, from being read as a broken regex)
String fn5 = System.getProperty ("test");
File f5 = new File(fn5.replaceAll (java.util.regex.Pattern.quote (File.separator), ""));
So, in summary (TL;DR), replace the file separator in the untrusted input value:
String fn5 = System.getProperty ("test");
File f5 = new File(fn5.replaceAll (java.util.regex.Pattern.quote (File.separator), ""));
Edit
Updating for other Checkmarx users that may come across this in search of an answer.
After my answer, OP updated the question to reveal that the issue being found was due to a mechanism written for the code to run in different environments. Pre-docker, this would have been the method to use. The vulnerability would have still been detected but most courses of action would have been to say "our deployment environment has security measures around it to prevent a bad actor from injecting an undesired path into the environment variable where we store our base path."
But now, with Docker, this is a thing of the past. Generally the point of Docker is to create applications that run the same way everywhere they are deployed. Using a base path in an environment variable likely means OP is executing the code outside of a container for development (based on the update showing a Windows path) and inside the container for deployment. Why not just run the code in the container for development as well as deployment, as Docker intends?
Most of the answers tend to explain that OP should use a static path. This is because they are realizing that there is no way to avoid this issue because taking an untrusted input (from the environment) and prefixing it to a path is the exact problem of Absolute Path Traversal.
OP could follow the good advice of many posters here and put a static base path in the code then use Docker volumes or Docker bind mounts.
Is it difficult? Nope. If I were OP, I'd fix the base path prefix in code to a static value of /app/data and do a simple volume binding during development. (When you think about it, if there is storage of data in the container during a deployment then the deployment environment must be doing this exact thing for /app/data unless the data is not kept after the lifetime of the container.)
With the base path fixed at /app/data, one option for OP to run their development build is:
docker run -it -v"C:\\projects\\app\\data":/app/data {container name goes here}
All data written by the application would appear in C:\projects\app\data the same way it does when using the environment variables. The main difference is that there are no environment-variable-prefixed paths and thus no Absolute Path Traversal results from the static analysis scanner.

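It depends on how Checkmarx comes to this point. Most likely the value that is handed to File is still tainted. So make sure both /../ and its URL-encoded form are replaced by /. (The encoded form of . is %2E, so the sequence to look for is /%2E%2E/; the /%46%46/ from the question decodes to /FF/.)
checkedInput = userInput.replaceAll("/\\.\\./", "/"); // dots escaped: an unescaped "." matches any character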
Secondly, give File a parent directory to start with and later compare the path of the file you want to process. Some common example code is below. If the file doesn't start with the full parent directory, then it means you have a path traversal.
File file = new File(BASE_DIRECTORY, userInput);
if (file.getCanonicalPath().startsWith(BASE_DIRECTORY)) {
// process file
}
Checkmarx can only check if variables contain a tainted value and in some cases if the logic is correct. Please also think about the running process and file system permissions. A lot of applications have the capability of overwriting their own executables.
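The parent-directory check can also be written with java.nio.file.Path, which compares whole path elements, so a sibling such as /app/data-evil is not mistaken for a child of /app/data the way a plain string prefix test would allow. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and it does not resolve symbolic links (that would need toRealPath on files that already exist). Whether a given scanner treats it as a sanitizer is a separate question.

```java
import java.nio.file.Path;

/** Hypothetical helper illustrating a base-directory containment check. */
class SafePaths {

    /** Resolves userInput under baseDir and rejects any result that lands outside baseDir. */
    static Path resolveInside(Path baseDir, String userInput) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the comparison
        Path candidate = base.resolve(userInput).normalize();
        // Path.startsWith compares name elements, not raw characters
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userInput);
        }
        return candidate;
    }
}
```

An absolute userInput replaces the base entirely when resolved, so it fails the same startsWith test and is rejected too.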

If there is one thing to remember it is this
use allow lists not deny lists
(traditionally known as whitelists and blacklists).
For instance, consider replacing /../ with /, as suggested in another answer. My response is an input containing the sequence /../../. You could pursue this iteratively, and I might run out of adversarial examples, but that doesn't mean there aren't any.
Another problem is knowing all the special characters. \0 used to truncate the file name. What happens to non-ASCII characters? I can't remember. Other code might be changed in future so that the path ends up on a command line with other special characters, which, worse, depend on the OS and shell.
Canonicalisation has its problems too. It can be used, to some extent, to probe the file system (and perhaps beyond the machine).
So, choose what you allow. Say
if (filename.matches("[a-zA-Z0-9_]+")) {
return filename;
} else {
throw new MyException(...);
}
(No need to go through the whole Pattern/Matcher palaver in this situation.)

For this issue I would suggest that you hard-code the absolute path of the directory that your program is allowed to work in, like this:
String separator = FileSystems.getDefault().getSeparator();
// should resolve to /app/workdir in linux
String WORKING_DIR = separator + "app"+separator +"workdir"+separator ;
then when you accept the parameter treat it as a relative path like this:
String filename = System.getProperty("test");
sanitize(filename);
filename = WORKING_DIR+filename;
File dictionaryFile = new File(filename);
To sanitize the user's input, make sure it contains neither .. nor \ nor /:
private static void sanitize(String filename) {
    // "\\\\" is the regex for one literal backslash
    if (Pattern.compile("\\.\\.|\\\\|/").matcher(filename).find()) {
        throw new RuntimeException("filename:'" + filename + "' is bad.");
    }
}
Edit
If you are running the process on Linux, you can also change the root directory of the process using chroot; search for how to set that up for your deployment.

How about using Java's Path to make the check ("../test1.txt" is the input from the user):
File base = new File("/your/base");
Path basePath = base.toPath();
// normalize() collapses inputs such as "a/../../x" before the comparison
Path resolve = basePath.resolve("../test1.txt").normalize();
Path relativize = basePath.relativize(resolve);
if (relativize.startsWith("..")) {
    throw new Exception("invalid path");
}

Based on reading the Checkmarx query for the absolute path traversal vulnerability (and I believe this is a common mitigation approach in general), the fix is to prepend a hard-coded path so that attackers cannot traverse the file system:
File has a constructor that accepts a parent path as its first parameter, which lets you do this prepending:
String filename = System.getenv("test");
File dictionaryFile = new File("/home/", filename);
UPDATE:
The validateAndNormalizePath would have technically sufficed, but I believe Checkmarx is unable to recognize it as a sanitizer (being a custom-written function). I would advise working with your App Security team so that they use CxAudit to override the base Stored Path Traversal Checkmarx query to recognize validateAndNormalizePath as a valid sanitizer.

How to prevent file wipe if an error occurs while writing to it?

This is an issue I have had in many applications.
I want to change the information inside a file, which has an outdated version.
In this instance, I am updating the file that records playlists after adding a song to a playlist. (For reference, I am creating an app for android.)
The problem is if I run this code:
FileOutputStream output = new FileOutputStream(file);
output.write(data.getBytes());
output.close();
And if an IOException occurs while trying to write to the file, the data is lost (since creating an instance of FileOutputStream empties the file). Is there a better method to do this, so if an IOException occurs, the old data remains intact? Or does this error only occur when the file is read-only, so I just need to check for that?
My only "work around" is to inform the user of the error and give them the correct data, which they then have to update manually. While this might work for a developer, there are a lot of issues that could occur if this happens. Additionally, in this case, the user doesn't have permission to edit the file themselves, so the "work around" doesn't work at all.
Sorry if someone else has asked this. I couldn't find a result when searching.
Thanks in advance!

One way you could ensure that you do not wipe the file is by creating a new file with a different name first. If writing that file succeeds, you could delete the old file and rename the new one.
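A minimal sketch of that write-then-rename idea, using java.nio.file. The class and method names are hypothetical. It assumes java.nio.file is available (on Android that means API level 26 or later) and that the temporary file can be created in the same directory as the target, so the final move stays on one file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: the old file is only replaced after the new data is fully written. */
class SafeWriter {

    static void writeSafely(Path target, String data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Write to a temporary file next to the target; the target is untouched so far
        Path temp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(temp, data.getBytes(StandardCharsets.UTF_8));
            try {
                // Swap the new file into place in one step where the file system allows it
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a plain replace if an atomic move is not supported
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up if the write or move failed
            Files.deleteIfExists(temp);
        }
    }
}
```

If the write throws an IOException, the exception propagates, the temporary file is removed, and the previous contents of the target remain intact.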
There is the possibility that renaming fails. To be completely safe from that, your files could be named according to the time at which they are created. For instance, if your file is named save.dat, you could add the time at which the file was saved (from System.currentTimeMillis()) to the end of the file's name. Then, no matter what happens later (including failure to delete the old file or rename the new one), you can recover the most recent successful save. I have included a sample implementation below which represents the time as a 16-digit zero-padded hexadecimal number appended to the file extension. A file named save.dat will be instead saved as save.dat00000171ed431353 or something similar.
// name includes the file extension (i.e. "save.dat").
static File fileToSave(File directory, String name) {
return new File(directory, name + String.format("%016x", System.currentTimeMillis()));
}
// Returns the newest version, or null if none exists. Return the entire sorted array instead if you need older versions for which deletion failed; this could be useful for purging unnecessary older versions, for instance.
// Needs java.io.File, java.util.Arrays and java.util.Comparator.
static File fileToLoad(File directory, String name) {
    File[] files = directory.listFiles((dir, n) -> n.startsWith(name));
    if (files == null || files.length == 0) {
        return null; // directory unreadable or nothing saved yet
    }
    Arrays.sort(files, Comparator.comparingLong((File file) -> Long.parseLong(file.getName().substring(name.length()), 16)).reversed());
    return files[0];
}

Multiple file reading loop and distinguishing between .pdf and .doc files

I am writing a Java program in Eclipse to scan resumes for keywords and pick the most suitable resume among them, as well as showing the keywords found in each resume. The resumes can be in doc or pdf format.
I've successfully implemented a program that reads pdf files and doc files separately (using Apache's PDFBox and POI jar packages and importing the libraries for the required methods), displays the keywords, and shows resume strength in terms of the number of keywords found.
Now there are two issues I am stuck on:
(1) I need to distinguish between a pdf file and a doc file within the program. That is easily done with an if statement, but I am unsure how to write the code that detects whether a file has a .pdf or .doc extension. (I intend to build an application for selecting the resumes, and the program then has to decide whether to run the doc reading block or the pdf reading block.)
(2) I intend to run the program on a list of resumes, so I need a loop that runs the keyword scanning for each resume. I can't think of a way to do this: even if the files were named 'resume1', 'resume2', etc., I can't put the loop variable into the file location like 'C:/Resumes_Folder/Resume[i]', because that is the path.
Any help would be appreciated!

You can use a FileFilter to read only one type or the other, then respond accordingly. It'll give you an array containing only files of the desired type.
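As a rough illustration of the FileFilter approach, the sketch below lists only the .pdf and .doc files in a folder and exposes the extension so the caller can branch on it. The class and method names are hypothetical, and matching is done on the file name only, not on the file contents.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

/** Hypothetical helper: finds resume files by extension. */
class ResumeFiles {

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Lists the regular files in folder whose extension is pdf or doc. */
    static File[] listResumes(File folder) {
        FileFilter filter = f -> {
            String ext = extensionOf(f.getName());
            return f.isFile() && (ext.equals("pdf") || ext.equals("doc"));
        };
        File[] found = folder.listFiles(filter);
        // listFiles returns null if folder is not a readable directory
        return found == null ? new File[0] : found;
    }
}
```

Looping over the returned array with a for-each loop covers the second question as well: the loop variable is the File itself, so no numbered file names are needed.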
The second requirement is confusing to me. I think you would be well served by creating a class that encapsulates the data and behavior that you want for a parsed Resume. Write a factory class that takes in an InputStream and produces a Resume with the data you need inside.
You are making a classic mistake: You are embedding all the logic in a main method. This will make it harder to test your code.
All problem solving consists of breaking big problems into smaller ones, solving the small problems, and assembling them to finally solve the big problem.
I would recommend that you decompose this problem into smaller classes. For example, don't worry about looping over a directory's worth of files until you can read and parse an individual PDF and DOC file.
Create an interface:
public interface ResumeParser {
Resume parse(InputStream is) throws IOException;
}
Write separate implementations for PDF and Word documents.
Create a factory to give you the appropriate ResumeParser based on file type:
public class ResumeParserFactory {
    public ResumeParser create(String fileType) {
        if (fileType.contains(".pdf")) {
            return new PdfResumeParser();
        } else if (fileType.contains(".doc")) {
            return new WordResumeParser();
        } else {
            throw new IllegalArgumentException("Unknown document type: " + fileType);
        }
    }
}
Be sure to write unit tests as you go. You should know how to use JUnit.

Another alternative to using a FileFilter is a DirectoryStream, because Files::newDirectoryStream makes it easy to specify the relevant file endings:
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{doc,pdf}")) {
    for (Path entry : stream) {
        // process files here
    }
} catch (DirectoryIteratorException ex) {
    // I/O error encountered during the iteration; the cause is an IOException
    throw ex.getCause();
}

You can do something basic like:
// Put the path to the folder containing all the resumes here
File f = new File("C:\\");
ArrayList<String> names = new ArrayList<>
(Arrays.asList(Objects.requireNonNull(f.list())));
for (String fileName : names) {
if (fileName.length() > 3) {
String type = fileName.substring(fileName.length() - 3);
if (type.equalsIgnoreCase("doc")) {
// doc file logic here
} else if (type.equalsIgnoreCase("pdf")) {
// pdf file logic here
}
}
}
But as DuffyMo's answer says, you can also use a FileFilter (it's definitely a better option than my quick code).
Hope it helps.

Java - Compare InputStreams of two identical files

I am creating a JUnit test that compares a newly created file with a benchmark file that is located in the resources folder inside the src folder in Eclipse.
Code
public class CompareFileTest
{
private static final String TEST_FILENAME = "/resources/CompareFile_Test_Output.xls";
@Test
public void testCompare()
{
InputStream outputFileInputStream = null;
BufferedInputStream bufferedInputStream = null;
File excelOne = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Input1.xls");
File excelTwo = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Input1.xls");
File excelThree = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Output.xls");
CompareFile compareFile = new CompareFile(excelOne, excelTwo, excelThree);
// The result of the comparison is stored in the excelThree file
compareFile.compare();
try
{
outputFileInputStream = new FileInputStream(excelThree);
bufferedInputStream = new BufferedInputStream(outputFileInputStream);
assertTrue(IOUtils.contentEquals(CompareFileTest.class.getResourceAsStream(TEST_FILENAME), bufferedInputStream));
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
However, I get an Assertion Error message, without any details. Since I just created the benchmark file from the compare file operation, both files should be identical.
Thanks in advance!
EDIT: After slim's comments, I used a file diff tool and found that both files are different, although, since they are copies, I am not sure how that happened. Maybe there is a timestamp or something?

IOUtils.contentEquals() does not claim to give you any more information than a boolean "matches or does not match", so you cannot hope to get extra information from that.
If your aim is just to get to the bottom of why these two files are different, you might step away from Java and use other tools to compare the files. For example https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
If your aim is for your jUnit tests to give you more information when the files do not match (for example, the exception could say Expected files to match, but byte 5678 differs [0xAE] vs [0xAF]), you will need to use something other than IOUtils.contentEquals() -- by rolling your own, or by hunting for something appropriate in Comparing text files w/ Junit
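As one way of rolling your own, the sketch below reads both streams in step and reports the offset of the first byte that differs, which a test can then put into its failure message. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: locates the first difference between two streams. */
class StreamDiff {

    /** Returns the offset of the first differing byte, or -1 if both streams have the same content. */
    static long firstMismatch(InputStream expected, InputStream actual) throws IOException {
        InputStream a = new BufferedInputStream(expected);
        InputStream b = new BufferedInputStream(actual);
        long offset = 0;
        while (true) {
            int x = a.read();
            int y = b.read();
            if (x != y) {
                return offset; // also covers one stream ending before the other
            }
            if (x == -1) {
                return -1; // both streams ended at the same point
            }
            offset++;
        }
    }
}
```

In a test, something like assertEquals(-1, StreamDiff.firstMismatch(benchmark, output), "files differ at this byte offset") then shows where the two files diverge instead of a bare AssertionError.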

I had a similar issue.
I was using the JUnit assertion class Assertions, and it seemed to compare memory addresses rather than the actual file contents.
Instead of comparing the InputStream objects, I converted them to byte arrays and compared those. I'm not an absolute specialist, but I dare to claim that if the byte arrays are identical, then the underlying InputStreams and their files are very likely equal.
Like this (note assertArrayEquals: assertEquals on two arrays would again compare references):
Assertions.assertArrayEquals(
        this.getClass().getResourceAsStream("some_image_or_other_file.ext").readAllBytes(),
        someObject.getSomeObjectInputStream().readAllBytes());
I'm not sure this will scale to larger files. It is certainly not suitable for complex diffs, but it does the trick for an assertion.

Read metadata with ExifTool

I'm trying to read metadata values from Illustrator files using ExifTool. This is what I tried:
File[] images = new File("filepath").listFiles();
ExifTool tool = new ExifTool(Feature.STAY_OPEN);
for(File f : images) {
if (f.toString().contains(".ai"))
{
System.out.println("test "+tool.getImageMeta(f, Tag.DATE_TIME_ORIGINAL));
}
}
tool.close();
The above code does not print any value. I also tried this:
public static final File[] IMAGES = new File("filepath").listFiles();
ExifTool tool = new ExifTool(Feature.STAY_OPEN);
for (File f : IMAGES) {
System.out.println("\n[" + f.getName() + "]");
System.out.println(tool.getImageMeta(f, Format.NUMERIC,
Tag.values()));
}
This only prints {IMAGE_HEIGHT=2245, IMAGE_WIDTH=5393}. How do I read metadata values using ExifTool? Any advice and reference links are highly appreciated.

For the given API, one of the following applies:
1- the API does not contain the tag you are looking for
2- the file itself might not have that tag filled in
3- you might need to write your own wrapper that uses a more general tag command when calling exiftool.exe
Look in the source code and find the enum containing all the tags available to the API, that'll show you what you're restricted to. But yeah, you might want to consider making your own class similar to the one you're using. I'm in the midst of doing the same. That way you can store the tags in perhaps a set or HashMap instead of an enum and therefore be much less limited in tag choice. Then, all you have to do is write the commands for the tags you want to the process's OutputStream and then read the results from the InputStream.
