How to test methods that download big files - java

I often have methods responsible for downloading large files to a local directory.
It bothers me that I don't really know how to test them properly.
Should I write a test case that downloads these files to a temporary directory, using a @Rule in JUnit? Or should it download them to the same local directory used in production?
Importantly, such a method takes a long time to download a 1GB+ file, so is it bad that a test case would take that long?
What would an ideal test case for this look like?
public File downloadFile(File dir, URL url) {
    // the URL points to a file of 1GB or more
    // the method takes a long time
    return file; // the downloaded file, located in dir
}

My belief here is that you are approaching the problem in the wrong way altogether.
You want to test "whether the method works". But this method depends heavily on "side effects", which here means failures that can occur at any of these steps:
connection failure: the given URL cannot be accessed (for whatever reason);
network failure: the connection is cut while the transfer is in progress;
file system failure: the specified file cannot be created/opened in write mode, or a write fails while you are downloading the contents.
In short: it is impossible to test whether the method "works". What is more, given the number of possible failures listed above, your method should at least throw an exception so that its caller can deal with them.
Finally, this is 2016; it is assumed below that you use Java 7 or later. Here is how you could rewrite your method:
public Path downloadFile(final Path dir, final URL url)
    throws IOException
{
    final String filename = /* decide about the filename here */;
    final Path ret = dir.resolve(filename);
    try (
        final InputStream in = url.openStream();
    ) {
        Files.copy(in, ret);
    }
    return ret;
}
Now, in your tests, just mock the behavior of that method; make it fail, make it return a valid path, make the URL fail on .openStream()... You can simulate the behavior of that method whichever way you want so that you can test how the callers of this method behave.
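For illustration, here is a minimal sketch of such a caller-side test using Mockito and JUnit 4; the Downloader interface and the ReportFetcher class are hypothetical stand-ins for however the download method is wrapped in your own code:
import static org.junit.Assert.assertFalse;
import static org.mockito.Mockito.*;

import java.io.IOException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.junit.Test;

public class ReportFetcherTest {

    // Hypothetical interface wrapping the download method
    public interface Downloader {
        Path downloadFile(Path dir, URL url) throws IOException;
    }

    // Hypothetical caller under test: it only cares about success or failure
    public static class ReportFetcher {
        private final Downloader downloader;
        public ReportFetcher(Downloader downloader) { this.downloader = downloader; }
        public boolean fetch(Path dir, URL url) {
            try {
                downloader.downloadFile(dir, url);
                return true;
            } catch (IOException e) {
                return false;
            }
        }
    }

    @Test
    public void fetchReportsFailureWhenDownloadFails() throws Exception {
        Downloader downloader = mock(Downloader.class);
        when(downloader.downloadFile(any(Path.class), any(URL.class)))
            .thenThrow(new IOException("simulated network failure"));

        ReportFetcher fetcher = new ReportFetcher(downloader);
        assertFalse(fetcher.fetch(Paths.get("/tmp"), new URL("http://example.com/big.bin")));
    }
}
The 1GB download never happens here; the test only checks how the caller reacts when the download succeeds or fails.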
But such a method is, in itself, so dependent on "side effects" that it cannot be tested reliably.

Related

Maven test failing as file modified timestamps get updated

I have a Java project with some test files in the following location:
src\test\resources\data\file\daily
I have some JUnit test cases that check and assert based on the file modified time.
FileTime modFileTime = Files.getLastModifiedTime(Paths.get(classPathResource.getFile().getPath()));
When I execute the test cases from IntelliJ without Maven, my tests pass and modFileTime holds a time from the past, e.g. 16/04/21 19:48.
However, my test cases fail when I run them using mvn clean test, because the modified timestamps of the files in the target\test-classes\data\file\daily directory get updated.
How can I preserve the original file modified timestamps? Or is there a common solution for this?
The method being tested:
private boolean isFileAvailable(String file) throws IOException {
    ClassPathResource classPathResource = new ClassPathResource(file);
    boolean exists = Files.exists(Paths.get(classPathResource.getFile().getPath()));
    if (exists) {
        FileTime modFileTime = Files.getLastModifiedTime(Paths.get(classPathResource.getFile().getPath()));
        long modFileMinutes = modFileTime.to(TimeUnit.MINUTES);
        long minutes = FileTime.from(Instant.now()).to(TimeUnit.MINUTES);
        return minutes - modFileMinutes >= 5;
    } else {
        return false;
    }
}
mvn clean gets rid of everything in your target/ directory before the tests run, and the build repopulates it. Hence, the timestamps will change on every run. The same will be true for a fresh (clean) checkout of the project, which you should be doing before any release build, so this is a pretty normal thing to happen.
However, I agree with all the comments: your test (not posted) doesn't make a lot of sense. If you want your test to check a file against a relative timestamp, then, for example, set the timestamp on the file to 4 minutes ago and confirm it is not loaded, then set it to 6 minutes ago and confirm it is loaded. You can set the last-modified value on your test file from within the test. This is much more reliable than relying on something in the test execution system (Maven) itself, especially if you generate the test file as part of the test (a good idea).
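A minimal sketch of that idea, assuming JUnit 4 and its TemporaryFolder rule; olderThanFiveMinutes is a hypothetical stand-in for the age check inside isFileAvailable:
import static org.junit.Assert.*;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class FileAgeTest {

    @Rule
    public TemporaryFolder tmp = new TemporaryFolder();

    private boolean olderThanFiveMinutes(File file) throws Exception {
        FileTime modified = Files.getLastModifiedTime(file.toPath());
        return Instant.now().minus(5, ChronoUnit.MINUTES).isAfter(modified.toInstant());
    }

    @Test
    public void fileModifiedSixMinutesAgoIsAvailable() throws Exception {
        File data = tmp.newFile("daily.csv");
        // set the timestamp explicitly instead of relying on when Maven copied the file
        Files.setLastModifiedTime(data.toPath(),
            FileTime.from(Instant.now().minus(6, ChronoUnit.MINUTES)));
        assertTrue(olderThanFiveMinutes(data));
    }

    @Test
    public void fileModifiedFourMinutesAgoIsNotAvailable() throws Exception {
        File data = tmp.newFile("daily.csv");
        Files.setLastModifiedTime(data.toPath(),
            FileTime.from(Instant.now().minus(4, ChronoUnit.MINUTES)));
        assertFalse(olderThanFiveMinutes(data));
    }
}
Because the file and its timestamp are created inside the test, the result no longer depends on what mvn clean did to target/test-classes.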
Also: if you only want to load data files older than a certain age, I doubt you really want them to be classpath resources. They should probably be loaded from some other known location. I suspect you are trying to solve with cleverness a problem that would be better solved by something from, e.g., https://commons.apache.org/

JUnit an application that reads a file and processes it

I am working on a Java application that reads a file into memory and then does further processing on it.
The requirement is that the file must be read from the 'current working directory'.
I have written a method as follows:
public List<String> processFile(String fileName) {
    String localPath = FileSystems.getDefault().getPath(".").toAbsolutePath() + fileName;
    // ... read the file at localPath into a List<String> and return it
}
This method reads the file into an ArrayList, which it returns.
Further processing then needs to be done using this list.
public boolean workOnFile() {
    List<String> records = processFile("abc.txt");
    // additional logic
}
I am stumped on how to unit test the file-reading part, since the requirement is that the file is read from the 'working directory': wherever the user runs the program, the input file is read from the working directory.
However, in the case of JUnit, my test files would be in '\src\main\resources'.
As a result, the test files would not be read by the processFile method, since it looks for files in the 'current working directory'.
One thought is that I need not unit test the file reading itself, since the real work happens after the file is read. So is there a 'testing' provision where I read the file inside the JUnit test and then inject the resulting test list into my class under test?
@Test
public void doSomeValidation() {
    String testFile = "XYZ.txt";
    ClassUnderTest fixture = new ClassUnderTest();
    List<String> testList = /** read file in JUnit from /src/main/resources/ **/
    /** inject this testList into ClassUnderTest **/
    fixture.setFileContent(testList);
    /** then continue testing the actual method that needs to be tested **/
    assertTrue(fixture.workOnFile());
}
To achieve this I would have to change the actual class under test so that the file contents can be injected. Something along these lines:
public class ClassUnderTest {

    public List<String> processFile(String fileName) {
        String localPath = FileSystems.getDefault().getPath(".").toAbsolutePath() + fileName;
        // ... read the file at localPath and return its lines
    }

    /** new method used in JUnit to inject the test data **/
    public void setFileContent(List<String> input) {
        this.input = input;
    }

    /** modified method: first check whether the injected list is non-null **/
    public boolean workOnFile() {
        List<String> records;
        if (this.input == null) {
            /** in actual runs this block will execute **/
            this.input = processFile("abc.txt");
        }
        // additional logic
    }
}
Is this the right way?
I somehow feel I am messing with the code just to make it more testable.
Is this even the right approach?
A simple solution: change your interfaces to be easy to test.
Meaning:
have one method that puts together a file name "in the local path" (the same way your processFile() method builds that file name);
then pass the result of that operation to your processFile() method.
In other words: your current code forces that method to always compute the full path itself, which makes it really hard to control, and thus to test.
Thus: dissect your problem into the smallest possible pieces.
Then you only need to test:
that your new method Path getLocalPathFor(String fileName) does what it is supposed to do
and then, that your method processFile(Path absFilePath) does what it needs to do (and now you can test that method with a path that sits anywhere, not just in the local directory).
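For illustration, a minimal sketch of that split; the class name FileProcessor is a hypothetical stand-in for your own class:
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FileProcessor {

    /** builds the path for a file name relative to the current working directory */
    public Path getLocalPathFor(String fileName) {
        return FileSystems.getDefault().getPath(".").toAbsolutePath().resolve(fileName);
    }

    /** reads the file at the given path; the path may point anywhere */
    public List<String> processFile(Path absFilePath) throws IOException {
        return Files.readAllLines(absFilePath);
    }
}
In a test, processFile can now be handed a path under src/test/resources or a temporary directory, while getLocalPathFor can be verified separately against the expected working-directory path.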

Writing to $HOME from a jar file

I'm trying to write to a file located in my $HOME directory. The code to write to that file has been packaged into a jar file. When I run the unit tests to package the jar file, everything works as expected - namely the file is populated and can be read from again.
When I try to run this code from another application, where the jar file is contained in the lib directory, it fails. The file is created, but it is never written to. When the app goes to read the file, parsing fails because the file is empty.
Here is the code that writes to the file:
logger.warn("TestNet wallet does not exist creating one now in the directory: " + walletPath)
testNetFileName.createNewFile()
logger.warn("Wallet file name: " + testNetFileName.getAbsolutePath)
logger.warn("Can write: "+ testNetFileName.canWrite())
logger.warn("Can read: " + testNetFileName.canRead)
val w = Wallet.fromWatchingKey(TestNet3Params.get(), testNetSeed)
w.autosaveToFile(testNetFileName, savingInterval, TimeUnit.MILLISECONDS, null)
w
}
Here is the relevant log output from the above method:
2015-12-30 15:11:46,416 - [WARN] - from class com.suredbits.core.wallet.ColdStorageWallet$ in play-akka.actor.default-dispatcher-9
TestNet wallet exists, reading in the one from disk
2015-12-30 15:11:46,416 - [WARN] - from class com.suredbits.core.wallet.ColdStorageWallet$ in play-akka.actor.default-dispatcher-9
Wallet file name: /home/chris/testnet-cold-storage.wallet
then it bombs.
Here is the definition of autosaveToFile:
public WalletFiles autosaveToFile(File f, long delayTime, TimeUnit timeUnit,
        @Nullable WalletFiles.Listener eventListener) {
    lock.lock();
    try {
        checkState(vFileManager == null, "Already auto saving this wallet.");
        WalletFiles manager = new WalletFiles(this, f, delayTime, timeUnit);
        if (eventListener != null)
            manager.setListener(eventListener);
        vFileManager = manager;
        return manager;
    } finally {
        lock.unlock();
    }
}
and the definition for WalletFiles
https://github.com/bitcoinj/bitcoinj/blob/master/core/src/main/java/org/bitcoinj/wallet/WalletFiles.java#L68
public WalletFiles(final Wallet wallet, File file, long delay, TimeUnit delayTimeUnit) {
    // An executor that starts up threads when needed and shuts them down later.
    this.executor = new ScheduledThreadPoolExecutor(1, new ContextPropagatingThreadFactory("Wallet autosave thread", Thread.MIN_PRIORITY));
    this.executor.setKeepAliveTime(5, TimeUnit.SECONDS);
    this.executor.allowCoreThreadTimeOut(true);
    this.executor.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
    this.wallet = checkNotNull(wallet);
    // File must only be accessed from the auto-save executor from now on, to avoid simultaneous access.
    this.file = checkNotNull(file);
    this.savePending = new AtomicBoolean();
    this.delay = delay;
    this.delayTimeUnit = checkNotNull(delayTimeUnit);
    this.saver = new Callable<Void>() {
        @Override public Void call() throws Exception {
            // Runs in an auto save thread.
            if (!savePending.getAndSet(false)) {
                // Some other scheduled request already beat us to it.
                return null;
            }
            log.info("Background saving wallet, last seen block is {}/{}", wallet.getLastBlockSeenHeight(), wallet.getLastBlockSeenHash());
            saveNowInternal();
            return null;
        }
    };
}
I'm guessing it is some sort of permissions issue but I cannot seem to figure this out.
EDIT: This is all being run on the exact same Ubuntu 14.04 machine - no added complexity of different operating systems.
You cannot generally depend on the existence or writability of $HOME. There are really only two portable ways to identify (i.e. provide a path to) an external file.
Provide an explicit path using a property set on the invocation command line or provided in the environment, or
Provide the path in a configuration properties file whose location is itself provided as a property on the command line or in the environment.
The problem with using $HOME is that you cannot know what user ID the application is running under. The user may or may not even have a home directory, and even if the user does, the directory may or may not be writable. In your specific case, your process may have the ability to create a file (write access on the directory itself), but write access to the file itself may be restricted by the umask and/or ACLs (on Windows) or SELinux (on Linux).
Put another way, the installer/user of the library must explicitly provide a known writable path for your application to use.
Yet another way to think about it is that you are writing library code that may be used in completely unknown environments. You cannot assume ANYTHING about the external environment except what is in the explicit contract between you and the user. You can declare in your interface specification that $HOME must be writable, but that may be highly inconvenient for some users whose environment doesn't have $HOME writable.
A much better and portable solution is to say
specify -Dcom.xyz.workdir=[path] on the command line to indicate the work path to be used
or
The xyz library will look for its work directory in the path specified by the XYZ_WORK environment variable
Ideally, you do BOTH of these to give the user some flexibility.
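A minimal sketch of that lookup order, reusing the com.xyz.workdir property and the XYZ_WORK environment variable from the examples above (both names are placeholders for whatever you choose):
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class WorkDir {

    /** resolves the work directory: system property first, then environment variable */
    public static Path resolve() {
        String dir = System.getProperty("com.xyz.workdir");
        if (dir == null || dir.isEmpty()) {
            dir = System.getenv("XYZ_WORK");
        }
        if (dir == null || dir.isEmpty()) {
            throw new IllegalStateException(
                "No work directory configured: set -Dcom.xyz.workdir=... or XYZ_WORK");
        }
        Path path = Paths.get(dir);
        if (!Files.isDirectory(path) || !Files.isWritable(path)) {
            throw new IllegalStateException("Work directory is not a writable directory: " + path);
        }
        return path;
    }
}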
savePending is always false. At the beginning of call() you check that it is false and return null, so the actual save code is never executed. I am guessing you meant to check whether it was true there, and also to set it to true rather than false. You then also need to reset it back to false at the end.
Now, why this works in your unit test is a different story. The test must be executing different code.

Monitoring directory for changes from web service

I don't know if it is clear from the title, so I'll explain in more detail.
First of all, the limitation: IBM Java 1.5.
This is the situation:
I have a Spring web service that receives a request with a PDF document in it. I need to put this PDF into some input directory that an AFP application (not important here) monitors. The AFP application takes the PDF, does something with it and writes the result to some output directory that I need to monitor. Monitoring the output directory may take some time, probably up to 30 seconds, and I know the exact file name that I expect to appear there. If nothing appears within 30 seconds, I would return some fault response.
Because of my poor knowledge of web services and multithreading, I don't know what problems I could run into.
Also, searching the internet I see that most people recommend WatchService for directory monitoring, but that was introduced in Java 7.
Any suggestion, link or idea would be helpful.
So, the scenario is simple. In a main method, the following actions are done in order:
call the AFP service;
poll the directory for the output file;
deal with the output file.
We suppose here that outputFile is a File containing the absolute path to the generated file; this method returns void, so adapt as needed:
// We poll every second, so...
private static final int SAMPLES = 30;

public void dealWithAFP(whatever, arguments, are, there)
    throws WhateverIsNecessary
{
    callAfpService(here);
    boolean found = false;
    try {
        for (int i = 0; i < SAMPLES; i++) {
            TimeUnit.SECONDS.sleep(1);
            if (outputFile.exists()) {
                found = true;
                break;
            }
        }
    } catch (InterruptedException e) {
        // Rethrow it if the method declares it; otherwise the minimum is to:
        Thread.currentThread().interrupt();
        throw new WhateverIsNecessary();
    }
    if (!found)
        throw new WhateverIsNecessary();
    dealWithOutputFile(outputFile);
}

Any sure fire way to check file existence on Linux NFS? [duplicate]

I am working on a Java program that needs to check the existence of files.
Simple enough: the code calls File.exists() to check for file existence. The problem I have is that it reports false positives, i.e. the file does not actually exist but the exists() method returns true. No exception was caught (at least no exception like "Stale NFS handle"). The program even managed to read the file through an InputStream, getting 0 bytes, and still no exception. The target directory is on a Linux NFS mount, and I am 100% sure that the file being looked for never exists.
I know there are known bugs (or rather API limitations) for java.io.File.exists(), so I added a workaround: checking file existence with the Linux command ls. Instead of calling File.exists(), the Java code now runs ls on the target file. If the exit code is 0 the file exists; otherwise it does not.
The issue seems to occur less often with this trick, but it still pops up. Again, no error was captured anywhere (stdout this time). So the problem is serious enough that even a native Linux command does not fix it 100% of the time.
So there are a couple of questions here:
I believe Java's well-known issue with File.exists() is about false negatives, where a file is reported not to exist but in fact does. Since File.exists() does not throw IOException, it swallows the exception when calls to the OS's underlying native functions fail, e.g. on an NFS timeout. But that does not explain the false-positive case I am having, given that the file never exists. Any thoughts on this one?
My understanding of the Linux ls exit code is that 0 means OK, equivalent to the file existing. Is this understanding wrong? The man page of ls is not very clear about the meaning of the exit status: "Exit status is 0 if OK, 1 if minor problems, 2 if serious trouble."
All right, back to the subject: is there any surefire way to check file existence with Java on Linux, before JDK 7 with NIO.2 is officially released?
Here is a JUnit test that shows the problem, and some Java code that actually tries to read the file.
The problem happens e.g. when using Samba on OS X Mavericks. A possible reason is explained by this statement from
http://appleinsider.com/articles/13/06/11/apple-shifts-from-afp-file-sharing-to-smb2-in-os-x-109-mavericks:
It aggressively caches file and folder properties and uses opportunistic locking to enable better caching of data.
Please find below a checkExists method that actually attempts to read a few bytes, forcing a true file access to avoid the caching misbehaviour.
JUnit test:
/**
 * test the file exists function on a network drive; replace the test file name and ssh computer
 * with your actual environment
 * @throws Exception
 */
@Test
public void testFileExistsOnNetworkDrive() throws Exception {
    String testFileName = "/Volumes/bitplan/tmp/testFileExists.txt";
    File testFile = new File(testFileName);
    testFile.delete();
    for (int i = 0; i < 10; i++) {
        Thread.sleep(50);
        System.out.println("" + i + ":" + OCRJob.checkExists(testFile));
        switch (i) {
        case 3:
            // FileUtils.writeStringToFile(testFile, "here we go");
            Runtime.getRuntime().exec("/usr/bin/ssh phobos /usr/bin/touch " + testFileName);
            break;
        }
    }
}
checkExists source code:
/**
 * check if the given file exists
 * @param f the file to check
 * @return true if the file exists
 */
public static boolean checkExists(File f) {
    try {
        byte[] buffer = new byte[4];
        InputStream is = new FileInputStream(f);
        if (is.read(buffer) != buffer.length) {
            // do something
        }
        is.close();
        return true;
    } catch (java.io.IOException fnfe) {
        // opening or reading failed, so treat the file as non-existent
    }
    return false;
}
JDK7 was released a few months ago. There are exists and notExists methods in the Files class but they return a boolean rather than throwing an exception. If you really want an exception then use FileSystems.getDefault().provider().checkAccess(path) and it will throw an exception if the file does not exist.
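For illustration, a minimal sketch of that call; the path and the handling are placeholders:
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckAccessDemo {
    public static void main(String[] args) {
        Path path = Paths.get("/mnt/nfs/share/data.txt"); // hypothetical NFS path
        try {
            // throws NoSuchFileException (a subclass of IOException) if the file is missing,
            // and other IOExceptions for lower-level failures such as a stale NFS handle
            FileSystems.getDefault().provider().checkAccess(path);
            System.out.println("File exists");
        } catch (NoSuchFileException e) {
            System.out.println("File does not exist: " + path);
        } catch (IOException e) {
            System.out.println("Could not check file: " + e);
        }
    }
}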
If you need to be robust, try to read the file - and fail gracefully if the file is not there (or there is a permission or other problem). This applies to any other language than Java as well.
The only safe way to tell whether the file exists and you can read from it is to actually read data from the file, regardless of the file system, local or remote. The reason is a race condition that can occur right after checkAccess(path) succeeds: you check, then open the file, and find that it suddenly does not exist. Some other thread (or another remote client) may have removed it, or may have acquired an exclusive lock. So don't bother checking access; just try to read the file. Spending time running ls only widens the window for the race condition.
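A minimal sketch of that approach, with a placeholder path; any failure to open or read is treated as "not available" instead of trusting a prior existence check:
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class ReadInsteadOfCheck {

    /** returns the first bytes of the file, or null if it cannot be opened or read */
    static byte[] tryReadHeader(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[16];
            int n = in.read(buffer);
            return n < 0 ? new byte[0] : Arrays.copyOf(buffer, n);
        } catch (IOException e) {
            // missing file, stale NFS handle, permission problem... all end up here
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] header = tryReadHeader(Paths.get("/mnt/nfs/share/data.txt")); // hypothetical path
        System.out.println(header == null ? "not available" : "read " + header.length + " bytes");
    }
}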
