Java, TDD approach, I/O operations

I am practicing the TDD approach, but I have a problem: how do I test I/O operations? I have used JUnit so far, but I read that it shouldn't be used for tests that touch external sources (databases, files, ...), so what would be a better choice? Sorry for my bad English.

You can't test the internal working of those external sources, but you can check the results.
For example, writing to a file:
1. Start test
2. Store data you want to write in a variable
3. Write data to file
4. Read file
5. Check if data is the same as the one you stored
6. End test
Testing is about verifying end results, so it's not necessarily a bad thing that you "lose" sight of part of the process. Generally, you can assume that external sources (libraries, I/O, ...) are well tested.
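The steps above can be sketched roughly as follows. This is only an illustration: the class and method names (FileRoundTrip, writeThenRead) are made up for this example, and it assumes a temporary file is an acceptable stand-in for the real target file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not from the answer above): writes text to a
// temporary file, reads it back, and returns what was read.
final class FileRoundTrip {

    static String writeThenRead(String data) throws IOException {
        Path file = Files.createTempFile("roundtrip", ".txt");
        try {
            // Write data to file
            Files.write(file, data.getBytes(StandardCharsets.UTF_8));
            // Read file
            byte[] read = Files.readAllBytes(file);
            return new String(read, StandardCharsets.UTF_8);
        } finally {
            // Clean up so repeated test runs leave nothing behind
            Files.deleteIfExists(file);
        }
    }
}
```

In a JUnit test you would store the expected string in a variable and assert that writeThenRead(expected) returns an equal string.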

Change your API so that it is passed an InputStream and/or OutputStream, and have your JUnit code pass a ByteArrayInputStream and ByteArrayOutputStream, which you can easily set up and read from.
Of course, your production code would need to change too, but you can often achieve this through a simple refactoring: leave the API as-is, but have the public methods call the refactored method. For example:
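Change
public void read(File file) {
    // do something with contents of file
}
To
public void read(File file) throws IOException {
    // open the file here and close it once the stream-based method returns
    try (InputStream inputStream = new FileInputStream(file)) {
        read(inputStream);
    }
}

// test this method
public void read(InputStream inputStream) throws IOException {
    // do something with contents of inputStream
}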
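As a rough sketch of how that refactoring plays out in a test, here is a small made-up class (LineCounter and countLines are hypothetical names, chosen only for illustration) whose File-based method delegates to a stream-based one:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical class used only for illustration.
final class LineCounter {

    // Production entry point: the public API still takes a File.
    int countLines(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return countLines(in);
        }
    }

    // Test this method: it never touches the file system.
    int countLines(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }
}
```

A test can then feed it new ByteArrayInputStream("a\nb\nc".getBytes(StandardCharsets.UTF_8)) and check that the result is 3, with no file on disk involved.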

Related

Multiple file reading loop and distinguishing between .pdf and .doc files

I am writing a Java program in Eclipse that scans resumes for keywords and picks the most suitable resume among them, in addition to showing the keywords found in each resume. The resumes can be in doc or pdf format.
I've successfully implemented a program that reads pdf files and doc files separately (using Apache's PDFBox and POI jar packages and importing the libraries for the required methods), displays the keywords, and shows resume strength in terms of the number of keywords found.
Now there are two issues I am stuck on:
(1) I need to distinguish between a pdf file and a doc file within the program. That is easily done with an if statement, but I am confused about how to write the code that detects whether a file has a .pdf or .doc extension. (I intend to build an application to select the resumes, but then the program has to decide whether to run the doc file reading block or the pdf file reading block.)
(2) I intend to run the program on a list of resumes, for which I'll need a loop that runs the keyword scanning for each resume. I can't think of a way to do it, because even if the files were named 'resume1', 'resume2', etc., we can't put the loop variable into the file location like 'C:/Resumes_Folder/Resume[i]', as that is the path.
Any help would be appreciated!

You can use a FileFilter to read only one type or the other, then respond accordingly. File.listFiles(FileFilter) gives you an array containing only the files of the desired type.
The second requirement is confusing to me. I think you would be well served by creating a class that encapsulates the data and behavior that you want for a parsed Resume. Write a factory class that takes in an InputStream and produces a Resume with the data you need inside.
You are making a classic mistake: You are embedding all the logic in a main method. This will make it harder to test your code.
All problem solving consists of breaking big problems into smaller ones, solving the small problems, and assembling them to finally solve the big problem.
I would recommend that you decompose this problem into smaller classes. For example, don't worry about looping over a directory's worth of files until you can read and parse an individual PDF and DOC file.
Create an interface:
public interface ResumeParser {
    Resume parse(InputStream is) throws IOException;
}
Write separate implementations for PDF and Word documents.
Create a factory to give you the appropriate ResumeParser based on file type:
public class ResumeParserFactory {
    public ResumeParser create(String fileType) {
        if (fileType.contains(".pdf")) {
            return new PdfResumeParser();
        } else if (fileType.contains(".doc")) {
            return new WordResumeParser();
        } else {
            throw new IllegalArgumentException("Unknown document type: " + fileType);
        }
    }
}
Be sure to write unit tests as you go. You should know how to use JUnit.
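For question (1), the file type string passed to the factory has to come from somewhere. One possible sketch, assuming you have a bare file name (not a full path) and want the lower-cased extension including the dot; FileExtensions and extensionOf are hypothetical names, not part of any library:

```java
import java.util.Locale;

// Hypothetical helper: extracts the extension from a bare file name.
final class FileExtensions {

    // Returns the extension including the leading dot, lower-cased,
    // or an empty string if the name has no usable extension.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No dot, a leading dot only (".hidden"), or a trailing dot ("name.")
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot).toLowerCase(Locale.ROOT);
    }
}
```

This kind of pure string logic is easy to cover with unit tests before any file is read, which fits the advice above to build the solution from small pieces.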

Another alternative to using a FileFilter is a DirectoryStream, because Files::newDirectoryStream makes it easy to specify the relevant file endings:
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.{doc,pdf}")) {
    for (Path entry : stream) {
        // process files here
    }
} catch (DirectoryIteratorException ex) {
    // I/O error encountered during the iteration; the cause is an IOException
    throw ex.getCause();
}

You can do something basic like:
// Put the path to the folder containing all the resumes here
File f = new File("C:\\");
ArrayList<String> names = new ArrayList<>(
        Arrays.asList(Objects.requireNonNull(f.list())));
for (String fileName : names) {
    if (fileName.length() > 3) {
        String type = fileName.substring(fileName.length() - 3);
        if (type.equalsIgnoreCase("doc")) {
            // doc file logic here
        } else if (type.equalsIgnoreCase("pdf")) {
            // pdf file logic here
        }
    }
}
But as DuffyMo's answer says, you can also use a FileFilter (it's definitely a better option than my quick code).
Hope it helps.

How can I test a save method?

I have the following save method, but I don't know how to verify it. How can I verify it in JUnit?
public static void save(Spiel spielen, File file) {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
        out.writeObject(spielen);
        System.out.println("Speichern Erfolgreich");
        System.out.println();
    } catch (Exception e) {
        System.out.println("Fehler beim Speichern");
        System.out.println();
    }
}

One simple solution: don't pass in a File object; pass in a factory that creates an OutputStream for you instead.
At runtime, this could create a FileOutputStream. For testing, you could pass a different factory that creates, say, a ByteArrayOutputStream. Then your code writes to memory without knowing it.
You could then write another test that reads these bytes back.
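A minimal sketch of that idea, under a few assumptions: OutputStreamFactory and Saver are hypothetical names, the method takes any Serializable instead of the Spiel class from the question, and exceptions are propagated rather than printed so that a test can see failures.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical factory interface; the name is an assumption.
interface OutputStreamFactory {
    OutputStream open() throws IOException;
}

final class Saver {

    // Same idea as save(Spiel, File), but the destination comes from a factory.
    static void save(Serializable object, OutputStreamFactory factory) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(factory.open())) {
            out.writeObject(object);
        }
    }

    // Test helper: saves into memory, then deserializes the written bytes.
    static Object saveAndReload(Serializable object)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        save(object, () -> buffer);
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Production code could call Saver.save(spiel, () -> new FileOutputStream(file)), while a test asserts that saveAndReload(original) returns an object equal to the original.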

You can store a reference file with the expected output on disk and then compare the tested output against it. There are many ways to do that comparison, including some JUnit Addons (its FileAssert in particular), or you can just read both files into byte arrays and assert that they are equal.
Many other utilities exist, some listed on this answer: File comparator utilities
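The byte-array variant needs nothing beyond the standard library. A small sketch, where FileContents and sameContent are hypothetical names and both files are assumed to be small enough to fit in memory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for comparing a produced file with a reference file.
final class FileContents {

    // Reads both files fully and compares them byte by byte.
    static boolean sameContent(Path expected, Path actual) throws IOException {
        return Arrays.equals(Files.readAllBytes(expected), Files.readAllBytes(actual));
    }
}
```

In a JUnit test this becomes assertTrue(FileContents.sameContent(referenceFile, producedFile)).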

Java - Compare InputStreams of two identical files

I am creating a JUnit test that compares a newly created file with a benchmark file, which is in the resources folder under the src folder in Eclipse.
Code
public class CompareFileTest
{
    private static final String TEST_FILENAME = "/resources/CompareFile_Test_Output.xls";

    @Test
    public void testCompare()
    {
        InputStream outputFileInputStream = null;
        BufferedInputStream bufferedInputStream = null;
        File excelOne = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Input1.xls");
        File excelTwo = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Input1.xls");
        File excelThree = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Output.xls");
        CompareFile compareFile = new CompareFile(excelOne, excelTwo, excelThree);
        // The result of the comparison is stored in the excelThree file
        compareFile.compare();
        try
        {
            outputFileInputStream = new FileInputStream(excelThree);
            bufferedInputStream = new BufferedInputStream(outputFileInputStream);
            assertTrue(IOUtils.contentEquals(CompareFileTest.class.getResourceAsStream(TEST_FILENAME), bufferedInputStream));
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
However, I get an Assertion Error message, without any details. Since I just created the benchmark file from the compare file operation, both files should be identical.
Thanks in advance!
EDIT: After slim's comments, I used a file diff tool and found that both files are different, although, since they are copies, I am not sure how that happened. Maybe there is a timestamp or something?

IOUtils.contentEquals() does not claim to give you any more information than a boolean "matches or does not match", so you cannot hope to get extra information from that.
If your aim is just to get to the bottom of why these two files are different, you might step away from Java and use other tools to compare the files. For example https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
If your aim is for your JUnit tests to give you more information when the files do not match (for example, the exception could say Expected files to match, but byte 5678 differs [0xAE] vs [0xAF]), you will need to use something other than IOUtils.contentEquals(), either by rolling your own or by hunting for something appropriate in Comparing text files w/ Junit
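Rolling your own can be fairly short. The sketch below is one possible shape, not an existing utility: StreamDiff and firstDifference are hypothetical names, and the method simply reports the offset of the first mismatch so the test can put it in its failure message.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that locates the first difference between two streams.
final class StreamDiff {

    // Returns the zero-based offset of the first differing byte, or -1 if both
    // streams have identical content. A stream that ends early counts as
    // differing at the offset where it ended.
    static long firstDifference(InputStream expected, InputStream actual) throws IOException {
        InputStream a = new BufferedInputStream(expected);
        InputStream b = new BufferedInputStream(actual);
        long offset = 0;
        while (true) {
            int x = a.read();
            int y = b.read();
            if (x != y) {
                return offset;
            }
            if (x == -1) {
                return -1; // both streams ended at the same position
            }
            offset++;
        }
    }
}
```

A test could then fail with a message such as "files differ at byte " + offset whenever the result is not -1.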

I had a similar issue.
I was using the JUnit assertion class Assertions, and it seemed to compare the memory addresses of the two objects rather than the actual file contents.
Instead of comparing the InputStream objects, I converted them to byte arrays and compared those. It is nothing special, but if the byte arrays are identical, then the content of the underlying InputStreams and their files is identical too.
like this:
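// assertArrayEquals compares element by element; assertEquals on two arrays only compares references
Assertions.assertArrayEquals(
        this.getClass().getResourceAsStream("some_image_or_other_file.ext").readAllBytes(),
        someObject.getSomeObjectInputStream().readAllBytes());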
Not sure that this will scale though for larger files. Certainly not OK for complex diffs, but it does the trick for an assertion.

Why is InputStream.available() so time-consuming?

I have implemented my own class to read pcap files. (Binary files, i.e. tcpdump, wireshark)
public class PcapReader implements Iterator<PcapPacket> {
    private InputStream is;

    public PcapReader(File file) throws FileNotFoundException, IOException {
        is = new DataInputStream(
                new BufferedInputStream(
                        new FileInputStream(file)));
    }

    @Override
    public boolean hasNext() {
        try {
            return (is.available() > 0);
        } catch (IOException e) {
            return false;
        }
    }

    // pseudo code!
    @Override
    public PcapPacket next() {
        is.read(header);
        is.read(body);
        return new PcapPacket(header, body);
    }
    // more code here
}
Then I use it like this:
PcapReader reader = new PcapReader(file);
while (reader.hasNext()) {
PcapPacket pcapPacket = reader.next();
//process packet
}
The file under test is 190 MB, and I use JVisualVM to profile.
hasNext() is called 1.7 million times and takes 7.7 seconds.
next() is called the same number of times and takes 3.6 seconds.
My main question is: why is hasNext() so time-consuming in absolute terms, and why does it take twice as long as next()?

When you call is.available() in your hasNext() method, it goes down to the FileInputStream.available() implementation. This is a native method, as one can see from the FileInputStream source code.
In the end, this is indeed a time-consuming operation, because every call has to ask the operating system how much data remains in the file. Depending on the platform, that may involve querying the file size and the current position (or moving the file pointer and then restoring it), just to find out whether there is a "next" byte.

I'm sure that the internal (native) implementation of the available() method is not just something like return availableSize;, but more complicated. The stream determines the available data through the OS API, which matters, for example, for log files that are still being written while the stream reads them.

I have implemented my own class to read pcap files.
Because you're not using jNetPcap, or because you are using jNetPcap but need something that can read from a File?
If the latter, you probably want to use a pattern other than one that has a "more data is available" method and a separate "so read that data" method; something that reads the data and either returns a "packet available"/"end of file"/"error" indication or throws an exception for one or both of the latter conditions (DataInputStream appears to throw exceptions for both I/O errors and EOF, so it might make sense to do the same for your class).
Yeah, that means it can't be an Iterator, but maybe Iterators weren't originally intended to represent records in a sequential file (besides, if you really want it to be an Iterator, what are you going to do about the remove method?).
And if you can avoid needing to read from a File, you could then use jNetPcap's own routines for reading capture files, which, in libpcap 1.1.0 and later, can also read some pcap-ng files.
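To illustrate the "read and report end of file" pattern without calling available(), here is a sketch over a made-up record format: each record is a 4-byte big-endian length followed by that many bytes. This is NOT the real pcap layout, and LengthPrefixedReader is a hypothetical name; the point is only the control flow.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical reader for an invented length-prefixed format (not pcap).
final class LengthPrefixedReader {

    // Reads one record, or returns null when the stream ends before a new
    // record starts. A record cut off in the middle of its body raises EOFException.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException endOfFile) {
            // Note: this also hides a length field that was cut off mid-way.
            return null;
        }
        if (length < 0) {
            throw new IOException("Negative record length: " + length);
        }
        byte[] body = new byte[length];
        in.readFully(body); // throws EOFException if the body is truncated
        return body;
    }
}
```

The calling loop then becomes "read a record, stop when it is null", so the only per-record cost is the buffered read itself.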

Xuggle can't open in-memory input

I am working on a program that integrates Hadoop's MapReduce framework with Xuggle. For that, I am implementing an IURLProtocolHandlerFactory class that reads and writes from and to in-memory Hadoop data objects.
You can see the relevant code here:
https://gist.github.com/4191668
The idea is to register each BytesWritable object in the IURLProtocolHandlerFactory class with a UUID so that when I later refer to that name while opening the file, it returns an IURLProtocolHandler instance that is attached to that BytesWritable object, and I can read and write from and to memory.
The problem is that I get an exception like this:
java.lang.RuntimeException: could not open: byteswritable:d68ce8fa-c56d-4ff5-bade-a4cfb3f666fe
at com.xuggle.mediatool.MediaReader.open(MediaReader.java:637)
(see also under the posted link)
When debugging I see that the objects are correctly found in the factory, what's more, they are even being read from in the protocol handler. If I remove the listeners from/to the output file, the same happens, so the problem is already with the input. Digging deeper in the code of Xuggle I reach the JNI code (which tries to open the file) and I can't get further than this. This apparently returns an error code.
XugglerJNI.IContainer_open__SWIG_0
I would really appreciate some hint where to go next, how should I continue debugging. Maybe my implementation has a flaw, but I can't see it.

I think the problem you are running into is that a lot of the types of inputs/outputs are converted to a native file descriptor in the IContainer JNI code, but the thing you are passing cannot be converted. It may not be possible to create your own IURLProtocolHandler in this way, because it would, after a trip through XuggleIO.map(), just end up calling IContainer again and then into the IContainer JNI code which will probably try to get a native file descriptor and call avio_open().
However, there may be a couple of things that you can open in IContainer which are not files/have no file descriptors, and which would be handled correctly. The things you can open can be seen in the IContainer code, namely java.io.DataOutput and java.io.DataOutputStream (and the corresponding inputs). I recommend making your DataInput/DataOutput implementation which wraps around BytesReadable/BytesWriteable, and opening it in IContainer.
If that doesn't work, then write your inputs to a temp file and read the outputs from a temp file :)
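The temp-file fallback can be done with the standard library alone. A sketch, assuming the in-memory data is already available as a byte array; TempFiles and spill are hypothetical names, and nothing here is Xuggle-specific:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: spills in-memory bytes to a temporary file so that a
// library which can only open real files has something to open.
final class TempFiles {

    static Path spill(byte[] data, String suffix) throws IOException {
        Path file = Files.createTempFile("input-", suffix);
        file.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        Files.write(file, data);
        return file;
    }
}
```

The returned path (file.toString()) can then be handed to whatever opens the container. If the bytes come from a Hadoop BytesWritable, keep in mind that getBytes() can return a backing array longer than getLength(), so only the first getLength() bytes should be written.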

You can copy the file to the local file system first and then try to open the container:
filePath = split.getPath();
final FileSystem fileSystem = filePath.getFileSystem(job);
Path localFile = new Path(filePath.getName());
fileSystem.createNewFile(localFile);
fileSystem.copyToLocalFile(filePath, localFile);
int result = container.open(filePath.getName(), IContainer.Type.READ, null);
This code works for me in the RecordReader class.
In your case, you could copy the file to the local file system first and then try to create the MediaReader.
