Generate a large stream for testing - java

We have a web service where we upload files and want to write an integration test for uploading a somewhat large file. The test needs to generate the file itself (I don't want to add a large file to source control).
I'm looking to generate a stream of about 50 MB to upload. The data itself doesn't really matter. I tried this with an in-memory object and that was fairly easy, but I kept running out of memory.
The integration tests are written in Groovy, so we can use Groovy or Java APIs to generate the data. How can we generate a random stream for uploading without keeping it in memory the whole time?

Here is a simple program which generates a 50 MB text file with random content.
import java.io.PrintWriter;
import java.util.Random;

public class Test004 {
    public static void main(String[] args) throws Exception {
        PrintWriter pw = new PrintWriter("c:/test123.txt");
        Random rnd = new Random();
        for (int i = 0; i < 50 * 1024 * 1024; i++) {
            pw.write('a' + rnd.nextInt(10));
        }
        pw.flush();
        pw.close();
    }
}

You could construct a mock/dummy implementation of InputStream to supply random data, and then pass that in wherever your class/library/whatever is expecting an InputStream.
Something like this (untested):
class MyDummyInputStream extends InputStream {
    private final Random rn = new Random(0);

    @Override
    public int read() {
        // read() must return a value in 0..255, or -1 at end of stream
        return rn.nextInt(256);
    }
}
Of course, if you need to know the data (for test comparison purposes), you'll either need to save this data somewhere, or you'll need to generate algorithmic data (i.e. a known pattern) rather than random data.
(Of course, I'm sure you'll find existing frameworks that do all this kind of thing for you...)
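For the 50 MB upload scenario from the original question, a minimal sketch along those lines (the class name and the fixed seed are just illustrative choices) could cap the stream at a given size so the upload client sees a normal end of stream:
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Streams a fixed number of pseudo-random bytes without ever holding them all in memory.
class RandomContentInputStream extends InputStream {
    private final Random rnd = new Random(0); // fixed seed, so the content is reproducible
    private long remaining;

    RandomContentInputStream(long size) {
        this.remaining = size;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // end of stream
        }
        remaining--;
        return rnd.nextInt(256); // an unsigned byte value, as read() requires
    }
}
Passing new RandomContentInputStream(50L * 1024 * 1024) to whatever upload API accepts an InputStream streams 50 MB without buffering it, and because the seed is fixed the same byte sequence can be regenerated if the test later needs to verify what was uploaded.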

Related

Java - Compare InputStreams of two identical files

I am creating a JUnit test that compares a file that is created with a benchmark file, present in the resources folder under src in Eclipse.
Code
import static org.junit.Assert.assertTrue;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.junit.Test;

public class CompareFileTest
{
    private static final String TEST_FILENAME = "/resources/CompareFile_Test_Output.xls";

    @Test
    public void testCompare()
    {
        InputStream outputFileInputStream = null;
        BufferedInputStream bufferedInputStream = null;
        File excelOne = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Input1.xls");
        File excelTwo = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Input1.xls");
        File excelThree = new File(StandingsCreationHelper.directoryPath + "CompareFile_Test_Output.xls");

        CompareFile compareFile = new CompareFile(excelOne, excelTwo, excelThree);
        // The result of the comparison is stored in the excelThree file
        compareFile.compare();

        try
        {
            outputFileInputStream = new FileInputStream(excelThree);
            bufferedInputStream = new BufferedInputStream(outputFileInputStream);
            assertTrue(IOUtils.contentEquals(CompareFileTest.class.getResourceAsStream(TEST_FILENAME), bufferedInputStream));
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
However, I get an AssertionError without any details. Since I just created the benchmark file from the compare operation, both files should be identical.
Thanks in advance!
EDIT: After slim's comments, I used a file diff tool and found that both files are different, although, since they are copies, I am not sure how that happened. Maybe there is a timestamp or something?
IOUtils.contentEquals() does not claim to give you any more information than a boolean "matches or does not match", so you cannot hope to get extra information from that.
If your aim is just to get to the bottom of why these two files are different, you might step away from Java and use other tools to compare the files. For example https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
If your aim is for your jUnit tests to give you more information when the files do not match (for example, the exception could say Expected files to match, but byte 5678 differs [0xAE] vs [0xAF]), you will need to use something other than IOUtils.contentEquals() -- by rolling your own, or by hunting for something appropriate in Comparing text files w/ Junit
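If you do roll your own, a small sketch of the idea (assuming JUnit 4's org.junit.Assert and plain java.io, nothing from the original code) could walk both streams and fail with the offset and values of the first mismatch:
import static org.junit.Assert.fail;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class StreamAssert {

    /** Fails with the offset and byte values of the first difference, if any. */
    public static void assertStreamsEqual(InputStream expected, InputStream actual) throws IOException {
        try (InputStream e = new BufferedInputStream(expected);
             InputStream a = new BufferedInputStream(actual)) {
            long offset = 0;
            int eb;
            while ((eb = e.read()) != -1) {
                int ab = a.read();
                if (eb != ab) {
                    fail(String.format("Expected files to match, but byte %d differs [0x%02X] vs [0x%02X]",
                            offset, eb, ab));
                }
                offset++;
            }
            if (a.read() != -1) {
                fail("Actual stream is longer than expected stream (" + offset + " bytes)");
            }
        }
    }

    private StreamAssert() {
    }
}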
I had a similar issue.
I was using the JUnit Assertions class, and it seemed that the memory addresses were being compared rather than the actual file contents.
Instead of comparing the InputStream objects I converted them to byte arrays and compared those. Not absolutely rigorous, but I would argue that if the byte arrays are identical, then the underlying InputStreams and their files have a very good chance of being equal.
like this:
// assertArrayEquals compares the array contents element by element
Assertions.assertArrayEquals(
        this.getClass().getResourceAsStream("some_image_or_other_file.ext").readAllBytes(),
        someObject.getSomeObjectInputStream().readAllBytes());
Not sure this will scale for larger files, though. Certainly not OK for complex diffs, but it does the trick for a simple assertion.

Does a file object support all files (keyboards, directories, files, etc.)?

In Linux, everything is a file: keyboards, directories, text files, USB devices, etc.
In Java, what would happen if I used a File object to refer to something like a keyboard (or anything that isn't your typical "file")? Would it work? If not, why not?
If it would work, would I be able to do anything significant with it or are there limitations?
Yes, the Java File class works the same way for all files. It will also work for directories (those will return true for isDirectory) and special files like keyboards and USB devices (those will return false for isFile).
FileReader, FileInputStream, and classes like that will work on regular and special files, but will not work on directories.
As an example, here's a simple program (error handling removed for simplicity) that reads random bytes from the '/dev/random' device and writes them to the audio output at '/dev/dsp'. (It's loud and horrible, mind your ears!)
import java.io.*;

public class Main {
    public static void main(String[] args) throws Exception {
        File random = new File("/dev/random");
        File audio = new File("/dev/dsp");
        InputStream in = new FileInputStream(random);
        OutputStream out = new FileOutputStream(audio);
        while (true) {
            out.write(in.read());
        }
    }
}
Something to keep in mind is that 'special' files like these usually do not allow you to seek, that is, to move forwards and backwards in the file. You can't, for example, read what the keyboard will send ten minutes from now.

Usefulness of DELETE_ON_CLOSE

There are many examples on the internet showing how to use StandardOpenOption.DELETE_ON_CLOSE, such as this:
Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE);
Other examples similarly use Files.newOutputStream(..., StandardOpenOption.DELETE_ON_CLOSE).
I suspect all of these examples are probably flawed. The purpose of writing a file is that you're going to read it back at some point; otherwise, why bother writing it? But wouldn't DELETE_ON_CLOSE cause the file to be deleted before you have a chance to read it?
If you create a work file (to work with large amounts of data that are too large to keep in memory) then wouldn't you use RandomAccessFile instead, which allows both read and write access? However, RandomAccessFile doesn't give you the option to specify DELETE_ON_CLOSE, as far as I can see.
So can someone show me how DELETE_ON_CLOSE is actually useful?
First of all, I agree with you: in the Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE) example the use of DELETE_ON_CLOSE is meaningless. After a (not very thorough) search through the internet, the only example I could find showing that usage was the one you probably got it from (http://softwarecave.org/2014/02/05/create-temporary-files-and-directories-using-java-nio2/).
This option is not intended to be used with Files.write(...) alone. The API documentation makes this quite clear:
This option is primarily intended for use with work files that are used solely by a single instance of the Java virtual machine. This option is not recommended for use when opening files that are open concurrently by other entities.
Sorry, I can't give you a meaningful short example, but think of such a file as something like a swap file/partition used by an operating system: the current JVM needs to store data on disc temporarily, and after shutdown that data is of no use anymore. As a practical example, it is similar to a JEE application server which might decide to serialize some entities to disc to free up memory.
Edit: Maybe the following (oversimplified) code can be taken as an example to demonstrate the principle. (So please, nobody should start a discussion about how this "data management" could be done differently, that using a fixed temporary filename is bad, and so on.)
In the try-with-resources block you need, for some reason, to externalize data (the reasons are not the subject of the discussion).
You have random read/write access to this externalized data.
This externalized data is only of use inside the try-with-resources block.
With the StandardOpenOption.DELETE_ON_CLOSE option you don't need to handle the deletion after use yourself; the JVM will take care of it (the limitations and edge cases are described in the API).
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.Random;

public class ExternalizedDataExample {

    static final int RECORD_LENGTH = 20;
    static final String RECORD_FORMAT = "%-" + RECORD_LENGTH + "s";

    // add exception handling, left out only for the example
    public static void main(String[] args) throws Exception {
        EnumSet<StandardOpenOption> options = EnumSet.of(
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.READ,
                StandardOpenOption.DELETE_ON_CLOSE
        );
        Path file = Paths.get("/tmp/external_data.tmp");
        try (SeekableByteChannel sbc = Files.newByteChannel(file, options)) {
            // during your business processing the below two cases might happen
            // several times in random order

            // example of a huge data structure to externalize
            String[] sampleData = {"some", "huge", "datastructure"};
            for (int i = 0; i < sampleData.length; i++) {
                byte[] buffer = String.format(RECORD_FORMAT, sampleData[i])
                        .getBytes();
                ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
                sbc.position(i * RECORD_LENGTH);
                sbc.write(byteBuffer);
            }

            // example of processing which needs the externalized data
            Random random = new Random();
            byte[] buffer = new byte[RECORD_LENGTH];
            ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
            for (int i = 0; i < 10; i++) {
                sbc.position(RECORD_LENGTH * random.nextInt(sampleData.length));
                sbc.read(byteBuffer);
                byteBuffer.flip();
                System.out.printf("loop: %d %s%n", i, new String(buffer));
            }
        }
    }
}
DELETE_ON_CLOSE is intended for working with temp files.
If you need to perform some operation whose data has to be stored temporarily in a file, but you don't need to use the file outside of the current execution, DELETE_ON_CLOSE is a good solution for that.
An example is when you need to store information that can't be kept in memory, for example because it is too large.
Another example is when you need to store the information temporarily, only need it again at a later point, and don't want to occupy memory for it in the meantime.
Imagine also a situation in which a process needs a lot of time to complete. You store information in a file and only use it later (perhaps many minutes or hours afterwards). This guarantees that memory is not used for that information while you don't need it.
DELETE_ON_CLOSE tries to delete the file when you explicitly close it by calling close(), or when the JVM is shutting down if it was not closed manually before.
Here are two possible ways it can be used:
1. When calling Files.newByteChannel
This method returns a SeekableByteChannel suitable for both reading and writing, in which the current position can be modified.
Seems quite useful for situations where some data needs to be stored out of memory for read/write access and doesn't need to be persisted after the application closes.
2. Write to a file, read back, delete:
An example using an arbitrary text file:
Path p = Paths.get("C:\\test", "foo.txt");
System.out.println(Files.exists(p));
try {
    Files.createFile(p);
    System.out.println(Files.exists(p));

    try (BufferedWriter out = Files.newBufferedWriter(p, Charset.defaultCharset(), StandardOpenOption.DELETE_ON_CLOSE)) {
        out.append("Hello, World!");
        out.flush();

        try (BufferedReader in = Files.newBufferedReader(p, Charset.defaultCharset())) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
System.out.println(Files.exists(p));
This outputs (as expected):
false
true
Hello, World!
false
This example is obviously trivial, but I imagine there are plenty of situations where such an approach may come in handy.
However, I still believe the old File.deleteOnExit method may be preferable, as you won't need to keep the output stream open for the duration of any read operations on the file.
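For comparison, a minimal sketch of that deleteOnExit approach (the file name prefix and content are arbitrary): the file can be closed and reopened as often as needed, and the JVM removes it on a normal exit.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class DeleteOnExitExample {
    public static void main(String[] args) throws IOException {
        // Create a temp file that the JVM will try to remove when it exits normally
        File temp = File.createTempFile("example", ".tmp");
        temp.deleteOnExit();

        // The file can be written, closed and reopened freely in the meantime
        Files.write(temp.toPath(), "Hello, World!".getBytes());
        System.out.println(new String(Files.readAllBytes(temp.toPath())));
    }
}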

Java, TDD approach, IO operations

I am practicing the TDD approach, but I have a problem: how do you test IO operations? I have used JUnit so far, but I read that it shouldn't be used to test against external sources (databases, files, ...), so what would be better? Sorry for my bad English.
You can't test the internal working of those external sources, but you can check the results.
For example, writing to a file:
Start test
Store data you want to write in a variable
Write data to file
Read file
Check if data is the same as the one you stored
End test
Testing is about verifying end results, so it's not necessarily a bad thing you "lose" sight of a part of the process. Generally you can assume external sources (libraries, IO..) are well tested.
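As a rough sketch of those steps in JUnit (the file name prefix and content are made up for the example; a temporary file is used so nothing is left behind):
import static org.junit.Assert.assertEquals;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.junit.Test;

public class FileRoundTripTest {

    @Test
    public void writtenDataCanBeReadBack() throws Exception {
        // 1. Store the data you want to write in a variable
        String expected = "some test content";

        // 2. Write the data to a temporary file
        Path file = Files.createTempFile("tdd-io-test", ".txt");
        Files.write(file, expected.getBytes(StandardCharsets.UTF_8));

        // 3. Read the file back and check the contents are the same
        String actual = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        assertEquals(expected, actual);

        // 4. Clean up
        Files.deleteIfExists(file);
    }
}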
Change your API to be passed an InputStream and/or OutputStream and have your JUnit code pass a ByteArrayInputStream and ByteArrayOutputStream, which you can easily set up and read from.
Of course your production code would need to change too, but you can often achieve this through a simple refactoring: leaving the API as-is but having the public methods call the refactored method, for example:
Change
public void read(File file) {
    // do something with contents of file
}
To
public void read(File file) throws IOException {
    read(new FileInputStream(file));
}

// test this method
public void read(InputStream inputStream) throws IOException {
    // do something with contents of inputStream
}
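A test for the stream-based method can then supply the bytes directly, with no file involved. The LineCounter class below is a hypothetical stand-in for whatever class ends up with the refactored read method:
import static org.junit.Assert.assertEquals;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.junit.Test;

public class LineCounterTest {

    // Hypothetical class under test: the File-based API was refactored to accept an InputStream
    static class LineCounter {
        int count(InputStream in) throws IOException {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                int lines = 0;
                while (reader.readLine() != null) {
                    lines++;
                }
                return lines;
            }
        }
    }

    @Test
    public void countsLinesFromAnInMemoryStream() throws Exception {
        // No real file needed: the test controls the exact bytes the code under test sees
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8));

        assertEquals(3, new LineCounter().count(in));
    }
}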

Java Communication Server to print 1 million PDF documents

I have a Java batch job which prints 1 million (1-page) PDF documents.
This batch job runs every 5 days.
For printing 1 million (1-page) PDF documents through a batch job, which method is better?
In this PDF most of the text/paragraphs are the same for all customers; only a few pieces of information are picked dynamically from the database (Customer Id, Name, Due Date, Expiry Date, Amount).
We have tried the following:
1) Jasper Reports
2) iText
But the above 2 methods are not giving good performance, as the static text/paragraphs for each document are always created at runtime.
So I am thinking of an approach like this:
There will be a template with placeholders for the dynamic values (Customer Id, Name, Due Date, Expiry Date, Amount).
There will be a communication server, like OpenOffice, which holds this template.
Our Java application, deployed on a web server, will fetch the dataset from the database and pass it to this communication server, where the templates are already loaded in memory; only the dynamic placeholder values are filled in from the dataset and the document is saved, like a "Save As" command.
Is the above approach achievable? If yes, which API / communication server is better?
Here is the Jasper Reports code for reference:
InputStream is = getClass().getResourceAsStream("/jasperreports/reports/"+reportName+".jasper" );
JasperPrint print = JasperFillManager.fillReport(is, parameters, dataSource);
pdf = File.createTempFile("report.pdf", "");
JasperExportManager.exportReportToPdfFile(print, pdf.getPath());
Wow. 1 million PDF files every 5 days.
Even if it takes you just 0.5 seconds to generate each PDF file from beginning to end (a finished file on disk), it will take you about 5.8 days (500,000 seconds) to generate that many PDFs sequentially, longer than your whole 5-day window.
I think any approach that generates a file in a sub-second amount of time is fine (and Jasper Reports can certainly give you that level of performance).
I think you need to think about how you're going to optimise the whole process: you're certainly going to have to use multi-threading, and perhaps even several physical servers, to generate this many files in any reasonable amount of time (at least overnight).
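As a sketch of the multi-threading side only (the generatePdf method and the customer IDs are placeholders for your actual per-document Jasper/iText code and data), an ExecutorService sized to the machine could fan the work out:
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelPdfBatch {

    public static void main(String[] args) throws InterruptedException {
        // Placeholder data: in practice this would be the million customer records from the database
        List<String> customerIds = Arrays.asList("C001", "C002", "C003");

        // Size the pool to the machine; per-document PDF generation is largely CPU-bound
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        for (String id : customerIds) {
            pool.submit(() -> generatePdf(id));
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    private static void generatePdf(String customerId) {
        // Placeholder for the per-document Jasper/iText generation code
    }
}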
I will go with PDF forms (this should be "fast"):
public final class Batch
{
    private static final String FORM = "pdf-form.pdf";

    public static void main(final String[] args) throws IOException {
        final PdfPrinter printer = new PdfPrinter(FORM);
        final List<Customer> customers = readCustomers();
        for (final Customer customer : customers) {
            try {
                printer.fillAndCreate("pdf-" + customer.getId(), customer);
            } catch (IOException e) {
                // handle exception
            } catch (DocumentException e) {
                // handle exception
            }
        }
        printer.close();
    }

    private static @Nonnull List<Customer> readCustomers() {
        // implement me
    }

    private Batch() {
        // nothing
    }
}

public class PdfPrinter implements Closeable
{
    private final PdfReader reader;

    public PdfPrinter(@Nonnull final String src) throws IOException {
        reader = new PdfReader(src); // <= this reads the form pdf
    }

    @Override
    public void close() {
        reader.close();
    }

    public void fillAndCreate(@Nonnull final String dest, @Nonnull final Customer customer) throws IOException, DocumentException {
        final PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest)); // dest = output file name
        final AcroFields form = stamper.getAcroFields();
        form.setField("customerId", customer.getId());
        form.setField("name", customer.getName());
        // ...
        stamper.close();
    }
}
see also: http://itextpdf.com/examples/iia.php?id=164
As a couple of posters mentioned, 1 million PDF files means you are going to have to sustain a rate of over 2 documents per second. This is achievable from a pure document-generation standpoint, but keep in mind that the systems running the queries and compiling the data will also be under a fair amount of load. You also haven't said anything about the PDFs themselves - a one-page PDF is much easier to generate than a 40-page PDF...
I have seen iText and Docmosis achieve tens of documents per second and so Jasper and other technologies probably could also. I mention Docmosis because it works along the lines of the technique you mentioned (populating templates loaded into memory). Please note I work for the company that produces Docmosis.
If you haven't already, you will need to consider the hardware/software architecture and run trials with whatever technologies you are evaluating, to make sure you will be able to get the performance you require. Presumably the peak load will be somewhat higher than the average load.
Good luck.
