Limit BufferedReader to prevent DoS attack - Java

I need help with the code below. I need to review it and fix the security issues within it. The issue that I see is that the BufferedReader should read in chunks, which would help prevent a DoS attack. The way the code is written now, it will read input of unbounded length. I'm not sure of the best way to limit the BufferedReader. Any help would be appreciated.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class example {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // Read the filename from the command line argument
        String filename = args[0];
        BufferedReader inputStream = null;
        String fileLine;
        try {
            inputStream = new BufferedReader(new FileReader(filename));
            System.out.println("Email Addresses:");
            // Read one line at a time using BufferedReader
            while ((fileLine = inputStream.readLine()) != null) {
                System.out.println(fileLine);
            }
        } catch (IOException io) {
            System.out.println("File IO exception: " + io.getMessage());
        } finally {
            // Need another catch for closing the streams
            try {
                if (inputStream != null) {
                    inputStream.close();
                }
            } catch (IOException io) {
                System.out.println("Issue closing the file: " + io.getMessage());
            }
        }
    }
}

The requirement behind the warning about BufferedReader.readLine is to impose a reasonable bound on the maximum amount of memory an adversary can cause to be allocated at a time. In this case the important usage is the size of the String's characters, plus roughly the same again in the buffer used to create it. If the adversary can trigger this multiple times at once, that will also need to be limited. Note that if the read can be stalled but the stream not closed (for instance, over a network file system), the buffer can be kept in memory indefinitely.
The easy, general solution is to implement an InputStream that limits the total number of bytes that can be read through it. The same idea can also be implemented at the Reader level, limiting the number of characters. The dirty way around it is to bypass BufferedReader.readLine, read char arrays yourself, and combine them into a StringBuilder.
Presumably various third-party libraries include code that covers those approaches.
(Also: do use try-with-resources. FileReader picks up whatever character encoding has been left as the default, which is probably wrong. Adding throws IOException to main keeps the code simpler.)
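For example, here is a minimal sketch of a character-limiting Reader (the class name, the 1,000,000-character cap, and the choice of UTF-8 are arbitrary illustrations, not a definitive implementation). Because the limit applies to the total number of characters delivered, no single readLine call can allocate more than that bound either:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Sketch: refuses to deliver more than maxChars characters in total.
class BoundedReader extends FilterReader {
    private final long maxChars;
    private long charsRead;

    BoundedReader(Reader in, long maxChars) {
        super(in);
        this.maxChars = maxChars;
    }

    private void check() throws IOException {
        if (charsRead > maxChars) {
            throw new IOException("Input exceeds limit of " + maxChars + " characters");
        }
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            charsRead++;
            check();
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0) {
            charsRead += n;
            check();
        }
        return n;
    }
}

public class Example {
    public static void main(String[] args) throws IOException {
        // try-with-resources plus an explicit charset, as suggested above
        try (BufferedReader in = new BufferedReader(
                new BoundedReader(
                        new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8),
                        1_000_000))) {
            System.out.println("Email Addresses:");
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
An equivalent limit could be placed on raw bytes instead by wrapping the FileInputStream in a similar FilterInputStream.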

Related

What is a memory-efficient way to read a very large CSV file (say 3 GB) in Java?

I have written two methods to read the file:
public static void parseCsvFile(String path) throws IOException {
    FileInputStream inputStream = null;
    Scanner sc = null;
    try {
        inputStream = new FileInputStream(path);
        sc = new Scanner(inputStream, "UTF-8");
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            //logger.info(line);
        }
        // note that Scanner suppresses exceptions
        if (sc.ioException() != null) {
            throw sc.ioException();
        }
    } finally {
        if (inputStream != null) {
            inputStream.close();
        }
        if (sc != null) {
            sc.close();
        }
    }
}

public static void parseCsvUsingJavaStream(String path) {
    try (Stream<String> stream = Files.lines(Paths.get(path))) {
        stream.forEach(System.out::println);
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}
From the first approach, what I understand is that the method does not load all the lines from the file into memory at once, which is memory efficient. I want to achieve the same using a lambda expression. My question is: does my second approach load all the lines into memory? If yes, how can I make my second approach memory efficient?
The answer to your question is in the Files.lines javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Your second code sample should be roughly as memory-efficient as your first code sample.
Using the Streams API should result in about the same memory usage as the other approach, unless you parallelize the stream.
From the Javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
Bytes from the file are decoded into characters using the specified charset and the same line terminators as specified by readAllLines are supported.
After this method returns, then any subsequent I/O exception that occurs while reading from the file or when a malformed or unmappable byte sequence is read, is wrapped in an UncheckedIOException that will be thrown from the Stream method that caused the read to take place. In case an IOException is thrown when closing the file, it is also wrapped as an UncheckedIOException.
The returned stream encapsulates a Reader. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed.
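To illustrate that laziness, here is a small sketch (the path handling and the non-empty filter are just for illustration): as long as only streaming operations are used, lines are pulled one at a time, and only a terminal operation such as collect(Collectors.toList()) would materialise them all in memory.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LazyCsvScan {
    public static long countNonEmptyLines(String path) throws IOException {
        // try-with-resources closes the underlying reader when the stream is done
        try (Stream<String> lines = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }
}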

Unexpected number of lines when writing to a CSV file

A part of my application writes data to a .csv file in the following way:
public class ExampleWriter {

    public static final int COUNT = 10_000;
    public static final String FILE = "test.csv";

    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream(FILE)) {
            os.write(239);
            os.write(187);
            os.write(191);
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
            for (int i = 0; i < COUNT; i++) {
                writer.write(Integer.toString(i));
                writer.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(checkLineCount(COUNT, new File(FILE)));
    }

    public static String checkLineCount(int expectedLineCount, File file) throws Exception {
        BufferedReader expectedReader = new BufferedReader(new FileReader(file));
        try {
            int lineCount = 0;
            while (expectedReader.readLine() != null) {
                lineCount++;
            }
            if (expectedLineCount == lineCount) {
                return "correct";
            } else {
                return "incorrect";
            }
        } finally {
            expectedReader.close();
        }
    }
}
The file will be opened in Excel, and all kinds of languages are present in the data. The os.write calls prefix the file with a byte order mark so as to enable all kinds of characters.
Somehow the number of lines in the file does not match the count in the loop, and I cannot figure out why. Any help on what I am doing wrong here would be greatly appreciated.
You simply need to flush and close your output stream (so that the buffered data actually reaches the file) before opening the file for input and counting. Try adding:
writer.flush();
writer.close();
inside your try-block, after the for-loop in the main method.
(As a side note.)
Note that using a BOM is optional and in many cases reduces the portability of your files (because not all consuming apps are able to handle it well). It does not guarantee that the file has the advertised character encoding. So I would recommend removing the BOM. When using Excel, just select the file and choose UTF-8 as the encoding.
You are not flushing the stream. Refer to the Oracle docs for more info,
which say:
Flushes this output stream and forces any buffered output bytes to be
written out. The general contract of flush is that calling it is an
indication that, if any bytes previously written have been buffered by
the implementation of the output stream, such bytes should immediately
be written to their intended destination. If the intended destination
of this stream is an abstraction provided by the underlying operating
system, for example a file, then flushing the stream guarantees only
that bytes previously written to the stream are passed to the
operating system for writing; it does not guarantee that they are
actually written to a physical device such as a disk drive.
The flush method of OutputStream does nothing.
You need to flush as well as close the stream. There are two ways:
manually call flush() and close(), or
use try-with-resources.
As I can see from your code, you have already implemented try-with-resources, and BufferedWriter also implements Closeable and Flushable, so use code as per below:
public static void main(String[] args) throws Exception {
    try (OutputStream os = new FileOutputStream(FILE);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8))) {
        os.write(239);
        os.write(187);
        os.write(191);
        for (int i = 0; i < COUNT; i++) {
            writer.write(Integer.toString(i));
            writer.newLine();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println(checkLineCount(COUNT, new File(FILE)));
}
When COUNT is 1, the code in main() will write a file with two lines, a line with data plus an empty line afterwards. Then you call checkLineCount(COUNT, file) expecting that it will return 1 but it returns 2 because the file has actually two lines.
Therefore if you want the counter to match you must not write a new line after the last line.
(As another side note.)
Notice that writing CSV files the way you are doing is really bad practice. CSV is not as easy as it may look at first sight! So, unless you really know what you are doing (and are aware of all the CSV quirks), use a library!

Java: how to synchronize file modification by threads

Only one instance of my Java application can run at a time. It runs on Linux. I need to ensure that one thread doesn't modify the file while the other thread is using it.
I don't know which file locking or synchronization method to use. I have never done file locking in Java and I don't have much Java or programming experience.
I looked into java NIO and I read that "File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine." Right away I knew that I needed expert help because this is production code and I have almost no idea what I'm doing (and I have to get it done today).
Here's a brief outline of my code to upload some stuff (archive files) to a server. It gets the list of files to upload from a file (call it "listFile") -- and listFile can be modified while this method is reading from it. I minimize the chances of that by copying listFile to a temp file and using that temp file thereafter. But I think I need to lock the file during this copy process (or something like that).
package myPackage;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import com.example.my.FileHelper;
import com.example.my.Logger;

public class BatchUploader implements Runnable {

    private int processUploads() {
        File myFileToUpload;
        File copyOfListFile = null;
        try {
            copyOfListFile = new File("/path/to/temp/workfile");
            File origFile = new File("/path/to/listFile"); // "listFile" - the file that contains a list of files to upload
            DataWriter.copyFile(origFile, copyOfListFile); // see code below
        } catch (IOException ex) {
            Logger.log(ex);
        }
        try {
            BufferedReader input = new BufferedReader(new FileReader(copyOfListFile));
            try {
                while (!stopRunning && (fileToUploadName = input.readLine()) != null) {
                    upload(new File(fileToUploadName));
                }
            } finally {
                input.close();
                isUploading = false;
            }
        }
        return filesUploadedCount;
    }
}
Here is the code that modifies the list of files to be uploaded used in the above code:
public class DataWriter {

    public void modifyListOfFilesToUpload(String uploadedFilename) {
        StringBuilder content = new StringBuilder();
        try {
            File listOfFiles = new File("/path/to/listFile"); // file that contains a list of files to upload
            if (!listOfFiles.exists()) {
                //some code
            }
            BufferedReader input = new BufferedReader(new FileReader(listOfFiles));
            try {
                String line = "";
                while ((line = input.readLine()) != null) {
                    if (!line.isEmpty() && line.endsWith(FILE_EXTENSION)) {
                        if (!line.contains(uploadedFilename)) {
                            content.append(String.format("%1$s%n", line));
                        } else {
                            //some code
                        }
                    } else {
                        //some code
                    }
                }
            } finally {
                input.close();
            }
            this.write("/path/to/", "listFile", content.toString(), false, false, false);
        } catch (IOException ex) {
            Logger.debug("Error reading/writing uploads logfile: " + ex.getMessage());
        }
    }

    public static void copyFile(File in, File out) throws IOException {
        FileChannel inChannel = new FileInputStream(in).getChannel();
        FileChannel outChannel = new FileOutputStream(out).getChannel();
        try {
            inChannel.transferTo(0, inChannel.size(), outChannel);
        } catch (IOException e) {
            throw e;
        } finally {
            if (inChannel != null) {
                inChannel.close();
            }
            if (outChannel != null) {
                outChannel.close();
            }
        }
    }

    private void write(String path, String fileName, String data, boolean append, boolean addNewLine, boolean doLog) {
        try {
            File file = FileHelper.getFile(fileName, path);
            BufferedWriter bw = new BufferedWriter(new FileWriter(file, append));
            bw.write(data);
            if (addNewLine) {
                bw.newLine();
            }
            bw.flush();
            bw.close();
            if (doLog) {
                Logger.debug(String.format("Wrote %1$s%2$s", path, fileName));
            }
        } catch (java.lang.Exception ex) {
            Logger.log(ex);
        }
    }
}
May I suggest a slightly different approach. AFAIR, on Linux the file rename (mv) operation is atomic on local disks, so there is no chance for one process to see a 'half-written' file.
Let XXX be a sequence number with three (or more) digits. You could let your DataWriter append to a file called listFile-XXX.prepare and write a fixed number N of filenames into it. When N names have been written, close the file and rename it (atomically, see above) to listFile-XXX. With the next filename, start writing to listFile-YYY where YYY = XXX + 1.
Your BatchUploader may at any time check whether it finds files matching the pattern listFile-XXX, open them, read them, upload the named files, then close and delete them. There is no chance for the threads to mess up each other's files.
Implementation hints:
Make sure to use a polling mechanism in BatchUploader that waits a second or more if it does not find a file ready for upload (to avoid busy waiting).
You may want to sort the listFile-XXX files by XXX to make sure the uploads happen in sequence.
Of course you could vary the protocol for when listFile-XXX.prepare is closed: if DataWriter has nothing to do for a longer time, you don't want names that are ready for upload to sit around just because there are not yet N of them.
Benefits: no locking (which would be a pain to get right), no copying, and an easy overview of the work queue and its state in the file system.
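A minimal sketch of the writer side of that protocol might look like this (the queue directory, batch size, and naming are assumptions for illustration; the atomic-move guarantee holds for a rename within one local file system):
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class BatchFileWriter {
    private final Path dir = Paths.get("/path/to/queue"); // hypothetical queue directory
    private int sequence = 0;

    // Writes one batch of N names to listFile-XXX.prepare, then atomically renames it to listFile-XXX.
    public void writeBatch(List<String> fileNames) throws IOException {
        String suffix = String.format("%03d", sequence++);
        Path prepare = dir.resolve("listFile-" + suffix + ".prepare");
        Path ready = dir.resolve("listFile-" + suffix);
        try (BufferedWriter out = Files.newBufferedWriter(prepare, StandardCharsets.UTF_8)) {
            for (String name : fileNames) {
                out.write(name);
                out.newLine();
            }
        }
        // The rename is atomic on a local POSIX file system, so the uploader never sees a half-written file.
        Files.move(prepare, ready, StandardCopyOption.ATOMIC_MOVE);
    }
}
The uploader then only ever opens files matching listFile-XXX (never the .prepare files) and deletes each one after its contents have been uploaded.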
Here is a slightly different suggestion. Assuming your file names don't have '\n' characters in them (it's a big assumption on Linux, I know, but you can have your writer check for that), why not read only complete lines and ignore the incomplete ones? By incomplete lines, I mean lines that end at EOF rather than with \n.
Edit: see more suggestions in comments below.
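A rough sketch of that idea, assuming '\n'-terminated UTF-8 lines (the method name and the offset bookkeeping are made up for illustration; the caller keeps the returned offset and passes it back on the next poll):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class CompleteLineReader {
    // Reads from `offset`, processes only lines terminated by '\n', and returns the offset to resume from.
    public static long readCompleteLines(String file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long remaining = raf.length() - offset;
            if (remaining <= 0) {
                return offset;                      // nothing new yet
            }
            raf.seek(offset);
            byte[] chunk = new byte[(int) remaining];
            raf.readFully(chunk);
            int lastNewline = -1;
            for (int i = chunk.length - 1; i >= 0; i--) {
                if (chunk[i] == '\n') { lastNewline = i; break; }
            }
            if (lastNewline < 0) {
                return offset;                      // only an incomplete line so far
            }
            String complete = new String(chunk, 0, lastNewline + 1, StandardCharsets.UTF_8);
            for (String line : complete.split("\n")) {
                System.out.println(line);           // e.g. upload the named file here
            }
            return offset + lastNewline + 1;        // resume after the last complete line
        }
    }
}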

Java: decompress files into string too slow

Here is how I compressed the string into a file:
public static void compressRawText(File outFile, String src) {
    FileOutputStream fo = null;
    GZIPOutputStream gz = null;
    try {
        fo = new FileOutputStream(outFile);
        gz = new GZIPOutputStream(fo);
        gz.write(src.getBytes());
        gz.flush();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            gz.close();
            fo.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is how I decompressed it:
static int BUFFER_SIZE = 8 * 1024;
static int STRING_SIZE = 2 * 1024 * 1024;

public static String decompressRawText(File inFile) {
    InputStream in = null;
    InputStreamReader isr = null;
    StringBuilder sb = new StringBuilder(STRING_SIZE); // constant resizing is costly, so set the STRING_SIZE
    try {
        in = new FileInputStream(inFile);
        in = new BufferedInputStream(in, BUFFER_SIZE);
        in = new GZIPInputStream(in, BUFFER_SIZE);
        isr = new InputStreamReader(in);
        char[] cbuf = new char[BUFFER_SIZE];
        int length = 0;
        while ((length = isr.read(cbuf)) != -1) {
            sb.append(cbuf, 0, length);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            in.close();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    return sb.toString();
}
The decompression seems to take forever. I have a feeling that I am doing too many redundant steps in the decompression bit. Any idea how I could speed it up?
EDIT: I have modified the code to the above based on the recommendations given:
1. I changed the pattern to simplify my code a bit, but if I can't use IOUtils, is it still OK to use this pattern?
2. I set the StringBuilder buffer to 2M, as suggested by entonio; should I set it a bit higher? The memory is still OK; I still have around 10M available according to the heap monitor in Eclipse.
3. I cut the BufferedReader and added a BufferedInputStream, but I am still not sure about the BUFFER_SIZE; any suggestions?
The above modification has improved the time taken to loop over all my 30 2M files from almost 30 seconds to around 14, but I need to reduce it to under 10. Is that even possible on Android? Basically, I need to process 60M of text in total; I have divided it into 30 files of 2M each, and before I start processing each string, I did the above timing on the cost of just looping over all the files and getting the String in each file into memory. Since I don't have much experience, would it be better to use 60 files of 1M instead? Or is there any other improvement I should adopt? Thanks.
ALSO: Since physical IO is quite time-consuming, and since my compressed files are all quite small (around 2K from 2M of text), is it possible for me to still do the above, but on a file that is already mapped to memory, possibly using Java NIO? Thanks.
The BufferedReader's only purpose is the readLine() method, which you don't use, so why not just read from the InputStreamReader? Also, decreasing the buffer size may be helpful. Finally, you should probably specify the encoding when both reading and writing, though that shouldn't have an impact on performance.
edit: more data
If you know the size of the string ahead of time, you should add a length parameter to decompressRawText and use it to initialise the StringBuilder. Otherwise it will be constantly resized in order to accommodate the result, and that is costly.
edit: clarification
2MB implies a lot of resizes. There is no harm if you specify a capacity higher than the length you end up with after reading (other than temporarily using more memory, of course).
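Putting those suggestions together, a possible shape of the decompression method might look like this (the expectedChars parameter and the choice of UTF-8 are assumptions; use whatever encoding the text was compressed with):
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class Decompress {
    private static final int BUFFER_SIZE = 8 * 1024;

    // expectedChars is the known (or estimated) size of the decompressed text.
    public static String decompressRawText(File inFile, int expectedChars) throws IOException {
        StringBuilder sb = new StringBuilder(expectedChars); // pre-sized: avoids repeated resizing
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(
                        new BufferedInputStream(new FileInputStream(inFile), BUFFER_SIZE),
                        BUFFER_SIZE),
                StandardCharsets.UTF_8)) {                   // explicit charset, no BufferedReader
            char[] cbuf = new char[BUFFER_SIZE];
            int n;
            while ((n = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, n);
            }
        }
        return sb.toString();
    }
}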
You should wrap the FileInputStream with a BufferedInputStream before wrapping with a GZipInputStream, rather than using a BufferedReader.
The reason is that, depending on implementation, any of the various input classes in your decoration hierarchy could decide to read on a byte-by-byte basis (and I'd say the InputStreamReader is most likely to do this). And that would translate into many read(2) calls once it gets to the FileInputStream.
Of course, this may just be superstition on my part. But, if you're running on Linux, you can always test with strace.
Edit: one nice pattern to follow when building up a chain of stream delegates is to use a single InputStream variable. Then you only have one thing to close in your finally block (and can use Jakarta Commons IOUtils to avoid lots of nested try-catch-finally blocks).
InputStream in = null;
try
{
    in = new FileInputStream("foo");
    in = new BufferedInputStream(in);
    in = new GZIPInputStream(in);
    // do something with the stream
}
finally
{
    IOUtils.closeQuietly(in);
}
Add a BufferedInputStream between the FileInputStream and the GZIPInputStream.
Similarly when writing.
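For the writing side, that might look roughly like this (a sketch, with an explicit charset added as suggested above):
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class Compress {
    private static final int BUFFER_SIZE = 8 * 1024;

    public static void compressRawText(File outFile, String src) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(
                        new BufferedOutputStream(new FileOutputStream(outFile), BUFFER_SIZE)),
                StandardCharsets.UTF_8)) {
            out.write(src);                 // closing the Writer finishes and closes the GZIP stream
        }
    }
}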

Java: Pause thread and get position in file

I'm writing an application in Java with multithreading which I want to pause and resume.
The thread is reading a file line by line while finding matching lines to a pattern. It has to continue on the place I paused the thread. To read the file I use a BufferedReader in combination with an InputStreamReader and FileInputStream.
fip = new FileInputStream(new File(file));
fileBuffer = new BufferedReader(new InputStreamReader(fip));
I use this FileInputStream because I need the filepointer for the position in the file.
When processing the lines it writes the matching lines to a MySQL database. To use a MySQL-connection between the threads I use a ConnectionPool to make sure just one thread is using one connection.
The problem is that when I pause the threads and resume them, a few matching lines just disappear. I also tried subtracting the buffer size from the offset, but it still has the same problem.
What is a decent way to solve this problem or what am I doing wrong?
Some more details:
The loop
// Regex engine
RunAutomaton ra = new RunAutomaton(this.conf.getAuto(), true);
lw = new LogWriter();
while ((line = fileBuffer.readLine()) != null) {
    if (line.length() > 0) {
        if (ra.run(line)) {
            // Write to LogWriter
            lw.write(line, this.file.getName());
            lw.execute();
        }
    }
}
// Loop when paused.
while (pause) { }
}
Calculating place in file
// Get the position in the file
public long getFilePosition() throws IOException {
    long position = fip.getChannel().position() - bufferSize + fileBuffer.getNextChar();
    return position;
}
Putting it into the database
// Get the connector
ConnectionPoolManager cpl = ConnectionPoolManager.getManager();
Connector con = null;
while (con == null) {
    con = cpl.getConnectionFromPool();
}
// Insert the query
con.executeUpdate(this.sql.toString());
cpl.returnConnectionToPool(con);
Here's an example of what I believe you're looking for. You didn't show much of your implementation so it's hard to debug what might be causing gaps for you. Note that the position of the FileInputStream is going to be a multiple of 8192 because the BufferedReader is using a buffer of that size. If you want to use multiple threads to read the same file you might find this answer helpful.
public class ReaderThread extends Thread {

    private final FileInputStream fip;
    private final BufferedReader fileBuffer;
    private volatile boolean paused;

    public ReaderThread(File file) throws FileNotFoundException {
        fip = new FileInputStream(file);
        fileBuffer = new BufferedReader(new InputStreamReader(fip));
    }

    public void setPaused(boolean paused) {
        this.paused = paused;
    }

    public long getFilePos() throws IOException {
        return fip.getChannel().position();
    }

    public void run() {
        try {
            String line;
            while ((line = fileBuffer.readLine()) != null) {
                // process your line here
                System.out.println(line);
                while (paused) {
                    sleep(10);
                }
            }
        } catch (IOException e) {
            // handle I/O errors
        } catch (InterruptedException e) {
            // handle interrupt
        }
    }
}
I think the root of the problem is that you shouldn't be subtracting bufferSize. Rather you should be subtracting the number of unread characters in the buffer. And I don't think there's a way to get this.
The easiest solution I can think of is to create a custom subclass of FilterReader that keeps track of the number of characters read. Then stack the streams as follows:
FileReader
< BufferedReader
< custom filter reader
< BufferedReader(sz == 1)
The final BufferedReader is there so that you can use readLine ... but you need to set the buffer size to 1 so that the character count from your filter matches the position that the application has reached.
Alternatively, you could implement your own readLine() method in the custom filter reader.
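A minimal sketch of such a counting filter (the class name is illustrative):
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Sketch: counts every character handed to its caller.
class CountingReader extends FilterReader {
    private long charsRead;

    CountingReader(Reader in) {
        super(in);
    }

    public long getCharsRead() {
        return charsRead;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            charsRead++;
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0) {
            charsRead += n;
        }
        return n;
    }
}
It would be stacked roughly as new BufferedReader(new CountingReader(new BufferedReader(new FileReader(file))), 1), so that the count lags the readLine caller by at most one buffered character. Note that the count is in characters, including the line terminators that readLine strips, and it only maps directly onto a byte position for single-byte encodings.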
After a few days of searching I found out that subtracting the buffer size and adding the position in the buffer was indeed not the right way to do it. The position was never right and I was always missing some lines.
While searching for a new way to do the job, I didn't count the number of characters, because there are just too many characters to count and that would decrease my performance a lot. But I found something else. Software engineer Mark S. Kolich created a class, JumpToLine, which uses the Apache IO library to jump to a given line. It can also provide the last line it has read, so this is really what I need.
There are some examples on his homepage for those interested.
