I am trying to read a file in chunks and pass each chunk to a thread that counts how many times each byte occurs in that chunk. The trouble is that when I pass the whole file to a single thread I get the correct result, but with multiple threads the result becomes very strange. Here's my code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class Main{
public static void main(String[] args) throws InterruptedException, ExecutionException, IOException
{
// get number of threads to be run
Scanner in = new Scanner(System.in);
int numberOfThreads = in.nextInt();
// read file
File file = new File("testfile.txt");
long fileSize = file.length();
long chunkSize = fileSize / numberOfThreads;
FileInputStream input = new FileInputStream(file);
byte[] buffer = new byte[(int)chunkSize];
ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
Set<Future<int[]>> set = new HashSet<Future<int[]>>();
while(input.available() > 0)
{
if(input.available() < chunkSize)
{
chunkSize = input.available();
}
input.read(buffer, 0, (int) chunkSize);
Callable<int[]> callable = new FrequenciesCounter(buffer);
Future<int[]> future = pool.submit(callable);
set.add(future);
}
// let's assume we will use extended ASCII characters only
int alphabet = 256;
// hold how many times each character is contained in the input file
int[] frequencies = new int[alphabet];
// sum the frequencies from each thread
for(Future<int[]> future: set)
{
for(int i = 0; i < alphabet; i++)
{
frequencies[i] += future.get()[i];
}
}
input.close();
for(int i = 0; i< frequencies.length; i++)
{
if(frequencies[i] > 0) System.out.println((char)i + " " + frequencies[i]);
}
}
}
// helper class for multithreaded frequency counting
class FrequenciesCounter implements Callable<int[]>
{
private int[] frequencies = new int[256];
private byte[] input;
public FrequenciesCounter(byte[] buffer)
{
input = buffer;
}
public int[] call()
{
for(int i = 0; i < input.length; i++)
{
frequencies[(int)input[i]]++;
}
return frequencies;
}
}
My testfile.txt is aaaaaaaaaaaaaabbbbcccccc.
With 1 thread the output is:
a 14
b 4
c 6
With 2 threads the output is:
a 4
b 8
c 12
With 3 threads the output is:
b 6
c 18
And so on, with other strange results that I cannot figure out. Could anybody help?
Every thread is using the same buffer, and one thread will be overwriting the buffer as another thread is trying to process it.
You need to make sure every thread has its own buffer that nobody else can modify.
Create a new byte[] buffer for every thread:
public static void main(String[] args) throws InterruptedException, ExecutionException, IOException {
// get number of threads to be run
Scanner in = new Scanner(System.in);
int numberOfThreads = in.nextInt();
// read file
File file = new File("testfile.txt");
long fileSize = file.length();
long chunkSize = fileSize / numberOfThreads;
FileInputStream input = new FileInputStream(file);
ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
Set<Future<int[]>> set = new HashSet<Future<int[]>>();
while (input.available() > 0) {
//create buffer for every thread.
byte[] buffer = new byte[(int) chunkSize];
if (input.available() < chunkSize) {
chunkSize = input.available();
}
input.read(buffer, 0, (int) chunkSize);
Callable<int[]> callable = new FrequenciesCounter(buffer);
Future<int[]> future = pool.submit(callable);
set.add(future);
}
// let's assume we will use extended ASCII characters only
int alphabet = 256;
// hold how many times each character is contained in the input file
int[] frequencies = new int[alphabet];
// sum the frequencies from each thread
for (Future<int[]> future : set) {
for (int i = 0; i < alphabet; i++) {
frequencies[i] += future.get()[i];
}
}
input.close();
for (int i = 0; i < frequencies.length; i++) {
if (frequencies[i] > 0)
System.out.println((char) i + " " + frequencies[i]);
}
}
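Two caveats worth adding (they are not part of the original answer): input.read may return fewer bytes than requested, and the buffer above is allocated before chunkSize is clamped for the final chunk, so the last FrequenciesCounter can count trailing zero bytes that were never read. Also, byte is signed in Java, so frequencies[(int)input[i]]++ throws an ArrayIndexOutOfBoundsException for byte values 128-255. A minimal sketch of both fixes:

// clamp first, then allocate, and remember how much was actually read
if (input.available() < chunkSize) {
    chunkSize = input.available();
}
byte[] buffer = new byte[(int) chunkSize];
int bytesRead = input.read(buffer, 0, (int) chunkSize);

// inside FrequenciesCounter.call(): mask the signed byte so that
// values 128-255 map to indices 128-255 instead of negative ones
frequencies[input[i] & 0xFF]++;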
I have found plenty of different suggestions on how to parse an ASCII file containing double precision numbers into an array of doubles in Java. What I currently use is roughly the following:
FileInputStream stream = new FileInputStream(fname);
BufferedReader breader = new BufferedReader(new InputStreamReader(stream));
Scanner scanner = new Scanner(breader);
double[] array = new double[size]; // size is known upfront
int idx = 0;
try {
    while (idx < size) {
        array[idx] = scanner.nextDouble();
        idx++;
    }
}
catch (Exception e) { ... }
For an example file with 1 million numbers this code takes roughly 2 seconds. Similar code written in C, using fscanf, takes 0.1 seconds (!). Clearly I got it all wrong. I guess calling nextDouble() so many times is the wrong way to go because of the overhead, but I cannot figure out a better way.
I am no Java expert and hence I need a little help with this: can you tell me how to improve this code?
Edit: The corresponding C code follows
FILE *fd = fopen(fname, "r+");
double *vals = calloc(size, sizeof(double));
int idx = 0, nel;
do {
    nel = fscanf(fd, "%lf", vals + idx);
    idx++;
} while (nel != -1);
(Summarizing some of the things that I already mentioned in the comments:)
You should be careful with manual benchmarks. The answer to the question How do I write a correct micro-benchmark in Java? points out some of the basic caveats. However, this case is not so prone to the classical pitfalls. In fact, the opposite might be the case: When the benchmark solely consists of reading a file, then you are most likely not benchmarking the code, but mainly the hard disc. This involves the usual side effects of caching.
However, there obviously is an overhead beyond the pure file IO.
You should be aware that the Scanner class is very powerful and convenient. But internally, it is a beast consisting of large regular expressions, and it hides tremendous complexity from the user: complexity that is not necessary at all when your intention is only to read double values!
There are solutions with less overhead.
Unfortunately, the simplest solution is only applicable when the numbers in the input are separated by line separators. Then, reading this file into an array could be written as
double result[] =
Files.lines(Paths.get(fileName))
.mapToDouble(Double::parseDouble)
.toArray();
and this could even be rather fast. When there are multiple numbers in one line (as you mentioned in the comment), then this could be extended:
double result[] =
Files.lines(Paths.get(fileName))
.flatMap(s -> Stream.of(s.split("\\s+")))
.mapToDouble(Double::parseDouble)
.toArray();
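One caveat worth adding (not part of the original answer): if a line starts with whitespace, s.split("\\s+") yields an empty first token, and Double.parseDouble then throws a NumberFormatException. A guarded variant:

double result[] =
    Files.lines(Paths.get(fileName))
        .flatMap(s -> Stream.of(s.trim().split("\\s+")))
        .filter(s -> !s.isEmpty()) // blank lines also produce one empty token
        .mapToDouble(Double::parseDouble)
        .toArray();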
So regarding the general question of how to efficiently read a set of double values from a file, separated by whitespaces (but not necessarily separated by newlines), I wrote a small test.
This should not be considered as a real benchmark, and be taken with a grain of salt, but it at least tries to address some basic issues: It reads files with different sizes, multiple times, with different methods, so that for the later runs, the effects of hard disc caching should be the same for all methods:
Updated to generate sample data as described in the comment, and added the stream-based approach
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StreamTokenizer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Random;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.stream.Stream;
public class ReadingFileWithDoubles
{
private static final int MIN_SIZE = 256000;
private static final int MAX_SIZE = 2048000;
public static void main(String[] args) throws IOException
{
generateFiles();
long before = 0;
long after = 0;
double result[] = null;
for (int n=MIN_SIZE; n<=MAX_SIZE; n*=2)
{
String fileName = "doubles"+n+".txt";
for (int i=0; i<10; i++)
{
before = System.nanoTime();
result = readWithScanner(fileName, n);
after = System.nanoTime();
System.out.println(
"size = " + n +
", readWithScanner " +
(after - before) / 1e6 +
", result " + result);
before = System.nanoTime();
result = readWithStreamTokenizer(fileName, n);
after = System.nanoTime();
System.out.println(
"size = " + n +
", readWithStreamTokenizer " +
(after - before) / 1e6 +
", result " + result);
before = System.nanoTime();
result = readWithBufferAndStringTokenizer(fileName, n);
after = System.nanoTime();
System.out.println(
"size = " + n +
", readWithBufferAndStringTokenizer " +
(after - before) / 1e6 +
", result " + result);
before = System.nanoTime();
result = readWithStream(fileName, n);
after = System.nanoTime();
System.out.println(
"size = " + n +
", readWithStream " +
(after - before) / 1e6 +
", result " + result);
}
}
}
private static double[] readWithScanner(
String fileName, int size) throws IOException
{
try (
InputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
Scanner scanner = new Scanner(br))
{
// Do this to avoid surprises on systems with a different locale!
scanner.useLocale(Locale.ENGLISH);
int idx = 0;
double array[] = new double[size];
while (idx < size)
{
array[idx] = scanner.nextDouble();
idx++;
}
return array;
}
}
private static double[] readWithStreamTokenizer(
String fileName, int size) throws IOException
{
try (
InputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr))
{
StreamTokenizer st = new StreamTokenizer(br);
st.resetSyntax();
st.wordChars('0', '9');
st.wordChars('.', '.');
st.wordChars('-', '-');
st.wordChars('e', 'e');
st.wordChars('E', 'E');
double array[] = new double[size];
int index = 0;
boolean eof = false;
do
{
int token = st.nextToken();
switch (token)
{
case StreamTokenizer.TT_EOF:
eof = true;
break;
case StreamTokenizer.TT_WORD:
double d = Double.parseDouble(st.sval);
array[index++] = d;
break;
}
} while (!eof);
return array;
}
}
// This one is reading the whole file into memory, as a String,
// which may not be appropriate for large files
private static double[] readWithBufferAndStringTokenizer(
String fileName, int size) throws IOException
{
double array[] = new double[size];
try (
InputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr))
{
StringBuilder sb = new StringBuilder();
char buffer[] = new char[1024];
while (true)
{
int n = br.read(buffer);
if (n == -1)
{
break;
}
sb.append(buffer, 0, n);
}
int index = 0;
StringTokenizer st = new StringTokenizer(sb.toString());
while (st.hasMoreTokens())
{
array[index++] = Double.parseDouble(st.nextToken());
}
return array;
}
}
private static double[] readWithStream(
String fileName, int size) throws IOException
{
double result[] =
Files.lines(Paths.get(fileName))
.flatMap(s -> Stream.of(s.split("\\s+")))
.mapToDouble(Double::parseDouble)
.toArray();
return result;
}
private static void generateFiles() throws IOException
{
for (int n=MIN_SIZE; n<=MAX_SIZE; n*=2)
{
String fileName = "doubles"+n+".txt";
if (!new File(fileName).exists())
{
System.out.println("Creating "+fileName);
writeDoubles(new FileOutputStream(fileName), n);
}
else
{
System.out.println("File "+fileName+" already exists");
}
}
}
private static void writeDoubles(OutputStream os, int n) throws IOException
{
OutputStreamWriter writer = new OutputStreamWriter(os);
Random random = new Random(0);
int numbersPerLine = random.nextInt(4) + 1;
for (int i=0; i<n; i++)
{
writer.write(String.valueOf(random.nextDouble()));
numbersPerLine--;
if (numbersPerLine == 0)
{
writer.write("\n");
numbersPerLine = random.nextInt(4) + 1;
}
else
{
writer.write(" ");
}
}
writer.close();
}
}
It compares 4 methods:
Reading with a Scanner, as in your original code snippet
Reading with a StreamTokenizer
Reading the whole file into a String, and dissecting it with a StringTokenizer
Reading the file as a Stream of lines, which are then flat-mapped to a Stream of tokens, which are then mapped to a DoubleStream
Reading the file as one large String may not be appropriate in all cases: When the files become (much) larger, then keeping the whole file in memory as a String may not be a viable solution.
A test run (on a rather old PC, with a slow hard disc drive (no solid state)) showed roughly these results:
...
size = 1024000, readWithScanner 9932.940919, result [D@1c7353a
size = 1024000, readWithStreamTokenizer 1187.051427, result [D@1a9515
size = 1024000, readWithBufferAndStringTokenizer 1172.235019, result [D@f49f1c
size = 1024000, readWithStream 2197.785473, result [D@1469ea2
...
Obviously, the scanner imposes a considerable overhead that may be avoided when reading more directly from the stream.
This may not be the final answer, as there may be more efficient and/or more elegant solutions (and I'm looking forward to seeing them!), but maybe it is helpful at least.
EDIT
A small remark: There is a certain conceptual difference between the approaches in general. Roughly speaking, the difference lies in who determines the number of elements that are read. In pseudocode, this difference is
double array[] = new double[size];
for (int i=0; i<size; i++)
{
array[i] = readDoubleFromInput();
}
versus
double array[] = new double[size];
int index = 0;
while (thereAreStillNumbersInTheInput())
{
double d = readDoubleFromInput();
array[index++] = d;
}
Your original approach with the scanner was written like the first one, while the solutions that I proposed are more similar to the second. But this should not make a large difference here, assuming that the size is indeed the real size, and potential errors (like too few or too many numbers in the input) don't appear or are handled in some other way.
Here, I am reading an 18 MB file and storing it in a two-dimensional array. But this program takes almost 15 minutes to run. Is there any way to optimize its running time? The file contains only binary values. Thanks in advance…
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class test
{
    public static void main(String[] args) throws FileNotFoundException
    {
        int m = 2160;
        int n = 4320;
        int[][] lof = new int[n][m];
        String filename = "D:/New Folder/ETOPOCHAR";
        Scanner input = new Scanner(new File(filename));
        double range_km = 1.0;
        double alonn = -57.07; // -180 to 180
        double alat = 38.53;
        while (input.hasNextLine()) {
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < n; j++) {
                    try {
                        lof[j][i] = input.nextInt();
                        // printing every one of the ~9.3 million values is
                        // itself a huge part of the 15-minute runtime
                        System.out.println("value[" + j + "][" + i + "] = " + lof[j][i]);
                    } catch (java.util.NoSuchElementException e) {
                        // e.printStackTrace();
                    }
                }
            } // print the input matrix
        }
    }
}
I have also tried with a byte array, but I cannot save it into a 2D array...
import java.io.File;
import java.io.FileInputStream;

public class FileToArrayOfBytes
{
public static void main( String[] args )
{
FileInputStream fileInputStream=null;
File file = new File("name of file");
byte[] bFile = new byte[(int) file.length()];
try {
//convert file into array of bytes
fileInputStream = new FileInputStream(file);
fileInputStream.read(bFile);
fileInputStream.close();
for (int i = 0; i < bFile.length; i++) {
System.out.print((char)bFile[i]);
}
System.out.println("Done");
}catch(Exception e){
e.printStackTrace();
}
}
}
You can read the file into a byte array first, then deserialize those bytes. Start with a 2048-byte input buffer, then experiment by increasing or decreasing its size, keeping the experimental values at powers of two (512, 1024, 2048, etc.).
As far as I remember, there is a good chance that the best performance can be achieved with a buffer of 2048 bytes, but it is OS-dependent and should be verified.
Code sample (here you can try different values of the BUFFER_SIZE constant; in my case I read a 7.5 MB test file in less than one second):
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// wrapper class and BUFFER_SIZE constant added so the sample compiles stand-alone
public class ReadIntoByteBuffer {

    // experiment with powers of two: 512, 1024, 2048, ...
    private static final int BUFFER_SIZE = 2048;

    public static void main(String... args) throws IOException {
        File f = new File(args[0]);
        byte[] buffer = new byte[BUFFER_SIZE];
        ByteBuffer result = ByteBuffer.allocateDirect((int) f.length());
        try (FileInputStream fis = new FileInputStream(f)) {
            int bytesRead;
            int totalBytesRead = 0;
            while ((bytesRead = fis.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.put(buffer, 0, bytesRead);
                totalBytesRead += bytesRead;
            }
            // debug info
            System.out.printf("Read %d bytes\n", totalBytesRead);
            // Here you can do whatever you want with the result, including creation of a 2D array...
            int pos = result.position();
            result.rewind();
            for (int i = 0; i < pos / 4; i++) {
                System.out.println(result.getInt());
            }
        }
    }
}
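Since the question asks for a 2D array, here is a minimal sketch of how the filled buffer from the sample above could be unpacked, assuming the file stores 4-byte big-endian ints row by row and reusing the question's dimensions m = 2160, n = 4320 (if the file stores one byte per cell, use result.get() instead of result.getInt()):

// minimal sketch: unpack the ByteBuffer into a 2D array
int rows = 2160;  // m from the question
int cols = 4320;  // n from the question
int[][] grid = new int[rows][cols];
result.rewind();
for (int r = 0; r < rows; r++) {
    for (int c = 0; c < cols; c++) {
        grid[r][c] = result.getInt(); // 4 bytes per value, big-endian by default
    }
}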
Take your time and read the docs for the java.io and java.nio packages, as well as the Scanner class, just to improve understanding.
So here is my code. It seems to work, but it only prints the output to the file rather than doing both (displaying the data on the console and saving it to a text file). Help appreciated.
// imports
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;
public class DTM {
// The main method for our Digital Terrain Models
/** @param args
 * @throws IOException
 */
*/
public static void main(String[] args) throws IOException {
//Prints the console output on a text file (Output.txt)
PrintStream out = new PrintStream(new FileOutputStream("output.txt"));
System.setOut(out);
//Declare some variables
int aRows = 401;
int bCols = 401;
String DMTfile = "sk28.asc";
//Declare some tables
double data[][] = new double[aRows][bCols];
BufferedReader file = new BufferedReader(new FileReader(DMTfile));
//Write data into array
for (int i = 0; i < aRows; i++) {
String rowArray[] = file.readLine().split(" ");
for (int j = 0; j < bCols; j++) {
data[i][j] = Double.parseDouble(rowArray[j]);
}
}
//Closing the file
file.close();
//print out the array
for (int i = 0; i < aRows; i++) {
for (int j = 0; j < bCols; j++) {
System.out.println(data[i][j]);
}
}
// track the highest number seen (Double.MIN_VALUE is the smallest *positive*
// double, so NEGATIVE_INFINITY is the safe starting point)
double high = Double.NEGATIVE_INFINITY;
// track the lowest number seen
double low = Double.POSITIVE_INFINITY;
// loop through the array to find the extremes
for (int i = 0; i < data.length; i++) {
    for (int j = 0; j < data[i].length; j++) {
        // determine the highest value
        if (data[i][j] > high) {
            high = data[i][j];
        }
        // determine the lowest value (a separate "if", not "else if",
        // so a single element can update both bounds)
        if (data[i][j] < low) {
            low = data[i][j];
        }
    }
}
// Code here to find the highest number
System.out.println("Peak in this area = " + high);
// Code here to find the lowest number
System.out.println("Dip in this area = " + low);
}
}
Try the Apache Commons IO TeeOutputStream.
Untested, but it should do the trick:
PrintStream outStream = System.out;
// only the file output stream
OutputStream os = new FileOutputStream("output.txt", true);
// create a TeeOutputStream that duplicates data to outStream and os
os = new TeeOutputStream(outStream, os);
PrintStream printStream = new PrintStream(os);
System.setOut(printStream);
You're merely redirecting standard output to a file instead of the console. As far as I know there is no way to automagically clone an output onto two streams, but it's pretty easy to do it by hand:
public static void multiPrint(String s, FileOutputStream out) throws IOException {
    System.out.print(s);
    out.write(s.getBytes()); // FileOutputStream cannot write a String directly
}
Whenever you want to print you just have to call this function:
FileOutputStream out=new FileOutputStream("out.txt");
multiPrint("hello world\n", out);
I am writing code for an external merge sort. The idea is that the input file contains too many numbers to be stored in an array, so you read parts of it and write sorted runs out to files. Here's my code. While it runs fast, it is not fast enough. I was wondering if you can think of any improvements I can make to it. Note that at first I sort every 1M integers together, so I skip the early iterations of the merging algorithm.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
public class ExternalSort {
public static void sort(String f1, String f2) throws Exception {
RandomAccessFile raf1 = new RandomAccessFile(f1, "rw");
RandomAccessFile raf2 = new RandomAccessFile(f2, "rw");
int fileByteSize = (int) (raf1.length() / 4);
int size = Math.min(1000000, fileByteSize);
externalSort(f1, f2, size);
boolean writeToOriginal = true;
DataOutputStream dos;
while (size <= fileByteSize) {
if (writeToOriginal) {
raf1.seek(0);
dos = new DataOutputStream(new BufferedOutputStream(
new MyFileOutputStream(raf1.getFD())));
} else {
raf2.seek(0);
dos = new DataOutputStream(new BufferedOutputStream(
new MyFileOutputStream(raf2.getFD())));
}
for (int i = 0; i < fileByteSize; i += 2 * size) {
if (writeToOriginal) {
dos = merge(f2, dos, i, size);
} else {
dos = merge(f1, dos, i, size);
}
}
dos.flush();
writeToOriginal = !writeToOriginal;
size *= 2;
}
if (writeToOriginal)
{
raf1.seek(0);
raf2.seek(0);
dos = new DataOutputStream(new BufferedOutputStream(
new MyFileOutputStream(raf1.getFD())));
int i = 0;
while (i < raf2.length() / 4){
dos.writeInt(raf2.readInt());
i++;
}
dos.flush();
}
}
public static void externalSort(String f1, String f2, int size) throws Exception{
RandomAccessFile raf1 = new RandomAccessFile(f1, "rw");
RandomAccessFile raf2 = new RandomAccessFile(f2, "rw");
int fileByteSize = (int) (raf1.length() / 4);
int[] array = new int[size];
DataInputStream dis = new DataInputStream(new BufferedInputStream(
new MyFileInputStream(raf1.getFD())));
DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
new MyFileOutputStream(raf2.getFD())));
int count = 0;
while (count < fileByteSize){
for (int k = 0; k < size; ++k){
array[k] = dis.readInt();
}
count += size;
Arrays.sort(array);
for (int k = 0; k < size; ++k){
dos.writeInt(array[k]);
}
}
dos.flush();
raf1.close();
raf2.close();
dis.close();
dos.close();
}
public static DataOutputStream merge(String file,
DataOutputStream dos, int start, int size) throws IOException {
RandomAccessFile raf = new RandomAccessFile(file, "rw");
RandomAccessFile raf2 = new RandomAccessFile(file, "rw");
int fileByteSize = (int) (raf.length() / 4);
raf.seek(4 * start);
raf2.seek(4 *start);
DataInputStream dis = new DataInputStream(new BufferedInputStream(
new MyFileInputStream(raf.getFD())));
DataInputStream dis3 = new DataInputStream(new BufferedInputStream(
new MyFileInputStream(raf2.getFD())));
int i = 0;
int j = 0;
int max = size * 2;
int a = dis.readInt();
int b;
if (start + size < fileByteSize) {
dis3.skip(4 * size);
b = dis3.readInt();
} else {
b = Integer.MAX_VALUE;
j = size;
}
while (i + j < max) {
if (j == size || (a <= b && i != size)) {
dos.writeInt(a);
i++;
if (start + i == fileByteSize) {
i = size;
} else if (i != size) {
a = dis.readInt();
}
} else {
dos.writeInt(b);
j++;
if (start + size + j == fileByteSize) {
j = size;
} else if (j != size) {
b = dis3.readInt();
}
}
}
raf.close();
raf2.close();
return dos;
}
public static void main(String[] args) throws Exception {
String f1 = args[0];
String f2 = args[1];
sort(f1, f2);
}
}
You might wish to merge k > 2 segments at a time. This reduces the amount of I/O from n log n / log 2 to n log n / log k.
Edit: In pseudocode, this would look something like this:
void sort(List list) {
if (list fits in memory) {
list.sort();
} else {
sublists = partition list into k about equally big sublists
for (sublist : sublists) {
sort(sublist);
}
merge(sublists);
}
}
void merge(List[] sortedsublists) {
keep a pointer in each sublist, which initially points to its first element
do {
find the pointer pointing at the smallest element
add the element it points to to the result list
advance that pointer
} until all pointers have reached the end of their sublist
return the result list
}
To efficiently find the "smallest" pointer, you might employ a PriorityQueue.
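For illustration, a rough sketch of that merge step in Java. Each run is modeled here as an Iterator<Integer> over a sorted sequence (a made-up abstraction for this sketch, not code from the question), and the PriorityQueue always yields the run whose current head is smallest:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

    // one entry per run: the run's current head value plus its iterator
    private static final class Entry implements Comparable<Entry> {
        final int value;
        final Iterator<Integer> rest;

        Entry(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }

        public int compareTo(Entry o) {
            return Integer.compare(value, o.value);
        }
    }

    // merges k sorted runs in O(log k) per element
    public static List<Integer> merge(List<Iterator<Integer>> runs) {
        PriorityQueue<Entry> pq = new PriorityQueue<Entry>();
        for (Iterator<Integer> run : runs) {
            if (run.hasNext()) {
                pq.add(new Entry(run.next(), run));
            }
        }
        List<Integer> result = new ArrayList<Integer>();
        while (!pq.isEmpty()) {
            Entry smallest = pq.poll();
            result.add(smallest.value);
            if (smallest.rest.hasNext()) {
                pq.add(new Entry(smallest.rest.next(), smallest.rest));
            }
        }
        return result;
    }
}

In the real external sort the merged output would of course be streamed to a file rather than collected in a list.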
I would use memory mapped files. It can be as much as 10x faster than using this type of IO. I suspect it will be much faster in this case as well. The mapped buffers use virtual memory rather than heap space to store data and can be larger than your available physical memory.
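A minimal sketch of that approach (assuming a file of big-endian 32-bit ints; the summing loop is just a stand-in for real processing):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            // map the whole file into virtual memory
            // (a single mapping is limited to 2 GB)
            MappedByteBuffer map =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            IntBuffer ints = map.asIntBuffer();
            long sum = 0;
            while (ints.hasRemaining()) {
                sum += ints.get(); // reads ints straight from the mapping
            }
            System.out.println("sum = " + sum);
        }
    }
}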
We have implemented a public domain external sort in Java:
http://code.google.com/p/externalsortinginjava/
It might be faster than yours. We use strings and not integers, but you could easily modify our code by substituting integers for strings (the code was made hackable by design). At the very least, you can compare with our design.
Looking at your code, it seems like you are reading the data in units of integers, so I/O will be a bottleneck, I would guess. With external-memory algorithms, you want to read and write blocks of data, especially in Java.
You are sorting integers, so you should check out radix sort. The core idea of radix sort is that you can sort n-byte integers with n passes through the data using radix 256.
You can combine this with merge sort theory.
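For illustration, a generic LSD radix-256 sort for an int[] (a textbook sketch, not code from the question; the sign-bit flip in the final pass makes negative values order correctly):

public class RadixSort {

    // sorts 32-bit ints with four counting-sort passes, one byte per pass
    public static void sort(int[] a) {
        int[] src = a;
        int[] dst = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int x : src) {
                count[bucket(x, shift) + 1]++;
            }
            for (int i = 0; i < 256; i++) {
                count[i + 1] += count[i]; // prefix sums give bucket start offsets
            }
            for (int x : src) {
                dst[count[bucket(x, shift)]++] = x;
            }
            int[] t = src; src = dst; dst = t; // swap buffers between passes
        }
        // four passes make an even number of swaps, so the data ends up back in a
    }

    // flip the sign bit in the most significant byte so that
    // negative numbers sort before positive ones
    private static int bucket(int x, int shift) {
        int b = (x >>> shift) & 0xFF;
        return (shift == 24) ? b ^ 0x80 : b;
    }
}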
If I have a file containing 4000 bytes, can I have 4 threads read from the file at the same time, with each thread accessing a different section of the file?
Thread 1 reads bytes 0-999, thread 2 reads bytes 1000-1999, and so on.
Please give an example in Java.
The file is very very small, and will be very fast to load. What I would do is create a thread-safe data class that loads the data. Each processing thread can then request an ID from the data class and receive a unique one with a guarantee of no other thread sending the same ID to your remote service.
In this manner, you remove the need to have all the threads accessing the file, and trying to figure out who has read and sent what ID.
RandomAccessFile or FileChannel will let you access bytes within a file. For waiting until your threads finish, look at CyclicBarrier or CountDownLatch.
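To make that concrete, here is a hedged sketch (the file name, the 4000-byte/4-thread split, and the use of plain Threads are all placeholder choices): each thread performs a positional FileChannel read on its own slice, and a CountDownLatch lets the caller wait for all of them:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.CountDownLatch;

public class SlicedRead {
    public static void main(String[] args) throws Exception {
        final int threads = 4;
        final int chunk = 1000; // 4000-byte file split four ways
        final byte[][] slices = new byte[threads][];
        final CountDownLatch done = new CountDownLatch(threads);
        try (RandomAccessFile raf = new RandomAccessFile("data.bin", "r");
             FileChannel channel = raf.getChannel()) {
            for (int t = 0; t < threads; t++) {
                final int id = t;
                new Thread(() -> {
                    try {
                        ByteBuffer buf = ByteBuffer.allocate(chunk);
                        // positional read: leaves the channel's shared
                        // position untouched, so threads don't interfere
                        channel.read(buf, (long) id * chunk);
                        slices[id] = buf.array();
                    } catch (IOException e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                }).start();
            }
            done.await(); // the latch also makes the slices safely visible here
        }
        System.out.println("read " + threads + " slices");
    }
}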
Given this comment by the question's author:
I want to run a batch file, in which it contains thousands of unique IDs. Each unique ID will be sent as a request to the remote system. So I want to send requests in parallel using threads to speed up the process. But if I use multiple threads, then all the threads read the complete data and duplicate requests are sent. So I want to avoid these duplicate requests.
I would suggest that you load the file into memory as some kind of data structure - an array of ids perhaps. Have the threads consume ids from the array. Be sure to access the array in a synchronized manner.
If the file is larger than you'd like to load in memory or the file is constantly being appended to then create a single producer thread that watches and reads from the file and inserts ids into a queue type structure.
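A small sketch of that producer/consumer variant, assuming one ID per line in a hypothetical ids.txt and a stand-in sendRequest method:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IdDispatcher {
    private static final String POISON = ""; // marker meaning "no more IDs"

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(1000);
        int consumers = 4;

        // consumers: each takes IDs off the queue until it sees the marker,
        // so every ID is handled exactly once
        for (int i = 0; i < consumers; i++) {
            new Thread(() -> {
                try {
                    String id;
                    while (!(id = queue.take()).equals(POISON)) {
                        sendRequest(id);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        // single producer: only this thread reads the file
        try (BufferedReader br = new BufferedReader(new FileReader("ids.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                queue.put(line);
            }
        }
        for (int i = 0; i < consumers; i++) {
            queue.put(POISON); // one marker per consumer shuts them all down
        }
    }

    private static void sendRequest(String id) {
        System.out.println("sending " + id); // stand-in for the real remote call
    }
}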
Sorry, here is the working code. I've now tested it myself :-)
package readfilemultithreading;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class MultiThreadFileReader
{
public MultiThreadFileReader(File fileToRead, int numThreads, int numBytesForEachThread)
{
this.file = fileToRead;
this.numThreads = numThreads;
this.bytesForEachThread = numBytesForEachThread;
this.bytes = new byte[(int) file.length()];
}
private File file;
private int numThreads;
private byte[] bytes;
int bytesForEachThread;
public byte[] getResult()
{
return bytes;
}
public void startReading()
{
List<ReaderThread> readers = new ArrayList<ReaderThread>();
for (int i = 0; i < numThreads; i ++) {
ReaderThread rt = new ReaderThread(i * bytesForEachThread, bytesForEachThread, file);
readers.add(rt);
rt.start();
}
// Each Thread is Reading....
int resultIndex = 0;
for (int i = 0; i < numThreads; i++) {
ReaderThread thread = readers.get(i);
try {
    // join() both waits for the thread and establishes the memory
    // visibility needed to safely read its rb and len fields
    thread.join();
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
for (int b = 0; b < thread.len; b++, resultIndex++)
{
bytes[resultIndex] = thread.rb[b];
}
}
}
private class ReaderThread extends Thread
{
public ReaderThread(int off, int len, File f)
{
this.off = off;
this.len = len;
this.f = f;
}
public int off, len;
private File f;
public byte[] rb;
public boolean done = false;
@Override
public void run()
{
done = false;
rb = readPiece();
done = true;
}
private byte[] readPiece()
{
try {
BufferedInputStream reader = new BufferedInputStream(new FileInputStream(f));
if (off + len > f.length()) {
len = (int) (f.length() - off);
if (len < 0)
{
len = 0;
}
System.out.println("Correct Length to: " + len);
}
if (len == 0)
{
System.out.println("No bytes to read");
return new byte[0];
}
byte[] b = new byte[len];
System.out.println("Length: " + len);
setName("Thread for " + len + " bytes");
// skip() may skip fewer bytes than requested, so loop until done
long skipped = 0;
while (skipped < off) {
    skipped += reader.skip(off - skipped);
}
for (int i = off, index = 0; i < len + off; i++, index++)
{
b[index] = (byte) reader.read();
}
reader.close();
return b;
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
}
Here is usage code:
package readfilemultithreading;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
public class Main
{
public static void main(String[] args)
{
new Main().start(args);
}
public void start(String[] args)
{
try {
MultiThreadFileReader reader = new MultiThreadFileReader(new File("C:\\Users\\Martijn\\Documents\\Test.txt"), 4, 2500);
reader.startReading();
byte[] result = reader.getResult();
FileOutputStream stream = new FileOutputStream(new File("C:\\Users\\Martijn\\Documents\\Test_cop.txt"));
for (byte b : result) {
System.out.println(b);
stream.write((int) b);
}
stream.close();
} catch (IOException ex) {
System.err.println("Reading failed");
}
}
}
Can I get my +1 back now? ;-)
You must somehow synchronize read access to the file. I suggest using an ExecutorService:
Your main thread reads the IDs from the file and passes them to the executor service one at a time. The executor will run N threads to process N IDs concurrently.
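A minimal sketch of that arrangement (the file name and the remote call are placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorDispatch {
    public static void main(String[] args) throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // only the main thread touches the file, so the read
        // access is trivially synchronized
        try (BufferedReader br = new BufferedReader(new FileReader("ids.txt"))) {
            String id;
            while ((id = br.readLine()) != null) {
                final String current = id;
                pool.submit(() -> send(current)); // each ID is submitted exactly once
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void send(String id) {
        System.out.println("request for " + id); // stand-in for the remote call
    }
}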