External Sort GC Overhead

External Sort GC Overhead - java

I am writing an external sort to sort a large 2 gig file on disk
I first split the file into chunks that fit into memory and sort each one individually, and rewrite them back to disk. However, during this process I am getting GC Memory overhead exception in String.Split method in function geModel. Below is my code.
private static List<Model> getModel(String file, long lineCount, final long readSize) {
List<Model> modelList = new ArrayList<Model>();
long read = 0L;
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
//Skip lineCount lines;
for (long i = 0; i < lineCount; i++)
br.readLine();
String line = "";
while ((line = br.readLine()) != null) {
read += line.length();
if (read > readSize)
break;
String[] split = line.split("\t");
String curvature = (split.length >= 7) ? split[6] : "";
String heading = (split.length >= 8) ? split[7] : "";
String slope = (split.length == 9) ? split[8] : "";
modelList.add(new Model(split[0], split[1], split[2], split[3], split[4], split[5], curvature, heading, slope));
}
br.close();
return modelList;
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
private static void split(String inputDir, String inputFile, String outputDir, final long readSize) throws IOException {
long lineCount = 0L;
int count = 0;
int writeSize = 100000;
System.out.println("Reading...");
List<Model> curModel = getModel(inputDir + inputFile, lineCount, readSize);
System.out.println("Reading Complete");
while (curModel.size() > 0) {
lineCount += curModel.size();
System.out.println("Sorting...");
curModel.sort(new Comparator<Model>() {
#Override
public int compare(Model arg0, Model arg1) {
return arg0.compareTo(arg1);
}
});
System.out.println("Sorting Complete");
System.out.println("Writing...");
writeFile(curModel, outputDir + inputFile + count, writeSize);
System.out.println("Writing Complete");
count++;
System.out.println("Reading...");
curModel = getModel(inputDir + inputFile, lineCount, readSize);
System.out.println("Reading Complete");
}
}
It makes it through one pass and sorts ~250 MB of data from the file. However, on the second pass it throws GC Memory Overhead exception on String.split function. I do not want to use external libraries, I want to learn this on my own. The sorting and splitting works, but I cannot understand why the GC is throwing memory overhead exception on string.split function.

I'm not sure just what is causing the exception--manipulating large strings, in particular cutting and splicing them, is a huge memory/gc issue. StringBuilder can help, but in general you may have to take more direct control over the process.
To figure out more you probably want to run a profiler with your app. There is one built into the JDK (VisualVM) that is functional. It will show you what objects Java is holding on to... because of the nature of strings it's possible that you are holding onto a lot of redundant character array data.
Personally I'd try a completely different approach, for instance, what if you sorted the entire file in memory by loading the first 10(?) sortable characters of each line into an array along with the file location they were read from, sort the array and resolve any ties by loading more (the rest?) of those lines that were identical.
If you did something like that then you should be able to seek to each line and copy it to the destination file without ever caching more than one line in memory and only reading through the source file twice.
I suppose you could manufacture a file that would fail if all the strings were identical until the last couple characters, so if that ever became an issue you might have to be able to flush the full strings you've cached (there is a java memory reference object made to automatically do this for you, it's not particularly hard)

Based on how I read your implementation readSize only makes sure that you get first block X size. You are not reading 2nd block or 3rd block. Hence its not actually complete external sort.
read += line.length();
if (read > readSize)
break;
String[] split = line.split("\t");
even though you are splitting each line you seem to be using only first 9 characters. And then checking no of words in each line. This means your data is not uniform.

Related

How sort N files

Following this answer -->
How do I sort very large files
I need only the Merge function on N already sorted files on disk ,
I want to sort them into one Big file my limitation is the memory Not more than K lines in the memory (K < N) so i cannot fetch all them and then sort, preferred with java
so far I Tried as the code below , but I need a good way to iterate over all N of files line by line (not more than K LINES in memory) + store to disk the sorted final file
public void run() {
try {
System.out.println(file1 + " Started Merging " + file2 );
FileReader fileReader1 = new FileReader(file1);
FileReader fileReader2 = new FileReader(file2);
//......TODO with N ?? ......
FileWriter writer = new FileWriter(file3);
BufferedReader bufferedReader1 = new BufferedReader(fileReader1);
BufferedReader bufferedReader2 = new BufferedReader(fileReader2);
String line1 = bufferedReader1.readLine();
String line2 = bufferedReader2.readLine();
//Merge 2 files based on which string is greater.
while (line1 != null || line2 != null) {
if (line1 == null || (line2 != null && line1.compareTo(line2) > 0)) {
writer.write(line2 + "\r\n");
line2 = bufferedReader2.readLine();
} else {
writer.write(line1 + "\r\n");
line1 = bufferedReader1.readLine();
}
}
System.out.println(file1 + " Done Merging " + file2 );
new File(file1).delete();
new File(file2).delete();
writer.close();
} catch (Exception e) {
System.out.println(e);
}
}
regards,

You can use something like this
public static void mergeFiles(String target, String... input) throws IOException {
String lineBreak = System.getProperty("line.separator");
PriorityQueue<Map.Entry<String,BufferedReader>> lines
= new PriorityQueue<>(Map.Entry.comparingByKey());
try(FileWriter fw = new FileWriter(target)) {
String header = null;
for(String file: input) {
BufferedReader br = new BufferedReader(new FileReader(file));
String line = br.readLine();
if(line == null) br.close();
else {
if(header == null) fw.append(header = line).write(lineBreak);
line = br.readLine();
if(line != null) lines.add(new AbstractMap.SimpleImmutableEntry<>(line, br));
else br.close();
}
}
for(;;) {
Map.Entry<String, BufferedReader> next = lines.poll();
if(next == null) break;
fw.append(next.getKey()).write(lineBreak);
final BufferedReader br = next.getValue();
String line = br.readLine();
if(line != null) lines.add(new AbstractMap.SimpleImmutableEntry<>(line, br));
else br.close();
}
}
catch(Throwable t) {
for(Map.Entry<String,BufferedReader> br: lines) try {
br.getValue().close();
} catch(Throwable next) {
if(t != next) t.addSuppressed(next);
}
}
}
Note that this code, unlike the code in your question, handles the header line. Like the original code, it will delete the input lines. If that’s not intended, you can remove the DELETE_ON_CLOSE option and simplify the entire reader construction to
BufferedReader br = new BufferedReader(new FileReader(file));
It has exactly as much lines in memory, as you have files.
While in principle, it is possible to hold less line strings in memory, to re-read them when needed, it would be a performance disaster for a questionable little saving. E.g. you have already N strings in memory when calling this method, due to the fact that you have N file names.
However, when you want to reduce the number of lines held at the same time, at all costs, you can simply use the method shown in your question. Merge the first two files into a temporary file, merge that temporary file with the third to another temporary file, and so on, until merging the temporary file with the last input file to the final result. Then you have at most two line strings in memory (K == 2), saving less memory than the operating system will use for buffering, trying to mitigate the horrible performance of this approach.
Likewise, you can use the method shown above to merge K files into a temporary file, then merge the temporary file with the next K-1 file, and so on, until merging the temporary file with the remaining K-1 or less files to the final result, to have a memory consumption scaling with K < N. This approach allows to tune K to have a reasonable ratio to N, to trade memory for speed. I think, in most practical cases, K == N will work just fine.

#Holger gave a nice answer assuming that K>=N.
You can extend it to the K<N case by using mark(int) and reset() methods of the BufferedInputStream.
The parameter of mark is how many bytes a single line can have.
The idea is as follows:
Instead of putting all the N lines in the TreeMap, you can only have K of them. Whenever you put a new line into the set and it is already 'full' you evict the smallest one from it. Additionally, you reset the stream from which it came. So when you will read it again the same data can pop up.
You have to keep track of the maximum line not kept in the TreeSet, lets call it the lower bound. Once there are no elements in the TreeSet greater than the maintained lower bound, you scan all the files once again and repopulate the set.
I'm not sure if this approach is optimal, but should be ok.
Moreover, you have to be aware that BufferedInputStream has an internal buffer at least the size of a single line, so that will consume a lot of your memory, perhaps it would be better to maintain buffering on your own.

Faster way of searching a String in a .txt

Hey guys i have written this code for searching a string in a txt file.
Is it possible to optimize the code so that it searches for the string in fastest manner possible.
Assuming the text file would be a large one (500MB - 1GB)
I dont want to use pattern Matchers.
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class StringFinder {
public static void main(String[] args)
{
double count = 0,countBuffer=0,countLine=0;
String lineNumber = "";
String filePath = "C:\\Users\\allen\\Desktop\\TestText.txt";
BufferedReader br;
String inputSearch = "are";
String line = "";
try {
br = new BufferedReader(new FileReader(filePath));
try {
while((line = br.readLine()) != null)
{
countLine++;
//System.out.println(line);
String[] words = line.split(" ");
for (String word : words) {
if (word.equals(inputSearch)) {
count++;
countBuffer++;
}
}
if(countBuffer > 0)
{
countBuffer = 0;
lineNumber += countLine + ",";
}
}
br.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Times found at--"+count);
System.out.println("Word found at--"+lineNumber);
}
}

There are fast string search algorithms, but a big part of the time will go into reading the file from external storage. If you can index the file ahead of time, you can save reading and scanning the entire file. If you can't, perhaps you can at least avoid reading the file from external storage, e.g. if the file came in from the network, then search it before or instead of writing it to storage.

Try Matcher.find, splitting is slow since it creates a lot of objects

If you don't want to use Matcher.find for some reason, then at least go for using indexOf.
You can check on the whole line without breaking up the line into a lot of String Objects which then need iterating over.
int index = line.indexOf (inputSearch);
while (index != -1)
{
count++;
countBuffer++;
index = line.indexOf (inputSearch, index+1);
}

For a plain string, i.e., not a regex, and if you can't index the file first using some sophisticated engine (Lucene or Solr come to mind for such a large file) or database (?), you should check out the Rabin-Karp algorithm. It's a very clever algorithm that finds a simple string match in O(n+m) where n is the length of the text and m the length of the search string.

Your bottleneck may not be the time it takes to parse each line at all, but reading the actual file. Disk IO is at least an order of magnitude slower than iterating thru a char array. But you really won't know until you profile your code. Fire up VisualVM and use it to figure out where you are spending the most time. If you don't, you're just guessing.

Java - Comparing Lists

I have a program written in Java that reads in a file that is simply a list of strings into a LinkedHashMap. Then it takes a second file which consists of two columns and for each row see if the right-hand term matches one of the terms from the HashMap. The problem is it's running very slow.
Here's a code snippet, this is where it compares the second file to the HashMap terms:
String output = "";
infile = new File("2columns.txt");
try {
in = new BufferedReader(new FileReader(infile));
} catch (FileNotFoundException e2) {
System.out.println("2columns.txt" + " not found");
}
try {
fw = new FileWriter("newfile.txt");
out = new PrintWriter(fw);
try {
String str = in.readLine();
while (str != null) {
StringTokenizer strtok = new StringTokenizer(str);
strtok.nextToken();
String strDest = strtok.nextToken();
System.out.println("Term = " + strDest);
//if (uniqList.contains(strDest)) {
if (uniqMap.get(strDest) != null) {
output += str + "\r\n";
System.out.println("Matched! Added: " + str);
}
str = in.readLine();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
out.print(output);
I got a performance boost from switching from an ArrayList initially to the LinkedHashMap but it's still taking a long time. What can I do to speed this up?

Your major bottleneck may be that you are recreating a StringTokenizer for every iteration of the while loop. Moving this outside the loop could help considerably. Minor speed ups can be obtained by moving the String definition outside the while loop.
The biggest speedup will probably come from using a StreamTokenizer. See below for an example.
Oh and use a HashMap instead of a LinkedHashMap as #Doug Ayers says in the above comments :)
And # MДΓΓ БДLL's suggestion of profiling your code is bang on. checkout this Eclipse Profiling Example
Reader r = new BufferedReader(new FileReader(infile));
StreamTokenizer strtok = new StreamTokenizer(r);
String strDest ="";
while (strtok.nextToken() != StreamTokenizer.TT_EOF) {
strDest=strtok.sval; //strtok.toString() might be safer, but slower
strtok.nextToken();
System.out.println("Term = " + strtok.sval);
//if (uniqList.contains(strDest)) {
if (uniqMap.get(strtok.sval) != null) {
output += str + "\r\n";
System.out.println("Matched! Added: " + strDest +" "+ strtok.sval);
}
str = in.readLine();
}
One final thought is (and I'm not confident on this one) that writing to a file may also be faster if you do it in one go at the end. i.e. store all your matches in a buffer of some sort and do the writing in one hit.

StringTokenizer is a legacy class. The recommended replacement is the string "split" method.
Some of the trys might be consolidated. You can have multiple catches for a single try.
The suggestion to use HashMap instead of LinkedHashMap is a good one. Performance for gets and puts in a smidgeon faster since there is no need to maintain a list structure.
The "output" string should be a StringBuilder rather than a String. That could help a lot.

BufferedReader: read multiple lines into a single string

I'm reading numbers from a txt file using BufferedReader for analysis. The way I'm going about this now is- reading a line using .readline, splitting this string into an array of strings using .split
public InputFile () {
fileIn = null;
//stuff here
fileIn = new FileReader((filename + ".txt"));
buffIn = new BufferedReader(fileIn);
return;
//stuff here
}
public String ReadBigStringIn() {
String line = null;
try { line = buffIn.readLine(); }
catch(IOException e){};
return line;
}
public ProcessMain() {
initComponents();
String[] stringArray;
String line;
try {
InputFile stringIn = new InputFile();
line = stringIn.ReadBigStringIn();
stringArray = line.split("[^0-9.+Ee-]+");
// analysis etc.
}
}
This works fine, but what if the txt file has multiple lines of text? Is there a way to output a single long string, or perhaps another way of doing it? Maybe use while(buffIn.readline != null) {}? Not sure how to implement this.
Ideas appreciated,
thanks.

You are right, a loop would be needed here.
The usual idiom (using only plain Java) is something like this:
public String ReadBigStringIn(BufferedReader buffIn) throws IOException {
StringBuilder everything = new StringBuilder();
String line;
while( (line = buffIn.readLine()) != null) {
everything.append(line);
}
return everything.toString();
}
This removes the line breaks - if you want to retain them, don't use the readLine() method, but simply read into a char[] instead (and append this to your StringBuilder).
Please note that this loop will run until the stream ends (and will block if it doesn't end), so if you need a different condition to finish the loop, implement it in there.

I would strongly advice using library here but since Java 8 you can do this also using streams.
try (InputStreamReader in = new InputStreamReader(System.in);
BufferedReader buffer = new BufferedReader(in)) {
final String fileAsText = buffer.lines().collect(Collectors.joining());
System.out.println(fileAsText);
} catch (Exception e) {
e.printStackTrace();
}
You can notice also that it is pretty effective as joining is using StringBuilder internally.

If you just want to read the entirety of a file into a string, I suggest you use Guava's Files class:
String text = Files.toString("filename.txt", Charsets.UTF_8);
Of course, that's assuming you want to maintain the linebreaks. If you want to remove the linebreaks, you could either load it that way and then use String.replace, or you could use Guava again:
List<String> lines = Files.readLines(new File("filename.txt"), Charsets.UTF_8);
String joined = Joiner.on("").join(lines);

Sounds like you want Apache IO FileUtils
String text = FileUtils.readStringFromFile(new File(filename + ".txt"));
String[] stringArray = text.split("[^0-9.+Ee-]+");

If you create a StringBuilder, then you can append every line to it, and return the String using toString() at the end.
You can replace your ReadBigStringIn() with
public String ReadBigStringIn() {
StringBuilder b = new StringBuilder();
try {
String line = buffIn.readLine();
while (line != null) {
b.append(line);
line = buffIn.readLine();
}
}
catch(IOException e){};
return b.toString();
}

You have a file containing doubles. Looks like you have more than one number per line, and may have multiple lines.
Simplest thing to do is read lines in a while loop.
You could return null from your ReadBigStringIn method when last line is reached and terminate your loop there.
But more normal would be to create and use the reader in one method. Perhaps you could change to a method which reads the file and returns an array or list of doubles.
BTW, could you simply split your strings by whitespace?
Reading a whole file into a single String may suit your particular case, but be aware that it could cause a memory explosion if your file was very large. Streaming approach is generally safer for such i/o.

This creates a long string, every line is seprateted from string " " (one space):
public String ReadBigStringIn() {
StringBuffer line = new StringBuffer();
try {
while(buffIn.ready()) {
line.append(" " + buffIn.readLine());
} catch(IOException e){
e.printStackTrace();
}
return line.toString();
}

Java: How to read a text file

I want to read a text file containing space separated values. Values are integers.
How can I read it and put it in an array list?
Here is an example of contents of the text file:
1 62 4 55 5 6 77
I want to have it in an arraylist as [1, 62, 4, 55, 5, 6, 77]. How can I do it in Java?

You can use Files#readAllLines() to get all lines of a text file into a List<String>.
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
// ...
}
Tutorial: Basic I/O > File I/O > Reading, Writing and Creating text files
You can use String#split() to split a String in parts based on a regular expression.
for (String part : line.split("\\s+")) {
// ...
}
Tutorial: Numbers and Strings > Strings > Manipulating Characters in a String
You can use Integer#valueOf() to convert a String into an Integer.
Integer i = Integer.valueOf(part);
Tutorial: Numbers and Strings > Strings > Converting between Numbers and Strings
You can use List#add() to add an element to a List.
numbers.add(i);
Tutorial: Interfaces > The List Interface
So, in a nutshell (assuming that the file doesn't have empty lines nor trailing/leading whitespace).
List<Integer> numbers = new ArrayList<>();
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
for (String part : line.split("\\s+")) {
Integer i = Integer.valueOf(part);
numbers.add(i);
}
}
If you happen to be at Java 8 already, then you can even use Stream API for this, starting with Files#lines().
List<Integer> numbers = Files.lines(Paths.get("/path/to/test.txt"))
.map(line -> line.split("\\s+")).flatMap(Arrays::stream)
.map(Integer::valueOf)
.collect(Collectors.toList());
Tutorial: Processing data with Java 8 streams

Java 1.5 introduced the Scanner class for handling input from file and streams.
It is used for getting integers from a file and would look something like this:
List<Integer> integers = new ArrayList<Integer>();
Scanner fileScanner = new Scanner(new File("c:\\file.txt"));
while (fileScanner.hasNextInt()){
integers.add(fileScanner.nextInt());
}
Check the API though. There are many more options for dealing with different types of input sources, differing delimiters, and differing data types.

This example code shows you how to read file in Java.
import java.io.*;
/**
* This example code shows you how to read file in Java
*
* IN MY CASE RAILWAY IS MY TEXT FILE WHICH I WANT TO DISPLAY YOU CHANGE WITH YOUR OWN
*/
public class ReadFileExample
{
public static void main(String[] args)
{
System.out.println("Reading File from Java code");
//Name of the file
String fileName="RAILWAY.txt";
try{
//Create object of FileReader
FileReader inputFile = new FileReader(fileName);
//Instantiate the BufferedReader Class
BufferedReader bufferReader = new BufferedReader(inputFile);
//Variable to hold the one line data
String line;
// Read file line by line and print on the console
while ((line = bufferReader.readLine()) != null) {
System.out.println(line);
}
//Close the buffer reader
bufferReader.close();
}catch(Exception e){
System.out.println("Error while reading file line by line:" + e.getMessage());
}
}
}

Look at this example, and try to do your own:
import java.io.*;
public class ReadFile {
public static void main(String[] args){
String string = "";
String file = "textFile.txt";
// Reading
try{
InputStream ips = new FileInputStream(file);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
String line;
while ((line = br.readLine()) != null){
System.out.println(line);
string += line + "\n";
}
br.close();
}
catch (Exception e){
System.out.println(e.toString());
}
// Writing
try {
FileWriter fw = new FileWriter (file);
BufferedWriter bw = new BufferedWriter (fw);
PrintWriter fileOut = new PrintWriter (bw);
fileOut.println (string+"\n test of read and write !!");
fileOut.close();
System.out.println("the file " + file + " is created!");
}
catch (Exception e){
System.out.println(e.toString());
}
}
}

Just for fun, here's what I'd probably do in a real project, where I'm already using all my favourite libraries (in this case Guava, formerly known as Google Collections).
String text = Files.toString(new File("textfile.txt"), Charsets.UTF_8);
List<Integer> list = Lists.newArrayList();
for (String s : text.split("\\s")) {
list.add(Integer.valueOf(s));
}
Benefit: Not much own code to maintain (contrast with e.g. this). Edit: Although it is worth noting that in this case tschaible's Scanner solution doesn't have any more code!
Drawback: you obviously may not want to add new library dependencies just for this. (Then again, you'd be silly not to make use of Guava in your projects. ;-)

Use Apache Commons (IO and Lang) for simple/common things like this.
Imports:
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.ArrayUtils;
Code:
String contents = FileUtils.readFileToString(new File("path/to/your/file.txt"));
String[] array = ArrayUtils.toArray(contents.split(" "));
Done.

Using Java 7 to read files with NIO.2
Import these packages:
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
This is the process to read a file:
Path file = Paths.get("C:\\Java\\file.txt");
if(Files.exists(file) && Files.isReadable(file)) {
try {
// File reader
BufferedReader reader = Files.newBufferedReader(file, Charset.defaultCharset());
String line;
// read each line
while((line = reader.readLine()) != null) {
System.out.println(line);
// tokenize each number
StringTokenizer tokenizer = new StringTokenizer(line, " ");
while (tokenizer.hasMoreElements()) {
// parse each integer in file
int element = Integer.parseInt(tokenizer.nextToken());
}
}
reader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
To read all lines of a file at once:
Path file = Paths.get("C:\\Java\\file.txt");
List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);

All the answers so far given involve reading the file line by line, taking the line in as a String, and then processing the String.
There is no question that this is the easiest approach to understand, and if the file is fairly short (say, tens of thousands of lines), it'll also be acceptable in terms of efficiency. But if the file is long, it's a very inefficient way to do it, for two reasons:
Every character gets processed twice, once in constructing the String, and once in processing it.
The garbage collector will not be your friend if there are lots of lines in the file. You're constructing a new String for each line, and then throwing it away when you move to the next line. The garbage collector will eventually have to dispose of all these String objects that you don't want any more. Someone's got to clean up after you.
If you care about speed, you are much better off reading a block of data and then processing it byte by byte rather than line by line. Every time you come to the end of a number, you add it to the List you're building.
It will come out something like this:
private List<Integer> readIntegers(File file) throws IOException {
List<Integer> result = new ArrayList<>();
RandomAccessFile raf = new RandomAccessFile(file, "r");
byte buf[] = new byte[16 * 1024];
final FileChannel ch = raf.getChannel();
int fileLength = (int) ch.size();
final MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0,
fileLength);
int acc = 0;
while (mb.hasRemaining()) {
int len = Math.min(mb.remaining(), buf.length);
mb.get(buf, 0, len);
for (int i = 0; i < len; i++)
if ((buf[i] >= 48) && (buf[i] <= 57))
acc = acc * 10 + buf[i] - 48;
else {
result.add(acc);
acc = 0;
}
}
ch.close();
raf.close();
return result;
}
The code above assumes that this is ASCII (though it could be easily tweaked for other encodings), and that anything that isn't a digit (in particular, a space or a newline) represents a boundary between digits. It also assumes that the file ends with a non-digit (in practice, that the last line ends with a newline), though, again, it could be tweaked to deal with the case where it doesn't.
It's much, much faster than any of the String-based approaches also given as answers to this question. There is a detailed investigation of a very similar issue in this question. You'll see there that there's the possibility of improving it still further if you want to go down the multi-threaded line.

read the file and then do whatever you want
java8
Files.lines(Paths.get("c://lines.txt")).collect(Collectors.toList());

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

External Sort GC Overhead - java

Related

How sort N files

Faster way of searching a String in a .txt

Java - Comparing Lists

BufferedReader: read multiple lines into a single string

Java: How to read a text file

Categories

Resources