Java: Create large text file with random numbers?

I'm trying to create a text file with random numbers on each line.
I have managed to do this, but for some reason the largest file I can generate is 768 MB, and I need files up to 15 GB.
Any ideas why this is happening? My guess is some sort of size limitation or memory issue.
This is the code I have written:
public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException {
//Size in Gbs of my file that I want
double wantedSize = Double.parseDouble("1.5");
Random random = new Random();
PrintWriter writer = new PrintWriter("AvgNumbers.txt", "UTF-8");
boolean keepGoing = true;
int counter = 0;
while(keepGoing){
counter++;
StringBuilder stringValue = new StringBuilder();
for (int i = 0; i < 100; i++) {
double value = 0.1 + (100.0 - 0.1) * random.nextDouble();
stringValue.append(value);
stringValue.append(" ");
}
writer.println(stringValue.toString());
//Check to see if the current size is what we want it to be
if (counter == 10000) {
File file = new File("AvgNumbers.txt");
double currentSize = file.length();
double gbs = (currentSize/1000000000.00);
if(gbs > wantedSize){
keepGoing=false;
writer.close();
}else{
writer.flush();
counter = 0;
}
}
}
}

This is how I would code it. It produces the size you want as well.
public static void main(String... ignored) throws FileNotFoundException, UnsupportedEncodingException {
//Size in Gbs of my file that I want
double wantedSize = Double.parseDouble(System.getProperty("size", "1.5"));
Random random = new Random();
File file = new File("AvgNumbers.txt");
long start = System.currentTimeMillis();
PrintWriter writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF-8")), false);
int counter = 0;
while (true) {
String sep = "";
for (int i = 0; i < 100; i++) {
int number = random.nextInt(1000) + 1;
writer.print(sep);
writer.print(number / 1e3);
sep = " ";
}
writer.println();
//Check to see if the current size is what we want it to be
if (++counter == 20000) {
System.out.printf("Size: %.3f GB%n", file.length() / 1e9);
if (file.length() >= wantedSize * 1e9) {
writer.close();
break;
} else {
counter = 0;
}
}
}
long time = System.currentTimeMillis() - start;
System.out.printf("Took %.1f seconds to create a file of %.3f GB", time / 1e3, file.length() / 1e9);
}
When it finishes, it prints:
Took 58.3 seconds to create a file of 1.508 GB
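The code above polls file.length() every 20000 lines, and because the writer is buffered, the on-disk length can lag slightly behind what has been written. A minimal sketch of an alternative is to count the bytes yourself as you write them. The class name RandomNumberFile and the method writeRandomLines are made up for illustration, and the byte count relies on the assumption that every line is pure ASCII (one char is one byte in UTF-8).

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical helper class, not part of the original answer.
class RandomNumberFile {
    /**
     * Writes lines of 100 random numbers until at least targetBytes bytes
     * have been produced, and returns the number of bytes written.
     */
    static long writeRandomLines(OutputStream out, long targetBytes, Random random) throws IOException {
        long written = 0;
        String newline = System.lineSeparator();
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            StringBuilder line = new StringBuilder();
            while (written < targetBytes) {
                line.setLength(0); // reuse the same builder for every line
                for (int i = 0; i < 100; i++) {
                    if (i > 0) {
                        line.append(' ');
                    }
                    line.append((random.nextInt(1000) + 1) / 1e3);
                }
                line.append(newline);
                writer.write(line.toString());
                // Assumption: the line is pure ASCII, so chars == bytes in UTF-8.
                written += line.length();
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Size in GB, taken from -Dsize=... as in the answer above (small default here).
        long target = (long) (Double.parseDouble(System.getProperty("size", "0.001")) * 1e9);
        long written = writeRandomLines(new FileOutputStream("AvgNumbers.txt"), target, new Random());
        System.out.printf("Wrote %d bytes%n", written);
    }
}
```

The loop stops after the first line that takes the total past the target, so the file overshoots by at most one line.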

If you reuse one StringBuilder across iterations, it keeps accumulating every random number string you have appended, so reset it just after each write. Note that StringBuilder has no clear() method; use setLength(0) instead.

Related

Java - Write content from one file chunk by chunk (e.g. 8 Bytes) alternately into multiple files

So I've been trying to read the content of a text file and write the content chunk by chunk alternately into e.g. 2 new files.
I already tried multiple ways to do that, but it won't work (OutputStream and FileOutputStream seem to be the most suitable).
Before that, I tried splitting the file into e.g. 3 parts and writing the first part to one file, the second part to another, and so on, which worked perfectly fine with OutputStream and FileOutputStream.
But it won't work when I want to do it alternately.
To alternate, I use a round-robin scheme, which works fine on its own.
I would be really thankful if you could show me some examples to do it!
public void splitFile(String filePath, int numberOfParts, long sizeOfParts[]) throws FileNotFoundException, IOException, SQLException {
long bytes = 8;
OutputStream partsPath[] = new OutputStream[numberOfParts];
long bytePositition[] = new long[numberOfParts];
long copy_size[] = new long[numberOfParts];
for (int i = 0; i < numberOfParts; i++) {
copy_size[i] = sizeOfParts[i];
partsPath[i] = new FileOutputStream(path); //Gets Path from my Database (works)
//System.out.println(cloudsTable.getCloudsPathsFromDatabase(i) + '\\' + name + (i + 1) + fileType);
}
InputStream file = new FileInputStream(filePath);
while (true) {
boolean done = true;
for (int i = 0; i < numberOfParts; i++) {
if (copy_size[i] > 0) {
done = false;
if (copy_size[i] > bytes) {
copy_size[i] -= bytes;
bytePositition[i] += bytes;
System.out.println("file " + i + " " + bytePositition[i]);
readWrite(file, bytePositition[i], partsPath[i]);
} else {
bytePositition[i] += copy_size[i];
System.out.println("rest file " + i + " " + bytePositition[i]);
readWrite(file, bytePositition[i], partsPath[i]);
copy_size[i] = 0;
}
}
}
if (done == true) {
break;
}
}
file.close();
for (int i = 0; i < partsPath.length; i++) {
partsPath[i].close();
}
}
private void readWrite(InputStream file, long bytes, OutputStream path) throws IOException {
byte[] buf = new byte[(int) bytes];
while (file.read(buf) != -1) {
path.write(buf);
path.flush();
}
}
What the code actually does is write the entire content of the original file into the first output file; the following files are empty.
EDIT:
To clarify, the code should write the first 8 bytes to file 1, the second 8 bytes to file 2, the third 8 bytes to file 3, the fourth 8 bytes to file 1, and so on, round robin, until file 1 is sizeOfParts[0] long, file 2 is sizeOfParts[1] long, and file 3 is sizeOfParts[2] long.

The main problem is that the readWrite() method is only supposed to copy one 8-byte block of bytes, but has a loop that makes it copy all the remaining bytes in the input file.
In addition, the code should be enhanced to use try-finally to close the files, and to correctly handle end-of-file, in case the input file is shorter than the sum of parts.
I would eliminate the readWrite() method, and consolidate the logic to prevent duplicate code, like this:
public void splitFile(String inPath, long[] sizeOfParts) throws IOException, SQLException {
final int numberOfParts = sizeOfParts.length;
String[] outPath = new String[numberOfParts];
// Gets Paths from Database here
InputStream in = null;
OutputStream[] out = new OutputStream[numberOfParts];
try {
in = new BufferedInputStream(new FileInputStream(inPath));
for (int part = 0; part < numberOfParts; part++)
out[part] = new BufferedOutputStream(new FileOutputStream(outPath[part]));
byte[] buf = new byte[8];
long[] remain = sizeOfParts.clone();
for (boolean done = false; ! done; ) {
done = true;
for (int part = 0; part < numberOfParts; part++) {
if (remain[part] > 0) {
int len = in.read(buf, 0, (int) Math.min(remain[part], buf.length));
if (len == -1) {
done = true;
break;
}
remain[part] -= len;
System.out.println("file " + part + " " + (sizeOfParts[part] - remain[part]));
out[part].write(buf, 0, len);
done = false;
}
}
}
} finally {
if (in != null)
in.close();
for (int part = 0; part < out.length; part++)
if (out[part] != null)
out[part].close();
}
}
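To see the round-robin distribution without touching the file system, here is a small in-memory sketch of the same idea. The class RoundRobinDemo and its methods are hypothetical names. Unlike the code above, it uses InputStream.readNBytes (Java 9+), which keeps reading until the requested block is full or the input ends, so a short read cannot shift the block boundaries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class RoundRobinDemo {
    /** Deals blockSize-byte blocks from in to the outputs in turn, up to sizes[i] bytes each. */
    static void split(InputStream in, OutputStream[] outs, long[] sizes, int blockSize) throws IOException {
        long[] remain = sizes.clone();
        byte[] buf = new byte[blockSize];
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int part = 0; part < outs.length; part++) {
                if (remain[part] <= 0) {
                    continue; // this part is already full
                }
                int want = (int) Math.min(remain[part], blockSize);
                int len = in.readNBytes(buf, 0, want); // Java 9+; returns 0 at end of input
                if (len == 0) {
                    return; // input exhausted before all parts were filled
                }
                outs[part].write(buf, 0, len);
                remain[part] -= len;
                progress = true;
            }
        }
    }

    /** Splits an ASCII string in memory and returns what each part received. */
    static String[] demo(String text, long[] sizes, int blockSize) throws IOException {
        ByteArrayOutputStream[] outs = new ByteArrayOutputStream[sizes.length];
        for (int i = 0; i < outs.length; i++) {
            outs[i] = new ByteArrayOutputStream();
        }
        split(new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)), outs, sizes, blockSize);
        String[] result = new String[outs.length];
        for (int i = 0; i < outs.length; i++) {
            result[i] = new String(outs[i].toByteArray(), StandardCharsets.US_ASCII);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String[] parts = demo("AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEE", new long[] {12, 16, 8}, 8);
        for (int i = 0; i < parts.length; i++) {
            System.out.println("part " + i + ": " + parts[i]);
        }
    }
}
```

With sizes 12, 16 and 8, the first part gets one full block plus a 4-byte remainder, which is why the second part's second block starts in the middle of the D run.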

Java FileChannel Vs BufferedReader - Spring Batch - Reader

We process huge files (sometimes 50 GB each). The application reads one such file and, based on the business logic, writes multiple output files (4-6).
The records in the file are of variable length, and the fields in each record are separated by a delimiter.
We went by the understanding that reading a file using FileChannel with a ByteBuffer is always better than using BufferedReader.readLine and then splitting by the delimiter.
Buffer sizes tried: 10240 (10 KB) and larger.
Commit interval: 5000, 10000, etc.
Below is how we used FileChannel to read:
Read byte by byte. Check whether the byte read is a newline character (10), which means end of line.
Check for delimiter bytes. Capture the bytes read in a byte array (initialized with a maximum field size of 350 bytes) until the delimiter bytes are encountered.
Convert the bytes read so far to a String using UTF-8 encoding, specifically new String(byteArr, 0, index, "UTF-8"), where index is the number of bytes read before the delimiter.
Reading the file this way with FileChannel took 57 minutes to process the file.
We wanted to decrease this time, so we tried BufferedReader.readLine() followed by a split on the delimiter to see how it fares.
Shockingly, the same file completed processing in only 7 minutes.
What's the catch here? Why does FileChannel take more time than a BufferedReader followed by a string split?
I was always under the assumption that the readLine and split combination would have a big performance impact.
Can anyone shed light on whether I was using FileChannel in the wrong way?
Thanks in advance. I hope I have summarized the issue properly.
Below is the sample code:
while (inputByteBuffer.hasRemaining() && (b = inputByteBuffer.get()) != 0){
boolean endOfField = false;
if (b == 10){
break;
}
else{
if (b == 94){//^
if (!inputByteBuffer.hasRemaining()){
inputByteBuffer.clear();
noOfBytes = inputFileChannel.read(inputByteBuffer);
inputByteBuffer.flip();
}
if (inputByteBuffer.hasRemaining()){
byte b2 = inputByteBuffer.get();
if (b2 == 124){//|
if (!inputByteBuffer.hasRemaining()){
inputByteBuffer.clear();
noOfBytes = inputFileChannel.read(inputByteBuffer);
inputByteBuffer.flip();
}
if (inputByteBuffer.hasRemaining()){
byte b3 = inputByteBuffer.get();
if (b3 == 94){//^
String field = new String(fieldBytes, 0, index, encoding);
if(fieldIndex == -1){
fields = new String[sizeFromAConfiguration];
}else{
fields[fieldIndex] = field;
}
fieldBytes = new byte[maxFieldSize];
endOfField = true;
fieldIndex++;
}
else{
fieldBytes = addFieldBytes(fieldBytes, b, index);
index++;
fieldBytes = addFieldBytes(fieldBytes, b2, index);
index++;
fieldBytes = addFieldBytes(fieldBytes, b3, index);
}
}
else{
endOfFile = true;
//fields.add(new String(fieldBytes, 0, index, encoding));
fields[fieldIndex] = new String(fieldBytes, 0, index, encoding);
fieldBytes = new byte[maxFieldSize];
endOfField = true;
}
}else{
fieldBytes = addFieldBytes(fieldBytes, b, index);
index++;
fieldBytes = addFieldBytes(fieldBytes, b2, index);
}
}else{
endOfFile = true;
fieldBytes = addFieldBytes(fieldBytes, b, index);
}
}
else{
fieldBytes = addFieldBytes(fieldBytes, b, index);
}
}
if (!inputByteBuffer.hasRemaining()){
inputByteBuffer.clear();
noOfBytes = inputFileChannel.read(inputByteBuffer);
inputByteBuffer.flip();
}
if (endOfField){
index = 0;
}
else{
index++;
}
}

You're causing a lot of overhead with the constant hasRemaining()/read() checks as well as the constant get() calls. It would probably be better to get() the entire buffer into an array and process that directly, only calling read() when you get to the end.
And to answer a question in comments, you should not allocate a new ByteBuffer per read. This is expensive. Keep using the same one. And NB do not use a DirectByteBuffer for this application. It is not appropriate: it's only appropriate when you want the data to stay south of the JVM/JNI boundary, e.g. when merely copying between channels.
But I think I would throw this away, or rather rewrite it, using BufferedReader.read(), rather than readLine() followed by string splits, and using much the same logic as you have here, except of course that you don't need to keep calling hasRemaining() and filling the buffer, which BufferedReader will do automatically for you.
You have to take care to store the result of read() into an int, and to check it for -1 after every read().
It isn't clear to me that you should be using a Reader at all actually, unless you know you have multibyte text. Possibly a simple BufferedInputStream would be more appropriate.
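As a rough illustration of the Reader-based approach described above, here is a minimal sketch that reads one record at a time with read(), stores the result in an int, and checks it for -1. The class DelimitedRecordReader and its method names are made up for this example, the ^|^ delimiter is taken from the question's code, and a trailing \r is deliberately not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class DelimitedRecordReader {
    private static final String DELIMITER = "^|^";

    /** Reads one record (up to '\n' or end of input) and returns its fields, or null at end of input. */
    static List<String> readRecord(Reader in) throws IOException {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean sawAnything = false;
        int c; // must be an int so that -1 (end of input) is distinguishable from a char
        while ((c = in.read()) != -1) {
            sawAnything = true;
            if (c == '\n') {
                break;
            }
            field.append((char) c);
            if (endsWithDelimiter(field)) {
                // Drop the delimiter and close the current field.
                field.setLength(field.length() - DELIMITER.length());
                fields.add(field.toString());
                field.setLength(0);
            }
        }
        if (!sawAnything) {
            return null;
        }
        fields.add(field.toString());
        return fields;
    }

    private static boolean endsWithDelimiter(StringBuilder sb) {
        int n = sb.length();
        int d = DELIMITER.length();
        if (n < d) {
            return false;
        }
        for (int i = 0; i < d; i++) {
            if (sb.charAt(n - d + i) != DELIMITER.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader here.
        try (BufferedReader in = new BufferedReader(new StringReader("a^|^b^|^c\nx^y^|^z\n"))) {
            List<String> record;
            while ((record = readRecord(in)) != null) {
                System.out.println(record);
            }
        }
    }
}
```

The BufferedReader refills its internal buffer on its own, so there is no hasRemaining()/read() bookkeeping in the parsing loop.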

While one cannot tell with certainty how a particular piece of code will behave, I would imagine the best way is to profile it, just like you did. FileChannel, while perceived to be faster, is not actually helping in your case. But this may not be because of reading from the file; it may be due to the processing you do with the content you read.
One article I would like to point out on dealing with files is
https://www.redgreencode.com/why-is-java-io-slow/
Also see the corresponding GitHub codebase:
Java IO benchmark
I would like to point out this code, which combines both worlds:
fos = new FileOutputStream(outputFile);
outFileChannel = fos.getChannel();
bufferedWriter = new BufferedWriter(Channels.newWriter(outFileChannel, "UTF-8"));
Since you are reading in your case, I would consider:
File inputFile = new File("C:\\input.txt");
FileInputStream fis = new FileInputStream(inputFile);
FileChannel inputChannel = fis.getChannel();
BufferedReader bufferedReader = new BufferedReader(Channels.newReader(inputChannel,"UTF-8"));
I would also tweak the chunk size; with Spring Batch it is always trial and error to find the sweet spot.
On a completely unrelated note, the reason you could not use BufferedReader is the doubling of characters, and I assume this happens more commonly with EBCDIC characters. I would simply run a loop like this to identify the troublemakers and eliminate them at the source.
import java.io.UnsupportedEncodingException;
public class EbcdicConvertor {
public static void main(String[] args) throws UnsupportedEncodingException {
int index = 0;
for (int i = -127; i < 128; i++) {
byte[] b = new byte[1];
b[0] = (byte) i;
String cp037 = new String(b, "CP037");
if (cp037.getBytes().length == 2) {
index++;
System.out.println(i + "::" + cp037);
}
}
System.out.println(index);
}
}
The above answer was written without testing my actual hypothesis. Here is an actual program to measure the time. The results speak for themselves on a 200 MB file.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
public class ReadComplexDelimitedFile {
private static long total = 0;
private static final Pattern DELIMITER_PATTERN = Pattern.compile("\\^\\|\\^");
private void readFileUsingScanner() {
String s;
try (Scanner stdin = new Scanner(new File(this.getClass().getResource("input.txt").getPath()))) {
while (stdin.hasNextLine()) {
s = stdin.nextLine();
String[] fields = DELIMITER_PATTERN.split(s, 0);
total = total + fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingCustomBufferedReader() {
try (BufferedReader stdin = new BufferedReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReader() {
try (java.io.BufferedReader stdin = new java.io.BufferedReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReaderFileChannel() {
try (FileInputStream fis = new FileInputStream(this.getClass().getResource("input.txt").getPath())) {
try (FileChannel inputChannel = fis.getChannel()) {
try (BufferedReader stdin = new BufferedReader(Channels.newReader(inputChannel, "UTF-8"))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = DELIMITER_PATTERN.split(s, 0);
total = total + fields.length;
}
}
} catch (Exception e) {
System.err.println("Error");
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReaderByteFileChannel() {
try (FileInputStream fis = new FileInputStream(this.getClass().getResource("input.txt").getPath())) {
try (FileChannel inputChannel = fis.getChannel()) {
try (BufferedReader stdin = new BufferedReader(Channels.newReader(inputChannel, "UTF-8"))) {
int b;
StringBuilder sb = new StringBuilder();
while ((b = stdin.read()) != -1) {
if (b == 10) {
total = total + DELIMITER_PATTERN.split(sb, 0).length;
sb = new StringBuilder();
} else {
sb.append((char) b);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingFileChannelStream() {
try (RandomAccessFile fis = new RandomAccessFile(new File(this.getClass().getResource("input.txt").getPath()), "r")) {
try (FileChannel inputChannel = fis.getChannel()) {
ByteBuffer byteBuffer = ByteBuffer.allocate(8192);
ByteBuffer recordBuffer = ByteBuffer.allocate(250);
int recordLength = 0;
while ((inputChannel.read(byteBuffer)) != -1) {
byte b;
byteBuffer.flip();
while (byteBuffer.hasRemaining() && (b = byteBuffer.get()) != -1) {
if (b == 10) {
recordBuffer.flip();
total = total + splitIntoFields(recordBuffer, recordLength);
recordBuffer.clear();
recordLength = 0;
} else {
++recordLength;
recordBuffer.put(b);
}
}
byteBuffer.clear();
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private int splitIntoFields(ByteBuffer recordBuffer, int recordLength) {
byte b;
String[] fields = new String[17];
int fieldCount = -1;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < recordLength - 1; i++) {
b = recordBuffer.get(i);
if (b == 94 && recordBuffer.get(++i) == 124 && recordBuffer.get(++i) == 94) {
fields[++fieldCount] = sb.toString();
sb = new StringBuilder();
} else {
sb.append((char) b);
}
}
fields[++fieldCount] = sb.toString();
return fields.length;
}
public static void main(String args[]) {
//JVM warmup
for (int i = 0; i < 100000; i++) {
total += i;
}
// We know scanner is slow-Still warming up
ReadComplexDelimitedFile readComplexDelimitedFile = new ReadComplexDelimitedFile();
List<Long> longList = new ArrayList<>(50);
for (int i = 0; i < 50; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingScanner();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingScanner");
longList.forEach(System.out::println);
// Actual performance test starts here
longList = new ArrayList<>(10);
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReaderFileChannel();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReaderFileChannel");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReader");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingCustomBufferedReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingCustomBufferedReader");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReaderByteFileChannel();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReaderByteFileChannel");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingFileChannelStream();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingFileChannelStream");
longList.forEach(System.out::println);
}
}
BufferedReader was written a long time ago, so we can rewrite the parts relevant to this example. For instance, we don't care about \r, skipLF, skipCR, and the like.
We are only reading the file (no need for synchronized).
By extension there is no need for StringBuffer; StringBuilder can be used instead. A performance improvement is seen immediately.
Dangerous hack: remove synchronized and replace StringBuffer with StringBuilder. Don't use it without proper testing or without knowing what you are doing.
public String readLine() throws IOException {
StringBuilder s = null;
int startChar;
bufferLoop:
for (; ; ) {
if (nextChar >= nChars)
fill();
if (nextChar >= nChars) { /* EOF */
if (s != null && s.length() > 0)
return s.toString();
else
return null;
}
boolean eol = false;
char c = 0;
int i;
/* Skip a leftover '\n', if necessary */
charLoop:
for (i = nextChar; i < nChars; i++) {
c = cb[i];
if (c == '\n') {
eol = true;
break charLoop;
}
}
startChar = nextChar;
nextChar = i;
if (eol) {
String str;
if (s == null) {
str = new String(cb, startChar, i - startChar);
} else {
s.append(cb, startChar, i - startChar);
str = s.toString();
}
nextChar++;
return str;
}
if (s == null)
s = new StringBuilder(defaultExpectedLineLength);
s.append(cb, startChar, i - startChar);
}
}
Environment: Java 8, Intel i5, 12 GB RAM, Windows 10
Result (times in nanoseconds, from System.nanoTime()):
Time taken for readFileUsingBufferedReaderFileChannel
2581635057 1849820885 1763992972 1770510738 1746444157 1733491399
1740530125 1723907177 1724280512 1732445638
Time taken for readFileUsingBufferedReader
1851027073 1775304769 1803507033 1789979554 1786974538 1802675458
1789672780 1798036307 1789847714 1785302003
Time taken for readFileUsingCustomBufferedReader
1745220476 1721039975 1715383650 1728548462 1724746005 1718177466
1738026017 1748077438 1724608192 1736294175
Time taken for readFileUsingBufferedReaderByteFileChannel
2872857919 2480237636 2917488143 2913491126 2880117231 2904614745
2911756298 2878777496 2892169722 2888091211
Time taken for readFileUsingFileChannelStream
3039447073 2896156498 2538389366 2906287280 2887612064 2929288046
2895626578 2955326255 2897535059 2884476915
Process finished with exit code 0

I tried NIO with all possible options (those provided in this post and to the best of my knowledge and research) and found that it came nowhere close to BufferedReader for reading a text file.
After changing BufferedReader to use StringBuilder in place of StringBuffer, I don't see any significant improvement in performance (only a few seconds for some files, and some were even faster with StringBuffer).
Removing the synchronized block also gave little or no improvement, and it's not worth tweaking something that brings no benefit.
Below is the time taken (reading, processing, writing; processing and writing are not significant, not even 20% of the time) for a file of around 50 GB:
NIO : 71.67 (Minutes)
IO (BufferedReader) : 10.84 (Minutes)
Thank you all for taking the time to read and respond to this post and for providing suggestions.

The main issue here is creating a new byte[] very rapidly (fieldBytes = new byte[maxFieldSize];).
Since a new array is created on every iteration, garbage collection kicks in very often, which triggers "stop the world" pauses to reclaim the memory.
Also, the object creation itself could be expensive.
We could instead initialize the byte array once and track the indexes, converting each field to a String using an end index.
In any case, BufferedReader was faster than FileChannel, at least for reading ASCII files, and to keep the code simple we continued using BufferedReader.
With BufferedReader, the development and testing effort is also reduced, since there is no tedious logic to find delimiters and populate the object.
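The idea of keeping one byte array and tracking indexes can be sketched roughly as follows. This is only an illustration under stated assumptions: ByteFieldSplitter is a hypothetical name, the record's bytes are assumed to already sit in a single reusable array, and the ^|^ delimiter comes from the question's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not the code that was actually used in production.
class ByteFieldSplitter {
    /**
     * Splits bytes[0..length) on the three-byte delimiter ^|^ using only
     * start/end indexes, so no per-field byte[] is allocated.
     */
    static List<String> split(byte[] bytes, int length) {
        List<String> fields = new ArrayList<>();
        int start = 0; // index where the current field begins
        int i = 0;
        while (i < length) {
            boolean atDelimiter = i + 2 < length
                    && bytes[i] == '^' && bytes[i + 1] == '|' && bytes[i + 2] == '^';
            if (atDelimiter) {
                fields.add(new String(bytes, start, i - start, StandardCharsets.UTF_8));
                i += 3;
                start = i;
            } else {
                i++;
            }
        }
        // Whatever follows the last delimiter is the final field.
        fields.add(new String(bytes, start, length - start, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) {
        byte[] record = "id42^|^Jane^|^Doe".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(record, record.length));
    }
}
```

The only allocations per field are the resulting String objects; the input array itself can be refilled and reused for the next record.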

How to read the large text files efficiently in java

Here, I am reading an 18 MB file and storing it in a two-dimensional array, but the program takes almost 15 minutes to run. Is there any way to reduce its running time? The file contains only binary values. Thanks in advance…
public class test
{
public static void main(String[] args) throws FileNotFoundException, IOException
{
BufferedReader br;
FileReader fr=null;
int m = 2160;
int n = 4320;
int[][] lof = new int[n][m];
String filename = "D:/New Folder/ETOPOCHAR";
try {
Scanner input = new Scanner(new File("D:/New Folder/ETOPOCHAR"));
double range_km=1.0;
double alonn=-57.07; //180 to 180
double alat=38.53;
while (input.hasNextLine()) {
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
try
{
lof[j][i] = input.nextInt();
System.out.println("value[" + j + "][" + i + "] = "+ lof[j][i]);
}
catch (java.util.NoSuchElementException e) {
// e.printStackTrace();
}
}
} //print the input matrix
}
I have also tried with a byte array, but I cannot save it in a 2D array...
public class FileToArrayOfBytes
{
public static void main( String[] args )
{
FileInputStream fileInputStream=null;
File file = new File("name of file");
byte[] bFile = new byte[(int) file.length()];
try {
//convert file into array of bytes
fileInputStream = new FileInputStream(file);
fileInputStream.read(bFile);
fileInputStream.close();
for (int i = 0; i < bFile.length; i++) {
System.out.print((char)bFile[i]);
}
System.out.println("Done");
}catch(Exception e){
e.printStackTrace();
}
}
}

You can read the file into a byte array first, then deserialize these bytes. Start with a 2048-byte input buffer, then experiment by increasing or decreasing its size; the sizes you try should be powers of two (512, 1024, 2048, etc.).
As far as I remember, there is a good chance that the best performance is achieved with a 2048-byte buffer, but this is OS dependent and should be verified.
Code sample (here you can try different values of the BUFFER_SIZE variable; in my case I read a 7.5 MB test file in less than one second):
public static void main(String... args) throws IOException {
final int BUFFER_SIZE = 2048; // starting point; tune experimentally
File f = new File(args[0]);
byte[] buffer = new byte[BUFFER_SIZE];
ByteBuffer result = ByteBuffer.allocateDirect((int) f.length());
try (FileInputStream fis = new FileInputStream(f)) {
int bytesRead;
int totalBytesRead = 0;
while ((bytesRead = fis.read(buffer, 0, BUFFER_SIZE)) != -1) {
result.put(buffer, 0, bytesRead);
totalBytesRead += bytesRead;
}
// debug info
System.out.printf("Read %d bytes\n", totalBytesRead);
// Here you can do whatever you want with the result, including creation of a 2D array...
int pos = result.position();
result.rewind();
for (int i = 0; i < pos / 4; i++) {
System.out.println(result.getInt());
}
}
}
Take your time and read docs for java.io, java.nio packages as well as Scanner class, just to improve understanding.
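If the file actually holds whitespace-separated integers as text (which the question's use of Scanner.nextInt() suggests, though that is an assumption), a StreamTokenizer over a BufferedReader is one way to fill the 2D array without Scanner. The sketch below is hypothetical: GridReader and readGrid are made-up names, and it fills the array row by row rather than transposed as in the question. Printing every value to the console, as the question's loop does, is likely a large part of the 15 minutes, so the sketch prints nothing inside the loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, assuming a text file of whitespace-separated integers.
class GridReader {
    /** Reads up to rows*cols integers, filling the grid row by row; missing values stay 0. */
    static int[][] readGrid(Reader source, int rows, int cols) throws IOException {
        // StreamTokenizer parses numbers (including a leading '-') by default.
        StreamTokenizer tokens = new StreamTokenizer(new BufferedReader(source));
        int[][] grid = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (tokens.nextToken() != StreamTokenizer.TT_NUMBER) {
                    return grid; // input ended (or held a non-number) early
                }
                grid[r][c] = (int) tokens.nval;
            }
        }
        return grid;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder path; 2160 x 4320 are the dimensions from the question.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            int[][] grid = readGrid(in, 2160, 4320);
            System.out.println("grid[0][0] = " + grid[0][0]);
        }
    }
}
```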

Calculating network download speed

I have written the following code to calculate download speed using Java.
But it is not giving correct results. What is the problem? Is there a problem with my logic, or with my usage of the Java networking classes? I think it is the latter. Can anybody tell me what exactly the problem is?
/*Author:Jinu Joseph Daniel*/
import java.io.*;
import java.net.*;
class bwCalc {
static class CalculateBw {
public void calculateUploadBw() {}
public float calculateDownloadRate(int waitTime) throws Exception {
int bufferSize = 1;
byte[] data = new byte[bufferSize]; // buffer
BufferedInputStream in = new BufferedInputStream(new URL("https://www.google.co.in/").openStream());
int count = 0;
long startedAt = System.currentTimeMillis();
long stoppedAt;
float rate;
while (((stoppedAt = System.currentTimeMillis()) - startedAt) < waitTime) {
if ( in .read(data, 0, bufferSize) != -1) {
count++;
} else {
System.out.println("Finished");
break;
}
}
in .close();
rate = 1000 * (((float) count*bufferSize*8 / (stoppedAt - startedAt)) )/(1024*1024);//rate in Mbps
return rate;
}
public float calculateAverageDownloadRate() throws Exception{
int times[] = {100,200,300,400,500};
float bw = 0,curBw;
int i = 0, len = times.length;
while (i < len) {
curBw = calculateDownloadRate(times[i++]);
bw += curBw;
System.out.println("Current rate : "+Float.toString(curBw));
}
bw /= len;
return bw;
}
}
public static void main(String argc[]) throws Exception {
CalculateBw c = new CalculateBw();
System.out.println(Float.toString(c.calculateAverageDownloadRate()));
}
}

There are many problems with your code...
you're not checking how many bytes you are reading
testing with Google's home page is useless, since the content size is very small and most of the download time is related to network latency; you should try downloading a large file (10+ MB) - UNLESS you actually want to measure latency rather than bandwidth, in which case you can simply run ping
you also need to give it more than 500ms if you want to get any relevant result - I'd say at least 5 sec
plenty of code style issues, but those are less important

Here is code that calculates the average download rate in KB and MB per second; you can scale these by 8 to get the rate in bits per second.
public static void main(String argc[]) throws Exception {
long totalDownload = 0; // total bytes downloaded
final int BUFFER_SIZE = 1024; // size of the buffer
byte[] data = new byte[BUFFER_SIZE]; // buffer
BufferedInputStream in = new BufferedInputStream(
new URL(
"http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.15/linux-headers-2.6.15-020615_2.6.15-020615_all.deb")
.openStream());
int dataRead = 0; // data read in each try
long startTime = System.nanoTime(); // starting time of download
while ((dataRead = in.read(data, 0, 1024)) > 0) {
totalDownload += dataRead; // adding data downloaded to total data
}
/* download rate in bytes per second */
// divide in floating point; integer division would truncate to whole seconds
float bytesPerSec = totalDownload
/ ((System.nanoTime() - startTime) / 1e9f);
System.out.println(bytesPerSec + " Bps");
/* download rate in kilobytes per second */
float kbPerSec = bytesPerSec / (1024);
System.out.println(kbPerSec + " KBps ");
/* download rate in megabytes per second */
float mbPerSec = kbPerSec / (1024);
System.out.println(mbPerSec + " MBps ");
}
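The rate arithmetic is easy to get wrong, so it can help to pull it into small methods that can be checked without a network connection. This is only a sketch: DownloadRate, drain and bytesPerSecond are hypothetical names, and a ByteArrayInputStream stands in for the URL stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of the original answer.
class DownloadRate {
    /** Drains the stream and returns the total number of bytes actually read. */
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            total += n; // read() may return fewer bytes than requested
        }
        return total;
    }

    /** Bytes per second, computed in floating point to avoid integer truncation. */
    static double bytesPerSecond(long bytes, long elapsedNanos) {
        return bytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new URL(...).openStream(); replace with a real stream to measure a download.
        InputStream in = new ByteArrayInputStream(new byte[5_000_000]);
        long start = System.nanoTime();
        long total = drain(in, 8192);
        long elapsed = System.nanoTime() - start;
        double bps = bytesPerSecond(total, elapsed);
        System.out.printf("%d bytes, %.1f KiB/s, %.3f Mbit/s%n", total, bps / 1024, bps * 8 / 1e6);
    }
}
```

Counting the return value of read() rather than the number of calls is what makes the byte total correct regardless of the buffer size.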

How to access the same file in two different places in Java

I want to read from a file at two different places concurrently. I also want to use buffered I/O streams for efficiency. I tried to work something out on my own from the Java API, but it's not working. Can anybody help? I need it for an external merge sort. Thanks for your help!

You need to create a RandomAccessFile, which gives you seekable access to the file; mapping its FileChannel is basically Java's equivalent of a memory-mapped file in C.
I found an example of this:
try {
File file = new File("filename");
// Create a read-only memory-mapped file
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, (int)roChannel.size());
// Create a read-write memory-mapped file
FileChannel rwChannel = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, (int)rwChannel.size());
// Create a private (copy-on-write) memory-mapped file.
// Any write to this channel results in a private copy of the data.
FileChannel pvChannel = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer pvBuf = pvChannel.map(FileChannel.MapMode.PRIVATE, 0, (int)pvChannel.size());
} catch (IOException e) {
}
Edit: you stated you can't use a RandomAccessFile, which is the only way to skip up and down through the file. If you're stuck without it, then you must read the file sequentially, but that doesn't mean that you can't open multiple pointers to the same file for reading.
I put together the following test/sample and it shows clearly that you can open the file "twice" with different read pointers and sequentially sum two halves of the file. Again, if you need random access, you must use a RandomAccessFile, and that's what I'd suggest, but here you go:
public class FileTest {
public static void main(String[] args) throws IOException, InterruptedException, ExecutionException{
File temp = File.createTempFile("asfd", "");
BufferedWriter wrt = new BufferedWriter(new FileWriter(temp));
int testLength = 10000;
int numWidth = String.valueOf(testLength).length();
int targetSum = 0;
for(int i = 0; i < testLength; i++){
// each line guaranteed to have a good number of characters for our test
wrt.write(String.format("%0"+ numWidth +"d\n", i));
targetSum += i;
}
wrt.close();
BufferedReader rdr1 = new BufferedReader(new FileReader(temp));
BufferedReader rdr2 = new BufferedReader(new FileReader(temp));
rdr2.skip((numWidth+1)*testLength / 2); // skip first half of the lines
Summer sum1 = new Summer(rdr1, testLength / 2);
Summer sum2 = new Summer(rdr2, testLength / 2);
ExecutorService executor = Executors.newFixedThreadPool(2);
Future<Integer> halfSum1 = executor.submit(sum1);
Future<Integer> halfSum2 = executor.submit(sum2);
System.out.println("Total sum = " + (halfSum1.get() + halfSum2.get()) + " reference " + targetSum);
rdr1.close();
rdr2.close();
temp.delete();
}
private static class Summer implements Callable<Integer>{
private BufferedReader rdr;
private int limit;
public Summer(BufferedReader rdr, int limit) throws IOException{
this.rdr = rdr;
this.limit = limit;
}
@Override
public Integer call() throws Exception {
System.out.println(Thread.currentThread().getName() + " started " + System.currentTimeMillis());
int sum = 0;
for(int i = 0; i < limit; i++){
sum += Integer.valueOf(rdr.readLine());
// uncomment to see interleaving of threads:
//System.out.println(Thread.currentThread().getName());
}
System.out.println(Thread.currentThread().getName() + " finished " + System.currentTimeMillis());
return sum;
}
}
}

What's to stop you from simply opening the file twice, and working with it as if it were two independent files?
File inputFile = new File("src/SameFileTwice.java");
BufferedReader in1 = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile)));
BufferedReader in2 = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile)));
try {
String strLine;
while ((strLine = in1.readLine()) != null && (strLine = in2.readLine()) != null) {
System.out.println(strLine);
}
} finally {
in1.close();
in2.close();
}
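For an external merge sort, the two readers usually need to start at different byte offsets (one per sorted run). A minimal sketch of that, assuming you know the offset at which the second run begins: open the file twice as FileChannels, position each one, and wrap each in a BufferedReader. TwoCursors, openAt and firstLines are hypothetical names, and the temp file merely stands in for your real data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the original answers.
class TwoCursors {
    /** Opens an independent buffered reader on path, starting at the given byte offset. */
    static BufferedReader openAt(Path path, long offset) throws IOException {
        FileChannel channel = FileChannel.open(path, StandardOpenOption.READ);
        channel.position(offset);
        // Closing the returned reader also closes the channel.
        return new BufferedReader(Channels.newReader(channel, "UTF-8"));
    }

    /** Returns the first line seen by each of two readers on the same file. */
    static String[] firstLines(Path path, long secondOffset) throws IOException {
        try (BufferedReader a = openAt(path, 0);
             BufferedReader b = openAt(path, secondOffset)) {
            return new String[] {a.readLine(), b.readLine()};
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("runs", ".txt");
        try {
            // Two sorted runs of three lines each; the second run starts at byte 6.
            Files.write(tmp, "1\n3\n5\n2\n4\n6\n".getBytes(StandardCharsets.UTF_8));
            String[] heads = firstLines(tmp, 6);
            System.out.println(heads[0] + " " + heads[1]);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Each channel keeps its own position, so reading from one reader does not move the other.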
