Split large file into chunks - java

I have a method which accepts a file and a chunk size and returns a list of chunked files. The main problem is that a line in the file can be broken across chunks. For example, the main file contains these lines:
|1|aaa|bbb|ccc|
|2|ggg|ddd|eee|
After the split I could have this in one file:
|1|aaa|bbb
In another file:
|ccc|2|
|ggg|ddd|eee|
Here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
    int counter = 1;
    List<File> files = new ArrayList<>();
    int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
    byte[] buffer = new byte[sizeOfChunk];
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        String name = file.getName();
        int tmp = 0;
        while ((tmp = bis.read(buffer)) > 0) {
            File newFile = new File(file.getParent(), name + "."
                    + String.format("%03d", counter++));
            try (FileOutputStream out = new FileOutputStream(newFile)) {
                out.write(buffer, 0, tmp);
            }
            files.add(newFile);
        }
    }
    return files;
}
Should I use the RandomAccessFile class for the above purposes (the main file is really big - more than 5 GB)?
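For what it's worth, RandomAccessFile can do a line-aligned split without reading the whole 5 GB into memory: seek to each chunk boundary, then extend the chunk to the end of the current line so no line is ever broken. A rough sketch of that idea (the method name and buffering strategy are mine, and it assumes each chunk fits in an int-sized array):
public static List<File> splitFileAligned(File file, int sizeOfFileInMB) throws IOException {
    List<File> files = new ArrayList<>();
    long chunkSize = 1024L * 1024L * sizeOfFileInMB;
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
        long start = 0;
        int counter = 1;
        while (start < raf.length()) {
            long end = Math.min(start + chunkSize, raf.length());
            if (end < raf.length()) {
                raf.seek(end);
                raf.readLine(); // advance past the line the boundary landed in
                end = raf.getFilePointer();
            }
            File newFile = new File(file.getParent(), file.getName() + "."
                    + String.format("%03d", counter++));
            byte[] buf = new byte[(int) (end - start)]; // one chunk in memory at a time
            raf.seek(start);
            raf.readFully(buf);
            try (FileOutputStream out = new FileOutputStream(newFile)) {
                out.write(buf);
            }
            files.add(newFile);
            start = end;
        }
    }
    return files;
}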

If you don't mind having chunks of different lengths (<= sizeOfChunk, but as close to it as possible), here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
    int counter = 1;
    List<File> files = new ArrayList<File>();
    int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
    String eol = System.lineSeparator();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String name = file.getName();
        String line = br.readLine();
        while (line != null) {
            File newFile = new File(file.getParent(), name + "."
                    + String.format("%03d", counter++));
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(newFile))) {
                int fileSize = 0;
                while (line != null) {
                    byte[] bytes = (line + eol).getBytes(Charset.defaultCharset());
                    // always write at least one line per chunk, otherwise a single
                    // line larger than sizeOfChunk would loop forever
                    if (fileSize > 0 && fileSize + bytes.length > sizeOfChunk)
                        break;
                    out.write(bytes);
                    fileSize += bytes.length;
                    line = br.readLine();
                }
            }
            files.add(newFile);
        }
    }
    return files;
}
The only problem here is the file charset, which is the default system charset in this example. If you want to be able to change it, let me know and I'll add a third parameter to the "splitFile" function for it.
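For reference, a sketch of what that third parameter could look like (only the reader construction and the getBytes() call change from the version above):
public static List<File> splitFile(File file, int sizeOfFileInMB, Charset charset) throws IOException {
    int counter = 1;
    List<File> files = new ArrayList<File>();
    int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
    String eol = System.lineSeparator();
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), charset))) { // decode with the caller's charset
        String name = file.getName();
        String line = br.readLine();
        while (line != null) {
            File newFile = new File(file.getParent(), name + "."
                    + String.format("%03d", counter++));
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(newFile))) {
                int fileSize = 0;
                while (line != null) {
                    byte[] bytes = (line + eol).getBytes(charset); // encode with the same charset
                    if (fileSize > 0 && fileSize + bytes.length > sizeOfChunk)
                        break;
                    out.write(bytes);
                    fileSize += bytes.length;
                    line = br.readLine();
                }
            }
            files.add(newFile);
        }
    }
    return files;
}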

Just in case anyone is interested in a Kotlin version.
It creates an iterator of ByteArray chunks:
class ByteArrayReader(val input: InputStream, val chunkSize: Int, val bufferSize: Int = 1024 * 8) : Iterator<ByteArray> {
    var eof: Boolean = false

    init {
        if ((chunkSize % bufferSize) != 0) {
            throw RuntimeException("ChunkSize(${chunkSize}) should be a multiple of bufferSize (${bufferSize})")
        }
    }

    override fun hasNext(): Boolean = !eof

    override fun next(): ByteArray {
        val buffer = ByteArray(bufferSize)
        val chunkWriter = ByteArrayOutputStream(chunkSize) // no need to close - implementation is empty
        var offset = 0
        while (offset < chunkSize) {
            // never ask for more than the rest of the chunk: read() may return fewer
            // bytes than requested, so offset is not always a multiple of bufferSize
            val bytesRead = input.read(buffer, 0, minOf(bufferSize, chunkSize - offset))
            if (bytesRead <= 0) {
                eof = true
                break
            }
            chunkWriter.write(buffer, 0, bytesRead)
            offset += bytesRead
        }
        return chunkWriter.toByteArray()
    }
}

Split a file into multiple chunks (in-memory operation). Here I'm splitting any file into chunks of 500 kB (500,000 bytes) and adding them to a list:
public static List<ByteArrayOutputStream> splitFile(File f) {
    List<ByteArrayOutputStream> datalist = new ArrayList<>();
    try {
        int sizeOfFiles = 500000;
        byte[] buffer = new byte[sizeOfFiles];
        try (FileInputStream fis = new FileInputStream(f); BufferedInputStream bis = new BufferedInputStream(fis)) {
            int bytesAmount = 0;
            while ((bytesAmount = bis.read(buffer)) > 0) {
                try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                    out.write(buffer, 0, bytesAmount);
                    out.flush();
                    datalist.add(out);
                }
            }
        }
    } catch (Exception e) {
        // handle or log the error
    }
    return datalist;
}
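A hypothetical caller could then drain each chunk to a numbered part file, for example (this helper is a sketch; baseName is whatever prefix you want):
static void writeChunks(List<ByteArrayOutputStream> chunks, String baseName) throws IOException {
    int part = 1;
    for (ByteArrayOutputStream chunk : chunks) {
        try (FileOutputStream fos = new FileOutputStream(baseName + ".part" + part++)) {
            chunk.writeTo(fos); // writeTo avoids copying the internal array a second time
        }
    }
}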

Split files into chunks depending upon your chunk size:
val f = FileInputStream(file)
val data = ByteArray(f.available()) // size of original file (available() is only an estimate)
var subData: ByteArray
f.read(data) // note: a single read() call is not guaranteed to fill the whole array
var start = 0
var end = CHUNK_SIZE
val max = data.size
if (max > 0) {
    while (end < max) {
        subData = data.copyOfRange(start, end)
        start = end
        end += CHUNK_SIZE
        if (end >= max) {
            end = max
        }
        // Function to upload your chunk
        uploadFileInChunk(subData, isLast = false)
    }
    // For the last chunk: copyOfRange's end index is exclusive, so pass max as-is;
    // decrementing it here would silently drop the file's final byte
    subData = data.copyOfRange(start, max)
    uploadFileInChunk(subData, isLast = true)
}
If you are taking the file from the user through an intent, you may get the file URI as content; in that case:
Uri uri = data.getData();
InputStream inputStream = getContext().getContentResolver().openInputStream(uri);
fileInBytes = IOUtils.toByteArray(inputStream);
Add the dependency in your build.gradle to use IOUtils:
compile 'commons-io:commons-io:2.11.0'
Now make a small modification to the above code to send your file to the server.
var subData: ByteArray
var start = 0
var end = CHUNK_SIZE
val max = fileInBytes.size
if (max > 0) {
    while (end < max) {
        subData = fileInBytes.copyOfRange(start, end)
        start = end
        end += CHUNK_SIZE
        if (end >= max) {
            end = max
        }
        uploadFileInChunk(subData, isLast = false)
    }
    // For the last chunk (end is exclusive, so no decrement)
    subData = fileInBytes.copyOfRange(start, max)
    uploadFileInChunk(subData, isLast = true)
}
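Note that both snippets above hold the entire file in memory, and f.available() is only an estimate of what can be read without blocking, not a guaranteed file size. A streaming variant may be safer for large files; a sketch in Java, where uploadFileInChunk is the same hypothetical upload hook used above:
static void uploadInChunks(File file, int chunkSize) throws IOException {
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        byte[] buffer = new byte[chunkSize];
        long remaining = file.length();
        int read;
        while ((read = in.read(buffer)) > 0) {
            remaining -= read;
            byte[] chunk = java.util.Arrays.copyOf(buffer, read); // trim to the bytes actually read
            uploadFileInChunk(chunk, remaining <= 0); // isLast when nothing is left
        }
    }
}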

Related

Java - Write content from one file chunk by chunk (e.g. 8 Bytes) alternately into multiple files

So I've been trying to read the content of a text file and write it chunk by chunk, alternately, into e.g. 2 new files.
I already tried multiple ways to do that, but it won't work (OutputStream and FileOutputStream seem to be the most suitable).
Before this, I tried to split the file into e.g. 3 parts and wrote the first part to one file, the second part to another, and so on, which worked perfectly fine with OutputStream and FileOutputStream.
But it won't work when I want to do it alternately.
To do it alternately I use the round-robin algorithm, which on its own works fine.
I would be really thankful if you could show me some examples of how to do it!
public void splitFile(String filePath, int numberOfParts, long sizeOfParts[]) throws FileNotFoundException, IOException, SQLException {
    long bytes = 8;
    OutputStream partsPath[] = new OutputStream[numberOfParts];
    long bytePositition[] = new long[numberOfParts];
    long copy_size[] = new long[numberOfParts];
    for (int i = 0; i < numberOfParts; i++) {
        copy_size[i] = sizeOfParts[i];
        partsPath[i] = new FileOutputStream(path); //Gets Path from my Database (works)
        //System.out.println(cloudsTable.getCloudsPathsFromDatabase(i) + '\\' + name + (i + 1) + fileType);
    }
    InputStream file = new FileInputStream(filePath);
    while (true) {
        boolean done = true;
        for (int i = 0; i < numberOfParts; i++) {
            if (copy_size[i] > 0) {
                done = false;
                if (copy_size[i] > bytes) {
                    copy_size[i] -= bytes;
                    bytePositition[i] += bytes;
                    System.out.println("file " + i + " " + bytePositition[i]);
                    readWrite(file, bytePositition[i], partsPath[i]);
                } else {
                    bytePositition[i] += copy_size[i];
                    System.out.println("rest file " + i + " " + bytePositition[i]);
                    readWrite(file, bytePositition[i], partsPath[i]);
                    copy_size[i] = 0;
                }
            }
        }
        if (done == true) {
            break;
        }
    }
    file.close();
    for (int i = 0; i < partsPath.length; i++) {
        partsPath[i].close();
    }
}

private void readWrite(InputStream file, long bytes, OutputStream path) throws IOException {
    byte[] buf = new byte[(int) bytes];
    while (file.read(buf) != -1) {
        path.write(buf);
        path.flush();
    }
}
What the code actually does is write the entire content of the original file into the first copied file, and the following files stay empty.
EDIT:
To clarify what the code should do is write the first 8 bytes to go to file 1, second 8 bytes to go to file 2, third 8 bytes to go to file 3, fourth 8 bytes to go to file 1, and so on, round robin, until file 1 is sizeOfParts[0] long, file 2 is sizeOfParts[1] long, and file 3 is sizeOfParts[2] long.
The main problem is that the readWrite() method is only supposed to copy one 8-byte block of bytes, but has a loop that makes it copy all the remaining bytes in the input file.
In addition, the code should be enhanced to use try-finally to close the files, and to correctly handle end-of-file, in case the input file is shorter than the sum of parts.
I would eliminate the readWrite() method, and consolidate the logic to prevent duplicate code, like this:
public void splitFile(String inPath, long[] sizeOfParts) throws IOException, SQLException {
    final int numberOfParts = sizeOfParts.length;
    String[] outPath = new String[numberOfParts];
    // Gets Paths from Database here

    InputStream in = null;
    OutputStream[] out = new OutputStream[numberOfParts];
    try {
        in = new BufferedInputStream(new FileInputStream(inPath));
        for (int part = 0; part < numberOfParts; part++)
            out[part] = new BufferedOutputStream(new FileOutputStream(outPath[part]));

        byte[] buf = new byte[8];
        long[] remain = sizeOfParts.clone();
        for (boolean done = false; ! done; ) {
            done = true;
            for (int part = 0; part < numberOfParts; part++) {
                if (remain[part] > 0) {
                    int len = in.read(buf, 0, (int) Math.min(remain[part], buf.length));
                    if (len == -1) {
                        done = true;
                        break;
                    }
                    remain[part] -= len;
                    System.out.println("file " + part + " " + (sizeOfParts[part] - remain[part]));
                    out[part].write(buf, 0, len);
                    done = false;
                }
            }
        }
    } finally {
        if (in != null)
            in.close();
        for (int part = 0; part < out.length; part++)
            if (out[part] != null)
                out[part].close();
    }
}

Kotlin gzip uncompress fail

I tried to simplify my Java gzip uncompress code by converting it to Kotlin, but after the change it seems broken.
Here is the java code
public static byte[] uncompress(byte[] compressedBytes) {
    if (null == compressedBytes || compressedBytes.length == 0) {
        return null;
    }

    ByteArrayOutputStream out = null;
    ByteArrayInputStream in = null;
    GZIPInputStream gzipInputStream = null;
    try {
        out = new ByteArrayOutputStream();
        in = new ByteArrayInputStream(compressedBytes);
        gzipInputStream = new GZIPInputStream(in);
        byte[] buffer = new byte[256];
        int n = 0;
        while ((n = gzipInputStream.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    } catch (IOException ignore) {
    } finally {
        CloseableUtils.closeQuietly(gzipInputStream);
        CloseableUtils.closeQuietly(in);
        CloseableUtils.closeQuietly(out);
    }
    return null;
}
This is my Kotlin code:
payload = GZIPInputStream(payload.inputStream())
        .bufferedReader()
        .use { it.readText() }
        .toByteArray()
And I got this error.
com.google.protobuf.nano.InvalidProtocolBufferNanoException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either that the input has been truncated or that an embedded message misreported its own length.
It seems that the decompression process was interrupted by reader?
The readText(charset: Charset = Charsets.UTF_8) call decodes the bytes as UTF-8 text, which is why the message says the input may "have been truncated": the reader tried to turn raw bytes into Chars and build a String out of them, corrupting the binary payload.
Use readBytes() to get a ByteArray, which is represented the same as byte[] on the JVM platform.
Example:
GZIPInputStream(payload.inputStream()).use { it.readBytes() }
Edit:
For reading bytes you shouldn't be using a Reader at all; readers are meant for reading text, UTF-8 by default, as defined in Kotlin's InputStream.bufferedReader:
public inline fun InputStream.bufferedReader(charset: Charset = Charsets.UTF_8): BufferedReader = reader(charset).buffered()
InputStream.readBytes() reads the bytes with an 8 KB buffer by itself:
public fun InputStream.readBytes(): ByteArray {
    val buffer = ByteArrayOutputStream(maxOf(DEFAULT_BUFFER_SIZE, this.available()))
    copyTo(buffer)
    return buffer.toByteArray()
}

// This copies with an 8KB buffer automatically
// DEFAULT_BUFFER_SIZE = 8 * 1024
public fun InputStream.copyTo(out: OutputStream, bufferSize: Int = DEFAULT_BUFFER_SIZE): Long {
    var bytesCopied: Long = 0
    val buffer = ByteArray(bufferSize)
    var bytes = read(buffer)
    while (bytes >= 0) {
        out.write(buffer, 0, bytes)
        bytesCopied += bytes
        bytes = read(buffer)
    }
    return bytesCopied
}
So you just have to do:
GZIPInputStream(payload.inputStream()).use { it.readBytes() }
Use the following function to unzip a zip archive:
// ZipIO is a small helper pairing each entry with its destination file
// (its definition was missing from the snippet):
data class ZipIO(val entry: ZipEntry, val output: File)

fun File.unzip(unzipLocationRoot: File? = null) {
    val rootFolder = unzipLocationRoot
        ?: File(parentFile.absolutePath + File.separator + nameWithoutExtension)
    if (!rootFolder.exists()) {
        rootFolder.mkdirs()
    }

    ZipFile(this).use { zip ->
        zip
            .entries()
            .asSequence()
            .map {
                val outputFile = File(rootFolder.absolutePath + File.separator + it.name)
                ZipIO(it, outputFile)
            }
            .map {
                it.output.parentFile?.run {
                    if (!exists()) mkdirs()
                }
                it
            }
            .filter { !it.entry.isDirectory }
            .forEach { (entry, output) ->
                zip.getInputStream(entry).use { input ->
                    output.outputStream().use { fileOut -> // renamed to avoid shadowing `output`
                        input.copyTo(fileOut)
                    }
                }
            }
    }
}
Pass the file as a parameter as follows:
val zipFile = File(your file directory, your file name)
zipFile.unzip()
Hope this would help 🙏🏼

Android Socket TCP Dataloss

I am unable to transmit an entire file using WiFi-Direct. The file sender indicates that the entire file has been copied to the socket output stream, but the file receiver only receives roughly half of the file.
I looked at the contents of both the original file and the file storing the received data, and found that the receiver only receives pieces of the original file. For example, it would receive bytes 0-100, and then jump to bytes 245-350.
Why is the receiver only receiving bits and pieces of the file, rather than the entire file?
File Receiving Logic
private class FileReceiveThread(val channel: Channel) : TransmissionThread() {
    private var mFileName: String = ""
    private var mFileSize: Long = 0L
    private var mBytesReceivedTotal = 0L

    override fun run() {
        try {
            Timber.d("File receive thread running: fileSize=$mFileSize, fileName=$mFileName")
            val outputFile = File.createTempFile("file", "")
            val fileOutput = outputFile.outputStream()
            val channelInput = channel.getInputStream().unwrap()
            val inputBuffer = ByteArray(FILE_TX_BUFFER_SIZE)
            var bytesReceived = channelInput.read(inputBuffer)
            while (bytesReceived > 0) {
                fileOutput.write(inputBuffer)
                mBytesReceivedTotal += bytesReceived
                Timber.d("Received $mBytesReceivedTotal total bytes")
                bytesReceived = channelInput.read(inputBuffer)
            }
            onTransmitComplete?.invoke()
        } catch (e: Exception) {
            e.printStackTrace()
        }
    }

    fun start(filename: String, size: Long) {
        mFileName = filename
        mFileSize = size
        start()
    }
}
File Sending Logic
private class FileSendThread : TransmissionThread() {
    var mFile: File? = null
    var mOutputStream: OutputStream? = null

    override fun run() {
        if (mFile != null && mOutputStream != null) {
            val inputStream = mFile!!.inputStream()
            val channelStream = mOutputStream!!
            val buffer = ByteArray(FILE_TX_BUFFER_SIZE)
            var bytesRead = inputStream.read(buffer)
            var totalBytesRead = 0L + bytesRead
            while (bytesRead > 0) {
                Timber.v("Read $bytesRead, total $totalBytesRead")
                channelStream.write(buffer)
                bytesRead = inputStream.read(buffer)
                totalBytesRead += bytesRead
            }
            Timber.d("Wrote file to output stream")
            inputStream.close()
            Timber.d("No more data to send")
            onTransmitComplete?.invoke()
        } else Timber.d("Parameters null: file=$mFile")
    }

    fun start(file: File, stream: OutputStream) {
        mFile = file
        mOutputStream = stream
        start()
    }
}
while (inputStream.read(buffer) > 0) {
    channelStream.write(buffer)
}
The read() will often not fill the complete buffer. Hence, when you write the buffer, write only as much of it as was actually filled:
var totalBytesRead = 0
var nRead = 0
while (inputStream.read(buffer).also { nRead = it } > 0) {
    channelStream.write(buffer, 0, nRead) // write only the bytes actually read
    totalBytesRead += nRead
}
channelStream.close()
Log the totalBytesRead.
Your original code would have produced a bigger received file, not a smaller one, so there is something else still to be discovered.

Read large file error "outofmemoryerror"(java)

Sorry for my English. I want to read a large file, but an OutOfMemoryError occurs when I read it. I do not understand how to work with memory in the application. The following code does not work:
try {
    StringBuilder fileData = new StringBuilder(1000);
    BufferedReader reader = new BufferedReader(new FileReader(file));
    char[] buf = new char[8192];
    int bytesread = 0, bytesBuffered = 0;
    while ((bytesread = reader.read(buf)) > -1) {
        String readData = String.valueOf(buf, 0, bytesread);
        bytesBuffered += bytesread;
        fileData.append(readData); // this is where the OutOfMemoryError is thrown
        if (bytesBuffered > 1024 * 1024) {
            bytesBuffered = 0;
        }
    }
    System.out.println(fileData.toString().toCharArray());
} finally {
}
You need to preallocate a large buffer to avoid reallocation:
File file = ...;
StringBuilder fileData = new StringBuilder((int) file.length()); // File.length() returns the size in bytes (File has no size() method)
And run with a large heap size:
java -Xmx2G
==== update
A while loop using a buffer doesn't need much memory to run. Treat the input like a stream and match your search string against the stream; it's a really simple state machine. If you need to search for multiple words, you can find a TrieTree implementation (one that supports streams) for that.
// the match state model
...xxxxxxabxxxxxaxxxxxabcdexxxx...
         ab     a     abcd
File file = new File("path_to_your_file");
String yourSearchWord = "abcd";
int matchIndex = 0;
boolean matchPrefix = false;
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    int chr;
    while ((chr = reader.read()) != -1) {
        if (matchPrefix == false) {
            char searchChar = yourSearchWord.charAt(0);
            if (chr == searchChar) {
                matchPrefix = true;
                matchIndex = 0;
            }
        } else {
            char searchChar = yourSearchWord.charAt(++matchIndex);
            if (chr == searchChar) {
                if (matchIndex == yourSearchWord.length() - 1) {
                    // match!!
                    System.out.println("match: " + matchIndex);
                    matchPrefix = false;
                    matchIndex = 0;
                }
            } else {
                // note: a failed match does not re-test the current char as a new
                // match start, so overlapping prefixes can be missed (simple sketch)
                matchPrefix = false;
                matchIndex = 0;
            }
        }
    }
}
Try this; it might be helpful:
try {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    String txt;
    while ((txt = reader.readLine()) != null) { // read() returns an int; readLine() is what was meant here
        System.out.println(txt);
    }
    reader.close();
} catch (Exception e) {
    System.out.println("Error : " + e.getMessage());
}
You should not hold such big files in memory, because you run out of it, as you can see. Since you use Java 7, you need to read the file manually as a stream and check the content on the fly. Otherwise you could use the stream API of Java 8. This is just an example; it works, but keep in mind that the position of the found word could vary due to encoding issues, so this is not production code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class FileReader
{
    private static String wordToFind = "SEARCHED_WORD";
    private static File file = new File("YOUR_FILE");
    private static int currentMatchingPosition;
    private static int foundAtPosition = -1;
    private static int charsRead;

    public static void main(String[] args) throws IOException
    {
        try (FileInputStream fis = new FileInputStream(file))
        {
            System.out.println("Total size to read (in bytes) : " + fis.available());

            int c;
            while ((c = fis.read()) != -1)
            {
                charsRead++;
                checkContent(c);
            }

            if (foundAtPosition > -1)
            {
                System.out.println("Found word at position: " + (foundAtPosition - wordToFind.length()));
            }
            else
            {
                System.out.println("Didn't find the word!");
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    private static void checkContent(int c)
    {
        if (currentMatchingPosition >= wordToFind.length())
        {
            // already found...
            return;
        }

        if (wordToFind.charAt(currentMatchingPosition) == (char) c)
        {
            foundAtPosition = charsRead;
            currentMatchingPosition++;
        }
        else
        {
            currentMatchingPosition = 0;
            foundAtPosition = -1;
        }
    }
}

Java - Read file and split into multiple files

I have a file which I would like to read in Java and split this file into n (user input) output files. Here is how I read the file:
int n = 4;
BufferedReader br = new BufferedReader(new FileReader("file.csv"));
try {
    String line = br.readLine();
    while (line != null) {
        line = br.readLine();
    }
} finally {
    br.close();
}
How do I split the file - file.csv into n files?
Note - Since the number of entries in the file is of the order of 100k, I can't store the file content in an array and then split it and save it into multiple files.
Since one file can be very large, each split file could be large as well.
Example:
Source file size: 5 GB
Num splits: 5
Destination file size: 1 GB each (5 files)
There is no way to read such a large split chunk in one go, even if we had that much memory. Basically, for each split we can read a fixed-size byte array, which we know should be feasible in terms of performance as well as memory.
NumSplits: 10, MaxReadBytes: 8 KB
public static void main(String[] args) throws Exception
{
    RandomAccessFile raf = new RandomAccessFile("test.csv", "r");
    long numSplits = 10; //from user input, extract it from args
    long sourceSize = raf.length();
    long bytesPerSplit = sourceSize / numSplits;
    long remainingBytes = sourceSize % numSplits;

    int maxReadBufferSize = 8 * 1024; //8KB
    for (int destIx = 1; destIx <= numSplits; destIx++) {
        BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + destIx));
        if (bytesPerSplit > maxReadBufferSize) {
            long numReads = bytesPerSplit / maxReadBufferSize;
            long numRemainingRead = bytesPerSplit % maxReadBufferSize;
            for (int i = 0; i < numReads; i++) {
                readWrite(raf, bw, maxReadBufferSize);
            }
            if (numRemainingRead > 0) {
                readWrite(raf, bw, numRemainingRead);
            }
        } else {
            readWrite(raf, bw, bytesPerSplit);
        }
        bw.close();
    }
    if (remainingBytes > 0) {
        BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + (numSplits + 1)));
        readWrite(raf, bw, remainingBytes);
        bw.close();
    }
    raf.close();
}

static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
    byte[] buf = new byte[(int) numBytes];
    int val = raf.read(buf);
    if (val != -1) {
        bw.write(buf, 0, val); // write only the bytes actually read, not the whole buffer
    }
}
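As an aside (not from the answer above), the same byte-exact split can be written with NIO's FileChannel.transferTo, which avoids the intermediate buffer entirely; a sketch assuming Java 7+:
public static void splitNio(Path source, int numSplits) throws IOException {
    try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
        long size = in.size();
        long bytesPerSplit = size / numSplits;
        long position = 0;
        for (int i = 1; i <= numSplits; i++) {
            long count = (i == numSplits) ? size - position : bytesPerSplit; // last part absorbs the remainder
            try (FileChannel out = FileChannel.open(Paths.get("split." + i),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                long done = 0;
                while (done < count) { // transferTo may move fewer bytes than requested
                    done += in.transferTo(position + done, count - done, out);
                }
            }
            position += count;
        }
    }
}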
import java.io.*;
import java.util.Scanner;

public class split {
    public static void main(String args[])
    {
        try {
            // Reading file and getting no. of files to be generated
            String inputfile = "C:/test.txt"; // Source File Name.
            double nol = 2000.0; // No. of lines to be split and saved in each output file.
            File file = new File(inputfile);
            Scanner scanner = new Scanner(file);
            int count = 0;
            while (scanner.hasNextLine())
            {
                scanner.nextLine();
                count++;
            }
            System.out.println("Lines in the file: " + count); // Displays no. of lines in the input file.

            double temp = (count / nol);
            int temp1 = (int) temp;
            int nof = 0;
            if (temp1 == temp)
            {
                nof = temp1;
            }
            else
            {
                nof = temp1 + 1;
            }
            System.out.println("No. of files to be generated :" + nof); // Displays no. of files to be generated.

            //---------------------------------------------------------------------------------------------------------
            // Actual splitting of file into smaller files

            FileInputStream fstream = new FileInputStream(inputfile);
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;

            for (int j = 1; j <= nof; j++)
            {
                FileWriter fstream1 = new FileWriter("C:/New Folder/File" + j + ".txt"); // Destination File Location
                BufferedWriter out = new BufferedWriter(fstream1);
                for (int i = 1; i <= nol; i++)
                {
                    strLine = br.readLine();
                    if (strLine != null)
                    {
                        out.write(strLine);
                        if (i != nol)
                        {
                            out.newLine();
                        }
                    }
                }
                out.close();
            }
            in.close();
        } catch (Exception e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
Though it's an old question, for reference I am listing out the code which I used to split large files to any size; it works with any Java version above 1.4.
Sample split and join blocks are shown below:
public void join(String FilePath) {
    long leninfile = 0, leng = 0;
    int count = 1, data = 0;
    try {
        File filename = new File(FilePath);
        //RandomAccessFile outfile = new RandomAccessFile(filename,"rw");
        OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename));
        while (true) {
            filename = new File(FilePath + count + ".sp");
            if (filename.exists()) {
                //RandomAccessFile infile = new RandomAccessFile(filename,"r");
                InputStream infile = new BufferedInputStream(new FileInputStream(filename));
                data = infile.read();
                while (data != -1) {
                    outfile.write(data);
                    data = infile.read();
                }
                leng++;
                infile.close();
                count++;
            } else {
                break;
            }
        }
        outfile.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void split(String FilePath, long splitlen) {
    long leninfile = 0, leng = 0;
    int count = 1, data;
    try {
        File filename = new File(FilePath);
        //RandomAccessFile infile = new RandomAccessFile(filename, "r");
        InputStream infile = new BufferedInputStream(new FileInputStream(filename));
        data = infile.read();
        while (data != -1) {
            filename = new File(FilePath + count + ".sp");
            //RandomAccessFile outfile = new RandomAccessFile(filename, "rw");
            OutputStream outfile = new BufferedOutputStream(new FileOutputStream(filename));
            while (data != -1 && leng < splitlen) {
                outfile.write(data);
                leng++;
                data = infile.read();
            }
            leninfile += leng;
            leng = 0;
            outfile.close();
            count++;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Complete Java code is available at the File Split in Java Program link.
A clean solution for editing.
Note that this solution loads the entire file into memory.
It reads all lines of the file into List<String> rowsOfFile;
edit maxSizeFile to choose the maximum size of a single split file.
public void splitFile(File fileToSplit) throws IOException {
    long maxSizeFile = 10000000; // 10 MB
    StringBuilder buffer = new StringBuilder((int) maxSizeFile);
    int sizeOfRows = 0;
    int recurrence = 0;
    String fileName;
    List<String> rowsOfFile =
            Files.readAllLines(fileToSplit.toPath(), Charset.defaultCharset());
    for (String row : rowsOfFile) {
        buffer.append(row).append(System.lineSeparator());
        sizeOfRows += row.getBytes(StandardCharsets.UTF_8).length;
        if (sizeOfRows >= maxSizeFile) {
            fileName = generateFileName(recurrence);
            File newFile = new File(fileName);
            try (PrintWriter writer = new PrintWriter(newFile)) {
                writer.println(buffer.toString());
            }
            recurrence++;
            sizeOfRows = 0;
            buffer = new StringBuilder();
        }
    }
    // last rows
    if (sizeOfRows > 0) {
        fileName = generateFileName(recurrence);
        File newFile = new File(fileName);
        try (PrintWriter writer = new PrintWriter(newFile)) {
            writer.println(buffer.toString());
        }
    }
    Files.delete(fileToSplit.toPath());
}
Method to generate the file name:
public String generateFileName(int numFile) {
    String extension = ".txt";
    return "myFile" + numFile + extension;
}
Have a counter to count the number of entries. Let's say one entry per line.
Step 1: Initially, create a new subfile and set counter = 0.
Step 2: Increment the counter as you read each entry from the source file into a buffer.
Step 3: When the counter reaches the limit of entries you want in each subfile, flush the buffer's contents to the subfile and close it.
Step 4: Jump back to step 1 while the source file still has data to read.
A minimal sketch of these steps follows.
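(One entry per line; entriesPerFile and the file-naming scheme are illustrative.)
public static void splitByEntries(File source, int entriesPerFile) throws IOException {
    try (BufferedReader reader = new BufferedReader(new FileReader(source))) {
        String line = reader.readLine();
        int fileNo = 1;
        while (line != null) { // step 4: repeat while the source still has data
            File subFile = new File(source.getPath() + "." + fileNo++); // step 1: new subfile
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(subFile))) {
                int counter = 0; // step 1: reset the counter
                while (line != null && counter < entriesPerFile) {
                    writer.write(line); // step 3: buffered; flushed when the subfile is closed
                    writer.newLine();
                    counter++; // step 2: count each entry read
                    line = reader.readLine();
                }
            }
        }
    }
}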
There's no need to loop twice through the file. You could estimate the size of each chunk as the source file size divided by the number of chunks needed; then you just stop filling each chunk with data once its size exceeds the estimate.
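A sketch of that single-pass estimate (the 1-byte line separator and default charset are simplifying assumptions):
public static void splitByEstimatedSize(File source, int numParts) throws IOException {
    long targetBytes = source.length() / numParts; // estimated chunk size
    try (BufferedReader reader = new BufferedReader(new FileReader(source))) {
        String line;
        int part = 1;
        long written = 0;
        BufferedWriter writer = new BufferedWriter(new FileWriter(source.getPath() + ".part" + part));
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.newLine();
            written += line.getBytes().length + 1; // rough count: assumes a 1-byte separator
            if (written >= targetBytes && part < numParts) { // chunk exceeded its estimate: roll over
                writer.close();
                part++;
                written = 0;
                writer = new BufferedWriter(new FileWriter(source.getPath() + ".part" + part));
            }
        }
        writer.close();
    }
}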
Here is one that worked for me; I used it to split a 10 GB file. It also enables you to add a header and a footer, which is very useful when splitting document-based formats such as XML and JSON, because you need to add the document wrapper to the new split files.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FileSpliter
{
    public static void main(String[] args) throws IOException
    {
        splitTextFiles("D:\\xref.csx", 750000, "", "", null);
    }

    public static void splitTextFiles(String fileName, int maxRows, String header, String footer, String targetDir) throws IOException
    {
        File bigFile = new File(fileName);
        int i = 1;
        String ext = fileName.substring(fileName.lastIndexOf("."));

        String fileNoExt = bigFile.getName().replace(ext, "");
        File newDir = null;
        if (targetDir != null)
        {
            newDir = new File(targetDir);
        }
        else
        {
            newDir = new File(bigFile.getParent() + "\\" + fileNoExt + "_split");
        }
        newDir.mkdirs();

        try (BufferedReader reader = Files.newBufferedReader(Paths.get(fileName)))
        {
            String line = null;
            int lineNum = 1;
            Path splitFile = Paths.get(newDir.getPath() + "\\" + fileNoExt + "_" + String.format("%02d", i) + ext);
            BufferedWriter writer = Files.newBufferedWriter(splitFile, StandardOpenOption.CREATE);

            while ((line = reader.readLine()) != null)
            {
                if (lineNum == 1)
                {
                    System.out.print("new file created '" + splitFile.toString());
                    if (header != null && header.length() > 0)
                    {
                        writer.append(header);
                        writer.newLine();
                    }
                }
                writer.append(line);

                if (lineNum >= maxRows)
                {
                    if (footer != null && footer.length() > 0)
                    {
                        writer.newLine();
                        writer.append(footer);
                    }
                    writer.close();
                    System.out.println(", " + lineNum + " lines written to file");

                    lineNum = 1;
                    i++;
                    splitFile = Paths.get(newDir.getPath() + "\\" + fileNoExt + "_" + String.format("%02d", i) + ext);
                    writer = Files.newBufferedWriter(splitFile, StandardOpenOption.CREATE);
                }
                else
                {
                    writer.newLine();
                    lineNum++;
                }
            }

            if (lineNum <= maxRows) // early exit
            {
                if (footer != null && footer.length() > 0)
                {
                    writer.newLine();
                    lineNum++;
                    writer.append(footer);
                }
            }
            writer.close();
            System.out.println(", " + lineNum + " lines written to file");
        }

        System.out.println("file '" + bigFile.getName() + "' split into " + i + " files");
    }
}
The code below is used to split a big file into smaller files with fewer lines.
// linesPerSplit, inputFilePath and outputFolderPath are defined elsewhere
long linesWritten = 0;
int count = 1;

try {
    File inputFile = new File(inputFilePath);
    InputStream inputFileStream = new BufferedInputStream(new FileInputStream(inputFile));
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputFileStream));

    String line = reader.readLine();

    String fileName = inputFile.getName();
    String outfileName = outputFolderPath + "\\" + fileName;

    while (line != null) {
        File outFile = new File(outfileName + "_" + count + ".split");
        Writer writer = new OutputStreamWriter(new FileOutputStream(outFile));

        while (line != null && linesWritten < linesPerSplit) {
            writer.write(line);
            writer.write(System.lineSeparator()); // keep the line breaks in the split files
            line = reader.readLine();
            linesWritten++;
        }

        writer.close();
        linesWritten = 0; // next file
        count++;          // next file count
    }

    reader.close();
} catch (Exception e) {
    e.printStackTrace();
}
I am a bit late to answer, but here's how I did it:
Approach:
First I determine how many bytes each of the individual files should contain, then I split the large file by bytes. Only one file chunk's worth of data is loaded into memory at a time.
Example: if a 5 GB file is split into 10 files, then only 500 MB worth of bytes are loaded into memory at a time, held in the buffer variable in the splitBySize method below.
Code Explanation:
The method splitFile first gets the number of bytes each individual file chunk should contain by calling the getSizeInBytes method; then it calls the splitBySize method, which splits the large file by size (i.e. maxChunkSize represents the number of bytes each file chunk will contain).
public static List<File> splitFile(File largeFile, int noOfFiles) throws IOException {
    return splitBySize(largeFile, getSizeInBytes(largeFile.length(), noOfFiles));
}

public static List<File> splitBySize(File largeFile, int maxChunkSize) throws IOException {
    List<File> list = new ArrayList<>();
    int numberOfFiles = 0;
    try (InputStream in = Files.newInputStream(largeFile.toPath())) {
        final byte[] buffer = new byte[maxChunkSize];
        int dataRead = in.read(buffer);
        while (dataRead > -1) {
            list.add(stageLocally(buffer, dataRead));
            numberOfFiles++;
            dataRead = in.read(buffer);
        }
    }
    System.out.println("Number of files generated: " + numberOfFiles);
    return list;
}

private static int getSizeInBytes(long totalBytes, int numberOfFiles) {
    if (totalBytes % numberOfFiles != 0) {
        totalBytes = ((totalBytes / numberOfFiles) + 1) * numberOfFiles;
    }
    long x = totalBytes / numberOfFiles;
    if (x > Integer.MAX_VALUE) {
        throw new NumberFormatException("Byte chunk too large");
    }
    return (int) x;
}
Full Code:
public class StackOverflow {
    private static final String INPUT_FILE_PATH = "/Users/malkesingh/Downloads/5MB.zip";
    private static final String TEMP_DIRECTORY = "/Users/malkesingh/temp";

    public static void main(String[] args) throws IOException {
        File input = new File(INPUT_FILE_PATH);
        File outPut = fileJoin2(splitFile(input, 5));
        try (InputStream in = Files.newInputStream(input.toPath()); InputStream out = Files.newInputStream(outPut.toPath())) {
            System.out.println(IOUtils.contentEquals(in, out));
        }
    }

    public static List<File> splitFile(File largeFile, int noOfFiles) throws IOException {
        return splitBySize(largeFile, getSizeInBytes(largeFile.length(), noOfFiles));
    }

    public static List<File> splitBySize(File largeFile, int maxChunkSize) throws IOException {
        List<File> list = new ArrayList<>();
        int numberOfFiles = 0;
        try (InputStream in = Files.newInputStream(largeFile.toPath())) {
            final byte[] buffer = new byte[maxChunkSize];
            int dataRead = in.read(buffer);
            while (dataRead > -1) {
                list.add(stageLocally(buffer, dataRead));
                numberOfFiles++;
                dataRead = in.read(buffer);
            }
        }
        System.out.println("Number of files generated: " + numberOfFiles);
        return list;
    }

    private static int getSizeInBytes(long totalBytes, int numberOfFiles) {
        if (totalBytes % numberOfFiles != 0) {
            totalBytes = ((totalBytes / numberOfFiles) + 1) * numberOfFiles;
        }
        long x = totalBytes / numberOfFiles;
        if (x > Integer.MAX_VALUE) {
            throw new NumberFormatException("Byte chunk too large");
        }
        return (int) x;
    }

    private static File stageLocally(byte[] buffer, int length) throws IOException {
        File outPutFile = File.createTempFile("temp-", "split", new File(TEMP_DIRECTORY));
        try (FileOutputStream fos = new FileOutputStream(outPutFile)) {
            fos.write(buffer, 0, length);
        }
        return outPutFile;
    }

    public static File fileJoin2(List<File> list) throws IOException {
        File outPutFile = File.createTempFile("temp-", "unsplit", new File(TEMP_DIRECTORY));
        FileOutputStream fos = new FileOutputStream(outPutFile);
        for (File file : list) {
            Files.copy(file.toPath(), fos);
        }
        fos.close();
        return outPutFile;
    }
}
import java.util.*;
import java.io.*;

public class task13 {
    public static void main(String[] args) throws IOException {
        Scanner s = new Scanner(System.in);
        System.out.print("Enter path:");
        String a = s.next();
        File f = new File(a + ".txt");
        Scanner st = new Scanner(f);
        System.out.println(f.canRead() + "\n" + f.canWrite());
        long l = f.length();
        System.out.println("Length is:" + l);
        System.out.print("Enter no. of partitions:");
        int p = s.nextInt();
        long x = l / p;
        st.useDelimiter("\\Z");
        String t = st.next();
        int j = 0;
        System.out.println("Each File Length is:" + x);
        for (int i = 1; i <= p; i++) {
            File ft = new File(a + "-" + i + ".txt");
            ft.createNewFile();
            int g = (j * (int) x);
            int h = (j + 1) * (int) x;
            if (g <= l && h <= l) {
                FileWriter fw = new FileWriter(a + "-" + i + ".txt");
                String v = t.substring(g, h);
                fw.write(v);
                j++;
                fw.close();
            }
        }
    }
}
