I have a boolean method that compares two files. It maps a block of each file into a byte buffer and checks the blocks for equality.
If the blocks are equal, it gets the next block; if the position (point) exceeds the file size and all blocks were equal, it returns true.
It works on small files (10 MB), but has trouble with big ones.
private static boolean getFiles(File file1, File file2) throws IOException {
    FileChannel channel1 = new FileInputStream(file1).getChannel();
    FileChannel channel2 = new FileInputStream(file2).getChannel();
    int SIZE;
    MappedByteBuffer buffer1, buffer2;
    for (int point = 0; point < channel1.size(); point += SIZE) {
        SIZE = (int) Math.min((4096 * 1024), channel1.size() - point);
        buffer1 = channel1.map(FileChannel.MapMode.READ_ONLY, point, SIZE);
        buffer2 = channel2.map(FileChannel.MapMode.READ_ONLY, point, SIZE);
        if (!buffer1.equals(buffer2)) {
            return false;
        }
    }
    return true;
}
How can I modify it? Should I change the size of the blocks?
If file2 is smaller than file1, you will get an error when trying to map data past the end of file2, at this line:
buffer2 = channel2.map(FileChannel.MapMode.READ_ONLY, point, SIZE);
Apart from the few corner cases that you missed, using a direct-allocated ByteBuffer is supposed to be faster than your method :)
public static void main(String[] args) throws IOException {
    final File file1 = new File(args[0]);
    final File file2 = new File(args[1]);
    // check that the files exist and are not empty
    if (!file1.exists() || !file2.exists()
            || file1.length() == 0 || file2.length() == 0) {
        System.out.println("ILLEGAL FILES");
        return;
    }
    // if the lengths differ, the files are obviously not the same
    if (file1.length() != file2.length()) {
        System.out.println("DIFFERENT SIZE");
        return;
    }
    final FileChannel channel1 = new FileInputStream(file1).getChannel();
    final FileChannel channel2 = new FileInputStream(file2).getChannel();
    // direct ByteBuffers for faster IO
    final ByteBuffer byteBuffer1 = ByteBuffer.allocateDirect(128 * 1024);
    final ByteBuffer byteBuffer2 = ByteBuffer.allocateDirect(128 * 1024);
    System.out.println("Starting Compare");
    while (true) {
        int read1, read2 = 0;
        read1 = channel1.read(byteBuffer1);
        if (read1 == -1) break;
        // make sure the same number of bytes has been read from channel2
        while (read2 < read1 && read2 >= 0) {
            read2 += (channel2.read(byteBuffer2));
        }
        byteBuffer1.flip();
        byteBuffer2.flip();
        if (byteBuffer1.compareTo(byteBuffer2) != 0) {
            System.out.println("NOT SAME");
            return;
        }
        byteBuffer1.clear();
        byteBuffer2.clear();
    }
    System.out.println("SAME :)");
}
Is there any way to compare two files in Android?
For example: I have two files under the same folder which are the same.
They are the same (also in size), but their names are like
myFileA.pdf and myFileB.pdf. So how can I identify whether they are
the same or not?
What I have already tried:
compareTo() method: I tried myFileA.compareTo(myFileB), but that gives weird values like -1, -2, etc. I think those values depend on the files' paths (File.compareTo compares pathnames, not content).
myFile.length(): but in some rare cases (very rare cases), two different files can have the same size, so I don't think this is a reliable check on its own.
NOTE: I said the files are under the same folder just as an example; they can be anywhere, e.g. myFileA.pdf can be in
NewFolder1 and myFileB.pdf in NewFolder2.
Some time ago I wrote a utility to compare the content of two streams in an efficient way: it stops the comparison as soon as the first difference is found.
Here is the code, which I think is fairly self-explanatory:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
public class FileComparator implements Comparator<File> {

    @Override
    public int compare(File file1, File file2) {
        // one or both null
        if (file1 == file2) {
            return 0;
        } else if (file1 == null && file2 != null) {
            return -1;
        } else if (file1 != null && file2 == null) {
            return 1;
        }
        if (file1.isDirectory() || file2.isDirectory()) {
            throw new IllegalArgumentException("Unable to compare directory content");
        }
        // not same size
        if (file1.length() < file2.length()) {
            return -1;
        } else if (file1.length() > file2.length()) {
            return 1;
        }
        try {
            return compareContent(file1, file2);
        } catch (IOException e) {
            throw new RuntimeException(e.getMessage(), e);
        }
    }

    private int bufferSize(long fileLength) {
        int multiple = (int) (fileLength / 1024);
        if (multiple <= 1) {
            return 1024;
        } else if (multiple <= 8) {
            return 1024 * 2;
        } else if (multiple <= 16) {
            return 1024 * 4;
        } else if (multiple <= 32) {
            return 1024 * 8;
        } else if (multiple <= 64) {
            return 1024 * 16;
        } else {
            return 1024 * 64;
        }
    }

    private int compareContent(File file1, File file2) throws IOException {
        final int BUFFER_SIZE = bufferSize(file1.length());
        // check content
        try (BufferedInputStream is1 = new BufferedInputStream(new FileInputStream(file1), BUFFER_SIZE);
             BufferedInputStream is2 = new BufferedInputStream(new FileInputStream(file2), BUFFER_SIZE)) {
            byte[] b1 = new byte[BUFFER_SIZE];
            byte[] b2 = new byte[BUFFER_SIZE];
            int read1 = -1;
            int read2 = -1;
            int read = -1;
            do {
                read1 = is1.read(b1);
                read2 = is2.read(b2);
                if (read1 < read2) {
                    return -1;
                } else if (read1 > read2) {
                    return 1;
                } else {
                    // read1 equals read2
                    read = read1;
                }
                if (read >= 0) {
                    if (read != BUFFER_SIZE) {
                        // clear the part of the buffers not filled by the read
                        Arrays.fill(b1, read, BUFFER_SIZE, (byte) 0);
                        Arrays.fill(b2, read, BUFFER_SIZE, (byte) 0);
                    }
                    // compare the content of the two buffers
                    if (!Arrays.equals(b1, b2)) {
                        return new String(b1).compareTo(new String(b2));
                    }
                }
            } while (read >= 0);
            // no difference found
            return 0;
        }
    }
}
Comparing two files:
public static boolean compareFiles(File file1, File file2) {
    byte[] buffer1 = new byte[1024];
    byte[] buffer2 = new byte[1024];
    try (FileInputStream fileInputStream1 = new FileInputStream(file1);
         FileInputStream fileInputStream2 = new FileInputStream(file2)) {
        int read1;
        while ((read1 = fileInputStream1.read(buffer1)) != -1) {
            // read the same amount from the second file before comparing
            int read2 = fileInputStream2.read(buffer2);
            if (read1 != read2 || !Arrays.equals(buffer1, buffer2)) {
                return false;
            }
        }
        return true;
    } catch (Exception ignore) {
        return false;
    }
}
Of course, before you do that, you should compare the file sizes. Only if they match do you need to compare the contents.
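Putting the size check and the content check together, a minimal sketch might look like this (the class and method names are illustrative, not from the answer above; reading whole files with `readAllBytes` is only reasonable for small files):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class QuickCompare {
    // Returns true if both files have the same length and the same bytes.
    // The cheap size check runs first; bytes are only read when sizes match.
    public static boolean sameContent(Path p1, Path p2) throws IOException {
        if (Files.size(p1) != Files.size(p2)) {
            return false; // different sizes can never match
        }
        return Arrays.equals(Files.readAllBytes(p1), Files.readAllBytes(p2));
    }
}
```

For big files you would stream and compare block by block instead, as in the answers above.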
I have a method which accepts a file and a chunk size and returns a list of chunked files. The main problem is that a line in the file can be broken across chunks; for example, the main file has these lines:
|1|aaa|bbb|ccc|
|2|ggg|ddd|eee|
After the split, one file could contain:
|1|aaa|bbb
In another file:
|ccc|2|
|ggg|ddd|eee|
Here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
    int counter = 1;
    List<File> files = new ArrayList<>();
    int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
    byte[] buffer = new byte[sizeOfChunk];
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        String name = file.getName();
        int tmp = 0;
        while ((tmp = bis.read(buffer)) > 0) {
            File newFile = new File(file.getParent(), name + "."
                    + String.format("%03d", counter++));
            try (FileOutputStream out = new FileOutputStream(newFile)) {
                out.write(buffer, 0, tmp);
            }
            files.add(newFile);
        }
    }
    return files;
}
Should I use the RandomAccessFile class for the above purpose? The main file is really big - more than 5 GB.
If you don't mind having chunks of different lengths (each <= sizeOfChunk, but as close to it as possible), then here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
    int counter = 1;
    List<File> files = new ArrayList<File>();
    int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
    String eof = System.lineSeparator();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String name = file.getName();
        String line = br.readLine();
        while (line != null) {
            File newFile = new File(file.getParent(), name + "."
                    + String.format("%03d", counter++));
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(newFile))) {
                int fileSize = 0;
                while (line != null) {
                    byte[] bytes = (line + eof).getBytes(Charset.defaultCharset());
                    if (fileSize + bytes.length > sizeOfChunk)
                        break;
                    out.write(bytes);
                    fileSize += bytes.length;
                    line = br.readLine();
                }
            }
            files.add(newFile);
        }
    }
    return files;
}
The only problem here is the file charset, which is the default system charset in this example. If you want to be able to change it, let me know; I'll add a third parameter to the "splitFile" function for it.
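As a sketch of what that third parameter could look like (the class name, the `charset` parameter, and the `fileSize > 0` guard against single lines longer than the chunk size are my additions, not part of the answer above):

```java
import java.io.*;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class LineChunker {
    // Same idea as the answer above, but with an explicit charset, so the
    // byte count per line matches the encoding actually used in the file.
    public static List<File> splitFile(File file, int sizeOfFileInMB, Charset charset)
            throws IOException {
        int counter = 1;
        List<File> files = new ArrayList<>();
        long sizeOfChunk = 1024L * 1024L * sizeOfFileInMB;
        String eol = System.lineSeparator();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String name = file.getName();
            String line = br.readLine();
            while (line != null) {
                File newFile = new File(file.getParent(),
                        name + "." + String.format("%03d", counter++));
                try (OutputStream out =
                        new BufferedOutputStream(new FileOutputStream(newFile))) {
                    long fileSize = 0;
                    while (line != null) {
                        byte[] bytes = (line + eol).getBytes(charset);
                        // keep lines whole; the fileSize > 0 guard avoids an
                        // endless loop when one line exceeds the chunk size
                        if (fileSize + bytes.length > sizeOfChunk && fileSize > 0)
                            break;
                        out.write(bytes);
                        fileSize += bytes.length;
                        line = br.readLine();
                    }
                }
                files.add(newFile);
            }
        }
        return files;
    }
}
```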
Just in case anyone is interested in a Kotlin version.
It creates an iterator of ByteArray chunks:
class ByteArrayReader(val input: InputStream, val chunkSize: Int, val bufferSize: Int = 1024 * 8) : Iterator<ByteArray> {
    var eof: Boolean = false

    init {
        if ((chunkSize % bufferSize) != 0) {
            throw RuntimeException("ChunkSize(${chunkSize}) should be a multiple of bufferSize (${bufferSize})")
        }
    }

    override fun hasNext(): Boolean = !eof

    override fun next(): ByteArray {
        var buffer = ByteArray(bufferSize)
        var chunkWriter = ByteArrayOutputStream(chunkSize) // no need to close - implementation is empty
        var bytesRead = 0
        var offset = 0
        while (input.read(buffer).also { bytesRead = it } > 0) {
            if (chunkWriter.use { out ->
                    out.write(buffer, 0, bytesRead)
                    out.flush()
                    offset += bytesRead
                    offset == chunkSize
                }) {
                return chunkWriter.toByteArray()
            }
        }
        eof = true
        return chunkWriter.toByteArray()
    }
}
Split a file into multiple chunks (an in-memory operation). Here I'm splitting any file into chunks of 500 kB (500,000 bytes) and adding them to a list:
public static List<ByteArrayOutputStream> splitFile(File f) {
    List<ByteArrayOutputStream> datalist = new ArrayList<>();
    try {
        int sizeOfFiles = 500000;
        byte[] buffer = new byte[sizeOfFiles];
        try (FileInputStream fis = new FileInputStream(f);
             BufferedInputStream bis = new BufferedInputStream(fis)) {
            int bytesAmount = 0;
            while ((bytesAmount = bis.read(buffer)) > 0) {
                try (OutputStream out = new ByteArrayOutputStream()) {
                    out.write(buffer, 0, bytesAmount);
                    out.flush();
                    datalist.add((ByteArrayOutputStream) out);
                }
            }
        }
    } catch (Exception e) {
        // handle the error
    }
    return datalist;
}
Split files into chunks depending on your chunk size:
val f = FileInputStream(file)
val data = ByteArray(f.available()) // size of the original file
var subData: ByteArray
f.read(data)
var start = 0
var end = CHUNK_SIZE
val max = data.size
if (max > 0) {
    while (end < max) {
        subData = data.copyOfRange(start, end)
        start = end
        end += CHUNK_SIZE
        if (end >= max) {
            end = max
        }
        // function to upload your chunk
        uploadFileInChunk(subData, isLast = false)
    }
    // for the last chunk: copyOfRange's end index is exclusive, so use max directly
    subData = data.copyOfRange(start, max)
    uploadFileInChunk(subData, isLast = true)
}
If you are taking the file from the user through an intent, you may get the file URI as content, so in that case:
Uri uri = data.getData();
InputStream inputStream = getContext().getContentResolver().openInputStream(uri);
fileInBytes = IOUtils.toByteArray(inputStream);
Add the dependency in your build.gradle to use IOUtils:
compile 'commons-io:commons-io:2.11.0'
Now make a small modification to the above code to send your file to the server.
var subData: ByteArray
var start = 0
var end = CHUNK_SIZE
val max = fileInBytes.size
if (max > 0) {
    while (end < max) {
        subData = fileInBytes.copyOfRange(start, end)
        start = end
        end += CHUNK_SIZE
        if (end >= max) {
            end = max
        }
        uploadFileInChunk(subData, isLast = false)
    }
    // for the last chunk: copyOfRange's end index is exclusive, so use max directly
    subData = fileInBytes.copyOfRange(start, max)
    uploadFileInChunk(subData, isLast = true)
}
Sorry for my English. I want to read a large file, but when I read it an OutOfMemoryError occurs. I do not understand how to work with memory in the application. The following code does not work:
try {
    StringBuilder fileData = new StringBuilder(1000);
    BufferedReader reader = new BufferedReader(new FileReader(file));
    char[] buf = new char[8192];
    int bytesread = 0,
        bytesBuffered = 0;
    while ((bytesread = reader.read(buf)) > -1) {
        String readData = String.valueOf(buf, 0, bytesread);
        bytesBuffered += bytesread;
        fileData.append(readData); // this is where the error occurs
        if (bytesBuffered > 1024 * 1024) {
            bytesBuffered = 0;
        }
    }
    System.out.println(fileData.toString().toCharArray());
} finally {
}
You need to pre-allocate a large buffer to avoid reallocation:
File file = ...;
StringBuilder fileData = new StringBuilder((int) file.length());
And run with a large heap size:
java -Xmx2G
==== update
A while loop using a small buffer doesn't need much memory to run. Treat the input like a stream and match your search string against the stream; it's a really simple state machine. If you need to search for multiple words, you can find a trie implementation (one that supports streaming) for that.
// the match state model
...xxxxxxabxxxxxaxxxxxabcdexxxx...
ab a abcd
File file = new File("path_to_your_file");
String yourSearchWord = "abcd";
int matchIndex = 0;
boolean matchPrefix = false;
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    int chr;
    while ((chr = reader.read()) != -1) {
        if (matchPrefix == false) {
            char searchChar = yourSearchWord.charAt(0);
            if (chr == searchChar) {
                matchPrefix = true;
                matchIndex = 0;
            }
        } else {
            char searchChar = yourSearchWord.charAt(++matchIndex);
            if (chr == searchChar) {
                if (matchIndex == yourSearchWord.length() - 1) {
                    // match!!
                    System.out.println("match: " + matchIndex);
                    matchPrefix = false;
                    matchIndex = 0;
                }
            } else {
                matchPrefix = false;
                matchIndex = 0;
            }
        }
    }
}
Try this. This might be helpful:
try {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    String txt;
    while ((txt = reader.readLine()) != null) {
        System.out.println(txt);
    }
} catch (Exception e) {
    System.out.println("Error : " + e.getMessage());
}
You should not hold such big files in memory, because you will run out of it, as you have seen. Since you are on Java 7, you need to read the file manually as a stream and check the content on the fly. Otherwise you could use the stream API of Java 8. This is just an example. It works, but keep in mind that the position of the found word could vary due to encoding issues, so this is not production code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class FileReader
{
    private static String wordToFind = "SEARCHED_WORD";
    private static File file = new File("YOUR_FILE");
    private static int currentMatchingPosition;
    private static int foundAtPosition = -1;
    private static int charsRead;

    public static void main(String[] args) throws IOException
    {
        try (FileInputStream fis = new FileInputStream(file))
        {
            System.out.println("Total size to read (in bytes) : " + fis.available());
            int c;
            while ((c = fis.read()) != -1)
            {
                charsRead++;
                checkContent(c);
            }
            if (foundAtPosition > -1)
            {
                System.out.println("Found word at position: " + (foundAtPosition - wordToFind.length()));
            }
            else
            {
                System.out.println("Didn't find the word!");
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    private static void checkContent(int c)
    {
        if (currentMatchingPosition >= wordToFind.length())
        {
            // already found...
            return;
        }
        if (wordToFind.charAt(currentMatchingPosition) == (char) c)
        {
            foundAtPosition = charsRead;
            currentMatchingPosition++;
        }
        else
        {
            currentMatchingPosition = 0;
            foundAtPosition = -1;
        }
    }
}
I have a big file encoded in Windows-1250. The lines are just single Polish words, one after another:
zając
dzieło
kiepsko
etc
I need to choose 10 unique random lines from this file reasonably fast. I did this, but when I print these words they have the wrong encoding [zaj?c, dzie?o, kiepsko...], and I need UTF-8. So I changed my code to read bytes from the file instead of reading lines, and my efforts ended up with this code:
public List<String> getRandomWordsFromDictionary(int number) {
    List<String> randomWords = new ArrayList<String>();
    File file = new File("file.txt");
    try {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        for (int i = 0; i < number; i++) {
            Random random = new Random();
            int startPosition;
            String word;
            do {
                startPosition = random.nextInt((int) raf.length());
                raf.seek(startPosition);
                raf.readLine();
                word = grabWordFromDictionary(raf);
            } while (checkProbability(word));
            System.out.println("Word: " + word);
            randomWords.add(word);
        }
    } catch (IOException ioe) {
        logger.error(ioe.getMessage(), ioe);
    }
    return randomWords;
}
private String grabWordFromDictionary(RandomAccessFile raf) throws IOException {
    byte[] wordInBytes = new byte[15];
    int counter = 0;
    byte wordByte;
    char wordChar;
    String convertedWord;
    boolean stop = true;
    do {
        wordByte = raf.readByte();
        wordChar = (char) wordByte;
        if (wordChar == '\n' || wordChar == '\r' || wordChar == -1) {
            stop = false;
        } else {
            wordInBytes[counter] = wordByte;
            counter++;
        }
    } while (stop);
    if (wordInBytes.length > 0) {
        convertedWord = new String(wordInBytes, "UTF8");
        return convertedWord;
    } else {
        return null;
    }
}
private boolean checkProbability(String word) {
    if (word.length() > MAX_LENGTH_LINE) {
        return true;
    } else {
        double randomDouble = new Random().nextDouble();
        double probability = (double) MIN_LENGTH_LINE / word.length();
        return probability <= randomDouble;
    }
}
But something is wrong. Could you look at this code and help me? Maybe you see some errors that are obvious, but not to me? I would appreciate any help.
Your file is in Windows-1250, so you need to decode it as Windows-1250, not UTF-8. You can save it as UTF-8 after the decoding process, though.
Charset w1250 = Charset.forName("Windows-1250");
convertedWord = new String(wordInBytes, w1250);
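If random access isn't strictly required, a simpler route is to let the reader do the decoding and pick the random lines afterwards (a sketch; the class name and method are illustrative, and it holds all lines in memory, which is fine for a word list):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RandomLines {
    // Read every line decoded as Windows-1250, shuffle, and take the first n.
    // Simple and correct, at the cost of keeping all lines in memory.
    public static List<String> pickRandom(String path, int n) throws IOException {
        Charset w1250 = Charset.forName("Windows-1250");
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), w1250))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        Collections.shuffle(lines); // random order => first n are a random sample
        return lines.subList(0, Math.min(n, lines.size()));
    }
}
```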
I have a file reader which reads an entire file and writes its bits.
I have this class which helps with the reading:
import java.io.*;
public class FileReader extends ByteArrayInputStream {

    private int bitsRead;
    private int bitPosition;
    private int currentByte;
    private int myMark;
    private final static int NUM_BITS_IN_BYTE = 8;
    private final static int END_POSITION = -1;
    private boolean readingStarted;

    /**
     * Create a BitInputStream for a File on disk.
     */
    public FileReader(byte[] buf) throws IOException {
        super(buf);
        myMark = 0;
        bitsRead = 0;
        bitPosition = NUM_BITS_IN_BYTE - 1;
        currentByte = 0;
        readingStarted = false;
    }

    /**
     * Read a binary "1" or "0" from the File.
     */
    public int readBit() throws IOException {
        int theBit = -1;
        if (bitPosition == END_POSITION || !readingStarted) {
            currentByte = super.read();
            bitPosition = NUM_BITS_IN_BYTE - 1;
            readingStarted = true;
        }
        theBit = (0x01 << bitPosition) & currentByte;
        bitPosition--;
        if (theBit > 0) {
            theBit = 1;
        }
        return (theBit);
    }

    /**
     * Return the next byte in the File as the lowest 8 bits of an int.
     */
    public int read() {
        currentByte = super.read();
        bitPosition = END_POSITION;
        readingStarted = true;
        return (currentByte);
    }

    /**
     * Mark the current position in the stream.
     */
    public void mark(int readAheadLimit) {
        super.mark(readAheadLimit);
        myMark = bitPosition;
    }

    /**
     * Add needed functionality to super's reset() method. Reset to
     * the last valid position marked in the input stream.
     */
    public void reset() {
        super.pos = super.mark - 1;
        currentByte = super.read();
        bitPosition = myMark;
    }

    /**
     * Returns the number of bits still available to be read.
     */
    public int availableBits() throws IOException {
        return ((super.available() * 8) + (bitPosition + 1));
    }
}
In the class where I call this, I do:
FileInputStream inputStream = new FileInputStream(file);
byte[] fileBits = new byte[inputStream.available()];
inputStream.read(fileBits, 0, inputStream.available());
inputStream.close();
FileReader bitIn = new FileReader(fileBits);
and this works correctly.
However, I have problems with big files above 100 MB, because a byte[] has a limited capacity.
So I want to read bigger files. Could someone suggest how I can improve this code?
Thanks.
If scaling to large file sizes is important, you'd be better off not reading the entire file into memory. The downside is that handling the IOException in more locations can be a little messy. Also, it doesn't look like your application needs something that implements the InputStream API, it just needs the readBit() method. So, you can safely encapsulate, rather than extend, the InputStream.
class FileReader {

    private final InputStream src;
    private final byte[] bits = new byte[8192];
    private int len;
    private int pos;

    FileReader(InputStream src) {
        this.src = src;
    }

    int readBit() throws IOException {
        int idx = pos / 8;
        if (idx >= len) {
            int n = src.read(bits);
            if (n < 0)
                return -1;
            len = n;
            pos = 0;
            idx = 0;
        }
        return ((bits[idx] & (1 << (pos++ % 8))) == 0) ? 0 : 1;
    }
}
Usage would look similar.
FileInputStream src = new FileInputStream(file);
try {
    FileReader bitIn = new FileReader(src);
    ...
} finally {
    src.close();
}
If you really do want to read in the entire file, and you are working with an actual file, you can query the length of the file first.
File file = new File(path);
if (file.length() > Integer.MAX_VALUE)
    throw new IllegalArgumentException("File is too large: " + file.length());
int len = (int) file.length();
FileInputStream inputStream = new FileInputStream(file);
try {
    byte[] fileBits = new byte[len];
    for (int pos = 0; pos < len; ) {
        int n = inputStream.read(fileBits, pos, len - pos);
        if (n < 0)
            throw new EOFException();
        pos += n;
    }
    /* Use bits. */
    ...
} finally {
    inputStream.close();
}
org.apache.commons.io.IOUtils.copy(InputStream in, OutputStream out)
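For context, `IOUtils.copy` essentially loops a buffered read/write until the end of the stream; a plain-JDK equivalent of the same idea looks like this (the class name is illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Roughly what org.apache.commons.io.IOUtils.copy does:
    // read into a fixed-size buffer and write until end of stream.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total; // number of bytes copied
    }
}
```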