Is there any way to compare two files in Android?
For example: I have two files in the same folder with identical content.
They are the same (also in size), but their names differ, like
myFileA.pdf and myFileB.pdf. How can I determine whether they are
the same or not?
What I have already tried:
compareTo() method: I tried myFileA.compareTo(myFileB), but that gives weird values like -1, -2, etc. I think those values depend on the files' paths.
myFile.length(): in some rare cases (very rare cases), two different files can have the same size, so I think this is not a proper way.
NOTE: I said the files are in the same folder just as an example; they can be anywhere, e.g. myFileA.pdf can be in
NewFolder1 and myFileB.pdf can be in NewFolder2.
Some time ago I wrote a utility that compares the content of two streams efficiently: it stops the comparison as soon as the first difference is found.
Here is the code, which I think is quite self-explanatory:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
public class FileComparator implements Comparator<File> {
@Override
public int compare(File file1, File file2) {
// one or both null
if (file1 == file2) {
return 0;
} else if (file1 == null && file2 != null) {
return -1;
} else if (file1 != null && file2 == null) {
return 1;
}
if (file1.isDirectory() || file2.isDirectory()) {
throw new IllegalArgumentException("Unable to compare directory content");
}
// not same size
if (file1.length() < file2.length()) {
return -1;
} else if (file1.length() > file2.length()) {
return 1;
}
try {
return compareContent(file1, file2);
} catch (IOException e) {
throw new RuntimeException(e.getMessage(), e);
}
}
private int bufferSize(long fileLength) {
int multiple = (int) (fileLength / 1024);
if (multiple <= 1) {
return 1024;
} else if (multiple <= 8) {
return 1024 * 2;
} else if (multiple <= 16) {
return 1024 * 4;
} else if (multiple <= 32) {
return 1024 * 8;
} else if (multiple <= 64) {
return 1024 * 16;
} else {
return 1024 * 64;
}
}
private int compareContent(File file1, File file2) throws IOException {
final int BUFFER_SIZE = bufferSize(file1.length());
// check content
try (BufferedInputStream is1 = new BufferedInputStream(new FileInputStream(file1), BUFFER_SIZE); BufferedInputStream is2 = new BufferedInputStream(new FileInputStream(file2), BUFFER_SIZE);) {
byte[] b1 = new byte[BUFFER_SIZE];
byte[] b2 = new byte[BUFFER_SIZE];
int read1 = -1;
int read2 = -1;
int read = -1;
do {
read1 = is1.read(b1);
read2 = is2.read(b2);
if (read1 < read2) {
return -1;
} else if (read1 > read2) {
return 1;
} else {
// read1 is equals to read2
read = read1;
}
if (read >= 0) {
if (read != BUFFER_SIZE) {
// clear the buffer not filled from the read
Arrays.fill(b1, read, BUFFER_SIZE, (byte) 0);
Arrays.fill(b2, read, BUFFER_SIZE, (byte) 0);
}
// compare the content of the two buffers
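if (!Arrays.equals(b1, b2)) {
// order by the first differing byte, compared as unsigned values;
// decoding the buffers to String could hide differences between invalid byte sequences
for (int i = 0; i < read; i++) {
if (b1[i] != b2[i]) {
return (b1[i] & 0xff) - (b2[i] & 0xff);
}
}
}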
}
} while (read >= 0);
// no difference found
return 0;
}
}
}
Comparing two files:
public static boolean compareFiles(File file1, File file2) {
    // try-with-resources closes both streams, even on an early return
    try (InputStream in1 = new BufferedInputStream(new FileInputStream(file1));
         InputStream in2 = new BufferedInputStream(new FileInputStream(file2))) {
        int b1;
        // compare byte by byte; the buffered streams keep this efficient
        while ((b1 = in1.read()) != -1) {
            if (b1 != in2.read()) {
                return false; // differing byte, or file2 ended early
            }
        }
        // equal only if file2 has reached its end as well
        return in2.read() == -1;
    } catch (IOException ignore) {
        return false;
    }
}
Of course, before you do that, you should compare the file sizes. Only if they match do you need to compare the contents.
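As a rough illustration of that order (size first, content second), here is a minimal sketch. The class name FileEquality and the method sameContent are made up for this example, and it assumes both arguments are regular, readable files.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
class FileEquality {

    /** Returns true only if both files have the same length and the same bytes. */
    static boolean sameContent(File a, File b) throws IOException {
        // Cheap check first: files of different length cannot be equal.
        if (a.length() != b.length()) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(new FileInputStream(a));
             InputStream inB = new BufferedInputStream(new FileInputStream(b))) {
            int byteA;
            // Stop at the first differing byte.
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // Both streams must be exhausted at the same time.
            return inB.read() == -1;
        }
    }
}
```

Calling FileEquality.sameContent(myFileA, myFileB) would then answer the original question regardless of the file names or folders.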
I want to read big CSV files (approx. 1 GB) in UTF-8 quickly, line by line. I have created a class for it, but it doesn't work properly. In UTF-8, a Cyrillic symbol is encoded as 2 bytes. I use a byte buffer to read the file; for example, it has a length of 10 bytes. So if a symbol is made up of bytes 10 and 11 of the file, it is not decoded correctly :(
public class MyReader extends InputStream {
private FileChannel channel;
private ByteBuffer buffer = ByteBuffer.allocate(10);
private int buffSize = 0;
private int position = 0;
private boolean EOF = false;
private CharBuffer charBuffer;
private MyReader() {}
static MyReader getFromFile(final String path) throws IOException {
MyReader myReader = new MyReader();
myReader.channel = FileChannel.open(Path.of(path),
StandardOpenOption.READ);
myReader.initNewBuffer();
return myReader;
}
private void initNewBuffer() {
try {
buffSize = channel.read(buffer);
buffer.position(0);
charBuffer = Charset.forName("UTF-8").decode(buffer);
buffer.position(0);
} catch (IOException e) {
throw new RuntimeException("Error reading file: {}", e);
}
}
@Override
public int read() throws IOException {
if (EOF) {
return -1;
}
if (position < charBuffer.length()) {
return charBuffer.array()[position++];
} else {
initNewBuffer();
if (buffSize < 1) {
EOF = true;
} else {
position = 0;
}
return read();
}
}
public char[] readLine() throws IOException {
int readResult = 0;
int startPos = position;
while (readResult != -1) {
readResult = read();
}
return Arrays.copyOfRange(charBuffer.array(), startPos, position);
}
}
A bad solution, but it works:
private void initNewBuffer() {
try {
buffSize = channel.read(buffer);
buffer.position(0);
charBuffer = StandardCharsets.UTF_8.decode(buffer);
if (buffSize > 0) {
byte edgeByte = buffer.array()[buffSize - 1];
if (edgeByte == (byte) 0xd0 ||
edgeByte == (byte) 0xd1 ||
edgeByte == (byte) 0xc2 ||
edgeByte == (byte) 0xd2 ||
edgeByte == (byte) 0xd3
) {
channel.position(channel.position() - 1);
charBuffer.limit(charBuffer.limit()-1);
}
}
buffer.position(0);
} catch (IOException e) {
throw new RuntimeException("Error reading file: {}", e);
}
}
First: the gain is questionable.
The Files class has many nice methods that are fast enough for production use.
Bytes with the high bit set (negative as a Java byte) are part of a UTF-8 multibyte sequence.
Bytes whose two high bits are 10 are continuation bytes.
Sequences are at most 4 bytes long today (the original design allowed up to 6, but RFC 3629 restricts UTF-8 to 4).
So if the next buffer starts with continuation bytes, they belong to the character that began in the previous buffer.
I gladly leave the programming logic to you.
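One way to get that logic without hand-written bit checks is to let java.nio.charset.CharsetDecoder hold on to the incomplete bytes. The sketch below is only an illustration under assumptions: ChunkedUtf8 and decodeInChunks are hypothetical names, the input is a byte array rather than a FileChannel, chunkSize is at least 1, and malformed input is replaced rather than reported.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the question's MyReader class.
class ChunkedUtf8 {

    /** Decodes UTF-8 bytes that are fed to the decoder chunkSize bytes at a time. */
    static String decodeInChunks(byte[] input, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // The spare bytes leave room for an unfinished sequence carried over from the last chunk.
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer chars = CharBuffer.allocate(chunkSize + 4);
        StringBuilder out = new StringBuilder();
        int offset = 0;
        do {
            int n = Math.min(chunkSize, input.length - offset);
            bytes.put(input, offset, n);
            offset += n;
            bytes.flip();
            boolean endOfInput = offset == input.length;
            // An incomplete trailing sequence is left unread in 'bytes' while endOfInput is false.
            decoder.decode(bytes, chars, endOfInput);
            // compact() moves those leftover bytes to the front, ahead of the next chunk.
            bytes.compact();
            chars.flip();
            out.append(chars);
            chars.clear();
        } while (offset < input.length);
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }
}
```

With a FileChannel the same loop shape should apply: read into the byte buffer, flip, decode, compact, and only pass endOfInput = true once the channel reports the end of the file.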
I have a lot of massive files that I need to convert to CSV by replacing certain characters.
I am looking for a reliable approach that, given an InputStream, returns an OutputStream with all characters c1 replaced by c2.
The trick here is to read and write in parallel; I can't fit the whole file in memory.
Do I need to run it in a separate thread if I want to read and write at the same time?
Thanks a lot for your advice.
To copy data from an input stream to an output stream, you write data while you're reading it, either a byte (or character) at a time or a line at a time.
Here is an example that reads in a file, converting all 'x' characters to 'y'.
BufferedInputStream in = new BufferedInputStream(new FileInputStream("input.dat"));
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream("output.dat"));
int ch;
while((ch = in.read()) != -1) {
if (ch == 'x') ch = 'y';
out.write(ch);
}
out.close();
in.close();
Or, if you can use a Reader and process a line at a time, you can use this approach:
BufferedReader reader = new BufferedReader(new FileReader("input.dat"));
PrintWriter writer = new PrintWriter(
new BufferedOutputStream(new FileOutputStream("output.dat")));
String str;
while ((str = reader.readLine()) != null) {
str = str.replace('x', 'y'); // replace character at a time
str = str.replace("abc", "ABC"); // replace string sequence
writer.println(str);
}
writer.close();
reader.close();
BufferedInputStream and BufferedReader read ahead and, by default, keep 8K of data in a buffer for performance (bytes for BufferedInputStream, characters for BufferedReader). Very large files can be processed while only keeping that 8K in memory at a time.
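To show the same idea with arbitrary streams instead of file names, here is a small sketch. ByteReplacer and its replace method are hypothetical names. It assumes c1 and c2 are single-byte characters (for example ASCII delimiters), since replacing individual bytes inside a multibyte encoding would corrupt the text. Reading and writing happen in one loop on one thread.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only.
class ByteReplacer {

    /** Copies source to target, replacing every 'from' byte with 'to'; returns the number of replacements. */
    static long replace(InputStream source, OutputStream target, byte from, byte to) throws IOException {
        InputStream in = new BufferedInputStream(source);
        OutputStream out = new BufferedOutputStream(target);
        long replaced = 0;
        int b;
        // read() returns 0..255, so compare against the unsigned value of 'from'
        while ((b = in.read()) != -1) {
            if (b == (from & 0xff)) {
                b = to & 0xff;
                replaced++;
            }
            out.write(b);
        }
        // Flush the buffered bytes; closing the streams is left to the caller.
        out.flush();
        return replaced;
    }
}
```

For example, ByteReplacer.replace(in, out, (byte) ';', (byte) ',') would turn a semicolon-separated stream into a comma-separated one while holding only the two small buffers in memory.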
FileWriter writer = new FileWriter("Report.csv");
BufferedReader reader = new BufferedReader(new InputStreamReader(YOURSOURCE, Charsets.UTF_8));
String line;
while ((line = reader.readLine()) != null) {
line = line.replace(c1, c2); // c1 and c2 are char variables; Strings are immutable, so keep the result
writer.append(line);
writer.append('\n');
}
writer.flush();
writer.close();
You can find a related answer here: Filter (search and replace) array of bytes in an InputStream
I took @aioobe's answer in that thread and built a replacing input stream module in Java, which you can find in my GitHub gist: https://gist.github.com/lhr0909/e6ac2d6dd6752871eb57c4b083799947
I am putting the source code here as well:
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;
/**
* Created by simon on 8/29/17.
*/
public class ReplacingInputStream extends FilterInputStream {
private Queue<Integer> inQueue, outQueue;
private final byte[] search, replacement;
public ReplacingInputStream(InputStream in, String search, String replacement) {
super(in);
this.inQueue = new LinkedList<>();
this.outQueue = new LinkedList<>();
this.search = search.getBytes();
this.replacement = replacement.getBytes();
}
private boolean isMatchFound() {
Iterator<Integer> iterator = inQueue.iterator();
for (byte b : search) {
if (!iterator.hasNext() || (b & 0xff) != iterator.next()) { // queue holds 0..255 values from read()
return false;
}
}
return true;
}
private void readAhead() throws IOException {
// Work up some look-ahead.
while (inQueue.size() < search.length) {
int next = super.read();
inQueue.offer(next);
if (next == -1) {
break;
}
}
}
@Override
public int read() throws IOException {
// Next byte already determined.
while (outQueue.isEmpty()) {
readAhead();
if (isMatchFound()) {
for (byte a : search) {
inQueue.remove();
}
for (byte b : replacement) {
outQueue.offer(b & 0xff); // keep bytes in 0..255 so 0xFF is not mistaken for end of stream
}
} else {
outQueue.add(inQueue.remove());
}
}
return outQueue.remove();
}
@Override
public int read(byte b[]) throws IOException {
return read(b, 0, b.length);
}
// copied straight from the InputStream implementation, just needed to use `read()` from this class
@Override
public int read(byte b[], int off, int len) throws IOException {
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
int c = read();
if (c == -1) {
return -1;
}
b[off] = (byte)c;
int i = 1;
try {
for (; i < len ; i++) {
c = read();
if (c == -1) {
break;
}
b[off + i] = (byte)c;
}
} catch (IOException ee) {
}
return i;
}
}
Sorry for my English. I am trying to read a big text file really fast, character by character (without using readLine()), but I have not managed to yet. My code:
for(int i = 0; (i = textReader.read()) != -1; ) {
char character = (char) i;
}
It reads a 1 GB text file in 56666 ms. How can I read faster?
UPD
This method reads a 1 GB file in 28833 ms:
FileInputStream fIn = null;
FileChannel fChan = null;
ByteBuffer mBuf;
int count;
try {
fIn = new FileInputStream(textReader);
fChan = fIn.getChannel();
mBuf = ByteBuffer.allocate(128);
do {
count = fChan.read(mBuf);
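if(count != -1) {
mBuf.flip(); // position = 0, limit = number of bytes just read
for(int i = 0; i < count; i++) {
char c = (char)mBuf.get();
}
mBuf.clear(); // make the whole buffer available for the next read
}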
} while(count != -1);
}catch(Exception e) {
}
The fastest way to read input is to use a buffer. Here is an example of a class that has an internal buffer.
class Parser
{
final private int BUFFER_SIZE = 1 << 16;
private DataInputStream din;
private byte[] buffer;
private int bufferPointer, bytesRead;
public Parser(InputStream in)
{
din = new DataInputStream(in);
buffer = new byte[BUFFER_SIZE];
bufferPointer = bytesRead = 0;
}
public int nextInt() throws Exception
{
int ret = 0;
byte c = read();
while (c <= ' ') c = read();
//boolean neg = c == '-';
//if (neg) c = read();
do
{
ret = ret * 10 + c - '0';
c = read();
} while (c > ' ');
//if (neg) return -ret;
return ret;
}
private void fillBuffer() throws Exception
{
bytesRead = din.read(buffer, bufferPointer = 0, BUFFER_SIZE);
if (bytesRead == -1) buffer[0] = -1;
}
byte read() throws Exception // not private, so callers can read single bytes as in the usage example
{
if (bufferPointer == bytesRead) fillBuffer();
return buffer[bufferPointer++];
}
}
This parser has a nextInt() function; if you want the next character, you can call the read() function.
This is the fastest way to read from a file (as far as I know).
You would initialize this parser like this:
Parser p = new Parser(new FileInputStream("text.txt"));
int c;
while((c = p.read()) != -1)
System.out.print((char)c);
This code reads 250 MB in 7782 ms.
Disclaimer:
the code is not mine; it was posted as a solution to a problem on CodeChef by the user 'Kamalakannan CM'.
I would use BufferedReader, which buffers its reads. A short sample:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.CharBuffer;
public class Main {
public static void main(String... args) {
try (FileReader fr = new FileReader("a.txt")) {
try (BufferedReader reader = new BufferedReader(fr)) {
CharBuffer charBuffer = CharBuffer.allocate(8192);
reader.read(charBuffer);
} catch (IOException e) {
e.printStackTrace();
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The single-argument constructor uses a default buffer size of 8192 characters. If you want a different buffer size, you can use the BufferedReader(Reader in, int sz) constructor. Alternatively, you can read into an array buffer:
....
char[] buffer = new char[255];
reader.read(buffer);
....
or read one character at a time:
int ch = reader.read(); // 'char' is a reserved word and cannot be used as a variable name
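Putting the array variant into a complete loop might look like the sketch below. BlockReader and countOccurrences are hypothetical names, and the 64K reader buffer and 8K block are arbitrary choices, not measured optima. Each character is still visited individually, but the per-call overhead of read() is paid once per block instead of once per character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for illustration only.
class BlockReader {

    /** Counts how often 'wanted' occurs, reading the source block by block. */
    static long countOccurrences(Reader source, char wanted) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source, 64 * 1024)) {
            char[] block = new char[8192];
            int n;
            // read(char[], int, int) returns the number of chars read, or -1 at the end
            while ((n = reader.read(block, 0, block.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (block[i] == wanted) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```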
Sorry for my English. I want to read a large file, but an OutOfMemoryError occurs when I read it. I do not understand how to work with memory in the application. The following code does not work:
try {
StringBuilder fileData = new StringBuilder(1000);
BufferedReader reader = new BufferedReader(new FileReader(file));
char[] buf = new char[8192];
int bytesread = 0,
bytesBuffered = 0;
while( (bytesread = reader.read( buf )) > -1 ) {
String readData = String.valueOf(buf, 0, bytesread);
bytesBuffered += bytesread;
fileData.append(readData); //this is error
if (bytesBuffered > 1024 * 1024) {
bytesBuffered = 0;
}
}
System.out.println(fileData.toString().toCharArray());
} finally {
}
You need to preallocate a large buffer to avoid reallocation.
File file = ...;
StringBuilder fileData = new StringBuilder((int) file.length()); // File has length(), not size(); the cast only works for files under 2 GB
And run with a larger heap size:
java -Xmx2G
==== update
A while loop over a buffer doesn't need much memory to run. Treat the input as a stream and match your search string against it. It's a really simple state machine. If you need to search for multiple words, you can find a trie implementation (one that supports streams) for that.
// the match state model
...xxxxxxabxxxxxaxxxxxabcdexxxx...
ab a abcd
File file = new File("path_to_your_file");
String yourSearchWord = "abcd";
int matchIndex = 0;
boolean matchPrefix = false;
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
int chr;
while ((chr = reader.read()) != -1) {
if (matchPrefix == false) {
char searchChar = yourSearchWord.charAt(0);
if (chr == searchChar) {
matchPrefix = true;
matchIndex = 0;
}
} else {
char searchChar = yourSearchWord.charAt(++matchIndex);
if (chr == searchChar) {
if (matchIndex == yourSearchWord.length() - 1) {
// match!!
System.out.println("match: " + matchIndex);
matchPrefix = false;
matchIndex = 0;
}
} else {
matchPrefix = false;
matchIndex = 0;
}
}
}
}
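The simple state machine above restarts from scratch on a mismatch, so it can miss matches when the search word overlaps with itself (for example "aab" inside "aaab"). A sketch that avoids this swaps in the Knuth-Morris-Pratt failure table, which is a different technique from the code above. StreamSearch and indexOf are hypothetical names, and the caller is assumed to pass a buffered Reader.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for illustration only.
class StreamSearch {

    /** Returns the 0-based char offset of the first match in the stream, or -1 if there is none. */
    static long indexOf(Reader reader, String word) throws IOException {
        if (word.isEmpty()) {
            return 0;
        }
        // failure[i] = length of the longest proper prefix of word[0..i] that is also a suffix of it
        int[] failure = new int[word.length()];
        for (int i = 1, k = 0; i < word.length(); i++) {
            while (k > 0 && word.charAt(i) != word.charAt(k)) {
                k = failure[k - 1];
            }
            if (word.charAt(i) == word.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        int matched = 0;   // number of chars of 'word' matched so far
        long position = 0; // number of chars consumed from the stream
        int chr;
        while ((chr = reader.read()) != -1) {
            position++;
            // On a mismatch, fall back to the longest prefix that can still match.
            while (matched > 0 && chr != word.charAt(matched)) {
                matched = failure[matched - 1];
            }
            if (chr == word.charAt(matched)) {
                matched++;
            }
            if (matched == word.length()) {
                return position - word.length();
            }
        }
        return -1;
    }
}
```

Only the failure table and a few counters are kept in memory, so the file size does not matter.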
Try this; it might be helpful:
try{
BufferedReader reader = new BufferedReader(new FileReader(file));
String txt = "";
while( (txt = reader.readLine()) != null){ // readLine() returns a String, or null at the end of the file
System.out.println(txt);
}
}catch(Exception e){
System.out.println("Error : "+e.getMessage());
}
You should not hold such big files in memory, because you will run out of it, as you can see. Since you use Java 7, you need to read the file manually as a stream and check the content on the fly. Otherwise you could use the stream API of Java 8. This is just an example. It works, but keep in mind that the position of the found word could vary due to encoding issues, so this is not production code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
public class FileReader
{
private static String wordToFind = "SEARCHED_WORD";
private static File file = new File("YOUR_FILE");
private static int currentMatchingPosition;
private static int foundAtPosition = -1;
private static int charsRead;
public static void main(String[] args) throws IOException
{
try (FileInputStream fis = new FileInputStream(file))
{
System.out.println("Total size to read (in bytes) : " + fis.available());
int c;
while ((c = fis.read()) != -1)
{
charsRead++;
checkContent(c);
}
if (foundAtPosition > -1)
{
System.out.println("Found word at position: " + (foundAtPosition - wordToFind.length()));
}
else
{
System.out.println("Didn't find the word!");
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
private static void checkContent(int c)
{
if (currentMatchingPosition >= wordToFind.length())
{
//already found....
return;
}
if (wordToFind.charAt(currentMatchingPosition) == (char)c)
{
foundAtPosition = charsRead;
currentMatchingPosition++;
}
else
{
currentMatchingPosition = 0;
foundAtPosition = -1;
}
}
}
I have a boolean method for comparing files. It maps a part of each file into a byte buffer and checks whether the two parts are equal.
If the parts are equal, it gets the next block. If the position (point) exceeds the file size and all blocks are equal, it returns true.
It works on small files (10 MB), but has trouble with big ones.
private static boolean getFiles(File file1, File file2) throws IOException {
FileChannel channel1 = new FileInputStream(file1).getChannel();
FileChannel channel2 = new FileInputStream(file2).getChannel();
int SIZE;
MappedByteBuffer buffer1, buffer2;
for (int point = 0; point < channel1.size(); point += SIZE) {
SIZE = (int) Math.min((4096*1024), channel1.size() - point);
buffer1 = channel1.map(FileChannel.MapMode.READ_ONLY, point, SIZE);
buffer2 = channel2.map(FileChannel.MapMode.READ_ONLY, point, SIZE);
if (!buffer1.equals(buffer2)) {
return false;
}
}
return true;
}
How can I modify it? Change the size of blocks?
If file2 is smaller than file1, you will get an error when trying to read data past the end of file2, at this line:
buffer2 = channel2.map(FileChannel.MapMode.READ_ONLY, point, SIZE);
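A sketch of the mapped approach that rules out the size mismatch up front could look like this. MappedCompare and sameContent are hypothetical names. The long offset is an assumption about the trouble with big files: an int position overflows once a file grows past 2 GB, which may or may not be what the question ran into.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper for illustration only.
class MappedCompare {

    private static final long BLOCK = 4L * 1024 * 1024; // same 4 MB block size as in the question

    static boolean sameContent(File file1, File file2) throws IOException {
        // Closing a channel also closes the stream it came from.
        try (FileChannel channel1 = new FileInputStream(file1).getChannel();
             FileChannel channel2 = new FileInputStream(file2).getChannel()) {
            long size = channel1.size();
            // Different sizes: no need to map anything, and no read past the end of the shorter file.
            if (size != channel2.size()) {
                return false;
            }
            // A long offset does not overflow for files larger than 2 GB.
            for (long point = 0; point < size; point += BLOCK) {
                long length = Math.min(BLOCK, size - point);
                MappedByteBuffer buffer1 = channel1.map(FileChannel.MapMode.READ_ONLY, point, length);
                MappedByteBuffer buffer2 = channel2.map(FileChannel.MapMode.READ_ONLY, point, length);
                if (!buffer1.equals(buffer2)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```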
Apart from the few corner cases that you missed, using a directly allocated ByteBuffer is supposed to be faster than your method :)
public static void main (String [] args) throws IOException {
final File file1 = new File(args[0]);
final File file2 = new File(args[1]);
//check if the files exist and are not blank
if(!file1.exists() || !file2.exists() ||
file1.length() == 0 || file2.length() == 0) {
System.out.println("ILLEGAL FILES");
return;
}
//if the length of the files is not same they are obviously not the same files
if(file1.length() != file2.length()) {
System.out.println("DIFFERENT SIZE");
return;
}
final FileChannel channel1 = new FileInputStream(file1).getChannel();
final FileChannel channel2 = new FileInputStream(file2).getChannel();
//DirectByteBuffers for faster IO
final ByteBuffer byteBuffer1 = ByteBuffer.allocateDirect(128 * 1024);
final ByteBuffer byteBuffer2 = ByteBuffer.allocateDirect(128 * 1024);
System.out.println("Starting Compare");
while(true) {
int read1, read2 =0;
read1 = channel1.read(byteBuffer1);
if(read1 == -1) break;
while (read2 < read1 && read2 >= 0) {
read2 += (channel2.read(byteBuffer2));
}
byteBuffer1.flip();byteBuffer2.flip();
if(byteBuffer1.compareTo(byteBuffer2) != 0) {
System.out.println("NOT SAME");
return;
}
byteBuffer1.clear();
byteBuffer2.clear();
}
System.out.println("SAME :)");
return;
}
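The same idea can be packaged as a reusable method that returns a boolean and closes its channels. The following is only a sketch under assumptions: ChannelCompare, sameContent and fill are hypothetical names, and the 128K buffer size is simply carried over from the code above. The fill helper keeps reading until a buffer is full or the file ends, so a short read on one channel cannot make two equal files look different.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper for illustration only.
class ChannelCompare {

    static boolean sameContent(File file1, File file2) throws IOException {
        try (FileChannel channel1 = new FileInputStream(file1).getChannel();
             FileChannel channel2 = new FileInputStream(file2).getChannel()) {
            if (channel1.size() != channel2.size()) {
                return false;
            }
            ByteBuffer buffer1 = ByteBuffer.allocateDirect(128 * 1024);
            ByteBuffer buffer2 = ByteBuffer.allocateDirect(128 * 1024);
            while (true) {
                boolean end1 = fill(channel1, buffer1);
                boolean end2 = fill(channel2, buffer2);
                buffer1.flip();
                buffer2.flip();
                // equals() compares the remaining bytes of both buffers
                if (!buffer1.equals(buffer2)) {
                    return false;
                }
                if (end1 || end2) {
                    return end1 && end2;
                }
                buffer1.clear();
                buffer2.clear();
            }
        }
    }

    /** Reads until the buffer is full or the channel is exhausted; returns true at end of file. */
    private static boolean fill(FileChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) == -1) {
                return true;
            }
        }
        return false;
    }
}
```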