QUESTION :
Is there a better way to compare two small (about 100 KB) files and report the differences, while selectively ignoring certain portions of the text?
I am looking for default/existing Java libraries or any native Windows apps.
Below is the scenario:
Expected file 1 located at D:\expected\FileA_61613.txt
Actual file 2 located at D:\actuals\FileA_61613.txt
Content of the expected file:
Some first line here
There may be whitespaces, line breaks, indentation and here is another line
Key : SomeValue
Date : 01/02/2012
Time : 18:20
key2 : Value2
key3 : Value3
key4 : Value4
key5 : Value5
Some other text again to indicate that his is end of this file.
Actual File to be compared:
Some first line here
There may be whitespaces, line breaks, indentation and here is another line
Key : SomeValue
Date : 18/09/2013
Timestamp : 15:10.345+10.00
key2 : Value2
key3 : Value3
key4 : Something Different
key5 : Value5
Some other text again to indicate that his is end of this file.
Files 1 and 2 need to be compared line by line, WITHOUT ignoring whitespace, indentation, or line breaks.
The comparison result should look something like this:
Line 8 - Expected Time, but actual Timestamp
Line 8 - Expected HH.mm, but actual HH.mm .345+10.00
Line 10 - Expected Value4, but actual Something different.
Line 11 - Expected indentation N spaces, but actual only X spaces
Line 13 - Expected a line break, but no linebreak present.
The following have also changed but SHOULD BE IGNORED:
Line 7 - Expected 01/02/2012, but actual 18/09/2013 (exactly and only the 10 chars)
Line 8 - Expected 18:20, but actual 15:10 (exactly and only 5 chars should be ignored)
Note: The remaining .345+10.00 should be reported.
It is fine even if result just contains the line numbers and no analysis of why it failed.
But it should not just report a failure at line 8 and exit.
It should report all the changes, except for the excluded "date" and "time" values.
Some search results pointed to solutions using Perl, but I am looking for Java / JavaScript solutions.
The inputs to the solution would be the full file paths of both files.
My current workaround:
Replace the text to be ignored with '#'.
When performing the comparison, if we encounter '#', do not treat it as a difference.
Below is my working code, but I need to know if I can use some default/existing libraries or functions to achieve this.
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class fileComparison {
    public static void main(String[] args) throws IOException {
        FileInputStream fstream1 = new FileInputStream(
                "D:\\expected\\FileA_61613.txt");
        FileInputStream fstream2 = new FileInputStream(
                "D:\\actuals\\FileA_61613.txt");
        DataInputStream in1 = new DataInputStream(fstream1);
        BufferedReader br1 = new BufferedReader(new InputStreamReader(in1));
        DataInputStream in2 = new DataInputStream(fstream2);
        BufferedReader br2 = new BufferedReader(new InputStreamReader(in2));
        int lineNumber = 0;
        String strLine1 = null;
        String strLine2 = null;
        StringBuilder sb = new StringBuilder();
        System.out.println(sb);
        boolean isIgnored = false;
        while (((strLine1 = br1.readLine()) != null)
                && ((strLine2 = br2.readLine()) != null)) {
            lineNumber++;
            if (!strLine1.equals(strLine2)) {
                int strLine1Length = strLine1.length();
                int strLine2Length = strLine2.length();
                int maxIndex = Math.min(strLine1Length, strLine2Length);
                if (maxIndex == 0) {
                    sb.append("Mismatch at line " + lineNumber
                            + " all characters " + '\n');
                    break;
                }
                int i;
                for (i = 0; i < maxIndex; i++) {
                    if (strLine1.charAt(i) == '#') {
                        isIgnored = true;
                        continue;
                    }
                    if (strLine1.charAt(i) != strLine2.charAt(i)) {
                        isIgnored = false;
                        break;
                    }
                }
                if (isIgnored) {
                    sb.append("Ignored line " + lineNumber + '\n');
                } else {
                    sb.append("Mismatch at line " + lineNumber + " at char "
                            + i + '\n');
                }
            }
        }
        System.out.println(sb.toString());
        br1.close();
        br2.close();
    }
}
I am able to get the output as :
Ignored line 7
Mismatch at line 8 at char 4
Mismatch at line 11 at char 13
Mismatch at line 12 at char 8
Mismatch at line 14 all characters
However, when there are multiple differences in the same line, I am not able to log them all, because I am comparing char by char and not word by word.
I did not prefer word-by-word comparison because I thought it would not be possible to compare line breaks and whitespace. Is my understanding right?
java.lang.StringIndexOutOfBoundsException comes from this code:
for (int i = 0; i < strLine1.length(); i++) {
    if (strLine1.charAt(i) != strLine2.charAt(i)) {
        System.out.println("char not same at " + i);
    }
}
When you iterate over the longer string strLine1 and reach an index that is greater than or equal to the length of strLine2 (the line from the second file is shorter than the line from the first), you get that exception. It occurs because strLine2 has no characters at those indexes.
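A minimal sketch of a bounds-safe comparison that also reports every differing position in a line, rather than stopping at the first one. It masks the values to be ignored in both lines before comparing, which is a variation on the '#' workaround from the question. The class name MaskedLineDiff, the helper names mask and mismatches, and the assumed dd/MM/yyyy and HH:mm formats are illustrative assumptions, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaskedLineDiff {
    // Assumed formats (not confirmed by the question): dd/MM/yyyy and HH:mm
    private static final Pattern IGNORED =
            Pattern.compile("\\d{2}/\\d{2}/\\d{4}|\\d{2}:\\d{2}");

    // Replaces each ignored value with '#' characters of the same length
    static String mask(String line) {
        Matcher m = IGNORED.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, m.group().replaceAll(".", "#"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Returns every index at which the two lines differ, never reading
    // past the end of the shorter line
    static List<Integer> mismatches(String expected, String actual) {
        List<Integer> result = new ArrayList<>();
        int common = Math.min(expected.length(), actual.length());
        for (int i = 0; i < common; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                result.add(i);
            }
        }
        // Characters beyond the shorter line count as differences too
        int longest = Math.max(expected.length(), actual.length());
        for (int i = common; i < longest; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] expected = {"Date : 01/02/2012", "key4 : Value4"};
        String[] actual = {"Date : 18/09/2013", "key4 : Value5"};
        for (int n = 0; n < expected.length; n++) {
            List<Integer> diff = mismatches(mask(expected[n]), mask(actual[n]));
            if (!diff.isEmpty()) {
                System.out.println("Line " + (n + 1) + " differs at chars " + diff);
            }
        }
        // prints: Line 2 differs at chars [12]
    }
}
```

Because only the matched date and time characters are masked, a trailing part such as .345+10.00 is still compared and reported.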
I have a problem with some simple code and don't know how to solve it.
I have 3 txt files.
First txt file looks like this:
1 2 3 4 5 4.5 4,6 6.8 8,9
1 3 4 5 8 9,2 6,3 6,7 8.9
I would like to read numbers from this txt file and save integers to one txt file and floats to another.
You can do it with the following easy steps:
When you read a line, split it on whitespace and get an array of tokens.
While processing each token,
Trim any leading and trailing whitespace and then replace , with .
First check if the token can be parsed into an int. If yes, write it into outInt (the writer for integers). Otherwise, check if the token can be parsed into float. If yes, write it into outFloat (the writer for floats). Otherwise, ignore it.
Demo:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        BufferedReader in = new BufferedReader(new FileReader("t.txt"));
        BufferedWriter outInt = new BufferedWriter(new FileWriter("t2.txt"));
        BufferedWriter outFloat = new BufferedWriter(new FileWriter("t3.txt"));
        String line = "";
        while ((line = in.readLine()) != null) {// Read until EOF is reached
            // Split the line on whitespace and get an array of tokens
            String[] tokens = line.split("\\s+");
            // Process each token
            for (String s : tokens) {
                // Trim any leading and trailing whitespace and then replace , with .
                s = s.trim().replace(',', '.');
                // First check if the token can be parsed into an int
                try {
                    Integer.parseInt(s);
                    // If yes, write it into outInt
                    outInt.write(s + " ");
                } catch (NumberFormatException e) {
                    // Otherwise, check if token can be parsed into float
                    try {
                        Float.parseFloat(s);
                        // If yes, write it into outFloat
                        outFloat.write(s + " ");
                    } catch (NumberFormatException ex) {
                        // Otherwise, ignore it
                    }
                }
            }
        }
        in.close();
        outInt.close();
        outFloat.close();
    }
}
Assuming that , is also used as a decimal separator alongside ., the two characters can be unified (replace , with .).
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

static void readAndWriteNumbers(String inputFile, String intNums, String dblNums) throws IOException {
    // Use StringBuilder to collect the int and double numbers separately
    StringBuilder ints = new StringBuilder();
    StringBuilder dbls = new StringBuilder();
    Files.lines(Paths.get(inputFile)) // stream of string
        .map(str -> str.replace(',', '.')) // unify decimal separators
        .map(str -> {
            Arrays.stream(str.split("\\s+")).forEach(v -> { // split each line into tokens
                if (v.contains(".")) {
                    if (dbls.length() > 0 && !dbls.toString().endsWith(System.lineSeparator())) {
                        dbls.append(" ");
                    }
                    dbls.append(v);
                }
                else {
                    if (ints.length() > 0 && !ints.toString().endsWith(System.lineSeparator())) {
                        ints.append(" ");
                    }
                    ints.append(v);
                }
            });
            return System.lineSeparator(); // return new-line
        })
        .forEach(s -> { ints.append(s); dbls.append(s); }); // keep lines in the results
    // write the files using the contents from the string builders
    try (
        FileWriter intWriter = new FileWriter(intNums);
        FileWriter dblWriter = new FileWriter(dblNums);
    ) {
        intWriter.write(ints.toString());
        dblWriter.write(dbls.toString());
    }
}

// test
readAndWriteNumbers("test.dat", "ints.dat", "dbls.dat");
Output
//ints.dat
1 2 3 4 5
1 3 4 5 8
// dbls.dat
4.5 4.6 6.8 8.9
9.2 6.3 6.7 8.9
So this method is supposed to read a text file and output the frequency of each letter. The text file reads:
aaaa
bbb
cc
So my output should be:
a = 4
b = 3
c = 2
Unfortunately, my output is:
a = 4
a = 4
b = 3
a = 4
b = 3
c = 2
Does anyone know why?
I tried modifying the loops but still haven't resolved this.
public void getFreq() throws FileNotFoundException, IOException, Exception {
    File file = new File("/Users/guestaccount/IdeaProjects/Project3/src/sample/testFile.txt");
    BufferedReader br = new BufferedReader(new FileReader(file));
    HashMap<Character, Integer> hash = new HashMap<>();
    String line;
    while ((line= br.readLine()) != null) {
        line = line.toLowerCase();
        line = line.replaceAll("\\s", "");
        char[] chars = line.toCharArray();
        for (char c : chars) {
            if (hash.containsKey(c)){
                hash.put(c, hash.get(c)+1);
            }else{
                hash.put(c,1);
            }
        }
        for (Map.Entry entry : hash.entrySet()){
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
Chrisvin Jem gave you the code to change because your for loop was in your while loop when reading from the File.
Does anyone know why?
As your question states, I'm going to explain why it gave you that output.
Reason: You got the output a = 4, a = 4, b = 3, a = 4, b = 3, c = 2 because your printing for loop was inside your while loop, meaning that each time the BufferedReader read a new line, you iterated through the HashMap and printed its contents.
Example: When the BufferedReader reads the second line of the file, the HashMap hash already has the key, value pair for a and now, it just got the value for b. As a result, in addition to having already printed the value for a when reading the first line, it also prints the current contents of the HashMap, including the redundant a. The same thing happens for the third line of the file.
Solution: By moving the for loop out of the while loop, you only print the results after the HashMap has all its values, and not while the HashMap is still getting the values.
for (Map.Entry entry : hash.entrySet())
    System.out.println(entry.getKey() + " = " + entry.getValue());
I hope this answer was able to explain why you were getting that specific output.
Just move the printing loop outside of the reading loop.
public void getFreq() throws FileNotFoundException, IOException, Exception {
    File file = new File("/Users/guestaccount/IdeaProjects/Project3/src/sample/testFile.txt");
    BufferedReader br = new BufferedReader(new FileReader(file));
    HashMap<Character, Integer> hash = new HashMap<>();
    String line;
    while ((line= br.readLine()) != null) {
        line = line.toLowerCase();
        line = line.replaceAll("\\s", "");
        char[] chars = line.toCharArray();
        for (char c : chars) {
            if (hash.containsKey(c)){
                hash.put(c, hash.get(c)+1);
            }else{
                hash.put(c,1);
            }
        }
    }
    for (Map.Entry entry : hash.entrySet()){
        System.out.println(entry.getKey() + " = " + entry.getValue());
    }
}
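For comparison, here is a compact sketch of the same counting logic that uses Map.merge in place of the containsKey/put pair, and a TreeMap so the letters print in sorted order. The class name LetterFrequency and the method count are hypothetical, and the lines are passed in as strings rather than read from the file, purely to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class LetterFrequency {
    // Counts every non-whitespace character across all lines (case-insensitive)
    static Map<Character, Integer> count(Iterable<String> lines) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (String line : lines) {
            String cleaned = line.toLowerCase().replaceAll("\\s", "");
            for (char c : cleaned.toCharArray()) {
                // Insert 1 for a new key, otherwise add 1 to the current count
                freq.merge(c, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = count(Arrays.asList("aaaa", "bbb", "cc"));
        // Printing happens once, after all lines have been counted
        for (Map.Entry<Character, Integer> entry : freq.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```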
I'll admit it, I'm stumped. It's not a double. It's not outside the range of an integer. It's not NaN. It's not a non-integer in any way, shape, or form as far as I can tell.
Why would I get that error?
Here's the code that causes it:
String filename = "confA.txt";
//Make a new filereader to read in confA
FileReader fileReader = new FileReader(filename);
//Wrap into a bufferedReader for sanity's sake
BufferedReader bufferedReader = new BufferedReader(fileReader);
//Get the port number that B is listening to
int portNum = Integer.parseInt(bufferedReader.readLine());
It fails on that last line, stating:
java.lang.NumberFormatException: For input string: "5000"
Which is the number I want.
I've also attempted
Integer portNum = Integer.parseInt(bufferedReader.readLine());
But that didn't work either. Neither did valueOf().
Most probably there is an unprintable character somewhere in the line read from your file. Consider the following example (tested in the Java 9 jshell):
jshell> String value = "5000\u0007";
value ==> "5000\007"
jshell> Integer.parseInt(value);
| java.lang.NumberFormatException thrown: For input string: "5000"
| at NumberFormatException.forInputString (NumberFormatException.java:65)
| at Integer.parseInt (Integer.java:652)
| at Integer.parseInt (Integer.java:770)
| at (#15:1)
Here the string contains the "bell" character at the end. It makes the parse fail, yet it is not visible in the exception text. I think you have something similar. The simplest way to verify this is to check:
String line = bufferedReader.readLine();
System.out.println("line length: " + line.length());
A value other than 4 would support this idea.
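If the length check does turn up extra characters, a small sketch like the following can show exactly which ones they are and strip them before parsing. The class name InspectLine and the helpers codePoints and digitsOnly are hypothetical, and the sample string with a trailing bell character is only an illustration of the idea above, not the asker's actual file content.

```java
public class InspectLine {
    // Lists each character of the string as a Unicode code point, e.g. "U+0035"
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Keeps only the ASCII digits 0-9 and drops everything else
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        String line = "5000\u0007"; // stands in for bufferedReader.readLine()
        System.out.println("line length: " + line.length());
        System.out.println(codePoints(line));
        int portNum = Integer.parseInt(digitsOnly(line));
        System.out.println(portNum);
    }
}
```

Stripping everything except digits is a blunt fix: it also removes a leading minus sign, so it only suits values such as port numbers that are known to be non-negative.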
I had the same problem. I was reading a flat file with a BufferedReader and saving the contents as an ArrayList of String, but when I called Integer.parseInt on a string retrieved from the list, I got a java.lang.NumberFormatException and realised that there was a whole lot of garbage in the string read from the file.
This was the method I implemented with the code (called from my main method) to solve the problem:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// class level
// Matches runs of digits and dots only. The original pattern contained an
// unescaped '.', which matches any character and so let garbage through.
private static final Pattern numericPattern = Pattern.compile("[0-9.]+");

// in main method after reading in the file
String b = stripNonNumeric(stringvaluefromfile);
int a = Integer.parseInt(b);

public static String stripNonNumeric(String number) {
    //System.out.println(number);
    if (number == null || number.isEmpty()) {
        return "0";
    }
    Matcher matcher = numericPattern.matcher(number);
    // strip out all non-numerics
    StringBuffer sb = new StringBuffer("");
    while (matcher.find()) {
        sb.append(matcher.group());
    }
    // make sure there's only one dot (the first one is kept)
    int prevDot = -1;
    for (int i = sb.length() - 1; i >= 0; i--) {
        if (sb.charAt(i) == '.') {
            if (prevDot > 0) {
                sb.deleteCharAt(prevDot);
            }
            prevDot = i;
        }
    }
    if (sb.length() == 0) {
        sb.append("0");
    }
    return sb.toString();
}
Why does Java String.split() generate different results when working with string defined in code versus string read from a file when numbers are involved? Specifically I have a file called "test.txt" that contains chars and numbers separated by spaces:
G H 5 4
The split method does not split on spaces as expected. But if a string variable is created within code with the same chars and numbers separated by spaces, then the result of split() is four individual strings, one for each char and number. The code below demonstrates this difference:
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;

public class SplitNumber {
    //Read first line of text file
    public static void main(String[] args) {
        try {
            File file = new File("test.txt");
            FileReader fr = new FileReader(file);
            BufferedReader bufferedReader = new BufferedReader(fr);
            String firstLine;
            if ((firstLine = bufferedReader.readLine()) != null) {
                String[] firstLineNumbers = firstLine.split("\\s+");
                System.out.println("First line array length: " + firstLineNumbers.length);
                for (int i=0; i<firstLineNumbers.length; i++) {
                    System.out.println(firstLineNumbers[i]);
                }
            }
            bufferedReader.close();
            String numberString = "G H 5 4";
            String[] numbers = numberString.split("\\s+");
            System.out.println("Numbers array length: " + numbers.length);
            for (int i=0; i<numbers.length; i++) {
                System.out.println(numbers[i]);
            }
        } catch(Exception exception) {
            System.out.println("IOException occured");
            exception.printStackTrace();
        }
    }
}
The result is:
First line array length: 3
G
H
5 4
Numbers array length: 4
G
H
5
4
Why do the numbers from the file not get parsed the same as the same string defined within code?
Based on feedback, I changed the regex to split("[\\s\\h]+"), which resolved the issue: the numbers from the file were split properly, which clearly indicated that the text file contained a different whitespace-like character. I then replaced the contents of the file (using Notepad), reverted to split("\\s+"), and found that it worked correctly this time. So at some point I must have introduced different whitespace-like characters into the file (maybe a copy/paste issue). In the end, the takeaway is that I should use split("[\\s\\h]+") when reading from a file where I want to split on spaces, as it covers more scenarios that may not be immediately obvious.
Thanks to all for helping me find the root cause of my issue.
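A short sketch that reproduces the effect. It assumes the stray character was a no-break space (U+00A0), which is only a guess since the post never identifies the actual character. In Java's default regex mode \s does not match U+00A0, while \h (horizontal whitespace, available since Java 8) does. The class name SplitWhitespace and the helper countTokens are hypothetical.

```java
public class SplitWhitespace {
    // Splits the line with the given regex and returns the number of tokens
    static int countTokens(String line, String regex) {
        return line.split(regex).length;
    }

    public static void main(String[] args) {
        // Assumption: a no-break space sits between "5" and "4"
        String line = "G H 5\u00A04";
        System.out.println(countTokens(line, "\\s+"));       // prints 3
        System.out.println(countTokens(line, "[\\s\\h]+"));  // prints 4
    }
}
```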
I have this program that reads a text file. I need to get some data out of it.
The text files look like this:
No.  Ret.Time  Peak Name  Height   Area     Rel.Area  Amount    Type
     min                  µS       µS*min   %         mG/L
1    2.98      Fluoride   0.161    0.028    0.72      15.370    BMB
2    3.77      Chloride   28.678   3.784    99.28     2348.830  BMB
Total:                    28.839   3.812    100.00    2364.201
I need to start reading from line #29 and from there get the Peak Name and the Amount of each element like Fluoride, Chloride and so on. The example only shows those two elements, but other text files will have more. I know I will need some sort of loop to iterate through those lines starting on line #29 which is where the "1" starts then the "2" which will be the 30th line and so on.
I have tried to make this work, but I think I am missing something and I'm not sure what. Here is my code.
int lines = 0;
BufferedReader br = new BufferedReader(new FileReader(selectFile.getSelectedFile()));
Scanner sc = new Scanner(new FileReader(selectFile.getSelectedFile()));
String word = null;
while((word =br.readLine()) != null){
    lines++;
    /*if(lines == 29)
        System.out.println(word);*/
    if ((lines == 29) && sc.hasNext())
        count++;
    String value = sc.next();
    if (count == 2)
        System.out.println(value + ",");
}
Here's some code for you:
int linesToSkip = 28;
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ( (line = br.readLine()) != null) {
        if (linesToSkip-- > 0) {
            continue;
        }
        String[] values = line.split(" +");
        int index = 0;
        for (String value : values) {
            System.out.println("values[" + index + "] = " + value);
            index++;
        }
    }
}
Note that I've wrapped it in a try-with-resources block, try (expr) {}, to ensure that the reader is closed at the end; otherwise you'll consume resources and possibly lock the file from other processes.
I've also renamed the variable you called word as line to make it clearer what it contains (i.e. a string representing a line in the file).
The line.split(" +") uses a regular expression to split a String into its constituent values. In this case your values have spaces between, so we're using " +" which means 'one or more spaces'. I've just looped through the values and printed them out; obviously, you will need to do whatever it is you need to do with them.
I replaced the line count with a linesToSkip variable that decrements. It's less code and explains better what you're trying to achieve. However, if you need the line number for some reason then use that instead, as follows:
if (++lineCount <= 28) {
    continue;
}
If I'm reading it correctly, you are mixing two different readers, the BufferedReader and the Scanner, so you will not get the right results by switching from one to the other (they do not point to the same position in the file). You already have the line in word and can parse it, so there is no need for the Scanner. Just skip lines until you reach line 29 (lines >= 29) and then parse the values you want, line by line.
You are reading the file twice... try something like this
int lines = 0;
BufferedReader br = new BufferedReader(new FileReader(selectFile.getSelectedFile()));
String line = null;
while ((line = br.readLine()) != null) {
    if (++lines < 29)
        continue; //this ignores the line
    for(String word : line.split("separator here")) {
        // this will iterate over every word on that line
        // I think you can take it from here
        System.out.println(word);
    }
}
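Building on the answers above, here is a rough sketch of pulling out just the Peak Name and Amount columns once the header lines have been skipped. It assumes each data row has eight whitespace-separated fields, that peak names contain no spaces, and that rows such as "Total:" should be ignored; the class name PeakParser and the method parse are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PeakParser {
    // Maps each peak name (3rd field) to its amount (7th field),
    // keeping the order in which the rows appear
    static Map<String, Double> parse(List<String> dataLines) {
        Map<String, Double> amounts = new LinkedHashMap<>();
        for (String line : dataLines) {
            String[] fields = line.trim().split("\\s+");
            // Data rows start with a row number; this skips "Total:" and blanks
            if (fields.length < 8 || !fields[0].matches("\\d+")) {
                continue;
            }
            amounts.put(fields[2], Double.parseDouble(fields[6]));
        }
        return amounts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1 2.98 Fluoride 0.161 0.028 0.72 15.370 BMB",
                "2 3.77 Chloride 28.678 3.784 99.28 2348.830 BMB",
                "Total: 28.839 3.812 100.00 2364.201");
        System.out.println(parse(rows)); // prints {Fluoride=15.37, Chloride=2348.83}
    }
}
```

In a real run the rows would come from the BufferedReader loop, starting at line 29, instead of a hard-coded list.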