So this method is supposed to read a text file and output the frequency of each letter. The text file reads:
aaaa
bbb
cc
So my output should be:
a = 4
b = 3
c = 2
Unfortunately, my output is:
a = 4
a = 4
b = 3
a = 4
b = 3
c = 2
Does anyone know why?
I tried modifying the loops but still haven't resolved this.
public void getFreq() throws FileNotFoundException, IOException, Exception {
    File file = new File("/Users/guestaccount/IdeaProjects/Project3/src/sample/testFile.txt");
    BufferedReader br = new BufferedReader(new FileReader(file));
    HashMap<Character, Integer> hash = new HashMap<>();
    String line;
    while ((line = br.readLine()) != null) {
        line = line.toLowerCase();
        line = line.replaceAll("\\s", "");
        char[] chars = line.toCharArray();
        for (char c : chars) {
            if (hash.containsKey(c)) {
                hash.put(c, hash.get(c) + 1);
            } else {
                hash.put(c, 1);
            }
        }
        for (Map.Entry entry : hash.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
Chrisvin Jem has already given you the corrected code; the problem is that your printing for loop was inside the while loop that reads from the file.
Since your question asks "Does anyone know why?", I'm going to explain why it gave you that output.
Reason: You got the output a = 4, a = 4, b = 3, a = 4, b = 3, c = 2 because your for loop was inside your while loop. Each time the BufferedReader read a new line, you iterated through the HashMap and printed its entire contents.
Example: When the BufferedReader reads the second line of the file, the HashMap hash already holds the key/value pair for a, and it has just added the value for b. So, in addition to having already printed the count for a while reading the first line, it prints the current contents of the HashMap again, including the redundant a. The same thing happens for the third line of the file.
Solution: By moving the for loop out of the while loop, you only print the results after the HashMap has all its values, and not while the HashMap is still getting the values.
for (Map.Entry entry : hash.entrySet())
    System.out.println(entry.getKey() + " = " + entry.getValue());
I hope this answer was able to explain why you were getting that specific output.
Just move the printing loop outside of the reading loop.
public void getFreq() throws IOException {
    File file = new File("/Users/guestaccount/IdeaProjects/Project3/src/sample/testFile.txt");
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        HashMap<Character, Integer> hash = new HashMap<>();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.toLowerCase();
            line = line.replaceAll("\\s", "");
            for (char c : line.toCharArray()) {
                if (hash.containsKey(c)) {
                    hash.put(c, hash.get(c) + 1);
                } else {
                    hash.put(c, 1);
                }
            }
        }
        // Print only after the whole file has been counted
        for (Map.Entry<Character, Integer> entry : hash.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
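As a side note, the containsKey branch can be collapsed with Map.merge. This is a sketch of the same tally as a standalone method, not the original poster's code:

```java
import java.util.HashMap;
import java.util.Map;

public class FreqSketch {
    public static Map<Character, Integer> count(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toLowerCase().replaceAll("\\s", "").toCharArray()) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value
            freq.merge(c, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = count("aaaa\nbbb\ncc");
        freq.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With the sample file this produces a = 4, b = 3, c = 2, each exactly once.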
Related
I have a CSV file with the following data:
20210903|0000000001|0081|A|T60|BSN|002|STATE UNITED
I have imported this file in my java application with this code:
public List<EquivalenceGroupsTO> read() throws FileNotFoundException, IOException {
    try (BufferedReader br = new BufferedReader(new FileReader("/home/myself/Desk/blaBla/T60.csv"))) {
        List<String> file = new ArrayList<String>();
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        Integer count = 0;
        HashSet<String> hset = new HashSet<String>();
        while (line != null) {
            //System.out.println("data <" + count + "> :" + line);
            count++;
            file.add(line);
            file.add("\n");
            line = br.readLine();
        }
        EquivalenceGroupsTO equivalenceGroupsTO = new EquivalenceGroupsTO();
        List<EquivalenceGroupsTO> equivalenceGroupsTOs = new ArrayList<>();
        for (String row : file) {
            equivalenceGroupsTO = new EquivalenceGroupsTO();
            String[] str = row.split("|");
            equivalenceGroupsTO.setEquivalenceGroupsCode(str[5]);
            equivalenceGroupsTO.setDescription(str[7]);
            equivalenceGroupsTO.setLastUpdateDate(new Date());
            equivalenceGroupsTOs.add(equivalenceGroupsTO);
            System.out.println("Tutto ok!");
        }
        return equivalenceGroupsTOs;
    }
}
I need to set equivalenceGroupsTO.setEquivalenceGroupsCode and equivalenceGroupsTO.setDescription (which take strings) to the values after the fifth and the seventh "|" respectively, i.e. "BSN" and "STATE UNITED".
But if I start this script it gives me this error:
java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 1
at it.utils.my2.read(OpenTXTCodifa.java:46)
What am I doing wrong?
The main issue is mentioned in the comments: when splitting by the | character, it has to be escaped as \\| because the pipe character is used as the OR operator in regular expressions.
The next issue is adding a line containing only \n to the file list. When this line is split, str[5] will fail with ArrayIndexOutOfBoundsException.
Other minor issues are unused variables count and hset.
However, it may be better to refactor existing code to use NIO and Stream API to get a stream of lines and convert each line into corresponding list of EquivalenceGroupsTO:
public List<EquivalenceGroupsTO> read(String filename) throws IOException {
    return Files.lines(Paths.get(filename))       // Stream<String>
            .map(s -> s.split("\\|"))             // Stream<String[]>
            // make sure all fields are available
            .filter(arr -> arr.length > 7)        // Stream<String[]>
            .map(arr -> {
                EquivalenceGroupsTO egTo = new EquivalenceGroupsTO();
                egTo.setEquivalenceGroupsCode(arr[5]);
                egTo.setDescription(arr[7]);
                egTo.setLastUpdateDate(new Date());
                return egTo;
            })                                    // Stream<EquivalenceGroupsTO>
            .collect(Collectors.toList());
}
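The pipe pitfall is easy to demonstrate in isolation. This is a standalone sketch, not part of the poster's class:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String row = "20210903|0000000001|0081|A|T60|BSN|002|STATE UNITED";
        // Unescaped "|" is the regex OR of two empty patterns, so the string
        // is split between every character instead of at the pipes.
        System.out.println(row.split("|").length);   // far more than 8
        // Escaped "\\|" matches a literal pipe and yields the 8 fields.
        String[] fields = row.split("\\|");
        System.out.println(fields[5]);   // BSN
        System.out.println(fields[7]);   // STATE UNITED
    }
}
```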
I am trying to use '\034' field separator character as a delimiter in a string.
The issue is that when I hardcode "\034" + opField and write it to a file it works, but if the "\034" character is read from a file, the output is written as the literal string "col1\034col2".
I tried using StringBuilder but it escapes the \034 to "\\034".
I am using the following code to read the character from the file:
try (BufferedReader br = new BufferedReader(new FileReader(fConfig))) {
    int lc = 1;
    for (String line; (line = br.readLine()) != null;) {
        String[] rowList = line.split(delim);
        int row_len = rowList.length;
        if (row_len < 2) {
            System.out.println("Incorrect dictionary file row:" + fConfig.getAbsolutePath()
                    + "\nNot enough values found at row:" + line);
        } else {
            String key = rowList[0];
            String value = rowList[1];
            dictKV.put(key, value);
        }
        lc++;
    }
} catch (Exception e) {
    throw e;
}
Any help is welcome...
[update]: The same thing happens with the '\t' character: if hardcoded it works fine, but if read from a file it gets appended as the literal characters, e.g. "col0\tcol1".
        if (colAl.toLowerCase().contains(" as ")) {
            String temp = colAl.replaceAll("[ ]+as[ ]+", " | ");
            ArrayList<String> tempA = this.brittle_delim(temp, '|');
            colAl = tempA.get(tempA.size() - 1);
            colAl = colAl.trim();
        } else {
            ArrayList<String> tempA = this.brittle_delim(colAl, ' ');
            colAl = tempA.get(tempA.size() - 1);
            colAl = colAl.trim();
        }
        if (i == 0) {
            sb.append(colAl);
            headerCols += colAl.trim();
        } else {
            headerCols += this.output_field_delim + colAl;
            sb.append(this.output_field_delim);
            sb.append(colAl);
        }
    }
}
System.out.println("SB Header Cols:"+sb.toString());
System.out.println("Header Cols:"+headerCols);
Output:
SB Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
In the above code if I do the following I am getting correct results:
headerCols+= "\034"+ colAl;
output:
SPRN_CO_IDFISC_YR_MTH_DSPLY_CDCST_OBJ_CDPRFT_CTR_CDLEGL_CO_CDHEAD_CT_TYPE_IDFIN_OWN_CDFUNC_AREA_CDHEAD_CT_NR
The FS characters are actually present even though they appear to have been removed here (FS is a non-printing control character).
You should provide an example demonstrating your problem, not just incomplete code snippets. The following runnable snippet does what you explained.
// create a file with one line
byte[] bytes = "foo bar".getBytes(StandardCharsets.ISO_8859_1);
String fileName = "/tmp/foobar";
Files.write(Paths.get(fileName), bytes);

String headerCols = "";
String outputFieldDelim = "\034";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    // read the line from the file and split by the blank character
    String[] cols = br.readLine().split(" ");
    // concatenate the values with "\034"
    // but ... for your own code ...
    // don't concatenate String objects in a loop like below;
    // use a StringBuilder or StringJoiner instead
    headerCols += outputFieldDelim + cols[0];
    headerCols += outputFieldDelim + cols[1];
}
// output with the "\034" character
System.out.println(headerCols);
I guess this is where I found my solution, and the right words for my question:
How to unescape string literals in java
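For reference, one minimal way to turn the four literal characters \034 (as read from a config file) back into the single FS control character is to parse the octal digits yourself. This is only a sketch; a full unescaper would also handle \t, \n, and the other escape forms:

```java
public class UnescapeSketch {
    // Converts a literal like "\\034" (a backslash followed by octal digits,
    // exactly as BufferedReader returns it) into the real character.
    public static char unescapeOctal(String literal) {
        // literal.charAt(0) is the backslash; the rest are octal digits
        return (char) Integer.parseInt(literal.substring(1), 8);
    }

    public static void main(String[] args) {
        String fromFile = "\\034";              // four characters: \ 0 3 4
        char delim = unescapeOctal(fromFile);   // the real FS control character
        System.out.println((int) delim);        // 28
    }
}
```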
I am entering values into a BidiMap. In each loop iteration I put values into the BidiMap and also print its size. It is always 1. I have also checked the map in the debugger and it shows only one entry: the value put in the most recent iteration. What is going wrong here? How am I supposed to save key-value pairs in the BidiMap?
Please find below the complete code.
public static void main(String args[]) {
    // Read file
    BufferedReader br = null;
    int wordCount = 0;
    String wordArray[] = null;
    BidiMap<String, Integer> map = new DualHashBidiMap<String, Integer>();
    try {
        String sCurrentLine;
        br = new BufferedReader(new InputStreamReader(
                new FileInputStream("C:\\IASTATE\\test.txt"), StandardCharsets.UTF_16));
        while ((sCurrentLine = br.readLine()) != null) {
            System.out.println(sCurrentLine);
            wordArray = sCurrentLine.split("\\s+");
            wordCount += wordArray.length;
        }
        // Read word 1, word 2, word 3
        int count;
        String key;
        for (int i = 0; i < wordArray.length; i++) {
            key = wordArray[i] + wordArray[i + 1] + wordArray[i + 2];
            // Compare Hashmap: does the String {'word 1','word 2','word 3'}
            // exist?
            if (map.containsKey(key)) {
                // If it exists, increment the counter
                count = (Integer) map.get(key);
                count++;
                map.put(key, count);
            } else {
                // If it does not exist, push the String {'word 1','word 2','word 3'}
                // into the Hashmap and initialize the counter to 1
                map.put(key, 1);
            }
            key = null;
            System.out.println("Size of Map" + map.size());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
test.txt content is
This is line one
This is line two
This is line three
This is line four
You're reading every single line of the file but you're not processing any of those lines until after you've finished:
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
wordArray = sCurrentLine.split("\\s+");
wordCount += wordArray.length;
}
// Now you're messing about with wordArray, which is the _last_ line.
It's possible that your intent is to append all the words of each line to the wordArray array but that's not what you're doing.
That means you're only processing the last line of your file. If the last line is truly "this is line four", I'd still expect two entries, one for "this is line" and another for "is line four". But I'd fix up the problem described above before you start worrying about that.
By fixing it, I mean not overwriting wordArray every time a line is read in but instead appending to that array.
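That appending might be sketched like this, using a List<String> instead of a raw array. The names here are illustrative, not from the original post:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordCollector {
    // Gathers the words of every line into one list, rather than
    // overwriting the array on each loop iteration.
    public static List<String> collectWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                words.addAll(Arrays.asList(line.trim().split("\\s+")));
            }
        }
        return words;
    }
}
```

With all words collected, the trigram loop can then run up to words.size() - 2 without ever seeing only the last line.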
QUESTION :
Is there a better way to compare two small (100 KB) files, while selectively ignoring a certain portion of the text, and report the differences?
I am looking for default/existing Java libraries or any Windows-native apps.
Below is scenario:
Expected file 1 located at D:\expected\FileA_61613.txt
Actual file 2 located at D:\actuals\FileA_61613.txt
Content in expected File
Some first line here
There may be whitespaces, line breaks, indentation and here is another line
Key : SomeValue
Date : 01/02/2012
Time : 18:20
key2 : Value2
key3 : Value3
key4 : Value4
key5 : Value5
Some other text again to indicate that his is end of this file.
Actual File to be compared:
Some first line here
There may be whitespaces, line breaks, indentation and here is another line
Key : SomeValue
Date : 18/09/2013
Timestamp : 15:10.345+10.00
key2 : Value2
key3 : Value3
key4 : Something Different
key5 : Value5
Some other text again to indicate that his is end of this file.
Files 1 and 2 need to be compared line by line, WITHOUT ignoring
whitespace, indentation, or line breaks.
The comparison result should be like something below:
Line 8 - Expected Time, but actual Timestamp
Line 8 - Expected HH.mm, but actual HH.mm .345+10.00
Line 10 - Expected Value4, but actual Something different.
Line 11 - Expected indentation N spaces, but actual only X spaces
Line 13 - Expected a line break, but no linebreak present.
The lines below have also changed but SHOULD BE IGNORED:
Line 7 - Expected 01/02/2012, but actual 18/09/2013 (exactly and only the 10 chars)
Line 8 - Expected 18:20, but actual 15:20 (exactly and only 5 chars should be ignored)
Note : The remaining .345+10.00 should be reported
It is fine even if result just contains the line numbers and no analysis of why it failed.
But it should not just report a failure at line 8 and exit.
It should report all the changes, except for the excluded "date" and "time" values.
Some search results pointed to solutions using Perl.
But I am looking for Java / JavaScript solutions.
The inputs to the solution would be full file path to both the files.
My current work-around:
Replace the text to be ignored with '#'.
When performing comparison, if we encounter #, do not consider as difference.
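The '#'-masking step can be done up front with two regex replacements. This is only a sketch; the patterns assume the dd/MM/yyyy and HH:mm shapes shown in the sample files:

```java
public class MaskSketch {
    // Replaces the 10-char date and the 5-char time with '#' runs of the
    // same length, so character positions are preserved for the
    // char-by-char diff that follows.
    public static String mask(String line) {
        return line.replaceAll("\\d{2}/\\d{2}/\\d{4}", "##########")
                   .replaceAll("\\b\\d{2}:\\d{2}\\b", "#####");
    }

    public static void main(String[] args) {
        System.out.println(mask("Date : 01/02/2012"));           // Date : ##########
        System.out.println(mask("Timestamp : 15:10.345+10.00")); // the .345+10.00 tail survives
    }
}
```

Because only the 5-char HH:mm portion is masked, the trailing .345+10.00 still differs and gets reported, as required.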
Below is my working code. But I need to know if I can use some default / existing libraries or functions to achieve this.
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class fileComparison {
    public static void main(String[] args) throws IOException {
        FileInputStream fstream1 = new FileInputStream(
                "D:\\expected\\FileA_61613.txt");
        FileInputStream fstream2 = new FileInputStream(
                "D:\\actuals\\FileA_61613.txt");
        DataInputStream in1 = new DataInputStream(fstream1);
        BufferedReader br1 = new BufferedReader(new InputStreamReader(in1));
        DataInputStream in2 = new DataInputStream(fstream2);
        BufferedReader br2 = new BufferedReader(new InputStreamReader(in2));
        int lineNumber = 0;
        String strLine1 = null;
        String strLine2 = null;
        StringBuilder sb = new StringBuilder();
        boolean isIgnored = false;
        while (((strLine1 = br1.readLine()) != null)
                && ((strLine2 = br2.readLine()) != null)) {
            lineNumber++;
            if (!strLine1.equals(strLine2)) {
                int strLine1Length = strLine1.length();
                int strLine2Length = strLine2.length();
                int maxIndex = Math.min(strLine1Length, strLine2Length);
                if (maxIndex == 0) {
                    sb.append("Mismatch at line " + lineNumber
                            + " all characters " + '\n');
                    break;
                }
                int i;
                for (i = 0; i < maxIndex; i++) {
                    if (strLine1.charAt(i) == '#') {
                        isIgnored = true;
                        continue;
                    }
                    if (strLine1.charAt(i) != strLine2.charAt(i)) {
                        isIgnored = false;
                        break;
                    }
                }
                if (isIgnored) {
                    sb.append("Ignored line " + lineNumber + '\n');
                } else {
                    sb.append("Mismatch at line " + lineNumber + " at char "
                            + i + '\n');
                }
            }
        }
        System.out.println(sb.toString());
        br1.close();
        br2.close();
    }
}
I am able to get the output as :
Ignored line 7
Mismatch at line 8 at char 4
Mismatch at line 11 at char 13
Mismatch at line 12 at char 8
Mismatch at line 14 all characters
However, when there are multiple differences in the same line, I am not able to log them all, because I am comparing char by char and not word by word.
I did not prefer word-by-word comparison because I thought it would not be possible to compare line breaks and whitespace. Is my understanding right?
java.lang.StringIndexOutOfBoundsException comes from this code:
for (int i = 0; i < strLine1.length(); i++) {
if (strLine1.charAt(i) != strLine2.charAt(i)) {
System.out.println("char not same at " + i);
}
}
When you index the longer string strLine1 at a position greater than the length of strLine2 (the second file's line is shorter than the first's), you get that exception: strLine2 simply has no character at those indexes.
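A guarded comparison that never indexes past the shorter string might look like this. It is a sketch with made-up names, not code from the question:

```java
import java.util.ArrayList;
import java.util.List;

public class SafeCompare {
    // Compares only up to the shorter length, so charAt never goes out of
    // bounds; any extra tail is reported as a length difference instead.
    public static List<String> diffs(String a, String b, int lineNumber) {
        List<String> out = new ArrayList<>();
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                out.add("Line " + lineNumber + ": mismatch at char " + i);
            }
        }
        if (a.length() != b.length()) {
            out.add("Line " + lineNumber + ": lengths differ ("
                    + a.length() + " vs " + b.length() + ")");
        }
        return out;
    }
}
```

Collecting all mismatches into a list (rather than breaking at the first one) also addresses the "multiple differences in the same line" problem above.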
I'm trying to read a text file line by line and put each line into a Map so that I can delete duplicate words (e.g. test test) and print out the lines without the duplicate words. I must be doing something wrong, though, because I basically get just one line as my key instead of each line being read in one at a time. Any thoughts? Thanks.
public DeleteDup(File f) throws IOException {
    line = new HashMap<String, Integer>();
    try {
        BufferedReader in = new BufferedReader(new FileReader(f));
        Integer lineCount = 0;
        for (String s = null; (s = in.readLine()) != null;) {
            line.put(s, lineCount);
            lineCount++;
            System.out.println("s: " + s);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    this.deleteDuplicates(line);
}

private Map<String, Integer> line;
To be honest, your question isn't particularly clear - it's not obvious why you've got the lineCount, or what deleteDuplicates will do, or why you've named the line variable that way when it's not actually a line - it's a map from lines to the last line number on which that line appeared.
Unless you need the line numbers, I'd use a Set<String> instead.
However, all that aside, if you look at the keySet of line afterwards, it will be all the lines. That's assuming that the text file is genuinely in the default encoding for your system (which is what FileReader uses, unfortunately - I generally use InputStreamReader and specify the encoding explicitly).
If you could give us a short but complete program, the text file you're using as input, the expected output and the actual output, that would be helpful.
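The Set<String> plus explicit-encoding suggestion might look like this sketch; the method names are assumptions, and in real use the Reader would be an InputStreamReader over the file with a named charset:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Set;

public class UniqueLines {
    // LinkedHashSet drops duplicate lines while keeping first-seen order,
    // which is usually what you want when echoing a file back out.
    public static Set<String> read(Reader source) throws IOException {
        Set<String> lines = new LinkedHashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String s;
            while ((s = in.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }
}
```

For a file, the source would be something like new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8), so the encoding is explicit rather than platform-dependent.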
What I understood from your question is that you want to print the lines which do not have duplicate words in them.
Maybe you could try the following snippet.
public void deleteDup(File f)
{
    try
    {
        BufferedReader in = new BufferedReader(new FileReader(f));
        Integer wordCount = 0;
        boolean isDuplicate = false;
        String[] arr = null;
        for (String line = null; (line = in.readLine()) != null;)
        {
            isDuplicate = false;
            wordCount = 0;
            wordMap.clear();
            arr = line.split("\\s+");
            for (String word : arr)
            {
                wordCount = wordMap.get(word);
                if (null == wordCount)
                {
                    wordCount = 1;
                }
                else
                {
                    wordCount++;
                    isDuplicate = true;
                    break;
                }
                wordMap.put(word, wordCount);
            }
            if (!isDuplicate)
            {
                lines.add(line);
            }
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

private Map<String, Integer> wordMap = new HashMap<String, Integer>();
private List<String> lines = new ArrayList<String>();
In this snippet, lines will contain the lines which do not have duplicate words in it.
It would have been easier to find your problem if we knew what
this.deleteDuplicates(line);
tries to do. Maybe it is not clearing any of the data structure used. Hence, the words checked in previous lines will be checked for other lines too though they are not present.
Your question is not very clear, but going through your code snippet, I think you are trying to remove duplicate words in each line.
The following code snippet might be helpful.
public class StackOverflow {
    public static void main(String[] args) throws IOException {
        List<Set<String>> unique = new ArrayList<Set<String>>();
        BufferedReader reader = new BufferedReader(
                new FileReader("C:\\temp\\testfile.txt"));
        String line = null;
        while ((line = reader.readLine()) != null) {
            String[] stringArr = line.split("\\s+");
            Set<String> strSet = new HashSet<String>();
            for (String tmpStr : stringArr) {
                strSet.add(tmpStr);
            }
            unique.add(strSet);
        }
    }
}
The only problem I see with your code is that DeleteDup doesn't have a return type specified.
Otherwise the code looks fine and reads from the file properly.
Please post the deleteDuplicates method code and the file used.
You are printing out every line read, not just the unique lines.
Your deleteDuplicateLines() method won't do anything, as there will never be any duplicates in the HashMap.
So it isn't at all clear what your actual problem is.