Java: read and write a file together

I am trying to read a file in Java and modify it simultaneously. This is what I need to do: my file has the format:
aaa
bbb
aaa
ccc
ddd
ddd
I need to read through the file, count the number of occurrences of each line, and collapse the duplicates to get the following file:
aaa - 2
bbb - 1
ccc - 1
ddd - 2
I tried using RandomAccessFile to do this, but couldn't get it to work. Can somebody help me out with the code for this one?

It's far easier if you don't do two things at the same time. The best way is to run through the entire file, count all the occurrences of each string in a hash, and then write all the results out into another file. Then, if you need to, move the new file over the old one.
You never want to read from and write to the same file at the same time: your offsets within the file shift every time you write, and the read cursor will not keep track of that.
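A minimal sketch of that count-then-rewrite approach (the file names are placeholders; the counting itself is a one-pass map update):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountAndRewrite {

    // Count occurrences of each line, keeping first-seen order.
    static Map<String, Integer> countLines(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("input.txt");      // assumed input file
        Path temp = Paths.get("input.txt.tmp");

        Map<String, Integer> counts = countLines(Files.readAllLines(input));

        // Write "word - count" lines to a temporary file ...
        try (var writer = Files.newBufferedWriter(temp)) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                writer.write(e.getKey() + " - " + e.getValue());
                writer.newLine();
            }
        }
        // ... then move the new file over the old one.
        Files.move(temp, input, StandardCopyOption.REPLACE_EXISTING);
    }
}
```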

I'd do it this way:
- Parse the original file and save all entries into a new file, using fixed-length data blocks. Say your longest string is 10 bytes long; take 10 + x as the block length, where x is room for the extra info you want to store alongside each entry. The 10th entry in the file is then at byte position 10*(10+x). You also need to know the number of entries up front, so the file size is noOfEntries*blockLength (use a RandomAccessFile and setLength to set that file length).
- Now use the quicksort algorithm to sort the entries in the file. The idea is to end up with a sorted file, which makes the final step far easier and faster. Hashing would theoretically work too, but you'd then have to rearrange duplicate entries so that all duplicates end up grouped together - not really an option here.
- Parse the file with the now-sorted entries. Keep a pointer to the first occurrence of each entry and increment a duplicate counter until a new entry appears. Then write the first entry, with the additional info you want, into a new "final result" file, and continue this way with the remaining entries of the sorted file.
Conclusions: I think this should be reasonably fast and use a reasonable amount of resources. However, it depends on the data you have. If you have a very large number of duplicates, quicksort performance will degrade. Also, if your longest data entry is far longer than the average, it will waste file space.
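A minimal sketch of the fixed-length-block idea, assuming a 16-byte block length, ASCII entries, and an illustrative file name (none of these choices come from the original answer):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class FixedLengthRecords {
    // Assumed block length: longest entry plus room for the extra info.
    static final int RECORD_LEN = 16;

    // Zero-pad an entry to RECORD_LEN bytes so record i sits at byte offset i * RECORD_LEN.
    static byte[] toRecord(String entry) {
        byte[] record = new byte[RECORD_LEN];
        byte[] data = entry.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(data, 0, record, 0, Math.min(data.length, RECORD_LEN));
        return record;
    }

    static void writeRecord(RandomAccessFile raf, int index, String entry) throws IOException {
        raf.seek((long) index * RECORD_LEN);
        raf.write(toRecord(entry));
    }

    static String readRecord(RandomAccessFile raf, int index) throws IOException {
        byte[] buf = new byte[RECORD_LEN];
        raf.seek((long) index * RECORD_LEN);
        raf.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("entries.dat", "rw")) {
            raf.setLength(3L * RECORD_LEN);  // pre-size: noOfEntries * blockLength
            writeRecord(raf, 0, "bbb");
            writeRecord(raf, 1, "aaa");
            // Swapping two records is just a few seeks - this is what makes
            // an on-file quicksort feasible.
            String tmp = readRecord(raf, 0);
            writeRecord(raf, 0, readRecord(raf, 1));
            writeRecord(raf, 1, tmp);
        }
    }
}
```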

If you have to, there are ways to manipulate the same file and update the counters in place, without opening another file or keeping everything in memory. However, even the simplest of those approaches would be very slow.

import java.io.*;
import java.util.*;

class WordFrequencyCountTest
{
    public static void main(String[] args)
    {
        System.out.println("Enter the file name:");
        Scanner sc = new Scanner(System.in);
        String fname = sc.next();
        File f1 = new File(fname);
        if (!f1.exists())
        {
            System.out.println("Source file does not exist");
            System.exit(0);
        }
        try (BufferedReader br = new BufferedReader(new FileReader(f1)))
        {
            Map<String, Integer> map = new TreeMap<String, Integer>();
            String str;
            while ((str = br.readLine()) != null)
            {
                for (String token : str.split("\\s")) // count each token on the line
                {
                    Integer count = map.get(token);
                    map.put(token, (count == null) ? 1 : count + 1);
                }
            }
            System.out.println("========");
            for (Map.Entry<String, Integer> entry : map.entrySet())
            {
                System.out.println(entry.getKey() + " " + entry.getValue());
            }
        }
        catch (IOException e)
        {
            e.printStackTrace(); // don't swallow the exception silently
        }
    }
}

Related

I have a problem with writing the content of ArrayLists into a file

So I wrote some code for a vocabulary trainer for my German class and want to write the content of my ArrayLists to a file. However, it only writes the first of the 3 ArrayLists into the file when saving. Does anyone know what causes this, or better yet, how to fix it? Thanks for your help!
I have already reset all the ArrayLists and re-created the file it should write into, but nothing helped.
These are just excerpts, not the whole program; it is over 400 lines long, so I didn't want to paste the whole thing. The code runs flawlessly until I open the file I wrote into.
static ArrayList<String> vokabel = new ArrayList<String>();
static ArrayList<String> uebersetzung = new ArrayList<String>();
static ArrayList<Integer> kasten = new ArrayList<Integer>();

static void beenden() {
    for (int m = 0; m < groesse; m++) {
        String str = vokabel.get(m).toString();
        textWriter.write(str);
        textWriter.write(" ");
    }
    textWriter.close();
    textWriter.println();
    for (int n = 0; n < groesse; n++) {
        String str = uebersetzung.get(n).toString();
        textWriter.write(str);
        textWriter.write(" ");
    }
    textWriter.close();
    textWriter.println();
    for (int o = 0; o < groesse; o++) {
        String str = kasten.get(o).toString();
        textWriter.write(str);
        textWriter.write(" ");
    }
    textWriter.close();
    textWriter.println();
    System.exit(0);
}
I expect it to write the content of all 3 ArrayLists into the file, but so far it hasn't worked.
This is what ends up in the file after entering 3 words with their translations and their corresponding box numbers. Only the words themselves make it into the file:
Hund Nein Hallo
The reason it only writes the first ArrayList into the file is that you're closing the writer immediately after writing to it (and once the writer is closed, further writes do nothing). Just remove all of the
textWriter.close();
lines, then put a single one right before System.exit(0), and it should work properly.
You write all 3 lists one after the other. AND: in between, you close the writer! So you are "lucky" not to get an IOException.
I guess you would like the entries of the 3 lists kept together (vokabel + uebersetzung + kasten). I therefore suggest you create a class with 3 fields holding that information. Give this class a sensible toString() and simply write those objects out one line at a time.
Ah, and by the way: don't ever call System.exit! It makes your program unusable in a larger context and prevents proper cleanup of resources.
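A rough sketch of that suggestion, with an illustrative class name, file name, and sample values (none taken from the original program):

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class VokabelTrainer {

    // One entry bundles the three parallel-list values together.
    static class Vokabel {
        final String vokabel;
        final String uebersetzung;
        final int kasten;

        Vokabel(String vokabel, String uebersetzung, int kasten) {
            this.vokabel = vokabel;
            this.uebersetzung = uebersetzung;
            this.kasten = kasten;
        }

        @Override
        public String toString() {
            return vokabel + " " + uebersetzung + " " + kasten;
        }
    }

    public static void main(String[] args) throws java.io.IOException {
        List<Vokabel> eintraege = new ArrayList<>();
        eintraege.add(new Vokabel("Hund", "dog", 1));
        eintraege.add(new Vokabel("Nein", "no", 2));

        // Write one entry per line; the writer is closed exactly once, at the end.
        try (PrintWriter out = new PrintWriter("vokabeln.txt")) {
            for (Vokabel v : eintraege) {
                out.println(v);
            }
        }
    }
}
```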
Use flush. When you have text longer than the buffer size, it will not be written out until you flush (note: this example is C#, not Java):
System.IO.TextWriter writeFile = new StreamWriter("c:\\textwriter.txt");
writeFile.WriteLine("csharp.net-informations.com");
writeFile.Flush();
writeFile.Close();

Would you help me get an idea of what to do next, after String w1 = scanner.nextLine();, to check for the occurrences of the words?

A file called “getwordinfo.txt” should reside in your project directory that should contain some of the words that have been found in the input files. Read in each of the words from this file (maybe using a simple Scanner object), and then output the following to the console window:
The word itself
The list of occurrences of that word, or, if the word never occurred, simply output “Not found”
The total number of occurrences of the word, and the usage frequency of the word (as a percentage) relative to all word occurrences in the input files
File fileInTheFolder = new File(f, docname);
fileInTheFolder.createNewFile();
File infile = new File("input.txt");
Scanner scanner = new Scanner(infile);
String w1 = scanner.nextLine();
What I suggest you do (read: what I would probably do as a first approach) is to create a Map to hold the data, and then, using your reader, insert the data into the map one entry at a time.
A Map, for example:
HashMap<String, Integer> hmap = new HashMap<String, Integer>();
works by having two fields, a key and a value. In your case the key is the word you want to count instances of and the value is the counter value.
Once you have your map you can begin inserting into it.
For example, as seen here:
for (String a : args) {
    Integer count = hmap.get(a);
    hmap.put(a, (count == null) ? 1 : count + 1);
}
What we do is:
Check if a has already been seen.
If it has been seen, then we add 1 to its counter; otherwise we set the initial counter value to 1.
So if you take your line, parse it into words, go through the words, and insert them into the map, you will have your answer.
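Putting those pieces together, a minimal sketch might look like this (the input file name and the whitespace split pattern are assumptions):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordFrequency {

    // Split a line on whitespace and add each word to the running counts.
    static void countWords(String line, Map<String, Integer> counts) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                Integer count = counts.get(word);
                counts.put(word, (count == null) ? 1 : count + 1);
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        Map<String, Integer> counts = new HashMap<>();
        // "input.txt" stands in for the actual input file.
        try (Scanner scanner = new Scanner(new File("input.txt"))) {
            while (scanner.hasNextLine()) {
                countWords(scanner.nextLine(), counts);
            }
        }
        counts.forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```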

Word Count from a file

I'm at the start of writing my program (this is for a class) and I'm having trouble just getting it down. Here's a list of goals I am hoping to meet:
It is a method given a .txt file (using java.io.File)
It needs to read the file and split the words, duplicates are allowed. (I plan to use String.split and util.regex.Pattern to work out whitespace and punctuation)
I'm aiming to put the words in a 1D array and then just find the length of the array.
The problem I'm running into is parsing the txt file. I was told in class that Scanner can do it, but I'm not finding it while R(ing)TFM. I guess I'm asking for some directions to the parts of the API that help me understand how to read a file with Scanner. Once I can get it to put each word in the array, I should be in the clear.
EDIT: I figured out what I needed to do thanks to everyone's help and input. My final snippet ends up looking like this, should anyone in the future come across this question.
Scanner in = new Scanner(file).useDelimiter(" ");
ArrayList<String> prepwords=new ArrayList<String>();
while(in.hasNext())
prepwords.add(in.next());
return prepwords; //returns an ArrayList without spaces but still has punctuation
I had to throw IOException, since Java hates not being sure a file exists; so if you run into "FileNotFoundException", you need to import and throw IOException. At the very least this worked for me. Thank you everyone for your input!
BufferedReader input = new BufferedReader(new FileReader(filename));
input.readLine();
This is what I use to read from files. Note that you have to handle the IOException.
Here is a link to the JSE 6.0 Scanner API
Here is the info you need to complete your project:
1. Use the Scanner(File) constructor.
2. Use a loop that is, essentially, this:
a. Scanner blam = new Scanner(theInputFile);
b. Map<String, Integer> wordMap = new HashMap<String, Integer>();
c. Set<String> wordSet = new HashSet<String>();
d. while (blam.hasNextLine())
e. String nextLine = blam.nextLine();
f. Split nextLine into words (read about the String.split() method).
g. If you need a count of words: for each word on the line, check if the word is in the map; if it is, increment the count, if not, add it to the map. This uses wordMap (you don't need wordSet for this solution).
h. If you just need to track the words, add each word on the line to the set. This uses wordSet (you don't need wordMap for this solution).
3. That is all.
If you don't need either the map or the set, then use a List<String>, with either an ArrayList or a LinkedList. If you don't need random access to the words, LinkedList is the way to go.
Something simple:
//variables you need
File file = new File("someTextFile.txt"); //put your file here
Scanner scanFile = new Scanner(new FileReader(file)); //create scanner
ArrayList<String> words = new ArrayList<String>(); //just a place to put the words
String theWord; //temporary variable for words

//loop through the file: this looks at every whitespace-separated word in the .txt file
while (scanFile.hasNext()) //check if there is another word
{
    theWord = scanFile.next(); //get next word
    words.add(theWord); //add word to list
    //if you don't want to add the word to the list
    //you can easily do your split logic here
}

//print the list of words
System.out.println("Total amount of words is: " + words.size());
for (int i = 0; i < words.size(); i++)
{
    System.out.println("Word at " + i + ": " + words.get(i));
}
Source:
http://www.dreamincode.net/forums/topic/229265-reading-in-words-from-text-file-using-scanner/

Java Scanner to print previous and next lines

I am using java.util.Scanner to read a file and scan for keywords, and I want to print the previous 5 lines and the next 5 lines around each encountered keyword. Below is my code:
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("ERROR");
keywords.add("EXCEPTION");
java.io.File file = new java.io.File(LOG_FILE);
Scanner input = null;
try {
input = new Scanner(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
int count = 0;
String previousLine = null;
while(input.hasNext()){
String line = input.nextLine();
for(String keyword : keywords){
if(line.contains(keyword)){
//print prev 5 lines
system.out.println(previousLine); // this will print only last previous line ( i need last 5 previous lines)
???
//print next 5 lines
system.out.println(input.nextLine());
system.out.println(input.nextLine());
system.out.println(input.nextLine());
system.out.println(input.nextLine());
system.out.println(input.nextLine());
}
previousLine = line;
}
any pointers to print previous 5 lines..?
any pointers to print previous 5 lines..?
Save them in a Deque<String>, such as a LinkedList<String>, for its "First In, First Out" (FIFO) behavior.
Either that, or use 5 variables or an array of 5 Strings, manually move Strings from one slot or variable to another, and then print them.
If you use the Deque/LinkedList, use the Deque's addFirst(...) method to add a new String to the beginning and removeLast() to remove the list's last String (if its size is > 5). Iterate through the LinkedList to get the current Strings it contains.
Other suggestions:
Your Scanner's check method, scanner.hasNextXXX(), should match the get method, scanner.nextXXX(). So you should check hasNextLine() if you're going to call nextLine(). Otherwise you risk problems.
Please try to post real code in your questions, not sort-of, will-never-compile code; i.e., system.out.println vs. System.out.println. I know it's a little thing, but it means a lot when others try to play with your code.
Use ArrayList's contains(...) method to get rid of that for loop.
e.g.,
LinkedList<String> fivePrevLines = new LinkedList<>();
java.io.File file = new java.io.File(LOG_FILE);
Scanner input = null;
try {
    input = new Scanner(file);
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
while (input.hasNextLine()) {
    String line = input.nextLine();
    if (keywords.contains(line)) {
        System.out.println("keyword found!");
        for (String prevLine : fivePrevLines) {
            System.out.println(prevLine);
        }
    } else {
        fivePrevLines.addFirst(line);
        if (fivePrevLines.size() > 5) {
            fivePrevLines.removeLast();
        }
    }
}
if (input != null) {
    input.close();
}
Edit
You state in comment:
ok i ran small test program to see if the contains(...) method works ...<unreadable unformatted code>... and this returned keyword not found...!
It's all in how you use it. The contains(...) method checks whether a Collection contains another object. It won't work if you feed it one huge String that may or may not contain one of the Strings in the collection, but it will work on the individual Strings that make up the larger String. For example:
ArrayList<String> temp = new ArrayList<String>();
temp.add("error");
temp.add("exception");
String s = "Internal Exception: org.apache.tomcat.dbcp.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object";
String[] tokens = s.split("[\\s\\.:,]+");
for (String token : tokens) {
    if (temp.contains(token.toLowerCase())) {
        System.out.println("keyword found: " + token);
    } else {
        System.out.println("keyword not found: " + token);
    }
}
Also, you will want to avoid posting code in comments since they don't retain their formatting and are unreadable and untestable. Instead edit your original question and post a comment to alert us to the edit.
Edit 2
As per dspyz:
For stacks and queues, when there isn't any significant functionality/performance reason to use one over the other, you should default to ArrayDeque rather than LinkedList. It's generally faster, takes up less memory, and requires less garbage collection.
If your file is small (under a million lines, say) you are way better off just copying the lines into an ArrayList and then getting the next and previous 5 lines by random access into the array.
Sometimes the best solution is just plain brute force.
Your code is going to get tricky if you have two keyword hits inside your +-5 line window. Say you have hits two lines apart: do you dump two 10-line windows? One 12-line window?
Random access will make implementing this stuff way easier.
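A rough sketch of that brute-force approach (the log file name is a placeholder; this prints one window of up to 11 lines per hit and makes no attempt to merge overlapping windows):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ContextWindow {

    // Collect lines [i-5, i+5] around every line containing any keyword.
    static String window(List<String> lines, List<String> keywords) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            for (String keyword : keywords) {
                if (lines.get(i).contains(keyword)) {
                    int from = Math.max(0, i - 5);           // clamp at file start
                    int to = Math.min(lines.size(), i + 6);  // clamp at file end
                    for (int j = from; j < to; j++) {
                        out.append(lines.get(j)).append('\n');
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws java.io.IOException {
        // "app.log" stands in for the actual log file.
        List<String> lines = Files.readAllLines(Paths.get("app.log"));
        System.out.print(window(lines, List.of("ERROR", "EXCEPTION")));
    }
}
```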

Merge sorting multiple files which have variable word counts

I am splitting a 10 GB file into multiple files of 100000-plus words each (a few hundred more than 100000, since I read up to the end of the line on which I encounter the 100000th word).
private void splitInputFile(String path) {
    try {
        File file = new File(path);
        FileReader fr = new FileReader(file);
        BufferedReader br = new BufferedReader(fr);
        String temp;
        temp = br.readLine();
        String fileName = "fileName";
        int fileCount = 1;
        while (temp != null) {
            //TODO Read 100000 words, sort and write to a file. Repeat for the entire file
            if (wordsToBeSorted.size() <= 100000) {
                startCounting(temp);
                temp = br.readLine();
            } //end of if -> place 100000+ words inside the list
            else {
                Collections.sort(wordsToBeSorted);
                fileName = "fileName" + fileCount;
                fileCount++;
                File splitFile = new File(fileName);
                PrintWriter pr = new PrintWriter(splitFile);
                for (String word : wordsToBeSorted) {
                    pr.write(word);
                    pr.write("\n"); //check if this works -> 1 word per line
                } //end of for
            } //end of else
        } //end of while
        mergeSort(fileCount);
    } //end of try
    catch (Exception e) {
        e.printStackTrace();
    }
}

private void startCounting(String sb) {
    StringTokenizer tokenizer = new StringTokenizer(sb); // split by space
    while (tokenizer.hasMoreTokens()) {
        String text = tokenizer.nextToken();
        text = text.replaceAll("\\W", ""); // remove all symbols
        if ("".equals(text.trim()))
            continue;
        wordsToBeSorted.add(text);
    }
}
Now I wonder how to sort these files together. I found out that I am supposed to do a merge sort. Considering that each split file has a variable number of words (100000 plus a few extra), is it possible to do a merge sort involving files with variable word counts? Or should I follow some other approach to splitting the file?
is it possible to do a merge sort involving files of variable word counts?
Sure. I assume the goal here is external sorting. Just open all the input files (unless there are really, really many, in which case you might have to do multiple passes), and read the first word from each. Then identify the input with the smallest word, write that word to the output, and read the next word from that input. Close and remove any input that becomes empty, and repeat until you have no more inputs.
If you have many inputs, you can use a heap to organize them, keyed on each input's next word. Remove the minimal element, and then reinsert it after advancing to its next word.
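A sketch of that heap-based merge, assuming each chunk file is already sorted with one word per line (class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

    // One open input file plus the word most recently read from it.
    static class Source {
        final BufferedReader reader;
        String current;

        Source(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.current = reader.readLine();
        }
    }

    // Merge several individually sorted files into one sorted output file.
    static void merge(List<Path> sortedChunks, Path output) throws IOException {
        PriorityQueue<Source> heap =
                new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (Path chunk : sortedChunks) {
            Source s = new Source(Files.newBufferedReader(chunk));
            if (s.current != null) {
                heap.add(s);                        // only enqueue non-empty files
            }
        }
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            while (!heap.isEmpty()) {
                Source smallest = heap.poll();      // input with the smallest next word
                out.println(smallest.current);
                smallest.current = smallest.reader.readLine();
                if (smallest.current != null) {
                    heap.add(smallest);             // reinsert, keyed on its next word
                } else {
                    smallest.reader.close();        // this input is exhausted
                }
            }
        }
    }
}
```

File sizes never matter here: each input is simply drained at its own pace, so chunks with variable word counts are fine.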
