Scanner to reset pointer at previous line - java

My problem could be solved if the Scanner class had a previous() method. I am asking whether there is any way to achieve this functionality.
Input:
a file with contents like
a,1
a,2
a,3
b,1
c,1
c,2
c,3
c,4
d,1
d,2
d,3
e,1
f,1
I need to create a list of all the lines that start with the same letter.
try {
Scanner scanner = new Scanner(new File(fileName));
List<String> procList = null;
String line =null;
while (scanner.hasNextLine()){
line = scanner.nextLine();
System.out.println(line);
String[] sParts = line.split(",");
procList = new ArrayList<String>();
procList.add(line);
boolean isSamealpha = true;
while(isSamealpha){
String s1 = scanner.nextLine();
if (s1.contains(sParts[0])){
procList.add(s1);
}else{
isSamealpha = false;
System.out.println(procList);
}
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
I get output like
a,1
[a,1, a,2, a,3]
c,1
[c,1, c,2, c,3, c,4]
d,2
[d,2, d,3]
f,1
[f,1]
As you can see, it missed the lists for b and e. If I had a scanner.previous() method, I would have put it in the else of the second while loop. Because there is no such method, I am stuck.
Please let me know if there are any methods I can use. I can't use FileUtils.readLines() because it's a 3GB file and I don't want to hold the whole file in Java memory.

I would suggest reconsidering your algorithm instead. You are missing lines because your algorithm reads ahead to determine when the sequence has broken, yet the line that breaks the sequence is consumed and never collected: the outer loop then reads the line after it, so that line is lost.
You can solve this without needing to read backwards. If you know that the input is always sorted, just read line by line and keep a reference to the key of the previous line (to compare with the current one).

Below is some sample code that should help. (I only typed this; I did no checking.)
Scanner scanner = new Scanner(new File(fileName));
List<String> procList = null;
String previousAlpha = null;
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // the key is the part before the first comma
    String alpha = line.split(",")[0];
    if (!alpha.equals(previousAlpha)) {
        // new key: print the finished group (if any) and start a new one
        if (procList != null) {
            System.out.println(procList);
        }
        procList = new ArrayList<String>();
        System.out.println(line);
        previousAlpha = alpha;
    }
    procList.add(line);
}
// the last group is still pending when the input ends
if (procList != null) {
    System.out.println(procList);
}
scanner.close();

Related

How read data from file that is separated by a blank line in Java

For example I have a file "input.txt" :
This is the
first data

This is the second
data

This is the last data
on the last line
And I want to store this data in an ArrayList in this form:
[This is the first data, This is the second data, This is the last data on the last line]
Note: Each record in the file is separated by a blank line. How do I skip this blank line?
I tried this code but it doesn't work right:
List<String> list = new ArrayList<>();
File file = new File("input.txt");
StringBuilder stringBuilder = new StringBuilder();
try (Scanner in = new Scanner(file)) {
while (in.hasNext()) {
String line = in.nextLine();
if (!line.trim().isEmpty())
stringBuilder.append(line).append(" ");
else {
list.add(stringBuilder.toString());
stringBuilder = new StringBuilder();
}
}
} catch (FileNotFoundException e) {
System.out.println("Not found file: " + file);
}
Blank lines are not really blank. End-of-line characters terminate each line, so an apparently empty line means you have two end-of-line sequences abutting.
Search for that pair, and break your input where you find it, for example with String::split.
For example, suppose we have a file with the words this and that.
this

that
Let's visualize this file, showing the LINE FEED (LF) character (Unicode code point 10 decimal) used to terminate each line as <LF>.
this<LF>
<LF>
that<LF>
To the computer, there are no “lines”, so the text appears to Java like this:
this<LF><LF>that<LF>
You can now see more clearly how pairs of LINE FEED (LF) characters delimit each block of text. Search for that pairing to parse your text.
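As a rough illustration of that idea, here is a minimal sketch that splits already-loaded text wherever two line terminators abut. The class and method names (BlankLineSplitter, splitBlocks) are made up for this example, and it assumes the file is small enough to hold in memory and uses LF or CR LF line endings only.

```java
import java.util.ArrayList;
import java.util.List;

class BlankLineSplitter {
    // Hypothetical helper: splits text into blocks separated by one or more
    // blank lines and joins the lines inside each block with single spaces.
    static List<String> splitBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        // A blank line is a line terminator followed by another terminator,
        // with at most spaces or tabs in between.
        for (String block : text.split("\\r?\\n(?:[ \\t]*\\r?\\n)+")) {
            // Collapse the remaining single line breaks into spaces.
            String joined = block.trim().replaceAll("\\s*\\r?\\n\\s*", " ");
            if (!joined.isEmpty()) {
                blocks.add(joined);
            }
        }
        return blocks;
    }
}
```

The text could be loaded with something like new String(Files.readAllBytes(path), StandardCharsets.UTF_8) before being passed to splitBlocks; for very large files, the line-by-line Scanner loop from the question is the better fit.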
You are actually almost there. What you missed is that the last block needs to be handled separately, as there is NO empty line at the bottom of the file to trigger the else branch.
try (Scanner in = new Scanner(file)) {
while (in.hasNext()) {
String line = in.nextLine();
//System.out.println(line);
if (!line.trim().isEmpty())
stringBuilder.append(line).append(" ");
else { //this is where new line happens -> store the combined string to arrayList
list.add(stringBuilder.toString());
stringBuilder = new StringBuilder();
}
}
//Below is to handle the last line, as after the last line there is NO empty line
if (stringBuilder.length() != 0) {
list.add(stringBuilder.toString());
} //end if
for (int i=0; i< list.size(); i++) {
System.out.println(list.get(i));
} //end for
} catch (FileNotFoundException e) {
System.out.println("Not found file: " + file);
}
Output of above:
This is the first data
This is the second data
This is the last data on the last line
I added an if condition right after the while loop in your code and it worked:
List<String> list = new ArrayList<>();
File file = new File("input.txt");
StringBuilder stringBuilder = new StringBuilder();
try (Scanner in = new Scanner(file)) {
while (in.hasNext()) {
String line = in.nextLine();
if (!line.trim().isEmpty()) {
stringBuilder.append(line).append(" ");
}
else {
list.add(stringBuilder.toString());
stringBuilder = new StringBuilder();
}
}
if (stringBuilder.toString().length() != 0) {
list.add(stringBuilder.toString());
}
} catch (FileNotFoundException e) {
System.out.println("Not found file: " + file);
}
System.out.println(list.toString());
I got the below output
[This is the first data , This is the second data , This is the last data on the last line ]

Handling string out of range exception

I am doing a very basic loop through a file. The file contains a number of entries; however, it seems to break on the 3rd iteration, even though that line definitely contains more than 25 characters. The simple loop is as follows:
public static void organiseFile() throws FileNotFoundException {
ArrayList<String> lines = new ArrayList<>();
String directory = "C:\\Users\\hussainm\\Desktop\\Files\\ex1";
Scanner fileIn = new Scanner(new File(directory + "_temp.txt"));
PrintWriter out = new PrintWriter(directory + "_ordered.txt");
while (fileIn.hasNextLine() == true) {
if (!fileIn.nextLine().isEmpty()) {
lines.add(fileIn.nextLine());
String test = fileIn.nextLine().substring(12, 25);
System.out.println(test);
}
}
I am not sure what the issue is, but it keeps throwing:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
    at java.lang.String.substring(Unknown Source)
    at fedOrganiser.fedOrganiser.organiseFile(fedOrganiser.java:41)
    at fedOrganiser.fedOrganiser.main(fedOrganiser.java:31)
Not sure what its issue is.
File is as follows:
https://www.dropbox.com/s/69h1f8u387zikbp/ex1_temp.txt?dl=0
Every call to nextLine() reads the next line from the stream. It is nextLine(), not hasNextLine(), which advances the stream one line's worth of text. You are reading 3 lines per loop.
When calling nextLine for the first time in a loop, assign it to a variable and refer to that variable for the rest of the loop.
String line = fileIn.nextLine();
if (!line.isEmpty()) {
lines.add(line);
String test = line.substring(12, 25);
System.out.println(test);
}
Incidentally, there is no need to compare a boolean such as what is returned by hasNextLine() to true. Just use the boolean itself, e.g.:
while (fileIn.hasNextLine()) {
You're assuming every line has at least 25 characters in it with the line:
String test = fileIn.nextLine().substring(12, 25);
I'm guessing you have some lines that are shorter or blank.
You should check the String's length() before taking substrings.
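A minimal sketch of such a length check follows. The helper name slice and its behaviour of returning an empty string for short lines are assumptions made for this example, not something from the question; it also assumes 0 <= begin <= end.

```java
class SafeSubstring {
    // Hypothetical helper: returns the characters in [begin, end) that
    // actually exist in the line, or "" when the line is null or too
    // short to reach index begin.
    static String slice(String line, int begin, int end) {
        if (line == null || line.length() <= begin) {
            return "";
        }
        // Clamp the end index so substring never runs past the line.
        return line.substring(begin, Math.min(end, line.length()));
    }
}
```

With this in place, the loop body could call SafeSubstring.slice(line, 12, 25) instead of line.substring(12, 25), so short or blank lines no longer throw.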
The call to scanner.nextLine() advances to the next line. You should do it like this, if you are sure that every line has at least 25 characters:
public static void organiseFile() throws FileNotFoundException {
ArrayList<String> lines = new ArrayList<>();
String directory = "C:\\Users\\hussainm\\Desktop\\Files\\ex1";
Scanner fileIn = new Scanner(new File(directory + "_temp.txt"));
PrintWriter out = new PrintWriter(directory + "_ordered.txt");
while (fileIn.hasNextLine()) {
String line = fileIn.nextLine();
if (!line.isEmpty()) {
lines.add(line);
String test = line.substring(12, 25);
System.out.println(test);
}
}
...
}
Note that nextLine() returns the line without its line separator, so line.isEmpty() is true for a blank line and this check does skip blank lines. It does not protect against lines that are non-empty but shorter than 25 characters.
The exception is thrown when the line you are parsing is fewer than 25 characters long.
Notes
Not sure if your intent is to process only every third line, but fileIn.nextLine() appears three times in the loop, so each iteration consumes three lines and only the third one is passed to substring.
See the doc for nextLine():
Advances this scanner past the current line and returns the input that was skipped.
Maybe this is what you are trying to do:
Scanner in = null;
PrintWriter out = null;
try {
URL url = this.getClass().getResource("/test_in.txt");
File file = new File(url.getFile());
ArrayList<String> lines = new ArrayList<>();
in = new Scanner(file);
out = new PrintWriter("/test_out.txt");
int lineNumber = 0;
while (in.hasNextLine()) {
String line = in.nextLine();
lineNumber++;
if (line != null && line.trim().length() > 0) {
lines.add(line);
// guard both ends: lines shorter than 13 characters yield an empty string
String test = line.length() > 12 ? line.substring(12, Math.min(line.length(), 25)) : "";
System.out.println(String.format("line# %d: \t\"%s\"", lineNumber, test));
}
}
System.out.println(String.format("last line number: %d", lineNumber));
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
// either resource may still be null if opening the files failed
if (in != null) in.close();
if (out != null) out.close();
}

removeAll operation on arraylist makes program hang

I'm trying to read in from two files and store them in two separate arraylists. The files consist of words which are either alone on a line or multiple words on a line separated by commas.
I read each file with the following code (not complete):
ArrayList<String> temp = new ArrayList<>();
FileInputStream fis;
fis = new FileInputStream(fileName);
Scanner scan = new Scanner(fis);
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (scan.hasNext()) {
String md5 = scan.next();
temp.add(md5);
}
}
scan.close();
return temp;
Each file contains almost 1 million words (I don't know the exact number), so I'm not entirely sure that the above code works correctly - but it seems to.
I now want to find out how many words are exclusive to the first file/arraylist. To do so I planned on using list1.removeAll(list2) and then checking the size of list1 - but for some reason this is not working. The code:
public static ArrayList differentWords(String fileName1, String fileName2) {
ArrayList<String> file1 = readFile(fileName1);
ArrayList<String> file2 = readFile(fileName2);
file1.removeAll(file2);
return file1;
}
My main method contains a few different calls and everything works fine until I reach the above code, which just causes the program to hang (in netbeans it's just "running").
Any idea why this is happening?
You are not using input in
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (scan.hasNext()) {
String md5 = scan.next();
temp.add(md5);
}
}
I think you meant to do this:
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (input.hasNext()) {
String md5 = input.next();
temp.add(md5);
}
}
That said, you should look into String#split(), which will probably save you some time:
while (scan.hasNextLine()) {
String line = scan.nextLine();
String[] tokens = line.split(",");
for (String token: tokens) {
temp.add(token);
}
}
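Try this (it needs java.util.Iterator; removing through the iterator avoids the ConcurrentModificationException that file1.remove(s1) would throw inside a for-each loop):
for (Iterator<String> it = file1.iterator(); it.hasNext(); ) {
    String s1 = it.next();
    for (String s2 : file2) {
        if (s1.equals(s2)) {
            it.remove(); // safe removal while iterating
            break;
        }
    }
}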
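One likely reason the program appears to hang: ArrayList.removeAll checks every element of the first list with contains() on the second, and contains() on an ArrayList is a linear scan, so with about a million words per list the work grows with the product of the two sizes. A nested loop has the same cost. The sketch below, with made-up names (ExclusiveWords, onlyInFirst), does the lookups in a HashSet instead; it assumes both lists fit in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExclusiveWords {
    // Hypothetical helper: returns the words of `first` that do not occur
    // in `second`, keeping their original order (duplicates included).
    static List<String> onlyInFirst(List<String> first, List<String> second) {
        // HashSet lookups take roughly constant time, unlike ArrayList.contains.
        Set<String> lookup = new HashSet<>(second);
        List<String> result = new ArrayList<>();
        for (String word : first) {
            if (!lookup.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }
}
```

Calling file1.removeAll(new HashSet<>(file2)) should give a similar speed-up with less code, since removeAll then calls contains() on the set.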

Get the offset of previous line in a file

I'm extracting data from a file line by line into a database and I can't figure out a proper way to flag lines that I've already read into my database.
I have the following code that I use to iterate through the file lines. For each line I check whether it already has my flag; if not, I try to append the flag to that line.
List<String> fileLines = new ArrayList<String>();
File logFile = new File("C:\\MyStuff\\SyslogCatchAllCopy.txt");
try {
RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
String line = "";
String doneReadingFlag = "##";
Scanner fileScanner = new Scanner(logFile);
while ((line = raf.readLine()) != null && !line.contains(doneReading)) {
Scanner s = new Scanner(line);
String temp = "";
if (!s.hasNext(doneReadingFlag)) {
fileLines.add(line);
raf.write(doneReadingFlag.getBytes(), (int) raf.getFilePointer(),
doneReadingFlag.getBytes().length);
} else {
System.err.println("Allready Red");
}
}
} catch (FileNotFoundException e) {
System.out.println("File not found" + e);
} catch (IOException e) {
System.out.println("Exception while reading the file ");
}
// return fileLines;
// MoreProccessing(fileLines);
This code appends the flag to the next line and overwrites the characters at that position.
Any help?
When you write to a file, it doesn't insert, so you should expect it to replace the characters.
You need to reserve space in the file for the information you want to change, or you can add the information to another file.
Or, instead of marking each line, you can store somewhere the line number (or better, the character position) you have read up to.
If you are not restarting your process, you can have the process read the file as it is appended (meaning you might not need to store where you are up to anywhere).
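A minimal sketch of the "store the position you have read up to" idea follows. The names OffsetTracker, readNewLines and the side file holding the offset are all made up for this example; it assumes the log is only ever appended to and that the last line is complete when it is read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class OffsetTracker {
    // Hypothetical helper: returns the lines added to logFile since the
    // previous call. The byte offset reached so far is kept in offsetFile,
    // so the log itself is never modified.
    static List<String> readNewLines(Path logFile, Path offsetFile) throws IOException {
        long offset = 0;
        if (Files.exists(offsetFile)) {
            String saved = new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8);
            offset = Long.parseLong(saved.trim());
        }
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
            raf.seek(offset); // skip everything that was processed earlier
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
            }
            offset = raf.getFilePointer(); // position after the last line read
        }
        Files.write(offsetFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        return lines;
    }
}
```

Note that RandomAccessFile.readLine() turns each byte into one character, so non-ASCII text would need to be decoded separately.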
@Peter Lawrey I did as you said and it worked for me, as follows:
ArrayList<String> fileLines=new ArrayList<String>();
File logFile=new File("C:\\MyStuff\\MyFile.txt");
RandomAccessFile raf = new RandomAccessFile(logFile, "rw");
String line="";
String doneReadingFlag="#";
long oldOffset=raf.getFilePointer();
long newOffset=oldOffset;
while ((line=raf.readLine())!=null)
{
newOffset=raf.getFilePointer();
if(!line.contains(doneReadingFlag))
{
fileLines.add(line);
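raf.seek(oldOffset);
// writeBytes writes one byte per character (writeChars would write two),
// so only the first character of the line is overwritten by the flag
raf.writeBytes(doneReadingFlag);
raf.seek(newOffset);
System.out.println("Line added and flagged");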
}
else
{
System.err.println("Already read");
}
oldOffset=newOffset;
}

How do I make my java code search only for a to z and 0 to 9

1. My Java code takes almost 10-15 minutes to run (the input file is a list of queries, 7200+ lines long). How do I make it run faster and get the same results?
2. How do I make my code count only a-z, A-Z and 0-9?
3. If I don't do #2, some characters in my output are shown as "?". How do I solve this issue?
// no parameters are used in the main method
public static void main(String[] args) {
// assumes a text file named test.txt in a folder under the C:\file\test.txt
Scanner s = null;
BufferedWriter out = null;
try {
// create a scanner to read from the text file test.txt
FileInputStream fstream = new FileInputStream("C:\\user\\query.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
// Write to the file
out = new BufferedWriter(new FileWriter("C:\\user\\outputquery.txt"));
// keep getting the next String from the text, separated by white space
// and print each token in a line in the output file
//while (s.hasNext()) {
// String token = s.next();
// System.out.println(token);
// out.write(token + "\r\n");
//}
String strLine="";
String str="";
while ((strLine = br.readLine()) != null) {
str+=strLine;
}
String st=str.replaceAll(" ", "");
char[]third =st.toCharArray();
System.out.println("Character Total");
for(int counter =0;counter<third.length;counter++){
//String ch= "a";
char ch= third[counter];
int count=0;
for ( int i=0; i<third.length; i++){
// if (ch=="a")
if (ch==third[i])
count++;
}
boolean flag=false;
for(int j=counter-1;j>=0;j--){
//if(ch=="b")
if(ch==third[j])
flag=true;
}
if(!flag){
System.out.println(ch+" "+count);
out.write(ch+" "+count);
}
}
// close the output file
out.close();
} catch (IOException e) {
// print any error messages
System.out.println(e.getMessage());
}
// optional to close the scanner here, the close can occur at the end of the code
finally {
if (s != null) {
// close the input file
s.close();
}
}
}
For something like this I would NOT recommend Java. It is entirely possible in Java, but it is much easier with GAWK or something similar. GAWK also has Java-like syntax, so it's easy to pick up. You should check it out.
SO isn't really the place to ask such a broad how-do-I-do-this-question but I will refer you to the following page on regular expression and text match in Java. Also, check out the Javadocs for regexes.
If you follow that link you should get what you want, else you could post a more specific question back on SO.
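For what it's worth, here is a minimal single-pass sketch in Java. The code in the question rescans the whole text once for every character and builds the text with repeated string concatenation, which is probably where most of the time goes; counting into a map visits each character once. The names AlnumCounter and count are made up for this example, and only ASCII a-z, A-Z and 0-9 are counted, which also skips the characters that showed up as "?".

```java
import java.util.Map;
import java.util.TreeMap;

class AlnumCounter {
    // Hypothetical helper: counts how often each ASCII letter or digit
    // occurs in the text; every other character is ignored.
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean wanted = (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z')
                    || (ch >= '0' && ch <= '9');
            if (wanted) {
                // merge adds 1 to the existing count or starts it at 1
                counts.merge(ch, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The method can be called once per line from the BufferedReader loop (merging the results) or once on the whole text. A TreeMap is used only so the results print in sorted order.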
