Scanner unable to capture last newline character if last line is empty - java

I am taking a programming class where we have to compress a file using a Huffman tree and then decompress it.
I am running into a problem where I am unable to capture the last newline character of a txt file.
E.G.
This is a line
This is a secondline
//empty line
So if I compress and decompress the above text in a file, I end up with a file with this
This is a line
This is a secondline
Right now I'm doing
while(file.hasNextLine()){
    char[] cArr = file.nextLine().toCharArray();
    //count amount of times a character appears with a hashmap
    if(file.hasNextLine()){
        //add an occurrence of \n to the hashmap
    }
}
I understand the problem is that the last line technically does not have a "Scanner.hasNextline()" since I just consumed the last '\n' of the file with the nextLine() call.
Upon realizing that I have tried doing useDelimiter("") and Scanner.next() instead of Scanner.nextLine() and both still lead to similar problems.
So is there a way to fix this?
Thanks in advance.

Not to completely change your code or approach, using StringBuilder seems to work well.
File testfile = new File("test.txt");
StringBuilder stringBuffer = new StringBuilder();
try{
    BufferedReader reader = new BufferedReader(new FileReader(testfile));
    char[] buff = new char[500];
    for (int charsRead; (charsRead = reader.read(buff)) != -1; ) {
        stringBuffer.append(buff, 0, charsRead);
    }
}
catch(Exception e){
    System.out.print(e);
}
System.out.println(stringBuffer);
Checking the total bytes read:
System.out.println(stringBuffer.length()); // 51
List of file size:
ls -l .
51 test.txt
Bytes read match, so it appears it got all lines including the blank line.
note: I use Java 6, modify to suit your version.
Hope this helps.
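To connect this back to the Huffman frequency count, here is a minimal sketch that counts characters straight from a Reader, so every newline (including the final one) ends up in the map. The class name CharFrequency and the method count are made up for illustration, and the code assumes Java 8 or later (for Map.merge), unlike the Java 6 snippet above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class CharFrequency {
    // Counts every character the reader delivers, line terminators included.
    static Map<Character, Integer> count(Reader reader) throws IOException {
        Map<Character, Integer> freq = new HashMap<>();
        int c;
        while ((c = reader.read()) != -1) {
            freq.merge((char) c, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a BufferedReader over the real file.
        String text = "This is a line\nThis is a secondline\n\n";
        Map<Character, Integer> freq = count(new StringReader(text));
        System.out.println("newlines: " + freq.get('\n'));
    }
}
```

Because nothing is split into lines, there is no special case for the last line; wrapping a FileReader in a BufferedReader and passing it to count should behave the same way.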

Related

Scanner difficulties with different escape characters

First off let me start by saying that I know I'm not the only one who has experienced this issue and I spent the last couple of hours to research how to fix it. Sadly, I can't get my scanner to work. I'm new to java so I don't understand more complicated explanations that some answers have in different questions.
Here is a rundown:
I'm trying to read out of a file which contains escape characters of cards. Here is a short version: (Numbers 2 and 3 of 4 different card faces)
\u26602,2
\u26652,2
\u26662,2
\u26632,2
\u26603,3
\u26653,3
\u26663,3
\u26633,3
This is the format: (suit)(face),(value). an example:
\u2663 = suit
3 = face
3 = value
This is the code I'm using for reading it:
File file = new File("Cards.txt");
try {
    Scanner scanner = new Scanner(file);
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        String[] temp = line.split(",");
        cards.add(new Card(temp[0], Integer.parseInt(temp[1])));
    }
    scanner.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
After this, the ArrayList cards should have 52 cards, each containing a name (suit and face) and a value. When I try to print the name, this is the output:
\u26633
While it should be:
♣3
Can anyone give me pointers towards a solution? I really need this issue resolved. I don't want you to write my code for me.
Thanks in advance
Simply store the suit characters directly in your Cards.txt file, using UTF-8 as the character encoding, instead of the Unicode escape sequences. The \uXXXX notation is only interpreted by the Java compiler in source code, so when it is read from your file it arrives as the literal String "\u2660", not as the corresponding Unicode character.
Its content would then be something like:
♠2,2
...
Another way could be to use StringEscapeUtils.unescapeJava(String input) to unescape your unicode character.
The code change would then be:
cards.add(new Card(StringEscapeUtils.unescapeJava(temp[0]), Integer.parseInt(temp[1])));
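If adding Apache Commons Lang is not an option, the same idea can be sketched with the standard library only. The class UnicodeUnescape and its unescape method below are hypothetical names, and the sketch handles only the four-hex-digit form used in Cards.txt, not the other Java escapes that StringEscapeUtils covers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Turn the four hex digits into the character they encode.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is the seven literal characters as they appear in the file.
        System.out.println(unescape("\\u26633"));
    }
}
```

Printing the result still depends on the console being able to display the suit symbols.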
You'll have to save your file with UTF-8 encoding and then read the file using the same encoding.
♥,1
♥,2
♥,3
Here is the code snippet:
BufferedReader buff = new BufferedReader(new InputStreamReader(
        new FileInputStream("Cards.txt"), "UTF-8"));
String input = null;
while (null != (input = buff.readLine())) {
    System.out.println(input);
    String[] temp = input.split(",");
    cards.add(new Card(temp[0], Integer.parseInt(temp[1])));
}
buff.close();
Also, you need to make sure that your console supports UTF-8; otherwise the suit characters may still be displayed incorrectly.

Remove new line character from the middle of a file line in java

I would like to remove newline characters that occur in the middle of a line while reading a file.
If I read the file with BufferedReader, each of these is recognised as a line break, so the line is split in the middle. I want to be able to read the file and remove those mid-line newline characters while reading.
The format of each line is a simple Json.
Thank you
If what you're saying is that you want to remove the newlines from the original file after reading them, I think you can just write to a new (temporary) file while you're reading the lines, and then replace the original file with the new one after you're done writing.
If I'm interpreting your question correctly, what you want to do isn't quite as simple as "read and write at the same time". What you need is a loop and a StringBuilder.
public String readFileWithNoLines(BufferedReader reader) throws IOException {
    StringBuilder builder = new StringBuilder();
    String line;
    while((line = reader.readLine()) != null) {
        builder.append(line);
    }
    return builder.toString();
}
Then you want to write the return value of that function to the file.
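Putting the two suggestions together, a rough sketch could join the lines, write the result to a temporary file, and then move it over the original. The names JoinLines and removeLineBreaks are invented here, UTF-8 is an assumed encoding, and, like the function above, this removes every line break in the file, so it only fits a file that holds a single JSON record.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class JoinLines {
    static void removeLineBreaks(Path original) throws IOException {
        // Concatenate all lines without their terminators.
        StringBuilder builder = new StringBuilder();
        for (String line : Files.readAllLines(original, StandardCharsets.UTF_8)) {
            builder.append(line);
        }
        // Write next to the original, then swap the files.
        Path dir = original.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "joined", ".tmp");
        Files.write(temp, builder.toString().getBytes(StandardCharsets.UTF_8));
        Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of the file to rewrite as the first argument.
        removeLineBreaks(Paths.get(args[0]));
    }
}
```

Writing to a temporary file first means a failure halfway through leaves the original untouched.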

How to copy a file line by line keeping its original line breaks

I need to replace keys in various files (all types and line break formats).
To do this, I tried to copy the file line by line and replace keys in each line. This works, but the original line breaks are lost.
Here is my code, quite common:
FileInputStream fis = new FileInputStream(file);
BufferedReader reader = new BufferedReader(new InputStreamReader(fis));
FileWriter writer = new FileWriter(tmpFile);
BufferedWriter out = new BufferedWriter(writer);
String line;
while ((line = reader.readLine()) != null) {
    String updatedLine = replaceKeys(line);
    out.write(updatedLine);
    out.newLine();
}
I need to read the file line by line to be able to replace keys correctly (keys are determined by some delimiters, they must not be cut during file reading).
The problem is that my Unix files (.sh) have the wrong line breaks after replacement (the code is run on Windows). Also, files with only one line are changed into two-line files.
Question is, how to keep original file line breaks while copying the file line by line or, at least, how to be able to determine the end of the file to not add an additional line at the end? Thanks for your help.
Edit: Useless DataInputStream removed.
You can use a Scanner for this job:
try(Scanner s=new Scanner(file).useDelimiter("(?<=\n)|(?!\n)(?<=\r)");
    FileWriter out=new FileWriter(tmpFile)) {
    while(s.hasNext()){
        String line=s.next();
        String updatedLine = replaceKeys(line);
        out.write(updatedLine);
    }
}
The key point is the regex specified as delimiter. The pattern used above will match what BufferedReader.readLine() matches for a line break, that is, a '\n', '\r' followed by '\n', or a lone '\r'. But it uses “zero width lookbehind” to match the position after the line break rather than the line break itself so the line break becomes part of the token returned by Scanner.next().
So the String line will contain the line break at its end, unless it’s the last line and not terminated by a line break. So all you have to do, assuming that replaceKeys leaves the line break untouched, is to write the String as-is without appending a line break manually.
If replaceKeys can not cope with the String having a line break at its end, you have to split it before calling the method and joining afterwards.
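The split-and-join step could look roughly like the sketch below. The replaceKeys shown here is only a stand-in for the asker's method (it swaps a made-up ${name} placeholder), and process works on a String rather than a file to keep the example self-contained.

```java
import java.util.Scanner;

public class KeepLineBreaks {
    // Hypothetical stand-in for the question's replaceKeys method.
    static String replaceKeys(String line) {
        return line.replace("${name}", "world");
    }

    static String process(String input) {
        StringBuilder out = new StringBuilder();
        try (Scanner s = new Scanner(input).useDelimiter("(?<=\n)|(?!\n)(?<=\r)")) {
            while (s.hasNext()) {
                String token = s.next();
                // Find where the trailing line break (if any) starts.
                int end = token.length();
                while (end > 0 && (token.charAt(end - 1) == '\n'
                        || token.charAt(end - 1) == '\r')) {
                    end--;
                }
                // Replace keys in the text only, then re-attach the original break.
                out.append(replaceKeys(token.substring(0, end)))
                   .append(token.substring(end));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String result = process("Hi ${name}\r\nBye ${name}\nend");
        System.out.println(result.equals("Hi world\r\nBye world\nend"));
    }
}
```

Each token keeps whichever terminator it arrived with, so mixed \r\n and \n files come out unchanged apart from the replaced keys.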

Reading file to String in Java results in invisible characters

I'm having trouble around reading from a text file into a String in Java. I have a text file (created in Eclipse, if that matters) that contains a short amount of text -- approximately 98 characters. Reading that file to a String via several methods results in a String that is quite a bit longer -- 1621 characters. All but the relevant 98 are invisible in the debugger/console.
I've tried the following methods to load the String:
apache commons-io:
FileUtils.readFileToString(new File(path));
FileUtils.readFileToString(new File(path), "UTF-8");
byte[] b = FileUtils.readFileToByteArray(new File(path));
new String(b, "UTF-8");
byte[] b = FileUtils.readFileToByteArray(new File(path));
Charset.defaultCharset().decode(ByteBuffer.wrap(b)).toString();
NIO:
new String(Files.readAllBytes(path));
And so on.
Is there a method to strip away these control chars? Is there a way to read files to strings that doesn't have this issue?
As noted in the comments below, this behavior is due to a corrupted(?) file generated by Eclipse. I'd still be interested in hearing any strategies for trimming away control characters from Strings, though!
If you want to strip out all non-printable characters, try this
str = str.replaceAll("[^\\p{Graph}\n\r\t ]", "");
The regex matches all "invisible" characters, except ones we want to keep; in this case newline chars, tabs and spaces.
\p{Graph} is a POSIX character class for all printable/visible characters. To negate a POSIX character class, we can use a capital P, i.e. \P{Graph} (all non-printable/invisible characters); however, we must not remove newlines etc., so we need the negated character class [^\\p{Graph}\n\r\t ] instead.
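As a quick check of what the expression keeps and drops, here is a small sketch; StripControlChars and strip are illustrative names only. One thing worth knowing: in java.util.regex the POSIX classes are US-ASCII only by default, so this pattern also removes non-ASCII letters unless the UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag) is used.

```java
public class StripControlChars {
    // Removes everything that is neither a visible ASCII character
    // nor one of newline, carriage return, tab, or space.
    static String strip(String s) {
        return s.replaceAll("[^\\p{Graph}\n\r\t ]", "");
    }

    public static void main(String[] args) {
        // Contains a NUL and a BEL character between ordinary text.
        String dirty = "ab\u0000\u0007 c\td\n";
        String clean = strip(dirty);
        System.out.println(dirty.length() + " -> " + clean.length());
    }
}
```

If the text can contain accented or non-Latin characters, prefixing the pattern with (?U) is one option to consider.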
Read it line by line into a StringBuilder, and then convert it to a String:
StringBuilder sb = new StringBuilder();
BufferedReader file = new BufferedReader(new FileReader(fileName));
while (true)
{
    String line = file.readLine();
    if (line == null)
        break;
    sb.append(line+"\n");
}
file.close();
return sb.toString();

Conflicting character counts

I'm trying to find the number of characters in a given text file.
I've tried using both a scanner and a BufferedReader, but I get conflicting results. With the use of a scanner I concatenate every line after I append a new line character. E.g. like this:
FileReader reader = new FileReader("sampleFile.txt");
Scanner lineScanner = new Scanner(reader);
String totalLines = "";
while (lineScanner.hasNextLine()){
    String line = lineScanner.nextLine()+'\n';
    totalLines += line;
}
System.out.println("Count "+totalLines.length());
This returns the true character count for my file, which is 5799
Whereas when I use:
BufferedReader reader = new BufferedReader(new FileReader("sample.txt"));
int i;
int count = 0;
while ((i = reader.read()) != -1) {
    count++;
}
System.out.println("Count "+count);
I get 5892.
I know using the lineScanner will be off by one if there is only one line, but for my text file I get the correct output.
Also in notepad++ the file length in bytes is 5892 but the character count without blanks is 5706.
Your file may have lines terminated with \r\n rather than \n. That could cause your discrepancy.
You have to consider the newline and carriage return characters in a text file. Each of these also counts as a character.
I would suggest using the BufferedReader as it will return more accurate results.
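The \r\n explanation can be reproduced with a small sketch that runs both counting approaches over the same in-memory text. CountComparison and its two methods are made-up names, and a String replaces the file so the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

public class CountComparison {
    // The question's first approach: each line plus a single '\n'.
    static int countWithScanner(String text) {
        int total = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                total += sc.nextLine().length() + 1;
            }
        }
        return total;
    }

    // The question's second approach: one count per character read.
    static int countWithRead(String text) throws IOException {
        int count = 0;
        try (Reader r = new StringReader(text)) {
            while (r.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "one\r\ntwo\r\nthree\r\n";
        System.out.println(countWithScanner(text)); // 14
        System.out.println(countWithRead(text));    // 17
    }
}
```

The two counts differ by one per \r\n-terminated line. The gap in the question, 5892 - 5799 = 93, would be consistent with 93 such lines, although the question does not say how many lines the file has.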
