Java Scanner does not ignore new lines (\n)

I know that by default, the Scanner skips over whitespaces and newlines.
There is something wrong with my code because my Scanner does not ignore "\n".
For example: the input is "this is\na test." and the desired output is "this is a test."
This is what I have so far:
Scanner scan = new Scanner(System.in);
String token = scan.nextLine();
String[] output = token.split("\\s+");
for (int i = 0; i < output.length; i++) {
    if (hashmap.containsKey(output[i])) {
        output[i] = hashmap.get(output[i]);
    }
    System.out.print(output[i]);
    if (i != output.length - 1) {
        System.out.print(" ");
    }
}

nextLine() ignores the specified delimiter (as optionally set by useDelimiter()), and reads to the end of the current line.
Since input is two lines:
this is
a test.
only the first line (this is) is returned.
You then split that on whitespace, so output will contain [this, is].
Since you never use the scanner again, the second line (a test.) will never be read.
In essence, your title is right on point: Java Scanner does not ignore new lines (\n)
It specifically processed the newline when you called nextLine().
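A minimal sketch of what this implies: reading token by token with hasNext()/next() uses the default whitespace delimiter, so newlines are skipped like any other whitespace. The class and method names are made up for illustration, a String stands in for System.in, and the hashmap lookup from the question is left out.

```java
import java.util.Scanner;
import java.util.StringJoiner;

public class JoinTokens {
    // Hypothetical helper: reads every whitespace-separated token and
    // joins the tokens with single spaces.
    static String joinTokens(Scanner scan) {
        StringJoiner joined = new StringJoiner(" ");
        while (scan.hasNext()) {       // next() skips spaces and newlines alike
            joined.add(scan.next());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // A String source stands in for System.in here.
        System.out.println(joinTokens(new Scanner("this is\na test.")));
        // prints: this is a test.
    }
}
```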

You don't have to use a Scanner to do this:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String result = in.lines().collect(Collectors.joining(" "));
Or, if you really want to use a Scanner, this should also work:
Scanner scanner = new Scanner(System.in);
Spliterator<String> si = Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED);
String result = StreamSupport.stream(si, false).collect(Collectors.joining(" "));

Related

Java Scanner ignores first token if it is empty

I'm trying to read an InputStream of String tokens with a Scanner. Every token ends with a comma ,. An empty string "" is also a valid token. In that case the whole token is just the comma that ends it.
The InputStream is slowly read from another process, and any tokens should be handled as soon as they have been fully read. Therefore reading the whole InputStream to a String is out of the question.
An example input could look like this:
ab,,cde,fg,
If I set the delimiter of the Scanner to a comma, it seems to handle the job just fine.
InputStream input = slowlyArrivingStreamWithValues("ab,,cde,fg,");
Scanner scan = new Scanner(input);
scan.useDelimiter(Pattern.quote(","));
while (scan.hasNext()) {
System.out.println(scan.next());
}
output:
ab
(empty line)
cde
fg
However the problems appear when the stream begins with an empty token. For some reason Scanner just ignores the first token if it is empty.
/* begins with empty token */
InputStream input = slowlyArrivingStreamWithValues(",ab,,cde,fg,");
...
output:
ab
(empty line)
cde
fg
Why does Scanner ignore the first token? How can I include it?
Try using a lookbehind as the pattern:
(?<=,)
and then strip the trailing comma from each token that you match. Consider the following code:
String input = ",ab,,cde,fg,";
Scanner scan = new Scanner(input);
scan.useDelimiter("(?<=,)");
while (scan.hasNext()) {
System.out.println(scan.next().replaceAll(",", ""));
}
This outputs the following:
(empty line)
ab
cde
fg
It's easier if you write it yourself, without using Scanner:
static List<String> getValues(String source){
List<String> list = new ArrayList<String>();
for(int i = 0; i < source.length();i++){
String s = "";
while(source.charAt(i) != ','){
s+=source.charAt(i++);
if(i >= source.length()) break;
}
list.add(s);
}
return list;
}
For example, if source = ",a,,b,,c,d,e", the output will be "", "a", "", "b", "", "c", "d", "e".
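Since the question needs tokens handled as they arrive, the same hand-written idea can be applied to a Reader instead of a complete String. This is only a sketch: the class name, the forEachToken helper and its Consumer callback are assumptions for illustration, and a StringReader stands in for the slowly arriving stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class CommaTokens {
    // Hypothetical helper: reads one character at a time and hands each
    // comma-terminated token (including empty ones) to the handler as
    // soon as its closing comma has been read.
    static void forEachToken(Reader in, Consumer<String> handler) {
        StringBuilder token = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == ',') {
                    handler.accept(token.toString());
                    token.setLength(0);      // start the next token
                } else {
                    token.append((char) c);
                }
            }
            // Characters after the last comma form an unfinished token
            // and are dropped in this sketch.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        forEachToken(new StringReader(",ab,,cde,fg,"),
                     t -> System.out.println("[" + t + "]"));
    }
}
```

With the sample input this prints [], [ab], [], [cde] and [fg], one per line, so the leading empty token is kept.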

How to break a file into tokens based on regex using Java

I have a file in the following format. Records are separated by newlines, but some records contain line feeds, as shown below. I need to get each record and process it separately. The file could be a few MB in size.
<?aaaaa>
<?bbbb
bb>
<?cccccc>
I have the code:
FileInputStream fs = new FileInputStream(FILE_PATH_NAME);
Scanner scanner = new Scanner(fs);
scanner.useDelimiter(Pattern.compile("<\\?"));
while (scanner.hasNext()) {
String line = scanner.next();
System.out.println(line);
}
scanner.close();
But the result I got has the beginning <? removed:
aaaaa>
bbbb
bb>
cccccc>
I know the Scanner consumes any input that matches the delimiter pattern. All I can think of is to add the delimiter pattern back to each record manually.
Is there a way to NOT have the delimiter pattern removed?
Break on a newline only when preceded by a ">" char:
scanner.useDelimiter("(?<=>)\\R"); // Note you can pass a string directly
\R matches any line break sequence (\n, \r\n, and so on), so it works regardless of platform; it requires Java 8 or later
(?<=>) is a look behind that asserts (without consuming) that the previous char is a >
Plus it's cool because <=> looks like Darth Vader's TIE fighter.
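As a rough illustration of that delimiter, here is a sketch that runs it over an in-memory copy of the sample records. The class and method names are invented for the example, and a String replaces the file so the snippet is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class RecordSplit {
    // Hypothetical helper: splits on a line break only when the previous
    // character is '>', so line breaks inside a record are kept.
    static List<String> records(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("(?<=>)\\R");   // \R needs Java 8 or later
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "<?aaaaa>\n<?bbbb\nbb>\n<?cccccc>\n";
        for (String record : records(sample)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Each record keeps its leading <? because the lookbehind only inspects the > and the delimiter consumes nothing but the line break.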
I'm assuming you want to ignore the newline character '\n' everywhere.
I would read the whole file into a String and then remove all of the '\n's in the String. The part of the code this question is about looks like this:
String fileString = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
fileString = fileString.replace("\n", "");
Scanner scanner = new Scanner(fileString);
... //your code
Feel free to ask any further questions you might have!
Here is one way of doing it by using a StringBuilder:
public static void main(String[] args) throws FileNotFoundException {
Scanner in = new Scanner(new File("C:\\test.txt"));
StringBuilder builder = new StringBuilder();
String input = null;
while (in.hasNextLine() && null != (input = in.nextLine())) {
for (int x = 0; x < input.length(); x++) {
builder.append(input.charAt(x));
if (input.charAt(x) == '>') {
System.out.println(builder.toString());
builder = new StringBuilder();
}
}
}
in.close();
}
Input:
<?aaaaa>
<?bbbb
bb>
<?cccccc>
Output:
<?aaaaa>
<?bbbbbb>
<?cccccc>

How to read file line by line by CRLF

I have the following file:
and following code:
Scanner scanner = new Scanner(new FileReader(new File("file.txt")));
scanner.useDelimiter("\r\n");
int i = 0;
while (scanner.hasNext()) {
scanner.nextLine();
i++;
}
System.out.println(i);
It returns 5.
expected result: 2.
What am I doing wrong?
I want to split by CRLF only (not LF).
Use scanner.next() so that the delimiter you specified is actually applied.
scanner.nextLine() ignores it and uses its own line-separator pattern (the exact pattern is \r\n|[\n\r\u2028\u2029\u0085]), hence the count of 5.
while (scanner.hasNext()) {
scanner.next();
i++;
}
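A self-contained sketch of that fix, using invented sample data because the original file is not shown: the string below has five lines if you split on LF, but only two CRLF-terminated records. The class and method names are illustrative.

```java
import java.util.Scanner;

public class CrlfCount {
    // Hypothetical helper: counts tokens separated by CRLF only.
    static int countCrlfRecords(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\r\n");
            while (scanner.hasNext()) {
                scanner.next();    // honours the custom delimiter
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two CRLF-terminated records, each containing bare LFs.
        String sample = "a\nb\nc\r\nd\ne\r\n";
        System.out.println(countCrlfRecords(sample));   // prints 2
    }
}
```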

Scanner reading only half the no. of lines in a file

I am trying to read a file using Scanner Object with the following code -
public void read(){
Scanner scanner = new Scanner(dataFile).useDelimiter("\n");
String line;
int i = 0;
while(scanner.hasNext()){
line = scanner.next();
i++;
}
System.out.println(i);
}
The file which I am trying to read from has 117000 lines, out of which the scanner only reads first 59550 odd lines. It does not throw any exception and simply returns.
When I change the implementation to use a BufferedReader it reads all 117000 lines -
public void read(){
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(dataFile)));
String line;
int i=0;
while((line = br.readLine())!= null){
i++;
}
System.out.println(i);
}
Can anyone explain why scanner doesn't read all lines ?
One possible reason is that Scanner's initial buffer (1 KB) is smaller than BufferedReader's default buffer (8 KB).
The following program works for me:
Scanner scanner = new Scanner(dataFile);
String line;
int i = 0;
while(scanner.hasNextLine()){
line = scanner.nextLine();
// System.out.println(line); // remove comment for debug
i++;
}
System.out.println(i);
scanner.close();
The changes from the original program are:
Changed hasNext() and next() to hasNextLine() and nextLine(). In this case the default delimiter is fine
Fixed a typo - system.out.println should be System.out.println
Added a comment to print line (and check if the delimiter is OK)
Added scanner.close()
It's probably something to do with the line ending, delimiter used by Scanner.
You should use the methods :
hasNextLine() and nextLine()
Can anyone explain why scanner doesn't read all lines ?
br.readLine() also ends a line at a lone \r (not only at \n or \r\n). That is one important difference from your Scanner, which, with the delimiter set to "\n", splits only on \n.
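To see that difference in isolation, here is a small sketch comparing the two on a made-up string that contains a lone \r. The class and method names are assumptions, and whether the file in the question actually contains such line endings is not known.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Scanner;

public class LineEndings {
    // Counts lines the way BufferedReader does: \n, \r and \r\n all end a line.
    static long countWithReader(String text) {
        return new BufferedReader(new StringReader(text)).lines().count();
    }

    // Counts tokens the way the question's Scanner does: split on \n only.
    static int countWithScanner(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text).useDelimiter("\n")) {
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "a\rb\nc\n";   // the first line ends with a lone \r
        System.out.println(countWithReader(sample));    // prints 3
        System.out.println(countWithScanner(sample));   // prints 2
    }
}
```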

How to determine the end of a line with a Scanner?

I have a scanner in my program that reads in parts of the file and formats them for HTML. When I am reading my file, I need to know how to make the scanner know that it is at the end of a line and start writing to the next line.
Here is the relevant part of my code, let me know if I left anything out :
//scanner object to read the input file
Scanner sc = new Scanner(file);
//filewriter object for writing to the output file
FileWriter fWrite = new FileWriter(outFile);
//Reads in the input file 1 word at a time and decides how to
////add it to the output file
while (sc.hasNext() == true)
{
String tempString = sc.next();
if (colorMap.containsKey(tempString) == true)
{
String word = tempString;
String color = colorMap.get(word);
String codeOut = colorize(word, color);
fWrite.write(codeOut + " ");
}
else
{
fWrite.write(tempString + " ");
}
}
//closes the files
reader.close();
fWrite.close();
sc.close();
I found out about sc.nextLine(), but I still don't know how to determine when I am at the end of a line.
If you want to use only Scanner, read each line with nextLine() into a temporary string and create a second Scanner over that string. The second Scanner sees only that one line, so when its hasNext() returns false you know you have reached the end of the line. (On the original Scanner, hasNext() keeps returning true across line breaks; that is what it is meant to do, but it does not help in your situation.) Then call nextLine() on the first Scanner again, replacing the temporary string and the second Scanner for each new line.
Lines are usually delimited by \n, \r\n or \r, so you could check for those characters yourself, though I'm not sure why you would want to, since nextLine() already reads a whole line.
There is Scanner.hasNextLine() if you are worried about hasNext() not working for your specific case (not sure why it wouldn't though).
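The two-Scanner approach described above could look roughly like this. It is a sketch only: the class and method names are invented, the input is a String instead of the file, and the colorize step from the question is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineAwareWords {
    // Hypothetical helper: returns the words of each input line as a
    // separate list, so the line boundaries are preserved.
    static List<List<String>> wordsPerLine(Scanner lines) {
        List<List<String>> result = new ArrayList<>();
        while (lines.hasNextLine()) {
            List<String> words = new ArrayList<>();
            // The inner Scanner only ever sees the current line.
            try (Scanner wordScanner = new Scanner(lines.nextLine())) {
                while (wordScanner.hasNext()) {
                    words.add(wordScanner.next());
                }
            }
            result.add(words);   // inner loop finished: end of this line
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("int x = 1;\nreturn x;");
        System.out.println(wordsPerLine(input));
        // prints: [[int, x, =, 1;], [return, x;]]
    }
}
```

In the original program, the point where the inner loop ends is where the line separator would be written to the output file.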
You can use hasNextLine() to iterate over the file line by line instead of word by word, then split each line on whitespace and process the words.
Here is the same code using hasNextLine() and split():
//scanner object to read the input file
Scanner sc = new Scanner(file);
//filewriter object for writing to the output file
FileWriter fWrite = new FileWriter(outFile);
//get the line separator for the current platform
String newLine = System.getProperty("line.separator");
//Reads in the input file 1 word at a time and decides how to
////add it to the output file
while (sc.hasNextLine())
{
// split the line by whitespaces [ \t\n\x0B\f\r]
String[] words = sc.nextLine().split("\\s");
for(String word : words)
{
if (colorMap.containsKey(word))
{
String color = colorMap.get(word);
String codeOut = colorize(word, color);
fWrite.write(codeOut + " ");
}
else
{
fWrite.write(word + " ");
}
}
fWrite.write(newLine);
}
//closes the files
reader.close();
fWrite.close();
sc.close();
Wow, I've been using Java for 10 years and have never heard of Scanner!
It appears to use white space delimiters by default so you can't tell when an end of line occurs.
Looks like you can change the delimiters of the scanner - see the example at Scanner Class:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
