Scanner cuts off my String after about 2400 characters - java

I've got some very basic code like
while (scan.hasNextLine())
{
    String temp = scan.nextLine();
    System.out.println(temp);
}
where scan is a Scanner over a file.
However, on one particular line, which is about 6k chars long, temp cuts out after something like 2470 characters. There's nothing special about when it cuts out; it's in the middle of the word "Australia." If I delete characters from the line, the place where it cuts out changes; e.g. if I delete characters 0-100 in the file then Scanner will get what was previously 100-2570.
I've used Scanner for larger strings before. Any idea what could be going wrong?

At a guess, you may have a rogue character at the cut-off point: look at the file in a hex editor instead of just a text editor. Perhaps there's an embedded null character, or possibly \r in the middle of the string? It seems unlikely to me that Scanner.nextLine() would just chop it arbitrarily.
As another thought, are you 100% sure that it's not all there? Perhaps System.out.println is chopping the string - again due to some "odd" character embedded in it? What happens if you print temp.length()?
EDIT: I'd misinterpreted the bit about what happens if you cut out some characters. Sorry about that. A few other things to check:
If you read the lines with BufferedReader.readLine() instead of Scanner, does it get everything?
Are you specifying the right encoding? I can't see why this would show up in this particular way, but it's something to think about...
If you replace all the characters in the line with "A" (in the file) does that change anything?
If you add an extra line before this line (or remove a line before it) does that change anything?
Failing all of this, I'd just debug into Scanner.nextLine() - one of the nice things about Java is that you can debug into the standard libraries.
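The checks above (print the length, compare against BufferedReader.readLine(), look for an odd character) can be combined into one small diagnostic. This is only a sketch: the class and method names are made up for illustration, and it reads from a String rather than your file. It uses the fact that Scanner treats more characters as line separators than BufferedReader does, U+0085 (NEL) being one of them, so a stray control character can split a line for Scanner but not for readLine().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question.
class LineDiagnostics {
    // Length of each line as seen by Scanner.nextLine().
    static List<Integer> scannerLengths(String text) {
        List<Integer> lengths = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                lengths.add(scan.nextLine().length());
            }
        }
        return lengths;
    }

    // Length of each line as seen by BufferedReader.readLine().
    static List<Integer> readerLengths(String text) {
        List<Integer> lengths = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lengths.add(line.length());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lengths;
    }

    // Index of the first control character other than tab, or -1 if there is none.
    static int firstControlChar(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isISOControl(c) && c != '\t') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sample input with an embedded U+0085 in the first line.
        String sample = "first part\u0085second part\nnext line\n";
        System.out.println("Scanner:        " + scannerLengths(sample)); // [10, 11, 9]
        System.out.println("BufferedReader: " + readerLengths(sample));  // [22, 9]
    }
}
```

If the two lists differ for your file, run firstControlChar on the line that BufferedReader returns to find the position of the offending character, then check that position in a hex editor.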

Related

Removing comments from code character by character [Java]

I need to remove comments from code, but in this case I'll have to do it without using
System.out.println(sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", ""));
The program needs to check the code character by character to look for "/" and then proceed to check if the next character is "/" or "*".
I'm looking for a good way to read through the code and check characters letter by letter
This is a classic problem given to new Java learners. I would suggest a simple approach, since the exercise is intended to help you practice your coding skills.
Read the java source code as a file in your program char by char.
Search for comments beginning. In this case, there are 2, /* and //.
Open a string buffer and start writing the read contents into it.
If it's /*, then don't write it to the buffer. Keep moving to the next character until you find */.
Repeat until the end of the file is reached.
If single-line comments need to be removed, the same algorithm can be followed until you reach a newline character.
If you need help in reading from file char by char, refer to Java documentation.
When end of file is reached, then write the string buffer back to the file.
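The steps above can be sketched roughly as follows. The class and method names are hypothetical, and the sketch works on a String instead of a file to keep it short. It also makes one simplifying assumption: it does not track string or character literals, so a // inside a string literal would be treated as a comment.

```java
// Hypothetical helper class illustrating the character-by-character approach.
class CommentStripper {
    // Removes // and /* */ comments by examining one character at a time.
    // Assumption: string and char literals are not handled specially.
    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        int n = source.length();
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            char next = (i + 1 < n) ? source.charAt(i + 1) : '\0';
            if (c == '/' && next == '/') {
                // Line comment: skip up to, but not including, the newline.
                i += 2;
                while (i < n && source.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && next == '*') {
                // Block comment: skip until the closing */ or the end of input.
                i += 2;
                while (i + 1 < n
                        && !(source.charAt(i) == '*' && source.charAt(i + 1) == '/')) {
                    i++;
                }
                i = Math.min(i + 2, n);
            } else {
                // Ordinary character: copy it to the output buffer.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "int x = 1; // counter\n/* block\n comment */int y = 2;\n";
        System.out.print(strip(code));
    }
}
```

To apply this to a file, you would read the file contents into a String first and write the result back afterwards.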

Reading in line with blank space at end using java scanner class

I'm helping my sisters with a simple java program and I'm stumped. They've only learned scanner classes to read file contents, so I think they're supposed to use the scanner class. Each line contains letters and potentially a blank space, and we're hoping to store each line in an array. This works fine and dandy until one of the lines contains something like:
abcde f (the blank space after f should be read in as part of the line).
However, scanner.nextLine() seems to disregard this last blank space. I figured I could set my scanner delimiter to \n like so:
scanner.useDelimiter("\n")
and then use scanner.next() from there, but this still doesn't seem to work. I've googled around and taken a look at a few Stack Overflow questions. This question here seems to suggest this is not easily done with the Scanner class: How to read whitespace with scanner.next()
Any ideas? I feel like there's an easy way I'm overlooking.
This is how I'm reading in the lines:
while (scanner.hasNextLine()) {
    String nextLine = scanner.nextLine();
}
Using the above example, my string would read abcde f. It will get rid of the empty space at the end.
I've also tried to use hasNext and next.
Pardon my formatting, I'm editing on a phone.
Save your text file as ANSI encoding and try again.
By rights, scanner.nextLine() will capture everything in the line, including trailing whitespace; only the line separator is dropped.
scanner.next() will not capture whitespace, because the default delimiter is whitespace.
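Here is a small sketch you can use to check this on your own machine. The class and method names are made up, and the input is a String literal instead of your file. If the trailing space shows up here but not with your file, the problem is more likely in the file itself (or in an editor stripping trailing whitespace) than in Scanner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class comparing nextLine() and next().
class TrailingSpaceDemo {
    // Collects whole lines; trailing spaces are kept, line separators are not.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Collects tokens using the default whitespace delimiter.
    static List<String> readTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "abcde f \nxyz\n";
        for (String line : readLines(text)) {
            // Brackets make the trailing space visible.
            System.out.println("[" + line + "] length=" + line.length());
        }
        System.out.println(readTokens(text));
    }
}
```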

How to keep spaces where they were after modifying string

So I have to get words from a text file, change them, and put them into a new text file.
The problem I'm having is, let's say the first line of the file is
hello my name is bob
the modified result should be:
ellohay myay amenay isay bobay
but instead, the result ends up being
ellomynameisbobhay
So Scanner has .nextLine(), but I want a method like .nextWord() or something, so that it will recognize something as a word until it has a space after it. How can I create this?
nextLine() gives you the whole line.
What you should use is just next(), that will give you the next word.
Also see String.split() or StringTokenizer if you want to post-process whole lines. It sounds as though in your situation just using the Scanner is fine, but I thought I'd mention them because I assumed you'd have used those methods already if you knew about them.
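A rough sketch of the next() approach: transform each word separately and put a single space back between the results. The names are hypothetical, and the word rule shown (move the first letter to the end and append "ay") is only a placeholder, not necessarily the exact rule your assignment uses.

```java
import java.util.Scanner;
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

// Hypothetical helper class for word-by-word processing.
class WordByWord {
    // Applies the transform to each whitespace-separated word
    // and rejoins the results with single spaces.
    static String transformWords(String line, UnaryOperator<String> transform) {
        StringJoiner out = new StringJoiner(" ");
        try (Scanner words = new Scanner(line)) {
            while (words.hasNext()) {
                out.add(transform.apply(words.next()));
            }
        }
        return out.toString();
    }

    // Placeholder rule (assumption): first letter moves to the end, then "ay".
    // Tokens returned by next() are never empty, so substring(1) is safe.
    static String moveFirstLetter(String word) {
        return word.substring(1) + word.charAt(0) + "ay";
    }

    public static void main(String[] args) {
        System.out.println(transformWords("hello my name", WordByWord::moveFirstLetter));
        // prints "ellohay ymay amenay"
    }
}
```

Note that this collapses runs of whitespace into one space. If the original spacing must be preserved exactly, you would need to process the gaps between words as well.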

Suggested ways of reading a text file with inconsistent formatting

I'm trying to read a text file of numbers as a double array and after various methods (usually resulting in an input format exception) I have come to the conclusion that the text file I am trying to read is inconsistent with its delimiting.
The majority of the text format is in the form "0.000,0.000" so I have been using a Scanner and the useDelimiter(",") to read in each value.
It turns out though (this is a big file of numbers) that some of the formatting is in the form "0.000 0.000" (at the end of a line I presume) which of course produces an input format exception.
This is an open question really, I'm a pretty basic Java programmer so I would just like to see if there are any suggestions/ways of performing this. Is Scanner the correct class to go on this?
Thank you for your time!
Read file as text line-by-line. Then split line into parts:
String[] parts = line.split("[ ,]");
Now iterate over the parts and call Double.parseDouble() for each part.
Scanner allows any Java Regex Pattern to function as a delimiter. You should be able to use any number of delimiters by doing the following:
scanner.useDelimiter("[,\\s]"); // Will match commas and whitespace
I'd like to comment this in instead of making it a separate answer, but my reputation is too low. Apologies, Alex.
You mentioned having two different delimiter characters used in different instances, not a combination of the two as a single delimiter.
You can use the vertical bar as logical OR in a regular expression. Note that the bar only means OR outside square brackets; inside a character class such as [,|\\s] it is a literal character.
scanner.useDelimiter(",|\\s"); // Will match a comma or a whitespace character
Line by line:
String[] parts = line.split(",|\\s");
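Both suggestions (a regex delimiter on the Scanner, or splitting each line) can be sketched as below. The class and method names are invented for the example, and the input is a String rather than your file. One deliberate difference from the snippets above: the pattern has a trailing +, so a run such as a comma followed by a newline counts as a single delimiter and produces no empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for reading numbers with mixed delimiters.
class MixedDelimiters {
    // Reads every number in the text, treating any run of commas
    // and/or whitespace as one delimiter.
    static List<Double> parseWithScanner(String text) {
        List<Double> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[,\\s]+");
            while (scanner.hasNext()) {
                values.add(Double.parseDouble(scanner.next()));
            }
        }
        return values;
    }

    // Same idea for a single line that has already been read.
    static List<Double> parseLine(String line) {
        List<Double> values = new ArrayList<>();
        for (String part : line.split("[,\\s]+")) {
            if (!part.isEmpty()) { // split can yield a leading empty string
                values.add(Double.parseDouble(part));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "0.5,1.25 2.0\n3.75,4.0\n";
        System.out.println(parseWithScanner(text)); // [0.5, 1.25, 2.0, 3.75, 4.0]
    }
}
```

Using next() with Double.parseDouble() instead of nextDouble() keeps the parsing independent of the default locale.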

Java scanner to detect blank line?

I've been having lots of trouble trying to get either a scanner or a buffered reader to try and detect a blank line. For example if I have a file that contains:
there
cat
dog
(BLANK LINE)
If I do this:
while( scan.hasNextLine() )
{
String line = scan.nextLine();
...
...
}
The scanner doesn't pick up the blank line. I tried to use a buffered reader also but I run into this issue. Is there some way the scanner can just return a "" whenever it finds a blank line like that? Cheers
Your input has as many lines as it has \n characters. Given the input
"there\ncat\ndog\n"
the next-lines will be correctly divided as
"there\n"
"cat\n"
"dog\n"
(In other words, there is no fourth blank line, since it is not terminated by a \n.)
Put differently, after the "dog\n" has been read, the scanner (or buffered reader for that matter) has reached EOF and there's not even an empty line to return. (Note that when the lines are returned, the new-line character is stripped off.)
So, since this is the expected behavior, I don't know what the easiest fix is. I suspect that the best way to solve this is simply to append a \n to the input, so that the loop runs an extra iteration.
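To see the behavior described above for yourself, here is a small sketch (hypothetical class name, String input instead of a file). With one trailing \n the Scanner yields three lines; with a second \n it yields a fourth, empty line. A blank line in the middle of the input also comes back as "".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class showing when Scanner returns an empty line.
class BlankLineDemo {
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                lines.add(scan.nextLine()); // "" for a blank line
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines("there\ncat\ndog\n").size());   // 3
        System.out.println(readLines("there\ncat\ndog\n\n").size()); // 4
        System.out.println(readLines("there\n\ndog\n"));             // [there, , dog]
    }
}
```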
