So I have to get words from a text file, change them, and put them into a new text file.
The problem I'm having is, lets say the first line of the file is
hello my name is bob
the modified result should be:
ellohay myay amenay isay bobay
but instead, the result ends up being
ellomynameisbobhay
so scanner has .nextLine() but I want to have a method that is .nextWord() or something, so that it will recognize something as a word until it has a space after it. how can I create this?
nextLine() gives you the whole line.
What you should use is just next(), that will give you the next word.
Also see String.split() or StringTokenizer if you wanted to post-process whole lines. It sound s as though in your situation just using the scanner is fine, but I though i'd mention it because I assumed you'd have just used those methods if you knew about them.
Related
I'm helping my sisters with a simple java program and I'm stumped. They've only learned scanner classes to read file contents, so I think they're supposed to use the scanner class. Each line contains letters and potentially a blank space, and we're hoping to store each line in an array. This works fine and dandy until one of the lines contains something like:
abcde f (the blank space after f should be read in as part of the
line).
However, scanner.nextLine() seems to disregard this last blank space. I figured I could set my scanner delimiter to \n like so:
scanner.useDelimiter("\n")
and then use scanner.Next() from there, but this still doesn't seem to work. I've googled around and taken a look at a few stackoverflow questions. This question here seems to suggest this is not easily done with the scanner class: How to read whitespace with scanner.next()
Any ideas? I feel like there's an easy way I'm overlooking.
This is how I'm reading in the lines:
While(scanner.hasNextLine(){
String nextLine = scanner.nextLine();
Using the above example, my string would read abcde f. It will get rid of the empty space at the end.
I've also tried to use hasNext and next.
Pardon my formatting, I'm editing on a phone.
Save your text file as ANSI encoding and try again.
By right scanner.nextLine() will capture everything in the line, including whitespace.
scanner.next() will not capture whitespace as the delimiter is whitespace by default.
I'm trying to read a text file of numbers as a double array and after various methods (usually resulting in an input format exception) I have come to the conclusion that the text file I am trying to read is inconsistent with it's delimiting.
The majority of the text format is in the form "0.000,0.000" so I have been using a Scanner and the useDelimiter(",") to read in each value.
It turns out though (this is a big file of numbers) that some of the formatting is in the form "0.000 0.000" (at the end of a line I presume) which of course produces an input format exception.
This is an open question really, I'm a pretty basic Java programmer so I would just like to see if there are any suggestions/ways of performing this. Is Scanner the correct class to go on this?
Thank you for your time!
Read file as text line-by-line. Then split line into parts:
String[] parts = line.split("[ ,]");
Now iterate over the parts and call Double.parseDouble() for each part.
Scanner allows any Java Regex Pattern to function as a delimiter. You should be able to use any number of delimiters by doing the following:
scanner.setDelimiter("[,\\s]"); // Will match commas and whitespace
I'd like to comment this in instead of making it a separate answer, but my reputation is too low. Apologies, Alex.
You mentioned having two different delimited characters used in different instances, not a combination of the two as a single delimiter.
You can use the vertical bar as logical OR in a regular expression.
scanner.setDelimiter("[,|\\s]"); //Will match commas or whitespace as appropriate
line by line:
String[] parts = line.split("[,|\\s]");
I need to write a parser for textfiles (at least 20 kb), and I need to determine if words out of a set of words appear in this textfile (about 400 words and numbers). So I am looking for the most efficient possibilitie to do this (if a match is found, i need to do some further processing of this and it's previous line).
What I currently do, is to exclude lines that do not contain any information for sure (kind of metadata lines) and then compare word by word - but i don't think that only comparing word by word is the most efficient possibility.
Can anyone please provide some tips/hints/ideas/...
Thank you very much
It depends on what you mean with "efficient".
If you want a very straightforward way to code it, keep in mind that the String object in java has method String.contains(CharSequence sequence).
Then, you could put the file content into a String and then iterate on your keywords you want to check to see if any of those appear in String, using the method contains().
How about the following:
Put all your keywords in a HashSet (Set<String> keywords;)
Read the file one line at once
For each line in file:
Tokenize to words
For each word in line:
If word is contained in keywords (keywords.containes(word))
Process actual line
If previous line is available
Process previous line
Keep track of previous line (prevLine = line;)
I'm currently writing something which is validating our vbscript files. Right at the start I wish to remove all lines of code which are comments. I was expecting to be able to use the "'" (comment symbol in vbscript) and '\n'. However, when I write the content of the file to screen, the new lines are not formatting. Does this mean there are actually no new lines in the original vbscript file and if not, how could I remove comments?
first read whole file in string example
then use regex or simply substring for removing extra syntax
How are you parsing the file? Are you also taking the '\r' into consideration when removing the comments? Or maybe you are accidentally removing all newline characters.
I would create some state flags to tell the parser when I was in a comment or not.
I've got some very basic code like
while (scan.hasNextLine())
{
String temp = scan.nextLine();
System.out.println(temp);
}
where scan is a Scanner over a file.
However, on one particular line, which is about 6k chars long, temp cuts out after something like 2470 characters. There's nothing special about when it cuts out; it's in the middle of the word "Australia." If I delete characters from the line, the place where it cuts out changes; e.g. if I delete characters 0-100 in the file then Scanner will get what was previously 100-2570.
I've used Scanner for larger strings before. Any idea what could be going wrong?
At a guess, you may have a rogue character at the cut-off point: look at the file in a hex editor instead of just a text editor. Perhaps there's an embedded null character, or possibly \r in the middle of the string? It seems unlikely to me that Scanner.nextLine() would just chop it arbitrarily.
As another thought, are you 100% sure that it's not all there? Perhaps System.out.println is chopping the string - again due to some "odd" character embedded in it? What happens if you print temp.length()?
EDIT: I'd misinterpreted the bit about what happens if you cut out some characters. Sorry about that. A few other things to check:
If you read the lines with BufferedReader.readLine() instead of Scanner, does it get everything?
Are you specifying the right encoding? I can't see why this would show up in this particular way, but it's something to think about...
If you replace all the characters in the line with "A" (in the file) does that change anything?
If you add an extra line before this line (or remove a line before it) does that change anything?
Failing all of this, I'd just debug into Scanner.nextLine() - one of the nice things about Java is that you can debug into the standard libraries.