I am using Java's Scanner to parse some text. Say I have set a character class of several characters, [#$], as the delimiter.
With next() I get the text up to that delimiter, but I would like a way to find out whether parsing stopped because it found # or because it found $.
Is there some way to do that? Or should I split the work in two: try with the first delimiter, and if that fails, try with the second?
Found it! :)
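Right after calling next(), you can use
scanner.findWithinHorizon("#", 1)
to see whether # was the delimiter found. next() leaves the scanner positioned just before the delimiter, so a horizon of 1 inspects exactly that character: the call returns "#" if it matches and null otherwise.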
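A minimal, self-contained sketch of that idea, assuming the whole input is held in a String. The class name, the describe helper and the sample text are made up for illustration. After each next(), findWithinHorizon with the delimiter class and a horizon of 1 returns the delimiter that ended the token, or null when the input simply ran out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDetector {
    // Hypothetical helper: returns "token -> delimiter" for each token, "end" when input ran out.
    static List<String> describe(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter("[#$]")) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                // next() stops right before the delimiter; horizon 1 looks at that one character only.
                String delim = scanner.findWithinHorizon("[#$]", 1);
                result.add(token + " -> " + (delim == null ? "end" : delim));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        describe("alpha#beta$gamma").forEach(System.out::println);
        // alpha -> #
        // beta -> $
        // gamma -> end
    }
}
```

A horizon larger than 1 could pick up a # that belongs to a later delimiter when two delimiters are adjacent, which is why the search is kept to a single character here.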
I am attempting to use substrings to prevent words from being split across lines in the console output.
I have seen answers that split the string at every word, but I want to break a line only when necessary.
Original code:
System.out.println("This is a very extra long quote that doesn't break properly.");
Build output:
This is a very extr
a long quote that d
oesn't break proper
ly.
Desired build output:
This is a very
extra long
quote that
doesn't
break properly.
Of course, I don't actually want it that narrow; I just want a word to move to a new line when it would otherwise be split.
Thanks to anyone who helps! All responses are appreciated!
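You can use the wrap method from WordUtils (Apache Commons Lang; newer code should use org.apache.commons.text.WordUtils from Apache Commons Text, since the Lang version is deprecated).
The method takes a wrapLength and breaks the string at whitespace, so each line holds as many whole words as fit within that length; by default, a single word longer than wrapLength is left unbroken.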
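If adding a dependency isn't an option, the same greedy strategy (fill each line with as many whole words as fit) is short to write by hand. This is a sketch with a hypothetical wrap helper, not WordUtils' actual implementation; with Commons Text on the classpath you would simply call WordUtils.wrap(text, 19).

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyWrap {
    // Greedy word wrap: put as many whole words on a line as fit in `width`.
    // A word longer than `width` gets a line of its own rather than being split.
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // +1 accounts for the space that would separate this word from the previous one.
            if (line.length() > 0 && line.length() + 1 + word.length() > width) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String quote = "This is a very extra long quote that doesn't break properly.";
        // 19 matches the width at which the console output above was cut.
        for (String line : wrap(quote, 19)) {
            System.out.println(line);
        }
        // This is a very
        // extra long quote
        // that doesn't break
        // properly.
    }
}
```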
I am having issues using the delimiter in my scanner. I am currently using a Scanner to read a text file and put its tokens into strings. My tutor told me to use the delimiter useDelimiter("\t|\n"). However, the tokens it grabs end in \r (due to the line breaks in the text file). This is fine for printing, but I need the string length, and instead of the number of actual characters it returns the number of characters including that \r. Is there a better delimiter I can use that accomplishes the same thing without grabbing the \r? The code is as follows:
studentData.useDelimiter("\t|\n");
while (studentData.hasNext())
{
token = studentData.next();
int tokenLength = token.length();
statCalc(tokenLength);
}
I am well aware that I could simply remove the last character of the token string. However, for many reasons, I just want the scanner to grab the token without the \r. Any and all help would be greatly appreciated.
Try this:
studentData.useDelimiter("\\t|\\R");
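The \R pattern (available since Java 8) matches any line break, including the two-character Windows sequence \r\n; see the java.util.regex.Pattern documentation.
I guess the remaining \r is a partially consumed line break from a Windows environment: the \n delimiter consumes only the second half of \r\n. With the delimiter above, the scanner consumes the whole line break.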
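A self-contained sketch comparing the two delimiters on a string with Windows line endings. The sample data and the tokens helper are made up for illustration, and \R needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineBreakDelimiter {
    // Hypothetical helper: collects every token the scanner returns for the given delimiter regex.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical student data saved with Windows (\r\n) line endings.
        String data = "Ann\t90\r\nBob\t85\r\n";
        for (String t : tokens(data, "\t|\n")) {
            // Print \r visibly; "90" and "85" come back with a trailing \r (length 3).
            System.out.println(t.replace("\r", "\\r") + " length " + t.length());
        }
        for (String t : tokens(data, "\\t|\\R")) {
            // \R consumes the whole \r\n, so every token has its real length.
            System.out.println(t.replace("\r", "\\r") + " length " + t.length());
        }
    }
}
```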
Alternatively, remove all carriage returns and line feeds from your string. Try this:
s = s.replaceAll("\\n", "");
s = s.replaceAll("\\r", "");
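A quick check of that approach on a token as the \t|\n delimiter leaves it at the end of a Windows line; the sample token and the strip helper are hypothetical. String.replace("\r", "") would do the same job without going through regular expressions.

```java
public class StripLineBreaks {
    // Hypothetical helper: removes every line feed and carriage return, using the same two replaceAll calls.
    static String strip(String s) {
        s = s.replaceAll("\\n", "");
        s = s.replaceAll("\\r", "");
        return s;
    }

    public static void main(String[] args) {
        String token = "90\r";                      // what "\t|\n" leaves at the end of a Windows line
        System.out.println(token.length());         // 3
        System.out.println(strip(token).length());  // 2
    }
}
```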
Windows-style line endings are usually \r\n, but your delimiter ignores the \r. Your regex pattern (\t|\n) can be improved by using:
(\t|\r\n|\r|\n)
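A small sketch of that pattern written as a Java string, run on hypothetical input that mixes Windows (\r\n), old Mac (\r) and Unix (\n) line endings. Java tries alternatives from left to right, so \r\n is listed before \r and a Windows line break is consumed as one delimiter rather than as \r followed by a separate \n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AlternationDelimiter {
    // Hypothetical helper: tokens separated by a tab or by any of the three line-ending styles.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        // \r\n comes before \r so the pair is matched as a single delimiter.
        try (Scanner scanner = new Scanner(input).useDelimiter("\\t|\\r\\n|\\r|\\n")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a\tb\r\nc\rd\ne")); // [a, b, c, d, e]
    }
}
```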
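However, it looks to me like what you're trying to accomplish is a "tokenizer" that breaks a text file into words (since you're also looking for \t), so my guess is that you're better off with:
studentData.useDelimiter("\\s+");
which treats any run of white space (spaces, tabs, \r and \n) as a single delimiter. Note the + rather than *: \s* also matches the empty string, so the scanner would end up returning the text one character at a time.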
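A short sketch of the whitespace approach with \s+ (one or more whitespace characters), on hypothetical data with tabs, doubled spaces and Windows line endings. Scanner's default delimiter is already a run of whitespace (\p{javaWhitespace}+), so leaving useDelimiter out entirely would behave much the same here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WhitespaceTokenizer {
    // Hypothetical helper: splits input on any run of whitespace.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        // \s+ covers space, \t, \r and \n, so "\r\n" or "  " counts as one delimiter.
        try (Scanner scanner = new Scanner(input).useDelimiter("\\s+")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Ann\t90\r\nBob  85\r\n")); // [Ann, 90, Bob, 85]
    }
}
```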
The java.util.regex.Pattern documentation is a good place to learn more about regular expressions.
I'm helping my sisters with a simple Java program and I'm stumped. They've only learned the Scanner class for reading file contents, so I think they're supposed to use it. Each line contains letters and possibly a blank space, and we're hoping to store each line in an array. This works fine and dandy until one of the lines contains something like:
abcde f (with a blank space after the f, which should be read in as part of the line).
However, scanner.nextLine() seems to disregard this last blank space. I figured I could set my scanner delimiter to \n like so:
scanner.useDelimiter("\n");
and then use scanner.next() from there, but this still doesn't seem to work. I've googled around and taken a look at a few Stack Overflow questions. This question seems to suggest this is not easily done with the Scanner class: How to read whitespace with scanner.next()
Any ideas? I feel like there's an easy way I'm overlooking.
This is how I'm reading in the lines:
while (scanner.hasNextLine()) {
    String nextLine = scanner.nextLine();
    // ... store nextLine in the array
}
Using the example above, my string would read "abcde f": the space at the end is gone.
I've also tried to use hasNext and next.
Pardon my formatting, I'm editing on a phone.
Save your text file as ANSI encoding and try again.
By rights, scanner.nextLine() captures everything on the line, including whitespace; only the line separator itself is dropped.
scanner.next() does not capture whitespace, because the delimiter is whitespace by default.
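A self-contained way to check the nextLine() claim, with the file contents replaced by a string literal so it runs standalone (the sample text and helper are hypothetical). If a real file still seems to lose the space, printing each line in brackets as done here makes it easy to see exactly what was read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TrailingSpaceLines {
    // Hypothetical helper: reads every line; nextLine() drops only the line separator.
    static List<String> readLines(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = readLines("abcde f \nxyz\n");
        String first = lines.get(0);
        System.out.println("[" + first + "] length " + first.length()); // [abcde f ] length 8
    }
}
```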
I'm trying to read a text file of numbers into a double array, and after various methods (usually resulting in an input format exception) I have come to the conclusion that the text file I am trying to read is inconsistent in its delimiting.
The majority of the text format is in the form "0.000,0.000" so I have been using a Scanner and the useDelimiter(",") to read in each value.
It turns out though (this is a big file of numbers) that some of the formatting is in the form "0.000 0.000" (at the end of a line I presume) which of course produces an input format exception.
This is an open question really; I'm a pretty basic Java programmer, so I would just like to see if there are any suggestions or ways of doing this. Is Scanner the right class for this?
Thank you for your time!
Read the file as text, line by line. Then split each line into parts:
String[] parts = line.split("[ ,]");
Now iterate over the parts and call Double.parseDouble() for each part.
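A sketch of that approach, reading from a StringReader so it runs standalone; for the real file you would open it with java.nio.file.Files.newBufferedReader(path) instead. The class name and sample text are hypothetical. The empty-part check guards against leading or doubled separators such as "0.1, 0.2".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class MixedDelimiterParser {
    // Reads line by line and splits each line on a space or a comma.
    static List<Double> parse(BufferedReader reader) {
        List<Double> values = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String part : line.split("[ ,]")) {
                    if (!part.isEmpty()) { // skip blanks from leading or doubled separators
                        values.add(Double.parseDouble(part));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the real file; both "0.000,0.000" and "0.000 0.000" forms appear.
        String sample = "0.125,0.250\n0.500 0.750\n";
        System.out.println(parse(new BufferedReader(new StringReader(sample)))); // [0.125, 0.25, 0.5, 0.75]
    }
}
```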
Scanner allows any Java regex pattern to function as a delimiter, so you can accept several delimiters at once, for example:
scanner.useDelimiter("[,\\s]+"); // one or more commas and/or whitespace characters
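A self-contained sketch of the Scanner route, on a hypothetical sample that mixes both formats with Windows line endings. The trailing + lets a run such as \r\n or ", " count as one delimiter. The explicit Locale.US is an assumption worth keeping: nextDouble() follows the scanner's locale, and in a locale that uses a comma as the decimal separator "0.125" would not be recognised as a double.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerMixedDelimiters {
    // Hypothetical helper: reads every double separated by commas and/or whitespace.
    static List<Double> readDoubles(String input) {
        List<Double> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter("[,\\s]+")) {
            // Parse '.' as the decimal point whatever the default locale is.
            scanner.useLocale(Locale.US);
            while (scanner.hasNextDouble()) {
                values.add(scanner.nextDouble());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "0.125,0.250 0.500\r\n0.750,1.000\n";
        System.out.println(readDoubles(sample)); // [0.125, 0.25, 0.5, 0.75, 1.0]
    }
}
```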
I'd have posted this as a comment instead of a separate answer, but my reputation is too low. Apologies, Alex.
You mentioned two different delimiter characters used in different places, not a combination of the two acting as a single delimiter.
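You can use the vertical bar as a logical OR in a regular expression, as long as it is outside square brackets (inside a character class such as [,\\s] each character is already an alternative, and | is just a literal pipe):
scanner.useDelimiter(",|\\s"); // matches a comma or a whitespace character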
Or, line by line:
String[] parts = line.split(",|\\s");
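A tiny check of how | behaves outside and inside square brackets, using String.split on made-up inputs.

```java
import java.util.Arrays;

public class PipeInRegex {
    public static void main(String[] args) {
        // Outside brackets, | separates alternatives: a comma OR a whitespace character.
        System.out.println(Arrays.toString("0.5,1.5 2.5".split(",|\\s"))); // [0.5, 1.5, 2.5]
        // Inside brackets every character is already an alternative,
        // so | is a literal pipe: [,|\s] also splits on '|'.
        System.out.println(Arrays.toString("0.5|1.5".split("[,|\\s]")));   // [0.5, 1.5]
        // The alternation form leaves a pipe alone.
        System.out.println(Arrays.toString("0.5|1.5".split(",|\\s")));     // [0.5|1.5]
    }
}
```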
I'm trying to scan a file that has the DOS ^M as end-of-line using something like:
Scanner file = new Scanner(new File(saveToFilePath)).useDelimiter("(?=\\^M)");
In other words, I want to read the text line by line but also keep the ^M that marks the end of the line. This would be easy with \n but I'm not good with regexes and the DOS end-of-line is driving me crazy.
After some research I finally got it. The following regex finds ^M and keeps it. I didn't know that ^M means Ctrl-M, the carriage-return character (\r, character code 13), so some of your responses helped with that. That is also why there is no "M" in the regex: ^M is just how editors display that single control character, and \p{Cntrl} matches any control character, \r included (it also matches \n and \t). Because the delimiter is a lookahead, (?=...), it is zero-width: the scanner splits just before each control character without consuming it, so the ^M stays in the text instead of being thrown away.
Scanner file = new Scanner(source).useDelimiter("(?=\\p{Cntrl})");
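If the goal is to keep each ^M at the end of its own line (with a lookahead it ends up at the start of the following token instead), a zero-width lookbehind is one option. This sketch swaps in (?<=\r\n) in place of the lookahead and assumes DOS-style \r\n line endings; the class, method and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CarriageReturnLines {
    // Splits input into lines while keeping each line's terminator.
    // (?<=\r\n) matches the empty position right after each \r\n, so nothing is consumed.
    static List<String> linesKeepingEndings(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter("(?<=\\r\\n)")) {
            while (scanner.hasNext()) {
                lines.add(scanner.next());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // ^M is how editors display Ctrl-M: character code 13, i.e. '\r'.
        System.out.println((int) '\r');                  // 13
        System.out.println("\r".matches("\\p{Cntrl}"));  // true
        for (String line : linesKeepingEndings("first\r\nsecond\r\nlast")) {
            // Make the kept terminator visible when printing.
            System.out.println(line.replace("\r", "^M").replace("\n", "\\n"));
        }
    }
}
```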
Thank you, everyone.