I am having trouble with the delimiter in my Scanner. I am using a Scanner to read a text file and put each token into a String. My tutor told me to use the delimiter useDelimiter("\t|\n"). However, each token it grabs ends in \r (because of the carriage return in the text file). This is fine for printing, but I need the string length, and instead of the number of actual characters I get the count including that \r. Is there a better delimiter I can use that accomplishes the same thing without grabbing the \r? The code is as follows:
studentData.useDelimiter("\t|\n");
while (studentData.hasNext())
{
token = studentData.next();
int tokenLength = token.length();
statCalc(tokenLength);
}
I am well aware that I could simply remove the last character of the string token. However, for many reasons, I just want it to grab the token without the \r. Any and all help would be greatly appreciated.
Try this:
studentData.useDelimiter("\\t|\\R");
The \R pattern (available since Java 8) matches any line break, including \r\n as a single unit; see the documentation.
I guess the remaining \r char is a partially consumed linebreak in Windows environment. With the aforementioned delimiter, the scanner will properly consume the line.
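To see the effect of that delimiter, here is a minimal sketch. The class name, the helper method and the sample data are made up for illustration; only useDelimiter("\\t|\\R") comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Returns the length of each token, splitting on a tab or on any line break.
    static List<Integer> tokenLengths(String text) {
        List<Integer> lengths = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // \R (Java 8+) matches \r\n, \n or \r, so no stray \r is left on a token.
            scanner.useDelimiter("\\t|\\R");
            while (scanner.hasNext()) {
                lengths.add(scanner.next().length());
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Hypothetical sample data with Windows-style line endings.
        String sample = "Alice\t90\r\nBob\t85\r\n";
        System.out.println(tokenLengths(sample)); // prints [5, 2, 3, 2]
    }
}
```

With the original "\t|\n" delimiter the same input would report 3 for "90" and "85", because the trailing \r stays attached to the token.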
Remove all carriage returns and line feeds from your string. Try this:
s = s.replaceAll("\\n", "");
s = s.replaceAll("\\r", "");
A Windows-style line ending is usually \r\n, but your delimiter ignores the \r. Your regex pattern (\t|\n) can be improved by using:
(\t|\r\n|\r|\n)
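However, it looks to me like what you're trying to accomplish is to create a "tokenizer" which breaks a text file into words (since you're also looking for \t), so my guess is that you're better off with:
studentData.useDelimiter("\\s+");
which treats any run of whitespace (spaces included) as a delimiter. Use \\s+ rather than \\s*: the latter also matches the empty string, so the scanner would split between every pair of characters.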
You can learn more about regular expressions.
Related
My Java project based on WebView component.
Now, I want to call some JS function with single String argument.
To do this, I'm using simple code:
webEngine.executeScript("myFunc('" + str + "');");
*The str text comes from a textarea.
This solution works, but it is not safe enough.
Sometimes we get netscape.javascript.JSException: SyntaxError: Unexpected EOF
So, how should I process str to avoid the exception?
Letfar's answer will work in most cases, but not all, and if you're doing this for security reasons, it's not sufficient. First, backslashes need to be escaped as well. Second, the line.separator property is the server side's EOL, which will only coincidentally be the same as the client side's, and you're already escaping the two possibilities, so the second line isn't necessary.
That all being said, there's no guarantee that some other control or non-ASCII character won't give some browser problems (for example, see the current Chrome nul in a URL bug), and browsers that don't recognize JavaScript (think things like screenreaders and other accessibility tools) might try to interpret HTML special characters as well, so I normally escape [^ -~] and [\\'"&<>] (those are regular expression character classes meaning all characters not between space and tilde inclusive; and backslash, single quote, double quote, ampersand, less than, greater than). Paranoid? A bit, but if str is a user-entered string (or is calculated from a user-entered string), you need to be a bit paranoid to avoid a security vulnerability.
Of course the real answer is to use some open source package to do the escaping, written by someone who knows security, or to use a framework that does it for you.
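As a rough illustration of the escaping rule described above (not a replacement for a vetted library), the sketch below turns every character outside space..tilde, plus backslash, both quotes, ampersand, less than and greater than, into a four-digit JavaScript Unicode escape. The class and method names are hypothetical.

```java
class JsStringEscaper {
    // Printable ASCII characters that are still risky inside a quoted JS string or in HTML.
    private static final String RISKY = "\\'\"&<>";

    // Escapes the input so it can be placed between quotes in a JavaScript string literal.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean printableAscii = c >= ' ' && c <= '~';
            if (printableAscii && RISKY.indexOf(c) < 0) {
                out.append(c);
            } else {
                // Emit a backslash, the letter u and four hex digits for this UTF-16 unit.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical textarea content with quotes, a line break and a backslash.
        String str = "it's a \"test\"\nline two\\";
        System.out.println("myFunc('" + escape(str) + "');");
    }
}
```

The result would then be passed along as webEngine.executeScript("myFunc('" + JsStringEscaper.escape(str) + "');"). Escaping per UTF-16 unit keeps surrogate pairs intact, since JavaScript strings use the same units.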
I have found this quick fix:
str = str.replace("'", "\\'");
str = str.replace(System.getProperty("line.separator"), "\\n");
str = str.replace("\n", "\\n");
str = str.replace("\r", "\\n");
I am trying to replace all occurrences of a substring from a String.
I want to replace "\t\t\t" with "<3tabs>"
I want to replace "\t\t\t\t\t\t" with "<6tabs>"
I want to replace "\t\t\t\t" with "< >"
I am using
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
But that does not replace anything, so I then tried
s = s.replaceAll("\t\t\t\t", "< >");
s = s.replaceAll("\t\t\t", "<3tabs>");
s = s.replaceAll("\t\t\t\t\t\t", "<6tabs>");
Again, it does not replace anything. After trying these two methods I tried StringBuilder.
I was able to replace the items through StringBuilder. My question is: why am I unable to replace the items directly through String with the two approaches above? Is there any method with which I can replace items directly in a String?
try in this order
String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
System.out.print(s);
output:
This<6tabs>is<3tabs>example< >
In your original order, <6tabs> is never going to find a match, because the shorter replacement before it will already have consumed part of every run of six tabs.
You need to start with largest match first.
Strings are immutable so you can't directly modify them, s.replace() returns a new String with the modifications present in it. You then assign that back to s though so it should work fine.
Put things in the correct order and step through it with a debugger to see what is happening.
Take a look at this
Go through your text, divide it into a char[] array, then use a for loop to go through the individual characters.
Don't print them out straight, but print them using a %x tag (or %d if you like decimal numbers).
char[] characters = myString.toCharArray();
for (char c : characters)
{
System.out.printf("%x%n", c);
}
Get an ASCII table and look up all the numbers for the characters, and see whether there are any \n or \f or \r. Do this before or after.
Different operating systems use different line terminating characters; this is the first reference I found from Google with "line terminator Linux Windows." Windows uses \r\n and Linux uses \n (\f is a form feed, not a line terminator). You should find out which one your example contains. Obviously if you strip \n and leave \r you will still have the text break into separate lines.
You might be more successful if you write a regular expression (see this part of the Java Tutorial, etc) which includes whitespace and line terminators, and use it as a delimiter with the String.split() method, then print the individual tokens in order.
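A minimal sketch of that last suggestion, assuming any run of whitespace should separate tokens; the class name and the sample string are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits text on runs of whitespace: spaces, tabs, \r, \n and \f.
    static String[] tokens(String text) {
        // trim() first, otherwise leading whitespace produces an empty first token.
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical input mixing a tab, a Windows line ending and a Unix line ending.
        String sample = "one\ttwo\r\nthree four\n";
        for (String token : tokens(sample)) {
            System.out.println(token + " (" + token.length() + ")");
        }
        System.out.println(Arrays.toString(tokens(sample))); // prints [one, two, three, four]
    }
}
```

Because \r is part of \s, no token keeps a stray carriage return, whichever line terminator the file uses.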
I am working on a school project to build a pseudo terminal and file system. The terminal scans System.in and passes the string to a controller.
Input to console: abc\r\nabc\r\nabc
Here is the code I tried
Scanner systemIn = new Scanner(System.in);
input = systemIn.nextLine();
input = input.replaceAll("\\\\r\\\\n",System.getProperty("line.separator"));
System.out.print(input);
I want Java to treat the \r\n I typed into the console as a line separator, not as a literal \ and r.
What it does now is print the input as is.
Desired Output:
abc
abc
abc
UPDATE: I tried input = StringEscapeUtils.unescapeJava(input); and it solved the problem.
You need to double-escape regexes in Java (once for the regex backslash, once for the Java string). You don't want to match a line break (/\n/, "\\n"), but a backslash (/\\/) followed by an "n": /\\n/, "\\\\n". So this should work:
input.replaceAll("(\\\\r)?\\\\n", System.getProperty("line.separator"));
For a more broad handling of escape sequences see How to unescape a Java string literal in Java?
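To make the double escaping concrete, here is a small sketch. The class name, the method and its separator parameter are assumptions added for illustration; Matcher.quoteReplacement is used so that the separator is always treated literally in the replacement.

```java
import java.util.regex.Matcher;

class TypedNewlines {
    // Replaces a typed backslash-n, optionally preceded by a typed backslash-r,
    // with the given separator.
    static String expand(String input, String separator) {
        // Java literal "(\\\\r)?\\\\n" is the regex (\\r)?\\n:
        // an optional literal backslash + r, then a literal backslash + n.
        return input.replaceAll("(\\\\r)?\\\\n", Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        // What the user typed: the characters \ r \ n, not real line breaks.
        String typed = "abc\\r\\nabc\\r\\nabc";
        System.out.println(expand(typed, System.lineSeparator()));
    }
}
```

Run on that input, the program prints abc on three separate lines. Passing the separator as a parameter also makes it easy to use a fixed "\n" when the output should not depend on the platform.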
If your input has the string '\r\n', try this
Scanner systemIn = new Scanner(System.in);
input = systemIn.nextLine();
input = input.replaceAll("\\\\r\\\\n", System.getProperty("line.separator"));
For consistent behaviour I would replace \\r with \r and \\n with \n rather than replace \\r\\n with the newline as this will have different behaviour on different systems.
You can do
input = systemIn.nextLine().replaceAll("\\\\r", "\r").replaceAll("\\\\n", "\n");
nextLine() strips off the newline at the end. If you want to add a line separator you can do
input = systemIn.nextLine() + System.getProperty("line.separator");
If you are using println() you don't need to add it back.
System.out.println(systemIn.nextLine()); // prints a new line.
As r0dney mentioned, Bergi's solution doesn't work.
Being able to use third-party libraries is good, but for someone who is studying it is better to know the theory, because not every problem has a third-party library.
Overloading a project with tons of third-party libraries for tasks that can be solved in one line of code makes it bulky and hard to maintain. Anyway, here is what works:
content.replaceAll("(\\\\r)?\\\\n", System.getProperty("line.separator"));
Unless you are actually typing \ and r and \ and n into the console, you don't need to do this at all: instead you have a major misunderstanding. The CR character is represented in a String as \r but it consists of only one byte with the hex value 0xD. And if you are typing backslashes into the console, the simple answer is "don't". Just hit the Enter key: that's what it's for. It will transmit the CR byte into your code.
I am using Java's Scanner to parse some text. Say I have set as a delimiter a variety of characters [#$]
With next I get the text till that delimiter, but I would like for a way to learn if parsing stopped because it found # or because it found $.
Is there some way to do that? Or should I break it in two, as in try with the first delimiter, and if you fail try with the second?
Found it! :)
You can use
scanner.findWithinHorizon("[\\#]", 2)
to see if # was the delimiter found.
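A slightly fuller sketch of the same idea, assuming a String source and the delimiter set [#$]. Scanner.next() stops in front of the delimiter, so findWithinHorizon with a horizon of 1 can read exactly that character. The class name and the "token|delimiter" output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterPeek {
    // Returns entries such as "abc|#": the token, then the delimiter that ended it
    // ("end" when the input ran out instead).
    static List<String> tokensWithDelimiters(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[#$]");
            while (scanner.hasNext()) {
                String token = scanner.next();
                // Look at exactly one character: the delimiter, or nothing at end of input.
                String delimiter = scanner.findWithinHorizon("[#$]", 1);
                result.add(token + "|" + (delimiter == null ? "end" : delimiter));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokensWithDelimiters("abc#def$ghi")); // prints [abc|#, def|$, ghi|end]
    }
}
```

Note that findWithinHorizon consumes the delimiter it finds, which is harmless here because the next call to next() would have skipped it anyway.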
I need to strip all xml tags from an xml document, but keep the space the tags occupy, so that the textual content stays at the same offsets as in the xml. This needs to be done in Java, and I thought RegExp would be the way to go, but I have found no simple way to get the length of the tags that match my regular expression.
Basically what I want is this:
Pattern p = Pattern.compile("<[^>]+>[^<]*]+>");
Matcher m = p.matcher(stringWithXMLContent);
String strippedContent = m.replaceAll("THIS IS A STRING OF WHITESPACES IN THE LENGTH OF THE MATCHED TAG");
Hope somebody can help me to do this in a simple way!
Since < and > characters always surround starting and ending tags in XML, this may be simpler with a straightforward statemachine. Simply loop over all characters (in some writeable form - not stored in a string), and if you encounter a < flip on the "replacement mode" and start replacing all characters with spaces until you encounter a >. (Be sure to replace both the initial < and the closing >).
If you care about layout, you may wish to avoid replacing tab characters and/or newline characters. If all you care about is overall string length, that obviously won't matter.
Edit: If you want to support comments, processing instructions and/or CDATA sections, you'll need to explicitly recognize these too; also, attribute values unfortunately can include > as well. All this means a full-fledged implementation will be more complex than you'd like.
A regular transducer would be perfect for this task; but unfortunately those aren't exactly commonly found in class libraries...
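The state machine described above could be sketched roughly as follows. The class and method names are hypothetical, and the sketch deliberately ignores comments, CDATA sections and > inside attribute values, as the edit above warns.

```java
class TagBlanker {
    // Replaces every character from '<' up to and including the next '>' with a space,
    // leaving tabs and line breaks in place so the layout and all offsets are preserved.
    static String blankTags(String xml) {
        char[] chars = xml.toCharArray();
        boolean insideTag = false;
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c == '<') {
                insideTag = true; // switch on "replacement mode"
            }
            if (insideTag) {
                if (c == '>') {
                    insideTag = false; // the closing bracket itself is still blanked below
                }
                if (c != '\n' && c != '\r' && c != '\t') {
                    chars[i] = ' ';
                }
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String xml = "<a>hi</a>";
        System.out.println("[" + blankTags(xml) + "]"); // prints [   hi    ]
    }
}
```

Working on a char[] keeps the output exactly as long as the input, so the text content stays at the same offsets.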
Pattern p = Pattern.compile("<[^>]+>[^<]*]+>");
In the spirit of You Can't Parse XML With Regexp, you do know that's not an adequate pattern for arbitrary XML, right? (It's perfectly valid to have a > character in an attribute value, for example, not to mention other non-tag constructs.)
I have found no simple way to get the length of the tags that match my regular expression.
Instead of using replaceAll, repeatedly call find on the Matcher. You can then read start/end to get the indexes to replace, or use the appendReplacement method on a buffer. eg.
StringBuffer b= new StringBuffer();
while (m.find()) {
String spaces= StringUtils.repeat(" ", m.end()-m.start());
m.appendReplacement(b, spaces);
}
m.appendTail(b);
stringWithXMLContent= b.toString();
(StringUtils comes from Apache Commons. For more background and library-free alternatives see this question.)
Why not use an xml pull parser and simply echo everything that you want to keep as you encounter it, e.g. character content and whenever you reach a start or end tag find out the length using the name of the element, plus any attributes that it has and write the appropriate number of spaces.
The SAX API also has callbacks for ignoreable whitespace. So you can also echo all whitespace that occurs in your document.
Maybe m.start() and m.end() can help.
m.start() => "The index of the first character matched"
m.end() => "The offset after the last character matched"
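m.end() - m.start() is the length of the matched tag, brackets included, so that is how many spaces you need to keep the offsets unchanged (subtract 2 only if you want to keep the < and > themselves).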
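For example, a library-free sketch using start() and end() directly. The pattern <[^>]+> is a simplified assumption, not the asker's pattern, and it fails on > inside attribute values, comments and CDATA; the class name is made up.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTagBlanker {
    // Simplified tag pattern, assumed for illustration only.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Overwrites each match with spaces, so the result has the same length as the input.
    static String blankTags(String xml) {
        char[] chars = xml.toCharArray();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            // start() is inclusive and end() is exclusive, matching Arrays.fill's range.
            Arrays.fill(chars, m.start(), m.end(), ' ');
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("[" + blankTags("<b>x</b> y") + "]"); // prints [   x     y]
    }
}
```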
string.replaceAll("(</?[a-zA-Z]{1}>)*", "")
You can also try this. It searches for <, then an optional /, then exactly one letter (lower or upper case), then >, with * allowing repeated occurrences of that pattern. Note that it only matches single-letter tags and removes them rather than padding them with spaces.
:)