I've run into a bit of a rough spot in this Java program I'm writing and thought I would ask for some help. I'm using regex to replace certain lines in a file being read in, and I'm not getting the desired result. I want to remove every run of 3 newlines in my file. I thought this would be straightforward, since my regex works in Notepad++, but I guess not. Below is an example of what the file looks like:
FIRST SENTENCECRLF
CRLF
CRLF
CRLF
CRLF
CRLF
SECOND SENTENCECRLF
So, in other words, I want to remove 3 of those carriage return/line feed pairs between the first and second sentence lines. Below is what I've tried so far. The first, tried in Java, results in no change to the file (it works fine in Notepad++). The second, pretty much the same as the first, works in Notepad++ but not in Java. The third is pretty much the same case as the other two. Does anyone have any suggestions as to what might work in this situation? At this point anything would be greatly appreciated!
^(\r\n){3}
^\r\n(\r\n)(\r\n)
^\r\n\r\n\r\n
Try the following regex:
(?m)^(\r\n){3}
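The (?m) flag enables multiline mode in Java. In that mode ^ matches at the start of every line instead of only at the start of the input, as explained in "How to use java regex to match a line".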
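Below is a minimal sketch of the suggested regex applied to the sample from the question. It assumes the whole file has been read into a single String, for example with Files.readString (Java 11+). If the file is read with BufferedReader.readLine(), the line terminators are stripped, so no \r\n pattern can ever match. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class CollapseBlankLines {
    // (?m) = MULTILINE: ^ also matches right after a line terminator,
    // and Java treats \r\n as one terminator, so ^ never lands between \r and \n.
    private static final Pattern THREE_CRLF = Pattern.compile("(?m)^(\\r\\n){3}");

    // Hypothetical helper: deletes each run of three CRLFs that starts a line.
    static String removeThreeBlankLines(String text) {
        return THREE_CRLF.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumption: this is the file content, already read into one String.
        String text = "FIRST SENTENCE\r\n\r\n\r\n\r\n\r\n\r\nSECOND SENTENCE\r\n";
        String result = removeThreeBlankLines(text);
        // Show the CRLFs using the question's notation.
        System.out.print(result.replace("\r\n", "CRLF\n"));
        // FIRST SENTENCECRLF
        // CRLF
        // CRLF
        // SECOND SENTENCECRLF
    }
}
```

The six CRLFs between the sentences become three. The first CRLF survives because it ends the FIRST SENTENCE line, and ^ only matches after it.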
Related
I was processing some Twitter data using Java. I read the tweets from a file, do some processing, and print them to stdout.
The text in the file looks like this:
"RT #Bollogosta319a: #BuyBookSilentSinners \u262fGain Followers\n\u262fRT This\n\u262fMUST FOLLOW ME I FOLLOW BACK\n\u262fFollow everyone who rts\n\u262fGain\n #ANDROID \u2026"
I read it in, and print it out to stdout. The output is supposed to be:
"RT #Bollogosta319a: #BuyBookSilentSinners ☯Gain Followers\n☯RT This\n☯MUST FOLLOW ME I FOLLOW BACK\n☯Follow everyone who rts\n☯Gain\n #ANDROID …"
But my output is like this:
"RT #Bollogosta319a: #BuyBookSilentSinners ?Gain Followers
?RT This
?MUST FOLLOW ME I FOLLOW BACK
?Follow everyone who rts
?Gain
#ANDROID ?"
So, it seems that I have two problems to deal with:
1. print the actual Unicode character (☯) instead of a ?
2. keep "\n" as the literal characters \n, instead of turning it into a newline in the output.
How can I do this? (Dealing with different encodings in Java is driving me crazy.)
I don't know how you are parsing the file, but the method you are using seems to be interpreting escape sequences (like \n and \u262f). To keep the instances of \n in the file literal, you could replace \n with \\n before whatever interprets the escape sequences runs. The \\ will be converted to a single \, and the n will be left alone. Have you tried reading the file with a plain java.io.FileReader? That may be simpler.
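If you control the code that does the unescaping, a related option is to decode only the \uXXXX escapes and never touch \n, so that sequence stays as two literal characters. This is a sketch, not the asker's code. The class and method names are hypothetical, and it needs Java 9+ for the StringBuilder overloads of Matcher.appendReplacement and appendTail.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecodeUnicodeOnly {
    // Matches a backslash, the letter u, then exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: turns Unicode escapes into real chars,
    // leaving every other backslash sequence (such as \n) untouched.
    static String decodeUnicodeEscapes(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against a decoded '$' or '\'
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Raw text as it appears in the file: backslashes are real characters here.
        String raw = "\\u262fGain Followers\\n\\u262fRT This";
        System.out.println(decodeUnicodeEscapes(raw));
    }
}
```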
The Unicode symbols may actually be read correctly; many terminals do not support the full range of Unicode characters and print some symbol in place of those it does not understand. Perhaps your program prints ☯ and the terminal simply doesn't know how to render it, so it prints a ? instead.
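A related cause worth ruling out is the encoding Java itself uses for stdout. When System.out uses a charset that cannot represent a character (for example, a non-UTF-8 platform default), PrintStream writes ? in its place before the terminal ever sees it. The sketch below shows the difference and rewraps stdout as UTF-8. It assumes Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor, and the class and method names are illustrative. The terminal must still be set to UTF-8 to display ☯.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Stdout {
    // Returns the bytes a PrintStream with the given charset would write for s.
    static byte[] encodeViaPrintStream(String s, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buf, true, cs);
        ps.print(s);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // With US-ASCII the yin-yang symbol is unmappable and becomes '?'.
        System.out.println(new String(
                encodeViaPrintStream("\u262f", StandardCharsets.US_ASCII),
                StandardCharsets.US_ASCII));
        // Replace System.out with a stream that always encodes as UTF-8.
        System.setOut(new PrintStream(
                new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8));
        System.out.println("\u262fGain Followers");
    }
}
```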
I'm trying to make sure that a string contains between 0 and 3 lines, and that each line that is present contains 0 to 100 characters. It needs to be a valid expression for both JavaScript and Java. Like many people doing regex, I'm copying from various spots on the Internet.
Working backwards, I think ^.{0,100}$ gets me the "line contains 0 to 100 characters" part, but trying to group that as (^.{0,100}$){0,3} doesn't work.
The newline character is probably part of my problem, so I ended up with something like .{0,100}(?:\n.{0,100}){0,2}, trying to say "a line of 0 to 100 characters optionally followed by 0 to 2 instances of a newline and 0 to 100 more characters", but that also failed.
Up until now I got those expressions from other people. Using an online test tool I finally monkeyed this together: ^.{0,100}(?:(?:\r\n|[\r\n]).{0,100}){0,2}$ which appears to work.
So, my question is: am I missing any pitfalls in ^.{0,100}(?:(?:\r\n|[\r\n]).{0,100}){0,2}$ given what I'm after? And even if it does work, is it the best expression to use?
I think what you have will work fine. You can make the line break part a little more compact if you want, and you don't need ^ and $ if you are using matches():
String regex = ".{0,100}(?:[\r\n]+.{0,100}){0,2}";
EDIT
After some more thought, I realized the newline suggestion above will match 4 (or more) lines as long as a couple of them are empty. So we are back to your original expression. Oh well, at least the start and end anchors can be omitted.
String regex = ".{0,100}(?:(?:\r\n|[\r\n]).{0,100}){0,2}";
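A quick way to check the pitfalls discussed above is to run both candidate patterns through matches() on a few inputs. In this sketch (hypothetical class name), the compact [\r\n]+ version accepts five lines when two of them are empty, while the alternation version rejects that input.

```java
import java.util.regex.Pattern;

public class LineLimit {
    // Up to 3 lines of at most 100 chars; each break is \r\n, \r or \n.
    static final Pattern STRICT = Pattern.compile(".{0,100}(?:(?:\\r\\n|[\\r\\n]).{0,100}){0,2}");
    // Compact variant: [\r\n]+ can swallow several breaks, hiding empty lines.
    static final Pattern LOOSE = Pattern.compile(".{0,100}(?:[\\r\\n]+.{0,100}){0,2}");

    // matches() requires the whole string to match, so ^ and $ are not needed.
    static boolean strict(String s) { return STRICT.matcher(s).matches(); }
    static boolean loose(String s) { return LOOSE.matcher(s).matches(); }

    public static void main(String[] args) {
        String fiveLines = "a\n\nb\n\nc";
        System.out.println(loose(fiveLines));  // true  (too permissive)
        System.out.println(strict(fiveLines)); // false
    }
}
```

With the {0,2} limit, the strict pattern also rejects a string that ends in a line break after its third line. That trailing break would start a fourth, empty line, which may or may not be what you want.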
I'm not very good at regular expressions but would this work?
^.{0,100}\n?(.{0,100}\n)?.{0,100}?$
Again, I'm still new to regex, so if there is an error (which is likely), please tell me.
I have a little problem.
I have a text that I have to read in the browser several times.
Every time I open this text, a replaceAll that I wrote runs automatically.
It's very simple and basic, but the problem is that the next time I do the replacement (every time I read this text), I get a replaceAll of a replaceAll.
For example, I have this in the text:
XIII
I want to replace it with
<b>XIII</b>
with:
txt.replaceAll("XIII","<b>XIII</b>")
The first time everything is fine, but when I read the text again, it becomes:
<b><b>XIII</b></b>
It's a silly problem, but I've only just started with Java.
I read that it is possible to use regex. Could someone post a small example?
Thanks, and excuse my poor English.
You need negative lookbehind to prevent a match on an already marked-up string:
txt = txt.replaceAll("(?<!>)XIII", "<b>XIII</b>");
This expression looks a bit convoluted, but this is how it decomposes:
(?<! ... ) is the template for the negative lookbehind;
> is the specific character we want to make sure doesn't occur in front of your string.
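Here is a minimal runnable check of the lookbehind version, with illustrative class and method names. Applying it twice leaves a single level of <b>.

```java
public class BoldOnce {
    // Wrap XIII in <b>...</b> unless it is directly preceded by '>'.
    static String boldXiii(String txt) {
        return txt.replaceAll("(?<!>)XIII", "<b>XIII</b>");
    }

    public static void main(String[] args) {
        String once = boldXiii("Chapter XIII");
        String twice = boldXiii(once); // simulates reading the text again
        System.out.println(once);  // Chapter <b>XIII</b>
        System.out.println(twice); // Chapter <b>XIII</b>
    }
}
```

The guard only looks at the single character before XIII, so text such as <i>XIII</i> would also be skipped.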
I should also warn you that fixing up HTML with regexes usually turns into a diabolical cycle of upgrading the regex to handle yet another special case, only to see it fail on the next one. It ends up as a monster that nobody can read, let alone improve.
There's a really quick solution: do the opposite replace before doing your own.
Let me show:
txt = txt.replaceAll("<b>XIII</b>","XIII").replaceAll("XIII","<b>XIII</b>");
So you first turn your <b>XIII</b> back into plain XIII and then wrap it in <b> again. This achieves the same result without adding a new level of <b>.
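Here is the same undo-then-redo idea as a small method (names are illustrative). Neither pattern uses any regex features, so this sketch swaps in String.replace, which treats both arguments as plain text.

```java
public class BoldUndoRedo {
    // Unwrap any existing <b>XIII</b> first, then wrap every XIII exactly once.
    static String bold(String txt) {
        return txt.replace("<b>XIII</b>", "XIII").replace("XIII", "<b>XIII</b>");
    }

    public static void main(String[] args) {
        String txt = "Chapter XIII";
        txt = bold(txt);
        txt = bold(txt); // second read of the same text
        System.out.println(txt); // Chapter <b>XIII</b>
    }
}
```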
What about this:
txt = txt.replaceAll("XIII", "<b>XIII</b>")
         .replaceAll("<b><b>", "<b>").replaceAll("</b></b>", "</b>");
I think <b><b> and </b></b> don't make much sense in HTML, so it is fine to remove such duplicates in other places too.
I have data coming in a txt file delimited by pipes. The unfortunate thing is 2 fields can have multiple values. To separate these multiples, the sender used pipes again, but put quotes around it. My regex worked for months until a certain rare situation...
Regex currently:
([^\|]*)\|"?([^"]*)"?\|([^\|]*)\|"?([^"]*)"?
And it worked for the following situation which happens most of the time:
abc|"part1|part2"|abc|"tool1|tool2"
But in this case, the ([^"]*) jumps ahead and takes everything from the empty field to the end of the quotes:
abc||abc|"tool1|tool2"
So I realize I must account for the case where a pipe comes next instead of a quote.
I'm just not sure how.
P.S. For any Pig users looking at this: I removed one backslash from each escape to make it look more like Java, but in Pig you need two.
In your expression you need to specify that the part between |s can be either quoted or not quoted. You can do it as follows:
(("[^"]*")|((?!")[^|]*))
Now you can repeat this part several times with |s in between, as you need.
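Here is a Java sketch that puts the answer's field pattern together for the four-field layout in the question. The class and method names are hypothetical, and stripping the surrounding quotes is an assumption about what you want downstream. It folds the answer's alternation into one capturing group per field, so the groups are numbered 1 to 4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeFields {
    // One field: "quoted text (may contain |)" or unquoted text without |.
    private static final String FIELD = "(\"[^\"]*\"|(?!\")[^|]*)";
    private static final Pattern LINE = Pattern.compile(
            FIELD + "\\|" + FIELD + "\\|" + FIELD + "\\|" + FIELD);

    // Returns the four fields with surrounding quotes removed,
    // or null if the whole line does not match.
    static List<String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            String f = m.group(i);
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            fields.add(f);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("abc|\"part1|part2\"|abc|\"tool1|tool2\""));
        System.out.println(parse("abc||abc|\"tool1|tool2\""));
    }
}
```

The (?!") lookahead stops the unquoted branch from starting at a quote. An empty field therefore matches as an empty string instead of running ahead to the next quote.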
I've been working on a small Java problem set and have come across some trouble. I'm not very experienced writing regular expressions and could really use two for verifying line entries in /etc/group and /etc/passwd in Java.
I found "Regex Verification of Line in /etc/passwd" earlier and have yet to test it, but it looks adaptable for what I need. Could anyone help by providing a regex string for either file?
I'm looking to verify user-entered passwd and group lines in Java before writing them out to disk. If I can't do that with a regex, I'll likely end up tokenizing each piece and running various expensive operations.
Rather than writing a regex, you should probably just read the files with Scanner and parse each line with String.split(":"). Then you can check that each part is valid without dealing with a complex expression that handles all cases. The code will probably be easier to write and easier to read later.
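Below is a sketch of that split-based check, using String.split directly rather than Scanner. Names are hypothetical, and the rules checked (field count, non-empty name, numeric IDs) are only a starting point. It assumes the standard layouts: name:password:UID:GID:GECOS:home:shell for passwd and name:password:GID:members for group. Note that split(":") discards trailing empty strings, so a passwd line with an empty shell field would appear to have only six fields. Passing a limit of -1 keeps them.

```java
import java.util.regex.Pattern;

public class PasswdCheck {
    // Assumed rule for illustration: a numeric ID is one or more ASCII digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // passwd: name:password:UID:GID:GECOS:home:shell (7 fields)
    static boolean isValidPasswdLine(String line) {
        String[] f = line.split(":", -1); // -1 keeps trailing empty fields
        return f.length == 7
                && !f[0].isEmpty()
                && DIGITS.matcher(f[2]).matches()
                && DIGITS.matcher(f[3]).matches();
    }

    // group: name:password:GID:members (4 fields)
    static boolean isValidGroupLine(String line) {
        String[] f = line.split(":", -1);
        return f.length == 4
                && !f[0].isEmpty()
                && DIGITS.matcher(f[2]).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPasswdLine("user:x:1000:1000::/home/user:")); // true
        System.out.println(isValidGroupLine("wheel:x:10"));                      // false
    }
}
```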
Why do you want to use regular expressions? Just split the line on the colons and inspect the pieces.