Regex to detect end of line(\n) that has double slash(//) - java

I need a regex for this example:
//This is a comment and I need this \n position
String notwanted ="//I do not need this end of line position";

Try this regex:
(?<!")\/\/[^\n]+(\n)
You can use matcher.start(1) to get the index of the \n character, but it will not match a string where // is preceded by ". Example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String example = "//This is a comment and I need this \\n position\n" +
            "String notwanted =\"//I do not need this end of line position\";";
        Pattern regex = Pattern.compile("(?<!\")//[^\\n]+(\\n)");
        Matcher matcher = regex.matcher(example);
        // group 1 is the newline that ends the comment line
        while (matcher.find()) {
            System.out.println(matcher.start(1));
        }
    }
}
However, it would be enough to use:
(?<!")\/\/[^\n]+
and just call matcher.end() to get the position of the newline.
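A minimal sketch of that shorter variant, collecting matcher.end() for every match. The class name, the method name commentEnds and the sample input are made up for illustration. Note that for a comment on the last line with no trailing newline, end() is simply the length of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentLineEnds {
    // Collects the index just past each "//" comment that is not directly
    // preceded by a double quote, i.e. the index where its "\n" would sit.
    static List<Integer> commentEnds(String source) {
        Pattern regex = Pattern.compile("(?<!\")//[^\\n]+");
        Matcher matcher = regex.matcher(source);
        List<Integer> ends = new ArrayList<>();
        while (matcher.find()) {
            ends.add(matcher.end());
        }
        return ends;
    }

    public static void main(String[] args) {
        String example = "//This is a comment\n"
                + "String notwanted =\"//I do not need this\";";
        System.out.println(commentEnds(example)); // prints [19]
    }
}
```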
Alternatively, if you would like to split a string at this position, you can use:
example.split("(?<=^//[^\n]{0,1000})\n");
The (?<=^//[^\n]{0,1000}) means:
?<= - lookbehind,
^// - beginning of a line, followed by the // comment sign
[^\n]{0,1000} - multiple characters, but no newlines. This is the tricky part: a Java lookbehind needs a bounded maximum length, so you cannot use quantifiers like * or +, which is why you need an interval, in this case from 0 to 1000 characters. Be aware that if your comment is longer than 1000 characters (unlikely, but possible), it will not work, so set this number (1000 in this example) carefully.
\n - new line you are looking for
But if you would like to split the whole string in multiple places, you will need to add the modifier (?m) (multiline match) at the beginning of the regex:
(?m)(?<=^//[^\n]{0,1000})\n
but I'm not entirely sure
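A small sketch to check the (?m) variant with String.split. The class name, method name and sample input are assumptions made for this example, and the 1000-character cap is the same arbitrary limit as above.

```java
import java.util.Arrays;

class SplitAfterComments {
    // Splits at every "\n" that ends a line starting with "//".
    // The lookbehind is bounded ({0,1000}) because Java needs a maximum length.
    static String[] splitAfterCommentLines(String source) {
        return source.split("(?m)(?<=^//[^\\n]{0,1000})\\n");
    }

    public static void main(String[] args) {
        String example = "//c1\ncode;\n//c2\nmore";
        // The "\n" after "code;" is kept because that line is not a comment.
        System.out.println(Arrays.toString(splitAfterCommentLines(example)));
    }
}
```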
>>EDIT<< response to questions from comments
Try this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String example =
            "//This is a comment and I need this \\n position\n" +
            "String notwanted =\"//I do not need this end of line position\";\n" +
            "String a = aaa; //comment here";
        Pattern regex = Pattern.compile("(?m)(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)");
        Matcher matcher = regex.matcher(example);
        // print the position of every line end that follows a comment
        while (matcher.find()) {
            System.out.println(matcher.start());
        }
        System.out.println(example.replaceAll("(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)", " (X)\n"));
    }
}
Maybe this regex will meet your expectations. If not, please refine the question or ask another one with more details: input, expected output, your current code, and your goal.

This should work for you. It's really really awful. Couldn't really think of a much better, versatile solution. I'm assuming you also wanted comments like this:
String myStr = "asasdasd"; //some comment here
^[^"\n]*?(?:[^"\n]*?"(?>\\"|[^"\n])*?"[^"\n]*?)*?[^"\n]*?\/\/.*?(\n)
Regex101

Related

Regular expression not matching on first and last word of string

I am trying to write a Java program that will look for specific words in a string. I have it working for the most part, but it doesn't seem to match if the word to match is the first or last word in the string. Here is an example:
"trying to find the first word".matches(".*[^a-z]find[^a-z].*") //returns true
"trying to find the first word".matches(".*[^a-z]trying[^a-z].*") //returns false
"trying to find the first word".matches(".*[^a-z]word[^a-z].*") //returns false
Any idea how to make this match on any word in the string?
Thanks in advance,
Craig
The problem is the character class [^a-z] before and after the word: it has to consume an actual character, so it cannot match at the start or end of the string. I think what you actually want is a word boundary \b (as per ColinD's comment) rather than "not a character in the a-z range". As pointed out in the comments (thanks), you'll also need to handle the start and end of string cases.
So try, e.g.:
"(?:^|.*\\b)trying(?:\\b.*|$)"
You can make the surrounding characters optional with ?. Check the link below and test more cases to see whether it gives the proper output:
https://regex101.com/r/oP5zB8/1
(.*[^a-z]?trying[^a-z]?.*)
I think (^|^.*[^a-z])trying([^a-z].*$|$) just fits your need.
Or (?:^|^.*[^a-z])trying(?:[^a-z].*$|$) for non capturing parentheses.
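To compare the alternation pattern above with a \b-based one, here is a rough sketch. The class and method names are invented for this example, and Pattern.quote is added on the assumption that the search word should be treated literally.

```java
import java.util.regex.Pattern;

class WordFinder {
    // Alternation approach: the word must be preceded by the start of the
    // string or a non-letter, and followed by a non-letter or the end.
    static boolean containsWordAlternation(String text, String word) {
        String regex = "(?:^|^.*[^a-z])" + Pattern.quote(word) + "(?:[^a-z].*$|$)";
        return text.matches(regex);
    }

    // Word-boundary approach: \b matches between a word and a non-word
    // character, and also at the start and end of the input.
    static boolean containsWordBoundary(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "trying to find the first word";
        for (String word : new String[] {"trying", "find", "word", "fin"}) {
            System.out.println(word + ": " + containsWordAlternation(text, word)
                    + " / " + containsWordBoundary(text, word));
        }
    }
}
```

Both variants should report the first and last word as found and reject the partial word "fin".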
You can try the following program to check a string for a given word at its start and at its end:
package com.ajsodhi.utilities;

import java.util.regex.Pattern;

public class RegExStartEndWordCheck {
    public static final String stringToMatch = "StartingsomeWordsEndWord";

    public static void main(String[] args) {
        String regEx = "Starting[A-Za-z0-9]{0,}EndWord";
        Pattern patternOriginalSign = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
        boolean originalStringMatchesPattern = patternOriginalSign.matcher(stringToMatch).matches();
        System.out.println(originalStringMatchesPattern);
    }
}
You should use the word boundary \b, which marks the beginning or the end of a word, instead of [^a-z], which always requires an extra character to be present.
Just something like
".*\\bfind\\b.*"

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* - followed by 0 or more arbitrary characters
matcher.group(1) - get the 1st (only) capturing group.
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4, 7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Example {
    public static void main(String[] args) {
        String text = "This is a testWithSomeDataInBetweentest.";
        Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
        Matcher m = p.matcher(text);
        if (m.find()) {
            System.out.println("Matched: " + m.group(1));
        } else {
            System.out.println("No match.");
        }
    }
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the pattern against the entire input string (rather than seek a substring that would match), you can use matches() instead of find(). With find(), you can continue searching for more matching substrings with subsequent calls.
Also, your question did not specify what are admissible characters and length of the string between two "test" strings. I assumed any length is OK including zero and that we seek a substring composed of small and capital letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why if you don't need it. Of course you should protect this code against null and strings that are too short.
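A guarded version might look like the sketch below. The class name, the method name and the choice to return an Optional (rather than null or an exception) are my own assumptions, not something the question asked for.

```java
import java.util.Optional;

class SafeSlice {
    // Returns characters 5 to 7 (indices 4..6), or an empty Optional
    // when the input is null or shorter than 7 characters.
    static Optional<String> charsFiveToSeven(String s) {
        if (s == null || s.length() < 7) {
            return Optional.empty();
        }
        return Optional.of(s.substring(4, 7));
    }

    public static void main(String[] args) {
        System.out.println(charsFiveToSeven("testXXXtest").orElse("(too short)")); // prints XXX
        System.out.println(charsFiveToSeven("test").orElse("(too short)"));        // prints (too short)
    }
}
```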
Use the String.replaceAll() Method
If you don't need to be performance optimized, you can try the String.replaceAll() method for a cleaner option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods

java regex how to match some string that is not some substring

For example, my org string is:
CCC=123
CCC=DDDDD
CCC=EE
CCC=123
CCC=FFFF
I want everything that is not equal to "CCC=123" to be changed to "CCC=AAA"
So the result is:
CCC=123
CCC=AAA
CCC=AAA
CCC=123
CCC=AAA
How to do it in regex?
If I want everything that is equal to "CCC=123" to be changed to "CCC=AAA", it is easy to implement:
(CCC[ \t]*=)(123)
You can use a negative lookahead:
public static void main(String[] args)
{
    String foo = "CCC=123 CCC=DDD CCC=EEE";
    Pattern p = Pattern.compile("(CCC=(?!123).{3})");
    Matcher m = p.matcher(foo);
    String result = m.replaceAll("CCC=AAA");
    System.out.println(result);
}
output:
CCC=123 CCC=AAA CCC=AAA
Lookaheads are zero-width and non-capturing, which is why you then have to add .{3} to consume the non-matching characters that are to be replaced.
s = s.replaceAll("(?m)^CCC=(?!123$).*$", "CCC=AAA");
(?m) activates MULTILINE mode, which allows ^ and $ to match the beginning and end of lines, respectively. The $ in the lookahead makes sure you don't skip something that matches only partially, like CCC=12345. The $ at the very end isn't really necessary, since the .* will consume the rest of the line in any case, but it helps communicate your intent.
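Here is a short sketch that runs this replacement on the data from the question. The class and method names are placeholders, and it assumes the lines are separated by plain \n.

```java
class ReplaceNonMatchingLines {
    // Rewrites every line that starts with "CCC=" but is not exactly "CCC=123".
    static String normalize(String s) {
        return s.replaceAll("(?m)^CCC=(?!123$).*$", "CCC=AAA");
    }

    public static void main(String[] args) {
        String input = "CCC=123\nCCC=DDDDD\nCCC=EE\nCCC=123\nCCC=FFFF";
        // Prints CCC=123, CCC=AAA, CCC=AAA, CCC=123, CCC=AAA on separate lines.
        System.out.println(normalize(input));
    }
}
```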

How do I make a regex match for measurement units?

I'm building a small Java library which has to match units in strings. For example, if I have "300000000 m/s^2", I want it to match against "m" and "s^2".
So far, I have tried most imaginable (by me) configurations resembling (I hope it's a good start)
"[[a-zA-Z]+[\\^[\\-]?[0-9]+]?]+"
To clarify, I need something that will match letters[^[-]numbers] (where [ ] denotes non obligatory parts). That means: letters, possibly followed by an exponent which is possibly negative.
I have studied regex a little bit, but I'm really not fluent, so any help will be greatly appreciated!
Thank you very much,
EDIT:
I have just tried the first 3 replies
String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
and it doesn't work... I know the code which tests the patterns works, because if I try something simple, like matching "[0-9]+" in "12345", it will match the whole string. So, I don't get what's still wrong. I'm trying to change my brackets to parentheses where needed at the moment...
CODE USED TO TEST:
public static void main(String[] args) {
    String input = "30000 m/s^2";
    // String input = "35345";
    String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
    String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
    String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
    String regex10 = "[0-9]+";
    String regex = "([a-zA-Z]+)(?:\\^\\-?[0-9]+)?";
    Pattern pattern = Pattern.compile(regex3);
    Matcher matcher = pattern.matcher(input);
    if (matcher.matches()) {
        System.out.println("MATCHES");
        do {
            int start = matcher.start();
            int end = matcher.end();
            // System.out.println(start + " " + end);
            System.out.println(input.substring(start, end));
        } while (matcher.find());
    }
}
([a-zA-Z]+)(?:\^(-?\d+))?
You don't need to use the character class [...] if you're matching a single character. (...) here is a capturing bracket for you to extract the unit and exponent later. (?:...) is non-capturing grouping.
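One likely reason the test code in the question prints nothing is that matcher.matches() only succeeds when the whole input matches the pattern, and "30000 m/s^2" as a whole does not. A sketch that uses find() with the pattern above instead (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnitFinder {
    // Letters, optionally followed by ^ and a possibly negative integer exponent.
    private static final Pattern UNIT = Pattern.compile("([a-zA-Z]+)(?:\\^(-?\\d+))?");

    // Returns every unit found in the input, in order of appearance.
    static List<String> findUnits(String input) {
        Matcher matcher = UNIT.matcher(input);
        List<String> units = new ArrayList<>();
        while (matcher.find()) {
            // group(1) is the unit name, group(2) the exponent or null
            units.add(matcher.group());
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(findUnits("30000 m/s^2")); // prints [m, s^2]
    }
}
```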
You're mixing up square brackets, which denote character classes, with parentheses, which group. Try this instead:
[a-zA-Z]+(\^-?[0-9]+)?
In many regular expression dialects you can use \d to mean any digit instead of [0-9].
Try
"[a-zA-Z]+(?:\\^-?[0-9]+)?"

Regular Expression problem in Java

I am trying to create a regular expression for the replaceAll method in Java. The test string is abXYabcXYZ and the pattern is abc. I want to replace any symbol except the pattern with +. For example the string abXYabcXYZ and pattern [^(abc)] should return ++++abc+++, but in my case it returns ab++abc+++.
public static String plusOut(String str, String pattern) {
    pattern= "[^("+pattern+")]" + "".toLowerCase();
    return str.toLowerCase().replaceAll(pattern, "+");
}
public static void main(String[] args) {
    String text = "abXYabcXYZ";
    String pattern = "abc";
    System.out.println(plusOut(text, pattern));
}
When I try to replace the pattern with + there is no problem - abXYabcXYZ with pattern (abc) returns abxy+xyz. Pattern (^(abc)) returns the string without replacement.
Is there any other way to write NOT(regex) or group symbols as a word?
What you are trying to achieve is pretty tough with regular expressions, since there is no way to express “replace strings not matching a pattern”. You will have to use a “positive” pattern, telling what to match instead of what not to match.
Furthermore, you want to replace every character with a replacement character, so you have to make sure that your pattern matches exactly one character. Otherwise, you will replace whole strings with a single character, returning a shorter string.
For your toy example, you can use negative lookaheads and lookbehinds to achieve the task, but this may be more difficult for real-world examples with longer or more complex strings, since you will have to consider each character of your string separately, along with its context.
Here is the pattern for “not ‘abc’”:
[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c
It consists of five sub-patterns, connected with “or” (|), each matching exactly one character:
[^abc] matches every character except a, b or c
a(?!bc) matches a if it is not followed by bc
(?<!a)b matches b if it is not preceded by a
b(?!c) matches b if it is not followed by c
(?<!ab)c matches c if it is not preceded by ab
The idea is to match every character that is not in your target word abc, plus every word character that, according to the context, is not part of your word. The context can be examined using negative lookaheads (?!...) and lookbehinds (?<!...).
You can imagine that this technique will fail once you have a target word containing one character more than once, like example. It is pretty hard to express “match e if it is not followed by x and not preceded by l”.
Especially for dynamic patterns, it is by far easier to do a positive search and then replace every character that did not match in a second pass, as others have suggested here.
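For the fixed word abc, the lookaround pattern above can be tried out with a sketch like this one. The class and method names are invented, and the pattern is hard-coded for abc only, so it does not generalize to other words.

```java
class LookaroundNotAbc {
    // Each alternative matches exactly one character that is not part of an "abc".
    private static final String NOT_ABC = "[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c";

    static String plusOutAbc(String text) {
        return text.replaceAll(NOT_ABC, "+");
    }

    public static void main(String[] args) {
        System.out.println(plusOutAbc("abXYabcXYZ".toLowerCase())); // prints ++++abc+++
    }
}
```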
[^ ... ] will match one character that is not any of ...
So your pattern "[^(abc)]" is saying "match one character that is not a, b, c or the left or right bracket"; and indeed that is what happens in your test.
It is hard to say "replace all characters that are not part of the string 'abc'" in a single trivial regular expression. What you might do instead to achieve what you want could be some nasty thing like
while the input string still contains "abc"
    find the next occurrence of "abc"
    append to the output a string containing as many "+"s as there are characters before the "abc"
    append "abc" to the output string
    skip, in the input string, to a position just after the "abc" found
append to the output a string containing as many "+"s as there are characters left in the input
or possibly if the input alphabet is restricted you could use regular expressions to do something like
replace all occurrences of "abc" with a single character that does not occur anywhere in the existing string
replace all other characters with "+"
replace all occurrences of the target character with "abc"
which will be more readable but may not perform as well
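The first outline (the loop over occurrences) could be sketched with String.indexOf as follows. The names are placeholders, and rejecting an empty search word is my own addition to avoid an endless loop.

```java
class PlusOutIndexOf {
    // Keeps every occurrence of word and replaces all other characters with '+'.
    static String plusOut(String text, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int hit;
        while ((hit = text.indexOf(word, pos)) >= 0) {
            for (int i = pos; i < hit; i++) {
                out.append('+');         // characters before this occurrence
            }
            out.append(word);            // the occurrence itself
            pos = hit + word.length();   // skip just past it
        }
        for (int i = pos; i < text.length(); i++) {
            out.append('+');             // whatever is left after the last occurrence
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ".toLowerCase(), "abc")); // prints ++++abc+++
    }
}
```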
Negating regexps is usually troublesome. I think you might want to use negative lookahead. Something like this might work:
String pattern = "(?<!ab).(?!abc)";
I didn't test it, so it may not really work for degenerate cases. And the performance might be horrible too. It is probably better to use a multistep algorithm.
Edit: No I think this won't work for every case. You will probably spend more time debugging a regexp like this than doing it algorithmically with some extra code.
Try to solve it without regular expressions:
String out = "";
int i;
for (i = 0; i < text.length() - pattern.length() + 1; ) {
    if (text.substring(i, i + pattern.length()).equals(pattern)) {
        out += pattern;
        i += pattern.length();
    }
    else {
        out += "+";
        i++;
    }
}
for (; i < text.length(); i++) {
    out += "+";
}
Rather than a single replaceAll, you could always try something like:
@Test
public void testString() {
    final String in = "abXYabcXYabcHIH";
    final String expected = "xxxxabcxxabcxxx";
    String result = replaceUnwanted(in);
    assertEquals(expected, result);
}
private String replaceUnwanted(final String in) {
    final Pattern p = Pattern.compile("(.*?)(abc)([^a]*)");
    final Matcher m = p.matcher(in);
    final StringBuilder out = new StringBuilder();
    while (m.find()) {
        out.append(m.group(1).replaceAll(".", "x"));
        out.append(m.group(2));
        out.append(m.group(3).replaceAll(".", "x"));
    }
    return out.toString();
}
Instead of using replaceAll(...), I'd go for a Pattern/Matcher approach:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static String plusOut(String str, String pattern) {
        StringBuilder builder = new StringBuilder();
        String regex = String.format("((?:(?!%s).)++)|%s", pattern, pattern);
        Matcher m = Pattern.compile(regex).matcher(str.toLowerCase());
        while (m.find()) {
            builder.append(m.group(1) == null ? pattern : m.group().replaceAll(".", "+"));
        }
        return builder.toString();
    }
    public static void main(String[] args) {
        String text = "abXYabcXYZ";
        String pattern = "abc";
        System.out.println(plusOut(text, pattern));
    }
}
Note that you'll need to use Pattern.quote(...) if your String pattern contains regex meta-characters.
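A possible variant with Pattern.quote(...) is sketched below. The class name is made up; unlike the code above it does not lower-case the input, and it uses DOTALL so that line breaks are replaced too. Both are choices made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPlusOut {
    // Same idea as above, but the word is quoted so that regex
    // meta-characters such as '.' are matched literally.
    static String plusOut(String str, String word) {
        String quoted = Pattern.quote(word);
        Pattern p = Pattern.compile("((?:(?!" + quoted + ").)++)|" + quoted, Pattern.DOTALL);
        Matcher m = p.matcher(str);
        StringBuilder builder = new StringBuilder();
        while (m.find()) {
            // group 1 is a run of characters outside the word; otherwise the word itself matched
            builder.append(m.group(1) == null ? word : m.group().replaceAll("(?s).", "+"));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Without quoting, "a.c" would also keep "abc".
        System.out.println(plusOut("abcxa.cx", "a.c")); // prints ++++a.c+
    }
}
```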
Edit: I didn't see a Pattern/Matcher approach was already suggested by toolkit (although slightly different)...
