Regex negative lookbehind to escape period/dot - java

I had been struggling to find out a regex pattern that would escape "." if a escape char is found before it. Negative lookbehind was promising but I suppose it doesn't work for "." as with below syntax
String test = "hostname.domain.com/abc/def/v1.8/ghi"
In above example, string needs to be split by "." , but I need to escape the v1.8 so that v1 and 8 are not treated as different array elements in the URI part.
String test = "hostname.domain.com/abc/def/v1\\.8/ghi"
test.split("(?!\\\\).");
The expected output {"hostname","domain","com/abc/def/v1.8/ghi"} . The URI context path should not be split by "." if it carries any "." it would just for representing version.
The above negative lookbehind syntax works for other char's like -, but doesn't work for ".". I assume the escape character needs to be different, but adding other escape chars might cause issue in further processing of the string as the input is of URI type and don't want any reserved/special chars in URI to be used as char to prepend for this. Any thoughts/help from anyone is appreciated.

Why use regex..Use URL class
URL url=new URL(yourURL);
url.getPath();//abc/def/v1.8/ghi
url.getPort();//-1 in your case
url.getHost();//hostname.domain.com
You can now split the hostname with .

You can use this negative lookahead regex:
(?!\\\\)(?:^|.)\\.
OR Using negative lookbehind:
(?<!\\\\)\\.
Online Demo: http://www.rubular.com/r/Sqa2P7A6dR and http://www.rubular.com/r/xgE7onrwzX

To avoid multiple use of escape characters in the regex string (one level of escaping is removed by the Java compiler; the other level is removed by the regex engine) it is possible to "escape" characters by enclosing them in square brackets. For example, \\\\. would become a more readable [.].
In your case, you could tell Java not to use a dot that is between two digits, because it's a decimal separator:
String test = "hostname.domain.com/abc/def/v1.8/ghi";
for (String s : test.split("(?<!\\d)[.](?!\\d)")) {
System.out.println(s);
}
Here is a demo on ideone.

try this expr
String[] s = "hostname.domain.com/abc/def/v1.8/ghi".split("(?<!/.{0,99})\\.");

Related

Regex to match \a574322 in Java

I have long string looking like this: \c53\e59\c9\e28\c20140326\a4095\c8\c15\a546\c11 and I need to find expressions starting with \a and followed by digits. For example: \a574322
And I have no idea how to build it. I can't use:
Pattern p = Pattern.compile("\\a\\d*");
because \a is special character in regex.
When I try to group it like this:
Pattern p = Pattern.compile("(\\)(a)(\\d)*");
I get unclosed group error even though there is even number of brackets.
Can you help me with this?
Thank you all very much for solution.
You can use this regex:
\\\\a\\d+
Code Demo
Since in Java you need to double escape the \\ once for String and second time for regex engine.
You have to change your regex to:
Pattern p = Pattern.compile("(\\\\a\\d+)");
The regex is:
(\\a\d+)
The idea is to escape a backslash and then also escape the backslash for \a, and match digits too.
You need 4 \.
2 to indicate to regex that it is not a special character, but a plain \, and 2 for each to tell the Java String that these are not special characters either. So you need to represent it in code this way:
"\\\\a\\d*"
Which is actually the regex \\a\d*
\\(a)[0-9]+ this should work
you can't try your regexps on this page or some similar
http://regex101.com/

Cutting String java

I need to cut certain strings for an algorithm I am making. I am using substring() but it gets too complicated with it and actually doesn't work correctly. I found this topic how to cut string with two regular expression "_" and "."
and decided to try with split() but it always gives me
java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
+
^
So this is the code I have:
String[] result = "234*(4-5)+56".split("+");
/*for(int i=0; i<result.length; i++)
{
System.out.println(result[i]);
}*/
Arrays.toString(result);
Any ideas why I get this irritating exception ?
P.S. If I fix this I will post you the algorithm for cutting and then the algorithm for the whole calculator (because I am building a calculator). It is gonna be a really badass calculator, I promise :P
+ in regex has a special meaning. to be treated as a normal character, you should escape it with backslash.
String[] result = "234*(4-5)+56".split("\\+");
Below are the metacharaters in regex. to treat any of them as normal characters you should escape them with backslash
<([{\^-=$!|]})?*+.>
refer here about how characters work in regex.
The plus + symbol has meaning in regular expression, which is how split parses it's parameter. You'll need to regex-escape the plus character.
.split("\\+");
You should split your string like this: -
String[] result = "234*(4-5)+56".split("[+]");
Since, String.split takes a regex as delimiter, and + is a meta-character in regex, which means match 1 or more repetition, so it's an error to use it bare in regex.
You can use it in character class to match + literal. Because in character class, meta-characters and all other characters loose their special meaning. Only hiephen(-) has a special meaning in it, which means a range.
+ is a regex quantifier (meaning one or more of) so needs to be escaped in the split method:
String[] result = "234*(4-5)+56".split("\\+");

regex help in java

I'm trying to compare following strings with regex:
#[xyz="1","2"'"4"] ------- valid
#[xyz] ------------- valid
#[xyz="a5","4r"'"8dsa"] -- valid
#[xyz="asd"] -- invalid
#[xyz"asd"] --- invalid
#[xyz="8s"'"4"] - invalid
The valid pattern should be:
#[xyz then = sign then some chars then , then some chars then ' then some chars and finally ]. This means if there is characters after xyz then they must be in format ="XXX","XXX"'"XXX".
Or only #[xyz]. No character after xyz.
I have tried following regex, but it did not worked:
String regex = "#[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"]";
Here the quotations (in part after xyz) are optional and number of characters between quotes are also not fixed and there could also be some characters before and after this pattern like asdadad #[xyz] adadad.
You can use the regex:
#\[xyz(?:="[a-zA-z0-9]+","[a-zA-z0-9]+"'"[a-zA-z0-9]+")?\]
See it
Expressed as Java string it'll be:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
What was wrong with your regex?
[...] defines a character class. When you want to match literal [ and ] you need to escape it by preceding with a \.
[a-zA-z][0-9] match a single letter followed by a single digit. But you want one or more alphanumeric characters. So you need [a-zA-Z0-9]+
Use this:
String regex = "#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\]";
When you write [a-zA-z][0-9] it expects a letter character and a digit after it. And you also have to escape first and last square braces because square braces have special meaning in regexes.
Explanation:
[a-zA-z0-9]+ means alphanumeric character (but not an underline) one or more times.
(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? means that expression in parentheses can be one time or not at all.
Since square brackets have a special meaning in regex, you used it by yourself, they define character classes, you need to escape them if you want to match them literally.
String regex = "#\\[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"\\]";
The next problem is with '"[a-zA-z][0-9]' you define "first a letter, second a digit", you need to join those classes and add a quantifier:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
See it here on Regexr
there could also be some characters before and after this pattern like
asdadad #[xyz] adadad.
Regex should be:
String regex = "(.)*#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\](.)*";
The First and last (.)* will allow any string before the pattern as you have mentioned in your edit. As said by #ademiban this (=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? will come one time or not at all. Other mistakes are also very well explained by Others +1 to all other.

regex to find substring between special characters

I am running into this problem in Java.
I have data strings that contain entities enclosed between & and ; For e.g.
&Text.ABC;, &Links.InsertSomething;
These entities can be anything from the ini file we have.
I need to find these string in the input string and remove them. There can be none, one or more occurrences of these entities in the input string.
I am trying to use regex to pattern match and failing.
Can anyone suggest the regex for this problem?
Thanks!
Here is the regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;"
It starts by matching the character &, followed by one or more letters (both uppercase and lower case) ([A-Za-z]+). Then it matches a dot followed by one or more letters (\\.[A-Za-z]+). There can be any number of this, including zero. Finally, it matches the ; character.
You can use this regex in java like this:
Pattern p = Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;"); // java.util.regex.Pattern
String subject = "foo &Bar; baz\n";
String result = p.matcher(subject).replaceAll("");
Or just
"foo &Bar; baz\n".replaceAll("&[A-Za-z]+(\\.[A-Za-z]+)*;", "");
If you want to remove whitespaces after the matched tokens, you can use this re:
"&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*" // the "\\s*" matches any number of whitespace
And there is a nice online regular expression tester which uses the java regexp library.
http://www.regexplanet.com/simple/index.html
You can try:
input=input.replaceAll("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*","");
See it

How to replace a special character with single slash

I have a question about strings in Java. Let's say, I have a string like so:
String str = "The . startup trace ?state is info?";
As the string contains the special character like "?" I need the string to be replaced with "\?" as per my requirement. How do I replace special characters with "\"? I tried the following way.
str.replace("?","\?");
But it gives a compilation error. Then I tried the following:
str.replace("?","\\?");
When I do this it replaces the special characters with "\\". But when I print the string, it prints with single slash. I thought it is taking single slash only but when I debugged I found that the variable is taking "\\".
Can anyone suggest how to replace the special characters with single slash ("\")?
On escape sequences
A declaration like:
String s = "\\";
defines a string containing a single backslash. That is, s.length() == 1.
This is because \ is a Java escape character for String and char literals. Here are some other examples:
"\n" is a String of length 1 containing the newline character
"\t" is a String of length 1 containing the tab character
"\"" is a String of length 1 containing the double quote character
"\/" contains an invalid escape sequence, and therefore is not a valid String literal
it causes compilation error
Naturally you can combine escape sequences with normal unescaped characters in a String literal:
System.out.println("\"Hey\\\nHow\tare you?");
The above prints (tab spacing may vary):
"Hey\
How are you?
References
JLS 3.10.6 Escape Sequences for Character and String Literals
See also
Is the char literal '\"' the same as '"' ?(backslash-doublequote vs only-doublequote)
Back to the problem
Your problem definition is very vague, but the following snippet works as it should:
System.out.println("How are you? Really??? Awesome!".replace("?", "\\?"));
The above snippet replaces ? with \?, and thus prints:
How are you\? Really\?\?\? Awesome!
If instead you want to replace a char with another char, then there's also an overload for that:
System.out.println("How are you? Really??? Awesome!".replace('?', '\\'));
The above snippet replaces ? with \, and thus prints:
How are you\ Really\\\ Awesome!
String API links
replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
replace(char oldChar, char newChar)
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
On how regex complicates things
If you're using replaceAll or any other regex-based methods, then things becomes somewhat more complicated. It can be greatly simplified if you understand some basic rules.
Regex patterns in Java is given as String values
Metacharacters (such as ? and .) have special meanings, and may need to be escaped by preceding with a backslash to be matched literally
The backslash is also a special character in replacement String values
The above factors can lead to the need for numerous backslashes in patterns and replacement strings in a Java source code.
It doesn't look like you need regex for this problem, but here's a simple example to show what it can do:
System.out.println(
"Who you gonna call? GHOSTBUSTERS!!!"
.replaceAll("[?!]+", "<$0>")
);
The above prints:
Who you gonna call<?> GHOSTBUSTERS<!!!>
The pattern [?!]+ matches one-or-more (+) of any characters in the character class [...] definition (which contains a ? and ! in this case). The replacement string <$0> essentially puts the entire match $0 within angled brackets.
Related questions
Having trouble with Splitting text. - discusses common mistakes like split(".") and split("|")
Regular expressions references
regular-expressions.info
Character class and Repetition with Star and Plus
java.util.regex.Pattern and Matcher
In case you want to replace ? with \?, there are 2 possibilities: replace and replaceAll (for regular expressions):
str.replace("?", "\\?")
str.replaceAll("\\?","\\\\?");
The result is "The . startup trace \?state is info\?"
If you want to replace ? with \, just remove the ? character from the second argument.
But when I print the string, it prints
with single slash.
Good. That's exactly what you want, isn't it?
There are two simple rules:
A backslash inside a String literal has to be specified as two to satisfy the compiler, i.e. "\". Otherwise it is taken as a special-character escape.
A backslash in a regular expresion has to be specified as two to satisfy regex, otherwise it is taken as a regex escape. Because of (1) this means you have to write 2x2=4 of them:"\\\\" (and because of the forum software I actually had to write 8!).
String str="\\";
str=str.replace(str,"\\\\");
System.out.println("New String="+str);
Out put:- New String=\
In java "\\" treat as "\". So, the above code replace a "\" single slash into "\\".

Categories