pattern matching in java using regular expression


I am looking for a pattern to match "LA5#10.232.140.133#Po6" and also "LA5#10.232.140.133#Port-channel7" in Java using a regular expression.
Like we have \d{1,3}.\d{1,3}.\d{1,3}.\d{1,3} for IP address validation.
Can we have patterns like the ones below? Please suggest.
[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]#\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}#Po\d[1-9]
[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]#\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}#Port-channel\d[1-9]
Thanks in advance.
==============================
In my program I have:
import java.util.regex.*;
class ptternmatch {
public static void main(String [] args) {
Pattern p = Pattern.compile("\\w\\w\\w#\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}#*");
Matcher m = p.matcher("LA5#10.232.140.133#Port-channel7");
boolean b = false;
System.out.println("Pattern is " + m.pattern());
while(b = m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
}
But I am getting a compilation error with the pattern: Invalid escape sequence.
The input will be of the form: a 3-character word of digits and letters, then #, an IP address, #, and some text.

Well, if you want to validate the IP address, then you need something a little bit more involved than \d{1,3}. Also, keep in mind that for Java string literals, you need to escape the \ with \\ so you end up with a single backslash in the actual regex to escape a character such as a period (.).
Assuming the LA5# bit is static and that you're fine with either Po or Port-channel followed by a digit on the end, then you probably need a regex along these lines:
LA5#((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])#Po(rt-channel)?[1-9]
(This is the plain regex; in a Java string literal, double each backslash.)
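A minimal sketch of how this could look as runnable Java. It assumes the leading code is any three letters or digits (as the question describes, rather than a fixed LA5), that each octet must be in 0-255, and that the trailing interface number may have more than one digit. The class and method names are made up for illustration. Note the doubled backslashes: a single \d or \. inside a Java string literal is what triggers the "Invalid escape sequence" compile error.

```java
import java.util.regex.Pattern;

class InterfaceIdValidator {
    // One octet in the range 0-255.
    private static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";

    // Three alphanumerics, '#', a dotted IPv4 address, '#', then Po<n> or Port-channel<n>.
    // Assumption: <n> is one or more digits.
    private static final Pattern ID = Pattern.compile(
            "[A-Za-z0-9]{3}#" + OCTET + "(?:\\." + OCTET + "){3}#Po(?:rt-channel)?[0-9]+");

    static boolean isValid(String s) {
        // matches() requires the whole input to fit the pattern.
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("LA5#10.232.140.133#Po6"));
        System.out.println(isValid("LA5#10.232.140.133#Port-channel7"));
        System.out.println(isValid("LA5#10.232.140.999#Po6"));
    }
}
```

Building the octet as a separate constant keeps the bracketing readable, which is where hand-written IP regexes tend to go wrong.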

You can call matcher.find() and, if it returns true, use the groups to retrieve the captured information. Take a look at the tutorial here:
http://download.oracle.com/javase/tutorial/essential/regex/
You would need to wrap the necessary parts in parentheses, e.g. (\d{1,3}). If you wrap all 4, you will have 4 groups to access.
Also, take a look at this tutorial
http://www.javaworld.com/javaworld/jw-07-2001/jw-0713-regex.html?page=3
It's a very good tutorial, I think this one would explain most of your questions.
To match the second of your strings:
LA5#10.232.140.133#Port-channel7
you can use something like:
\w{2}\d#\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}#[a-zA-Z\-]+\d
This depends on what you want to do, so the regex might change.
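To illustrate the capture-group advice, here is a small sketch that pulls the individual pieces out of one of the sample strings. The class name, method name, and the choice of six groups are assumptions made for the example, and the octets are only checked for being 1-3 digits, not for being in range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InterfaceIdParts {
    // Group 1: three-character code, groups 2-5: the four octets, group 6: interface name.
    private static final Pattern PARTS = Pattern.compile(
            "(\\w{3})#(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})#([a-zA-Z-]+\\d+)");

    // Returns the six captured pieces, or null when the input does not match.
    static String[] split(String s) {
        Matcher m = PARTS.matcher(s);
        if (!m.matches()) {
            return null;
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            parts[i - 1] = m.group(i); // group 0 is the whole match, so start at 1
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", split("LA5#10.232.140.133#Port-channel7")));
    }
}
```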

Related

Porting Twemoji regex to extract Unicode emojis in Java

I'm trying to identify the same emojis in a String for extraction that Twemoji would, using Java. A straight up port isn't working for a great deal of emojis - I think I've identified the issue, so I'll give it in an example below:
Suppose we have the emoji 🪔 (code units \ud83e\ude94). In JavaScript regex, this is captured by \ud83e[\ude94-\ude99], which will first match the \ud83e and then find the subsequent \ude94 within the range indicated inside the brackets. The same expression in Java regex, however, fails to match at all. If I modify the Java pattern to [\ud83e[\ude94-\ude99]], according to an online engine, the 2nd half is captured, but not the 1st.
My working theory is that Java encounters the brackets and treats everything inside as a single code point and, when combined with the outside code unit, thinks it's looking for two code points instead of one. Is there an easy way to fix this, or to change the regex pattern to work around it? The obvious fix would be to use something like [\ud83e\ude94-\ud83e\ude99], but the actual regex pattern is quite lengthy. I wonder if there might be an easy encoding fix somewhere here as well.
Toy sample below:
public static void main(String[] args) {
String emojiPattern = "\ud83e[\ude94-\ude99]";
String raw = "\ud83e\ude94";
Pattern pattern = Pattern.compile(emojiPattern);
Matcher matcher = pattern.matcher(raw);
System.out.println(matcher.matches());
}

If you're trying to match a single specific codepoint, don't mess with surrogate pairs; refer to it by number:
String emojiPattern = "\\x{1FA94}";
or by name:
String emojiPattern = "\\N{DIYA LAMP}";
If you want to match any codepoint in the block U+1FA94 is in, use the name of the block in a property atom:
String emojiPattern = "\\p{blk=Symbols and Pictographs Extended-A}";
If you switch out any of these three regular expressions your example program will print 'true'.
The problem you're running into is that a UTF-16 surrogate pair represents a single code point, and the regex engine matches code points, not code units; you can't match just the low or high half. For example, the pattern "\ud83e" on its own will fail to match too (when used with Matcher#find instead of Matcher#matches, of course). It's all or none.
To do that kind of ranged matching on individual code units, you have to turn away from regular expressions and look at the code units directly. Something like
char[] codeUnits = raw.toCharArray();
for (int i = 0; i < codeUnits.length - 1; i++) {
if (codeUnits[i] == 0xD83E &&
(codeUnits[i + 1] >= 0xDE94 && codeUnits[i + 1] <= 0xDE99)) {
System.out.println("match");
}
}
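If working at the code point level is acceptable, a range can also be expressed inside a character class with \x{...} escapes, which avoids surrogate halves altogether. The sketch below is one possible way to do that for U+1FA94 to U+1FA99; the class and method names are invented for the example, and it does not attempt to reproduce the full Twemoji pattern.

```java
import java.util.regex.Pattern;

class EmojiRange {
    // A character class over the code points U+1FA94..U+1FA99.
    private static final Pattern RANGE = Pattern.compile("[\\x{1FA94}-\\x{1FA99}]");

    static boolean inRange(String s) {
        // matches() succeeds only if the whole string is a single code point in the range.
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String diyaLamp = new String(Character.toChars(0x1FA94)); // code units \ud83e\ude94
        String grinning = new String(Character.toChars(0x1F600)); // outside the range
        System.out.println(inRange(diyaLamp));
        System.out.println(inRange(grinning));
    }
}
```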

Why do the Regex is not working properly?

I have some String like
s3://my-source-bucket/molomics/molecules35455720556210282.csv or,
s3://my-source-bucket/molecules10282.csv
s3://my-source-bucket/molename
Criteria:
1. The `s3://` portion is fixed.
2. The bucket name consists of letters, numbers, dashes (-) and dots (.), e.g.
my-source-bucket, and is followed by /.
3. Number 2 repeats one or more times.
4. There is no / at the end.
I would like to match them using the regex. I have this small program that I use to get the matches provided below,
public static void findMatchUsingRegex(String input) {
String REGEX = "(w+://)([0-9A-Za-z-]+/)([0-9A-Za-z-/]+)([0-9A-Za-z-.]+)?";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(input); // get a matcher object
while(m.find()) {
count++;
System.out.println("Match number "+count);
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
}
}
In the online editor, I find the matches. However, this doesn't return anything when I actually run the program. How should I change the regex so that it works properly, and perhaps better?

Some points in order
Criterion #1 states that s3:// is fixed, so you can use that explicitly.
You need to escape special regex characters like ., -, and /. Because you're writing the regex as a Java string, you'll need to use two backslashes: \\. to match the literal ..
It looks like you can simplify your pattern quite a bit.
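I don't know exactly what findMatchUsingRegex is supposed to do, but make sure you want to use Matcher.find rather than Matcher.matches: find looks for the pattern anywhere in the input, while matches requires the whole input to match.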
A solution
s3:\/(\/[0-9A-Za-z\-\.]+)+
Note how the \/ comes first, so the string must end with a number, letter, ., or -. In Java, you'll need to write this as:
s3:\\/(\\/[0-9A-Za-z\\-\\.]+)+
(Technically, you don't need to escape - and . here. But that's probably good practice because they're special characters.)
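As a rough sketch, the pattern above could be wrapped like this. It uses Matcher.matches so that the whole string has to be a valid path, and it drops the optional escapes; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class S3PathCheck {
    // "s3:/" followed by one or more "/segment" parts;
    // a segment is made of letters, digits, '.' or '-'.
    private static final Pattern S3_PATH = Pattern.compile("s3:/(/[0-9A-Za-z.-]+)+");

    static boolean isValid(String s) {
        // Whole-string match, so a trailing '/' is rejected.
        return S3_PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("s3://my-source-bucket/molomics/molecules35455720556210282.csv"));
        System.out.println(isValid("s3://my-source-bucket/molename"));
        System.out.println(isValid("s3://my-source-bucket/"));
    }
}
```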

Java regular expression for number starts with code

I am not a Java developer but I am interfacing with a Java system.
Please help me with a regular expression that would detect all numbers starting with 25678 or 25677.
For example, in Rails it would be:
^(25677|25678)
Sample inputs are 256776582036 and 256782405036

^(25678|25677)
or
^2567[78]
If you use ^(25678|25677)[0-9]*$ it guarantees that the remaining characters are all digits and not other characters.
That should do the trick for you: it looks for either prefix and then any digits after it.

In Java the regex would be the same, assuming that the number takes up the entire line. You could further simplify it to
^2567[78]
If you need to match a number anywhere in the string, use the \b anchor (double the backslash if you are making a string literal in Java code).
\b2567[78]
How about if there is a possibility of a + at the beginning of a number?
Add an optional +, like this [+]? or like this \+? (again, double the backslash for inclusion in a string literal).
Note that it is important to know what Java API is used with the regular expression, because some APIs will require the regex to cover the entire string in order to declare it a match.
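A small sketch of the \b variant, assuming the goal is to pull every such number out of free text with Matcher.find. The class name, method name, and the trailing \d* (to capture the rest of the number) are additions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixedNumberFinder {
    // \b makes sure the prefix starts at a word boundary, not in the middle of a longer number.
    private static final Pattern NUMBER = Pattern.compile("\\b2567[78]\\d*");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {          // find() scans for the next occurrence anywhere in the text
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("call 256776582036 or 256782405036, not 125677000"));
    }
}
```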

Try something like:
String number = ...;
if (number.matches("^2567[78].*$")) {
//yes it starts with your number
}
The regex ^2567[78].*$ means:
the number starts with 2567, followed by either 7 or 8, and then by any characters.
If you need only digits after, say, 25677, then the regex should be ^2567[78]\\d*$, which means the prefix at the beginning is followed by zero or more digits.

The regex syntax of Java is pretty close to that of Rails, especially for something this simple. The trick is in using the correct API calls. If you need to do more than one search, it's worthwhile to compile the pattern once and reuse it. Something like this should work (strings stands for whatever collection of inputs you have):
Pattern p = Pattern.compile("^2567[78]");
for (String s : strings) {
if (p.matcher(s).find()) {
// string starts with 25677 or 25678
} else {
// string starts with something else
}
}
If it's a one-shot deal, then you can simplify all this by changing the pattern to cover the entire string:
if (someString.matches("2567[78].*")) {
// string starts with 25677 or 25678
}
The matches() method tests whether the entire string matches the pattern; hence the leading ^ anchor is unnecessary but the trailing .* is needed.
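If you need to account for an optional leading + (as you indicated in a comment to another answer), just include \+? at the start of the pattern, or after the ^ if that's used (written \\+? in a Java string literal). A bare +? would be a syntax error because + is a quantifier.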
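The difference between the two API styles can be sketched side by side. This is only an illustration: the class and method names are hypothetical, and the optional leading + is included in both variants.

```java
import java.util.regex.Pattern;

class PrefixCheck {
    // Compiled once and reused; \\+? allows an optional leading plus sign.
    private static final Pattern PREFIX = Pattern.compile("^\\+?2567[78]");

    // find() only needs the pattern to occur, so the ^ anchor pins it to the start.
    static boolean startsWithCode(String s) {
        return PREFIX.matcher(s).find();
    }

    // String.matches() must cover the whole input, so the digits after the prefix are spelled out.
    static boolean isCodeNumber(String s) {
        return s.matches("\\+?2567[78]\\d*");
    }

    public static void main(String[] args) {
        System.out.println(startsWithCode("256776582036"));
        System.out.println(startsWithCode("+256782405036"));
        System.out.println(isCodeNumber("25677abc"));
    }
}
```

The first method answers "does it start with the code?", while the second also rejects anything that is not purely digits after the prefix.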

regular expression for key=(value) syntax

I am currently writing a java program with regular expression but I am struggling as I am pretty new in regex.
KEY_EXPRESSION = "[a-zA-z0-9]+";
VALUE_EXPRESSION = "[a-zA-Z0-9\\*\\+,%_\\-!##\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]/ ]*";
CHUNK_EXPRESSION = "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)";
The target syntax is key(value)+key(value)+key(value). Key is alphanumeric and value is allowed to be any combination.
This has been okay so far. However, I have a problem with '(', ')' in value. If I place '(' or ')' in the value, value includes all the rest.
e.g. number(abc(kk)123)+status(open) returns key:number, value:abc(kk)123)+status(open
It is supposed to be two pairs of key-value.
Can you suggest how to improve the expression above?

Not possible with regular expressions at all, sorry. If you want to count opening and closing parentheses, regular expressions are, in general, not good enough. The language you are trying to parse is not a regular language.
Of course, there may be ways around that limitation. We cannot know that if you give us as little context as you did.

Get the matched group from index 1 and 2
([a-zA-Z0-9]+)\((.*?)\)(?=\+|$)
The above regex pattern treats a ) followed by + (or by the end of the input) as the delimiter between key(value) pairs.
Note: the above regex pattern will not work if a value itself contains )+, for example number(abc(kk)+123+4+4)+status(open)
Sample code:
String str = "number(abc(kk)123)+status(open)";
Pattern p = Pattern.compile("([a-zA-Z0-9]+)\\((.*?)\\)(?=\\+|$)");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group(1) + ":" + m.group(2));
}
output:
number:abc(kk)123
status:open

Someone posted an answer with a working regex: ([a-zA-z0-9]+)\((.*?)\)(?=\+|$). This works great. When I tested it on an online regex tester site and came back, the post was gone. Is it the right solution? I am wondering why the answer has been deleted.
See this golfed regex:
([^\W_]+)\((.*?)\)(?![^+])
You can use a shorthand character class [^\W_] instead of [a-zA-Z0-9].
You can use a negative lookahead assertion (?![^+]) to match without backtracking.
However, this is not a practical solution as )+ within inner elements will break: number(abc(kk)+5+123+4+4)+status(open)
This is the case where Java, which has the regex implementation that doesn't support recursion, is disadvantaged. As I mentioned in this thread, the practical approach would be to use a workaround (copy-paste regex), or build your own finite state machine to parse it.
Also, you have a typographical error in your original regex. [a-zA-z0-9]+ has a range "A-z". You meant to type "A-Z".
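As one possible take on the "build your own parser" route, the sketch below walks the input once and tracks parenthesis depth, so a value may contain nested parentheses and )+ sequences. It assumes parentheses inside values are balanced and that chunks are joined by a single +; the class and method names are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {
    // Parses key(value)+key(value)+... into an ordered map.
    // Assumption: parentheses inside each value are balanced.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        int i = 0;
        while (i < input.length()) {
            int open = input.indexOf('(', i);
            if (open < 0) {
                throw new IllegalArgumentException("Expected '(' after position " + i);
            }
            String key = input.substring(i, open);
            if (!key.matches("[a-zA-Z0-9]+")) {
                throw new IllegalArgumentException("Invalid key: '" + key + "'");
            }
            // Scan forward until the parenthesis opened at 'open' is closed again.
            int depth = 1;
            int j = open + 1;
            while (j < input.length() && depth > 0) {
                char c = input.charAt(j);
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
                j++;
            }
            if (depth != 0) {
                throw new IllegalArgumentException("Unbalanced parentheses in value of " + key);
            }
            // j now points one past the matching ')'.
            result.put(key, input.substring(open + 1, j - 1));
            if (j < input.length()) {
                if (input.charAt(j) != '+') {
                    throw new IllegalArgumentException("Expected '+' at position " + j);
                }
                j++; // skip the '+' separator
            }
            i = j;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("number(abc(kk)+123+4+4)+status(open)"));
    }
}
```

Counting depth is exactly the part a regular expression cannot do in Java, which is why this version copes with the input that breaks the lookahead-based patterns.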

I'll make a small assumption: that you're able to add a + at the end of your input,
i.e. number(abc(kk)123)+status(open)+
If that is possible, you can make it work with:
KEY_EXPRESSION = "[a-zA-Z0-9]+";
VALUE_EXPRESSION = "[a-zA-Z0-9\\*\\+,%_\\-!##\\$\\^=<>\\.\\?';:\\|~`&\\{\\}\\[\\]\\(\\)/ ]*?";
CHUNK_EXPRESSION = "(" + KEY_EXPRESSION + ")\\((" + VALUE_EXPRESSION + ")\\)\\+";
The changes on line 2 are adding ( and ) (escaped) to the character class and replacing * with *?
The ? turns off greedy matching, so the engine tries the shortest match first (reluctant quantifier).
On line 3, an escaped + is added at the end of the pattern so that each key(value) chunk must be followed by a literal +. It has to be escaped; a bare + would instead mean "one or more )".

Java Regexp clarification

I have a string like :
<RandomText>
executeRule(x, y, z)
<MoreRandomText>
What I would like to accomplish is the following: if this executeRule string exists in the bigger text block, I would like to get its second parameter.
How could I do this?

What do you mean the bigger text block?
If you want to extract the second param from that expression, it would be something like
executeRule\(\w+,\s*(\w+),\s*\w+\)
The second param is held in capture group 1 ($1).
Keep in mind that to use this expression in Java, you need to escape the '\'. Also, I'm just assuming \w is good enough to match your params; that would depend on your particular rules.
If you need some help with actually using regexes in Java, there are many resources you can turn to. I found this tutorial to be fairly simple, and it explains the basic usages:
http://www.vogella.de/articles/JavaRegularExpressions/article.html

import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
Pattern p = Pattern.compile("executeRule\\(\\w+, (\\w+), \\w+\\)");
Matcher m = p.matcher(YOUR_TEXT_FROM_FILE);
while (m.find()) {
String secondArgument = m.group(1);
...process secondArgument...
}
Once this code executes secondArgument will contain the value of y. The above regular expression assumes that you expect the arguments to be composed of word characters (i.e. small and capital letters, digits and underscore).
Double backslashes are needed by Java string literal syntax, regexp engine will see single backslashes.
If you'd like to allow for whitespace in the string as it is allowed in most programming languages, you may use the following regexp:
Pattern p = Pattern.compile("executeRule\\(\\s*\\w+\\s*,\\s*(\\w+)\\s*,\\s*\\w+\\s*\\)");
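Put together as a self-contained sketch, this could look as follows. The class and method names are hypothetical, the parameters are assumed to be plain word characters as above, and only the first executeRule call in the text is examined.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExecuteRuleArgs {
    // Whitespace-tolerant pattern; group 1 captures the second argument.
    private static final Pattern CALL = Pattern.compile(
            "executeRule\\(\\s*\\w+\\s*,\\s*(\\w+)\\s*,\\s*\\w+\\s*\\)");

    // Returns the second argument of the first executeRule(...) call, if there is one.
    static Optional<String> secondArgument(String text) {
        Matcher m = CALL.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "<RandomText>\nexecuteRule(x, y, z)\n<MoreRandomText>";
        System.out.println(secondArgument(text).orElse("no executeRule call found"));
    }
}
```

Using find rather than matches is what lets the call sit anywhere inside the larger block of text.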
