How to match key/value groups with regular expressions


Provided the following string:
#NAMEONE=any#character#OTHERNAME=any # character#THIRDNAME=even new lines
are possible
How can we match the full name/value pairs like #NAMEONE=any#character?
I am stuck with the regex (#(?:NAMEONE|OTHERNAME|THIRDNAME)=.+?)+, which only matches #NAMEONE=a, #OTHERNAME=a, and so on. I am using Java.

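This matches any character other than #, plus any # whose neighbours on both sides are non-word characters (that is what \B#\B checks). A # sitting directly between word characters, as in any#character, ends the match, so use the second pattern if such values must be kept whole.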
"#(?:NAMEONE|OTHERNAME|THIRDNAME)=(?:\\B#\\B|[^#])*"
or
"(?s)#(?:NAMEONE|OTHERNAME|THIRDNAME)=.*?(?=#(?:NAMEONE|OTHERNAME|THIRDNAME)=|$)"

Here is a slightly shorter version that relies on the variable names being uppercase:
(#[A-Z]+=.+?)(?=#[A-Z]+=|$)
Explanation:
#[A-Z]+= matches the variable name and the = sign
.+? lazily matches any character
(?=#[A-Z]+=|$) positive look-ahead for variable name or end of string
Java code:
public static void test()
{
    String str = "#NAMEONE=any # character#OTHERNAME=any # character#THIRDNAME=even";
    Matcher matcher = Pattern.compile("(#[A-Z]+=.+?)(?=#[A-Z]+=|$)").matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group(0));
    }
}
prints
#NAMEONE=any # character
#OTHERNAME=any # character
#THIRDNAME=even
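The look-ahead approach above can be sketched as a small parser that collects the pairs into a map. This is only an illustration under some assumptions: the class name KeyValueGroups and the method parse are hypothetical, names are assumed to be uppercase ASCII letters, and DOTALL is switched on because the question says values may contain new lines. A value that itself contains something like #ABC= would be cut there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueGroups {
    // Group 1 is the upper-case name, group 2 the value.
    // DOTALL lets '.' cross line breaks; \z is the very end of the input.
    private static final Pattern PAIR =
            Pattern.compile("#([A-Z]+)=(.+?)(?=#[A-Z]+=|\\z)", Pattern.DOTALL);

    // Hypothetical helper: returns the pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "#NAMEONE=any#character#OTHERNAME=any # character"
                + "#THIRDNAME=even new lines\nare possible";
        parse(input).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

The lower-case text after the # in any#character does not look like a new name, so the lazy quantifier keeps going until the next uppercase NAME= or the end of the input.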

Related

Regular expression for finding words that start with # and end with whitespace or a new line

I am looking for a Java regex to find usernames in a text.
Usernames always start with #. No whitespace is acceptable after #. Usernames are combinations of upper- or lowercase letters, digits, ., _, -.
So, the regex should match words that start with # and end with whitespace or newline.
For example, the text hi #anyOne.2 I'm looking for #Name_13
#name14 #_n.a.m.e-15 but not # name16 contains the following matches: anyOne.2, Name_13, name14, _n.a.m.e-15.
I am using
String pattern = "#[^\\s]*(\\w+)";

You may use
\B#(\S+)
See the regex demo.
Details
\B - a non-word boundary, the char that is right before the current location must be a non-word char (or start of string)
# - a # char
(\S+) - Capturing group 1: one or more non-whitespace characters.
See Java sample usage:
String text = "hi #anyOne.2 I'm looking for #Name_13 #name14 #_n.a.m.e-15 but not # name16";
String pattern = "\\B#(\\S+)";
// Java 9+ (Matcher.results() was added in Java 9; needs java.util.Arrays)
String[] results = Pattern.compile(pattern).matcher(text).results().map(r -> r.group(1)).toArray(String[]::new);
System.out.println(Arrays.toString(results)); // => [anyOne.2, Name_13, name14, _n.a.m.e-15]
// Java 8 (needs java.util.ArrayList and java.util.List)
Matcher m = Pattern.compile(pattern).matcher(text);
List<String> strs = new ArrayList<>();
while (m.find()) {
    strs.add(m.group(1));
}
System.out.println(strs); // => [anyOne.2, Name_13, name14, _n.a.m.e-15]
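The \S+ pattern above accepts any non-whitespace characters after the #. If only the characters listed in the question should count, a stricter variant might look like the sketch below. The class UsernameFinder and the method find are hypothetical names, and the pattern assumes a username must be preceded by whitespace (or the start of the text) and followed by whitespace (or the end of the text).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsernameFinder {
    // (?<!\S)          the '#' is at the start of the text or follows whitespace
    // [A-Za-z0-9._-]+  only the characters the question allows in a username
    // (?!\S)           the name is followed by whitespace or the end of the text
    private static final Pattern USERNAME =
            Pattern.compile("(?<!\\S)#([A-Za-z0-9._-]+)(?!\\S)");

    // Hypothetical helper: returns the usernames without the leading '#'.
    static List<String> find(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = USERNAME.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "hi #anyOne.2 I'm looking for #Name_13\n"
                + "#name14 #_n.a.m.e-15 but not # name16";
        System.out.println(find(text)); // [anyOne.2, Name_13, name14, _n.a.m.e-15]
    }
}
```

With this variant a token such as #bad!name is skipped entirely rather than matched, which may or may not be what you want.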

What is the Java regex to split a string on the last occurrence of an unescaped # character?

What regex pattern splits a string on the last occurrence of an unescaped # character?
For example:
Path1\\P#ath2\\Path3\\File1\\#12.1234wer#tjava\\#rep\o1 - should split using the 3rd # symbol
Path1\\Path2\\Path3\\File1\\#12.1234wertjava\\#repo1# - should split using the last # symbol
Path1\\Path2\\Pat#h3\\File1\\12.1234wertjava\\#rep\\o1 - should split using the first # symbol

You need a negative lookbehind:
string.split("(?<!\\\\)#");
This splits the string at every # that is not preceded by a backslash. To split only at the last # not preceded by a backslash, use:
Pattern p = Pattern.compile("(.*)(?<!\\\\)#(.*)");
Matcher matcher = p.matcher(string);
String[] parts = new String[2];
if (matcher.matches()) {
    parts[0] = matcher.group(1);
    parts[1] = matcher.group(2);
}
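As a rough, self-contained version of the greedy-group idea above, the sketch below wraps it in a helper and runs it on the three sample paths from the question. The names LastUnescapedHash and splitAtLast are hypothetical, the samples are assumed to contain single backslashes at run time, and a # preceded by an escaped backslash (two backslashes in the data) would still be treated as escaped.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastUnescapedHash {
    // The greedy (.*) grabs as much as possible, so the '#' that ends up
    // matched is the last one not directly preceded by a backslash.
    private static final Pattern LAST_HASH =
            Pattern.compile("(.*)(?<!\\\\)#(.*)", Pattern.DOTALL);

    // Hypothetical helper: returns {before, after}, or null when the
    // input contains no unescaped '#'.
    static String[] splitAtLast(String s) {
        Matcher m = LAST_HASH.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Path1\\P#ath2\\Path3\\File1\\#12.1234wer#tjava\\#rep\\o1",
            "Path1\\Path2\\Path3\\File1\\#12.1234wertjava\\#repo1#",
            "Path1\\Path2\\Pat#h3\\File1\\12.1234wertjava\\#rep\\o1"
        };
        for (String sample : samples) {
            System.out.println(Arrays.toString(splitAtLast(sample)));
        }
    }
}
```

Returning null for "no unescaped #" is just one possible convention; returning the whole input as a single-element array would work as well.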

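You could try this regex: /[^\\]#/. It matches the first # not preceded by \. Note that the character class also consumes the character before the #, so splitting on this pattern drops that character, and a # at the very start of the string is not matched.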
Then use your split function in the language you use (it seems to be Java, according to your tags, still you could have posted some code you tried) to split the string using this pattern.

If you mean to split the string at the last occurrence of # (escaped or not):
String filePath = "Path1\\P#ath2\\Path3\\File1\\#12.1234wer#tjava\\#rep\\o1";
int p = filePath.lastIndexOf('#'); // lastIndexOf takes a literal, so Pattern.quote is not needed
String e = filePath.substring(p + 1);

I have an email address and do not want to show it in full, so I am thinking of masking some characters using regex or MaskFormatter

Input and desired result
1) testing#mint.com - t*****g#m***.**m
I want to do this by using regex or mask.
Here is what I have done so far:
public class MobileMasking {
    public static void main(String[] args) {
        String email = "danish3jawed#gmail.com";
        String masked = email.replaceAll("(?<=.).(?=[^#]*?.#)", "*");
        System.out.println(masked);
    }
}
When I use
String email = "testing#mint.com";
String masked = email.replaceAll("(?<=.).(?=[^#]*?.#)", "*");
the output is t*****g#mint.com, but I want t*****g#m***.**m.

You can use this lookaround based regex:
email = email.replaceAll("(?<!^|#)[^.#](?!#|$)", "*");
RegEx Demo
RegEx Breakup:
(?<!^|#) # previous char is not start OR #
[^.#] # current char is not a DOT or #
(?!#|$) # next char is not line end OR #
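To try the lookaround pattern above on the sample address, a minimal sketch could look like this. The class EmailMasker and the method mask are hypothetical names, and the code keeps # as the separator between the two parts, exactly as in the examples on this page.

```java
class EmailMasker {
    // Replaces every character that is not a '.' or '#', is not the first
    // character of the string or of the part after '#', and is not the last
    // character before '#' or before the end of the string.
    static String mask(String email) {
        return email.replaceAll("(?<!^|#)[^.#](?!#|$)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("testing#mint.com")); // t*****g#m***.**m
    }
}
```

The first and last character of each part stay visible, and every dot is kept, so the overall shape of the address remains recognisable.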

https://regex101.com/r/rH3wC6/2
Explained:
(?<=^.|#.) # match the start or a #, followed by any non-newline character
(?<toreplace>[^#]*?) # match anything except #
(?=.#|.$) # match any character followed by the end of the string or a #
Demo: http://ideone.com/PeRvy1
Wait, do you need to keep the dots in the second part? Isn't it useless?

Regex that allows only single separators between words

I need to construct a regular expression such that it should not allow / at the start or end, and there should not be more than one / in sequence.
Valid expression: AB/CD
Valid expression: AB
Invalid expression: //AB//CD//
Invalid expression: ///////
Invalid expression: AB////////
The / character is just a separator between two words; no more than one / may appear between words.

Assuming you only want to allow alphanumerics (including underscore) between slashes, it's pretty trivial:
boolean foundMatch = subject.matches("\\w+(?:/\\w+)*");
Explanation:
\w+ # Match one or more alnum characters
(?: # Start a non-capturing group
/ # Match a single slash
\w+ # Match one or more alnum characters
)* # Match that group any number of times
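A quick way to check the pattern above against the examples from the question is a small harness like the following. The names SeparatorCheck and isValid are hypothetical, and the sketch keeps the answer's assumption that words consist of \w characters only.

```java
class SeparatorCheck {
    // matches() must consume the whole input, so no ^ or $ is needed.
    static boolean isValid(String subject) {
        return subject.matches("\\w+(?:/\\w+)*");
    }

    public static void main(String[] args) {
        String[] samples = { "AB/CD", "AB", "//AB//CD//", "///////", "AB////////" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

Because each slash must be followed by at least one word character, leading, trailing, and doubled slashes are all rejected by the same rule.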

This regex does it:
^(?!/)(?!.*//).*[^/]$
So in java:
if (str.matches("(?!/)(?!.*//).*[^/]"))
Note that ^ and $ are implied by matches(), because matches must match the whole string to be true.

[a-zA-Z]+(/[a-zA-Z]+)+
It matches
a/b
a/b/c
aa/vv/cc
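It does not match the following (note that a single word such as a is rejected, although the question treats AB as valid; replace the final + with * to accept it):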
a
/a/b
a//b
a/b/
Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reg {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-zA-Z]+(/[a-zA-Z]+)+");
        Matcher matcher = pattern.matcher("a/b/c");
        System.out.println(matcher.matches());
    }
}

Regex for numeric portion of Java string

I'm trying to write a Java method that will take a string as a parameter and return another string if it matches a pattern, and null otherwise. The pattern:
Starts with a number (1+ digits); then followed by
A colon (":"); then followed by
A single whitespace (" "); then followed by
Any Java string of 1+ characters
Hence, some valid strings that match this pattern:
50: hello
1: d
10938484: 394958558
And some strings that do not match this pattern:
korfed49
: e4949
6
6:
6:sdjjd4
The general skeleton of the method is this:
public String extractNumber(String toMatch) {
    // If toMatch matches the pattern, extract the first number
    // (everything prior to the colon).
    // Else, return null.
}
Here's my best attempt so far, but I know I'm wrong:
public String extractNumber(String toMatch) {
    // If toMatch matches the pattern, extract the first number
    // (everything prior to the colon).
    String regex = "???";
    if (toMatch.matches(regex))
        return toMatch.substring(0, toMatch.indexOf(":"));
    // Else, return null.
    return null;
}
Thanks in advance.

Your description is spot on, now it just needs to be translated to a regex:
^ # Starts
\d+ # with a number (1+ digits); then followed by
: # A colon (":"); then followed by
# A single whitespace (" "); then followed by
\w+ # Any word character, one or more times
$ # (followed by the end of input)
Giving, in a Java string:
"^\\d+: \\w+$"
You also want to capture the numbers: put parentheses around \d+, use a Matcher, and capture group 1 if there is a match:
private static final Pattern PATTERN = Pattern.compile("^(\\d+): \\w+$");
// ...
public String extractNumber(String toMatch) {
    Matcher m = PATTERN.matcher(toMatch);
    return m.find() ? m.group(1) : null;
}
Note: in Java, \w by default matches only ASCII letters, digits, and the underscore (this is not the case for .NET languages, for instance). If you don't want the underscore, you can use (Java-specific syntax):
[\w&&[^_]]
instead of \w for the last part of the regex, giving:
"^(\\d+): [\\w&&[^_]]+$"

Try using the following: \d+: \w+
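The capturing-group approach above restricts the part after the space to word characters. The question asks for any string of one or more characters there, so the sketch below swaps \w+ for .+ (with DOTALL) and uses matches() instead of anchors. That swap is an assumption about what the asker wants, and the class name NumberPrefix is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberPrefix {
    // Digits, a colon, exactly one space, then at least one character of any kind.
    private static final Pattern PATTERN =
            Pattern.compile("(\\d+): .+", Pattern.DOTALL);

    // Returns the leading number, or null when the input does not fit the pattern.
    static String extractNumber(String toMatch) {
        Matcher m = PATTERN.matcher(toMatch);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("50: hello"));  // 50
        System.out.println(extractNumber("6:sdjjd4"));   // null
    }
}
```

Note that an input such as "6:  x" (two spaces) is accepted by this version, because the second space counts as part of the trailing string.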

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
