Regex to match only one character sequence within string - java

I have a string in a jList that I am looking to split with a regex (for future simplicity if requirements change)
The string looks a lot like this:
ID: GF68464, Name: productname
The ID could be any combination of letters and numbers and could be any length.
I only want the ID to be matched, i.e. excluding "ID: " and anything after the comma following the ID.
Here is what I have so far, but it doesn't seem to do what I want:
[^ID: ][a-zA-Z1-9][^,^.]
FURTHER INFO (EDIT)
I plan on extracting the ID to match against an array. (hence the need for a regex). Could this be done a different way?

You can try this:
ID:\s*(\w+),
and extract the first capturing group. You can also use lookarounds (+1 to @p.s.w.g).
String str = "ID: GF68464, Name: productname";
Matcher m = Pattern.compile("ID:\\s*(\\w+),").matcher(str);
if (m.find()) {
    System.out.println(m.group(1));
}
GF68464

You could try using lookarounds:
(?<=ID:\s*)\w+(?=,)
This will match any sequence of one or more word characters preceded by "ID:" and any number of white space characters, and followed by a comma.
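A rough Java sketch of the lookaround approach follows. Java's regex engine has traditionally expected a lookbehind with a bounded length, so this version swaps \s* for \s{0,10}; the limit of 10 is an arbitrary assumption, and the class name IdLookaround and method extractId are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdLookaround {
    // Bounded lookbehind: "ID:" plus at most 10 whitespace characters (assumed limit).
    // The lookahead requires a comma right after the ID without consuming it.
    private static final Pattern ID = Pattern.compile("(?<=ID:\\s{0,10})\\w+(?=,)");

    // Returns the ID, or null when the input has no "ID: xxx," part.
    static String extractId(String s) {
        Matcher m = ID.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("ID: GF68464, Name: productname"));
    }
}
```

Because the lookarounds do not consume text, m.group() is already the bare ID and no capturing group is needed.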

What you want is called a non-capturing group. There are already some fairly high-quality examples of doing this in Java on SO - for example, this question: What is a non-capturing group? What does a question mark followed by a colon (?:) mean?

Create a regex like /^[a-z A-Z 0-9]*,/, then use the match function and take the value match[0], like this (JavaScript):
var regex = /^[a-z A-Z 0-9]*\,/;
var matches = your_string.match(regex);
var required_value = matches[0];
hope this helps

Related

Java Regex : How to return the whole word if the words ends with a specific string

Using Pattern/Matcher, I'm trying to find a regex in Java for searching in a text for table names that end with _DBF or _REP or _TABLE or _TBL and return the whole table names.
These tables names may contain one or more underscores _ in between the table name.
For example I'd like to retrieve table names like :
abc_def_DBF
fff_aaa_aaa_dbf
AAA_REP
123_frfg_244_gegw_TABLE
etc
Could someone please propose a regex for this ?
Or would it be easier to read text line by line and use String's method endsWith() instead ?
Many thanks in advance,
GK
Regex pattern
You could use a simple regex like this:
\b(\w+(?:_DBF|_REP|_TABLE|_TBL))\b
Java code
For Java you could use code like this:
String text = "HERE THE TEXT YOU WANT TO PARSE";
String patternStr = "\\b(\\w+(?:_DBF|_REP|_TABLE|_TBL))\\b";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}
This is the match information:
MATCH 1
1. [0-11] `abc_def_DBF`
MATCH 2
1. [28-43] `fff_aaa_aaa_dbf`
MATCH 3
1. [45-52] `AAA_REP`
MATCH 4
1. [54-77] `123_frfg_244_gegw_TABLE`
Regex pattern explanation
If you aren't familiar enough with regex to see how this pattern works, the idea is:
\b --> use word boundaries to avoid having anything like $%&abc
(\w+ --> table name can contain alphanumeric and underscore characters (\w is a shortcut for [A-Za-z0-9_])
(?:_DBF|_REP|_TABLE|_TBL)) --> must finish with any of these combinations
\b --> word boundaries again
Try this:
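// Use a group (...) for the alternatives; [...] would be a character class matching a single character
System.out.println("blah".matches(".*(_DBF|_REP|_TABLE|_TBL)$")); // false
System.out.println("blah_TBL".matches(".*(_DBF|_REP|_TABLE|_TBL)$")); // true
System.out.println("blah_TBL1".matches(".*(_DBF|_REP|_TABLE|_TBL)$")); // false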
This regexp should work to match the whole word:
\w+_([Dd][Bb][Ff]|REP|TABLE)
This regexp should work to match the keywords:
_(DBF|REP|TABLE)
The _ is matched, followed by either DBF or REP or TABLE.
It is unclear to me if you wish to match _dbf (lower case). If so, simply change DBF to [Dd][Bb][Ff]:
_([Dd][Bb][Ff]|REP|TABLE)
If you wish to match any more keywords, just add another |abc alternative inside the group.
Of course this method works only if you know that these "keywords" will appear only once, and only at the end of the string. If you have 123_frfg_TABLE_244_gegw_TABLE for example you will match both.
A simple alternative might be this regex ".*(_DBF|_REP|_TABLE|_TBL)$" which means any string that ends in _DBF or _REP or _TABLE or _TBL.
PS: Make the regex case-insensitive (for example with Pattern.CASE_INSENSITIVE) if lowercase suffixes such as _dbf should match too.
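For the endsWith() alternative raised in the question, here is a minimal sketch without Pattern/Matcher. It assumes table names consist only of word characters, so the text can be split on everything else. The class name TableNameScanner and method findTableNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TableNameScanner {
    private static final String[] SUFFIXES = {"_DBF", "_REP", "_TABLE", "_TBL"};

    // Splits the text on non-word characters and keeps tokens ending in a known suffix.
    static List<String> findTableNames(String text) {
        List<String> found = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            // Compare in upper case so that _dbf is accepted as well as _DBF.
            String upper = token.toUpperCase(Locale.ROOT);
            for (String suffix : SUFFIXES) {
                // Require at least one character before the suffix, like \w+ in the regex.
                if (upper.endsWith(suffix) && upper.length() > suffix.length()) {
                    found.add(token);
                    break;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findTableNames("select * from abc_def_DBF, fff_aaa_aaa_dbf; drop AAA_REP"));
    }
}
```

Whether this is easier than the regex is a matter of taste; it trades one pattern for an explicit loop over tokens and suffixes.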

Java Match string with optional hyphen

I am trying to match a series of strings that look like this:
item1 = "some value"
item2 = "some value"
I have some strings, though, that look like this:
item-one = "some new value"
item-two = "some new value"
I am trying to parse it using regular expressions, but I can't get it to match the optional hyphen.
Here is my regex string:
Pattern p = Pattern.compile("^(\\w+[-]?)\\w+?\\s+=\\s+\"(.*)\"");
Matcher m = p.matcher(line);
m.find();
String option = m.group(1);
String value = m.group(2);
Could someone please tell me what I am doing wrong?
Thank you
I suspect the main cause of your problem is that you expect \\w+? to make \\w+ optional, whereas in reality the ? makes the + quantifier reluctant, so the regex will still try to match at least one \\w there, taking the last character away from what ^(\\w+ matched.
Maybe try it this way:
Pattern.compile("^(\\w+(?:-\\w+)?)\\s+=\\s+\"(.*?)\"");
In (\\w+(?:-\\w+)?), the (?:-\\w+) part is a non-capturing group (the regex won't count it as a group, so (.*?) stays group(2) even when this part is present), and the ? after it makes the part optional.
In \"(.*?)\", *? is a reluctant quantifier, which makes the regex look for the shortest match between the quotation marks.
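To see the optional-hyphen pattern applied to both kinds of lines from the question, here is a small self-contained sketch. The class name OptionLineParser and method parse are invented for the example, and returning null for non-matching lines is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionLineParser {
    // Group 1: key with an optional "-suffix"; group 2: text between the quotes.
    private static final Pattern LINE =
            Pattern.compile("^(\\w+(?:-\\w+)?)\\s+=\\s+\"(.*?)\"");

    // Returns {key, value}, or null if the line does not have the expected shape.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] lines = {"item1 = \"some value\"", "item-one = \"some new value\""};
        for (String line : lines) {
            String[] kv = parse(line);
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Checking the result of find() before calling group() avoids the IllegalStateException that the question's code would throw on a line that does not match.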
Your problem is that you have the ? in the wrong place:
Try this regex:
^((\\w+-)?\\w+)\\s*=\\s*\"([^\"]+)\"
But use groups 1 and 3.
I've cleaned up the regex a bit too
This regex should work for you:
^\w[\w-]*(?<=\w)\s*=\s*\"([^"]*)\"
In Java:
Pattern p = Pattern.compile("^\\w[\\w-]*(?<=\\w)\\s*=\\s*\"([^\"]*)\"");
Live Demo: http://www.rubular.com/r/0CvByDnj5H
You want something like this:
([\w\-]+)\s*=\s*"([^"]*)"
With extra backslashes for Java:
([\\w\\-]+)\\s*=\\s*\"([^\"]*)\"
If you expect other symbols to start appearing in the variable name, you could make it a character class like [^=\s] to accept any characters not = or whitespace, for example.
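As a quick illustration of the looser [^=\s] character class, the sketch below accepts keys containing dots or other symbols. The class name LooseKeyParser and the sample key item.two are assumptions made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseKeyParser {
    // Key: one or more characters that are neither '=' nor whitespace.
    private static final Pattern LINE =
            Pattern.compile("([^=\\s]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns {key, value}, or null when nothing in the line matches.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] kv = parse("item.two=\"x y\"");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```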

Need regex to match the given string

I need a regex to match a particular string, say 1.4.5, in the string below. My string will look like this:
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which is giving [c1.4.5k] as an output. But I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
If I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, you want a word character before the match and no digit after it. Is that correct?
For that you can use lookbehind and lookahead assertions. These assertions only check their condition; they do not consume any characters, so that text is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because of that, you can remove the capturing group. You are also repeating yourself in the pattern, which can be simplified too:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
That would be my pattern for this. (The (?:...) is a non-capturing group.)
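A short sketch of how that lookaround pattern could be used from Java, run against the string from the question. The class name VersionFinder and method findVersion are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionFinder {
    // A word character must precede the match and no digit may follow it;
    // neither of them becomes part of the matched text.
    private static final Pattern VERSION =
            Pattern.compile("(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)");

    // Returns the first d.d.d sequence, or null if there is none.
    static String findVersion(String s) {
        Matcher m = VERSION.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findVersion("absdfsdfsdfc1.4.5kdecsdfsdff"));
    }
}
```

Note that the lookbehind means a version at the very start of the input is not matched, since nothing precedes it.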
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Do you need to match any count of numbers separated by dots (n numbers with n-1 dots in between)?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Where c is the character that would be immediately before the 1.4.5 and k is the character immediately after 1.4.5. You can replace c and k with any regular expression that would suit your purposes
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str = "absdfsdfsdfc1.4.5kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
// Three single digits separated by dots
String regex = "[0-9]\\.[0-9]\\.[0-9]";
Matcher matcher = Pattern.compile(regex).matcher(str);
if (matcher.find()) {
    String version = matcher.group(0); // group 0 is the whole match
    System.out.println(version);
} else {
    System.out.println("no match found");
}

regex to find substring between special characters

I am running into this problem in Java.
I have data strings that contain entities enclosed between & and ; for example:
&Text.ABC;, &Links.InsertSomething;
These entities can be anything from the ini file we have.
I need to find these strings in the input string and remove them. There can be zero, one or more occurrences of these entities in the input string.
I am trying to use regex to pattern match and failing.
Can anyone suggest the regex for this problem?
Thanks!
Here is the regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;"
It starts by matching the character &, followed by one or more letters (both uppercase and lower case) ([A-Za-z]+). Then it matches a dot followed by one or more letters (\\.[A-Za-z]+). There can be any number of this, including zero. Finally, it matches the ; character.
You can use this regex in java like this:
Pattern p = Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;"); // java.util.regex.Pattern
String subject = "foo &Bar; baz\n";
String result = p.matcher(subject).replaceAll("");
Or just
"foo &Bar; baz\n".replaceAll("&[A-Za-z]+(\\.[A-Za-z]+)*;", "");
If you want to remove whitespace after the matched tokens, you can use this regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*" // the "\\s*" matches any number of whitespace
And there is a nice online regular expression tester which uses the java regexp library.
http://www.regexplanet.com/simple/index.html
You can try:
input=input.replaceAll("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*","");

Java - Extract strings with Regex

I have this string:
String myString ="A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
and I need to extract these 3 substrings
1234
06:30
07:45
If I use this regex \\d{2}\:\\d{2} I'm only able to extract the first hour 06:30
Pattern depArrHours = Pattern.compile("\\d{2}\\:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
String firstHour = matcher.group(0);
String secondHour = matcher.group(1); // IndexOutOfBoundsException: No group 1
matcher.group(1) throws an exception.
Also I don't know how to extract 1234. This string can change but it always comes after 'XX~ '
Do you have any idea on how to match these strings with regex expressions?
UPDATE
Thanks to Adam's suggestion, I now have this regex that matches my string:
Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");
I get the number and the two times with matcher.group(1), matcher.group(2) and matcher.group(3).
The matcher.group() method takes a single integer argument: the capturing group index, starting from 1. Index 0 is special and means "the entire match". A capturing group is created with a pair of parentheses "(...)". Anything within the parentheses is captured. Groups are numbered from left to right (again, starting from 1) by their opening parenthesis, which means that groups can be nested. Since there are no parentheses in your regular expression, there is no group 1. Note also that group() can only be called after a successful find() or matches() call; otherwise it throws an IllegalStateException.
The javadoc on the Pattern class covers the regular expression syntax.
If you are looking for a pattern that might recur some number of times, you can use Matcher.find() repeatedly until it returns false. Matcher.group(0) once on each iteration will then return what matched that time.
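The repeated find() approach could look roughly like this for the two times in the question's string. The class name TimeFinder and method findTimes are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeFinder {
    private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}");

    // Collects every non-overlapping hh:mm occurrence, in order of appearance.
    static List<String> findTimes(String s) {
        List<String> times = new ArrayList<>();
        Matcher m = TIME.matcher(s);
        while (m.find()) {
            times.add(m.group()); // same as m.group(0): the whole match
        }
        return times;
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(findTimes(myString));
    }
}
```

Each call to find() resumes after the previous match, which is why no capturing group is needed to get the second time.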
If you want to build one big regular expression that matches everything at once (which I believe is what you want), put a pair of capturing parentheses around each of the three things you want to capture, call Matcher.matches(), and then Matcher.group(n) where n is 1, 2 and 3 respectively. Of course, Matcher.matches() might return false, in which case the pattern did not match and you can't retrieve any of the groups.
In your example, what you probably want to do is have it match some preceding text, then start a capturing group, match for digits, end the capturing group, etc...I don't know enough about your exact input format, but here is an example.
Let's say I had strings of the form:
Eat 12 carrots at 12:30
Take 3 pills at 01:15
And I wanted to extract the quantity and times. My regular expression would look something like:
"\w+ (\d+) [\w ]+ (\d{2}:\d{2})"
The code would look something like:
Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
Matcher m = p.matcher(oneline);
if (m.matches()) {
    System.out.println("The quantity is " + m.group(1));
    System.out.println("The time is " + m.group(2));
}
The regular expression means "a string containing a word, a space, one or more digits (captured in group 1), a space, a run of words and spaces ending with a space, followed by a time (captured in group 2)". The time pattern assumes that the hour is always zero-padded to 2 digits. I would give an example closer to what you are looking for, but the description of the possible input is a little vague.
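Applied to the string from the question, the single-pattern approach might be sketched as follows. This assumes the number always directly follows "XX~ " and that both times are zero-padded; the reluctant .*? between groups is a choice made here so that each group takes the nearest time. The class name FlightFields and method extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlightFields {
    // Group 1: digits after "XX~ "; groups 2 and 3: the next two hh:mm times.
    private static final Pattern FIELDS =
            Pattern.compile(".*XX~ (\\d+).*?(\\d{2}:\\d{2}).*?(\\d{2}:\\d{2}).*");

    // Returns {number, firstTime, secondTime}, or null if the whole string does not match.
    static String[] extract(String s) {
        Matcher m = FIELDS.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(String.join(", ", extract(myString)));
    }
}
```

Since matches() must cover the entire input, the pattern begins and ends with .* to tolerate surrounding text.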
