I have a string with data separated by commas like this:
$d4kjvdf,78953626,10.0,103007,0,132103.8945F,
I tried the following regex but it doesn't match the strings I want:
[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,
The $ at the beginning of your data string is not matching the regex. Change the first character class to [$a-zA-Z0-9]. And a couple of the comma separated values contain a literal dot. [$.a-zA-Z0-9] would cover both cases. Also, it's probably a good idea to anchor the regex at the start and end by adding ^ and $ to the beginning and end of the regex respectively. How about this for the full regex:
^[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,$
Update:
You said number of commas is your primary matching criteria. If there should be 6 commas, this would work:
^([^,]+,){6}$
That means: match at least 1 character that is anything but a comma, followed by a comma. And perform the aforementioned match 6 times consecutively. Note: your data must end with a trailing comma as is consistent with your sample data.
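A rough sketch of how the comma-count pattern could be used from Java. The class name CommaFieldCheck and the helper hasSixFields are made up for illustration and are not part of the question:

```java
import java.util.regex.Pattern;

public class CommaFieldCheck {
    // Six repetitions of "one or more non-comma characters followed by a comma".
    private static final Pattern SIX_FIELDS = Pattern.compile("^([^,]+,){6}$");

    // Hypothetical helper: true if the line consists of exactly six
    // non-empty fields, each terminated by a comma.
    static boolean hasSixFields(String line) {
        return SIX_FIELDS.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Sample line from the question: six fields, trailing comma.
        System.out.println(hasSixFields("$d4kjvdf,78953626,10.0,103007,0,132103.8945F,")); // true
        // Same line without the trailing comma.
        System.out.println(hasSixFields("$d4kjvdf,78953626,10.0,103007,0,132103.8945F"));  // false
        // Empty second field: [^,]+ needs at least one character.
        System.out.println(hasSixFields("a,,c,d,e,f,"));                                  // false
    }
}
```

If empty fields should be allowed, [^,]* instead of [^,]+ would probably be the change to make.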
Well, your regular expression is certainly garbled: there are clearly characters (like $ and .) that your expression won't match, and you don't need to escape commas with \\. Let's first describe our requirements. You seem to be saying a valid string is defined as:
A string consisting of 6 commas, with one or more characters before each one
We can represent that with the following pattern:
(?:[^,]+,){6}
This says: match one or more non-commas followed by a comma ([^,]+,), six times ({6}). The (?:...) notation is a non-capturing group, which lets us repeat the whole sub-expression six times; without it, the {6} would apply only to the preceding character.
Alternately, we could use normal, capturing groups to let us select each individual section of the matching string:
([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?
Now we can not only match the string, but extract its contents at the same time, e.g.:
String str = "$d4kjvdf,78953626,10.0,103007,0,132103.8945F,";
Pattern regex = Pattern.compile(
"([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?");
Matcher m = regex.matcher(str);
if(m.matches()) {
for (int i = 1; i <= m.groupCount(); i++) {
System.out.println(m.group(i));
}
}
This prints:
$d4kjvdf
78953626
10.0
103007
0
132103.8945F
I'm trying to get a regex for the following expression but can't make it:
The string has 4 words separated by dots (.).
First word matches a given one (HELLO for example).
Second and third words could have any character but dot itself (.).
Last word matches a given one again(csv for example).
So:
HELLO.something.Somethi#gElse.csv should match.
something.HELLO.?.csv shouldn't match.
HELLO.something...csv shouldn't match.
HELLO.something.somethingelse.notcsv shouldn't match
I can do it with split(.) and then check for individual words, but I'm trying to get it working with Regex and Pattern class.
Any help would be really appreciated.
This is relatively straightforward, as long as you understand character classes. A regex with square brackets [xyz] matches any character from the list {x, y, z}; a regex [^xyz] matches any character except {x, y, z}.
Now you can construct your expression:
^HELLO\.[^.]+\.[^.]+\.csv$
+ means "one or more of the preceding expression"; \. means "dot itself". ^ means "the beginning of the string"; $ means "the end of the string". These anchors prevent regex from matching
blahblahHELLO.world.world.csvblahblah
A common goal for writing regular expressions like that is to capture some content, for example, the string between the first and the second dot, and the string between the second and the third dot. Use capturing groups to bring the content of these strings into your Java program:
^HELLO\.([^.]+)\.([^.]+)\.csv$
Each pair of parentheses defines a capturing group, indexed from 1 (group at index zero represents the capture of the entire expression). Once you obtain a match object from the pattern, you can query it for the groups, and extract the corresponding strings.
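As a minimal sketch of querying the groups from Java (the class HelloCsvMatch and the helper middleWords are hypothetical names used only for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloCsvMatch {
    // Backslashes are doubled because the regex is written as a Java string literal.
    private static final Pattern NAME =
            Pattern.compile("^HELLO\\.([^.]+)\\.([^.]+)\\.csv$");

    // Hypothetical helper: returns the two middle words,
    // or null if the name does not have the expected shape.
    static String[] middleWords(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is the text between the first and second dot,
        // group 2 the text between the second and third dot.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] words = middleWords("HELLO.something.Somethi#gElse.csv");
        System.out.println(words[0] + " / " + words[1]);          // something / Somethi#gElse
        System.out.println(middleWords("something.HELLO.?.csv")); // null
        System.out.println(middleWords("HELLO.something...csv")); // null
    }
}
```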
Note that backslashes need to be doubled when the regex is written as a Java string literal, so \. becomes \\. in the source code.
(^HELLO\.[^.]+\.[^.]+\.csv$)
Here is the same regex with token explanation on regex101.
I have a String like the one below, and I want to get the substring that follows each special character.
String myString="Regular $express&ions are <patterns <that can# be %matched against *strings";
I want output like below:
express
ions
patterns
that
matched
strings
Can anyone help me? Thanks in advance.
Note: as @MaxZoom pointed out, it seems that I didn't understand the OP's problem properly. The OP apparently does not want to split the string on special characters, but rather keep the words starting with a special character. The former is addressed by my answer, the latter by @MaxZoom's answer.
You should take a look at the String.split() method.
Give it a regexp matching all the characters you want, and you'll get an array of all the strings you want. For instance:
String myString = "Regular $express&ions are <patterns <that can# be %matched against *strings";
String[] words = myString.split("[$&<#%*]");
This regex will select words that start with a special character:
[$&<%*](\w*)
explanation:
[$&<%*] match a single character present in the list below
$&<%* a single character in the list $&<%* literally (case sensitive)
1st Capturing group (\w*)
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
g modifier: global. All matches (don't return on first match)
DEMO
MATCH 1 [9-16] express
MATCH 2 [17-21] ions
MATCH 3 [27-35] patterns
MATCH 4 [37-41] that
MATCH 5 [51-58] matched
MATCH 6 [68-75] strings
Solution in Java code:
String str = "Regular $express&ions are <patterns <that can# be %matched against *strings";
Matcher matcher = Pattern.compile("[$&<%*](\\w*)").matcher(str);
List<String> words = new ArrayList<>();
while (matcher.find()) {
words.add(matcher.group(1));
}
System.out.println(words.toString());
// prints [express, ions, patterns, that, matched, strings]
I have this requirement - for an input string such as the one shown below
8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs
I would like to strip the matched word boundaries (where the matching pair is 8 or & or % etc) and will result in the following
This is reallly a test of repl%acing %mul%tiple matched 9pairs
This list of characters that is used for the pairs can vary e.g. 8,9,%,# etc and only the words matching the start and end with each type will be stripped of those characters, with the same character embedded in the word remaining where it is.
Using Java I can do a pattern as \\b8([^\\s]*)8\\b and replacement as $1, to capture and replace all occurrences of 8...8, but how do I do this for all the types of pairs?
I can provide a pattern such as \\b8([^\\s]*)8\\b|\\b9([^\\s]*)9\\b .. and so on that will match all types of matching pairs (8,9,..), but how do I specify a 'variable' replacement group -
e.g. if the match is 9...9, then the replacement should be $2.
I can of course run it through multiple of these, each replacing a specific type of pair, but I am wondering if there is a more elegant way.
Or is there a completely different way of approaching this problem?
Thanks.
You could use the below regex and then replace the matched characters by the characters present inside the group index 2.
(?<!\S)(\S)(\S+)\1(?=\s|$)
OR
(?<!\S)(\S)(\S*)\1(?=\s|$)
Java regex would be,
(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)
String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2"));
Output:
This is reallly a test of repl%acing %mul%tiple matched 9pairs
Explanation:
(?<!\\S) Negative lookbehind, asserts that the match wouldn't be preceded by a non-space character.
(\\S) Captures the first non-space character and stores it into group index 1.
(\\S+) Captures one or more non-space characters.
\\1 Refers to the character inside first captured group.
(?=\\s|$) And the match must be followed by a space or end of the line anchor.
This makes sure that the first character and last character of the string must be the same. If so, then it replaces the whole match by the characters which are present inside the group index 2.
For this specific case, you could modify the above regex as,
String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)", "$2"));
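Since the question says the list of pair characters can vary, here is a sketch that builds the pattern from a string of delimiters at runtime. The class PairStripper and the method stripPairs are hypothetical, and the sketch assumes each delimiter is a single character and the list is not empty. It uses an alternation of quoted characters instead of a hand-written character class, so metacharacters such as $ or * need no manual escaping:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PairStripper {
    // Hypothetical helper: removes a matching pair of delimiter characters
    // around each whitespace-separated word.
    // Assumption: 'delimiters' is a non-empty string of single-character delimiters.
    static String stripPairs(String input, String delimiters) {
        // Quote every delimiter so it is taken literally, then join with |.
        String alternatives = delimiters.chars()
                .mapToObj(c -> Pattern.quote(String.valueOf((char) c)))
                .collect(Collectors.joining("|"));
        // Group 1: opening delimiter, group 2: word body, \1: same delimiter again.
        Pattern p = Pattern.compile("(?<!\\S)(" + alternatives + ")(\\S+)\\1(?=\\s|$)");
        return p.matcher(input).replaceAll("$2");
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(stripPairs(s1, "89&#%"));
        // This is reallly a test of repl%acing %mul%tiple matched 9pairs
    }
}
```

Restricting the first group to known delimiters also avoids stripping ordinary words that happen to start and end with the same letter, which the generic (\\S) version would do.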
(?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z])
Try this. Replace with $1 or \1. See demo.
https://regex101.com/r/qB0jV1/15
(?<![a-zA-Z])[^a-zA-Z](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[^a-zA-Z](?![a-zA-Z])
Use this if you have many delimiters.
How can I write this as a regular expression?
"blocka#123#456"
I have used the # symbol to separate the parameters in the data,
and the parameters are block name, start X coordinate, start Y coordinate.
This is the data embedded in my QR code, so when I scan the QR code I want to check that it is the right one. For that I need a regular expression for the above syntax.
My method body:
public void Store_QR(String qr){
if( qr.matches(regular Expression here)) {
CurrentLocation = qr;
}
else // Break the operation
}
The information you specified does not justify using a regular expression at all.
Try to formulate it in a more general way.
If you really need to scan for "blocka#123#456" then use qr.contains("blocka#123#456");
It depends on what you want to match.
Here are some regex propositions:
^blocka#[0-9]{3}#[0-9]{3}$
^blocka#[0-9]+#[0-9]+$
^blocka(#[0-9]{3}){2}$
^blocka(#[0-9]+){2}$
^blocka(#[0-9]{3})+$
^blocka(#[0-9]+)+$
Otherwise, just use contains() or similar.
myregexp.com is nice to do some testing.
Official Java Regex Tutorial is quite ok to learn and includes most things one needs to know.
The Pattern documentation also includes fancy predefined character classes that are missing in above tutorial.
You did not specify anything that has to be regular in that example you gave. Regular expressions make only sense if there are rules to validate the input.
If it has to be exactly "blocka#123#456" then "blocka#123#456" or "^blocka#123#456$" will work as regex. Stuff between ^ and $ means that the regex inside must span from begin to end of the input. Sometimes required and usually a good idea to put that around your regex.
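A small sketch of what the anchors change in Java, assuming the two hypothetical helpers findAnywhere and matchesWhole below. Note that String.matches() already requires the whole input to match, so the anchors mostly matter with Matcher.find():

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Hypothetical helper: true if the regex matches some part of the input.
    static boolean findAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String padded = "xxblocka#123#456yy";

        // Unanchored regex with find(): the pattern is found inside the longer input.
        System.out.println(findAnywhere("blocka#123#456", padded));             // true
        // Anchored regex with find(): the match must span the whole input.
        System.out.println(findAnywhere("^blocka#123#456$", padded));           // false
        // matches() requires a full match even without ^ and $.
        System.out.println(matchesWhole("blocka#123#456", padded));             // false
        System.out.println(matchesWhole("blocka#123#456", "blocka#123#456"));   // true
    }
}
```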
If blocka is dynamic replace it with [a-z]+ to match any sequence of lowercase letters a through z with length of at least 1. block[a-z] would match blocka, blockb, etc.
And [a-z]{6} would match any sequence of exactly 6 letters. [a-zA-Z] also includes uppercase letters and \p{L} matches any letter including unicode stuff (e.g. Blüc本).
# matches #. Like any character without special regex meaning, it matches itself; the characters that do have a special meaning are \ ^ $ . | ? * + ( ) [ ] { }. [^#] matches every character but #.
Regarding the numbers: [0-9]+ or \d+ is a generic pattern for one or more digits; [0-9]{1,4} would match anything consisting of 1 to 4 digits, like 007, 5, 9999. (?:0|[1-9][0-9]{0,3}), for example, will only match numbers between 0 and 9999 and does not allow leading zeros. (?:STUFF) is a non-capturing group that does not affect the groups you can extract via Matcher#group(1..?). It is useful for logical grouping with |. The meaning of (?:0|[1-9][0-9]{0,3}) is: either a single 0, OR one digit 1-9 followed by zero to three digits 0-9.
[0-9] is so common that there is a predefinition for it : \d (digit). It's \\d inside the regex String since you have to escape the \.
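To see the leading-zero rule in action, here is a short sketch; the class NumberRangeDemo and the helper isPlainNumber are illustrative names only:

```java
import java.util.regex.Pattern;

public class NumberRangeDemo {
    // Either a single 0, or a digit 1-9 followed by up to three more digits:
    // the numbers 0..9999 without leading zeros.
    private static final Pattern UP_TO_9999 = Pattern.compile("(?:0|[1-9][0-9]{0,3})");

    // Hypothetical helper: true if the whole string is such a number.
    static boolean isPlainNumber(String s) {
        return UP_TO_9999.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlainNumber("0"));     // true
        System.out.println(isPlainNumber("5"));     // true
        System.out.println(isPlainNumber("9999"));  // true
        System.out.println(isPlainNumber("007"));   // false: leading zeros
        System.out.println(isPlainNumber("10000")); // false: five digits
    }
}
```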
So some of your options are
".*" which matches absolutely everything
"^[^#]+(?:#[^#]+)+$" which matches anything separated by # like "hello #world!1# -12.f #本#foo#bar"
"^blocka(#\\d+)+$" which matches blocka followed by at least one group of numbers separated by # e.g. blocka#1#12#0007#949432149#3
"^blocka#(?:[0-9]|[1-9][0-9]|[1-3][0-9]{2})#[4-9][0-9]{2}$" which will match only if it finds blocka# followed by numbers 0 - 399, followed by a # and finally numbers 400-999
"^blocka#123#456$" which matches only exactly that string.
All that are regular expressions that match the example you gave.
But it's probably as simple as
public void Store_QR(String qr) {
    if (qr.matches("^blocka#\\d+#\\d+$")) {
        CurrentLocation = qr;
    } else {
        // Break the operation
    }
}
or
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
private static final Pattern QR_PATTERN = Pattern.compile("^blocka#(\\d+)#(\\d+)$");

public void Store_QR(String qr) {
    Matcher matcher = QR_PATTERN.matcher(qr);
    if (matcher.matches()) {
        // Groups 1 and 2 hold the two numbers after the # separators.
        int number1 = Integer.parseInt(matcher.group(1));
        int number2 = Integer.parseInt(matcher.group(2));
        CurrentLocation = qr;
    } else {
        // Break the operation
    }
}
BlockName#start_X#start_Y any block name.. starting with the string"block" and followed by two integers
I guess a good regex for that would be "^block\\w+#\\d+#\\d+$": starting with "block", then any combination of a-z, A-Z, 0-9 and _ (that's the \w), followed by #, numbers, #, numbers.
It would match block_#0#0, blockZ#9#9 and block_a_Unicorn666#0000#1234, but not block#1#2 because there is no name at all, and not blockName#123#abc because it has letters instead of a number. It would also not match Block_a#123#456 because of the uppercase B.
If the name part (\\w+) is too liberal (___ and _123 would be legal names), use e.g. "^block_?[a-zA-Z]+#\\d+#\\d+$", which does not allow digits; the name may start with a single optional _, and there have to be letters after that. It would allow _a, a, _ABc, but not _, _a_b, _a9. If you want to allow digits in names, [a-zA-Z0-9] would be the character class to use.
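A quick way to compare the two variants is to run both against a few sample strings. This is only a sketch; BlockNameDemo, liberal and strict are hypothetical names:

```java
import java.util.regex.Pattern;

public class BlockNameDemo {
    // Name part is any run of word characters (letters, digits, underscore).
    private static final Pattern LIBERAL = Pattern.compile("^block\\w+#\\d+#\\d+$");
    // Name part is an optional single underscore followed by letters only.
    private static final Pattern STRICT = Pattern.compile("^block_?[a-zA-Z]+#\\d+#\\d+$");

    static boolean liberal(String qr) {
        return LIBERAL.matcher(qr).matches();
    }

    static boolean strict(String qr) {
        return STRICT.matcher(qr).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "block_#0#0", "blockZ#9#9", "block_a_Unicorn666#0000#1234",
            "block#1#2", "blockName#123#abc", "Block_a#123#456", "block_ABc#1#2"
        };
        // Print how each sample fares under both patterns.
        for (String s : samples) {
            System.out.println(s + " liberal=" + liberal(s) + " strict=" + strict(s));
        }
    }
}
```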
I suggest:
[a-z]+#\d+#\d+
And if you want capture the 3 parts:
([a-z]+)#(\d+)#(\d+)
Matcher.group( 1, 2 or 3 ) returns the parts
I'm trying to compare following strings with regex:
#[xyz="1","2"'"4"] ------- valid
#[xyz] ------------- valid
#[xyz="a5","4r"'"8dsa"] -- valid
#[xyz="asd"] -- invalid
#[xyz"asd"] --- invalid
#[xyz="8s"'"4"] - invalid
The valid pattern should be:
#[xyz then = sign then some chars then , then some chars then ' then some chars and finally ]. This means if there is characters after xyz then they must be in format ="XXX","XXX"'"XXX".
Or only #[xyz]. No character after xyz.
I have tried the following regex, but it did not work:
String regex = "#[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"]";
Here the quotations (in part after xyz) are optional and number of characters between quotes are also not fixed and there could also be some characters before and after this pattern like asdadad #[xyz] adadad.
You can use the regex:
#\[xyz(?:="[a-zA-Z0-9]+","[a-zA-Z0-9]+"'"[a-zA-Z0-9]+")?\]
Expressed as a Java string, it will be:
String regex = "#\\[xyz(?:=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\]";
What was wrong with your regex?
[...] defines a character class. When you want to match literal [ and ] you need to escape it by preceding with a \.
[a-zA-z][0-9] match a single letter followed by a single digit. But you want one or more alphanumeric characters. So you need [a-zA-Z0-9]+
Use this:
String regex = "#\\[xyz(=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\]";
When you write [a-zA-z][0-9], it expects a letter followed by a digit. You also have to escape the first and last square brackets, because square brackets have a special meaning in regexes. Note that the range should be A-Z: A-z also covers the characters between Z and a in ASCII, such as [, ], ^ and _.
Explanation:
[a-zA-Z0-9]+ means an alphanumeric character (but not an underscore), one or more times.
(=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")? means that the expression in parentheses can occur once or not at all.
Square brackets have a special meaning in regex: as you used them yourself, they define character classes. You need to escape them if you want to match them literally.
String regex = "#\\[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"\\]";
The next problem is with [a-zA-z][0-9]: this defines "first a letter, second a digit". You need to join those classes and add a quantifier (and use A-Z, since A-z also matches characters such as [ and _):
String regex = "#\\[xyz=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\"\\]";
there could also be some characters before and after this pattern like
asdadad #[xyz] adadad.
Regex should be:
String regex = ".*#\\[xyz(=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\].*";
The leading and trailing .* allow any text before and after the pattern, as you mentioned in your edit. As @ademiban said, the group (=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")? will occur once or not at all. The range is written A-Z rather than A-z, because A-z also matches characters such as [ and _. The other mistakes are already well explained in the other answers.
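An alternative to wrapping the pattern in .* is to leave it unanchored and use Matcher.find(), which also lets you pull the tags out of the surrounding text. This is a sketch under that assumption; XyzTagFinder and findTags are made-up names, and the character class is written as [a-zA-Z0-9]:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XyzTagFinder {
    // #[xyz] optionally followed by ="..","..."'"..." before the closing bracket.
    private static final Pattern TAG = Pattern.compile(
            "#\\[xyz(?:=\"[a-zA-Z0-9]+\",\"[a-zA-Z0-9]+\"'\"[a-zA-Z0-9]+\")?\\]");

    // Hypothetical helper: returns every valid tag found anywhere in the text.
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(findTags("asdadad #[xyz] adadad"));                  // [#[xyz]]
        System.out.println(findTags("x #[xyz=\"a5\",\"4r\"'\"8dsa\"] y"));       // [#[xyz="a5","4r"'"8dsa"]]
        System.out.println(findTags("#[xyz=\"asd\"]"));                          // []
        System.out.println(findTags("#[xyz=\"8s\"'\"4\"]"));                     // []
    }
}
```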