I have not been able to find a proper regex to match any string not ending with some condition. For example, I don't want to match anything ending with an a.
This matches
b
ab
1
This doesn't match
a
ba
I know the regex should end with $ to mark the end, though I don't know what should precede it.
Edit: The original example doesn't seem to fit my case. So: how do I handle more than one character? Say, anything not ending with ab?
I've been able to fix this, using this thread:
.*(?:(?!ab).).$
The downside of this is that it doesn't match a string of one character.
You don't tell us the language, but if your regex flavour supports lookbehind assertions, this is what you need:
.*(?<!a)$
(?<!a) is a negative lookbehind assertion that ensures that, before the end of the string (or of the line, with the m modifier), there is no character "a".
See it here on Regexr
You can also easily extend this to more characters, since this checks for a string and isn't a character class.
.*(?<!ab)$
This would match anything that does not end with "ab", see it on Regexr
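A minimal Java sketch of the lookbehind version (Java supports lookbehind of bounded length). It uses Matcher.matches(), which must consume the whole string, so the trailing $ is redundant but harmless; the class name and sample strings are illustrative. Unlike the .*(?:(?!ab).).$ attempt in the question, it also accepts empty and one-character strings.

```java
import java.util.List;
import java.util.regex.Pattern;

final class NotEndingWithAb {
    // ".*" runs to the end of the input; the negative lookbehind then
    // checks that the two characters just before the end are not "ab".
    private static final Pattern PATTERN = Pattern.compile(".*(?<!ab)$");

    static boolean matches(String s) {
        // matches() succeeds only if the pattern covers the entire input
        return PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : List.of("", "a", "b", "ba", "ab", "xab", "abc")) {
            System.out.println("\"" + s + "\" -> " + matches(s));
        }
    }
}
```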
Use the not (^) symbol:
.*[^a]$
If you put the ^ symbol at the beginning of brackets, it means "everything except the things in the brackets." $ is simply an anchor to the end.
For multiple characters, just put them all in their own character set:
.*[^a][^b]$
To search for files not ending with ".tmp" we use the following regex:
^(?!.*[.]tmp$).*$
I tested it with the Regex Tester.
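A small Java sketch applying the same negative lookahead to a list of file names. The class name and file names are made up for the example, and Matcher.matches() is used so the pattern has to cover the whole name.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NotTmpFilter {
    // The lookahead at the start rejects the name if ".tmp" appears at its very end.
    private static final Pattern NOT_TMP = Pattern.compile("^(?!.*[.]tmp$).*$");

    static List<String> keepNonTmp(List<String> names) {
        return names.stream()
                .filter(n -> NOT_TMP.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "tmp" has no dot and "a.tmp.bak" ends in ".bak", so both are kept
        System.out.println(keepNonTmp(List.of("notes.txt", "cache.tmp", "tmp", "a.tmp.bak")));
    }
}
```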
.*[^a]$
The regex above will match strings that do not end with a.
Try this
/.*[^a]$/
The [] denotes a character class, and the ^ inverts the character class to match everything but an a.
The accepted answer is fine if you can use lookarounds. However, there is also another approach to solve this problem.
If we look at the widely proposed regex for this question:
.*[^a]$
We will find that it almost works. It does not accept an empty string, which might be a little inconvenient, but that is a minor issue when excluding just one character. However, if we want to exclude a whole string, e.g. "abc", then:
.*[^a][^b][^c]$
won't do. It won't accept ac, for example.
There is an easy solution for this problem though. We can simply say:
.{0,2}$|.*[^a][^b][^c]$
or, more generally:
.{0,n-1}$|.*[^firstchar][^secondchar]...[^nthchar]$
where n is the length of the string you want to forbid (for abc it's 3), and firstchar, secondchar, ... are the first, second, ... nth characters of your string (for abc it would be a, then b, then c).
This comes from a simple observation: a string that is shorter than the text we want to forbid cannot end with that text. So we can either accept anything that is shorter ("ab" isn't "abc"), or anything long enough but without the forbidden ending.
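A small Java check of the length-based idea. One thing it makes visible: a run of single-character classes like [^a][^b][^c] only accepts a string when every one of its last three characters differs, so it also rejects strings such as xbd or abd that do not end in abc. The sketch compares it with an alternation that tests the suffix one position at a time; that second pattern, the class name and the sample strings are my own, not part of the answer, and the short-string part is spelled {0,2}, which Java accepts.

```java
import java.util.List;
import java.util.regex.Pattern;

final class NotEndingWithAbc {
    // Per-position classes: ALL of the last three characters must differ
    // from a, b, c respectively, which is stricter than "does not end with abc".
    static final Pattern ALL_DIFFER = Pattern.compile(".{0,2}|.*[^a][^b][^c]");

    // Suffix checked position by position: the last char is not c, or it is c
    // but the one before is not b, or the last two are bc but the one before is not a.
    static final Pattern ANY_DIFFERS = Pattern.compile(".{0,2}|.*(?:[^c]|[^b]c|[^a]bc)");

    static boolean test(Pattern p, String s) {
        return p.matcher(s).matches(); // whole-string match
    }

    public static void main(String[] args) {
        for (String s : List.of("", "ac", "xbd", "abd", "abc", "zabc", "abcx")) {
            System.out.printf("%-5s expected=%b allDiffer=%b anyDiffers=%b%n",
                    s, !s.endsWith("abc"), test(ALL_DIFFER, s), test(ANY_DIFFERS, s));
        }
    }
}
```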
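Here's an example using GNU find that is meant to delete all files that are not .jpg (-regextype posix-extended is needed because the pattern uses extended regex syntax, while GNU find defaults to Emacs-style regexes):
find . -regextype posix-extended -regex '.{0,3}$|.*[^.][^j][^p][^g]$' -delete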
The question is old, but since I could not find a better solution, I'm posting mine here. I wanted to find all USB drives without listing their partitions, i.e. removing the "part[0-9]" entries from the results. I ended up using two greps; the last one inverts the match:
ls -1 /dev/disk/by-path/* | grep -P "\-usb\-" | grep -vE "part[0-9]*$"
On my system this gives:
pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0
If I only want the partitions I could do:
ls -1 /dev/disk/by-path/* | grep -P "\-usb\-" | grep -E "part[0-9]*$"
Where I get:
pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0-part1
pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0-part2
And when I do:
readlink -f /dev/disk/by-path/pci-0000:00:0b.0-usb-0:1:1.0-scsi-0:0:0:0
I get:
/dev/sdb
Anything ending with a is matched by .*a$, so match that regex and negate the condition.
Alternatively, you can use .*[^a]$, where [^a] means any character that is not a.
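A minimal Java sketch of the "negate the condition" approach: match the unwanted ending and invert the boolean in code. Because Matcher.matches() already requires the whole string to match, the $ is dropped; the class and method names are illustrative. Unlike .*[^a]$, this also accepts the empty string.

```java
import java.util.regex.Pattern;

final class NegateMatch {
    // Describe what you do NOT want: strings ending in "a" ...
    private static final Pattern ENDS_WITH_A = Pattern.compile(".*a");

    // ... and invert the result in code instead of inside the regex.
    static boolean doesNotEndWithA(String s) {
        return !ENDS_WITH_A.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "b", "ab", "1", "a", "ba"}) {
            System.out.println("\"" + s + "\" -> " + doesNotEndWithA(s));
        }
    }
}
```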
If you are using grep or sed the syntax will be a little different. Notice that the sequential [^a][^b] method does not work here:
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n'
jd8a
8$fb
q(c
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a]$"
8$fb
q(c
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^b]$"
jd8a
q(c
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^c]$"
jd8a
8$fb
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a][^b]$"
jd8a
q(c
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a][^c]$"
jd8a
8$fb
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a^b]$"
q(c
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^a^c]$"
8$fb
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^b^c]$"
jd8a
balter#spectre3:~$ printf 'jd8a\n8$fb\nq(c\n' | grep ".*[^b^c^a]$"
FWIW, I'm finding the same results in Regex101, which I think is JavaScript syntax.
Bad: https://regex101.com/r/MJGAmX/2
Good: https://regex101.com/r/LzrIBu/2
Related
I have an input string that uses '~|~' as the delimiter.
For example:
String s = "1~|~Vijay~|~25~|~Pune";
When I split it with '~\\|~' in Java, it works fine.
String sa[] = s.split("~\\|~", -1);
for(String str : sa) {
System.out.println(str);
}
I get the following output:
1
Vijay
25
Pune
When I run the same program and pass the pattern as a command-line argument ('~\\|~'), it does not parse the string properly and gives the output below:
1
|
Vijay
|
25
|
Pune
Is anyone else facing the same issue?
You only need a single backslash when running it from the command line. You need two when writing the regular expression in Java source because, in a string literal, a backslash escapes the next character or starts an escape sequence, so one backslash is needed to escape the other for it to be taken literally.
~\|~
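A Java sketch of the point (hypothetical class name; the sample input is the one from the question). The source literal "~\\|~" and the shell argument '~\|~' both produce the 4-character string ~\|~, so both split the same way.

```java
import java.util.Arrays;

final class TildePipeSplit {
    static final String INPUT = "1~|~Vijay~|~25~|~Pune";

    static String[] splitOn(String regex) {
        // -1 keeps trailing empty strings, as in the question
        return INPUT.split(regex, -1);
    }

    public static void main(String[] args) {
        // The source literal "~\\|~" denotes the 4-character string ~\|~,
        // which is exactly what the shell passes for the argument '~\|~'.
        String regex = args.length > 0 ? args[0] : "~\\|~";
        System.out.println("[" + regex + "] length=" + regex.length());
        System.out.println(Arrays.toString(splitOn(regex)));
    }
}
```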
Please do a System.out.println("[" + args[i] + "]"); to see what Java is receiving from the command line, as the \ character is special to the shell, and so are the | and ~ characters (the last one expands to your home directory, which could be a problem).
You need to pass:
java foo_bar '~\|~'
(Java still needs a single \ this time, to escape the vertical bar. You are not writing a string literal for the Java compiler but passing the actual string that such a literal represents, so the \ does not need to be doubled; and because it is inside single quotes, the shell passes it to the Java program unchanged.) Any quoting (single or double quotes) suffices to avoid ~ expansion.
If you are passing
java foo_bar '~\\|~'
the shell will not treat the \ as an escape character and will pass the equivalent of this String literal:
String sa[] = s.split("~\\\\|~", -1); /* two escapes mean a literal backslash */
(see that now the vertical bar doesn't have its special significance)
...which is quite different: this time the pattern means "split on a ~\ sequence (a ~ followed by a backslash) or on a single ~ character", and as there are no ~s followed by a backslash, the second alternative is used. You should get:
1
|
Vijay
|
25
|
Pune
Which is the output you post.
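The same effect can be reproduced without the shell by writing the 5-character string the program receives as a Java literal. A sketch with an illustrative class name:

```java
import java.util.Arrays;

final class DoubledBackslashSplit {
    static final String INPUT = "1~|~Vijay~|~25~|~Pune";

    static String[] splitOn(String regex) {
        return INPUT.split(regex, -1);
    }

    public static void main(String[] args) {
        // What the program receives for the argument '~\\|~': the 5 characters ~\\|~
        String received = "~\\\\|~";
        // As a regex this reads "~\\" (a ~ then a literal backslash) OR "~",
        // so the input is split at every single ~ and each | survives as a piece.
        System.out.println(received.length() + " " + Arrays.toString(splitOn(received)));
    }
}
```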
You don't have to escape:
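import java.util.Arrays;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        // LITERAL makes every character of args[0], including |, match as itself
        Pattern p = Pattern.compile(args[0], Pattern.LITERAL);
        final String[] result = p.split("1~|~Vijay~|~25~|~Pune");
        Arrays.stream(result).forEach(System.out::println);
    }
}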
Running:
javac Main.java
java Main "~|~"
Output:
1
Vijay
25
Pune
Here args[0] is equal to ~|~ (no escaping). The trick is the pattern flag Pattern.LITERAL, which treats every character, including |, as an ordinary character, ignoring its meta meaning.
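If you'd rather keep String.split, java.util.regex.Pattern.quote turns the delimiter into an equivalent literal pattern (it wraps it in \Q...\E). A sketch comparing both routes, with a made-up class name, assuming the delimiter comes in as args[0] and falling back to ~|~ otherwise:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class QuotedDelimiter {
    static final String INPUT = "1~|~Vijay~|~25~|~Pune";

    // Pattern.LITERAL: the whole pattern string is matched as plain text.
    static String[] splitLiteral(String delimiter) {
        return Pattern.compile(delimiter, Pattern.LITERAL).split(INPUT, -1);
    }

    // Pattern.quote produces a regex that matches the text literally,
    // so it can be handed straight to String.split.
    static String[] splitQuoted(String delimiter) {
        return INPUT.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String delimiter = args.length > 0 ? args[0] : "~|~";
        System.out.println(Arrays.toString(splitLiteral(delimiter)));
        System.out.println(Arrays.toString(splitQuoted(delimiter)));
    }
}
```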
Given a text:
Why should the number 12.8 be rounded to 13. It must be rather 11
What regex would extract only the integer values:
13
11
I tried this: \d+(?!\\.)
But still no luck.
You need to use lookarounds (lookbehind, lookahead) to check what happens before and after the digits you match:
a naive approach:
(?<![0-9]|[0-9]\.)[0-9]+(?!\.?[0-9])
an efficient approach:
[0-9](?<![0-9][0-9]|[0-9]\.[0-9])[0-9]*+(?!\.[0-9])
(Because this one quickly discards positions where there is not a digit)
Note: don't forget to escape the backslashes in the java string.
You can also write it like this:
\b[0-9](?<![0-9]\.[0-9])[0-9]*+(?!\.[0-9])
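A Java sketch using the last pattern above with Matcher.find() in a loop (backslashes doubled for the string literal; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IntegerExtractor {
    // \b[0-9]       : a digit that starts a word
    // (?<!...)      : that digit is not directly preceded by "digit." (a fractional part)
    // [0-9]*+       : the remaining digits, possessively (no backtracking into them)
    // (?!\.[0-9])   : not followed by a decimal point and a digit
    private static final Pattern INTEGER =
            Pattern.compile("\\b[0-9](?<![0-9]\\.[0-9])[0-9]*+(?!\\.[0-9])");

    static List<String> integers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = INTEGER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(integers("Why should the number 12.8 be rounded to 13. It must be rather 11"));
    }
}
```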
I solved it by applying two regexes. The command line below shows how they work:
echo "Why number 12.8 be rounded to 13. It must be rather 11" | grep -Po '\b\d+(\.\d+)?\b' | grep -Po '^\d+$'
The first regex selects all numbers, including floating-point ones. The second regex selects only integers.
In Java, use "\\b\\d+(\\.\\d+)?\\b" to select all numbers, and "^\\d+$" to select only integers.
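The same two-pass idea in Java, as a sketch. One assumption: the first pass here uses \b\d+(\.\d+)?\b, which also finds single-digit numbers such as 5, whereas \b\d+\.?\d\b needs at least two digits. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoPassIntegers {
    // Pass 1: every number, with an optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");
    // Pass 2: keep only tokens made entirely of digits (like grep -Po '^\d+$').
    private static final Pattern INTEGER_ONLY = Pattern.compile("\\d+");

    static List<String> integers(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            String token = m.group();
            if (INTEGER_ONLY.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(integers("Why number 12.8 be rounded to 13. It must be rather 11"));
    }
}
```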
I have written this simple shell script to match a file name against a regex:
file_name="xyz_abc_diagnostics.wifi2.2015-07-30.12-30-52.tar.gz"
chk_regex=".*\.\d+\-\d+\-\d+\.\d+\-\d+\-\d+.*"
if [[ "$file_name" =~ $chk_regex ]];then
echo "in obs regex"
else
echo "dont triggered"
fi
I have checked this regex in Java, and there it works fine.
My syntax seems right, because when I use
.*
it works fine.
For testing shell script regexes I used this site:
http://regexraptor.net/ and the pattern doesn't match there either, but on https://regex101.com/, which uses Java regex, it matches.
I don't understand why it fails in the shell script.
Is there any difference in shell script regexes? If so, please tell me what changes I have to make.
It is wrong to assume that all flavours of regex are the same. In this case, \d is not supported by bash regular expressions. You should change your regex to this:
chk_regex='\.[0-9]+-[0-9]+-[0-9]+\.[0-9]+-[0-9]+-[0-9]+'
Of course, this assumes that when you say \d you don't require anything more than the digits from 0 to 9, as opposed to anything considered to be a digit in your locale. If you want to also match characters outside this range, then [[:digit:]] is probably what you want, instead of [0-9].
If you don't require parameter expansion, it's generally a good habit to use ' rather than ".
I have also removed the leading and trailing .* (as they don't do anything useful) and un-escaped the - (thanks for the comment gniourf_gniourf).
Working example:
$ file_name="xyz_abc_diagnostics.wifi2.2015-07-30.12-30-52.tar.gz"
$ chk_regex='\.[0-9]+-[0-9]+-[0-9]+\.[0-9]+-[0-9]+-[0-9]+'
$ if [[ "$file_name" =~ $chk_regex ]];then
> echo "in obs regex"
> else
> echo "dont triggered"
> fi
in obs regex
As you can see, the pattern matches, so the if branch is taken.
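Java's Matcher.find(), like bash's =~, succeeds when the pattern occurs anywhere in the string, which is why the leading and trailing .* add nothing. A sketch of the same check in Java (class name is illustrative; in Java, \d would work too, but [0-9] keeps the pattern identical to the bash version):

```java
import java.util.regex.Pattern;

final class ArchiveNameCheck {
    // Same pattern as the bash fix: [0-9] instead of \d, no leading/trailing .*
    private static final Pattern STAMP =
            Pattern.compile("\\.[0-9]+-[0-9]+-[0-9]+\\.[0-9]+-[0-9]+-[0-9]+");

    // find() looks for the pattern anywhere in the input, like =~ in [[ ]]
    static boolean hasTimestamp(String fileName) {
        return STAMP.matcher(fileName).find();
    }

    public static void main(String[] args) {
        String fileName = "xyz_abc_diagnostics.wifi2.2015-07-30.12-30-52.tar.gz";
        System.out.println(hasTimestamp(fileName) ? "in obs regex" : "dont triggered");
    }
}
```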
As mentioned in the comments, you can use globs to match this pattern as well:
[[ $file_name = *.+([[:digit:]])-+([[:digit:]])-+([[:digit:]]).+([[:digit:]])-+([[:digit:]])-+([[:digit:]])* ]]
Granted, it's longer to write but globs may be useful if you wanted to loop through files matching this pattern, for example:
for archive in *.+([[:digit:]])-+([[:digit:]])-+([[:digit:]]).+([[:digit:]])-+([[:digit:]])-+([[:digit:]])*
do
# some stuff
done
Note that in the example containing a loop (and in both examples on older versions of bash) you will need to enable extended globs using shopt -s extglob.
Here is a fix: use the [0-9] class instead of \d, and use the {2} limiting quantifier to make it shorter (really, the leading and trailing .* are useless, since you are not using the matched string, only checking that it is present):
#!/bin/bash
file_name="xyz_abc_diagnostics.wifi2.2015-07-30.12-30-52.tar.gz"
chk_regex="(\.[0-9]+(-[0-9]+){2}){2}"
if [[ "$file_name" =~ $chk_regex ]];then
echo "in obs regex"
else
echo "dont triggered"
fi
See IDEONE demo
Result: in obs regex
I have the following string in Java:
"sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu (operating system)|Ubuntu]] sdfspp"
I want to use String#replaceAll to get the following:
"sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu]] sdfspp"
I tried
s = s.replaceAll("(\\[\\[)(.+)(\\|)(.+)(\\]\\])}", "$4");
without success
any help?
thanks.
This works for me (for the given string):
s = s.replaceAll("(\\[\\[)([^\\[\\]]+)(\\|)([^\\[\\]]+)(\\]\\])", "[[$4]]");
Demo on ideone.
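A self-contained check of that replacement in Java (the wrapper class is illustrative); the second sample string is made up to show that each [[...]] pair is handled separately:

```java
final class WikiLinkLabel {
    // [^\[\]]+ cannot cross a bracket, so each match stays inside one [[...]] pair;
    // the replacement keeps only group 4, the text after the |.
    static String keepLabel(String s) {
        return s.replaceAll("(\\[\\[)([^\\[\\]]+)(\\|)([^\\[\\]]+)(\\]\\])", "[[$4]]");
    }

    public static void main(String[] args) {
        System.out.println(keepLabel("sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu (operating system)|Ubuntu]] sdfspp"));
        System.out.println(keepLabel("[[a|b]] and [[c|d]]"));
    }
}
```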
Regex questions should always specify what rules you want your search, or your transformation to follow. Questions like "I have this specific string, and I want to get that specific string as a result" are never good enough, because we are left guessing what's supposed to happen if you give it a different string as input. There are always several possible ways we can interpret the question, and we have to guess which one. We are not mind readers.
Assuming that your rule is "if you see | followed by some text inside [[ and ]], then remove the | and the preceding text": then this should work:
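s = s.replaceAll("\\[\\[[^\\]|]*\\|(.*?\\]\\])","[[$1");
What this does is:
First part: picks up a [[.
Second part: picks up some text that contains no ] or |, followed by |. (With a plain .* here, the match could start at the first [[ in the string and run through [[Ubuntu Touch]] to the later |.)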
Third part: picks up the text following the |, followed by ]]. This part is in parentheses; therefore it becomes group 1. The ? in .*? is a "reluctant" quantifier, which means it matches as few characters as possible to get to the next ]]. This is necessary because you don't want the match to zoom through all your ]] if you have more than one [[..]] in the input.
The replacement text is [[ followed by this third part (group 1). Thus, the second part, i.e. the text followed by |, is removed.
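To see why the second part must not be allowed to run past ]], here is a sketch comparing a version with a plain .* in the second part against one restricted to characters other than ] and | (class name illustrative). On the sample string, the .* version starts matching at the first [[ and swallows [[Ubuntu Touch]].

```java
final class GreedyVsRestricted {
    static final String S = "sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu (operating system)|Ubuntu]] sdfspp";

    // .* may run past "]]", so the match can begin at the FIRST [[ in the string.
    static String greedy(String s) {
        return s.replaceAll("\\[\\[.*\\|(.*?\\]\\])", "[[$1");
    }

    // [^\]|]* cannot cross "]" or "|", so the match stays inside one [[...]] pair.
    static String restricted(String s) {
        return s.replaceAll("\\[\\[[^\\]|]*\\|(.*?\\]\\])", "[[$1");
    }

    public static void main(String[] args) {
        System.out.println(greedy(S));
        System.out.println(restricted(S));
    }
}
```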
Your attempt:
s = s.replaceAll("(\\[\\[)(.+)(\\|)(.+)(\\]\\])}", "$4");
has a } in it that appears to be a typo. If you remove it, the statement will pick up the [[, the following text, |, the following text, and ]], and replace the entire match with the text following the | (group 4). That is, it will delete the [[, the first part of the inner text, the |, and the ]], which is roughly the opposite of what you want: you're mostly deleting the things you want to keep, and keeping the things you want to delete.
It seems that you are looking for something like
replaceAll("\\[\\[([^|\\]]*\\|)?([^|\\]]*)]]", "[[$2]]")
This regex will search for text which
starts with [[ and ends with ]],
has in the middle an optional run of characters that are neither | nor ], with a pipe after it (like Ubuntu (operating system)|), which will be placed in group 1 (not important, not used later),
and then the rest of the characters that are neither | nor ], placed before the closing ]] (like Ubuntu in Ubuntu]]); this part will be placed in group 2, and we will reuse it in the replacement.
So all you need to do is replace the match with [[$2]], i.e. the part from group 2 between [[ and ]].
Demo:
String s = "sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu (operating system)|Ubuntu]] sdfspp";
System.out.println(s.replaceAll("\\[\\[([^|\\]]*\\|)?([^|\\]]*)]]", "[[$2]]"));
Output: sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu]] sdfspp
It seems you want to keep z, not y. So search for \[[^[]+\| and replace it with [ (escaping the backslashes appropriately).
I.e., delete the sequence of chars that are not [, between [ and |.
Try this regex:
(.+:\s\[\[)(.+)\|(.+)
It works like that:
String tem = "sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu (operating system)|Ubuntu]] sdfspp";
tem=tem.replaceAll("(.+:\\s\\[\\[)(.+)\\|(.+)","$1$3");
System.out.println(tem);
Output:
sdfsdfsdf [[Ubuntu Touch]]: [[Ubuntu]] sdfspp
Explanation:
(.+:\s\[\[)
This part finds a run of characters (.+) followed by :, a whitespace character (\s) and two brackets (\[\[), and captures it as the first group, $1.
(.+)
This part finds all characters inside the brackets [[ but before the pipe | and groups it as $2.
\|
This part finds the pipe |.
(.+)
This part finds all characters after the pipe | and groups it as $3.
As the replacement you want $1 followed by $3.
My task was to write a regular expression in Java and in shell script that validates strings containing only a single digit, preceded or followed by any number of characters.
I wrote [a-z]*[0-9]{1}[a-z]* and it worked fine in Java, but it does not work at all in shell script.
Can anyone please help me create a regular expression meeting my requirement in shell script?
Edit (from comments):
I tried with grep but it did not give me useful results. [...] If the string validates, I want to perform certain operations. So I used it like this:
if [$password eq [a-z]*[0-9]{1}[a-z]]; then
# "do this"
else
# "do that"
fi
Could you put some light on syntax differences in Java and shell script related to regular expressions?
You did not specify which tool shall evaluate the regexp in the shell environment. But most tools will not recognize the {1} part. This part is not necessary anyway, because [0-9] alone already stands for exactly one occurrence.
The shell script part would look like this:
if [[ "$var" =~ ^[a-zA-Z]*[0-9][a-zA-Z]*$ ]]; then
echo matching
else
echo not matching
fi
The key parts are:
use the [[ expression, not the [ one, because the former supports pattern matching (via ==) and regexp matching (via =~)
The =~ operator, like most regexp libraries, returns true if the pattern matches any part of the string, not necessarily the complete string. For example, [a-z]*[0-9][a-z]* would match foo123bar456bla because the pattern matches the 3bar part with "zero occurrences of the first [a-z], one occurrence of [0-9] and three occurrences of the second [a-z]". Therefore it is necessary to pin the regexp to the start and the end of the string using ^ and $.
In Java:
String var = ...;
if( var.matches("[a-zA-Z]*[0-9][a-zA-Z]*") )
System.out.println("matches");
Here the String.matches implicitly matches the complete string. That's all a bit fuddled by history.
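A short Java sketch of the difference described above: Matcher.find() behaves like bash's unanchored =~, while Matcher.matches() (and String.matches) must cover the whole string. Class and method names are illustrative.

```java
import java.util.regex.Pattern;

final class OneDigitCheck {
    private static final Pattern ONE_DIGIT = Pattern.compile("[a-zA-Z]*[0-9][a-zA-Z]*");

    // find(): succeeds if the pattern occurs anywhere, like =~ without ^...$
    static boolean foundAnywhere(String s) {
        return ONE_DIGIT.matcher(s).find();
    }

    // matches(): the whole string must match, like String.matches or ^...$ with =~
    static boolean matchesWhole(String s) {
        return ONE_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo123bar456bla", "abc1def", "ab12"}) {
            System.out.println(s + " find=" + foundAnywhere(s) + " matches=" + matchesWhole(s));
        }
    }
}
```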