Regex and Java to ignore keywords and strings inside quotation marks - java

I'm searching for words that start with a letter, followed by letters or digits, or by nothing at all.
Things I am looking for: x, x2, xx, and so on.
The regular expression I have is [A-Za-z][A-Za-z0-9]+|[a-zA-Z]
I need to ignore words such as INT, WRITE, READ and so on, but I'm not sure how to implement that.
Also, if it comes across a string in quotation marks, I need it to ignore whatever is inside the quotation marks.
Any help?
Thanks in advance.

Your question is not clear to me. If you want to accept words that start with a letter and continue with letters, digits, or underscores, but exclude words from a list, you can use the regex:
(?!\b(?:INT|WRITE|READ)\b)\b[A-Za-z]\w*\b
If, instead of a list, you want to exclude words that consist entirely of capital letters, then try:
(?!(?:\b[A-Z]+\b))\b[A-Za-z]\w*\b
In Java, I believe you need to double the backslashes for the metacharacters, so it might be something like:
"(?!\\b(?:INT|WRITE|READ)\\b)\\b[A-Za-z]\\w*\\b"
If you also want to exclude strings within quotes, you could use something like:
"[^"]+"|((?!\b(?:INT|WRITE|READ)\b)\b[A-Za-z]\w*\b)
and then check whether capturing group 1 contains anything; group 1 will NOT include the phrases delimited by the double quote marks.
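A minimal sketch of the group-1 check, assuming quoted strings contain no escaped quotes. The class name IdentifierScanner and the method identifiers are hypothetical names chosen for illustration. It uses [^"]* rather than [^"]+ so that an empty string literal "" is also consumed as a quoted string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdentifierScanner {
    // Alternative 1 consumes a whole quoted string (no capture).
    // Alternative 2 captures an identifier that is not one of the keywords.
    private static final Pattern TOKEN = Pattern.compile(
        "\"[^\"]*\"|((?!\\b(?:INT|WRITE|READ)\\b)\\b[A-Za-z]\\w*\\b)");

    static List<String> identifiers(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            // group(1) is null when the quoted-string alternative matched
            if (m.group(1) != null) {
                found.add(m.group(1));
            }
        }
        return found;
    }
}
```

Calling IdentifierScanner.identifiers on a line such as INT x2 WRITE "hello INT y" xx READ z should yield only the identifiers outside the quotes that are not keywords.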
Another option would be to replace everything you don't want with nothing: the word list as well as the quoted text. In Java, something like:
String resultString = subjectString.replaceAll("\"[^\"]*\"|\\b(?:WRITE|INT|READ)\\b", "");

Regex for a Person Name

I want to validate a TextField that holds a person's name. The name should be valid if:
1. it contains only English letters a-z, A-Z
2. it may contain multiple space characters
3. it may contain any number of dot characters, for names like A.B. Devilliers
What I have tried:
NAME_REGEX="(\\w|\\s|(\\.))+"
Note: I am working in Java.
When I have NAME_REGEX="(\\w|\\s)+" then rules 1 and 2 are followed, but I also want rule 3.
It is very hard to specify what should be allowed in a name, so trying to tweak too much is probably counter-productive. If anything, I would err on the side of allowing more things.
It looks to me like you want something like this:
(?i)^(?:[a-z]+(?: |\. ?)?)+[a-z]$
I am assuming you want to start and end with a letter
You have not specified that any letter must specifically be in upper case, so this will accept aLan parsoN
If you want to allow quotes or other chars, let me know.
How it works:
(?i) puts us in case-insensitive mode
^ asserts that we are at the beginning of the string
(?:[a-z]+(?: |\. ?)?)+ matches {any letters, optionally followed by a space or dot or dot-space}, once or more
[a-z]$ ensures that the last character is a letter
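A small sketch of how this pattern could be wired into a Java check, assuming plain java.util.regex. NameValidator and isValidName are hypothetical names. Nested quantifiers like these can backtrack heavily on long non-matching input, so capping the input length before matching may be prudent.

```java
import java.util.regex.Pattern;

final class NameValidator {
    // Same pattern as above; backslashes are doubled for the Java string literal.
    private static final Pattern NAME =
        Pattern.compile("(?i)^(?:[a-z]+(?: |\\. ?)?)+[a-z]$");

    static boolean isValidName(String name) {
        // matches() requires the whole input to match the pattern
        return name != null && NAME.matcher(name).matches();
    }
}
```

With this sketch, "A.B. Devilliers" and "aLan parsoN" are accepted, while a name ending in a dot or containing two consecutive spaces is rejected.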
Try this regex for Java:
NameRegex = ^[a-z A-Z\\.\\s]+$

Constructing regular expression

I want to construct a regular expression in order to validate a string with Oval.
I got lost in all the signs and expressions.
I want my string not to start with certain words and not to contain certain special characters.
For example, I want to exclude the words ignoreMe1, ignoreMe2, ignoreMe3 at the beginning of the string, and exclude the characters ?;*/ anywhere.
I tried this as a start: ^(?!ignoreMe1|ignoreMe2|ignoreMe3) but it doesn't work.
How should I proceed?
This will match anything that doesn't start with ignoreMe1/2/3 and does not contain any of those symbols
^(?!ignoreMe1|ignoreMe2|ignoreMe3)[^?;*/]+$
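A quick way to try this pattern in plain Java, as a sketch only; it does not show how the pattern is attached to a validation framework. PrefixAndSymbolFilter and isAllowed are hypothetical names.

```java
import java.util.regex.Pattern;

final class PrefixAndSymbolFilter {
    // The negative lookahead rejects the forbidden prefixes;
    // the negated class rejects ? ; * / anywhere in the string.
    private static final Pattern ALLOWED =
        Pattern.compile("^(?!ignoreMe1|ignoreMe2|ignoreMe3)[^?;*/]+$");

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }
}
```

Note that the lookahead only guards the start of the string, so "hello ignoreMe2" is still accepted, and the + quantifier means an empty string is rejected.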

java regular expression exclusion list pattern

I understand that when I do [^abc] this will match anything other than a, b, and c. What if I want it to match anything other than a ".."? So far the exclusion list I have is:
[^<>:\"/\|?*]+
I want to add ".." to this exclusion list as well. So in English it would be "if it's anything other than the left brackets, right brackets, double quote, asterisk, double dot (".."), and the rest of the characters here, then it should match".
The test case I need to pass is:
foo/../baz needs to be /baz
bar/../../foo needs to be /../foo
Not a java expert, but it looks like you have a negated character class defined there. A character class is basically a list of characters in that class, or in your case, not in that class, and you can apply this to a string.
It seems that you're most likely after a match for the string "..". If so, I think you just need a specific regex for it. Maybe this would do the trick:
\.\.
A dot "." by itself of course matches any single character, so the backslash escapes are needed to match an actual string of two dots instead of any two characters.
Ok I have been playing around for a bit and this will prevent a match if the .. string is present:
^(?:(?!(\.\.)).)*$
I'm going to carry on, but you might consider simply running two separate regexes and making sure neither matches.
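The two-regex idea could look roughly like this. It is a sketch that assumes the goal is to reject a string containing any of the listed characters or a double dot; TwoPassCheck and isClean are hypothetical names.

```java
import java.util.regex.Pattern;

final class TwoPassCheck {
    // First check: any single forbidden character from the exclusion list.
    private static final Pattern FORBIDDEN_CHAR = Pattern.compile("[<>:\"/|?*]");
    // Second check: a literal double dot.
    private static final Pattern DOUBLE_DOT = Pattern.compile("\\.\\.");

    static boolean isClean(String s) {
        // find() searches anywhere in the input; the string is clean
        // only when neither pattern is found.
        return !FORBIDDEN_CHAR.matcher(s).find()
            && !DOUBLE_DOT.matcher(s).find();
    }
}
```

Keeping the two checks separate makes each pattern easy to read, at the cost of scanning the input twice.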
If you are not particularly interested in the delimiter itself, you could use this:
String source = "aa^bb<cc>dd:ee\"ff/gg\\hh|ii?jj*kk:ll/mm..mn";
String regex = "[<>:\"/\\\\|?*^]+|[.]{2}";
String[] splits = source.split(regex);
System.out.println(Arrays.toString(splits));
Output
[aa, bb, cc, dd, ee, ff, gg, hh, ii, jj, kk, ll, mm, mn]
If Java can do lookahead assertions, one way is this (untested):
(?:(?!\.\.)[^<>:"/|?*])+
Edit: the above will match up until the first ..
It's not clear what you are trying to do, but to validate the entire string against these conditions, simply add ^ and $ (untested):
^(?:(?!\.\.)[^<>:"/|?*])+$
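Java does support lookahead, so the anchored pattern can be tried directly. This is a sketch assuming the aim is whole-string validation; SingleRegexCheck and isClean are hypothetical names.

```java
import java.util.regex.Pattern;

final class SingleRegexCheck {
    // At every position: refuse to continue if ".." starts here,
    // otherwise consume one character that is not in the exclusion list.
    private static final Pattern CLEAN =
        Pattern.compile("^(?:(?!\\.\\.)[^<>:\"/|?*])+$");

    static boolean isClean(String s) {
        return CLEAN.matcher(s).matches();
    }
}
```

A single dot is still allowed ("foo.bar" passes), while "foo..bar" and "foo|bar" are rejected.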
@xonegirlz - The way to exclude .. depends on what you're trying to do. Your "test case I need to pass" is some help, but the statement "needs to be" is vague.
It would be helpful if you could state it like
"I'm writing a function X that, given a String like Y,
would return a String like Z.
I'm trying to use a regex-replace to find ______ in Y
and replace all except _______ to return Z".
Your current question is asking only about negated character classes, and the answer to that is: you can't exclude ".." that way.

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if (!p.matcher(gottenData).matches())
    System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]'s are either white space or nothing at all; maybe there's some sort of encoding issue, or perhaps something to do with #text nodes (the input is XML).
The * quantifier matches "zero or more", which means your pattern also accepts an empty string, one that contains none of the characters in your class. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), it will not require a full string match and you can use the regex [a-zA-Z0-9]+. (Matcher.lookingAt() does not require a full match either, but it only succeeds when the match starts at the beginning of the input.)
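To make the difference between the three Matcher methods concrete, here is a small sketch. MatchModes and its method names are hypothetical; the Matcher calls themselves are standard java.util.regex.

```java
import java.util.regex.Pattern;

final class MatchModes {
    private static final Pattern ALNUM_RUN = Pattern.compile("[a-zA-Z0-9]+");

    // matches(): the entire input must be alphanumeric.
    static boolean isAllAlnum(String s) {
        return ALNUM_RUN.matcher(s).matches();
    }

    // lookingAt(): the input must start with an alphanumeric run.
    static boolean startsWithAlnum(String s) {
        return ALNUM_RUN.matcher(s).lookingAt();
    }

    // find(): an alphanumeric run may appear anywhere in the input.
    static boolean containsAlnum(String s) {
        return ALNUM_RUN.matcher(s).find();
    }
}
```

For an input with leading whitespace such as " a1 ", only containsAlnum returns true, which is the behaviour wanted when stray whitespace surrounds real text.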
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-Z0-9]*");
if (!p.matcher(gottenData).matches())
    System.out.println("doesn't match.");
This prints "doesn't match."
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
A second potential problem, as Martin mentioned, is that you may need to put in the start and end anchors if you want the string to consist only of letters and numbers.
Finally, you use the * operator, which means 0 or more, so the pattern can match 0 characters: an empty string passes, and an unanchored search would succeed on any input. What you need is the + quantifier. So I will submit that the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
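The effect of the two fixes (A-Z instead of A-z, and + instead of *) can be seen side by side in a short sketch. QuantifierDemo, strict and lenient are hypothetical names; Pattern.matches always tests the whole input, so the anchors are omitted here.

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Corrected pattern: letters and digits only, at least one character.
    static boolean strict(String s) {
        return Pattern.matches("[a-zA-Z0-9]+", s);
    }

    // Original pattern: A-z also covers [ \ ] ^ _ ` and * allows empty input.
    static boolean lenient(String s) {
        return Pattern.matches("[a-zA-z0-9]*", s);
    }
}
```

The lenient version accepts both an empty string and "a_b", while the strict version rejects them and still accepts "riz99".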
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex: [a-zA-Z0-9 ]*? This should match any normal text with letters, numbers and spaces. If you want quotes and other special characters, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check whether an input value contains only letters and digits by using the regex ^[a-zA-Z0-9]*$
If the value contains only letters and digits, it matches, e.g. riz99, riz99z;
otherwise it does not match, e.g. 99z., riz99.z, riz99.9
Example code (JavaScript):
if (e.target.value.match('^[a-zA-Z0-9]*$')) {
    console.log('match')
} else {
    console.log('not match')
}

Regular expression removing all words shorter than n

Well, I'm looking for a regexp in Java that deletes all words shorter than 3 characters.
I thought something like \s\w{1,2}\s would grab all the 1 and 2 letter words (a whitespace, one to two word characters and another whitespace), but it just doesn't work.
Where am I wrong?
I've got it working fairly well, but it took two passes.
public static void main(String[] args) {
    String passage = "Well, I'm looking for a regexp in Java that deletes all words shorter than 3 characters.";
    System.out.println(passage);
    passage = passage.replaceAll("\\b[\\w']{1,2}\\b", "");
    passage = passage.replaceAll("\\s{2,}", " ");
    System.out.println(passage);
}
The first pass removes all words containing fewer than three characters by replacing them with an empty string. Note that I had to include the apostrophe in the character class because the word "I'm" was giving me trouble without it. You may find other special characters in your text that you also need to include here.
The second pass is necessary because the first pass left a few spots where there were double spaces. This just collapses all occurrences of 2 or more spaces down to one. It's up to you whether you need to keep this or not, but I think it's better with the spaces collapsed.
Output:
Well, I'm looking for a regexp in Java that deletes all words shorter than 3 characters.
Well, looking for regexp Java that deletes all words shorter than characters.
If you don't want the whitespace matched, you might want to use
\b\w{1,2}\b
to get the word boundaries.
That's working for me in RegexBuddy using the Java flavor; for the test string
"The dog is fun a cat"
it highlights "is" and "a". Similarly for words at the beginning/end of a line.
You might want to post a code sample.
(And, as GameFreak just posted, you'll still end up with double spaces.)
EDIT:
\b\w{1,2}\b\s?
is another option. This will partially fix the space-stripping issue, although words at the end of a string or followed by punctuation can still cause issues. For example, "A dog is fun no?" becomes "dog fun ?" In any case, you're still going to have issues with capitalization (dog should now be Dog).
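A parameterised version of this removal approach might look like the sketch below. ShortWordRemover, removeShortWords and minLength are hypothetical names, and apostrophes are not treated specially, so a word like "I'm" is not handled cleanly.

```java
final class ShortWordRemover {
    // Removes every word shorter than minLength, together with one
    // following whitespace character if present.
    static String removeShortWords(String text, int minLength) {
        if (minLength < 2) {
            return text; // no word can be shorter than 1 character
        }
        String shortWord = "\\b\\w{1," + (minLength - 1) + "}\\b\\s?";
        // trim() drops a space left behind when the last word is removed
        return text.replaceAll(shortWord, "").trim();
    }
}
```

With minLength set to 3, "The dog is fun a cat" becomes "The dog fun cat", and the punctuation case from above still yields "dog fun ?".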
Try: \b\w{1,2}\b although you will still have to get rid of the double spaces that will show up.
If you have a string like this:
hello there my this is a short word
This regex will match all words in the string greater than or equal to 3 characters in length:
\w{3,}
Resulting in:
hello there this short word
That, to me, is the easiest approach. Why try to match what you don't want, when you can match what you want a lot easier? No double spaces, no leftovers, and the punctuation is under your control. The other approaches break on multiple spaces and aren't very robust.
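In Java, the "match what you want" approach could be sketched as follows. LongWordKeeper and keepLongWords are hypothetical names, and rejoining with a single space is an assumption: all punctuation and original spacing are discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongWordKeeper {
    // Collects every run of at least minLength word characters
    // and joins the results with single spaces.
    static String keepLongWords(String text, int minLength) {
        int n = Math.max(1, minLength); // avoid matching empty strings
        Matcher m = Pattern.compile("\\w{" + n + ",}").matcher(text);
        List<String> kept = new ArrayList<>();
        while (m.find()) {
            kept.add(m.group());
        }
        return String.join(" ", kept);
    }
}
```

Applied to "hello there my this is a short word" with a minimum length of 3, this returns "hello there this short word".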
