Trouble with regular expressions in Java

I want to validate given logical formulas with a regular expression.
The logical connectives are & (and), | (or) and ! (negation; multiple negations are allowed). The variables are ordinary character sequences followed by a cardinality: [0], [1] or [0..1].
Variable names can also look like "F.G.H." or "F:G:H:" or simply "F", etc.
The square brackets belong to the cardinalities. Constants are also allowed, e.g. TRUE and FALSE, as in the pattern below.
This pattern is not working:
Pattern.compile("([!]*[a-zA-Z][\\.])?([!]*[a-zA-Z][\\.]?)*((\\[0\\])?|(\\[1\\])?|(\\[0\\.\\.1\\])?)|(TRUE)|(FALSE)|(&)|(|)|(!)");
My current problem is that a variable like !!F[0] is not accepted, but I want it to be accepted.
Here is an example of the formulas I want to allow:
!!F[0] & !F1.G[0..1] | (F1[1] | F2[0]) & F:G[0..1]
Whitespace between elements shall also be allowed, except between a variable and its cardinality.

This one is quite awful but should suit your needs:
[!(]*([A-Z]+[0-9]*([.:][A-Z]+[0-9]*)*\[([01]|0[.]{2}1)\]|TRUE|FALSE)[)]*( *[&|] *[!(]*([A-Z]+[0-9]*([.:][A-Z]+[0-9]*)*\[([01]|0[.]{2}1)\]|TRUE|FALSE)[)]*)*
Please note that it simply allows parentheses without counting them, i.e. inputs such as !((!(F[0]) will match although only !((!(F[0]))) should.
If you want something cleaner, you could build your regex step by step:
String atomVarPref = "[!(]*";
String atomVar = "[A-Z]+[0-9]*";
String atomSep = "[.:]";
String atomVarCard = "\\[([01]|0[.]{2}1)\\]";
String atomVarSuff = "[)]*";
String sep = " *[&|] *";
String varTemplate = "%s(%s(%s%s)*%s|TRUE|FALSE)%s";
String var = String.format(varTemplate, atomVarPref, atomVar, atomSep, atomVar, atomVarCard, atomVarSuff);
String regexTemplate = "%s(%s%s)*";
String regex = String.format(regexTemplate, var, sep, var);
Calling:
String input = "!!F[0] & !F1.G[0..1] | (F1[1] | F2[0]) & F:G[0..1]";
System.out.println(input.matches(regex)); // prints true
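Since the regex only allows parentheses without counting them, one way to reject unbalanced input is a small counter that runs in addition to the pattern. The following is a minimal sketch under that assumption; the class and method names (FormulaCheck, parenthesesBalanced, isValid) are made up for illustration and are not part of the answer above. It only counts parentheses and does not parse the formula.

```java
import java.util.regex.Pattern;

public class FormulaCheck {
    // Same building blocks as the regex above, written out once.
    private static final String VAR =
        "[!(]*([A-Z]+[0-9]*([.:][A-Z]+[0-9]*)*\\[([01]|0[.]{2}1)\\]|TRUE|FALSE)[)]*";
    private static final Pattern FORMULA =
        Pattern.compile(VAR + "( *[&|] *" + VAR + ")*");

    // True if every ')' closes an earlier '(' and nothing stays open.
    public static boolean parenthesesBalanced(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false; // closing parenthesis without an opening one
            }
        }
        return depth == 0;
    }

    // Hypothetical helper: regex check plus parenthesis count.
    public static boolean isValid(String formula) {
        return FORMULA.matcher(formula).matches() && parenthesesBalanced(formula);
    }

    public static void main(String[] args) {
        System.out.println(isValid("!!F[0] & !F1.G[0..1] | (F1[1] | F2[0]) & F:G[0..1]")); // true
        System.out.println(isValid("!((!(F[0])"));   // false: matches the regex, but unbalanced
        System.out.println(isValid("!((!(F[0])))")); // true
    }
}
```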

Related

String MUST contain a hexadecimal value - Regex for this? [duplicate]

I have never done regex before, and I have seen they are very useful for working with strings. I saw a few tutorials (for example) but I still cannot understand how to make a simple Java regex check for hexadecimal characters in a string.
The user will type something like 0123456789ABCDEF into the text box, and I would like to know whether the input is correct; for something like XTYSPG456789ABCDEF the check should return false.
Is it possible to do that with a regex or did I misunderstand how they work?
Yes, you can do that with a regular expression:
^[0-9A-F]+$
Explanation:
^ Start of line.
[0-9A-F] Character class: Any character in 0 to 9, or in A to F.
+ Quantifier: One or more of the above.
$ End of line.
To use this regular expression in Java you can for example call the matches method on a String:
boolean isHex = s.matches("[0-9A-F]+");
Note that matches succeeds only if the entire string matches, so you don't need the start and end of line anchors in this case.
You may also want to allow both upper and lowercase A-F, in which case you can use this regular expression:
^[0-9A-Fa-f]+$
Maybe you want to use the POSIX character class \p{XDigit} instead:
^\p{XDigit}+$
Additionally, if you plan to use the regular expression very often, it is recommended to store the compiled Pattern in a constant to avoid recompiling it each time, e.g.:
private static final Pattern REGEX_PATTERN =
        Pattern.compile("^\\p{XDigit}+$");

public static void main(String[] args) {
    String input = "0123456789ABCDEF";
    System.out.println(
        REGEX_PATTERN.matcher(input).matches()
    ); // prints "true"
}
Actually, the given answer is not totally correct. The problem arises because the digits 0-9 are also valid decimal digits. Part of what you have to do is to test for pairs of digits (00-99 instead of just 0-9), so that short values are not mistaken for decimal numbers. Like so:
^([0-9A-Fa-f]{2})+$
This says that the digits have to come in pairs. Otherwise the string is something else! :-)
Example:
(Pick one)
var a = "1e5";
var a = "10";
var a = "314159265";
If I used the accepted answer in a regular expression it would return TRUE.
var re1 = new RegExp( /^[0-9A-Fa-f]+$/ );
var re2 = new RegExp( /^([0-9A-Fa-f]{2})+$/ );
if( re1.test(a) ){ alert("#1 = This is a hex value!"); }
if( re2.test(a) ){ alert("#2 = This IS a hex string!"); }
else { alert("#2 = This is NOT a hex string!"); }
Note that "10" returns TRUE in both cases. If an incoming string contains only the digits 0-9, you cannot easily tell whether it is a hex value or a decimal value, unless an odd-length string is missing its leading zero (hex values always come in pairs, i.e. low byte/high byte). Values like "34" are perfectly valid decimal and hexadecimal numbers; they just mean two different things.
Also note that "3.14159265" is not a hex value no matter which test you do because of the period. But with the addition of the "{2}" you at least ensure it really is a hex string rather than something that LOOKS like a hex string.
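For readers following the Java question, the JavaScript comparison above might be sketched in Java as follows. This is only an illustration under the assumption that "hex string" means whole bytes (an even number of hex digits); the class and method names (HexPairs, isHexDigits, isHexBytes) are hypothetical.

```java
import java.util.regex.Pattern;

public class HexPairs {
    // Any non-empty run of hex digits.
    private static final Pattern ANY_LENGTH = Pattern.compile("[0-9A-Fa-f]+");
    // Hex digits that come in pairs, i.e. whole bytes.
    private static final Pattern WHOLE_BYTES = Pattern.compile("([0-9A-Fa-f]{2})+");

    public static boolean isHexDigits(String s) {
        return ANY_LENGTH.matcher(s).matches();
    }

    public static boolean isHexBytes(String s) {
        return WHOLE_BYTES.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1e5", "10", "314159265", "3.14159265"}) {
            // Prints the result of both checks for each sample.
            System.out.println(s + " -> " + isHexDigits(s) + " / " + isHexBytes(s));
        }
    }
}
```

Only "10" passes both checks; "1e5" and "314159265" pass the first check but have an odd number of digits, and "3.14159265" fails both.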

Remove Special Characters For A Pattern Java

I want to remove these characters from a String:
+ - ! ( ) { } [ ] ^ ~ : \
I also want to remove these sequences:
/*
*/
&&
||
That is, I will not remove a single & or |; I will remove them only when the second character follows the first one (/* */ && ||).
How can I do that efficiently in Java?
Example:
a:b+c1|x||c*(?)
will be:
abc1|xc*?
This can be done via a long, but actually very simple regex.
String aString = "a:b+c1|x||c*(?)";
String sanitizedString = aString.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(sanitizedString);
I think that the java.lang.String.replaceAll(String regex, String replacement) is all you need:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String).
There are two ways to do that:
1)
// requires java.util.Arrays and java.util.List
// Two-character sequences and single characters from the question;
// a lone "/", "*", "&" or "|" is deliberately not in the list.
List<String> tokens = Arrays.asList(
    "/*", "*/", "&&", "||",
    "+", "-", "!", "(", ")", "{", "}", "[", "]", "^", "~", ":", "\\");
String string = "a:b+c1|x||c*(?)";
for (String token : tokens) {
    // replace() treats the token literally, so no contains() check is needed
    string = string.replace(token, "");
}
System.out.println(string);
2)
String string = "a:b+c1|x||c*(?)";
string = string.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(string);
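Because the question asks for something efficient, it may be worth compiling the pattern once instead of calling replaceAll with a regex string every time, since String.replaceAll compiles the pattern on each call. A minimal sketch, assuming the same regex as above; the class and method names (Sanitizer, sanitize) are made up for illustration.

```java
import java.util.regex.Pattern;

public class Sanitizer {
    // Single characters in a character class, then the two-character sequences.
    private static final Pattern SPECIAL =
        Pattern.compile("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|");

    // Removes every match in one pass over the input.
    public static String sanitize(String input) {
        return SPECIAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a:b+c1|x||c*(?)")); // prints abc1|xc*?
    }
}
```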
Thomas wrote on How to remove special characters from a string?:
That depends on what you define as special characters, but try
replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since
you'd then either have to escape it or it would mean "any but these
characters".
Another note: the - character needs to be the first or last one on the
list, otherwise you'd have to escape it or it would define a range
(e.g. :-, would be read as the range from : to ,).
So, in order to keep consistency and not depend on character
positioning, you might want to escape all those characters that have a
special meaning in regular expressions (the following list is not
complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex:
[\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape
backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define
what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
Here's less restrictive alternative to the "define allowed characters"
approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and
not a separator (whitespace, linebreak etc.). Note that you can't use
[\P{L}\P{Z}] (upper case P means not having that property), since that
would mean "everything that is not a letter or not whitespace", which
almost matches everything, since letters are not whitespace and vice
versa.
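The difference between the negated class and the \P variant can be seen in a small experiment. This is just a sketch; the class and method names (LetterFilter, keepLettersAndSeparators, brokenVariant) and the sample string are made up for illustration, and the non-ASCII letters are written as Unicode escapes to avoid source-encoding issues.

```java
public class LetterFilter {
    // Removes everything that is neither a letter nor a separator.
    public static String keepLettersAndSeparators(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Removes everything that is "not a letter OR not a separator",
    // which is true for every character.
    public static String brokenVariant(String s) {
        return s.replaceAll("[\\P{L}\\P{Z}]", "");
    }

    public static void main(String[] args) {
        String sample = "H\u00e9llo, w\u00f6rld! 42";
        // Letters and spaces survive; the comma, '!' and digits are removed.
        System.out.println("[" + keepLettersAndSeparators(sample) + "]");
        // Nothing survives.
        System.out.println("[" + brokenVariant(sample) + "]");
    }
}
```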

Split String at n-th character preserving words

Expanding on this answer, using this regex (?<=\\G.{" + count + "}); I would also like to modify the expression to not split words in the middle.
Example:
String string = "Hello I would like to split this string preserving these words";
If I want to split on 10 characters it would look like this:
[Hello I wo, uld like t, o split th, is string , preserving, these wor, ds]
Question:
Is this even possible using only regex, or would a lexer or some other string manipulation be needed?
UPDATE
This is what I want to use it on:
+ -------------------------------------------JVM Information------------------------------------------ +
| sun.boot.class.path : C:\Program Files\Java\jdk1.6.0_33\jre\lib\resources.jar;C:\Program Files\Java\ |
| jdk1.6.0_33\jre\lib\rt.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\sunrsasig |
| n.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\jsse.jar;C:\Program Files\Java |
| \jdk1.6.0_33\jre\lib\jce.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\charset |
| s.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\modules\jdk.boot.jar;C:\Progra |
| m Files\Java\jdk1.6.0_33\jre\classes |
+ ---------------------------------------------------------------------------------------------------- +
The box surrounding it has the character limit minus the key width; however, this does not look good. This example is also not the only use case; I use that box for multiple types of information.
I have looked at this problem and none of those replies actually convinced me! Here is my version. It can very likely be improved.
public static String[] splitPreservingWords(String text, int length) {
    return text.replaceAll("(?:\\s*)(.{1,"+ length +"})(?:\\s+|\\s*$)", "$1\n").split("\n");
}
"not split words in the middle" does not define what should happen in case of "not splitting".
Given the split length being 10 and the string:
Hello I would like to split this string preserving these words
If you want to split right after a word, resulting in the list:
Hello I would, like to split, this string, preserving, these words
You can accomplish all kinds of tricky "splits" by using plain matching.
Simply match all occurrences of this expression:
(?s)\G.{10,}?\b
(Using (?s) to turn on the DOTALL flag.)
In Perl it's as simple as @array = $str =~ /\G.{10,}?\b/gs, but Java seems to lack a quick function to return all matches, so you'd probably have to use a matcher and push the results onto an array/list.
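The matcher loop mentioned above could look roughly like this in Java. It is a sketch, not a definitive implementation: the names WordSplit, splitAfterWords and minLength are made up, the matches are trimmed because each one after the first starts with the separating space, and a remainder shorter than minLength (which the pattern cannot match) is appended as a final part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordSplit {
    // Collects all matches of \G.{minLength,}?\b (with DOTALL) into a list.
    public static List<String> splitAfterWords(String text, int minLength) {
        Pattern p = Pattern.compile("\\G.{" + minLength + ",}?\\b", Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> parts = new ArrayList<String>();
        int end = 0;
        while (m.find()) {
            parts.add(m.group().trim());
            end = m.end();
        }
        // Whatever is left is shorter than minLength and was not matched.
        String rest = text.substring(end).trim();
        if (!rest.isEmpty()) {
            parts.add(rest);
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "Hello I would like to split this string preserving these words";
        // prints [Hello I would, like to split, this string, preserving, these words]
        System.out.println(splitAfterWords(s, 10));
    }
}
```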
No regex, but it seems to work:
// requires java.util.ArrayList and java.util.List
List<String> parts = new ArrayList<String>();
// stop as soon as the remainder fits into n characters
while (string.length() > n) {
    // look for space to the left of n-th character
    int index = string.lastIndexOf(" ", n);
    if (index == -1) {
        // no space to the left (very long word) -> next space to the right
        // change this to 'index = n' to break words in this case
        index = string.indexOf(" ", n);
    }
    if (index == -1) {
        break;
    }
    parts.add(string.substring(0, index));
    string = string.substring(index+1);
}
parts.add(string);
This will first look if there is a space to the left of the n-th character. In this case, the string is split there. Otherwise, it looks for the next space to the right. Alternatively, you could break the word in this case.

Match any word but two in specific

I'm trying to create a regular expression to match any word ( \w+ ) except true or false.
This is what I have so far: \w+\s*=\s*[^true|^false]\w+
class Ntnf {
    public static void main ( String ... args ) {
        System.out.println( args[0].matches("\\w+\\s*=\\s*[^true|^false]\\w+") );
    }
}
But it is not working for:
a = b
a = true
a = false
It matches always.
How can I match any word ( \w+ ) except true or false?
EDIT
I'm trying to spot this pattern:
a = b
x = y
name = someothername
etc = xyz
x = truea
n = falsea
But avoid matching
a = true
etc = false
name = true
You can use:
^(?!(true|false)$)
^ - beginning of string
?! - negative lookahead
$ - end of string
So it matches as long as the whole string isn't just "true" or "false". Note that it can still start with one of those.
However, it may be more straightforward to use regular string comparisons.
EDIT:
The whole regex (without escaping) for your situation is:
^\w+\s*=\s*(?!(true|false)$)\w+$
It's the same idea, except that we're putting it in the equation form.
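In Java the backslashes have to be doubled, and with matches() the outer anchors are not needed. A small sketch that runs the pattern against the examples from the question; the class and method names (NotTrueFalse, isAssignment) are made up for illustration.

```java
import java.util.regex.Pattern;

public class NotTrueFalse {
    // word = word, where the right-hand side is not exactly "true" or "false"
    private static final Pattern ASSIGNMENT =
        Pattern.compile("\\w+\\s*=\\s*(?!(true|false)$)\\w+");

    public static boolean isAssignment(String line) {
        // matches() requires the whole line to match the pattern
        return ASSIGNMENT.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "a = b", "name = someothername", "x = truea", "n = falsea",
            "a = true", "etc = false"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAssignment(s));
        }
    }
}
```

The first four samples print true and the last two print false.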
[^true] is a character class. It matches only one character. [^true] means: "Match this character only if it is not one of t, r, u or e". This is not what you need, right?
Regex is not a good idea for this task. It will be quite complicated to do it in regex. Just use string comparison.
Square brackets match a list of possible characters, or reject a list of possible characters (not necessarily in the order you specify), so [^true] is not the way to go.
When I'm trying not to match a certain word, I usually do the following:
([^t]|t[^r]|tr[^u]|tru[^e])
