Regex not filter by delimiters - java

I want to create a regular expression that matches when my numbers are separated by a comma.
For example:
1 OK
1,2,3 OK
1\n2,3 OK
1,\n Not OK
1,,2 Not OK
1,\n2 Not Ok
So far I have created this expression:
\d+(([,.|\n])+\d+)*
If I change the last * to + so that at least one repetition is required:
\d+(([,.|\n])+\d+)+
Then all the previous scenarios work, but not this one:
1 Not OK // and it should be OK
I'm using matcher.find():
Matcher matcher = Pattern.compile(pattern).matcher(number);
if (matcher.find()) {
    System.out.println("total number:" + matcher.group(0));
}
Any idea what I'm doing wrong in my regex?

You can use this regex:
^\d+(?:(?:,|\n)\d+)*$
Java regex:
Pattern p = Pattern.compile("^\\d+(?:(?:,|\\n)\\d+)*$");
PS: To match literal \n you will need:
^\d+(?:(?:,|\\n)\d+)*$
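As a quick check, the anchored pattern from this answer can be run against the six inputs listed in the question. This is only a minimal sketch: the class name DelimitedNumbersDemo and the helper isValid are made up for illustration, and find() is used because that is what the question uses.

```java
import java.util.regex.Pattern;

class DelimitedNumbersDemo {
    // Pattern from the answer: digits, then zero or more groups of
    // (one comma or one newline) followed by digits, anchored at both ends.
    static final Pattern NUMBERS = Pattern.compile("^\\d+(?:(?:,|\\n)\\d+)*$");

    // Hypothetical helper: true when the input is accepted by the pattern.
    static boolean isValid(String input) {
        return NUMBERS.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"1", "1,2,3", "1\n2,3", "1,\n", "1,,2", "1,\n2"};
        for (String s : samples) {
            // Print the newline as \n so each result stays on one line.
            System.out.println(s.replace("\n", "\\n") + " -> " + isValid(s));
        }
    }
}
```

The first three samples are accepted and the last three are rejected, which lines up with the OK / Not OK list in the question.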

Related

How to capture multiple groups in regex?

I am trying to capture the following word and number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
The regex did not work. I tried to apply an OR condition using |, but that did not work either:
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).
Pattern details:
(stxt|city) - either the stxt or the city substring (you may add \b before the ( to match only a whole word) (Group 1)
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
Looking at your string, you could also find the word/digits after the colon.
:(\w+)
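The shorter :(\w+) pattern can be tried in Java roughly as follows. This is a sketch only: the class AfterColonDemo and the method valuesAfterColon are hypothetical names, and it assumes every value consists of word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterColonDemo {
    // Collects every run of word characters that directly follows a colon.
    static List<String> valuesAfterColon(String s) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(":(\\w+)").matcher(s);
        while (m.find()) {
            values.add(m.group(1)); // group 1 is the text after the colon
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesAfterColon("stxt:usa,city:14")); // [usa, 14]
        System.out.println(valuesAfterColon("stxt:usa"));         // [usa]
    }
}
```

Unlike (stxt|city):([^,]+), this variant does not check which key precedes the colon, so it fits only when any key is acceptable.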

search and replace string in java using pattern

Given the string
Content ID [9283745997] Content ID [9283005997] There can be text in between Content ID [9283745953] Content ID [9283741197] Content ID [928374500] There can be valid text here which should not be removed.
I want to remove every piece of text that starts with Content ID and is followed by a number in square brackets, such as [9283745997]; any digits can appear between the brackets. Eventually I want the result string to be
There can be text in between There can be valid text here which should not be removed.
Could anyone please provide a valid regex to capture this recurring text, given that the numerals within the square brackets are unique?
I appreciate your help!
My solution to this was:
Pattern p = Pattern.compile("(Content ID \\[\\d*\\] )");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, "");
}
m.appendTail(sb);
System.out.println(sb);
So basically you are trying to remove each occurrence of Content ID [one or more digits].
To do this you can use the replaceAll("regex", "replacement") method of the String class. As the replacement you can use the empty String "".
The only remaining problem is which regex to use.
to match Content ID just write it normally as "Content ID "
to match [ or ] you will have to add \ before each of them because they are regex metacharacters and you need to escape them (in Java you will need to write \ as "\\")
to represent one digit (character from range 0-9) regex uses \d (again in Java you will need to write \ as "\\" which will result in "\\d")
to say "one or more of previously described element" just add + after definition of such element. For example if you want to match one or more letters a you can write it as a+.
Now you should be able to create the correct regex. If you have any questions, feel free to ask them in the comments.
Try this one:
(Content ID \[[0-9]+\])
You can test it here: http://regexpal.com/
I would use the regex
Content ID \[\d+\] ?
Implement it like this:
str.replaceAll("Content ID \\[\\d+\\] ?", "");
You can find an explanation and demonstration here: http://regex101.com/r/qD5rJ6
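To see the replaceAll approach end to end, here is a small sketch that applies it to the sample string from the question. The class name ContentIdCleaner and the method strip are illustrative names, not part of any answer.

```java
class ContentIdCleaner {
    // Removes each "Content ID [digits]" marker plus one optional trailing space.
    static String strip(String text) {
        return text.replaceAll("Content ID \\[\\d+\\] ?", "");
    }

    public static void main(String[] args) {
        String input = "Content ID [9283745997] Content ID [9283005997] "
                + "There can be text in between "
                + "Content ID [9283745953] Content ID [9283741197] Content ID [928374500] "
                + "There can be valid text here which should not be removed.";
        // Prints the two sentences that should survive, separated by one space.
        System.out.println(strip(input));
    }
}
```

The optional space at the end of the pattern is what keeps the remaining sentences from being left with doubled spaces.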

how to get all names and date of births from a specific file using java

Hi, below is my text file:
welcome to java training
program
Name rtrti*&*
John
address india say^%$7
Date of Birth
11/12/1989
I have 100 files like the one above. The text was extracted from image files, so it is not in order. From it I need to get the names and dates of birth. Can you please suggest how to do this? I am new to this task.
Required output
John
11/12/1989
I have tried:
Pattern p = Pattern.compile("Name");
Matcher matcher = p.matcher(content);
matcher.find();
But I have no idea how to get the line after the matched pattern. I cannot read this file line by line, because I need to store the entire text in a single string.
I'll give a few hints that will get you on track. Without more details regarding the expected input, it will be difficult to give you a solid solution. First, I trust that you are already familiar with the Pattern and Matcher javadocs. You will need to understand the Groups and capturing section. Finally, you can utilize DOTALL mode which will allow the . character to match newlines.
To get you started, the following should work to find the name:
Pattern p = Pattern.compile(
    "(?s)" +   // DOTALL
    ".*" +     // Match anything (to consume everything before 'Name')
    "Name" +   // Match the literal 'Name'
    ".*?" +    // Reluctantly grab everything until...
    "\n" +     // Newline is reached
    "\\s*" +   // Consume leading whitespace
    "(\\S+)"   // Capture at least one non-whitespace character
);
Matcher m = p.matcher(content);
if (m.find()) {
    String name = m.group(1); // The first capturing group contains "John"
}
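The date of birth could be pulled out in a similar way. The sketch below is an assumption-heavy illustration, not part of the answer: it assumes the date always follows the literal label "Date of Birth" and is written as digits separated by slashes with a four-digit year, as in the sample. The class DobExtractor and the method findDateOfBirth are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DobExtractor {
    // Assumed layout: the label, optional whitespace (including newlines),
    // then a date such as 11/12/1989.
    private static final Pattern DOB =
            Pattern.compile("Date of Birth\\s*(\\d{1,2}/\\d{1,2}/\\d{4})");

    // Returns the first date found after the label, or null if there is none.
    static String findDateOfBirth(String content) {
        Matcher m = DOB.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "welcome to java training\nprogram\nName rtrti*&*\nJohn\n"
                + "address india say^%$7\nDate of Birth\n11/12/1989\n";
        System.out.println(findDateOfBirth(content)); // 11/12/1989
    }
}
```

Because \s also matches line breaks, no DOTALL flag is needed here; if the extracted text puts noise between the label and the date, the pattern would have to be loosened.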

Bug in java.util.regex in sun jdk 6.0.24?

The following code blocks on my system. Why?
System.out.println( Pattern.compile(
"^((?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*)/\\*.*?\\*/(.*)$",
Pattern.MULTILINE | Pattern.DOTALL ).matcher(
"\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';"
).matches() );
The pattern (designed to detect comments of the form /*...*/ but not within ' or ") should be fast, as it is deterministic...
Why does it take soooo long?
You're running into catastrophic backtracking.
Looking at your regex, it's easy to see how .*? and (.*) can match the same content since both also can match the intervening \*/ part (dot matches all, remember). Plus (and even more problematic), they can also match the same stuff that ((?:[^'"][^'"]*|"[^"]*"|'[^']*')*) matches.
The regex engine gets bogged down in trying all the permutations, especially if the string you're testing against is long.
I've just checked your regex against your string in RegexBuddy. It aborts the match attempt after 1.000.000 steps of the regex engine. Java will keep churning on until it gets through all permutations or until a Stack Overflow occurs...
You can greatly improve the performance of your regex by prohibiting backtracking into stuff that has already been matched. You can use atomic groups for this, changing your regex into
^((?>[^'"]+|"[^"]*"|'[^']*')*)(?>/\*.*?\*/)(.*)$
or, as a Java string:
"^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$"
This reduces the number of steps the regex engine has to go through from > 1 million to 58.
Be advised though that this will only find the first occurrence of a comment, so you'll have to apply the regex repeatedly until it fails.
Edit: I just added two slashes that were important for the expressions to work. Yet I had to change more than 6 characters.... :(
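For reference, the atomic-group pattern can be dropped into the original test like this. The sketch only checks the input from the question, which contains no comment, so the expected result is simply false, returned promptly instead of hanging. The class name AtomicGroupDemo and the method matchesComment are made up for illustration, and the flags are the ones used in the question.

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Java string form of the atomic-group regex given in this answer.
    static final Pattern COMMENT = Pattern.compile(
            "^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$",
            Pattern.MULTILINE | Pattern.DOTALL);

    static boolean matchesComment(String sql) {
        return COMMENT.matcher(sql).matches();
    }

    public static void main(String[] args) {
        String sql = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 "
                + "WHERE NAME = 'SOMENAMEVALUE';";
        // The input has no /* ... */ at all, so this prints false.
        System.out.println(matchesComment(sql));
    }
}
```

This run says nothing about how the pattern behaves on inputs that do contain comments; those cases are worth testing separately before relying on it.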
I recommend that you read Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...).
I think it's because of this bit:
(?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*
Removing the second and third alternatives gives you:
(?:[^'\"][^'\"]*)*
or:
(?:[^'\"]+)*
Repeated repeats can take a long time.
To detect /* ... */ comments, I would suggest code like this:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" /*a comment\n\n*/ SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Pattern pt = Pattern.compile("\"[^\"]*\"|'[^']*'|(/\\*.*?\\*/)",
Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = pt.matcher(str);
boolean found = false;
while (matcher.find()) {
    if (matcher.group(1) != null) {
        found = true;
        break;
    }
}
if (found)
    System.out.println("Found Comment: [" + matcher.group(1) + ']');
else
    System.out.println("Didn't find Comment");
For above string it prints:
Found Comment: [/*a comment
*/]
But if I change input string to:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" '/*a comment\n\n*/' SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
OR
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" \"/*a comment\n\n*/\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Output is:
Didn't find Comment

How to extract CSS color using regex?

I have a CSS style that I need to extract the color from using a Java regex.
e.g.
color:#000;
I need to extract the part after : up to ;. Can anyone give an example?
I'm not sure how to apply it to Java, but one regex to do this would be:
^color:\s*(#[0-9a-f]+);?$
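In Java, that expression could be applied along these lines. This is a minimal sketch; the class name CssColorDemo and the method extractColor are illustrative, and the input is assumed to be a single declaration such as color:#000;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssColorDemo {
    // Same regex as above, with backslashes doubled for a Java string literal.
    private static final Pattern COLOR =
            Pattern.compile("^color:\\s*(#[0-9a-f]+);?$");

    // Returns the hex colour (including '#'), or null when the input does not match.
    static String extractColor(String declaration) {
        Matcher m = COLOR.matcher(declaration);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractColor("color:#000;"));    // #000
        System.out.println(extractColor("color: #1a2b3c")); // #1a2b3c
    }
}
```

As written, the character class only accepts lowercase hex digits, so a value like #FFF is rejected unless the pattern is compiled with Pattern.CASE_INSENSITIVE or the class is widened to [0-9a-fA-F].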
To just extract from : up to ; do something like:
Pattern pattern = Pattern.compile("[^:]*:(.*);");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
    String value = matcher.group(1);
    System.out.println("'" + value + "'"); // do something with value
}
[^:]* - any number of chars that are not ':'
: - one ':'
(...) - a capturing group
.* - any number of any character
; - the terminating ';'
Use color:(.*); to accept values only for 'color'.
/(?<=:).+(?=;)/
That will do it for you
Not sure how you implement regex in Java though.
www.regexr.com can help you test out your regex in real time.
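The lookaround expression also works in Java once the surrounding / delimiters are dropped, since Java patterns are plain strings. A rough sketch, with the class name LookaroundColorDemo and the method between chosen only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundColorDemo {
    // Lookbehind for ':' and lookahead for ';' so that neither is part of the match.
    private static final Pattern VALUE = Pattern.compile("(?<=:).+(?=;)");

    // Returns the text between ':' and ';', or null if there is no such text.
    static String between(String declaration) {
        Matcher m = VALUE.matcher(declaration);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(between("color:#000;")); // #000
    }
}
```

Because .+ is greedy, an input holding several declarations would match up to the last semicolon, so this suits single declarations best.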
The expression
":(#.+);"
should do it
