How to Match a Pattern in Text Using the Scanner and Pattern Classes? - java

I want to find out whether a particular pattern exists in my text file.
I am using the following classes for this:
java.util.regex.Pattern and java.util.Scanner
My sample text line is:
String Line="DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
and I want to match the following kind of pattern:
A 102 190
where the position of A holds a single character, a-z or A-Z,
the position of 102 holds an integer of any length,
and the position of 190 holds an integer of any length.
My code for the pattern matching is:
Scanner sr=new Scanner(Line);
Pattern p = Pattern.compile("\\s+([a-zA-Z]){1}\\s+\\d{1,}\\s+\\d{1,}\\s+");
while(sr.hasNext(p))
{
System.out.println("Pattern exists");
System.out.println("Matched String : "+sr.next(p));
}
But the pattern does not match, even though it exists in the line.
I think the problem is with my pattern string:
"\\s+([a-zA-Z]){1}\\s+\\d{1,}\\s+\\d{1,}\\s+"
Can anybody help me with the pattern string I should use?

I'm not sure that Scanner is the best tool for this, as hasNext(Pattern) checks whether the next complete token matches the pattern. Your pattern spans several tokens.
Have you tried using a Matcher object instead of the Scanner?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Foo2 {
public static void main(String[] args) {
String line = "DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
Pattern p = Pattern.compile("\\s+[a-zA-Z]\\s+\\d{1,}\\s+\\d{1,}\\s+");
Matcher matcher = p.matcher(line);
while (matcher.find()) {
System.out.printf("group: %s%n", matcher.group());
}
System.out.println("done");
}
}
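If you would rather stay with Scanner, one possible route is Scanner.findInLine(Pattern), which searches the rest of the current line and ignores the delimiters, so a match may span several tokens. This is only a minimal sketch and not the answerer's code: the class name ScannerFindDemo, the helper findInText, and the word-boundary form of the pattern are assumptions made for illustration.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerFindDemo {
    // One letter, then two integers, separated by whitespace. The \b anchors
    // keep the letter from matching inside a longer token such as "1A1F".
    private static final Pattern CHAIN_RANGE =
            Pattern.compile("\\b[a-zA-Z]\\s+\\d+\\s+\\d+\\b");

    // Returns the first match in the first line of text, or null if there is none.
    static String findInText(String text) {
        try (Scanner scanner = new Scanner(text)) {
            // findInLine ignores delimiters, so the match can cross token borders.
            return scanner.findInLine(CHAIN_RANGE);
        }
    }

    public static void main(String[] args) {
        String line = "DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
        System.out.println("Matched String : " + findInText(line));
    }
}
```

For the sample line this prints "Matched String : A 102 190". Unlike hasNext(Pattern), the search is not limited to a single whitespace-delimited token.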

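This regex works:
\\s+\\w\\s+\\d+\\s+\\d+
group(0) of your matcher (p.matcher) gives " A 102 190", including the leading whitespace consumed by the first \\s+.
Note that \\w also matches digits and underscores; use [a-zA-Z] if only a letter is allowed in that position.
[EDIT] OK, here is a complete working sample: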
Pattern p = Pattern.compile("\\s+\\w\\s+\\d+\\s+\\d+");
Matcher matcher = p.matcher("DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396");
if (matcher.find()) {
System.out.println("Found match: " + matcher.group(0));
}
// Found match:  A 102 190 (the match itself starts with a space)
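If the letter and the two numbers are needed separately, capturing groups are one way to pull them apart. The sketch below is an illustration under assumptions: the class name ChainRangeGroups and the helper extract are made up, and the pattern uses [a-zA-Z] instead of \w so that only a letter is accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChainRangeGroups {
    // Group 1: the single letter; groups 2 and 3: the two integers.
    private static final Pattern CHAIN_RANGE =
            Pattern.compile("\\s([a-zA-Z])\\s+(\\d+)\\s+(\\d+)");

    // Returns {letter, first number, second number}, or null when nothing matches.
    static String[] extract(String line) {
        Matcher m = CHAIN_RANGE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "DBREF 1A1F A 102 190 UNP P08046 EGR1_MOUSE 308 396";
        System.out.println(String.join(", ", extract(line))); // A, 102, 190
    }
}
```

Because the whitespace sits outside the parentheses, the captured values carry no leading or trailing spaces.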

Related

Looking for A Regular expression to match java regex (punct) pattern

I am looking for a regular expression that will match the studentIdMatch2 value in the class below. studentIdMatch1 matches fine. However, studentIdMatch2 has a studentId that may contain any special character other than :, ^ and comma, so the pattern does not work for it. Thank you for your time; I appreciate your support.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegEx {
public static void main(String args[]){
String studentIdMatch1 = "studentName:harry,^studentId:Id123";
String studentIdMatch2 = "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
Pattern pattern = Pattern
.compile("(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)(\\w+?)(\\p{Punct}?),");
Matcher matcher = pattern.matcher(studentIdMatch1 + ","); // Works Fine(Matches Student Name and Id)
// No Special Characters in StudentId
//Matcher matcher = pattern.matcher(studentIdMatch2 + ","); //Wont work Special Characters in StudentId. Matches Student Name
while (matcher.find()) {
System.out.println("group1 = "+matcher.group(1)+ "group2 = "+matcher.group(2) +"group3 = "+matcher.group(3) +"group4 = "+matcher.group(4)+"group5 = "+matcher.group(5));
}
System.out.println("match ended");
}
}
You may try:
^SutdentName:(\w+),\^StudenId:([^\s,^:]+)$
Explanation of the above regex:
^, $ - Represents start and end of line respectively.
SutdentName: - Matches SutdentName: literally. I think it should be StudentName, but I did not change it; adjust the literal to the spelling in your data.
(\w+) - Represents the first capturing group, matching only word characters, i.e. [A-Za-z0-9_], one or more times greedily.
,\^StudenId: - Matches ,^StudenId: literally. Here too, I guess it should be StudentId.
([^\s,^:]+) - Represents the second capturing group, matching everything other than whitespace, comma, ^ and : one or more times greedily. You can add other characters according to your requirements.
Sample Implementation in java:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main
{
private static final Pattern pattern = Pattern.compile("^SutdentName:(\\w+),\\^StudenId:([^\\s,^:]+)$", Pattern.MULTILINE);
public static void main(String[] args) {
String string = "SutdentName:harry,^StudenId:Id123\n"
+ "SutdentName:harry,^StudenId:Id-H/MNK/U&T/BA+_T/(1490)/17#)123";
Matcher matcher = pattern.matcher(string);
while(matcher.find()){
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
}
}
The second (\\w+?) only captures word characters, so change it to capture what you want, i.e.
allow all the special characters other than : and ^ and comma
with a group like ([^:^,]+?)
^ - As the first character inside the brackets, negates the class
:^, - The excluded characters: colon, ^ and comma
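As a rough check of that suggestion, the sketch below takes the pattern from the question, swaps only the value group for ([^:^,]+?), and collects each key and value. The class name StudentFields and the helper parse are hypothetical names used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudentFields {
    // The question's pattern with the value group changed from (\w+?) to
    // ([^:^,]+?): a run of characters that excludes ':', '^' and ','.
    private static final Pattern FIELD = Pattern.compile(
            "(\\p{Punct}?)(\\w+?)(:)(\\p{Punct}?)([^:^,]+?)(\\p{Punct}?),");

    // Maps each key (group 2) to its value (group 5), in input order.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        // The question appends a trailing comma before matching; do the same here.
        Matcher m = FIELD.matcher(input + ",");
        while (m.find()) {
            fields.put(m.group(2), m.group(5));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "studentName:harry,^studentId:Id-H/MPU/L&T/OA+_T/(1490)/17#)123";
        parse(input).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the question's second input this yields studentName = harry and studentId = Id-H/MPU/L&T/OA+_T/(1490)/17#)123.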

Java - Find Regex to match in String

I need some help because I'm a junior in Java, and after some research on the web
I can't find a solution.
So, my problem:
String str : size="A4"
I would like to extract 'A4' with a regex by giving the word "size" in the regex.
How can I do this?
import java.util.*;
import java.util.regex.*;
import java.lang.*;
import java.io.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
Matcher m=Pattern.compile("size\\s*=\\s*\"([^\"]*)\"").matcher("size=\"A4\"");
while(m.find())
System.out.println(m.group(1));
}
}
Output:
A4
http://ideone.com/FqMuTA
Regex breakdown:
size\\s*=\\s*\"([^\"]*)\"
size matches size literally
\\s*=\\s* matches the = sign with 0 or more whitespace characters before and after it
\" matches a double quote
([^\"]*) matches 0 or more characters that are not a double quote ([^\"]) and remembers the captured text as group number 1, which is the group read in the while loop above
\" matches the closing double quote
1. Create a java.util.regex.Pattern that matches your conditions.
2. Generate a java.util.regex.Matcher that handles the input String.
3. Let the Matcher find the desired value (by using Matcher.group(int)).
String myString = "size=\"A4\""; // the input from the question
//1. create Pattern
Pattern p = Pattern.compile("size=\"([A-Za-z0-9]{2})\"");
//2. generate Matcher
Matcher m = p.matcher(myString);
//3. find value using group(int)
if(m.find()) {
System.out.println( m.group(1) );
}
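Since the question asks to supply the word "size" to the regex, the same idea can be wrapped in a small helper that takes the attribute name as a parameter. This is a sketch under assumptions: the class AttributeValue and the method read are hypothetical, and Pattern.quote is used so that the name is treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeValue {
    // Returns the double-quoted value that follows name=, or null if it is absent.
    static String read(String input, String name) {
        // \b keeps "size" from matching inside a longer word such as "resize".
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(read("size=\"A4\"", "size")); // A4
    }
}
```

Compiling the pattern on every call keeps the sketch short; for repeated lookups of the same name, the compiled Pattern could be cached.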

How to figure out exact reason why Regex is failing in java

I have a regex pattern that I am using to match screen.
When I test it in Sublime Text, it works just fine,
but in Java the match fails:
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRBRR"));//false
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRRRR"));//true
The above code should return true in both cases, but in Java the first call returns false.
My basic requirement is to identify groups of the same character in sequence,
meaning that if the string is
RRRRBBBRRBBBRBBBRRR
then it should be split into
RRRR BBB RR BBB R BBB RRR
Please help. Thanks in advance.
Try this:
String value = "RRRRBBBRRBBBRBBBRRR";
Pattern pattern = Pattern.compile("B+|R+");
Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group());
}
The first expression returns false because there is a B in the middle of several Rs, so the string as a whole is not a match: your regular expression expects only Rs or only Bs.
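The difference between matching the whole input and finding a substring can be seen side by side in the following sketch. The class name MatchesVsFind and the two helpers are hypothetical; the regexes and inputs are the ones from this thread.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only when the regex matches the entire input.
    static boolean wholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True when the regex matches somewhere inside the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("(B+)?|(R+)?", "RRBRR")); // false
        System.out.println(wholeInput("(B+)?|(R+)?", "RRRRR")); // true
        System.out.println(anywhere("B+|R+", "RRBRR"));         // true
        System.out.println(wholeInput("(B+|R+)+", "RRBRR"));    // true
    }
}
```

The last call shows that a whole-input match is still possible once the pattern allows a sequence of runs rather than a single run.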
matches() behaves as if there were an implicit ^ at the start and $ at the end, so it succeeds only when the whole input matches; substring matches won't work. find() looks for a matching substring.
Matcher is best suited for this:
public static void main (String[] args) throws java.lang.Exception
{
// B+|R+ instead of (B+)?|(R+)?: the optional groups can also match the empty string
String regex = "B+|R+";
Pattern pat = Pattern.compile(regex);
Matcher matcher = pat.matcher("RRBRR");
int count = 0;
// Do not call find() before the loop: each call consumes one match
while(matcher.find()){
System.out.println(matcher.group());
count++;
}
System.out.println("Count:"+count);
}
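If the input may contain characters other than B and R, a backreference is one way to split it into runs without listing every character. This is a swapped-in technique rather than what the answers above use, and the class name RunSplitter and the method runs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunSplitter {
    // (.) captures one character; \1* then consumes every immediate repeat of it.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    // Returns the maximal runs of identical characters, in order.
    static List<String> runs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", runs("RRRRBBBRRBBBRBBBRRR")));
    }
}
```

For the string from the question this prints RRRR BBB RR BBB R BBB RRR. Note that . does not match line terminators by default, so they would be skipped.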

Regex to match words of a certain length

I would like to know the regex to match words that have a maximum length.
For example, if a word is at most 10 characters long, I would like the regex to match, but if the length exceeds 10, the regex should not match.
I tried
^(\w{10})$
but that gives me matches only if the word is at least 10 characters long. If the word has more than 10 characters, it still matches, but only the first 10 characters.
I think you want \b\w{1,10}\b. The \b matches a word boundary.
Of course, you could also replace the \b and use ^\w{1,10}$. This will match a word of at most 10 characters as long as it's the only content of the string. I think this is what you were doing before.
Since it's Java, you'll actually have to escape the backslashes: "\\b\\w{1,10}\\b". You probably knew this already, but it's gotten me before.
^\w{0,10}$ # allows words of up to 10 characters.
^\w{5,}$ # allows words of more than 4 characters.
^\w{5,10}$ # allows words of between 5 and 10 characters.
Length of characters to be matched.
{n,m} n <= length <= m
{n} length == n
{n,} length >= n
By default, the engine matches this pattern greedily. For example, if the input is 123456789, \d{2,5} will match 12345, which has length 5.
If you want the engine to stop as soon as a length of 2 is matched, use the lazy form \d{2,5}?
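The greedy and lazy behaviour described above can be checked with a few lines of Java. The class name GreedyVsLazy and the helper firstMatch are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "123456789";
        System.out.println(firstMatch("\\d{2,5}", input));  // greedy: 12345
        System.out.println(firstMatch("\\d{2,5}?", input)); // lazy: 12
    }
}
```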
Method 1
Word boundaries would work perfectly here, such as with:
\b\w{3,8}\b
\b\w{2,}
\b\w{,10}\b
\b\w{5}\b
Java
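Some languages, such as Java and C++, require the backslashes to be escaped a second time inside string literals:
\\b\\w{3,8}\\b
\\b\\w{2,}
\\b\\w{1,10}\\b
\\b\\w{5}\\b
PS: The form without a lower bound, \b\w{,10}\b, does not work in all languages or flavors. Java's java.util.regex rejects it, so write \\b\\w{1,10}\\b (or {0,10}) there.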
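Whether a given quantifier form is accepted can be tested directly, because Pattern.compile throws PatternSyntaxException for syntax it does not understand. The sketch below assumes the behaviour of the standard java.util.regex implementation, where a quantifier must start with a digit after the opening brace; the class name QuantifierCheck and the helper compiles are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuantifierCheck {
    // Returns true if java.util.regex accepts the expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\b\\w{,10}\\b"));  // no lower bound
        System.out.println(compiles("\\b\\w{0,10}\\b")); // explicit lower bound
    }
}
```

With the standard implementation the first call is expected to return false and the second true, so an explicit lower bound such as {0,10} or {1,10} is the portable choice.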
Test 1
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "\\b\\w{3,8}\\b";
final String string = "words with length three to eight";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
}
}
}
Output 1
Full match: words
Full match: with
Full match: length
Full match: three
Full match: eight
Method 2
Another good-to-know method is to use negative lookarounds:
(?<!\w)\w{3,8}(?!\w)
(?<!\w)\w{2,}
(?<!\w)\w{,10}(?!\w)
(?<!\w)\w{5}(?!\w)
Java
(?<!\\w)\\w{3,8}(?!\\w)
(?<!\\w)\\w{2,}
(?<!\\w)\\w{1,10}(?!\\w)
(?<!\\w)\\w{5}(?!\\w)
Test 2
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "(?<!\\w)\\w{1,10}(?!\\w)";
final String string = "words with length three to eight";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
}
}
}
Output 2
Full match: words
Full match: with
Full match: length
Full match: three
Full match: to
Full match: eight
I was also looking for the same regex, but I wanted to include all the special characters and blank spaces too. So here is the regex for that:
^[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{0,10}$
Simple, complete and tested Java code for finding words of a certain length n (it needs java.util.ArrayList, java.util.regex.Matcher and java.util.regex.Pattern imported):
int n = 10;
String regex = "\\b\\w{" + n + "}\\b";
String str = "Hello, this is a test 1234567890";
ArrayList<String> words = new ArrayList<>();
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
words.add(matcher.group(0));
}
System.out.println(words);
For more explanations and different options - see other answers.
I liked Pardeep's answer, but I needed whole-word bounds in a string/title that can be any messed-up string an advertising department can think up.
\b(\w[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{1,22})\b
This should iterate through a string (tested in Notepad++) and get the largest group of words in the range, i.e. 1,22 characters here, without splitting mid-word.
Here was the final command for me in Python to add some LFs:
name = re.sub(r"\b(\w[A-Za-z0-9\s$&+,:;=?##|'<>.^*()%!-]{1,22})\b", r"\1\n", name)

Regular Expression in Java: How to refer to "matched patterns"?

I was reading the Java Regular Expression tutorial, and it seems to teach only how to test whether a pattern matched or not; it does not tell me how to refer to the matched text.
For example, I have a string "My name is xxxxx", and I want to print xxxxx. How would I do that with Java regular expressions?
Thanks.
What tutorial were you reading? Sun's tutorial tackles that topic quite thoroughly, but you have to read it carefully :)
Capturing a part of a string is done through parentheses. If you want to capture a group in a string, you have to put this part of the regular expression in parentheses. The groups are numbered in the order their opening parentheses appear, and the group with index 0 represents the whole match.
For instance, the regexp "Day ([0-9]+) - Note ([0-9]+)" would define 3 groups :
group(0) : The whole match
group(1) : The first group in the regexp, that is to say the day number
group(2) : The second group in the regexp, that is to say the note number
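The three groups listed above can be read back as in the following sketch. The class name DayNoteGroups, the helper groups, and the sample input are assumptions for illustration; only the regexp comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DayNoteGroups {
    private static final Pattern DAY_NOTE =
            Pattern.compile("Day ([0-9]+) - Note ([0-9]+)");

    // Returns {group(0), group(1), group(2)}, or null when the pattern is not found.
    static String[] groups(String input) {
        Matcher m = DAY_NOTE.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the original answer.
        String[] g = groups("Log entry: Day 12 - Note 7");
        System.out.println("group(0) = " + g[0]); // Day 12 - Note 7
        System.out.println("group(1) = " + g[1]); // 12
        System.out.println("group(2) = " + g[2]); // 7
    }
}
```

Because find() is used, group(0) is the matched part of the input and not necessarily the complete input string.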
As for the actual code and how to retrieve the groups you've defined in your regexp, have a look at the Java documentation, especially the Matcher class and its group method : http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html
You can test your regexps with that very useful tool : http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
Hope this helped,
Cheers
Note the use of parentheses in the pattern and the group() method on Matcher
import java.util.regex.*;
public class Example {
static public void main(String[] args) {
Pattern regex = Pattern.compile("My name is (.*)");
String s = "My name is Michael";
Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println("original string: " + matcher.group(0));
System.out.println("first group: " + matcher.group(1));
}
}
}
Output is:
original string: My name is Michael
first group: Michael
You can use the Matcher group(int) method:
Pattern p = Pattern.compile("My name is (.*)");
Matcher m = p.matcher("My name is akf");
m.find();
String s = m.group(1); //grab the first group*
System.out.println(s);
output:
akf
* look at matching groups
Matcher m = Pattern.compile("name is (.*)").matcher("My name is Ross");
if (m.find()) {
System.out.println(m.group(0));
System.out.println(m.group(1));
}
The parentheses form a capturing group. Group 0 is the entire match and group 1 is the text captured by the first pair of parentheses.
The above program outputs:
name is Ross
Ross
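Groups can also be given names, which some find easier to read than numeric indexes. The sketch below assumes Java 7 or later, where the (?<name>...) syntax and Matcher.group(String) are available; the class name NamedGroupDemo, the helper nameOf, and the group name "name" are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // (?<name>...) is a capturing group that can be looked up by its name.
    private static final Pattern NAME =
            Pattern.compile("name is (?<name>.*)");

    // Returns the text captured by the "name" group, or null if there is no match.
    static String nameOf(String input) {
        Matcher m = NAME.matcher(input);
        return m.find() ? m.group("name") : null;
    }

    public static void main(String[] args) {
        System.out.println(nameOf("My name is Ross")); // Ross
    }
}
```

A named group still has a number, so m.group(1) would return the same text here.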
