My task is to devise a regular expression that recognizes the English indefinite article, i.e. the word “a” or the word “an”. I must test the expression by writing a test driver that reads a file containing approximately ten lines of text. The program should count the occurrences of the words “a” and “an”. It must not match the characters a and an inside words such as than.
This is my code so far:
import java.io.IOException;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexeFindText {
    public static void main(String[] args) throws IOException {
        // Input for matching the regexe pattern
        String file_name = "Testing.txt";
        ReadFile file = new ReadFile(file_name);
        String[] aryLines = file.OpenFile();
        String asString = Arrays.toString(aryLines);
        // Regexe to be matched
        String regexe = ""; //<<--this is where the problem lies
        int i;
        for ( i=0; i < aryLines.length; i++ ) {
            System.out.println( aryLines[ i ] ) ;
        }
        // Step 1: Allocate a Pattern object to compile a regexe
        Pattern pattern = Pattern.compile(regexe);
        //Pattern pattern = Pattern.compile(regexe, Pattern.CASE_INSENSITIVE);
        // case-insensitive matching
        // Step 2: Allocate a Matcher object from the compiled regexe pattern,
        // and provide the input to the Matcher
        Matcher matcher = pattern.matcher(asString);
        // Step 3: Perform the matching and process the matching result
        // Use method find()
        while (matcher.find()) { // find the next match
            System.out.println("find() found the pattern \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        }
        // Use method matches()
        if (matcher.matches()) {
            System.out.println("matches() found the pattern \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        } else {
            System.out.println("matches() found nothing");
        }
        // Use method lookingAt()
        if (matcher.lookingAt()) {
            System.out.println("lookingAt() found the pattern \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        } else {
            System.out.println("lookingAt() found nothing");
        }
    }
}
What do I have to use to find those words within my text?
Here's the regex that will match "a" or "an":
String regex = "\\ban?\\b";
Let's break that regex down:
\b means a word boundary (a single backslash is written as "\\" in a Java string literal)
a is simply a literal "a"
n? means zero or one literal "n"
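As a rough sketch of the counting part of the task, the loop below applies this regex with find() and increments a counter for every hit. The class name ArticleCounter, the method countArticles and the sample sentence are made up for illustration, and CASE_INSENSITIVE is an assumption so that a sentence-initial "An" is counted too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArticleCounter {
    // \b on both sides keeps "a"/"an" from matching inside words such as "than"
    private static final Pattern ARTICLE =
            Pattern.compile("\\ban?\\b", Pattern.CASE_INSENSITIVE);

    // Counts every standalone "a" or "an" in the given text (hypothetical helper)
    public static int countArticles(String text) {
        Matcher matcher = ARTICLE.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "An apple a day keeps the doctor away, rather than a pill.";
        System.out.println(countArticles(sample)); // prints 3
    }
}
```

Note that \b only checks for a transition between word and non-word characters, so an "a" followed by a hyphen or apostrophe (as in "a-list") is counted as well; whether that matters depends on the test file.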
I need to extract the text of an email value from a large XML document or a plain string.
I have been working on a regex for this, and so far the regex below works correctly for a plain string:
Regex : ^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}$
Text : paris#france.c
But when the text above is enclosed in an XML tag, the regex fails to match:
<email>paris#france.c</email>
I am trying to change this regex so that it works in both scenarios.
You have put ^ at the beginning which means the "Start of the string", and $ at the end which means the "End of the string". Now, look at your string:
<email>paris#france.c</email>
Do you think it starts and ends with an email address?
I have removed them and also escaped the - in your regex. Here you can check the following auto-generated Java code with the updated regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
    public static void main(String[] args) {
        // Anchors removed; "\\." matches the literal dot between local-part segments
        final String regex = "[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+(?:\\.[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}";
        final String string = "paris#france.c\n"
                + "<email>paris#france.c</email>";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
Output:
Full match: paris#france.c
Full match: paris#france.c
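To make the effect of ^ and $ concrete, here is a small sketch that runs the same core pattern with and without the anchors. The core pattern is deliberately simplified and is an assumption for illustration, not the full regex from the question; the class and method names are made up as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Simplified stand-in for the email pattern (assumption, not the original regex)
    static final String CORE = "[\\w.]+#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}";

    // With ^ and $ the whole input has to be the address
    public static boolean foundAnchored(String input) {
        return Pattern.compile("^" + CORE + "$").matcher(input).find();
    }

    // Without anchors find() may locate the address anywhere in the input
    public static String firstUnanchored(String input) {
        Matcher m = Pattern.compile(CORE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String plain = "paris#france.c";
        String xml = "<email>paris#france.c</email>";
        System.out.println(foundAnchored(plain));  // true
        System.out.println(foundAnchored(xml));    // false
        System.out.println(firstUnanchored(xml));  // paris#france.c
    }
}
```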
How can I extract the strings between the delimiters '<' and '>' from the string
“Rahul<is>an<entrepreneur>”?
I tried using the substring() method, but I could only extract one string from the primary string. How can I loop over it and get all the strings between the delimiters?
You could use Pattern and Matcher for pattern lookup. For example, see code below:
String STR = "Rahul<is>an<entrepreneur>";
Pattern pattern = Pattern.compile("<(.*?)>", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(STR);
while (matcher.find()) {
    System.out.println(matcher.start() + " " + matcher.end() + " " + matcher.group());
}
The output gives the start and end indexes of each match and the matched substring:
5 9 <is>
11 25 <entrepreneur>
More specifically, if you just want the strings, take the substring between the match's start and end indexes, dropping the delimiters:
STR.substring(matcher.start() + 1, matcher.end() - 1);
This gives you only the strings between the delimiters.
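An alternative to the index arithmetic is to read the capturing group directly. The sketch below collects group(1) of every match into a list; the class name BetweenDelimiters and the method extract are hypothetical, and the character class [^<>]* is used here instead of .*? as one possible way to stop at the next delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenDelimiters {
    // Group 1 captures everything between '<' and the next '>'
    private static final Pattern BETWEEN = Pattern.compile("<([^<>]*)>");

    // Returns the text of every <...> section, without the delimiters
    public static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = BETWEEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Rahul<is>an<entrepreneur>")); // [is, entrepreneur]
    }
}
```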
This worked for me:
String str = "Rahul<is>an<entrepreneur>";
String[] tempStr = str.split("<");
for (String st : tempStr) {
    if (st.contains(">")) {
        int index = st.indexOf('>');
        System.out.println(st.substring(0, index));
    }
}
Output:
is
entrepreneur
I am very new to regex and need your help. I want to extract the numbers and letters between two span tags.
<span>454.000 $</span>
I want to extract 454.000 $. There are 12 spaces before <span>. Please help me.
This should work.
Regexp:
\s+<.+>(.+)<.+>
Input:
<span>454.000 $</span>
Output:
454.000 $
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
    public static void main(String[] args) {
        final String regex = "\\s+<.+>(.+)<.+>";
        final String string = " <span>454.000 $</span>";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
See: https://regex101.com/r/2zg5Ws/1
Capturing the group with Pattern and Matcher looks like this:
String x = " <span>454.000 $</span> ";
Pattern p = Pattern.compile("<span>(.*?)</span>");
Matcher m = p.matcher(x);
if (m.find()) {
    System.out.println(">> "+ m.group(1)); // output 454.000 $
}
But for such cases I prefer replaceAll(), as the code is shorter:
String num = x.replaceAll(".*<span>(.*?)</span>.*", "$1");
// num has 454.000 $
With replaceAll(), the pattern captures the group and the whole text is replaced with that group ($1). Whether this solution works depends on the shape of your input string.
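One way the input matters: when the pattern does not match at all, replaceAll() returns the input unchanged, whereas the find() approach lets you detect the miss. The sketch below contrasts the two; the class SpanExtract and its method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanExtract {
    // Replaces the whole text with group 1; returns the input unchanged on no match
    public static String viaReplaceAll(String x) {
        return x.replaceAll(".*<span>(.*?)</span>.*", "$1");
    }

    // Returns group 1 of the first match, or null when there is no <span>
    public static String viaFind(String x) {
        Matcher m = Pattern.compile("<span>(.*?)</span>").matcher(x);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String x = "            <span>454.000 $</span>";
        System.out.println(viaReplaceAll(x));              // 454.000 $
        System.out.println(viaFind(x));                    // 454.000 $
        System.out.println(viaReplaceAll("no span here")); // no span here
        System.out.println(viaFind("no span here"));       // null
    }
}
```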
I'm trying to pull two strings via a Matcher object from one string that is stored in my online databases.
Each string appears after s:64: and is enclosed in quotation marks.
Example: s:64:"stringhere"
I'm currently trying to get them as shown below, but every regex I've tried has failed:
Pattern p = Pattern.compile("I don't know what to put as the regex");
Matcher m = p.matcher(data);
So with that said, all I need is the regex that will return the two strings in the matcher so that m.group(1) is my first string and m.group(2) is my second string.
Try this regex:
s:64:\"(.*?)\"
Code:
Pattern pattern = Pattern.compile("s:64:\"(.*?)\"");
Matcher matcher = pattern.matcher(YourStringVar);
// Check the first two occurrences
int count = 0;
while (matcher.find() && count++ < 2) {
    System.out.println("Group : " + matcher.group(1));
}
Here group(1) returns the captured string of each match.
OUTPUT:
Group : First Match
Group : Second Match
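If the values are needed later rather than just printed, the same find() loop can collect them into a list. This is only a sketch: the class QuotedValues, the method extractAll and the sample data are assumptions for illustration, and [^\"]* is used in place of .*? so that a capture cannot run past a closing quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValues {
    // Group 1 is the text between the quotes that follow s:64:
    private static final Pattern VALUE = Pattern.compile("s:64:\"([^\"]*)\"");

    // Collects every quoted value that follows an s:64: marker
    public static List<String> extractAll(String data) {
        List<String> values = new ArrayList<>();
        Matcher matcher = VALUE.matcher(data);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
        System.out.println(extractAll(data)); // [first string, second string]
    }
}
```

With this shape, values.get(0) and values.get(1) play the role of the two groups asked for in the question.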
String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
Pattern p = Pattern.compile("s:64:\"([^\"]*)\".*s:64:\"([^\"]*)\"");
Matcher m = p.matcher(data);
if (m.find()) {
    System.out.println("First string: '" + m.group(1) + "'");
    System.out.println("Second string: '" + m.group(2) + "'");
}
prints:
First string: 'first string'
Second string: 'second string'
The regex you need is Pattern.compile("s:64:\"(.*?)\".*s:64:\"(.*?)\"")
I am trying to familiarize myself with regex and I am having an issue matching a URL.
Here is an example URL:
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Here is what my regex breakdown looks like:
[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html
the [id] is a string 22 characters in length that can be either uppercase or lowercase letters, as well as numbers. However, I do not want to extract that from the URL. Just clarifying
Now, I need to extract two values from this url.
First,
I need to extract the dir(s). The [dir] is optional, but there can also be any number of them. In other words, that parameter might be absent, or it could be dir1/dir2/dir3, etc. So, going off my first example:
www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Here I would need to extract dir1/dir2/dir3 where a dir is a string that is a single word with all lowercase letters (ie sports/mlb/games). There are no numbers in the dir, only using that as an example.
But in this example of a valid URL:
www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
There is no [dir], so I would not extract anything. Thus, the [dir] is optional.
Secondly,
I need to extract the [storyTitle]. The [storyTitle] is also optional, just like the [dir] above, but if there is a storyTitle, there can only be one.
So going off my previous examples
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
would be valid, and I need to extract 'title-of-some-story'. Story titles are dash-separated strings that are always lowercase. The example below is also valid:
www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
In the above example, there is no [storyTitle] thus making it optional
Lastly, just to be thorough, a URL without a [dir] and without a [storyTitle] is also valid. Example:
www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
is a valid URL. Any input would be helpful. I hope I am clear.
Here is one example that will work.
public static void main(String[] args) {
    Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");
    String[] strings ={
        "www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
        "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
        "www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
        "www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
        "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
    };
    for (int idx = 0; idx < strings.length; idx++) {
        Matcher m = p.matcher(strings[idx]);
        if (m.find()) {
            String dir = m.group(1);
            if (dir != null) {
                dir = dir.substring(1); // remove the leading /
            }
            String title = m.group(2);
            if (title != null) {
                title = title.substring(1); // remove the leading /
            }
            System.out.println(idx+": Dir: "+dir+", Title: "+title);
        }
    }
}
Here is an all regex solution.
Edit: Allows for http://
Java source:
import java.util.*;
import java.lang.*;
import java.util.regex.*;
class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";
        // Compile regular expression
        Pattern pattern = Pattern.compile(patternStr);
        // Match 1st url
        System.out.println("Match 1st URL:");
        Matcher matcher = pattern.matcher(url);
        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }
        // Match 2nd url
        System.out.println("\nMatch 2nd URL:");
        matcher = pattern.matcher(url2);
        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }
        // Match 3rd url
        System.out.println("\nMatch 3rd URL:");
        matcher = pattern.matcher(url3);
        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else{ System.out.println("No match."); }
    }
}
Output:
Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story
Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE:
Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR:
TITLE: title-of-some-story
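For comparison, here is a stricter sketch that encodes the constraints stated in the question (lowercase dirs, a dash-separated lowercase title, a 22-character alphanumeric id) and uses named groups. It is not taken from either answer: the class UrlParts, the method parse, the group names and the sports/mlb sample are all assumptions for illustration, and it treats htmlpage.html as a fixed page name.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlParts {
    private static final Pattern URL = Pattern.compile(
            "(?:https?://)?[^/]+/"                    // optional scheme, then the site
            + "(?:(?<dir>[a-z]+(?:/[a-z]+)*)/)?"      // optional lowercase dirs
            + "\\d{4}/\\d{2}/\\d{2}/"                 // year/month/day
            + "(?:(?<title>[a-z]+(?:-[a-z]+)*)/)?"    // optional dash-separated title
            + "[A-Za-z0-9]{22}/htmlpage\\.html");     // 22-character id, page name

    // Returns {dir, title}; an entry is null when that part is absent.
    // Returns null when the URL does not match the layout at all.
    public static String[] parse(String url) {
        Matcher m = URL.matcher(url);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("dir"), m.group("title") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(
                "www.examplesite.com/sports/mlb/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html")));
        // [sports/mlb, title-of-some-story]
        System.out.println(Arrays.toString(parse(
                "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html")));
        // [null, null]
    }
}
```

Because matches() requires the whole URL to fit, a malformed URL yields null instead of a partial match.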