I'm trying to use a regex to find phone numbers of the form (xxx) xxx-xxxx inside a text document full of messy HTML.
The text file has lines like:
<div style="font-weight:bold;">
<div>
<strong>Main Phone:
<span style="font-weight:normal;">(713) 555-9539
<strong>Main Fax:
<span style="font-weight:normal;">(713) 555-9541
<strong>Toll Free:
<span style="font-weight:normal;">(888) 555-9539
and my code contains:
Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = p.matcher(line); //from buffered reader, reading 1 line at a time
if (m.matches()) {
stringArray.add(line);
}
The problem is that even when I compile very simple patterns, nothing is returned. And if it doesn't even recognize something like \d, how am I going to match a telephone number? For example:
Pattern p = Pattern.compile("\\d+"); //Returns nothing
Pattern p = Pattern.compile("\\d"); //Returns nothing
Pattern p = Pattern.compile("\\s+"); //Returns lines
Pattern p = Pattern.compile("\\D"); //Returns lines
This is really confusing to me, and any help would be appreciated.
Use Matcher#find() instead of matches(). matches() succeeds only if the complete line is a phone number, whereas find() searches the line and returns true for sub-string matches as well.
Matcher m = p.matcher(line);
Also, the line above suggests you're creating the same Pattern and Matcher again on every iteration of your loop. That's inefficient. Compile the Pattern once outside the loop, then reset and reuse the same Matcher for each line.
Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = null;
String line = reader.readLine();
if (line != null && (m = p.matcher(line)).find()) {
stringArray.add(line);
}
while ((line = reader.readLine()) != null) {
m.reset(line);
if (m.find()) {
stringArray.add(line);
}
}
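To make the difference between matches() and find() concrete, here is a minimal sketch. The class name PhoneFinder and the helper extractPhones are made up for illustration; unlike the code above, it collects the matched numbers themselves via group() rather than the whole lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFinder {
    // Compiled once and shared; same pattern as in the question.
    static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Collects every phone number found in the given lines (the numbers, not the lines).
    static List<String> extractPhones(List<String> lines) {
        List<String> phones = new ArrayList<>();
        Matcher m = PHONE.matcher("");
        for (String line : lines) {
            m.reset(line);            // reuse the same Matcher for each line
            while (m.find()) {        // a line may contain more than one number
                phones.add(m.group());
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        String line = "<span style=\"font-weight:normal;\">(713) 555-9539";
        // false: the whole line is not a phone number
        System.out.println(PHONE.matcher(line).matches());
        // true: a phone number occurs somewhere inside the line
        System.out.println(PHONE.matcher(line).find());
        System.out.println(extractPhones(Arrays.asList("<strong>Main Phone:", line)));
    }
}
```

This also explains the \\d versus \\D results in the question: with matches(), "\\d+" only succeeds on a line made up entirely of digits.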
Or, instead of a regexp, you can use Google's libphonenumber library, as follows:
Set<String> phones = new HashSet<>();
PhoneNumberUtil util = PhoneNumberUtil.getInstance();
// source is the text to scan; a default region ("US" here) is needed for numbers not written in international format
Iterator<PhoneNumberMatch> iterator = util.findNumbers(source, "US").iterator();
while (iterator.hasNext()) {
phones.add(iterator.next().rawString());
}
Here is my code:
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern p = Pattern.compile("(?<=.com\\/excludethis).*\\/"); //search for this pattern
Matcher m = p.matcher(stringToSearch); //match pattern in StringToSearch
String store= "";
// print match and store match in String Store
if (m.find())
{
String theGroup = m.group(0);
System.out.format("'%s'\n", theGroup);
store = theGroup;
}
//repeat the process
Pattern p1 = Pattern.compile("(.*)[^\\/]");
Matcher m1 = p1.matcher(store);
if (m1.find())
{
String theGroup = m1.group(0);
System.out.format("'%s'\n", theGroup);
}
I want to match everything that comes after excludethis and before the next /.
With the regex "(?<=.com\\/excludethis).*\\/" I match 123456/ and store that in String store. After that, "(.*)[^\\/]" excludes the / and gives 123456.
Can I do this in one line, i.e. combine these two regexes? I can't figure out how to combine them.
Just as you used a positive lookbehind, you can use a positive lookahead and change your regex to this:
(?<=.com/excludethis).*(?=/)
Also, in Java you don't need to escape /.
Your modified code:
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern p = Pattern.compile("(?<=.com/excludethis).*(?=/)"); // search for this pattern
Matcher m = p.matcher(stringToSearch); // match pattern in StringToSearch
String store = "";
// print match and store match in String Store
if (m.find()) {
String theGroup = m.group(0);
System.out.format("'%s'\n", theGroup);
store = theGroup;
}
System.out.println("Store: " + store);
This prints:
'123456'
Store: 123456
which is the value you wanted to capture.
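One caveat, shown as a small sketch: .* is greedy, so if the URL has more than one / after the value, the lookahead version runs up to the last one. A negated character class [^/]+ stops at the first slash. The class name SegmentExtractor and the second URL are made up for illustration, and the dot in .com is escaped here so it only matches a literal dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentExtractor {
    // Greedy version with lookbehind and lookahead.
    static final Pattern GREEDY = Pattern.compile("(?<=\\.com/excludethis).*(?=/)");
    // Alternative: take everything up to (not including) the first slash.
    static final Pattern NEGATED = Pattern.compile("(?<=\\.com/excludethis)[^/]+");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String one = "https://example.com/excludethis123456/moretext";
        String two = "https://example.com/excludethis123456/more/text"; // hypothetical input
        System.out.println(firstMatch(GREEDY, one));  // 123456
        System.out.println(firstMatch(GREEDY, two));  // 123456/more
        System.out.println(firstMatch(NEGATED, two)); // 123456
    }
}
```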
This may be useful for you :)
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern pattern = Pattern.compile("excludethis([\\d\\D]+?)/");
Matcher matcher = pattern.matcher(stringToSearch);
if (matcher.find()) {
String result = matcher.group(1);
System.out.println(result);
}
If you don't want to use regex, you could just try with String::substring*
String stringToSearch = "https://example.com/excludethis123456/moretext";
String exclusion = "excludethis";
System.out.println(stringToSearch.substring(stringToSearch.indexOf(exclusion)).substring(exclusion.length(), stringToSearch.substring(stringToSearch.indexOf(exclusion)).indexOf("/")));
Output:
123456
* Definitely don't actually use this
I have a string like this:
font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;
How can I get the value of the color and the value of background-color?
color:#ffffff;
background-color:#ff0000;
I have tried the following code, but the result is not what I expected.
Pattern pattern = Pattern.compile("^.*(color:|background-color:).*;$");
The result will display:
font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;
If you want multiple matches in a string, don't anchor the pattern with ^ and $. If those match, the whole string is consumed as a single match, so there is nothing left to match again.
Also, use a lazy quantifier like *?. It stops matching as soon as the rest of the pattern can match.
This is the regex you should use:
(color:|background-color:)(.*?);
Group 1 is either color: or background-color:, group 2 is the color code.
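A minimal sketch of how that regex could be used from Java, assuming you want each property with its value. The class name CssColors and the method extract are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssColors {
    // Group 1: the property name including the colon; group 2: the value (lazy, up to the next ';').
    static final Pattern COLOR = Pattern.compile("(color:|background-color:)(.*?);");

    // Returns property -> value in the order the properties appear in the style string.
    static Map<String, String> extract(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = COLOR.matcher(style);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String style = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
        for (Map.Entry<String, String> e : extract(style).entrySet()) {
            System.out.println(e.getKey() + e.getValue() + ";");
        }
    }
}
```

Note that color: would also match inside a property such as border-color:, so this is only safe when the string contains no other properties ending in color.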
To do this you can use a positive lookbehind, (?<=abc). It checks that abc comes immediately before the match but doesn't include it in the match. After that you can simply select the hex code, like this:
String s = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman";
Pattern pattern = Pattern.compile("(?<=color:)#.{6}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
Pattern pattern = Pattern.compile("color\\s*:\\s*([^;]+)\\s*;\\s*background-color\\s*:\\s*([^;]+)\\s*;");
Matcher matcher = pattern.matcher("font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;");
if (matcher.find()) {
System.out.println("color:" + matcher.group(1));
System.out.println("background-color:" + matcher.group(2));
}
No need to describe the whole input, only the relevant part(s) that you're looking to extract.
The regex color:(#[\\w\\d]+); does the trick for me:
String input = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
String regex = "color:(#[\\w\\d]+);";
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Notice that m.group(1) returns the matching group which is inside the parenthesis in the regex. So the regex actually matches the whole color:#ffffff; and color:#ff0000; parts, but the print only handles the number itself.
Use a CSS parser like ph-css
String input = "font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;";
final CSSDeclarationList cssPropertyList =
CSSReaderDeclarationList.readFromString(input, ECSSVersion.CSS30);
System.out.println(cssPropertyList.get(1).getProperty() + " , "
+ cssPropertyList.get(1).getExpressionAsCSSString());
System.out.println(cssPropertyList.get(2).getProperty() + " , "
+ cssPropertyList.get(2).getExpressionAsCSSString());
Prints:
color , #ffffff
background-color , #ff0000
Find more about ph-css on github
I have been trying to use Pattern and Matcher to find a specific pattern. I created the regex through this website, which shows that the pattern is found in the text file I want to read.
Extra info: the code works like this: start reading the text file;
when a line like D10 is met, enter another loop and collect the information until the
next such line is found. Repeat this process until EOF.
My sample text file:
D14*
Y7620D03*
X247390Y66680D03*
X251540Y160150D03*
G01Y136780*
G03X-374970Y133680I3100J0*
D17*
Y7620D03*
X247390Y66680D03*
X251540Y160150D03*
G01Y136780*
G03X-374970Y133680I3100J0*
My pattern code in java:
private final Pattern PinNamePattern = compile("(D[1-9][0-9])\\*");
private final Pattern LocationXYPattern = compile("^(G0[1-3])?(X|Y)(-?[\\d]+)(D0[1-3])?\\*",Pattern.MULTILINE);
private final Pattern LocationXYIJPattern = compile("^(G0[1-3])?X(-?[\\d]+)?Y(-?[\\d]+)?I?(-?[\\d]+)?J?(-?[\\d]+)?(D0[1-3])?\\*",Pattern.MULTILINE);
My code in java:
while ((line = br.readLine()) != null) {
Matcher pinNameMatcher = PinNamePattern.matcher(line);
//If found Aperture Name
if (pinNameMatcher.find()) {
currentApperture = pinNameMatcher.group(1);
System.out.println(currentApperture);
pinNameMatcher.reset();
//Start matching Location X Y I J
//Will keep looping as long as next aperture name not found
//Second While loop
while (!(pinNameMatcher.find()) ) {
line = br.readLine();
Matcher locXYMatcher = LocationXYPattern.matcher(line);
Matcher locXYIJMatcher = LocationXYIJPattern.matcher(line);
LineNumber++;
if (locXYMatcher.find()) {
System.out.println("XY FOUND");
if (locXYIJMatcher.find()) {
System.out.println("XYIJ FOUND");
}
}
However, when I read the file in Java, the pattern simply isn't found. Is there anything I missed, or am I doing it wrong? I have tried removing the "^" and the MULTILINE flag, but the pattern is still not found.
Your regex looks and works fine; it's possible you aren't searching with it properly.
String s = "G03X-374970Y133680I3100J0*";
Pattern pattern = Pattern.compile("^(G0[1-3])?X(-?[\\d]+)?Y(-?[\\d]+)?I?(-?[\\d]+)?J?(-?[\\d]+)?(D0[1-3])?\\*");
Matcher m = pattern.matcher(s);
while (m.find()) {
String match = m.group(0); // a new name: s is already declared above
System.out.println(match); // prints G03X-374970Y133680I3100J0*
}
In your updated code, you are looking for the second and third pattern only when the first pattern matches, which is probably not what you want. Try using this as a foundation and improving upon it:
while ((line = br.readLine()) != null) {
Matcher pinNameMatcher = PinNamePattern.matcher(line);
if (pinNameMatcher.find()) {
currentApperture = pinNameMatcher.group(0);
System.out.println(currentApperture);
}
Matcher locXYMatcher = LocationXYPattern.matcher(line);
if (locXYMatcher.find()) {
System.out.println(locXYMatcher.group(0));
}
Matcher locXYIJMatcher = LocationXYIJPattern.matcher(line);
if (locXYIJMatcher.find()) {
System.out.println(locXYIJMatcher.group(0));
}
}
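If the goal is to group the coordinate lines under the aperture they follow, a single loop that remembers the current aperture avoids the nested reads. This is only a sketch under that assumption: the class name ApertureParser and the method parse are made up, and the patterns are the ones from the question with [\\d] simplified to \\d.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApertureParser {
    static final Pattern APERTURE = Pattern.compile("(D[1-9][0-9])\\*");
    static final Pattern XY = Pattern.compile("^(G0[1-3])?(X|Y)(-?\\d+)(D0[1-3])?\\*");
    static final Pattern XYIJ = Pattern.compile(
            "^(G0[1-3])?X(-?\\d+)?Y(-?\\d+)?I?(-?\\d+)?J?(-?\\d+)?(D0[1-3])?\\*");

    // Maps each aperture name to the coordinate lines that follow it.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> byAperture = new LinkedHashMap<>();
        String current = null; // aperture seen most recently, if any
        for (String line : lines) {
            Matcher aperture = APERTURE.matcher(line);
            if (aperture.find()) {
                current = aperture.group(1);
                byAperture.computeIfAbsent(current, k -> new ArrayList<>());
            } else if (current != null
                    && (XY.matcher(line).find() || XYIJ.matcher(line).find())) {
                byAperture.get(current).add(line);
            }
        }
        return byAperture;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "D14*", "Y7620D03*", "X247390Y66680D03*", "G01Y136780*",
                "D17*", "G03X-374970Y133680I3100J0*");
        System.out.println(parse(sample));
    }
}
```

Because every line is read exactly once by the outer loop, no line is skipped and readLine() can never return null in the middle of an inner loop.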
I have this string:
text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz
I need to extract 123+456+789
What I have done so far is:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "text=(.*)&";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0));
System.out.println(m.group(1));
}
And I got all text until the last & which is: 123+456+789&xxxxxxxxx&yyyyyyyyyy while the requested output is: 123+456+789
Any suggestions how to fix it (regex is mandatory)?
Use a negated character class:
String ps = "text=([^&]*)";
The value you need will be in Group 1.
The [^&] matches any character but an ampersand.
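To compare the behaviour, here is a small sketch that runs the greedy pattern from the question, a lazy variant, and the negated character class against the same input. The class name GroupDemo and the helper firstGroup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 1 of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
        System.out.println(firstGroup("text=(.*)&", s));    // greedy: runs to the last &
        System.out.println(firstGroup("text=(.*?)&", s));   // lazy: stops at the first &
        System.out.println(firstGroup("text=([^&]*)", s));  // negated class: stops before the first &
        // Without any & in the input, only the negated class still matches.
        System.out.println(firstGroup("text=(.*?)&", "text=abc"));
        System.out.println(firstGroup("text=([^&]*)", "text=abc"));
    }
}
```

The negated class needs no backtracking and does not require a trailing &, which is why it is usually the more robust choice for this kind of delimiter-bounded value.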
You almost have it. You need to make your regex lazy (non-greedy), like this:
String ps = "text=(.*?)&";
             here ---^
Try this regex :
([0-9+]+)
Link : https://regex101.com/r/xU2zF4/1
java code :
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "([0-9+]+)";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0)); // the whole match: 123+456+789
System.out.println(m.group(1)); // group 1, also 123+456+789
}
I have a string like str = Adobe Flash Player 11.4.402.287 (11.3 MB), and I need to extract only Adobe Flash Player as the output. Please suggest how.
I tried using Regex like :
String str = "Adobe Flash Player 11.4.402.287 (11.3 MB)";
Pattern p = Pattern.compile("^[a-zA-Z]+([0-9]+).*");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
As suggested by @MarkoTopolink, the regexp [\\p{L}\\s]+ helped me. Thanks.
Try this:
String str = "Adobe Flash Player 11.4.402.287 (11.3 MB)";
Pattern p = Pattern.compile("^([a-zA-Z ]+)([0-9]+).*");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
There are two problems in your attempt:
Grouping is done using (); you did not define a group for the text you actually wanted.
You need to add a space to the character class to match more than one word.
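One detail of the pattern above is that group 1 includes the space before the version number. A possible variant, sketched here with a made-up class name NameExtractor, makes the group lazy and requires whitespace followed by a digit after it, so the captured name comes back without the trailing space. It uses \\p{L} rather than a-zA-Z so that non-ASCII letters are accepted too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameExtractor {
    // Lazy group of letters and whitespace, followed by whitespace and the first digit.
    static final Pattern NAME = Pattern.compile("^([\\p{L}\\s]+?)\\s+\\d");

    // Returns the leading product name, or null if the input has no "name <digit>" shape.
    static String leadingName(String s) {
        Matcher m = NAME.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingName("Adobe Flash Player 11.4.402.287 (11.3 MB)")); // Adobe Flash Player
    }
}
```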
You can also use split with a regex:
String str = "Adobe Flash Player 11.4.402.287 (11.3 MB)";
String [] strs = str.split("([ |0-9|.|(.*MB)]*) [ |0-9|.|(.*MB)]*");
for (String strng : strs) {
System.out.println(strng.trim());
}