How to take a substring using pattern match - java

I have
String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">"
+ "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
I want to get all the ids between "group.php?id=" and "\"".
Example: 180552688740185
Here is my code:
String content1 = "";
Pattern script1 = Pattern.compile("group.php?id=.*?\"");
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
content1 += mscript1.group() + "\n";
}
But for some reason it does not work.
Can you give me some advice?

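Your pattern fails mainly because . and ? are regex metacharacters: in group.php?id= the ? makes the preceding p optional instead of matching a literal question mark. There is also no need for .*? to match the id, since it matches any character. The id consists only of digits, so \\d is enough.
You also need to capture the id in a group and then read that group.
// Pattern.quote makes the special characters "." and "?" match literally
String str = Pattern.quote("group.php?id=") + "(\\d+)";
Pattern script1 = Pattern.compile(str);
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
    content1 += mscript1.group(1) + "\n"; // Capture group 1 contains your id
}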
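For reference, the approach above can be put together as a small self-contained program. This is only a sketch: the class name GroupIdExtractor and the method extractIds are made up for illustration, and the ids are collected into a list instead of a newline-separated string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class GroupIdExtractor {
    // Pattern.quote makes "." and "?" literal; (\d+) captures the digits of the id.
    private static final Pattern GROUP_ID =
            Pattern.compile(Pattern.quote("group.php?id=") + "(\\d+)");

    static List<String> extractIds(String content) {
        List<String> ids = new ArrayList<>();
        Matcher m = GROUP_ID.matcher(content);
        while (m.find()) {
            ids.add(m.group(1)); // group 1 holds only the id, without the prefix
        }
        return ids;
    }

    public static void main(String[] args) {
        String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">\n"
                + "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
        System.out.println(extractIds(content)); // [180552688740185, 21392174]
    }
}
```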


How to find match for exact word using pattern matcher in java

I have shared my sample code below. I am trying to find the word "engine" in different strings, and I used a word boundary to match it.
It also matches when the word is prefixed with #, as in #engine.
It should only match the exact word.
private void checkMatch() {
String source1 = "search engines has ";
String source2 = "search engine exact word";
String source3 = "enginecheck";
String source4 = "has hashtag #engine";
String key = "engine";
System.out.println(isContain(source1, key));
System.out.println(isContain(source2, key));
System.out.println(isContain(source3, key));
System.out.println(isContain(source4, key));
}
private boolean isContain(String source, String subItem) {
String pattern = "\\b" + subItem + "\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
**Expected output**
false
true
false
false
**actual output**
false
true
false
true
For this case, you have to use regex alternation (OR) instead of a word boundary. \\b matches between a word character and a non-word character (or vice versa), so your regex finds a match in #engine because # is a non-word character.
private boolean isContain(String source, String subItem) {
String pattern = "(?m)(^|\\s)" + subItem + "(\\s|$)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
or
String pattern = "(?<!\\S)" + subItem + "(?!\\S)";
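As a rough check, the lookaround variant can be run against the four strings from the question. The class and method names here are hypothetical, and Pattern.quote is an addition not in the answer above: it is assumed that the search term should be treated as literal text even if it contains regex metacharacters.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ExactWordMatcher {
    // True when subItem occurs as a whitespace-delimited token in source.
    static boolean containsToken(String source, String subItem) {
        // (?<!\S): not preceded by a non-whitespace character
        // (?!\S):  not followed by a non-whitespace character
        String regex = "(?<!\\S)" + Pattern.quote(subItem) + "(?!\\S)";
        return Pattern.compile(regex).matcher(source).find();
    }

    public static void main(String[] args) {
        String key = "engine";
        System.out.println(containsToken("search engines has ", key));      // false
        System.out.println(containsToken("search engine exact word", key)); // true
        System.out.println(containsToken("enginecheck", key));              // false
        System.out.println(containsToken("has hashtag #engine", key));      // false
    }
}
```

Unlike a pattern that requires \s on both sides, the lookarounds also accept the word at the very start or end of the string.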
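Change your pattern as below. Note that this requires a whitespace character before the word, so it does not match when the word is at the very start of the string.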
String pattern = "\\s" + subItem + "\\b";
If you are looking for a literal text enclosed with spaces or start/end of the string, you can split the string with a mere whitespace pattern like \s+ and check if any of the chunks equals the search text.
Java demo:
String s = "Can't start the #engine here, but this engine works";
String searchText = "engine";
boolean found = Arrays.stream(s.split("\\s+"))
.anyMatch(word -> word.equals(searchText));
System.out.println(found); // => true
Change the regexp to
String pattern = "\\s"+subItem + "\\s";
This uses the predefined character class
\s A whitespace character: [ \t\n\x0B\f\r]
For more information, see the java.util.regex.Pattern Javadoc.
If you also want to support strings like these:
"has hashtag engine"
"engine"
you can add the start and end anchors (^ and $) as alternatives to the whitespace
by using this pattern:
String pattern = "(^|\\s)"+subItem + "(\\s|$)";

Java String tokens

I have a string line
String user_name = "id=123 user=aron name=aron app=application";
and I have a list that contains {user, cuser, suser}.
I have to get the user part from the string, so I have code like this:
List<String> userName = Config.getConfig().getList(Configuration.ATT_CEF_USER_NAME);
String result = null;
for (String param: user_name .split("\\s", 0)){
for(String user: userName ){
String userParam = user.concat("=.*");
if (param.matches(userParam )) {
result = param.split("=")[1];
}
}
}
The problem is that this does not work if the user value contains spaces.
For example:
String user_name = "id=123 user=aron nicols name=aron app=application";
Here user has the value aron nicols, which contains a space. How can I write code that gets the exact user value, i.e. aron nicols?
If you want to split only on spaces that come right before a token followed by =, such as user=..., you can add a look-ahead condition like
split("\\s(?=\\S*=)")
This regex will split on
\\s a whitespace character
(?=\\S*=) only if it is followed by zero or more (*) non-space characters (\\S) and then =. The look-ahead (?=...) is a zero-length match, so the part it matches is not consumed as a separator and stays in the resulting tokens.
Demo:
String user_name = "id=123 user=aron nicols name=aron app=application";
for (String s : user_name.split("\\s(?=\\S*=)"))
System.out.println(s);
output:
id=123
user=aron nicols
name=aron
app=application
From your comment on the other answer, it seems that an = escaped with \ should not be treated as the separator between key and value but as part of the value. In that case you can add a negative look-behind that checks there is no \ before the =: placing (?<!\\\\) right before = requires that it is not preceded by \.
By the way, a regex that matches a literal \ is written as \\, and in a Java string literal each \ must itself be escaped, which is why we end up with \\\\.
So you can use
split("\\s(?=\\S*(?<!\\\\)=)")
Demo:
String user_name = "user=Dist\\=Name1, xyz src=activedirectorydomain ip=10.1.77.24";
for (String s : user_name.split("\\s(?=\\S*(?<!\\\\)=)"))
System.out.println(s);
output:
user=Dist\=Name1, xyz
src=activedirectorydomain
ip=10.1.77.24
Do it like this:
First split the input string using this regex:
" +(?=\\w+(?<!\\\\)=)"
This will give you 4 name=value tokens like this:
id=123
user=aron nicols
name=aron
app=application
Now you can just split on = to get your name and value parts.
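The two steps (split into key=value tokens, then split each token on =) might look like the sketch below. The names KeyValueParser, parse and firstValue are invented for this example, the hard-coded key list stands in for the configuration lookup from the question, and the split regex is the look-ahead variant with the escaped-= check, using \\s+ instead of a single \\s.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only.
class KeyValueParser {
    // Splits on whitespace that is followed by a token containing an unescaped '='.
    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : line.split("\\s+(?=\\S*(?<!\\\\)=)")) {
            // Limit 2 keeps any later '=' characters inside the value.
            String[] kv = token.split("=", 2);
            if (kv.length == 2) {
                result.put(kv[0], kv[1]);
            }
        }
        return result;
    }

    // Returns the value of the first candidate key that is present, or null.
    static String firstValue(Map<String, String> pairs, List<String> candidateKeys) {
        for (String key : candidateKeys) {
            if (pairs.containsKey(key)) {
                return pairs.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String line = "id=123 user=aron nicols name=aron app=application";
        // Stand-in for the configured list {user, cuser, suser}.
        List<String> userKeys = Arrays.asList("user", "cuser", "suser");
        System.out.println(firstValue(parse(line), userKeys)); // aron nicols
    }
}
```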
CODE FISH, this simple regex captures the user in Group 1: user=\s*(.*?)\s+name=
It will capture "Aron", "Aron Nichols", "Aron Nichols The Benevolent", and so on.
It relies on the knowledge that name= always follows user=
However, if you're not sure that the token following user is name, you can use this:
user=\s*(.*?)(?=$|\s+\w+=)
Here is how to use the second expression (for the first, just change the string in Pattern.compile):
String ResultString = null;
try {
Pattern regex = Pattern.compile("user=\\s*(.*?)(?=$|\\s+\\w+=)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group(1);
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}

Replace different Regex-Matches with Match-based results in Java

One common usage for regex is the replacement of the matches with something that is based on the matches.
For example a commit-text with ticket numbers ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (<> as example for some surroundings.)
This is very simple in Java
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
results in <ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if the input String contains several different matches, this no longer works. I tried slightly different code, replacing if (matcher.find()) { with while (matcher.find()) {. The result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?
You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").
The problem is that replace() replaces every occurrence of the ticket in the whole string each time it is called, so a ticket that is matched twice gets wrapped twice.
A better way is to replace one match at a time. The Matcher class has an appendReplacement method for this.
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);
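On Java 9 or later, the same one-match-at-a-time replacement can also be written with Matcher.replaceAll(Function), which runs the find/append loop internally. The sketch below assumes Java 9+, and the class name TicketWrapper is made up for this example. Matcher.quoteReplacement is used because the string returned by the function is interpreted as a replacement string, where $ and \ have special meaning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; requires Java 9 or later.
class TicketWrapper {
    private static final Pattern TICKET = Pattern.compile("ABC-\\d+");

    // Wraps every ticket number in angle brackets, one match at a time.
    static String wrapTickets(String message) {
        return TICKET.matcher(message)
                // quoteReplacement guards against '$' or '\' in computed replacements.
                .replaceAll(mr -> Matcher.quoteReplacement("<" + mr.group() + ">"));
    }

    public static void main(String[] args) {
        String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
        System.out.println(wrapTickets(message));
        // <ABC-9913>, <ABC-9915> - Bugfix: Some text. (<ABC-9913>,<ABC-9915>)
    }
}
```

The lambda is the place to compute a match-based result, for example looking the ticket up in a map, which a plain "<$0>" replacement string cannot do.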

Regular expression on a string

I have a String like the one below:
String phone = "(123) 456-7890";
Now I would like my program to verify that my input follows the same pattern as the string 'phone'.
I did the following:
if(phone.contains("([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")) {
//display pass
}
else {
//display fail
}
It didn't work. I tried other combinations too, but nothing worked.
Questions:
1. How can I achieve this without using 'Pattern', as above?
2. How can I do this with Pattern? I tried the pattern below:
Pattern pattern = Pattern.compile("(\d+)");
Matcher match = pattern.matcher(phone);
if (match.find()) {
//Displaypass
}
String#matches checks if a string matches a pattern:
if (phone.matches("\\(\\d{3}\\) \\d{3}-\\d{4}")) {
//Displaypass
}
The pattern is a regular expression. Therefore I had to escape the round brackets, as they have a special meaning in regex (they denote capturing groups).
contains() only checks if a string contains the substring passed to it.
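The difference between the three checks can be seen side by side in a short sketch. The class name PhoneCheck and the method isPhone are hypothetical. The pattern is the one from the answer above, precompiled into a constant so that it is not recompiled on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class PhoneCheck {
    private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches(); // the whole string must match
    }

    public static void main(String[] args) {
        String phone = "(123) 456-7890";
        System.out.println(phone.contains("\\d{3}"));       // false: searches for the literal text \d{3}
        System.out.println(phone.contains("456"));          // true: plain substring search
        System.out.println(isPhone(phone));                 // true
        System.out.println(isPhone("call (123) 456-7890")); // false: extra text before the number
        System.out.println(PHONE.matcher("call (123) 456-7890").find()); // true: find() accepts a partial match
    }
}
```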
I'm not going to dive too deeply into regex syntax, but there definitely is something off with your regex.
"([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
It contains ( and ), which have a special meaning in regex. Escape them:
"\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
and you'll also have to escape each \ in the Java string literal, which gives the final
"\\([0-9][0-9][0-9]\\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
You can write like:
Pattern pattern = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");
Matcher matcher = pattern.matcher(sPhoneNumber);
if (matcher.matches()) {
System.out.println("Phone Number Valid");
}
It appears that your problem is that you didn't escape the parentheses, so your Regex is failing. Try this:
\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
This works
String PHONE_REGEX = "[(]\\b[0-9]{3}\\b[)][ ]\\b[0-9]{3}\\b[-]\\b[0-9]{4}\\b";
String phone1 = "(1234) 891-6762";
boolean b = phone1.matches(PHONE_REGEX);
System.out.println("is phone: " + phone1 + " :Valid = " + b);
String phone2 = "(143) 456-7890";
b = phone2.matches(PHONE_REGEX);
System.out.println("is phone: " + phone2 + " :Valid = " + b);
Output:
is phone: (1234) 891-6762 :Valid = false
is phone: (143) 456-7890 :Valid = true

Replace String in Java with regex and replaceAll

Is there a simple solution to parse a String by using regex in Java?
I have to adapt a HTML page. Therefore I have to parse several strings, e.g.:
href="/browse/PJBUGS-911"
=>
href="PJBUGS-911.html"
The strings differ only in the ID (e.g. 911). My first idea looks like this:
String input = "";
String output = input.replaceAll("href=\"/browse/PJBUGS\\-[0-9]*\"", "href=\"PJBUGS-???.html\"");
I want to replace everything except the ID. How can I do this?
Would be nice if someone can help me :)
You can capture substrings that were matched by your pattern, using parentheses. And then you can use the captured things in the replacement with $n where n is the number of the set of parentheses (counting opening parentheses from left to right). For your example:
String output = input.replaceAll("href=\"/browse/PJBUGS-([0-9]*)\"", "href=\"PJBUGS-$1.html\"");
Or if you want:
String output = input.replaceAll("href=\"/browse/(PJBUGS-[0-9]*)\"", "href=\"$1.html\"");
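A quick way to try the second variant on a fragment with several links is sketched below. The class name HrefRewriter and the sample HTML are made up for illustration, and [0-9]+ is used instead of [0-9]* on the assumption that an ID always has at least one digit.

```java
// Hypothetical helper class for illustration only.
class HrefRewriter {
    // Rewrites href="/browse/PJBUGS-<id>" to href="PJBUGS-<id>.html".
    static String rewrite(String html) {
        // $1 in the replacement refers to the text captured by the first parentheses.
        return html.replaceAll("href=\"/browse/(PJBUGS-[0-9]+)\"", "href=\"$1.html\"");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/browse/PJBUGS-911\">911</a> <a href=\"/browse/PJBUGS-34234\">34234</a>";
        System.out.println(rewrite(html));
        // <a href="PJBUGS-911.html">911</a> <a href="PJBUGS-34234.html">34234</a>
    }
}
```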
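This does not use a regex, but it may still solve your problem, assuming input holds exactly one attribute such as href="/browse/PJBUGS-911".
output = "href=\"" + input.substring(input.lastIndexOf("/") + 1, input.lastIndexOf("\"")) + ".html\"";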
This is how I would do it:
public static void main(String[] args)
{
String text = "href=\"/browse/PJBUGS-911\" blahblah href=\"/browse/PJBUGS-111\" " +
"blahblah href=\"/browse/PJBUGS-34234\"";
Pattern ptrn = Pattern.compile("href=\"/browse/(PJBUGS-[0-9]+?)\"");
Matcher mtchr = ptrn.matcher(text);
while(mtchr.find())
{
String match = mtchr.group(0);
String insMatch = mtchr.group(1);
String repl = match.replaceFirst(match, "href=\"" + insMatch + ".html\"");
System.out.println("orig = <" + match + "> repl = <" + repl + ">");
}
}
This just shows the regex and replacements, not the final formatted text, which you can get by using Matcher.replaceAll:
String allRepl = mtchr.replaceAll("href=\"$1.html\"");
If just interested in replacing all, you don't need the loop -- I used it just for debugging/showing how regex does business.
