Pattern matching in Java for alphabets and numbers - java

I am doing some simple pattern matching, which is not working. Please help.
The string is:
The number *TER8347834SC* has problems.
The string contains an identifier, TER8347834SC, which may change between messages, so I need a regex to match it when comparing the string. I am using the regex [A-Z0-0] for TER8347834SC, which doesn't match.
I know this is quite simple, but I have tried many times; please help me with this.

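I think you mean this:
"\\b[A-Z0-9]+\\b"
Note that the \\b word boundaries are needed so that the match covers a whole token rather than part of a longer word. Also, [A-Z0-0] matches only a single character (an uppercase letter or 0); the range should be 0-9, and the + lets it repeat.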
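A minimal sketch of how that pattern could be used to pull the identifier out of the message. The class and method names (IdExtractor, findId) are made up for illustration, and it assumes the identifier is the first token in the message made up only of uppercase letters and digits.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdExtractor {
    // Whole tokens consisting only of uppercase letters and digits.
    private static final Pattern ID = Pattern.compile("\\b[A-Z0-9]+\\b");

    // Returns the first such token in the message, if there is one.
    public static Optional<String> findId(String message) {
        Matcher m = ID.matcher(message);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // "The" is skipped: after "T" there is no word boundary, because "h" follows.
        System.out.println(findId("The number TER8347834SC has problems.").orElse("no match"));
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the regex for every message.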

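Try using this one:
([A-Z]+[0-9]+)
Note that this matches letters followed by digits, so for TER8347834SC it captures only TER8347834; append [A-Z]* inside the group if the trailing letters are needed.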

Your pattern should be like this ONLY if the message is always the same as you mentioned:
String pattern = "The number (.*) has problems.";

Related

replacing string with regex in java

I think I have a decent handle wrt matching strings using Regex in Java, but now I am trying to replace strings using Regex and not having much success.
Simply put, I am trying to find where there is a digit immediately followed by a constant string "CMR", then adding a space between the digit and the "CMR" substring. "0CMR" should become "0 CMR", "5CMR" should become "5 CMR", etc. Any preceding non-digit should be left as it was.
So my source string is "theStringThat0CMRhas"
my command is:
replaceAll("[0-9]CMR", "[0-9] CMR");
I get the added space in the result, but the result becomes "theStringThat[0-9] CMRhas" which obviously isn't what I need. Somehow I need to tell Regex not to replace with "[0-9]", but with whatever it matched on in the first place.
I know I'm doing this wrong, but I don't know what's right.
Any help appreciated.
Thanks,
Tom
You want to use a capturing group:
replaceAll("([0-9])CMR", "$1 CMR")
$1 references the first group in the match, denoted by parentheses.
Also, [0-9] can be substituted with \d (written "\\d" in a Java string literal).
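As a small runnable sketch of the capturing-group approach (the class and method names CmrSpacing and addSpace are only illustrative):

```java
public class CmrSpacing {
    // Inserts a space between a digit and a directly following "CMR".
    public static String addSpace(String input) {
        // ([0-9]) captures the digit; $1 in the replacement puts it back.
        return input.replaceAll("([0-9])CMR", "$1 CMR");
    }

    public static void main(String[] args) {
        System.out.println(addSpace("theStringThat0CMRhas")); // theStringThat0 CMRhas
        System.out.println(addSpace("5CMR and xCMR"));        // 5 CMR and xCMR
    }
}
```

A CMR that is preceded by a non-digit is left untouched, because the pattern requires the digit to be present.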
Try this:
replaceAll("(?<=\\d)(?=\\D)"," ")
It uses a positive lookbehind for a digit and a positive lookahead for a non-digit character, so it matches the empty position between the two and inserts a space there.
If you only want to do this where CMR follows the digit, use:
"(?<=\\d)(?=CMR)"
You should wrap the digit part of the regex in a capturing group and refer to that group in the replacement argument. Your code becomes:
replaceAll("([0-9])CMR", "$1 CMR");
For more regex knowledge, please read this document
https://www.tutorialspoint.com/java/java_regular_expressions.htm
Good luck!
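For comparison, the lookaround variant suggested earlier can be sketched like this. The class and method names are hypothetical; the two patterns are the ones from that answer.

```java
public class LookaroundSpacing {
    // Space only at positions with a digit behind and "CMR" ahead.
    public static String spaceBeforeCmr(String input) {
        return input.replaceAll("(?<=\\d)(?=CMR)", " ");
    }

    // Space at every position with a digit behind and a non-digit ahead.
    public static String spaceAfterDigits(String input) {
        return input.replaceAll("(?<=\\d)(?=\\D)", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceBeforeCmr("theStringThat0CMRhas")); // theStringThat0 CMRhas
        System.out.println(spaceAfterDigits("ab12cd3"));            // ab12 cd3
    }
}
```

Because lookarounds consume no characters, the match is an empty position and nothing has to be copied back into the replacement. The more general pattern also fires on digits that are not followed by CMR, which may or may not be what you want.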
A good starting point for reading about regex in Java is http://www.regular-expressions.info/java.html
On the same site, the page on replacement strings is http://www.regular-expressions.info/replacetutorial.html
In the replacement string, $ followed by a number refers to part of the match: $0 is the whole regex match, and $1, $2 and so on are the capturing groups. You can use these to refer to what was matched:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("[0-9]CMR","$0");
System.out.println(resultString);
This prints theStringThat0CMRhas: each match is replaced with itself, so nothing changes.
You obviously don't want that, so let's change the replacement a little:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("([0-9])CMR","$0 CMR");
System.out.println(resultString);
Now the parentheses define a group, but the replacement still uses $0, so each match is replaced with the whole match, a space, and CMR.
Your result would now be: theStringThat0CMR CMRhas
So let's reference the group that holds the digit instead:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("([0-9])CMR","$1 CMR");
System.out.println(resultString);
Now your answer will be: theStringThat0 CMRhas
It finds a digit followed by CMR and replaces the match with that digit, a space, and then CMR.
What you are trying to do is, I believe, called a backreference, though I am unsure. Regex is not my strong suit either.

Finding whole word only in Java string search

I'm running into the problem of finding a searched pattern within a larger pattern in my Java program. For example, I'll try and find all for loops, but will stumble upon formula. Most of the suggestions I've found talk about using regular expression searches like
String regex = "\\b"+keyword+"\\b";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(searchString);
or some variant of this. The issue I'm running into is that I'm crawling through code, not a book-like text where there are spaces on either side of every word. For example, this will miss for(, which I would like to find. Is there another clever way to find whole words only?
Edit: Thanks for the suggestions. How about cases in which the keyword starts at the very beginning of the string? For example,
class Vec {
public:
...
};
where I'm searching for class (or alternatively public). The patterns suggested by Thanga, Austin Lee, npinti, and Kai Iskratsch do not work in this case. Any ideas?
Note that \b is a word boundary: a zero-width position where a word character (letter, digit or underscore) meets a non-word character or the start or end of the string. An opening bracket is a non-word character, so "\\b"+keyword+"\\b" should already match the for in for(; if it does not, check how keyword and searchString are built.
My first suggestion was to replace "\\b"+keyword+"\\b" with "[\\b(]"+keyword+"[\\b)]", but that does not work.
In regex syntax, the square brackets denote a set of which the regex engine will attempt to match any character it contains.
As per this previous SO question, it would seem that \b and [\b] are not the same. Whilst \b represents a word boundary, inside square brackets it does not (in many regex flavours [\b] represents a backspace character). If you want to allow an explicit bracket as an alternative, use a group instead of a set: replace "\\b"+keyword+"\\b" with "(\\b|\\()"+keyword+"(\\b|\\))".
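When used with matches(), the regex has to cover the whole input, so it should allow 0 or more characters on either side of the keyword. The code change below does that, although it then also accepts the keyword inside a longer word such as formula: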
String regex = ".*("+keyword+").*";
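You could modify your regex to require a non-word character on either side of the keyword, for example
"[^\\w]" + "for" + "[^\\w]" using the Pattern class in Java. Note that this needs an actual character on both sides, so it will not match a keyword at the very start or end of the string.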
For your reference:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Basically you will have to adapt your regex to all the possible patterns it can find. But considering you're actually dealing with code, you are better off building a parser/tokenizer for that language, or using one that already exists. Then all you have to do is run through the tokens to find the ones you want.
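If you want to check how a plain \b search behaves on code-like input before reaching for a tokenizer, here is a small sketch. WholeWordSearch and countWholeWord are made-up names, and wrapping the keyword in Pattern.quote is my own addition so that keywords containing regex metacharacters are taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordSearch {
    // Counts occurrences of keyword that are not part of a longer word.
    public static int countWholeWord(String source, String keyword) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b");
        Matcher m = p.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "for(" counts, "formula" does not.
        System.out.println(countWholeWord("for(int i=0;;) { formula(); }", "for")); // 1
        // A keyword at the very start of the input counts as well.
        System.out.println(countWholeWord("class Vec {\npublic:\n};", "class"));    // 1
    }
}
```

This still cannot tell a keyword from the same word inside a comment or string literal, which is where a tokenizer becomes the better tool.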

Java Regex - Finding specific string within a String

I am trying to match a string that start with the set word "hotel", then a hyphen, then a word of any length, then another hyphen and finally a number of any length.
Edit: Dima gave the solution I needed in the comments of this question! Thanks Dima.
Further edit: elaborating on Dima's answer, adding capturing groups making it easier to retrieve the information entered, and correcting the last bit to only accept digits:
^hotel-(.+)-(\d+)
^hotel-(.)*$
(But hotel-something WILL work, according to your initial statement).
So, if you actually want something like:
hotel-XXXXXX-YYYYYYY
Then the regex is :
^hotel-(.)*-(.)*$
Try a regex online tester like http://www.regextester.com/.
If you want to match the start of the input, use ^.
So if you have ^hotel-\b, that forces hotel- to be at the start of the string.
As a note, you can use $ for the end of the string in a similar way.
\bhotel-[^\s-]+-[^\s-]+\b
\b means that there must be a word boundary at that position
[^\s-] means anything but - or whitespace
https://regex101.com/r/mH3vY8/1
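To show how the capturing groups from the edited question could be read back in Java, here is a rough sketch. HotelCode and parse are hypothetical names, and the trailing $ is my assumption that nothing may follow the number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HotelCode {
    // hotel-<name>-<digits>; group 1 is the name, group 2 the number.
    private static final Pattern HOTEL = Pattern.compile("^hotel-(.+)-(\\d+)$");

    // Returns {name, number}, or null if the input has a different shape.
    public static String[] parse(String input) {
        Matcher m = HOTEL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("hotel-paris-123");
        System.out.println(parts[0] + " / " + parts[1]); // paris / 123
    }
}
```

Because (.+) is greedy, a name that itself contains hyphens (hotel-new-york-42) ends up entirely in group 1, and only the digits after the last hyphen go to group 2.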

get the last portion of the link using java regex

I have an arraylist links. All links having same format abc.([a-z]*)/\\d{4}/
List<String> links = new ArrayList<>();
links.add("abc.com/2012/aa");
links.add("abc.com/2014/dddd");
links.add("abc.in/2012/aa");
I need to get the last portion of every link, i.e. the part after the domain name. The domain name can be anything (.com, .in, .edu etc.).
/2012/aa
/2014/dddd
/2012/aa
This is the output I want. How can I get this using regex?
Thanks
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
(see here for background)
Why use regex? Perhaps a simpler solution is to use String.split("/"), which gives you an array of substrings of the original string, split by /. See this question for more info.
Note that String.split() does in fact take a regex to determine the boundaries upon which to split. However you don't need a regex in this case and a simple character specification is sufficient.
Try the regex below and use a capturing group, delimited by parentheses (), to extract the part you need.
\.[a-zA-Z]{2,3}(/.*)
Pattern description :
dot followed by two or three letters followed by forward slash then any characters
Sample code:
Pattern pattern = Pattern.compile("\\.[a-zA-Z]{2,3}(/.*)");
Matcher matcher = pattern.matcher("abc.com/2012/aa");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
/2012/aa
Note:
You can make it more precise by using \\.[a-zA-Z]{2,3}(/\\d{4}/.*) if there are always 4 digits in the pattern.
String result = s.replaceAll("^[^/]*","");
s would be the string in your list.
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
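Why not just use the URI class?
output = new URI(link).getPath()
Note that this only returns the part after the domain when the link includes a scheme, e.g. http://abc.com/2012/aa; for a bare abc.com/2012/aa the whole string is treated as the path.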
Try this one and use the second capturing group
(.*?)(/.*)
Use foreach loop to iterate over list.
Use substring and indexOf('/').
FOR EXAMPLE
String s="abc.com/2014/dddd";
System.out.println(s.substring(s.indexOf('/')));
OUTPUT
/2014/dddd
Or you can go for split method.
System.out.println("/" + s.split("/", 2)[1]); // OUTPUT: /2014/dddd (split drops the leading /, so it is added back)
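Putting the substring suggestion together over the whole list might look like the sketch below. LinkPaths, pathOf and pathsOf are illustrative names, and it assumes the links carry no scheme such as http://, as in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkPaths {
    // Everything from the first '/' onwards, or "" if there is no '/'.
    public static String pathOf(String link) {
        int slash = link.indexOf('/');
        return slash >= 0 ? link.substring(slash) : "";
    }

    // Applies pathOf to every link, keeping the original order.
    public static List<String> pathsOf(List<String> links) {
        List<String> result = new ArrayList<>();
        for (String link : links) {
            result.add(pathOf(link));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("abc.com/2012/aa", "abc.com/2014/dddd", "abc.in/2012/aa");
        System.out.println(pathsOf(links)); // [/2012/aa, /2014/dddd, /2012/aa]
    }
}
```

The explicit check for a missing slash avoids the StringIndexOutOfBoundsException that substring(-1) would throw.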

Regex which matches a string containing at least the specified characters

I have a huge dictionary which I'm trying to look through using a regex. What I would like to do is to find all the words in the dictionary which contain at least one occurrence of each character I provide, in no particular order.
Right now I can find words which only contain the specified characters but like I said that is not exactly what I want.
Example:
I want at least one occurrence of each of the following characters {b, a, d}
astring.matches(regex)
I would expect words like:
badder,
baddest,
baffled
Notice they all contain at least one occurrence of each character, but in no particular order, and other characters are present in the strings.
Anyone know how to do this? Other suggestions are also welcome!
You need a series of look-aheads:
^(?=.*b)(?=.*a)(?=.*d).*
which is a pain to construct. However, you can ease the pain by using regex to build it:
String regex = "^" + "bad".replaceAll(".", "(?=.*$0)") + ".*";
If using repeatedly with String.matches(), you would be better to use the following code, because every call to String.matches() compiles the regex again (there is no caching):
// do this once
Pattern pattern = Pattern.compile(regex);
// reuse the pattern many times
if (pattern.matcher(input).matches())
You can use a lookahead to do this if it's available
(?=.*b)(?=.*a)(?=.*d)
With String.matches() you need to append .* so that the pattern consumes the whole word. However this is quite inefficient. Any reason you can't use multiple String.indexOf checks?
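A sketch of the lookahead idea applied to a word list, with the pattern compiled once and reused. ContainsAllChars, buildPattern and filter are made-up names, and quoting each character with Pattern.quote is my addition so that characters like . or * are taken literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ContainsAllChars {
    // Builds ^(?=.*c1)(?=.*c2)....* with one lookahead per required character.
    public static Pattern buildPattern(String required) {
        StringBuilder regex = new StringBuilder("^");
        for (char c : required.toCharArray()) {
            regex.append("(?=.*").append(Pattern.quote(String.valueOf(c))).append(")");
        }
        regex.append(".*");
        return Pattern.compile(regex.toString());
    }

    // Keeps the words that contain every required character at least once.
    public static List<String> filter(List<String> words, String required) {
        Pattern pattern = buildPattern(required); // compiled once, reused per word
        List<String> hits = new ArrayList<>();
        for (String word : words) {
            if (pattern.matcher(word).matches()) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("badder", "bed", "baffled", "dab");
        System.out.println(filter(words, "bad")); // [badder, baffled, dab]
    }
}
```

For a very large dictionary, a loop of word.indexOf(c) >= 0 checks per required character is a reasonable alternative that avoids regex altogether.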
