Java Regex to extract numbers and strings from a string - java

I want to extract numbers and strings from a string.
Example inputs:
"TU111 1998 SUMMER"
"TU-232 SU 1999"
"TU232 1999 SUMMER"
I was able to get them using two patterns, Pattern.compile("\\d+") and Pattern.compile("[a-zA-Z]+").
Is there a way to do it with one pattern?
The expected outcome is:
1=>TU 2=>111/232 3=>1998/1999 4=>SUMMER/SU

You can just combine the two regexes with alternation (|):
[0-9]+|[a-zA-Z]+

Try this:
Pattern pattern = Pattern.compile("((\\d+)|([a-zA-Z]+))");
Matcher matcher = pattern.matcher("TU111 1998 SUMMER");
while (matcher.find()) {
    System.out.println(matcher.group());
}
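Building on the single-pattern idea, here is a minimal sketch that collects the tokens into a list instead of printing them. The class name TokenExtractor and the method tokens are made up for illustration. Note that the tokens come back in order of appearance, so for "TU-232 SU 1999" the term precedes the year and the caller still has to decide which token is which.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TokenExtractor {
    // One pattern: a run of digits OR a run of ASCII letters.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[a-zA-Z]+");

    // Returns every digit run and letter run, in order of appearance.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("TU111 1998 SUMMER")); // [TU, 111, 1998, SUMMER]
        System.out.println(tokens("TU-232 SU 1999"));    // [TU, 232, SU, 1999]
    }
}
```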

You can combine the two regexes into one alternation, [a-zA-Z]+|[0-9]+. The code below was written for a different pattern, but it may give you a hint: updating the Pattern.compile() argument and the input string should be enough.
Pattern p = Pattern.compile("-?\\d+(,\\d+)*?\\.?\\d+?");
List<String> numbers = new ArrayList<String>();
Matcher m = p.matcher("your string");
while (m.find()) {
    numbers.add(m.group());
}
System.out.println(numbers);

Related

How can I get the second matcher in regex in Java? [duplicate]

I want to extract the second matcher in a regex pattern between - and _ in this string:
VA-123456-124_VRG.tif
I tried this:
Pattern mpattern = Pattern.compile("-.*?_");
But I get 123456-124 for the above regex in Java.
I need only 124.
How can I achieve this?
If you know that is your format, this will return the requested digits in group 1: everything before the underscore that is not a dash.
Pattern pattern = Pattern.compile("([^-]+)_");
I would use a formal pattern matcher here, to be as specific as possible. I would use this pattern:
^[^-]+-[^-]+-([^_]+).*
and then check the first capture group for the possible match. Here is a working code snippet:
String input = "VA-123456-124_VRG.tif";
String pattern = "^[^-]+-[^-]+-([^_]+).*";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
    System.out.println("Found value: " + m.group(1));
}
124
By the way, there is a one liner which would also work here:
System.out.println(input.split("[_-]")[2]);
But, the caveat here is that it is not very specific, and might fail for your other data.
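To make that caveat concrete, here is a small sketch comparing the anchored pattern with the split one-liner. The second file name, with only one dash, is a made-up example of "other data", and the class and method names are hypothetical. The pattern reports no match, while split quietly returns the wrong piece.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class for illustration only.
class SplitVersusPattern {
    private static final Pattern THIRD_FIELD =
            Pattern.compile("^[^-]+-[^-]+-([^_]+).*");

    // Returns the text between the second dash and the underscore,
    // or null when the input does not have that shape.
    static String viaPattern(String input) {
        Matcher m = THIRD_FIELD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Returns whatever happens to be the third piece after splitting on '_' or '-'.
    static String viaSplit(String input) {
        return input.split("[_-]")[2];
    }

    public static void main(String[] args) {
        String wellFormed = "VA-123456-124_VRG.tif";
        String oneDash = "VA-123456_VRG.tif"; // assumed malformed name

        System.out.println(viaPattern(wellFormed)); // 124
        System.out.println(viaSplit(wellFormed));   // 124
        System.out.println(viaPattern(oneDash));    // null
        System.out.println(viaSplit(oneDash));      // VRG.tif
    }
}
```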
You know you want only digits, so be more specific and read group 1: Pattern.compile("-([0-9]+)_");
Try using below regex:
.*-(.*?)_
What this does: the leading .* is greedy, so it first consumes as much as possible and then backtracks to the last - that is still followed by an _, which is the one just before 124. The lazy (.*?) then captures everything up to that _.
Demo: https://regex101.com/r/NWgZoH/1
JShell Output:
jshell> Pattern pattern = Pattern.compile(".*-(.*?)_");
pattern ==> .*-(.*?)_
jshell> Matcher matcher = pattern.matcher("VA-123456-124_VRG.tif");
matcher ==> java.util.regex.Matcher[pattern=.*-(.*?)_ region=0,21 lastmatch=]
jshell> if(matcher.find()){
...> System.out.println(matcher.group(1));
...> }
124
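For comparison, here is a minimal sketch that runs the pattern from the question and the greedy-prefix pattern against the same input. The helper name firstMatch is made up. The first pattern starts at the first dash, so it drags 123456 along; the second lets the greedy prefix skip ahead to the last usable dash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class GreedyPrefixDemo {
    // Returns the requested group of the first match, or null if nothing matches.
    static String firstMatch(String regex, int group, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "VA-123456-124_VRG.tif";

        // Lazy quantifier, but the match still begins at the FIRST dash.
        System.out.println(firstMatch("-.*?_", 0, input));     // -123456-124_

        // Greedy prefix pushes the dash as far right as possible.
        System.out.println(firstMatch(".*-(.*?)_", 1, input)); // 124
    }
}
```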
You have given very few test cases, but for the one you posted I think the regex below can help.
-.*-(.*)_
Then extract the first group.
If you just want a simple way to extract it, go with this:
public static void main(String[] args) {
    String s = "VA-123456-124_VRG.tif";
    System.out.println(s.split("[_-]")[2]);
}

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me the value along with the colons, i.e. I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
    System.out.println(matcher.group());
    ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to repeat the colons in the replacement, as in :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
    s = s.replaceFirst(matcher.group(), "*");
    System.out.println("replaced: " + s);
} else {
    System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex captured group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
    ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
UPDATE
Looking at your update, you only need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it, and once found, tries to match 1+ ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts after the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode the colons in the replacement pattern since they are hardcoded in the regex pattern.
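A small sketch of that difference, using the string from the question (the class and method names are made up): the lookaround pattern matches only the letters, the plain pattern matches the colons too, and both replacements end up with the same result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class LookaroundDemo {
    // Returns the text of the first match, or null if nothing matches.
    static String matched(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";

        // Lookarounds assert the colons without consuming them.
        System.out.println(matched("(?<=:)[a-zA-Z]+(?=:)", s)); // POST
        // The plain pattern consumes the colons as part of the match.
        System.out.println(matched(":[a-zA-Z]+:", s));          // :POST:

        // Both replacements produce something:*:/some/path/
        System.out.println(s.replaceFirst("(?<=:)[a-zA-Z]+(?=:)", "*"));
        System.out.println(s.replaceFirst(":[a-zA-Z]+:", ":*:"));
    }
}
```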
Original answer
You just need to access Group 1:
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
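To see the numbering in action, here is a minimal sketch that dumps group 0 (the whole match) followed by every numbered group. The helper name groups is made up for illustration. The second call uses a variant with three capturing groups to show how the indexes shift.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class GroupNumbering {
    // Returns group 0 (the whole match) followed by groups 1..groupCount(),
    // or an empty array when the pattern does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(Arrays.toString(groups(":([a-zA-Z]+):", s)));     // [:POST:, POST]
        System.out.println(Arrays.toString(groups("(:)([a-zA-Z]+)(:)", s))); // [:POST:, :, POST, :]
    }
}
```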
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
    m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

Regex for find data

I have used the regex (?:#\d{7}) to extract only the 7 digits after '#'.
For example, given a string like "#1234567890", the pattern above gives me the 7 digits after '#'.
Now the problem is that I have a string like "Referenc number #1234567890",
where the prefix "Referenc number #" is fixed.
I am looking for a regex that returns the number 1234567 from that string.
The string comes from a file that also contains other data.
You can try something like this:
String ref_no = "Referenc number #123456789";
Pattern p = Pattern.compile("Referenc number #([0-9]{7})");
Matcher m = p.matcher(ref_no);
while (m.find()) {
    System.out.println(m.group(1));
}
The ?: makes a group non-capturing, so if you put it around the hash sign separately, the hash is still used for matching but is excluded from the captured group. Read the digits from group 1:
(?:#)(\d{7})
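A quick sketch of what that pattern returns (class and method names are hypothetical): the whole match still contains the hash, because a non-capturing group only opts out of group numbering, not out of the match itself. The digits alone are in group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class NonCapturingDemo {
    private static final Pattern REF = Pattern.compile("(?:#)(\\d{7})");

    // Returns { whole match, group 1 }, or null when there is no match.
    static String[] wholeAndGroup(String input) {
        Matcher m = REF.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] parts = wholeAndGroup("Referenc number #1234567890");
        System.out.println(parts[0]); // #1234567
        System.out.println(parts[1]); // 1234567
    }
}
```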
If the String always starts with Referenc number #, you could just use the following code:
String text = "Referenc number #1234567890";
Pattern pattern = Pattern.compile("\\d{7}");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group());
}

regular expression text between two sign

I have a text in which I want to replace variables with their proper values; the variables are located between two # signs. When I use [/(?m)#.*?#/] to get these texts, it also returns the text before the first # and after the last #. How can I get only the text between each pair of # signs? Thanks in advance.
I use the String.split("") method in Java.
For example, I want to use it on the following String:
this is #the best# possible way #t#o do result!!!
and I want to get these two results:
the best
t
In Java you can use this regex to grab the value between the first and second #:
String repl = input.replaceFirst("(?m)^[^#]*#([^#]*)#.*$", "$1");
To grab the value between the first and last #:
String repl = input.replaceFirst("(?m)^[^#]*#(.*?)#[^#]*$", "$1");
To find multiple matches, use Pattern and Matcher:
Pattern p = Pattern.compile("#([^#]*)#");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
split() is the wrong tool to use here; use a Matcher instead.
String s = "this is #the best# possible way #t#o do result!!!";
Pattern p = Pattern.compile("#([^#]*)#");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}
Output
the best
t
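Since the stated goal is to replace the variables with values, here is one possible sketch that does the substitution with the same #([^#]*)# pattern and Matcher.appendReplacement. The class name HashTemplate, the method fill, and the lookup map are assumptions for illustration, as is the choice to leave unknown variable names untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class HashTemplate {
    private static final Pattern VAR = Pattern.compile("#([^#]*)#");

    // Replaces each #name# with values.get(name); names missing from the
    // map are left as they were (an assumption of this sketch).
    static String fill(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("the best", "A");
        values.put("t", "B");
        String s = "this is #the best# possible way #t#o do result!!!";
        System.out.println(fill(s, values)); // this is A possible way Bo do result!!!
    }
}
```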

java regex: extract text after delimiter?

I am new to regular expressions in Java. I would like to extract a string by using regular expressions.
This is my String: "Hello,World"
I would like to extract the text after ",". The result would be "World". I tried this:
final Pattern pattern = Pattern.compile(",(.+?)");
final Matcher matcher = pattern.matcher("Hello,World");
matcher.find();
But what would be the next step?
You don't need regex for this. You can simply split on the comma and get the 2nd element from the array:
System.out.println("Hello,World".split(",")[1]);
OUTPUT:
World
But if you want to use regex, you need to remove the ? from your pattern.
A ? after + makes the quantifier reluctant, so it will match only W and stop there.
You don't need that here. You need it to match as much as it can.
So use greedy matching instead.
Here's the code with the modified regex:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
OUTPUT:
World
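To see the two quantifiers side by side, here is a minimal sketch (the class name and the afterComma helper are made up) that runs both patterns on the same input and returns group 1:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class ReluctantVsGreedy {
    // Returns group 1 of the first match, or null if nothing matches.
    static String afterComma(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Reluctant: stops as soon as the overall pattern is satisfied.
        System.out.println(afterComma(",(.+?)", "Hello,World")); // W
        // Greedy: takes everything up to the end of the input.
        System.out.println(afterComma(",(.+)", "Hello,World"));  // World
    }
}
```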
Extending what you have, you need to remove the ? from your pattern to get greedy matching, and then process the matched group:
final Pattern pattern = Pattern.compile(",(.+)"); // removed your '?'
final Matcher matcher = pattern.matcher("Hello,World");
while (matcher.find()) {
    String result = matcher.group(1);
    // work with result
}
Other answers suggest different approaches to your problem and might offer better solution for what you need.
System.out.println( "Hello,World".replaceAll(".*,(.*)","$1") ); // output is "World"
You are using a reluctant quantifier, which selects only the single character W. Use a greedy one instead and print the content of the matched group:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output:
World
See Regex Pattern doc
