Regex java a regular expression to extract string except the last number

How can I extract all characters from a string except the trailing number (if one exists) in Java? I found how to extract the last number in a string using the regex [0-9.]+$, but I want the opposite.
Examples:
abd_12df1231 => abd_12df
abcd => abcd
abcd12a => abcd12a
abcd12a1 => abcd12a

One option is to match, from the start of the string (^), one or more word characters (\w+) followed by a non-digit (\D):
^\w+\D
As suggested in the comments, you could widen the set of characters to match with a character class, ^[\w-]+\D, or use a dot to match any character: ^.+\D
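A minimal sketch of this approach applied to the examples from the question. The class and method names (TrimTrailingNumber, withoutTrailingNumber) are hypothetical, and returning the input unchanged when nothing matches is an assumption. Note that ^\w+\D needs at least two characters, the last of which is a non-digit, so an all-digit or single-character input does not match and falls through to that fallback.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimTrailingNumber {
    // \w+ is greedy, so for input made only of word characters
    // the match ends at the last non-digit character.
    private static final Pattern PREFIX = Pattern.compile("^\\w+\\D");

    // Hypothetical helper: returns the input without its trailing digits.
    static String withoutTrailingNumber(String input) {
        Matcher m = PREFIX.matcher(input);
        // Assumption: when the pattern does not match, return the input unchanged.
        return m.find() ? m.group() : input;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abd_12df1231", "abcd", "abcd12a", "abcd12a1"}) {
            System.out.println(s + " => " + withoutTrailingNumber(s));
        }
    }
}
```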

If you want to remove one or more digits at the end of the string, you may use
s = s.replaceFirst("[0-9]+$", "");
See the regex demo
To also remove floats, use
s = s.replaceFirst("[0-9]*\\.?[0-9]+$", "");
See another regex demo
Details (for the full-match pattern (?s)^(.*?)\d*\.?\d+$):
(?s) - a Pattern.DOTALL inline modifier (makes . match line break chars too)
^ - start of string
(.*?) - Capturing group #1: any 0+ chars, as few as possible
\d*\.?\d+ - an integer or float value
$ - end of string.
Java demo:
List<String> strs = Arrays.asList("abd_12df1231", "abcd", "abcd12a", "abcd12a1", "abcd12a1.34567");
for (String str : strs)
    System.out.println(str + " => \"" + str.replaceFirst("[0-9]*\\.?[0-9]+$", "") + "\"");
Output:
abd_12df1231 => "abd_12df"
abcd => "abcd"
abcd12a => "abcd12a"
abcd12a1 => "abcd12a"
abcd12a1.34567 => "abcd12a"
To actually match a substring from start till the last number, you may use
(?s)^(.*?)\d*\.?\d+$
See the regex demo
Java code:
String s = "abc234 def1.566";
Pattern pattern = Pattern.compile("(?s)^(.*?)\\d*\\.?\\d+$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

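With this regex you can capture the trailing digit(s):
\d+$
You could save those digits and call string.replace(lastDigits, ""). Be aware that String.replace removes every occurrence of the target, so digits earlier in the string that equal the captured value would be removed too.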
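A minimal sketch of that idea, assuming a hypothetical helper named stripTrailingDigits. It cuts the string at the position where the trailing match starts, so only the final run of digits is dropped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripTrailingDigits {
    private static final Pattern TRAILING_DIGITS = Pattern.compile("\\d+$");

    // Hypothetical helper: removes only the run of digits at the very end.
    static String stripTrailingDigits(String input) {
        Matcher m = TRAILING_DIGITS.matcher(input);
        if (!m.find()) {
            return input; // no trailing digits, nothing to remove
        }
        // Cut at the start of the match rather than calling replace(),
        // which would also remove the same digits earlier in the string.
        return input.substring(0, m.start());
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDigits("abcd12a1")); // prints "abcd12a"
        System.out.println("abcd12a1".replace("1", ""));      // prints "abcd2a"
    }
}
```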

Related

regex to capture the string between a word and first occurrence of a character

I want to capture the string after the last slash and before the first occurrence of a backslash (\).
sample data:
sessionId=30a793b1-ed7e-464a-a630; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemSummary/amex","Accept":"application/json, application/javascript","sessionId":"ggh76734",
targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;
sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemList/basket","Accept":"application/json, application/javascript","sessionId":"ggh76734", targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;
sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; Url=https://www.example.com/mybook/order/newbooking/; sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemList/","Accept":"application/json, application/javascript","sessionId":"ggh76734",targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;List item
sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW; ,"myreferer":"https://www.example.com/mybook/order/newbooking/itemList/basket?id=76734&para=jhjdfhj&type=new&ordertype=kjkf", "Accept":"application/json, application/javascript","sessionId":"ggh76734", targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;
Expecting the below output:
amex
basket
''(empty string)
basket
I have built the regex below to capture it, but it is not 100% accurate: it captures some additional text.
Regex
\bmyreferer\\\":\\\"\S+\/(.*?)\\\",
Could you please help me to improve the regex to get desired output?

You could use a negated character class with a capture group:
\bmyreferer":"[^"]+/([^/"]*)"
\bmyreferer":" Match literally preceded by a word boundary
[^"]+/ Match 1+ times any char except ", followed by a /
( Capture group 1
[^/"]* Optionally match (to also match an empty string) any char except / and "
)" Close group 1 and match "
regex demo | Java demo
Example code
String regex = "\\bmyreferer\":\"[^\"]+/([^/\"]*)\"";
String string = "sessionId=30a793b1-ed7e-464a-a630; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; ,\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemSummary/amex\",\"Accept\":\"application/json, application/javascript\",\"sessionId\":\"ggh76734\", targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;\n\n"
+ "sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW; ,\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/basket\",\"Accept\":\"application/json, application/javascript\",\"sessionId\":\"ggh76734\", targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;\n\n"
+ "sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; Url=https://www.example.com/mybook/order/newbooking/; sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; ,\"myreferer\":\"https://www.example.com/mybook/order/newbooking/itemList/\",\"Accept\":\"application/json, application/javascript\",\"sessionId\":\"ggh76734\",targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;List item";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println("Group 1 value: " + matcher.group(1));
}
Output
Group 1 value: amex
Group 1 value: basket
Group 1 value:
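The fourth sample in the question has a query string after the last path segment (.../itemList/basket?id=...), which the pattern above would include in Group 1. One possible variation, offered as a sketch rather than as part of the original answer, excludes ? from the captured class and then skips the rest of the value. It assumes the query string itself contains no /; the class name RefererSegment and the shortened sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefererSegment {
    // Group 1: text after the last '/', stopping at '?', '/' or '"'.
    // The trailing [^"]* skips an optional query string up to the closing quote.
    private static final Pattern LAST_SEGMENT =
            Pattern.compile("\\bmyreferer\":\"[^\"]+/([^/\"?]*)[^\"]*\"");

    // Hypothetical helper: collects Group 1 of every match in the input.
    static List<String> lastSegments(String log) {
        List<String> result = new ArrayList<>();
        Matcher m = LAST_SEGMENT.matcher(log);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened, made-up variants of the sample lines from the question.
        String log =
                "sid=a; ,\"myreferer\":\"https://www.example.com/mybook/itemSummary/amex\",\"Accept\":\"x\"\n"
              + "sid=b; ,\"myreferer\":\"https://www.example.com/mybook/itemList/\",\"Accept\":\"x\"\n"
              + "sid=c; ,\"myreferer\":\"https://www.example.com/mybook/itemList/basket?id=76734&type=new\",\"Accept\":\"x\"";
        for (String segment : lastSegments(log)) {
            System.out.println("Group 1 value: " + segment);
        }
    }
}
```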

Regex pattern to separate values using comma but retain commas used within parenthesis

I am trying to modify a regular expression so that it keeps commas used within parentheses and separates all other values.
Existing pattern : ([^\\s,]+)\\s*=>([^,]+)
Updated pattern : ([^\\s,]+)\\s*=>([^(,)]+)
Java code:
public static void main(String[] args) {
    String softParms = "batch_code => 'batchCd',user_id => 'SYSUSER',thread_pool => 'tpName',business_date => FN_DATE_ARG(null,0),rerun_number => 0,max_timeout_mins => 0,raise_error => false,thread_notifications => false";
    //Pattern paramPattern = Pattern.compile("([^\\s,]+)\\s*=>([^,]+)");
    Pattern paramPattern = Pattern.compile("([^\\s,]+)\\s*=>([^(,)]+)");
    Matcher matcher = paramPattern.matcher(softParms);
    while (matcher.find()) {
        String param = matcher.group(1);
        String value = matcher.group(2);
        System.out.println("Param: " + param + ", Value: " + value);
    }
}
The value for business_date should come out as FN_DATE_ARG(null,0), but the code returns either FN_DATE_ARG(null or FN_DATE_ARG.
I would appreciate any help on this!

You can use
([^\s,]+)\s*=>\s*(.*?)(?=\s*,\s*\w+\s*=>|$)
See the regex demo. Details:
([^\s,]+) - Group 1: one or more chars other than whitespace and a comma
\s*=>\s* - => enclosed with zero or more whitespaces
(.*?) - Group 2: any zero or more chars other than line break chars as few as possible
(?=\s*,\s*\w+\s*=>|$) - up to the leftmost sequence of 0+ whitespaces, comma, 0+ whitespaces, 1+ word chars, 0+ whitespaces, =>, or end of string.
In your code, use
Pattern paramPattern = Pattern.compile("([^\\s,]+)\\s*=>\\s*(.*?)(?=\\s*,\\s*\\w+\\s*=>|$)");
See the Java demo online.
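Putting the pieces together, a self-contained sketch that collects every parameter into a map might look like this. The class name SoftParmsParser, the parse helper, and the choice of a LinkedHashMap (to keep insertion order) are assumptions for illustration, not part of the original code; the input is a shortened version of the string from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SoftParmsParser {
    // Group 1 is the key; Group 2 is the value, which lazily extends until
    // the lookahead sees ", nextKey =>" or the end of the string.
    private static final Pattern PARAM = Pattern.compile(
            "([^\\s,]+)\\s*=>\\s*(.*?)(?=\\s*,\\s*\\w+\\s*=>|$)");

    // Hypothetical helper: returns key/value pairs in the order they appear.
    static Map<String, String> parse(String softParms) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher matcher = PARAM.matcher(softParms);
        while (matcher.find()) {
            params.put(matcher.group(1), matcher.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        String softParms = "batch_code => 'batchCd',user_id => 'SYSUSER',"
                + "business_date => FN_DATE_ARG(null,0),rerun_number => 0,"
                + "raise_error => false";
        parse(softParms).forEach((k, v) ->
                System.out.println("Param: " + k + ", Value: " + v));
    }
}
```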

Why use an overly complex regular expression when all that is needed is a single one-line call to String#replaceAll:
String softParms = "batch_code => 'batchCd',user_id => 'SYSUSER',thread_pool => 'tpName',business_date => FN_DATE_ARG(null,0),rerun_number => 0,max_timeout_mins => 0,raise_error => false,thread_notifications => false";
String businessDate = softParms.replaceAll(".*\\bbusiness_date => (.*?)\\s*(?:,[^,\\s]+ =>.*|$)", "$1");
System.out.println(businessDate);
This prints:
FN_DATE_ARG(null,0)
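The pattern matches everything up to and including the key business_date, captures its value lazily with (.*?), and then consumes the rest of the string, starting either at the comma before the next key (,[^,\s]+ =>.*) or at the end of the string. Because the lazy group stops at the first comma that is followed by a key and =>, it captures FN_DATE_ARG(null,0), and replacing the whole match with $1 leaves only that value.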

Find duplicate char sequences in String by regex in Java

I have an input string and I want to use regex to check if this string has = and $, e.g:
Input:
name=alice$name=peter$name=angelina
Output: true
Input:
name=alicename=peter$name=angelina
Output: false
My regex doesn't work:
Pattern pattern = Pattern.compile("([a-z]*=[0-9]*$])*");
Matcher matcher = pattern.matcher("name=rob$name=bob");

With .matches(), you may use
Pattern pattern = Pattern.compile("\\p{Lower}+=\\p{Lower}+(?:\\$\\p{Lower}+=\\p{Lower}+)*"); // With `matches()` to ensure whole string match
Details
\p{Lower}+ - 1+ lowercase letters (use \p{L} to match any letter, or \p{Alpha} to match only ASCII letters)
= - a = char
\p{Lower}+ - 1+ lowercase letters
(?:\$\p{Lower}+=\p{Lower}+)* - 0 or more occurrences of:
\$ - a $ char
\p{Lower}+=\p{Lower}+ - 1+ lowercase letters, = and 1+ lowercase letters.
See the Java demo:
List<String> strs = Arrays.asList("name=alice$name=peter$name=angelina", "name=alicename=peter$name=angelina");
Pattern pattern = Pattern.compile("\\p{Lower}+=\\p{Lower}+(?:\\$\\p{Lower}+=\\p{Lower}+)*");
for (String str : strs)
    System.out.println("\"" + str + "\" => " + pattern.matcher(str).matches());
Output:
"name=alice$name=peter$name=angelina" => true
"name=alicename=peter$name=angelina" => false

You have an extra ] and need to escape $ to use it as a literal character. You also need to match the last parameter, which has no trailing $, so use
([a-z]*=[a-z0-9]*(\$|$))*
• [a-z]*= : match a-z zero or more times, then the = character
• [a-z0-9]*(\$|$): match a-z and 0-9 zero or more times, followed by either a $ character or the end of the string.
• ([a-z]*=[a-z0-9]*(\$|$))*: match zero or more occurrences of such pairs.
Note: for strict matching, use + (one or more) instead of *:
([a-z]+=[a-z0-9]+(\$|$))*
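A quick way to try the strict variant is to run it through matches(), which requires the whole input to match. This is only a sketch; the class name PairCheck and the isValid helper are hypothetical. Because the outer group is still quantified with *, an empty string is accepted, and so is an input that ends with a trailing $, which may or may not be what you want.

```java
import java.util.regex.Pattern;

public class PairCheck {
    // Strict variant: at least one character on each side of '='.
    private static final Pattern PAIRS =
            Pattern.compile("([a-z]+=[a-z0-9]+(\\$|$))*");

    // Hypothetical helper: true when the whole input is a sequence of pairs.
    static boolean isValid(String input) {
        return PAIRS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("name=alice$name=peter$name=angelina")); // true
        System.out.println(isValid("name=alicename=peter$name=angelina"));  // false
        System.out.println(isValid(""));                                    // true: zero pairs
    }
}
```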

Java Regex Word Extract exclude with special char

Below are the string values:
"method" <in> abs
("method") <in> abs
method <in> abs
I want to extract only the word method. I tried the regex below, but it includes the special characters as well:
"(^[^\\<]*)"
Output for the above regex:
"method"
("method")
method
My expected output:
method
method
method

^\\W*(\\w+)
You can use this and take capture group 1. See the demo:
https://regex101.com/r/sS2dM8/20
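In Java this could be wired up as follows. It is a minimal sketch in which the class name FirstWord and the helper firstWord are assumptions, and null is returned when the input contains no word characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstWord {
    // Skip leading non-word characters, then capture the first run of word characters.
    private static final Pattern FIRST_WORD = Pattern.compile("^\\W*(\\w+)");

    // Hypothetical helper: returns Group 1, or null if there is no match.
    static String firstWord(String input) {
        Matcher m = FIRST_WORD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"\"method\" <in> abs", "(\"method\") <in> abs", "method <in> abs"};
        for (String s : inputs) {
            System.out.println(firstWord(s)); // prints "method" for each input
        }
    }
}
```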

A couple of words on your "(^[^<]*)" regex: it does not match because it has the beginning-of-string anchor ^ after ", which is never the case. However, even if you remove it, "([^<]*)" will not match the last case, where " and ( are missing. You need to make them optional. Also note that the brackets must be escaped, and that the order of quotes and brackets differs from your input.
So, your regex could be fixed as
^\(?"?(\b[^<]*)\b"?\)?(?=\s+<)
See demo
However, I'd suggest using a replaceAll approach:
String rx = "(?s)\\(?\"?(.*?)\"?\\)?\\s+<.*";
System.out.println("\"My method\" <in> abs".replaceAll(rx, "$1"));
See IDEONE demo
If the strings start with ("My method, you can also add ^ to the beginning of the pattern: String rx = "(?s)^\\(?\"?(.*?)\"?\\)?\\s+<.*";.
The regex (?s)^\\(?\"?(.*?)\"?\\)?\\s+<.* matches:
(?s) makes . match a newline symbol (may not be necessary)
^ - matches the beginning of a string
\\(? - matches an optional (
\"? - matches an optional "
(.*?) - matches and captures into Group 1 any characters as few as possible
\"? - matches an optional "
\\)? - matches an optional )
\\s+ - matches 1 or more whitespace
< - matches a <
.* - matches 0 or more characters to the end of string.
With $1, we restore the group 1 text in the resulting string.

In fact it is not too complicated.
Here is my answer:
Pattern pattern = Pattern.compile("([a-zA-Z]+)");
String[] myStrs = {
"\"method\"",
"(\"method\")",
"method"
};
for (String s : myStrs) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        System.out.println(matcher.group(0));
    }
}
The output is:
method
method
method
You just need to use:
[a-zA-Z]+

regex to remove round brackets from a string

I have a string:
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried the following code:
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern is not matching.
Please Help
Thanks

Use
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity

You may use
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
See a Java demo.
Details
The "\\[{2}(.*)\\|(.*)]]" with matches() is parsed as a ^\[{2}(.*)\|(.*)]]\z pattern that matches a string that starts with [[, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 1, then matches a |, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]]. See the regex demo.
The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing every chunk of 1+ non-word characters with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()) and replacing all spaces with _ (.replace(" ", "_")) as the final touch.
