How to create regex expression for 3 links at once - java

I created a regex in Java that handles 2 links at once:
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/test0218.pdf
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/TestTes-09-05-2018.pdf
Regex:
String REGEX_LINK = "https:..downloads.test.test.testagain.tes.test-test.test.";
Pattern pattern = Pattern.compile(REGEX_LINK + ".[\\w*/]*.((\\d{2}-\\d{2}-)?\\d{4}).pdf");
But now I need a single regex that handles 3 links at once, and I don't know how to do that. I need help with this:
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/test0218.pdf
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/TestTes-09-05-2018.pdf
https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/01-01-18_Testt_Testing_ASB_Test_Final.pdf
I need one regex that extracts "0218" from the first link, "09-05-2018" from the second link and "01-01-18" from the third link.
Does anyone have an idea how to do this?

You could match two digits twice with an optional hyphen in between, and then optionally a hyphen followed by 4 or 2 digits.
Note that the pattern by itself does not verify a valid date.
(?<!\d)(\d{2}-?\d{2}(?:-(?:\d{4}|\d{2}))?)\S*\.pdf\b
Explanation
(?<!\d) Negative lookbehind, assert not a digit to the left
( Capture group 1
\d{2}-?\d{2} Match 2 digits, optional hyphen and 2 digits
(?:-(?:\d{4}|\d{2}))? Optionally match - and either 4 or 2 digits
) Close group 1
\S* Match optional non whitespace chars
\.pdf\b Match a dot and pdf followed by a word boundary
Or, if no other digits may follow between the captured part and the .pdf extension:
(?<!\d)(\d{2}-?\d{2}(?:-(?:\d{4}|\d{2}))?)[^\d\s]*\.pdf\b
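As a rough sketch of how the first pattern could be used from Java: the class name PdfLinkDateExtractor and the helper extract are made up for illustration, and the backslashes are doubled only because the pattern sits in a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PdfLinkDateExtractor {
    // Same pattern as in the answer; group 1 holds the digits (with optional hyphens).
    private static final Pattern DATE_PART = Pattern.compile(
            "(?<!\\d)(\\d{2}-?\\d{2}(?:-(?:\\d{4}|\\d{2}))?)\\S*\\.pdf\\b");

    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String extract(String link) {
        Matcher m = DATE_PART.matcher(link);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String base = "https://downloads.test.test.testagain.tes/test-test/test/te25st24w/te43s5t25x/0twt42ts/";
        System.out.println(extract(base + "test0218.pdf"));                                // 0218
        System.out.println(extract(base + "TestTes-09-05-2018.pdf"));                      // 09-05-2018
        System.out.println(extract(base + "01-01-18_Testt_Testing_ASB_Test_Final.pdf"));   // 01-01-18
    }
}
```

Because find() is used rather than matches(), the pattern does not have to describe the whole URL; the digits in the earlier path segments are skipped because none of them form two pairs of digits.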

Related

Given string filter out unique element from string using regex

I have this string and I want to extract the number that comes after the big number and the space, so in this case I want to extract 2 and 0.32. The regex below only matches decimal numbers, but I want to match both decimals and integers. Is there a way to do that?
String s = "ABB123,ABPP,ADFG0/AA/BHJ.S,392483492389 2,BBBB,YUIO,BUYGH/AA/BHJ.S,3232489880 0.32";
regex = .AA/BHJ.S,\d+ (\d+.?\d+)
https://regex101.com/r/ZqHDQ8/1
The problem is that \d+.?\d+ requires at least two digits: the first \d+ matches one or more digits, .? matches an optional character other than a line break, and the second \d+ again requires at least one digit.
Also, note that all literal dots must be escaped.
You can use
.AA/BHJ\.S,\d+\s+(\d+(?:\.\d+)?)
See the regex demo.
Details:
. - any one char
AA/BHJ\.S, - a AA/BHJ.S, string
\d+ - one or more digits
\s+ - one or more whitespaces
(\d+(?:\.\d+)?) - Group 1: one or more digits, and then an optional sequence of a dot and one or more digits.
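A minimal sketch of how this pattern might be applied in Java, collecting Group 1 of every match. The class name TrailingNumberFinder and the method findAll are illustrative names, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingNumberFinder {
    // The pattern from the answer, with backslashes doubled for the Java string literal.
    private static final Pattern TRAILING_NUMBER =
            Pattern.compile(".AA/BHJ\\.S,\\d+\\s+(\\d+(?:\\.\\d+)?)");

    // Hypothetical helper: returns Group 1 of each match, in order of appearance.
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TRAILING_NUMBER.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "ABB123,ABPP,ADFG0/AA/BHJ.S,392483492389 2,BBBB,YUIO,BUYGH/AA/BHJ.S,3232489880 0.32";
        System.out.println(findAll(s)); // [2, 0.32]
    }
}
```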
You could look for anything following /AA/BHJ with a reluctant quantifier, then use a capturing group that matches either one or more digits followed by a decimal separator and more digits, or just one or more digits.
/AA/BHJ.*?\s+(\d+\.\d+|\d+)
Here is a link to test the regex:
https://regex101.com/r/l5nMrD/1

Java Regex Troubles

I need to extract substrings from a string using regex. Preferably only a single regex is used, as it runs in a loop with 9 pre-existing regexes (i.e., so I can just add it to the ArrayList of available regexes).
The pattern of the strings will always be:
Between 4 and 8 characters from A-Z0-9, i.e. [A-Z0-9]{4,8}, followed by either
[A-Z]{1} or [A-Z0-9]{2} or another [A-Z0-9]{4,8}
For example:
“A1B1C1 ABCD E FGHI JK X0Y0Z0”
I’d want this to return four matches.
A1B1C1 & ABCD E & FGHI JK & X0Y0Z0
I've been trying to match the first part of {4,8} characters, followed by a non-greedy match for {1,2}. For example:
[A-Z0-9]{4,8}(\\s{1}[A-Z0-9]{1,2})*? and [A-Z0-9]{4,8}(\\s{1}[A-Z]{1}|\\s{1}[A-Z0-9]{2})*?
But this never returns more than the first {4,8} characters.
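The lazy quantifier *? at the end of your patterns matches as few repetitions as possible, which is zero, so only the first part is returned. You could use an optional part with a word boundary and an alternation to match either [A-Z0-9]{2} or [A-Z]: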
\b[A-Z0-9]{4,8}(?:\h+(?:[A-Z0-9]{2}|[A-Z]))?\b
\b Word boundary
[A-Z0-9]{4,8} Match 4 - 8 times A-Z0-9
(?: Non capture group
\h+ Match 1+ horizontal whitespace chars
(?:[A-Z0-9]{2}|[A-Z]) Match either 2 x A-Z0-9 or 1 x A-Z
)? Close non capture group and make it optional
\b Word boundary
In Java
String regex = "\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b";
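To see the pattern at work on the example string from the question, here is a small sketch. The class name CodeTokenMatcher and the method findAll are placeholder names chosen for this example. Note that \h requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeTokenMatcher {
    // 4 to 8 chars of A-Z0-9, optionally followed by whitespace and a 2-char or 1-letter suffix.
    private static final Pattern TOKEN =
            Pattern.compile("\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b");

    // Hypothetical helper: returns every full match in order of appearance.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("A1B1C1 ABCD E FGHI JK X0Y0Z0"));
        // [A1B1C1, ABCD E, FGHI JK, X0Y0Z0]
    }
}
```

The trailing \b is what keeps A1B1C1 from swallowing the first one or two letters of ABCD: after "A" or "AB" the next character is still a word character, so the optional part is dropped.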

Regular Expression that matches number with max 2 decimal places

I'm writing some simple code in Java/Android.
I want to create regex that matches:
0
123
123,1
123,44
and cuts off everything after the second digit after the comma.
My first idea was to do something like this:
^\d+(?(?=\,{1}$)|\,\d{1,2})
^ - from the beginning
\d+ - match all digits
?=\,{1}$ - and if there is a comma at the end
do nothing
else grab one or two more digits after the comma
but it doesn't match numbers without a comma, and I don't understand what is wrong with the regex.
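Java's regex engine does not support conditional constructs such as (?(condition)yes|no), so that pattern cannot work there. You may use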
^(\d+(?:,\d{1,2})?).*
and replace with $1. See the regex demo.
Details:
^ - start of string
(\d+(?:,\d{1,2})?) - Capturing group 1 matching:
\d+ - one or more digits
(?:,\d{1,2})? - an optional sequence of:
, - a comma
\d{1,2} - 1 or 2 digits
.* - the rest of the line that is matched and not captured, and thus will be removed.
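A short sketch of the replace step in Java. The class name DecimalTrimmer and the method trim are invented for this example, and String.replaceFirst is just one way to apply the replacement.

```java
class DecimalTrimmer {
    // Keeps the integer part plus at most two digits after the comma; drops the rest of the line.
    static String trim(String input) {
        return input.replaceFirst("^(\\d+(?:,\\d{1,2})?).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(trim("0"));        // 0
        System.out.println(trim("123"));      // 123
        System.out.println(trim("123,1"));    // 123,1
        System.out.println(trim("123,44"));   // 123,44
        System.out.println(trim("123,4567")); // 123,45
    }
}
```

Input that does not start with a digit is returned unchanged, because replaceFirst only rewrites text that the pattern actually matches.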
A basic regex: [0-9]+[, ]*[0-9]+ (note that this needs at least two digits, so it does not match a lone 0).
If you want to specify minimum and maximum lengths, use:
[0-9]{1,3}[, ]*[0-9]{0,2}
Here:
,{1}
says: exactly ONE ","
Try:
,{0,1}
for example.

Regex to find integer or decimal from a string in java in a single group?

I am trying (\d+|\d+\.\d+) on this sample string:
Oats 124 0.99 V 1.65
but it splits the decimal numbers into separate matches when I use the Pattern and Matcher classes in Java.
I want each number in a single group.
You don't need separate patterns for integer and floating-point numbers. Just make the decimal part optional and you can get both types of numbers from a single group.
(\d+(?:\.\d+)?)
Use the above pattern and get the numbers from group index 1.
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Oats 124 0.99 V 1.65";
Pattern regex = Pattern.compile("(\\d+(?:\\.\\d+)?)");
Matcher matcher = regex.matcher(s);
// Print group 1 of every match
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output:
124
0.99
1.65
Pattern explanation:
() Capturing group.
\d+ Matches one or more digits.
(?:) Non-capturing group.
(?:\.\d+)? Matches a dot followed by one or more digits. The ? after the non-capturing group makes the whole group optional.
OR
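Your regex will also work, but only if you change the order of the alternatives. Alternatives are tried from left to right, so in your version \d+ matches the integer part first and the \d+\.\d+ branch is never reached.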
(\d+\.\d+|\d+)
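The effect of the order can be checked with a small comparison. This is only an illustrative sketch; the class name AlternationOrderDemo and the helper findAll are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Hypothetical helper: returns group 1 of every match of the given regex.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Oats 124 0.99 V 1.65";
        // Integer branch first: the decimals are split at the dot.
        System.out.println(findAll("(\\d+|\\d+\\.\\d+)", s)); // [124, 0, 99, 1, 65]
        // Decimal branch first: each number comes back whole.
        System.out.println(findAll("(\\d+\\.\\d+|\\d+)", s)); // [124, 0.99, 1.65]
    }
}
```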
Try this pattern:
\d+(?:\.\d+)?
Edit:
\d+ match 1 or more digits
(?: non-capturing group (optional)
\. '.' character
\d+ 1 or more digits
)? close the non-capturing group
The question is not entirely clear, but the first problem I see is that . is a special character in regex meaning any character. You need to escape it with a backslash, as \. There are lots of regex cheat sheets out there, for example the JavaScript Regex Cheatsheet.
(\d+|\d+\.\d+)

String validation in java using regex

I have to validate a set of strings and do stuff with them. The acceptable formats are:
1/2
12/1/3
1/23/333/4
The code used for validation is:
if (str.matches("(\\d+\\/|\\d+){2,4}")) {
    // do some stuff
} else {
    // do other stuff
}
But it matches any integer with or without slashes, and I want to exclude the ones without slashes. How can I match only the valid patterns?
It looks like you want to find a number (a series of one or more digits, \d+) followed by one or more /number parts. If that is the case, you can write your regex as
\\d+(/\\d+)+
You can try
(\d+/){1,3}\d+
Here (\d+/){1,3} is digits followed by /, one to three times, and the final \d+ is the digits that follow.
Sample code:
System.out.println("1/23/333/4".matches("(\\d+/){1,3}\\d+")); // true
System.out.println("1/2".matches("(\\d+/){1,3}\\d+")); // true
System.out.println("12/1/3".matches("(\\d+/){1,3}\\d+")); // true
Pattern explanation:
( group and capture to \1 (between 1 and 3 times):
\d+ digits (0-9) (1 or more times)
/ '/'
){1,3} end of \1
\d+ digits (0-9) (1 or more times )
\\b\\d+(/\\d+){1,3}\\b
\b is a word boundary. This matches all tokens with 1 to 3 slashes, with the slashes surrounded by digits and the token surrounded by word boundaries.
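As a quick check of how the validation might look in the original if/else, here is a sketch using the (\d+/){1,3}\d+ variant. The class name SlashNumberValidator and the method isValid are illustrative names. The last line shows why the pattern from the question accepts a plain integer.

```java
import java.util.regex.Pattern;

class SlashNumberValidator {
    // One to three "digits/" parts followed by a final run of digits.
    private static final Pattern VALID = Pattern.compile("(\\d+/){1,3}\\d+");

    // Hypothetical helper: true only if the whole string has the expected shape.
    static boolean isValid(String str) {
        return VALID.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1/2"));        // true
        System.out.println(isValid("12/1/3"));     // true
        System.out.println(isValid("1/23/333/4")); // true
        System.out.println(isValid("123"));        // false
        System.out.println(isValid("1/2/3/4/5"));  // false

        // The pattern from the question splits "123" into several \d+ repetitions.
        System.out.println("123".matches("(\\d+\\/|\\d+){2,4}")); // true
    }
}
```

Since matches() already requires the whole string to match, no word boundaries or anchors are needed in this variant.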
