Java - regex any string between < and > [duplicate] - java

This question already has answers here:
regexp dot with square brackets or not
(2 answers)
Closed 5 years ago.
I have a String from which I need to remove every occurrence of: < any string here >
I tried line = line.replaceAll("<[.]+>", "");
but it gives the same String... How can I delete the < any string between > substrings?

As per my original comment...
Brief
Your regex <[.]+> says to match <, followed by the dot character . (literally) one or more times, followed by >
Removing [] will get you a semi-appropriate answer. The problem with this is that it's greedy, so it'll actually replace everything from the first occurrence of < to the last occurrence of > in the entire string. For example, in a<b>c<d>e the pattern <.+> matches <b>c<d> as a single match.
What you want is to either make the quantifier lazy or use a character class to ensure it's not going past the ending character.
Code
Method 1 - Lazy Quantifier
This method .*? matches any character any number of times, but as few as possible
<.*?>
Method 2 - Character Set
This method [^>]* matches any character except > any number of times
<[^>]*>
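Note: This method usually performs better than the first. Also, unlike the dot, [^>] matches line breaks, so it handles a < ... > that spans several lines.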

line = line.replaceAll("<[^>]*>", "");
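To compare the variants discussed above side by side, here is a minimal sketch. The class name TagStripper and its method names are made up for illustration; only String.replaceAll comes from the JDK.

```java
final class TagStripper {
    // Pattern from the question: [.] is a literal dot, so this only
    // removes things like <...> and leaves ordinary tags alone.
    static String stripOriginal(String s) {
        return s.replaceAll("<[.]+>", "");
    }

    // Greedy: .+ runs as far as it can, up to the LAST '>' in the string.
    static String stripGreedy(String s) {
        return s.replaceAll("<.+>", "");
    }

    // Method 1: the lazy quantifier stops at the first '>' it can.
    static String stripLazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Method 2: the negated character class can never run past a '>'.
    static String stripNegated(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String line = "a<b>c<d>e";
        System.out.println(stripOriginal(line)); // a<b>c<d>e
        System.out.println(stripGreedy(line));   // ae
        System.out.println(stripLazy(line));     // ace
        System.out.println(stripNegated(line));  // ace
    }
}
```

Note that replaceAll returns a new String, so the result has to be assigned back (as in line = line.replaceAll(...)) for the change to stick.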

Related

What is the functionality of this regex? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex and I am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know: w is supposed to match all word characters including 0-9 and underscore and $ matches stuff at the end of string.
In the replaceAll method, the first parameter is a regex. Every part of the string that matches the regex is replaced with the second parameter.
In simple cases replaceAll works like this:
str = "I,am,a,person"
str.replaceAll(",", " ") // I am a person
It matched all the commas and replaced them with a space.
In your case, each match is a single word character (\w: a letter, digit, or underscore), followed by zero or more further word characters (\w*).
The () around \w and \w* group them. So you have two groups: the first character and the remaining part. If you use regex101 or some similar website you can see a visualization of this.
Your replacement is $2 (the second group, i.e. the remaining part), followed by $1 (the first character), followed by ay.
Hope this clears it up for you.
Enclosing part of a regex in parentheses () makes it a capturing group.
Here you have 2 capturing groups: (\w) captures a single word character, and (\w*) captures zero or more.
$1 and $2 refer to the captured groups, first and second respectively.
Also, replaceAll handles each word individually.
So in this example, in 'Hello', 'H' is the first captured group and 'ello' is the second. The match is replaced by a reordered version, $2$1, which basically swaps the captured groups, with ay appended.
So '$2$1ay' gives 'elloHay'.
The same for the next word also.
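To see the two capturing groups at work, here is a small runnable sketch. The class and method names (PigLatinDemo, toPigLatin) are invented for this example; the pattern and replacement are the ones from the question.

```java
final class PigLatinDemo {
    // Group 1 = first word character, group 2 = the rest of the word.
    // The replacement emits group 2, then group 1, then the literal "ay".
    static String toPigLatin(String s) {
        return s.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        // "!" contains no word character, so it is left untouched.
        System.out.println(toPigLatin("Hello World !")); // elloHay orldWay !
    }
}
```

In the replacement string, $ followed by a digit is a group reference, which is unrelated to the $ anchor inside a pattern.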

Having difficulty understanding Java regex interpretation [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Can someone help me with the following Java regex expression? I've done some research but I'm having a hard time putting everything together.
The regex:
"^-?\\d+$"
My understanding of what each symbol does:
^ = matches the beginning of the line
- = indicates a range
? = does not occur or occurs once
\\d = matches the digits
+ = matches one or more of the previous thing.
$ = matches end of the line
Is the regex saying it only want matches that start or end with digits? But where do - and ? come in?
- only indicates a range if it's within a character class (i.e. square brackets []). Otherwise, it's a normal character like any other. With that in mind, this regex matches the following examples:
"-2"
"3"
"-700"
"436"
That is, a positive or negative integer: at least one digit, optionally preceded by a minus sign.
A regex is built from smaller pieces; the correct way to read yours is:
^ start of input
-? optional minus character
\\d+ one or more digits
$ end of input
This regex matches any positive or negative integer, like 0, -15, 558, -19663, ...
For details, check this good post: Reference - What does this regex mean?
"^-?\\d+$" is not a regex, it's a Java string literal.
Once the compiler has parsed the string literal, the string value is ^-?\d+$, which is a regex matching like this:
^ Matches beginning of input
- Matches a minus sign
? Makes previous match (minus sign) optional
\d Matches a digit (0-9)
+ Makes previous match (digit) match repeatedly (1 or more times)
$ Matches end of input
All-in-all, the regex matches a positive or negative integer number of unlimited length.
Note: A - only denotes a range when inside a [] character class, e.g. [4-7] is the range of characters between '4' and '7', while [3-] and [-3] are not ranges since the start/end value is missing, so they both just match a 3 or - character.
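A quick way to check the reading above is to run the pattern against a few inputs. This is only a sketch; the class name IntegerCheck and the method isInteger are hypothetical, while Pattern and Matcher are the standard java.util.regex classes.

```java
import java.util.regex.Pattern;

final class IntegerCheck {
    // The Java string literal "^-?\\d+$" holds the regex ^-?\d+$
    private static final Pattern INTEGER = Pattern.compile("^-?\\d+$");

    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("^-?\\d+$");        // prints the regex itself: ^-?\d+$
        System.out.println(isInteger("-2"));   // true
        System.out.println(isInteger("436"));  // true
        System.out.println(isInteger("-"));    // false: at least one digit is required
        System.out.println(isInteger("2-3"));  // false: the minus is only allowed in front
    }
}
```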

Java Regex with "Joker" characters

I am trying to write a regex that validates an input field.
What I call "joker" chars are '?' and '*'.
Here is my Java regex:
"^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"
What I'm trying to match is:
Minimum 2 alpha-numeric characters (other than '?' and '*')
The '*' can only appear once, and only at the end of the string
The '?' can appear multiple times
No whitespace at all
So for example :
abcd = OK
?bcd = OK
ab?? = OK
ab*= OK
ab?* = OK
??cd = OK
*ab = NOT OK
??? = NOT OK
ab cd = NOT OK
" abcd" = NOT OK (space at the beginning)
I've made the regex a bit complicated and I'm lost. Can you help me?
^(?:\?*[a-zA-Z\d]\?*){2,}\*?$
Explanation:
The regex asserts that this pattern must appear twice or more:
\?*[a-zA-Z\d]\?*
which asserts that there must be one character in the class [a-zA-Z\d], with zero or more question marks on either side of it.
Then, the regex matches \*?, which means zero or one asterisk character, at the end of the string.
Here is an alternative regex that is faster, as revo suggested in the comments:
^(?:\?*[a-zA-Z\d]){2}[a-zA-Z\d?]*\*?$
Here you go:
^\?*\w{2,}\?*\*?(?<!\s)$
Both are described and demonstrated at Regex101.
^ is the start of the String
\?* indicates any number of initial ? characters (must be escaped)
\w{2,} at least 2 consecutive word characters (letters, digits, or underscore)
\?* continues with any number of ? characters
\*? and optionally one last * character
(?<!\s) a negative look-behind asserting that the character just before the end is not whitespace (it only checks that one character; the rest of the pattern already admits no whitespace)
$ is the end of the String
Another way to solve this problem is the look-ahead mechanism (?=subregex). It is zero-length (once subregex has been tested, the regex cursor is reset to where it was before), so it lets the regex engine run multiple tests on the same text via the construct
(?=condition1)
(?=condition2)
(?=...)
conditionN
Note: the last condition (conditionN) is not placed in (?=...), so that the regex engine can move the cursor past the tested part (to "consume" it) and go on to test whatever follows. For that to work, conditionN must match exactly the section we want to consume (the earlier conditions have no such limitation; they can match substrings of any length, say just the first few characters).
So now we need to think about what our conditions are.
We want to match only alphanumeric characters and ?, plus * which may appear (optionally) only at the end. We can write it as ^[a-zA-Z0-9?]*[*]?$. This also rules out whitespace characters, because we didn't include them among the accepted characters.
The second requirement is "Minimum 2 alpha-numeric characters". It can be written as .*?[a-zA-Z0-9].*?[a-zA-Z0-9] or (?:.*?[a-zA-Z0-9]){2,} (if we like shorter regexes). Since that condition doesn't test the whole text but only some part of it, we can place it in a look-ahead.
The above conditions seem to cover everything we wanted, so we can combine them into a regex that looks like:
^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$
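To check the proposed patterns against the examples from the question, something like the following sketch can be used. The class name JokerValidator and both method names are made up here; the two patterns are the look-ahead regex above and the first answer's ^(?:\?*[a-zA-Z\d]\?*){2,}\*?$, written as Java string literals.

```java
import java.util.regex.Pattern;

final class JokerValidator {
    // Look-ahead variant: condition 1 (at least two alphanumerics) is tested
    // without consuming; condition 2 (allowed characters, optional final *)
    // consumes the whole input.
    private static final Pattern LOOKAHEAD = Pattern.compile(
            "^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$");

    // Repeated-group variant from the first answer.
    private static final Pattern REPEATED = Pattern.compile(
            "^(?:\\?*[a-zA-Z\\d]\\?*){2,}\\*?$");

    static boolean isValid(String input) {
        return LOOKAHEAD.matcher(input).matches();
    }

    static boolean isValidRepeated(String input) {
        return REPEATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd",
                            "*ab", "???", "ab cd", " abcd"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s)
                    + " / " + isValidRepeated(s));
        }
    }
}
```

Neither pattern accepts the empty string, which the ^$ alternative in the question's original regex did allow; if an empty field should pass, that case would need to be added back.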

Regex for arithmetic expression [duplicate]

This question already has answers here:
In a java regex, how can I get a character class e.g. [a-z] to match a - minus sign?
(5 answers)
Closed 5 years ago.
The regex -?\d+ [+|-|*|/] -?\d+ matches the expression 1 + 3 without any problems, and also 1 + -2, but I don't know why it does not match 1 - 2. Could you explain why it does not match the - char correctly?
With my regex I wanted to achieve:
optional - at the beginning
string of digits
whitespace then operator then whitespace
optional - before the second string of digits
A - unescaped in the middle of a character class creates a range. You can escape it or move it to the start or end of the character class. You also don't need/want the |s I'd guess.
You currently make a range between | and | which doesn't really make sense. You also could just use grouping instead of a character class.
(\+|-|\*|/)
With this approach the + and * need to be escaped because they are quantifiers when outside a character class.
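As an illustration of the character-class fix, here is a sketch that puts the - first in the class and drops the pipes. The class name ArithmeticMatcher and its constants are hypothetical; String.matches is the standard JDK method and requires the whole input to match.

```java
final class ArithmeticMatcher {
    // Pattern from the question: inside [...] the sequence |-| is a range
    // from '|' to '|', so the class contains + | * / but no minus sign.
    static final String ORIGINAL = "-?\\d+ [+|-|*|/] -?\\d+";

    // Fixed: '-' placed first is a literal, and the pipes are removed.
    static final String FIXED = "-?\\d+ [-+*/] -?\\d+";

    static boolean matchesOriginal(String expr) {
        return expr.matches(ORIGINAL);
    }

    static boolean matchesFixed(String expr) {
        return expr.matches(FIXED);
    }

    public static void main(String[] args) {
        System.out.println(matchesOriginal("1 - 2")); // false
        System.out.println(matchesOriginal("1 | 2")); // true: the pipe is accepted as an "operator"
        System.out.println(matchesFixed("1 - 2"));    // true
        System.out.println(matchesFixed("1 + -2"));   // true
        System.out.println(matchesFixed("1 | 2"));    // false
    }
}
```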

Printing multiple characters from a string [duplicate]

This question already has answers here:
Removing duplicates from a String in Java
(50 answers)
Closed 7 years ago.
If I have a string "SSSAAADDDCCC" how would I print just "SADC". Can it be done using SubString or would I have to use charAt()?
There is a simple way to do this; however, since I don't see any code or any effort on your part, I will not just give you the answer. Below is some pseudocode you can work from to find the right answer. Good luck!
currentChar = myString.charAt(0);
i = 0;
print currentChar // as per comments, cover the base case
while (i + 1 < myString.length())
    if myString.charAt(i) != myString.charAt(i + 1)
        print myString.charAt(i + 1)
    i++
Use regular expression to replace all repeating characters with a single character:
"SSSAAADDDCCC".replaceAll("(.)\\1+", "$1") // returns "SADC"
(.) matches and captures a character.
\\1+ matches one or more instances of the captured character.
$1 replaces the entire matched value with the captured character.
Non-repeating characters are not matched, and are therefore left alone.
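For comparison, here is a sketch that puts the regex approach next to a plain charAt loop. The class name RunCollapser and both method names are made up for this example. Both versions only collapse consecutive repeats, so "ABA" stays "ABA".

```java
final class RunCollapser {
    // Regex version: (.) captures a character, \1+ matches its repeats,
    // and the whole run is replaced by the single captured character.
    static String collapseWithRegex(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // Loop version: keep a character only if it differs from the one before it.
    static String collapseWithLoop(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i == 0 || s.charAt(i) != s.charAt(i - 1)) {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseWithRegex("SSSAAADDDCCC")); // SADC
        System.out.println(collapseWithLoop("SSSAAADDDCCC"));  // SADC
    }
}
```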
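If you don't like the charAt method you could use substrings like this:
String in = "sssdddaaaccc";
StringBuilder out = new StringBuilder();
int j = 0;
while (j < in.length())
{
    // take the first character of the current run
    String current = in.substring(j, j + 1);
    out.append(current);
    // skip the rest of the run of identical characters
    while (j < in.length() && in.substring(j, j + 1).equals(current))
    {
        j++;
    }
}
System.out.println(out); // prints "sdac"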
