Java Regex, remove leading spaces of each line - java

I want to use Java's String.replaceAll(regex, "") to remove all leading spaces from each line of a multi-line text string and, if possible, remove all carriage returns as well. What should the "regex" be?

Converting my comment to an answer so that the solution is easy to find for future visitors.
You may use regex replacement in Java:
str = str.replaceAll("(?m)^\\s+|\\s+$", "");
RegEx Details:
(?m): Enable MULTILINE mode so that ^ and $ are matched in every line.
^\\s+: Match 1+ whitespaces at line start
|: OR
\\s+$: Match 1+ whitespaces before line end
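A minimal sketch of the replacement above, wrapped in a hypothetical helper (the class name, method name, and sample input are assumptions for illustration). Note that \s also matches line-break characters, so blank lines may be swallowed together with the surrounding whitespace; a class such as [ \t] is one option if they must be kept.

```java
class LeadingWhitespaceDemo {
    // Removes whitespace at the start and at the end of every line,
    // using the pattern from the answer above.
    static String trimLines(String text) {
        return text.replaceAll("(?m)^\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with leading and trailing spaces.
        String input = "  foo\n    bar  \n baz";
        System.out.println(trimLines(input));
    }
}
```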

Related

Replace URL String with Integer characters located in the end of that String

I have a URL and tried to use a regex to replace the whole string with the integer located at the end of the link.
The URL is something like
https://some.storage.com/test123456.bucket.com/folder/80.png
The regex I tried to use:
Integer.parseInt(string.replaceAll(".*[^\\d](\\d+)", "$1"))
The output for that regex is "80.png", but I need only "80". I also tried this tool: https://regex101.com. As far as I can see, the main problem is that ".png" is not matched by my regex, so after substitution that part is left in the result.
I'm a total newcomer to regex, so I kindly ask you for help.
You may use
String result = string.replaceAll("(?:.*\\D)?(\\d+).*", "$1");
See the regex demo.
NOTE: If there is no match, the result will be equal to the string value. If you do not want this behavior, instead of "(?:.*\\D)?(\\d+).*", use "(?:.*\\D)?(\\d+).*|.+".
Details
(?:.*\D)? - an optional sequence (it must be optional because the Group 1 pattern might also be matched at the start of the string) of:
  .* - any 0+ chars other than line break chars, as many as possible
  \D - a non-digit
(\d+) - Group 1: any one or more digits
.* - any 0+ chars other than line break chars, as many as possible
The replacement is $1, the backreference to the Group 1 value, which is the last chunk of 1+ digits in a string that has no line breaks.
Line breaks can be supported if you prepend the pattern with the (?s) inline DOTALL modifier, i.e. "(?s)(?:.*\\D)?(\\d+).*|.+".
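As a rough sketch, the replacement can be wrapped in a small helper (the class and method names here are hypothetical). It uses the variant with the trailing |.+ alternative, so an input without any digits yields an empty string instead of being returned unchanged.

```java
class LastNumberDemo {
    // Replaces the whole string with its last run of digits.
    // The "|.+" branch consumes inputs that contain no digits at all;
    // Group 1 is unset in that case, so the result is an empty string.
    static String lastDigits(String s) {
        return s.replaceAll("(?:.*\\D)?(\\d+).*|.+", "$1");
    }

    public static void main(String[] args) {
        String url = "https://some.storage.com/test123456.bucket.com/folder/80.png";
        System.out.println(Integer.parseInt(lastDigits(url)));
    }
}
```

Callers would still need to guard against the empty result before passing it to Integer.parseInt.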

Difficulties finding a Java regex equivalent to a JavaScript regex

So, what I am trying to do is:
I have a string:
Special Skills:
someText
could range
through multiple lines
Special Abilities:
another
someText
Background:
multiline
text
I've already managed to come up with the following regex. It works perfectly in JavaScript according to regexr.com, but not in Java, according to IntelliJ's built-in Check-Regex and freeformatter.com.
Special Abilities:\n(.*\n)+?(Special Skills:|Background:)
The expression should, first off, extract
Special Skills:
someText
could range
through multiple lines
Mind that both the "Special Abilities" and "Background" sections are optional.
Since I am kind of stuck here, any help would be greatly appreciated!
You may add the end-of-string (or end-of-line) anchor $ as another alternative in the group at the end of the pattern, make sure that . matches carriage returns by using the (?d) embedded flag (Pattern.UNIX_LINES), and wrap (.*\n)+? in a capturing group so that all the text it matches lands in a single group (the inner (.*\n) can then become a non-capturing group):
(?d)Special Abilities:\r?\n((?:.*\n)*?)(Special Skills:|Background:|$)
See this regex demo.
Details
(?d) - . now matches any char but a newline
Special Abilities: - a literal text
\r?\n - a CRLF or LF line ending
((?:.*\n)*?) - Group 1: zero or more, but as few as possible, repetitions of 0+ chars other than an LF symbol and then an LF symbol
(Special Skills:|Background:|$) - either of the three alternatives: Special Skills:, Background: or end of string ($).
An alternative expression:
(?ms)Special Abilities:\r?\n(.*?)(^Special Skills:|^Background:|\Z)
See this regex demo
Here, (?ms) defines the multiline and dotall modes (^ will match start of a line here and . will match all symbols). Instead of $, we need to use \Z - end of string anchor.
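Both patterns can be tried side by side with a sketch like the following. The class name, the constant names, and the abilities helper are assumptions made for illustration; only the two regex strings come from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractDemo {
    // First pattern: UNIX_LINES (?d) plus a lazy run of whole lines.
    static final Pattern LINE_BASED = Pattern.compile(
        "(?d)Special Abilities:\\r?\\n((?:.*\\n)*?)(Special Skills:|Background:|$)");

    // Alternative pattern: MULTILINE + DOTALL, stopping at the next header or at \Z.
    static final Pattern DOTALL_BASED = Pattern.compile(
        "(?ms)Special Abilities:\\r?\\n(.*?)(^Special Skills:|^Background:|\\Z)");

    // Returns the text captured in Group 1, or null when the section is absent.
    static String abilities(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Special Skills:\nsomeText\ncould range\nthrough multiple lines\n"
            + "Special Abilities:\nanother\nsomeText\n"
            + "Background:\nmultiline\ntext";
        System.out.println(abilities(LINE_BASED, text));
        System.out.println(abilities(DOTALL_BASED, text));
    }
}
```

Returning null for a missing section reflects the question's remark that the sections are optional; an Optional<String> would work equally well.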

How to remove specific repeated characters from text?

I have a String like
"this is line 1\n\n\nthis is line 2\n\n\nthis is line 3\t\t\tthis is line 3 also"
What I want to do is remove repeated specific characters like "\n", "\t" from this text.
"this is line 1\nthis is line 2\nthis is line 3\tthis is line 3 also"
I tried some regular expressions but didn't work for me.
text = text.replace("/[^\\w\\s]|(.)\\1/gi", "");
Is there any regex for this?
If you need to remove only specific whitespace chars, \s won't help as it will overmatch, i.e. it will also match spaces, hard spaces, etc.
You may put the chars into a character class, wrap the class with a capturing group, and use a backreference to the captured value to match its repetitions. Then replace the match with the backreference to the Group 1 value:
.replaceAll("([\n\t])\\1+", "$1")
See the regex demo.
Details
([\n\t]) - Group 1 (referred to with \1 from the pattern and $1 from the replacement pattern): a character class matching either a newline or a tab symbol
\1+ - one or more repetitions of the value in Group 1.
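A small sketch of this replacement applied to the string from the question (the class and method names are made up for the example). Because \1 repeats the exact captured character, a newline followed by a tab is left alone.

```java
class CollapseRepeatsDemo {
    // Collapses runs of the same newline or tab character into a single one.
    static String collapse(String text) {
        return text.replaceAll("([\n\t])\\1+", "$1");
    }

    public static void main(String[] args) {
        String text = "this is line 1\n\n\nthis is line 2\n\n\n"
            + "this is line 3\t\t\tthis is line 3 also";
        System.out.println(collapse(text));
    }
}
```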
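I would use Guava's CharMatcher (note that this removes every ISO control character, so single \n and \t characters are removed as well rather than collapsed):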
CharMatcher.javaIsoControl().removeFrom(myString)

Java replaceAll remove spaces from empty lines

I'm trying to remove all spaces from lines in a block of text which contain nothing but spaces, leaving the line breaks in place.
I tried the following:
str = " text\n \n \n text";
str = str
.replaceAll("\\A +\\n", "\n")
.replaceAll("(\\n +\\n)", "\n\n")
.replaceAll("\\n +\\Z", "\n");
I was expecting the output to be
" text\n\n\n text"
but instead it was
" text\n\n \n text"
The space in the third line of the block had not been removed. What am I doing wrong here?
Use the MULTILINE flag, so that ^ and $ will match the beginning and end of each line. The problem with your regex is that it consumes the newline character, so the next match attempt starts after it and cannot match the following line.
str.replaceAll("(?m)^ +$", "")
You need to match lines that contain only horizontal whitespace, and the Pattern.MULTILINE modifier (its embedded option is (?m)) is required for the ^ and $ anchors to match the start and end of each line respectively. Use
String str = " text\n \n \n text";
str = str.replaceAll("(?m)^[\\p{Zs}\t]+$", "");
See the Java demo.
Details:
(?m) - Multiline mode
^ - start of line
[\\p{Zs}\t]+ - 1 or more horizontal whitespaces
$ - end of line.
An alternative to [\p{Zs}\t] is a pattern to match any whitespace excluding vertical whitespace symbols. In Java, character class subtraction can be handy: [\s&&[^\r\n]] where [\s] matches any whitespace and &&[^\r\n] excludes a carriage return and newline characters from it. A full pattern would look like .replaceAll("(?Um)^[\\s&&[^\r\n]]+$", "").
Use anchors:
str = str.replaceAll("(?m)^[^\\S\\n]+$", "");
Where ^ and $ match respectively the start and the end of a line when the multiline flag (?m) is switched on.
The problem with your pattern is that replaceAll("(\\n +\\n)", "\n\n") matches a \\n on both sides of the horizontal whitespace (plain spaces in your pattern). If you do that, you can't obtain contiguous results, since the same character can't be matched twice.
Note: you may also want to add \\r to the character class (to exclude it in the same way as \\n) if you need to take Windows or old Mac line endings into account.
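A minimal sketch of the anchored approach, run on the string from the question. The class and method names are hypothetical, and \r is excluded alongside \n as the note above suggests.

```java
class BlankLineSpacesDemo {
    // Empties lines that consist solely of horizontal whitespace.
    // [^\S\r\n] reads as "whitespace, but neither \r nor \n".
    static String clearBlankLines(String text) {
        return text.replaceAll("(?m)^[^\\S\\r\\n]+$", "");
    }

    public static void main(String[] args) {
        String str = " text\n \n \n text";
        System.out.println(clearBlankLines(str));
    }
}
```

Since the anchors are zero-width, no line break is consumed, which is why consecutive whitespace-only lines are all matched.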

Regular expression to match '\n' character

I have the string "<?xml version=2.0><rss>Feed</rss>" and wrote this regex to match it:
"<?xml.*<rss.*</rss>"
But if the input string contains \n, as in "\nFeed", the above regex does not work.
How can I modify my regex to allow a \n character between the strings?
The matching behavior of a dot can be controlled with a flag. It looks like in Java the default matching behavior for the dot is any character except line terminators such as \r and \n.
I'm not a Java programmer, but usually putting (?s) at the beginning of a search string changes the matching behavior of the dot to any character, including line terminators. So perhaps "(?s)<?xml.*<rss.*</rss>" works.
But it would be better here to use "<?xml.*?<rss[\s\S]*?</rss>" as the search string.
\s matches any whitespace character, which includes line terminators, and \S matches any non-whitespace character. Putting both in square brackets results in matching any character.
For completeness: [\w\W] also always matches any character.
You can combine it with (\\n)*. The extra \ is needed because the backslash is itself an escape character in a Java string literal.
Another option is to execute replaceAll("\\n","") before executing the regex.
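The effect of (?s) and of [\s\S] can be checked with a sketch like this one. The class and method names are hypothetical, and the sample input with an embedded \n is an assumption about what the question meant. One deliberate change: the ? after < is escaped here, because an unescaped ? makes the preceding < optional rather than matching a literal question mark, so a whole-string match would otherwise fail.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // True when the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Assumed sample: the feed from the question with a line break inside <rss>.
        String input = "<?xml version=2.0><rss>\nFeed</rss>";

        // Default mode: the dot stops at the line break.
        System.out.println(matchesWhole("<\\?xml.*<rss.*</rss>", input));
        // DOTALL mode via the embedded (?s) flag.
        System.out.println(matchesWhole("(?s)<\\?xml.*<rss.*</rss>", input));
        // [\s\S] matches any character without needing a flag.
        System.out.println(matchesWhole("<\\?xml.*?<rss[\\s\\S]*?</rss>", input));
    }
}
```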
