Regex java : replaceAll whitespace except between hyphen - java

I've got a string:
-----test test----- testestestest testestest -----test test-----
I'd like to replace each whitespace with \n, but keep the whitespace between the hyphens. Here is the desired result:
-----test test-----\ntestestestest\ntestestest\n-----test test-----
I've tried a lot of different regexes but none of them work. Here is my best try:
Pattern ws = Pattern.compile("\\s(?![\\-]*\\-)");
Matcher matcher = ws.matcher(myString);
String result = matcher.replaceAll("\n");
Could somebody help me?
PS: What I really don't understand is that if I replace the hyphens with curly braces (in the string as well as in the regex), it works correctly: \s(?![^\{]*\})

Just match whitespace at the end of a line:
/\s$/
Here's the code:
String result = myString.replaceAll("(?m)\\s$", "\\\\n");
Result:
-----test test-----\n
testestestest\n
testestest\n
-----test test-----\n
Applied to your code:
Pattern ws = Pattern.compile("\\s$", Pattern.MULTILINE);
Matcher matcher = ws.matcher(myString);
String result = matcher.replaceAll("\\\\n");

Do you know that there is always a single space at the end of every line? If so, use this:
String text = "-----test test----- ";
text = text.substring(0, text.length() - 1) + "\\n";
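If the input really is a single line, as in the question, the end-of-line approaches above have nothing to anchor on. The brace version works because the opening and closing delimiters differ, so a lookahead can tell "inside" from "outside"; with a hyphen on both sides it cannot. Here is a minimal sketch of a different technique (not taken from the answers above): match each hyphen-delimited block as a whole and copy it through unchanged, and replace only the whitespace that is matched outside such blocks. The class and method names are made up for illustration, and the sketch assumes a block never contains a hyphen between its delimiters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenAwareWhitespace {
    // Alternative 1 (group 1): a run of hyphens, non-hyphen text, a run of hyphens.
    // Alternative 2: a run of whitespace outside any such block.
    private static final Pattern TOKEN = Pattern.compile("(-+[^-]*-+)|\\s+");

    static String breakOutsideHyphens(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Keep hyphen blocks verbatim; turn outside whitespace into a newline.
            String replacement = (m.group(1) != null) ? m.group(1) : "\n";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String myString = "-----test test----- testestestest testestest -----test test-----";
        System.out.println(breakOutsideHyphens(myString));
    }
}
```

Because the hyphen block is the first alternative, the whitespace inside it is consumed as part of that match and never reaches the second alternative.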


Java regex extract capture group if it exists

I apparently don't understand Java's regex library or regex either for that matter.
for this string:
String text = "asdf 2013-05-12 asdf";
this regex explodes in my face:
String REGEX_FORMAT_1 = ".+?([0-9]{4}\\s?-\\s?[0-9]{2}\\s?-\\s?[0-9]{2}).+";
Pattern PATTERN_FORMAT_1 = Pattern.compile(REGEX_FORMAT_1);
Matcher matcher_1 = PATTERN_FORMAT_1.matcher(text);
if(matcher_1.matches()) {
String matchedGroup = matcher_1.group();
...
}
Semantically this makes sense to me but it seems I've totally misunderstood something. The regex works fine in some online regex editors like regex101 but not in others. Could someone please help me understand why I don't get the capture group containing 2013-05-12 ...
group() is equivalent to group(0) and returns the entire matched string. Use group(1) to pull out the first matched group.
String text = "asdf 2013-05-12 asdf";
String regex = ".+?([0-9]{4}\\s?-\\s?[0-9]{2}\\s?-\\s?[0-9]{2}).+";
Matcher matcher = Pattern.compile(regex).matcher(text);
if (matcher.matches()) {
String matchedGroup = matcher.group(1);
System.out.println(matchedGroup);
}
Output:
2013-05-12
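If the surrounding text does not matter, a variation is to drop the leading .+? and trailing .+ and use find() instead of matches(); the whole match is then the date itself, so group() is enough. This is only a sketch: the class name, the method name, and the choice of Optional to signal "no date found" are my own and not part of the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractor {
    // Same date shape as in the question, without the surrounding wildcards.
    private static final Pattern DATE =
            Pattern.compile("[0-9]{4}\\s?-\\s?[0-9]{2}\\s?-\\s?[0-9]{2}");

    static Optional<String> firstDate(String text) {
        Matcher m = DATE.matcher(text);
        // find() searches anywhere in the text; matches() must cover all of it.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstDate("asdf 2013-05-12 asdf").orElse("no date"));
    }
}
```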

how to exclude "<" in regex match

I have a String which looks like "<name><address> and <Phone_1>". I need to get a result like
1) <name>
2) <address>
3) <Phone_1>
I have tried using regex "<(.*)>" but it returns just one result.
The regex you want is
<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>
Which will then spit out the stuff you want in the 3 capture groups. The full code would then look something like this:
Matcher m = Pattern.compile("<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>").matcher(string);
if (m.find()) {
String name = m.group(1);
String address = m.group(2);
String phone = m.group(3);
}
The pattern .* in a regex is greedy. It will match as many characters as possible between the first < it finds and the last possible > it can find. In the case of your string it finds the first <, then looks for as much text as possible until a >, which it will find at the very end of the string.
You want a non-greedy or "lazy" pattern, which will match as few characters as possible. Simply <(.+?)>. The question mark is the syntax for non-greedy. See also this question.
This will work if you have a dynamic number of groups.
Pattern p = Pattern.compile("(<\\w+>)");
Matcher m = p.matcher("<name><address> and <Phone_1>");
while (m.find()) {
System.out.println(m.group());
}
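To see the greedy versus lazy difference described above side by side, here is a small sketch that collects every match of a pattern into a list. The helper name findAll and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<name><address> and <Phone_1>";
        // Greedy: one match running from the first '<' to the last '>'.
        System.out.println(findAll("<(.*)>", input));
        // Lazy: stops at the first '>' after each '<', giving three matches.
        System.out.println(findAll("<(.+?)>", input));
    }
}
```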

How to match on single line for regex?

I have a regex to match a line and delete it, along with everything below it (keeping everything above it).
Two Part Ask:
1) Why won't this pattern match the given String text below?
2) How can I be sure to just match on a single line and not multiple lines?
- The pattern has to be found on the same single line.
String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
+ "Everything should be deleted here but don't match this on this line" + "\n\n";
Pattern p = Pattern.compile("^(Please(\\s)(match)(\\s)(.*?)\\sthis\\s(.*))$", Pattern.DOTALL );
Matcher m = p.matcher(text);
if (m.find()) {
text = (m.replaceAll("")).replaceAll("[\n]+$", ""); // remove everything at and below "Please match ... this"
System.out.println(text);
}
Expected Output:
Keep this.
You are complicating your life...
First, as I said in the comment, use Pattern.MULTILINE.
Then, to truncate the string from the beginning of the match, use .substring():
final Pattern p = Pattern.compile("^Please\\s+match\\b.*?this",
Pattern.MULTILINE);
final Matcher m = p.matcher(input);
return m.find() ? input.substring(0, m.start()) : input;
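Remove DOTALL to make sure the match stays on a single line, replace \s with a literal space, and add MULTILINE so that ^ and $ anchor at line boundaries rather than only at the start and end of the whole input:
Pattern p = Pattern.compile("^(Please( )(match)( )(.*?) this (.*))$", Pattern.MULTILINE);
DOTALL makes a dot match newlines as well.
\s can match any whitespace, including newlines.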
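The effect of the two flags, and the substring() approach from the first answer, can be checked with a short sketch. The class and method names are hypothetical, and stripping the trailing newlines after the cut is my assumption about what the expected output requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineFlagsDemo {
    // MULTILINE lets ^ match at the start of any line; without DOTALL the
    // dot cannot cross a newline, so "this" must be on the same line.
    private static final Pattern MARKER =
            Pattern.compile("^Please\\s+match\\b.*?this", Pattern.MULTILINE);

    // Keeps everything before the marker line and drops trailing newlines.
    static String truncateAtMarker(String input) {
        Matcher m = MARKER.matcher(input);
        if (!m.find()) {
            return input; // no marker line: leave the text untouched
        }
        return input.substring(0, m.start()).replaceAll("\\n+\\z", "");
    }

    // Small helper to probe how a flag changes matching.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
                + "Everything should be deleted here but don't match this on this line\n\n";
        System.out.println(truncateAtMarker(text));
        System.out.println(found("^b$", 0, "a\nb\nc"));                 // ^ only at input start
        System.out.println(found("^b$", Pattern.MULTILINE, "a\nb\nc")); // ^ at each line start
        System.out.println(found("a.b", 0, "a\nb"));                    // dot stops at newline
        System.out.println(found("a.b", Pattern.DOTALL, "a\nb"));       // dot crosses newline
    }
}
```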

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere on this forum, as Java regex does not identify \s).
The problem I am facing with this code is that the search stops at the first match and exits the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
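As a quick check of the pattern explained above, here is a small sketch run against the sample string from the question. The class and method names are hypothetical. It also shows why a find() loop with a ^-anchored pattern stops after one match: ^ only matches at the start of the input, so the repetition has to live inside the regex (the trailing +) rather than in the loop.

```java
import java.util.regex.Pattern;

class RePrefixStripper {
    // (?i) ignores case; the group repeats so all leading "Re:" variants
    // are consumed by a single match anchored at the start.
    private static final Pattern RE_PREFIX =
            Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    static String stripRePrefixes(String title) {
        // The pattern can match at most once, so replaceFirst is enough.
        return RE_PREFIX.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(stripRePrefixes(title));
    }
}
```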
In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
the quantifier e* repeats only the letter e and the colon is optional, so it matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}\p{Z}\p{Z}
which makes no sense for what you are trying to do.
You'd better use a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or to make #Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is as the doc says the exact same thing as yourString.replaceAll())
Look it up here
(I had the same regex as #Doorknob, but thanks to #jlordo for the replaceAll and #Doorknob for thinking about the (?i) case insensitivity part ;-) )

regex pattern - extract a string only if separated by a hyphen

I've looked at other questions, but they didn't lead me to an answer.
I've got this code:
Pattern p = Pattern.compile("exp_(\\d{1}-\\d)-(\\d+)");
The string I want to be matched is: exp_5-22-718
I would like to extract 5-22 and 718. I'm not too sure why it's not working. What am I missing? Many thanks
Try this one:
Pattern p = Pattern.compile("exp_(\\d-\\d+)-(\\d+)");
In your original pattern you specified that the second number should contain exactly one digit, so I used \d+ to match as many digits as possible.
I also removed {1} from the first number, as it adds nothing to the regexp.
If the string is always prefixed with exp_ I wouldn't use a regular expression.
I would:
replaceFirst() exp_
split() the resulting string on -
Note: This answer is based on that assumption. I offer it as a more robust option if you have multiple hyphens. However, if you need to validate the format of the digits, a regular expression may be better.
In your regexp you missed the required quantifier for the second \\d. This quantifier is + or {2}.
String yourString = "exp_5-22-718";
Matcher matcher = Pattern.compile("exp_(\\d-\\d+)-(\\d+)").matcher(yourString);
if (matcher.find()) {
System.out.println(matcher.group(1)); //prints 5-22
System.out.println(matcher.group(2)); //prints 718
}
You can use String.split to do this. Check the following code.
I assume that your strings start with "exp_".
String str = "exp_5-22-718";
if (str.contains("-")){
String newStr = str.substring(4, str.length());
String[] strings = newStr.split("-");
for (String string : strings) {
System.out.println(string);
}
}
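Note that splitting on every hyphen yields three pieces (5, 22, 718), not the two values the question asks for. Here is a sketch of both routes that return 5-22 and 718: the regex with two capture groups, and a regex-free variant that cuts at the last hyphen. The class and method names are hypothetical, and returning an empty array for input that does not match is my own choice.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpParser {
    private static final Pattern EXP = Pattern.compile("exp_(\\d-\\d+)-(\\d+)");

    // Regex route: validates the digit layout and returns both groups.
    static String[] parse(String s) {
        Matcher m = EXP.matcher(s);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : new String[0];
    }

    // Regex-free route: drop the prefix, then cut at the last hyphen.
    static String[] splitOnLastHyphen(String s) {
        String body = s.startsWith("exp_") ? s.substring("exp_".length()) : s;
        int i = body.lastIndexOf('-');
        if (i < 0) {
            return new String[] { body }; // nothing to split on
        }
        return new String[] { body.substring(0, i), body.substring(i + 1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("exp_5-22-718")));
        System.out.println(Arrays.toString(splitOnLastHyphen("exp_5-22-718")));
    }
}
```

The regex route rejects malformed input, while the last-hyphen route accepts anything, so which one fits depends on whether the format needs validating.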
