What should be the Regex Pattern - java

Hi I have a String like "ANCBTH2016100931011730300000458" which always start with ANCBTH followed by Numbers.
What can be a regex to match a word which may or may not have spaces and the position of space is also not fixed.
Example:
ANCBTH2016100931011730300000458
ANCBTH 2016100931011730300000458
ANCBTH 20161009 31011730300000458
I would like to have a regex which satisfied all above examples.

You can test this regex: ANCBTH((?:\s?\d+)*)
To test: regex101

Why don't you remove all the spaces from incoming string e.g.
String yourString = "ANCBTH 20161009 31011730300000458";
yourString.replaceAll("\\s+","");
yourString will become like ANCBTH2016100931011730300000458
and then you may run a regex like
ANCBTH[0-9]*
and all of your string with spaces anywhere will pass this pattern.

Related

How to use string.replaceAll to change everything after a certain word

I have the following string: http://localhost:somePort/abc/soap/1.0
I want the string to just look like this: http://localhost:somePort/abc.
I want to use string.replaceAll but can't seem to get the regex right. My code looks like this: someString.replaceAll(".*\\babc\\b.*", "abc");
I'm wondering what I'm missing? I don't want to split the string or use .replaceFirst, as many solutions suggest.
It would seem to make more sense to use substring, but if you must use replaceAll, here's a way to do it.
You want to replace /abc and everything after it with just /abc.
string = string.replaceAll("/abc.*", "/abc")
If you want to be more discriminating you can include a word boundary after abc, giving you
string = string.replaceAll("/abc\\b.*", "/abc")
Just for explanation on the given regex, why it wont work:
\b \b - word boundaries are not required here and also as .* is added in the beginning it matches the whole string and when you try to replace it with "abc" it will replace the entire match with "abc". Hence you get the wrong answer. Instead, only try to match what is required and then whatever is matched that will be replaced with "abc" string.
someString.replaceAll("/abc.*", "/abc");
/abc.* - Looks specifically for /abc followed by 0 or more characters
/abc - Replaces the above match with /abc
You should use replaceFirst since after first match you are removing all after
text= text.replaceFirst("/abc.*", "/abc");
Or
You can use indexOf to get the index of certain word and then get substring.
String findWord = "abc";
text = text.substring(0, text.indexOf(findWord) + findWord.length());

Trying to remove up to 2 digits following a match

Say I have a few string like Foo3,5bar, Foo14,5bar and Foo23,42bar
I want to remove the second number, following the comma, as well as the comma, using Java Regex.
So far, I've tried String.replaceAll("(?<=Foo\d{1,2}),\d{1,2}", ""), using (?<=Foo\d{1,2}),\d{1,2} as my regex, but it's not working.
Use String#replaceAll that has regex support:
String str = "Foo3,4HelloFoo5,3World";
str = str.replaceAll("(\\d),\\d+", "$1"); // Foo3HelloFoo5World
OR else if you want to restrict matching to max 2 digits after comma then use:
str = str.replaceAll("(\\d),\\d{1,2}", "$1"); // Foo3HelloFoo5World
Live Demo: http://ideone.com/5P1guJ
str = str.replaceFirst(",\\d+$")
what about Integer.valueOf(str.substring(string.lastIndexOf(",")+1));
is regex a necessity ?
Your regex is almost correct. You forget to escape the \ in the regex. The correct one is:
(?<=Foo\\d{1,2}),\\d{1,2}
Note the \\ instead of \.
See https://ideone.com/W7IuT1 for a demo on your strings.

java Regex - split but ignore text inside quotes?

using only regular expression methods, the method String.replaceAll and ArrayList
how can i split a String into tokens, but ignore delimiters that exist inside quotes?
the delimiter is any character that is not alphanumeric or quoted text
for example:
The string :
hello^world'this*has two tokens'
should output:
hello
worldthis*has two tokens
I know there is a damn good and accepted answer already present but I would like to add another regex based (and may I say simpler) approach to split the given text using any non-alphanumeric delimiter which not inside the single quotes using
Regex:
/(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+/
Which basically means match a non-alphanumeric text if it is followed by even number of single quotes in other words match a non-alphanumeric text if it is outside single quotes.
Code:
String string = "hello^world'this*has two tokens'#2ndToken";
System.out.println(Arrays.toString(
string.split("(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+"))
);
Output:
[hello, world'this*has two tokens', 2ndToken]
Demo:
Here is a live working Demo of the above code.
Use a Matcher to identify the parts you want to keep, rather than the parts you want to split on:
String s = "hello^world'this*has two tokens'";
Pattern pattern = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
See it working online: ideone
You cannot in any reasonable way. You are posing a problem that regular expressions aren't good at.
Do not use a regular expression for this. It won't work. Use / write a parser instead.
You should use the right tool for the right task.

regex to find substring between special characters

I am running into this problem in Java.
I have data strings that contain entities enclosed between & and ; For e.g.
&Text.ABC;, &Links.InsertSomething;
These entities can be anything from the ini file we have.
I need to find these string in the input string and remove them. There can be none, one or more occurrences of these entities in the input string.
I am trying to use regex to pattern match and failing.
Can anyone suggest the regex for this problem?
Thanks!
Here is the regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;"
It starts by matching the character &, followed by one or more letters (both uppercase and lower case) ([A-Za-z]+). Then it matches a dot followed by one or more letters (\\.[A-Za-z]+). There can be any number of this, including zero. Finally, it matches the ; character.
You can use this regex in java like this:
Pattern p = Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;"); // java.util.regex.Pattern
String subject = "foo &Bar; baz\n";
String result = p.matcher(subject).replaceAll("");
Or just
"foo &Bar; baz\n".replaceAll("&[A-Za-z]+(\\.[A-Za-z]+)*;", "");
If you want to remove whitespaces after the matched tokens, you can use this re:
"&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*" // the "\\s*" matches any number of whitespace
And there is a nice online regular expression tester which uses the java regexp library.
http://www.regexplanet.com/simple/index.html
You can try:
input=input.replaceAll("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*","");
See it

Problem replacing words using [^a-zA-Z] regex

Just could not get this one and googling did not help much either..
First something that I know: Given a string and a regex, how to replace all the occurrences of strings that matches this regular expression by a replacement string ? Use the replaceAll() method in the String class.
Now something that I am unable to do. The regex I have in my code now is [^a-zA-Z] and I know for sure that this regex is definitely going to have a range. Only some more characters might be added to the list. What I need as output in the code below is Worksheet+blah but what I get using replaceAll() is Worksheet++++blah
String homeworkTitle = "Worksheet%#5_blah";
String unwantedCharactersRegex = "[^a-zA-Z]";
String replacementString = "+";
homeworkTitle = homeworkTitle.replaceAll(unwantedCharactersRegex,replacementString);
System.out.println(homeworkTitle);
What is the way to achieve the output that I wish for? Are there any Java methods that I am missing here?
[^a-zA-Z]+
Will do it nicely.
You just need a greedy quantifier in order to match as many non-alphabetical characters you can, and replace the all match by one '+' (a - by default - greedy quantifier)
Note: [^a-zA-Z]+? would make the '+' quantifier lazy, and would have give you the same result than [^a-zA-Z], since it would only have matched only one non-alphabetical character at a time.
String unwantedCharactersRegex = "[^a-zA-Z]"
This matches a single non-letter. So each single non-letter is replaced by a +. You need to say "one or more", so try
String unwantedCharactersRegex = "[^a-zA-Z]+"

Categories