delete all text after found word java - java

I need to cut the tail off a string in some cases. I have done this with indexOf and substring, but it slowed my code down. I have thought about regular expressions, but these tails only have similar beginnings; the tail is not a fixed word.
For example I have such string
aaaaa bbb cc (bb) (r-1hh)
and I need a result
aaaaa bbb cc (bb)
but there also could be such string
aaaaa bbb cc (bb) (r3-34fff)
or
aaaaa bbb cc (bb) [tagBB- na]
So, the question is: can I use a regex to find the index of the tail?
The other question: do indexOf or substring use regex in Java?

How to find regex match position:
Pattern p = Pattern.compile("i.*t");
String s = "my input string";
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println("match begins at " + m.start()); // 3
System.out.println("match ends at " + m.end()); // 11
} else {
System.out.println("no match found");
}
But you can remove trailing text this way:
String res = s.replaceFirst("^(.* input).*", "$1");
System.out.println("'" + res + "'");
Or use an exact match without escaping each special char this way:
String res = s.replaceFirst("^(.* " + Pattern.quote("^something$wierd^") + ").*", "$1");
System.out.println("'" + res + "'");

You could write a regex that matches anything except ) followed by a closing ), so that nothing after the first ) is matched.

You could use $ to match the end of the string and then find a common pattern for your tail. Is it always going to be an alphanumeric/dash/space character situated between [] or ()? Then that's your pattern.
Then just substring everything between the beginning of your initial string and the beginning of the substring you found using the pattern for the tail.
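A minimal sketch of this approach, assuming the tail is always one final group in () or [] that contains a dash. That holds for the three sample strings in the question, but it is an assumption about the real data. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TailCutter {
    // Assumption: the tail is the last (...) or [...] group and contains a '-'.
    // The pattern is compiled once and reused for every call.
    private static final Pattern TAIL =
            Pattern.compile("\\s*[\\[(][^\\[\\]()]*-[^\\[\\]()]*[\\])]\\s*$");

    static String cutTail(String s) {
        Matcher m = TAIL.matcher(s);
        // m.start() is the index at which the tail begins
        return m.find() ? s.substring(0, m.start()) : s;
    }

    public static void main(String[] args) {
        System.out.println(cutTail("aaaaa bbb cc (bb) (r-1hh)"));
        System.out.println(cutTail("aaaaa bbb cc (bb) [tagBB- na]"));
        System.out.println(cutTail("aaaaa bbb cc (bb)"));
    }
}
```

A string without such a tail is returned unchanged, so "(bb)" survives because it contains no dash.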

You asked:
Can regex be used to find the index of the String?
You can use a Pattern and Matcher to achieve this.
Just noticed someone else has commented this so I won't give an example.
Do the String methods indexOf or substring use regex in Java?
No. String.indexOf and String.substring work directly on the string's characters and do not use regular expressions. See the Javadoc or the source for more detail.
You can achieve this with Java fairly easily; this example may be similar to your existing implementation:
public String truncate(String str, String tail) {
int lengthOfTail = tail.length();
int indexOfTail = str.indexOf(tail);
return str.substring(0, indexOfTail + lengthOfTail);
}
(error handling omitted for clarity)
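For reference, one possible way to add the omitted error handling. Returning the input unchanged when the tail is absent is an assumed policy, not something the answer specifies, and the class name is hypothetical.

```java
class Truncator {
    // Keeps everything up to and including the first occurrence of tail.
    // Assumption: when tail does not occur, the input is returned unchanged.
    static String truncate(String str, String tail) {
        int indexOfTail = str.indexOf(tail);
        if (indexOfTail < 0) {
            return str; // indexOf returns -1 when there is no match
        }
        return str.substring(0, indexOfTail + tail.length());
    }
}
```

Without the guard, a missing tail would make substring return an arbitrary prefix of length tail.length() - 1, or throw if the string is shorter than that.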

Related

Get substring between "first two" occurrences of a character

I have a String:
String thestra = "/aaa/bbb/ccc/ddd/eee";
In my situation, this String is guaranteed to contain at least two slashes every time.
And I am getting the /aaa/ like below, which is the subString between "FIRST TWO occurrences" of the char / in the String.
System.out.println("/" + thestra.split("\\/")[1] + "/");
It solves my purpose but I am wondering if there is any other elegant and cleaner alternative to this?
Please notice that I need both slashes (leading and trailing) around aaa. i.e. /aaa/
You can use indexOf, which accepts a second argument for an index to start searching from:
int start = thestra.indexOf("/");
int end = thestra.indexOf("/", start + 1) + 1;
System.out.println(thestra.substring(start, end));
Whether or not it's more elegant is a matter of opinion, but at least it doesn't find every / in the string or create an unnecessary array.
Scanner::findInLine returning the first match of the pattern may be used:
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(new Scanner(thestra).findInLine("/[^/]*/"));
Output:
/aaa/
Use Pattern and Matcher from java.util.regex.
Pattern pattern = Pattern.compile("/.*?/");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
String match = matcher.group(0); // output
}
Pattern.compile("/.*?/")
.matcher(thestra)
.results()
.map(MatchResult::group)
.findFirst().ifPresent(System.out::println);
You can test this variant :)
With best regards, Fr0z3Nn
Every time, in my situation, for this String, minimum two slashes will be present
If that is guaranteed, split at each / while keeping the delimiters, and take the first three substrings.
String str = String.format("%s%s%s",(thestra.split("((?<=\\/)|(?=\\/))")));
You could also match the leading forward slash, then use a negated character class [^/]* to optionally match any character except / and then match the trailing forward slash.
String thestra = "/aaa/bbb/ccc/ddd/eee";
Pattern pattern = Pattern.compile("/[^/]*/");
Matcher matcher = pattern.matcher(thestra);
if (matcher.find()) {
System.out.println(matcher.group());
}
Output
/aaa/
One of the many ways can be replacing the string with group#1 of the regex, [^/]*(/[^/].*?/).* as shown below:
public class Main {
public static void main(String[] args) {
String thestra = "/aaa/bbb/ccc/ddd/eee";
String result = thestra.replaceAll("[^/]*(/[^/].*?/).*", "$1");
System.out.println(result);
}
}
Output:
/aaa/
Explanation of the regex:
[^/]* : Not the character, /, any number of times
( : Start of group#1
/ : The character, /
[^/]: Not the character, /
.*?: Any character any number of times (lazy match)
/ : The character, /
) : End of group#1
.* : Any character any number of times
Updated the answer as per the following valuable suggestion from Holger:
Note that to the Java regex engine, the / has no special meaning, so there is no need for escaping here. Further, since you’re only expecting a single match (the .* at the end ensures this), replaceFirst would be more idiomatic. And since there was no statement about the first / always being at the beginning of the string, prepending the pattern with either .*? or [^/]* would be a good idea.
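A short sketch of the suggestion with replaceFirst and the [^/]* prefix, so that text before the first slash is tolerated. The class and method names are hypothetical.

```java
class FirstSegment {
    static String firstSegment(String s) {
        // [^/]* skips any text before the first slash;
        // group 1 captures the first "/.../" and the trailing .* drops the rest
        return s.replaceFirst("[^/]*(/[^/].*?/).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstSegment("/aaa/bbb/ccc/ddd/eee")); // /aaa/
        System.out.println(firstSegment("xyz/aaa/bbb/ccc"));      // /aaa/
    }
}
```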
I am surprised nobody mentioned using Path as of Java 7.
String thestra = "/aaa/bbb/ccc/ddd/eee";
String path = Paths.get(thestra).getName(0).toString();
System.out.println("/" + path + "/");
/aaa/
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(thestra.substring(0, thestra.indexOf("/", 2) + 1));

Remove a string if it ends within java

I have to remove "OR" from a given string, but only if the string ends with it.
public class StringReplaceTest {
public static void main(String[] args) {
String text = "SELECT count OR %' OR";
System.out.println("matches:" + text.matches("OR$"));
Pattern pattern = Pattern.compile("OR$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
System.out.println("substring:" + text.substring(matcher.start(), matcher.end()));
text = text.replace(text.substring(matcher.start(), matcher.end()), "");
System.out.println("after replace:" + text);
}
}
}
Output:
matches:false
Found match at: 19 to 21
substring:OR
after replace:SELECT count %'
It removes all occurrences of the string "OR", but I need to remove it only when the string ends with it.
How can I do that?
Also, the regex works with Pattern but not with String.matches().
What is the difference between the two, and what is the best way to remove a string only if the text ends with it?
text.matches(".*OR$") as the match goes over the entire string.
Or:
if (text.endsWith("OR"))
Or:
text = text.replaceFirst(" OR$", "");
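If you only need to remove the last OR, I suggest using the substring method, as it is faster than a full regex pattern. Note that this cuts the string at the last OR wherever it occurs and throws StringIndexOutOfBoundsException if there is no OR at all, so check text.endsWith("OR") first. In that case, you can remove the OR using this code:
text = text.substring(0, text.lastIndexOf("OR"));
If you need to replace the trailing OR with something else, use this code, which matches an OR at the end of the string that starts at a word boundary:
text = text.replaceFirst("\\bOR$", "SOME");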
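To tie the answers together, here is a small sketch that shows why matches("OR$") fails and removes "OR" only when the string ends with it. The class and method names are hypothetical.

```java
class TrailingOr {
    static String stripTrailingOr(String text) {
        // Only touch the string when it really ends with "OR"
        return text.endsWith("OR")
                ? text.substring(0, text.length() - "OR".length())
                : text;
    }

    public static void main(String[] args) {
        String text = "SELECT count OR %' OR";
        // matches() must cover the whole input, so "OR$" alone fails
        System.out.println(text.matches("OR$"));   // false
        System.out.println(text.matches(".*OR$")); // true
        System.out.println("[" + stripTrailingOr(text) + "]"); // [SELECT count OR %' ]
    }
}
```

The earlier OR is left in place, and a string that does not end with OR comes back unchanged.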

Java regex to match after start of previous match [duplicate]

How can I extract overlapping matches from an input using String.split()?
For example, if trying to find matches to "aba":
String input = "abababa";
String[] parts = input.split(???);
Expected output:
[aba, aba, aba]
String#split will not give you overlapping matches, because any given part of the string ends up in exactly one element of the resulting array, never in two.
You should use Pattern and Matcher classes here.
You can use this regex:
Pattern pattern = Pattern.compile("(?=(aba))");
Then use the Matcher#find method to get all the overlapping matches, and print group(1) for each one.
The above regex matches every empty string that is followed by aba; the text itself is captured in group 1. Since a look-ahead is a zero-width assertion, it does not consume the string that is matched, and hence you get all the overlapping matches.
String input = "abababa";
String patternToFind = "aba";
Pattern pattern = Pattern.compile("(?=" + patternToFind + ")");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(patternToFind + " found at index: " + matcher.start());
}
Output:
aba found at index: 0
aba found at index: 2
aba found at index: 4
I would use indexOf.
for(int i = text.indexOf(find); i >= 0; i = text.indexOf(find, i + 1))
System.out.println(find + " found at " + i);
This is not a correct use of split(). From the javadocs:
Splits this string around matches of the given regular expression.
Seems to me that you are not trying to split the string but to find all matches of your regular expression in the string. For this you would have to use a Matcher, and some extra code that loops on the Matcher to find all matches and then creates the array.
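The "extra code" could look roughly like this: a sketch that collects the overlapping matches into a list by wrapping the search text in a capturing look-ahead. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    static List<String> findOverlapping(String input, String literal) {
        // Pattern.quote treats the search text literally;
        // the look-ahead keeps each match zero-width so matches may overlap
        Pattern p = Pattern.compile("(?=(" + Pattern.quote(literal) + "))");
        Matcher m = p.matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1)); // the text seen by the look-ahead
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("abababa", "aba")); // [aba, aba, aba]
    }
}
```

Call result.toArray(new String[0]) if an array is needed, as in the question.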

Find a substring in a string using a regular expression - JAVA

Suppose I have a string " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' "
I want to replace only those occurrences of the substring a.b.c that are outside the single quotes, but it is not working.
Here is my code
String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
Pattern p = Pattern.compile("a\\.b\\.c");
Matcher m = p.matcher(str);
int x = m.find()
Use this pattern: a\.b\.c(?=(([^']*'){2})*[^']*$)
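A minimal sketch of how that pattern could be used with replaceAll in Java. It assumes the quotes in the input are balanced, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;

class OutsideQuotes {
    static String replaceOutsideQuotes(String s, String replacement) {
        // The look-ahead succeeds only when an even number of ' follows the match,
        // which (for balanced quotes) means the match is outside any quoted part
        return s.replaceAll("a\\.b\\.c(?=(([^']*'){2})*[^']*$)",
                Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceOutsideQuotes("a.b.c 'a.b.c' a.b.c", "X")); // X 'a.b.c' X
    }
}
```

Because the look-ahead rescans the rest of the string for every candidate, this is simple rather than fast on long inputs.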
To search for a substring outside quotes, you can do something like this:
Pattern pat = Pattern.compile("^(?:[^']|'[^']*')*?a\\.b\\.c");
The first part will skip over:
every character that isn't a quote mark ([^']), or
every sequence of non-quote-mark characters enclosed in quotes ('[^']*').
Once those are skipped, then if it sees the pattern you want, it will know that it isn't inside quote marks.
This will handle a simple case. If things start getting more complicated, e.g. you want to allow \' to quote a quote mark in your input string the way C or Java does in a string literal, the regex gets more complicated too, and you can quickly reach a point where either your regex is unreadable or regexes aren't a suitable solution.
EDIT: fixed to put "reluctant" qualifier after second *, so that the first a.b.c will be found.
EDIT 2: If you want to replace the substring you find, it gets trickier. The above pattern matches the entire beginning of the string up through a.b.c, and I couldn't get a look-behind to work so that the match would be only the a.b.c part. I think you'll need to put the beginning of the string in a group, and then use $1 in the replacement string to copy the beginning:
Pattern pat = Pattern.compile("^((?:[^']|'[^']*')*?)a\\.b\\.c");
Matcher m = pat.matcher(source);
if (m.find()) {
result = m.replaceFirst("$1replacement");
}
I'm not sure replaceAll works with this, so if you want to replace all of them, you may need to loop.
I wouldn't mess with REGEX.
public static void main(String[] args) {
String str = " kk a.b.cjkmkc jjkocc a.b.c. jjj 'a.b.ckkkkkkkkkkkkkkkk ' ";
String[] s = str.split("'");
str = s[0].replaceAll("[abc]", "") + "'"+ s[1]+"'"
+ s[2].replaceAll("[abc]", "");
System.out.println(str);
}
Output:
kk ..jkmk jjko ... jjj 'a.b.ckkkkkkkkkkkkkkkk '
Inefficient.. but works

Retrieving Regex matched pattern

I need to retrieve the strings that match a regex pattern from a given input.
Let's say the pattern I need to get looks like this:
"http://mysite.com/<somerandomvalues>/images/<againsomerandomvalues>.jpg"
I created the following regex pattern for this:
http:\/\/.*\.mysite\.com\/.*\/images\/.*\.jpg
Can anybody illustrate how to retrieve all the matches of this regex using Java?
You don't need to escape slashes, only literal dots:
String regex = "http://(.*)\\.mysite\\.com/(.*)/images/(.*)\\.jpg";
String url = "http://www.mysite.com/work/images/cat.jpg";
Pattern pattern = Pattern.compile (regex);
Matcher matcher = pattern.matcher (url);
if (matcher.matches ())
{
int n = matcher.groupCount ();
for (int i = 1; i <= n; ++i) // group 0 is the whole match, so start at 1
System.out.println (matcher.group (i));
}
Result:
www
work
cat
Some simple Java example:
String my_regex = "http://.*\\.mysite\\.com/.*/images/.*\\.jpg"; // literal dots escaped
Pattern pattern = Pattern.compile(my_regex);
Matcher matcher = pattern.matcher(string_to_be_matched);
// Check all occurrences
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
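If the input contains several URLs, a greedy .* can swallow everything between the first "http://" and the last ".jpg". A sketch of one way around that, assuming URLs contain no whitespace (so \S*? is used instead of .*); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlFinder {
    // Assumption: URLs contain no whitespace, so the reluctant \S*?
    // keeps each match inside a single URL
    private static final Pattern URL =
            Pattern.compile("http://\\S*?\\.mysite\\.com/\\S*?/images/\\S*?\\.jpg");

    static List<String> findAll(String input) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(input);
        while (m.find()) {
            urls.add(m.group()); // the whole matched URL
        }
        return urls;
    }
}
```

For example, findAll applied to a sentence containing two such URLs returns a list with both, in order of appearance.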
In fact, it is not clear if you want the whole matching string or only the groups.
Bogdan Emil Mariesan's answer can be reduced to
if ( matcher.matches () ) System.out.println(string_to_be_matched);
because you know it is matched and there are no groups.
IMHO, user unknown's answer is correct if you want to get matched groups.
I just want to add (for others) that if you need a matched group, you can use the replaceFirst() method too, as long as the pattern covers the whole string:
String firstGroup = string.replaceFirst("^http://mysite\\.com/(.*)/images/.*$", "$1");
But the performance of the Pattern.compile approach is better if there are two or more groups or if you need to do this multiple times (on the other hand, in programming contests, for example, replaceFirst() is faster to write).
