I have to remove "OR" from a given string, but only if the string ends with it.
public class StringReplaceTest {
public static void main(String[] args) {
String text = "SELECT count OR %' OR";
System.out.println("matches:" + text.matches("OR$"));
Pattern pattern = Pattern.compile("OR$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
System.out.println("substring:" + text.substring(matcher.start(), matcher.end()));
text = text.replace(text.substring(matcher.start(), matcher.end()), "");
System.out.println("after replace:" + text);
}
}
}
Output:
matches:false
Found match at: 19 to 21
substring:OR
after replace:SELECT count %'
It removes every occurrence of the string "OR", but I only want to remove it when the string ends with it.
How can I do that?
Also, the regex works with Pattern but not with String.matches().
What is the difference between the two, and what is the best way to remove a string only when the text ends with it?
Use text.matches(".*OR$"), because String.matches() has to match the entire string, whereas Matcher.find() looks for a match anywhere in it.
Or:
if (text.endsWith("OR"))
Or:
text = text.replaceFirst(" OR$", "");
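To make the difference between matches() and find() concrete, here is a minimal sketch. The class name TrailingOrDemo and the helper stripTrailingOr are made up for illustration; the pattern assumes the trailing OR should be removed only as a whole word, together with the whitespace in front of it.

```java
import java.util.regex.Pattern;

class TrailingOrDemo {
    // Optional whitespace, then the whole word OR, then end of input.
    private static final Pattern TRAILING_OR = Pattern.compile("\\s*\\bOR$");

    // Hypothetical helper: removes one trailing "OR" (and the whitespace before it), if present.
    static String stripTrailingOr(String text) {
        return TRAILING_OR.matcher(text).replaceFirst("");
    }

    public static void main(String[] args) {
        String text = "SELECT count OR %' OR";
        // matches() must cover the whole input, so "OR$" alone fails.
        System.out.println(text.matches("OR$"));                          // false
        System.out.println(text.matches(".*OR$"));                        // true
        // find() succeeds as soon as any part of the input matches.
        System.out.println(Pattern.compile("OR$").matcher(text).find());  // true
        System.out.println("[" + stripTrailingOr(text) + "]");            // [SELECT count OR %']
    }
}
```

Because the pattern is anchored with $, the OR in the middle of the string is left alone.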
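If you just need to remove a trailing OR, I suggest using endsWith and substring, which are typically faster than a regex. Check endsWith first: lastIndexOf alone would also cut at an OR in the middle of the string, and it returns -1 (so substring throws) when there is no OR at all:
if (text.endsWith("OR")) text = text.substring(0, text.length() - 2);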
If you need to replace the trailing OR with something else, use the following, which matches OR only as a whole word at the end of the string (\\b is a word boundary). Strings are immutable, so assign the result:
text = text.replaceFirst("\\bOR$", "SOME");
I have a problem with a regex that is not working, and I don't know what I am doing wrong. My code:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("\\btimetable:(.*);");
//also tried "timetable:(.*);" and "(\\btimetable:)(.*)(;)"
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("S:" + m.start() + ", E:" + m.end());
System.out.println("x: "+ test.substring(m.start(), m.end()));
}
Expected result:
(1) "timetable:xxxxxtimetable:"
(2) "timetable: fullihhghtO"
Thanks for any help.
A non-capturing group could be handy in our case:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("(?:\\btimetable:(.*?);)+"); // <-- here
Matcher m = p.matcher(test);
int i = 1;
while (m.find()) {
System.out.println(i + ") "+ m.group(1));
i++;
}
OUTPUT
1) xxxxxtimetable:
2) fullihhghtO
Regex explained:
(?:\\btimetable:(.*?);)+ The non-capturing group (?:...) lets us consume "timetable:" and the closing ; without capturing them, while the only capturing group, (.*?), captures what we want: everything between \btimetable: and ;. Pay special attention to the non-greedy (lazy) quantifier .*?, which consumes the minimum possible number of characters before the next ;. Without it, the regex falls back to the default greedy mode and consumes everything up to the last ; in the string!
Now, all that is relevant if you wanted to catch only the unique part, but if you wanted to catch the whole thing:
1) timetable:xxxxxtimetable:;
2) timetable: fullihhghtO;
It can be done easily by modifying the line with the regex to:
Pattern p = Pattern.compile("\\b(timetable:.*?;)+");
which is even simpler: only one capturing group (see that we still have to use the non-greedy mode!).
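The greedy versus lazy behaviour described above can be checked side by side. This is only a sketch: the class GreedyVsLazy and the helper captures are illustrative names, and the patterns are simplified versions of the ones in this thread (no outer group, one capturing group).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects group(1) of every match of the given regex in the input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
        // Greedy .* runs to the last ';', so there is a single long match.
        System.out.println(captures("\\btimetable:(.*);", test).size());   // 1
        // Lazy .*? stops at the first ';', so there are two matches.
        System.out.println(captures("\\btimetable:(.*?);", test).size());  // 2
    }
}
```

Note that the second lazy capture keeps the space that follows "timetable:" in the input; call trim() on it if that matters.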
You don't need regex; a simple split would do it:
public static void main(String[] args) throws IOException {
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
String[] array = test.split(";");
String str1 = array[0].trim();
String str2 = array[1].trim();
System.out.println(str1 + "\n" + str2); //timetable:xxxxxtimetable:
//timetable: fullihhghtO
}
I want to print the position of the second occurrence of zip in text, or -1 if it does not occur at least twice.
public class UdaciousSecondOccurence {
String text = "all zip files are zipped";
String text1 = "all zip files are compressed";
String REGEX = "zip{2}"; // atleast two occurences
protected void matchPattern1(){
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(text);
while(m.find()){
System.out.println("start index p" +m.start());
System.out.println("end index p" +m.end());
// System.out.println("Found a " + m.group() + ".");
}
output for matchPattern1()
start index p18
end index p22
But it does not print anything for text1. I used a similar method for the second string.
text1 does not match the regex zip{2}, therefore the while loop never iterates because there are no matches.
The expression attempts to match the literal zipp, which is contained in text but not in text1.
If you want to match the second occurrence, I would recommend using a capture group: zip.*?(zip)
Example
String text = "all zip files are zip";
String text1 = "all zip files are compressed";
String REGEX = "zip.*?(zip)";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(text);
if(m.find()){
System.out.println("start index p" + m.start(1));
System.out.println("end index p" + m.end(1));
}else{
System.out.println("Match not found");
}
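Use the code below; it prints -1 when nothing matches. Note that it keeps the regex zip{2}, which matches the literal zipp rather than two occurrences of zip: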
public class UdaciousSecondOccurence {
String text = "all zip files are zipped";
String text1 = "all zip files are compressed";
String REGEX = "zip{2}"; // atleast two occurences
protected void matchPattern1(){
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(text);
if(m.find()){
System.out.println("start index p" +m.start());
System.out.println("end index p" +m.end());
// System.out.println("Found a " + m.group() + ".");
}else{
System.out.println("-1");
}
}
public static void main(String[] args) {
UdaciousSecondOccurence uso = new UdaciousSecondOccurence();
uso.matchPattern1();
}
}
If it must match twice, rather than using a while loop I would code it like this using regex "zip" (once, not twice):
if (m.find() && m.find()) {
// found twice, Matcher at 2nd match
} else {
// not found twice
}
p.s. text1 doesn't have two zips
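Calling find() twice can be wrapped into a small method that returns the index or -1. This is a sketch only; SecondOccurrence and secondIndexOf are hypothetical names, and Pattern.quote is used on the assumption that the search word should be treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondOccurrence {
    // Returns the start index of the second occurrence of word in text, or -1.
    static int secondIndexOf(String text, String word) {
        Matcher m = Pattern.compile(Pattern.quote(word)).matcher(text);
        // After the second successful find(), the matcher sits on the second match.
        return (m.find() && m.find()) ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(secondIndexOf("all zip files are zipped", "zip"));      // 18
        System.out.println(secondIndexOf("all zip files are compressed", "zip"));  // -1
    }
}
```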
zip{2} matches the string zipp: the {2} applies only to the element immediately preceding it, the 'p'.
That is not what you want.
You probably just want to use zip as your regex, and leave the counting of occurrences to the code around it.
Why don't you just use String.indexOf twice?
String text = "all zip files are zipped";
String text1 = "all zip files are compressed";
int firstOccurrence = text.indexOf("zip");
int secondOccurrence = text.indexOf("zip", firstOccurrence + 1);
System.out.println(secondOccurrence);
firstOccurrence = text1.indexOf("zip");
secondOccurrence = text1.indexOf("zip", firstOccurrence + 1);
System.out.println(secondOccurrence);
Output
18
-1
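For text1, the statements inside while(m.find()) are never executed, because find() cannot find any match.
You need the pattern to match once or twice. Try the regex zip{1,2}, which matches zip or zipp. It matches each occurrence separately, so you still need to count the matches in code to get the second one:
String REGEX = "zip{1,2}";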
There could be two reasons:
1st: text1 doesn't contain 'zip' twice.
2nd: You need to add code that prints -1 when no match is found, e.g. if m.find() returns true, print the index;
otherwise print -1.
I need to cut the tail off a string in some cases. I have done this with indexOf and substring, but it slowed my code down. I have thought about regular expressions, but these tails only have similar beginnings; the tail is not a "stable" word.
For example, I have this string
aaaaa bbb cc (bb) (r-1hh)
and I need the result
aaaaa bbb cc (bb)
but the string could also be
aaaaa bbb cc (bb) (r3-34fff)
or
aaaaa bbb cc (bb) [tagBB- na]
So, the question is: could I use a regex to find the index of the tail?
The other question: do indexOf or substring use regex in Java?
How to find regex match position:
Pattern p = Pattern.compile("i.*t");
String s = "my input string";
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println("match begins at " + m.start()); // 3
System.out.println("match ends at " + m.end()); // 11
} else {
System.out.println("no match found");
}
But you can remove trailing text this way:
String res = s.replaceFirst("^(.* input).*", "$1");
System.out.println("'" + res + "'");
Or use an exact match without escaping each special char this way:
String res = s.replaceFirst("^(.* " + Pattern.quote("^something$wierd^") + ").*", "$1");
System.out.println("'" + res + "'");
You could write a regex that matches any characters except ) and then a closing ), so the match stops at the first ) and nothing after it is included.
You could use $ to match the end of the string and then find a common pattern for your tail. Is it always going to be an alphanumeric/dash/space character situated between [] or ()? Then that's your pattern.
Then just substring everything between the beginning of your initial string and the beginning of the substring you found using the pattern for the tail.
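The approach just described (anchor on $, match the tail, then substring up to the start of the match) might look like the sketch below. It assumes the tail is always the last (...) or [...] group at the very end of the string, with no nested brackets; the names TailCutter, tailIndex and cutTail are illustrative. Under that assumption, a string that ends in "(bb)" with no further tail would lose "(bb)", so the pattern would need tightening if that case can occur.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TailCutter {
    // Assumed tail shape: whitespace, one (...) or [...] group without nested brackets, end of input.
    private static final Pattern TAIL =
            Pattern.compile("\\s*[(\\[][^()\\[\\]]*[)\\]]\\s*$");

    // Index where the tail starts, or -1 if the string has no such tail.
    static int tailIndex(String s) {
        Matcher m = TAIL.matcher(s);
        return m.find() ? m.start() : -1;
    }

    // Everything before the tail; the input is returned unchanged if there is no tail.
    static String cutTail(String s) {
        int i = tailIndex(s);
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(cutTail("aaaaa bbb cc (bb) (r-1hh)"));     // aaaaa bbb cc (bb)
        System.out.println(cutTail("aaaaa bbb cc (bb) (r3-34fff)"));  // aaaaa bbb cc (bb)
        System.out.println(cutTail("aaaaa bbb cc (bb) [tagBB- na]")); // aaaaa bbb cc (bb)
    }
}
```

Compiling the Pattern once in a static field avoids recompiling it on every call, which matters if speed was the original concern.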
You asked:
Can regex be used to find the index of the tail in the String?
You can use a Pattern and Matcher to achieve this.
Just noticed someone else has already answered this, so I won't give an example.
Do the String methods indexOf or substring use regex in Java?
No. In Java, indexOf and substring work directly on the characters of the string and do not use regex. See the Javadoc or the source for more detail.
You can achieve this in Java fairly easily; this example may be similar to your existing implementation:
public String truncate(String str, String tail) {
int lengthOfTail = tail.length();
int indexOfTail = str.indexOf(tail);
return str.substring(0, indexOfTail + lengthOfTail);
}
(error handling omitted for clarity)
I'm trying to write a function that extracts each word from a sentence that contains a certain substring e.g. Looking for 'Po' in 'Porky Pork Chop' will return Porky Pork.
I've tested my regex on regexpal but the Java code doesn't seem to work. What am I doing wrong?
private static String foo()
{
String searchTerm = "Pizza";
String text = "Cheese Pizza";
String sPattern = "(?i)\b("+searchTerm+"(.+?)?)\b";
Pattern pattern = Pattern.compile ( sPattern );
Matcher matcher = pattern.matcher ( text );
if(matcher.find ())
{
String result = "-";
for(int i=0;i < matcher.groupCount ();i++)
{
result+= matcher.group ( i ) + " ";
}
return result.trim ();
}else
{
System.out.println("No Luck");
}
}
In Java, to pass the \b word boundary to the regex engine you need to write it as \\b in a string literal. A lone \b in a Java string literal represents the backspace character.
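A quick way to see the difference is the small sketch below; the class BackspaceVsBoundary and the helper findsWord are names invented for this illustration.

```java
import java.util.regex.Pattern;

class BackspaceVsBoundary {
    // True if the regex matches anywhere in the text.
    static boolean findsWord(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // "\b" is a single backspace character (U+0008), not a regex word boundary.
        System.out.println("\b".length());                               // 1
        System.out.println(findsWord("\bPizza\b", "Cheese Pizza"));      // false
        // "\\b" reaches the regex engine as \b, the word boundary.
        System.out.println(findsWord("\\bPizza\\b", "Cheese Pizza"));    // true
    }
}
```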
Judging by your example, you want to return all words that contain your substring. To do this, don't use for(int i=0;i < matcher.groupCount ();i++) but while(matcher.find()), since the groupCount loop iterates over the groups of a single match, not over all matches.
If your search term can contain special characters, you should probably use Pattern.quote(searchTerm).
In your code you are trying to find "Pizza" in "Cheese Pizza", so I assume that you also want to find words that are identical to the search term. Although your regex will work fine for that, you can change the last part, (.+?)?), to \\w* and also add \\w* at the start if the substring should also be matched in the middle of a word (not only at its start).
So your code can look like
private static String foo() {
String searchTerm = "Pizza";
String text = "Cheese Pizza, Other Pizzas";
String sPattern = "(?i)\\b\\w*" + Pattern.quote(searchTerm) + "\\w*\\b";
StringBuilder result = new StringBuilder("-").append(searchTerm).append(": ");
Pattern pattern = Pattern.compile(sPattern);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
result.append(matcher.group()).append(' ');
}
return result.toString().trim();
}
While the regex approach is certainly a valid method, I find it easier to think through when you split the words up by whitespace. This can be done with String's split method.
public List<String> doIt(final String inputString, final String term) {
    final List<String> output = new ArrayList<String>();
    // split the sentence into words on runs of whitespace
    final String[] parts = inputString.split("\\s+");
    for (final String part : parts) {
        // contains() also accepts a match at index 0, e.g. "Po" in "Porky"
        if (part.contains(term)) {
            output.add(part);
        }
    }
    return output;
}
Of course, it is worth noting that this effectively makes two passes through your input String: the first pass finds the whitespace to split on, and the second pass looks through each split word for your substring.
If one pass is necessary though, the regex path is better.
I find nicholas.hauschild's answer to be the best.
However if you really wanted to use regex, you could do it as such:
String searchTerm = "Pizza";
String text = "Cheese Pizza";
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchTerm)
+ "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
Pizza
The pattern should have been
String sPattern = "(?i)\\b("+searchTerm+"(?:.+?)?)\\b";
You want to capture the whole (pizza) string. The ?: ensures you don't capture a part of the string twice.
Try this pattern:
String searchTerm = "Po";
String text = "Porky Pork Chop oPod zzz llPo";
// the longest alternative comes first so that "oPod" is matched whole, not as "oPo"
Pattern p = Pattern.compile("\\p{Alpha}+" + searchTerm + "\\p{Alpha}+|\\p{Alpha}+" + searchTerm + "|" + searchTerm + "\\p{Alpha}+");
Matcher m = p.matcher(text);
while(m.find()) {
System.out.println(">> " + m.group());
}
OK, here is the pattern in raw form (not as a Java string literal; you must double the backslashes yourself):
(?i)\b[a-z]*po[a-z]*\b
And that's all.
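For reference, the raw pattern above could be written in Java roughly as follows. This is a sketch: WordsContaining and wordsContaining are illustrative names, the (?i) flag is replaced by Pattern.CASE_INSENSITIVE, and the search term is passed through Pattern.quote on the assumption that it should be matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsContaining {
    // Returns every word of text that contains term, ignoring case.
    static List<String> wordsContaining(String text, String term) {
        // Raw form: (?i)\b[a-z]*TERM[a-z]*\b, with each backslash doubled for Java.
        Pattern p = Pattern.compile(
                "\\b[a-z]*" + Pattern.quote(term) + "[a-z]*\\b",
                Pattern.CASE_INSENSITIVE);
        List<String> words = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsContaining("Porky Pork Chop", "Po")); // [Porky, Pork]
    }
}
```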
I am new to Java. I want to search for a string in a text file. Suppose the file contains:
Hi, I am learning Java.
I am using the pattern below to search for each exact word.
Pattern p = Pattern.compile("\\b" + searchString + "\\b", Pattern.CASE_INSENSITIVE);
It works fine, but it doesn't find "java." How can I find both forms, i.e. the word delimited by boundaries and the word with a "." at the end? Does anyone have any ideas on how I can solve this problem?
You should process your search string to turn the dot . into an escaped regex dot, written \\. in a Java string literal. Note that an unescaped dot is a metacharacter in regular expressions and matches any character. For example, you can replace every dot in your String with \\.
If you don't want to do all that work, just pass java\\. as your search string.
Code example:
public static void main(String[] args) {
String fileContent = "Hi i am learning java.";
String searchString = "java";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(fileContent );
while(m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
It would print: 17 java
public static void main(String[] args) {
String fileContent = "Hi i am learning java.";
String searchString = "java\\.";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(fileContent );
while(m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
It would print: 17 java. (note the dot in the end)
EDIT: As a very basic solution, since the only problem you have is with the dot, you can replace all the dots in your string with \\. (in the replacement argument of replaceAll the backslash itself must be escaped, hence "\\\\."):
public static void main(String[] args) {
String fileContent = "Hi i am learning java.";
String searchString = "java.";
//this is safe even if the "searchString" doesn't contain a dot
searchString = searchString.replaceAll("\\.", "\\\\.");
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(fileContent );
while(m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
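Instead of escaping the dot by hand, Pattern.quote can escape the whole search string. The sketch below also swaps \\b for lookarounds, because \\b after a trailing "." does not match at the end of the text; that substitution is an assumption about what "exact word" should mean here, and LiteralWordSearch and indexOfWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralWordSearch {
    // Index of the first occurrence of search that is not glued to other word characters, or -1.
    static int indexOfWord(String content, String search) {
        Pattern p = Pattern.compile(
                "(?<!\\w)" + Pattern.quote(search) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(content);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String content = "Hi, I am learning Java.";
        System.out.println(indexOfWord(content, "java"));           // 18
        System.out.println(indexOfWord(content, "java."));          // 18
        System.out.println(indexOfWord("I like javascript", "java")); // -1
    }
}
```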
"\\b" + searchstring + "(?:\\.|\\b)"
If you want to stipulate that the dot must be followed by a non-word character or the end of the string, you could add a positive look-ahead
"\\b" + searchstring + "(?:\\.(?=\\W|$)|\\b)"
Pattern p = Pattern.compile(".*\\W*" + searchWord + "\\W*.*", Pattern.CASE_INSENSITIVE);
To be precise, the above says "find me a bit of text that starts with 0 or more characters, followed by 0 or more non-word characters (\W*), followed by the search word, followed by 0 or more non-word characters, followed by anything else". Because \W* can match zero characters, it is not a true word boundary, so the search word is also found inside longer words.
This caters for situations where the search word is at the beginning of the file, at the very end, or between punctuation, e.g. "hi,I am learning,java.".
Hope this helps...