I want to find and replace a substring beginning with string 'sps.jsp' and ending with substring 'FILE_ARRAY_INDEX=12'.
Following is my string content
beginning with strings............[sps.jsp]..anything between.. [FILE_ARRAY_INDEX=12] ending with some strings....
Below is my code
Pattern r = Pattern.compile("sps.jsp[\\s\\S]*?FILE_ARRAY_INDEX=12");
Matcher m = r.matcher(InputStr);
if (m.find( ))
{
System.out.println("Found value: " + m.group() );
}
I'm not able to get my pattern and replace it with a new string.
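All you need is String::replaceAll with the regex sps\.jsp(.*?)FILE_ARRAY_INDEX=12. The dot in sps.jsp is escaped so that it matches only a literal '.', and .*? stops at the first FILE_ARRAY_INDEX=12 that follows.
String inputStr = "...."; // your input
inputStr = inputStr.replaceAll("sps\\.jsp(.*?)FILE_ARRAY_INDEX=12", "[some string]");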
Outputs
beginning with strings............[some string] ending with some strings....
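If the start and end markers are not known in advance, or the text between them can span several lines, the same idea can be sketched as below. The class and method names are hypothetical, and the sketch assumes both markers should be treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceBetweenMarkers {

    // Hypothetical helper: replaces every span that starts with `start` and ends with
    // the nearest following `end` (both taken literally) with `replacement`.
    static String replaceBetween(String input, String start, String end, String replacement) {
        // Pattern.quote keeps '.' and other metacharacters literal.
        // DOTALL lets '.' cross line breaks; '.*?' stops at the first end marker.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + ".*?" + Pattern.quote(end), Pattern.DOTALL);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "head sps.jsp\nanything between FILE_ARRAY_INDEX=12 tail";
        System.out.println(
                replaceBetween(input, "sps.jsp", "FILE_ARRAY_INDEX=12", "[some string]"));
    }
}
```

Without Pattern.DOTALL the dot does not match line terminators, so a span that crosses a line break would be left untouched.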
Related
How to tokenize a String like a lexer in Java?
Please refer to the question above. I have never used Java regex. How can I build a new string in which every token, including symbols such as '(' ')' '.' '<' '>' '"', is separated by a single space? For example, before applying the regex:
String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));";
I want the resulting string to look like this:
String r = " List < String > uncleanList = Arrays . asList ( input1 . split ( \" x \" ) ) ; ";
Referring to the code that you linked to, matcher.group() gives you a single token. Simply use a StringBuilder to append each token followed by a space to get a new string in which the tokens are space-separated.
String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));" ;
Pattern pattern = Pattern.compile("\\w+|[+-]?[0-9\\._Ee]+|\\S");
Matcher matcher = pattern.matcher(c);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String token = matcher.group();
sb.append(token).append(" ");
}
String r = sb.toString();
System.out.println(r);
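The loop above leaves a trailing space after the last token. A minimal variant that collects the tokens first and joins them afterwards avoids that. The class and method names are hypothetical, and the pattern is simplified to "a run of word characters or any single non-space character", which is an assumption about what counts as a token here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpaceSeparatedTokens {

    // Hypothetical helper: splits the input into word tokens and single symbols,
    // then joins them with exactly one space (no leading or trailing space).
    static String spaceOut(String input) {
        // \w+ matches a run of letters, digits or underscores;
        // \S matches any other single non-whitespace character.
        Matcher matcher = Pattern.compile("\\w+|\\S").matcher(input);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));";
        System.out.println(spaceOut(c));
    }
}
```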
String c = "List<String> uncleanList = Arrays.asList(input1.split('x'));";
// Pad each listed symbol with spaces in one pass; $0 is the matched symbol.
// (Calling c.replace(symbol, ...) inside a find() loop pads a repeated symbol once per occurrence.)
c = c.replaceAll("[<>\".()]", " $0 ");
Actually, if you look closely, you only need to separate the characters that are neither letters nor spaces: ((?![a-zA-Z]|\ ).)
How to extract the strings between the delimiters '<' and '>' from the string
“Rahul<is>an<entrepreneur>”
I tried using the substring() method, but I could only extract one string out of the primary string. How can I loop over it and get all the strings between the delimiters?
You could use Pattern and Matcher for pattern lookup. For example, see code below:
String STR = "Rahul<is>an<entrepreneur>";
Pattern pattern = Pattern.compile("<(.*?)>", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(STR);
while (matcher.find()) {
System.out.println(matcher.start() + " " + matcher.end() + " " + matcher.group());
}
The output shows the start and end index of each match, followed by the matched substring:
5 9 <is>
11 25 <entrepreneur>
More specifically, if you just want the strings without the delimiters, take the substring between the match start and end indexes:
STR.substring(matcher.start() + 1, matcher.end() - 1);
This gives you only the text between the delimiters.
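Since the pattern already has a capturing group, matcher.group(1) returns the text between the delimiters directly, without any index arithmetic. A minimal sketch follows; the class and method names are hypothetical, and it uses [^<>]* instead of .*? so that a match cannot run past a delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BetweenDelimiters {

    // Hypothetical helper: returns every piece of text enclosed in '<' and '>'.
    static List<String> extract(String input) {
        Matcher matcher = Pattern.compile("<([^<>]*)>").matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            // group(1) is the part inside the parentheses, without the delimiters.
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Rahul<is>an<entrepreneur>")); // [is, entrepreneur]
    }
}
```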
This worked for me:
String str = "Rahul<is>an<entrepreneur>";
String[] tempStr = str.split("<");
for (String st : tempStr) {
if (st.contains(">")) {
int index = st.indexOf('>');
System.out.println(st.substring(0, index));
}
}
Output:
is
entrepreneur
I would like to be able to find the first occurrence of m² and then numbers in front of it, could be integers or decimal numbers.
E.g.
"some text" 38 m² "some text" ,
"some text" 48,8 m² "some text",
"some text" 48 m² "some text", etc..
What I have so far is:
\d\d,\d\s*(\m\u00B2)|\d\d\s*(\m\u00B2)
This right now finds all occurrences, although I guess it could be fixed with findFirst(). Any ideas how to improve the Regex part?
To get the first match, you just need to use Matcher#find() inside an if block:
String rx = "\\d+(?:,\\d+)?\\s*m\\u00B2";
Pattern p = Pattern.compile(rx);
Matcher matcher = p.matcher("E.g. : 4668,68 m² some text, some text 48 m² etc");
if (matcher.find()){
System.out.println(matcher.group());
}
Note that you can get rid of the alternation group using an optional non-capturing group (?:..)?
Pattern breakdown:
\d+ - 1+ digits
(?:,\d+)? - an optional (0 or 1) sequence of a comma followed by 1+ digits
\s* - 0+ whitespace symbols
m\u00B2 - the literal text m².
This is what I came up with, with your help :) (work in progress; later it should return a BigDecimal value). For now it seems to work:
public static String findArea(String description) {
String tempString = "";
Pattern p = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");
Matcher m = p.matcher(description);
if(m.find()) {
tempString = m.group();
}
// remove the m² unit and any whitespace so that only digits and the comma remain
tempString = tempString.replaceAll("[^0-9,]","");
System.out.println(tempString);
return tempString;
}
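The BigDecimal step mentioned above could look roughly like the sketch below. It is not the poster's final code: the class name and the Optional return type are assumptions, and it captures the number in group 1 so the unit never has to be stripped afterwards.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AreaParser {

    // Group 1 captures only the number; the unit stays outside the group.
    private static final Pattern AREA =
            Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    // Returns the first area found in the description, or an empty Optional.
    static Optional<BigDecimal> findArea(String description) {
        Matcher m = AREA.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        // BigDecimal expects '.' as the decimal separator, so swap the comma.
        return Optional.of(new BigDecimal(m.group(1).replace(',', '.')));
    }

    public static void main(String[] args) {
        System.out.println(findArea("some text 48,8 m\u00B2 some text 38 m\u00B2"));
    }
}
```

Returning an Optional instead of an empty string makes the "no area found" case explicit for the caller.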
One simple way of doing it:
description.replaceFirst(@NotNull String regex,
@NotNull String replacement)
Javadoc: Replaces the first substring of this string that matches the given regular expression with the given replacement.
To find only the first or only the last one:
@Test
public void testFindFirstRegExp() {
String pattern = ".* (\\d+,\\d+) .*";
Pattern r = Pattern.compile(pattern);
String line = "some text 44,66 m² some 33,11 m² text 11,22 m² some text";
Matcher m = r.matcher(new StringBuilder(line).reverse().toString());
String expected = "44,66";
String actual = null;
if (m.find()) {
actual = new StringBuilder(m.group(1)).reverse().toString();
}
System.out.println("got first:" + actual);
Assert.assertEquals(expected, actual);
m = r.matcher(line);
expected = "11,22";
actual = null;
if (m.find()) {
actual = m.group(1);
}
System.out.println("got last:" + actual);
Assert.assertEquals(expected, actual);
}
prints:
got first:44,66
got last:11,22
Note: when matching against the reversed string, the pattern itself may need to be reversed too, for example:
pattern = ".* (\\d+,\\d+-?) .*"; // reversed form of (-?\\d+,\\d+)
but this one works as expected on the original string:
pattern = " (\\-?\\d+,\\d+) ";
and you get all of the matches in a loop:
while (m.find()) {
actual = m.group(1);
System.out.println("got last:" + actual);
}
Will print:
got last:44,66
got last:33,11
got last:11,22
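The same loop can also give the first and the last match without reversing the string. This is a minimal sketch with hypothetical names; it assumes the numbers always contain a comma and, unlike the pattern above, does not require surrounding spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstAndLastMatch {

    private static final Pattern NUMBER = Pattern.compile("\\d+,\\d+");

    // Returns the first match, or null when there is none.
    static String first(String line) {
        Matcher m = NUMBER.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Returns the last match by walking over all matches, or null when there is none.
    static String last(String line) {
        Matcher m = NUMBER.matcher(line);
        String result = null;
        while (m.find()) {
            result = m.group();
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "some text 44,66 m\u00B2 some 33,11 m\u00B2 text 11,22 m\u00B2 some text";
        System.out.println("first: " + first(line));
        System.out.println("last: " + last(line));
    }
}
```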
I have shared my sample code here. I am trying to find the word "engine" in different strings, and I used a word boundary to match whole words.
It also matches when the word is preceded by '#', as in #engine.
It should only match the exact word.
private void checkMatch() {
String source1 = "search engines has ";
String source2 = "search engine exact word";
String source3 = "enginecheck";
String source4 = "has hashtag #engine";
String key = "engine";
System.out.println(isContain(source1, key));
System.out.println(isContain(source2, key));
System.out.println(isContain(source3, key));
System.out.println(isContain(source4, key));
}
private boolean isContain(String source, String subItem) {
String pattern = "\\b" + subItem + "\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
**Expected output**
false
true
false
false
**actual output**
false
true
false
true
For this case, you have to use a regex alternation (OR) instead of a word boundary. \\b matches between a word character and a non-word character (or vice versa), so your regex does find a match in #engine, since # is a non-word character.
private boolean isContain(String source, String subItem) {
String pattern = "(?m)(^|\\s)" + subItem + "(\\s|$)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
or
String pattern = "(?<!\\S)" + subItem + "(?!\\S)";
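The lookaround variant can be checked against the four strings from the question with a sketch like this one. The class and method names are hypothetical. Wrapping the search word in Pattern.quote is an addition, on the assumption that the word should be matched literally even if it contains regex metacharacters.

```java
import java.util.regex.Pattern;

final class WholeTokenMatch {

    // True when `token` occurs with whitespace or a string boundary on both sides.
    static boolean containsToken(String source, String token) {
        // (?<!\S): no non-whitespace character directly before the token.
        // (?!\S):  no non-whitespace character directly after the token.
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(token) + "(?!\\S)");
        return p.matcher(source).find();
    }

    public static void main(String[] args) {
        String key = "engine";
        System.out.println(containsToken("search engines has ", key));      // false
        System.out.println(containsToken("search engine exact word", key)); // true
        System.out.println(containsToken("enginecheck", key));              // false
        System.out.println(containsToken("has hashtag #engine", key));      // false
    }
}
```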
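Change your pattern as below. Note that it requires a whitespace character before the word, so it will not match when the word is at the very start of the string.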
String pattern = "\\s" + subItem + "\\b";
If you are looking for a literal text enclosed with spaces or start/end of the string, you can split the string with a mere whitespace pattern like \s+ and check if any of the chunks equals the search text.
Java demo:
String s = "Can't start the #engine here, but this engine works";
String searchText = "engine";
boolean found = Arrays.stream(s.split("\\s+"))
.anyMatch(word -> word.equals(searchText));
System.out.println(found); // => true
Change the regexp to
String pattern = "\\s"+subItem + "\\s";
I'm using the
\s A whitespace character: [ \t\n\x0B\f\r]
For more info look into the java.util.regex.Pattern javadoc
Also, if you want to support strings like these:
"has hashtag engine"
"engine"
you can improve it by adding the start and end anchors (^ and $)
with this pattern:
String pattern = "(^|\\s)"+subItem + "(\\s|$)";
I am trying to use regex to remove &nbsp; from my string. Following is the program.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MyTest {
private static final StringBuffer testRegex =
new StringBuffer("<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font>" +
"<BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>Test</p><strong>" +
"<FONT color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
"<br><p>TestTest</p><br><BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ffcc66\">TestTestTestTestTest</font><br>" +
"<p>TestTestTestTest</p></blockquote><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003333\">TestTestTest</font></p><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003399\">TestTest</font></p><p> </p>");
//"This is test<P>Tag Tag</P>";
public static void main(String[] args) {
System.out.println("***Testing***");
String temp = checkRegex(testRegex);
System.out.println("***FINAL = "+temp);
}
private static String checkRegex(StringBuffer sample){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(sample);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
String group = matcher.group();
System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);
String substring = sample.substring(start, end);
System.out.println(" Substring = "+substring);
String replacedSubString = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubString);
sample.replace(start, end, replacedSubString);
System.out.println(" NEW SAMPLE = "+sample);
}
System.out.println("********WHILE OVER ********");
return sample.toString();
}
}
I am getting a java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am currently using Java's Pattern and Matcher to find &nbsp; and then replace it with " ". Does anyone know what causes this? What should I do to remove the extra &nbsp; from my string?
Thanks
Use matcher.reset(); after sample.replace(start, end, replacedSubString);
This is because the replacement changes the length of sample, so the end position that the matcher remembers may no longer be valid. You therefore need to call matcher.reset(); after every replace.
For example, if start is 0 and end is 5 and the replacement is shorter than the matched text, end points past the replaced text, and find() throws a StringIndexOutOfBoundsException once that position lies outside the string.
If the string is huge, reset can become a major performance bottleneck, because it restarts matching from the beginning. You can instead use
matcher.region(start,sample.length());
This restarts matching from the position of the last match.
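A minimal sketch of the region-based approach is shown below. The helper name is hypothetical, and it searches for a literal such as &nbsp; rather than the tag pattern from the question. It relies on Matcher.region resetting the matcher and re-reading the buffer length, which is how the OpenJDK implementation behaves; matching against a buffer that is being mutated is otherwise not something the Matcher documentation promises to support.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InPlaceReplace {

    // Hypothetical helper: replaces every occurrence of `literal` inside the buffer itself.
    static void replaceInPlace(StringBuffer sample, String literal, String replacement) {
        Matcher matcher = Pattern.compile(Pattern.quote(literal)).matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            sample.replace(start, matcher.end(), replacement);
            // The buffer changed length, so the matcher's remembered bounds are stale.
            // region() resets the matcher and continues right after the inserted text.
            matcher.region(start + replacement.length(), sample.length());
        }
    }

    public static void main(String[] args) {
        StringBuffer sample = new StringBuffer("<p>a&nbsp;b&nbsp;</p>");
        replaceInPlace(sample, "&nbsp;", " ");
        System.out.println(sample);
    }
}
```

Restarting after the inserted text, rather than at start, also avoids rescanning a replacement that happens to contain the search text.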
You need to create a new StringBuffer to hold the replaced string, then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. There is probably a way to do this in place, but the approach above is the most straightforward.
This is your checkRegex method re-written:
private static String checkRegex(String inputString){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(inputString);
// Create a new StringBuffer to hold the string after replacement
StringBuffer replacedString = new StringBuffer();
while (matcher.find()) {
// matcher.group() returns the substring that matches the whole regex
String substring = matcher.group();
System.out.println(" Substring = "+substring);
String replacedSubstring = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubstring);
// appendReplacement is a clean approach to append the text which comes
// before a match, and append the replacement text for the matched text
// Note that appendReplacement will interpret $ in the replacement string
// with special meaning (for referring to text matched by capturing group).
// Matcher.quoteReplacement is necessary to provide a literal string as
// replacement
matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));
System.out.println(" NEW SAMPLE = "+replacedString);
}
// appendTail is used to append the text after the last match to the
// replaced string.
matcher.appendTail(replacedString);
System.out.println("********WHILE OVER ********");
return replacedString.toString();
}
// change the group and its source string is automatically updated
Strings in Java are immutable, so there is no way whatsoever to change one in place; what you're asking for is impossible.
Removing or replacing a pattern in a string can be achieved with a call like
someString = someString.replaceAll(toReplace, replacement);
To transform the matched substring, as seems to be indicated by your line
m.group().replaceAll("something","");
the best solution is probably to use a StringBuffer for the result, together with Matcher.appendReplacement and Matcher.appendTail.
Example:
String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// For example: transform match to upper case
String replacement = m.group().toUpperCase();
// quoteReplacement keeps any '$' or '\' in the replacement literal
m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
sourceString = sb.toString();
System.out.println(sourceString); // "lorem IPSUM dolor sit"
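On Java 9 or newer, the same transformation can be written without an explicit loop, because Matcher.replaceAll also accepts a function of the match result. This is a sketch under that version assumption; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TransformMatches {

    // Upper-cases every match of `regex` in `source` (requires Java 9+).
    static String upperCaseMatches(String source, String regex) {
        return Pattern.compile(regex)
                .matcher(source)
                // The function's result is still parsed as a replacement string,
                // so quoteReplacement keeps '$' and '\' literal.
                .replaceAll(match -> Matcher.quoteReplacement(match.group().toUpperCase()));
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("lorem ipsum dolor sit", "ipsum"));
    }
}
```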