Replace String in Java with regex and replaceAll

Is there a simple solution to parse a String by using regex in Java?
I have to adapt an HTML page, so I need to rewrite several strings, e.g.:
href="/browse/PJBUGS-911"
=>
href="PJBUGS-911.html"
The strings differ only in the ID (e.g. 911). My first idea looks like this:
String input = "";
String output = input.replaceAll("href=\"/browse/PJBUGS\\-[0-9]*\"", "href=\"PJBUGS-???.html\"");
I want to replace everything except the ID. How can I do this?
Would be nice if someone can help me :)

You can capture the substrings matched by parts of your pattern using parentheses, and then refer to the captured text in the replacement as $n, where n is the number of the group (counting opening parentheses from left to right). For your example:
String output = input.replaceAll("href=\"/browse/PJBUGS-([0-9]*)\"", "href=\"PJBUGS-$1.html\"");
Or if you want:
String output = input.replaceAll("href=\"/browse/(PJBUGS-[0-9]*)\"", "href=\"$1.html\"");
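A minimal, self-contained sketch of the capture-group replacement described above. The class name HrefRewriter and the method rewrite are illustrative names, not part of the original answer. The sketch uses [0-9]+ instead of [0-9]* so that a link with an empty ID is left alone, which is an assumption about the desired behavior.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the $1 back-reference in a replacement string.
class HrefRewriter {
    // Compiled once; group 1 captures the issue key, e.g. "PJBUGS-911".
    private static final Pattern BROWSE_LINK =
            Pattern.compile("href=\"/browse/(PJBUGS-[0-9]+)\"");

    static String rewrite(String html) {
        // $1 in the replacement is substituted with whatever group 1 matched.
        return BROWSE_LINK.matcher(html).replaceAll("href=\"$1.html\"");
    }

    public static void main(String[] args) {
        String input = "<a href=\"/browse/PJBUGS-911\">bug</a> "
                + "<a href=\"/browse/PJBUGS-34234\">other</a>";
        System.out.println(rewrite(input));
    }
}
```

Text that does not match the pattern is returned unchanged, so the method can be applied to a whole page at once.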

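This does not use a regex, but it may still solve your problem, assuming input holds a single href attribute such as href="/browse/PJBUGS-911". Take only the text between the last slash and the closing quote:
output = "href=\"" + input.substring(input.lastIndexOf('/') + 1, input.lastIndexOf('"')) + ".html\"";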

This is how I would do it:
public static void main(String[] args)
{
String text = "href=\"/browse/PJBUGS-911\" blahblah href=\"/browse/PJBUGS-111\" " +
"blahblah href=\"/browse/PJBUGS-34234\"";
Pattern ptrn = Pattern.compile("href=\"/browse/(PJBUGS-[0-9]+?)\"");
Matcher mtchr = ptrn.matcher(text);
while(mtchr.find())
{
String match = mtchr.group(0);
String insMatch = mtchr.group(1);
String repl = "href=\"" + insMatch + ".html\""; // build the replacement directly; replaceFirst would treat match as a regex
System.out.println("orig = <" + match + "> repl = <" + repl + ">");
}
}
This just shows the regex and replacements, not the final formatted text, which you can get by using Matcher.replaceAll:
String allRepl = mtchr.replaceAll("href=\"$1.html\"");
If you are only interested in replacing all occurrences, you don't need the loop; I used it just for debugging and to show how the regex operates.

Related

regular expression to replaceall substrings embedded in open curling brackets and followed by equal sign and digits

In the following String
String toBeFormatted = "[[LngLatAlt{longitude=-7.125924901999952, latitude=33.831783175000055, altitude=NaN}, "
+ "LngLatAlt{longitude=-5.401396163999948, latitude=35.92213140900003, altitude=NaN}]]";
1- I need to replace every "LngLatAlt{longitude=" with an opening bracket "["
2- I also need to replace every trailing part such as ", latitude=33.831783175000055, altitude=NaN}" with ",33.831783175000055]"
so that the resulting string is:
"[[[-7.125924901999952,33.831783175000055],[-5.401396163999948,35.92213140900003]]]"
I tried the following regular expressions:
String regexTarget = "(\\[\\[LngLatAlt\\{longitude=)";
toBeFormatted.replaceAll(regexTarget, "\\[\\[\\[");
String regexTarget0 = "(, altitude=NaN\\}, LngLatAlt\\{longitude=)";
toBeFormatted.replaceAll(regexTarget0, "],\\[");
String regexTarget1 = "(, latitude=)";
toBeFormatted.replaceAll(regexTarget1, " ,");
String regexTarget2 = "(, altitude=NaN\\})";
toBeFormatted.replaceAll(regexTarget2, "]");
but it does not seem to work.
Thank you for your help.

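Try something like the following. Note that replaceAll returns a new String (Strings are immutable), so the result must be assigned; your code discards it. The replacement uses "[$1,$2]" without a space to match the format you asked for:
String result = toBeFormatted.replaceAll("LngLatAlt\\{longitude=([^,]+), latitude=([^,]+), ([^}]+)\\}", "[$1,$2]");
System.out.println(result);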
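A self-contained sketch of the single-pattern approach, under the assumption that the two entries in the input are separated by a comma followed by whitespace. The class name LngLatFormatter is illustrative. A second replaceAll removes the whitespace between pairs so that the output has the compact shape requested in the question.

```java
// Hypothetical helper; the input shape is taken from the question.
class LngLatFormatter {
    static String format(String s) {
        return s
                // Groups 1 and 2 capture the longitude and latitude values.
                .replaceAll(
                        "LngLatAlt\\{longitude=([^,]+), latitude=([^,]+), altitude=[^}]+\\}",
                        "[$1,$2]")
                // Drop whitespace left between two consecutive pairs.
                .replaceAll("\\],\\s+\\[", "],[");
    }

    public static void main(String[] args) {
        String toBeFormatted =
                "[[LngLatAlt{longitude=-7.125924901999952, latitude=33.831783175000055, altitude=NaN}, "
                + "LngLatAlt{longitude=-5.401396163999948, latitude=35.92213140900003, altitude=NaN}]]";
        // Strings are immutable: the result of format must be used, not discarded.
        System.out.println(format(toBeFormatted));
    }
}
```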

Two separate patterns and matchers (java)

I'm working on a simple bot for Discord. The first pattern works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}

You have to escape the [ character as \\[ (once for the Java String and once for the regex). You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end:
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
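A small sketch that exercises both patterns outside the bot. The class name JishoFields and the sample JSON string are made up for illustration and are not actual API output. With more than one definition in the array, the reluctant group runs up to the first "] and therefore still contains the inner quotes and commas.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the JSON snippet in main is an invented sample.
class JishoFields {
    private static final Pattern READING =
            Pattern.compile("\"reading\":\"(.*?)\"");
    // The [ is escaped for the regex; the closing "] ends the capture.
    private static final Pattern DEFINITIONS =
            Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    // Returns group 1 of the first match, or null if there is no match.
    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    static String reading(String json) {
        return firstGroup(READING, json);
    }

    static String definitions(String json) {
        return firstGroup(DEFINITIONS, json);
    }

    public static void main(String[] args) {
        String sample = "{\"reading\":\"neko\",\"english_definitions\":[\"cat\",\"feline\"]}";
        System.out.println("Reading: " + reading(sample));
        System.out.println("Definition: \"" + definitions(sample) + "\"");
    }
}
```

For anything beyond a quick extraction, a JSON parser is likely to be more robust than regexes on raw text.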

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex so that the text after :D: can be either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, my results would be:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and are alphanumeric, but some extra characters are allowed (such as the :// and the . and - characters of the URL format).

You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
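The two approaches above can be sketched against the input from the question. The class name FieldSplitter and its method names are illustrative. The split variant drops the leading empty element explicitly, and it assumes that no field value itself contains one of the delimiters (for example an uppercase "M:").

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper comparing capture groups with String.split().
class FieldSplitter {
    private static final Pattern RECORD =
            Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)");

    // Returns the four fields, or an empty array if the format does not match.
    static String[] byGroups(String s) {
        Matcher m = RECORD.matcher(s);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    // Splits on the delimiters and removes the empty element before "M:".
    static String[] bySplit(String s) {
        String[] pieces = s.split("M:|:D:|:C:|:Q:");
        if (pieces.length > 0 && pieces[0].isEmpty()) {
            return Arrays.copyOfRange(pieces, 1, pieces.length);
        }
        return pieces;
    }

    public static void main(String[] args) {
        String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(Arrays.toString(byGroups(name)));
        System.out.println(Arrays.toString(bySplit(name)));
    }
}
```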

To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
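A sketch of the same indexOf idea wrapped in a method that also handles a missing marker, since indexOf returns -1 in that case and the plain version would then produce a wrong result or throw. The names MarkerExtractor and between are illustrative.

```java
// Hypothetical helper: text between two markers, or null if either is missing.
class MarkerExtractor {
    static String between(String input, String startMarker, String endMarker) {
        int start = input.indexOf(startMarker);
        if (start < 0) {
            return null; // start marker not present
        }
        start += startMarker.length();
        // Search for the end marker only after the start marker.
        int end = input.indexOf(endMarker, start);
        return end < 0 ? null : input.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(between(input, ":D:", ":C:"));
    }
}
```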

Assuming you need to do some validation along with the parsing, break the regex into separate parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are many URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // a URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? assuming any string of digits here
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
Each named group needs a distinct name; reusing the same name for several groups makes Pattern.compile throw a PatternSyntaxException. It might be a good idea to keep the pattern in a static final field, compiled in a static initializer, so that the temporary regex strings don't clutter the class with otherwise useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
You can go a step further by introducing a RegexGroup interface whose implementations each represent one part of the regex, with a name and the actual pattern. You lose simplicity, though, and the result is harder to understand at a glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)
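The named-group approach can be sketched as a runnable class. The class name RecordParser and the method part are illustrative, and URL_REGEX is a deliberately simplified assumption (scheme://host with an optional numeric port), not a complete URL grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for M:<text>:D:<url or text>:C:<text>:Q:<number>.
class RecordParser {
    private static final String M_REGEX = "[\\w.]+";
    // Assumption: scheme://host[:port]; swap in a stricter URL regex if needed.
    private static final String URL_REGEX =
            "[a-zA-Z][a-zA-Z0-9+.-]*://[\\w.-]+(?::\\d+)?";
    private static final String D_REGEX = "(?:" + URL_REGEX + "|\\p{Alnum}+)";
    private static final String C_REGEX = "[\\w.]+";
    private static final String Q_REGEX = "\\d+";

    // Compiled once; every group has its own name.
    private static final Pattern RECORD = Pattern.compile(
            "M:(?<M>" + M_REGEX + "):"
            + "D:(?<D>" + D_REGEX + "):"
            + "C:(?<C>" + C_REGEX + "):"
            + "Q:(?<Q>" + Q_REGEX + ")");

    // Returns the named part, or null if the whole input does not validate.
    static String part(String input, String name) {
        Matcher m = RECORD.matcher(input);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        for (String name : new String[] {"M", "D", "C", "Q"}) {
            System.out.println(name + " = " + part(input, name));
        }
    }
}
```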

Need help to form a regex in java

I want to count the occurrences of a regex in a page's source using Java. The value I am trying to search for is given in the program below.
There might be one or more spaces between tags. I am not able to form a regex for this value. Can someone please help me build one?
My program, which checks the regex, is given below:
String regx=""<img height=""1"" width=""1"" style=""border-style:none;"" alt="""" src=""//api.adsymptotic.com/api/s/trackconversion?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel""/>";
WebDrive driver = new FirefoxDriver();
driver.navigate().to("abc.xom");
int count=0, found=0;
source = driver.getPageSource();
source = source.replaceAll("\\s+", " ").trim();
pattern = Pattern.compile(regx);
matcher = pattern.matcher(source);
while(matcher.find())
{
count++;
found=1;
}
if(found==0)
{
System.out.println("Maximiser not found");
pixelData[rowNumber][2] = String.valueOf(count) ;
pixelData[rowNumber][3] = "Fail";
}
else
{
System.out.println("Maximiser is found" + count);
pixelData[rowNumber][2] = String.valueOf(count) ;
pixelData[rowNumber][3] = "Pass";
}
count=0; found=0;

It's hard to tell without the original text and the expected result, but your Pattern clearly won't compile as is.
In a Java string literal you must escape each double quote with a single backslash (\") and double-escape regex metacharacters (e.g. \\?) for your code and your Pattern to compile.
Something along the lines of:
String regx="<img height=\"1\" width=\"1\" style=\"border-style:none;\" " +
"alt=\"\" src=\"//api.adsymptotic.com/api/s/trackconversion" +
"\\?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b" +
"&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel\"/>";
Also consider parsing the markup with an appropriate library (e.g. jsoup for HTML) instead of a regex.
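As an alternative to escaping each metacharacter by hand, the literal pieces can be passed through Pattern.quote and joined with \s+, which also covers the "one or more spaces" requirement from the question. This is a sketch under that assumption; the class name SnippetCounter, its methods, and the shortened sample markup are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts literal occurrences of a snippet, tolerating
// any run of whitespace wherever the snippet itself contains whitespace.
class SnippetCounter {
    static Pattern literalWithFlexibleWhitespace(String snippet) {
        StringBuilder regex = new StringBuilder();
        for (String token : snippet.trim().split("\\s+")) {
            if (regex.length() > 0) {
                regex.append("\\s+"); // one or more whitespace characters
            }
            regex.append(Pattern.quote(token)); // treat ?, ., & etc. literally
        }
        return Pattern.compile(regex.toString());
    }

    static int count(String source, String snippet) {
        Matcher m = literalWithFlexibleWhitespace(snippet).matcher(source);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String snippet = "<img height=\"1\" width=\"1\" src=\"//example.test/t?_pid=1&_aid=2\"/>";
        String source = "<p>x</p> <img height=\"1\"   width=\"1\"\n"
                + "src=\"//example.test/t?_pid=1&_aid=2\"/>";
        System.out.println("occurrences: " + count(source, snippet));
    }
}
```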

I want to check if a word or a set of words exists in a String

My requirement is to check whether a single word or a group of words is present in a larger string. I tried using the String.contains() method, but it fails when the larger string contains a newline character. Currently I am using the regex shown below, but it works for only one word. The search text is entered by the user and can contain more than one word. This is an Android application.
String regex = ".*.{0}" + searchText + ".{0}.*";
Pattern pattern = Pattern.compile(regex);
pattern.matcher(largerString).find();
Sample String
String largerString ="John writes about this, and John writes about that," +
" and John writes about everything. ";
String searchText = "about this";

Why not just replace line breaks with spaces, and on top of that, convert it all to lower case?
String s = "hello";
String originalString = "Does this contain \n Hello?";
String formattedString = originalString.toLowerCase().replace("\n", " ");
System.out.println(formattedString.contains(s));
Edit: Thinking about it, I don't really understand how line breaks make a difference...
Edit 2: I was right. Line breaks don't matter.
String s = "hello";
String originalString = "Does this contain \nHello?";
String formattedString = originalString.toLowerCase();
System.out.println(formattedString.contains(s));

Here is code that uses Pattern and Matcher directly. Because searchText comes from the user, it is wrapped in Pattern.quote so that regex metacharacters in it are matched literally.
String largerString = "John writes about this, and John writes about that," + " and John writes about everything. ";
String searchText = "about this";
Pattern pattern = Pattern.compile(Pattern.quote(searchText));
Matcher m = pattern.matcher(largerString);
if (m.find()) {
System.out.println(m.group());
}
Result:
about this
I hope it will help you.
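If the actual problem is that the phrase is interrupted by a line break in the larger string (for example "about\nthis"), one option is to normalize whitespace in both strings before calling contains. This is a sketch under that assumption; the class name PhraseSearch is illustrative, and the case-insensitive comparison follows the lower-casing suggested in the first answer.

```java
import java.util.Locale;

// Hypothetical helper: phrase search that ignores case and treats any run of
// whitespace (spaces, tabs, newlines) as a single space.
class PhraseSearch {
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    static boolean containsPhrase(String text, String phrase) {
        // The phrase is never compiled as a regex, so user input is safe as is.
        return normalize(text).contains(normalize(phrase));
    }

    public static void main(String[] args) {
        String largerString = "John writes about\nthis, and John writes about that,"
                + " and John writes about everything. ";
        System.out.println(containsPhrase(largerString, "about this"));
    }
}
```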
