Java regex: Replacing dynamic substrings


Suppose I have a String containing static tags that looks like this:
mystring = "[tag]some text[/tag] untagged text [tag]some more text[/tag]"
I want to remove everything between each tag pair. I've figured out how to do so by using the following regex:
mystring = mystring.replaceAll("(?<=\\[tag])(.*?)(?=\\[/tag])", "");
The result of which will be:
mystring = "[tag][/tag] untagged text [tag][/tag]"
However, I'm unsure how to accomplish the same goal if the opening tag is dynamic. Example:
mystring = "[tag parameter="123"]some text[/tag] untagged text [tag parameter="456"]some more text[/tag]"
The "value" of the parameter portion of the tag is dynamic. Somehow, I have to introduce a wildcard to my current regex, but I am unsure how to do this.
Essentially, replace the contents of all pairings of "[tag*]" and "[/tag]" with empty string.
An obvious solution would be to do something like this:
mystring = mystring.replaceAll("(?<=\\[tag)(.*?)(?=\\[/tag])", "");
However, I feel like that would be hacking around the problem because I'm not really capturing a full tag.
Could anyone provide me with a solution to this problem? Thanks!

I guess I've got it.
I thought long and hard about what @AshishMathew said, and yes, lookbehinds can't have unbounded lengths, but maybe instead of replacing the match with nothing, we replace it with a ], like so:
mystring = mystring.replaceAll("(?<=\\[tag)(.*?)(?=\\[/tag])", "]");
(?<=\\[tag) is the look-behind which matches [tag
(.*?) is all the code between [tag and [/tag], which may even be the parameters of the tag, all of which is replaced by a ]
When I tried this code by replacing the match with "", I got [tag[/tag] untagged text [tag[/tag] as the output. Hence, by replacing the match with a ] instead of nothing, you get the (hopefully) desired output.
So this is my lazy solution (pardon the regex pun) to the problem.

I suggest matching the whole tag together with its content and replacing it with the opening/closing tags without content:
mystring.replaceAll("\\[tag[^\\]]*\\][^\\[]*\\[/tag]", "[tag][/tag]")
Ideone test.
Note that I didn't bother preserving the tag attributes since you mentioned in another answer's comments that you didn't need them, but they could be kept by using a capturing group.
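The capturing-group variant mentioned above could look roughly like the sketch below. The class and method names (TagContentStripper, stripContent) are made up for illustration, and the pattern assumes tags are not nested and that attribute values never contain a ].

```java
import java.util.regex.Pattern;

class TagContentStripper {
    // Group 1 captures the whole opening tag, including any attributes.
    // [^\]]* stops at the first ']', so the opening tag cannot spill into the content.
    // Group 2 captures the closing tag. DOTALL lets the content span line breaks.
    private static final Pattern TAG_WITH_CONTENT =
            Pattern.compile("(\\[tag[^\\]]*\\]).*?(\\[/tag\\])", Pattern.DOTALL);

    // Hypothetical helper: removes the content of every [tag ...]...[/tag] pair.
    static String stripContent(String input) {
        // $1 re-emits the opening tag unchanged, $2 the closing tag.
        return TAG_WITH_CONTENT.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String s = "[tag parameter=\"123\"]some text[/tag] untagged text "
                 + "[tag parameter=\"456\"]some more text[/tag]";
        // prints: [tag parameter="123"][/tag] untagged text [tag parameter="456"][/tag]
        System.out.println(stripContent(s));
    }
}
```

Because the whole tag pair is matched, no lookbehind is needed, so the variable-length opening tag is not a problem.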

Related

Java remove dynamic substring from string

I need to remove a dynamic substring from a string. There are a few similar topics on this theme, but none of them helped me. I have a string, e.g.:
product test1="001" test2="abc" test3="123xzy"
and I need this output:
product test1="001" test3="123xzy"
I mean I need to remove test2="abc". test2 is a unique element and can be placed anywhere in the original string. "abc" is a dynamic value and can have various lengths. What is the fastest and most elegant solution to this problem? Thanks

You can use a regular expression:
String input = "product test1=\"001\" test2=\"abc\" test3=\"123xzy\"";
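String result = input.replaceAll("test2=\".*?\"\\s*", "");
In substance: find a substring like test2="xxxxxx", optionally followed by some whitespace (\\s*), and replace it with nothing. Using \\s* rather than \\s+ means the attribute is also removed when it is the last thing in the string.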
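As a rough sketch, the same idea can be wrapped in a small helper that takes the attribute name as a parameter. The names AttributeRemover and removeAttribute are invented for this example, and it assumes values are always double-quoted and contain no escaped quotes. It consumes the whitespace before the attribute rather than after it, so no trailing space is left when the attribute comes last.

```java
import java.util.regex.Pattern;

class AttributeRemover {
    // Hypothetical helper: removes name="..." plus the whitespace in front of it.
    static String removeAttribute(String input, String name) {
        // \b keeps "test2" from matching inside a longer name such as "mytest2".
        // Pattern.quote stops regex metacharacters in the name from being interpreted.
        // [^"]* matches the value without running past its closing quote.
        String regex = "\\s*\\b" + Pattern.quote(name) + "=\"[^\"]*\"";
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "product test1=\"001\" test2=\"abc\" test3=\"123xzy\"";
        // prints: product test1="001" test3="123xzy"
        System.out.println(removeAttribute(input, "test2"));
    }
}
```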

Replace part of string with known beginning and end

I get a string from a server with known and unknown parts. For example:
<simp>example1</simp><op>example2</op><val>example2</val>
I do not wish to parse the XML or use any kind of parser. What I wish to do is replace
<op>example2</op>
with an empty string (""), so that the string will look like:
<simp>example1</simp><val>example2</val>
What I know is that it starts with op (in <>) and ends with /op (in <>), but the content (example2) may vary.
Can you give me a pointer on how to accomplish this?

You can use regex. Something like
<op>[A-Za-z0-9]*<\/op>
should match. But you can adapt it so that it fits your requirements better. For example if you know that only certain characters can be shown, you can change it.
Afterwards you can use the String#replaceAll method to remove all matching occurrences with the empty string.
Take a look here to test the regex: https://regex101.com/r/WhPIv4/3
and here to check the replaceAll method that takes the regex and the replacement as a parameter: https://developer.android.com/reference/java/lang/String#replaceall

You can try
str.replace(str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5),"");
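Note that replace() already replaces every occurrence of that literal substring. replaceAll() also works here, but it treats its first argument as a regex, so it only behaves the same while the substring contains no regex metacharacters: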
str.replaceAll(str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5),"");
I tried it on a sample:
String str="<simp>example1</simp><op>example2</op><val>example2</val><simp>example1</simp><op>example2</op><val>example2</val><simp>example1</simp><op>example2</op><val>example2</val>";
Log.d("testit", str.replaceAll(str.substring(str.indexOf("<op>"), str.indexOf("</op>") + 5), ""));
And the log output was
D/testit: <simp>example1</simp><val>example2</val><simp>example1</simp><val>example2</val><simp>example1</simp><val>example2</val>
Edit
As @Elsafar said, str.replaceAll("<op>.*?</op>", "") will work.

Use it like this:
String str = "<simp>example1</simp><op>example2</op><val>example2</val>";
String garbage = str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5).trim();
String newString = str.replace(garbage,"");

I combined all the answers and eventually used:
st.replaceAll("<op>.*?<\\/op>","");
Thank you all for the help
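For reference, a minimal sketch of that final regex wrapped in a reusable method. ElementRemover and removeElement are names invented here, and the sketch assumes the element is not nested inside another element of the same name.

```java
import java.util.regex.Pattern;

class ElementRemover {
    // Hypothetical helper: removes every <tag>...</tag> occurrence, content included.
    static String removeElement(String input, String tag) {
        // Quote the tags so a name containing regex metacharacters is matched literally.
        String open = Pattern.quote("<" + tag + ">");
        String close = Pattern.quote("</" + tag + ">");
        // (?s) lets '.' match line breaks; .*? is reluctant, so each match
        // ends at the nearest closing tag instead of the last one.
        return input.replaceAll("(?s)" + open + ".*?" + close, "");
    }

    public static void main(String[] args) {
        String s = "<simp>example1</simp><op>example2</op><val>example2</val>";
        // prints: <simp>example1</simp><val>example2</val>
        System.out.println(removeElement(s, "op"));
    }
}
```

Unlike the indexOf/substring approach, this does not throw when the string contains no <op> element; it simply returns the input unchanged.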

How to Remove Special Character Except Comma

I have output in my Android EditText like this:
["HOT","SMALL"]
I want the output to look like this:
HOT,SMALL
I want to remove [] and "" but not the comma. I have read this but it does not work. I tried this but it removes all special characters. Can anybody help with my problem? Any suggestion will be helpful. Thanks in advance.

There are a couple of ways I'd do this.
The first, quick and straightforward, is to just replace all the special characters with "", using a regex and String.replaceAll:
myString.replaceAll("[\\\"\\[\\]]", "");
(Btw, I used http://rubular.com/ as a quick way to check my regex. Remember that the regex needs to be escaped for java - I used this tool to do that.)
The alternative is that you're actually looking at the String representation of a JSON object here, so convert the JSON string into a Java array of Strings using something like org.json, and then concatenate the strings together with a , delimiter.
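A minimal sketch of the first (regex) approach is shown below; BracketStripper and strip are names made up for the example. It assumes the values themselves never contain brackets, quotes, or commas. If they can, parsing the string as JSON is the safer route.

```java
class BracketStripper {
    // Hypothetical helper: drops '[', ']' and '"' and keeps everything else, commas included.
    static String strip(String s) {
        // Inside a character class only '[' and ']' need a regex escape;
        // the quote is escaped for the Java string literal, not for the regex.
        return s.replaceAll("[\\[\\]\"]", "");
    }

    public static void main(String[] args) {
        // prints: HOT,SMALL
        System.out.println(strip("[\"HOT\",\"SMALL\"]"));
    }
}
```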

Java Inner Text (getTextContents()) Problem

I'm trying to do some parsing in Java and I'm using Cobra HTML Parser to get the HTML into a DOM then I'm using XPath to get the nodes I want. When I get down to the desired level I call node.getTextContents(), but this gives me a string like
"\n\n\nValue\n-\nValue\n\n\n"
Is there a built in way to get rid of the line breaks? I would like to do a RegEx like
(?:\s*([^-]+)\s*-\s*([^-]+)\s*)
on the inner text and would really prefer not to have to deal with the possible different white space symbols in between the text.
Example Input:
Value
-
Value
Thanks

You can use String.replaceAll().
String trimmed = original_string.replaceAll("\n", "");
The first argument is a regular expression: you could replace all contiguous blocks of whitespace in the original string with replaceAll("\\s+", "") for instance.

I'm not totally sure I understood the question correctly, but the simplest way to remove all the whitespace would be:
String s = node.getTextContents().replaceAll("\\s","");
If you just want to get rid of the leading/trailing whitespace, use trim().
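Putting the pieces together, one possible sketch collapses the whitespace first and then applies a simplified form of the regex from the question. InnerTextParser and parse are hypothetical names, and the input is assumed to be exactly two values separated by a single hyphen.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnerTextParser {
    // Two hyphen-free values separated by a hyphen, with optional spaces around it.
    // The first group is reluctant so it does not swallow the space before the hyphen.
    private static final Pattern PAIR = Pattern.compile("([^-]+?)\\s*-\\s*([^-]+)");

    // Hypothetical helper: returns the two values as a two-element array.
    static String[] parse(String innerText) {
        // Collapse every run of whitespace (line breaks included) to one space,
        // then drop the leading and trailing space.
        String normalized = innerText.replaceAll("\\s+", " ").trim();
        Matcher m = PAIR.matcher(normalized);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected text: " + normalized);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("\n\n\nValue\n-\nValue\n\n\n");
        // prints: Value | Value
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

Collapsing to a single space rather than removing whitespace altogether keeps multi-word values readable.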

replacing regex in java string

I have this java string:
String bla = "<my:string>invalid_content</my:string>";
How can I replace the "invalid_content" piece?
I know I should use something like this:
bla.replaceAll(regex,"new_content");
in order to have:
"<my:string>new_content</my:string>";
but I can't work out how to create the correct regex.
Help please :)

You could do something like
String ResultString = subjectString.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");

Mark's answer will work, but can be improved with two simple changes:
The central parentheses are redundant if you're not using that group.
Making it non-greedy will help if you have multiple my:string tags to match.
Giving:
String ResultString = subjectString.replaceAll("(<my:string>).*?(</my:string>)", "$1whatever$2");
But that's still not how I'd write it - the replacement can be simplified using lookbehind and lookahead, and you can avoid repeating the tag name, like this:
String ResultString = subjectString.replaceAll("(?<=<(my:string)>).*?(?=</\\1>)", "whatever");
Of course, this latter one may not be as friendly to those who don't yet know regex - it is however more maintainable/flexible, so worth using if you might need to match more than just my:string tags.
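If more than one tag name has to be handled, a parameterised helper is another option. The sketch below is not the lookaround version above; it matches the whole element and rebuilds it, which avoids the backreference. TagContentReplacer and replaceContent are names invented for this example, and it assumes tags carry no attributes and are not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagContentReplacer {
    // Hypothetical helper: replaces the content of every <tag>...</tag> with newContent.
    static String replaceContent(String input, String tag, String newContent) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        // Pattern.quote makes the tag text literal; DOTALL lets the content span lines;
        // .*? is reluctant so each element is matched separately.
        Pattern p = Pattern.compile(
                Pattern.quote(open) + ".*?" + Pattern.quote(close), Pattern.DOTALL);
        // quoteReplacement keeps '$' and '\' in the new content from being
        // interpreted as group references or escapes.
        String replacement = Matcher.quoteReplacement(open + newContent + close);
        return p.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String bla = "<my:string>invalid_content</my:string>";
        // prints: <my:string>new_content</my:string>
        System.out.println(replaceContent(bla, "my:string", "new_content"));
    }
}
```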

See Java regex tutorial and check out character classes and capturing groups.

The PCRE would be:
/invalid_content/
For a simple substitution. What more do you want?

Is invalid_content a fixed value? If so, you could simply replace it with your new content using:
bla = bla.replaceAll("invalid_content","new_content");
