replacing regex in java string

I have this java string:
String bla = "<my:string>invalid_content</my:string>";
How can I replace the "invalid_content" piece?
I know I should use something like this:
bla.replaceAll(regex,"new_content");
in order to have:
"<my:string>new_content</my:string>";
but I can't figure out how to write the correct regex.
Help please :)

You could do something like
String ResultString = subjectString.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");

Mark's answer will work, but can be improved with two simple changes:
The central parentheses are redundant if you're not using that group.
Making it non-greedy will help if you have multiple my:string tags to match.
Giving:
String ResultString = SubjectString.replaceAll
( "(<my:string>).*?(</my:string>)" , "$1whatever$2" );
But that's still not how I'd write it - the replacement can be simplified using lookbehind and lookahead, and you can avoid repeating the tag name, like this:
String ResultString = SubjectString.replaceAll
( "(?<=<(my:string)>).*?(?=</\\1>)" , "whatever" );
Of course, this latter one may not be as friendly to those who don't yet know regex - it is however more maintainable/flexible, so worth using if you might need to match more than just my:string tags.
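To make the difference between the three variants above concrete, here is a minimal sketch that runs each of them on an input with two my:string elements. The class and method names are made up for illustration, and Matcher.quoteReplacement is added only so that a replacement containing $ or \ is taken literally. It assumes the element content has no line breaks, since . does not match them by default.

```java
import java.util.regex.Matcher;

class TagReplaceDemo {
    // Greedy: .* runs to the LAST closing tag, so several elements collapse into one.
    static String greedy(String s, String replacement) {
        return s.replaceAll("(<my:string>)(.*)(</my:string>)",
                "$1" + Matcher.quoteReplacement(replacement) + "$3");
    }

    // Reluctant: .*? stops at the FIRST closing tag, so each element is handled on its own.
    static String reluctant(String s, String replacement) {
        return s.replaceAll("(<my:string>).*?(</my:string>)",
                "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    // Lookaround: the tags are only asserted, not consumed, so the replacement
    // needs no group references. \\1 reuses the tag name captured in the lookbehind.
    static String lookaround(String s, String replacement) {
        return s.replaceAll("(?<=<(my:string)>).*?(?=</\\1>)",
                Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String two = "<my:string>a</my:string><my:string>b</my:string>";
        System.out.println(greedy(two, "X"));
        System.out.println(reluctant(two, "X"));
        System.out.println(lookaround(two, "X"));
    }
}
```

With the greedy pattern the two elements are merged into a single one; the other two variants replace the content of each element separately.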

See Java regex tutorial and check out character classes and capturing groups.

The PCRE would be:
/invalid_content/
For a simple substitution. What more do you want?

Is invalid_content a fixed value? If so you could simply replace that with your new content using:
bla = bla.replaceAll("invalid_content","new_content");

Related

Replace part of string with known beginning and end

I get a string from a server with known and unknown parts. For example:
<simp>example1</simp><op>example2</op><val>example2</val>
I do not want to parse the XML or use any parser. What I want to do is replace
<op>example2</op>
with an empty string (""), so that the string looks like:
<simp>example1</simp><val>example2</val>
I know that it starts with op (in <>) and ends with /op (in <>), but the content (example2) may vary.
Can you give me a pointer on how to accomplish this?

You can use regex. Something like
<op>[A-Za-z0-9]*<\/op>
should match. But you can adapt it so that it fits your requirements better. For example, if you know that only certain characters can appear, you can change the character class accordingly.
Afterwards you can use the String#replaceAll method to remove all matching occurrences with the empty string.
Take a look here to test the regex: https://regex101.com/r/WhPIv4/3
and here to check the replaceAll method that takes the regex and the replacement as a parameter: https://developer.android.com/reference/java/lang/String#replaceall

You can try
str.replace(str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5),"");
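Note that replace() already removes every occurrence of that exact substring. replaceAll() does the same here, but it treats the substring as a regex, so it is only safe when the content contains no regex metacharacters:
str.replaceAll(str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5),"");
I tried a sample: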
String str="<simp>example1</simp><op>example2</op><val>example2</val><simp>example1</simp><op>example2</op><val>example2</val><simp>example1</simp><op>example2</op><val>example2</val>";
Log.d("testit", str.replaceAll(str.substring(str.indexOf("<op>"), str.indexOf("</op>") + 5), ""));
And the log output was
D/testit: <simp>example1</simp><val>example2</val><simp>example1</simp><val>example2</val><simp>example1</simp><val>example2</val>
Edit
As @Elsafar said, str.replaceAll("<op>.*?</op>", "") will work.

Use it like this:
String str = "<simp>example1</simp><op>example2</op><val>example2</val>";
String garbage = str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5).trim();
String newString = str.replace(garbage,"");

I combined all the answers and eventually used:
st = st.replaceAll("<op>.*?<\\/op>","");
Thank you all for the help
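For reference, a small sketch of the two approaches from this thread side by side. The class and method names are invented for the example. It assumes the content between the tags contains no line breaks (. does not match them unless the (?s) flag is added) and that op elements are not nested.

```java
class RemoveOpDemo {
    // Regex approach: the reluctant .*? removes every <op>...</op> block,
    // whatever its content. Strings are immutable, so the result must be used.
    static String removeAllByRegex(String s) {
        return s.replaceAll("<op>.*?</op>", "");
    }

    // Index approach: no regex involved; removes only the first complete block
    // and returns the input unchanged when there is none.
    static String removeFirstByIndex(String s) {
        int start = s.indexOf("<op>");
        if (start < 0) {
            return s;
        }
        int end = s.indexOf("</op>", start);
        if (end < 0) {
            return s;
        }
        return s.substring(0, start) + s.substring(end + "</op>".length());
    }

    public static void main(String[] args) {
        String str = "<simp>example1</simp><op>example2</op><val>example2</val>";
        System.out.println(removeAllByRegex(str));
        System.out.println(removeFirstByIndex(str));
    }
}
```

The regex version also copes with blocks whose contents differ from each other, which the substring-and-replace variants above do not.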

Using regular expressions in Java, how do I match any 4 letters, a space, and then 4 digits?

What I want is a class code like ACCT 4838.
I tried
String REGEX = "[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z][\\s][\\d][\\d][\\d][\\d]";
String REGEX = "[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\\s\\d\\d\\d\\d";
I apologize if this gets flagged; I have been looking around for a while and I can't quite figure out what I'm doing wrong. It should be a quick one for someone.

You can use a regex like this:
(?i)^[a-z]{4} \d{4}$ // With inline insensitive flag
^[A-Za-z]{4} \d{4}$ // without inline flag
Remember to escape backslashes in java like ^[A-Za-z]{4} \\d{4}$

The code below works. In Java a single \ gives an error, so every backslash in the regex has to be doubled. I was stupidly feeding in the wrong string in addition to not having the proper code.
String REGEX = "^[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]\\s\\d\\d\\d\\d";
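A compact way to check such a class code, as a sketch: the class and method names are hypothetical, and it assumes the separator is exactly one space character. Matcher.matches() has to match the whole input, so no ^ or $ anchors are needed.

```java
import java.util.regex.Pattern;

class CourseCodeDemo {
    // Four ASCII letters, one space, four digits. The backslash is doubled
    // because this is a Java string literal.
    private static final Pattern COURSE_CODE = Pattern.compile("[A-Za-z]{4} \\d{4}");

    static boolean isCourseCode(String s) {
        // matches() succeeds only if the entire string fits the pattern.
        return COURSE_CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCourseCode("ACCT 4838"));
        System.out.println(isCourseCode("ACC 4838"));
    }
}
```

Compiling the pattern once into a constant avoids recompiling it on every call, which String.matches would do.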

Best way to detect logical connectors

I'm trying to detect logical connectors in a string (AND, OR, NOT) using Java. What I would like to do is:
Given a string (e.g ((blue) AND (yellow) OR (pink)), separate each word and put them in a List. The result should be something like {"blue","yellow","pink"}
I know that to match the words, I need to use a regex like \b(AND|OR|NOT)\b.
But I don't know how to return each word after or before the connector.
Another question: is it useful to use a regex here, or should I use contains() instead?

How about this?
String s = "((blue) AND (yellow) OR (pink))";
s = s.replaceAll("\\(|\\)", "");
String[] words = s.split("AND|OR|NOT");
System.out.println(Arrays.toString(words));
Output:
[blue ,  yellow ,  pink]

String s = "((blue) AND (yellow) OR (pink))";
String[] parts = s.split("\\bAND\\b|\\bNOT\\b|\\bOR\\b");

You could try using string.split("AND|OR|NOT");.
EDIT: oops, forgot \b:
string.split("\\b(AND|OR|NOT)\\b");
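The split-based answers above can be combined into one helper that returns a List, as the question asks. This is only a sketch for flat expressions like the example: the names are invented, the parentheses are simply discarded rather than interpreted, and the connectors are assumed to be upper case.

```java
import java.util.ArrayList;
import java.util.List;

class ConnectorSplitDemo {
    static List<String> operands(String expr) {
        // Replace parentheses with spaces (not with nothing), so that
        // "(a)AND(b)" still has word boundaries around the connector.
        String flat = expr.replaceAll("[()]", " ");
        List<String> result = new ArrayList<>();
        // \\b keeps words such as ORANGE or NOTE from being split.
        for (String part : flat.split("\\b(?:AND|OR|NOT)\\b")) {
            String word = part.trim();
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(operands("((blue) AND (yellow) OR (pink))"));
    }
}
```

Trimming each piece and skipping empty ones removes the stray spaces and the empty leading element that a plain split leaves behind when the expression starts with NOT.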

Parsing this kind of string is no task for regular expressions: a regular expression describes a finite automaton, which cannot keep track of arbitrarily nested parentheses.
You need some sort of pushdown automaton for this task.
http://en.wikipedia.org/wiki/Pushdown_automaton
The easiest way to do this is with a recursion that identifies a "logical string structure"
(...) AND (...) or (...) OR (...), NOT (...), etc.,
strips the parentheses, and repeats until it finds a string that doesn't match this kind of structure.
That string is what you're looking for.

Extracting substrings from a string in Java

I have a number of files and they are all called something like name_version_xyz.ext.
In my Java code I need to extract the name and the version part of the filename. I can accomplish this using lastIndexOf where I look for underscore, but I don't think that's the nicest solution. Can this be done with a regexp somehow?
Note that the "name" part can contain any number of underscores.

If you are guaranteed that the last part of your file names is always _xyz.ext, then this is really the cleanest way to do it. (If you aren't guaranteed this, then you will need to figure out something else, of course.)
As the saying goes with regular expressions:
If you solve a problem with regular expressions, you now have two problems.

You could use Regex but I think it is a bit overkill in this case. So I personally would stick with your current solution.
It is working, not too complicated and that's why I don't see any reasons to switch to another approach.
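For comparison, a sketch of the lastIndexOf approach the question describes next to a regex equivalent. The names here are made up, and both methods assume the layout name_version_xyz.ext where neither the version nor the xyz part contains an underscore; the name itself may contain any number of them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameDemo {
    // The greedy (.+) takes as much as it can, so the name keeps its underscores.
    private static final Pattern LAYOUT =
            Pattern.compile("(.+)_([^_]+)_[^_]+\\.[^._]+");

    // Returns {name, version} using the last two underscores as separators.
    static String[] byIndex(String fileName) {
        int last = fileName.lastIndexOf('_');
        int beforeVersion = fileName.lastIndexOf('_', last - 1);
        if (beforeVersion <= 0) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return new String[] {
            fileName.substring(0, beforeVersion),
            fileName.substring(beforeVersion + 1, last)
        };
    }

    // Returns {name, version} using the regex above.
    static String[] byRegex(String fileName) {
        Matcher m = LAYOUT.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = byIndex("my_cool_lib_12_xyz.txt");
        System.out.println("name = " + parts[0] + ", version = " + parts[1]);
    }
}
```

Both give the same result on well-formed names; the regex additionally rejects names that lack an extension.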

If you don't want to use a regular expression, I think the easiest solution is to take the file name without its extension and then split it:
String file = "blah_blah_version_123";
String[] tmp = file.split("_version_");
System.out.println("name = " + tmp[0]);
System.out.println("version = " + tmp[1]);
Output:
name = blah_blah
version = 123

Yes, the regexp as a Java string would just look something like (untested)
(.+)_(\\d+)_([^_]+)\\.(???)
"name" would be group(1), "version" group(2), xyz is group(3), and ext is group(4).

Use a string tokeniser:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
Or alternatively, String.split():
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#split%28java.lang.String%29

Encoding URL strings with regular expression

I'm trying to replace several different characters with different values. For example, if I have: #love hate then what I would like to get back is %23love%20hate
Is it something to do with groups? I tried to understand groups, but I really didn't understand them.

You can try to do this:
String encodedstring = URLEncoder.encode("#love hate","UTF-8");
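This gives %23love+hate: URLEncoder does form encoding, so the space becomes + rather than %20. If you need %20, append .replace("+", "%20") to the call. To reverse it you should do this:
String loveHate = URLDecoder.decode(encodedstring, "UTF-8");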
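A sketch of the round trip with the checked exception handled. The class and method names are invented. Note that this is form encoding adjusted for spaces, not a full RFC 3986 encoder: URLEncoder leaves * unencoded and encodes ~, for example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlEncodeDemo {
    static String encode(String s) {
        try {
            // A literal '+' in the input is encoded as %2B, so every '+' left in
            // the output stands for a space and can safely be turned into %20.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    static String decode(String s) {
        try {
            // Accepts both %20 and + as an encoded space.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("#love hate");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```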

You don't need RegEx to replace single characters. RegEx is overkill for such purposes. You can simply use the plain replace method of the String class in a loop, for each character that you want to replace.
String output = input.replace("#", "%23");
output = output.replace(" ", "%20");
How many such characters do you want to get replaced?

If you are trying to encode a URL as UTF-8 or some other encoding, using existing classes will be much easier,
e.g. from the commons-httpclient project:
URIUtil.encodeWithinQuery(input,"UTF-8");

No, you will need multiple replaces. Another option is to use group to find the next occurrence of one of several strings, inspect what the string is and replace appropriately, perhaps using a map.

I think what you want to achieve is a kind of URL encoding rather than pure replacement.
See some of the answers on this SO thread, especially the one with 7 votes, which may be more interesting for you:
HTTP URL Address Encoding in Java

As Mat said, the best way to solve this problem is with URLEncoder. However, if you insist on using regex, then see the sample code in the documentation for java.util.regex.Matcher.appendReplacement:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
Within the loop, you can use m.group() to see what substring matched and then do a custom substitution based on that. This technique can be used for replacing ${variables} by looking them up in a map, etc.
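As an illustration of that map-based technique applied to the question's input, here is a sketch. The class and method names are hypothetical. It assumes no key is a prefix of another key, since the order of the alternatives built from a HashMap is not defined.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapReplaceDemo {
    static String replaceFromMap(String input, Map<String, String> table) {
        if (table.isEmpty()) {
            return input;
        }
        // Build one alternation from the keys; Pattern.quote makes each key literal.
        StringBuilder alternation = new StringBuilder();
        for (String key : table.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // m.group() is the key that matched; quoteReplacement keeps any
            // $ or \ in the mapped value from being read as a group reference.
            m.appendReplacement(sb, Matcher.quoteReplacement(table.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("#", "%23");
        table.put(" ", "%20");
        System.out.println(replaceFromMap("#love hate", table));
    }
}
```

The input is scanned once regardless of how many entries the map has, which is the main advantage over chaining one replace call per character.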
