XML String replacement - java

How can I replace a string in XML?
I have
<schema>src/main/castor/document.xsd</schema>
and I need to change it to
<schema>cs/src/main/castor/document.xsd</schema>
I tried a simple replaceAll, where xmlInStr is the string form of the XML document:
xmlInStr.replaceAll(
"src/main/castor/GridDocument.xsd",
"correspondenceCastor/src/main/castor/GridDocument.xsd"
);
I also tried replace instead:
xmlInStr.replace("src/main/castor/GridDocument.xsd".toCharArray().toString(), "correspondenceCastor/src/main/castor/GridDocument.xsd".toCharArray().toString());
It's not working. Any clues?
I managed it like this:
// Find the first occurrence of `from` in the document, not in `from` itself
int indx = xmlInStr.indexOf(from);
xmlInStr = xmlInStr.substring(0,indx) + to + xmlInStr.substring(indx + from.length());

String.replaceAll takes a regular expression as the first argument. Use replace instead.
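The difference between the two methods can be sketched as follows. This is a minimal illustration, not the asker's code; the class and method names are made up for the example. `Pattern.quote` and `Matcher.quoteReplacement` are the standard way to force `replaceAll` to treat both arguments literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceVsReplaceAll {
    // Literal replacement: no character in `from` has a special meaning.
    static String literal(String input, String from, String to) {
        return input.replace(from, to);
    }

    // Regex replacement forced to behave literally by quoting both arguments.
    static String quotedRegex(String input, String from, String to) {
        return input.replaceAll(Pattern.quote(from), Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        // In a regex, "." matches any character, so "axb" is replaced as well.
        System.out.println("a.b axb".replaceAll("a.b", "X"));
        // replace() only touches the literal text "a.b".
        System.out.println(literal("a.b axb", "a.b", "X"));
        System.out.println(quotedRegex("a.b axb", "a.b", "X"));
    }
}
```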

Use an XML parser to parse and manipulate XML. Don't try to use regular-expression-based string replacement; it will not work reliably and will only bring pain and suffering.

You can use replace or replaceAll. Either way, you have to use the value returned by the method. The method does not modify the string itself, because the String class is immutable.
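A small sketch of what "use the returned value" means in practice. The class and method names are hypothetical; the paths are the ones from the question.

```java
class ImmutableReplaceDemo {
    // Returns a new string; the argument itself is never modified.
    static String fixSchemaPath(String xmlInStr) {
        return xmlInStr.replace(
                "src/main/castor/GridDocument.xsd",
                "correspondenceCastor/src/main/castor/GridDocument.xsd");
    }

    public static void main(String[] args) {
        String xml = "<schema>src/main/castor/GridDocument.xsd</schema>";

        fixSchemaPath(xml);          // result discarded: xml is unchanged
        System.out.println(xml);

        xml = fixSchemaPath(xml);    // result assigned: xml now holds the new path
        System.out.println(xml);
    }
}
```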

Both replace() and replaceAll() don't actually replace anything in the string (strings are immutable). They return a new string instead, but you just discard the return value, which is why you don't see the change anywhere. By the way, that .toCharArray().toString() is worse than useless: calling toString() on a char[] returns the array's default object string, not its contents. A string literal is already a full-fledged String.
But you really should use an XML parser instead, unless your task is very simple and you're absolutely sure that you won't replace anything that shouldn't be replaced.
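For the parser route, a minimal sketch using only the JDK's built-in DOM and transformer APIs might look like this. It assumes the path lives in the text of `<schema>` elements, as in the question; the class name, method name, and the `<config>` wrapper element are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaPathRewriter {
    // Parses the XML, prepends `prefix` to the text of every <schema> element,
    // and serializes the document back to a string.
    static String prefixSchemaPaths(String xml, String prefix) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList schemas = doc.getElementsByTagName("schema");
        for (int i = 0; i < schemas.getLength(); i++) {
            Node schema = schemas.item(i);
            schema.setTextContent(prefix + schema.getTextContent().trim());
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><schema>src/main/castor/document.xsd</schema></config>";
        System.out.println(prefixSchemaPaths(xml, "cs/"));
    }
}
```

Unlike a plain string replacement, this only touches text inside `<schema>` elements, so the same path appearing in a comment or another element is left alone.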

Related

Replace part of string with known beginning and end

I get a string from a server with known and unknown parts. For example:
<simp>example1</simp><op>example2</op><val>example2</val>
I do not want to parse the XML or use any kind of parser. What I want to do is replace
<op>example2</op>
with the empty string (""), so that the result looks like:
<simp>example1</simp><val>example2</val>
I know that the part starts with <op> and ends with </op>, but the content (example2) may vary.
Can you give me a pointer on how to accomplish this?
You can use regex. Something like
<op>[A-Za-z0-9]*<\/op>
should match, but you can adapt it to fit your requirements better. For example, if you know that only certain characters can appear in the content, you can change the character class accordingly.
Afterwards you can use the String#replaceAll method to replace all matching occurrences with the empty string.
Take a look here to test the regex: https://regex101.com/r/WhPIv4/3
and here to check the replaceAll method that takes the regex and the replacement as a parameter: https://developer.android.com/reference/java/lang/String#replaceall
You can try
str.replace(str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5),"");
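Note that replace() already removes every occurrence of that literal substring. replaceAll() gives the same result here, but it treats the substring as a regular expression, so it can misbehave if the content contains regex metacharacters: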
str.replaceAll(str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5),"");
I tried it with a sample:
String str="<simp>example1</simp><op>example2</op><val>example2</val><simp>example1</simp><op>example2</op><val>example2</val><simp>example1</simp><op>example2</op><val>example2</val>";
Log.d("testit", str.replaceAll(str.substring(str.indexOf("<op>"), str.indexOf("</op>") + 5), ""));
And the log output was
D/testit: <simp>example1</simp><val>example2</val><simp>example1</simp><val>example2</val><simp>example1</simp><val>example2</val>
Edit
As @Elsafar said, str.replaceAll("<op>.*?</op>", "") will work.
Use like this:
String str = "<simp>example1</simp><op>example2</op><val>example2</val>";
String garbage = str.substring(str.indexOf("<op>"),str.indexOf("</op>")+5).trim();
String newString = str.replace(garbage,"");
I combined all the answers and eventually used:
st.replaceAll("<op>.*?<\\/op>","");
Thank you all for the help
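The two approaches from the answers can be compared in a short sketch. The class and method names are made up for the example. The regex variant removes every `<op>...</op>` pair regardless of content; the substring variant uses `replace()` so that content such as `b+c` is not misread as a regex.

```java
class RemoveOpElement {
    // Lazy quantifier: each <op>...</op> pair is matched and removed separately.
    // "." does not match line breaks; add the (?s) flag if content can span lines.
    static String removeOp(String s) {
        return s.replaceAll("<op>.*?</op>", "");
    }

    // Substring-based variant: removes every occurrence that is textually
    // identical to the first <op>...</op> block, using literal replace().
    static String removeLikeFirstOp(String s) {
        int start = s.indexOf("<op>");
        if (start < 0) {
            return s;
        }
        int end = s.indexOf("</op>", start);
        if (end < 0) {
            return s;
        }
        String garbage = s.substring(start, end + "</op>".length());
        return s.replace(garbage, "");
    }

    public static void main(String[] args) {
        String str = "<simp>example1</simp><op>example2</op><val>example2</val>";
        System.out.println(removeOp(str));
        System.out.println(removeLikeFirstOp(str));
    }
}
```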

How to bypass regular expression validation for specific characters

We use a library that validates a string with the regular expression
Pattern.compile("^\\w+(\\.\\w+)*$")
For example, abc.xyz is a valid string and passes the validation.
As a workaround for another issue, I need to provide the string abc.xyz,efg.ghi, which obviously does not get past the regex validation. Is there a way to make this string pass the validation, and if so, how?
PS: I tried using escape sequences, as in abc.xyz\\,efg\\.ghi. It did not work.
Just put comma and dot inside a character class.
Pattern.compile("^\\w+([,.]\\w+)*$");
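The effect of the character class can be checked with a small sketch. It assumes you are able to change the pattern, which the question suggests may not be the case for the library itself; the class and constant names are made up for the example.

```java
import java.util.regex.Pattern;

class DottedNameValidation {
    // The library's pattern from the question: word characters separated by dots.
    static final Pattern ORIGINAL = Pattern.compile("^\\w+(\\.\\w+)*$");
    // The relaxed pattern from the answer: a comma or a dot may separate the parts.
    static final Pattern RELAXED = Pattern.compile("^\\w+([,.]\\w+)*$");

    static boolean isValid(Pattern pattern, String candidate) {
        return pattern.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(ORIGINAL, "abc.xyz"));         // true
        System.out.println(isValid(ORIGINAL, "abc.xyz,efg.ghi")); // false
        System.out.println(isValid(RELAXED, "abc.xyz,efg.ghi"));  // true
    }
}
```

Escaping the input, as tried in the question, cannot help: a backslash in the input is just another character that `\w` does not match.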
As it stands now, you can't pass in comma characters. However, if there really is no way to change the library (e.g. because it is proprietary), you can abuse Java's String cache plus reflection to change the String literal before the proprietary class is loaded.

How to extract a specific substring in Java

I have a string that I define as
String string = "<html><color=black><b><center>Line1</center><center>Line2</center></b></font></html>";
that I apply to a JButton to get 2 lines of text on it, and that works perfectly. When I call the JButton.getText() method, it returns the whole string. What I want is to take the string it returns, and get the string "Line1Line2" from it. (So I want to remove all the HTML code and just get the text that appears on the screen.) I have tried using something like
if(string.contains("<html>"))
{
string.replace("<html>", "");
}
and then doing the same thing for all the other "<(stuff)>", but if I then print the string, I still get the whole thing. I think using regular expressions is a better way than to manually remove all the "<(stuff)>", but I don't know how.
Any help would be most appreciated!
string.replace() doesn't modify the string: a String is immutable. It returns a new string where the replacement has been done.
So your code should be
if (string.contains("<html>")) {
string = string.replace("<html>", "");
}
String is immutable, so String#replace does not change the String but rather returns the changed String.
string = string.replace("<html>", "");
and so on should do the thing.
String also has a replaceAll() method.
You could try string = string.replaceAll("<.*?>", "");
Also keep in mind that strings in Java are immutable, so this operation returns a new String with the result.
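Putting the answers together, a minimal sketch for the button text from the question could look like this. The class and method names are made up for the example, and the regex is only a rough tag stripper for simple markup like this, not a general HTML parser.

```java
class StripTags {
    // Removes everything between "<" and the next ">" (lazy match),
    // and returns the result as a new string.
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String string = "<html><color=black><b><center>Line1</center>"
                + "<center>Line2</center></b></font></html>";
        string = stripTags(string);   // reassign: the original string is not modified
        System.out.println(string);   // prints Line1Line2
    }
}
```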

Java Regex - exclude empty tags from xml

Let's say I have three XML strings:
String logToSearch = "<abc><number>123456789012</number></abc>";
String logToSearch2 = "<abc><number xsi:type=\"soapenc:string\" /></abc>";
String logToSearch3 = "<abc><number /></abc>";
I need a pattern that finds the number tag only if the tag contains a value, i.e. a match should be found only in logToSearch.
I'm not looking for the number itself; rather, matcher.find() should return true only for the first string.
For now I have this:
Pattern pattern = Pattern.compile("<(" + patternString + ").*?>",
Pattern.CASE_INSENSITIVE);
where patternString is simply "number". I tried "<(" + patternString + ")[^/>].*?>" instead, but it didn't work because [^/>] is a character class: it excludes the single characters / and >, not the two-character sequence />.
Thanks
This is absolutely the wrong way to parse XML. In fact, if you need more than just the basic example given here, there's provably no way to solve the more complex cases with regex.
Use an easy XML parser, like XOM. Now, using xpath, query for the elements and filter those without data. I can only imagine that this question is a precursor to future headaches unless you modify your approach right now.
So a search for "<number[^/>]*>" would find the opening tag. If you want to be sure it isn't empty, try "<number[^/>]*>[^<]" or "<number[^/>]*>[0-9]"
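The regex from this answer can be tried against the three strings from the question. This is only a sketch with made-up class and method names; it assumes attribute values never contain `/` or `>`, which is exactly the kind of case where the parser-based approach recommended above is safer.

```java
import java.util.regex.Pattern;

class NonEmptyTagFinder {
    // True if `xml` contains an opening <tagName ...> tag that is not
    // self-closing and is immediately followed by a non-"<" character.
    static boolean hasValue(String tagName, String xml) {
        Pattern pattern = Pattern.compile(
                "<" + Pattern.quote(tagName) + "[^/>]*>[^<]",
                Pattern.CASE_INSENSITIVE);
        return pattern.matcher(xml).find();
    }

    public static void main(String[] args) {
        String logToSearch = "<abc><number>123456789012</number></abc>";
        String logToSearch2 = "<abc><number xsi:type=\"soapenc:string\" /></abc>";
        String logToSearch3 = "<abc><number /></abc>";

        System.out.println(hasValue("number", logToSearch));  // true
        System.out.println(hasValue("number", logToSearch2)); // false
        System.out.println(hasValue("number", logToSearch3)); // false
    }
}
```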

Java Inner Text (getTextContent()) Problem

I'm trying to do some parsing in Java. I'm using the Cobra HTML Parser to get the HTML into a DOM, and then XPath to get the nodes I want. When I get down to the desired level I call node.getTextContent(), but this gives me a string like
"\n\n\nValue\n-\nValue\n\n\n"
Is there a built in way to get rid of the line breaks? I would like to do a RegEx like
(?:\s*([^-]+)\s*-\s*([^-]+)\s*)
on the inner text and would really prefer not to have to deal with the possible different white space symbols in between the text.
Example Input:
Value
-
Value
Thanks
You can use String.replaceAll().
String trimmed = original_string.replaceAll("\n", "");
The first argument is a regular expression: you could remove all contiguous blocks of whitespace from the original string with replaceAll("\\s+", ""), for instance.
I'm not totally sure I understood the question correctly, but the simplest way to remove all the whitespace would be:
String s = node.getTextContent().replaceAll("\\s","");
If you just want to get rid of the leading/trailing whitespace, use trim().
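The options mentioned in these answers behave as follows on the sample string from the question. This is a sketch with made-up class and method names; `collapseWhitespace` is an extra variant, not from the answers, that keeps a single space between the parts instead of removing whitespace entirely.

```java
class InnerTextCleaner {
    // Removes every whitespace character, including line breaks.
    static String stripAllWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Trims the ends and collapses each inner run of whitespace to one space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "\n\n\nValue\n-\nValue\n\n\n";
        System.out.println(stripAllWhitespace(raw));  // Value-Value
        System.out.println(collapseWhitespace(raw));  // Value - Value
        System.out.println(raw.trim().equals("Value\n-\nValue")); // true: inner breaks remain
    }
}
```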
